This problem report is going to seem a little vague, but I have spent a day trying to diagnose and trace the issue without much success, so I thought it was worth asking whether anybody else has seen a similar problem and/or has any clues or ideas about what is going on...
Our application was developed on Infinispan 5.1.5; it makes use of indexing to retrieve objects and defines transactions on the caches. We have recently been evaluating Infinispan 5.2 / 5.2.1 because it includes some fixes related to indexing and transactions which we really need. We have a test data generator which we use to populate our caches for testing. Under 5.1.5 this has always operated without problems. With 5.2.1 we are now seeing errors from the Lucene indexing system which cause data loading to fail and, consequently, the application to become unusable. Specifically, the errors are:
20857 [Lucene Merge Thread #0 for index com.hp.ampa.dd.domain.AbstractObject] ERROR org.hibernate.search.exception.impl.LogErrorHandler - HSEARCH000058: HSEARCH000118: Exception during index Merge operation
org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /tmp/lucene4/com.hp.ampa.dd.domain.AbstractObject/_fd.cfs (Too many open files)
Caused by: java.io.FileNotFoundException: /tmp/lucene4/com.hp.ampa.dd.domain.AbstractObject/_fd.cfs (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
Initially I experimented with the various configuration properties for the Lucene / Hibernate Search file indexes, e.g. merge_factor, max_merge_docs, etc. I tried most of them, with little effect; the problem appears to be more fundamental...
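For reference, these were passed in via the cache's indexing configuration, along the lines of the sketch below. The property names are the standard Hibernate Search index-writer options, and the index path matches the one in the stack trace; the values shown are just placeholders for the ranges I tried, not a recommendation:

```properties
default.directory_provider = filesystem
default.indexBase = /tmp/lucene4
default.indexwriter.merge_factor = 10
default.indexwriter.max_merge_docs = 10000
default.indexwriter.max_buffered_docs = 1000
```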
I then switched to monitoring the application's usage of file handles by running the Linux "lsof" command while the data load is in progress, and I can see that the count for our process increases linearly as objects are added, until it presumably exceeds the "ulimit -n" setting. If I monitor the same load with 5.1.5 in use, the count of open files reported by lsof is stable. The only change made was switching from Infinispan 5.1.5 to 5.2.1.
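For anyone wanting to reproduce the observation, the monitoring amounts to something like the following sketch. Counting the entries under /proc/&lt;pid&gt;/fd gives the same per-process figure as lsof; $$ (the shell's own pid) is used here only so the snippet is self-contained, and the JVM's pid would be substituted in a real run:

```shell
# Count the open file descriptors of a process via /proc (Linux).
# $$ (this shell's own pid) is a stand-in; use the JVM's pid in a real run.
PID=$$

COUNT=$(ls /proc/"$PID"/fd | wc -l)
echo "open file descriptors for pid $PID: $COUNT"

# The per-process limit that the descriptor count eventually exceeds:
echo "ulimit -n is $(ulimit -n)"
```

Running this in a loop (e.g. under watch) while the data load is in progress shows the count climbing steadily under 5.2.1 and staying flat under 5.1.5.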
The test involves injecting 10,000 objects into the cache, although in the 5.2.1 run only a couple of hundred objects are successfully processed before the exceptions start. To add a little further confusion, I have found that within our set of object types/caches this problem affects some but not others, even though their configurations are identical. Obviously the objects have different structures; they are all defined as @Indexed with various fields marked as @Field.
The runtime environment is Java 1.6.0_32 on RHEL 5.6.
Any suggestions / ideas are appreciated,