OutOfMemoryError with Hibernate Search and Infinispan Directory
lbilger Mar 26, 2015 11:47 AM
Hi,
I am currently trying to set up Hibernate Search to use an Infinispan directory. I am using JBoss EAP 6.2.4.GA and have tried Hibernate Search versions 4.4.0.Final and 4.4.6.Final, both of which use the Infinispan Lucene directory 5.3.0.Final. I know these are pretty old, but I'm stuck with them because newer versions would require a newer version of Hibernate than the one bundled with EAP 6.2.4.GA.
Our setup is a cluster of two nodes - clustering is mainly for failover, not for performance. For my tests, however, I currently run a single instance on my machine, so this is not about any clustering / replication issues. For testing, I configured the cache-container as follows in standalone.xml:
<cache-container name="lucene" start="EAGER">
    <transport cluster="HibernateSearch-Infinispan-cluster-${user.name}" lock-timeout="60000"/>
    <replicated-cache name="LuceneIndexesMetadata" mode="SYNC" remote-timeout="25000">
        <locking acquire-timeout="20000" concurrency-level="500"/>
        <eviction max-entries="-1"/>
        <state-transfer timeout="480000"/>
        <file-store passivation="false" purge="false"/>
    </replicated-cache>
    <replicated-cache name="LuceneIndexesData" mode="SYNC" remote-timeout="25000">
        <locking acquire-timeout="20000" concurrency-level="500"/>
        <eviction strategy="LIRS" max-entries="32"/>
        <state-transfer timeout="480000"/>
        <file-store passivation="false" purge="false">
            <write-behind thread-pool-size="8"/>
        </file-store>
    </replicated-cache>
    <replicated-cache name="LuceneIndexesLocking" mode="SYNC" remote-timeout="25000">
        <locking acquire-timeout="20000" concurrency-level="500"/>
        <state-transfer timeout="480000"/>
    </replicated-cache>
</cache-container>
I have tried several variations of this configuration along with several combinations of hibernate.search.default.chunk_size, hibernate.search.default.indexwriter.merge_max_size, hibernate.search.default.indexwriter.merge_max_optimize_size and hibernate.search.default.indexwriter.ram_buffer_size, as I could not find any definitive recommendations for these properties. What I could find was:
- The chunk_size should optimally be such that a segment fits in a single chunk.
- Also, I noticed that the file-store becomes quite inefficient when there are many chunks per segment file, because the hash value used to determine the bucket does not include the chunk number; as a result, there is usually one bucket per segment file, and each read from the file store reads the complete bucket.
- The segment size should be as large as possible, as searching becomes less efficient the more segments the index has.
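For reference, this is roughly the shape of the property combinations I experimented with in persistence.xml (the values shown here are just examples from one of my test runs, not recommendations; chunk_size is in bytes, the indexwriter sizes in MB):

```xml
<!-- Example only: one of the combinations I tried, 16MB chunks -->
<property name="hibernate.search.default.directory_provider" value="infinispan"/>
<property name="hibernate.search.default.chunk_size" value="16777216"/>
<property name="hibernate.search.default.indexwriter.ram_buffer_size" value="16"/>
<property name="hibernate.search.default.indexwriter.merge_max_size" value="16"/>
<property name="hibernate.search.default.indexwriter.merge_max_optimize_size" value="16"/>
```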
Any suggestions regarding these values would be most welcome, but this question targets a different problem:
When I choose the chunk_size so that each segment fits in a single chunk, I notice that a lot of heap is consumed, regardless of the actual size of the chunks and segments (I tried with 8MB and 16MB chunks). After some profiling, heap analysis and debugging, the problem seems to be this: when Hibernate Search opens an IndexReader (which it does upon deployment of the application), it in turn opens a SegmentReader for each segment of the index. Each of these SegmentReaders opens a SingleChunkIndexInput (or InfinispanIndexInput) that holds the chunk as a byte[]. So effectively I have the complete index in memory, regardless of the cache's eviction settings. Only one chunk per segment is kept in memory this way, so it's not so bad if there are many chunks per segment - but that contradicts the one-chunk-per-segment rule above.
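To put numbers on the effect (a simplified back-of-the-envelope sketch, not the real Lucene/Infinispan classes): since each open segment pins one chunk-sized byte[] on the heap, the retained memory of an open IndexReader is roughly segments × chunk_size, i.e. the whole index when there is one chunk per segment:

```java
// Hypothetical illustration of the heap math, not actual Lucene/Infinispan API:
// each SegmentReader pins one chunk as a byte[], so an open IndexReader
// retains roughly numSegments * chunkSizeBytes of heap.
public class RetainedHeapEstimate {
    static long retainedBytes(int numSegments, long chunkSizeBytes) {
        return numSegments * chunkSizeBytes;
    }

    public static void main(String[] args) {
        // e.g. an index of 128 segments at 16MB each, one chunk per segment:
        int segments = 128;
        long chunkSize = 16L * 1024 * 1024;
        long mb = retainedBytes(segments, chunkSize) / (1024 * 1024);
        System.out.println(mb + " MB pinned on the heap"); // 2048 MB - the full index
    }
}
```

This matches what I see in the heap dumps: the byte[] chunks dominate, and the cache's eviction settings have no effect on them because the readers hold strong references.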
Has anybody had the same problem? Is there a way to fix this?
Thanks
Lars