Cache eviction with Lucene / HSearch
hvico Jan 23, 2012 6:39 AM
Hi!
I've recently released an HSearch application that stores its indexes in Infinispan 4.x, with a MySQL JDBC-backed cacheloader. After a full index rebuild, my cacheloader "LuceneIndexData" table has around 30,000 rows (the indexing process takes around 20 minutes). The number of documents to index grows very slowly in this application, around 30 documents per day, so the indexes should not grow considerably.
The application ran well for a few days. Then I started getting out-of-memory errors and noticed that the cacheloader LuceneData table had grown to over 120,000 rows, so I had to truncate it and reindex from scratch.
After that, I started to keep track of the LuceneData table size and noticed that it grows even on weekends, when no new documents are stored, so the growth must be related to Infinispan. My Lucene config is set to optimize the indexes after 1000 transactions, so it should not grow this way.
My problem may be caused by a wrong eviction policy, because the Infinispan config I used had this setting:
<eviction maxEntries="-1" />
At first, I changed that to:
<eviction maxEntries="30000" strategy="LIRS" wakeUpInterval="5000"/>
But then I could not rebuild my indexes: in the middle of the process I got "read past EOF" and "could not acquire lock" exceptions. I suspect Infinispan is evicting entries that are in use by the indexing process.
Now I am trying the following configuration, which raises wakeUpInterval to 1,800,000 ms (30 minutes, longer than the roughly 20-minute rebuild) so that the indexing process can finish without any eviction occurring in the middle of it:
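To illustrate the kind of failure I suspect, here is a toy sketch in plain Java. It is not Infinispan code: it uses a simple LRU map (not LIRS) with no backing store, and the class name, method names, and "segment_1|n" keys are all made up for the example. It only shows that once a bounded map holds fewer entries than an index has chunks, a reader will find some chunks missing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EvictionSketch {
    // Toy bounded map (assumption: plain LRU, no cache store behind it).
    // Drops the least recently accessed entry once maxEntries is exceeded.
    static <K, V> Map<K, V> boundedLru(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Writes `chunks` hypothetical index chunks, then counts how many
    // of them can still be found in the map afterwards.
    static int readableChunks(int maxEntries, int chunks) {
        Map<String, byte[]> cache = boundedLru(maxEntries);
        for (int i = 0; i < chunks; i++) {
            cache.put("segment_1|" + i, new byte[16]);
        }
        int readable = 0;
        for (int i = 0; i < chunks; i++) {
            if (cache.containsKey("segment_1|" + i)) {
                readable++;
            }
        }
        return readable;
    }

    public static void main(String[] args) {
        // Limit below the chunk count: some chunks are gone.
        System.out.println(readableChunks(3, 5));
        // Limit at the chunk count: every chunk is still there.
        System.out.println(readableChunks(5, 5));
    }
}
```

If the real cache behaves anything like this when an evicted entry cannot be read back, a limit close to the actual number of index entries would explain errors in the middle of a rebuild. That is only my guess.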
<eviction maxEntries="30000" strategy="LIRS" wakeUpInterval="1800000"/>
Using this configuration I got no errors, but I am blindly configuring something I do not really understand.
So, as you can see, I am confused and do not really know how the eviction process works. I need all my entities to be available when I run a full-text search, and I cannot afford to get "partial" results. Of course, I also need the cache not to blow up with out-of-memory errors. Could you please explain how Infinispan handles eviction when acting as a Lucene index store, and which eviction policy I should use?
Many thanks in advance,