4 Replies Latest reply on Mar 15, 2010 6:41 AM by sannegrinovero

LuceneDirectory limited by Memory

theunique89 Mar 10, 2010 4:57 PM

Hi,

we are experimenting with LuceneDirectory from Infinispan. Our index is really huge and does not fit into memory.

In our experiments Infinispan keeps the complete transaction in memory until we close the IndexWriter. A simple commit does not help. Is there a way to make Infinispan write the cache entries to a cache store?

ciao.frank.

1. Re: LuceneDirectory limited by Memory

sannegrinovero Mar 11, 2010 9:59 AM (in response to theunique89)

Hi,
the Infinispan Directory uses an internal transaction that starts at lock() on the Directory and is committed at unlock(), so the relevant index segments stay loaded in memory for a lifespan that depends on the IndexWriter's lifespan: the Lucene IndexWriter acquires the lock at initialization and releases it at close().

This does not depend on your commit(), and it couldn't really: a commit() in Lucene can't be mapped directly to a transaction, as it might be followed by more changes.

I'd suggest closing the IndexWriter as soon as possible, or frequently during batch operations, to make sure you clean up references to segments. You want to close the IndexWriter frequently and keep it open for as short a time as possible anyway, or other nodes won't be able to open one and will time out on lock acquires.
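
To illustrate the "close frequently during batch operations" advice, here is a minimal Java sketch. It does not use the real Lucene or Infinispan classes: DocWriter is a hypothetical stand-in for the small part of IndexWriter that matters here, and indexInBatches and batchSize are names made up for the example. The assumption is that closing the writer is what releases the lock and ends the internal transaction, as described above.

```java
import java.util.List;
import java.util.function.Supplier;

class BatchedIndexing {

    /** Hypothetical stand-in for IndexWriter: add documents, then close. */
    interface DocWriter extends AutoCloseable {
        void addDocument(String doc);

        @Override
        void close(); // assumed to release the lock and end the transaction
    }

    /**
     * Indexes docs in chunks of at most batchSize, using a fresh writer
     * per chunk. Each writer is closed before the next one is opened, so
     * only one chunk's worth of changes is pending at any time.
     * Returns the number of writers that were opened.
     */
    static int indexInBatches(List<String> docs, int batchSize,
                              Supplier<DocWriter> writerFactory) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int opened = 0;
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            // try-with-resources closes the writer even if addDocument throws
            try (DocWriter writer = writerFactory.get()) {
                opened++;
                for (String doc : docs.subList(start, end)) {
                    writer.addDocument(doc);
                }
            }
        }
        return opened;
    }
}
```

With the real API the factory would create an IndexWriter on the Infinispan Directory. A smaller batchSize keeps less in memory and frees the lock for other nodes more often, at the cost of more open/close cycles.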

Sanne
2. Re: LuceneDirectory limited by Memory

theunique89 Mar 13, 2010 11:32 AM (in response to sannegrinovero)

Hi,

thank you for your answer. Your solution will work if we only index deltas, but we sometimes want to optimize the complete index with IndexWriter.optimize(). I think this method will read and write the complete index, which will load the complete index into memory. Is this true?

ciao.frank.
3. Re: LuceneDirectory limited by Memory

sannegrinovero Mar 14, 2010 4:14 PM (in response to theunique89)

you make a good point: optimizing is definitely going to mess with all segments in the same transaction. For batch jobs of this size you need to back off from the Transactional LockFactory.

I just committed https://jira.jboss.org/jira/browse/ISPN-372 in trunk, which makes it possible to use a transactionless Index: it will behave like a filesystem, sending out changes to the cluster as soon as they are done.
If you switch to the new org.infinispan.lucene.locking.BaseLockFactory (in trunk now, and used by default there) you can even use the standard merge strategy.

Watch for https://jira.jboss.org/jira/browse/ISPN-250 which is meant to improve performance of the transactionless mode to enable batching on the transport layer.
4. Re: LuceneDirectory limited by Memory

sannegrinovero Mar 15, 2010 6:41 AM (in response to sannegrinovero)

there's a dirty but effective workaround: if you just commit the transaction used by the cache underlying the Directory, and don't start a new one, subsequent operations will happen out-of-transaction and be more memory-friendly.
This would get you the same behaviour as using the new BaseLockFactory.
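
To show why committing without starting a new transaction is more memory-friendly, here is a toy Java model. It is not Infinispan's API: ToyTransactionalCache and all its methods are invented for the illustration, and the assumption is simply that writes made inside a transaction are held in memory until commit, while writes made outside one go straight to the underlying store.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; not an Infinispan class. */
class ToyTransactionalCache {
    // Stands in for the cluster / cache store behind the Directory.
    private final Map<String, byte[]> store = new HashMap<>();
    // Non-null only while a transaction is open.
    private Map<String, byte[]> pending = null;

    void begin() {
        if (pending != null) {
            throw new IllegalStateException("transaction already open");
        }
        pending = new HashMap<>();
    }

    void put(String key, byte[] value) {
        if (pending != null) {
            pending.put(key, value); // buffered in memory until commit()
        } else {
            store.put(key, value);   // out-of-transaction: written through
        }
    }

    void commit() {
        if (pending == null) {
            throw new IllegalStateException("no transaction open");
        }
        store.putAll(pending);
        pending = null; // no new transaction is started
    }

    int pendingEntries() { return pending == null ? 0 : pending.size(); }

    int storedEntries() { return store.size(); }
}
```

In this model, every put() after commit() lands in the store immediately and pendingEntries() stays at 0, which is the "out-of-transaction" behaviour the workaround relies on.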