
    LuceneDirectory limited by Memory

    theunique89

      Hi,

       

      we are experimenting with the LuceneDirectory from Infinispan. Our index is really huge and does not fit into memory.

      In our experiments Infinispan keeps the complete transaction in memory until we close the IndexWriter. A simple commit does not help. Is there a way to make Infinispan write the cache entries into a cache store?

       

      ciao.frank.

        • 1. Re: LuceneDirectory limited by Memory
          sannegrinovero

          Hi,

          the Infinispan Directory uses an internal transaction that starts at lock() of the Directory and is committed at unlock(), so the relevant index segments stay loaded in memory for a lifespan tied to the IndexWriter: the Lucene IndexWriter acquires the lock at initialization and releases it at close().

           

          This does not depend on your commit(), as that wouldn't really be possible: a commit() in Lucene can't be mapped directly to a transaction, as it might be followed by more changes.

           

          I'd suggest closing the IndexWriter as soon as possible, or frequently during batch operations, to make sure you clean up references to segments. You'll want to close the IndexWriter frequently and keep it open for as short a time as possible anyway, or other nodes won't be able to open one and will time out on lock acquires.
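
          For example, here's a minimal sketch of that batching pattern. The InfinispanDirectory constructor arguments and the Lucene 3.0 API calls are from memory, so verify them against your versions; the document source is obviously a placeholder:

              import org.apache.lucene.analysis.standard.StandardAnalyzer;
              import org.apache.lucene.document.Document;
              import org.apache.lucene.document.Field;
              import org.apache.lucene.index.IndexWriter;
              import org.apache.lucene.util.Version;
              import org.infinispan.Cache;
              import org.infinispan.lucene.InfinispanDirectory;
              import org.infinispan.manager.DefaultCacheManager;

              public class BatchIndexer {

                  public static void main(String[] args) throws Exception {
                      Cache<Object, Object> cache = new DefaultCacheManager().getCache();
                      InfinispanDirectory directory = new InfinispanDirectory(cache, "myIndex");

                      // placeholder data: replace with your real document source
                      String[][] batches = { { "first document" }, { "second document" } };

                      for (String[] batch : batches) {
                          // opening the IndexWriter acquires the index lock and starts
                          // the internal transaction; closing it commits and releases
                          // the in-memory segment references
                          IndexWriter writer = new IndexWriter(directory,
                                  new StandardAnalyzer(Version.LUCENE_30),
                                  IndexWriter.MaxFieldLength.UNLIMITED);
                          for (String text : batch) {
                              Document doc = new Document();
                              doc.add(new Field("body", text, Field.Store.NO, Field.Index.ANALYZED));
                              writer.addDocument(doc);
                          }
                          writer.close(); // keep the lock window as short as possible
                      }
                  }
              }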

           

          Sanne

          • 2. Re: LuceneDirectory limited by Memory
            theunique89

            Hi,

             

             thank you for your answer. Your solution will work if we only index deltas, but we sometimes want to optimize the complete index with IndexWriter.optimize(). I think this method will read and write the complete index, which would load the complete index into memory. Is this true?

             

            ciao.frank.

            • 3. Re: LuceneDirectory limited by Memory
              sannegrinovero

              you make a good point: optimizing is definitely going to mess with all segments in the same transaction; for batch work of this size you need to back off from the transactional LockFactory.

               

              I just committed https://jira.jboss.org/jira/browse/ISPN-372 in trunk, which makes it possible to use a transactionless index: it will behave like a filesystem, sending changes out to the cluster as soon as they are done.

              If you switch to the new org.infinispan.lucene.locking.BaseLockFactory (in trunk, and now used by default) you can even use the standard merge strategy.
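
              If you want to set it explicitly on an existing Directory, something along these lines should work. Directory.setLockFactory is standard Lucene; the InfinispanDirectory and BaseLockFactory constructor arguments here are assumptions to double-check against trunk:

                  import org.apache.lucene.store.Directory;
                  import org.infinispan.Cache;
                  import org.infinispan.lucene.InfinispanDirectory;
                  import org.infinispan.lucene.locking.BaseLockFactory;

                  public class DirectoryFactory {

                      // builds a Directory using the non-transactional BaseLockFactory;
                      // constructor arguments are assumptions, verify against trunk
                      public static Directory createTransactionless(Cache<Object, Object> cache, String indexName) {
                          InfinispanDirectory directory = new InfinispanDirectory(cache, indexName);
                          directory.setLockFactory(new BaseLockFactory(cache, indexName));
                          return directory;
                      }
                  }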

               

              Watch https://jira.jboss.org/jira/browse/ISPN-250, which is meant to improve the performance of the transactionless mode by enabling batching on the transport layer.


              • 4. Re: LuceneDirectory limited by Memory
                sannegrinovero

                there's a dirty but effective workaround: if you just commit the transaction used by the cache underlying the Directory, and don't start a new one, subsequent operations will happen out-of-transaction and be more memory-friendly.

                This would get you the same behaviour as using the new BaseLockFactory.
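
                A rough sketch of that trick; cache.getAdvancedCache().getTransactionManager() is standard Infinispan API, but treat the rest as an untested outline:

                    import javax.transaction.TransactionManager;

                    import org.infinispan.Cache;

                    public class TxWorkaround {

                        // commits the transaction the Directory opened at lock(), without
                        // beginning a new one, so subsequent writes happen per-operation
                        public static void commitUnderlyingTransaction(Cache<?, ?> cache) throws Exception {
                            TransactionManager tm = cache.getAdvancedCache().getTransactionManager();
                            if (tm != null && tm.getTransaction() != null) {
                                tm.commit();
                            }
                        }
                    }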