3 Replies Latest reply on Jun 28, 2019 3:57 AM by gustavonalle

    Infinispan Lucene Directory - IndexWriter Master/Slave best practices?

    muhmuhkuh

      Hello,

       

      at our company we are going to adopt the Infinispan Lucene Directory.

      We have run into some problems with the IndexWriter. The documentation points out that there should be only one IndexWriter within a cluster.

      Are there any best practices on how to implement that or maybe some examples to use?

        • 1. Re: Infinispan Lucene Directory - IndexWriter Master/Slave best practices?
          gustavonalle

          Yes, only one node can write to a single index at a time.

           

          Infinispan has implementations that automatically handle index readers/writers cluster-wide, and so does Hibernate Search, but I imagine you are working on a pure Lucene project?

          • 2. Re: Infinispan Lucene Directory - IndexWriter Master/Slave best practices?
            muhmuhkuh

            Yes, this is correct.

             

            We are using only Lucene in our application.

            Now we would like the application to be able to run in a cluster, so we decided to use the Infinispan Lucene Directory implementation, and it works really well except for the IndexWriter issue.

             

            We thought about implementing a wrapper around the IndexWriter which "catches" all document updates and sends them to a JChannel. Another idea was to use the NoLockFactory, so that there is no lock on the index, but I have no clue what impact this would have.

             

            Any hints on how to implement this properly?

            • 3. Re: Infinispan Lucene Directory - IndexWriter Master/Slave best practices?
              gustavonalle

              NoLockFactory will lead to corruption from the moment a second writer writes to the index.

               

              If you have a wrapper that forwards the documents to be indexed to a single node, and that node does all the indexing, it could work. But there are some caveats: Lucene objects are not serializable. You'd need to make them serializable, or avoid sending those objects around at all, for example by sending a serializable structure that carries only the data you need to index/update/delete.
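
To illustrate the "serializable structure" idea, here is a minimal sketch of a message type that carries only plain strings instead of Lucene objects. The class name IndexOperation, its fields, and the use of Java serialization are assumptions made for this example; they are not part of Infinispan or Lucene, and a real application would need to carry whatever field types and analysis options its documents require.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical message type: plain data only, no Lucene classes inside. */
final class IndexOperation implements Serializable {
    private static final long serialVersionUID = 1L;

    enum Type { ADD, UPDATE, DELETE }

    final Type type;
    final String id;                  // application-level document id
    final Map<String, String> fields; // field name -> text value (empty for DELETE)

    private IndexOperation(Type type, String id, Map<String, String> fields) {
        this.type = type;
        this.id = id;
        // Defensive copy so later changes by the caller do not leak into the message.
        this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
    }

    static IndexOperation add(String id, Map<String, String> fields) {
        return new IndexOperation(Type.ADD, id, fields);
    }

    static IndexOperation update(String id, Map<String, String> fields) {
        return new IndexOperation(Type.UPDATE, id, fields);
    }

    static IndexOperation delete(String id) {
        return new IndexOperation(Type.DELETE, id, Collections.emptyMap());
    }

    /** Encodes this operation as bytes that can be sent to the writer node. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decodes an operation on the writer node. */
    static IndexOperation fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (IndexOperation) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

On the writer node, each decoded operation would then be turned into a Lucene Document and applied with IndexWriter.addDocument, IndexWriter.updateDocument or IndexWriter.deleteDocuments, keyed on a Term built from the id. That part is left out here because it depends on your schema.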

              Another caveat is that you'd need to have some leader election process in place: in case the node that is indexing dies, another one needs to take over. Also, this leader election should be partition tolerant, to avoid the situation where your cluster splits in half and you end up with two "writers", thus causing corruption.
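
As a rough sketch of what "partition tolerant" can mean here: a node only considers itself the writer if it can see a strict majority of the expected cluster, and among the visible members the one with the lowest name wins. The class WriterElection, the fixed expectedClusterSize, and the name-ordering rule are all assumptions for this example. In practice the member list would come from your cluster view (for instance a JGroups view), and this check alone does not protect against stale views, so treat it as an illustration of the quorum rule rather than a complete election protocol.

```java
import java.util.Collection;
import java.util.Optional;
import java.util.TreeSet;

/** Hypothetical helper: picks at most one writer per cluster, even during a split. */
final class WriterElection {
    private WriterElection() {}

    /**
     * Returns the elected writer, or empty if this side of the cluster
     * does not hold a strict majority of the expected members.
     */
    static Optional<String> electWriter(Collection<String> visibleMembers, int expectedClusterSize) {
        if (expectedClusterSize <= 0) {
            throw new IllegalArgumentException("expectedClusterSize must be positive");
        }
        // Sorted and de-duplicated view of the members this node can currently see.
        TreeSet<String> sorted = new TreeSet<>(visibleMembers);
        boolean hasMajority = 2 * sorted.size() > expectedClusterSize;
        if (!hasMajority) {
            // Minority (or exact half): nobody on this side may write.
            return Optional.empty();
        }
        // Deterministic choice: every node in the majority picks the same member.
        return Optional.of(sorted.first());
    }

    /** True only if 'self' is the elected writer for the given view. */
    static boolean mayWrite(String self, Collection<String> visibleMembers, int expectedClusterSize) {
        return electWriter(visibleMembers, expectedClusterSize).map(self::equals).orElse(false);
    }
}
```

With four expected nodes, a view of three members elects the lowest-named one, while a two/two split elects nobody on either side. Indexing pauses during such a split, which is the price for never having two writers.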

               

              Hibernate Search uses this strategy to index its data, by using JGroups or JMS "backends":

               

              Hibernate Search 5.11.2.Final: Reference Guide

               

              This strategy has some drawbacks for very large indexes under heavy write load: the single writer can become the bottleneck.

               

              Hope this helps!