3 Replies Latest reply on Jun 28, 2019 3:57 AM by gustavonalle

Infinispan Lucene Directory - IndexWriter Master/Slave best practices?

muhmuhkuh Jun 26, 2019 11:21 AM

Hello,

At our company we are going to implement the Infinispan Lucene Directory.

We have run into some problems with the IndexWriter. The documentation points out that there should be only one IndexWriter within a cluster.

Are there any best practices on how to implement that or maybe some examples to use?

1. Re: Infinispan Lucene Directory - IndexWriter Master/Slave best practices?

gustavonalle Jun 26, 2019 12:12 PM (in response to muhmuhkuh)

Yes, only one node can write to a single index at a time.

Infinispan has implementations that automatically handle index readers/writers cluster-wide, and so does Hibernate Search, but I imagine you are working on a pure Lucene project?
2. Re: Infinispan Lucene Directory - IndexWriter Master/Slave best practices?

muhmuhkuh Jun 26, 2019 1:10 PM (in response to gustavonalle)

Yes, this is correct.

We are using only Lucene in our application.
Now we would like to make the application able to run in a cluster, so we decided to use the Infinispan Lucene Directory implementation, and it works really well except for the IndexWriter issue.

We thought about implementing a wrapper around the IndexWriter which "catches" all document updates and sends them to a JChannel. Another idea was to use the NoLockFactory, so there is no lock on the index, but I have no clue what impact this will have.

Any hints on how to implement this properly?
3. Re: Infinispan Lucene Directory - IndexWriter Master/Slave best practices?

gustavonalle Jun 28, 2019 3:57 AM (in response to muhmuhkuh)

NoLockFactory will lead to corruption from the moment a second writer writes to the index.

If you have a wrapper that sends the documents to be indexed to a single place (node) and have that node index all the data, it could work. But there are some caveats: Lucene objects are not serializable. You'd need to make them serializable, or avoid sending those objects around at all, for example by sending a serializable structure that carries just the data you need to index/update/delete.
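To illustrate the "serializable structure" idea, here is a minimal sketch using only the JDK. All names in it (`IndexOperationDemo`, `IndexOperation`, `Type`, `toBytes`, `fromBytes`) are made up for this example and are not part of Lucene, Infinispan or JGroups. It assumes every field can be shipped as a plain string; the writer node would turn each received operation back into a Lucene `Document` and apply it with its own `IndexWriter`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class IndexOperationDemo {

    /** Kind of change the writer node should apply (hypothetical). */
    public enum Type { ADD, UPDATE, DELETE }

    /** Plain serializable message sent instead of a Lucene Document (hypothetical). */
    public static final class IndexOperation implements Serializable {
        private static final long serialVersionUID = 1L;
        public final Type type;
        public final String id;                  // value of the unique id field
        public final Map<String, String> fields; // field name -> raw value

        public IndexOperation(Type type, String id, Map<String, String> fields) {
            this.type = type;
            this.id = id;
            this.fields = new HashMap<>(fields); // defensive, serializable copy
        }
    }

    /** Serializes an operation so it can be sent over any byte-oriented channel. */
    public static byte[] toBytes(IndexOperation op) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(op);
        }
        return bos.toByteArray();
    }

    /** Rebuilds the operation on the receiving (writer) node. */
    public static IndexOperation fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (IndexOperation) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new HashMap<>();
        fields.put("title", "hello");
        IndexOperation op = new IndexOperation(Type.UPDATE, "42", fields);
        IndexOperation copy = fromBytes(toBytes(op));
        System.out.println(copy.type + " " + copy.id + " " + copy.fields);
    }
}
```

On the writer node, an `ADD` would typically map to `IndexWriter.addDocument`, an `UPDATE` to `IndexWriter.updateDocument` keyed on the id term, and a `DELETE` to `IndexWriter.deleteDocuments`. Java serialization is used here only to keep the sketch dependency-free; any wire format would do.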
Another caveat is that you'd need some leader election process in place: if the node that is indexing dies, another one needs to take over. This leader election should also be partition tolerant, to avoid the situation where your cluster splits in half and you end up with two "writers", which would corrupt the index.
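A rough sketch of what a partition-aware election rule could look like, again using only the JDK. The class `WriterElection`, the method `isWriter` and the fixed `expectedClusterSize` are assumptions made for this example, not an API of any library. The rule: a node may write only if its current membership view holds a strict majority of the expected cluster, and it is the lowest-ordered member of that view.

```java
import java.util.Collections;
import java.util.List;

public class WriterElection {

    /**
     * Decides whether localNode may act as the single index writer.
     *
     * @param localNode           name of this node
     * @param view                members this node can currently see (itself included)
     * @param expectedClusterSize configured size of the full cluster (assumed static)
     */
    public static boolean isWriter(String localNode, List<String> view,
                                   int expectedClusterSize) {
        // Without a strict majority this side of a split must not write,
        // otherwise both halves could elect a writer at the same time.
        boolean hasMajority = view.size() * 2 > expectedClusterSize;
        if (view.isEmpty() || !hasMajority) {
            return false;
        }
        // Deterministic choice: every member of the same view picks the same leader.
        String leader = Collections.min(view);
        return leader.equals(localNode);
    }

    public static void main(String[] args) {
        // 5-node cluster split 3/2: only the majority side gets a writer.
        System.out.println(isWriter("c", List.of("c", "d", "e"), 5));
        System.out.println(isWriter("a", List.of("a", "b"), 5));
    }
}
```

With an even split (for example 2/2 out of 4) neither side has a majority, so indexing stalls until the cluster heals. That is the price of never having two writers. A real setup would feed `view` from the cluster membership callbacks of whatever group communication layer is in use, and would also have to handle the old writer releasing the index lock before the new one takes over.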

Hibernate Search uses this strategy to index its data, by using JGroups or JMS "backends":

Hibernate Search 5.11.2.Final: Reference Guide

This strategy has a drawback for very large indexes under heavy write load: the single writer can become the bottleneck.

Hope this helps!