5 Replies Latest reply on Jan 20, 2017 9:21 AM by vladsz83

    How to configure actually working search index over Infinispan for a distributed cache

    vladsz83

      Hi, folks!

       

      Can anyone please help me set up a properly working search for replicated/distributed caches?

      I need to allocate a distributed cache on 2-4 nodes and a replicated cache on 2 nodes. Both might be synchronous or not (not decided yet); I'm using only synchronous ones for now. I also need to search them, so I enabled indexing:

       

      ConfigurationBuilder cfg = …;

      cfg.clustering().cacheMode(CacheMode.REPL_SYNC)

         .transaction().transactionMode(TransactionMode.TRANSACTIONAL)

         .indexing().index(Index.ALL)

         .addProperty("default.directory_provider", "infinispan")

         .addProperty("default.chunk_size", "524288")

       

      and optionally NRT:

      .indexing().addProperty("default.indexmanager", "near-real-time");

       

       

      The most widely used configurations for the index caches are:

      "LuceneIndexesData":     CacheMode.REPL_ASYNC

      "LuceneIndexesMetadata": CacheMode.REPL_SYNC

      "LuceneIndexesLocking":  CacheMode.LOCAL
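
      For reference, these index caches can be defined programmatically on the cache manager before the indexed cache starts. A minimal sketch, assuming the Infinispan 8.x API; the cache names follow the Infinispan Lucene directory defaults and the modes mirror the list above:

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class IndexCacheSetup {
    public static void main(String[] args) {
        EmbeddedCacheManager manager = new DefaultCacheManager();

        // Define each of the three Lucene index caches with an explicit mode
        // before any indexed cache is started, so the defaults are not used.
        manager.defineConfiguration("LuceneIndexesData",
            new ConfigurationBuilder().clustering().cacheMode(CacheMode.REPL_ASYNC).build());
        manager.defineConfiguration("LuceneIndexesMetadata",
            new ConfigurationBuilder().clustering().cacheMode(CacheMode.REPL_SYNC).build());
        manager.defineConfiguration("LuceneIndexesLocking",
            new ConfigurationBuilder().clustering().cacheMode(CacheMode.LOCAL).build());
    }
}
```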

       

      I found that it is completely impossible to use DIST_ASYNC or REPL_SYNC for the lock data cache ("LuceneIndexesLocking"). As soon as more than one node appears in the cluster, Lucene starts yielding:

       

      ERROR LogErrorHandler HSEARCH000058: Exception occurred org.apache.lucene.store.LockObtainFailedException: lock instance already assigned

      Primary Failure:

                          Entity com.bpcbt.test.cache.Record  Id S:61319  Work Type  org.hibernate.search.backend.UpdateLuceneWork

      Subsequent failures:

                          Entity com.bpcbt.test.cache.Record  Id S:114133  Work Type  org.hibernate.search.backend.UpdateLuceneWork

                          Entity com.bpcbt.test.cache.Record  Id S:128795  Work Type  org.hibernate.search.backend.UpdateLuceneWork

       

      org.apache.lucene.store.LockObtainFailedException: lock instance already assigned

                          at org.infinispan.lucene.impl.CommonLockObtainUtils.failLockAcquire(CommonLockObtainUtils.java:33)

                          at org.infinispan.lucene.impl.CommonLockObtainUtils.attemptObtain(CommonLockObtainUtils.java:20)

                          at org.infinispan.lucene.impl.BaseLockFactory.obtainLock(BaseLockFactory.java:35)

                          at org.infinispan.lucene.impl.BaseLockFactory.obtainLock(BaseLockFactory.java:18)

                          at org.infinispan.lucene.impl.DirectoryLucene.obtainLock(DirectoryLucene.java:152)

                          at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:776)

                          at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.createNewIndexWriter(IndexWriterHolder.java:123)

                          at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.getIndexWriter(IndexWriterHolder.java:89)

                          at org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.getIndexWriter(AbstractWorkspaceImpl.java:117)

                          at org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.getIndexWriterDelegate(AbstractWorkspaceImpl.java:203)

                          at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.applyUpdates(LuceneBackendQueueTask.java:80)

                          at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.run(LuceneBackendQueueTask.java:46)

                          at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.applyChangesets(SyncWorkProcessor.java:162)

                          at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.run(SyncWorkProcessor.java:148)

                          at java.lang.Thread.run(Thread.java:745)

      ERROR LuceneBackendQueueTask HSEARCH000072: Couldn't open the IndexWriter because of previous error: operation skipped, index ouf of sync!

       

      To test and compare search performance with different setups, I had to set the index data cache to LOCAL mode. But I guess it's incorrect to use a local locking mode, isn't it?

       

      Moreover, depending on the following parameters:

      -    Mode of the target cache: DIST_SYNC/REPL_SYNC

      -    NRT (“near-real-time”): on/off

      -    Node number: 1 to 4 (on same machine)

      -    Mode of the index data cache: DIST_SYNC/REPL_SYNC

       

      I could get successful runs, or failures inside the index directory or locking routines like the one shown above, or:

       

      ERROR LogErrorHandler HSEARCH000058: Exception occurred java.io.FileNotFoundException: Error loading metadata for index file: M|segments_1s|com.bpcbt.test.cache.Record|-1

      Primary Failure:

      Entity com.bpcbt.test.cache.Record  Id S:17510  Work Type  org.hibernate.search.backend.UpdateLuceneWork

      Subsequent failures:

      Entity com.bpcbt.test.cache.Record  Id S:42430  Work Type  org.hibernate.search.backend.UpdateLuceneWork

        java.io.FileNotFoundException: Error loading metadata for index file: M|segments_1s|com.my.infinitest.TestRecord|-1

      at org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:138)

      at org.infinispan.lucene.impl.DirectoryLucene.openInput(DirectoryLucene.java:102)

      at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109)

      at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:294)

      at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:171)

      at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:949)

      at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.createNewIndexWriter(IndexWriterHolder.java:123)

      at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.getIndexWriter(IndexWriterHolder.java:89)

      at org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.getIndexWriter(AbstractWorkspaceImpl.java:117)

      at org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.getIndexWriterDelegate(AbstractWorkspaceImpl.java:203)

      at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.applyUpdates(LuceneBackendQueueTask.java:80)

      at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.run(LuceneBackendQueueTask.java:46)

      at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.applyChangesets(SyncWorkProcessor.java:162)

      at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.run(SyncWorkProcessor.java:148)

      at java.lang.Thread.run(Thread.java:745)

      ERROR LuceneBackendQueueTask HSEARCH000072: Couldn't open the IndexWriter because of previous error: operation skipped, index ouf of sync!

       

      or

       

      ERROR LogErrorHandler HSEARCH000058: HSEARCH000117: IOException on the IndexWriter

      java.io.IOException: Read past EOF

      at org.infinispan.lucene.impl.SlicedBufferIndexInput.readByte(SlicedBufferIndexInput.java:64)

      at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)

      at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)

      at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)

      at org.apache.lucene.codecs.lucene50.Lucene50PostingsReader.<init>(Lucene50PostingsReader.java:93)

      at org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat.fieldsProducer(Lucene50PostingsFormat.java:443)

      at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:261)

      at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:341)

      at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:104)

      at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)

      at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)

      at org.apache.lucene.index.BufferedUpdatesStream$SegmentState.<init>(BufferedUpdatesStream.java:385)

      at org.apache.lucene.index.BufferedUpdatesStream.openSegmentStates(BufferedUpdatesStream.java:417)

      at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:262)

      at org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3161)

      at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3147)

      at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2809)

      at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2963)

      at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2930)

      at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.commitIndexWriter(IndexWriterHolder.java:146)

      at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.commitIndexWriter(IndexWriterHolder.java:159)

       

      I noticed that DIST_SYNC mode of the index data cache significantly reduces search performance compared to REPL_SYNC mode. Also, disabling NRT leads to a dramatic decrease in search performance and a huge increase in cache loading and replication time, but it keeps you away from search misses when sharding is actually engaged.

       

      However…

       

      I found the only acceptable combination:

      - target cache is REPL_SYNC. Works much more stably, more consistently, and faster in combination with the other options

      - NRT is on (works much faster)

      - index data cache is also REPL_SYNC. Works much more stably and faster.

      - lock data cache is LOCAL. The only mode which doesn't crash

       

      I can't say this set perfectly fits my needs. It just works.

       

      How do I configure searching so that it works out of the box with various reasonable cache and index options, without the output being flooded with headache-making errors?

        • 1. Re: How to configure actually working search index over Infinispan for a distributed cache
          gustavonalle

          There is a configuration called auto-config that will choose sensible defaults depending on the cache type.
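
          A sketch of enabling it, assuming the Infinispan 8.x fluent API, where autoConfig(true) picks the indexing defaults appropriate for the cache mode:

```java
// Hypothetical setup: autoConfig chooses the indexing defaults for this
// cache mode (e.g. a shared InfinispanIndexManager for DIST caches).
ConfigurationBuilder cfg = new ConfigurationBuilder();
cfg.clustering().cacheMode(CacheMode.DIST_SYNC)
   .indexing().index(Index.ALL)
   .autoConfig(true);
```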

           

          Some comments about the configurations you tried:

           

          * Indexing caches cannot be ASYNC

           

          * If you are using 'default.directory_provider' set to 'infinispan' alone and trying to use clustered index caches, it will not work. The reason is that Lucene requires a single index-wide lock to do writes, and you will eventually hit "Lock already assigned". To use the 'infinispan' directory provider in a cluster you need to configure an IndexManager that will manage the lock for you. It is recommended to set "<entity|default>.indexmanager" to "org.infinispan.query.indexmanager.InfinispanIndexManager"; you can then omit the "default.directory_provider" setting.
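
          In code, mirroring the snippet from the question, this would look roughly like (a sketch, not a verified full configuration):

```java
// Sketch: clustered index with the lock managed by Infinispan.
// The indexmanager property replaces the directory_provider setting.
ConfigurationBuilder cfg = new ConfigurationBuilder();
cfg.clustering().cacheMode(CacheMode.DIST_SYNC)
   .indexing().index(Index.ALL)
   .addProperty("default.indexmanager",
                "org.infinispan.query.indexmanager.InfinispanIndexManager");
```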

           

          * If you set the index caches as LOCAL, you will end up with each node having its own separate index. This is fine if your cache is REPL with Index.ALL, since the full index will be present on every node of the cluster, and querying any node will give correct results. Typically NRT can be used here for maximum performance. On the other hand, if the index caches are local but your cache is DIST, you will end up with each node in the cluster holding its own, different index. In this scenario, when doing queries, you will need to query all the nodes. This can be achieved by using the Clustered Query API.
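
          A minimal sketch of a clustered query, assuming the Infinispan Query API; Record is the asker's entity and "name"/"foo" are hypothetical field and value:

```java
// Broadcast a query to all nodes when each node holds only a local
// slice of the index, then aggregate the results.
// Requires org.infinispan.query.Search / SearchManager / CacheQuery.
SearchManager sm = Search.getSearchManager(cache);
org.apache.lucene.search.Query luceneQuery =
    sm.buildQueryBuilderForClass(Record.class).get()
      .keyword().onField("name").matching("foo")   // hypothetical field/value
      .createQuery();
CacheQuery clusteredQuery = sm.getClusteredQuery(luceneQuery, Record.class);
List<Object> hits = clusteredQuery.list();   // results from every node's index
```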

           

          * When you are not planning to use clustered queries, it's recommended to have the LOCK_CACHE as REPL. It's a very tiny cache that is frequently accessed. The same holds for the METADATA_CACHE.

           

          * When using a DIST cache with the "org.infinispan.query.indexmanager.InfinispanIndexManager", you can get a MASSIVE performance increase by doing all the indexing asynchronously. To configure it, set "default.worker.execution" to "async". The trade-off is a small delay before data changes are reflected in searches; the delay is configurable and is 1 second by default.
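
          Combined with the index manager above, the async worker setting would look roughly like (a sketch):

```java
// Sketch: asynchronous index updates; searches may lag writes by the
// refresh period (~1 second by default).
ConfigurationBuilder cfg = new ConfigurationBuilder();
cfg.clustering().cacheMode(CacheMode.DIST_SYNC)
   .indexing().index(Index.ALL)
   .addProperty("default.indexmanager",
                "org.infinispan.query.indexmanager.InfinispanIndexManager")
   .addProperty("default.worker.execution", "async");
```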

           

          • 2. Re: How to configure actually working search index over Infinispan for a distributed cache
            vladsz83

            Thanks a lot, Gustavo.

             

            Actually, I've never used ASYNC mode; that was a mistake in my post. Indeed, it's irrelevant for locking.

             

            I'll consider using a LOCAL index data cache for my REPL_SYNC target cache running with NRT. This combination works stably, updates the index in time, and provides three to five times faster search compared to the database.

             

            For a DIST_SYNC target cache, sadly, I still have to find a solution. I enabled the InfinispanIndexManager and tested search times with sync and async workers. The sync one doesn't exhibit cache misses, but its search time is 60-130% of a similar DB search; the more records it searches, the slower it becomes compared with the DB. The async worker takes 35-45% of the DB search time but constantly misses a few records of the test set.

             

            By the way, I've just hit another problem, which I assume slows down cache copying/replication between nodes. Once an additional node starts, one, several, or all of them keep logging for a while:

             

            failed submitting DONT_BUNDLE message to thread pool: java.util.concurrent.RejectedExecutionException: Task org.jgroups.protocols.TP$SingleMessageHandler@3cbc14f1 rejected from java.util.concurrent.ThreadPoolExecutor@75c43f01[Running, pool size = 10, active threads = 2, queued tasks = 0, completed tasks = 40]. Msg: FRAG2: [id=1, frag_id=42, num_frags=43], UNICAST3: DATA, seqno=64, TP: [cluster_name=testCluster]

            • org.jgroups.protocols.TP removeAndDispatchNonBundledMessages

             

            What's wrong with JGroups? The pool is not even full; I see 0 to 3 active threads. I can't find a solution.

            • 3. Re: How to configure actually working search index over Infinispan for a distributed cache
              gustavonalle
              Async worker consumes 35-45% search time relatively to db, but constantly misses a few records of the testing set.

              It is expected to miss if you query immediately after doing a cache operation; as I mentioned before, there is a refresh time of 1s by default.

               

              Regarding the exception, could you provide more info? Which Infinispan version? Do you have a stack trace? Do you have a reproducer?

              • 4. Re: How to configure actually working search index over Infinispan for a distributed cache
                vladsz83

                The use case is simple: a node starts, gets the cache, and fills it if empty. Nothing more. The error messages appear during replication ( EmbeddedCacheManager::getCache() ). The cache is DIST_SYNC or REPL_SYNC, indexed or not. It reproduces on each run. No stack trace appears.

                 

                I use Infinispan 8.2.5.Final.

                • 5. Re: How to configure actually working search index over Infinispan for a distributed cache
                  vladsz83

                  Got the stacktrace:

                   

                  java.util.concurrent.RejectedExecutionException: Task org.jgroups.protocols.TP$SingleMessageHandler@43e037a3 rejected from java.util.concurrent.ThreadPoolExecutor@712ec67c[Running, pool size = 10, active threads = 0, queued tasks = 0, completed tasks = 98]

                    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)

                    at org.jgroups.util.ShutdownRejectedExecutionHandler.rejectedExecution(ShutdownRejectedExecutionHandler.java:33)

                    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)

                    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)

                    at org.jgroups.protocols.TP.removeAndDispatchNonBundledMessages(TP.java:1742)

                    at org.jgroups.protocols.TP.handleMessageBatch(TP.java:1660)

                    at org.jgroups.protocols.TP.receive(TP.java:1641)

                    at org.jgroups.protocols.BasicTCP.receive(BasicTCP.java:144)

                    at org.jgroups.blocks.cs.BaseServer.receive(BaseServer.java:154)

                    at org.jgroups.blocks.cs.TcpConnection$Receiver.run(TcpConnection.java:312)

                    at java.lang.Thread.run(Thread.java:745)

                   

                  Looks like it stemmed from the default config of the JGroups TCP protocol. I added my basic settings for the OOB pool, and that has helped; at least I'm not seeing the problem now.
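
                  A config fragment of the kind meant here might look like the following. This is a sketch only, assuming JGroups 3.x flat attribute names; the poster's actual settings were not shown:

```xml
<!-- Sketch: enlarge the OOB thread pool and enable its queue so that
     bursty non-bundled messages are queued instead of rejected. -->
<TCP bind_port="7800"
     oob_thread_pool_min_threads="4"
     oob_thread_pool_max_threads="30"
     oob_thread_pool_queue_enabled="true"
     oob_thread_pool_queue_max_size="500"
     oob_thread_pool_rejection_policy="discard"/>
```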

                   

                  By the way, how can I affect the time window of index inconsistency in the case of async workers? It's about 1 second, OK, but which settings influence it? Is there something useful here: https://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#lucene-indexing-performance ?

                   

                  I also wanted to ask whether each indexed cache has to get its own set of index caches (index data/metadata/locking), or can there be just one index cache set per cluster? Same for the index config: how do I set default index settings? In hibernate.properties? In the default cache config?

                   

                  Why does the autoconfig add

                   

                  "hibernate.search.default.exclusive_index_use" -> "true"

                  "hibernate.search.default.reader.strategy" -> "shared"

                   

                  for both distributed and non-distributed caches? Why exclusive_index_use? There could be several threads on several nodes changing the data and the index.