11 Replies Latest reply on Jul 3, 2018 12:45 PM by Gustavo Fernandes

    Significant performance drop of distributed stream search when use persistence

    Nikolai Sahattchiev Newbie

      Hi all,

       

      we face a very big performance drop when our Infinispan cache is configured to use persistence.

       

      The test setup is as follow:

       

      2 boxes, each with 1 Xeon CPUs (with 8 cores and 2 threads per core) -> each box has 16 threads, in total we have 32 threads. Each box has 16 GB RAM memory.

      On each box we start 5 Infinispan nodes with java max heap 512 MB each -> in total we have 10 nodes spread over two boxes. The Infinispan version is 9.2.3.Final.

       

       

      The Infinispan cache is configured as below:


      <
      distributed-cache name="dist-sync" mode="SYNC" remote-timeout="300000" owners="2" segments="100">
          <locking concurrency-level="1000" acquire-timeout="60000"/>
          <transaction mode="NONE"/>

          <indexing index="LOCAL">
              <property name="default.indexmanager">elasticsearch</property>
              <property name="default.elasticsearch.host">http://10.20.0.40:9200</property>
              <property name="default.elasticsearch.max_total_connection">50</property>
              <property name="default.elasticsearch.max_total_connection_per_route">25</property>
              <property name="lucene_version">LUCENE_CURRENT</property>
          </indexing>

          <state-transfer timeout="60000"></state-transfer>
      </distributed-cache>

       

      We fill it with 1 million documents and run as performance test a distributed stream search:

       

      List<TestDocument> documentList = (List<TestDocument>)cache.values().parallelStream()
              .filter((Serializable & Predicate<? super TestDocument>) e -> e.getName().equals(nameToFind))
              .collect(CacheCollectors.serializableCollector(() -> Collectors.toList()));

      When we run the test with 10 parallel users, the throughput is about 18 requests/second and the average response time about 241 millis (min: 47, max: 1198). The test machines a fully utilized. Each search returns between 0 and 350 documents as result.

       

      When we do the same test on cache with enabled persistence with rocksdbStore (config below), the performance drops to 1.7 requests/second and average response time of 5281 millis (min: 427, max: 9292). We put exactly the same documents in the cache and execute exactly the same searches.

       

      <cache-container default-cache="dist-sync">
          <transport stack="my-tcp" cluster="mycluster"/>

          <distributed-cache name="dist-sync" mode="SYNC" remote-timeout="300000" owners="2" segments="100">
              <locking concurrency-level="1000" acquire-timeout="60000"/>
              <transaction mode="NONE"/>

              <persistence passivation="false">
                  <rocksdbStore:rocksdb-store preload="true" fetch-state="true" path="${HOME}/data/${nodename}/data/">
                      <rocksdbStore:expiration path="${HOME}/data/${nodename}/expired/"/>
                  </rocksdbStore:rocksdb-store>
              </persistence>

              <indexing index="LOCAL">
                  <property name="default.indexmanager">elasticsearch</property>
                  <property name="default.elasticsearch.host">http://10.20.0.40:9200</property>
                  <property name="default.elasticsearch.max_total_connection">50</property>

                  <property name="default.elasticsearch.max_total_connection_per_route">25</property>
                  <property name="lucene_version">LUCENE_CURRENT</property>
              </indexing>

              <state-transfer timeout="60000"></state-transfer>
          </distributed-cache>
      </cache-container>

      So, we do the same test twice - once without persistence and once with persistence. All other parameters are the same - CPUs, memory, java heap, number of Infinispan nodes, number and content of documents and search requests.
      The persistence is configured without passivation - all documents in this case must be in the memory. Eviction is disabled. 
      The performance without persistence is about 10x better. Has somebody an idea, what could be the reason for that?
        • 1. Re: Significant performance drop of distributed stream search when use persistence
          Galder Zamarreño Master

          Does the performance test include storing the documents? Or is it just limited to the distributed search?

          • 2. Re: Significant performance drop of distributed stream search when use persistence
            Nikolai Sahattchiev Newbie

            No, it is just limited to the distributed search. The result of:

             

            List<TestDocument> documentList = (List<TestDocument>)cache.values().parallelStream()....

             

            is discarded. We just measure the throughput and the response time.

            • 3. Re: Significant performance drop of distributed stream search when use persistence
              William Burns Expert

              So there are a few points to touch on here.

               

              Internally distributed streams always consults the cache loader when one is present. This is because we don't currently have any optimizations to ignore the loader. I thought we had an issue created for this, but since we didn't I created [1].

               

              The only way currently to get around this is to use the SKIP_CACHE_LOAD flag as I recommended in [2]. This would make it so it only reads in memory and ignores the loader.

               

              cache = cache.getAdvancedCache().withFlags(Flag.SKIP_CACHE_LOAD);  
              cache.entrySet().stream() ...  
              
              

               

              Also I would HIGHLY recommend upgrading to 9.3.0.Final if you are using in memory streams, as it includes [3] which should provide a major increase in performance for stream operations that return a little amount of data.

               

              And finally Infinispan 9.4, which is currently under development, will have performance improvements for cache loaders with streams, performance increase should be much larger than in memory with [4]. I am hoping to get in [1] in this version as well if possible, as well as some other various improvements for stores.

               

              [1] [ISPN-9343] Streams should skip loader when preload=true and eviction is not enabled - JBoss Issue Tracker

              [2] Cache with JpaStore constantly queries database

              [3] [ISPN-5451] Data Container Segment Striping - JBoss Issue Tracker

              [4] [ISPN-8905] Segment-aware non-shared cache stores - JBoss Issue Tracker

              1 of 1 people found this helpful
              • 4. Re: Significant performance drop of distributed stream search when use persistence
                Nikolai Sahattchiev Newbie

                Thanks William, will try first with the SKIP_CACHE_LOAD flag. And after that I can upgrade to 9.3.0.Final and test once again.

                • 5. Re: Significant performance drop of distributed stream search when use persistence
                  Nikolai Sahattchiev Newbie

                  I set the SKIP_CACHE_LOAD flag as you suggested and the performance has increased from 1.7 req/sec to 18 req/sec, i.e. there is no more difference between a cache with persistence and without persistence.

                   

                  After that I upgraded Infinispan to 9.3.0.Final and did the test again (with persistence). The performance gain was 3x! The throughput increased to 55 req/sec (from 18), the average response time was 161 millis (min: 23, max: 797). This is a really great performance improvement!!

                   

                  Thanks a lot for your help!

                   

                  Now I have one other issue - the throughput of search with query in the cache (with InfinispanIndexManager as backend) is only a bit faster than the distributed stream search.

                   

                  Search with distributed stream: 18 req/sec

                  Search with query and InfinispanIndexManager - 19 req/sec

                  Search with query and Elasticsearch - 99 req/sec

                   

                  Below the query search implementation:

                   

                  SearchManager sm = Search.getSearchManager(cache);
                  QueryBuilder builder = sm.buildQueryBuilderForClass(TestDocument.class).get();
                  Query luceneQuery = builder.keyword()
                          .onField("name")
                          .matching(nameToFind)
                          .createQuery();

                  CacheQuery query = sm.getQuery(luceneQuery);
                  List<TestDocument> results = query.list();

                  You can see the cache config with Elasticsearch in my first posting. In the tests with InfinispanIndexManager I just replaced the <indexing> config with this one:

                   

                  <indexing index="LOCAL">
                      <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
                      <property name="default.locking_cachename">LuceneIndexesLocking_custom</property>
                      <property name="default.data_cachename">LuceneIndexesData_custom</property>
                      <property name="default.metadata_cachename">LuceneIndexesMetadata_custom</property>
                      <property name="default.exclusive_index_use">true</property>
                      <property name="default.worker.execution">async</property>
                      <property name="default.index_flush_interval">500</property>
                  </indexing>

                  and added the 3 Lucene caches for the shared index:

                   

                  <replicated-cache name="LuceneIndexesLocking_custom" mode="SYNC" remote-timeout="300000">
                      <indexing index="NONE" />
                      <locking concurrency-level="1000" acquire-timeout="60000"/>
                      <transaction mode="NONE"/>

                      <persistence passivation="false">
                          <rocksdbStore:rocksdb-store preload="true" fetch-state="true" path="${HOME}/data/${nodename}-lil/data/">
                              <rocksdbStore:expiration path="${HOME}/data/${nodename}-lil/expired/"/>
                          </rocksdbStore:rocksdb-store>
                      </persistence>

                      <state-transfer timeout="120000" await-initial-transfer="true"></state-transfer>
                  </replicated-cache>

                  <replicated-cache name="LuceneIndexesMetadata_custom" mode="SYNC" remote-timeout="300000">
                      <indexing index="NONE" />
                      <locking concurrency-level="1000" acquire-timeout="60000"/>
                      <transaction mode="NONE"/>

                      <persistence passivation="false">
                          <rocksdbStore:rocksdb-store preload="true" fetch-state="true" path="${HOME}/data/${nodename}-lim/data/">
                              <rocksdbStore:expiration path="${HOME}/data/${nodename}-lim/expired/"/>
                          </rocksdbStore:rocksdb-store>
                      </persistence>

                      <state-transfer timeout="120000" await-initial-transfer="true"></state-transfer>
                  </replicated-cache>

                  <distributed-cache name="LuceneIndexesData_custom" mode="SYNC" remote-timeout="300000">
                      <indexing index="NONE" />
                      <locking concurrency-level="1000" acquire-timeout="60000"/>
                      <transaction mode="NONE"/>

                      <persistence passivation="false">
                          <rocksdbStore:rocksdb-store preload="true" fetch-state="true" path="${HOME}/data/${nodename}-lid/data/">
                              <rocksdbStore:expiration path="${HOME}/data/${nodename}-lid/expired/"/>
                          </rocksdbStore:rocksdb-store>
                      </persistence>

                      <state-transfer timeout="120000" await-initial-transfer="true"></state-transfer>
                  </distributed-cache>

                  Could that be because of the persistent Lucene caches? Probably similar problem as the distributed search without the SKIP_CACHE_LOAD flag?
                  • 7. Re: Significant performance drop of distributed stream search when use persistence
                    Nikolai Sahattchiev Newbie

                    @Indexed(index = "TestDocumentIndex")

                    public class TestDocument implements Serializable {

                     

                       @Field(index=Index.YES, analyze=Analyze.NO, store=Store.NO)

                       private String name;

                     

                    .....

                     

                    }

                    • 8. Re: Significant performance drop of distributed stream search when use persistence
                      Gustavo Fernandes Apprentice

                      The annotations look fine. In order to isolate the issue, could you try the following combinations:

                       

                      1. Removing persistence of all 3 lucene caches;

                      2. Removing persistence of all 3 lucene caches and making all 3 lucene caches REPL_SYNC

                       

                      Also, is the query throughput measured while writing to the cache(s)?

                      • 9. Re: Significant performance drop of distributed stream search when use persistence
                        Nikolai Sahattchiev Newbie

                        Ok, I will try both combinations.

                         

                        No, there are no write operations during the query throughput measuring. I start 10 nodes without content, generate 10 million documents and after that I do test the performance.

                        • 10. Re: Significant performance drop of distributed stream search when use persistence
                          Nikolai Sahattchiev Newbie

                          Tried both combinations:

                           

                          1. removing the persistence did not help

                          2. defining the 3 lucene caches as REPL_SYNC improved the throughput from 19 req/sec to about 280 req/sec!

                           

                          I did the 2nd test with and without persistence and in both cases the performance was the same.

                           

                          Why the lucene distributed cache slows down the performance so dramatically, do you know that? 

                          • 11. Re: Significant performance drop of distributed stream search when use persistence
                            Gustavo Fernandes Apprentice

                            Because it needs to fetch index segments, causing latency. By having index caches as REPL_SYNC, you are effectively making the index available locally. You can also try adding the configuration:

                            <property name="default.reader.strategy">async</property>

                            to improve performance a bit more.

                             

                            The trade-off of having index caches as REPL, you'll have a full copy of the index in each node, and depending on how big you expect your data to grow, you may hit a scalability limit.

                             

                            If scalability limit is hit, the solution is to go for a non-shared index [1], where you configure each node to maintain only an index with its own data (independent of other), and at query time it combines the result of individual nodes.

                             

                            To do so you'll need to use the following config:

                             

                            <indexing index="PRIMARY_OWNER">

                                <property name="default.indexmanager">near-real-time</property>

                                <property name="default.indexBase">/path/to/index</property>

                            </indexing>

                             

                            And query would need to be done differently, as DSL is not supported for this.

                             

                            CacheQuery query = Search.getSearchManager(cache).getQuery("FROM TestDocument where name = 'Joe'", IndexedQueryMode.BROADCAST, TestDocument.class);

                             

                            The drawback of non-shared indexes is queries are done in two phases, you can try and see if it works for you.

                             

                            Conclusion: shared and non-shared indexes have both advantages and drawbacks, it's up to you to pick the best one for your use case.

                             

                            [1] http://infinispan.org/docs/stable/user_guide/user_guide.html#query.non-shared-index

                            1 of 1 people found this helpful