Significant performance drop of distributed stream search when using persistence
nsahattchiev Jul 2, 2018 8:12 AM
Hi all,
we see a very large performance drop when our Infinispan cache is configured to use persistence.
The test setup is as follows:
2 boxes, each with 1 Xeon CPU (8 cores, 2 threads per core), so each box has 16 hardware threads and we have 32 in total. Each box has 16 GB of RAM.
On each box we start 5 Infinispan nodes with a maximum Java heap of 512 MB each, so in total we have 10 nodes spread over the two boxes. The Infinispan version is 9.2.3.Final.
The Infinispan cache is configured as below:
<distributed-cache name="dist-sync" mode="SYNC" remote-timeout="300000" owners="2" segments="100">
    <locking concurrency-level="1000" acquire-timeout="60000"/>
    <transaction mode="NONE"/>
    <indexing index="LOCAL">
        <property name="default.indexmanager">elasticsearch</property>
        <property name="default.elasticsearch.host">http://10.20.0.40:9200</property>
        <property name="default.elasticsearch.max_total_connection">50</property>
        <property name="default.elasticsearch.max_total_connection_per_route">25</property>
        <property name="lucene_version">LUCENE_CURRENT</property>
    </indexing>
    <state-transfer timeout="60000"></state-transfer>
</distributed-cache>
We fill it with 1 million documents and, as a performance test, run a distributed stream search:
List<TestDocument> documentList = (List<TestDocument>) cache.values().parallelStream()
        .filter((Serializable & Predicate<? super TestDocument>) e -> e.getName().equals(nameToFind))
        .collect(CacheCollectors.serializableCollector(() -> Collectors.toList()));
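For readers who want to see the shape of this search without a cluster, here is a minimal local sketch using only java.util.stream. It is not the Infinispan code path: the nested TestDocument class is a hypothetical stand-in (only getName() is known from the snippet above), and findByName is a helper name made up for the illustration. What it does show is that the predicate is evaluated against every value, so the work grows with the number of entries scanned.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LocalStreamSearch {
    // Hypothetical stand-in for the TestDocument type; only getName() is known from the post.
    public static final class TestDocument {
        private final String name;

        public TestDocument(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    // Same filter-and-collect shape as the distributed search, but on a plain in-memory list.
    // Every element is visited and tested against the predicate.
    public static List<TestDocument> findByName(List<TestDocument> docs, String nameToFind) {
        return docs.parallelStream()
                .filter(e -> e.getName().equals(nameToFind))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TestDocument> docs = Arrays.asList(
                new TestDocument("alpha"),
                new TestDocument("beta"),
                new TestDocument("alpha"));
        // Two of the three sample documents are named "alpha".
        System.out.println(findByName(docs, "alpha").size()); // prints 2
    }
}
```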
When we run the test with 10 parallel users, the throughput is about 18 requests/second and the average response time is about 241 ms (min: 47, max: 1198). The test machines are fully utilized. Each search returns between 0 and 350 documents.
When we run the same test on a cache with persistence enabled via rocksdbStore (config below), the throughput drops to 1.7 requests/second and the average response time rises to 5281 ms (min: 427, max: 9292). We put exactly the same documents in the cache and execute exactly the same searches.
<cache-container default-cache="dist-sync">
    <transport stack="my-tcp" cluster="mycluster"/>
    <distributed-cache name="dist-sync" mode="SYNC" remote-timeout="300000" owners="2" segments="100">
        <locking concurrency-level="1000" acquire-timeout="60000"/>
        <transaction mode="NONE"/>
        <persistence passivation="false">
            <rocksdbStore:rocksdb-store preload="true" fetch-state="true" path="${HOME}/data/${nodename}/data/">
                <rocksdbStore:expiration path="${HOME}/data/${nodename}/expired/"/>
            </rocksdbStore:rocksdb-store>
        </persistence>
        <indexing index="LOCAL">
            <property name="default.indexmanager">elasticsearch</property>
            <property name="default.elasticsearch.host">http://10.20.0.40:9200</property>
            <property name="default.elasticsearch.max_total_connection">50</property>
            <property name="default.elasticsearch.max_total_connection_per_route">25</property>
            <property name="lucene_version">LUCENE_CURRENT</property>
        </indexing>
        <state-transfer timeout="60000"></state-transfer>
    </distributed-cache>
</cache-container>
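To put the two measurements side by side, the small sketch below computes the slowdown factors from the figures quoted above (18 vs 1.7 requests/second, 241 vs 5281 ms average). The class and method names are made up for the illustration; only the four input numbers come from the test runs.

```java
import java.util.Locale;

public class SlowdownFactors {
    // Ratio of the first value to the second.
    public static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        // Throughput: in-memory run divided by the run with the RocksDB store (requests/second).
        double throughputDrop = ratio(18.0, 1.7);
        // Average response time: run with the RocksDB store divided by in-memory run (ms).
        double latencyIncrease = ratio(5281.0, 241.0);

        System.out.println(String.format(Locale.ROOT, "throughput drop: %.1fx", throughputDrop));     // 10.6x
        System.out.println(String.format(Locale.ROOT, "latency increase: %.1fx", latencyIncrease));   // 21.9x
    }
}
```

So with the store enabled, throughput is roughly a tenth of the in-memory figure and the average response time is roughly 22 times higher.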