
    This is an early design document for redesigning Infinispan's FileCacheStore.

     

    Some general ideas:

     

    • B+Tree-based: good for fast lookup (reading), but slower for writing.
    • Append-only store
      • Fast writing, slow to read.
      • Useful if the data set is held in memory and write-through is purely for resilience (not expanded capacity).
      • Would require a separate thread/process to handle compacting (back into a B+Tree); a rough sketch of this approach follows below.
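
    To make the append-only idea concrete, below is a minimal sketch of such a store with an in-memory offset index. This is illustrative code only, not Infinispan's; all class and method names are hypothetical:

        import java.io.IOException;
        import java.io.RandomAccessFile;
        import java.nio.charset.StandardCharsets;
        import java.util.HashMap;
        import java.util.Map;

        // Hypothetical append-only store: every write goes to the tail of a
        // single log file; an in-memory map remembers the latest offset per key.
        public class AppendOnlyStore {

            private final RandomAccessFile log;
            // Key -> offset of the latest record; older records for the same
            // key become garbage that a background compactor would reclaim.
            private final Map<String, Long> index = new HashMap<String, Long>();

            public AppendOnlyStore(String path) throws IOException {
                log = new RandomAccessFile(path, "rw");
            }

            public synchronized void store(String key, byte[] value) throws IOException {
                log.seek(log.length()); // always append at the tail -> sequential I/O
                long offset = log.getFilePointer();
                byte[] k = key.getBytes(StandardCharsets.UTF_8);
                log.writeInt(k.length);
                log.write(k);
                log.writeInt(value.length);
                log.write(value);
                index.put(key, offset);
            }

            public synchronized byte[] load(String key) throws IOException {
                Long offset = index.get(key);
                if (offset == null) return null;
                log.seek(offset); // one random read per lookup -> slower reads
                byte[] k = new byte[log.readInt()];
                log.readFully(k);
                byte[] v = new byte[log.readInt()];
                log.readFully(v);
                return v;
            }

            // A separate compactor thread would periodically copy the live
            // records (those still referenced by the index) into a new file
            // or a B+Tree, then delete the old log.
        }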

     

    Some good background reading:

     

    https://www.kernel.org/pub/linux/kernel/people/suparna/aio/262/results/aio-stress-results.txt

    http://www.acunu.com/2/post/2011/03/why-is-acunu-in-kernel.html

    http://www.datastax.com/dev/blog/what-persistence-and-why-does-it-matter

    http://www.datastax.com/dev/blog/cassandra-file-system-design

    http://wiki.apache.org/cassandra/Durability

    http://wiki.apache.org/cassandra/ArchitectureCommitLog

    http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives

    http://antirez.com/post/redis-persistence-demystified.html

    http://hornetq.blogspot.co.uk/2009/08/persistence-on-hornetq.html

    http://hornetq.sourceforge.net/docs/hornetq-2.0.0.BETA5/user-manual/en/html/persistence.html

    http://hornetq.sourceforge.net/docs/hornetq-2.0.0.GA/user-manual/en/html/libaio.html

    https://code.google.com/p/leveldb/

     

    Related JIRAs:

     

    https://issues.jboss.org/browse/ISPN-1808

    https://issues.jboss.org/browse/ISPN-1362

    https://issues.jboss.org/browse/ISPN-1303

    https://issues.jboss.org/browse/ISPN-1302

    https://issues.jboss.org/browse/ISPN-1301

    https://issues.jboss.org/browse/ISPN-517

     

    Test plan:

    • Operations to test: load, store, remove, preload
    • These operations should be tested in two major scenarios (example configurations are sketched after this list):
      • Test operations on a local cache with no eviction, plugged with the file cache store (no async store), in such a way that the cache and the cache store hold exactly the same data, e.g. 1 GB of data stored. This test aims to see how fast we can update a cache store. Reads would be very fast because they'd be served by the in-memory cache.
      • Test operations on a small in-memory local cache with aggressive eviction settings, plugged with a file-based cache store (no async store) that's used as overflow, e.g. keep 1 GB in memory and store 20 GB in the file store. Here we're trying to get a better idea of how good the cache store is at reading data: most of the data will be present in the cache store and not in the cache, so reads require fetching entries from the cache store and pulling them in-memory.
    • Before writing any cache stores, we should evaluate the performance of the cache stores available right now: the existing FileCacheStore, Karsten's file cache store and the LevelDB-based stores (Java and JNI implementations).
    • Preferably, tests should be run on modern SSD drives.
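
    For illustration, the two scenarios could be configured roughly along these lines. This is only a sketch assuming the Infinispan 5.x fluent ConfigurationBuilder API; exact method names (addFileCacheStore(), location()) may differ between versions, and the store path is an example:

        import org.infinispan.configuration.cache.Configuration;
        import org.infinispan.configuration.cache.ConfigurationBuilder;
        import org.infinispan.eviction.EvictionStrategy;

        public class ScenarioConfigs {

            // Scenario 1: no eviction - cache and cache store hold the same
            // data, the store exists purely for resilience.
            Configuration persistent = new ConfigurationBuilder()
                  .loaders().addFileCacheStore().location("/data/fcs") // example path
                  .build();

            // Scenario 2: aggressive eviction - only a small hot set stays in
            // memory, the cache store acts as overflow and serves most reads.
            Configuration overflow = new ConfigurationBuilder()
                  .eviction().strategy(EvictionStrategy.LRU).maxEntries(10000)
                  .loaders().addFileCacheStore().location("/data/fcs")
                  .build();
        }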

     

    Objectives:

    • For each of the major scenarios, target performance objectives need to be set. TBD.

     

    Current results:

    All setups used a local cache, and the benchmark was executed via Radargun (actually a version not yet merged into master [2]). I've used 4 nodes just to get more data - each slave was absolutely independent of the others.

     

    The first test measured preloading performance - the cache started and loaded 1 GB of data from the hard drive. Without a cache store the startup takes about 2-4 seconds; average numbers for the cache stores are below:

     

     

    Cache store           | Startup time
    FileCacheStore        | 9.8 s
    KarstenFileCacheStore | 14 s
    LevelDB-JAVA impl.    | 12.3 s
    LevelDB-JNI impl.     | 12.9 s

     

     

    IMO nothing special; all times seem affordable. This test doesn't benchmark the initial storing of the data into the cache store exactly, but for the record: FileCacheStore took about 44 minutes, the Karsten implementation about 38 seconds, LevelDB-JAVA 4 minutes and LevelDB-JNI 96 seconds. The units are right - minutes compared to seconds. But we all know that FileCacheStore is bloody slow.

     

    The second test is a stress test (5 minutes, preceded by a 2-minute warmup) where each of 10 threads works on 10k entries with 1 kB values (~100 MB in total); 20 % writes, 80 % reads, as usual. No eviction is configured, so the cache store serves only as persistent storage in case of a crash.
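
    Per stressor thread, the workload boils down to a loop like the following (a simplified sketch of the benchmark's behaviour, not actual Radargun code):

        import java.util.Random;
        import org.infinispan.Cache;

        // One of the 10 stressor threads, each working over 10k keys
        // with 1 kB values, 20 % puts / 80 % gets.
        public class Stressor implements Runnable {

            private final Cache<String, byte[]> cache;
            private final byte[] value = new byte[1024]; // 1 kB value
            private final Random rnd = new Random();
            private volatile boolean running = true;

            public Stressor(Cache<String, byte[]> cache) {
                this.cache = cache;
            }

            public void run() {
                while (running) {
                    String key = "key-" + rnd.nextInt(10000);
                    if (rnd.nextInt(100) < 20) {
                        cache.put(key, value);  // 20 % writes
                    } else {
                        cache.get(key);         // 80 % reads
                    }
                }
            }

            public void stop() {
                running = false;
            }
        }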

     

    Cache store           | reads/s | writes/s | note
    FileCacheStore        | 3.1M    | 112      | on one node the performance was only 2.96M reads/s, 75 writes/s
    KarstenFileCacheStore | 9.2M    | 226k     |
    LevelDB-JAVA impl.    | 3.9M    | 5100     |
    LevelDB-JNI impl.     | 6.6M    | 14k      | on one node the performance was 3.9M/8.3k - about half of the others
    Without cache store   | 15.5M   | 4.4M     |

     

    The Karsten implementation pretty much rules here, for two reasons. First, it does not flush the data (it only calls RandomAccessFile.write()). The other cheat is that it keeps in memory the keys and the offsets of the data values within the database file. That makes it definitely the best choice for this scenario, but it does not let the cache store scale, especially in cases where the keys are big and the values small. However, this performance boost is definitely worth pursuing - I could imagine caching the disk offsets in memory and querying a persistent index only when a record is missing from that cache, with parts of the persistent index flushed asynchronously (the index can always be rebuilt during preloading in case of a crash).
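
    The hybrid lookup I have in mind would look roughly like this; persistentIndexLookup() and readValueAt() are hypothetical stand-ins for the on-disk index search and the data-file read:

        import java.io.IOException;
        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // Sketch only: cached offsets in memory (the "Karsten trick"),
        // persistent index consulted only on a miss.
        public class HybridIndexStore {

            private final Map<Object, Long> offsetCache =
                  new ConcurrentHashMap<Object, Long>();

            public byte[] load(Object key) throws IOException {
                Long offset = offsetCache.get(key);
                if (offset == null) {
                    // Miss in the in-memory cache: query the persistent index,
                    // which is flushed asynchronously and can be rebuilt
                    // during preload after a crash.
                    offset = persistentIndexLookup(key);
                    if (offset == null) return null; // no such record
                    offsetCache.put(key, offset);
                }
                return readValueAt(offset);
            }

            private Long persistentIndexLookup(Object key) throws IOException {
                return null; // stand-in: search the on-disk index structure
            }

            private byte[] readValueAt(long offset) throws IOException {
                return null; // stand-in: seek to the offset and read the value
            }
        }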

     

    The third test targets the scenario where more data is stored than fits in memory - the stressors operated on 100k entries (~100 MB of data) but eviction was set to 10k entries (9216 entries ended up in memory after the test ended).

     

    Cache store           | reads/s | writes/s   | note
    FileCacheStore        | 750     | 285        | one node had only 524 reads and 213 writes per second
    KarstenFileCacheStore | 458k    | 137k       |
    LevelDB-JAVA impl.    | 21k     | 9k         | these values are for the mmap implementation (typo in test)
    LevelDB-JNI impl.     | 13k-46k | 6.6k-15.2k | the performance varied a lot!

     

    We have also tested the second and third scenarios with an increased amount of data - each thread operated on 200k entries, giving about 2 GB of data in total. The test execution was also prolonged to a 5-minute warmup and a 10-minute test. FileCacheStore was excluded from this comparison.

    Update: I have also added FileChannel.force(false) calls to the Karsten implementation; those results are included below.
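
    For reference, FileChannel.force(boolean metaData) is the standard JDK call used here: with false it forces the file's content to the storage device but does not force metadata updates. A minimal example:

        import java.io.IOException;
        import java.io.RandomAccessFile;
        import java.nio.channels.FileChannel;

        public class ForceDemo {
            public static void main(String[] args) throws IOException {
                RandomAccessFile raf = new RandomAccessFile("store.dat", "rw");
                try {
                    FileChannel channel = raf.getChannel();
                    raf.write(new byte[1024]); // fast: may sit in the OS page cache
                    channel.force(false);      // flushes file content to disk,
                                               // but skips metadata (timestamps etc.)
                } finally {
                    raf.close();
                }
            }
        }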

     

    Persistent storage scenario:

     

     

    Cache store                                | reads/s   | writes/s  | note
    KarstenFileCacheStore                      | 3.8M-5.3M | 3600-7700 |
    KarstenFileCacheStore - force(false)       | 3.2M      | 1650      |
    LevelDB-JAVA                               | 3.8M      | 2200      |
    LevelDB-JAVA - force(false)                | 3.2M      | 400       |
    LevelDB-JAVA - force(false), SNAPPY (iq80) | 3.2M      | 390       |
    LevelDB-JNI                                | 5.3M      | 4650      |
    LevelDB-JNI - sync writes                  | 3.0M      | 1240      |
    LevelDB-JNI - sync writes, SNAPPY          | 3.2M      | 1240      |
    Without cache store                        | 6.2M      | 1.9M      |

     

    Overflow scenario:

     

    Cache store                                | reads/s     | writes/s    | note
    KarstenFileCacheStore                      | 265k        | 16k         | one node had 21k writes/s
    KarstenFileCacheStore - force(false)       | 285k        | 1200        |
    LevelDB-JAVA                               | 500 or 5900 | 400 or 4000 | one node 10x faster, with a different memory and CPU usage pattern; these values are for the mmap implementation (typo in test)
    LevelDB-JAVA - force(false)                | 950         | 520         |
    LevelDB-JAVA - force(false), SNAPPY (iq80) | 950         | 515         |
    LevelDB-JNI                                | 9200-14.4k  | 5400-6500   |
    LevelDB-JNI - sync writes                  | 15.5k       | 900         | some variance between nodes
    LevelDB-JNI - sync writes, SNAPPY          | 14k-19k     | 750-1100    | one node slower at writes

     

    Obviously, the performance dropped radically compared to the 100 MB case.

     

    Another test tried to determine the impact of value size. We used the persistent configuration, with each thread operating on 100k entries with 1 kB values, 25k entries with 4 kB values, or 6125 entries with 16 kB values (keeping the total amount of data roughly constant).

     

    Cache store           | 1 kB values                | 4 kB values                | 16 kB values
    KarstenFileCacheStore | 13k writes/s, one node 22k | 13k writes/s, one node 24k | 12.5k writes/s, one node 19k
    LevelDB-JNI           | 6k writes/s                | 1400 writes/s              | 400 writes/s

     

    The next test used 1 kB, 4 kB or 16 kB keys and empty values:

     

    Cache store           | 1 kB keys    | 4 kB keys    | 16 kB keys
    KarstenFileCacheStore | 13k writes/s | 12k writes/s | 7k writes/s
    LevelDB-JNI           | 8k writes/s  | 490 writes/s | 130 writes/s