This is an early design doc to redesign Infinispan's FileCacheStore.
Some general ideas:
- B+Tree-based: good for fast lookups (reads), but slower for writes.
- Append-only store:
  - Fast to write, slow to read.
  - Useful if the data set is held in memory and write-through is purely for resilience (not expanded capacity).
  - Would require a separate thread/process to handle compaction (back into a B+Tree).
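To make the append-only trade-off concrete, here is a minimal, hypothetical sketch (plain Java, not Infinispan code): every `put()` is a single sequential append of a `[keyLen][key][valLen][val]` record, while a `get()` has to scan the whole log and keep the last record seen for the key. This is exactly why a background compactor is needed: it would rewrite the file keeping only the live records.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal append-only store: put() appends a record, get() scans the log.
// Writes are fast sequential appends; reads are O(file size) until compaction.
public class AppendLog {
    private final Path file;

    public AppendLog(Path file) { this.file = file; }

    public void put(String key, String value) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new FileOutputStream(file.toFile(), /* append = */ true))) {
            byte[] k = key.getBytes(StandardCharsets.UTF_8);
            byte[] v = value.getBytes(StandardCharsets.UTF_8);
            out.writeInt(k.length); out.write(k);
            out.writeInt(v.length); out.write(v);
        }
    }

    public String get(String key) throws IOException {
        String found = null; // the last record for a key wins
        try (DataInputStream in = new DataInputStream(new FileInputStream(file.toFile()))) {
            while (true) {
                int klen;
                try { klen = in.readInt(); } catch (EOFException eof) { break; }
                byte[] k = new byte[klen]; in.readFully(k);
                int vlen = in.readInt();
                byte[] v = new byte[vlen]; in.readFully(v);
                if (new String(k, StandardCharsets.UTF_8).equals(key))
                    found = new String(v, StandardCharsets.UTF_8);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("appendlog", ".dat");
        AppendLog log = new AppendLog(tmp);
        log.put("a", "1");
        log.put("a", "2"); // supersedes the first record; compaction would drop it
        System.out.println(log.get("a")); // prints 2
        Files.delete(tmp);
    }
}
```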
Some good background reading:
- Operations to test: load, store, remove, preload
- These operations should be tested in two major scenarios:
  - A local cache with no eviction plugged with the file cache store (no async store), in such a way that the cache and the cache store hold exactly the same data, e.g. 1 GB of data stored. This test aims to see how fast we can update the cache store; reads would be very fast because they'd be served by the in-memory cache.
  - A small in-memory local cache with aggressive eviction settings plugged with a file-based cache store (no async store) used as overflow, e.g. keep 1 GB in memory and 20 GB in the file store. Here we're trying to get a better idea of how good the cache store is at reading data: most of the data will be present in the cache store and not in the cache, so reads require hitting the cache store and pulling that data into memory.
- Before writing any cache stores, we should evaluate the performance of the cache stores available right now, which are:
- Current FileCacheStore (master branch: https://github.com/infinispan/infinispan)
- Karsten's FileCacheStore (keeps keys and file positions in memory, but worth comparing, branch: https://github.com/galderz/infinispan/tree/t_2806_karsten)
- LevelDB cache store with the JNI-based native Fusesource library (master branch: https://github.com/infinispan/infinispan)
- LevelDB cache store with the pure Java library (master branch: https://github.com/infinispan/infinispan)
- Preferably, tests should be run on modern SSD drives.
- For each of the major scenarios, target performance objectives need to be set. TBD.
All setups used a local cache; the benchmark was executed via Radargun (a version not merged into master yet). I've used 4 nodes just to get more data - each slave was absolutely independent of the others.
First test was preloading performance - the cache started and tried to load 1 GB of data from the hard drive. Without a cache store the startup takes about 2 - 4 seconds; average numbers for the cache stores are below:
||Cache store||Preload time||
|LevelDB-JAVA impl.||12.3 s|
|LevelDB-JNI impl.||12.9 s|
IMO nothing special, all times seem affordable. We don't benchmark storing the data into the cache store exactly, but here FileCacheStore took about 44 minutes, while Karsten's took about 38 seconds, LevelDB-JAVA 4 minutes and LevelDB-JNI 96 seconds. The units are right: minutes compared to seconds. But we all know that FileCacheStore is bloody slow.
Second test is a stress test (5 minutes, preceded by a 2 minute warmup) where each of 10 threads works on 10k entries with 1 kB values (~100 MB in total); 20 % writes, 80 % reads, as usual. No eviction is configured, therefore the cache store works as persistent storage only for the case of a crash.
||Cache store||Reads/s||Writes/s||Note||
|FileCacheStore||3.1M||112||on one node the performance was only 2.96M reads/s and 75 writes/s|
|LevelDB-JNI impl.||6.6M||14k||on one node the performance was 3.9M/8.3k - about half of the others|
|Without cache store||15.5M||4.4M|
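The per-thread workload driving these numbers can be sketched roughly as follows (an assumption about the stressor shape, not the actual Radargun code; a plain `ConcurrentHashMap` stands in for the cache under test):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the 20/80 write/read mix: each of 10 threads picks a random key
// out of its own 10k entries and issues a write 20 % of the time, a read
// otherwise. The real test is time-bounded; here we run a fixed op count.
public class StressSketch {
    static final int THREADS = 10, ENTRIES = 10_000, VALUE_SIZE = 1024, OPS = 100_000;

    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
        AtomicLong reads = new AtomicLong(), writes = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            final int thread = t;
            pool.submit(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int op = 0; op < OPS; op++) {
                    String key = "key-" + thread + "-" + rnd.nextInt(ENTRIES);
                    if (rnd.nextInt(100) < 20) {      // 20 % writes
                        cache.put(key, new byte[VALUE_SIZE]);
                        writes.incrementAndGet();
                    } else {                          // 80 % reads
                        cache.get(key);
                        reads.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(reads.get() + writes.get()); // total ops: prints 1000000
    }
}
```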
The Karsten implementation pretty much rules here, for two reasons. First of all, it does not flush the data (it calls only RandomAccessFile.write()). The other cheat is that it stores in memory the keys and the offsets of the data values in the database file. It's definitely the best choice for this scenario, but it does not let the cache store scale, especially in cases where the keys are big and the values small. However, this performance boost is definitely worth chasing - I could imagine caching the disk offsets in memory and querying a persistent index only in case of a missing record, with parts of the persistent index flushed asynchronously (the index can always be rebuilt during preloading in case of a crash).
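A sketch of that layout (an assumption about the approach, not Karsten's actual code): values are appended to a single data file and an in-memory map remembers where each key's latest value lives, so a read is one seek plus one read instead of a scan. The index is lost on a crash but can be rebuilt by replaying the file, which is exactly what preloading would do.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Append-only data file plus an in-memory key -> {offset, length} index.
// Note the cost: index memory grows with the number (and size) of keys,
// which is why this hurts when keys are big and values small.
public class IndexedLogStore implements AutoCloseable {
    private final RandomAccessFile data;
    private final Map<String, long[]> index = new ConcurrentHashMap<>();

    public IndexedLogStore(Path file) throws IOException {
        this.data = new RandomAccessFile(file.toFile(), "rw");
    }

    public synchronized void put(String key, byte[] value) throws IOException {
        long offset = data.length();
        data.seek(offset);
        data.write(value); // RandomAccessFile.write(): no fsync, one reason it is so fast
        index.put(key, new long[] { offset, value.length });
    }

    public synchronized byte[] get(String key) throws IOException {
        long[] loc = index.get(key);
        if (loc == null) return null; // a persistent index would be queried here
        byte[] out = new byte[(int) loc[1]];
        data.seek(loc[0]);
        data.readFully(out);
        return out;
    }

    @Override public void close() throws IOException { data.close(); }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("indexedlog", ".dat");
        try (IndexedLogStore store = new IndexedLogStore(tmp)) {
            store.put("k", "v1".getBytes(StandardCharsets.UTF_8));
            store.put("k", "v2".getBytes(StandardCharsets.UTF_8)); // old record is now garbage
            System.out.println(new String(store.get("k"), StandardCharsets.UTF_8)); // prints v2
        } finally {
            Files.delete(tmp);
        }
    }
}
```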
The third test targeted the scenario with more data to store than fits in memory - the stressors operated on 100k entries (~100 MB of data) but eviction was set to 10k entries (9216 entries ended up in memory after the test had ended).
||Cache store||Reads/s||Writes/s||Note||
|FileCacheStore||750||285||one node had only 524 reads and 213 writes per second|
|LevelDB-JAVA impl.||21k||9k||these values are for the mmap implementation (typo in test)|
|LevelDB-JNI impl.||13k-46k||6.6k-15.2k||the performance varied a lot!|
We have also tested the second and third scenarios with an increased amount of data - each thread operated on 200k entries, giving about 2 GB of data in total. The test execution was also prolonged to a 5 minute warmup and a 10 minute test. FileCacheStore was excluded from this comparison.
Update: I have also added the FileChannel.force(false) calls to the Karsten implementation and the results are provided.
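For clarity on what the force(false) variants measure: after a write, the channel is asked to flush file *content* to the storage device, but not file metadata (that is what the `false` argument means); `force(true)` would also sync metadata and is slower still. A minimal illustration:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// FileChannel.force(boolean metaData): flushes buffered content to disk;
// with `false`, metadata updates (e.g. timestamps) need not be written.
public class ForceDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("force", ".dat");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("entry".getBytes()));
            ch.force(false); // data reaches the device; metadata may lag behind
        }
        System.out.println(Files.size(tmp)); // prints 5
        Files.delete(tmp);
    }
}
```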
Persistent storage scenario:
||Cache store||Reads/s||Writes/s||
|KarstenFileCacheStore - force(false)|| | |
|LevelDB-JAVA - force(false)||3.2M||400|
|LevelDB-JAVA - force(false), SNAPPY (iq80)||3.2M||390|
|LevelDB-JNI - sync writes||3.0M||1240|
|LevelDB-JNI - sync writes, SNAPPY||3.2M||1240|
|Without cache store||6.2M||1.9M|
Eviction (overflow) scenario:
||Cache store||Reads/s||Writes/s||Note||
|KarstenFileCacheStore||265k||16k||one node had 21k writes/s|
|KarstenFileCacheStore - force(false)||285k||1200|
|LevelDB-JAVA||500 or 5900||400 or 4000||one node 10x faster! It shows a different memory and CPU usage pattern. These values are for the mmap implementation (typo in test).|
|LevelDB-JAVA - force(false)||950||520|
|LevelDB-JAVA - force(false), SNAPPY (iq80)||950||515|
|LevelDB-JNI - sync writes||15.5k||900||some variance between nodes|
|LevelDB-JNI - sync writes, SNAPPY||14k-19k||750-1100||one node slower at writes|
Obviously the performance dropped radically compared to the 100 MB case.
Another test tried to find out the impact of value size. We used the persistent configuration, with each thread operating on 100k entries with 1 kB values, 25k entries with 4 kB values, or 6125 entries with 16 kB values.
|Cache store||1kB values||4kB values||16kB values|
|KarstenFileCacheStore||13k writes/s, one node 22k||13k writes/s one node 24k||12.5k writes/s, one node 19k|
|LevelDB-JNI||6k writes/s||1400 writes/s||400 writes/s|
Next test used 1kB, 4kB or 16kB keys and empty values:
|Cache store||1kB keys||4kB keys||16kB keys|
|KarstenFileCacheStore||13k writes/s||12k writes/s||7k writes/s|
|LevelDB-JNI||8k writes/s||490 writes/s||130 writes/s|