
    How (not to) store a Lucene Infinispan Directory on disk

    jhberthemet

      Hi Community,

       

      This is a continuation of a discussion started (in the wrong place) on the Hibernate forums:

      Hibernate Community • View topic - Using LuceneStoreConfigurationBuilder from Infinispan 6.0

       

      Basically I'd like to be able to persist a replicated Lucene directory to disk, both to reduce the memory footprint and to survive a total failure of the cluster without having to rebuild all Lucene files.

       

      Sanne Grinovero suggested that I try to implement my own FileStore, and that's what I did. Unfortunately I don't think this is possible while keeping performance on par with pure Lucene, or even at a usable level. I implemented an AdvancedLoadWriteStore, blending the SingleFileStore and Sanne's LuceneCacheLoader.

       

      While debugging I found out that when Lucene does a seek() operation on a file, the final operation translates into a CacheStore load() of the corresponding file using a ChunkCacheKey. The problem is that this fully reads the file into memory just to skip those bytes. Lucene makes very heavy use of seek operations, so performance will be horrible once the index gets big. That also explains why the JDBCCacheStore is (supposedly) very slow.
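
      Roughly, the chunk addressing behaves like this (a simplified sketch of what I saw while debugging, not the actual Infinispan code; the numbers are just an example):

         // A seek(position) resolves to exactly one chunk, and the store
         // load() for that chunk key deserializes the entire byte[] just
         // to honour the seek.
         static int chunkIdFor(long position, int bufferSize) {
            return (int) (position / bufferSize);
         }

         static int offsetInChunk(long position, int bufferSize) {
            return (int) (position % bufferSize);
         }

         // e.g. with the default 16KB buffers, seek(5_000_000) maps to chunk
         // 305, so 16384 bytes are read and deserialized to reach offset 2880.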

       

      I don't think it is worth posting my code: it is not functional, and it would be a hassle for me to get approval from my company.

        • 1. Re: How (not to) store a Lucene Infinispan Directory on disk
          sannegrinovero

          Hi,

          glad to finally see someone who understands the essential concepts of Lucene. Indeed it does a lot of seek operations, but the intention of the Lucene Infinispan Directory is to cache the index in memory: when using a CacheLoader, the first access will be slow as it needs to actually load the related buffers from the store, but after that you should not see any other load on that same segment.

           

          The write scenario is different: if you configure Infinispan with a write-through configuration, you will have a performance problem, as each write hits both the in-memory cache and the (slow) cachestore. The idea there is that you either don't write to a cachestore at all, or you use a write-behind configuration. Infinispan has some pretty smart features when dealing with write-behind: if an entry gets modified while it's still in the queue to be written to the underlying storage, only the latest version will be written. If you enable passivation together with an eviction policy, it will only write to storage those entries which are not frequently used in memory.
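
          For illustration, a write-behind store can be enabled roughly like this (a minimal sketch against the 6.x programmatic API; the location and queue size are made-up values):

             import org.infinispan.configuration.cache.Configuration;
             import org.infinispan.configuration.cache.ConfigurationBuilder;

             ConfigurationBuilder builder = new ConfigurationBuilder();
             builder.persistence()
                    .addSingleFileStore()
                    .location("/var/data/infinispan")   // made-up path
                    .async()
                       .enable()                        // write-behind: flushed in the background
                       .modificationQueueSize(1024);    // queued rewrites coalesce to the latest version
             Configuration configuration = builder.build();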

           

          Some of these concepts are the reason why the Directory takes several different caches: you can configure each cache differently and take advantage of the different features. For example, the metadata cache contains only very small entries, but each of them is read very frequently by all nodes, so it's better to keep it in a replicated cache rather than a distributed one.
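
          Wiring the caches together looks roughly like this (a sketch against the 6.x DirectoryBuilder API; the cache names are mine):

             import org.apache.lucene.store.Directory;
             import org.infinispan.Cache;
             import org.infinispan.lucene.directory.DirectoryBuilder;
             import org.infinispan.manager.EmbeddedCacheManager;

             static Directory buildDirectory(EmbeddedCacheManager manager) {
                Cache<?, ?> metadata = manager.getCache("lucene-metadata"); // tiny but hot: replicated
                Cache<?, ?> chunks   = manager.getCache("lucene-chunks");   // bulky data: distributed
                Cache<?, ?> locks    = manager.getCache("lucene-locks");    // index locks: replicated
                return DirectoryBuilder
                      .newDirectoryInstance(metadata, chunks, locks, "myIndex")
                      .chunkSize(16 * 1024)
                      .create();
             }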

           

          Finally a note on CacheStores: the FileStore used to be very slow in version 5.x, and the JDBCCacheStore is extremely slow. The Lucene CacheLoader I wrote (which only does loading) is mainly meant for importing an existing index: in my case I don't use any CacheStore at all, and to avoid data loss I rely on having the index replicated on multiple nodes. Worst case, I'll rebuild the index from scratch, as that's possible in my use case.

          The Lucene CacheLoader uses an underlying Lucene FSDirectory, which is probably the fastest you can get, using a combination of memory-mapped files and NIO. Ideally, if you need "write back" capabilities, I'd route the writing logic to this same FSDirectory rather than looking at JDBC or the other filestores. The Cassandra CacheStore is probably a good option too; I know some other people used it in combination with the Lucene Directory, but that was a long time ago.
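
          For reference, the loader can be attached roughly like this (a sketch; the builder class and option names are from my memory of the 6.0 API, so double-check them against your version):

             import org.infinispan.configuration.cache.ConfigurationBuilder;
             import org.infinispan.lucene.cachestore.configuration.LuceneLoaderConfigurationBuilder;

             ConfigurationBuilder builder = new ConfigurationBuilder();
             builder.persistence()
                    .addStore(LuceneLoaderConfigurationBuilder.class)
                    .location("/var/data/lucene-index")   // the existing FSDirectory to import
                    .preload(true);                       // warm the cache eagerly at startup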

           

          Let's explore some more options.

          Are you sure you need a CacheStore?

          Is the index too large to fit completely in memory?

          Is your use case write-mostly or read-mostly?

           

          My typical usage is read-mostly, (almost) fitting completely in memory, and I don't need storage. In this scenario, performance is much better than with the FSDirectory, and it also scales much better than the RAMDirectory, but your mileage may vary of course. I would be very interested to know more about your use case.

          • 2. Re: How (not to) store a Lucene Infinispan Directory on disk
            jhberthemet

            Our application is write-mostly, especially regarding Lucene access: I'd say that even under the heaviest read load it would be 80% write / 20% read. It is a kind of CRM connecting to an SQL DB; the Lucene index is only for full-text search, and most read accesses are done at the SQL layer.

             

            I initially wrote a replicated version of our application that would use Infinispan as a replicated cache for Lucene write operations: all members of the cluster would push Lucene documents to the cache, all nodes would pull new docs from the cache and apply the write operation, and the last member would remove the document from the cache. This solution works well but it is not yet productised, so I was checking whether there was already an "all in one" solution with the Infinispan Directory.
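
            To give an idea, the consumer side of that pattern looks roughly like this (a simplified sketch, not our real code; the class name and the field-map payload are illustrative):

               import java.io.IOException;
               import java.util.Map;

               import org.apache.lucene.document.Document;
               import org.apache.lucene.document.Field;
               import org.apache.lucene.document.TextField;
               import org.apache.lucene.index.IndexWriter;
               import org.apache.lucene.index.Term;
               import org.infinispan.notifications.Listener;
               import org.infinispan.notifications.cachelistener.annotation.CacheEntryCreated;
               import org.infinispan.notifications.cachelistener.event.CacheEntryCreatedEvent;

               // Every node registers this listener on the replicated cache of
               // pending writes and applies each entry to its local index copy.
               @Listener
               public class PendingWriteListener {

                  private final IndexWriter writer;

                  public PendingWriteListener(IndexWriter writer) {
                     this.writer = writer;
                  }

                  @CacheEntryCreated
                  public void onPendingWrite(CacheEntryCreatedEvent<String, Map<String, String>> event) {
                     if (event.isPre()) {
                        return;                       // react only once the entry is committed
                     }
                     // Rebuild the Lucene Document from a serializable field map,
                     // since raw Lucene Documents don't replicate well:
                     Map<String, String> fields = event.getCache().get(event.getKey());
                     Document doc = new Document();
                     for (Map.Entry<String, String> f : fields.entrySet()) {
                        doc.add(new TextField(f.getKey(), f.getValue(), Field.Store.YES));
                     }
                     try {
                        writer.updateDocument(new Term("id", event.getKey()), doc);
                     } catch (IOException e) {
                        throw new IllegalStateException("Local index update failed", e);
                     }
                     // Deciding which member is the last one (and may remove the
                     // entry) is the tricky part, as discussed below.
                  }
               }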

             

            Currently our application fits well in 512MB of -Xmx; it doesn't reach 300MB most of the time. The Lucene index can reach 20GB, maybe up to 100GB for a few customers, so a 100GB memory requirement where 512MB was enough previously is difficult to justify. We currently use Lucene NRT and save data as soon as the DB transaction is committed; the application must be shut down for upgrades, so we need persistent storage. Rebuilding an index can take a few days.
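
            For context, our NRT flow is essentially plain Lucene, along these lines (simplified; "id" as the key field is illustrative):

               import java.io.IOException;

               import org.apache.lucene.document.Document;
               import org.apache.lucene.index.IndexWriter;
               import org.apache.lucene.index.Term;
               import org.apache.lucene.search.SearcherManager;

               // Called when the SQL transaction commits: make the matching index
               // change durable and visible to near-real-time searchers.
               void onDatabaseCommit(IndexWriter writer, SearcherManager searchers,
                                     String id, Document doc) throws IOException {
                  writer.updateDocument(new Term("id", id), doc); // replace any previous version
                  writer.commit();                                // durable alongside the SQL commit
                  searchers.maybeRefresh();                       // expose the change to new searchers
               }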

             

            I have the feeling that even if I managed to get a working CacheStore it would not be very efficient. I'm surprised the Cassandra cache store works well; maybe I should take a look at it, perhaps there is a nice trick for seeks. The CacheStore I implemented is based on your CacheLoader, and I use the same FSDirectory for write and delete operations.

             

            Thank you!

            • 3. Re: How (not to) store a Lucene Infinispan Directory on disk
              sannegrinovero

              Jacques-Henri Berthemet wrote:

               

              Our application is write-mostly, especially regarding Lucene access: I'd say that even under the heaviest read load it would be 80% write / 20% read. It is a kind of CRM connecting to an SQL DB; the Lucene index is only for full-text search, and most read accesses are done at the SQL layer.

               

              I initially wrote a replicated version of our application that would use Infinispan as a replicated cache for Lucene write operations: all members of the cluster would push Lucene documents to the cache, all nodes would pull new docs from the cache and apply the write operation, and the last member would remove the document from the cache. This solution works well but it is not yet productised, so I was checking whether there was already an "all in one" solution with the Infinispan Directory.

              That's similar to the approach used by Infinispan Query when set up to index elements contained in an Infinispan Cache using replication. In that case, however, it's not storing the Lucene Document in the Cache temporarily. I see some complexity in your case in defining when it's safe to remove the Document: in other words, I think it's pretty hard to know which member is the last one.

               

              Jacques-Henri Berthemet wrote:

               

              Currently our application fits well in 512MB of -Xmx; it doesn't reach 300MB most of the time. The Lucene index can reach 20GB, maybe up to 100GB for a few customers, so a 100GB memory requirement where 512MB was enough previously is difficult to justify. We currently use Lucene NRT and save data as soon as the DB transaction is committed; the application must be shut down for upgrades, so we need persistent storage. Rebuilding an index can take a few days.

              I see your point, but the idea of using Infinispan is to store data in memory; if you can't change your heap requirements then you are probably looking at the wrong technology. That said, the idea is to find a reasonable balance: we wouldn't strictly require storing 100GB in memory, as that's not practically feasible, but rather find a sweet spot which allows Infinispan to keep the hottest data in memory for top speed while offloading the largest chunks to disk. The disk will of course be slower than when you can write to it directly via the Lucene FSDirectory, but with the right CacheLoader that drawback can be largely compensated by much faster performance in other areas.

              As a rule of thumb, we suggest starting with heaps of about 6GB, then tuning from there to see if your specific use case works better with larger or smaller heaps.
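
              To make the balance concrete, the chunks cache could be bounded along these lines (a sketch against the 6.x API; the numbers and path are made up):

                 import org.infinispan.configuration.cache.ConfigurationBuilder;
                 import org.infinispan.eviction.EvictionStrategy;

                 ConfigurationBuilder builder = new ConfigurationBuilder();
                 builder.eviction()
                        .strategy(EvictionStrategy.LIRS)
                        .maxEntries(4096);              // ~64MB of hot chunks at 16KB each
                 builder.persistence()
                        .passivation(true)              // evicted chunks move to the store
                        .addSingleFileStore()
                        .location("/var/data/chunks");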

               

              Since you have a write-heavy application, an async CacheStore is very likely a good choice: during indexing especially, lots of small files with a short life span are generated, and most of these segments would never actually trigger IO operations.

               

              Jacques-Henri Berthemet wrote:

               

              I have the feeling that even if I managed to get a working CacheStore it would not be very efficient. I'm surprised the Cassandra cache store works well; maybe I should take a look at it, perhaps there is a nice trick for seeks. The CacheStore I implemented is based on your CacheLoader, and I use the same FSDirectory for write and delete operations.

              There are two areas that could be interesting to investigate:

              1. You're probably right that no CacheStore will be as fast as not using Infinispan (or using Infinispan without a CacheStore), unless you set it up for async (write-behind).

              2. While I do know how the seek() operations work, they haven't been a performance problem in my cases (specifically). I wonder if my performance tests are doing something significantly different from yours?

              If you could provide a small performance-testing project, I'd be happy to investigate strategies to make sure the seek operation does not load unrequested chunks.

              • 4. Re: How (not to) store a Lucene Infinispan Directory on disk
                jhberthemet

                The limitation of my implementation is that all cluster members must be known, and a new node must be added on a cold start. Each member has a unique id and adds this id to the record to mark it as processed. Currently it is more of a POC.
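
                The bookkeeping looks roughly like this (simplified from the POC; the method and cache names are illustrative):

                   import java.util.HashSet;
                   import java.util.Set;

                   import org.infinispan.Cache;

                   // Each member atomically appends its id to the processed set;
                   // whichever member completes the known roster removes the entry.
                   static boolean markProcessed(Cache<String, Set<String>> pending, String docKey,
                                                String memberId, Set<String> allKnownMembers) {
                      while (true) {
                         Set<String> done = pending.get(docKey);
                         if (done == null || done.contains(memberId)) {
                            return false;                      // gone, or already handled here
                         }
                         Set<String> updated = new HashSet<String>(done);
                         updated.add(memberId);
                         boolean applied = updated.containsAll(allKnownMembers)
                               ? pending.remove(docKey, done)  // we are the last member
                               : pending.replace(docKey, done, updated);
                         if (applied) {
                            return true;
                         }
                         // lost a race with another member: re-read and retry
                      }
                   }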

                 

                One specificity of our application is that it does not add Documents in one step: all docs will be updated at least once. So each doc is retrieved from the index, updated, deleted, and added again. There is also a write buffer to batch documents before a commit, but NRT works so well that it is not very useful anymore.
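
                In plain Lucene terms the cycle is roughly this (simplified; "id", "status" and "body" are illustrative field names):

                   import java.io.IOException;

                   import org.apache.lucene.document.Document;
                   import org.apache.lucene.document.Field;
                   import org.apache.lucene.document.StringField;
                   import org.apache.lucene.document.TextField;
                   import org.apache.lucene.index.IndexWriter;
                   import org.apache.lucene.index.Term;
                   import org.apache.lucene.search.IndexSearcher;
                   import org.apache.lucene.search.SearcherManager;
                   import org.apache.lucene.search.TermQuery;
                   import org.apache.lucene.search.TopDocs;

                   void reviseDocument(SearcherManager searchers, IndexWriter writer, String docId) throws IOException {
                      IndexSearcher searcher = searchers.acquire();
                      try {
                         TopDocs hits = searcher.search(new TermQuery(new Term("id", docId)), 1);
                         Document old = searcher.doc(hits.scoreDocs[0].doc);
                         // Rebuild rather than re-add the retrieved Document: fields
                         // loaded from stored values lose their indexing options.
                         Document updated = new Document();
                         updated.add(new StringField("id", docId, Field.Store.YES));
                         updated.add(new StringField("status", "updated", Field.Store.YES));
                         updated.add(new TextField("body", old.get("body"), Field.Store.YES));
                         // updateDocument is an atomic delete-then-add, so there is
                         // no separate delete step:
                         writer.updateDocument(new Term("id", docId), updated);
                      } finally {
                         searchers.release(searcher);
                      }
                   }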

                 

                It would be difficult for me to provide parts of our application; I'll try to put together something "standalone" that replicates our behavior, but I'm not sure I'll be able to provide that soon.

                 

                Thank you for all your help!