1. Re: Infinispan as a document store?
sannegrinovero Sep 4, 2014 7:29 PM (in response to brenuart)
Hi,
Since Infinispan shines when it can keep all of its data in memory, your first step should be to check whether you have enough memory across your servers to fit it all (with some memory to spare as well, of course).
I guess you won't be able to keep it all in memory, so you are probably looking at the CacheStore capabilities of Infinispan to keep the "hot" data in memory while offloading most of it to a different storage engine, probably the disks on each server?
Also consider the index size. Since the index is usually significantly smaller than the data, we often aim at having a full replica of the index on each node. That's not strictly a requirement, as you can distribute it too, but the more "diluted" this information is, the slower your queries will be.
That said, it's designed to do millions of queries per second, so since you aim at just 3 units this might not be a problem. Still, I would advise verifying with a POC how large your index would get for the full dataset. I can't give an estimate of the index size, as this depends wildly on the indexing options you intend to apply, which in turn depend on which queries you will need to be able to run. If you're just indexing some metadata, fully replicating the index should not be a problem at all.
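To make that first check concrete, here is a rough back-of-envelope sketch in plain Java. It is not an Infinispan API: the class and method names are made up for illustration, and every number in `main` (entry count, average entry size, number of copies, node count, heap size, spare fraction) is a placeholder to replace with your own measurements. It also ignores per-entry overhead and the index, so treat the result as a lower bound.

```java
// Back-of-envelope sizing; all names and numbers here are hypothetical.
final class MemorySizing {

    /** Bytes each node must hold if the data is spread evenly over the cluster. */
    static long requiredBytesPerNode(long entries, long avgEntryBytes,
                                     int copies, int nodes) {
        // Every entry is kept `copies` times cluster-wide.
        long totalBytes = entries * avgEntryBytes * copies;
        return (totalBytes + nodes - 1) / nodes; // round up
    }

    /** True if the per-node share fits while keeping `spareFraction` of the heap free. */
    static boolean fits(long requiredPerNode, long heapBytesPerNode,
                        double spareFraction) {
        double usable = heapBytesPerNode * (1.0 - spareFraction);
        return requiredPerNode <= usable;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        // Placeholder workload: 10M documents of ~20 KB, 2 copies, 4 servers.
        long perNode = requiredBytesPerNode(10_000_000L, 20_000L, 2, 4);
        System.out.println("bytes needed per node: " + perNode);
        System.out.println("fits in 64 GiB heap with 30% spare: "
                + fits(perNode, 64 * gib, 0.3));
    }
}
```

If the answer is "no" for any realistic hardware, that is the signal to plan for a CacheStore rather than a purely in-memory setup.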
It might be worth considering that Infinispan 7 will also be able to run queries without the need for any index: as with traditional databases, such queries will be slower, but you gain write performance and scalability. Bear in mind, though, that an indexless query will simply iterate over all entries.
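The trade-off can be illustrated with a small plain-Java sketch. This is not Infinispan's query API; `ScanVersusIndex`, `IndexedStore` and the "author" field are hypothetical names chosen for the example. The scan touches every entry on each query, while the indexed store pays extra work on each write so that a query becomes a lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Illustration only; none of these types belong to Infinispan.
final class ScanVersusIndex {

    /** Indexless query: tests every entry, so cost grows with the data size. */
    static <K, V> List<K> scan(Map<K, V> data, Predicate<V> filter) {
        List<K> hits = new ArrayList<>();
        for (Map.Entry<K, V> e : data.entrySet()) {
            if (filter.test(e.getValue())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    /** Indexed store: each write also maintains a secondary index. */
    static final class IndexedStore {
        private final Map<String, String> authorByKey = new HashMap<>();
        private final Map<String, Set<String>> keysByAuthor = new HashMap<>();

        void put(String key, String author) {
            String previous = authorByKey.put(key, author);
            if (previous != null) {
                keysByAuthor.get(previous).remove(key); // extra cost on writes
            }
            keysByAuthor.computeIfAbsent(author, a -> new TreeSet<>()).add(key);
        }

        /** A lookup instead of a full iteration. */
        Set<String> findByAuthor(String author) {
            return keysByAuthor.getOrDefault(author, Collections.emptySet());
        }
    }
}
```

With only a handful of queries per second, the scan may well be acceptable; the point is to measure it against your real entry count.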
It's worth remembering that Infinispan was designed as a cache, not as an ACID database: the transactional features are meant to let it participate in other JTA transactions (go/no go for a batch of changes), but strict durability hasn't been a design objective; as a cache, it's expected that you can load data from another system in case of catastrophic failure.
Considering the very low load you're expecting, and assuming you will keep most data off-heap in a CacheStore, you should be fine with just a couple of servers. Probably the hardest part will be loading all the initial data. In this case I generally suggest not loading it at all, but developing a custom CacheStore that loads it "on demand" from the existing source.
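The "on demand" idea is essentially a read-through cache. The sketch below shows the pattern in plain Java and deliberately does not use the Infinispan CacheStore interfaces, whose exact shape depends on the version you run. `OnDemandCache`, its `source` function and the `loads` counter are hypothetical names; in a real setup the loading logic would live in your custom CacheStore and the eviction would be handled by Infinispan.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Read-through sketch; a stand-in for a custom store, not Infinispan code.
final class OnDemandCache<K, V> {
    private final Function<K, V> source; // stand-in for the existing system
    private final Map<K, V> hot;         // bounded "hot" set kept in memory
    private int loads = 0;               // how often the source was consulted

    OnDemandCache(int maxInMemory, Function<K, V> source) {
        this.source = source;
        // Access-ordered map that drops the least recently used entry
        // once more than maxInMemory entries are held.
        this.hot = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxInMemory;
            }
        };
    }

    /** Returns the cached value, loading it from the source on a miss. */
    synchronized V get(K key) {
        V value = hot.get(key);
        if (value == null) {
            value = source.apply(key); // nothing is preloaded up front
            loads++;
            if (value != null) {
                hot.put(key, value);
            }
        }
        return value;
    }

    synchronized int loads() { return loads; }

    synchronized int inMemory() { return hot.size(); }
}
```

Only the documents that are actually requested ever reach the cluster, which avoids the bulk import entirely, at the cost of a slower first access per document.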