Storing large collections in Infinispan
ydewit Nov 22, 2011 1:53 PM
I am looking for recommendations on how best to store large collections (ordered or unordered) in Infinispan.
I am basically trying to store a parent entity that can contain a large number of child entities, either ordered or not. The simple solution is to store the child IDs in a collection (ArrayList or HashSet) alongside the parent entity, so that when I get a specific parent entity I also get the list of child IDs and can fetch the children individually as needed.
The first problem with this approach is that any update to the collection would require updating the parent entity, including the whole collection. The natural solution here would be to detach the collection from its parent, so that we have one cache entry for each parent entity, one separate cache entry for each parent's collection, and one cache entry for each child entity.
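To make the three kinds of entries concrete, here is a minimal sketch in Java. It uses plain ConcurrentHashMap instances as stand-ins for caches, and the class name, key prefixes ("parent:", "children:", "child:") and String payloads are all made up for illustration; nothing here is Infinispan-specific API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: parent, parent-collection and child live in separate entries.
public class DetachedCollectionSketch {
    // Stand-ins for caches; String payloads replace real entity objects.
    private final ConcurrentMap<String, String> entities = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, List<String>> childIds = new ConcurrentHashMap<>();

    public void putParent(String parentId, String payload) {
        entities.put("parent:" + parentId, payload);
    }

    // Adding a child writes the child entry and the ID-list entry,
    // but never rewrites the parent entry itself.
    public void addChild(String parentId, String childId, String payload) {
        entities.put("child:" + childId, payload);
        childIds.merge("children:" + parentId, List.of(childId), (existing, added) -> {
            List<String> copy = new ArrayList<>(existing);
            copy.addAll(added);
            return List.copyOf(copy); // the whole ID list is still replaced as one value
        });
    }

    public String parent(String parentId) {
        return entities.get("parent:" + parentId);
    }

    public String child(String childId) {
        return entities.get("child:" + childId);
    }

    public List<String> childIdsOf(String parentId) {
        return childIds.getOrDefault("children:" + parentId, List.of());
    }
}
```

Note that the ID list is still written back as a single value on every change, which is exactly the cost that grows with the size of the collection.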
The second problem is that the collection itself could be quite large. By my rough estimate, an ArrayList with, say, 1 million entries would take around 50 MB of memory, and updating this list would be a drag, to say the least, if the whole collection has to be rewritten in the cache each time. The natural solution here would be to split the collection into smaller chunks, effectively paginating it. At the extreme, a page size of 1 would be equivalent to storing the parent ID in each child entity as a foreign key. The issue then becomes how to query all the child entities in the cache with a given parent ID and return only N entries starting at position P. As far as I know, this is not directly supported by the Infinispan APIs, but it can be done with something like Lucene on top, providing the required indexing.
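The pagination idea can be sketched as follows. This is only an illustration under assumptions: ConcurrentHashMap stands in for a cache, the class name, the "parentId:page:n" key scheme and the per-parent size entry are hypothetical, the list is append-only (removing from the middle would require shifting or compacting pages), and concurrent writers to the same parent would need a lock or transaction that is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: an ordered, append-only list of child IDs split into fixed-size pages.
public class PagedChildIds {
    private final int pageSize;
    // One entry per page, keyed by "<parentId>:page:<index>".
    private final ConcurrentMap<String, List<String>> pages = new ConcurrentHashMap<>();
    // One small entry per parent holding the total number of child IDs.
    private final ConcurrentMap<String, Integer> sizes = new ConcurrentHashMap<>();

    public PagedChildIds(int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        this.pageSize = pageSize;
    }

    private static String pageKey(String parentId, int pageIndex) {
        return parentId + ":page:" + pageIndex;
    }

    // Rewrites only the last page and the size entry, not the whole collection.
    public void append(String parentId, String childId) {
        int size = sizes.getOrDefault(parentId, 0);
        String key = pageKey(parentId, size / pageSize);
        List<String> page = new ArrayList<>(pages.getOrDefault(key, List.of()));
        page.add(childId);
        pages.put(key, page);
        sizes.put(parentId, size + 1);
    }

    // Returns up to `count` IDs starting at `position`, touching only the pages involved.
    public List<String> range(String parentId, int position, int count) {
        int size = sizes.getOrDefault(parentId, 0);
        int end = (int) Math.min((long) size, (long) position + count);
        List<String> result = new ArrayList<>();
        int i = Math.max(position, 0);
        while (i < end) {
            List<String> page = pages.get(pageKey(parentId, i / pageSize));
            int from = i % pageSize;
            int to = Math.min(page.size(), from + (end - i));
            result.addAll(page.subList(from, to));
            i += to - from;
        }
        return result;
    }
}
```

With a page size of 3 and ten appended IDs c0 to c9, range("p1", 2, 4) reads pages 0 and 1 only and returns c2, c3, c4, c5. This covers the "N entries starting at position P" case for ordered, append-only collections without an index; it does not answer arbitrary queries by parent ID over the child entries.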
1. The first question is whether something like Lucene is the only solution here, or whether there are other options or recommendations. I came across something about AtomicMap, but I am not sure how it could help here (pointers appreciated if it is a viable option).
2. The second question is how reliable Lucene's indexing is in a clustered environment with Infinispan. My concern stems from the fact that querying the children of a given parent entity is much more critical to the app's functionality than, say, full-text search in a UI drop-down, so a problem with the indexing or with Lucene would hurt far more here.
Thanks in advance for any insights into this.