5 Replies Latest reply on Dec 8, 2011 1:51 PM by sannegrinovero

Infinispan for a new project

zeeman Dec 7, 2011 12:43 PM

Hi all, a few questions:

1- Is Infinispan the right fit for a bi-directional persisted cache between 2 servers? One of the servers can be off the cluster; when it joins, it needs to sync with the other node. Data can change on both servers while they are not connected to the cluster.

2- Would replication or distribution mode work better for #1?

3- The cache keys would mostly be sessionId, UserId, Timestamp. Most queries would be range queries (like get all values for a given time range). What would be a good way of doing this?

4- Caches on both servers need to be persisted, and only the most-used items need to stay in the cache. When a query is run, it needs to run against data in the cache and on disk. Can Infinispan support this? How would I configure it? Passivation?

I have complete control of the stacks used on both servers.

Thanks!

1. Re: Infinispan for a new project

sannegrinovero Dec 7, 2011 12:56 PM (in response to zeeman)

Hi,

1- Is Infinispan the right fit for a bi-directional persisted cache between 2 servers? One of the servers can be off the cluster; when it joins, it needs to sync with the other node. Data can change on both servers while they are not connected to the cluster.
There is no automatic merge of conflicting updates, so if both servers contain different values for the same key when they rejoin, your application has to deal with that.

2- Would replication or distribution mode work better for #1?
If it's just 2 servers, you should use REPL. Using DIST with 2 owners would be basically the same, as each key would be replicated to both. If you configure it to use only 1 owner, then when the nodes disconnect each one will be able to retrieve only about half of the elements and will return null for the rest.

3- The cache keys would mostly be sessionId, UserId, Timestamp. Most queries would be range queries (like get all values for a given time range). What would be a good way of doing this?
Yes, that could work fine, but the index used for queries needs to be kept consistent with the values too. So again it depends on whether you expect both nodes to be able to apply updates to the values (and the index) while they are disconnected; that won't work.

4- Caches on both servers need to be persisted, and only the most-used items need to stay in the cache. When a query is run, it needs to run against data in the cache and on disk. Can Infinispan support this? How would I configure it? Passivation?
Yes, that's supported, and we call it passivation. Queries still work because the index is managed independently from the values; you can store the index on a traditional filesystem, or in a dedicated Infinispan cache so that you can share it between the two servers.
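
To make the idea of passivation concrete, here is a minimal sketch in plain Java. It is not Infinispan's implementation or configuration: the class name PassivatingCache, the maxInMemory limit and the HashMap standing in for a persistent store are all assumptions made for illustration. It only shows the general behaviour described above: the least recently used entries are moved out of memory into a store, and are moved back in when they are read again.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only, not Infinispan code. All names are hypothetical.
class PassivatingCache<K, V> {
    private final int maxInMemory;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<K, V> memory = new LinkedHashMap<K, V>(16, 0.75f, true);
    // Stand-in for a persistent store (a real one would write to disk).
    private final Map<K, V> disk = new HashMap<K, V>();

    PassivatingCache(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    void put(K key, V value) {
        disk.remove(key);          // the fresh value lives in memory only
        memory.put(key, value);
        passivateIfNeeded();
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null && disk.containsKey(key)) {
            value = disk.remove(key);   // activation: move the entry back into memory
            memory.put(key, value);
            passivateIfNeeded();
        }
        return value;
    }

    // Passivation: move least recently used entries to the store until memory fits.
    private void passivateIfNeeded() {
        Iterator<Map.Entry<K, V>> it = memory.entrySet().iterator();
        while (memory.size() > maxInMemory && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            disk.put(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }

    boolean inMemory(K key) { return memory.containsKey(key); }
    boolean onDisk(K key) { return disk.containsKey(key); }
}
```

With a limit of 2, putting "a", "b" and "c" moves "a" to the store; reading "a" afterwards brings it back and pushes "b" out instead.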
2. Re: Infinispan for a new project

zeeman Dec 7, 2011 1:20 PM (in response to sannegrinovero)

Thanks for your answers.

1- If I use REPL mode with JGroups configured to use TCP and TCPPING, and one node starts up while the other node is offline, there will be an error. Can the cache still start in that case? What would my startup logic need to handle so that it works whether or not the cluster has formed? Looking at the Infinispan examples, the cache only works when the cluster is formed (both nodes are online).

Also, I see the API has support for DeltaAware interfaces. Are there any examples of using them? Is that what I need to handle conflicts? If not, how should I handle conflicts? It's possible that the same key would be updated on both nodes while the cluster is offline. I want to force changes from one node to always win.

3- I was looking for a way to query the cache for a range of keys. Cache only offers get(k), so how would I make a range query? If I know the keys I can get the values easily, but I don't know the keys and I don't see any cache API that offers a way to do this. Something like the sorted set from Redis would help greatly with this.
3. Re: Infinispan for a new project

sannegrinovero Dec 7, 2011 1:31 PM (in response to zeeman)

1- If I use REPL mode with JGroups configured to use TCP and TCPPING, and one node starts up while the other node is offline, there will be an error. Can the cache still start in that case? What would my startup logic need to handle so that it works whether or not the cluster has formed? Looking at the Infinispan examples, the cache only works when the cluster is formed (both nodes are online).

Also, I see the API has support for DeltaAware interfaces. Are there any examples of using them? Is that what I need to handle conflicts? If not, how should I handle conflicts? It's possible that the same key would be updated on both nodes while the cluster is offline. I want to force changes from one node to always win.
org.infinispan.atomic.DeltaAware is a relatively internal API which you could use, but it is considered "advanced" and I'm not sure it's your best choice. Since you say you want one node to always "win", it sounds more suitable to use two different Infinispan instances and NOT consider them part of the same cluster: configure them both as local caches with passivation.
You can then use HotRod to connect from one cache to the other, and write a little utility class which copies over the values from the other node. Sound simple?

3- I was looking for a way to query the cache for a range of keys. Cache only offers get(k), so how would I make a range query? If I know the keys I can get the values easily, but I don't know the keys and I don't see any cache API that offers a way to do this. Something like the sorted set from Redis would help greatly with this.
Entries in Infinispan are not sorted. You can iterate over them all to find what you need, store your data in a different shape (see the TreeCache API), or use Infinispan Query to index your entries and make them searchable.
Iterating via an EntrySet could be tricky, as you won't "see" what was passivated to the CacheLoader, but you could work around that by maintaining a list/set/sortedSet of entries and storing it in the Cache as well, under a specific well-known key.
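
The workaround of keeping a sorted set under a well-known key could look roughly like the sketch below. It is plain Java, with a ConcurrentMap standing in for the cache (Infinispan's Cache interface extends ConcurrentMap); the class name SortedKeyIndex, the key "__timestamp_index__" and the method names are made up for this example. The read-modify-write of the index is not atomic here, so a real version would need a transaction or lock around it, and it does not clean up the index when an entry is removed or re-stored under a new timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper; the ConcurrentMap stands in for an Infinispan Cache.
class SortedKeyIndex {
    // Assumed well-known key under which the sorted index itself is stored.
    static final String INDEX_KEY = "__timestamp_index__";

    private final ConcurrentMap<String, Object> cache;

    SortedKeyIndex(ConcurrentMap<String, Object> cache) {
        this.cache = cache;
    }

    // Stores the value, then records its key under the given timestamp.
    // Not atomic: concurrent writers could lose index updates.
    void put(String key, long timestamp, Object value) {
        cache.put(key, value);
        TreeMap<Long, List<String>> index = loadIndex();
        index.computeIfAbsent(timestamp, t -> new ArrayList<String>()).add(key);
        cache.put(INDEX_KEY, index);   // write the index back like any other entry
    }

    @SuppressWarnings("unchecked")
    private TreeMap<Long, List<String>> loadIndex() {
        Object stored = cache.get(INDEX_KEY);
        if (stored == null) {
            return new TreeMap<Long, List<String>>();
        }
        return (TreeMap<Long, List<String>>) stored;
    }

    // Range query: values whose timestamp lies in [from, to], in timestamp order.
    // Requires fromInclusive <= toInclusive.
    List<Object> valuesBetween(long fromInclusive, long toInclusive) {
        List<Object> result = new ArrayList<Object>();
        for (List<String> keys
                : loadIndex().subMap(fromInclusive, true, toInclusive, true).values()) {
            for (String key : keys) {
                Object value = cache.get(key);   // plain get(k), so passivated entries are loaded too
                if (value != null) {
                    result.add(value);
                }
            }
        }
        return result;
    }
}
```

Because the range query only uses get(k) on known keys, it does not depend on iterating the in-memory entry set.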
4. Re: Infinispan for a new project

zeeman Dec 7, 2011 2:39 PM (in response to sannegrinovero)

1- Changes from one server should always win only in case of conflicts. In normal operation, changes from both servers should be exchanged and replicated. If we use two caches as you suggested, there is no way of doing merges. I might as well do that myself, with no need for Infinispan, right? This is where I thought DeltaAware made sense.

2- I checked the TreeCache API; it does not offer anything different from the standard cache to accommodate queries. Iterating over all keys would be very expensive. The only option left is using Hibernate Search & Lucene. This is not feasible: the app is running on desktop-like hardware, so there won't be enough resources.

Is it possible to have keys sorted in Infinispan? Or is there any way to say "get all keys greater or less than a specific key"? From what I see, querying in Infinispan needs more support.

New question: if I do go with the Hibernate Search approach, how do the indexed items work with passivation? When items are evicted from the cache, will they still be available in the index? Is there any documentation about the workflow and interaction between Infinispan and Hibernate Search?

Is it possible to use an in-memory index backed by what's in the cache, but also passivated?

Thanks!
5. Re: Infinispan for a new project

sannegrinovero Dec 8, 2011 1:51 PM (in response to zeeman)

1- Changes from one server should always win only in case of conflicts. In normal operation, changes from both servers should be exchanged and replicated. If we use two caches as you suggested, there is no way of doing merges. I might as well do that myself, with no need for Infinispan, right? This is where I thought DeltaAware made sense.
Right, but DeltaAware won't help you: it works on a specific key and is only a means to reduce the amount of data transferred (by transferring only the diff). It's not meant to sync up conflicting values.
As for coding it yourself, I agree it should be simpler, because I don't think it would be easy to find a tool implementing the sync and conflict resolution you need for your specific use case. Still, by using HotRod to connect two non-clustered Infinispan caches you avoid 90% of the complexity.
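
As a rough idea of what such a hand-written sync could look like, here is a sketch in plain Java where two Maps stand in for the two non-clustered caches. The class and method names are hypothetical, and how the entries are actually read and written over HotRod is not shown. It treats any key present on both sides as a conflict and lets the designated winner's value stand; it does not track deletions, so an entry removed on one side only will reappear after a sync.

```java
import java.util.Map;

// Hypothetical utility; the two Maps stand in for the two non-clustered caches.
class WinnerTakesConflictSync {

    // After the call both maps hold the union of all entries.
    // For keys present on both sides, the winner's value is kept.
    static <K, V> void sync(Map<K, V> winner, Map<K, V> other) {
        // Step 1: entries known only to the other side flow to the winner.
        for (Map.Entry<K, V> entry : other.entrySet()) {
            winner.putIfAbsent(entry.getKey(), entry.getValue());
        }
        // Step 2: everything on the winner overwrites the other side.
        other.putAll(winner);
    }
}
```

A finer-grained policy (for example comparing per-entry timestamps or versions) would replace the putIfAbsent step; that choice depends on the application and is left open here.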

The only option left is using Hibernate Search & Lucene. This is not feasible: the app is running on desktop-like hardware, so there won't be enough resources.
Not sure if you're being serious here. Hibernate Search & Lucene are very efficient; Lucene might need some memory, especially for sorting, but it is quite reasonable for a desktop. I can assure you we all develop and run integration tests on laptop-class hardware.
Admittedly they might "boot" rather slowly, like half a second depending on your index size, but I'm not aware of any project able to provide you the same runtime speed, so even if your server is not super beefy it should work very well.

I'm not saying it's the right option for this specific job, but I definitely can't agree with the reason for not considering it.

Is it possible to have keys sorted in Infinispan? Or is there any way to say "get all keys greater or less than a specific key"? From what I see, querying in Infinispan needs more support.
No. It's a key/value store with some nice extras, but it won't slowly grow into a relational database: querying will always be limited to keep efficiency in other areas.

New question: if I do go with the Hibernate Search approach, how do the indexed items work with passivation? When items are evicted from the cache, will they still be available in the index? Is there any documentation about the workflow and interaction between Infinispan and Hibernate Search?

Is it possible to use an in-memory index backed by what's in the cache, but also passivated?
The indexes are stored wherever you want. There are three options: in RAM, using Lucene's own in-memory index; on disk, using Lucene's own filesystem-based index; or in Infinispan, using infinispan-lucene-directory, which means the index can use some memory and offload to disks, databases, etc.

The indexes are kept in sync with all the contents of the cache, with no distinction between what is in memory in Infinispan and what is passivated: entries are added to the index when added to Infinispan, updated when the entries are updated, and removed from the index when removed from Infinispan. It listens to Infinispan events, which is very simple, and manages a searchable index of your values; you can define how your values are indexed.
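
The event-driven bookkeeping described above can be pictured with the following plain-Java sketch. It is only an illustration of the principle, not how Infinispan Query or Hibernate Search are implemented: the class EventDrivenIndex, its callback names and the whitespace tokenizing are assumptions. The point it shows is that the index reacts to create, modify and remove events only, so an entry that is merely evicted or passivated stays searchable.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; all names are hypothetical.
class EventDrivenIndex {
    // word -> keys of the entries whose value contains that word
    private final Map<String, Set<String>> keysByWord = new HashMap<String, Set<String>>();
    // key -> words currently indexed for that key, so updates can drop stale words
    private final Map<String, Set<String>> wordsByKey = new HashMap<String, Set<String>>();

    // Called when an entry is added or updated: re-index its value.
    void onEntryCreatedOrModified(String key, String value) {
        onEntryRemoved(key);   // drop whatever was indexed for the old value
        Set<String> words = new HashSet<String>();
        for (String word : value.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        wordsByKey.put(key, words);
        for (String word : words) {
            keysByWord.computeIfAbsent(word, w -> new TreeSet<String>()).add(key);
        }
    }

    // Called when an entry is removed from the cache.
    void onEntryRemoved(String key) {
        Set<String> words = wordsByKey.remove(key);
        if (words == null) {
            return;
        }
        for (String word : words) {
            Set<String> keys = keysByWord.get(word);
            keys.remove(key);
            if (keys.isEmpty()) {
                keysByWord.remove(word);
            }
        }
    }

    // There is deliberately no eviction callback: an evicted or passivated
    // entry still exists, so it stays in the index.

    // Returns the keys of all entries whose value contains the given word.
    Set<String> search(String word) {
        Set<String> keys = keysByWord.get(word.toLowerCase());
        return keys == null ? new TreeSet<String>() : new TreeSet<String>(keys);
    }
}
```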

https://docs.jboss.org/author/display/ISPN/Querying+Infinispan