5 Replies Latest reply on Jan 16, 2014 6:24 AM by dan.berindei

Distributed persistent cache with local non shared store

jonathandfields Jan 5, 2014 1:49 PM

Hi All,

I am new to ISPN, and I am attempting to set up a large (20-40GB) distributed persistent cache with a non-shared local cache store. I'm using the ISPN that is bundled with EAP 6.1. I am trying to understand how to manage the operational aspects of the distributed cache: starting/stopping (no topology change) versus reconfiguring (topology change).

1) When starting the cluster, I want the individual ISPN nodes (each running in EAP) to load whatever data is stored in the local store, and not get their state from other nodes in the cluster. I just want them to start warm with the same state as when they were stopped. This seems to imply fetchInMemoryState = false, however this contradicts (3) below. This also suggests preload = true, and fetchPersistentState = false for the loader configuration.

2) When stopping the cluster, I want the individual ISPN nodes (each running in EAP) to stop without transferring their data to other nodes. (It seems that if I don't somehow prevent the state transfer, the entire cache data (20-40GB) will be shuffled around until it finally all ends up at the last node to leave, with the opposite happening when the cluster is restarted.) This again seems to imply fetchInMemoryState = false, and fetchPersistentState = false for the loader configuration, but this contradicts (3) below.

3) When reconfiguring the cache (adding and removing nodes), I do want the data re-balanced. This implies fetchInMemoryState = true, and fetchPersistentState = true for the loader configuration, which contradicts the above.
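The concern in (2), that a one-by-one shutdown with state transfer enabled drains everything onto the last node, can be illustrated with a toy model. This is a minimal sketch and not Infinispan code: the class name, the node names, the single-owner assumption, and the round-robin hand-off are all hypothetical stand-ins for real consistent-hash rebalancing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of sequential shutdown with rebalancing enabled; not Infinispan code. */
final class ShutdownDrainSketch {

    /**
     * Stops nodes one at a time in the given order. Each departing node hands its
     * entries to the survivors round-robin (a stand-in for real rehashing, assuming
     * one owner per entry). Returns the entry count each node held when it stopped.
     */
    static Map<String, Integer> entriesAtShutdown(Map<String, Integer> initial,
                                                  List<String> stopOrder) {
        Map<String, Integer> live = new LinkedHashMap<>(initial);
        Map<String, Integer> atShutdown = new LinkedHashMap<>();
        for (String node : stopOrder) {
            Integer leaving = live.remove(node);
            if (leaving == null) {
                throw new IllegalArgumentException("Unknown or already stopped node: " + node);
            }
            atShutdown.put(node, leaving);
            if (live.isEmpty()) {
                break; // last node: nobody left to receive its entries
            }
            List<String> survivors = new ArrayList<>(live.keySet());
            int share = leaving / survivors.size();
            int remainder = leaving % survivors.size();
            for (int i = 0; i < survivors.size(); i++) {
                int extra = share + (i < remainder ? 1 : 0);
                live.merge(survivors.get(i), extra, Integer::sum);
            }
        }
        return atShutdown;
    }

    public static void main(String[] args) {
        Map<String, Integer> initial = new LinkedHashMap<>();
        initial.put("node-a", 10);
        initial.put("node-b", 10);
        initial.put("node-c", 10);
        System.out.println(
            entriesAtShutdown(initial, Arrays.asList("node-a", "node-b", "node-c")));
    }
}
```

With three nodes holding 10 entries each, the model reports 10, 15, and 30 entries at the moment each node stops, so the last node to leave ends up holding all 30.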

I have read ISPN-1239, Controlled cluster shutdown with data restore from persistent storage, ISPN-3351, and infinispan5.3 fetchInMemoryState=false,some data loss. These seem to imply that (2) is possible by disabling re-balancing using JMX. (3) is obviously possible with state transfer enabled. But I do not see how (1) is possible, since the JMX operation applies to a running cache - EAP is going to start the node with state transfer enabled before I can disable it with JMX.

As a newbie, am I missing something obvious and does what I am trying to do make sense? Any suggestions how to accomplish the above would be greatly appreciated.

Also, as a side note, the distinction between fetchPersistentState and fetchInMemoryState is unclear to me. Searching the source code, it appears that these flags are equivalent: if either (or both) is set, org.infinispan.statetransfer.StateConsumerImpl.isFetchEnabled is set to true. Is there a distinction, or have I missed something?

Thanks!

1. Re: Distributed persistent cache with local non shared store

mircea.markus Jan 6, 2014 6:23 AM (in response to jonathandfields)

Indeed, 1) is not possible without having ISPN-3351 implemented (please vote for it in JIRA if you consider it important; that would help to prioritize it).
fetchPersistentState transfers the state from the remote cache store, whereas fetchInMemoryState transfers only what's in memory.
2. Re: Distributed persistent cache with local non shared store

jonathandfields Jan 6, 2014 11:07 AM (in response to mircea.markus)

Thanks, I have voted for both issues. In the meantime, it seems that the best approach is to configure state transfer and just let re-balancing do its thing. It does seem that the order of shutdown and startup of nodes is important for a non-shared cache store, since the last node to be shut down will contain all the data in its store. Therefore it needs to be the first node to be started, correct? If I were to use a shared cache store, this would not be the case.

Another operational question is how to back up and restore non-shared cache stores for a distributed cache. Let's take BDBJE as an example. I can easily create hot backups of the BDBJE data files on each node to off-site storage. Now let's say that we need to restore the entire cluster due to a disaster. It does not seem that this would be possible. If I restored each node's data files, then the first node to be started would start with only the data that it contained when it was lost, not all of the data as in a controlled shutdown. Am I missing something, or is it currently impossible to back up the persistent state of a distributed cache using non-shared cache stores?

Thanks!
3. Re: Distributed persistent cache with local non shared store

mircea.markus Jan 9, 2014 8:22 AM (in response to jonathandfields)

"Am I missing something, or is it currently impossible to backup the persistent state of a distributed cache using non-shared cache stores?"
It is possible, but with certain limitations:
- shut down all the nodes one by one
- the last node shut down will persist all the data (assuming passivation is used here; otherwise the last node might OOM)
- when restarting, always make sure you restart the last node first. It will have access to all the data
- for all other nodes, wipe out the persistent store, and then restart them with fetchPersistentState=true.
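
The steps above can be sketched as a small planning helper that turns a recorded shutdown order into a restart order. This is a minimal sketch in plain Java: the class, the Step type, and the node names are hypothetical, and nothing here calls Infinispan. The order in which the remaining nodes are restarted is not specified in the steps above, so the sketch arbitrarily reuses the shutdown order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Derives a restart plan from the order in which nodes were shut down. */
final class RestartPlanSketch {

    /** One restart action: which node to start and whether to wipe its local store first. */
    static final class Step {
        final String node;
        final boolean wipeStoreFirst;

        Step(String node, boolean wipeStoreFirst) {
            this.node = node;
            this.wipeStoreFirst = wipeStoreFirst;
        }

        @Override
        public String toString() {
            return (wipeStoreFirst ? "wipe store, then start " : "start ") + node;
        }
    }

    /**
     * The node stopped last holds all the data, so it starts first with its store
     * intact. Every other node has its store wiped and is started afterwards, so it
     * can fetch its state from the running cluster.
     */
    static List<Step> plan(List<String> shutdownOrder) {
        if (shutdownOrder.isEmpty()) {
            return Collections.emptyList();
        }
        List<Step> steps = new ArrayList<>();
        int lastIndex = shutdownOrder.size() - 1;
        steps.add(new Step(shutdownOrder.get(lastIndex), false));
        for (String node : shutdownOrder.subList(0, lastIndex)) {
            steps.add(new Step(node, true));
        }
        return steps;
    }

    public static void main(String[] args) {
        for (Step step : plan(Arrays.asList("node-a", "node-b", "node-c"))) {
            System.out.println(step);
        }
    }
}
```

The helper only works if the shutdown order was recorded somewhere outside the cluster, for example by the script that stops the nodes.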

HTH,
M
4. Re: Distributed persistent cache with local non shared store

jonathandfields Jan 9, 2014 9:20 AM (in response to mircea.markus)

Thanks, that is helpful. However, after experimenting with distributed caches and non-shared loaders for a couple of days now, I have decided to change to a shared loader to keep things simple. I had hoped to keep the system self-contained and all Java (hence the use of a non-shared loader like BDBJE), but it seems that for a distributed cache, a shared loader is the better approach. Using something like Cassandra does allow me to keep it pure Java.

One more question, if you don't mind. Using a shared loader and distribution, is it possible to shut down all nodes in the cluster, and then have them load persistent data after restarting? I will always have enough servers/memory to do this. But it is not clear what combination of preload, fetchInMemoryState, and fetchPersistentState would accomplish this, or if it is even possible. This could be ISPN-3351 again... I could do this at the application level by fetching every item in the cache when starting the app, but it would be nice if there is a way to do this via configuration.
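
For reference, the application-level fallback mentioned above could look roughly like this. It is a minimal sketch under stated assumptions: the cache is modelled as a plain java.util.Map (Infinispan's Cache interface extends ConcurrentMap, so the same loop should apply), the application is assumed to know its own key set, and readThrough is a hypothetical stand-in for a cache whose misses fall through to the store.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Sketch of application-level warm-up; the cache is modelled as a plain java.util.Map. */
final class CacheWarmUpSketch {

    /** Reads every known key once so a store-backed cache pulls it into memory. Returns the hit count. */
    static <K, V> int warmUp(Map<K, V> cache, Iterable<? extends K> knownKeys) {
        int hits = 0;
        for (K key : knownKeys) {
            if (cache.get(key) != null) {
                hits++;
            }
        }
        return hits;
    }

    /** Hypothetical stand-in for a cache with a loader: a miss falls through to the store. */
    static <K, V> Map<K, V> readThrough(final Map<K, V> store) {
        return new HashMap<K, V>() {
            @Override
            public V get(Object key) {
                V value = super.get(key);
                if (value == null) {
                    value = store.get(key);
                    if (value != null) {
                        @SuppressWarnings("unchecked")
                        K typedKey = (K) key;
                        super.put(typedKey, value); // keep the loaded entry in memory
                    }
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("user:1", "alice");
        store.put("user:2", "bob");
        Map<String, String> cache = readThrough(store);
        int hits = warmUp(cache, Arrays.asList("user:1", "user:2", "user:3"));
        System.out.println(hits + " keys loaded, cache size " + cache.size());
    }
}
```

The obvious cost is that the application has to enumerate its keys from somewhere other than the cold cache, which is part of why a configuration-only solution would be preferable.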

Thanks!
5. Re: Distributed persistent cache with local non shared store

dan.berindei Jan 16, 2014 6:24 AM (in response to jonathandfields)

I looked at a test (RehashAfterJoinWithPreloadTest) and I'm pretty sure preload should work with shared cache stores. Because preload happens before joining the cluster, every joiner will preload all the data, so you will have a lot of extra work, but other than that it should be fine.

As a note, fetchPersistentState is ignored for shared loaders, as loading from the shared store directly is faster than asking another node to load from the same store.