4 Replies Latest reply on Jul 10, 2018 7:22 AM by Nikolai Sahattchiev

    How to shutdown an Infinispan cluster without losing data?

    Nikolai Sahattchiev Newbie

      What is the best strategy to restart a complete Infinispan cluster (distributed cache with persistence) without losing any data?

       

      For example, when an operating system patch has to be applied or a new software version has to be deployed on all Infinispan servers and they all have to be stopped, is there any way to do that without causing Infinispan to rebalance after each node stops?

       

      When each node is stopped gracefully, Infinispan will start rebalancing to ensure that each entry has numOwners owners.

       

      When we have a cluster with, let's say, 8 nodes and we stop 6 of them, Infinispan will try to move all objects to the last 2 nodes. When those do not have enough capacity (Java heap) for the whole cache, this leads to OutOfMemoryError and probably data loss.

       

      Is there a way to stop an Infinispan node with rebalancing disabled?

       

      I have read this discussion: https://community.jboss.org/wiki/ControlledClusterShutdownWithDataRestoreFromPersistentStorage and the referenced ISPN-3351, but I'm not able to find any documentation on how to achieve that. The link https://github.com/infinispan/infinispan/wiki/Graceful-shutdown-&-restore mentioned in ISPN-3351 is no longer valid.

        • 1. Re: How to shutdown an Infinispan cluster without losing data?
          Nikolai Sahattchiev Newbie

          OK, I did some more research. I assume that the GlobalStateManager should be used to achieve that.

           

          Where can I find documentation on how to configure it? There are only three lines about it in the user guide web page Infinispan 9.3 User Guide:

           

             <!-- if needed to persist counter, global state needs to be configured -->
             <global-state>
            ...
             </global-state>

           

           

          Using a global state configuration as below does not work:

           

          <cache-container default-cache="dist-sync">
             <transport stack="my-tcp" cluster="mycluster"/>

             <global-state>
                 <persistent-location path="/appl/infinispan-poc/home/tmp/persistent"/>
                 <shared-persistent-location path="/appl/infinispan-poc/home/tmp/shared"/>
                 <temporary-location path="/appl/infinispan-poc/home/tmp/tmp"/>
                 <overlay-configuration-storage/>
             </global-state>

             <distributed-cache name="dist-sync" mode="SYNC" remote-timeout="300000" owners="2" segments="100">
                 <locking concurrency-level="1000" acquire-timeout="60000"/>
                 <transaction mode="NONE"/>
                 .......
             </distributed-cache>
             ......
          </cache-container>

           

           

          The first cluster node starts successfully; any additional node fails with:

           

          org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheConfigurationException: ISPN000512: Cannot acquire lock '/appl/infinispan-poc/home/tmp/persistent/___global.lck' for persistent global state
                   at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:271)
                   at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:678)
                   at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:343)
                   at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:311)
                   at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:298)
                   at ipoc.InfinispanNode.start(InfinispanNode.java:76)
                   at ipoc.Bootstrap.start(Bootstrap.java:96)
                   at ipoc.Bootstrap.main(Bootstrap.java:82)
          Caused by: org.infinispan.commons.CacheConfigurationException: ISPN000512: Cannot acquire lock '/appl/infinispan-poc/home/tmp/persistent/___global.lck' for persistent global state
                   at org.infinispan.globalstate.impl.GlobalStateManagerImpl.acquireGlobalLock(GlobalStateManagerImpl.java:81)
                   at org.infinispan.globalstate.impl.GlobalStateManagerImpl.start(GlobalStateManagerImpl.java:61)
                   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                   at java.lang.reflect.Method.invoke(Method.java:498)
                   at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:79)
                   at org.infinispan.commons.util.SecurityActions.doPrivileged(SecurityActions.java:71)
                   at org.infinispan.commons.util.SecurityActions.invokeAccessibly(SecurityActions.java:76)
                   at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:185)
                   at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:968)
                   at org.infinispan.factories.AbstractComponentRegistry.lambda$invokePrioritizedMethods$6(AbstractComponentRegistry.java:703)
                   at org.infinispan.factories.SecurityActions.lambda$run$1(SecurityActions.java:72)
                   at org.infinispan.security.Security.doPrivileged(Security.java:44)
                   at org.infinispan.factories.SecurityActions.run(SecurityActions.java:71)
                   at org.infinispan.factories.AbstractComponentRegistry.invokePrioritizedMethods(AbstractComponentRegistry.java:696)
                   at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:689)
                   at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:607)
                   at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:250)

          • 2. Re: How to shutdown an Infinispan cluster without losing data?
            Nikolai Sahattchiev Newbie

            Can anybody advise where to find documentation about the GlobalStateManager, or how to configure it properly?

            • 3. Re: How to shutdown an Infinispan cluster without losing data?
              Tristan Tarrant Master

              The key item in achieving this is the

               

              <global-state/>

               

              element. This assigns every node a persistent UUID, which is used to identify it after a restart.

              The persistent UUID, together with other cache state (aside from cache data), is stored in the path chosen by:

               

              <persistent-location path="/path/to/persistent/state" />

               

              This path must be unique per node, i.e. it cannot be shared by multiple nodes, be they on the same machine or different machines using a shared disk, such as NFS.
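
              For example (a hypothetical layout; Infinispan's XML parser supports `${property}` substitution, so each node can be started with e.g. `-Dnode.name=node1` to resolve a distinct directory):

              ```xml
              <!-- Hypothetical sketch: the persistent and temporary locations resolve
                   to a per-node path, while the shared location may be on common storage. -->
              <global-state>
                  <persistent-location path="/appl/infinispan-poc/home/${node.name}/persistent"/>
                  <shared-persistent-location path="/appl/infinispan-poc/home/shared"/>
                  <temporary-location path="/appl/infinispan-poc/home/${node.name}/tmp"/>
              </global-state>
              ```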

              If you want to shut down a cache gracefully, you must invoke the following method:

               

              cache.shutdown();

               

              This will invoke the shutdown on every node.

              In order for the cache to restart successfully, all original nodes must be present in the cluster view, i.e. if you had nodes A, B, C when you shut down the cache, the cluster must include exactly those nodes when starting up. Once a cache has been restarted, you can freely scale up/down.
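
              Put together, a minimal embedded sketch might look like this (hypothetical class, paths and cache names; assumes the Infinispan core dependency on the classpath and a configuration file containing `<global-state>` with a per-node unique persistent-location):

              ```java
              import org.infinispan.Cache;
              import org.infinispan.manager.DefaultCacheManager;

              // Hypothetical sketch of a graceful, cluster-wide shutdown.
              public class GracefulShutdown {
                  public static void main(String[] args) throws Exception {
                      // Reads the XML configuration, including the <global-state> element.
                      DefaultCacheManager manager = new DefaultCacheManager("infinispan.xml");
                      Cache<String, String> cache = manager.getCache("dist-sync");

                      // ... normal operation ...

                      // Graceful shutdown: persists the cache state across the whole
                      // cluster and does not trigger rebalancing on the remaining nodes,
                      // unlike cache.stop(), which only stops the local cache.
                      cache.shutdown();

                      // Stop this node's cache manager.
                      manager.stop();
                  }
              }
              ```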

              • 4. Re: How to shutdown an Infinispan cluster without losing data?
                Nikolai Sahattchiev Newbie

                 Thanks Tristan! My persistent-location path was not unique per node, which caused the exception above. I have changed the config and now everything works fine.