4 Replies Latest reply on Jul 10, 2018 7:22 AM by Nikolai Sahattchiev

    How to shutdown an Infinispan cluster without losing data?

    Nikolai Sahattchiev Newbie

      What is the best strategy to restart a complete Infinispan cluster (distributed cache with persistence) without losing any data?

       

      For example, when an operating system patch has to be applied or a new software version has to be deployed on all Infinispan servers and they all have to be stopped, is there any way to do that without causing Infinispan to rebalance after each node stops?

       

      When each node is stopped gracefully, Infinispan will start rebalancing to ensure that each entry has numOwners owners.

       

      When we have a cluster with, let's say, 8 nodes and we stop 6 of them, Infinispan will try to move all objects to the last 2 nodes. When those do not have enough capacity (Java heap) for the whole cache, this leads to OutOfMemoryError and probably data loss.

       

      Is there a way to stop an Infinispan node with rebalancing disabled?

       

      I have read this discussion: https://community.jboss.org/wiki/ControlledClusterShutdownWithDataRestoreFromPersistentStorage and the referenced ISPN-3351, but I'm not able to find any documentation on how to achieve that. The link https://github.com/infinispan/infinispan/wiki/Graceful-shutdown-&-restore mentioned in ISPN-3351 is no longer valid.

        • 1. Re: How to shutdown an Infinispan cluster without losing data?
          Nikolai Sahattchiev Newbie

          OK, I did some more research. I assume that the GlobalStateManager should be used to achieve that.

           

          Where can I find documentation on how to configure it? There are only three lines about it in the user guide web page Infinispan 9.3 User Guide:

           

             <!-- if needed to persist counter, global state needs to be configured -->
             <global-state>
            ...
             </global-state>

           

           

          Using a global state configuration as below does not work:

           

          <cache-container default-cache="dist-sync">
             <transport stack="my-tcp" cluster="mycluster"/>

             <global-state>
                 <persistent-location path="/appl/infinispan-poc/home/tmp/persistent"/>
                 <shared-persistent-location path="/appl/infinispan-poc/home/tmp/shared"/>
                 <temporary-location path="/appl/infinispan-poc/home/tmp/tmp"/>
                 <overlay-configuration-storage/>
             </global-state>

             <distributed-cache name="dist-sync" mode="SYNC" remote-timeout="300000" owners="2" segments="100">
                 <locking concurrency-level="1000" acquire-timeout="60000"/>
                 <transaction mode="NONE"/>
                 .......
             </distributed-cache>
             ......
          </cache-container>

           

           

          The first cluster node starts successfully; any additional node fails with:

           

          org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheConfigurationException: ISPN000512: Cannot acquire lock '/appl/infinispan-poc/home/tmp/persistent/___global.lck' for persistent global state
                   at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:271)
                   at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:678)
                   at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:343)
                   at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:311)
                   at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:298)
                   at ipoc.InfinispanNode.start(InfinispanNode.java:76)
                   at ipoc.Bootstrap.start(Bootstrap.java:96)
                   at ipoc.Bootstrap.main(Bootstrap.java:82)
          Caused by: org.infinispan.commons.CacheConfigurationException: ISPN000512: Cannot acquire lock '/appl/infinispan-poc/home/tmp/persistent/___global.lck' for persistent global state
                   at org.infinispan.globalstate.impl.GlobalStateManagerImpl.acquireGlobalLock(GlobalStateManagerImpl.java:81)
                   at org.infinispan.globalstate.impl.GlobalStateManagerImpl.start(GlobalStateManagerImpl.java:61)
                   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                   at java.lang.reflect.Method.invoke(Method.java:498)
                   at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:79)
                   at org.infinispan.commons.util.SecurityActions.doPrivileged(SecurityActions.java:71)
                   at org.infinispan.commons.util.SecurityActions.invokeAccessibly(SecurityActions.java:76)
                   at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:185)
                   at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:968)
                   at org.infinispan.factories.AbstractComponentRegistry.lambda$invokePrioritizedMethods$6(AbstractComponentRegistry.java:703)
                   at org.infinispan.factories.SecurityActions.lambda$run$1(SecurityActions.java:72)
                   at org.infinispan.security.Security.doPrivileged(Security.java:44)
                   at org.infinispan.factories.SecurityActions.run(SecurityActions.java:71)
                   at org.infinispan.factories.AbstractComponentRegistry.invokePrioritizedMethods(AbstractComponentRegistry.java:696)
                   at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:689)
                   at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:607)
                   at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:250)

          • 2. Re: How to shutdown an Infinispan cluster without losing data?
            Nikolai Sahattchiev Newbie

            Can anybody advise where to find documentation about the GlobalStateManager, or how to configure it properly?

            • 3. Re: How to shutdown an Infinispan cluster without losing data?
              Tristan Tarrant Master

              The key item in achieving this is the

               

              <global-state/>

               

              element. This assigns every node a persistent UUID, which is used to identify it after a restart.

              The persistent UUID, together with other cache state (aside from cache data), is stored in the path chosen by:

               

              <persistent-location path="/path/to/persistent/state" />

               

              This path must be unique per node, i.e. it cannot be shared by multiple nodes, be they on the same machine or different machines using a shared disk, such as NFS.
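
              For example (a hypothetical layout; Infinispan's XML parser supports `${property}` substitution, so each node can be started with e.g. `-Dnode.name=node1` to resolve a distinct directory):

              ```xml
              <!-- Hypothetical sketch: the persistent and temporary locations resolve
                   to a per-node path, while the shared location may be on common storage. -->
              <global-state>
                  <persistent-location path="/appl/infinispan-poc/home/${node.name}/persistent"/>
                  <shared-persistent-location path="/appl/infinispan-poc/home/shared"/>
                  <temporary-location path="/appl/infinispan-poc/home/${node.name}/tmp"/>
              </global-state>
              ```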

              If you want to shut down a cache gracefully, you must invoke the following method:

               

              cache.shutdown();

               

              This will invoke the shutdown on every node.

              In order for the cache to restart successfully, all original nodes must be present in the cluster view, i.e. if you had nodes A, B, C when you shut down the cache, the cluster must include exactly those nodes when starting up. Once a cache has been restarted, you can freely scale up/down.
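
              Put together, a minimal embedded sketch might look like this (hypothetical class, paths and cache names; assumes the Infinispan core dependency on the classpath and a configuration file containing `<global-state>` with a per-node unique persistent-location):

              ```java
              import org.infinispan.Cache;
              import org.infinispan.manager.DefaultCacheManager;

              // Hypothetical sketch of a graceful, cluster-wide shutdown.
              public class GracefulShutdown {
                  public static void main(String[] args) throws Exception {
                      // Reads the XML configuration, including the <global-state> element.
                      DefaultCacheManager manager = new DefaultCacheManager("infinispan.xml");
                      Cache<String, String> cache = manager.getCache("dist-sync");

                      // ... normal operation ...

                      // Graceful shutdown: persists the cache state across the whole
                      // cluster and does not trigger rebalancing on the remaining nodes,
                      // unlike cache.stop(), which only stops the local cache.
                      cache.shutdown();

                      // Stop this node's cache manager.
                      manager.stop();
                  }
              }
              ```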

              • 4. Re: How to shutdown an Infinispan cluster without losing data?
                Nikolai Sahattchiev Newbie

                 Thanks Tristan! My persistent-location path was not unique per node, which caused the exception above. I have changed the config and now everything works fine.