Infinispan node cannot join cache after crash
nsahattchiev  Mar 20, 2019 11:49 AM

Hi,
we have a distributed Infinispan cache running on 4 nodes, with global state and persistence enabled:
<global-state>
    <persistent-location path="rocksdb/${localNodeId}/persistent"/>
    <shared-persistent-location path="rocksdb/${localNodeId}/shared"/>
    <temporary-location path="rocksdb/${localNodeId}/tmp"/>
    <overlay-configuration-storage/>
</global-state>
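For readers who configure Infinispan programmatically, a minimal sketch of roughly the same global-state setup (assuming the Infinispan 9.x GlobalConfigurationBuilder API; the literal "node1" paths stand in for the ${localNodeId} placeholder) would be:

import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.globalstate.ConfigurationStorage;

// Enable global state so topology and configuration survive restarts.
// "rocksdb/node1/..." replaces the ${localNodeId} placeholder from the XML.
GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();
global.globalState().enable()
        .persistentLocation("rocksdb/node1/persistent")
        .sharedPersistentLocation("rocksdb/node1/shared")
        .temporaryLocation("rocksdb/node1/tmp")
        .configurationStorage(ConfigurationStorage.OVERLAY);

ConfigurationStorage.OVERLAY corresponds to <overlay-configuration-storage/>, which persists runtime-created cache configurations on top of the static XML.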
.....
<distributed-cache name="EiwoDistributedCache" mode="SYNC" remote-timeout="300000" owners="2" segments="100">
    <locking concurrency-level="1000" acquire-timeout="60000"/>
    <transaction mode="NONE"/>
    <persistence passivation="false">
        <rocksdbStore:rocksdb-store preload="true" fetch-state="true" path="rocksdb/${localNodeId}/data/">
            <rocksdbStore:expiration path="rocksdb/${localNodeId}/expired/"/>
        </rocksdbStore:rocksdb-store>
    </persistence>
    <indexing index="NONE"/>
    <state-transfer timeout="120000" await-initial-transfer="true"/>
</distributed-cache>
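The same cache definition expressed programmatically, again only as a sketch against the 9.x builder API (the "node1" paths once more replace ${localNodeId}):

import java.util.concurrent.TimeUnit;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.cache.Index;
import org.infinispan.persistence.rocksdb.configuration.RocksDBStoreConfigurationBuilder;
import org.infinispan.transaction.TransactionMode;

// Distributed, synchronous cache with 2 owners and 100 segments,
// backed by a non-shared RocksDB store on each node.
ConfigurationBuilder cfg = new ConfigurationBuilder();
cfg.clustering().cacheMode(CacheMode.DIST_SYNC)
        .remoteTimeout(300000)
        .hash().numOwners(2).numSegments(100)
        .stateTransfer().timeout(120000).awaitInitialTransfer(true);
cfg.locking().concurrencyLevel(1000)
        .lockAcquisitionTimeout(60000, TimeUnit.MILLISECONDS);
cfg.transaction().transactionMode(TransactionMode.NON_TRANSACTIONAL);
cfg.indexing().index(Index.NONE);
cfg.persistence().passivation(false)
        .addStore(RocksDBStoreConfigurationBuilder.class)
        .preload(true)
        .fetchPersistentState(true)
        .location("rocksdb/node1/data/")
        .expiredLocation("rocksdb/node1/expired/");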
After an out-of-memory error on node 2, the whole cluster ended up in an unstable state, so we tried to restart it. Nodes 1, 2 and 4 started without any issues, but node 3 always failed with the following exception:
org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl
at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:83)
at org.infinispan.commons.util.SecurityActions.doPrivileged(SecurityActions.java:71)
at org.infinispan.commons.util.SecurityActions.invokeAccessibly(SecurityActions.java:76)
at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:185)
at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:968)
at org.infinispan.factories.AbstractComponentRegistry.lambda$invokePrioritizedMethods$6(AbstractComponentRegistry.java:703)
at org.infinispan.factories.SecurityActions.lambda$run$1(SecurityActions.java:72)
at org.infinispan.security.Security.doPrivileged(Security.java:44)
at org.infinispan.factories.SecurityActions.run(SecurityActions.java:71)
at org.infinispan.factories.AbstractComponentRegistry.invokePrioritizedMethods(AbstractComponentRegistry.java:696)
at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:689)
at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:607)
at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:244)
at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1051)
at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:421)
at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:646)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:591)
at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:477)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:463)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:449)
..........
Caused by: org.infinispan.topology.CacheJoinException: ISPN000410: Node eiwopoc-14554 attempting to join cache EiwoDistributedCache with incompatible state
at org.infinispan.topology.ClusterCacheStatus.addMember(ClusterCacheStatus.java:233)
at org.infinispan.topology.ClusterCacheStatus.doJoin(ClusterCacheStatus.java:692)
at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:212)
at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:178)
at org.infinispan.topology.CacheTopologyControlCommand.invokeAsync(CacheTopologyControlCommand.java:160)
at org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.invokeReplicableCommand(GlobalInboundInvocationHandler.java:169)
at org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.runReplicableCommand(GlobalInboundInvocationHandler.java:150)
at org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.lambda$handleReplicableCommand$1(GlobalInboundInvocationHandler.java:144)
at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl$RunnableWrapper.run(BlockingTaskAwareExecutorServiceImpl.java:212)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
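For context, the bottom frames of the trace show that the failure is triggered by a plain getCache() call during startup. A minimal sketch of that call path (the class and configuration file name here are hypothetical, not our actual code):

import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;

public class NodeStartup {
    public static void main(String[] args) throws Exception {
        // Requesting the cache starts it, which invokes
        // StateTransferManagerImpl.start() and, on node 3 only,
        // fails with the ISPN000410 CacheJoinException above.
        DefaultCacheManager manager = new DefaultCacheManager("infinispan.xml");
        Cache<String, Object> cache = manager.getCache("EiwoDistributedCache");
        System.out.println("Cache started: " + cache.getName());
    }
}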
How can we get node 3 up and running again without losing any data? We are using Infinispan version 9.3.6.Final.
Regards
Nikolai