Node cannot join cluster correctly after restart
maruta.s Oct 6, 2014 4:35 AM

Hello,
we are facing a problem where, after an unexpected server restart, the restarted node is not able to join the cluster again.
Our environment is a 2-node cluster of Infinispan REST servers. One node (node1) was restarted, and after that it could not rejoin the cluster until the second node (node2) was also restarted:
Logs from node1:
After the restart there are many errors regarding transactions:
Sep 29, 2014 3:03:53 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000080: Disconnecting and closing JGroups Channel
Sep 29, 2014 3:03:54 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000082: Stopping the RpcDispatcher
Sep 29, 2014 3:05:20 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport start
INFO: ISPN000078: Starting JGroups Channel
Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770, LOUAPPWPS983-56765]
Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport startJGroupsChannelIfNeeded
INFO: ISPN000079: Cache local address is LOUAPPWPS983-56765, physical addresses are [133.27.18.204:7800]
Sep 29, 2014 3:05:21 AM org.infinispan.factories.GlobalComponentRegistry start
INFO: ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.6.Final
Sep 29, 2014 3:05:22 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:05:22 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:05:23 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:05:23 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:05:23 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:05:23 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:06:18 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
Sep 29, 2014 3:06:38 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
INFO: ISPN000093: Received new, MERGED cluster view: MergeView::[LOUAPPWPS983-56765|17] [LOUAPPWPS983-56765, LOUAPPWPS984-56770], subgroups=[LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770], [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000080: Disconnecting and closing JGroups Channel
Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000082: Stopping the RpcDispatcher
Sep 29, 2014 3:08:45 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport start
INFO: ISPN000078: Starting JGroups Channel
Sep 29, 2014 3:08:46 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|19] [LOUAPPWPS984-56770, LOUAPPWPS983-54866]
Sep 29, 2014 3:08:47 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport startJGroupsChannelIfNeeded
INFO: ISPN000079: Cache local address is LOUAPPWPS983-54866, physical addresses are [133.27.18.204:7800]
Sep 29, 2014 3:08:47 AM org.infinispan.factories.GlobalComponentRegistry start
INFO: ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.6.Final
Sep 29, 2014 3:08:47 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 3:08:47 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 3:12:47 AM org.apache.catalina.core.ApplicationContext log
SEVERE: StandardWrapper.Throwable
org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:886)
at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:657)
at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:646)
at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:217)
at org.infinispan.CacheImpl.start(CacheImpl.java:582)
at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:686)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:649)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:545)
at org.infinispan.rest.StartupListener$$anonfun$init$1.apply(StartupListener.scala:66)
at org.infinispan.rest.StartupListener$$anonfun$init$1.apply(StartupListener.scala:65)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1156)
at org.infinispan.rest.StartupListener.init(StartupListener.scala:65)
at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1280)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1193)
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1088)
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:5176)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5460)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:633)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1120)
at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1678)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index on LOUAPPWPS983-54866
at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:216)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
... 30 more
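As a side note on the timing: the `Initial state transfer timed out` exception at 3:12:47 comes almost exactly four minutes after the cache started at 3:08:47, which matches Infinispan's default state-transfer timeout of 240 000 ms enforced by `waitForInitialStateTransferToComplete()`. If one wanted to rule out a merely slow (rather than stuck) transfer, the timeout can be raised per cache. The fragment below is a hypothetical sketch assuming the Infinispan 5.2 XML schema and a distributed cache mode (the actual mode is in the attached configuration files, which I have not seen):

```xml
<!-- Hypothetical fragment (Infinispan 5.2 schema, mode assumed):
     raises the initial state-transfer timeout from the default
     240000 ms to 10 minutes. This only helps if the transfer is
     slow; it does not fix a transfer that is permanently stuck. -->
<namedCache name="eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index">
  <clustering mode="distribution">
    <stateTransfer fetchInMemoryState="true" timeout="600000"/>
  </clustering>
</namedCache>
```

In our case, though, the repeated SUSPECT and "no physical address" warnings later in the log suggest the transfer was stuck rather than slow, so this is probably a workaround at best.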
....
Followed by:
Sep 29, 2014 4:29:06 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS983-54866|20] [LOUAPPWPS983-54866]
Sep 29, 2014 4:29:07 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:29:10 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:29:12 AM org.infinispan.factories.TransactionManagerFactory construct
INFO: ISPN000161: Using a batchMode transaction manager
Sep 29, 2014 4:29:12 AM org.infinispan.jmx.CacheJmxRegistration start
INFO: ISPN000031: MBeans were successfully registered to the platform MBean server.
Sep 29, 2014 4:29:13 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:29:16 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:29:19 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:29:22 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:29:46 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:29:49 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:29:52 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
...
Sep 29, 2014 4:30:44 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:30:47 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:30:50 AM org.jgroups.logging.JDKLogImpl warn
WARNING: I was suspected by LOUAPPWPS984-56770; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
Sep 29, 2014 4:30:52 AM org.jgroups.logging.JDKLogImpl warn
WARNING: LOUAPPWPS983-54866: no physical address for LOUAPPWPS984-56770, dropping message
Sep 29, 2014 4:30:52 AM org.jgroups.logging.JDKLogImpl warn
WARNING: LOUAPPWPS983-54866: no physical address for LOUAPPWPS984-56770, dropping message
Sep 29, 2014 4:30:53 AM org.jgroups.logging.JDKLogImpl warn
....
(the warning keeps repeating)
Logs from node2:
The second node was not restarted:
Sep 29, 2014 3:05:21 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770, LOUAPPWPS983-56765]
Sep 29, 2014 3:06:38 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
INFO: ISPN000093: Received new, MERGED cluster view: MergeView::[LOUAPPWPS983-56765|17] [LOUAPPWPS983-56765, LOUAPPWPS984-56770], subgroups=[LOUAPPWPS984-56770|15] [LOUAPPWPS984-56770], [LOUAPPWPS983-56765|16] [LOUAPPWPS983-56765]
Sep 29, 2014 3:07:33 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|18] [LOUAPPWPS984-56770]
Sep 29, 2014 3:07:33 AM org.infinispan.topology.ClusterTopologyManagerImpl handleNewView
ERROR: ISPN000196: Failed to recover cluster state after the current node became the coordinator
java.lang.IllegalStateException: Trying to set a topology with invalid members for cache eu.ysoft.safeq.core.cache.entity.CacheableJobInfo_index: members = [LOUAPPWPS984-56770], topology = CacheTopology{id=37, currentCH=DefaultConsistentHash{numSegments=60, numOwners=1, members=[LOUAPPWPS984-56770, LOUAPPWPS983-56765]}, pendingCH=null}
at org.infinispan.topology.ClusterCacheStatus.updateCacheTopology(ClusterCacheStatus.java:165)
at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheStatusAfterMerge(ClusterTopologyManagerImpl.java:315)
at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:236)
at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:579)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:212)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Sep 29, 2014 3:08:46 AM org.infinispan.remoting.transport.jgroups.JGroupsTransport viewAccepted
INFO: ISPN000094: Received new cluster view: [LOUAPPWPS984-56770|19] [LOUAPPWPS984-56770, LOUAPPWPS983-54866]
Sep 29, 2014 3:12:48 AM org.infinispan.statetransfer.StateProviderImpl getCacheTopology
WARN: ISPN000211: Transactions were requested by node LOUAPPWPS983-54866 with topology 30, older than the local topology (35)
Sep 29, 2014 3:12:48 AM org.infinispan.statetransfer.StateProviderImpl getCacheTopology
WARN: ISPN000212: Segments were requested by node LOUAPPWPS983-54866 with topology 30, older than the local topology (35)
Sep 29, 2014 12:42:27 PM org.infinispan.transaction.TransactionTable shutDownGracefully
WARN: ISPN000100: Stopping, but there are 0 local transactions and 16 remote transactions that did not finish in time.
Sep 29, 2014 12:42:27 PM org.infinispan.remoting.transport.jgroups.JGroupsTransport stop
INFO: ISPN000080: Disconnecting and closing JGroups Channel
Sep 29, 2014 12:42:29 PM org.jgroups.logging.JDKLogImpl warn
WARNING: LOUAPPWPS984-56770: failed to collect all ACKs (expected=1) for view [LOUAPPWPS983-54866|20] after 2000ms, missing ACKs from [LOUAPPWPS983-54866]
Sep 29, 2014 12:42:30 PM org.jgroups.logging.JDKLogImpl error
SEVERE: couldn't deliver OOB message [dst: LOUAPPWPS984-56770, src: LOUAPPWPS983-54866 (3 headers), size=1396 bytes, flags=OOB|DONT_BUNDLE]
java.lang.IllegalStateException: channel is not connected
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:615)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:544)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
at org.jgroups.JChannel.up(JChannel.java:707)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
at org.jgroups.protocols.RSVP.up(RSVP.java:188)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:245)
at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:765)
at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:420)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
at org.jgroups.protocols.FD.up(FD.java:253)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
at org.jgroups.protocols.Discovery.up(Discovery.java:359)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1263)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1825)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1798)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Until node1 receives the new view at 4:29, in which it is the only node in the cluster, node1 cannot read or write data to the Infinispan cache.
After 4:29 the cache is usable, but the data are not replicated, as the cluster is apparently still disconnected.
Could someone explain why the cluster was not working from node1's restart at 3:08 until 4:29?
And how can we prevent or resolve this issue?
Full logs and infinispan-rest-server configuration are attached.
Thanks
Marta
- node1WithRestart.txt.zip 16.7 KB
- node2WithoutRestart.log.zip 2.0 KB
- configuration.txt.zip 705 bytes
- node1-configuration.xml 2.1 KB
- node2-configuration.xml 2.1 KB