Only one node refuses to join an Infinispan cluster
il_pizzaiolo Jan 28, 2015 5:12 PM
This is a real head-scratcher, so I'm looking for some help. My customer has three nodes with more-or-less identical Infinispan and JGroups configs (see below); the application code that uses Infinispan is also the same across all three nodes. Each node's jgroups-tcp.xml uses its own host name in '<TCP bind_addr="${jgroups.bind_addr:<FQDN>}"', but the TCPPING 'initial_hosts' is identical on every node. I've attached a sample jgroups-tcp.xml.
The behavior is this:
1) Start node-1.
2) Start node-2. Node-1 and node-2 see each other and join.
3) Start node-3. Node-1 and node-2 log that they see it: ISPN000094: Received new cluster view: [<node-1's name>|6] (3) [<node-1's name>, <node-2's name>, <node-3's name>]
4) At about this time, node-3 logs the following complaint:
<TIMESTAMP> - [ERROR] - from org.jgroups.protocols.TCP in OOB-1,shared=tcp
JGRP000030: null: failed handling incoming message: java.lang.NoSuchFieldError: serializedCreator
5) Exactly 4 minutes later, node-3 reports that it can't start Infinispan and goes down:
<TIMESTAMP> - [ERROR] - from application in main
Exception occured in InfinispanPlugin.onStartUnable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl
org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl
at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:185) ~[org.infinispan.infinispan-commons-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:869) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:638) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:627) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:530) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:216) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.CacheImpl.start(CacheImpl.java:675) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:553) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:516) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:398) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
...
Caused by: org.infinispan.util.concurrent.TimeoutException: Node <node-1's name> timed out
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:174) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:521) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinator(LocalTopologyManagerImpl.java:287) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.topology.LocalTopologyManagerImpl.join(LocalTopologyManagerImpl.java:100) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.statetransfer.StateTransferManagerImpl.start(StateTransferManagerImpl.java:100) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
...
Caused by: org.jgroups.TimeoutException: timeout sending message to <node-1's name>
at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:419) ~[org.jgroups.jgroups-3.4.1.Final.jar:3.4.1.Final]
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:353) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:167) ~[org.infinispan.infinispan-core-6.0.2.Final.jar:6.0.2.Final]
It's not specific to node-1 either: across repeated starts and restarts of all nodes, node-3 sometimes fails trying to talk to node-2 instead.
I think the core issue is the initial 'java.lang.NoSuchFieldError: serializedCreator', which points to a problem with node-3 unmarshalling objects from its cluster mates. However, all three nodes have the same versions of:
org.infinispan.infinispan-core-6.0.2.Final.jar
org.jboss.marshalling.jboss-marshalling-river-1.4.4.Final.jar
org.jboss.marshalling.jboss-marshalling-1.4.4.Final.jar
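Since a NoSuchFieldError generally means some class was compiled against a different version of another class than the one loaded at runtime, one thing worth checking on node-3 is which jar a suspect class is actually loaded from and whether more than one copy is visible. The following is only a rough diagnostic sketch using plain JDK reflection; the class name ClasspathProbe and its helper methods are made up for illustration, and I don't know which marshalling class is supposed to own 'serializedCreator', so the class to inspect is passed in as an argument.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Infinispan, JGroups, or JBoss Marshalling.
public class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or "unknown" (e.g. JDK bootstrap classes). */
    static String loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "unknown";
        }
        return source.getLocation().toString();
    }

    /** Lists every copy of the class file the loader can see; more than one hints at duplicate jars. */
    static List<URL> copiesOnClasspath(String className, ClassLoader loader) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    /** Reports whether the class or any of its superclasses declares a field with the given name. */
    static boolean declaresField(Class<?> type, String fieldName) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                c.getDeclaredField(fieldName);
                return true;
            } catch (NoSuchFieldException e) {
                // Not declared here; keep walking up the hierarchy.
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ClasspathProbe <class-name> <field-name>");
            return;
        }
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        // Load without initializing, so static initializers in the probed class do not run.
        Class<?> type = Class.forName(args[0], false, loader);
        System.out.println("loaded from:    " + loadedFrom(type));
        System.out.println("copies visible: " + copiesOnClasspath(args[0], loader));
        System.out.println("declares '" + args[1] + "': " + declaresField(type, args[1]));
    }
}
```

Running it with node-3's exact application classpath, for example against org.jboss.marshalling.river.RiverUnmarshaller and serializedCreator (that class is just my guess at a plausible candidate), and comparing the output with node-1 should show whether node-3 is picking up a stray or duplicate marshalling jar.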
I captured concurrent network traces on all three nodes, and they show that node-3 does communicate with node-1 (in the scenario above). This problem appeared out of the blue but has been consistent for about a week. Does anyone have any guesses as to what the issue could be?
Attachment: jgroups-tcp_node-3.xml (2.7 KB)