2 Replies Latest reply on Jul 6, 2011 4:49 AM by galder.zamarreno

    Distributed Cache - Join Aborts

    daver32

      We are running Infinispan 4.2.1.FINAL within JBoss AS 5.1.0.GA on two systems.  Our application accepts JMS messages and places them in the Infinispan cache.  Listeners to the cache on each system awaken and try to lock the just added node, process it and delete it.  The plan is to be handling many JMS messages per second.

       

      The trouble we've had is that upon the arrival of the first JMS message, Infinispan's attempt to Join the two instances fails, logging:

       

       

      2011-07-01 12:54:22,876 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (WorkManager(2)-3) Starting JGroups Channel

      2011-07-01 12:54:22,907 INFO  [org.jgroups.JChannel] (WorkManager(2)-3) JGroups version: 2.12.0.CR5

      2011-07-01 12:54:22,907 DEBUG [org.jgroups.conf.ClassConfigurator] (WorkManager(2)-3) Using jg-magic-map.xml as magic number file and jg-protocol-ids.xml for protocol IDs

      2011-07-01 12:54:23,704 DEBUG [org.jgroups.protocols.FD] (Timer-1,xxx.xxx.xxx.61:7900) sending are-you-alive msg to xxx.xxx.xxx.111:7900 (own address=xxx.xxx.xxx.61:7900)

      2011-07-01 12:54:23,766 DEBUG [org.jgroups.protocols.FD] (Timer-6,xxx.xxx.xxx.61:1474) sending are-you-alive msg to xxx.xxx.xxx.111:1660 (own address=xxx.xxx.xxx.61:1474)

      2011-07-01 12:54:23,766 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) changed role to org.jgroups.protocols.pbcast.ClientGmsImpl

      2011-07-01 12:54:23,797 DEBUG [org.jgroups.stack.Configurator] (WorkManager(2)-3) set property TCP.diagnostics_addr to default value /224.0.75.75

      2011-07-01 12:54:23,813 DEBUG [org.jgroups.protocols.FRAG2] (WorkManager(2)-3) received CONFIG event: {bind_addr=/xxx.xxx.xxx.61}

      2011-07-01 12:54:23,813 DEBUG [org.jgroups.protocols.MPING] (WorkManager(2)-3) bind_addr=/xxx.xxx.xxx.61 mcast_addr=/229.11.12.12, mcast_port=45699

      2011-07-01 12:54:23,829 DEBUG [org.jgroups.protocols.FRAG2] (WorkManager(2)-3) received CONFIG event: {flush_supported=true}

      2011-07-01 12:54:23,844 WARN  [org.jgroups.protocols.UDP] (Incoming-19,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.111:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded

      2011-07-01 12:54:23,844 WARN  [org.jgroups.protocols.UDP] (Incoming-16,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.61:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded

      2011-07-01 12:54:24,172 DEBUG [org.jgroups.protocols.FD] (Timer-10,xxx.xxx.xxx.61:1474) sending are-you-alive msg to xxx.xxx.xxx.111:1660 (own address=xxx.xxx.xxx.61:1474)

      2011-07-01 12:54:24,219 DEBUG [org.jgroups.protocols.FD] (Timer-6,xxx.xxx.xxx.61:1474) sending are-you-alive msg to xxx.xxx.xxx.111:1660 (own address=xxx.xxx.xxx.61:1474)

      2011-07-01 12:54:25,360 WARN  [org.jgroups.protocols.UDP] (Incoming-18,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.61:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded

      2011-07-01 12:54:26,860 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) initial_mbrs are [own_addr=SYSB-38243, view id=null, is_server=false, is_coord=false, logical_name=SYSB-38243, physical_addrs=xxx.xxx.xxx.111:1234]

      2011-07-01 12:54:26,860 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) election results: {}

      2011-07-01 12:54:27,016 WARN  [org.jgroups.protocols.UDP] (Incoming-1,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.111:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded

      2011-07-01 12:54:27,360 WARN  [org.jgroups.protocols.UDP] (Incoming-20,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.61:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded

      2011-07-01 12:54:27,360 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) initial_mbrs are [own_addr=SYSB-38243, view id=[SYSB-38243|0], is_server=true, is_coord=true, logical_name=SYSB-38243, physical_addrs=xxx.xxx.xxx.111:1234]

      2011-07-01 12:54:27,360 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) election results: {SYSB-38243=1}

      2011-07-01 12:54:27,360 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) sending handleJoin(SYSA-18253) to SYSB-38243

      2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.NAKACK] (WorkManager(2)-3)

      [setDigest()]

      existing digest:  []

      new digest:       SYSB-38243: [0 : 2 (2)], SYSA-18253: [0 : 0]

      resulting digest: SYSB-38243: [0 : 2 (2)], SYSA-18253: [0 : 0 (0)]

      2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) [SYSA-18253]: JoinRsp=[SYSB-38243|1] [SYSB-38243, SYSA-18253] [size=2]

       

       

      2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) new_view=[SYSB-38243|1] [SYSB-38243, SYSA-18253]

      2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) SYSA-18253: view is [SYSB-38243|1] [SYSB-38243, SYSA-18253]

      2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.FD_SOCK] (WorkManager(2)-3) VIEW_CHANGE received: [SYSB-38243, SYSA-18253]

      2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.STABLE] (WorkManager(2)-3) [ergonomics] setting max_bytes to 800KB (2 members)

      2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,GSIN_DJR_WAS_HERE,SYSA-18253) ping_dest is SYSB-38243, pingable_mbrs=[SYSB-38243, SYSA-18253]

      2011-07-01 12:54:27,438 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (WorkManager(2)-3) Received new cluster view: [SYSB-38243|1] [SYSB-38243, SYSA-18253]

      2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.FLUSH] (WorkManager(2)-3) SYSA-18253: installing view [SYSB-38243|1] [SYSB-38243, SYSA-18253]

      2011-07-01 12:54:27,438 DEBUG [org.jgroups.protocols.pbcast.GMS] (WorkManager(2)-3) SYSA-18253:

      2011-07-01 12:54:27,532 DEBUG [org.jgroups.protocols.pbcast.FLUSH] (Incoming-2,GSIN_DJR_WAS_HERE,SYSA-18253) SYSA-18253: received STOP_FLUSH, unblocking FLUSH.down() and sending UNBLOCK up

      2011-07-01 12:54:27,532 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (WorkManager(2)-3) Cache local address is SYSA-18253, physical addresses are [xxx.xxx.xxx.61:1234]

      2011-07-01 12:54:27,532 INFO  [org.infinispan.factories.GlobalComponentRegistry] (WorkManager(2)-3) Infinispan version: Infinispan 'Ursus' 4.2.1.FINAL

      2011-07-01 12:54:27,563 ERROR [org.infinispan.distribution.JoinTask] (Rehasher-SYSA-18253) Caught exception!  Aborting join.

      org.infinispan.CacheException: java.lang.RuntimeException: Failure to marshal argument(s)

          at org.infinispan.util.Util.rewrapAsCacheException(Util.java:468)

          at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:118)

          at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:407)

          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:102)

          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:126)

          at org.infinispan.distribution.JoinTask.retrieveOldConsistentHash(JoinTask.java:183)

          at org.infinispan.distribution.JoinTask.getPermissionToJoin(JoinTask.java:66)

          at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:86)

          at org.infinispan.distribution.RehashTask.call(RehashTask.java:75)

          at org.infinispan.distribution.RehashTask.call(RehashTask.java:41)

          at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)

          at java.util.concurrent.FutureTask.run(Unknown Source)

          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

          at java.lang.Thread.run(Unknown Source)

      Caused by: java.lang.RuntimeException: Failure to marshal argument(s)

          at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.marshallCall(CommandAwareRpcDispatcher.java:210)

          at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.call(CommandAwareRpcDispatcher.java:248)

          at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:116)

          ... 13 more

      Caused by: java.lang.NullPointerException

          at org.infinispan.marshall.jboss.GenericJBossMarshaller$1.initialValue(GenericJBossMarshaller.java:75)

          at org.infinispan.marshall.jboss.GenericJBossMarshaller$1.initialValue(GenericJBossMarshaller.java:71)

          at java.lang.ThreadLocal.setInitialValue(Unknown Source)

          at java.lang.ThreadLocal.get(Unknown Source)

          at org.infinispan.marshall.jboss.GenericJBossMarshaller.startObjectOutput(GenericJBossMarshaller.java:130)

          at org.infinispan.marshall.VersionAwareMarshaller.startObjectOutput(VersionAwareMarshaller.java:126)

          at org.infinispan.marshall.VersionAwareMarshaller.objectToBuffer(VersionAwareMarshaller.java:91)

          at org.infinispan.marshall.AbstractMarshaller.objectToBuffer(AbstractMarshaller.java:31)

          at org.infinispan.remoting.transport.jgroups.MarshallerAdapter.objectToBuffer(MarshallerAdapter.java:22)

          at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$ReplicationTask.marshallCall(CommandAwareRpcDispatcher.java:208)

          ... 15 more

       

       

      One type of error that jumps out relates to the Incoming-1,xxx.xxx.xxx.61:1474) packet from xxx.xxx.xxx.111:45699 has different version (0.8.1) from ours (2.6.10). Packet is discarded

      We've not had this the entire time, but we've been trying many different configuation parameters.  Can anyone tell us what we need to do to get rid of this?

       

      Clearly, the big question is why is the Join failing?  Here is our configuration information:

          GlobalConfiguration gc = GlobalConfiguration.getClusteredDefault();

          gc.setClusterName( "DJR_WAS_HERE" );

          Configuration c = new Configuration();

          c.setCacheMode( Configuration.CacheMode.DIST_SYNC);

          Properties p = new Properties();

          p.setProperty("configurationFile", "jgroups-tcp.xml");

          gc.setTransportProperties(p);

          c.setTransactionManagerLookupClass(

              "org.infinispan.transaction.lookup.GenericTransactionManagerLookup" );

          EmbeddedCacheManager cm = new DefaultCacheManager( gc, c );

       

      We start JBoss with:

      %JBOSS_HOME%\bin\run.bat -c all -b xxx.xxx.xxx.61 -Djgroups.udp.mcast_addr=229.11.12.12 -Djgroups.udp.mcast_port=45699 -Djgroups.tcp.address=xxx.xxx.xxx.61 -Djgroups.tcp.port=1234 -g PROJ_Cluster_1_Partition -Djboss.service.binding.set=ports-01 -Djboss.messaging.ServerPeerID=67

       

      We have also verified that multicasting is working via the testers:

       

      C:\jboss-5.1.0.GA>java -cp server\all\lib\jgroups.jar org.jgroups.tests.McastReceiverTest -mcast_addr 229.11.12.12 -port 45699  C:\jboss-5.1.0.GA>java -cp server\all\lib\jgroups.jar org.jgroups.tests.McastSenderTest -mcast_addr 229.11.12.12 -port 45699

      The receivers saw the input from the sender on both nodes in the cluster very quickly and without issue.