1 Reply · Latest reply on Jul 22, 2010 11:00 AM by galder.zamarreno

    5 node cluster - exceptions

    kapilnayar1

      I have a cluster (5 nodes) configured for distribution mode with L1 caching and sync operation (Infinispan 4.1.0.BETA2).
      The cluster nodes are running on Windows Server 2003 VMs (3 nodes on VM1 and 2 nodes on VM2) with JGroups configured for TCP.
      A single cache instance was created on all nodes, and the 5 nodes seemed to connect successfully.
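
      For reference, the setup is roughly equivalent to the programmatic configuration below (a sketch against the 4.1 setter API; the JGroups stack file name is a placeholder, not my actual file):

      import java.util.Properties;
      import org.infinispan.config.Configuration;
      import org.infinispan.config.GlobalConfiguration;
      import org.infinispan.manager.DefaultCacheManager;
      import org.infinispan.manager.EmbeddedCacheManager;

      public final class ClusterSetup {
          public static EmbeddedCacheManager start() {
              // Clustered defaults, with the transport pointed at a TCP JGroups stack.
              GlobalConfiguration gc = GlobalConfiguration.getClusteredDefault();
              Properties props = new Properties();
              props.setProperty("configurationFile", "jgroups-tcp.xml"); // placeholder name
              gc.setTransportProperties(props);

              // Distribution mode, synchronous operation, L1 caching enabled.
              Configuration c = new Configuration();
              c.setCacheMode(Configuration.CacheMode.DIST_SYNC);
              c.setL1CacheEnabled(true);

              return new DefaultCacheManager(gc, c);
          }
      }
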
      I left the cluster running overnight without any application/cache activity and noticed the following messages and exceptions the next morning:

      The JMX RpcManager statistics still show the cluster size as 5.
      I need to understand whether these exceptions would have corrupted the cache/cache manager, or whether the problem is only transient.
      Any comments/observations are appreciated.
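
      For what it's worth, this is how I cross-check the view from code (a sketch against the 4.1 EmbeddedCacheManager API; the class and method names are mine):

      import org.infinispan.manager.EmbeddedCacheManager;

      public final class ViewCheck {
          // Prints the cluster view Infinispan sees, to compare with the JMX figure.
          public static void printView(EmbeddedCacheManager manager) {
              System.out.println("Local address: " + manager.getAddress());
              System.out.println("Coordinator:   " + manager.getCoordinator());
              System.out.println("Members:       " + manager.getMembers()); // expect 5 entries
          }
      }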

      Thanks,
      Kapil

       

      2010-07-20 04:28:44,180 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-29440|5] [VM1-29440, VM1-57619, VM1-57675]
      2010-07-20 04:28:44,195 WARN  [NAKACK] VM1-57619: dropped message from VM2-33071 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-29440|5] [VM1-29440, VM1-57619, VM1-57675]
      2010-07-20 04:28:44,242 WARN  [NAKACK] VM1-57619: dropped message from VM2-33071 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-29440|5] [VM1-29440, VM1-57619, VM1-57675]
      2010-07-20 04:28:44,258 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-29440|5] [VM1-29440, VM1-57619, VM1-57675]
      2010-07-20 04:28:44,273 WARN  [NAKACK] VM1-57619: dropped message from VM2-33071 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-29440|5] [VM1-29440, VM1-57619, VM1-57675]
      2010-07-20 04:28:44,570 WARN  [FD_SOCK] I (VM1-57619) was suspected by VM2-62323; ignoring the SUSPECT message
      2010-07-20 04:48:45,124 ERROR [JoinTask] Caught exception!
      org.infinispan.CacheException: Unable to retrieve old consistent hash from coordinator even after several attempts at sleeping and retrying!
              at org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:191)
              at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:83)
              at org.infinispan.distribution.RehashTask.call(RehashTask.java:52)
              at org.infinispan.distribution.RehashTask.call(RehashTask.java:32)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
      2010-07-20 05:51:27,370 WARN  [FD] I was suspected by VM2-62323; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      2010-07-20 05:51:28,588 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:29,557 WARN  [FD_SOCK] I (VM1-57619) was suspected by VM2-62323; ignoring the SUSPECT message
      2010-07-20 05:51:30,370 WARN  [FD] I was suspected by VM2-62323; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      2010-07-20 05:51:30,370 WARN  [TCP] VM1-57619: no physical address for VM2-33071, dropping message
      2010-07-20 05:51:30,604 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:30,651 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:30,698 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:30,713 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:33,354 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:39,026 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:39,042 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:40,042 WARN  [FD_SOCK] I (VM1-57619) was suspected by VM2-62323; ignoring the SUSPECT message
      2010-07-20 06:11:40,362 ERROR [JoinTask] Caught exception!
      org.infinispan.CacheException: Unable to retrieve old consistent hash from coordinator even after several attempts at sleeping and retrying!
              at org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:191)
              at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:83)
              at org.infinispan.distribution.RehashTask.call(RehashTask.java:52)
              at org.infinispan.distribution.RehashTask.call(RehashTask.java:32)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
        • 1. Re: 5 node cluster - exceptions
          galder.zamarreno

          It looks like there was a network split between the nodes on VM1 and VM2. It's hard to say what caused it, but it would have affected Infinispan, because two islands were most likely formed: one with the VM1 nodes and the other with the VM2 ones. Can you attach your JGroups configuration? If you run the same test again, enable TRACE logging on the org.jgroups package from the start to gather more information.
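
          If it helps, TRACE can also be switched on programmatically at startup (a log4j 1.x sketch, assuming that's your logging setup; an equivalent category entry in log4j.xml works just as well):

          import org.apache.log4j.Level;
          import org.apache.log4j.Logger;

          public final class TraceJGroups {
              // Raise the org.jgroups category to TRACE before starting the cache manager.
              public static void enable() {
                  Logger.getLogger("org.jgroups").setLevel(Level.TRACE);
              }
          }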
