1 Reply · Latest reply on Jul 22, 2010 11:00 AM by galder.zamarreno

    5 node cluster - exceptions

    kapilnayar1

      I have a cluster (5 nodes) configured for distribution mode with L1 caching and sync operation (Infinispan 4.1.0.BETA2).
      The cluster nodes are running on Windows Server 2003 VMs (3 nodes on VM1 and 2 nodes on VM2) with JGroups configured for TCP.
      A single cache instance was created on all nodes, and the 5 nodes seemed to connect successfully.
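
      For reference, the setup is roughly equivalent to the programmatic configuration below (a sketch against the 4.1 setter API; the JGroups stack file name is a placeholder, not my actual file):

      import java.util.Properties;
      import org.infinispan.config.Configuration;
      import org.infinispan.config.GlobalConfiguration;
      import org.infinispan.manager.DefaultCacheManager;
      import org.infinispan.manager.EmbeddedCacheManager;

      public final class ClusterSetup {
          public static EmbeddedCacheManager start() {
              // Clustered defaults, with the transport pointed at a TCP JGroups stack.
              GlobalConfiguration gc = GlobalConfiguration.getClusteredDefault();
              Properties props = new Properties();
              props.setProperty("configurationFile", "jgroups-tcp.xml"); // placeholder name
              gc.setTransportProperties(props);

              // Distribution mode, synchronous operation, L1 caching enabled.
              Configuration c = new Configuration();
              c.setCacheMode(Configuration.CacheMode.DIST_SYNC);
              c.setL1CacheEnabled(true);

              return new DefaultCacheManager(gc, c);
          }
      }
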
      I left the cluster running overnight without any application/cache activity and noticed the following messages and exceptions the next morning:

      The JMX RpcManager statistics still show the cluster size as 5.
      I need to understand whether these exceptions would have corrupted the cache/cache manager, or whether the problem is only transient.
      Any comments/observations are appreciated.
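
      For what it's worth, this is how I cross-check the view from code (a sketch against the 4.1 EmbeddedCacheManager API; the class and method names are mine):

      import org.infinispan.manager.EmbeddedCacheManager;

      public final class ViewCheck {
          // Prints the cluster view Infinispan sees, to compare with the JMX figure.
          public static void printView(EmbeddedCacheManager manager) {
              System.out.println("Local address: " + manager.getAddress());
              System.out.println("Coordinator:   " + manager.getCoordinator());
              System.out.println("Members:       " + manager.getMembers()); // expect 5 entries
          }
      }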

      Thanks,
      Kapil

       

      2010-07-20 04:28:44,180 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-29440|5] [VM1-29440, VM1-57619, VM1-57675]
      2010-07-20 04:28:44,195 WARN  [NAKACK] VM1-57619: dropped message from VM2-33071 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-29440|5] [VM1-29440, VM1-57619, VM1-57675]
      2010-07-20 04:28:44,242 WARN  [NAKACK] VM1-57619: dropped message from VM2-33071 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-29440|5] [VM1-29440, VM1-57619, VM1-57675]
      2010-07-20 04:28:44,258 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-29440|5] [VM1-29440, VM1-57619, VM1-57675]
      2010-07-20 04:28:44,273 WARN  [NAKACK] VM1-57619: dropped message from VM2-33071 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-29440|5] [VM1-29440, VM1-57619, VM1-57675]
      2010-07-20 04:28:44,570 WARN  [FD_SOCK] I (VM1-57619) was suspected by VM2-62323; ignoring the SUSPECT message
      2010-07-20 04:48:45,124 ERROR [JoinTask] Caught exception!
      org.infinispan.CacheException: Unable to retrieve old consistent hash from coordinator even after several attempts at sleeping and retrying!
              at org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:191)
              at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:83)
              at org.infinispan.distribution.RehashTask.call(RehashTask.java:52)
              at org.infinispan.distribution.RehashTask.call(RehashTask.java:32)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
      2010-07-20 05:51:27,370 WARN  [FD] I was suspected by VM2-62323; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      2010-07-20 05:51:28,588 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:29,557 WARN  [FD_SOCK] I (VM1-57619) was suspected by VM2-62323; ignoring the SUSPECT message
      2010-07-20 05:51:30,370 WARN  [FD] I was suspected by VM2-62323; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      2010-07-20 05:51:30,370 WARN  [TCP] VM1-57619: no physical address for VM2-33071, dropping message
      2010-07-20 05:51:30,604 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:30,651 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:30,698 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:30,713 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:33,354 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:39,026 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:39,042 WARN  [NAKACK] VM1-57619: dropped message from VM2-62323 (not in xmit_table), keys are [VM1-57619, VM1-29440, VM1-57675], view=[VM1-57619|8] [VM1-57619, VM1-29440, VM1-57675]
      2010-07-20 05:51:40,042 WARN  [FD_SOCK] I (VM1-57619) was suspected by VM2-62323; ignoring the SUSPECT message
      2010-07-20 06:11:40,362 ERROR [JoinTask] Caught exception!
      org.infinispan.CacheException: Unable to retrieve old consistent hash from coordinator even after several attempts at sleeping and retrying!
              at org.infinispan.distribution.JoinTask.retrieveOldCH(JoinTask.java:191)
              at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:83)
              at org.infinispan.distribution.RehashTask.call(RehashTask.java:52)
              at org.infinispan.distribution.RehashTask.call(RehashTask.java:32)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
        • 1. Re: 5 node cluster - exceptions
          galder.zamarreno

          It looks like there was a network split between the nodes on VM1 and VM2. It's hard to say what caused it, but it would have affected Infinispan, because two islands were most likely formed: one with the VM1 nodes and the other with the VM2 ones. Can you attach your JGroups configuration? If you run the same test again, enable TRACE logging on the org.jgroups package from the start to gather more information.
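
          If it helps, TRACE can also be switched on programmatically at startup (a log4j 1.x sketch, assuming that's your logging setup; an equivalent category entry in log4j.xml works just as well):

          import org.apache.log4j.Level;
          import org.apache.log4j.Logger;

          public final class TraceJGroups {
              // Raise the org.jgroups category to TRACE before starting the cache manager.
              public static void enable() {
                  Logger.getLogger("org.jgroups").setLevel(Level.TRACE);
              }
          }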
