Cluster doesn't work properly (HAPartition, JGroups)
evgeniy.khist Dec 1, 2011 11:35 AM
Hi!
I'm working on a project that requires clustering.
The application runs on JBoss AS 5.1.0.
After some time, nodes become inactive (they stop processing requests). The problem recurs, but at unpredictable intervals.
The last time, the cluster crashed suddenly (3 of 4 nodes stopped serving requests) after 4 days of stable operation.
Log:
org.jboss.messaging.core.impl.postoffice.MessagingPostOffice cannot find node ID for address
...
org.jgroups.protocols.pbcast.GMS GMS flush by coordinator at failed
...
2011-11-30 11:58:30,718 WARN [CloserThread] org.jgroups.protocols.pbcast.GMS join(10.*.*.177:55489) sent to 10.*.*.168:36319 timed out (after 3000 ms), retrying
2011-11-30 11:58:33,721 WARN [CloserThread] org.jgroups.protocols.pbcast.GMS join(10.*.*.177:55489) sent to 10.*.*.168:36319 timed out (after 3000 ms), retrying
...
2011-11-30 19:15:20,705 INFO org.jboss.ha.framework.interfaces.HAPartition.mcps Suspected member: 10.*.*.177:55612
2011-11-30 19:15:20,841 WARN org.jgroups.protocols.pbcast.NAKACK 10.*.*.168:7901] discarded message from non-member 10.*.*.177:7902, my view is [10.*.0.177:7901|10] [10.*.*.177:7901, 10.*.*.168:7900, 10.*.*.168:7901, 10.*.*.168:7902, 10.*.*.177:7900]
2011-11-30 19:15:23,140 INFO org.jboss.ha.framework.interfaces.HAPartition.mcps New cluster view for partition mcps: 4 ([10.*.*.177:1999, 10.*.*.177:1199, 10.*.*.168:1099, 10.*.*.168:1199, 10.*.*.168:1999] delta: -1)
2011-11-30 19:15:23,143 WARN org.jgroups.protocols.pbcast.NAKACK 10.*.*.168:49865] discarded message from non-member 10.*.*.177:55612, my view is [10.*.*.177:46275|4] [10.*.*.177:46275, 10.*.*.177:60615, 10.*.*.168:49865, 10.*.*.168:38167, 10.*.*.168:39465]
2011-11-30 19:15:23,153 INFO org.jboss.ha.framework.server.DistributedReplicantManagerImpl.mcps I am (10.*.*.168:1099) received membershipChanged event:
2011-11-30 19:15:23,153 INFO org.jboss.ha.framework.server.DistributedReplicantManagerImpl.mcps Dead members: 1 ([10.*.*.177:1099])
2011-11-30 19:15:23,153 INFO org.jboss.ha.framework.server.DistributedReplicantManagerImpl.mcps New Members : 0 ([])
2011-11-30 19:15:23,153 INFO org.jboss.ha.framework.server.DistributedReplicantManagerImpl.mcps All Members : 5 ([10.*.*.177:1999, 10.*.*.177:1199, 10.*.*.168:1099, 10.*.*.168:1199, 10.*.*.168:1999])
2011-11-30 19:15:23,302 WARN org.jgroups.protocols.pbcast.NAKACK 10.*.*.168:49865] discarded message from non-member 10.*.*.177:55612, my view is [10.*.*.177:46275|4] [10.*.*.177:46275, 10.*.*.177:60615, 10.*.*.168:49865, 10.*.*.168:38167, 10.*.*.168:39465]
2011-11-30 19:15:23,302 WARN org.jgroups.protocols.pbcast.NAKACK 10.*.*.168:49865] discarded message from non-member 10.44.0.177:55612, my view is [10.*.*.177:46275|
...
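In case it helps with diagnosis, the NAKACK warnings can be tallied per sender to see which address the other nodes keep rejecting. This is only a minimal sketch in Python, assuming the log line format shown above; the function name `count_non_member_senders` and the `sample` list are illustrative and not part of JBoss or JGroups.

```python
import re
from collections import Counter

# Captures the sender address in warnings of the form
# "discarded message from non-member <address>, my view is ..."
NON_MEMBER = re.compile(r"discarded message from non-member (\S+?),")


def count_non_member_senders(lines):
    """Count 'discarded message from non-member' warnings per sender address."""
    counts = Counter()
    for line in lines:
        match = NON_MEMBER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three warning lines taken from the log excerpt above (truncated after "my view is").
sample = [
    "2011-11-30 19:15:20,841 WARN org.jgroups.protocols.pbcast.NAKACK 10.*.*.168:7901] "
    "discarded message from non-member 10.*.*.177:7902, my view is",
    "2011-11-30 19:15:23,143 WARN org.jgroups.protocols.pbcast.NAKACK 10.*.*.168:49865] "
    "discarded message from non-member 10.*.*.177:55612, my view is",
    "2011-11-30 19:15:23,302 WARN org.jgroups.protocols.pbcast.NAKACK 10.*.*.168:49865] "
    "discarded message from non-member 10.*.*.177:55612, my view is",
]

counts = count_non_member_senders(sample)
for sender, n in counts.most_common():
    print(sender, n)
```

Run against the full server log (for example by passing an open file object instead of `sample`), this would show whether the rejected messages come from a single node or from several.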
Any help will be appreciated.