JBoss 6.0 cluster with two master nodes (NPE in CoreGroupCommunicationService)
gcontini Jul 22, 2011 6:22 AM
I have a 2-node cluster running JBoss 6.0.
The two servers see each other correctly; server1 (10.143.89.206) is the master node (cluster coordinator).
Now I pause server1 with:
{code}
pkill -STOP -f java
{code}
Server2 correctly detects the failure and becomes master:
{code}
2011-07-21 11:48:29,392 DEBUG [FD$Monitor] (Timer-2,<ADDR>) heartbeat missing from 10.143.89.206:1099 (number=4)
2011-07-21 11:48:33,393 DEBUG [FD$Monitor] (Timer-5,<ADDR>) sending are-you-alive msg to 10.143.89.206:1099 (own address=10.143.89.207:1099)
2011-07-21 11:48:33,393 DEBUG [FD$Monitor] (Timer-5,<ADDR>) [10.143.89.207:1099]: received no heartbeat ack from 10.143.89.206:1099 for 6 times (24000 milliseconds), suspecting it
2011-07-21 11:48:33,396 DEBUG [FD$BroadcastTask] (Timer-3,<ADDR>) broadcasting SUSPECT message [suspected_mbrs=[10.143.89.206:1099]] to group
....
2011-07-21 11:48:35,412 DEBUG [HASingletonController] (AsynchViewChangeHandler Thread) starting singleton, mSingleton=org.jboss.ha.singleton.HASingletonProfileManager@106051c1, mSingletonMBean=null
2011-07-21 11:48:35,412 DEBUG [HASingletonImpl] (AsynchViewChangeHandler Thread) startSingleton() : elected for master singleton node
{code}
The JMX console now shows server2 as the master node. Then I unpause server1:
{code}
pkill -CONT -f java
{code}
Server1 wakes up:
{code}
2011-07-21 11:48:55,682 WARN [FD] (OOB-11,null) I was suspected by 10.143.89.207:1099; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
2011-07-21 11:48:55,685 DEBUG [FLUSH] (Incoming-17,null) 10.143.89.206:1099: received START_FLUSH but I am not flush participant, not responding
...
{code}
Server2 decides it is the merge coordinator:
{code}
2011-07-21 11:49:00,231 DEBUG [Merger] (ViewHandler,clusterTest-HAPartition,10.143.89.207:1099) I (10.143.89.207:1099) will be the leader. Starting the merge task for [10.143.89.207:1099, 10.143.89.206:1099]
{code}
Server1 acknowledges the merge and installs the new view:
{code}
2011-07-21 11:48:59,361 INFO [org.jboss.ha.framework.server.ClusterPartition.clusterTest] CoreGroupCommunicationService (Incoming-8,null) New cluster view for partition clusterTest: 3 (org.jboss.ha.core.framework.server.CoreGroupCommunicationService$GroupView@4c6cb02a delta: 0, merge: true)
{code}
But if I now check the JMX console, both nodes report MasterNode=True, i.e. each one still believes it is the master...
What's wrong here?
In another similar run, with logging at TRACE level, I got this exception on server1:
{code}
2011-07-20 18:51:21,768 TRACE [CoreGroupCommunicationService$RpcHandler] (Incoming-6,null) Partition clusterTest rpc call threw exception: java.lang.NullPointerException
at org.jboss.modcluster.ha.HAModClusterService$RpcHandler.clusterStatusComplete(HAModClusterService.java:887) [:1.1.0.Final]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [:1.6.0_24]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [:1.6.0_24]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [:1.6.0_24]
at java.lang.reflect.Method.invoke(Method.java:597) [:1.6.0_24]
at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:351) [:2.12.1.Final]
at org.jboss.ha.core.framework.server.CoreGroupCommunicationService$RpcHandler.handle(CoreGroupCommunicationService.java:1971) [:1.0.0.Final]
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:577) [:2.12.1.Final]
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:488) [:2.12.1.Final]
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:364) [:2.12.1.Final]
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:770) [:2.12.1.Final]
at org.jboss.ha.core.jgroups.blocks.mux.DelegatingStateTransferUpHandler.up(DelegatingStateTransferUpHandler.java:63) [:1.0.0.Final]
at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:99) [:2.12.1.Final]
at org.jgroups.JChannel.up(JChannel.java:1484) [:2.12.1.Final]
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074) [:2.12.1.Final]
at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:477) [:2.12.1.Final]
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:263) [:2.12.1.Final]
at org.jgroups.protocols.FRAG2.up(FRAG2.java:189) [:2.12.1.Final]
at org.jgroups.protocols.FlowControl.up(FlowControl.java:400) [:2.12.1.Final]
at org.jgroups.protocols.FlowControl.up(FlowControl.java:418) [:2.12.1.Final]
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891) [:2.12.1.Final]
at org.jgroups.protocols.VIEW_SYNC.up(VIEW_SYNC.java:170) [:2.12.1.Final]
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246) [:2.12.1.Final]
at org.jgroups.protocols.UNICAST.up(UNICAST.java:309) [:2.12.1.Final]
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:838) [:2.12.1.Final]
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:667) [:2.12.1.Final]
at org.jgroups.protocols.BARRIER.up(BARRIER.java:119) [:2.12.1.Final]
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:133) [:2.12.1.Final]
at org.jgroups.protocols.FD.up(FD.java:275) [:2.12.1.Final]
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:275) [:2.12.1.Final]
at org.jgroups.protocols.MERGE2.up(MERGE2.java:209) [:2.12.1.Final]
at org.jgroups.protocols.Discovery.up(Discovery.java:293) [:2.12.1.Final]
at org.jgroups.protocols.PING.up(PING.java:69) [:2.12.1.Final]
at org.jgroups.stack.Protocol.up(Protocol.java:413) [:2.12.1.Final]
at org.jgroups.protocols.TP.passMessageUp(TP.java:1109) [:2.12.1.Final]
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1665) [:2.12.1.Final]
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1647) [:2.12.1.Final]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [:1.6.0_24]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [:1.6.0_24]
at java.lang.Thread.run(Thread.java:662) [:1.6.0_24]
{code}
Could something be instantiating ModClusterServiceDRMEntry with mcmpServerStates=null (perhaps via BasicConstructorJoinPoint.dispatch)?
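To make the suspicion concrete, here is a minimal, purely hypothetical Java sketch of that failure mode. The Entry class and its field name only mimic ModClusterServiceDRMEntry; this is not the actual mod_cluster code, just an illustration of how a constructor that accepts a null state list without checking leads to an NPE on first use:

```java
import java.util.List;

public class DrmEntrySketch {
    // Hypothetical stand-in for ModClusterServiceDRMEntry.
    static class Entry {
        final List<String> mcmpServerStates;

        Entry(List<String> mcmpServerStates) {
            this.mcmpServerStates = mcmpServerStates; // no null guard here
        }

        int stateCount() {
            // Throws NullPointerException if the entry was constructed with null.
            return mcmpServerStates.size();
        }
    }

    public static void main(String[] args) {
        // Normal case: a non-null state list works fine.
        System.out.println(new Entry(List.of("OK")).stateCount()); // prints 1

        // Suspected case: a null list blows up on first dereference,
        // which would match the NPE in clusterStatusComplete above.
        try {
            new Entry(null).stateCount();
        } catch (NullPointerException e) {
            System.out.println("NPE, as in the trace above");
        }
    }
}
```

If this is what happens, the null list would only be dereferenced later, during the cluster-status RPC after the merge, which would explain why the exception surfaces on the old coordinator at that point.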
Thanks in advance.
Gabriele
Attachments: server2.log.zip (4.5 KB), server1.log.zip (1.5 KB)