Need help on the clustering issue -- failure reconnecting to channel...
sam.fung Dec 14, 2015 6:07 AMAn exception was happened in company's JBoss AS clustering environment, from the error message below, it seems SERVER_A failed to send back a HEARTBEAT_ACK to SERVER_B, and then close the Socket, and finally failed to reconnect to channel, I suspected this issue was caused by network failure but I can't find any useful info from windows event log, therefore I post the issue here to seek your guys' help, thanks.
OS: Windows Server 2008 R2
JBOSS version: Release ID: JBoss [The Oracle] 5.1.0.GA (build: SVNTag=JBoss_5_1_0_GA date=200905221053)
2 Nodes in cluster group: SERVER_B (vip: 169.254.1.52), SERVER_B(vip: 169.254.2.230)
[Error Message Captured from SERVER_A]
2015-09-20 05:39:44,855 WARN [org.jgroups.protocols.FD] (OOB-18548,169.254.1.52:57132) I was suspected by 169.254.2.230:63439; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
2015-09-20 05:39:44,855 WARN [org.jgroups.protocols.pbcast.GMS] (Incoming-5,169.254.1.52:57132) I (169.254.1.52:57132) am not a member of view [169.254.2.230:63439|22] [169.254.2.230:63439], shunning myself and leaving the group (prev_members are [169.254.2.230:63439, 169.254.1.52:57132], current view is [169.254.2.230:63439|21] [169.254.2.230:63439, 169.254.1.52:57132])
2015-09-20 05:39:44,855 WARN [org.jgroups.protocols.pbcast.GMS] (Incoming-12,169.254.1.52:57132) I (169.254.1.52:57132) am not a member of view [169.254.2.230:63439|22] [169.254.2.230:63439], shunning myself and leaving the group (prev_members are [169.254.2.230:63439, 169.254.1.52:57132], current view is [169.254.2.230:63439|21] [169.254.2.230:63439, 169.254.1.52:57132])
2015-09-20 05:39:44,855 WARN [org.jgroups.protocols.pbcast.GMS] (Incoming-18,169.254.1.52:57132) I (169.254.1.52:57132) am not a member of view [169.254.2.230:63439|22] [169.254.2.230:63439], shunning myself and leaving the group (prev_members are [169.254.2.230:63439, 169.254.1.52:57132], current view is [169.254.2.230:63439|21] [169.254.2.230:63439, 169.254.1.52:57132])
2015-09-20 05:39:44,855 WARN [org.jgroups.protocols.pbcast.GMS] (Incoming-2,169.254.1.52:57132) I (169.254.1.52:57132) am not a member of view [169.254.2.230:63439|22] [169.254.2.230:63439], shunning myself and leaving the group (prev_members are [169.254.2.230:63439, 169.254.1.52:57132], current view is [169.254.2.230:63439|21] [169.254.2.230:63439, 169.254.1.52:57132])
2015-09-20 05:39:44,855 WARN [org.jgroups.protocols.FD] (OOB-18547,169.254.1.52:57132) I was suspected by 169.254.2.230:63439; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
2015-09-20 05:39:44,855 WARN [org.jgroups.protocols.FD] (OOB-18545,169.254.1.52:57132) I was suspected by 169.254.2.230:63439; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
2015-09-20 05:39:44,855 WARN [org.jgroups.protocols.FD] (OOB-18550,169.254.1.52:57132) I was suspected by 169.254.2.230:63439; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
2015-09-20 05:39:44,855 ERROR [org.jgroups.blocks.ConnectionTable] (Timer-3,169.254.1.52:7900) failed sending data to 169.254.2.230:7900: java.net.SocketException: Software caused connection abort: socket write error
2015-09-20 05:39:44,886 ERROR [org.jgroups.blocks.ConnectionTable] (ReceiverThread) failed sending data to 169.254.2.230:7900: java.net.SocketException: Socket closed
2015-09-20 05:39:45,494 INFO [javax.enterprise.resource.webservices.assembler] (Thread-355757) Default metro-default.xml configuration file located at: 'vfszip:/C:/jboss-5.1.0.GA/server/all/lib/webservices-rt.jar/META-INF/metro-default.xml'
2015-09-20 05:39:45,494 INFO [javax.enterprise.resource.webservices.assembler] (Thread-355756) Default metro-default.xml configuration file located at: 'vfszip:/C:/jboss-5.1.0.GA/server/all/lib/webservices-rt.jar/META-INF/metro-default.xml'
2015-09-20 05:39:46,103 INFO [STDOUT] (CloserThread)
---------------------------------------------------------
GMS: address is 169.254.1.52:53709 (cluster=DefaultPartition-SessionCache)
---------------------------------------------------------
2015-09-20 05:39:46,118 FATAL [org.jgroups.JChannel] (CloserThread) local_addr is null; cannot connect
2015-09-20 05:39:46,118 FATAL [org.jgroups.JChannel] (CloserThread) local_addr is null; cannot connect
2015-09-20 05:39:46,134 ERROR [org.jgroups.JChannel] (CloserThread) failure reconnecting to channel, retrying
org.jgroups.ChannelException: local_addr is null
at org.jgroups.JChannel.startStack(JChannel.java:1631)
at org.jgroups.JChannel.connect(JChannel.java:366)
at org.jgroups.JChannel$CloserThread.run(JChannel.java:2046)
2015-09-20 05:39:46,134 ERROR [org.jgroups.JChannel] (CloserThread) failure reconnecting to channel, retrying
org.jgroups.ChannelException: local_addr is null
at org.jgroups.JChannel.startStack(JChannel.java:1631)
at org.jgroups.JChannel.connect(JChannel.java:366)
at org.jgroups.JChannel$CloserThread.run(JChannel.java:2046)
2015-09-20 05:39:46,181 FATAL [org.jgroups.JChannel] (CloserThread) local_addr is null; cannot connect
2015-09-20 05:39:46,181 ERROR [org.jgroups.JChannel] (CloserThread) failure reconnecting to channel, retrying
org.jgroups.ChannelException: local_addr is null
at org.jgroups.JChannel.startStack(JChannel.java:1631)
at org.jgroups.JChannel.connect(JChannel.java:366)
at org.jgroups.JChannel$CloserThread.run(JChannel.java:2046)
[Error Message Captured from SERVER_B]
2015-09-20 05:39:43,574 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] (VERIFY_SUSPECT.TimerThread,DefaultPartition,169.254.2.230:63439) Suspected member: 169.254.1.52:57132
2015-09-20 05:39:43,637 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] (Incoming-17,169.254.2.230:63439) New cluster view for partition DefaultPartition (id: 22, delta: -1) : [169.254.2.230:1099]
2015-09-20 05:39:43,637 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] (AsynchViewChangeHandler Thread) I am (169.254.2.230:1099) received membershipChanged event:
2015-09-20 05:39:43,638 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] (AsynchViewChangeHandler Thread) Dead members: 1 ([169.254.1.52:1099])
2015-09-20 05:39:43,638 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] (AsynchViewChangeHandler Thread) New Members : 0 ([])
2015-09-20 05:39:43,638 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] (AsynchViewChangeHandler Thread) All Members : 1 ([169.254.2.230:1099])
2015-09-20 05:39:43,640 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-1,169.254.2.230:63439) Received new cluster view: [169.254.2.230:63439|22] [169.254.2.230:63439]
2015-09-20 05:39:43,641 INFO [org.jboss.messaging.core.impl.postoffice.GroupMember] (Incoming-1,169.254.2.230:63439) org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener@4622fec1 got new view [169.254.2.230:63439|22] [169.254.2.230:63439], old view is [169.254.2.230:63439|21] [169.254.2.230:63439, 169.254.1.52:57132]
2015-09-20 05:39:43,641 INFO [org.jboss.messaging.core.impl.postoffice.GroupMember] (Incoming-1,169.254.2.230:63439) I am (169.254.2.230:63439)
2015-09-20 05:39:43,641 ERROR [org.jboss.messaging.core.impl.postoffice.GroupMember] (Incoming-1,169.254.2.230:63439) Caught Exception in MembershipListener
java.lang.IllegalStateException: org.jboss.messaging.core.impl.postoffice.MessagingPostOffice@2ee1f68a cannot find node ID for address 169.254.1.52:57132
at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.nodesLeft(MessagingPostOffice.java:942)
at org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener.viewAccepted(GroupMember.java:508)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:733)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:776)
at org.jgroups.JChannel.up(JChannel.java:1339)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:462)
at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:449)
at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:144)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
at org.jgroups.protocols.FC.up(FC.java:493)
at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:548)
at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange(CoordGmsImpl.java:460)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:753)
at org.jgroups.protocols.VIEW_SYNC.up(VIEW_SYNC.java:192)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:299)
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:884)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:716)
at org.jgroups.protocols.BARRIER.up(BARRIER.java:136)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
at org.jgroups.protocols.FD.up(FD.java:284)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
at org.jgroups.protocols.Discovery.up(Discovery.java:264)
at org.jgroups.protocols.PING.up(PING.java:273)
at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2327)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1261)
at org.jgroups.protocols.TP.access$100(TP.java:49)
at org.jgroups.protocols.TP$1.run(TP.java:1188)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2015-09-20 05:39:44,596 WARN [org.jgroups.protocols.pbcast.NAKACK] (OOB-5433,169.254.2.230:63439) 169.254.2.230:63439] discarded message from non-member 169.254.1.52:57132, my view is [169.254.2.230:63439|22] [169.254.2.230:63439]
2015-09-20 05:39:44,671 WARN [org.jgroups.protocols.pbcast.NAKACK] (OOB-5432,169.254.2.230:63439) 169.254.2.230:63439] discarded message from non-member 169.254.1.52:57132, my view is [169.254.2.230:63439|22] [169.254.2.230:63439]
2015-09-20 05:39:44,714 WARN [org.jgroups.protocols.pbcast.NAKACK] (OOB-27872,169.254.2.230:7900) 169.254.2.230:7900] discarded message from non-member 169.254.1.52:7900, my view is [169.254.2.230:7900|22] [169.254.2.230:7900]