0 Replies Latest reply on Dec 14, 2015 6:07 AM by sam.fung

    Need help on the clustering issue -- failure reconnecting to channel...

    sam.fung

      An exception was happened in company's JBoss AS clustering environment, from the error message below, it seems SERVER_A failed to send back a HEARTBEAT_ACK to SERVER_B, and then close the Socket, and finally failed to reconnect to channel, I suspected this issue was caused by network failure but I can't find any useful info from windows event log, therefore I post the issue here to seek your guys' help, thanks.

       

      OS: Windows Server 2008 R2

      JBOSS version: Release ID: JBoss [The Oracle] 5.1.0.GA (build: SVNTag=JBoss_5_1_0_GA date=200905221053)

      2 Nodes in cluster group: SERVER_B (vip: 169.254.1.52), SERVER_B(vip: 169.254.2.230)

       

      [Error Message Captured from SERVER_A]

      2015-09-20 05:39:44,855 WARN  [org.jgroups.protocols.FD] (OOB-18548,169.254.1.52:57132) I was suspected by 169.254.2.230:63439; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK

      2015-09-20 05:39:44,855 WARN  [org.jgroups.protocols.pbcast.GMS] (Incoming-5,169.254.1.52:57132) I (169.254.1.52:57132) am not a member of view [169.254.2.230:63439|22] [169.254.2.230:63439], shunning myself and leaving the group (prev_members are [169.254.2.230:63439, 169.254.1.52:57132], current view is [169.254.2.230:63439|21] [169.254.2.230:63439, 169.254.1.52:57132])

      2015-09-20 05:39:44,855 WARN  [org.jgroups.protocols.pbcast.GMS] (Incoming-12,169.254.1.52:57132) I (169.254.1.52:57132) am not a member of view [169.254.2.230:63439|22] [169.254.2.230:63439], shunning myself and leaving the group (prev_members are [169.254.2.230:63439, 169.254.1.52:57132], current view is [169.254.2.230:63439|21] [169.254.2.230:63439, 169.254.1.52:57132])

      2015-09-20 05:39:44,855 WARN  [org.jgroups.protocols.pbcast.GMS] (Incoming-18,169.254.1.52:57132) I (169.254.1.52:57132) am not a member of view [169.254.2.230:63439|22] [169.254.2.230:63439], shunning myself and leaving the group (prev_members are [169.254.2.230:63439, 169.254.1.52:57132], current view is [169.254.2.230:63439|21] [169.254.2.230:63439, 169.254.1.52:57132])

      2015-09-20 05:39:44,855 WARN  [org.jgroups.protocols.pbcast.GMS] (Incoming-2,169.254.1.52:57132) I (169.254.1.52:57132) am not a member of view [169.254.2.230:63439|22] [169.254.2.230:63439], shunning myself and leaving the group (prev_members are [169.254.2.230:63439, 169.254.1.52:57132], current view is [169.254.2.230:63439|21] [169.254.2.230:63439, 169.254.1.52:57132])

      2015-09-20 05:39:44,855 WARN  [org.jgroups.protocols.FD] (OOB-18547,169.254.1.52:57132) I was suspected by 169.254.2.230:63439; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK

      2015-09-20 05:39:44,855 WARN  [org.jgroups.protocols.FD] (OOB-18545,169.254.1.52:57132) I was suspected by 169.254.2.230:63439; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK

      2015-09-20 05:39:44,855 WARN  [org.jgroups.protocols.FD] (OOB-18550,169.254.1.52:57132) I was suspected by 169.254.2.230:63439; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK

      2015-09-20 05:39:44,855 ERROR [org.jgroups.blocks.ConnectionTable] (Timer-3,169.254.1.52:7900) failed sending data to 169.254.2.230:7900: java.net.SocketException: Software caused connection abort: socket write error

      2015-09-20 05:39:44,886 ERROR [org.jgroups.blocks.ConnectionTable] (ReceiverThread) failed sending data to 169.254.2.230:7900: java.net.SocketException: Socket closed

      2015-09-20 05:39:45,494 INFO  [javax.enterprise.resource.webservices.assembler] (Thread-355757) Default metro-default.xml configuration file located at: 'vfszip:/C:/jboss-5.1.0.GA/server/all/lib/webservices-rt.jar/META-INF/metro-default.xml'

      2015-09-20 05:39:45,494 INFO  [javax.enterprise.resource.webservices.assembler] (Thread-355756) Default metro-default.xml configuration file located at: 'vfszip:/C:/jboss-5.1.0.GA/server/all/lib/webservices-rt.jar/META-INF/metro-default.xml'

      2015-09-20 05:39:46,103 INFO  [STDOUT] (CloserThread)

      ---------------------------------------------------------

      GMS: address is 169.254.1.52:53709 (cluster=DefaultPartition-SessionCache)

      ---------------------------------------------------------

      2015-09-20 05:39:46,118 FATAL [org.jgroups.JChannel] (CloserThread) local_addr is null; cannot connect

      2015-09-20 05:39:46,118 FATAL [org.jgroups.JChannel] (CloserThread) local_addr is null; cannot connect

      2015-09-20 05:39:46,134 ERROR [org.jgroups.JChannel] (CloserThread) failure reconnecting to channel, retrying

      org.jgroups.ChannelException: local_addr is null

        at org.jgroups.JChannel.startStack(JChannel.java:1631)

        at org.jgroups.JChannel.connect(JChannel.java:366)

        at org.jgroups.JChannel$CloserThread.run(JChannel.java:2046)

      2015-09-20 05:39:46,134 ERROR [org.jgroups.JChannel] (CloserThread) failure reconnecting to channel, retrying

      org.jgroups.ChannelException: local_addr is null

        at org.jgroups.JChannel.startStack(JChannel.java:1631)

        at org.jgroups.JChannel.connect(JChannel.java:366)

        at org.jgroups.JChannel$CloserThread.run(JChannel.java:2046)

      2015-09-20 05:39:46,181 FATAL [org.jgroups.JChannel] (CloserThread) local_addr is null; cannot connect

      2015-09-20 05:39:46,181 ERROR [org.jgroups.JChannel] (CloserThread) failure reconnecting to channel, retrying

      org.jgroups.ChannelException: local_addr is null

        at org.jgroups.JChannel.startStack(JChannel.java:1631)

        at org.jgroups.JChannel.connect(JChannel.java:366)

        at org.jgroups.JChannel$CloserThread.run(JChannel.java:2046)

       

       

      [Error Message Captured from SERVER_B]

      2015-09-20 05:39:43,574 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] (VERIFY_SUSPECT.TimerThread,DefaultPartition,169.254.2.230:63439) Suspected member: 169.254.1.52:57132

      2015-09-20 05:39:43,637 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] (Incoming-17,169.254.2.230:63439) New cluster view for partition DefaultPartition (id: 22, delta: -1) : [169.254.2.230:1099]

      2015-09-20 05:39:43,637 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] (AsynchViewChangeHandler Thread) I am (169.254.2.230:1099) received membershipChanged event:

      2015-09-20 05:39:43,638 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] (AsynchViewChangeHandler Thread) Dead members: 1 ([169.254.1.52:1099])

      2015-09-20 05:39:43,638 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] (AsynchViewChangeHandler Thread) New Members : 0 ([])

      2015-09-20 05:39:43,638 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] (AsynchViewChangeHandler Thread) All Members : 1 ([169.254.2.230:1099])

      2015-09-20 05:39:43,640 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-1,169.254.2.230:63439) Received new cluster view: [169.254.2.230:63439|22] [169.254.2.230:63439]

      2015-09-20 05:39:43,641 INFO  [org.jboss.messaging.core.impl.postoffice.GroupMember] (Incoming-1,169.254.2.230:63439) org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener@4622fec1 got new view [169.254.2.230:63439|22] [169.254.2.230:63439], old view is [169.254.2.230:63439|21] [169.254.2.230:63439, 169.254.1.52:57132]

      2015-09-20 05:39:43,641 INFO  [org.jboss.messaging.core.impl.postoffice.GroupMember] (Incoming-1,169.254.2.230:63439) I am (169.254.2.230:63439)

      2015-09-20 05:39:43,641 ERROR [org.jboss.messaging.core.impl.postoffice.GroupMember] (Incoming-1,169.254.2.230:63439) Caught Exception in MembershipListener

      java.lang.IllegalStateException: org.jboss.messaging.core.impl.postoffice.MessagingPostOffice@2ee1f68a cannot find node ID for address 169.254.1.52:57132

        at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.nodesLeft(MessagingPostOffice.java:942)

        at org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener.viewAccepted(GroupMember.java:508)

        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:733)

        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:776)

        at org.jgroups.JChannel.up(JChannel.java:1339)

        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:462)

        at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:449)

        at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:144)

        at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)

        at org.jgroups.protocols.FC.up(FC.java:493)

        at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:548)

        at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange(CoordGmsImpl.java:460)

        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:753)

        at org.jgroups.protocols.VIEW_SYNC.up(VIEW_SYNC.java:192)

        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)

        at org.jgroups.protocols.UNICAST.up(UNICAST.java:299)

        at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:884)

        at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:716)

        at org.jgroups.protocols.BARRIER.up(BARRIER.java:136)

        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)

        at org.jgroups.protocols.FD.up(FD.java:284)

        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)

        at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)

        at org.jgroups.protocols.Discovery.up(Discovery.java:264)

        at org.jgroups.protocols.PING.up(PING.java:273)

        at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2327)

        at org.jgroups.protocols.TP.passMessageUp(TP.java:1261)

        at org.jgroups.protocols.TP.access$100(TP.java:49)

        at org.jgroups.protocols.TP$1.run(TP.java:1188)

        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        at java.lang.Thread.run(Thread.java:662)

      2015-09-20 05:39:44,596 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-5433,169.254.2.230:63439) 169.254.2.230:63439] discarded message from non-member 169.254.1.52:57132, my view is [169.254.2.230:63439|22] [169.254.2.230:63439]

      2015-09-20 05:39:44,671 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-5432,169.254.2.230:63439) 169.254.2.230:63439] discarded message from non-member 169.254.1.52:57132, my view is [169.254.2.230:63439|22] [169.254.2.230:63439]

      2015-09-20 05:39:44,714 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-27872,169.254.2.230:7900) 169.254.2.230:7900] discarded message from non-member 169.254.1.52:7900, my view is [169.254.2.230:7900|22] [169.254.2.230:7900]