1 Reply Latest reply on Sep 19, 2013 1:12 PM by rcottle79

    JBoss error: java.net.SocketException: Broken pipe

    sibbala

      Hi,
      We have two applications (App1 and App2) deployed in a clustered environment on two nodes (both applications are deployed on both nodes). App2 exposes an MDB, and App1 sends messages to it: whenever a request is posted to App1, App1 publishes messages to App2 for further processing.

      Recently we found exceptions occurring while publishing messages to App2. On analysis we found that the nodes were being dropped from the cluster very often. Sometimes the dropped node is able to rejoin, but sometimes it stays out of the cluster. We see this cycle of dropping and attempting to reconnect many times, but after a point it fails completely and both nodes are unable to communicate with each other. We are then forced to restart the server, typically after 1 week to 10 days of running, to get the application working properly again.

      Can you please help us identify why the nodes are suddenly dropped from the cluster, and suggest a possible resolution? The software versions we use are listed below.

       

      JDK: jdk-1.6.0_20

      JBOSS: jboss-eap-5.0.1

       

      Below are some log blocks for reference from both the nodes.

       

      Logs from node-1

       

      14:58:50,673 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask@109f69c: detected failure on control connection Thread[control: Socket[addr=xxx02.xxx.xxxxx.com/aaa.aa.aa.aaa,port=50637,localport=55046],5,jboss] (4sv63z-6umcfq-h5lh0ioa-1-h5lhd6q0-er: requesting new control connection
      14:58:50,674 ERROR [ConnectionTable] failed sending data to aaa.aa.aa.aaa:7900: java.net.SocketException: Broken pipe
      14:58:50,706 WARN  [FD] I was suspected by aaa.aa.aa.aaa:40220; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      14:58:50,674 WARN  [GMS] I (bbb.bb.b.bbb:53434) am not a member of view [aaa.aa.aa.aaa:40220|32] [aaa.aa.aa.aaa:40220], shunning myself and leaving the group (prev_members are [aaa.aa.aa.aaa:65082, bbb.bb.b.bbb:53434, aaa.aa.aa.aaa:40220], current view is [bbb.bb.b.bbb:53434|31] [bbb.bb.b.bbb:53434, aaa.aa.aa.aaa:40220])
      14:58:50,707 WARN  [FD] I was suspected by aaa.aa.aa.aaa:40220; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      14:58:50,708 WARN  [FD] I was suspected by aaa.aa.aa.aaa:40220; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      14:58:50,706 WARN  [GMS] I (bbb.bb.b.bbb:53434) am not a member of view [aaa.aa.aa.aaa:40220|33] [aaa.aa.aa.aaa:40220], shunning myself and leaving the group (prev_members are [aaa.aa.aa.aaa:65082, bbb.bb.b.bbb:53434, aaa.aa.aa.aaa:40220], current view is [bbb.bb.b.bbb:53434|32] [bbb.bb.b.bbb:53434, aaa.aa.aa.aaa:40220])
      14:58:51,117 WARN  [NAKACK] bbb.bb.b.bbb:53434] discarded message from non-member aaa.aa.aa.aaa:40220, my view is [bbb.bb.b.bbb:53434|31] [bbb.bb.b.bbb:53434, aaa.aa.aa.aaa:40220]
      14:58:51,121 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found

      14:58:57,711 WARN  [GMS] join(bbb.bb.b.bbb:53434) sent to aaa.aa.aa.aaa:40220 timed out (after 3000 ms), retrying
      14:58:59,232 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:00,721 WARN  [GMS] join(bbb.bb.b.bbb:53434) sent to aaa.aa.aa.aaa:40220 timed out (after 3000 ms), retrying
      14:59:02,842 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:03,733 WARN  [GMS] join(bbb.bb.b.bbb:53434) sent to aaa.aa.aa.aaa:40220 timed out (after 3000 ms), retrying
      14:59:06,452 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:06,741 WARN  [GMS] join(bbb.bb.b.bbb:53434) sent to aaa.aa.aa.aaa:40220 timed out (after 3000 ms), retrying
      14:59:09,751 WARN  [GMS] join(bbb.bb.b.bbb:53434) sent to aaa.aa.aa.aaa:40220 timed out (after 3000 ms), retrying
      14:59:10,063 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found

      14:59:20,710 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask@109f69c: detected failure on control connection Thread[control: Socket[addr=xxx02.xxx.xxxxx.com/aaa.aa.aa.aaa,port=50637,localport=34757],5,jboss] (4sv63z-6umcfq-h5lh0ioa-1-h5lhd6q0-er: requesting new control connection
      14:59:20,892 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:24,502 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:28,112 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:31,722 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:35,332 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:38,942 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:42,552 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:46,162 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found
      14:59:49,771 ERROR [UNICAST] bbb.bb.b.bbb:53434: sender window for bbb.bb.b.bbb:53434 not found

       


      Logs from node-2

       

      14:58:15,878 WARN  [ClusterConnectionManager] Connection failure detected. Clean up and retry connection. maxRetry: -1 retryInterval: 5000
      14:58:17,923 ERROR [ClusterConnectionManager] Retrying ConnectionInfo org.jboss.messaging.core.impl.clusterconnection.ClusterConnectionManager$ConnectionInfo@13da2f0 failed after maxmum retry
      : 0
      14:58:18,943 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask@13cb06d: detected failure on control connection Thread[control: Socket[addr=xxx01.xxx.xxxxx.com/bbb.bb.b.bbb,port=33852,localport=48464],5,] (4sv65s-iiagu4-h5lhb5u1-1-h5lhd5ce-al: requesting new control connection
      14:58:49,490 INFO  [GroupMember] org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener@1bebf70 got new view [aaa.aa.aa.aaa:40220|32] [aaa.aa.aa.aaa:40220], old view is [bbb.bb.b.bbb:53434|31] [bbb.bb.b.bbb:53434, aaa.aa.aa.aaa:40220]
      14:58:49,490 INFO  [GroupMember] I am (aaa.aa.aa.aaa:40220)
      14:58:49,492 INFO  [MessagingPostOffice] JBoss Messaging is failing over for failed node 1. If there are many messages to reload this may take some time...
      14:58:49,665 INFO  [MessagingPostOffice] JBoss Messaging failover completed
      14:58:49,665 INFO  [GroupMember] Dead members: 1 ([bbb.bb.b.bbb:53434])
      14:58:49,665 INFO  [GroupMember] All Members : 1 ([aaa.aa.aa.aaa:40220])
      14:58:50,539 INFO  [MARS-PARTITION] Suspected member: bbb.bb.b.bbb:53434
      14:58:50,645 WARN  [NAKACK] aaa.aa.aa.aaa:40220] discarded message from non-member bbb.bb.b.bbb:53434, my view is [aaa.aa.aa.aaa:40220|32] [aaa.aa.aa.aaa:40220]
      14:58:50,681 INFO  [MARS-PARTITION] New cluster view for partition MARS-PARTITION (id: 33, delta: -1) : [aaa.aa.aa.aaa:1099]
      14:58:50,690 INFO  [MARS-PARTITION] I am (aaa.aa.aa.aaa:1099) received membershipChanged event:
      14:58:50,691 INFO  [MARS-PARTITION] Dead members: 1 ([bbb.bb.b.bbb:1099])
      14:58:50,691 INFO  [MARS-PARTITION] New Members : 0 ([])
      14:58:50,691 INFO  [MARS-PARTITION] All Members : 1 ([aaa.aa.aa.aaa:1099])


      02:49:48,706 ERROR [JmsServerSession] Unexpected error delivering message delegator->JBossMessage[41537308229632249]:PERSISTENT, deliveryId=249
      java.lang.reflect.UndeclaredThrowableException
              at $Proxy202.onMessage(Unknown Source)
              at org.jboss.resource.adapter.jms.inflow.JmsServerSession.onMessage(JmsServerSession.java:179)
              at org.jboss.jms.client.container.ClientConsumer.callOnMessageStatic(ClientConsumer.java:160)
              at org.jboss.jms.client.container.SessionAspect.handleRun(SessionAspect.java:831)
              at org.jboss.aop.advice.org.jboss.jms.client.container.SessionAspect_z_handleRun_8285419.invoke(SessionAspect_z_handleRun_8285419.java)
              at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
              at org.jboss.jms.client.container.ClosedInterceptor.invoke(ClosedInterceptor.java:170)
              at org.jboss.aop.advice.PerInstanceInterceptor.invoke(PerInstanceInterceptor.java:86)
              at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
              at org.jboss.jms.client.delegate.ClientSessionDelegate.run(ClientSessionDelegate.java)
              at org.jboss.jms.client.JBossSession.run(JBossSession.java:199)
              at org.jboss.resource.adapter.jms.inflow.JmsServerSession.run(JmsServerSession.java:236)
              at org.jboss.resource.work.WorkWrapper.execute(WorkWrapper.java:205)
              at org.jboss.util.threadpool.BasicTaskWrapper.run(BasicTaskWrapper.java:260)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
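
      To line up the view changes on both nodes, something like the following could be used to pull the views out of the log lines and check whether a given node is still a member. This is only a rough sketch: the class name ViewLogScan and the regular expression are assumptions based on the "[creator|viewId] [members]" text in the log lines above, and nothing in it is part of JBoss or JGroups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a JBoss/JGroups class): extracts cluster views from log lines.
public class ViewLogScan {

    // Assumed view format, taken from the log lines in this post:
    //   [creatorAddress|viewId] [member1, member2, ...]
    private static final Pattern VIEW =
            Pattern.compile("\\[([^\\[\\]|]+)\\|(\\d+)\\] \\[([^\\]]*)\\]");

    /** One parsed view: who created it, its id, and its member list. */
    public static final class View {
        public final String creator;
        public final int id;
        public final List<String> members;

        View(String creator, int id, List<String> members) {
            this.creator = creator;
            this.id = id;
            this.members = members;
        }

        /** True if the given address (host:port as printed in the log) is in this view. */
        public boolean contains(String address) {
            return members.contains(address);
        }
    }

    /** Returns every view found in a single log line, in order of appearance. */
    public static List<View> parseViews(String logLine) {
        List<View> views = new ArrayList<View>();
        Matcher m = VIEW.matcher(logLine);
        while (m.find()) {
            List<String> members = new ArrayList<String>();
            for (String part : m.group(3).split(",")) {
                String address = part.trim();
                if (!address.isEmpty()) {
                    members.add(address);
                }
            }
            views.add(new View(m.group(1), Integer.parseInt(m.group(2)), members));
        }
        return views;
    }

    public static void main(String[] args) {
        // Sample line copied from the node-1 log above (shortened).
        String line = "14:58:50,674 WARN  [GMS] I (bbb.bb.b.bbb:53434) am not a member of view "
                + "[aaa.aa.aa.aaa:40220|32] [aaa.aa.aa.aaa:40220], shunning myself and leaving the group";
        for (View v : parseViews(line)) {
            System.out.println("view " + v.id + " created by " + v.creator
                    + ", contains node-1: " + v.contains("bbb.bb.b.bbb:53434"));
        }
    }
}
```

      Running this over both server logs (one line at a time, with wrapped lines joined first) should show at which view id each node stops appearing in the other's member list.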

       

       

       

      Thanks,

      Sri.