3 Replies Latest reply on Dec 15, 2005 2:12 AM by jukvaa

    Failover of MDB connection takes a long time

    siano

      I have the following problem with clustering (a two-node cluster), a JMS queue implemented as an HA singleton, and MDBs, on JBoss 4.0.1SP1.

      Under some conditions the clustering works:
      If I shut down the node that hosts the JMS queues, the other node takes over the queues after about 15 seconds, and the remaining MDBs on that node start processing the messages again.
      Even though I would like the failover to happen a little faster, this is the expected behaviour.

      However, if I power off the node that hosts the JMS queues (or pull the network cable), the remaining node starts to take over the JMS queues after about 20 seconds. From that point on I can send new messages to the queue, but the MDBs don't process them for 15 minutes. Only after that does message processing resume (and the messages sent in the meantime get processed).

      During this 15-minute period, I get the following log messages every minute.

      2005-04-22 14:21:04,438 WARN [org.jboss.mq.Connection] Connection failure:
      org.jboss.mq.SpyJMSException: Connection Failed; - nested throwable: (java.io.IOException: ping timeout.)
       at org.jboss.mq.Connection.asynchFailure(Connection.java:436)
       at org.jboss.mq.Connection$PingTask.run(Connection.java:1385)
       at EDU.oswego.cs.dl.util.concurrent.ClockDaemon$RunLoop.run(ClockDaemon.java:364)
       at java.lang.Thread.run(Thread.java:534)
      Caused by: java.io.IOException: ping timeout.
       at org.jboss.mq.Connection$PingTask.run(Connection.java:1377)
       ... 2 more
      2005-04-22 14:21:04,499 WARN [org.jboss.mq.Connection] Connection failure:
      org.jboss.mq.SpyJMSException: Connection Failed; - nested throwable: (java.io.IOException: ping timeout.)
       at org.jboss.mq.Connection.asynchFailure(Connection.java:436)
       at org.jboss.mq.Connection$PingTask.run(Connection.java:1385)
       at EDU.oswego.cs.dl.util.concurrent.ClockDaemon$RunLoop.run(ClockDaemon.java:364)
       at java.lang.Thread.run(Thread.java:534)
      Caused by: java.io.IOException: ping timeout.
       at org.jboss.mq.Connection$PingTask.run(Connection.java:1377)
       ... 2 more
      


      Is there a way to reduce the time until the MDBs reconnect to the queue?

        • 1. Re: Failover of MDB connection takes a long time

          Hi,

          Did you find any solution to this? I'm having the same problem with JBoss 4.0.2.


          13:36:25,997 WARN [Connection] Connection failure, use javax.jms.Connection.setExceptionListener() to handle this error and reconnect
          org.jboss.mq.SpyJMSException: No pong received; - nested throwable: (java.io.IOException: ping timeout.)
           at org.jboss.mq.Connection$PingTask.run(Connection.java:1323)
           at EDU.oswego.cs.dl.util.concurrent.ClockDaemon$RunLoop.run(ClockDaemon.java:364)
           at java.lang.Thread.run(Thread.java:595)
          Caused by: java.io.IOException: ping timeout.
           ... 3 more
          


          The warning mentions setExceptionListener, but I cannot see how I could use it inside an MDB...
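
          For reference, this is roughly what that warning points at for a standalone JMS client; a minimal sketch, assuming JMS 1.0.2-style queue interfaces and the usual "ConnectionFactory" JNDI binding (the reconnect handling itself is only a placeholder). Inside an MDB the container manages the connection, so there is no place to register such a listener in the bean; any faster reconnect would have to come from the container configuration instead.

          import javax.jms.ExceptionListener;
          import javax.jms.JMSException;
          import javax.jms.QueueConnection;
          import javax.jms.QueueConnectionFactory;
          import javax.naming.InitialContext;

          public class ReconnectingJmsClient {
              public static void main(String[] args) throws Exception {
                  InitialContext ctx = new InitialContext();
                  // Adjust the JNDI name to your own connection factory binding.
                  QueueConnectionFactory factory =
                          (QueueConnectionFactory) ctx.lookup("ConnectionFactory");
                  QueueConnection connection = factory.createQueueConnection();

                  // This is the hook the warning message refers to.
                  connection.setExceptionListener(new ExceptionListener() {
                      public void onException(JMSException e) {
                          // Placeholder: close the broken connection and repeat the
                          // lookup/create steps to reconnect.
                          System.err.println("JMS connection failed: " + e.getMessage());
                      }
                  });

                  connection.start();
              }
          }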

          • 2. Re: Failover of MDB connection takes a long time
            vegecat

            I experienced similar problems. In my case, I tested stateful session bean failover. I unplugged the network cable of the JBoss server that was responding to the requests. The other clustered JBoss server discovered the failure but did not respond until five minutes later. I have no idea whether the problem lies in the client's smart proxy or in the JBoss server.

            If you find a solution, please share it with us. Thanks.

            • 3. Re: Failover of MDB connection takes a long time

              We eventually solved the MDB problem by deploying the MDB as an HA singleton. In JBoss 4.0.3 the failover itself seemed to work OK, but upgrading was not an option for us.

              With remote EJB clients, we were able to reduce the failover time by tuning TCP settings. On Linux we reduced the /proc/sys/net/ipv4/tcp_retries2 value to 2, which brought failover down to a reasonable time.