6 Replies Latest reply on Sep 29, 2011 8:30 AM by bnc119

    HornetQ 2.2.5 JMS Bridge not reconnecting after network outage

    bnc119

      Hi HornetQ community:

      I have two hosts, Host A and Host B, each running a HornetQ 2.2.5 server.  Each HornetQ server has a single topic configured; let us call it "exampleTopic".

      Host A has a message publisher writing to exampleTopic on Host A.  The messages are diverted from the topic to a queue, then "bridged" to exampleTopic on Host B, where a single subscriber reads them.

      All is well with this simple setup until I deliberately sever the network between the two machines.  If I keep the connection severed for long enough (approximately 20-30 seconds), I usually get this warning in the HornetQ log on Host A:

      org.hornetq.core.server.cluster.impl.BridgeImpl:  Unable to send message, will try again when bridge reconnects.

      err:  HornetQException [ error code = 3] message = Timed out waiting for response when sending packet

      Once I re-establish the network between the two machines, Host B usually receives the one message that was due to arrive just before the network went down, but no messages after that.  If I then restart the HornetQ server on Host B, Host B immediately receives all the queued messages that it missed while the network was down.

      On the other hand, if I reconnect the network before the warning above appears, the messages are buffered correctly and delivered to Host B once the network is restored.

      Is there something else I need to configure to get fully transparent, fault-tolerant bridge reconnection?  I have connection-ttl = -1 on both HornetQ servers.  My understanding is that this prevents a server from closing a connection when it detects a network fault, a crashed client, etc.
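
      To make concrete what I mean by "reconnecting", here is a rough Java sketch of the retry schedule I would expect a bridge to follow after an outage: wait an initial interval, grow it by a multiplier on each failed attempt, and cap it at a maximum.  This is only an illustration, not my actual configuration and not HornetQ code; the class name and the parameters (retryIntervalMs, multiplier, maxIntervalMs, attempts) are placeholders I made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models a capped, multiplicative reconnect schedule.
// It does not call any HornetQ API; all names here are assumptions.
public class BridgeRetrySketch {

    // Returns the delay (in ms) to wait before each reconnect attempt.
    // The delay starts at retryIntervalMs, is multiplied after every
    // attempt, and never exceeds maxIntervalMs.
    static List<Long> retryDelays(long retryIntervalMs, double multiplier,
                                  long maxIntervalMs, int attempts) {
        List<Long> delays = new ArrayList<>();
        double next = retryIntervalMs;
        for (int i = 0; i < attempts; i++) {
            delays.add(Math.min((long) next, maxIntervalMs));
            next *= multiplier;
        }
        return delays;
    }

    public static void main(String[] args) {
        // Example values only: 1 s initial wait, doubling, capped at 10 s.
        System.out.println(retryDelays(1000, 2.0, 10000, 6));
    }
}
```

      With a schedule like this I would expect the bridge to keep retrying until Host B is reachable again and then drain the queue, which is not what I am seeing once the warning has been logged.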

      Thanks