1 Reply Latest reply on Mar 13, 2010 3:36 AM by timfox

    Core Bridge hangs when network disconnected during message send

      Following the bug reporting instructions, I believe this issue is related to these JIRAs:

      https://jira.jboss.org/jira/browse/HORNETQ-216

      https://jira.jboss.org/jira/browse/HORNETQ-47

       

      Our environment consists of two nodes (A and B) connected by an unreliable, low-bandwidth, high-latency, error-prone network (e.g., wireless or satellite communications). We have set up a core bridge on node A to forward messages reliably across this network to node B. I believe we have configured all of the relevant settings so that the bridge should reconnect:

      <bridges>
              <bridge name="source-to-dest-bridge">
                      <queue-name>jms.queue.BridgeSendQueue</queue-name>
                      <forwarding-address>jms.queue.BridgeReceiveQueue</forwarding-address>
                      <retry-interval>2000</retry-interval>
                      <retry-interval-multiplier>1.0</retry-interval-multiplier>
                      <reconnect-attempts>-1</reconnect-attempts>
                      <failover-on-server-shutdown>false</failover-on-server-shutdown>
                      <use-duplicate-detection>true</use-duplicate-detection>
                      <confirmation-window-size>10000000</confirmation-window-size>
                      <connector-ref connector-name="netty-destination-connector"/>
              </bridge>
      </bridges>
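
      To make explicit how I read the retry settings above, here is a minimal sketch of the reconnect schedule I expect them to produce. It assumes the documented meaning of the parameters: each wait is the previous wait times retry-interval-multiplier, and reconnect-attempts of -1 means retry forever. The class and method names are hypothetical and this is not HornetQ's actual implementation, only a model of the expected behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HornetQ: models the waits between
// reconnect attempts implied by the bridge settings.
public class BridgeRetrySchedule {

    // Returns the waits (in ms) before up to `count` reconnect attempts.
    // A negative reconnectAttempts means "no limit" (assumed meaning of -1).
    static List<Long> delays(long retryIntervalMs, double multiplier,
                             int reconnectAttempts, int count) {
        List<Long> out = new ArrayList<>();
        double interval = retryIntervalMs;
        for (int i = 0; i < count; i++) {
            if (reconnectAttempts >= 0 && i >= reconnectAttempts) {
                break; // attempt limit reached, stop retrying
            }
            out.add(Math.round(interval));
            interval *= multiplier; // grow the wait for the next attempt
        }
        return out;
    }

    public static void main(String[] args) {
        // Values from the bridge configuration above.
        System.out.println(delays(2000, 1.0, -1, 5));
    }
}
```

      With a multiplier of 1.0 the wait stays constant, so under this reading the bridge should try to reconnect every 2 seconds, indefinitely.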

       

      If the bridge is idle, it reconnects after a network disconnect. However, if the network is disconnected (for various lengths of time) while a message is being sent between the two nodes and is then reconnected, the bridge hangs and does not resume sending messages. Server A detects the connection failure:

       

      [Thread-0 (group:HornetQ-client-global-threads-2089015486)] 09:38:24,844 WARNING [org.hornetq.core.remoting.impl.RemotingConnectionImpl]  Connection failure has been detected: Did not receive data from server for org.hornetq.integration.transports.netty.NettyConnection@279977bd[local= /172.16.1.93:41790, remote=/172.16.2.129:5445] [code=3]

       

      The bridge does not resume sending once the network is re-established. If HornetQ on server A is restarted, the bridge reconnects and resumes sending messages. Note that when attempting to shut down server A, these messages appear:

      [HornetQ Server Shutdown Timer] 10:07:44,571 WARNING [org.hornetq.core.server.cluster.impl.BridgeImpl]  Timed out waiting to stop

       

      And

       

      [HornetQ Server Shutdown Timer] 10:08:16,406 WARNING [org.hornetq.core.server.impl.HornetQServerImpl]  Timed out waiting for pool to terminate

       

      Based on the documentation, I expected the bridge to detect that the send operation had failed and to attempt to reconnect.