Following the bug reporting instructions, I believe this issue is related to these JIRAs:
Our environment consists of two nodes (A and B) connected by an unreliable, low-bandwidth, high-latency, error-prone network (e.g., wireless or satellite links). We have configured a core bridge on node A to reliably forward messages across this network to node B. I believe we have set all of the settings needed for the bridge to reconnect:
If the bridge is idle, it reconnects after a network disconnect. However, if the network is disconnected (for varying lengths of time) and then reconnected while the bridge is sending a message from node A to node B, the bridge hangs and does not resume sending messages. Server A detects the connection failure:
[Thread-0 (group:HornetQ-client-global-threads-2089015486)] 09:38:24,844 WARNING [org.hornetq.core.remoting.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive data from server for org.hornetq.integration.transports.netty.NettyConnection@279977bd[local= /172.16.1.93:41790, remote=/172.16.2.129:5445] [code=3]
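As I understand it, this warning comes from a check that declares the connection dead when nothing (data or ping) has arrived from the server within a configured period. The following is a minimal, hypothetical Java model of that idea, not HornetQ's actual RemotingConnectionImpl code; the class name and the 30-second period are assumptions chosen for illustration:

```java
// Hypothetical sketch, NOT HornetQ's RemotingConnectionImpl: models a
// "did not receive data from server" check based on a fixed check period.
public class FailureDetector {
    private final long checkPeriodMillis; // assumed value for illustration, e.g. 30 s
    private long lastDataReceivedAt;      // timestamp of the last packet of any kind

    public FailureDetector(long checkPeriodMillis, long nowMillis) {
        this.checkPeriodMillis = checkPeriodMillis;
        this.lastDataReceivedAt = nowMillis;
    }

    /** Call whenever any packet (data or ping) arrives from the server. */
    public void onDataReceived(long nowMillis) {
        lastDataReceivedAt = nowMillis;
    }

    /** Called periodically; true once the silence exceeds the check period. */
    public boolean isConnectionFailed(long nowMillis) {
        return nowMillis - lastDataReceivedAt > checkPeriodMillis;
    }

    public static void main(String[] args) {
        FailureDetector detector = new FailureDetector(30_000, 0);
        detector.onDataReceived(10_000);
        System.out.println(detector.isConnectionFailed(35_000)); // false: 25 s of silence
        System.out.println(detector.isConnectionFailed(45_000)); // true: 35 s of silence
    }
}
```

Timestamps are passed in explicitly so the check is deterministic; a real client would drive it from a timer. In our case this detection step appears to work; it is the recovery afterwards that does not.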
The bridge does not resume sending once the network is re-established. If HornetQ on server A is restarted, the bridge reconnects and resumes sending messages. Note that when we attempt to shut down server A, the following messages appear:
[HornetQ Server Shutdown Timer] 10:07:44,571 WARNING [org.hornetq.core.server.cluster.impl.BridgeImpl] Timed out waiting to stop
[HornetQ Server Shutdown Timer] 10:08:16,406 WARNING [org.hornetq.core.server.impl.HornetQServerImpl] Timed out waiting for pool to terminate
Based on the documentation, I expected the bridge to detect that the send operation had failed and to attempt to reconnect.
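The behaviour we expected can be summarised in a small, hypothetical Java sketch. It is not HornetQ's BridgeImpl, and the parameter names (retryIntervalMillis, retryIntervalMultiplier, reconnectAttempts) are illustrative assumptions that only loosely mirror the kind of retry settings a bridge exposes. The key point is that a failed send drops into a reconnect loop with exponential back-off and then resends, instead of blocking:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the expected behaviour, NOT HornetQ's BridgeImpl:
// a failed send is treated as a connection failure and triggers a reconnect
// loop with exponential back-off, after which the message is resent.
public class ReconnectingSender {
    /** Abstraction over Thread.sleep so the loop can be tested without waiting. */
    public interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    private final long retryIntervalMillis;
    private final double retryIntervalMultiplier;
    private final int reconnectAttempts; // -1 means retry forever
    private final Sleeper sleeper;

    public ReconnectingSender(long retryIntervalMillis, double retryIntervalMultiplier,
                              int reconnectAttempts, Sleeper sleeper) {
        this.retryIntervalMillis = retryIntervalMillis;
        this.retryIntervalMultiplier = retryIntervalMultiplier;
        this.reconnectAttempts = reconnectAttempts;
        this.sleeper = sleeper;
    }

    /** Back-off delay before the given (0-based) reconnect attempt. */
    public long delayForAttempt(int attempt) {
        return (long) (retryIntervalMillis * Math.pow(retryIntervalMultiplier, attempt));
    }

    /**
     * Sends one message. If the send fails, reconnects (with back-off) and
     * resends. Returns true once the message is sent, false if reconnect
     * attempts are exhausted.
     */
    public boolean send(BooleanSupplier trySend, BooleanSupplier tryConnect)
            throws InterruptedException {
        while (!trySend.getAsBoolean()) {
            if (!reconnect(tryConnect)) {
                return false;
            }
        }
        return true;
    }

    private boolean reconnect(BooleanSupplier tryConnect) throws InterruptedException {
        for (int attempt = 0; reconnectAttempts < 0 || attempt < reconnectAttempts; attempt++) {
            sleeper.sleep(delayForAttempt(attempt));
            if (tryConnect.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Long> delays = new ArrayList<>();
        ReconnectingSender sender = new ReconnectingSender(1_000, 2.0, 5, delays::add);
        int[] sendCalls = {0};
        int[] connectCalls = {0};
        // First send fails (network down); the link comes back on the third connect attempt.
        boolean sent = sender.send(
                () -> ++sendCalls[0] > 1,
                () -> ++connectCalls[0] >= 3);
        System.out.println(sent + " " + delays); // true [1000, 2000, 4000]
    }
}
```

Injecting the Sleeper keeps the example testable; passing `Thread::sleep` instead would make it block for real. In our setup, server A logs the failure but never appears to reach the equivalent of the reconnect-and-resend step.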