Yes, bridges automatically try to reconnect when a server is unavailable; this is to cope with temporary unavailability. The retries should stop once the node has been informed of a change in the available cluster nodes, see the updateConnectors method.
Andy, that's true, but something is not working correctly here. When I shut down the peer I get the following messages:
2011-02-01 09:31:56,248 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (Thread-3 (group:HornetQ-client-global-threads-944889298):) Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]
2011-02-01 09:31:56,251 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (Thread-6 (group:HornetQ-client-global-threads-944889298):) Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]
2011-02-01 09:31:56,252 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (Thread-7 (group:HornetQ-client-global-threads-944889298):) Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]
but the connector goes into failover mode and keeps trying to reconnect. A tcpdump on the live server shows it sending TCP SYN packets to the peer. Also, if I now make the peer unreachable on the network, I get a "no route to host" exception continuously.
I put a log statement in the main thread loop of DiscoveryGroupImpl.java, and I see that when the connection is down the connectors Map becomes empty, so the callListeners method is never called from there.
This is easily reproducible with the clustered-topic example. I changed ClusteredExampleTopic.java to produce more log output and added a 2-second sleep after every publish. With both servers up, I kill the second instance; a tcpdump on the loopback interface then shows that the live server is still trying to connect to the peer. This looks like a bug to me. I also added some logging in the connectorChange() method in DiscoveryGroupImpl.java, and it is never called.
It may or may not be a bug, but we have lots of tests for this functionality, and to be honest it's academic, as we have just rewritten it. If you want to debug further, feel free; start by debugging the run method of DiscoveryGroupImpl to see what happens when the connector times out and is removed.
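Roughly, the timeout-and-notify logic in question can be boiled down to something like the sketch below. This is a simplified illustration only, not the real HornetQ source; the names (broadcastReceived, checkExpired, the Entry class) are my own approximations. The point is that the listener notification on expiry is what tells bridges to stop retrying a dead peer:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the kind of loop DiscoveryGroupImpl.run() performs.
// Not HornetQ code; names and structure are approximations.
public class DiscoverySketch {
    static class Entry {
        final String connector;
        final long lastSeen;
        Entry(String connector, long lastSeen) {
            this.connector = connector;
            this.lastSeen = lastSeen;
        }
    }

    interface DiscoveryListener {
        void connectorsChanged();
    }

    final Map<String, Entry> connectors = new HashMap<>();
    final List<DiscoveryListener> listeners = new ArrayList<>();
    final long timeoutMillis;

    DiscoverySketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // A broadcast from a node refreshes (or adds) its connector entry.
    void broadcastReceived(String nodeId, String connector, long now) {
        boolean isNew = !connectors.containsKey(nodeId);
        connectors.put(nodeId, new Entry(connector, now));
        if (isNew) {
            callListeners();
        }
    }

    // Called periodically: drop connectors whose broadcasts have stopped.
    // If this removal never results in callListeners() firing, cluster
    // bridges keep retrying the dead peer, which matches the behaviour
    // reported above.
    void checkExpired(long now) {
        boolean changed =
            connectors.values().removeIf(e -> now - e.lastSeen > timeoutMillis);
        if (changed) {
            callListeners();
        }
    }

    void callListeners() {
        for (DiscoveryListener l : listeners) {
            l.connectorsChanged();
        }
    }
}
```

If the listener registration is missing (see the DiscoveryListener discussion below in the thread), expiry still empties the map but nobody is told about it.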
You mean 2.2 has lots of changes in this area?
Any progress on this one, Jalandip? It looks like in 2.1.2 ClusterConnectionImpl implemented DiscoveryListener, so the updateConnector method did get called, but in 2.2.5 it no longer implements DiscoveryListener. With reconnectionAttempts = -1 for the cluster connection, the remote queue binding never gets removed until the crashed server comes back up. If a client failed over when the crash happened, there would now be two bindings on the live server (one local and one remote) for the same address; with round-robin delivery, the client will miss messages whenever the remote binding gets its turn. Is this a bug, Andy?
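To make the message loss concrete, here is a toy model (plain Java, not HornetQ code) of round-robin delivery over the bindings for one address, where one binding is live and the other is the stale remote one:

```java
import java.util.List;

// Toy illustration: round-robin over an address's bindings when one of
// them is a stale remote binding for a dead server. Not HornetQ code.
public class RoundRobinSketch {
    // bindingAlive.get(i) says whether binding i still points at a live node.
    // Returns how many of `messages` actually reach a live consumer.
    public static int deliveredToLive(int messages, List<Boolean> bindingAlive) {
        int delivered = 0;
        for (int i = 0; i < messages; i++) {
            // round-robin: message i goes to binding (i mod #bindings)
            boolean alive = bindingAlive.get(i % bindingAlive.size());
            if (alive) {
                delivered++;
            }
            // messages routed to the dead remote binding are lost
            // until that binding is removed
        }
        return delivered;
    }
}
```

With one live and one dead binding, every other message is lost, which is exactly the "client misses messages" symptom described above.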
This is as designed... you are supposed to bring the server back up.
However, this is being changed now. I'm introducing a reconnect-attempts setting on the cluster connection. That means that once the attempts are exhausted, the node will be considered dead and the remote binding will be removed in that case.
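For reference, on a cluster connection that would presumably be configured along these lines (a hypothetical snippet; element names and values are assumed from the 2.2-era configuration style, so check the actual schema before relying on them):

```xml
<!-- hypothetical example, not a verified configuration -->
<cluster-connection name="my-cluster">
   <address>jms</address>
   <connector-ref>netty</connector-ref>
   <!-- give up after 5 tries instead of retrying forever (-1) -->
   <reconnect-attempts>5</reconnect-attempts>
   <discovery-group-ref discovery-group-name="dg-group1"/>
</cluster-connection>
```

Once the attempts are exhausted, the remote binding for the dead node would be dropped rather than kept around indefinitely.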