7 Replies Latest reply on Jul 24, 2011 6:58 PM by clebert.suconic

On a cluster, the core bridge created for the cluster has reconnect attempts set to -1

jalandip Feb 1, 2011 2:00 AM

I have a two-node cluster with no backups, using HornetQ 2.1.2. When one of the peers is shut down, the live peer detects the shutdown, but the ClientSession keeps trying to reconnect. I saw that the client created for the bridge has reconnect attempts set to -1 in ClusterConnectionImpl.java. I see lots of connection exceptions while the peer server is down and there is no connectivity at all. It's easily reproducible by shutting down the peer so that all connectivity is lost.

My questions are:

1) Why are reconnect attempts set to -1 on the core bridge created for the cluster?

2) Is there any configuration knob to set this to some other value? I haven't found one so far.
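For context on question 1: by HornetQ's convention a reconnect-attempts value of -1 means "retry forever", 0 means "never retry", and a positive N means "give up after N retries". The sketch below illustrates only that convention. `ReconnectPolicy`, `shouldRetry`, `connect`, and the `safetyCap` parameter are hypothetical names, not HornetQ code, and the cap exists only so the unlimited case terminates in a demo.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of reconnect-attempts semantics; not HornetQ's implementation. */
public class ReconnectPolicy {

    /** -1 = retry forever, 0 = never retry, N > 0 = at most N retries. */
    static boolean shouldRetry(int reconnectAttempts, int retriesSoFar) {
        return reconnectAttempts == -1 || retriesSoFar < reconnectAttempts;
    }

    /**
     * Makes one initial connection attempt, then retries according to the policy.
     * safetyCap is a hypothetical demo guard so the -1 case stops.
     * A real client would also sleep/back off between attempts; this sketch does not.
     * Returns the total number of connection attempts made.
     */
    static int connect(int reconnectAttempts, BooleanSupplier tryConnect, int safetyCap) {
        int attempts = 1;
        if (tryConnect.getAsBoolean()) {
            return attempts;
        }
        int retries = 0;
        while (shouldRetry(reconnectAttempts, retries) && attempts < safetyCap) {
            retries++;
            attempts++;
            if (tryConnect.getAsBoolean()) {
                break;
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        BooleanSupplier peerDown = () -> false; // the peer never comes back
        System.out.println(connect(5, peerDown, 1000));  // 6: 1 initial attempt + 5 retries
        System.out.println(connect(-1, peerDown, 1000)); // 1000: only the demo cap stops it
    }
}
```

With -1, nothing in the policy itself ever ends the loop while the peer stays down, which matches the continuous SYN traffic described in this thread.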

ataylor Feb 1, 2011 3:03 AM (in response to jalandip)

Yes, bridges automatically try to reconnect when a server is unavailable. This is to cope with temporary unavailability. It should stop, though, once the node has been informed of a change in the available cluster nodes; see the updateConnectors method.
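A minimal sketch of the behaviour described here, assuming a shared set of live node IDs that a topology/discovery listener keeps up to date: the bridge retries without limit, but re-checks that set before every attempt and gives up once the target node is no longer listed. `TopologyAwareBridge`, `onTopologyChanged`, and `reconnect` are hypothetical names chosen for illustration; this is not how HornetQ's updateConnectors is actually implemented.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: unlimited retries that stop when the node leaves the topology. */
public class TopologyAwareBridge {
    // Node IDs currently believed to be in the cluster (updated by a hypothetical listener).
    final Set<String> liveNodes = ConcurrentHashMap.newKeySet();

    /** Called when discovery reports a new set of cluster nodes. */
    void onTopologyChanged(Set<String> nodes) {
        liveNodes.retainAll(nodes); // drop nodes that disappeared
        liveNodes.addAll(nodes);    // add nodes that appeared
    }

    /**
     * Retries forever (reconnect attempts = -1) but stops as soon as the target
     * node is no longer part of the topology. Returns true if a connection was made.
     */
    boolean reconnect(String nodeId, BooleanSupplier tryConnect, Runnable beforeEachRetry) {
        while (liveNodes.contains(nodeId)) {
            if (tryConnect.getAsBoolean()) {
                return true;
            }
            beforeEachRetry.run(); // a real bridge would sleep/back off here
        }
        return false; // node left the cluster: stop retrying
    }

    public static void main(String[] args) {
        TopologyAwareBridge bridge = new TopologyAwareBridge();
        bridge.onTopologyChanged(Set.of("node-1", "node-2"));
        int[] retries = {0};
        // Simulate discovery reporting node-2 gone after 3 failed retries.
        boolean connected = bridge.reconnect("node-2", () -> false, () -> {
            if (++retries[0] == 3) {
                bridge.onTopologyChanged(Set.of("node-1"));
            }
        });
        System.out.println(connected + " after " + retries[0] + " retries"); // false after 3 retries
    }
}
```

If the topology notification never arrives, as reported later in this thread, the `while` condition stays true and the loop never ends.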

jalandip Feb 1, 2011 5:28 AM (in response to ataylor)

Andy, that's true, but something is not working correctly here. When I shut down the peer I get the following messages:
===============================================================
2011-02-01 09:31:56,248 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (Thread-3 (group:HornetQ-client-global-threads-944889298):) Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]
2011-02-01 09:31:56,251 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (Thread-6 (group:HornetQ-client-global-threads-944889298):) Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]
2011-02-01 09:31:56,252 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (Thread-7 (group:HornetQ-client-global-threads-944889298):) Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]
================================================================

But the connector goes into failover mode and keeps trying to reconnect. A tcpdump on the live server shows it sending TCP SYN packets to the peer. Also, if I now make the peer unreachable on the network, I get a "No route to host" exception continuously.

I put a log statement in the main thread loop of DiscoveryGroupImpl.java, but I see that when the connection is down the connectors map becomes empty, so the callListeners method is never called from there.

This is easily reproducible with the clustered-topic example. I changed ClusteredExampleTopic.java to print more logs and added a 2-second sleep after every publish. With both servers up, I kill the second instance, and a tcpdump on the loopback interface shows that the live server is still trying to connect to the peer. This looks like a bug to me. I also added some logging in the connectorChange() method in DiscoveryGroupImpl.java, and it is never called.

ataylor Feb 1, 2011 5:51 AM (in response to jalandip)

It may or may not be a bug, but we have lots of tests for this functionality, and to be honest it's academic, as we have just rewritten this functionality. If you want to debug further, feel free: start by debugging the run method of DiscoveryGroupImpl to see what happens when the connector times out and is removed.

jalandip Feb 1, 2011 6:07 AM (in response to ataylor)

You mean 2.2 has lots of changes in this area?

ataylor Feb 1, 2011 6:35 AM (in response to jalandip)

Yes.

rick.dong Jul 23, 2011 10:38 PM (in response to ataylor)

Any progress on this one, Jalandip? It looks like in 2.1.2 ClusterConnectionImpl implemented DiscoveryListener, so the updateConnector method did get called, but in 2.2.5 it no longer implements DiscoveryListener. With reconnectionAttempts = -1 for the cluster connection, the remote queue binding never gets removed until the crashed server comes back up. If the client fails over when the crash happens, the live server then has two bindings (one local and one remote) for the same address, and with round-robin delivery the client misses messages whenever the remote binding gets its turn. Is this a bug, Andy?
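To make the arithmetic of this scenario concrete, here is a toy simulation: one local binding with the failed-over consumer, one stale remote binding for the crashed node, and strict round-robin between them. `RoundRobinBindings`, `Binding`, and `deliver` are hypothetical; HornetQ's real routing is more involved, and this only models the alternation the post describes.

```java
import java.util.List;

/** Hypothetical toy model of round-robin routing across local and stale remote bindings. */
public class RoundRobinBindings {
    /** A binding either reaches a live consumer or points at a crashed node. */
    record Binding(String name, boolean reachable) {}

    /** Routes messages round-robin; returns how many reached a live consumer. */
    static int deliver(List<Binding> bindings, int messages) {
        int delivered = 0;
        for (int i = 0; i < messages; i++) {
            Binding target = bindings.get(i % bindings.size());
            if (target.reachable()) {
                delivered++;
            } // otherwise the message is routed toward the crashed node, not to the client
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<Binding> bindings = List.of(
                new Binding("local", true),
                new Binding("remote(crashed node)", false));
        System.out.println(deliver(bindings, 10)); // 5: half the messages miss the client
    }
}
```

With one stale remote binding and one local binding, every second message misses the failed-over client until the remote binding is removed.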

clebert.suconic Jul 24, 2011 6:58 PM (in response to rick.dong)

This is as designed: you are supposed to start the server back up.

However, this is being changed now. I'm introducing a reconnect-attempts setting on the cluster connection. That means that once those attempts are exhausted, the cluster connection will be considered dead and the remote binding will be removed.
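A sketch of the behaviour described above, under stated assumptions: the cluster connection tries a bounded number of times and, if every attempt fails, declares the peer dead and drops its remote binding, so routing only sees live bindings. `BoundedClusterConnection`, `reconnect`, the `bindings` list, and the `"remote:" + peer` naming are all hypothetical; this thread does not give the real setting's name or default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: bounded reconnect attempts, then removal of the dead peer's binding. */
public class BoundedClusterConnection {
    // Bindings for one address: a local one and one for the remote peer (illustrative names).
    final List<String> bindings = new ArrayList<>(List.of("local", "remote:node-2"));

    /**
     * Tries to connect up to reconnectAttempts times (-1 = forever, the old behaviour).
     * If every attempt fails, the peer is considered dead and its remote binding is removed.
     */
    boolean reconnect(String peer, int reconnectAttempts, BooleanSupplier tryConnect) {
        for (int i = 0; reconnectAttempts == -1 || i < reconnectAttempts; i++) {
            if (tryConnect.getAsBoolean()) {
                return true;
            }
        }
        bindings.remove("remote:" + peer); // peer declared dead: stop routing to it
        return false;
    }

    public static void main(String[] args) {
        BoundedClusterConnection conn = new BoundedClusterConnection();
        boolean up = conn.reconnect("node-2", 3, () -> false);
        System.out.println(up + " " + conn.bindings); // false [local]
    }
}
```

Once the remote binding is gone, round-robin has only the local binding left, so every message reaches the failed-over client.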