connection to a backup in a cluster
jmesnil Nov 26, 2009 1:29 PM
I found an issue while working on the cluster with backup failover test.
The setup is as follows:
* 3 live servers (0, 1, 2) and 3 corresponding backup servers (3, 4, 5)
* cluster connections between all the nodes
The test:
* start all servers
* wait for all bindings
* send/receive messages
* fail node #0
=> backup #3 is activated and starts its cluster conns to live nodes #1 & #2
* wait for all bindings
* send/receive messages
* fail node #1
=> backup #4 is activated and starts its cluster conns to live nodes #0 and #2
And there is a problem here: backup #4 has a cluster conn configured with live #0 and backup #3.
It will try to connect to #0 again and again, and it will never connect to backup #3.
This cluster conn issue can be reproduced with a client that is configured with static connectors and infinite reconnection, while the live server is down. The client goes into an infinite loop when creating the session, even though a backup server is configured:
    // #0 is the live node, #1 is its backup
    setupClusters();
    startServers(0, 1);
    stopServers(0);

    TransportConfiguration liveTC = new TransportConfiguration(InVMConnectorFactory.class.getName());
    liveTC.getParams().put(TransportConstants.SERVER_ID_PROP_NAME, 0);
    TransportConfiguration backupTC = new TransportConfiguration(InVMConnectorFactory.class.getName());
    backupTC.getParams().put(TransportConstants.SERVER_ID_PROP_NAME, 1);

    ClientSessionFactoryImpl sf = new ClientSessionFactoryImpl(liveTC, backupTC);
    sf.setReconnectAttempts(-1);

    // => infinite loop trying to connect to server #0, which is down
    ClientSession session = sf.createSession();
    assertNotNull(session);
I'll have to change the ClusterConnection code to support that use case. Something like trying to connect to the live server a finite number of times and, if that does not succeed, opening a connection to the backup server instead. I need to think more about it, as it can introduce another set of problems (e.g. while starting a cluster, a cluster conn could connect to a backup before the corresponding live server is started and activate it, etc.)
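
To make the idea concrete, here is a rough, standalone sketch of the "finite attempts on the live, then fall back to the backup" logic. It is not the ClusterConnection code and uses no HornetQ classes: `FailoverSketch`, `connectWithFallback`, the target names and the `tryConnect` predicate are all hypothetical names made up for the illustration, and the startup race mentioned above is not handled.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class FailoverSketch {

    /**
     * Tries each target in order, giving each one at most maxAttempts tries.
     * Returns the first target that accepted a connection, or empty if none did.
     * All names here are hypothetical; this is not the HornetQ API.
     */
    static Optional<String> connectWithFallback(List<String> targets,
                                                int maxAttempts,
                                                Predicate<String> tryConnect) {
        for (String target : targets) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                if (tryConnect.test(target)) {
                    return Optional.of(target);
                }
            }
            // Attempts exhausted for this target: move on to the next one
            // (the backup) instead of retrying the dead live node forever.
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated state: live node #0 is down, backup #3 has been activated.
        Predicate<String> reachable = target -> target.equals("backup-3");

        Optional<String> connected =
            connectWithFallback(List.of("live-0", "backup-3"), 3, reachable);

        System.out.println(connected.orElse("no server reachable")); // prints backup-3
    }
}
```

With an unbounded number of attempts on the first target, the inner loop would never give the backup a chance, which is the behaviour seen with setReconnectAttempts(-1) above. Bounding the attempts per target is what lets the backup be reached.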