We have a messaging application that uses HornetQ 2.3.0.Final. While testing the HornetQ fail-over and fail-back setup for high availability (one live server and one backup server), I have run into an issue for which I cannot find a solution.
I start the live and backup servers and can see the "backup announced" entry in the backup server's logs. I then start a client application that connects to the live server, creates consumers and producers, and sends and receives messages. When I kill the live server, fail-over works fine: the backup server becomes live, the client automatically fails over to it, and messaging continues to work. However, once the backup server is live, I see the following exception in its logs:
08:37:14,668 INFO [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 3.6.2.Final-c0d783c 10.0.1.6:6455 for CORE protocol
08:37:14,672 INFO [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 3.6.2.Final-c0d783c 10.0.1.6:6445 for CORE protocol
08:37:14,679 WARN [org.hornetq.core.client] HQ212028: error starting server locator: HornetQException[errorType=ILLEGAL_STATE message=null]
at org.hornetq.core.client.impl.ServerLocatorImpl.initialise(ServerLocatorImpl.java:371) [hornetq-core-client.jar:]
at org.hornetq.core.client.impl.ServerLocatorImpl.start(ServerLocatorImpl.java:566) [hornetq-core-client.jar:]
at org.hornetq.core.client.impl.ServerLocatorImpl$StaticConnector$1.connectionFailed(ServerLocatorImpl.java:1773) [hornetq-core-client.jar:]
at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:570) [hornetq-core-client.jar:]
at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:341) [hornetq-core-client.jar:]
at org.hornetq.core.client.impl.ClientSessionFactoryImpl$CloseRunnable.run(ClientSessionFactoryImpl.java:1631) [hornetq-core-client.jar:]
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client.jar:]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_15]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_15]
at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_15]
When I bring the live server up again, fail-back happens and the backup server shuts down (and I now see the same exception as above, but in the live server's logs). At this point, automatic client fail-back does not happen: the client application keeps waiting and never re-connects to the live server. If I then start the backup server again and kill the live server once more (so the backup becomes live again), the client application re-connects to the backup server (which is now live) and messaging starts working again. My guess is that the client kept waiting to re-connect to the backup server, when it should have re-connected to the live server after fail-back happened.
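In case it is relevant, these are the kinds of client-side settings that, as I understand from the HornetQ documentation, govern automatic reconnection. This is a generic hornetq-jms.xml fragment for illustration only, not my exact file; the connector name and values are placeholders:

```xml
<!-- hornetq-jms.xml: generic illustration, not my actual configuration -->
<connection-factory name="ConnectionFactory">
   <connectors>
      <!-- "netty" is a placeholder; must match a connector in hornetq-configuration.xml -->
      <connector-ref connector-name="netty"/>
   </connectors>
   <entries>
      <entry name="/ConnectionFactory"/>
   </entries>
   <!-- enable HA so the client tracks the live/backup topology -->
   <ha>true</ha>
   <!-- -1 = keep retrying forever instead of giving up -->
   <reconnect-attempts>-1</reconnect-attempts>
   <!-- milliseconds between reconnection attempts -->
   <retry-interval>1000</retry-interval>
</connection-factory>
```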
Is this behavior expected, or am I doing something wrong? Restarting the clients fixes everything, but I need the clients to fail back automatically.
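For reference, on the server side I understand fail-back is controlled by settings along these lines in hornetq-configuration.xml (again a generic illustration based on my reading of the HornetQ 2.3 docs, not my exact files):

```xml
<!-- hornetq-configuration.xml on the original live server: illustration only -->
<!-- live and backup share the same journal via a shared store -->
<shared-store>true</shared-store>
<!-- the backup server's file would instead set <backup>true</backup> -->
<backup>false</backup>
<!-- allow the backup to yield back to the live server when it restarts -->
<allow-failback>true</allow-failback>
```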
Attached are my configuration files. Any help would be appreciated.