Failover after failback not working with HornetQ 2.4.0 Final and JBoss 7.2
anilkumar_konapure Sep 2, 2014 9:26 AM
We are using JBoss 7.2 with HornetQ 2.4. The topology is collocated HA using an in-VM connection factory.
When we start JBoss, the JMS server sometimes fails to start and the following messages are logged continuously:
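For readers unfamiliar with this topology, a collocated replicated setup usually means each JBoss node runs two HornetQ servers: its own live server and a backup for the other node's live server. The fragment below is a generic sketch of how that is typically expressed in the JBoss messaging subsystem, not the configuration from this deployment. The server name "backup-for-node2" and the group names "pair-1" and "pair-2" are placeholders, and connectors, acceptors, journal directories and cluster connections are omitted.

```xml
<!-- Sketch only: node 1 of a two-node collocated pair. Names are hypothetical. -->
<subsystem xmlns="urn:jboss:domain:messaging:1.3">

    <!-- Live server owned by node 1 -->
    <hornetq-server>
        <backup>false</backup>
        <shared-store>false</shared-store>                   <!-- replication, not shared journal -->
        <check-for-live-server>true</check-for-live-server>  <!-- on restart, look for a backup that took over -->
        <backup-group-name>pair-1</backup-group-name>        <!-- matched by the backup hosted on node 2 -->
        <!-- connectors, acceptors, cluster-connection, destinations omitted -->
    </hornetq-server>

    <!-- Backup hosted on node 1 for node 2's live server -->
    <hornetq-server name="backup-for-node2">
        <backup>true</backup>
        <shared-store>false</shared-store>
        <allow-failback>true</allow-failback>                <!-- hand control back when node 2's live returns -->
        <backup-group-name>pair-2</backup-group-name>        <!-- matches node 2's live server -->
        <!-- separate journal/paging/bindings directories and connectors omitted -->
    </hornetq-server>

</subsystem>
```

Node 2 would mirror this with the group names swapped. In a layout like this, the second failover in the scenario depends on the backup hosted on node 1 having fully re-synchronised with node 2's live server after the failback.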
INFO [org.hornetq.jms.server] (ServerService Thread Pool -- 122) HQ121004: JMS Server Manager Caching command for destroyConnectionFactory for RemoteConnectionFactory since the JMS Server is not active yet
HQ121004: JMS Server Manager Caching command for createQueue for XXX since the JMS Server is not active yet
(MSC service thread 1-17) HQ122018: Could not start recovery discovery on XARecoveryConfig [transportConfiguration = [TransportConfiguration(name=8bfca16f-baf5-11e3-bcee-cb0c6044933c, factory=org-hornetq-core-remoting-impl-invm-InVMConnectorFactory) ?server-id=0], discoveryConfiguration = null, username=null, password=****], we will retry every recovery scan until the server is available
INFO [org.hornetq.ra] (default-threads - 2) HQ151005: awaiting HornetQ Server availability
ERROR [org.hornetq.core.server] (HQ119000: Activation for server HornetQServerImpl::serverUUID=null) HQ224000: Failure in initialisation: HornetQIllegalStateException[errorType=ILLEGAL_STATE message=HQ119026: Backup Server was not yet in sync with live]
at org.hornetq.core.server.impl.HornetQServerImpl$SharedNothingBackupActivation.run(HornetQServerImpl.java:2523) [hornetq-server-2.4.0-SNAPSHOT.jar:]
at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_45]
ERROR [stderr] (HQ119000: Activation for server HornetQServerImpl::serverUUID=null) HornetQIllegalStateException[errorType=ILLEGAL_STATE message=HQ119026: Backup Server was not yet in sync with live]
If the server is then restarted manually, it starts working.
However, on a subsequent failover of the other node, the backup server on the first node does not become live. In this scenario messages are lost.
Scenario is:
Node 1: Start (Live/Backup) Node 2: Start (Live/Backup)
Messages are created.
Load balancing works fine.
Node 1: Stop() Node 2: (Live/Live) -> First Failover
Load balancing working fine. Messages were replicated to 28
Node 1: Start()
ConnectionFactory error (see the stack trace above). The JMS server did not start.
Manual Restart of Node 1
Node 1: (Live/Backup) Node 2: (Live/Backup) -> First Failback
Load balancing working fine.
Node 1: (Live/Backup) Node 2: (Stop) -> Second failover
Backup did not turn to Live
Hence messages from Node 2’s Live were not replicated to Node 1 and they were lost.
In the logs I see the following error, which seems to be the cause:
2014-09-02 03:26:31,241 WARN [org.hornetq.core.server] (Thread-30 (HornetQ-client-global-threads-1912556493)) HQ222095: Connection failed with failedOver=false: HornetQNotConnectedException[errorType=NOT_CONNECTED message=HQ119006: Channel disconnected]
at org.hornetq.core.client.impl.ClientSessionFactoryImpl.connectionDestroyed(ClientSessionFactoryImpl.java:421) [hornetq-core-client-2.4.0-SNAPSHOT.jar:]
at org.hornetq.core.remoting.impl.netty.NettyConnector$Listener$1.run(NettyConnector.java:871) [hornetq-core-client-2.4.0-SNAPSHOT.jar:]
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:107) [hornetq-core-client-2.4.0-SNAPSHOT.jar:]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_45]
at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_45]
Any suggestions on this would be a great help.
Thanks,
Anil