2 Replies Latest reply on Sep 3, 2014 12:48 AM by anilkumar_konapure

    Failover after failback not working with HornetQ 2.4.0 Final and JBoss 7.2

    anilkumar_konapure

      We are using JBoss 7.2 with HornetQ 2.4. The topology is colocated HA using the in-VM connection factory.

      When we start JBoss, the JMS server sometimes fails to start and continuously logs the following errors:

       

      INFO [org.hornetq.jms.server] (ServerService Thread Pool -- 122) HQ121004: JMS Server Manager Caching command for destroyConnectionFactory for RemoteConnectionFactory since the JMS Server is not active yet

      HQ121004: JMS Server Manager Caching command for createQueue for XXX since the JMS Server is not active yet

      (MSC service thread 1-17) HQ122018: Could not start recovery discovery on XARecoveryConfig [transportConfiguration = [TransportConfiguration(name=8bfca16f-baf5-11e3-bcee-cb0c6044933c, factory=org-hornetq-core-remoting-impl-invm-InVMConnectorFactory) ?server-id=0], discoveryConfiguration = null, username=null, password=****], we will retry every recovery scan until the server is available

      INFO [org.hornetq.ra] (default-threads - 2) HQ151005: awaiting HornetQ Server availability

      ERROR [org.hornetq.core.server] (HQ119000: Activation for server HornetQServerImpl::serverUUID=null) HQ224000: Failure in initialisation: HornetQIllegalStateException[errorType=ILLEGAL_STATE message=HQ119026: Backup Server was not yet in sync with live]

      at org.hornetq.core.server.impl.HornetQServerImpl$SharedNothingBackupActivation.run(HornetQServerImpl.java:2523) [hornetq-server-2.4.0-SNAPSHOT.jar:]

      at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_45]

      ERROR [stderr] (HQ119000: Activation for server HornetQServerImpl::serverUUID=null) HornetQIllegalStateException[errorType=ILLEGAL_STATE message=HQ119026: Backup Server was not yet in sync with live]

       

      If the server is restarted manually, it starts working.

      However, on a subsequent failover of the other node, the backup server on the first node does not become live. In this scenario messages are lost.

       

      Scenario is:

      Node 1: Start (Live/Backup) Node 2: Start (Live/Backup)

      Messages are created.

      Load balancing works fine.

      Node 1: Stop() Node 2: (Live/Live)  -> First Failover

      Load balancing working fine. Messages were replicated to 28

      Node 1: Start()

      ConnectionFactory error (see the stack trace above). The JMS server did not start.

      Manual Restart of Node 1

      Node 1: (Live/Backup) Node 2: (Live/Backup) -> First Failback

      Load balancing working fine.

      Node 1: (Live/Backup) Node 2: (Stop) -> Second failover

      The backup on Node 1 did not become live.

      Hence messages from Node 2’s live server were not replicated to Node 1 and were lost.

       

      In the logs I see the following error, which seems to be the cause:

      2014-09-02 03:26:31,241 WARN [org.hornetq.core.server] (Thread-30 (HornetQ-client-global-threads-1912556493)) HQ222095: Connection failed with failedOver=false: HornetQNotConnectedException[errorType=NOT_CONNECTED message=HQ119006: Channel disconnected]

      at org.hornetq.core.client.impl.ClientSessionFactoryImpl.connectionDestroyed(ClientSessionFactoryImpl.java:421) [hornetq-core-client-2.4.0-SNAPSHOT.jar:]

      at org.hornetq.core.remoting.impl.netty.NettyConnector$Listener$1.run(NettyConnector.java:871) [hornetq-core-client-2.4.0-SNAPSHOT.jar:]

      at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:107) [hornetq-core-client-2.4.0-SNAPSHOT.jar:]

      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_45]

      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_45]

      at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_45]
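
      To compare which HornetQ message IDs show up on each node at each step of the scenario, a small log scanner may help. The following is only a minimal sketch: the class name HqCodeScan and the method countCodes are made up for illustration, and it assumes that every message ID has the form "HQ" followed by six digits, as in the log lines above. It also counts IDs that appear inside thread names (such as HQ119000), not only the ID of the log message itself.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HornetQ or JBoss.
class HqCodeScan {

    // Assumption: message IDs look like "HQ" plus six digits, e.g. HQ119026.
    private static final Pattern HQ_CODE = Pattern.compile("HQ\\d{6}");

    /** Counts every matching ID in the given log lines, in first-seen order. */
    public static Map<String, Integer> countCodes(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : lines) {
            Matcher m = HQ_CODE.matcher(line);
            while (m.find()) {
                String code = m.group();
                Integer seen = counts.get(code);
                counts.put(code, seen == null ? 1 : seen + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two lines copied from the logs in this post.
        List<String> sample = Arrays.asList(
            "ERROR [org.hornetq.core.server] (HQ119000: Activation for server "
                + "HornetQServerImpl::serverUUID=null) HQ224000: Failure in initialisation: "
                + "HornetQIllegalStateException[errorType=ILLEGAL_STATE "
                + "message=HQ119026: Backup Server was not yet in sync with live]",
            "WARN [org.hornetq.core.server] HQ222095: Connection failed with failedOver=false: "
                + "HornetQNotConnectedException[errorType=NOT_CONNECTED "
                + "message=HQ119006: Channel disconnected]");

        // Print one line per ID with its number of occurrences.
        for (Map.Entry<String, Integer> e : countCodes(sample).entrySet()) {
            System.out.println(e.getKey() + " x" + e.getValue());
        }
    }
}
```

      Running it over the server.log of each node for the same time window should show, for example, whether HQ119026 appears only on the restarted node or on both.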

       

      Any suggestions on this would be a great help.

       

      Thanks,

      Anil