10 Replies Latest reply on Jan 21, 2014 9:15 PM by jbertram

    Failover: online data replication failure

    jeroen.v

      Hi,

       

      We're currently using HornetQ 2.1.2.Final and regularly run into a split-brain scenario. It occurs at seemingly random times, but my impression is that it happens mostly during periods of low activity. Does anyone have any ideas?

       

      From the log file of the master HornetQ node:

      [Thread-16 (group:HornetQ-server-threads995824187-1527616973)] 18:13:25,660 WARNING [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl]  Connection failure has been detected: Did not receive data from server for org.hornetq.core.remoting.impl.netty.NettyConnection@5bfd526f[local= /192.168.150.3:41360, remote=remote-backup-server/192.168.150.4:5446] [code=3]

      [Thread-16 (group:HornetQ-server-threads995824187-1527616973)] 18:13:27,671 WARNING [org.hornetq.core.replication.impl.ReplicationManagerImpl]  Connection to the backup node failed, removing replication now

      HornetQException[errorCode=3 message=Did not receive data from server for org.hornetq.core.remoting.impl.netty.NettyConnection@5bfd526f[local= /192.168.150.3:41360, remote=remote-backup-server/192.168.150.4:5446]]

              at org.hornetq.core.client.impl.FailoverManagerImpl$PingRunnable.run(FailoverManagerImpl.java:1198)

       

      From the log file of the backup HornetQ node:

      [main] 09:24:24,204 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Server version 2.1.2.Final (Colmeia, 120) started

      [Old I/O server worker (parentId: 1682540307, channelId: 1591650152, null => /192.168.150.4:5446)] 18:13:27,715 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  Activating backup server

      [Old I/O server worker (parentId: 1682540307, channelId: 1591650152, null => /192.168.150.4:5446)] 18:13:27,716 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using AIO Journal

      [Old I/O server worker (parentId: 1682540307, channelId: 1591650152, null => /192.168.150.4:5446)] 18:13:30,086 WARNING [org.hornetq.core.server.cluster.impl.BroadcastGroupImpl]  local-bind-address specified for broadcast group but no local-bind-port specified so socket will NOT be bound to a local address/port

      [Thread-27 (group:HornetQ-server-threads1136341612-1619783666)] 18:13:32,087 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Connecting bridge sf.local-cluster.466fcfde-021b-11e0-af98-1cc1de6fb76e to its destination

      [hornetq-failure-check-thread] 18:13:32,114 WARNING [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl]  Connection failure has been detected: Did not receive ping from /192.168.150.3:41360. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. The connection will now be closed. [code=3]

       

      Thanks, Jeroen