Failover: online data replication failure
jeroen.v Feb 23, 2011 4:32 AM
Hi,
We're currently using HornetQ 2.1.2-FINAL and we regularly run into a split-brain scenario. It happens at seemingly random times, but my impression is that it occurs mostly during periods of low activity. Does anyone have any ideas?
From the log file of the master HornetQ node:
[Thread-16 (group:HornetQ-server-threads995824187-1527616973)] 18:13:25,660 WARNING [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive data from server for org.hornetq.core.remoting.impl.netty.NettyConnection@5bfd526f[local= /192.168.150.3:41360, remote=remote-backup-server/192.168.150.4:5446] [code=3]
[Thread-16 (group:HornetQ-server-threads995824187-1527616973)] 18:13:27,671 WARNING [org.hornetq.core.replication.impl.ReplicationManagerImpl] Connection to the backup node failed, removing replication now
HornetQException[errorCode=3 message=Did not receive data from server for org.hornetq.core.remoting.impl.netty.NettyConnection@5bfd526f[local= /192.168.150.3:41360, remote=remote-backup-server/192.168.150.4:5446]]
at org.hornetq.core.client.impl.FailoverManagerImpl$PingRunnable.run(FailoverManagerImpl.java:1198)
From the log file of the backup HornetQ node:
[main] 09:24:24,204 INFO [org.hornetq.core.server.impl.HornetQServerImpl] HornetQ Server version 2.1.2.Final (Colmeia, 120) started
[Old I/O server worker (parentId: 1682540307, channelId: 1591650152, null => /192.168.150.4:5446)] 18:13:27,715 INFO [org.hornetq.core.server.impl.HornetQServerImpl] Activating backup server
[Old I/O server worker (parentId: 1682540307, channelId: 1591650152, null => /192.168.150.4:5446)] 18:13:27,716 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager] Using AIO Journal
[Old I/O server worker (parentId: 1682540307, channelId: 1591650152, null => /192.168.150.4:5446)] 18:13:30,086 WARNING [org.hornetq.core.server.cluster.impl.BroadcastGroupImpl] local-bind-address specified for broadcast group but no local-bind-port specified so socket will NOT be bound to a local address/port
[Thread-27 (group:HornetQ-server-threads1136341612-1619783666)] 18:13:32,087 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl] Connecting bridge sf.local-cluster.466fcfde-021b-11e0-af98-1cc1de6fb76e to its destination
[hornetq-failure-check-thread] 18:13:32,114 WARNING [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive ping from /192.168.150.3:41360. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. The connection will now be closed. [code=3]
Thanks,
Jeroen