-
1. Re: Repetitive Fail Over
ataylor May 13, 2014 5:16 AM (in response to yairogen) Your live server must be incorrectly configured. Make sure it announces itself correctly and that it announces the correct connector to use.
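For reference, a minimal sketch of the connector/acceptor pair the live server would announce, assuming a standalone hornetq-configuration.xml and the 10.45.37.122:5445 address seen in the logs below; the names "netty-connector" and "netty-acceptor" are placeholders:

    <connectors>
       <connector name="netty-connector">
          <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>
          <!-- host/port must be an address that the other node and the clients can actually reach -->
          <param key="host" value="10.45.37.122"/>
          <param key="port" value="5445"/>
       </connector>
    </connectors>

    <acceptors>
       <acceptor name="netty-acceptor">
          <factory-class>org.hornetq.core.remoting.impl.netty.NettyAcceptorFactory</factory-class>
          <param key="host" value="10.45.37.122"/>
          <param key="port" value="5445"/>
       </acceptor>
    </acceptors>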
-
2. Re: Repetitive Fail Over
yairogen May 13, 2014 6:20 AM (in response to ataylor) Which files can I attach so you can help me out?
Also, following up on https://community.jboss.org/thread/234727, could it be that I just need to add the "max-saved-replicated-journal-size" parameter? Should that help?
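For context, the setting referenced in that thread limits how many saved copies of the replicated journal a backup keeps across restarts/failbacks. A hedged sketch, assuming the element is spelled max-saved-replicated-journals-size as in the 2.x configuration schema and that it goes in the backup's hornetq-configuration.xml; the value shown is illustrative:

    <!-- backup hornetq-configuration.xml (sketch; check the exact element name against your hornetq-configuration.xsd) -->
    <max-saved-replicated-journals-size>2</max-saved-replicated-journals-size>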
-
3. Re: Repetitive Fail Over
ataylor May 13, 2014 6:25 AM (in response to yairogen) Check in the server logs that the backup gets announced when the live restarts, and check in your configuration that the cluster connection is announcing the correct connector. If both of these are correct then it may be environmental. You could try debugging your client and seeing what topology gets set.
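The cluster connection is where the announced connector is chosen. A minimal sketch of that section, assuming UDP discovery and the placeholder "netty-connector" / "dg-group1" names (adjust to whatever your configuration actually defines):

    <cluster-connections>
       <cluster-connection name="my-cluster">
          <address>jms</address>
          <!-- this connector-ref is what the node advertises to the rest of the cluster -->
          <connector-ref>netty-connector</connector-ref>
          <discovery-group-ref discovery-group-name="dg-group1"/>
       </cluster-connection>
    </cluster-connections>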
-
4. Re: Re: Repetitive Fail Over
yairogen May 13, 2014 7:08 AM (in response to ataylor) Live log:
07:13:06,890 INFO [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 4.0.13.Final 10.45.37.122:5445
07:13:06,903 INFO [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 4.0.13.Final 10.45.37.122:5455
07:13:06,921 INFO [org.hornetq.core.server] HQ221007: Server is now live
07:13:06,921 INFO [org.hornetq.core.server] HQ221001: HornetQ Server version 2.5.0.SNAPSHOT (Wild Hornet, 124) [705b9028-d9df-11e3-851c-8d2fc19a3b2f]
07:13:22,739 INFO [org.hornetq.core.server] HQ221025: Replication: sending JournalFileImpl: (hornetq-data-1.hq id = 148, recordID = 148) (size=10,485,760) to backup. AIOSequentialFile:/opt/hornetq/installed/hornetq-2.4.0.Final/bin/../data/journal/hornetq-data-1.hq
07:13:22,943 INFO [org.hornetq.core.server] HQ221025: Replication: sending JournalFileImpl: (hornetq-bindings-50.bindings id = 1, recordID = 1) (size=1,048,576) to backup. NIOSequentialFile ../data/bindings/hornetq-bindings-50.bindings
07:13:22,947 INFO [org.hornetq.core.server] HQ221025: Replication: sending JournalFileImpl: (hornetq-bindings-59.bindings id = 2, recordID = 2) (size=1,048,576) to backup. NIOSequentialFile ../data/bindings/hornetq-bindings-59.bindings
07:13:22,968 INFO [org.hornetq.core.server] HQ221025: Replication: sending JournalFileImpl: (hornetq-bindings-2.bindings id = 56, recordID = 56) (size=1,048,576) to backup. NIOSequentialFile ../data/bindings/hornetq-bindings-2.bindings
Backup log:
07:13:22,111 INFO [org.hornetq.core.server] HQ221109: HornetQ Backup Server version 2.5.0.SNAPSHOT (Wild Hornet, 124) [null] started, waiting live to fail before it gets active
07:13:23,933 INFO [org.hornetq.core.server] HQ221024: Backup server HornetQServerImpl::serverUUID=705b9028-d9df-11e3-851c-8d2fc19a3b2f is synchronized with live-server.
07:13:26,923 INFO [org.hornetq.core.server] HQ221031: backup announced
Killing the live server yields this in the backup log:
07:15:41,216 WARN [org.hornetq.core.client] HQ212037: Connection failure has been detected: HQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
07:15:41,215 WARN [org.hornetq.core.client] HQ212037: Connection failure has been detected: HQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
07:15:41,251 WARN [org.hornetq.core.client] HQ212037: Connection failure has been detected: HQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
07:15:41,292 WARN [org.hornetq.core.client] HQ212004: Failed to connect to server.
07:15:41,289 INFO [org.hornetq.core.server] HQ221037: HornetQServerImpl::serverUUID=705b9028-d9df-11e3-851c-8d2fc19a3b2f to become 'live'
07:15:41,274 WARNING [io.netty.channel.AbstractChannel] Can't invoke task later as EventLoop rejected it:
java.util.concurrent.RejectedExecutionException: event executor terminated
    at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:703) [netty.jar:4.0.13.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:296) [netty.jar:4.0.13.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:688) [netty.jar:4.0.13.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.invokeLater(AbstractChannel.java:727) [netty.jar:4.0.13.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.deregister(AbstractChannel.java:593) [netty.jar:4.0.13.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:565) [netty.jar:4.0.13.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadHandler.close(DefaultChannelPipeline.java:1018) [netty.jar:4.0.13.Final]
    at io.netty.channel.DefaultChannelHandlerContext.invokeClose(DefaultChannelHandlerContext.java:561) [netty.jar:4.0.13.Final]
    at io.netty.channel.DefaultChannelHandlerContext.close(DefaultChannelHandlerContext.java:546) [netty.jar:4.0.13.Final]
    at io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73) [netty.jar:4.0.13.Final]
    at io.netty.channel.DefaultChannelHandlerContext.invokeClose(DefaultChannelHandlerContext.java:561) [netty.jar:4.0.13.Final]
    at io.netty.channel.DefaultChannelHandlerContext.close(DefaultChannelHandlerContext.java:546) [netty.jar:4.0.13.Final]
    at io.netty.channel.DefaultChannelHandlerContext.close(DefaultChannelHandlerContext.java:424) [netty.jar:4.0.13.Final]
    at io.netty.channel.DefaultChannelPipeline.close(DefaultChannelPipeline.java:826) [netty.jar:4.0.13.Final]
    at io.netty.channel.AbstractChannel.close(AbstractChannel.java:178) [netty.jar:4.0.13.Final]
    at io.netty.channel.ChannelFutureListener$2.operationComplete(ChannelFutureListener.java:56) [netty.jar:4.0.13.Final]
    at io.netty.channel.ChannelFutureListener$2.operationComplete(ChannelFutureListener.java:52) [netty.jar:4.0.13.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:628) [netty.jar:4.0.13.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:593) [netty.jar:4.0.13.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:550) [netty.jar:4.0.13.Final]
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:414) [netty.jar:4.0.13.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:251) [netty.jar:4.0.13.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:502) [netty.jar:4.0.13.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452) [netty.jar:4.0.13.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346) [netty.jar:4.0.13.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) [netty.jar:4.0.13.Final]
    at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_05]
07:15:42,190 INFO [org.hornetq.core.server] HQ221003: trying to deploy queue jms.queue.DLQ
07:15:42,257 INFO [org.hornetq.core.server] HQ221003: trying to deploy queue jms.queue.ExpiryQueue
07:15:42,444 INFO [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 4.0.13.Final 10.45.37.123:5445
07:15:42,451 INFO [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 4.0.13.Final 10.45.37.123:5455
Restarting the live server.
Backup log (many more lines like this):
07:16:36,746 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:37,477 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:41,746 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:42,477 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:46,749 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:47,477 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:51,749 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:52,478 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:56,749 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:57,478 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:17:01,750 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
Live log:
07:16:36,720 INFO [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 4.0.13.Final 10.45.37.122:5445
07:16:36,733 INFO [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 4.0.13.Final 10.45.37.122:5455
07:16:36,752 INFO [org.hornetq.core.server] HQ221007: Server is now live
07:16:36,753 INFO [org.hornetq.core.server] HQ221001: HornetQ Server version 2.5.0.SNAPSHOT (Wild Hornet, 124) [705b9028-d9df-11e3-851c-8d2fc19a3b2f]
07:16:41,746 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:42,478 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:46,748 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
07:16:47,478 WARN [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=705b9028-d9df-11e3-851c-8d2fc19a3b2f
NOTE: I tried deleting the data folder on both nodes. Same behavior. It looks like after replicating once, it doesn't know it should go back to being the backup. Also, the "<backup>true</backup>" flag is only set in the backup configuration and not in the live configuration. Is that OK?
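On the <backup> question: <backup> defaults to false, so it is normal for only the backup's configuration to carry it explicitly. A minimal sketch of how a replicated (non-shared-store) pair is usually split; values are illustrative:

    <!-- live hornetq-configuration.xml -->
    <backup>false</backup>              <!-- may be omitted; false is the default -->
    <shared-store>false</shared-store>  <!-- replication rather than shared store -->

    <!-- backup hornetq-configuration.xml -->
    <backup>true</backup>
    <shared-store>false</shared-store>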
-
-
6. Re: Re: Repetitive Fail Over
yairogen May 13, 2014 7:59 AM (in response to ataylor) Just to verify I understand.
If I set "<allow-failback>true</allow-failback>" and "<check-for-live-server>true</check-for-live-server>", then when the primary comes back up it will detect the other live server using its id, and the newly started primary will kill the backup.
If I only set the "<check-for-live-server>true</check-for-live-server>" flag, then the original primary server will detect that there is an existing live server, take ownership, and become a new backup instead.
Is that correct?
-
7. Re: Repetitive Fail Over
jbertram May 13, 2014 10:39 AM (in response to yairogen) "Is that correct?"
Yes.
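To make the confirmed behavior concrete, a hedged sketch of where the two flags sit in a replicated live/backup pair; as I understand it, <check-for-live-server> belongs in the live server's configuration and <allow-failback> in the backup's, and the values below are illustrative:

    <!-- live hornetq-configuration.xml -->
    <!-- on restart, look for another live server using our node id and request failback -->
    <check-for-live-server>true</check-for-live-server>

    <!-- backup hornetq-configuration.xml -->
    <backup>true</backup>
    <!-- stop being live and return to backup duty when the original live asks to fail back -->
    <allow-failback>true</allow-failback>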