HA: errors on resurrected live server after failover
cwong15 Mar 12, 2012 4:31 PM
Hi. I am testing HA/failover on HornetQ 2.2.5 based on the examples. I am puzzled about connection error messages that come up during the failover scenario. This is the sequence:
- Run 2 HornetQ instances, configured as a live/backup HA pair using discovery.
- Shut down the live instance (failover-on-shutdown is true).
- The backup becomes live, as expected.
- Start up the original live instance (allow-failback is true).
- The newly resurrected instance starts logging reconnection error messages every 2 seconds. Everything seems to work otherwise.
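For reference, the HA flags mentioned in the steps above would sit in each server's hornetq-configuration.xml roughly as sketched below. This is only an illustration, not the exact files from the test: the shared-store setting is an assumption (it is what the HA examples use), and the discovery/broadcast groups and cluster connection are omitted.

```xml
<!-- Live server: hornetq-configuration.xml (fragment, illustrative) -->
<configuration xmlns="urn:hornetq">
    <!-- Assumption: live and backup share one journal, as in the HA examples -->
    <shared-store>true</shared-store>
    <!-- A clean shutdown of the live server also triggers failover -->
    <failover-on-shutdown>true</failover-on-shutdown>
    <allow-failback>true</allow-failback>
</configuration>

<!-- Backup server: hornetq-configuration.xml (fragment, illustrative) -->
<configuration xmlns="urn:hornetq">
    <backup>true</backup>
    <shared-store>true</shared-store>
    <!-- Lets the backup step down when the original live server returns -->
    <allow-failback>true</allow-failback>
</configuration>
```

The two fragments belong to two separate files; they are shown together only to make the live/backup difference visible.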
This is what the error messages look like:
03-12-2012 16:12:51 DEBUG impl.ClientSessionFactoryImpl: Trying reconnection attempt 21
03-12-2012 16:12:51 DEBUG netty.NettyConnector: Started Netty Connector version 3.2.3.Final-r${buildNumber}
03-12-2012 16:12:51 DEBUG impl.ClientSessionFactoryImpl: Trying to connect at the main server using connector :org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5446&host=172-17-172-5&tcp-send-buffer-size=262144&tcp-no-delay=true&tcp-receive-buffer-size=262144
03-12-2012 16:12:51 DEBUG impl.ClientSessionFactoryImpl: Main server is not up. Hopefully there's a backup configured now!
This seems to be the stack trace where this is happening:
"Thread-1 (group:HornetQ-client-global-threads-1119552518)" daemon prio=10 tid=0x00007f292c00f000 nid=0x76e1 in Object.wait() [0x00007f2940b9e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007b1b55178> (a java.lang.Object)
    at org.hornetq.core.client.impl.ClientSessionFactoryImpl.getConnectionWithRetry(ClientSessionFactoryImpl.java:916)
    - locked <0x00000007b1b55178> (a java.lang.Object)
    at org.hornetq.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:840)
    at org.hornetq.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:588)
    - locked <0x00000007b1b549b8> (a java.lang.Object)
    at org.hornetq.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:482)
    at org.hornetq.core.client.impl.ClientSessionFactoryImpl.access$800(ClientSessionFactoryImpl.java:78)
    at org.hornetq.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1318)
    at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:528)
    at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:298)
    at org.hornetq.core.client.impl.ClientSessionFactoryImpl$Channel0Handler$1.run(ClientSessionFactoryImpl.java:1262)
    at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
The reconnection attempts are failing because the resurrected server has become the live server again (failback), yet it keeps trying to connect to the other server, which has reverted to backup mode. What puzzles me is that I have not set any retries on my connection factories, so they should not be continuously trying to reconnect. Where is this connection activity coming from, and is it benign?
For what it's worth, these connection factories are configured in my hornetq-jms.xml:
<connection-factory name="hornetqConnectionFactory">
    <xa>false</xa>
    <connectors>
        <connector-ref connector-name="netty-connector"/>
    </connectors>
    <entries>
        <entry name="/hornetqConnectionFactory"/>
    </entries>
    <ha>true</ha>
    <use-global-pools>false</use-global-pools>
</connection-factory>
<connection-factory name="hornetqXaConnectionFactory">
    <xa>true</xa>
    <connectors>
        <connector-ref connector-name="netty-connector"/>
    </connectors>
    <entries>
        <entry name="/hornetqXaConnectionFactory"/>
    </entries>
    <ha>true</ha>
    <use-global-pools>false</use-global-pools>
</connection-factory>
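To make "no retries set" concrete, here is a sketch of what spelling out the retry behavior on one of these factories could look like. The retry-interval and reconnect-attempts elements are not in the configuration above; the values shown are illustrative and are what I understand the client defaults to be (2000 ms and 0), so treat them as assumptions. Element order may also matter for schema validation.

```xml
<connection-factory name="hornetqConnectionFactory">
    <xa>false</xa>
    <connectors>
        <connector-ref connector-name="netty-connector"/>
    </connectors>
    <entries>
        <entry name="/hornetqConnectionFactory"/>
    </entries>
    <ha>true</ha>
    <!-- Assumption: not in the original config; milliseconds between retries -->
    <retry-interval>2000</retry-interval>
    <!-- Assumption: not in the original config; 0 means do not retry -->
    <reconnect-attempts>0</reconnect-attempts>
    <use-global-pools>false</use-global-pools>
</connection-factory>
```

If the messages continue with these set explicitly, that would suggest the reconnection attempts come from somewhere other than the JMS connection factories.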
Thanks in advance for any insight.