Failover / HA not working in stand-alone mode
ejb3workshop Dec 16, 2011 6:15 AM
I am trying to set up an HA environment consisting of two servers. The data folder, including the journal, is shared between the servers via an NFS share; the directory /mnt/share is common to both systems. After some fiddling with the configuration I managed to get failover working (I think).
When I stop server1 using CTRL+C the other (server1_standby) seems to take over and become the live server.
My test client sends a series of simple text messages to a queue hosted on the server. It uses the JNDI and JMS APIs rather than HornetQ-specific code. It does a standard JNDI lookup and retrieves the SpecialConnectionFactory configured as shown below / attached.
The problem is that the client (attached) keeps logging the following message
Exception Session is closed during sending
rather than failing over to the backup server. My question is: should I expect it to continue? I don't mind it getting some exceptions while failover takes place, but I am hoping to avoid having to reconnect to the backup server manually. The examples included seem to suggest this is possible, even though they focus on message acknowledgment rather than sending.
Any pointers on what to try to get HA with failover working correctly?
Also, if I restart the stopped server, the client doesn't seem to resume sending messages. Any pointers to get this working? With the supplied examples I did manage to get this working. I tried to compare their server configuration with mine, but haven't been able to find the relevant difference.
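To make the expected behaviour concrete, here is a minimal sketch of a send loop that tolerates a few exceptions during failover and keeps using the same sender instead of reconnecting by hand. This is not the attached client: `Sender` is a hypothetical stand-in for a JMS `MessageProducer` so the example compiles without the JMS jar, and `sendWithRetry`, `maxAttempts` and `pauseMs` are names made up for illustration. Whether a real session recovers after "Session is closed" depends on the provider failing over transparently, which is exactly what I am asking about.

```java
import java.util.concurrent.TimeUnit;

final class TolerantSendLoop {

    /** Hypothetical stand-in for a call such as MessageProducer.send(...). */
    interface Sender {
        void send(String text) throws Exception;
    }

    /**
     * Tries to send one message on the same sender, pausing between failed attempts.
     * Returns the number of attempts used; rethrows the last failure once
     * maxAttempts is exhausted.
     */
    static int sendWithRetry(Sender sender, String text, int maxAttempts, long pauseMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sender.send(text);
                return attempt;
            } catch (Exception e) {
                last = e;
                // Only pause if another attempt will follow.
                if (attempt < maxAttempts) {
                    TimeUnit.MILLISECONDS.sleep(pauseMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated outage: the first two sends fail, the third succeeds.
        Sender flaky = text -> {
            if (++calls[0] <= 2) {
                throw new IllegalStateException("Session is closed");
            }
        };
        System.out.println(sendWithRetry(flaky, "hello", 5, 10)); // prints 3
    }
}
```

With a real HA connection factory I would hope the retry happens inside the client library, so that application code like this is not needed at all.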
I am using hornetq-2.2.5.Final on Linux with 64-bit Java 1.7.0_01.
Thanks in advance
Here are extracts from my configuration. The complete files are attached to the discussion.
hornetq-jms.xml
<connection-factory name="NettyConnectionFactory">
   <xa>true</xa>
   <ha>true</ha>
   <!-- Pause 1 second between connect attempts -->
   <retry-interval>1000</retry-interval>
   <!-- Multiply subsequent reconnect pauses by this multiplier. This can be used to
        implement an exponential back-off. For our purposes we just set to 1.0 so each reconnect
        pause is the same length -->
   <retry-interval-multiplier>1.0</retry-interval-multiplier>
   <!-- Try reconnecting an unlimited number of times (-1 means "unlimited") -->
   <reconnect-attempts>-1</reconnect-attempts>
   <client-failure-check-period>100</client-failure-check-period>
   <failover-on-server-shutdown>true</failover-on-server-shutdown>
   <failover-on-initial-connection>true</failover-on-initial-connection>
   <discovery-group-ref discovery-group-name="dg-group1"/>
   <connectors>
      <connector-ref connector-name="netty"/>
   </connectors>
   <entries>
      <entry name="/SpecialConnectionFactory"/>
   </entries>
   <connection-load-balancing-policy-class-name>org.hornetq.api.core.client.loadbalance.RandomConnectionLoadBalancingPolicy</connection-load-balancing-policy-class-name>
</connection-factory>
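As I understand the retry settings above, the pause before each reconnect attempt starts at retry-interval and is multiplied by retry-interval-multiplier after every attempt. The sketch below is only a simplified model of that schedule, not HornetQ's actual code. The cap parameter `maxIntervalMs` is an assumption on my part (it is not set in my configuration), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the client reconnect pauses; not HornetQ's implementation. */
final class RetrySchedule {

    /**
     * Returns the pause (in milliseconds) before each of the first `attempts`
     * reconnect attempts: start at retryIntervalMs, multiply after each attempt,
     * and never exceed maxIntervalMs (an assumed cap).
     */
    static List<Long> pauses(long retryIntervalMs, double multiplier,
                             long maxIntervalMs, int attempts) {
        List<Long> result = new ArrayList<>();
        long interval = Math.min(retryIntervalMs, maxIntervalMs);
        for (int i = 0; i < attempts; i++) {
            result.add(interval);
            long next = (long) (interval * multiplier);
            interval = Math.min(next, maxIntervalMs);
        }
        return result;
    }

    public static void main(String[] args) {
        // Values from my configuration: 1000 ms interval, multiplier 1.0.
        System.out.println(pauses(1000, 1.0, 2000, 4)); // [1000, 1000, 1000, 1000]
        // An exponential back-off for comparison: multiplier 2.0, capped at 8000 ms.
        System.out.println(pauses(1000, 2.0, 8000, 5)); // [1000, 2000, 4000, 8000, 8000]
    }
}
```

With a multiplier of 1.0 and reconnect-attempts set to -1, I would therefore expect the client to retry once per second indefinitely rather than give up.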
My HornetQ configuration file (hornetq-configuration.xml) contains the following details:
<configuration xmlns="urn:hornetq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="urn:hornetq /schema/hornetq-configuration.xsd">
   <clustered>true</clustered>
   <shared-store>true</shared-store>
   <backup>${backup:false}</backup>
   <allow-failback>true</allow-failback>
   <failover-on-shutdown>true</failover-on-shutdown>
   <paging-directory>${data.dir:../data}/paging</paging-directory>
   <bindings-directory>${data.dir:../data}/bindings</bindings-directory>
   <journal-directory>${data.dir:../data}/journal</journal-directory>
   <journal-min-files>10</journal-min-files>
- hornetq-beans.xml (2.1 KB)
- ClientSender.java.zip (1,012 bytes)
- server1_standby.sh (2.1 KB)
- server1.sh (2.1 KB)
- hornetq-jms.xml (3.7 KB)
- hornetq-configuration.xml (4.1 KB)