2 Replies Latest reply on Dec 16, 2011 11:09 AM by Alexander Hartner

    Failover / HA not working in stand-alone mode

    Alexander Hartner Expert

      I am trying to set up an HA environment consisting of two servers. The data folder, including the journal, is shared between the servers via an NFS share; the directory /mnt/share is common to both systems. After some fiddling with the configuration I managed to get failover working (I think).

       

      When I stop server1 using CTRL+C the other (server1_standby) seems to take over and become the live server.
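
      For reference, the standby side of a shared-store pair like this could be sketched roughly as follows. It is only an illustration built from the elements quoted further down in this post: the sub-directory names under /mnt/share are assumptions, and the live server would carry the same fragment with backup set to false.

```xml
<!-- Sketch of a fragment of the standby server's hornetq-configuration.xml.
     The sub-directory names under /mnt/share are assumptions. -->
<clustered>true</clustered>
<shared-store>true</shared-store>
<!-- true on the standby, false on the live server -->
<backup>true</backup>
<allow-failback>true</allow-failback>
<failover-on-shutdown>true</failover-on-shutdown>
<!-- Both servers must point at the same directories on the NFS share -->
<paging-directory>/mnt/share/paging</paging-directory>
<bindings-directory>/mnt/share/bindings</bindings-directory>
<journal-directory>/mnt/share/journal</journal-directory>
<large-messages-directory>/mnt/share/large-messages</large-messages-directory>
```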

       

      My test client sends a series of simple text messages to a queue hosted on the server. It uses the JNDI and JMS APIs rather than HornetQ-specific code. It does a standard JNDI lookup and retrieves the SpecialConnectionFactory configured as shown below / attached.

       

      The problem is that the client (attached) keeps logging the following message:

       

      Exception Session is closed during sending

       

      rather than failing over to the backup server. My question is: should I expect it to continue sending? I don't mind it getting some exceptions while failover takes place, but I am hoping to avoid having to reconnect to the backup server manually. The included examples seem to suggest this is possible, even though they focus on message acknowledgment rather than sending.

       

      Any pointers on what to try to get HA with failover working correctly?

       

      Also, if I restart the stopped server, the client doesn't seem to resume sending messages. Any pointers on how to get this working? With the supplied examples I did manage to get this working. I tried to compare the server configurations, but haven't been able to find the relevant difference.

       

      I am using hornetq-2.2.5.Final on Linux with 64-bit Java 1.7.0_01.

       

      Thanks in advance

       

      Here are extracts from my configuration. The complete files are attached to the discussion.

      hornetq-jms.xml

        <connection-factory name="NettyConnectionFactory">

          <xa>true</xa>     

          <ha>true</ha>

          <!-- Pause 1 second between connect attempts -->

          <retry-interval>1000</retry-interval>

          <!-- Multiply subsequent reconnect pauses by this multiplier. This can be used to

            implement an exponential back-off. For our purposes we just set to 1.0 so each reconnect

            pause is the same length -->

          <retry-interval-multiplier>1.0</retry-interval-multiplier>

          <!-- Try reconnecting an unlimited number of times (-1 means "unlimited") -->

          <reconnect-attempts>-1</reconnect-attempts>

          <client-failure-check-period>100</client-failure-check-period>

          <failover-on-server-shutdown>true</failover-on-server-shutdown>

          <failover-on-initial-connection>true</failover-on-initial-connection>

          <discovery-group-ref discovery-group-name="dg-group1"/>

          <connectors>

            <connector-ref connector-name="netty"/>

          </connectors>

          <entries>

            <entry name="/SpecialConnectionFactory"/>

          </entries>

          <connection-load-balancing-policy-class-name>org.hornetq.api.core.client.loadbalance.RandomConnectionLoadBalancingPolicy</connection-load-balancing-policy-class-name>     

        </connection-factory>

       

      My HornetQ configuration file (hornetq-configuration.xml) contains the following details:

      <configuration xmlns="urn:hornetq"

                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                     xsi:schemaLocation="urn:hornetq /schema/hornetq-configuration.xsd">

       

        <clustered>true</clustered>

        <shared-store>true</shared-store>

        <backup>${backup:false}</backup>     

        <allow-failback>true</allow-failback>

        <failover-on-shutdown>true</failover-on-shutdown>

        <paging-directory>${data.dir:../data}/paging</paging-directory>  

        <bindings-directory>${data.dir:../data}/bindings</bindings-directory>  

        <journal-directory>${data.dir:../data}/journal</journal-directory>  

        <journal-min-files>10</journal-min-files>

        • 1. Re: Failover / HA not working in stand-alone mode
          Andy Taylor Master

          All the connection factories in your JMS config have the same name, so I'm guessing you aren't actually using the one you've configured.
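
          If the duplicate names are indeed the cause, a minimal sketch of the fix would be to give each factory its own name attribute. The factory names below are illustrative assumptions, not taken from the attached files, and most settings are omitted for brevity.

```xml
<!-- Sketch only: each connection-factory carries a distinct name attribute.
     The name values here are illustrative, not from the attached config. -->
<connection-factory name="SpecialConnectionFactory">
  <connectors>
    <connector-ref connector-name="netty"/>
  </connectors>
  <entries>
    <!-- JNDI name the client looks up -->
    <entry name="/SpecialConnectionFactory"/>
  </entries>
  <ha>true</ha>
  <reconnect-attempts>-1</reconnect-attempts>
</connection-factory>

<connection-factory name="ExampleConnectionFactory">
  <connectors>
    <connector-ref connector-name="netty"/>
  </connectors>
  <entries>
    <entry name="/ExampleConnectionFactory"/>
  </entries>
</connection-factory>
```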

          • 2. Re: Failover / HA not working in stand-alone mode
            Alexander Hartner Expert

            I thought the name was set in the entry name:

            <entry name="/SpecialConnectionFactory"/>

            This is also the name I am using for the JNDI lookup. I have several connection factories configured:

            <entry name="/SpecialConnectionFactory"/>...

            <entry name="/ExampleConnectionFactory"/>...

            <entry name="/ConnectionFactory"/>...

            <entry name="/XAThroughputConnectionFactory"/>...

            I based this on the default configuration (config/stand-alone/clustered/hornetq-jms.xml) included with the download. In that example there are several different connection factories named NettyConnectionFactory but with different "entry names". Could you please confirm that this is the issue? I am not quite sure what NettyConnectionFactory refers to, as I haven't been able to find any references to it outside this file. I thought it was a reference back to the class used for the connection factory.