12 Replies Latest reply on Aug 19, 2010 5:16 AM by Joydeep Sarkar

    Failover not working

    Joydeep Sarkar Newbie

      Hi,

       

      I have configured a live-backup pair, but client failover is not taking place. When the live server (the one the client looks up) goes down, the client does not connect to the backup server and throws an exception instead.

       

      Observations:

      1. The JMX console of the backup server shows only one of the attributes under the org.hornetq link.

      2. When the .ear is deployed on the backup server, it shows the error "/connection factory not bound".

      3. When the ConnectionFactory is obtained in the client code, its value shows that only one connector is available, i.e. the netty factory. It shows "b" of the pair as null.

       

      I am not sure whether this is the expected behaviour.

      Also, I have configured things so that a normal shutdown also leads to failover, but to no avail.

       

      Queries:

      1) What exactly triggers the failover?

      2) Does the .ear have to be deployed on the backup server? If yes, how can the deployment succeed when the resources are not available? (They are not showing up in the JMX console, maybe because the server is not yet activated.)

       

      I have attached the configuration files herewith. Following is the client code,

       

       

      import org.hornetq.api.jms.HornetQJMSClient;

      import javax.jms.*;
      import javax.naming.Context;
      import javax.naming.InitialContext;
      import javax.naming.NamingException;
      import java.util.Properties;

      public class FailoverTest {
          public static void main(String[] args) {
              final int NUM_MSGS = 4;
              String queueName = "queue/oracleXMLQueue";
              Context jndiContext = null;
              Connection connection = null;

              // Create the JNDI context used to look up the connection factory and queue.
              try {
                  Properties props = new Properties();
                  props.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
                  props.put(Context.URL_PKG_PREFIXES, "org.jboss.naming:org.jnp.interfaces");
                  props.put(Context.PROVIDER_URL, "jnp://lonmmsweb07:1199");
                  jndiContext = new InitialContext(props);
              } catch (NamingException e) {
                  System.out.println("Could not create JNDI API context: " + e.toString());
                  e.printStackTrace();
                  System.exit(1);
              }

              try {
                  ConnectionFactory connectionFactory =
                          (QueueConnectionFactory) jndiContext.lookup("/ConnectionFactory");
                  System.out.println(connectionFactory);
                  Queue queue = (Queue) jndiContext.lookup(queueName);
                  connection = connectionFactory.createConnection();
                  Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  MessageProducer producer = session.createProducer(queue);
                  TextMessage message = session.createTextMessage();
                  // Send the messages, pausing two seconds between sends.
                  for (int i = 0; i < NUM_MSGS; i++) {
                      message.setText("This is message " + (i + 1));
                      producer.send(message);
                      Thread.sleep(2000);
                  }
              } catch (NamingException ne) {
                  ne.printStackTrace();
              } catch (JMSException e) {
                  e.printStackTrace();
              } catch (InterruptedException ie) {
                  // Restore the interrupt flag instead of swallowing the exception.
                  Thread.currentThread().interrupt();
              } finally {
                  if (connection != null) {
                      try {
                          connection.close();
                      } catch (JMSException jmse) {
                          jmse.printStackTrace();
                      }
                  }
              }
          }
      }
      
      

       

       

      Could anyone please tell me what the problem is here?

       

      TIA,

      Joydeep

        • 1. Re: Failover not working
          Joydeep Sarkar Newbie

          Hi,

           

          Any lead/suggestion about the problem?

           

          Regards,

          Joydeep

          • 2. Re: Failover not working
            Clebert Suconic Master

            We work on a first-come, first-served basis here. There are no guarantees with free support.

             

            Someone will get back to you.

             

             

            Anyway, to answer your question: failover on the consumer is kicked in by pings and time-to-live.

            The backup node will be activated as soon as you have connections on it.

            (Be careful with MDBs or resource adapters on the backup node. We are improving failover for 2.2, but at the moment you can't have any connections on the backup node until the node is live.)
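
The ping and time-to-live mechanism mentioned above can be pictured with a small model. This is only a sketch in plain Java, not HornetQ code: the class `SilenceDetector`, its method names, and the 500 ms value are assumptions made for illustration. The idea is that each side remembers when it last heard from its peer and presumes the peer dead once the silence exceeds a timeout (the client failure check period on the client side, the connection TTL on the server side).

```java
/**
 * Toy model of ping-based failure detection.
 * Hypothetical names; this is not how HornetQ is implemented internally.
 */
final class SilenceDetector {
    private final long timeoutMillis;   // how much silence is tolerated
    private long lastHeardMillis;       // time of the last packet from the peer

    SilenceDetector(long timeoutMillis, long startMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.lastHeardMillis = startMillis;
    }

    /** Record that a ping (or any other packet) arrived from the peer. */
    void heardFromPeer(long nowMillis) {
        lastHeardMillis = nowMillis;
    }

    /** The peer is presumed dead once it has been silent for longer than the timeout. */
    boolean peerPresumedDead(long nowMillis) {
        return nowMillis - lastHeardMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        // Client side: tolerate 500 ms of silence from the live server (assumed value).
        SilenceDetector client = new SilenceDetector(500, 0);
        client.heardFromPeer(400);                          // a packet arrives at t=400
        System.out.println(client.peerPresumedDead(800));   // false: 400 ms of silence
        // The live server is killed here, so no more packets arrive.
        System.out.println(client.peerPresumedDead(1000));  // true: 600 ms of silence
    }
}
```

In this model it is the client noticing the silence, not the server dying as such, that would start a failover attempt. That matches the statement that failover is triggered from the client side.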

            • 3. Re: Failover not working
              Joydeep Sarkar Newbie

              Hi Clebert,

               

              Thanks a lot for the response.

               

              Since failover is triggered by the time-to-live, does "The connection TTL" in ra.xml have to be configured on both servers?

              Despite specifying the TTL value, the client did not fail over. I believe it may have no idea about the backup server. I have configured the connection factory with a backup,

              <connection-factory name="ConnectionFactory">
                    <connectors>
                       <connector-ref connector-name="netty" backup-connector-name="backup-connector"/>
                    </connectors>
              

               

              And the broadcast group as well,

              <broadcast-groups>
                    <broadcast-group name="bg-group-level2">
                       <group-address>224.0.0.1</group-address>
                       <group-port>9876</group-port>
                       <broadcast-period>5000</broadcast-period>
                       <connector-ref connector-name="netty"
                             backup-connector-name="backup-connector"/>
                    </broadcast-group>
                 </broadcast-groups>
              

               

              I have even tried to activate the backup server by doing a lookup from the client code. Since the server is not activated, it did not work.

               

              Any thoughts about the same?

               

              Regards,

              Joydeep

              • 4. Re: Failover not working
                Joydeep Sarkar Newbie

                Hi,

                 

                I was checking the client code in debug mode, where I saw that the ConnectionFactory object does not have any entry for the backup server. I see the following values,

                connectionFactory = {org.hornetq.jms.client.HornetQConnectionFactory@804}
                  sessionFactory = {org.hornetq.core.client.impl.ClientSessionFactoryImpl@819}
                  failoverManagerMap = {java.util.LinkedHashMap@820} size = 0
                  receivedBroadcast = false
                  threadPool = null
                  scheduledThreadPool = null
                  discoveryGroup = null
                  loadBalancingPolicy = null
                  failoverManagerArray = null
                 
                  cacheLargeMessagesClient = false
                  staticConnectors = {java.util.ArrayList@826} size = 1
                     [0] = {org.hornetq.api.core.Pair@838}"Pair[a=org-hornetq-integration-transports-netty-NettyConnectorFactory?host=10-1-0-71&port=5446, b=null]"
                              a = {org.hornetq.api.core.TransportConfiguration@1220}"org-hornetq-integration-transports-netty-NettyConnectorFactory?host=10-1-0-71&port=5446"
                              b = null
                              hash = -1
                  discoveryAddress = null

                 

                This is despite the ConnectionFactory being defined in hornetq-jms.xml as follows,

                <connection-factory name="ConnectionFactory">
                      <connectors>
                         <connector-ref connector-name="netty" backup-connector-name="backup-connector"/>
                      </connectors>
                      <entries>
                         <entry name="ConnectionFactory"/>
                      </entries>
                        <retry-interval>1000</retry-interval>
                        <retry-interval-multiplier>1.5</retry-interval-multiplier>
                        <max-retry-interval>10000</max-retry-interval>
                        <reconnect-attempts>10</reconnect-attempts>
                   </connection-factory>
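
For reference, the retry settings above can be worked through numerically. The sketch below is illustrative arithmetic in plain Java, not HornetQ's implementation. It assumes each wait is the previous one multiplied by `retry-interval-multiplier` (truncated to whole milliseconds), capped at `max-retry-interval`, with one wait per reconnect attempt. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative back-off arithmetic only; hypothetical helper, not a HornetQ class. */
final class RetrySchedule {

    /** Returns the assumed wait (in ms) for each of the given number of attempts. */
    static List<Long> delays(long retryInterval, double multiplier,
                             long maxRetryInterval, int attempts) {
        List<Long> result = new ArrayList<>();
        long interval = Math.min(retryInterval, maxRetryInterval);
        for (int i = 0; i < attempts; i++) {
            result.add(interval);
            // Grow the interval, truncating to whole milliseconds, then apply the cap.
            long next = (long) (interval * multiplier);
            interval = Math.min(next, maxRetryInterval);
        }
        return result;
    }

    public static void main(String[] args) {
        // Values taken from the connection factory above.
        List<Long> waits = delays(1000, 1.5, 10000, 10);
        long total = 0;
        for (long w : waits) {
            total += w;
        }
        System.out.println(waits);
        System.out.println("total wait: " + total + " ms");
    }
}
```

Under these assumptions the waits are 1000, 1500, 2250, 3375, 5062 and 7593 ms, followed by four waits of 10000 ms, so a client would keep retrying for roughly a minute before giving up.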
                

                 

                When the client code is sending messages to the live server and I kill the live server, the client gets the following exception,

                Sending message: This is message 1
                Sending message: This is message 2
                javax.jms.IllegalStateException: Producer is closed
                 at org.hornetq.jms.client.HornetQMessageProducer.checkClosed(HornetQMessageProducer.java:507)
                 at org.hornetq.jms.client.HornetQMessageProducer.send(HornetQMessageProducer.java:202)
                 at FailoverTest.main(FailoverTest.java:90)
                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                 at java.lang.reflect.Method.invoke(Method.java:585)
                 at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)

                 

                Does anyone have any idea what could be the reason for this?

                 

                Regards,

                Joydeep

                • 5. Re: Failover not working
                  Joydeep Sarkar Newbie

                  Hi,

                   

                  Somehow I have managed to get the setup going after a lot of trial and error, so I don't really know what made it run properly.

                  Anyway, now that it is working, I am facing a different problem: inconsistency.

                  At times the backup server activates itself at startup, and at other times, after a failover, the backup server doesn't activate itself and waits forever.

                   

                  The following is the connection factory that I am using.

                  <connection-factory name="ConnectionFactory">
                        <connectors>
                           <connector-ref connector-name="netty" backup-connector-name="backup-connector"/>
                        </connectors>
                        <entries>
                           <entry name="ConnectionFactory"/>
                        </entries>
                          <client-failure-check-period>500</client-failure-check-period>
                          <retry-interval>1000</retry-interval>
                          <retry-interval-multiplier>1.5</retry-interval-multiplier>
                          <max-retry-interval>10000</max-retry-interval>
                          <reconnect-attempts>10</reconnect-attempts>
                     </connection-factory>
                  

                   

                  And this is the ra.xml

                        <config-property>
                          <description>The client failure check period</description>
                          <config-property-name>ClientFailureCheckPeriod</config-property-name>
                          <config-property-type>java.lang.Long</config-property-type>
                          <config-property-value>500</config-property-value>
                        </config-property>
                        <config-property>
                          <description>The connection TTL</description>
                          <config-property-name>ConnectionTTL</config-property-name>
                          <config-property-type>java.lang.Long</config-property-type>
                          <config-property-value>1000</config-property-value>
                        </config-property>
                            .
                            .
                            .
                       <config-property>
                          <description>Should clean server shutdown trigger failover?</description>
                          <config-property-name>FailoverOnServerShutdown</config-property-name>
                          <config-property-type>java.lang.Boolean</config-property-type>
                          <config-property-value>true</config-property-value>
                        </config-property>
                  

                   

                  Does anyone have any idea why the behaviour is inconsistent?

                   

                  Regards,
                  Joydeep

                  • 6. Re: Failover not working
                    Clebert Suconic Master

                    It seems you have a resource adapter live on the backup node, which is what's activating the backup right at startup.

                    You probably need a backup server without any MDBs or resource adapters on it, and should try remote configurations like the JCA-remote example in the distribution.

                    • 7. Re: Failover not working
                      Joydeep Sarkar Newbie

                      Thank you Clebert for the prompt response.

                      I will try out the same.

                       

                      Do you think the failover is also not happening for the same reason?

                       

                      Regards,

                      Joydeep

                      • 8. Re: Failover not working
                        Joydeep Sarkar Newbie

                        Hi,

                         

                        I am facing a different problem now. The failover is not happening. It worked a few times earlier, but all of a sudden I don't see the backup server being activated upon failure of the master node.

                         

                        Following is the configuration that I have,

                         

                        hornetq-configuration.xml

                           <backup-connector-ref connector-name="backup-connector"/>
                           <connectors>
                              <connector name="netty">
                                 <factory-class>org.hornetq.integration.transports.netty.NettyConnectorFactory</factory-class>
                                 <param key="host"  value="${hornetq.remoting.netty.host:10.1.0.71}"/>
                                 <param key="port"  value="${hornetq.remoting.netty.port:5446}"/>
                              </connector>
                              <connector name="backup-connector">
                                <factory-class>org.hornetq.integration.transports.netty.NettyConnectorFactory</factory-class>
                                <param key="host" value="${hornetq.remoting.netty.host:10.1.0.235}"/>
                                <param key="port" value="${hornetq.remoting.netty.port:5445}"/>
                              </connector>
                           </connectors>
                           <acceptors>
                              <acceptor name="netty">
                                 <factory-class>org.hornetq.integration.transports.netty.NettyAcceptorFactory</factory-class>
                                 <param key="host"  value="${hornetq.remoting.netty.host:10.1.0.71}"/>
                                 <param key="port"  value="${hornetq.remoting.netty.port:5446}"/>
                              </acceptor>
                           </acceptors>
                                  .
                                  .
                                  .
                          <queues>
                              <queue name="jms.queue.oraclsXMLQueue">
                                 <address>jms.queue.oraclsXMLQueue</address>
                              </queue>
                          </queues>
                        
                            <bridges>
                             <bridge name="my-bridge">
                                <queue-name>jms.queue.oraclsXMLQueue</queue-name>
                                <forwarding-address>jms.queue.oracleXMLQueue</forwarding-address>
                                <retry-interval>1000</retry-interval>
                                <retry-interval-multiplier>1.0</retry-interval-multiplier>
                                <reconnect-attempts>3</reconnect-attempts>
                                <failover-on-server-shutdown>true</failover-on-server-shutdown>
                                <use-duplicate-detection>true</use-duplicate-detection>
                                <confirmation-window-size>10000000</confirmation-window-size>
                                <connector-ref connector-name="netty"
                                        backup-connector-name="backup-connector"/>
                              </bridge>
                            </bridges>
                        

                         

                         

                        hornetq-jms.xml

                        <connection-factory name="ConnectionFactory">
                              <connectors>
                                 <connector-ref connector-name="netty" backup-connector-name="backup-connector"/>
                              </connectors>
                              <entries>
                                 <entry name="ConnectionFactory"/>
                              </entries>
                                <retry-interval>1000</retry-interval>
                                <retry-interval-multiplier>1.5</retry-interval-multiplier>
                                <max-retry-interval>10000</max-retry-interval>
                                <reconnect-attempts>10</reconnect-attempts>
                           </connection-factory>
                        
                        
                        

                         

                         

                         

                        ra.xml

                              <config-property>
                                <description>The client failure check period</description>
                                <config-property-name>ClientFailureCheckPeriod</config-property-name>
                                <config-property-type>java.lang.Long</config-property-type>
                                <config-property-value>500</config-property-value>
                              </config-property>
                              <config-property>
                                <description>The connection TTL</description>
                                <config-property-name>ConnectionTTL</config-property-name>
                                <config-property-type>java.lang.Long</config-property-type>
                                <config-property-value>1000</config-property-value>
                              </config-property>
                                  .
                                  .
                                  .
                              <config-property>
                                <description>Should clean server shutdown trigger failover?</description>
                                <config-property-name>FailoverOnServerShutdown</config-property-name>
                                <config-property-type>java.lang.Boolean</config-property-type>
                                <config-property-value>true</config-property-value>
                              </config-property>
                        
                        
                        


                         

                        All of the above configuration is from the live server. The rest of the configuration is in its default state.

                         

                        Does anyone have any idea why the failover is not working? I do not see any error messages during the startup of the servers either.

                        Is there any possibility that it could be a network issue?

                        Please help.

                         

                        Regards,

                        Joydeep

                        • 9. Re: Failover not working
                          Tim Fox Master

                          As Clebert explained in his earlier post, failover is triggered from the client side. It is NOT triggered by killing the live server.

                           

                          IIRC this is explained in the user manual and has been discussed several times in other threads.

                          • 10. Re: Failover not working
                            Joydeep Sarkar Newbie

                            Hello Tim,

                             

                            I do understand that failover is triggered by the client. But when I was using the standalone Java client, failover was not taking place, so I was under the impression that something was wrong either in the configuration or in the network I am using. I was killing the server only to simulate a real-life scenario.

                            Anyway, I would like to point out a few things that I have observed.

                            1) The in-vm acceptor on the backup server was actually activating the backup server at startup. Removing it made the server start properly as a backup.

                            2) The in-vm connection factory on the live server was somehow preventing failover from taking place. When the in-vm connection factory is enabled, failover does not occur.

                            The live-backup setup is working after taking care of the above points.
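
The behaviour described in this thread, where a backup node becomes live as soon as anything connects to it, can be pictured as a two-state machine. The sketch below is a toy model in plain Java and not HornetQ code: `BackupNode`, its `State` enum, and the connection labels are hypothetical names used only to show why a local in-vm client present at startup would activate the backup before any failover happens.

```java
/** Toy state machine for a backup node; hypothetical names, not HornetQ classes. */
final class BackupNode {
    enum State { BACKUP, LIVE }

    private State state = State.BACKUP;

    /** In this model any incoming connection, remote or in-vm, activates the node. */
    void acceptConnection(String source) {
        if (state == State.BACKUP) {
            System.out.println("activated by connection from: " + source);
            state = State.LIVE;
        }
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        // Backup with a local in-vm client (for example an MDB) connecting at startup.
        BackupNode withInVmClient = new BackupNode();
        withInVmClient.acceptConnection("in-vm");
        System.out.println(withInVmClient.state());   // LIVE before any failover

        // Backup with no local clients: stays passive until a failed-over client arrives.
        BackupNode clean = new BackupNode();
        System.out.println(clean.state());            // BACKUP
        clean.acceptConnection("remote client after failover");
        System.out.println(clean.state());            // LIVE
    }
}
```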

                             

                            Any thoughts about the same?

                             

                            Regards,
                            Joydeep

                            • 11. Re: Failover not working
                              Clebert Suconic Master

                              As I said earlier, we are improving failover at the moment, which will avoid split brains and situations like that.

                               

                              This is how it works at the moment.

                              • 12. Re: Failover not working
                                Joydeep Sarkar Newbie

                                Hmm... I understand.

                                But the current process is leading me to another serious problem: the MDB stops listening to the queue on the backup server.

                                Attempts:

                                1) I have tried to incorporate JCA, which basically activated the backup server at startup.

                                2) I cannot use the in-vm acceptor in the backup configuration, since that again activates the backup server at startup.

                                What else could be done to enable the listener?

                                 

                                The following is the configuration that I have on the backup server,

                                 

                                hornetq-jms.xml

                                <connection-factory name="ConnectionFactory">
                                      <connectors>
                                         <connector-ref connector-name="netty"/>
                                      </connectors>
                                      <entries>
                                         <entry name="ConnectionFactory"/>
                                      </entries>
                                        <retry-interval>1000</retry-interval>
                                        <retry-interval-multiplier>1.5</retry-interval-multiplier>
                                        <max-retry-interval>10000</max-retry-interval>
                                        <reconnect-attempts>10</reconnect-attempts>
                                   </connection-factory>
                                
                                
                                

                                 

                                 

                                hornetq-configuration.xml

                                   <connectors>
                                      <connector name="netty">
                                         <factory-class>org.hornetq.integration.transports.netty.NettyConnectorFactory</factory-class>
                                         <param key="host"  value="${hornetq.remoting.netty.host:10.1.0.235}"/>
                                         <param key="port"  value="${hornetq.remoting.netty.port:5445}"/>
                                      </connector>
                                      <connector name="in-vm">
                                         <factory-class>org.hornetq.core.remoting.impl.invm.InVMConnectorFactory</factory-class>
                                      </connector>
                                   </connectors>
                                   <acceptors>
                                      <acceptor name="netty">
                                         <factory-class>org.hornetq.integration.transports.netty.NettyAcceptorFactory</factory-class>
                                         <param key="host"  value="${hornetq.remoting.netty.host:10.1.0.235}"/>
                                         <param key="port"  value="${hornetq.remoting.netty.port:5445}"/>
                                      </acceptor>
                                <!--      <acceptor name="in-vm">
                                        <factory-class>org.hornetq.core.remoting.impl.invm.InVMAcceptorFactory</factory-class>
                                        <param key="server-id" value="0"/>
                                      </acceptor> -->
                                   </acceptors>
                                

                                 

                                If I enable the in-vm acceptor, the backup server gets activated at startup. If I don't, the MDB stops listening to the queue.

                                How can I get around this?

                                 

                                Regards,

                                Joydeep