12 Replies Latest reply on Jun 16, 2015 12:35 PM by realkobe

    Silent HornetQ Failover with 1 Live and 2 Backup

    realkobe

      Hi there,

       

      I want to start 1 live and 2 backup HornetQ servers with silent failover for the client. Unfortunately I don't get a connection to the backup server that went live. I probably made a configuration mistake, but I don't know where.

       

      I adapted the configuration from the config/stand-alone/clustered example in HornetQ; see the attachment.

      1. my live server is instance0

      2. my backup server is instance1 (and later instance2, too)


      When started, the listening ports look like this:

      > netstat -tulpen 2> /dev/null | grep java
      tcp6       0      0 :::62500                :::*                    LISTEN      1000       988888     12518/java          
      tcp6       0      0 :::62501                :::*                    LISTEN      1000       988871     12518/java          
      tcp6       0      0 :::62502                :::*                    LISTEN      1000       992055     12518/java          
      tcp6       0      0 :::62503                :::*                    LISTEN      1000       988038     12518/java          
      tcp6       0      0 :::40647                :::*                    LISTEN      1000       988716     12518/java          
      tcp6       0      0 :::62512                :::*                    LISTEN      1000       988159     12549/java          
      tcp6       0      0 :::62513                :::*                    LISTEN      1000       990819     12549/java          
      tcp6       0      0 :::44691                :::*                    LISTEN      1000       992078     12549/java          
      tcp6       0      0 127.0.0.1:39837         :::*                    LISTEN      1000       492818     2385/java           
      udp6       0      0 :::35858                :::*                                1000       988889     12518/java          
      udp6       0      0 231.7.7.7:9876          :::*                                1000       991105     12549/java          
      udp6       0      0 231.7.7.7:9876          :::*                                1000       990927     12549/java          
      udp6       0      0 231.7.7.7:9876          :::*                                1000       990890     12518/java          
      


      My Java Client (see QueueExampleForDiscovery.java in attachment) creates a connection via

       

      private HornetQConnectionFactory createStaticFactory() {
          TransportConfiguration[] transportConfiguration = createTransportConfiguration();
          HornetQConnectionFactory factory = HornetQJMSClient.createConnectionFactoryWithHA(
                  JMSFactoryType.CF, transportConfiguration);
          setupFactory(factory);
          return factory;
      }
      
      private void setupFactory(HornetQConnectionFactory factory) {
          // http://docs.jboss.org/hornetq/2.3.0.Final/docs/user-manual/html/ha.html#ha.automatic.failover
          factory.setClientFailureCheckPeriod(Duration.ofSeconds(1).toMillis());
          // 39.2.1.1 on
          // http://docs.jboss.org/hornetq/2.3.0.Final/docs/user-manual/html/ha.html#ha.automatic.failover
          factory.setInitialConnectAttempts(5);
          // 34.1
          factory.setConfirmationWindowSize(1000000);
          // 34.1
          factory.setReconnectAttempts(20);
          factory.setRetryIntervalMultiplier(1.5);
          factory.setMaxRetryInterval(8000);
      }
      
      private TransportConfiguration[] createTransportConfiguration() {
          return new TransportConfiguration[] { createTransportConfiguration("localhost", 62500),
                  createTransportConfiguration("localhost", 62510),
          // createTransportConfiguration("localhost", 62520)
          };
      }
      
      private TransportConfiguration createTransportConfiguration(String host, int port) {
          HashMap<String, Object> map = new HashMap<String, Object>();
          map.put(TransportConstants.HOST_PROP_NAME, host);
          map.put(TransportConstants.PORT_PROP_NAME, String.valueOf(port));
          TransportConfiguration tc = new TransportConfiguration(NettyConnectorFactory.class
                  .getName(), map);
          return tc;
      }
      


      and in the main

      connection = createStaticFactory().createConnection();
      connection.start();
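
      For what it's worth, the reconnect settings in setupFactory imply an exponential back-off between attempts. This is not HornetQ code, just a sketch that mimics the documented rule (the interval is multiplied by retryIntervalMultiplier after each failed attempt and capped at maxRetryInterval), assuming HornetQ's default initial retry interval of 2000 ms:

```java
import java.util.ArrayList;
import java.util.List;

public class RetrySchedule {
    // Sketch of the client reconnect back-off implied by setupFactory above.
    // Assumption: each failed attempt multiplies the interval by the
    // multiplier, capped at maxRetryInterval (per the HornetQ 2.x manual).
    static List<Long> schedule(long retryInterval, double multiplier,
                               long maxRetryInterval, int attempts) {
        List<Long> waits = new ArrayList<Long>();
        long current = retryInterval;
        for (int i = 0; i < attempts; i++) {
            waits.add(current);
            current = Math.min((long) (current * multiplier), maxRetryInterval);
        }
        return waits;
    }

    public static void main(String[] args) {
        // multiplier 1.5 and max 8000 ms as in setupFactory; 2000 ms is the
        // HornetQ default retry-interval, not a value from the post
        System.out.println(schedule(2000, 1.5, 8000, 6));
        // prints [2000, 3000, 4500, 6750, 8000, 8000]
    }
}
```

      With the values from setupFactory, the waits settle at the 8-second cap after four attempts, so 20 reconnect attempts span roughly two and a half minutes.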
      


      My problems are:

      1. At startup it wants to connect to the backup server (localhost:62510). As the backup is not live (port 62510 is not open yet), it retries until initialConnectAttempts is reached. Is there a way to say "it's OK when you find any server, you don't need to find each of them"?
      2. After the backup went live (port 62510 is open) I don't get a connection to the backup. It throws an exception: Connection failure has been detected: The connection was disconnected because of server shutdown
      3. When the live server crashes and the backup becomes live, the backup server says
      11:07:11,582 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=1135b5d3-0e85-11e5-94de-bbf7381e5600
      
      

       

       

      Can someone please tell me how to configure failover? (without JNDI - all the examples use JNDI)

       

      Regards

      Stephan

       

      ----

       

      How to use my scripts:

      • sh install.sh downloads HornetQ and links it as current
      • sh install-instance.sh 0 copies configurations from config/stand-alone/clustered, includes queues.xml, and sets the ports 62500 to 62504 (netty and jnp)
      • sh start-instance.sh 0 starts the live server
      • for a backup server, do the same but replace 0 with another number and add "-backup true"
        • I installed with sh install-instance.sh 0 && sh install-instance.sh 1 -backup true && sh install-instance.sh 2 -backup true
        • 1. Re: Silent HornetQ Failover with 1 Live and 2 Backup
          jbertram

          Got a quick way to run your client?

          • 2. Re: Silent HornetQ Failover with 1 Live and 2 Backup
            jbertram

            A few observations:

            1. Setting the client-failure-check-period to 1 second is pretty aggressive.  I recommend you leave the default value there for now until you've confirmed it needs to be set so low.
            2. Make sure you're actually inducing fail-over properly.  By default if you shut the server down gracefully then clients won't failover; you'd need to kill it to induce fail-over (e.g. kill -9 <pid>).  Alternatively you can set failover-on-shutdown = true in hornetq-configuration.xml.  It looks like you're doing this properly based on the kill-instance.sh script you attached, but I wasn't sure because you didn't outline your entire use-case.
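
            For reference, the failover-on-shutdown flag mentioned above lives in the live server's hornetq-configuration.xml. A minimal fragment (element name per the HornetQ 2.x schema) might look like:

```xml
<!-- hornetq-configuration.xml on the live server -->
<!-- let clients fail over even on a graceful shutdown, so kill -9 is not required -->
<failover-on-shutdown>true</failover-on-shutdown>
```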
            • 3. Re: Silent HornetQ Failover with 1 Live and 2 Backup
              realkobe

              Pardon me. Here is a quick (maven) way to run my client

              //edit

              updated the client

              • 4. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                realkobe

                1. Ok, removed this client setting

                2. Yes. Either I use start-instance.sh and afterwards kill-instance.sh (from another shell), or I simply press CTRL+C to stop (kill?) the currently running instance. I want to simulate an unexpected shutdown of the server that would cause a failover.

                 

                Start my client (it will produce and consume lots of small messages) and kill the live server. I expect a hiccup in the client, after which it goes on working. It does not - why?

                 

                Remember, I don't want to use JNDI but either a static configuration with TransportConfiguration or a discovery group (without UDP multicast).

                • 5. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                  jbertram

                  What command should I use to run it?

                  • 6. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                    realkobe

                    You can run it with

                     

                    my.hornetq> mvn clean compile exec:java -Dexec.mainClass="QueueExampleForDiscovery"

                    • 7. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                      realkobe

                      Running my example, everything included in one zip:

                       

                      Installing / Configuring

                      stephan@praetsch:~> wget https://developer.jboss.org/servlet/JiveServlet/download/933359-133748/hornetq-example.zip
                      stephan@praetsch:~> unzip hornetq-example.zip
                      stephan@praetsch:~> cd hornetq/
                      stephan@praetsch:~/hornetq> sh install.sh
                      stephan@praetsch:~/hornetq> sh install-instance.sh 0 && sh install-instance.sh 1 -backup true
                      
                      
                      
                      

                       

                      Starting: each command in a different bash / shell

                      stephan@praetsch:~/hornetq> sh start-instance.sh 0 # start live
                      stephan@praetsch:~/hornetq> sh start-instance.sh 1 # start backup
                      stephan@praetsch:~/hornetq> cd my.hornetq.client/ ; mvn clean compile exec:java -Dexec.mainClass="QueueExampleForDiscovery"
                      
                      
                      
                      

                       

                      While it is running, kill (or CTRL+C) the live instance. The Java client does not fail over to the backup server that became live.

                      • 8. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                        realkobe

                        I remade my example client based on the hornetq-jms-replicated-failback-static-example

                         

                        Configure

                        stephan@praetsch:~> wget https://developer.jboss.org/servlet/JiveServlet/download/933475-133870/hornetq.zip
                        stephan@praetsch:~> unzip hornetq.zip
                        stephan@praetsch:~> cd hornetq/
                        stephan@praetsch:~/hornetq> sh install.sh
                        stephan@praetsch:~/hornetq> sh install-instance.sh 0
                        stephan@praetsch:~/hornetq> sh install-instance.sh 1 -backup true
                        
                        

                         

                        Start: Each command in a different shell

                        stephan@praetsch:~/hornetq> sh start-instance.sh 0 # start live server
                        stephan@praetsch:~/Downloads/hornetq> sh start-instance.sh 1 # start backup server
                        stephan@praetsch:~/hornetq> sh run.sh # start java client
                        
                        

                         

                        The client sends and receives. When I stop (CTRL+C) the live server, the backup becomes live and the client sends/receives to/from the backup after a short hiccup, BUT

                         

                        1. Why does the backup log

                        15:51:04,030 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=95fafdbd-1109-11e5-8c35-3314c01b57a4

                         

                        2. When I start the live server again, there is a mess: Both (new) live and backup log periodically

                        15:52:27,064 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=95fafdbd-1109-11e5-8c35-3314c01b57a4

                         

                        3. When I stop the backup the client does not re-connect to the (new) live.

                         

                        Can you point out my misconfiguration, please?

                        • 9. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                          jbertram

                          1. Why does the backup log

                          15:51:04,030 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=95fafdbd-1109-11e5-8c35-3314c01b57a4

                          I believe this is a side-effect of the way UDP works.  As long as this isn't logged continuously then you shouldn't have a problem (as the message itself implies).

                           

                          2. When I start the live server again, there is a mess: Both (new) live and backup log periodically

                          15:52:27,064 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=95fafdbd-1109-11e5-8c35-3314c01b57a4

                          Take a look at the documentation - specifically the bit about <check-for-live-server>.

                           

                          3. When I stop the backup the client does not re-connect to the (new) live.

                          Once you address problem #2 this should be resolved as well.
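
                          As I read the 2.x documentation, that setting goes in the (original) live server's hornetq-configuration.xml and only applies to replicated setups (shared-store = false): on restart, the server first checks whether another node is already live with its node id instead of starting as live immediately. A sketch:

```xml
<!-- hornetq-configuration.xml on the (original) live server -->
<shared-store>false</shared-store>
<check-for-live-server>true</check-for-live-server>
```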

                          • 10. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                            realkobe

                            Thanks. I configured <check-for-live-server>true</check-for-live-server> and it works quite well. BUT

                            1. start live
                            2. start backup
                              • backup announced
                            3. Client sends/receives
                            4. stop live
                              • backup becomes live
                              • Clients continues sending/receiving after hiccup
                            5. re-start original live
                              • Live server will not fail-back automatically
                              • backup announced
                              • Clients keeps sending/receiving
                            6. stop backup server (that just acted as live)
                              • original live server becomes actual live server
                              • Client does not get a connection anymore
                                • send: javax.jms.JMSException: Timed out waiting for response when sending packet 43 and could not rollback javax.jms.JMSException: Timed out waiting for response when sending packet 68
                                • receive: nothing is received
                            7. re-start the client: send / receive works

                             

                            Why does the re-connect at step 6 not work? According to step 7, the (original) live server is reachable.

                             

                            // edit

                            It makes no difference what I set for <failover-on-shutdown> (true or false).

                             

                            Same behavior with live, backup and backup server:

                            1. start live
                            2. start backup1
                            3. start backup2
                            4. start client send/receive
                            5. stop live, backup1 becomes live, short client hiccup
                            6. stop backup1, backup2 becomes live, client does not get a connection anymore

                            So after stopping 2 instances the client does not re-connect.

                             

                            //edit2

                            Playing around with my hornetq-jms.xml; it currently contains

                            <connection-factory name="ConnectionFactory">
                                 <connectors>
                                      <connector-ref connector-name="netty-connector" />
                                 </connectors>
                                 <entries>
                                      <entry name="ConnectionFactory" />
                                      <entry name="XAConnectionFactory" />
                                 </entries>
                                 <ha>true</ha>
                                 <retry-interval>3333</retry-interval>
                                 <reconnect-attempts>1000</reconnect-attempts>
                                 <client-failure-check-period>5000</client-failure-check-period>
                            </connection-factory>
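
                            One difference from the factory settings used earlier in the thread: per the 2.x manual, a client can only replay unconfirmed packets after reconnecting if the connection factory defines a confirmation window. A sketch of the same factory with that element added (the 1 MiB value is just an example):

```xml
<connection-factory name="ConnectionFactory">
     <connectors>
          <connector-ref connector-name="netty-connector" />
     </connectors>
     <entries>
          <entry name="ConnectionFactory" />
          <entry name="XAConnectionFactory" />
     </entries>
     <ha>true</ha>
     <retry-interval>3333</retry-interval>
     <reconnect-attempts>1000</reconnect-attempts>
     <client-failure-check-period>5000</client-failure-check-period>
     <!-- needed so sessions can re-attach after failover (example value) -->
     <confirmation-window-size>1048576</confirmation-window-size>
</connection-factory>
```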
                            
                            • 11. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                              realkobe

                              See hornetq-configuration.zip for my configurations: instance0 is live and instance1 is backup.

                               

                              The important things (AFAIK) are

                              hornetq-configuration.xml

                              <failover-on-shutdown>true</failover-on-shutdown>
                              <allow-failback>true</allow-failback>
                              <check-for-live-server>true</check-for-live-server>
                              <shared-store>false</shared-store>
                              <backup>false</backup> or <backup>true</backup>
                              
                              

                               

                              hornetq-jms.xml

                              <connection-factory name="ConnectionFactory">
                                   <connectors>
                                        <connector-ref connector-name="netty-connector" />     
                                   </connectors>
                                   <entries>
                                        <entry name="ConnectionFactory" />
                                   </entries>
                                   <ha>true</ha>
                                   <retry-interval>1000</retry-interval>
                                   <retry-interval-multiplier>1.0</retry-interval-multiplier>
                                   <reconnect-attempts>-1</reconnect-attempts>
                                   <client-failure-check-period>5000</client-failure-check-period>
                                   <confirmation-window-size>1048576</confirmation-window-size>
                                   <failover-on-server-shutdown>true</failover-on-server-shutdown>
                              </connection-factory>
                              

                               

                              The client now works with JNDI and creates a connection exactly once via

                              private Connection createConnectionWithJndi() throws NamingException, JMSException {
                                   InitialContext initialContext = getContext("jnp://localhost:62502");
                                   ConnectionFactory connectionFactory = (ConnectionFactory) initialContext
                                        .lookup("/ConnectionFactory");
                                   return connectionFactory.createConnection();
                              }
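
                              The getContext(...) helper isn't shown in the post; here is a minimal sketch of what it presumably does with jnp-client. The factory class and url-pkgs values are the standard JNP ones, assumed here rather than taken from the attachment:

```java
import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class JndiContextSketch {
    // Builds the environment for a JBoss/JNP naming server. The property
    // values are the usual jnp-client 4.x defaults (an assumption, not
    // taken from the original attachment).
    static Properties jndiProperties(String providerUrl) {
        Properties props = new Properties();
        props.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.jnp.interfaces.NamingContextFactory");
        props.put(Context.PROVIDER_URL, providerUrl);
        props.put(Context.URL_PKG_PREFIXES,
                "org.jboss.naming:org.jnp.interfaces");
        return props;
    }

    // Equivalent of the getContext("jnp://localhost:62502") call above;
    // requires jnp-client on the classpath and a running naming server.
    static InitialContext getContext(String providerUrl) throws NamingException {
        return new InitialContext(jndiProperties(providerUrl));
    }
}
```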
                              

                               

                              1. I start live, backup, and the client - everything fine
                              2. I stop live, backup becomes the new live, client automatically re-connects to the new live - everything fine
                              3. I restart live, the new live becomes live again, backup becomes backup again, client does not re-connect to live
                              4. I stop backup, client does not notice, nothing happens
                              5. I stop live, client does not notice, nothing happens
                              6. I bring backup back as live (start live, start backup, stop live), client re-connects to the backup server and goes on working

                               

                              Why does my client not fail back at all? Failover works fine, but that's it.

                               

                              By the way

                              1. With live, backup1 and backup2
                              2. stop live, backup1 becomes live, client reconnects successfully to backup1
                              3. stop backup1, backup2 becomes live, client does not reconnect to backup2
                              • 12. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                                realkobe

                                I finally got it working. It was an insane dependency problem on the client side:

                                You have to use hornetq-jms-client, which brings in all the needed dependencies.

                                 

                                Working dependencies (with JNDI)

                                <dependencies>
                                  <dependency>
                                    <groupId>org.hornetq</groupId>
                                    <artifactId>hornetq-jms-client</artifactId>
                                    <version>2.4.0.Final</version>
                                  </dependency>
                                  <dependency>
                                    <groupId>jboss</groupId>
                                    <artifactId>jnp-client</artifactId>
                                    <version>4.2.2.GA</version>
                                    <scope>compile</scope>
                                  </dependency>
                                </dependencies>
                                

                                 

                                NOT WORKING dependencies

                                <dependencies>
                                  <dependency>
                                    <groupId>org.hornetq</groupId>
                                    <artifactId>hornetq-core</artifactId>
                                    <version>2.2.7.Final</version>
                                    <scope>compile</scope>
                                  </dependency>
                                  <dependency>
                                    <groupId>org.hornetq</groupId>
                                    <artifactId>hornetq-jms</artifactId>
                                    <version>2.2.7.Final</version>
                                    <scope>compile</scope>
                                  </dependency>
                                  <dependency>
                                    <groupId>org.jboss.javaee</groupId>
                                    <artifactId>jboss-jms-api</artifactId>
                                    <version>1.1.0.GA</version>
                                    <scope>compile</scope>
                                  </dependency>
                                  <dependency>
                                    <groupId>jboss</groupId>
                                    <artifactId>jnp-client</artifactId>
                                    <version>4.2.2.GA</version>
                                  </dependency>
                                  <dependency>
                                    <groupId>org.jboss.logging</groupId>
                                    <artifactId>jboss-logging</artifactId>
                                    <version>3.1.3.GA</version>
                                  </dependency>
                                  <dependency>
                                    <groupId>org.jboss.netty</groupId>
                                    <artifactId>netty</artifactId>
                                    <version>3.2.10.Final</version>
                                  </dependency>
                                </dependencies>
                                

                                 

                                I don't know why, but the second set of dependencies works fine except for failback in the client.