12 Replies Latest reply on Jun 16, 2015 12:35 PM by Stephan Prätsch

    Silent HornetQ Failover with 1 Live and 2 Backup

    Stephan Prätsch Newbie

      Hi there,

       

I want to start 1 live and 2 backup HornetQ servers with silent failover for the client. Unfortunately, I don't get a connection to the backup server that went live. I probably misconfigured something, but I don't know what.

       

I adapted the configuration from the config/stand-alone/clustered example in HornetQ. See the attachment.

      1. my live server is instance0

      2. my backup server is instance1 (and later instance2, too)


When started, the open ports look like this:

      > netstat -tulpen 2> /dev/null | grep java
      tcp6       0      0 :::62500                :::*                    LISTEN      1000       988888     12518/java          
      tcp6       0      0 :::62501                :::*                    LISTEN      1000       988871     12518/java          
      tcp6       0      0 :::62502                :::*                    LISTEN      1000       992055     12518/java          
      tcp6       0      0 :::62503                :::*                    LISTEN      1000       988038     12518/java          
      tcp6       0      0 :::40647                :::*                    LISTEN      1000       988716     12518/java          
      tcp6       0      0 :::62512                :::*                    LISTEN      1000       988159     12549/java          
      tcp6       0      0 :::62513                :::*                    LISTEN      1000       990819     12549/java          
      tcp6       0      0 :::44691                :::*                    LISTEN      1000       992078     12549/java          
      tcp6       0      0 127.0.0.1:39837         :::*                    LISTEN      1000       492818     2385/java           
      udp6       0      0 :::35858                :::*                                1000       988889     12518/java          
      udp6       0      0 231.7.7.7:9876          :::*                                1000       991105     12549/java          
      udp6       0      0 231.7.7.7:9876          :::*                                1000       990927     12549/java          
      udp6       0      0 231.7.7.7:9876          :::*                                1000       990890     12518/java          
      


      My Java Client (see QueueExampleForDiscovery.java in attachment) creates a connection via

       

import java.time.Duration;
import java.util.HashMap;

import org.hornetq.api.core.TransportConfiguration;
import org.hornetq.api.jms.HornetQJMSClient;
import org.hornetq.api.jms.JMSFactoryType;
import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;
import org.hornetq.core.remoting.impl.netty.TransportConstants;
import org.hornetq.jms.client.HornetQConnectionFactory;

private HornetQConnectionFactory createStaticFactory() {
    TransportConfiguration[] transportConfiguration = createTransportConfiguration();
    HornetQConnectionFactory factory = HornetQJMSClient.createConnectionFactoryWithHA(
            JMSFactoryType.CF, transportConfiguration);
    setupFactory(factory);
    return factory;
}

private void setupFactory(HornetQConnectionFactory factory) {
    // http://docs.jboss.org/hornetq/2.3.0.Final/docs/user-manual/html/ha.html#ha.automatic.failover
    factory.setClientFailureCheckPeriod(Duration.ofSeconds(1).toMillis());
    // 39.2.1.1 on
    // http://docs.jboss.org/hornetq/2.3.0.Final/docs/user-manual/html/ha.html#ha.automatic.failover
    factory.setInitialConnectAttempts(5);
    // 34.1
    factory.setConfirmationWindowSize(1000000);
    // 34.1
    factory.setReconnectAttempts(20);
    factory.setRetryIntervalMultiplier(1.5);
    factory.setMaxRetryInterval(8000);
}

private TransportConfiguration[] createTransportConfiguration() {
    return new TransportConfiguration[] { createTransportConfiguration("localhost", 62500),
            createTransportConfiguration("localhost", 62510),
    // createTransportConfiguration("localhost", 62520)
    };
}

private TransportConfiguration createTransportConfiguration(String host, int port) {
    HashMap<String, Object> map = new HashMap<String, Object>();
    map.put(TransportConstants.HOST_PROP_NAME, host);
    map.put(TransportConstants.PORT_PROP_NAME, String.valueOf(port));
    return new TransportConfiguration(NettyConnectorFactory.class.getName(), map);
}
      


      and in the main

      connection = createStaticFactory().createConnection();
      connection.start();
      

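As a side note, the retry settings in setupFactory() imply an exponential backoff capped at maxRetryInterval. Here is a small sketch of the resulting reconnect delays, assuming the default retry interval of 2000 ms (which the snippet above does not change); the helper class is mine, not HornetQ API:

```java
import java.util.ArrayList;
import java.util.List;

public class RetrySchedule {
    // Computes the successive reconnect delays implied by retryInterval,
    // retryIntervalMultiplier and maxRetryInterval (client-side backoff sketch).
    static List<Long> delays(long retryInterval, double multiplier, long maxInterval, int attempts) {
        List<Long> out = new ArrayList<>();
        double d = retryInterval;
        for (int i = 0; i < attempts; i++) {
            out.add((long) Math.min(d, maxInterval));
            d *= multiplier;
        }
        return out;
    }
}
```

With retryInterval=2000, multiplier=1.5 and maxRetryInterval=8000 the schedule is 2000, 3000, 4500, 6750, 8000, 8000, ... ms, so 20 reconnect attempts span well over two minutes.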

My problems are:

1. At startup the client tries to connect to the backup server (localhost:62510). As it is not live (port 62510 is not open yet), it retries until initialConnectAttempts is reached. Is there a way to say "it's OK when you find any server, you don't need to find each of them"?
2. After the backup went live (port 62510 is open), I don't get a connection to the backup. It throws an exception: Connection failure has been detected: The connection was disconnected because of server shutdown
3. When the live server crashes and the backup becomes live, the backup server says
      11:07:11,582 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=1135b5d3-0e85-11e5-94de-bbf7381e5600
      
      

       

       

Can someone please tell me how to configure failover? (without JNDI - all the examples use JNDI)

       

      Regards

      Stephan

       

      ----

       

How to use my scripts:

• sh install.sh downloads HornetQ and links it as current
• sh install-instance.sh 0 copies the configurations from config/stand-alone/clustered, includes queues.xml and sets the ports 62500 to 62504 (netty and jnp)
• sh start-instance.sh 0 starts the live server
• for a backup server do the same, just replace 0 with another number and add "-backup true"
  • I installed with sh install-instance.sh 0 && sh install-instance.sh 1 -backup true && sh install-instance.sh 2 -backup true
        • 2. Re: Silent HornetQ Failover with 1 Live and 2 Backup
          Justin Bertram Master

          A few observations:

          1. Setting the client-failure-check-period to 1 second is pretty aggressive.  I recommend you leave the default value there for now until you've confirmed it needs to be set so low.
          2. Make sure you're actually inducing failover properly.  By default, if you shut the server down gracefully, clients won't fail over; you'd need to kill it to induce failover (e.g. kill -9 <pid>).  Alternatively, you can set failover-on-shutdown = true in hornetq-configuration.xml.  It looks like you're doing this properly based on the kill-instance.sh script you attached, but I wasn't sure because you didn't outline your entire use case.
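          For reference, that toggle lives in each server's hornetq-configuration.xml; a minimal fragment (the value is an example, not taken from the attached configs):

```xml
<configuration xmlns="urn:hornetq">
   <!-- trigger client failover even when the live server is shut down gracefully -->
   <failover-on-shutdown>true</failover-on-shutdown>
</configuration>
```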
          • 3. Re: Silent HornetQ Failover with 1 Live and 2 Backup
            Stephan Prätsch Newbie

            Pardon me. Here is a quick (Maven) way to run my client.

            //edit

            updated the client

            • 4. Re: Silent HornetQ Failover with 1 Live and 2 Backup
              Stephan Prätsch Newbie

              1. Ok, removed this client setting

              2. Yes. Either I use start-instance.sh and afterwards kill-instance.sh (from another bash), or simply CTRL+C to stop (kill?) the currently running instance. I want to simulate an unexpected shutdown of the server that would cause a failover.

               

              Start my client (it will produce and consume lots of small messages) and kill the live server. I expect the client to hiccup and then keep working. It does not - why?

               

              Remember, I don't want to use JNDI but either a static configuration with TransportConfiguration or a discovery group (without UDP multicast).

              • 6. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                Stephan Prätsch Newbie

                You can run it with

                 

                my.hornetq> mvn clean compile exec:java -Dexec.mainClass="QueueExampleForDiscovery"

                • 7. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                  Stephan Prätsch Newbie

                  Running my example, complete included in one zip:

                   

                  Installing / Configuring

                  stephan@praetsch:~> wget https://developer.jboss.org/servlet/JiveServlet/download/933359-133748/hornetq-example.zip
                  stephan@praetsch:~> unzip hornetq-example.zip
                  stephan@praetsch:~> cd hornetq/
                  stephan@praetsch:~/hornetq> sh install.sh
                  stephan@praetsch:~/hornetq> sh install-instance.sh 0 && sh install-instance.sh 1 -backup true
                  
                  
                  
                  

                   

                  Starting: each command in a different bash / shell

                  stephan@praetsch:~/hornetq> sh start-instance.sh 0 # start live
                  stephan@praetsch:~/hornetq> sh start-instance.sh 1 # start backup
                  stephan@praetsch:~/hornetq> cd my.hornetq.client/ ; mvn clean compile exec:java -Dexec.mainClass="QueueExampleForDiscovery"
                  
                  
                  
                  

                   

                  While it is running, kill (or CTRL+C) the live instance. The Java client does not fail over to the backup server that became live.

                  • 8. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                    Stephan Prätsch Newbie

                    I remade my example client based on the hornetq-jms-replicated-failback-static-example

                     

                    Configure

                    stephan@praetsch:~> wget https://developer.jboss.org/servlet/JiveServlet/download/933475-133870/hornetq.zip
                    stephan@praetsch:~> unzip hornetq.zip
                    stephan@praetsch:~> cd hornetq/
                    stephan@praetsch:~/hornetq> sh install.sh
                    stephan@praetsch:~/hornetq> sh install-instance.sh 0
                    stephan@praetsch:~/hornetq> sh install-instance.sh 1 -backup true
                    
                    

                     

                    Start: Each command in a different shell

                    stephan@praetsch:~/hornetq> sh start-instance.sh 0 # start live server
                    stephan@praetsch:~/Downloads/hornetq> sh start-instance.sh 1 # start backup server
                    stephan@praetsch:~/hornetq> sh run.sh # start java client
                    
                    

                     

                    The client sends and receives. When I stop (CTRL+C) the live server, the backup becomes live and the client sends/receives to/from the backup after a short hiccup, BUT

                     

                    1. Why does the backup log

                    15:51:04,030 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=95fafdbd-1109-11e5-8c35-3314c01b57a4

                     

                    2. When I start the live server again, there is a mess: both the (new) live and the backup periodically log

                    15:52:27,064 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=95fafdbd-1109-11e5-8c35-3314c01b57a4

                     

                    3. When I stop the backup the client does not re-connect to the (new) live.

                     

                    Can you point out my misconfiguration, please?

                    • 9. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                      Justin Bertram Master

                      1. Why does the backup log

                      15:51:04,030 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=95fafdbd-1109-11e5-8c35-3314c01b57a4

                      I believe this is a side-effect of the way UDP works.  As long as this isn't logged continuously then you shouldn't have a problem (as the message itself implies).

                       

                      2. When I start the live server again, there is a mess: both the (new) live and the backup periodically log

                      15:52:27,064 WARN  [org.hornetq.core.client] HQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=95fafdbd-1109-11e5-8c35-3314c01b57a4

                      Take a look at the documentation - specifically the bit about <check-for-live-server>.
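                      For anyone following along: with replication (shared-store disabled), that setting makes a restarted live server check whether its backup has already taken over, instead of coming up alongside it and broadcasting the same node id. A sketch of the relevant fragment in hornetq-configuration.xml (values shown are an example for a replication setup, not taken from the attached configs):

```xml
<configuration xmlns="urn:hornetq">
   <shared-store>false</shared-store> <!-- replication, not a shared store -->
   <check-for-live-server>true</check-for-live-server>
</configuration>
```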

                       

                      3. When I stop the backup the client does not re-connect to the (new) live.

                      Once you address problem #2 this should be resolved as well.

                      • 10. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                        Stephan Prätsch Newbie

                        Thanks. I configured <check-for-live-server>true</check-for-live-server> and it works quite well. BUT

                        1. start live
                        2. start backup
                          • backup announced
                        3. Client sends/receives
                        4. stop live
                          • backup becomes live
                          • Clients continues sending/receiving after hiccup
                        5. re-start original live
                          • Live server will not fail-back automatically
                          • backup announced
                          • Clients keeps sending/receiving
                        6. stop backup server (that just acted as live)
                          • original live server becomes actual live server
                          • Client does not get a connection anymore
                            • send: javax.jms.JMSException: Timed out waiting for response when sending packet 43 and could not rollback javax.jms.JMSException: Timed out waiting for response when sending packet 68
                            • receive: nothing is received
                        7. re-start the client: send / receive works

                         

                        Why does the re-connect at step 6 not work? According to step 7, the (original) live server is reachable.

                         

                        // edit

                        It makes no difference what I set for <failover-on-shutdown> (true or false)

                         

                        Same behavior with a live server and two backup servers:

                        1. start live
                        2. start backup1
                        3. start backup2
                        4. start client send/receive
                        5. stop live, backup1 becomes live, short client hiccup
                        6. stop backup1, backup2 becomes live, client does not get a connection anymore

                        So after stopping 2 instances the client does not re-connect.

                         

                        //edit2

                        I've been playing around with my hornetq-jms.xml; it currently contains

                        <connection-factory name="ConnectionFactory">
                             <connectors>
                                  <connector-ref connector-name="netty-connector" />
                             </connectors>
                             <entries>
                                  <entry name="ConnectionFactory" />
                                  <entry name="XAConnectionFactory" />
                             </entries>
                             <ha>true</ha>
                             <retry-interval>3333</retry-interval>
                             <reconnect-attempts>1000</reconnect-attempts>
                             <client-failure-check-period>5000</client-failure-check-period>
                        </connection-factory>
                        
                        • 11. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                          Stephan Prätsch Newbie

                          See hornetq-configuration.zip for my configurations: instance0 is live and instance1 is backup.

                           

                          The important things (AFAIK) are

                          hornetq-configuration.xml

                          <failover-on-shutdown>true</failover-on-shutdown>
                          <allow-failback>true</allow-failback>
                          <check-for-live-server>true</check-for-live-server>
                          <shared-store>false</shared-store>
                          <backup>false</backup> or <backup>true</backup>
                          
                          

                           

                          hornetq-jms.xml

                          <connection-factory name="ConnectionFactory">
                               <connectors>
                                    <connector-ref connector-name="netty-connector" />     
                               </connectors>
                               <entries>
                                    <entry name="ConnectionFactory" />
                               </entries>
                               <ha>true</ha>
                               <retry-interval>1000</retry-interval>
                               <retry-interval-multiplier>1.0</retry-interval-multiplier>
                               <reconnect-attempts>-1</reconnect-attempts>
                               <client-failure-check-period>5000</client-failure-check-period>
                               <confirmation-window-size>1048576</confirmation-window-size>
                               <failover-on-server-shutdown>true</failover-on-server-shutdown>
                          </connection-factory>
                          

                           

                          The client now works with JNDI and creates a connection exactly once via

                          private Connection createConnectionWithJndi() throws NamingException, JMSException {
                               InitialContext initialContext = getContext("jnp://localhost:62502");
                               ConnectionFactory connectionFactory = (ConnectionFactory) initialContext
                                    .lookup("/ConnectionFactory");
                               return connectionFactory.createConnection();
                          }
                          
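                          The getContext(...) helper is not shown above; here is a minimal sketch of what it presumably does with the JNP provider (the factory class and URL package prefixes are the standard JNP values, but the helper itself is my reconstruction):

```java
import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class JndiContextSketch {
    // Builds the JNP environment; factored out so it can be
    // inspected without a running naming server.
    static Properties jnpEnvironment(String providerUrl) {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        env.put(Context.URL_PKG_PREFIXES, "org.jboss.naming:org.jnp.interfaces");
        return env;
    }

    // Creating the context requires jnp-client on the classpath and a reachable server.
    static InitialContext getContext(String providerUrl) throws NamingException {
        return new InitialContext(jnpEnvironment(providerUrl));
    }
}
```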

                           

                          1. I start live and backup and the client - everything fine
                          2. I stop live, the backup becomes the new live, the client automatically re-connects to the new live - everything fine
                          3. I restart the original live; it becomes live again and the backup becomes a backup again, but the client does not re-connect to the live
                          4. I stop the backup; the client does not notice, nothing happens
                          5. I stop the live; the client does not notice, nothing happens
                          6. I bring the backup back as live (start live, start backup, stop live); the client re-connects to the backup server and goes on working

                           

                          Why does my client not fail back at all? Failover works fine, but that's it.

                           

                          By the way

                          1. With live, backup1 and backup2
                          2. stop live, backup1 becomes live, client reconnects successfully to backup1
                          3. stop backup1, backup2 becomes live, client does not reconnect to backup2
                          • 12. Re: Silent HornetQ Failover with 1 Live and 2 Backup
                            Stephan Prätsch Newbie

                            I finally got it working. It was an insane dependency problem on the client side:

                            You have to use hornetq-jms-client, which brings in all the needed dependencies.

                             

                            Working dependencies (with JNDI)

                            <dependencies>
                              <dependency>
                                <groupId>org.hornetq</groupId>
                                <artifactId>hornetq-jms-client</artifactId>
                                <version>2.4.0.Final</version>
                              </dependency>
                              <dependency>
                                <groupId>jboss</groupId>
                                <artifactId>jnp-client</artifactId>
                                <version>4.2.2.GA</version>
                                <scope>compile</scope>
                              </dependency>
                            </dependencies>
                            

                             

                            NOT WORKING dependencies

                            <dependencies>
                              <dependency>
                                <groupId>org.hornetq</groupId>
                                <artifactId>hornetq-core</artifactId>
                                <version>2.2.7.Final</version>
                                <scope>compile</scope>
                              </dependency>
                              <dependency>
                                <groupId>org.hornetq</groupId>
                                <artifactId>hornetq-jms</artifactId>
                                <version>2.2.7.Final</version>
                                <scope>compile</scope>
                              </dependency>
                              <dependency>
                                <groupId>org.jboss.javaee</groupId>
                                <artifactId>jboss-jms-api</artifactId>
                                <version>1.1.0.GA</version>
                                <scope>compile</scope>
                              </dependency>
                              <dependency>
                                <groupId>jboss</groupId>
                                <artifactId>jnp-client</artifactId>
                                <version>4.2.2.GA</version>
                              </dependency>
                              <dependency>
                                <groupId>org.jboss.logging</groupId>
                                <artifactId>jboss-logging</artifactId>
                                <version>3.1.3.GA</version>
                              </dependency>
                              <dependency>
                                <groupId>org.jboss.netty</groupId>
                                <artifactId>netty</artifactId>
                                <version>3.2.10.Final</version>
                              </dependency>
                            </dependencies>
                            

                             

                            I don't know why, but the second set of dependencies works fine except for client failback.