5 Replies Latest reply on Jun 4, 2010 7:12 PM by clebert.suconic

    HA Failover in shared-store configuration

      Hi

       

      I'm trying to configure a HornetQ HA live/backup pair using a shared-store configuration. Both live and backup nodes are running in JBoss 4.2.3 on separate servers, using an NFS mount point to hold their journal files. My message producer, a web service, is running in the same JBoss instance as the live consumer.

       

      What I am looking to configure is that, when the live Consumer/server fails (e.g. the server is switched off), the Consumer on the backup node should take over and process any messages remaining in the queue. There isn't any need for Client/Producer failover because if the consumer has failed, the whole node will be down and no requests (which produce JMS messages) will be received. Any requests received by the service will be redirected to other nodes in the configuration by the http load balancer.

       

      I haven't been able to get failover to work; when I kill the live node, the backup doesn't take over. I've been through the HA chapter in the User Guide, and what I don't understand is how the backup node knows that the live node has failed and that it must take over, and how to configure the two nodes so they're aware of each other. Can anyone explain?

       

      Thanks in advance

       

      John

        • 1. Re: HA Failover in shared-store configuration
          clebert.suconic

          ** using NFS mount **

           

           

          Don't use NFS with the journal. You need local access to the disk.

           

          When we say shared store, we mean a store on a SAN, not a simple NFS mount.

           

           

          Any database requires a local disk, because you need to guarantee syncs and similar operations, and I'm not sure that can be done over simple NFS.

           

           

           

          What happens with failover is this: when a client connects to the backup node, that node is activated. For that you need to configure the backup node on the ConnectionFactory, which can be done through UDP discovery or direct configuration. Look at the documentation for more details.
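To make the client-side half of this concrete, here is a rough, self-contained Java sketch. It is not HornetQ code and uses none of its API: `FailoverSketch`, `Node` and `connect` are hypothetical names made up for illustration. It only models the idea that the client is given both addresses and falls through to the backup when the live is unreachable.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names come from HornetQ.
final class FailoverSketch {

    // A server address plus a flag saying whether it is currently reachable.
    static final class Node {
        final String name;
        boolean reachable;

        Node(String name, boolean reachable) {
            this.name = name;
            this.reachable = reachable;
        }
    }

    // Try the nodes in order (live first, then backup) and return the first reachable one.
    static Node connect(List<Node> liveThenBackup) {
        for (Node n : liveThenBackup) {
            if (n.reachable) {
                return n;
            }
        }
        throw new IllegalStateException("no reachable node");
    }

    // Convenience helper (hypothetical) to build the ordered live/backup pair.
    static List<Node> pair(Node live, Node backup) {
        return Arrays.asList(live, backup);
    }
}
```

In this model the backup is only ever contacted after the live has stopped answering, which is the situation in which a real client's failover code would connect to the backup and so activate it.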

           

           

          BTW: You should be using 2.1.0.Final. (You didn't mention what version you were using)

          • 2. Re: HA Failover in shared-store configuration

            Thanks for the reply

             

            Regarding NFS - I could be using the wrong term, since I'm no Unix expert; this is a mount point that targets a file server. In production, this will certainly be on a SAN, but I don't have that available for development.

             

            The problem with my scenario is that when the live server fails, there won't be any client left to fail over, since the client/producer is on the same server as the live consumer and that server is dead. Am I misunderstanding what's meant by the client?

             

            John

            • 3. Re: HA Failover in shared-store configuration
              clebert.suconic

              The backup node will be activated when the first client connects to it.

               

              There's also a JIRA currently open, targeted at 2.2 onward, to make MDBs passive until the server is activated. A backup node shouldn't have any clients on it until the server is active.

              • 4. Re: HA Failover in shared-store configuration

                What is the client in my scenario? There won't be any Producers, because no requests will be sent to the backup node. There will be a Consumer Client on the Backup node, which should process any messages remaining in the queue, but how does the Backup node know that the Live node has failed, so that messages should be delivered to the Consumer on the backup?

                 

                Thanks for your patience...

                 

                John

                • 5. Re: HA Failover in shared-store configuration
                  clebert.suconic

                  Client = Any connection to that server.

                   

                   

                  Client = Producer | Consumer. It doesn't matter which.

                   

                   

                  As soon as you connect a consumer to the backup, the backup is already activated.

                   

                  If you connect a consumer to the backup while the live is still running, you will have a split-brain scenario. (We are improving HA to avoid split brains.)

                   

                   

                  At the moment, you shouldn't connect anything to the backup while the live is still alive.

                   

                  As soon as the clients connect to the backup (due to their failover code), the backup will be activated.
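The activation rule described in this reply can be sketched as a tiny state model. This is an illustration only, not HornetQ code: `BackupActivationSketch` and its members are hypothetical names, and the real server does far more than flip two flags.

```java
// Toy model only: these names are hypothetical and are not HornetQ classes.
final class BackupActivationSketch {

    boolean liveAlive = true;      // is the live server still running?
    boolean backupActive = false;  // has the backup started serving the shared journal?

    // Any connection (producer or consumer) activates the backup.
    // Returns true if this connection caused a split brain,
    // i.e. the live and the backup are now both active on the same store.
    boolean connectToBackup() {
        backupActive = true;
        return liveAlive;
    }
}
```

Under this model, a consumer that is deployed on the backup and connects at startup, while the live is still up, makes `connectToBackup()` return true, which is the split-brain case. The same call made after the live has died is the intended failover path.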