10 Replies Latest reply on Jan 14, 2010 8:00 PM by Clebert Suconic

    Cluster messaging failover

    Karen Greene Newbie

      Hi,

       

      I have been reading the user manual for HornetQ.  My understanding is that you can set up a cluster of servers to achieve load balancing and sharing of messages across those servers.  It was not clear to me what happens when one of the nodes in the cluster goes down.  Does HornetQ know that the server went down and stop delivering messages to it?  What happens to the messages already delivered to that server, or to messages in progress when it went down?  Are those messages supposed to be redelivered to another node in the cluster?  If so, is there a configuration setting somewhere that dictates how long HornetQ waits before redelivering the message to another node?

       

      When you cluster a set of servers, my understanding is that, for performance reasons, HornetQ does not replicate messages across every server.  As a follow-up to the previous question, does this mean only certain servers in the cluster will process a message if another server in the cluster goes down?

       

      The reason I ask these questions is that my understanding was that HornetQ is supposed to redeliver a message to another node if one server in a cluster goes down.  However, this is not the behaviour we are seeing.  For some reason, the message gets "stuck" on the server that went down.  I am wondering if there is some delay in the redelivery.

       

      Thanks,

       

      Karen

        • 1. Re: Cluster messaging failover
          Clebert Suconic Master

          Clustering and failover are two different things in HornetQ.


          Karen Greene wrote: "HornetQ is supposed to redeliver a message to another node if one server in a cluster goes down"

           

          For that, the node needs a backup node of its own:

           

          http://hornetq.sourceforge.net/docs/hornetq-2.0.0.GA/user-manual/en/html/ha.html

          • 2. Re: Cluster messaging failover
            Karen Greene Newbie

            Thanks for the reply.  I did see the section about failover.  However, it seems to talk more about client failover:

             

            "Before failover, only the live server is serving the HornetQ clients while the backup server remains passive. When clients fail over to the backup server, the backup server becomes active and starts to service the HornetQ clients."

             

            In other words, this provides a backup in case a client goes down: the client on the backup server will come up.  I did not see any discussion of message redelivery when a server goes down.

             

            My understanding is that you can cluster servers together for load balancing purposes.  So, for example, you can have 4 servers clustered together in which messages get sent to all four of these servers.  Does the failover described in this section also apply to messages?  It talks here about having a live and backup pair.  So, does that mean that if serverA and serverB are paired and serverA goes down, the messages on serverA will be redelivered to serverB?  If so, is there a delay in that redelivery?  Is there a setting in a configuration file for this delay?

             

            We are seeing that messages get stuck if a server in the cluster goes down.  I am wondering if this is supposed to happen or if I might be missing some configuration somewhere.

             

            Thanks,

             

            Karen

              • 4. Re: Cluster messaging failover
                Tim Fox Master

                Take a look at the HA chapter.

                 

                It describes how live and backup servers can be paired in one of two ways: (a) using a shared store or (b) by data replication.

                • 5. Re: Cluster messaging failover
                  Tim Fox Master

                  There are also fully working examples in the distro which show clustering and failover working.

                   

                  I'd suggest getting familiar with those too, if you haven't done so already.

                  • 6. Re: Cluster messaging failover
                    Clebert Suconic Master

                    Karen said: "We are seeing that messages get stuck if a server in the cluster goes down.  I am wondering if this is supposed to happen or if I might be missing some configuration somewhere."

                     

                     

                    Yes. As I said, you need a backup node to recover from failures. Look at the doc Tim Fox pointed you to.

                    • 7. Re: Cluster messaging failover
                      Karen Greene Newbie

                      Thanks for the reply.

                       

                      When a server in a cluster goes down, we are noticing that messages still get delivered to that server.  Is that the behaviour that is expected?

                      • 8. Re: Cluster messaging failover
                        Clebert Suconic Master

                        Look at the connection TTL docs.

                         

                        http://hornetq.sourceforge.net/docs/hornetq-2.0.0.GA/user-manual/en/html/connection-ttl.html

                         

                         

                        The message should be replayed to the other servers once the failure is identified (based on message acknowledgements between the servers).

                        • 9. Re: Cluster messaging failover
                          Karen Greene Newbie

                          Ok.  Thanks for the feedback.  From initial reading of the documentation, I thought that was supposed to be the behavior.  Specifically, if you have clustered a group of servers and one server goes down, messages should no longer be delivered to that server or should be rerouted to the remaining servers that are active.

                           

                          If that is the case, I must have something set up incorrectly, as I am NOT seeing the messages being rerouted when a server in the cluster goes down.  Instead, the messages are lost.

                           

                          I will try again.

                           

                          Thanks,

                           

                          Karen

                          • 10. Re: Cluster messaging failover
                            Clebert Suconic Master
                            If you think you have found a bug, we can always take a look if you provide a test case or something similar.