19 Replies Latest reply on Jun 26, 2010 2:47 AM by raja murugesan

    HornetQ cluster, messages still sent to failed node?

    Bryan Keller Newbie

      I have a simple symmetric HornetQ cluster set up with 2 nodes. Each node has a consumer reading from its local queue. One node produces messages and puts them in the local queue. Messages are being load balanced round robin across both nodes and everything works as expected.

       

      If I shut down the second node (the one not producing messages), the remaining node only receives every other message. It is as if HornetQ thinks the node that went down is still up and is blindly sending messages there, and ignores the fact the node is down. The result is that every other message is lost.

       

      If I bring the second node back up, messages are again load balanced across both nodes and no messages are lost. The messages that were missed are not redelivered.

       

      If I bring up the first node by itself, it consumes all messages as a standalone server, as expected.

       

      Is there a way to configure the cluster so that if a node goes down, HornetQ will no longer attempt to forward messages to that node? What is happening to the messages that are being lost?

        • 1. Re: HornetQ cluster, messages still sent to failed node?
          Bryan Keller Newbie

          I need to make a small correction. Once I bring the second node back up, the messages that were missed are, in fact, sent to the node as soon as it comes up again. So the producer node continues to round-robin load balance messages even though the second node is down and queues the messages for the second node. Once the second node comes up again, it gets flooded with all of the messages that had been queued for it while it was down. I am basically looking to have the node that is up handle all new messages while the second node is down. Is this possible?

          • 2. Re: HornetQ cluster, messages still sent to failed node?
            Tim Fox Master

            No messages are being lost.

             

            If you take a node down, clearly the messages in the store for that node won't be available until that node is restarted.

             

            When that node restarts, its messages will become available to be consumed again.

            • 3. Re: HornetQ cluster, messages still sent to failed node?
              Bryan Keller Newbie

              Yes, that's what I subsequently posted. Is there a setting to make all new messages go to the node that is up instead of having new messages balanced to a node that is down (for an indefinite amount of time)? I looked in the code and noticed that the cluster connection creates a bridge with max retries of -1 (infinite), and that cannot be configured. Would setting that to a positive number cause the bridge to the node that is down to close, and then all messages would go to the node that is up?

              • 4. Re: HornetQ cluster, messages still sent to failed node?
                Bryan Keller Newbie

                I modified my local copy of the HornetQ code so the bridge being created by the cluster connection sets the max retries to 1. That had no effect. Messages were still targeted to the node that is down. I'll need to dig deeper to understand why this is happening.

                • 5. Re: HornetQ cluster, messages still sent to failed node?
                  Tim Fox Master

                  Bryan Keller wrote:

                   

                  Yes, that's what I subsequently posted. Is there a setting to make all new messages go to the node that is up instead of having new messages balanced to a node that is down (for an indefinite amount of time)?

                  Messages don't go to the node that is down. After all, it's down so it can't receive any messages.

                   

                  What happens is messages are stored in the node that is up, and then forwarded to the down node when it returns.
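
                  To make the store-and-forward behaviour described above concrete, here is a minimal toy model in plain Java. It is not HornetQ code and does not use any HornetQ API; the class names (StoreAndForwardDemo, Node) and the in-memory lists are assumptions made purely for illustration. It only shows why the live node appears to receive every other message while the second node is down, and why the second node gets a burst of messages when it restarts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model only: hypothetical names, no HornetQ classes involved.
public class StoreAndForwardDemo {

    public static class Node {
        public final String name;
        public boolean up = true;
        // Messages this node has actually received.
        public final List<String> delivered = new ArrayList<>();
        // Messages held on the live side while this node is down.
        public final Queue<String> pending = new ArrayDeque<>();

        public Node(String name) {
            this.name = name;
        }
    }

    private final List<Node> nodes;
    private int next = 0;

    public StoreAndForwardDemo(List<Node> nodes) {
        this.nodes = nodes;
    }

    // Round robin ignores node state: a down node still gets its share,
    // but the message is parked in 'pending' instead of being delivered.
    public void send(String msg) {
        Node target = nodes.get(next);
        next = (next + 1) % nodes.size();
        if (target.up) {
            target.delivered.add(msg);
        } else {
            target.pending.add(msg);
        }
    }

    // On restart, everything parked for the node is forwarded at once.
    public void restart(Node node) {
        node.up = true;
        while (!node.pending.isEmpty()) {
            node.delivered.add(node.pending.poll());
        }
    }

    public static void main(String[] args) {
        Node a = new Node("node1");
        Node b = new Node("node2");
        StoreAndForwardDemo cluster = new StoreAndForwardDemo(List.of(a, b));

        b.up = false;                       // second node goes down
        for (int i = 1; i <= 4; i++) {
            cluster.send("m" + i);
        }
        System.out.println(a.delivered);    // [m1, m3]
        System.out.println(b.pending);      // [m2, m4]

        cluster.restart(b);                 // second node comes back
        System.out.println(b.delivered);    // [m2, m4]
    }
}
```

                  In this model nothing is lost: m2 and m4 are simply unavailable until node2 returns, which matches the symptom reported at the top of the thread.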

                  • 6. Re: HornetQ cluster, messages still sent to failed node?
                    Bryan Keller Newbie

                    Yes, I understand. Maybe I should be more clear. When a node goes down, I don't want new messages to be targeted for that node. After all, the node is down, so why is HornetQ attempting to send messages to it? The node could have suffered a catastrophic hardware failure, and I don't want to have to wait for the node to come up again before messages are delivered and processed. (What if the node never comes up again?) Is there a setting that makes that possible?

                    • 7. Re: HornetQ cluster, messages still sent to failed node?
                      Leos Bitto Novice

                      Bryan Keller wrote:

                       

                      Yes, I understand. Maybe I should be more clear. When a node goes down, I don't want new messages to be targeted for that node. After all, the node is down, so why is HornetQ attempting to send messages to it? The node could have suffered a catastrophic hardware failure, and I don't want to have to wait for the node to come up again before messages are delivered and processed. (What if the node never comes up again?) Is there a setting that makes that possible?

                       

                      What about forgetting the cluster altogether and configuring both consumers to consume the messages from the same queue (where the producer puts them)?

                      • 8. Re: HornetQ cluster, messages still sent to failed node?
                        Bryan Keller Newbie

                        Yes, I considered that topology, though it isn't ideal for my purpose. I need high availability, so presumably I would go with a master-slave setup if I did this. This turns out not to be ideal. I will not have a shared file system available, and HornetQ doesn't support JDBC persistence, so once the master goes down, I cannot fail back to it without having an outage while I copy state from the slave back to the master. I want a setup that will not require an outage. Also, HornetQ supports only one slave.

                         

                        Thus I looked into clustering. I was hoping I could cluster multiple M-S pairs, so having an outage on one pair for maintenance would be OK (though then my requirements for HA would require 4 nodes minimum). Using server-side load balancing does not work for me. As I mentioned, HornetQ will continue to target messages to nodes that are down and wait for the node to come up again to deliver. I need messages to be processed in a timely manner. I also was not able to get server-side load balancing to work as I expected with M-S pairs.

                         

                        The next alternative is client-side load balancing, where I have multiple M-S pairs, no server forwarding, and the messaging client picks a server pair in a round robin fashion using discovery. I think this is probably my last hope if I decide to use HornetQ, though I still need to consider if this will work given the product requirements I have to work with.
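
                        As a rough illustration of the client-side idea above, the following sketch picks a server in round-robin order and skips any server that an availability check reports as down. It is a plain-Java toy under stated assumptions: the class name ClientSideRoundRobin, the server labels, and the isUp predicate are all hypothetical, and it does not use HornetQ's discovery or connection APIs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Toy sketch only: hypothetical names, no HornetQ discovery or client classes.
public class ClientSideRoundRobin {

    private final List<String> servers;
    private final Predicate<String> isUp;   // assumed availability check
    private int next = 0;

    public ClientSideRoundRobin(List<String> servers, Predicate<String> isUp) {
        this.servers = servers;
        this.isUp = isUp;
    }

    // Try each server at most once per call, starting at the round-robin
    // position, and return the first one that is currently available.
    public Optional<String> pick() {
        for (int i = 0; i < servers.size(); i++) {
            String candidate = servers.get(next);
            next = (next + 1) % servers.size();
            if (isUp.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();            // every server is down
    }

    public static void main(String[] args) {
        Set<String> down = new HashSet<>();
        ClientSideRoundRobin balancer =
            new ClientSideRoundRobin(List.of("pairA", "pairB"), s -> !down.contains(s));

        down.add("pairB");                          // second pair is down
        System.out.println(balancer.pick().get());  // pairA
        System.out.println(balancer.pick().get());  // pairA

        down.remove("pairB");                       // second pair is back
        System.out.println(balancer.pick().get());  // pairB
        System.out.println(balancer.pick().get());  // pairA
    }
}
```

                        The point of the design is that the decision is made per send on the client, so new messages never queue up behind a pair that is down; how availability is actually detected is left open here.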

                        • 9. Re: HornetQ cluster, messages still sent to failed node?
                          Leos Bitto Novice

                          I do not understand your requirements/architecture. Can you afford to lose some messages when you lose a server (due to some hardware failure)? Can you afford downtime during your maintenance (for example, to get the failed server back online)? I am afraid that if your answer to both questions is No, it would lead to some serious clustering which the current versions of HornetQ are not able to achieve yet. On the other hand, the previous messaging provider from JBoss/RedHat (JBoss Messaging 1.4) could be the right thing for you. Since you mentioned that you would not have any shared file system, consider coupling it with MySQL cluster; that could give you a working solution, probably even with proper commercial support if that is needed. And if JBoss Messaging loses support after some time, you can replace it with ActiveMQ, which should work fine with MySQL cluster too, or maybe even with HornetQ if that happens to work with JDBC storage by then...

                          • 10. Re: HornetQ cluster, messages still sent to failed node?
                            Bryan Keller Newbie

                            High availability is my major goal, i.e. no downtime even for fail-back. Load balancing the message queue would be nice but is of lesser concern; load balancing the consumers is more important to me, and that is easily achieved. I was looking into using a cluster as a way to allow an M-S pair to come down for fail-back maintenance without having an outage. I was hoping the live M-S pair could service the messages while the other pair is down. But requiring 4 nodes for HA+failback is a little much, even if it did work.

                             

                            As you mentioned, I will probably need to use an MQ provider that supports JDBC (JBoss Messaging, ActiveMQ), given that I have no shared filesystem.

                            • 11. Re: HornetQ cluster, messages still sent to failed node?
                              Tim Fox Master

                              Bryan Keller wrote:

                               

                              High availability is my major goal, i.e. no downtime even for fail-back. Load balancing the message queue would be nice but is of lesser concern, load balancing the consumers is more important to me and that is easily achieved. I was looking into using a cluster as a way to allow a M-S pair to come down for fail-back maintenance without having an outage. I was hoping the live M-S pair could service the messages while the other pair is down. But requiring 4 nodes for HA+failback is a little much, even if it did work.

                               

                              As you mentioned, I will probably need to use a MQ provider that supports JDBC (JBoss Messaging, ActiveMQ) given I have no shared filesystem.

                              "Fail-back" (or promoting a backup server to be live) is on the roadmap. BTW I'd bear in mind that ActiveMQ does it pretty much the same way as HornetQ at present, i.e. it has no automatic fail-back and requires manual copying of the data directories, same as HQ.

                              • 12. Re: HornetQ cluster, messages still sent to failed node?
                                Leos Bitto Novice

                                Tim Fox wrote:

                                 

                                BTW I'd bear in mind that ActiveMQ does it pretty much the same way as HornetQ at present, i.e. it has no automatic fail-back and requires manual copying of the data directories, same as HQ.

                                 

                                That is actually true only for one kind of ActiveMQ cluster. Other kinds do not suffer from this shortcoming: http://activemq.apache.org/masterslave.html

                                • 13. Re: HornetQ cluster, messages still sent to failed node?
                                  Tim Fox Master

                                  Leos Bitto wrote:

                                   

                                  Tim Fox wrote:

                                   

                                  BTW I'd bear in mind that ActiveMQ does it pretty much the same way as HornetQ at present, i.e. it has no automatic fail-back and requires manual copying of the data directories, same as HQ.

                                   

                                  That is actually true only for one kind of ActiveMQ cluster. Other kinds do not suffer from this shortcoming: http://activemq.apache.org/masterslave.html

                                  And HQ also supports shared file system failover, same as ActiveMQ, which does not require copying. No difference from ActiveMQ.

                                  • 14. Re: HornetQ cluster, messages still sent to failed node?
                                    Tim Fox Master

                                    If there's something you find lacking in HQ failover / HA please file a JIRA and we will deal with it.

                                     

                                    We are more than willing to address any concerns if they are communicated this way. That is the value of community and why we appreciate your feedback.

                                     

                                    So, keep 'em coming
