19 Replies Latest reply on Jun 26, 2010 2:47 AM by raja021084

HornetQ cluster, messages still sent to failed node?

xbryan May 21, 2010 8:23 PM

I have a simple symmetric HornetQ cluster set up with 2 nodes. Each node has a consumer reading from its local queue. One node produces messages and puts them in the local queue. Messages are being load balanced round robin across both nodes and everything works as expected.

If I shut down the second node (the one not producing messages), the remaining node only receives every other message. It is as if HornetQ thinks the node that went down is still up and is blindly sending messages there, and ignores the fact the node is down. The result is that every other message is lost.

If I bring the second node back up, messages are again load balanced across both nodes and no messages are lost. The messages that were missed are not redelivered.

If I bring up the first node by itself, it consumes all messages as a standalone server, as expected.

Is there a way to configure the cluster so that if a node goes down, HornetQ will no longer attempt to forward messages to that node? What is happening to the messages that are being lost?

1. Re: HornetQ cluster, messages still sent to failed node?

xbryan May 22, 2010 1:13 AM (in response to xbryan)

I need to make a small correction. Once I bring the second node back up, the messages that were missed are, in fact, sent to the node as soon as it comes up again. So the producer node continues to round-robin load balance messages even though the second node is down, queues the messages for the second node, and once the second node comes up again, it gets flooded with all of the messages that had been queued for it while it was down. I am basically looking to have the node that is up handle all new messages while the second node is down. Is this possible?
2. Re: HornetQ cluster, messages still sent to failed node?

timfox May 22, 2010 2:59 AM (in response to xbryan)

No messages are being lost.

If you take a node down, clearly the messages in the store for that node won't be available until that node is restarted.

When that node restarts, its messages will become available to be consumed again.
3. Re: HornetQ cluster, messages still sent to failed node?

xbryan May 22, 2010 3:08 AM (in response to timfox)

Yes, that's what I subsequently posted. Is there a setting to make all new messages go to the node that is up instead of having new messages balanced to a node that is down (for an indefinite amount of time)? I looked in the code and noticed that the cluster connection creates a bridge with max retries of -1 (infinite), and that cannot be configured. Would setting that to a positive number cause the bridge to the node that is down to close, and then all messages would go to the node that is up?
4. Re: HornetQ cluster, messages still sent to failed node?

xbryan May 22, 2010 3:29 AM (in response to xbryan)

I modified my local copy of the HornetQ code so the bridge being created by the cluster connection sets the max retries to 1. That had no effect. Messages were still targeted to the node that is down. I'll need to dig deeper to understand why this is happening.
5. Re: HornetQ cluster, messages still sent to failed node?

timfox May 22, 2010 3:40 AM (in response to xbryan)

Bryan Keller wrote:

Yes, that's what I subsequently posted. Is there a setting to make all new messages go to the node that is up instead of having new messages balanced to a node that is down (for an indefinite amount of time)?

Messages don't go to the node that is down. After all, it's down so it can't receive any messages.

What happens is messages are stored in the node that is up, and then forwarded to the down node when it returns.
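
The behaviour described here can be illustrated with a toy model. This is only a sketch in plain Java, not HornetQ code: the class `StoreAndForwardSketch`, its fields, and the strict two-way alternation are assumptions made for illustration. It shows a producer node that keeps round-robining between itself and a remote node, holds the remote node's share locally while that node is down, and forwards the backlog when it returns.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only (hypothetical names, not the HornetQ API): one producer node
// alternates messages between itself and a single remote node.
class StoreAndForwardSketch {
    final List<String> localDelivered = new ArrayList<>();
    final List<String> remoteDelivered = new ArrayList<>();
    // Messages already assigned to the remote node but not yet forwarded.
    final Deque<String> pendingForRemote = new ArrayDeque<>();
    private boolean remoteUp = true;
    private int counter = 0;

    // Round robin: even-numbered sends stay local, odd-numbered go remote.
    void send(String message) {
        boolean targetLocal = (counter++ % 2 == 0);
        if (targetLocal) {
            localDelivered.add(message);
        } else {
            // The message is stored on the live node first, then forwarded
            // if (and only if) the remote node is currently reachable.
            pendingForRemote.add(message);
            flush();
        }
    }

    // Simulates the remote node going down or coming back.
    void setRemoteUp(boolean up) {
        remoteUp = up;
        flush();
    }

    // Forwards the stored backlog, in order, while the remote node is up.
    private void flush() {
        while (remoteUp && !pendingForRemote.isEmpty()) {
            remoteDelivered.add(pendingForRemote.poll());
        }
    }
}
```

In this model, six sends made while the remote node is marked down leave three messages delivered locally and three held in `pendingForRemote`; calling `setRemoteUp(true)` then delivers all three held messages at once, which matches the "flood" observed when the second node restarts.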
6. Re: HornetQ cluster, messages still sent to failed node?

xbryan May 22, 2010 3:47 AM (in response to timfox)

Yes, I understand. Maybe I should be more clear. When a node goes down, I don't want new messages to be targeted for that node. After all, the node is down, so why is HornetQ attempting to send messages to it? The node could have suffered a catastrophic hardware failure, and I don't want to have to wait for the node to come up again to have messages delivered and processed. (What if the node never comes up again?) Is there a setting that makes that possible?
7. Re: HornetQ cluster, messages still sent to failed node?

leosbitto May 22, 2010 6:30 AM (in response to xbryan)

Bryan Keller wrote:

Yes, I understand. Maybe I should be more clear. When a node goes down, I don't want new messages to be targeted for that node. After all, the node is down, so why is HornetQ attempting to send messages to it? The node could have suffered a catastrophic hardware failure, and I don't want to have to wait for the node to come up again to have messages delivered and processed. (What if the node never comes up again?) Is there a setting that makes that possible?

What about dropping the cluster altogether and configuring both consumers to consume the messages from the same queue (where the producer puts them)?
8. Re: HornetQ cluster, messages still sent to failed node?

xbryan May 22, 2010 2:40 PM (in response to leosbitto)

Yes, I considered that topology, though it isn't ideal for my purpose. I need high availability, so presumably I would go with a master-slave setup if I did this. That turns out not to be ideal either. I will not have a shared file system available, and HornetQ doesn't support JDBC persistence, so once the master goes down, I cannot fail back to it without having an outage where I copy state from the slave back to the master. I want a setup that will not require an outage. Also, HornetQ supports only one slave.

Thus I looked into clustering. I was hoping I could cluster multiple M-S pairs, so having an outage on one pair for maintenance would be OK (though then my requirements for HA would require 4 nodes minimum). Using server-side load balancing does not work for me. As I mentioned, HornetQ will continue to target messages to nodes that are down and wait for the node to come up again to deliver. I need messages to be processed in a timely manner. I also was not able to get server-side load balancing to work as I expected with M-S pairs.

The next alternative is client-side load balancing, where I have multiple M-S pairs, no server forwarding, and the messaging client picks a server pair in a round robin fashion using discovery. I think this is probably my last hope if I decide to use HornetQ, though I still need to consider if this will work given the product requirements I have to work with.
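
To make the client-side idea concrete, here is a minimal sketch in plain Java of a round-robin selector that skips targets currently marked as down. It does not use the HornetQ client API: `RoundRobinSelector`, `markUp`, `pick`, and the target names are hypothetical, and how the client learns that a pair is down (discovery, failed connection attempts) is assumed to happen elsewhere and is modelled only by the `markUp` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of HornetQ: picks targets in round-robin
// order, skipping any target currently marked as down.
class RoundRobinSelector {
    private final List<String> targets;
    private final List<Boolean> up;
    private int next = 0;

    RoundRobinSelector(List<String> targets) {
        this.targets = new ArrayList<>(targets);
        this.up = new ArrayList<>();
        for (int i = 0; i < targets.size(); i++) {
            up.add(Boolean.TRUE); // every target starts out as available
        }
    }

    // Records whether a target is reachable; unknown targets are ignored.
    void markUp(String target, boolean isUp) {
        int i = targets.indexOf(target);
        if (i >= 0) {
            up.set(i, isUp);
        }
    }

    // Returns the next available target, or empty if none is available.
    Optional<String> pick() {
        for (int tried = 0; tried < targets.size(); tried++) {
            int i = next;
            next = (next + 1) % targets.size();
            if (up.get(i)) {
                return Optional.of(targets.get(i));
            }
        }
        return Optional.empty();
    }
}
```

With two targets, `pick()` alternates between them; after `markUp("pairB", false)` every call returns the remaining target, so new messages are never assigned to the pair that is down. The trade-off is that the client, not the server, has to track availability.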
9. Re: HornetQ cluster, messages still sent to failed node?

leosbitto May 22, 2010 3:55 PM (in response to xbryan)

I do not understand your requirements/architecture. Can you afford to lose some messages when you lose a server (due to some hardware failure)? Can you afford downtime during your maintenance (for example, to get the failed server back online)? I am afraid that if your answer to both questions is No, it would call for some serious clustering which the current versions of HornetQ cannot achieve yet. On the other hand, the previous messaging provider from JBoss/RedHat (JBoss Messaging 1.4) could be the right thing for you. Since you mentioned that you would not have any shared file system, consider coupling it with MySQL cluster; that could give you a working solution, probably even with proper commercial support, if that is needed. And if JBoss Messaging loses support after some time, you can replace it with ActiveMQ, which should work fine with MySQL cluster too, or maybe even with HornetQ if that happens to work with JDBC storage by then...
10. Re: HornetQ cluster, messages still sent to failed node?

xbryan May 22, 2010 4:07 PM (in response to leosbitto)

High availability is my major goal, i.e. no downtime even for fail-back. Load balancing the message queue would be nice but is of lesser concern; load balancing the consumers is more important to me, and that is easily achieved. I was looking into using a cluster as a way to allow an M-S pair to come down for fail-back maintenance without having an outage. I was hoping the live M-S pair could service the messages while the other pair is down. But requiring 4 nodes for HA+failback is a little much, even if it did work.

As you mentioned, I will probably need to use an MQ provider that supports JDBC (JBoss Messaging, ActiveMQ) given I have no shared filesystem.
11. Re: HornetQ cluster, messages still sent to failed node?

timfox May 22, 2010 6:06 PM (in response to xbryan)

Bryan Keller wrote:

High availability is my major goal, i.e. no downtime even for fail-back. Load balancing the message queue would be nice but is of lesser concern; load balancing the consumers is more important to me, and that is easily achieved. I was looking into using a cluster as a way to allow an M-S pair to come down for fail-back maintenance without having an outage. I was hoping the live M-S pair could service the messages while the other pair is down. But requiring 4 nodes for HA+failback is a little much, even if it did work.

As you mentioned, I will probably need to use an MQ provider that supports JDBC (JBoss Messaging, ActiveMQ) given I have no shared filesystem.

"Fail-back" (or promoting a backup server to be a live server) is on the roadmap. BTW, I'd bear in mind that ActiveMQ does it pretty much the same way as HornetQ at present, i.e. it has no automatic fail-back and requires manual copying of the data directories, same as HQ.
12. Re: HornetQ cluster, messages still sent to failed node?

leosbitto May 22, 2010 6:20 PM (in response to timfox)

Tim Fox wrote:

BTW, I'd bear in mind that ActiveMQ does it pretty much the same way as HornetQ at present, i.e. it has no automatic fail-back and requires manual copying of the data directories, same as HQ.

That is actually true only for one kind of ActiveMQ cluster. Other kinds do not suffer from this shortcoming: http://activemq.apache.org/masterslave.html
13. Re: HornetQ cluster, messages still sent to failed node?

timfox May 22, 2010 6:29 PM (in response to leosbitto)

Leos Bitto wrote:

Tim Fox wrote:

BTW, I'd bear in mind that ActiveMQ does it pretty much the same way as HornetQ at present, i.e. it has no automatic fail-back and requires manual copying of the data directories, same as HQ.

That is actually true only for one kind of ActiveMQ cluster. Other kinds do not suffer from this shortcoming: http://activemq.apache.org/masterslave.html

And HQ also supports shared file system failover, same as ActiveMQ, which does not require copying. No difference from ActiveMQ.
14. Re: HornetQ cluster, messages still sent to failed node?

timfox May 22, 2010 6:32 PM (in response to timfox)

If there's something you find lacking in HQ failover / HA please file a JIRA and we will deal with it.

We are more than willing to address any concerns if they are communicated this way. That is the value of community and why we appreciate your feedback.

So, keep 'em coming
