Okay, I tried this with the trunk code and am still getting the same behavior. If I cleanly shut down one of the nodes, the other node behaves as I would expect, but not if I hard kill it. This may not seem like a big problem, but let me elaborate a little.

I am concerned because if a node goes down unexpectedly in the night, the clients will all exhibit bizarre behavior until we restart the other node, and when that node comes back up it will be flooded with requests that probably should have been load balanced. That is one scenario.

The main reason this is coming up is that when we bring up our two nodes in our production environment, after just a few minutes one of the nodes will log something like "Node ... didn't handle a message" and then mark the other node as unreliable. However, it will continue to put messages onto the bridge, expecting that something will come and get them. The only consumer is the node that has been suspected, so it can't get the messages. What makes this even worse is that there seems to be no way to get that suspected node back into the cluster. I have tried restarting it with no luck; it never starts getting messages from the bridge. So the only way to correct the problem is to restart the entire cluster. That is awful.

The issue of why one node suspects the other so quickly in our production environment is a separate problem that I will start another thread about, but it would be a small problem if the node failure or unreliable status had some way of recovering cleanly and automatically.
I have filed a JIRA: https://jira.jboss.org/browse/HORNETQ-571. I set the priority to critical because I believe this is a major issue, especially since a hard kill of the server is not the only way to get into this situation: it appears that once one of the nodes is marked as NON-RELIABLE this starts happening, the non-reliable node is not allowed to rejoin the cluster, and the surviving node won't process the messages. The only way to correct the problem is to restart the whole cluster, or to run with only one node and kill the cluster connections manually.
I added the steps to reproduce the issue to the JIRA. As I mentioned there, I integrated HornetQ with two JBoss instances, started them up, added a single clustered queue to them, and deployed a very simple message-driven bean that consumes from the queue and just prints a line to the log whenever it gets a message. I wrote a simple client to send a JMS message to the queue and then killed one of the servers. What do you want me to provide? The JBoss instances? My MDB code? Config files? The simple client?
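For reference, the MDB is essentially nothing more than something like the sketch below. The queue name and destination binding are just placeholders, not necessarily what we actually deploy:

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Minimal MDB that just logs every message it receives from the clustered queue.
// "queue/testQueue" is a placeholder destination name.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/testQueue")
})
public class LoggingMDB implements MessageListener {

    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                System.out.println("Received: " + ((TextMessage) message).getText());
            } else {
                System.out.println("Received message: " + message.getJMSMessageID());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}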
I believe we are seeing a similar situation.
Our scenario is:
- 2 Live/Backup pairs in a standalone cluster (we are not integrated with JBoss)
- A single producer which connects to the cluster and sends at a very slow rate of 1 message per second
- A single consumer which reads the messages from the clustered queue.
- Consumer reads all messages.
- Shut down the first live server; clients connected to that server fail over to the backup (at this point we are seeing some duplicates coming through, but that is for a different thread).
- All messages continue to be consumed correctly.
- Shut down the first backup server.
- Clients that were connected to the backup server establish a new connection to the cluster.
Now this is where things vary a bit. In about 50% of cases all messages continue to be consumed from the queue.
In the other 50% of cases I only see every 2nd message being consumed.
I'm guessing the variation in the results is due to which live/backup pair the producer and the consumer originally connect to when they first join the cluster.
When I'm only receiving every second message, I can see the message count on the bridge queue growing even though it shows 1 consumer; when I'm receiving all messages, the bridge queue also shows 1 consumer but the message count stays at 0.
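In case it helps to reproduce this, the producer and consumer in our test are just plain JMS clients, roughly like the sketch below. The JNDI URLs, the /ConnectionFactory and queue bindings, and the queue name are placeholders for whatever your standalone cluster actually exposes:

import java.util.Properties;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageListener;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.Context;
import javax.naming.InitialContext;

// Rough reproduction harness: one producer sending 1 message per second and one
// consumer printing everything it receives, so gaps (every 2nd message missing)
// are easy to spot. JNDI URLs, bindings and the queue name are placeholders.
public class SlowProducerConsumer {

    static Connection connect(String jndiUrl) throws Exception {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
        env.put(Context.PROVIDER_URL, jndiUrl);
        InitialContext ctx = new InitialContext(env);
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("/ConnectionFactory");
        return cf.createConnection();
    }

    public static void main(String[] args) throws Exception {
        Connection consumerConn = connect("jnp://live1:1099");
        Connection producerConn = connect("jnp://live2:1099");

        // Consumer: print the body of every message it receives.
        Session cSession = consumerConn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = cSession.createConsumer(cSession.createQueue("testQueue"));
        consumer.setMessageListener(new MessageListener() {
            public void onMessage(Message m) {
                try {
                    System.out.println("received " + ((TextMessage) m).getText());
                } catch (JMSException e) {
                    e.printStackTrace();
                }
            }
        });
        consumerConn.start();

        // Producer: one message per second with a sequence number in the body.
        Session pSession = producerConn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = pSession.createProducer(pSession.createQueue("testQueue"));
        for (int i = 0; ; i++) {
            producer.send(pSession.createTextMessage("msg-" + i));
            Thread.sleep(1000);
        }
    }
}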
The behaviour you're seeing is by design. It's also been discussed in other threads in the past.
When you have two nodes in a cluster, each node maintains an internal store and forward queue corresponding to the other node (actually there may be many of these). When messages arrive on a node, they are round robin'd between the local queues and the store and forward queues.
Any messages in the store and forward queues will then be forwarded to the other node, assuming it is up. If the other node is not up, e.g. it has crashed, then messages will still go into the store and forward queue - this is why you only see every other message.
When the other node comes back up, the messages will be forwarded on. This allows us to maintain strict round robin even in the event of a server crash.
If you don't want this behaviour, perhaps you could add a feature request for a switch which, if enabled, would cause messages in the s & f queues to be rerouted "immediately" (actually after connection-ttl, since the server has no way of detecting that the other node has crashed other than pinging).
In the meantime you can manually move messages out of the s & f queues using the management API.
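For instance, something along these lines should work. This is only a rough sketch: it assumes the management address is the default "hornetq.management", the store-and-forward queue name (usually something like sf.<cluster-name>.<node-id>; check JMX on the surviving node for the exact name) and the target queue are placeholders:

import java.util.Properties;
import javax.jms.Message;
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueRequestor;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.naming.Context;
import javax.naming.InitialContext;
import org.hornetq.api.jms.HornetQJMSClient;
import org.hornetq.api.jms.management.JMSManagementHelper;

// Sketch: move everything out of a store-and-forward queue into a local queue
// via the management API. Assumptions: the management address is the default
// "hornetq.management", "sf.my-cluster.some-node-id" stands in for the real
// store-and-forward queue name, and "jms.queue.testQueue" is the local queue
// the stuck messages should be moved to.
public class DrainStoreAndForwardQueue {

    public static void main(String[] args) throws Exception {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
        env.put(Context.PROVIDER_URL, "jnp://surviving-node:1099"); // placeholder host
        InitialContext ctx = new InitialContext(env);
        QueueConnectionFactory cf = (QueueConnectionFactory) ctx.lookup("/ConnectionFactory");

        QueueConnection connection = cf.createQueueConnection();
        QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue managementQueue = HornetQJMSClient.createQueue("hornetq.management");
        QueueRequestor requestor = new QueueRequestor(session, managementQueue);
        connection.start();

        // Resource name is "core.queue." + the store-and-forward queue name;
        // an empty filter string is assumed to match all messages.
        Message request = session.createMessage();
        JMSManagementHelper.putOperationInvocation(request,
                "core.queue.sf.my-cluster.some-node-id",
                "moveMessages",
                "",
                "jms.queue.testQueue");
        Message reply = requestor.request(request);

        if (JMSManagementHelper.hasOperationSucceeded(reply)) {
            System.out.println("Moved " + JMSManagementHelper.getResult(reply) + " messages");
        } else {
            System.out.println("Move failed: " + JMSManagementHelper.getResult(reply));
        }
        connection.close();
    }
}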
I will definitely add a feature request, but I am having a hard time thinking of a situation where this would be desirable behavior. The whole point of clustering should be to balance the load across servers so that requests get serviced as quickly as possible, as well as to provide fault tolerance to the application. This behavior hurts both goals: the cluster is not fault tolerant, because a node going down has a measurable consequence, and it doesn't balance requests sensibly either.

For example, if I have 2 servers and send 1000 messages to be processed, and one server goes down immediately, then 500 messages will be processed by the server that is up and the other 500 will only be processed once the down server comes back. Wouldn't better behavior be for the messages to be redistributed to the live server once it has worked through its own load, and then to resume round robin when the other server comes back up? With the current behavior, all of the messages that pile up are handled by a single server. If a node is down for a day and 1,000,000 messages stack up waiting for it, and there are 10 other nodes in the cluster, you are basically forcing that single node to do all the work when it comes back up, even though the other 10 nodes may be idle.
Not only that, but I think something is clearly wrong here, because if you are running a cluster and do a clean shutdown of JBoss, the behavior is as expected: messages no longer get put into the queue for the down node and the remaining node services all the requests. So I think you may be incorrect that this is the intended behavior. Anyway, I will file a feature request for this, but I think it is pretty critical that it gets addressed for this product to be any sort of production-ready solution.
A clean shutdown is different from a failure.
If you kill a node, you are expected to either have a backup or restart the node. In that case the node will receive its messages when it comes back.
If you shut down the node cleanly, then you are taking that node out of the cluster, hence the messages will be delivered to the remaining nodes.
Given the discussion that is going on about kill vs. shutdown and live/backup pairs in a cluster, can I ask what the expected behaviour is in the following scenario:
- 2 live/backup pairs in a cluster
- 1 producer sending a message to a queue
- 1 consumer reading from the clustered queue
- KILL first live server - all messages continue to be consumed
- SHUTDOWN backup server (to enable the live server to be restarted)
In about 50% of the times I run this scenario, after the final step (shutting down the backup) only every 2nd message gets delivered to the consumer.
In the other 50% all of the messages get delivered.
Based on what you are saying, after a shutdown all the messages should be redistributed to the other nodes and therefore the consumer should get 100% of the messages 100% of the time.
We are working on a trading system where message delivery is of the highest priority (certainly in comparison to strict load balancing). We can't have messages going undelivered to consumers for more than a few hundred ms, let alone the minutes it would take if we had to go and restart a live/backup pair. Therefore a feature to always redistribute messages is a critical issue for us.
Without being able to ensure 100% delivery 100% of the time (or certainly very close to it), including during node outages caused by a KILL or a SHUTDOWN, HornetQ in a clustered scenario simply isn't a viable option for us in a production environment.