ServiceInvoker.deliverSync() replies getting bounced around JBM Cluster
ryanhos Feb 7, 2011 6:18 PM
Environment: JBoss ESB 4.7 deployed on JBoss 5.1.0 GA. JBoss Messaging 1.4.7.
We noticed some peculiar behavior on our test cluster last week when we spun up the load testing software. Replies to ServiceInvoker.deliverSync() were seemingly getting lost, and the calls eventually timed out. I watched the record counts in the JBoss Messaging tables and noticed something surprising: there were quite a few messages on the reply queues, and they were getting shuffled from cluster node to cluster node. I instantly suspected our old friend, the Message Selector, which is how JBoss ESB routes reply messages. So I wrote a test bed. This is what I found. I hope that someone has already run into this problem and can steer me around it.
Blue Cluster Node:
One clustered queue named "test.queue"
One message producer, constantly sending messages with the property SelectorKey=Blue_<ever-increasing-integer-starting-with-zero> to the local test.queue.
One message producer, constantly sending messages with the property SelectorKey=Red_<ever-increasing-integer-starting-with-zero> to the local test.queue
One message consumer, constantly making a new connection, session, and consumer, attempting to consume messages with SelectorKey=Blue_<ever-increasing-integer-starting-with-zero>
Red Cluster Node:
One clustered queue named "test.queue"
One message consumer, constantly making a new connection, session, and consumer, attempting to consume messages with SelectorKey=Red_<ever-increasing-integer-starting-with-zero>
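For anyone reproducing the test bed, here is a minimal sketch of how each consumer's selector string could be built. The class and method names (SelectorKeys, selectorFor) are hypothetical and not part of the original test bed; the only things taken from the setup above are the property name SelectorKey and the Color_N value pattern. JMS selectors use SQL-92 style syntax, so the string value goes in single quotes.

```java
// Hypothetical helper, not taken from the original test bed.
final class SelectorKeys {
    private SelectorKeys() {}

    // Builds a JMS message selector such as: SelectorKey = 'Blue_40'
    // The result would be passed as the messageSelector argument of
    // javax.jms.Session.createConsumer(Destination, String).
    static String selectorFor(String color, int sequence) {
        return "SelectorKey = '" + color + "_" + sequence + "'";
    }
}
```

Each consumer iteration would then open a new connection and session, create a consumer with selectorFor("Blue", n) or selectorFor("Red", n), receive one message, close everything, and increment n.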
I booted both cluster nodes and invoked the MBean operations to launch the producers and then the consumers. Everything went well until about the 40th producer iteration, when both consumers stalled waiting for a message. The producers kept going until I terminated them.
On the Blue Cluster Node, the consumer was waiting on SelectorKey=Blue_40. On the Red Cluster Node, the consumer was waiting on SelectorKey=Red_37. I used the Queue's listAllMessages(String selector) method on each cluster node to ascertain where the messages were. You guessed it. Message Blue_40 was in the queue on the Red Cluster Node. Message Red_37 was in the queue on the Blue Cluster Node. (listAllMessages(String selector) does not return every known message on a logical clustered queue, only the messages that are currently the responsibility of a particular JBoss Messaging cluster node.)
It follows logically that the Blue_40 message was sucked across to the Red Node by the JBoss Messaging MessageSucker. This is probably due to the fact that a consumer appeared over on that node and found an empty queue. The Red Node JBM process wanted to feed the starved consumer, so it sucked a block of messages over from the Blue Node, which had all of them since that is the only place they were being produced. It didn't manage to obtain the message it was looking for (Red_37), but it did manage to steal the Blue_40 message.
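The hypothesis above can be replayed with a toy in-memory model. This is not JBoss Messaging code and makes no claim about how the MessageSucker is actually implemented; SuckerModel, NodeQueue, and suck are made-up names, and the key assumption (labeled in the code) is that a block of messages is moved from the head of the remote queue without consulting the waiting consumer's selector.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Toy model of the suspected behavior; NOT the JBoss Messaging implementation.
final class SuckerModel {
    private SuckerModel() {}

    // One node's local share of the clustered queue.
    // A message is represented only by its SelectorKey value.
    static final class NodeQueue {
        final Deque<String> messages = new ArrayDeque<>();

        // Removes and returns the first locally held message whose key equals
        // wantedKey, or returns null if this node does not hold such a message.
        String receive(String wantedKey) {
            Iterator<String> it = messages.iterator();
            while (it.hasNext()) {
                String key = it.next();
                if (key.equals(wantedKey)) {
                    it.remove();
                    return key;
                }
            }
            return null;
        }
    }

    // ASSUMPTION: moves up to blockSize messages from the head of 'from' to the
    // tail of 'to', ignoring any selector the starved consumer is using.
    static void suck(NodeQueue from, NodeQueue to, int blockSize) {
        for (int i = 0; i < blockSize && !from.messages.isEmpty(); i++) {
            to.messages.addLast(from.messages.pollFirst());
        }
    }
}
```

If the Blue queue holds Blue_40, Blue_41, and Red_37 in that order and the Red node sucks a block of two, Blue_40 ends up on Red while Red_37 stays on Blue. After that, receive("Blue_40") on Blue and receive("Red_37") on Red both return null, which matches the stall observed on both consumers.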
So, who else is running synchronous ESB invocations on a cluster and has navigated around this problem or avoided encountering it entirely? Does anyone have any creative ideas for getting around this? One "hail-mary" suggestion from our team was to make the reply queue a topic, in the hope that the selector processing is somewhat different. This is not a solution we're proud of; it's just a possible, untested solution.