Messages being dropped during failover?
stwhit May 12, 2011 5:49 PM
If a JMS consumer is using a transacted session, receives a message, attempts to commit, and receives a TransactionRolledBackException on the commit() call, what should that consumer do with that message?
I believe the answer is "discard it". But if that is the case, I've got some code that demonstrates that (sometimes) that message will never get redelivered. See the attachment.
This example uses a live/backup HA pair of HornetQ instances. The application starts 2 threads (a producer and a consumer). The producer produces messages containing incrementing numbers beginning with 1. After a period of time, the live server is killed, and everything fails over to the backup properly.
When the producer is finished, it sends a final message containing the negative of the total number of messages sent. So, when the consumer receives a negative value, it knows the total number of messages to expect, and will shut down when all those messages have been received.
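To make the completion rule concrete, here is a minimal sketch of the consumer-side bookkeeping. It is not the code from the attachment: the class name CompletionTracker and its methods are hypothetical, and it assumes the consumer calls onMessage() only after commit() has succeeded for that message, so a message whose commit was rolled back is not counted until it is redelivered.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of the attached example or of HornetQ.
final class CompletionTracker {
    private final BitSet seen = new BitSet();
    private int expectedTotal = -1; // unknown until the negative sentinel arrives

    // Record one committed payload; returns true once all expected numbers are in.
    boolean onMessage(int value) {
        if (value < 0) {
            expectedTotal = -value;  // sentinel: negative of the total sent
        } else if (value > 0) {
            seen.set(value);         // idempotent, so a duplicate is harmless
        }
        return isComplete();
    }

    // Complete when the total is known and every number 1..total has been seen.
    boolean isComplete() {
        return expectedTotal > 0 && seen.nextClearBit(1) > expectedTotal;
    }

    // Numbers still outstanding; empty while the total is not yet known.
    List<Integer> missing() {
        List<Integer> out = new ArrayList<>();
        if (expectedTotal <= 0) {
            return out;
        }
        int i = seen.nextClearBit(1);
        while (i <= expectedTotal) {
            out.add(i);
            i = seen.nextClearBit(i + 1);
        }
        return out;
    }
}
```

Tracking individual numbers rather than a bare count means a redelivered duplicate cannot mask a gap, and missing() names exactly which message the consumer is still waiting on.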
Many times, this example will run to successful completion, which means the consumer received all the messages it expected. But sometimes, the example will hang. During these runs, the consumer never exits, because it is forever waiting on a message it will never receive. The message that the consumer is waiting for is the message that it received but failed to commit.
To run the example, untar it, set your HORNETQ_222_HOME environment variable to point to the directory containing a HornetQ 2.2.2 installation, and run build.sh.
If you see the "SUCCESS!" message, that means the problem didn't occur during that run. Please re-run a few times, and you should see the problem.
I've also attached a couple of logs, one demonstrating successful completion of the example, the other demonstrating the consumer hanging waiting for a message that never arrives.
- failure.log.zip (4.2 KB)
- success.log.zip (4.6 KB)