I added some optimizations to the test bridge I posted in my other error report concerning the OOM issue yesterday and encountered some strange message-out-of-sequence conditions over night I'd like to report:
The test setup is the following: The HornetQ versions I tested with were Branch2_1/r9608 as well as Branch2_1/r9653. A single-threaded sender is sending to a single queue (HornetQ address jms.queue.msgduplicate_divert_inbound) that has two exclusive diverts (jms.queue.msgduplicate_divert_outbound_1/jms.queue.msgduplicate_divert_outbound_2) attached. Two receivers, running in parallel, are consuming messages from the *divert_outbound_* queues (one on each queue). The major differences to the tests before are that the maximum sizes of the *divert_outbound_* queues are a bit higher than before (configuration file is attached) and that the batch size of the receivers is time based (i.e. they receive as many messages as possible for a configured time and then commit) while the sender still uses a count based batch size. And I'm not restarting HornetQ anymore all over the time
I extended the multi-threaded test client used before in order to reproduce the issue. The client + config files are attached as usual. The command line parameters used were
stopfile 200 25000 0 5000 COUNT 5000 TIME 10000 true
The issue I see is swapped blocks (excerpt):
WARN [2010-09-08 11:26:49,844] R0: Received message # 849,393, expected 847,789.
WARN [2010-09-08 11:26:56,758] R0: Received message # 847,789, expected 850,000.
WARN [2010-09-08 11:26:56,805] R0: Received message # 850,000, expected 849,393.
WARN [2010-09-08 11:35:44,312] R1: Received message # 2,254,054, expected 2,253,814.
WARN [2010-09-08 11:35:51,393] R1: Received message # 2,253,814, expected 2,255,000.
WARN [2010-09-08 11:35:51,396] R1: Received message # 2,255,000, expected 2,254,054.
WARN [2010-09-08 11:52:52,079] R0: Received message # 4,934,300, expected 4,932,568.
WARN [2010-09-08 11:52:58,910] R0: Received message # 4,932,568, expected 4,935,000.
WARN [2010-09-08 11:52:58,928] R0: Received message # 4,935,000, expected 4,934,300.
WARN [2010-09-08 11:59:44,190] R1: Received message # 6,008,469, expected 6,006,710.
WARN [2010-09-08 11:59:52,080] R1: Received message # 6,006,710, expected 6,010,000.
WARN [2010-09-08 11:59:52,104] R1: Received message # 6,010,000, expected 6,008,469.
The first out-of-sequence conditions appear on my system usually within 1-2h. There are no errors/exceptions in the HornetQ log nor in the client log.
It might be relevant that the sequence errors do not appear on both *divert_outbound_* queues at the same messages. One receiver might get some messages out of sequence while the message sequence is fine for the same messages for the other receiver.
Additionally, it looks like the swapped blocks are somehow caused on the writing side as some of the start/end-IDs of the swapped blocks are multiples of the sending batch size (5000 for the log excerpt above).