24 Replies. Latest reply on May 20, 2013 8:04 AM by jmrecio

out-of-order messages after abrupt MDB termination

jmrecio Apr 3, 2013 9:09 AM

I have thoroughly read the docs, FAQ, and discussions (such as https://community.jboss.org/message/568278, https://community.jboss.org/message/551502), but I am not sure whether what we are seeing is a bug. I am probably missing something, so I would appreciate your comments.

Setup (simplification of a more complex application):

System P: a bunch of producers sending messages to a queue Q1. Every message sent by a given producer belongs to the same messageGroup, i.e. the relationship between producers and groups is 1-1 (this is a simplification of the real application).

System C1: bunch of MDB consumers, reading from queue Q1.

System C2: bunch of MDB consumers, reading from queue Q1.

C1 and C2 are clustered.

Standalone HornetQ: Queue Q1

Each messageGroup is associated with an event stream for a given user-application pair, and those events must be processed in order. Groups are long-lived.

Test:

1) Producers happily sending messages, being consumed in C1 and C2. C1 cleanly stops. Its groups are quickly reassigned to consumers in C2. No message is lost or out of order.

2) Producers happily sending messages, being consumed in C1 and C2. C1 abruptly disappears (simulated with kill -9). The server waits for connection-ttl to expire, and the groups previously consumed in C1 are then assigned to consumers in C2. In this case some groups sometimes have out-of-order messages.

Schematic example explaining what we see in case 2:

Producer P1 in system P sends messages belonging to group G1: ..., M1-G1, M2-G1, M3-G1, M4-G1, M5-G1, M6-G1, M7-G1, M8-G1, ...

Those messages were consumed by a consumer in C1. C1 abruptly disappears. When the group G1 is assigned to a consumer in C2, this is what (sometimes) this consumer receives:

M500-G1, M501-G1,..., M700-G1, M498-G1, M499-G1, M701-G1, M702-G1, ...

In this example, messages M498 and M499 arrive out of order, after a while.
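A minimal sketch of how such a violation could be detected on the consumer side. It assumes every message carries a per-group sequence number (the number in M498, M499, ...); the class and method names are hypothetical and are not part of the attached test programs or of any HornetQ API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: tracks, per message group, the highest sequence
// number seen so far and records every message that arrives behind it.
class GroupOrderChecker {
    private final Map<String, Long> highestSeen = new HashMap<>();
    private final List<String> violations = new ArrayList<>();

    // Returns true if the message is in order for its group, false otherwise.
    boolean accept(String groupId, long seq) {
        Long highest = highestSeen.get(groupId);
        if (highest != null && seq <= highest) {
            // Arrived after a message with a higher (or equal) sequence number.
            violations.add(groupId + ":" + seq);
            return false;
        }
        highestSeen.put(groupId, seq);
        return true;
    }

    // Out-of-order messages recorded so far, as "group:sequence" strings.
    List<String> violations() {
        return violations;
    }
}
```

Fed with the sequence from the example (M500 to M700, then M498, M499, then M701, M702), such a checker would flag exactly M498 and M499 for group G1.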

Discussion:

My understanding is that case (2) should work the same as case (1) (of course, only once the server detects the problem in C1, when connection-ttl expires). The messages not ack'ed by the dead client should be put back at the head of the queue and delivered according to messageGroup rules, so I would expect messages in group G1 to be delivered in order to a consumer in C2.

But it seems some messages are kept back for some reason, and delivered after a while.
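The expectation described above can be sketched as a toy model of a single group on one queue. This is only an illustration of the expected semantics (unacked messages go back to the head of the queue in their original order), not HornetQ's actual implementation; all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of one queue serving one message group.
class RedeliveryModel {
    private final Deque<Integer> queue = new ArrayDeque<>();
    private final List<Integer> inFlight = new ArrayList<>(); // delivered, not yet acked

    void send(int seq) {
        queue.addLast(seq);
    }

    // Hands the next message to the consumer; it stays in flight until acked.
    Integer deliver() {
        Integer seq = queue.pollFirst();
        if (seq != null) {
            inFlight.add(seq);
        }
        return seq;
    }

    void ack(int seq) {
        inFlight.remove(Integer.valueOf(seq));
    }

    // Expected behaviour when the consumer dies: unacked messages return to
    // the HEAD of the queue, preserving their original relative order.
    void consumerCrashed() {
        for (int i = inFlight.size() - 1; i >= 0; i--) {
            queue.addFirst(inFlight.get(i));
        }
        inFlight.clear();
    }

    // A new consumer receives and acks everything that is left.
    List<Integer> drain() {
        List<Integer> received = new ArrayList<>();
        Integer seq;
        while ((seq = deliver()) != null) {
            received.add(seq);
            ack(seq);
        }
        return received;
    }
}
```

Under this model, if M498 and M499 were in flight when C1 died, the consumer in C2 would receive M498, M499, M500, ... with nothing held back.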

We have tested with different HornetQ releases and played with some settings in the HornetQ server (thread pool sizes, connection-ttl-override, transaction-timeout, ...) and in the MDBs (useLocalTx, CMT/bean-controlled transactions, ...), but have not been able to fix it.

The number of out-of-order messages is smaller when useLocalTx is activated in the MDBs, so it seems related to transactions. It is also smaller when thread-pool-max-size and scheduled-thread-pool-max-size are increased. We still see the same behaviour with connection-ttl-override = 15000, transaction-timeout = 10000, transaction-timeout-scan-period = 500.

I can provide test programs. This does not happen in every test, but it is reproducible. I am happy to share configurations, etc., but before overloading you with info, I would first like to check whether my understanding is correct.

Question: Is this expected? Any setting that might help?

Thanks for your expertise

1. Re: out-of-order messages after abrupt MDB termination

ataylor Apr 3, 2013 9:25 AM (in response to jmrecio)

Non-acked messages are added back to the head of the queue; however, messages received within a transaction but then rolled back, say because of a consumer crash, will be marked as redelivered. What do you have your redelivery-delay set to?
2. Re: out-of-order messages after abrupt MDB termination

jmrecio Apr 3, 2013 9:55 AM (in response to ataylor)
Thanks, Andy

redelivery-delay is set to zero. HornetQ configuration files attached.

I think my understanding aligns with your comment. That is how I explain the much smaller impact after setting useLocalTx = true in the MDB: there are no longer XA transactions involving the HornetQ server, so no XA rollback is needed to redeliver the messages.

Let me rephrase, assuming there are no XA transactions: once the server has marked a consumer as crashed, JMS transactions are rolled back and the messages are added at the head of the queue, right? For some messages this doesn't happen immediately after the consumer is marked as crashed; it takes longer.

hornetq-2.2.24.GA.config.3.apr.tgz 2.4 KB
3. Re: out-of-order messages after abrupt MDB termination

ataylor Apr 3, 2013 10:04 AM (in response to jmrecio)

Local transactions are still rolled back and messages will still be marked as redelivered, although the window will be much smaller, which is probably why you don't see it. That said, if redelivery-delay is 0, messages are put back at the head of the queue, so there may be a bug where the consumer is unpinned before the messages are rolled back. Any chance you could provide a simple test case? I will take a look.
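To illustrate why such a race would produce the pattern reported in the first post, here is a rough sketch. It only simulates the hypothesis (the group is handed to a new consumer before the dead consumer's messages are rolled back); it does not reflect HornetQ internals, and the names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class UnpinRaceSketch {
    // inFlight: messages the dead consumer had received but not acked.
    // queue: messages of the same group still waiting on the server.
    // deliveredBeforeRollback: how many queued messages reach the new
    // consumer before the rollback puts the in-flight ones back.
    static List<Integer> receivedByNewConsumer(List<Integer> inFlight,
                                               Deque<Integer> queue,
                                               int deliveredBeforeRollback) {
        List<Integer> received = new ArrayList<>();
        // 1) Group already reassigned: the new consumer starts receiving.
        for (int i = 0; i < deliveredBeforeRollback && !queue.isEmpty(); i++) {
            received.add(queue.pollFirst());
        }
        // 2) Late rollback: in-flight messages return to the head of the queue.
        for (int i = inFlight.size() - 1; i >= 0; i--) {
            queue.addFirst(inFlight.get(i));
        }
        // 3) Delivery continues from the head.
        while (!queue.isEmpty()) {
            received.add(queue.pollFirst());
        }
        return received;
    }
}
```

With `deliveredBeforeRollback` set to 0 the order is preserved; with any positive value the rolled-back messages show up late, in the middle of newer ones, much like M498 and M499 in the original example.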
4. Re: out-of-order messages after abrupt MDB termination

jmrecio Apr 3, 2013 11:00 AM (in response to ataylor)

Very good.
Working on packaging a complete and simple test case.
5. Re: out-of-order messages after abrupt MDB termination

clebert.suconic Apr 3, 2013 11:17 AM (in response to jmrecio)

Take a look at consumer-window-size. It creates a big cache on the client. In certain cases you need to turn it off (by setting it to 0).
6. Re: out-of-order messages after abrupt MDB termination

jmrecio Apr 3, 2013 11:52 AM (in response to clebert.suconic)

Thanks, Clebert.

We see the same behaviour with consumer-window-size = 0.

I have reproduced with AS-7.2.0.Final and hornetq-2.3.0.CR1. I am preparing the test case package.
7. Re: out-of-order messages after abrupt MDB termination

clebert.suconic Apr 3, 2013 1:22 PM (in response to jmrecio)

Can you try with CR2 just in case?
8. Re: out-of-order messages after abrupt MDB termination

jmrecio Apr 3, 2013 2:03 PM (in response to clebert.suconic)
Tested with CR2.

Find attached a simple self-contained test setup that reproduces the behaviour in almost every test (4 out of 5 tries). Test bench: 2.3.0-CR2 server; AS7.2.0-Final; MDBs deployed in AS7 (source included); standalone producers (source included).

Check manifest.txt for an explanation. Testing is simple but a bit involved, as there are several moving parts (producers, AS, HornetQ server). I am open to explaining additional steps or whatever I might have forgotten: skype, IRC, etc.

Many thanks for your excellent and prompt response.

test.order.abrupt.tgz 12.7 KB
9. Re: out-of-order messages after abrupt MDB termination

clebert.suconic Apr 3, 2013 3:52 PM (in response to jmrecio)

Not related to your thread, but are you using HornetQ standalone inside AS7? Why?
10. Re: out-of-order messages after abrupt MDB termination

jmrecio Apr 4, 2013 6:10 AM (in response to clebert.suconic)

No, no, sorry for the misunderstanding, I did not explain myself very well:

- Producers are standalone Java programs, connected to a standalone HornetQ

- Consumers are MDBs deployed in AS 7.2.0-Final, connected to the same standalone HornetQ
11. Re: out-of-order messages after abrupt MDB termination

ataylor Apr 4, 2013 6:16 AM (in response to jmrecio)

@Jose I will take a look at the test you attached, although it won't be until next week. I will get back to you ASAP.
12. Re: out-of-order messages after abrupt MDB termination

jmrecio Apr 4, 2013 7:08 AM (in response to ataylor)

Thanks, Andy
13. Re: out-of-order messages after abrupt MDB termination

jmrecio May 4, 2013 3:41 AM (in response to ataylor)

Hello, Andy,

Did you get a chance to take a look at the test setup?

Thanks !!!
14. Re: out-of-order messages after abrupt MDB termination

ataylor May 7, 2013 5:01 AM (in response to jmrecio)

I'm looking at this now.