
    Duplicate/retried messages when acknowledge fails due to race condition?

    grim_toaster

      Hi,

       

      I'm getting duplicate messages delivered to a consumer, apparently because the acknowledgement fails with the server-side error below.  This also leads to excessively large log files and a lot of messages being sent to the DLQ, but the duplicate messages reaching queue consumers are probably the scariest bit.

       

      Apr 19, 2013 9:52:03 AM org.hornetq.core.protocol.core.ServerSessionPacketHandler handlePacket

      ERROR: HQ224016: Caught exception

      HornetQException[errorType=ILLEGAL_STATE message=HQ119027: Could not find reference on consumer ID=0, messageId = 156 queue = b6f5c32a-3058-4c57-bfd9-3e4bee9e2d7d]

          at org.hornetq.core.server.impl.ServerConsumerImpl.acknowledge(ServerConsumerImpl.java:704)

          at org.hornetq.core.server.impl.ServerSessionImpl.acknowledge(ServerSessionImpl.java:634)

          at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:274)

          at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:631)

          at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:547)

          at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:523)

          at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:565)

          at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:72)

          at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)

          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)

          at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281)

          at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169)

          at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134)

          at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)

          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)

          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)

          at org.jboss.netty.channel.socket.oio.OioWorker.process(OioWorker.java:71)

          at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:73)

          at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:51)

          at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)

          at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)

          at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:175)

          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

          at java.lang.Thread.run(Thread.java:722)

       

       

      If I have a single producer with a single consumer, then there are no issues: every message sent is consumed exactly once.  However, if I use multiple producers sending messages in parallel with the single consumer, then I get the above stack trace, and although only 2000 messages are sent, over 10000 messages are typically consumed because of redeliveries.

       

      I have this issue with a deployed application that was recently upgraded from 2.2.14 (due to HORNETQ-1042), but I have managed to recreate the issue in the attached test case; it fails on at least one of the tests almost every run (increasing the values of MESSAGES_TO_SEND_PER_PRODUCER or MULTIPLE_PRODUCERS_COUNT increases the likelihood of failure).
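      For reference, here is a minimal sketch of the shape of that test.  It is not the attached test case: apart from the two constant names mentioned above, the connection factory, queue name, message counts and acknowledgement mode are all hypothetical.

      import javax.jms.Connection;
      import javax.jms.ConnectionFactory;
      import javax.jms.MessageConsumer;
      import javax.jms.MessageProducer;
      import javax.jms.Session;

      public class DuplicateDeliverySketch {

          static final int MESSAGES_TO_SEND_PER_PRODUCER = 1000;   // hypothetical value
          static final int MULTIPLE_PRODUCERS_COUNT = 2;           // hypothetical value
          static final String QUEUE_NAME = "testQueue";            // hypothetical
          static ConnectionFactory connectionFactory;              // hypothetical: however the test obtains its factory

          public static void main(String[] args) throws Exception {
              // Start the producers in parallel; each producer thread creates and uses its own
              // connection and session, so no JMS object is shared between threads.
              Thread[] producers = new Thread[MULTIPLE_PRODUCERS_COUNT];
              for (int p = 0; p < producers.length; p++) {
                  producers[p] = new Thread(new Runnable() {
                      public void run() {
                          produce();
                      }
                  });
                  producers[p].start();
              }

              // A single consumer drains the queue while the producers are still sending.
              int received = consumeAll();
              for (Thread t : producers) {
                  t.join();
              }
              int expected = MESSAGES_TO_SEND_PER_PRODUCER * MULTIPLE_PRODUCERS_COUNT;
              // Anything over 'expected' is a redelivered (duplicate) message.
              System.out.println("expected " + expected + ", received " + received);
          }

          static void produce() {
              Connection connection = null;
              try {
                  connection = connectionFactory.createConnection();
                  // AUTO_ACKNOWLEDGE is an assumption; the attached test may use a different mode.
                  Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  MessageProducer producer = session.createProducer(session.createQueue(QUEUE_NAME));
                  for (int i = 0; i < MESSAGES_TO_SEND_PER_PRODUCER; i++) {
                      producer.send(session.createTextMessage("message-" + i));
                  }
              } catch (Exception e) {
                  throw new RuntimeException(e);
              } finally {
                  try { if (connection != null) connection.close(); } catch (Exception ignored) {}
              }
          }

          static int consumeAll() throws Exception {
              Connection connection = connectionFactory.createConnection();
              try {
                  connection.start();
                  Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  MessageConsumer consumer = session.createConsumer(session.createQueue(QUEUE_NAME));
                  int received = 0;
                  while (consumer.receive(2000) != null) {
                      received++;
                  }
                  return received;
              } finally {
                  connection.close();
              }
          }
      }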

       

      I'm presuming some kind of race condition in HornetQ, but I would appreciate hearing whether anyone knows of this issue (I couldn't find anything relevant in Jira) or whether it's simply a configuration problem.

       

      Many Thanks

      Andy

        • 1. Re: Duplicate/retried messages when acknowledge fails due to race condition?
          ataylor

          You shouldn't be sharing JMS sessions across multiple threads; JMS sessions are not thread safe.

          • 2. Re: Duplicate/retried messages when acknowledge fails due to race condition?
            grim_toaster

            I'm not; each thread is using its own connection and session.

            • 3. Re: Duplicate/retried messages when acknowledge fails due to race condition?
              ataylor

               Hmmm, strange. That message is indicative of multiple threads acking the same session; basically a message is being acked that has probably already been acked from elsewhere. Can you double-check your code, and if you still think it's an issue I will take a look next week.
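               To make that concrete, here is a minimal sketch (hypothetical names and values throughout) of the kind of sharing being described: two threads receiving and acknowledging on the same session and consumer.

               import javax.jms.Connection;
               import javax.jms.ConnectionFactory;
               import javax.jms.Message;
               import javax.jms.MessageConsumer;
               import javax.jms.Session;

               public class SharedSessionAntiPattern {
                  static ConnectionFactory connectionFactory;   // hypothetical: obtained from JNDI or the server

                  public static void main(String[] args) throws Exception {
                     Connection connection = connectionFactory.createConnection();
                     connection.start();
                     // One session and one consumer shared by both threads: not allowed by the JMS spec.
                     final Session sharedSession = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
                     final MessageConsumer sharedConsumer = sharedSession.createConsumer(sharedSession.createQueue("testQueue"));
                     for (int i = 0; i < 2; i++) {
                        new Thread(new Runnable() {
                           public void run() {
                              try {
                                 Message m = sharedConsumer.receive(1000);
                                 if (m != null) {
                                    m.acknowledge();   // acks from two threads race on the same server-side consumer
                                 }
                              } catch (Exception e) {
                                 e.printStackTrace();
                              }
                           }
                        }).start();
                     }
                  }
               }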

              • 4. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                grim_toaster

                Thanks for the response,

                 

                 This is actually an issue we're seeing in the real application, where all of the consumers/producers are on separate boxes.  This sample code took a while to piece together to recreate the issue, but each of the 3 threads the test case uses definitely has its own connection and session object.  I did have another test program that ran the server, consumers and producers in separate VMs, but that involved all the external XML configurations and bootstrappers and required too many manual steps to be able to post here!

                 

                Although probably not conclusive, a simple sysout on the created sessions reveals:

                 

                HornetQRASession->DelegatingSession [session=ClientSessionImpl [name=1555b248-a8de-11e2-b5e0-2d03312c2ae2, username=null, closed=false, factory = ClientSessionFactoryImpl [serverLocator=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=14f0d6e8-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ], discoveryGroupConfiguration=null], connectorConfig=TransportConfiguration(name=1554eef3-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) , backupConfig=null], metaData=(jms-client-id=-605023830,jms-session=,)]@5e310a07]

                 

                HornetQRASession->DelegatingSession [session=ClientSessionImpl [name=15551604-a8de-11e2-b5e0-2d03312c2ae2, username=null, closed=false, factory = ClientSessionFactoryImpl [serverLocator=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=14f0d6e8-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ], discoveryGroupConfiguration=null], connectorConfig=TransportConfiguration(name=154eae5d-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) , backupConfig=null], metaData=(jms-client-id=-1433120513,jms-session=,)]@493a377d]

                 

                HornetQRASession->DelegatingSession [session=ClientSessionImpl [name=1554eef2-a8de-11e2-b5e0-2d03312c2ae2, username=null, closed=false, factory = ClientSessionFactoryImpl [serverLocator=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=14f0d6e8-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ], discoveryGroupConfiguration=null], connectorConfig=TransportConfiguration(name=14f0d6e8-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) , backupConfig=null], metaData=(jms-client-id=-1776981760,jms-session=,)]@5ec881b5]

                • 5. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                  ataylor

                   I see you are using the Resource Adapter connections; how are you running these tests?

                  • 6. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                    grim_toaster

                     I've used both Maven 3.0.4 at the command line (pom.xml file attached in the original zip) and Eclipse, running it as a normal JUnit test.

                    • 7. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                      ataylor

                       The sysouts show these as HornetQRASessions, but that is a typo in our toString(). I will take a look early next week when I have some time.

                      • 8. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                        grim_toaster

                        Great, thank you.

                        • 9. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                          grim_toaster

                          Hi,

                           

                          Upgrading to version 2.3.0.Final fixes the issue. I couldn't find anything specific in Jira to identify the cause, but it all works now.

                           

                          Thanks for your time

                          Andy

                          • 10. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                            grim_toaster

                            Hi,

                             

                             Unfortunately, after further testing with the real application, the problem still persists.  I have updated the test project to include paging, with the page-size-bytes and max-size-bytes settings set to artificially low values so the issue can be recreated consistently; in the "real" application we had a lot more producers and consumers sending larger messages.

                             

                             If the configuration is changed to use an AddressFullMessagePolicy of BLOCK, everything works fine.
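                             For reference, this is roughly the kind of address-setting described above, as it would appear in hornetq-configuration.xml; the match and the numbers are illustrative only, not the values from the test project (switching address-full-policy from PAGE to BLOCK is the variant that works):

                             <address-settings>
                                <address-setting match="#">
                                   <max-size-bytes>10240</max-size-bytes>
                                   <page-size-bytes>1024</page-size-bytes>
                                   <address-full-policy>PAGE</address-full-policy>
                                </address-setting>
                             </address-settings>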

                             

                             I have tried this with 2.3.0.Final (and 2.3.1.Final, which doesn't seem to be officially released but is available via Maven).

                             

                            Stack trace snippet:

                             

                            May 17, 2013 10:03:24 AM org.hornetq.core.protocol.core.ServerSessionPacketHandler handlePacket

                            ERROR: HQ224016: Caught exception

                            HornetQException[errorType=ILLEGAL_STATE message=HQ119027: Could not find reference on consumer ID=0, messageId = 2,026 queue = bde0dd50-17fd-4aad-8cdf-eb8b4d4fa478]

                                at org.hornetq.core.server.impl.ServerConsumerImpl.acknowledge(ServerConsumerImpl.java:704)

                                at org.hornetq.core.server.impl.ServerSessionImpl.acknowledge(ServerSessionImpl.java:634)

                                at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:274)

                                at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:631)

                                at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:547)

                                at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:523)

                             

                             

                            Thanks

                            Andy

                            • 11. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                              grim_toaster

                              Hi,

                               

                               Has anyone had a chance to look at the project to help me identify the issue?  I'm presuming I've misconfigured something somewhere, as I'm sure other people run higher-throughput systems that would have hit this previously.

                               

                              Andy

                              • 12. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                                grim_toaster

                                Hi,

                                 

                                 Is there anything obviously wrong with my configuration, or should I raise a Jira?

                                 

                                Thanks

                                Andy

                                • 13. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                                  gaohoward

                                  Hi,

                                   

                                   I believe there is a bug. It seems to have something to do with HornetQ's direct-deliver logic. I'll investigate more, but I think there's no harm in raising a Jira first. Thanks!

                                   

                                  Howard

                                  • 14. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                                    gaohoward

                                     Hi,

                                     

                                    https://issues.jboss.org/browse/HORNETQ-1224

                                     

                                     The issue is that the Queue has two executors that can concurrently call the Queue's deliver method, resulting in messages being acknowledged out of order.
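                                     As a toy illustration (not HornetQ code, just a sketch of the race described): a deliver() method written on the assumption that only one executor runs it at a time, invoked here from two executors, so the handling of queued items no longer completes in queue order.

                                     import java.util.concurrent.ConcurrentLinkedQueue;
                                     import java.util.concurrent.ExecutorService;
                                     import java.util.concurrent.Executors;
                                     import java.util.concurrent.TimeUnit;

                                     public class ConcurrentDeliverSketch {
                                        static final ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<Integer>();

                                        // Written on the assumption that only one executor calls it at a time.
                                        static void deliver() {
                                           Integer ref = queue.poll();
                                           if (ref != null) {
                                              // With two executors interleaving here, the post-poll handling (the "delivery")
                                              // can complete out of queue order, breaking bookkeeping that assumes in-order acks.
                                              System.out.println(Thread.currentThread().getName() + " delivered " + ref);
                                           }
                                        }

                                        public static void main(String[] args) throws InterruptedException {
                                           for (int i = 0; i < 100; i++) {
                                              queue.add(i);
                                           }
                                           Runnable task = new Runnable() {
                                              public void run() {
                                                 deliver();
                                              }
                                           };
                                           ExecutorService executorA = Executors.newSingleThreadExecutor();
                                           ExecutorService executorB = Executors.newSingleThreadExecutor();
                                           for (int i = 0; i < 50; i++) {
                                              executorA.submit(task);
                                              executorB.submit(task);
                                           }
                                           executorA.shutdown();
                                           executorB.shutdown();
                                           executorA.awaitTermination(10, TimeUnit.SECONDS);
                                           executorB.awaitTermination(10, TimeUnit.SECONDS);
                                        }
                                     }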
