
    Duplicate/retried messages when acknowledge fails due to race condition?

    grim_toaster

      Hi,

       

      I'm getting duplicate messages delivered to a consumer, apparently because the acknowledgement fails with the server-side error below.  This also leads to excessively large log files and a lot of messages being sent to the DLQ, but the duplicate messages reaching queue consumers are probably the scariest bit.

       

      Apr 19, 2013 9:52:03 AM org.hornetq.core.protocol.core.ServerSessionPacketHandler handlePacket

      ERROR: HQ224016: Caught exception

      HornetQException[errorType=ILLEGAL_STATE message=HQ119027: Could not find reference on consumer ID=0, messageId = 156 queue = b6f5c32a-3058-4c57-bfd9-3e4bee9e2d7d]

          at org.hornetq.core.server.impl.ServerConsumerImpl.acknowledge(ServerConsumerImpl.java:704)

          at org.hornetq.core.server.impl.ServerSessionImpl.acknowledge(ServerSessionImpl.java:634)

          at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:274)

          at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:631)

          at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:547)

          at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:523)

          at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:565)

          at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:72)

          at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)

          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)

          at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281)

          at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169)

          at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134)

          at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)

          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)

          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)

          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)

          at org.jboss.netty.channel.socket.oio.OioWorker.process(OioWorker.java:71)

          at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:73)

          at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:51)

          at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)

          at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)

          at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:175)

          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

          at java.lang.Thread.run(Thread.java:722)

       

       

      If I have a single producer with a single consumer, then there are no issues: every message sent is consumed exactly once.  However, if I use multiple producers sending messages in parallel with the single consumer, then I get the above stack trace, and although only 2000 messages are sent, over 10000 messages are typically consumed because of redeliveries.

       

      I have this issue with a deployed application that was recently upgraded from 2.2.14 (due to HORNETQ-1042), but I have managed to recreate the issue in the attached test case; it fails on at least one of the tests almost every run (increasing the values of MESSAGES_TO_SEND_PER_PRODUCER or MULTIPLE_PRODUCERS_COUNT increases the likelihood of failure).
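      For reference, here is a minimal sketch of the shape of that test.  It is not the attached test case: apart from the two constant names mentioned above, the connection factory, queue name, message counts and acknowledgement mode are all hypothetical.

      import javax.jms.Connection;
      import javax.jms.ConnectionFactory;
      import javax.jms.MessageConsumer;
      import javax.jms.MessageProducer;
      import javax.jms.Session;

      public class DuplicateDeliverySketch {

          static final int MESSAGES_TO_SEND_PER_PRODUCER = 1000;   // hypothetical value
          static final int MULTIPLE_PRODUCERS_COUNT = 2;           // hypothetical value
          static final String QUEUE_NAME = "testQueue";            // hypothetical
          static ConnectionFactory connectionFactory;              // hypothetical: however the test obtains its factory

          public static void main(String[] args) throws Exception {
              // Start the producers in parallel; each producer thread creates and uses its own
              // connection and session, so no JMS object is shared between threads.
              Thread[] producers = new Thread[MULTIPLE_PRODUCERS_COUNT];
              for (int p = 0; p < producers.length; p++) {
                  producers[p] = new Thread(new Runnable() {
                      public void run() {
                          produce();
                      }
                  });
                  producers[p].start();
              }

              // A single consumer drains the queue while the producers are still sending.
              int received = consumeAll();
              for (Thread t : producers) {
                  t.join();
              }
              int expected = MESSAGES_TO_SEND_PER_PRODUCER * MULTIPLE_PRODUCERS_COUNT;
              // Anything over 'expected' is a redelivered (duplicate) message.
              System.out.println("expected " + expected + ", received " + received);
          }

          static void produce() {
              Connection connection = null;
              try {
                  connection = connectionFactory.createConnection();
                  // AUTO_ACKNOWLEDGE is an assumption; the attached test may use a different mode.
                  Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  MessageProducer producer = session.createProducer(session.createQueue(QUEUE_NAME));
                  for (int i = 0; i < MESSAGES_TO_SEND_PER_PRODUCER; i++) {
                      producer.send(session.createTextMessage("message-" + i));
                  }
              } catch (Exception e) {
                  throw new RuntimeException(e);
              } finally {
                  try { if (connection != null) connection.close(); } catch (Exception ignored) {}
              }
          }

          static int consumeAll() throws Exception {
              Connection connection = connectionFactory.createConnection();
              try {
                  connection.start();
                  Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  MessageConsumer consumer = session.createConsumer(session.createQueue(QUEUE_NAME));
                  int received = 0;
                  while (consumer.receive(2000) != null) {
                      received++;
                  }
                  return received;
              } finally {
                  connection.close();
              }
          }
      }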

       

      I'm presuming some kind of race condition in HornetQ, but I would appreciate hearing whether anyone knows of this issue (I couldn't find anything relevant in Jira) or whether it's simply a configuration problem.

       

      Many Thanks

      Andy

        • 1. Re: Duplicate/retried messages when acknowledge fails due to race condition?
          ataylor

          You shouldn't be sharing JMS sessions across multiple threads; JMS sessions are not thread safe.

          • 2. Re: Duplicate/retried messages when acknowledge fails due to race condition?
            grim_toaster

            I'm not; each thread is using its own connection and session.

            • 3. Re: Duplicate/retried messages when acknowledge fails due to race condition?
              ataylor

               Hmmm, strange. That message is indicative of multiple threads acking the same session; basically a message is being acked that has probably already been acked from elsewhere. Can you double-check your code, and if you still think it's an issue I will take a look next week.
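               To make that concrete, here is a minimal sketch (hypothetical names and values throughout) of the kind of sharing being described: two threads receiving and acknowledging on the same session and consumer.

               import javax.jms.Connection;
               import javax.jms.ConnectionFactory;
               import javax.jms.Message;
               import javax.jms.MessageConsumer;
               import javax.jms.Session;

               public class SharedSessionAntiPattern {
                  static ConnectionFactory connectionFactory;   // hypothetical: obtained from JNDI or the server

                  public static void main(String[] args) throws Exception {
                     Connection connection = connectionFactory.createConnection();
                     connection.start();
                     // One session and one consumer shared by both threads: not allowed by the JMS spec.
                     final Session sharedSession = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
                     final MessageConsumer sharedConsumer = sharedSession.createConsumer(sharedSession.createQueue("testQueue"));
                     for (int i = 0; i < 2; i++) {
                        new Thread(new Runnable() {
                           public void run() {
                              try {
                                 Message m = sharedConsumer.receive(1000);
                                 if (m != null) {
                                    m.acknowledge();   // acks from two threads race on the same server-side consumer
                                 }
                              } catch (Exception e) {
                                 e.printStackTrace();
                              }
                           }
                        }).start();
                     }
                  }
               }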

              • 4. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                grim_toaster

                Thanks for the response,

                 

                 This is actually an issue we're seeing in the real application, where all of the consumers/producers are on separate boxes.  This sample code took a while to piece together to recreate the issue, but each of the 3 threads the test case uses definitely has its own connection and session object.  I did have another test program that ran the server, consumers and producers in separate VMs, but that involved all the external XML configurations and bootstrappers and required too many manual steps to be able to post here!

                 

                Although probably not conclusive, a simple sysout on the created sessions reveals:

                 

                HornetQRASession->DelegatingSession [session=ClientSessionImpl [name=1555b248-a8de-11e2-b5e0-2d03312c2ae2, username=null, closed=false, factory = ClientSessionFactoryImpl [serverLocator=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=14f0d6e8-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ], discoveryGroupConfiguration=null], connectorConfig=TransportConfiguration(name=1554eef3-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) , backupConfig=null], metaData=(jms-client-id=-605023830,jms-session=,)]@5e310a07]

                 

                HornetQRASession->DelegatingSession [session=ClientSessionImpl [name=15551604-a8de-11e2-b5e0-2d03312c2ae2, username=null, closed=false, factory = ClientSessionFactoryImpl [serverLocator=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=14f0d6e8-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ], discoveryGroupConfiguration=null], connectorConfig=TransportConfiguration(name=154eae5d-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) , backupConfig=null], metaData=(jms-client-id=-1433120513,jms-session=,)]@493a377d]

                 

                HornetQRASession->DelegatingSession [session=ClientSessionImpl [name=1554eef2-a8de-11e2-b5e0-2d03312c2ae2, username=null, closed=false, factory = ClientSessionFactoryImpl [serverLocator=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=14f0d6e8-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ], discoveryGroupConfiguration=null], connectorConfig=TransportConfiguration(name=14f0d6e8-a8de-11e2-b5e0-2d03312c2ae2, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) , backupConfig=null], metaData=(jms-client-id=-1776981760,jms-session=,)]@5ec881b5]

                • 5. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                  ataylor

                   I see you are using the Resource Adapter connections; how are you running these tests?

                  • 6. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                    grim_toaster

                     I've used both Maven 3.0.4 at the command line (pom.xml file attached in the original zip) and Eclipse, running it as a normal JUnit test.

                    • 7. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                      ataylor

                       The sysouts show these as HornetQRASessions, but that is a typo in our toString(). I will take a look early next week when I have some time.

                      • 8. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                        grim_toaster

                        Great, thank you.

                        • 9. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                          grim_toaster

                          Hi,

                           

                          Upgrading to version 2.3.0.Final fixes the issue. I couldn't find anything specific in Jira to identify the cause, but it all works now.

                           

                          Thanks for your time

                          Andy

                          • 10. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                            grim_toaster

                            Hi,

                             

                             Unfortunately, after further testing with the real application, the problem still persists.  I have updated the test project to include paging, with the page-size-bytes and max-size-bytes settings set to artificially low values so the issue can be recreated consistently; in the "real" application we had a lot more producers and consumers sending larger messages.

                             

                             If the configuration is changed to use an AddressFullMessagePolicy of BLOCK, everything works fine.
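                             For reference, this is roughly the kind of address-setting described above, as it would appear in hornetq-configuration.xml; the match and the numbers are illustrative only, not the values from the test project (switching address-full-policy from PAGE to BLOCK is the variant that works):

                             <address-settings>
                                <address-setting match="#">
                                   <max-size-bytes>10240</max-size-bytes>
                                   <page-size-bytes>1024</page-size-bytes>
                                   <address-full-policy>PAGE</address-full-policy>
                                </address-setting>
                             </address-settings>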

                             

                             I have tried this with 2.3.0.Final (and 2.3.1.Final, which doesn't seem to be officially released but is available via Maven).

                             

                            Stack trace snippet:

                             

                            May 17, 2013 10:03:24 AM org.hornetq.core.protocol.core.ServerSessionPacketHandler handlePacket

                            ERROR: HQ224016: Caught exception

                            HornetQException[errorType=ILLEGAL_STATE message=HQ119027: Could not find reference on consumer ID=0, messageId = 2,026 queue = bde0dd50-17fd-4aad-8cdf-eb8b4d4fa478]

                                at org.hornetq.core.server.impl.ServerConsumerImpl.acknowledge(ServerConsumerImpl.java:704)

                                at org.hornetq.core.server.impl.ServerSessionImpl.acknowledge(ServerSessionImpl.java:634)

                                at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:274)

                                at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:631)

                                at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:547)

                                at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:523)

                             

                             

                            Thanks

                            Andy

                            • 11. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                              grim_toaster

                              Hi,

                               

                               Has anyone had a chance to look at the project to help me identify the issue?  I'm presuming I've misconfigured something somewhere, as I'm sure other people run higher-throughput systems that would have hit this previously.

                               

                              Andy

                              • 12. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                                grim_toaster

                                Hi,

                                 

                                 Is there anything obviously wrong with my configuration, or should I raise a Jira?

                                 

                                Thanks

                                Andy

                                • 13. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                                  gaohoward

                                  Hi,

                                   

                                   I believe there is a bug. It seems to have something to do with HornetQ's direct-deliver logic. I'll investigate more, but I think there's no harm in raising a Jira first. Thanks!

                                   

                                  Howard

                                  • 14. Re: Duplicate/retried messages when acknowledge fails due to race condition?
                                    gaohoward

                                     Hi,

                                     

                                    https://issues.jboss.org/browse/HORNETQ-1224

                                     

                                     The issue is that the Queue has two executors that can concurrently call the Queue's deliver method, resulting in messages being acknowledged out of order.
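                                     As a toy illustration (not HornetQ code, just a sketch of the race described): a deliver() method written on the assumption that only one executor runs it at a time, invoked here from two executors, so the handling of queued items no longer completes in queue order.

                                     import java.util.concurrent.ConcurrentLinkedQueue;
                                     import java.util.concurrent.ExecutorService;
                                     import java.util.concurrent.Executors;
                                     import java.util.concurrent.TimeUnit;

                                     public class ConcurrentDeliverSketch {
                                        static final ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<Integer>();

                                        // Written on the assumption that only one executor calls it at a time.
                                        static void deliver() {
                                           Integer ref = queue.poll();
                                           if (ref != null) {
                                              // With two executors interleaving here, the post-poll handling (the "delivery")
                                              // can complete out of queue order, breaking bookkeeping that assumes in-order acks.
                                              System.out.println(Thread.currentThread().getName() + " delivered " + ref);
                                           }
                                        }

                                        public static void main(String[] args) throws InterruptedException {
                                           for (int i = 0; i < 100; i++) {
                                              queue.add(i);
                                           }
                                           Runnable task = new Runnable() {
                                              public void run() {
                                                 deliver();
                                              }
                                           };
                                           ExecutorService executorA = Executors.newSingleThreadExecutor();
                                           ExecutorService executorB = Executors.newSingleThreadExecutor();
                                           for (int i = 0; i < 50; i++) {
                                              executorA.submit(task);
                                              executorB.submit(task);
                                           }
                                           executorA.shutdown();
                                           executorB.shutdown();
                                           executorA.awaitTermination(10, TimeUnit.SECONDS);
                                           executorB.awaitTermination(10, TimeUnit.SECONDS);
                                        }
                                     }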
