9 Replies Latest reply on May 17, 2010 10:28 AM by rnicholson10

    Duplicate Packets being received (100K+)

    rnicholson10

      Hi,

       

      I have a strange issue. Our app servers were up and running for 3 days. On the third day they were restarted, we then received 100K+ messages from the first and second days that had already been received previously.

       

      We have two app servers A and B both running JBoss 5.1.0.GA with HornetQ2.0.0.GA. The servers are connected by a core bridge in one direction, A->B.

       

      As an example there was a message sent at 2010-05-14@17:16:15,613 and received at 2010-05-14@17:16:15,615 as we would expect.

       

      On the third day 2010-05-16 both servers were shut down. Server B was started started up @16:01 and Server A started @16:02. This very same message was received on Server B after the restart 2010-05-16@16:02:30,994.

       

      I have taken a sample of the 100K messages received and all of them were received previously and then resent upon restart.

       

      Any idea why these messages were not correctly acknowledged in the first place? And why are they not detected as duplicates?

       

      As a guideline we send approx. 50,000 messages per hour.

       

      Cheers,

       

      R.

       

      P.S. This has happened before but after the servers were running for 1 month. We scheduled this restart after 3 days to attempt to figure out what was happening. It took approximately 2 hours for all of these resent packets to be processed and for the system to return to normal processing.

        • 1. Re: Duplicate Packets being received (100K+)
          timfox
          • 2. Re: Duplicate Packets being received (100K+)
            timfox

            Also confirmation-window-size

            • 3. Re: Duplicate Packets being received (100K+)
              timfox

              BTW, you should try TRUNK

              • 4. Re: Duplicate Packets being received (100K+)
                rnicholson10

                Unfortunately I cannot reporduce this error in test so using trunk would not help me (I guess it has a lot to do with the volume of traffic). I can't push trunk out onto the production environment!

                 

                The manual states that confirmation-window-size is defaulted to -1 which means no buffering, so that should mean that all packets get acknowledged immediately? Unless this is diferent when using core bridges.

                 

                Regarding duplicate detection the default is 2000. I would have needed to set this to 3.6 million (3 days approx @ 50K/hour) to stop the issue occuring this time. On the previous occasion this happened one of the messages was 27 days old which would mean a duplicate cache of 32.4 million would be required.

                • 5. Re: Duplicate Packets being received (100K+)
                  rnicholson10

                  This is interesting. We have several servers that send data. So think of it as several Server As from teh example above.

                   

                  I received copies of the logs from each server individually just now and one of the servers has a lot of these types of errors:

                   

                  2010-05-16 15:56:31,252 ERROR [org.hornetq.core.server.cluster.impl.BridgeImpl] (Old I/O client worker (channelId: 1679787891, /172.16.71.111:57589 => phase-engine-live/172.16.71.120:5445)) Failed to ack
                  java.util.NoSuchElementException
                          at java.util.LinkedList.remove(Unknown Source)
                          at java.util.LinkedList.removeFirst(Unknown Source)
                          at java.util.LinkedList.poll(Unknown Source)
                          at org.hornetq.core.server.cluster.impl.BridgeImpl.sendAcknowledged(BridgeImpl.java:371)
                          at org.hornetq.core.client.impl.ClientSessionImpl.commandConfirmed(ClientSessionImpl.java:1078)
                          at org.hornetq.core.remoting.impl.ChannelImpl.clearUpTo(ChannelImpl.java:509)
                          at org.hornetq.core.remoting.impl.ChannelImpl.handlePacket(ChannelImpl.java:435)
                          at org.hornetq.core.remoting.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:423)
                          at org.hornetq.core.remoting.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:389)
                          at org.hornetq.core.client.impl.FailoverManagerImpl$DelegatingBufferHandler.bufferReceived(FailoverManagerImpl.java:1095)
                          at org.hornetq.integration.transports.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:67)
                          at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:98)
                          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
                          at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:796)
                          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:358)
                          at org.hornetq.integration.transports.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:171)
                          at org.hornetq.integration.transports.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:136)
                          at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
                          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
                          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
                          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:345)
                          at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:332)
                          at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:99)
                          at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
                          at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
                          at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:179)
                          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                          at java.lang.Thread.run(Unknown Source)

                   

                  Gimme some time to see if all the offending packets came from this server in particular, it may be the root cuase of the problem.

                  • 6. Re: Duplicate Packets being received (100K+)
                    timfox

                    Looking at the code I can see there is a possibility for seeing exceptions like you're getting, because the refs member is not a concurrent structure but is accessed by different threads.

                     

                    I'll change it to a ConcurrentLinkedQueue in TRUNK

                    • 7. Re: Duplicate Packets being received (100K+)
                      rnicholson10

                      Is it possible that this is the source of the packets that are resent? I.e. Would packets not be acknowedged because of this?

                      • 8. Re: Duplicate Packets being received (100K+)
                        timfox

                        I can't be sure it's causing your issue, but if it did occur then it would result in the message not being acked, so it would be resent on the next restart.

                        • 9. Re: Duplicate Packets being received (100K+)
                          rnicholson10

                          Ok, that's good enough for me as I'm at a loss to discover what else could be causing this!

                           

                          The strange thing is that this issue does not always cause a problem when we restart. This happens sometimes and not others. I'll set the logger to capture these log messages so that we can restart pre-emptively if this happens again.

                           

                          Might I ask when we will see the 2.1.0 release of HornetQ?

                           

                          Since CR3 we have been using HornetQ in our production environment and this is the first serious problem we've come across.

                           

                          I also had a look at Server Bs logs and I do get quite a few of these errors although I have no way of tying them to the logs from Server A.

                           

                          2010-05-14 03:49:25,703 ERROR [org.hornetq.core.server.impl.ServerSessionImpl] (Thread-1171 (group:HornetQ-server-threads735615486-426117423)) Failed to acknowledge
                          java.lang.IllegalStateException: 899471952 Could not find reference on consumerID=0, messageId = 217174730 queue = jms.queue.phaseQueueFromInput closed = false
                                  at org.hornetq.core.server.impl.ServerConsumerImpl.acknowledge(ServerConsumerImpl.java:524)
                                  at org.hornetq.core.server.impl.ServerSessionImpl.handleAcknowledge(ServerSessionImpl.java:648)
                                  at org.hornetq.core.server.impl.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:157)
                                  at org.hornetq.core.remoting.impl.ChannelImpl.handlePacket(ChannelImpl.java:466)
                                  at org.hornetq.core.remoting.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:423)
                                  at org.hornetq.core.remoting.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:389)
                                  at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:432)
                                  at org.hornetq.core.remoting.impl.invm.InVMConnection$1.run(InVMConnection.java:131)
                                  at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:96)
                                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                                  at java.lang.Thread.run(Unknown Source)