-
1. Re: Duplicate Packets being received (100K+)
timfox May 17, 2010 8:09 AM (in response to rnicholson10)Probably you didn't configure your cache to be large enough
-
2. Re: Duplicate Packets being received (100K+)
timfox May 17, 2010 8:11 AM (in response to timfox)Also confirmation-window-size
-
3. Re: Duplicate Packets being received (100K+)
timfox May 17, 2010 8:11 AM (in response to timfox)BTW, you should try TRUNK
-
4. Re: Duplicate Packets being received (100K+)
rnicholson10 May 17, 2010 8:39 AM (in response to timfox)Unfortunately I cannot reporduce this error in test so using trunk would not help me (I guess it has a lot to do with the volume of traffic). I can't push trunk out onto the production environment!
The manual states that confirmation-window-size is defaulted to -1 which means no buffering, so that should mean that all packets get acknowledged immediately? Unless this is diferent when using core bridges.
Regarding duplicate detection the default is 2000. I would have needed to set this to 3.6 million (3 days approx @ 50K/hour) to stop the issue occuring this time. On the previous occasion this happened one of the messages was 27 days old which would mean a duplicate cache of 32.4 million would be required.
-
5. Re: Duplicate Packets being received (100K+)
rnicholson10 May 17, 2010 8:42 AM (in response to rnicholson10)This is interesting. We have several servers that send data. So think of it as several Server As from teh example above.
I received copies of the logs from each server individually just now and one of the servers has a lot of these types of errors:
2010-05-16 15:56:31,252 ERROR [org.hornetq.core.server.cluster.impl.BridgeImpl] (Old I/O client worker (channelId: 1679787891, /172.16.71.111:57589 => phase-engine-live/172.16.71.120:5445)) Failed to ack
java.util.NoSuchElementException
at java.util.LinkedList.remove(Unknown Source)
at java.util.LinkedList.removeFirst(Unknown Source)
at java.util.LinkedList.poll(Unknown Source)
at org.hornetq.core.server.cluster.impl.BridgeImpl.sendAcknowledged(BridgeImpl.java:371)
at org.hornetq.core.client.impl.ClientSessionImpl.commandConfirmed(ClientSessionImpl.java:1078)
at org.hornetq.core.remoting.impl.ChannelImpl.clearUpTo(ChannelImpl.java:509)
at org.hornetq.core.remoting.impl.ChannelImpl.handlePacket(ChannelImpl.java:435)
at org.hornetq.core.remoting.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:423)
at org.hornetq.core.remoting.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:389)
at org.hornetq.core.client.impl.FailoverManagerImpl$DelegatingBufferHandler.bufferReceived(FailoverManagerImpl.java:1095)
at org.hornetq.integration.transports.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:67)
at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:98)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:796)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:358)
at org.hornetq.integration.transports.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:171)
at org.hornetq.integration.transports.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:136)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:345)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:332)
at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:99)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:179)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)Gimme some time to see if all the offending packets came from this server in particular, it may be the root cuase of the problem.
-
6. Re: Duplicate Packets being received (100K+)
timfox May 17, 2010 9:55 AM (in response to rnicholson10)Looking at the code I can see there is a possibility for seeing exceptions like you're getting, because the refs member is not a concurrent structure but is accessed by different threads.
I'll change it to a ConcurrentLinkedQueue in TRUNK
-
7. Re: Duplicate Packets being received (100K+)
rnicholson10 May 17, 2010 9:59 AM (in response to timfox)Is it possible that this is the source of the packets that are resent? I.e. Would packets not be acknowedged because of this?
-
8. Re: Duplicate Packets being received (100K+)
timfox May 17, 2010 10:01 AM (in response to rnicholson10)I can't be sure it's causing your issue, but if it did occur then it would result in the message not being acked, so it would be resent on the next restart.
-
9. Re: Duplicate Packets being received (100K+)
rnicholson10 May 17, 2010 10:28 AM (in response to timfox)Ok, that's good enough for me as I'm at a loss to discover what else could be causing this!
The strange thing is that this issue does not always cause a problem when we restart. This happens sometimes and not others. I'll set the logger to capture these log messages so that we can restart pre-emptively if this happens again.
Might I ask when we will see the 2.1.0 release of HornetQ?
Since CR3 we have been using HornetQ in our production environment and this is the first serious problem we've come across.
I also had a look at Server Bs logs and I do get quite a few of these errors although I have no way of tying them to the logs from Server A.
2010-05-14 03:49:25,703 ERROR [org.hornetq.core.server.impl.ServerSessionImpl] (Thread-1171 (group:HornetQ-server-threads735615486-426117423)) Failed to acknowledge
java.lang.IllegalStateException: 899471952 Could not find reference on consumerID=0, messageId = 217174730 queue = jms.queue.phaseQueueFromInput closed = false
at org.hornetq.core.server.impl.ServerConsumerImpl.acknowledge(ServerConsumerImpl.java:524)
at org.hornetq.core.server.impl.ServerSessionImpl.handleAcknowledge(ServerSessionImpl.java:648)
at org.hornetq.core.server.impl.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:157)
at org.hornetq.core.remoting.impl.ChannelImpl.handlePacket(ChannelImpl.java:466)
at org.hornetq.core.remoting.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:423)
at org.hornetq.core.remoting.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:389)
at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:432)
at org.hornetq.core.remoting.impl.invm.InVMConnection$1.run(InVMConnection.java:131)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:96)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)