HornetQ concurrency issue with fully occupied thread pool
swany May 27, 2014 10:53 AM
Encountered the following exception at a high-volume customer site the other day:
2014-05-21 06:25:26,802 WARN (Old I/O server worker (parentId: 649447027, [id: 0x26b5c673, /0.0.0.0:5445])) [org.hornetq.core.server.impl.QueueImpl] Couldn't finish waiting executors. Try increasing the thread pool size
java.lang.Exception: trace
at org.hornetq.core.server.impl.QueueImpl.blockOnExecutorFuture(QueueImpl.java:471)
at org.hornetq.core.server.impl.QueueImpl.getMessageCount(QueueImpl.java:729)
at org.hornetq.core.server.impl.ServerSessionImpl.executeQueueQuery(ServerSessionImpl.java:514)
at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:221)
at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:474)
at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:496)
at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:457)
at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:459)
at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:73)
at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:100)
at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)
at org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendUpstream(StaticChannelPipeline.java:514)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:287)
at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169)
at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)
at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:357)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:90)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:181)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
The CPUs are not fully utilized, and we have long-running work (tens of seconds, and up to minutes) on the MDB handler thread pool. We attempted to address the issue by setting:
<thread-pool-max-size>60</thread-pool-max-size>
in the hornetq-configuration.xml file (this is a 24-CPU machine), and ensuring that our MDB pool max size in standardjboss.xml is at least as large. However, this did not seem to solve the under-utilization problem; we have further research to do there (I'm not sure our application is able to fully utilize all available CPUs).
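For context, this is roughly where that setting lives; a minimal sketch of hornetq-configuration.xml with everything else omitted (the namespace/schema header follows the standalone HornetQ examples and may differ in an embedded/JBoss setup, and the scheduled-pool element is shown only to note that it is sized separately):

<configuration xmlns="urn:hornetq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="urn:hornetq /schema/hornetq-configuration.xsd">

   <!-- Global thread pool; this is the pool the warning above says to increase -->
   <thread-pool-max-size>60</thread-pool-max-size>

   <!-- The scheduled thread pool is sized separately -->
   <scheduled-thread-pool-max-size>5</scheduled-thread-pool-max-size>

</configuration>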
But my most important question is this:
We are using HornetQ 2.2.5 and in QueueImpl.java we see:
public long getMessageCount()
{
   blockOnExecutorFuture();
   return getInstantMessageCount();
}
Why must getMessageCount() run an empty Future through the (fully occupied) thread pool just to get what is presumably an "estimate" of the count anyway? That empty Future blocks because other long-running activities are explicitly using the thread pool, and while it blocks on the message count the session handler cannot process packets for a lengthy period. What is that blocking actually buying us?
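To make the question concrete, here is a self-contained illustration of the pattern using plain java.util.concurrent (this is not the HornetQ source; the class name, pool size, and timeout are made up for the example): a no-op task is handed to a pool whose workers are all busy with long-running jobs, and a bounded wait on it times out even though the task itself does nothing.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustration only: an "empty" task submitted to a saturated executor cannot
// run until every task ahead of it has finished, so a bounded wait on it
// times out even though the task does nothing.
public class EmptyFutureBarrier
{
   public static void main(String[] args) throws Exception
   {
      ExecutorService pool = Executors.newFixedThreadPool(2);

      // Occupy every worker with long-running jobs (stand-ins for our MDB work).
      for (int i = 0; i < 2; i++)
      {
         pool.execute(new Runnable()
         {
            public void run()
            {
               try
               {
                  Thread.sleep(30000);
               }
               catch (InterruptedException ignored)
               {
               }
            }
         });
      }

      // The "empty Future": a no-op task used purely to detect that everything
      // previously handed to the pool has completed.
      Future<?> marker = pool.submit(new Runnable()
      {
         public void run()
         {
         }
      });

      try
      {
         // Analogous to the bounded wait behind blockOnExecutorFuture(): it
         // times out because the pool never frees a thread to run the marker.
         marker.get(10, TimeUnit.SECONDS);
      }
      catch (TimeoutException e)
      {
         System.out.println("Couldn't finish waiting executors ...");
      }

      pool.shutdownNow();
   }
}

That is the situation the warning above describes: the marker never gets a thread because long-running work is occupying all of them, and the caller (in production, the Netty I/O worker in the stack trace) is stalled for the entire wait.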
Is there a way around this via configuration or some other mechanism?
Thanks.
Mark