9 Replies Latest reply on May 29, 2014 9:59 PM by clebert.suconic

    HornetQ concurrency issue with fully occupied thread pool

    swany

      Encountered the following exception at a high-volume customer site the other day:

       

      2014-05-21 06:25:26,802 WARN  (Old I/O server worker (parentId: 649447027, [id: 0x26b5c673, /0.0.0.0:5445])) [org.hornetq.core.server.impl.QueueImpl] Couldn't finish waiting executors. Try increasing the thread pool size

      java.lang.Exception: trace

                  at org.hornetq.core.server.impl.QueueImpl.blockOnExecutorFuture(QueueImpl.java:471)

                  at org.hornetq.core.server.impl.QueueImpl.getMessageCount(QueueImpl.java:729)

                  at org.hornetq.core.server.impl.ServerSessionImpl.executeQueueQuery(ServerSessionImpl.java:514)

                  at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:221)

                  at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:474)

                  at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:496)

                  at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:457)

                  at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:459)

                  at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:73)

                  at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:100)

                  at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)

                  at org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendUpstream(StaticChannelPipeline.java:514)

                  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:287)

                  at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169)

                  at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134)

                  at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)

                  at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)

                  at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:357)

                  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)

                  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)

                  at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:90)

                  at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)

                  at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)

                  at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:181)

                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

                  at java.lang.Thread.run(Thread.java:722)

       

      CPUs are not fully utilized, and there are long-running tasks (tens of seconds, and up to minutes) on the MDB handler thread pool. We attempted to address the issue by setting:

       

      <thread-pool-max-size>60</thread-pool-max-size>

       

      in the hornetq-configuration.xml file (on a 24-CPU machine), and ensuring our MDB pool max size is also at least this large in standardjboss.xml. However, this did not seem to solve our problem of under-utilized CPUs. We have further research to do (I'm not sure our application is fully utilizing all available CPUs).

       

      But my most important question is this:

       

      We are using HornetQ 2.2.5 and in QueueImpl.java we see:

       

         public long getMessageCount()

         {

            blockOnExecutorFuture();

       

            return getInstantMessageCount();

         }

       

      Why must the getMessageCount() call run an empty Future through the (fully occupied) thread pool just to get what is probably an "estimate" count anyway? That empty Future ends up blocked behind other long-running activities that explicitly use the thread pool, and the session handler then cannot handle packets for a lengthy period because it is waiting on the message count (for what?).

       

      Is there a way around this via configuration or other mechanism?

       

      Thanks.

       

      Mark

        • 1. Re: HornetQ concurrency issue with fully occupied thread pool
          jbertram

          My understanding is that the future is submitted to the executor so that the message count will be as accurate as possible (i.e. all current operations will have completed).  In any case, the message count will be returned after the future times out.
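          Conceptually, what blockOnExecutorFuture() does is the common "flush the executor" idiom: push a no-op task through the executor and wait for it, so that everything queued ahead of it has run. A rough, self-contained sketch of that idiom (not the actual QueueImpl code; names are illustrative):

             import java.util.concurrent.CountDownLatch;
             import java.util.concurrent.Executor;
             import java.util.concurrent.TimeUnit;

             final class ExecutorFlush
             {
                // Submits an empty task and waits for it to run. With a serial (ordered)
                // executor, which is what a queue appears to use, the task runs only after
                // everything queued before it. Returns false on timeout, which is the
                // situation behind the "Couldn't finish waiting executors" warning.
                static boolean flush(Executor executor, long timeout, TimeUnit unit)
                {
                   final CountDownLatch done = new CountDownLatch(1);
                   executor.execute(new Runnable()
                   {
                      public void run()
                      {
                         done.countDown();
                      }
                   });
                   try
                   {
                      return done.await(timeout, unit);
                   }
                   catch (InterruptedException e)
                   {
                      Thread.currentThread().interrupt();
                      return false;
                   }
                }
             }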

           

          Later versions have a getMessageCount(long timeout) on org.hornetq.core.server.impl.QueueImpl, but it's not exposed on the management API.

           

          What specifically is calling getMessageCount() and why?  How frequently is getMessageCount() being invoked?

           

          Have you considered moving to HornetQ 2.4.0.Final?  Queue locking has been improved, which might mitigate some of these problems.

           

          Lastly, if you're using MDBs then take a look at the "maxSession" activation configuration property.  Increasing that will allow more MDB sessions to run concurrently which should use more CPU.
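          For example (illustrative only: the destination name and MDB class are made up, and if you're on EJB 2.x the same property can be set through the activation config in the deployment descriptor rather than an annotation):

             import javax.ejb.ActivationConfigProperty;
             import javax.ejb.MessageDriven;
             import javax.jms.Message;
             import javax.jms.MessageListener;

             @MessageDriven(activationConfig = {
                @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
                @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/ExampleQueue"),
                // default is 15; raising it lets more sessions consume concurrently
                @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "30")
             })
             public class ExampleMDB implements MessageListener
             {
                public void onMessage(Message message)
                {
                   // process the message
                }
             }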

          • 2. Re: HornetQ concurrency issue with fully occupied thread pool
            clebert.suconic

            There were improvements to queue delivery in later versions... 2.2.5 would block while the client was receiving a message, while newer versions deliver without blocking.

             

             

            You can try the latest 2.2.x (you would need EAP, or a manual checkout from GitHub and a build of it).. or the best option would be moving to a newer version (2.4.0, as Justin mentioned)... for this case even 2.3 (from AS7) already has the fix.

            • 3. Re: HornetQ concurrency issue with fully occupied thread pool
              swany

              The stack trace we're seeing shows it's the HornetQ ServerSessionPacketHandler (attempting to get the queue name?) passing back all sorts of queue info (according to the source comments, to be able to recreate the queue on failover ...). The trouble is that one of those pieces of info is the current queue count. So we have a mandatory 10-second wait just to get the queue name when a packet passes through the HornetQ packet handler. I checked releases 2.2.14, 2.3.0, and 2.4.0. The operation is identical, with the single change that the packet handler for 2.3.0+ passes in a 1-second timeout instead of the 10-second timeout. So I suppose moving up to 2.3.0+ would improve our issue somewhat, but not completely alleviate it.

               

              Worse, we're stuck on JBoss 4.0.3 for the time being. Even having 'hacked' in HornetQ 2.2.5 was somewhat non-trivial. We've had poor luck thus far attempting to go much newer than that. We may renew our efforts in that regard, however, given Clebert's comments below.

               

              We did in fact increase the maxSession property to be 60, up from the default of 15 (and equal to our setting of the HornetQ thread pool max size).

               

              Thanks for the information. The real issue is in our application. I was just concerned that we were hanging and timing out receiving a message packet.

              • 4. Re: HornetQ concurrency issue with fully occupied thread pool
                jbertram

                The stack trace we're seeing shows it's the HornetQ ServerSessionPacketHandler (attempting to get the queue name?) passing back all sorts of queue info (according to the source comments, to be able to recreate the queue on failover ...).

                OK, so it looks like this is being done in response to a packet from a client to create a consumer.

                 

                Worse, we're stuck on JBoss 4.0.3 for the time being. Even having 'hacked' in HornetQ 2.2.5 was somewhat non-trivial. We've had poor luck thus far attempting to go much newer than that. We may renew our efforts in that regard, however, given Clebert's comments below.

                JBoss AS 4.0.3 is ancient.  Could you ease the difficulty of an upgrade by hosting HornetQ as a standalone server rather than integrated with JBoss AS?

                 

                We did in fact increase the maxSession property to be 60, up from the default of 15 (and equal to our setting of the HornetQ thread pool max size).

                I'm not entirely sure which thread pool services MDB sessions, but assuming it's the same thread pool that is controlled by thread-pool-max-size then setting the maxSession activation property on the MDB and the thread-pool-max-size both to 60 would almost certainly cause problems during heavy loads.  I think the thread-pool-max-size would need to be higher than the combined maxSession values of every MDB in the container to give HornetQ spare threads for other tasks aside from delivering messages to MDBs.
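                For instance (the 80 here is a hypothetical number, assuming it is indeed the same pool): with a combined maxSession of 60 across the MDBs, something like this in hornetq-configuration.xml would leave spare threads for the server's other work:

                   <!-- hypothetical sizing: 60 MDB sessions plus headroom for other server tasks -->
                   <thread-pool-max-size>80</thread-pool-max-size>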

                • 5. Re: HornetQ concurrency issue with fully occupied thread pool
                  clebert.suconic

                  It sounds like you are creating a consumer for every message you're receiving? That sort of operation (the queue query) is only required when you create a consumer.. but if you are creating a consumer for each message you receive.. don't use a messaging system at all.. just poll a database.

                   

                  If that's the anti-pattern you're using, I suggest you reuse the consumers and keep them alive.
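                  For example, a minimal plain-JMS sketch of that (illustrative names; the point is that the connection, session, and consumer are created once and then reused for every receive):

                     import javax.jms.Connection;
                     import javax.jms.ConnectionFactory;
                     import javax.jms.Message;
                     import javax.jms.MessageConsumer;
                     import javax.jms.Queue;
                     import javax.jms.Session;

                     public class ReusingConsumer
                     {
                        private final Connection connection;
                        private final Session session;
                        private final MessageConsumer consumer;

                        public ReusingConsumer(ConnectionFactory factory, Queue queue) throws Exception
                        {
                           connection = factory.createConnection();
                           session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                           consumer = session.createConsumer(queue); // created once, kept alive
                           connection.start();
                        }

                        public Message receive(long timeoutMillis) throws Exception
                        {
                           return consumer.receive(timeoutMillis);   // reuses the existing consumer
                        }

                        public void close() throws Exception
                        {
                           connection.close(); // also closes the session and consumer
                        }
                     }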

                   

                  If you can't fix it for any reason (for example, some people will say they are using Spring's JmsTemplate).. then don't use any messaging system at all; just poll a database and you'll get better performance. (Moving to ActiveMQ, or any other JMS system, would only transfer the issue to that other system.)

                   

                   

                  I mean.. if that's what you're seeing... if not, I would need some more information to figure out what else you could be facing.

                   

                   

                   

                  As for hacking 2.2.x into AS 4.. just rebuild 2.2.eap from GitHub.

                   

                  It should be possible for someone to write AS4 integration for 2.3.x. It's not that hard, actually; it's just a matter of instantiating the beans.. We are not doing it because we don't have the resources.. but anyone with basic Java and AS4 knowledge could do it with one or two days' worth of work.
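                  As a very rough sketch of the kind of bootstrap involved (based on the embedded-server API in 2.3.x/2.4.x; treat the class and method names as something to verify against the version you actually build, and wiring this into AS4's lifecycle and JNDI is the real work):

                     import org.hornetq.jms.server.embedded.EmbeddedJMS;

                     // Hedged sketch: boots a broker from hornetq-configuration.xml and
                     // hornetq-jms.xml found on the classpath. AS4 integration would mean
                     // driving this from an AS4 service and exposing the JMS objects in JNDI.
                     public class HornetQBootstrap
                     {
                        private final EmbeddedJMS jmsServer = new EmbeddedJMS();

                        public void start() throws Exception
                        {
                           jmsServer.setConfigResourcePath("hornetq-configuration.xml");
                           jmsServer.setJmsConfigResourcePath("hornetq-jms.xml");
                           jmsServer.start();
                        }

                        public void stop() throws Exception
                        {
                           jmsServer.stop();
                        }
                     }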

                  • 6. Re: HornetQ concurrency issue with fully occupied thread pool
                    swany

                    Thanks for the overall architecture overview, Clebert. In a future product revision, we will be heading towards a stateless, transactionless, pure standalone HornetQ server version of our product designed around HornetQ and tuned for pure throughput of processing items. However, for the near future (a year or so) we'll be stuck supporting HornetQ 2.2.5 (or greater, if I can manage to hack in a newer version) embedded in JBoss 4.0.3. I can't recall the issues we had trying to upgrade to 2.4.0, but it's nice to know not to keep trying without making code changes! I don't know if we have the JBoss source code experience to attack this project, but we'll see how desperate we get. Any tips on the HornetQ 2.4 class(es) I might look at to start this effort (perhaps I should hunt around the dev forum)?

                     

                    For now, we've made the max thread pool size greater than the total MDB pool size (as Justin suggested), now that we know the thread pool does work other than message processing and should always have a few threads available for that work. I didn't think we were creating a session or consumer for each message, and I'm not sure how Justin derived this from our stack trace (I don't see it), but I'll check the source. That sure is an anti-pattern I'd be anxious to fix. With a lot of fingers in the development "pie" over the years, this large product has evidently accumulated some issues that need to be addressed. And our production servers typically run at customer sites where we do not have direct access. They send us log files and stack traces (like the one above), and we are sometimes left to poke around and guess.

                     

                    Thanks again for spending so much time to clarify and assist. And thanks for a great product (we actually like HornetQ so much we're removing JBoss in a future project rev and using a standalone HornetQ server as described above - a testament to our faith in the product).

                     

                    Sorry about the late response. It appears I can post only once per hour to the forum.

                     

                    Mark

                    • 7. Re: HornetQ concurrency issue with fully occupied thread pool
                      jbertram

                      For now, we've made the max thread pool size greater than the total MDB pool size (as Justin suggested), now that we know the thread pool does work other than message processing and should always have a few threads available for that work.

                      To be clear, the documentation refers to this thread pool as a "general purpose thread pool" which "is used for most asynchronous actions on the server side."

                       

                      I didn't think we were creating a session or consumer for each message, and I'm not sure how Justin derived this from our stack trace (I don't see it), but I'll check the source.

                      Your stack trace indicates that org.hornetq.core.server.impl.ServerSessionImpl.executeQueueQuery() is being invoked from line 221 of org.hornetq.core.protocol.core.ServerSessionPacketHandler.  As you can see in the linked source code, at line 221 the handlePacket() method is handling a SESS_CREATECONSUMER packet.  In other words, a remote client has sent a packet to the server indicating it wants to create a consumer.

                       

                      It's not 100% clear that you are creating a consumer for every message, but that would be one way to explain the thread contention.
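                      For illustration, client code that would produce one SESS_CREATECONSUMER per message looks something like this (hypothetical names; compare with the consumer-reuse sketch above):

                         import javax.jms.Connection;
                         import javax.jms.ConnectionFactory;
                         import javax.jms.Message;
                         import javax.jms.MessageConsumer;
                         import javax.jms.Queue;
                         import javax.jms.Session;

                         public class PerMessageConsumer
                         {
                            // Anti-pattern sketch: a new connection, session, and consumer are
                            // created and torn down for every receive, so the server handles a
                            // SESS_CREATECONSUMER packet (and its queue query) per message.
                            public Message receiveOne(ConnectionFactory factory, Queue queue) throws Exception
                            {
                               Connection connection = factory.createConnection();
                               try
                               {
                                  Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                                  MessageConsumer consumer = session.createConsumer(queue);
                                  connection.start();
                                  return consumer.receive(1000);
                               }
                               finally
                               {
                                  connection.close();
                               }
                            }
                         }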

                      • 8. Re: HornetQ concurrency issue with fully occupied thread pool
                        swany

                        Justin / Clebert,

                         

                        I have dug through several layers of libraries in this large application I've "inherited", and it appears your suspicions are absolutely correct. We are creating sessions & producers with every message send. I will work to remove this anti-pattern.

                         

                        Many thanks for all the assistance and useful internal HornetQ operational information. It has helped expose further poor assumptions on our part involving our threading model.

                         

                        We have some work to do. Consider this issue closed.

                         

                        Mark

                        • 9. Re: HornetQ concurrency issue with fully occupied thread pool
                          clebert.suconic

                          I bet you're using JMSTemplate from Spring.... ???