6 Replies Latest reply on Aug 19, 2013 9:22 PM by pjlegato

    HornetQ delivery locks up using STOMP/2.3.0.CR2

    mrgordon

      We have been running 2.3.0.CR2, and message delivery locks up permanently from time to time. We have approximately 15 HornetQ instances running on the application servers, all bridging to a single HornetQ that has a consumer built on the ActiveMessaging poller using STOMP 1.2. The recipient HornetQ continues to receive messages successfully (i.e., MessageCount increases normally), but the ActiveMessaging consumer no longer processes any messages. We have tried restarting the consumer, but it takes a kill -9 of that HornetQ before any message consumption occurs again. Note that the process does not appear to do anything in response to kill without -9.

       

      I have attached a thread dump and the counts for MessageCount, DeliveringCount, and ConsumerCount as Justin suggested. We have let the MessageCount exceed 400,000 in the past before restarting HornetQ but with no success in getting messages to process again. DeliveringCount and ConsumerCount have always been 1.

       

      Sometimes we have seen the queueBusy log message: "Queue {0} was busy for more than {1} milliseconds. There are possibly consumers hanging on a network operation"

       

      This seems to make sense since we have a monitoring process that checks MessageCount. Not sure what is causing the queue to be so busy though. One Ruby consumer keeps the queue empty consistently all day long so there really isn't that much traffic even though it has bridges from a number of other queues. Is this a fairly typical setup? Could it be related to overhead maintaining the bridges? Those seem to be stable so I wouldn't think so.

       

      Thanks for any advice. Happy to provide additional information.

        • 1. Re: HornetQ delivery locks up using STOMP/2.3.0.CR2
          gaohoward

          Hi,

           

          I have some questions regarding your issue:

           

          1) How is your STOMP poller connected to the single HornetQ server? Is it running on the same host machine as the HornetQ server? Is the poller a long-running process?

          2) Did you observe any network problems between the poller and the HornetQ server?

           

          Thanks

          • 2. Re: HornetQ delivery locks up using STOMP/2.3.0.CR2
            mrgordon

            Yes, I should have mentioned that the STOMP poller is on the same server as the HornetQ. The poller itself runs continually under a supervisor, but the logs indicate that the connection is reset about once a minute. The poller will print out "connection.receive returning EOF as nil - resetting connection." at those times. The HornetQ will have an accompanying message in the logs that looks like:

             

            17:12:04,977 WARN  [org.hornetq.core.server] HQ222067: Connection failure has been detected: HQ119014: Did not receive data from /127.0.0.1:46527. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. You also might have configured connection-ttl and client-failure-check-period incorrectly. Please check user manual for more information. The connection will now be closed. [code=CONNECTION_TIMEDOUT]

            17:12:04,979 WARN  [org.hornetq.core.server] HQ222061: Client connection failed, clearing up resources for session 1d1bfe16-a914-11e2-8381-81f4cc04e5e2

            17:12:04,980 WARN  [org.hornetq.core.server] HQ222107: Cleared up resources for session 1d1bfe16-a914-11e2-8381-81f4cc04e5e2

             

            I have been trying to understand why the connection is not kept open for longer, but the poller continues to receive messages after the connection is reset so I was left with the impression that it was not the root of the problem.

            • 3. Re: HornetQ delivery locks up using STOMP/2.3.0.CR2
              gaohoward

              You have to take measures to keep the connection alive. Either enable STOMP heart-beating (ping, available in STOMP 1.1 and 1.2) on your poller, or make the STOMP connection TTL (connection-ttl) very large.
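
              For illustration, here is a minimal sketch of what heart-beating looks like at the protocol level, based on the STOMP 1.2 specification rather than on ActiveMessaging or HornetQ internals. The function names and the interval values are hypothetical; the frame layout and the negotiation rule (each direction uses the larger of the two advertised intervals, and is disabled if either side advertises 0) come from the specification.

```python
def build_connect_frame(host, send_ms, receive_ms):
    """Build a STOMP 1.2 CONNECT frame that requests heart-beating.

    send_ms    -- smallest interval at which the client can send heart-beats
    receive_ms -- interval at which the client would like to receive them
    (both values are illustrative assumptions, not recommended settings)
    """
    headers = [
        "accept-version:1.2",
        "host:" + host,
        "heart-beat:%d,%d" % (send_ms, receive_ms),
    ]
    # Command line, headers, blank line, then the NUL frame terminator.
    return ("CONNECT\n" + "\n".join(headers) + "\n\n\x00").encode("utf-8")


def negotiate_heartbeat(client, server):
    """Return (send_every_ms, expect_every_ms) from the client's point of view.

    client -- (cx, cy) sent in the CONNECT frame
    server -- (sx, sy) received in the CONNECTED frame
    A result of 0 means no heart-beats flow in that direction.
    """
    cx, cy = client
    sx, sy = server
    send_every = max(cx, sy) if cx and sy else 0
    expect_every = max(sx, cy) if sx and cy else 0
    return send_every, expect_every


if __name__ == "__main__":
    print(build_connect_frame("localhost", 10000, 10000))
    print(negotiate_heartbeat((10000, 10000), (5000, 30000)))
```

              Once an interval has been agreed, the client keeps the connection alive during idle periods by writing a single newline to the socket at least that often. Whether the ActiveMessaging poller exposes a way to set this header is a separate question.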

              • 4. Re: HornetQ delivery locks up using STOMP/2.3.0.CR2
                mrgordon

                Yes, I will look into that more. I have configured my STOMP producers more carefully, but the ActiveMessaging library doesn't seem to expose much in the way of configuring the STOMP connections it makes to consume. That said, one consumer disconnecting once a minute should not cause HornetQ to lock up permanently and require a kill -9!

                • 5. Re: HornetQ delivery locks up using STOMP/2.3.0.CR2
                  gaohoward

                  Could you raise a Jira issue for this? I'll investigate it.

                   

                  Thanks

                  • 6. Re: HornetQ delivery locks up using STOMP/2.3.0.CR2
                    pjlegato

                    We opened a ticket for this issue at https://issues.jboss.org/browse/HORNETQ-1245 .