2 Replies Latest reply on Sep 29, 2010 6:52 AM by robertjlee

    HornetQ locks up on shutdown

    robertjlee

      Running TRUNK:

       

      We had around 9000 messages queued for one consumer in a topic subscription on a blocking topic. We used JMX to move all messages for that subscription into another queue, but JConsole then reported that 270 messages remained in the subscription. Several hours later, the topic was still blocking, no further messages had been added successfully, and subsequent attempts to move messages from the core queue representing the topic subscription returned 0 (i.e. no messages moved).

       

      We tried to restart HornetQ to see if we could recover, but it did not shut down properly; it locked up after the org.hornetq node was removed from JConsole.

       

      Looking at the stacktrace, thread 24463 is blocked on PostOfficeImpl$Reaper.stop(), which is a synchronised method that sets a flag and calls notify(). Looking at the source code, this seems to be a bug, as the run() method checks the closed flag (set only by the stop() method) repeatedly, but it can never change while run() is running as both methods are synchronised.
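      To make the locking concern concrete, here is a minimal sketch of the pattern as I understand it. ReaperSketch and everything in it are made-up stand-ins, not the HornetQ source. It assumes two possible loop bodies: one that waits with wait(), which gives up the monitor while waiting, and one that sleeps while still holding the monitor. Whether stop() can ever get in depends on which of these the real run() does.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for the reaper pattern described above; not HornetQ's source. */
class ReaperSketch implements Runnable {
    private final boolean releaseMonitor; // true: loop in wait(); false: loop in sleep()
    private final CountDownLatch entered = new CountDownLatch(1);
    private boolean closed;

    ReaperSketch(boolean releaseMonitor) {
        this.releaseMonitor = releaseMonitor;
    }

    @Override
    public synchronized void run() {
        entered.countDown(); // run() now owns the monitor
        while (!closed) {
            try {
                if (releaseMonitor) {
                    wait(50);         // gives up the monitor until notified or timed out
                } else {
                    Thread.sleep(50); // keeps the monitor for the whole loop
                }
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    public synchronized void stop() {
        closed = true;
        notify();
    }

    /** Returns true if stop() completes within timeoutMillis while run() is looping. */
    static boolean stopCompletes(boolean releaseMonitor, long timeoutMillis) {
        ReaperSketch reaper = new ReaperSketch(releaseMonitor);
        Thread runner = new Thread(reaper, "reaper-run");
        Thread stopper = new Thread(reaper::stop, "reaper-stop");
        runner.setDaemon(true);
        stopper.setDaemon(true);
        try {
            runner.start();
            reaper.entered.await(); // make sure run() took the monitor first
            stopper.start();
            stopper.join(timeoutMillis);
            return !stopper.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            runner.interrupt(); // lets the sleeping variant exit so no thread stays stuck
        }
    }

    public static void main(String[] args) {
        System.out.println("wait() variant, stop() completes:  " + stopCompletes(true, 2000));
        System.out.println("sleep() variant, stop() completes: " + stopCompletes(false, 500));
    }
}
```

      In this sketch the wait() variant lets stop() through and the sleep() variant leaves the stopping thread blocked on the monitor, which is what thread 24463 looks like in our dump.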

       

      Also, thread 5104 appears to be blocked on expiring references in a queue. Thread 5196 is the only thread inside a method synchronised on a queue, and it appears to be trying to deliver a message through the core bridge; the only core bridge we have pulls messages from a (paging) JMS queue into the (blocking) JMS topic. Could this be related to the original problem?
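      For what it's worth, the "who is blocked on whose lock" reading of a thread dump can also be produced from inside a JVM with the standard java.lang.management API. This is only a sketch: BlockedThreadReport is a made-up name, and it assumes the code runs inside the JVM being inspected (for a running server it would have to go through a remote JMX connection, which is not shown).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists threads in BLOCKED state and the owner of the monitor they want. */
class BlockedThreadReport {

    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // false, false: skip the optional locked-monitor/synchronizer details;
        // the lock a thread is blocked on and its owner are reported regardless.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName()
                        + " blocked on " + info.getLockName()
                        + ", held by " + info.getLockOwnerName()
                        + " (id " + info.getLockOwnerId() + ")");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

      A thread that shows up as the owner for several blocked threads, the way 5196 does for the queue here, is the one worth looking at first.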

       

      We haven't been able to reproduce this yet, but we have taken stacktraces (attached). jstack.5080 was taken after moving the messages from the subscription, while jstack.5080.shutdown was taken several minutes after we signalled HornetQ to shut down.

       

      Is there anything further we can usefully do to investigate problems like this?

        • 1. Re: HornetQ locks up on shutdown
          clebert.suconic

          Did you have paged messages on the address when it happened? Just trying to understand the scenario.

          • 2. Re: HornetQ locks up on shutdown
            robertjlee

            The scenario is that we have a JMS queue that feeds messages into a JMS topic by means of a core bridge.  The topic is configured to block when the address is full, and the queue is configured to page.  The idea is that when paging occurs on the queue, we can institute diverts on the topic and move messages out of the largest topic subscription into a new holding queue. The transformer on the core bridge then alters the message headers so that the diverts 'split' each message into two: one copy for the topic, minus the offending subscriber's token, and another copy for the subscriber's holding queue.

            The move part of this had failed inasmuch as it had not managed to move some 270 of the 9000 messages (possibly because they were being delivered to a consumer that had failed mid-delivery).  The outcome was that the topic was still blocking despite holding far fewer messages than the memory parameters would allow.

             

            We would have had paged messages on the JMS queue. The JMS topic was configured as blocking, so its address would not have had paged messages.