6 Replies Latest reply on Oct 1, 2010 4:15 AM by robertjlee

    JMX Console locked; JConsole shows no information

    robertjlee

      Our setup is a JMS topic, configured to block. A core bridge moves messages from a JMS queue (configured to page) into the topic.

       

      A large number of messages are being put into the queue over time, and not all consumers in the topic are consuming messages, so we expect the topic to block and the queue to page. We also have a non-exclusive divert to copy some messages from the topic to another core queue (also paging).

       

      The problem is that when we get enough messages in the system, it seems that JConsole stops showing us any information. It looks like if we try to access something that's blocking, it stops collecting information from the server and just shows us blank screens.

       

      Doing a stacktrace on the server, it seems that the JMX thread (Thread 8434) is blocked trying to access a synchronised method on a core queue. Meanwhile, one other thread (Thread 4701) is blocked while being inside a synchronised method on a core queue (presumably the same queue), delivering a message through the core queue.

       

      The latter thread is the one that worries us, because it doesn't seem to be deadlocked; it just seems to be parked waiting for a notification that never arrives (we have left it running in this state overnight, and JMX hasn't recovered).

       

      We are using JDK 1.6.0u21, so we know that JDK bug 6801020 isn't the issue (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6801020).

       

      Is this a bug, and how do we avoid this situation?

       

      Attached is a jstack stack-trace including the blocked threads.
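
For anyone who wants to check the same blocked-versus-parked distinction without jstack, here is a minimal sketch using the standard java.lang.management API from inside the JVM. The class and method names (StuckThreadReport, describeStuckThreads, hasDeadlock) are made up for illustration and are not part of HornetQ.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HornetQ: lists threads waiting on a lock or blocker.
public class StuckThreadReport {

    /** Formats one report line; owner may be null (e.g. for a parked thread). */
    public static String format(String thread, Thread.State state, String lock, String owner) {
        String line = thread + " is " + state + " on " + lock;
        return owner == null ? line : line + " held by " + owner;
    }

    /** Returns one line per live thread that is blocked on, or waiting for, some lock object. */
    public static List<String> describeStuckThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // Ask for monitor and synchronizer details only where the JVM supports them.
        ThreadInfo[] dump = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        for (ThreadInfo info : dump) {
            // getLockName() is null when the thread is not waiting on any lock or blocker.
            if (info != null && info.getLockName() != null) {
                report.add(format(info.getThreadName(), info.getThreadState(),
                        info.getLockName(), info.getLockOwnerName()));
            }
        }
        return report;
    }

    /** True if the JVM itself detects a lock cycle; a thread parked forever is not one. */
    public static boolean hasDeadlock() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        return ids != null;
    }
}
```

A thread reported as BLOCKED with an owner is waiting to enter a synchronised section; one reported as WAITING with no owner is parked, which matches the second thread described above.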

        • 1. Re: JMX Console locked; JConsole shows no information
          timfox

          In older versions, getMessageCount() used to block all delivery on a queue while it was in progress, so it was important that monitoring tools didn't call it too often; it could really slow down the system.

           

          In current TRUNK there is no longer a lock. However it's still an expensive operation so don't call it in a tight loop.
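
One way a monitoring tool might avoid calling an expensive operation in a tight loop is to cache the last result for a minimum interval. This is only a sketch: CachedCount is a hypothetical class, and the LongSupplier stands in for whatever actually fetches the count (for example a JMX call to getMessageCount()).

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of HornetQ: re-runs an expensive call at most once per interval.
public class CachedCount {
    private final LongSupplier expensiveCall;
    private final long maxAgeNanos;
    private long cachedValue;
    private long fetchedAtNanos;
    private boolean hasValue;

    public CachedCount(LongSupplier expensiveCall, long maxAgeMillis) {
        this.expensiveCall = expensiveCall;
        this.maxAgeNanos = maxAgeMillis * 1_000_000L;
    }

    /**
     * Returns the cached value, refreshing it only if it is older than maxAgeMillis.
     * Note: the lock is held during the refresh, so if the underlying call blocks,
     * other callers of get() will wait behind it.
     */
    public synchronized long get() {
        long now = System.nanoTime();
        if (!hasValue || now - fetchedAtNanos >= maxAgeNanos) {
            cachedValue = expensiveCall.getAsLong();
            fetchedAtNanos = now;
            hasValue = true;
        }
        return cachedValue;
    }
}
```

With a 60-second maximum age, for instance, a dashboard that refreshes every second would trigger the underlying call roughly once a minute instead of sixty times.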

          • 2. Re: JMX Console locked; JConsole shows no information
            robertjlee

            That's great to hear.

            Out of interest, is there a more efficient way to find out which queue at an address contains the largest number of messages?
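
For reference, the straightforward approach (not a more efficient one) is to query the queue MBeans and compare one numeric attribute on each. The sketch below uses only the standard javax.management API; FullestQueueFinder and findLargest are hypothetical names, and the ObjectName pattern you pass in is an assumption that should be checked against what JConsole shows for your server.

```java
import java.io.IOException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper, not part of HornetQ: finds the MBean with the largest numeric attribute.
public class FullestQueueFinder {

    /**
     * Returns the name of the MBean matching the pattern whose attribute value is highest,
     * or null if no matching MBean returned a numeric value.
     */
    public static ObjectName findLargest(MBeanServerConnection server,
                                         ObjectName pattern,
                                         String attribute) throws IOException {
        ObjectName largest = null;
        long highest = Long.MIN_VALUE;
        for (ObjectName name : server.queryNames(pattern, null)) {
            try {
                long value = ((Number) server.getAttribute(name, attribute)).longValue();
                if (largest == null || value > highest) {
                    highest = value;
                    largest = name;
                }
            } catch (JMException | RuntimeException e) {
                // The attribute is missing, not numeric, or its getter threw: skip this MBean.
            }
        }
        return largest;
    }
}
```

A call might look like findLargest(connection, new ObjectName("org.hornetq:module=Core,type=Queue,address=\"jms.topic.myTopic\",*"), "MessageCount"), where both the pattern and the topic name are assumptions. Each getAttribute call still pays the full cost of the count on the server.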

            • 3. Re: JMX Console locked; JConsole shows no information
              robertjlee

              Hi Tim,

               

              We've checked out and built the code from TRUNK, and it would appear that, while getMessageCount() now does not block when a topic is blocking, it throws an IllegalStateException instead.

               

              I don't understand why this is an illegal state; the address-full policy of BLOCKing is clearly defined in the manual, so it seems wrong to throw an IllegalStateException here. It's not like the client code is calling methods on an object out of a required order, for example.

               

              What I'm trying to do is identify, when the topic is blocking, which core queue/consumer is not consuming messages, i.e. which queue is causing the block. This is an exceptional case, but currently the only way I have to do this is to go over all the queues in the topic (using JMX) and get the message count for each one. It seems that it would be an advantage to have a simpler mechanism to test whether a queue is blocking without actually counting anything (assuming that the queue causing the block is the one which throws this exception), but it would also seem wrong to catch an IllegalStateException, as this exception is far too general.

               

              It seems very odd that blocking addresses cause so much to stop working; I've just tried stopping the core bridge that feeds the topic, and the JMX method returns successfully but the bridge's state remains "started=true", and the stacktrace still shows the consumer parked in acquireCredits(). If you attempt to look at the attributes of the core queue MBean on the source of the bridge, then the JMX thread blocks on getConsumerCount() and doesn't recover until HornetQ is restarted - unless you can somehow remedy the problem of a consumer not picking up without using the JMX console!
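
Until the server side is sorted out, a monitoring client can at least protect itself from a management call that never returns by running each call with a timeout and recording how it ended. This is a sketch with hypothetical names (GuardedCall, probe, Outcome); it does nothing to release the thread stuck on the server.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of HornetQ: runs a call with a timeout and classifies the result.
public class GuardedCall {

    public enum Outcome { OK, FAILED, TIMED_OUT }

    /** Runs the call on the pool and reports whether it returned, threw, or timed out. */
    public static <T> Outcome probe(ExecutorService pool, Callable<T> call, long timeoutMillis)
            throws InterruptedException {
        Future<T> future = pool.submit(call);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            // Interrupts the worker; a thread stuck in blocking I/O may ignore the interrupt.
            future.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            // The call itself threw, e.g. an exception raised by the remote getter.
            return Outcome.FAILED;
        }
    }
}
```

Probing each queue in turn and noting which ones come back FAILED or TIMED_OUT gives a rough indication of which queue is in trouble, without the caller having to catch IllegalStateException specifically.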

               

              I am also worried that we may not be able to move messages out of a core queue if it's causing its address to block; this really would be a showstopper for us (although we've not been able to test this yet).

              • 4. Re: JMX Console locked; JConsole shows no information
                timfox

                If you can provide a test case, instructions, stack trace, etc., someone can investigate.

                • 5. Re: JMX Console locked; JConsole shows no information
                  timfox

                  AIUI the only reason you do this is so you can avoid a slow subscription blocking delivery on other subscriptions on the same address when paging is in operation.

                   

                  If so, Clebert is currently implementing new functionality in paging which should allow paging to be configured at the queue level, thus preventing blocking. See dev forum thread for more info.

                  • 6. Re: JMX Console locked; JConsole shows no information
                    robertjlee

                    Hi Tim,

                     

                    We are aware of Clebert's work, but we currently have a JBoss MQ system that isn't coping with the load that we're trying to send to it, and so we want to start moving to HornetQ as soon as possible. HORNETQ-498 currently says 6 weeks remaining; that's obviously an estimate, but I doubt we can keep going for anything like that long on JBoss MQ, hence trying to find a workaround.

                     

                    We haven't managed to reproduce the problem reliably outwith the network where our server is running, but we have noticed that the IllegalStateExceptions coincide with the outgoing bandwidth (from the server to the clients) being fully used/contested; possibly there is something inside the static block in QueueImpl that is retrying or looping when trying to send messages to clients, slowing down JMX calls to the point of timing out?