5 Replies Latest reply on Sep 18, 2013 6:34 PM by clebert.suconic

    DeliveringCount higher than number of connected consumers with consumer-window-size=0

    trondgzi

      Hi,

       

      I'm running JBoss AS 7.1.4 (built from source) with HornetQ 2.2.21. 3 nodes running the same JBoss AS 7.1.4 with a total of 75 consumers are consuming from the same queue. All clients are connected with consumer-window-size=0 because they are generally slow consumers. The consumers are EJB3 MessageDriven beans, using XA transactions for acknowledge/rollback of messages.

       

      When MessageCount < ConsumerCount, then DeliveringCount is equal to MessageCount. When MessageCount > ConsumerCount, then DeliveringCount is equal to ConsumerCount. This is exactly as expected since we are using consumer-window-size=0.

       

      However when my consumers are exceptionally slow (db performance issue) with very long running transactions I see that DeliveringCount can increase up to the same number as MessageCount and eventually everything grinds to a halt. Right now ConsumerCount=75, DeliveringCount=5418 and MessageCount=23750. DeliveringCount is slowly increasing.

       

      This has happened a few times before and when the performance issue is resolved, we end up with some of the messages "hanging" on the queue. The queue has a DeliveringCount of e.g 2500 and nothing happens at the consumer side. To have these messages (re-)delivered to the consumers HornetQ must be restarted.

       

      Any insights on why this is happening is much appreciated, and suggestions to how I can avoid this is even more appreciated.

       

       

      Best regards,

      Trond

        • 1. Re: DeliveringCount higher than number of connected consumers with consumer-window-size=0
          trondgzi

          Logs on the server are full of these:

           

          2013-09-18 16:31:42,701 WARN  [Thread-2 (HornetQ-scheduled-threads-17550809)] [org.hornetq.core.transaction.impl.ResourceManagerImpl] [] [] transaction with xid XidImpl (1891372116 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-96.12.51.99.-92

          .-84.4.82.57.-107.-73.0.3.-128.-11.0.0.0.0.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-96.12.51.99.-92.-84.4.82.57.-107.-73.0.3.-128.-12.49 base64:AAAAAAAAAAAAAP__CqAMM2OkrARSOZW3AAOA9QAAAAAAAAAAAAAAAAAAAAAAAP__Cq

          AMM2OkrARSOZW3AAOA9DEHAgIA timed out

          2013-09-18 16:31:42,702 WARN  [Thread-46 (HornetQ-server-HornetQServerImpl::serverUUID=1f560cc2-058c-11e3-8c36-15d997702512-932141844)] [org.hornetq.core.server.impl.QueueImpl] [] [] Error on checkDLQ: java.lang.IllegalStateExcep

          tion: Cannot find add info 47014456

                  at org.hornetq.core.journal.impl.JournalImpl.appendUpdateRecord(JournalImpl.java:910) [hornetq-core-2.2.21.Final.jar:2.2.21.SNAPSHOT (HQ_2_2_21_final, 122)]

                  at org.hornetq.core.persistence.impl.journal.JournalStorageManager.updateDeliveryCount(JournalStorageManager.java:845) [hornetq-core-2.2.21.Final.jar:2.2.21.SNAPSHOT (HQ_2_2_21_final, 122)]

                  at org.hornetq.core.server.impl.QueueImpl.checkRedelivery(QueueImpl.java:1972) [hornetq-core-2.2.21.Final.jar:2.2.21.SNAPSHOT (HQ_2_2_21_final, 122)]

                  at org.hornetq.core.server.impl.QueueImpl$RefsOperation.afterRollback(QueueImpl.java:2382) [hornetq-core-2.2.21.Final.jar:2.2.21.SNAPSHOT (HQ_2_2_21_final, 122)]

                  at org.hornetq.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:516) [hornetq-core-2.2.21.Final.jar:2.2.21.SNAPSHOT (HQ_2_2_21_final, 122)]

                  at org.hornetq.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:376) [hornetq-core-2.2.21.Final.jar:2.2.21.SNAPSHOT (HQ_2_2_21_final, 122)]

                  at org.hornetq.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:239) [hornetq-core-2.2.21.Final.jar:2.2.21.SNAPSHOT (HQ_2_2_21_final, 122)]

                  at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100) [hornetq-core-2.2.21.Final.jar:2.2.21.SNAPSHOT (HQ_2_2_21_final, 122)]

                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_25]

                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_25]

                  at java.lang.Thread.run(Thread.java:724) [rt.jar:1.7.0_25]

           

          And on the clients the logs are full of these:

          2013-09-18 16:25:39,131 WARN  [Periodic Recovery] [com.arjuna.ats.jta] [] [] ARJUNA016038: No XAResource to recover < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff0aa00c33:-569a1ddb:52397c1b:ba42, node_name=1,

          branch_uid=0:ffff0aa00c33:-569a1ddb:52397c1b:ba43, subordinatenodename=null, eis_name=unknown eis name >

          • 2. Re: DeliveringCount higher than number of connected consumers with consumer-window-size=0
            clebert.suconic

            [#HORNETQ-722] Multiple Redeliveries with "IllegalStateException: Cannot find add info" after previous consumer lost con…

             

             

            Once the message gets on redelivery, it will do scheduled redelivery.. which will be in delivering.

             

             

            It seems you have a lot of scheduled redeliveries. I would recommend you updating HornetQ. You could either get a newer version of JBoss. or if you must stay on that version, you could replace hornetq libraries by the one on the 2.3.x branch. (it's pretty stable.. we use it for EAP releases).

             

            Only drawback is JBoss has the client libraries on the jboss-all-client (forgot the name), so you would have to unzip it.. replace it.. and take it back.

             

             

            Although your best option is to get a newer version of JBoss, since we may have changed something on the integrations.

            1 of 1 people found this helpful
            • 3. Re: DeliveringCount higher than number of connected consumers with consumer-window-size=0
              clebert.suconic

              At least a workaround is to disable scheduled redelivery. you may add a sleep on your MDB if deliveryCount > 0 to avoid bursting your Database in case of deadlocks and other issues on the DB.

               

               

              Watch out also for the DLQ settings.

               

               

              (Just to let you know.. sometimes I ask question trying to nail down a problem.. on this case I'm pretty sure this is what's happening).

              1 of 1 people found this helpful
              • 4. Re: DeliveringCount higher than number of connected consumers with consumer-window-size=0
                trondgzi

                Hi Clebert,

                 

                you are correct about a lot of redeliveries. When the performance issues occur we get a lot of transaction timeouts on the clients which will cause redelivery. The queue has redelivery-delay=2000 and max-delivery-attempts=2. I suspected that the redelivery mechanism was using scheduled delivery, but I was not sure since ScheduledCount remains 0 when this happens.

                 

                I found the JIRA link you provided while googling earlier, but I dismissed the relevance since the JIRA issue has Fix Version/s: 2.2.8.Final and I'm using 2.2.21.Final.

                 

                Upgrading JBoss and/or HornetQ is not something I can do for at least 2-3 months, but implementing sleep in the MDB should be an easy fix, although I would consider this a sub-par solution. I will try to make a test case to replicate the issue, and try to verify that it behaves differently with the 2.3.x branch. Hopefully this will make it easier to get approval for upgrading JBoss...

                • 5. Re: DeliveringCount higher than number of connected consumers with consumer-window-size=0
                  clebert.suconic

                  U right... I thought 2.2.1 when I read 2.2.21...

                   

                   

                  I looked at the scheduled redelivery and I don't see a lot of difference in logic between 2.2.21 and 2.2.x

                   

                   

                  There are lots of differences on the delivery handling since I've removed quite a few locks and fixed other issues.

                   

                  Your issue is definitely related to scheduled redelivery. I would be interested in know what happened so you're still having that issue. maybe a replicator would be help us to fix it further. but you would have to evaluate it on the latest versions.

                   

                   

                  you should probably look at disabling scheduled redelivery as we talked till here.