15 Replies Latest reply on Mar 24, 2010 12:42 PM by mlabudde

    Pending messages not browsable/consumable, no more delivering, need restart

    stremblay

      Hi everyone,

       

      I have a weird issue that happened to me a few times now when doing stress testing on an application using activemq.

       

      I am using Fuse message broker 5.3.0.1.

       

      All of a sudden (sadly, I am not able to give a scenario to reproduce it), activemq ends up in a non-functional state:

       

        - The consumers stop being able to consume messages

        - The producers are still able to put messages on the queues

        - The web console shows pending messages (the count increases when producers put new messages on the queues)

        - When clicking on any queue name with pending messages in the web console, the messages are NOT shown?!

        - If using jconsole to call the browse function, no messages are shown
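For anyone who wants to script the jconsole check from the list above, here is a minimal sketch that compares a queue's reported size with what its browse operation returns over JMX. The JMX URL, the broker name "localhost", the queue name "TEST.QUEUE" and the MBean key layout (BrokerName / Type / Destination) are assumptions for illustration; check the real MBean name in jconsole's tree before relying on it.

```java
import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class QueueBrowseCheck {

    // Builds the queue MBean name. The key layout is an assumption based on
    // what jconsole shows for 5.x brokers of this era; verify it on your broker.
    static ObjectName queueName(String broker, String queue) {
        Hashtable<String, String> keys = new Hashtable<>();
        keys.put("BrokerName", broker);
        keys.put("Type", "Queue");
        keys.put("Destination", queue);
        try {
            return new ObjectName("org.apache.activemq", keys);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad broker or queue name", e);
        }
    }

    // The symptom described in this thread: messages are counted as pending
    // but browsing returns none of them.
    static boolean looksStuck(long queueSize, int browsedCount) {
        return queueSize > 0 && browsedCount == 0;
    }

    public static void main(String[] args) {
        // All three defaults are placeholders, not values from this thread.
        String url = args.length > 0 ? args[0]
                : "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi";
        String broker = args.length > 1 ? args[1] : "localhost";
        String queue = args.length > 2 ? args[2] : "TEST.QUEUE";

        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = queueName(broker, queue);

            long size = ((Number) mbs.getAttribute(name, "QueueSize")).longValue();
            CompositeData[] browsed = (CompositeData[])
                    mbs.invoke(name, "browse", new Object[0], new String[0]);

            System.out.println("QueueSize=" + size + ", browsed=" + browsed.length);
            if (looksStuck(size, browsed.length)) {
                System.out.println("Pending messages are not browsable; capture a thread dump now.");
            }
        } catch (Exception e) {
            System.err.println("JMX check failed: " + e);
        }
    }
}
```

Run periodically, something like this could flag the bad state as soon as it appears, which is the best moment to capture diagnostics.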

       

      If I simply stop/restart activemq, everything comes back to normal. The pending messages are still there and regain their "browsability" in the web console and in jconsole, and if I connect a consumer it is now able to consume messages and everything works as expected.

       

      I have not seen this in production yet, but it's happening when stress-testing in our QA environment...

       

      Is this a known issue? Any hints or ideas about this problem?

       

      Thanks a lot,

       

      Sylvain Tremblay

        • 1. Re: Pending messages not browsable/consumable, no more delivering, need restart
          lvisnick

          Sylvain,

             No known issues are jumping out at me. Is it possible for you to do your stress testing with a newer version (5.3.0.3 has been out since Jun 30)?

           

          Can you attach your activemq.xml for us to browse/see how you are configuring?

           

          Finally, when you get into this nasty state, can you do a kill -3 and attach the stack?
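If sending a signal to the process is awkward, roughly the same information as a kill -3 dump can be collected from inside a JVM through the standard ThreadMXBean. The sketch below is only an illustration: it dumps the JVM it runs in (so it would have to run inside the broker process, or be adapted to a remote JMX connection), and it assumes Java 6 or later, where dumpAllThreads is available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumpSketch {

    // Renders every live thread with its state, stack and the lock it waits on,
    // then appends the result of the JVM's own deadlock detection.
    static String dump(ThreadMXBean threads) {
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            if (info.getLockName() != null) {
                out.append("    waiting on ").append(info.getLockName());
                if (info.getLockOwnerName() != null) {
                    out.append(" held by \"").append(info.getLockOwnerName()).append('"');
                }
                out.append('\n');
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }

        // findDeadlockedThreads also covers java.util.concurrent locks,
        // but is only allowed when synchronizer usage is supported.
        long[] deadlocked = synchronizers
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        out.append(deadlocked == null
                ? "no deadlock detected"
                : deadlocked.length + " threads in deadlock");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump(ManagementFactory.getThreadMXBean()));
    }
}
```

A clean "no deadlock detected" does not rule out a hang; threads can all be parked waiting for a condition that never fires, so the per-thread "waiting on" lines are still worth reading.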

           

          Lorinda

          • 2. Re: Pending messages not browsable/consumable, no more delivering, need restart
            gseben

            Also, are you setting a time to live on these messages?

            • 3. Re: Pending messages not browsable/consumable, no more delivering, need restart
              janylj_lijun.yan

              I have exactly the same problem. It has been a mystery in our QA environment. Have you fixed your problem? Would you please shed some light on what causes this issue?

               

              Thank you so much.

              • 4. Re: Pending messages not browsable/consumable, no more delivering, need res
                msmyers

                We experience this issue with ActiveMQ 2.5.0 in production twice per day with trivial traffic (2,000 messages).

                 

                Standard configuration, out of the box, Spring template, Jencks container, no transactions, TCP transport.

                 

                I was hoping the issue would be solved by switching to FUSE (perhaps it gets better QA?).

                • 5. Re: Pending messages not browsable/consumable, no more delivering, need res
                  msmyers

                  I should note that ActiveMQ had been working perfectly for 6 months without a restart, and I must now restart ActiveMQ daily. I am personally in charge of all changes, and I can tell you that no configuration file has changed, and our clients use the exact same Spring JmsTemplate in the same way.

                   

                  This started on Monday of this week. I have no way to reliably reproduce this deadlock yet. Once I figure it out, I hope to create a fixable jira.

                  • 6. Re: Pending messages not browsable/consumable, no more delivering, need res
                    philippe_tseyen_philippe.tseyen

                    Same problem over here. Usually happens every couple of days, with no clear indication.

                     

                    Java Virtual Machine: Java HotSpot(TM) Server VM version 1.5.0.16 jinteg:03.09.09-09:59 IA64

                    Vendor: Hewlett-Packard Company

                     

                    Operating System: HP-UX B.11.23

                    Architecture: IA64N

                    Number of processors: 6

                     

                    When the issue occurs you can still publish on a queue, but consuming doesn't work anymore.

                     

                    Topics still work as expected.

                     

                    I'll try to attach a stacktrace next time it happens.

                     

                    Philippe

                    • 7. Re: Pending messages not browsable/consumable, no more delivering, need restart
                      lvisnick

                      Any/all of you who hit this problem: when you get into this nasty state, can you please do a kill -3 and attach the stack.

                       

                      Once we have a stack, we will investigate.

                       

                      Thank you,

                      Lorinda

                      • 8. Re: Pending messages not browsable/consumable, no more delivering, need restart
                        philippe_tseyen_philippe.tseyen

                        After 28 days without a restart, active mq seems to hang again on one of our servers.

                         

                        In attachment you'll find the activemq.log, which includes a thread dump done at the moment that it was blocked.

                         

                        Also attached you'll find our activemq.xml configuration.

                         

                        ActiveMQMonitor is a script that we use to check that activemq is still up and running.

                         

                        When debugging we noticed that producer.send() was hanging.

                         

                        After a restart of activemq the monitor was working fine again.
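Since a hanging producer.send() would also hang any monitor that calls it directly, a monitor probably wants to bound the probe with a timeout. Below is a minimal sketch of that idea using only java.util.concurrent; the probe itself is a placeholder (it is not the attached ActiveMQMonitor script), and in real use it would wrap a send plus receive against the broker.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ProbeWithTimeout {

    // Runs the probe on a separate daemon thread and reports "unhealthy" if it
    // fails, returns false, or does not finish within the timeout.
    static boolean healthy(Callable<Boolean> probe, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread t = new Thread(runnable, "broker-probe");
            t.setDaemon(true); // a probe stuck forever must not keep the JVM alive
            return t;
        });
        Future<Boolean> result = executor.submit(probe);
        try {
            return Boolean.TRUE.equals(result.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            result.cancel(true); // best effort; a blocked socket write may ignore interrupts
            return false;
        } catch (ExecutionException e) {
            return false;        // the probe threw, e.g. a connection error
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Placeholder probe: replace with a real send/receive round trip.
        Callable<Boolean> probe = () -> true;
        System.out.println(healthy(probe, 5000) ? "broker OK" : "broker not responding");
    }
}
```

The timeout only protects the monitor; the stuck call may stay blocked in the background, which is why the worker is a daemon thread.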

                         

                        If you need more information, feel free to contact me.

                         

                        Philippe

                        • 9. Re: Pending messages not browsable/consumable, no more delivering, need restart
                          lvisnick

                          Need to look at your thread dump some more, but one theory is that you are running out of file descriptors. Can you check whether you are using the system default, and then increase the number of available resources in that area?

                           

                          I realize this is not reproducible on demand - so if you increase the value and you get in this state again, we'll have to capture thread dump again and compare this one to that newly captured one.

                           

                          tx,

                          Lorinda

                          • 10. Re: Pending messages not browsable/consumable, no more delivering, need restart
                            philippe_tseyen_philippe.tseyen

                            We are running with 4096 as maximum number of open file descriptors. Currently FuseMQ uses around 500.

                             

                            If we have another freeze, I'll monitor the number of open file descriptors and get back to you, but I think the number of file descriptors shouldn't be a problem unless there is a leak.
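To watch for a descriptor leak over time, the counts can be read from the JVM's OperatingSystem MBean instead of being checked by hand. This is a sketch under assumptions: the OpenFileDescriptorCount and MaxFileDescriptorCount attributes are exposed by HotSpot-style JVMs on Unix-like systems and may be absent elsewhere (the code returns -1 in that case), the 80% threshold is arbitrary, and the example reads the local JVM, so for the broker it would need to run in-process or go through a remote JMX connection.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class FdUsage {

    // Reads a numeric attribute of the OperatingSystem MBean, or -1 if the
    // attribute does not exist on this JVM/platform or cannot be read.
    static long readLong(MBeanServerConnection mbs, String attribute) {
        try {
            Object value = mbs.getAttribute(
                    new ObjectName("java.lang:type=OperatingSystem"), attribute);
            return ((Number) value).longValue();
        } catch (Exception e) {
            return -1;
        }
    }

    // True when usage has reached the given fraction of the limit.
    static boolean nearLimit(long open, long max, double fraction) {
        return open >= 0 && max > 0 && open >= max * fraction;
    }

    public static void main(String[] args) {
        MBeanServerConnection mbs = ManagementFactory.getPlatformMBeanServer();
        long open = readLong(mbs, "OpenFileDescriptorCount");
        long max = readLong(mbs, "MaxFileDescriptorCount");

        if (open < 0 || max < 0) {
            System.out.println("File descriptor counts are not available on this JVM.");
            return;
        }
        System.out.println("open=" + open + ", max=" + max);
        if (nearLimit(open, max, 0.8)) {
            System.out.println("Warning: more than 80% of file descriptors in use.");
        }
    }
}
```

With the figures quoted above (around 500 open out of 4096), nearLimit stays false, so a steadily rising trend between samples would be the thing to look for rather than the absolute value.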

                             

                            Grtz,

                             

                            Philippe

                            • 11. Re: Pending messages not browsable/consumable, no more delivering, need restart
                              philippe_tseyen_philippe.tseyen

                              Our administrator just informed me that our test environment gave a freeze. Number of open file descriptors is 538.

                               

                              In attachment you'll find a number of screenshots of jconsole and a new threaddump.

                               

                              This time we get a "ping timeout" in the program that I attached in one of the previous posts: we can send the message, we can register the listener, but we are not receiving the message that we sent before.

                               

                              I hope this is helpful,

                               

                              Philippe

                              • 12. Re: Pending messages not browsable/consumable, no more delivering, need restart
                                mielket

                                This seems to be the same issue as MB-545, which has just been resolved; the fix will be in the next release, which is due out soon.

                                • 13. Re: Pending messages not browsable/consumable, no more delivering, need restart
                                  philippe_tseyen_philippe.tseyen

                                  Not sure this is valuable, but attached you'll find an ls -R of the data directory of our activemq installation.

                                   

                                  Philippe

                                  • 14. Re: Pending messages not browsable/consumable, no more delivering, need restart
                                    steff

                                    We are not a paying customer yet, but would like to be if we can get help to get this problem solved.

                                     

                                    We used to use ActiveMQ 5.3.0, but a few days ago we tried to use FUSE MQ 5.3.0.5 to see if it would be able to solve some of our problems.

                                     

                                    It solves some of our problems but this "Messages stops getting consumed" problem still exists.

                                     

                                    We are using Glassfish 2.1.1, FUSE MQ 5.3.0.5 and MySQL (something new). We have an application in Glassfish with MDBs driven by queues in FUSE MQ. It works fine, but sometimes we end up in a situation where (it seems) there are still unhandled messages left on the MDB queues, but they are not delivered to the MDBs. We have no 5-second drill to recreate the situation, but we have an "endurance" test that we are able to run, and then the "situation" eventually occurs (sometimes after a short time and sometimes after a long time).

                                     

                                    I have run our "endurance"-test and established the "situation".

                                     

                                    I have attached a zip file. It contains:

                                    - activemq.xml

                                    - stderr.log: standard error piped from FUSE MQ process

                                    - stdout.log: standard output piped from FUSE MQ process. This also contains the threaddump triggered by "kill -3 " hours after the "situation" has been established.

                                    - activemq.log

                                    - WebAdminQueues.png: Picture of the Admin console "Queues" hours after the "situation" has been established. Shows 2 pending messages in MDB-queue "DistributorWPQ"

                                    - DistributorWPQQueue.png: Picture of the Admin console "Queue DistributorWPQ" hours after the "situation" has been established. Shows NO pending messages!

                                    - activemq_msgsInMySQL.png: A picture of the result of a query directly into the message table in MySQL hours after the "situation" has been established. Shows 11 pending messages in "DistributorWPQ" and 7 pending messages in "DecoderWPQ"

                                    - FragmentOfGlassfishServer.log: Selected fragment of the server.log (Glassfish). There are actually almost no exceptions in the log. The only section of error-like things can be found in FragmentOfGlassfishServer.log, and this fragment is from about an hour AFTER the last message was delivered to Glassfish. I believe that it is just due to some timeout of connections, which will be reestablished when needed. I don't think it has anything to do with the "Messages stops getting consumed" problem.

                                     

                                    Clearly there is a mixed picture of "how many undelivered messages" there are. Different things are shown by Admin console "Queues", Admin console "Queue DistributorWPQ" and the persisting database.

                                     

                                    More information can be requested if needed.

                                     

                                    We hope for some kind of help from FUSE.

                                    Thanks!
