15 Replies Latest reply on Mar 24, 2010 12:42 PM by mlabudde

Pending messages not browsable/consumable, no more delivering, need restart

stremblay Jul 21, 2009 8:10 AM

Hi everyone,

I have hit a weird issue a few times now while stress-testing an application that uses ActiveMQ.

I am using Fuse message broker 5.3.0.1.

All of a sudden (sadly, I cannot give a scenario to reproduce it), ActiveMQ enters a non-functional state:

- The consumers stop being able to consume messages

- The producers are still able to put messages on the queues

- The web console shows pending messages (the count increases when producers put new messages on the queues)

- When clicking on any queue name with pending messages in the web console, the messages are NOT shown?!

- If using jconsole to call the browse function, no messages are shown

If I simply stop and restart ActiveMQ, everything comes back to normal. The pending messages are still there and regain their "browsability" in the web console and in jconsole, and if I connect a consumer it is now able to consume messages and everything works as expected.
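For anyone who wants to check for the same symptom programmatically rather than by clicking through jconsole, here is a minimal sketch that compares each queue's reported size with what its browse operation returns over JMX. It rests on assumptions that are not confirmed in this thread: that the broker exposes its queue MBeans under `org.apache.activemq:Type=Queue,...` with a `QueueSize` attribute and a no-argument `browse` operation (which is how I understand 5.3-era brokers to be laid out), and that remote JMX is enabled. The class name `QueueBrowseCheck` and its helpers are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class QueueBrowseCheck {
    // Assumption: ObjectName layout of queue MBeans on a 5.3-era broker.
    static final String QUEUE_PATTERN = "org.apache.activemq:Type=Queue,*";

    /** Maps queue name to {reported QueueSize, number of messages browse returned}. */
    static Map<String, long[]> compare(MBeanServerConnection conn) throws Exception {
        Map<String, long[]> result = new TreeMap<>();
        Set<ObjectName> queues = conn.queryNames(new ObjectName(QUEUE_PATTERN), null);
        for (ObjectName q : queues) {
            long size = ((Number) conn.getAttribute(q, "QueueSize")).longValue();
            // browse is expected to return an array of open-type message descriptions.
            Object browsed = conn.invoke(q, "browse", new Object[0], new String[0]);
            long visible = (browsed instanceof Object[]) ? ((Object[]) browsed).length : 0;
            String name = q.getKeyProperty("Destination");
            if (name == null) {
                name = q.getCanonicalName();
            }
            result.put(name, new long[] {size, visible});
        }
        return result;
    }

    /** The symptom described above: messages are counted as pending but none can be browsed. */
    static boolean looksStuck(long reportedSize, long browsable) {
        return reportedSize > 0 && browsable == 0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: QueueBrowseCheck <jmx-service-url>");
            return;
        }
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[0]));
        try {
            Map<String, long[]> queues = compare(connector.getMBeanServerConnection());
            for (Map.Entry<String, long[]> e : queues.entrySet()) {
                long[] v = e.getValue();
                System.out.println(e.getKey() + ": QueueSize=" + v[0] + " browsable=" + v[1]
                        + (looksStuck(v[0], v[1]) ? "  <-- pending but not browsable" : ""));
            }
        } finally {
            connector.close();
        }
    }
}
```

The check only flags a queue when nothing at all is browsable. A browse can legitimately return fewer messages than the pending count on a large queue, so a smaller number on its own does not indicate the problem.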

I have not seen this in production yet, but it's happening when stress-testing in our QA environment...

Is this a known issue, any hint / idea about this problem?

Thanks a lot,

Sylvain Tremblay

1. Re: Pending messages not browsable/consumable, no more delivering, need restart

lvisnick Jul 21, 2009 11:56 AM (in response to stremblay)

Sylvain,
No known issues are jumping out at me. Is it possible for you to do your stress testing with a newer version (5.3.0.3 has been out since Jun 30)?

Can you attach your activemq.xml for us to browse/see how you are configuring?

Finally, when you get into this nasty state, can you do a kill -3 and attach the stack?
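If sending a signal to the broker process is awkward, roughly the same information can be pulled over JMX. The following is a minimal sketch, not a replacement for the real `kill -3` output. It assumes remote JMX is enabled on the broker and that the JVM is Java 6 or later, since `dumpAllThreads` and `findDeadlockedThreads` do not exist in Java 5. The class name `BrokerThreadDump` is hypothetical.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;

class BrokerThreadDump {
    /** Builds a plain-text dump of all threads visible through the given JMX connection. */
    static String dump(MBeanServerConnection conn) throws IOException {
        ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);

        StringBuilder out = new StringBuilder();
        long[] deadlocked = threads.findDeadlockedThreads(); // null when no deadlock is found
        out.append("Deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length)
           .append('\n');

        // true, true: include locked monitors and locked ownable synchronizers.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Passing `ManagementFactory.getPlatformMBeanServer()` dumps the local JVM, which is handy for trying the sketch out. For the broker itself, the connection would come from a `JMXConnector` pointed at the broker's JMX service URL.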

Lorinda
2. Re: Pending messages not browsable/consumable, no more delivering, need restart

gseben Jul 22, 2009 10:51 AM (in response to stremblay)

Also, are you setting a time to live on these messages?
3. Re: Pending messages not browsable/consumable, no more delivering, need restart

janylj_lijun.yan Aug 10, 2009 4:17 PM (in response to stremblay)

I have exactly the same problem. It has been a mystery in our QA environment. Have you fixed your problem? Would you please shed some light on what causes this issue?

Thank you so much.
4. Re: Pending messages not browsable/consumable, no more delivering, need res

msmyers Sep 18, 2009 3:47 PM (in response to stremblay)

We experience this issue with ActiveMQ 2.5.0 in production twice per day with trivial traffic (2,000 messages).

Standard configuration, out of the box: Spring template, Jencks container, no transactions, TCP transport.

I was hoping the issue would be solved by switching to FUSE (perhaps it gets better QA?).
5. Re: Pending messages not browsable/consumable, no more delivering, need res

msmyers Sep 18, 2009 3:49 PM (in response to msmyers)

I should note that ActiveMQ had been working perfectly for 6 months without a restart, and I must now restart ActiveMQ daily. I am personally in charge of all changes, and I can tell you that no configuration file has changed, and our clients use the exact same Spring JmsTemplate in the same way.

This started on Monday of this week. I have no way to reliably reproduce this deadlock yet. Once I figure it out, I hope to create a fixable jira.
6. Re: Pending messages not browsable/consumable, no more delivering, need res

philippe_tseyen_philippe.tseyen Sep 29, 2009 10:05 AM (in response to msmyers)

Same problem over here. Usually happens every couple of days, with no clear indication.

Java Virtual Machine: Java HotSpot(TM) Server VM version 1.5.0.16 jinteg:03.09.09-09:59 IA64
Vendor: Hewlett-Packard Company

Operating System: HP-UX B.11.23
Architecture: IA64N
Number of processors: 6

When the issue occurs you can still publish on a queue, but consuming doesn't work anymore.

Topics still work as expected.

I'll try to attach a stacktrace next time it happens.

Philippe
7. Re: Pending messages not browsable/consumable, no more delivering, need restart

lvisnick Sep 29, 2009 11:44 AM (in response to stremblay)

Any/all of you who hit this problem: when you get into this nasty state, can you please do a kill -3 and attach the stack.

Once we have a stack, we will investigate.

Thank you,
Lorinda
8. Re: Pending messages not browsable/consumable, no more delivering, need restart

philippe_tseyen_philippe.tseyen Oct 5, 2009 11:42 AM (in response to lvisnick)
After 28 days without a restart, ActiveMQ seems to have hung again on one of our servers.

Attached you'll find the activemq.log, which includes a thread dump taken while the broker was blocked.

Also attached is our activemq.xml configuration.

ActiveMQMonitor is a program that we use to check that ActiveMQ is still up and running.

When debugging we noticed that producer.send() was hanging.

After a restart of activemq the monitor was working fine again.

If you need more information, feel free to contact me.

Philippe

ActiveMQMonitor.java 1.7 KB

activemq.xml 3.4 KB

activemq.log 641.5 KB
9. Re: Pending messages not browsable/consumable, no more delivering, need restart

lvisnick Oct 6, 2009 9:44 PM (in response to philippe_tseyen_philippe.tseyen)

I need to look at your thread dump some more, but one theory is that you are running out of file descriptors. Can you check whether you are using the system default, and then increase the limit?

I realize this is not reproducible on demand, so if you increase the value and get into this state again, we'll have to capture another thread dump and compare it with this one.

tx,
Lorinda
10. Re: Pending messages not browsable/consumable, no more delivering, need restart

philippe_tseyen_philippe.tseyen Oct 14, 2009 5:18 AM (in response to lvisnick)

We are running with 4096 as maximum number of open file descriptors. Currently FuseMQ uses around 500.

If we have another freeze, I'll monitor the number of open file descriptors and get back to you, but I think the number of file descriptors shouldn't be a problem unless there is a leak.
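To watch for a leak over time, the descriptor counts can also be read over JMX instead of from the OS. This is a minimal sketch under the assumption that the JVM's `java.lang:type=OperatingSystem` MBean exposes `OpenFileDescriptorCount` and `MaxFileDescriptorCount`, as HotSpot does on Unix-like systems. I have not checked that the HP-UX JVM does, so the sketch returns `null` when the attributes are missing. The class name `FdCheck` and the 80% threshold are hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.AttributeNotFoundException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

class FdCheck {
    /** Returns {open, max} file descriptor counts, or null if the JVM does not expose them. */
    static long[] openAndMax(MBeanServerConnection conn) throws Exception {
        ObjectName os = new ObjectName(ManagementFactory.OPERATING_SYSTEM_MXBEAN_NAME);
        try {
            long open = ((Number) conn.getAttribute(os, "OpenFileDescriptorCount")).longValue();
            long max = ((Number) conn.getAttribute(os, "MaxFileDescriptorCount")).longValue();
            return new long[] {open, max};
        } catch (AttributeNotFoundException e) {
            return null; // attributes not available on this JVM or platform
        }
    }

    /** True when usage has reached the given fraction of the limit, e.g. 0.8 for 80%. */
    static boolean nearLimit(long open, long max, double fraction) {
        return max > 0 && open >= fraction * max;
    }
}
```

With the numbers above, `nearLimit(500, 4096, 0.8)` is false, so a steady climb towards the limit between samples would be the thing to look for.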

Grtz,

Philippe
11. Re: Pending messages not browsable/consumable, no more delivering, need restart

philippe_tseyen_philippe.tseyen Oct 14, 2009 5:35 AM (in response to philippe_tseyen_philippe.tseyen)
Our administrator just informed me that our test environment froze. The number of open file descriptors is 538.

Attached you'll find a number of jconsole screenshots and a new thread dump.

This time we get a "ping timeout" in the program that I attached in one of the previous posts: we can send the message and we can register the listener, but we do not receive the message that we sent before.

I hope this is helpful,

Philippe

activemq.log 655.5 KB

freeze activemq.zip 313.9 KB
12. Re: Pending messages not browsable/consumable, no more delivering, need restart

mielket Oct 14, 2009 5:43 AM (in response to stremblay)

This seems to be the same issue as MB-545, which has just been resolved. The fix will be in the next release, which is due soon.
13. Re: Pending messages not browsable/consumable, no more delivering, need restart

philippe_tseyen_philippe.tseyen Oct 14, 2009 5:52 AM (in response to philippe_tseyen_philippe.tseyen)
Not sure this is valuable, but attached you'll find an ls -R of the data directory of our ActiveMQ installation.

Philippe

activemq_dirlist.log 18.0 KB
14. Re: Pending messages not browsable/consumable, no more delivering, need restart

steff Feb 23, 2010 4:29 AM (in response to stremblay)
We are not a paying customer yet, but would like to be if we can get help to get this problem solved.

We used to use ActiveMQ 5.3.0, but a few days ago we tried to use FUSE MQ 5.3.0.5 to see if it would be able to solve some of our problems.

It solves some of our problems but this "Messages stops getting consumed" problem still exists.

We are using Glassfish 2.1.1, FUSE MQ 5.3.0.5 and MySQL (something new). We have an application in Glassfish with MDBs driven by queues in FUSE MQ. It works fine, but sometimes we end up in a situation where (it seems) there are still unhandled messages left on the MDB queues, but they are not delivered to the MDBs. We have no 5-second drill to recreate the situation, but we have an "endurance" test that we are able to run, and then the "situation" eventually occurs (sometimes after a short time and sometimes after a long time).

I have run our "endurance"-test and established the "situation".

I have attached a zip file. It contains:
- activemq.xml
- stderr.log: standard error piped from FUSE MQ process
- stdout.log: standard output piped from FUSE MQ process. This also contains the threaddump triggered by "kill -3 " hours after the "situation" has been established.
- activemq.log
- WebAdminQueues.png: Picture of the Admin console "Queues" hours after the "situation" has been established. Shows 2 pending messages in MDB-queue "DistributorWPQ"
- DistributorWPQQueue.png: Picture of the Admin console "Queue DistributorWPQ" hours after the "situation" has been established. Shows NO pending messages!
- activemq_msgsInMySQL.png: A picture of the result of a query directly into the message table in MySQL hours after the "situation" has been established. Shows 11 pending messages in "DistributorWPQ" and 7 pending messages in "DecoderWPQ"
- FragmentOfGlassfishServer.log: Selected fragment of the server.log (Glassfish). There are actually almost no exceptions in the log. The only section of error-like things can be found in FragmentOfGlassfishServer.log, and this fragment is from about an hour AFTER the last message was delivered to Glassfish. I believe it is just due to some connection timeout, and that the connections will be reestablished when needed. I don't think it has anything to do with the "Messages stops getting consumed" problem.

Clearly there is a mixed picture of how many undelivered messages there are: the Admin console "Queues" page, the Admin console "Queue DistributorWPQ" page and the persistence database each show something different.

More information can be requested if needed.

We hope for some kind of help from FUSE.
Thanks!

MessagesStopsGettingConsumed.zip 403.4 KB