No known issues are jumping out at me. Is it possible for you to do your stress testing with a newer version (126.96.36.199 has been out since June 30)?
Can you attach your activemq.xml for us to browse/see how you are configuring?
Finally, when you get into this nasty state, can you do a kill -3 and attach the stack?
Also, are you setting a time to live on these messages?
I have exactly the same problem. It has been a mystery in our QA environment. Have you fixed your problem? Could you please shed some light on what causes this issue?
Thank you so much.
We experience this issue with ActiveMQ 2.5.0 in production twice per day with trivial traffic (2,000 messages).
Standard out-of-the-box configuration: Spring template, Jencks container, no transactions, TCP transport.
I was hoping the issue would be solved by switching to FUSE (perhaps it gets better QA?).
I should note that ActiveMQ had been working perfectly for 6 months without a restart, and I must now restart ActiveMQ daily. I am personally in charge of all changes, and I can tell you that no configuration file has changed, and our clients use the exact same Spring JmsTemplate in the same way.
This started on Monday of this week. I have no way to reliably reproduce this deadlock yet. Once I figure it out, I hope to file a Jira issue that can be fixed.
Same problem over here. Usually happens every couple of days, with no clear indication.
Java Virtual Machine: Java HotSpot(TM) Server VM version 188.8.131.52 jinteg:03.09.09-09:59 IA64
Vendor: Hewlett-Packard Company
Operating System: HP-UX B.11.23
Number of processors: 6
When the issue occurs you can still publish on a queue, but consuming doesn't work anymore.
Topics still work as expected.
I'll try to attach a stacktrace next time it happens.
Any/all of you who hit this problem: when you get into this nasty state, can you please do a kill -3 and attach the stack.
Once we have a stack, we will investigate.
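For anyone who cannot easily send kill -3 to the broker process, here is a rough sketch of collecting similar information from inside a JVM using the standard java.lang.management API. It only dumps the JVM it runs in, so it would have to run inside the broker process (or be adapted to a remote JMX connection, which is not shown). The class and method names are made up for illustration, and kill -3 is still the simplest option where it is available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of ActiveMQ: dumps the threads of the current JVM.
class ThreadDumpSketch {

    /** Returns a text dump of all live threads in this JVM, with full stack traces. */
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Only request lock details that this JVM can actually provide.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" waiting on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Returns how many threads the JVM reports as deadlocked (0 if none). */
    static int countDeadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads also covers java.util.concurrent locks, but is optional.
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length; // null means no deadlock was found
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
        System.out.println("Deadlocked threads: " + countDeadlockedThreads());
    }
}
```

A deadlock count of zero does not rule out a hang: threads can also be stuck waiting on something that never arrives, which is why the full stacks are still worth attaching.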
After 28 days without a restart, ActiveMQ seems to hang again on one of our servers.
Attached you'll find the activemq.log, which includes a thread dump taken while it was blocked.
Also attached is our activemq.xml configuration.
ActiveMQMonitor is a script that we use to check that ActiveMQ is still up and running.
When debugging we noticed that producer.send() was hanging.
After a restart of ActiveMQ the monitor was working fine again.
If you need more information, feel free to contact me.
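For readers without access to the attachment, the general idea of such a monitor can be sketched as a round-trip check: send a unique token, then wait a bounded time for it to come back. This is not the attached ActiveMQMonitor. The Channel interface and the in-memory implementation below are hypothetical stand-ins so the sketch runs without a broker; in a real check they would be backed by a JMS producer and consumer on a dedicated queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical round-trip health check; names are illustrative only.
class RoundTripPing {

    /** Stand-in for a queue we can send to and receive from. */
    interface Channel {
        void send(String body);
        String receive(long timeoutMillis); // returns null on timeout
    }

    /** In-memory channel, used only to make the sketch runnable without a broker. */
    static final class InMemoryChannel implements Channel {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final boolean deliver;

        InMemoryChannel(boolean deliver) { this.deliver = deliver; }

        public void send(String body) { queue.add(body); }

        public String receive(long timeoutMillis) {
            try {
                if (!deliver) {
                    // Simulate the hang: sends succeed but nothing is dispatched.
                    Thread.sleep(timeoutMillis);
                    return null;
                }
                return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    /** Sends a unique token and reports whether it came back within the timeout. */
    static boolean ping(Channel channel, long timeoutMillis) {
        String token = "ping-" + System.nanoTime();
        channel.send(token);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remaining <= 0) return false;       // ping timeout
            String got = channel.receive(remaining);
            if (got == null) return false;          // nothing delivered in time
            if (token.equals(got)) return true;     // our own message came back
            // Otherwise it is a stale ping from an earlier run; keep draining.
        }
    }
}
```

Matching on a unique token matters here: if an earlier ping is still sitting on the queue, a check that accepts any message could report the broker as healthy when the current message was never delivered.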
I need to look at your thread dump some more, but one theory is that you are running out of file descriptors. Can you check whether you are using the system default, and then increase the limit?
I realize this is not reproducible on demand - so if you increase the value and you get in this state again, we'll have to capture thread dump again and compare this one to that newly captured one.
We are running with 4096 as maximum number of open file descriptors. Currently FuseMQ uses around 500.
If we have another freeze, I'll monitor the number of open file descriptors and get back to you, but I think the number of file descriptors shouldn't be a problem unless there is a leak.
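As a side note on watching the descriptor count over time: on HotSpot-style JVMs on Unix the numbers can be read from inside the process through com.sun.management.UnixOperatingSystemMXBean. The sketch below assumes that bean is available (I have not checked the HP-UX JVM), and the class name and the 80% threshold are arbitrary choices for illustration. Like any in-process check, it reports on the JVM it runs in, so it would need to run in the broker or be adapted to remote JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper for spotting a file descriptor leak before the limit is hit.
class FileDescriptorCheck {

    /** Fraction of the descriptor limit in use, between 0.0 and 1.0 in normal cases. */
    static double usage(long open, long max) {
        if (max <= 0) throw new IllegalArgumentException("max must be positive");
        return (double) open / max;
    }

    /** True when usage is at or above the given threshold (for example 0.8). */
    static boolean nearLimit(long open, long max, double threshold) {
        return usage(open, max) >= threshold;
    }

    /** Returns {open, max} for this JVM, or null if the platform does not expose them. */
    static long[] currentCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // not a Unix-style JVM, or the bean is not provided
    }

    public static void main(String[] args) {
        long[] counts = currentCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this JVM");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]
                    + " nearLimit=" + nearLimit(counts[0], counts[1], 0.8));
        }
    }
}
```

With the figures mentioned in this thread (roughly 500 open out of 4096), usage is well under any sensible threshold, so logging the count periodically is mainly useful for seeing whether it creeps upward before a freeze.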
Our administrator just informed me that our test environment froze. The number of open file descriptors is 538.
Attached you'll find a number of jconsole screenshots and a new thread dump.
This time we get a "ping timeout" in the program that I attached in one of the previous posts: we can send the message and register the listener, but we never receive the message that we sent.
I hope this is helpful.
We are not a paying customer yet, but would like to be if we can get help to get this problem solved.
We used to use ActiveMQ 5.3.0, but a few days ago we tried to use FUSE MQ 184.108.40.206 to see if it would be able to solve some of our problems.
It solves some of our problems but this "Messages stops getting consumed" problem still exists.
We are using Glassfish 2.1.1, FUSE MQ 220.127.116.11 and MySQL (something new). We have an application in Glassfish with MDBs driven by queues in FUSE MQ. It works fine, but sometimes we end up in a situation where (it seems) there are still unhandled messages left on the MDB queues, but they are not delivered to the MDBs. We have no 5-second drill to recreate the situation, but we have an "endurance"-test that we are able to run, and then the "situation" eventually occurs (sometimes after a short time and sometimes after a long time).
I have run our "endurance"-test and established the "situation".
I have attached a zip file. It contains:
- stderr.log: standard error piped from FUSE MQ process
- stdout.log: standard output piped from FUSE MQ process. This also contains the threaddump triggered by "kill -3 " hours after the "situation" has been established.
- WebAdminQueues.png: Picture of the Admin console "Queues" hours after the "situation" has been established. Shows 2 pending messages in MDB-queue "DistributorWPQ"
- DistributorWPQQueue.png: Picture of the Admin console "Queue DistributorWPQ" hours after the "situation" has been established. Shows NO pending messages!
- activemq_msgsInMySQL.png: A picture of the result of a query directly into the message table in MySQL hours after the "situation" has been established. Shows 11 pending messages in "DistributorWPQ" and 7 pending messages in "DecoderWPQ"
- FragmentOfGlassfishServer.log: Selected fragment of the server.log (Glassfish). There are actually almost no exceptions in the log. The only error-like section can be found in FragmentOfGlassfishServer.log, and this fragment is from about an hour AFTER the last message was delivered to Glassfish. I believe it is just due to some connection timeout, and that the connections will be reestablished when needed. I don't think it has anything to do with the "Messages stops getting consumed" problem.
Clearly there is a mixed picture of "how many undelivered messages" there are. Different things are shown by Admin console "Queues", Admin console "Queue DistributorWPQ" and the persisting database.
More information can be requested if needed.
We hope for some kind of help from FUSE.
MessagesStopsGettingConsumed.zip 403.4 K