Hello everyone. I've been using HornetQ 2.2.2 in what is essentially a production environment for more than a year without any issue.
Two days ago there was a release on an outside system that started sending slightly bigger XML messages to our hornetq server.
Our server has around 10 durable topics created at startup. The number of clients connected to those topics varies considerably, from 1 at night to 10-20 during certain hours of the day.
The issue that began after the outside system started sending the new XML format looks like this (taken from the producer thread's stack trace via JMX):
State: WAITING on java.util.concurrent.Semaphore$NonfairSync@1a27dbc
Total blocked: 723 Total waited: 72.765
Stack trace:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
    java.util.concurrent.Semaphore.acquire(Semaphore.java:441)
    org.hornetq.core.client.impl.ClientProducerCreditsImpl.acquireCredits(ClientProducerCreditsImpl.java:81)
    org.hornetq.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:305)
    org.hornetq.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:142)
    org.hornetq.jms.client.HornetQMessageProducer.doSend(HornetQMessageProducer.java:451)
    org.hornetq.jms.client.HornetQMessageProducer.send(HornetQMessageProducer.java:199)
    com.rhad.dataminer.parser.superclasses.AbstractResourceParserPublisher.sendJMSMessage(AbstractResourceParserPublisher.java:221)
    ...
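For context on what the thread is parked on: ClientProducerCreditsImpl guards each send with a semaphore of "credits" (roughly, bytes the client is still allowed to send), and the server stops handing credits back when the address is full and the policy is BLOCK. Below is a minimal, self-contained sketch of that mechanism; it is not HornetQ code, and the class name and byte counts are made up for illustration (a real client blocks in acquire() instead of using tryAcquire()):

```java
import java.util.concurrent.Semaphore;

public class CreditsDemo {
    // A real producer calls credits.acquire(messageBytes), which parks the
    // thread (the Unsafe.park frames in the stack trace) until the server
    // releases credits back. tryAcquire is used here so the demo terminates.
    static boolean trySend(Semaphore credits, int messageBytes) {
        return credits.tryAcquire(messageBytes);
    }

    public static void main(String[] args) {
        Semaphore credits = new Semaphore(1024); // stand-in for the producer window
        System.out.println(trySend(credits, 512)); // true: fits in the window
        System.out.println(trySend(credits, 512)); // true: window now exhausted
        System.out.println(trySend(credits, 512)); // false: a real producer would park here
        // ...until the address drains and the server grants credits back:
        credits.release(1024);
        System.out.println(trySend(credits, 512)); // true again
    }
}
```

If the address never drains (e.g. a slow subscriber on a full topic), the credits are never released and the producer stays parked forever, which matches the WAITING state above.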
The producer doesn't share non-thread-safe objects (like Session) between threads. Each Session in particular is created from a singleton Connection like this:
return connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
After yesterday's deadlock I took the chance to upgrade to HornetQ 2.2.14 on both the client and server side, but today the deadlock happened again.
One important thing: I'm using the following address settings (taken from the default standalone non-clustered configuration):
<address-settings>
   <!-- default for catch all -->
   <address-setting match="#">
      <dead-letter-address>jms.queue.DLQ</dead-letter-address>
      <expiry-address>jms.queue.ExpiryQueue</expiry-address>
      <redelivery-delay>0</redelivery-delay>
      <max-size-bytes>10485760</max-size-bytes>
      <message-counter-history-day-limit>10</message-counter-history-day-limit>
      <address-full-policy>BLOCK</address-full-policy>
   </address-setting>
</address-settings>
Could it be that the new, slightly bigger messages being published to the topics are causing the max-size-bytes limit to be reached? Should I change the policy from BLOCK to PAGE to prevent the issue?
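For reference, here is roughly what I understand the paging variant would look like, based on the default standalone config. The page-size-bytes value is just an example I picked (my understanding is it must be smaller than max-size-bytes); the rest is unchanged:

```xml
<address-settings>
   <address-setting match="#">
      <dead-letter-address>jms.queue.DLQ</dead-letter-address>
      <expiry-address>jms.queue.ExpiryQueue</expiry-address>
      <redelivery-delay>0</redelivery-delay>
      <max-size-bytes>10485760</max-size-bytes>
      <!-- size of each page file written to disk once max-size-bytes is hit -->
      <page-size-bytes>2097152</page-size-bytes>
      <address-full-policy>PAGE</address-full-policy>
   </address-setting>
</address-settings>
```

With PAGE, messages beyond the in-memory limit would be written to disk instead of blocking the producer, if I've understood the docs correctly.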
One more question, to help me understand how to debug this myself next time.
Let's take a currently deadlocked topic (I'm using JConsole):
Topic -> JMS -> MY_TOPIC: deliveringCount=486, messageCount=8357, messagesAdded=13581, nonDurableMessageCount=8357, nonDurableSubscriptionCount=2, subscriptionCount=2, durableSubscriptionCount=0, durableMessageCount=0
Now I've looked up the same topic under queue instead:
Queue -> Core -> jms.topic.MY_TOPIC: it has three sub-elements, two with random-looking ID names and one called jms.topic.MY_TOPIC.
Starting with the first:
26496bfd-ac2b-4859-82bd-c6032cd918ae -> consumerCount=1, deliveringCount=486, messageCount=8357, messagesAdded=8357
4b7bb57d-cc99-4dcb-838c-bd648e16b740 -> consumerCount=1, deliveringCount=0, messageCount=0, messagesAdded=5224
jms.topic.MY_TOPIC -> consumerCount=0, deliveringCount=0, messageCount=0, messagesAdded=0
I'd be really thankful if you could explain how to interpret this data, so I can debug such an issue by myself if it happens again.