producers blocked
noky Aug 12, 2010 11:44 AM
We're testing HornetQ 2.1.1.Final, getting ready to put it into production. However, every so often we've seen a disconcerting problem where producers on certain hosts become blocked. The only way to fix the problem is to restart the hornetq server. I searched the discussions and the problem is somewhat similar to this thread (http://community.jboss.org/message/537666), but there are some key differences.
First off, we have two network segments; let's call them A and B. The hornetq server is on network A. Network B is a cluster of webservers behind a load balancer (thus, connections from apps running on servers in network B appear to come from the same IP address). There are producer apps on both networks A and B. When the problem happens, all producers on network B become blocked in org.hornetq.jms.client.HornetQMessageProducer.send(). Producers on network A continue to publish just fine.
When the blocked producer problem happens, the only thing that fixes it is restarting the hornetq server: producers on network B then reconnect and continue on their merry way. Restarting the producer applications on network B (running in Tomcat) has no effect. The producers reconnect and get blocked again.
The message throughput for our application is fairly low, peaking around 40 msgs/sec. Messages are about 250 bytes. Messages are not persistent. We are only making use of JMS topics, not queues. Given this usage scenario, I would not expect the hornetq server to block producers based on flow control policies. Am I missing something here?
A stack dump of the hung producer looks like this:
"AVLParserPublisher" daemon prio=10 tid=0x3e35e400 nid=0x1f47 waiting on condition [0x4678c000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x7ed6f7b8> (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:441)
    at org.hornetq.core.client.impl.ClientProducerCreditsImpl.acquireCredits(ClientProducerCreditsImpl.java:67)
    at org.hornetq.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:303)
    at org.hornetq.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:139)
    at org.hornetq.jms.client.HornetQMessageProducer.doSend(HornetQMessageProducer.java:451)
    at org.hornetq.jms.client.HornetQMessageProducer.send(HornetQMessageProducer.java:199)
    at com.mycompany.JMSPublisher.publish(JMSPublisher.java:142)
    - locked <0x7ecd2e08> (a com.mycompany.VehicleReportJMSPublisher)
    at com.mycompany.ParserPublisher.mainLoop(ParserPublisher.java:207)
    at com.mycompany.ParserPublisher.access$000(ParserPublisher.java:28)
    at com.mycompany.ParserPublisher$1.runSafe(ParserPublisher.java:105)
    at com.mycompany.SafeThread.run(SafeThread.java:32)
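The trace shows the thread parked in Semaphore.acquire() underneath ClientProducerCreditsImpl.acquireCredits(), which suggests the producer is waiting for send credits that never arrive. To illustrate the pattern I think I'm looking at, here is a minimal sketch of a semaphore-backed credit gate. This is not HornetQ code: the class CreditGateSketch and its methods trySend and receiveCredits are hypothetical names made up for illustration, and the sketch uses a timed acquire so it cannot hang the way the real producer does.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a credit-based producer gate; NOT HornetQ's implementation.
class CreditGateSketch {
    // Each permit represents one byte of send credit (an assumption for this sketch).
    private final Semaphore credits;

    CreditGateSketch(int initialCredits) {
        this.credits = new Semaphore(initialCredits);
    }

    // Simulates the server granting more credits to the producer.
    void receiveCredits(int amount) {
        credits.release(amount);
    }

    // Tries to take enough credits for one message. Returns false if the
    // credits do not arrive within the timeout (or the thread is interrupted).
    // An untimed acquire() here would park indefinitely, as in the dump above.
    boolean trySend(int messageBytes, long timeoutMillis) {
        try {
            return credits.tryAcquire(messageBytes, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int availableCredits() {
        return credits.availablePermits();
    }

    public static void main(String[] args) {
        CreditGateSketch gate = new CreditGateSketch(500);
        System.out.println(gate.trySend(250, 100)); // true
        System.out.println(gate.trySend(250, 100)); // true
        System.out.println(gate.trySend(250, 100)); // false: credits exhausted, none granted
        gate.receiveCredits(250);
        System.out.println(gate.trySend(250, 100)); // true again after a grant
    }
}
```

If the real client works along these lines, a producer would stay parked for as long as no new credits reach it, regardless of how low the message rate is.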
Any ideas? I was hoping for some clues in the hornetq logs but didn't see anything about blocking producers or flow-control kicking in.