18 Replies Latest reply on Jan 30, 2017 4:35 PM by jbertram

    producers blocked

    noky

      We're testing HornetQ 2.1.1.Final and getting ready to put it into production.  However, every so often we've seen a disconcerting problem where producers on certain hosts become blocked.  The only way to fix the problem is to restart the hornetq server.  I searched the discussions and the problem is somewhat similar to this one (http://community.jboss.org/message/537666), but there are some key differences.

       

      First off, we have two network segments, let's call them A and B.  The hornetq server is on network A.  Network B is a cluster of webservers behind a load balancer (thus, connections from apps running on servers in network B appear to come from the same IP address).  There are producer apps on both networks A and B.  When the problem happens, all producers on network B become blocked in org.hornetq.jms.client.HornetQMessageProducer.send().  Producers on network A continue to publish just fine.

       

      When the blocked producer problem happens, the only thing that fixes it is restarting the hornetq server: producers on network B then reconnect and continue on their merry way.  Restarting the producer applications on network B (running in Tomcat) has no effect.  The producers reconnect and get blocked again.

       

      The message throughput for our application is fairly low, peaking around 40 msgs/sec.  Messages are about 250 bytes.  Messages are not persistent.  We are only making use of JMS topics, not queues.  Given this usage scenario, I would not expect the hornetq server to block producers based on flow control policies.  Am I missing something here?

       

      A stack dump of the hung producer looks like this:

       

      "AVLParserPublisher" daemon prio=10 tid=0x3e35e400 nid=0x1f47 waiting on condition [0x4678c000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x7ed6f7b8> (a java.util.concurrent.Semaphore$NonfairSync)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
              at java.util.concurrent.Semaphore.acquire(Semaphore.java:441)
              at org.hornetq.core.client.impl.ClientProducerCreditsImpl.acquireCredits(ClientProducerCreditsImpl.java:67)
              at org.hornetq.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:303)
              at org.hornetq.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:139)
              at org.hornetq.jms.client.HornetQMessageProducer.doSend(HornetQMessageProducer.java:451)
              at org.hornetq.jms.client.HornetQMessageProducer.send(HornetQMessageProducer.java:199)
              at com.mycompany.JMSPublisher.publish(JMSPublisher.java:142)
              - locked <0x7ecd2e08> (a com.mycompany.VehicleReportJMSPublisher)
              at com.mycompany.ParserPublisher.mainLoop(ParserPublisher.java:207)
              at com.mycompany.ParserPublisher.access$000(ParserPublisher.java:28)
              at com.mycompany.ParserPublisher$1.runSafe(ParserPublisher.java:105)
              at com.mycompany.SafeThread.run(SafeThread.java:32)

       

      Any ideas?  I was hoping for some clues in the hornetq logs but didn't see anything about blocking producers or flow-control kicking in.

        • 1. Re: producers blocked
          timfox

          The stack trace clearly shows the producer is waiting to receive more credits from the server.

           

          If it's not receiving any it means the amount of memory taken by the messages for the addresses in question has been exceeded.
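To make the stack trace easier to read, here is a minimal sketch of credit-based flow control built on a plain java.util.concurrent.Semaphore. It is a simplified, hypothetical model (the class ProducerCreditsSketch and its methods are invented for illustration) and not HornetQ's actual ClientProducerCreditsImpl. The trace above shows a plain Semaphore.acquire(), which parks until credits arrive; the sketch uses a timed tryAcquire only so that the "no credits" case is observable instead of hanging.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Simplified, hypothetical model of producer credits; NOT HornetQ's real implementation. */
final class ProducerCreditsSketch {
    // One permit stands for one byte of credit granted by the (modelled) server.
    private final Semaphore credits;

    ProducerCreditsSketch(int initialCredits) {
        this.credits = new Semaphore(initialCredits);
    }

    /** Models the server granting more credits, e.g. after consumers acknowledge messages. */
    void receiveCredits(int bytes) {
        credits.release(bytes);
    }

    /**
     * Tries to reserve credits for one message.
     * Returns false if the credits do not arrive within the timeout
     * (a blocking acquire() would park the thread here indefinitely).
     */
    boolean trySend(int messageBytes, long timeoutMillis) {
        try {
            return credits.tryAcquire(messageBytes, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    int availableCredits() {
        return credits.availablePermits();
    }

    public static void main(String[] args) {
        ProducerCreditsSketch producer = new ProducerCreditsSketch(500);
        System.out.println("first send:  " + producer.trySend(250, 100));
        System.out.println("second send: " + producer.trySend(250, 100));
        System.out.println("third send:  " + producer.trySend(250, 100)); // no credits left
        producer.receiveCredits(250); // the modelled server frees up space
        System.out.println("after grant: " + producer.trySend(250, 100));
    }
}
```

In this model a producer stays stuck for as long as nothing calls receiveCredits, which matches the symptom described: restarting the producer does not help if the server side still considers the address full.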

          • 2. Re: producers blocked
            timfox

            Since you haven't posted your flow control config, it is hard to comment further.

            • 3. Re: producers blocked
              noky

              We are using the default flow control settings in <address-settings>

               

              Does it not matter that we are not using queues, only topics?  My assumption was that producers should not get blocked since the server should not be waiting for consumers to drain messages from queues.  Apparently, this is not the case.

               

              Also, it seems that having multiple producers effectively behind a single IP address is confusing the hornetq server, since it cannot tell that connections are actually from apps on different servers.  How do we get around this?

              • 5. Re: producers blocked
                noky

                Thanks for the link.  Yes, I have read the chapter on paging in the past.  We currently don't have paging enabled.  Given our usage pattern, I would not expect the server to run out of memory.

                 

                I still don't understand how our JMS producers can get blocked, since they are publishing non-persistent messages to a topic.  Consumers are not using durable subscriptions.  Message throughput is fairly low.  There is no queue to get filled up with messages, hornetq should not be reaching its memory limit.  What am I missing here?

                • 6. Re: producers blocked
                  clebert.suconic

                  I believe the default setting is blocking. That's why I sent you that link.

                  • 7. Re: producers blocked
                    timfox

                     

                    I still don't understand how our JMS producers can get blocked, since they are publishing non-persistent messages to a topic.  Consumers are not using durable subscriptions.  Message throughput is fairly low.  There is no queue to get filled up with messages, hornetq should not be reaching its memory limit.  What am I missing here?

                    Yes there is a queue. In fact there are multiple queues. Each subscription on your topic is a queue, and when the combined size of unacknowledged messages in all the subscriptions in that topic exceeds the value you have configured, the producers will block.

                     

                    When you consume and ack your messages then producers will unblock.

                     

                    The default size is given by:

                     

                    <address-settings>
                       <!--default for catch all-->
                       <address-setting match="#">
                          <dead-letter-address>jms.queue.DLQ</dead-letter-address>
                          <expiry-address>jms.queue.ExpiryQueue</expiry-address>
                          <redelivery-delay>0</redelivery-delay>
                          <max-size-bytes>10485760</max-size-bytes>
                          <message-counter-history-day-limit>10</message-counter-history-day-limit>
                          <address-full-policy>BLOCK</address-full-policy>
                       </address-setting>
                    </address-settings>

                     

                    i.e. it's 10 MiB

                     

                    You can always increase this value. The semantics of blocking producers are explained in the user manual.
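As a rough back-of-the-envelope check, the sketch below estimates how long a 10485760-byte limit would last if messages were added at the peak rate from the original post (about 250 bytes each, 40 msgs/sec) and nothing was acknowledged. The class and method names are invented for illustration, and the calculation counts payload bytes only. The server's real per-message memory footprint is larger (headers and bookkeeping), so treat the result as an upper bound under those assumptions.

```java
/** Hypothetical helper for a rough estimate; the names are illustrative only. */
final class AddressFillEstimate {

    /** Seconds until maxSizeBytes is reached if no message is ever acknowledged. */
    static double secondsUntilFull(long maxSizeBytes, int bytesPerMessage, double messagesPerSecond) {
        double bytesPerSecond = bytesPerMessage * messagesPerSecond;
        return maxSizeBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        // Figures taken from the thread: 10 MiB limit, ~250-byte messages, peak of 40 msgs/sec.
        double seconds = secondsUntilFull(10_485_760L, 250, 40.0);
        System.out.println("seconds until full: " + seconds);
        System.out.println("minutes until full: " + seconds / 60.0);
    }
}
```

With those inputs the estimate comes to roughly 1049 seconds, or about 17.5 minutes, so even a low-throughput topic could reach the limit fairly quickly if a subscription stopped acknowledging.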

                    • 8. Re: producers blocked
                      noky

                      Aha, that explains it.  Thank you Tim for your timely response and all the hard work you put into HornetQ!  We likely need to turn on paging and perform some additional monitoring.  I will experiment with smaller values for MessageProducer.setTimeToLive(), as this should prevent subscriber queues from getting out of hand.

                       

                      Looks like I also need to find the badly behaved subscribers that are not processing messages in a timely fashion.  I enabled the JMX connector so I can monitor HornetQ with jconsole, very slick!  However, I cannot seem to identify specific subscribers, which have all called Connection.setClientID()... One last question: Does HornetQ not expose the clientID in the JMX interface?  When I look at org.hornetq.Queue.Core.my-topic.connection  I can only see Name values like "2942f8e3-579f-4643-af96-aead012f5ce0" and ID values like "32816238".

                      • 9. Re: producers blocked
                        jmesnil

                        Mike Charnoky wrote:

                         

                        Looks like I also need to find the badly behaved subscribers that are not processing messages in a timely fashion.  I enabled the JMX connector so I can monitor HornetQ with jconsole, very slick!  However, I cannot seem to identify specific subscribers, which have all called Connection.setClientID()... One last question: Does HornetQ not expose the clientID in the JMX interface?  When I look at org.hornetq.Queue.Core.my-topic.connection  I can only see Name values like "2942f8e3-579f-4643-af96-aead012f5ce0" and ID values like "32816238".

                        Have a look at the TopicControl MBean. You'll be able to list all the durable subscriptions (which have a client ID).

                        • 10. Re: producers blocked
                          noky

                          Upon further investigation, I think I have found a problem with HornetQ.  The blocked producer problem happened again today; a thread dump of the Tomcat app confirms the producers are all stuck in HornetQMessageProducer.send().  This time I used jconsole to check the client queues associated with the topic.  Here's the kicker: there were 4 subscribers for the topic (whose publishers were blocked) and all 4 showed zero messages in their queues!

                           

                          I am looking at MBeans: org.hornetq.Queue.Core.jms.MYTOPIC.CLIENTNAME (where CLIENTNAME is something like: 0148a21b-c994-49c6-ad06-23e2da27bd89).  There are 4 of these MBeans, one for each subscriber.  The MessageCount for each MBean is 0 and the MessagesAdded never changes.

                           

                          Is there another reason producers would block if the subscribers have drained all the messages in their respective queues?  Is there any additional debugging I can perform to get to the bottom of this?  I have left the system in the current blocked state.
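For anyone who would rather script this check than click through jconsole, here is a small sketch that uses only the standard javax.management API to read one attribute from every MBean matching an ObjectName pattern. The attribute names MessageCount and MessagesAdded are the ones mentioned above. The class name QueueDepthDump is made up, and both the JMX service URL and the pattern are placeholders you would need to supply. A pattern such as org.hornetq:module=Core,type=Queue,* is only an assumption about how the queue MBeans are named, so check the exact names in jconsole first.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper: dumps one attribute for all MBeans matching a pattern. */
final class QueueDepthDump {

    /** Returns canonical MBean name -> attribute value for every MBean matching the pattern. */
    static Map<String, Object> readAttribute(MBeanServerConnection conn, String pattern, String attribute) {
        Map<String, Object> values = new TreeMap<>();
        try {
            for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
                values.put(name.getCanonicalName(), conn.getAttribute(name, attribute));
            }
        } catch (IOException | JMException e) {
            throw new IllegalStateException("JMX query failed for " + pattern, e);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: QueueDepthDump <jmx-service-url> <object-name-pattern>");
            return;
        }
        // args[0] is the JMX service URL of the server; it depends on how JMX remoting is configured.
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[0]))) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            for (String attribute : new String[] {"MessageCount", "MessagesAdded"}) {
                System.out.println(attribute + ": " + readAttribute(conn, args[1], attribute));
            }
        }
    }
}
```

Running it periodically while the producers are blocked would show whether MessagesAdded is really frozen on every subscription queue, without relying on a manual refresh in jconsole.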

                          • 11. Re: producers blocked
                            clebert.suconic

                            "and all 4 showed zero messages in their queues"

                             

                            No pending ACKs or transactions?

                             

                             

                             

                            We would need a test case, but this seems too basic a scenario to be a bug; most likely it is a configuration issue.

                             

                            We would need something more concrete to help you.. some way to reproduce the issue.

                             

                             

                            Otherwise you would have to do your own debugging.  If you can only replicate this in your own environment, you could add logging to ClientProducerCredit to see when credits are received.

                            • 12. Re: producers blocked
                              noky

                              We are not using transactions at all.  Not sure how to ascertain whether there are any ACKs pending... is there an MBean which shows this information?

                               

                              Not sure I can reproduce this in a simple test case; it happens randomly after our application runs for a while (somewhere between a day and several weeks).  Here's the odd part: once the problem gets triggered, if I start up a new producer and start publishing messages, the first message actually gets published and received by subscribers, but the producer still blocks in MessageProducer.send().  Also, the producers still stay blocked even when all the subscribers disconnect from the server!

                               

                              The hornetq configuration is pretty close to the out-of-the-box "standalone/clustered" config.  I have changed some port numbers, logging levels, etc.  Nothing major.  We have two hornetq servers in the cluster.  I tried shutting down the second hornetq server when the problem happens but this does not alleviate things.

                               

                              Hopefully this information provides some clues...  Will look into adding credit logging to ClientProducerCredit...

                              • 13. Re: producers blocked
                                clebert.suconic

                                What I was asking is whether you're properly acknowledging your messages in your code. Which ACK mode are your consumers using?

                                • 14. Re: producers blocked
                                  noky

                                  The consumers all use AUTO_ACKNOWLEDGE.
