5 Replies Latest reply on Oct 11, 2010 4:42 PM by clebert.suconic

    topic subscribers gone silent

    noky

      I've been running HornetQ 2.1.1 in a test environment for some time now.  Today, I saw a very strange (and troubling) problem: all the subscribers for a particular topic suddenly stopped receiving data.  The publishers had no problems publishing.  Restarting the subscribers had no effect: they connected to the server and started listening on the topic, but no data came in.  The subscribers never reported any exceptions via the JMS ExceptionListener facility.  Other topics, however, were working fine, and their clients received data normally.

       

      About 10 minutes after the failure, the HornetQ logs showed a lot of the following:

       

      [hornetq-failure-check-thread] 07:33:00,982 WARNING [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl]  Connection failure has been detected: Did not receive ping from /aaa.bbb.ccc.ddd:59984. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. The connection will now be closed. [code=3]
      [hornetq-failure-check-thread] 07:33:00,982 WARNING [org.hornetq.core.server.impl.ServerSessionImpl]  Client connection failed, clearing up resources for session db622b6c-bd63-11df-bb01-0030488a33d0
      [hornetq-failure-check-thread] 07:33:00,982 WARNING [org.hornetq.core.server.impl.ServerSessionImpl]  Cleared up resources for session db622b6c-bd63-11df-bb01-0030488a33d0

       

      The only way I could fix the problem was to restart the HornetQ server.  After startup, the logs showed over 49,000 instances of the following message:

       

      [Thread-19 (group:HornetQ-server-threads9420495-29769356)] 11:45:34,110 WARNING [org.hornetq.core.postoffice.impl.PostOfficeImpl]  Duplicate message detected - transaction will be rejected

       

      I'm worried this type of problem will happen again once we put HornetQ into production.  This seems like a major blocker.  I'd like to find out exactly what happened before official deployment time comes.

       

      Any ideas?  Is there anything I can do to help provide more clues?  I have JMX enabled; are there any properties I can examine if this happens again?

        • 1. Re: topic subscribers gone silent
          clebert.suconic

          You probably had a stalled consumer: messages built up to the max size, and nothing could be consumed after that.

           

          We are making some improvements to paging so that consumption won't be blocked in that case.

           

           

          I'm not sure about the messages on startup. It seems you're using duplicate detection and had messages redelivered somehow. I would need a test replicating it before I could say anything about it.

          • 2. Re: topic subscribers gone silent
            noky

            Thanks for your quick response, Clebert.  Is there a way to mitigate the problem of a stalled consumer, short of waiting for the new release with the paging changes?  Is there a way to detect which consumer is not consuming?  I do have paging enabled...

             

            Regarding those "Duplicate message detected" log messages at startup... I'm not actually using duplicate detection, though I see the following in section 37.5 of the manual:

            HornetQ also uses duplicate detection when paging messages to storage. This is so when a message is depaged from
            storage and server failure occurs, we do not end up depaging the message more than once which could result in
            duplicate delivery.

            Not sure how tens of thousands of duplicate messages are getting stored via paging...

            • 3. Re: topic subscribers gone silent
              clebert.suconic

              There's certainly something misconfigured in your system. The system is certainly paging even though you didn't intend it to. Look at your address-settings and fix them accordingly.

               

               

              Regarding detecting consumers, there are paging operations you can use through the JMX console. We are also improving that for the next release.
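
              As a rough illustration of the JMX route, here is a minimal sketch that lists queues whose backlog exceeds a threshold, which is one way a stalled subscription might show up. It is not taken from the HornetQ documentation: the object-name pattern and the MessageCount / ConsumerCount attribute names are assumptions to verify in jconsole for your version, and the class name, helper names and threshold are hypothetical. It also assumes the code runs in the same JVM as the server; a remote client would obtain the MBeanServerConnection through a JMX connector instead.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class BackloggedQueues {

    // Assumed naming scheme for HornetQ core queue MBeans; verify in jconsole.
    static final String QUEUE_PATTERN = "org.hornetq:module=Core,type=Queue,*";

    /** True when a queue holds more messages than the caller considers healthy. */
    static boolean isBacklogged(long messageCount, long threshold) {
        return messageCount > threshold;
    }

    /** Returns one report line per queue whose backlog exceeds the threshold. */
    static List<String> findBacklogged(MBeanServerConnection mbs, long threshold)
            throws Exception {
        List<String> report = new ArrayList<>();
        Set<ObjectName> names = mbs.queryNames(new ObjectName(QUEUE_PATTERN), null);
        for (ObjectName name : names) {
            // Attribute names are assumptions. Values are read as Number because
            // the numeric type may differ between versions.
            long messages = ((Number) mbs.getAttribute(name, "MessageCount")).longValue();
            long consumers = ((Number) mbs.getAttribute(name, "ConsumerCount")).longValue();
            if (isBacklogged(messages, threshold)) {
                report.add(name.getKeyProperty("name")
                        + " messages=" + messages + " consumers=" + consumers);
            }
        }
        return report;
    }

    public static void main(String[] args) throws Exception {
        // In-JVM lookup; the threshold of 10,000 is an arbitrary example value.
        MBeanServerConnection mbs = ManagementFactory.getPlatformMBeanServer();
        for (String line : findBacklogged(mbs, 10_000L)) {
            System.out.println(line);
        }
    }
}
```

              A subscription queue that keeps a large message count while its consumer count is non-zero would be a candidate for the stalled consumer described above.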

               

               

              You could also use Diverts so that each subscription has its own queue. Messages will be duplicated in memory, though.

              • 4. Re: topic subscribers gone silent
                noky

                Thanks.  Looking forward to the new release!  Might just wait until then before using HornetQ in a production setting, since configuring multiple Diverts seems pretty heavy-handed.

                 

                Just to clarify, I *do* have paging enabled, so it is not a misconfiguration.

                • 5. Re: topic subscribers gone silent
                  clebert.suconic

                  Right. At the moment this is documented in the User Manual. You probably saw that.

                   

                  http://hornetq.sourceforge.net/docs/hornetq-2.1.2.Final/user-manual/en/html/paging.html#d0e4868

                   

                   

                  Slow consumers are a burden on any messaging system out there. Each provider has a different solution for them (paging, not paging, etc.).

                   

                  We are improving things a bit for slow consumers on topics.

                   

                  But we always suggest taking care of slow consumers on topic subscriptions (or queues on the address, in core-API terms).  I mean, they should be the exception, not the rule; otherwise you will be using a messaging system like a database, and that's not usually a good combination.