3 Replies Latest reply on Sep 17, 2014 1:16 PM by jbertram

    App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.

    dr1985

      Environment: 
      HornetQ v230 on Linux x86
      Running with 2 clustered HornetQ servers.


      Problem: 
      A developer reported that our production HornetQ servers appeared to be losing messages.

      The developer could run a transaction that put a single message on a queue.

      Some of the messages he sent were processed normally, while others never showed up.


      jconsole showed:

      DeliveringCount = 0
      Message Count = 0
      Messages Added =  148195


      After doing some investigation, I noticed that instead of the 10 journals that are normally out there,
      we had 25 journals on one of our HornetQ servers...and the number of journals was growing.
      The other HornetQ server in the cluster still showed just 10 journals.


      Bouncing the Tomcat servers didn't help, so we bounced the HornetQ server with the 25 journals.


      When HornetQ restarted, the jconsole Message Count immediately showed there were over 13k messages queued. 

      Eventually all the messages were processed, but the fact that messages were being journaled while the
      message counters were not being incremented is obviously a serious concern.


      We have a QDepth monitor set up, but up 'til this point, we hadn't thought to monitor the number of
      journal files in existence.
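
      A journal-file monitor along these lines could sit next to the QDepth monitor. This is only a minimal sketch: the class name is hypothetical, and the "hornetq-data" prefix and "hq" extension are assumed defaults that should be checked against the journal settings in hornetq-configuration.xml.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: counts journal files so a monitor can alert on growth. */
final class JournalFileCount {

    /**
     * Counts regular files in journalDir named "<prefix>-*.<extension>".
     * Prefix and extension are parameters because both are configurable;
     * "hornetq-data" and "hq" are assumed defaults, not verified here.
     */
    static int countJournalFiles(Path journalDir, String prefix, String extension)
            throws IOException {
        String glob = prefix + "-*." + extension;
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(journalDir, glob)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

      A monitor would call `JournalFileCount.countJournalFiles(journalDir, "hornetq-data", "hq")` on a schedule and alert when the result stays above the usual 10.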


      I couldn't find anything regarding this problem, so I thought I'd check: is this a known issue?

        • 1. Re: App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.
          jbertram

          That problem doesn't ring any bells.  I take it this is not something you can reproduce reliably.

           

          You might consider moving to a later release.

          • 2. Re: App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.
            dr1985

            Thanks for the response.  We've got HornetQ 2.4 on our sandbox servers with plans to move forward with it.

             

            As for the problem, more information has come to light which I think might explain why messages were journaled, but the QDepth Counters were never incremented.

             

            Obviously I don't know the internals of HornetQ, so I may be totally off base.

             

            anyway...

             

            This problem actually spanned multiple (5) days! 

            Again, we run with 2 clustered HornetQ servers (HQ1 and HQ2).

             

             

            1. It looks like our problem started with a network issue last Thursday....and:

            the Bridge from HQ1 to HQ2 appears to have reconnected once the network was working again, but...

            the Bridge from HQ2 to HQ1 doesn't appear to have reconnected.

             

            HQ1 log messages:

            HQ212037: Connection failure has been detected: HQ119014: Did not receive data from....The connection will now be closed. [code=CONNECTION_TIMEDOUT]

            HQ222061: Client connection failed, clearing up resources for session 4faff486-32b8-11e4-be70-4525a9df527d

            HQ222107: Cleared up resources for session 4faff486-32b8-11e4-be70-4525a9df527d

            HQ221027: Bridge ClusterConnectionBridge@729021a5......is connected

             

             

            HQ2 log messages:

            HQ212037: Connection failure has been detected: HQ119011: Did not receive data from server...

            HQ222095: Connection failed with failedOver=false: HornetQException[errorType=CONNECTION_TIMEDOUT message=HQ119011: ...from HQ1

            HQ212037: Connection failure has been detected: ....The connection will now be closed

             

            The HQ2 log doesn't contain the "HQ221027....is connected" message.

             

             


            2. Now for the app that reported the problem...it had 2 consumers on HQ1, and no consumers running on HQ2.

             

            So, based on test observations from when an app has 2 consumers on HQ1, and no consumers on HQ2...AND when the Bridge is working,

            messages going to HQ2 would automatically be forwarded from HQ2 to HQ1 (where the consumers are), and the

            Message counters for the queue on HQ2 would never be incremented.

             

             

            3. Basically, I'm thinking:

             

            - HQ1 told HQ2 that HQ1 had consumers on the queue....(the HQ1 to HQ2 bridge is working)

            - HQ2 had no consumers on the queue, so it tries to forward the message to HQ1...but since

               the HQ2 to HQ1 bridge is down, HQ2 just ends up queuing the message on the Bridge-cluster-connection queue.
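
            The two bullets above can be written down as a toy model. This is only a sketch of the hypothesis, not HornetQ's actual routing code; every class, field, and method name below is made up for illustration.

```java
/** Toy model of the hypothesis above; this is NOT HornetQ's routing code. */
final class ClusterRoutingSketch {
    long localMessageCount = 0;     // stands in for the queue's Message Count on HQ2
    long storeAndForwardCount = 0;  // stands in for the bridge's internal queue on HQ2
    long deliveredToHq1 = 0;        // messages that actually reached HQ1

    final boolean localConsumers;        // consumers attached on HQ2
    final boolean remoteConsumersKnown;  // HQ1 has announced its consumers to HQ2
    boolean bridgeUp;                    // state of the HQ2 -> HQ1 bridge

    ClusterRoutingSketch(boolean localConsumers, boolean remoteConsumersKnown, boolean bridgeUp) {
        this.localConsumers = localConsumers;
        this.remoteConsumersKnown = remoteConsumersKnown;
        this.bridgeUp = bridgeUp;
    }

    /** Routes one message arriving on HQ2. */
    void send() {
        if (!localConsumers && remoteConsumersKnown) {
            storeAndForwardCount++;  // the local queue counter is never touched
            drain();
        } else {
            localMessageCount++;
        }
    }

    /** Moves parked messages to HQ1, but only while the bridge is connected. */
    void drain() {
        if (bridgeUp) {
            deliveredToHq1 += storeAndForwardCount;
            storeAndForwardCount = 0;
        }
    }
}
```

            With `new ClusterRoutingSketch(false, true, false)`, every `send()` grows `storeAndForwardCount` while `localMessageCount` stays at 0, which matches a queue that shows Message Count = 0 while the journal keeps growing.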

             

             

            I realize this is all just conjecture, but I'm trying to come up with a reasonable explanation for what we saw.

             

            ...if the problem were to reoccur, I would definitely check the Bridge-cluster-connection queue counters in the jconsole...that would answer this question for me.
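
            The same check could be scripted over JMX instead of clicking through jconsole. This is a minimal sketch using only the standard javax.management API; the ObjectName pattern "org.hornetq:module=Core,type=Queue,*" and the "sf." name fragment for the bridge queue are assumptions that should be confirmed against the MBean tree jconsole shows.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper: reads MessageCount from queue MBeans over JMX. */
final class QueueDepths {

    /**
     * Returns the MessageCount attribute of every MBean that matches
     * objectNamePattern and whose "name" key contains nameFragment.
     * Both arguments are assumptions about how the broker names its MBeans.
     */
    static Map<String, Long> messageCounts(MBeanServerConnection mbs,
                                           String objectNamePattern,
                                           String nameFragment)
            throws IOException, JMException {
        Map<String, Long> counts = new TreeMap<>();
        for (ObjectName on : mbs.queryNames(new ObjectName(objectNamePattern), null)) {
            String queueName = on.getKeyProperty("name");
            if (queueName == null || !queueName.contains(nameFragment)) {
                continue;  // not one of the queues we are looking for
            }
            Object value = mbs.getAttribute(on, "MessageCount");
            counts.put(queueName, ((Number) value).longValue());
        }
        return counts;
    }
}
```

            Given an `MBeanServerConnection` to HQ2, `QueueDepths.messageCounts(conn, "org.hornetq:module=Core,type=Queue,*", "sf.")` would list the depth of each matching queue; a steadily growing value there would support the explanation above.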

             

             

             

            4. Here's the cluster connection definition used by both HQ1 and HQ2. BTW, we've had the network drop and reconnect before without any issues:

             

             

            HQ1 and HQ2 use identical cluster-connection definitions in hornetq-configuration.xml:

             

            <!--##########################################################-->
            <!--#################### Cluster Connections #################-->
            <!--##########################################################-->
               <cluster-connections>
                  <cluster-connection name="xxxxx.cluster.connection">
                     <address>jms</address>
                     <connector-ref>primary-connector</connector-ref>
                     <retry-interval>10000</retry-interval>
                     <reconnect-attempts>999</reconnect-attempts>
                     <use-duplicate-detection>true</use-duplicate-detection>
                     <forward-when-no-consumers>false</forward-when-no-consumers>
                     <max-hops>1</max-hops>
                     <confirmation-window-size>1048576</confirmation-window-size>
                     <discovery-group-ref discovery-group-name="DiscoveryGroupxxxxxxxxx"/>
                  </cluster-connection>
               </cluster-connections>
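
            As a side calculation on the retry settings above: assuming attempts are spaced exactly retry-interval apart with no back-off multiplier (an assumption, not verified against this HornetQ version), 999 attempts at 10000 ms cover roughly 2 hours 46 minutes. The class name below is hypothetical, and nothing in the logs establishes whether this retry budget played any part in the outage.

```java
import java.time.Duration;

/** Hypothetical helper: how long a fixed-interval retry budget lasts. */
final class ReconnectWindow {

    /**
     * Total time covered by reconnectAttempts retries spaced
     * retryIntervalMillis apart. Assumes a constant interval (no multiplier).
     */
    static Duration totalRetryWindow(long retryIntervalMillis, int reconnectAttempts) {
        return Duration.ofMillis(Math.multiplyExact(retryIntervalMillis, (long) reconnectAttempts));
    }
}
```

            `ReconnectWindow.totalRetryWindow(10000, 999)` comes to 9990 seconds, far shorter than the five days over which the problem was observed.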

             

            ...any help is appreciated.

            • 3. Re: App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.
              jbertram

              That seems like a reasonable explanation.  A series of thread dumps and an inspection of the store-and-forward queue (as you suggested) would help confirm.