
    Hornetq JMS crashes with error [org.hornetq.jms.server] (Periodic Recovery) HQ122016: Error in XA Recovery: javax.transaction.xa.XAException

    abi.chavan

      We have a cluster setup, and randomly one of our nodes stops responding because HornetQ JMS crashes with the error "[org.hornetq.jms.server] (Periodic Recovery) HQ122016: Error in XA Recovery: javax.transaction.xa.XAException".
      This happens frequently, roughly every 3-4 weeks.

      We have about 5 queues on each node, of which 2 always contain messages in the range of 20K.

      For now we are relying on the JBoss default settings; we have not configured anything beyond that.

      Can somebody point me to the likely reason behind this and how to tackle such a situation?

      Can the settings below help us in this situation?
      ===============================

      # Allow a 25MB UDP receive buffer for JGroups

      net.core.rmem_max = 26214400

      # Allow a 1MB UDP send buffer for JGroups

      net.core.wmem_max = 1048576
      ================================

       

      We are using the following versions:
      -----------------------------------------------

      JBossAS......... 7.2.x.slim.incremental.16
      HornetQ......... 2.3.1.Final (Wild Hornet, 123)

        • 1. Re: Hornetq JMS crashes with error [org.hornetq.jms.server] (Periodic Recovery) HQ122016: Error in XA Recovery: javax.transaction.xa.XAException
          jbertram

          We have a cluster setup, and randomly one of our nodes stops responding because HornetQ JMS crashes with the error "[org.hornetq.jms.server] (Periodic Recovery) HQ122016: Error in XA Recovery: javax.transaction.xa.XAException".

          I wouldn't expect a problem with periodic recovery to cause the entire broker to crash and/or stop responding.  My guess is that this error is a symptom of the problem rather than the cause.  I often see this kind of error in the logs when the broker is working fine.  The error in XA recovery typically just means that the recovery manager has lost contact with a remote resource manager that was involved in an XA transaction.

           

          Can the settings below help us in this situation?

          ===============================

          # Allow a 25MB UDP receive buffer for JGroups

          net.core.rmem_max = 26214400

          # Allow a 1MB UDP send buffer for JGroups

          net.core.wmem_max = 1048576
          ================================

          These appear to be JGroups settings, which wouldn't apply to HornetQ.  I don't see how they could be helpful for you.

           

          We are using the following versions:

          -----------------------------------------------

          JBossAS......... 7.2.x.slim.incremental.16
          HornetQ......... 2.3.1.Final (Wild Hornet, 123)

          This software is pretty old at this point.  I'd recommend you migrate to a recent release.

          • 2. Re: Hornetq JMS crashes with error [org.hornetq.jms.server] (Periodic Recovery) HQ122016: Error in XA Recovery: javax.transaction.xa.XAException
            abi.chavan

            I wouldn't expect a problem with periodic recovery to cause the entire broker to crash and/or stop responding.  My guess is that this error is a symptom of the problem rather than the cause.  I often see this kind of error in the logs when the broker is working fine.  The error in XA recovery typically just means that the recovery manager has lost contact with a remote resource manager that was involved in an XA transaction.

            => Does this mean my assumption about the HornetQ JMS crash is wrong?  Is there any possibility that this node lost its connection with the cluster?  There are no new requests on the node, just the warnings below:

            15:12:36,102 WARN  [org.hornetq.jms.server] (Periodic Recovery) HQ122013: Error in XA Recovery recover

            15:32:48,601 WARN  [org.hornetq.jms.server] (Periodic Recovery) HQ122016: Error in XA Recovery

            As you can see, there is nothing in between these two warnings.  We have to restart this node to reconnect it with the cluster; after that it works fine.

             

            I have the following questions now:

            1) Can a large number of messages (around 15K continuously) in a JMS queue cause this problem?
            2) Is the node becoming unreachable from the cluster related to the JGroups settings, i.e. losing its connection under load?  In this case we got the following warnings before the XA recovery errors:

            15:44:48,329 WARNING [org.jgroups.protocols.UDP] (ServerService Thread Pool -- 50) JGRP000015: the send buffer of socket MulticastSocket was set to 640KB, but the OS only allocated 124.93KB. This might lead to performance problems. Please set your max send buffer in the OS correctly (e.g. net.core.wmem_max on Linux)

            15:44:48,329 WARNING [org.jgroups.protocols.UDP] (ServerService Thread Pool -- 50) JGRP000015: the receive buffer of socket MulticastSocket was set to 20MB, but the OS only allocated 124.93KB. This might lead to performance problems. Please set your max receive buffer in the OS correctly (e.g. net.core.rmem_max on Linux)

            • 3. Re: Hornetq JMS crashes with error [org.hornetq.jms.server] (Periodic Recovery) HQ122016: Error in XA Recovery: javax.transaction.xa.XAException
              jbertram

              Does this mean my assumption about the HornetQ JMS crash is wrong?  Is there any possibility that this node lost its connection with the cluster?  There are no new requests on the node, just the warnings below:

              15:12:36,102 WARN [org.hornetq.jms.server] (Periodic Recovery) HQ122013: Error in XA Recovery recover

              15:32:48,601 WARN [org.hornetq.jms.server] (Periodic Recovery) HQ122016: Error in XA Recovery

              As you can see, there is nothing in between these two warnings.  We have to restart this node to reconnect it with the cluster; after that it works fine.

              There's really not enough information to tell what happened.  Have you gathered thread dumps to see what's happening within the application server (and HornetQ)?  Have you used the JBoss CLI or a JMX tool to investigate the health of the broker?  The broker doesn't log anything during normal operation after start-up so the lack of log entries doesn't indicate a problem to me.
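
              (If it helps as a starting point, below is a rough sketch of checking the broker over JMX from a standalone Java client.  The service URL, the credentials, and the "org.hornetq:module=Core,type=Server" object name are assumptions based on a typical HornetQ 2.x on AS 7 setup, so verify them against your own server (e.g. in JConsole) before relying on them; the remoting-jmx protocol also needs the JBoss client libraries on the classpath.)

              ===============================
              import java.util.HashMap;
              import java.util.Map;
              import javax.management.MBeanServerConnection;
              import javax.management.ObjectName;
              import javax.management.remote.JMXConnector;
              import javax.management.remote.JMXConnectorFactory;
              import javax.management.remote.JMXServiceURL;

              public class BrokerHealthCheck {
                  public static void main(String[] args) throws Exception {
                      // Hypothetical host, port, and credentials -- adjust to your environment.
                      // AS 7 exposes JMX over the native management port via remoting-jmx.
                      JMXServiceURL url = new JMXServiceURL("service:jmx:remoting-jmx://localhost:9999");
                      Map<String, Object> env = new HashMap<String, Object>();
                      env.put(JMXConnector.CREDENTIALS, new String[] { "admin", "password" });

                      JMXConnector connector = JMXConnectorFactory.connect(url, env);
                      try {
                          MBeanServerConnection mbsc = connector.getMBeanServerConnection();

                          // Default object name of the HornetQ core server control in 2.x
                          // (verify with an MBean browser if your installation differs).
                          ObjectName server = new ObjectName("org.hornetq:module=Core,type=Server");

                          System.out.println("Started:          " + mbsc.getAttribute(server, "Started"));
                          System.out.println("Version:          " + mbsc.getAttribute(server, "Version"));
                          System.out.println("Connection count: " + mbsc.getAttribute(server, "ConnectionCount"));
                      } finally {
                          connector.close();
                      }
                  }
              }
              ===============================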

               

              Can a large number of messages (around 15K continuously) in a JMS queue cause this problem?

              In general, you want to keep your message count as low as possible since a message broker is not a database, but the number of messages in a queue is just a small part of the whole system.  How large are the messages?  Is the address paging?  How many consumers are there?  How many messages are in delivery?  Are there long GC pauses impacting the JVM?
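
              (And a minimal sketch of pulling those queue metrics over JMX is below.  The "org.hornetq:module=JMS,type=Queue,name=..." object name pattern and the attribute names are what I'd expect from the JMSQueueControl in HornetQ 2.x, and "auditQueue" is just a placeholder queue name, so double-check both against your server in an MBean browser first.)

              ===============================
              import javax.management.MBeanServerConnection;
              import javax.management.ObjectName;
              import javax.management.remote.JMXConnectorFactory;
              import javax.management.remote.JMXServiceURL;

              public class QueueStats {
                  public static void main(String[] args) throws Exception {
                      // Hypothetical management endpoint; add credentials as in the earlier
                      // sketch if management security is enabled.
                      JMXServiceURL url = new JMXServiceURL("service:jmx:remoting-jmx://localhost:9999");
                      MBeanServerConnection mbsc = JMXConnectorFactory.connect(url).getMBeanServerConnection();

                      // JMSQueueControl MBean for a JMS queue named "auditQueue" (placeholder).
                      ObjectName queue = new ObjectName("org.hornetq:module=JMS,type=Queue,name=\"auditQueue\"");

                      // Queue depth, number of consumers, and messages currently in delivery.
                      System.out.println("MessageCount:    " + mbsc.getAttribute(queue, "MessageCount"));
                      System.out.println("ConsumerCount:   " + mbsc.getAttribute(queue, "ConsumerCount"));
                      System.out.println("DeliveringCount: " + mbsc.getAttribute(queue, "DeliveringCount"));
                  }
              }
              ===============================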

               

              Is the node becoming unreachable from the cluster related to the JGroups settings, i.e. losing its connection under load?  In this case we got the following warnings before the XA recovery errors:

              HornetQ doesn't use JGroups for clustering by default, so this may be completely unrelated.  That said, if JGroups is having a hard time with cluster communication, that indicates a problem on the box, likely with GC or network communication.  At this point it's anybody's guess as to the root cause of the issue.

              • 4. Re: Hornetq JMS crashes with error [org.hornetq.jms.server] (Periodic Recovery) HQ122016: Error in XA Recovery: javax.transaction.xa.XAException
                abi.chavan
                There's really not enough information to tell what happened.  Have you gathered thread dumps to see what's happening within the application server (and HornetQ)?  Have you used the JBoss CLI or a JMX tool to investigate the health of the broker?  The broker doesn't log anything during normal operation after start-up so the lack of log entries doesn't indicate a problem to me.

                I am not using any of the above.

                 

                In general, you want to keep your message count as low as possible since a message broker is not a database, but the number of messages in a queue is just a small part of the whole system.  How large are the messages?  Is the address paging?  How many consumers are there?  How many messages are in delivery?  Are there long GC pauses impacting the JVM?

                What we are doing here is interesting: we are using the message queues to hold activity and audit logging information that is later stored in the database by a background job.  We don't have a message consumer as such; instead, a background job runs every 1 minute, fetches at most 1000 messages from each of the 2 queues mentioned above, and inserts those messages into the database.  These 2 queues are always loaded with over 15K messages each, even while the background job is running.  Each message is up to 5000 characters, in Ruby Hash format.

                We are not using address paging here
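
                (To illustrate the pattern rather than our exact code: below is a rough Java sketch of the kind of drain job we run every minute, pulling up to 1000 messages from one queue and batch-inserting them into the database.  The JNDI names, the JDBC URL, the table, and the assumption that the payloads are text messages are placeholders, not our real configuration.)

                ===============================
                import java.sql.Connection;
                import java.sql.DriverManager;
                import java.sql.PreparedStatement;
                import javax.jms.MessageConsumer;
                import javax.jms.Queue;
                import javax.jms.QueueConnection;
                import javax.jms.QueueConnectionFactory;
                import javax.jms.Session;
                import javax.jms.TextMessage;
                import javax.naming.InitialContext;

                public class AuditQueueDrainJob {

                    private static final int BATCH_LIMIT = 1000;  // max messages per run

                    public static void main(String[] args) throws Exception {
                        // Assumes this runs inside the application server, so the default
                        // InitialContext and the placeholder JNDI names resolve.
                        InitialContext ctx = new InitialContext();
                        QueueConnectionFactory cf = (QueueConnectionFactory) ctx.lookup("java:/ConnectionFactory");
                        Queue queue = (Queue) ctx.lookup("java:/queue/auditQueue");

                        QueueConnection jmsConn = cf.createQueueConnection();
                        // Placeholder JDBC URL and credentials for the target database.
                        Connection db = DriverManager.getConnection("jdbc:postgresql://dbhost/audit", "user", "pass");
                        db.setAutoCommit(false);

                        try (PreparedStatement insert =
                                     db.prepareStatement("INSERT INTO audit_log (payload) VALUES (?)")) {
                            // Transacted JMS session so messages are only removed from the queue
                            // once the database commit has succeeded.
                            Session session = jmsConn.createSession(true, Session.SESSION_TRANSACTED);
                            MessageConsumer consumer = session.createConsumer(queue);
                            jmsConn.start();

                            int drained = 0;
                            while (drained < BATCH_LIMIT) {
                                // Short wait, then stop; payloads are assumed to be text messages.
                                TextMessage msg = (TextMessage) consumer.receive(500);
                                if (msg == null) {
                                    break;  // nothing left for this run
                                }
                                insert.setString(1, msg.getText());
                                insert.addBatch();
                                drained++;
                            }

                            insert.executeBatch();
                            db.commit();       // persist to the database first...
                            session.commit();  // ...then acknowledge/remove the messages
                            System.out.println("Drained " + drained + " messages");
                        } finally {
                            jmsConn.close();
                            db.close();
                        }
                    }
                }
                ===============================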

                 

                HornetQ doesn't use JGroups for clustering by default, so this may be completely unrelated.  That said, if JGroups is having a hard time with cluster communication, that indicates a problem on the box, likely with GC or network communication.  At this point it's anybody's guess as to the root cause of the issue.

                Ok