2 Replies Latest reply on Jan 28, 2014 2:04 PM by Elias Ross

    HornetQ, XAResources and Oracle database locking issues on RHQ 4.9

    Elias Ross Master


      I have an HA configuration with quite a lot of agent hosts (2,000). Periodically I see the HornetQ errors shown in the logs below. I suspect they are related to XAResource and the database, because I see a lot of long-held database locks whenever this happens. The whole server gets stuck holding transactions open, and I have to force a restart.

       

      Logs:

       

      03:06:13,640 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:2ee0 in state  RUN
      05:08:29,284 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:2f10 in state  RUN
      05:35:53,539 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:36f8 in state  RUN
      02:03:32,031 WARN  [org.jboss.as.ejb3] (EJB default - 5) JBAS014143: Timer aaf75dff-caeb-4172-9681-fe047faf2cdd is still active, skipping overlapping scheduled execution at: Wed Jan 22 02:03:32 UTC 2014
      05:35:53,540 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:39ff in state  RUN
      05:36:09,992 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:3b41 in state  RUN
      05:36:09,992 WARN  [org.jboss.as.ejb3] (EJB default - 5) JBAS014143: Timer e1ae3fd6-93b8-4e89-a24e-b4ce0587d0ca is still active, skipping overlapping scheduled execution at: Wed Jan 22 05:36:09 UTC 2014
      05:36:14,704 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:3b68 in state  RUN
      05:36:21,907 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:3b69 in state  RUN
      05:36:21,907 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:3b76 in state  RUN
      05:36:21,907 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:3b95 in state  RUN
      05:36:21,908 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:3b9b in state  RUN
      05:36:21,907 WARN  [org.hornetq.core.client] (hornetq-failure-check-thread) HQ212107: Connection failure has been detected: HQ119034: Did not receive data from invm:0. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. You also might have configured connection-ttl and client-failure-check-period incorrectly. Please check user manual for more information. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
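To see whether the reaper timeouts cluster around particular times, the warnings can be tallied per hour. This is only a minimal sketch: the regular expression assumes the exact log layout shown above, and `reaper_timeouts_by_hour` and `SAMPLE` are hypothetical names used for illustration, not part of RHQ or JBoss.

```python
import re
from collections import Counter

# Assumes the layout of the ARJUNA012117 warnings in the excerpt above:
# "HH:MM:SS,mmm WARN  [com.arjuna.ats.arjuna] ... check timeout for TX <id> ..."
REAPER_RE = re.compile(
    r"^\s*(?P<hour>\d{2}):\d{2}:\d{2},\d{3}\s+WARN\s+\[com\.arjuna\.ats\.arjuna\]"
    r".*ARJUNA012117: TransactionReaper::check timeout for TX (?P<tx>\S+)"
)

def reaper_timeouts_by_hour(lines):
    """Count transaction reaper timeout warnings per hour of day."""
    counts = Counter()
    for line in lines:
        match = REAPER_RE.match(line)
        if match:
            counts[match.group("hour")] += 1
    return dict(counts)

# A few lines copied from the log excerpt; the EJB timer warning is ignored.
SAMPLE = [
    "03:06:13,640 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:2ee0 in state  RUN",
    "05:08:29,284 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:2f10 in state  RUN",
    "05:35:53,539 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff11b24825:57ca474d:52df1c4c:36f8 in state  RUN",
    "05:36:09,992 WARN  [org.jboss.as.ejb3] (EJB default - 5) JBAS014143: Timer e1ae3fd6-93b8-4e89-a24e-b4ce0587d0ca is still active, skipping overlapping scheduled execution at: Wed Jan 22 05:36:09 UTC 2014",
]

if __name__ == "__main__":
    for hour, count in sorted(reaper_timeouts_by_hour(SAMPLE).items()):
        print(f"{hour}:00  {count} reaper timeout(s)")
```

Run against the full server log (for example by passing an open file object instead of `SAMPLE`), this would show whether the timeouts build up gradually or arrive in a burst just before the HornetQ connection failure.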
      
      


      Here are the database locks at the time, taken from a query against Oracle (note the lock time in the Seconds column):


      ['Object', 'Terminal', 'Machine', 'Locker', 'Wait', 'Seconds', 'Lockmode', 'Object Type', 'Session ID', 'Serial', 'sid']
      ('RHQ.RHQ_AFFINITY_GROUP', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 13504, 'ROW EXCLUSIVE', 'TABLE', 1145, 9745, 1145)
      ('RHQ.RHQ_AGENT', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 13504, 'ROW EXCLUSIVE', 'TABLE', 1145, 9745, 1145)
      ('RHQ.RHQ_FAILOVER_LIST', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 11346, 'ROW EXCLUSIVE', 'TABLE', 11, 22221, 11)
      ('RHQ.RHQ_PARTITION_DETAILS', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 11346, 'ROW EXCLUSIVE', 'TABLE', 11, 22221, 11)
      ('RHQ.RHQ_PARTITION_EVENT', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 11346, 'ROW EXCLUSIVE', 'TABLE', 11, 22221, 11)
      ('RHQ.RHQ_SERVER', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 13504, 'ROW EXCLUSIVE', 'TABLE', 1145, 9745, 1145)
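To make the lock durations easier to read, the rows can be grouped by session and the seconds converted to hours and minutes. This is a rough sketch that works only on the rows pasted above; `summarize_locks` and `format_duration` are hypothetical helper names, and the column positions are assumed from the header row.

```python
from collections import defaultdict

# Rows copied from the query output above; columns follow the header row:
# Object, Terminal, Machine, Locker, Wait, Seconds, Lockmode, Object Type,
# Session ID, Serial, sid
ROWS = [
    ('RHQ.RHQ_AFFINITY_GROUP', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 13504, 'ROW EXCLUSIVE', 'TABLE', 1145, 9745, 1145),
    ('RHQ.RHQ_AGENT', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 13504, 'ROW EXCLUSIVE', 'TABLE', 1145, 9745, 1145),
    ('RHQ.RHQ_FAILOVER_LIST', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 11346, 'ROW EXCLUSIVE', 'TABLE', 11, 22221, 11),
    ('RHQ.RHQ_PARTITION_DETAILS', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 11346, 'ROW EXCLUSIVE', 'TABLE', 11, 22221, 11),
    ('RHQ.RHQ_PARTITION_EVENT', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 11346, 'ROW EXCLUSIVE', 'TABLE', 11, 22221, 11),
    ('RHQ.RHQ_SERVER', 'rhq', '-rhq001', 'RHQ', 'ACTIVE', 13504, 'ROW EXCLUSIVE', 'TABLE', 1145, 9745, 1145),
]

OBJECT_COL, SECONDS_COL, SESSION_COL = 0, 5, 8  # positions assumed from the header

def format_duration(seconds):
    """Render a number of seconds as e.g. '3h45m04s'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"

def summarize_locks(rows):
    """Group locked objects by session ID, keeping the longest lock time seen."""
    sessions = defaultdict(lambda: {"seconds": 0, "objects": []})
    for row in rows:
        entry = sessions[row[SESSION_COL]]
        entry["seconds"] = max(entry["seconds"], row[SECONDS_COL])
        entry["objects"].append(row[OBJECT_COL])
    return dict(sessions)

if __name__ == "__main__":
    for sid, info in summarize_locks(ROWS).items():
        print(f"session {sid}: held {format_duration(info['seconds'])} on {', '.join(info['objects'])}")
```

Grouped this way, the output shows just two sessions holding all six locks, each for more than three hours.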
      

       

      I have a couple of thoughts and questions.

       

      1. Is this related at all to distributed transaction support?

      2. Is there a way to use simple database connections rather than XA ones?

      3. Is there a bug in JBoss EAP related to this at all?

       

      I also suspect that co-locating Cassandra and RHQ on the same machine isn't a good idea: the CPU usage spikes that Cassandra causes may be triggering unexpected timeouts on the RHQ side.