2 Replies Latest reply on Mar 5, 2009 5:30 AM by adrian.brock

    Transaction Manager patchy or How to recover successfully fr

    mreasy

      Hi,

      We are seeing JMS messages that are processed successfully by the JMS provider (and removed from its context, i.e. the MessageCache) but, due to XA errors, are not deleted from the database. As a result the messages are re-read on the next JBoss start and treated as new messages, which are then processed again ==> duplicate message processing.

      See https://jira.jboss.org/jira/browse/JBAS-6498 for the same symptom, but under the assumption that it derived from DB-errors directly.

      Stack trace for the failing XA transaction (the trace actually shows the end method, whereas the commit method seems to be the problem):

      org.jboss.resource.adapter.jdbc.xa.XAManagedConnectionFactory End transaction failed for XAResource
      oracle.jdbc.xa.OracleXAException
       at oracle.jdbc.xa.OracleXAResource.checkError(OracleXAResource.java:938)
       at oracle.jdbc.xa.client.OracleXAResource.end(OracleXAResource.java:385)
       at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.end(XAManagedConnection.java:147)
       at org.jboss.tm.TransactionImpl$Resource.endResource(TransactionImpl.java:2143)
       at org.jboss.tm.TransactionImpl$Resource.endResource(TransactionImpl.java:2118)
       at org.jboss.tm.TransactionImpl.endResources(TransactionImpl.java:1462)
       at org.jboss.tm.TransactionImpl.beforePrepare(TransactionImpl.java:1116)
       at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:324)
       at org.jboss.tm.TxManager.commit(TxManager.java:240)
       at org.jboss.jms.asf.StdServerSession.onMessage(StdServerSession.java:351)
       at org.jboss.mq.SpyMessageConsumer.sessionConsumerProcessMessage(SpyMessageConsumer.java:906)
       at org.jboss.mq.SpyMessageConsumer.addMessage(SpyMessageConsumer.java:170)
       at org.jboss.mq.SpySession.run(SpySession.java:323)
       at org.jboss.jms.asf.StdServerSession.run(StdServerSession.java:194)
       at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java:748)
       at java.lang.Thread.run(Thread.java:619)


      To provoke this error, I hacked exceptions into XAManagedConnection, of the kind the JDBC driver could plausibly throw (I will attach a diff against Branch_4_2).

      This hack reliably reproduces the problem: JMS messages are left lying around in the DB but are no longer available in the JMS provider.

      Now Adrian came into play and told us to rtfm and activate XARecovery as described in https://jira.jboss.org/jira/browse/JBAS-6498.
      We did so, and at first sight it seemed better (messages that had actually been processed were marked with TXOP 'D' in the DB), but on the next restart those messages were not deleted as one would have expected.

      The problems were present with any combination of the following:
      JBoss 4.0.5_GA with/without XARecovery
      JBoss 4.2.3_GA with/without XARecovery
      JBoss Branch_4_2 (upcoming 4.2.4) with/without XARecovery
      Oracle 10 / 11
      Oracle_JAVA_PACKAGE installed /uninstalled to allow DB-side Java-XA-transactions
      Oracle rights for sys.dba_pending_transactions, sys.dba_2pc_pending, sys.pending_trans and sys.dbms_system available / unavailable

      Oracle-THIN-driver was used (versions 1.4, 5 and 6 make no difference here).

      So you see we had arjuna and the old TM under test.

      Some combinations with XARecovery activated even showed a lot worse behaviour resulting in messages being multiplied, or DB-records being locked forever with arjuna (but that's worth another topic).

      So finally the question:
      - Is this a general bug in the transaction manager?
      - Are we missing something that is not described in any of these:
      - http://management-platform.blogspot.com/2008/11/transaction-recovery-in-jbossas.html
      - http://www.jboss.org/community/docs/DOC-10013
      - Usual topics in Jira and forum?
      - Is our use-case b.s.? Nevertheless it would be valid, since experienced 'in-the-wild'.

      Thanks for any light you can shed on this ;)

      Regards
      Rico

        • 1. Re: Transaction Manager patchy or How to recover successfull
          mreasy

          Hmm, any possibility to add this as attachment?
          XAManagedConnection-diff towards Branch_4_2 as described above

          Index: src/main/org/jboss/resource/adapter/jdbc/xa/XAManagedConnection.java
          ===================================================================
          --- src/main/org/jboss/resource/adapter/jdbc/xa/XAManagedConnection.java (revision 84608)
          +++ src/main/org/jboss/resource/adapter/jdbc/xa/XAManagedConnection.java (working copy)
          @@ -21,8 +21,10 @@
           */
           package org.jboss.resource.adapter.jdbc.xa;
          
          +import java.lang.management.ManagementFactory;
           import java.sql.SQLException;
           import java.util.Properties;
          +import java.util.Random;
          
           import javax.resource.ResourceException;
           import javax.resource.spi.LocalTransaction;
          @@ -45,12 +47,22 @@
           */
           public class XAManagedConnection extends BaseWrapperManagedConnection implements XAResource, LocalTransaction
           {
          + /**
          + * 5 min.
          + */
          + private static final long UPTIME_THRESHOLD = 300000;
          +
           protected final XAConnection xaConnection;
          
           protected final XAResource xaResource;
          
           protected Xid currentXid;
          +
          + private final Random random = new Random();
          
          + private volatile boolean started;
          +
          +
           public XAManagedConnection(XAManagedConnectionFactory mcf, XAConnection xaConnection, Properties props,
           int transactionIsolation, int psCacheSize) throws SQLException
           {
          @@ -121,6 +133,7 @@
           }
           try
           {
          + fail("start");
           xaResource.start(xid, flags);
          
           }catch(XAException e)
          @@ -188,6 +201,7 @@
          
           public void commit(Xid xid, boolean onePhase) throws XAException
           {
          + fail("commit");
           xaResource.commit(xid, onePhase);
           }
          
          @@ -331,5 +345,45 @@
           unlock();
           }
           }
          +
          +
          + /**
          + * @throws XAException
          + */
          + private void fail(String methodName) throws XAException
          + {
          + if (!started)
          + {
          + final long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
          + if (uptime > UPTIME_THRESHOLD)
          + {
          + System.out.println("System up for " + uptime + "ms, considering started.");
          + started = true;
          + }
          + else
          + {
          + return;
          + }
          + }
          
          + // only consider cases with JMS-addMessage
          + boolean found = false;
          + StackTraceElement[] stackTrace = new Throwable().getStackTrace();
          + for (int i = 0; i < stackTrace.length; i++)
          + {
          + if (stackTrace[i].getMethodName().toLowerCase().contains("addmessage"))
          + {
          + found = true;
          + break;
          + }
          + }
          +
          + // failing with probability of 10 percent
          + if (found && started && random.nextFloat() < 0.1f)
          + {
          + final XAException ex = new XAException("FAKE Error in " + methodName);
          + ex.errorCode = XAException.XAER_RMERR;
          + throw ex;
          + }
          + //System.out.println("No FAKE Error in " + methodName);
          + }
          +
           }
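
The fault-injection idea in the diff above can be isolated into a small standalone helper, which makes it easier to reuse in a test harness. This is only a sketch: the class name `XaFaultInjector` and its methods are hypothetical and not part of JBoss, and the uptime check from the diff is left out. It uses only `javax.transaction.xa.XAException` from the JDK.

```java
import java.util.Random;
import javax.transaction.xa.XAException;

/** Hypothetical helper (not part of JBoss): throws a fake XAException with a fixed probability. */
final class XaFaultInjector {
    private final Random random;
    private final float probability;

    /** probability is the chance in [0, 1] that maybeFail throws; seed makes runs repeatable. */
    XaFaultInjector(float probability, long seed) {
        this.probability = probability;
        this.random = new Random(seed);
    }

    /** True if any frame on the current call stack has a method name containing the fragment (case-insensitive). */
    static boolean calledFrom(String methodNameFragment) {
        String needle = methodNameFragment.toLowerCase();
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            if (frame.getMethodName().toLowerCase().contains(needle)) {
                return true;
            }
        }
        return false;
    }

    /** Throws an XAException with errorCode XAER_RMERR with the configured probability, otherwise returns. */
    void maybeFail(String methodName) throws XAException {
        if (random.nextFloat() < probability) {
            XAException ex = new XAException("FAKE Error in " + methodName);
            ex.errorCode = XAException.XAER_RMERR;
            throw ex;
        }
    }
}
```

A wrapper around an XAResource would call `maybeFail("commit")` before delegating, optionally guarded by `calledFrom("addmessage")` to restrict the failures to the JMS delivery path as the diff does.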
          


          • 2. Re: Transaction Manager patchy or How to recover successfull
            adrian.brock

            "MrEasy" wrote:

            Now Adrian came into play and told us to rtfm and activate XARecovery as described in https://jira.jboss.org/jira/browse/JBAS-6498.
            We did so, and at first sight it seemed better (messages that had actually been processed were marked with TXOP 'D' in the DB), but on the next restart those messages were not deleted as one would have expected.

            The problems were present with any combination of the following:
            JBoss 4.0.5_GA with/without XARecovery
            JBoss 4.2.3_GA with/without XARecovery
            JBoss Branch_4_2 (upcoming 4.2.4) with/without XARecovery
            Oracle 10 / 11


            It's impossible to answer this question without seeing your configuration
            (and preferably some logging when the problem occurs).

            Recovery by the TM should redo the commit, which then removes the "D"
            records from JBossMQ:

            jdbc2.PersistenceManager
               public void commitPersistentTx(Tx txId) throws javax.jms.JMSException
               {
                  if (txId.wasPersisted() == false)
                     return;

                  TransactionManagerStrategy tms = new TransactionManagerStrategy();
                  tms.startTX();
                  Connection c = null;
                  boolean threadWasInterrupted = Thread.interrupted();
                  try
                  {
                     c = this.getConnection();
                     removeMarkedMessages(c, txId, "D"); // HERE: removes the records marked "D"
                     removeTXRecord(c, txId.longValue());
                  }
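
For readers unfamiliar with what "recovery redoes the commit" means at the XA level, here is a rough sketch using only the standard `javax.transaction.xa` API. It is not the JBoss TM's actual recovery code. The class `RecoverySketch`, the method `recoverInDoubt` and the `committedByTm` set (a stand-in for the TM's own log of commit decisions) are assumptions made for illustration, and it assumes the Xid implementation has a usable `equals`.

```java
import java.util.Set;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical recovery pass over one XAResource; not the JBoss TM implementation. */
final class RecoverySketch {
    /**
     * Asks the resource for its in-doubt (prepared but unfinished) branches and completes each one.
     * Branches the TM had decided to commit are committed again; all others are rolled back.
     * Returns the number of branches that were completed.
     */
    static int recoverInDoubt(XAResource resource, Set<Xid> committedByTm) throws XAException {
        // A single call that both starts and ends the recovery scan.
        Xid[] inDoubt = resource.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
        if (inDoubt == null) {
            return 0;
        }
        int completed = 0;
        for (Xid xid : inDoubt) {
            if (committedByTm.contains(xid)) {
                resource.commit(xid, false); // redo the second phase of two-phase commit
            } else {
                resource.rollback(xid);      // no commit decision on record: roll back
            }
            completed++;
        }
        return completed;
    }
}
```

In the JBossMQ case discussed here, the commit branch is the one that would end up in commitPersistentTx and remove the "D" records; the rollback branch corresponds to the redelivery scenario.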
            


            My guess is what is happening is:

            1) You have a failed commit.
            2) The TM recovers and rolls back
            3) It redelivers the message
            4) The transaction of the redelivered message still fails to commit

            Whether (4) is a bug, configuration issue or some other problem is impossible
            to say without more information.

            NOTE: XAER_RMERR is special, but I'm not sure it is relevant to your problem?
            The TM will not typically try to recover from XAER_RMERR;
            it requires manual recovery since it is an "unknown error".
            That is as opposed to, say, XAER_RMFAIL, which represents a temporary failure.
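
To make the distinction concrete, here is a small sketch that maps a few `XAException.errorCode` values to their symbolic names and flags which one is treated as temporary. The constants come from `javax.transaction.xa.XAException`; the class `XaErrorCodes` and the retry policy in `isRetryCandidate` are hypothetical and simply restate the description above, not verified TM behaviour.

```java
import javax.transaction.xa.XAException;

/** Hypothetical helper for logging and triaging XA error codes. */
final class XaErrorCodes {
    /** Returns the symbolic name and a short description for a few common error codes. */
    static String describe(int errorCode) {
        switch (errorCode) {
            case XAException.XAER_RMERR:
                return "XAER_RMERR: resource manager error in the transaction branch";
            case XAException.XAER_RMFAIL:
                return "XAER_RMFAIL: resource manager unavailable";
            case XAException.XAER_NOTA:
                return "XAER_NOTA: XID is not valid";
            case XAException.XAER_PROTO:
                return "XAER_PROTO: routine invoked in an improper context";
            default:
                return "other error code " + errorCode;
        }
    }

    /** Assumed policy: only XAER_RMFAIL (temporary failure) is worth retrying automatically. */
    static boolean isRetryCandidate(XAException e) {
        return e.errorCode == XAException.XAER_RMFAIL;
    }
}
```

With the fault-injection diff posted earlier in this thread, every fake error carries XAER_RMERR, so under this policy none of them would be retried automatically.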