27 Replies. Latest reply on Aug 8, 2011 9:54 AM by Rico Neubauer

    Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5

    Andriy Hnativ Newbie

      Suppose that, inside an XA transaction, JBoss 5 calls commit() on an XAResource participating in the transaction, and the server crashes right after XAResource.commit() returns successfully. When the server is restarted, the warning “Could not find new XAResource to use for recovering non-serializable XAResource” is logged. This happens because, during recovery, the recover() method of the resource that committed before the crash returns an empty array of XIDs: as far as this resource is concerned, the transaction completed successfully, so it no longer holds the corresponding XID in its log. Later, in XARecoveryModule.getNewXAResource(), JBoss tries to match the elements of this array against the XID it is trying to recover (because of the crash, JBoss did not update its transaction log, so it still believes that branch needs recovery). The failure to match that XID against any element of the array leads to the warning.
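      To make the mismatch concrete, here is a minimal sketch of the matching step as I understand it. It is not the JBossTS code: the class, SimpleXid and isInDoubt are names made up for illustration, and only javax.transaction.xa.Xid is a real API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.transaction.xa.Xid;

// Hypothetical illustration only; not the XARecoveryModule implementation.
public class XidMatchSketch {

    /** Minimal Xid implementation, assumed here just for the example. */
    static final class SimpleXid implements Xid {
        private final int formatId;
        private final byte[] gtrid;
        private final byte[] bqual;

        SimpleXid(int formatId, String gtrid, String bqual) {
            this.formatId = formatId;
            this.gtrid = gtrid.getBytes(StandardCharsets.UTF_8);
            this.bqual = bqual.getBytes(StandardCharsets.UTF_8);
        }

        @Override public int getFormatId() { return formatId; }
        @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
        @Override public byte[] getBranchQualifier() { return bqual.clone(); }
    }

    /** Two Xids match when format id, global id and branch qualifier are all equal. */
    static boolean sameXid(Xid a, Xid b) {
        return a.getFormatId() == b.getFormatId()
                && Arrays.equals(a.getGlobalTransactionId(), b.getGlobalTransactionId())
                && Arrays.equals(a.getBranchQualifier(), b.getBranchQualifier());
    }

    /** True if the Xid from the transaction log appears in the resource's recovery scan. */
    static boolean isInDoubt(Xid logged, Xid[] scanned) {
        for (Xid candidate : scanned) {
            if (sameXid(logged, candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Xid logged = new SimpleXid(1, "tx-1", "branch-A");

        // The resource committed before the crash, so its scan reports nothing.
        Xid[] afterCommit = new Xid[0];
        System.out.println(isInDoubt(logged, afterCommit));   // prints false

        // Had the crash happened before commit(), the branch would still be listed.
        Xid[] stillPrepared = { new SimpleXid(1, "tx-1", "branch-A") };
        System.out.println(isInDoubt(logged, stillPrepared)); // prints true
    }
}
```

      In the first case nothing matches the logged XID, which is the situation that produces the warning described above.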

       

      Is this a known issue, and is there any workaround to prevent the warning from appearing in the logs?

       

      To reproduce

       

      • 1) Choose any XAResource that will participate in the XA transaction, and set a breakpoint at its XAResource.commit()
      • 2) Start JBoss and initiate a transaction in which the XAResource with the breakpoint will participate; right after the execution flow returns from the XAResource.commit() method, kill the server.
      • 3) Restart JBoss
        • 2. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
          Andriy Hnativ Newbie

          Thanks for your answer. I have some additional questions:

           

          In the JBossTS documentation it says:

           

          "

          The number of times the RecoveryManager will attempt to issue commit (at its own initiative, as part of the periodic recovery) is controlled by the property COMMITTED_TRANSACTION_RETRY_LIMIT (default is 3 times).

          "

           

          First of all, for us the recovery process does not stop after 3 attempts; it continues indefinitely, so we keep getting the warning messages periodically for a long time. We periodically get two kinds of warnings: "Could not find new XAResource to use for recovering non-serializable XAResource" and "No XAResource to recover". The second one comes from XAResourceRecord.topLevelCommit(). So why does the recovery process still try to commit the failed transaction after 3 attempts?

           

          Also, when we try to set the COMMITTED_TRANSACTION_RETRY_LIMIT property to 1 (we do this by adding

          <property name="com.arjuna.ats.jts.recovery.commitTransactionRetryLimit" value="1"/>

          to the <properties depends="arjuna,txoj,jta" name="recoverymanager"> section of jbossts-properties.xml), it does not seem to change the recovery behavior at all: we still get the warning messages periodically.

           

          So how do we limit the number of recovery attempts? Thank you.

          • 3. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
            Michael Musgrove Master

            Are you using JTS or JTA? I ask because com.arjuna.ats.jts.recovery.commitTransactionRetryLimit looks JTS-specific. Which document refers to this property?

             

            How long are you waiting for these warnings to go away? The article Mark updated says that recovery eventually assumes the resource committed, but the default is 12 hours.

             

            And thirdly, there is a property that controls what happens if it is not possible to find a suitable XAResource to recover the incomplete branch, and it comes with a strong health warning. You can set the property <property name="com.arjuna.ats.jta.xaAssumeRecoveryComplete" value="true"/> in the jta section of the properties file. In this case, failing to find a suitable XAResource will cause recovery to assume the branch completed and discard the log record.

             

            It would be helpful if someone could comment under what circumstances setting this property can lead to unsafe outcomes.

            • 4. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
              Andriy Hnativ Newbie

              Thanks Michael. We use JTA.

               

              Yes, it would be very useful if somebody could provide the information about downsides of using the “com.arjuna.ats.jta.xaAssumeRecoveryComplete” property and the conditions when setting this property can lead to unsafe outcomes.

               

              The question that we still have is this: during recovery, JBossTS (in XARecoveryModule.getNewXAResource) gets from the resource an array of XIDs that are in a prepared or heuristically completed state (that is, the ones that should be recovered) and tries to match one of them to the XID of a branch of the failed transaction. If it cannot match any, it returns null instead of an XAResource instance, which leads to failed recovery and to the warning messages being reported constantly. When JBossTS cannot match the XID with any of the XIDs returned by the resource, why would it not simply assume that the branch with that XID was successfully committed before the crash, and remove that branch from the transaction it is trying to recover?

               

              If the resource returns an empty array of XIDs, it does not know of any XIDs that should be recovered, which means that this resource successfully completed commit before JBoss crashed. Is there any other reason why the necessary XID would not be returned in the XID array by the resource during recovery? Why would JBossTS, for 12 hours (or any other chosen time), periodically get XIDs from that resource, try to match them with its known XIDs, fail every time, and keep reporting the same warnings?

              • 5. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                Jonathan Halliday Master

                You're making the fatally over-optimistic assumption that JBossTS recovery always has complete knowledge of the global state.  There may be more than one resource manager involved in a transaction. At any given time there may be one or more such resource managers that are unavailable due to failures or misconfiguration of the recovery plugins. The fact that the recovery system does not obtain knowledge of a given in-doubt branch cannot normally be taken as proof that no such branch exists.

                 

                An empty scan result can be taken as conclusive proof of the absence of an in-doubt branch only if you can also show that scans were performed against all the resource managers that may have held knowledge of that branch. In short, you need some way to match the resource manager you're doing the recovery scan with to the one that was used to commit the branch, or to prove that you've exhaustively scanned all possible resource managers.

                 

                The latter is roughly what xaAssumeRecoveryComplete does - it overrides the normally paranoid assumptions and forces the TS to assume a scan pass provides globally complete knowledge. If you use it and one or more of your resource managers is not correctly configured for recovery or is crashed with prepared branches in it, the recovery system will throw away the assumed complete logs on the basis that it did not get an in-doubt branch from the RM recovery scan. When the RM reconnects, the in-doubt branches will be rolled back under presumed abort as no tx logs exist. This is not good.
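                The reasoning in the last few paragraphs can be sketched as a small decision function. This is only an illustration under stated assumptions, not the JBossTS implementation: the class, the Decision enum, the decide method and the resource manager names are all hypothetical, and the set of "possible owners" is exactly the knowledge the recovery system does not normally have.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the argument; none of these names exist in JBossTS.
public class ScanCompletenessSketch {

    enum Decision { RECOVER_BRANCH, DISCARD_LOG_RECORD, RETRY_LATER }

    /**
     * @param branchFoundInScan      the logged branch showed up in some recovery scan
     * @param possibleOwners         resource managers that may hold the branch (assumed known here)
     * @param scanned                resource managers actually scanned in this pass
     * @param assumeRecoveryComplete models the effect described for xaAssumeRecoveryComplete
     */
    static Decision decide(boolean branchFoundInScan,
                           Set<String> possibleOwners,
                           Set<String> scanned,
                           boolean assumeRecoveryComplete) {
        if (branchFoundInScan) {
            return Decision.RECOVER_BRANCH;
        }
        // An empty result is conclusive only if every possible owner was scanned,
        // or if we have been told to treat the scan as globally complete.
        if (assumeRecoveryComplete || scanned.containsAll(possibleOwners)) {
            return Decision.DISCARD_LOG_RECORD;
        }
        return Decision.RETRY_LATER;
    }

    public static void main(String[] args) {
        Set<String> owners = new HashSet<>(Arrays.asList("db", "jms"));
        Set<String> onlyDb = new HashSet<>(Arrays.asList("db"));

        // "jms" was unreachable: the paranoid default keeps the log record.
        System.out.println(decide(false, owners, onlyDb, false)); // prints RETRY_LATER

        // Same scan with the override: the record is dropped even though
        // "jms" may still hold a prepared branch. This is the unsafe case.
        System.out.println(decide(false, owners, onlyDb, true));  // prints DISCARD_LOG_RECORD
    }
}
```

                The second call is the dangerous one: the log record is dropped although one resource manager was never asked.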

                 

                The other approach, using additional meta-data in the logs to match individual branches to their owning RM, is more fine-grained and robust. On the down side, it needs some api changes inside the AS integration code. We've had it on the roadmap for years whilst waiting for those changes and with luck should finally get it implemented somewhere around AS 7.1.

                • 6. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                  Rico Neubauer Novice

                  I have the same symptoms as Andriy: never-ending "Could not find new XAResource to use for recovering non-serializable XAResource" and "No XAResource to recover" messages.

                  The difference here is that I use JTS, and I can attest that those messages do not stop after 12 hours. As of now I have a system that has been running for 16 hours and I still get them.

                  I also have more, potentially related, problems in this context: the system no longer works reliably once a crash recovery has taken place. This affects not only the failed transactions but also newly created ones. But that's maybe worth another thread.

                  Thanks to anyone who can shed some light on this.

                  • 7. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                    Mark Little Master

                    I know Jonathan has answered this extremely eloquently, but I just thought it worth reminding folks that we are an open source company and many of the things we discuss here can be found by checking the code as well as the docs. For instance, if you look within a specific class (no hints given, I'll leave that as an exercise for the reader), then you will find the following comment, which explains everything:

                     

                    /*
                     * WARNING: USE WITH EXTEREME CARE!!
                     *
                     * This assumes that if there is no XAResource that can deal with an Xid
                     * after recovery, then we failed after successfully committing the transaction
                     * but before updating the log. In which case we just need to ignore this
                     * resource and remove the entry from the log.
                     *
                     * BUT if not all XAResourceRecovery instances are correctly implemented
                     * (or present) we may end up removing participants that have not been dealt
                     * with. Hence USE WITH EXTREME CARE!!
                     */

                    • 9. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                      Rico Neubauer Novice

                      JBoss 6.0.0 with JTS 4.14.0.

                      I also tried with JTS 4.15.0 and also the current version of JBoss 6.1 - all without noticeable difference.

                      • 10. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                        Mark Little Master

                        OK, unless your situation is *exactly* the same as that described here originally (which it isn't, since the title of this is AS5 and not AS6), let's take this to a separate forum entry. That way it won't mess up anything else we may discuss in this original thread and it'll make it easier for others to follow or search later.

                         

                        So create a new forum entry and make sure you include all of the details, which should include details of the scenario you were testing, the way in which the failures happened etc.

                        • 11. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                          Michael Musgrove Master

                          Rico Neubauer wrote:

                           

                          The difference here is that I use JTS, and I can attest that those messages do not stop after 12 hours. As of now I have a system that has been running for 16 hours and I still get them.

                          The default jbossts-properties.xml only removes expired TransactionStatusManager items. My guess is you also need an entry for expired transactions:

                           

                          <property name="com.arjuna.ats.arjuna.recovery.expiryScannerExpiredTransaction" value="com.arjuna.ats.internal.arjuna.recovery.ExpiredTransactionScanner"/>

                           

                          and this will run every 12 hours if you have used the default scan interval:

                              <property name="com.arjuna.ats.arjuna.recovery.expiryScanInterval" value="12"/>

                          • 12. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                            Andriy Hnativ Newbie

                            Jonathan, you are saying "At any given time there may be one or more such resource managers that are unavailable due to failures or misconfiguration of the recovery plugins." So if the resource manager is unavailable due to a failure, wouldn't it just throw an exception during recovery? When JBossTS gets an empty array from a resource during recovery, wouldn't that mean that the resource manager is available?

                            • 13. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                              Michael Musgrove Master

                              We use different expiry scanners for different log record types. We'll open a JIRA to document when to use the different scanners. Note that the scan interval of all expiry scanners is controlled by the com.arjuna.ats.arjuna.recovery.expiryScanInterval property (whose unit is hours) so to test that you are using the correct scanner you might like to reduce the scan interval (minimum is 1 hour) until you are sure it is working as expected.

                               

                              Looking at the code here is what I think you need to do:

                               

                              For JTA use the following two scanners:

                               

                              com.arjuna.ats.internal.arjuna.recovery.AtomicActionExpiryScanner

                              com.arjuna.ats.internal.arjuna.recovery.ExpiredTransactionStatusManagerScanner

                               

                              And if you are using JTS then additionally include these two scanners:

                               

                              com.arjuna.ats.internal.jts.recovery.transactions.ExpiredServerScanner

                              com.arjuna.ats.internal.jts.recovery.transactions.ExpiredToplevelScanner

                               

                              So, for example, if you were using JTS then your properties file should contain something similar to:

                              <property name="com.arjuna.ats.arjuna.recovery.expiryScanner1"
                                   value="com.arjuna.ats.internal.arjuna.recovery.AtomicActionExpiryScanner"/>
                              <property name="com.arjuna.ats.arjuna.recovery.expiryScanner2"
                                   value="com.arjuna.ats.internal.arjuna.recovery.ExpiredTransactionStatusManagerScanner"/>
                              <property name="com.arjuna.ats.arjuna.recovery.expiryScanner3"
                                   value="com.arjuna.ats.internal.jts.recovery.transactions.ExpiredServerScanner"/>
                              <property name="com.arjuna.ats.arjuna.recovery.expiryScanner4"
                                   value="com.arjuna.ats.internal.jts.recovery.transactions.ExpiredToplevelScanner"/>

                               

                              One caveat: I would have expected to see a JTA-specific expiry scanner, so we will need to look into that whilst documenting when to use which scanners.

                              • 14. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                                Andriy Hnativ Newbie

                                Can somebody please confirm the following:

                                 

                                JBossTS will not start the recovery process until all Resource Managers that participated in the failed transaction are back up. So the only way a Resource Manager may not be available during recovery is if it was up when recovery started and failed during recovery.

                                 

                                Also as far as I see there is no way for clients of JBossTS to distinguish between the following two cases:

                                 

                                a) The recovery process failed because one of the Resource Managers failed (which means there was an XAException thrown by the resource and suppressed by JBossTS).

                                b) The recovery failed because JBossTS simply could not match the XID of one of the transaction branches to any of those returned by all resource managers (and there was no exception thrown).

                                 

                                It would be really beneficial if we could distinguish between these two cases. In the first case we would like recovery of the given transaction to be attempted again and again until all resource managers stay up during the entire recovery process and no XAExceptions are thrown; in the second case we would like recovery not to be attempted anymore. If JBossTS does not currently distinguish the two cases, is there a specific reason for that, given that it looks technically possible? Can this be changed? Thanks.
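                                To illustrate the distinction I am asking about, here is a rough sketch. It is not JBossTS code: RecoveryScan, Outcome and classify are hypothetical names, branches are identified by plain strings instead of real Xids to keep it short, and only javax.transaction.xa.XAException is a real API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.transaction.xa.XAException;

// Hypothetical illustration of cases a) and b); not how JBossTS reports recovery.
public class ScanOutcomeSketch {

    enum Outcome { MATCHED, NO_MATCH, RESOURCE_FAILED }

    /** Stand-in for a recovery scan against one resource manager. */
    interface RecoveryScan {
        List<String> inDoubtBranchIds() throws XAException;
    }

    /** Classifies one scan for one logged branch without hiding the failure case. */
    static Outcome classify(String loggedBranchId, RecoveryScan scan) {
        try {
            List<String> inDoubt = scan.inDoubtBranchIds();
            return inDoubt.contains(loggedBranchId) ? Outcome.MATCHED : Outcome.NO_MATCH;
        } catch (XAException e) {
            // Case a): the resource manager failed, so recovery should be retried.
            return Outcome.RESOURCE_FAILED;
        }
    }

    public static void main(String[] args) {
        // Case b): the scan worked but the branch is not in doubt anymore.
        System.out.println(classify("branch-A", () -> Collections.emptyList()));
        // prints NO_MATCH

        // Case a): the scan itself failed.
        System.out.println(classify("branch-A", () -> {
            throw new XAException(XAException.XAER_RMFAIL);
        }));
        // prints RESOURCE_FAILED

        // The branch is still prepared in the resource manager.
        System.out.println(classify("branch-A", () -> Arrays.asList("branch-A")));
        // prints MATCHED
    }
}
```

                                Note that NO_MATCH from a single resource manager still would not prove the branch committed unless every resource manager that could hold it was scanned, which is the point Jonathan made above.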
