27 Replies. Latest reply on Aug 8, 2011 9:54 AM by Rico Neubauer
      • 15. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
        Mark Little Master

        "JBossTS will not start the recovery process until all Resource Managers that participated in the failed transaction are back up. So the only way a Resource Manager may not be available during recovery is if it was up when recovery started and failed during recovery."

         

        No. The recovery subsystem is running continuously and will attempt to recover as soon as possible. Recovery of Xids on behalf of some RMs may fail, in which case they remain in the log for the next recovery attempt. Successful recovery of Xids sourced from an RM means that they are removed from the log and that RM will eventually no longer be required.

         

        "Also as far as I see there is no way for clients of JBossTS to distinguish between the following two cases:"

         

        What do you mean by a "client" in this case? You should realise that there is no recovery for inflight transactions, so application clients for a transaction being recovered have probably ceased to exist, or at least moved on to some other unit of work. In the case of recovery, the client is probably a sys admin.

         

        "a) The recovery process failed because one of the Resource Managers failed (which means there was an XAException thrown by the resource and suppressed by JBossTS)."

         

        Failures to recover are reported in the logs. You can even turn debugging on for recovery and get lots more information.

         

        "b) The recovery failed because JBossTS simply could not match the XID of one of the transaction branches to any of those returned by all resource managers (and there was no exception thrown)."

         

        Logs are your friend here too.

         

        "It would be really beneficial if we could distinguish between these two cases, as in the first case we would like the recovery for the given transaction to be attempted again and again until all resource managers are up during the entire recovery process and no XAExceptions are thrown, and in the second case we would like the recovery not to be attempted anymore. If the two cases are in fact not currently being distinguished by JBossTS, is there a specific reason for that since it looks like it is technically possible. Can this be changed? Thanks."

         

        These are certainly different situations. In b), if we have an Xid that we can't find "in" any of the RMs then the transaction system will automatically assume that it sent a commit message which was acted upon by the RM but the response from the RM either failed to get back before the crash, or there was a crash before the transaction coordinator could update its logs. These are very different situations and are already covered by the implementation. What makes you think they are not?
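For readers following along, the distinction between cases a) and b) can be illustrated at the XA level. The following is only a minimal sketch, not JBossTS code: `XidScanSketch`, `RecoverCall`, `ScanResult` and the `xid` helper are hypothetical names introduced here, and the single combined-flag call to `XAResource.recover` is a simplification of a real recovery scan. It shows that a scan which throws an `XAException` and a scan which succeeds but does not return the Xid are observably different outcomes for one resource manager.

```java
import java.util.Arrays;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

final class XidScanSketch {

    /** Possible outcomes of scanning one resource manager for one Xid. */
    enum ScanResult { FOUND, NOT_FOUND, SCAN_FAILED }

    /** Hypothetical wrapper so the scan can be exercised without a real RM. */
    interface RecoverCall {
        Xid[] recover() throws XAException;
    }

    /** Adapts a real XAResource; one combined-flag scan is a simplification. */
    static RecoverCall forResource(XAResource rm) {
        return () -> rm.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
    }

    /** The Xid interface does not define equals(), so compare its three parts. */
    static boolean sameXid(Xid a, Xid b) {
        return a.getFormatId() == b.getFormatId()
            && Arrays.equals(a.getGlobalTransactionId(), b.getGlobalTransactionId())
            && Arrays.equals(a.getBranchQualifier(), b.getBranchQualifier());
    }

    /** Case a) maps to SCAN_FAILED, case b) maps to NOT_FOUND. */
    static ScanResult scan(RecoverCall rm, Xid wanted) {
        final Xid[] inDoubt;
        try {
            inDoubt = rm.recover();
        } catch (XAException e) {
            return ScanResult.SCAN_FAILED;   // the RM could not be asked
        }
        if (inDoubt != null) {
            for (Xid candidate : inDoubt) {
                if (sameXid(candidate, wanted)) {
                    return ScanResult.FOUND;
                }
            }
        }
        return ScanResult.NOT_FOUND;         // the RM answered, Xid unknown
    }

    /** Minimal illustrative Xid implementation. */
    static Xid xid(final int formatId, final byte[] gtrid, final byte[] bqual) {
        return new Xid() {
            public int getFormatId() { return formatId; }
            public byte[] getGlobalTransactionId() { return gtrid.clone(); }
            public byte[] getBranchQualifier() { return bqual.clone(); }
        };
    }

    public static void main(String[] args) {
        Xid wanted = xid(1, new byte[] {1}, new byte[] {2});
        RecoverCall healthyButEmpty = () -> new Xid[0];
        RecoverCall down = () -> { throw new XAException(XAException.XAER_RMFAIL); };
        System.out.println("empty scan:  " + scan(healthyButEmpty, wanted));
        System.out.println("failed scan: " + scan(down, wanted));
    }
}
```

Note that this only classifies the result for a single resource manager. Whether "not found in any registered RM" may be treated as "already committed" is a separate policy question, which is what the rest of the thread discusses.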

        • 16. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
          Andriy Hnativ Newbie

          "These are very different situations and are already covered by the implementation. What makes you think they are not?"


          First of all, the “JBoss Transactions 4.2 – JTS Programmers Guide”, chapter 7 (Failure recovery), subsection “Recovery behaviour”, says: “It is important to realise that the RecoveryManager cannot distinguish these two cases by any protocol mechanism” (the same two cases we are discussing here).


          Second, I can judge based on the recovery behaviour. If I simulate situation b), the transaction system does not assume anything for me and does not stop attempting recovery for that XID after the first attempt; it continues (by default, every 2 minutes) to try to recover the same XID even though it was not matched to any of the RMs during previous attempts.


          How do I make the JBossTS stop recovery attempts after the first attempt in the situation b), but continue recovery attempts in the situation a)? Where in the code is it supposed to differentiate the 2 cases, and stop the recovery attempts in the situation b)?


          Yes, by “clients” I mean sys admins.


          “No. The recovery subsystem is running continuously and will attempt to recover as soon as possible”


          Which is when? Again, the JBossTS doc says: “In the case of machine (system) crash or network failure, the recovery will not take place until the system or network are restored, but the original application does not need to be restarted”. Doesn’t that sentence mean that recovery will not start until all Resource Managers that participated in the failed transaction are back up?

          • 17. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
            Mark Little Master

            Comments in line ...

            Andriy Hnativ wrote:

             

            "These are very different situations and are already covered by the implementation. What makes you think they are not?"


            First of all, the “JBoss Transactions 4.2 – JTS Programmers Guide”, chapter 7 (Failure recovery), subsection “Recovery behaviour”, says: “It is important to realise that the RecoveryManager cannot distinguish these two cases by any protocol mechanism” (the same two cases we are discussing here).


            <ml>You shouldn't take bits of documentation text out of context. First, you are quoting from the JTS documentation, which is talking about JTS/OTS Resources and not specifically XA Resource Managers. Yes, the docs are right in that the error responses are identical in the case of CORBA. But if you have specific types of Resources, then assumptions can be made on their behalf. These assumptions may not be valid for arbitrary Resources, which is why the OTS standard does not call them out specifically. If you read the rest of what I posted then you should be able to understand what I mean and how the documentation is still valid.</ml>


            Second, I can judge based on the recovery behaviour. If I simulate situation b), the transaction system does not assume anything for me and does not stop attempting recovery for that XID after the first attempt; it continues (by default, every 2 minutes) to try to recover the same XID even though it was not matched to any of the RMs during previous attempts.


            <ml>Without seeing your test, we really can't comment.</ml>


            How do I make the JBossTS stop recovery attempts after the first attempt in the situation b), but continue recovery attempts in the situation a)? Where in the code is it supposed to differentiate the 2 cases, and stop the recovery attempts in the situation b)?


            <ml>It's all in the repository. Usually within <...>.recovery.<...> package structures. There are tests in the repository too that you can check out and run to confirm that the release you have is functioning correctly. However, if you do believe that there is an issue, create a JIRA and attach a stand-alone test case for us to look at.</ml>


            Yes, by “clients” I mean sys admins.


            “No. The recovery subsystem is running continuously and will attempt to recover as soon as possible”


            Which is when? Again, the JBossTS doc says: “In the case of machine (system) crash or network failure, the recovery will not take place until the system or network are restored, but the original application does not need to be restarted”. Doesn’t that sentence mean that recovery will not start until all Resource Managers that participated in the failed transaction are back up?


            <ml>Did you read the Failure Recovery Guide?</ml>

            • 18. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
              Andriy Hnativ Newbie

              We retested the following four cases:

               

              1)            Test case when during recovery the Resource is not available, with the property com.arjuna.ats.jta.xaAssumeRecoveryComplete set to true.

              2)            Test case when during recovery the Resource is available but returns an empty array of xids when recover() is called, with the property com.arjuna.ats.jta.xaAssumeRecoveryComplete set to true.

              3)            Test case when during recovery the Resource is not available, with the property com.arjuna.ats.jta.xaAssumeRecoveryComplete set to false.

              4)            Test case when during recovery the Resource is available but returns an empty array of xids when recover() is called, with the property com.arjuna.ats.jta.xaAssumeRecoveryComplete set to false.

               

               

              I can confirm that in the first two cases (with the property xaAssumeRecoveryComplete set to true) the recovery process stops after the unsuccessful attempt. After that attempt, as far as I can judge, JBoss removes the XID from the transaction log and treats the failed transaction as successfully committed. So the recovery process behaves the same way regardless of whether the resource manager is unavailable, or is available but returns an empty array of xids.

               

              Likewise, in the last two cases (with the property xaAssumeRecoveryComplete set to false) the recovery process does not stop after an unsuccessful attempt and is retried periodically. So again the recovery process behaves the same way regardless of whether the resource manager is unavailable, or is available but returns an empty array of xids.

               

              Here are the steps of our test cases:

               

              a)            For 1) and 2), set the property com.arjuna.ats.jta.xaAssumeRecoveryComplete in jbossts-properties.xml to true.

              b)            During a transaction, kill JBoss right after one of the resources has returned from its commit() method (that is, after the resource successfully committed but before JBoss updated its transaction logs).

              c)            For 1) and 3) kill the resource that successfully committed during the previous transaction.

              d)            Restart JBoss; the recovery process is triggered during startup.

               

              I think, cases 1) and 2) should differ in the following: for case 1) JBoss should be continuing recovery after the unsuccessful attempt, since it knows that one of the resources is unavailable during recovery. In other words, JBoss should know that it never matched the necessary XID and because there is a failing resource manager that could potentially return the matching XID when it becomes available, it should never remove the XID from the transaction log.  Recovery process for that XID should start over and over until the resource becomes available and the XID can be matched or not.

              • 19. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                Andriy Hnativ Newbie

                Hi, so can I get a response to my previous post? Is that really a problem with JBoss?

                 

                We have provided you with the test cases that (we think) show the problem - do you need some additional information? Maybe we are doing something wrong?

                 

                Thanks.

                • 20. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                  Mark Little Master

                  I'll check with Mike when he gets back, but did you provide the actual test cases (code) or just describe what you are doing?

                  • 21. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                    Jonathan Halliday Master

                    "cases 1) and 2) should differ in the following: for case 1) JBoss should be continuing recovery after the unsuccessful attempt, since it knows that one of the resources is unavailable during recovery. In other words, JBoss should know that it never matched the necessary XID and because there is a failing resource manager that could potentially return the matching XID when it becomes available, it should never remove the XID from the transaction log.  Recovery process for that XID should start over and over until the resource becomes available and the XID can be matched or not."

                     

                    Umm, No. You explicitly told it to assumeComplete regardless of what it actually saw with its own eyes. Stop trying to blame it for doing as it's told. If you want the stated behaviour, just go back to the default settings - that's how they work.

                    • 22. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                      Andriy Hnativ Newbie

                      Thanks Mark. For now I did not provide any code because, before investing the time and effort in writing the tests, I want to understand the expected behaviour. I want to know how the recovery should behave in the four cases I described.

                       

                      Jonathan, the problem with the default settings, and the reason for this whole thread, is that case 4 does not work the way we want: it retries recovery indefinitely, polluting the logs, whereas we want it to stop after trying only once.

                       

                      Our goal is to conditionally stop the recovery process after the first unsuccessful attempt:

                         stop recovery attempts if the first attempt failed because JBossTS could not match the XID of one of the transaction branches to any of those returned by all resource managers

                      but

                         continue recovery attempts if recovery failed because one of the resource managers was unavailable.

                       

                      Thanks.

                      • 23. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                        Michael Musgrove Master

                        Andriy Hnativ wrote:

                         

                        Our goal is to conditionally stop the recovery process after the first unsuccessful attempt:

                           stop recovery attempts if the first attempt failed because JBossTS could not match the XID of one of the transaction branches to any of those returned by all resource managers

                        Just because the known XA resource managers don't know about the xid doesn't mean that it does not exist. There could be other reasons as to why the configured RMs don't know about the xid (the simplest scenario is misconfiguration).

                         

                        As I mentioned in an earlier post on this thread, you need to configure the transaction expiry scanners. Then the recovery will stop (after a number of hours).

                        • 24. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                          Jonathan Halliday Master

                          yup, the problem is the system can't distinguish those two cases. In the absence of logged meta-data relating the Xid to a datasource Id, it's impossible to distinguish between 'all registered resource managers' and 'all possible resource managers where the Xid may reside'.  The situation where you forget to register an RM for recovery, or where they are deployed and undeployed dynamically, is still going to hurt.  assumeComplete is a pretty crude global override covering two cases. What you're looking for is a more conservative, fine-grained control that treats them separately, equivalent to 'assume the set of RMs registered for recovery is complete, but within that assumption continue to be paranoid.'  i.e. 'unavailable' becomes something categorically determined by a failed scan attempt, rather than something that may be caused by a failed scan attempt or the inability to scan in the first place because it's unregistered.  I can see the utility of such an option, but we're unlikely to add it because the new meta-data based approach that's already in the pipeline would largely obsolete it.

                          • 25. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                            Andriy Hnativ Newbie

                            Thanks Jonathan for the information, this is what we were looking for, now we can make our decisions based on this information.

                             

                            Thanks Michael, we will look into that.

                            • 26. Re: Warning during attempted recovery of a successfully committed XA transaction branch: JBoss 5
                              Andriy Hnativ Newbie

                              In case anybody wants a solution for this (making JBoss capable of differentiating whether

                              (1) it could not match XIDs (which would lead to the failed recovery attempt) when one of the resources registered for recovery in jbossts-properties.xml was not available, or

                              (2) all registered resources were available but JBoss could not match XIDs anyway),

                              we developed a patch (attached) - when applied, JBoss will stop attempting recovery if during the previous recovery attempt all registered resources were available but JBoss could not match XIDs (so in the case 2) AND IF THE PROPERTY com.arjuna.ats.jta.xaAssumeRecoveryComplete IS SET TO TRUE (so we changed the behavior of that property: now if it is set to true, JBoss will not always stop attempting recovery if the previous attempt failed - in the case 1 the attempts will not be stopped).
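The behaviour described for the patch can be summarised as a small decision function. This is a hedged sketch based only on the description in this thread, not the attached patch itself: `RecoveryDecisionSketch`, `ScanResult`, `Decision`, `stock` and `patched` are hypothetical names, `stock` encodes the behaviour reported in the four test cases above, and the handling of the property set to false in `patched` is an assumption (unchanged from the reported behaviour).

```java
import java.util.Arrays;
import java.util.List;

final class RecoveryDecisionSketch {

    /** Result of scanning one registered resource manager for the Xid. */
    enum ScanResult { FOUND, NOT_FOUND, SCAN_FAILED }

    /** What to do with the logged Xid after one recovery pass. */
    enum Decision { RECOVER_BRANCH, RETRY_LATER, ASSUME_COMPLETE }

    /** Behaviour reported in the thread's test cases 1) to 4). */
    static Decision stock(List<ScanResult> results, boolean assumeComplete) {
        if (results.contains(ScanResult.FOUND)) {
            return Decision.RECOVER_BRANCH;
        }
        // An unavailable RM and an empty scan are treated alike.
        return assumeComplete ? Decision.ASSUME_COMPLETE : Decision.RETRY_LATER;
    }

    /** Behaviour described for the patch (an illustration, not the patch). */
    static Decision patched(List<ScanResult> results, boolean assumeComplete) {
        if (results.contains(ScanResult.FOUND)) {
            return Decision.RECOVER_BRANCH;
        }
        if (results.contains(ScanResult.SCAN_FAILED)) {
            // Case (1): a registered RM was unavailable, so keep trying.
            return Decision.RETRY_LATER;
        }
        // Case (2): every registered RM answered and none knew the Xid.
        return assumeComplete ? Decision.ASSUME_COMPLETE : Decision.RETRY_LATER;
    }

    public static void main(String[] args) {
        List<ScanResult> rmDown = Arrays.asList(ScanResult.SCAN_FAILED);
        List<ScanResult> rmEmpty = Arrays.asList(ScanResult.NOT_FOUND);
        for (boolean assume : new boolean[] {true, false}) {
            System.out.println("assumeComplete=" + assume
                + " RM down:  stock=" + stock(rmDown, assume)
                + " patched=" + patched(rmDown, assume));
            System.out.println("assumeComplete=" + assume
                + " RM empty: stock=" + stock(rmEmpty, assume)
                + " patched=" + patched(rmEmpty, assume));
        }
    }
}
```

As Michael and Jonathan point out earlier in the thread, "all registered RMs answered" is not the same as "all RMs where the Xid may reside answered", so the `ASSUME_COMPLETE` branch is only safe if the set of RMs registered for recovery really is complete.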
