5 Replies Latest reply on May 23, 2006 11:46 AM by marklittle

    No recovery of XA resource if crash during prepare

    paul.robinson

      Hello,

      I have created a test XA resource that simply logs its actions. I can give the resource a name so that when it logs it identifies itself. I can also choose when (if at all) I want it to crash.

      I am investigating what happens when a resource fails during prepare. I have six resources (A-F). I enlist them in order and configure resource E to crash at the beginning of its prepare method.

      My output is below. For example, the first line states that the start method was called on A. Eventually, when E is prepared, it crashes; this is shown by the line "E: CRASH!". After the crash (a Runtime.getRuntime().halt(0)) I restart JBoss, and the resources A-D are then found in the intentions log, deserialised and rolled back.
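
      For reference, a test resource of the kind described above could be sketched roughly as follows. This is not the original test code: the class names LoggingXAResource and CrashDuringPrepareDemo, the shared in-memory log and the pluggable crash action are all assumptions made for illustration, and the serialisation hooks that produce the "serializing" lines are left out. In the real test the crash action would be Runtime.getRuntime().halt(0); the demo swaps in an exception so that it can run to completion.

```java
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical test resource: logs every call and can "crash" at the start of prepare. */
class LoggingXAResource implements XAResource {
    private final String name;
    private final List<String> log;
    private final Runnable crashAction; // null means "never crash"

    LoggingXAResource(String name, List<String> log, Runnable crashAction) {
        this.name = name;
        this.log = log;
        this.crashAction = crashAction;
    }

    private void note(String event) { log.add(name + ": " + event); }

    @Override public void start(Xid xid, int flags) { note("start"); }
    @Override public void end(Xid xid, int flags) { note("end"); }

    @Override public int prepare(Xid xid) {
        if (crashAction != null) {
            note("CRASH!");
            crashAction.run(); // the real test would halt the JVM here
        }
        note("prepare (" + XA_OK + ")");
        return XA_OK;
    }

    @Override public void commit(Xid xid, boolean onePhase) { note("commit"); }
    @Override public void rollback(Xid xid) { note("rollback"); }
    @Override public void forget(Xid xid) { note("forget"); }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

/** Drives two resources by hand; a real run would enlist them with the transaction manager. */
class CrashDuringPrepareDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        // Stand-in for Runtime.getRuntime().halt(0) so the demo can finish.
        Runnable simulatedCrash = () -> { throw new IllegalStateException("simulated crash"); };

        List<LoggingXAResource> enlisted = new ArrayList<>();
        enlisted.add(new LoggingXAResource("A", log, null));
        enlisted.add(new LoggingXAResource("E", log, simulatedCrash));

        // The Xid is ignored by this resource, so null is passed for brevity.
        for (LoggingXAResource r : enlisted) {
            r.start(null, XAResource.TMNOFLAGS);
        }
        try {
            for (LoggingXAResource r : enlisted) {
                r.end(null, XAResource.TMSUCCESS);
                r.prepare(null);
            }
        } catch (IllegalStateException crash) {
            // In the real test the JVM is gone at this point.
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

      With only A and E enlisted, the sketch produces the same pattern as my log: both starts, then A's end and prepare, then E's end followed by the crash marker.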

      There are 2 things that stand out to me as odd...

      1) Rollback is not called on E. I know that E crashed at the start of the prepare method. However, as far as the coordinator is concerned, it may have crashed after the prepare had completed but before the outcome had been logged completely.

      Because JBossTS only serialises resources after the completion of a prepare, E was never serialised and thus the recovery manager knew nothing about E when I rebooted JBoss. This seems wrong to me as E could have prepared and now be waiting around for instructions that it will never receive.



      2) The resources A-D are rolled back twice. I know the rollback method should be idempotent, but it does seem a little odd.

      Thanks,

      Paul.





      A: start
      B: start
      C: start
      D: start
      E: start
      F: start
      A: end
      A: prepare (0)
      A: serializing
      B: end
      B: prepare (0)
      B: serializing
      C: end
      C: prepare (0)
      C: serializing
      D: end
      D: prepare (0)
      D: serializing
      E: end
      E: CRASH!
      A: de-serializing
      A: rollback
      A: rollback
      B: de-serializing
      B: rollback
      B: rollback
      C: de-serializing
      C: rollback
      C: rollback
      D: de-serializing
      D: rollback
      D: rollback

        • 1. Re: No recovery of XA resource if crash during prepare
          marklittle

          In order to explain 2), you'll need to tell us which version of JTA you are using (local or JTS).

          Either way, the explanation for 1) remains the same: presumed abort! The coordinator doesn't log anything until it knows for sure that the prepare has been successfully acked by all participants. If it doesn't know this (and in the case of E it can't), it rolls back. There is no point in it trying to contact E because it knows that E, if it's a well-behaved (i.e., correctly implemented) participant, will also abide by the rules and roll back if it crashed. If it didn't crash and did get the prepare and sent an ack (but the ack got lost, for example), then it should call back to the coordinator eventually and find out: it'll be told that the transaction rolled back.

          What were you expecting? That the coordinator would record the information about the failed participant(s) and keep trying to roll them back? Not much of an optimisation in that case ;-)

          OK, it could just do this from volatile memory, but that doesn't reduce the window of vulnerability (plus there's still overhead on the coordinator). All it is doing is sending a convenience signal to the participant (and that will only be received if the participant has recovered). If the participant really did fail then it is likely that instance won't ever come back (why would it, since it may not have logged anything), so the repeated rollback message from the coordinator could easily come back with a fault: participant-does-not-exist. A lot of effort for very little advantage.

          Implement the participant so it abides by presumed abort semantics. 2PC is a contract between coordinator and participant: it's not the domain of the coordinator to implement the whole thing.
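
          As a rough illustration of the presumed abort rule, here is a deliberately simplified coordinator. It is a sketch only and not JBossTS code: the names PresumedAbortCoordinator, Participant, complete and outcomeFor are hypothetical, and an in-memory set stands in for the durable log. The point it shows is that a decision is recorded only once every participant has acknowledged prepare, so a query about any transaction that is not in the log is answered with rollback.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified illustration of presumed abort; not how JBossTS is implemented. */
class PresumedAbortCoordinator {
    enum Outcome { COMMIT, ROLLBACK }

    /** Hypothetical participant: returns true to vote commit, may throw if it fails. */
    interface Participant {
        boolean prepare();
    }

    // Stand-in for the durable log of commit decisions.
    private final Set<String> decisionLog = new HashSet<>();

    /** Runs the prepare phase and records a decision only if every participant acks. */
    Outcome complete(String txId, List<Participant> participants) {
        for (Participant p : participants) {
            boolean votedYes;
            try {
                votedYes = p.prepare();
            } catch (RuntimeException failure) {
                votedYes = false; // no ack received: treat as a failed prepare
            }
            if (!votedYes) {
                return Outcome.ROLLBACK; // nothing is logged for this transaction
            }
        }
        decisionLog.add(txId);
        return Outcome.COMMIT;
    }

    /** Answers a recovering participant: no log entry means the transaction rolled back. */
    Outcome outcomeFor(String txId) {
        return decisionLog.contains(txId) ? Outcome.COMMIT : Outcome.ROLLBACK;
    }
}
```

          A participant that prepared but never heard back would call something like outcomeFor with its transaction id; for a transaction that failed during prepare, or one the coordinator has never heard of, the answer is ROLLBACK.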

          • 2. Re: No recovery of XA resource if crash during prepare
            paul.robinson

            Thanks for the response.

            I am using JBossTS in JTS mode.

            I thought it looked like it was doing presumed abort. However, I was under the impression that XA didn't support presumed abort. My understanding was that presumed abort will only work when resource driven recovery is possible. I didn't realise that you could do resource driven recovery with XA. How does the resource request the outcome of a transaction from the coordinator; what interface does it use?

            • 3. Re: No recovery of XA resource if crash during prepare
              marklittle

              Vanilla XA doesn't. However, we wrap it in JTS so it does. Read the JTS/OTS specification to understand how recovery works there.

              The reason you get two commit/rollback messages is that top-down and bottom-up recovery kick in simultaneously.

              • 4. Re: No recovery of XA resource if crash during prepare
                paul.robinson

                Ah ok, that makes sense now.

                I see two rollbacks in my example because each prepared resource is asking the coordinator what the outcome was (bottom-up recovery) and the coordinator is rolling back all the prepared resources (top-down recovery).
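
                To convince myself that the double call is harmless, here is a minimal sketch of an idempotent participant. The class RecoverableParticipant and its undoCount method are hypothetical names of my own, and the two recovery passes are reduced to two plain method calls; the only thing it demonstrates is that the second rollback is logged but undoes nothing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant whose rollback is safe to call more than once. */
class RecoverableParticipant {
    enum State { PREPARED, ROLLED_BACK }

    private final String name;
    private final List<String> log;
    private State state = State.PREPARED;
    private int undoCount = 0; // how many times work was actually undone

    RecoverableParticipant(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    void rollback() {
        log.add(name + ": rollback"); // every call is logged, as in the test output
        if (state == State.PREPARED) {
            undoCount++; // the real undo work would happen here, once
            state = State.ROLLED_BACK;
        }
    }

    int undoCount() { return undoCount; }
}

class DualRecoveryDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        RecoverableParticipant a = new RecoverableParticipant("A", log);
        a.rollback(); // top-down: the coordinator's recovery rolls back the prepared resource
        a.rollback(); // bottom-up: the resource's own recovery asks for the outcome and applies it
        log.forEach(System.out::println);
        System.out.println("undo performed " + a.undoCount() + " time(s)");
    }
}
```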

                For resource E I won't see top-down recovery because, as far as the coordinator is concerned, E did not prepare. Also, from the point of view of E, E was never prepared either as it crashed during the prepare method.

                So, if E.prepare() returns successfully, E's JTS wrapper (Ejts) would then be logged. Should E crash after the log, but before the return to the coordinator, the recovery manager at E would find Ejts in the log, look at the XID to find the address of the coordinator and then ask the coordinator what the outcome was. The coordinator would have nothing logged for E, so would presume that rollback is the right thing to do.

                However, how would E recover if it was to crash after the return of E.prepare() but before successful completion of the logging by Ejts?

                Am I correct in thinking there is a window of vulnerability here?

                Paul.

                • 5. Re: No recovery of XA resource if crash during prepare
                  marklittle

                   

                  "n9086822" wrote:

                  However, how would E recover if it was to crash after the return of E.prepare() but before successful completion of the logging by Ejts?


                  The situation is that the resource has prepared but the JTS wrapper hasn't written the coordinator reference (and hasn't sent an ack to the prepare request). In that case, the transaction rolls back, but your resource will remain in a prepared state and recovery won't kick in. Then you need an admin tool to manually resolve. This is an edge case, with a small window of vulnerability. However, as long as your admin resolves the state using the transaction information, the transaction will eventually complete (roll back).
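
                  For what it's worth, the core of such an admin tool can be sketched with nothing more than the standard XAResource interface. This is an illustration and not a JBossTS utility: the class InDoubtResolver and its method are hypothetical, and the predicate stands in for whatever check the administrator makes against the transaction information to confirm that a given branch really did roll back.

```java
import java.util.function.Predicate;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical helper for manually resolving branches left in the prepared state. */
final class InDoubtResolver {
    private InDoubtResolver() {}

    /**
     * Asks the resource for its in-doubt branches and rolls back only those
     * the caller confirms belong to a rolled-back transaction.
     * Returns the number of branches that were rolled back.
     */
    static int rollbackConfirmed(XAResource resource, Predicate<Xid> confirmedRolledBack)
            throws XAException {
        // A single call with both flags performs one complete recovery scan.
        Xid[] inDoubt = resource.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
        if (inDoubt == null) {
            return 0;
        }
        int resolved = 0;
        for (Xid xid : inDoubt) {
            if (confirmedRolledBack.test(xid)) {
                resource.rollback(xid);
                resolved++;
            }
        }
        return resolved;
    }
}
```

                  The predicate is there on purpose: rolling back every in-doubt branch blindly could undo work belonging to a transaction that actually committed.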