-
1. Re: No recovery of XA resource if crash during prepare
marklittle May 22, 2006 12:31 PM (in response to paul.robinson)
In order to explain 2), you'll need to tell us which version of JTA you are using (local or JTS).
Either way, the explanation for 1) remains the same: presumed abort! The coordinator doesn't log anything until it knows for sure that the prepare has been successfully acked by all participants. If it doesn't know this (and in the case of E it can't), it rolls back. There is no point in it trying to contact E because it knows that E, if it's a well-behaved (i.e., correctly implemented) participant, will also abide by the rules and roll back if it crashed. If it didn't crash and did get the prepare and sent an ack (but the ack got lost, for example), then it should call back to the coordinator eventually and find out: it'll be told that the transaction rolled back.
What were you expecting? That the coordinator would record the information about the failed participant(s) and keep trying to roll them back? Not much of an optimisation in that case ;-)
OK, it could just do this from volatile memory, but that doesn't reduce the window of vulnerability (plus there's still overhead on the coordinator). All it is doing is sending a convenience signal to the participant (and that will only be received if the participant has recovered). If the participant really did fail then it is likely that instance won't ever come back (why would it, since it may not have logged anything), so the repeated rollback message from the coordinator could easily come back with a fault: participant-does-not-exist. A lot of effort for very little advantage.
Implement the participant so it abides by presumed abort semantics. 2PC is a contract between coordinator and participant: it's not the domain of the coordinator to implement the whole thing. -
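To make the presumed-abort rule above concrete, here is a minimal sketch in plain Java. Every name in it (`PresumedAbortSketch`, `Coordinator`, `Participant`, `RecordingParticipant`, `queryOutcome`) is hypothetical and chosen for illustration; it is not the JBossTS or JTS API, and an in-memory map stands in for the durable log. It only shows the shape of the rule: nothing is logged unless every prepare is acked, a failed prepare leads to rollback without chasing the failed participant, and a later outcome query for an unknown transaction is answered with "rolled back".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not the JBossTS/JTS API.
class PresumedAbortSketch {
    enum Outcome { COMMITTED, ROLLED_BACK }

    interface Participant {
        boolean prepare();   // true = prepared and acked; may throw if it crashes
        void commit();
        void rollback();
    }

    static class Coordinator {
        // Stands in for the durable log; only commit decisions are ever written.
        private final Map<String, Outcome> log = new HashMap<>();

        Outcome complete(String txId, List<? extends Participant> participants) {
            List<Participant> prepared = new ArrayList<>();
            for (Participant p : participants) {
                boolean acked;
                try {
                    acked = p.prepare();
                } catch (RuntimeException crash) {
                    acked = false;   // no ack received, so the vote is unknown
                }
                if (!acked) {
                    // Nothing was logged. Roll back those that did prepare and
                    // do not keep retrying the participant that failed.
                    for (Participant q : prepared) {
                        q.rollback();
                    }
                    return Outcome.ROLLED_BACK;
                }
                prepared.add(p);
            }
            log.put(txId, Outcome.COMMITTED);   // first and only log write
            for (Participant p : prepared) {
                p.commit();
            }
            return Outcome.COMMITTED;
        }

        // A participant calling back later: no log entry means "rolled back".
        Outcome queryOutcome(String txId) {
            return log.getOrDefault(txId, Outcome.ROLLED_BACK);
        }
    }

    // Test double that can simulate a crash inside prepare().
    static class RecordingParticipant implements Participant {
        private final boolean crashDuringPrepare;
        String state = "ACTIVE";

        RecordingParticipant(boolean crashDuringPrepare) {
            this.crashDuringPrepare = crashDuringPrepare;
        }

        public boolean prepare() {
            if (crashDuringPrepare) {
                state = "CRASHED";
                throw new IllegalStateException("crashed during prepare");
            }
            state = "PREPARED";
            return true;
        }

        public void commit() { state = "COMMITTED"; }

        public void rollback() { state = "ROLLED_BACK"; }
    }
}
```

In this sketch, completing a transaction whose second participant is built with `new RecordingParticipant(true)` rolls the first one back, never calls rollback on the crashed one, and leaves the coordinator's log empty for that transaction.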
2. Re: No recovery of XA resource if crash during prepare
paul.robinson May 23, 2006 5:48 AM (in response to paul.robinson)
Thanks for the response.
I am using JBossTS in JTS mode.
I thought it looked like it was doing presumed abort. However, I was under the impression that XA didn't support presumed abort. My understanding was that presumed abort will only work when resource-driven recovery is possible. I didn't realise that you could do resource-driven recovery with XA. How does the resource request the outcome of a transaction from the coordinator; what interface does it use? -
3. Re: No recovery of XA resource if crash during prepare
marklittle May 23, 2006 5:51 AM (in response to paul.robinson)
Vanilla XA doesn't. However, we wrap it in JTS so it does. Read the JTS/OTS specification to understand how recovery works there.
The reason you get two commit/rollback messages is that top-down and bottom-up recovery kick in simultaneously. -
4. Re: No recovery of XA resource if crash during prepare
paul.robinson May 23, 2006 7:19 AM (in response to paul.robinson)
Ah ok, that makes sense now.
I see two rollbacks in my example because each prepared resource is asking the coordinator what the outcome was (bottom-up recovery) and the coordinator is rolling back all the prepared resources (top-down recovery).
For resource E I won't see top-down recovery because, as far as the coordinator is concerned, E did not prepare. Also, from the point of view of E, E was never prepared either, as it crashed during the prepare method.
So, if E.prepare() returns successfully, E's JTS wrapper (Ejts) would then be logged. Should E crash after the log, but before the return to the coordinator, the recovery manager at E would find Ejts in the log, look at the XID to find the address of the coordinator and then ask the coordinator what the outcome was. The coordinator would have nothing logged for E, so would presume that rollback is the right thing to do.
However, how would E recover if it was to crash after the return of E.prepare() but before successful completion of the logging by Ejts?
Am I correct in thinking there is a window of vulnerability here?
Paul. -
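The bottom-up path described in the post above (the recovery manager at E finds the wrapper record, locates the coordinator, and asks for the outcome) can be sketched roughly as follows. This is a simplified illustration, not JBossTS code: `BottomUpRecoverySketch`, `WrapperRecord`, `ParticipantRecovery` and `askOutcome` are hypothetical names, the coordinator reference is a plain object reference rather than something looked up from the XID, and the logs are in-memory collections.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not the JBossTS/JTS API.
class BottomUpRecoverySketch {
    enum Outcome { COMMITTED, ROLLED_BACK }

    // Coordinator side: knows only the transactions it decided to commit.
    static class Coordinator {
        final Map<String, Outcome> decisionLog = new HashMap<>();

        // Presumed abort: no record for the transaction means it rolled back.
        Outcome askOutcome(String txId) {
            return decisionLog.getOrDefault(txId, Outcome.ROLLED_BACK);
        }
    }

    // What the wrapper around the resource logs once prepare() has returned:
    // the transaction id plus a way to reach the coordinator.
    static class WrapperRecord {
        final String txId;
        final Coordinator coordinator;

        WrapperRecord(String txId, Coordinator coordinator) {
            this.txId = txId;
            this.coordinator = coordinator;
        }
    }

    // Participant-side recovery manager: walks its own log after a restart.
    static class ParticipantRecovery {
        final List<WrapperRecord> wrapperLog = new ArrayList<>();
        final Map<String, Outcome> resolved = new LinkedHashMap<>();

        void recover() {
            Iterator<WrapperRecord> it = wrapperLog.iterator();
            while (it.hasNext()) {
                WrapperRecord record = it.next();
                // Ask the coordinator and apply whatever it answers.
                resolved.put(record.txId, record.coordinator.askOutcome(record.txId));
                it.remove();   // the in-doubt record is now resolved
            }
        }
    }
}
```

If E's wrapper record was written but the ack never reached the coordinator, `recover()` in this sketch resolves E's transaction to `ROLLED_BACK`, because the coordinator has no entry for it. A record that was never written is simply never visited, which is the case Paul asks about next.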
5. Re: No recovery of XA resource if crash during prepare
marklittle May 23, 2006 11:46 AM (in response to paul.robinson)
"n9086822" wrote:
However, how would E recover if it was to crash after the return of E.prepare() but before successful completion of the logging by Ejts?
The situation is that the resource has prepared but the JTS wrapper hasn't written the coordinator reference (and hasn't sent an ack to the prepare request). In that case, the transaction rolls back, but your resource will remain in a prepared state and recovery won't kick in. Then you need an admin tool to resolve it manually. This is an edge case, with a small window of vulnerability. However, as long as your admin resolves the state using the transaction information, the transaction will eventually complete (roll back).
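As a compact summary of the cases discussed in this thread, the small table below maps the point at which E crashes to how E ends up being resolved. It is only a sketch of the reasoning above: the enum and method names are hypothetical, and in all three cases the coordinator never receives E's ack, so the transaction itself rolls back.

```java
// Illustrative only: hypothetical names summarising the thread's three cases.
class PrepareCrashWindowSketch {
    enum CrashPoint {
        DURING_PREPARE,                    // resource never reached the prepared state
        AFTER_PREPARE_BEFORE_WRAPPER_LOG,  // resource prepared, wrapper record not written
        AFTER_WRAPPER_LOG_BEFORE_ACK       // wrapper record written, ack never sent
    }

    enum Resolution {
        NOTHING_IN_DOUBT,     // a well-behaved resource rolls back on its own
        AUTOMATIC_ROLLBACK,   // bottom-up recovery asks the coordinator and rolls back
        MANUAL_ROLLBACK       // resource stays prepared until an admin resolves it
    }

    static Resolution resolutionFor(CrashPoint crashPoint) {
        boolean resourcePrepared = crashPoint != CrashPoint.DURING_PREPARE;
        boolean wrapperLogged = crashPoint == CrashPoint.AFTER_WRAPPER_LOG_BEFORE_ACK;

        if (!resourcePrepared) {
            return Resolution.NOTHING_IN_DOUBT;
        }
        // Prepared but with no wrapper record: recovery has nothing to start from.
        return wrapperLogged ? Resolution.AUTOMATIC_ROLLBACK : Resolution.MANUAL_ROLLBACK;
    }
}
```

Only the middle case, `AFTER_PREPARE_BEFORE_WRAPPER_LOG`, is the window of vulnerability: the resource holds a prepared transaction that neither top-down nor bottom-up recovery will find.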