-
15. Re: XA resource and setting the timeout
timfox Nov 4, 2008 3:47 AM (in response to ataylor)
I can accept that if we return true from setTransactionTimeout, the Transaction Manager will expect us to time out with an XA_RBTIMEOUT after the time set.
But logically, that _does not_ imply that if we return false from setTransactionTimeout it is OK to return XAER_NOTA if a rollback/commit is attempted after we have heuristically timed out the branch. -
16. Re: XA resource and setting the timeout
ataylor Nov 4, 2008 3:59 AM (in response to ataylor)
> But logically, that _does not_ imply that if we return false from setTransactionTimeout it is OK to return XAER_NOTA if a rollback/commit is attempted after we have heuristically timed out the branch.
If that's the case, then we either don't time out at all and return false for setTimeout, or we need to keep the tx associated with the resource manager and return XA_RBTIMEOUT. Actually, as long as we release the acks and messages this should be OK. If we want, we could have some sort of reaper that removes any timed-out sessions that are really old. -
17. Re: XA resource and setting the timeout
timfox Nov 4, 2008 4:04 AM (in response to ataylor)
"ataylor" wrote:
> If that's the case, then we either don't time out at all and return false for setTimeout, or we need to keep the tx associated with the resource manager and return XA_RBTIMEOUT.
The problem with that is: how long do we keep the xid for? The xid list could grow without bound, so we'd have to clear it down after a time. If it does get cleared down and the tx mgr subsequently decides to commit/rollback on that branch, we won't have a record of the xid and we'll have to return XAER_NOTA anyway!
For now, let's return false from setTransactionTimeout, then just time out txs ourselves according to our own timeout, return XAER_NOTA, and double-check with Jonathan that this is OK. -
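The interim approach above (return false from setTransactionTimeout, forget the branch on our own timeout, answer XAER_NOTA afterwards) can be sketched roughly as follows. This is only an illustration: `BranchRegistry`, its methods, and the use of a `String` in place of an `Xid` are assumptions, not the actual resource manager code. Only `XAException` and its `XAER_NOTA` constant come from the standard `javax.transaction.xa` package.

```java
import javax.transaction.xa.XAException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified branch registry; not the real ResourceManagerImpl.
public class BranchRegistry {
    // Branch id (a stand-in for an Xid) mapped to its creation time in milliseconds.
    private final Map<String, Long> branches = new ConcurrentHashMap<>();

    // Mirrors XAResource.setTransactionTimeout: returning false tells the
    // transaction manager that the requested timeout was not applied.
    public boolean setTransactionTimeout(int seconds) {
        return false;
    }

    public void start(String branchId, long nowMillis) {
        branches.put(branchId, nowMillis);
    }

    // Called by the resource manager's own timeout logic: the branch is forgotten.
    public void timeOut(String branchId) {
        branches.remove(branchId);
    }

    // A rollback on a branch we no longer know about fails with XAER_NOTA.
    public void rollback(String branchId) throws XAException {
        if (branches.remove(branchId) == null) {
            throw new XAException(XAException.XAER_NOTA);
        }
    }
}
```

Because the branch is dropped at timeout, the xid list cannot grow without bound, at the cost of the transaction manager seeing XAER_NOTA rather than XA_RBTIMEOUT.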
18. Re: XA resource and setting the timeout
timfox Nov 4, 2008 2:31 PM (in response to ataylor)
Looking at the code committed today for this.
I notice a new TxTimeoutHandler is being created for every transaction. I don't think this scales, and it is unnecessary.
Instead we should have a single thread that scans the transaction map every x seconds/milliseconds and times transactions out that way. -
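A minimal sketch of the single-scanner idea, assuming transactions are tracked in a concurrent map keyed by id with their creation time as the value. `TxTimeoutScanner` and all of its members are hypothetical names for illustration; only the `java.util.concurrent` classes are real.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical scanner: one scheduled thread for all transactions,
// instead of one timeout handler per transaction.
public class TxTimeoutScanner {
    // Transaction id mapped to its creation time in milliseconds.
    private final Map<String, Long> createTimes = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMillis;

    public TxTimeoutScanner(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    public void register(String txId, long createTimeMillis) {
        createTimes.put(txId, createTimeMillis);
    }

    // One pass over the map; returns how many transactions were timed out.
    public int scan(long nowMillis) {
        int removed = 0;
        for (Map.Entry<String, Long> e : createTimes.entrySet()) {
            boolean expired = nowMillis - e.getValue() >= timeoutMillis;
            // remove(key, value) only succeeds if the entry is unchanged.
            if (expired && createTimes.remove(e.getKey(), e.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    // Runs scan() every periodMillis on the single scheduler thread.
    public void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(
                () -> scan(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    public int size() {
        return createTimes.size();
    }
}
```

The scan period bounds how late a timeout can fire, so it trades timer accuracy against scanning cost.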
19. Re: XA resource and setting the timeout
timfox Nov 4, 2008 2:37 PM (in response to ataylor)
Also ResourceManagerImpl::timeoutSeconds is only being set via the setTimeout method.
This means if the transaction manager never calls setTimeout() it will always be zero. -
20. Re: XA resource and setting the timeout
timfox Nov 4, 2008 2:40 PM (in response to ataylor)
I'd also test that timeouts work after the server is restarted and prepared txs are reloaded.
-
21. Re: XA resource and setting the timeout
timfox Nov 4, 2008 3:01 PM (in response to ataylor)
Take a look at RemotingServiceImpl to see how we do something similar.
There's a timer that scans the connections and fails any where no ping has arrived.
You could do something very similar (copy and paste). -
22. Re: XA resource and setting the timeout
ataylor Nov 5, 2008 3:32 AM (in response to ataylor)
> Take a look at RemotingServiceImpl to see how we do something similar.
> There's a timer that scans the connections and fails any where no ping has arrived.
> You could do something very similar (copy and paste).
Ok, will do.
> Also ResourceManagerImpl::timeoutSeconds is only being set via the setTimeout method.
> This means if the transaction manager never calls setTimeout() it will always be zero.
Ok, I'll set a default of 10 minutes and also make it configurable. -
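A rough sketch of the default-plus-configurable timeout, so the effective value is never zero when the transaction manager does not call setTimeout(). `TimeoutSettings` and its members are hypothetical; the 10-minute default is the figure mentioned above.

```java
// Hypothetical holder for the timeout; not the real ResourceManagerImpl.
public class TimeoutSettings {
    // Default of 10 minutes, used when nothing else is configured.
    public static final int DEFAULT_TIMEOUT_SECONDS = 600;

    private final int configuredSeconds;
    // Value requested through setTimeout; 0 means "never set".
    private int requestedSeconds;

    public TimeoutSettings() {
        this(DEFAULT_TIMEOUT_SECONDS);
    }

    public TimeoutSettings(int configuredSeconds) {
        if (configuredSeconds <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        this.configuredSeconds = configuredSeconds;
    }

    public void setTimeout(int seconds) {
        requestedSeconds = Math.max(0, seconds);
    }

    // Falls back to the configured value when no timeout was requested.
    public int effectiveTimeoutSeconds() {
        return requestedSeconds > 0 ? requestedSeconds : configuredSeconds;
    }
}
```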
23. Re: XA resource and setting the timeout
ataylor Nov 5, 2008 5:58 AM (in response to ataylor)
"timfox" wrote:
> I'd also test that timeouts work after the server is restarted and prepared txs are reloaded.
Ok, since I'll need to add the expireTime when the prepare record is saved to the journal, I'm renaming XidEncoding to TransactionEncoding and adding the expireTime to it. -
24. Re: XA resource and setting the timeout
jhalliday Nov 5, 2008 6:04 AM (in response to ataylor)
> I'd also test that timeouts work after the server is restarted and prepared txs are reloaded.
Except of course that it should time out only new, unprepared txs created after the restart, not the prepared ones. Which, come to think of it, is true even in the non-restart case: you need to test that a prepared tx will not be timed out, or at least be conscious of the issues that arise if it is.
BTW, do you include in-flight but unprepared txs in the list returned by a recover call on the XAResource? -
25. Re: XA resource and setting the timeout
timfox Nov 5, 2008 6:11 AM (in response to ataylor)
"jhalliday" wrote:
> > I'd also test that timeouts work after the server is restarted and prepared txs are reloaded.
> Except of course that it should time out only new, unprepared txs created after the restart, not the prepared ones. Which, come to think of it, is true even in the non-restart case: you need to test that a prepared tx will not be timed out, or at least be conscious of the issues that arise if it is.
You lost me there. I thought the whole point of timeouts was to time out prepared transaction branches, i.e. heuristically roll them back.
You're saying we should only time out unprepared txs? If so, why?
> BTW, do you include in-flight but unprepared txs in the list returned by a recover call on the XAResource?
No, we just include prepared txs. -
26. Re: XA resource and setting the timeout
timfox Nov 5, 2008 6:13 AM (in response to ataylor)
"ataylor" wrote:
> Ok, since I'll need to add the expireTime when the prepare record is saved to the journal, I'm renaming XidEncoding to TransactionEncoding and adding the expireTime to it.
Better to store the create time rather than the expire time. That allows the user to change the timeout value dynamically. -
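To illustrate why storing the create time helps: expiry can then be evaluated against whatever timeout is in force at check time, so a changed timeout also applies to transactions that already exist. `TxRecord` is a hypothetical name used only for this sketch; it is not the actual TransactionEncoding.

```java
// Hypothetical record; stores when the tx was created, not when it expires.
public class TxRecord {
    private final long createTimeMillis;

    public TxRecord(long createTimeMillis) {
        this.createTimeMillis = createTimeMillis;
    }

    public long getCreateTimeMillis() {
        return createTimeMillis;
    }

    // Expiry is computed from the current timeout on every check, so
    // changing the timeout affects records that were saved earlier.
    public boolean isExpired(long nowMillis, long timeoutMillis) {
        return nowMillis - createTimeMillis >= timeoutMillis;
    }
}
```

With a stored expire time, the same change would require rewriting every saved record.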
27. Re: XA resource and setting the timeout
ataylor Nov 5, 2008 6:13 AM (in response to ataylor)
> Except of course that it should time out only new, unprepared txs created after the restart, not the prepared ones. Which, come to think of it, is true even in the non-restart case: you need to test that a prepared tx will not be timed out, or at least be conscious of the issues that arise if it is.
Ok, so what issues will arise if we time out a prepared tx? If we don't time it out, aren't we in the same boat, i.e. transactions hanging around forever that would never be committed?
> BTW, do you include in-flight but unprepared txs in the list returned by a recover call on the XAResource?
No, we currently only return txs in the prepared state. -
28. Re: XA resource and setting the timeout
ataylor Nov 5, 2008 6:16 AM (in response to ataylor)
> Better to store the create time rather than the expire time. That allows the user to change the timeout value dynamically.
-
29. Re: XA resource and setting the timeout
jhalliday Nov 5, 2008 6:21 AM (in response to ataylor)
> You're saying we should only time out unprepared txs? If so, why?
Let's go back to my quote from the XA spec:
"An RM can mark a transaction branch as rollback-only any time except after a successful prepare. ...An RM can also unilaterally roll back and forget a branch any time except after a successful prepare."
Timeouts allow an RM to walk away from an unprepared tx that it thinks may have been abandoned. It's useful to cover e.g. client crashes. It does not lead to an inconsistent tx outcome since the tx is presumed abort at that stage anyhow.
Unilaterally deciding to roll back a prepared branch is a much bigger deal, as it can lead to heuristics. It can be done, but needs a lot more thought and logging.
Personally I'd avoid automating post-prepare rollbacks and instead provide a tool that admins can use to manually force a tx outcome. Since it may lead to data inconsistency, it's not a decision to be made lightly, and the best course of action usually needs some understanding of the business context. In some cases it's better to stay blocked for consistency, whilst in others availability concerns may make a rollback plus manual data reconciliation a more attractive option.
> No, we just include prepared txs
OK, that's fine. Some systems will include the unprepared ones too, on the basis that the tx manager will roll back any it does not recognise, which means they get cleaned up sooner than a timeout may achieve. It's not a common approach though.
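The rule described above (time out only unprepared branches, leave prepared ones for an explicit decision, and list only prepared branches on recover) might look like the following sketch. `BranchTable`, its `State` enum, and the `String` ids standing in for Xids are all assumptions made for illustration, and the race between a concurrent prepare and a timeout is deliberately ignored here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical branch table; ids are Strings standing in for Xids.
public class BranchTable {
    public enum State { ACTIVE, PREPARED }

    private static final class Branch {
        final long createTimeMillis;
        volatile State state = State.ACTIVE;

        Branch(long createTimeMillis) {
            this.createTimeMillis = createTimeMillis;
        }
    }

    private final Map<String, Branch> branches = new ConcurrentHashMap<>();

    public void begin(String id, long nowMillis) {
        branches.put(id, new Branch(nowMillis));
    }

    public void prepare(String id) {
        Branch b = branches.get(id);
        if (b == null) {
            throw new IllegalStateException("unknown branch " + id);
        }
        b.state = State.PREPARED;
    }

    // Only prepared branches are reported, as in the thread.
    public List<String> recover() {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Branch> e : branches.entrySet()) {
            if (e.getValue().state == State.PREPARED) {
                ids.add(e.getKey());
            }
        }
        Collections.sort(ids);
        return ids;
    }

    // Forgets expired branches that never prepared; prepared ones are kept.
    public List<String> timeOutUnprepared(long nowMillis, long timeoutMillis) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, Branch> e : branches.entrySet()) {
            Branch b = e.getValue();
            boolean expired = nowMillis - b.createTimeMillis >= timeoutMillis;
            if (b.state == State.ACTIVE && expired && branches.remove(e.getKey(), b)) {
                removed.add(e.getKey());
            }
        }
        Collections.sort(removed);
        return removed;
    }
}
```

Prepared branches that outlive the timeout stay in the table until something outside this sketch, such as the admin tool suggested above, forces an outcome.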