I think your case 2 covers my concerns. I was worried that if the subordinate decides to time out but the root TM then commits the transaction (say, because the timeout thread is blocked in the parent), we would have an issue, as the subordinate would say "I can't find this transaction". Based on what you said below, though, it should just work. Thanks for clarifying.
Mark Little wrote:
It's not needed. If the subordinate times out "early", there are only two scenarios when the coordinator eventually times out too:
1: it hadn't been prepared, so the "I can't find this transaction" message that comes from the subordinate is OK, i.e., the coordinator knows that the subordinate rolled back anyway.
2: if it had prepared and decided to roll back then the "Heuristic Law" states that the subordinate needs to remember the fact and it can't ever say "I can't find this transaction". Instead it needs to say "I found the transaction and rolled back" or "I found the transaction and committed".
Case 2 has to be considered even without user-level transaction timeouts, since it is part of basic heuristic handling.
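A minimal sketch of the two cases from the subordinate's side, assuming a hypothetical SubordinateOutcomes class (not JBossTS code) that keeps an in-memory heuristic log. It uses the standard javax.transaction.xa.XAException codes: XAER_NOTA for "I can't find this transaction" (case 1), and XA_HEURRB / XA_HEURCOM for "I found the transaction and rolled back/committed" (case 2). A real implementation would persist the log and clear it on forget; both are omitted here.

```java
import java.util.HashMap;
import java.util.Map;
import javax.transaction.xa.XAException;

// Hypothetical model of a subordinate TM answering a late rollback from the
// root coordinator. An illustration only, not the JBossTS implementation.
public class SubordinateOutcomes {

    // Decisions the subordinate made on its own after voting prepared.
    // This toy only ever records ROLLED_BACK (via its own timeout).
    enum HeuristicOutcome { ROLLED_BACK, COMMITTED }

    // Live transactions: txId -> whether this branch has voted prepared.
    private final Map<String, Boolean> active = new HashMap<>();
    // Heuristic decisions that must be remembered (kept in memory for brevity).
    private final Map<String, HeuristicOutcome> heuristicLog = new HashMap<>();

    public void begin(String txId) { active.put(txId, Boolean.FALSE); }

    public void prepare(String txId) { active.put(txId, Boolean.TRUE); }

    // The subordinate's own timeout fires "early".
    public void localTimeout(String txId) {
        Boolean prepared = active.remove(txId);
        if (Boolean.TRUE.equals(prepared)) {
            // Case 2: rolling back after prepare is a heuristic decision,
            // so it is recorded instead of forgotten.
            heuristicLog.put(txId, HeuristicOutcome.ROLLED_BACK);
        }
        // Case 1: never prepared, so nothing needs to be remembered.
    }

    // Rollback request arriving later from the root coordinator.
    public void rollback(String txId) throws XAException {
        HeuristicOutcome outcome = heuristicLog.get(txId);
        if (outcome == HeuristicOutcome.ROLLED_BACK) {
            throw new XAException(XAException.XA_HEURRB);  // "found it, rolled back"
        }
        if (outcome == HeuristicOutcome.COMMITTED) {
            throw new XAException(XAException.XA_HEURCOM); // "found it, committed"
        }
        if (active.remove(txId) == null) {
            // "I can't find this transaction": in this model, case 1 (or an unknown id).
            throw new XAException(XAException.XAER_NOTA);
        }
        // Otherwise the transaction was still live and is rolled back normally.
    }
}
```

A prepared branch that timed out can never answer XAER_NOTA here, which is the "Heuristic Law" point: once it has voted prepared, its unilateral decision must be reported, not forgotten.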
David Lloyd wrote:
Tom Jenkinson wrote:
I don't think I am explaining this well enough.
Say there is a root TM and a subordinate TM, with a transaction timeout of, say, 2 seconds:
time T+0: the root TM starts the transaction with a timeout of 2
time T+1: the tx flows to the subordinate TM (the root TM's remaining timeout is used as the subordinate TM's timeout, i.e. the timeout value appears to be 1 at the subordinate TM)
time T+2: the subordinate TM times out and forgets about the transaction, AND the root TM times out. The root TM's timeout cascades the abort to all XAResources (remember that in the model we are going for, subordinate transaction managers are registered as XAResources, so each subordinate gets an abort call from the root TM)
time T+3: the subordinate transaction manager receives the abort message from the root TM via the proxy XAResource, but by now it has cleaned up the transaction, so it returns an error indicating that it can't find the transaction
Yeah, I get the potential issue. The extra time added to the subordinate's timeout has to be greater than the expected latency between the moment the root controller issues the timeout notice and the moment the subordinate can receive and process it.
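The race in the timeline can be replayed deterministically. This is a hypothetical sketch in whole seconds: margin is the extra time added to the subordinate's timeout and latency is how long the root's abort takes to reach the subordinate. Neither name comes from JBossTS.

```java
// Hypothetical replay of the root/subordinate timeout race, in whole seconds.
public class TimeoutRace {

    // Timeout given to the subordinate: what the root has left, plus a margin.
    public static int subordinateTimeout(int rootTimeout, int flowTime, int margin) {
        int remaining = rootTimeout - flowTime; // e.g. 2 - 1 = 1 second left at T+1
        return remaining + margin;
    }

    // True if the subordinate still knows the transaction when the abort arrives.
    public static boolean abortFindsTransaction(int rootTimeout, int flowTime, int margin, int latency) {
        int subordinateForgetsAt = flowTime + subordinateTimeout(rootTimeout, flowTime, margin);
        int abortArrivesAt = rootTimeout + latency; // root times out, then the abort travels
        return abortArrivesAt < subordinateForgetsAt;
    }
}
```

Because flowTime cancels out, the subordinate forgets at rootTimeout + margin and the abort arrives at rootTimeout + latency, so the abort only finds the transaction when margin > latency. With rootTimeout 2, flowTime 1, margin 0 and latency 1, this reproduces the T+3 failure above.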
That said, we need some level of tolerance for the case where the given extra time is not sufficient and the subordinate node is unable to abort the transaction because it's already gone, because it will happen at some point. Ideally, this transaction abort would be idempotent.
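One way to get that tolerance, sketched under assumptions: a hypothetical helper the root's proxy XAResource could call when delivering an abort. It treats XAER_NOTA as success only for a branch that never voted prepared, because (per Mark's case 1) such a branch can only have rolled back. For a prepared branch, "not found" would break the heuristic rule, so it is rethrown.

```java
import javax.transaction.xa.XAException;

// Hypothetical helper for the root's proxy XAResource; not a JBossTS API.
public final class IdempotentAbort {

    // Stand-in for the remote call that delivers an abort to a subordinate TM.
    @FunctionalInterface
    public interface RemoteRollback {
        void rollback(String txId) throws XAException;
    }

    private IdempotentAbort() {}

    // Returns normally if the subordinate rolled back now or had already
    // forgotten an unprepared branch; rethrows everything else (e.g. heuristics).
    public static void abort(RemoteRollback remote, String txId, boolean branchPrepared)
            throws XAException {
        try {
            remote.rollback(txId);
        } catch (XAException e) {
            // A forgotten branch that never prepared can only have rolled back,
            // so repeating the abort is harmless.
            if (e.errorCode == XAException.XAER_NOTA && !branchPrepared) {
                return;
            }
            throw e;
        }
    }
}
```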
Just to confirm, I have now committed a unit test that proves Mark's assertion to be true: https://svn.jboss.org/repos/labs/labs/jbosstm/branches/JBOSSTS_4_15_0_Final/atsintegration/tests/classes/com/arjuna/ats/jta/distributed/SimpleIsolatedServers.java -r 37580
When flowing the transaction, you can just use the value obtained from TransactionTimeoutConfiguration::getTimeLeftBeforeTransactionTimeout(boolean), divided by 1000 (the value is in milliseconds, while the flowed timeout is in seconds).
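A small sketch of that conversion, assuming (as the /1000 suggests) that the value is in milliseconds while the imported timeout is in seconds. TimeLeftSource is a local stand-in with the same method name, not the real TransactionTimeoutConfiguration interface, which also declares a checked exception. This version rounds up instead of truncating, so 999 ms flows as 1 second rather than 0; in JTA, TransactionManager.setTransactionTimeout(0) means "restore the default", so a truncated 0 could quietly give the subordinate a different timeout than intended. Rounding up also errs toward the subordinate outliving the root, in line with the margin point above. Non-positive values are returned as 0 and left to the caller, since whether they mean "expired" or "no timeout" depends on the real method's contract.

```java
// Hypothetical helper: converts the root's remaining time into the whole-seconds
// timeout passed when importing the transaction at a subordinate.
public final class FlowTimeout {

    // Local stand-in mirroring one method name from JBossTS's
    // TransactionTimeoutConfiguration; the real method also declares a checked exception.
    @FunctionalInterface
    public interface TimeLeftSource {
        long getTimeLeftBeforeTransactionTimeout(boolean errorRollback);
    }

    private FlowTimeout() {}

    public static int secondsToFlow(TimeLeftSource tx) {
        long millis = tx.getTimeLeftBeforeTransactionTimeout(false);
        if (millis <= 0) {
            return 0; // assumption: caller decides what "nothing left" means
        }
        // Round up instead of truncating: 999 ms becomes 1 s, not 0.
        long seconds = millis / 1000 + (millis % 1000 == 0 ? 0 : 1);
        return (int) Math.min(Integer.MAX_VALUE, seconds);
    }
}
```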
Just an update here: we're observing that if we start a subordinate transaction with a timeout and the transaction completes well ahead of that timeout, we see messages like this once the timeout expires:
21:32:26,085 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA12117: TransactionReaper::check timeout for TX 0:ffff7f000001:c478e9b:4ea82d79:7a in state RUN
These are logged at a WARN level. Are these spurious messages, or did we do something wrong?
To add a bit more detail to that WARN message: it appears that when a subordinate tx is imported as follows (the 300 value is an example timeout):
final Transaction newSubOrdinateTx = SubordinationManager.getTransactionImporter().importTransaction(xidTransactionID.getXid(), 300);
it ultimately reaches the com.arjuna.ats.internal.jta.transaction.arjunacore.subordinate.SubordinateAtomicAction constructor, which adds it to the TransactionReaper. But I don't see any code that removes it from the reaper once the subordinate transaction is committed or rolled back well before the timeout. As a result, the TransactionReaper ends up logging that WARN. So I'm not sure whether it's just a case of removing the ReaperElement from the TransactionReaper or something more. Let us know if you need more details.
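A toy model of the suspected leak, assuming hypothetical ToyReaper and ToySubordinateAction classes (not the JBossTS TransactionReaper or SubordinateAtomicAction). The action registers with the reaper in its constructor, as described above; the only difference between the two runs is whether completion removes the entry. When removal is skipped, check() still finds the entry after the deadline and warns, mirroring the ARJUNA12117 message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical toy reaper; not the JBossTS TransactionReaper.
public class ToyReaper {
    private final Map<String, Long> deadlines = new HashMap<>();
    final List<String> warnings = new ArrayList<>();

    void insert(String txId, long deadline) { deadlines.put(txId, deadline); }

    void remove(String txId) { deadlines.remove(txId); }

    // Warns for (and drops) every still-registered transaction past its deadline.
    void check(long now) {
        Iterator<Map.Entry<String, Long>> it = deadlines.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() <= now) {
                warnings.add("check timeout for TX " + e.getKey());
                it.remove();
            }
        }
    }

    // Hypothetical subordinate action that registers itself on construction.
    static class ToySubordinateAction {
        private final ToyReaper reaper;
        private final String txId;
        private final boolean removeOnCompletion;

        ToySubordinateAction(ToyReaper reaper, String txId, long deadline, boolean removeOnCompletion) {
            this.reaper = reaper;
            this.txId = txId;
            this.removeOnCompletion = removeOnCompletion;
            reaper.insert(txId, deadline); // registered in the constructor
        }

        void commit() {
            // ... actual commit work elided ...
            if (removeOnCompletion) {
                reaper.remove(txId); // the step that appears to be missing
            }
        }
    }
}
```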
Thanks for the report. My initial view is that, as far as I can see, this shouldn't really affect your testing of the component, but it will look strange to JCA users, so I have raised a Jira for you since it affects them too: https://issues.jboss.org/browse/JBTM-935