35 Replies. Latest reply on Sep 28, 2009 9:11 AM by jmesnil
      • 15. Re: XA resource and setting the timeout
        timfox

        I can accept that if we return true from setTransactionTimeout then the Transaction Manager will expect us to time out with an XA_RBTIMEOUT after the time set.

        But logically, that _does not_ imply that if we return false from setTransactionTimeout it is OK to return XAER_NOTA if a rollback/commit is attempted after we have heuristically timed out the branch.

        • 16. Re: XA resource and setting the timeout
          ataylor

           

          But logically, that _does not_ imply that if we return false from setTransactionTimeout it is OK to return XAER_NOTA if a rollback/commit is attempted after we have heuristically timed out the branch.


          If that's the case then we either don't time out at all and return false from setTransactionTimeout, or we need to keep the tx associated with the resource manager and return XA_RBTIMEOUT. Actually, as long as we release the acks and messages this should be OK. If we want, we could have some sort of reaper that removes any timed-out sessions that are really old.

          • 17. Re: XA resource and setting the timeout
            timfox

             

            "ataylor" wrote:


            If that's the case then we either don't time out at all and return false from setTransactionTimeout, or we need to keep the tx associated with the resource manager and return XA_RBTIMEOUT.


            The problem with that is: how long do we keep the xid for? The xid list could grow without bound, so we'd have to clear it down after a time, and then if it does get cleared down and the tx mgr subsequently decides to commit/rollback on that branch, we won't have a record of the xid and we'll have to return XAER_NOTA anyway!

            For now, let's return false from setTransactionTimeout, then just time out the tx ourselves according to our own timeout, return XAER_NOTA, and double check with Jonathan that this is OK.
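
A minimal sketch of the interim approach described above, under stated assumptions. `BranchRegistry`, its string xid keys, and its method names are hypothetical stand-ins, not the project's actual ResourceManagerImpl. It shows only the three behaviours being discussed: setTransactionTimeout returns false, the resource manager forgets a branch on its own schedule, and a later rollback on a forgotten branch raises an XAException carrying XAER_NOTA. Whether the transaction manager tolerates that last step is exactly the open question for Jonathan.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.transaction.xa.XAException;

// Hypothetical stand-in for the resource manager's XA-facing methods.
public class BranchRegistry {
    // Known branches, keyed by a string form of the xid (an assumption for brevity).
    private final Map<String, Long> createTimes = new ConcurrentHashMap<>();

    public void start(String xidKey, long nowMillis) {
        createTimes.put(xidKey, nowMillis);
    }

    // Returning false signals that the requested timeout was not applied.
    public boolean setTransactionTimeout(int seconds) {
        return false;
    }

    // Called by the resource manager's own timer: forget the branch entirely.
    public void timeOut(String xidKey) {
        createTimes.remove(xidKey);
    }

    public void rollback(String xidKey) throws XAException {
        if (createTimes.remove(xidKey) == null) {
            // No record of this xid, for example because it was already timed out.
            throw new XAException(XAException.XAER_NOTA);
        }
    }
}
```

Because the xid is dropped at timeout, nothing accumulates, which is the trade-off against keeping xids around in order to answer with XA_RBTIMEOUT.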

            • 18. Re: XA resource and setting the timeout
              timfox

              Looking at the code committed today for this.

              I notice a new TxTimeoutHandler is being created for every transaction. I don't think this scales, and it is unnecessary.

              Instead we should have a single thread that scans the transaction map every x seconds/milliseconds and times transactions out that way.
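
The single-scanner idea could look roughly like the sketch below. Everything in it is illustrative: `TxTimeoutScanner`, the string xid keys, and the millisecond parameters are assumptions, and the real transaction map holds richer objects than a creation timestamp. The point is only that one scheduled task walks the whole map, instead of one handler per transaction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one shared scanner instead of a handler per transaction.
public class TxTimeoutScanner {
    private final Map<String, Long> createTimes = new ConcurrentHashMap<>();
    private final long timeoutMillis;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    public TxTimeoutScanner(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    public void register(String xidKey, long nowMillis) {
        createTimes.put(xidKey, nowMillis);
    }

    // One pass over the map; returns how many transactions were timed out.
    public int scan(long nowMillis) {
        int removed = 0;
        for (Map.Entry<String, Long> e : createTimes.entrySet()) {
            if (nowMillis - e.getValue() >= timeoutMillis
                    && createTimes.remove(e.getKey(), e.getValue())) {
                removed++; // real code would also roll the transaction back here
            }
        }
        return removed;
    }

    // Schedules the periodic scan on the single background thread.
    public void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(
            () -> scan(System.currentTimeMillis()),
            periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    public int size() {
        return createTimes.size();
    }
}
```

Passing the current time into `scan` keeps the pass itself easy to test without waiting on the scheduler.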

              • 19. Re: XA resource and setting the timeout
                timfox

                Also ResourceManagerImpl::timeoutSeconds is only being set via the setTimeout method.

                This means if the transaction manager never calls setTimeout() it will always be zero.

                • 20. Re: XA resource and setting the timeout
                  timfox

                  I'd also test that timeouts work after the server is restarted and prepared txs are reloaded.

                  • 21. Re: XA resource and setting the timeout
                    timfox

                    Take a look at RemotingServiceImpl to see how we do something similar.

                    There's a timer that scans the connections and fails any where no ping has arrived.

                    You could do something very similar (copy and paste).

                    • 22. Re: XA resource and setting the timeout
                      ataylor

                       

                      Take a look at RemotingServiceImpl to see how we do something similar.

                      There's a timer that scans the connections and fails any where no ping has arrived.

                      You could do something very similar (copy and paste).


                      Ok, will do

                      Also ResourceManagerImpl::timeoutSeconds is only being set via the setTimeout method.

                      This means if the transaction manager never calls setTimeout() it will always be zero.


                      Ok, I'll set a default of 10 minutes and also make it configurable

                      • 23. Re: XA resource and setting the timeout
                        ataylor

                         

                        "timfox" wrote:
                        I'd also test that timeouts work after the server is restarted and prepared txs are reloaded.


                        OK, since I'll need to add the expireTime when the prepare record is saved to the journal, I'm renaming XidEncoding to TransactionEncoding and adding the expireTime to it.

                        • 24. Re: XA resource and setting the timeout
                          jhalliday

                          > I'd also test that timeouts work after the server is restarted and prepared txs are reloaded.

                          Except of course that it should time out only new, unprepared txs created after the restart, not the prepared ones. Which, come to think of it, is true even in the non-restart case: you need to test that a prepared tx will not be timed out, or at least be conscious of the issues that arise if it is.

                          BTW, do you include inflight but unprepared tx in the list returned by a recover call on the XAResource?

                          • 25. Re: XA resource and setting the timeout
                            timfox

                             

                            "jhalliday" wrote:
                            > I'd also test that timeouts work after the server is restarted and prepared txs are reloaded.

                            Except of course that it should time out only new, unprepared txs created after the restart, not the prepared ones. Which, come to think of it, is true even in the non-restart case: you need to test that a prepared tx will not be timed out, or at least be conscious of the issues that arise if it is.


                            You lost me there. I thought the whole point of timeouts was to time out prepared transaction branches, i.e. heuristically roll them back.

                            You're saying we should only time out unprepared txs? If so, why?


                            BTW, do you include inflight but unprepared tx in the list returned by a recover call on the XAResource?


                            No, we just include prepared txs

                            • 26. Re: XA resource and setting the timeout
                              timfox

                               

                              "ataylor" wrote:

                              OK, since I'll need to add the expireTime when the prepare record is saved to the journal, I'm renaming XidEncoding to TransactionEncoding and adding the expireTime to it.


                              Better to store create time rather than expire time. That allows for the user to dynamically change timeout value.
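
To make the difference concrete, here is a small sketch, assuming a hypothetical `TxRecord` class that stands in for whatever ends up being persisted with the prepare record. Storing the creation time means expiry is computed at check time against whatever timeout is configured at that moment, so a changed timeout also applies to transactions that already exist or were reloaded from the journal.

```java
// Hypothetical record of what might be persisted alongside the prepare record.
public class TxRecord {
    private final long createTimeMillis;

    public TxRecord(long createTimeMillis) {
        this.createTimeMillis = createTimeMillis;
    }

    // Expiry is derived when checked, using the timeout in force right now.
    public boolean isExpired(long nowMillis, long currentTimeoutMillis) {
        return nowMillis - createTimeMillis >= currentTimeoutMillis;
    }
}
```

With a stored expire time, the timeout in force when the record was written would be baked in and could not be changed afterwards without rewriting the record.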

                              • 27. Re: XA resource and setting the timeout
                                ataylor

                                 

                                Except of course that it should time out only new, unprepared txs created after the restart, not the prepared ones. Which, come to think of it, is true even in the non-restart case: you need to test that a prepared tx will not be timed out, or at least be conscious of the issues that arise if it is.


                                OK, so what issues will arise if we time out a prepared tx? If we don't time it out, aren't we in the same boat, i.e. transactions hanging around forever that would never be committed?

                                BTW, do you include inflight but unprepared tx in the list returned by a recover call on the XAResource?


                                No, we currently only return tx's in the prepared state.

                                • 28. Re: XA resource and setting the timeout
                                  ataylor

                                  Better to store create time rather than expire time. That allows for the user to dynamically change timeout value.

                                  • 29. Re: XA resource and setting the timeout
                                    jhalliday

                                    > You're saying we should only timeout unprepared txs? If so, why?

                                    Let's go back to my quote from the XA spec:

                                    "An RM can mark a transaction branch as rollback-only any time except after a successful prepare. ...An RM can also unilaterally roll back and forget a branch any time except after a successful prepare."

                                    Timeouts allow an RM to walk away from an unprepared tx that it thinks may have been abandoned. It's useful to cover e.g. client crashes. It does not lead to an inconsistent tx outcome since the tx is presumed abort at that stage anyhow.

                                    Unilaterally deciding to roll back a prepared branch is a much bigger deal, as it can lead to heuristic outcomes. It can be done, but needs a lot more thought and logging.

                                    Personally I'd avoid automating post-prepare rollbacks and instead provide a tool that admins can use to manually force a tx outcome. Since it may lead to data inconsistency it's not a decision to be made lightly and the best course of action usually needs some understanding of the business context e.g. in some cases it's better to stay blocked for consistency, whilst in others availability concerns may make a rollback+manual data reconciliation a more attractive option.

                                    > No, we just include prepared txs

                                    OK, that's fine. Some systems will include the unprepared ones too, on the basis that the tx manager will roll back any it does not recognise, which means they get cleaned up sooner than a timeout may achieve. It's not a common approach though.
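
The policy argued for in this post can be sketched as follows. This is an illustration under assumptions, not the project's code: `BranchTable`, its `State` enum, and the string xid keys are all hypothetical. The automatic timeout only ever drops unprepared branches, the recover listing contains only prepared ones, and a prepared branch goes away only through an explicit admin-style call. A race between prepare and the timeout pass is deliberately not handled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical branch table illustrating "time out unprepared branches only".
public class BranchTable {
    public enum State { ACTIVE, PREPARED }

    private static final class Branch {
        final long createTimeMillis;
        volatile State state = State.ACTIVE;

        Branch(long createTimeMillis) {
            this.createTimeMillis = createTimeMillis;
        }
    }

    private final Map<String, Branch> branches = new ConcurrentHashMap<>();

    public void begin(String xidKey, long nowMillis) {
        branches.put(xidKey, new Branch(nowMillis));
    }

    public void prepare(String xidKey) {
        Branch b = branches.get(xidKey);
        if (b != null) {
            b.state = State.PREPARED;
        }
    }

    // Automatic timeout: drops expired ACTIVE branches, never PREPARED ones.
    public List<String> timeOutUnprepared(long nowMillis, long timeoutMillis) {
        List<String> dropped = new ArrayList<>();
        for (Map.Entry<String, Branch> e : branches.entrySet()) {
            Branch b = e.getValue();
            if (b.state == State.ACTIVE
                    && nowMillis - b.createTimeMillis >= timeoutMillis) {
                branches.remove(e.getKey());
                dropped.add(e.getKey());
            }
        }
        return dropped;
    }

    // What a recover call would report: prepared branches only.
    public List<String> recoverKeys() {
        List<String> prepared = new ArrayList<>();
        for (Map.Entry<String, Branch> e : branches.entrySet()) {
            if (e.getValue().state == State.PREPARED) {
                prepared.add(e.getKey());
            }
        }
        return prepared;
    }

    // Manual, admin-triggered outcome for a branch; real code would log it as heuristic.
    public boolean forceRollback(String xidKey) {
        return branches.remove(xidKey) != null;
    }
}
```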