14 Replies Latest reply on Sep 18, 2006 8:47 AM by marklittle

    Can the transaction manager retry commit or rollback?

    timfox

      Just wanted to clarify something w.r.t. transactions in JBM.

      On the messaging server we keep the state of prepared transactions in an in-memory map.

      If commit or rollback is invoked on an XAResource, we look up the corresponding transactional state in the map and commit it or roll it back.

      My question is this: If commit or rollback throws an exception is it valid for the transaction manager to retry the commit or rollback (without first calling recover)?

      If I can assume the commit/rollback is never retried then I can safely remove the transaction state from the map in a finally block, whether or not the commit or rollback succeeded.

      If not, then I cannot see a safe time when the transaction state can be removed from the map.

      I cannot just remove it when the commit/rollback processing seems to complete without throwing an exception: the network might fail before the invocation returns to the client, so the client may still receive an exception.

      Not removing the state will give us a memory leak.

      My assumption is therefore that commit/rollback can never be retried (what about prepare?) - is this correct?
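The dilemma can be sketched as follows. This is a minimal illustration, not JBM code: `PreparedTx` and `NaiveResource` are hypothetical names, and a plain `String` branch id stands in for an `Xid`. Removing the entry in a `finally` block means that a commit the TM retries without first calling `recover()` no longer finds the transaction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the server-side "Transaction" class from the thread.
class PreparedTx {
    final String id;
    boolean committed;

    PreparedTx(String id) {
        this.id = id;
    }

    void commit() {
        // Real code would move the prepared state from the log to permanent storage.
        committed = true;
    }
}

class NaiveResource {
    // Prepared transactions held in memory, keyed by a hypothetical String branch id.
    final Map<String, PreparedTx> prepared = new ConcurrentHashMap<>();

    void prepare(String id) {
        // Real code would also log the prepared state durably here.
        prepared.put(id, new PreparedTx(id));
    }

    // Removes the entry in a finally block whether or not commit succeeded.
    // This is only safe if the TM never retries commit without calling recover() first.
    void commit(String id) {
        PreparedTx tx = prepared.get(id);
        if (tx == null) {
            throw new IllegalStateException("unknown transaction " + id);
        }
        try {
            tx.commit();
        } finally {
            prepared.remove(id);
        }
    }
}
```

With this design a second `commit("tx1")` throws, even though the first one may have succeeded and only its reply was lost.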

        • 1. Re: Can the transaction manager retry commit or rollback?

          My understanding is that errors in the commit phase need to be retried by the TM (probably in a background thread); otherwise some resources may already have committed while another is reporting an exception, leaving the transaction in an inconsistent state. However, after a few retries the TM should probably report the error back to the client/admin so the issue can be resolved manually or by other means.

          Exceptions during the prepare phase should work exactly the same way as in the commit phase, except that the overall outcome is rollback (as opposed to commit in the commit phase); that is, they also need to be retried.
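A bounded retry of the kind described above might look like this on the TM side. It is a sketch under assumptions (the `CommitRetrier` name, a fixed back-off, and escalation signalled by a `false` return are all hypothetical) and says nothing about what JBossTS actually does.

```java
import java.util.concurrent.Callable;

class CommitRetrier {
    // Hypothetical TM-side helper: retries a commit-phase call up to maxAttempts
    // times, pausing backoffMillis between attempts. Returns true on success;
    // false means the outcome should be escalated (e.g. reported to an admin).
    static boolean retry(Callable<Void> commitCall, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                commitCall.call();
                return true;
            } catch (Exception e) {
                if (attempt == maxAttempts) {
                    return false; // out of attempts: hand over to manual resolution
                }
            }
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
        return false;
    }
}
```

In practice the call would be submitted to a background executor rather than run on the caller's thread.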

          Well, that is theory, at least :). I am not sure JBossTS does any retries.

          Thanks
          Madhu

          • 2. Re: Can the transaction manager retry commit or rollback?
            timfox

            I understand that commits are likely to be retried - but that's not the question I was asking.

            I was asking whether commits can be retried without recover() being called.

            • 3. Re: Can the transaction manager retry commit or rollback?
              timfox

              If so, then we would have to force recovery whenever a commit/rollback comes in and we can't find the transaction in the in-memory map.

              • 4. Re: Can the transaction manager retry commit or rollback?
                marklittle

                 

                "timfox" wrote:
                My question is this: If commit or rollback throws an exception is it valid for the transaction manager to retry the commit or rollback (without first calling recover)?


                Yes, it is valid in some cases. However, depending upon the error, it may not be worthwhile.


                If I can assume the commit/rollback is never retried then I can safely remove the transaction state from the map in a finally block, whether or not the commit or rollback succeeded.

                If not, then I cannot see a safe time when the transaction state can be removed from the map.


                Is this durable data? Will it survive a failure? (I assume so, but need to ask.)

                • 5. Re: Can the transaction manager retry commit or rollback?
                  timfox

                  I was hoping you were going to answer :)

                  We currently log prepared states in the db (this will change).

                  So after the prepare has completed the prepared state has been logged durably.

                  The thing in memory is our "Transaction" class, which has methods prepare(), commit(), rollback() etc.

                  When a prepare comes in we add the Transaction instance to the in memory map.

                  This means that when we receive a subsequent commit/rollback we look in the map, get the instance and call commit/rollback on it.

                  This then moves the prepared state from the log to permanent storage.

                  So, if I remove the Transaction instance from memory after commit/rollback irrespective of whether an exception occurs, then if another commit/rollback comes in without recover (recover repopulates the map from the log) it won't find it.

                  If I don't remove the instance then I get a memory leak.

                  AFAICT the solution here is to repopulate the map if a commit/rollback comes in and it isn't in the map - which is a bit of a pain....
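One way to read "repopulate the map on a miss" is a lookup that falls back to the durable log. The sketch below assumes a hypothetical `PreparedLog` interface (the thread's log is a database) plus an in-memory stand-in so it runs. The in-memory entry is always dropped, so there is no leak, while the durable record survives until the commit completes.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical durable log of prepared branches.
interface PreparedLog {
    void savePrepared(String txId, String state);
    Optional<String> loadPrepared(String txId);
    void markCompleted(String txId);
}

// Map-backed stand-in so the sketch runs; a real log must be durable.
class InMemoryPreparedLog implements PreparedLog {
    private final Map<String, String> rows = new ConcurrentHashMap<>();
    public void savePrepared(String txId, String state) { rows.put(txId, state); }
    public Optional<String> loadPrepared(String txId) { return Optional.ofNullable(rows.get(txId)); }
    public void markCompleted(String txId) { rows.remove(txId); }
}

class RecoveringResource {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final PreparedLog log;

    RecoveringResource(PreparedLog log) {
        this.log = log;
    }

    void prepare(String txId, String state) {
        log.savePrepared(txId, state); // durable record first
        cache.put(txId, state);        // then the in-memory copy
    }

    // The in-memory entry is always removed. A retried commit (or one arriving
    // after a restart) that misses the cache falls back to the durable log.
    boolean commit(String txId) {
        String state = cache.remove(txId);
        if (state == null) {
            state = log.loadPrepared(txId).orElse(null);
        }
        if (state == null) {
            return false; // no record at all: see the last two posts in the thread
        }
        // Real code would apply 'state' (e.g. deliver the messages) here.
        log.markCompleted(txId);
        return true;
    }

    int cachedCount() {
        return cache.size();
    }
}
```

The same fallback covers the restart case, where the map has to be rebuilt from the log anyway.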

                  • 6. Re: Can the transaction manager retry commit or rollback?
                    marklittle

                     

                    "timfox" wrote:
                    I was hoping you were going to answer :)


                    Given the amount of corporate spam we get, sometimes email is easier ;-)


                    We currently log prepared states in the db (this will change).

                    So after the prepare has completed the prepared state has been logged durably.

                    The thing in memory is our "Transaction" class, which has methods prepare(), commit(), rollback() etc.

                    When a prepare comes in we add the Transaction instance to the in memory map.

                    This means that when we receive a subsequent commit/rollback we look in the map, get the instance and call commit/rollback on it.

                    This then moves the prepared state from the log to permanent storage.

                    So, if I remove the Transaction instance from memory after commit/rollback irrespective of whether an exception occurs, then if another commit/rollback comes in without recover (recover repopulates the map from the log) it won't find it.

                    If I don't remove the instance then I get a memory leak.

                    AFAICT the solution here is to repopulate the map if a commit/rollback comes in and it isn't in the map - which is a bit of a pain....


                    Aren't you going to have to do this anyway if you have a crash failure, given that the map is in volatile store?

                    • 7. Re: Can the transaction manager retry commit or rollback?

                       

                      "timfox" wrote:

                      If I don't remove the instance then I get a memory leak.


                      You only have a memory leak if the number grows large,
                      which you can warn about anyway:
                      if (map.size() > 100)
                          log.warn("Your TM is probably broken!");

                      The usual solution is to move the transaction to an "in doubt" state
                      after a configurable period of time (e.g. 1 hour),
                      or if you have to recover after a failure.

                      Then provide a mechanism for the administrator to resolve
                      these heuristically.

                      You need to record the result of these heuristics in
                      permanent storage so you can give the correct response if you do
                      get the commit/rollback.

                      You could also get a forget().

                      But this is a 0.1%/99.9% case. Having to go to the db to get the
                      heuristic state or the state of an in-doubt transaction is
                      not really an issue.
                      If the state isn't there, you are going to throw an exception anyway.
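The "in doubt" approach above could be modelled roughly like this. `BranchTable`, `BranchState` and the method names are hypothetical; state is kept in memory only to keep the sketch short, whereas the post stresses that heuristic outcomes must go to permanent storage.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

enum BranchState { PREPARED, IN_DOUBT, HEURISTIC_COMMIT, HEURISTIC_ROLLBACK }

class BranchTable {
    private final Map<String, BranchState> states = new HashMap<>();
    private final Map<String, Instant> preparedAt = new HashMap<>();
    private final Duration inDoubtAfter; // the configurable period, e.g. 1 hour

    BranchTable(Duration inDoubtAfter) {
        this.inDoubtAfter = inDoubtAfter;
    }

    void prepared(String id, Instant now) {
        states.put(id, BranchState.PREPARED);
        preparedAt.put(id, now);
    }

    // Run periodically: branches prepared longer ago than inDoubtAfter become in doubt.
    void sweep(Instant now) {
        for (Map.Entry<String, BranchState> e : states.entrySet()) {
            Duration age = Duration.between(preparedAt.get(e.getKey()), now);
            if (e.getValue() == BranchState.PREPARED && age.compareTo(inDoubtAfter) > 0) {
                e.setValue(BranchState.IN_DOUBT);
            }
        }
    }

    // Administrator's heuristic resolution. A real resource must persist this
    // outcome so that a late commit/rollback gets the right answer.
    boolean resolve(String id, boolean commit) {
        if (states.get(id) != BranchState.IN_DOUBT) {
            return false;
        }
        states.put(id, commit ? BranchState.HEURISTIC_COMMIT : BranchState.HEURISTIC_ROLLBACK);
        return true;
    }

    // forget() from the TM: the heuristic outcome has been seen and can be discarded.
    void forget(String id) {
        BranchState s = states.get(id);
        if (s == BranchState.HEURISTIC_COMMIT || s == BranchState.HEURISTIC_ROLLBACK) {
            states.remove(id);
            preparedAt.remove(id);
        }
    }

    BranchState state(String id) {
        return states.get(id);
    }
}
```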

                      • 8. Re: Can the transaction manager retry commit or rollback?
                        marklittle

                         

                        "adrian@jboss.org" wrote:
                        "timfox" wrote:

                        If I don't remove the instance then I get a memory leak.


                        You only have a memory leak if the number grows large,
                        which you can warn about anyway:
                        if (map.size() > 100)
                            log.warn("Your TM is probably broken!");

                        The usual solution is to move the transaction to an "in doubt" state
                        after a configurable period of time (e.g. 1 hour),
                        or if you have to recover after a failure.



                        • 9. Re: Can the transaction manager retry commit or rollback?
                          marklittle

                           

                          "adrian@jboss.org" wrote:
                          "timfox" wrote:

                          If I don't remove the instance then I get a memory leak.


                          You only have a memory leak if the number grows large,
                          which you can warn about anyway:
                          if (map.size() > 100)
                              log.warn("Your TM is probably broken!");

                          The usual solution is to move the transaction to an "in doubt" state
                          after a configurable period of time (e.g. 1 hour),
                          or if you have to recover after a failure.


                          It should definitely try to optimise for the non-failure case. However, it sounds like you're implementing a transaction manager here, so my next question would be: why?

                          Why can't you just use the one that is available in whatever the deployed environment happens to be?

                          • 10. Re: Can the transaction manager retry commit or rollback?
                            timfox

                             

                            "mark.little@jboss.com" wrote:

                            Aren't you going to have to do this anyway if you have a crash failure, given that the map is in volatile store?


                            Yes. The map will be repopulated on server startup with the state of any prepared txs from the log (db).

                            So I guess this is a non-issue: we just do the same if the tx instance is not found in the map.





                            • 11. Re: Can the transaction manager retry commit or rollback?

                              I'm not implementing a TM, I'm implementing a heuristic resource
                              with a local branch log.

                              JMS is a remote resource.

                              e.g.
                              If a JBossMessaging server gets enlisted by a transaction manager
                              on a machine that explodes and the log is lost then you need
                              a mechanism to put the messages back in the queue.

                              The exploded tm is never going to recover. :-)

                              There are other reasons why the TM might not complete
                              the transaction (e.g. network outage) and you need to get
                              the messages processed even if it means a heuristic outcome.

                              It's the same issue as when you need to clear db locks
                              when a server crashes holding the locks and it can't be recovered.

                              • 12. Re: Can the transaction manager retry commit or rollback?
                                marklittle

                                 

                                "adrian@jboss.org" wrote:
                                I'm not implementing a TM, I'm implementing a heuristic resource
                                with a local branch log.

                                JMS is a remote resource.


                                OK, with you now.


                                e.g.
                                If a JBossMessaging server gets enlisted by a transaction manager
                                on a machine that explodes and the log is lost then you need
                                a mechanism to put the messages back in the queue.

                                The exploded tm is never going to recover. :-)


                                I'm sure if you ask an IBM sales-bod they've got a solution for this too ;-) It's just "sometime in 2010"!


                                There are other reasons why the TM might not complete
                                the transaction (e.g. network outage) and you need to get
                                the messages processed even if it means a heuristic outcome.

                                It's the same issue as when you need to clear db locks
                                when a server crashes holding the locks and it can't be recovered.


                                Agreed. Though as ever I'd warn against over-use of heuristics to "solve" the problem. It's often a too-easy cop-out. Not saying it is in this case, but just a general warning.

                                • 13. Re: Can the transaction manager retry commit or rollback?
                                  timfox

                                  On a related issue: suppose the commit succeeds in the JBM resource, but some failure (e.g. a network failure) occurs before the invocation returns to the transaction manager, so the tx mgr gets an exception and thinks the commit failed.

                                  So it retries the commit, but there is no record of the tx in the JBM resource, since as far as the resource is concerned the tx committed OK and was removed.

                                  How should we deal with this? Should we always return success if a commit comes in for a transaction we don't know about? Is this safe?

                                  • 14. Re: Can the transaction manager retry commit or rollback?
                                    marklittle

                                     

                                    "timfox" wrote:
                                    On a related issue: suppose the commit succeeds in the JBM resource, but some failure (e.g. a network failure) occurs before the invocation returns to the transaction manager, so the tx mgr gets an exception and thinks the commit failed.

                                    So it retries the commit, but there is no record of the tx in the JBM resource, since as far as the resource is concerned the tx committed OK and was removed.

                                    How should we deal with this? Should we always return success if a commit comes in for a transaction we don't know about? Is this safe?


                                    It's complex ;-)

                                    In order to receive a commit from the TM, the resource obviously had to return VoteCommit during prepare. If the resource subsequently decided to change that decision then we're in the realms of heuristics and the resource has to remember that decision durably and respond appropriately during commit.

                                    Unfortunately, just because this resource returns VoteCommit in prepare doesn't mean that all of the participants in the transaction will do so. It could be told to roll back instead. In that case it obviously won't receive a commit message from the TM at any point, so we're alright here.

                                    So if it receives a commit and there's no information in the log, it could return committed, on the assumption that the TM will only be sending commit for participants it remembers were associated with this transaction. Usually a good assumption ;-)

                                    There is one wrinkle in this, which won't show up anyway if you're a good resource implementation: if commit fails and you send back HeuristicMixed/Hazard/Rollback (depending on the failure type) and then clean up, you would be giving the wrong answer the next time commit comes in. However, if a heuristic decision happens, you really should remember that *before* replying to the TM and keep the information around until it is resolved.

                                    BTW, I'm assuming we're talking about the case where there has been a previous prepare call, i.e., in 2PC with NO one-phase commit optimisation.
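Mark's answer can be summarised as a commit handler. This is a hedged sketch, not JBM or JBossTS code: `CommitHandler` and its methods are hypothetical, while the error codes are the standard `javax.transaction.xa.XAException` constants. It assumes 2PC with a prior prepare (no one-phase optimisation), because only then does an unknown branch on commit plausibly mean "already committed".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.transaction.xa.XAException;

class CommitHandler {
    private final Set<String> prepared = new HashSet<>();
    // Heuristic decisions; a real resource records these durably BEFORE replying to the TM.
    private final Map<String, Integer> heuristics = new HashMap<>();

    void prepare(String id) {
        prepared.add(id); // the branch voted to commit
    }

    void recordHeuristic(String id, int xaHeuristicCode) {
        prepared.remove(id);
        heuristics.put(id, xaHeuristicCode); // e.g. XAException.XA_HEURRB
    }

    // Second phase of 2PC.
    void commit(String id) throws XAException {
        Integer heuristic = heuristics.get(id);
        if (heuristic != null) {
            // Keep the record until forget(), so a retried commit gets the same answer.
            throw new XAException(heuristic);
        }
        if (prepared.remove(id)) {
            return; // actual commit work omitted
        }
        // No record at all: assume an earlier commit succeeded and its reply was lost.
        // Reasonable only because the TM sends commit just to branches that voted to commit.
    }

    void forget(String id) {
        heuristics.remove(id);
    }
}
```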