14 Replies Latest reply on May 17, 2006 10:40 AM by marklittle

    How JBoss Transaction detect fault in sub-activity

    xiaoqj

      I am a newcomer to JBoss Transactions.
      Sorry for posting this topic here; I was afraid I would get no replies if I posted it in the user forum.


      The manual frequently mentions that, compared with traditional atomic transactions, extended transaction models (for example, the nested transaction model) give the user a chance to react to an exception thrown from a sub-activity, rather than immediately rolling back the whole long-running transaction. The user can adopt several canonical strategies at the application level, such as:
      (1) Ignore the fault, if the activity is a trivial one
      (2) Retry several times, if the fault is transient
      (3) Find an alternative for the faulty activity
      (4) Perform compensation and bring the transaction to an acceptable state
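      A minimal Python sketch of how application code might apply these four strategies to one failed sub-activity. `run_sub_activity`, `TransientFault` and the order in which the strategies are tried are hypothetical, not JBoss Transactions APIs; note that the whole thing only works if the failure is actually reported back to the caller.

```python
from typing import Callable, Optional


class TransientFault(Exception):
    """Hypothetical marker for a fault that may go away if the call is retried."""


def run_sub_activity(
    activity: Callable[[], str],
    *,
    trivial: bool = False,
    max_retries: int = 3,
    alternative: Optional[Callable[[], str]] = None,
    compensate: Optional[Callable[[], None]] = None,
) -> str:
    """Run one sub-activity, applying the four strategies if it fails."""
    retries = 0
    while True:
        try:
            return activity()
        except TransientFault:
            if retries < max_retries:
                retries += 1
                continue  # (2) transient fault: try the same activity again
            break
        except Exception:
            break  # permanent fault: fall through to the other strategies
    if trivial:
        return "ignored"  # (1) the activity did not matter; carry on
    if alternative is not None:
        try:
            return alternative()  # (3) run a substitute activity instead
        except Exception:
            pass  # the alternative failed too; try compensation
    if compensate is not None:
        compensate()  # (4) undo earlier work to reach an acceptable state
        return "compensated"
    raise RuntimeError("sub-activity failed and no strategy applies")
```

      Retrying first and compensating last is just one plausible ordering; a real application would pick the strategy per activity.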


      But it seems this whole scheme rests on one assumption: that there is an underlying mechanism providing perfect (complete and accurate) fault detection over sub-activities, i.e. one that can always tell whether or not a remote activity is functioning correctly.

      The asynchronous nature of the communication media that SOA assumes makes perfect fault detection difficult, if not impossible.

      My question is: how does JBoss Transactions solve this problem?

        • 1. Re: How JBoss Transaction detect fault in sub-activity
          marklittle

          First, please post to the User Forum in future. That's the right place for such questions.

          Secondly, this has nothing to do with requiring an accurate failure detector: with the exception of a quantum-based detector, there's no such thing, and all systems use failure suspectors (a subtle but important difference). What I'm getting at here is that, irrespective of whether we're talking about closely coupled or loosely coupled systems, accurate failure detection is practically impossible, as is accurate failure suspicion. Truly asynchronous systems introduce even more problems around completeness and correctness.
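          To illustrate the detector/suspector distinction, here is a minimal timeout-based suspector in Python. The class name and the caller-supplied clock values are assumptions for illustration. It can only say "I haven't heard from this peer recently", so a slow but live peer is suspected exactly like a crashed one, and a suspicion can later turn out to be wrong.

```python
from typing import Dict


class TimeoutSuspector:
    """Suspects a peer if nothing has been heard from it within `timeout` time units.

    This is a suspector, not a detector: a crashed peer and a slow peer
    look identical from here, so a suspicion may be withdrawn later.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_heard: Dict[str, float] = {}

    def heard_from(self, peer: str, now: float) -> None:
        # Any message (reply, heartbeat, late response) counts as a sign of life.
        self.last_heard[peer] = now

    def suspects(self, peer: str, now: float) -> bool:
        last = self.last_heard.get(peer)
        return last is None or now - last > self.timeout
```

          A late message simply clears the suspicion; nothing in the suspector can tell whether the peer was slow, partitioned, or had crashed and restarted.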

          If a participant believes that the coordinator has failed, then it can call back to the coordinator to determine the outcome. Repeated failures to contact the coordinator MAY be interpreted by some participants as the coordinator having failed, and the action subsequently taken by the participant will be implementation dependent. In an implementation based on the traditional ACID model, heuristic outcomes may then occur (assuming the participant got past the prepare phase; if it didn't, then we're OK, since the participant can roll back and be sure that will be the eventual outcome of the transaction). In an extended transaction model you can get the same behaviour, but the result MAY be less of a problem.
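          The participant-side behaviour described above can be sketched as follows. This is a hypothetical illustration, not JBossTS code: `ask_outcome` stands for whatever call the participant uses to query the coordinator (returning `None` when there is no answer), and `heuristic_policy` stands for the implementation-dependent choice made after repeated failures.

```python
from typing import Callable, Optional


def on_coordinator_suspected(
    prepared: bool,
    ask_outcome: Callable[[], Optional[str]],
    max_attempts: int = 5,
    heuristic_policy: str = "stay-in-doubt",
) -> str:
    if not prepared:
        # Not yet prepared: under presumed abort the transaction can only
        # roll back, so a unilateral rollback is always consistent.
        return "rolled-back"
    for _ in range(max_attempts):
        outcome = ask_outcome()  # None models "no answer from the coordinator"
        if outcome in ("committed", "rolled-back"):
            return outcome
    # Repeated silence: what happens now is implementation dependent.
    # A unilateral decision here is a heuristic outcome and may disagree
    # with what the coordinator actually decided.
    if heuristic_policy == "heuristic-rollback":
        return "heuristic-rolled-back"
    return "in-doubt"
```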

          If the coordinator suspects a participant failure, then again it depends on where the failure is suspected. If it is before prepare, then we're using presumed abort, and the coordinator can roll back and be sure that all participants will eventually roll back (even if it can't contact them at the moment: those that have failed will not have to do anything upon recovery). If the failure is suspected after the participant has prepared, then the failure recovery subsystem will keep trying to issue the commit (assuming the coordinator had subsequently decided to commit). If after some implementation-specific retry period (which could in theory be until the end of the universe) it still hasn't been able to reach the failed participant, then it may flag this to a sysadmin to figure out.
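          Similarly, a sketch of the coordinator side, under the assumption that `send_decision` re-sends the decision and returns `True` once the participant acknowledges it, and that the retry period is modelled as a simple bounded count. All names are hypothetical.

```python
from typing import Callable


def complete_after_suspicion(
    participant_prepared: bool,
    decision: str,
    send_decision: Callable[[], bool],
    max_retries: int,
    alert_admin: Callable[[str], None],
) -> str:
    if not participant_prepared:
        # Presumed abort: roll back now. An unprepared participant that
        # crashed has nothing to recover, so it will roll back as well.
        return "rolled-back"
    # A prepared participant must eventually learn the decision, so the
    # recovery subsystem keeps re-sending it (bounded here for illustration).
    for _ in range(max_retries):
        if send_decision():  # True means the participant acknowledged
            return decision
    alert_admin(f"participant unreachable; decision '{decision}' not yet delivered")
    return "pending-admin"
```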

          So in the end we don't need accurate failure suspicion. If we're wrong (at either end of the protocol), we may cancel/roll back more transactions than we really need to, but ultimately everything will be OK.

          • 2. Re: How JBoss Transaction detect fault in sub-activity
            xiaoqj

            Thank you for the helpful reply.
            To summarize your comment:

            Most transaction processing is a centralized solution to the consensus problem. Usually a two-phase completion protocol is adopted:
            (1) Vote collection (bottom-up message flow), to make the final decision non-trivial
            (2) Decision diffusion (top-down message flow), to achieve consistent agreement
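            The two phases can be sketched with toy participants in Python. `Participant`, `prepare` and `decide` are hypothetical names, and failures, timeouts and logging are deliberately left out: this only shows the bottom-up vote and the top-down diffusion of a single decision.

```python
from typing import List, Optional


class Participant:
    """Toy participant: votes as configured and records the decision it is told."""

    def __init__(self, name: str, vote_commit: bool = True) -> None:
        self.name = name
        self.vote_commit = vote_commit
        self.outcome: Optional[str] = None

    def prepare(self) -> bool:
        return self.vote_commit

    def decide(self, outcome: str) -> None:
        self.outcome = outcome


def two_phase_commit(participants: List[Participant]) -> str:
    # Phase 1, vote collection (bottom-up): every participant must vote yes.
    votes = [p.prepare() for p in participants]
    outcome = "committed" if all(votes) else "rolled-back"
    # Phase 2, decision diffusion (top-down): one decision goes to everyone.
    for p in participants:
        p.decide(outcome)
    return outcome
```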

            Thus the existence of a central arbitrator reduces the complex consensus problem to the relatively easy problem of "at-least-once" message transmission:
            (1) To tolerate duplicate messages (due to an inaccurate failure suspector), operations such as Prepare, Abort and Commit must be idempotent.
            (2) To tolerate a crash of the coordinator, resort to recovery.
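            Point (1) can be illustrated with a participant whose handlers are idempotent, sketched in Python under the assumption that messages are plain strings named after the operations above. A duplicate message changes nothing and simply gets the reply for the current state again, so the coordinator can safely re-send whenever it suspects a loss.

```python
class IdempotentParticipant:
    REPLY = {"prepared": "Prepared", "committed": "Committed", "aborted": "Aborted"}

    def __init__(self) -> None:
        self.state = "active"
        self.commits_applied = 0  # how many times the work was made durable

    def on_message(self, msg: str) -> str:
        if msg == "Prepare" and self.state == "active":
            self.state = "prepared"
        elif msg == "Commit" and self.state == "prepared":
            self.state = "committed"
            self.commits_applied += 1  # the real work happens exactly once
        elif msg == "Abort" and self.state in ("active", "prepared"):
            self.state = "aborted"
        # Any other combination is a duplicate or late message: change
        # nothing and repeat the reply for the current state.
        return self.REPLY.get(self.state, "Unknown")
```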

            My point is that, in an ACID transaction, it seems the coordinator notifies the user of the final outcome just out of courtesy. So it's possible for the user to believe the transaction rolled back when in fact it committed.

            The relationship between a parent scope and a child scope is much the same. The parent scope needs an accurate view of the outcome of its children, since it may use that to adjust its control flow. Perhaps the sub-scope could again use "at-least-once" message transmission to notify the parent scope of its outcome, which, from the user's point of view, would look like an accurate failure detector.
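            The proposal in the previous paragraph might look like this: the child re-sends its outcome until it gets an acknowledgement, and the parent keeps only the first copy of each child's outcome. All names are hypothetical. Note that `notify_at_least_once` gives up after `max_attempts`, so from the parent's side a missing outcome is still only a suspicion, not an accurate detection.

```python
from typing import Callable, Dict, Optional


class ParentScope:
    def __init__(self) -> None:
        self.child_outcomes: Dict[str, str] = {}

    def on_outcome(self, child_id: str, outcome: str) -> str:
        # At-least-once delivery produces duplicates: keep the first copy only.
        self.child_outcomes.setdefault(child_id, outcome)
        return "ack"


def notify_at_least_once(send: Callable[[], Optional[str]], max_attempts: int) -> bool:
    # Re-send until acknowledged; None models a lost message or lost ack.
    for _ in range(max_attempts):
        if send() == "ack":
            return True
    return False
```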

            • 3. Re: How JBoss Transaction detect fault in sub-activity
              marklittle

               

              "XiaoQJ" wrote:
              Thank you for the helpful reply.
              To summarize your comment:

              Most transaction processing is a centralized solution to the consensus problem. Usually a two-phase completion protocol is adopted:
              (1) Vote collection (bottom-up message flow), to make the final decision non-trivial
              (2) Decision diffusion (top-down message flow), to achieve consistent agreement

              Thus the existence of a central arbitrator reduces the complex consensus problem to the relatively easy problem of "at-least-once" message transmission:
              (1) To tolerate duplicate messages (due to an inaccurate failure suspector), operations such as Prepare, Abort and Commit must be idempotent.
              (2) To tolerate a crash of the coordinator, resort to recovery.

              My point is that, in an ACID transaction, it seems the coordinator notifies the user of the final outcome just out of courtesy. So it's possible for the user to believe the transaction rolled back when in fact it committed.


              Absolutely NOT. Apart from rollback (and presumed abort), there is nothing about this being "out of courtesy". These messages MUST be delivered in order to guarantee atomicity, even in the presence of failures. Check out http://labs.jboss.com/portal/jbosstm/resources for lots of useful information.


              The relationship between a parent scope and a child scope is much the same. The parent scope needs an accurate view of the outcome of its children, since it may use that to adjust its control flow. Perhaps the sub-scope could again use "at-least-once" message transmission to notify the parent scope of its outcome, which, from the user's point of view, would look like an accurate failure detector.


              I'm unsure what point you are trying to make here.

              • 4. Re: How JBoss Transaction detect fault in sub-activity
                xiaoqj

                I confused peer autonomy with parent-child autonomy, if we borrow the concept of "autonomy" from workflow. The same concepts are implemented in JBossTS: Coordinator/Participant corresponds to parent-child autonomy, and the Completion/CompletionWithAck protocol corresponds to peer autonomy, which ensures that the user receives the outcome notification for a committed transaction. But it seems such a reliable outcome notification mechanism is mentioned only in WS-AT.

                Is it also implemented in WS-BA?

                Is a special log component needed for the transaction terminator to acquire the notification reliably?

                Will such a completion protocol disable the one phase commit optimization?

                • 5. Re: How JBoss Transaction detect fault in sub-activity
                  xiaoqj

                  Back to the fault detection question: when looking at the WS-BA state transition graph, I was confused by the Fault/Faulted message pair. It's impractical to compel all participants to conform to a fail-and-notify failure model, so the failure suspector you mentioned may be a necessity in WS-BA. Why is it left out of the WS-BA specification?

                  • 6. Re: How JBoss Transaction detect fault in sub-activity
                    marklittle

                     

                    "XiaoQJ" wrote:
                    I confused peer autonomy with parent-child autonomy, if we borrow the concept of "autonomy" from workflow. The same concepts are implemented in JBossTS: Coordinator/Participant corresponds to parent-child autonomy, and the Completion/CompletionWithAck protocol corresponds to peer autonomy


                    Not quite. Coordinator/Participant are roles in a protocol, irrespective of whether it's WS-AT or WS-BA. Completion/CompletionWithAck is the actual protocol (from WS-BA), in the same way that Durable2PC is a protocol (from WS-AT).

                    , which ensures that the user receives the outcome notification for a committed transaction. But it seems such a reliable outcome notification mechanism is mentioned only in WS-AT.


                    What do you mean by "reliable outcome notification"? I presume you mean in the event of crash failures?


                    Is it also implemented in WS-BA?


                    Yes, WS-BA is expected to work in the event of failures of coordinator and/or participants.


                    Is a special log component needed for the transaction terminator to acquire the notification reliably?


                    A log is used. The specification doesn't say how this happens, though, in the same way that WS-AT doesn't: that is implementation-specific.
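                    Since the log is implementation-specific, the following is only one hypothetical way a coordinator might record its decisions so that recovery can re-send them after a crash. `DecisionLog` and its one-JSON-object-per-line format are invented for illustration; the essential rule it shows is that the decision is forced to stable storage (`os.fsync`) before it is sent to anyone.

```python
import json
import os
from typing import Dict


class DecisionLog:
    """Append-only file of coordinator decisions (hypothetical format)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, txid: str, outcome: str) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"tx": txid, "outcome": outcome}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before the decision is diffused

    def recover(self) -> Dict[str, str]:
        # After a restart, rebuild the decisions that still need delivering.
        outcomes: Dict[str, str] = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    outcomes[entry["tx"]] = entry["outcome"]
        return outcomes
```

                    Under presumed abort, a coordinator could skip logging rollback decisions, since "no record" already means roll back.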


                    Will such a completion protocol disable the one phase commit optimization?


                    There is no one-phase commit optimization in WS-AT or WS-BA.

                    • 7. Re: How JBoss Transaction detect fault in sub-activity
                      marklittle

                       

                      "XiaoQJ" wrote:
                      Back to the fault detection question: when looking at the WS-BA state transition graph, I was confused by the Fault/Faulted message pair. It's impractical to compel all participants to conform to a fail-and-notify failure model, so the failure suspector you mentioned may be a necessity in WS-BA. Why is it left out of the WS-BA specification?


                      Fault/Faulted is the same as Rollback/RolledBack or Commit/Committed, for example. If the implementation doesn't get back a response then it should act as defined in the protocol for that message exchange.

                      There's no mention of a fault detector/suspector in WS-AT either ;-)

                      • 8. Re: How JBoss Transaction detect fault in sub-activity
                        xiaoqj

                        Probably the concept of a failure suspector is implicitly present in the state transition table, because there are two timers triggering the "Comms Times out" and "Transaction Expiration" events.

                        "Comms Times out" is an effective detector for message loss in a synchronous environment.


                        Although the transaction time-out mechanism may not take effect in the second phase, it can still cover crash failures in the first phase.
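                        The two timers mentioned in this post might drive a coordinator's view of one participant roughly as follows. This is a hypothetical Python sketch: the event strings are taken from the post, and the rule for expiry (roll back only while the first phase is still incomplete) follows the presumed-abort reasoning earlier in the thread.

```python
class CoordinatorTimers:
    """Reacts to the two timer events named in the post (names taken from the post)."""

    def __init__(self) -> None:
        self.first_phase_done = False
        self.resent = 0
        self.rolled_back = False

    def on_event(self, event: str) -> str:
        if event == "Comms Times out":
            # Only a suspicion of a lost message: re-send and wait again.
            self.resent += 1
            return "resend"
        if event == "Transaction Expiration":
            if not self.first_phase_done:
                # Still in the first phase: presumed abort makes rollback safe.
                self.rolled_back = True
                return "rollback"
            # Second phase: expiry cannot undo the decision; keep delivering it.
            return "ignore"
        raise ValueError(f"unknown event: {event}")
```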

                        BTW, interposition can really help make such detection accurate.

                        • 9. Re: How JBoss Transaction detect fault in sub-activity
                          xiaoqj

                          I mean relatively more accurate, not a perfect FD.

                          • 10. Re: How JBoss Transaction detect fault in sub-activity
                            marklittle

                             

                            "XiaoQJ" wrote:
                            Probably the concept of a failure suspector is implicitly present in the state transition table, because there are two timers triggering the "Comms Times out" and "Transaction Expiration" events.

                            "Comms Times out" is an effective detector for message loss in a synchronous environment.


                            Suspector, not detector ;-)

                            Time outs don't help in an asynchronous environment, which is essentially what WS-BA is targeted at. Failures need to be suspected and dealt with through other mechanisms.


                            Although the transaction time-out mechanism may not take effect in the second phase, it can still cover crash failures in the first phase.


                            If timeouts happen before the end of the first phase, we're OK, because we use presumed abort in the WS-AT specification. After that, it becomes much trickier.


                            BTW, interposition can really help make such detection accurate.


                            I'm not exactly sure what you mean.

                            • 11. Re: How JBoss Transaction detect fault in sub-activity
                              xiaoqj

                              I mean that if a sub-coordinator exists in a trusted domain, it will detect a participant's impolite absence more easily, because a trusted domain is usually closer to synchronous.

                              Then the discovery (the absence of the participant) can be propagated between coordinators.

                              To find out whether a nearby coordinator is alive, some heartbeat mechanism or similar could be introduced, since in an interposition infrastructure we can afford it.
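                              The heartbeat-and-propagate idea might be sketched like this, with each interposed coordinator tracking heartbeats from its local participants and reporting suspected absences to its superior. All names and the caller-supplied clock are hypothetical. A short timeout only helps if the local domain really behaves close to synchronously; otherwise this is still a suspector that can be wrong.

```python
from typing import Dict, List, Optional


class SubCoordinator:
    """Hypothetical interposed coordinator that heartbeats local participants."""

    def __init__(self, name: str, superior: Optional["SubCoordinator"] = None,
                 timeout: float = 1.0) -> None:
        self.name = name
        self.superior = superior
        self.timeout = timeout
        self.last_beat: Dict[str, float] = {}
        self.suspected: List[str] = []

    def heartbeat(self, participant: str, now: float) -> None:
        self.last_beat[participant] = now

    def check(self, now: float) -> None:
        # Suspect any local participant whose heartbeat is overdue.
        for participant, last in self.last_beat.items():
            if now - last > self.timeout:
                self.report(f"{self.name}/{participant}")

    def report(self, who: str) -> None:
        if who in self.suspected:
            return  # already known: do not propagate again
        self.suspected.append(who)
        if self.superior is not None:
            self.superior.report(who)  # propagate the suspicion upwards
```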

                              • 12. Re: How JBoss Transaction detect fault in sub-activity
                                marklittle

                                So I agree that interposition is useful for performance and security reasons. However, the use of interposition in Web Services doesn't imply synchronous behaviour at all. That's a deployment/use-time choice.

                                • 13. Re: How JBoss Transaction detect fault in sub-activity
                                  xiaoqj

                                  Then, given current network conditions, how does WS-BA address the failure suspector problem? Looking at the released WS-BA spec, we can easily imagine particular execution runs that cause inconsistency between the coordinator's assumption about a participant's state and the participant's real state.

                                  I read a paper by Fischer, Lynch and Paterson that proves that, in a truly asynchronous environment, such consensus cannot be achieved by any deterministic protocol if even one process may crash.

                                  • 14. Re: How JBoss Transaction detect fault in sub-activity
                                    marklittle

                                     

                                    "XiaoQJ" wrote:
                                    Then, given current network conditions, how does WS-BA address the failure suspector problem? Looking at the released WS-BA spec, we can easily imagine particular execution runs that cause inconsistency between the coordinator's assumption about a participant's state and the participant's real state.


                                    I'll try to post something later. I'm at JavaOne at the moment and don't have a lot of time.


                                    I read a paper by Fischer, Lynch and Paterson that proves that, in a truly asynchronous environment, such consensus cannot be achieved by any deterministic protocol if even one process may crash.


                                    Yes, that's correct.