4 Replies Latest reply on May 24, 2017 5:14 AM by abof abof

    Consequences of restarting one of participants in a distributed transaction

    abof abof Newbie

      In short: my question is about how changes are committed in a transaction distributed between two WildFly server instances. What matters in my examples is that one of the servers is restarted before the transaction ends. Before going into details, I will describe the runtime environment:

      • application server: WildFly 8.2;
      • database server: "PostgreSQL 9.5.5 on x86_64-pc-linux-gnu"; configured in WildFly as an XA datasource with the 'postgresql-9.1-903.jdbc4' driver;
      • the WildFly configuration is slightly changed relative to the base 'standalone.xml' file (not 'full-standalone'!); the 'jboss.node.name' of each server is unique; each server runs in debug mode; the second server has a port offset of 100;
      • I start the main method on the first server with Eclipse Neon as the client; Eclipse has two 'remote java application' connections, one per server, for debugging.


      I'll describe two cases. The first, A, involves one datasource on the remote server S2 (insertion of a simple entity). The second, B, involves two datasources, one on each server.


      Simplified diagram of case A:


      Description of case A:

      The first server, S1, starts the main method (@TransactionAttribute = REQUIRED) and at the same time begins transaction T1.

      The first part of that method invokes a remote method on the second server; the remote method joins the propagated transaction T1 (@TransactionAttribute = REQUIRED). That method touches the database: a simple 'INSERT' via the XA datasource DS_1. After that, control returns to the main method on S1.


      Imagine that at the first X (the red one) a time-consuming operation starts within the main method on S1. It ends after the second X (the green one). I simulated the time-consuming processing with a breakpoint during debugging. What matters is that S2 was restarted between those two X's (i.e. S2 is up again before the green X).
      On S1, the server log contains an INFO message: “EJBCLIENT000016: Channel Channel ID b6b41d02 (outbound) of Remoting connection 438a2403 to / can no longer process messages”.

      The main method on S1 ends, and so does T1. Debugging 'org.postgresql.xa.PGXAConnection' shows a `commit(Xid xid, boolean onePhase)` invocation with 'onePhase = true'.


      The point is that after the main method exits there are no changes in the database behind datasource DS_1! What bothers me even more is that there are no ERRORS or EXCEPTIONS in the logs of either S1 or S2!

      Without restarting S2, everything works as expected: there is a new tuple in the database, inserted by S2.


      Simplified diagram of case B:



      Description of case B:

      The invocation sequence is almost the same as in case A, with an additional step in the main method on S1: a simple 'INSERT' into the database via datasource DS_2.

      Again, during the restart of S2 I found a similar INFO message on S1. At the end of T1, during the commit phase, debugging 'org.postgresql.xa.PGXAConnection' shows a `commit(Xid xid, boolean onePhase)` invocation with 'onePhase = false'. I assume that a two-phase commit took place.
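
To illustrate where the 'onePhase' flag comes from, here is a minimal sketch of the usual coordinator rule: with a single enlisted resource the prepare phase is skipped and commit is called with onePhase = true; with two or more, every resource is prepared and then committed with onePhase = false. The `Resource` interface, `RecordingResource` class and `complete` method are hypothetical names made up for this illustration. They are not WildFly, Narayana or PostgreSQL driver types, and the sketch does not model which server's PGXAConnection received the call in the cases above.

```java
import java.util.ArrayList;
import java.util.List;

class OnePhaseFlagDemo {

    /** Hypothetical resource contract, loosely modelled on the XA prepare/commit pair. */
    interface Resource {
        boolean prepare(String xid);
        void commit(String xid, boolean onePhase);
    }

    /** Records every call so the sequence can be inspected afterwards. */
    static class RecordingResource implements Resource {
        private final String name;
        private final List<String> log;

        RecordingResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public boolean prepare(String xid) {
            log.add(name + ".prepare");
            return true; // always votes yes in this sketch
        }

        @Override
        public void commit(String xid, boolean onePhase) {
            log.add(name + ".commit(onePhase=" + onePhase + ")");
        }
    }

    /** Completes a transaction over the named resources and returns the call log. */
    static List<String> complete(String xid, String... resourceNames) {
        List<String> log = new ArrayList<>();
        List<Resource> enlisted = new ArrayList<>();
        for (String n : resourceNames) enlisted.add(new RecordingResource(n, log));

        if (enlisted.size() == 1) {
            // Single resource: skip prepare and commit in one phase.
            enlisted.get(0).commit(xid, true);
        } else {
            // Votes are ignored here because every resource in this sketch votes yes.
            for (Resource r : enlisted) r.prepare(xid);        // phase one
            for (Resource r : enlisted) r.commit(xid, false);  // phase two
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(complete("T1", "only"));
        System.out.println(complete("T1", "first", "second"));
    }
}
```

In this model a transaction with one enlisted resource logs a single commit with onePhase=true, while a transaction with two resources logs two prepare calls followed by two commits with onePhase=false.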


      Again: the point is that after the main method exits there are no changes in the database behind datasource DS_1, and again there are no ERRORS or EXCEPTIONS in either server's logs! To make it more interesting, the change in the DS_2 database, the 'INSERT' by S1, was successfully committed!


      Let me assure you that:

      • both cases work as expected without restarting the participant server;
      • I am not looking for a workaround (I know that “REQUIRES_NEW” is one); I want to understand the described behavior.


      Final questions:

      • Is this the expected behavior of the commit phase at the end of a transaction that spans two servers? I would rather expect EXCEPTIONS during the commit phase (even in the second phase of 2PC, which is the “void” one).
      • Is there a way to configure transactions or datasources to get the expected behavior (exceptions during commit)?
        • 1. Re: Consequences of restarting one of participants in a distributed transaction
          Amos Feng Apprentice

          Can you move the question to Narayana, which deals with transactions?




          • 2. Re: Consequences of restarting one of participants in a distributed transaction
            Tom Jenkinson Master

            I think dmlloyd fixed something related to this quite a while back. Are you able to run your example on a more current version of WildFly, for example 10.1?


            You could move to JTS/CORBA (with EJB Home etc.), which was not susceptible to this issue; see jboss-eap-quickstarts/jts at 7.1.Alpha · jboss-developer/jboss-eap-quickstarts · GitHub.

            • 3. Re: Consequences of restarting one of participants in a distributed transaction
              Ondra Chaloupka Apprentice

              Hi abof, I was interested in the behavior, so I tried to reproduce your issue. I see the same behavior with the JTA implementation. As Tom mentioned, JTS works fine.


              The behavior is as follows: the transaction is started on `S1` by invoking a business method that calls server `S2`. The EJB call passes the transaction context along. `S2` receives the call and checks whether a context exists. If it does, `S2` starts a subordinate transaction with the same transaction id that `S1` defined (the behavior depends on `@TransactionAttribute`). While the business method is being processed, each server keeps the transaction information only in memory. If a node crashes, that information is simply lost.


              As said, `S2` is called and the insertion into the database is processed. For the insertion, the database starts its own local transaction, with a transaction timeout based on the timeout defined for the global transaction maintained by WildFly.
              The EJB call returns to `S1`. Now `S2` is restarted, which means the transaction info is forgotten on `S2`. The `S1` business method finishes and the transaction manager starts the two-phase commit (or one-phase commit). The transaction manager tries to call prepare/commit on `S2` (`S2` is enlisted in the transaction as a special resource).


              The correct behavior is that the call to `S2` fails, because `S2` does not have any information about the transaction that `S1` is trying to commit. In that case an exception is thrown (visible in `server.log`) and the transaction is aborted; in addition, the `S2` resource is marked as heuristic because the transaction manager does not know what happened. It's up to the user to decide how to resolve it.

              But with the current EJB transaction context propagation, the call returns `TwoPhaseOutcome.PREPARE_READONLY` in the case of two-phase commit, or `TwoPhaseOutcome.FINISH_OK` in the case of the one-phase commit optimization. The transaction manager therefore treats the processing as successful and finishes the commit on all participants (for two-phase commit).
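
The difference between the two outcomes described above can be sketched with a toy coordinator. This is only an illustration under stated assumptions, not Narayana or WildFly code: the `Participant` interface, `InMemoryParticipant` class and `twoPhaseCommit` method are hypothetical, and only the `XAResource` / `XAException` constants come from the JDK's `javax.transaction.xa` package. In "lenient" mode a participant that no longer knows the transaction votes read-only, which mirrors the observed behavior; in "strict" mode it throws `XAER_NOTA`, which the toy coordinator treats as a reason to roll back (a simplification of the heuristic handling described above).

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LostSubordinateDemo {

    /** Hypothetical minimal participant contract; not a WildFly or Narayana type. */
    interface Participant {
        int prepare(String txId) throws XAException;
        void commit(String txId) throws XAException;
        void rollback(String txId);
    }

    /** Toy participant that keeps uncommitted work only in memory. */
    static class InMemoryParticipant implements Participant {
        private final Map<String, String> pending = new HashMap<>();
        private final boolean strict;
        String committed; // stands in for what reached the database

        InMemoryParticipant(boolean strict) { this.strict = strict; }

        void insert(String txId, String row) { pending.put(txId, row); }

        /** Simulates a restart: all in-memory transaction state is lost. */
        void restart() { pending.clear(); }

        @Override
        public int prepare(String txId) throws XAException {
            if (pending.containsKey(txId)) return XAResource.XA_OK;
            if (strict) throw new XAException(XAException.XAER_NOTA); // unknown transaction
            return XAResource.XA_RDONLY; // lenient: claims there is nothing to commit
        }

        @Override
        public void commit(String txId) throws XAException {
            String row = pending.remove(txId);
            if (row == null) throw new XAException(XAException.XAER_NOTA);
            committed = row;
        }

        @Override
        public void rollback(String txId) { pending.remove(txId); }
    }

    /** Toy two-phase commit; returns true if the transaction committed. */
    static boolean twoPhaseCommit(String txId, List<Participant> participants) {
        List<Participant> toCommit = new ArrayList<>();
        try {
            for (Participant p : participants) {
                // Read-only voters take no part in phase two.
                if (p.prepare(txId) == XAResource.XA_OK) toCommit.add(p);
            }
        } catch (XAException e) {
            for (Participant p : participants) p.rollback(txId);
            return false;
        }
        for (Participant p : toCommit) {
            try { p.commit(txId); } catch (XAException e) { throw new IllegalStateException(e); }
        }
        return true;
    }

    /** Replays case B: both sides insert, S2 restarts, then T1 completes. */
    static String run(boolean strict) {
        InMemoryParticipant s2 = new InMemoryParticipant(strict); // remote server S2
        InMemoryParticipant ds2 = new InMemoryParticipant(true);  // local datasource on S1
        s2.insert("T1", "row-from-S2");
        ds2.insert("T1", "row-from-S1");
        s2.restart(); // S2 restarts before T1 completes
        boolean ok = twoPhaseCommit("T1", Arrays.<Participant>asList(s2, ds2));
        return (ok ? "COMMITTED" : "ROLLED_BACK") + " s2=" + s2.committed + " ds2=" + ds2.committed;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // lenient participant
        System.out.println(run(true));  // strict participant
    }
}
```

In lenient mode the run reports a committed transaction in which only the S1 row exists, which is the silent partial commit from case B. In strict mode the same scenario rolls back and neither row is written.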


              Since I think this is a bug, and it is still valid for JTA even in WFLY 11, I created an issue for it: https://issues.jboss.org/browse/JBEAP-11081

              • 4. Re: Consequences of restarting one of participants in a distributed transaction
                abof abof Newbie

                First of all, ochaloup, thanks a lot for reproducing my issue on a more current version of WildFly. That was the main reason why I delayed my answer.

                I'm glad to read confirmation of what I suspected to be the correct behavior for the two test cases. Unfortunately, using JTS is impossible in my case :/

                I'm marking your post as a correct answer! Thanks once again!


                I'll be watching the issue on JBoss Issue Tracker.