7 Replies Latest reply on Aug 3, 2017 4:50 PM by abof

Consequences of restarting one of participants in a distributed transaction

abof Aug 3, 2017 4:44 PM

In short: my question is about how changes are committed in a transaction distributed between two Wildfly server instances. The important detail in my examples is that one of the servers is restarted before the transaction ends. Before I go into details, here is the run-time environment:

application server: Wildfly 8.2;
database server: "PostgreSQL 9.5.5 on x86_64-pc-linux-gnu"; configured in Wildfly as an XA Datasource with the 'postgresql-9.1-903.jdbc4' driver;
the Wildfly configuration is only slightly changed relative to the base 'standalone.xml' file (not 'full-standalone'!); the 'jboss.node.name' of each server is unique; each server is running in debug mode; the second server has a port offset of 100;
I start the main method on the first server with Eclipse NEON as the client; Eclipse has two 'remote java application' connections, one for each server, for debugging.

I'll describe two cases. The first, A, involves one datasource on the remote server S2 (insertion of a simple entity). The second, B, involves two datasources, one on each server.

Simplified diagram of case A:

Description of case A:

The first server, S1, starts the main method (@TransactionAttribute = REQUIRED) and at the same time begins a transaction, T1.

The first part of that method invokes a remote method on the second server; the remote method joins the propagated transaction T1 (@TransactionAttribute = REQUIRED). That method touches the database: a simple 'INSERT' via the XA Datasource DS_1. After that, control returns to the main method on S1.

Imagine that a time-consuming operation starts in the main method on S1 at the first X (the red one) and ends after the second X (the green one). I simulated the time-consuming processing with a breakpoint while debugging. The important part: S2 is restarted between those two X's (i.e. S2 is up again before the green X).
On S1, the server log contains this INFO message: “EJBCLIENT000016: Channel Channel ID b6b41d02 (outbound) of Remoting connection 438a2403 to /12.3.45.678:4447 can no longer process messages”.

The main method on S1 ends, and T1 with it. Debugging 'org.postgresql.xa.PGXAConnection' shows a `commit(Xid xid, boolean onePhase)` invocation with 'onePhase = true'.

The point is that after the main method exits, there are no changes in the database behind datasource DS_1! What bothers me even more is that there are no ERRORS or EXCEPTIONS in the logs of either S1 or S2!

Without restarting S2, everything works as expected: there is a new tuple in the database, inserted by S2.

Simplified diagram of case B:

Description of case B:

The invocation sequence is almost the same as in case A, with an additional step in the main method on S1: a simple 'INSERT' into the database via datasource DS_2.

Again, during the restart of S2, I found a similar INFO message on S1. At the end of T1, during the commit phase, debugging 'org.postgresql.xa.PGXAConnection' shows a `commit(Xid xid, boolean onePhase)` invocation with 'onePhase = false'. I'm assuming that a two-phase commit took place.

Again, the point is that after the main method exits, there are no changes in the database behind datasource DS_1, and again there are no ERRORS or EXCEPTIONS in the logs of either server! To make it more interesting, the change in the DS_2 database (the 'INSERT' by S1) was successfully committed!
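One possible reading of the different 'onePhase' values in cases A and B is the common one-phase commit optimization: a coordinator that has a single enlisted resource can skip the prepare phase. The sketch below only illustrates that rule. The class and method names are hypothetical, and the assumption about which resources are enlisted on S1 in each case is mine, not something confirmed in this thread.

```java
import java.util.Arrays;
import java.util.List;

final class CommitModeDemo {

    // Hypothetical helper: with exactly one enlisted resource there is nothing
    // to coordinate, so a one-phase commit is enough.
    static boolean useOnePhase(List<String> enlistedResources) {
        return enlistedResources.size() == 1;
    }

    public static void main(String[] args) {
        // Assumption: in case A, S1 enlists only the remote S2 participant.
        List<String> caseA = Arrays.asList("S2 subordinate");
        // Assumption: in case B, S1 enlists DS_2 plus the remote S2 participant.
        List<String> caseB = Arrays.asList("DS_2", "S2 subordinate");

        System.out.println("case A onePhase = " + useOnePhase(caseA));
        System.out.println("case B onePhase = " + useOnePhase(caseB));
    }
}
```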

Let me assure you that:

both cases work as expected when the participant server is not restarted;
I'm not looking for a workaround (I know that “REQUIRES_NEW” is one); I want to understand the behavior described above.

Final questions:

Is this the expected behavior of the commit phase at the end of a transaction that is spread across two servers? I would rather expect EXCEPTIONS during the commit phase (even in the second phase of 2PC, which is a “void” one).
Is there a way to configure transactions or datasources to get the behavior I expect (exceptions during commit)?

1. Re: Consequences of restarting one of participants in a distributed transaction

zhfeng May 18, 2017 11:34 AM (in response to abof)

Can you move the question to Narayana, which is concerned with transactions?

Thanks,
Amos
2. Re: Consequences of restarting one of participants in a distributed transaction

tomjenkinson May 18, 2017 5:33 PM (in response to abof)

I think dmlloyd fixed something related to this quite a while back. Are you able to run your example on a more current version of WildFly, for example 10.1?

You could move to using JTS/CORBA (with EJB Home etc.), as that was not susceptible to this issue; see the jts quickstart on the 7.1.Alpha branch of jboss-developer/jboss-eap-quickstarts on GitHub.
3. Re: Consequences of restarting one of participants in a distributed transaction

ochaloup May 23, 2017 7:16 AM (in response to abof)

Hi abof, I was interested in the behavior, so I tried to reproduce your issue. I can see the same behavior with the JTA implementation. As Tom mentioned, JTS works fine.

The behavior is as follows. The transaction is started on `S1` by invoking a business method that calls server `S2`. The EJB call passes the transaction context along. `S2` receives the call and checks whether a context exists. If it does, `S2` starts a subordinate transaction with the same transaction id that `S1` defined (the exact behavior depends on `@TransactionAttribute`). While the business method is being processed, each server keeps the transaction information in memory only. If a node crashes, that information is simply lost.
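The consequence of keeping transaction state only in memory can be illustrated with a small plain-Java model. This is a sketch, not WildFly or Narayana code: `Server`, `importTransaction`, `knows` and `restart` are hypothetical names introduced here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class InMemoryTxRegistryDemo {

    /** Hypothetical stand-in for a server that imports propagated transactions. */
    static final class Server {
        // Transaction state is held in memory only, as described in the post.
        private Map<String, String> importedTransactions = new HashMap<>();

        void importTransaction(String txId) {
            importedTransactions.put(txId, "ACTIVE");
        }

        boolean knows(String txId) {
            return importedTransactions.containsKey(txId);
        }

        // A restart discards everything that was never persisted.
        void restart() {
            importedTransactions = new HashMap<>();
        }
    }

    public static void main(String[] args) {
        Server s2 = new Server();
        s2.importTransaction("T1");
        System.out.println("before restart, S2 knows T1: " + s2.knows("T1"));
        s2.restart();
        System.out.println("after restart, S2 knows T1: " + s2.knows("T1"));
    }
}
```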

As said, `S2` is called and the insertion into the database is processed. For that insertion the database starts its own local transaction, with a transaction timeout based on the timeout of the global transaction maintained by WildFly.
The EJB call returns to `S1`. Now `S2` is restarted, which means the transaction info is forgotten on `S2`. Processing of the `S1` business method ends and the transaction manager starts a two-phase commit or a one-phase commit. The transaction manager tries to call prepare/commit on `S2` (`S2` is enlisted in the transaction as a special resource).

The correct behavior is that the call to `S2` fails, as `S2` does not have any information about the transaction that `S1` is trying to commit. In that case an exception is thrown (visible in `server.log`) and the transaction is aborted; in addition, the `S2` resource is marked as heuristic, because the transaction manager does not know what happened to it. It is then up to the user to decide how to resolve it.

With the current EJB transaction context propagation, however, the call returns `TwoPhaseOutcome.PREPARE_READONLY` for a two-phase commit, or `TwoPhaseOutcome.FINISH_OK` for the one-phase commit optimization. The transaction manager therefore treats the processing as successful and, for a two-phase commit, goes on to commit all remaining participants.
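The difference between the expected and the observed outcome in case B can be sketched with a toy two-phase commit coordinator. This is a simplified model under stated assumptions, not the Narayana implementation: the `Vote` values only borrow the names quoted above, every class and method is hypothetical, and heuristic outcomes are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LostSubordinateDemo {

    // Simplified votes; the names only echo the outcomes quoted in the post.
    enum Vote { PREPARE_OK, PREPARE_READONLY, PREPARE_NOTOK }

    interface Participant {
        Vote prepare();
        void commit();
        void rollback();
    }

    /** Stand-in for a local XA datasource such as DS_2 on S1. */
    static final class LocalDatasource implements Participant {
        boolean committed;
        public Vote prepare() { return Vote.PREPARE_OK; }
        public void commit() { committed = true; }
        public void rollback() { committed = false; }
    }

    /** Stand-in for the remote S2 participant as seen from S1. */
    static final class RemoteSubordinate implements Participant {
        private final boolean knowsTransaction; // false once S2 has been restarted
        private final boolean failWhenUnknown;  // true models the expected behavior
        boolean committed;

        RemoteSubordinate(boolean knowsTransaction, boolean failWhenUnknown) {
            this.knowsTransaction = knowsTransaction;
            this.failWhenUnknown = failWhenUnknown;
        }

        public Vote prepare() {
            if (knowsTransaction) {
                return Vote.PREPARE_OK;
            }
            // Observed: an unknown transaction is reported as read-only.
            // Expected: an unknown transaction makes prepare fail.
            return failWhenUnknown ? Vote.PREPARE_NOTOK : Vote.PREPARE_READONLY;
        }

        public void commit() {
            if (knowsTransaction) {
                committed = true; // a restarted S2 has nothing left to commit
            }
        }

        public void rollback() { committed = false; }
    }

    /** Returns true if the transaction committed, false if it was rolled back. */
    static boolean twoPhaseCommit(List<Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            Vote vote = p.prepare();
            if (vote == Vote.PREPARE_NOTOK) {
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
            if (vote == Vote.PREPARE_OK) {
                prepared.add(p); // read-only participants skip phase two
            }
        }
        for (Participant p : prepared) {
            p.commit();
        }
        return true;
    }

    public static void main(String[] args) {
        LocalDatasource ds2 = new LocalDatasource();
        RemoteSubordinate restartedS2 = new RemoteSubordinate(false, false);
        boolean observed = twoPhaseCommit(Arrays.<Participant>asList(ds2, restartedS2));
        System.out.println("observed: committed=" + observed
                + ", DS_2=" + ds2.committed + ", S2=" + restartedS2.committed);

        LocalDatasource ds2Strict = new LocalDatasource();
        RemoteSubordinate strictS2 = new RemoteSubordinate(false, true);
        boolean expected = twoPhaseCommit(Arrays.<Participant>asList(ds2Strict, strictS2));
        System.out.println("expected: committed=" + expected
                + ", DS_2=" + ds2Strict.committed);
    }
}
```

In the first run the coordinator reports success and commits DS_2 while the restarted S2 contributes nothing, which matches the silent data loss described in case B. In the second run the failed prepare causes a rollback, so DS_2 is not committed either.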

I think this is a bug, and since it still applies to JTA even in WFLY 11, I created an issue for it: https://issues.jboss.org/browse/JBEAP-11081
4. Re: Consequences of restarting one of participants in a distributed transaction

abof May 24, 2017 5:14 AM (in response to ochaloup)

First of all, ochaloup, thanks a lot for reproducing my issue on a more current version of WildFly. That was the main reason I was delaying my answer.
I'm glad to read a confirmation of what I suspected to be the correct behavior for the two test cases. Unfortunately, using JTS is impossible in my case :/
I'm marking your post as the correct answer! Thanks once again!

I'll be watching the issue on JBoss Issue Tracker.
5. Re: Consequences of restarting one of participants in a distributed transaction

abof Aug 3, 2017 9:51 AM (in response to ochaloup)

ochaloup - one more question - do you happen to know which Wildfly release will contain the fix mentioned in JBEAP-11081?
6. Re: Consequences of restarting one of participants in a distributed transaction

ochaloup Aug 3, 2017 4:15 PM (in response to abof)

abof : the fix was made in the WFTC project (https://github.com/wildfly/wildfly-transaction-client), and the wildfly github upstream already contains the update. The fix should therefore be part of the upcoming WFLY release. Unfortunately I have no idea whether that will be 11.0.0.Beta or 11.0.0.Final, nor when it will be released. I hope it will be soon.

One possibility is to take the current 11.0.0.Alpha and change the WFTC version to the most recent one (or at least 1.0.0.CR2): https://github.com/wildfly/wildfly-transaction-client/releases. I believe it works, but unfortunately I haven't tested it.
7. Re: Consequences of restarting one of participants in a distributed transaction

abof Aug 3, 2017 4:50 PM (in response to ochaloup)

ochaloup : once again - thx for quick response! I'm glad that I can always count on you to answer my questions!