Data inconsistency for XA transaction when db resource returns XAER_RMERR code for lost connection
ochaloup Jan 12, 2015 10:59 AM
Hi,
I would like to start a discussion about a scenario that ends with inconsistent data for an XA transaction: the XA resource returns XAException.XAER_RMERR from commit although the commit actually went through.
This behaviour is against the specification, which says the following about this error code:
An error occurred in committing the work performed on behalf of the transaction
branch and the branch’s work has been rolled back. Note that returning this error
signals a catastrophic event to a transaction manager since other resource
managers may successfully commit their work on behalf of this branch. This error
should be returned only when a resource manager concludes that it can never
commit the branch and that it cannot hold the branch’s resources in a prepared
state. Otherwise, [XA_RETRY] should be returned.
See the discussion and Tom's comment at Bug 1169671 – Recovery scenario where db connection is halted after prepare phase does not rollback resource.
The problem is that a lot of databases behave in this way. Databases such as PostgreSQL, MSSQL or Sybase throw XAException.XAER_RMERR whenever the connection is lost.
The Narayana transaction manager then rolls back the rest of the transaction. In the following scenario the result is inconsistent data:
- prepare DB xa resource
- prepare second xa resource
- commit DB xa resource
- DB commits
- connection crashes (before confirmation is received by transaction manager)
- JDBC driver throws XAException because the connection is down
There are now two cases, at least for the databases that the EAP app server supports.
In the first case the JDBC driver returns XAException.XAER_RMFAIL or XA_RETRY. That's ok, as all the subsequent XA resources are committed. There is only a small issue: the recovery manager will keep retrying the commit of a non-existent XID (as the DB has already committed). This should be fixed by [JBTM-860] use XAResourceWrapper metadata for assume complete - JBoss Issue Tracker.
The second case is the problematic one.
The JDBC driver returns XAException.XAER_RMERR. In this case the DB has committed, but because the connection was lost the method doAbort is called for the rest of the XA resources. Thus the other resources are rolled back.
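To make the difference between the two cases concrete, here is a minimal sketch in plain Java. It is not Narayana code: the Participant class and the secondPhase method are hypothetical names invented for this illustration, and the loop is a simplified assumption about the behaviour described above (roll back the remaining resources on XAER_RMERR, keep committing on XAER_RMFAIL or XA_RETRY). Only XAException and its constants come from the standard javax.transaction.xa package.

```java
import javax.transaction.xa.XAException;
import java.util.ArrayList;
import java.util.List;

class RmerrSketch {

    /** Hypothetical stand-in for a prepared XA resource (not a Narayana type). */
    static class Participant {
        final String name;
        final int commitErrorCode; // 0 means commit returns normally
        String state = "PREPARED";

        Participant(String name, int commitErrorCode) {
            this.name = name;
            this.commitErrorCode = commitErrorCode;
        }

        void commit() throws XAException {
            // The database commits durably first...
            state = "COMMITTED";
            // ...then the connection is lost before the reply reaches the TM,
            // so the driver reports an error for work that was in fact committed.
            if (commitErrorCode != 0) {
                throw new XAException(commitErrorCode);
            }
        }

        void rollback() {
            state = "ROLLED_BACK";
        }
    }

    /** Toy second phase; assumed simplification of the behaviour in this post. */
    static List<String> secondPhase(List<Participant> participants) {
        boolean abortRest = false;
        for (Participant p : participants) {
            if (abortRest) {
                p.rollback();
                continue;
            }
            try {
                p.commit();
            } catch (XAException e) {
                if (e.errorCode == XAException.XAER_RMERR) {
                    // The TM trusts that this branch was rolled back.
                    abortRest = true;
                }
                // XAER_RMFAIL / XA_RETRY: left for recovery, keep committing.
            }
        }
        List<String> states = new ArrayList<>();
        for (Participant p : participants) {
            states.add(p.name + "=" + p.state);
        }
        return states;
    }

    public static void main(String[] args) {
        // Second case: prints [db=COMMITTED, other=ROLLED_BACK]
        System.out.println(secondPhase(List.of(
                new Participant("db", XAException.XAER_RMERR),
                new Participant("other", 0))));
        // First case: prints [db=COMMITTED, other=COMMITTED]
        System.out.println(secondPhase(List.of(
                new Participant("db", XAException.XAER_RMFAIL),
                new Participant("other", 0))));
    }
}
```

With XAER_RMERR the first resource ends up committed and the second rolled back, which is the inconsistent outcome. With XAER_RMFAIL both end up committed and only the recovery retry remains.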
I understand that this is a problem of the JDBC driver and its incorrect error code, but it disconcerts me a bit that databases like MSSQL, PostgreSQL etc. could end up with inconsistent data, at least in this (corner) case.
Is this just a documentation issue from the TM point of view? Or could Narayana somehow prevent this situation?
Thanks
Ondra