Design Discussion: Changing the reaper to use s...| JBoss.org Content Archive (Read Only)

30. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

tomjenkinson Aug 8, 2014 8:59 AM (in response to smarlow)

For a user transaction, we could do the integration here:

jboss-transaction-spi/ServerVMClientUserTransaction.java at master · jbosstm/jboss-transaction-spi · GitHub

and:

https://github.com/jbosstm/jboss-transaction-spi/blob/master/src/main/java/org/jboss/tm/usertx/client/ServerVMClientUserTransaction.java#L182

Those methods can call txDissasociated as they should be coming from the app thread as the reaper doesn't invoke them.

31. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

smarlow Aug 8, 2014 9:45 AM (in response to tomjenkinson)

Sounds like we are getting closer to a different possible workaround (which gives us two workarounds, one for the distributed bug with SRO and one possible for workaround that is done via NEWAPI/SPI changes).

For the NEWAPI/SPI change, we also should follow open standards (for the same reason that we previously choose to avoid the XA breaking Status.timedout change). How could we implement the NEWAPI/SPI change in a way that doesn't involve any NEWAPI/SPI extensions in Hibernate? So we can continue to work with versions of Hibernate that do not know about the extension? Very much like if we extended XA, we would want to be compatible with the current XA standard and not require XA.Next.

32. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

tomjenkinson Aug 8, 2014 9:50 AM (in response to smarlow)

In comment #24 I provided a simple implementation, it is possible that that could wrap _any_ none-concurrent able actor, including the hibernate synchronization. During register synchronization maybe we can wrap their synchronization for them in the integration code before it hits Narayana?

33. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

smarlow Aug 8, 2014 2:00 PM (in response to tomjenkinson)

Here is an example of where container level callbacks (_any_ none-concurrent able actors) would be called too late. For example, a servlet invocation would have the web container as its top level container. Queuing up the (persistence provider) synchronization call backs in some fashion, to occur when the web container gets control as part of a servlet invocation, sounds very late, especially if the servlet is in a loop and creating more than one JTA transaction (on TX rollback the persistence context will contain invalid state until the top level container is returned to, which breaks the next loop iteration in the servlet).

34. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

tomjenkinson Aug 11, 2014 5:38 AM (in response to smarlow)

Hi Scott,

If I wasn't clear, I am proposing that the NEWAPI callback can be done during commit in the SPI code rather than Narayana?

For UT, it would be this:

https://github.com/jbosstm/jboss-transaction-spi/blob/master/src/main/java/org/jboss/tm/usertx/client/ServerVMClientUserTransaction.java#L182

and:

jboss-transaction-spi/ServerVMClientUserTransaction.java at master · jbosstm/jboss-transaction-spi · GitHub

Tom

35. Re: Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

smarlow Aug 11, 2014 9:58 AM (in response to tomjenkinson)

Tom,

For UT, wouldn't the VMClientUserTransaction.rollback() be called from the wrong (reaper or communication) thread? The reaper thread needs to coordinate with the application thread, in a way that doesn't require cooperation with the EE container (for cases where the EE container is not in control).

Question about the distributed issue with SRO. How would you clarify further what you meant when you said:

In light of the fact that for the distributed case we would end up needing to call XAR::rollback concurrently anyway (either by pinging the parent coordinator to detect failure of JVM1 or some other mechanism) I am going to close the issue as WONTFIX. It is really a CANT_FIX as there isn't an algorithm that works for the distributed case.

Assuming that we had a secondary timeout, for detecting that this situation is occurring. What other strategies could we consider for automatically rolling back when related JVM processes are no longer reachable (due to a network partition or unexpected termination of other JVM)? I'm puzzled by the reaction that there isn't an algorithm that works. At the very least, this seems similar to split brain handling which could also be handled by calling system.exit(). Or, another approach is requiring manual intervention from an administrator (wouldn't calling system.exit also require administrator or recovery also?)

Scott

36. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

tomjenkinson Aug 11, 2014 12:57 PM (in response to smarlow)

Scott Marlow wrote:

Tom,

For UT, wouldn't the VMClientUserTransaction.rollback() be called from the wrong (reaper or communication) thread? The reaper thread needs to coordinate with the application thread, in a way that doesn't require cooperation with the EE container (for cases where the EE container is not in control).

Hi Scott,

No, that SPI method is only accessed by the users application thread actually, that is the trick . The object that it calls is the one that is also called by the reaper (the reference is called tm, its actually a structure internal to that even that is invoked by the reaper).

Tom

37. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

tomjenkinson Aug 11, 2014 1:15 PM (in response to smarlow)

Scott Marlow wrote:

Question about the distributed issue with SRO. How would you clarify further what you meant when you said:

In light of the fact that for the distributed case we would end up needing to call XAR::rollback concurrently anyway (either by pinging the parent coordinator to detect failure of JVM1 or some other mechanism) I am going to close the issue as WONTFIX. It is really a CANT_FIX as there isn't an algorithm that works for the distributed case.

Assuming that we had a secondary timeout, for detecting that this situation is occurring. What other strategies could we consider for automatically rolling back when related JVM processes are no longer reachable (due to a network partition or unexpected termination of other JVM)? I'm puzzled by the reaction that there isn't an algorithm that works. At the very least, this seems similar to split brain handling which could also be handled by calling system.exit(). Or, another approach is requiring manual intervention from an administrator (wouldn't calling system.exit also require administrator or recovery also?)

Scott

I am just stating the fact that if the "app thread" in JVM1 dies, and there is a corresponding thread (T2) in JVM2 that is still under the control of "app thread" in JVM1, whatever process we put in place to issue a rollback may be invoked concurrently with an in progress invocation in T2:

38. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

smarlow Aug 11, 2014 1:47 PM (in response to tomjenkinson)

I'm trying to imagine what an implementation would actually look like.

One implementation could arrange to intercept all Synchronization callbacks and register a single Synchronization callback that might be called in the reaper thread. When the application thread makes the next call to an ended transaction (determined by a UT listener and TM listener), call the intercepted Synchronization callbacks. This is a little different than you described but seems logically similar.

Pushing this problem out of the transaction manager, still feels wrong (IMO, control of Synchronizations should be in the TM) but we are only talking here, so lets continue.

39. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

smarlow Aug 11, 2014 1:51 PM (in response to tomjenkinson)

Good point, so we couldn't safely roll back the transaction after the second time out is exceeded. Are there any other alternatives after the second time out is exceeded, that would be safer?

40. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

marklittle Aug 12, 2014 4:42 AM (in response to smarlow)

Apart from crashing the JVM and allowing transaction recovery to kick in, not really.

41. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

smarlow Aug 12, 2014 11:50 AM (in response to marklittle)

A while back, we talked about introducing a "timed out" status, that could be made available to Synchronization.afterCompletion(int status) calls (either via extension or a JTA Status class change). Would it be possible to have the "timed out" status as an extension?

42. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

marklittle Aug 13, 2014 5:32 AM (in response to smarlow)

The reason we didn't go with the timed out status was because it's not standards compliant and would therefore only work if you're deployed into EAP/WF. It's also a change to the public API and hence wouldn't be available until EAP 7.

43. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

smarlow Aug 13, 2014 11:46 AM (in response to marklittle)

The reason why I am circling back to the timed out status, is because no other paths have been agreed to have solutions. At this point, my question is whether we could/would add a Narayana extension for getting the timed out status? I'm not asking for that change yet, just asking if it would work in distributed/non-distributed cases.

As of now, I think the possible solutions could be:

Add configuration to allow reaper thread to roll back in the background or mark roll back only. This doesn't work for the distributed case that Tom mentioned when a sibling JVM involved in the transaction terminates and the TX then times out. I wonder what happens when the TX doesn't time out and the JVM with control, tries to commit the transaction. I'd expect that attempting to prepare the resource for the terminated JVM, would fail (or something like that .
Change WildFly to wrap the TransactionManager/TransactionSychronizationRegistry and take control of managing which thread the Synchronizations run in. Mostly, I'm concerned that the best place to manage Synchronizations, is in the Narayana project. The WildFly Synchronization implementation would need to fork all of the current Synchronization logic from Narayana, to improve the chances of WildFly getting it right (if it is possible without access to Narayana internals). Another concern that I have, is the behaviour changes that this would bring, where WildFly Synchronizations are run later than Narayana Synchronization would of been (could lead to OOM issues if certain Synchronizations are run later or skipped). There could be ways to minimize the risk of this approach, by having the WildFly Synchronization controller implementation, extend the Narayana Synchronization controller (not sure what the correct term for the class that manages the Synchronizations is). Such, that the WildFly Synchronization controller, when run in the application thread (as part of checking the timed out status) would simply pass control to the Narayana Synchronization controller. The WildFly Synchronizations controller, needs to ensure that the Synchronizations are run as close the transaction ending boundary as possible and also compensate for any application bugs (e.g. the reaper rolled the tx back and app logic skips calling UT.rollback()). Obviously, this path would need a lot more discussion and planning.
Change certain thread unsafe Synchronization JPA callbacks (EE container/Persistence providers) to check if they are called from a background thread (via the timed out status extension if that becomes possible). If this is the only viable path, it is a somewhat longer road than using SRO, as this is best done as a JTA extension (instead of Narayana specific extension), that is followed up by the JPA expert group in some future version of the JPA spec and lastly, accepted in some fashion into the EE spec. The action that the JPA callback would take, when called in a background thread, could be a few different things. As a start, the JPA persistence context could be marked as "timed out", which would have some impact in the application thread that needs to be discussed further (with regard to whether we want a new JPA state and what applications can do about it).
Introduce the TxAssociationListener that is discussed above. Depending on which Synchronization callbacks are replaced with TxAssociationListener, we might have some Synchronization callbacks running in the reaper thread (e.g. IronJacamar which would defer connection closing to the TxAssociationListener based JPA callback). The JPA callbacks (the ones that are modified to depend on a WildFly/someProject TxAssociationListener extension), may be invoked from the application thread if possible. In the cases where the application thread doesn't call the TxAssociationListener callback, the top level thread in control (e.g EE container), would then need to call the TxAssociationListener callbacks. Deferring the calling of the TxAssociationListener callbacks, could lead to memory issues (especially if the application logic is looping and creating different UTs within one invocation but the application fails to properly end the TX or other application bugs like that).

44. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

tomjenkinson Aug 13, 2014 11:30 AM (in response to smarlow)

Hi Scott,

Aren't you are missing the option 4 I have proposed above?

4. Provide an SPI method to allow "actors" (in concrete terms, most likely the EE/JPA sync) to know when a transaction is disassociated from the thread. I am thinking the API would just be:

package org.wildfly;

interface TxAssociationListener {

public void txDisassociated(javax.transaction.Transaction);

}

Tom