75 Replies. Latest reply: Feb 25, 2015 5:31 AM by Michael Musgrove

    Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()

    Tom Jenkinson Master

      We are about to start work on [JBTM-2079] Reaper mark tx rollback-only (JBoss Issue Tracker) and are looking for community feedback on a couple of outstanding questions regarding the implementation.

       

      Briefly, Narayana used to use setRollbackOnly when timing out transactions in our asynchronous reaper thread. Over the years we moved to a model of calling rollback directly instead. This has the advantage of freeing locks held in the resource manager sooner, which improves the overall throughput of the system.

       

      Certain actors in a transaction, such as Synchronizations and XAResources, are not multi-thread aware, and having the rollback operation invoked concurrently with business logic may lead to undesirable behaviour.

       

      We therefore propose to introduce a configurable option to allow the transaction reaper to simply invoke setRollbackOnly on the transaction, rather than a complete rollback.
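
A minimal sketch of what such a configurable option could look like. `TimeoutAction`, `ReapableTx`, `ReaperSketch` and `RecordingTx` are names invented for this illustration; they are not Narayana classes or configuration properties, and the real reaper works against its own transaction types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the transaction operations the reaper touches. */
interface ReapableTx {
    void setRollbackOnly();
    void rollback();
}

/** Hypothetical configuration switch; not an existing Narayana property. */
enum TimeoutAction { ROLLBACK, SET_ROLLBACK_ONLY }

/** Sketch of the decision the reaper thread would make when a timeout fires. */
final class ReaperSketch {
    private final TimeoutAction action;

    ReaperSketch(TimeoutAction action) {
        this.action = action;
    }

    void onTimeout(ReapableTx tx) {
        if (action == TimeoutAction.SET_ROLLBACK_ONLY) {
            // Only flag the transaction; the application thread is expected
            // to notice the flag and perform the rollback itself.
            tx.setRollbackOnly();
        } else {
            // Current behaviour: roll back directly from the reaper thread,
            // possibly concurrently with business logic.
            tx.rollback();
        }
    }
}

/** Records calls so the two behaviours can be compared. */
final class RecordingTx implements ReapableTx {
    final List<String> calls = new ArrayList<>();

    @Override public void setRollbackOnly() { calls.add("setRollbackOnly"); }
    @Override public void rollback() { calls.add("rollback"); }
}

final class ReaperSketchDemo {
    public static void main(String[] args) {
        RecordingTx direct = new RecordingTx();
        new ReaperSketch(TimeoutAction.ROLLBACK).onTimeout(direct);

        RecordingTx marked = new RecordingTx();
        new ReaperSketch(TimeoutAction.SET_ROLLBACK_ONLY).onTimeout(marked);

        System.out.println("ROLLBACK -> " + direct.calls);
        System.out.println("SET_ROLLBACK_ONLY -> " + marked.calls);
    }
}
```

Keeping `ROLLBACK` as the fallback branch mirrors the proposal that the new behaviour be opt-in rather than the default.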

       

      We are looking for feedback in the following areas:

       

      1. Should this be the default? i.e. Does it affect most actors? We are only aware of one circumstance so far. We therefore propose it not be made the default at this time.

      2. Should it only trigger when Narayana detects actors of a configurable type are enlisted in the transaction?

      3. How should we handle the distributed transaction scenario where the remote server crashes and a transaction has been propagated to another server? In this case the remote server would be marked rollback only, but as the application thread never returns to it, it won't be given the opportunity to roll back. Potentially interposed coordinators could be forced to use rollback semantics from their reapers?

        • 1. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
          Scott Marlow Master

          1. Should this be the default? i.e. Does it affect most actors? We are only aware of one circumstance so far. We therefore propose it not be made the default at this time.

           

          I agree that setting the transaction to roll-back only should not be the default.  My preference would be to let users configure their choice and default to calling rollback directly from the reaper thread, as is done today.  My reason is to minimize the operational impact on users.  Let them opt in to the set roll-back only option if that better fits their needs.

           

          2. Should it only trigger when Narayana detects actors of a configurable type are enlisted in the transaction?

           

          Hmm, this is an interesting idea.  If one of the actors is a JPA persistence provider Synchronization or an EE (JPA) container level Synchronization, we would want to use the configured choice (mark roll-back only or rollback), but if none of these Synchronizations are registered in the active transaction, we could use a different configured option (mark roll-back only or rollback).

           

          3. How should we handle the distributed transaction scenario where the remote server crashes and a transaction has been propagated to another server? In this case the remote server would be marked rollback only, but as the application thread never returns to it, it won't be given the opportunity to roll back. Potentially interposed coordinators could be forced to use rollback semantics from their reapers?

           

          An example would help me here.  Are you saying that JVM1 starts TX1 and invokes JVM2 with TX1?  In the middle of the call to JVM2, JVM2 terminates and therefore does not return control to JVM1.  Meanwhile, in JVM1 the remote client invocation detects a socket error and throws an exception, which is caught by the EJB container, which will end the transaction by rolling it back.  Alternatively, if JVM1 did not detect the socket error and is still waiting for a response from JVM2, the JVM1 reaper will eventually mark the tx as rollback-only but the socket read will continue to block until it times out.  Is this close to what happened?

           

          I don't yet see how, in this case, the (subordinate coordinator/resource???) gets the chance to use rollback semantics from its reaper (I assume you mean JVM1 in this case).  How does the subordinate coordinator detect that JVM2 crashed (is there a watchdog timer pulse in play here)?

          • 2. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
            Tom Jenkinson Master

            Hi Scott, here is the scenario I am talking about:

             

            1. JVM1 creates TX

            2. JVM1 calls JVM2

            3. JVM2 schedules a reaper element for the TX

            4. JVM2 enlists a resource

            5. JVM2 returns control to JVM1

            6. JVM1 crashes

            7. JVM2 reaper marks the TX as rollback only

            8. JVM2 is never rolled back

             

            If we are using rollback only, there is no mechanism for JVM2 ever to roll back the transaction if JVM1 crashes before it prepares the transaction, because the recovery manager won't have a record of the TX to complete.

            • 3. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
              Scott Marlow Master

              How about some type of liveness check where we later detect that the set rollback only didn't have any impact (on the transaction) after N seconds?

              • 4. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
                Mark Little Master

                For subordinate transactions (more generally transactions where the control is not within the JVM in question), the reaper needs to know that it should still call rollback when the time-out goes off, or after some period.

                • 5. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
                  Mark Little Master

                  Two ways of approaching this:

                   

                  a) have the reaper call rollback eventually irrespective of the type of transaction (call setRollbackOnly on the timeout, then set another timeout for rollback).

                   

                  b) have the reaper call setRollbackOnly on timeout and later use bottom-up recovery on inflight transactions to check the status of the parent transaction. Then act accordingly.
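
Option (a) could be sketched roughly as follows, assuming a reaper that polls each entry with the current time. `TimedTx`, `TwoStageReaperEntry` and `FakeTimedTx` are hypothetical stand-ins written for this illustration, not Narayana APIs, and the grace period is an assumed second timeout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the transaction operations the reaper needs. */
interface TimedTx {
    boolean isFinished();   // true once some thread has committed or rolled back
    void setRollbackOnly();
    void rollback();
}

/** One reaper entry that escalates from setRollbackOnly() to rollback(). */
final class TwoStageReaperEntry {
    enum Stage { WAITING, MARKED, DONE }

    private final TimedTx tx;
    private final long markAtMillis;      // transaction timeout
    private final long rollbackAtMillis;  // transaction timeout plus grace period
    private Stage stage = Stage.WAITING;

    TwoStageReaperEntry(TimedTx tx, long startMillis, long timeoutMillis, long graceMillis) {
        this.tx = tx;
        this.markAtMillis = startMillis + timeoutMillis;
        this.rollbackAtMillis = markAtMillis + graceMillis;
    }

    /** Called periodically by the reaper thread with the current time. */
    synchronized Stage check(long nowMillis) {
        if (tx.isFinished()) {
            stage = Stage.DONE;       // the owning thread completed it first
            return stage;
        }
        if (stage == Stage.WAITING && nowMillis >= markAtMillis) {
            tx.setRollbackOnly();     // first timeout: only flag the transaction
            stage = Stage.MARKED;
        }
        if (stage == Stage.MARKED && nowMillis >= rollbackAtMillis) {
            tx.rollback();            // second timeout: nobody completed it, so force it
            stage = Stage.DONE;
        }
        return stage;
    }
}

/** Test double that records which operations the reaper invoked. */
final class FakeTimedTx implements TimedTx {
    final List<String> calls = new ArrayList<>();
    boolean finished;

    @Override public boolean isFinished() { return finished; }
    @Override public void setRollbackOnly() { calls.add("setRollbackOnly"); }
    @Override public void rollback() { calls.add("rollback"); finished = true; }
}

final class TwoStageReaperDemo {
    public static void main(String[] args) {
        FakeTimedTx tx = new FakeTimedTx();
        TwoStageReaperEntry entry = new TwoStageReaperEntry(tx, 0L, 1000L, 500L);
        System.out.println(entry.check(1000L) + " " + tx.calls);
        System.out.println(entry.check(1500L) + " " + tx.calls);
    }
}
```

In this sketch the forced rollback still runs on the reaper thread, so the concurrency problem is only avoided when the application thread completes the transaction during the grace period. An abandoned subordinate transaction, where no application thread ever returns, is cleaned up by the second timeout.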

                  • 6. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
                    Tom Jenkinson Master

                    In light of the fact that for the distributed case we would end up needing to call XAR::rollback concurrently anyway (either by pinging the parent coordinator to detect failure of JVM1 or by some other mechanism), I am going to close the issue as WONTFIX. It is really a CANT_FIX, as there isn't an algorithm that works for the distributed case.

                     

                    Thanks for your input on the discussion and should you have any further input please do get in contact.

                     

                    Tom

                    • 7. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
                      Scott Marlow Master

                      I don't yet understand why the distributed bug that you mentioned above (with "set rollback only") is reason enough to dismiss the implementation idea of using SRO.  Having said that, if there are other implementations that address the issue in a better way (making the distributed case easier to solve), let's list them (and the underlying solution that they target).

                       

                      One that comes to mind would be for WildFly users to only use persistence providers that synchronize access to the underlying persistence context.  We don't have that option now and even if we did (e.g. some future version of Hibernate), we would still want to support applications that package the older version of the persistence provider.  So that doesn't work.

                       

                      One underlying requirement is that the transaction manager reaper thread be flexible enough to meet user expectations, which vary between the following goals:

                       

                      1. Some users need the reaper thread to roll back JTA transactions in the background, even though this can violate the JPA concurrency requirements when the persistence provider Synchronization.afterCompletion(int status) callback mutates the underlying entity manager instance, while the application thread may be actively mutating it also.
                      2. Some users need the reaper thread to roll back the JTA transaction but not in a way that leads to concurrency violations of the JPA persistence context (underlying entity manager).

                       

                      We already have #1 solved today, but need a way to configure for #2.

                       

                      Do we have other ideas besides SRO?  I'm especially interested in ideas that target implementation in Narayana.

                      • 8. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
                        Tom Jenkinson Master

                        Hi Scott,

                         

                        How about some kind of SPI that lets you know when the app thread is disassociated from a transaction? In more concrete terms, I am thinking of something in CMTInterceptor that can call you back to say the app thread has terminated.

                         

                        In that way a "sync" can collect the outcome of the tx from the transaction manager as normal, plus obtain a callback from the application server to say that the thread has been disassociated from a transaction. You can then close the EM when you get both callbacks.

                         

                        The only scenario where your "sync" would not be given a callback from the CMTInterceptor is in deadlock. It would be up to your "sync" implementation to implement its own (possibly reaper based) system to cope with that case if you felt it necessary.

                         

                        To be clear, there would be no modification to Narayana required. Inside WildFly someone would provide an SPI callback somewhere in this area:

                        https://github.com/wildfly/wildfly/blob/master/ejb3/src/main/java/org/jboss/as/ejb3/tx/CMTTxInterceptor.java#L274

                         

                        Something like:

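                        package org.wildfly;

                        interface TxAssociationListener {

                            // Called when the application thread becomes associated with the transaction.
                            public void txAssociated(javax.transaction.Transaction tx);

                            // Called when the application thread is disassociated from the transaction.
                            public void txDisassociated(javax.transaction.Transaction tx);

                        }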
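
A rough sketch of how a "sync" might wait for both callbacks before closing the EM. To keep it compilable on its own, `TxHandle` stands in for javax.transaction.Transaction and the listener is redeclared locally; `DeferredEmCloser` and the `Runnable` close action are assumptions made for this illustration, not WildFly or Narayana code.

```java
/** Stand-in for javax.transaction.Transaction so the sketch compiles alone. */
interface TxHandle {}

/** Local copy of the listener shape proposed above (hypothetical SPI). */
interface TxAssociationListener {
    void txAssociated(TxHandle tx);
    void txDisassociated(TxHandle tx);
}

/** Closes the EM only after the tx outcome AND the thread disassociation are both known. */
final class DeferredEmCloser implements TxAssociationListener {
    private final Runnable closeEntityManager;
    private boolean completed;
    private boolean disassociated;
    private boolean closed;

    DeferredEmCloser(Runnable closeEntityManager) {
        this.closeEntityManager = closeEntityManager;
    }

    /** Mirrors Synchronization.afterCompletion(int); may arrive on the reaper thread. */
    public synchronized void afterCompletion(int status) {
        completed = true;
        closeIfReady();
    }

    @Override
    public void txAssociated(TxHandle tx) {
        // Nothing to track on association in this sketch.
    }

    @Override
    public synchronized void txDisassociated(TxHandle tx) {
        disassociated = true;
        closeIfReady();
    }

    // Runs the close action once, on whichever thread delivers the second event.
    private void closeIfReady() {
        if (completed && disassociated && !closed) {
            closed = true;
            closeEntityManager.run();
        }
    }
}

final class DeferredEmCloserDemo {
    public static void main(String[] args) {
        DeferredEmCloser closer = new DeferredEmCloser(() -> System.out.println("EM closed"));
        TxHandle tx = new TxHandle() {};
        closer.txAssociated(tx);
        closer.afterCompletion(4);   // 4 is javax.transaction.Status.STATUS_ROLLEDBACK
        closer.txDisassociated(tx);  // second event arrives, so the close runs here
    }
}
```

When the reaper delivers the outcome while the application thread is still busy, the close is deferred until that thread is disassociated, so it never overlaps with application use of the EM. The deadlock case, where the disassociation callback never arrives, is not handled here.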

                         

                        What do you think?

                        Tom

                        • 9. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
                          Michael Musgrove Master

                          Why not make the reaper timeout configurable so that JPA can trigger what it needs to do after the tx timeout has passed but before the reaper timeout is breached?

                          • 10. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
                            Scott Marlow Master

                            Closing the EM is a specification requirement for the EE (JPA) container.  Detaching any loaded entities is a spec requirement for the JPA persistence provider (only if the transaction was rolled back).  Detaching entities from a non-application thread will violate EntityManager concurrency.  In your proposal, which thread calls the TxAssociationListener callbacks (the reaper/TM communication threads, or the application thread that previously owned the transaction that rolled back)?

                            • 11. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
                              Scott Marlow Master

                              How does that prevent the reaper thread from calling the Hibernate ORM Synchronization.afterCompletion(int) that updates the Hibernate session while the application thread may be using the Hibernate session still?

                              • 12. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
                                Michael Musgrove Master

                                Scott Marlow wrote:

                                 

                                How does that prevent the reaper thread from calling the Hibernate ORM Synchronization.afterCompletion(int) that updates the Hibernate session while the application thread may be using the Hibernate session still?

                                I don't have an in-depth knowledge of the JPA spec, so perhaps my solution is naive, but I was thinking that you don't necessarily need to register any synchronizations. When the tx timeout is reached the reaper marks it rollback-only (and when the reaper timeout is reached it would roll back the transaction). This gives the Hibernate layer the opportunity to detect that the transaction timeout period has passed (by setting a timer) before the reaper actually rolls back the transaction, and you can do what you need to do to work around the issue.

                                • 14. Re: Design Discussion: Changing the reaper to use setRollbackOnly() instead of rollback()
                                  Scott Marlow Master

                                  The Hibernate synchronization would run in the application thread, which avoids concurrent updates to the Hibernate session.  In the distributed/multi-VM case, the abandoned JVM that doesn't know that the transaction should end would use the second timeout to eventually roll back the transaction.
