12 Replies Latest reply on Mar 1, 2017 8:17 AM by hchiorean

    [ModeShape 5.x] Dealing with aborted user transactions

    illia.khokholkov

      I am seeking guidance on how to deal with aborted user transactions. I am using Arjuna as the transaction manager and make use of user transactions when multiple workspace-write JCR methods are necessary to complete the requested task. By default, the transaction manager has a timeout of 60 seconds, which is plenty, and it allows custom timeout configuration. I was testing how ModeShape behaves if the transaction reaper that comes with Arjuna aborts an active transaction because of a timeout. I do not fully understand whether what I see is what is expected to happen, so please look at the examples provided below. Any help is greatly appreciated. Please note that links to the test source code are provided at the end of this post (the examples do not reflect the nature of the problem I am working on; they simply attempt to reproduce the outcome that I see).

       

      The example of adding a single child node to a given parent node inside a user transaction. For the full source code, please refer to [1].

       

      1. Lock the node to be updated by obtaining shallow, open-scoped JCR lock.
      2. Start user transaction, which is set to time out after 3 seconds.
      3. Pause the thread of execution for 5 seconds to give transaction reaper a reason to abort the current transaction.
      4. Add a child node to the existing one.
      5. Save the session.
      6. End user transaction.
      7. Unlock the initially locked node.
      8. Check whether the new child node was added to the repository.

       

      @Test
      public void addOneNode() throws Exception {
          Session session = createSession(repositoryIterator.next());
          MutableObject<String> childPath1 = new MutableObject<>();
          
          try {
              Node parentNode = session.getNode(ABSOLUTE_PARENT_NODE_PATH);
              NodeHelper.lockNode(parentNode);
              
              try {
                  TransactionExecutor.runInTransaction(() -> {
                      try {
                          Thread.sleep(TimeUnit.SECONDS.toMillis(5));
                      } catch (InterruptedException e) {
                          throw new RuntimeException(e);
                      }
                      
                      Node childNode1 = parentNode.addNode(UUID.randomUUID().toString());
                      childNode1.addMixin("mix:lockable");
                      session.save();
                      
                      childPath1.setValue(childNode1.getPath());
                      return null;
                  });
              } finally {
                  NodeHelper.unlockNode(parentNode);
              }
          } finally {
              session.logout();
          }
          
          assertThat(childPath1.getValue()).isNotNull();
          Session newSession = createSession(repositoryIterator.next());
          
          try {
              assertThat(newSession.nodeExists(childPath1.getValue()))
                      .as("The child node should not be saved, because user transaction was aborted")
                      .isFalse();
          } finally {
              newSession.logout();
          }
      }
      

       

      To my surprise, despite the fact that the user transaction was aborted, ModeShape created a brand new transaction to fulfill the request to save the session, and the new child node got persisted in the data store. Looking at the underlying source code, what I observed matches the existing algorithm. I assume that this behavior is intentional: there was no way for ModeShape to detect that the user transaction had been aborted, so it had nothing else to do other than create a new transaction to take care of the request to save the changes. Is this correct?

       

      The example of adding two child nodes to a given parent node inside a user transaction. For the full source code, please refer to [2].

       

      1. Lock the node to be updated by obtaining shallow, open-scoped JCR lock.
      2. Start user transaction, which is set to time out after 3 seconds.
      3. Add the first child node.
      4. Save the session.
      5. Pause the thread of execution for 5 seconds to make sure that transaction reaper does terminate the initial user transaction.
      6. Add the second child node.
      7. Attempt to save the session.
      8. End user transaction.
      9. Unlock the initially locked node.
      10. Attempt to check whether any of the child nodes were added.

       

      @Test
      public void addTwoNodes() throws Exception {
          Session session = createSession(repositoryIterator.next());
          MutableObject<String> childPath1 = new MutableObject<>();
          MutableObject<String> childPath2 = new MutableObject<>();
          
          try {
              Node parentNode = session.getNode(ABSOLUTE_PARENT_NODE_PATH);
              NodeHelper.lockNode(parentNode);
              
              try {
                  TransactionExecutor.runInTransaction(() -> {
                      Node childNode1 = parentNode.addNode(UUID.randomUUID().toString());
                      childNode1.addMixin("mix:lockable");
                      session.save();
                      
                      try {
                          Thread.sleep(TimeUnit.SECONDS.toMillis(5));
                      } catch (InterruptedException e) {
                          throw new RuntimeException(e);
                      }
                      
                      Node childNode2 = parentNode.addNode(UUID.randomUUID().toString());
                      childNode2.addMixin("mix:lockable");
                      session.save();
                      
                      childPath1.setValue(childNode1.getPath());
                      childPath2.setValue(childNode2.getPath());
                      
                      return null;
                  });
              } finally {
                  NodeHelper.unlockNode(parentNode);
              }
          } finally {
              session.logout();
          }
          
          // This part of the test is never reached, due to timeout exception on the attempt to save
          // the second added node.
          
          assertThat(childPath1.getValue()).isNotNull();
          assertThat(childPath2.getValue()).isNotNull();
          Session newSession = createSession(repositoryIterator.next());
          
          try {
              assertThat(newSession.nodeExists(childPath1.getValue()))
                      .as("The first child node should not be saved, because user transaction was aborted")
                      .isFalse();
              
              assertThat(newSession.nodeExists(childPath2.getValue()))
                      .as("The second child node should not be saved, because user transaction was aborted")
                      .isFalse();
          } finally {
              newSession.logout();
          }
      }
      

       

      An attempt to save the session after adding a second child node takes a while and then errors with:

       

      Caused by: org.modeshape.jcr.TimeoutException: Timeout while attempting to lock the keys [4a789507505d642b9564ce-24e3-49d5-b8a7-b1cc36785ae4] after 0 retry attempts.
          at org.modeshape.jcr.cache.document.WritableSessionCache.lockNodes(WritableSessionCache.java:1543)
          at org.modeshape.jcr.cache.document.WritableSessionCache.save(WritableSessionCache.java:687)
          ... 34 more
      

       

      If nothing else, I would expect ModeShape to successfully create a transaction and persist the requested changes, based on what was done in the first example, where changes were persisted despite the interrupted and disassociated user transaction. Is this expected and if so, why? If atomicity of actions executed within a user transaction cannot be enforced when user transaction gets aborted and ModeShape creates a new one, does it even make sense to set transaction timeout (should it simply be indefinite to avoid persisting changes that should not be)? Am I misusing something or making false assumptions about how Arjuna and ModeShape should behave? Does ModeShape support nested transactions? Many thanks in advance, your help is greatly appreciated.

       

      [1] modeshape-cluster-test/TransactionTest.java at master · dnillia/modeshape-cluster-test · GitHub

      [2] modeshape-cluster-test/TransactionTest.java at master · dnillia/modeshape-cluster-test · GitHub

        • 1. Re: [ModeShape 5.x] Dealing with aborted user transactions
          hchiorean

          Both [1] and [2] are bugs, hence [MODE-2668] User transactions are not rolled back if timeouts occur - JBoss Issue Tracker

           

          In general, if multiple JCR operations take place within a user transaction and this transaction is rolled back (for whatever reason - e.g. timeout), all the JCR operations should be rolled back as well, ensuring atomicity.

          • 2. Re: [ModeShape 5.x] Dealing with aborted user transactions
            hchiorean

            Once this is fixed, the behavior will be that ModeShape will raise an exception whenever a session.save or similar operation is performed with a user transaction which is not valid.

            However, as per the JCR spec of session#save()

              * If validation fails, then no pending changes are dispatched and they

              * remain recorded on the <code>Session</code>. There is no best-effort or

              * partial <code>save</code>.

            In other words, any changes *up to the point of the failure* will remain recorded in the session instance. This means it's up to the client code to correctly clean the session (via #refresh(false)) or retry sections of the logic.
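
            A minimal sketch of that cleanup, assuming the client wants to drop everything pending when a save fails. To keep the example self-contained, SessionLike is a hypothetical stand-in for the two javax.jcr.Session methods involved (save() and refresh(boolean)), and FakeSession is an in-memory fake used only to exercise the helper; neither is part of ModeShape.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two javax.jcr.Session methods used below. */
interface SessionLike {
    void save() throws Exception;                        // mirrors Session#save()
    void refresh(boolean keepChanges) throws Exception;  // mirrors Session#refresh(boolean)
}

final class SaveOrDiscard {
    /**
     * Attempts to save. If the save fails, discards the pending changes
     * via refresh(false) and rethrows the original failure.
     */
    static void saveOrDiscard(SessionLike session) throws Exception {
        try {
            session.save();
        } catch (Exception saveFailure) {
            try {
                session.refresh(false); // drop changes still recorded on the session
            } catch (Exception refreshFailure) {
                saveFailure.addSuppressed(refreshFailure);
            }
            throw saveFailure;
        }
    }
}

/** In-memory fake, for illustration only. */
final class FakeSession implements SessionLike {
    final List<String> pending = new ArrayList<>();
    boolean failOnSave;

    @Override
    public void save() {
        if (failOnSave) {
            throw new IllegalStateException("transaction is no longer valid");
        }
        pending.clear(); // pretend the changes were persisted
    }

    @Override
    public void refresh(boolean keepChanges) {
        if (!keepChanges) {
            pending.clear();
        }
    }
}
```

            Whether to discard or to retry part of the logic is an application decision; the sketch only shows the discard path.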

            • 3. Re: [ModeShape 5.x] Dealing with aborted user transactions
              illia.khokholkov

              hchiorean, thanks a lot for logging the bug and fixing the problem in such a quick manner. After minor updates to the original test project, I can confirm that everything functions as expected.

              • 4. Re: [ModeShape 5.x] Dealing with aborted user transactions
                illia.khokholkov

                hchiorean, I have some other related questions about the updated behavior. As it now stands, if a rolled back transaction is detected, an exception gets thrown, which is expected. However, what would happen if a LockManager#unlock() needs to be performed, as provided in the original examples? Upon unlock, which happens in the same thread of execution, the transaction manager will be asked to provide a current transaction, which is the aborted one. As a result, the newly introduced exception will be thrown, which makes sense too, but the initially locked node will remain in the locked state, especially with open-scoped locks, where JCR will not do any kind of cleanup upon Session#logout(). What would you recommend in such a case? Should LockManager#unlock() be happening in the separate thread, so that a fresh transaction can be created? My apologies if I am missing something here. Thank you.

                • 5. Re: [ModeShape 5.x] Dealing with aborted user transactions
                  hchiorean

                  illia.khokholkov there are a couple of things that can be discussed on this topic:

                   

                  * as a general recommendation, you should not be mixing explicit user transactions (either explicit ones like your test case or CMTs if you're running in EE servers for example) with implicit ones *off the same thread* (when I say "implicit" I mean from the point of view of the repository). In your client application, when you're performing JCR operations, you should decide to either manage transactions externally or let the repository manage the transactions for you, but not mix the 2 approaches.

                  Technically you can of course - as your example illustrates - but I don't recommend it

                   

                  * prior to this fix, as you pointed out in the original comment, ModeShape was taking the "liberty" of suspending existing user transactions and doing some "other magic stuff" to persist its data. That is not correct conceptually, because if the client application decides to manage transactions by itself, then it should be 100% responsible for managing the entire lifecycle of those tx (start/commit/rollback/handle exceptional cases etc). ModeShape *should not* under any circumstances interfere with the lifecycle of these transactions. The only thing it should do is make sure it does not break ACID guarantees from the point of view of the JCR operations and the persisted data within each transaction

                   

                  * most tx managers associate transactions with the calling thread, so if there's some sort of problem with any tx/thread (in your case it times out or it's rolled back) and the client code wants to continue to use *the same thread* to perform subsequent JCR operations, it should make sure the invalid transaction is disassociated from the calling thread first. In the case of Arjuna you do that by calling #suspend on the transaction manager.

                   

                  To conclude, in your test case I would first recommend that you do the locking/unlocking off a separate thread, as per the first point. If you don't want that, the alternative is to handle RepositoryExceptions caused by invalid transactions (i.e. try/catch) and make sure you call #suspend before continuing (at least for Arjuna)
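
                  A rough, self-contained model of the second alternative. ToyTxManager, ToyTx and TxStatus are hypothetical names invented for this sketch; they only imitate the per-thread association described above and are not Arjuna's or ModeShape's API. In real code the corresponding calls would be getStatus() and suspend() on javax.transaction.TransactionManager.

```java
/** Hypothetical, simplified transaction states. */
enum TxStatus { ACTIVE, ROLLED_BACK, NONE }

/** Toy transaction whose status may be changed by another thread (e.g. a reaper). */
final class ToyTx {
    volatile TxStatus status = TxStatus.ACTIVE;
}

/** Toy manager that associates at most one transaction with each thread. */
final class ToyTxManager {
    private final ThreadLocal<ToyTx> current = new ThreadLocal<>();

    void begin() {
        if (current.get() != null) {
            throw new IllegalStateException("thread already has a transaction");
        }
        current.set(new ToyTx());
    }

    ToyTx getTransaction() {
        return current.get();
    }

    TxStatus getStatus() {
        ToyTx tx = current.get();
        return tx == null ? TxStatus.NONE : tx.status;
    }

    /** Disassociates the current transaction from the calling thread and returns it. */
    ToyTx suspend() {
        ToyTx tx = current.get();
        current.remove();
        return tx;
    }
}

final class SuspendAbortedDemo {
    /** Models a write (such as unlock) that refuses to run under an invalid transaction. */
    static String write(ToyTxManager tm) {
        if (tm.getStatus() == TxStatus.ROLLED_BACK) {
            throw new IllegalStateException("current transaction is rolled back");
        }
        return "written";
    }

    /** Clears a rolled-back transaction from the calling thread, then performs the write. */
    static String writeAfterCleanup(ToyTxManager tm) {
        if (tm.getStatus() == TxStatus.ROLLED_BACK) {
            tm.suspend();
        }
        return write(tm);
    }

    public static void main(String[] args) {
        ToyTxManager tm = new ToyTxManager();
        tm.begin();
        tm.getTransaction().status = TxStatus.ROLLED_BACK; // simulate an abort by the reaper
        try {
            write(tm);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(writeAfterCleanup(tm));
    }
}
```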

                  • 6. Re: [ModeShape 5.x] Dealing with aborted user transactions
                    illia.khokholkov

                    hchiorean, thank you for the clarifications. If you do not mind, could you please answer a few more follow-up questions? They are presented below.

                     

                    In your client application, when you're performing JCR operations, you should decide to either manage transactions externally or let the repository manage the transactions for you, but not mix the 2 approaches.

                    What do I need to do to not mix two approaches? I thought that since I provide ModeShape with the transaction manager and I use the same manager too, the management of transactions, or at the very least the source of them, would be the same. It would be nice if repository could manage everything for me, but what happens when I need to invoke multiple workspace-write methods and I want them to be atomic, i.e. they all either fail or succeed? Can repository do that for me and if so, how?

                    To conclude, in your test case I would first recommend that you do the locking/unlocking off a separate thread, as per the first point.

                    Do you mean that a new thread could be created for lock/unlock operations to ensure that rolled back transaction does not affect the ability to perform the requested operation? Thanks a lot for all your help.

                    • 7. Re: [ModeShape 5.x] Dealing with aborted user transactions
                      illia.khokholkov

                      hchiorean, in case I misunderstood your original suggestion, here is another point of view. Are you suggesting that either all transactions should be managed by user, or none at all, is that right? And if so, something like lock/unlock should also be happening inside a user transaction, so that ModeShape by itself does not try creating any new ones?

                      • 8. Re: [ModeShape 5.x] Dealing with aborted user transactions
                        hchiorean
                        What do I need to do to not mix two approaches? I thought that since I provide ModeShape with the transaction manager and I use the same manager too, the management of transactions, or at the very least the source of them, would be the same. It would be nice if repository could manage everything for me, but what happens when I need to invoke multiple workspace-write methods and I want them to be atomic, i.e. they all either fail or succeed? Can repository do that for me and if so, how?

                        the problem with mixing the 2 approaches is that there are JCR session-write operations outside of session.save() like unlock, lock and a bunch of others which, if called outside of a transaction, take effect immediately. So on one hand you're using JCR operations which create and commit transactions behind the scenes, while on the other you're explicitly grouping other session-write operations (like session.save) into an atomic unit.

                         

                        If you're mixing this approach on the same thread, this becomes highly problematic because any leftover user transaction on a given thread - leftover = a transaction which is not disassociated after a regular commit or unexpected failure - will impact any write operation that comes after it outside of a transaction. That's why I said this is not recommended and in this case you have to do some transaction management, more specifically make sure that if a transaction is aborted it gets cleaned up from that thread before calling any more JCR operations.

                        • 9. Re: [ModeShape 5.x] Dealing with aborted user transactions
                          hchiorean

                          illia.khokholkov the options that I see

                           

                          1. what you're doing right now - which I said is not recommended - in which case you have to explicitly catch RepositoryException in the transactional block and check if the current tx is rolled back and if yes, suspend it

                          2. move unlock onto a separate thread (and any other JCR write operation outside and after a transaction which might have failed)

                           

                          In reality, I would expect any application using explicit user transactions to specifically handle any unexpected cases with these transactions
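
                          A minimal sketch of option 2 using only the JDK. FreshThread is a hypothetical helper name; the task passed to it stands in for something like the NodeHelper.unlockNode(parentNode) call from the original test. The sketch assumes the caller blocks until the task finishes, so the session is never used from two threads at once.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FreshThread {
    /**
     * Runs the task on a newly created thread and waits for its result.
     * The new thread starts without any transaction left over from the caller's thread.
     */
    static <T> T call(Callable<T> task) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return executor.submit(task).get();
        } catch (ExecutionException e) {
            // Unwrap so the caller sees the task's own exception where possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String caller = Thread.currentThread().getName();
        // Placeholder task; real code would perform the unlock here.
        String worker = call(() -> Thread.currentThread().getName());
        System.out.println(caller + " -> " + worker);
    }
}
```

                          Note that, as with any operation that runs inside a transaction, the work done on the new thread is still subject to the transaction manager's timeout.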

                          • 10. Re: [ModeShape 5.x] Dealing with aborted user transactions
                            illia.khokholkov

                            hchiorean, thanks for thinking through the options available. Using the latest snapshot of ModeShape, I have decided to take both options for a spin. To my surprise, I ran into a different kind of failure, which manifests itself in one of the following forms (either way, the outcome is that the initially locked node remains locked):

                             

                            java.lang.IllegalStateException: Cannot attempt to lock documents without an existing ModeShape transaction
                                at org.modeshape.jcr.cache.document.LocalDocumentStore.lockDocuments(LocalDocumentStore.java:160)
                                at org.modeshape.jcr.cache.document.LocalDocumentStore.lockDocuments(LocalDocumentStore.java:153)
                                at org.modeshape.jcr.cache.document.WritableSessionCache.lockNodes(WritableSessionCache.java:1524)
                                at org.modeshape.jcr.cache.document.WritableSessionCache.save(WritableSessionCache.java:682)
                                at org.modeshape.jcr.RepositoryLockManager.unlock(RepositoryLockManager.java:479)
                                at org.modeshape.jcr.RepositoryLockManager.unlock(RepositoryLockManager.java:447)
                                at org.modeshape.jcr.JcrLockManager.unlock(JcrLockManager.java:305)
                                at org.modeshape.jcr.JcrLockManager.unlock(JcrLockManager.java:283)
                            
                            

                             

                            org.modeshape.jcr.TimeoutException: Timeout while attempting to lock the keys [4a789507505d6437b00517-9dc7-480d-8cb8-91e152baf53d] after 0 retry attempts.
                                at org.modeshape.jcr.cache.document.WritableSessionCache.save(WritableSessionCache.java:702)
                                at org.modeshape.jcr.RepositoryLockManager.unlock(RepositoryLockManager.java:479)
                                at org.modeshape.jcr.RepositoryLockManager.unlock(RepositoryLockManager.java:447)
                                at org.modeshape.jcr.JcrLockManager.unlock(JcrLockManager.java:305)
                                at org.modeshape.jcr.JcrLockManager.unlock(JcrLockManager.java:283)
                            

                             

                            I have updated the originally submitted test cases to demonstrate the problems observed. The following will result in "Cannot attempt to lock documents without an existing ModeShape transaction":

                             

                             

                            As far as I can tell, the "Cannot attempt to lock documents without an existing ModeShape transaction" error occurs because the transaction that was created to unlock the node gets aborted by the transaction reaper. If I were to increase the timeout to something like 100 seconds and pause the thread of execution for 110 seconds, the LockManager#unlock() inside a user transaction, which gets created after suspending the currently inactive one or in a brand new thread, will fail because of "Timeout while attempting to lock the keys".

                             

                             

                            hchiorean, my apologies for being annoying and potentially not seeing something inherently incorrect with the test cases that I provided. It would be awesome if you could find some more time to review the problems I am running into and confirm/deny whether the behavior observed is the expected one. As a side question, would it be a really bad idea to set an indefinite transaction timeout? Thank you.

                            • 11. Re: [ModeShape 5.x] Dealing with aborted user transactions
                              hchiorean

                              #addOneNodeAbortAfterSaveUnlockSuspendAborted() and #addTwoNodesAbortBeforeSecondSaveUnlockSuspendAborted() both pass on my machine; if they fail on yours, it's likely some thread race issue (see below)

                               

                              #addTwoNodesAbortBeforeSecondSaveUnlockNewThread()  fails on my machine as well and exposes [MODE-2669] Relational persistence provider does not rollback data correctly when a user transaction is rolledback - JBo…

                              However, the message "Cannot attempt to lock..." is misleading (and will be fixed as part of [MODE-2668] Incorrect handling of aborted user transactions - JBoss Issue Tracker).

                              What's causing this failure is the fact that Arjuna's reaper thread will abort the transaction started as part of the unlock operation

                               

                              I've added some more logging and you can see for the #unlock operation

                               

                              [main] org.modeshape.jcr.txn.Transactions - Found user transaction TransactionImple < ac, BasicAction: 0:ffff0a28c815:ff8e:58b6b2c3:21 status: ActionStatus.RUNNING >
                              [main] com.arjuna.ats.jta - TransactionImple.registerSynchronization - Class: class org.modeshape.jcr.txn.Transactions$TransactionTracer HashCode: 315072539 toString: org.modeshape.jcr.txn.Transactions$TransactionTracer@12c7a01b
                              [Transaction Reaper] com.arjuna.ats.arjuna - TransactionReaper::check - comparing 1488368326712
                              [Transaction Reaper] com.arjuna.ats.arjuna - ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0a28c815:ff8e:58b6b2c3:21 in state  RUN
                              [Transaction Reaper] com.arjuna.ats.arjuna - Reaper scheduling TX for cancellation 0:ffff0a28c815:ff8e:58b6b2c3:21
                              [Transaction Reaper] com.arjuna.ats.arjuna - TransactionReaper::check - comparing 1488368327212
                              [Transaction Reaper] com.arjuna.ats.arjuna - Thread Thread[Transaction Reaper,5,main] sleeping for 500
                              [Transaction Reaper Worker 0] com.arjuna.ats.arjuna - Thread Thread[Transaction Reaper Worker 0,5,main] performing cancellations
                              [Transaction Reaper Worker 0] com.arjuna.ats.arjuna - Reaper Worker Thread[Transaction Reaper Worker 0,5,main] attempting to cancel 0:ffff0a28c815:ff8e:58b6b2c3:21
                              [Transaction Reaper Worker 0] com.arjuna.ats.arjuna - BasicAction::Abort() for action-id 0:ffff0a28c815:ff8e:58b6b2c3:21
                              [Transaction Reaper Worker 0] com.arjuna.ats.arjuna - ARJUNA012095: Abort of action id 0:ffff0a28c815:ff8e:58b6b2c3:21 invoked while multiple threads active within it.
                              [Transaction Reaper Worker 0] com.arjuna.ats.arjuna - ARJUNA012381: Action id 0:ffff0a28c815:ff8e:58b6b2c3:21 completed with multiple threads - thread main was in progress with java.lang.Thread.sleep(Native Method)
                              

                               

                              If you look at lines 1-2 and 10-12 of the log, you'll see that ModeShape initially detects an active transaction, which is later aborted on a different thread (the Arjuna reaper thread). It seems that Arjuna aborts this transaction as well because it times out waiting for a lock to be obtained.

                               

                              In short, all these issues are caused by the fact that a user transaction rollback is performed on a different thread from the one ModeShape used to detect the transaction in the first place.