1 2 3 Previous Next 33 Replies Latest reply on Dec 8, 2005 9:19 AM by Manik Surtani

    JBCACHE-359 Discussion Thread

    Brian Stansberry Master

      Opening discussion of issues related to http://jira.jboss.com/jira/browse/JBCACHE-359.

      Issues and possible solutions in the 1.2.4 branch.

      (We'll probably want a fix on the 1.2.4_SP1 branch both to support the JBento testing efforts and for use in JBoss AS 4.0.4. The relevant code in HEAD is quite different, and different solutions will be needed).

      One failure mode is when a rollback is initiated due to a call to tx.setRollbackOnly(). Among other reasons, this will happen on the initiating side of a replication if the replicated prepare() call fails. Need to investigate the exact procedure that occurs on the receiving side if a tx times out after a successful prepare() call because the sender crashes and never replicates a commit() call.

      The simple solution is to change PessimisticLockInterceptor.invoke() from

      if(tx_mgr != null && (tx=tx_mgr.getTransaction()) != null && isValid(tx)) { // ACTIVE or PREPARING
       ... find gtx
      }
      


      to

      if(tx_mgr != null && (tx=tx_mgr.getTransaction()) != null && isValidOrRollingBack(tx)) { // ACTIVE or PREPARING or ROLLING_BACK
       ... find gtx
      }
      


      plus, of couse, adding method isValidOrRollingBack().

      I made those changes on my local 1.2.4 branch copy and found that the newly added test in SyncReplTxTest passes.

        • 1. Re: JBCACHE-359 Discussion Thread
          Brian Stansberry Master

          More on issues and possible solutions in the 1.2.4 branch.

          The fix discussed allows the SyncReplTxTest case to succeed, but with testing session replication I still see errors like this:

          2005-12-02 17:02:07,564 ERROR [org.jboss.cache.interceptors.PessimisticLockInterceptor] undo operation failed
          java.lang.RuntimeException: JBossCacheService: exception occurred in cache get after retry ...
           at org.jboss.web.tomcat.tc5.session.JBossCacheService._get(JBossCacheService.java:442)
           at org.jboss.web.tomcat.tc5.session.JBossCacheService.nodeDirty(JBossCacheService.java:730)
           at org.jboss.web.tomcat.tc5.session.JBossCacheService.nodeModified(JBossCacheService.java:721)
           at org.jboss.cache.TreeCache.notifyNodeModified(TreeCache.java:4519)
           at org.jboss.cache.TreeCache._put(TreeCache.java:3715)
           at jrockit.reflect.VirtualNativeMethodInvoker.invoke(Ljava.lang.Object;[Ljava.lang.Object;)Ljava.lang.Object;(Unknown Source)
           at java.lang.reflect.Method.invoke(Ljava.lang.Object;[Ljava.lang.Object;J)Ljava.lang.Object;(Unknown Source)
           at org.jgroups.blocks.MethodCall.invoke(MethodCall.java:321)
           at org.jboss.cache.interceptors.PessimisticLockInterceptor.rollback(PessimisticLockInterceptor.java:336)
           at org.jboss.cache.interceptors.PessimisticLockInterceptor.access$200(PessimisticLockInterceptor.java:31)
           at org.jboss.cache.interceptors.PessimisticLockInterceptor$SynchronizationHandler.afterCompletion(PessimisticLockInterceptor.java:418)
           at org.jboss.cache.interceptors.OrderedSynchronizationHandler.afterCompletion(OrderedSynchronizationHandler.java:80)
           at org.jboss.tm.TransactionImpl.doAfterCompletion(TransactionImpl.java:1508)
           at org.jboss.tm.TransactionImpl.completeTransaction(TransactionImpl.java:1180)
           at org.jboss.tm.TransactionImpl.rollback(TransactionImpl.java:395)
           at org.jboss.cache.interceptors.ReplicationInterceptor.replicate(ReplicationInterceptor.java:205)
           at org.jboss.cache.TreeCache._replicate(TreeCache.java:4033)


          (where the exception is due to lock timeout on the callback).

          A diifference in this case vs. the unit test is that the rollback is not due to a problem in the local tx resulting in a call to setRollbackOnly() on the tx, with the rollback happening on the tx.commit() call. Here the tx is being explicitly rolled back via a call to tx.rollback(). This call is happening due to a replicated rollback message from the cluster node that originated the gtx.

          Need to investigate why in this case the lock interceptor isn't properly identifying the gtx.

          • 2. Re: JBCACHE-359 Discussion Thread
            Brian Stansberry Master

            Issues on the HEAD branch.

            For 1.3, the interceptor chain has been significantly changed, with the result that the 1.2.4 fix discussed above won't work.

            A similar change could be made to TxInterceptor.registerTransaction(), such that if a tx is rolling back the gtx is identified and returned to TxInterceptor.attachGlobalTransaction().

            There is a problem though, in that attachGlobalTransaction associates the gtx with the call going through the interceptor chain by adding the correct gtx to the arg[] of the MethodCall. This only works if GlobalTransaction is one of the parameter types of the MethodCall. In many methods it's not, particularly the various _get methods.

            In cases where the gtx is not available as a MethodCall arg, the PessimisticLockInterceptor looks for it by calling TreeCache.getCurrentTransaction(). This method has the same kind of isValid(tx) logic we've seen elsewhere.

            We could change that method as well to use isValidOrRollingBack(tx), but now we're changing the behavior of something that's called from several places and is part of TreeCache's public API. (It's not part of the MBean interface).

            Any thoughts?

            • 3. Re: JBCACHE-359 Discussion Thread
              Brian Stansberry Master

               

              "bstansberry@jboss.com" wrote:
              A diifference in this case vs. the unit test is that the rollback is not due to a problem in the local tx resulting in a call to setRollbackOnly() on the tx, with the rollback happening on the tx.commit() call. Here the tx is being explicitly rolled back via a call to tx.rollback(). This call is happening due to a replicated rollback message from the cluster node that originated the gtx.


              OK, found the problem. When o.j.tm.TransactionImpl.rollback() is called, by the time a registered Synchronization's afterCompletion() method is called, Transaction.getStatus() returns STATUS_ROLLEDBACK rather than STATUS_ROLLING_BACK. Therefore, the isValidOrRollingBack() test fails and no gtx is found.

              (The unit test doesn't catch this because DummyTransaction has a slightly different behavior -- see http://jira.jboss.com/jira/browse/JBCACHE-360.)

              To fix this (in 1.2.4) I'll probably keep a list of gtx's in the process of rollback, and use it to look up the gtx if there is a callback. At this point I've no idea what to do in the 1.3 code.

              • 4. Re: JBCACHE-359 Discussion Thread
                Ben Wang Master

                I have discussed this solution of adding the rolling back status to isValid() call last week with Bela. But he mentioned that we won't want to do that because there may be a real case of rolling back (since isValid() is used in many different places). We may need to revisit this issue though now that we know the real cause of the problem.

                • 5. Re: JBCACHE-359 Discussion Thread
                  Brian Stansberry Master

                  For the JBossCache_1_2_4_SP1 branch I was able to fix this without using isValidOrRollingBack(). Added a Map<Transaction, GlobalTransaction). PessimisticLockInterceptor.rollback() adds entry to the map; invoke() looks up the gtx from the map if the tx is not valid.

                  For 1.3, this won't work as interceptors other than TxInterceptor have no way to look up a GlobalTransaction that isn't included as a MethodCall arg except via a call to TreeCache.getCurrentTransaction(). I think this is a flaw in general.

                  How about the TxInterceptor binds the gtx to a ThreadLocal?

                  At the moment my main concern is with 1.2.4_SP1 as it impacts the JBento work and JBoss AS 4.0.4.

                  • 6. Re: JBCACHE-359 Discussion Thread
                    Ben Wang Master

                    Either ThreadLocal or we need notion of Invocation (payload) between interceptor chain is needed. But we will probably need to wait untill 2.0 for Invocation object though as this will break the backward compatibility.

                    • 7. Re: JBCACHE-359 Discussion Thread
                      Manik Surtani Master

                      I think the concept of a payload is the ideal place to put this (along with invocation specific override options - see JBCACHE-106)

                      For my current implementation of JBCACHE-106 though, I have this in thread local. I did toy with putting the gtx in thread local as well (and can easily revert to doing this again if this works best for all), but in the end opted for using TreeCache.getCurrentTransaction().

                      Since TreeCache.getCurrentTransaction() currently creates a new gtx if none is found, I've overloaded this method with TreeCache.getCurrentTransaction(boolean createIfNotExists).


                      • 8. Re: JBCACHE-359 Discussion Thread
                        Manik Surtani Master

                        Brian,

                        Have a look at Interceptor.getCurrentGlobalTx(). This method first looks in method call args for a gtx passed in, and if this is not present, it uses TreeCache.getCurrentTransaction( false ).

                        This is useful in cases where you don't know whether to expect the gtx as a part of the method call. If you are certain the gtx is not in the method call (i.e., are looking for a specific method) it would probably be best to use TreeCache.getCurrentTransaction( false ) directly.

                        Cheers,
                        Manik

                        • 9. Re: JBCACHE-359 Discussion Thread
                          Bela Ban Master

                           

                          "bstansberry@jboss.com" wrote:

                          The simple solution is to change PessimisticLockInterceptor.invoke() from

                          if(tx_mgr != null && (tx=tx_mgr.getTransaction()) != null && isValid(tx)) { // ACTIVE or PREPARING
                           ... find gtx
                          }
                          


                          to

                          if(tx_mgr != null && (tx=tx_mgr.getTransaction()) != null && isValidOrRollingBack(tx)) { // ACTIVE or PREPARING or ROLLING_BACK
                           ... find gtx
                          }
                          


                          plus, of couse, adding method isValidOrRollingBack().

                          I made those changes on my local 1.2.4 branch copy and found that the newly added test in SyncReplTxTest passes.


                          Okay, after reviewing this, I think this is okay. However, make sure you don't change the semantics of isValid() (because it is used in other places), but to add isValidOrRollingBack(). This is IMPORTANT !
                          Also, make sure this is applied to CVS head, whose code is totally different (Manik-ized) :-)

                          • 10. Re: JBCACHE-359 Discussion Thread
                            Brian Stansberry Master

                             

                            "bela@jboss.com" wrote:
                            Okay, after reviewing this, I think this is okay. However, make sure you don't change the semantics of isValid() (because it is used in other places), but to add isValidOrRollingBack(). This is IMPORTANT !


                            Turned out this didn't work in the "real world". It fixed the unit test because when DummyTransaction called a Synchronization's afterCompletion() it's status was ROLLING_BACK. So, isValidOrRollingBack() could test for that status. JBossTM had different behavior -- when it called afterCompletion() the status was ROLLEDBACK. So, to get the isValidOrRollingBack() approach to work I'd have to accept status ROLLEDBACK, which seems dangerous.

                            Instead I did the following, starting w/ PessimisticLockInterceptor's rollback() method, through which any rollback flows:

                            private void rollback(GlobalTransaction tx) {
                             ....
                            
                             TransactionEntry entry=tx_table.get(tx);
                            
                             .....
                            
                             Transaction ltx=entry.getTransaction();
                             rollbackTransactions.put(ltx, tx);
                            
                             ...... executing any "undo operations", remove nodes, release locks, etc.
                            
                             rollbackTransactions.remove(ltx);
                            }


                            And in the invoke() method, where it would trying to find the gtx if a listener made a callback into the cache:

                            if(tx_mgr != null && (tx=tx_mgr.getTransaction()) != null && isValid(tx)) { // ACTIVE or PREPARING
                            
                             ..... current code
                            
                            }
                            else if (tx != null) {
                             gtx = (GlobalTransaction) rollbackTransactions.get(tx);
                            }


                            The effect is the gtx can be found during a rollback as long as the callback occurs during the processing of the interceptor's rollback() method.

                            I believe the same approach can work in the 1.3 code; I'm not sure but I think the relevant method to add the "rollbackTransactions" map entry is TxInterceptor.runRollbackPhase(). The problem in 1.3 is how to make other interceptors than TxInterceptor aware of the gtx.

                            • 11. Re: JBCACHE-359 Discussion Thread
                              Manik Surtani Master

                              The more I think about it and the more I look at issues like this (and others) the more I lean towards doing something like:

                              - TX interceptor finds the 'correct' TX and GTX for the call. This interceptor alone locates the correct GTX (either a new one, in the method call or in a rollback map).
                              - the TX and GTX are placed in thread local - see TreeCache.InvocationSpecificSettings
                              - all other interceptors get TX and GTX handles from TreeCache.InvocationSpecificSettings.

                              I'm not too happy with the thread local code being in a static class inside TreeCache though - I'd rather this resides in Interceptor.

                              What does everyone think of this?

                              • 12. Re: JBCACHE-359 Discussion Thread
                                Brian Stansberry Master

                                 

                                "manik.surtani@jboss.com" wrote:
                                Have a look at Interceptor.getCurrentGlobalTx(). This method first looks in method call args for a gtx passed in, and if this is not present, it uses TreeCache.getCurrentTransaction( false ).


                                The problem is TreeCache.getCurrentTransaction( false ) will return null if there is a tx but it's not ACTIVE or PREPARING. In this particular use case, the tx is ROLLEDBACK, but the lock interceptor still needs acquire read locks held by the gtx. And we don't want to change the semantics of TreeCache.getCurrentTransaction to support this one use case.

                                • 13. Re: JBCACHE-359 Discussion Thread
                                  Manik Surtani Master

                                   

                                  "bstansberry@jboss.com" wrote:
                                  "manik.surtani@jboss.com" wrote:
                                  Have a look at Interceptor.getCurrentGlobalTx(). This method first looks in method call args for a gtx passed in, and if this is not present, it uses TreeCache.getCurrentTransaction( false ).


                                  The problem is TreeCache.getCurrentTransaction( false ) will return null if there is a tx but it's not ACTIVE or PREPARING. In this particular use case, the tx is ROLLEDBACK, but the lock interceptor still needs acquire read locks held by the gtx. And we don't want to change the semantics of TreeCache.getCurrentTransaction to support this one use case.


                                  Yes, I see what you mean.

                                  Like I was saying though, we could change it such that TXs and GTXs are held in ThreadLocal with access code in the Interceptor class, and have all Interceptors except the TxInterceptor use this mechanism of retrieving TXs and GTXs.

                                  We could then restrict it such that only the TxInterceptor has the code to set these values. Potentially even deprecate TreeCache.getCurrentTransaction() since this should only be used by the TxInterceptor and could hence be moved there.

                                  Still could lead to other (custom) interceptors directly manipulating the TxTable (using TreeCache.getTransactionTable()) which I'm not too hot on and moving the TxTable to the TxInterceptor may be too drastic a change as far as custom interceptors go, although this is the logical place for it.




                                  • 14. Re: JBCACHE-359 Discussion Thread
                                    Brian Stansberry Master

                                    Manik, I think my previous post crossed in the mail with the one where you said you think a ThreadLocal is the way to go. :-)

                                    I see what you mean about TreeCache.InvocationSpecificSettings. A very loose interpretation of "settings" could include the tx and gtx associated with the invocation, but it really seems like mixing different things. Seems more natural to reside in Interceptor.

                                    It would be nice not to have any transactional stuff in TreeCache itself, other than the public API that calls into the interceptor chain.

                                    1 2 3 Previous Next