11 Replies Latest reply on Mar 6, 2009 5:56 AM by adinn

    Thread infinitely blocking on TransactionReaper.insert()/rem

      I just started using JBoss TS 4.5.0.GA in the following setup:

      Tomcat 6 (NIO)
      Spring 2.5.6
      JBossCache 3.0.2
      JDK 1.6
      JBoss TS 4.5.0.GA
      Hibernate 3.3.1.GA

      I ran a load test with 50 users simultaneously hitting Tomcat reading same data repeatedly, each with 1sec think time. No writes take place; each user makes 200 requests (so 50X200=10000 requests total).

      I re-ran this load test (using JMeter) several times; each time 1-7 users would not finish at all (a request would never return).

      JProfiler indicated that "tomcat-exec-XX" threads in question were blocking at TransactionReaper.insert() or remove(). Each time the thread owning a monitor in question appeared to be a non "tomcat-exec" thread (ie NOT a thread handing client HTTP requests), but rather some thread spawned by JBossTS to examine transaction objects placed into a data structure by TransactionReaper (for stale-ness, I assume). That thread also appeared blocked in JProfiler's thread map, though wasn't clear on what.

      Needless to say this is pretty worrisome. I would like to pursue two avenues in dealing with above:

      1). Determine root cause of this problem and fix it.
      2). Avoid executing TransactionReaper's insert() and remove() methods

      In trying to determine root cause of this problem, what loggers should I switch to debug? I am looking for a minimal set, as turning up com.arjuna to DEBUG produced > 4.5GB of logs during the load test.


      On the other hand, I would like to try and avoid TransactionReaper completely. The reason is that:
      -our system is simply and has only one resource (hibernate) and one synchronization (JBossCache)
      -we have been relying on DB server for timing out lengthy transactions and it's worked fine; so I am unsure we'd get much benefit from this functionality at Transaction Manager level.
      So, it seems that Transaction Reaper isn't engaged if a given Transaction has a timeout value of "-1". How and where can I set tx timeout value to "-1"? JBossTS code seems to be peppered with statements like:

      if(! tx.getTimeout() > 0) tx.setTimeout(0); //pseudo code


      thanks a lot,
      -nikita

        • 1. Re: Thread infinitely blocking on TransactionReaper.insert()

          I should add the following about the setup i mention above:
          -this is a single server setup (no cluster)
          -average request time was only 33 ms; no request took over 1 second.

          • 2. Re: Thread infinitely blocking on TransactionReaper.insert()
            adinn

             


            JProfiler indicated that "tomcat-exec-XX" threads in question were blocking at TransactionReaper.insert() or remove(). Each time the thread owning a monitor in question appeared to be a non "tomcat-exec" thread (ie NOT a thread handing client HTTP requests), but rather some thread spawned by JBossTS to examine transaction objects placed into a data structure by TransactionReaper (for stale-ness, I assume). That thread also appeared blocked in JProfiler's thread map, though wasn't clear on what.


            I would be very grateful if you could obtain and post a stack trace for the thread which you say is blocking the tomcat threads. The most likely candidate for the blocking thread is the reaper thread. It wakes up in time to check whether the first (i.e. earliest expiring) transaction on the queue has expired. It should never block while holding the queue lock since all it does while holding the lock is remove (or occasionally re-insert) a transaction from the queue and/or compute it's next wakeup time. In very unusual circumstances it may post an error to the system log about zombie ReaperWorker threads while holding the lock but that is highly unlikely (you could check your log for any mention of zombies). A stack trace would make it very clear what is causing any blockage here.

            • 3. Re: Thread infinitely blocking on TransactionReaper.insert()

              @adinn:

              Thanks for a prompt reply. So far I was only able to get the following deadlock captured. This deadlock was apparently indefinite - persisted for 195 seconds at which point I killed the server:

              Waiting Thread:
              tomcat-exec-10

              Stack Trace:
              com.arjuna.ats.arjuna.coordinator.TransactionReaper.remove(java.lang.Object)
              com.arjuna.ats.arjuna.AtomicAction.commit(boolean)
              com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate()
              com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit()
              org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(java.lang.Object[ ])
              com.doppelganger.framework.transaction.TransactionConfigAspect.config(org.aspectj.lang.ProceedingJoinPoint, com.doppelganger.framework.transaction.TransactionConfig)
              org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed()
              com.doppelganger.aspect.ServiceIdentityAspect.recordServiceIdentity(org.aspectj.lang.ProceedingJoinPoint)
              org.springframework.aop.framework.JdkDynamicAopProxy.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ])
              $Proxy89.getFriends2ByUserName(java.lang.String)
              com.doppelganger.envmanager.envclient.GetFriends2Controller.getOnlineFriends(java.lang.String)
              java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[ ])
              com.doppelganger.envmanager.mvc.AbstractEnvmanagerController.handleRequestInternal(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse)
              org.springframework.web.filter.AbstractRequestLoggingFilter.doFilterInternal(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, javax.servlet.FilterChain)
              com.doppelganger.log.Log4JNDCFilter.doFilterInternal(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, javax.servlet.FilterChain)
              java.lang.Thread.run()

              Monitor class:
              com.arjuna.ats.arjuna.coordinator.TransactionReaper

              Owning Thread
              tomcat-exec-20

              Stack Trace:
              java.util.Collections$SynchronizedMap.put(java.lang.Object, java.lang.Object)
              com.arjuna.ats.arjuna.coordinator.TransactionReaper.insert(com.arjuna.ats.arjuna.coordinator.Reapable, int)
              com.arjuna.ats.arjuna.AtomicAction.begin(int)
              com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.(int)
              com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.begin()
              org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(java.lang.Object[ ])
              com.doppelganger.framework.transaction.TransactionConfigAspect.config(org.aspectj.lang.ProceedingJoinPoint, com.doppelganger.framework.transaction.TransactionConfig)
              org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed()
              com.doppelganger.aspect.ServiceIdentityAspect.recordServiceIdentity(org.aspectj.lang.ProceedingJoinPoint)
              org.springframework.aop.framework.JdkDynamicAopProxy.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[ ])
              $Proxy89.getFriends2ByUserName(java.lang.String)
              com.doppelganger.envmanager.envclient.GetFriends2Controller.getOnlineFriends(java.lang.String)
              java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[ ])
              com.doppelganger.envmanager.mvc.AbstractEnvmanagerController.handleRequestInternal(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse)
              org.springframework.web.filter.AbstractRequestLoggingFilter.doFilterInternal(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, javax.servlet.FilterChain)
              com.doppelganger.log.Log4JNDCFilter.doFilterInternal(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, javax.servlet.FilterChain)
              java.lang.Thread.run()

              • 4. Re: Thread infinitely blocking on TransactionReaper.insert()

                I set defaultTimeout to 0 and now see now blocked threads. (not executing TransactionReaper.insert())

                However, AtomicAction.commit() has this line:

                TransactionReaper.create().remove(this);

                ...which still lazily creates txReaper and forces execution of synchronized block in doing so.

                why not:

                if(! TransactionReaper.exists())
                {
                TransactionReaper.create();
                }
                TransactionReaper.remove(this)?

                The above would avoid entering 'synchronized' block for all but one execution of commit().

                -nikita

                • 5. Re: Thread infinitely blocking on TransactionReaper.insert()
                  adinn

                  Ok, your deadlock trace is certainly very interesting. Here is the code for insert and remove:

                   /**
                   * timeout is given in seconds, but we work in milliseconds.
                   */
                  
                   public final boolean insert(Reapable control, int timeout)
                   {
                   if (tsLogger.arjLogger.debugAllowed())
                   {
                   tsLogger.arjLogger.debug(DebugLevel.FUNCTIONS,
                   VisibilityLevel.VIS_PUBLIC, FacilityCode.FAC_ATOMIC_ACTION,
                   "TransactionReaper::insert ( " + control + ", " + timeout
                   + " )");
                   }
                  
                   /*
                   * Ignore if the timeout is zero, since this means the transaction
                   * should never timeout.
                   */
                  
                   if (timeout == 0)
                   return true;
                  
                   /**
                   * Ignore if it's already in the list with a different timeout.
                   * (This should never happen)
                   */
                   if (_timeouts.containsKey(control)) {
                   return false;
                   }
                  
                   ReaperElement e = new ReaperElement(control, timeout);
                  
                   synchronized (this)
                   {
                   TransactionReaper._lifetime += timeout;
                  
                   _timeouts.put(control, e);
                   boolean rtn = _transactions.add(e);
                  
                   if(_dynamic)
                   {
                   notify(); // force recalc of next wakeup time, taking into account the newly inserted element
                   }
                  
                   return rtn;
                   }
                   }
                  


                   public final boolean remove(java.lang.Object control)
                   {
                   if (tsLogger.arjLogger.debugAllowed())
                   {
                   tsLogger.arjLogger.debug(DebugLevel.FUNCTIONS,
                   VisibilityLevel.VIS_PUBLIC, FacilityCode.FAC_ATOMIC_ACTION,
                   "TransactionReaper::remove ( " + control + " )");
                   }
                  
                   if (control == null)
                   return false;
                  
                   ReaperElement key;
                  
                   synchronized(this)
                   {
                   key = (ReaperElement)_timeouts.remove(control);
                   if(key == null) {
                   return false;
                   }
                   }
                  
                   // if a cancellation is in progress then we have to
                   // see it through as we have to ensure that the worker
                   // thread does not get wedged. so we have to tell the
                   // control has gone away. in order to test the status
                   // we need to synchronize on the element before we
                   // synchronize on this so we can ensure that we don't
                   // deadlock ourselves.
                  
                   synchronized(key)
                   {
                   if (key._status != ReaperElement.RUN)
                   {
                   // we are cancelling this TX anyway and need
                   // to track the progress of the cancellation
                   // using this entry so we cnanot remove it
                  
                   return false;
                   }
                  
                   synchronized(this)
                   {
                   removeElement(key);
                  
                   return true;
                   }
                   }
                   }
                  


                  Method removeElement is coded as follows

                   /*
                   * Remove element from list and trigger waiter if we are
                   * being shutdown.
                   */
                  
                   private final void removeElement (ReaperElement e)
                   {
                   synchronized (_shutdownLock)
                   {
                   _timeouts.remove(e._control);
                   _transactions.remove(e);
                  
                   if (_inShutdown && (_transactions.size() == 0))
                   {
                   _shutdownLock.notify();
                   }
                   }
                   }
                  


                  Now the objects which are synchronized on here are the transaction reaper itself (this), synchronized map _timeouts , synchronized list _transactions, ReaperElement key and Object _shutDownLock. There are implicit synchronizations on _transactions and _timeouts at the calls to put, add, containsKey and remove because the value of the targets for these method invocations are synchronized collections obtained from class Collections.

                  None of these objects are locked from outside of the calls to methods of TransactionReaper (the reaper is accessible via the create method but neither of these threads obtains and locks a handle on it). Inside the two methods in question the only place where multiple locks can be held are:

                  insert:
                  in the synchronized(this) block _transactions and _timeouts are implicitly locked by the calls to put() and add(). So the lock hierarchy here is

                   this
                   _transactions
                   _timeouts
                  


                  remove:
                  in the first synchronized block _timeouts is implicitly locked by the call to remove

                   this
                   _timeouts
                  




                  in the second nested synchronized block _timeouts and _transactions are implicitly locked by the respective calls to remove occuring inside removeElement. these calls occur inside a synchronization on _shutDownLock

                   key
                   this
                   _shutdownLock
                   _timeouts
                   _transactions
                  


                  So, the locks which are common between these hierarchies are always obtained in the same order. This means the two threads should not be able to deadlock each other. That would require each of them to hold a lock desired by the other and the ordering constraint for obtaining the locks makes that impossible.

                  n.b.the comment in remove about synchronizing on key before synchronizing on this refers to ordering with respect to the ReaperWorkerThread which actually performs transaction cancellations. It also uses lock ordering to avoid deadlocks, ensuring that it always locks a ReaperElement before it locks the TransactionReaper before locking either of the synchronized collections wheneerit needs to hold multiple locks from this set. Anyyway your deadlock trace does not indicate that the ReaperThread or ReaperWorkerThread are involved here.

                  The only other possible place this lock ordering constraint might be violated is inside the methods on the synchronized collection classes. A conflict might occur if put or add (called inside insert) were to lock the key object. This could cause a deadlock when remove obtains a lock on key and then tries to lock _transactions or _timeouts. However, looking at the collections code the implementations of the sorted collections never lock the objects they contain (nor should they). They only lock an object private to the synchronized collection itself. The sorted set implementation does call the comparison method of its elements. But this does not account for the deadlock since ReaperElement.compareTo is not synchronized.

                   /**
                   * Order by absoluteTimeout first, then by Uid.
                   * This is required so that the set maintained by the TransactionReaper
                   * is in timeout order for efficient processing.
                   *
                   * @param o
                   * @return
                   */
                   public int compareTo(Object o)
                   {
                   ReaperElement other = (ReaperElement)o;
                  
                   if(_absoluteTimeout == other._absoluteTimeout) {
                   if(_control.get_uid().equals(other._control.get_uid())) {
                   return 0;
                   } else if (_control.get_uid().greaterThan(other._control.get_uid())) {
                   return 1;
                   } else {
                   return -1;
                   }
                   } else {
                   return (_absoluteTimeout > other._absoluteTimeout) ? 1 : -1;
                   }
                   }
                  


                  So, I am unable to understand how this deadlock can arise.

                  In particular, your trace shows that the deadlock is on the reaper object and that the inserting thread is blocked on the (SynchronizedSortedMap) put operation while the removing thread is blocked somewhere in TransactionReaper.remove. This implies that the inserting thread has locked the reaper and is waiting on the lock for sorted map _timeouts. It also iplis that the removing thread holds the lock on _timeouts and is waiting for a lock on the reaper. Now remove() never holds the lock on _timeouts when it is trying to lock the reaper object.It only calls _timeouts.remove() inside blocks which are synchronized on this. So, either your deadlock analysis report is wrong about what is causing the deadlock or there is a problem in the JVM compiler or JVM runtime.


                  • 6. Re: Thread infinitely blocking on TransactionReaper.insert()
                    adinn

                    Firstly, my analysis of the lock ordering was made looking at the trunk code, not the 4.5 code. However, 4.5. is actually simpler than trunk -- it does not include the shutdown lock -- but otherwise the analysis still holds.

                    Can you confirm which AS version you are using and explain how you built and installed the JBossTS 4.5 release.

                    Secondly, you say


                    However, AtomicAction.commit() has this line:

                    TransactionReaper.create().remove(this);

                    ...which still lazily creates txReaper and forces execution of synchronized block in doing so.

                    why not:

                    if(! TransactionReaper.exists())
                    {
                    TransactionReaper.create();
                    }
                    TransactionReaper.remove(this)?

                    The above would avoid entering 'synchronized' block for all but one execution of commit().


                    I am assuming that you mean to make exists() unsynchronized so as to only synchronize when create() is called. Well, this is wrong on two counts. Firstly, whether or not exists() is synchronized there is a window between the call to exists() and the call to create() in which the situation tested by exists() can change. So you can still get more than one thread entering create(). Imagine thread A and B both execute this code in parallel. They can both call exists() before either thread enters create() and both get result false. This means that create() has to call exists() before it does anything else in order to avoid the risk of creating two reaper instances. If the reaper still does not exist after synchronization then it is safe to create and assign it. If it does exist then create should just return the current value.

                    The second problem is more subtle and much more serious. I'll explain it because it is a very common mistake made by Java programmers when they start dealing with multi-threaded code. Assume the following definitions in class TransactionReaper:
                     static private TransactionReaper theReaper = null;
                     private SortedSet transactions;
                     private Map timeouts;
                    
                     public static boolean exists() { return theReaper != null; } // wrong!
                     public static synchronized void create()
                     {
                     if (!exists()) {
                     theReaper = new TransactionReaper()
                     }
                     }
                     private TransactionReaper()
                     {
                     transactions = Collections.synchronizedSortedSet(new TreeSet());
                     timeouts = Collections.synchronizedMap(new HashMap());
                     }
                    


                    Now imagine your code is executed by thread A and it calls exists(), gets result false, enters create() and suspends after executing the assignment to theReaper but before exiting the synchronized block. Next assume that thread B runs and calls exists(). At this point there is no guarantee that the assignments made by thread A are all visible to thread B nor that they are all invisible. What is worse is that only some of the assignments might be visible and the real issue is that assignments might be seen out of order!

                    It is only guaranteed that writes occurring in a synchronized block are all visible when the writer thread exits the synchronized block. Up to that point any fields written by the writer are in an indeterminate state as far as visibility to another reader thread is concerned. Go check the Java memory model spec if you want reassurance that this is all that is guaranteed and look up out of order (OOO) processor architectures if you want to see the full details of how the situation I am describing can happen (or if you have trouble believing it).

                    So, when thread B enters your code it might see theReaper != null but it might also see theReaper.transactions == null. This means that the call to exists() in thread B would return true and thread B could then go on to call insert() and get a Null Pointer Exception when it executes theReaper,transactions.add(). With current architectures this can only happen on an OOO multiprocessor architecture or an OOO multi-core processor architecture but there are newer architectures in the pipeline which may also manifest this same error in your code.

                    If you make exists() synchronized then it will fix this problem. However, this means that you still have to perform a synchronization (two synchronizations if theReaper is null) and you still have to call exists() when you enter create(). So, you might as well remove the call to exists() from your code and just call create() directly.

                    Which is why we wrote it the way we did :-)



                    • 7. Re: Thread infinitely blocking on TransactionReaper.insert()
                      adinn

                      Can you also specify which version of JDK 1.6 you were using?

                      • 8. Re: Thread infinitely blocking on TransactionReaper.insert()

                        Adinn,

                        Thanks for both explanations - a colleague of mine actually explained the latter to me as well.

                        I was using JBossTS 4.5.0.GA; jdk version is:
                        c:\code\trunk\java>java -version
                        java version "1.6.0_12"
                        Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
                        Java HotSpot(TM) Client VM (build 11.2-b01, mixed mode)

                        A follow up on your explanation, if I may:

                        why not simply init the Reaper at the very beginning with a static block. Ie do exactly what you do today, but avoid lazy initialization? If I recall correctly the constructor does nothing (thus you're not gaining much by deferring init).

                        thanks again,

                        -nikita


                        • 9. Re: Thread infinitely blocking on TransactionReaper.insert()
                          marklittle

                          The reaper thread is not needed if there are no transactions with timeouts. In that case we never create a reaper. The intention is also that when the reaper list gets to zero we could kill the thread and recreate it again later (though I think we only ever commented on that option and didn't implement it).

                          • 10. Re: Thread infinitely blocking on TransactionReaper.insert()
                            adinn

                            quote]
                            I was using JBossTS 4.5.0.GA; jdk version is:
                            c:\code\trunk\java>java -version
                            java version "1.6.0_12"
                            Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
                            Java HotSpot(TM) Client VM (build 11.2-b01, mixed mode)


                            Hmm, well I don't know of any problems with synchronization in that JVM build. Frankly, I'm baffled by this behaviour.


                            why not simply init the Reaper at the very beginning with a static block. Ie do exactly what you do today, but avoid lazy initialization? If I recall correctly the constructor does nothing (thus you're not gaining much by deferring init).


                            • 11. Re: Thread infinitely blocking on TransactionReaper.insert()
                              adinn

                              Hmm, last post got a little screwed up -- here it is again


                              I was using JBossTS 4.5.0.GA; jdk version is:
                              c:\code\trunk\java>java -version
                              java version "1.6.0_12"
                              Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
                              Java HotSpot(TM) Client VM (build 11.2-b01, mixed mode)


                              Hmm, well I don't know of any problems with synchronization in that JVM build. Frankly, I'm baffled by this behaviour.


                              why not simply init the Reaper at the very beginning with a static block. Ie do exactly what you do today, but avoid lazy initialization? If I recall correctly the constructor does nothing (thus you're not gaining much by deferring init).


                              We gain the ability to stop and restart the reaper without having to unload and reload the reaper class.