3 Replies Latest reply on Jul 30, 2007 7:59 AM by manik

    Potential Deadlock - 1.4.1.SP3 and BoundedLinkedQueue

    mraccola

      I am experiencing what appears to be a deadlock in JBoss Cache 1.4.1.SP3. After the app has been running for some time in JBoss AS 4.0.5, I see one thread holding two monitor locks in BoundedLinkedQueue and many other threads waiting on one of those locks. The application grinds to a halt, and users experience hanging HTTP responses.

      In the latest scenario I have 28 HTTP threads,
      26 are blocked (see Stack #1).
      1 thread holds 2 locks (see Stack #2).
      1 thread is still free (see Stack #3).

      A Google search indicates that BoundedLinkedQueue has had issues with race conditions in the past (http://altair.cs.oswego.edu/pipermail/concurrency-interest/2004-September/001037.html), although that scenario seems slightly different.

      Stack #1

      "http-0.0.0.0-8080-12" daemon prio=1 tid=0x08c5bd58 nid=0x26e0 waiting for monitor entry [0x97cfc000..0x97cff030]
       at EDU.oswego.cs.dl.util.concurrent.BoundedLinkedQueue.put(Unknown Source)
       - waiting to lock <0xb4291690> (a java.lang.Object)
       at org.jboss.cache.eviction.Region.putNodeEvent(Region.java:141)
       at org.jboss.cache.interceptors.EvictionInterceptor.doEventUpdatesOnRegionManager(EvictionInterceptor.java:149)
       at org.jboss.cache.interceptors.EvictionInterceptor.updateNode(EvictionInterceptor.java:122)
       at org.jboss.cache.interceptors.EvictionInterceptor.invoke(EvictionInterceptor.java:97)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.OptimisticCreateIfNotExistsInterceptor.invoke(OptimisticCreateIfNotExistsInterceptor.java:69)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.OptimisticValidatorInterceptor.invoke(OptimisticValidatorInterceptor.java:87)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.OptimisticLockingInterceptor.invoke(OptimisticLockingInterceptor.java:126)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:365)
       at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:160)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:138)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5863)
       at org.jboss.cache.TreeCache.get(TreeCache.java:3627)
       at org.jboss.cache.TreeCache.get(TreeCache.java:3608)
      


      Stack #2
      "http-0.0.0.0-8080-11" daemon prio=1 tid=0x087dab18 nid=0x2691 in Object.wait() [0x976f0000..0x976f3030]
       at java.lang.Object.wait(Native Method)
       at java.lang.Object.wait(Object.java:474)
       at EDU.oswego.cs.dl.util.concurrent.BoundedLinkedQueue.put(Unknown Source)
       - locked <0xb42914e8> (a EDU.oswego.cs.dl.util.concurrent.BoundedLinkedQueue)
       - locked <0xb4291690> (a java.lang.Object)
       at org.jboss.cache.eviction.Region.putNodeEvent(Region.java:141)
       at org.jboss.cache.interceptors.EvictionInterceptor.doEventUpdatesOnRegionManager(EvictionInterceptor.java:149)
       at org.jboss.cache.interceptors.EvictionInterceptor.updateNode(EvictionInterceptor.java:122)
       at org.jboss.cache.interceptors.EvictionInterceptor.invoke(EvictionInterceptor.java:97)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.OptimisticCreateIfNotExistsInterceptor.invoke(OptimisticCreateIfNotExistsInterceptor.java:69)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.OptimisticValidatorInterceptor.invoke(OptimisticValidatorInterceptor.java:87)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.OptimisticLockingInterceptor.invoke(OptimisticLockingInterceptor.java:126)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:365)
       at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:160)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:138)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5863)
       at org.jboss.cache.TreeCache.get(TreeCache.java:3627)
       at org.jboss.cache.TreeCache.get(TreeCache.java:3608)
      


      Stack #3
      "http-0.0.0.0-8080-27" daemon prio=1 tid=0x09643720 nid=0x2fc0 in Object.wait() [0x978f7000..0x978f71b0]
       at java.lang.Object.wait(Native Method)
       - waiting on <0xc3c65dc0> (a org.apache.tomcat.util.net.MasterSlaveWorkerThread)
       at java.lang.Object.wait(Object.java:474)
       at org.apache.tomcat.util.net.MasterSlaveWorkerThread.await(MasterSlaveWorkerThread.java:81)
       - locked <0xc3c65dc0> (a org.apache.tomcat.util.net.MasterSlaveWorkerThread)
       at org.apache.tomcat.util.net.MasterSlaveWorkerThread.run(MasterSlaveWorkerThread.java:107)
       at java.lang.Thread.run(Thread.java:595)
      


      TreeCache Configuration
      <server>
        <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar"/>
        <mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">
          <depends>jboss:service=Naming</depends>
          <depends>jboss:service=TransactionManager</depends>
          <attribute name="ClusterName">FCXp_hibernate</attribute>
          <attribute name="CacheMode">LOCAL</attribute>
          <attribute name="SyncReplTimeout">10000</attribute>
          <attribute name="LockAcquisitionTimeout">15000</attribute>
          <attribute name="FetchInMemoryState">false</attribute>
          <attribute name="NodeLockingScheme">OPTIMISTIC</attribute>

          <attribute name="EvictionPolicyClass"> org.jboss.cache.eviction.LRUPolicy </attribute>
          <attribute name="EvictionPolicyConfig">
            <config>
              <attribute name="wakeUpIntervalSeconds">10000</attribute>
              <!-- Cache wide default -->
              <region name="/_default_">
                <attribute name="maxNodes">200000</attribute>
                <attribute name="timeToIdleSeconds">10000</attribute>
              </region>
            </config>
          </attribute>
        </mbean>
      </server>


        • 1. Re: Potential Deadlock - 1.4.1.SP3 and BoundedLinkedQueue
          brian.stansberry

          Your stack traces don't look like a deadlock; they look like what I would expect if the BoundedLinkedQueue is full and threads are waiting for another thread to take from the queue. The Stack 2 thread is inside put(), waiting to be notified after a take frees a slot; the Stack 1 threads are blocked waiting for the lock that the Stack 2 thread holds, so that one of them can be the next to do the put once Stack 2 is done.
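          The blocking behaviour described above can be reproduced with a small sketch. This is not JBoss Cache code: it uses java.util.concurrent.ArrayBlockingQueue as a stand-in for EDU.oswego.cs.dl.util.concurrent.BoundedLinkedQueue, and the class and method names (FullQueueDemo, offerToFullQueue) are made up for the example. A timed offer() replaces the untimed put() so the demo terminates even when nothing ever drains the queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; a stand-in for the eviction event queue, not JBoss Cache code.
class FullQueueDemo {

    /**
     * Fills a bounded queue, then attempts one more insert with a 1 second timeout.
     * Returns true if the extra insert was accepted, false if it timed out.
     */
    static boolean offerToFullQueue(boolean consumerRunning) {
        BlockingQueue<String> events = new ArrayBlockingQueue<>(2);
        events.add("event-1");
        events.add("event-2"); // queue is now at capacity

        Thread consumer = null;
        if (consumerRunning) {
            // Plays the role of the eviction thread: wakes up late, then takes one event.
            consumer = new Thread(() -> {
                try {
                    Thread.sleep(100);
                    events.take(); // frees one slot and unblocks the producer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumer.start();
        }

        try {
            // The producer blocks here until a slot is free or the timeout expires.
            boolean accepted = events.offer("event-3", 1, TimeUnit.SECONDS);
            if (consumer != null) {
                consumer.join();
            }
            return accepted;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted with a consumer:    " + offerToFullQueue(true));
        System.out.println("accepted without a consumer: " + offerToFullQueue(false));
    }
}
```

          With an untimed put() and a consumer that only wakes up every few hours, the producer in the second case would simply stay blocked, which is what the Stack 2 thread shows.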

          The thread that takes from the queue is the eviction thread, which you've got configured to run only once every few *hours* (wakeUpIntervalSeconds=10000, i.e. roughly 2.8 hours).

          Do you have WARN logging suppressed for org.jboss.cache.eviction.Region? If WARN is enabled I'd expect lots of messages saying "putNodeEvent(): eviction node event queue size is at 98% threshold value of capacity: 200000 You will need to reduce the wakeUpIntervalSeconds parameter."

          • 2. Re: Potential Deadlock - 1.4.1.SP3 and BoundedLinkedQueue
            mraccola

            I appreciate the quick response. This has helped eliminate the problem and your assessment was 100% correct.

            I am wondering what an optimal setting for wakeUpIntervalSeconds would be. Based on forum posts, it seems a common setting is 5 seconds, which strikes me as a little excessive. I will proceed with some tuning of this parameter, but any guidelines on what factors would lead to a larger or smaller setting would be helpful.

            • 3. Re: Potential Deadlock - 1.4.1.SP3 and BoundedLinkedQueue
              manik

              5 seconds isn't really that excessive at all if the thread has entries to process in the queue every time. Depends on the kind of activity you see in the cache, really. I know of use cases that set this to 1 second and even have requests to be able to specify this param in millis.
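              As a rough way to reason about the interval, one can compare it with the time the event queue takes to fill. The sketch below is a back-of-the-envelope estimate only: the capacity of 200000 is the figure quoted in the warning message earlier in the thread, while the rate of 500 events per second, the class name EvictionIntervalEstimate and its methods are assumptions made up for illustration. It also assumes a steady event rate and that the eviction thread drains the whole queue each time it wakes up.

```java
// Hypothetical helper for estimating a safe wake-up interval; not part of JBoss Cache.
class EvictionIntervalEstimate {

    /** Seconds until a queue of the given capacity fills at a steady event rate. */
    static double secondsToFill(int queueCapacity, double eventsPerSecond) {
        if (queueCapacity <= 0 || eventsPerSecond <= 0) {
            throw new IllegalArgumentException("capacity and rate must be positive");
        }
        return queueCapacity / eventsPerSecond;
    }

    /** True if the queue would fill before the eviction thread next wakes up. */
    static boolean fillsBeforeWakeUp(int queueCapacity, double eventsPerSecond,
                                     long wakeUpIntervalSeconds) {
        return secondsToFill(queueCapacity, eventsPerSecond) < wakeUpIntervalSeconds;
    }

    public static void main(String[] args) {
        int capacity = 200000;        // capacity quoted in the Region warning message
        double assumedRate = 500.0;   // assumed cache events per second (illustrative)

        System.out.println("seconds to fill: " + secondsToFill(capacity, assumedRate));
        System.out.println("fills before wake-up at 10000 s: "
                + fillsBeforeWakeUp(capacity, assumedRate, 10000));
        System.out.println("fills before wake-up at 5 s: "
                + fillsBeforeWakeUp(capacity, assumedRate, 5));
    }
}
```

              Under these assumptions, the busier the cache, the shorter the interval has to be to keep producers from blocking, which matches the point that the right value depends on the activity in the cache.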