2 Replies Latest reply on Jul 3, 2008 11:31 AM by atijms

    Eviction thread causes deadlock?

    atijms

      Hi,

      I'm using version 1.3.0 of JBoss Cache. Recently one of our production servers was hanging. A stack trace showed that the eviction thread seemed to be stuck:

      "Timer-1" daemon prio=10 tid=0x08ffdc00 nid=0x1c85 in Object.wait() [0x75813000..0x75813ec0]
       java.lang.Thread.State: TIMED_WAITING (on object monitor)
       at java.lang.Object.wait(Native Method)
       - waiting on <0x81bbe008> (a org.jboss.cache.lock.ReadWriteLockWithUpgrade$WriterLock)
       at org.jboss.cache.lock.ReadWriteLockWithUpgrade$WriterLock.attempt(ReadWriteLockWithUpgrade.java:389)
       - locked <0x81bbe008> (a org.jboss.cache.lock.ReadWriteLockWithUpgrade$WriterLock)
       at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:195)
       at org.jboss.cache.Node.acquireWriteLock(Node.java:431)
       at org.jboss.cache.Node.acquire(Node.java:386)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:228)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java:166)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.UnlockInterceptor.invoke(UnlockInterceptor.java:32)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:345)
       at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:156)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:179)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5520)
       at org.jboss.cache.TreeCache.evict(TreeCache.java:3716)
       at org.jboss.cache.eviction.BaseEvictionPolicy.evict(BaseEvictionPolicy.java:34)
       at org.jboss.cache.eviction.BaseEvictionAlgorithm.evictCacheNode(BaseEvictionAlgorithm.java:210)
       at org.jboss.cache.eviction.BaseEvictionAlgorithm.emptyRecycleQueue(BaseEvictionAlgorithm.java:461)
       at org.jboss.cache.eviction.BaseEvictionAlgorithm.process(BaseEvictionAlgorithm.java:97)
       at org.jboss.cache.eviction.EvictionTimerTask.run(EvictionTimerTask.java:80)
       - locked <0x81a20490> (a org.jboss.cache.eviction.Region)
       - locked <0x7ec74670> (a java.util.Collections$SynchronizedSet)
       at java.util.TimerThread.mainLoop(Timer.java:512)
       at java.util.TimerThread.run(Timer.java:462)
      


      Even though the timeout is set to 120000 milliseconds, the thread seemed to be stuck there forever.
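
      (For reference, a simplified sketch of our setup is below. I'm assuming here that the timeout in question is the cache's LockAcquisitionTimeout; the config file name is made up and the rest of the setup is trimmed for this post.)

      // Simplified sketch of the cache setup; "treecache-service.xml" is a made-up name,
      // and the XML it refers to contains, among other attributes:
      //    <attribute name="LockAcquisitionTimeout">120000</attribute>
      import org.jboss.cache.PropertyConfigurator;
      import org.jboss.cache.TreeCache;

      public class TreeCacheSetup {
         public static TreeCache createCache() throws Exception {
            TreeCache cache = new TreeCache();
            new PropertyConfigurator().configure(cache, "treecache-service.xml");
            cache.startService(); // starts the cache, including the eviction timer
            return cache;
         }
      }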

      The stack trace of one of the many threads that were waiting to lock the SynchronizedSet object:

      "TP-Processor19" daemon prio=10 tid=0x09523400 nid=0x3b0f waiting for monitor entry [0x755b1000..0x755b3140]
       java.lang.Thread.State: BLOCKED (on object monitor)
       at java.util.Collections$SynchronizedCollection.add(Collections.java:1577)
       - waiting to lock <0x7ec74670> (a java.util.Collections$SynchronizedSet)
       at org.jboss.cache.eviction.EvictionTimerTask.addRegionToProcess(EvictionTimerTask.java:46)
       at org.jboss.cache.eviction.RegionManager.createRegion(RegionManager.java:154)
       at com.mbuyu.tableflow.cache.std.TreeCacheProvider.createLocalCache(TreeCacheProvider.java:139)
       - locked <0x822dd4b8> (a org.apache.catalina.session.StandardSessionFacade)
       - locked <0x822dd4c8> (a com.mbuyu.tableflow.cache.std.TreeCacheProvider)
       at com.mbuyu.tableflow.cache.std.TreeCacheProvider.getCache(TreeCacheProvider.java:45)
       at com.mbuyu.tableflow.cache.TableFlowCacheFactory.getCache(TableFlowCacheFactory.java:64)
       at com.mbuyu.tableflow.util.TableFlowProvider.<init>(TableFlowProvider.java:57)
       at com.mbuyu.tableflow.jsf.UITableFlow.encodeEnd(UITableFlow.java:134)
      

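      Putting the two traces together: the Timer-1 thread holds the monitor on the synchronized region set (0x7ec74670) while it waits for a node's write lock, and the request threads block trying to add a new region to that same set, so they all pile up behind the evictor. The following is only a self-contained sketch of that pattern; the class, field and thread names are made up and it is not JBoss Cache code:

      import java.util.Collections;
      import java.util.HashSet;
      import java.util.Set;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.locks.ReentrantReadWriteLock;

      public class EvictorPileUpSketch {
         static final Set<String> regions = Collections.synchronizedSet(new HashSet<String>());
         static final ReentrantReadWriteLock nodeLock = new ReentrantReadWriteLock();

         public static void main(String[] args) throws Exception {
            // Stand-in for an application thread that currently holds a node's write lock.
            nodeLock.writeLock().lock();

            new Thread(new Runnable() {
               public void run() {
                  // Like EvictionTimerTask: work on the regions while holding the set's monitor.
                  synchronized (regions) {
                     try {
                        // Like TreeCache.evict() acquiring the node's write lock: blocks here.
                        // In this sketch it gives up after 120 s; my question is whether the
                        // real lock's timed wait can fail to ever give up.
                        nodeLock.writeLock().tryLock(120000, TimeUnit.MILLISECONDS);
                     } catch (InterruptedException ignored) {
                     }
                  }
               }
            }, "Timer-1").start();

            Thread.sleep(100); // let the evictor grab the set's monitor first

            // Like RegionManager.createRegion() -> addRegionToProcess(): every request thread
            // that needs a new region blocks on the same monitor and queues up.
            new Thread(new Runnable() {
               public void run() {
                  regions.add("/newRegion");
               }
            }, "TP-Processor19").start();
         }
      }

      As long as the evictor's timed wait does not give up, nothing that touches the region set can make progress.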

      I took a look at the source code. Naively speaking, there seems to be an opportunity for an infinite loop:

      synchronized (WriterLock.this) {
         if (msecs <= 0) {
            // Upgrade thread has prioirty.
            if (waitingUpgrader_ != 0) {
               if (upgraderLocal_.get(ReadWriteLockWithUpgrade.this) != null) {
                  log_.info("attempt(): upgrade to write lock");
                  return startWrite();
               }
               else
                  return false;
            } else
               return startWrite();
         } else if (startWriteFromNewWriter())
            return true;
         else {
            long waitTime = msecs;
            long start = System.currentTimeMillis();
            while (true) {
               try {
                  WriterLock.this.wait(waitTime);
               }
               catch (InterruptedException ex) {
                  cancelledWaitingWriter();
                  WriterLock.this.notifyAll();
                  ie = ex;
                  break;
               }

               if (waitingUpgrader_ != 0) { // Has upgrade request
                  if (upgraderLocal_.get(ReadWriteLockWithUpgrade.this) != null) { // Upgrade thread
                     if (startWriteFromWaitingWriter())
                        return true;
                  }
                  else { // Normal write thread, go back to wait.
                     continue;
                  }
               }
               else { // Every one is normal write thread. Compete; if fail go back to wait.
                  if (startWriteFromWaitingWriter())
                     return true;
               }

               waitTime = msecs - (System.currentTimeMillis() - start);
               if (waitTime <= 0) {
                  cancelledWaitingWriter();
                  WriterLock.this.notifyAll();
                  break;
               }
            }
         }
      }
      


      The else branch commented with "// Normal write thread, go back to wait." continues the loop without recalculating the remaining wait time, so the timeout check at the bottom of the loop is never reached on that path and the thread can keep going back to wait indefinitely. I'm not sure whether that is the actual issue here, though.

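      For comparison, the usual way to write such a timed wait so that no branch can sidestep the deadline is to recompute the remaining time at the top of the loop. The snippet below is only an illustration of that pattern, not the actual JBC code or an actual fix:

      // Self-contained illustration (not JBoss Cache code): a timed wait loop that
      // recomputes the remaining time before every wait, so even a "go back to wait"
      // path cannot exceed the caller's overall timeout.
      public class BoundedWaitExample {
         private final Object monitor = new Object();
         private boolean ready;

         public boolean awaitReady(long msecs) throws InterruptedException {
            long deadline = System.currentTimeMillis() + msecs;
            synchronized (monitor) {
               while (!ready) {
                  long remaining = deadline - System.currentTimeMillis();
                  if (remaining <= 0)
                     return false; // the overall timeout is honoured on every path
                  monitor.wait(remaining);
               }
               return true;
            }
         }

         public void markReady() {
            synchronized (monitor) {
               ready = true;
               monitor.notifyAll();
            }
         }

         public static void main(String[] args) throws Exception {
            // Nobody ever calls markReady(), so this returns false after ~500 ms
            // instead of waiting forever.
            System.out.println(new BoundedWaitExample().awaitReady(500));
         }
      }
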
      Has anyone seen this before? I've tried to search for it, but could not find any prior reports.

        • 1. Re: Eviction thread causes deadlock?
          manik

           This is pretty old code; have you tried a newer version of JBC?

          • 2. Re: Eviction thread causes deadlock?
            atijms


            "manik.surtani@jboss.com" wrote:
            This is pretty old code; have you tried a newer version of JBC?


             I haven't tried that yet. This problem also just happened for the first time ever. The code has been running stably ever since JBC 1.3 was released (it was the latest version at the time we developed and deployed the code).

             Since we're going to upgrade JBC soon anyway, it's tempting to just ignore this occurrence for now. I just wanted to see whether anyone has run into this problem before.