Eviction thread causes deadlock?
atijms Jul 3, 2008 10:14 AMHi,
I'm using version 1.3.0 of Jboss cache. Recently one of our production servers was hanging. A stack trace showed that the evictor thread seemed to be stuck:
"Timer-1" daemon prio=10 tid=0x08ffdc00 nid=0x1c85 in Object.wait() [0x75813000..0x75813ec0] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x81bbe008> (a org.jboss.cache.lock.ReadWriteLockWithUpgrade$WriterLock) at org.jboss.cache.lock.ReadWriteLockWithUpgrade$WriterLock.attempt(ReadWriteLockWithUpgrade.java:389) - locked <0x81bbe008> (a org.jboss.cache.lock.ReadWriteLockWithUpgrade$WriterLock) at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:195) at org.jboss.cache.Node.acquireWriteLock(Node.java:431) at org.jboss.cache.Node.acquire(Node.java:386) at org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:228) at org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java:166) at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68) at org.jboss.cache.interceptors.UnlockInterceptor.invoke(UnlockInterceptor.java:32) at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68) at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:345) at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:156) at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68) at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:179) at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5520) at org.jboss.cache.TreeCache.evict(TreeCache.java:3716) at org.jboss.cache.eviction.BaseEvictionPolicy.evict(BaseEvictionPolicy.java:34) at org.jboss.cache.eviction.BaseEvictionAlgorithm.evictCacheNode(BaseEvictionAlgorithm.java:210) at org.jboss.cache.eviction.BaseEvictionAlgorithm.emptyRecycleQueue(BaseEvictionAlgorithm.java:461) at org.jboss.cache.eviction.BaseEvictionAlgorithm.process(BaseEvictionAlgorithm.java:97) at org.jboss.cache.eviction.EvictionTimerTask.run(EvictionTimerTask.java:80) - locked <0x81a20490> (a org.jboss.cache.eviction.Region) - locked <0x7ec74670> (a java.util.Collections$SynchronizedSet) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462)
Even though the time out has been set to 120000 milliseconds, it seemed to be stuck there forever.
The stack trace of one of the many threads that was waiting for a lock on the SynchronizedSet object:
"TP-Processor19" daemon prio=10 tid=0x09523400 nid=0x3b0f waiting for monitor entry [0x755b1000..0x755b3140] java.lang.Thread.State: BLOCKED (on object monitor) at java.util.Collections$SynchronizedCollection.add(Collections.java:1577) - waiting to lock <0x7ec74670> (a java.util.Collections$SynchronizedSet) at org.jboss.cache.eviction.EvictionTimerTask.addRegionToProcess(EvictionTimerTask.java:46) at org.jboss.cache.eviction.RegionManager.createRegion(RegionManager.java:154) at com.mbuyu.tableflow.cache.std.TreeCacheProvider.createLocalCache(TreeCacheProvider.java:139) - locked <0x822dd4b8> (a org.apache.catalina.session.StandardSessionFacade) - locked <0x822dd4c8> (a com.mbuyu.tableflow.cache.std.TreeCacheProvider) at com.mbuyu.tableflow.cache.std.TreeCacheProvider.getCache(TreeCacheProvider.java:45) at com.mbuyu.tableflow.cache.TableFlowCacheFactory.getCache(TableFlowCacheFactory.java:64) at com.mbuyu.tableflow.util.TableFlowProvider.<init>(TableFlowProvider.java:57) at com.mbuyu.tableflow.jsf.UITableFlow.encodeEnd(UITableFlow.java:134)
I took a look at the source code. Naively spoken there seems to be an opportunity for an infinite loop:
synchronized (WriterLock.this) { if (msecs <= 0) { // Upgrade thread has prioirty. if (waitingUpgrader_ != 0) { if (upgraderLocal_.get(ReadWriteLockWithUpgrade.this) != null) { log_.info("attempt(): upgrade to write lock"); return startWrite(); } else return false; } else return startWrite(); } else if (startWriteFromNewWriter()) return true; else { long waitTime = msecs; long start = System.currentTimeMillis(); while(true) { try { WriterLock.this.wait(waitTime); } catch(InterruptedException ex) { cancelledWaitingWriter(); WriterLock.this.notifyAll(); ie=ex; break; } if(waitingUpgrader_ != 0) { // Has upgrade request if(upgraderLocal_.get(ReadWriteLockWithUpgrade.this) != null) { // Upgrade thread if(startWriteFromWaitingWriter()) return true; } else { // Normal write thread, go back to wait. continue; } } else { // Every one is normal write thread. Compete; if fail go back to wait. if(startWriteFromWaitingWriter()) return true; } waitTime=msecs - (System.currentTimeMillis() - start); if(waitTime <= 0) { cancelledWaitingWriter(); WriterLock.this.notifyAll(); break; } } } }
the else statement commented with
"// Normal write thread, go back to wait." continues the loop without taking the timeout into account. I'm not sure whether this is the issue here though.
Has anyone seen this before? I've tried to search for it, but could not find any prior reports.