PingStressTests hanging
clebert.suconic  Mar 5, 2009 10:01 AM

It's not formally a deadlock...
But if you enable PingStressTest (getNumberOfIterations() > 10), you will see one thread waiting for the executor to finish, and another thread waiting to acquire a lock:
"LocalThread i = 10" prio=10 tid=0x00007fc50035c000 nid=0x6484 waiting on condition [0x0000000040849000..0x0000000040849c00] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00007fc50a5eebe8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963) at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1244) at org.jboss.messaging.integration.transports.netty.NettyConnector.close(NettyConnector.java:363) - locked <0x00007fc50a6114d0> (a org.jboss.messaging.integration.transports.netty.NettyConnector) at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.checkCloseConnections(ConnectionManagerImpl.java:761) at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.returnConnection(ConnectionManagerImpl.java:852) at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.removeSession(ConnectionManagerImpl.java:387) - locked <0x00007fc50a611708> (a java.lang.Object) - locked <0x00007fc50a6116f8> (a java.lang.Object) at org.jboss.messaging.core.client.impl.ClientSessionImpl.doCleanup(ClientSessionImpl.java:1313) at org.jboss.messaging.core.client.impl.ClientSessionImpl.close(ClientSessionImpl.java:761) at org.jboss.messaging.tests.stress.remote.PingStressTest$1LocalThread.run(PingStressTest.java:248) "Thread-1 (group:jbm-pinger-threads-896472140)" daemon prio=10 tid=0x00007fc500385c00 nid=0x6463 waiting for monitor entry [0x0000000045ece000..0x0000000045ecec00] java.lang.Thread.State: BLOCKED (on object monitor) at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.failover(ConnectionManagerImpl.java:493) - waiting to lock <0x00007fc50a611708> (a java.lang.Object) at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.connectionFailed(ConnectionManagerImpl.java:411) at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callListeners(RemotingConnectionImpl.java:530) at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:421) at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$Pinger.run(RemotingConnectionImpl.java:1537) - locked <0x00007fc50a614508> (a org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$Pinger) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) "LocalThread i = 10" prio=10 tid=0x00007fc50035c000 nid=0x6484 waiting on condition [0x0000000040849000..0x0000000040849c00] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00007fc50a5eebe8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963) at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1244) at org.jboss.messaging.integration.transports.netty.NettyConnector.close(NettyConnector.java:363) - locked <0x00007fc50a6114d0> (a org.jboss.messaging.integration.transports.netty.NettyConnector) at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.checkCloseConnections(ConnectionManagerImpl.java:761) at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.returnConnection(ConnectionManagerImpl.java:852) at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.removeSession(ConnectionManagerImpl.java:387) - locked <0x00007fc50a611708> (a java.lang.Object) - locked <0x00007fc50a6116f8> (a java.lang.Object) at org.jboss.messaging.core.client.impl.ClientSessionImpl.doCleanup(ClientSessionImpl.java:1313) at org.jboss.messaging.core.client.impl.ClientSessionImpl.close(ClientSessionImpl.java:761) at org.jboss.messaging.tests.stress.remote.PingStressTest$1LocalThread.run(PingStressTest.java:248)
NettyConnector.close is waiting for the executor to finish,
but the Pinger is waiting for the lock while its task is still occupying that executor, so neither thread can make progress.
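The cycle is easy to reproduce outside JBM. Here is a minimal standalone sketch (not JBM code; all names are made up) of the same pattern: one thread takes a lock and then waits for an executor to terminate, while the executor's only task is blocked on that same lock:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal standalone reproduction of the cycle (hypothetical names, not JBM code).
public class ExecutorLockHang
{
   public static void main(String[] args) throws InterruptedException
   {
      // Plays the role of the ConnectionManagerImpl lock <0x00007fc50a611708>
      final Object lock = new Object();

      // Plays the role of the jbm-pinger-threads pool
      ExecutorService executor = Executors.newSingleThreadExecutor();

      synchronized (lock) // the "closing" thread takes the lock first...
      {
         executor.execute(new Runnable() // ...then the "Pinger" task starts...
         {
            public void run()
            {
               synchronized (lock) // ...and blocks here, since the closer holds the lock
               {
               }
            }
         });

         executor.shutdown();

         // ...and the closer waits for the pool to drain, which it never will
         // while we hold the lock: awaitTermination times out and returns false.
         boolean finished = executor.awaitTermination(5, TimeUnit.SECONDS);
         System.out.println("executor finished? " + finished);
      }
   }
}

The awaitTermination call can never succeed while the lock is held, because the pool cannot drain until the task gets the lock, which is exactly the cycle in the dump above.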
To fix this I would need to run the Pinger cleanup on a different thread. The hang was resolved with this change:
Index: src/main/org/jboss/messaging/core/remoting/impl/RemotingConnectionImpl.java
===================================================================
--- src/main/org/jboss/messaging/core/remoting/impl/RemotingConnectionImpl.java	(revision 6007)
+++ src/main/org/jboss/messaging/core/remoting/impl/RemotingConnectionImpl.java	(working copy)
@@ -417,10 +417,17 @@
 
       log.warn("Connection failed, client " + client + " " + System.identityHashCode(this) + " " + me.getMessage(), me);
 
-      // Then call the listeners
-      callListeners(me);
+      // Using another thread to avoid a hang or deadlock (another thread waiting for an executor to finish, while it holds the lock)
+      new Thread()
+      {
+         public void run()
+         {
+            // Then call the listeners
+            callListeners(me);
 
-      internalClose();
+            internalClose();
+         }
+      }.start();
    }
 
    public void destroy()
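A note on the design choice: spawning a raw Thread per failure breaks the cycle because the listener callbacks no longer run on the pool that close() waits to drain. If failures could be frequent, the same handoff could also be done with a small dedicated executor. A hedged sketch, assuming made-up names (ConnectionSketch, failureExecutor, and the stub methods are invented; this is not the actual JBM code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the same fix using a dedicated executor for failure cleanup.
public class ConnectionSketch
{
   // Must NOT be the pinger/remoting pool that close() waits on,
   // or the original wait-for cycle reappears.
   private final ExecutorService failureExecutor = Executors.newSingleThreadExecutor();

   public void fail(final Exception me)
   {
      failureExecutor.execute(new Runnable()
      {
         public void run()
         {
            callListeners(me); // free to take the connection-manager locks here
            internalClose();
         }
      });
   }

   private void callListeners(Exception me) { /* notify FailureListeners */ }

   private void internalClose() { /* release channels, stop pingers, ... */ }
}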
I found this while investigating another deadlock.
It is related to the waitToFinish change on the executors, but it's probably a different issue (different from what I originally thought).