8 Replies Latest reply on Mar 6, 2009 10:47 AM by trustin

    PingStressTests hanging

    clebert.suconic

      It' s not formally a deadlock...

      But if you enable PingStressTest (getNumberOfIterations() > 10), you will see one thread waiting the executor to finish, and another thread waiting to lock:

      "LocalThread i = 10" prio=10 tid=0x00007fc50035c000 nid=0x6484 waiting on condition [0x0000000040849000..0x0000000040849c00]
       java.lang.Thread.State: TIMED_WAITING (parking)
       at sun.misc.Unsafe.park(Native Method)
       - parking to wait for <0x00007fc50a5eebe8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
       at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
       at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1244)
       at org.jboss.messaging.integration.transports.netty.NettyConnector.close(NettyConnector.java:363)
       - locked <0x00007fc50a6114d0> (a org.jboss.messaging.integration.transports.netty.NettyConnector)
       at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.checkCloseConnections(ConnectionManagerImpl.java:761)
       at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.returnConnection(ConnectionManagerImpl.java:852)
       at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.removeSession(ConnectionManagerImpl.java:387)
       - locked <0x00007fc50a611708> (a java.lang.Object)
       - locked <0x00007fc50a6116f8> (a java.lang.Object)
       at org.jboss.messaging.core.client.impl.ClientSessionImpl.doCleanup(ClientSessionImpl.java:1313)
       at org.jboss.messaging.core.client.impl.ClientSessionImpl.close(ClientSessionImpl.java:761)
       at org.jboss.messaging.tests.stress.remote.PingStressTest$1LocalThread.run(PingStressTest.java:248)
      
      
      "Thread-1 (group:jbm-pinger-threads-896472140)" daemon prio=10 tid=0x00007fc500385c00 nid=0x6463 waiting for monitor entry [0x0000000045ece000..0x0000000045ecec00]
       java.lang.Thread.State: BLOCKED (on object monitor)
       at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.failover(ConnectionManagerImpl.java:493)
       - waiting to lock <0x00007fc50a611708> (a java.lang.Object)
       at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.connectionFailed(ConnectionManagerImpl.java:411)
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callListeners(RemotingConnectionImpl.java:530)
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:421)
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$Pinger.run(RemotingConnectionImpl.java:1537)
       - locked <0x00007fc50a614508> (a org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$Pinger)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
       at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
       at java.lang.Thread.run(Thread.java:619)
      
      
      
      "LocalThread i = 10" prio=10 tid=0x00007fc50035c000 nid=0x6484 waiting on condition [0x0000000040849000..0x0000000040849c00]
       java.lang.Thread.State: TIMED_WAITING (parking)
       at sun.misc.Unsafe.park(Native Method)
       - parking to wait for <0x00007fc50a5eebe8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
       at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
       at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1244)
       at org.jboss.messaging.integration.transports.netty.NettyConnector.close(NettyConnector.java:363)
       - locked <0x00007fc50a6114d0> (a org.jboss.messaging.integration.transports.netty.NettyConnector)
       at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.checkCloseConnections(ConnectionManagerImpl.java:761)
       at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.returnConnection(ConnectionManagerImpl.java:852)
       at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.removeSession(ConnectionManagerImpl.java:387)
       - locked <0x00007fc50a611708> (a java.lang.Object)
       - locked <0x00007fc50a6116f8> (a java.lang.Object)
       at org.jboss.messaging.core.client.impl.ClientSessionImpl.doCleanup(ClientSessionImpl.java:1313)
       at org.jboss.messaging.core.client.impl.ClientSessionImpl.close(ClientSessionImpl.java:761)
       at org.jboss.messaging.tests.stress.remote.PingStressTest$1LocalThread.run(PingStressTest.java:248)
      
      



      NettyConnector.close is waiting the executor to finish,

      But the Pinger is waiting the lock, while it holds the executor.



      To fix this I would need to run the Pinger cleanup on a different thread. This problem was fixed with this fix:


      Index: src/main/org/jboss/messaging/core/remoting/impl/RemotingConnectionImpl.java
      ===================================================================
      --- src/main/org/jboss/messaging/core/remoting/impl/RemotingConnectionImpl.java (revision 6007)
      +++ src/main/org/jboss/messaging/core/remoting/impl/RemotingConnectionImpl.java (working copy)
      @@ -417,10 +417,17 @@
      
       log.warn("Connection failed, client " + client + " " + System.identityHashCode(this) + " " + me.getMessage(), me);
      
      - // Then call the listeners
      - callListeners(me);
      + // Using another thread to avoid a hang or dead lock (another thread waiting an executor to f
      inish, while it holds the lock)
      + new Thread()
      + {
      + public void run()
      + {
      + // Then call the listeners
      + callListeners(me);
      
      - internalClose();
      + internalClose();
      + }
      + }.start();
       }
      
       public void destroy()
      
      



      I found this while investigating another deadlock.

      It is related to the waitToFinish change on the executors, but it s probably a different issue. (different from what I originally thought)

        • 1. Re: PingStressTests hanging
          clebert.suconic

          I added some hacking to JBMFactory, to print threads that are still alive...

          And this thread is the one holding the Executor:

          This thread is still created by the factory : Thread[New I/O client worker #1268-1,5,jbm-netty-connector-worker-threads-29494967] name = New I/O client worker #1268-1
          Trace : org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:169)
          Trace : org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:72)
          Trace : java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
          Trace : java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
          Trace : org.jboss.messaging.utils.JBMThreadFactory$1.run(JBMThreadFactory.java:88)
          Trace : java.lang.Thread.run(Thread.java:619)
          


          • 2. Re: PingStressTests hanging
            clebert.suconic

            Trustin, If I could have some help with this:

            I've been running org.jboss.messaging.tests.stress.remote.PingStressTest (change getNumberOfIterations to 100).

            One thread will fail with Ping, while other threads are also trying to close the connections.

            The Pinger thread will be waiting the Executors to shutdown, but a thread will stay alive forever. (Look at my previous post).

            Looking at NioWorker, I can see that selector.keys() is not empty hence the shutdown check is never made.

            I'm not sure if there is something that should be closed after the ping failure.

            If you want to verify what thread is holding the executor, this is the patch for the hack I've done here locally. (which I don't intend to commit it.. it's just for debug purposes):


            Index: src/main/org/jboss/messaging/utils/JBMThreadFactory.java
            ===================================================================
            --- src/main/org/jboss/messaging/utils/JBMThreadFactory.java (revision 6014)
            +++ src/main/org/jboss/messaging/utils/JBMThreadFactory.java (working copy)
            @@ -37,7 +37,31 @@
            
             private final AtomicInteger threadCount = new AtomicInteger(0);
            
            + private final ConcurrentSet<Thread> threads = new ConcurrentHashSet<Thread>();
            +
            + private final String groupName;
            +
            +
            + public void debug()
            + {
            + if (threads.isEmpty())
            + {
            + System.out.println("Thread factory is empty");
            + }
            + for (Thread t : threads)
            + {
            + System.out.println("This thread is still created by the factory : " + t + " name = " + t.getName());
            + StackTraceElement elements[] = t.getStackTrace();
            + for (StackTraceElement trace : elements)
            + {
            + System.out.println("Trace : " + trace);
            + }
            + }
            + }
            +
            +
             private final int threadPriority;
            +
             public JBMThreadFactory(final String groupName)
             {
             this(groupName, Thread.NORM_PRIORITY);
            @@ -47,11 +71,27 @@
             {
             group = new ThreadGroup(groupName + "-" + System.identityHashCode(this));
             threadPriority = priority;
            + this.groupName = groupName;
             }
            
             public Thread newThread(final Runnable command)
             {
            - Thread t = new Thread(group, command, "Thread-" + threadCount.getAndIncrement() +
            + Runnable myRunnable = new Runnable()
            + {
            + public void run()
            + {
            + try
            + {
            + command.run();
            + }
            + finally
            + {
            + threads.remove(Thread.currentThread());
            + }
            + }
            + };
            +
            + Thread t = new Thread(group, myRunnable, "Thread-" + threadCount.getAndIncrement() +
             " (group:" +
             group.getName() +
             ")");
            @@ -59,6 +99,7 @@
             // Don't want to prevent VM from exiting
             t.setDaemon(true);
             t.setPriority(threadPriority);
            + threads.add(t);
             return t;
             }
             }
            Index: src/main/org/jboss/messaging/integration/transports/netty/NettyConnector.java
            ===================================================================
            --- src/main/org/jboss/messaging/integration/transports/netty/NettyConnector.java (revision 6014)
            +++ src/main/org/jboss/messaging/integration/transports/netty/NettyConnector.java (working copy)
            @@ -90,7 +90,10 @@
             public class NettyConnector implements Connector
             {
             // Constants -----------------------------------------------------
            +
            + static JBMThreadFactory workerFactory = new JBMThreadFactory("jbm-netty-connector-worker-threads");
            
            +
             private static final Logger log = Logger.getLogger(NettyConnector.class);
            
             // Attributes ----------------------------------------------------
            @@ -236,7 +239,7 @@
             return;
             }
            
            - workerExecutor = Executors.newCachedThreadPool(new JBMThreadFactory("jbm-netty-connector-worker-threads"));
            + workerExecutor = Executors.newCachedThreadPool(workerFactory);
             if (useServlet)
             {
             bossExecutor = Executors.newCachedThreadPool(new JBMThreadFactory("jbm-netty-connector-boss-threads"));
            @@ -360,10 +363,14 @@
             {
             try
             {
            - if (workerExecutor.awaitTermination(1, TimeUnit.SECONDS))
            + if (workerExecutor.awaitTermination(5, TimeUnit.SECONDS))
             {
             break;
             }
            + else
            + {
            + workerFactory.debug();
            + }
             }
             catch (InterruptedException e)
             {
            




            • 3. Re: PingStressTests hanging
              trustin

              NioWorker will not shut itself down as long as there's any unfinished connection. If it just exits the loop when requested, resources will leak (i.e. some sockets will remain open.) Therefore, you need to make sure all connections are closed before shutting down the Executors.

              When shutting down the Executors, you must make sure that you are not waiting for the termination (ExecutorService.awaitTermination()) of the Executors in the worker thread. It's because you can't wait for the termination of the thread you are running on.

              • 4. Re: PingStressTests hanging
                clebert.suconic

                As far as I know, we are closing all the connections.

                And the awaitTermination is done in another thread.

                • 5. Re: PingStressTests hanging
                  timfox

                  Here's the full dump for completeness:

                  Full thread dump Java HotSpot(TM) Server VM (1.5.0_13-b05 mixed mode):
                  
                  "New I/O client worker #202-1" daemon prio=1 tid=0x0885c330 nid=0x48ba runnable [0x8a16f000..0x8a16fdc0]
                   at sun.nio.ch.PollArrayWrapper.poll0(Native Method)
                   at sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
                   at sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
                   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
                   - locked <0x91ab37a0> (a sun.nio.ch.Util$1)
                   - locked <0x91ab3790> (a java.util.Collections$UnmodifiableSet)
                   - locked <0x91ab33e0> (a sun.nio.ch.PollSelectorImpl)
                   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
                   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:158)
                   at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:72)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
                   at java.lang.Thread.run(Thread.java:595)
                  
                  "New I/O server worker #9-2" daemon prio=1 tid=0x083af868 nid=0x4898 runnable [0x89fec000..0x89fecec0]
                   at sun.nio.ch.PollArrayWrapper.poll0(Native Method)
                   at sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
                   at sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
                   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
                   - locked <0x91aa4ee0> (a sun.nio.ch.Util$1)
                   - locked <0x91aa4ed0> (a java.util.Collections$UnmodifiableSet)
                   - locked <0x91aa4cf0> (a sun.nio.ch.PollSelectorImpl)
                   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
                   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:158)
                   at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:72)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
                   at java.lang.Thread.run(Thread.java:595)
                  
                  "Thread-0 (group:jbm-netty-acceptor-worker-threads-2547660)" daemon prio=1 tid=0x0849d6f8 nid=0x4893 waiting on condition [0x8bcfe000..0x8bcff140]
                   at sun.misc.Unsafe.park(Native Method)
                   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:807)
                   at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1122)
                   at java.util.concurrent.SynchronousQueue$Node.waitForPut(SynchronousQueue.java:291)
                   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:443)
                   at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:475)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                   at java.lang.Thread.run(Thread.java:595)
                  
                  "LocalThread i = 16" prio=1 tid=0x8bf68528 nid=0x4884 waiting on condition [0x8b9fc000..0x8b9fd0c0]
                   at sun.misc.Unsafe.park(Native Method)
                   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1879)
                   at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1028)
                   at org.jboss.messaging.integration.transports.netty.NettyConnector.close(NettyConnector.java:363)
                   - locked <0x91ab5140> (a org.jboss.messaging.integration.transports.netty.NettyConnector)
                   at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.checkCloseConnections(ConnectionManagerImpl.java:774)
                   at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.returnConnection(ConnectionManagerImpl.java:865)
                   at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.removeSession(ConnectionManagerImpl.java:400)
                   - locked <0x91ab5358> (a java.lang.Object)
                   - locked <0x91ab5350> (a java.lang.Object)
                   at org.jboss.messaging.core.client.impl.ClientSessionImpl.doCleanup(ClientSessionImpl.java:1313)
                   at org.jboss.messaging.core.client.impl.ClientSessionImpl.close(ClientSessionImpl.java:761)
                   at org.jboss.messaging.tests.stress.remote.PingStressTest$1LocalThread.run(PingStressTest.java:242)
                  
                  "Timer-17" daemon prio=1 tid=0x8bf64e68 nid=0x4873 in Object.wait() [0x8bafe000..0x8baff140]
                   at java.lang.Object.wait(Native Method)
                   - waiting on <0x91ab9350> (a java.util.TaskQueue)
                   at java.util.TimerThread.mainLoop(Timer.java:509)
                   - locked <0x91ab9350> (a java.util.TaskQueue)
                   at java.util.TimerThread.run(Timer.java:462)
                  
                  "New I/O server boss #9 (channelId: 3075861, /127.0.0.1:5445)" daemon prio=1 tid=0x8bf40ef8 nid=0x4872 runnable [0x8a06d000..0x8a06ddc0]
                   at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
                   at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
                   - locked <0x91aa98a0> (a java.lang.Object)
                   at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:204)
                   at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:72)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
                   at java.lang.Thread.run(Thread.java:595)
                  
                  "Timer-16" daemon prio=1 tid=0x8bf6bc38 nid=0x4871 in Object.wait() [0x8befe000..0x8befee40]
                   at java.lang.Object.wait(Native Method)
                   - waiting on <0x91aa0db8> (a java.util.TaskQueue)
                   at java.util.TimerThread.mainLoop(Timer.java:509)
                   - locked <0x91aa0db8> (a java.util.TaskQueue)
                   at java.util.TimerThread.run(Timer.java:462)
                  
                  "Thread-0 (group:JBM-scheduled-threads-32455885)" daemon prio=1 tid=0x8bf41ce8 nid=0x4870 waiting on condition [0x8c47d000..0x8c47dec0]
                   at sun.misc.Unsafe.park(Native Method)
                   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1879)
                   at java.util.concurrent.DelayQueue.take(DelayQueue.java:135)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:504)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:497)
                   at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                   at java.lang.Thread.run(Thread.java:595)
                  
                  "Thread-4 (group:jbm-pinger-threads-11110706)" daemon prio=1 tid=0x08766a78 nid=0x45ab waiting on condition [0x899e0000..0x899e0f40]
                   at sun.misc.Unsafe.park(Native Method)
                   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118)
                   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841)
                   at java.util.concurrent.DelayQueue.take(DelayQueue.java:131)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:504)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:497)
                   at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                   at java.lang.Thread.run(Thread.java:595)
                  
                  "Thread-3 (group:jbm-pinger-threads-11110706)" daemon prio=1 tid=0x08763950 nid=0x45a9 waiting on condition [0x89a61000..0x89a62040]
                   at sun.misc.Unsafe.park(Native Method)
                   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118)
                   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841)
                   at java.util.concurrent.DelayQueue.take(DelayQueue.java:131)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:504)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:497)
                   at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                   at java.lang.Thread.run(Thread.java:595)
                  
                  "Thread-2 (group:jbm-pinger-threads-11110706)" daemon prio=1 tid=0x0869b8e8 nid=0x45a7 waiting on condition [0x89b63000..0x89b64140]
                   at sun.misc.Unsafe.park(Native Method)
                   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118)
                   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841)
                   at java.util.concurrent.DelayQueue.take(DelayQueue.java:131)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:504)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:497)
                   at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                   at java.lang.Thread.run(Thread.java:595)
                  
                  "Thread-1 (group:jbm-pinger-threads-11110706)" daemon prio=1 tid=0x8bf8b988 nid=0x45a5 waiting for monitor entry [0x89c65000..0x89c65e40]
                   at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.failover(ConnectionManagerImpl.java:506)
                   - waiting to lock <0x91ab5358> (a java.lang.Object)
                   at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.connectionFailed(ConnectionManagerImpl.java:424)
                   at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callListeners(RemotingConnectionImpl.java:530)
                   at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:421)
                   at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$Pinger.run(RemotingConnectionImpl.java:1549)
                   - locked <0x91ab5648> (a org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$Pinger)
                   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
                   at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
                   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:135)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:65)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:142)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:166)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
                   at java.lang.Thread.run(Thread.java:595)
                  
                  "Thread-0 (group:jbm-pinger-threads-11110706)" daemon prio=1 tid=0x8bf6e2c0 nid=0x45a0 waiting on condition [0x89eea000..0x89eeb0c0]
                   at sun.misc.Unsafe.park(Native Method)
                   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118)
                   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841)
                   at java.util.concurrent.DelayQueue.take(DelayQueue.java:131)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:504)
                   at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:497)
                   at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                   at java.lang.Thread.run(Thread.java:595)
                  
                  "ReaderThread" prio=1 tid=0x8c50f3d0 nid=0x4566 runnable [0x8c4fe000..0x8c4fedc0]
                   at java.net.SocketInputStream.socketRead0(Native Method)
                   at java.net.SocketInputStream.read(SocketInputStream.java:129)
                   at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
                   at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
                   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
                   - locked <0x919fab30> (a java.io.InputStreamReader)
                   at java.io.InputStreamReader.read(InputStreamReader.java:167)
                   at java.io.BufferedReader.fill(BufferedReader.java:136)
                   at java.io.BufferedReader.readLine(BufferedReader.java:299)
                   - locked <0x919fab30> (a java.io.InputStreamReader)
                   at java.io.BufferedReader.readLine(BufferedReader.java:362)
                   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner$ReaderThread.run(RemoteTestRunner.java:140)
                  
                  "Low Memory Detector" daemon prio=1 tid=0x08115ea8 nid=0x4564 runnable [0x00000000..0x00000000]
                  
                  "CompilerThread1" daemon prio=1 tid=0x081149d0 nid=0x4563 waiting on condition [0x00000000..0x8d271168]
                  
                  "CompilerThread0" daemon prio=1 tid=0x08113920 nid=0x4562 waiting on condition [0x00000000..0x8d2f2128]
                  
                  "AdapterThread" daemon prio=1 tid=0x081127a0 nid=0x4561 waiting on condition [0x00000000..0x00000000]
                  
                  "Signal Dispatcher" daemon prio=1 tid=0x081118d0 nid=0x4560 waiting on condition [0x00000000..0x00000000]
                  
                  "Finalizer" daemon prio=1 tid=0x081064c0 nid=0x455e in Object.wait() [0x8d4bb000..0x8d4bc140]
                   at java.lang.Object.wait(Native Method)
                   - waiting on <0x91907370> (a java.lang.ref.ReferenceQueue$Lock)
                   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
                   - locked <0x91907370> (a java.lang.ref.ReferenceQueue$Lock)
                   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
                   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
                  
                  "Reference Handler" daemon prio=1 tid=0x08105f08 nid=0x455d in Object.wait() [0x8d53c000..0x8d53cdc0]
                   at java.lang.Object.wait(Native Method)
                   - waiting on <0x9197f6c8> (a java.lang.ref.Reference$Lock)
                   at java.lang.Object.wait(Object.java:474)
                   at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
                   - locked <0x9197f6c8> (a java.lang.ref.Reference$Lock)
                  
                  "main" prio=1 tid=0x0805fd70 nid=0x4554 in Object.wait() [0xbfd53000..0xbfd53bb8]
                   at java.lang.Object.wait(Native Method)
                   - waiting on <0x91aa3ce8> (a org.jboss.messaging.tests.stress.remote.PingStressTest$1LocalThread)
                   at java.lang.Thread.join(Thread.java:1095)
                   - locked <0x91aa3ce8> (a org.jboss.messaging.tests.stress.remote.PingStressTest$1LocalThread)
                   at java.lang.Thread.join(Thread.java:1148)
                   at org.jboss.messaging.tests.stress.remote.PingStressTest.internalTest(PingStressTest.java:266)
                   at org.jboss.messaging.tests.stress.remote.PingStressTest.testMultiThreadOpenAndCloses(PingStressTest.java:111)
                   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                   at java.lang.reflect.Method.invoke(Method.java:585)
                   at junit.framework.TestCase.runTest(TestCase.java:164)
                   at junit.framework.TestCase.runBare(TestCase.java:130)
                   at junit.framework.TestResult$1.protect(TestResult.java:106)
                   at junit.framework.TestResult.runProtected(TestResult.java:124)
                   at junit.framework.TestResult.run(TestResult.java:109)
                   at junit.framework.TestCase.run(TestCase.java:120)
                   at junit.framework.TestSuite.runTest(TestSuite.java:230)
                   at junit.framework.TestSuite.run(TestSuite.java:225)
                   at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
                   at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
                   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
                   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
                   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
                   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
                  
                  "VM Thread" prio=1 tid=0x081039f0 nid=0x455c runnable
                  
                  "GC task thread#0 (ParallelGC)" prio=1 tid=0x0807a7b8 nid=0x455a runnable
                  
                  "GC task thread#1 (ParallelGC)" prio=1 tid=0x0807b408 nid=0x455b runnable
                  
                  "VM Periodic Task Thread" prio=1 tid=0x08071d58 nid=0x4565 waiting on condition
                  


                  You can see we have a thread awaiting termination of the netty worker thread pool.

                  There are also two netty worker threads and one netty acceptor thread in the dump.

                  Both netty worker threads are just sitting on a select:

                  at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)

                  They're not doing anything special and the thread that calls awaitTermination is not a netty worker thread.

                  My hunch is there is a race condition in netty, so when shutdownNow() is called, the thread is interrupted but it doesn't exit the loop properly.

                  This can be easily replicated by running PingStressTest until it hangs and doing a killall -3 to get the dump.




                  • 6. Re: PingStressTests hanging
                    trustin

                    I've just checked in the fix: http://fisheye.jboss.org/changelog/Messaging/trunk?cs=6017

                    To shut down a ChannelFactory, all Netty channels must be closed, including server sockets and accepted / connected sockets. I made sure all channels are added to a ChannelGroup so that they are closed before ChannelFactory is shut down.

                    To achieve this behavior, I modified MessagingChannelHandler to override channelOpen() handler method and added a Channel to a ChannelGroup which was specified in the constructor. Because the previous NettyConnector implementation didn't have a ChannelGroup field, I added one and called ChannelGroup.close() to clean up all connections.

                    On the other hand, there's a convenience method called 'releaseExternalResources()' in ChannelFactory. It shuts down all Executors which were specified in the ChannelFactory's constructor. I replaced the verbose shutdown / awaitTermination loop with that. I hope you like it.

                    After the modification, I confirmed the PingStressTest passes with no freeze. With 100 iterations, it took about 3 minutes to finish in my Intel Core 2 Quad processor.

                    • 7. Re: PingStressTests hanging
                      timfox

                      Thanks Trustin.

                      As always, fixed promptly :)

                      • 8. Re: PingStressTests hanging
                        trustin

                        You're welcome. Anytime!