14 Replies Latest reply on Mar 27, 2009 12:30 PM by clebert.suconic

    intermittent failures i am seeing

    ataylor

      1. MessageChunkTest

      Timed out waiting for response when sending packet 43
      
      MessagingException[errorCode=3 message=Timed out waiting for response when sending packet 43]
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$ChannelImpl.sendBlocking(RemotingConnectionImpl.java:1107)
       at org.jboss.messaging.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:500)
       at org.jboss.messaging.tests.integration.chunkmessage.ChunkTestBase.testChunks(ChunkTestBase.java:191)
       at org.jboss.messaging.tests.integration.chunkmessage.ChunkTestBase.testChunks(ChunkTestBase.java:94)
       at org.jboss.messaging.tests.integration.chunkmessage.MessageChunkTest.testMessageChunkFilePersistenceBlocked(MessageChunkTest.java:300)
      


      2. JMSFailoverTest. sometimes the wrong exception is thrown, but it may be valid

      expected:<MessagingException[errorCode=2 message=null]> but was:<MessagingException[errorCode=102 message=The conn has been closed by the server]>
      
      junit.framework.AssertionFailedError: expected:<MessagingException[errorCode=2 message=null]> but was:<MessagingException[errorCode=102 message=The conn has been closed by the server]>
       at org.jboss.messaging.tests.integration.jms.cluster.JMSFailoverTest.testManualFailover(JMSFailoverTest.java:283)
      


      3. SymmetricClusterWithDiscoveryTest (i see this on different tests)

      Timed out waiting for bindings (bindingCount = 23, totConsumers = 22
      
      java.lang.IllegalStateException: Timed out waiting for bindings (bindingCount = 23, totConsumers = 22
       at org.jboss.messaging.tests.integration.cluster.distribution.ClusterTestBase.waitForBindings(ClusterTestBase.java:220)
       at org.jboss.messaging.tests.integration.cluster.distribution.SymmetricClusterTest.testMixtureLoadBalancedAndNonLoadBalancedQueues(SymmetricClusterTest.java:428)
      




        • 1. Re: intermittent failures i am seeing
          clebert.suconic

           

          1. MessageChunkTest


          This test will use a lot of disk. *Maybe* if you are using your disk for something else or if your disk is slow, this test could fail.

          I have ran it for 100 times.. no failures. But I will keep an eye on the test.


          2) I have seen it on Hudson. We should investigate it.


          3) This was raised as part of https://jira.jboss.org/jira/browse/JBMESSAGING-1551. It may be related to a different issue besides of Redistribution but we don't know yet.



          • 2. Re: intermittent failures i am seeing
            timfox

             

            "ataylor" wrote:


            2. JMSFailoverTest. sometimes the wrong exception is thrown, but it may be valid
            




            I have run this over 600 times now in a loop with no failures.

            • 3. Re: intermittent failures i am seeing
              clebert.suconic

              +PagingFailoverTests (after the latest clustering changes).

              The stresstestsuite will confirm that.

              • 4. Re: intermittent failures i am seeing
                clebert.suconic

                ant -Dtest-mask=ClientSessionStopStartTest integration-tests

                The test will have two failures one in four executions.

                It didn't show up on Hudson yet.. probably because of ClassLoading not slowing down the test when part of the whole testsuite, but the test shouldn't fail in any circunstance.

                (I have talked to Andy on IRC about this today already.. just wanted to update this thread).

                • 5. Re: intermittent failures i am seeing
                  ataylor

                  Clebert, Ive ran this in a loop for 100's of times withouit failures. can you apply the following patch.

                  Index: tests/src/org/jboss/messaging/tests/integration/client/ClientSessionStopStartTest.java
                  ===================================================================
                  --- tests/src/org/jboss/messaging/tests/integration/client/ClientSessionStopStartTest.java (revision 6180)
                  +++ tests/src/org/jboss/messaging/tests/integration/client/ClientSessionStopStartTest.java Fri Mar 27 08:15:18 GMT 2009
                  @@ -590,10 +590,13 @@
                   assertNull(cm);
                  
                   session.start();
                  + System.out.println("consumer = " + consumer);
                   cm = consumer.receive(5000);
                   assertNotNull(cm);
                  + System.out.println("consumer2 = " + consumer2);
                   cm = consumer2.receive(5000);
                   assertNotNull(cm);
                  + System.out.println("consumer3 = " + consumer3);
                   cm = consumer3.receive(5000);
                   assertNotNull(cm);
                   session.close();
                  Index: src/main/org/jboss/messaging/core/client/impl/ClientConsumerImpl.java
                  ===================================================================
                  --- src/main/org/jboss/messaging/core/client/impl/ClientConsumerImpl.java (revision 6173)
                  +++ src/main/org/jboss/messaging/core/client/impl/ClientConsumerImpl.java Fri Mar 27 08:15:18 GMT 2009
                  @@ -763,6 +763,11 @@
                   channel.write(ByteBuffer.wrap(body));
                   }
                  
                  + public String toString()
                  + {
                  + return "messages in buffer = " + buffer.size();
                  + }
                  +
                   // Inner classes
                   // --------------------------------------------------------------------------------
                  
                  


                  i would always see

                  consumer = messages in buffer = 24
                  consumer2 = messages in buffer = 24
                  consumer3 = messages in buffer = 24


                  Since ive set the clientwindowsize it should guarantee that all the consumers receive messages and have messages in there buffer.

                  • 6. Re: intermittent failures i am seeing
                    clebert.suconic

                    It failed yesterday on hudson:

                    http://hudson.qa.jboss.com/hudson/view/JBM%202/job/JBM2-tests/4197/testReport/org.jboss.messaging.tests.integration.client/ClientSessionStopStartTest/testStopStartMultipleConsumers/

                    (It failed before I published the build, so I won't have a public link for this one ATM.. just the internal link)

                    • 7. Re: intermittent failures i am seeing
                      clebert.suconic

                      I kept calling "ant -Dtest-mask=ClientSessionStopStartTest integration-tests" until it failed... (about 4 times):

                       [echo]
                       [junit] Running org.jboss.messaging.tests.integration.client.ClientSessionStopStartTest
                       [junit] main 08:20:24,939 INFO [UnitTestCase] ###### starting test testStopStartConsumerSyncReceiveImmediate
                       [junit] main 08:20:25,019 WARN [SecurityStoreImpl] It has been detected that the cluster admin password which is used to replicate management operation from one node to the other has not had its password changed from the installation default. Please see the JBoss Messaging user guide for instructions on how to do this.
                       [junit] main 08:20:25,284 INFO [UnitTestCase] ###### starting test testStopStartConsumerSyncReceive
                       [junit] main 08:20:25,285 WARN [SecurityStoreImpl] It has been detected that the cluster admin password which is used to replicate management operation from one node to the other has not had its password changed from the installation default. Please see the JBoss Messaging user guide for instructions on how to do this.
                       [junit] main 08:20:26,316 INFO [UnitTestCase] ###### starting test testStopStartConsumerAsyncSyncStoppedByHandler
                       [junit] main 08:20:26,317 WARN [SecurityStoreImpl] It has been detected that the cluster admin password which is used to replicate management operation from one node to the other has not had its password changed from the installation default. Please see the JBoss Messaging user guide for instructions on how to do this.
                       [junit] main 08:20:26,463 INFO [UnitTestCase] ###### starting test testStopStartConsumerAsyncSync
                       [junit] main 08:20:26,464 WARN [SecurityStoreImpl] It has been detected that the cluster admin password which is used to replicate management operation from one node to the other has not had its password changed from the installation default. Please see the JBoss Messaging user guide for instructions on how to do this.
                       [junit] Thread-3 (group:jbm-pinger-threads-169081233) 08:20:36,467 WARN [RemotingConnectionImpl] Connection failure has been detected Did not receive pong from server, active true client true:2
                       [junit] main 08:20:36,480 WARN [ClientConsumerImpl] Timed out waiting for handler to complete processing
                       [junit] main 08:20:36,485 INFO [UnitTestCase] ###### starting test testStopStartConsumerAsyncASyncStoppeeByHandler
                       [junit] main 08:20:36,486 WARN [SecurityStoreImpl] It has been detected that the cluster admin password which is used to replicate management operation from one node to the other has not had its password changed from the installation default. Please see the JBoss Messaging user guide for instructions on how to do this.
                       [junit] main 08:20:36,712 INFO [UnitTestCase] ###### starting test testStopStartConsumerAsyncASync
                       [junit] main 08:20:36,713 WARN [SecurityStoreImpl] It has been detected that the cluster admin password which is used to replicate management operation from one node to the other has not had its password changed from the installation default. Please see the JBoss Messaging user guide for instructions on how to do this.
                       [junit] main 08:20:36,933 INFO [UnitTestCase] ###### starting test testStopStartMultipleConsumers
                       [junit] main 08:20:36,934 WARN [SecurityStoreImpl] It has been detected that the cluster admin password which is used to replicate management operation from one node to the other has not had its password changed from the installation default. Please see the JBoss Messaging user guide for instructions on how to do this.
                       [junit] consumer = messages in buffer = 2
                       [junit] consumer2 = messages in buffer = 4
                       [junit] consumer3 = messages in buffer = 5
                       [junit] Thread-29 08:20:36,962 WARN [RemotingConnectionImpl] Connection failure has been detected The connection has been closed by the server:102
                       [junit] Thread-29 08:20:36,962 WARN [RemotingConnectionImpl] Connection failure has been detected The connection has been closed by the server:102
                       [junit] main 08:20:36,964 INFO [UnitTestCase] ###### starting test testStopStartAlreadyStartedSession
                       [junit] main 08:20:36,964 WARN [SecurityStoreImpl] It has been detected that the cluster admin password which is used to replicate management operation from one node to the other has not had its password changed from the installation default. Please see the JBoss Messaging user guide for instructions on how to do this.
                       [junit] main 08:20:36,985 INFO [UnitTestCase] ###### starting test testStopAlreadyStoppedSession
                       [junit] main 08:20:36,986 WARN [SecurityStoreImpl] It has been detected that the cluster admin password which is used to replicate management operation from one node to the other has not had its password changed from the installation default. Please see the JBoss Messaging user guide for instructions on how to do this.
                       [junit] Tests run: 9, Failures: 0, Errors: 1, Time elapsed: 12.451 sec
                       [junit] Test org.jboss.messaging.tests.integration.client.ClientSessionStopStartTest FAILED
                      



                      This is one of the Failures that will eventually happen:


                      
                      Connection is destroyed
                      
                      MessagingException[errorCode=2 message=Connection is destroyed]
                       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$ChannelImpl.sendBlocking(RemotingConnectionImpl.java:1072)
                       at org.jboss.messaging.core.client.impl.ClientSessionImpl.stop(ClientSessionImpl.java:627)
                       at org.jboss.messaging.tests.integration.client.ClientSessionStopStartTest.testStopStartConsumerAsyncSync(ClientSessionStopStartTest.java:317
                      




                      I have also seen another Failure.... Let me try to reproduce the other one.

                      • 8. Re: intermittent failures i am seeing
                        clebert.suconic

                        I took a thread dump right before the exception:

                         [junit] Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.5.0_16-b02 mixed mode):
                         [junit]
                         [junit] "Thread-0 (group:JBM-async-session-delivery-threads-528574181)" daemon prio=1 tid=0x00007f07f41dda70 nid=0x57a6 waiting on condition [0x00000000424fc000..0x00000000424fce00]
                         [junit] at sun.misc.Unsafe.park(Native Method)
                         [junit] at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:807)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1122)
                         [junit] at java.util.concurrent.SynchronousQueue$Node.waitForPut(SynchronousQueue.java:291)
                         [junit] at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:443)
                         [junit] at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:475)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Thread-3 (group:jbm-pinger-threads-1155557696)" daemon prio=1 tid=0x00007f07f03772b0 nid=0x57a5 waiting on condition [0x00000000406a9000..0x00000000406a9d80]
                         [junit] at sun.misc.Unsafe.park(Native Method)
                         [junit] at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1879)
                         [junit] at java.util.concurrent.DelayQueue.take(DelayQueue.java:135)
                         [junit] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:504)
                         [junit] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:497)
                         [junit] at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Timer-7" daemon prio=1 tid=0x00007f07f03d77e0 nid=0x57a4 in Object.wait() [0x00000000423fb000..0x00000000423fbb00]
                         [junit] at java.lang.Object.wait(Native Method)
                         [junit] - waiting on <0x00007f0826208b20> (a java.util.TaskQueue)
                         [junit] at java.util.TimerThread.mainLoop(Timer.java:509)
                         [junit] - locked <0x00007f0826208b20> (a java.util.TaskQueue)
                         [junit] at java.util.TimerThread.run(Timer.java:462)
                         [junit]
                         [junit] "Timer-6" daemon prio=1 tid=0x00007f07f08162c0 nid=0x57a3 in Object.wait() [0x000000004020f000..0x000000004020fa80]
                         [junit] at java.lang.Object.wait(Native Method)
                         [junit] - waiting on <0x00007f0826204e58> (a java.util.TaskQueue)
                         [junit] at java.util.TimerThread.mainLoop(Timer.java:509)
                         [junit] - locked <0x00007f0826204e58> (a java.util.TaskQueue)
                         [junit] at java.util.TimerThread.run(Timer.java:462)
                         [junit]
                         [junit] "Thread-0 (group:JBM-scheduled-threads-1818873899)" daemon prio=1 tid=0x00007f07f0377cc0 nid=0x57a2 waiting on condition [0x00000000418c2000..0x00000000418c2c00]
                         [junit] at sun.misc.Unsafe.park(Native Method)
                         [junit] at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1879)
                         [junit] at java.util.concurrent.DelayQueue.take(DelayQueue.java:135)
                         [junit] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:504)
                         [junit] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:497)
                         [junit] at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Thread-1 (group:jbm-client-session-threads-1516725303)" daemon prio=1 tid=0x00007f07f0966250 nid=0x579d waiting for monitor entry [0x00000000407aa000..0x00000000407aad80]
                         [junit] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.setMessageHandler(ClientConsumerImpl.java:242)
                         [junit] - waiting to lock <0x00007f08262bfea8> (a org.jboss.messaging.core.client.impl.ClientConsumerImpl)
                         [junit] at org.jboss.messaging.tests.integration.client.ClientSessionStopStartTest$2MyHandler.onMessage(ClientSessionStopStartTest.java:301)
                         [junit] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:592)
                         [junit] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.access$300(ClientConsumerImpl.java:45)
                         [junit] at org.jboss.messaging.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:776)
                         [junit] at org.jboss.messaging.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:105)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Thread-2 (group:JBM-InVM-Transport-Threads-257798932)" daemon prio=1 tid=0x00007f07f41dca80 nid=0x579c waiting for monitor entry [0x00000000421f9000..0x00000000421f9b00]
                         [junit] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.handleMessage(ClientConsumerImpl.java:333)
                         [junit] - waiting to lock <0x00007f08262bfea8> (a org.jboss.messaging.core.client.impl.ClientConsumerImpl)
                         [junit] at org.jboss.messaging.core.client.impl.ClientSessionImpl.handleReceiveMessage(ClientSessionImpl.java:718)
                         [junit] at org.jboss.messaging.core.client.impl.ClientSessionPacketHandler.handlePacket(ClientSessionPacketHandler.java:84)
                         [junit] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$ChannelImpl.handlePacket(RemotingConnectionImpl.java:1535)
                         [junit] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$ChannelImpl.access$600(RemotingConnectionImpl.java:893)
                         [junit] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:522)
                         [junit] - locked <0x00007f082620c510> (a java.lang.Object)
                         [junit] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl$DelegatingBufferHandler.bufferReceived(ConnectionManagerImpl.java:1066)
                         [junit] at org.jboss.messaging.core.remoting.impl.invm.InVMConnection$1.run(InVMConnection.java:128)
                         [junit] at org.jboss.messaging.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:105)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Thread-2 (group:jbm-pinger-threads-1155557696)" daemon prio=1 tid=0x00007f07f037cc00 nid=0x579a waiting on condition [0x00000000408bc000..0x00000000408bcc00]
                         [junit] at sun.misc.Unsafe.park(Native Method)
                         [junit] at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1879)
                         [junit] at java.util.concurrent.DelayQueue.take(DelayQueue.java:135)
                         [junit] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:504)
                         [junit] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:497)
                         [junit] at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Thread-1 (group:jbm-pinger-threads-1155557696)" daemon prio=1 tid=0x00007f07f03758a0 nid=0x5791 waiting on condition [0x0000000041e23000..0x0000000041e23b80]
                         [junit] at sun.misc.Unsafe.park(Native Method)
                         [junit] at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1879)
                         [junit] at java.util.concurrent.DelayQueue.take(DelayQueue.java:135)
                         [junit] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:504)
                         [junit] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:497)
                         [junit] at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Thread-0 (group:jbm-client-session-threads-1516725303)" daemon prio=1 tid=0x00007f07f08b7b60 nid=0x5789 waiting on condition [0x00000000420f8000..0x00000000420f8b80]
                         [junit] at sun.misc.Unsafe.park(Native Method)
                         [junit] at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:807)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1122)
                         [junit] at java.util.concurrent.SynchronousQueue$Node.waitForPut(SynchronousQueue.java:291)
                         [junit] at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:443)
                         [junit] at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:475)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Thread-1 (group:JBM-InVM-Transport-Threads-257798932)" daemon prio=1 tid=0x0000000041a20550 nid=0x5787 waiting on condition [0x0000000040411000..0x0000000040411c80]
                         [junit] at sun.misc.Unsafe.park(Native Method)
                         [junit] at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:807)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1122)
                         [junit] at java.util.concurrent.SynchronousQueue$Node.waitForPut(SynchronousQueue.java:291)
                         [junit] at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:443)
                         [junit] at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:475)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Thread-0 (group:JBM-InVM-Transport-Threads-257798932)" daemon prio=1 tid=0x00007f07e8002a40 nid=0x5786 waiting on condition [0x0000000041ff7000..0x0000000041ff7e00]
                         [junit] at sun.misc.Unsafe.park(Native Method)
                         [junit] at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:807)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1122)
                         [junit] at java.util.concurrent.SynchronousQueue$Node.waitForPut(SynchronousQueue.java:291)
                         [junit] at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:443)
                         [junit] at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:475)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Thread-0 (group:jbm-pinger-threads-1155557696)" daemon prio=1 tid=0x00007f07e809d340 nid=0x5785 waiting on condition [0x0000000040310000..0x0000000040310d80]
                         [junit] at sun.misc.Unsafe.park(Native Method)
                         [junit] at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:146)
                         [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1879)
                         [junit] at java.util.concurrent.DelayQueue.take(DelayQueue.java:135)
                         [junit] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:504)
                         [junit] at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:497)
                         [junit] at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)
                         [junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)
                         [junit] at java.lang.Thread.run(Thread.java:595)
                         [junit]
                         [junit] "Low Memory Detector" daemon prio=1 tid=0x00007f07f0017d60 nid=0x5780 runnable [0x0000000000000000..0x0000000000000000]
                         [junit]
                         [junit] "CompilerThread1" daemon prio=1 tid=0x00007f07f0016300 nid=0x577f waiting on condition [0x0000000000000000..0x00000000415be560]
                         [junit]
                         [junit] "CompilerThread0" daemon prio=1 tid=0x00007f07f0014f60 nid=0x577e waiting on condition [0x0000000000000000..0x00000000414bd690]
                         [junit]
                         [junit] "AdapterThread" daemon prio=1 tid=0x00007f07f0013b90 nid=0x577d waiting on condition [0x0000000000000000..0x0000000000000000]
                         [junit]
                         [junit] "Signal Dispatcher" daemon prio=1 tid=0x00007f07f00128d0 nid=0x577c waiting on condition [0x0000000000000000..0x0000000000000000]
                         [junit]
                         [junit] "Finalizer" daemon prio=1 tid=0x0000000041a05d70 nid=0x577b in Object.wait() [0x0000000041d22000..0x0000000041d22a80]
                         [junit] at java.lang.Object.wait(Native Method)
                         [junit] - waiting on <0x00007f07fae8bb38> (a java.lang.ref.ReferenceQueue$Lock)
                         [junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:120)
                         [junit] - locked <0x00007f07fae8bb38> (a java.lang.ref.ReferenceQueue$Lock)
                         [junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:136)
                         [junit] at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
                         [junit]
                         [junit] "Reference Handler" daemon prio=1 tid=0x0000000041a05280 nid=0x577a in Object.wait() [0x0000000041c21000..0x0000000041c21c00]
                         [junit] at java.lang.Object.wait(Native Method)
                         [junit] - waiting on <0x00007f07fadfa540> (a java.lang.ref.Reference$Lock)
                         [junit] at java.lang.Object.wait(Object.java:474)
                         [junit] at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
                         [junit] - locked <0x00007f07fadfa540> (a java.lang.ref.Reference$Lock)
                         [junit]
                         [junit] "main" prio=1 tid=0x00000000418cfb80 nid=0x576d in Object.wait() [0x00007fff4ae1f000..0x00007fff4ae20620]
                         [junit] at java.lang.Object.wait(Native Method)
                         [junit] - waiting on <0x00007f08262c5260> (a org.jboss.messaging.utils.Future)
                         [junit] at org.jboss.messaging.utils.Future.await(Future.java:45)
                         [junit] - locked <0x00007f08262c5260> (a org.jboss.messaging.utils.Future)
                         [junit] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.waitForOnMessageToComplete(ClientConsumerImpl.java:543)
                         [junit] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.stop(ClientConsumerImpl.java:306)
                         [junit] - locked <0x00007f08262bfea8> (a org.jboss.messaging.core.client.impl.ClientConsumerImpl)
                         [junit] at org.jboss.messaging.core.client.impl.ClientSessionImpl.stop(ClientSessionImpl.java:624)
                         [junit] at org.jboss.messaging.tests.integration.client.ClientSessionStopStartTest.testStopStartConsumerAsyncSync(ClientSessionStopStartTest.java:319)
                         [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                         [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                         [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                         [junit] at java.lang.reflect.Method.invoke(Method.java:585)
                         [junit] at junit.framework.TestCase.runTest(TestCase.java:164)
                         [junit] at junit.framework.TestCase.runBare(TestCase.java:130)
                         [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
                         [junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
                         [junit] at junit.framework.TestResult.run(TestResult.java:109)
                         [junit] at junit.framework.TestCase.run(TestCase.java:120)
                         [junit] at junit.framework.TestSuite.runTest(TestSuite.java:230)
                         [junit] at junit.framework.TestSuite.run(TestSuite.java:225)
                         [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
                         [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
                         [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
                         [junit]
                         [junit] "VM Thread" prio=1 tid=0x0000000041a00d90 nid=0x5779 runnable
                         [junit]
                         [junit] "GC task thread#0 (ParallelGC)" prio=1 tid=0x00000000418ee0f0 nid=0x5771 runnable
                         [junit]
                         [junit] "GC task thread#1 (ParallelGC)" prio=1 tid=0x00000000418eef70 nid=0x5772 runnable
                         [junit]
                         [junit] "GC task thread#2 (ParallelGC)" prio=1 tid=0x00000000418efdf0 nid=0x5773 runnable
                         [junit]
                         [junit] "GC task thread#3 (ParallelGC)" prio=1 tid=0x00000000418f0c70 nid=0x5774 runnable
                         [junit]
                         [junit] "GC task thread#4 (ParallelGC)" prio=1 tid=0x00000000418f1af0 nid=0x5775 runnable
                         [junit]
                         [junit] "GC task thread#5 (ParallelGC)" prio=1 tid=0x00000000418f2970 nid=0x5776 runnable
                         [junit]
                         [junit] "GC task thread#6 (ParallelGC)" prio=1 tid=0x00000000418f37f0 nid=0x5777 runnable
                         [junit]
                         [junit] "GC task thread#7 (ParallelGC)" prio=1 tid=0x00000000418f4670 nid=0x5778 runnable
                         [junit]
                         [junit] "VM Periodic Task Thread" prio=1 tid=0x00007f07f001a740 nid=0x5781 waiting on condition
                         [junit]
                         [junit] Thread-0 (group:jbm-pinger-threads-1155557696) 08:55:45,863 WARN [RemotingConnectionImpl] Connection failure has been detected Did not
                        


                        • 9. Re: intermittent failures i am seeing
                          ataylor

                          Ok, i can see from that what the issue is. Theres a race condition on a lock. leave it with me

                          • 10. Re: intermittent failures i am seeing
                            timfox

                            Don't get the clientconsumer lock before calling waitForOnMessageToComplete() in stop()

                            • 11. Re: intermittent failures i am seeing
                              ataylor

                              yes, Ive realized that. Clebert can you unsync the method and try again.

                              • 12. Re: intermittent failures i am seeing
                                timfox

                                I think it still needs to be synced to protect the stopped flag, but this should be after the waitForOnMessagToComplete()

                                • 13. Re: intermittent failures i am seeing
                                  ataylor

                                  agreed, clebert can you try with this code change while I'm running the test suite

                                  public void stop() throws MessagingException
                                   {
                                   waitForOnMessageToComplete();
                                   synchronized (this)
                                   {
                                   if (stopped)
                                   {
                                   return;
                                   }
                                   stopped = true;
                                   }
                                   }


                                  • 14. Re: intermittent failures i am seeing
                                    clebert.suconic

                                    It didn't fail with that change.