8 Replies Latest reply on Dec 4, 2008 8:23 AM by clebert.suconic

    Weird deadlock I had today...

    clebert.suconic


      Can I be a user for one second? :-)

      I need help! Now! Urgent! (Hopefully you won't send me to the user's forum :-P )


      Jokes aside:

      I wrote a test on trunk that was doing a lot of disk I/O (too many syncs), which left my box in an almost unusable state. (I couldn't even click anywhere.)

      When the test was over, it deadlocked on the following condition, which I wanted to share with you guys.

      I haven't done any investigation yet; I'm just registering it here for future investigation.

      Found one Java-level deadlock:
      =============================
      "Thread-4 (group:jbm-pinger-threads-1418257117)":
       waiting to lock monitor 0x00007f514714fc28 (object 0x00007f5177468c88, a java.lang.Object),
       which is held by "Thread-0 (group:jbm-pinger-threads-1418257117)"
      "Thread-0 (group:jbm-pinger-threads-1418257117)":
       waiting to lock monitor 0x00007f5147204e48 (object 0x00007f51774c80c0, a java.lang.Object),
       which is held by "Thread-4 (group:jbm-pinger-threads-1418257117)"
      
      Java stack information for the threads listed above:
      ===================================================
      "Thread-4 (group:jbm-pinger-threads-1418257117)":
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.destroy(RemotingConnectionImpl.java:446)
       - waiting to lock <0x00007f5177468c88> (a java.lang.Object)
       at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.connectionFailed(ConnectionManagerImpl.java:545)
       - locked <0x00007f51774c80c0> (a java.lang.Object)
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callListeners(RemotingConnectionImpl.java:544)
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:431)
       - locked <0x00007f517740f448> (a java.lang.Object)
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$Pinger.run(RemotingConnectionImpl.java:1557)
       - locked <0x00007f517740f350> (a org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$Pinger)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
       at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
       at java.lang.Thread.run(Thread.java:619)
      "Thread-0 (group:jbm-pinger-threads-1418257117)":
       at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.connectionFailed(ConnectionManagerImpl.java:436)
       - waiting to lock <0x00007f51774c80c0> (a java.lang.Object)
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callListeners(RemotingConnectionImpl.java:544)
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:431)
       - locked <0x00007f5177468c88> (a java.lang.Object)
       at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$Pinger.run(RemotingConnectionImpl.java:1557)
       - locked <0x00007f5177449738> (a org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl$Pinger)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
       at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
       at java.lang.Thread.run(Thread.java:619)
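
      Reading just the monitor addresses in the dump, the shape is a plain lock-ordering inversion: one pinger thread locks the connection's monitor in fail() and then waits for the manager's monitor in connectionFailed(), while the other pinger thread holds the manager's monitor and waits for that connection's monitor in destroy(). A minimal standalone sketch of the same shape (the class and lock names below are illustrative, not the real JBM code):

      // Sketch of the lock-ordering inversion from the dump above.
      // connectionLock stands in for the RemotingConnectionImpl monitor (0x...468c88)
      // and managerLock for the monitor taken in ConnectionManagerImpl (0x...4c80c0).
      public class DeadlockSketch
      {
         static final Object connectionLock = new Object();
         static final Object managerLock = new Object();

         public static void main(String[] args)
         {
            // Plays the role of Thread-0: fail() locks the connection,
            // then callListeners() -> connectionFailed() wants the manager.
            Thread t0 = new Thread(new Runnable()
            {
               public void run()
               {
                  synchronized (connectionLock)
                  {
                     pause(100); // give the other thread time to grab managerLock
                     synchronized (managerLock) { }
                  }
               }
            });

            // Plays the role of Thread-4: connectionFailed() locks the manager,
            // then destroy() on the other connection wants connectionLock.
            Thread t4 = new Thread(new Runnable()
            {
               public void run()
               {
                  synchronized (managerLock)
                  {
                     pause(100);
                     synchronized (connectionLock) { }
                  }
               }
            });

            t0.start();
            t4.start(); // kill -3 on this sketch reports the same kind of Java-level deadlock
         }

         private static void pause(long ms)
         {
            try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
         }
      }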
      
      





        • 1. Re: Weird deadlock I had today...
          clebert.suconic

          What happened in my test was that a call to close took longer than PING_TIME.

          The test was sending 100 messages in 10 threads. All the sends were performed asynchronously, and every PersistentMessage sent was syncing the page (it would be the same effect on the Journal if using NIO). When close was called, it had to wait for the pending commands to finish, which took longer than PING_TIME.

          But you shouldn't get a deadlock in that case... right?
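
          To make the timing concrete, the shape of the race is roughly the following (a standalone sketch with made-up names; the real scheduling lives in RemotingConnectionImpl$Pinger): close() blocks draining the synced sends, the pinger keeps firing on its schedule, and once the blockage outlasts PING_TIME the failure path starts running concurrently with the close.

          import java.util.concurrent.CountDownLatch;
          import java.util.concurrent.Executors;
          import java.util.concurrent.ScheduledExecutorService;
          import java.util.concurrent.TimeUnit;

          public class CloseVsPingSketch
          {
             static final long PING_TIME_MS = 2000;

             public static void main(String[] args) throws Exception
             {
                // Stands in for the backlog of persistent sends still syncing to disk.
                final CountDownLatch pendingCommands = new CountDownLatch(1);

                ScheduledExecutorService pingerPool = Executors.newScheduledThreadPool(1);
                pingerPool.scheduleAtFixedRate(new Runnable()
                {
                   public void run()
                   {
                      // In the real code this is where fail() -> callListeners()
                      // -> connectionFailed() -> destroy() starts.
                      System.out.println("no pong within PING_TIME, failing connection");
                   }
                }, PING_TIME_MS, PING_TIME_MS, TimeUnit.MILLISECONDS);

                // close(): waits for the backlog to drain; on a slow, really-syncing
                // disk this easily takes longer than PING_TIME, so the pinger fires first.
                pendingCommands.await(10, TimeUnit.SECONDS);
                pingerPool.shutdownNow();
             }
          }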

          • 2. Re: Weird deadlock I had today...
            clebert.suconic

            To replicate this:

            I - Open org.jboss.messaging.tests.performance.paging.MeasurePagingMultiThreadTest

            II - Change testPagingMultipleSenders to:

             public void testPagingMultipleSenders() throws Throwable
             {
                final int NUMBER_OF_THREADS = 18;
                final int NUMBER_OF_MESSAGES = 100;
                final int SIZE_OF_MESSAGE = 1024;

                Configuration config = createDefaultConfig();

                HashMap<String, QueueSettings> settings = new HashMap<String, QueueSettings>();

                // A small global paging limit plus journal sync forces constant paging and
                // disk syncing, which is what makes close() slow enough to outlast PING_TIME.
                config.setPagingMaxGlobalSizeBytes(20 * 1024);
                config.setJournalSyncNonTransactional(true);


            III - Run the test inside Eclipse.


            Wait some time, then send kill -3 to the process. (I always do killall -3 java; see the sketch below for an in-JVM alternative.)


            My disk is somewhat slow at syncing, maybe because it is really syncing and not using any caching. If you don't get the deadlock, use a higher number of messages, as your disk may be faster.
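
            By the way, if kill -3 is awkward (for instance when the test is running inside Eclipse), the same check can be done in-process with the standard java.lang.management API; this is plain JDK, nothing JBM-specific. A watchdog thread in the test could call check() periodically:

             import java.lang.management.ManagementFactory;
             import java.lang.management.ThreadInfo;
             import java.lang.management.ThreadMXBean;

             // Reports monitor deadlocks the JVM can see; equivalent in spirit to the
             // "Found one Java-level deadlock" section of a kill -3 thread dump.
             public class DeadlockDetector
             {
                public static void check()
                {
                   ThreadMXBean bean = ManagementFactory.getThreadMXBean();
                   // findMonitorDeadlockedThreads() is JDK 5; on JDK 6+ findDeadlockedThreads()
                   // also covers java.util.concurrent locks.
                   long[] ids = bean.findMonitorDeadlockedThreads();
                   if (ids == null)
                   {
                      System.out.println("No deadlock detected");
                      return;
                   }
                   for (ThreadInfo info : bean.getThreadInfo(ids))
                   {
                      if (info != null)
                      {
                         System.out.println(info.getThreadName() + " is waiting on " + info.getLockName() +
                                            " held by " + info.getLockOwnerName());
                      }
                   }
                }
             }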

            • 3. Re: Weird deadlock I had today...
              timfox

               

              "clebert.suconic@jboss.com" wrote:
              What happened in my test was that a call to close took longer than PING_TIME.

              The test was sending 100 messages in 10 threads. All the sends were performed asynchronously, and every PersistentMessage sent was syncing the page (it would be the same effect on the Journal if using NIO). When close was called, it had to wait for the pending commands to finish, which took longer than PING_TIME.

              But you shouldn't get a deadlock in that case... right?



              Why are you waiting for completion at close?

              For non-transactional persistent messages you should be waiting for completion at send time, and for transactional ones at commit. In no case should you be waiting for completion at close.

              It's crucial that all commands can execute quickly so they don't prevent the remoting thread from executing other commands.
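
              Spelled out in code, the rule looks like this (the Producer/Session types below are hypothetical stand-ins, not the actual JBM client API; they only exist to show where the blocking belongs):

              public class CompletionRules
              {
                 interface Producer
                 {
                    void send(byte[] msg); // non-tx persistent: blocks here until the server has synced
                 }

                 interface Session
                 {
                    Producer createProducer();
                    void commit();          // transactional: the single blocking point
                    void close();           // must never wait for completions
                 }

                 static void nonTransactional(Session session, byte[] msg)
                 {
                    Producer p = session.createProducer();
                    p.send(msg);      // returns only once the message is on disk
                    session.close();  // nothing left to wait for, so close returns quickly
                 }

                 static void transactional(Session session, byte[] msg)
                 {
                    Producer p = session.createProducer();
                    p.send(msg);      // fire-and-forget inside the transaction
                    p.send(msg);
                    session.commit(); // block here, once, for the whole batch
                    session.close();  // again, fast
                 }
              }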

              • 4. Re: Weird deadlock I had today...
                timfox

                Assuming you meant it had to wait for the backlog of commands on the connection to be processed, rather than for completions, here are a couple of observations.

                I notice you have sync on persistent set to true on the server, but set to false on the connection factory.

                I'm not sure that makes sense (it might cause backlogs); perhaps we should disallow that combination. What happens if you have sync on send set on the session factory too?

                • 5. Re: Weird deadlock I had today...
                  timfox

                  Hmm, I tried with no changes (50000 messages) and the test passes for me.

                  I increased the message count to 200000 and the test still passes for me.

                  • 6. Re: Weird deadlock I had today...
                    timfox

                    Increased to 1 million messages and the test still passes:

                    Thread Thread-10 finished sending in 603769 milliseconds
                    Thread Thread-18 finished sending in 604207 milliseconds
                    Thread Thread-9 finished sending in 605579 milliseconds
                    Thread Thread-17 finished sending in 605947 milliseconds
                    Thread Thread-8 finished sending in 608785 milliseconds
                    Thread Thread-16 finished sending in 609078 milliseconds
                    Thread Thread-19 finished sending in 611433 milliseconds
                    Thread Thread-11 finished sending in 613736 milliseconds
                    Thread Thread-7 finished sending in 615404 milliseconds
                    Thread Thread-14 finished sending in 615485 milliseconds
                    Thread Thread-15 finished sending in 615514 milliseconds
                    Thread Thread-6 finished sending in 615672 milliseconds
                    Thread Thread-5 finished sending in 678608 milliseconds
                    Thread Thread-13 finished sending in 678615 milliseconds
                    Thread Thread-21 finished sending in 678629 milliseconds
                    Thread Thread-20 finished sending in 680410 milliseconds
                    Thread Thread-4 finished sending in 680474 milliseconds
                    Thread Thread-12 finished sending in 680461 milliseconds
                    Total Time: 680485 milliseconds what represented 1205 per second
                    main 10:26:02,531 WARN [RemotingConnectionImpl] The conn has been closed by the server
                    main 10:26:02,689 WARN [RemotingConnectionImpl] The conn has been closed by the server
                    main 10:26:02,689 WARN [RemotingConnectionImpl] The conn has been closed by the server
                    main 10:26:02,690 WARN [RemotingConnectionImpl] The conn has been closed by the server
                    main 10:26:02,690 WARN [RemotingConnectionImpl] The conn has been closed by the server
                    main 10:26:02,690 WARN [RemotingConnectionImpl] The conn has been closed by the server
                    main 10:26:02,691 WARN [RemotingConnectionImpl] The conn has been closed by the server
                    main 10:26:02,691 WARN [RemotingConnectionImpl] The conn has been closed by the server
                    Full thread dump Java HotSpot(TM) Server VM (1.5.0_13-b05 mixed mode):
                    


                    • 7. Re: Weird deadlock I had today...
                      timfox

                      Another observation:

                      Total Time: 680485 milliseconds what represented 1205 per second

                      The calculation of the final figure is broken (1205); sometimes I get a negative number!
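
                      For what it's worth, 1205 looks exactly like 32-bit overflow: 18 threads x 1,000,000 messages x 1000 ms/sec = 18,000,000,000, which wraps to 820,130,816 as an int, and 820,130,816 / 680,485 is about 1205; other message counts can wrap to a negative value. A minimal sketch of that failure mode and the fix, assuming the test computes the rate along these lines (I haven't checked the actual code, and the variable names are made up):

                      public class RateCalc
                      {
                         public static void main(String[] args)
                         {
                            int numberOfThreads = 18;
                            int messagesPerThread = 1000000;
                            int totalTimeMillis = 680485;

                            // Broken: the int multiplication overflows (18e9 wraps to 820,130,816),
                            // printing 1205 here and negative rates for other message counts.
                            int broken = numberOfThreads * messagesPerThread * 1000 / totalTimeMillis;
                            System.out.println("broken rate = " + broken);   // 1205

                            // Fixed: widen to long before multiplying, then divide.
                            long fixed = (long) numberOfThreads * messagesPerThread * 1000L / totalTimeMillis;
                            System.out.println("fixed rate  = " + fixed);    // ~26451 msgs/sec
                         }
                      }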

                      • 8. Re: Weird deadlock I had today...
                        clebert.suconic

                         

                        "timfox" wrote:
                        Assuming you meant it had to wait for the backlog of commands on the connection to be processed, rather than for completions, here are a couple of observations.

                        I notice you have sync on persistent set to true on the server, but set to false on the connection factory.

                        I'm not sure that makes sense (it might cause backlogs); perhaps we should disallow that combination. What happens if you have sync on send set on the session factory too?


                        In that case the test always passes for me.

                        The point is the backlogs. Maybe because I have a SCSI disk, the sync is actually slower?

                        Let's talk about that later today on IRC.