12 Replies. Latest reply on Apr 3, 2013 1:24 PM by clebert.suconic

    hornetq-failure-check-thread stops removing dead client connections

    michael10

      We are using HornetQ 2.2.14 with OIO (Netty blocking I/O).

      It seems that the hornetq-failure-check-thread blocks for a very long time (hours) while trying to acquire the synchronized "lock" object. No deadlock is detected.

       

      Thread[hornetq-failure-check-thread,5,jboss]

          org.hornetq.core.server.impl.ServerConsumerImpl.setStarted(ServerConsumerImpl.java:499)

          org.hornetq.core.server.impl.ServerSessionImpl.doRollback(ServerSessionImpl.java:1427)

          org.hornetq.core.server.impl.ServerSessionImpl.rollback(ServerSessionImpl.java:692)

          org.hornetq.core.server.impl.ServerSessionImpl.doClose(ServerSessionImpl.java:300)

          org.hornetq.core.server.impl.ServerSessionImpl.access$100(ServerSessionImpl.java:87)

          org.hornetq.core.server.impl.ServerSessionImpl$1.done(ServerSessionImpl.java:1089)

          org.hornetq.core.persistence.impl.nullpm.NullStorageManager.afterCompleteOperations(NullStorageManager.java:400)

          org.hornetq.core.server.impl.ServerSessionImpl.close(ServerSessionImpl.java:1079)

          org.hornetq.core.server.impl.ServerSessionImpl.connectionFailed(ServerSessionImpl.java:1352)

          org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:579)

          org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:336)

          org.hornetq.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread.run(RemotingServiceImpl.java:631)


      And many threads like these:

       

      Thread[Thread-69 (HornetQ-server-HornetQServerImpl::serverUUID=f98644db-6fa1-11e2-aa99-efbb161a08cc-3765233),5,HornetQ-server-HornetQServerImpl::serverUUID=f98644db-6fa1-11e2-aa99-efbb161a08cc-3765233]

          org.hornetq.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:174)

          org.hornetq.core.protocol.core.impl.ChannelImpl.sendBatched(ChannelImpl.java:162)

          org.hornetq.core.protocol.core.impl.CoreSessionCallback.sendMessage(CoreSessionCallback.java:76)

          org.hornetq.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:798)

          org.hornetq.core.server.impl.ServerConsumerImpl.handle(ServerConsumerImpl.java:313)

          org.hornetq.core.server.impl.QueueImpl.handle(QueueImpl.java:2200)

          org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1751)

          org.hornetq.core.server.impl.QueueImpl.doPoll(QueueImpl.java:1630)

          org.hornetq.core.server.impl.QueueImpl.access$1300(QueueImpl.java:77)

          org.hornetq.core.server.impl.QueueImpl$ConcurrentPoller.run(QueueImpl.java:2487)

          org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)

          java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

          java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

          java.lang.Thread.run(Thread.java:619)

       

      Thread[Thread-227 (HornetQ-server-HornetQServerImpl::serverUUID=f98644db-6fa1-11e2-aa99-efbb161a08cc-3765233),5,HornetQ-server-HornetQServerImpl::serverUUID=f98644db-6fa1-11e2-aa99-efbb161a08cc-3765233]

          sun.misc.Unsafe.park(Native Method)

          java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)

          java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)

          java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)

          java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)

          java.util.concurrent.Semaphore.acquire(Semaphore.java:286)

          org.hornetq.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:182)

          org.hornetq.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:225)

          org.hornetq.core.protocol.core.impl.ChannelImpl.sendBatched(ChannelImpl.java:162)

          org.hornetq.core.protocol.core.impl.CoreSessionCallback.sendMessage(CoreSessionCallback.java:76)

          org.hornetq.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:798)

          org.hornetq.core.server.impl.ServerConsumerImpl.handle(ServerConsumerImpl.java:313)

          org.hornetq.core.server.impl.QueueImpl.handle(QueueImpl.java:2200)

          org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1751)

          org.hornetq.core.server.impl.QueueImpl.doPoll(QueueImpl.java:1630)

          org.hornetq.core.server.impl.QueueImpl.access$1300(QueueImpl.java:77)

          org.hornetq.core.server.impl.QueueImpl$ConcurrentPoller.run(QueueImpl.java:2487)

          org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)

          java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

          java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

          java.lang.Thread.run(Thread.java:619)

       

      ...
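      The dumps above are consistent with a chain of waits rather than a cycle, which would explain why no deadlock is reported: JVM deadlock detection such as ThreadMXBean.findDeadlockedThreads() only looks for cycles of threads waiting on monitors or ownable synchronizers, and a Semaphore permit has no owner thread. The following minimal sketch reproduces such a chain with plain JDK classes. consumerLock, writeLock and the thread names are hypothetical stand-ins, not HornetQ code, and the layout assumes that the thread holding the consumer monitor is the one parked in Semaphore.acquire().

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

public class LockChainDemo {

    // Hypothetical stand-ins: a consumer's monitor and a connection's write semaphore.
    static final Object consumerLock = new Object();
    static final Semaphore writeLock = new Semaphore(1);

    /** Returns {failure-check state, owner of the monitor it waits for, "true" if no deadlock reported}. */
    public static String[] run() throws InterruptedException {
        CountDownLatch permitTaken = new CountDownLatch(1);
        CountDownLatch monitorTaken = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Plays the role of the writer stuck in a socket write while holding the permit.
        Thread writer = new Thread(() -> {
            try {
                writeLock.acquire();
                permitTaken.countDown();
                release.await(); // stands in for a write that never returns
                writeLock.release();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "writer");

        // Holds the consumer monitor, then parks waiting for the write permit.
        Thread delivery = new Thread(() -> {
            try {
                permitTaken.await();
                synchronized (consumerLock) {
                    monitorTaken.countDown();
                    writeLock.acquire();
                    writeLock.release();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "delivery");

        // Plays the failure-check thread: it needs the same consumer monitor.
        Thread failureCheck = new Thread(() -> {
            try {
                monitorTaken.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (consumerLock) {
                // a real cleanup would close the dead session here
            }
        }, "failure-check");

        writer.start();
        delivery.start();
        failureCheck.start();

        // Wait until the chain has formed: delivery parked, failure-check blocked.
        while (delivery.getState() != Thread.State.WAITING
                || failureCheck.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);
        }

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] deadlocked = mx.findDeadlockedThreads(); // null when there is no cycle
        ThreadInfo info = mx.getThreadInfo(failureCheck.getId());
        String[] result = {
            info.getThreadState().toString(),
            info.getLockOwnerName(),
            String.valueOf(deadlocked == null)
        };

        release.countDown(); // let the writer finish so every thread can exit
        writer.join();
        delivery.join();
        failureCheck.join();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] r = run();
        System.out.println("failure-check: " + r[0] + ", monitor owner: " + r[1]
                + ", no deadlock reported: " + r[2]);
    }
}
```

      While the chain holds, the failure-check thread is BLOCKED on a monitor owned by "delivery", yet findDeadlockedThreads() returns null, because every thread is ultimately waiting on the writer rather than on each other.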

       

      We run into this problem when the terminal server logs off client sessions after a period of inactivity and HornetQ must clean up the dead connections. This can happen very often.

      If the cleanup thread stays blocked for a long time, we sometimes run out of memory.

       

      I think this is related to the discussion at https://community.jboss.org/thread/203648.

      I hope there is another option, so that we do not have to switch from OIO to NIO.

       

      I have attached a thread dump.

      If needed, I can send you a heap dump.

       

      Greetings

        • 1. Re: hornetq-failure-check-thread stops removing dead client connections
          clebert.suconic

          I recently changed the branches so that the queue is not locked during delivery, which should fix this.
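          This post does not describe the change itself. Purely as a generic illustration (not the actual HornetQ change), the usual pattern is to take work off a shared structure while holding the lock and to perform the potentially blocking send only after the lock has been released. DeliverOutsideLock and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class DeliverOutsideLock {

    // Hypothetical queue state guarded by a single monitor.
    final Object lock = new Object();
    private final ArrayDeque<String> pending = new ArrayDeque<>();

    public void add(String message) {
        synchronized (lock) {
            pending.add(message);
        }
    }

    /**
     * Drains pending messages while holding the lock, then hands them to the
     * (possibly blocking) sender after the lock is released, so a thread that
     * only needs the lock briefly, such as a cleanup thread, is not held up by
     * a slow or stuck receiver.
     */
    public int deliver(Consumer<String> sender) {
        List<String> batch;
        synchronized (lock) {
            batch = new ArrayList<>(pending);
            pending.clear();
        }
        for (String message : batch) {
            sender.accept(message); // runs with no lock held
        }
        return batch.size();
    }

    public int pendingCount() {
        synchronized (lock) {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        DeliverOutsideLock queue = new DeliverOutsideLock();
        queue.add("m1");
        queue.add("m2");
        int sent = queue.deliver(m ->
                System.out.println("sending " + m + ", lock held: " + Thread.holdsLock(queue.lock)));
        System.out.println(sent + " delivered, " + queue.pendingCount() + " pending");
    }
}
```

          In this sketch the drained messages leave the pending queue before they are sent, so a real broker would also have to track them as in-flight for acknowledgement or rollback; that part is omitted.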

          • 2. Re: hornetq-failure-check-thread stops removing dead client connections
            phylanx

            Hello!

            I was analysing this problem with Michael Woldrich.

            The problem here isn't a lock on the queue.

             

            Here is some more analysis:

            The failure check thread is waiting for the lock object of a ServerConsumerImpl.

            In this thread dump, a total of three threads hold a ServerConsumerImpl lock ("Thread-227", "Thread-230", "Thread-63").

            Of these three threads, Thread-230 is the one blocking the other two, because it holds the writeLock semaphore of the NettyConnection.

             

            Thread[Thread-230 (HornetQ-server-HornetQServerImpl::serverUUID=f98644db-6fa1-11e2-aa99-efbb161a08cc-3765233),5,HornetQ-server-HornetQServerImpl::serverUUID=f98644db-6fa1-11e2-aa99-efbb161a08cc-3765233]

                java.net.SocketOutputStream.socketWrite0(Native Method)

                java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)

                java.net.SocketOutputStream.write(SocketOutputStream.java:136)

                org.jboss.netty.buffer.HeapChannelBuffer.getBytes(HeapChannelBuffer.java:116)

                org.jboss.netty.buffer.DynamicChannelBuffer.getBytes(DynamicChannelBuffer.java:156)

                org.jboss.netty.channel.socket.oio.OioWorker.write(OioWorker.java:119)

                org.jboss.netty.channel.socket.oio.OioServerSocketPipelineSink.handleAcceptedSocket(OioServerSocketPipelineSink.java:126)

                org.jboss.netty.channel.socket.oio.OioServerSocketPipelineSink.eventSunk(OioServerSocketPipelineSink.java:64)

                org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendDownstream(StaticChannelPipeline.java:522)

                org.jboss.netty.channel.SimpleChannelHandler.writeRequested(SimpleChannelHandler.java:304)

                org.jboss.netty.channel.SimpleChannelHandler.handleDownstream(SimpleChannelHandler.java:266)

                org.jboss.netty.channel.StaticChannelPipeline.sendDownstream(StaticChannelPipeline.java:399)

                org.jboss.netty.channel.StaticChannelPipeline.sendDownstream(StaticChannelPipeline.java:390)

                org.jboss.netty.channel.Channels.write(Channels.java:611)

                org.jboss.netty.channel.Channels.write(Channels.java:578)

                org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:251)

                org.hornetq.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:220)

                org.hornetq.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:225)

                org.hornetq.core.protocol.core.impl.ChannelImpl.sendBatched(ChannelImpl.java:162)

                org.hornetq.core.protocol.core.impl.CoreSessionCallback.sendMessage(CoreSessionCallback.java:76)

                org.hornetq.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:798)

                org.hornetq.core.server.impl.ServerConsumerImpl.handle(ServerConsumerImpl.java:313)

                org.hornetq.core.server.impl.QueueImpl.handle(QueueImpl.java:2200)

                org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1751)

                org.hornetq.core.server.impl.QueueImpl.doPoll(QueueImpl.java:1630)

                org.hornetq.core.server.impl.QueueImpl.access$1300(QueueImpl.java:77)

                org.hornetq.core.server.impl.QueueImpl$ConcurrentPoller.run(QueueImpl.java:2487)

                org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)

                java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

                java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

                java.lang.Thread.run(Thread.java:619)

             

            So this is not a queue locking problem.

            And the threads are still blocked after one day of runtime.

             

            Yours sincerely, Johann
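            According to the dumps, the waiting threads park in Semaphore.acquire() inside NettyConnection.write, and acquire() has no timeout, so they wait for as long as Thread-230 stays in its socket write. As a generic sketch only (not a HornetQ option or patch), a timed Semaphore.tryAcquire bounds that wait; writeWithTimeout is a hypothetical helper.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class TimedPermitDemo {

    /**
     * Hypothetical helper: waits at most timeoutMillis for the write permit
     * instead of parking indefinitely as Semaphore.acquire() does.
     * Returns false if the permit could not be obtained in time.
     */
    static boolean writeWithTimeout(Semaphore writeLock, Runnable write, long timeoutMillis)
            throws InterruptedException {
        if (!writeLock.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // caller could treat the connection as failed and clean it up
        }
        try {
            write.run();
            return true;
        } finally {
            writeLock.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore writeLock = new Semaphore(1);
        System.out.println(writeWithTimeout(writeLock, () -> { }, 100)); // permit free: true
        writeLock.acquire(); // simulate a thread stuck in a write while holding the permit
        System.out.println(writeWithTimeout(writeLock, () -> { }, 100)); // times out: false
    }
}
```

            A bounded wait does not unstick the thread that holds the permit (Thread-230 in the dump); it only lets the other threads give up and report the connection as failed instead of waiting indefinitely.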

            • 3. Re: hornetq-failure-check-thread stops removing dead client connections
              clebert.suconic

              The queue is locked because your client is stuck and not receiving messages.
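              A minimal sketch of that effect with plain java.net blocking sockets on the loopback interface (no HornetQ or Netty involved): once the peer stops reading, the TCP buffers fill up and OutputStream.write() does not return, which appears to be the situation Thread-230 is in (SocketOutputStream.socketWrite0). The 256 MB payload is an arbitrary assumption chosen to exceed typical socket buffer sizes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockedWriteDemo {

    /** Returns true if a blocking write to a peer that never reads is still stuck after waitMillis. */
    public static boolean writerStillBlocked(long waitMillis) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket stuckClient = new Socket(loopback, server.getLocalPort()); // never reads
             Socket serverSide = server.accept()) {

            Thread writer = new Thread(() -> {
                byte[] chunk = new byte[64 * 1024];
                try {
                    OutputStream out = serverSide.getOutputStream();
                    // 4096 x 64 KB = 256 MB, far more than the socket buffers can hold
                    for (int i = 0; i < 4096; i++) {
                        out.write(chunk); // blocks once the buffers are full
                    }
                } catch (IOException e) {
                    // expected once the sockets are closed below
                }
            }, "server-writer");
            writer.setDaemon(true); // never keep the JVM alive because of this demo

            writer.start();
            writer.join(waitMillis);
            boolean blocked = writer.isAlive();

            // Closing the sockets makes the pending write fail, so the thread can exit.
            stuckClient.close();
            serverSide.close();
            writer.join(5000);
            return blocked;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writer still blocked after 500 ms: " + writerStillBlocked(500));
    }
}
```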

              • 4. Re: hornetq-failure-check-thread stops removing dead client connections
                clebert.suconic

                Could you provide thread dumps of both the client and the server from when it happened?

                 

                But the queue change I mentioned should avoid this anyway. I just wanted to make sure this isn't something else.

                • 5. Re: hornetq-failure-check-thread stops removing dead client connections
                  alex75

                  Hello Clebert!

                  Which change do you mean? Can you provide a link to the commit?

                  Has this change already been released in a 2.2.x minor version?

                  • 7. Re: hornetq-failure-check-thread stops removing dead client connections
                    alex75

                    I've seen that for a long time there have only been changes for 2.2.EAP.next. When do you plan to release it? Or can we download the current state somewhere?

                    • 8. Re: hornetq-failure-check-thread stops removing dead client connections
                      clebert.suconic

                      Yes, that's the one I backported.

                       

                      Point releases on the 2.2.EAP and AS7 branches are for EAP 5 and EAP 6. This will be released in 2.3.0.Final in a week or two, but you're welcome to take the branch and build it yourself if you like.

                      • 9. Re: hornetq-failure-check-thread stops removing dead client connections
                        phylanx

                        We found the hanging client.

                        The thread dump is attached.

                        • 10. Re: hornetq-failure-check-thread stops removing dead client connections
                          michael10

                          We decided to integrate the patch (https://github.com/hornetq/hornetq/commit/c469327f7a6d9418e2134d10ce20606b897d6ea4) into our 2.2.14, because we have already integrated other patches into this branch and we need a quick solution for our customers.

                          So please let us know if you see any problems, such as missing or dependent changes.

                          • 11. Re: hornetq-failure-check-thread stops removing dead client connections
                            clebert.suconic

                            If I were you, I would rebase your commits on top of Branch_2_2_EAP or Branch_2_2_AS7 (whichever one you are using now) and keep that on a GitHub account. Make proper tags for whatever you put into production (i.e. releases). If you ever run into a problem, you can easily look at your tags and your own releases. I would also change the version properties so that they match what you have, which makes it easy to find the proper tags.


                            If you don't want to rebase, at least keep your code on a GitHub account and make proper tags. That way, if you ever run into a stack trace or a similar problem, you can point us to your repository and we can at least make sense of the stack trace information. We can't promise to fully support a fork, but we will make a best effort in that case.

                            • 12. Re: hornetq-failure-check-thread stops removing dead client connections
                              clebert.suconic

                              Just a notice: I just edited my previous post, basically fixing some grammar and adding some extra information about version.properties.