7 Replies Latest reply on Feb 12, 2015 1:53 PM by nigord

    Found a way to dead lock some management threads // Memory leak

    nigord

      Hello everyone,

       

      We are starting our migration to WildFly from JBoss 5.

       

      Our stack is the following:

       

      • Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
      • Wildfly-8.2.0.final
      • CentOS Linux release 7.0.1406 (Core) [latest patches]

       

      We are moving component by component and have deployed about 20 Oracle datasources.

       

      Our first step was to adapt our monitoring of datasource utilisation.

      We used twiddle.sh before and have moved to the twiddle-standalone project on GitHub as a means of limiting the number of changes required in our monitoring strategy.

       

      We also monitor memory through jboss-cli.sh and have noticed the following memory utilisation pattern.

      It is repeatable and occurs over a two-day cycle.

      image (2).png

      The memory histogram lists the following as the major memory consumers:

      num     #instances         #bytes  class name

      ----------------------------------------------

         1:         65691      770643968  [B

         2:        262778       17226488  [Ljava.lang.Object;
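
For readers who would rather sample heap usage from inside a JVM than through jboss-cli.sh, here is a minimal sketch using the standard java.lang.management API. The class name HeapSampler and its method are hypothetical and are not part of WildFly or of the monitoring setup described in this post. It reports only on the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only.
final class HeapSampler {

    /**
     * Returns used heap as a fraction of the maximum heap,
     * or -1.0 when the JVM does not define a maximum.
     */
    static double usedHeapFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        if (max <= 0) {
            return -1.0;
        }
        return (double) heap.getUsed() / max;
    }

    public static void main(String[] args) {
        // Print one sample; a real monitor would call this on a schedule.
        System.out.printf("heap used: %.1f%%%n", usedHeapFraction() * 100.0);
    }
}
```

Sampling this value periodically and plotting it should give a curve comparable to the one in the screenshot: a heap that the GC can no longer bring back down shows up as a steadily rising floor.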

       

      Looking deeper into this, I found what I think is the cause:

       

      Found one Java-level deadlock:

      =============================

      "Remoting "bouvmbuild05:MANAGEMENT" task-14":

        waiting to lock monitor 0x00007f13643b0328 (object 0x00000000ecdd8a00, a java.util.ArrayDeque),

        which is held by "XNIO-1 I/O-2"

      "XNIO-1 I/O-2":

        waiting to lock monitor 0x00007f1360758d18 (object 0x00000000ece452c8, a org.xnio.streams.BufferPipeOutputStream),

        which is held by "Remoting "bouvmbuild05:MANAGEMENT" task-10"

      "Remoting "bouvmbuild05:MANAGEMENT" task-10":

        waiting to lock monitor 0x00007f13643b0328 (object 0x00000000ecdd8a00, a java.util.ArrayDeque),

        which is held by "XNIO-1 I/O-2"

       

       

      Java stack information for the threads listed above:

      ===================================================

      "Remoting "bouvmbuild05:MANAGEMENT" task-14":

        at org.jboss.remoting3.remote.RemoteConnectionHandler.closeAllChannels(RemoteConnectionHandler.java:421)

        - waiting to lock <0x00000000ecdd8a00> (a java.util.ArrayDeque)

        at org.jboss.remoting3.remote.RemoteConnectionHandler.handleConnectionClose(RemoteConnectionHandler.java:114)

        at org.jboss.remoting3.remote.RemoteReadListener$1$1.run(RemoteReadListener.java:56)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

      "XNIO-1 I/O-2":

        at org.jboss.remoting3.remote.OutboundMessage.cancel(OutboundMessage.java:288)

        - waiting to lock <0x00000000ece452c8> (a org.xnio.streams.BufferPipeOutputStream)

        at org.jboss.remoting3.remote.RemoteConnectionChannel.closeMessages(RemoteConnectionChannel.java:560)

        at org.jboss.remoting3.remote.RemoteConnectionChannel.closeAction(RemoteConnectionChannel.java:542)

        at org.jboss.remoting3.spi.AbstractHandleableCloseable.closeAsync(AbstractHandleableCloseable.java:359)

        at org.jboss.remoting3.remote.RemoteConnectionHandler.closeAllChannels(RemoteConnectionHandler.java:423)

        - locked <0x00000000ecdd8a00> (a java.util.ArrayDeque)

        at org.jboss.remoting3.remote.RemoteConnectionHandler.receiveCloseRequest(RemoteConnectionHandler.java:213)

        at org.jboss.remoting3.remote.RemoteReadListener.handleEvent(RemoteReadListener.java:110)

        at org.jboss.remoting3.remote.RemoteReadListener.handleEvent(RemoteReadListener.java:45)

        at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)

        at org.xnio.channels.TranslatingSuspendableChannel.handleReadable(TranslatingSuspendableChannel.java:199)

        at org.xnio.channels.TranslatingSuspendableChannel$1.handleEvent(TranslatingSuspendableChannel.java:113)

        at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)

        at org.xnio.ChannelListeners$DelegatingChannelListener.handleEvent(ChannelListeners.java:1092)

        at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)

        at org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66)

        at org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:88)

        at org.xnio.nio.WorkerThread.run(WorkerThread.java:539)

      "Remoting "bouvmbuild05:MANAGEMENT" task-10":

        at org.jboss.remoting3.remote.RemoteConnection$RemoteWriteListener.send(RemoteConnection.java:294)

        - waiting to lock <0x00000000ecdd8a00> (a java.util.ArrayDeque)

        at org.jboss.remoting3.remote.RemoteConnection.send(RemoteConnection.java:122)

        at org.jboss.remoting3.remote.OutboundMessage$1.accept(OutboundMessage.java:154)

        at org.xnio.streams.BufferPipeOutputStream.send(BufferPipeOutputStream.java:122)

        at org.xnio.streams.BufferPipeOutputStream.send(BufferPipeOutputStream.java:110)

        at org.xnio.streams.BufferPipeOutputStream.flush(BufferPipeOutputStream.java:139)

        - locked <0x00000000ece452c8> (a org.xnio.streams.BufferPipeOutputStream)

        at org.xnio.streams.BufferPipeOutputStream.flush(BufferPipeOutputStream.java:131)

        at org.jboss.remoting3.remote.OutboundMessage.flush(OutboundMessage.java:277)

        at java.io.DataOutputStream.flush(DataOutputStream.java:123)

        at java.io.FilterOutputStream.close(FilterOutputStream.java:158)

        at org.jboss.remotingjmx.DelegatingRemotingConnectorServer.writeVersionHeader(DelegatingRemotingConnectorServer.java:208)

        at org.jboss.remotingjmx.DelegatingRemotingConnectorServer.access$200(DelegatingRemotingConnectorServer.java:60)

        at org.jboss.remotingjmx.DelegatingRemotingConnectorServer$ChannelOpenListener.channelOpened(DelegatingRemotingConnectorServer.java:288)

        at org.jboss.remoting3.spi.SpiUtils$ServiceOpenTask.run(SpiUtils.java:126)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

       

      Found 1 deadlock.
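
The trace above shows a lock-order inversion: "XNIO-1 I/O-2" holds the ArrayDeque monitor and waits for the BufferPipeOutputStream monitor, while "task-10" holds the BufferPipeOutputStream monitor and waits for the ArrayDeque. The sketch below is not taken from JBoss Remoting. It reproduces the same pattern with two plain objects and then detects it with the standard ThreadMXBean API. The class DeadlockProbe, its methods, and the thread names demo-io and demo-task are all hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class, for illustration only.
final class DeadlockProbe {

    /** Describes every thread this JVM currently reports as deadlocked (empty if none). */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when nothing is deadlocked
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // a thread may have ended between the two calls
                report.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    /** Polls describeDeadlocks() until it reports something or the timeout expires. */
    static List<String> awaitDeadlocks(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> report = describeDeadlocks();
        while (report.isEmpty() && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            report = describeDeadlocks();
        }
        return report;
    }

    /** Starts two daemon threads that take the same two monitors in opposite order. */
    static void provokeLockInversion(Object queueLock, Object streamLock) {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startDaemon("demo-io", queueLock, streamLock, bothHoldFirstLock);
        startDaemon("demo-task", streamLock, queueLock, bothHoldFirstLock);
    }

    private static void startDaemon(String name, Object first, Object second,
                                    CountDownLatch bothHoldFirstLock) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirstLock.countDown();
                try {
                    bothHoldFirstLock.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread holds this monitor and waits for ours.
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t.start();
    }

    public static void main(String[] args) {
        provokeLockInversion(new Object(), new Object());
        awaitDeadlocks(5000).forEach(System.out::println);
    }
}
```

This check only sees the JVM it runs in, so to watch a WildFly instance it would have to run inside the server or go through a remote JMX connection. The usual cure for this kind of deadlock is to make every code path acquire the two locks in the same order, which is a change in the library rather than something an application can configure.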

       

      Now, I was wondering what to do with this?

      Do you think I am doing something terribly wrong, or could I be on to a bug?

       

      I have the heap dump, histogram, and thread dump available, should they help point me to the next step.

       

      Thanks in advance,

      Maxime

        • 1. Re: Found a way to dead lock some management threads // Memory leak
          ctomc

          This looks related to https://issues.jboss.org/browse/REM3-200

           

          Also, could you try with jboss-remoting-4.0.7.Final?

          • 2. Re: Found a way to dead lock some management threads // Memory leak
            nigord

            Thanks!

             

            I agree that these look somewhat similar.

            I will do the following 3 tests:

             

            • Test with the following flag overnight: -Djboss.remoting.pooled-buffers=false
            • This will confirm whether our problems are really similar, based on whether ours responds to that change in the same way.
            • Revert.
            • Then change the jboss-remoting version as suggested in your comment. Run for a couple of hours.
            • Revert.
            • Then change the NIO version as suggested in that bug. Run for a couple of hours.
            • Add back the jboss-remoting version change and run for a couple of hours.

             

            This should give us a pretty good view of what affects us.

            Ok with you?

            • 3. Re: Found a way to dead lock some management threads // Memory leak
              ctomc

              Sounds like a great plan.

              • 4. Re: Found a way to dead lock some management threads // Memory leak
                nigord

                Test one did indeed change the memory pattern. Now the GC is able to do its job.

                image (4).png

                 

                Moving on to test 2.

                • 5. Re: Found a way to dead lock some management threads // Memory leak
                  nigord

                  OK, I have replaced 4.0.6 with the following: http://mvnrepository.com/artifact/org.jboss.remoting/jboss-remoting/4.0.7.Final

                  I changed the modules, and WildFly is running and appears to be running fine.

                   

                  But this appears to break the ability of the service wrapper to confirm startup success.

                   

                  [root@bouvmbuild05 standalone]# service wildfly start

                  Starting wildfly (via systemctl): Job for wildfly.service failed. See 'systemctl status wildfly.service' and 'journalctl -xn' for details.                                                      [FAILED]

                  [root@bouvmbuild05 standalone]# systemctl status wildfly.service

                  wildfly.service - SYSV: WildFly startup script

                     Loaded: loaded (/etc/rc.d/init.d/wildfly)

                     Active: failed (Result: timeout) since Thu 2015-02-12 09:54:41 EST; 9min ago

                    Process: 9875 ExecStop=/etc/rc.d/init.d/wildfly stop (code=exited, status=0/SUCCESS)

                    Process: 9958 ExecStart=/etc/rc.d/init.d/wildfly start (code=exited, status=0/SUCCESS)

                  Main PID: 3956

                   

                  Feb 12 09:49:41 bouvmbuild05.acquisio.com systemd[1]: Starting SYSV: WildFly startup script...

                  Feb 12 09:49:41 bouvmbuild05.acquisio.com runuser[9968]: pam_unix(runuser:session): session opened for user root by (uid=0)

                  Feb 12 09:49:46 bouvmbuild05.acquisio.com wildfly[9958]: Starting wildfly: [  OK  ]

                  Feb 12 09:49:46 bouvmbuild05.acquisio.com systemd[1]: PID file /var/run/wildfly/wildfly.pid not readable (yet?) afte...art.

                  Feb 12 09:54:41 bouvmbuild05.acquisio.com systemd[1]: wildfly.service operation timed out. Terminating.

                  Feb 12 09:54:41 bouvmbuild05.acquisio.com systemd[1]: Failed to start SYSV: WildFly startup script.

                  Feb 12 09:54:41 bouvmbuild05.acquisio.com systemd[1]: Unit wildfly.service entered failed state.

                  Hint: Some lines were ellipsized, use -l to show in full.

                   

                  [root@bouvmbuild05 standalone]# ps -ef | grep java

                  root     10013  9971  2 09:49 ?        00:00:19 /usr/java/jdk1.8.0_31//bin/java -D[Standalone] -server -Xms1024m -Xmx1024m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Dorg.jboss.boot.log.file=/usr/local/wildfly/standalone/log/server.log -Dlogging.configuration=file:/usr/local/wildfly/standalone/configuration/logging.properties -jar /usr/local/wildfly/jboss-modules.jar -mp /usr/local/wildfly/modules org.jboss.as.standalone -Djboss.home.dir=/usr/local/wildfly -Djboss.server.base.dir=/usr/local/wildfly/standalone

                   

                  I will let it run like this for 2 hours.

                  • 6. Re: Found a way to dead lock some management threads // Memory leak
                    nigord

                    If we ignore the service problem, 4.0.7 does not show the same memory problem as 4.0.6:

                     

                    image (5).png

                    I will now revert and run for 2 hours with the latest NIO.

                    • 7. Re: Found a way to dead lock some management threads // Memory leak
                      nigord

                      OK, forget my comment about the service status. I made a stupid typo in standalone.sh, so I was the cause.

                      Regarding NIO, 8.2.0 already packages the latest version, so the third test is not possible.

                       

                      So I have 2 acceptable workarounds:

                      A) -Djboss.remoting.pooled-buffers=false or B) jboss-remoting-4.0.7.Final.jar


                      I am afraid of the performance impact of solution A), so I think I prefer solution B).

                      Do you agree?


                      If so, I will mark this thread as resolved.

                      Hopefully, others might find this interesting as well.