1 Reply Latest reply on Jun 20, 2016 8:41 AM by jaikiran

    Wildfly 10.0.0 occasional performance issues leading to inoperability

    gargan

      Hello,

       

      We run an EJB-based web service on Wildfly. It uses both SOAP and REST web service endpoints and a MySQL database. Shortly after we migrated some production servers from Wildfly 9.0.2.Final to 10.0.0.Final, we started experiencing occasional performance problems that rapidly escalate to complete inoperability. Symptoms include:

      • constant, very high processor load
      • rapidly rising memory consumption that leads to out-of-memory exceptions
      • an increased count of timer interrupts
      • the web server serving content slower and slower
      • some EJB operations taking a very long time to complete
      • the management interface becoming unresponsive, so that deploying and undeploying applications (using jboss-cli.sh) fails with a timeout

      The server remained responsive enough that it could be restarted using the init script.

       

      The operating system is CentOS 6.8, with the OpenJDK package downgraded to java-1.8.0-openjdk-1.8.0.91-0.b14.el6_7 (the CentOS 6.7 build) because of this bug: Issue with SSL and java-1.8.0-openjdk 91-1.b14 - Red Hat Customer Portal

       

      We have thus far only encountered this issue in production, though on two separate servers. One of them is virtual and the other is bare metal, so a hardware issue is very unlikely. Our servers that run Wildfly 10 on CentOS 7 have not encountered the issue yet. We have not been able to isolate a factor that triggers the issue; it has happened as soon as 15 minutes after an application server restart. The MySQL database has shown no signs of trouble, such as deadlocks. The MySQL JDBC connector version is 5.1.33, and the issue also occurs with version 5.1.39. There is some indication that high load makes the issue more likely, but it is not a requirement, as the issue has also occurred during off-peak hours.

       

      I took a few stack traces, spaced a few seconds apart, using "kill -3" while the issue was happening. Attached is the console.log file containing the stack traces.
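      Since "kill -3" writes the dump to the JVM's stdout (which is how it ends up in console.log), the same information can also be captured from inside the JVM with the standard ThreadMXBean API. Below is a minimal sketch of that; the ThreadDumper class name is ours, nothing beyond java.lang.management is assumed:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// A small sketch of capturing a thread dump from inside the JVM via the
// standard ThreadMXBean API, as an alternative to "kill -3" (which writes
// the dump to the server's stdout, hence console.log).
public class ThreadDumper {

    /** Returns a jstack-like dump of all live threads. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // true/true asks for locked monitors and ownable synchronizers,
        // which is where the "- locked <0x...>" / "waiting to lock <0x...>"
        // lines in the attached dumps come from.
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            // Note: ThreadInfo.toString() truncates very deep stacks;
            // iterate info.getStackTrace() yourself if you need every frame.
            sb.append(info);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

      On a full JDK, jstack <pid> produces the same kind of dump without going through the server's stdout.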

       

      We have also observed that with Wildfly 9 the server sometimes consumes all the processing time of one core until it is restarted. This can accumulate over multiple occurrences, but it does not happen often and does not lead to inoperability; the server remains fully responsive. This has been observed on the two aforementioned servers that ran into the more serious issue on Wildfly 10, and also on one server running CentOS 7 that has worked properly with Wildfly 10.

        • 1. Re: Wildfly 10.0.0 occasional performance issues leading to inoperability
          jaikiran

          Each of those thread dumps has a couple of threads that look interesting. Here are those two threads from one of the thread dumps:

          "default I/O-22" #110 prio=5 os_prio=0 tid=0x00007ff32c2a2800 nid=0x64d waiting for monitor entry [0x00007ff3adfa8000]
             java.lang.Thread.State: BLOCKED (on object monitor)
              at sun.security.ssl.SSLSessionImpl.getPacketBufferSize(SSLSessionImpl.java:783)
              - waiting to lock <0x000000053fcc8110> (a sun.security.ssl.SSLSessionImpl)
              at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:871)
              at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
              - locked <0x00000005455b47c8> (a java.lang.Object)
              at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
              at io.undertow.protocols.ssl.SslConduit.doUnwrap(SslConduit.java:705)
              at io.undertow.protocols.ssl.SslConduit.doHandshake(SslConduit.java:608)
              at io.undertow.protocols.ssl.SslConduit.access$600(SslConduit.java:63)
              at io.undertow.protocols.ssl.SslConduit$SslReadReadyHandler.readReady(SslConduit.java:1034)
              at io.undertow.protocols.ssl.SslConduit$1.run(SslConduit.java:229)
              at org.xnio.nio.WorkerThread.safeRun(WorkerThread.java:580)
              at org.xnio.nio.WorkerThread.run(WorkerThread.java:464)

           

           

          "default I/O-12" #98 prio=5 os_prio=0 tid=0x00007ff32c28e800 nid=0x643 runnable [0x00007ff3ae9b2000]
             java.lang.Thread.State: RUNNABLE
              at sun.security.ssl.SSLSessionImpl.getPacketBufferSize(SSLSessionImpl.java:783)
              - locked <0x000000053fcc8110> (a sun.security.ssl.SSLSessionImpl)
              at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:871)
              at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
              - locked <0x000000054542f908> (a java.lang.Object)
              at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
              at io.undertow.protocols.ssl.SslConduit.doUnwrap(SslConduit.java:705)
              at io.undertow.protocols.ssl.SslConduit.doWrap(SslConduit.java:789)
              at io.undertow.protocols.ssl.SslConduit.doHandshake(SslConduit.java:609)
              at io.undertow.protocols.ssl.SslConduit.access$600(SslConduit.java:63)
              at io.undertow.protocols.ssl.SslConduit$SslReadReadyHandler.readReady(SslConduit.java:1034)
              at io.undertow.protocols.ssl.SslConduit$1.run(SslConduit.java:229)
              at org.xnio.nio.WorkerThread.safeRun(WorkerThread.java:580)
              at org.xnio.nio.WorkerThread.run(WorkerThread.java:464)

           

          I have highlighted the object locks that seem to be playing a role, so it appears to be something related to SSL handling. I haven't looked at the code, so I won't be able to tell what exactly might be causing this.
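          To make the pattern in those two stacks concrete: "default I/O-12" owns the monitor of the shared SSLSessionImpl (locked <0x000000053fcc8110>) while "default I/O-22" is parked waiting to enter it (waiting to lock <0x000000053fcc8110>). Here is a minimal, self-contained illustration of that situation (not the JDK's actual code; class and thread names are ours):

```java
// A minimal illustration of the monitor contention visible in the dump:
// one thread holds an object's monitor inside a synchronized block (like
// "default I/O-12", which shows "locked <0x...8110>") while a second thread
// blocks trying to enter it (like "default I/O-22", which shows
// "waiting to lock <0x...8110>" and state BLOCKED).
public class MonitorContentionDemo {

    // Stand-in for the shared sun.security.ssl.SSLSessionImpl instance.
    private static final Object session = new Object();

    /** Runs the two-thread scenario and returns the waiter's observed state. */
    static Thread.State runScenario() {
        try {
            Thread holder = new Thread(() -> {
                synchronized (session) {   // holds the monitor...
                    sleep(1000);           // ...while doing "slow work"
                }
            }, "holder");
            holder.start();
            sleep(100);                    // let the holder acquire the monitor first

            Thread waiter = new Thread(() -> {
                synchronized (session) {   // blocks until the holder releases the monitor
                }
            }, "waiter");
            waiter.start();

            // Poll until the waiter parks on the monitor; in a kill -3 dump
            // it would now show "BLOCKED (on object monitor)".
            Thread.State observed = waiter.getState();
            for (int i = 0; i < 50 && observed != Thread.State.BLOCKED; i++) {
                sleep(10);
                observed = waiter.getState();
            }
            holder.join();
            waiter.join();
            return observed;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        System.out.println("waiter state: " + runScenario()); // prints "waiter state: BLOCKED"
    }
}
```

          Contention like this is harmless when the monitor is released quickly; it only becomes the symptom described above if many I/O threads keep queueing on the same session lock.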