3 Replies Latest reply on May 25, 2016 2:31 PM by dmlloyd

    EJB Client threads left waiting to acquire a semaphore.

    whitingjr

      Hi,

I am using a benchmark to drive load at an application which includes remote RMI calls on an SLSB, along with other protocol calls that are unrelated to the issue I have.

The benchmark has load driven by a number of client machines. On occasion one of the clients stops sending network traffic, which should not happen when the benchmark is midway through.

       

Using jstack I inspected the thread stack traces for one of the JVM processes. I found several hundred threads in the WAITING state with a stack trace like this:

       

      "AcmeAgent[0].2234" #2287 daemon prio=5 os_prio=0 tid=0x00007f25b0f60800 nid=0x4f1f waiting on condition [0x00007f231fa76000]

             java.lang.Thread.State: WAITING (parking)

                  at sun.misc.Unsafe.park(Native Method)

                  - parking to wait for  <0x00000003d4d54850> (a java.util.concurrent.Semaphore$FairSync)

                  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

                  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)

                  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)

                  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)

                  at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)

                  at org.jboss.ejb.client.remoting.ChannelAssociation.acquireChannelMessageOutputStream(ChannelAssociation.java:275)

                  at org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.processInvocationInternal(RemotingConnectionEJBReceiver.java:295)

                  at org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.access$100(RemotingConnectionEJBReceiver.java:77)

                  at org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver$2.run(RemotingConnectionEJBReceiver.java:282)

                  at java.security.AccessController.doPrivileged(Native Method)

                  at org.jboss.ejb.client.remoting.RemotingConnectionEJBReceiver.processInvocation(RemotingConnectionEJBReceiver.java:279)

                  at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:184)

                  at org.jboss.ejb.client.EJBObjectInterceptor.handleInvocation(EJBObjectInterceptor.java:58)

                  at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:186)

                  at org.jboss.ejb.client.EJBHomeInterceptor.handleInvocation(EJBHomeInterceptor.java:83)

                  at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:186)

                  at org.jboss.ejb.client.TransactionInterceptor.handleInvocation(TransactionInterceptor.java:42)

                  at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:186)

                  at org.jboss.ejb.client.ReceiverInterceptor.handleInvocation(ReceiverInterceptor.java:138)

                  at org.jboss.ejb.client.EJBClientInvocationContext.sendRequest(EJBClientInvocationContext.java:186)

                  at org.jboss.ejb.client.EJBInvocationHandler.sendRequestWithPossibleRetries(EJBInvocationHandler.java:255)

                  at org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:200)

                  at org.jboss.ejb.client.EJBInvocationHandler.doInvoke(EJBInvocationHandler.java:183)

                  at org.jboss.ejb.client.EJBInvocationHandler.invoke(EJBInvocationHandler.java:146)

                  at com.sun.proxy.$Proxy4.acmeTask(Unknown Source)

                  at org.acme.Acme.createVehicle(Acme.java:389)

                  at org.acme.Acme.createVehicleEJB(Acme.java:112)

                  at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)

                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

                  at java.lang.reflect.Method.invoke(Method.java:497)

                  at com.sun.faban.driver.engine.TimeThread.doRun(TimeThread.java:169)

                  at com.sun.faban.driver.engine.AgentThread.run(AgentThread.java:202)

       

       

After several minutes of waiting the JVM process still isn't driving load, with threads still parked in the WAITING state.

Looking at this, it appears that at some point beforehand an exception was thrown in the code, causing a failure to release the acquired semaphore permit. Over time permits might have leaked, eventually exhausting those available.

       

I am using WildFly 10 Final, which has become EAP 7.0 GA.

The version of jboss-ejb-client used is 2.1.4.Final, so I checked out the project and looked at the code.

       

Looking at acquireChannelMessageOutputStream I can see a possible cause for the leak.

Say, for example, a java.lang.InterruptedException is thrown and wrapped as a java.io.InterruptedIOException; the code will return before calling the line

this.channelWriteSemaphore.release();

In fact, to release the semaphore the method has to throw an IOException so that the exception catch block containing the release call is reached.
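
To illustrate, here is a simplified sketch of the shape I am worried about. It is not the actual ChannelAssociation code; the class, field and method names are stand-ins I made up.

import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Semaphore;

// Simplified sketch of the suspected acquire/release shape, not the real
// jboss-ejb-client code. All names here are illustrative stand-ins.
class ChannelWriterSketch {
    private final Semaphore channelWriteSemaphore = new Semaphore(80, true); // fair, as the dump shows

    OutputStream acquireChannelMessageOutputStream() throws Exception {
        channelWriteSemaphore.acquire();          // permit taken here
        try {
            return openMessageStream();           // may fail after the acquire
        } catch (IOException ioe) {
            channelWriteSemaphore.release();      // permit returned only on this path
            throw ioe;
        }
        // If a failure ever escaped by a route that bypassed the catch block,
        // the permit would never be returned and the pool of 80 would shrink.
    }

    private OutputStream openMessageStream() throws IOException {
        // stand-in for the real channel write call
        throw new IOException("not implemented in this sketch");
    }
}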

       

In an attempt to improve this code I patched it and tested again using the benchmark. The patch moves the release into a finally block to ensure semaphore permits do not leak.
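
The patch is roughly this shape, continuing the stand-in names from the sketch above (it is not the exact diff I applied):

    // Release tied to the acquire with try/finally so a permit cannot escape
    // the method without being returned. Note this also returns the permit on
    // the success path, so it no longer throttles in-flight writes.
    OutputStream acquireChannelMessageOutputStreamPatched() throws Exception {
        channelWriteSemaphore.acquire();
        try {
            return openMessageStream();
        } finally {
            channelWriteSemaphore.release();
        }
    }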

The issue of the JVM threads parked WAITING for a semaphore has gone.

       

Do you see the possibility of leaking semaphore permits?

       

      Regards,

      Jeremy

        • 1. Re: EJB Client threads left waiting to acquire a semaphore.
          whitingjr

OK, I do now see that there is a corresponding release for the acquired permit, in the RemotingConnectionEJBReceiver.processInvocationInternal method.

          • 2. Re: EJB Client threads left waiting to acquire a semaphore.
            whitingjr

The code does appear to be correct.

But again today a different machine (using the same JDK version) became wedged, no longer creating load for the target system.

             

As some background, the driver application logs the number of threads started in the JVM at startup.

             

<record>
  <host>someserver2</host>
  <date>2016-05-25T12:19:15</date>
  <millis>1464193155390</millis>
  <sequence>4524</sequence>
  <logger>com.sun.faban.driver.engine.AgentImpl</logger>
  <level>INFO</level>
  <class>com.sun.faban.driver.engine.AgentImpl</class>
  <method>run</method>
  <thread>28</thread>
  <message>DuffBeerAgent[1]: Successfully started 2250 driver threads.</message>
</record>

             

That indicates 2250 started threads.

When the JVM stopped creating load I used jstack to dump the call stack of each thread, saved to a file called 8704.txt.

To check the number of threads blocked waiting for a permit I used grep:

             

            # grep "AbstractQueuedSynchronizer.doAcquireSharedInterruptibly" /tmp/8704.txt |wc -l

            2250

             

Identical numbers: every thread is waiting for the semaphore. The EJB client was started using the default of 80 permits, if the documentation I found is still relevant:

             

            Overview of Client properties - EJB Client Library (AS7+) - Project Documentation Editor

             

I think I will need to track down where the 80 permits have gone.
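
As a sanity check on the diagnosis, the thread dump signature does match plain permit exhaustion. This small standalone demo (not tied to the EJB client; the names are mine) acquires all 80 permits of a fair Semaphore without releasing them, and every later caller then parks in doAcquireSharedInterruptibly, the frame counted by the grep above.

import java.util.concurrent.Semaphore;

public class PermitExhaustionDemo {
    public static void main(String[] args) throws InterruptedException {
        // Fair semaphore with 80 permits, matching Semaphore$FairSync in the dump.
        Semaphore permits = new Semaphore(80, true);

        // Leak every permit: acquire without a matching release.
        for (int i = 0; i < 80; i++) {
            permits.acquire();
        }

        // Any further caller now parks, just like the benchmark threads.
        Thread stuck = new Thread(() -> {
            try {
                permits.acquire();
            } catch (InterruptedException ignored) {
                // interrupted below so the demo can exit
            }
        }, "stuck-caller");
        stuck.start();

        Thread.sleep(1000);
        System.out.println("stuck-caller state: " + stuck.getState()); // WAITING

        stuck.interrupt(); // let the demo terminate
    }
}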

            • 3. Re: EJB Client threads left waiting to acquire a semaphore.
              dmlloyd

It is possible that for some reason the message streams are being opened but never closed (thus never releasing the semaphore), i.e. somehow the request is going into a "black hole". That said, in this particular version I'm not 100% sure of the code paths, especially the resiliency around failure cases, which I'd have to analyze from scratch. So there may just be a plain old bug in some path (e.g. exception or cancellation handling, or maybe asynchronous invocation) that causes the cleanup to be bypassed.
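
Roughly, the intended pattern is that the permit's lifetime is tied to the message stream, something like this sketch (illustrative names only, not the EJB client's actual classes):

import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only: the permit acquired before opening the message
// stream is returned when the stream is closed. A request that is never
// completed or cleaned up (the "black hole" case) never closes its stream
// and so keeps its permit forever.
class PermitReleasingStream extends FilterOutputStream {
    private final Semaphore permits;
    private final AtomicBoolean released = new AtomicBoolean();

    PermitReleasingStream(OutputStream delegate, Semaphore permits) {
        super(delegate);
        this.permits = permits;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            if (released.compareAndSet(false, true)) {
                permits.release(); // returned exactly once, on close
            }
        }
    }
}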