4 Replies Latest reply on Feb 13, 2007 2:59 AM by Yuske KOMORI

    Memory leak caused by Connection$PingTask

    John Amos Newbie

      JBoss [Zion] 4.0.3SP1 (build: CVSTag=JBoss_4_0_3_SP1 date=200510231054)

      ClockDaemon is used by org.jboss.mg.Connection to periodically run Connection$PingTask. ClockDaemon holds a Heap that contains references to PingTask instances. Normally, the PingTask instances are removed from the ClockDaemon's Heap by ClockDaemon$RunLoop, which contains an infinite loop that runs each PingTask.

      Ocassionally, after running my web application for 20-30 hours, I start to see a memory leak that is caused by references to objects in the ClockDaemon's Heap.

      My application has a scheduler that issues a JMS event every 30 seconds, and each one instantiates a new Connection. The constructor for Connection calls startPingThread(), which runs

      clockDaemon.executePeriodically(pingPeriod, new PingTask(), true);


      This method call inserts a new ClockDaemon$TaskNode on the ClockDaemon's Heap.

      When I put my debugger on the leaking application, I see that ClockDaemon$RunLoop is hung on the following line:

      task.command.run();


      The task is a TaskNode instance with a command that is a PingTask instance. As a result, the loop is stopped, so none of the objects can be removed from the ClockDaemon's Heap.

      Drilling down into the PingTask.run() method, I see that it is hung at the following line:

      pingTaskSemaphore.acquire();


      My debugger shows that pingTaskSemaphore has zero permits, so the acquire() method will block until another thread calls pingTaskSemaphore.release(). But apparently release() is never called.

      It seems to me that this should never be allowed to happen, because a low priority ping task is effectively hijacking the ClockDaemon's loop that dereferences objects. Once the application reaches this state, the JVM ultimately fails with an OutOfMemory error.

      I don't immediately see why pingTaskSemaphore.release() is never called, except that pingTaskSemaphore.acquire() is not in the same try-finally block. The PingTask.run() method is below:

       /**
       * The ping task
       */
       class PingTask implements Runnable
       {
       /**
       * Main processing method for the PingTask object
       */
       public void run()
       {
       try
       {
       pingTaskSemaphore.acquire();
       }
       catch (InterruptedException e)
       {
       log.debug("Interrupted requesting ping semaphore");
       return;
       }
       try
       {
       if (ponged == false)
       {
       // Server did not pong use with in the timeout
       // period.. Assuming the connection is dead.
       throw new SpyJMSException("No pong received", new IOException("ping timeout."));
       }
      
       ponged = false;
       pingServer(System.currentTimeMillis());
       }
       catch (Throwable t)
       {
       asynchFailure("Unexpected ping failure", t);
       }
       finally
       {
       pingTaskSemaphore.release();
       }
       }
       }
      


      Notice that pingTaskSemaphore.release() is in a finally block, but it isn't the same try block as the pingTaskSemaphore.acquire() method call, which means that it is possible for pingTaskSemaphore.acquire() to decrement the Semaphore's permits and then throw an exception that is not InterruptedException. In that case, the finally block would never be executed because the Exception would be thrown from the first try block. However, this last paragraph is speculation. I don't really know what is happening. I also don't know why it takes 20-30 hours of operation for this problem to appear. I don't have a reproducible test case because I can't consistently reproduce this problem.