7 Replies Latest reply on Feb 7, 2008 5:44 AM by timfox

    Threads not being cleaned up when clustered

    chipschoch

      Clustered App servers. JBossAS 4.2.2.GA, JBM 1.4.0SP3

      --------------
      | Appserver1 | <====== ConvServer1
      | Appserver2 | <====== ConvServer2
      --------------
      

      My configuration is as shown above. I have two Linux JBoss servers clustered, and two Windows JBoss servers, not clustered, that act as clients to the clustered queues on the app servers. The conv servers connect using the ClusteredXAConnectionFactory. When I run a bunch of messages through in this configuration, the thread count on both app servers continually increases until eventually I run out of memory.
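
      For illustration, here is a minimal sketch of how a conv server client might obtain the
      clustered factory. The HA-JNDI provider URL and the ClusteredXAConnectionFactory JNDI
      name are assumptions about a typical JBM 1.4 setup, not taken from the actual configuration:

      // Sketch only: the provider URL and JNDI binding name are assumptions.
      import java.util.Properties;
      import javax.jms.XAConnectionFactory;
      import javax.naming.Context;
      import javax.naming.InitialContext;

      public class ClusteredFactoryLookup {
          public static XAConnectionFactory lookup() throws Exception {
              Properties env = new Properties();
              env.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
              env.put(Context.URL_PKG_PREFIXES, "org.jboss.naming:org.jnp.interfaces");
              // HA-JNDI provider list, so either clustered node can answer the lookup
              env.put(Context.PROVIDER_URL, "appserver1:1100,appserver2:1100");
              Context ctx = new InitialContext(env);
              try {
                  return (XAConnectionFactory) ctx.lookup("ClusteredXAConnectionFactory");
              } finally {
                  ctx.close();
              }
          }
      }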

      When I shut down one of the app servers and perform the same test, the thread count on the other app server decreases initially and then remains steady. It does not increase when I run a bunch of messages through it.
      --------------
      |            | <====== ConvServer1
      | Appserver2 | <====== ConvServer2
      --------------
                                       DevApp1     DevApp2
                                       ThreadCount ThreadCount

      Cluster with queuing started on DevApp2
      Start                            158         200
      After 4 uSign packages           167         206
      After 4 uSign packages           173         208
      After 4 uSign packages           179         208
      After hundred events             177         212
      After 10 WPS packages            178         212
      After 10 WPS packages            180         212

      DevApp2 only
      Before shutting down DevApp1                 212
      After shutting down DevApp1                  187
      After 4 uSign packages                       187
      After 6 uSign packages                       187

      Restarted with each conv server connecting to a
      different app server (no message sucking)
      Start                            155         225
      After 4 uSign packages           169         228
      After 4 uSign packages           174         232
      After 3 packages one at a time   175         235*
      After 20 uSign packages          187         256
      
      


      Without belaboring the point, in our system a package causes a series of
      messages to be posted to various queues as it moves through the processing chain.

      The packages called uSign require that a conversion be performed by a service
      running on the Windows (conv) servers. These are the ones that cause the increase
      in the thread count. I am convinced that it is not my code that is leaving the
      threads around, because it only happens when the cluster has more than one
      server running.

      * I observed that Appserver1 processed the message, but the thread count increased
      on Appserver2 anyway.


      Here are a couple of the stranded threads from the jmx-console view.

      Thread: Thread-1562 : priority:5, demon:true, threadId:4470, threadState:WAITING, lockName:java.lang.Object@11bad13
      
       java.lang.Object.wait(Native Method)
       java.lang.Object.wait(Object.java:474)
       EDU.oswego.cs.dl.util.concurrent.LinkedQueue.take(LinkedQueue.java:122)
       EDU.oswego.cs.dl.util.concurrent.QueuedExecutor$RunLoop.run(QueuedExecutor.java:83)
       java.lang.Thread.run(Thread.java:595)
      
      Thread: Thread-1568 : priority:5, demon:false, threadId:4478, threadState:WAITING, lockName:java.lang.Object@1ffed3a
      
       java.lang.Object.wait(Native Method)
       java.lang.Object.wait(Object.java:474)
       EDU.oswego.cs.dl.util.concurrent.LinkedQueue.take(LinkedQueue.java:122)
       EDU.oswego.cs.dl.util.concurrent.QueuedExecutor$RunLoop.run(QueuedExecutor.java:83)
       java.lang.Thread.run(Thread.java:595)
      
      Thread: Thread-1569 : priority:5, demon:true, threadId:4479, threadState:WAITING, lockName:java.lang.Object@12f926d
      
       java.lang.Object.wait(Native Method)
       java.lang.Object.wait(Object.java:474)
       EDU.oswego.cs.dl.util.concurrent.LinkedQueue.take(LinkedQueue.java:122)
       EDU.oswego.cs.dl.util.concurrent.QueuedExecutor$RunLoop.run(QueuedExecutor.java:83)
       java.lang.Thread.run(Thread.java:595)
      
      
      
      


      Has anyone seen anything like this before?

        • 1. Re: Threads not being cleaned up when clustered
          timfox

          Please can you post (or mail me) a complete thread dump of the server when this problem occurs? (killall -3 java)

          • 2. Re: Threads not being cleaned up when clustered
            chipschoch

            Tim,
            I emailed a thread dump to tim.fox@jboss.com.

            I have been able to narrow my parameters and get a reproducible environment for this issue.

            I wrote a webapp program that queues up messages to the clustered queue. It uses the default provider, so it is always queuing to its partial queue. This is executed on AppSvr1.

            The consumers are connected to AppSvr2, so all messages posted get sucked from AppSvr1 to AppSvr2. In this configuration both servers leak the same number of threads, one for each message.

            However, when I changed the request message to not specify a temporary queue for the return message, I got no leakage. My consumer has a default queue to which it sends responses if a response queue is not specified in the message. All the response messages end up on the default response queue.
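
            Roughly, the posting code does something like this (a simplified sketch, not the
            actual webapp code; the class name, queue parameter, and helper structure are
            illustrative):

            import javax.jms.*;

            // Illustrative request producer; names are assumptions, not the real code.
            public class RequestSender {
                public void send(QueueConnectionFactory cf, Queue requestQueue, String body)
                        throws JMSException {
                    QueueConnection conn = cf.createQueueConnection();
                    try {
                        QueueSession session = conn.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
                        TextMessage msg = session.createTextMessage(body);
                        // Case (a): a temporary reply queue per message -- the configuration
                        // that leaks one thread per message on both servers.
                        TemporaryQueue replyQueue = session.createTemporaryQueue();
                        msg.setJMSReplyTo(replyQueue);
                        // Case (b): omit setJMSReplyTo and let the consumer fall back to its
                        // default response queue -- no threads are leaked that way.
                        session.createSender(requestQueue).send(msg);
                    } finally {
                        conn.close(); // closing the connection should also remove the temp queue
                    }
                }
            }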

            So to summarize:

            Cluster has 2 JBoss servers, AppSvr1 & AppSvr2.
             AppSvr1 posts messages to [partial queue 1] requestQueue.
             Messages are sucked over to AppSvr2 [partial queue 2] requestQueue.
             Consumers consume from [partial queue 2] requestQueue.
            
             a) When a temporary response queue is specified, all response
             messages end up back on AppSvr1, but both servers leak one
             thread per message.
            
             b) When no response queue is specified, all responses end up on
             AppSvr2 [partial queue 2] responseQueue (the default) and no
             threads are leaked.
            

            It would appear that the issue is somewhere in the code that deals with the temporary queues.

            I hope this helps to resolve this.

            • 3. Re: Threads not being cleaned up when clustered
              timfox

              I would have a look in your code to see where you are creating temporary queues, and make sure you are deleting them when you're finished.

              Also it's worth taking a look in JNDI (use jmx-console) to see if there are a lot of temp queues hanging around.
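
               Something along these lines, i.e. close the consumer and delete the temporary
               queue once the reply has been handled (an illustrative sketch, not your code;
               the 30 second timeout is arbitrary):

               import javax.jms.*;

               // Illustrative request/reply helper; assumes the connection is already started.
               public class TempQueueCleanup {
                   public Message requestReply(QueueSession session, Queue requestQueue, String body)
                           throws JMSException {
                       TemporaryQueue replyQueue = session.createTemporaryQueue();
                       QueueSender sender = session.createSender(requestQueue);
                       QueueReceiver receiver = session.createReceiver(replyQueue);
                       try {
                           TextMessage msg = session.createTextMessage(body);
                           msg.setJMSReplyTo(replyQueue);
                           sender.send(msg);
                           return receiver.receive(30000);   // wait up to 30s for the reply
                       } finally {
                           receiver.close();                 // close consumers before deleting the queue
                           sender.close();
                           replyQueue.delete();              // explicit cleanup when finished
                       }
                   }
               }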

              • 4. Re: Threads not being cleaned up when clustered
                timfox

                 BTW, I would avoid creating a new temporary reply queue for every message you send. This is likely to adversely affect performance.

                • 5. Re: Threads not being cleaned up when clustered
                  chipschoch

                   Deleting the TemporaryQueue has no effect. The JMS API spec for createTemporaryQueue() says:

                   "Create a temporary queue. Its lifetime will be that of the QueueConnection unless deleted earlier."

                   I create a connection, make the call, and close the connection, so I should not need to delete the queue. That said, I put in code to delete it anyway and it makes no difference. Also, I created a temporary queue, then stopped execution in my debugger and went to jmx-console. The JNDIView does not list any temporary queues. Go figure.

                  • 6. Re: Threads not being cleaned up when clustered
                    chipschoch

                     So, I modified my code to reuse the same JMS connection and temporary queue within each process that posts messages, and now the thread leakage is gone.

                     While this change admittedly optimizes the processing, I would still consider it a workaround for a bug that does not dispose of the thread created by the use of a temporary queue.
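
                     The reworked code is roughly of this shape (a simplified sketch, not the
                     actual code; the class name and the 30 second timeout are illustrative):

                     import javax.jms.*;

                     // One connection, session, and temporary reply queue per posting process,
                     // reused for every message instead of being created per message.
                     public class ReusableRequestor {
                         private final QueueConnection conn;
                         private final QueueSession session;
                         private final QueueSender sender;
                         private final TemporaryQueue replyQueue;
                         private final QueueReceiver receiver;

                         public ReusableRequestor(QueueConnectionFactory cf, Queue requestQueue)
                                 throws JMSException {
                             conn = cf.createQueueConnection();
                             session = conn.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
                             sender = session.createSender(requestQueue);
                             replyQueue = session.createTemporaryQueue(); // created once, reused
                             receiver = session.createReceiver(replyQueue);
                             conn.start();
                         }

                         public Message request(String body) throws JMSException {
                             TextMessage msg = session.createTextMessage(body);
                             msg.setJMSReplyTo(replyQueue);
                             sender.send(msg);
                             return receiver.receive(30000); // one synchronous reply per request
                         }

                         public void close() throws JMSException {
                             conn.close(); // also disposes of the temporary queue
                         }
                     }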

                    • 7. Re: Threads not being cleaned up when clustered
                      timfox

                       I agree that, although your application's usage of temporary queues was an anti-pattern, it shouldn't leak threads as long as you were closing the connection.

                      Can you create a JIRA with a small program that demonstrates this issue and we will investigate further?

                       Also, can you first verify you're not just hitting http://jira.jboss.org/jira/browse/JBMESSAGING-1215, an issue that was fixed a while back?