6 Replies Latest reply on Apr 6, 2004 2:18 PM by richieb

    JBoss on Solaris leaves threads hanging

    richieb

      We are testing JBoss on Solaris and observe the following: if a client process is killed, its topic subscriptions are not cleaned up, and other clients publishing on that topic fill up the queues.

      Looking at the thread listing in the jmx-console, we see several ReadTask threads that are stuck in a socket read and never time out:

      
      Thread: OIL Worker Server : priority:5, demon:false
      Thread: OIL2 Worker Server : priority:5, demon:false
      Thread: UILServerILService Accept Thread : priority:5, demon:false
      Thread: Message Pushers-1 : priority:5, demon:true
      Thread: Message Pushers-2 : priority:5, demon:true
      Thread: UIL2.SocketManager.ReadTask#5 : priority:5, demon:true
      Thread: UIL2.SocketManager.WriteTask#6 : priority:5, demon:true
      Thread: Message Pushers-3 : priority:5, demon:true
      Thread: Message Pushers-4 : priority:5, demon:true
      Thread: Message Pushers-5 : priority:5, demon:true
      Thread: Message Pushers-6 : priority:5, demon:true
      Thread: UIL2.SocketManager.ReadTask#7 : priority:5, demon:true
      Thread: UIL2.SocketManager.WriteTask#8 : priority:5, demon:true
      


      How can I make sure these connections are cleaned up? I cannot depend on the client, since it runs over an unreliable network and sometimes loses the connection entirely.

      I tried to enable ClientMonitorInterceptor but that doesn't do anything.

      This is JBoss 3.2.3 on Solaris 5.8. Clients run on Windows and Linux.

      BTW, this problem does not occur when the server is running on Linux. If the client is killed, the subscriptions are immediately cleaned up.

      Any suggestions greatly appreciated.

      ...richie



        • 1. Re: JBoss on Solaris leaves threads hanging

          There is a "ReadTimeout" parameter configured in deploy/jms/uil2-service.xml
          (default 2 minutes, i.e. 120000 milliseconds).

          If the server doesn't read anything from the socket within that period, it
          automatically closes the connection.
          NOTE: If the client is active, it should be pinging the server every "PingPeriod"
          milliseconds.

          The ReadTimeout is a TCP/IP socket option; it is not implemented by JBoss.
          If it is not working on Solaris, I suggest you ask Sun why.
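          A minimal Java sketch of what a read timeout does, assuming ReadTimeout is applied through the standard Socket.setSoTimeout call (the SO_TIMEOUT option), as this reply implies. The class and method names are hypothetical, not JBoss code. A server accepts a connection from a client that never sends anything (so, no pings), and its blocking read gives up after the timeout instead of hanging:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    // Accepts one connection and waits for a single byte, giving up after
    // readTimeoutMillis. Returns true if the read timed out.
    static boolean acceptAndRead(ServerSocket server, int readTimeoutMillis) throws IOException {
        try (Socket conn = server.accept()) {
            // SO_TIMEOUT: read() throws SocketTimeoutException instead of blocking forever
            conn.setSoTimeout(readTimeoutMillis);
            InputStream in = conn.getInputStream();
            try {
                in.read(); // a byte, or -1 if the peer closed the connection
                return false;
            } catch (SocketTimeoutException e) {
                return true; // nothing arrived in time; try-with-resources closes conn
            }
        }
    }

    // Connects a client that never sends anything and reports whether the
    // server-side read timed out.
    public static boolean silentClientTimesOut(int readTimeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket silentClient = new Socket(loopback, server.getLocalPort())) {
            return acceptAndRead(server, readTimeoutMillis);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + silentClientTimesOut(500));
    }
}
```

          An active client avoids this by sending something (a ping) more often than the timeout. If SO_TIMEOUT itself never fires on a given platform, the read simply blocks indefinitely.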

          Regards,
          Adrian

          • 2. Re: JBoss on Solaris leaves threads hanging
            richieb

            I have ReadTimeout set and it seems to work sometimes.

            The problem seems to occur when I run clients over a wide area network, which has long delays and occasional errors.

            A simple kill -9 on a client seems to clean everything up properly. It's when network glitches occur that things can get messed up.

            What can I trace to get some more info about what's going on?
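            One low-tech way to see what the ReadTask threads are doing over time (a suggestion, not a JBoss feature) is to dump the stacks of threads whose names start with "UIL2.SocketManager.ReadTask" periodically and compare them with the healthy state; sending kill -3 to the JVM prints similar information for all threads. The sketch below assumes Java 8 or later and would have to run inside the server JVM; ReadTaskDump and the fake thread in main are hypothetical.

```java
import java.util.Map;

public class ReadTaskDump {

    // Returns a jstack-like dump of every live thread whose name starts with
    // the given prefix: name, daemon flag, state, then one line per frame.
    public static String dumpThreads(String prefix) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            if (!t.getName().startsWith(prefix)) {
                continue;
            }
            sb.append('"').append(t.getName()).append('"')
              .append(t.isDaemon() ? " daemon" : "")
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for a UIL2 ReadTask: a daemon thread that just blocks.
        Thread fake = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // exit quietly when interrupted
            }
        }, "UIL2.SocketManager.ReadTask#demo");
        fake.setDaemon(true);
        fake.start();
        Thread.sleep(100); // let it reach sleep() so its stack is non-trivial
        System.out.print(dumpThreads("UIL2.SocketManager.ReadTask"));
    }
}
```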

            Thanks!

            ...richie

            • 3. Re: JBoss on Solaris leaves threads hanging
              richieb

              One thing I noticed: when the connections start going bad, the thread dump of the ReadTask looks like this:

              "UIL2.SocketManager.ReadTask#11" daemon prio=5 tid=0x00cfade8 nid=0x72 runnable [ae580000..ae5819c0]
               at java.net.SocketInputStream.socketRead0(Native Method)
               at java.net.SocketInputStream.read(SocketInputStream.java:129)
               at java.io.BufferedInputStream.fill(BufferedInputStream.java:183)
               at java.io.BufferedInputStream.read1(BufferedInputStream.java:222)
               at java.io.BufferedInputStream.read(BufferedInputStream.java:277)
               - locked <0xcb846e28> (a org.jboss.util.stream.NotifyingBufferedInputStream)
               at org.jboss.util.stream.NotifyingBufferedInputStream.read(NotifyingBufferedInputStream.java:77)
               at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2150)
               at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2370)
               at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2539)
               at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2579)
               at java.io.ObjectInputStream.readFully(ObjectInputStream.java:944)
               at org.jboss.mq.SpyObjectMessage.readExternal(SpyObjectMessage.java:154)
               at org.jboss.mq.SpyMessage.readMessage(SpyMessage.java:726)
               at org.jboss.mq.ReceiveRequest.readExternal(ReceiveRequest.java:35)
               at org.jboss.mq.il.uil2.msgs.ReceiveRequestMsg.read(ReceiveRequestMsg.java:55)
               at org.jboss.mq.il.uil2.SocketManager$ReadTask.run(SocketManager.java:304)
               at java.lang.Thread.run(Thread.java:534)
              


              Normally these threads sit in the "readByte" routine...
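              The difference is telling: a thread waiting in readByte is idle between messages, while the trace above shows the thread inside readFully, in the middle of reading a message body, which suggests part of a message arrived and the rest never did. The sketch below reproduces that state on loopback, assuming a hypothetical frame layout of [type byte][int length][body] (not the actual UIL2 wire format): the sender announces a 100-byte body, sends 10 bytes and goes quiet. With SO_TIMEOUT set, the reader normally gives up in readFully instead of blocking forever.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class PartialMessageDemo {

    // Reads one hypothetical frame, [type byte][int length][body], and reports
    // which step the reader was in if the read timed out.
    static String readFrame(DataInputStream in) throws IOException {
        String step = "readByte";   // idle: waiting for the next message
        try {
            in.readByte();
            step = "readInt";
            int length = in.readInt();
            step = "readFully";     // mid-message: waiting for the rest of the body
            in.readFully(new byte[length]);
            return "complete";
        } catch (SocketTimeoutException e) {
            return "timed out in " + step;
        }
    }

    // The sender announces a 100-byte body, sends only 10 bytes, then goes quiet.
    public static String demo(int readTimeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket sender = new Socket(loopback, server.getLocalPort());
             Socket receiver = server.accept()) {
            receiver.setSoTimeout(readTimeoutMillis);
            DataOutputStream out = new DataOutputStream(sender.getOutputStream());
            out.writeByte(1);          // message type
            out.writeInt(100);         // announced body length
            out.write(new byte[10]);   // partial body
            out.flush();
            return readFrame(new DataInputStream(receiver.getInputStream()));
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo(500)); // prints "timed out in readFully"
    }
}
```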

              ...richie

              • 4. Re: JBoss on Solaris leaves threads hanging
                starksm64

                What is the netstat status of these connections, connected? Perhaps the tcp_keepalive_interval is too high, or not all OS patches for Java have been installed.

                • 5. Re: JBoss on Solaris leaves threads hanging
                  richieb

                   

                  "scott.stark@jboss.org" wrote:
                  What is the netstat status of these connections, connected? Perhaps the tcp_keepalive_interval is too high, or not all OS patches for Java have been installed.


                  Ah! Thanks for the suggestion. It seems that tcp_keepalive_interval is set to the default value of 120 minutes (!!!). Running netstat after reproducing the problem, I see that the connection is still active, even after the clients have been killed.
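                  For background: the OS keepalive interval only matters for sockets that have SO_KEEPALIVE enabled. Once that interval passes with no traffic, the OS probes the peer, and if the peer is gone the connection is dropped, so a blocked read typically fails with an exception instead of waiting forever. The interval itself is a system-wide tunable (on Solaris, ndd /dev/tcp tcp_keepalive_interval shows it in milliseconds), not something set by this Java call. A minimal sketch of enabling the option on an accepted socket follows; whether JBoss UIL2 enables it on its own sockets is not stated in this thread, and KeepAliveDemo is a hypothetical name.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveDemo {

    // Accepts one connection and turns on SO_KEEPALIVE for it. The probe
    // timing comes from the OS (on Solaris, tcp_keepalive_interval), not
    // from this call. Returns the option value read back from the socket.
    static boolean acceptWithKeepAlive(ServerSocket server) throws IOException {
        try (Socket conn = server.accept()) {
            conn.setKeepAlive(true);
            return conn.getKeepAlive();
        }
    }

    public static boolean demo() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            return acceptWithKeepAlive(server);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SO_KEEPALIVE enabled: " + demo());
    }
}
```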

                  ...richie

                  • 6. Re: JBoss on Solaris leaves threads hanging
                    richieb

                    Here is the resolution of this issue. It turns out that we hit a weird networking/Solaris problem between two specific Solaris machines. When the client on one machine disconnects from the server (running on the other machine), Solaris does not clean up the sockets.

                    As far as JBoss is concerned, the sockets are still open. Furthermore, this condition seems to break the SO_TIMEOUT setting: even though a timeout is set on the read, the read never times out.

                    I was able to reproduce this behavior with a simple program that just opens socket connections and transmits some data.

                    Clearly this is not a JBoss problem but a Solaris/network issue.

                    Thanks for all the pointers!

                    ...richie