1 Reply Latest reply on May 22, 2003 1:44 PM by pwwisnes

    Sockets stuck in CLOSE_WAIT under load

    pwwisnes

      While stress testing our application which runs under JBoss 3.0.6 on a dual processor Linux box (kernel 2.4.18-5smp) using Sun's 1.4.1_01 JVM, we've encountered an interesting failure. The box is providing a variety of stateful and stateless session beans to a separate web server box that is running Apache with the Resin servlet container.

      At right around 500 users, the web server box starts reporting the following error:

      05/16 10:17:14.377 akZtqtw_JFfd (PageDisplayUI.java:62) - Exception occurred ...
      java.rmi.RemoteException: Service unavailable

      At that time, we noticed that the number of Apache and Resin threads skyrocketed on the web server box and the number of threads on the JBoss box jumped from 100 or so to over 350. During this time, the load on either box did not exceed 2 (and averaged around 1.2).

      Further investigation revealed that JBoss reported an active thread count of 350+ threads, but its list-threads method on the JMX console only reported around 85 or so. We then noticed that there were a large number of sockets open to the box that were stuck in the CLOSE_WAIT state. In fact, the number of sockets open exactly corresponded to the number of missing threads.

      Once we stopped the load on the web server, the web box did not show any open sockets to JBoss while the sockets stuck in CLOSE_WAIT on the JBoss box remained. In fact, these sockets did not go away until we stopped JBoss.
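For anyone who wants to track the CLOSE_WAIT count during a test run, here is a minimal sketch, not what we actually used. It assumes a Linux box that exposes the IPv4 socket table at /proc/net/tcp, where the fourth column ("st") holds the TCP state as a hex code and 08 means CLOSE_WAIT. The class name CloseWaitCounter is made up for illustration, and the code uses java.nio.file, so it needs a much newer JVM than the 1.4.1 we were running.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: counts sockets in CLOSE_WAIT from /proc/net/tcp-style lines.
class CloseWaitCounter {
    // Linux encodes the TCP state as hex in the "st" column; 08 is CLOSE_WAIT.
    static final String CLOSE_WAIT = "08";

    static int countCloseWait(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            // Columns: sl, local_address, rem_address, st, ...
            String[] fields = line.trim().split("\\s+");
            if (fields.length > 3 && fields[3].equalsIgnoreCase(CLOSE_WAIT)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path table = Paths.get("/proc/net/tcp"); // IPv4 sockets only
        if (!Files.isReadable(table)) {
            System.out.println("/proc/net/tcp is not readable on this system");
            return;
        }
        System.out.println("CLOSE_WAIT sockets: " + countCloseWait(Files.readAllLines(table)));
    }
}
```

Running this every few seconds alongside the JMX thread count would show whether the two numbers move together, as they did for us.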

      At this point our leading theory is that we hit some kind of resource limit (probably related to networking) and that this caused RMI-related errors which JBoss was not able to recover from due to a bug in either JBoss or the RMI code. I have searched Sun's Java bug database, the JBoss forums, and the SourceForge project's bug database and not seen anything that appears to be related to this.

      So I'm posting this to the forum to see if anyone has ever seen anything similar or has any ideas of possible kernel or JBoss parameters to tune. We plan on trying the latest Sun JVM (1.4.1_02) as well as a newer JBoss version (3.0.7 and 3.2), and possibly even a different vendor's JVM. If we can find the cause of the failure, I'd like to then improve JBoss' handling of it.

        • 1. Re: Sockets stuck in CLOSE_WAIT under load
          pwwisnes

          Well, I ended up solving our own problem. It turns out that when the project was migrated from Solaris to Linux a few months back, they left the memory settings for the JVM at the old high levels. As a result, the heap was using up almost the entire 1 GB virtual address space available to the process (the default address space size that Red Hat's kernel is built with).

          This was causing some thread deep in the bowels of the RMI library to fail to spawn (due to lack of room for the thread's stack) and this thread was responsible for cleaning up the sockets in CLOSE_WAIT state. So the sockets stuck in CLOSE_WAIT were a side effect and not the cause of our problems.
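To make the arithmetic concrete, here is a rough sketch of the budget involved. Every number in it is a made-up illustration, not a measurement from our box: a 1 GB address space, 960 MB reserved for heap and everything else, and 512 KB per thread stack. The class and method names are hypothetical too. The only point is that whatever the heap reserves (the -Xmx setting) is no longer available for thread stacks (sized with -Xss).

```java
// Hypothetical back-of-the-envelope model: how many thread stacks still fit
// once the heap and other reservations are taken out of the address space.
class AddressSpaceBudget {
    static long maxThreadStacks(long addressSpaceBytes, long reservedBytes, long stackBytes) {
        long free = addressSpaceBytes - reservedBytes;
        // Nothing left over means no new thread can get a stack.
        return free <= 0 ? 0 : free / stackBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long addressSpace = 1024 * mb; // assumed 1 GB per-process limit
        long reserved = 960 * mb;      // assumed heap plus everything else
        long stack = 512 * 1024L;      // assumed per-thread stack size
        System.out.println("Thread stacks that fit: "
                + maxThreadStacks(addressSpace, reserved, stack));
    }
}
```

Lowering the heap setting or the per-thread stack size both raise this ceiling; in our case, bringing the heap back down to a sensible size for the Linux box was enough.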