2 Replies Latest reply on Feb 23, 2006 8:07 PM by smarlow

    99% usage using cluster

    possumpc

      I am seeing an odd behavior that maybe someone else has had experience with.
      Jboss 4.0.1sp1.

      I have an application that is deployed on a cluster, to 3 nodes, on
      separate Linux servers (SLES 9).

      After a while (3-4 days), nodes 2 and 3 spike to 99% CPU usage, and the
      application becomes sluggish. Node 1 appears stable.

      If I restart nodes 2 and 3, they return to normal, but then node 1 spikes to 99%.

      Can anyone explain what may be happening and where I should look to tune this?
      This is a production server so it is hard to "play around" with settings.

      Thanks in advance for any help.





        • 1. Re: 99% usage using cluster
          smarlow

          It would be useful to get a quick snapshot of what is going on in an application server that is consuming 99% CPU.

          The best way to get a snapshot is to dump the stack traces for the application server. The thread stack traces will be written to the server's standard output (you can redirect stdout/stderr to a file so the trace doesn't scroll off, or configure the server logs to capture it).
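
          As a rough sketch of the redirection mentioned above: the helper below appends both stdout and stderr of a command to one file. The function name run_logged and the log path in the usage note are made up for illustration; they are not part of JBoss.

```shell
#!/bin/sh
# Hypothetical helper: run a command with stdout and stderr appended to a
# single log file, so that thread dumps written to the console are kept.
run_logged() {
    logfile=$1
    shift
    # ">>" appends, so earlier dumps in the same file are not overwritten.
    "$@" >> "$logfile" 2>&1
}
```

          For example, assuming the server is started with run.sh from its bin directory: run_logged /var/tmp/jboss-console.log ./run.sh &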

          1. Open a shell and run the "ps -ef | grep jboss" command. Note the process id of the application server.
          Example output:
          smarlow 11947 11941 74 22:41 pts/1 00:00:12 /usr/java/j2sdk1.4.2_07/bin/java -server -Xms128m -Xmx128m -Dprogram.name=run.sh -Djava.endorsed.dirs=/disk2/jboss/jboss-4.0.2/lib/endorsed -classpath /disk2/jboss/jboss-4.0.2/bin/run.jar:/usr/java/j2sdk1.4.2_07/lib/tools.jar org.jboss.Main

          The second field is the process id, 11947 in my case.

          2. Run kill -3 (SIGQUIT) against the application server process id:
          kill -3 11947

          3. Repeat step #2 two or three times. Keep in mind that the application server will pause while each thread's current call stack is printed to the console output device. This could make the application even less responsive while the trace is collected and printed.
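
          Steps 1 to 3 could be wrapped in a small script along these lines. This is only a sketch: the function names, the default of three dumps, and the five second pause are my assumptions, and the match on org.jboss.Main is taken from the example ps output above.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the process
# id (second field) of lines that mention org.jboss.Main, skipping any
# grep command that happens to be in the listing.
extract_pid() {
    awk '/org\.jboss\.Main/ && !/grep/ { print $2 }'
}

# Hypothetical helper: send `kill -3` to a process several times.
#   $1 = process id, $2 = number of dumps (default 3),
#   $3 = seconds to wait between dumps (default 5)
dump_threads() {
    pid=$1
    count=${2:-3}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Signal 3 is SIGQUIT; the JVM prints its thread stacks to stdout.
        kill -3 "$pid" || return 1
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
    done
    return 0
}
```

          Usage would be something like: dump_threads "$(ps -ef | extract_pid)". If more than one JBoss instance runs on the machine, extract_pid prints several ids, so pick one by hand.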

          You can read more about Java stack traces here: http://java.sun.com/developer/technicalArticles/Programming/Stacktrace

          Reading Java stack traces can be slow and tedious, but you should be able to gain hints on what to check next by examining the traces.
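
          To make the reading a little less tedious, something like the following could count how often each top stack frame shows up across the dumps. It is a sketch, not a real analyzer: it assumes each thread entry starts with a line beginning with a double quote (the thread name) and that frames are lines starting with "at ", and the function name top_frames is made up.

```shell
#!/bin/sh
# Hypothetical helper: for every thread in a dump, take the first "at ..."
# frame after the thread header and count how often each frame occurs.
# Reads the files given as arguments, or stdin if none are given.
top_frames() {
    awk '
        /^"/ { want = 1; next }          # thread header: look for its top frame
        want && /^[ \t]*at / {
            sub(/^[ \t]*at /, "")        # strip the leading "at "
            count[$0]++
            want = 0                     # ignore deeper frames of this thread
        }
        END { for (f in count) print count[f], f }
    ' "$@" | sort -rn
}
```

          A frame that sits at the top of many threads, or of the same thread in every dump, is a reasonable place to start looking. Usage: top_frames console.log, where console.log is whatever file holds the captured server output.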

          Also look for errors in the server log around when the high cpu consumption started.

          I hope this helps.

          Scott

          • 2. Re: 99% usage using cluster
            smarlow