We are struggling with the same kind of problem. Tomcat becomes unresponsive after a while, and sometimes, but not always, we see an OutOfMemoryError.
We are running JBoss 4.0.3_SP1 in a cluster with Apache in front; the Java version is 1.5.0_11 and the OS is Red Hat EL.
The problem seems to affect only Tomcat; the rest of JBoss seems to run OK. From jstack we see that most threads are "BLOCKED", and the ones that are not are in the "IN_NATIVE" state doing either socketAccept, socketRead or receive. We cannot see any correlation with the load on the server; we can provoke this with only one user. However, it occurs very irregularly: sometimes several times per day, sometimes a week can go by.
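For anyone who wants to track these thread states from inside the JVM instead of with an external tool, here is a minimal sketch. It assumes the code can be called from something deployed in the affected server (for example a small diagnostic page, which is an assumption and not something we have in place). The class name `ThreadStateSummary` is made up for illustration. Note that `Thread.State` has no IN_NATIVE value; threads sitting in native socket calls are normally reported as RUNNABLE here.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of JBoss or Tomcat.
public class ThreadStateSummary {

    /** Counts the live threads of the current JVM per Thread.State. */
    public static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts =
                new EnumMap<Thread.State, Integer>(Thread.State.class);
        // getAllStackTraces() (Java 5+) returns one entry per live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            Thread.State state = t.getState();
            Integer old = counts.get(state);
            counts.put(state, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints a map of state to thread count for this JVM.
        System.out.println(countByState());
    }
}
```

Logging this every minute or so might show whether the number of BLOCKED threads creeps up slowly or jumps all at once before Tomcat stops answering.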
We would appreciate any hints that could help us solve this problem.
It's good to know we are not the only ones.
We are still struggling to reproduce this problem in our test environments. When you say you can provoke it sometimes, what exactly are you doing? We tried with very heavy load, but never got any problems during our tests. It only happens in production, and only every couple of days, which is very annoying.
We are also facing similar problems. Running 2 clusters with 4 nodes. Every 1-2 days all nodes in a cluster get locked up.
We have no idea what needs to be done.
Does anyone know under what conditions an entire node can be affected? The only thing I found was:
"Also, a slow member could slow or even prevent purging of stable messages (http://wiki.jboss.org/wiki/Wiki.jsp?page=JGroupsPbcastSTABLE), so in the worst case, all members could run out of memory because they would never purge stable messages. Exclusion of such a member resumes progress in the stability protocol."
What I meant by "provoke it with one user" is that we have had the problem in production with only one test user logged in, so it doesn't seem to be connected to the load on the system. We have now set up a replica (as close as possible, at least) of the prod env for testing, but have not been able to reproduce the problem.
We have to get more info about what is actually going on inside JBoss/Tomcat when this happens. Currently we don't have any good analysis tools, at least not any that we can put in the prod env, which is the only place we have these problems. Does anyone have any good ideas about tools we can use to analyze the state and the behaviour of JBoss?
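One low-overhead option that needs no extra tooling is the `java.lang.management` API that ships with Java 5. (Sending `kill -3 <pid>` to the JVM also writes a full thread dump to its standard output.) Below is a rough sketch, assuming it runs inside the same JVM as JBoss, for example triggered from a servlet or a timer; how it gets triggered is an assumption. The class name `JvmSnapshot` and its method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper; uses only standard Java 5 management calls.
public class JvmSnapshot {

    /** Lists every BLOCKED thread, the lock it waits for, and who holds it. */
    public static String blockedThreadReport(int maxDepth) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos =
                threads.getThreadInfo(threads.getAllThreadIds(), maxDepth);
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            // An entry is null if the thread died between the two calls.
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            out.append(info.getThreadName())
               .append(" waits for ").append(info.getLockName())
               .append(" held by ").append(info.getLockOwnerName())
               .append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    /** Returns ids of threads deadlocked on monitors, or an empty array. */
    public static long[] deadlockedThreadIds() {
        long[] ids =
                ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    /** Summarises current heap usage in bytes. */
    public static String heapSummary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return "heap used=" + heap.getUsed() + " max=" + heap.getMax();
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
        System.out.println("deadlocked threads: " + deadlockedThreadIds().length);
        System.out.print(blockedThreadReport(10));
    }
}
```

Writing this output to a log at regular intervals would at least show which lock the BLOCKED threads are queuing on, and whether heap usage is near its maximum at the moment the hang starts.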