Hmm. I'm curious, what is the machine doing during those 4-30 seconds? Is it burning CPU or I/O maybe? Is the JVM in RAM completely or are portions of it paged out to disk?
You could run "vmstat 2" in one terminal next to "tail -f access.log" in another. As soon as you see the access log stall, check vmstat to see what the box is doing, CPU or disk.
Sounds like you are hitting a major collection - I can see that taking 4-30 seconds to clean up the tenured generation based on the heap size. You could greatly improve your performance by setting the young generation size. For details, see my presentation at http://www.cecmg.de/doc/tagung_2007/agenda07/24-mai/2b3-peter-johnson/index.html
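To make that concrete, a starting point might look like the following. The sizes are illustrative and "myapp.jar" is a placeholder; the right values come from measuring your own application's heap behaviour, not from copying these numbers.

```shell
# Illustrative only: fix the total heap, then pin the young generation
# to a share of it large enough for medium-lived objects to die young.
java -Xms1024m -Xmx1024m \
     -XX:NewSize=256m -XX:MaxNewSize=256m \
     -jar myapp.jar
```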
Hmm. I'm thinking that if it is a major collection, you should be able to see the time spent in the memory view of JConsole. On that view, check the lower panel, where the garbage collectors are shown.
Peter: so what you are suggesting is that there actually is such a thing as too much memory?
The JVM comes with several GCs. Perhaps picking a different one would alleviate this?
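For what it's worth, on the Sun HotSpot JVMs of that era the collector is chosen with a flag. These invocations are sketches, one alternative per line, with the jar name as a placeholder:

```shell
# Pick one collector per JVM (HotSpot flags):
java -XX:+UseSerialGC        -jar myapp.jar   # single-threaded, for small heaps
java -XX:+UseParallelGC      -jar myapp.jar   # parallel minor GCs, throughput-oriented
java -XX:+UseConcMarkSweepGC -jar myapp.jar   # mostly-concurrent major GCs, shorter pauses
```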
OK, the problem has receded now, thanks for your input, much appreciated.
The confusing thing was that the JVM's JMX beans, which show the GC times and counts for PS Scavenge and PS MarkSweep, painted a totally false picture.
After reading the document suggested by PeterJ, and reading further about the Ergonomics of the JVM, I arrived at having only one JVM option set:
We now let the JVM manage memory itself, as discussed here:
The other problem was quickly highlighted once all the GC'ing was eliminated: our half-hourly DB backup was gumming up the system.
Thanks again to everyone for taking the time.
PeterJ, I had already read your presentation, and I would like to add that I considered implementing something based on your suggestions, but getting the info from the JMX beans instead of from the GC debug output. The advantages were that i) we already have a graphing solution ready to plug the output into (Cacti), and ii) I could turn graphing of GCs on and off without a restart.
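For reference, a minimal sketch of what polling those beans looks like. Note that getCollectionCount() and getCollectionTime() are cumulative totals since JVM start, which is part of why they cannot show individual pause lengths or post-GC heap sizes:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Polls the GC MXBeans (the same data JConsole displays) and prints the
// cumulative collection count and time per collector. These are running
// totals since JVM start, not per-collection pause data.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```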
However, I now have to conclude that the JMX beans do not paint such a complete or accurate picture as the GC debug output.
The only way to get a reliable picture of what is going on with GC is to collect the GC log data - mechanisms that use JMX or JConsole provide only snapshots of current heap usage at the time the snapshot was taken. What is missing from that data is how much heap space survives each GC, and in particular the final heap size after a major GC. That data is critical in right-sizing the heap.
And yes, too big of a heap is bad. There is an anti-pattern along the lines of "if a little is good then more must be better", and that definitely applies to the heap. If the heap is too large you will pay for it in time spent on a major GC.
What you said earlier in the thread suggests that it is not so much having a massive heap that is the issue, but having too many objects end up in the tenured generation while they are still too young. By increasing the size of the young generation, you are effectively raising the age at which objects spill over into the tenured generation.
Also, would using a parallel garbage collector not alleviate the pausing?
Smaller young generations get collected more often, which means that objects will more quickly use up their 32 (by default) lives before being placed into the tenured generation. Thus medium-lived objects tend to end up in the tenured generation. By keeping your young generation larger, you are more apt to "kill" medium-lived objects via minor collections, because those collections happen less often and the objects have time to die in the meantime.
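If you want to observe that promotion behaviour directly, HotSpot has flags for it. The threshold value here is illustrative and the jar name is a placeholder:

```shell
# -XX:+PrintTenuringDistribution  logs object ages at each minor GC
# -XX:MaxTenuringThreshold=31     caps the minor GCs an object survives before promotion
java -XX:+PrintTenuringDistribution -XX:MaxTenuringThreshold=31 -jar myapp.jar
```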
In one test (on a real-world application, not on a test app) I saw the number of collections go from over 8000 to under 100 by right-sizing the young generation. The total time spent in GC was also drastically reduced from several minutes to under one minute.
Both size and number of live objects are factors in GC time. Larger heaps take longer to collect. Given two heaps of the same size, the one with the most number of live objects takes the longest to collect.
Parallel collection helps, given one CPU/core per thread. Fortunately, parallel collection is turned on by default starting with JVM 1.5 if you have two CPUs/cores and at least 1GB of RAM (yes, RAM, not heap size).
One final question; how much overhead is associated with having GC debug information recorded?
I'm thinking of turning it on for a week to give me a sustained period for analysis.
If you use the -Xloggc:file-name option to direct logging statements to a file, the logging overhead will be minimal, probably around 2-3%. (It is slightly higher when dumping it to the console, just because writing to the console is slow.)
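A sketch of a fuller logging setup along those lines, with flag names from the Sun JVM of that era and the jar name as a placeholder:

```shell
# -Xloggc:gc.log          writes GC events to a file instead of the console
# -XX:+PrintGCDetails     adds per-generation sizes before/after each collection
# -XX:+PrintGCTimeStamps  stamps each event with seconds since JVM start
java -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar myapp.jar
```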