I have found the same strange behavior - while running parallel gc threads for minor collections reduces the gc pause time, running parallel gc threads for major collections increases the pause time. It makes no sense, but there you have it.
You really have to try various heap and GC settings and find the best ones for your app. Have you considered the CMS collector? With the number of CPUs you have, it might be a good option.
See this presentation:
Is there any possible reason why the multi-threaded ParallelOldGC performs worse than the single-threaded one?
Lock contention over a common object, such as the free memory list? I really have not had time to investigate it.
Thank you, Peter. Can I say the following?
The problem typically happens when there are too many ParallelOldGC
threads in the process and there is too small an old generation. This
results in excessive work stealing between the GC threads and this
work stealing bangs on a lock. Too many ParallelOldGC threads without
enough old space to carve up between them result in this work stealing
I have a quad-core, and for my testing I used a 1GB heap (I did not specify a young gen size, but I believe the JVM never set it to more than 100M). When using multiple tenured GC threads, the JVM splits the tenured generation into sections and lets each thread clean its own section to minimize contention. So I had 4 threads cleaning about 200MB each. You, of course, had 8 threads, so your lock contention is higher. But I read a very interesting paper the other day regarding cache coherency between L2 caches in the CPUs that caused a significant performance drop when running a multi-threaded app, so I'm wondering if that could be a reason. Of course, I'd need VTune to track that down.
I want to understand more about your point regarding 'lock contention' on the free memory list.
In your case, you have a 1GB heap with 4 cores. Assuming your young gen size is 100MB, the old gen is around 900MB, so each core gets 900MB / 4 = around 225MB.
If I use 8 cores, each core gets about 113MB.
Does lock contention occur because each thread is working on too small a portion of the old gen?
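To make the arithmetic above concrete, here is a minimal sketch. It assumes the old gen is simply the heap minus a 100MB young gen and that it is split evenly across GC threads, which is only the guess made in this thread, not confirmed JVM behavior. The class and method names are made up for illustration.

```java
public class OldGenShare {
    // Hypothetical helper: even split of the old generation across GC threads, in MB.
    // Assumes old gen = heap - young gen, as guessed in the discussion.
    static double perThreadShareMb(double heapMb, double youngMb, int gcThreads) {
        return (heapMb - youngMb) / gcThreads;
    }

    public static void main(String[] args) {
        double heapMb = 1000.0;  // "1GB heap", taken as 1000MB to match the figures in the thread
        double youngMb = 100.0;  // assumed young gen size
        for (int threads : new int[] {1, 4, 8}) {
            System.out.printf("%d GC threads -> %.1f MB of old gen each%n",
                    threads, perThreadShareMb(heapMb, youngMb, threads));
        }
    }
}
```

With these assumed numbers, 4 threads get 225MB each and 8 threads get 112.5MB each, so doubling the thread count halves the region each thread has to itself.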
You'll notice the question mark after my statement about the free memory list. That means I don't know, I am just guessing and my guess could be completely off. I also stated that I had not had time to look into why the parallel old GC runs slow. So asking me to explain it is futile because I have no answers. As I stated earlier, the best thing you can do is try several different GC mechanisms and use the one that works best for you. If you are really concerned about the parallel old GC performance, you should take that up with Sun, after all, it's their JVM and their code.
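If you want to compare collectors and thread counts as suggested, one rough way is to print the JVM's own GC counters at the end of a test run. This is only a sketch: the class name is made up, and the standard `java.lang.management` beans report cumulative collection time per collector, not individual pause times. The flags in the comments are HotSpot options from JVMs of that era, and newer JDKs have removed some of them.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {
    // Builds one line per collector with its cumulative collection count and time.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%s: count=%d, timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Run your workload here, then print the totals.
        // Example settings to compare (availability depends on your JVM version):
        //   java -XX:+UseParallelOldGC -XX:ParallelGCThreads=1 GcReport
        //   java -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 GcReport
        //   java -XX:+UseConcMarkSweepGC GcReport
        System.out.print(report());
    }
}
```

Run the same workload under each setting and compare the totals for the old-generation collector. For individual pause times you would still want the JVM's GC logging.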