10 Replies Latest reply on Oct 6, 2009 11:31 AM by peterj

Diff response times at different times

njrfrens Oct 3, 2009 1:28 AM

We have a strict SLA requirement that reports must be displayed within 1 minute for a load of 500 users.
We are running a load test with 500 users on the application.

We are getting different response times at different time intervals; they vary between 1 and 2 minutes.

I want to understand why the application shows such large differences in response time at different time intervals.

1. Is it something to do with garbage collection? If yes, how can I analyze it?
2. Is it because of memory leaks in the application? If yes, how can I analyze it?

Can you please help me with this?

I'm setting the JVM parameters -Xms and -Xmx both to 1024m.
I'm leaving all the other parameters as they are in \bin\run.bat.

1. Re: Diff response times at different times

njrfrens Oct 3, 2009 3:14 AM (in response to njrfrens)

I tried the following JVM settings:
set JAVA_OPTS=%JAVA_OPTS% -Xms1024m -Xmx1024m -XX:NewSize=300M -XX:MaxNewSize=300M -XX:SurvivorRatio=32 -XX:+UseTLAB -XX:TLABSize=64K

But these settings didn't work well for me.
With them, the response time increased even more.
2. Re: Diff response times at different times

njrfrens Oct 3, 2009 11:28 AM (in response to njrfrens)

My heap graphs with different JVM parameters:

1. When set JAVA_OPTS=%JAVA_OPTS% -Xms1024m -Xmx1024m
Graph is http://img260.imageshack.us/img260/2228/withoutyoungsize.jpg

2. When set JAVA_OPTS=%JAVA_OPTS% -Xms1024m -Xmx1024m -XX:NewSize=300M -XX:MaxNewSize=300M -XX:SurvivorRatio=32 -XX:+UseTLAB -XX:TLABSize=64K
Graph is http://img260.imageshack.us/img260/1281/withyoungsize.jpg
3. Re: Diff response times at different times

peterj Oct 5, 2009 8:57 AM (in response to njrfrens)

Looks like you are following my instructions for graphing the GC data, but you incorrectly added the "after" and "millis" data points to the graph. When adding the extra data points, be sure to select the entire column and to place that selection in the "Y Values" box, and leave the "X Values" box blank. I suspect you switched the values of these two boxes.

Based on the current graphs, I don't think that 1GB is a sufficient heap size; try 1.5GB and increase the young gen size to 500MB.

As to the response time differences, it could be the major GC time, but as I noted, the graphs are not correctly plotted, so I cannot tell what the GC times are. You could, of course, scan the GC times for the major collections to see roughly how long each one takes.
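As a rough aid for that scan, here is a minimal Python sketch that pulls the pause times out of a GC log. It assumes the log was written with -verbose:gc (optionally with -XX:+PrintGCTimeStamps and -Xloggc:<file>) in the simple one-line format, e.g. "[Full GC 267628K->83769K(776768K), 1.8479984 secs]"; lines written with -XX:+PrintGCDetails look different and are simply skipped. The names GcEvent, parse_gc_log and full_gc_pauses are made up for this example.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Simple -verbose:gc line format, with an optional -XX:+PrintGCTimeStamps
# prefix. Other formats (e.g. -XX:+PrintGCDetails output) do not match.
GC_LINE = re.compile(
    r"^(?:(?P<ts>\d+\.\d+): )?"
    r"\[(?P<kind>Full GC|GC) "
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\), "
    r"(?P<secs>\d+\.\d+) secs\]"
)


class GcEvent(NamedTuple):
    timestamp: Optional[float]  # seconds since JVM start, if it was logged
    full: bool                  # True for a full (major) collection
    after_kb: int               # heap occupancy after the collection
    pause_secs: float           # pause time reported by the JVM


def parse_gc_log(lines: Iterable[str]) -> List[GcEvent]:
    """Parse every recognizable GC line; silently skip everything else."""
    events = []
    for line in lines:
        m = GC_LINE.match(line.strip())
        if m is None:
            continue
        ts = m.group("ts")
        events.append(GcEvent(
            timestamp=float(ts) if ts else None,
            full=m.group("kind") == "Full GC",
            after_kb=int(m.group("after")),
            pause_secs=float(m.group("secs")),
        ))
    return events


def full_gc_pauses(events: List[GcEvent]) -> List[float]:
    """Pause times, in seconds, of the full (major) collections only."""
    return [e.pause_secs for e in events if e.full]


# Sample lines in the assumed format.
sample = [
    "[GC 325407K->83000K(776768K), 0.2300771 secs]",
    "111.042: [Full GC 267628K->83769K(776768K), 1.8479984 secs]",
    "some unrelated line",
]
print(full_gc_pauses(parse_gc_log(sample)))
```

To scan a real log, pass an open file object (for example open("gc.log"), a hypothetical file name) to parse_gc_log; max(full_gc_pauses(...)) then gives the longest major pause.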

How many processors/cores are on your system? If you have several, you might try the CMS collection.

Are you running a 32-bit or 64-bit JVM?
4. Re: Diff response times at different times

njrfrens Oct 5, 2009 12:25 PM (in response to njrfrens)

I think I plotted the graphs properly this time.
1. When set JAVA_OPTS=%JAVA_OPTS% -Xms1024m -Xmx1024m
Graph is http://img169.imageshack.us/img169/6715/withoutyoung.jpg

2. When set JAVA_OPTS=%JAVA_OPTS% -Xms1024m -Xmx1024m -XX:NewSize=300M -XX:MaxNewSize=300M -XX:SurvivorRatio=32 -XX:+UseTLAB -XX:TLABSize=64K
Graph is http://img203.imageshack.us/img203/9528/withyoung.jpg

The server machine has 4 processors, and I'm using a 32-bit JVM.

What do you mean by CMS collection?
Do you mean the Concurrent Collector (using the -XX:+UseConcMarkSweepGC -XX:+UseParNewGC options)?

By the way, I configured the HTTP Connector with maxThreads=600.
(I arrived at this number after monitoring the busy threads using the JMX Console.)
5. Re: Diff response times at different times

peterj Oct 5, 2009 12:38 PM (in response to njrfrens)

The time data is all on the X axis. Either you have not set up a second scale for the Y axis for that data (don't ask me how to do this, I'm not an Excel expert), or you have not provided a high enough factor (I cover this in my description of building the chart, using a factor of 10000 or 100000, depending on the pause-time ranges). [I really need to come up with new scripts that plot the pause time ranges for minor and major collections... Yet another task to do in copious spare time.] Also, look at the pause times for the full collections and give me an idea of what their ranges are.
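Until such scripts exist, a minimal Python sketch of the pause-range idea might look like this. It takes (is_full_gc, pause_seconds) pairs, for example from a parsed -verbose:gc log, counts minor and full collections per pause range, and reports min/max/mean per kind. The bucket edges in DEFAULT_EDGES and all function names are assumptions chosen for illustration, not part of any existing tool.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical pause-range edges in seconds; adjust them to your own data.
DEFAULT_EDGES = (0.01, 0.05, 0.1, 0.5, 1.0, 2.0)


def bucket_label(secs: float, edges: Sequence[float]) -> str:
    """Label of the range a pause falls into, e.g. '0.1-0.5s' or '>=2.0s'."""
    lower = 0.0
    for upper in edges:
        if secs < upper:
            return f"{lower}-{upper}s"
        lower = upper
    return f">={lower}s"


def pause_ranges(
    pauses: List[Tuple[bool, float]],
    edges: Sequence[float] = DEFAULT_EDGES,
) -> Dict[str, Counter]:
    """Count minor and full GC pauses per range.

    pauses holds (is_full_gc, pause_seconds) pairs.
    """
    result: Dict[str, Counter] = {"minor": Counter(), "full": Counter()}
    for is_full, secs in pauses:
        result["full" if is_full else "minor"][bucket_label(secs, edges)] += 1
    return result


def pause_stats(
    pauses: List[Tuple[bool, float]], full: bool
) -> Optional[Tuple[float, float, float]]:
    """(min, max, mean) pause in seconds for one kind of collection, or None."""
    secs = [s for is_full, s in pauses if is_full == full]
    if not secs:
        return None
    return min(secs), max(secs), sum(secs) / len(secs)
```

Printing pause_ranges(...) for a run gives a small table of counts per range for minor and full collections, which is roughly what the chart is meant to show without needing a second Y axis.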

Yes, CMS is the Concurrent Mark-Sweep collector.

Regarding the 500 users in the SLA - is that logged in users, or simultaneous active requests? I ask because in general you can figure about 1 active request for every 10 logged-in users.

Is there any think time in your load testing script?

I still think you need to go with the higher memory settings.
6. Re: Diff response times at different times

njrfrens Oct 5, 2009 1:11 PM (in response to njrfrens)

Please pardon my poor Excel skills.
The CSV files for the tests are here:
1. Without setting young size
http://new.flyupload.com/files/view/rwy3laQamtpju9Ok1ZH9
2. With Young size setting
http://new.flyupload.com/files/view/JP8mCuSxAOH8VTIGWJEj

What do you mean by think time?
My performance test run lasts about 8 minutes.
In the first 2 minutes the number of users slowly ramps up, then a steady state is maintained for 5 minutes, and then the number of users is ramped down over 1 minute.

Though my RAM is 4 GB, unfortunately, if I go beyond 1124 MB of heap, I get an error saying that it cannot allocate that much heap, and the server does not start.

I'm running JBoss 4.2.3 on an Intel Xeon E5405 CPU @ 2.00 GHz with 3.99 GB RAM. It has 4 processors.
I'm running it on the 32-bit Windows Server 2003 OS.
7. Re: Diff response times at different times

peterj Oct 5, 2009 3:00 PM (in response to njrfrens)

No need to apologize about your Excel, I am not that good at it either.

The full GC pause time is only around 1 second, so GC is not the cause of the response time discrepancies you are seeing. In total you are spending about 53 seconds in GC (72 when not setting the young gen size), which is a little high (about 1/8th of your run time), but not too bad.
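The overhead figure is simple arithmetic: total pause time divided by wall-clock run time. A tiny Python sketch, assuming the 8-minute (480-second) run described earlier in the thread and the 53 s and 72 s GC totals; it prints 11.0% and 15.0%, which is in the same ballpark as 1/8th. The function name gc_overhead is made up for this example.

```python
def gc_overhead(total_gc_secs: float, run_secs: float) -> float:
    """Fraction of the wall-clock run time spent paused in GC."""
    if run_secs <= 0:
        raise ValueError("run_secs must be positive")
    return total_gc_secs / run_secs


RUN_SECS = 8 * 60  # the 8-minute test run (ramp up, steady state, ramp down)

for label, gc_secs in (("young gen set", 53.0), ("young gen not set", 72.0)):
    print(f"{label}: {gc_overhead(gc_secs, RUN_SECS):.1%}")
```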

I think the max heap size limit is due to the permgen size setting. Between the heap and the permgen you can allocate around 1700MB. You could do a run with -XX:+PrintHeapAtGC to see what your perm gen requirements are and set it accordingly.

Other than that, you might want to look at your CPU usage. Is the kernel time high? If so, then that could indicate contention issues between your threads.

"Think time" is the delay that you place in your load test script between requests. Some load tests simulate real environments where users have to "think" between the time they are shown a page and when they submit the next request. For example, after receiving a certain page, the script might wait 20 seconds before sending the next request, on the assumption that it takes the average user 20 seconds to fill in the page before submitting it.

This is why the question about your SLA is very apropos: if it is 500 logged-in users, then adding think time would place less of a burden on the system and lower the response time. Usually you will want to underestimate the think time. In other words, when I mentioned a 20-second think time in the earlier example, that was probably because most users take 40 seconds or longer. Of course, the other way to handle this is to assume that only, say, 1/4 or 1/5 of the users will have simultaneous requests. Then you can try a 100 or 150 user run to simulate that (though I would run 200 users just to be safe).

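One way to see how think time changes the load is Little's law for a closed workload: each user cycles through a response time R and a think time Z, so on average users * R / (R + Z) requests are in flight. A minimal Python sketch follows; the 5-second response time and 45-second think time are hypothetical numbers, picked to show how the "1 active request per 10 logged-in users" rule of thumb can arise. The function name active_requests is made up for this example.

```python
def active_requests(users: int, response_secs: float, think_secs: float) -> float:
    """Average number of requests in flight for a closed workload.

    Each simulated user repeatedly sends a request, waits response_secs for
    the reply, then "thinks" for think_secs. By Little's law the average
    number of requests being served is users * R / (R + Z).
    """
    cycle = response_secs + think_secs
    if cycle <= 0:
        raise ValueError("response_secs + think_secs must be positive")
    return users * response_secs / cycle


# Hypothetical timings: 5 s response time, 45 s think time.
print(active_requests(500, 5.0, 45.0))  # 50.0, i.e. 1 in 10 users active
print(active_requests(500, 5.0, 0.0))   # 500.0: no think time, everyone active
```

With no think time in the script, every one of the 500 virtual users always has a request outstanding, which is a much heavier load than 500 logged-in users would normally generate.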
Have you tried 100, 200, ... 400 users? Or 50, 100, ... 450, 500? If so, have you plotted the response times for each such run? A dramatic jump in response time between two runs could pinpoint your saturation level, in which case you might want to reduce the number of HTTP threads to match it. For example, if with 300 users 90% of your responses come back within 15 seconds, then you could set 300 threads with a 200-request wait queue. Queued requests wait about 15 seconds for a thread to free up and are then served in another 15 seconds or so, so the overall response time should be around 30 seconds. The idea here is to not overload the system: more threads is not always better, and adding more threads to an overloaded system is a well-known performance anti-pattern.

I found out the hard way, when first doing performance testing many years ago, that doing the full run with the maximum number of users as the first run was the wrong way to go about it. You need to start small, steadily increase the workload, and note the point at which the response times start to change drastically; that is your saturation point. Then you have to find out why you are saturated at that point, fix that issue, see the response times come back in line, and continue.
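The step-run approach and the thread/queue estimate can be sketched in Python. This assumes you have recorded the 90th-percentile response time for each step run; the sample numbers, the 2x "jump" threshold and the function names are hypothetical, and the queue estimate is the same back-of-the-envelope reasoning as the 300-thread example (requests served in waves of at most one pool-full each).

```python
import math
from typing import List, Optional, Tuple


def saturation_point(
    runs: List[Tuple[int, float]], jump: float = 2.0
) -> Optional[int]:
    """Last user count before response time jumps sharply.

    runs holds (users, p90_response_secs) pairs from step runs such as
    100, 200, ... 500 users. A step whose response time exceeds `jump`
    times the previous step's is treated as the knee; None means no knee.
    """
    ordered = sorted(runs)
    for (prev_users, prev_rt), (_, rt) in zip(ordered, ordered[1:]):
        if rt > jump * prev_rt:
            return prev_users
    return None


def queued_response_estimate(users: int, threads: int, service_secs: float) -> float:
    """Back-of-the-envelope worst-case response time with a bounded pool.

    Up to `threads` requests are served at once, each taking about
    `service_secs`; the rest wait in the queue for the next wave.
    """
    if threads <= 0:
        raise ValueError("threads must be positive")
    waves = math.ceil(users / threads)
    return waves * service_secs


# Hypothetical step-run results: (users, 90th percentile response seconds).
runs = [(100, 5.0), (200, 8.0), (300, 15.0), (400, 60.0), (500, 95.0)]
knee = saturation_point(runs)
print("saturation around", knee, "users")
print("estimated response with 500 users:", queued_response_estimate(500, knee, 15.0), "s")
```

The estimate ignores the extra contention that a queue avoids, which is exactly why capping the thread pool near the saturation point tends to help.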
8. Re: Diff response times at different times

njrfrens Oct 6, 2009 3:16 AM (in response to njrfrens)

Thanks, Peter, for your excellent support.

During the load test run, I saw that the CPU utilization was around 60-70%.

I'm setting the PermGen size (max 512m) with the params -XX:PermSize=64m -XX:MaxPermSize=512m.
If I do not set this, I quickly get an OutOfMemoryError: PermGen space.

If I can't increase the heap size beyond 1700MB (including permgen),
does that mean 2GB/4GB/8GB of RAM will not make much difference?

If garbage collector is not the culprit for the difference in response times at different intervals, I wonder what else could be the reason...?

Will clustering JBoss help me meet the SLA?
If yes, can I try software clustering (without adding any other hardware), or do I have to go with hardware clustering only?
9. Re: Diff response times at different times

njrfrens Oct 6, 2009 9:58 AM (in response to njrfrens)

There was a typo in my earlier post regarding CPU Utilization.

It was around 60-90% during peak load.
10. Re: Diff response times at different times

peterj Oct 6, 2009 11:31 AM (in response to njrfrens)

If I can't increase the heapsize beyond 1700MB(including permgen), Does that mean 2GB/4GB/8GB RAM will not make much difference?

That is correct. As long as you are using a 32-bit JVM, you are limited to a 32-bit (4GB) address space for each process. Windows uses 2GB of that space for its own data (file handles, thread handles, GUI handles, etc.) and leaves the other 2GB for the process's use. Then take away the memory required for the code (exe and DLLs), the C/C++ data structures used by the JVM, and the thread stack memory, and you are left with around 1700MB, give or take a few hundred MB depending on the exact version of Windows. You would have to install a 64-bit OS and run a 64-bit JVM to get a larger address space.
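To make the arithmetic concrete, here is a rough Python sketch of a 32-bit address-space budget. Only the 2GB user space comes from the explanation above; the 256 KB thread stack size and the 250 MB for code and native JVM structures are assumptions, so plug in figures measured on your own system. The function name remaining_address_space_mb is made up for this example.

```python
def remaining_address_space_mb(
    heap_mb: float,
    perm_mb: float,
    threads: int,
    stack_kb: float,
    native_overhead_mb: float,
    user_space_mb: float = 2048.0,  # 2GB user space of a 32-bit Windows process
) -> float:
    """Rough address space (MB) left after the JVM's main reservations.

    native_overhead_mb lumps together exe/DLL code and the JVM's own C/C++
    data structures; it is an estimate, not something the JVM reports.
    """
    stacks_mb = threads * stack_kb / 1024.0
    return user_space_mb - heap_mb - perm_mb - stacks_mb - native_overhead_mb


# Hypothetical inputs: 1124 MB heap and 512 MB max perm gen as in this thread,
# 600 HTTP threads, and ASSUMED values for stack size and native overhead.
left = remaining_address_space_mb(
    heap_mb=1124, perm_mb=512, threads=600, stack_kb=256, native_overhead_mb=250
)
print(f"about {left:.0f} MB of address space left")
```

Under these assumed numbers only about 12 MB would remain, which illustrates why adding physical RAM does not help a 32-bit process and why a smaller MaxPermSize or fewer threads frees room for the heap.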

If garbage collector is not the culprit for the difference in response times at different intervals, I wonder what else could be the reason...?

Without profiling your code, it is hard to tell. You need to find out where your code is spending its time.

Also, seriously consider my suggestion of running with different numbers of users: with 500 threads all fighting for time on 4 cores, you could just be overloading the system. Reducing the number of threads could very well improve your response times, even if you have requests queued up waiting for their turn.

Will clustering of JBoss help me in meeting the SLA?

Clustering using a software load balancer (Apache mod_jk) will probably work for you. 500 requests per minute (based on an SLA of 1 min max response time for 500 users) is very low; usually you only need to go to a hardware load balancer if you are handling hundreds (or thousands) of requests per second.
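The 500-requests-per-minute figure is Little's law turned around: throughput = concurrent requests / response time. A minimal Python sketch; the 100 requests/second cutoff is a hypothetical stand-in for "hundreds of requests per second", not a hard rule, and required_throughput is a name made up for this example.

```python
def required_throughput(concurrent_requests: int, max_response_secs: float) -> float:
    """Requests per second needed so that `concurrent_requests` requests
    each finish within `max_response_secs` (Little's law: X = N / R)."""
    if max_response_secs <= 0:
        raise ValueError("max_response_secs must be positive")
    return concurrent_requests / max_response_secs


# Hypothetical cutoff standing in for "hundreds of requests per second".
HARDWARE_LB_THRESHOLD_RPS = 100.0

rps = required_throughput(500, 60.0)  # SLA: 500 users, 1 minute max
print(f"{rps:.2f} requests/second, {rps * 60:.0f} requests/minute")
print("hardware load balancer worth considering:", rps >= HARDWARE_LB_THRESHOLD_RPS)
```

At roughly 8.33 requests per second, the SLA load is far below the range where a hardware load balancer usually pays off.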
