We've spent the last week or so playing around with and measuring performance on our applications. As standard practice, we optimize and obfuscate our own Java application code to squeeze as much as we can out of the hardware. This last week we've been looking at the effect of JBoss on performance. It turns out that JBoss/Jetty is already pretty sleek.
Our test setup currently comprises two configurations.
1) JBoss/Jetty with our application framework and a demo (accesses 4 tables to compile a topic content page with no graphics) on Red Hat Linux 7.2, PII 500 MHz with 192 MB, accessing a PostgreSQL database on Red Hat Linux 7.2, PII 333 MHz with 96 MB, across a 10 Mbps LAN. Accessed by a wireless client with a simulated load of 90 users, each requesting the page every 20 seconds - IBM Java 1.4.0.
2) As above, except the client, JBoss/Jetty and the database all run on a Win2k PIII 1.2 GHz with 384 MB, with PostgreSQL running under Cygwin - Sun JDK 1.4.1_02.
The load is not excessive, and after tweaking JBoss we get average responses of 160 ms and 40 ms respectively. The first configuration is limited by the database and the network: its average barely moved. On the second we managed to shave between 3 and 5 ms off the response time.
We tweaked JBoss 3.2.0 by building it from the sources with javac.optimize on and javac.debug off. We added the optimized channel listener from the separate Jetty 4.2.9 sources. We used Retroguard to code optimize the bundled JARs (ant, jasper, pg73jdbc3, xalan, xercesImpl, log4j and log4j-boot). The cost of these tweaks is that you lose debugging information when a class crashes.
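For anyone who wants to try the same post-processing, RetroGuard's documented command line takes an input jar, an output jar, a script file and a log file in that order. The file names below are just examples, and you repeat the run per jar:

```shell
# Post-process one bundled jar with RetroGuard
# (arguments: input jar, output jar, .rgs script, log file).
java -cp retroguard.jar RetroGuard jasper.jar jasper-opt.jar jasper.rgs jasper.log
```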
The interesting thing is that although we didn't gain much performance from the tweaks (even on the fast machine), we shaved between 20 and 30 MB of memory consumption off the Linux run-time footprint. We'll put the Retroguard optimization scripts up on our website by Monday if anyone is interested.
Anyhow, hope this gives people an idea of the magnitude of the responsiveness you can get from JBoss.
So these were the performance tests you spoke about y'day :-).
How can I set up the same thing but for Tomcat?
I noticed that when I simulate 600 clients with a simple Java program that opens 600 threads, each performing HTTP GETs, and I then try to access the same page from a browser, the page does not load until all 600 threads exit. Why is this? I find it very strange.
This is a simple page that loads a dropdown list of 2 records from the database.
There are no bottlenecks from the database, I checked.
As always yr support is appreciated :-)
I attached my code fyi
I also attached a log of the connections made during the simulation. I'm not very happy with the performance I'm getting: it takes anywhere from 3 to 5 secs to load one page with 600 clients.
You can get all JBoss code easily built from source except for the Tomcat bundle. I haven't tried that yet, and it will probably be the weekend before I have a chance to look at this area. What you can do is build the optimized JBoss and drop your jbossweb-tomcat.sar into the finished build (deleting jbossweb-jetty.sar), or try building the Tomcat bundle yourself (you need the Tomcat source as well; follow the instructions in the catalina subdirectory of the source distribution).
To build the JBoss bundle with optimizations, unpack the source. Go to the build directory in the source distribution. Edit local.properties. Change this section as shown:
### Javac/Jikes compiler configuration ###
javac.optimize=on
javac.debug=off
Don't use the Jikes compiler, as it doesn't allow for optimizations. Besides, Jikes for Win2k breaks partway through the JBoss build.
Run build.sh or build.bat according to your environment. You will need Ant to build the system. The results will appear in the output subdirectory of build. If you compare specific JBoss JARs with those in the normal distribution, you will find that your own build's are smaller.
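As a quick sketch of the steps above (the unpack directory name is illustrative; adjust for your layout):

```shell
# Build an optimized JBoss 3.2.0 from source (requires Ant on the PATH).
cd jboss-3.2.0-src/build
# edit local.properties: javac.optimize=on, javac.debug=off
./build.sh            # build.bat on Win2k
ls output/            # the finished (smaller) build lands here
```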
JBoss does not touch included JARs such as Jasper, log4j, Axis and so on, which is why we experimented with post-build optimization of these classes.
As for your question on Tomcat, I don't believe it provides non-blocking IO. This means that:
1 user == 1 HTTP 1.1 connection == 1 thread
Until the persistent connection drops for a user, the thread cannot service requests from other users. Even the JBoss/Jetty bundle does not provide non-blocking IO by default: JBoss is meant to support JDK 1.3, which has no non-blocking IO implementation, so the HTTP listeners cannot offer this functionality. With JDK 1.4.x, Jetty can provide non-blocking IO for its listener (through SocketChannelListener) - more than one user per connection thread. You need to get and compile the original Jetty distribution to obtain an org.mortbay.jetty.jar with IbmJsseListener and SocketChannelListener.
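The effect can be demonstrated with a self-contained sketch (written in modern Java for brevity; this is not JBoss or Tomcat code): a fixed pool of worker threads stands in for the container's listener threads, and tasks that block until "the connection closes" stand in for persistent HTTP 1.1 connections. A new request submitted while the pool is saturated is not serviced until a connection releases its thread - which is exactly what you saw with the 600 client threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadPerConnectionDemo {

    // Returns true if the extra request had to wait until the
    // "persistent connections" closed before it could be serviced.
    static boolean extraRequestBlocked(int poolSize) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);

        // Occupy every worker thread, as open persistent connections would.
        for (int i = 0; i < poolSize; i++) {
            workers.submit(() -> {
                try { release.await(); } catch (InterruptedException ignored) { }
            });
        }

        // A new "user" arrives; its request simply records that it ran.
        CountDownLatch served = new CountDownLatch(1);
        workers.submit(served::countDown);

        // While the connections are held open, the new request never runs.
        boolean servedWhileBlocked = served.await(200, TimeUnit.MILLISECONDS);

        release.countDown();   // connections close, threads are freed
        boolean servedAfterRelease = served.await(2, TimeUnit.SECONDS);
        workers.shutdown();
        return !servedWhileBlocked && servedAfterRelease;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocked until connections closed: "
                + extraRequestBlocked(4));
    }
}
```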
If enough people ask, we can also provide this optimized jar when we get around to putting up the other optimization scripts and instructions.
As for your testing, I would definitely get acquainted with Apache JMeter. Apart from allowing remote starts of the test harness (so you can have the same test plan run from several different computers at once), you can also tune the request distribution pattern and gather statistics on responses. Very useful for measuring and viewing the response patterns of your application as it ramps up and when it reaches steady state. It gives you a very good idea about how your application will react to environmental stresses.
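Once a test plan is saved, JMeter can also be driven headless; the plan file name below is just an example:

```shell
# -n: non-GUI mode, -t: test plan, -l: file to log sample results to
jmeter -n -t topic-page-plan.jmx -l results.jtl
# -r: additionally start the plan on the configured remote JMeter servers
jmeter -n -t topic-page-plan.jmx -l results.jtl -r
```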
Finally completed testing performance effects of the JVM and JBoss bytecode optimizations. We had to dial back the request rate so as not to overrun the systems we were using.
Surprisingly we found that the JVM and the bytecode optimizations had a significant impact on performance. Running on a Linux system, we were able to reduce the response time of an application by up to 36 percent, reduce the virtual memory by up to 35 percent and reduce the in-memory portion by up to 15 percent using bytecode optimizations and the IBM SDK 1.4.0. If you want to read more and also find out how we went about optimizing the bytecode, you can get the document and the script files here at http://www.amitysolutions.com.au/pages/downloads.html#optimizations.
Now, we're not claiming that you will achieve this increase for all applications, but it does provide some feel for the magnitude of the changes that might be achieved.
We did some additional testing by tweaking web applications to use local interfaces to EJBs. In our case, we found that they had little impact over the optimizations that JBoss performs internally. We saw perhaps a 1 ms decrease in the response time average, using the IBM SDK 1.4.0 with an optimized JBoss 3.2.0.
In our case, we built the servlets so that they obtain references to EJBs only at initialization. That nullified the disadvantage of the PortableRemoteObject operations. In a production system you don't expect the references to change, so it is a reasonable performance tweak.
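That init-time caching can be sketched like this (written in modern Java for brevity; TopicService and the lambda "lookup" are placeholders standing in for the real JNDI lookup plus PortableRemoteObject.narrow(), not actual JBoss or J2EE API calls):

```java
public class CachedReferenceDemo {

    interface TopicService { String topicPage(int id); }

    static class TopicServlet {
        private TopicService service;
        int lookups;                        // counts the "expensive" lookups

        void init() {                       // called once by the container
            lookups++;                      // stand-in for JNDI lookup + narrow
            service = id -> "topic-" + id;
        }

        String doGet(int id) { return service.topicPage(id); }
    }

    // Serve the given number of requests and report how many lookups happened.
    static int lookupsFor(int requests) {
        TopicServlet servlet = new TopicServlet();
        servlet.init();
        for (int i = 0; i < requests; i++) servlet.doGet(i);
        return servlet.lookups;
    }

    public static void main(String[] args) {
        // One lookup total, no matter how many requests are served.
        System.out.println("lookups for 1000 requests: " + lookupsFor(1000));
    }
}
```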
Therefore, we were only measuring the speed improvement of passing by reference. With small amounts of data transferred between EJBs and servlets per transaction, the gain from our results is small if not negligible. If you were consistently transferring large blocks of data, such as BLOBs for pictures, you might see more substantial performance gains. Images appeared to display faster, although we would need to measure this to obtain a proper understanding of the gain.
From a memory perspective, we saw a 4 MB reduction in the virtual size (VSZ) and a 1.5 MB reduction in the resident in-memory footprint (RSS) using the IBM SDK 1.4.0 with an optimized JBoss 3.2.0.
Hope that is of use.
>> We used Retroguard to code optimize
Isn't Retroguard a bytecode obfuscator? Does it have some kind of optimization (apparently unadvertised) built in?
It not only obfuscates but also optimizes the namespace - look at the output logs to get an idea of the processing. Most other obfuscators also do optimization - check Jarg and Jopt, but they also create some issues.
Obfuscators sometimes strip out bytecode that is actually useful. We found with Jarg that you got really small bytecode footprints and obfuscated code, but sometimes public static variables would get munged, and an else-if with empty braces would be removed entirely from the "if then else if then else" chain - which led to unintended results.
Retroguard doesn't produce code that is as skinny as a full optimizer and obfuscator would, but it does strip out the line-number and debug information as part of its processing.
The jury's still out on actual code optimization, but Sun and IBM push the idea that the JIT or HotSpot compiler can optimize that anyway. We take this path because it creates fewer headaches than trying to work out why a class stopped working after optimization - especially if we didn't create the class in the first place.
But I should clarify that the use of Retroguard is about trying to minimise the memory footprint in the safest way possible when you don't have the original code to compile with the optimize flag.
Yeah, my bad - I didn't mean bytecode optimization as such. Maybe package optimization, in the sense of removing additional information from the file and compacting it as much as possible.