4 Replies Latest reply on Jul 9, 2003 8:08 AM by Jon Barnett

    Optimizations and in-VM calls

    Jon Barnett Master

      We're currently doing some investigation work on JBoss to get a handle on performance limits.

      In some spare time, I wrote a simple test rig to compare the performance of calls on the optimized, Trove-modified JBoss against standard JBoss.

      The in-VM calls were invoked by 3 parallel threads, each of which had a stateless session bean look itself up and make a single method call that just returned an incremented counter, so there is minimal business logic in the EJB. One thread collected timings for JNDI lookups, one made the call through the local interface (the JNDI lookup is excluded from the measurement, so only the create and the call are timed), and the third made the same call through the remote interface, but in-VM. Each thread performed a serial set of 10,000 operations and measured the elapsed time. 10 loops of each were run and the average taken. We set it up this way so that JBoss was under some sort of load that required multiple operations in parallel.
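
      As a rough illustration of the measurement described above (serial operations, repeated in loops, averaged per operation, with threads running in parallel), here is a minimal sketch in modern Java. It is not the original rig: `CallTimer`, `COUNTER` and `averageMicrosPerOp` are hypothetical names, the bean call is replaced by a plain in-process counter, and `System.nanoTime` is used even though it was not available on the 1.4 JDKs used in these tests.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Minimal timing rig: average per-operation time over several loops of serial calls. */
final class CallTimer {

    /** Hypothetical stand-in for the bean method that returns an incremented counter. */
    static final AtomicLong COUNTER = new AtomicLong();

    /**
     * Runs `loops` loops of `opsPerLoop` serial operations and returns the
     * time per operation in microseconds, averaged across the loops.
     */
    static double averageMicrosPerOp(Runnable op, int opsPerLoop, int loops) {
        double sumOfLoopAverages = 0.0;
        for (int loop = 0; loop < loops; loop++) {
            long start = System.nanoTime();
            for (int i = 0; i < opsPerLoop; i++) {
                op.run();
            }
            long elapsedNanos = System.nanoTime() - start;
            // Convert to microseconds, then divide by the number of operations
            sumOfLoopAverages += (elapsedNanos / 1_000.0) / opsPerLoop;
        }
        return sumOfLoopAverages / loops;
    }

    public static void main(String[] args) throws InterruptedException {
        // Two workers run at the same time, so each measurement is taken under load
        final double[] results = new double[2];
        Thread[] workers = new Thread[results.length];
        for (int t = 0; t < workers.length; t++) {
            final int slot = t;
            workers[t] = new Thread(() ->
                    results[slot] = averageMicrosPerOp(COUNTER::incrementAndGet, 10_000, 10));
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        for (int t = 0; t < results.length; t++) {
            System.out.printf("thread %d: %.3f usec per operation%n", t, results[t]);
        }
    }
}
```

      In a real test the `Runnable` would wrap the lookup, the local call or the remote call instead of the counter increment.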

      For the remote call tests, we had 2 threads on a remote client: one performing remote JNDI lookups and the other performing a remote call on the same bean. These were not run at the same time as the in-VM tests. Again, 10,000 serial operations and 10 loops. Reported values are the average time per operation.

      The results on Win2K, PIII 1.2 GHz with Sun JDK 1.4.1_01

      Optimized JBoss 3.2.0 with some Trove and collection modifications in the server branch

      IN VM                                 OUT OF VM
      Lookup      Loc call    Rem call      Lookup        Rem call
      26 usec     88 usec     95 usec       4.57 msec     8.14 msec

      Reported memory use by Windows Task Manager: 52,712K

      Standard JBoss

      IN VM                                 OUT OF VM
      Lookup      Loc call    Rem call      Lookup        Rem call
      38 usec     107 usec    111 usec      4.60 msec     8.39 msec

      Reported memory use by Windows Task Manager: 55,960K

      *******************

      The results on Linux RedHat 7.2, PII 500 MHz with IBM SDK 1.4.0

      Optimized JBoss 3.2.0 with some Trove and collection modifications in the server branch

      IN VM                                 OUT OF VM
      Lookup      Loc call    Rem call      Lookup        Rem call
      85 usec     308 usec    370 usec      21.36 msec    23.94 msec

      Reported memory use: VSZ = 177864K, RSS = 128288K

      Standard JBoss

      IN VM                                 OUT OF VM
      Lookup      Loc call    Rem call      Lookup        Rem call
      129 usec    334 usec    387 usec      21.50 msec    24.29 msec

      Reported memory use: VSZ = 198816K, RSS = 145768K

      If nothing else, the gap between the in-VM and out-of-VM figures should clearly show why an embedded Tomcat/Jetty is better than an externalised Tomcat/Jetty configuration.
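
      To put numbers on the figures reported above, the arithmetic can be sketched as follows. The class and method names (`CallRatios`, `percentReduction`, `slowdownFactor`) are made up for illustration; the input values are taken from the Win2K results.

```java
/** Small helpers for comparing the reported timings. */
final class CallRatios {

    /** Percentage by which `optimized` is lower than `baseline` (same units). */
    static double percentReduction(double baseline, double optimized) {
        return 100.0 * (baseline - optimized) / baseline;
    }

    /** How many times slower an out-of-VM call (msec) is than an in-VM call (usec). */
    static double slowdownFactor(double inVmMicros, double outOfVmMillis) {
        return (outOfVmMillis * 1_000.0) / inVmMicros;
    }

    public static void main(String[] args) {
        // Win2K in-VM lookup: standard 38 usec, optimized 26 usec
        System.out.printf("lookup reduction: %.1f percent%n", percentReduction(38, 26));
        // Win2K optimized remote call: 95 usec in-VM, 8.14 msec out of VM
        System.out.printf("out-of-VM slowdown: %.1f times%n", slowdownFactor(95, 8.14));
    }
}
```

      For the Win2K optimized build this works out to roughly a 32 percent faster in-VM lookup, and an out-of-VM remote call that is roughly 86 times slower than the same call made in-VM.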

        • 1. Re: Optimizations and in-VM calls
          Jon Barnett Master

          We've now completed the first release of documentation on performance code enhancements to JBoss 3.2.0. This is not a step-by-step coding example but an indication of areas where simple changes can reap a fair performance increase.

          Using the IBM SDK 1.4.0 performance VM on Linux, bytecode optimization, some Trove collection classes and the substitution of certain Collection implementations over others, we are able to achieve a 30 percent reduction in response time over a standard JBoss distribution running under the same SDK. Compared with a standard JBoss distribution running under the Sun JDK, this is a reduction of 44 percent.

          The test results documentation can be found at http://www.amitysolutions.com.au/pages/downloads.html#code. The PDF is 820K.

          • 2. Re: Optimizations and in-VM calls
            Jon Barnett Master

            Some unofficial information :-
            For ECperf 1.1 on our meagre test configuration, the standard JBoss 3.2.0 distribution fails at txRate=1 on the Sun JDK 1.4.1_01: the manufacturing and orders response times fail, and targets are missed by substantial margins, up to 30 percent below. With the changes, we were able to get to txRate=2 without failure. At txRate=3, response times passed but the targets fell short of the required values by up to 5 percent. So the performance improvements appear to deliver substantial benefits.

            We also measured response times with our own test application on the Sun JDK on Windows and found a 20 percent decrease in response time relative to measurements taken in a previous study.

            • 3. Re: Optimizations and in-VM calls
              jbossnz Newbie

              What hardware/RAM were you using, and what was the database for the ECperf testing?

              Seems like some good gains with those optimisations. Would be interesting to see the performance gains for various -Xms -Xmx options.

              • 4. Re: Optimizations and in-VM calls
                Jon Barnett Master

                Sorry for the late reply. I spent time today looking at the blockage. Experimentally, it does not seem to be bound by DBMS performance, as I switched to a faster machine for PostgreSQL. The JBoss installation is running on a Win2K machine. It is possible that the block is now on the web server, but I'd need to probe into that further.

                The Win2K system was a PIII 1.2GHz with 392 MB and the original Linux PostgreSQL system was running on a PII 333MHz with 96 MB. I gave the JVM up to 200 MB, but there was little thrashing and no out-of-memory problems, so the only place I can see a possible bottleneck is somewhere in the web server response. I'll have to devise a test for it, I suppose, as we have done little in that area.

                Currently the tweaks put in place today have eliminated all problems with manufacturing, and delivery rates are fine but there are failures in the order mix.

                Ideally, I'd like to get the test on a fast Linux machine running the IBM SDK as I'm sure we can get that to peak. But in a way, working with slower technology gives you incentive to chase down any spare cycles you can get.

                One thing we did find is that you can't blindly substitute the Trove classes, as the relative performance of putting and getting can vary and performance characteristics change between JVMs. For example, on the IBM SDK the THashMap is slightly slower than the native HashMap for hasNext/next/get but is twice as fast at put. So you need to work out whether you win: if you do a lot of iterating and getting and only initialise once, you will lose out by implementing it.
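
                One way to check this trade-off on a given JVM is to time the put phase and the iterate-and-get phase separately for each candidate map. The sketch below is only an illustration: `MapBench`, `Result` and `measure` are hypothetical names, and it compares `java.util.HashMap` with `java.util.TreeMap` so that it runs without extra libraries. Trove's THashMap implements `java.util.Map`, so with Trove on the classpath its constructor could be passed as the factory instead (the package name depends on the Trove version). There is no warm-up phase, so treat the numbers as indicative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

/** Times the put phase and the iterate-plus-get phase of a Map separately. */
final class MapBench {

    /** Elapsed nanoseconds per phase, plus a checksum so the reads cannot be optimized away. */
    static final class Result {
        final long putNanos;
        final long readNanos;
        final long checksum;

        Result(long putNanos, long readNanos, long checksum) {
            this.putNanos = putNanos;
            this.readNanos = readNanos;
            this.checksum = checksum;
        }
    }

    static Result measure(Supplier<Map<Integer, Integer>> factory, int entries, int readPasses) {
        Map<Integer, Integer> map = factory.get();

        // Phase 1: initialise the map once
        long t0 = System.nanoTime();
        for (int i = 0; i < entries; i++) {
            map.put(i, i);
        }
        long t1 = System.nanoTime();

        // Phase 2: iterate over the keys (hasNext/next) and get each value
        long checksum = 0;
        for (int pass = 0; pass < readPasses; pass++) {
            for (Integer key : map.keySet()) {
                checksum += map.get(key);
            }
        }
        long t2 = System.nanoTime();

        return new Result(t1 - t0, t2 - t1, checksum);
    }

    public static void main(String[] args) {
        Result hash = measure(HashMap::new, 100_000, 20);
        Result tree = measure(TreeMap::new, 100_000, 20);
        System.out.printf("HashMap: put %d ns, iterate+get %d ns%n", hash.putNanos, hash.readNanos);
        System.out.printf("TreeMap: put %d ns, iterate+get %d ns%n", tree.putNanos, tree.readNanos);
    }
}
```

                Setting `readPasses` to match how often the real code iterates relative to how often it initialises gives a first idea of whether the faster put outweighs the slower reads.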

                I'll add the results of the two change sets implemented today within the week.