
    Performance we should expect from PojoCache

    Ben Wang, 05-2006




    PojoCache (formerly called TreeCacheAop) is a key component in JBoss Cache. It is an in-memory, replicated, and persistent cache system that operates directly on POJOs (Plain Old Java Objects) in a distributed environment. That is, it is object-oriented in that it preserves object relationships during replication (or persistence). In addition, it performs fine-grained replication transparently: once a POJO is attached to the cache system, any further update to a POJO field automatically triggers the corresponding replication.


    PojoCache also supports Java annotations. In the upcoming 1.4 release, for example, there are two additional field-level annotations, @Transient and @Serializable, which provide options to skip replication of a field or to treat a sub-object as Serializable (while still maintaining the external object relationship).


    Users interested in more details should refer to the JBoss Cache online documentation. There are also examples you can run in the JBoss Cache release distribution. For an interesting usage example, you can also refer to this OnJava article.


    In this article, we will compare the performance of PojoCache fine-grained replication against TreeCache (the default plain cache component of JBoss Cache). As mentioned, with automatic field-level replication, PojoCache not only eases the burden of development (no more additional cache.put() calls at the end of modifications), it can also potentially increase throughput, depending on the POJO size. Our objective here is to give the user a clear picture of PojoCache's performance characteristics by comparison with TreeCache.


    Performance tester

    We have created a performance tester to benchmark JBoss Cache. For those interested, the script can be checked out from the JBossCache CVS under tests/scripts/. Basically, a user can configure the number of nodes in the cluster, the number of clients on each node, and the payload size (through a list size, explained below). In addition, there is a switch to use either TreeCache (the JBoss Cache default) or PojoCache. If it is PojoCache, there is a further option to specify the frequency of whole-POJO updates (i.e., in PojoCache parlance, how often a new POJO is attached to the cache).


    As for the load pattern: the tester resides in the same VM as the cache instance, and each client repeatedly updates its own distinct fully qualified name (Fqn), so that there is no write contention between the clients. This is similar to the pattern of HTTP session replication using sticky sessions. To minimize the impact of CPU sharing by the loader client, we pre-construct the POJO before the run.
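    The load pattern can be sketched in plain Java. This is a minimal stand-in, not the actual tester script: a ConcurrentHashMap plays the role of the cache instance, and the Fqn and class names are illustrative only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the load pattern: each client thread repeatedly writes to its own
// distinct Fqn, so there is no write contention between clients. The POJO is
// pre-constructed before the run to minimize CPU sharing with the loader.
public class LoadPatternSketch {
    static final Map<String, Object> cache = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        final int clients = 4;
        final int iterations = 1_000;
        final Object pojo = new Object();             // pre-constructed payload
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int c = 0; c < clients; c++) {
            final String fqn = "/perf/client-" + c;   // distinct Fqn per client
            pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    cache.put(fqn, pojo);             // coarse-grained, TreeCache-style put
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("distinct Fqns written: " + cache.size());  // → 4
    }
}
```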


    Test load pattern

    The test POJOs are listed in the Appendix section. They consist of a Student class (inheriting from a Person class) with an Address class and a list of Courses. To vary the request message size, we parameterize the size of the course list.


    For the TreeCache test, we simply perform the following for each client iteration:

    cache.put(fqn, key, pojo);

    where key is a thread id String, and pojo is a pre-constructed Student instance.


    For PojoCache tests, depending on the update frequency, we do one of the following within each client iteration:

    1. pojocache.putObject(fqn, pojo);
    2. pojo.getCourses().get(0).setInstructor("Ben Wang");


    The first one maps a new POJO into the cache system each time. As a result, it is more expensive. Note that we are emphasizing a new POJO for every subsequent putObject, because if it were still the same POJO, PojoCache would simply recognize that and return the instance right away. While that would be a fast operation, it is not our test objective here.


    The second one, as mentioned, simply triggers a field replication automatically. When going through the underlying replication layer, the field replication does the equivalent of:

    pojocache.put(fqn, key, "Ben Wang");

    which should be fairly efficient (about 125 bytes over the wire)!


    Replication message size

    To give an idea of the actual message sizes involved, here is a table of the list sizes we employed against the actual message sizes observed during replication in the TreeCache test.



    List size        Replicated message size (bytes)

    Table 1. List size vs. replicated message size



    As we can see, the size ranges from 1K bytes to 16K bytes. In addition, when doing a whole-object update, PojoCache produces roughly twice that size because of the extra metadata and overhead involved. We plan to reduce this size further in a future release.
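    The mechanics behind Table 1 can be approximated by serializing the test POJO at different course-list sizes and measuring the byte count. The classes below are trimmed-down stand-ins for the appendix POJOs (field values are made up), and the real wire size also includes JGroups/JBoss Cache framing, so the numbers are only indicative of the trend.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Estimates the replicated payload by measuring Java-serialized size of the
// test POJO for different course-list sizes.
public class PayloadSizeSketch {
    static class Course implements Serializable {
        String title = "Cache 101", instructor = "Ben Wang", room = "A-1";
    }

    static class Student implements Serializable {
        String name = "Joe", school = "Cache U";
        List<Course> courses = new ArrayList<>();
    }

    static Student make(int listSize) {
        Student s = new Student();
        for (int i = 0; i < listSize; i++) s.courses.add(new Course());
        return s;
    }

    static int serializedSize(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        for (int listSize : new int[] {10, 100, 200}) {
            System.out.println("list size " + listSize + " -> "
                    + serializedSize(make(listSize)) + " bytes");
        }
    }
}
```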


    Test Environment

    We ran the tests using a 4-node cluster. The machines are connected by a Gigabit switch. Here is more detailed information.




    Intel dual 3.0GHz CPU with 4G RAM

    Table 2. Testing environment




    Typically, we expect a POJO to have a long lifetime in the POJO cache system; the longer the POJO's lifetime, the better the overall throughput will be (because there are more field updates relative to attachments). To illustrate the impact of POJO lifetime on overall performance, we have chosen to vary the POJO update frequency here.


    In addition, we have studied the effect of replication message size on cache performance. We have chosen to vary the Course list size in the Student object, running cases with list sizes of 10, 100, and 200, respectively (see Table 1 again for the actual replicated sizes).


    We have compared the overall throughput and CPU utilization for 4 different cases:

    • TreeCache. The plain cache, with a put of the whole POJO every time.

    • PojoCache 100-0. PojoCache with a different POJO attachment every time; that is, no fine-grained replication. This is 100% POJO update.

    • PojoCache 10-90. PojoCache with 1 POJO attachment per 9 field updates. This is 10% POJO update and 90% field replication.

    • PojoCache 5-95. PojoCache with 1 POJO attachment per 19 field updates. This is 5% POJO update and 95% field replication.
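    The attachment-versus-field-update mix above can be sketched as a simple scheduling rule. The names here are illustrative, not the actual tester code: for an N% attachment ratio, one out of every 100/N iterations re-attaches the POJO, and the rest perform a single field update such as pojo.getCourses().get(0).setInstructor("Ben Wang").

```java
// Decides, per client iteration, between a whole-POJO attachment and a
// fine-grained field update for a given mix.
public class UpdateMixSketch {
    /** True when iteration i should attach a new POJO (attachPercent must divide 100). */
    static boolean isAttach(int i, int attachPercent) {
        return i % (100 / attachPercent) == 0;   // period 10 for 10-90, 20 for 5-95
    }

    public static void main(String[] args) {
        int attaches = 0, fieldUpdates = 0;
        for (int i = 0; i < 100; i++) {
            if (isAttach(i, 10)) attaches++;     // would call pojocache.putObject(fqn, pojo)
            else fieldUpdates++;                 // would call a plain field setter on the POJO
        }
        // → 10 attaches, 90 field updates (the "PojoCache 10-90" case)
        System.out.println(attaches + " attaches, " + fieldUpdates + " field updates");
    }
}
```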


    Figure 1 shows the overall throughput for the 4 different cases. From the figure, we can see that TreeCache is about 4 times faster than PojoCache 100-0. While we expect every PojoCache whole-POJO update to be slower (because it needs to actively map the fields into the cache system), further optimization is possible in a future release.


    In the PojoCache 10-90 case, however, the overall throughput is about 2 times that of TreeCache. That is, when the POJO update frequency is 10%, at a list size of 100, PojoCache is about twice as fast as TreeCache. When the POJO update frequency is only 5% (the PojoCache 5-95 case), it becomes about 3 times faster (the best result being 3.5 times, at a list size of 200). Obviously, the longer the POJO lifetime, the better PojoCache performs, as discussed.


    In addition, we see that the bigger the list size, the bigger PojoCache's advantage. This is expected, since TreeCache has to serialize a bigger message payload over the wire.



    Note that some people have been misled by the above statistics. All of the tests perform 100% writes. The 10-90 and 5-95 figures refer to the ratio of POJO attachments to POJO field updates, NOT a ratio of writes to reads.




    Figure 1. Overall throughput


    Figure 2 shows the corresponding CPU utilization for the 4 different cases. As we can see, CPU utilization is similar for all test cases except PojoCache 100-0, where every whole-POJO update demands more CPU power.



    Figure 2. CPU utilization



    Finally, all of the above tests were run in ASYNCHRONOUS replication mode. To validate the results under SYNCHRONOUS replication, Table 3 shows the resulting throughput for a list size of 100.




                       Asynchronous throughput (req/sec)    Synchronous throughput (req/sec)

    PojoCache 100-0
    PojoCache 10-90
    PojoCache 5-95

    Table 3. Throughput comparison for different cache modes with list size of 100.



    As we can see, apart from the overall throughput of synchronous replication being lower than that of asynchronous, the trend is about the same. E.g., PojoCache 5-95 is about 3 times as fast as TreeCache.



    In this article, we have compared the performance of PojoCache against the plain TreeCache (both components are available in JBoss Cache). As expected, without fine-grained field-level replication, PojoCache is slower than TreeCache. However, with field replication, PojoCache can be 3 times faster than its TreeCache counterpart in the cases shown here. Of course, besides the performance factor, PojoCache also comes with automatic POJO field replication and the ability to handle POJO object relationships during replication.


    For additional feedback, please go to this topic under JBoss Cache Forum.




    Test POJO


     * Person class with PojoCache declaration.
    // Note that for PojoCache it is actually not necessary to implement Serializable. It is
    // needed for TreeCache.
    public class Person implements Serializable {
       protected String name;
       protected Address address;
    }


     * Student class. No need to declare the annotation since it inherits from Person.
    public class Student extends Person {
       protected String school;
       // We will vary this list size to run different replication message sizes.
       protected List courses = new ArrayList();
    }


     * Address class with PojoCache declaration.
    public class Address implements Serializable {
       protected String city;
       protected int zip;
       protected String street;
    }


     * Course class with PojoCache declaration.
    public class Course implements Serializable {
       protected String title;
       protected String instructor;
       protected String room;
    }


    Note that the POJOs are declared with JDK 5.0 annotations. Before running the PojoCache tests, we have to run the aopc ant target once to instrument the POJOs. At runtime, no special class loader is needed.



