I was writing a really long entry on the messed-up things people do in their J2EE applications, but I lost my own attention due to the length, so I figured I would break it up. This series will focus on how to make your application run poorly on JBoss, and then give the inverse as well. Most of it is just generic computer-science-type stuff, but all with a JBoss twist. My interest is performance and scalability, not things like J2EE compliance. I will try to note where compliance might be an issue, but it is not possible to produce a completely scalable drag-and-drop application. This series assumes you are developing for a cluster of 1 to many JBoss instances. It hardly matters what you do for extremely small applications with limited requirements -- or does it?

 

Rule #1 - Use the HTTP Session as a Cache for database data (remember this is to SCREW UP your application)

 

By using the HTTP Session as a cache, you can maximize the serialization your application does, thereby limiting your scalability and performance. Yes, it works great on one system, but in a cluster the data must be serialized and replicated to the other nodes every time you call setAttribute(). Furthermore, the more nodes you have in the cluster, the more time each of them spends processing state-replication messages (passed over multicast via JGroups).

 

As an added bonus, it becomes much easier to blow the JVM's heap (OutOfMemoryError). You likely do not know how big the session is getting, nor the size of the objects created as overhead to replicate it across the cluster, nor, in advance, how many users will log in at peak. This is a recipe for "fall off the side of a cliff" degradation even if you never blow the heap.
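
To make the cost concrete, here is a small self-contained sketch (the class and the 10,000-row "query result" are made up for illustration) that serializes a session-sized attribute the same way session replication must, and reports how many bytes get shipped on each setAttribute():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Illustrates the replication cost of caching query results in the session:
// with total state replication, every other node receives and deserializes
// this payload on each setAttribute() call.
public class SessionBloat {

    // Stand-in for a row of database data cached in the session.
    static class CachedRow implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String description;
        CachedRow(int id, String description) {
            this.id = id;
            this.description = description;
        }
    }

    // Serialize an attribute the way session replication must, and return its size.
    static int replicationPayloadBytes(Serializable attribute) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(attribute);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // Simulate "caching" a 10,000-row query result under one session key.
        List<CachedRow> queryResult = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            queryResult.add(new CachedRow(i, "customer record number " + i));
        }
        int payload = replicationPayloadBytes((Serializable) queryResult);
        System.out.println("bytes shipped per setAttribute: " + payload);
        // Multiply by calls per request, requests per second, and nodes in the
        // cluster to see how quickly this dominates network and CPU time --
        // and note that every node holds its own copy of this on the heap.
    }
}
```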

 

Inverse - Use second-level cache in your persistence mechanism and keep the Session as small as possible

 

Whether you are using EJB CMP, Hibernate, or JBoss Cache directly, you have access to a second-level cache in JBoss. Generally, if you could cache the data in the session, you can cache it in the second-level cache instead. Moreover, the data does not have to be replicated.
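
As a sketch of what this looks like in Hibernate 3.x (the entity, table, and column names here are hypothetical), the second-level cache is switched on globally and each cacheable class opts in:

```xml
<!-- hibernate.cfg.xml: enable the second-level cache and back it with
     JBoss Cache (TreeCacheProvider); EhCacheProvider is another common
     choice shipped with Hibernate. -->
<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.provider_class">
    org.hibernate.cache.TreeCacheProvider
</property>

<!-- Customer.hbm.xml: mark a (hypothetical) entity as cacheable -->
<class name="com.example.Customer" table="CUSTOMER">
    <cache usage="read-write"/>
    <id name="id" column="ID"/>
    <property name="name" column="NAME"/>
</class>
```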

 

Both JBoss CMP and Hibernate offer cache invalidation and LRU eviction to keep this data appropriately sized. That means you can designate one persistence configuration to write the data; whenever the data changes, it sends an invalidation message to the other nodes, which evict the stale entries from their caches. The data itself is never replicated to each node, and in the event of a failure you simply take a cache miss. This likely gives you cleaner data than you could achieve in the HTTP Session.
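
On the JBoss CMP side this is container configuration rather than code. A sketch, with illustrative bean names and sizes, along the lines of jboss.xml and standardjboss.xml:

```xml
<!-- jboss.xml: opt a (hypothetical) entity bean into cache invalidation -->
<entity>
  <ejb-name>CustomerEJB</ejb-name>
  <cache-invalidation>true</cache-invalidation>
</entity>

<!-- standardjboss.xml container configuration: bound the entity cache
     with an LRU policy so it cannot grow without limit -->
<container-cache-conf>
  <cache-policy>org.jboss.ejb.plugins.LRUEnterpriseContextCachePolicy</cache-policy>
  <cache-policy-conf>
    <min-capacity>50</min-capacity>
    <max-capacity>1000</max-capacity>
  </cache-policy-conf>
</container-cache-conf>
```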

 

Moreover, Hibernate offers optimistic locking strategies and dirty checking. Rather than relying on cache invalidation (which could still be a bit overwhelming), it can simply check the data for staleness when it is used in a write.
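
Optimistic locking in Hibernate typically hangs off a version column; a minimal mapping sketch (the entity and column names are made up):

```xml
<class name="com.example.Customer" table="CUSTOMER">
    <id name="id" column="ID"/>
    <!-- Hibernate increments this on every update and compares it at
         write time; writing from stale data raises a
         StaleObjectStateException instead of silently losing the update. -->
    <version name="version" column="VERSION"/>
    <property name="name" column="NAME"/>
</class>
```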

 

JBoss CMP offers "Commit Option D", which allows cached data to be refreshed at regular intervals. This may be good enough in many cases and may have less overhead than cache invalidation for some datasets.
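
Commit Option D is a JBoss-specific extension to the standard commit options, configured per container. A sketch (the container name and refresh rate are illustrative) in standardjboss.xml:

```xml
<container-configuration>
  <container-name>Commit Option D Entity</container-name>
  <!-- D behaves like option A (cache state between transactions) but
       refreshes the cached data at a fixed interval, here every 30 seconds. -->
  <commit-option>D</commit-option>
  <optiond-refresh-rate>30</optiond-refresh-rate>
</container-configuration>
```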

 

Rolling your own - Writing a persistence mechanism is easy, doing it right...

 

O-R mapping and caching are nothing new, and they are not hard to write. Writing them correctly, so that they are highly concurrent, and adding advanced features such as cache invalidation is considerably harder. It's not code you want to own or maintain, and your boss is not likely to want to pay you to write and support it. JBoss can help you with solid, well-used, tested implementations. It's likely that the #1 most-used appserver, JBoss, will have a better general-use implementation than you could write as part of your development cycle while you write your business application. Moreover, if it does not completely suit your needs, you can use it as a baseline and have your changes committed to the appserver codebase. This can substantially lower the cost of ownership of your overall codebase. If caching is not part of your core business, then why focus on it as part of your day job? Let us help!

 

Next up: How not to do persistence