JBoss Mail Server M4-pre1 released

Posted by acoliver Jan 31, 2006

You can now download the JBoss Mail Server M4-pre1 release for JDK 1.5 here.

This release is our first pre release of the next milestone. It has undergone dramatic internal improvement from the M3 release. Therefore early adopters should treat this "pre" release with more care than previous "pre" releases. This release includes (among other things) these major features:

Virtual Hosts
Aliases
SMTP Relaying (via mail routes)
APOP
TLS for POP
numerous additions to the installer

The installation process now includes some post installation tasks that the administrator must complete. Moreover, some screens for enabling complex features may not be obvious. Therefore, we recommend that even users with prior JBMS experience view the installation walkthrough. While much of the documentation for this release is already updated, it will not be finished until the actual 1.0M4 milestone release.

For users who chose to use Microsoft Outlook, we now have instructions for setting up Outlook for use with JBoss Mail Server available on the wiki (common settings). For users using Mozilla's Thunderbird open source product, we also have updated instructions for setting up Thunderbird for use with JBoss Mail Server. While these are the most common mail clients we are willing to publish instructions for integrating other commonly available mail clients with JBMS, please offer suggestions in the JBoss Mail Server development forum.

Although the winter season saw a pause in development, adoption and interest has been steady in JBMS and we have now begun to attract contributors from competitive and complementary efforts. This is primarily due to the open nature of our community with its "do-ocratic" viewpoint where some projects have become increasingly stagnant due to their own social aspects. It can also be traced to the fact that going from small to large can be achieved faster than going from large to much larger. Still we would like to see more potential contributors live up to their potential.

As previously emphasized, user installation experience is a continued focus for the project. We think that we are already the easiest to install mail server on the UNIX platforms and positive feedback at JUGs have indicated that we are also doing well on the Windows platform. We would like to hear about your installation experience positive or negative in the JBoss Mail Server development forum as well as any suggestions you may have to improve it. A challenge in maintaining our progress in this area is offering a good set of features without overburdening the average administrator with settings that he/she will never need. Moreover, it is obvious that some tasks should be in the domain of post-installation administration. To this effect, I do think that M4-pre1 is a minor ease-of-installation regression over M3 in that there are more screens in the installer and creating mailboxes presently involves confronting the much-hated JBAS jmx console (you could also use the web console but then you loose the friendly CTRL-F key for all intents and purposes). Rest assured this issue is foremost on my mind. We are looking heavily into the area of administration experience. Although I anticipate that we are close to a "production" release (if not already there), I will be reluctant to stamp a release with the 1.0-final blessing until we have both webmail and an administration tool.

I hope to apply further performance optimizations over the next week. While most of our load testing was done before a crucial mailbox re-factoring, we expect some performance improvement. Still, we are aware of a number of places where there are opportunities for improvement (particularly in the new mailbox layer). However, my wife said "Ooooh...the mail runs much faster now." which probably means we did okay here :-).

For the M5 release (which will have a much shorter cycle) we will primarily focus on IMAP, but will of course be looking to improve the administrative/installation experience and hopefully see a sophisticated webmail begin to emerge. Please report problems and suggestions to the JBoss Mail Server forum.

Hibernate/EJB3 Tuning

Posted by acoliver Jan 23, 2006

Having done a fair amount of Hibernate training and consulting, I thought I might post this mini-guide on some common tuning things. These are the top 8 things I tend to find when tuning Hibernate/EJB3 apps. I am going to talk about this predominantly in Hibernate terms but most of it applies to EJB3 persistence (some of the detached object/etc stuff is a bit Hibernate specific) as well. All of it obviously applies to JBoss's EJB3 persistence implementation (which is Hibernate) either by default or through the use of "extensions" (is it an extension if it predates the spec?).

appropriateness of Hibernate for the task
fetch strategies
set logic vs iterative logic
named vs adhoc queries
cache abuse
version columns/locking strategy
bah...DB2 and friends
flushing your performance

appropriateness of Hibernate for the task

It may surprise you to read that sometimes the first thing out of my mouth to customers is "I really am not sure you should be using Hibernate for this". Too often I see code that could take seconds as a few "update/where" and "insert/from" type statements written as a set of iterative loops. Now this bad code was traditionally written as PL/SQL with cursors but is now written as Java with Hibernate. The problem is that if the PL/SQL code was really slow, the Java code is ridiculously slow (do to interprocess/network communication). Generally the "reason" is that they "want to reuse our business logic for batch processing". The problem is that improper cost/benefit analysis is done. If you "reuse your business logic" and it takes 10 hours for your batch process to run when it could have taken 10 minutes...was it worth it? This comes up frequently in computing. It is "performance vs maintainability" and traditionally the answer is "maintainability and buy a faster processor. The problem is that this is all about "how we do IO" and "how many processors are used to do the task". When we talk about millions or billions of records being processed the decision matrix tends to shift in favor of hard to read Oracle-specific SQL statements with bizarre "parrallel(x)" syntax running efficiently in the database. Using cursors and iteratively processing these in loops performs poorly and bound to a single processor. Performance is even worse when doing remote communication by using a Java resultset for sequential access. Hibernate is really appropriate for transaction processing where we have short sweet reads, writes, and updates happening over many threads. The scale is in concurrency and pause time rather than in records processed.

fetch strategies

Many users define their fetch strategies inside of their meta-data. In Hibernate, you can put it in your HQL query instead which allows you to tune on a use-case basis. In H3 when you run an HQL query the mapping file's fetch strategy is ignored for each association and they are loaded lazily by default. Speaking from a performance perspective only, your DAO classes should reflect "use cases" such as "findCustomersAndOrdersByXXX()". This is a great advantage over definition defined fetch strategies as I may sometimes only want customers and other times I may want customers and orders. It depends on whether I am about to show the customer his address or about to show a sales person his detailed order history. You may wish to separate data access tiers, but should not do so in a way that is naive.

from Customer c left outer join fetch c.orders

set logic vs iterative logic

This item overlaps the stuff mentioned above in the appropriateness of Hibernate issue. Unless I have already loaded a set of data into my session cache, iterating through a set of data and updating a property is often inadvisable. Meaning I should not load a customer, loop through his orders, have a condition and call setStatus(STATUS.FOO). I should write this using the "executeUpdate()" EJB3/H3 method. This lets me write a SQL-style "update" statement (as well as delete, etc).

   update customer c set c.status=:status where c.status=:prestatus and c.territory=:territory

named vs adhoc queries

Hopefully most of you use Hibernate with an appserver. If you do not, I know where you can get a great one for free! Your appserver should supply a connection pool. Your connection pool should supply a statement cache. When you use named queries, you optimize the use of the statement cache. Meaning Hibernate will pre-parse the HQL, create the statement and have it ready to go for named queries. For an "ad hoc" query this happens every time on the fly. For EJB3 this means using the appropriate annotations for named queries. This can make a difference in performance, but the main advantage is that any HQL errors are discovered at startup rather than on the fly.

cache abuse

Cache is King, but it can also be king of bad performance. I generally recommend turning cache on at the end of your project as part of your performance testing effort. However, most developers fundamentally misunderstand how Hibernate uses cache. Cache is not, by default, on at all. Turning second level cache on, does not effect HQL/QbE/QbC/SQL queries. Caching with a low cache hit ratio will result in decreased performance due to de-optimization of the JVM garbage collector. Caching forces memory fragmentation. Caching expedites major and full garbage collections. Second level cache, primarily affects Session.get/find/load operations as well as association traversal (but less so). Query cache affects queries, but has to be turned on for specific queries. It is keyed by the query and its parameters. This is good for things like list boxes. The only automatic caching Hibernate does by default is Session caching. Meaning when you access entities through the session they are kept referenced in the session. If you access the same entity twice through Session.get/find/load then it is retrieved from the session cache.

version columns/locking strategy

Most developers don't understand how most transactional databases work (see DB2 section for a more in-depth discussion). When you have a transaction in the data you update is a "copy" of the data. It is only saved over the original when you commit. However if two concurrent transactions update the same data, the default behavior is "second one wins". Much of the time, we want "first one wins, second one is informed". Hibernate can help. One way it can do this is to keep a copy of every field read and execute a statement like this:

update thing set description='something in the db' where id=1 and date='10/10/01' and description='this here thing'

Meaning all of the original values are in the where clause. In order to do this, the session must remain open which precludes the use of detached objects (for all intents and purposes). However, Hibernate also supports the use of version or timestamp columns. In order to use them, you must have a version or timestamp column in your table. You should have one anyhow, because even if Hibernate isn't your exclusive access method, you can do this without Hibernate:

update thing set description='something in the db', set version=2 where id=1 and version=1

Even if the above DML SQL was executed twice, the second query would return "0 rows updated" which Hibernate uses (via JDBC) to generate an exception to the second updating thread. This is an efficient way to implement optimistic locking and it of course works with detached objects. Obviously you can adapt legacy code to do this as well. While some data do require a LockMode.UPGRADE or "select for update", much data do not. The great thing is that Hibernate manages the version matching/logic for you, you just catch the exception and decide what to do about the conflict.

DB2 and friends

With IBM's DB2(/Informix) you get to do fun stuff like monitor lock contention. The big issue with DB2 is that it doesn't support something called Multi-Version Concurrency Control (MVCC). What is it? Well this article gives a high level explanation, but basically you work on a copy of the data in your transaction rather than on the real record; when the transaction commits it is written back. DB2 supports row or page locks basically. Which means in the normal/default/usually-optimal "read committed" isolation level DB2 will get a read lock on rows read in a transaction when Hibernate issues select statements. This could result in very slow performance if you have lookup tables that are not/cannot be cached for instance. Your options though suck. You can degrade the isolation level to "read uncommitted" (you do this in the datasource definition in JBoss). Since Hibernate uses a "write behind" strategy, you can used this as a "poor man's" MVCC. Meaning so long as you do not actually flush to the database, you're going to achieve most of what MVCC tries to achieve. However, you have to be aware of the flush mode behavior of Hibernate. There is a lot more to all this and if you have to use DB2 (you'd have these problems with or without Hibernate) and you don't know all of this already then you probably need help if you're going to achieve a high level of concurrency. With all of that being said, while DB2 costs a lot less than Oracle, I can't say enough about how much more optimized for transactional computing Oracle is. However, Oracle is not your only option, this is not really a cola war. You actually can choose from several databases with MVCC or forms of MVCC including: Microsoft SQL Server, MySQL and PostgreSQL. DB2 and Informix are basically left out in the cold here with regards to "popular" databases without MVCC. In fact glaring are the benchmarks published by MySQL. I'd put to you that MVCC support is probably more important than a lot of other concerns in transactional computing (as opposed to batch processing and OLTAP type stuff). If licensing was the reason DB2 was chosen over Oracle, you may want to consider one of the other database choices. Aside from Oracle, I have had the most experience with MySQL, but they have great service and support and with what you save on licensing you can probably buy some fatter hardware and possibly exceed the performance of the others for far less money.

flushing your performance

Finally, a lot of developers call session.flush excessively. Note that the default flush mode (auto) in Hibernate flushes data whenever a query is executed or whenever a transaction is committed. This makes calling flush normally unnecessary. It also means that if you do query, update, query, update, query, commit that flushes will happen at least 3 times. If we do query, query, query, update, update, commit and change the flush mode to COMMIT (instead of auto) then we could achieve the same thing with only 1 flush. Flushing is pretty expensive (it is when all the DML is sent to the DB). Economizing this would be good from a performance perspective. However, if you have a lot of junior developers on your team (or careless people) then it is pretty hard to work in a FlushMode.COMMIT world. Meaning if I update thing.name='foo' and then execute "from thing where name='foo'" I will not get back the thing I just updated as a result. You have to manage your own transactional space with regards to your own updates in a FlushMode.COMMIT world. However, it is good from a performance perspective (meaning I wouldn't recommend it much of the time). In looking into this you may also want to look into the jdbc batch_size setting.

While in most apps your performance tuning should start with your persistence layer, it should not stop there. Moreover, your settings in the persistence layer touch your appserver configuration (statement cache size), JVM tuning (garbage collection/heap size) and more. These are just a few of the biggies that I run across in my travels that you should consider, there is obviously more. We touch on all of this in the Hibernate 3 advanced training (except for the DB2 stuff) which JBoss and our partners offer.

Filter Blog

By date:

JBossDeveloper

Andrew Oliver's Blog

JBoss Mail Server M4-pre1 released

Hibernate/EJB3 Tuning

Actions

Filter Blog