Managing multiple databases transparently

Version 2

Created by anthonyhib on Aug 11, 2009 9:23 AM. Last modified by sebersole on Mar 19, 2010 1:55 PM.

The ideas here meet the problems of a single application that reads and persists entities on multiple databases, even not of the same type, e.g. one is an Oracle instance, the other a MySQL one. In our case, we have a web application that uses the standard thread-local solution for handling (Hibernate) sessions and transactions, so that pages and web components on the same request don’t have to deal explicitly with transaction and session. In particular, we use a servlet filter to open and close the session on the requests, which is likely the most used solution.

What happens if you want to access on the same page objects from different databases, and hence from different session factories? We started working on this, and here we present our solution.

We took a look at Shards, but it meets problems of scalability/geographic distribution. In Shards, for the same entity, distinct instances can go on different tables, using sharding strategies. But it is a different problem from what we outlined above.

A first approach is to manage the "other, second" database by hand on you actions and pages, by recovering the second session factory, and manually handling its session and transaction locally. But if you have web components that are persistence aware, and you have made them rely on the default factory, they won’t work for entities on the second factory.

We built a way to freely mix usages of entities and components from the two (on n) factories on the same request, simply by guessing the factory handler by the persistent entity class.

Notice that this solution cannot work if the same entity is mapped on both factories; in that case, Shards may be more useful. It also does not work if one entity refers to an entity on a different factory; in order to "connect" the entities, you have to recur to primitive types. This approach can be useful if for example you want to copy some data from entities on a database to another, as it often happens in integration with legacy applications.

The "guessing" approach works in this way: every time you do a search, find by primary key, store, delete etc. query, you need to determine the session to which the entity instance belongs. It will be nice if this is not done by hand, but by your framework; this is exactly what we achieved, by inserting a "guessing" method in all query ones.

Technically, map your factory configuration to a factory instance. This done, you need to recover the session and its factory, given a persistent entity class name. Now, in order to save resources, you can open sessions only if entities belonging to that factory are used, in relational terms, you open (eventually pooled) connections only if you need to read from that database. We did it this way: we hold a thread local object, which holds a set of open sessions belonging to different factories. If a query is guessed as concerning an entity which has no corresponding session on the current thread, a new session is opened by the factory which manages such entity.

One of the factories should be marked as default; this is the factory that will be used when you want to open sessions not lazily, not starting from an entity, criteria or hql query.

One of the first bugs of our implementation is that we hadn’t realized that thread pooling on the container has as consequence that you can find on a supposedly "new" thread, the set of open sessions is already loaded with "dead" sessions; so be careful in the lazy implementation.

If you have updated information, please feed this page (not the remarks below) or send us (mailto:ppolsinelli (at) open-lab (dot) com) infos to put here.

References

Some sample code:

http://jblooming.blogspot.com/2008/08/managing-multiple-databases.html

Hibernate Shards:

http://www.hibernate.org/hibernate/subprojects/shards

JBossDeveloper

Managing multiple databases transparently

Comments