A few weeks ago we finally closed the acquisition of MetaMatrix. MetaMatrix is a pioneer in federated data services and metadata management. I am not sure you all understood what MetaMatrix is about when we initially released the formal press announcement back in April. At least I didn't ;) Consequently, now that we have closed this acquisition, I thought I would take the time to describe in more technical terms what MetaMatrix is all about.

 

For the busy developers, here is the “(Eiffel Tower) Elevator Pitch”®:

“MetaMatrix provides a way to aggregate disparate (and possibly heterogeneous) data sources (databases, mainframes, XML documents, etc.) and make them look like a single unified virtual database. You can access that virtual database through standard interfaces like ODBC or JDBC, as an XML document (XQuery interface), or through Web Services. You can perform a multitude of transformations on the various back-end database schemas and even perform joins between multiple heterogeneous sources. MetaMatrix supports both read and write/update/delete operations.”

 

Now, for the less busy ones, let's go through a typical MetaMatrix use case. Let's say your company stores customer information in an Oracle DB. Over time, around 10 applications (Java, C, etc.) have been deployed and directly leverage this data source. At some point, your company merges with a competitor. This competitor stores its customer information in a DB2 DB and has a similar number of applications directly leveraging it.

 

When working out the IT integration plan, the CIO makes the following decisions:

  • Over the next 24 months, existing applications will be migrated to a new unified schema that accounts for the specifics of both legacy DBs; this means that, on average, about one application will be migrated every month.
  • A set of new applications will be developed ASAP so that the company can start rolling out new services to the combined customers of the merged entities. Given that each system has its own concept of essential business metadata like "customer" and "account", these concepts will somehow have to be unified.

 

The typical problems faced in that kind of (realistic) scenario are the following:

  1. You cannot migrate all existing applications in one shot, which means various applications will be using the old and the new schema at the same time;
  2. You don't want to develop brand new applications on top of what's now considered a pair of legacy schemas: you want to use the new schema for all new applications, otherwise you are just making your migration issue worse;
  3. You might not be able to replicate data contained in the “old DBs” into a new fresh DB: data inconsistencies or synchronization delays are not compatible with most applications, which means you must keep a single repository of the data.

 

See where I am going? One easy way to solve that problem is to use MetaMatrix. These are the typical steps you would follow:

  • Your architects would define a clean and new DB schema (probably based on the concepts of the two legacy ones);
  • Using the MetaMatrix Eclipse-based tools, architects will graphically implement the new schema i) by capturing the 2 legacy schemas in a graphical representation and then ii) by defining any required transformations to perform the mapping. Architects are also able to configure a load of settings such as security scheme, caching, etc.;
  • Once defined, the new schema and transformation logic are stored in MetaMatrix's metadata repository;
  • The team developing the new applications (based on JBoss AS and Hibernate, obviously) uses the MetaMatrix JDBC driver and the newly defined schema. At runtime, MetaMatrix loads the referenced schema from the repository (it can obviously load and run several schemas at the same time) and acts as a SQL database;
  • In parallel, the team dedicated to the migration of existing applications will focus on one application at a time and upgrade each to the new data model according to their migration schedule: each application can be migrated to the new schema independently of the others.
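
To make the mapping idea concrete, here is a minimal in-memory sketch in Java (16+, for records). It is not the MetaMatrix API: the row shapes, field names and ID prefixes are all assumptions made up for illustration. It only shows the principle that the unified "customer" rows are computed on demand from both legacy sources through transformations, rather than copied into a new database.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: none of these types come from MetaMatrix.
class UnifiedCustomerSketch {
    // Hypothetical row shape in the legacy Oracle schema.
    record OracleCustomer(long custId, String firstName, String lastName) {}

    // Hypothetical row shape in the legacy DB2 schema.
    record Db2Client(String clientNo, String fullName) {}

    // Row shape of the new unified schema that new applications code against.
    record Customer(String id, String name, String source) {}

    // Transformation from the Oracle shape to the unified shape.
    static Customer fromOracle(OracleCustomer c) {
        return new Customer("ORA-" + c.custId(), c.firstName() + " " + c.lastName(), "oracle");
    }

    // Transformation from the DB2 shape to the unified shape.
    static Customer fromDb2(Db2Client c) {
        return new Customer("DB2-" + c.clientNo(), c.fullName(), "db2");
    }

    // The "virtual table": built from both sources each time it is queried.
    static List<Customer> unifiedCustomers(List<OracleCustomer> oracle, List<Db2Client> db2) {
        List<Customer> out = new ArrayList<>();
        for (OracleCustomer c : oracle) out.add(fromOracle(c));
        for (Db2Client c : db2) out.add(fromDb2(c));
        return out;
    }

    public static void main(String[] args) {
        List<Customer> rows = unifiedCustomers(
                List.of(new OracleCustomer(42L, "Ada", "Lovelace")),
                List.of(new Db2Client("C7", "Alan Turing")));
        rows.forEach(System.out::println);
    }
}
```

In the real product the transformations are modeled graphically and the result is queried with SQL over JDBC; the sketch just shows why applications only ever need to know the `Customer` shape.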

 

Once all applications have been migrated to the new schema (if such a thing is possible in the first place), the company can either decide to keep running things this way or migrate the database itself to the new schema, possibly getting rid of the MetaMatrix mediating layer. However, keeping this mediator can have several advantages, for example:

  • During the 24 months of the migration program, maybe other mergers take place or the new schema itself has to be modified/improved, hence leading to several versions of the unified schema running in parallel;
  • Even if the physical database itself is being migrated to the new schema, it might still make sense to keep MetaMatrix in the middle with a null “identity” mapping. That way you can make schema changes without affecting your applications. At worst you will need modeling changes (all the benefits of model-driven architecture).
  • Even after the DB is migrated to the new schema, it is likely that requests from other departments will come up, asking to combine both departments' databases for reporting purposes. Rather than providing another copy or an extract of the data, MetaMatrix can provide a view that combines both departments' data for a variety of reporting purposes.
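
The reporting case in the last bullet boils down to a join across two sources. Here is a small Java (16+) sketch of that idea, again purely conceptual: the "sales" and "support" departments, their row shapes and the `report` method are hypothetical, and in MetaMatrix this would be a modeled view queried with SQL, not hand-written code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: a cross-source join computed on demand, with no extract or copy.
class ReportingViewSketch {
    record Order(String customerId, long amountCents) {}   // hypothetical sales department row
    record Ticket(String customerId, String status) {}      // hypothetical support department row
    record ReportRow(String customerId, long totalSpentCents, long openTickets) {}

    static List<ReportRow> report(List<Order> orders, List<Ticket> tickets) {
        // Aggregate the sales side: total spent per customer.
        Map<String, Long> spent = new LinkedHashMap<>();
        for (Order o : orders) spent.merge(o.customerId(), o.amountCents(), Long::sum);

        // Aggregate the support side: number of open tickets per customer.
        Map<String, Long> open = new LinkedHashMap<>();
        for (Ticket t : tickets) {
            if ("OPEN".equals(t.status())) open.merge(t.customerId(), 1L, Long::sum);
        }

        // Left join on customerId: one row per customer that has at least one order.
        List<ReportRow> rows = new ArrayList<>();
        for (Map.Entry<String, Long> e : spent.entrySet()) {
            rows.add(new ReportRow(e.getKey(), e.getValue(), open.getOrDefault(e.getKey(), 0L)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order("C1", 1000), new Order("C1", 250), new Order("C2", 400));
        List<Ticket> tickets = List.of(new Ticket("C1", "OPEN"), new Ticket("C1", "CLOSED"));
        report(orders, tickets).forEach(System.out::println);
    }
}
```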

 

What has been described above is a typical scenario where several relational databases are seen as a single one through a relational interface. However, MetaMatrix also provides the following features:

  • On the back-end: ability to aggregate relational and non-relational data sources. Typical examples include mainframe APIs (through adapters), XML documents and even Excel documents.
  • On the front-end: ability to represent the aggregated information not only as a relational source but also as an XML source and perform queries through the XQuery interface or as a Web Service, for example.
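
As a rough illustration of the front-end point, the same unified rows can be exposed in more than one shape. The Java (16+) sketch below renders a list of unified customer rows as an XML document; the `Customer` record, the element names and the `toXml` helper are assumptions for illustration and say nothing about how MetaMatrix actually produces XML.

```java
import java.util.List;

// Conceptual sketch only: one logical data set, rendered as XML instead of rows.
class XmlFrontEndSketch {
    // Hypothetical unified row, as a relational client would see it.
    record Customer(String id, String name) {}

    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Render the same rows as a single XML document.
    static String toXml(List<Customer> customers) {
        StringBuilder sb = new StringBuilder("<customers>");
        for (Customer c : customers) {
            sb.append("<customer id=\"").append(escape(c.id())).append("\">")
              .append("<name>").append(escape(c.name())).append("</name>")
              .append("</customer>");
        }
        return sb.append("</customers>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(List.of(new Customer("ORA-42", "Ada & Co"))));
    }
}
```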

 

Hence MetaMatrix is not just a one-to-many relational aggregation layer, but really a many-to-many aggregation layer, relational or not.

 

As you can guess, MetaMatrix products will be open sourced at JBoss.org. We've already opened up a forum there, so you can start discussing new approaches for using MetaMatrix, including its roadmap and schedule for open sourcing.

 

Onward,

 

 

 

Sacha