General persistence ideas
porcherg, Aug 21, 2007 9:31 AM
Basically, there are two main objectives for persistence:
- instance recovery after a crash
- searching the data of instances that have already been executed.
For us, persistence is used as follows:
1- persist all needed data
2- if there is a crash, we use the db to recover
3- when the instance ends, we delete its data from the runtime db. If we want to keep the information, we can either flush it to disk or put it in a data warehouse.
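The three steps above can be sketched as a small in-memory model. Everything in it is hypothetical. `RuntimeStore` and its methods are made up, and the two maps stand in for the runtime db and for the archive (flushed file or data warehouse). A plain string stands in for the real instance state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the runtime-db lifecycle: persist, recover, delete on end.
// The maps stand in for the runtime database and the archive (file or data warehouse).
class RuntimeStore {
    private final Map<Long, String> runtimeDb = new HashMap<>();
    private final Map<Long, String> archive = new HashMap<>();

    // 1- persist the data needed to resume the instance (here: a single state string)
    void persist(long instanceId, String state) {
        runtimeDb.put(instanceId, state);
    }

    // 2- after a crash, every instance still present in the runtime db must be resumed
    List<Long> instancesToRecover() {
        return new ArrayList<>(runtimeDb.keySet());
    }

    // 3- when the instance ends, remove it from the runtime db,
    //    optionally copying its data to the archive first
    void end(long instanceId, boolean keepHistory) {
        String state = runtimeDb.remove(instanceId);
        if (keepHistory && state != null) {
            archive.put(instanceId, state);
        }
    }

    boolean isArchived(long instanceId) {
        return archive.containsKey(instanceId);
    }
}
```

Ended instances are removed, so the runtime db only ever holds live instances. Recovery then reduces to resuming everything that is still in it.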
As a first increment, we can write a unit test that only persists the basic classes (Process, Node), then extend it to all the classes used for process definitions (Transition, NodeBehaviour...). After that, the test can be extended to all runtime classes (instance classes such as ExecutionImpl and Environment).
To do this, we can use a simple db such as Hypersonic, and use TopLink or OpenJPA so that we test with an implementation other than Hibernate (ideally, it should be easy to switch between the different implementations). If switching really is easy, we can even run the tests against all three implementations.
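One way to keep the implementations interchangeable in tests is to centralize the provider-specific settings. The sketch below is a hypothetical helper; the enum and its method are not part of any library. The provider class names and property keys are the ones documented for OpenJPA, TopLink Essentials and Hibernate EntityManager with JPA 1.0, but they should be checked against the documentation linked below. In JPA 1.0 the provider itself is selected by the `<provider>` element of a persistence unit in persistence.xml. The properties map can be passed to `Persistence.createEntityManagerFactory(unitName, map)`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: settings for each JPA implementation against an in-memory HSQLDB.
// providerClass is the value for <provider> in persistence.xml (one unit per provider in JPA 1.0);
// the map can be passed to Persistence.createEntityManagerFactory(unitName, map).
enum JpaProvider {
    OPENJPA("org.apache.openjpa.persistence.PersistenceProviderImpl",
            "openjpa.ConnectionDriverName", "openjpa.ConnectionURL"),
    TOPLINK("oracle.toplink.essentials.PersistenceProvider",
            "toplink.jdbc.driver", "toplink.jdbc.url"),
    HIBERNATE("org.hibernate.ejb.HibernatePersistence",
            "hibernate.connection.driver_class", "hibernate.connection.url");

    final String providerClass;
    private final String driverKey; // provider-specific key for the JDBC driver class
    private final String urlKey;    // provider-specific key for the JDBC URL

    JpaProvider(String providerClass, String driverKey, String urlKey) {
        this.providerClass = providerClass;
        this.driverKey = driverKey;
        this.urlKey = urlKey;
    }

    /** Connection properties for an in-memory Hypersonic database named dbName. */
    Map<String, String> hsqldbProperties(String dbName) {
        Map<String, String> props = new HashMap<>();
        props.put(driverKey, "org.hsqldb.jdbcDriver");
        props.put(urlKey, "jdbc:hsqldb:mem:" + dbName);
        return props;
    }
}
```

A test suite could then loop over `JpaProvider.values()` and run the same persist-and-reload round trip for each implementation.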
For the moment, our main comments are:
- choose the JPA inheritance strategy
- decide which fields are transient
- decide whether to use both annotations and XML (XML overrides annotations) or only XML
For the JPA inheritance strategy: JPA defines 3 different strategies, documented for each implementation here:
OpenJPA: http://openjpa.apache.org/docs/latest/manual/manual.html#jpa_overview_mapping_inher
TopLink: http://www.oracle.com/technology/products/ias/toplink/jpa/resources/toplink-jpa-annotations.html#Inheritance
Hibernate: http://www.hibernate.org/hib_docs/annotations/reference/en/html_single/#d0e808
Let's take the following example:
- B extends A
- C extends A
- D extends B
Strategy 1: Joined: we have 4 tables:
TABLE_A contains A-related fields
TABLE_B contains B-related fields that are not in A
TABLE_C contains C-related fields that are not in A
TABLE_D contains D-related fields that are not in A and not in B
With this strategy, an object of class D is persisted in 3 tables (TABLE_A, TABLE_B and TABLE_D), whose rows share the same primary key.
Advantages:
- easy to understand (close to java model)
- if we add a new class E extending D, we only have a new table for E and the other tables are not changed
Drawbacks:
- inserts, updates, searches and removes are slower, because several tables must be written or joined for a single object
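To make the table layout concrete, here is a toy model of the joined strategy for the A/B/C/D example. It does not use a JPA provider; it only computes which tables and columns the mapping implies. The field names (`id`, `a1`, `b1`, `c1`, `d1`) and the class `JoinedStrategy` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the joined strategy for: B extends A, C extends A, D extends B.
// Field names are hypothetical; "id" is the shared primary key.
class JoinedStrategy {
    static final Map<String, String> PARENT = new LinkedHashMap<>();         // class -> superclass
    static final Map<String, List<String>> DECLARED = new LinkedHashMap<>(); // class -> its own fields

    static {
        PARENT.put("A", null); DECLARED.put("A", List.of("id", "a1"));
        PARENT.put("B", "A");  DECLARED.put("B", List.of("b1"));
        PARENT.put("C", "A");  DECLARED.put("C", List.of("c1"));
        PARENT.put("D", "B");  DECLARED.put("D", List.of("d1"));
    }

    /** Tables (root first) that hold part of an instance of cls, so must be joined to load it. */
    static List<String> tablesFor(String cls) {
        List<String> tables = new ArrayList<>();
        for (String c = cls; c != null; c = PARENT.get(c)) {
            tables.add(0, "TABLE_" + c);
        }
        return tables;
    }

    /** Columns of TABLE_cls: its own fields, plus "id" in subclass tables to join to the parent row. */
    static List<String> columnsOf(String cls) {
        List<String> cols = new ArrayList<>();
        if (PARENT.get(cls) != null) {
            cols.add("id");
        }
        cols.addAll(DECLARED.get(cls));
        return cols;
    }
}
```

For example, `tablesFor("D")` gives `[TABLE_A, TABLE_B, TABLE_D]`, the three-way join behind the drawback above. A new class E extending D would only add a `TABLE_E` entry and leave the other tables untouched.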
Strategy 2: Single table: we have only one table TABLE_A
TABLE_A contains the A-, B-, C- and D-related fields, plus a discriminator column that records which class each row belongs to
Advantages:
- if A, B, C and D differ only slightly, everything is centralized in one table. Queries are quicker because no joins or unions are needed.
Drawbacks:
- if there are many different fields, the table will have many columns that are unused (null) in any given row.
- if we add a new class E with new fields, new columns must be added to TABLE_A and all existing entries must be updated.
- it's harder to understand (the table is further from the Java model)
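The same toy hierarchy under the single-table strategy shows where the unused columns come from. This is a hypothetical model, not provider code. The field names are made up. `DTYPE` is JPA's default name for the discriminator column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the single-table strategy for: B extends A, C extends A, D extends B.
// Field names are hypothetical.
class SingleTableStrategy {
    static final Map<String, String> PARENT = new LinkedHashMap<>();         // class -> superclass
    static final Map<String, List<String>> DECLARED = new LinkedHashMap<>(); // class -> its own fields

    static {
        PARENT.put("A", null); DECLARED.put("A", List.of("id", "a1"));
        PARENT.put("B", "A");  DECLARED.put("B", List.of("b1"));
        PARENT.put("C", "A");  DECLARED.put("C", List.of("c1"));
        PARENT.put("D", "B");  DECLARED.put("D", List.of("d1"));
    }

    /** Columns of the one shared table: the discriminator plus every field of every class. */
    static List<String> columns() {
        List<String> cols = new ArrayList<>();
        cols.add("DTYPE"); // JPA's default discriminator column name
        for (List<String> fields : DECLARED.values()) {
            cols.addAll(fields);
        }
        return cols;
    }

    /** Fields of cls, including inherited ones (root first). */
    static List<String> fieldsOf(String cls) {
        List<String> fields = new ArrayList<>();
        for (String c = cls; c != null; c = PARENT.get(c)) {
            fields.addAll(0, DECLARED.get(c));
        }
        return fields;
    }

    /** Columns left NULL in every row that stores an instance of cls. */
    static List<String> unusedColumns(String cls) {
        List<String> unused = new ArrayList<>(columns());
        unused.remove("DTYPE");
        unused.removeAll(fieldsOf(cls));
        return unused;
    }
}
```

Every class adds its fields to the one shared column list, so a row for a C leaves `b1` and `d1` empty. Because rows of other classes leave them empty, subclass-specific columns also cannot generally be declared NOT NULL.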
Strategy 3: One table per concrete class: we have 4 tables (if A, B, C, D are concrete)
TABLE_A contains all A-related fields
TABLE_B contains all B-related fields (including fields inherited from A)
TABLE_C contains all C-related fields (including fields inherited from A)
TABLE_D contains all D-related fields (including fields inherited from A and B)
Advantages:
- easy to understand
- if we add a new class E extending D, we only have a new table for E and the other tables are not changed
Drawbacks:
- many duplicated columns
- support is optional in the JPA specification (but it seems to be implemented by all 3 implementations)
- to select all A objects, 4 queries are performed (one each on A, B, C and D) and then the union of the results is returned.
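The cost of a polymorphic query under table-per-concrete-class can be seen by building the statement for the toy hierarchy. This is a hypothetical sketch; the field names and `TablePerClassStrategy` are made up. It selects only the root's columns. A real provider also needs each subclass's extra columns to rebuild full objects. It may send one UNION statement or several separate SELECTs, as described above; that choice is implementation-specific.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of table-per-concrete-class for: B extends A, C extends A, D extends B.
// Field names are hypothetical; each table repeats all inherited fields.
class TablePerClassStrategy {
    static final Map<String, String> PARENT = new LinkedHashMap<>();         // class -> superclass
    static final Map<String, List<String>> DECLARED = new LinkedHashMap<>(); // class -> its own fields

    static {
        PARENT.put("A", null); DECLARED.put("A", List.of("id", "a1"));
        PARENT.put("B", "A");  DECLARED.put("B", List.of("b1"));
        PARENT.put("C", "A");  DECLARED.put("C", List.of("c1"));
        PARENT.put("D", "B");  DECLARED.put("D", List.of("d1"));
    }

    /** Columns of TABLE_cls: every field of cls, inherited ones included (hence the duplication). */
    static List<String> fieldsOf(String cls) {
        List<String> fields = new ArrayList<>();
        for (String c = cls; c != null; c = PARENT.get(c)) {
            fields.addAll(0, DECLARED.get(c));
        }
        return fields;
    }

    static boolean isSubclassOf(String cls, String root) {
        for (String c = cls; c != null; c = PARENT.get(c)) {
            if (c.equals(root)) return true;
        }
        return false;
    }

    /** Polymorphic "select all root objects": one SELECT per concrete table, combined with UNION ALL. */
    static String selectAll(String root) {
        String cols = String.join(", ", fieldsOf(root));
        List<String> parts = new ArrayList<>();
        for (String cls : PARENT.keySet()) {
            if (isSubclassOf(cls, root)) {
                parts.add("SELECT " + cols + " FROM TABLE_" + cls);
            }
        }
        return String.join(" UNION ALL ", parts);
    }
}
```

Here `selectAll("A")` combines four SELECTs, one per table, while `selectAll("B")` only needs TABLE_B and TABLE_D.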
Support for combining inheritance strategies within a single entity inheritance hierarchy is not required by the JPA specification.
OpenJPA supports mixing inheritance strategies; Hibernate does not. We don't know whether TopLink does.
The best approach is to choose the most suitable strategy for each inheritance tree.
For transient fields:
In Process, some fields (initial, nodes...) are marked transient.
Question: does this mean that we want to persist them as properties, so that getter and setter methods are used to store and load them?
In general, which kinds of fields should be marked as transient?
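As a reminder of what the `transient` keyword means under JPA's default field access, the sketch below lists the fields a provider would persist for a class. Under field access, the specification treats every non-static instance field that is not transient (keyword or `@Transient`) as persistent. `ProcessSketch` is a hypothetical stand-in; only the names `initial` and `nodes` come from the post. The `@Transient` annotation is deliberately ignored so the example stays free of JPA dependencies.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Process; only "initial" and "nodes" are names from the post.
class ProcessSketch {
    private String name;                                            // persistent
    private List<String> nodeNames = new ArrayList<>();             // persistent form of the nodes
    private transient String initial;                               // skipped under field access
    private transient Map<String, Integer> nodes = new HashMap<>(); // name -> index, rebuilt after loading
    private static int created;                                     // static: never persistent
}

class FieldAccess {
    /** Mimics JPA's field-access default: every non-static, non-transient field is persisted.
     *  (@Transient annotations are not checked in this sketch.) */
    static List<String> persistentFields(Class<?> type) {
        List<String> persisted = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            int mod = f.getModifiers();
            if (!Modifier.isStatic(mod) && !Modifier.isTransient(mod)) {
                persisted.add(f.getName());
            }
        }
        return persisted;
    }
}
```

With property access, the provider instead reads and writes through getters and setters. A getter could then expose a transient field's content in a storable form. Whether that is the intent for Process is the open question above.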
Annotations vs XML
Basically, we see 2 approaches:
- a default configuration with annotations, which can be overridden by an XML mapping file (JPA allows this).
- no annotations, with everything defined in XML.
The main advantage of the annotation solution is that the XML file is easier to understand (because it has fewer elements). The drawback is that some settings are hidden in the code.
The main advantage of the XML solution is that all the configuration is in one place.
For the moment, we have decided to use the native JPA interface wherever we can, and to create a pluggable interface for the 20% of the features we need that JPA does not cover.
If we decide to create a completely pluggable persistence service (no dependency on either JPA or Hibernate), someone could come up with a new persistence service that is not compatible with the annotations.
Thoughts?
Charles and Guillaume