7 Replies Latest reply on Aug 24, 2007 10:11 AM by tom.baeyens

general persistence ideas

porcherg Aug 21, 2007 9:31 AM

Basically, there are two main objectives for persistence:
- instance recovery after a crash
- searching for data in already executed instances.

For us, persistence is used as follows:
1- persist all needed data
2- if there is a crash, we use the db to recover
3- when the instance ends, we delete the instance data from the runtime db. If we want to keep information, we can either flush it to disk or put it in a data warehouse

As a first increment, we can create a unit test where we only persist basic classes (Process, Node). Then we can extend that to all classes used for process definition (Transition, NodeBehaviour...). After this, the test can be extended to all runtime classes (instance classes such as ExecutionImpl and Environment).

To do this, we can use a simple db such as Hypersonic, and we can use TopLink or OpenJPA as the JPA provider so that we do not test only with Hibernate (maybe we can make it easy to switch between the different implementations). If it is really easy to change, we can even test with all three implementations.

For the moment, our main comments are:
- choose the JPA inheritance strategy
- decide which fields are transient
- decide if we use both annotations and xml (xml overrides annotations) or if we use only xml

For the JPA inheritance strategy: there are 3 different strategies.
OpenJPA: http://openjpa.apache.org/docs/latest/manual/manual.html#jpa_overview_mapping_inher
TopLink: http://www.oracle.com/technology/products/ias/toplink/jpa/resources/toplink-jpa-annotations.html#Inheritance
Hibernate: http://www.hibernate.org/hib_docs/annotations/reference/en/html_single/#d0e808
Let's take the following example:

B extends A
C extends A
D extends B

Strategy 1: Joined: we have 4 tables
TABLE_A contains A related fields
TABLE_B contains B related fields that are not in A
TABLE_C contains C related fields that are not in A
TABLE_D contains D related fields that are not in A and not in B
With this strategy, an object of class D will be persisted in 3 tables.
Advantages:
- easy to understand (close to java model)
- if we add a new class E extending D, we only have a new table for E and the other tables are not changed
Drawbacks:
- updates, searches and deletes are slower, because one object is spread over several tables that have to be joined

Strategy 2: Single table: we have only one table TABLE_A
TABLE_A contains A,B,C and D related fields
Advantages:
- if there are only small differences between A, B, C, and D, all is centralized in one table. Queries are quicker.
Drawbacks:
- if there are many different fields, the table will have many columns that will not be used.
- if we add a new class E with new fields, new columns must be created and all existing entries must be updated.
- it's difficult to understand

Strategy 3: One table per concrete class: we have 4 tables (if A, B, C, D are concrete)
TABLE_A contains all A related fields
TABLE_B contains all B related fields (including inherited fields from A)
TABLE_C contains all C related fields (including inherited fields from A)
TABLE_D contains all D related fields (including inherited fields from A and B)
Advantages:
- easy to understand
- if we add a new class E extending D, we only have a new table for E and the other tables are not changed
Drawbacks:
- many duplicated columns
- optional in the spec (but seems to be implemented in the 3 implementations)
- to select all A objects, 4 queries are performed (one each on A, B, C and D) and then the union of the results is returned.
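
To make the three layouts concrete, here is a small plain-Java sketch (no JPA provider involved) that computes which columns each table would hold for the A/B/C/D example above. The class name InheritanceLayouts and the one-letter field names are made up for illustration, and key and discriminator columns are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InheritanceLayouts {
    // Hierarchy from the post: B extends A, C extends A, D extends B.
    static final Map<String, String> PARENT = new LinkedHashMap<>();
    // One made-up field per class, purely for illustration.
    static final Map<String, List<String>> OWN_FIELDS = new LinkedHashMap<>();

    static {
        PARENT.put("A", null);
        PARENT.put("B", "A");
        PARENT.put("C", "A");
        PARENT.put("D", "B");
        OWN_FIELDS.put("A", Arrays.asList("a"));
        OWN_FIELDS.put("B", Arrays.asList("b"));
        OWN_FIELDS.put("C", Arrays.asList("c"));
        OWN_FIELDS.put("D", Arrays.asList("d"));
    }

    // Fields of a class including inherited ones, root class first.
    public static List<String> allFields(String cls) {
        List<String> fields = new ArrayList<>();
        String parent = PARENT.get(cls);
        if (parent != null) {
            fields.addAll(allFields(parent));
        }
        fields.addAll(OWN_FIELDS.get(cls));
        return fields;
    }

    // Strategy 1: one table per class, each holding only that class's own fields.
    public static Map<String, List<String>> joined() {
        Map<String, List<String>> tables = new LinkedHashMap<>();
        for (String cls : PARENT.keySet()) {
            tables.put("TABLE_" + cls, new ArrayList<>(OWN_FIELDS.get(cls)));
        }
        return tables;
    }

    // Strategy 2: a single table holding the fields of every class in the tree.
    public static Map<String, List<String>> singleTable() {
        List<String> columns = new ArrayList<>();
        for (String cls : PARENT.keySet()) {
            columns.addAll(OWN_FIELDS.get(cls));
        }
        Map<String, List<String>> tables = new LinkedHashMap<>();
        tables.put("TABLE_A", columns);
        return tables;
    }

    // Strategy 3: one table per concrete class, inherited fields duplicated.
    public static Map<String, List<String>> tablePerClass() {
        Map<String, List<String>> tables = new LinkedHashMap<>();
        for (String cls : PARENT.keySet()) {
            tables.put("TABLE_" + cls, allFields(cls));
        }
        return tables;
    }

    // Number of tables one instance is spread over with the joined strategy.
    public static int joinedTablesFor(String cls) {
        int count = 0;
        for (String c = cls; c != null; c = PARENT.get(c)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("joined:          " + joined());
        System.out.println("single table:    " + singleTable());
        System.out.println("table per class: " + tablePerClass());
        System.out.println("tables for one D (joined): " + joinedTablesFor("D"));
    }
}
```

The sketch only models the column layout; it says nothing about how a given provider generates the SQL for polymorphic queries.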

Support for the combination of inheritance strategies within a single entity inheritance hierarchy is not required by the JPA specification.
OpenJPA supports mixing inheritance strategies. Hibernate does not. We don't know if TopLink does.

The best approach is to choose the best strategy for each inheritance tree.

For transient fields:
In Process, some fields are marked transient (initial, nodes...).
Question: does this mean that we want to persist them as properties, so that getter and setter methods are used to store and load them?
In general, which kinds of fields will be marked as transient?

Annotations vs XML

Basically, we see 2 approaches:
- a default configuration with annotations. JPA allows overriding them with an XML definition.
- no annotations and everything defined in XML.

The main advantage of the annotation solution is that the XML file is easier to understand (because it has fewer elements). The drawback is that some settings are hidden in the code.
The main advantage of the XML solution is that all the configuration is in the same file.

For the moment we decided to use the native JPA interface where we can and to create a pluggable interface for the 20% of features we need that are not covered by JPA.
If we decide to create a completely pluggable persistence service (no dependency on JPA or Hibernate), we can imagine that someone comes up with a new persistence service that will not be compatible with the annotations.

Thoughts ?
Charles and Guillaume

1. Re: general persistence ideas

tom.baeyens Aug 22, 2007 3:13 AM (in response to porcherg)

as a first increment, the class Process has to be made persistent. only Process, no related objects. One table: PVM_PROCESS. a test case, with store, load, update and delete.

once that is done, then we need to cover caching strategy.

once that is done we need to define the cascading over the process definition model. so that saving process object cascades to the whole process definition.

then we can start mapping and testing persistence of other related process definition classes.

Also, we should use field access. I believe that is the default.

The inheritance mapping strategy needs to be defined for each inheritance tree individually.

I think it's better to use XML and no annotations, as annotations create 2 more library dependencies.
2. Re: general persistence ideas

tom.baeyens Aug 22, 2007 3:16 AM (in response to porcherg)

"porcherg" wrote:
For transient fields :

some fields in class process were marked transient in an attempt to get the first process persisted with jpa.

by default, if a field is transient (not serialized) then jpa doesn't store this field either. this was just temporary and is not a good strategy for us to use, as we want our processes to be serializable as well as persistable in the db.
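
A minimal plain-Java sketch of the serialization half of this: a field marked with the Java transient keyword is dropped by Java serialization, and JPA treats such a field as non-persistent as well. TransientDemo, ProcessSketch and the field names are hypothetical stand-ins, not the real Process class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Hypothetical stand-in for a process class; not the actual PVM Process.
    public static class ProcessSketch implements Serializable {
        private static final long serialVersionUID = 1L;

        public String name;
        // Marked transient: skipped when the object is serialized.
        public transient String initial;

        public ProcessSketch(String name, String initial) {
            this.name = name;
            this.initial = initial;
        }
    }

    // Serializes the object to a byte array and reads it back.
    public static ProcessSketch roundTrip(ProcessSketch process) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(process);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ProcessSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ProcessSketch copy = roundTrip(new ProcessSketch("order", "start"));
        // The transient field is lost in the round trip.
        System.out.println(copy.name + " / " + copy.initial); // prints "order / null"
    }
}
```

This is why marking fields transient only to get a first mapping working conflicts with the goal of processes that are both serializable and persistable.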
3. Re: general persistence ideas

porcherg Aug 22, 2007 7:45 AM (in response to porcherg)

"tom.baeyens@jboss.com" wrote:
once that is done, then we need to cover caching strategy.

What do you mean by caching strategy? We know this is available if we use hibernate but there is no standard way to do this in different JPA implementations.

"tom.baeyens@jboss.com" wrote:
The inheritance mapping strategy needs to be defined for each inheritance tree individually.

Do you want to mix inheritance strategy in the same inheritance tree ? If yes, this is not supported by all JPA implementations (not required by the spec). Hibernate does not support this feature.

Charles and Guillaume
4. Re: general persistence ideas

tom.baeyens Aug 22, 2007 8:05 AM (in response to porcherg)

"porcherg" wrote:
"tom.baeyens@jboss.com" wrote:
once that is done, then we need to cover caching strategy.

What do you mean by caching strategy? We know this is available if we use hibernate but there is no standard way to do this in different JPA implementations.

It is pretty crucial for our performance. We interpret the process model at runtime. An alternative approach is to generate code for a process during deployment. If we can't cache the process definition in memory, the performance gap between our approach and code generation becomes too large.

We must make sure that it works out of the box with every JPA implementation that we support. Otherwise the PVM gets the blame and will be called not performant.
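
As a rough illustration of what caching the process definition in memory could mean independently of any JPA provider, here is a sketch of an application-level cache keyed by process name. ProcessDefinitionCache, the loader function and the String definitions are assumptions made for the example. This is not a JPA or Hibernate second-level cache, and a real version would have to deal with detached entities and invalidation on redeployment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical provider-independent cache for process definitions.
public class ProcessDefinitionCache<D> {

    private final Map<String, D> cache = new ConcurrentHashMap<>();
    // Loads a definition from the database; supplied by the caller.
    private final Function<String, D> loader;

    public ProcessDefinitionCache(Function<String, D> loader) {
        this.loader = loader;
    }

    // Returns the cached definition, loading it only on the first request.
    public D get(String processName) {
        return cache.computeIfAbsent(processName, loader);
    }

    // Drops a definition, e.g. after a new version has been deployed.
    public void evict(String processName) {
        cache.remove(processName);
    }

    public static void main(String[] args) {
        AtomicInteger dbHits = new AtomicInteger();
        ProcessDefinitionCache<String> definitions = new ProcessDefinitionCache<>(name -> {
            dbHits.incrementAndGet(); // stands in for a database query
            return "definition of " + name;
        });

        definitions.get("order");
        definitions.get("order"); // served from memory, no second load
        System.out.println("database loads: " + dbHits.get());
    }
}
```

Whether something like this or the provider's own cache is the right place to solve it is exactly the open question in this thread.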

"porcherg" wrote:
"tom.baeyens@jboss.com" wrote:
The inheritance mapping strategy needs to be defined for each inheritance tree individually.

Do you want to mix inheritance strategy in the same inheritance tree ? If yes, this is not supported by all JPA implementations (not required by the spec). Hibernate does not support this feature.

the point i was trying to make was that we can't decide on an inheritance mapping strategy in general. We have to look at each inheritance tree separately and decide on it.

Apart from that, Hibernate *does* support mixed inheritance mapping.
5. Re: general persistence ideas

csouillard Aug 22, 2007 8:17 AM (in response to porcherg)
Tom,

Hibernate supports mixed inheritance strategies if it is used standalone. Yesterday Guillaume and I had a look at Hibernate through JPA, and we saw that it does not support mixed inheritance strategies (in the same inheritance tree).
You can read that here :
http://www.hibernate.org/hib_docs/annotations/reference/en/html_single/#d0e788
The chosen strategy is declared at the class level of the top level entity in the hierarchy using the @Inheritance annotation.

Have you got a newer doc saying that Hibernate through JPA supports mixed inheritance in the same inheritance tree?
Charles
6. Re: general persistence ideas

tom.baeyens Aug 22, 2007 8:20 AM (in response to porcherg)

that is a problem

i'll have to look at the consequences and if/how we can resolve
7. Re: general persistence ideas

tom.baeyens Aug 24, 2007 10:11 AM (in response to porcherg)

blog to watch: http://weblogs.java.net/blog/jdeanquin/archive/2007/08/diference_betwe.html