14 Replies Latest reply on Sep 25, 2008 5:41 PM by adamw

    Why numerical revisions, and not the date ranges?

      What was the reasoning behind using numerical revision numbers in Envers? Wouldn't it be more useful to store date stamps instead (the way it's done in Oracle Workspace Manager, for example)? After all, we design our apps mainly for business users, who hardly ever use source control systems (doh!), and have no understanding of (and, quite frankly, don't give a flying cr@p about) the "revision #" concept. To them, it would be more natural to query the database state "as it was on date X" rather than "as it was at revision Y". I currently have an app that use the OWM-style versioning, and was initially excited to see the Envers doing similar things, but the numerical revisions make it nearly useless for our purposes...

        • 1. Re: Why numerical revisions, and not the date ranges?
          adamw

          Hello,

          the reasoning is that two transactions can be commited at the same time, having the same timestamps. Then, two changes to an entity could get "merged" into a single one, producing false results when asking for the number of changes, or making it impossible to determine which user made the change (as there could potentially 2 different users making the changes).

          However, it's still very easy to query the database "as it was on date X" etc.

          First of all, each revision entity (the meta-data that is bound with each reivision), contains a timestamp. Secondly, you can easily convert revisions to timestamps and vice versa using the VersionsReader interface:
          http://www.jboss.org/files/envers/api-beta/index.html

          There, you have getRevisionDate, getRevisionNumberForDate and findRevision which let you do the conversions.

          --
          Adam

          • 2. Re: Why numerical revisions, and not the date ranges?
            adamw
            • 3. Re: Why numerical revisions, and not the date ranges?

              Adam,

              Thanks for the prompt response, it gets clearer now. Basically, the use case for versioning in our app is to be able to view the whole versioned schema as of certain date (audit requirements). I understand that the revisions are global to the schema, i.e., there's no separate revision numbering per table, correct? Is it still possible to view the data state at a date that falls between revisions? Do I have to apply the date conditions every time I request something from the versioned schema? It would be a nice feature to set the "view as of date" property globally per VersionedReader, so that, once set, it would always retrieve a proper historical view. Also, it would be nice to have something similar to the Hibernate Search's FullTextEntityManager - a versioning-aware EntityManager, to be able to use standard JPA instead of Hibernate Session.

              • 4. Re: Why numerical revisions, and not the date ranges?
                adamw

                Hello,

                the revisions are global, yes (that way versioning relations works).

                The getRevisionNumberForDate will return a "good" revision number for any date. More specifically, if you request the revision number for date X, you will a revision number that is as high as possible, and that was commited on or before X. So, that revision number represents the data as it was on date X.

                You don't use Hibernate's Session in Envers - instead, having an EntityManager or a Session, you can obtain a VersionsReader. And then you can obtain the revision number once, and pass it to the methods of the VersionsReader.

                I agree that having a "VersionsEntityManager" would be nice, but it has different methods than the VersionsReader. And implementing it is far from straightforward.

                The biggest difference are the queries. In JPA, you only have HQL queries, so queries which you write in String form. In Envers, you only have Hibernate-Criteria-like queries (so queries constructed programmatically), and moreover, there are two types of queries: for entities at a given revision (this interests you more, I think), and for revisions of entities.

                Moreover, the HQL is richer than what is currently supported by Envers Queries. In the future, that can change, of course :)

                --
                Adam

                • 5. Re: Why numerical revisions, and not the date ranges?

                  Well, that's exactly what I seem to be missing here. The DML operations (insert, update, delete) are already transparent in Envers (that is, they are performed against the original entities) - but the queries are not. One has to construct them in a special way to query versioned data. The transparent queries is something that OWM offers - as it's an Oracle-only product, it does it by creating views that point to the versioned tables, and using INSTEAD OF triggers to intercept DML thrown at these views - and it's something that would be great to have in Envers, too, although I do realize it's not trivial to implement. Maybe it's possible to manipulate the Hibernate's internal query form (all JPA queries are compiled into it anyways) to add the versioning conditions? Not sure whether Hibernate exposes this, though.

                  • 6. Re: Why numerical revisions, and not the date ranges?

                    Now that I'm thinking of it more... why can't we use Hibernate filters for this? If every versioned table has a "revision" column, then simply enabling it for all versioned entities (might even be done automatically at startup, if versioning can piggyback on Hibernate configuration, the way Annotations do it) gives us a complete versioned view on all data...

                    • 7. Re: Why numerical revisions, and not the date ranges?

                      Ok, to sum it up, here's how I'd approach transparent querying in Envers - let me know what you think

                      // as done in Annotations, piggyback on existing plain EMF
                      // as part of startup, assign filters to all versioned entities
                      VersionedEntityManagerFactory vemf = ...
                      // similar to Hibernate Search
                      // VEM extends the EM interface with versioning-specific operations
                      VersionedEntityManager vem = vemf.createEntityManager();
                      // optionally, set the revision to query for
                      // by default, uses the latest (live) revision
                      // the revision setting stays for as long as the VEM is reused - e.g. in Seam's long conversations
                      vem.setTargetRevision(25);
                      // transparently use regular HQL queries
                      Query q = vem.createQuery("...");
                      // VEM enables filters for all versioned entities transparently in this call
                      q.getResultList();

                      • 8. Re: Why numerical revisions, and not the date ranges?
                        adamw

                        Hello,

                        the problem that I have with filters is that I have to do more then just adding a constraint on the revision field (it's possible that it could be done simpler - I just didn't yet discover how ;) ). It's all because of relations. If I have a many-to-one relation, in the hibernate mapping of the versions table, I can't map this as a relation, but I've got to map the related entity's id. That is because I can't reference a specific revision of the related entity - the revisions at which those two entities change can be completely different.

                        Having transparent queries is one of the "ultimate" goals, and it would be great to have them, I agree :). However now, when the allowed constructs in the queries are restriced, I think that it's better to enable people only to construct valid queries, instead of throwing a run-time exception in case an unsupported construct is used. I don't remember it very well now, but I think it's possible to plug-in to the parsed HQL tree and manipulate it.

                        About the syntax of the VersionedEntityManager - setting "targetRevision" there is one option, the other is doing it when creating the query. But that's details :)

                        Anyway, if you'd be willing to contribute in this area, you're of course very welcome :)

                        Adam

                        • 9. Re: Why numerical revisions, and not the date ranges?

                           

                          "adamw" wrote:
                          the problem that I have with filters is that I have to do more then just adding a constraint on the revision field (it's possible that it could be done simpler - I just didn't yet discover how ;) ). It's all because of relations. If I have a many-to-one relation, in the hibernate mapping of the versions table, I can't map this as a relation, but I've got to map the related entity's id. That is because I can't reference a specific revision of the related entity - the revisions at which those two entities change can be completely different.


                          Ah, I see now. So it turns out that if I have a one-to-many between tables A and B, and a gets a new row, this row will be marked as a new revision #, yet it will reference an older revision of B... I see.

                          Well, that's exactly why I wondered why did you go with the revision #s instead of dates - it's very hard (if not impossible) to reconstruct a valid state of relationships using this approach, without having to determine the revision # for each participating table separately. With date ranges, this task is almost trivial - and retrieving live data is even simpler, as you simply query for end range to be NULL. It gets harder to pull a graph of changes for a particulat transaction this way, but then again, nothing prevents you from putting both the revision # and dates on the record, now does it?

                          Regarding contribution - I'd love to, but I'm wondering how much flexibility can I expect in terms of the overall architecture and approach - because I feel that (no offense) the current approach might pose big problems in accomplishing what I'm thinking of.

                          BTW, do you have access to an Oracle DB?

                          • 10. Re: Why numerical revisions, and not the date ranges?
                            adamw

                            Hello,

                            I don't quite understand what you wrote here :). So, you have two entities in a many-to-one relationship: A and B. A has a collection of B's and B has a reference to A. Now, you add a new B, and set it a reference to A. This generates a new revision - lets say number X (on both sides, as the revision content changes for A; this only happens on collection elements adding/removal, not update). Now, if you get the the B or A entity at revision X-1, you'll see the old relationship. If you get it at revision X, you'll see the new one.

                            It will work the same if there's an unidirectional relation between A and B.

                            Revisions have the benefit I described in previous posts above dates, and, dates and revisions are in fact equvialent (you can convert between the two using the VersionsReader).

                            I agree that the schema can be improved - and I plan to propose it for Envers2, in a very similar fashion that you describe here; however this would involve data migration so we'll need some discussion beforehand. I'll write about it tomorrow or next week.

                            No, I don't have access to an Oracle DB :)

                            Adam

                            • 11. Re: Why numerical revisions, and not the date ranges?

                              I think I got my relationship backwards (one B to many A) that's why the confusion. Let's stick with yours for the purposes of discussion, then (one A to many B).

                              So, the way I understand it, when there are insertions or deletions in the child collection, both parent and child sides get a new revision #s, but for updates to a child entity, only child gets a new revision, correct? And this way, the revision #s on parent and child can get out of sync, so that an entity B with revision X would relate to an entity A with revision X-1 (or any Y<X, for that matter), correct? Yes, this makes it problematic to map these relationships, you're right.

                              This is the part where dates are not equivalent to revisions, IMHO, as date filter (in the form of ":date BETWEEN effective_from and effective_to" for past revisions, and "effective_to IS NULL" for the live one) would work correctly even when child element is updated yet parent one is not.

                              • 12. Re: Why numerical revisions, and not the date ranges?
                                adamw

                                Hello,

                                no :) An entity at revision X always relates only to entities at revision X :). The revisions generated on child collection insertions/deletions don't really have any significance, and you can turn them on or off in the configuration. They are only there so that when you ask "when this entity was modified", you'll also get the revisions in which the collection was expanded or shrunk.

                                When a B entity is persisted, lets say at revision X, only the id of the related entity A is persisted. It's not a relation in the hibernate meaning or the database meaning (no foreign key constraints). When you read it, at revision X, and you request the related A entity, Envers queries the database for A at revision X.

                                Maybe we are talking about different situations:
                                let's stay with the A and B example. B has a reference to A. Now, you change the referenced a (instance of class A) to a'. Then you ask the EntityManager for the B instance and call getA(). Would you expect to get a or a'?

                                Adam

                                • 13. Re: Why numerical revisions, and not the date ranges?

                                  If this B instance would have been saved after changing a to a', and if I'm asking for a 'live' instance, then I would expect to get a' back - and I believe this is how Envers does it, right? What I'm wondering is how it's done in the DB, what gets inserted/updated. The way I understand it, when B gets to point to a different instance of A, in DB this means that the A_ID gets changed. With versioning, I assume this means that, instead of updating A_ID, the versioned table gets a new row, with the same B_ID, but the new A_ID, and new revision number X, and the old row (with the old A_ID and, say, old revision number Y<X) remains intact. What I don't understand is whether this operation also inserts a new row into the A table - with the same ID and with the revision number X?

                                  • 14. Re: Why numerical revisions, and not the date ranges?
                                    adamw

                                    Hello,

                                    yes, Envers would return a' back. Though I think that you understood my desription wrong (because it was wrong) :) What I mean is that you update a to a' - you don't change anything in the B instance.

                                    Here's a short description on what is inserted to the versions tables:
                                    http://www.jboss.org/envers/tables.html

                                    As I wrote, whether a new row is inserted into the versions tables when a reference/collection changes is configurable, and has only impact on the results of checking when an entity was changed. It does not have an impact on the content of relationships.

                                    Your description is correct. Old versions always remain intact, and the current version is also always intact in the versions tables. The latest data is exactly the same, as if versioning was switched off.

                                    Adam