Recording commit events into a distributed binary log for consumption by an eventually consistent external client
sannegrinovero Apr 10, 2015 1:07 PM
Hello all,
I could use some design advice related to feeding a binary log with the changes generated by a Hibernate/JPA application, typically deployed on WildFly (i.e. using Narayana).
We register a javax.transaction.Synchronization as soon as any write is detected via Hibernate event listeners, and collect the changes in the Synchronization instance. We then want to record some work to be done eventually in a binary log, by listening for successful after-completion events.
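To make the flow above concrete, here is a minimal sketch. All class names are hypothetical, and the Synchronization interface is re-declared locally so the snippet is self-contained; in the real application this would implement javax.transaction.Synchronization and be registered with the active transaction:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the collection flow: Hibernate event listeners feed changes into
// a Synchronization, which publishes them to the log only on successful commit.
class ChangeCollector {

    // Same shape as javax.transaction.Synchronization, declared locally
    // so the example compiles without the JTA API on the classpath.
    interface Synchronization {
        void beforeCompletion();
        void afterCompletion(int status);
    }

    // Value of javax.transaction.Status.STATUS_COMMITTED
    static final int STATUS_COMMITTED = 3;

    static class ChangeLogSynchronization implements Synchronization {
        private final List<String> collectedChanges = new ArrayList<>();
        private final List<String> binaryLog; // stands in for the real distributed log

        ChangeLogSynchronization(List<String> binaryLog) {
            this.binaryLog = binaryLog;
        }

        // Invoked from the Hibernate event listeners as writes are detected
        void recordChange(String change) {
            collectedChanges.add(change);
        }

        @Override
        public void beforeCompletion() {
            // nothing to do in this sketch
        }

        @Override
        public void afterCompletion(int status) {
            // Publish only on successful commit; a rollback discards the changes
            if (status == STATUS_COMMITTED) {
                binaryLog.addAll(collectedChanges);
            }
        }
    }
}
```
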
The problem is that this binary log needs to be reliable and distributed, and to respect the ordering of the logged events. I haven't decided which log implementation to use - I'm currently evaluating Apache Kafka and RAFT implementations such as JGroups-Raft - but my concern is with the definition of "ordering".
What we need is that the consumer which will eventually process this log should be able to apply operations relating to the same "database entity" (the same record) in the same order in which the transactions were applied on the database; in theory I should, for example, be able to replicate the database state. It's currently not my intention to provide a database replication feature, but that's possibly an interesting by-product, as it would pave the road for a Java EE friendly alternative to facilities such as XStreams and GoldenGate (both Oracle proprietary).
If I were to use a replicated state machine, it would accept the log entries in the order in which they are enqueued - which doesn't necessarily match the order in which the transactions committed (am I right?). That's especially true since I'm triggering the state transitions from a post-commit event, which opens up a race between the locks being released on the database and the transitions being propagated across multiple nodes in a cluster.
Reading the documentation of Apache Kafka, this solution seems more promising than a replicated state machine in terms of performance, but it seems to guarantee ordering only from each event producer independently, so I'm afraid that's not good enough for clustered applications deployed on multiple AS instances. And I'd have the same race conditions described in the previous paragraph, so I need additional hints about this "order" concept.
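If Kafka's guarantee is really per-partition ordering (which is my reading of the docs), then keying every record by entity id would at least give per-entity ordering regardless of which node produced it, since all records with the same key land in the same partition. A simplified stand-in for keyed partitioning - the hash below is illustrative only, not Kafka's actual murmur2-based default partitioner:

```java
// Every record sharing a key maps to the same partition, so per-partition
// ordering becomes per-entity ordering even with multiple producer nodes.
// (This doesn't solve the cross-node commit-order race, only per-entity routing.)
class EntityPartitioner {

    static int partitionFor(String entityKey, int numPartitions) {
        // Mask the sign bit instead of Math.abs (which overflows on Integer.MIN_VALUE)
        return (entityKey.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```
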
Assuming one could patch/configure the jgroups-raft implementation or the Kafka implementation to use a different definition of "order" - or, worst case, implement my own binary log - would Narayana be able to help with such a problem?
For example, I seem to remember that in certain configurations the Transaction ID can be used to identify some form of "global order", even across multiple nodes in the cluster - however it would not be correct to base ordering on an identifier which is generated at the beginning of a transaction, so I'm wondering if there is a similar concept which could be used to track the ordering of commit operations. I'm aware I could probably get quite far by using timestamps, but I'd prefer a more reliable solution.
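To make the begin-vs-commit distinction concrete: an identifier handed out at begin time disagrees with commit order whenever a transaction that started first finishes last. A toy single-JVM sketch (all names hypothetical) - a cluster-wide equivalent of the commit-time counter is exactly the piece I'm missing:

```java
import java.util.concurrent.atomic.AtomicLong;

// txA begins before txB but commits after it, so ordering by a begin-time id
// disagrees with the commit order. A commit-time counter captures the order
// we want, but an AtomicLong only works inside one JVM - the open question
// is what the distributed equivalent would be.
class CommitOrdering {

    static final AtomicLong beginSequence = new AtomicLong();
    static final AtomicLong commitSequence = new AtomicLong();

    static class Tx {
        final long beginId = beginSequence.incrementAndGet();
        long commitId;

        void commit() {
            commitId = commitSequence.incrementAndGet();
        }
    }
}
```
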
Or should I rather look into Hibernate ORM's Flush and Lock events? Are relational databases going to generate some meaningful metadata? (I'm willing to patch Hibernate ORM to get these, if they exist)
Another thing I'm wondering is whether the transaction log from Narayana could be used (abused) to store such a binary application log. It would need to be able to store an additional payload, and ultimately make it possible for a remote consumer to read the entries generated by all nodes of the application in global order.
Thanks in advance for any pointers or suggestions!
Sanne