1. Re: Cassandra for Nodes and Binary Storage / Cassandra Subsystem
hchiorean Apr 25, 2016 3:21 AM (in response to frank.mehlhose)
Before going into the details of a potential ModeShape-Cassandra integration, a Cassandra WildFly subsystem would first have to be provided (or contributed to) by the WildFly team. The Infinispan and JGroups subsystems that you refer to are provided by the WildFly people because they require integration with internal server components. If such a subsystem were available in the server, we could look into using it.
ModeShape-Cassandra integration
Binary store
ModeShape has had a Cassandra implementation of a binary store for a long time, but it was more of an experiment, and until ModeShape 5 you could not even configure and use it. We added this ability in ModeShape 5 so that users who want to can test it. However, it may turn out that it does not work in practice (because of transactions, see below).
Persistent store and clustering
After ModeShape 3 and 4 and the Infinispan experience, we realized that the only way ModeShape can be clustered and actually work (in the sense of providing data consistency) is a very conservative one, where:
a) exclusive global cluster locking is used to prevent concurrent modifications of the same nodes
b) there is only one (conceptual) place where data is stored (i.e. a store shared by all the cluster nodes).
Any other model wouldn't work because in certain cases it would fall back to eventual consistency, which does not work with JCR. The bottom line is that you cannot have eventual consistency with JCR and still be fully JCR-compliant.
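To make the two conditions above concrete, here is a minimal single-JVM sketch. It is not ModeShape code: `ConservativeClusterSketch`, `update`, `read` and `runDemo` are names I made up for illustration, threads stand in for cluster members, and a `ReentrantLock` stands in for whatever cluster-wide lock a real deployment would use. It also uses one lock for everything rather than a lock per node, which is more conservative than strictly needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration only; none of these names are ModeShape APIs.
final class ConservativeClusterSketch {
    // (b) One conceptual store shared by every cluster member.
    private final Map<String, Integer> sharedStore = new ConcurrentHashMap<>();
    // (a) Stand-in for an exclusive cluster-wide lock (assumption: a real
    // cluster would need a distributed lock, not an in-process one).
    private final ReentrantLock globalLock = new ReentrantLock();

    // Read-modify-write of one node while holding the exclusive lock.
    int update(String nodeKey, IntUnaryOperator change) {
        globalLock.lock();
        try {
            int updated = change.applyAsInt(sharedStore.getOrDefault(nodeKey, 0));
            sharedStore.put(nodeKey, updated);
            return updated;
        } finally {
            globalLock.unlock();
        }
    }

    int read(String nodeKey) {
        return sharedStore.getOrDefault(nodeKey, 0);
    }

    // Each thread plays the role of one cluster member updating the same node.
    static int runDemo(int members, int updatesPerMember) {
        ConservativeClusterSketch cluster = new ConservativeClusterSketch();
        Thread[] threads = new Thread[members];
        for (int i = 0; i < members; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < updatesPerMember; j++) {
                    cluster.update("/content/counter", v -> v + 1);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cluster.read("/content/counter");
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000));
    }
}
```

Because every read-modify-write happens under the same lock and against the same store, no update is lost: four members doing 1000 increments each end up at 4000. Drop either condition (per-member copies of the store, or no exclusive lock) and you are back to members disagreeing about the current state.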
Equally important, neither Infinispan (at least until 8.x) nor ModeShape (not even in 5.0) supports partitions, which are a given in any cluster. So the HA acronym goes out the window.
The new persistent stores in ModeShape really have one outstanding requirement: they have to be transactional (in the ACID sense) and have to behave in a linearizable fashion (at least). Although I have not worked with Cassandra personally, after reading Jepsen: Cassandra and the Cassandra documentation (https://opencredo.com/new-features-in-cassandra-2-0-lightweight-transactions-on-insert/) it seems that lightweight (row-level) transactions are not enough. ModeShape requires multi-row ACID transactional support, and this support has to be available in the Java driver as well. Until 2.1, it seems that even lightweight transactions did not work at all with the Java driver, which makes me question the capabilities of the Java driver altogether.
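To illustrate why row-level atomicity is not enough, consider moving a child node from one parent to another, which touches two rows. The sketch below is a plain in-memory model, not Cassandra or ModeShape code: `MultiRowMoveSketch` and its methods are hypothetical names, each map entry models one row, and the `failMidway` flag simulates a client that dies between the two writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model; each map entry stands for one row
// (parent path -> names of its children).
final class MultiRowMoveSketch {
    private Map<String, Set<String>> rows = new HashMap<>();

    MultiRowMoveSketch() {
        rows.put("/a", new TreeSet<>(Set.of("child")));
        rows.put("/b", new TreeSet<>());
    }

    // Invariant we care about: a child is listed under exactly one parent.
    int parentCount(String child) {
        int count = 0;
        for (Set<String> children : rows.values()) {
            if (children.contains(child)) {
                count++;
            }
        }
        return count;
    }

    // Each write is atomic on its own row, but the pair is not atomic.
    boolean moveRowByRow(String from, String to, String child, boolean failMidway) {
        rows.get(from).remove(child);      // write 1 is immediately visible
        if (failMidway) {
            return false;                  // simulated crash; write 1 stays applied
        }
        rows.get(to).add(child);           // write 2
        return true;
    }

    // Multi-row transaction: both changes become visible together or not at all.
    boolean moveInTransaction(String from, String to, String child, boolean failMidway) {
        Map<String, Set<String>> working = new HashMap<>();
        rows.forEach((parent, children) -> working.put(parent, new TreeSet<>(children)));
        working.get(from).remove(child);
        if (failMidway) {
            return false;                  // rollback: the working copy is discarded
        }
        working.get(to).add(child);
        rows = working;                    // commit: publish both rows at once
        return true;
    }

    public static void main(String[] args) {
        MultiRowMoveSketch rowLevel = new MultiRowMoveSketch();
        rowLevel.moveRowByRow("/a", "/b", "child", true);
        System.out.println("row-level, failed midway: " + rowLevel.parentCount("child"));

        MultiRowMoveSketch transactional = new MultiRowMoveSketch();
        transactional.moveInTransaction("/a", "/b", "child", true);
        System.out.println("transactional, failed midway: " + transactional.parentCount("child"));
    }
}
```

After the simulated failure, the row-by-row version leaves the child under no parent at all, while the transactional version leaves it where it was. That second behaviour is what a ModeShape persistent store needs for every save that touches more than one row.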
The fixes after 2.1 seem to have resolved the Java driver problem, but it is still unclear to me whether the Java driver (or, for that matter, Cassandra itself) supports multi-row transactions. Also, if I understand correctly (see previous link), there is no write ordering in Cassandra, which IMO could cause data corruption in certain cases, since the order in which you perform JCR operations *is critical*.
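As a rough illustration of the ordering concern, here is a simplified model of timestamp-based "last write wins" conflict resolution, which is how Cassandra reconciles conflicting writes to the same cell as far as I understand it. This is not the Cassandra API: `LastWriteWinsSketch` is a made-up class, the timestamps are supplied by hand to simulate two clients with skewed clocks, and tie-breaking on equal timestamps is ignored.

```java
// Hypothetical model of one cell resolved by write timestamp, not arrival order.
final class LastWriteWinsSketch {
    private String value;
    private long timestamp = Long.MIN_VALUE;

    // A write only takes effect if its timestamp is newer than the stored one.
    void write(String newValue, long writeTimestamp) {
        if (writeTimestamp > timestamp) {
            value = newValue;
            timestamp = writeTimestamp;
        }
    }

    String read() {
        return value;
    }

    public static void main(String[] args) {
        LastWriteWinsSketch property = new LastWriteWinsSketch();
        // JCR operation 1, issued first by a client whose clock runs ahead.
        property.write("first", 200L);
        // JCR operation 2, issued second by a client whose clock runs behind.
        property.write("second", 100L);
        // The operation performed last is silently discarded.
        System.out.println(property.read());
    }
}
```

The second operation happened later in real time but carries the older timestamp, so the store keeps "first". For JCR, where a later operation is supposed to see and override the effect of an earlier one, this kind of silent reordering is exactly what could corrupt a repository.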
To conclude, based on my current understanding, I don't think Cassandra would be a good fit for storing ModeShape data. I'm more than happy to hear feedback from people who have worked with Cassandra in production, in case I missed anything.