1 Reply Latest reply on Apr 25, 2016 3:21 AM by hchiorean

    Cassandra for Nodes and Binary Storage / Cassandra Subsystem

    frank.mehlhose

      Earlier versions of the ModeShape WildFly subsystem used Infinispan to cluster both the content repository and the binary store across multiple nodes.

      With ModeShape 5, HA clustering will require:

      1. A shared transactional database for storing the content repository.

      2. A shared binary store.

      According to the documentation, Cassandra is a valid binary store:

      https://docs.jboss.org/author/display/MODE50/Built-in+binary+stores

      Cassandra is also a highly available transactional database.

      Does it make sense to use a Cassandra cluster both for ModeShape content repository persistence and for binary storage?

      Earlier ModeShape versions used the Infinispan subsystem that shipped with WildFly.

      That was relatively easy to configure, even in a domain cluster.

      It became harder to configure when you included Infinispan within the ModeShape WildFly subsystem and users needed to supply additional files to configure the Infinispan cache and JGroups.

      Would a WildFly Cassandra subsystem make sense together with the ModeShape WildFly subsystem?

      From what I see, there are several attempts to implement such a subsystem:

      https://developer.jboss.org/thread/248277?start=0&tstart=0

      https://developer.jboss.org/wiki/ApacheCassandraAndWildFlyPlayingTogether

      https://github.com/hawkular/wildfly-cassandra

      https://github.com/wildfly-extras/wildfly-cassandra

      https://github.com/heiko-braun/wildfly-cassandra

      A Cassandra subsystem would allow setting up ModeShape in a WildFly domain entirely via the JBoss CLI, without additional configuration files.

      Cassandra might be a good replacement for what you tried to achieve with Infinispan in the first place.

      Does anyone have experience using an HA-clustered Cassandra installation with ModeShape for both the JCR content and the binary store?

      with kind regards

      Frank

        • 1. Re: Cassandra for Nodes and Binary Storage / Cassandra Subsystem
          hchiorean

          Before going into the details of a potential ModeShape-Cassandra integration: a Cassandra WildFly subsystem would first have to be provided (or contributed to) by the WildFly team. The Infinispan and JGroups subsystems you refer to are provided by the WildFly developers because they require integration with internal server components. If such a subsystem were available in the server, we could look into using it.

           

          ModeShape-Cassandra integration

           

          Binary store


          ModeShape has had a Cassandra binary store implementation for a long time, but it was more of an experiment, and until ModeShape 5 you could not even configure and use it. We've added this ability in ModeShape 5 so that users who want to can test it. However, it remains to be seen whether it will work in practice (because of transactions, see below).

           

          Persistent store and clustering

           

          After ModeShape 3 and 4 and the Infinispan experience, we realized that the only way ModeShape can be clustered and actually work (in the sense of providing data consistency) is a very conservative one, where

          a) exclusive global cluster locking is used to prevent concurrent modifications of the same nodes

          b) there is only one (conceptual) place where data is stored (i.e. a shared store by all the cluster nodes).
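
          To make the conservative model concrete, here is a minimal Python sketch of (a) and (b) together. It is a single-process simulation under stated assumptions, not ModeShape code: threads stand in for cluster members, `threading.Lock` stands in for the exclusive cluster-wide lock, and `SharedStore`, `ClusterMember` and `add_child` are hypothetical names.

```python
import threading


class SharedStore:
    """Hypothetical stand-in for the one shared persistent store, keyed by node path."""

    def __init__(self):
        self.children = {}  # parent path -> list of child names


class ClusterMember:
    """Hypothetical cluster member: every write is serialized by one global lock."""

    def __init__(self, store, global_lock):
        self.store = store
        self.global_lock = global_lock

    def add_child(self, parent, child):
        # (a) exclusive global lock: no other member can modify nodes meanwhile
        with self.global_lock:
            # (b) read-modify-write against the single shared copy of the data
            current = list(self.store.children.get(parent, []))
            current.append(child)
            self.store.children[parent] = current


def worker(member, member_id, count):
    for k in range(count):
        member.add_child("/parent", f"child-{member_id}-{k}")


def run_cluster(members=4, writes_per_member=250):
    store = SharedStore()
    lock = threading.Lock()  # stands in for an exclusive cluster-wide lock
    threads = [
        threading.Thread(target=worker, args=(ClusterMember(store, lock), i, writes_per_member))
        for i in range(members)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store


store = run_cluster()
print(len(store.children["/parent"]))  # 1000: no concurrent update was lost
```

          In this sketch every write goes through one lock and one store, which is what makes the model safe and also what makes it conservative.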

          Any other model wouldn't work, because in certain cases it would fall back to eventual consistency, which does not work with JCR. The bottom line is that you cannot have eventual consistency with JCR and still be fully JCR-compliant.
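
          A minimal Python sketch of why eventual consistency breaks JCR semantics. It assumes two hypothetical replicas that accept writes locally and later reconcile each key with last-write-wins by timestamp. `Replica` and `anti_entropy` are illustrative names, not part of ModeShape or any database API. Two sessions each add a child to the same parent through different replicas: one session reads stale data, and after reconciliation one of the additions is silently lost.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical eventually consistent replica: key -> (timestamp, value)."""

    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Reconcile two replicas, keeping the entry with the highest timestamp per key."""
    for key in set(a.data) | set(b.data):
        entries = [e for e in (a.data.get(key), b.data.get(key)) if e is not None]
        winner = max(entries, key=lambda e: e[0])
        a.data[key] = winner
        b.data[key] = winner


a, b = Replica(), Replica()
for replica in (a, b):
    replica.write("/parent", ["x"], ts=1)  # both replicas agree initially

# Two sessions concurrently add a child to /parent, each through a different replica.
a.write("/parent", a.read("/parent") + ["y"], ts=2)
b.write("/parent", b.read("/parent") + ["z"], ts=3)

stale = b.read("/parent")  # a session on replica b cannot see "y" yet
anti_entropy(a, b)
merged = a.read("/parent")
print(stale, merged)  # ['x', 'z'] ['x', 'z']: the addition of "y" is gone
```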

           

          Equally important, neither Infinispan (at least until 8.x) nor ModeShape (not even 5.0) supports network partitions, which are a given in any cluster. So the HA acronym goes out the window.

           

          The new persistent stores in ModeShape really have one outstanding requirement: they have to be transactional (in the ACID sense) and also behave in a linearizable fashion (at least). Although I have not worked with Cassandra personally, after reading "Jepsen: Cassandra" and the Cassandra documentation (https://opencredo.com/new-features-in-cassandra-2-0-lightweight-transactions-on-insert/), it seems that lightweight (row-level) transactions are not enough. ModeShape requires multi-row ACID transactional support, and this support has to be available in the Java driver as well. It seems that until 2.1 even lightweight transactions did not work at all with the Java driver, which makes me question the capabilities of the Java driver altogether.
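
          The difference between row-level and multi-row transactions can be sketched in Python. The store below is hypothetical and only models the semantics, not Cassandra's or any driver's API. `cas` is a conditional update that is atomic for one row only (the shape of a row-level lightweight transaction). `transact` checks every row before applying all writes or none. The example assumes that a JCR move of node `n` from `/a` to `/b` touches three rows, and that a concurrent session changes the row for `n` after our session has read it.

```python
class RowStore:
    """Hypothetical versioned row store; models semantics only, not a real database API."""

    def __init__(self, rows):
        self.rows = {key: (0, value) for key, value in rows.items()}  # key -> (version, value)

    def cas(self, key, expected_version, new_value):
        """Row-level conditional update: atomic for ONE row only."""
        version, _ = self.rows[key]
        if version != expected_version:
            return False
        self.rows[key] = (version + 1, new_value)
        return True

    def transact(self, writes):
        """Multi-row transaction: validate every row first, then apply all writes or none."""
        if any(self.rows[key][0] != expected for key, expected, _ in writes):
            return False
        for key, _, value in writes:
            self.rows[key] = (self.rows[key][0] + 1, value)
        return True

    def value(self, key):
        return self.rows[key][1]


def initial_store():
    return RowStore({"/a:children": ["n"], "/b:children": [], "n:parent": "/a"})


def planned_move(store):
    """Rows a move of n from /a to /b must change, with the versions our session read."""
    return [
        ("/a:children", store.rows["/a:children"][0], []),
        ("/b:children", store.rows["/b:children"][0], ["n"]),
        ("n:parent", store.rows["n:parent"][0], "/b"),
    ]


# Row-level only: each row is committed independently.
s1 = initial_store()
move = planned_move(s1)
s1.cas("n:parent", 0, "/a")  # concurrent session touches n's row after we read it
results = [s1.cas(key, expected, value) for key, expected, value in move]
print(results, s1.value("n:parent"), s1.value("/b:children"))
# [True, True, False] /a ['n']: n is listed under /b but still points at /a

# Multi-row transaction: the conflict aborts the whole move.
s2 = initial_store()
move = planned_move(s2)
s2.cas("n:parent", 0, "/a")
moved = s2.transact(move)
print(moved, s2.value("/a:children"))  # False ['n']: nothing was applied
```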

           

          The fixes after 2.1 seem to have solved this problem, but it is still unclear to me whether the Java driver (or, for that matter, Cassandra itself) supports multi-row transactions. Also, if I understand correctly (see the previous link), there is no write ordering in Cassandra, which IMO could cause data corruption in certain cases, since the order in which you perform JCR operations *is critical*.
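
          The write-ordering concern can be illustrated with a minimal Python sketch. It assumes that conflicting writes to the same key are resolved by last-write-wins on a timestamp taken from the writer's clock, and that one cluster member's clock runs behind another's. `resolve_lww`, `resolve_in_order`, the key and the timestamps are all hypothetical. In the example, a node is created and then removed, yet it ends up present, because the removal carries the smaller timestamp.

```python
def resolve_lww(ops):
    """Keep, per key, the value with the highest timestamp, regardless of real-time order."""
    state = {}
    for key, value, ts in ops:
        if key not in state or ts > state[key][1]:
            state[key] = (value, ts)
    return {key: value for key, (value, _) in state.items()}


def resolve_in_order(ops):
    """What the JCR operations intend: later operations win."""
    state = {}
    for key, value, _ in ops:
        state[key] = value
    return state


# Operations listed in the real-time order in which the sessions performed them.
# Hypothetical skew: member B's clock runs behind member A's.
ops = [
    ("/doc", "exists", 100),  # session on member A creates /doc
    ("/doc", "removed", 95),  # session on member B removes /doc afterwards
]

print(resolve_in_order(ops))  # {'/doc': 'removed'}
print(resolve_lww(ops))       # {'/doc': 'exists'}: the later removal is lost
```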

           

          To conclude, based on my current understanding, I don't think Cassandra would be a good fit for storing data. I'm more than happy to hear feedback from people who have worked with Cassandra in production, in case I missed anything.