4 Replies Latest reply on Feb 11, 2016 1:37 AM by Horia Chiorean

    ModeShape 5 and beyond

    Randall Hauch Master

      ModeShape has come a long way in the past 7+ years, and it's amazing to think of the capabilities and features the whole community has put into the software -- thank you once again! More recently, we've made a lot of great progress by steadily fixing issues, adding a few new features, and improving performance. As a result of all that effort, ModeShape 4.4 is more stable and functional than ever before. ModeShape 4.5 is almost ready and will introduce a Lucene index provider and the ability to incrementally update indexes and keep them synchronized in a cluster.


      Although ModeShape is stable and well behaved in normal operations, we want to focus our future roadmap on making it far easier to manage and recover safely from unexpected problems. Our current architecture limits what we can do, so the 5.x effort will be addressing these through fairly significant changes that we feel are necessary to take ModeShape to the next level.

       

      Background

       

      The JCR API provides a very useful model of working with content. Applications can use JCR sessions to read and navigate content, and additions and changes to multiple nodes can be accumulated in the session and then saved all at once. JCR requires and ModeShape provides strong consistency by ensuring that changes made by the sessions are atomic and that changes appear to be made in a linearized manner. This significantly simplifies application logic, but linearizability is expensive in a distributed system.

       

      ModeShape has relied upon Infinispan to provide a scalable and strongly consistent distributed system foundation. And with Infinispan, ModeShape 4 can be carefully clustered to handle large numbers of nodes. But unfortunately this capability comes with additional costs and reduced performance due to the extra coordination required by a distributed system providing consistent and transactional behaviors. Any highly distributed but strongly consistent system will incur these costs.

       

      Infinispan can work astoundingly well as a very large cache. But because of its architecture, relying upon Infinispan as a strongly consistent primary data store leads to a number of issues. Infinispan’s underlying cache stores do not participate in transactions (see MODE-2420 and this thread), and while it is technically possible to fully recover from a crash by replaying the transaction log, doing this is not easy or quick. Yet easy and quick is exactly what you need when you're trying to bring a system back up after a crash.

       

      It’s also still too difficult and complicated to properly configure Infinispan (see MODE-2427 and this thread). For a long time evicting data from memory could cause consistency problems, although this has been recently fixed in Infinispan 7.2. Still, if the network were to partition (and this does happen), Infinispan is still susceptible to split brain, meaning it's possible for different partitions to independently change data and making it likely the data becomes inconsistent and difficult to fully recover. And while we'd like to be able to cluster Infinispan with multiple cache stores for performance reasons, we still cannot do so because a failure on one of the processes is fairly likely to trigger a "partial rollback" that makes the clustered data inconsistent.

       

      To be fair, ModeShape uses Infinispan as primary data storage, whereas the Infinispan community has in the past few years moved focused on being an effective distributed cache for data that is stored elsewhere. The latter use cases are far more tolerant of network partitions and consistency problems, because you always have the option of purging some or all of your cache and rebuilding it from the external persistent store. ModeShape doesn’t have that option when it uses Infinispan as the primary data store.

       

      The bottom line is that we've not lived up to our goals for scaling out with large clusters. In many respects, those scalability and distributed goals were simply not compatible with the reality of strong consistency and stateful sessions required by JCR.

       

      Now, I'm certainly not saying that JCR is useless -- far from it! JCR and ModeShape are still very useful and have a great list of features: a hierarchical data structure with flexible schema constraints, strong and weak references, sessions that allow easily navigating and editing portions of large amounts of data with minimal impact on memory, strong consistency and participation in user transactions, binary storage with optional text extraction, powerful query and search, events and observation, an event journal, access control, versioning, same-name siblings, orderable children, locking, and workspace management. ModeShape adds non-standard but very useful features: Wildfly integration, an enhanced query language, federation, content-based automatic sequencing, manual sequencing, large unordered collections, extensible authentication and authorization, REST API, repository explorer web application, and more.

       

      ModeShape 5

       

      As we begin focusing on ModeShape 5, our goals are to continue to support all of these JCR and ModeShape features while making ModeShape more durable, resilient, reliable, and tolerant of network problems and other failures. There is no silver bullet, so we can only accomplish this by changing our direction and focusing on what JCR and ModeShape can do well. It also means we have to acknowledge and move away from what ModeShape does not do well.

       

      To achieve these goals, we believe ModeShape needs to change in several important ways. First, ModeShape needs to focus on scaling up rather than out. A single larger process will be far faster than a distributed cluster, simply because all coordination can be done within the single process and does not have to involve network communication. A single process is also more easily able to guarantee consistency even when failures happen. After that is solid, we would then work toward supporting small clusters for high availability. For example, we might find that a simple cluster topology with a single primary and warm secondary provides the high-availability qualities needed by most applications and sessions.

       

      The second important change is to move to persistent storage using a durable, resilient, consistent, and transactional database that is easy to manage and operate, that doesn't lose data, and is easier to recover from network problems and crashes. It’s likely that different use cases will have different needs, so we’re looking at several possible technologies. For example, an embedded repository might prefer a local embeddable database like RocksDB, which recently added support for transactions. Users of larger repositories might prefer a separate relational database; after all, many ModeShape 3 and 4 users already use a relational database via the Infinispan JDBC cache store, and many operations teams are already comfortable with managing, scaling, backing up, and clustering relational databases. PostgreSQL is an extremely solid open source relational database with some incredible JSON support (including queries), so we’re even considering how we can automatically take advantage of those new features when using a PostgreSQL database. Other technologies like CockroachDB are exciting, but too early for us to consider at this time.

       

      (Other database systems are interesting but don’t satisfy our requirements and are therefore off the table. MongoDB does not support multi-document transactions, and may lose data or may cause dirty and stale reads. Cassandra uses tunable eventually consistency and does not support fully ACID transactions. ElasticSearch is a popular clustered search platform, but it doesn't support transactions and can lose data. Aerospike is a fast distributed key-value store that partitions and replicates schemaless data with support for large volumes of data. They claim ACID updates, although it's been shown to lose data during network partitions.)

       

      Finally, a third change would be to always persist the events in a durable log or within the underlying transactional data store, so that ModeShape can guarantee at-least-once processing of all events even following a crash in one or more ModeShape processes. We'd have to evaluate whether it is more desirable and performant to store the events within the database during the same transaction (which might lengthen each transaction), or to use a separate write-ahead log mechanism.

       

      With such a large change in storage, migration of existing 4.x installations will be a priority. We will reuse the existing backup and restore capability already in 4.x, making it easy for users to migrate their data to the new system. Configurations will of course be very different, but we’re hoping the 5.x configurations will be dramatically simpler. We will also continue our Wildfly subsystem, and will look at ways of making deployment with Wildfly even easier. Otherwise, we expect that all ModeShape functionality and APIs remain unchanged.

       

      Next steps

       

      As we mentioned above, we’ll release 4.5 very soon, will continue to fix issues on the 4.x branch, and plan to cut a 4.6 release early next year. However, after we release 4.5 we do plan to focus nearly all of our effort on the 5.x changes, and will provide early-access releases (alphas?, betas, candidate releases) as we make progress. As always, we welcome and encourage anyone that wants to contribute!