4 Replies Latest reply on Feb 16, 2012 8:37 AM by tirthalp

    Questions on Modeshape evaluation against Jackrabbit’s known problems

    tirthalp

      Hi,

      I am currently working on a product that has been running in production for more than 5 years and has a DMS module that uses JSR-170 and JSR-283 features. Apache Jackrabbit 2.2.7 is bundled within the application’s EAR package and deployed on JBoss 4.0.1, so each repository instance is started and stopped with the containing application (that is, the application not only connects to the repository but is also in charge of starting and stopping it). The product’s DMS module suffers a performance penalty because of the way Jackrabbit is integrated and deployed, and in my case this cannot be solved. In short, Jackrabbit is easy to use and configure for clustering, but it has become a bottleneck in meeting the product’s growing business needs.

      Overall, ModeShape looks like a very promising JCR implementation. I am evaluating it for two scenarios: (1) to provide JCR features when launching the product for any new customer, and (2) to replace Jackrabbit for existing customers (so I mainly need to think about a strategy for migrating existing content from Jackrabbit to ModeShape). Answers to the specific questions below would greatly help my decision making.

      (1) I understand that if we want to use all JCR (1.0 and 2.0) features, ModeShape must run in the same JVM instance as the application, because the WebDAV/REST option offers only limited JCR features. Upgrading JBoss 4.0.1 is not possible for me. So please let me know whether there are any limitations when deploying ModeShape on JBoss 4.0.1 by following the steps at http://docs.jboss.org/modeshape/latest/manuals/reference/html/configuration.html#deloying_modeshape_to_jbossas.

      (2) Just to know: is it possible to run ModeShape in standalone mode on a JVM (without deploying on JBoss/Tomcat), even if I don’t require it?

       

      (3) I only need to ensure the scalability of the product; I am not considering high availability. So please confirm that enabling ModeShape clustering (http://docs.jboss.org/modeshape/latest/manuals/reference/html/configuration.html#clustering_configuration) does not require JBoss clustering (http://docs.jboss.org/jbossas/jboss4guide/r4/html/cluster.chapt.html). For example, suppose 3 separate JBoss instances are running, with the product and ModeShape deployed on each. We don’t want to cluster JBoss, but we will need to enable ModeShape clustering so that content stays synchronized across the products on all 3 instances.

      (4) With Jackrabbit, performance goes down if a node has too many child nodes (http://wiki.apache.org/jackrabbit/Performance). Does ModeShape have any inherent limitation like this, or any other limitation related to how the content repository model is structured, that affects consistent performance?

      (5) Jackrabbit says that if you need to write to the same node concurrently, you need to use multiple sessions and JCR locking to ensure there is no conflict (http://wiki.apache.org/jackrabbit/QuestionsAndAnswers#Concurrency). This holds only when a single instance of Jackrabbit is running. With Jackrabbit clustering, horizontal scaling works well for reads only (practically zero overhead on read access) and not so well for heavy concurrent writes, because of an exclusive lock over the whole cluster (writes are synchronized over the entire cluster). I need to support both heavy concurrent reads and heavy concurrent writes in a cluster, but Jackrabbit serializes write operations (for example, while a 50 MB PDF file is being written, all other operations wait; I confirmed this behavior by taking Java thread dumps). Where can I find details on how ModeShape behaves under heavy concurrent reads and writes when ModeShape clustering is enabled?

       

      (6) Is there any option for migrating repository content from Jackrabbit to ModeShape, if we consider switching live systems from Jackrabbit to ModeShape? Or can you suggest how expensive it would be to develop a migration process/script?

       

      Thank you.

      Tirthal Patel.

        • 1. Re: Questions on Modeshape evaluation against Jackrabbit’s known problems
          rhauch

          My response applies to ModeShape 2.7, except where noted:

          (1) I understand that if we want to use all JCR (1.0 and 2.0) features, ModeShape must run in the same JVM instance as the application, because the WebDAV/REST option offers only limited JCR features. Upgrading JBoss 4.0.1 is not possible for me. So please let me know whether there are any limitations when deploying ModeShape on JBoss 4.0.1 by following the steps at http://docs.jboss.org/modeshape/latest/manuals/reference/html/configuration.html#deloying_modeshape_to_jbossas.

          There are two ways to deploy ModeShape to JBoss AS. The first is to install it as a service within the JBoss AS 5.x or 6.x application server - in this case, it's a resource that your application(s) can simply look up and use. The second is to install it in the same way as on JBoss AS 4.x, Tomcat, GlassFish, etc. - in this case, it's packaged with, started, managed, and owned by your application. The latter is likely what you'll need to do if you want to stay on JBoss AS 4.x.

          (2) Just to know: is it possible to run ModeShape in standalone mode on a JVM (without deploying on JBoss/Tomcat), even if I don’t require it?

          Yes, you can easily embed ModeShape into any standalone J2SE application. You can even cluster multiple J2SE applications together.

           

          (3) I only need to ensure the scalability of the product; I am not considering high availability. So please confirm that enabling ModeShape clustering (http://docs.jboss.org/modeshape/latest/manuals/reference/html/configuration.html#clustering_configuration) does not require JBoss clustering (http://docs.jboss.org/jbossas/jboss4guide/r4/html/cluster.chapt.html). For example, suppose 3 separate JBoss instances are running, with the product and ModeShape deployed on each. We don’t want to cluster JBoss, but we will need to enable ModeShape clustering so that content stays synchronized across the products on all 3 instances.

           

          ModeShape and JBoss AS both use JGroups for their clustering technology. However, this does not mean that ModeShape needs JBoss AS for clustering, since ModeShape can have its own JGroups configuration. And if you embed ModeShape within your web application (deployed on JBoss AS 4), ModeShape will definitely need its own JGroups configuration that is completely separate from what's used (if anything) in JBoss AS 4.

          (4) With Jackrabbit, performance goes down if a node has too many child nodes (http://wiki.apache.org/jackrabbit/Performance). Does ModeShape have any inherent limitation like this, or any other limitation related to how the content repository model is structured, that affects consistent performance?

           

          ModeShape does exhibit some performance degradation with large numbers of child nodes, but there are lots of factors that come into play here, making it difficult to predict how or when you might see these effects. I strongly recommend trying it out to see whether you run into it. Be sure to look at How To Select The Right Connectors and How To Tune ModeShape for Better Performance.

          (5) Jackrabbit says that if you need to write to the same node concurrently, you need to use multiple sessions and JCR locking to ensure there is no conflict (http://wiki.apache.org/jackrabbit/QuestionsAndAnswers#Concurrency). This holds only when a single instance of Jackrabbit is running. With Jackrabbit clustering, horizontal scaling works well for reads only (practically zero overhead on read access) and not so well for heavy concurrent writes, because of an exclusive lock over the whole cluster (writes are synchronized over the entire cluster). I need to support both heavy concurrent reads and heavy concurrent writes in a cluster, but Jackrabbit serializes write operations (for example, while a 50 MB PDF file is being written, all other operations wait; I confirmed this behavior by taking Java thread dumps). Where can I find details on how ModeShape behaves under heavy concurrent reads and writes when ModeShape clustering is enabled?

          ModeShape has a number of backend connectors used to interact with the persistence mechanism, and it's at this low level that all write locking is done. However, each connector is different and will thus exhibit different levels of blocking. For example, the transient in-memory connector does have a global write lock, but it's not practical for most applications. Most of the other connectors, however, do not use a global write lock. The JPA connector relies upon the regular DBMS functionality for locking, so this will depend on your choice of DBMS and how you've configured it. (Remember that some databases and/or configurations do row-level locking, while others do global locking.)
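
          To make the difference between a global write lock and finer-grained locking concrete, here is a minimal Java sketch. It is an illustration only and is not how any ModeShape connector or DBMS is actually implemented: the class `WriteLockGranularity` and its methods are hypothetical names, and the "key" stands in for whatever unit a backend locks (a node path, a table row, and so on).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/** Illustration only: hypothetical names, not ModeShape, connector, or DBMS code. */
final class WriteLockGranularity {
    // Strategy 1: a single lock for the whole store; every writer queues behind it.
    private final ReentrantLock globalLock = new ReentrantLock();

    // Strategy 2: one lock per key; writers to different keys do not block each other.
    private final ConcurrentMap<String, ReentrantLock> perKeyLocks = new ConcurrentHashMap<>();

    ReentrantLock globalLock() {
        return globalLock;
    }

    ReentrantLock lockFor(String key) {
        // The same key always maps to the same lock instance.
        return perKeyLocks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    public static void main(String[] args) {
        WriteLockGranularity store = new WriteLockGranularity();

        // While this thread holds the global lock, another thread cannot write anything.
        store.globalLock().lock();
        boolean otherWriterGotGlobal =
                CompletableFuture.supplyAsync(() -> store.globalLock().tryLock()).join();
        store.globalLock().unlock();

        // While this thread holds the lock for "/a", another thread can still lock "/b".
        store.lockFor("/a").lock();
        boolean otherWriterGotB =
                CompletableFuture.supplyAsync(() -> store.lockFor("/b").tryLock()).join();
        store.lockFor("/a").unlock();

        System.out.println("second writer proceeds under global lock: " + otherWriterGotGlobal);
        System.out.println("second writer proceeds under per-key locks: " + otherWriterGotB);
    }
}
```

          With the global lock, a long write (such as the 50 MB PDF mentioned in the question) holds up every other writer; with per-key locks, only writers touching the same key wait.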

           

          You may want to use JCR locks to ensure your application will exhibit the kinds of concurrency the application needs, but ModeShape does not require this for a clustered application with concurrent writes.

           

          Our goal when building ModeShape was to provide a fast, clusterable JCR implementation that is efficient for concurrent reads and writes, and that goal has strongly informed all of our decisions.

           

          (6) Is there any option for migrating repository content from Jackrabbit to ModeShape, if we consider switching live systems from Jackrabbit to ModeShape? Or can you suggest how expensive it would be to develop a migration process/script?

           

          Unfortunately, we don't yet have a migration tool that helps migrate content from Jackrabbit to ModeShape. System-view JCR import should work, as long as the "/jcr:system" content is not included.

           

          Now, you may have read that we're in the process of working on ModeShape 3.0, which will be a pretty substantial architectural change for us. (See Next Generation ModeShape for a starting point.) Our goal for ModeShape 3.0 is to be significantly faster, far more scalable, and highly clusterable. So depending upon your timeframe, stay tuned and give ModeShape 3 a try.

          • 2. Re: Questions on Modeshape evaluation against Jackrabbit’s known problems
            tirthalp

            Thanks, Randall, for your valuable reply. I have some follow-up questions on the answers above...

             

            (1) Understood: installing ModeShape *as a service* is not possible on JBoss AS 4.x, so it must be packaged within the J2SE/Java EE application, which manages start/stop. Can you please point me to the steps/documentation for such a setup, if any are available?

             

            (2) Sorry, my question was different. I was asking about running ModeShape in a standalone JVM, with a J2SE/J2EE application running in a different JVM accessing it via WebDAV/REST (see the Model 3 diagram at http://jackrabbit.apache.org/deployment-models.html). Is such a deployment model possible with ModeShape?

             

            (3) No further questions on clustering setup.

             

            (4) Thanks for providing the links. I'll certainly consider ModeShape performance tuning when I try it in practice.

             

            Performance degradation with large numbers of child nodes - I thought it was a Jackrabbit-specific implementation issue, because I didn't find it in the JCR specifications.

             

            I read about Flat hierarchies in Modeshape 3.0 - Other minor features: "Flat hierarchies - JCR repositories are often designed to be well-formed hierarchies that are not too flat, and use same-name siblings with caution. These constraints are often due to implementation details and are not an inherent limitation of the JCR API. As proof, ModeShape now has excellent support for efficiently handling hundreds of thousands of children under a single node, even when many of those nodes share similar names.".

             

            Does this mean that ModeShape 3.0 plans to support consistent performance with many child nodes under a single node?

             

            (5) I am very happy to read that "ModeShape was to provide a fast, clusterable JCR implementation that is efficient for concurrent reads and writes"; this encourages me to evaluate different use cases with ModeShape in practice.

             

            (6) At some point, I'll try a plain-vanilla Jackrabbit-to-ModeShape content migration and share the results.

             

            Hm... I am already in the process of understanding it in depth and may consider ModeShape 3 to support JCR features in future projects.

             

            Thank you again.

             

            • 3. Re: Questions on Modeshape evaluation against Jackrabbit’s known problems
              rhauch

              (1) Understood: installing ModeShape *as a service* is not possible on JBoss AS 4.x, so it must be packaged within the J2SE/Java EE application, which manages start/stop. Can you please point me to the steps/documentation for such a setup, if any are available?

               

              See this page in our documentation.

              (2) Sorry, my question was different. I was asking about running ModeShape in a standalone JVM, with a J2SE/J2EE application running in a different JVM accessing it via WebDAV/REST (see the Model 3 diagram at http://jackrabbit.apache.org/deployment-models.html). Is such a deployment model possible with ModeShape?

              The WebDAV/REST components require a servlet container, so technically a J2SE application wouldn't work. But if your standalone application is a valid J2EE environment, then maybe it'd work.

               

               

              Performance degradation with large numbers of child nodes - I thought it was a Jackrabbit-specific implementation issue, because I didn't find it in the JCR specifications.

               

              I read about Flat hierarchies in Modeshape 3.0 - Other minor features: "Flat hierarchies - JCR repositories are often designed to be well-formed hierarchies that are not too flat, and use same-name siblings with caution. These constraints are often due to implementation details and are not an inherent limitation of the JCR API. As proof, ModeShape now has excellent support for efficiently handling hundreds of thousands of children under a single node, even when many of those nodes share similar names.".

               

              Does this mean that ModeShape 3.0 plans to support consistent performance with many child nodes under a single node?

               

              The fact that JCR repositories are naturally hierarchical, that nodes have references to their children, and that child nodes need to be accessed randomly all make it difficult for any implementation to scale to large numbers of child nodes. ModeShape 2.x does, I think, as well as Jackrabbit, but some developers have high expectations for what "large numbers of child nodes" means. With ModeShape 3, we think we've found a way that will scale to much larger numbers (hundreds of thousands of children) with minimal (perhaps unnoticeable) performance degradation, and, with some internal improvements later on, eventually many more than that.

               

              But having said that, a good JCR design will try to use the hierarchical nature of the repository to its advantage. Even file systems can't have millions of files in a single directory.
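
              One common way to use the hierarchy, sketched here in Java, is to spread nodes across intermediate "bucket" nodes derived from a hash of the document ID instead of placing them all under one parent. This is a generic illustration under stated assumptions, not a ModeShape or JCR API: the helper `BucketedPath`, the `/documents` root, and the two-level SHA-1 layout are all hypothetical choices, and the document ID is assumed to already be a valid JCR node name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustration only: hypothetical helper, not part of ModeShape or the JCR API. */
final class BucketedPath {
    private BucketedPath() {
    }

    /**
     * Returns root/xx/yy/docId, where xx and yy are the first two bytes of the
     * SHA-1 digest of docId, written as two-digit lowercase hex.
     */
    static String pathFor(String root, String docId) {
        byte[] digest = sha1(docId);
        return String.format("%s/%02x/%02x/%s", root, digest[0] & 0xff, digest[1] & 0xff, docId);
    }

    private static byte[] sha1(String value) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation.
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Prints the bucketed location for a sample document ID.
        System.out.println(pathFor("/documents", "invoice-42.pdf"));
    }
}
```

              Two hex levels give 256 x 256 = 65,536 buckets, so even a million documents average roughly 15 children per bucket, and the path can be recomputed from the ID alone whenever the node needs to be looked up.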

               

              Best regards.

              • 4. Re: Questions on Modeshape evaluation against Jackrabbit’s known problems
                tirthalp

                Thanks very much, Randall, for clarifying all my doubts.