15 Replies Latest reply on Feb 8, 2013 9:17 AM by rhauch

    Can I use ModeShape for transactional large data?

    ozhou

      Hi Author,

       

      I'm currently evaluating ModeShape as a data store to replace a relational database, since my data model is inherently hierarchical and has a deep inheritance tree.

      There are basically two kinds of data I need to save.

      Consider that I have a product catalog describing all the product specifications for vehicle insurance.

      I think ModeShape is good to handle such "master data" or "reference data".

       

      Then, when a product is sold, an insurance policy instance is generated according to the product specification, and I need to save that policy instance somewhere.

       

      The product catalog is only for product managers and has low concurrency and low load and small data.

      But policies can be generated very quickly, and the policy data is potentially very large.

       

      My question is: does it make sense to use ModeShape to save the policy data as well? Since it also has a hierarchical structure, I would prefer to use ModeShape for it too.

      How does it compare to a relational database in terms of transactions, performance, and scalability?

       

      Best wishes,

      Oliver

        • 1. Re: Can I use ModeShape for transactional large data?
          rhauch

          Oliver,

           

          What you describe does sound like a perfect fit for ModeShape, and your concepts sound like a great starting point.

          My question is: does it make sense to use ModeShape to save the policy data as well? Since it also has a hierarchical structure, I would prefer to use ModeShape for it too.

          Sure, it does make sense to use ModeShape to store the policy data as well, especially if it is hierarchical.

           

          One suggestion I would make is to avoid flat hierarchies: don't create a parent node with millions of children. Instead, if you have that many children, think of ways of splitting them up into groups or "folders". For example, your policies will likely have an identifier, so you could create a multi-layer hierarchy just from the identifier by creating the first layer from the first few characters of the identifier, the second layer from the next few characters, and continuing as needed to get a good distribution.

           

          For example, let's imagine a simple policy identifier pattern "[a-z]{4}\-[0-9]{3}", with "axrd-835" as an example identifier. We want all of our policies to be under the "/policy" node, but putting several million children directly under "/policy" (e.g., "/policy/axrd-835") would not perform well. But we can use the first two characters of the policy identifier to create a "folder" under which we'll place only the policies with identifiers beginning with that pair of characters. And we could do the same for the second two characters. Our repository might then look something like this:

           

            /policy/aa/aa/aaaa-001

            /policy/aa/aa/aaaa-002

            ...

            /policy/aa/ab/...

            /policy/aa/ac/...

            /policy/aa/ad/...

            ...

            /policy/ax/rd/axrd-835

            ...

            /policy/zz/zz/zzzz-999

           

          The benefit is that "/policy" and each first-level intermediate (and artificial) "folder" could contain up to 676 children (26x26), whereas each second-level folder would contain at most 1,000 children (one per three-digit suffix, 000 through 999). Also, by using the complete identifier as the name of the policy nodes, we're still able to search and navigate by whole identifier and never need to "reconstitute" the identifier from a path.
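
          As a rough illustration of the layout described above, here is a minimal sketch that maps an identifier matching the example pattern to its two-level path. The class name PolicyPathSketch and the method pathFor are hypothetical and not part of ModeShape or JCR; the code only builds the path string and does not touch a repository.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not a ModeShape API): maps a policy identifier to a sharded path. */
final class PolicyPathSketch {

    // Identifier pattern from the example: four lowercase letters, a hyphen, three digits.
    private static final Pattern POLICY_ID = Pattern.compile("[a-z]{4}-[0-9]{3}");

    private PolicyPathSketch() {}

    /** Builds "/policy/<chars 1-2>/<chars 3-4>/<full identifier>". */
    static String pathFor(String policyId) {
        if (policyId == null || !POLICY_ID.matcher(policyId).matches()) {
            throw new IllegalArgumentException("Not a valid policy identifier: " + policyId);
        }
        String firstLevel = policyId.substring(0, 2);   // e.g. "ax"
        String secondLevel = policyId.substring(2, 4);  // e.g. "rd"
        // The full identifier stays as the node name, so it never has to be rebuilt from the path.
        return "/policy/" + firstLevel + "/" + secondLevel + "/" + policyId;
    }

    public static void main(String[] args) {
        System.out.println(pathFor("axrd-835"));
    }
}
```

          With UUIDs or SHA-1 hashes the same idea applies; only the pattern and the number of characters taken per level would change.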

           

          Note, however, that you should create these intermediate folders only when needed. In other words, don't pre-create the "/policy/zz/zz" node until you actually have a policy with an identifier that begins with "zzzz".
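
          The "create only when needed" rule can be sketched with a small in-memory stand-in for the node tree. FolderNodeSketch below is a hypothetical class used purely for illustration; it is not the JCR Node interface, and a real application would do the equivalent lookup-then-create against its Session.

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a repository node; hypothetical, NOT the JCR API. */
final class FolderNodeSketch {
    final String name;
    final Map<String, FolderNodeSketch> children = new TreeMap<>();

    FolderNodeSketch(String name) {
        this.name = name;
    }

    /** Returns the named child, creating it only if it does not exist yet. */
    FolderNodeSketch getOrCreate(String childName) {
        return children.computeIfAbsent(childName, FolderNodeSketch::new);
    }

    /** Adds a policy under <root>/<chars 1-2>/<chars 3-4>/<id>, creating folders on demand. */
    static FolderNodeSketch addPolicy(FolderNodeSketch policyRoot, String policyId) {
        FolderNodeSketch firstLevel = policyRoot.getOrCreate(policyId.substring(0, 2));
        FolderNodeSketch secondLevel = firstLevel.getOrCreate(policyId.substring(2, 4));
        return secondLevel.getOrCreate(policyId);
    }

    public static void main(String[] args) {
        FolderNodeSketch root = new FolderNodeSketch("policy");
        addPolicy(root, "axrd-835");
        addPolicy(root, "axrd-836");
        // Only the "ax" folder exists; nothing was pre-created for "zz".
        System.out.println(root.children.keySet());
    }
}
```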

           

          We use this approach internally, except that we use UUIDs or SHA1s (which happen to have the added property that they are very well distributed).

           

          How does it compare to a relational database in terms of transactions, performance, and scalability?

          Functionally, ModeShape supports transactions, so writing your application will be very similar. ModeShape also scales and performs well, but honestly you need to try ModeShape for yourself. I'd suggest building a very, very simple proof of concept to see how it performs and to determine how your application will access the data (since this should inform the particulars of your repository structure, such as node types, hierarchical structure, queries, etc.).

           

          But because ModeShape is designed specifically to be hierarchical, it will likely do this better than a relational database. It is very difficult to design a relational database that allows very efficient and deep navigation, and any such database design will likely be quite convoluted.

          • 2. Re: Can I use ModeShape for transactional large data?
            ozhou

            Hi Randall,

             

            Thanks for the detailed explanation. I set up the environment yesterday and can run a standalone "Hello World" ModeShape application.

            I have a set of classes (about 20) annotated with JPA annotations because I used to use Hibernate to store data.

            Based on my understanding, ModeShape is just a hierarchical database; it doesn't contain an "ORM" out of the box.

             

            Imagine the following case:

            An insurance product is saved into ModeShape. A product can have multiple levels of child nodes.

            Next time, this product is loaded from ModeShape and converted to the corresponding Java object graph.

            I make changes to two sub-objects in the graph, and then I want to save the product again. Hibernate supports cascaded saving/updating and tracks changes in the EntityManager, so in the end it only sends SQL for the changed entities to the database.

            Back to ModeShape: how can I do something similar efficiently? If I simply save all nodes again recursively, and most of the nodes are unchanged, will ModeShape save them again and cause a lot of unnecessary SQL (supposing I have configured Infinispan with a JDBC cache store)?

            • 3. Re: Can I use ModeShape for transactional large data?
              rhauch

              I have a set of classes (about 20) annotated with JPA annotations because I used to use Hibernate to store data.

              Based on my understanding, ModeShape is just a hierarchical database; it doesn't contain an "ORM" out of the box.

              Correct, ModeShape does not have an ORM. Putting an ORM on top of ModeShape would eliminate all but the "very schema-fixed" way of using ModeShape, because you're forcing your structure to match your POJO structure (which will not change). So IMO, an ORM would encourage the less-desirable behavior.

               

               

              Imagine the following case:

              An insurance product is saved into ModeShape. A product can have multiple levels of child nodes.

              Next time, this product is loaded from ModeShape and converted to the corresponding Java object graph.

              I make changes to two sub-objects in the graph, and then I want to save the product again. Hibernate supports cascaded saving/updating and tracks changes in the EntityManager, so in the end it only sends SQL for the changed entities to the database.

              Back to ModeShape: how can I do something similar efficiently? If I simply save all nodes again recursively, and most of the nodes are unchanged, will ModeShape save them again and cause a lot of unnecessary SQL (supposing I have configured Infinispan with a JDBC cache store)?

               

              First of all, when you use a Session, you're able to access all of the content in the workspace. If you change nothing, there is no transient state (changes) in the Session, so calling "Session.save()" does nothing. If you change 2 properties on one node, then when you call "save()" only those changes will be persisted. IOW, ModeShape only persists the changes that you make.

               

              In effect, we *always* do something like Cascading Saving/Updating.

               

              Now, it is true that you can call "save()" on a particular node, but a) that method in the API is deprecated, b) it's not efficient because we have to figure out all the changes outside of the node that also must be saved, and c) it may end up saving more than you think. IOW, you should never really use "Node.save()" anymore, and always use "Session.save()".

              • 4. Re: Can I use ModeShape for transactional large data?
                ozhou

                Hi Randall,

                 

                Thanks for the information. There are a lot of services built around the domain model and I want the persistence logic totally separated, which is why I need an "ORM" to do the dirty work for me; if needed, I can still access the JCR Node directly.

                I found that the JackRabbit-OCM 2.0 component is what I am looking for, and it can work with any JCR 2.0 implementation, including ModeShape. I plan to annotate all my domain model classes and save them into ModeShape.

                Then I will test the performance against my original persistence layer (JPA/Hibernate). Hopefully ModeShape can give me an exciting result because the data model is very hierarchical. I strongly feel that saving it using JPA/Hibernate is not the best option.

                • 5. Re: Can I use ModeShape for transactional large data?
                  rhauch

                  Oliver, please let us know how the OCM works with ModeShape, as well as your comparison with your own JPA/Hibernate layer. You're going to use ModeShape 3, right? Please be aware that the choice of cache store (e.g., the JDBC cache stores, BerkeleyDB, file system, Cassandra, etc.) will have a pretty big impact on results. If possible, try out several just for comparison's sake.

                   

                  If you have any questions, please ask!

                   

                  Best regards

                  • 6. Re: Can I use ModeShape for transactional large data?
                    ozhou

                    Hi Randall,

                     

                    OCM is in general working well with ModeShape after some small modifications. I just want to ask whether there is a standalone ModeShape repository browser or Eclipse plugin for viewing the content of a repository.

                    A command line is okay, but a GUI would be better.

                    • 7. Re: Can I use ModeShape for transactional large data?
                      rhauch

                      Unfortunately, no, we don't have one. If you're deploying to a web server with our REST JAR, then you can use the REST service to navigate. It's not pretty, but it does work.

                      • 8. Re: Can I use ModeShape for transactional large data?
                        ozhou

                        Okay, I get it. It should be enough for me.

                        Sorry for another question: the original application is written using Spring/JPA/Hibernate. How does ModeShape work with the Spring transaction manager? Is there an example of this?

                        • 9. Re: Can I use ModeShape for transactional large data?
                          rhauch

                          I know of people using the Spring Transaction Manager and Atomikos, and others that use the JBoss Transaction Manager (which we use in our tests). We describe how to configure a transaction manager here.

                          • 10. Re: Can I use ModeShape for transactional large data?
                            ozhou

                            Okay, I finally found that I need to write a special TransactionManagerLookup implementation to fetch the TransactionManager from the Spring JTA transaction manager (configured with Atomikos).

                            Next week I will start to compare the performance, but so far everything works well. The versioning support is especially good: I can implement a complicated version management feature for our products, which I previously found very hard to implement via JPA/Hibernate.

                            • 11. Re: Can I use ModeShape for transactional large data?
                              rhauch

                              Next week I will start to compare the performance, but so far everything works well. The versioning support is especially good: I can implement a complicated version management feature for our products, which I previously found very hard to implement via JPA/Hibernate.

                               

                              Great news, Oliver!

                              • 12. Re: Can I use ModeShape for transactional large data?
                                ozhou

                                Hi Randall,

                                 

                                I have almost finished the transition and have written a very simple GUI for repository navigation and querying.

                                The biggest question I found is around the performance.

                                 

                                I have written a few JUnit test cases in a test class. In the "setup" method I create 2 insurance products and insert them into ModeShape; in "tearDown" I remove these two products.

                                Each test case uses a new JCR session. The numbers don't include session creation time (I know the first session needs a lot of time to create).

                                I'm using the JDBC cache store with an H2 in-memory database.

                                Here's the output. How can I explain these numbers? As you can see, the first save took more than 5 seconds, and I don't know why, whereas the second save took only about 1.5 seconds.

                                 

                                2013-02-01 12:13:04,079  INFO [main] (ProductSpecificationPersistenceTest.java:84) - ACORD Use Case

                                2013-02-01 12:13:09,969  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Gold Personal Auto took: 5.871 s

                                2013-02-01 12:13:09,970  INFO [main] (ProductSpecificationPersistenceTest.java:93) - Ebao Use Case

                                2013-02-01 12:13:12,323  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Personal Car took: 2.352 s

                                 

                                2013-02-01 12:13:12,686  INFO [main] (ProductSpecificationPersistenceTest.java:84) - ACORD Use Case

                                2013-02-01 12:13:14,095  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Gold Personal Auto took: 1.409 s

                                2013-02-01 12:13:14,096  INFO [main] (ProductSpecificationPersistenceTest.java:93) - Ebao Use Case

                                2013-02-01 12:13:15,794  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Personal Car took: 1.697 s

                                 

                                2013-02-01 12:13:19,376  INFO [main] (ProductSpecificationPersistenceTest.java:84) - ACORD Use Case

                                2013-02-01 12:13:19,908  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Gold Personal Auto took: 531.0 ms

                                2013-02-01 12:13:19,909  INFO [main] (ProductSpecificationPersistenceTest.java:93) - Ebao Use Case

                                2013-02-01 12:13:20,690  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Personal Car took: 780.6 ms

                                • 13. Re: Can I use ModeShape for transactional large data?
                                  ozhou

                                  Hi Randall,

                                   

                                  After profiling the test with VisualVM, I found that the last result was not correct because of some problems in my own application.

                                  The following is the updated result.

                                  You can see the first save is reduced from 5.871 s to 2.554 s, which now looks normal to me.

                                   

                                  2013-02-08 10:32:07,909  INFO [main] (ProductSpecificationPersistenceTest.java:84) - ACORD Use Case

                                  2013-02-08 10:32:10,477  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Gold Personal Auto took: 2.554 s

                                  2013-02-08 10:32:10,477  INFO [main] (ProductSpecificationPersistenceTest.java:93) - Ebao Use Case

                                  2013-02-08 10:32:12,290  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Personal Car took: 1.812 s

                                   

                                  2013-02-08 10:32:12,614  INFO [main] (ProductSpecificationPersistenceTest.java:84) - ACORD Use Case

                                  2013-02-08 10:32:13,847  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Gold Personal Auto took: 1.232 s

                                  2013-02-08 10:32:13,847  INFO [main] (ProductSpecificationPersistenceTest.java:93) - Ebao Use Case

                                  2013-02-08 10:32:15,149  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Personal Car took: 1.301 s

                                   

                                  2013-02-08 10:32:18,385  INFO [main] (ProductSpecificationPersistenceTest.java:84) - ACORD Use Case

                                  2013-02-08 10:32:19,167  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Gold Personal Auto took: 780.9 ms

                                  2013-02-08 10:32:19,168  INFO [main] (ProductSpecificationPersistenceTest.java:93) - Ebao Use Case

                                  2013-02-08 10:32:20,178  INFO [main] (ProductSpecificationPersistenceTest.java:114) - Saving Personal Car took: 1.009 s

                                  • 14. Re: Can I use ModeShape for transactional large data?
                                    ozhou

                                    In terms of cache stores, I tested BDB, JDBC with H2 in-memory, and JDBC with MySQL.

                                    BDB is the fastest, even faster than H2 in-memory, which surprised me a lot.

                                    MySQL is the slowest, about 30% slower than BDB.
