1 Reply Latest reply on Nov 18, 2014 2:03 PM by Randall Hauch

    Best practices for distributing large sets of nodes to avoid performance issues

    Richard Lucas Apprentice

      We currently have a node hierarchy in which each top-level node has several layers of child nodes beneath it, with each layer averaging around 5 nodes. This maps cleanly onto the JCR node structure.


      The problem we are running into is with the top-level nodes themselves. The child node layers are distributed in a hierarchy that maps to our business domain model and should perform well over time, but the same does not apply to the top-level nodes. Their number is expected to grow into the hundreds of thousands, if not millions, and they will all share the same parent.


      We are already seeing issues with this during load testing. Because we use an aggressive locking strategy, there is high contention for the lock on the shared parent node whenever top-level nodes are created. I also expect read performance to degrade as the number of top-level nodes grows.


      We are therefore looking for ways to distribute the top-level nodes so that they do not all share the same parent. Unfortunately, our business domain model offers no natural way to split them. Instead, we are considering an arbitrary distribution of the top-level nodes across a set of intermediate parents that exist only as an internal mechanism and would be transparent to the business domain model.
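      One common way to build such an "arbitrary distribution" is to hash each node's key and use a few characters of the hash as intermediate bucket names. The sketch below only computes the bucketed path; it does not touch a repository. It assumes a hypothetical root path (`/items`), that the business key is already a valid JCR node name, and an illustrative layout of two levels with 256 buckets each. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper that maps a flat business key to a bucketed JCR path.
 * The two-level, 256-way layout is an assumption chosen for illustration.
 */
public final class BucketPaths {

    // Hex characters per intermediate level (2 hex chars = 256 possible buckets).
    private static final int CHARS_PER_LEVEL = 2;
    // Number of intermediate levels between the root and the business node.
    private static final int LEVELS = 2;

    private BucketPaths() {}

    /** Returns a path shaped like "/items/xx/yy/key", where xx and yy come from the key's hash. */
    public static String pathFor(String rootPath, String key) {
        String hex = sha1Hex(key);
        StringBuilder path = new StringBuilder(rootPath);
        for (int level = 0; level < LEVELS; level++) {
            int start = level * CHARS_PER_LEVEL;
            // Append the next slice of the hash as an intermediate bucket name.
            path.append('/').append(hex, start, start + CHARS_PER_LEVEL);
        }
        return path.append('/').append(key).toString();
    }

    // SHA-1 is used only to get an even, deterministic spread, not for security.
    private static String sha1Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pathFor("/items", "order-12345"));
    }
}
```

      Because the path is derived deterministically from the key, a node can be located again without any lookup table, and the bucket nodes stay invisible to the domain model. Missing bucket nodes would still have to be created on demand (for example with the standard `Node.addNode` call), which briefly touches their parents. With 256 x 256 = 65,536 leaf buckets, one million top-level nodes works out to roughly 15 children per bucket. Concurrent creators will also usually lock different parents instead of one shared node.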


      Are there any best practices in JCR for distributing a flat set of nodes across a set of intermediate parent nodes? Or am I looking at the problem incorrectly, and should we be considering other options?