3 Replies Latest reply on Jan 26, 2016 2:38 AM by hchiorean

    Looking for suggestions on how to optimize certain queries (Modeshape 4)

    ma6rl

      Query 1

       

      I have 11,000 nodes with a mixin type of app:asset that sit under a parent node /assets, and approximately 330,000 nodes total  in the repository. The asset nodes are descendent nodes, not direct children as we used the UUID approach described in this article Improving performance with large numbers of child nodes | ModeShape to distribute the nodes (over time we expect the number of nodes to be in the millions).

       

      I have a need to be able to return a paginated result set of asset nodes ordered by created date descending. I currently use the following query to return the first 10 assets:

       

      select node.* from [app:asset] AS node WHERE ISDESCENDANTNODE("/assets") ORDER BY node.[jcr:created] LIMIT 10 OFFSET 0;
      

       

      In order to optimize the query I have created an index (using Lucene)


      <index name="asset_types" provider-name="lucene" synchronous="true" node-type="app:asset" columns="jcr:mixinTypes(NAME)"/>
      

       

      I have a cache configured that can hold the entire repository in-memory so no access the the caches persistent store should be required (using JDBC).

       

      The query on average takes about 4 seconds to return which is considerably longer than I would like to bring back a result set of 10 nodes. The query is using the asset_types index (I have logged the query plan to confirm this). Does anyone have any suggestions on how I could improve the performance of this query or where and why it is taking 4+ seconds to execute?

       

      Query 2

       

      Under /assets/ab/cd/asset1 I have a 2 child nodes with a mixin type app:variant out of a total of ~30 nodes. If I run the following query:

       

      select node.* from [app:varitan] AS node WHERE ISDESCENDANTNODE("/assets/ab/cd/asset1") ORDER BY node.[jcr:created] LIMIT 10 OFFSET 0;
      

       

      It takes 5+ seconds to return the 2 nodes of app:variant. In this scenario I have not indexed the node type 'app:variant' as doing so results in all variant nodes across all assets being returned and filtered based on the ISDESCENDANTNODE constraint which normally timeout. Again does anyone have any suggestions on how I could improve the performance of this query or where and why it is taking 5+ seconds to execute?

       

      In both cases I used the following configuration when specifying my index provider

       

      <index-providers>
          <index-provider name="lucene" classname="lucene" path="modeshape/indexes/" relative-to="jboss.server.data.dir" module="org.modeshape.index-provider.lucene"/>
      </index-providers>
      
        • 1. Re: Looking for suggestions on how to optimize certain queries (Modeshape 4)
          hchiorean

          The existing index providers can't handle join conditions so the ISDESCENDANTNODE constraint (or any other JOIN constraint) will be processed by the default, in-memory engine.

           

          For the 1st query, did you look at the query plan ( Query and search - ModeShape 4 - Project Documentation Editor) to make sure the index is actually used ? If it's not used for some reason, you can try changing the index kind to "nodetype" (as opposed to the default which is "value"). If the index is used, the only other optimization I can think of  is not use JOINs

           

          For the 2nd query, since you don't have any indexes defined, the entire query will be processed in memory against all the existing nodes from the repository. Adding a "nodetype" index on the the [app:variant] type should speed up that query assuming the index is used correctly to filter the set of nodes (again, you should check the query plan)

          • 2. Re: Looking for suggestions on how to optimize certain queries (Modeshape 4)
            ma6rl

            I'm still experimenting with ways to optimize but am seeing some strange behavior with one of my queries:

             

            This query has two index options in the query plan as expected (as I index the node type app:asset)

            SELECT node.* FROM [app:asset] AS node WHERE ISDESCENDANTNODE('/organizations/default/assets') ORDER by node.[jcr:created] DESC LIMIT 10 OFFSET 0
            

             

            Index [node] <INDEX_SPECIFICATION=asset_mixin_types, provider=lucene, cost~=100, cardinality~=9251, selectivity~=1.0, constraints=[node.[jcr:mixinTypes] = 'omakase:asset'], INDEX_USED=true>

            Index [node] <INDEX_SPECIFICATION=DescendantsByPath, provider=<implicit>, cost~=1000, cardinality~=10000, selectivity~=?, constraints=[ISDESCENDANTNODE(node,'/organizations/default/assets')], path=/organizations/default/assets>


            This query has one index option in the query plan as expected (I have no indexes that should be used)

            SELECT node.* FROM [app:variant] AS node WHERE ISDESCENDANTNODE('/organizations/default/assets/aa/de/74/aade7428_3310_4a46_b35a_53bc265fc3d4') ORDER by node.[jcr:created] DESC LIMIT 10 OFFSET 0
            


            Index [node] <INDEX_SPECIFICATION=DescendantsByPath, provider=<implicit>, cost~=1000, cardinality~=10000, selectivity~=?, constraints=[ISDESCENDANTNODE(node,'/organizations/default/assets/aa/de/74/aade7428_3310_4a46_b35a_53bc265fc3d4')], path=/organizations/default/assets/aa/de/74/aade7428_3310_4a46_b35a_53bc265fc3d4, INDEX_USED=true>


            This query on the other hand does not have a DescendantsByPath index in the plan and I'm not sure why, instead it is using an inefficient index I have on the type

            SELECT node.* FROM [app:variantFile] AS node WHERE ISDESCENDANTNODE('/organizations/default/assets/aa/de/74/aade7428_3310_4a46_b35a_53bc265fc3d4/fb12ad9c_eec6_4717_b6dc_91d0a4fc635b') AND (node.[app:type] != 'CHILD') ORDER by node.[app:fileCreated] DESC LIMIT 10 OFFSET 0
            

             

            Index [node] <INDEX_SPECIFICATION=variant_file_types, provider=lucene, cost~=100, cardinality~=92566, selectivity~=1.0, constraints=[node.[omakase:type] <> 'CHILD'], INDEX_USED=true>

             

            Any idea why their is no DescendantsByPath in the query plan for the above query?

            • 3. Re: Looking for suggestions on how to optimize certain queries (Modeshape 4)
              hchiorean

              Index [node] <INDEX_SPECIFICATION=variant_file_types, provider=lucene, cost~=100, cardinality~=92566, selectivity~=1.0, constraints=[node.[omakase:type] <> 'CHILD'], INDEX_USED=true>

               

              Any idea why their is no DescendantsByPath in the query plan for the above query?

              not off the top of my head, no. I would've expected that the index is at least considered, like in the first query. If you can isolate this in a test case, feel free to log a JIRA an I'll investigate this some more.

               

              As far as the first two queries go, the plan looks fine (and in the case of the first query also the index usage).