8 Replies Latest reply on Apr 3, 2017 8:18 AM by amischler

    Query optimization performance

    amischler

      Hi,

       

      I'm investigating a performance issue with a specific query. I'm using ModeShape 4.6.2.Final.

       

      It is a simple query:

       

      SELECT [jcr:path] FROM [dsdk:smarttext] WHERE smartTextKey = 'infiltrea.analyse.measureMethod.blowerdoorPosition'

       

      I have defined the following indexes, which can be used for this query:

       

      "indexProviders": {

        "local": {

         "classname": "org.modeshape.jcr.index.local.LocalIndexProvider",
         "directory": "${application.workspace}/indexes"
        }

      },
      "indexes": {

        "smartTextKeys": {

         "kind": "value",
         "provider": "local",
         "synchronous": "true",
         "nodeType": "dsdk:smarttext",
         "columns": "smartTextKey(STRING)",
         "mode:workspaces": "default"
        },

        "nodeTypes": {

         "kind": "nodeType",
         "provider": "local",
         "synchronous": "true",
         "nodeType": "nt:base",
         "columns": "jcr:primaryType(STRING)",
         "mode:workspaces": "default"
        },

      [...]

      }

       

      As you can see from the attached query plan log, both indexes are considered, as expected. However, the "nodeTypes" index gets used instead of the "smartTextKeys" index, which I would expect to be more specific to my query. Any idea why?

       

      Moreover, the whole query execution takes up to 4 seconds; from the logs I can see that most of that time is spent in the optimization step of the query itself:

       

      Stopped executing query 115: 4,011 sec (plan=482,152 usec, optim=4,01 sec, resultform=1,161 ms)

       

      Is there anything I could do to improve or limit the optimization time of the query?

       

      Thanks.

      --

      Antoine

        • 1. Re: Query optimization performance
          hchiorean

          One index is preferred over another because of the selection "algorithm" used by ModeShape: if two indexes have the same cost and cardinality (i.e. the estimated number of items in the index), the decision is made on the index name (i.e. lexicographically), because the selection has to be stable. You can see the logic here: modeshape/IndexPlan.java at master · ModeShape/modeshape · GitHub. According to the log, both of your indexes have the same cost and cardinality, so "nodeTypes" comes before "smartTextKeys".
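To make the tie-break concrete, here is a minimal Java sketch of a stable selection of this kind. It is not ModeShape's IndexPlan code: the `IndexCandidate` and `IndexChooser` classes, the cost and cardinality values, and the order "cost first, then cardinality" are all assumptions made for illustration. Only the final fallback to the index name comes from the explanation above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an index candidate; NOT ModeShape's IndexPlan class.
final class IndexCandidate {
    final String name;
    final int cost;
    final long cardinality;

    IndexCandidate(String name, int cost, long cardinality) {
        this.name = name;
        this.cost = cost;
        this.cardinality = cardinality;
    }
}

final class IndexChooser {
    // Assumed ordering: lower cost, then lower cardinality, then index name.
    // The name comparison makes the choice deterministic when the estimates tie.
    static final Comparator<IndexCandidate> ORDER =
            Comparator.comparingInt((IndexCandidate c) -> c.cost)
                      .thenComparingLong(c -> c.cardinality)
                      .thenComparing(c -> c.name);

    static IndexCandidate choose(List<IndexCandidate> candidates) {
        return candidates.stream()
                .min(ORDER)
                .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }

    public static void main(String[] args) {
        // Illustrative values only: identical cost and cardinality for both indexes.
        List<IndexCandidate> candidates = Arrays.asList(
                new IndexCandidate("smartTextKeys", 100, 1000L),
                new IndexCandidate("nodeTypes", 100, 1000L));
        System.out.println(choose(candidates).name); // prints "nodeTypes"
    }
}
```

With equal estimates, "nodeTypes" sorts before "smartTextKeys", so it wins regardless of which index is more specific to the query.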

           

          Regarding performance: judging from the log, 90% of the time is spent between these two entries:

          [TRACE]    2017-03-23 11:25:38,151 o.m.j.i.l.LocalIndexProvider {Thread-22} - Index 'nodeTypes' in 'local' provider applies to query in workspace 'default' with constraint: [dsdk:smarttext].[jcr:primaryType] = 'dsdk:smarttext'

          [TRACE]    2017-03-23 11:25:42,131 o.m.j.q.o.RuleBasedOptimizer {Thread-22} - Plan after running query optimizer rule org.modeshape.jcr.query.optimize.AddIndexes@241ae050:
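The gap between the two entries can be checked with a few lines of Java. This is only a sketch: the `LogGap` class is a hypothetical helper, and it assumes that both entries belong to the run reported as "Stopped executing query 115: 4,011 sec".

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for measuring the time between two log entries.
final class LogGap {
    // Timestamp layout used in the TRACE lines: date, time, comma, milliseconds.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static long millisBetween(String start, String end) {
        return Duration.between(
                LocalDateTime.parse(start, FMT),
                LocalDateTime.parse(end, FMT)).toMillis();
    }

    public static void main(String[] args) {
        long gapMs = millisBetween("2017-03-23 11:25:38,151", "2017-03-23 11:25:42,131");
        double totalMs = 4011.0; // assumption: total taken from the "4,011 sec" line
        System.out.println("gap in ms: " + gapMs);
        System.out.println("share of total: " + (gapMs / totalMs));
    }
}
```

The two timestamps are 3,980 ms apart, which accounts for nearly all of the reported 4.011 s.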

           

          Given the configuration snippet you added there's no good reason for that, but without a test case I can look at locally there's not much else I can say.

          If you can provide a test case which exhibits this issue, I can look at it locally, but only against the latest community version, 5.x.

          • 2. Re: Query optimization performance
            amischler

            Ok, thanks for your answer.

             

            I have investigated further where the time goes, and it turns out that most of it is spent in the LocalIndexProvider while estimating the size of the index:

             

            stacktrace-query.tiff

             

            So the performance issue is not at the ModeShape level but rather at the MapDB level. I tried to fine-tune the MapDB configuration as discussed in this thread: Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0). I finally managed to reduce the query time to around 500 ms by tuning various MapDB parameters, but this requires setting some options that I don't want to use in production because they make the indexes less robust to crashes. So I will run additional tests with the Lucene index provider.

            • 3. Re: Query optimization performance
              hchiorean

              Thanks for the update. It seems very strange to me as well that a keySet() operation would take that long in MapDB.

               

              ModeShape uses MapDB 1.0.9 (afaik the latest available MapDB version from the 1.x series). MapDB 2 has been a failure and the current (radically changed, basically a rewrite) version is 3.x. So if the problem is indeed with MapDB, it's unlikely that we can do anything about it.

              • 4. Re: Query optimization performance
                amischler

                I read in the MapDB documentation that there is a counter option: "Another parameter is the size counter. By default HTreeMap does not keep track of its size and map.size() performs a linear scan to count all entries" (see HTreeMap · MapDB).
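The difference between the two behaviours can be illustrated without MapDB. The sketch below is plain Java and does not use the MapDB API: `ScanningSet` and `CountedSet` are hypothetical classes that only mimic the idea, one counting entries on every size() call and the other maintaining a counter on each update.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical set without a size counter: size() visits every entry, so it is O(n).
final class ScanningSet<E> {
    private final Set<E> entries = new HashSet<>();

    boolean add(E e) { return entries.add(e); }

    boolean remove(E e) { return entries.remove(e); }

    long size() {
        long n = 0;
        for (E ignored : entries) {
            n++; // linear scan over all entries
        }
        return n;
    }
}

// Hypothetical set with a size counter: a little extra work per update, O(1) size().
final class CountedSet<E> {
    private final Set<E> entries = new HashSet<>();
    private final AtomicLong counter = new AtomicLong();

    boolean add(E e) {
        boolean added = entries.add(e);
        if (added) counter.incrementAndGet();
        return added;
    }

    boolean remove(E e) {
        boolean removed = entries.remove(e);
        if (removed) counter.decrementAndGet();
        return removed;
    }

    long size() { return counter.get(); }
}

final class CounterDemo {
    public static void main(String[] args) {
        ScanningSet<String> scanning = new ScanningSet<>();
        CountedSet<String> counted = new CountedSet<>();
        for (int i = 0; i < 100_000; i++) {
            scanning.add("key-" + i);
            counted.add("key-" + i);
        }
        // Same answer from both; only the cost of obtaining it differs.
        System.out.println(scanning.size() == counted.size());
    }
}
```

When the size is requested on every query, as during index size estimation here, the counter trades a small cost per write for a constant-time read.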

                 

                I ran a quick test with a custom LocalIndexProvider in which I force the DB to create its HashSets with the counterEnable() option. With this option set, the query performance improves greatly:

                 

                 

                [TRACE]2017-03-31 15:49:26,813 o.m.jcr.query {JTP Slot 1} - The execution function for 104: (filtered width=1 (filter [dsdk:smarttext].smartTextKey = 'infiltrea.analyse.additionalinformation') (filtered width=1 (filter [dsdk:smarttext].[jcr:primaryType] = 'dsdk:smarttext') (from-index nodeTypes with [[dsdk:smarttext].[jcr:primaryType] = 'dsdk:smarttext'])))
                [TRACE]2017-03-31 15:49:26,819 o.m.jcr.query {JTP Slot 1} - Stopped executing query 104: 98,584 ms (plan=283,108 usec, optim=71,535 ms, resultform=26,765 ms)

                 

                Is there a reason why this counter option is not enabled by default in ModeShape?

                • 5. Re: Query optimization performance
                  hchiorean
                  amischler wrote: "I made a quick test with a custom LocalIndexProvider in which I force DB to create HashSet with the counterEnable() option. After setting this option the performance of the query is greatly improved :"

                  I'm not sure I understand exactly what change you made, since LocalIndexProvider only initializes the MapDB DB, not any collections.

                   

                  HTreeMap is only used in one place, and the counter is enabled there: https://github.com/ModeShape/modeshape/blob/master/modeshape-jcr/src/main/java/org/modeshape/jcr/query/BufferManager.java#L834

                  The LocalMapIndex class (which serves as the base class for the local indexes) also has the counter enabled for its collection: modeshape/LocalMapIndex.java at master · ModeShape/modeshape · GitHub

                   

                  Feel free to open a PR with the suggested enhancement, though, so that I can understand exactly what you mean by the statement above.

                  • 6. Re: Query optimization performance
                    amischler

                    In my case, the performance bottleneck is in LocalEnumeratedIndex.estimateTotalCount(), which calls HTreeMap.size(). The counter does not seem to be enabled for this HTreeMap. It is created at LocalEnumeratedIndex:105 through a call to db.getHashSet(), which creates an HTreeMap without enabling the counter.

                     

                    Here is the change I made to make sure the counter is enabled on every HTreeMap created using db.getHashSet(): Query optimization performance - Quick & dirty fix · dooApp/modeshape@a669eea · GitHub

                     

                    I'm not submitting it as a pull request; it was only a quick & dirty fix to check whether it resolves the performance issue. I guess this should be handled in a cleaner way in LocalEnumeratedIndex.createOrGetKeySet().

                    • 7. Re: Query optimization performance
                      hchiorean

                      OK, now I understand your use case. The limitation comes from the way MapDB creates HashSets internally; it is not something explicit in the current ModeShape code, which simply uses the MapDB API.

                       

                      So the solution would be some sort of enhancement that creates the HashSet there in another way, for example via a local HTreeMap. If you want this tracked for the next 5.5 release, please log a JIRA. Thanks.

                      • 8. Re: Query optimization performance
                        amischler