4 Replies Latest reply on Nov 26, 2014 5:57 AM by bes82

    Detail questions regarding indexes

    bes82

       

      If I understand the documentation (Query and search - ModeShape 4 - Project Documentation Editor) correctly, it is no problem to define asynchronous and synchronous indexes at the same time in one index manager.

       

      Is the following assumption correct:

       

      Calling session.save() blocks at least until all affected synchronous indexes have been updated?

       

       

      Then I have another question about how to define PATH indexes. Consider the following query:

       

      SELECT child.x
      FROM [nt:child] AS child
      JOIN [nt:parent] AS parent ON ISDESCENDANTNODE(child, parent)
      JOIN [nt:subchild] AS subchild ON ISCHILDNODE(subchild, child)
      WHERE child.y = '12345'
      AND parent.z = '67890'

       

      Indexes are defined on child.y and parent.z, and both are used. What is not used is the PATH index for the subchild access, even though a PATH index is defined on all three node types.

       

      The stripped down plan looks like this, and I don't understand why the access queries are ordered this way:

       

      Project [child]
        Join [subchild,child]
          Join [parent,child]
            Access [parent]
              IndexUsed
            Access [child]
              IndexUsed
          Access [subchild]
            NoIndexUsed

       

      What I would have expected is:

       

      Access to child using child.y index, access to parent using either parent.z or PATH index, access to subchild using PATH index.

       

      My guess is that no index is used because the subchild access is not a dependent access. But why is it not a dependent query?

        • 1. Re: Detail questions regarding indexes
          rhauch

          If I understand the documentation (Query and search - ModeShape 4 - Project Documentation Editor) correctly, it is no problem to define asynchronous and synchronous indexes at the same time in one index manager.

           

          Yes.

          Calling session.save() blocks at least until all affected synchronous indexes have been updated?

          Yes.

          Then I have another question about how to define PATH indexes. Consider the following query:

           

          SELECT child.x
          FROM [nt:child] AS child
          JOIN [nt:parent] AS parent ON ISDESCENDANTNODE(child, parent)
          JOIN [nt:subchild] AS subchild ON ISCHILDNODE(subchild, child)
          WHERE child.y = '12345'
          AND parent.z = '67890'

           

          Indexes are defined on child.y and parent.z, and both are used. What is not used is the PATH index for the subchild access, even though a PATH index is defined on all three node types.

           

          The stripped down plan looks like this, and I don't understand why the access queries are ordered this way:

           

          Project [child]
            Join [subchild,child]
              Join [parent,child]
                Access [parent]
                  IndexUsed
                Access [child]
                  IndexUsed
              Access [subchild]
                NoIndexUsed

           

          The 'parent' nodes are found via the index on 'z', and the 'child' nodes are found via the index on 'y'. The two sets are then joined to produce a tuple for every combination of 'child' and 'parent' that satisfies the ISDESCENDANTNODE criterion: any 'child' node that is not a descendant of a 'parent' is discarded, and any 'parent' node that has no descendant in 'child' is also discarded. Finally, the last join finds all 'subchild' nodes for each of the remaining 'child' nodes.
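
          The join order described above can be sketched as a toy model in plain Java. This is only an illustration under loud assumptions: nodes are represented by path strings, the three input lists stand in for the results of the three Access steps, and the class and method names (JoinPlanSketch, isDescendant, isChild, run) are hypothetical. It does not use or reproduce ModeShape's actual query engine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the plan: Join[subchild,child] over Join[parent,child].
// All names here are hypothetical; this is not ModeShape code.
public class JoinPlanSketch {

    // Models ISDESCENDANTNODE(node, ancestor): node lies strictly below ancestor.
    static boolean isDescendant(String node, String ancestor) {
        return node.startsWith(ancestor + "/");
    }

    // Models ISCHILDNODE(node, parent): node is a direct child of parent.
    static boolean isChild(String node, String parent) {
        int slash = node.lastIndexOf('/');
        return slash > 0 && node.substring(0, slash).equals(parent);
    }

    // parents:     result of Access[parent]   (found via the index on z)
    // children:    result of Access[child]    (found via the index on y)
    // subchildren: result of Access[subchild] (no criteria, so every candidate node)
    static List<String> run(List<String> parents, List<String> children,
                            List<String> subchildren) {
        // Inner join: keep (parent, child) pairs where child is a descendant of parent.
        List<String[]> tuples = new ArrayList<>();
        for (String p : parents) {
            for (String c : children) {
                if (isDescendant(c, p)) {
                    tuples.add(new String[] {p, c});
                }
            }
        }
        // Outer join: emit one row per (child, subchild) match.
        // A child without any subchild produces no row and so drops out.
        List<String> rows = new ArrayList<>();
        for (String[] t : tuples) {
            for (String s : subchildren) {
                if (isChild(s, t[1])) {
                    rows.add(t[1]);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> parents = Arrays.asList("/p1", "/p2");
        List<String> children = Arrays.asList("/p1/c1", "/p1/c2", "/other/c3");
        List<String> subchildren =
                Arrays.asList("/p1/c1/s1", "/p1/c1/s2", "/other/c3/s3", "/unrelated");
        System.out.println(run(parents, children, subchildren));
    }
}
```

          In this model the cost of the last join grows with the size of the subchild list, which is every candidate node when the subchild access has no criteria of its own.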

           

          The only kind of true dependent query that ModeShape supports is in correlated subqueries.

           

          BTW, in your example query the 'subchild' nodes serve no purpose; I presume they do in the real query.

          • 2. Re: Detail questions regarding indexes
            bes82

            The 'subchild' serves one purpose: such a subchild has to exist, otherwise I don't want to include the child in the result set.


            So the problem now is that there is no criterion for subchild, so the access query collects every node in the repository.


            I thought that the path index (which exists) would be used to narrow down the number of nodes that have to be fetched. Without it, my query is extremely slow.


            But how do I change that? There is no constraint on subchild that I could ask for. I simply want to get all child nodes that have a child of their own. I could add an index on primaryType for the subchildren, but the type of the subchildren in my real query is nt:unstructured, so that wouldn't help either.
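
            What I expected the path index to do can be sketched roughly like this, as a toy model in plain Java. Everything here is an assumption for illustration: nodes are path strings, node types are ignored, and the names (ChildIndexSketch, buildIndex, withChildren) are hypothetical and not part of ModeShape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a lookup keyed by parent path. Not ModeShape code.
public class ChildIndexSketch {

    // Builds a hypothetical index: parent path -> paths of its direct children.
    // Top-level nodes (whose parent is the root) are skipped in this toy.
    static Map<String, List<String>> buildIndex(List<String> allPaths) {
        Map<String, List<String>> index = new HashMap<>();
        for (String path : allPaths) {
            int slash = path.lastIndexOf('/');
            if (slash <= 0) {
                continue;
            }
            String parent = path.substring(0, slash);
            index.computeIfAbsent(parent, k -> new ArrayList<>()).add(path);
        }
        return index;
    }

    // Keeps only the candidates that have at least one direct child,
    // using one index lookup per candidate instead of scanning every node.
    static List<String> withChildren(List<String> candidates,
                                     Map<String, List<String>> index) {
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (index.containsKey(c)) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index =
                buildIndex(Arrays.asList("/p1", "/p1/c1", "/p1/c2", "/p1/c1/s1"));
        System.out.println(withChildren(Arrays.asList("/p1/c1", "/p1/c2"), index));
    }
}
```

            With such a lookup, the work per child node would depend on the number of its own children rather than on the number of nodes in the repository.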

            • 3. Re: Detail questions regarding indexes
              rhauch

              I'm surprised the implicit child node index is not used.

              • 4. Re: Detail questions regarding indexes
                bes82

                Should I report this as a bug?

                 

                Here is a slightly modified query that works on every ModeShape repository and doesn't use any indexes at all with 4.0:

                 

                SELECT sys.* FROM [mode:system] AS sys
                JOIN [nt:nodeType] AS ntx ON ISDESCENDANTNODE(ntx, sys)
                JOIN [nt:propertyDefinition] AS pd ON ISCHILDNODE(pd, ntx)