9 Replies Latest reply on Jul 30, 2010 2:12 AM by rhauch

    Finding children with given path in JCR_SQL2

    simon.g

      Hi all

       

      In a repository structure like this (depicted as XML):

      <a>
          <b name="b1">
              <c>
                  <d name="d1"/>
                  <d name="d2"/>
              </c>
          </b>
          <b name="b2">
              <c>
                  <d name="d3"/>
              </c>
          </b>
      </a>

       

      I am trying to express a condition with will return all "d" nodes. Expressed in standard XPath, such a path would be /a/b/c/d returning all d's.

       

      1) Trying to stick to standard JCR SQL2 conditions, I gave the following expression a try:
           SELECT * FROM [nt:base] WHERE (ischildnode('/a/b/c'))

      ... but this path seems to be interpreted as /a/b[1]/c. I acutally could not find any section in the spec which described this, but assume it is correct since jackrabbit behaves the same.

       

      2) So I gave the followin modeshape-specific operand a try:

           SELECT * FROM [nt:base] WHERE (PATH() LIKE '/a/b%/c%/d%')

      which only returned the node at path

           /a/b[1]/c/d[1]

       

      3) Based on intuition I executed the follwoing query:

           SELECT * FROM [nt:base] WHERE (PATH() LIKE '/a/b[*]/c[*]/d[*]%')

      which finally returned the result I was looking for:

           /a/b/c/d

           /a/b/c/d[2]

           /a/b[2]/c/d

       

      Regarding this now my questions:

      • I didn't find * as placeholder for all indices mentioned in the spec. Is this correct?
      • Why does the path /a/b%/c%/d% from 2) not equal all paths found in 3)?
      • Is it correct for the path in 3) to require a "%" at the end?
      • And last but not least: How could this be expressed in a JCR query without the Path operand?

       

      Thanks a lot for any feedback. Still new the JCR SQL and trying to get the hang of it.

      Simon

        • 1. Re: Finding children with given path in JCR_SQL2
          rhauch

          Simon Gunzenreiner wrote:

          ...

          I am trying to express a condition with will return all "d" nodes. Expressed in standard XPath, such a path would be /a/b/c/d returning all d's.

           

          When dealing strictly with path constraints, XPath is substantially more concise and easier. I'm not sure why it was deprecated in the JCR 2.0 specification, but it is available as a supported query language in ModeShape.

          1) Trying to stick to standard JCR SQL2 conditions, I gave the following expression a try:

               SELECT * FROM [nt:base] WHERE (ischildnode('/a/b/c'))

          ... but this path seems to be interpreted as /a/b[1]/c. I acutally could not find any section in the spec which described this, but assume it is correct since jackrabbit behaves the same.

           

          Section 6.7.21 describes the use of the ISCHILDNODE criteria, and 6.7.22 describes the use of the ISDESCENDANTNODE criteria, and both are written in terms of the "Path" rule, which defined in section 6.7.23 as a valid JCR path (not a matching expression for multiple paths).

          2) So I gave the followin modeshape-specific operand a try:

               SELECT * FROM [nt:base] WHERE (PATH() LIKE '/a/b%/c%/d%')

          which only returned the node at path

               /a/b[1]/c/d[1]

          Using a LIKE constraint on the path is the correct way to express this constraint (though there are other ways of specifying the full criteria; see below). Unfortunately, the specification does not really go into the semantics of how the wildcard characters are to be handled inside a path-like expression. Should the paths be required to have the SNS-index qualifiers?

           

          ModeShape decided to approach this in a way that allowed queries to have the greatest amount of control, but this may result in some initial confusion.  (Plus, documenting all the capabilities of a query language is actually fairly difficult. The grammar is hard enough to document, let alone all of the semantics.)

           

          ModeShape attempts to parse the LIKE expression for a path in a similar manner to a regular path expression. Thus, for each segment in the path, a lack of a SNS qualifier (e.g., "[3]") is always interpreted as "[1]". Thus, your constraint would be interpreted as "/a[1]/b%[1]/c%[1]/d%[1]". Any wildcards appearing in a path segment between the '[' and ']' characters would then be treated as range constraints on the SNS indexes. Thus "/a[_]" would match any single-digit SNS index (e.g., 1-9), whereas "/a[%]" would match all SNS indexes.

           

          So in your case you want to match the paths regardless of SNS index, so you have to use wildcards in the SNS index portion of all path segments: "/a[%]/b[%]/c[%]/d[%]" would match all nodes named 'd' below nodes named 'c' that are below nodes named 'b' below any top-level nodes named 'a'. You could even specify wildcards in the names: "/a%[%]/b%[%]/c%[%]/d%[%]" would match all nodes with a name starting with 'd' below nodes with a name starting with 'c' that are below nodes with names starting with 'b' below any top-level nodes with names starting with 'a'.

           

          3) Based on intuition I executed the follwoing query:

               SELECT * FROM [nt:base] WHERE (PATH() LIKE '/a/b[*]/c[*]/d[*]%')

          which finally returned the result I was looking for:

               /a/b/c/d

               /a/b/c/d[2]

               /a/b[2]/c/d

           

          This was a good guess, but it actually is an accident of the implementation: under the covers ModeShape is actually using regular expressions as part of the query execution, and ModeShape is passing the regular expression multi-character wildcard symbol '*' down to the underlying processor. So it happens to work, but "/a/b[%]/c[%]/d[%]" is the JCR-SQL2 compliant form.

           

          Also, '%' at the end of the path makes the LIKE expression interpreted as the prefix of any path. Thus, "/a/b[%]/c[%]/d[%]" will match a node at "/a[1]/b[2]/c[3]/d[2]" but will not match a node at "/a[1]/b[2]/c[3]/d[2]/e[1]". On the other hand, "/a/b[%]/c[%]/d[%]/%" will match both of these nodes (plus all descendants of any of the 'd' nodes.

          Regarding this now my questions:

          • I didn't find * as placeholder for all indices mentioned in the spec. Is this correct?
          • Why does the path /a/b%/c%/d% from 2) not equal all paths found in 3)?
          • Is it correct for the path in 3) to require a "%" at the end?

          I think I answered these questions above.

          • And last but not least: How could this be expressed in a JCR query without the Path operand?

           

          JCR-SQL2 does not really offer an approach to handle queries with non-exact (e.g., pattern-based) path constraints other than "... PATH() LIKE ..." approach. However, it is possible to use ISDESCENDANTNODE criteria to find descendant nodes that are lower than an exact path. For example, this query will find all nodes below "/a":

           

               SELECT * FROM [nt:base] WHERE ISDESCENDANTNODE('/a')

           

          You can narrow the set of returned nodes further with additional constraints, such as those on node types and mixin types (using joins), or names, depths, etc., using other constraints.

           

          I hope this helps!

           

          Best regards,

           

          Randall

          • 2. Re: Finding children with given path in JCR_SQL2
            simon.g

            Hi Randall

             

            Thanks a lot for the detailed answer, it clarified a lot. Regarding the usage of XPath, I first tired that but then found some behaviour that I did not understand. For example, the following XPath query results in only one node (in the same a b c d structure as above):

             

            query = /jcr:root/a/b/c/d

                 /a/b/c/d

             

            In XPath, one would expect all d nodes to be returned, like when the following query is used:

             

            query = /jcr:root//d

                 a/b/c/d
                 /a/b/c/d[2]
                 /a/b[2]/c/d

             

            What I also did not understand was the once I used the element() test like this:

             

            query = /jcr:root/a/b/c/element(d, *)

                 /a/b/c/d

                 /a/b/c/d[2]

             

            only the child nodes of b[1], but not of b[2] were returned.

             

            Best regards

            Simon

            • 3. Re: Finding children with given path in JCR_SQL2
              rhauch

              Simon,

               

              I think you are right that an XPath query like "/jcr:root/a/b/c/d" should find all nodes (with any SNS index), and should equate to a JCR-SQL2 query like this:

               

              {code:sql}SELECT * FROM [nt:base] WHERE PATH() LIKE '/a[%]/b[%]/c[%]/d[%]'{code}

               

              Instead, ModeShape is incorrectly generating this JCR-SQL2 query:

               

              {code:sql}SELECT * FROM [nt:base] WHERE PATH() LIKE '/a/b/c/d'{code}

               

              Would you care to log an issue? It shouldn't be too hard to fix.

              • 4. Re: Finding children with given path in JCR_SQL2
                rhauch

                Actually, I'm having second thoughts about what ModeShape should do when it is presented with a LIKE constraint on a path, where that LIKE expression does not (always) include same-name-sibling indexes.

                 

                So, given this LIKE constraint:

                 

                {code:sql} ... PATH() LIKE '/a[1]/b[2]/c[3]/d[4]' ...{code}

                 

                there really is no ambiguity in what this means, since every segment in the path contains an explicit SNS index. Even this LIKE constraint has a fairly obvious behavior:

                 

                {code:sql} ... PATH() LIKE '/a[1]/b[2]/%/c[3]/d[4]' ...{code}

                 

                In this case, we don't care what nodes are actually below '/a[1]/b[2]' and above 'c[3]/d[4]', just that there is at least one node (since there is a path delimiter before and after the '%').

                 

                However, what should the following LIKE constraint imply?

                 

                {code:sql} ... PATH() LIKE '/a/b/c/d' ...{code}

                 

                I think there are two very sensible possibilities for handling this kind of query, where some path segments lack an explicit SNS index:

                1. the lack of a SNS index implies all SNS indexes, or
                2. the lack of a SNS index implies an SNS index of '1'

                 

                Either option allows a user to explicitly specify the SNS and get their desired behavior. So we just need to decide what is the best behavior when the SNS indexes are not specified.

                 

                Thoughts?

                • 5. Re: Finding children with given path in JCR_SQL2
                  bcarothers

                  I think that the behavior for this case might be easier to understand if the JCR--SQL2 behavior mirrored the XPath behavior an equivalent path constraint.

                   

                  In other words, I think we should select option 1.

                  • 6. Re: Finding children with given path in JCR_SQL2
                    rhauch

                    That's a great point, and it would also make the XPath behavior match (since we are translating XPath to JCR-SQL2). I suspect Simon would prefer that as well, since that's the gist of the original question.

                     

                    I'll log a defect to this effect.

                    • 7. Re: Finding children with given path in JCR_SQL2
                      simon.g

                      Hi Brian, Randall

                       

                      I can only confirm your points regarding the interpretation of Path LIKE expressions in SCR-SQL2. Once this "defect" is addressed, would this also mean that an XPath query like "/a/b/c/d" would result in all d nodes under any /a/b/c path?

                       

                      Simon

                      • 8. Re: Finding children with given path in JCR_SQL2
                        rhauch

                        I've logged MODE-834 and have been trying a fix, but I soon realized that "jcr:path" constraints in JCR-SQL are required to interpret the lack of a SNS index as equivalent to a SNS of 1, and there is even one TCK test that verifies this behavior. Thus, I think JCR-SQL2 Path(...) constrains need to behave the same way.

                         

                        In other words, our handing of SNS indexes in JCR-SQL and JCR-SQL2 was correct, but our handling of them in XPath was incorrect.

                         

                        So, once MODE-834 is fixed, an XPath query like "/a/b/c/d" would indeed result in all d nodes under any /a/b/c path. In fact, Section 6.6.3.4 of the JCR 1.0 specification gives an example of how the following XPath query

                         

                          {noformat}/jcr:root/some/nodes// element(*, my:type){noformat}

                         

                        would translate into JCR-SQL as:

                         

                          {code:sql}SELECT * FROM my:type WHERE jcr:path LIKE '/some[%]/nodes[%]/%'{code}

                         

                        So, XPath defaults to matching any SNS index, whereas JCR-SQL (and I surmise JCR-SQL2) default to matching only SNS indexes equal to 1. Thus, to get the XPath-like behavior using JCR-SQL or JCR-SQL, you must insert "[%]" in each path segment that does not already have a SNS index.

                         

                        (In fact, this is what our XPath-to-JCRSQL2 translator will be doing after MODE-834 is fixed.)

                        • 9. Re: Finding children with given path in JCR_SQL2
                          rhauch

                          I just committed the fix for MODE-834 into SVN, and will mark the issue as resolved. If you get a chance to try the latest code in trunk, let us know if it now works as you expect.

                           

                          I hope this helped!