6 Replies Latest reply on Dec 14, 2010 12:29 PM by edouardh

    Descendant node query and shareable nodes

    edouardh

      Hello,

      I'm facing a problem when trying to query shared nodes using the ISDESCENDANTNODE operator. Let's consider this tree :

       

      /A/B/X
      


      X being a shareable node. Considering this other tree :

      /E/F


      If I clone the X node under F (resulting in /E/F/X) and run a query such as :

      select * from [type of X] as x where isdescendantnode(x, [/A])


      Then I get X. But if I run this other query :

      select * from [type of X] as x where isdescendantnode(x, [/E])


      Then I don't get any result. It looks like ModeShape considers X to be a descendant of A only, not of E. Jackrabbit has the same behaviour. So, is it the expected behaviour that a shared node is considered a descendant only of its first parent, or is that a bug?

      Thanks,

      Edouard

        • 1. Re: Descendant node query and shareable nodes
          rhauch

          The JCR 2.0 specification states in Section 14.16:

           

          If a query matches two or more nodes in a shared set, whether all of these nodes or just one is returned in the query result is an implementation issue.

           

          This variability is allowed since different implementations might have different “natural” behaviors, and it would be expensive for an implementation to compute the answer that is “unnatural” for that implementation.

           

          If a query matches a descendant node of a shared set, it appears in query results only once.

           

          So, if I'm interpreting this correctly, an implementation can either return multiple shared nodes satisfying the query in the result set, or only one. ModeShape currently only returns the shareable node as if it exists in one location. It sounds like Jackrabbit does the same thing. IOW, the behavior is allowed by the specification.

           

          I would understand why someone would want the other behavior, however. But IMO, the **best** behavior would be to make a shared subgraph appear in the query results either:

           

          1. in only one place, or
          2. in every place that subgraph was shared.

           

          But the last sentence in the quote above appears to not allow #2 above. It'd be really tough to implement, but it's unfortunate they don't allow this.

          • 2. Re: Descendant node query and shareable nodes
            edouardh

            Thanks for your answer.

             

            Well, I think this part of the spec doesn't apply to this actual case. The spec permits that, when you have /A/B/X1 and /E/F/X2, X1 and X2 being in the same shared set, a query on / for its descendants of type X would return either X1 or X2 but not both (no duplicates). But in my own case, the query is not on / but on either /A or /E : the query would not return two or more nodes anyway, but only one. But actually, it doesn't return any for /E.

             

            Sadly, the spec is not clear on this point. X1 is a child of B, X2 is a child of F, X1 and X2 are "the same", but both X1 and X2 return the same parent node (B, if X1 was created first). I agree, it might be tough to implement!

            • 3. Re: Descendant node query and shareable nodes
              rhauch

              I'd have to agree with you that the specification does not explicitly address your case, where your query is matching and returning "/A" and "/E" based upon constraints applied to the descendant nodes of "/A" and "/E". Since node "/A/B/X" was shared to "/E/F/X", the issue at hand is how query (descendant) constraints apply to nodes in a shared set and their descendants.

               

              However, I would argue that the specification does indirectly apply, for two reasons.

               

              First, the specification doesn't make a distinction between the kinds of queries or constraints that should apply to the nodes in the shared set and their descendants. Therefore, it should be possible to have a single, consistent model or notion of how the query engine works for all cases, including shared nodes and their descendants.

               

              Second, the only single, consistent model that works for all queries and all nodes is one in which the query engine can only see the shared node (or one of the other nodes in the shared set) and its descendants, and cannot see any of the other nodes in the shared set or their descendants. This conceptual model would mean that the criteria can only be applied to the nodes visible to the query engine, and thus either the nodes at/under "/A/B/X" or "/E/F/X" (but not both). Therefore, your query should return either "/A" or "/E" but not both.

               

              The only alternative conceptual model that would be consistent in all cases would be for the query engine to see ALL nodes in the shared set AND their descendants. After all, this is entirely consistent with the behavior of the API. But as I was alluding to earlier, this alternative model is not allowed by the specification, because (per the last paragraph of Section 14.16) the descendants of the nodes in a shared set are only allowed to 'appear' in one place in a query result.
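To make the two conceptual models concrete, here is a minimal sketch in plain Java. It is a toy enumeration, not ModeShape or Jackrabbit code. The class name, the flag, and the child node "Y" under the shared node are all hypothetical and only serve to show which paths a query engine would see under each model.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only; it does not use or mirror any JCR implementation.
final class SharedVisibilitySketch {

    // Paths of the nodes in the shared set; the first one is treated as the original.
    static final String[] SHARED_AT = {"/A/B/X", "/E/F/X"};

    // Hypothetical child of the shared node, reachable below every path above.
    static final String[] RELATIVE_DESCENDANTS = {"Y"};

    // Lists the paths a query engine would see.
    // seeAllSharedNodes = false: only one node of the shared set and its descendants.
    // seeAllSharedNodes = true: every node of the shared set and its descendants.
    static List<String> visiblePaths(boolean seeAllSharedNodes) {
        List<String> paths = new ArrayList<>();
        int limit = seeAllSharedNodes ? SHARED_AT.length : 1;
        for (int i = 0; i < limit; i++) {
            paths.add(SHARED_AT[i]);
            for (String rel : RELATIVE_DESCENDANTS) {
                paths.add(SHARED_AT[i] + "/" + rel);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println("one node visible:  " + visiblePaths(false));
        System.out.println("all nodes visible: " + visiblePaths(true));
    }
}
```

Under the second model the descendant Y shows up at two paths, which is the situation the last paragraph of Section 14.16 is read as excluding.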

               

              What's strange is that I can't really come up with a single model of a query engine that can consistently handle the case where an implementation might return/consider all of the nodes in a shared set, yet return/consider only the descendants of ONE of the nodes in the shared set. IMO, this is completely inconsistent with the way shared nodes are treated in the API and how queries are defined in the specification.

              • 4. Re: Descendant node query and shareable nodes
                edouardh

                Randall Hauch wrote:

                 

                I'd have to agree with you that the specification does not explicitly address your case, where your query is matching and returning "/A" and "/E" based upon constraints applied to the descendant nodes of "/A" and "/E". Since node "/A/B/X" was shared to "/E/F/X", the issue at hand is how query (descendant) constraints apply to nodes in a shared set and their descendants.

                Maybe my point was not clear about this query : I do not mean to match both /A and /E at the same time. Instead, I want to match all the descendant nodes of one certain node, that are of a certain type, that type being shareable. This would be : "return all the nodes of type X that are descendants of /A" or "return all the nodes of type X that are descendants of /E", but not : "return all the nodes of type X that are descendants of /A or that are descendants of /E". This makes all the difference : in the last case, a shared node between /A and /E could match twice and be returned twice or not, depending on the interpretation of the JCR. But in my case, the shared node doesn't match twice as there is only one path that leads to it in the context of the query. If a shared node should not match twice in a resultset, it should also match at least once. But, as far as I could see, it only matches when I query the descendants of /A, but not the descendants of /E.

                 

                May I suggest you take a look at the test case I submitted with issue MODE-1051? Maybe it will make more sense.

                • 5. Re: Descendant node query and shareable nodes
                  rhauch

                  Edouard Hue wrote:

                   

                  But, as far as I could see, it only matches when I query the descendants of /A, but not the descendants of /E.

                   

                  You are right, this also is not specifically addressed in the spec, since it leaves a lot open to the particular implementation. The current behavior of ModeShape is that ONLY the original shareable node and its descendants are visible to queries (which appears to be per the spec by my reading). In your case, because '/E/F/X' was shared from '/A/B/X', only '/A/B/X' (and its descendants) are visible to queries, while neither '/E/F/X' nor any of its descendants are.

                   

                  I just added a question about your test case on MODE-1051.

                  • 6. Re: Descendant node query and shareable nodes
                    edouardh

                    This whole point is about how to interpret this sentence :

                    If a query matches two or more nodes in a shared set, whether all of these nodes or just one is returned in the query result is an implementation issue.

                    As said in the comments of MODE-1051, ModeShape's query engine ignores clone nodes and only considers the originals. For a query such as "select * from [t:x] as x where isdescendantnode(x, '/B')", this is equivalent to :

                    • remove all clone nodes, keep originals
                    • keep all nodes of type t:x
                    • keep descendants of /B

                    Sadly, I'm not totally convinced that this behaviour conforms to the spec, because clone nodes are removed before they are matched against the where clause. /B/X is a descendant of /B and has the t:x type, so it matches the query. IMO, the spec says that when there are several clones in a result set, all but one could be removed. But clone nodes can be removed only from among the nodes that match the query. As I understand it, to conform, the behaviour should be:

                    • keep all nodes of type t:x
                    • keep descendants of /B
                    • remove all remaining clone nodes, keep originals
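The difference between the two orderings above can be shown with a small sketch in plain Java. This is a toy in-memory model, not the ModeShape or Jackrabbit query engine. The `Entry` class, the `sharedSet` identifier and the method names are hypothetical, and the data reuses the /A/B/X and /E/F/X example from the first post. One assumption to flag: when the clones are removed last, the sketch keeps one match per shared set and prefers the original only if the original also matched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names come from the JCR API.
final class SharedNodeQuerySketch {

    // One entry per path at which a node is reachable.
    static final class Entry {
        final String path;      // absolute path, e.g. "/E/F/X"
        final String type;      // node type name, e.g. "t:x"
        final String sharedSet; // hypothetical id; non-shared nodes would need a unique one
        final boolean original; // true for the node that was created first

        Entry(String path, String type, String sharedSet, boolean original) {
            this.path = path;
            this.type = type;
            this.sharedSet = sharedSet;
            this.original = original;
        }
    }

    // Path-based stand-in for the ISDESCENDANTNODE constraint.
    static boolean isDescendant(Entry e, String ancestor) {
        if (ancestor.equals("/")) {
            return e.path.length() > 1;
        }
        return e.path.startsWith(ancestor + "/");
    }

    // First ordering: drop clones, then apply the type and path constraints.
    static List<String> clonesRemovedFirst(List<Entry> all, String type, String ancestor) {
        List<String> result = new ArrayList<>();
        for (Entry e : all) {
            if (e.original && e.type.equals(type) && isDescendant(e, ancestor)) {
                result.add(e.path);
            }
        }
        return result;
    }

    // Second ordering: apply the constraints, then keep one match per shared set.
    static List<String> clonesRemovedLast(List<Entry> all, String type, String ancestor) {
        Map<String, Entry> perSet = new LinkedHashMap<>();
        for (Entry e : all) {
            if (!e.type.equals(type) || !isDescendant(e, ancestor)) {
                continue;
            }
            Entry kept = perSet.get(e.sharedSet);
            // Assumption: prefer the original when both it and a clone match.
            if (kept == null || (e.original && !kept.original)) {
                perSet.put(e.sharedSet, e);
            }
        }
        List<String> result = new ArrayList<>();
        for (Entry e : perSet.values()) {
            result.add(e.path);
        }
        return result;
    }

    // /A/B/X is the original, /E/F/X is its clone in the same shared set.
    static List<Entry> sampleTree() {
        List<Entry> all = new ArrayList<>();
        all.add(new Entry("/A/B/X", "t:x", "set-1", true));
        all.add(new Entry("/E/F/X", "t:x", "set-1", false));
        return all;
    }

    public static void main(String[] args) {
        List<Entry> tree = sampleTree();
        System.out.println("clones removed first, under /E: " + clonesRemovedFirst(tree, "t:x", "/E"));
        System.out.println("clones removed last, under /E:  " + clonesRemovedLast(tree, "t:x", "/E"));
    }
}
```

With this sample data, the first ordering finds nothing under /E, while the second returns /E/F/X. A query rooted at / returns a single node either way, so the no-duplicates reading of the spec is still respected.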

                    I searched the spec for more details about shared nodes in queries, especially in section 3.9, but couldn't find anything relevant...