3 Replies Latest reply on Jan 8, 2013 9:18 AM by rhauch

    nt:unstructured nodes and queries

    bwallis42

      I'm writing some code to query some data in a repository. The main part of the repository has a well defined structure with my own defined node types and with this the queries are reasonably easy to write, for example:

       

      {code}

      SELECT s.[inf:name] FROM [inf:patient] AS p JOIN [inf:section] AS s ON ISCHILDNODE(s, p)

          WHERE p.[inf:masterNs] = $ns AND p.[inf:masterId] = $urno

      {code}

       

      Now I can create the data in the repository without the node types and use nt:unstructured nodes throughout. This has been done in a recent addition to the repository as the eventual structure was not clear when I begun. But it has occurred to me that queries against trees of nt:unstructured nodes are somewhat less clear to write as you cannot use the node type in the expression.

       

      For example, the above query expects a structure of inf:patient node types containing inf:section node types and that the nodes contain the attributes as shown. How would I write this query if I were to use nt:unstructured nodes instead? Do I need the following?

       

      {code}

      SELECT s.[inf:name] FROM [nt:unstructured] AS p JOIN [nt:unstructured] AS s ON ISCHILDNODE(s, p)

          WHERE p.[inf:masterNs] IS NOT NULL AND p.[inf:masterNs] = $ns AND p.[inf:masterId] IS NOT NULL AND p.[inf:masterId] = $urno

      {code}

       

      Will this query work? Not all nt:unstructured nodes that are contained in other nt:unstructured nodes will have the attributes as mentioned. Do I need to add the PropertyExistance expressions to the query? What about the columns (properties) returned. I only wanted the inf:name property.

       

      And finally, will the query across the data using my node types be more efficient than the one with the nt:unstructured node types?

       

      thanks.

        • 1. Re: nt:unstructured nodes and queries
          rhauch

          Brian Wallis wrote:

           

          I'm writing some code to query some data in a repository. The main part of the repository has a well defined structure with my own defined node types and with this the queries are reasonably easy to write, for example:

           

          
          SELECT s.[inf:name] FROM [inf:patient] AS p JOIN [inf:section] AS s ON ISCHILDNODE(s, p) 
              WHERE p.[inf:masterNs] = $ns AND p.[inf:masterId] = $urno

           

          This original query will work exactly the same whether the "inf:pateint" and "inf:section" node types are the primary type or a mixin type on your nodes. The ModeShape tables include the following:

           

          • nt:base (jcr:primaryType, jcr:mixinTypes, jcr:path, jcr:score, jcr:name, mode:localName, mode:depth)
          • nt:unstructured (jcr:primaryType, jcr:mixinTypes, jcr:path, jcr:score, jcr:name, mode:localName, mode:depth)
          • inf:patient (inf:name, inf:masterNs, ..., jcr:path, jcr:score, jcr:name, mode:localName, mode:depth)
          • inf:section (..., jcr:path, jcr:score, jcr:name, mode:localName, mode:depth)

           

          So a node that has a primary type of "nt:unstructured" an a mixin of "inf:patient" will appear in the first three of these tables. Your original query uses only the columns from the "inf:patient" and "inf:section" tables, and thus that query will continue to work.

           

           

           

          
          SELECT s.[inf:name] FROM [nt:unstructured] AS p JOIN [nt:unstructured] AS s ON ISCHILDNODE(s, p) 
              WHERE p.[inf:masterNs] IS NOT NULL AND p.[inf:masterNs] = $ns AND p.[inf:masterId] IS NOT NULL AND p.[inf:masterId] = $urno
          

           

           

          Will this query work? Not all nt:unstructured nodes that are contained in other nt:unstructured nodes will have the attributes as mentioned. Do I need to add the PropertyExistance expressions to the query? What about the columns (properties) returned. I only wanted the inf:name property.

           

           

          This query will also work (see below), but is a lot more complicated than your first query. Thus, there's no real reason do use this more complicated query.

           

           

          And finally, will the query across the data using my node types be more efficient than the one with the nt:unstructured node types?

          In fact, your second query will be more expensive because it has a JOIN. But more generally, it is no more expensive to execute a query that uses custom node types than it would be to run a similar query against the built-in node types.

           

           

           

          What tables exist?

           

          Per the JCR specification, every node type corresponds to a "table" that you can query, and conceptually any given node will appear as a row in every "table" for which "node.isNodeType(table)" is true. Also, tThe JCR specification requires that each "table" have columns for each of the node type's explicit or inherited non-residual, single-valued property definitions. Therefore, the following are several of the tables that are expected to exist in any JCR repository given "inf:patient" and "inf:section" node types (the required columns are in parentheses):

           

          • nt:base (jcr:primaryType)
          • nt:unstructured (jcr:primaryType)
          • inf:patient (inf:name, inf:masterNs, ...)
          • inf:section (...)

           

          Note that "nt:unstructured" has "nt:base" as a supertype and thus has a column defined for the "jcr:primaryType" property definition. Also, the "jcr:mixinTypes" is a multi-valued property definition, and thus the specification does not require that this column exist. But "inf:patient" does not extend "nt:base" (if it is a mixin), and therefore will not have the "jcr:primaryType" column.

           

          However, ModeShape actually does support a couple of things beyond what the specification requires:

           

          1. tables DO include a column for single-valued AND multi-valued property definitions
          2. tables DO include several pseudo-columns for the path to the node, the score, the node' name, the node's local name, and the depth of the node
          3. tables DO include columns for residual properties

           

          Thus, in ModeShape the tables would look like this:

           

          • nt:base (jcr:primaryType, jcr:mixinTypes, jcr:path, jcr:score, jcr:name, mode:localName, mode:depth)
          • nt:unstructured (jcr:primaryType, jcr:mixinTypes, jcr:path, jcr:score, jcr:name, mode:localName, mode:depth)
          • inf:patient (inf:name, inf:masterNs, ..., jcr:path, jcr:score, jcr:name, mode:localName, mode:depth)
          • inf:section(..., jcr:path, jcr:score, jcr:name, mode:localName, mode:depth)

           

          The inclusion of these extra column in ModeShape makes the queries more useful. You can include any of these columns any place in a query that a column can be used, including in SELECT and WHERE expressions. The only tricky part is that the residual properties: they can be used explicitly in queries, but they will not be included in the results of a "SELECT *" query. (That's because the residual properties appear on each row but are not universal in all rows of the table.)

          1 of 1 people found this helpful
          • 2. Re: nt:unstructured nodes and queries
            bwallis42

            Thanks for the explanation of how things are decomposed into the tables and the comments on the queries in general.

            Randall Hauch wrote:

             

            Brian Wallis wrote:

             

            I'm writing some code to query some data in a repository. The main part of the repository has a well defined structure with my own defined node types and with this the queries are reasonably easy to write, for example:

             

            
            SELECT s.[inf:name] FROM [inf:patient] AS p JOIN [inf:section] AS s ON ISCHILDNODE(s, p) 
                WHERE p.[inf:masterNs] = $ns AND p.[inf:masterId] = $urno

             

            I have realised that I can better express this as

             

            {code}

            SELECT s.[inf:name] FROM [inf:section] AS s where ISCHILDNODE(s,"/ns.urno")

            {code}

             

            since the values of inf:masterNs and inf:masterId in my model are used to create the node name via concatentation with a '.' separator.

             

            I would hope that this query would be more efficient as it doesn't have the JOIN. Is this the case?

             

            thanks.

            • 3. Re: nt:unstructured nodes and queries
              rhauch

              Yes, it will be more efficient.

               

              If you're curious about how a particular query will be executed, you can always retrieve the query plan by casting to the ModeShape implementation class:

               

                 QueryResult result = ...

                  String plan = ((org.modeshape.jcr.query.JcrQueryResult)result).getPlan();

               

               

              (We really should make that part of our public API. I'll log an enhancement.)