9 Replies Latest reply on Feb 5, 2015 3:55 PM by brmeyer

    Complex queries on parent/child relationships without correlated subqueries

    brmeyer

      (Stems from [MODE-2422] Support JCR-SQL2 correlated subqueries - JBoss Issue Tracker and [ARTIF-202] The not() function doesn't quite work with relationships - JBoss Issue Tracker)

       

      [sramp:baseArtifactType] > mix:created, mix:lastModified, mix:referenceable abstract mixin
      - sramp:uuid (string) mandatory
      - sramp:artifactType (string)
      ...
      + * (sramp:relationship)
      
      [sramp:relationship]
      - sramp:relationshipType (string)
      ...
      
      

       

      SELECT artifact1.* FROM [sramp:baseArtifactType] AS artifact1
      WHERE artifact1.[sramp:artifactType] = 'Part' AND 'element'
      NOT IN (
        SELECT relationship2.[sramp:relationshipType]
        FROM [sramp:relationship] AS relationship2
        WHERE ISCHILDNODE(relationship2,artifact1)
      )
      
      

       

      In this example, an "artifact" node has 0..n children representing various forms of "relationships" with other artifacts.  What we're trying to support is something like:  "Find me all 'Part' nodes that do not include a relationship of type 'element' ('element' is simply a literal).

       

      In order to return artifacts that either have no relationship children or artifacts that define relationships other than 'element', using 'in' with a subquery seemed like the best choice.  However, I didn't realize that ModeShape didn't support correlations in subqueries (ie, can't use the 'artifact1' selector).

       

      Any ideas on workarounds?  Any queries that would return 'artifact' if 0 relationship children exist, or none of the children include the given relationshipType?

       

      Thanks!

        • 1. Re: Complex queries on parent/child relationships without correlated subqueries
          rhauch

          How about this?

           

          SELECT DISTINCT artifact.* FROM [sramp:baseArtifactType] AS artifact 
          JOIN [sramp:relationship] AS relation ON ISCHILDNODE(relation,artifact)
          WHERE artifact.[sramp:artifactType] = 'Part' AND relation.[sramp:relationshipType] != 'element'
          

           

          I'm not sure if the DISTINCT is actually required.

          • 2. Re: Complex queries on parent/child relationships without correlated subqueries
            brmeyer

            Randall, wouldn't that still return an artifact that has an 'element' relationship, as long as it has a relationship of some other type?  That was the problem we were originally facing.  Filtering out an artifact that has an 'element' relationship, period (regardless of its other relationships), didn't seem doable simply through a join and condition.

            • 3. Re: Re: Complex queries on parent/child relationships without correlated subqueries
              rhauch

              Yeah, you're right.

               

              Does your app need to search for artifacts without a specific kind of relationship? For example, if you know that `element` is the only kind of relationship that you want to filter out, then a very simple approach is to put a boolean property on "sramp:artifact" that tracks whether or not it has an `entity` relationship, and the query becomes trivial -- and fast! This will even work and will be very efficient for a handful of these predefined relationship types, especially if you need to mix and match (e.g., without 'entity' or 'foo' but with 'bar'). It requires only a little bit of code to flip this boolean property when relationships of the specific type are added/removed.

               

              SELECT artifact.* FROM [sramp:baseArtifactType] AS artifact
              WHERE artifact.[sramp:artifactType] = 'Part' AND
              artifact.[sramp:hasElementRelationship] = 'false'
              

               

              However, this approach does becomes much harder if you need a dynamically changing kind of relationship to filter out. To handle this more generally, you could create a subquery that finds all artifacts that *have* one or more relationships of type `entity`, and then use that to find all artifacts that are not in that set defined by a subquery with the first query:

               

              SELECT artifact.* FROM [sramp:baseArtifactType] AS artifact
              WHERE artifact.[sramp:artifactType] = 'Part'
              AND artifact.[mode:id] NOT IN (
                SELECT DISTINCT artifactWithEntity.[mode:id] FROM [sramp:baseArtifactType] AS artifactWithEntity
                JOIN [sramp:relationship] AS relation ON ISCHILDNODE(relation,artifactWithEntity)
                WHERE relation.[sramp:relationshipType] = 'element'
                AND artifactWithEntity.[sramp:artifactType] = 'Part'
              )
              
              
              

               

              Note that any criteria on artifacts (e.g., artifactType = 'Part') should be used in both the upper query and the subquery: this will help keep the number of nodes relatively small in each case. And, in this case, you probably want to have indexes on both "[sramp:baseArtifactType].[sramp:artifactType]" and "[sramp:relationship].[sramp:relationshipType]".

               

              You could even use a bind variable rather than the `element` literal value. And of course you'll want to test how this subquery performs and scales.

               

              WDYT?

              1 of 1 people found this helpful
              • 4. Re: Re: Re: Complex queries on parent/child relationships without correlated subqueries
                rhauch

                Another idea is to break the relationships into groups of children. So under an artifact you'd have a child named `elements` and under that all of the relationship nodes with a type of `element`. Then finding all artifacts that have no 'element' relationship becomes something like:

                 

                SELECT artifact.* FROM [sramp:baseArtifactType] AS artifact
                OUTER JOIN [sramp:relationshipContainer] AS relationshipContainer ON ISCHILDNODE(relationshipContainer,artifact)  
                WHERE artifact.[sramp:artifactType] = 'Part' AND
                AND NAME(relationshipContainer) != 'entity'
                CHILDCOUNT(relationshipContainer) = 0
                

                 

                The query needs an OUTER JOIN so that it also returns all artifacts that have no child nodes (which means it doesn't have an 'entity' relationship.

                 

                Of course, this may not really be very useful in your case, but I thought I'd throw it out.

                1 of 1 people found this helpful
                • 5. Re: Re: Re: Complex queries on parent/child relationships without correlated subqueries
                  brmeyer

                  Does your app need to search for artifacts without a specific kind of relationship? For example, if you know that `element` is the only kind of relationship that you want to filter out, then a very simple approach is to put a boolean property on "sramp:artifact" that tracks whether or not it has an `entity` relationship, and the query becomes trivial -- and fast! This will even work and will be very efficient for a handful of these predefined relationship types

                  Definitely a fair point, but won't be feasible in Artificer's specific context.  In addition to several dozen built-in types of relationships, we also allow ad hoc relationships to be created.  So, all of the types are not known ahead of time.

                  To handle this more generally, you could create a subquery that finds all artifacts that *have* one or more relationships of type `entity`, and then use that to find all artifacts that are not in that set defined by a subquery with the first query

                  Great idea -- will give that a shot.  But, as you point out, the only concern is performance...

                  • 6. Re: Re: Re: Complex queries on parent/child relationships without correlated subqueries
                    brmeyer

                    Another idea is to break the relationships into groups of children. So under an artifact you'd have a child named `elements` and under that all of the relationship nodes with a type of `element`.

                    Good grief, that's exactly how I have it organized.  That's perfect.  Thanks much!

                    • 7. Re: Re: Re: Complex queries on parent/child relationships without correlated subqueries
                      brmeyer

                      Randall, actually, now that I look at it, not quite following your last suggested query.  Is a piece of it missing?

                      • 8. Re: Re: Re: Re: Complex queries on parent/child relationships without correlated subqueries
                        rhauch

                        Well, maybe my description wasn't clear. My last suggestion was to introduce another node level between the artifact nodes and the relationship nodes:

                         

                        - 'A1' jcr:primaryType=sramp:someArtifactType
                          - 'element' jcr:primaryType=sramp:relationshipContainer
                            - 'rel1' jcr:primaryType=sramp:relationship sramp:relationshipType='element'
                            - 'rel2' jcr:primaryType=sramp:relationship sramp:relationshipType='element'
                          - 'foo' jcr:primaryType=sramp:relationshipContainer
                            - 'rel3' jcr:primaryType=sramp:relationship sramp:relationshipType='foo'
                            - 'rel4' jcr:primaryType=sramp:relationship sramp:relationshipType='foo'
                        

                         

                        Again, I'm not sure if this really works for everything else, but these relationship containers basically organize the relationships with a single type, so relationship nodes with a different type are represented under a different relationship container. If an artifact has no relationships of a particular type, then either (a) there is no relationship container for that relationship type, or (b) there is a container for that relationship type but it is empty and has no children.

                         

                        This allows you to query to find all artifacts that have no relationships of a particular type by using criteria on the # of child nodes (via CHILDCOUNT) of a specific relationship type node (level 2), or by the absence of the relationship type node (via the OUTER part). Essentially, this allows you to join just the top 2 levels.

                         

                        SELECT artifact.* FROM [sramp:baseArtifactType] AS artifact 
                        OUTER JOIN [sramp:relationshipContainer] AS relationshipContainer ON ISCHILDNODE(relationshipContainer,artifact)   
                        WHERE artifact.[sramp:artifactType] = 'Part' AND 
                        AND NAME(relationshipContainer) != 'entity' 
                        CHILDCOUNT(relationshipContainer) = 0 
                        

                         

                        Does that at least help explain the concept?

                         

                        BTW, JCR-SQL2 makes it very difficult to use criteria that asserts there are no children with some particular attributes. It's the absence-of that isn't covered in the query object model.

                        • 9. Re: Re: Re: Re: Re: Complex queries on parent/child relationships without correlated subqueries
                          brmeyer

                          Randall, thanks again.  You're right -- I was misreading what you were saying.  Artificer's structure is really similar:

                           

                          - 'A1' jcr:primaryType=sramp:someArtifactType
                            - 'sramp-relationships:element' jcr:primaryType=sramp:relationship sramp:relationshipType='element'
                              - ...jcr:primaryType=sramp:target...
                              - ...jcr:primaryType=sramp:target...
                            - 'sramp-relationships:fooRel' jcr:primaryType=sramp:relationship sramp:relationshipType='fooRel'
                              - ...jcr:primaryType=sramp:target...
                              - ...jcr:primaryType=sramp:target...
                          

                           

                          Each primary node can have 0..n relationship children, but only 0..1 for any given relationshipType.  So, your "relaionshipContainer" is already there, in a sense.

                           

                          Again, I'm not sure if this really works for everything else, but these relationship containers basically organize the relationships with a single type, so relationship nodes with a different type are represented under a different relationship container. If an artifact has no relationships of a particular type, then either (a) there is no relationship container for that relationship type, or (b) there is a container for that relationship type but it is empty and has no children.

                           

                          This allows you to query to find all artifacts that have no relationships of a particular type by using criteria on the # of child nodes (via CHILDCOUNT) of a specific relationship type node (level 2), or by the absence of the relationship type node (via the OUTER part). Essentially, this allows you to join just the top 2 levels.

                          Definitely understand what you're describing.  However, doesn't your example query suffer from the original problem?  Since our relationships can be ad-hoc, there's no way to define all the "relationshipContainers" ahead of time, so that container might not exist at all.  'CHILDCOUNT(relationshipContainer) = 0' would never occur in this context.  So, with the OUTER JOIN and NAME() use, you'd still be returning artifacts that have some relationship container, even if 'element' was present.


                          BTW, JCR-SQL2 makes it very difficult to use criteria that asserts there are no children with some particular attributes. It's the absence-of that isn't covered in the query object model.

                          Right, that's exactly what we're hitting

                           

                          Thanks for your patience and descriptions.