5 Replies Latest reply on Jun 16, 2014 1:12 PM by shawkins

    Optimizations for hierarchical joins

    markaddleman

      Our UI has a notion of a "focal object" - It's the set of information that the user is primarily interested in.  The focal object is represented to Teiid as a table or an inline view.  The UI shows the first n rows of the focal object and rows of related objects.  In a lot of cases, the related objects are sparsely populated and the desire is to show default values for the missing related data.  I think of this as a hierarchical join between the focal object and everything else.

       

      I can think of a few ways to express the hierarchical idea in Teiid SQL but the most straightforward is to outer join the focal object with the related using the optional hint.  Something like this:

      SELECT * FROM focal_object LEFT OUTER JOIN /*+ optional */ related_object ON focal_object.a=related_object.a LIMIT 10

       

      This produces the desired execution plan by pushing the LIMIT clause to focal_object source and then joining that result to related_object.  This technique doesn't work, however when an ORDER BY is introduced:

      SELECT * FROM focal_object LEFT OUTER JOIN /*+ optional */ related_object ON focal_object.a=related_object.a ORDER BY focal_object.c LIMIT 10

       

      In this case, Teiid pushes an ORDER BY a to both focal_object and related_object (presumably to execute an efficient join) and then performs the order by focal_object.c itself and finally executing the limit clause.  Wouldn't it be more efficient to push down order by focal_object.c limit 10 to focal_object source and then join those results to related_object?  The results may not be right but "right" takes on a looser definition with the optional hint.  I have tried making focal_object a dependent join but that doesn't affect the push down in the desired way, either.

       

      Attached is my test case.

        • 1. Re: Optimizations for hierarchical joins
          shawkins

          >  Wouldn't it be more efficient to push down order by focal_object.c limit 10 to focal_object source and then join those results to related_object? The results may not be right but "right" takes on a looser definition with the optional hint.

           

          The logic currently says only perform the optional join removal if no columns from the optional are used.  I don't think either of the queries that you show above will meet that condition given that you are using select *.  If you just select from focal_object, then the optional hint will kick in. 

          • 2. Re: Optimizations for hierarchical joins
            markaddleman

            > The logic currently says only perform the optional join removal if no columns from the optional are used.  I don't think either of the queries that you show above will meet that condition given that you are using select *.  If you just select from focal_object, then the optional hint will kick in.


            Makes sense.  So, it seems the optional hint really isn't the way to achieve a hierarchical join.  Instead, I think we can approach this problem using an inline view or a CTE.  Given how our query generation system is built, it is a lot easier to implement using CTEs but I've noticed that the execution plans for CTEs aren't as optimized as inline views.  I know there's a jira to improve this but I can't find it quickly.  Specifically, I've noticed that WITH q AS (SELECT * FROM T) SELECT q.a FROM q will fetch all columns from T.  Perhaps this belies a misunderstanding of CTE but eventually, do you expected Teiid to process CTEs with the same efficiency as inline views?



            • 3. Re: Optimizations for hierarchical joins
              shawkins

              [TEIID-2481] common table (with) push down improvements - JBoss Issue Tracker allows more optimization with CTEs in 8.8.  The initial commit allows for CTEs in views/inline views to be merged into their parent plans, but it does not yet go to the extent of performing projection minimization nor some of the other optimizations mentioned in the issue.  Yes we do expect the CTE logic to eventually be fully optimized, it's just that their planning path doesn't follow the same logical structure as the rest of relational planning so it's a little more difficult to add in optimizations.

              • 4. Re: Optimizations for hierarchical joins
                markaddleman

                > Yes we do expect the CTE logic to eventually be fully optimized, it's just that their planning path doesn't follow the same logical structure as the rest of relational planning so it's a little more difficult to add in optimizations.

                 

                Thanks, that helps.  Now for the $64,000 question:  Do you think you'll get most of the optimizations onto the CTEs in the 8.9 timeframe?  I'm trying to judge whether it's worth us investing in our query generator to improve its inline view capability or whether we should just wait for CTEs to be more fully optimized.

                • 5. Re: Optimizations for hierarchical joins
                  shawkins

                  > Do you think you'll get most of the optimizations onto the CTEs in the 8.9 timeframe?

                   

                  Hopefully, but if you could tack on specific needs to the issue, then we can spawn off subtasks and have a better picture of progress.