8 Replies Latest reply on Mar 21, 2013 10:17 PM by markaddleman

    Joining with view versus joining with procedure

    nsabina

      Exactly the same query is executed on two systems using the same version of Teiid and pointing to the same backend. The only difference is that one system defines "chorus_time.timetable" as a view and the other one as a procedure.

       

      SELECT ........

      FROM (SELECT CHORUS_B.*

                    FROM (SELECT * FROM chorus_time.timetable WHERE period = 30000) AS chorus_refresh_expression,

                              (SELECT * FROM SECURITY_USERUPDATE.BASEUSER) AS CHORUS_B)

                              AS CHORUS_B

                 LEFT OUTER JOIN

                (SELECT CHORUS_J0.*

                   FROM (SELECT * FROM chorus_time.timetable WHERE period = 30000) AS chorus_refresh_expression,

                              (SELECT * FROM SECURITY.SYSINFOVIEW) AS CHORUS_J0)

                             AS CHORUS_J0

                ON CHORUS_J0.sysid = CHORUS_B.sysid

      WHERE (CHORUS_B.USERID = '....') AND (CHORUS_B.SYSID = '......') LIMIT 1

       

      Query is successfully executed on the system where timetable defind as procedure and executes endlessly on the second one. It seems to be related to LEFT OUTER JOIN between cross product joins

       

      All differences in execution plan seem to be related to how Teiid is processing join with timetable.

      Is it expected behavior ?

       

      Here is a fragment of one of the differences in the plan:

       

      ============= procedure (query returns results) ================

      CANONICAL PLAN:

      .................

                    Join(groups=[chorus_refresh_expression, CHORUS_B], props={JOIN_TYPE=CROSS JOIN, JOIN_STRATEGY=NESTED_LOOP, JOIN_CRITERIA=[]})

                      Source(groups=[chorus_refresh_expression], props={NESTED_COMMAND=SELECT * FROM chorus_time.timetable WHERE period = 30000, SYMBOL_MAP={chorus_refresh_expression.ts_start=chorus_time.timetable.ts_start, chorus_refresh_expression.ts_stop=chorus_time.timetable.ts_stop, chorus_refresh_expression.EXPECTED_PERIOD=chorus_time.timetable.EXPECTED_PERIOD, chorus_refresh_expression.ACTUAL_PERIOD=chorus_time.timetable.ACTUAL_PERIOD, chorus_refresh_expression.EXECUTION_COUNT=chorus_time.timetable.EXECUTION_COUNT, chorus_refresh_expression.period=chorus_time.timetable.period}})

                        Project(groups=[chorus_time.timetable], props={PROJECT_COLS=[chorus_time.timetable.ts_start, chorus_time.timetable.ts_stop, chorus_time.timetable.EXPECTED_PERIOD, chorus_time.timetable.ACTUAL_PERIOD, chorus_time.timetable.EXECUTION_COUNT, chorus_time.timetable.period]})

                          Select(groups=[chorus_time.timetable], props={SELECT_CRITERIA=period = 30000})

                            Source(groups=[chorus_time.timetable], props={NESTED_COMMAND=SELECT X.*, chorus_time.timetable.period AS period FROM (EXEC chorus_time.timetable(chorus_time.timetable.period)) AS X, SYMBOL_MAP={chorus_time.timetable.ts_start=X.ts_start, chorus_time.timetable.ts_stop=X.ts_stop, chorus_time.timetable.EXPECTED_PERIOD=X.EXPECTED_PERIOD, chorus_time.timetable.ACTUAL_PERIOD=X.ACTUAL_PERIOD, chorus_time.timetable.EXECUTION_COUNT=X.EXECUTION_COUNT, chorus_time.timetable.period=chorus_time.timetable.period}})

                              Project(groups=[X], props={PROJECT_COLS=[X.ts_start, X.ts_stop, X.EXPECTED_PERIOD, X.ACTUAL_PERIOD, X.EXECUTION_COUNT, chorus_time.timetable.period AS period]})

                                Source(groups=[X], props={NESTED_COMMAND=EXEC chorus_time.timetable(chorus_time.timetable.period), SYMBOL_MAP={X.ts_start=chorus_time.timetable.ts_start, X.ts_stop=chorus_time.timetable.ts_stop, X.EXPECTED_PERIOD=chorus_time.timetable.EXPECTED_PERIOD, X.ACTUAL_PERIOD=chorus_time.timetable.ACTUAL_PERIOD, X.EXECUTION_COUNT=chorus_time.timetable.EXECUTION_COUNT}})

                                  Project(groups=[], props={PROJECT_COLS=[chorus_time.timetable.ts_start, chorus_time.timetable.ts_stop, chorus_time.timetable.EXPECTED_PERIOD, chorus_time.timetable.ACTUAL_PERIOD, chorus_time.timetable.EXECUTION_COUNT]})

                                    Source(groups=[chorus_time.timetable], props={VIRTUAL_COMMAND=EXEC chorus_time.timetable(chorus_time.timetable.period)})

       

      ===============view ( query hangs) ========

      CANONICAL PLAN:

      ....................

                    Join(groups=[chorus_refresh_expression, CHORUS_B], props={JOIN_TYPE=CROSS JOIN, JOIN_STRATEGY=NESTED_LOOP, JOIN_CRITERIA=[]})

                      Source(groups=[chorus_refresh_expression], props={NESTED_COMMAND=SELECT * FROM chorus_time.timetable WHERE period = 30000, SYMBOL_MAP={chorus_refresh_expression.TS_START=chorus_time.timetable.TS_START, chorus_refresh_expression.TS_STOP=chorus_time.timetable.TS_STOP, chorus_refresh_expression.PERIOD=chorus_time.timetable.PERIOD, chorus_refresh_expression.ACTUAL_PERIOD=chorus_time.timetable.ACTUAL_PERIOD, chorus_refresh_expression.EXECUTION_COUNT=chorus_time.timetable.EXECUTION_COUNT}})

                        Project(groups=[chorus_time.timetable], props={PROJECT_COLS=[chorus_time.timetable.TS_START, chorus_time.timetable.TS_STOP, chorus_time.timetable.PERIOD, chorus_time.timetable.ACTUAL_PERIOD, chorus_time.timetable.EXECUTION_COUNT]})

                          Select(groups=[chorus_time.timetable], props={SELECT_CRITERIA=period = 30000})

                            Source(groups=[chorus_time.timetable])

        • 1. Re: Joining with view versus joining with procedure
          shawkins

          > All differences in execution plan seem to be related to how Teiid is processing join with timetable.

           

          The procedure invocation is nested under a generated inline view needed by the procedural relational syntax.  The snippets only shows the initial structural differences in the initial plan, which is entirely expected.  The final plan / processing plan would show any difference in the actual processing plans.

           

          > Is it expected behavior ?

           

          The processing plans many times will be functionally equivalent in a scenario like this, but procedures and tables can plan differently when used in joins mainly due to the lack of cardinality estimates with a procedure invocation.  Although it's not being used here, the join structure may also be somewhat forced with the use of a lateral join or implied access pattern to pass parameter values.

           

          So as to the question of why the query appears to be hanging, that is not clear from the post.  You'd could first check that it's not simply an issue related to continuous/asynch execution and not throw DataNotAvailable exceptions.

           

          Steve

          • 2. Re: Joining with view versus joining with procedure
            nsabina

            I took close look at reusable timer executions and it looks like there is no new instance of TimerExecution created for the second component of outer join. This causes the execution state being passed as WAIT and the result of left outer join being always suspended.

             

            Update 3/13. I am attaching the test case which shows that same instance of reusable execution is used for both selects from timetable

             

            ========== FIRST===================

            Line 13: 12 Mar 2013 14:08:38,937 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.5.315 Create State

            Line 19: 12 Mar 2013 14:08:38,943 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.5.315 Processing NEW request: SELECT chorus_time.timetable.TS_START FROM chorus_time.timetable WHERE chorus_time.timetable.PERIOD = 30000 LIMIT 1

            Line 29: 12 Mar 2013 14:08:38,959 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.5.315 Obtained execution

            Line 33: 12 Mar 2013 14:08:38,965 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.5.315 Executed command

            Line 35: 12 Mar 2013 14:08:38,968 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.5.315 Processing MORE request

            Line 36: 12 Mar 2013 14:08:38,970 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.5.315 Getting results from connector

            Line 37: 12 Mar 2013 14:08:38,971 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.5.315 Obtained last batch, total row count: 1

            Line 38: 12 Mar 2013 14:08:38,972 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.5.315 Remove State

            Line 39: 12 Mar 2013 14:08:38,973 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.5.315 Processing Close : SELECT chorus_time.timetable.TS_START FROM chorus_time.timetable WHERE chorus_time.timetable.PERIOD = 30000 LIMIT 1

            Line 41: 12 Mar 2013 14:08:38,977 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.5.315 Closed execution

             

            =========SECOND=================

            Line 15: 12 Mar 2013 14:08:38,939 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue77) FcjDtlJm8tnC.0.8.317 Create State

            Line 56: 12 Mar 2013 14:08:39,089 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue80) FcjDtlJm8tnC.0.8.317 Processing NEW request: SELECT chorus_time.timetable.TS_START FROM chorus_time.timetable WHERE chorus_time.timetable.PERIOD = 30000

            Line 58: 12 Mar 2013 14:08:39,093 PDT DEBUG [org.teiid.CONNECTOR] (Worker1_QueryProcessorQueue80) FcjDtlJm8tnC.0.8.317 Obtained execution

            Line 62: 12 Mar 2013 14:08:39,104 PDT DEBUG [org.teiid.BUFFER_MGR] (Worker1_QueryProcessorQueue80) FcjDtlJm8tnC.0.8.317 Blocking on DataNotAvailableException FcjDtlJm8tnC.0.8.317

            Line 74: 12 Mar 2013 14:08:39,421 PDT DEBUG [org.teiid.BUFFER_MGR] (Worker1_QueryProcessorQueue82) FcjDtlJm8tnC.0.8.317 Blocking on DataNotAvailableException FcjDtlJm8tnC.0.8.317

            =================================.

            • 3. Re: Joining with view versus joining with procedure
              shawkins

              Sabina,

               

              Assuming https://issues.jboss.org/browse/TEIID-2391 has been applied, that would be expected behavior as the ReusableExecution is available for reuse by the second execution.  Does your execution state need updated when reset is called?

               

              Steve

              • 4. Re: Joining with view versus joining with procedure
                nsabina

                Hi Steve,

                 

                Yes this patch is applied and working as expected. The issue here is that we need new instance of TimerExecution for each side of outer join ( in black bold). Each of them should execute once and then switch to WAIT status until refersh period is over. Right now we have one instance only for both side. For left side of the join it returns the timestamp and then switches to WAIT, so other side of the join is not merged/processed.

                 

                SELECT ........

                FROM (SELECT CHORUS_B.*

                              FROM (SELECT * FROM chorus_time.timetable WHERE period = 30000) AS chorus_refresh_expression,

                                        (SELECT * FROM SECURITY_USERUPDATE.BASEUSER) AS CHORUS_B)

                                        AS CHORUS_B

                           LEFT OUTER JOIN

                          (SELECT CHORUS_J0.*

                             FROM (SELECT * FROM chorus_time.timetable WHERE period = 30000) AS chorus_refresh_expression,

                                        (SELECT * FROM SECURITY.SYSINFOVIEW) AS CHORUS_J0)

                                       AS CHORUS_J0

                          ON CHORUS_J0.sysid = CHORUS_B.sysid

                WHERE (CHORUS_B.USERID = '....') AND (CHORUS_B.SYSID = '......') LIMIT 1

                 

                In provided test case you can see that other 2 subselects (in italic) each get it's own instance of H2Execution, so result is returned but there is nothing to merge with on the other side

                 

                Thanks,

                Sabina

                • 5. Re: Joining with view versus joining with procedure
                  shawkins

                  > In provided test case you can see that other 2 subselects (in italic) each get it's own instance of H2Execution, so result is returned but there is nothing to merge with on the other side

                   

                  With TEIID-2391 reusable executions are not tied to particular processing nodes, but rather pooled for reuse.  There is no guarentee that a new execution will be used rather reusable executions are reused based upon when they complete their processing relative to when other processing starts.  Do you want different behavior or are you seeing something you think is an issue?

                   

                  > Each of them should execute once and then switch to WAIT status until refersh period is over.

                   

                  That is not something that Teiid is aware of.  This really becomes a question of what you are trying to do so that we can map that to the current capabilities of Teiid or see if an enhancement is needed.

                  1 of 1 people found this helpful
                  • 6. Re: Joining with view versus joining with procedure
                    nsabina

                    Steve, you are correct, I think I see the dilemma. How do you distinguish if it is invoked for the same query submission or not?

                     

                    I am not sure why TEIID-2391 called the need to untie reusable execution from node ID but apparently it is causing a problem.

                    Ideally I believe Teiid should be able to recognize when to create new instance of reusable execution and when to pull existing for reuse. I do not see how this can be controlled before query is submitted to Teiid.

                     

                    If same reusable execution is called more than one time while single full query is processed (within same execution plan ?) , new instance should be created. If this same query submitted second time, the same reusable exectuions (matching either processing node) should be pulled again.

                    We continue looking for solution, currently changing query rewrite logic to join with single reusable execution. It is however rather limiting and we truely hope this can be solved.

                    If you have ideas how this can be achieved please share.

                     

                    PS. I tested the query with 8.3.0 and even I see slightly different behavior it is still using same instance of reusable execution for both sides of the join.

                    • 7. Re: Joining with view versus joining with procedure
                      shawkins

                      > I am not sure why TEIID-2391 called the need to untie reusable execution from node ID but apparently it is causing a problem.

                       

                      A node id is not sufficiently unique.  Subplans (via procedure calls or implicitly invoked materialization plans) can have the same node ids as the parent.  Also for dependent joins the same node can have multiple queries executed in parallel against it.  So while the node id was initially convient, it is not by itself currently capable of this tracking.

                       

                      > If this same query submitted second time, the same reusable exectuions (matching either processing node) should be pulled again.

                       

                      That was not an express requirement as I understood it, just that the execution could be reused for the lifetime of the user query.  Tracking by command is possible, but also has issues with dependent joins and with subplans that issue dynamic sql such that it may still be needed to view them as a pool of executions to limit the number created.

                       

                      One possiblity would be to prevent queries that use reusable executions from executing subplans or parallel executions to enforce a simple execution model and allow for a 1-1 tie between node and reusable execution.

                       

                      Maybe it would help if I understood better what your tie is between a timetable query and a base table.  Do you need a different refresh time for each base table (as you say is a single time not sufficient?).  What type of translator(s) is backing the base tables?

                       

                      > PS. I tested the query with 8.3.0 and even I see slightly different behavior it is still using same instance of reusable execution for both sides of the join.

                       

                      All that matters is the relative time between one execution finishs and the other starts.  There are changes in join processing to perform prefetching that may cause the execution timing to be different between those versions.

                      1 of 1 people found this helpful
                      • 8. Re: Joining with view versus joining with procedure
                        markaddleman

                        Thanks, Steve.  I have just realized the bug is in our translator code owing to my misunderstanding of the translator mechanics.  In particular:  instances of reusable executions can be reused across a query plan even within the same iteration of a continuous query.  The implication of this is reusable executions should not carry state related to the result set that the execution is producing (here, I include any state related to throwing or not DataNotAvailableException).  The proper place to store all this state is in a connection object.

                         

                        We've found reusable executions and continuous queries extremely powerful with a few gotchas.  I want to find the time to write a proper blog post about using them.