13 Replies Latest reply on Feb 13, 2013 2:47 PM by nsabina

    Stored Procedures and the Teiid Planner

    markaddleman

      (moved from https://community.jboss.org/thread/221057?tstart=0 for better googling)

       

      Queries of the form:

      SELECT blah FROM t1, (CALL sp(10)) AS sp

      UNION

      SELECT blah FROM t2, (CALL sp(10)) AS sp

       

      It appears that the planner will invoke the stored procedure twice, one for each step of the union. 

       

      I believe that in queries of the form

      SELECT blah FROM t1, (SELECT * FROM v) AS v

      UNION

      SELECT blah FROM t2, (SELECT * FROM v) AS v

      the planner will execute the inline view only once.

       

      I have found that many custom executions could be exposed either as stored procedure or as views/tables.  I have found that implementing stored procedures is easier than implementing views/table so I'd like a way to hint the planner that a stored procedure is deterministic much in the same way that I with a UDF.  I see that we can setFunction(true) on the Procedure metadata object but the value doesn't appear to be read anywhere that I can find.

        • 1. Re: Stored Procedures and the Teiid Planner
          shawkins

          > It appears that the planner will invoke the stored procedure twice, one for each step of the union.

           

          You'll want to check the plan.  If you see the access node marked as shared (see https://issues.jboss.org/browse/TEIID-1598), then the actual procedure will only be invoked once and the results shared with each join.

           

          > the planner will execute the inline view only once.

           

          Generally that should not be the case.  We do not have a specific optimization that will convert a union of joins to a union with a single join.  Here again you would want to check to see if the view results in access nodes that are marked as shared.  Since there are potentially more optimizations that we can do in this case, it may not be as easy for the optimizer to see that the source queries are sharable - this is the thrust of https://issues.jboss.org/browse/TEIID-2139  For this specific usage it may be more straight-forward to optimize the scenario as a union/join problem.

           

          > so I'd like a way to hint the planner that a stored procedure is deterministic much in the same way that I with a UDF. 

           

          The shared command logic considers all procedure invocations with the same parameters as determinstic.  And in general we tend not to consider stored procedure invocation as non-determinstic in select query, however I don't think that is explicitly stated anywhere.

           

          > I see that we can setFunction(true) on the Procedure metadata object but the value doesn't appear to be read anywhere that I can find.

           

          The function property will actually cause teiid to treat the metadata as a scalar function.  This exists since the Designer initially split the relational and function metamodels which made it difficult to define functions in Designer in the same model as the other relational constructs.

           

          Steve

          • 2. Re: Stored Procedures and the Teiid Planner
            markaddleman

            I was writing my post from memory and it's possible that I misremembered the plan.  I'll check later today with our latest plans.

            • 3. Re: Stored Procedures and the Teiid Planner
              rokhmanov

              This is a test query which we are running to reproduce the problem:

               

                SELECT '1' FROM

              ( SELECT '1'  FROM performance_netmaster.RMTADDR ) AS RMTADDR ,

              ( EXEC chorus_time.timetable(1000) ) AS chorus_refresh_expression

                UNION ALL

                SELECT '1' FROM

              ( SELECT '1' FROM performance_nm.RTNWRTT ) AS RTNWRTT ,

              ( EXEC chorus_time.timetable(1000) ) AS chorus_refresh_expression

               


              The performance_netmaster.RMTADDR is a table backed by translator, the performance_nm.RTNWRTT is a dynamic view. Note that RTNWRT dynamic view is built by DDL select from RMTADDR, so both this elements are somewhat related - not sure if this is important. The execution plan for this query is attached.

              • 4. Re: Stored Procedures and the Teiid Planner
                shawkins

                As this thread spawned from a prior one I want to make sure we're talking about the same thing - what problem does this plan exhibit exactly?

                 

                Steve

                • 5. Re: Stored Procedures and the Teiid Planner
                  markaddleman

                  Sorry, seems we got our threads crossed.  This is referencing the ConnectorWorkItem issue discussed in https://community.jboss.org/thread/221057?tstart=0

                   

                  We continue to try for a test case but the current problem seems to be entertwined with a custom translator that connects to one of our backend systems so any case we provide would not be standalone.  To help us along, do you think the problem is influenced by the supports* of the underlying translator?  I suspect so but want to confirm.

                  • 6. Re: Stored Procedures and the Teiid Planner
                    rokhmanov

                    Steven, I might be wrong but I believe it is applicable for both threads. I am not sure why Mark split them, here is what I see: the problematic query gets stuck and does not return result back. By stuck I mean - I see translator repeat attempts in logs, like in continuous execution. But it is a regular non-continuous call. In debug mode I see the ConnectorWorkItem gets incorrect execution, like it was explained by Mark in the beginning of the thread. The actual query is much bigger and contans more unions, but can be simplified to the test query I provided in the last post. When I run this test query I see exactly the same behavior with passed wrong Execution.

                     

                    My test query is also the same type as the sample query in the beginning of this new thread, we have Stored Procedures with supports Reusable in both subqueries (named "timetable"). The plan also shows the same planner behavior you discussed with Mark above - I see the same element "performance_netmaster.RMTADDR.CURR1MINOFFSET" appear twice in both Projects of UnionAllNode (see AccessNode(3) and AccessNode(8) of OPTIMIZATION COMPLETE section).

                     

                    My post is just an attempt to help you and Mark with a simpler plan and simpler query. I am having a hard time to make a unit test using embedded teiid - I cannot find a proper VDB configuration to reproduce it with dummy translators and views, and I cannot make (yet) our performance_nm translator to work in Embedded mode.

                    • 7. Re: Stored Procedures and the Teiid Planner
                      shawkins

                      So the issue here https://issues.jboss.org/browse/TEIID-2391 was due to the tuple source sharing logic not providing the correct nodeid, which of course is required by the reusable logic.

                       

                      To continue the deterministic/non-deterministic discussion from above, if you ever do need non-deterministic stored procedure handling specific to tuple source sharing, one way to achive this is by returning a CacheDirective with a scope of none for the relevant procedure executions.  However there will still be other instances that we treat procedure executions as deterministic, such as for automatically determining the scope for results caching, or in correlated subquery execution.  So if you find a general need for non-deterministic procedure handling that will need to be another issue.

                       

                      Steve

                      • 8. Re: Stored Procedures and the Teiid Planner
                        markaddleman

                        So the issue here https://issues.jboss.org/browse/TEIID-2391 was due to the tuple source sharing logic not providing the correct nodeid, which of course is required by the reusable logic.

                        Thanks a bunch for the fix.  The beer counter keeps growing.

                         

                        To continue the deterministic/non-deterministic discussion from above, if you ever do need non-deterministic stored procedure handling specific to tuple source sharing, one way to achive this is by returning a CacheDirective with a scope of none for the relevant procedure executions.  However there will still be other instances that we treat procedure executions as deterministic, such as for automatically determining the scope for results caching, or in correlated subquery execution.  So if you find a general need for non-deterministic procedure handling that will need to be another issue.

                        I like the default of treating views and stored procedures as deterministic as the semantics play closer to read consistency.  If we ever do need non-deterministic procedures, we'll file an issue.

                        • 9. Re: Stored Procedures and the Teiid Planner
                          shawkins

                          https://issues.jboss.org/browse/TEIID-2322 was resolved in such a way as to render https://issues.jboss.org/browse/TEIID-2391 moot.  This addresses several potential problems with using the node id as the identifier for the ReuseableExecution, but it makes it more likely that the command will change when reset is called.

                          • 10. Re: Stored Procedures and the Teiid Planner
                            markaddleman

                            Thanks for the heads up.  We are using reusable executions in  two of our translators but only one of them has sufficient supports* where the command might actually change.  We have an "h2 continuous translator" The basic strategy is to put triggers on all of our H2 tables.  These triggers publish change events when an insert/update/delete is performed.  When called in continuous mode, our H2 continuous translator listens for changes on the relevant tables.  After returning results from the first execution, if a change occurs, it calls dataAvailable() and restarts the execution from the execute() method.

                             

                            We have implemented this H2 continuous translator by subclassing both H2ExecutionFactory and JDBCQueryExecution.  When a reset occurs, we set JDBCQueryExecution's context, connection and command under synchronized block to ensure a cache flush.  We also ensure that JDBCQueryExecution is called in a synchronized block (I've been meaning to post a question about the relationship of translators to the Java memory model but I haven't gotten around to it).  We don't currently ensure that next() is called in a synchronized block. 

                             

                            Do you see a problem with this approach?

                            • 11. Re: Stored Procedures and the Teiid Planner
                              shawkins

                              > Do you see a problem with this approach?

                               

                              Since you are updating the command with the reset, that covers the initial concern.  As for synchronization that will just depend on what state you are changing that next has access to.

                              • 12. Re: Stored Procedures and the Teiid Planner
                                markaddleman

                                Good point.  I see that columnDataTypes and results are computed in execute() so we'll synchronize on next() as well.

                                • 13. Re: Stored Procedures and the Teiid Planner
                                  nsabina

                                  Fix for https://issues.jboss.org/browse/TEIID-2322 really made a difference. As Mark says "beer counter keeps growing", this time at least by 10 I believe