7 Replies Latest reply on Oct 22, 2013 12:04 PM by shawkins

    Reusable executions in continuous queries

    markaddleman

      I have an execution factory that manages two stored procedures that get called in an anonymous procedure block continuously.  Something like:

      BEGIN

         CALL t.sp1();

         CALL t.sp2();

      END

       

      sp1 is backed by a reusable execution which completes is lifecycle close().  When the engine executes sp2, it does not call the execution factory createProcedureExecution(), instead it picks the same execution instance as was used for sp1.  I'm not entirely sure of the intention behind reusable execution but this doesn't look like correct behavior to me.  I can develop a test case if that would help.

       

      This is against the 8.6-Alpha2-SNAPSHOT. 

        • 1. Re: Reusable executions in continuous queries
          shawkins

          > I'm not entirely sure of the intention behind reusable execution but this doesn't look like correct behavior to me.

           

          Since the initial correction to the implementation, reusable executions are essentially a pooling mechanism - they are not tied to a particular node in a plan (given that plans change, multiple queries, and even concurrent queries can be issued from a single node).

          • 2. Re: Reusable executions in continuous queries
            markaddleman

            Then I wonder about their usefulness.  Doesn't the current behavior mean that, effectively, all reusable executions from the same factory must have the same processing logic?  Separately, as a pooling mechanism, what is really being conserved?  Memory?  The CPU cycles for reallocating execution on the heap?

            • 3. Re: Reusable executions in continuous queries
              shawkins

              > Doesn't the current behavior mean that, effectively, all reusable executions from the same factory must have the same processing logic?  Separately, as a pooling mechanism, what is really being conserved?  Memory?  The CPU cycles for reallocating execution on the heap?

               

              As I understood the rationale it is for situations where there is more local state (or perhaps no backing resource-adapter defined) such that the executions themselves are effectively acting as the connections.

               

              > Then I wonder about their usefulness.

               

              There are certainly other ways to envision this, such as reuse based upon a particular (probably parameterized) command, or to attempt a 1-1 mapping with the plan - but the latter has issues conceptually with data driven executions (dependent joins), dynamic sql, etc. and would take some amount of rearchitecting to handle as the plan/node layer is effectively disconnected from the source layer.

              • 4. Re: Reusable executions in continuous queries
                markaddleman

                > As I understood the rationale it is for situations where there is more local state (or perhaps no backing resource-adapter defined) such that the executions themselves are effectively acting as the connections.

                 

                I'm going to have to digest this a bit more.

                 

                > There are certainly other ways to envision this, such as reuse based upon a particular (probably parameterized) command

                Fair enough, I hadn't considered this possibility.

                 

                In an idea that's an offshoot of the cache invalidation and re-execution coordination that we had discussed a few weeks ago, I've been playing around with a notion of a task management execution factory.  The idea is that tasks, written in Teiid procedure language language using anonymous blocks, would be submitted to the system using a stored procedure and the system would execute the task continuously.  I can see using this mainly for data pump kinds of activities such as destructively reading data off a queue and inserting/merging it into a table.

                 

                In my experiment, I have some code that looks like:

                try (c = getConnection; s = c.prepareStatement("call submit_task(?)"))

                {

                   s.setString(1, "begin merge into blah select from blahblah; select rowcount; end");

                   s.execute();

                   waitForever();

                }

                 

                The submit_task execution looks like:

                execute() {

                   s = commandContext.getConnection();

                   s.submitExecute(taskProcedure, callback, continuously);

                   globalRunningTasksCollection.add(s);

                }

                 

                The first problem I ran into is the connection received from commandContext.getConnection() already has a transaction associated with it and, thus, can't submitExecute().  I got around that problem by adding s.execute("set autoCommitTxn off")  - I suspect this truly isn't allowed since I think it just does an end-run around the verbotten setAutoCommit() call. 

                 

                The next problem is that it appears that every time the submitted task returns a result set, I get an exception TEIID30495 indicating that the receiver has been closed.  I think this is because the scope of the internal connection that launched the task has been closed even though the statement hasn't been explicitly closed and neither has the client connection (ie, "c" in the first pseudo code).  Is this right?

                 

                I'm probably abusing the notion of the CommandContext.getConnection() idea with this use case.  My motivation is that, ultimately, I'd like to expose this stored procedure as a REST service for clients to "program" our server for data pump activities and, ideally, I want to re-use Teiid's REST functionality.  Thoughts?

                • 5. Re: Reusable executions in continuous queries
                  shawkins

                  > I got around that problem by adding s.execute("set autoCommitTxn off")  - I suspect this truly isn't allowed since I think it just does an end-run around the verbotten setAutoCommit() call.

                   

                  Turning off the transaction auto detection is fine when we are being overly cautious about starting a transaction.  I'm not seeing though where we should be detecting unless multi-source, the obscure update count extension metadata, etc. is being used.  Just executing a single procedure would typically not require a transaction.

                   

                  If this is simply issuing back into the engine, what does the intermediate translator step help you do?  Why not just issue the sql blocks directly?  Reading ahead it looks like you want continuous and you want a REST hook.  Perhaps it would be better to just take over responsibility for the notion of continuous execution in a procedure that looks something like:

                   

                  create virtual procedure submit_task(string sql) as

                  begin

                     while (not done)

                         execute immediate sql;

                  end

                   

                  where you could control what it means to be done, when to reexecute, etc.  The only down side is currently there would be replanning of the dynamic sql, but I'm looking at what can be done with that now to better reuse dynamic plans. 

                   

                  > I think this is because the scope of the internal connection that launched the task has been closed even though the statement hasn't been explicitly closed and neither has the client connection (ie, "c" in the first pseudo code).  Is this right?

                   

                  Closing the internal connection should be a no-op.  Can you post an example of what you are trying to do.  There may be some issue with trying a continuous query against an internal connection that I didn't anticipate.

                   

                  > ideally, I want to re-use Teiid's REST functionality.  Thoughts?

                   

                  A simple rest or OData hook to a procedure should work perfectly well - provided that the procedure returns in a reasonable time.  But that may not be what you are looking for as it seems like you want to drop off a task that keeps running.  Since the REST logic will close the Statement that wouldn't be viable.  MMX did expose a custom web api for long running queries that expected the client to use a polling mechanism to get results rather than a simple request/response (OData does support pagination, but that doesn't seem applicable here).

                   

                  Steve

                  • 6. Re: Re: Reusable executions in continuous queries
                    markaddleman

                    > Turning off the transaction auto detection is fine when we are being overly cautious about starting a transaction.  I'm not seeing though where we should be detecting unless multi-source, the obscure update count extension metadata, etc. is being used.  Just executing a single procedure would typically not require a transaction.

                     

                    Attached is an example that fails with autoCommitTxn as default but works with autoCommitTxn set to off.

                     

                    > Closing the internal connection should be a no-op.  Can you post an example of what you are trying to do.

                     

                    The same example demonstrates the problem but now I think I understand it a little better.  The statement calling submit_task stored proc gets closed causing the submitted task's callback to receive an exception indicating that the results receiver has gone away.  The exception appears to cancel the request.  The engine canceling a task because the receiver has disappeared makes perfect sense when the task is a query but when the task is a side-affecting procedure, the logic doesn't apply.

                     

                    > If this is simply issuing back into the engine, what does the intermediate translator step help you do?  Why not just issue the sql blocks directly?

                     

                    I hadn't considered using execute immediate to achieve the same ends.  The only problem is one of my own making:  The execution restart framework that I'm putting together relies on CommandContext.getReuseCount() to know when it's the first or some subsequent execution of the same request.  I could figure out some other sentinel. 

                     

                    > The only down side is currently there would be replanning of the dynamic sql, but I'm looking at what can be done with that now to better reuse dynamic plans.

                     

                    Although re-using the plan would be nice, all this is speculative at this point so taking performance issues into account is premature. 

                     

                    > A simple rest or OData hook to a procedure should work perfectly well - provided that the procedure returns in a reasonable time.  But that may not be what you are looking for as it seems like you want to drop off a task that keeps running.  Since the REST logic will close the Statement that wouldn't be viable. 

                     

                    I suppose I could write my down data source that produced statements whose close() methods did nothing.

                     

                    > MMX did expose a custom web api for long running queries that expected the client to use a polling mechanism to get results rather than a simple request/response

                    MMX?  I'm not familiar with that.

                    • 7. Re: Re: Reusable executions in continuous queries
                      shawkins

                      > MMX?  I'm not familiar with that.

                       

                      Sorry, that's the abbreviation for MetaMatrix, the old product.