26 Replies. Latest reply on Nov 30, 2012 1:01 PM by shawkins

Lifecycle in ReusableExecutions

markaddleman Nov 1, 2012 1:25 PM

We're still struggling a bit with continuous executions. I believe that this is the lifecycle of a ReusableExecution under continuous execution:

1. Constructor, goto 2
2. Execute, goto 3
3. Next
   1. If next returns non-null, goto 3
   2. If next returns null, goto 4
   3. If next throws DataNotAvailable.NO_POLLING, goto 7
   4. If next throws DataNotAvailable(timeout / date), goto 8
4. Close, goto 5
5. Reset, goto 2
6. When ExecutionContext.dataAvailable() called, goto 5
7. When timeout / date occurs, goto 5
8. Dispose (only reached when client cancels the statement)

Is this correct?

It appears that a batch of results is delivered to the client at step 3.1, if the number of rows is large enough, and at step 3.2.

Should batches be delivered to the client immediately on 3.3 and 3.4?

1. Re: Lifecycle in ReusableExecutions

rareddy Nov 2, 2012 9:15 AM (in response to markaddleman)
I think it is more like

1. ReusableExecution Constructor, goto 2
2. Execute, goto 3
3. Next
   1. If next returns non-null, goto 3
   2. If next returns null, goto 4
   3. If next throws DataNotAvailable.NO_POLLING, goto 6
   4. If next throws DataNotAvailable(timeout / date), goto 7
4. Close, goto 5
5. Reset, goto 2
6. When ExecutionContext.dataAvailable() called, goto 3
7. When timeout / date occurs, goto 3
8. Dispose (only reached when the client calls close or cancel on the statement)
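The corrected lifecycle can be read as a small state machine. The Java sketch below is a hypothetical model, not Teiid source code: the enum and method names (`Step`, `NextOutcome`, `after`, `afterNext`) are invented for illustration, and the step numbers in the comments follow the list above.

```java
import java.util.Map;

// Hypothetical model of the ReusableExecution lifecycle listed above; not Teiid code.
// Step numbers in the comments refer to the numbered list.
final class ReusableLifecycle {

    enum Step {
        CONSTRUCT, EXECUTE, NEXT, CLOSE, RESET, AWAIT_DATA_AVAILABLE, AWAIT_TIMEOUT, DISPOSE
    }

    // Possible outcomes of a single next() call (steps 3.1 to 3.4).
    enum NextOutcome { ROW, END_OF_RESULTS, DNA_NO_POLLING, DNA_TIMEOUT_OR_DATE }

    // Successors of every step whose transition does not depend on an outcome.
    private static final Map<Step, Step> FIXED = Map.of(
            Step.CONSTRUCT, Step.EXECUTE,          // 1 -> 2
            Step.EXECUTE, Step.NEXT,               // 2 -> 3
            Step.CLOSE, Step.RESET,                // 4 -> 5
            Step.RESET, Step.EXECUTE,              // 5 -> 2
            Step.AWAIT_DATA_AVAILABLE, Step.NEXT,  // 6 -> 3 once dataAvailable() is called
            Step.AWAIT_TIMEOUT, Step.NEXT);        // 7 -> 3 once the timeout/date passes

    static Step after(Step current) {
        Step successor = FIXED.get(current);
        if (successor == null) {
            throw new IllegalStateException(current + " has no outcome-independent successor");
        }
        return successor;
    }

    static Step afterNext(NextOutcome outcome) {
        switch (outcome) {
            case ROW:                 return Step.NEXT;                 // 3.1
            case END_OF_RESULTS:      return Step.CLOSE;                // 3.2
            case DNA_NO_POLLING:      return Step.AWAIT_DATA_AVAILABLE; // 3.3
            case DNA_TIMEOUT_OR_DATE: return Step.AWAIT_TIMEOUT;        // 3.4
            default: throw new IllegalArgumentException("unknown outcome " + outcome);
        }
    }

    // Step 8: reachable from any step once the client closes or cancels the statement.
    static Step onClientCloseOrCancel() {
        return Step.DISPOSE;
    }
}
```

Keeping the outcome-dependent transitions of next() in their own method mirrors the list: steps 3.1 to 3.4 branch, while every other step has a single successor.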

> Should batches be delivered to the client immediately on 3.3 and 3.4?

Ans: In 3.3 and 3.4 the translator is indicating that it has no results to send to the engine, so the engine will not send results in this case. Once all the results (up to each close call) from all the translators participating in the transaction have been gathered, the results are sent to the client. In certain scenarios, if the results are large enough, they can be sent as soon as the batch size is reached.

Ramesh..
2. Re: Lifecycle in ReusableExecutions

shawkins Nov 2, 2012 1:43 PM (in response to rareddy)

Ramesh, yes - reset is not called until after close as it indicates another execution will begin.

For 3.3 and 3.4, the current logic in the access node does not generally consider a blocking event a reason to release rows that have already been collected. If the hasPendingRows check is elevated out of its if block, then you would see results immediately in this case as well.

That would be a valid behavior change, especially in this case. In general, with poorly configured source batch sizes it would add overhead from small batches, but it would of course reduce latency for plans that basically pass batches through.
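To make the trade-off concrete, here is a hypothetical Java sketch, not the actual access node code; `PendingRowBuffer` and its `flushOnBlock` switch are invented names. With the switch off, rows collected before a blocking DataNotAvailable stay buffered (the current behavior described above); with it on, a partial batch is released immediately, which is roughly the effect of hoisting the hasPendingRows check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the batching trade-off; this is not the Teiid access node.
// Rows are buffered until a batch fills; flushOnBlock decides whether a blocking
// DataNotAvailable also releases a partial batch.
final class PendingRowBuffer {
    private final int batchSize;
    private final boolean flushOnBlock;
    private final List<String> pending = new ArrayList<>();
    private final List<List<String>> delivered = new ArrayList<>();

    PendingRowBuffer(int batchSize, boolean flushOnBlock) {
        this.batchSize = batchSize;
        this.flushOnBlock = flushOnBlock;
    }

    void addRow(String row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush(); // a full batch is always released
        }
    }

    // Called when the source blocks (DataNotAvailable from next()).
    void onBlocked() {
        // Lower latency at the cost of possibly small batches.
        if (flushOnBlock && !pending.isEmpty()) {
            flush();
        }
    }

    // Called when next() returns null.
    void onEndOfResults() {
        if (!pending.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        delivered.add(new ArrayList<>(pending));
        pending.clear();
    }

    List<List<String>> deliveredBatches() { return delivered; }

    int pendingCount() { return pending.size(); }
}
```

With a small source batch size and `flushOnBlock` enabled, every blocking event can emit a tiny batch, which is the overhead mentioned above.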

Steve
3. Re: Lifecycle in ReusableExecutions

markaddleman Nov 11, 2012 12:28 PM (in response to rareddy)
I recently discovered that DataNotAvailable can be thrown from the execute() method. I hope this is supported behavior; otherwise we will need very complex state management in next(). If it is supported, I believe the lifecycle is:

1. ReusableExecution Constructor, goto 2
2. Execute, goto 3
   1. If execute throws DataNotAvailable.NO_POLLING, goto 8
   2. If execute throws DataNotAvailable(timeout / date), goto 9
   3. else goto 3
3. Next
   1. If next returns non-null, goto 3
   2. If next returns null, goto 4
   3. If next throws DataNotAvailable.NO_POLLING, goto 6
   4. If next throws DataNotAvailable(timeout / date), goto 7
4. Close, goto 5
5. Reset, goto 2
6. When ExecutionContext.dataAvailable() called, goto 3
7. When timeout / date occurs, goto 3
8. When ExecutionContext.dataAvailable() called, goto 2
9. When timeout / date occurs, goto 2
10. Dispose (only reached when the client calls close or cancel on the statement)

If this is correct, I'll produce a nice state diagram and place it in the wiki
4. Re: Lifecycle in ReusableExecutions

shawkins Nov 12, 2012 10:57 AM (in response to markaddleman)

Yes, at some point I removed the limitation on throwing DataNotAvailable from the execute method. However, it generally makes more sense to just return and then throw the DataNotAvailable exception from next. An update to the wiki/docs would definitely be welcome.

Steve
5. Re: Lifecycle in ReusableExecutions

markaddleman Nov 12, 2012 11:17 AM (in response to shawkins)

I can see how throwing DataNotAvailable from next() makes sense for data sources that are slow. In our cases, however, our data sources provide complete result sets very quickly and always return null from next. We throw DNA from execute so that the engine waits until the data sources have updated result sets available. Throwing DNA from execute simplifies the execution's state management tremendously. I'd like to expand the use of DNA: I'd like to see it thrown from close() as well. When restarting an execution that was suspended by a DNA thrown from either execute or close, the engine should start from the execute() method.
6. Re: Lifecycle in ReusableExecutions

rareddy Nov 12, 2012 12:13 PM (in response to markaddleman)

If you view execution and data retrieval as two separate phases, the above does not make sense, imo, especially when you do not know which execution style the translator's source supports in the execute method.
7. Re: Lifecycle in ReusableExecutions

markaddleman Nov 12, 2012 12:26 PM (in response to rareddy)

Our client has no knowledge of an underlying data source's particular execution/retrieval phases. The client always makes queries in continuous mode. Our translators are deeply aware of the execution style of the source. I don't see any way around that.

Are we skirting around a missing execution state and/or state transition? Does the API need to distinguish between pausing the retrieval phase and indicating that a new execution should run because the data source has new data?
8. Re: Lifecycle in ReusableExecutions

rareddy Nov 12, 2012 2:40 PM (in response to markaddleman)

That distinction already exists in the Teiid execution's state machine, the API calls to the "execute" vs "next" methods reflect that.

The DNA exception is solely to indicate to the engine that the data is currently not available, and Teiid provides a couple of different options for when it is appropriate for the engine to call back for results. One of those options is dataAvailable(), which indicates to the engine that data retrieval can resume. I do not see any correlation between the availability of the data event from the translator and the start of a new execution cycle by the engine.
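The call-back options can be modeled as a small value type. This is a hypothetical Java sketch: `RetryPolicy`, `noPolling`, `afterDelay`, and `atDate` are illustrative names, not the Teiid DataNotAvailableException API. It only shows the distinction drawn in this thread between waiting for dataAvailable() (NO_POLLING) and letting the engine retry after a timeout or at a date.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical model of the engine call-back options; these names are illustrative
// and are not the actual Teiid DataNotAvailableException API.
final class RetryPolicy {
    enum Kind { NO_POLLING, DELAY, DATE }

    private final Kind kind;
    private final Duration delay; // only meaningful for DELAY
    private final Instant date;   // only meaningful for DATE

    private RetryPolicy(Kind kind, Duration delay, Instant date) {
        this.kind = kind;
        this.delay = delay;
        this.date = date;
    }

    static RetryPolicy noPolling()            { return new RetryPolicy(Kind.NO_POLLING, null, null); }
    static RetryPolicy afterDelay(Duration d) { return new RetryPolicy(Kind.DELAY, d, null); }
    static RetryPolicy atDate(Instant when)   { return new RetryPolicy(Kind.DATE, null, when); }

    // When the engine would retry on its own; empty means it waits for
    // ExecutionContext.dataAvailable() instead.
    Optional<Instant> scheduledRetry(Instant now) {
        switch (kind) {
            case DELAY: return Optional.of(now.plus(delay));
            case DATE:  return Optional.of(date);
            default:    return Optional.empty(); // NO_POLLING
        }
    }
}
```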
9. Re: Lifecycle in ReusableExecutions

markaddleman Nov 12, 2012 8:14 PM (in response to rareddy)

> I do not see any correlation between the availability of the data event from translator and the start of a new execution cycle by the engine.

Then I'm missing the point of the continuous query. The whole point, in my mind, is to enable the client to be notified of new result sets as conditions in the data source warrant. The client's continuous query is, in effect, saying that it is interested in the following fields from particular data sources filtered in a particular way. The translator and data source promise to keep the client up to date, albeit with no promise that the result sets won't be redundant. The whole point of the translator's data available event is to make the process of informing the client efficient.

Am I missing something?
10. Re: Lifecycle in ReusableExecutions

shawkins Nov 12, 2012 9:53 PM (in response to markaddleman)

Just to make sure we're on the same page, I'm going back to the original post:

> I can see how throwing DataNotAvailable from next() makes sense for data sources that are slow. In our cases, however, our data sources provide complete result sets very quickly and always return null from next.

By "always return null", you mean after returning the appropriate results via next, correct? Yes, the original feature was for long-running, typically asynch, executions that should not tie up a Teiid thread. In the asynch case, the execute method typically returned quickly, and next() would throw DNA until the data arrived.
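A minimal sketch of that asynch style, assuming the source answers through a `CompletableFuture`. `AsyncStyleExecution` and its nested `DataNotAvailable` are hypothetical stand-ins rather than Teiid types; in a real translator, the future's completion would presumably also be where ExecutionContext.dataAvailable() gets called.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch of the asynch execution style; DataNotAvailable is a local
// stand-in, not Teiid's exception class.
final class AsyncStyleExecution {
    static final class DataNotAvailable extends RuntimeException {
        DataNotAvailable() { super("data not yet available"); }
    }

    private final Supplier<CompletableFuture<List<String>>> source;
    private CompletableFuture<List<String>> pending;
    private Iterator<String> rows;

    AsyncStyleExecution(Supplier<CompletableFuture<List<String>>> source) {
        this.source = source;
    }

    // Starts the source request and returns immediately.
    void execute() {
        pending = source.get();
        rows = null;
    }

    // Next row, null at end of results, or DataNotAvailable while still waiting.
    String next() {
        if (rows == null) {
            if (!pending.isDone()) {
                throw new DataNotAvailable(); // frees the engine thread; next() is retried later
            }
            rows = pending.join().iterator();
        }
        return rows.hasNext() ? rows.next() : null;
    }
}
```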

> We throw DNA from execute to indicate that the data sources have updated result sets available. Throwing DNA from execute simplifies the execution's state management tremendously.

Are you catching/rethrowing that DNA in a delegating layer to make a distinction? It doesn't really make a difference to the engine if the DNA is thrown from the execute call or next. We just reattempt the respective call when appropriate.
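The retry behavior can be sketched from the engine's side. This is a hypothetical driver, not Teiid's implementation; the `Execution` interface and `RetryingDriver` are invented for illustration. It remembers which call was in progress when DataNotAvailable was thrown and re-attempts exactly that call on the next resume, whether it was execute() or next().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical engine-side driver (not Teiid's implementation): whichever call
// threw DataNotAvailable is simply re-attempted when the engine resumes.
final class RetryingDriver {
    static final class DataNotAvailable extends RuntimeException {}

    interface Execution {
        void execute();
        String next(); // null signals end of results
        void close();
    }

    // Where the driver picks up on the next resume().
    private enum Resume { EXECUTE, NEXT, DONE }

    private final Execution execution;
    private final List<String> collected = new ArrayList<>();
    private Resume resumeAt = Resume.EXECUTE;

    RetryingDriver(Execution execution) {
        this.execution = execution;
    }

    // Runs until the execution blocks (returns false) or finishes (returns true).
    // In the engine, a resume would be triggered by dataAvailable() or a timeout.
    boolean resume() {
        try {
            if (resumeAt == Resume.EXECUTE) {
                execution.execute(); // re-attempted as-is if it threw DNA last time
                resumeAt = Resume.NEXT;
            }
            while (resumeAt == Resume.NEXT) {
                String row = execution.next(); // likewise re-attempted after a DNA
                if (row == null) {
                    execution.close();
                    resumeAt = Resume.DONE;
                } else {
                    collected.add(row);
                }
            }
            return true;
        } catch (DataNotAvailable blocked) {
            return false; // resumeAt is unchanged, so the same call is retried later
        }
    }

    List<String> rows() {
        return collected;
    }
}
```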

> I'd like to expand the use of DNA: I'd like to see it thrown from close() as well. Restarting an execution suspended using DNA from either execute or close, the engine should start from the execute() method.

It would seem odd to go back to execute from close via a DNA. A case could be made for re-attempting the close, but our general policy is a best-effort close, so currently the engine typically does not care if a close fails; the failure is just logged. Also, we should only have reached close either from a query close/cancel (in which case we don't want to restart) or by returning null from next. It seems like, rather than returning null from next, the translator should have thrown a DNA to indicate that more results are expected/possible. Can you expand on this more if I'm not following you?
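A best-effort close amounts to catching and logging. This hypothetical sketch uses `AutoCloseable` as a stand-in for an execution and a list in place of a real logger; the point is only that a failed close is recorded and never routes back to execute().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a best-effort close policy; the list stands in for a logger.
final class BestEffortClose {
    final List<String> log = new ArrayList<>();

    void closeQuietly(AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception e) {
            // No retry and no return to execute(): record the failure and move on.
            log.add("close failed: " + e.getMessage());
        }
    }
}
```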

Steve
11. Re: Lifecycle in ReusableExecutions

markaddleman Nov 13, 2012 12:44 PM (in response to shawkins)
> By always return null, you mean after returning the appropriate results via next correct?
Yes. We return an entire result set and indicate the end with null.

> Are you catching/rethrowing that DNA in a delegating layer to make a distinction? It doesn't really make a difference to the engine if the DNA is thrown from the execute call or next. We just reattempt the respective call when appropriate.
In fact we are but, as you say, it shouldn't make any difference to the engine.

> It would seem odd to go back to execute from close via a DNA. A case could be made for re-attempting the close, but our general policy is for best effort close - so currently the engine typically does not really care if a close fails and will just be logged. Also we should only have reached close from either a query close/cancel (in which case we don't want to restart) or by returning null from next. It seems like rather than returning null from next the translator should have thrown a DNA to indicate more results are expected/possible. Can you expand on this more if I'm not following you?

After reflecting on this, I believe the engine's current behavior is to restart the execution at the same method which threw the DNA. So, if execute() throws, the engine will call execute() again after the appropriate timeout or dataAvailable(). Same goes for next(). Is that right?

If so, this explains the communication gap between us. My comment about throwing DNA from close() was about simplifying code rather than following the state model. To explain in a bit more detail:
In all of our cases, execute() is cheap.
The first call to execute() never throws DNA, and execution proceeds through the normal lifecycle of nexts until null, then close, then reset. The second call to execute() almost always throws DNA.
At some later time, asynchronously, the translator calls dataAvailable() and the engine calls execute(), which must prepare the result set. We go through the normal lifecycle of nexts until null, then close, then reset. The following call to execute() throws DNA.

Because execute() does double duty under this model, each execution must keep a state flag that indicates whether the execute() should throw DNA or prep for the result set. Before understanding the engine's restart behavior, it seemed to me that allowing close() to throw DNA which would result in restarting the execution at execute() would simplify code in the executions (no need to keep state for the double behavior of execute()).
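The state flag described here might look like the following hypothetical Java sketch. `DoubleDutyExecution`, `onSourceUpdated`, and the nested `DataNotAvailable` are illustrative names, not Teiid API. The flag starts set so the first execute() never blocks, and each later execute() throws until the source reports new data.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch (illustrative names, not Teiid's API) of an execute() that
// does double duty, driven by a state flag.
final class DoubleDutyExecution {
    static final class DataNotAvailable extends RuntimeException {}

    private final Supplier<List<String>> snapshot; // cheap, complete result set
    private final AtomicBoolean newDataReady = new AtomicBoolean(true); // first execute() never blocks
    private Iterator<String> rows;

    DoubleDutyExecution(Supplier<List<String>> snapshot) {
        this.snapshot = snapshot;
    }

    // Called by the source's change listener; a real translator would also
    // call ExecutionContext.dataAvailable() here.
    void onSourceUpdated() {
        newDataReady.set(true);
    }

    void execute() {
        // Consume the "new data" flag exactly once per result set.
        if (!newDataReady.compareAndSet(true, false)) {
            throw new DataNotAvailable(); // nothing new yet: the engine retries execute() later
        }
        rows = snapshot.get().iterator(); // prepare the result set
    }

    // Returns the whole result set row by row, then null.
    String next() {
        return rows.hasNext() ? rows.next() : null;
    }
}
```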

Assuming I have the engine's restart behavior correct (restarting from the method that threw DNA), I am not suggesting changing the engine's restart behavior just to make the execution state management easier.
12. Re: Lifecycle in ReusableExecutions

rareddy Nov 13, 2012 1:16 PM (in response to markaddleman)

To me it seems simpler not to allow DNA from the "execute" call, so that it is distinguished from a restart. This is in line with Steve's earlier comment about returning from the execute call and using DNA only in the "next" method.
13. Re: Lifecycle in ReusableExecutions

shawkins Nov 13, 2012 1:25 PM (in response to markaddleman)

> So, if execute() throws, the engine will call execute() again after the appropriate timeout or dataAvailable(). Same goes for next(). Is that right?

Yes, it is effectively a retry.

> Because execute() does double duty under this model, ..

Ideally it seems like reset() should handle that transition as it is called immediately before the next execution. However there may be a soft spot in the code, since if you throw a DNA from execute we'll still call reset on the next execution attempt.
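One way to act on that suggestion while tolerating the soft spot, in a hypothetical Java sketch with invented names: reset() arms the wait for new data only when a full cycle has completed (tracked through close()), so that, assuming reset() may be called again before a retried execute() that threw DataNotAvailable, the repeated reset() does not disturb the state.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch (invented names, not Teiid's API) where reset() owns the
// "wait for new data" transition and tolerates being called more than once.
final class ResetDrivenExecution {
    static final class DataNotAvailable extends RuntimeException {}

    private final Supplier<List<String>> snapshot;
    private final AtomicBoolean newData = new AtomicBoolean(false); // set by the source listener
    private boolean cycleCompleted = false;  // set by close()
    private boolean awaitingNewData = false; // armed by reset(); false for the first execution
    private Iterator<String> rows;

    ResetDrivenExecution(Supplier<List<String>> snapshot) {
        this.snapshot = snapshot;
    }

    void onSourceUpdated() {
        newData.set(true);
    }

    void execute() {
        boolean fresh = newData.getAndSet(false); // consume any pending update signal
        if (awaitingNewData && !fresh) {
            throw new DataNotAvailable(); // retried after dataAvailable()
        }
        awaitingNewData = false;
        rows = snapshot.get().iterator();
    }

    String next() {
        return rows.hasNext() ? rows.next() : null;
    }

    void close() {
        cycleCompleted = true;
    }

    void reset() {
        // Only a reset() that follows a completed cycle arms the wait; a repeated
        // reset() before retrying an execute() that threw DataNotAvailable is a no-op.
        if (cycleCompleted) {
            cycleCompleted = false;
            awaitingNewData = true;
        }
    }
}
```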

Steve
14. Re: Lifecycle in ReusableExecutions

markaddleman Nov 14, 2012 8:13 AM (in response to shawkins)

> Ideally it seems like reset() should handle that transition as it is called immediately before the next execution. However there may be a soft spot in the code, since if you throw a DNA from execute we'll still call reset on the next execution attempt.
Are you suggesting that reset() can throw DNA and the engine will restart at execute()? This seems like an ideal situation for at least a couple of reasons.
