Vetting an approach: Detecting changes in continuous queries
markaddleman Aug 17, 2012 12:00 PMWe have a UI requirement to auto-refresh data on a web page. The user will be able to freeze the data on-demand and then manipulate the data set in some way (eg, scroll through it or sort it). At a high-level, my plan is to execute the user query as a continuous execution to fill a temp table while the user query would be rewritten to source from the temp table and also be executed as a continuous execution.
Keeping the temp table up-to-date seems straightforward:
- Execute the continuous query
- On the first result set, examine the ResultSetMetadata and construct the temp table
- On each result set, delete the data from the temp table and fill with new data
With additional complexity, step 3 can probably be optimized (somewhat related note, Teiid support for something like http://www.h2database.com/html/grammar.html?highlight=merge&search=merge#merge would be nice)
Rewritting the UI query to execute against the temp table is not quite as straightforward. Ideally, I'd like to rewrite the query to only return results when the result set changes. I'm envisioning a scalar and aggregate functions called change_in and hash_of, respectively. Both of these pushdown-required functions would run in a reusable execution. change_in takes a single parameter and returns a boolean if the parameter value is the different from the previous execution (though, as described below, the value is only ever true). hash_of takes a variable list of parameters and returns a hash of the result set.
Thus, the user queries against the materialized view would be rewritten as:
Original query:
SELECT a,b,c FROM t
Rewritten query:
SELECT a,b,c FROM temp WHERE EXISTS (SELECT change_in(hash_of(a,b,c)) change FROM temp JOIN (CALL sync_point()) A)
The implementation specific entity that falls out is the sync_point stored procedure. It would work in concert with the code keeping the temp table up-to-date: When the temp table is updated, some cross-thread communication mechanism would alert sync_point() to call ExecutionContext.dataAvailable(). Then, the hash_of function would hash the entire result set (to keep memory footprint low) while change_in would compare the hash to the previous value. If change_in determines that there is a change, it calls ExecutionContext.dataAvailable() and returns a single-row result set, otherwise it throws DataNotAvailable to block the rest of the execution. The next time that the temp table is updated, of course, the sync_point would call dataAvailable(), its execution would run, starting the whole cycle over again.
One of the goals from this approach is to avoid generating an empty result set when there is no change in the hash. That would introduce incidental complexity in the UI.
Thoughts?