5 Replies Latest reply on Aug 28, 2012 11:30 AM by markaddleman

    Vetting an approach:  Detecting changes in continuous queries

    markaddleman

      We have a UI requirement to auto-refresh data on a web page.  The user will be able to freeze the data on-demand and then manipulate the data set in some way (eg, scroll through it or sort it).  At a high-level, my plan is to execute the user query as a continuous execution to fill a temp table while the user query would be rewritten to source from the temp table and also be executed as a continuous execution. 

       

      Keeping the temp table up-to-date seems straightforward:

      1. Execute the continuous query
      2. On the first result set, examine the ResultSetMetadata and construct the temp table
      3. On each result set, delete the data from the temp table and fill with new data

      With additional complexity, step 3 can probably be optimized (somewhat related note, Teiid support for something like http://www.h2database.com/html/grammar.html?highlight=merge&search=merge#merge would be nice)

       

      Rewritting the UI query to execute against the temp table is not quite as straightforward.  Ideally, I'd like to rewrite the query to only return results when the result set changes.  I'm envisioning a scalar and aggregate functions called change_in and hash_of, respectively.  Both of these pushdown-required functions would run in a reusable execution. change_in takes a single parameter and returns a boolean if the parameter value is the different from the previous execution (though, as described below, the value is only ever true).  hash_of takes a variable list of parameters and returns a hash of the result set.

       

      Thus, the user queries against the materialized view would be rewritten as:

      Original query: 

      SELECT a,b,c FROM t 

      Rewritten query: 

      SELECT a,b,c FROM temp WHERE EXISTS (SELECT change_in(hash_of(a,b,c)) change FROM temp JOIN (CALL sync_point()) A)

       

      The implementation specific entity that falls out is the sync_point stored procedure.  It would work in concert with the code keeping the temp table up-to-date:  When the temp table is updated, some cross-thread communication mechanism would alert sync_point() to call ExecutionContext.dataAvailable().  Then, the hash_of function would hash the entire result set (to keep memory footprint low) while change_in would compare the hash to the previous value.  If change_in determines that there is a change, it calls ExecutionContext.dataAvailable() and returns a single-row result set, otherwise it throws DataNotAvailable to block the rest of the execution.  The next time that the temp table is updated, of course, the sync_point would call dataAvailable(), its execution would run, starting the whole cycle over again.

       

      One of the goals from this approach is to avoid generating an empty result set when there is no change in the hash.  That would introduce incidental complexity in the UI.

       

      Thoughts?

        • 1. Re: Vetting an approach:  Detecting changes in continuous queries
          shawkins

          2) That's not quite in the current API, there isn't a specific marker for when a query iteration has finished from a client perspective.  Are you embedding a value in the rows to differentiate the iterations or are you looking for additional API support?  It's easy to detect inserts/updates without knowing what iteration you're on, but deletes need something more explicit.

           

          3) Yes merge has been on my radar, but I don't think there's even a jira on it yet.

           

          > Rewritting the UI query to execute against the temp table is not quite as straightforward.  Ideally, I'd like to rewrite the query to only return results when the result set changes.

           

          Seems like what you want is to produce a continuous result of changes that would at minimum be the combination of a type column (insert/update/delete) and the primary key of the affected row (there would probably be a time component in there also).  You would then process the changes against the UI state.

           

          > SELECT a,b,c FROM temp WHERE EXISTS (SELECT change_in(hash_of(a,b,c)) change FROM temp JOIN (CALL sync_point()) A)

           

          This implies that any change triggers a full refresh correct?  Will you need something more incremental? It also seems that for this to be continuous, it's also banking on knowning the difference between each iteration.

          • 2. Re: Vetting an approach:  Detecting changes in continuous queries
            markaddleman

            I guess I didn't make this clear:  I'm looking to vet out an approach with existing Teiid (8.1) functionality.  My post isn't really about asking for new capabilities but I wouldn't be surprised if new capabilities fall out of this discussion.

             

            > 2) That's not quite in the current API, there isn't a specific marker for when a query iteration has finished from a client perspective.  Are you embedding a value in the rows to differentiate the iterations or are you looking for additional API support?  It's easy to detect inserts/updates without knowing what iteration you're on, but deletes need something more explicit.

             

            There are no special sentinels indicating row.  I was thinking of keeping local state in my callback:  Just a boolean indicating if the first row has been processed or not.  As for API support, I think it would be useful to extend the exising callback in this way:

            public void onBeforeFirstRow(ResultSet) OR beforeFirstRow(ResultSetMetadata)

              (the second method signature makes more conceptual sense, I think)

            public void onAfterLastRow(Statement);

             

            Of course, a proper super class makes the interface less cumbersome for clients. 

             

            > Yes merge has been on my radar, but I don't think there's even a jira on it yet.

            There is now TEIID-2152

             

            > It's easy to detect inserts/updates without knowing what iteration you're on, but deletes need something more explicit.

            Agreed.  I'm thinking there that after performing the merge operation, construct a second temp table of all of the primary keys of the new result set and issuing the command DELETE FROM temp_table WHERE pkey NOT IN (SELECT pkey FROM second_temp_table)

             

            > This implies that any change triggers a full refresh correct?  Will you need something more incremental?

            A full refresh, yes.  I am trying to keep things as simple as possible, so I assumed that sending a full result set to the UI and let it to a merge operation into its JTable (or equivalent) is the most straightforward.  Without risking the UI and server getting out of sync with one-another, I don't see how to make an incremental approach work.  In practice, it doesn't matter much,  anyway, because we're talking about a few dozen rows and a few dozen columns.  There just isn't that much data.

             

            > It also seems that for this to be continuous, it's also banking on knowning the difference between each iteration.

             

            That's an excellent point.  I had forgotten that the current callback interface doesn't distinguish between result sets.  So, it does look like I'd like StatementCallback to be enhanced to support onAfterLastRow, at least.  Though, can I use/abuse the interface by calling ResultSet.isLast() ?

             

            In the longer term, do you see exposing continuous query capabilities in Teiid's procedure language?  Somewhat relatedly, do you see Teiid providing lock / notification support (maybe through expanded notion of triggers)?  With these two pieces, it seems like I'd be able to implement ~80% of the logic to keep the temp table up to date as a straight-forward stored procedure rather than in Java.  With a lock/notification support, we might be able to write 50%-80% of the UI notification code as a stored procedure as well.

             

            To provide some motivation behind my curiosity:  We are pushing more and more of our application logic into Teiid.  One result of that is we tend to write distinct functions / procedures as custom translator that perform a single piece of logic and then use Teiid & SQL as a functional composition engine.  If you are familiar with Lisp at all, Teiid is becoming something like a REPL for us in addition to being a data integration engine.  I am very interested in treating Teiid more and more as the logic engine of our app.  This desire motivates some of my more ethereal feature requests such as more dynamic metadata, continuous queries, polymorphic foreign keys and, (although I haven't formed the actual request yet) recursive queries.

            • 3. Re: Vetting an approach:  Detecting changes in continuous queries
              markaddleman

              With a night to sleep on our requirements:  It seems we're developing a pattern around using a table to buffer between two sets of continuous queries.  The particiants of the pattern are data supplier continuous queries whose results are merged/inserted into  a buffer table and a set of client continuous queries that source from the buffer. 

               

              The things that are variable across are use case are:  the particular supplier queries, the scope of the buffer table, rules around maintaining the data in the buffer and the particular rules that trigger the client queries to re-execute.

               

              In some of our use cases, the buffer table is scoped to a session, so a temp table makes sense.  In other cases, the buffer table is scoped to the life of the app instance so a global temp table would make sense.  I can foresee a need to extend the scope of the buffer table beyond the life of any one app instance.  Here, I think Teiid gives us all the functionality we need, though global temp tables would be a convenience (I'm pretty sure there's a Jira for this already)

               

              In all of our use cases, there is no reason to re-execute any client query unless the data in the buffer table has changed (ie, a data supplier query has produced a non-empty result set) but more complex conditions may need to be satisfied before re-executing the client query.  For example, in the UI auto-refresh example, we'd want to drive client queries only after both the merge and delete operations complete.  For this, two possible solutions come to mind:  implicitly call ExecutionContext.dataAvailable() when the transaction is completed or explicity call dataAvailble() when all conditions are met.  I favor the later since it keeps transaction issues separate.  Since we use temp tables to achieve session scoped buffers, I think a  proc named DataAvailable that takes a single parameter, table name, would be a convenient way to re-execute client queries.  The behavior would be, under a continuous query, the first execution would return data and then block further executions until the app called the DataAvaiable stored proc.  Perhaps there needs to be an explicit declaration that the temp table should behave this way versus 'traditional' non-blocking behavior.

               

              I think this stored proc idea could be extended to all data sources that don't natively support continuous queries.  There are plenty of details to work out, are there conflicts between translators supporting continuous queries and this stored proc?  What is the default behavior for continuous queries against non-aware translators?  I suspect that this behavior could be supplied via delegate translator but I don't think DTs work against temp tables.

               

              For now, I'd be happy with a DataAvailable stored proc that only supported temp tables.  What do you think of this approach?

               

              The bulk of our use cases involving this supplier-buffer-client pattern could be implemented in SQL with just a bit of Java to initiate the supplier queries.  I'm hopeful that most buffer maintenence logic can be implemented in the procedure language.

              • 4. Re: Vetting an approach:  Detecting changes in continuous queries
                shawkins

                > though global temp tables would be a convenience (I'm pretty sure there's a Jira for this already)

                 

                Yes there is a JIRA https://issues.jboss.org/browse/TEIID-2067, but it hasn't been targeted to a release yet.

                 

                > The behavior would be, under a continuous query, the first execution would return data and then block further executions until the app called the DataAvaiable stored proc.

                 

                This doesn't quite fit well with the continuous logic as that's implemented at the translator level and temp table access has no such hook.  A couple of thoughts.  Temporary access could also be directed at a source, which has been requested as a feature, would give a translator hook.  Alternatively you could decouple the mechisms.  If you have a translator with a dataAvailable, isDataAvailable procedures, you could issue "call isDataAvailable('condition name')" continuously.  Then when this procedure returns in the onRow of you statement callback you'd select the content from your session table or whatever is protected by that condition.  From your app you would then either issue "call dataAvailable('condition name') or that logic could be somewhere in the custom translator.

                 

                > For now, I'd be happy with a DataAvailable stored proc that only supported temp tables.  What do you think of this approach?

                 

                If you're going beyond just simple notification procedures and looking to have something built-in in Teiid there are a couple of thoughs here too.  More traditionally it looks like you want default trigger actions that on insert/update/delete will interpret any change as a reason to call dataAvailable or a similar notification.  For temp tables we don't have a built-in mechanism for creating the instead of triggers, but we do have some logic that is close to this.  The EventDistributor is already our internal distributor of data change events at the translator level.  This could also be extended to global  (when added) or session scoped temp tables.  We'd also have to add a listener mechanism there as well.

                • 5. Re: Vetting an approach:  Detecting changes in continuous queries
                  markaddleman

                  > A couple of thoughts.  Temporary access could also be directed at a source, which has been requested as a feature, would give a translator hook.

                   

                  That would be very helpful for us.  I'm curious:  I would imagine that this feature is tantamount to providing dynamic metadata (at least, as applied to tables and column).  That is, the client is creating tables (albeit, temporary) that are first-class citizens with every other Teiid object.  I'm sure I'm oversimplifying.

                   

                  > > For now, I'd be happy with a DataAvailable stored proc that only supported temp tables.  What do you think of this approach?

                   

                  If we can redirect temp tables to a source+translator, then I don't think we need any specific Teiid support for our use case (I'm continuing to think about what a proper set of base operations should be).  I imagine more interesting uses for a translator-backed temp table.  For example, I can see creating a temp table backed by a Java priority queue.  I see the CREATE TEMP TABLE syntax being a limitation:  I'd like some way to specify the comparator for a priority queue.  Perhaps the CREATE syntax could be extended to support an arbitrary set of key-value pairs as an OPTION?