6 Replies Latest reply on Aug 30, 2013 12:29 PM by markaddleman

    TEIID-2584 - A use case to consider

    markaddleman

      Hi guys -

       

      I came across TEIID-2584 and thought I'd offer up a use case we're considering implementing.  From what I can tell, it's related to some of the new functionality as part of the ticket.

       

      We have several tables from different sources that are expensive to query.  Fortunately, the nature of the data is append-only.  Our basic plan is to query the underlying table once and materialize it as a materialized view (either internal or external) and, through some means, receive rows to append to the materialized view (probably HornetQ a la Continuous Support for INSERT INTO / MERGE INTO ?).  Practically, this means we'll be physically storing the incrementally delivered rows in an H2 table by creating an INSERT trigger on the materialized view.

       

      Since this is basically data synchronization between the materialized view and the underlying data source, I'm concerned about out-of-sync conditions.  Detecting out-of-sync is necessarily data source dependent but the recovery is always starting from scratch:  a refresh against the materialized view.  I haven't experimented with this yet, but I'm thinking/hoping that if I create an appropriate delete trigger on the materialized view, a refresh will cause the H2 table of incrementally delivered rows to be deleted as well.  The goal is to atomically refresh the materialized view as well as deleting the H2 table. 

       

      I've got a few questions/assumptions:

      • Does this seems like a workable approach?
      • Does any of TEIID-2584 affect internal materialized views?  If so, I see the ALLOW_MATVIEW_PARTIAL_LOADS* properties but not sure I totally understand them.  If partial loads are allowed, could internal materialized views support a direct INSERT?  This would eliminate the need for an H2 table
      • For external materialized views, I see we can specify a truncate procedure.  Using external views, I don't think there's any reason to have a separate table for the incrementally delivered rows and I don't think there would any atomic refresh problems.
      • Because refreshing the materialized view occurs asynchronously to delivering the incremental rows, I'm pretty sure there's going to need to be some form of coordination between the queue reading stored procedure and the MATVIEW_LOAD_SCRIPT.  We could write our own locking service.  Do you know if JBoss have some sort of locking service (maybe that's part of JTA)?  Maybe we could abuse Infinispan?
        • 1. Re: TEIID-2584 - A use case to consider
          shawkins

          > Does this seems like a workable approach?

           

          I think so.  To just rephrase - you'll install for each row triggers on insert/update/delete (sounds like you are only interested in delete) on the materialized view to make the appropriate source changes and the corresponding changes into the materialized table.  But there are other paths for modifications leading to the out of sync condition.  A full refresh would certainly put you back in sync or if you had a quick way for determining the validity of a row, you could do a partial refresh to just delete the out of date rows.

           

          > If so, I see the ALLOW_MATVIEW_PARTIAL_LOADS* properties but not sure I totally understand them.  If partial loads are allowed, could internal materialized views support a direct INSERT?  This would eliminate the need for an H2 table

           

          I think we're refining the initial commit to remove the partial loads concept.  For convenience we may add a range based refresh procedure, but since the initial logic relies on the materialization table being exposed as proper metadata direct updates could be made as well.

           

          > For external materialized views, I see we can specify a truncate procedure.  Using external views, I don't think there's any reason to have a separate table for the incrementally delivered rows and I don't think there would any atomic refresh problems.

           

          truncate is for the default full refresh strategy which would typically rely on a staging table that is swapped for the real table via a rename.  If you are not performing a full refresh, then truncation wouldn't be needed.

          • 2. Re: TEIID-2584 - A use case to consider
            markaddleman

            Thanks Steve.  I'm very interested in a range based refresh.  If I am thinking of it correctly, it would allow the materialized table to be populated over time from queries rather than a full universe load at the first query.

             

            I can see designing some sort of pipeline translator architecture that chains the Infinispan translator together with an underlying translator to provide a rather sophisticated lazy-load caching solution. 

             

            To answer my own question question regarding locking:  It seems Infinispan provides explicit locks and they cross-process.  To fit my use-case, we'll probably write some stored procedures to expose that locking functionality into the Teiid procedure language.

            • 3. Re: TEIID-2584 - A use case to consider
              shawkins

              > it would allow the materialized table to be populated over time from queries rather than a full universe load at the first query.

               

              Yes and allow for more than just a single row refresh.

               

              > To fit my use-case, we'll probably write some stored procedures to expose that locking functionality into the Teiid procedure language.

               

              Table/row locking is something that we've avoided thus far, but may eventually need to exposed - both for source tables and even our internal tables if we start to allow global tables to have consistent updates.  Let us know if you come up with anything that looks generally useful to add as locking operations.

              • 4. Re: TEIID-2584 - A use case to consider
                markaddleman

                > it would allow the materialized table to be populated over time from queries rather than a full universe load at the first query.

                Yes and allow for more than just a single row refresh.

                This sounds great.  Do you think range based refresh will be in the next release?  It doesn't matter a whole lot to us but I'm eager to play with it to understand how it competes/fits with result set caching.

                 

                As an aside, the product we're building on top of Teiid is picking up steam internally.  I expect we'll be integrating many more and more diverse data sources into Teiid in the future.  We have a few complex integrations under our belt now and I'm starting to document Teiid integration patterns in the style of the GoF book.  I expect that most of the material I'm gathering is going to be pretty generic so (lawyers willing), I hope to open source those patterns so the entire Teiid community can participate.  I hope that part of that documentation will include guidance around when to choose between similar but different functionalities like materialized views versus result set caching.

                 

                Let us know if you come up with anything that looks generally useful to add as locking operations.

                Will do

                • 5. Re: TEIID-2584 - A use case to consider
                  rareddy

                  Mark,

                   

                  I missed this from my previous reading of the post, but when you talked about updating the H2 table from events from HornetQ, how are you converting the various update events, typically from different updates from different tables that define the materialized view into a row that can be part of materialized view table?

                   

                  Because while it may be straight forward for a given *single view definition* along with understanding its source tables, I could not think of way to do generically that can apply to *any* materialized views. As insert of one row in underlying tables may result on 0..n rows or deletion of rows in the matview table. 

                   

                  Thanks

                   

                  Ramesh..

                  • 6. Re: TEIID-2584 - A use case to consider
                    markaddleman

                    Hi Ramesh -

                     

                    Thanks for thinking about this.  I haven't given a lot of thought to a general solution.  Right now, we're faced with a few specific examples of a performance problem so we can tightly couple HornetQ message schemas to H2 tables to specific views.  Further, we're only dealing with inserts which helps a lot.

                     

                    For a more general solution, I'm hoping you and Steven come up with something clever using range based refresh   In seriousness, is there a jira issue or any other write up for range refreshes?