8 Replies Latest reply on Jul 9, 2012 7:27 PM by markaddleman

    Ideas for new Cache scopes

    markaddleman

      We have a number of views that are joins betwen a particular table,C, and a series of other tables, T1, T2, T3.  Something like:

      • V1=CxT1
      • V2=CxT2
      • ...

       

      A query that includes V1 and V2 can cause C to be read multiple times and in many instances, not only is a consistent snapshot of C a good performance improvement, it is actually the better approach functionally.

       

      What if there were two new cache scopes:  Request and Transaction.  I could define a view, VC, as

      /* cache(request / transaction) */ SELECT * FROM C

      Then, redefine my views to go against VC instead of C. 

       

      I think the desired behavior is obvious:  Materialize the view the first time it is referenced in the query and use the cached results for subsequent needs. 

       

      I suspect that there is enough information in Teiid query plan to build appropriate indexes on the materialized result set.

       

      Obviously, there are workarounds for this in application code but all of the ones I can think of either involve moving the view logic out of Teiid and into the application (highly undesirable) or defining new views for various combination of V1, V2, etc (constrains the types of queries that our application performs).

        • 1. Re: Ideas for new Cache scopes
          shawkins

          I see what you are getting at here is the desire for something to be transparent to your user queries.  Ignoring that for a second, the built-in mechanisms for the request and transaction scopes are common table expressions and temporary tables respectively.

           

          A common table expression gives you more flexibility than a view hint.  Instead you write your query as "WITH (query) AS name SELECT ...".  You can then reference the given alias as if it were a table in the query. 

           

          Temporary tables created under a transaction will have their lifecycle tied to the transaction, but of course require an explicit reference to the temporary table.

           

          The next line of thought is about whether, in the case of a single request, whether query reduction can be used and/or if we should look at automatically pulling out common sub trees in the plan.  There is already an outstanding JIRA on removing redundant self joins and on translator scoped caching that are related.  Do you know if the accesses to C are effectively redundant?

           

          In this case does it make more sense to always cache the full view at the predefined scope or would a hint on the user query directing it to use effectively a common table make more sense?

           

          A  transaction scope is feasable, but is there a reason why traditional materialization is not used?

           

          Steve

          • 2. Re: Ideas for new Cache scopes
            markaddleman

            I see what you are getting at here is the desire for something to be transparent to your user queries.

            I confess that my line of thought started with, how can I make my app's problem a Teiid problem?  

             

            A common table expression gives you more flexibility than a view hint.  Instead you write your query as "WITH (query) AS name SELECT ...".  You can then reference the given alias as if it were a table in the query.

             

            I had considered using CTEs but the problem with that is, in order to be effective, I end up duplicating some view logic between the definitions of V1,V2,etc and the query portion of the table expression.  That's not a big deal except that, in our application, our set of V's are fairly dynamic and defined in SQL.  Further, the query that combines V's is highly dynamic.  These two things conspire to make finding the common table expression very difficult. 

             

            This brings to mind another approach:  I bet this information is, somehow, available through the query planner.  What about some ability to request a plan for a query without executing it?  The app would inspect the plan to find common table expressions and then resubmit the query for execution.

            The next line of thought is about whether, in the case of a single request, whether query reduction can be used and/or if we should look at automatically pulling out common sub trees in the plan.  There is already an outstanding JIRA on removing redundant self joins and on translator scoped caching that are related.  Do you know if the accesses to C are effectively redundant?

            In our case, yes, access to C are effectively redundant.  The problem is the desirability of read consistency is highly data source / application dependent.  I would think you would want to declare it on the view or query level. 

             

            In this case does it make more sense to always cache the full view at the predefined scope or would a hint on the user query directing it to use effectively a common table make more sense?

            I think a query hint to pull effectively use a common table expression is the right approach (and it absolves me of the responsibility of parsing Teiid query plans as I imagined above).  I don't like the idea of always caching the full view.  In our case, C can be quite expensive to query if there are no qualifications on the query.

             

            A  transaction scope is feasable, but is there a reason why traditional materialization is not used?

            I threw in the idea of a transaction scope without thinking about it very much.  I agree, materialization is a work-around but, again, due to the highly dynamic nature of our queries, the application would have to take on the work of determining the common expressions.  I should also add that, in our app, I see transaction scope as having little value.

            • 3. Re: Ideas for new Cache scopes
              markaddleman

              Is there an issue already open for the query planner to factor out common table expressions?  If not, I'll add it.

              • 4. Re: Ideas for new Cache scopes
                shawkins

                Mark,

                 

                I don't believe there is, go ahead and add one.  That should be a fairly easy one to address.  More complicated scenarios where you want filtered view results can be considered later.

                 

                Steve

                • 5. Re: Ideas for new Cache scopes
                  markaddleman

                  Done. See TEIID-2077.

                   

                  I'm not sure what you mean by "More complicated scenarios where you want filtered view results..."

                  • 6. Re: Ideas for new Cache scopes
                    shawkins

                    I've updated that issue with relevant thoughts.  We'll hopefully get something for 8.1.

                    • 7. Re: Ideas for new Cache scopes
                      shawkins

                      Simple sharing of source commands with-in a query has now been implemented, which may be of some benefit in this case.  Can you describe your query structure more?  Does your user query for example look something like V1 union all V2 union all V3 ... where (some filter on C columns projected from each branch)?

                      • 8. Re: Ideas for new Cache scopes
                        markaddleman

                        Our queries take the following forms:

                        SELECT a,b,c FROM (

                           SELECT a,b,c FROM t1

                           UNION ALL

                           SELECT a,b,c FROM t2

                        ) t

                        GROUP BY a, b, c

                         

                        SELECT a,b,c FROM (

                           SELECT a,b,c FROM v1 WHERE a='A'

                           UNION ALL

                           SELECT a,b,c FROM t2 WHERE a='B'

                        ) t

                        GROUP BY a, b, c

                        v1: SELECT a,b,c FROM t1

                        v2: SELECT a,b,c FROM t2

                         

                        SELECT a,b FROM (SELECT a,b FROM t) t

                         

                        WITH v AS (select * FROM t)

                        SELECT * FROM v

                         

                         

                        Not sure if that's getting you what you need (and sorry for the delay).  If more specifics are required, I can get them easily.