1 2 3 4 Previous Next 47 Replies Latest reply on Mar 10, 2015 2:43 PM by stavroskalamat

    SQL script doesn't get pushed down to Aster Teradata data source

    stavroskalamat

      ENV'NT

      Aster Teradata cluster (ATC)  <-->  JBoss DV with Dev Studio & Teiid (JDV)   <--> dbVisualizer (dbVis)

      All of the above, anmely ATC, JDV and dbVis reside on different servers in different IP segments separated by Firewalls w/ appropriate ACL rules to allow traffic via port 31000 (dbVis -> JDV) and port 2406 (JDV -> ATC)

       

      I'm not the DBA on ATC but I can ask the DBA Qs if I have to.

      I have full ctrl of JDV and dbVis

       

      SETUP

      Aster Teradata is one of our data sources JDV connects to

      A Teiid JDBC connection to ATC, along with a VBLayer and VDB have all been configured from Dev studio and work perfectly fine

      Tested the JDV connection to Aster from dbVisualizer (dbVis) and can run simple queries (Select * from tbl LIMIT 10,  select count(*) from tbl, etc)

       

      PERFORMANCE TEST

      When the following script runs from dbVis through a straight connection to Aster Teradata it takes a couple of mins

      When the following script runs from dbVis via the JDV connection it takes about an hr or more.

       

      DW is Aster's db schema

      4 tbls are called in this script. They have the following sizes:

       

      tbl1: 2.3 Billion rcds

      tbl2: 4.2 Billion rcds

      tbl3: 15 Million rcds

      tbl4: 73,000 rcds

       

      select  f.fld1_dim_key, f.fld2_dim_key, f.fld3_dim_key,

      f.fld8_FACT_KEY, f.fld7, f.fld9_dim_key,

      f.fld10_dim_key, f.fld12_dim_key, f.fld11,

      f.natural_key, a.fld13_DIM_KEY, a.fld14

      from

      DW.tbl1_fact f

      left outer join

      DW.tbl2_fact a

      on  f.natural_key = a.fld8_fact_nk and f.fld2_dim_key = a.fld2_dim_key

      inner join  DW.tbl3_dim b

      on  f.fld15_dim_key = b.fld16_dim_key

      inner join  DW.tbl4_dim d

      on  f.fld17_DIM_KEY = d.dt_dim_key

      where

      coalesce(a.fld4_key,'20130601') between '20130601' and '20131130'

      and f.fld4_key between '20130601' and '20131130'

      and f.fld5_DIM_KEY = 35768

      and Cy_NMBR = 2013 and CM_NMBR = 8

      and f.fld1_dim_key = 678

      and b.fld6 = 'John Smith'

      ORDER BY f.fld7, f.natural_key ;

       

       

      TESTING conducted already to perf tune the above script:

      1. I was under the impression that such a script will get pushed down to ATC by default.

       

      2. replaced the left outer join with an inner join (although the logic of the script changes) to chk any change in exec time. Result: run time dropped from  an hr down to less than 100secs. Does that mean the translator has issues with the left outer join

      OR

      as Chapter 9. Translators  says 

      The Salesforce Connector executes SQL commands by “pushing down” the command to Salesforce whenever possible, based on the supported capabilities. Teiid will automatically provide additional database functionality when the Salesforce Connector does not explicitly provide support for a given SQL construct

      where Salesforce is ATC in this case.

       

      3. Rewrote the script using a right outer join but no luck. It still takes quite a bit of time to complete.

       

      4. Per Redhat's engineer advice (worked with him in the past) I imported a physical model to a new data source and exposed it via a fresh standalone VDB and then run the script from dbVis against that VDB. I still got a long execution response, no change.

       

      QUESTION:

      I'm not an expert in JDV in any stretch of the imagination but I can follow advice and, most importantly, learn from it. How can I make this particular script run faster and what general development guidelines (any docs, URLs you can point me to? ) should I follow when I write scripts like that ?  By the way, how does SQL push down work on Teiid ? I didn't import the indexes with the views (ATC's DBA exposed only views from his env'nt and that's understandable) I brought in from ATC when I created the data source in Dev Studio, is that a problem ?

       

      This issue has been occupying my nights for quite a while so ANY advice is greatly appreciated.

      Anything I'm forgetting to report please ask and I'll answer ASAP.

       

      Thanks

      Stavros K.

        • 1. Re: SQL script doesn't get pushed down to Aster Teradata data source
          rareddy

          Stavros,

           

          The pushdown behavior in Teiid is controlled by what we call "capabilities". These are defined individually for each translator through their implementation. If you look at TeradataExecutionFactory class you will see many "supportsXXXX" methods, those define if the individual translator was capable of certain filtering or join etc. Based on those Teiid query engine will rewrite the query that can be handled by source and rest of the implementation will be provided by the Teiid.

           

          During planning certain decisions are influenced by costing information you can provide. Look at the "cardinality" property on each source table and provide the table sizes on them. Indexing metadata information on source model is also important. Most importantly Query Plan call tell you what has been executed at the source Vs what is being handled by the Teiid Engine. Take look at this for getting a query plan for your source Query Plans - Teiid 8.9 (draft) - Project Documentation Editor

          Importing the metadata using the Dev Studio is perfectly fine to use. Most probably I suspect some operation is not being pushed as you expect (may be due to a bug, rewrite or non available capability etc). So, I suggest you try the above then post back with query plan as attachment, then we can provide specific guidance.

           

          Thanks


          Ramesh..

          1 of 1 people found this helpful
          • 2. Re: SQL script doesn't get pushed down to Aster Teradata data source
            virtualdatabase

            I'm wondering something... the query mentioned is a single-source query (ALL IN TERADATA) so the entire query SHOULD be pushed down, right?

            IF not... what's the best, fastest way to identify that the query is NOT being pushed down and WHY would that be (other than obvious functions or something of the like that is not supported by the source)

             

            thanks..

            • 3. Re: SQL script doesn't get pushed down to Aster Teradata data source
              rareddy

              Typically DEBUG PLAN will have some comments in them to show why that is the case, like Functions or Joins etc.

               

              SET SHOWPLAN DEBUG

               

              You will see log on the console pertaining to your query, however often this is long and and confusing to understand. I do wish there is graphical way to represent this, but no such thing exists. Designer does show PLAN (not DEBUG PLAN) in graphical way, I think it is kind of primitive in its use.

               

              Ramesh..

              • 4. Re: SQL script doesn't get pushed down to Aster Teradata data source
                stavroskalamat

                Hello Ramesh.

                 

                To make sure I'm doing the right thing.

                In order to produce the Qury Plan I need to execute the code from Query Plans - Teiid 8.9 (draft) - Project Documentation Editor wrapped in JAVA on eclipse (for example) by calling the Teiid JDBC API using a connection class to connect to DV, correct ? OR I'm totally off ?

                 

                Thanks.

                Stavros K.

                • 5. Re: SQL script doesn't get pushed down to Aster Teradata data source
                  rareddy

                  You can use Java program if you want to see this, you can even use Teiid Designer data tools (right click -> Execute query plan ..)

                  • 6. Re: Re: SQL script doesn't get pushed down to Aster Teradata data source
                    stavroskalamat

                    Hello Ramesh,

                     

                    I'm attaching the exec plan for the script and some more material that may assist more prudent advice such as  a short avi of the plan tree. I know if I place the mouse pointer on top of the Exec plan visualization yellow items I'll be able to get more info about them but my understanding is that they  get covered in the avi recorded plan tree.

                     

                    Please let me know if what I have attached is enough OR more detail is necessary for a good advice.

                    As always many thanks for your help.

                    • 7. Re: Re: SQL script doesn't get pushed down to Aster Teradata data source
                      rareddy

                      I can not play AVI. Please provide a text format of the plan

                      • 8. Re: Re: Re: SQL script doesn't get pushed down to Aster Teradata data source
                        stavroskalamat

                        Hello Ramesh.

                         

                        A txt file of the exec plan is attached. Please let me know if any other info is necessary to facilitate a good advice.

                         

                        Thanks.

                        Stavros K.

                        • 9. Re: Re: Re: SQL script doesn't get pushed down to Aster Teradata data source
                          shawkins

                          This is the final plan form, which is helpful.  But the debug log is more detailed and should have statements as to why push down decisions are made.  In this case we see a join between HSDW_Prod.v_encntr_fact and HSDW_Prod.v_diag_fact that is not pushed down.  I think the issue is that it's a left outer join and the translator is marked as not supporting outer joins.  This may be a mistake as prior to 7.7 the translator was marked as only not supporting full outer joins.  I don't even see in teradata's documentation where that restriction came from, so it may not be necessary for newer versions.

                          1 of 1 people found this helpful
                          • 10. Re: Re: Re: SQL script doesn't get pushed down to Aster Teradata data source
                            stavroskalamat

                            Hello Steven,

                             

                            QUESTION

                            For our education can you please tell us where in the txtplan do you see the join between HSDW_Prod.v_encntr_fact and HSDW_Prod.v_diag_fact not getting pushed down ?

                             

                            Also, I had been  compiling a msg for the last hr to share the following:

                             

                            1) we replaced the left outer join with an inner join, although the logic of the script changes, to test perf. and the script runs in 1min and 14 secs when it executes directly on JDV. Execution time out of dbVis with a direct conn. to Aster (or from Teradata Studio) is 2mins and 7secs. We thought it should not be much slower (probably something else was going on Aster's side when we ran against it)    so I attempted the same runs 3 more times. 

                             

                            Here are the results out of 3 runs

                            dbVis -> JDV -> Aster     1min 15secs       1min 18secs         1min 13secs

                            dbVis -> Aster                2mins 9secs        2mins  12secs     2mins 8secs

                             

                            That's a good sign but the only explanation is that JDV throws some extra optimization we can't see   before it hits Aster as opposed to running the script directly against Aster.

                             

                            2) we also saw a small improvement when we replaced the "between" with a left and right  inequalities ("x between a and b" replaced by "x>= a and x <= b").

                             

                            3) I forgot to  mention in my original post that the tbl cardinality change Ramesh had recommended was on target as well.  We had applied it before we started this post and it dropped the exec. time orders of magnitude down from hrs to a few mins (less than 7mins in this case with the left outer join in it). As mentioned above the run time decreased even more down to less than 2mins when we changed the script logic by replacing the left outer join with an inner join.

                             

                            Question:

                            Does the cardinality have to be exactly equal to the number of rcds in a tbl ? For example, some Teradata tbls have a length of 2.5billion records however the acceptable highest value of cardinality in Dev Studio goes up to values equal to nine 9s (999,999,999). Will that be a problem or a ballpark would suffice ?

                             

                            Thanks

                            Stavros K.

                            • 11. Re: Re: Re: Re: SQL script doesn't get pushed down to Aster Teradata data source
                              stavroskalamat

                              Hello,

                               

                              Along with the 2 Qs from my previous reply I have another new development I'd like to report for advice.

                               

                              I'm executing a series of 8 scripts and every  script after the first is using a temp tbl one of the previous scripts has produced.


                              The top of the following 2 scripts is joining  v_diag_fact  (about 2+ billion records) with a temp tbl (#PatntLst_Encntrs_tmp) a few thousands of rcds long and it executes in about 30-40 mins. The bottom one is joining v_proc_fact  (about 2+ billion records) with the same temp tbl (#PatntLst_Encntrs_tmp) but it has been executing for hrs w/o an end, although the time period in its WHERE condition is much shorter and only 1 rcd is requested back.

                              Al tbl cardinalities are as close as possible to the real rcd length of the tbls, except the temp ones of course.

                               

                              All of the above runs in a dbVisualizer session where the temp tbls persist  (I qry the temp tbls for limit 1 in between scripts to ensure data persists) so when I go to JBoss Dev. Studio to get an exec. plan a long error is shown (see jpg attached).

                               

                              That, in my mind at least, means JBoss isn't aware of the #PatntLst_Encntrs_tmp. So no exec plan is available.

                               

                              QUESTIONs

                              How come and the dbVis session I'm running the scripts (that is connected to JBoss EAP/DVirt via Teiid JDBC drivers) knows about the temp tbls but JBoss Dev Studio doesn't?

                              Isn't it safe to assume that the temp tbls created in a dbVis session are cached on JBoss EAP ?

                              How do I troubleshoot perf. to ensure the qry gets pushed down to Aster in this case ?

                               

                              SELECT      enc.bnfcry_deers_id

                                          ,enc.encntr_m2_rcrd_id

                                          ,EncDiag.encntr_diag_seq_nmbr

                                          ,DimDiag.icd_diag_cd

                                          ,DimDiag.icd_diag_cd_w_dod_extndr

                                          ,DimDiag.icd_diag_cd_shrt_dscrptn

                                          ,DimDiag.icd_diag_cd_lng_dscrptn

                                          ,EncDiag.quick_cnt

                              FROM  HSDW_P.v_diag_fact EncDiag

                                          INNER JOIN #PatntLst_Encntrs_tmp enc

                                                ON EncDiag.encntr_fact_key = enc.encntr_fact_key

                                          INNER JOIN HSDW_P.v_icd_diag_cd_dim DimDiag

                                                ON EncDiag.icd_diag_cd_dim_key = DimDiag.icd_diag_cd_dim_key

                              WHERE EncDiag.prttn_key between '20101001' AND '20110930'

                              GO

                               

                               

                              SELECT      enc.bnfcry_deers_id

                                          ,enc.encntr_m2_rcrd_id

                                          ,EncProc.proc_ln_item_nmbr

                                          ,DimCPT.cpt_cd

                                          ,DimCPT.cpt_cd_lbl

                                          ,DimCPT.cpt_cd_shrt_dscrptn

                                          ,DimCPT.cpt_cd_lng_dscrptn

                                          ,DimCPT.cpt_cd_modfr

                                          ,DimCPT.cpt_cd_modfr_dscrptn

                                          ,EncProc.proc_rvu

                                          ,EncProc.prvdr_rvu

                                          ,EncProc.proc_qty

                                          ,EncProc.quick_cnt

                              FROM  HSDW_P.v_proc_fact EncProc

                                          INNER JOIN #PatntLst_Encntrs_tmp enc

                                                ON EncProc.encntr_fact_key = enc.encntr_fact_key

                                          INNER JOIN HSDW_P.v_cpt_cd_dim DimCPT

                                                ON EncProc.cpt_cd_dim_key = DimCPT.cpt_cd_dim_key

                              WHERE EncProc.prttn_key between '20101001' AND  '2011101'

                              LIMIT 1

                              GO

                              • 12. Re: Re: Re: Re: SQL script doesn't get pushed down to Aster Teradata data source
                                rareddy

                                1) In the query plan, when you see the JOIN node, underneath it you can follow two other child nodes, they represent two sides of the join. In typical Teiid query scenario, these two child nodes represent two different source queries. However in your situation they are from the same source, so, the join could not could be pushed to the source. If you see capabilities of the teradata connector in code you will see that "outer joins" was for some reason not supported. As per the speed of the query, I believe Teiid is doing effective job in terms of maintaining/re-using the existing connection (no-connection overhead each time) and also effective buffering of the results, thus you are seeing the improvement in time.

                                 

                                2) Between optimization is automatically done by Teiid re-writer, I believe. That will be not be case when you execute directly from DbVis to Aster.

                                 

                                3) When cardinality info available then Teiid planner can produce optimized plans for execution, like dependent queries and tries to avoid table scans etc. As per cardinatilty information having them close to real numbers is recommended, but if you can not enter it, close enough is good in my opinion. Also read http://teiid.blogspot.com/2014/10/teiid-platform-sizing-guidelines-and.html on some current limitations of Teiid platform.

                                 

                                I am not sure I understand your Temp table question, but I want you know they are scoped to a session, if your DBVis and DEV Studio are two different sessions, one does not have scope for other.

                                 

                                Ramesh..

                                • 13. Re: Re: Re: Re: SQL script doesn't get pushed down to Aster Teradata data source
                                  shawkins

                                  3) Also later versions of Designer/Teiid will accept values greater than the max integer, but yes setting to a high number should initially suffice.

                                   

                                  > How come and the dbVis session I'm running the scripts (that is connected to JBoss EAP/DVirt via Teiid JDBC drivers) knows about the temp tbls but JBoss Dev Studio doesn't?

                                   

                                  You have created a session scoped temporary table.  I'm not sure if Designer maintains a session for querying.

                                   

                                  > Isn't it safe to assume that the temp tbls created in a dbVis session are cached on JBoss EAP ?

                                   

                                  A session scoped temp table will exists on the server it was created on for the life of the session.

                                   

                                  > How do I troubleshoot perf. to ensure the qry gets pushed down to Aster in this case ?

                                   

                                  You can always turn up the logging level on the server to see all relevant information.  The command log can be a fairly concise source of information as well - it can show the plan and all source queries executed.  See Logging - Teiid 8.8 - Project Documentation Editor

                                   

                                  More than likely you need a dependent join using the values from the temp table.  It may be good to confirm why it's not being created and you can also just add a hint to query, such as "INNER JOIN /*+MAKEIND*/ #PatntLst_Encntrs_tmp enc"

                                  • 14. Re: Re: Re: Re: SQL script doesn't get pushed down to Aster Teradata data source
                                    shawkins

                                    [TEIID-3196] Remove Teradata join pushdown restriction - JBoss Issue Tracker was logged to correct the Teradata join restriction.

                                    1 2 3 4 Previous Next