9 Replies Latest reply on Jun 20, 2016 2:16 AM by bpiepers

    Chain /*+MAKEDEP*/ hints or controlling what gets pushed when

    bpiepers

      I'm currently trying out the /*+MAKEDEP*/ query hint as described here: teiid: Tech Tip: Teiid SQL Language MAKEDEP Hint Explained and Federated Optimizations - Teiid 9.0 (draft) - Project Documentation Editor. I see some behavior of JDV that I would like to influence for better performance. Basically what I'm trying to do, but don't know if it is possible, is to "chain" /*+MAKEDEP*/ hints. Consider the following example:

       

      In my view model I have the following:

       

      SELECT (fields)

      FROM

      nz.USER_MACHINE AS S INNER JOIN nz.MACHINE AS M ON M.MACHINE_ID = S.MACHINE_ID INNER JOIN /*+MAKEDEP*/ ora.MACHINE AS PM ON M.field = PM.field INNER JOIN /*+ MAKEDEP */ ora.LARGE_FACT_TABLE AS L ON L.MACHINE_ID = PM.MACHINE_ID INNER JOIN /*+ MAKEDEP*/ ora.MASTER_DATA_TABLE AS I ON L.MASTER_DATA_TABLE_ID=I.ID INNER JOIN nz.MASTER_TABLE AS NM ON NM.field = I.field

       

      Obviously above query is fictitious but a representation of a real situation I have at a customer.

       

      This query gets its data from two separate sources: a Netezza and an Oracle database. Tables that come from Netezza are indicated with the "nz" pre-fix, tables coming from Oracle are prepended with "ora". So in the above example I'm trying to have the first machine query to deliver a subset of MACHINE records in Netezza that must then be joined with the MACHINE table in the Oracle database. This join happens on a different field, assume that this is a unique identifier for this table (M.field = PM.field).

       

      Then, this subset of oracle machine records must be joined with the large fact table from Oracle. Finally I want to additionally join the subset of that large fact table with a master data table in Oracle (ora.MASTER_DATA_TABLE) and I want to join another master table from Netezza with that. Again based on a field in both master data tables from both datasources that correspond (NM.field = I.field).

       

      The behavior I observe is as follows:

       

      JDV first tries to query all the tables on the Netezza end without any criteria. As these tables do not match but no constraints are given this results in an enormous resultset (more than 9 million records). This is obviously very slow. When that's finished it queries the oracle tables in one query, using the results it gets from the master tables as "IN" statements. The resulting set is correct but because the initial query is very slow, this leads to a very long waiting time. What I would like to prevent JDV from doing is to fetch ALL the records from the Netezza end in one query. Rather, I would like to split the MACHINE queries from the MASTER_DATA_TABLE queries since they are unrelated.

       

      Is this possible to do? I played around with more /*+MAKEDEP*/ instructions and tried to set the last part of the join as /*+ MAKEIND*/ hoping it would separate that query from the rest. This did not lead to the behavior I would like to trigger.

       

      Teiid version 8.7.5, JDV version 6.2 with the latest patch.

        • 1. Re: Chain /*+MAKEDEP*/ hints or controlling what gets pushed when
          shawkins

          > JDV first tries to query all the tables on the Netezza end without any criteria.

           

          You may be seeing some version dependent behavior there.  There should be cross join avoidance, unless there is costing metadata that would lead us to believe that the results will be small.   Do you have at least table cardinalities set on the netezza tables?  And have you tried this scenario in a later version?

           

          > Is this possible to do? I played around with more /*+MAKEDEP*/ instructions and tried to set the last part of the join as /*+ MAKEIND*/ hoping it would separate that query from the rest. This did not lead to the behavior I would like to trigger.

           

          Yes it should be possible.  Ideally we should plan appropriately based upon the cardinality and join predicates alone. From there you can use the preserve hint to force the join structure to stay as written and of course the makedep/makeind hints to force the join filtering.

           

          Would it be possible to provide a representative query plan?

          • 2. Re: Chain /*+MAKEDEP*/ hints or controlling what gets pushed when
            bpiepers

            > Do you have at least table cardinalities set on the netezza tables?

             

            No I haven't yet. This is because a lot of tables are imported from views hence they do not contain any cardinality on the database. If I add it to the model, where would be the best place to add it? In the view model or further down in the source model?

             

            > Would it be possible to provide a representative query plan?

             

            I would love to do that but I may risk violation of company security rules if I post that here on this forum. Unfortunately I don't have a representative test environment with exactly these database brands and versions available.

            • 3. Re: Chain /*+MAKEDEP*/ hints or controlling what gets pushed when
              shawkins

              > In the view model or further down in the source model?

               

              Always on the source tables.

               

              > I would love to do that but I may risk violation of company security rules if I post that here on this forum. Unfortunately I don't have a representative test environment with exactly these database brands and versions available.

               

              If you have a relationship with Red Hat support, that would be another avenue.

              • 4. Re: Chain /*+MAKEDEP*/ hints or controlling what gets pushed when
                bpiepers

                Thanks Steven, I'll try that and may contact support in case I have further queries.

                • 5. Re: Chain /*+MAKEDEP*/ hints or controlling what gets pushed when
                  bpiepers

                  Also for users that may face the same issues: adding cardinality dramatically increases performance! This is very hopeful.

                   

                  Note that in my case Teiid was unable to infer cardinality automatically. So even when invoking the "Update Source Data Statistics" function this did not have any effect on the cardinality for the tables. They were all set to -1. In my case I had to manually add them and I did this based on the counts on the production environments of the databases. I noticed differences in the query planning which were more logical and closer to what I expected.

                   

                  Thanks again!

                  • 6. Re: Chain /*+MAKEDEP*/ hints or controlling what gets pushed when
                    shawkins

                    > So even when invoking the "Update Source Data Statistics" function this did not have any effect on the cardinality for the tables

                     

                    This was through Designer?  It probably should default to just issuing select count(*) - but even that can be problematic for some sources.  An issue may be needed here.

                     

                    When the import is performed through Teiid and import approximate indexes is turned on, we generally are able to determine the cardinality.  If that doesn't seem to be working for Netezza, then an issue is needed there too.

                    • 7. Re: Chain /*+MAKEDEP*/ hints or controlling what gets pushed when
                      bpiepers

                      > This was through Designer?

                       

                      Yes, this was through Designer. I am in contact with the support team on this topic. I think that in most cases you wouldn't want Teiid to automatically calculate the cardinality because count queries can be very costly on some platforms and in most cases you are working on a DEV/TST environment that only contains a small set of data (which makes the source data statistics not very representative). What I also noticed is that the cardinality field can "only" contain 9 characters while quite a few tables in our model contain literally billions of records.In these cases I just fill in 999999999 as the cardinality. Is that the right approach?

                       

                      I stumbled upon more unexpected behavior even with the cardinality set so I have now raised the question with support if it would help if I add foreign key constraints to the source model myself. One of the source models doesn't contain foreign key constraints because it is based on a schema that only contains views on the database. But I wonder if the query engine would make better decisions if I add relations between tables explicitly. Would that make a difference? What I often see now is that when I join tables from different backend sources that JDV just does a cartesian join between tables that are not really related but are in the same source model. Or at least it seems to group table joins coming from the same database but do not really have a relation. I don't understand why it does that. 

                       

                      I am asking this prior to trying it out because we have a relatively large model and the Teiid Designer unfortunately is still very unstable. So if I do not have to make changes I rather don't...

                      • 8. Re: Chain /*+MAKEDEP*/ hints or controlling what gets pushed when
                        shawkins

                        > But I wonder if the query engine would make better decisions if I add relations between tables explicitly. Would that make a difference?

                         

                        Yes it can.  There are several decisions that knowing the relationship benefits.

                         

                        > What I often see now is that when I join tables from different backend sources that JDV just does a cartesian join between tables that are not really related but are in the same source model. Or at least it seems to group table joins coming from the same database but do not really have a relation. I don't understand why it does that.

                         

                        That could be a version dependent bug - typically cross join avoidance is observed, unless there is costing information to determine that the cross join is small.  Since you are on a supported version, please open a ticket.  In particular [TEIID-3237] Prevent processing cross joins with unknown cardinalities - JBoss Issue Tracker could apply.  Otherwise you could recheck the behavior on a later version to confirm that it has been fixed.

                        • 9. Re: Chain /*+MAKEDEP*/ hints or controlling what gets pushed when
                          bpiepers

                          Thanks Steven. I am in contact with support regarding this issue as well but always find additional input from this forum very valuable as well. In addition other people may benefit from the information as well.