Want to start a discussion on getting a better understanding of analyzing a query plan. With the goal of answering these 3 questions (but not limited to):
- Better understanding of some of the node rules
- What to look for in a query plan that would indicate the query will perform at less than optimal performance
- What are the recommended action(s) to improve query performance
Probably, many of the recommendations will be to update your model or add a hint to your query. But I would like to tie the identified problem in the query plan to the recommended resolution. An example of this is the table cardinality. Not having this value set in the source model can definitely impact how the query plan is constructed. But when would a query plan show the symtom? Which node rules utilize this value?
This post may get lengthy, so I'm going to start out with one rule at a time, and try to connect all the dots. Also, I'm running some of my own tests to show what I'm seeing in the query plan in an effort to help draw a correlation between the node rules and the query plan.
NOTE: this is using the 7.7 doc's.
As details are determined, they will be noted in this section.
RuleChooseJoinStrategy - this rules determines the base join strategy and join type
A Join Strategy is, what kind of algorithm Teiid using internally to sorting through the results on each side to satisfy the JoinType.
Join Strategy: Merge, Nested Loop, Nested Table, Enhanced Sort
Default strategy: Nested Loop
Merge: Merge Join Strategy supports generalized Full, Left Outer, and Inner Joins (containing non-equi join criteria) as long as there is at least one equi-join criteria
Nested Loop: Currently implmented as a degenerate case of merge join, and only for use with Full, Left, Inner and Cross joins.
Nested Table: A variation of the NEST_LOOP join that handles nested tables.
Enhanded Sort: An extension to the MERGE join strategy, to check for conditions necessary to NOT fully sort one of the sides. Will be used for inner joins and only if both sorts are not required. Degrades to a normal merge join if the tuples are balanced.
NESTED LOOP is expensive
Join Type: Inner Join, Right Outer Join, Left Outer Join, Full Outer Join, Cross Join, Union Join, Semi Join, Anti Semi Join
CROSS JOINS are usually expensive, as they produce Cartesian product of two resultsets.
- It talks about enabling costing? how is that done
- reference is made to "column ndv and column nnv values", what are ndv / nnv?
- indicates that if "key metadata information" is there, what key metadata is it specifically looking for?
4. When a JOIN is planned, you will see a JoinNode in the query plan. For each JoinNode, at the end for that planned node, it indicates the strategy, type and criteria. Example:
+ Child 0:
+ Join Strategy:NESTED LOOP JOIN
+ Join Type:CROSS JOIN
+ Join Criteria:product.symbol = stock.symbol
What are the possible values for Strategy and Type (I didn't see a link or reference in the doc)? And which one's indicate a possible performance problem?
What are possible resolutions?
Is there value (or null) for which the Join Criteria will contain such that it indicates a problem may exist?