-
1. Re: Expensive plan
shawkins May 28, 2015 1:59 PM (in response to markaddleman)Since there is no cardinality or other stats on metric_name, it is deferring that join to later stages as the heuristic selectivity of the agent_name filter isn't enough to assume it should be performed first. Can you add a cardinality and possible a distinct value count for agent_name?
-
2. Re: Expensive plan
markaddleman May 28, 2015 3:19 PM (in response to shawkins)I think I have cardinality and distinct value count for agent_name. Prior to running the query in question, I issue the following stored procs:
Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'metric_id', (SELECT COUNT(DISTINCT metric_id) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(metric_id) FROM cassandra_metrics.metric_name), (SELECT MIN(metric_id) FROM cassandra_metrics.metric_name))
Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'agent_host', (SELECT COUNT(DISTINCT agent_host) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(agent_host) FROM cassandra_metrics.metric_name), (SELECT MIN(agent_host) FROM cassandra_metrics.metric_name))
Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'agent_name', (SELECT COUNT(DISTINCT agent_name) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(agent_name) FROM cassandra_metrics.metric_name), (SELECT MIN(agent_name) FROM cassandra_metrics.metric_name))
Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'agent_process', (SELECT COUNT(DISTINCT agent_process) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(agent_process) FROM cassandra_metrics.metric_name), (SELECT MIN(agent_process) FROM cassandra_metrics.metric_name))
Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'attribute_type', (SELECT COUNT(DISTINCT attribute_type) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(attribute_type) FROM cassandra_metrics.metric_name), (SELECT MIN(attribute_type) FROM cassandra_metrics.metric_name))
Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'metric_attribute', (SELECT COUNT(DISTINCT metric_attribute) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(metric_attribute) FROM cassandra_metrics.metric_name), (SELECT MIN(metric_attribute) FROM cassandra_metrics.metric_name))
Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'metric_path', (SELECT COUNT(DISTINCT metric_path) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(metric_path) FROM cassandra_metrics.metric_name), (SELECT MIN(metric_path) FROM cassandra_metrics.metric_name))
Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'source_name', (SELECT COUNT(DISTINCT source_name) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(source_name) FROM cassandra_metrics.metric_name), (SELECT MIN(source_name) FROM cassandra_metrics.metric_name))
That ought to fully update the costing info, yes?
Attached is the most recent plan
-
expensive_plan_2.txt.zip 4.8 KB
-
-
3. Re: Expensive plan
shawkins May 29, 2015 10:37 AM (in response to markaddleman)To double check, are you expecting a top level dependent join? If not then the node ordering for the enhanced sort merge doesn't matter. Both sides will be accessed in parallel until the processor can determine if one side is sufficiently small to build an index off of to perform the join.