3 Replies Latest reply on May 29, 2015 10:37 AM by shawkins

    Expensive plan

    markaddleman

      Attached is an execution plan that I believe is too expensive and could be much cheaper.  The query is

      select a.agent_name, count(*)

      from cassandra_metrics.metric_name as a, facts.numerical_metric_data_live as numerical_metric_data_live

      where numerical_metric_data_live.metric_id = a.metric_id

      and a.agent_name = 'TIXChange-Web'

      group by a.agent_name

       

      numerical_metric_data_live is a view involving a dependent join but I don't believe that ought to change the efficient solution.  The planner is choosing to first join the complete cassandra_metrics.metric_name table to facts.numerical_metric_data_live and then join that to the cassandra_metrics.metric_name qualified by agent_name='TIXChange-Web'.  Shouldn't it first determine the result set of the qualified metric_name table and then join to numerical_metric_data_live?

        • 1. Re: Expensive plan
          shawkins

          Since there is no cardinality or other stats on metric_name, it is deferring that join to later stages as the heuristic selectivity of the agent_name filter isn't enough to assume it should be performed first.  Can you add a cardinality and possible a distinct value count for agent_name?

          • 2. Re: Expensive plan
            markaddleman

            I think I have cardinality and distinct value count for agent_name.  Prior to running the query in question, I issue the following stored procs:

            Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'metric_id', (SELECT COUNT(DISTINCT metric_id) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(metric_id) FROM cassandra_metrics.metric_name), (SELECT MIN(metric_id) FROM cassandra_metrics.metric_name))

            Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'agent_host', (SELECT COUNT(DISTINCT agent_host) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(agent_host) FROM cassandra_metrics.metric_name), (SELECT MIN(agent_host) FROM cassandra_metrics.metric_name))

            Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'agent_name', (SELECT COUNT(DISTINCT agent_name) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(agent_name) FROM cassandra_metrics.metric_name), (SELECT MIN(agent_name) FROM cassandra_metrics.metric_name))

            Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'agent_process', (SELECT COUNT(DISTINCT agent_process) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(agent_process) FROM cassandra_metrics.metric_name), (SELECT MIN(agent_process) FROM cassandra_metrics.metric_name))

            Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'attribute_type', (SELECT COUNT(DISTINCT attribute_type) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(attribute_type) FROM cassandra_metrics.metric_name), (SELECT MIN(attribute_type) FROM cassandra_metrics.metric_name))

            Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'metric_attribute', (SELECT COUNT(DISTINCT metric_attribute) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(metric_attribute) FROM cassandra_metrics.metric_name), (SELECT MIN(metric_attribute) FROM cassandra_metrics.metric_name))

            Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'metric_path', (SELECT COUNT(DISTINCT metric_path) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(metric_path) FROM cassandra_metrics.metric_name), (SELECT MIN(metric_path) FROM cassandra_metrics.metric_name))

            Cost update query: CALL SYSADMIN.setColumnStats('cassandra_metrics.metric_name', 'source_name', (SELECT COUNT(DISTINCT source_name) FROM cassandra_metrics.metric_name), 0, (SELECT MAX(source_name) FROM cassandra_metrics.metric_name), (SELECT MIN(source_name) FROM cassandra_metrics.metric_name))

             

            That ought to fully update the costing info, yes?

             

            Attached is the most recent plan

            • 3. Re: Expensive plan
              shawkins

              To double check, are you expecting a top level dependent join?  If not then the node ordering for the enhanced sort merge doesn't matter. Both sides will be accessed in parallel until the processor can determine if one side is sufficiently small to build an index off of to perform the join.