2 Replies Latest reply on Jun 14, 2015 3:42 PM by markaddleman

    Feature request - Option to push down distinct values in a dependent join

    markaddleman

      My source has an API that accepts two arguments, each of which is a set of values.  Internally, it effectively performs the cartesian product of the arguments and returns the results.  I'm currently modeling this API as a table with dependent join support (although I'd rather model it as a stored procedure, see Feature request - depjoin-like functionality for stored procedures).  Because getDependentValues() returns the cartesian product of values, the translator must back-out the unique values on each dimension.  I'd like an option to retrieve only the unique dependent values.

        • 1. Re: Feature request - Option to push down distinct values in a dependent join
          shawkins

          Can you give a full example of this?

           

          At worst getDependentValues should give you 1 value for each row coming from the other side of the join.  So unless the join is producing a cross product, you would not generally expect that.  But yes, the values do not go through a distinct filter like they are when the engine handles the dependent join.

          • 2. Re: Feature request - Option to push down distinct values in a dependent join
            markaddleman

            > So unless the join is producing a cross product

             

            You hit it *almost* on the head.  The issue I was having had to do with particular plan that was produced.  To recap, I have an API that produces time series data.  For discussion purposes, there are three parameters to the API: a, b and a time range.  To use the API in a relational model, I have a view that produces a crossproduct b.  The API itself is modeled as a table with columns for a, b, timestamp and the data values.  The API table has an access pattern [a,b,timestamp].  Clients are expected to query a view defined as

            create view client_api as select * from a-x-b inner join /*+ makedep */ api on a-x-b.a=api.a and a-x-b.b=api.b

             

            Further, clients will generally provide qualifers on a and b which limit the results quite a bit.  As it turns out, I was relying on the planner to fully resolve a-x-b + client supplied qualifiers before pushing down to the API.  Originally, the data store for parameters a and b was Cassandra and because it doesn't support complex queries, the Teiid plan retrieves data from Cassandra, produces a-x-b, pushes that to the API and then filtering the results by applying the qualifiers.  Since the API is comparatively slow, I was getting horrible query response times.  I have since moved the source for the parameters from Cassandra to H2.  Because H2 supports much more sophisticated queries, the plan Teiid produces fully results a-x-b + qualifiers first and then resolves against the API yielding much a much more efficient plan.