2 Replies Latest reply on Jun 14, 2015 3:42 PM by markaddleman

Feature request - Option to push down distinct values in a dependent join

markaddleman May 28, 2015 10:17 AM

My source has an API that accepts two arguments, each of which is a set of values. Internally, it effectively performs the cartesian product of the arguments and returns the results. I'm currently modeling this API as a table with dependent join support (although I'd rather model it as a stored procedure, see Feature request - depjoin-like functionality for stored procedures). Because getDependentValues() returns the cartesian product of values, the translator must back-out the unique values on each dimension. I'd like an option to retrieve only the unique dependent values.

1. Re: Feature request - Option to push down distinct values in a dependent join

shawkins May 28, 2015 8:17 PM (in response to markaddleman)

Can you give a full example of this?

At worst getDependentValues should give you 1 value for each row coming from the other side of the join. So unless the join is producing a cross product, you would not generally expect that. But yes, the values do not go through a distinct filter like they are when the engine handles the dependent join.
Actions
2. Re: Feature request - Option to push down distinct values in a dependent join

markaddleman Jun 14, 2015 3:42 PM (in response to shawkins)

> So unless the join is producing a cross product

You hit it *almost* on the head. The issue I was having had to do with particular plan that was produced. To recap, I have an API that produces time series data. For discussion purposes, there are three parameters to the API: a, b and a time range. To use the API in a relational model, I have a view that produces a crossproduct b. The API itself is modeled as a table with columns for a, b, timestamp and the data values. The API table has an access pattern [a,b,timestamp]. Clients are expected to query a view defined as
create view client_api as select * from a-x-b inner join /*+ makedep */ api on a-x-b.a=api.a and a-x-b.b=api.b

Further, clients will generally provide qualifers on a and b which limit the results quite a bit. As it turns out, I was relying on the planner to fully resolve a-x-b + client supplied qualifiers before pushing down to the API. Originally, the data store for parameters a and b was Cassandra and because it doesn't support complex queries, the Teiid plan retrieves data from Cassandra, produces a-x-b, pushes that to the API and then filtering the results by applying the qualifiers. Since the API is comparatively slow, I was getting horrible query response times. I have since moved the source for the parameters from Cassandra to H2. Because H2 supports much more sophisticated queries, the plan Teiid produces fully results a-x-b + qualifiers first and then resolves against the API yielding much a much more efficient plan.
Actions

Go to original post