I'm throwing this out there for discussion:
We have programmatic data sources whose schema cannot be known at VDB deployment time. To get around this problem, we define that they return an array and we use ARRAYTABLE in the query to specify the schema. We end up with queries like this:
SELECT a FROM (SELECT tuple FROM t) t, ARRAYTABLE(tuple.t COLUMNS a varchar) schemafied_t WHERE a > 0 ORDER BY a
Functionally, this works but isn't always performant. What do you think of some mechanism for pushing down the appropriate query to the execution?
Teiid might push down the query including the ARRAYTABLE and, thus, make the translator responsible for understanding a pretty complex query AST. Alternatively, Teiid might pushdown the following query:
SELECT a FROM schemafied_t WHERE a > 0' ORDER BY a
I think that "schemafied_t" should be represented as a new AST object (SchemaProjection?) that contains
- the AST for SELECT tuple FROM t
- all of the schema information from the ARRAYTABLE function
- perhaps the alias "schemafied_t"
In more complex cases, the user query might be federated:
SELECT schemafied_t.a,s.b FROM (SELECT tuple FROM t) t, ARRAYTABLE(tuple.t COLUMNS a varchar, b varchar) schemafied_t JOIN s ON schemafied_t.a=s.a WHERE a>0 ORDER BY b
Here, the pushdown would be
SELECT schemafied_t.a FROM schemafied_t WHERE a>0
I could very well be wrong about this, but I think ARRAYTABLE always references exactly one data source so the query can always be rewritten as if the data source had exposed that schema in getMetadata().
I think that any pushdown facility that's applicable to ARRAYTABLE is equally applicable to OBJECTTABLE. In fact, I think the use cases for OBJECTTABLE are more interesting since it allows for optimizing queries around very dynamic data structures. It isn't our use case, but I can see this kind of facility used to efficiently bind Teiid to document oriented data sources like MongoDB or completly dynamic schema sources like Datomic.