4 Replies Latest reply on Dec 27, 2012 12:21 PM by markaddleman

    Pushdown support for ARRAYTABLE and OBJECTABLE ?

    markaddleman

      I'm throwing this out there for discussion:

       

      We have programmatic data sources whose schema cannot be known at VDB deployment time.  To get around this problem, we define that they return an array and we use ARRAYTABLE in the query to specify the schema.  We end up with queries like this:

       

      SELECT a FROM (SELECT tuple FROM t) t, ARRAYTABLE(tuple.t COLUMNS a varchar) schemafied_t WHERE a > 0 ORDER BY a

       

      Functionally, this works but isn't always performant.  What do you think of some mechanism for pushing down the appropriate query to the execution?

       

      Teiid might push down the query including the ARRAYTABLE and, thus, make the translator responsible for understanding a pretty complex query AST.  Alternatively, Teiid might pushdown the following query:

      SELECT a FROM schemafied_t WHERE a > 0' ORDER BY a

      I think that "schemafied_t" should be represented as a new AST object (SchemaProjection?) that contains

      1. the AST for SELECT tuple FROM t
      2. all of the schema information from the ARRAYTABLE function
      3. perhaps the alias "schemafied_t"

       

      In more complex cases, the user query might be federated:

       

      SELECT schemafied_t.a,s.b FROM (SELECT tuple FROM t) t, ARRAYTABLE(tuple.t COLUMNS a varchar, b varchar)  schemafied_t JOIN s ON schemafied_t.a=s.a WHERE a>0 ORDER BY b

       

      Here, the pushdown would be

      SELECT schemafied_t.a FROM schemafied_t WHERE a>0

       

      I could very well be wrong about this, but I think ARRAYTABLE always references exactly one data source so the query can always be rewritten as if the data source had exposed that schema in getMetadata().

       

      I think that any pushdown facility that's applicable to ARRAYTABLE is equally applicable to OBJECTTABLE.  In fact, I think the use cases for OBJECTTABLE are more interesting since it allows for optimizing queries around very dynamic data structures.  It isn't our use case, but I can see this kind of facility used to efficiently bind Teiid to document oriented data sources like MongoDB or completly dynamic schema sources like Datomic.

        • 1. Re: Pushdown support for ARRAYTABLE and OBJECTABLE ?
          shawkins

          > Functionally, this works but isn't always performant.  What do you think of some mechanism for pushing down the appropriate query to the execution?

           

          So the issue with performance is likely around that you are pulling back all tuples.  There are a couple of possibilities.  The simpliest in the array case is to add support on your translator for the array_get function:

           

          SELECT a FROM (SELECT tuple FROM t WHERE array_get(tuple, 1) > 0) t, ARRAYTABLE(tuple.t COLUMNS a varchar) schemafied_t ORDER BY a


          ARRAYTABLE is really nothing more than a short cut for a bunch of array_get's in any case.  The engine rather than attempting to push down arraytable, could just look for array_get/cast support and do a rewrite that removes the nested table function altogether:

           

          SELECT cast(array_get(tuple, 1) as varchar) as a FROM t WHERE cast(array_get(tuple, 1) as varchar) > '0' ORDER BY a


          This is shown even a little more simplified since the t view would be flattened as well.  The downsides to this approach are possibly more overhead to the evaluation of the array_get/cast operations and needing cast support on your source.  Also this doesn't really convey the notion that there is a schema to the array.

           

          > Teiid might push down the query including the ARRAYTABLE and, thus, make the translator responsible for understanding a pretty complex query AST.

           

          At some point we should push it down, however that hasn't been a priority since it's not a SQL standard construct.  In theory it's only of use for Teiid sources.

           

          > Alternatively, Teiid might pushdown the following query:SELECT a FROM schemafied_t WHERE a > 0' ORDER BY a


          I see this in two ways. 

          1 it looks like you are trying to introduce a higher level handling of array sources from a translator perspective.  There are some similar concerns in the object translator.  There you have tables that represent objects and the logic for handling requests against those tables essentially calls into the same logic as OBJECTTABLE.  In effect the object translator is exposing specialized OBJECTTABLE views that allow for searchability on the object's primary key/indexed fields.  Your additional constraint of not knowing the schema however means that you cannot generate the approprate metadata, so that doesn't quite work.

          2. it seems like you are wanting to add views (or possibly tables backed by some notion of an array translator) after deploy time to encasulate the schemafied entities.  Unfortunately we don't allow that yet, but is something that should be on the radar shortly.  The ability to incrementally add metadata isn't that far off after all of various metadata related changes.  VDB reuse gets you part of the way to this solution, but has limitations in terms of sessioning, etc. when metadata changes are made.

           

          > I could very well be wrong about this, but I think ARRAYTABLE always references exactly one data source so the query can always be rewritten as if the data source had exposed that schema in getMetadata().

           

          ARRAYTABLE always references a scalrar value, but there isn't always a path from that scalar to a single source - for example returning an array from array_agg or a udf that references multiple sources and passing that value to ARRAYTABLE.

           

          > I think that any pushdown facility that's applicable to ARRAYTABLE is equally applicable to OBJECTTABLE.  In fact, I think the use cases for OBJECTTABLE are more interesting since it allows for optimizing queries around very dynamic data structures

           

          Yes, similar to ARRAYTABLE at some point we should push it down, but it is a Teiid specific construct.

          • 2. Re: Pushdown support for ARRAYTABLE and OBJECTABLE ?
            markaddleman
            So the issue with performance is likely around that you are pulling back all tuples.

             

            Yes.  It was so obvious in my mind, I don't know why I didn't make that more clear in my post

            ARRAYTABLE is really nothing more than a short cut for a bunch of array_get's in any case.  The engine rather than attempting to push down arraytable, could just look for array_get/cast support and do a rewrite that removes the nested table function altogether:

             

            SELECT cast(array_get(tuple, 1) as varchar) as a FROM t WHERE cast(array_get(tuple, 1) as varchar) > '0' ORDER BY a

            I had read that ARRAYTABLE was nothing more than array_get but it didn't sink in.  Are you saying that array_table is pushdown-able in 8.1 or 8.2?  If so, we can easily generate queries of this form though, as you say, the fact that there is a schema gets obscured.  If not, I'll add a jira.

             

            It strikes me that OBJECTTABLE is really nothing more than Iterable<Map<String, Object>>.  Does it make sense for Teiid to provide a pushdown-able map_get function?

             

            2. it seems like you are wanting to add views (or possibly tables backed by some notion of an array translator) after deploy time to encasulate the schemafied entities.  Unfortunately we don't allow that yet, but is something that should be on the radar shortly.  The ability to incrementally add metadata isn't that far off after all of various metadata related changes.  VDB reuse gets you part of the way to this solution, but has limitations in terms of sessioning, etc. when metadata changes are made.

            Yes and no.  Certainly, I'm looking for more dynamism.  VDB reuse gets us partly there and we'd love the ability to add views after deploy time.  We are also discovering use cases for transient tables:  We'd like the abilty for our users to dynamically register new tables and drop them, perhaps within a few minutes.  This operation would be fairly common so redeploying the VDB is too weighty.  Ideally, these transient tables could be backed by any regular translator (we expect these operations to be performed mostly against regular databases as users explore data).  If, instead, they were backed by some array translator, I don't think it would be a big inconvenience.  In fact, I suspect that we could write a delegating translator to bridge the two.

            • 3. Re: Pushdown support for ARRAYTABLE and OBJECTABLE ?
              shawkins

              > Are you saying that array_table is pushdown-able in 8.1 or 8.2?  If so, we can easily generate queries of this form though, as you say, the fact that there is a schema gets obscured.  If not, I'll add a jira.

               

              Go ahead and add a jira.

               

              > It strikes me that OBJECTTABLE is really nothing more than Iterable<Map<String, Object>>.  Does it make sense for Teiid to provide a pushdown-able map_get function?

               

              OBJECTTABLE doesn't inherently use a map model.  It is really up to the script engine as to how the object is accessed.  There could be an analogous object_get (or OBJECTQUERY) operation.  However OBJECTTABLE is more like XMLTABLE, than ARRAYTABLE.  If the context item is iterable, then multiple rows will be produced, whereas ARRAYTABLE will currently only project a single row. 

               

              You are right that there is room to add handling that has built-in support for maps of attributes much like the built-in array handling.  Teiid script will likely be enhanced at some point to handle maps.

               

              Steve

              • 4. Re: Pushdown support for ARRAYTABLE and OBJECTABLE ?
                markaddleman

                > Are you saying that array_table is pushdown-able in 8.1 or 8.2?  If so, we can easily generate queries of this form though, as you say, the fact that there is a schema gets obscured.  If not, I'll add a jira.

                 

                Go ahead and add a jira.

                Done:  https://issues.jboss.org/browse/TEIID-2336

                 

                Thanks