I should clarify: All of this is in a single VDB
Great to hear that Teiid is holding so many models. Internally we test with a handful of large models, but I do not believe we have ever tested with a large number of small models. The concern is that VDB metadata is memory bound, so it may eventually hit the limits of your VM size. In multi-source mode (a single model with multiple sources underneath) we have tested with up to 800 sources.
The internal materialization is not a concern, as it is backed by the buffer manager, which spills to disk in the case of overruns.
We would be very interested in your findings. Please keep us posted. Better yet, write a blog :-)
We're running a 64-bit JVM, so memory isn't a huge concern at the moment, but I would love to see some rough numbers so we can approximate the size of the VDB metadata.
Related: Any chance the VDB metadata could be stored in an internal, embedded database (possibly the same engine that supports materialized views?) to reduce the memory footprint?
Related to the related: What engine does Teiid use for IMT? Is it a custom one? Out of curiosity, I wonder if there would be any advantage to switching to something like H2...
> Any chance the VDB metadata could be stored in an internal, embedded database (possibly the same engine that supports materialized views?) to reduce the memory footprint?
It could, but that has not been seen as necessary yet. Having the in-memory model simplifies canonicalization and the implementation of system tables. The object model is currently deeply connected - it's possible to get to the schema from a column. Without a lot of additional changes, any use of metadata from a schema would require that the full schema be in memory. With that limitation in mind, the MetadataStore could gain additional smarts to do schema paging, which would help reduce the memory footprint - as long as no prepared plan or anything else held a lingering schema reference. With some design changes it would also be possible to reduce the footprint based upon what consumes the most (or a variable amount of) memory in each record - the SQL/description/extension properties.
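To make the "lingering schema reference" point concrete, here is a minimal sketch (these are illustrative classes, not Teiid's actual metadata model) of why a deeply connected object graph defeats paging: because each column holds a back-reference to its table, and each table to its schema, any single retained column keeps the entire schema reachable by the garbage collector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical metadata classes mirroring the "column -> table -> schema"
// connectivity described above.
class Schema {
    final String name;
    final List<Table> tables = new ArrayList<>();
    Schema(String name) { this.name = name; }
}

class Table {
    final String name;
    final Schema schema;  // back-reference: Table -> Schema
    final List<Column> columns = new ArrayList<>();
    Table(String name, Schema schema) {
        this.name = name;
        this.schema = schema;
        schema.tables.add(this);
    }
}

class Column {
    final String name;
    final Table table;    // back-reference: Column -> Table
    Column(String name, Table table) {
        this.name = name;
        this.table = table;
        table.columns.add(this);
    }
}

public class SchemaReachability {
    public static void main(String[] args) {
        Schema s = new Schema("accounts");
        Table t = new Table("customer", s);
        Column c = new Column("id", t);
        // Holding just this one column keeps the whole schema alive:
        // a cached plan that retains a Column reference would pin the
        // schema in memory even if the MetadataStore tried to page it out.
        System.out.println(c.table.schema.name);
    }
}
```

This is why paging would only work if nothing (a prepared plan, a cursor, etc.) retained a reference into a schema after the metadata lookup completed.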
> What engine does Teiid use for IMT? Is it a custom one? Out of curiosity, I wonder if there would be any advantage to switching to something like H2...
It is a custom one. That decision is partly historical, but there are other factors as well, such as integrated use of the buffer management logic, shared replication logic, no restrictions on production use, etc.
That said, there are benefits to making the target configurable. Using a single instance would provide a simple alternative to mat view replication, and for temp tables in general we would gain transactional support, additional indexing functionality (such as function-based or hash indexes), etc.
Hi Steve -
Something more to think about regarding the target of materialization: I'd like to target different datastores for materialization/caching depending on the underlying data and the query. To take an idea I've been playing with: we have lots of heterogeneous data sources (ranging from flat files to relational databases to programs). They all expose some kind of graph data that I'd like to traverse, with varying degrees of connections between data in different data sources. Ultimately, I want to apply various traversals to the graph. Two properties I'd like the solution to have: build up a cache in a graph database (Neo4j or something like that), and build the cache lazily, since it is very unlikely I'll need the entire graph.
I'd like to use Teiid as a way of normalizing data access across these data sources but somehow lazily materialize portions of the data into the graph database. Given Teiid's current architecture, I think the solution is a two-stage pipeline model architecture, where the first model's translator handles the caching and, on a cache fault, redirects the query to the underlying model and caches the results. We use this 'pipeline translator' pattern in a few places within our application, and it works well with one drawback: the metadata exposed by the first-stage model is the union of the other models' metadata and is sometimes difficult/inconvenient to compute at Teiid startup.
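The pipeline translator described above is essentially a cache-aside delegation. Here is a minimal sketch of that shape - the interfaces and names are hypothetical stand-ins, not Teiid translator APIs: the first stage answers from its cache when it can, and on a miss delegates to the underlying source and materializes the result lazily.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction over "something that can answer a query";
// in the real system this would be a translator over a model.
interface QueryExecutor {
    List<String> execute(String query);
}

// Stand-in for the second-stage (underlying) model.
class BackingSource implements QueryExecutor {
    public List<String> execute(String query) {
        return List.of("row-for:" + query);
    }
}

// First-stage translator: cache-aside over the backing source.
// Only queries that are actually issued get materialized, which is
// the lazy behavior wanted for the graph cache.
class CachingPipelineExecutor implements QueryExecutor {
    private final QueryExecutor delegate;
    private final Map<String, List<String>> cache = new HashMap<>();

    CachingPipelineExecutor(QueryExecutor delegate) {
        this.delegate = delegate;
    }

    public List<String> execute(String query) {
        // On a cache fault, redirect to the underlying source
        // and store the results for subsequent traversals.
        return cache.computeIfAbsent(query, delegate::execute);
    }
}
```

In the real use case the cache map would be replaced by writes to the graph database (Neo4j or similar), but the control flow - answer from cache, else delegate and populate - is the same.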
I don't have a specific feature request, but I'd like to throw this problem out there as either another example of where a more dynamic metadata model would be helpful or as a more advanced materialization use case.