6 Replies Latest reply on Sep 11, 2011 2:40 PM by markaddleman

Practical limits?

markaddleman Aug 19, 2011 12:29 PM

We're populating Teiid with several dozen models (potentially reaching a few hundred models) and several thousand tables (along with a smattering of stored procedures). Additionally, we are starting to make use of internal materialized views. Our largest IMT could end up having a few million rows. So far, I've been pleasantly surprised that Teiid has handled this without problems. I do have a low-level, persistent concern that we're going to hit some limit at some point.

What are the practical limits or other scaling issues that come into play?

1. Re: Practical limits?

markaddleman Aug 19, 2011 12:31 PM (in response to markaddleman)

I should clarify: All of this is in a single VDB
2. Re: Practical limits?

rareddy Aug 20, 2011 10:11 PM (in response to markaddleman)

Mark,

Great to hear that Teiid is holding so many models. Internally we test with a handful of large models, but I do not believe we have ever tested with a large number of small models. The concern is that VDB metadata is memory-bound, so it may eventually hit a limit depending on your VM size. In multi-source mode (a single model with multiple sources underneath), we have previously tested with up to 800 sources.

Internal materialization is not a concern, as it is backed by the buffer manager, which uses disk in case of overruns.

We would be very interested in your findings. Please keep us posted. Better yet, write a blog :-)

Thank you.

Ramesh..
3. Re: Practical limits?

markaddleman Aug 24, 2011 12:05 PM (in response to rareddy)

We're running a 64-bit JVM, so memory isn't a huge concern at the moment, but I would love to see some rough numbers so we can approximate the size of the VDB metadata.

Related: Any chance the VDB metadata could be stored in an internal, embedded database (possibly the same engine that supports materialized views?) to reduce memory footprint?
4. Re: Practical limits?

markaddleman Aug 24, 2011 12:06 PM (in response to markaddleman)

Related to the related: What engine does Teiid use for IMT? Is it a custom one? Out of curiosity, I wonder if there would be any advantage to switching to something like H2...
5. Re: Practical limits?

shawkins Aug 24, 2011 1:07 PM (in response to markaddleman)

Mark,

> Any chance the VDB metadata could be stored in an internal, embedded database (possibly the same engine that supports materialized views?) to reduce memory footprint?

It could, but that has not been seen as necessary yet. Having the in-memory model simplifies canonicalization and the implementation of system tables. The object model is currently deeply connected: it's possible to get to the schema from a column. Without lots of additional changes, any use of metadata from a schema would require the full schema to be in memory. With that limitation in mind, the MetadataStore could have additional smarts to do schema paging, which would help reduce the memory footprint, as long as no prepared plan or anything else held a lingering schema reference. With some design changes it would also be possible to reduce the footprint based on what consumes the most (or a variable amount of) memory in each record: the SQL, description, and extension properties.
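To make the schema paging idea concrete, here is a minimal sketch. It is not Teiid's MetadataStore; the class `PagedSchemaStore`, its `loader` function, and the `maxResident` bound are all hypothetical names for illustration. It keeps a bounded number of schemas in memory and reloads an evicted schema the next time it is requested.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical paged store (not a Teiid class): keeps at most maxResident
 * schemas in memory and reloads evicted ones on demand.
 */
class PagedSchemaStore<S> {
    private final Function<String, S> loader; // assumed: loads a whole schema by name
    private final Map<String, S> resident;
    private int loads = 0;

    PagedSchemaStore(int maxResident, Function<String, S> loader) {
        this.loader = loader;
        // Access-ordered map: the least recently used schema is evicted first.
        this.resident = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                return size() > maxResident;
            }
        };
    }

    /** Returns the schema, loading it in full if it is not resident. */
    S schema(String name) {
        S s = resident.get(name);
        if (s == null) {
            loads++;
            s = loader.apply(name);
            resident.put(name, s);
        }
        return s;
    }

    int loads() { return loads; }

    boolean isResident(String name) { return resident.containsKey(name); }
}
```

Note that eviction here only drops the store's own reference. As mentioned above, a prepared plan or anything else still holding the schema would keep it in memory, so the sketch saves memory only when no such lingering references exist.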

> What engine does Teiid use for IMT? Is it a custom one? Out of curiosity, I wonder if there would be any advantage to switching to something like H2...

It is a custom one. That decision is partly historical, but there are other factors, such as integrated use of the buffer management logic, shared replication logic, no restrictions on production use, etc.

That said, there are benefits to allowing the target to be configurable. Using a single instance would be a simple alternative to mat view replication, and for temp tables in general we would gain transactional support, additional indexing functionality (such as function-based or hash indexes), etc.

Steve
6. Re: Practical limits?

markaddleman Sep 11, 2011 2:40 PM (in response to shawkins)

Hi Steve -

Something more to think about regarding the target of materialization: I'd like to target different datastores for materialization / caching depending on the underlying data and the query. To take an idea I've been playing with: We have lots of heterogeneous data sources (ranging from flat files to relational databases to programs). They all expose some kind of graph data that I'd like to traverse with varying degrees of connections between data in different data sources. Ultimately, I want to apply various traversals of the graph. Some properties of the solution I'd like are: build up a cache in a graph database (Neo4J or something like that) and build the cache lazily since it is very unlikely I'll need the entire graph.

I'd like to use Teiid as a way of normalizing data access between these data sources but somehow lazily materialize portions of the data into the graph database. Given Teiid's current architecture, I think the solution is to use a two-stage pipeline model architecture where the first model's translator handles the caching and, on cache fault, redirects the query back into the underlying model and caches the results. We use this 'pipeline translator' pattern in a few places within our application and it works well with one drawback: The metadata exposed by the first-stage model is the union of other models' metadata and is sometimes difficult/inconvenient to compute at Teiid startup.
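A stripped-down sketch of that cache-fault behavior, independent of Teiid's translator API: the `PipelineCache` class, the string keys, and the `underlying` function are hypothetical stand-ins for the first-stage model, the query, and the second-stage model. A plain map stands in for the graph database.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical first stage of the pipeline: answers from the cache when it
 * can, and on a cache fault forwards the query to the underlying stage and
 * stores the result, so the cache fills lazily.
 */
class PipelineCache {
    private final Map<String, List<String>> cache = new HashMap<>();
    private final Function<String, List<String>> underlying; // assumed second stage
    private int faults = 0;

    PipelineCache(Function<String, List<String>> underlying) {
        this.underlying = underlying;
    }

    List<String> query(String key) {
        List<String> rows = cache.get(key);
        if (rows == null) {
            // Cache fault: redirect to the underlying model and remember the rows.
            faults++;
            rows = underlying.apply(key);
            cache.put(key, rows);
        }
        return rows;
    }

    int faults() { return faults; }
}
```

Only the portions of the data that are actually queried end up in the cache, which is the lazy behavior I'm after. The sketch ignores invalidation and the metadata problem described above.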

I don't have a specific feature request, but I'd like to throw this problem out there as either another example of where a more dynamic metadata model would be helpful or as a more advanced materialization use case.