> I have also noticed that the 1st query for an entity takes a large amount of time (say 18 secs) while subsequent execution of the same query takes 3-4 secs with no result-set caching.
Given no result set caching, you wouldn't expect such a variation in timing. Are you using prepared statements? Have you examined the logs, or tried first issuing the query with noexec on to determine whether the time is being spent in planning? Otherwise there's not much on the Teiid side that would account for the variation; there could be an issue with how long it takes to get connections from your pool, etc. I think you need to understand exactly how the time is being spent before looking for optimizations, but I'll take a stab at answering your questions just in case.
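To see where the time goes, something along these lines could be run against your connection. It is only a rough sketch: the class and method names are made up for illustration, and it assumes a plain JDBC `Connection` to the vdb on which `SET NOEXEC ON` is issued as an ordinary statement.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Teiid.
class QueryTiming {
    /** A block of work that may throw, so JDBC calls can be timed directly. */
    interface ThrowingRunnable {
        void run() throws Exception;
    }

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(ThrowingRunnable task) throws Exception {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Executes the query and reads every row, so fetch time is included. */
    static int drain(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }

    /** Times the query with noexec on, then switches noexec back off. */
    static long planOnlyMillis(Connection conn, String sql) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("SET NOEXEC ON");
            try {
                return timeMillis(() -> stmt.execute(sql));
            } finally {
                stmt.execute("SET NOEXEC OFF");
            }
        }
    }
}
```

Comparing `planOnlyMillis` with `timeMillis(() -> QueryTiming.drain(conn, sql))` for the first and second run should show whether the 18 secs is mostly planning, connection acquisition, or source execution.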
> So the question is what is exactly held in the resultset cache? Is it mapping from the exact query to the location of source tuples that hold data to the select query?
The result set cache is used for user query result caching and for source query caching (using the CacheDirective). It is keyed by the query and several other factors, such as the vdb, user, and session depending upon the scope.
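As a mental model only (this is not Teiid's actual key class, and every name here is invented), the way scope changes what goes into the key can be pictured like this:

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration of a scope-dependent cache key; not Teiid code.
class ScopedCacheKeyDemo {
    enum Scope { VDB, USER, SESSION }

    /** Builds a key whose components depend on the scope of the entry. */
    static List<String> key(Scope scope, String vdb, String user,
                            String sessionId, String sql) {
        switch (scope) {
            case SESSION:
                // Only the same session can hit this entry.
                return Arrays.asList(vdb, sql, "session:" + sessionId);
            case USER:
                // Any session of the same user can hit this entry.
                return Arrays.asList(vdb, sql, "user:" + user);
            default:
                // VDB scope: shared by all users of the vdb.
                return Arrays.asList(vdb, sql);
        }
    }
}
```

With a vdb-scoped entry two different users produce the same key, while with a user-scoped entry they do not, which is why the scope matters for how often you get a hit.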
> If the select resulted in more than 1 batch (and hence multiple source tuples) how are these multiple source tuples matched in the resultset cache?
The storage unit of the result set cache is a batch, but the batches of a result are tracked as a unit. No single batch is purged from all layers of cache until the entire result is purged.
> Based on the documentation it looks like the resultset cache is already turned on but query results are not cached by default; is this correct?
Yes, since we are not aware of all data changes (as the sources can be accessed outside of Teiid) we cannot assume results should always be cached.
> So the way to turn on caching for resultset would be to add a hint in the query like "/*+ cache */ Select * from A" or have the statement run the "set resultSetCacheMode true" query first.
Yes.
> If each subsequent query requires the same hint, would running the resultSetCacheMode be a better option for resultset caching?
Not necessarily. Setting the resultSetCacheMode executionProperty does not give you any of the options that are available with the cache hint.
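To illustrate the difference, here is a small sketch of building the hint with a couple of its options (`ttl` in milliseconds and `scope`). The `CacheHint` class is hypothetical; only the hint text it produces is meant to be sent to Teiid, and the hint has to come at the very start of the sql.

```java
// Hypothetical helper for prefixing a Teiid cache hint to a query string.
class CacheHint {
    /** Builds the hint comment; null arguments leave that option out. */
    static String hint(Long ttlMillis, String scope) {
        StringBuilder opts = new StringBuilder();
        if (ttlMillis != null) {
            opts.append("ttl:").append(ttlMillis);
        }
        if (scope != null) {
            if (opts.length() > 0) {
                opts.append(' ');
            }
            opts.append("scope:").append(scope);
        }
        return opts.length() == 0
                ? "/*+ cache */"
                : "/*+ cache(" + opts + ") */";
    }

    /** Places the hint in front of the sql. */
    static String withHint(String sql, Long ttlMillis, String scope) {
        return hint(ttlMillis, scope) + " " + sql;
    }
}
```

For example, `CacheHint.withHint("select * from A", 60000L, "user")` gives `/*+ cache(ttl:60000 scope:user) */ select * from A`, whereas `set resultSetCacheMode true` only gets you the equivalent of the bare `/*+ cache */` form.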
> What is the preparedplan-cache used for and what is held at each node of the prepared plan cache? Is the sql statement the key and parsed query plan the value in the datagrid? This question will help us determine how to tune plan cache.
A prepared plan is just what you would expect: it is the processing plan generated by the optimizer for some given sql. The key is primarily the sql, but it must also be tracked by vdb and possibly the session. At this time prepared plans are not distributed/replicated across a cluster, as the plan is not serializable. I'm not sure how much tuning of the plan cache you'll need to do. It's more a question of when you should be using PreparedStatements, which are primarily responsible for creating a prepared plan entry - although if you are using CallableStatements or in general calling stored procedures then we are likely creating prepared plans under the covers.
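For reference, the client-side pattern that leads to a prepared plan entry is just ordinary JDBC. A minimal sketch, assuming a table `A` as in your example and a hypothetical integer column `id`:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative only; the table A and column id are assumptions.
class PreparedQuery {
    // Keeping the sql text constant and binding values as parameters
    // lets repeated executions map to the same sql key.
    static final String SQL = "select * from A where id = ?";

    /** Runs the parameterized query and counts the rows returned. */
    static int countRows(Connection conn, int id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows;
            }
        }
    }
}
```

If the values were concatenated into the sql string instead, each distinct value would be a different sql text and would not match the same plan key.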
> How is the scope of the cache determined by Teiid to be session/user/vdb? It can be over-ridden in the query hint but I don't really want to use query hint because it is not standard JDBC and doesn't lend itself to ORM solutions like JPA.
The scope is determined by the sql: by the functions called, session tables referenced, etc. It needs to be overridden though, because we currently do not know if, for example, you're using a source that will return different results based upon the user. The hint is just fine from a JDBC perspective (since it's just a sql comment), but yes, it may be difficult to express correctly through JPA.
> What are the implications of moving Teiid memory off the heap? How is the reference to previous source tuples swapped out in case the off heap memory becomes full?
You have to first understand the memory layers in Teiid. There is the heap layer - this is the estimated heap memory held by the buffermanager (in dual queues) and potentially in use by plans. We'll proactively, and based upon memory pressure, copy/move values to a fixed memory buffer. At this point the batches have a known (and generally smaller) size. This fixed memory buffer layer can be either on or off heap. Generally off-heap will give better performance when you have volumes of data buffered through Teiid, as you will avoid GC sweeps through a large amount of memory, and on some platforms you can allocate more memory than you generally can give your java process as heap memory. Entries that are evicted from the fixed memory buffer spill to disk.
> Based on the architecture writeup; Teiid uses cursors for all data access. How do the cursors interact in caching?
I'm not sure what you are asking. The general principle is that we want to avoid buffering data whenever possible. Thus wherever data can be cursored/streamed in batches, we'll do so. When you cache, you generally end up with a copy of the entire results and then cursor on whichever batch is being accessed.
> We use CursoredStream to hold a reference to large data sets and read data in batches from the CursoredStream. Will Teiid cache the results from the CursoredStream opened on the Teiid JDBC connection?
I'm not familiar with CursoredStream. How specifically does it interact with the client?
> How is the invalidation of resultset cache handled when suppose that the user add/update/delete a row from the entity and this affects the resultset?
Changes are tracked at the table level, not the row level, but of course by default we'll only know about changes issued through Teiid. From there the cache is configured with a max staleness setting to say how long you want entries to stay in the cache even after they are stale. There is an api to make Teiid aware of source changes made outside of Teiid.
> If we need to reduce the amount of time taken for query planning and make the first instance of the same query take the same time; would firing the same query with "set noexec on" be a valid option? Are there any alternatives to improving the query plan as the noexec assumes that I know the queries that I need to prepare plan for beforehand.
You would use the noexec (or a detailed log) to confirm the time spent in planning, but ideally you would not use that as a mechanism to warm the prepared plan cache. If you can provide more details on your scenario, then we could recommend next steps.
> I am assuming that support for Infinispan in server mode would be defined by the Infinispan subsystem. Based on how the cache and buffer manager are used we will think about whether to use Infinispan in embedded mode or client-server mode.
It's more of a question of what benefit do you expect from the replication of results. Infinispan in the context of this discussion is mainly being used to replicate the keys for vdb/user result set caching. The result set cache data is replicated on demand directly through jgroups.
> Thanks in advance in reading through this long list of questions
No problem. They're good questions. Hopefully we'll get your issue addressed without too much hassle.