Three levels of "caching" discussed:
1) First is a passive cache (passive from the user's standpoint), similar to the current result set cache, and perhaps a result object cache in the future. As queries execute and flow through the system, similar queries hit the cache and are fulfilled from it. Ideally this cache would be distributed among nodes in a clustered environment, though perhaps not initially. (Note: I added this after our discussion.) This level involves the least user involvement and offers the least flexibility from the user's standpoint, but is the most automated. The only controls are config params such as TTL and cache size that the user sets on an instance/system basis.
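The passive level can be pictured as nothing more than a small TTL- and size-bounded map keyed by the query text. The sketch below is illustrative only; the class and parameter names (`PassiveResultCache`, `max_entries`, `ttl_seconds`) are assumptions, not product API, and a real implementation would key on more than raw SQL and would be distributed across cluster nodes.

```python
import time
from collections import OrderedDict

class PassiveResultCache:
    """Minimal sketch of a TTL + size-bounded result set cache (LRU eviction)."""

    def __init__(self, max_entries=100, ttl_seconds=60):
        # These correspond to the instance/system-level config params:
        # cache size and TTL are the user's only controls at this level.
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._entries = OrderedDict()  # sql -> (expiry_time, results)

    def get(self, sql):
        entry = self._entries.get(sql)
        if entry is None:
            return None                     # miss: caller executes the query
        expiry, results = entry
        if time.time() > expiry:
            del self._entries[sql]          # expired: treat as a miss
            return None
        self._entries.move_to_end(sql)      # refresh LRU position
        return results

    def put(self, sql, results):
        if len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)   # evict least recently used
        self._entries[sql] = (time.time() + self.ttl, results)
```

A caller checks `get` first and only executes against the sources on a miss, then `put`s the results back; the user never interacts with the cache directly.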
2) The next level of caching is "pinned in memory" or "checkbox-enabled" cache/materialization backed by the buffer manager (with the buffer manager possibly backed by something like JBossCache or Infinispan), or potentially backed by an in-memory embedded database. Here the user selects virtual tables and decides to materialize/cache them through a simple UI gesture; they would just be materialized to memory. The user has limited control of the matl process, so this is not suitable for complex matl schemes. Open question: how is refresh managed and implemented? This level is also not for large data sets, where the lack of indexing will kill performance. It is better suited to data sets with intermittently available data sources, or data sets that require complex calculations or aggregations. This is a desirable new level of caching/materialization not currently offered in the product.
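The checkbox level above amounts to: pin a table, hold its full contents in memory, serve reads from there until an explicit refresh. A minimal sketch, assuming a hypothetical `load_fn` that executes the view's query against the underlying sources:

```python
class PinnedTableCache:
    """Sketch: the user "checks a box" on a virtual table; its full contents
    are pulled into memory and served from there until explicitly refreshed."""

    def __init__(self, load_fn):
        self.load_fn = load_fn   # hypothetical: runs the view's query at the sources
        self._pinned = {}        # table name -> in-memory rows

    def pin(self, table):
        self._pinned[table] = self.load_fn(table)   # materialize to memory

    def refresh(self, table):
        # Refresh policy is the open question: here it is purely manual.
        if table in self._pinned:
            self._pinned[table] = self.load_fn(table)

    def fetch(self, table):
        # No indexing: every read scans the in-memory rows, which is why
        # this level is wrong for large data sets.
        if table in self._pinned:
            return self._pinned[table]
        return self.load_fn(table)   # not pinned: go to the source
```

Note that once pinned, `fetch` never touches the source, which is exactly what makes this level useful for intermittently available sources or expensive aggregations.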
3) The third level is a revamping of matl views as we know them: moving away from the process where the user checks a box in Designer, the matl view source model and DDL scripts are magically generated under the covers, and the scripts are spit out when the vdb is deployed. Within Designer, change the user gesture to identify tables/views for "full" materialization. The gesture causes generation of a matl view source model and a matl view model on top, where the matl view model matches the virtual model from which it was generated. The underlying matl source model may ultimately differ, but that would be user customization; the automated process would generate view/source models that match and have direct transformations by default. Open issue: how to decide when to hit the matl view or not, and how to present that to the user. There are benefits to having a single view for client apps (e.g. clients coded to always hit the foo table); client code should not have to embed the decision of whether to hit the matl view or the source data.
Changing client code is too disruptive. We should be able to control the shunt in the data services layer. Options discussed include:
- specialized language construct to allow joining/unioning between matl and source data, with query directive to override the default
- specialized materialization transformations that control this through explicit logic
- others?
Matl view maintenance options discussed include:
- external scripts to pull data out of source and write to matl source
- virtual procedure logic
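The shunt decision discussed above can be sketched as a small planner-level rewrite: given the matl status and an optional per-query directive, pick the matl table or the source. All names here (`plan_target`, the `NO_CACHE` directive, the `_matl`/`_source` suffixes, the status fields) are illustrative assumptions, not product syntax:

```python
def plan_target(view_name, matl_status, directive=None):
    """Decide, in the data services layer, whether a query against a view
    is shunted to its matl table or sent to the source data.
    `matl_status` would come from the matl status table; `directive` is a
    hypothetical per-query override of the default."""
    if directive == "NO_CACHE":
        return view_name + "_source"     # explicit override: bypass matl
    if matl_status.get("valid") and matl_status.get("row_count", 0) > 0:
        return view_name + "_matl"       # default: hit the matl table
    return view_name + "_source"         # matl missing or stale: fall back
```

The point of the sketch is that the client always queries `foo`; the rewrite to `foo_matl` or `foo_source` happens entirely inside the engine.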
When the materialized source model is created, we need to account for status data. A status table in the materialized schema and source model was suggested. What should it contain? Number of rows materialized? Last updated? What else? This should be available to materialization management processes so users can check this information when performing more complex scenarios, and as a basis for monitoring/auditing.
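One possible shape for that status table, using an in-memory SQLite database purely for illustration; the table name `matl_status` and its columns are assumptions reflecting the fields suggested above (rows materialized, last updated), plus a validity flag:

```python
import sqlite3
import time

# Hypothetical status table for the materialized schema: one row per matl
# table, so management processes and monitoring/auditing can inspect it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE matl_status (
        table_name   TEXT PRIMARY KEY,
        row_count    INTEGER,   -- num rows materialized
        last_updated REAL,      -- epoch seconds of the last refresh
        valid        INTEGER    -- 0/1: safe for the shunt to use?
    )""")

def record_refresh(table_name, row_count):
    """Called by the matl maintenance process after each load."""
    conn.execute(
        "INSERT OR REPLACE INTO matl_status VALUES (?, ?, ?, 1)",
        (table_name, row_count, time.time()))

record_refresh("foo_matl", 1200)
status = conn.execute(
    "SELECT row_count, valid FROM matl_status WHERE table_name = ?",
    ("foo_matl",)).fetchone()
```

Whatever external script or virtual procedure performs the load would be responsible for updating this table, and the shunt logic could consult it to decide whether the matl copy is usable.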
This is the most complex/advanced form of materialization, providing the greatest flexibility to the user but requiring the most user involvement.