Version 4

    The following is an initial step at providing guidelines for designing a data access layer using Teiid.    This approach is a hybrid (i.e., data access layer and data abstraction layer) and is adapted based on Teiid's data driven approach to data federation.    A goal is to help ensure that your data access layer meets the requirements of your application, performs efficiently and securely, and is easy to maintain and extend as business requirements change.    And I hope, that if this is useful, it can continually be updated to add more detail or clarification that would be beneficial to desigining a data access layer using Teiid. 

     

    Design Approach

     

    1. Create an overall design for your data access layer:
      1. Indentify the data source requirements
        1. Enumerate the data sources
        2. What method will each be accessed
        3. Are there special security requirements for obtaining a connection (i.e., does the user credentials need to be passed down or is it a trusted subsystem)
        4. Is materialization needed? (i.e., so that transaction system isn't accessed directly or specific reporting needs required)
        5. Is this for read-only or will there be updates
        6. Are blob's going to be read or written?
      2. Determine how the data will be exposed
        1. Which Teiid interfaces to be used (i.e.., JDBC, ODBC, Web Services, etc.)
        2. What 3rd party tools will be used (i.e,, Business Objects, BIRT, etc.)
      3. Determine security requirements
        1. What level of data access security to be used (i.e., use roles, row or column level security)
      4. Consider performance and scalability objectives
        1. Can materialization be used to consolidate different datasources for improved access
        2. Can caching be used for static data for improved performance
      5. Define the data abstraction interface(s)

                     This step is a combination of working from the bottom-up and the top-down, and ends with mapping the logical model to your business entities.

        1. Define the business entities (i.e, customer, order, P&L Report, Accounts Payable Report, etc.)  that are exposed by the datasources
          1. build physical model via designer import option
          2. build a virtual model, creating one view for each physical (1-to-1 mapping) table
          3. build the business entity views (i.e., customer, order, etc.) by creating transformations that combine the views created in step #2
        2. Define the data abstraction layer that the user/developer will be used based on the business entities
          1. Based on requirements of the external systems, define the logical views (i.e, interfaces) to be exposed for use
        3. Map your logical views in the transformations to the business entities.
        4. When you build the vdb, change the visibility of the source models to false.   Don't give direct access to the source models for a couple of reasons:
          1. Using the abstraction layered approach will enable you to make changes to your source structure without impacting the virtual layers.  This includes the ability to swap datasources with possible minimal impact.
          2. Ensures all access goes thru the same business rules in the virtual layers.
    1. Other Considerations
      1. Do you plan to use a load-balancer to distribute load across the servers (see below Load Balancing / HA Options)
    2. Tuning
      1. see the Caching section below for tuning options.
      2. See the Admin-Guide, section Performance Tuning / Memory Management regarding configuring buffer manager.
      3. The Admin-Guide Performance Tuning section also talks about memory allocation of on-heap vs off-heap buffer space in large RAM scenarios.
      4. How to influence the optimizer
        1. The Reference Guide has high-level descriptions of many of the optimizations.  You can also reference https://community.jboss.org/wiki/DesignerMetadataUsageRequirementsInTeiidRuntime tha touches nearly all the metadata properties.
      5. Considerations:
          • If processing large volumes (100+ mb's) of unique data sets, consider the following options:
            • If more memory is available for allocation, then it should be increased
            • If not enough memory, then consider disabling the value cache by setting org.teiid.useValueCache to false.   This can be done by editing <jboss-install>/bin/run.conf and adding the property in the format of -Dproperty=value
      6. Buffer Management
        1. With 7.7, especially the buffer manager can typically be left at the default settings.  Only if there is a specific need, such as the usage of direct memory or an extreme amount of intermediate storage, would values need to be adjusted
      7. Data Source(s)
        1. The datasources accessed by a VDB may require the “max-pool-size” property to be adjusted. The setting needs to be adjusted across all the datasources that are considered important to meeting the desired level of maximum concurrent client queries.

     

    Load Balancing / HA Options

    Basic load balancing capabilities are built into the Teiid JDBC driver, and ensure that after a connection is established, clients will balance query requests between the available servers indicated on the URL. An option for tcp load balancing is to use HAProxy.   A article regarding how it can be used can be found here: http://tenfourty.com/2011/04/09/how-to-load-balance-tcp-connections-with-haproxy/. Benefits to using  HAProxy or something similar are:

    • System can be scaled up without having to change the URL
    • Servers can be hidden to direct user access

    Other third-party load balancing solutions can provide other capabilities, such as:

    • Load balancing for web service request to Teiid
    • Network redirect to backup site in a disaster recovery scenario

     

    Caching

     

    # i.  Reasons to use a caching

     

    • Improve query performance by caching expensive queries or static data
    • When multiple data sources are involved, this could ease network load
    • Provide data replication so that consolidated information can be provided on a timely bases. 
    • Provide the ability to work disconnected.  Data can be replicated so that the primary system/data is off loaded to local processing.

     

    # ii.  Caching options

     

    1. Internal
    • ResultSet Cache
      • When its refreshed:  the Teiid instance is notified via the EventDistributor bean (there is no system procedure hook at this time for that call) that will appropriately flush the resultset and other cache entries that were built off of the specified tables (it currently does not go down to the row/column granularity)
    • Code Tables
    • Materialized View (i.e., temp tables)
    1. External
    • Materialized View

     

    # iii.  Performance Considerations

     

    1. When considering materialization, consider selecting a view that has a join.   This will eliminate the engine from doing the join and improve performance
    2. For internal materialization, be sure to define primary index on materialized table
    3. Internal cache size limitation is 2,147,483,647 rows (2^31-1)

     

    # iv.  Use Cases

     

    1. Static data provided to the application for populating pick lists.

    Use an internal materialized table and populate it from a data source. This would enable queries to be performant when returning static data to the application.

    1. Improve performance of complex queries by materializing the results. 

    When a complex query, which could be reading multiple data sources, is running real slow, it could be an option to materialize the results. Making the same query from the client, appear much  This optionUse an internal materialized table and populate it from a data source. This would enable queries to be performant when returning static data to the application.

    1. Provide the ability to stage business data on a timely manner, such as catalogs, advertisements, discounts, etc., is made available to the applications in a recurring timely fashion (i.e., weekly).

    Using external materialization, business data can be read from its various sources and consolidated for  use.    The replications can be scheduled such that the data can become available when needed.  In this case, the catalogs and advertisements will be made available once the replication is done.

    1. Using temp tables

     

    Creating a snapshot of a subset of data is a good use.