3 replies. Latest reply on May 24, 2017, 8:37 AM by shawkins

    Is it possible to run a query in a distributed way using a Teiid cluster?

    kulbhushanc

      I have set up a Teiid cluster; host1 and host2 are the two hosts in the cluster.

       

      I want to run a SQL query on Teiid in a distributed way. For example, suppose I execute a query like:

      "Select * from a_table"

      Is it possible for some of the data for this query to come from host1 and the rest from host2?

       

      I am using Teiid 9.1.3 and JBoss 10.0.0.

       

      Thanks,

      Kulbhushan Chaskar.

        • 1. Re: Is it possible to run a query in a distributed way using a Teiid cluster?
          shawkins

          Teiid doesn't automatically partition a top-level user query. A basic assumption is that your workload has enough query volume to distribute load effectively across the cluster. Do you have a low query volume / large data volume processing scenario?
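
A minimal Python simulation of the assumption in this reply (not Teiid code): a hypothetical `RoundRobinDispatcher` hands each whole query to the next node in turn, much as a client or load balancer might spread connections across cluster members. The node names `host1`/`host2` come from the question; everything else is illustrative. A single query always lands on exactly one node; only a volume of queries spreads the load.

```python
from collections import Counter
from itertools import cycle


class RoundRobinDispatcher:
    """Hypothetical dispatcher: each whole query is sent to exactly one node."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def submit(self, sql):
        # The entire query goes to a single node; nothing splits it further.
        node = next(self._nodes)
        return node, sql


dispatcher = RoundRobinDispatcher(["host1", "host2"])

# One query: all of its work happens on one node.
node, _ = dispatcher.submit("SELECT * FROM a_table")
print(node)  # host1

# Many queries: the load spreads across the cluster.
load = Counter(dispatcher.submit("SELECT * FROM a_table")[0] for _ in range(10))
print(load)
```

No matter how much data `a_table` holds, the first `submit` call is still served by one node; distribution only appears in the aggregate over many queries.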

          • 2. Re: Is it possible to run a query in a distributed way using a Teiid cluster?
            kulbhushanc

            Here is what I understood:

            If a table contains a large amount of data (say, millions of rows) and I fire a select * on that table, the workload will be distributed across the cluster; if the table contains only a few records, the query will be executed by only one node of the cluster. Am I right?

            >Do you have a low query volume / large data volume processing scenario?

            I do have both scenarios.

            • 3. Re: Is it possible to run a query in a distributed way using a Teiid cluster?
              shawkins

              > am I right?

               

              No, the load of a single user query is not automatically distributed. Depending upon the backend, there may be little value in splitting the work of a plain select * against a single table. You are probably more interested in splitting the work when the Teiid layer is performing a significant amount of processing, such as multiple federated joins. Unfortunately, there is nothing built in for that scenario.

               

              Getting an effective distribution of work depends largely on how easily the work can be partitioned. One strategy based on that is to front your VDB with another VDB that uses partitioned union views, where each branch points to a different data source configured to round-robin connections back to the original VDB. This introduces an additional VDB/hop in processing, but the load will be more distributed, although without the precise awareness of per-node work levels, resilience, etc. that you may want.
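
The fronting-VDB workaround can be sketched as a small Python simulation; it is not Teiid configuration or DDL. Assumptions, all hypothetical: `a_table` is partitioned on an integer column `id`; each branch of the union view is backed by its own data source (`RoundRobinDataSource`); and every data source hands out connections round-robin across the cluster nodes, standing in for connections that loop back to the original VDB. Branches run concurrently and their rows are concatenated, as a UNION ALL would do.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from threading import Lock

# Hypothetical stand-in for a_table as exposed by the original VDB on every node.
A_TABLE = [{"id": i, "value": f"row-{i}"} for i in range(1, 101)]

# Partition ranges for the union branches: (low inclusive, high exclusive).
PARTITIONS = [(1, 51), (51, 101)]


class RoundRobinDataSource:
    """Hypothetical data source that rotates connections across cluster nodes."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)
        self._lock = Lock()

    def connect(self):
        # Guard the shared iterator in case several queries use this source at once.
        with self._lock:
            return next(self._nodes)


def run_branch(source, low, high):
    # Each branch carries its partitioning predicate, roughly:
    # SELECT * FROM a_table WHERE id >= low AND id < high
    node = source.connect()
    rows = [r for r in A_TABLE if low <= r["id"] < high]
    return node, rows


def query_partitioned_union(nodes):
    # One data source per branch; each starts its rotation at a different
    # node so the branches of a single query begin on different hosts.
    sources = []
    for i in range(len(PARTITIONS)):
        offset = i % len(nodes)
        sources.append(RoundRobinDataSource(nodes[offset:] + nodes[:offset]))
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        futures = [pool.submit(run_branch, src, low, high)
                   for src, (low, high) in zip(sources, PARTITIONS)]
        results = [f.result() for f in futures]
    # UNION ALL: concatenate branch results without deduplication.
    nodes_used = [node for node, _ in results]
    rows = [row for _, branch_rows in results for row in branch_rows]
    return nodes_used, rows


nodes_used, rows = query_partitioned_union(["host1", "host2"])
print(nodes_used)  # ['host1', 'host2']
print(len(rows))   # 100
```

The per-branch offset is a choice made in this sketch so that the branches of one query start on different hosts. Independently rotating data sources have no such coordination, so in practice branches of a single query can still land on the same node, which is the kind of imprecision about per-node work levels the reply cautions about.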