3 Replies Latest reply on May 24, 2017 8:37 AM by shawkins

Is it possible to run query in distributed way using teiid cluster?

kulbhushanc May 23, 2017 8:34 AM

I have set up a Teiid cluster; host1 and host2 are the two hosts in the cluster.

I want to run a SQL query on Teiid in a distributed way. For example, let's say I fire a SQL query like

"Select * from a_table"

Is it possible that some of the data for this query comes from host1 and some from host2?

I am using Teiid 9.1.3 and JBoss 10.0.0.

Thanks,

Kulbhushan Chaskar.

1. Re: Is it possible to run query in distributed way using teiid cluster?

shawkins May 23, 2017 8:53 AM (in response to kulbhushanc)

Teiid doesn't do automatic partitioning of a top level user query. A basic assumption is that your workload will have sufficient query volume to effectively distribute load across the cluster. Do you have a low query volume / large data volume processing scenario?
2. Re: Is it possible to run query in distributed way using teiid cluster?

kulbhushanc May 24, 2017 4:48 AM (in response to shawkins)

What I understood:
If a table contains a large amount of data (millions of rows) and I fire select * on that table, the workload will be distributed across the cluster; if the table contains only a few records, the query will be executed by only one node of the cluster. Am I right?
>Do you have a low query volume / large data volume processing scenario?
I do have both scenarios.
3. Re: Is it possible to run query in distributed way using teiid cluster?

shawkins May 24, 2017 8:37 AM (in response to kulbhushanc)

> am I right?

No, the load of a single user query is not automatically distributed. Depending upon the backend, there may be little value in splitting the work of just select * against a single table. You are probably more interested in splitting the work when the Teiid layer is performing a significant amount of processing, such as multiple federated joins. Unfortunately there is nothing built-in for that scenario.

Getting an effective distribution of work depends largely on how easily the work can be partitioned. One strategy based on that is to front your VDB with another VDB that uses partitioned union views, where each branch points to a different data source that is configured to round-robin connections back to the original VDB. This introduces an additional VDB/hop in processing, but the load will be more distributed, although without the precise understanding of work levels on each node, resilience, etc. that you may want.
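
A minimal sketch of what such a partitioned union view in the fronting VDB might look like. The model names (node_a, node_b), the view name, and the columns (id, region, payload) are hypothetical and not taken from this thread. It assumes both source models import a_table from the original VDB, each through its own data source, and that region is a column whose values split the rows cleanly. Teiid can only treat the view as a partitioned union when it can infer a partitioning column (for example from IN predicates or constants in each branch), so check the rules in the documentation for your version.

```sql
-- Hypothetical view defined in the fronting VDB.
-- node_a and node_b are assumed source models that both expose a_table
-- from the original VDB; the data source configuration that round-robins
-- connections across host1 and host2 is not shown here.
CREATE VIEW a_table_distributed (
    id      integer,
    region  string,
    payload string
) AS
    -- Branch 1: rows for one set of partition values
    SELECT id, region, payload
    FROM node_a.a_table
    WHERE region IN ('EAST', 'NORTH')
    UNION ALL
    -- Branch 2: the remaining, non-overlapping partition values
    SELECT id, region, payload
    FROM node_b.a_table
    WHERE region IN ('WEST', 'SOUTH');
```

A client would then query a_table_distributed instead of a_table, and each branch is issued as a separate source query that may be served by a different node. The partition values must not overlap and should cover every row, otherwise the view returns duplicate or missing data.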