> First of all, I'd like to mention that I like JBoss Teiid a lot!
> To be more precise, I'm collecting data from several databases (Oracle, Sybase, MySQL) and inserting it into a table on a MySQL server. Inserting 500,000 rows took several hours. Currently I'm using Teiid 8.5.
What is the nature of the insert? An insert against a view? Directly against a source table? How is the client involved? What we need to confirm is whether you are in a scenario where the insert should be batched and, if not, whether there is an easy change to make on either your end or ours to allow it to be batched.
Thank you for the quick reply.
I'm currently using Teiid to collect data from several data sources and combine it into a single result set. There is currently an issue in Teiid ([TEIID-2776] ArrayIndexOutOfBoundsException - JBoss Issue Tracker) that affects the report I'm creating, which I'd like to work around by using some MySQL-specific features.
Therefore I collect the data from these sources and insert it into the MySQL table. I cannot post the real queries here because they involve medical and personal data, so I'll describe the use case instead.
First I've created a local temporary table that contains the admission IDs of some patients. These are the admissions for which I'd like to collect data from several sources.
Next, a query selects the admission numbers and joins them with some Sybase tables. The result is inserted into the MySQL table. I've only created a virtual database containing references to several databases; I didn't create views. So in pseudo code you'll get something like this:
insert into schema_someRealMySQLDatabase
select some columns
from #tmp_table_containing_admission_ids ids
join some_sybase_table st on st.admissionid = ids.admissionid
left join some_other_sybase_table sot on sot.variableid = st.variableid
where some filters
I'm not sure what you mean by a batched insert. Could you point me to some documentation?
With kind regards
IMO, you can really make use of views in Teiid. A view is similar to a table, but you can define a transformation SQL for it that pulls data from all the other sources and makes a result set out of it. For example, you can create a view called "Patient" with a transformation like
select col1, col2, col3 from some_sybase_table st left join some_other_sybase_table sot on sot.variableid = st.variableid where some filters
Now, instead of storing the results of the above query in another table, your client can issue a query like
select col1, col2, col3 from #tmp_table_containing_admission_ids ids join "patient" p on ids.admissionid = p.col1;
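As a toy illustration of the view idea (using Python's built-in sqlite3 in place of Teiid and the real sources; all table, column, and view names here are invented for the sketch):

```python
import sqlite3

# Stand-ins for the federated sources; in Teiid these tables would live in
# Sybase, and "patient" would be a Teiid view, not a local one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_sybase_table (admissionid INTEGER, variableid INTEGER, col1 TEXT);
    CREATE TABLE some_other_sybase_table (variableid INTEGER, col2 TEXT);
    INSERT INTO some_sybase_table VALUES (1, 10, 'a'), (2, 20, 'b'), (3, 10, 'c');
    INSERT INTO some_other_sybase_table VALUES (10, 'x');

    -- The view encapsulates the join; clients query it like a table.
    CREATE VIEW patient AS
        SELECT st.admissionid, st.col1, sot.col2
        FROM some_sybase_table st
        LEFT JOIN some_other_sybase_table sot ON sot.variableid = st.variableid;

    -- Temp table of admission ids, as in the original pseudo code.
    CREATE TEMP TABLE tmp_admission_ids (admissionid INTEGER);
    INSERT INTO tmp_admission_ids VALUES (1), (3);
""")

rows = conn.execute("""
    SELECT p.admissionid, p.col1, p.col2
    FROM tmp_admission_ids ids
    JOIN patient p ON p.admissionid = ids.admissionid
    ORDER BY p.admissionid
""").fetchall()
print(rows)  # [(1, 'a', 'x'), (3, 'c', 'x')]
```

The point is that the client only ever names the view; where its rows actually come from is hidden behind the transformation.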
Here you have essentially removed the need to duplicate the data from the original data stores by using real-time integration, thus removing the data-staleness factor and the burden of keeping multiple stores in sync. However, if you still want to store the data, you can. Here is an example of that:
Create a table in MySQL called "schema_someRealMySQLDatabase"; then your client can issue
insert into schema_someRealMySQLDatabase select col1, col2, col3 from some_sybase_table st left join some_other_sybase_table sot on sot.variableid = st.variableid where some filters
The main idea with data virtualization is to leave the data where it originated, but provide integration and logical views over it, so that the consuming client is abstracted from the sources.
As for the batched insert, I believe the above should be enabled by default if the translator and source driver support it, since you are using an insert with a query command. There may be a configuration setting I am forgetting; I'll look around, and if I find something I will let you know.
> I'm not sure what you mean by a batched insert. Could you point me to some documentation?
With JDBC you can prepare an insert, add value sets to it, and execute the insert as a batch. Batching is something that would typically happen automatically, when possible, once an insert is managed by Teiid. If, for example, you were issuing single inserts through Teiid from the client side, then there is little we could do to improve performance.
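The JDBC pattern being described is: prepare one parameterized INSERT, add many value sets to it, then execute them as a single batch (PreparedStatement.addBatch()/executeBatch()). A minimal sketch of the same idea, using Python's built-in sqlite3 in place of a Teiid connection (executemany plays the role of the JDBC batch; the table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (admissionid INTEGER, value TEXT)")

rows = [(1, "a"), (2, "b"), (3, "c")]

# One prepared statement, many value sets, executed as a single batch --
# analogous to JDBC's PreparedStatement.addBatch()/executeBatch(), and much
# cheaper than issuing one round trip per row.
conn.executemany("INSERT INTO target (admissionid, value) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 3
```

The performance difference comes from amortizing statement preparation and round trips over many rows instead of paying them once per row.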
> insert into schema_someRealMySQLDatabase select ...
This should be processed as a batched insert under the covers. That is, we'll build up results from the query expression, then insert them in batches into the source. Do you have a command log showing how many inserts are processed? There is a property on the translator, maxInsertBatchSize, which defaults to 2048 rows and controls how large a batch is processed at a time.
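For a sense of what that means for your load, the batching behaves roughly like the following sketch (the helper function is a hypothetical illustration, not Teiid code; only the 2048-row default comes from the translator property mentioned above):

```python
def batches(rows, max_insert_batch_size=2048):
    """Yield successive slices of at most max_insert_batch_size rows."""
    for start in range(0, len(rows), max_insert_batch_size):
        yield rows[start:start + max_insert_batch_size]

# 500,000 source rows at the default batch size:
sizes = [len(b) for b in batches(list(range(500_000)))]
print(len(sizes), sizes[0], sizes[-1])  # 245 2048 288
```

So a 500,000-row insert should show up in the command log as a few hundred batched source inserts, not half a million individual ones; if the log shows per-row inserts instead, batching is not being applied.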