Wishlist item: support automated IN statement expansion with limit clauses
gadeyne.bram Oct 30, 2014 7:03 AMHi,
Just a thought that might be a future feature request.
In my configuration I use tables that contain millions of rows. These are mostly medical data. I create a lot of scripts that select eg 3000 patients. Then I need to collect some values of those patients in a different database.
In the past I've already discussed some difficulties I had with IN statements. Now I notice that these IN statements are also generated in the case when you join tables together. Teiid handles this perfectly unless a maximum number of rows is reached.
eg.
create local temporary table tmp_patients(
patientid integer NOT NULL
)
-- add 3000 patients to this table in some way.
Normally I would write something like this:
select x.patientid, x.value
from tableX x
join /*+ MAKEDEP*/ tmp_patients p on p.patientid = x.patientid.
Since p contains 3000 lines and x contains millions of lines Teiid tries to join these 2 tables in memory.
If tmp_patients would only contain eg. 400 rows then Teiid would generate something like this:
select x.patientid, x.value
from tableX x
where x.patientid in (?,?,?,?,?,?,?,?,....)
The question marks are then replaced by the 400 patient ids.
Therefore I use the following technique:
select x.patientid, x.value
from tableX x
where x.patientid in (
select patientid
from tmp_patients p
order by p.patientid
LIMIT 0,500
)
UNION ALL
select x.patientid, x.value
from tableX x
where x.patientid in (
select patientid
from tmp_patients p
order by p.patientid
LIMIT 500,500
)
UNION ALL
select x.patientid, x.value
from tableX x
where x.patientid in (
select patientid
from tmp_patients p
order by p.patientid
LIMIT 1000,500
)
UNION ALL
select x.patientid, x.value
from tableX x
where x.patientid in (
select patientid
from tmp_patients p
order by p.patientid
LIMIT 1500,500
)
UNION ALL
select x.patientid, x.value
from tableX x
where x.patientid in (
select patientid
from tmp_patients p
order by p.patientid
LIMIT 2000,500
)
UNION ALL
select x.patientid, x.value
from tableX x
where x.patientid in (
select patientid
from tmp_patients p
order by p.patientid
LIMIT 2500,500
)
As you can see the only thing that changes is the first number denoted in the LIMIT statement of the in statement.
It would be great if some feature would exist that automates this. But this is probably not as easy as it sounds.
Any thoughts?
With kind regards
Bram