1 Reply Latest reply on Oct 30, 2014 7:31 AM by shawkins

    Wishlist item: support automated IN statement expansion with limit clauses

    gadeyne.bram

      Hi,

       

      Just a thought that might be a future feature request.

       

      In my configuration I use tables that contain millions of rows. These are mostly medical data. I create a lot of scripts that select eg 3000 patients. Then I need to collect some values of those patients in a different database.

       

      In the past I've already discussed some difficulties I had with IN statements. Now I notice that these IN statements are also generated in the case when you join tables together. Teiid handles this perfectly unless a maximum number of rows is reached.

       

      eg.

       

      create local temporary table tmp_patients(

           patientid integer NOT NULL

      )

      -- add 3000 patients to this table in some way.

       

      Normally I would write something like this:

       

      select x.patientid, x.value

      from tableX x

      join /*+ MAKEDEP*/ tmp_patients p on p.patientid = x.patientid.

       

      Since p contains 3000 lines and x contains millions of lines Teiid tries to join these 2 tables in memory.

       

      If tmp_patients would only contain eg. 400 rows then Teiid would generate something like this:

       

      select x.patientid, x.value

      from tableX x

      where x.patientid in (?,?,?,?,?,?,?,?,....)

       

      The question marks are then replaced by the 400 patient ids.

       

      Therefore I use the following technique:

       

      select x.patientid, x.value

      from tableX x

      where x.patientid in (

           select patientid

           from tmp_patients p

           order by p.patientid

           LIMIT 0,500

      )

      UNION ALL

      select x.patientid, x.value

      from tableX x

      where x.patientid in (

           select patientid

           from tmp_patients p

           order by p.patientid

           LIMIT 500,500

      )

      UNION ALL

      select x.patientid, x.value

      from tableX x

      where x.patientid in (

           select patientid

           from tmp_patients p

           order by p.patientid

           LIMIT 1000,500

      )

      UNION ALL

      select x.patientid, x.value

      from tableX x

      where x.patientid in (

           select patientid

           from tmp_patients p

           order by p.patientid

           LIMIT 1500,500

      )

      UNION ALL

      select x.patientid, x.value

      from tableX x

      where x.patientid in (

           select patientid

           from tmp_patients p

           order by p.patientid

           LIMIT 2000,500

      )

      UNION ALL

      select x.patientid, x.value

      from tableX x

      where x.patientid in (

           select patientid

           from tmp_patients p

           order by p.patientid

           LIMIT 2500,500

      )

       

       

      As you can see the only thing that changes is the first number denoted in the LIMIT statement of the in statement.

       

      It would be great if some feature would exist that automates this. But this is probably not as easy as it sounds.

       

      Any thoughts?

       

      With kind regards

      Bram

        • 1. Re: Wishlist item: support automated IN statement expansion with limit clauses
          shawkins

          > Since p contains 3000 lines and x contains millions of lines Teiid tries to join these 2 tables in memory.

           

          You loose me a little here.  With these row counts and especially with the makedep hint, then we should plan a dependent join that will divide up the independent side based upon the source setting of the dependent side (max dependent in predicates, max in criteria size, etc.)

           

          > Any thoughts?

           

          I'd want to see the debug plan for a scenario that you think should be working, but is not.