2 Replies Latest reply on Jul 13, 2017 10:32 AM by Don Krapohl

    Buffer Service and jdbc driver fetch size settings

    Don Krapohl Newbie

      My question is specifically about Cloudera Impala, but it may be relevant to other JDBC drivers as well. Cloudera's JDBC driver documentation describes a fetch size setting as follows:

      ---------------------------------------------------------

      RowsFetchedPerBlock

      Default Value    Data Type    Required
      10000            Integer      No


      Description
      The maximum number of rows that a query returns at a time.
      Any positive 32-bit integer is a valid value, but testing has shown that performance gains are
      marginal beyond the default value of 10000 rows.

      ---------------------------------------------------------
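A minimal sketch of how this setting might be supplied, assuming the Cloudera driver reads connection properties appended to the JDBC URL as semicolon-separated `Key=Value` pairs. Only the `RowsFetchedPerBlock` name and its default of 10000 come from the excerpt above. The `ImpalaUrlSketch` class, `buildUrl` helper, and host name are hypothetical. Port 21050 is assumed as a typical Impala JDBC port.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ImpalaUrlSketch {
    // Builds a URL of the form jdbc:impala://host:port;Key=Value;...
    // ASSUMPTION: the driver parses semicolon-separated properties from the URL.
    static String buildUrl(String host, int port, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:impala://")
                .append(host).append(':').append(port);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps properties in insertion order in the URL.
        Map<String, String> props = new LinkedHashMap<>();
        // 10000 is the documented default; larger values reportedly give marginal gains.
        props.put("RowsFetchedPerBlock", "10000");
        // Hypothetical host; prints jdbc:impala://impala-host.example:21050;RowsFetchedPerBlock=10000
        System.out.println(buildUrl("impala-host.example", 21050, props));
    }
}
```

Whether this driver-level block size interacts with `Statement.setFetchSize` is not established in this thread.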

       

      I see the processor batch size property in the buffer service. We use 1024 on a JVM with 12 GB allocated to Teiid. My question is whether such a large disparity between the JDBC driver's fetch size and the Teiid setting means there is room to increase the processor batch size for Impala. For example, with 12 GB allocated and 25 active plans, should Teiid be able to handle a batch size of 4096 faster?

       

        • 1. Re: Buffer Service and jdbc driver fetch size settings
          Steven Hawkins Master

          > I see the processor batch size property in the buffer service. We use 1024 on a JVM with 12 GB allocated to Teiid. My question is whether such a large disparity between the JDBC driver's fetch size and the Teiid setting means there is room to increase the processor batch size for Impala. For example, with 12 GB allocated and 25 active plans, should Teiid be able to handle a batch size of 4096 faster?

           

          The Teiid processor batch size assumes a nominal target of approximately 2 KB of heap per row. The working batch size can be larger or smaller depending on the data width of the rows. At the JDBC source level, we set the fetchSize property to twice the working batch size. I'm not sure how Statement.fetchSize relates to this Impala property.
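The two facts stated above (a nominal ~2 KB per row, and a source fetch size of twice the working batch size) can be sketched as follows. The inverse scaling by row width in `workingBatchSize` is a hypothetical simplification, not Teiid's actual estimation logic. The class and method names are illustrative only. `Statement.setFetchSize` is the standard JDBC call, which drivers treat as a hint.

```java
import java.sql.SQLException;
import java.sql.Statement;

public class TeiidFetchSketch {
    // Nominal heap target per row (~2 KB), as described in the reply.
    static final int NOMINAL_BYTES_PER_ROW = 2 * 1024;

    // HYPOTHETICAL model: scale the configured processor batch size inversely
    // with the estimated row width so a batch stays near
    // processorBatchSize * 2 KB of heap. Teiid's real calculation differs in detail.
    static int workingBatchSize(int processorBatchSize, int estimatedBytesPerRow) {
        long scaled = (long) processorBatchSize * NOMINAL_BYTES_PER_ROW
                / Math.max(1, estimatedBytesPerRow);
        return (int) Math.max(1, Math.min(Integer.MAX_VALUE, scaled));
    }

    // Per the reply: the source-level fetch size is twice the working batch size.
    static int sourceFetchSize(int workingBatchSize) {
        return 2 * workingBatchSize;
    }

    // Passes the value through the standard JDBC API; it is only a hint to the driver.
    static void applyFetchSize(Statement stmt, int workingBatchSize) throws SQLException {
        stmt.setFetchSize(sourceFetchSize(workingBatchSize));
    }

    public static void main(String[] args) {
        int narrow = workingBatchSize(1024, 512);   // narrow rows -> larger batches (4096)
        int nominal = workingBatchSize(1024, 2048); // nominal-width rows -> 1024
        int wide = workingBatchSize(1024, 8192);    // wide rows -> smaller batches (256)
        System.out.println(narrow + " " + nominal + " " + wide);
        System.out.println(sourceFetchSize(nominal)); // 2048
    }
}
```

Under this model, a 1024 processor batch size at the nominal row width leads to a JDBC fetch size of 2048, still well below Impala's 10000-row default block.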

           

          The decision about how large to make the processor batch size is influenced by a couple of factors: latency (the larger the batch, the more latency you can introduce) and the number of pages/batches you expect to hold in memory. At 4096 rows, each batch/page could take approximately 8 MB of heap, which is large if you have many temporary tables, internal materializations, etc. that have indexes or are otherwise set to prefer memory. But if your workload is mostly streaming and you have expensive network hops, then yes, a larger batch size would be fine.
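The trade-off above can be made concrete with some rough arithmetic. This sketch assumes the nominal 2 KB per row and a fetch size of twice the batch size. It also uses a naive even split of the 12 GB heap across the 25 active plans from the question, which ignores everything else on the heap. The 1,000,000-row result set is hypothetical. The class and method names are illustrative.

```java
public class BatchBudgetSketch {
    static final long NOMINAL_BYTES_PER_ROW = 2 * 1024; // ~2 KB per row, per the reply

    // Approximate heap held by one batch/page at the nominal row width.
    static long batchBytes(int rows) {
        return rows * NOMINAL_BYTES_PER_ROW;
    }

    // Naive upper bound on batches per plan if the whole heap were split evenly
    // across active plans (ignores all other heap usage).
    static long batchesPerPlan(long heapBytes, int activePlans, int rowsPerBatch) {
        return heapBytes / activePlans / batchBytes(rowsPerBatch);
    }

    // Round trips needed to stream totalRows at a given fetch size (ceiling division).
    static long roundTrips(long totalRows, int fetchSize) {
        return (totalRows + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        long heap = 12L * 1024 * 1024 * 1024; // 12 GB JVM from the question
        for (int rows : new int[] {1024, 4096}) {
            System.out.printf("%d rows: %d MB/batch, <= %d batches/plan, %d round trips for 1M rows%n",
                    rows, batchBytes(rows) / (1024 * 1024),
                    batchesPerPlan(heap, 25, rows),
                    roundTrips(1_000_000, 2 * rows));
        }
    }
}
```

With these assumptions, 1024-row batches take about 2 MB each (at most 245 per plan, 489 round trips), while 4096-row batches take about 8 MB each (at most 61 per plan, 123 round trips). This is the memory-versus-network trade-off described above.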