5 Replies Latest reply on Sep 8, 2016 10:06 AM by ehudreches

    Slow down when fetching large result sets

    ehudreches

      This issue was observed using an embedded Teiid server in version 8.13.2. It was triggered by bad sqoop performance, but can be reproduced directly with JDBC.

       

      An HSQL table is defined in the VDB. The table’s schema contains about 40 pretty small string, decimal and int columns. The table contains 2,000,000 rows.

       

      A “SELECT * from <table>” SQL query is issued. The client simply iterate through the result set, just printing out some of the columns.

       

      Initially, the rows are fetched very quickly. But increasingly, there are pauses between batches of rows. By the time we reach 2,000,000 rows we observe 10 second pauses.

       

      The pauses correspond to the size of EmbeddedConfiguration.setProcessorBatchSize().When it is set to 256, the pauses happen every several hundred rows.

      When it is set to 16384 it happens between several thousands of rows and as a result the overall performance is much better.

       

      But obviously large values for process-batch-size will not solve the problem when hundreds of millions of rows are fetched.

       

      No such slow-down is observed when fetching the data directly from HSQL.

       

      When Teiid’s engine debugging is turned on, the slow-down can be observed from the logs messages that prints out TupleBatch.toString messages. These messages are initially printed out every several milli seconds. But as more rows are fetched there are several seconds difference between the messages.

        • 1. Re: Slow down when fetching large result sets
          shawkins

          It would take profiling or gc information to determine exactly what is happening.  A possible issue would be if the result set is being buffered in any way - are you using result set caching, a scrollable resultset, etc.?  Are you using a socket connection?

           

          Locally when I mock up this scenario against H2, I don't see anything like the behavior described regardless of the number of runs.

           

          > But obviously large values for process-batch-size will not solve the problem when hundreds of millions of rows are fetched.

           

          It is definitely fine to increase the processor batch size when you are in a low query volume scenario, but yes, it shouldn't be increased to unreasonable levels.

          • 2. Re: Slow down when fetching large result sets
            ehudreches

            Thanks for the quick response.

             

            • We don’t ask explicitly for result set caching or scrolling.  In the log I can see:  “Command has no cache hint and result set cache mode is not on”.

            Any other type of result set buffering/caching?

             

            • Socket connection: The client calls from JDBC. We use a socket with wire protocol WireProtocol.teii. Is that what you are asking?

             

            If it’s not too difficult, could you tell me how to issue the query directly to Teiid, on the server side, bypassing JDBC and the wire altogether?

             

            If you have any other ideas for experiments or what Teiid code to look at, please let me know.

             

            Thanks.

            • 3. Re: Slow down when fetching large result sets
              shawkins

              > We don’t ask explicitly for result set caching or scrolling.  In the log I can see:  “Command has no cache hint and result set cache mode is not on”.

               

              Then it will default to forward only and you wouldn't expect the result set to be buffered/cached.

               

              > Socket connection: The client calls from JDBC. We use a socket with wire protocol WireProtocol.teii. Is that what you are asking?

               

              Yes.  However locally I don't see any issue with a local or socket connection.

               

              > If it’s not too difficult, could you tell me how to issue the query directly to Teiid, on the server side, bypassing JDBC and the wire altogether?

               

              You can't bypass JDBC, but you can consume directly with a local connection from EmbeddedServer.getDriver() which bypasses the socket layer.

               

              I'll try specifically using HSQL with 8.13 rather than just testing on master with H2 to see if that makes a difference.  If there is more you can provide about what you are doing, such as a stripped down reproducer I could easily provide more input.

              • 4. Re: Slow down when fetching large result sets
                shawkins

                I am not able to reproduce this.  Using 8.13, HSQL, and a socket connection I can retrieve 2000000 rows of 40 small strings in under 10 seconds.  We'll need more details of your scenario to understand what you are seeing.

                • 5. Re: Slow down when fetching large result sets
                  ehudreches

                  Ok, thank you for testing it. There must be something else going on in our environment. I'll post if I find anything related to Teiid.