5 Replies Latest reply on Mar 12, 2015 11:53 AM by shawkins

    Delimited Flat Files | Performance observations

    shiveeta.mattoo

      Hi,

       

      We observe a performance degradation for File Sources while accessing the File data using Teiid VDB vs a normal File read in our application.

      The degradation is roughly 3x. For fixed-width files, however, we observe a performance improvement instead.

       

      For the delimited file source, I profiled the application, and the major bottleneck reported is in the TextTableNode.process method, specifically TextTableNode.parseDelimitedLine.

      Attached is a screenshot reflecting this, generated using YourKit profiler.

       

      TextTableNodeProfilingResult.png

       

      Any thoughts on how to optimize this?

        • 1. Re: Delimited Flat Files | Performance observations
          shawkins

          > We observe a performance degradation for File Sources while accessing the File data using Teiid VDB vs a normal File read in our application.

           

          A performance degradation relative to what?

           

          > For the delimited file source, I profiled the application and the major bottleneck reported is at TextTableNode.process method, specifically - TextTableNode.parseDelimitedLine.

           

          It should be expected that delimited is more expensive to process than fixed width, as a full scan of every character in the line is required.  Using the StringBuilder does add overhead, but it simplifies the handling of escapes.  You can submit a patch or log an issue if you want to improve what's there.
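
To illustrate the cost difference described above, here is a minimal sketch. It is not Teiid's TextTableNode code; the class name, method names, and the doubled-quote escape rule are assumptions made for illustration. The fixed-width parser slices at known offsets, while the delimited parser has to inspect every character and accumulates each field in a StringBuilder so that escaped quotes can be unwrapped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Teiid implementation.
class LineParsingSketch {

    // Fixed width: column boundaries are known up front, so each field is a direct slice.
    static List<String> parseFixedWidth(String line, int[] widths) {
        List<String> fields = new ArrayList<>(widths.length);
        int start = 0;
        for (int width : widths) {
            int begin = Math.min(start, line.length());
            int end = Math.min(start + width, line.length());
            fields.add(line.substring(begin, end).trim());
            start += width;
        }
        return fields;
    }

    // Delimited: every character is examined, and a StringBuilder collects the
    // field so that a doubled quote inside a quoted field can be unescaped.
    static List<String> parseDelimited(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote means a literal quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    current.append(c);         // delimiters inside quotes are data
                }
            } else if (c == quote) {
                inQuotes = true;
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);          // reuse the builder for the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

For example, parseDelimited applied to the line a,"b,c","d""e" with ',' and '"' yields the three fields a, b,c and d"e, whereas parseFixedWidth never looks at individual characters beyond trimming.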

          • 2. Re: Delimited Flat Files | Performance observations
            shiveeta.mattoo

            Thanks Steven,

            The performance degradation was in comparison to direct File read from our application which did not include the virtualization layer.

            On further testing, the performance degradation was observed for Fixed width files as well.

             

            I am working on submitting a patch to improve the performance of flat file reads.

            • 3. Re: Delimited Flat Files | Performance observations
              shiveeta.mattoo

              I made the following changes locally to readLine and parseDelimitedLine to fix the performance bottlenecks reported by the profiler.

              - Replaced BufferedReader with manual buffering of the lines, as suggested here: http://www.kegel.com/java/wp-javaio.html

              - Enhanced parseDelimitedLine to reduce the number of String objects being created.
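
The two changes listed above could look roughly like the following sketch. This is not the actual patch, which was not posted in the thread; the class ManualLineReader, its buffer handling, and the splitUnquoted helper are hypothetical names chosen for illustration. The reader fills a char array directly from the underlying Reader and scans it for line terminators, and the helper creates exactly one String per field for lines that contain no quoting.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the patch discussed in the thread.
class ManualLineReader {
    private final Reader in;
    private final char[] buf;
    private int pos;        // next unread index in buf
    private int limit;      // number of valid chars in buf
    private boolean skipLf; // set after '\r' so a following '\n' is swallowed
    private final StringBuilder line = new StringBuilder(); // reused across calls

    ManualLineReader(Reader in, int bufferSize) {
        this.in = in;
        this.buf = new char[bufferSize];
    }

    // Returns the next line without its terminator, or null at end of input.
    String readLine() throws IOException {
        line.setLength(0);
        boolean sawAny = false;
        while (true) {
            if (pos == limit) {                  // buffer exhausted: refill it
                limit = in.read(buf, 0, buf.length);
                pos = 0;
                if (limit <= 0) {
                    limit = 0;
                    return sawAny ? line.toString() : null;
                }
            }
            if (skipLf) {                        // second half of a "\r\n" pair
                skipLf = false;
                if (buf[pos] == '\n') {
                    pos++;
                    continue;
                }
            }
            int start = pos;
            while (pos < limit && buf[pos] != '\n' && buf[pos] != '\r') {
                pos++;
            }
            line.append(buf, start, pos - start); // bulk copy, not char by char
            if (pos > start) {
                sawAny = true;
            }
            if (pos < limit) {                   // stopped on a terminator
                skipLf = buf[pos] == '\r';
                pos++;
                return line.toString();
            }
        }
    }

    // One String per field and no intermediate objects; only valid for
    // lines that contain no quote or escape characters.
    static List<String> splitUnquoted(String text, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = text.indexOf(delimiter, start)) >= 0) {
            fields.add(text.substring(start, idx));
            start = idx + 1;
        }
        fields.add(text.substring(start));
        return fields;
    }
}
```

Whether this beats BufferedReader depends heavily on the JVM version and buffer size, so any such change should be measured against the original on the same file.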

               

              Based on prelim readings, these changes gave a performance improvement of nearly 50sec for a 446MB file (5 million records).

              However, the results are still not satisfactory compared with reading the file without a virtualization layer: the read through the VDB is almost 3 times slower than a normal read. For a 48M-record file (~3 GB), the time taken is almost 1.5 hours.

               

              Fresh profiling results report bottlenecks at BufferManagerImpl and BufferFrontedFileStoreCache.get.

              This is common framework code rather than anything specific to file sources. Is anything special done for file sources that might cause these to be reported as hot spots?

               

              Any pointers would be helpful. Thank you.

              • 4. Re: Delimited Flat Files | Performance observations
                shawkins

                > Based on prelim readings, these changes gave a performance improvement of nearly 50sec for a 446MB file (5 million records).


                That's encouraging.

                 

                > Fresh profiling results report bottlenecks at BufferManagerImpl and BufferFrontedFileStoreCache.get.

                 

                That would have more to do with processing above the TextTableNode, such that the results are being buffered for further processing rather than just streamed.  What does your user query look like?

                • 5. Re: Delimited Flat Files | Performance observations
                  shawkins

                  Is there any code that you would want to share for this?