11 Replies Latest reply on Apr 8, 2014 5:16 PM by Anil Allewar

    Eagar loading behavior of BLOBs in Teiid 8.4

    Anil Allewar Newbie

      I am trying to compare Teiid 8.4 performance against native JDBC for Oracle. For most part we found that Teiid performance is comparable with SQL access through JDBC driver when we have tables that don't have BLOB data. We found that with tables having BLOB data, as the number of records increases Teiid performance drops to about 5x-6x times that of direct JDBC access. Please see numbers in the table below that have the Blob column value as "TRUE"

       

      ORACLE-JDBC
      TEIID-JDBC
      Transactions:No of RecordsBlobIteration (ms)Iteration (ms)
      Table Size

      Average -

      1st Iteration

      Average -

      Remaining 4 Iteration

      Average -

      1st Iteration

      Average -

      Remaining 4 Iterations

      Table Name(from old run)
      *******424,351,139TRUE142710014
      *******131,302FALSE13739315
      *******720,85094,687,298FALSE4,3483,5114,5223,880
      *******21,8043,220,420TRUE5512131,5191,620
      *******00TRUE13616014
      *******8,6522,013,954FALSE633146539205
      *******2270,566TRUE14429210
      *******32,5310TRUE9926235,0495,098
      *******00FALSE13526716
      *******00FALSE13616212
      *******69389,770FALSE2284118358
      *******00TRUE13716015
      *******174FALSE1341927
      *******1,103,918119,609,996FALSE6,1205,1576,4425,793
      *******185,454541,726,992TRUE3,7613,30325,13925,518
      *******1,003,798142,704,688FALSE5,3794,4245,7194,939
      *******505,007230,497,476FALSE6,8765,9667,5387,068
      *******505,005405,861,672FALSE16,15015,09019,39919,080
      *******00FALSE13327012
      *******859,457136,837,530FALSE5,3554,5895,6304,979

       

      For the BLOB columns we are just retrieving the object using getObject() and then typecast based on whether the object is an instance of Blob; but not really reading the contents using either getBytes() or getInputStream() on the Blob object. When I debugged through the Teiid client code, I found that the object received in the org.teiid.netty.handler.codec.serialization.CompactObjectInputStream class contains a javax.sql.rowset.serial.SerialBlob object which might mean that the Blob was serialized on the server itself. I also put the read in a transaction and noticed same behavior.

       

      I am confused as to how Teiid handles LOBs; the documentation states that "Source LOB values (LOBs from physical sources) are typically accessed by reference, rather than having the value copied to a temporary location." but that doesn't seem to happening when I look at the behavior. Debugging through the code indicates that the Teiid JDBC client is actually reading the data in the stream and creating a new "javax.sql.rowset.serial.SerialBlob.SerialBlob" object.

       

      BlobType.readReference(ObjectInput) line: 145
      BlobType(Streamable).readExternal(ObjectInput) line: 125
      BatchSerializer$BlobColumnSerializer1.readObject(ObjectInput) line: 333

      Can you please explain what should be the best practice to consume Blobs from source? Also would appreciate any pointers to making read of Blob data performant in Teiid.

      Thanks,
      Anil
        • 1. Re: Eagar loading behavior of BLOBs in Teiid 8.4
          Ramesh Reddy Master

          Anil,

           

          Yes, Teiid does make a copy in its engine. Source DB is remote machine, and so is client machine. In order to serve contents after the source connection is closed/released, Teiid needs to make a copy. When it makes copy, note that it is not sending all the contents to client in one scoop, it does in a memory efficient manner in chunks.

           

          If client and server were in same VM and using the Local Connection, Teiid will pass the source driver's reference to Blob all the way to client avoiding the unnecessary copy.

           

          Ramesh..

          1 of 1 people found this helpful
          • 2. Re: Eagar loading behavior of BLOBs in Teiid 8.4
            Steven Hawkins Master

            There are multiple factors as to whether we copy lobs or attempt to stream them.  It starts with the translator.  If the copyLobs option is enabled, then all lob values will be copied to a value local to the Teiid server.  If copyLobs is not enabled, and inline lobs is enabled (the default) then when the server is serializing a lob value (for consumption by the client or as part of buffering) if the size of the lob is less than the max lob size (8kb), then we'll create a local copy rather than a streaming reference.

             

            So what are the sizes of your lobs and have you changed any of these settings?

            • 3. Re: Eagar loading behavior of BLOBs in Teiid 8.4
              Anil Allewar Newbie

              Steven,

               

              We use the default settings; no customization so I am assuming that inline lobs is true an the max lob size is 8KB. We also compared the table size and the number of rows and found that the average row size is less that 8KB which definitely means that the size of the lobs doesn't exceed 8KB.

               

              I am curious to know what happens if the lob size if greater than 8KB; how does Teiid provide this data to a remote client?

               

              Anil

              • 4. Re: Eagar loading behavior of BLOBs in Teiid 8.4
                Steven Hawkins Master

                > I am curious to know what happens if the lob size if greater than 8KB; how does Teiid provide this data to a remote client?

                 

                All that is initially sent is a reference value.  Then when the client requests data from the lob it is chunked to the client in 100kb batches.

                 

                Are you actually reading all of the lob stream values in your test, if not then the lob inlining may make the default Teiid approach seem slower.

                • 5. Re: Eagar loading behavior of BLOBs in Teiid 8.4
                  Anil Allewar Newbie

                  Yes, we are actually reading all BLOBs in the test now so that we make an apple-to-apple comparison between Teiid and Oracle JDBC access.

                   

                  Earlier we were just getting a reference to the Blob; but in case of Teiid because of inline LOBs we were getting all the data copied anyways.

                   

                  W.r.t the reference being passed, does Teiid hold onto the source connection till the resultset is consumed or would it pull a new connection from the pool for each BLOB data being requested?

                   

                  I am trying to validate another design decision we took wherein we have streaming factory references passed to the client in case of large blobs/virtual columns etc. Once the client actually wants to read contents of the BLOB/virtual column

                  1. we pass the source connection to the callback streaming factory
                  2. create a query based on the context that was stored in the initial resultset creation
                  3. Execute the query against the source connection
                  4. Return a stream with reference to the BLOB/virtual column content

                  How would this approach be different from the Teiid one where you pass a reference to the BLOB via the DPQ? Please note that we don't hold any source connection reference in our design but would execute the query against source for each BLOB column being requested.

                   

                  Anil

                  • 6. Re: Eagar loading behavior of BLOBs in Teiid 8.4
                    Steven Hawkins Master

                    > W.r.t the reference being passed, does Teiid hold onto the source connection till the resultset is consumed or would it pull a new connection from the pool for each BLOB data being requested?

                     

                    There are two built-in behaviors.  Some sources are marked as supporting lob reads after the result set is closed (which may not always be possible), in which case the result set / connection is closed as normal at the end of execution.  Otherwise we do hold the connection until the user request / result set is closed.

                     

                    > How would this approach be different from the Teiid one where you pass a reference to the BLOB via the DPQ? Please note that we don't hold any source connection reference in our design but would execute the query against source for each BLOB column being requested.

                     

                    Yes, that is different.  For lobs that are locatable by primary key or other means, that is a viable approach.

                    • 7. Re: Eagar loading behavior of BLOBs in Teiid 8.4
                      Ramesh Reddy Master

                      Anil,

                      Yes, we are actually reading all BLOBs in the test now so that we make an apple-to-apple comparison between Teiid and Oracle JDBC access

                      How does numbers differ now?

                      W.r.t the reference being passed, does Teiid hold onto the source connection till the resultset is consumed or would it pull a new connection from the pool for each BLOB data being requested?

                      it keeps the connection/resultset open until the data is read or copylobs job is finished.

                      How would this approach be different from the Teiid one where you pass a reference to the BLOB via the DPQ? Please note that we don't hold any source connection reference in our design but would execute the query against source for each BLOB column being requested.

                      In Teiid, when its says sending reference value, it not sending the source connection reference, buffer manager will send a its own reference-id with which client can request for additional blob data. Where as Teiid engine keeps track of source resultset/connection to reference-id information. I does not ever re-queries the source for the blob data again rather keeps the current reference to resultset. 

                       

                      I know you said you do not hold source connection but your (1) contradicts that right? What you want to do may be valid but will only work in the embedded situation. But in embedded the handling is by true reference, no data ever streamed. If it is remote case, then I guess you wanting do is inform the client that there is blob information there, if they want to read blob, issue a separate call for it, rather than using in-line stream or any other option. Otherwise I understood your question wrong!

                       

                      Ramesh..

                      • 8. Re: Re: Eagar loading behavior of BLOBs in Teiid 8.4
                        Anil Allewar Newbie

                        Ramesh,

                         

                        Interestingly Teiid seems to be faster for non-blob tables when we read the columns using getObject().toString() and read the BLOB bytes. I am trying to reason why that would be the case; the getObject() seems to be lot slower on Oracle side than it was on the previous run whereas it is about 1/3 slower for Teiid than Oracle. Please see the code snippet below.

                         

                        My assumption was that Teiid should be marginally slower than direct JDBC access given same fetch size of 512 on the driver. This is what we found on the previous run when we had line 9-24 commented. Any pointers would be appreciated!!

                         

                        No of RecordsOracle-JDBC
                        Rem Avg - 4 Iterations
                        Teiid-JDBC
                        Rem Avg - 4 Iterations
                        0220
                        0117
                        0211
                        118
                        13434
                        6936582
                        8,652583554
                        505,00561,33454,536
                        505,00723,53621,969
                        720,85013,12312,001
                        859,45717,51814,869
                        1,003,79815,43114,071
                        1,103,91818,80715,444

                         

                         

                        while (rs.next()) {
                          ++nRows;
                          for (int columnIndex = 1 ; columnIndex <= nCols ; ++columnIndex)
                          {
                               startTimeNsGetObject = System.nanoTime();
                               Object o = rs.getObject(columnIndex);
                               elapsedTimeNsGetObject += (System.nanoTime() - startTimeNsGetObject);
                              
                               if (o != null)
                               {
                                    if (o instanceof Blob)
                                    {
                                         hasBlob = true;
                                         blob = (Blob)o;
                                         blobLength = blob.length();
                                         totalBytes += blobLength;
                                         // The first byte is at position 1
                                         blob.getBytes(1, (int) blobLength);
                                    }
                                    else
                                    {
                                         totalBytes += ((o.toString().length()) * 2);
                                    }
                               }
                          }
                          }
                        

                         

                        Anil

                        • 9. Re: Re: Eagar loading behavior of BLOBs in Teiid 8.4
                          Ramesh Reddy Master

                          Anil,

                           

                          In Teiid JDBC driver there is no penalty to calling getObject() vs getInt() kind of calls. Underneath all of them call getObject.  When Teiid calls Oracle driver's getXXX call, it uses the specific call like getInt(), getString(0 etc as Teiid is aware of the data types it is expecting. I suspect Oracle driver is doing something more work in the case of getObject call that case. May be their data is packed in different format, and running a "toString" needs to convert to the native type then to string, where as in Teiid the data is already in native type. To test this theory, instead of doing getObject(), go specific type calls on resultset on both Teiid and Oracle.

                           

                          Ramesh..

                          • 10. Re: Re: Eagar loading behavior of BLOBs in Teiid 8.4
                            Anil Allewar Newbie

                            Ramesh,

                             

                            Please note line 5-8 and you would find that we are just timing the getObject() call that seems to be a lot slower. The toString() call is timed using a different variable altogether.

                             

                            As the next step, I am anyways going to remove the read part and only have the code do getObject() rather than doing toString() or reading bytes.

                             

                            Anil

                            • 11. Re: Re: Eagar loading behavior of BLOBs in Teiid 8.4
                              Anil Allewar Newbie

                              Ramesh,

                               

                              With reference to the design, when we encounter a Blob we send a BLOB reference to the client that has an InputStreamFactory associated with it . The InputStreamFactory has the JNDI name of the source connection pool associated with this resultset. Now when the client calls getBytes()/getInputStream() for the BLOB, the call gets passed to the getInputStream() method of the InputStreamFactory implementation. This will lookup a connection based on the JNDI name from the JBoss JNDI context and pass it to the callback class. The callback already has the primary key and other information to issue a query against the source connection and stream the BLOB back; so essentially for each BLOB data call we issue a separate call.

                               

                              I hope that answers your question!! Thanks for the detailed explanation on the BLOB handling.

                               

                              Anil