11 Replies Latest reply on Apr 8, 2014 5:16 PM by anilallewar

Eagar loading behavior of BLOBs in Teiid 8.4

anilallewar Apr 3, 2014 4:32 PM

I am trying to compare Teiid 8.4 performance against native JDBC for Oracle. For most part we found that Teiid performance is comparable with SQL access through JDBC driver when we have tables that don't have BLOB data. We found that with tables having BLOB data, as the number of records increases Teiid performance drops to about 5x-6x times that of direct JDBC access. Please see numbers in the table below that have the Blob column value as "TRUE"

ORACLE-JDBC

TEIID-JDBC

Transactions:

No of Records

Blob

Iteration (ms)

Table Size

Average -

1st Iteration

Average -

Remaining 4 Iteration

Average -

1st Iteration

Average -

Remaining 4 Iterations

Table Name

(from old run)

*******

4,351,139

TRUE

142

100

*******

1,302

FALSE

137

*******

720,850

94,687,298

FALSE

4,348

3,511

4,522

3,880

*******

21,804

3,220,420

TRUE

551

213

1,519

1,620

*******

TRUE

136

*******

8,652

2,013,954

FALSE

633

146

539

205

*******

270,566

TRUE

144

*******

32,531

TRUE

992

623

5,049

5,098

*******

FALSE

135

*******

FALSE

136

*******

693

89,770

FALSE

228

183

*******

TRUE

137

*******

FALSE

134

*******

1,103,918

119,609,996

FALSE

6,120

5,157

6,442

5,793

*******

185,454

541,726,992

TRUE

3,761

3,303

25,139

25,518

*******

1,003,798

142,704,688

FALSE

5,379

4,424

5,719

4,939

*******

505,007

230,497,476

FALSE

6,876

5,966

7,538

7,068

*******

505,005

405,861,672

FALSE

16,150

15,090

19,399

19,080

*******

FALSE

133

*******

859,457

136,837,530

FALSE

5,355

4,589

5,630

4,979

For the BLOB columns we are just retrieving the object using getObject() and then typecast based on whether the object is an instance of Blob; but not really reading the contents using either getBytes() or getInputStream() on the Blob object. When I debugged through the Teiid client code, I found that the object received in the org.teiid.netty.handler.codec.serialization.CompactObjectInputStream class contains a javax.sql.rowset.serial.SerialBlob object which might mean that the Blob was serialized on the server itself. I also put the read in a transaction and noticed same behavior.

I am confused as to how Teiid handles LOBs; the documentation states that "Source LOB values (LOBs from physical sources) are typically accessed by reference, rather than having the value copied to a temporary location." but that doesn't seem to happening when I look at the behavior. Debugging through the code indicates that the Teiid JDBC client is actually reading the data in the stream and creating a new "javax.sql.rowset.serial.SerialBlob.SerialBlob" object.

BlobType.readReference(ObjectInput) line: 145
BlobType(Streamable).readExternal(ObjectInput) line: 125
BatchSerializer$BlobColumnSerializer1.readObject(ObjectInput) line: 333

Can you please explain what should be the best practice to consume Blobs from source? Also would appreciate any pointers to making read of Blob data performant in Teiid.

Thanks,

Anil

1. Re: Eagar loading behavior of BLOBs in Teiid 8.4

rareddy Apr 3, 2014 7:42 PM (in response to anilallewar)

Anil,

Yes, Teiid does make a copy in its engine. Source DB is remote machine, and so is client machine. In order to serve contents after the source connection is closed/released, Teiid needs to make a copy. When it makes copy, note that it is not sending all the contents to client in one scoop, it does in a memory efficient manner in chunks.

If client and server were in same VM and using the Local Connection, Teiid will pass the source driver's reference to Blob all the way to client avoiding the unnecessary copy.

Ramesh..
1 of 1 people found this helpful
Actions
2. Re: Eagar loading behavior of BLOBs in Teiid 8.4

shawkins Apr 3, 2014 8:44 PM (in response to rareddy)

There are multiple factors as to whether we copy lobs or attempt to stream them. It starts with the translator. If the copyLobs option is enabled, then all lob values will be copied to a value local to the Teiid server. If copyLobs is not enabled, and inline lobs is enabled (the default) then when the server is serializing a lob value (for consumption by the client or as part of buffering) if the size of the lob is less than the max lob size (8kb), then we'll create a local copy rather than a streaming reference.

So what are the sizes of your lobs and have you changed any of these settings?
Actions
3. Re: Eagar loading behavior of BLOBs in Teiid 8.4

anilallewar Apr 4, 2014 6:09 PM (in response to shawkins)

Steven,

We use the default settings; no customization so I am assuming that inline lobs is true an the max lob size is 8KB. We also compared the table size and the number of rows and found that the average row size is less that 8KB which definitely means that the size of the lobs doesn't exceed 8KB.

I am curious to know what happens if the lob size if greater than 8KB; how does Teiid provide this data to a remote client?

Anil
Actions
4. Re: Eagar loading behavior of BLOBs in Teiid 8.4

shawkins Apr 4, 2014 6:42 PM (in response to anilallewar)

> I am curious to know what happens if the lob size if greater than 8KB; how does Teiid provide this data to a remote client?

All that is initially sent is a reference value. Then when the client requests data from the lob it is chunked to the client in 100kb batches.

Are you actually reading all of the lob stream values in your test, if not then the lob inlining may make the default Teiid approach seem slower.
Actions
5. Re: Eagar loading behavior of BLOBs in Teiid 8.4

anilallewar Apr 7, 2014 4:23 PM (in response to shawkins)
Yes, we are actually reading all BLOBs in the test now so that we make an apple-to-apple comparison between Teiid and Oracle JDBC access.

Earlier we were just getting a reference to the Blob; but in case of Teiid because of inline LOBs we were getting all the data copied anyways.

W.r.t the reference being passed, does Teiid hold onto the source connection till the resultset is consumed or would it pull a new connection from the pool for each BLOB data being requested?

I am trying to validate another design decision we took wherein we have streaming factory references passed to the client in case of large blobs/virtual columns etc. Once the client actually wants to read contents of the BLOB/virtual column
we pass the source connection to the callback streaming factory
create a query based on the context that was stored in the initial resultset creation
Execute the query against the source connection
Return a stream with reference to the BLOB/virtual column content
How would this approach be different from the Teiid one where you pass a reference to the BLOB via the DPQ? Please note that we don't hold any source connection reference in our design but would execute the query against source for each BLOB column being requested.

Anil
Actions
6. Re: Eagar loading behavior of BLOBs in Teiid 8.4

shawkins Apr 7, 2014 6:12 PM (in response to anilallewar)

> W.r.t the reference being passed, does Teiid hold onto the source connection till the resultset is consumed or would it pull a new connection from the pool for each BLOB data being requested?

There are two built-in behaviors. Some sources are marked as supporting lob reads after the result set is closed (which may not always be possible), in which case the result set / connection is closed as normal at the end of execution. Otherwise we do hold the connection until the user request / result set is closed.

> How would this approach be different from the Teiid one where you pass a reference to the BLOB via the DPQ? Please note that we don't hold any source connection reference in our design but would execute the query against source for each BLOB column being requested.

Yes, that is different. For lobs that are locatable by primary key or other means, that is a viable approach.
Actions
7. Re: Eagar loading behavior of BLOBs in Teiid 8.4

rareddy Apr 7, 2014 6:47 PM (in response to anilallewar)

Anil,

Yes, we are actually reading all BLOBs in the test now so that we make an apple-to-apple comparison between Teiid and Oracle JDBC access

How does numbers differ now?

W.r.t the reference being passed, does Teiid hold onto the source connection till the resultset is consumed or would it pull a new connection from the pool for each BLOB data being requested?

it keeps the connection/resultset open until the data is read or copylobs job is finished.

How would this approach be different from the Teiid one where you pass a reference to the BLOB via the DPQ? Please note that we don't hold any source connection reference in our design but would execute the query against source for each BLOB column being requested.

In Teiid, when its says sending reference value, it not sending the source connection reference, buffer manager will send a its own reference-id with which client can request for additional blob data. Where as Teiid engine keeps track of source resultset/connection to reference-id information. I does not ever re-queries the source for the blob data again rather keeps the current reference to resultset.

I know you said you do not hold source connection but your (1) contradicts that right? What you want to do may be valid but will only work in the embedded situation. But in embedded the handling is by true reference, no data ever streamed. If it is remote case, then I guess you wanting do is inform the client that there is blob information there, if they want to read blob, issue a separate call for it, rather than using in-line stream or any other option. Otherwise I understood your question wrong!

Ramesh..
Actions
8. Re: Re: Eagar loading behavior of BLOBs in Teiid 8.4

anilallewar Apr 8, 2014 4:33 PM (in response to rareddy)
Ramesh,

Interestingly Teiid seems to be faster for non-blob tables when we read the columns using getObject().toString() and read the BLOB bytes. I am trying to reason why that would be the case; the getObject() seems to be lot slower on Oracle side than it was on the previous run whereas it is about 1/3 slower for Teiid than Oracle. Please see the code snippet below.

My assumption was that Teiid should be marginally slower than direct JDBC access given same fetch size of 512 on the driver. This is what we found on the previous run when we had line 9-24 commented. Any pointers would be appreciated!!

No of Records Oracle-JDBC
Rem Avg - 4 Iterations Teiid-JDBC
Rem Avg - 4 Iterations
0 2 20
0 1 17
0 2 11
1 1 8
13 4 34
693 65 82
8,652 583 554
505,005 61,334 54,536
505,007 23,536 21,969
720,850 13,123 12,001
859,457 17,518 14,869
1,003,798 15,431 14,071
1,103,918 18,807 15,444

while (rs.next()) { ++nRows; for (int columnIndex = 1 ; columnIndex <= nCols ; ++columnIndex) { startTimeNsGetObject = System.nanoTime(); Object o = rs.getObject(columnIndex); elapsedTimeNsGetObject += (System.nanoTime() - startTimeNsGetObject); if (o != null) { if (o instanceof Blob) { hasBlob = true; blob = (Blob)o; blobLength = blob.length(); totalBytes += blobLength; // The first byte is at position 1 blob.getBytes(1, (int) blobLength); } else { totalBytes += ((o.toString().length()) * 2); } } } }

Anil
Actions
9. Re: Re: Eagar loading behavior of BLOBs in Teiid 8.4

rareddy Apr 8, 2014 4:52 PM (in response to anilallewar)

Anil,

In Teiid JDBC driver there is no penalty to calling getObject() vs getInt() kind of calls. Underneath all of them call getObject. When Teiid calls Oracle driver's getXXX call, it uses the specific call like getInt(), getString(0 etc as Teiid is aware of the data types it is expecting. I suspect Oracle driver is doing something more work in the case of getObject call that case. May be their data is packed in different format, and running a "toString" needs to convert to the native type then to string, where as in Teiid the data is already in native type. To test this theory, instead of doing getObject(), go specific type calls on resultset on both Teiid and Oracle.

Ramesh..
Actions
10. Re: Re: Eagar loading behavior of BLOBs in Teiid 8.4

anilallewar Apr 8, 2014 5:00 PM (in response to rareddy)

Ramesh,

Please note line 5-8 and you would find that we are just timing the getObject() call that seems to be a lot slower. The toString() call is timed using a different variable altogether.

As the next step, I am anyways going to remove the read part and only have the code do getObject() rather than doing toString() or reading bytes.

Anil
Actions
11. Re: Re: Eagar loading behavior of BLOBs in Teiid 8.4

anilallewar Apr 8, 2014 5:16 PM (in response to anilallewar)

Ramesh,

With reference to the design, when we encounter a Blob we send a BLOB reference to the client that has an InputStreamFactory associated with it . The InputStreamFactory has the JNDI name of the source connection pool associated with this resultset. Now when the client calls getBytes()/getInputStream() for the BLOB, the call gets passed to the getInputStream() method of the InputStreamFactory implementation. This will lookup a connection based on the JNDI name from the JBoss JNDI context and pass it to the callback class. The callback already has the primary key and other information to issue a query against the source connection and stream the BLOB back; so essentially for each BLOB data call we issue a separate call.

I hope that answers your question!! Thanks for the detailed explanation on the BLOB handling.

Anil
Actions

Go to original post

0	2	20
0	1	17
0	2	11
1	1	8
13	4	34
693	65	82
8,652	583	554
505,005	61,334	54,536
505,007	23,536	21,969
720,850	13,123	12,001
859,457	17,518	14,869
1,003,798	15,431	14,071
1,103,918	18,807	15,444