3 Replies Latest reply on May 7, 2019 8:57 AM by rareddy

Teiid Blob Reference Stream (InputStreamFactory) VS Simple Java Stream

fzkhan May 7, 2019 7:41 AM

Hi Teiid Community,

In our translator we have implemented Blob streaming with the InputStreamFactory, as described in this article:

"With reference to the design, when we encounter a Blob we send a BLOB reference to the client that has an InputStreamFactory associated with it . The InputStreamFactory has the JNDI name of the source connection pool associated with this resultset. Now when the client calls getBytes()/getInputStream() for the BLOB, the call gets passed to the getInputStream() method of the InputStreamFactory implementation. This will lookup a connection based on the JNDI name from the JBoss JNDI context and pass it to the callback class. The callback already has the primary key and other information to issue a query against the source connection and stream the BLOB back; so essentially for each BLOB data call we issue a separate call."

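The lazy-reference idea in the quoted design can be sketched roughly as follows. This is only an illustration: `StreamFactory`, `BlobRef`, `referenceTo` and `sourceCalls` are hypothetical names, not the Teiid classes, and a byte array stands in for the query against the source connection. It shows that the reference holds no data and that every stream request goes back to the source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicInteger;

class LazyBlobSketch {

    // Hypothetical stand-in for a stream factory; NOT the Teiid class.
    interface StreamFactory {
        InputStream open() throws IOException;
    }

    // Hypothetical blob reference: it carries no data, only a way to fetch it.
    static final class BlobRef {
        private final StreamFactory factory;

        BlobRef(StreamFactory factory) {
            this.factory = factory;
        }

        // Each call goes back to the source, as the quoted design describes.
        InputStream getBinaryStream() throws IOException {
            return factory.open();
        }
    }

    // Counts round trips to the "source" so the per-call cost is visible.
    static final AtomicInteger sourceCalls = new AtomicInteger();

    // Assumption: a byte array plays the role of the source query result.
    static BlobRef referenceTo(byte[] sourceData) {
        return new BlobRef(() -> {
            sourceCalls.incrementAndGet();
            return new ByteArrayInputStream(sourceData);
        });
    }

    public static void main(String[] args) throws IOException {
        BlobRef ref = referenceTo(new byte[] {1, 2, 3});
        try (InputStream first = ref.getBinaryStream();
             InputStream second = ref.getBinaryStream()) {
            System.out.println("first byte: " + first.read());
            System.out.println("source calls: " + sourceCalls.get());
        }
    }
}
```
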
We have used Java's piped I/O (PipedInputStream/PipedOutputStream), which follows the producer-consumer pattern: the producer writes data and the consumer reads it.
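
For readers unfamiliar with the pattern, a minimal piped I/O sketch might look like the one below. It is not the poster's code: `PipedCopyDemo` and `copyThroughPipe` are hypothetical names, and an in-memory array stands in for the file. A producer thread writes into the pipe while the calling thread consumes it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class PipedCopyDemo {

    // Hypothetical helper (not from the original post): a producer thread
    // writes `source` into the pipe while the calling thread consumes it.
    static byte[] copyThroughPipe(InputStream source, int pipeBufferSize)
            throws IOException, InterruptedException {
        PipedInputStream pipeIn = new PipedInputStream(pipeBufferSize);
        PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);

        Thread producer = new Thread(() -> {
            byte[] writeChunk = new byte[8192];
            // try-with-resources closes the pipe, which signals end-of-stream.
            try (OutputStream out = pipeOut) {
                int written;
                while ((written = source.read(writeChunk)) != -1) {
                    out.write(writeChunk, 0, written);
                }
            } catch (IOException e) {
                // A real implementation should surface this to the consumer.
                e.printStackTrace();
            }
        }, "blob-producer");
        producer.start();

        // Consumer side: drain the pipe until the producer closes it.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] readChunk = new byte[8192];
        try (InputStream in = pipeIn) {
            int read;
            while ((read = in.read(readChunk)) != -1) {
                sink.write(readChunk, 0, read);
            }
        }
        producer.join();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1_000_000];
        byte[] copy = copyThroughPipe(new ByteArrayInputStream(data), 64 * 1024);
        System.out.println("copied " + copy.length + " bytes");
    }
}
```

The pipe buffer size is a parameter here because a small buffer forces the producer and consumer to hand off more often, which can affect throughput.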

CopyLobs is set to false, and the data is stored on the local machine.

That is one design. For the same data, we also wrote a sample Java program with a similar piped I/O design, but without the Teiid InputStreamFactory blob reference implementation.

We observed the following timings (excluding connection time):

Total time to stream the data in the simple Java program: 11 seconds (average of 3 iterations)

Total time to stream the data through Teiid: 20 seconds (average of 3 iterations)

We have also tweaked some configuration settings, such as "lob-chunk-size-in-kb", to see whether they give any performance benefit.

We are trying to understand this performance overhead when streaming the same data through Teiid. Are we using these features correctly, or do we need to add some configuration?

Thanks

1. Re: Teiid Blob Reference Stream (InputStreamFactory) VS Simple Java Stream

rareddy May 7, 2019 8:20 AM (in response to fzkhan)

Through Teiid there will be an extra layer of marshaling; depending on the size of the blob, this can be a factor. But it is hard to compare that simply: one would have to peel back each layer and make side-by-side comparisons. There can be other factors too, such as whether the connections were hot or not, and how your Java program is reading the blob data.

Ramesh..
2. Re: Teiid Blob Reference Stream (InputStreamFactory) VS Simple Java Stream

fzkhan May 7, 2019 8:33 AM (in response to rareddy)

Ramesh,
The Java program reads the data from a file and writes it into a PipedOutputStream, which is then consumed in another thread.
Can you please explain what kind of marshaling you mean? Is it the way Teiid packages blob data for streaming?

Fahad
3. Re: Teiid Blob Reference Stream (InputStreamFactory) VS Simple Java Stream

rareddy May 7, 2019 8:57 AM (in response to fzkhan)

That is an incorrect comparison with your Java program then, IMO. In the Java program, everything runs in a single VM; with Teiid you have at least two VMs, one for Teiid and one for the client. Depending on how the Blob is read, the source it is read from is yet another process. When you mentioned you are using the JNDI name, I assumed you are reading from an external database, which I still think is the case for Teiid but not for the Java program. From your comment above, I understand that in your Java program there is no JNDI lookup etc.; it just reads from the file, uses the piped in/out, and consumes the result in a separate thread.

Java: read from file -> write to pipe -> read

Teiid: external Java client -> Teiid -> look up JNDI/connection -> read (unmarshal) data -> write to InputStream -> marshal result to external client -> data read in external client

Also note that database vendors might send data in a specific or condensed format that the JDBC driver has to reverse. For a fair comparison, you need to write a program that reads directly from the source using the same fetching techniques you used in the InputStreamFactory, carefully account for connection times etc., and then read that data from a second program, not from a separate thread.
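
One way to keep such a comparison honest is to time the open step and the read step separately, with a few warm-up runs first. The harness below is only a sketch under assumptions: `StreamTiming`, `timeOnce` and `average` are hypothetical names, and the `opener` is whatever produces the stream under test (here an in-memory array, in practice the source read or the Teiid Blob's stream).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.Callable;

class StreamTiming {

    // Timings for one measurement: open (connection) time is kept apart from read time.
    static final class Result {
        final long openNanos;
        final long readNanos;
        final long bytes;

        Result(long openNanos, long readNanos, long bytes) {
            this.openNanos = openNanos;
            this.readNanos = readNanos;
            this.bytes = bytes;
        }
    }

    // Opens the stream, then drains it on the calling thread, timing both steps.
    static Result timeOnce(Callable<InputStream> opener, int bufferSize) throws Exception {
        long start = System.nanoTime();
        InputStream opened = opener.call();
        long afterOpen = System.nanoTime();

        long total = 0;
        byte[] buffer = new byte[bufferSize];
        try (InputStream in = opened) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        long afterRead = System.nanoTime();
        return new Result(afterOpen - start, afterRead - afterOpen, total);
    }

    // Discards `warmups` runs, then averages `runs` measured runs.
    static Result average(Callable<InputStream> opener, int bufferSize, int warmups, int runs)
            throws Exception {
        for (int i = 0; i < warmups; i++) {
            timeOnce(opener, bufferSize);
        }
        long open = 0;
        long read = 0;
        long bytes = 0;
        for (int i = 0; i < runs; i++) {
            Result r = timeOnce(opener, bufferSize);
            open += r.openNanos;
            read += r.readNanos;
            bytes = r.bytes;
        }
        return new Result(open / runs, read / runs, bytes);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[8 * 1024 * 1024]; // stand-in for the real source
        Result r = average(() -> new ByteArrayInputStream(data), 64 * 1024, 2, 3);
        System.out.printf("bytes=%d open=%d ns read=%d ns%n", r.bytes, r.openNanos, r.readNanos);
    }
}
```

Running the same harness against each layer (file, source connection, Teiid client) would give numbers that can be compared layer by layer.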

Otherwise, you can also use Teiid Embedded, where your client and Teiid run in the same VM, to bring the two setups a little closer for comparison.