7 Replies Latest reply on Nov 17, 2016 4:27 PM by jrod2016

    Is Teiid using XML Streaming for this View definition

    jrod2016

      My view definition is as follows:

       

      CREATE VIEW 
      TestADFView (Product_name VARCHAR(102), Date_ TIMESTAMP, Unit_price DOUBLE) 
      AS SELECT A.Product_name, A.Date_, A.Unit_price 
      FROM (EXEC AdfcoreSource.executeProfile(52)) AS f, 
      XMLTABLE('/*[local-name()=''dataset'']/*[local-name()=''data'']/*[local-name()=''row'']' 
      PASSING XMLPARSE(DOCUMENT f.result WELLFORMED) 
      COLUMNS 
      Product_name VARCHAR(102) PATH '*[local-name()=''value''][1]/text()', 
      Date_ TIMESTAMP PATH '*[local-name()=''value''][2]/text()', 
      Unit_price DOUBLE PATH '*[local-name()=''value''][3]/text()') 
      AS A
      

       

      Is Teiid using streaming for processing or loading the entire document into memory? I'm getting the following OOM exception:

       

      java.lang.OutOfMemoryError: Java heap space
        at net.sf.saxon.tree.tiny.TinyTree.ensureNodeCapacity(TinyTree.java:258) ~[saxon9ee.jar:na]
        at net.sf.saxon.tree.tiny.TinyTree.addNode(TinyTree.java:377) ~[saxon9ee.jar:na]
        at net.sf.saxon.tree.tiny.TinyBuilder.startElement(TinyBuilder.java:245) ~[saxon9ee.jar:na]
        at net.sf.saxon.event.NamespaceReducer.startElement(NamespaceReducer.java:73) ~[saxon9ee.jar:na]
        at net.sf.saxon.event.ProxyReceiver.startElement(ProxyReceiver.java:129) ~[saxon9ee.jar:na]
        at org.teiid.query.xquery.saxon.PathMapFilter.startElement(PathMapFilter.java:126) ~[teiid-engine-9.0.1.jar:9.0.1]
        at net.sf.saxon.event.ReceivingContentHandler.startElement(ReceivingContentHandler.java:292) ~[saxon9ee.jar:na]
        at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source) ~[xercesImpl-2.10.0.jar:na]
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) ~[xercesImpl-2.10.0.jar:na]
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source) ~[xercesImpl-2.10.0.jar:na]
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) ~[xercesImpl-2.10.0.jar:na]
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.10.0.jar:na]
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) ~[xercesImpl-2.10.0.jar:na]
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[xercesImpl-2.10.0.jar:na]
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) ~[xercesImpl-2.10.0.jar:na]
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) ~[xercesImpl-2.10.0.jar:na]
        at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:428) ~[saxon9ee.jar:na]
        at net.sf.saxon.event.Sender.send(Sender.java:170) ~[saxon9ee.jar:na]
        at net.sf.saxon.Configuration.buildDocument(Configuration.java:3361) ~[saxon9ee.jar:na]
        at net.sf.saxon.Configuration.buildDocument(Configuration.java:3303) ~[saxon9ee.jar:na]
        at org.teiid.query.xquery.saxon.XQueryEvaluator.evaluateXQuery(XQueryEvaluator.java:159) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.query.processor.relational.XMLTableNode.evaluate(XMLTableNode.java:295) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.query.processor.relational.XMLTableNode.nextBatchDirect(XMLTableNode.java:193) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:282) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.query.processor.relational.LimitNode.nextBatchDirect(LimitNode.java:102) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:282) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.query.processor.BatchIterator.finalRow(BatchIterator.java:69) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.common.buffer.AbstractTupleSource.getCurrentTuple(AbstractTupleSource.java:70) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.query.processor.BatchIterator.getCurrentTuple(BatchIterator.java:84) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.common.buffer.AbstractTupleSource.hasNext(AbstractTupleSource.java:92) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.query.processor.relational.NestedTableJoinStrategy.process(NestedTableJoinStrategy.java:119) ~[teiid-engine-9.0.1.jar:9.0.1]
        at org.teiid.query.processor.relational.JoinNode.nextBatchDirect(JoinNode.java:227) ~[teiid-engine-9.0.1.jar:9.0.1]
      

       

      The document being processed is 244 MB and my heap size is set to 1024 MB. It works with a heap size of 4096 MB. The Teiid result set is being consumed by a JDBC client.

        • 1. Re: Is Teiid using XML Streaming for this View definition
          shawkins

          No, it is not using streaming.  The logic for streaming is pretty restrictive in what it looks for in the context expression.  In this case you need to use /*:dataset/ rather than the local-name() test.  The decision about streaming is shown in the plan debug log, but I think it needs to be added to the query plan as well.  You can also submit a JIRA to have the streaming logic look for the local-name node test form.
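
          For reference, a sketch of the original view with only the context path rewritten to the /*:name wildcard form suggested above. The full path /*:dataset/*:data/*:row is an assumption extrapolated from the /*:dataset/ hint, and the column paths are left exactly as posted because only the context expression was called out. Whether a given Teiid version streams this exact form is best confirmed from the plan debug log.

          ```sql
          CREATE VIEW
          TestADFView (Product_name VARCHAR(102), Date_ TIMESTAMP, Unit_price DOUBLE)
          AS SELECT A.Product_name, A.Date_, A.Unit_price
          FROM (EXEC AdfcoreSource.executeProfile(52)) AS f,
          /* Context path: *:name wildcard tests (any namespace) replace the
             *[local-name()='name'] predicates. Assumed full path, see note above. */
          XMLTABLE('/*:dataset/*:data/*:row'
          PASSING XMLPARSE(DOCUMENT f.result WELLFORMED)
          COLUMNS
          /* Column paths unchanged from the original post. */
          Product_name VARCHAR(102) PATH '*[local-name()=''value''][1]/text()',
          Date_ TIMESTAMP PATH '*[local-name()=''value''][2]/text()',
          Unit_price DOUBLE PATH '*[local-name()=''value''][3]/text()')
          AS A
          ```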

          • 2. Re: Is Teiid using XML Streaming for this View definition
            jrod2016

            Thanks for your response, Steven. This definitely did the trick. Previously I was unable to process a 250 MB file with 1 GB of heap space. With this tweak, I'm able to process a 550 MB XML file with just 512 MB of heap space.

             

            I also wanted to know more about the logs you mentioned which indicate streaming. Where would I find this, or how do I activate that logging?

             

            There is a possibility that we may look at CSV instead of XML just to limit the verbosity of the data coming out of our source. Will CSV also be processed in a streaming fashion like XML?

            • 3. Re: Is Teiid using XML Streaming for this View definition
              shawkins

              > Where would I find this, or how do I activate that logging?

               

              You can obtain the query plan debug log using "SET SHOWPLAN DEBUG" (see "Query Plans" in the Teiid Documentation).

               

              Also, at a trace level you'll see everything in the server log, but that could be more verbose than you want.  I'll capture an issue about showing whether streaming is being used in the query plan itself as well.
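
              As a rough illustration of the SET SHOWPLAN DEBUG approach, the statements below could be run one at a time over the same JDBC connection. The view name is the one from the first post. The trailing semicolons are only for SQL client tools. SHOW PLAN should return the plan of the previously executed query, with the debug log in its DEBUG_LOG column. The exact wording of the streaming message in that log is not reproduced here, since it may differ between versions.

              ```sql
              /* Turn on plan capture with the debug log for this session. */
              SET SHOWPLAN DEBUG;

              /* Run the query whose XMLTABLE processing is in question. */
              SELECT * FROM TestADFView;

              /* Retrieve PLAN_TEXT, PLAN_XML and DEBUG_LOG for the previous query;
                 the streaming decision is reported in DEBUG_LOG. */
              SHOW PLAN;
              ```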

               

              > There is a possibility that we may look at CSV instead of XML just to limit the verbosity of the data coming out of our source. Will CSV also be processed in a streaming fashion like XML?

               

              Yes, CSV is processed in a streaming manner.

              • 4. Re: Is Teiid using XML Streaming for this View definition
                jrod2016

                Do you have a code example of streaming with TEXTTABLE?

                 

                I get the following exception in org.teiid.query.processor.QueryProcessor for TEXTTABLE.

                TEIID30179 Text parse error: Could not read data in Unknown

                 

                This is my override for ProcedureExecution:

                 @Override
                   public List<?> getOutputParameterValues() throws TranslatorException {
                      final IADFProcedureExecutionCallback procedureExecutionCallback = executionFactory.getProcedureExecutionCallback();
                      final ArtifactOutputFormatEnum outputFormat = procedureExecutionCallback.getOutputFormat(profileId);
                      switch (outputFormat) {
                         case XML:
                            return Collections.singletonList(new BlobType(new BlobImpl() {
                               @Override
                               public InputStream getBinaryStream() throws SQLException {
                                  return returnValue;
                               }
                            }));
                         case CSV:
                            return Collections.singletonList(new ClobType(new ClobImpl(new InputStreamFactory() {
                               @Override
                               public InputStream getInputStream() throws IOException {
                                  return returnValue;
                               }
                            }, -1)));
                         default:
                            throw new ApiException(String.format("Unmapped output format [%s]", outputFormat.toString()),
                                  INVALID_DATASOURCE_DEFINITION);
                      }
                   }
                
                  
                
                

                 

                 

                This is my View definition:

                CREATE VIEW LargeCsv 
                (Product_name VARCHAR(102), Date_ TIMESTAMP, Unit_price DOUBLE) 
                AS SELECT A.Product_name, A.Date_, A.Unit_price 
                FROM (EXEC AdfcoreSource.executeProfileAsCSV(356)) AS f, 
                TEXTTABLE(f.result COLUMNS Product_name VARCHAR(102), Date_ TIMESTAMP, Unit_price DOUBLE) AS A
                
                • 5. Re: Is Teiid using XML Streaming for this View definition
                  shawkins

                  > Do you have a code example of streaming with TEXTTABLE?

                   

                  TEXTTABLE is not like XMLTABLE.  TEXTTABLE always processes in a streaming manner as it can just parse line by line.

                   

                  The exception is simply due to an IOException while processing the stream.  The full stack trace will show the underlying exception.

                  • 6. Re: Is Teiid using XML Streaming for this View definition
                    shawkins

                    https://issues.jboss.org/browse/TEIID-4586 updates the query plan to add a Streaming flag to the XMLTABLE node.

                    • 7. Re: Is Teiid using XML Streaming for this View definition
                      jrod2016

                      FYI - I've got TEXTTABLE working. The error was caused by the encoding of the stream.
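
                      The thread does not say how the encoding was corrected. One possible way to make the encoding explicit on the SQL side is sketched below. It assumes, hypothetically, that executeProfileAsCSV returns the raw bytes as a BLOB (in the code posted above it returns a CLOB) and that the data is UTF-8. Both are assumptions, not details from the thread.

                      ```sql
                      CREATE VIEW LargeCsv
                      (Product_name VARCHAR(102), Date_ TIMESTAMP, Unit_price DOUBLE)
                      AS SELECT A.Product_name, A.Date_, A.Unit_price
                      FROM (EXEC AdfcoreSource.executeProfileAsCSV(356)) AS f,
                      /* TO_CHARS decodes a BLOB into a CLOB using the named encoding.
                         Assumes f.result is a BLOB holding UTF-8 text. */
                      TEXTTABLE(TO_CHARS(f.result, 'UTF-8')
                      COLUMNS Product_name VARCHAR(102), Date_ TIMESTAMP, Unit_price DOUBLE) AS A
                      ```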