Text to Table with Teiid

Version 4

    If you are a programmer, chances are that you have written code at least once to read data from a text file to do something with data. Today, I am going to show you how easy it is convert the text contents into a relational Table data using Teiid.  The benefit is, now you can access your text data using JDBC API in your Java applications. Since you already using Teiid, you can extend this use case to combine data from multiple text files, or from a relational database using SQL and write back to text file, all with out writing single line of Java code. Sounds interesting? Then read on!

     

    Teiid supported integrating data from text files may be from very first alpha we released as it is one of the of the cardinal use-cases in any data integration project. What I am excited to share now is how simple it became with Teiid 7.0 with the introduction of "TEXTTABLE" function. Lets jump into an example with out wasting any more time.

     

    This is a hands on example, so If you do not have the Teiid Server installed, please install Teiid 7.x Server along with JBoss AS 5.1.0. See Teiid documents if you need installation instructions. In Teiid you define Virtual Database (VDB) for doing the data integration. Consult Teiid documents to learn more if you are not familiar VDB concepts. So, lets start with defining a simple Dynamic VDB. For continuity sake I will use the same sample example from Teiid examples.

     

    portfolio-vdb.xml

     

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <vdb name="portfolio" version="1">
        <model name="dummy">
            <source name="dummy" translator-name="loopback"/>
        </model>
    </vdb>
    

     

    Here I created a VDB with dummy schema (model) inside it. We will develop this as the example progresses.

     

    1. Deploy this VDB into Teiid. (copy the file into <jboss-as>/server/default/deploy directory)
    2. Start the JBoss AS server. (execute <jboss-as>/bin/run.sh script)
    3. Now use a JDBC SQL Client tool like SquirreL for quick testing and connect the VDB that has been deployed in previous step.

     

    Now that all the setup is done, we will execute following SQL queries for converting Text to Tabular data.

     

    Example 1: Comma separated data

     

    If you have same CSV data like this

     

    RHT, 45.23

    GE, 16.45

    DELL, 10.75

     

    You can form a SQL Query like this. This example inlines all the above data as string in single line with embedded "\n" new line symbols

     

    SELECT * FROM TEXTTABLE ('RHT, 45.23
      GE, 16.45
      DELL, 10.75' 
    COLUMNS symbol string, price double) stock
    

     

    In the above SQL query, we defined a "stock" table that has three columns (symbol, price) and result would be three rows of data like

     

     

    symbolprice
    RHT45.23
    GE16.45
    DELL10.75

     

    now apply some filtering

     

    SELECT symbol FROM TEXTTABLE ('RHT, 45.23
      GE, 16.45
      DELL, 10.75' 
    COLUMNS symbol string, price double) stock WHERE price > 15.00

     

    symbol
    RHT
    GE

     

    By selecting only "symbol", and providing criteria this example showed how to only select subset of your data and at the same time showed to how to apply filters on your data.

     


    Example 2: Fixed Width Text

     

    If you have fixed text file like this where symbol has 4, price has 5 characters respectivel. If they are not exact length, add blank spaces or leading zeros if numeric filed)

     

    RHT 45.23

    GE  16.45

    DELL10.75

     

    SELECT * FROM TEXTTABLE (
    'RHT 45.23
    GE  16.45
    DELL10.75' 
    COLUMNS symbol string WIDTH 4, price double WIDTH 5) stock

     

    The result will be same as the before

     

    symbolprice
    RHT45.23
    GE16.45
    DELL10..75

     

     

    Example 3: Data from a file.

     

    So far the above examples showed you how parse the data from text string, but most often this kind of data resides in a text file. This example will show you how to fetch data from text file and use TEXTTABLE function to convert data to relational form.

     

    For example purpose, assume that the contents of text file are in the csv format as before

     

    marketdata.txt

     

    symbol, price
    RHT, 45.23
    GE, 16.45
    DELL, 10.75
    

     

    Before we can access the above file though Teiid, we need re-configure the VDB and define a "source" (aka physical) model, that knows how to access the file from the disk. Now below is modified VDB file.

     

    portfolio-vdb.xml

     

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <vdb name="portfolio" version="1">
        <model name="marketdata">
            <source name="text-connector" translator-name="file" 
                  connection-jndi-name="java:marketdata-file"/>
        </model>
    </vdb>
    




     

    OK, here few things needs explaining. First, I removed the "dummy" model from before, replaced with "marketdata". The "source " element defined what kind of source this model if encapsulating.

    • "name" attribute defines the name of source given by the user. It is used in logging etc.
    • "translator-name" defines the type of source. For this example it is "file". (this can be any database like "oracle", "db2". Check here for complete list.
    • "connection-jndi-name" defines the "jndi" name of the data source in the JBoss AS. For accessing a file, this may seem excessive, however for consistent connection creation and access Teiid used JCA data sources.

     

    Now creating a JCA data source for file access posses a interesting problem. No worries, Teiid provides the resource adapters for the file, database, ldap, web service, sales force etc. In JBoss AS, to create a data source you need to create a "xxx-ds.xml" file. Lets create one for the file access for our example.

     

    marketdata-file-ds.xml

     

    <?xml version="1.0" encoding="UTF-8"?>
    <connection-factories>
       <no-tx-connection-factory>
          <jndi-name>marketdata-file</jndi-name>
          <rar-name>teiid-connector-file.rar</rar-name>
          <connection-definition>javax.resource.cci.ConnectionFactory</connection-definition>                   
          <config-property name="ParentDirectory" 
                   type="java.lang.String">{my-directory}</config-property>            
          <max-pool-size>20</max-pool-size>
        </no-tx-connection-factory>
    </connection-factories>
    

     

    in the above XML, replace {my-directory} with directory location of where you saved "marketdata.txt" file. Then deploy the VDB and -ds.xml file in JBoss AS. (copy them both to the deploy directory). Now re-connect to the VDB using SquirreL SQL client again and now you can issue below commands.

     

    select stock.* from (call MarketData.getTextFiles('*.txt')) f, 
    TEXTTABLE(f.file COLUMNS symbol string, price double HEADER) stock
    

     

    and you will see our familer results

     

    symbolprice
    RHT45.23
    GE16.45
    DELL10.75

     

    Lets take second and understand what is going on here. This above command takes advantage nested table reference, where table results from one FROM clause Table can be fed into next FROM clause Table in the query. So, in the above query

     

    (call MarketData.getTextFiles('*.txt')) f
    

     

    executes a stored procedure called "getTextFiles" exposed by the file "source" model created above, and its execution results are exposed as a Table aliases to 'f'. The "getTextFiles" procedure exposed by the "file" has the following signature.

     

    ResultSet getTextFile (string pathAndPattern)
    
    ResultSet (has following columns)
         - file as clob
         - filePath as string
    

     

     

    Now to the next segment of the SQL query

     

    TEXTTABLE(f.file COLUMNS symbol string, price double HEADER) stock


    This looks similar to SQL from example 1, with couple changes. Here instead of passing the text contents directly, we passed 'f.file' which is a clob that contains the contents of the marketdata.txt file. Since, TEXTTABLE understands to stream data from clob this works exactly same as before. Next difference "HEADER" tag, what this is saying is that in marketdata.txt file, on its first row this file contains column names such "symbol, price.." and TEXTTABLE will match the columns to that defined in the query. There are lot more options you can define like your own delimiter character or Escape characters etc. So check the documentation for details.

     

    Example 4: Creating a View from text data

     

    As of Teiid 7.6 Designer, it has added a new feature to read a csv file and it contents, and help you create a view tables using UI wizards. See this article http://community.jboss.org/docs/DOC-17403

     

    The real value of Teiid is in its ability to define views over the data from different sources where the end user/developer does not really know where the data is gathered from, all they know is how it is exposed to them. Teiid 7.0 does not support views with the Dynamic VDB, this is on Teiid's roadamp for 8.0 version.

     

    Currently to define a view layer, you need to use Teiid Desinger, a eclipse based environment that aids to develop a schema model for use by the Teiid runtime. Teiid Desinger can produces a VDB artifact (this is not a XML file) that can contain views. The below are steps you need to take to create VDB, that has the view model. for your convenience I have attached the eclipse project for this example.

     

    • Download and install Teiid Designer.
    • Start the Designer.
    • Create "Model Project" name it as "portfolio"
    • Create "Metadata Model" with class "Relational" and type "source" and name it as "MarketData". (this is going to file source)
    • Create a "procedure" called "getTextFiles", and add one IN parameter "pathAndPattern:string". Also add a procedure result and add two columns to the result. "file:clob" and "filePath:string"
    • Save.
    • Create a another "Metadata Model" with class "Relational" and type "view" and name it as "MarketDataView".
    • Create a base table in the model, and name it "stock"
    • Add two columns to the "stock" table
      • symbol:string
      • price:double
    • Now double click on the "stock" table and in the "transformation editor", copy or type in the following code.

     

    SELECT 
            symbol AS symbol, price AS price
    FROM 
            (EXEC MarketData.getTextFiles('*.txt')) AS f, 
    TEXTTABLE(file COLUMNS symbol string, price double HEADER) AS s

     

    • Save
    • Now create a "virtual database" named "PortfolioView", and add both the "MarketData" and "MarketDataView" models into it.
    • On VDB editor, make sure the "translator" name for the "MarketData" is set to "file" and JNDI name is set to "marketdata-file". This is similar to "source" element we configured in the Dynamic VDB case.
    • Make sure on the "MarketDataView" the visible box is checked.
    • Save, and deploy the VDB into the Teiid runtime. Since we previously deployed the "marketdata-file-ds.xml" file, we already defined the data source for the file access.

     

    Now using SquirreL tool or eclipse DTP connect to the new VDB and issue a command like

     

    SELECT * FROM stock
    

     

    The results would be same as before.

     

    symbolprice
    RHT45.23
    GE16.45
    DELL10.75

     

    Using the similar technique you can get data from different sources and integrate the sources to expose only the the data you need through views. This effectively abstracts the data retrieval and only exposes the schema that you would like to expose to the user.