Integrate Amazon SimpleDB with Teiid

Version 4

    Amazon SimpleDB is a hosted data store offered by Amazon.  Basic use is free, and Amazon offers many commercial access levels.


    In this article, I will walk through an example showing to movement of data from a CSV file into Amazon SimpleDB using Teiid.  Then I'll demonstrate issuing queries through Teiid to Amazon SimpleDB.  Once the results are available in Teiid, you can further integrate/enhance this data with other data sources.  To keep this example simple I choose importing the data from CSV file.  Note that CSV is just an example, you could move data from any RDBMS, webservice, XML, or custom source.  The prerequisites for this example are to have some familiarity with Teiid and to use Teiid 8.7 (or later).


    For more on Teiid's Amazon SimpleDB Translator see Amazon SimpleDB Translator.


    Step 1: Get Sample Data


    For the purpose of this example, I am using sample data from http://data.gov about baby names collected from social security numbers over the last century. The data is arranged by state. You can download data from http://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-data-by-state-and-district-of-columbiaThis ZIP file contains may different files, one file for each state with names of the babies.  A copy of a state file is attached (VA.txt). Note that the below file does not have column headers as shown below.    To help ease the import of the file in Teiid, you can manually add a header as shown below.

     

    STATE, MALE or FEMALE, YEAR, NAME, # of babies
    KS,F,1910,Mary,251
    KS,F,1910,Helen,172
    KS,F,1910,Dorothy,144
    ...
    
    
    
    
    
    

     

    Unzip the "namesbystate.zip" file into any directory. For the purposes of this article I will assume this is unzipped into "/babynames" directory on your local machine.

     

    Step 2: Create a Account on Amazon for SimpleDB

     


    Step 3: Building a VDB

    Option 1: Using Teiid Designer (Skip to Option 2 for Dynamic VDB)

    • Install Teiid 8.7 (or Later)
    • Install Teiid Designer 8.5 or  later. 8.3.2, and 8.4 version does not work correctly.
    • Start Teiid Designer, switch to "Teiid Designer" perspective.
    • Start JBoss EAP in Teiid profile.
    • Configure the Teiid Server, make sure you have made connection to Teiid Server.
    • Create a new "Teiid Model Project" and name it "BabyNames".
    • We need to create two separate source models, one for the CSV files, and another for the "simpledb".


    Build Flat File (CSV) Based Model


    • Choose "File -> import", then choose "Teiid Designer/File Source Flat >> Source and View Model". Follow directions in the article Connect to a Flat File using Teiid to create model based on files. Once you are done, you should see two models with a view table like below.


    designer1.png

    • The transformation of the "babies" view table should look like


    SELECT  ROW_NUMBER() OVER (ORDER BY A.state) AS id, A.name, A.state, A.gender, A.birthyear, A.occurrences
      FROM  (EXEC file_source.getTextFiles('VA.TXT')) AS f,
      TEXTTABLE(f.file COLUMNS state string, gender char, birthyear integer, name string, occurrences integer) AS A
    
    
    
    
    


    • You can preview the data on right clicking on "babies" table, "Modeling -> Preview Data". Make sure this is successful.


    Note: The "id" field and PK reference was added after the wizard completed, you would need to manually tweak the transformation in the view table. In the above ROW_NUMBER is used to generate unique number to be used as PK.


    Build SimpleDB Based Model


    • Using the scratch pad defined here http://docs.aws.amazon.com/AmazonSimpleDB/latest/GettingStartedGuide/CreatingADomain.html, create a domain called "babies", and then create attributes named "id", "name", "state",  "gender", "birthyear", "occurences".
    • Switch back to Designer
    • It is time to import the metadata from the SimpleDB. Right click on the project, choose "import", and under "Teiid Designer", choose "Teiid Connection >> Source Model" type.  See note about Designer version to use above or use option 2 - dynamic vdb
    • Click "new.." to create a new data source, and choose "simpledb" type, and provide the necessary properties


    simpledb2.png

    • Click OK
    • Select the "translator" as "simpledb" and finish the wizard.

     

    simpledb1.png

     

    NOTE: You may have to create PK manually and assign to the ID column, also the string lengths might not be exactly like above.

     

    Now build a VDB with all the models defined so far, and create data source as shown before and deploy the VDB. Make sure the JNDI names for the sources and translators were assigned correctly.  Save, right click on the VDB and choose "modeling->deploy". Make sure the VDB is deployed in "active" state and start testing the SQL interface. On to Step 3.

    Option 2: Building with Dynamic VDB (Skip this if you did Option 1)

     

    • Edit "standalone-teiid.xml", under "resource-adapter" section add the XML fragments for creation of data sources. Note to adjust ParentDirectory correctly to match where you unzipped the babynames zip file. And SimpleDB Access Keys  to match your installation.


    Data Source for Flat File

     

    <resource-adapter id="file-ds">
        <module slot="main" id="org.jboss.teiid.resource-adapter.file" />
        <transaction-support>NoTransaction</transaction-support>
        <connection-definitions>
            <connection-definition class-name="org.teiid.resource.adapter.file.FileManagedConnectionFactory" jndi-name="java:/file-ds" enabled="true" pool-name="file-ds">
                <config-property name="ParentDirectory">/babynames</config-property>
            </connection-definition>
        </connection-definitions>
    </resource-adapter>
    
    
    
    
    

     

    Data Source for SimpleDB

            <resource-adapter id="simpledb">
                <module slot="main" id="org.jboss.teiid.resource-adapter.simpledb"/>
                <transaction-support>NoTransaction</transaction-support>
                <connection-definitions>
                    <connection-definition class-name="org.teiid.resource.adapter.simpledb.SimpleDBManagedConnectionFactory"
                            jndi-name="java:/simpledbDS"
                            enabled="true"
                            use-java-context="true"
                            pool-name="teiid-simpledb-ds">
                          <!-- simpledb access key id -->
                          <config-property name="AccessKey">xxx</config-property>
                          <!-- simpledb secret access key -->
                          <config-property name="SecretAccessKey">xxx</config-property>                      
                    </connection-definition>
                </connection-definitions>
            </resource-adapter>
    
    
    
    
    

     

    • Start JBoss EAP + Teiid Server using command
    ./standalone.sh -c standalone-teiid.xml
    
    
    
    
    



    • Create the following dynamic VDB, name the file as "simpledb-vdb.xml". Note that in this VDB we are not defining the table definition from Teiid.  If you have not done previous step, you can manually define the DDL for "simpledb" model, and above domain will be created automatically. See the commented section, un-comment it. The idea here is Teiid can either read existing domains as tables, or you can create domains as you see it from Teiid perspective.


    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <vdb name="babynames" version="1">
        <model name="simpledb">
            <source name="simpledb" translator-name="simpledb" connection-jndi-name="java:/simpledbDS"/>
    <!--
             <metadata type = "DDL"><![CDATA[
                CREATE FOREIGN TABLE babies (
                 id string NOT NULL OPTIONS(NAMEINSOURCE 'itemName()'),
                 name string,
                 state string,
                 gender string,
                    birthyear string,
                    occurences string,
                 CONSTRAINT PK0 PRIMARY KEY(id)
                ) OPTIONS (UPDATABLE TRUE);
             ]]>
             </metadata>
    -->
        </model>
        <model name="file_source">
            <source name="file" translator-name="file" connection-jndi-name="java:/file-ds" />
        </model>
        <model name="file" visible="true" type="VIRTUAL">
            <metadata type="DDL"><![CDATA[      
                CREATE VIEW babies (id integer PRIMARY KEY,
                                    name varchar(25),        
                                    state varchar(25),
                                    gender char(1),
                                    birthyear integer,
                                    occurences integer
                ) AS SELECT ROW_NUMBER() OVER (ORDER BY A.state) as id , A.name, A.state, A.gender, A.birthyear,  A.occurences FROM
                (EXEC file_source.getTextFiles('VA.TXT')) AS f,          
                TEXTTABLE(f.file COLUMNS state string, gender char, birthyear integer, name string, occurences integer) AS A; 
                ]]>
            </metadata>
        </model>
    </vdb>
    
    
    
    
    


    • Using either admin-console or CLI, deploy the above VDB. See directions at Deploying VDBs


    Step 4: Testing and Querying

    • Make sure you have active VDB by checking the Console or Log files or by using web-console http://localhost:9990/console/App.html#vdb-runtime
    • Now using a tools like SquirreL or Eclipse Data Tools, you can make JDBC connection to "babynames" VDB using JDBC URL"jdbc:teiid:babynames.1@mm://localhost:31000;" and issue SQL queries like below.

     

    Check the records in the File Source

     

    select * from file.babies

     


    Check records in SimpleDB Source (Should be none right now)

    select * from simpledb.babies
    
    
    
    
    

     

    Dump records from File Source into Accumulo

    insert into simpledb.babies (id, name, state, gender, birthyear, occurences)
      select f.id, f.name, f.state, f.gender, f.birthyear, f.occurences from file.babies as f
    
    
    
    
    


    you should see

    (131638 rows affected)
    Elapsed Time:  0 hr, 1 min, 31 sec, 424 ms.
    
    
    
    
    

     

    Now you can issue commands like

    select * from simpledb.babies
    select * from simpledb.babies where name = 'Mary'
    select * from simpledb.babies where birthyear = '1910' or birthyear '1920'
    
    
    
    
    


    NOTE: SimpleDB only supports lexicographical compares, so all the compares are done on string based data. Zero pad the numerical data appropriately when working with numbers.

     

    If you have other models like RDBMS, Salesforce, Web Service etc you can join the table between those sources and SimpleDB. You can also issue SQL queries like INESRT/UPDATE/DELETE on SimpleDB source to manage the records in the store. For full documentation on Amazon SimpleDB in Teiid read this documentation Amazon SimpleDB Translator

     

    Hopefully this article gave how to integrate Amazon's SimpleDB  with Teiid. Let us know if you have any comments or suggestions in Teiid forums.

     

    Ramesh..