In this article, I will walk through an example, where I will show to move some data from CSV file into Apache Solr using Teiid. Then issue queries to Apache Solr, to get the contents of search using Teiid. Once the search results are available in Teiid, you can further integrate/enhance this data with other data sources. The prerequisites for this example are to have some familiarity with Teiid and Apache Solr. For the purposes of this article I am using Teiid 8.7.Alpha1 (or later) and Apache Solr 4.6.0
For the purpose of this example, I am using sample data from http://data.gov about baby names collected from social security numbers. The data is arranged by state. You can download data from http://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-data-by-state-and-district-of- A copy of the file is attached. This ZIP file contains may different files, one file for each state with names of the babies with following format.
STATE, MALE or FEMALE, YEAR, NAME, # of babies KS,F,1910,Mary,251 KS,F,1910,Helen,172 KS,F,1910,Dorothy,144 ...
Unzip the "namesbystate.zip" file into any directory. For the purposes of this article I will assume this is unzipped into "/babynames" directory on your local machine.
The goal is load this data into Solr using Teiid, and issue queries against it using SQL. Note that, I am showing full example, where data is gathered in source system (CSV files) and loading of that data into Solr. Typically you may already have data in a RDBMS, Web Service etc. that needs to be loaded in Solr for searching. In that case you can skip the Step 2
Step 1: Install and Configure Apache Solr
- Download Apache Solr, and install it. It is assumed that you installed in "/solr" directory
- Copy "/solr/example/solr/collection1" directory, save in some temp directory
- Now rename "/solr/example/solr/collection1" to "/solr/example/solr/babynames"
- Delete the contents of "/solr/example/solr/babynames/data" directory if any
- Delete "/solr/example/solr/babynames/core.properties" file
- Edit "/solr/example/solr/babynames/conf/schema.xml" and delete all elements under "<fields>" and "<copyFields>"
- Now the below contents under "<fields>" contents.
<field indexed="true" name="_version_" stored="true" type="long"/> <field indexed="true" name="_root_" stored="false" type="string"/> <field indexed="true" name="id" stored="true" type="string" required="true" multiValued="false" /> <field indexed="true" name="state" stored="true" type="string" omitNorms="true"/> <field indexed="true" name="name" required="true" stored="true" type="string"/> <field indexed="true" name="gender" stored="true" type="string" omitNorms="true"/> <field indexed="true" name="birthyear" stored="true" type="int"/> <field indexed="true" name="totalcount" stored="true" type="int"/> <field indexed="true" name="text" stored="false" type="text_general" multiValued="true"/>
- Save and close above "schema.xml" file. These fields represent the schema for the data that is present in baby names CSV document from above. The schemal.xml file defined the internal document structure of Solr.
- Now start the Apache Solr by executing following command
cd "/solr/example" java -jar start.jar
- Once the Solr is started, using a web browser go to address "http://localhost:8983/solr"
- On Left hand side navigation go to "Core Admin", if you see "unload" button click it
- Now click "Add Core", and "babies" as core name and "babynames" as directory name as shown below.
- Now the Solr set up is complete. Make sure there were no errors while doing this.
- The above tasks, configured Solr with a "core", that can store/search documents.
Step 2: Building a VDB
Option 1: Using Teiid Designer (Skip to Option 2 for Dynamic VDB)
- Install Teiid 8.7.Alpha1 (or Later)
- Install Teiid Designer 8.3 or later
- Start Teiid Designer, switch to "Teiid Designer" perspective.
- Configure the Teiid Server, make sure you have made connection to Teiid Server.
- Create a new "Teiid Model Project" and name it "BabyNames".
- We need to create two separate source models, one for the CSV files, and another for the "Solr".
- A pre-built sample project attached "designer_project.tar"
Build Flat File based Model
- Choose "File -> import", then choose "Teiid Designer/File Source Flat >> Source and View Model". Follow directions in the article Connect to a Flat File using Teiid to create model based on files. Once you are done a view table like below.
- You can preview the data on right clicking on "babies" table, "Modeling -> Preview Data". Make sure this is successful.
Build Solr based Model
- Choose "File -> import", then choose "Teiid Designer/Teiid Connection >> Source Model".
- Next ..
- On the "Data the Source to use for Import" dialog box, click "New..", on the Create Data Source dialog should look like
- OK, then "Next >"
- In this dialog box make sure your "translator" is set to "solr", and click "Next >"
- The dialog will show DDL for the model, with table "babies", click up to "Finish". Currently I see (TEIIDDES-2017)
CREATE FOREIGN TABLE babies ( id string OPTIONS (SEARCHABLE 'Searchable'), birthyear integer OPTIONS (SEARCHABLE 'Searchable'), name string OPTIONS (SEARCHABLE 'Searchable'), state string OPTIONS (SEARCHABLE 'Searchable'), gender string OPTIONS (SEARCHABLE 'Searchable'), totalcount integer OPTIONS (SEARCHABLE 'Searchable') ) OPTIONS (UPDATABLE TRUE);
- When done you will see below table. Do a manual fix for [TEIIDDES-2018] by editing the data type property.
- Now build a VDB called "babynames" with all the models in this VDB. Deploy to server. You can right click deploy. Make sure you created data sources correctly.
Option 2: Building with Dynamic VDB (Skip this if you did Option 1)
- Edit "standalone-teiid.xml", under "resource-adapter" section add the XML for creation of data sources.
Data Source for Flat File
<resource-adapter id="file-ds"> <module slot="main" id="org.jboss.teiid.resource-adapter.file"/> <transaction-support>NoTransaction</transaction-support> <connection-definitions> <connection-definition class-name="org.teiid.resource.adapter.file.FileManagedConnectionFactory" jndi-name="java:/file-ds" enabled="true" pool-name="file-ds"> <config-property name="ParentDirectory"> /babynames </config-property> </connection-definition> </connection-definitions> </resource-adapter>
Data Source for Solr
<resource-adapter id="solr-ds"> <module slot="main" id="org.jboss.teiid.resource-adapter.solr"/> <transaction-support>XATransaction</transaction-support> <connection-definitions> <connection-definition class-name="org.teiid.resource.adapter.solr.SolrManagedConnectionFactory" jndi-name="java:/solr-ds" enabled="true" pool-name="solr-ds"> <config-property name="CoreName"> babies </config-property> <config-property name="url"> http://localhost:8983/solr/babies </config-property> </connection-definition> </connection-definitions> </resource-adapter>
- Start JBoss EAP + Teiid Server using command
./standalone.sh -c standalone-teiid.xml
- Create the following dynamic VDB, name the file as "babynames-vdb.xml"
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <vdb name="babynames" version="1"> <model name="solr"> <source name="solr" translator-name="solr" connection-jndi-name="java:/solr-ds"/> </model> <model name="file_source"> <source name="file" translator-name="file" connection-jndi-name="java:/file-ds"/> </model> <model name = "file" visible = "true" type = "VIRTUAL" > <metadata type = "DDL"><![CDATA[ CREATE VIEW babies ( name varchar, state varchar, gender varchar, birthyear integer, totalcount integer ) AS SELECT A.name, A.state, A.gender, A.birthyear, A.totalcount FROM (EXEC file_source.getTextFiles('VA.TXT')) AS f, TEXTTABLE(f.file COLUMNS state string, gender string, birthyear integer, name string, totalcount integer) AS A; ]]> </metadata> </model> </vdb>
- Using either admin-console or CLI, deploy the above VDB. See directions at Deploying VDBs
Step 3: Testing
- Make sure you have active VDB by checking the Console or Log files or by using web-console http://localhost:9990/console/App.html#vdb-runtime
- Now using a tools like SquirreL or Eclipse Data Tools, you can issue queries like below.
Check the records in the File Source
select * from file.babies
Check records in Solr Source (Should be none right now)
select * from solr.babies
Dump records from File Source into Solr
insert into solr.babies (id, name, state, gender, birthyear, totalcount) select f.name, f.name, f.state, f.gender, f.birthyear, f.totalcount from file.babies as f
you should see
(131638 rows affected) Elapsed Time: 0 hr, 1 min, 31 sec, 424 ms.
Now you can issue commands like
select * from solr.babies select * from solr.babies where name = 'Mary' select * from solr.babies where name like '*ary' select * from solr.babies where birthyear > 1910 and birthyear < 1920
If you have other models like RDBMS, Salesforce, Web Service etc you can join the table between those sources and Solr. You can also issue SQL queries like INESRT/UPDATE/DELETE on Solr source to manage the documents in the store. For full documentation on Solr in Teiid read this documentation. Apache SOLR Translator
Hopefully this article gave how to integrate Solr data with Teiid. Let us know if you have any comments or suggestions in Teiid forums.
Ramesh..
Comments