9 Replies Latest reply on Aug 19, 2014 10:23 AM by jason.marley

CouchDB translator guidance

jason.marley Aug 11, 2014 10:29 AM

Hi All,

I've recently started creating a translator for CouchDB and I'm in a bit of a quandary. For interacting with CouchDB, the basic HTTP REST API (GET, PUT, DELETE, etc.) works well, especially when pulling all fields/data. However, for anything beyond SELECT *, CouchDB doesn't offer much out of the box; it requires custom views to be written in either JavaScript or CoffeeScript and stored in CouchDB. Hence the quandary: any time a new database (presumably a related set of documents/fields, though that isn't guaranteed) is added, a new view has to be created to expose a subset of that information, should one be desired.

As it stands now, I am not sure it makes sense to create a separate translator. Instead, if we assume that CouchDB users have set up their databases with said custom views, could Teiid then communicate directly with CouchDB via REST?

Thoughts?

Jason

1. Re: CouchDB translator guidance

shawkins Aug 13, 2014 4:42 PM (in response to jason.marley)

It definitely sounds like an initial approach around REST should be documented as it will allow integration even with older versions.

It seems like you are saying that projection/selection could require the translator to write a script to be executed against CouchDB - or can that not be done on the fly?
2. Re: CouchDB translator guidance

tom9729 Aug 14, 2014 5:17 PM (in response to jason.marley)

Unfortunately, CouchDB doesn't really offer a good way to do general-purpose queries. You can write map/reduce functions to create what are essentially materialized views (the results are precomputed and stored on disk), which are indexed and accessed by whatever you return as the primary key. There is no support for secondary indexes, which means you are basically going to have to iterate through all of the documents stored in the database. CouchDB supports ad hoc views, but these are slow because CouchDB is just going to iterate through every single document in the database.
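
To make the view mechanism concrete, here is a minimal sketch (not code from this thread) of what a translator or admin would have to store in CouchDB to get an index on one field, and the URL used to query it. The design document name `teiid`, the view name `by_last_name`, the database `people`, and the host are all made up for illustration.

```python
import json
from urllib.parse import quote, urlencode


def make_design_doc(name, view_name, field):
    """Build a design document whose view is keyed on one document field."""
    js_field = json.dumps(field)  # a JSON string is a usable JavaScript string literal
    # The map function is JavaScript source, stored in CouchDB as a string.
    map_js = (
        "function (doc) { if (doc[%s] !== undefined) { emit(doc[%s], null); } }"
        % (js_field, js_field)
    )
    return {
        "_id": "_design/" + name,
        "language": "javascript",
        "views": {view_name: {"map": map_js}},
    }


def view_url(base, db, name, view_name, key):
    """URL for looking up one key in the view; keys are JSON-encoded."""
    query = urlencode({"key": json.dumps(key), "include_docs": "true"})
    return "%s/%s/_design/%s/_view/%s?%s" % (
        base.rstrip("/"), quote(db, safe=""), name, view_name, query)


if __name__ == "__main__":
    # The design doc would be PUT to /people/_design/teiid before querying.
    print(json.dumps(make_design_doc("teiid", "by_last_name", "last_name"), indent=2))
    print(view_url("http://localhost:5984", "people", "teiid", "by_last_name", "smith"))
```

One such view is needed per field (or field combination) you want to filter on, which is the maintenance burden described above.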

One option is to index documents with Lucene, and then query the index. I developed a CouchDB translator using this strategy and was able to get pretty good results. I can't share the code but would be happy to offer guidance. See https://github.com/rnewson/couchdb-lucene for the Lucene integration. Another option might be to use Solr, since I think Teiid already has a translator for that.
3. Re: CouchDB translator guidance

jason.marley Aug 14, 2014 5:35 PM (in response to tom9729)

Thanks!

So Teiid would more or less sit on top of Lucene to issue the query, and the response would then come from CouchDB?
4. Re: CouchDB translator guidance

tom9729 Aug 14, 2014 7:02 PM (in response to jason.marley)

The C-L plugin that I linked to above runs as a handler in CouchDB. It listens to the changes feed for new documents and runs them through an index function from the design doc. It has a servlet that Teiid can issue queries against. Optionally, the full document of each result can be embedded in the response (which I assume the servlet is just pulling out of CouchDB).

My translator converted SQL commands to Lucene queries (http://lucene.apache.org/java/3_6_2/queryparsersyntax.html) and supported AND/OR, ranges, equals (including LIKE), and negation. C-L can be configured to serve stale data when the index is being regenerated (for example, when it detects that the design doc changed). One of the issues I ran into was that Lucene is generally used for full text search, so I had to disable some of the processing Lucene would do when indexing data.
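
As a rough illustration of the SQL-to-Lucene conversion described above (this is not the translator from this post, whose code was not shared), the sketch below turns a small, hypothetical predicate tree into Lucene classic query-parser syntax. The tuple format, the function names, and the field names are assumptions made for the example. Note that the classic parser rejects leading wildcards by default and that range bounds on untyped fields compare as strings.

```python
import re

# Characters the Lucene classic query parser treats as syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(text):
    """Backslash-escape Lucene query syntax characters in a bare term."""
    return _SPECIAL.sub(r"\\\1", text)


def like_to_wildcard(pattern):
    """Map SQL LIKE wildcards to Lucene ones: % becomes *, _ becomes ?."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append("*")
        elif ch == "_":
            out.append("?")
        else:
            out.append(escape_term(ch))
    return "".join(out)


def to_lucene(node):
    """Render a predicate tree of tuples as a Lucene query string."""
    op = node[0]
    if op == "eq":
        _, field, value = node
        quoted = value.replace("\\", "\\\\").replace('"', '\\"')
        return '%s:"%s"' % (field, quoted)
    if op == "like":
        _, field, pattern = node
        return "%s:%s" % (field, like_to_wildcard(pattern))
    if op == "range":
        _, field, low, high = node
        return "%s:[%s TO %s]" % (field, escape_term(low), escape_term(high))
    if op == "not":
        return "NOT (%s)" % to_lucene(node[1])
    if op in ("and", "or"):
        joiner = " %s " % op.upper()
        return "(" + joiner.join(to_lucene(child) for child in node[1:]) + ")"
    raise ValueError("unsupported predicate: %r" % (op,))


if __name__ == "__main__":
    # WHERE last_name LIKE 'Sm%' AND age BETWEEN '30' AND '40' AND NOT status = 'inactive'
    where = ("and",
             ("like", "last_name", "Sm%"),
             ("range", "age", "30", "40"),
             ("not", ("eq", "status", "inactive")))
    print(to_lucene(where))
```

Equality is emitted as a quoted phrase here; with analysis disabled on the index side, as mentioned above, that behaves like an exact match.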
5. Re: CouchDB translator guidance

jason.marley Aug 17, 2014 5:55 AM (in response to jason.marley)

That is a cool approach. A client I worked with in the past used a similar configuration, with Solr in lieu of Lucene. I wasn't familiar with CouchDB at the time and didn't really understand why they had a layered approach, but it makes sense now.

In your experience, would it be uncommon to have just CouchDB by itself?
6. Re: CouchDB translator guidance

shawkins Aug 18, 2014 2:13 PM (in response to jason.marley)

Just wanted to add a pointer over to the existing JIRA: [TEIID-2820] Support Couchbase as a resource - JBoss Issue Tracker
7. Re: CouchDB translator guidance

rareddy Aug 18, 2014 2:51 PM (in response to jason.marley)

I guess to expand further on Jason's question: is this translator a Lucene translator or a CouchDB-specific translator, then? What we have to investigate is whether this C-L handler exposes the Lucene query syntax or a combination of Lucene and CouchDB query syntax. If it is the former, then Teiid should develop a Lucene translator, and then maybe another translator that talks directly to CouchDB. That way we can access the data however the user has configured their CouchDB environment.

Jason, does it provide access to a single row (document) through any primary index? If yes, then this is not much different from Infinispan (JDV) before the query support. Even there, with query support, I believe they use Lucene underneath; however, they exposed their own JPA-style querying on top.
8. Re: Re: CouchDB translator guidance

tom9729 Aug 18, 2014 8:29 PM (in response to shawkins)
My understanding is that Couchbase has diverged from CouchDB. Basically Couchbase is using CouchDB as the storage engine, and replaces the REST API with a binary memcache-based one [1][2]. It looks like they're working on a query language for Couchbase, which might make an easy target for writing a translator [3].

Edit: Appending to this post because the forum overlords are rate-limiting me to hourly posts again...

The translator I developed is specific to the CouchDB-Lucene plugin linked above. My understanding is that Lucene is a Java library, so it wouldn't make sense to write a Lucene-specific translator because the interface is going to be different for each product. I went the C-L route because I wanted to keep the data in-place but be able to query it quickly.

CouchDB REST API:
Access one or more JSON documents by ID (see bulk APIs for multi).
Iterate through all documents. Skip/limit works but it is recommended to use PK ranges (startkey/limit or startkey/endkey) for performance.
Filter documents by primary key.
Order documents by primary key.
Create (materialized) view to change the primary key and body of documents stored in a database, and then do the above actions on the result. Views are implemented with map/reduce functions written in JavaScript and stored in the design document. Views can take hours to run for large databases, and require a lot of disk space if you're not careful.
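
The primary-key access pattern in the list above can be sketched as follows. This is only an illustration: it builds `_all_docs` URLs with `startkey`/`endkey`/`limit` and shows the usual "fetch one extra row" trick for paging by key instead of by `skip`. The host, database name, and helper names are assumptions.

```python
import json
from urllib.parse import quote, urlencode


def all_docs_url(base, db, startkey=None, endkey=None, limit=None, include_docs=True):
    """Build an _all_docs URL; startkey and endkey must be JSON-encoded."""
    params = []
    if startkey is not None:
        params.append(("startkey", json.dumps(startkey)))
    if endkey is not None:
        params.append(("endkey", json.dumps(endkey)))
    if limit is not None:
        params.append(("limit", str(limit)))
    if include_docs:
        params.append(("include_docs", "true"))
    url = "%s/%s/_all_docs" % (base.rstrip("/"), quote(db, safe=""))
    return url + ("?" + urlencode(params) if params else "")


def split_page(rows, page_size):
    """Given page_size + 1 fetched rows, return (page, next_startkey).

    The extra row is not part of the page; its key is where the next
    request starts. next_startkey is None when there are no more rows.
    """
    if len(rows) > page_size:
        return rows[:page_size], rows[page_size]["key"]
    return rows, None


if __name__ == "__main__":
    page_size = 2
    # Ask for one row more than the page size.
    print(all_docs_url("http://localhost:5984", "people", startkey="a", limit=page_size + 1))
    # Rows shaped like the "rows" array of an _all_docs response.
    rows = [{"id": "a1", "key": "a1"}, {"id": "a2", "key": "a2"}, {"id": "a3", "key": "a3"}]
    page, next_key = split_page(rows, page_size)
    print([r["id"] for r in page], next_key)
```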

I think it would be straightforward to develop a translator to talk to CouchDB REST API and provide key/value access to the JSON documents stored there, but queries are going to be slow without support for secondary indices. A SQL view could be created on top of this using the usual JSON->XML->XPATH strategy. Also it might be useful to provide blob access to attachments somehow.
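
To show why such key/value access is slow for anything but key lookups, here is a small sketch of doing projection and selection on the client side over an `_all_docs` response fetched with `include_docs=true`. It is a plain-Python stand-in for the idea, not Teiid's JSON->XML->XPATH mechanism, and the function name and sample fields are made up.

```python
def project(response, columns, predicate=None):
    """Turn an _all_docs response into tuples of the requested columns.

    Every document has to be fetched and inspected, because CouchDB can
    only filter on the primary key without a view.
    """
    result = []
    for row in response.get("rows", []):
        doc = row.get("doc")
        if doc is None:
            continue  # include_docs was not set, or the row has no document body
        if predicate is not None and not predicate(doc):
            continue
        # Missing fields become None, roughly like SQL NULL.
        result.append(tuple(doc.get(column) for column in columns))
    return result


if __name__ == "__main__":
    response = {
        "total_rows": 3,
        "offset": 0,
        "rows": [
            {"id": "1", "key": "1", "value": {"rev": "1-a"},
             "doc": {"_id": "1", "name": "Ann", "age": 41}},
            {"id": "2", "key": "2", "value": {"rev": "1-b"},
             "doc": {"_id": "2", "name": "Bob"}},
            {"id": "3", "key": "3", "value": {"rev": "1-c"},
             "doc": {"_id": "3", "name": "Cy", "age": 29}},
        ],
    }
    # SELECT name, age FROM people WHERE age > 30
    print(project(response, ["name", "age"], lambda d: d.get("age", 0) > 30))
```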

On a related note, an Elasticsearch translator would be pretty cool because they appear to have support for spatial queries (e.g., polygon, box, distance), and ES can be set up to index just about anything (including CouchDB).

Edit: Updated typos and elaborated a bit.
9. Re: CouchDB translator guidance

jason.marley Aug 19, 2014 10:23 AM (in response to tom9729)

Ramesh:
Yes, you can get a list of documents in a row.

All:
Sounds like a translator that talks to CouchDB directly does not make sense: there are too many unknowns (additional indexing with Solr, Lucene, etc.), it would be slow, it assumes users will build map/reduce scripts, and it would serve only a small subset of users.

As Tom has pointed out, Couchbase is a good alternative because it incorporates CouchDB with additional indexing/query speed.

I have heard from several consultants who need Elasticsearch internally for projects, so this might be another viable option (well, at least in the pecking order of things)?