It definitely sounds like an initial approach around REST should be documented as it will allow integration even with older versions.
It seems like you are saying that projection/selection could require the translator to write a script to be executed against CouchDB - or can that not be done on the fly?
Unfortunately, CouchDB doesn't really offer a good way to do general-purpose queries. You can write map/reduce functions to create what are essentially materialized views (the results are precomputed and stored on disk), which are indexed and accessed by whatever you return as the key. There is no support for secondary indexes, which means that for anything else you are basically going to have to iterate through all of the documents stored in the database. CouchDB supports ad hoc views, but these are slow because CouchDB is just going to iterate through every single document in the database.
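To make the materialized-view idea concrete, here is a minimal sketch in plain Python (not CouchDB code). It only simulates what a view does: a map function emits key/value pairs, the rows are precomputed and kept sorted by key, and lookups are range scans on that key. The sample documents, field names, and function names are all made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample documents; the field names are invented for this sketch.
docs = [
    {"_id": "a1", "type": "order", "customer": "smith", "total": 40},
    {"_id": "b2", "type": "order", "customer": "jones", "total": 15},
    {"_id": "c3", "type": "customer", "name": "smith"},
    {"_id": "d4", "type": "order", "customer": "smith", "total": 5},
]


def map_orders_by_customer(doc):
    """Plays the role of a view's map function: yield (key, value) pairs."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def build_view(documents, map_fn):
    """Precompute the view: run map_fn over every document once, sort by key."""
    rows = [(key, doc["_id"], value)
            for doc in documents
            for key, value in map_fn(doc)]
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def query_view(rows, startkey, endkey):
    """Range lookup on the emitted key only; the documents are not rescanned."""
    keys = [row[0] for row in rows]
    return rows[bisect_left(keys, startkey):bisect_right(keys, endkey)]


view = build_view(docs, map_orders_by_customer)
smith_orders = query_view(view, "smith", "smith")
```

The point of the sketch is the limitation: the view can only be searched by the key the map function emitted. Filtering on `total` here would need either another view or a full pass over the documents.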
One option is to index documents with Lucene, and then query the index. I developed a CouchDB translator using this strategy and was able to get pretty good results. I can't share the code but would be happy to offer guidance. See https://github.com/rnewson/couchdb-lucene for the Lucene integration. Another option might be to use Solr, since I think Teiid already has a translator for that.
So Teiid would more or less sit on top of Lucene to issue the query, and then the response would come from CouchDB?
The C-L plugin that I linked to above runs as a handler in CouchDB. It listens to the changes feed for new documents and runs them through an index function from the design doc. It has a servlet that Teiid can issue queries against. Optionally, the full document of each result can be embedded in the response (which I assume the servlet is just pulling out of CouchDB).
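As a rough sketch of what a query against the C-L endpoint could look like, the snippet below only builds the request URL. The host, the `_fti/local/...` path layout, and the database, design doc, and index names are assumptions; the actual path depends on the couchdb-lucene version and how it is wired into CouchDB, so check the plugin's README for your setup. The `q`, `include_docs`, and `limit` parameters are the ones I would expect to use.

```python
from urllib.parse import urlencode

# Assumption: CouchDB on the default local port with C-L proxied under _fti.
BASE = "http://localhost:5984"


def fti_url(db, design_doc, index, query, include_docs=False, limit=25):
    """Build a couchdb-lucene search URL (path layout is an assumption)."""
    params = {
        "q": query,                                   # Lucene query string
        "include_docs": "true" if include_docs else "false",
        "limit": limit,
    }
    return f"{BASE}/_fti/local/{db}/_design/{design_doc}/{index}?{urlencode(params)}"


# Hypothetical database, design doc, and index names.
url = fti_url("mydb", "search", "by_field", 'type:"order"', include_docs=True)
```

With `include_docs=True` each hit should carry the full document, so the translator would not need a second round trip to CouchDB per row.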
My translator converted SQL commands to Lucene queries (http://lucene.apache.org/java/3_6_2/queryparsersyntax.html) and supported AND/OR, ranges, equals (including LIKE), and negation. C-L can be configured to serve stale data when the index is being regenerated (for example, when it detects that the design doc changed). One of the issues I ran into was that Lucene is generally used for full text search, so I had to disable some of the processing Lucene would do when indexing data.
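I can't post the real translator, but the SQL-to-Lucene conversion can be sketched roughly like this. This is not the code I wrote and it does not use the Teiid translator API; the `Compare`, `Range`, `Not`, and `Bool` classes are hypothetical stand-ins for a parsed WHERE clause, and the output follows the classic Lucene query parser syntax linked above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Compare:
    field: str
    op: str        # "=" or "LIKE"
    value: str


@dataclass
class Range:
    field: str
    low: str
    high: str      # inclusive bounds, like SQL BETWEEN


@dataclass
class Not:
    operand: "Predicate"


@dataclass
class Bool:
    op: str        # "AND" or "OR"
    left: "Predicate"
    right: "Predicate"


Predicate = Union[Compare, Range, Not, Bool]


def quote_phrase(value):
    """Quote a literal so it is matched as a phrase, escaping \\ and \"."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_lucene(p):
    """Render a predicate tree as a Lucene query string."""
    if isinstance(p, Bool):
        return f"({to_lucene(p.left)} {p.op} {to_lucene(p.right)})"
    if isinstance(p, Not):
        # Lucene cannot run a query that consists only of a NOT clause,
        # so this is only useful when combined with another term.
        return f"NOT {to_lucene(p.operand)}"
    if isinstance(p, Range):
        return f"{p.field}:[{p.low} TO {p.high}]"
    if p.op == "=":
        return f"{p.field}:{quote_phrase(p.value)}"
    if p.op == "LIKE":
        # SQL wildcards map to Lucene ones: % -> *, _ -> ?
        # (a leading wildcard is rejected by the default query parser).
        pattern = p.value.replace("%", "*").replace("_", "?")
        return f"{p.field}:{pattern}"
    raise ValueError(f"unsupported operator: {p.op}")


# WHERE type = 'order' AND total BETWEEN 10 AND 50 AND customer NOT LIKE 'sm%'
where = Bool("AND", Compare("type", "=", "order"),
             Bool("AND", Range("total", "10", "50"),
                  Not(Compare("customer", "LIKE", "sm%"))))
lucene_query = to_lucene(where)
```

Anything the query syntax can't express would have to be left for Teiid to evaluate after the rows come back.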
That is a cool approach. A client I worked with in the past used a similar configuration with Solr in lieu of Lucene. I wasn't familiar with CouchDB at the time and didn't really understand why they had a layered approach, but it makes sense now.
In your experience, would it be uncommon to just have CouchDB by itself?
I guess, to expand further on Jason's question: is this translator a Lucene translator or a CouchDB-specific translator? What we have to investigate is whether this C-L handler exposes plain Lucene query syntax or a combination of Lucene and CouchDB query syntax. If it is the former, then Teiid should develop a Lucene translator, and then maybe another translator that talks directly to CouchDB. That way we can access the data however the user has configured their CouchDB environment.
Jason, does it provide access to a single document (row) through a primary index? If yes, then this is not much different from Infinispan (JDV) before the query support. Even there, with query support, I believe they use Lucene underneath; however, they exposed their own JPA-style querying on top.
My understanding is that Couchbase has diverged from CouchDB. Basically, Couchbase uses CouchDB as the storage engine and replaces the REST API with a binary memcache-based one. It looks like they're working on a query language for Couchbase, which might make an easy target for writing a translator.
Edit: Appending to this post because the forum overlords are rate-limiting me to hourly posts again...
The translator I developed is specific to the CouchDB-Lucene plugin linked above. My understanding is that Lucene is a Java library, so it wouldn't make sense to write a Lucene-specific translator because the interface is going to be different for each product. I went the C-L route because I wanted to keep the data in-place but be able to query it quickly.
CouchDB REST API:
- Access one or more JSON documents by ID (see bulk APIs for multi).
- Iterate through all documents. Skip/limit works but it is recommended to use PK ranges (startkey/limit or startkey/endkey) for performance.
- Filter documents by primary key.
- Order documents by primary key.
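The access patterns in the list above map onto a handful of requests. The sketch below only builds the URLs and the bulk request body; it assumes a default local CouchDB at `http://localhost:5984` and a hypothetical database named `mydb`. Note that key parameters for `_all_docs` have to be JSON-encoded, which is why string keys end up wrapped in quotes.

```python
import json
from urllib.parse import quote, urlencode

BASE = "http://localhost:5984"   # assumption: default local CouchDB
DB = "mydb"                      # hypothetical database name


def doc_url(doc_id):
    """GET a single document by ID."""
    return f"{BASE}/{DB}/{quote(doc_id, safe='')}"


def all_docs_url(startkey=None, endkey=None, limit=None,
                 descending=False, include_docs=True):
    """Iterate, filter, and order by primary key through _all_docs."""
    params = {}
    if startkey is not None:
        params["startkey"] = json.dumps(startkey)   # keys are JSON values
    if endkey is not None:
        params["endkey"] = json.dumps(endkey)
    if limit is not None:
        params["limit"] = limit
    if descending:
        params["descending"] = "true"
    params["include_docs"] = "true" if include_docs else "false"
    return f"{BASE}/{DB}/_all_docs?{urlencode(params)}"


def bulk_fetch_request(doc_ids):
    """POST target and JSON body for fetching several documents by ID."""
    url = f"{BASE}/{DB}/_all_docs?include_docs=true"
    return url, json.dumps({"keys": list(doc_ids)})


# One page of up to 50 documents, starting at a given primary key.
page_url = all_docs_url(startkey="order-0100", limit=50)
```

For paging, the last key of one page becomes the `startkey` of the next request, which avoids the cost of large `skip` values.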
I think it would be straightforward to develop a translator to talk to CouchDB REST API and provide key/value access to the JSON documents stored there, but queries are going to be slow without support for secondary indices. A SQL view could be created on top of this using the usual JSON->XML->XPATH strategy. Also it might be useful to provide blob access to attachments somehow.
On a related note, an ElasticSearch translator would be pretty cool because they appear to have support for spatial queries (e.g., polygon, box, distance), and ES can be set up to index just about anything (including CouchDB).
Edit: Updated typos and elaborated a bit.
Yes, you can get a list of documents in a row.
Sounds like a translator for CouchDB directly does not make sense: too many unknowns (additional indexing with Solr, Lucene, etc.), slow queries, an assumption that users will build map/reduce scripts, and it would serve only a small subset of users.
As Tom has pointed out, Couchbase is a good alternative, because it incorporates CouchDB with additional indexing and query speed.
I have heard from several consultants who need ElasticSearch internally for projects, so this might be another viable option (well, at least in the pecking order of things)?