This document discusses what querying in Infinispan should look like.
Requirements
In descending order of importance:
- Queryability over REST and Hot Rod
- Common API for remote and library-mode querying
- Handle re-indexing in the event of index corruption or loss
- Support changes in schema or indexes as an application grows
- Management of indexes and schemas
- Querying without indexes
Phase 1 (target for Infinispan 6.0)
Initial work to bring library mode querying to a mature level, and to lay the groundwork for a compatible remote querying implementation.
Streams 1 and 2 can happen in parallel.
- Define a Query API that will be common for library mode + remote (ISPN-3169) (STREAM 1)
- DSL to describe Filters. See Coherence API. These filters can be directly passed into Lucene.
- LocalQuery sub-interface to support Lucene query objects for library mode only. Filters also supported on LocalQuery on top of Lucene queries.
- Decide on a serialization format, necessary for server-side indexing. DECISION: USING PROTOCOL BUFFERS.
- Requirements/points to consider:
- Platform independence. At least support for Java, C++, .NET. Optional support for Python and Ruby
- Supported platforms (same as the C++ client at least): Windows 64-bit with Visual Studio 2010; RHEL 6, 64-bit; RHEL 5, 64-bit
- Partial de-serialization
- Schema (type) versioning to allow for upgrade to application code
- Tooling to validate schema versions
- Performance
- Compatible licensing
- Potential libraries to consider:
- Apache Avro
- Thrift
- Google ProtoBufs
- JBoss Marshalling (with some enhancements?)
- Metadata store (ISPN-3170) (STREAM 2)
- Globally scoped cache store for internal component use. Accessed via GlobalComponentRegistry
- Manual configuration and setup via XML
- <global><metadataCache name="XYZ" enabled="true" /></global>
- Where name is a defined named cache. Recommended that this is (replicated && persisted) || (local && with a shared cache store).
- Define and manage indexes and schemas via JMX (ISPN-3172) (STREAM 2)
- Ability to upload/attach a .proto file
- Ability to edit indexing metadata, which maps to Hibernate Search programmatic metadata API
- Store schema and metadata info in Metadata Store
- Define a new query operation over Hot Rod (ISPN-3173) (STREAM 1)
- String-based query language for communication between client and server (ISPN-3174) (STREAM 1)
- Internal only; not exposed via the client API, at least for this phase.
- Use https://github.com/hibernate/hibernate-jpql-parser for parsing into Lucene query
- Upgrade the Java Hot Rod client to support remote querying (ISPN-3175) (STREAM 2)
- (based on #1, #5 and #6)
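To make the idea of a filter DSL shared by library mode and remote querying more concrete, here is a minimal sketch in plain Java. Every name in it (FilterSketch, Filter, eq, gt, and) is hypothetical and is not taken from the Infinispan, Coherence or Lucene APIs. The textual form returned by describe() is an assumed illustration of something a server could translate into an index query, not the query language this document plans to define.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical filter DSL sketch; names are illustrative, not an Infinispan API. */
public final class FilterSketch {

    /** A filter can be evaluated against an entry's fields and printed as text. */
    public interface Filter {
        boolean matches(Map<String, Object> entry);
        String describe();
    }

    // Small helper that pairs a predicate with its textual description.
    private static Filter of(Predicate<Map<String, Object>> predicate, String text) {
        return new Filter() {
            @Override public boolean matches(Map<String, Object> entry) { return predicate.test(entry); }
            @Override public String describe() { return text; }
        };
    }

    /** Matches entries whose field equals the given value. */
    public static Filter eq(String field, Object value) {
        return of(e -> value.equals(e.get(field)), field + ":" + value);
    }

    /** Matches entries whose integer field is strictly greater than the bound. */
    public static Filter gt(String field, int bound) {
        return of(e -> e.get(field) instanceof Integer && ((Integer) e.get(field)) > bound,
                  field + ":>" + bound);
    }

    /** Matches entries accepted by both filters. */
    public static Filter and(Filter a, Filter b) {
        return of(e -> a.matches(e) && b.matches(e),
                  "(" + a.describe() + " AND " + b.describe() + ")");
    }

    public static void main(String[] args) {
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "alice");
        person.put("age", 34);

        Filter filter = and(eq("name", "alice"), gt("age", 30));
        System.out.println(filter.describe());      // textual form of the filter
        System.out.println(filter.matches(person)); // in-memory evaluation
    }
}
```

The same Filter object serves two purposes here: it can be evaluated locally against an entry, or rendered to a string that could be shipped to a server. That mirrors the split between the common Query API and the internal string-based query language described above.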
Phase 2 (target for Infinispan 6.1)
More sophisticated remote querying
- Formalise String query language, make it public API
- Expose via Client API
- Expose Filters + query string over REST
- Syntactic sugar for autoconf of metadata store
- Binary query representation for Hot Rod (only if the String representation proves too slow)
- CLI hooks for the above, to execute queries.
- Dynamic "installation" of object schema (.proto files) over Hot Rod and REST
Future
Dynamic querying, index-less querying, etc.
- Named queries (equivalent to a prepared statement)
- Dynamically switch between running an index-based query and translating the query into a Map/Reduce task if no indexes exist for a given type
- Allow execution of such queries via the Query API (local and remote)
- May require appropriate Hot Rod verbs
- Consider Dremel (Impala?) as an alternate query mechanism
- Distributed indexes
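The dynamic switch between index-based and index-less querying could look roughly like the sketch below. It is a single-JVM stand-in under stated assumptions: QueryPlannerSketch, Strategy and fullScan are hypothetical names, and the full scan uses a plain Java stream in place of a real distributed Map/Reduce task.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical planner sketch; not part of any Infinispan API. */
public final class QueryPlannerSketch {

    public enum Strategy { INDEX_LOOKUP, FULL_SCAN }

    // Types for which an index is assumed to exist.
    private final Set<String> indexedTypes;

    public QueryPlannerSketch(Set<String> indexedTypes) {
        this.indexedTypes = indexedTypes;
    }

    /** Use the index when one exists for the type, otherwise fall back to a scan. */
    public Strategy choose(String type) {
        return indexedTypes.contains(type) ? Strategy.INDEX_LOOKUP : Strategy.FULL_SCAN;
    }

    /**
     * Index-less fallback: test every entry against the filter (the "map" step)
     * and collect the matches into one list (the "reduce" step).
     */
    public static <V> List<V> fullScan(Collection<V> entries, Predicate<V> filter) {
        return entries.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        QueryPlannerSketch planner = new QueryPlannerSketch(Collections.singleton("Person"));
        System.out.println(planner.choose("Person")); // an index exists for this type
        System.out.println(planner.choose("Order"));  // no index, so scan instead
        System.out.println(fullScan(Arrays.asList(1, 5, 9), n -> n > 4));
    }
}
```

Keeping the strategy choice behind one method means callers of the Query API would not need to know whether an index exists for a type, which is the point of the item above.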