Version 14

    This document discusses what querying in Infinispan should look like.

     

    Requirements

    In descending order of importance:

    • Queryability over REST and Hot Rod
    • Common API for remote and library-mode querying
    • Handle re-indexing in the event of index corruption or loss
    • Support changes in schema or indexes as an application grows
    • Management of indexes and schemas
    • Querying without indexes

    Phase 1 (target for Infinispan 6.0)

    Initial work to bring library mode querying to a mature level, and to lay the groundwork for a compatible remote querying implementation.

     

    Streams 1 and 2 can happen in parallel.

     

    1. Define a Query API that will be common for library mode + remote (ISPN-3169) (STREAM 1)
      • DSL to describe Filters.  See Coherence API.  These filters can be directly passed into Lucene.
      • LocalQuery sub-interface to support Lucene query objects for library mode only.  Filters also supported on LocalQuery on top of Lucene queries.
    2. Decide on a serialization format, which is necessary for server-side indexing.  DECISION: USING PROTOCOL BUFFERS.
      • Requirements/points to consider:
        • Platform independence.  At least support for Java, C++, .NET.  Optional support for Python and Ruby
        • Supported platforms (same as the C++ client at least):
          • Windows, 64-bit, Visual Studio 2010
          • RHEL 6, 64-bit
          • RHEL 5, 64-bit
        • Partial de-serialization
        • Schema (type) versioning to allow for upgrades to application code
        • Tooling to validate schema versions
        • Performance
        • Compatible licensing
      • Potential libraries to consider:
        • Apache Avro
        • Thrift
        • Google ProtoBufs
        • JBoss Marshalling (with some enhancements?)
    3. Metadata store   (ISPN-3170) (STREAM 2)
      • Globally scoped cache store for internal component use.  Accessed via GlobalComponentRegistry
      • Manual configuration and setup via XML
        • <global><metadataCache name="XYZ" enabled="true" /></global>
        • Here, name refers to a defined named cache.  It is recommended that this cache is (replicated && persisted) || (local && with a shared cache store).
    4. Define and manage indexes and schemas via JMX  (ISPN-3172) (STREAM 2)
      • Ability to upload/attach a .proto file
      • Ability to edit indexing metadata, which maps to Hibernate Search programmatic metadata API
      • Store schema and metadata info in Metadata Store
    5. Define a new query operation over Hot Rod (ISPN-3173) (STREAM 1)
    6. String-based query language for communication between client and server (ISPN-3174) (STREAM 1)
    7. Upgrade the Java Hot Rod client to support remote querying (ISPN-3175) (STREAM 2)
      • Based on #1, #5 and #6
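
    To make item 1 more concrete, here is a minimal sketch of what a composable filter DSL could look like. Every name in it (FilterSketch, Filter, eq, greaterThan, run) is hypothetical and not part of any Infinispan API. Entries are modelled as plain attribute maps, and filters are evaluated in memory; translating them into Lucene queries, as the design intends, is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of a filter DSL; none of these names are Infinispan API. */
public class FilterSketch {

    /** A filter decides whether an entry (modelled as an attribute map) matches. */
    @FunctionalInterface
    public interface Filter {
        boolean matches(Map<String, Object> entry);

        /** Logical AND of this filter and another. */
        default Filter and(Filter other) {
            return e -> matches(e) && other.matches(e);
        }

        /** Logical OR of this filter and another. */
        default Filter or(Filter other) {
            return e -> matches(e) || other.matches(e);
        }
    }

    /** Matches entries whose attribute equals the given value. */
    public static Filter eq(String attribute, Object value) {
        return e -> value.equals(e.get(attribute));
    }

    /** Matches entries whose integer attribute is strictly greater than the bound. */
    public static Filter greaterThan(String attribute, int bound) {
        return e -> e.get(attribute) instanceof Integer
                && (Integer) e.get(attribute) > bound;
    }

    /** Applies a filter to a list of entries by scanning them in memory. */
    public static List<Map<String, Object>> run(Filter filter,
                                                List<Map<String, Object>> entries) {
        return entries.stream().filter(filter::matches).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = List.of(
                Map.of("name", "Ann", "age", 34),
                Map.of("name", "Bob", "age", 19));
        Filter query = eq("name", "Ann").and(greaterThan("age", 21));
        System.out.println(run(query, people));
    }
}
```

    Because such a filter is built only from attribute names and values, the same description could in principle be serialized for remote use or mapped onto a Lucene query in library mode, which is the motivation for a common API.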

     

    Phase 2 (target for Infinispan 6.1)

    More sophisticated remote querying

     

    • Formalise String query language, make it public API
      • Expose via Client API
    • Expose Filters + query string over REST
    • Syntactic sugar for autoconf of metadata store
    • Binary query representation for Hot Rod (only if the String representation proves too slow)
    • CLI hooks for the above, to execute queries.
    • Dynamic "installation" of object schema (.proto files) over Hot Rod and REST

     

    Future

    Dynamic querying, index-less querying, etc.

     

    • Named queries (equivalent to a prepared statement)
    • Dynamically switch between running an index-based query and translating the query into a Map/Reduce task if indexes don't exist for a given type
      • Allow execution of such queries via the Query API (local and remote)
        • May require appropriate Hot Rod verbs
    • Consider Dremel (Impala?) as an alternate query mechanism
    • Distributed indexes
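
    The idea of switching between index-based and index-less execution can be sketched as follows. This is an illustration under simplifying assumptions, not a proposed implementation: QueryRouterSketch and its methods are hypothetical names, the "index" is a plain in-memory map, and a full scan of the entries stands in for the Map/Reduce task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: answer a query from an index if one exists, else scan. */
public class QueryRouterSketch {
    private final Map<String, String> entries = new HashMap<>();     // key -> value
    private final Map<String, Set<String>> index = new HashMap<>();  // value -> keys
    private final boolean indexed;

    public QueryRouterSketch(boolean indexed) {
        this.indexed = indexed;
    }

    /** Stores an entry and, when indexing is enabled, keeps the index in sync. */
    public void put(String key, String value) {
        String old = entries.put(key, value);
        if (indexed) {
            if (old != null) {
                index.get(old).remove(key);
            }
            index.computeIfAbsent(value, v -> new TreeSet<>()).add(key);
        }
    }

    /** Returns the keys whose value equals the argument, using whichever path is available. */
    public Set<String> keysWithValue(String value) {
        if (indexed) {
            // Index-based path: a single lookup.
            return new TreeSet<>(index.getOrDefault(value, Set.of()));
        }
        // Index-less path: full scan, standing in for a Map/Reduce task.
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            if (e.getValue().equals(value)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (boolean useIndex : new boolean[] {true, false}) {
            QueryRouterSketch cache = new QueryRouterSketch(useIndex);
            cache.put("k1", "London");
            cache.put("k2", "Paris");
            cache.put("k3", "London");
            System.out.println(cache.keysWithValue("London"));
        }
    }
}
```

    The caller sees the same result either way; only the execution strategy differs, which is why the same Query API could serve both cases.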