1 Reply Latest reply on Jun 11, 2012 10:06 AM by rhauch

    ModeShape 3 indexes

    jonathandfields

      Will ModeShape 3 be using Infinispan for Lucene index storage? This has been mentioned in the past but I was not sure if that was on the roadmap for the initial release, long term, or at all....

       

      I don't know much about this configuration myself (I have always just used on-disk), but I'm assuming it would perform better, plus it would consolidate storage into a single location instead of two.

       

      From an operational standpoint, it would make backup and recovery easier since, for example, that could all be handled using Berkeley DB.

       

      That makes it easier to sell "new" technologies like this to IT organizations that are used to the backup/recovery features of relational databases.

       

      Thanks,

      Jon

        • 1. Re: ModeShape 3 indexes
          rhauch

          Hi, Jonathan.

           

          UPDATE: I was just looking into this, and Alpha4 does not properly set the location of the Infinispan configuration file to use for the 3 caches. I've logged this as MODE-1510, already have a fix, and will include it in today's Alpha5 release. As a result, the Infinispan configuration in the "hibernate-search-infinispan.jar" will be used by default; this should work fine in most cases but does not store the index in a persistent cache store.

           

          Yes, it's already possible to configure ModeShape 3.0.0.Alpha4 (or later) to store the Lucene indexes in Infinispan, although honestly I don't know of anyone who's tried it yet with ModeShape. The Infinispan Lucene Directory component does all the heavy lifting, and people are certainly using this. ModeShape doesn't directly use this component, however; instead, it configures Hibernate Search to use it.

           

          NOTE: You must include the Infinispan Lucene Directory JARs in the classpath, or include them in your dependencies if using Maven. If you're using ModeShape+AS7, you'll need to add an Infinispan module to AS7 and add a dependency on this new module to the "org.modeshape" module. (The module will be added to the ModeShape kit by Beta1; see MODE-1509.)

           

          Note that the Infinispan Lucene Directory implementation uses 3 separate caches for the index storage (see also Sanne Grinovero's recent presentation), and they should always be different caches from the one ModeShape uses for content storage. If you have enough memory, all of the caches should be replicated (so that each process has an entire copy of the indexes). Otherwise, the lock and metadata caches are small and should always be replicated, while the data cache, which holds all the raw index data, can be distributed if the indexes are large enough. (See also the Hibernate Search documentation for some background.)
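
          The replication rule described above can be written down as a small decision helper. This is only an illustrative sketch in Python, not ModeShape or Infinispan API: the function name choose_cache_modes, the role keys, and the simple size comparison are all assumptions made for the example.

```python
def choose_cache_modes(index_size_bytes, memory_budget_bytes):
    """Pick a clustering mode for each of the three index caches.

    Hypothetical helper: the name, the role keys, and the size comparison
    are assumptions for illustration, not part of ModeShape or Infinispan.
    """
    # The lock and metadata caches are small, so they are always replicated.
    modes = {"lock": "replicated", "metadata": "replicated"}
    # The data cache holds the raw index data. Replicate it only if every
    # process can afford a full copy; otherwise distribute it.
    fits_in_memory = index_size_bytes <= memory_budget_bytes
    modes["data"] = "replicated" if fits_in_memory else "distributed"
    return modes
```

          The chosen modes would still have to be applied by hand in the Infinispan configuration file that defines the three caches.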

           

          Here's a sample embedded configuration that shows the basic usage:

           

          {

              "name" : "sample",

              "storage" : {

                  "cacheConfiguration" : "pathToInfinispanConfig",

                  "cacheName" : "content-storage",

                  "binaryStorage" : { ... }

              },

              "query" : {

                  "indexStorage" : {

                      "type" : "infinispan",

                      "cacheConfiguration" : "pathToInfinispanConfig",

                      "lockCacheName" : "sample-index-locks",

                      "dataCacheName" : "sample-index-data",

                      "metadataCacheName" : "sample-index-metadata",

                      "chunkSizeInBytes" : 1024

                  }

              }

          }

           

          The "type" field is required, but all the other fields under "indexStorage" have defaults (see our configuration schema for details, defaults, and descriptions of each field).

           

          With 3.0.0.Alpha4, however, I would recommend setting all of the fields in "indexStorage", including the cache configuration (you can point it to the same Infinispan configuration file that you're using for the content storage, or point to a different Infinispan configuration file). I'm not confident that the same Infinispan configuration would be used by default.
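
          If you want to sanity-check a configuration against these recommendations before starting the repository, something like the following might help. It is a rough sketch in Python, not part of ModeShape: the function check_index_storage and its messages are made up for illustration, and it only looks at the field names shown in the sample configuration.

```python
INDEX_CACHE_FIELDS = ("lockCacheName", "dataCacheName", "metadataCacheName")
RECOMMENDED_FIELDS = ("type", "cacheConfiguration", "chunkSizeInBytes") + INDEX_CACHE_FIELDS


def check_index_storage(config):
    """Return a list of problems found in a repository configuration dict.

    Hypothetical checker: it is not a ModeShape API, it only mirrors the
    advice in this thread.
    """
    problems = []
    index = config.get("query", {}).get("indexStorage", {})

    # With Alpha4, every field should be set explicitly rather than defaulted.
    for field in RECOMMENDED_FIELDS:
        if field not in index:
            problems.append("indexStorage." + field + " is not set explicitly")

    # The lock, data, and metadata caches must be three separate caches.
    names = [index[f] for f in INDEX_CACHE_FIELDS if f in index]
    if len(set(names)) != len(names):
        problems.append("the three index caches must have different names")

    # None of them may be the cache used for content storage.
    content_cache = config.get("storage", {}).get("cacheName")
    if content_cache is not None and content_cache in names:
        problems.append("index caches must not reuse the content cache " + repr(content_cache))

    return problems


# The sample configuration from this thread, without the elided binaryStorage.
sample = {
    "name": "sample",
    "storage": {
        "cacheConfiguration": "pathToInfinispanConfig",
        "cacheName": "content-storage",
    },
    "query": {
        "indexStorage": {
            "type": "infinispan",
            "cacheConfiguration": "pathToInfinispanConfig",
            "lockCacheName": "sample-index-locks",
            "dataCacheName": "sample-index-data",
            "metadataCacheName": "sample-index-metadata",
            "chunkSizeInBytes": 1024,
        }
    },
}

for problem in check_index_storage(sample):
    print(problem)
```

          The sample above sets every field and uses four distinct cache names, so the loop prints nothing.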

           

          If you're using ModeShape + AS7, then the configuration is similar but follows the XSD for the ModeShape subsystem in AS7. See this link for an example configuration file.

           

           

          jonathandfields wrote:

              I don't know much about this configuration myself (I have always just used on-disk), but I'm assuming it would perform better, plus it would consolidate storage into a single location instead of two.

              From an operational standpoint, it would make backup and recovery easier since, for example, that could all be handled using Berkeley DB.

              That makes it easier to sell "new" technologies like this to IT organizations that are used to the backup/recovery features of relational databases.

           

          It should indeed perform quite a bit better, since accessing in-memory data from machines within the same fast network (very likely in most installations) can often be much faster than even reading the same data from local disk.

           

          I also agree that operationally everything becomes much cleaner if ModeShape can store all content in the same system, so using Infinispan for content and index storage gets us most of the way there. (At the moment, binary storage only works with the file system option, though we plan to offer a cache-based storage option there, too; see MODE-1443.)