In short, there are several options for how Lucene indexes can be stored.
Using Lucene within a cluster has some challenges (e.g., only one writer can update the indexes at a time). So rather than using Lucene directly as we did in 2.x, in 3.x we're using Hibernate Search as a framework for updating and managing the Lucene indexes. (Important: we're only using the bottom half of Hibernate Search, which they call the "engine" and which does not depend on Hibernate ORM or JPA.) The Hibernate Search engine is essentially a clustering utility layer for Lucene, and it gives us the flexibility to configure different ways for a cluster to update the indexes efficiently.
Each ModeShape 3.0 JCR repository instance will use a Hibernate Search engine to update the indexes, even when running in a cluster. So there will be two primary decisions for how to manage the indexes:
- Where should the indexes be stored?
- If clustering, can each process in the cluster update the indexes directly?
The options for storing the Lucene indexes are:
- Filesystem - the indexes are stored on the file system
- Infinispan - the indexes are stored (and optionally distributed) in dedicated caches within the Infinispan grid (see here for more info)
- RAM - the indexes are stored in-memory (obviously of limited usefulness)
- Custom Lucene directory implementations
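To make the storage options a bit more concrete, here is a minimal sketch of how the first three might map onto Hibernate Search settings. The property keys and values ("hibernate.search.default.directory_provider", "hibernate.search.default.indexBase", and the "filesystem" / "infinispan" / "ram" shortcuts) are Hibernate Search's own as best I recall them; how ModeShape 3.x actually exposes them is not covered in this post, so treat that mapping as an assumption. The class, enum, and method names are hypothetical and exist only for illustration.

```java
import java.util.Properties;

// Hypothetical helper, not part of ModeShape or Hibernate Search.
final class IndexStorageSketch {

    // Mirrors the built-in storage options listed above (custom directories omitted).
    enum Storage { FILESYSTEM, INFINISPAN, RAM }

    // Builds Hibernate Search style properties for the chosen storage.
    // Assumption: ModeShape would pass equivalent settings to the engine.
    static Properties storageProperties(Storage storage, String indexBase) {
        Properties props = new Properties();
        String key = "hibernate.search.default.directory_provider";
        switch (storage) {
            case FILESYSTEM:
                props.setProperty(key, "filesystem");
                // Directory on disk under which the index files are kept.
                props.setProperty("hibernate.search.default.indexBase", indexBase);
                break;
            case INFINISPAN:
                // Indexes live in dedicated Infinispan caches; no index directory needed.
                props.setProperty(key, "infinispan");
                break;
            case RAM:
                // In-memory only; the indexes are lost when the process stops.
                props.setProperty(key, "ram");
                break;
        }
        return props;
    }
}
```

For example, storageProperties(Storage.FILESYSTEM, "/var/modeshape/indexes") would return two properties, while the Infinispan and RAM variants return only the directory provider setting. The path is just a placeholder.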
Non-clustered repositories are pretty easy: there's only one set of indexes, so the repository can directly update them. This may even work in some clustered situations: if storing in Infinispan, or if storing on the filesystem and each process in the cluster has efficient access to the file system. But in larger/more complicated cluster topologies, it may be more efficient (or even desirable) to have only one of the processes write to the indexes, and to have all other processes forward (through JMS or JGroups; see here for more details) their writes to the one master process. And when storing the indexes on the file system, each cluster process will likely want its own copy of the indexes for reading, so the Hibernate Search engine provides a variation of the filesystem storage option where there's a single master set of indexes stored on a filesystem and the other processes have read-only copies (updated in various ways).
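The second decision (can each process update the indexes directly?) can be sketched as a small piece of logic. This is only an illustration of the reasoning in the paragraph above, not how ModeShape or Hibernate Search decides anything: all names are hypothetical, and RAM storage is left out because the post doesn't discuss it for clusters.

```java
// Hypothetical model of the "who writes the indexes" decision; illustration only.
final class IndexUpdateStrategySketch {

    enum Storage { FILESYSTEM, INFINISPAN }

    enum UpdateMode {
        DIRECT,            // this process writes to the indexes itself
        FORWARD_TO_MASTER  // this process sends its writes (e.g. via JMS or JGroups) to one master
    }

    // efficientSharedFilesystem: every process in the cluster has efficient
    // access to the file system holding the indexes.
    static UpdateMode choose(boolean clustered, Storage storage, boolean efficientSharedFilesystem) {
        if (!clustered) {
            // Only one set of indexes and one process: update directly.
            return UpdateMode.DIRECT;
        }
        if (storage == Storage.INFINISPAN) {
            // Direct updates may work when the indexes live in Infinispan.
            return UpdateMode.DIRECT;
        }
        if (storage == Storage.FILESYSTEM && efficientSharedFilesystem) {
            // Direct updates may also work with efficient shared file access.
            return UpdateMode.DIRECT;
        }
        // Otherwise have a single master write and everyone else forward to it.
        return UpdateMode.FORWARD_TO_MASTER;
    }
}
```

Note that this treats "may work" as "use it", which is a simplification: in larger or more complicated topologies you might still prefer a single master even when direct updates are possible.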
In summary, there's a lot of flexibility here, but hopefully we can make it very easy to set up ModeShape for most non-clustered and clustered situations, while still letting the few cases that need it access and control that flexibility. We think that storing the indexes in Infinispan will be the easiest and best performing option for most clusters (and maybe even for non-clustered repositories, too).
All of this can be configured right now (even with Alpha2), but we don't yet have any documentation describing how to do it. If you're interested in trying this, let us know and we can help you with the configuration.
Thanks for that.
We will be prototyping our planned clustered setup over the next few weeks, I hope, so when we get to that I'll ask about the configuration.