We need a federating connector with a much more sophisticated mapping capability than the current one.
I have been making some brief notes about what we might need (included below) and would welcome any suggestions. As the notes show, a prime requirement is the ability to easily specify where a document is stored and to relocate it to another location later. As mentioned in earlier posts, my application needs to store large numbers of documents, with storage requirements in excess of 20 TB. In a production environment you need control over where the documents are physically stored, and you need to be able to change your mind later and move them around (in the long run, management tools will be required to view and manage this).
Over the last few weeks I have written a prototype read-only connector for ModeShape that connects to a legacy store using a proprietary database and filesystem layout. It was a bit of a steep learning curve, but I got there in the end. I would be interested in any comments on the best approach to implementing a new federating connector based on the ideas below. Should I start with the existing one or start from scratch?
My timeframe is fairly long, so I'd aim to do this for version 3.0.
The idea is to introduce a layer, like the current federating connector, that delegates the actual storage to underlying connectors of all types (filesystems, databases, other JCR implementations, NoSQL connectors, etc.) but has a very flexible mapping scheme, so that the unified tree it exposes can be split arbitrarily across multiple storage locations.
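To make the delegation idea concrete, here is a minimal Java sketch of a layer that exposes one tree and routes each path to the connector mounted at its deepest mapped ancestor, so mappings can nest at any level. `StorageConnector`, `InMemoryConnector` and `FederatingLayer` are hypothetical names invented for illustration; they are not ModeShape's actual connector SPI, and the in-memory backend only stands in for a filesystem, database or NoSQL store. This version maps by path; the notes below key the mapping on node metadata instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical minimal storage contract (NOT ModeShape's connector SPI),
// just enough surface to show delegation by subtree.
interface StorageConnector {
    void store(String path, String content);
    Optional<String> load(String path);
    void delete(String path);
}

// Toy backend standing in for a filesystem, database, NoSQL or JCR store.
class InMemoryConnector implements StorageConnector {
    final String name;
    final Map<String, String> data = new HashMap<>();

    InMemoryConnector(String name) { this.name = name; }

    public void store(String path, String content) { data.put(path, content); }
    public Optional<String> load(String path) { return Optional.ofNullable(data.get(path)); }
    public void delete(String path) { data.remove(path); }
}

// Presents one unified tree and delegates each path to the connector
// mounted at its deepest mapped ancestor (matching whole path segments).
class FederatingLayer {
    private final Map<String, StorageConnector> mounts = new HashMap<>();

    FederatingLayer(StorageConnector rootConnector) {
        // Anything not covered by a more specific mount falls back to "/".
        mounts.put("/", Objects.requireNonNull(rootConnector));
    }

    void mount(String subtreeRoot, StorageConnector connector) {
        mounts.put(subtreeRoot, Objects.requireNonNull(connector));
    }

    // Walk up one segment at a time until a mounted ancestor is found;
    // terminates because "/" is always mounted.
    StorageConnector resolve(String path) {
        String current = path;
        while (true) {
            StorageConnector connector = mounts.get(current);
            if (connector != null) return connector;
            int slash = current.lastIndexOf('/');
            current = slash <= 0 ? "/" : current.substring(0, slash);
        }
    }

    void put(String path, String content) { resolve(path).store(path, content); }
    Optional<String> get(String path) { return resolve(path).load(path); }
}
```

For example, mounting a database connector at `/archive` and a NoSQL connector at `/archive/scans` sends `/archive/2010/report` to the database and `/archive/scans/page1` to the NoSQL store, while anything unmapped stays on the root connector. Because matching is by whole path segments, `/archivexyz` does not accidentally fall under `/archive`.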
- Map any subtree within the document hierarchy to a particular connector, based on the metadata at the root node of the subtree.
- Allow this mapping at any level, so that a subtree lives on one connector while child subtrees of that subtree live on other connectors, and so on.
- Move any subtree to a different connector at runtime, transparently to the application above.
- Make mapping algorithms pluggable, so that new mapping schemes can be introduced (though not necessarily at runtime).
  - A mapping algorithm selects a node to be mapped and supplies a destination.
  - All child nodes of a mapped node are assumed to map to the same destination unless a different algorithm selects them.
- The connector could use an additional attribute on each node that determines the mapping.
  - If not specified, the value is inherited from the parent.
  - An operation that changes the value of this attribute results in a move of the node.
    - The move is transactional if the source and destination connectors are transactional.
    - This might make bulk moves expensive.
- If a mapping attribute is used, you could introduce an extra layer above this one that maps based on the document metadata by setting the extra attribute. This layer could be the pluggable mapping algorithm.
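The attribute-based variant in these notes can also be sketched in Java, under clearly stated assumptions: `MappedNode`, `MappingAlgorithm` and `AttributeMappedStore` are hypothetical names, the backends are plain in-memory maps keyed by destination name, and the move simply copies then deletes without any real transaction. It shows three pieces working together: an optional per-node `mapping` attribute inherited from the parent, a move triggered by changing that attribute, and a pluggable algorithm that sets the attribute from node metadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical node model; 'mapping' is the extra per-node attribute.
class MappedNode {
    final String path;
    final MappedNode parent;
    final Map<String, String> metadata = new HashMap<>();
    final List<MappedNode> children = new ArrayList<>();
    String mapping; // null means "inherit from the parent"

    MappedNode(String path, MappedNode parent) {
        this.path = path;
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    // Nearest explicitly set value, walking up towards the root.
    String effectiveMapping() {
        for (MappedNode n = this; n != null; n = n.parent) {
            if (n.mapping != null) return n.mapping;
        }
        throw new IllegalStateException("root node must carry a mapping");
    }
}

// Pluggable mapping algorithm: selects a node by returning a destination,
// or declines (empty) so the inherited destination applies.
interface MappingAlgorithm {
    Optional<String> select(MappedNode node);
}

class AttributeMappedStore {
    // destination name -> (path -> content); stands in for real connectors
    final Map<String, Map<String, String>> backends = new HashMap<>();

    void write(MappedNode node, String content) {
        backends.computeIfAbsent(node.effectiveMapping(), k -> new HashMap<>())
                .put(node.path, content);
    }

    // Changing the attribute is a move: each node in the subtree whose
    // effective destination changes is copied, then deleted from the old
    // backend. NOT transactional in this sketch.
    void setMapping(MappedNode node, String newMapping) {
        Map<MappedNode, String> before = new HashMap<>();
        collect(node, before);
        node.mapping = newMapping;
        for (Map.Entry<MappedNode, String> e : before.entrySet()) {
            MappedNode n = e.getKey();
            String from = e.getValue();
            String to = n.effectiveMapping();
            if (from.equals(to)) continue; // e.g. a child with its own mapping
            String content = backends.getOrDefault(from, Map.of()).get(n.path);
            if (content == null) continue; // nothing stored for this node
            backends.computeIfAbsent(to, k -> new HashMap<>()).put(n.path, content);
            backends.get(from).remove(n.path);
        }
    }

    // The extra layer on top: let an algorithm set the attribute from metadata.
    void apply(MappingAlgorithm algorithm, MappedNode node) {
        algorithm.select(node).ifPresent(dest -> setMapping(node, dest));
        for (MappedNode child : node.children) apply(algorithm, child);
    }

    // Record the current effective destination of every node in the subtree.
    private void collect(MappedNode n, Map<MappedNode, String> out) {
        out.put(n, n.effectiveMapping());
        for (MappedNode c : n.children) collect(c, out);
    }
}
```

`setMapping` visits every node in the affected subtree, which is where the cost of bulk moves comes from, and a child that carries its own `mapping` value stays where it is. Copying before deleting means a failure midway leaves a duplicate rather than a lost document; a production version would instead enlist both connectors in one transaction when they support it.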