4 Replies Latest reply on Jul 5, 2011 2:00 PM by joedel

Is dual persistence storage possible in ModeShape?

joedel Jun 29, 2011 11:47 AM

I'd like to have the metadata (stored as properties) of my content in a database and the actual content (files) in a file system, similar to what Jackrabbit and other content repositories (CRs) do.

I know that ModeShape lets you put the persistent storage of your content repository in a database, a file system, etc., but I'm not sure if you can have both in the same content repository. Is this possible?

Thanks.

1. Re: Is dual persistence storage possible in ModeShape?

rhauch Jun 30, 2011 2:25 PM (in response to joedel)

There are a couple of ways to use and configure ModeShape to do what you want. The best approach will depend on your needs (which I can't infer from your question).

ModeShape uses its connectors to access and store all of the content in a repository. The JPA connector stores all the content in an RDBMS and uses JPA/Hibernate to abstract the connector's entity classes from the actual database. The JPA connector normally stores nodes together with their properties, but all "large" property values are stored in a central area of the database, keyed by their SHA-1 hash (in much the same way that Jackrabbit does). The connector doesn't store file content in a separate area on the file system because storing everything in the database simplifies backup, transactions, isolation, locking, performance, etc.
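To illustrate the "large values keyed by SHA-1" idea, here is a minimal Java sketch of content-addressed storage. It is not the JPA connector's code: the class name `LargeValueStore`, the 1024-byte threshold, and the in-memory map standing in for the central database area are all assumptions made for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: values larger than a threshold are stored once in a
// central area keyed by their SHA-1 hash; the node record keeps only the hash.
class LargeValueStore {
    // Assumed threshold for illustration; not ModeShape's actual setting.
    static final int LARGE_VALUE_THRESHOLD = 1024;

    // Central area: SHA-1 hex key -> bytes (a table or directory in a real system).
    private final Map<String, byte[]> centralArea = new HashMap<>();

    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Returns what the node record holds: the value itself if small, otherwise its SHA-1 key. */
    Object storeValue(byte[] value) {
        if (value.length <= LARGE_VALUE_THRESHOLD) {
            return value; // small values stay inline with the node's other properties
        }
        String key = sha1Hex(value);
        centralArea.putIfAbsent(key, value.clone()); // identical content is stored only once
        return key;
    }

    byte[] load(String key) {
        return centralArea.get(key);
    }

    int centralCount() {
        return centralArea.size();
    }
}
```

Because the key is derived from the content itself, storing the same large value twice (say, the same file attached to two nodes) keeps only one copy in the central area.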

If you're just storing files and directories with some additional metadata (e.g., "extra" properties), then the ModeShape file system connector may better suit your needs. Basically, it allows ModeShape to access and store 'nt:file' and 'nt:folder' nodes as files and folders on the file system. Properties that aren't normally included in 'nt:file' and 'nt:folder' nodes (see "Custom Properties on nt:file and nt:folder nodes") are stored in files that the connector hides from JCR, but those files are still readable by other applications if need be. So you definitely can store extra metadata.
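A rough sketch of the "hidden file next to the real file" idea in plain Java (java.nio.file and java.util.Properties). The class name and the `.<name>.properties` naming convention are hypothetical; this is not the file system connector's actual on-disk layout, only an illustration of keeping the file content readable by other applications while storing extra properties alongside it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch: the file is stored natively so any application can read it,
// while "extra" properties go into a hidden sidecar file in the same directory.
// The ".<name>.properties" naming is an assumption, not ModeShape's real layout.
class SidecarFileStore {
    private final Path root;

    SidecarFileStore(Path root) {
        this.root = root;
    }

    private Path sidecarFor(String fileName) {
        return root.resolve("." + fileName + ".properties");
    }

    void write(String fileName, byte[] content, Properties extra) throws IOException {
        Files.write(root.resolve(fileName), content); // the file itself, as a plain file
        try (Writer w = Files.newBufferedWriter(sidecarFor(fileName), StandardCharsets.UTF_8)) {
            extra.store(w, "extra properties for " + fileName);
        }
    }

    byte[] readContent(String fileName) throws IOException {
        return Files.readAllBytes(root.resolve(fileName));
    }

    Properties readExtra(String fileName) throws IOException {
        Properties p = new Properties();
        Path sidecar = sidecarFor(fileName);
        if (Files.exists(sidecar)) { // a file without extra properties has no sidecar
            try (Reader r = Files.newBufferedReader(sidecar, StandardCharsets.UTF_8)) {
                p.load(r);
            }
        }
        return p;
    }
}
```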

The 'master' branch of ModeShape also has a new disk-based connector that (like the JPA connector) accesses and stores all content, but (like the file system connector) persists it all on disk. Unlike the file system connector, it stores nodes and property values in binary files that are not accessible by other applications. But like the JPA connector, it keys "large" property values by SHA-1 and stores them in a central area on disk. In other words, the binary content of 'nt:file' nodes is stored in its own files on the file system.

However, if you want a mixture of all of these features, there's yet another option: the federation connector. It does require using different areas of the repository for different content (e.g., files in one area, metadata in another), but if you're just storing files that's probably not a big deal anyway. Basically, a JCR repository is configured to use a single federation connector, and the federation connector is in turn configured to use two or more other connectors and to specify how the content in those sources is projected into a single logical repository.
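To make "projection" a little more concrete, here is a hypothetical Java sketch of path-prefix routing: each source is mapped to a prefix in one logical tree, and a path is resolved to the source with the longest matching prefix. The class, method, and source names are invented for illustration; this is not ModeShape's API, and the real federation connector's projection rules may work differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a federation "projection": each source is mapped to a
// path prefix in one logical tree, and a path is routed to the source whose
// prefix matches it. Names and rules are illustrative only.
class FederatedRouter {
    // Logical prefix -> source name (prefixes like "/files"; the root "/" is not handled here).
    private final Map<String, String> projections = new LinkedHashMap<>();

    void project(String logicalPrefix, String sourceName) {
        projections.put(logicalPrefix, sourceName);
    }

    /** Returns {sourceName, pathWithinSource}, or null if no projection covers the path. */
    String[] resolve(String logicalPath) {
        String bestPrefix = null;
        for (String prefix : projections.keySet()) {
            // Match whole path segments only, so "/files" does not match "/filesystem".
            boolean matches = logicalPath.equals(prefix) || logicalPath.startsWith(prefix + "/");
            if (matches && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix; // longest matching prefix wins
            }
        }
        if (bestPrefix == null) {
            return null;
        }
        String rest = logicalPath.substring(bestPrefix.length());
        return new String[] { projections.get(bestPrefix), rest.isEmpty() ? "/" : rest };
    }
}
```

With the longest-prefix rule, a more specific projection (e.g., "/files/archive") can override a broader one ("/files") without any special casing.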

I hope this helps. If not, just follow up with additional questions and we can hopefully help point you in the right direction for your needs.
2. Re: Is dual persistence storage possible in ModeShape?

joedel Jul 1, 2011 5:41 PM (in response to rhauch)

Yes, this answers my question. Basically, what I wanted was one JCR connector that would let me store a node's properties in a database and the associated files in a file system.
From your answer I see that a connector is meant to handle only one kind of storage system, so it's not possible to combine both features using only one connector.

There is some controversy over which is better for managing files, a DB or an FS, and I've seen some white papers and reports arguing that, especially for large files, the file system is superior to a database. That's why I like the idea of having the best of both worlds in the CR: the node properties as a set of key/value pairs in a DB, and the files associated with a node in the FS.

The connector system's ability to federate two or more connectors into a higher-level connector might get me something close to what I'm after.

Thanks
3. Re: Is dual persistence storage possible in ModeShape?

rhauch Jul 2, 2011 1:56 PM (in response to joedel)

I would definitely recommend taking a look at our new Disk connector, which I described above. It stores files (including large files) natively on the file system, whereas your metadata (e.g., other nodes and properties) would be stored on the file system but in a serialized format. Performance is actually very good, especially when using the connector's cache.

Or, take a look at the Infinispan connector, which uses an Infinispan data grid for storage. Plus, there's always the option of writing your own connector or specializing one of ours.

> There is some controversy over which is better for managing files, a DB or an FS, and I've seen some white papers and reports arguing that, especially for large files, the file system is superior to a database.

I agree that for large files, an FS is certainly better storage. That's what file systems do very well, plus the files can be read by other processes and applications.

> That's why I like the idea of having the best of both worlds in the CR: the node properties as a set of key/value pairs in a DB, and the files associated with a node in the FS.
I'm curious why you think the best of both worlds for node properties (other than the files' content) means storing them in a relational database. Neither Jackrabbit nor ModeShape stores properties using a traditional relational approach: both are really using the RDBMS for transactional storage of blobs (serialized chunks of multiple properties). Sure, transactional semantics are great, but if you're combining that DB with other non-transactional systems, you don't get true ACID semantics.
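To illustrate what "serialized chunks of multiple properties" means, here is a minimal Java sketch that packs a node's string properties into one byte array and reads them back. The format (a count followed by UTF-encoded key/value pairs) is an assumption for illustration, not the actual format Jackrabbit or ModeShape uses. From the database's point of view such a blob is one opaque value, which is quite different from storing each property in its own relational column.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: all of a node's properties packed into a single binary blob.
// Format (assumed): int count, then writeUTF(key), writeUTF(value) per property.
// Note: writeUTF limits each string to 65,535 encoded bytes.
class PropertyBlob {
    static byte[] serialize(Map<String, String> properties) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(properties.size());
            for (Map.Entry<String, String> e : properties.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
    }

    static Map<String, String> deserialize(byte[] blob) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            int count = in.readInt();
            Map<String, String> properties = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                properties.put(key, value);
            }
            return properties;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```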
4. Re: Is dual persistence storage possible in ModeShape?

joedel Jul 5, 2011 2:00 PM (in response to rhauch)

I don't know how Jackrabbit or ModeShape use the database, but I would think that storing data as key/value pairs is a good fit for a DB and provides better performance for searching and querying. That's not to say there can't be file-system-based implementations (like the serialized format in your Disk connector) that perform as well as the DB approach.

I'll take a look at the Disk connector.
