First of all, welcome. Any questions are fine.
I have a Java EE project with many entities that are managed and persisted using EJB. All the data is stored in a MySQL database. Now a "library" should be added that enables users to browse and manage all the entities themselves. The entities will be arranged in a hierarchy, metadata will be added, and searching and versioning will be useful features. From what I've read about JCR so far, this sounds to me like a good case for using it.
You are absolutely correct -- this does indeed sound like a really good use case for using the JCR API and using ModeShape's connectors to access that existing database.
So, my question is: is it possible to use JCR/ModeShape on top of an existing database, together with EJB? As far as I can tell so far, the repository builds its own data structure and cannot use an existing database? And must the persisting be done either by EJB or by JCR?
Yes, it is possible to use JCR/ModeShape on top of an existing database (even one that uses EJB). Because of our connector framework, the ModeShape engine doesn't really know or care how the information is stored or structured; the connectors are completely responsible for projecting the existing content into the desired structure. ModeShape then takes care of the entire JCR implementation.
We do have an existing JPA connector, but it creates its own data structure and therefore can't be used to access a database with an existing schema. It is intended to be used by ModeShape to persist, in a DBMS, content that is not stored elsewhere.
But you can write a connector that talks to your database (via JPA or EJB or even JDBC; your choice) and project the persisted entities into a node structure of your choosing. Unfortunately, we have no such examples or documentation. However, I'd be happy to help guide you through the process.
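To make the idea of "projecting" entities into a node structure a bit more concrete, here is a minimal sketch in plain Java. It uses no ModeShape classes at all: Document, ProjectedNode, project(), and the /library/{category}/{id} layout are all hypothetical and chosen only for illustration. A real connector would read the rows via JPA, EJB, or JDBC and hand the nodes to ModeShape's connector framework rather than returning a map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT ModeShape API: shows a connector-style
// projection of flat entity rows into a hierarchical node structure.
class EntityProjection {

    // Hypothetical stand-in for an existing EJB/JPA entity.
    record Document(long id, String category, String title) {}

    // Hypothetical node representation: a path plus its properties.
    record ProjectedNode(String path, Map<String, String> properties) {}

    // Projects each entity under /library/{category}/{id}; the category
    // becomes an intermediate "folder" node grouping related entities.
    static Map<String, ProjectedNode> project(List<Document> docs) {
        Map<String, ProjectedNode> nodes = new LinkedHashMap<>();
        nodes.put("/library", new ProjectedNode("/library", Map.of()));
        for (Document d : docs) {
            String folder = "/library/" + d.category();
            nodes.putIfAbsent(folder, new ProjectedNode(folder, Map.of()));
            String path = folder + "/" + d.id();
            nodes.put(path, new ProjectedNode(path,
                    Map.of("title", d.title(), "entityId", Long.toString(d.id()))));
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<Document> rows = List.of(
                new Document(1, "contracts", "Lease"),
                new Document(2, "invoices", "March"),
                new Document(3, "contracts", "Service"));
        // Prints each projected node path in insertion order.
        project(rows).keySet().forEach(System.out::println);
    }
}
```

The key design decision is the path layout: it is entirely up to the connector, so it can follow whatever hierarchy makes sense for the library users, independent of how the tables are laid out.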
The first step is to figure out what kind of node structure you'd like to project, and it sounds like you might have done this at least a bit. If your node structure will use custom node types, you'll need to create those node type definitions (probably in a CND file, though you could register them programmatically).
The second step is to create a connector that talks to your database. If you're going to use EJB, then your connector will be pretty much all new. We do provide some base classes in the org.modeshape.graph.connector.base package that make writing a connector much easier, but again we don't really have a good example for you to follow.
If your connector would use JPA (and Hibernate), you could actually create a new connector that reuses some of our existing JPA connector (basically, all the parts that define the connection and the Hibernate configuration). You'd create a new subclass of JpaSource and override the method that creates a new instance of a custom Model subclass. One of the most important things this Model subclass will do is return a new RepositoryConnection given the JpaSource instance. (You can get a configured EntityManager instance via the getEntityManagers() method on JpaSource.) Your RepositoryConnection implementation can be written from scratch, but I'd suggest using some of the base classes in the org.modeshape.graph.connector.base package. If your database has the notion of UUIDs (probably not), you could use the Map-oriented base classes; if your database has the notion of paths, you can use the path-oriented base classes; otherwise, you can always subclass the abstract classes and create your own custom Transaction implementation (or subclass BaseTransaction).
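The choice between the Map-oriented and path-oriented base classes comes down to how your database can locate a record. The following plain-Java sketch contrasts the two lookup styles; it deliberately does not use the org.modeshape.graph.connector.base classes, and Node, findByUuid(), and findByPath() are hypothetical names used only to illustrate the difference.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch, NOT the org.modeshape.graph.connector.base API.
// Contrasts two ways a connector might locate a node.
class NodeLookupSketch {

    record Node(UUID uuid, String name, Map<String, Node> children) {
        // Convenience constructor: random UUID, empty mutable child map.
        Node(String name) { this(UUID.randomUUID(), name, new HashMap<>()); }
    }

    // Map-oriented: the store can look a node up directly by its UUID.
    static Optional<Node> findByUuid(Map<UUID, Node> index, UUID id) {
        return Optional.ofNullable(index.get(id));
    }

    // Path-oriented: walk down from the root one path segment at a time.
    static Optional<Node> findByPath(Node root, String path) {
        Node current = root;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) continue; // skip the leading "/"
            current = current.children().get(segment);
            if (current == null) return Optional.empty();
        }
        return Optional.of(current);
    }

    public static void main(String[] args) {
        Node root = new Node("");
        Node contracts = new Node("contracts");
        Node lease = new Node("lease");
        root.children().put(contracts.name(), contracts);
        contracts.children().put(lease.name(), lease);

        Map<UUID, Node> index = new HashMap<>();
        for (Node n : new Node[] {root, contracts, lease}) index.put(n.uuid(), n);

        System.out.println(findByPath(root, "/contracts/lease").map(Node::name).orElse("missing"));
        System.out.println(findByUuid(index, lease.uuid()).isPresent());
    }
}
```

If your tables have neither a UUID column nor a stored path, neither lookup maps directly onto your schema, which is when a custom Transaction implementation that translates between paths and your own keys becomes the better fit.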
I hope this makes sense. I wish I could say "Just follow this example." But in lieu of that, I'd be happy to walk you through this process and help as much as I can. For example, if you're willing to describe your schemata and hierarchical structure, I can help sketch out the classes and node types. I can even do more if you're willing to share the code. (I'd love to have example code for exactly this use case!)
Thank you very much for this comprehensive reply!
Well, it sounds like a lot of work to get this running, but it's good to know that it is possible. I'm not sure it's worth the effort compared to an implementation that just relies on EJB and the existing infrastructure. I'm also concerned about performance if the repository cannot use its own data structure. What do you think? (Performance is an important aspect of this project, since there really is a lot of data.)
We'll have a project meeting tomorrow where we'll discuss whether to use JCR or not. After that I can provide you with more information.
Again a big thank you for your reply and the offered help!
Well, it sounds like a lot of work to get this running, but it's good to know that it is possible. I'm not sure it's worth the effort compared to an implementation that just relies on EJB and the existing infrastructure.
I wish that weren't the case, but I'd probably agree. Working with EJBs is fairly straightforward. It'd be different, I think, if we already had a connector for EJBs.
I'm also concerned about performance if the repository cannot use its own data structure. What do you think? (Performance is an important aspect of this project, since there really is a lot of data.)
First of all, ModeShape's JCR implementation is capable of using the data structure provided by any properly implemented connector. For example, the JDBC metadata connector projects the metadata from a JDBC database into a JCR graph structure, and this works perfectly well.
Second, performance can be really good, but it depends quite a bit on the structure that the connector projects. ModeShape (like other JCR implementations) doesn't perform well with shallow, flat hierarchies (where nodes contain many, many children). This is because the JCR API was designed to expose the node structure hierarchically. In fact, the developers of most JCR implementations recommend designing the node structure more like you would a file system (which doesn't cope well with millions of files in a single folder), and leveraging the data's natural hierarchy: dates can be broken down by year, month, and day; addresses by geographic region, province, city, etc.; products by category, manufacturer, style, etc. Even when data has no natural hierarchical breakdown (e.g., identifiers), a more artificial structure can often be used (e.g., UUIDs can be broken down into various multi-character segments).
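Both kinds of breakdown (a natural one for dates and an artificial one for UUIDs) can be expressed as small path-building helpers. These are hypothetical helpers, not part of ModeShape or the JCR API; the year/month/day layout and the split of a UUID's first hex characters into two-character segments are assumptions chosen for illustration.

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.UUID;

// Hypothetical helpers: derive a hierarchical path so that no single
// parent node ends up with a huge number of children.
class HierarchicalPaths {

    // Natural hierarchy: /yyyy/MM/dd/<name>, so each month node has at
    // most 31 day children and each year node at most 12 month children.
    static String datePath(LocalDate date, String name) {
        return String.format(Locale.ROOT, "/%04d/%02d/%02d/%s",
                date.getYear(), date.getMonthValue(), date.getDayOfMonth(), name);
    }

    // Artificial hierarchy: use the first 'levels' pairs of hex digits of
    // the UUID as intermediate nodes (at most 256 children per level).
    static String uuidPath(UUID id, int levels) {
        if (levels < 0 || levels > 16) {
            throw new IllegalArgumentException("levels must be between 0 and 16");
        }
        String hex = id.toString().replace("-", "");
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            path.append('/').append(hex, 2 * i, 2 * i + 2);
        }
        return path.append('/').append(id).toString();
    }

    public static void main(String[] args) {
        System.out.println(datePath(LocalDate.of(2011, 3, 15), "invoice-42"));
        UUID id = UUID.fromString("3f2a9c10-1b2c-4d5e-8f90-a1b2c3d4e5f6");
        System.out.println(uuidPath(id, 2));
    }
}
```

With two levels of two hex characters each, randomly generated UUIDs spread roughly evenly over 65,536 leaf folders, so even millions of entities end up with only a modest number of siblings per node.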