Multiple JPA source instances in separate processes pointing to the same DB should not cause a problem, certainly in terms of keeping the persisted data consistent. The database transactions will ensure the proper isolation, plus the connector adheres to JCR 'invalid state' semantics. For example, right now when a session is saved or refreshed, all content is flushed, ensuring that the transient state is created from the persisted data. The only problem at the moment is related to clustering: the processes will not receive notifications of changes made in the other process, and this could leave the search indexes out of date (requiring more frequent re-indexing).
Using multiple JPA source instances within the same process pointing to the same DB should not be necessary and is discouraged. Instead, it's possible for multiple repositories to simply reference/use the same source, as long as each of those repositories has node types for all of the content it stores. And because the multiple JCR repositories are within the same process, any changes made by sessions for one of these repositories will be received by the other repositories.
Ok, I think the latter part explains what I'm trying to do. Didn't think multiple repos could point to the same source. Does it make sense then to even use different repos? Will the repo keys be stored w/ the content or something?
There is certainly something to be said for the ease and conceptual simplicity of having each repository source exposed through a single repository. Even if you are federating, it may make sense to only expose each source through a single repository. (And yes, this may be true even if you have multiple processes for failover/load/tolerance/etc.)
However, there certainly are use cases where you might have a particular source (let's say a database or data-grid) that you want to be able to access from multiple repositories. For example, there may be multiple applications and you may want each application to have its own repository, yet there is some content that is needed by multiple applications. However, most of the scenarios of this sort that I can envision would likely benefit from each repository federating multiple sources. After all, if two applications need to access exactly the same set of content, why not just have them use the same repository?
Will the repo keys be stored w/ the content or something?
I'm not sure what you mean. Can you elaborate?
What I'm trying to figure out is if having multiple repos pointing to different sources, but those sources point to the same database schema is not a good idea, but multiple repositories pointing to the same source works, how would I differentiate content belonging to repository1 vs. repository2?
What I'm trying to figure out is if having multiple repos pointing to different sources, but those sources point to the same database schema is not a good idea,
This is notionally identical to having a repository/source pair in multiple processes that are not clustered. Yes, they should work well with the database (relying upon the database for transaction isolation), but one repository will not receive events of changes made via the other repository and thus the search indexes will become inconsistent (pending a re-index).
Plus, you're treating them as if they were logically different repositories (with distinct names and source names), but they actually share the same content. Isn't it easier to just have one repository/source pair?
but multiple repositories pointing to the same source works, how would I differentiate content belonging to repository1 vs. repository2?
If you have multiple repositories pointing to the same source (and by definition the repositories are in the same engine and same process), then each repository will receive events for changes made via another repository, and this will keep the search indexes in sync. The behavior of this configuration is analogous to two repository instances in a cluster sharing the same (underlying) source, where the cluster ensures that each repository instance sees all changes made by the other instances. It's just that if the repository instances are in the same engine, there's no need for cross-engine or cross-process event channels. Once we plug in JGroups, the repository instances can be separated into different engines and/or processes.
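To make the difference between the two setups concrete, here is a toy model in plain Java. It is only a sketch and does not use the real connector or engine API: ToyDatabase, ToySource, ToyRepository and SharedSourceDemo are all made-up names, and the "search index" is just a set of paths. The assumption it illustrates is that a source only notifies the repositories registered with it in the same process.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical stand-in for the database schema behind a source. */
class ToyDatabase {
    final Map<String, String> rows = new HashMap<>();
}

/** Hypothetical source: writes to the database and notifies only its own listeners. */
class ToySource {
    private final ToyDatabase db;
    private final List<Consumer<String>> listeners = new ArrayList<>();

    ToySource(ToyDatabase db) {
        this.db = db;
    }

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    void write(String path, String value) {
        db.rows.put(path, value);
        // Events reach only the repositories registered with this source instance.
        for (Consumer<String> listener : listeners) {
            listener.accept(path);
        }
    }

    String read(String path) {
        return db.rows.get(path);
    }
}

/** Hypothetical repository with a private "search index" fed by source events. */
class ToyRepository {
    private final ToySource source;
    private final Set<String> indexedPaths = new HashSet<>();

    ToyRepository(ToySource source) {
        this.source = source;
        source.addListener(indexedPaths::add);
    }

    void save(String path, String value) {
        source.write(path, value);
    }

    String read(String path) {
        return source.read(path);
    }

    boolean isIndexed(String path) {
        return indexedPaths.contains(path);
    }
}

class SharedSourceDemo {
    public static void main(String[] args) {
        ToyDatabase db = new ToyDatabase();

        // One source shared by two repositories: both indexes see the change.
        ToySource shared = new ToySource(db);
        ToyRepository repo1 = new ToyRepository(shared);
        ToyRepository repo2 = new ToyRepository(shared);
        repo1.save("/a", "hello");
        System.out.println(repo2.isIndexed("/a")); // true

        // Two sources over the same database: the data is visible, the index is stale.
        ToyRepository repo3 = new ToyRepository(new ToySource(db));
        ToyRepository repo4 = new ToyRepository(new ToySource(db));
        repo3.save("/b", "world");
        System.out.println(repo4.read("/b"));      // world
        System.out.println(repo4.isIndexed("/b")); // false
    }
}
```

In the second half, repo4 can read what repo3 wrote because the database is shared, but its index never hears about the change. That is the same situation as two unclustered processes.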
I think the latter approach is all around much better than the first approach. But is there a reason why you don't just use a single repository/source? Perhaps there is something that distinguishes each repository, or perhaps you're using federation and each repository has a different combination of sources? Can you elaborate on why you're using multiple repositories on top of a single source?
Ideally, I wanted to treat it like a partition. It's kind of related to the search.
So in a given example, I have the "staged" content and the "live" content. The staged content contains everything - content that's been added, in progress, approved, not approved, no longer active, etc. The "live" content is what we want to search. At any given time, the live content would be analogous to a realization of the staged data - only the latest approved version of any content exists here, as long as it hasn't expired. It's much more streamlined - only contains nodes/child nodes that are ready to be seen by the world.
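Roughly, the "live" set could be derived from the staged set like this. This is just a sketch in plain Java with no JCR calls; StagedVersion, LiveView and the field names are made up for illustration. I'm also assuming that if the latest approved version has expired, the content drops out of "live" entirely rather than falling back to an older version.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical staged item: one version of a piece of content. */
class StagedVersion {
    final String contentId;
    final int version;
    final boolean approved;
    final Instant expires; // null means "never expires"

    StagedVersion(String contentId, int version, boolean approved, Instant expires) {
        this.contentId = contentId;
        this.version = version;
        this.approved = approved;
        this.expires = expires;
    }
}

class LiveView {
    /** Keeps, per content id, only the latest approved version, then drops expired ones. */
    static Map<String, StagedVersion> project(List<StagedVersion> staged, Instant now) {
        Map<String, StagedVersion> live = new HashMap<>();
        for (StagedVersion candidate : staged) {
            if (!candidate.approved) {
                continue; // in progress or not approved: stays in "staged" only
            }
            StagedVersion current = live.get(candidate.contentId);
            if (current == null || candidate.version > current.version) {
                live.put(candidate.contentId, candidate);
            }
        }
        // Assumption: an expired latest-approved version removes the content from "live".
        live.values().removeIf(v -> v.expires != null && !v.expires.isAfter(now));
        return live;
    }
}
```

The result is small enough that it could be rebuilt or updated whenever content is approved or expires, which is part of why a separate store for it is appealing.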
To be honest, thinking about it, I think it makes sense to keep the "live" data in an in-memory or Infinispan repository. In this case, I probably don't need to worry about using multiple JPA sources.