2 Replies Latest reply on Sep 5, 2013 2:38 AM by m.jawwad

    Query regarding Multiple Repositories

    m.jawwad

      Hi,

       

           I am working with the latest ModeShape 3.4 release for testing purposes. I have some questions about how ModeShape handles repositories.


      Scenario:

       

          Our aim is to create and manage multiple repositories at different locations (by "multiple" I mean no fixed number in a production environment) through a single web interface. There will also be bulk uploads across these repositories (possibly at the same time), with separate sessions for each repository, but all of them will be maintained under one web application. I plan to use the ModeShape engine to handle the multiple repositories.

       

           In short: multiple repositories, multiple sessions, and multiple (bulk) file uploads, all within our main web application.

       

      Questions:

           

      So my questions, in light of the above scenario, are:


      1. Is this a right and feasible approach?

      2. How does the ModeShape engine handle multiple repositories in memory?

      3. When the ModeShape engine deploys/loads a repository, how does it move any part of it into main memory?

       



      Regards

        • 1. Re: Query regarding Multiple Repositories
          rhauch

          1. Is this a right and feasible approach?

          ModeShape was designed to support running multiple independent repositories within the same process, where each repository has its own configuration that is independent of the other repositories. Using the ModeShapeEngine, you can indeed dynamically deploy and undeploy these individual repositories as needed, and even update the configuration of a repository while it is running.

           

          It certainly is feasible for a single web application to access multiple repositories. In fact, this is how our RESTful service, WebDAV service, and CMIS service all work.
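To make the "one web application, many repositories" idea concrete, here is a minimal sketch of a name-keyed registry with deploy, undeploy, and lookup operations. It is only an illustration of the pattern: `RepositoryRegistry` and its methods are hypothetical names, not ModeShape API, and in a real application the ModeShapeEngine mentioned above would play this role.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in, NOT ModeShape API: a thread-safe registry that maps
// repository names to whatever repository handle type R the application uses.
final class RepositoryRegistry<R> {
    private final Map<String, R> deployed = new ConcurrentHashMap<>();

    // Register a repository under a unique name; duplicates are rejected.
    void deploy(String name, R repository) {
        if (deployed.putIfAbsent(name, repository) != null) {
            throw new IllegalStateException("Repository already deployed: " + name);
        }
    }

    // Remove a repository and return it so the caller can shut it down.
    R undeploy(String name) {
        R removed = deployed.remove(name);
        if (removed == null) {
            throw new IllegalArgumentException("No such repository: " + name);
        }
        return removed;
    }

    // Find the repository that should serve a request.
    R lookup(String name) {
        R found = deployed.get(name);
        if (found == null) {
            throw new IllegalArgumentException("No such repository: " + name);
        }
        return found;
    }

    // Snapshot of the currently deployed names, sorted for stable output.
    Set<String> names() {
        return new TreeSet<>(deployed.keySet());
    }
}
```

A web request would then resolve its target with something like `registry.lookup(repositoryName)` before opening a session on the returned repository.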

           

          So theoretically, this certainly sounds like ModeShape can do what you are looking for. However, there are some things to be aware of:

          • Managing repositories is not a lightweight activity. Each repository is backed by an Infinispan cache, which also has to be configured and managed. ModeShape accesses the Infinispan Cache instance via an abstraction called the Environment. By default, each RepositoryConfiguration will use a LocalEnvironment instance, though you can easily make the RepositoryConfiguration use a different instance. (The LocalEnvironment has methods for registering Infinispan CacheManagers, which you can programmatically configure to contain the expected Cache instances. If needed, you could extend LocalEnvironment or implement your own Environment for total control.)
          • Every repository has some overhead, and this (and your JVM environment) will dictate how many repositories you can practically manage in an engine. You didn't give any rough numbers for how many repositories you'll have deployed at any one time, so it's hard to say whether the numbers you're talking about are feasible.
          • Each repository contains multiple workspaces, and there is some relationship between these workspaces (e.g., see JCR's concept of corresponding nodes). These workspaces all share the same namespaces, node types, version history, binary storage, node storage, etc. Be sure you understand what a workspace is before you conclude that you need even more separation than what workspaces give you.
          • Each workspace within a repository uses a separate in-memory Infinispan cache of (internal) node representations. Unlike the primary cache, this is indeed a limited cache of most recently used node representations. By default, ModeShape will instantiate a cache for each workspace, configuring them to keep a maximum number of node representations in-memory and to keep them in-memory for a maximum period of time (see here for background). If you don't like the defaults (and you probably won't if you're going to run quite a few repositories), you should configure and manage these Infinispan cache instances, too. (See this area of the RepositoryConfiguration's JSON schema.)
          • If you are installing ModeShape into EAP, then there will already be a managed ModeShapeEngine that you can dynamically configure using the CLI or other EAP administrative tools. The Infinispan caches and cache managers are also managed in a similar way.
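The workspace caches described in the list above are bounded caches of most recently used node representations. As a rough illustration of that eviction behavior only, the sketch below caps a map at a maximum number of entries and evicts the least recently used one. `BoundedNodeCache` is a hypothetical name; this is plain JDK code, not ModeShape or Infinispan, and it leaves out the time-based expiry that the real caches also apply.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a workspace cache; NOT ModeShape or Infinispan code.
// Keeps at most maxEntries entries and evicts the least recently used one.
class BoundedNodeCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedNodeCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion: drop the eldest entry once the cap is exceeded
        return size() > maxEntries;
    }
}
```

With a cap of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `a` was used more recently. The smaller the cap per workspace, the less memory each additional repository costs, at the price of more reads going back to the underlying store.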

           

          2. How does the ModeShape engine handle multiple repositories in memory?

           

          As I mentioned above, each repository is completely independent, and each will use a separate Infinispan cache that actually persists the information (assuming it is configured with a cache store, though that is not necessary for purely in-memory operations). There is some memory overhead, though most of this can be configured (e.g., the workspace caches) or is a function of how the repository is used. For example, a session that creates (or imports) hundreds of thousands of nodes will have more memory requirements than sessions that just make smaller changes. On the other hand, many sessions that are actively reading the same set of nodes will share the cached node representations, and thus each session will consume relatively small amounts of memory.

           

          The memory overhead per repository is too difficult to quantify, considering the sheer number of configuration variations that ModeShape supports. My recommendation is to build a very quick and dirty proof of concept that does some of the things you want to do, and see whether ModeShape satisfies your requirements.

           

          3. When the ModeShape engine deploys/loads a repository, how does it move any part of it into main memory?

           

          I tried to address this above. The biggest factors here are:

          1. the repository's workspace caches, which are essentially an in-memory cache of most-recently-used nodes;
          2. how much transient (unsaved) state each session has, since that is kept in memory;
          3. the length of your sessions: it's better to create a session, use it for a short period of time (e.g., a web request), and then close it;
          4. whether your application holds onto Node and Property objects returned by a Session, preventing them from being garbage collected (the JCR specification requires that each Node and Property know about its Session, and that the same Node/Property object is returned for repeated calls to the Session).
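The advice on short sessions can be sketched as a helper that opens a session for one unit of work (for example, one web request), saves, and always logs out. This is a minimal sketch under stated assumptions: `SketchRepository` and `SketchSession` are hypothetical stand-ins that only mirror the shape of `javax.jcr.Repository.login(String)`, `Session.save()`, and `Session.logout()` so the example compiles without the JCR library, and `RequestScopedSessions.withSession` is an illustrative name.

```java
import java.util.function.Function;

// Hypothetical stand-ins mirroring the shape of javax.jcr.Session and
// javax.jcr.Repository; a real application would use the JCR interfaces.
interface SketchSession {
    void save();
    void logout();
}

interface SketchRepository {
    SketchSession login(String workspaceName);
}

final class RequestScopedSessions {
    private RequestScopedSessions() {}

    // Opens a session, runs the work, saves on success, and always logs out,
    // so no session (or its transient state) outlives the request.
    static <T> T withSession(SketchRepository repository,
                             String workspaceName,
                             Function<SketchSession, T> work) {
        SketchSession session = repository.login(workspaceName);
        try {
            T result = work.apply(session);
            session.save();   // persist changes only if the work succeeded
            return result;
        } finally {
            session.logout(); // release the session even if the work threw
        }
    }
}
```

The work function should return plain values (strings, identifiers, copies of data) rather than node or property objects, so that nothing tied to the closed session stays reachable afterwards.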
          • 2. Re: Query regarding Multiple Repositories
            m.jawwad

            Thanks a lot, Randall. That's very helpful.

             

            Regards,