3 Replies Latest reply on Oct 23, 2014 6:02 AM by hchiorean

    web app with multiple repo

    agonist

      Hi, I've started using ModeShape for a web project and I have some questions.

      We will probably have a huge amount of data per user, which is why we are thinking of creating one repository for each user (we plan to have 1,000 users in the next two years). Each regular user will be able to store between 1 and 20 GB of data.

      So that's 1,000 × 20 GB (roughly 20 TB) of potential data for just one repository; I guess that's too much...

      So when a user makes a request, the web app would use the correct repository for that user.
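A minimal sketch of the per-user routing described above, assuming a hypothetical `RepositoryRouter` that lazily creates and caches one repository handle per user ID. The repository type is left generic (in a real application it would be something like a `javax.jcr.Repository` obtained from ModeShape) so the sketch runs on the JDK alone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical router: one repository per user, created lazily on first request.
// R stands in for the real repository handle (e.g. a javax.jcr.Repository).
final class RepositoryRouter<R> {
    private final Map<String, R> repositoriesByUser = new ConcurrentHashMap<>();
    private final Function<String, R> repositoryFactory;

    RepositoryRouter(Function<String, R> repositoryFactory) {
        this.repositoryFactory = repositoryFactory;
    }

    // Returns the user's repository, creating and caching it on first use.
    R repositoryFor(String userId) {
        return repositoriesByUser.computeIfAbsent(userId, repositoryFactory);
    }

    // Number of repositories currently held; with 1,000 users this grows to 1,000,
    // each one needing its own configuration and storage.
    int openRepositories() {
        return repositoriesByUser.size();
    }
}
```

A `ConcurrentHashMap` is used because web requests for different users arrive concurrently; `computeIfAbsent` ensures each user's repository is created at most once.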

       

      What do you think? Is having multiple repositories a good way to use ModeShape, or is there a better solution?

       

      Thanks

        • 1. Re: web app with multiple repo
          hchiorean

          It depends on what you mean by "huge amount of data": is it a matter of nodes & properties, or binary content? ModeShape stores the two differently: nodes/properties (the default JCR idioms) are stored in the main repository cache (via Infinispan), while all binary information (for example the content of files) is stored in a separate binary store for performance reasons: https://docs.jboss.org/author/display/MODE40/Binary+values

           

          Separate repositories mean a high degree of separation and configuration overhead, so whether they make sense depends very much on your use case. If you're planning on storing 20 TB of data as nodes & properties, that's probably a good idea, but I'm highly skeptical that you can anticipate, in bytes, how much node and property data your application will use.

          If, on the other hand, the 20 TB is binary content (i.e. binary JCR properties), then, as mentioned above, the question comes down to binary storage, for which IMO you don't need separate repositories. You do, however, need to clarify the following:

          • If you were not using ModeShape, what would you choose to store the 20 TB in? Once you know that, it's very likely you can configure a ModeShape binary store to use it (be it the file system, a database, Infinispan, MongoDB, Cassandra, etc.).
          • ModeShape has a "composite binary store" implementation, which is basically an aggregate of different binary stores, each mapped under a key. This gives you the option of using different binary stores for different data. See composite-binary-storage.json in the ModeShape/modeshape repository on GitHub (master branch) for an example.
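To illustrate the idea of a composite store (several stores registered under keys, with each binary written to one of them), here is a JDK-only sketch. It is not ModeShape's implementation or API: the `BinaryStore` interface, `InMemoryBinaryStore`, `CompositeStoreSketch`, and the hint names are all hypothetical, and the SHA-1 content key is an assumption chosen for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal binary store abstraction (not ModeShape's API).
interface BinaryStore {
    void put(String key, byte[] content);
    byte[] get(String key);
    boolean contains(String key);
}

// In-memory stand-in for a real backend (file system, database, MongoDB, ...).
final class InMemoryBinaryStore implements BinaryStore {
    private final Map<String, byte[]> data = new HashMap<>();
    public void put(String key, byte[] content) { data.put(key, content.clone()); }
    public byte[] get(String key) { byte[] c = data.get(key); return c == null ? null : c.clone(); }
    public boolean contains(String key) { return data.containsKey(key); }
}

// Aggregates several named stores; a hint chooses where new content is written.
final class CompositeStoreSketch {
    private final Map<String, BinaryStore> storesByHint = new LinkedHashMap<>();
    private final String defaultHint;

    CompositeStoreSketch(String defaultHint, BinaryStore defaultStore) {
        this.defaultHint = defaultHint;
        storesByHint.put(defaultHint, defaultStore);
    }

    void register(String hint, BinaryStore store) { storesByHint.put(hint, store); }

    // Writes content under a SHA-1 key of its bytes; unknown hints fall back to the default store.
    String store(byte[] content, String hint) {
        String key = sha1Hex(content);
        storesByHint.getOrDefault(hint, storesByHint.get(defaultHint)).put(key, content);
        return key;
    }

    // Reads do not need the hint: every registered store is checked for the key.
    byte[] read(String key) {
        for (BinaryStore s : storesByHint.values()) {
            if (s.contains(key)) return s.get(key);
        }
        return null;
    }

    private static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }
}
```

In a real deployment each registered store would point at a different backend, so, for example, large rarely-read files could go to cheaper storage while the rest stays on the default store.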
          • 2. Re: web app with multiple repo
            agonist

            Actually, I think in the long term it will be something more like 1-2 TB of data as nodes and 18 TB as binary files. So is 1-2 TB as nodes not a big deal?

             

            Anyway, thank you for your answer.

            • 3. Re: web app with multiple repo
              hchiorean

              1-2 TB worth of node data is significant, but I'm very curious how you arrived at this size estimate. Normally one can estimate the number of nodes, not their size (the data is stored as BSON documents in Infinispan, and there's no easy way to estimate their size).

              You can use multiple repositories, but it's really hard to tell how practical this will be in the long run. When dealing with a large number of nodes, how you structure those nodes within a repository is also very important. See also http://modeshape.wordpress.com/2014/08/14/improving-performance-with-large-numbers-of-child-nodes/
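One widely used way to avoid a single parent node with a huge number of children (not necessarily what the linked post recommends) is to spread nodes across intermediate "bucket" nodes derived from a hash of the node name. The JDK-only sketch below only computes such a path; the `/files` root, the two-hex-character bucket size, and the `BucketedPaths.bucketedPath` helper are hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: builds a path like /files/3f/a2/<name> so that no single
// parent node ends up with an enormous number of children.
final class BucketedPaths {
    private BucketedPaths() {}

    // levels: number of intermediate bucket nodes (1..16); each bucket name is
    // 2 hex characters of the MD5 hash of the name, i.e. 256 possible values per level.
    static String bucketedPath(String root, String name, int levels) {
        String hex = md5Hex(name);
        StringBuilder path = new StringBuilder(root);
        for (int i = 0; i < levels; i++) {
            path.append('/').append(hex, 2 * i, 2 * i + 2);
        }
        return path.append('/').append(name).toString();
    }

    private static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }
}
```

With two levels there are 256 × 256 = 65,536 leaf buckets, so 10 million nodes would average roughly 150 children per bucket instead of 10 million under one parent. Because the path is derived deterministically from the name, the same name always maps to the same bucket.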

              I would recommend doing a proof of concept (POC) with one repository and measuring how much data you can handle in it for your use case.