3 Replies Latest reply on Aug 10, 2015 4:27 AM by hchiorean

    What is exact ModeShape advantages over writing own mongodb storage?

    jbossastana

      Cross post from http://stackoverflow.com/questions/31847986/what-is-exact-modeshape-advantages-over-writing-own-mongodb-storage

      I'm going to develop a file storage system, mainly for text documents. I have read many questions and answers and gathered some information about the file management systems I could build on:

      1. Alfresco uses the filesystem (FS) plus DB references to it, Apache Jackrabbit uses FS or DB, and ModeShape uses FS, DB, or a NoSQL DB (Cassandra, MongoDB).
      2. Blobs are slower than the FS, especially for large files (>1 MB), but blobs are more reliable and provide backup, migration, and consistency support out of the box. As I don't plan to store many large files, the performance difference between FS and blobs matters little for me.
      3. I decided to store blobs not in a relational DB but in MongoDB, because
        • MongoDB has GridFS under the hood, which provides chunked processing of binary data and replication between servers out of the box;
        • MongoDB is good at storing key/value pairs, which is docid/blob in my case;
        • AFAIK, Facebook uses MongoDB for storing images and media (but they merge many files into one blob).
      4. Many CMSs like Magnolia, Hippo CMS and LogicalDOC are based on Jackrabbit, which only provides FS or DB storage, so they aren't relevant for me as I want MongoDB. Alfresco is too cumbersome for my small requirements and also doesn't support NoSQL DBs, so I decided to choose ModeShape.

      Question: What exactly is the benefit of using ModeShape instead of simply creating my own small web app that writes directly to MongoDB and gains the benefits of GridFS?

      The only answer I have myself is that ModeShape also comes with a bundled Lucene engine for indexed search. I'm not sure about document versioning: is it specifically implemented in ModeShape, or can I simply rely on MongoDB for this task? Does ModeShape provide additional mechanisms for data integrity and reliable storage, or does it simply rely on the underlying database? Does ModeShape use GridFS when connecting to a MongoDB store, or should I turn this option on?

      I would also like to use the file storage system as a REST service behind JBoss Keycloak, and I'm not sure whether it is possible to put ModeShape behind Keycloak. So my question is: should I develop my own app and thus gain flexible development, integrate it with MongoDB, put it behind Keycloak and satisfy other custom wishes, or should I use ModeShape and gain some advantages? What are those advantages? Will it really reduce the amount of code on my side?

        • 1. Re: What is exact ModeShape advantages over writing own mongodb storage?
          hchiorean

          Question: What exactly is the benefit of using ModeShape instead of simply creating my own small web app that writes directly to MongoDB and gains the benefits of GridFS?

          I can't say what the advantage in your particular use case is, but I can try to explain some of ModeShape's concepts which might help you decide.

          First and foremost, ModeShape is not a database but rather a JCR implementation with extra features. As such, there is no direct comparison between MongoDB (or any other DB) and ModeShape. In your particular case, you can choose one or the other, but you can't mix both of them. When you're interacting with ModeShape, you're only doing so via the JCR API and, optionally, the ModeShape API, which has some extensions. When you're interacting directly with a DB, you have much more flexibility in your design and code.

           

          One thing that's also important to understand is that when you're using ModeShape, data is essentially stored in 2 different places. JCR data (like nodes and properties) is stored via Infinispan in whatever cache store Infinispan is configured to use. Binary data (in your case, the byte[] content of files) is stored elsewhere, in something we call "binary stores". This is normally a separate (configurable) storage medium like the FS, an RDBMS, MongoDB, Cassandra or another ISPN cache. This design is explained in more detail in the ModeShape 4 documentation, under "Binary values".
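          To make the two-places idea concrete, here is a minimal sketch in plain Java. It is not ModeShape code: the class name, the two maps, the property names and the choice of a SHA-1 hex string as the binary key are all assumptions made for illustration. The point is only that the node side holds small structured properties plus a reference, while the bytes live in a separate store.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: plain maps stand in for the node storage and the binary store.
class TwoStoreSketch {
    // Stand-in for node storage: path -> properties (small, structured data).
    final Map<String, Map<String, String>> nodes = new HashMap<>();
    // Stand-in for the binary store: key -> raw bytes (large, opaque data).
    final Map<String, byte[]> binaries = new HashMap<>();

    // Derives a key from the content itself (SHA-1 hex is an assumption of this sketch).
    static String keyFor(byte[] content) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Writes the bytes to the binary store and only a reference to the node side.
    void saveFile(String path, String mimeType, byte[] content) throws NoSuchAlgorithmException {
        String key = keyFor(content);
        binaries.put(key, content.clone());
        Map<String, String> props = new HashMap<>();
        props.put("mimeType", mimeType);
        props.put("binaryKey", key);
        nodes.put(path, props);
    }

    // Resolves the node first, then follows its reference into the binary store.
    byte[] readFile(String path) {
        Map<String, String> props = nodes.get(path);
        return props == null ? null : binaries.get(props.get("binaryKey"));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        TwoStoreSketch repo = new TwoStoreSketch();
        byte[] text = "some document text".getBytes(StandardCharsets.UTF_8);
        repo.saveFile("/docs/a.txt", "text/plain", text);
        repo.saveFile("/docs/copy-of-a.txt", "text/plain", text);
        System.out.println("nodes: " + repo.nodes.size() + ", binaries: " + repo.binaries.size());
        System.out.println("round trip ok: " + Arrays.equals(text, repo.readFile("/docs/a.txt")));
    }
}
```

          With a content-derived key, two nodes holding identical bytes end up sharing one stored binary, which is one reason a design might keep the two kinds of data apart.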

          Does ModeShape provide additional mechanisms for data integrity and reliable storage, or does it simply rely on the underlying database?

          When using ModeShape, you're using nodes, properties and, more importantly, JCR sessions. The latter represent your unit of work in JCR, and ModeShape supports both externally managed and internal transactions for each session. In other words, the data integrity guarantee is that of ACID transactions.

          I'm not sure about document versioning: is it specifically implemented in ModeShape, or can I simply rely on MongoDB for this task?

          If you were to use ModeShape, you could use the JCR Versioning API (JSR 283, Chapter 15) and essentially version your [nt:file] and [nt:folder] nodes.

          Does ModeShape use GridFS when connecting to a MongoDB store, or should I turn this option on?

          If you've configured ModeShape to use a MongoDB-backed binary store (see above), data is chunked internally by ModeShape and stored in multiple BasicDBObject instances. So ModeShape's Mongo binary store does not use GridFS directly.
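          As a rough illustration of what "chunked internally" means in general, the following JDK-only sketch splits a stream into fixed-size pieces and joins them again. It is not ModeShape's implementation and does not touch MongoDB; the class name and the tiny CHUNK_SIZE are assumptions chosen so the demo is easy to follow.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: generic fixed-size chunking, not ModeShape's actual code.
class ChunkingSketch {
    static final int CHUNK_SIZE = 4; // hypothetical value, tiny on purpose

    // Reads the stream and splits it into chunks of at most chunkSize bytes.
    static List<byte[]> split(InputStream in, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        int filled = 0;
        int read;
        // A single read may return fewer bytes than requested, so keep filling.
        while ((read = in.read(buffer, filled, chunkSize - filled)) != -1) {
            filled += read;
            if (filled == chunkSize) {
                chunks.add(buffer.clone());
                filled = 0;
            }
        }
        if (filled > 0) {
            chunks.add(Arrays.copyOf(buffer, filled)); // last, shorter chunk
        }
        return chunks;
    }

    // Concatenates the chunks in order to rebuild the original content.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello chunked world".getBytes(StandardCharsets.UTF_8);
        List<byte[]> chunks = split(new ByteArrayInputStream(original), CHUNK_SIZE);
        System.out.println("chunks: " + chunks.size());
        System.out.println("round trip ok: " + Arrays.equals(original, join(chunks)));
    }
}
```

          In a real store each chunk would be persisted as its own record and read back in order, so callers still see one continuous stream.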

          The only answer I have myself is that ModeShape also comes with a bundled Lucene engine for indexed search.

          Lucene isn't supported yet in ModeShape 4 (it will be in the future), but you still have full search capabilities (see "Query and search" in the ModeShape 4 documentation) and optional custom index definitions, which are stored in MapDB at the moment. So a nice feature is that you would already have search capabilities built in.

          I would also like to use the file storage system as a REST service behind JBoss Keycloak, and I'm not sure whether it is possible to put ModeShape behind Keycloak.

          You would still have to create the REST endpoint and its corresponding logic and as such, you'd have to handle the Keycloak interaction yourself. ModeShape is essentially a library (like Hibernate for example) while Keycloak handles SSO & web-specific auth protocols.

          So my question is: should I develop my own app and thus gain flexible development, integrate it with MongoDB, put it behind Keycloak and satisfy other custom wishes, or should I use ModeShape and gain some advantages? What are those advantages? Will it really reduce the amount of code on my side?

          If you were to use ModeShape, you'd get features like transaction support, searching and versioning at the cost of flexibility and some added complexity. Depending on your use case and requirements, that may or may not be worth it. Probably the easiest way to decide is to build a simple POC using both approaches and compare the outcomes.

          • 2. Re: What is exact ModeShape advantages over writing own mongodb storage?
            jbossastana

            Thank you for the detailed response!

             

            I think I will use ModeShape, but I'm not sure I have to develop my own REST endpoint: it seems I could deploy modeshape-rest.war and probably change the AuthenticationProvider (or omit authorization in a first test).

             

            If you've configured ModeShape to use a MongoDB-backed binary store (see above), data is chunked internally by ModeShape and stored in multiple BasicDBObject instances.

             

            Does this mean I can still extract binary data as a stream (in parts)? Does ModeShape return binary data in chunks when it is requested via a REST call? What about uploading large files via REST? Is ModeShape able to merge a chunked incoming binary stream automatically, or should I merge the parts on the backend and then hand the complete data to ModeShape?

            • 3. Re: What is exact ModeShape advantages over writing own mongodb storage?
              hchiorean

              ModeShape's REST service only exposes a limited set of "generic" operations and is nowhere close to the full JCR API spectrum. Instead of using the default service, you should really focus on building your own, context-dependent, REST endpoint and interact behind the scenes with the JCR/ModeShape API.

               

              Does this mean I can still extract binary data as a stream (in parts)? Does ModeShape return binary data in chunks when it is requested via a REST call? What about uploading large files via REST? Is ModeShape able to merge a chunked incoming binary stream automatically, or should I merge the parts on the backend and then hand the complete data to ModeShape?

              As mentioned above, you shouldn't really be using ModeShape's REST endpoint out of the box. As for the current implementation, the REST service writes the stream of binary data back to the HTTP response, using the mime-type and optional content-disposition headers. It doesn't do any chunking. When uploading binary data, it simply relies on RESTEasy (the underlying API with which the endpoint is built) and reads the incoming stream from the request body. It also supports multipart/form-data, so you could upload your binary data in chunks that way.

              You can see the current implementation of the REST endpoint in ModeShapeRestService.java, on the master branch of the ModeShape/modeshape repository on GitHub.
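              On the question of merging uploaded parts in a custom endpoint: one possible approach, sketched here with the JDK only, is to present the ordered parts as a single InputStream instead of concatenating them in memory first. This is not taken from ModeShape's REST service; the class and method names are hypothetical, and the parts are assumed to arrive already in the right order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: hypothetical helper for a custom upload endpoint.
class PartMergeSketch {
    // Presents the ordered parts as one continuous stream; nothing is copied up front.
    static InputStream merge(List<InputStream> orderedParts) {
        return new SequenceInputStream(Collections.enumeration(orderedParts));
    }

    // Drains a stream with a fixed-size buffer (used here only to check the result).
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<InputStream> parts = Arrays.asList(
                new ByteArrayInputStream("part one, ".getBytes(StandardCharsets.UTF_8)),
                new ByteArrayInputStream("part two, ".getBytes(StandardCharsets.UTF_8)),
                new ByteArrayInputStream("part three".getBytes(StandardCharsets.UTF_8)));
        try (InputStream whole = merge(parts)) {
            System.out.println(new String(drain(whole), StandardCharsets.UTF_8));
        }
    }
}
```

              The merged stream could then be handed to the repository in one go, for example through the JCR method ValueFactory.createBinary(InputStream), which is not shown here because it needs a running repository.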