7 Replies Latest reply on Jan 31, 2013 1:28 PM by rhauch

    ModeShape suitability for medium sized binary files

    gmlopezdev

      Hi,

      I'm looking for a content store solution that can store small to medium sized binary files (images/PDF/audio/video), which will probably be no more than 100-200 MB each, with an average of 5-10 MB. ModeShape looks great in terms of standards, scalability/availability and so forth; however, I'm concerned about how it will behave with medium to large binary files. Although I expect to handle small files most of the time, there may be exceptions. Looking forward, it is also possible that this content store will start being used for more purposes than the original ones. Another option I was considering is Mongo/GridFS, which is built to handle large binary files, among other interesting features. ModeShape has its own interesting features too.

       

      I've read [this thread|https://community.jboss.org/thread/176690], which is not completely related to my question, but Randall's post about binary content provided some information. However, I'd like more thorough information about it.

       

      Any input about my file size concerns and ModeShape's suitability?

       

      Thanks for your help!

        • 1. Re: ModeShape suitability for medium sized binary files
          rhauch

          ModeShape can indeed handle very large files, though how fast depends on several different things. If you haven't already, look at our documentation that describes how ModeShape handles Binary values. There are several options to choose from, based upon the topology you're looking for.

           

          A non-clustered server could use the FileSystemBinaryStore, which simply forwards all calls directly to the underlying java.io.File objects managed in the binary store. All access is via buffered file input/output streams, so this option will be fast and should handle files as large as the OS itself can handle.
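          The following is a minimal plain-Java sketch of the idea described above: content is written and read through buffered file streams, so the size of a value is bounded by the file system rather than by heap memory. The class name SimpleFileBinaryStore, the SHA-1 keying, and the flat directory layout are assumptions made for illustration; this is not ModeShape's FileSystemBinaryStore, whose actual implementation may differ.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical, simplified file-system binary store (NOT ModeShape's class).
 * Each value is streamed to disk and stored under its SHA-1 hex digest.
 */
final class SimpleFileBinaryStore {
    private static final int BUFFER_SIZE = 8192;
    private final Path directory;

    SimpleFileBinaryStore(Path directory) throws IOException {
        this.directory = Files.createDirectories(directory);
    }

    /** Streams the content to a temp file while hashing it, then moves it to its key. Closes 'content'. */
    String store(InputStream content) throws IOException {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        Path temp = Files.createTempFile(directory, "upload", ".tmp");
        try (InputStream in = new DigestInputStream(new BufferedInputStream(content), sha1);
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(temp))) {
            byte[] buffer = new byte[BUFFER_SIZE];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // only one buffer's worth is ever in memory
            }
        } catch (IOException e) {
            Files.deleteIfExists(temp); // don't leave partial uploads behind
            throw e;
        }
        String key = toHex(sha1.digest());
        Path target = directory.resolve(key);
        if (Files.exists(target)) {
            Files.delete(temp); // identical content is already stored
        } else {
            Files.move(temp, target);
        }
        return key;
    }

    /** Returns a buffered stream over the stored content; the caller must close it. */
    InputStream retrieve(String key) throws IOException {
        return new BufferedInputStream(Files.newInputStream(directory.resolve(key)));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

          Because each file is named by a hash of its content in this sketch, storing identical content twice yields the same key and only one copy on disk.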

           

          Clustered topologies need a shared binary store, and we have several options for this as well, as described in the documentation. Most of them can handle very large files, but the need for a shared, distributed store will increase overhead. Be sure to test performance on your own hardware.

           

          (Sorry for the many updates. For some reason, I had a lot of trouble entering this post.)

          • 2. Re: ModeShape suitability for medium sized binary files
            gmlopezdev

            Hi Randall, thank you very much for your reply! It is very helpful, as is the link you provided. According to the documentation, it appears that a Mongo data store could also be configured specifically as the binary store.

             

            If you don't mind, could you please point me to some code samples for uploading/retrieving files from ModeShape?

             

            Thanks again!

            • 3. Re: ModeShape suitability for medium sized binary files
              rhauch

              The key is that the content of a file is stored as a Binary value in a property on some node:

               

              // Create a buffered input stream for the file's contents ...
              InputStream stream = new BufferedInputStream(new FileInputStream(file));

              // Create a node where we'll store the content ...
              Node node = parentNode.addNode("myfile", "nt:unstructured");

              // Upload the file to that node ...
              Binary binary = session.getValueFactory().createBinary(stream);
              node.setProperty("content", binary);

              // Save the session ...
              session.save();

               

              The two important lines, which set the Binary value, are just after the "Upload the file to that node" comment.

               

              Getting the file's content back out is pretty easy:

               

              // Get the binary value and an input stream to its content ...
              Binary content = node.getProperty("content").getBinary();
              long size = content.getSize();
              InputStream stream = content.getStream(); // the caller is responsible for closing this stream
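              For large values it usually pays to consume the stream returned by getStream() in fixed-size chunks instead of reading it all into memory. The hypothetical StreamCopy helper below is a minimal sketch of that, using only the JDK; the intended JCR usage with the "content" property from the example above is shown in comments, since it needs the JCR API on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * Hypothetical helper for consuming a large binary stream without loading it into memory.
 *
 * Intended use with the JCR retrieval code (requires javax.jcr, so shown only as a comment):
 *   Binary content = node.getProperty("content").getBinary();
 *   try (InputStream stream = content.getStream()) {
 *       long copied = StreamCopy.copy(stream, out); // should equal content.getSize()
 *   } finally {
 *       content.dispose(); // release any resources held by the Binary
 *   }
 */
final class StreamCopy {
    private StreamCopy() {
    }

    /** Copies everything from 'in' to 'out' in 8 KB chunks; returns the byte count. Closes neither stream. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

              On Java 9 and later, InputStream.transferTo(OutputStream) does the same thing; the explicit loop here just makes the chunked access visible.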

               

              Obviously these examples do not show storing any other information about the file (e.g., no other metadata), and the resulting node is somewhat arbitrary in structure. You would likely add properties you find interesting, and design the node to suit your needs.

               

              Now, the JCR specification actually predefines some node types that are expressly intended for nodes that represent files and folders. Here's an example (note that the Binary part is largely the same):

               

               

              Calendar lastModified = Calendar.getInstance();
              lastModified.setTimeInMillis(file.lastModified());

              // Create a buffered input stream for the file's contents ...
              InputStream stream = new BufferedInputStream(new FileInputStream(file));

              // Create an 'nt:file' node under the folder node ...
              Node fileNode = folder.addNode(file.getName(), "nt:file");

              // Upload the file to that node ...
              Node contentNode = fileNode.addNode("jcr:content", "nt:resource");
              Binary binary = session.getValueFactory().createBinary(stream);
              contentNode.setProperty("jcr:data", binary);
              contentNode.setProperty("jcr:lastModified", lastModified);

              // Save the session (which also saves the auto-created properties) ...
              session.save();

               

              This code creates two nodes (one for the file itself, the other for its content) and sets additional properties, including several (e.g., "jcr:mimeType", "jcr:created" and "jcr:createdBy") that are all set automatically upon save. Again, this is just one way to store files; you're absolutely free to use whatever node structure you want. For example, some applications might want to store information about a patient, including scanned documents. I would imagine that the documents would appear *under* the patient node, and could be "nt:file" nodes or other custom node types (that may or may not subtype "nt:file").

               

              I suggest these links to learn more about "nt:file" and "nt:folder":

               

              • 4. Re: ModeShape suitability for medium sized binary files
                gmlopezdev

                Thanks again Randall!

                 

                Let me ask you one more question. The available documentation states here that repositories are not good for storing large files, although it also states that such files can be stored outside of the repository, without any further comment. I believe you when you say that ModeShape is suitable for that purpose, but the documentation seems a bit contradictory. Could you please clarify?

                • 5. Re: ModeShape suitability for medium sized binary files
                  rhauch

                  That documentation says (emphasis mine):

                   

                  JCR repositories are good at storing files, but binary values are accessed (via Java streams) and are thus less useful for storing very large files (e.g., GB in size)

                   

                  This means that you can store files of any size, although very large files (I'd guess starting around 1GB) become more time-consuming to access simply because they must be processed with Java streams. ModeShape might still be fine for very large files or content that is normally streamed to the client (e.g., videos). But if these large files must be delivered as fast as possible, then storing them on the file system may reduce the overhead of accessing them.

                   

                  Hope this helps. I'll update the documentation to reflect this subtle distinction.

                  • 6. Re: ModeShape suitability for medium sized binary files
                    gmlopezdev

                    Very helpful again! Just to avoid misunderstandings, what exactly do you mean by "storing them on the file system"? Do you mean not using a content repository at all, or using the FileSystemBinaryStore?

                    • 7. Re: ModeShape suitability for medium sized binary files
                      rhauch

                      I suggest storing the files outside the repository only if your application has some more efficient way to read/write them (e.g., copying the files using system functions). On the other hand, if you're just going to use Java file IO (or NIO) to read and write those files, then putting them inside the repository is perfectly fine, since ModeShape also uses Java file IO and will be just as efficient as your application.
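                      As a rough illustration of the "system functions" route, the hypothetical ChannelCopy helper below copies a file with FileChannel.transferTo. Depending on the JVM and operating system, this call may be handed off to an OS-level transfer instead of passing every byte through a Java buffer, but that is not guaranteed, so measure before relying on it. The class name is made up for this sketch; java.nio.file.Files.copy is a simpler standard-library alternative.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: copies a file via FileChannel.transferTo, which the JVM may delegate to the OS. */
final class ChannelCopy {
    private ChannelCopy() {
    }

    /** Copies 'source' to 'target' (creating or truncating it) and returns the number of bytes copied. */
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done
            while (position < size) {
                long transferred = in.transferTo(position, size - position, out);
                if (transferred <= 0) {
                    break; // source shrank or nothing more to read; avoid spinning forever
                }
                position += transferred;
            }
            return position;
        }
    }
}
```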