7 Replies Latest reply on Feb 13, 2013 11:27 AM by yanosh

    Best Architecture to store file on file system

    yanosh

      Hi,

       

      I'm really new to ModeShape. I have to implement a ModeShape-based standalone library that stores files on my file system together with metadata in XML format. The files I need to store are produced by another standalone application written by a colleague. He will have to add my classes to his classpath and use them to do this.

       

      Because I come from the JPA world, the only architecture I can imagine is to write DataAccessObject classes with an interface for storing the files, while the model is represented by the beans in my colleague's project (implementing my Storable interface). Is this way of thinking also correct with ModeShape?

       

      I also read in another discussion that to store content on my file system I need the JPA connector with HSQLDB. Could I have any suggestions, maybe a simple example, on how to implement this?

       

      The attachment contains the code of my generic DataAccessObject. My idea is that every data access class should extend it, specifying the node (whose path is given in absPath) that it is responsible for managing (as if it were a table in a database).

       

      Thanks in advance and regards!

        • 1. Re: Best Architecture to store file on file system
          rhauch

          ModeShape is a hierarchical database that supports a flexible schema. While it is possible to use a highly rigid schema, most users tend to want less rigid schemas or even more of a schemaless structure. This is really where ModeShape shines. Add in the ability to cluster and the ease of storing files of any size, and you have a perfect system for storing metadata and files (though this is just one frequently occurring use case).

           

          I think the challenge of using DataAccessObjects and POJO entities (which are obviously critically important in JPA) is that you'll tend to constrain yourself to the most limiting way of using ModeShape. The ModeShape API is built around nodes with properties, and while you can map the nodes/properties to entity-like classes, any changes to your data will require changing your classes. For some scenarios, this is perfectly acceptable. But for many other scenarios, the data structure evolves more frequently than you want to change code.

           

          Perhaps some other people on the forum can share how they code to ModeShape and JCR API.

           

          Now, please be aware that ModeShape 2.x used connectors for all storage, and it offered a JPA connector that stored content in a relational database. The community has largely moved on to ModeShape 3, which has a completely different architecture and is all around a huge improvement over 2.x. A ModeShape 3 repository stores its content in an Infinispan cache, which can be distributed/replicated and/or persisted to a variety of stores, including the file system, relational database, Cassandra, cloud, etc. See our Getting Started guide for an overview. ModeShape 3 does have connectors, but they're used only for federation.

           

          So, I'd strongly recommend looking at ModeShape 3. And if you can, spend a few days writing some prototypes that store some representative data in a repository using the JCR API. Don't write wrapper classes, and don't use POJOs. This will help you become familiar with the API, how it's used, and how powerful it can be. Don't worry about node types to start out; just focus on the data. Consider several ways of structuring your content, and look at how the data might be navigated by your application. Upload some files. Add some metadata to various nodes. Learn how to evolve the data structure by adding more properties to existing nodes. To get started, you could even work with an in-memory repository so you don't have to worry about configuration too early. Again, focus on the data.

           

          After that, you might look into how you might start using node types to define the structure of your data (really, to identify the patterns your data will follow), and the difference between a primary node type and a mixin node type.

           

          Then start thinking about how/where you want to store your data. Will you embed ModeShape into an application? Will it be web-based or JavaSE? Will it be clustered? Answers to these questions can help guide what the configuration file might look like. Of course, this can seem more complicated than it is, so don't hesitate to ask more questions.

          • 2. Re: Best Architecture to store file on file system
            yanosh

            Hi Randall, first of all thank you for your answer.

             

            Actually I'm already on ModeShape 3.x. I have configured an Infinispan FileCacheStore (my config is in the attachment) with a file called initial.xml that defines the starting nodes I need. I already have a structure for my future file system and have written some classes to start the ModeShape engine (and it works: I have printed some node names to my console, but only textual information; I also need to store PDF files). So sorry for my bad explanation (my English is also not so good).

             

            So the question is: what now? I can connect to a repository with an initial structure, but my boss wants me to write an API to give to my colleague so that he can persist these POJOs, which contain PDFs to send to our customers and to show to the administrators once we set up a web application that reads this data. If I shouldn't think in terms of wrapper classes and POJOs, how can I map my colleague's beans to a Node? And how can I pass the repository complex data other than a string, such as a byte array? Here some code from other people on the forum would be really appreciated.

             

            Another thing: following OOP, how can I hide the JCR logic from my colleague so that he doesn't have to study JCR just to persist an object?

             

            I don't know, maybe I'm too focused on my baggage from the last 7 years of work with relational databases, so sorry for the low-level questions.

            • 3. Re: Best Architecture to store file on file system
              rhauch

              > Actually I'm already on ModeShape 3.x. I have configured an Infinispan FileCacheStore (my config is in the attachment) with a file called initial.xml that defines the starting nodes I need. I already have a structure for my future file system and have written some classes to start the ModeShape engine (and it works: I have printed some node names to my console, but only textual information; I also need to store PDF files).

              Excellent!

               

               

              > So sorry for my bad explanation (my English is also not so good).

               

              Don't worry about it. Perhaps I wasn't reading closely enough.

              > So the question is: what now? I can connect to a repository with an initial structure, but my boss wants me to write an API to give to my colleague so that he can persist these POJOs, which contain PDFs to send to our customers and to show to the administrators once we set up a web application that reads this data. If I shouldn't think in terms of wrapper classes and POJOs, how can I map my colleague's beans to a Node? And how can I pass the repository complex data other than a string, such as a byte array? Here some code from other people on the forum would be really appreciated.

               

              > Another thing: following OOP, how can I hide the JCR logic from my colleague so that he doesn't have to study JCR just to persist an object?

              Well, if you are not allowed to expose the JCR API (which is a standard Java API, BTW), then there are two options:

               

              1. Use a conventional API with classes (including POJOs) containing the information you're passing back and forth.
              2. Use a more dynamic (and optionally less strongly typed) API with classes that use/implement map-like structures for some/all of the information you're passing back and forth. The map could be used to store the information normally stored in fields in POJOs.

               

              Both could use a DAO-like wrapper for the "service". Can you provide a concrete example of what one POJO (or several) might be like in the conventional (JPA-like) case?
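
              A minimal sketch of how option 2 might look, using only the Java standard library. The names StoredItem, ItemStore and InMemoryItemStore are hypothetical and not part of ModeShape or JCR, and the in-memory store is only a stand-in for an implementation that would read and write repository nodes:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical map-backed item: values live in a map instead of in POJO fields. */
final class StoredItem {
    private final String path;
    private final Map<String, Object> properties = new LinkedHashMap<>();

    StoredItem(String path) {
        this.path = path;
    }

    String getPath() {
        return path;
    }

    /** Sets a property and returns this item so calls can be chained. */
    StoredItem set(String name, Object value) {
        properties.put(name, value);
        return this;
    }

    /** Returns the property cast to the requested type, or null if it is absent. */
    <T> T get(String name, Class<T> type) {
        return type.cast(properties.get(name));
    }

    /** Read-only view of all properties, in insertion order. */
    Map<String, Object> asMap() {
        return Collections.unmodifiableMap(properties);
    }
}

/** Hypothetical DAO-like service that hides the storage technology from callers. */
interface ItemStore {
    void save(StoredItem item);

    StoredItem load(String path);
}

/** In-memory stand-in; a real implementation would read and write repository nodes. */
final class InMemoryItemStore implements ItemStore {
    private final Map<String, StoredItem> items = new HashMap<>();

    @Override
    public void save(StoredItem item) {
        items.put(item.getPath(), item);
    }

    @Override
    public StoredItem load(String path) {
        return items.get(path); // null when nothing is stored at that path
    }
}
```

              With this shape, adding a new piece of metadata means calling set(...) with a new name rather than adding a field to a class, which is the trade-off option 2 is making.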

               

              The one thing that you should try to mirror, however, is using streams for the files. For example, when the application uses your API, it should provide an InputStream to the file content. Your API implementation could then obtain a Session, create a Binary value from that InputStream, create a node (or a small structure of nodes, depending upon your use case), store the Binary value in one of the properties on the node, set the other properties of the node(s), and then save the Session.

               

              What you do not want to do is hold the file content in memory (in, say, a byte[]).
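
              A rough sketch of a stream-based API of this kind, under some assumptions: FileStore and DirectoryFileStore are hypothetical names, and the implementation writes to a local directory purely as a stand-in so the example runs without a repository. A JCR-backed implementation would instead hand the stream to ValueFactory.createBinary(InputStream) and set the resulting Binary on a property, as described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical API seen by the calling application: content arrives as a stream. */
interface FileStore {
    /** Stores the content under the given relative path and returns the bytes written. */
    long store(String relativePath, InputStream content) throws IOException;
}

/** Stand-in implementation that writes below a local root directory. */
final class DirectoryFileStore implements FileStore {
    private final Path root;

    DirectoryFileStore(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    @Override
    public long store(String relativePath, InputStream content) throws IOException {
        Path target = root.resolve(relativePath).normalize();
        // Reject paths such as "../x" that would land outside the root.
        if (!target.startsWith(root) || target.equals(root)) {
            throw new IllegalArgumentException("Invalid path: " + relativePath);
        }
        Files.createDirectories(target.getParent());
        // Files.copy transfers the stream in chunks, so the whole file
        // is never held in memory as a single byte[].
        return Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

              The caller would typically open the stream in a try-with-resources block (for example with Files.newInputStream) and pass it straight to store(...), so neither side ever materializes the full file content.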

               

              > I don't know, maybe I'm too focused on my baggage from the last 7 years of work with relational databases, so sorry for the low-level questions.

              That certainly is a challenge.

               

              The JCR API is not really intended to be hidden like the JDBC API is. It was designed to be used directly by applications that are storing data, and doing so exposes all of the power and flexibility of ModeShape (or any other JCR implementation). I think most applications are written directly to the JCR API. But of those that aren't, some use a DAO-like pattern with read/write methods (and POJOs) while others use more of a light-weight facade pattern where the business object is basically a wrapper around the Node structure. The latter doesn't create POJOs that are separate representations; it is directly used by the application to access and update the underlying content. (This might be easier to explain with some examples from your domain.)
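
              To make the facade idea a little more concrete, here is a small sketch. It is an illustration only: InvoiceFacade is a hypothetical name, the "customer" property is invented, and a plain Map stands in for the repository node so the example runs without a JCR implementation.

```java
import java.util.Map;

/**
 * Hypothetical facade: a business object that wraps the stored structure directly.
 * A plain Map stands in for the repository node here (an assumption for illustration).
 */
final class InvoiceFacade {
    private final Map<String, Object> node;

    InvoiceFacade(Map<String, Object> node) {
        this.node = node; // kept by reference, not copied into separate fields
    }

    String getCustomer() {
        return (String) node.get("customer");
    }

    void setCustomer(String customer) {
        node.put("customer", customer); // writes through to the wrapped structure
    }

    /** Properties added to the data later stay reachable without changing this class. */
    Object getProperty(String name) {
        return node.get(name);
    }
}
```

              Unlike a DAO that copies values into a separate POJO, the facade holds no state of its own, so a change made through it is immediately a change to the underlying content.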

              • 4. Re: Best Architecture to store file on file system
                yanosh

                Good morning Randall,

                 

                Sorry for disappearing, but I went home from work. I spoke with my colleague and discovered that he doesn't have a POJO at all. He only has a java.io.File for saving a zip with all the PDF files, and a list of java.io.File objects representing a series of XML files with the metadata I need. So the problem is simpler. At this point, can you kindly suggest a way to save this series of files in an nt:folder at a certain path, or in two distinct folders?

                 

                In the meantime I will try to google the solution, considering I am about 6 hours ahead of you there.

                 

                regards!!

                • 5. Re: Best Architecture to store file on file system
                  yanosh

                  OK, found it... I'll try this solution.

                   

                  thanks thousand!

                   

                  https://docs.jboss.org/author/display/MODE/Storing+files+and+folders

                  • 6. Re: Best Architecture to store file on file system
                    rhauch

                    Take a look at our JcrTools utility class (you can find the code for it here), especially the "uploadFile" methods. They basically contain the logic for saving files in the repository. You can use them directly, or you can write similar code.

                     

                    However, the "nt:file" and "nt:folder" nodes don't allow extra properties. But you can add them with mixins; see the "Adding other properties" section on that page.

                    • 7. Re: Best Architecture to store file on file system
                      yanosh

                      Thank you, Bye!!