31 Replies. Latest reply on Jun 11, 2013 11:06 AM by rhauch
      • 15. Re: Next generation federation requirements
        sverker

        Hi,

        I'm commenting on the whole thread only now, but it has been very interesting reading. The main reason I'm interested in ModeShape is what is stated on the front page:

         

        "A ModeShape repository isn't yet another silo of isolated information, but rather it's a JCR view of the information you already have in your environment: file systems, databases, other repositories, services, applications, etc."

         

        When I first looked at the ModeShape 3 alpha release I couldn't connect the dots, as it seemed to me to be exactly that: a silo with its own data storage via Infinispan, which is not meant for integrating existing data structures. This thread makes things clearer with the distinction between "internal storage" and "external storage".

         

        My current use case is similar to what Jonathan describes above, and I've worked through basically the same alternatives. I have a repository of video files which we will play out over a CDN cloud service. Currently we are using Amazon S3 and CloudFront, but I don't want to be tied too tightly to them, which is why I started developing on ModeShape in the first place. I created an Amazon S3 connector that provides the abstraction between the system and the storage, which makes it possible to replace S3/CloudFront with another cloud service or with local file storage without having to redevelop the system.

         

        However, my S3 connector (which was based on the FileSystem connector) is not efficient enough. It makes far too many requests to S3 even for simple operations, so there is a lot of room for improvement there.

         

        With the 3.x architecture it would be an external storage connector and, as I understand from this discussion, it would be possible to use the internal storage as a high-performance cache, which would be the best setup.

         

        How far have you got with the external storage connector interface? I assume none of it is in the alpha yet?

        • 16. Re: Next generation federation requirements
          hchiorean

          Hi,

           

          There isn't any external storage connector support yet, in the Alpha1 release. We're hoping to have something out in the next release.

          • 17. Re: Next generation federation requirements
            sverker

            Hi

            Any estimate yet of when you'll start working on the design of the connector/federation layer?

            /Sverker

            • 18. Re: Next generation federation requirements
              rhauch

              We're trying to get ModeShape 3.0.0.Final out the door, and realistically that should happen within the next 3-4 weeks. It's taken a while, but we've made excellent progress so far, and ModeShape 3 looks to be quite stable.

               

              Stay tuned!

              • 19. Re: Next generation federation requirements
                rhauch

                Federation has been added to 3.1.0.Final. Please take a look and see how it compares with your requirements.

                • 20. Re: Next generation federation requirements
                  jay709

                  I am also interested in the cloud connector, but unfortunately it is not provided in 3.1.

                   

                  Can anyone please give me some guidance on developing my own cloud connector?

                   

                  I have the environment to connect to AWS, but I don't know which steps and framework are needed to develop a ModeShape connector.

                  • 21. Re: Next generation federation requirements
                    rhauch

                    ModeShape can store data in the cloud via Infinispan and its JClouds cache store, which we also describe in our documentation. You are right that the JClouds cache store (aka cache loader) is not included by default, but if you are using Maven you can easily include it by adding this dependency:

                     

                          <dependency>
                             <groupId>org.infinispan</groupId>
                             <artifactId>infinispan-cachestore-cloud</artifactId>
                             <version>USE THE CORRECT INFINISPAN VERSION HERE</version>
                          </dependency>

                     

                    or (if you're not using Maven) by looking in the Infinispan downloads.

                    • 22. Re: Next generation federation requirements
                      jay709

                      Hi Randall,

                       

                      Thank you for your information about infinispan.

                       

                      But for a very large file (e.g., 10 GB) that is stored in AWS Glacier, is Infinispan suitable?

                       

                      Jay

                      • 23. Re: Next generation federation requirements
                        rhauch

                        But for a very large file (e.g., 10 GB) that is stored in AWS Glacier, is Infinispan suitable?

                         

                        I personally have no experience with AWS Glacier - perhaps someone else on the list might have more experience and could respond.

                         

                        However, while ModeShape will always store nodes and properties in Infinispan, it can optionally store BINARY values in a separate location. See our documentation for more details. So while you might store your repository content (e.g., nodes and properties) in AWS via Infinispan and JClouds, you could choose to store the binary values somewhere else. ModeShape has the concept of a BinaryStore, and we have several built-in implementations:

                         

                        • FileSystem
                        • Infinispan
                        • MongoDB
                        • Relational database (via JDBC)

                         

                        If one of these didn't meet your needs, you could create a new implementation that stored the binary values where and how you wanted. (Note that some of the built-in implementations store the whole file as-is, while others chunk it into pieces. Any custom implementation could do either.)
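
To make the "whole file versus chunks" distinction concrete, here is a minimal sketch of the chunking idea in plain Java. It is not ModeShape code: `ChunkingSketch`, `storeInChunks`, and the `sink` callback are hypothetical names standing in for whatever a custom store would do with each piece. The point is only that the stream is consumed one chunk at a time, so the whole value never has to sit in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.BiConsumer;

/** Hypothetical helper (not a ModeShape class): hands a stream to a sink in fixed-size chunks. */
final class ChunkingSketch {

    /** Passes (chunkIndex, chunkBytes) to the sink for each chunk and returns the chunk count. */
    static int storeInChunks(InputStream in, int chunkSize, BiConsumer<Integer, byte[]> sink) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        byte[] buffer = new byte[chunkSize];
        int index = 0;
        try {
            while (true) {
                int filled = 0;
                // One read() may return fewer bytes than asked for, so keep reading
                // until the buffer is full or the stream ends.
                while (filled < chunkSize) {
                    int n = in.read(buffer, filled, chunkSize - filled);
                    if (n < 0) break;
                    filled += n;
                }
                if (filled == 0) break;                 // nothing left to store
                sink.accept(index++, Arrays.copyOf(buffer, filled));
                if (filled < chunkSize) break;          // short chunk means end of stream
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    public static void main(String[] args) {
        int count = storeInChunks(new ByteArrayInputStream(new byte[10]), 4,
                (i, chunk) -> System.out.println("chunk " + i + ": " + chunk.length + " bytes"));
        System.out.println(count + " chunks");
    }
}
```

A store that keeps the file as-is would simply copy the stream to its destination instead of calling a per-chunk sink.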

                         

                        What do you think?

                        • 24. Re: Next generation federation requirements
                          jay709

                          Thanks again,Randall.

                           

                          For our project, we want to save many images with metadata (ranging from small to extremely large) into a ModeShape repository. We welcome the Infinispan cache feature for our small pictures. The large files are used less often, so we put them separately into 'cold' storage, AWS Glacier, as long-term, lower-cost storage. Once clients need those files, they could be ready to download several hours later.

                           

                          Now, I was just wondering: as Infinispan is an in-memory cache, is it suitable to store and access the 10 GB files through Infinispan?

                          If not, is it feasible to implement a value.binary.GlacierBinaryStore that uses Glacier as the external storage? And would any other classes, such as a sequencer or connector, be needed?

                           

                          Jay


                          • 25. Re: Next generation federation requirements
                            rhauch

                            Jay C wrote:

                             

                            Thanks again,Randall.

                             

                            For our project, we want to save many images with metadata (ranging from small to extremely large) into a ModeShape repository. We welcome the Infinispan cache feature for our small pictures. The large files are used less often, so we put them separately into 'cold' storage, AWS Glacier, as long-term, lower-cost storage. Once clients need those files, they could be ready to download several hours later.

                             

                            Now, I was just wondering: as Infinispan is an in-memory cache, is it suitable to store and access the 10 GB files through Infinispan?

                            If not, is it feasible to implement a value.binary.GlacierBinaryStore that uses Glacier as the external storage?

                            Infinispan is not actually involved in reading/writing "large" binary values, where "large" is a size of your choosing (any binary values smaller than that are actually stored with the content; larger binary values are stored in the binary store). The whole reason we did this was so that you could easily store 10GB files (or larger) without filling your heap. The exception to Infinispan not being involved in binary values is, of course, when you choose to actually store the "large" binary values inside Infinispan via our InfinispanBinaryStore. But then the binary values are chunked into pieces, and each piece is stored separately in Infinispan. (I'm still not convinced that it's a good idea to use the InfinispanBinaryStore for very large binary values.)

                             

                            So you could absolutely implement a BinaryStore that used AWS Glacier. In fact, from what little I just read about it, it looks like you might even be able to customize or extend the FileSystemBinaryStore -- or at the very least base a custom implementation upon FileSystemBinaryStore or even the MongodbBinaryStore. With AWS, you might have to figure out what to return if somebody asks for a particular binary value and it cannot be read immediately by the store (because it has to be "unarchived"). And if you don't care about extracting text from your large binary values, you can simply short-circuit those methods.
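
One way to think about the "cannot be read immediately" case is sketched below in plain Java. Everything here is an assumption for illustration: `ColdArchiveSketch` is an in-memory stand-in with no real AWS calls, and `BinaryNotYetAvailableException` is a made-up exception type, not something ModeShape or AWS provides. The sketch only shows one possible contract: a read either returns the content or signals clearly that retrieval is still pending.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for an archival service; real Glacier calls are deliberately left out. */
final class ColdArchiveSketch {
    private final Map<String, byte[]> archived = new HashMap<>();
    private final Set<String> restored = new HashSet<>();

    /** Archives the content; a freshly archived value is not readable until it is restored. */
    void archive(String key, byte[] content) {
        archived.put(key, content.clone());
        restored.remove(key);
    }

    /** Simulates the archive finishing a retrieval job for the given key. */
    void completeRetrieval(String key) {
        if (archived.containsKey(key)) restored.add(key);
    }

    /** Returns the content, or throws if the value exists but has not been restored yet. */
    byte[] read(String key) {
        byte[] content = archived.get(key);
        if (content == null) throw new IllegalArgumentException("Unknown key: " + key);
        if (!restored.contains(key)) throw new BinaryNotYetAvailableException(key);
        return content.clone();
    }

    public static void main(String[] args) {
        ColdArchiveSketch archive = new ColdArchiveSketch();
        archive.archive("video-1", new byte[] {1, 2, 3});
        try {
            archive.read("video-1");
        } catch (BinaryNotYetAvailableException e) {
            System.out.println(e.getMessage());
        }
        archive.completeRetrieval("video-1");
        System.out.println(archive.read("video-1").length + " bytes readable after retrieval");
    }
}

/** Hypothetical signal that an archived value was requested but cannot be read yet. */
final class BinaryNotYetAvailableException extends RuntimeException {
    BinaryNotYetAvailableException(String key) {
        super("Retrieval still pending for " + key);
    }
}
```

Whether an exception, an empty result, or a placeholder value is the right answer depends on what the calling application can cope with; the sketch picks an exception only because it is the hardest to ignore.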

                             

                            An AWS Glacier binary store implementation would be a very welcome addition to ModeShape, if you're interested in contributing it back to us.

                             

                            We're working on a ChainedBinaryStore that has the ability to work with a series of other (real) BinaryStores in a designated order. You might even be able to use that in some way. See MODE-1908 for more details. (Work is mostly done except for configuring these chains within the EAP kit.) We're hoping that this work will make it into the 3.3 release that we'll cut at the end of this week.

                             

                             

                            And would any other classes, such as a sequencer or connector, be needed?

                             

                            Sequencers, connectors, and binary stores are completely unrelated to each other. Binary stores can use the text extractor framework (a binary store is asked to get the extracted text from large binary values, so that an implementation can optionally store the extracted text), but otherwise text extractors are also independent of the rest.

                             

                            Hope this helps!

                            • 26. Re: Next generation federation requirements
                              jay709

                              Hi Randall,

                               

                              Thank you very much for the detailed explanation and the good suggestions for getting started on a GlacierBinaryStore.

                               

                              Randall Hauch wrote:


                              Infinispan is not actually involved in reading/writing "large" binary values, where "large" is a size of your choosing (any binary values smaller than that are actually stored with the content; larger binary values are stored in the binary store). The whole reason we did this was so that you could easily store 10GB files (or larger) without filling your heap. The exception to Infinispan not being involved in binary values is, of course, when you choose to actually store the "large" binary values inside Infinispan via our InfinispanBinaryStore. But then the binary values are chunked into pieces, and each piece is stored separately in Infinispan. (I'm still not convinced that it's a good idea to use the InfinispanBinaryStore for very large binary values.)

                               

                              So data held in any BinaryStore other than the InfinispanBinaryStore bypasses the cache provided by Infinispan?

                               

                              And, in turn, the Glacier wrapper should be implemented at the BinaryStore level, instead of inside the JClouds AWS wrapper?

                               

                              And please excuse me for further bothering:

                              Could you please drop some pieces of code/configurationAnnotation to show how the ModeShape  to use infinispan or not for special modules?

                               

                              Thanks once again,

                               

                              Jay

                              • 27. Re: Next generation federation requirements
                                rhauch

                                So data held in any BinaryStore other than the InfinispanBinaryStore bypasses the cache provided by Infinispan?

                                Yes, that is correct.

                                 

                                 

                                And, in turn, the Glacier wrapper should be implemented at the BinaryStore level, instead of inside the JClouds AWS wrapper?

                                If you create a new BinaryStore implementation that used AWS Glacier (using whatever APIs or libraries you wanted to use), then it would indeed be completely independent of the Infinispan JClouds cache store.

                                 

                                I cannot really say if a GlacierBinaryStore makes sense in and of itself, simply because I'm not familiar with the semantics or behavior of AWS Glacier. For example, if you uploaded a binary value into a repository that was using a GlacierBinaryStore, can that binary value be accessed from Glacier at any time, or is there some period of time after the initial request for the data before Glacier makes it available?

                                 

                                If the latter is true, then maybe a GlacierBinaryStore might wrap another BinaryStore that maintains the actively-used binary values, but any binary value that hasn't been used in a while would get moved over to AWS Glacier. Dunno if that makes sense, either.

                                 

                                Finally, I just merged the ChainedBinaryStore into our 'master' branch. It can be used to wrap a number of other binary stores, and it allows clients to use "rules" to say in which store a binary value should be persisted. This might be really powerful with a simple GlacierBinaryStore that merely stores whatever is asked of it inside AWS Glacier, allowing the clients to say where the binary values are physically stored. Note that when a binary value is read, the ChainedBinaryStore consults all of the wrapped stores until it finds one that has the binary value.
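
The behavior described above (a rule picks the store on write, reads consult each wrapped store in turn) can be sketched in a few lines of plain Java. This is not the class merged into ModeShape: `ChainedStoreSketch`, its method names, and the use of simple in-memory maps as the wrapped "stores" are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical wrapper over several named stores; each wrapped store is just a map here. */
final class ChainedStoreSketch {
    // LinkedHashMap keeps the designated order in which stores are consulted on read.
    private final Map<String, Map<String, byte[]>> storesByName = new LinkedHashMap<>();
    private final Function<byte[], String> placementRule;

    /** The rule maps a value's content to the name of the store that should hold it. */
    ChainedStoreSketch(Function<byte[], String> placementRule) {
        this.placementRule = placementRule;
    }

    void addStore(String name) {
        storesByName.put(name, new HashMap<>());
    }

    /** Writes to the store chosen by the rule and returns that store's name. */
    String write(String key, byte[] content) {
        String name = placementRule.apply(content);
        Map<String, byte[]> target = storesByName.get(name);
        if (target == null) throw new IllegalStateException("No store named " + name);
        target.put(key, content.clone());
        return name;
    }

    /** Consults each wrapped store in order until one of them has the value. */
    Optional<byte[]> read(String key) {
        for (Map<String, byte[]> store : storesByName.values()) {
            byte[] found = store.get(key);
            if (found != null) return Optional.of(found.clone());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ChainedStoreSketch chain = new ChainedStoreSketch(c -> c.length > 4 ? "cold" : "hot");
        chain.addStore("hot");
        chain.addStore("cold");
        System.out.println("small value stored in: " + chain.write("a", new byte[2]));
        System.out.println("large value stored in: " + chain.write("b", new byte[6]));
        System.out.println("large value found: " + chain.read("b").isPresent());
    }
}
```

In the sketch the rule looks only at the content size; a real rule could just as well look at a hint supplied by the client.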

                                 

                                 

                                And please excuse me for further bothering:

                                Could you please drop some pieces of code/configurationAnnotation to show how the ModeShape  to use infinispan or not for special modules?

                                I'm not sure I understand what you're asking for. Can you try again?

                                • 28. Re: Next generation federation requirements
                                  jay709

                                  Hi Randall,

                                   

                                  Thank you very much again.

                                   

                                   

                                   

                                  And please excuse me for further bothering:

                                  Could you please drop some pieces of code/configurationAnnotation to show how the ModeShape  to use infinispan or not for special modules?

                                  I'm not sure I understand what you're asking for. Can you try again?

                                  I am curious how ModeShape uses Infinispan at the code level. How do normal string/file values go to Infinispan while BinaryStore values do not?

                                   

                                  Thanks,

                                   

                                  Jay

                                  • 29. Re: Next generation federation requirements
                                    rhauch

                                    I am curious how ModeShape uses Infinispan at the code level. How do normal string/file values go to Infinispan while BinaryStore values do not?

                                    Clients always use the JCR API to work with binary values. See the documentation for examples.

                                     

                                    What I'm going to explain is really never seen by client applications, and describes (at a very high level) how ModeShape handles the different kinds of property values (including BINARY values).

                                     

                                    First of all, clients use the Session's ValueFactory to create new STRING values, DATE values, REFERENCE values, etc., and BINARY values. But you can only create a BINARY value from an InputStream, so under the covers we do things differently. (Actually, we have factories for each of the property types, and those factories know how to create/convert supplied values into the desired type. One such factory is a BinaryFactory that directly uses the repository's BinaryStore. All of this is internal code hidden from clients.) The end result is that each BINARY value created (and used) is stored inside the BinaryStore and identified by the SHA-1 hash of the content. And ModeShape never holds the whole BinaryValue in memory; it's always streamed to and from the BinaryStore.
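
For anyone who wants to see what "identified by the SHA-1 hash of the content" looks like while streaming, here is a minimal sketch using only the JDK. It is not ModeShape's internal code; `Sha1KeySketch` and `sha1Hex` are hypothetical names. The stream is read through a small buffer, so the digest is computed without ever holding the whole value in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not a ModeShape class): derives a content key from a stream. */
final class Sha1KeySketch {

    /** Consumes the stream through a fixed-size buffer and returns the SHA-1 as lowercase hex. */
    static String sha1Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            try (DigestInputStream din = new DigestInputStream(in, digest)) {
                byte[] buffer = new byte[8192];
                while (din.read(buffer) != -1) {
                    // Reading through the DigestInputStream updates the digest as a side effect.
                    // A real store would also write the buffer to its destination here.
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sha1Hex(new ByteArrayInputStream(content)));
    }
}
```

Because the key depends only on the content, storing the same bytes twice yields the same key, which is what lets a content-addressed store keep a single copy.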

                                     

                                    Then, when you set a property to contain a BINARY value, ModeShape actually stores an internal reference to that BINARY value that simply contains the SHA-1 hash. It's this tiny internal reference that actually gets stored in Infinispan in the same document with the rest of that node's information. Note that if you create a BINARY value (and therefore store it in the BinaryStore) but do not actually use it in a property, then eventually that unused BINARY value will get garbage collected.
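
The thread does not say how ModeShape decides that a BINARY value is unused, so the following is only a sketch of the general idea under simple assumptions: node documents hold nothing but hash keys, and a sweep removes any stored value whose key no node references. `UnusedBinarySweepSketch` and `sweep` are hypothetical names, and the in-memory map stands in for a real binary store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sweep (not ModeShape's algorithm): drops stored values that no node references. */
final class UnusedBinarySweepSketch {

    /**
     * Removes every entry of binaryStore whose key is absent from all node reference sets,
     * and returns the removed keys in sorted order.
     */
    static Set<String> sweep(Map<String, byte[]> binaryStore,
                             Collection<Set<String>> keysReferencedByNodes) {
        Set<String> referenced = new HashSet<>();
        for (Set<String> keys : keysReferencedByNodes) {
            referenced.addAll(keys);
        }
        Set<String> unused = new TreeSet<>(binaryStore.keySet());
        unused.removeAll(referenced);
        binaryStore.keySet().removeAll(unused);
        return unused;
    }

    public static void main(String[] args) {
        Map<String, byte[]> store = new HashMap<>();
        store.put("hash-a", new byte[1]);
        store.put("hash-b", new byte[1]);
        List<Set<String>> nodes = new ArrayList<>();
        nodes.add(new HashSet<>(Arrays.asList("hash-a")));
        System.out.println("removed: " + sweep(store, nodes));
    }
}
```

A production store would also need some grace period so that a value created a moment ago, but not yet attached to a property, is not swept too early; the sketch leaves that out.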

                                     

                                    STRING values are handled slightly differently than BINARY values. When the node is persisted to Infinispan, the whole node is represented as a JSON/BSON document. When we are processing the node's content, any string that is larger than the "large string value" length set in the configuration is automatically converted into a BINARY value and stored inside the BinaryStore. Clients never see this happen, and can continue to access the property values as strings. Actually, you can convert any of the property types into Java strings, including BINARY values.
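
The large-string handling can be illustrated with a small sketch as well. Again this is not ModeShape's implementation: `LargeStringSketch`, `BinaryRef`, and the in-memory map used as the binary store are assumptions. Strings longer than the configured length are moved to the store and replaced in the document by a reference holding the SHA-1 key, and the client-facing read turns the reference back into a string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of swapping large strings for binary references. */
final class LargeStringSketch {

    /** Marker for a value whose content lives in the binary store under its SHA-1 key. */
    static final class BinaryRef {
        final String sha1;
        BinaryRef(String sha1) { this.sha1 = sha1; }
    }

    private final int largeStringLength;
    private final Map<String, byte[]> binaryStore = new HashMap<>();

    LargeStringSketch(int largeStringLength) {
        this.largeStringLength = largeStringLength;
    }

    /** Returns the string itself, or a BinaryRef if the string exceeds the configured length. */
    Object toDocumentValue(String value) {
        if (value.length() <= largeStringLength) return value;
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        String key = sha1Hex(bytes);
        binaryStore.put(key, bytes);
        return new BinaryRef(key);
    }

    /** Gives clients a plain string regardless of where the content is actually kept. */
    String toClientString(Object documentValue) {
        if (documentValue instanceof BinaryRef) {
            byte[] bytes = binaryStore.get(((BinaryRef) documentValue).sha1);
            return new String(bytes, StandardCharsets.UTF_8);
        }
        return (String) documentValue;
    }

    private static String sha1Hex(byte[] bytes) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-1").digest(bytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        LargeStringSketch sketch = new LargeStringSketch(5);
        Object stored = sketch.toDocumentValue("a longer string");
        System.out.println("stored as reference: " + (stored instanceof BinaryRef));
        System.out.println("client sees: " + sketch.toClientString(stored));
    }
}
```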

                                     

                                    I hope this helps!