11 Replies Latest reply on Oct 10, 2012 10:51 AM by K. Bachl

    IRC meeting to discuss issues to be fixed before 3.0.0.Final

    Randall Hauch Master

      We plan to cut ModeShape 3.0.0.CR1 today or tomorrow. Please join me to go over any outstanding issues that we've targeted to fix before 3.0.0.Final.

       

      When: Tuesday, Oct 9 at 13:00 UTC

      Where: IRC chat room #modeshape on irc.freenode.net

      What: Review the open issues

       

      Be ready to quickly summarize any issues you're working on and provide an estimated completion date. If you cannot attend the meeting, please add comments to all your issues before the meeting.

       

      The outcome of this meeting will determine whether 3.0.0.CR1 is adequate and when we'll be ready to cut the 3.0.0.Final release.

       

      UPDATED: corrected the day of the week

        • 4. Re: IRC meeting to discuss issues to be fixed before 3.0.0.Final
          K. Bachl Novice

          Short question: why is ModeShape 3 CR1 still on Infinispan 5.1.2.FINAL and not on the current 5.1.6.FINAL? The fixes between 5.1.2 and 5.1.6 are quite noteworthy IMHO.

          • 5. Re: IRC meeting to discuss issues to be fixed before 3.0.0.Final
            Horia Chiorean Master

            Because AS 7.1.1.Final is the latest available final version of AS7 that we support, and it uses ISPN 5.1.2. However, we also have a build profile that runs against ISPN 5.1.5.FINAL, which is used by the AS7.2.Alpha kit.

            • 6. Re: IRC meeting to discuss issues to be fixed before 3.0.0.Final
              K. Bachl Novice

              Wouldn't it be better to use the latest minor version of Infinispan in the ModeShape POMs and instead just override the necessary transitive dependencies in the JBoss kit's POMs?

               

              Oh, and I tested the current CR1 against our project and everything worked fine. Performance seems to have gotten a bit slower since beta 3 or so, but it is hardly noticeable (~3%, which could also be measurement noise). In particular, the Infinispan stores now also work well on the filesystem and don't grow bigger than needed. One thing I'm not sure about is the binaryStorage that is still used inside the storage section.

              I mean, does it really make sense to separate big files outside of Infinispan if all the data is held by Infinispan?

              • 7. Re: IRC meeting to discuss issues to be fixed before 3.0.0.Final
                Horia Chiorean Master

                Wouldn't it be better to use the latest minor version of Infinispan in the ModeShape POMs and instead just override the necessary transitive dependencies in the JBoss kit's POMs?

                That's definitely an option and we might do it, but probably after the 3.0.Final release, which should be coming very soon.

                I mean, does it really make sense to separate big files outside of Infinispan if all the data is held by Infinispan?

                The binary storage is a bit different from the rest of the JCR data: the BSON format that we're using to store the nodes isn't suitable for plain binary content, and storing large binary values in the same ISPN cache(s) as everything else could hurt the overall performance and usability of a running repository.

                 

                Having the binary storage as a separate "subsystem" offers us more flexibility in terms of:

                • different implementations - we currently have File System, Infinispan (a separate ISPN configuration), a relational DB and MongoDB
                • separate lifecycle management - we use "reference counting" for binary values and whenever a value isn't used, we remove it from the storage
                • handling of mime-types and text extraction - which is something specific to binary content
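The "reference counting" lifecycle mentioned in the list above can be pictured with a minimal sketch. This is not ModeShape's binary store API: the class `RefCountedBinaryStore` and its methods `acquire`, `release`, and `contains` are hypothetical names chosen for illustration, and the sketch removes an unused value immediately, whereas a real store might defer removal to a later cleanup pass.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of reference counting; not ModeShape's actual API.
class RefCountedBinaryStore {
    private final Map<String, byte[]> contentByKey = new HashMap<>();
    private final Map<String, Integer> refCountByKey = new HashMap<>();

    /** Registers one more user of the value under 'key', storing the content on first use. */
    void acquire(String key, byte[] content) {
        contentByKey.putIfAbsent(key, content);
        refCountByKey.merge(key, 1, Integer::sum);
    }

    /** Drops one user. Returns true if that was the last user and the content was removed. */
    boolean release(String key) {
        Integer count = refCountByKey.get(key);
        if (count == null) {
            return false; // unknown key, nothing to release
        }
        if (count <= 1) {
            // No property refers to this value any more, so it can leave the store.
            refCountByKey.remove(key);
            contentByKey.remove(key);
            return true;
        }
        refCountByKey.put(key, count - 1);
        return false;
    }

    boolean contains(String key) {
        return contentByKey.containsKey(key);
    }
}
```

In this sketch, two properties that use the same value call `acquire` twice, and the content disappears only after the second `release`.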

                 

                So from an overall design perspective, separating (or extending) the storage of JCR nodes from the binary data of properties seems like a sound solution.

                • 8. Re: IRC meeting to discuss issues to be fixed before 3.0.0.Final
                  Randall Hauch Master

                  Horia was absolutely correct - it's all about flexibility and choice. Some folks use ModeShape to store very (very!) large files, and it simply doesn't make sense to force them to store all content in the same system, especially when doing so would have a pretty severe negative impact on performance.

                   

                  Note that even with this design, you can choose to store binary content in the same place as your regular node and properties content. Horia mentions that you can store your binary content in Infinispan, and you might notice that doing so requires two caches: one for the binary (chunked) data and one for the metadata. We separated these because their access patterns are very different: the data is only read when Binary.getStream() is called for a given binary value, but the metadata is significantly smaller in size yet accessed far more frequently (when creating and accessing Binary values, and when housecleaning/garbage collecting). Because they're so different, we recommend storing them in different caches.
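To make the two access patterns described above concrete, here is a rough sketch with plain maps standing in for the two caches. Everything in it is an assumption for illustration: the class `ChunkedBinaryStore`, the `key + "-" + index` chunk naming, and the `Metadata` fields are hypothetical and do not reflect how the Infinispan binary store actually lays out its entries.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: one map per cache, to show which operations touch which cache.
class ChunkedBinaryStore {
    /** Small, frequently read record describing a stored value. */
    static final class Metadata {
        final long length;
        final int chunkCount;

        Metadata(long length, int chunkCount) {
            this.length = length;
            this.chunkCount = chunkCount;
        }
    }

    private final int chunkSize;
    private final Map<String, Metadata> metadataCache = new HashMap<>();
    private final Map<String, byte[]> dataCache = new HashMap<>();

    ChunkedBinaryStore(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Splits the content into fixed-size chunks and records its metadata separately. */
    void store(String key, byte[] content) {
        int chunks = 0;
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, content.length);
            dataCache.put(key + "-" + chunks, Arrays.copyOfRange(content, offset, end));
            chunks++;
        }
        metadataCache.put(key, new Metadata(content.length, chunks));
    }

    /** Answers from the metadata cache only; returns -1 for an unknown key. */
    long length(String key) {
        Metadata meta = metadataCache.get(key);
        return meta == null ? -1 : meta.length;
    }

    /** Reads every chunk from the data cache; returns null for an unknown key. */
    byte[] read(String key) {
        Metadata meta = metadataCache.get(key);
        if (meta == null) {
            return null;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream((int) meta.length);
        for (int i = 0; i < meta.chunkCount; i++) {
            byte[] chunk = dataCache.get(key + "-" + i);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

Here `length` never touches the data map, which mirrors the point that metadata lookups are cheap and frequent while the chunked data is only pulled in when the content itself is read.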

                   

                  For example, in small clustered systems, both caches can be replicated. In large clustered systems that are not using a backing store, the data cache should be set up as distributed (where the cache maintains a handful of copies distributed across the entire cluster) while the metadata cache should be replicated (where every process in the cluster has a copy of the metadata).

                   

                  Note that you can configure all these caches within a single Infinispan configuration file (we do recommend this). In fact, we're talking about making a small change to the Infinispan binary store so that one could configure the binary data and metadata caches and the main content cache to all use the same Infinispan cache. But this would only be good to do in small repositories that generally have smallish and few binary values.

                   

                  Best regards

                  • 9. Re: IRC meeting to discuss issues to be fixed before 3.0.0.Final
                    K. Bachl Novice

                    @Horia and @Randall: Thank you very much for this clear explanation!

                     

                    Now, what "size" would you suggest for separating the large files from the small ones? I know that this might differ, but there should be some rough number, shouldn't there? In ModeShape 2.x I was forced to keep everything together in the metadata area, because the DiskConnector in conjunction with programmatically cloned workspaces tended to lose large files that were still needed in another area, so I don't really have any clue about a reasonable size yet.

                     

                    I'm asking because I intend to switch to ModeShape 3 quite soon, as the Infinispan with cache loader approach reduces our disk I/O compared to our current 2.8.x DiskConnector solution (which was necessary because Infinispan once didn't like programmatically created workspaces).

                     

                    Best

                    • 10. Re: IRC meeting to discuss issues to be fixed before 3.0.0.Final
                      Randall Hauch Master

                      Now, what "size" would you suggest for separating the large files from the small ones? I know that this might differ, but there should be some rough number, shouldn't there?


                      It completely depends on your situation.

                       

                      Do you want to store your binary values in a separate area? Do you want to use a different storage technique? If so, then a smaller (or even '0') minimum large value size would be better, so that more binary values are persisted in the separate store. Will you be using the same binary value in more than one node? If so, the binary store is able to store each BINARY value only once, no matter how many times it's used. Are you accessing the nodes with BINARY property values much more frequently than you're accessing the BINARY values themselves? If so, then separating the node and the BINARY values will lead to better performance.

                       

                      Storing the binary values in the content (e.g., a large minimum size for the binary values) means that the binary value is stored with its node right alongside the STRING, DATE, LONG and other property values. That means that whenever that node has to be materialized into memory, the binary values come along for the ride, and you're paying the performance cost. However, this is usually acceptable for smaller BINARY values that are not reused in multiple places.

                       

                      The default is 4 kilobytes, and this is probably acceptable for many situations. That's small enough that it's okay to store, materialize, and persist the value with the node where it's used, and small enough that a few duplicate BINARY values won't break the bank. With binary values larger than this, the overhead becomes greater, so it makes more sense to store them in a separate location and stream them (via buffers) only when needed while storing only one copy of each.
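The trade-off around the minimum size can be sketched as a simple placement decision. This is only an illustration under stated assumptions: the class `BinaryPlacement` and its methods are hypothetical, keying the separate store by a SHA-1 digest of the content is an assumed way of storing each value once, and treating a value of exactly the minimum size as "large" is a guess at the boundary rather than something the thread confirms.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the inline-versus-separate decision; not ModeShape code.
class BinaryPlacement {
    /** The 4 kilobyte default mentioned in the thread. */
    static final int DEFAULT_MIN_SIZE_BYTES = 4 * 1024;

    private final int minimumSizeInBytes;
    private final Map<String, byte[]> separateStore = new HashMap<>();

    BinaryPlacement(int minimumSizeInBytes) {
        if (minimumSizeInBytes < 0) {
            throw new IllegalArgumentException("minimum size must not be negative");
        }
        this.minimumSizeInBytes = minimumSizeInBytes;
    }

    /**
     * Returns null when the value is small enough to stay inline with its node,
     * otherwise the key under which it is held in the separate store.
     */
    String place(byte[] value) {
        if (value.length < minimumSizeInBytes) {
            return null; // small value: stored with the node's other properties
        }
        // Assumption: identical content yields the same key, so it is stored only once.
        String key = sha1Hex(value);
        separateStore.putIfAbsent(key, value);
        return key;
    }

    int separatelyStoredCount() {
        return separateStore.size();
    }

    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 digest is not available", e);
        }
    }
}
```

With a minimum of 0, every value goes to the separate store, which matches the earlier remark that a smaller (or even '0') minimum pushes more binary values out of the node content.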

                       

                      Hope this helps.

                      • 11. Re: IRC meeting to discuss issues to be fixed before 3.0.0.Final
                        K. Bachl Novice

                        Thank you very much for this answer - this should really help me.