6 Replies Latest reply on Sep 8, 2015 3:31 AM by Sébastien Berthezene

    Binary store replication when ModeShape is used in replication mode

    Sébastien Berthezene Newbie

      Hi,

       

      Is there any internal mechanism or binary store implementation that automatically replicates binary content as well when ModeShape is used in replication mode? In our case network performance will be a problem, and they do not really want a shared binary store that could be far away from some of the clusters.

       

      Possibilities :

       

      1.     Use Infinispan for the binary data cache as well. However, I will have a lot of files, so memory usage could be huge. Also, I do not know whether Infinispan works correctly with large binaries (files), or what the size limit is for a single item.
      2.     Use a JDBC/NoSQL database as the binary store and rely on the database's own cluster management. For example, I could store binaries in MongoDB and let MongoDB manage its cluster configuration. Each ModeShape cluster would have its own MongoDB cluster for binaries.
      3.     External solution: keep using a local file store, but use an external tool to synchronize the file store between servers. In this case I will need to harden the application code, because there can be cases where I try to access binary content that has not been synchronized yet.
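On the concern in option 1 about the size limit for a single item: a common way to keep individual cache entries small is to split each binary into fixed-size chunks stored under derived keys and reassemble them on read. The sketch below shows only that idea, with a plain HashMap standing in for the cache; the `ChunkedBinaries` class and the `key#index` naming scheme are hypothetical illustrations, not ModeShape's or Infinispan's API.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: splits each binary value into fixed-size chunks so that no
// single cache entry exceeds chunkSize bytes. A HashMap stands in for the cache.
class ChunkedBinaries {
    private final Map<String, byte[]> cache = new HashMap<>();
    private final Map<String, Integer> chunkCounts = new HashMap<>();
    private final int chunkSize;

    ChunkedBinaries(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    // Stores the content under key#0, key#1, ... and remembers how many chunks there are.
    void put(String key, byte[] content) {
        remove(key); // drop chunks left over from a previous, possibly longer, value
        int count = (content.length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < count; i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, content.length);
            cache.put(key + "#" + i, Arrays.copyOfRange(content, from, to));
        }
        chunkCounts.put(key, count);
    }

    // Reassembles the chunks in order; returns null if the key is unknown.
    byte[] get(String key) {
        Integer count = chunkCounts.get(key);
        if (count == null) return null;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            byte[] chunk = cache.get(key + "#" + i);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    void remove(String key) {
        Integer old = chunkCounts.remove(key);
        if (old != null) {
            for (int i = 0; i < old; i++) cache.remove(key + "#" + i);
        }
    }

    int entryCount() {
        return cache.size();
    }

    int largestEntry() {
        int max = 0;
        for (byte[] chunk : cache.values()) max = Math.max(max, chunk.length);
        return max;
    }
}
```

Chunking only bounds the size of each entry; the total memory needed is unchanged unless the chunks are spread over several nodes or moved out to a cache store.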

       

      What do you think? Any other possibilities? Has anyone faced the same problem, and if so, how did you solve it?

        • 1. Re: Binary store replication when ModeShape is used in replication mode
          Horia Chiorean Master

          I think all the listed options are valid (and I can't think of another one), but if network performance is a problem then I don't think you should use Infinispan for replication, because it is *highly* network-intensive.

          So the best option seems to me to be #2: use a database (even an RDBMS) and handle replication between the DB instances with whatever technology the DB vendor provides.

           

          In general, however, having an active-active replicated binary store seems highly unusual, especially since binary stores normally hold large amounts of data. To keep the stores synchronized you will most likely have to "copy across" large chunks of data, and if network performance is a problem (as you mention), I'm not sure this will ever perform well, regardless of the solution adopted.
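To put "copy across large chunks of data" into numbers, a back-of-the-envelope estimate of one synchronization pass is simply volume divided by usable bandwidth. The `SyncEstimate` class is a hypothetical helper, and the volume, link speed and efficiency factor in `main` are made-up placeholders, not figures from this thread.

```java
// Rough estimate of how long it takes to push a given volume of binary data
// across an inter-site link. All inputs are assumptions supplied by the caller.
class SyncEstimate {
    // bytes: volume to copy; linkMbps: nominal link speed in megabits per second;
    // efficiency: fraction of the nominal speed actually usable (0 < efficiency <= 1).
    static double seconds(long bytes, double linkMbps, double efficiency) {
        if (linkMbps <= 0 || efficiency <= 0 || efficiency > 1) {
            throw new IllegalArgumentException("invalid link parameters");
        }
        double bits = bytes * 8.0;
        double usableBitsPerSecond = linkMbps * 1_000_000.0 * efficiency;
        return bits / usableBitsPerSecond;
    }

    public static void main(String[] args) {
        long volume = 5_000_000_000L;            // hypothetical: 5 GB of new binaries
        double s = seconds(volume, 100.0, 0.8);  // hypothetical: 100 Mbit/s link, 80% usable
        System.out.printf("%.0f seconds (~%.1f minutes)%n", s, s / 60.0);
        // prints: 500 seconds (~8.3 minutes)
    }
}
```

The same arithmetic applies on every read when a cluster fetches content from a distant shared store, which is the trade-off weighed in the next reply.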

          • 2. Re: Binary store replication when ModeShape is used in replication mode
            Sébastien Berthezene Newbie

            In general however, having an active-active replicated binary store seems highly unusual, especially since binary stores will normally involve storing large amounts of data. So in order to keep the stores synchronized, you will most likely have to "copy across" large chunks of data. If network performance is a problem (like you mention) then I'm not sure this will ever perform well, regardless of adopted solution

            That is a good point. In my case the binary values will be file contents, and the application is mostly READ-intensive rather than WRITE-intensive. Network traffic for writing binary data is expected to happen in the background, not as part of user activity. The problem with a central shared binary store will be the delay when a user tries to access a file's content, caused by the network performance between the cluster and the shared binary store. It will also be a problem for full-text search, because the same store is used. In a clustered binary store scenario there will certainly also be network traffic, but only between clusters and only once, during synchronization.

             

            Anyway, even with a clustered DB binary store I could run into problems, depending on how the DB manages its cluster and whether cluster members are aware of missing data. For example, MongoDB replication is asynchronous, so I would need to harden the application to handle cases where binary data is not yet synchronized. That is basically why Infinispan could be interesting: from my understanding, in distributed mode each Infinispan cluster is aware of missing data that is available on another cluster and synchronizes it when you try to access it. This means the first access to a file would be delayed, but not the second one. I have not tried it yet; perhaps it does not work this way, and I need to verify. I may also have a look at Cassandra, which looks very good on paper.
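The behaviour described above (a delay on the first access, local reads afterwards) amounts to a read-through cache in front of a remote copy, and it is also what the application code would need under option 3 when a binary has not been synchronized yet. A minimal sketch in plain Java, assuming binaries are addressed by the SHA-1 hex digest of their content (ModeShape's binary keys are SHA-1 based); the `RemoteFetcher` interface and the `ReadThroughBinaryStore` class are hypothetical and not part of ModeShape or Infinispan.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical source of binaries that have not reached this site yet
// (another cluster's store, a replicated database, ...).
interface RemoteFetcher {
    Optional<byte[]> fetch(String sha1Hex);
}

// Read-through wrapper: serve a binary from the local store when it is there;
// otherwise fetch it from the remote side once, verify its hash and keep a local copy.
class ReadThroughBinaryStore {
    private final Map<String, byte[]> local = new ConcurrentHashMap<>();
    private final AtomicInteger remoteFetches = new AtomicInteger();
    private final RemoteFetcher remote;

    ReadThroughBinaryStore(RemoteFetcher remote) {
        this.remote = remote;
    }

    // Content address: lowercase hex SHA-1 of the bytes.
    static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Local write, e.g. for a binary created on this site.
    String store(byte[] content) {
        String key = sha1Hex(content);
        local.put(key, content);
        return key;
    }

    Optional<byte[]> read(String key) {
        byte[] cached = local.get(key);
        if (cached != null) {
            return Optional.of(cached); // already synchronized: no network access
        }
        remoteFetches.incrementAndGet();
        Optional<byte[]> fetched = remote.fetch(key); // first access pays the delay
        if (fetched.isPresent()) {
            // Content addressing lets us reject a corrupted or wrong copy.
            if (!sha1Hex(fetched.get()).equals(key)) {
                throw new IllegalStateException("hash mismatch for " + key);
            }
            local.put(key, fetched.get());
        }
        return fetched; // empty: not available anywhere yet, the caller must handle it
    }

    int remoteFetchCount() {
        return remoteFetches.get();
    }
}
```

An empty result is the "not yet synchronized" case that the application still has to handle explicitly, for example by retrying later or reporting the file as temporarily unavailable.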

             

            Thanks for your help, any feedback of such implementation is welcome.

            • 3. Re: Binary store replication when ModeShape is used in replication mode
              Horia Chiorean Master

              One thing you have to be aware of regarding Infinispan in general is this issue: [MODE-2420] Modeshape can potentially lose data because Infinispan Cache Stores do not participate in transactions - JBo… (see the discussion and the linked ISPN issues).

              In short, if you're planning on clustering ISPN, the only way to get strong consistency (which is mandatory from ModeShape's/JCR's perspective) is to use a shared cache store to save all your data. If you're using multiple cache stores, there's always a chance that an unexpected failure will leave your persistent data in an inconsistent state.
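A toy model makes it easier to see why independent stores can diverge: when a write is applied to each node's store in turn without a transaction spanning them, a failure part-way through leaves some stores updated and others not, and nothing rolls back the writes that already succeeded. The sketch is purely illustrative plain Java; the `NodeStore` and `NonTransactionalWrite` classes and the injected failure are hypothetical and do not reproduce Infinispan's actual write path.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model of one node's persistent cache store.
class NodeStore {
    final String name;
    final Map<String, String> data = new HashMap<>();
    boolean failNextWrite = false; // lets us inject a failure

    NodeStore(String name) {
        this.name = name;
    }

    void write(String key, String value) {
        if (failNextWrite) {
            failNextWrite = false;
            throw new IllegalStateException(name + ": write failed");
        }
        data.put(key, value);
    }
}

class NonTransactionalWrite {
    // Applies the write to each store in turn. Nothing spans the stores, so if one
    // write throws, stores already written keep the new value and the rest keep the old one.
    static void writeAll(List<NodeStore> stores, String key, String value) {
        for (NodeStore store : stores) {
            store.write(key, value);
        }
    }

    // True when at least two stores hold different values for the key.
    static boolean diverged(List<NodeStore> stores, String key) {
        String first = stores.get(0).data.get(key);
        for (NodeStore store : stores) {
            if (!Objects.equals(first, store.data.get(key))) return true;
        }
        return false;
    }
}
```

With a single shared store there is only one copy to write, so a failed write fails for every node instead of leaving copies that disagree.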

              • 4. Re: Binary store replication when ModeShape is used in replication mode
                Sébastien Berthezene Newbie

                I see... Thanks for the warning. A shared cache store also means that if this store fails or is corrupted for any reason, the whole application is down. I am not very comfortable with such an architecture; it is a real problem. If I use a shared cache store, what store type would you suggest?

                • 5. Re: Binary store replication when ModeShape is used in replication mode
                  Horia Chiorean Master

                  You don't have to use a "shared cache". ISPN always has two parts: the in-memory cache(s) and the persistent storage (i.e. the cache-store). Caches are always local to the node they run on, and they can be configured in either replicated or distributed mode.

                  The cache-stores on the other hand can be multiple (e.g. each cluster node has its own cache store) or shared by all the caches (i.e. the shared cache store).
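The practical difference between the two cache modes is how many nodes hold a copy of each entry: in replicated mode every node holds everything, while in distributed mode only a fixed number of owners per key do (Infinispan's programmatic configuration calls this `numOwners`). The sketch below picks owners with a simple hash-modulo rule purely for illustration; Infinispan's real consistent-hash implementation is different, and `OwnershipSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class OwnershipSketch {
    // Replicated mode: every node holds a copy of every key.
    static List<Integer> replicatedOwners(int clusterSize) {
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i < clusterSize; i++) owners.add(i);
        return owners;
    }

    // Distributed mode (illustrative rule only, not Infinispan's algorithm):
    // numOwners consecutive nodes starting at a position derived from the key's hash.
    static List<Integer> distributedOwners(String key, int clusterSize, int numOwners) {
        if (numOwners < 1 || numOwners > clusterSize) {
            throw new IllegalArgumentException("numOwners must be between 1 and clusterSize");
        }
        int start = Math.floorMod(key.hashCode(), clusterSize);
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i < numOwners; i++) owners.add((start + i) % clusterSize);
        return owners;
    }

    // Average data held per node when each entry is stored `copies` times.
    static double bytesPerNode(double totalBytes, int clusterSize, int copies) {
        return totalBytes * copies / clusterSize;
    }
}
```

With 4 nodes, replication keeps the full data set on each node, whereas 2 owners per key keeps about half of it per node, which bears on the memory concern raised in the original question. A read on a node that is not an owner needs a remote lookup; whether later reads are then served locally depends on configuration (Infinispan's L1 cache is one such option).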

                   

                  If your system is not write-intensive, then having multiple cache-stores might work, although it increases the chance that, if a write to one cache-store fails, the data in the other cache-stores becomes inconsistent. A shared cache store, on the other hand, significantly reduces this risk, and you can, for example, use a DB as the shared cache store and replicate it offline for backup purposes (an active-passive model).
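One rough way to quantify "significantly reduces this risk": if each of N cache stores fails a given write independently with probability p, the copies end up disagreeing whenever some but not all of them fail, which happens with probability 1 - (1 - p)^N - p^N. A single shared store (N = 1) gives zero, since there is only one copy. Independent failures are an assumption made for illustration, and `DivergenceRisk` is a hypothetical helper.

```java
// Probability that N cache stores end up inconsistent, assuming each store
// fails a write independently with probability p.
class DivergenceRisk {
    // Inconsistent after one write = neither "all succeeded" nor "all failed".
    static double probability(int stores, double p) {
        if (stores < 1 || p < 0 || p > 1) throw new IllegalArgumentException("invalid inputs");
        double allSucceed = Math.pow(1 - p, stores);
        double allFail = Math.pow(p, stores);
        return 1 - allSucceed - allFail;
    }

    // Probability that at least one of `writes` independent writes leaves the stores inconsistent.
    static double overWrites(int stores, double p, long writes) {
        return 1 - Math.pow(1 - probability(stores, p), writes);
    }
}
```

Even a small per-write probability accumulates over many writes, which is why a low write rate makes multiple stores more tolerable. The model only covers divergence; it does not capture the availability concern that a shared store is a single point of failure.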

                   

                  I strongly suggest you also read: Consistency guarantees in Infinispan · infinispan/infinispan Wiki · GitHub