I think that all the listed options are valid (and I can't think of a new one), but if network performance is a problem then I don't think you should be using Infinispan for replication, because that is *highly* network intensive.
So to me it seems like the best option would be #2 - use a database (even an RDBMS) - and handle replication between DB instances using whatever technology the DB provider offers.
In general however, having an active-active replicated binary store seems highly unusual, especially since binary stores will normally involve storing large amounts of data. So in order to keep the stores synchronized, you will most likely have to "copy across" large chunks of data. If network performance is a problem (like you mention) then I'm not sure this will ever perform well, regardless of the adopted solution.
It is a good point. In my case, the binary values will be file contents, and the application is mostly read-intensive rather than write-intensive. Binary-data network traffic for writes is supposed to happen in the background, not from user activity. The problem with a central shared binary store is the delay when a user tries to access file content, caused by network performance between the cluster and the shared binary store. It is also a problem for full-text search, because the same store is used. In a clustered binary store scenario, there will surely also be network traffic, but only between cluster nodes and only once, during synchronization.
Anyway, even with a clustered DB binary store I could have a problem, depending on how the DB manages clusters and how aware each node is of missing data. For example, MongoDB replication is asynchronous, so I would need to handle cases where binary data is not yet synchronized. That is basically the reason why Infinispan could be interesting: from my understanding, when you use distributed mode, each Infinispan node is aware of data held on other nodes and fetches it when you try to access it. That means the first time I try to access a file there will be a delay, but not the second time. I have not tried it yet; perhaps it does not work this way, so I need to verify. Perhaps I will also have a look at Cassandra, which seems very good on paper.
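For what it's worth, the behavior described above roughly corresponds to Infinispan's distribution mode with the L1 cache enabled: on first access a node without a local copy fetches the entry from one of its owners, and with L1 enabled it then keeps a local copy for subsequent reads. A minimal config sketch (element names follow the older ISPN 5.x XML schema that ModeShape used at the time; the cache name, owner count, and L1 lifespan are illustrative assumptions, so check the schema for your ISPN version):

```xml
<infinispan>
  <global>
    <!-- clusterName is an arbitrary example value -->
    <transport clusterName="modeshape-cluster"/>
  </global>
  <namedCache name="binary-store">
    <clustering mode="distribution">
      <sync/>
      <!-- each entry is stored on 2 nodes; other nodes fetch it remotely -->
      <hash numOwners="2"/>
      <!-- L1: keep a local copy after a remote fetch, so the first read
           pays the network cost but subsequent reads are local -->
      <l1 enabled="true" lifespan="600000"/>
    </clustering>
  </namedCache>
</infinispan>
```

Note that L1 entries can be invalidated when the owners update the data, so repeated reads are only "free" while the entry stays valid.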
Thanks for your help; any feedback on such an implementation is welcome.
One thing you have to be aware of regarding Infinispan in general is this issue: [MODE-2420] Modeshape can potentially lose data because Infinispan Cache Stores do not participate in transactions - JBo… (see the discussion and the linked ISPN issues).
In short, if you're planning on clustering ISPN, the only way you get strong consistency (which is mandatory from ModeShape's/JCR's perspective) is if you're using a shared cache store to save all your data. If you're using multiple cache stores, there's always the chance that an unexpected failure will leave your persistent data in an inconsistent state.
I see... Thanks for the warning. A shared cache store also means that if this store fails or is corrupted for any reason, the whole application is down. I am not very comfortable with such an architecture; it is a real problem. If I use a shared cache store, what type of store do you suggest?
You don't have to use a "shared cache". ISPN always has 2 parts: the in-memory cache(s) and the persistent storage (i.e. the cache store). Caches are always local to the node they run in, and they can be configured in either replicated or distributed mode.
The cache stores, on the other hand, can be multiple (e.g. each cluster node has its own cache store) or shared by all the caches (i.e. the shared cache store).
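To make the two clustering modes concrete, here is how they differ in an ISPN 5.x-style configuration (cache names and owner counts are illustrative, not from the thread):

```xml
<!-- Replication: every node holds a full copy of every entry.
     Reads are always local, but every write goes to all nodes. -->
<namedCache name="replicated-cache">
  <clustering mode="replication">
    <sync/>
  </clustering>
</namedCache>

<!-- Distribution: each entry lives on numOwners nodes only.
     Other nodes fetch it over the network on access. -->
<namedCache name="distributed-cache">
  <clustering mode="distribution">
    <sync/>
    <hash numOwners="2"/>
  </clustering>
</namedCache>
```

For a large binary store, replication multiplies the storage and write traffic by the cluster size, which is why distribution is usually the more realistic choice in that scenario.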
If your system is not write-intensive, then having multiple cache stores might work, although it increases the chance that if one write to a cache store fails, the data in the other cache stores becomes inconsistent. A shared cache store, on the other hand, significantly reduces this risk; you can, for example, use a DB as a shared cache store and replicate it offline for backup purposes (an active-passive model).
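As a sketch of that shared-DB setup, a JDBC cache store can be marked `shared="true"` so all cluster nodes persist to the same database. The fragment below follows the Infinispan 5.2 JDBC store schema; the connection URL, credentials, table prefix, and column types are placeholder assumptions, so verify every element and attribute against the schema for your ISPN version:

```xml
<namedCache name="binary-store">
  <clustering mode="distribution">
    <sync/>
    <hash numOwners="2"/>
  </clustering>
  <!-- shared="true": all nodes write to the same DB, giving a single
       authoritative copy of the persisted data -->
  <loaders passivation="false" shared="true">
    <stringKeyedJdbcStore xmlns="urn:infinispan:config:jdbc:5.2"
                          fetchPersistentState="false"
                          purgeOnStartup="false">
      <connectionPool connectionUrl="jdbc:postgresql://db-host/ispn"
                      driverClass="org.postgresql.Driver"
                      username="ispn"
                      password="secret"/>
      <stringKeyedTable prefix="ISPN" createOnStart="true">
        <idColumn name="ID" type="VARCHAR(255)"/>
        <dataColumn name="DATA" type="BYTEA"/>
        <timestampColumn name="TS" type="BIGINT"/>
      </stringKeyedTable>
    </stringKeyedJdbcStore>
  </loaders>
</namedCache>
```

The database itself can then be backed up or replicated with the DB vendor's own tooling (active-passive), keeping ISPN out of the cross-store consistency problem.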
I strongly suggest you also read: Consistency guarantees in Infinispan · infinispan/infinispan Wiki · GitHub
Thanks for your great advice.