3 Replies Latest reply on Feb 26, 2013 10:33 AM by dan.berindei

    Cluster merge strategy for 5.2

    mattw

      Hi,

       

      We have currently implemented a largest cluster wins merge strategy on Infinispan 5.2 so when multiple clusters remerge, the largest dataset is kept. This has the potential for data loss from the smaller clusters. This is no longer satisfactory for our scenario.

       

      ISPN-263 (https://issues.jboss.org/browse/ISPN-263) may fix this by allowing us to provide a custom merge policy. This is slated for version 6.0 and that is too far away for us.

       

      We would like to implement a last write wins policy (We can add timestamps/version numbers to our objects to provide this) so there is no data loss (or at least minimal) on the merge. Is there somewhere in the state transfer where we can override the existing code to cheek for key overwrite and only take our latest copy? We don't have time to implement a full ISPN-263 solution, which would be more than we currently require, so consider this more of a quick fix while we wait for ISPN-263 in version 6.0.

       

      Any help or pointers to code location is gratefully appreciated.

       

      Many thanks,

      Matt

        • 1. Re: Cluster merge strategy for 5.2
          dan.berindei

          Hi Matt!

           

          You might be able to achieve what you want, but you have to start by modifying ClusterTopologyManagerImpl.updateCacheStatusAfterMerge.

           

          What we do now is we get the owners of each segment in every partition, and then we install a CH in which the owners of each segment is the union of the segment in each partition. The effect is that every node thinks it has all the data for the segments that it owned before the merge, and there is basically no state transfer.

           

          What you want is to start a different rebalance in each partition: if node A owned segment 0 in partition 0, and node B owned segment 0 in partition 1, you want A to request segment 0 from B, but you also want B to request segment 0 from A. You can do that, but you'll have to send a different REBALANCE_START command to each partition, instead of sending the same command to the entire cluster.

           

          In theory, you should be then able to enable versioning with your own custom VersionGenerator that uses timestamps, and rely on our versioning support to preserve the newest value. But we haven't actually tested this yet, so I'm not sure it will work without some changes to our versioning support.

           

          If your cache is transactional, you could actually read the local value and check the timestamps in StateConsumerImpl.doApplyState, inside the transaction. It would be a bit of a hack, and it's probably never going to be supported as such in Infinispan. But in your situation it might actually be the safer choice.

           

          Note that we don't create tombstones yet for removed keys, so if there's a conflict between a put and a remove, the put will win every time (even with versioning enabled). It might be ok in your scenario, but you should be aware of it.

          • 2. Re: Cluster merge strategy for 5.2
            mattw

            Hi Dan,

             

            Thanks for getting back to me.

             

            When you say "...there is basically no state transfer", could that be why I'm not seeing any state transfer under 5.2 for a Merge? (See https://community.jboss.org/thread/220771?tstart=0) I don't see DATA_REHASHED events either but you imply that should take place.

             

            From your description, I want the new coordinator to send the rebalance to the other partition and  the old coordinator to send to the partition with the new coordinator? Does that make sense? I can get the details of the two partitions from the Merge/ViewChanged events and I know if I'm the new coordinator but how can I get the old coordinator to send the rebalance the other way?

             

            I've had a look through the ClusterTopologyManagerImpl and the rebalance code but I still can't see where we can access actual cache objects to do the conflict resolution. I expect I'm missing something obvious there but if you could give me another pointer that would be great...

             

            You're right about the deletions, we are more worried about losing data than having deleted data reappear so we can live with that.

             

            Thanks again,

            Matt

            • 3. Re: Cluster merge strategy for 5.2
              dan.berindei

              Sorry for the long delay, Matthew.

               


              When you say "...there is basically no state transfer", could that be why I'm not seeing any state transfer under 5.2 for a Merge? (See https://community.jboss.org/thread/220771?tstart=0) I don't see DATA_REHASHED events either but you imply that should take place.

               

              Yes, that's the reason why you're not seeing the DATA_REHASHED events.

               

              From your description, I want the new coordinator to send the rebalance to the other partition and  the old coordinator to send to the partition with the new coordinator? Does that make sense? I can get the details of the two partitions from the Merge/ViewChanged events and I know if I'm the new coordinator but how can I get the old coordinator to send the rebalance the other way?

               

               

              You always want the latest coordinator to handle topology installation, because otherwise you'd have to deal with the possibility of the old coordinator dying and pick a new "old coordinator". So my idea was for the new coordinator has to start 2 rebalance operations in parallel (although you'd still need lots of changes for that as well, and you'd still need a way to recover the status of both rebalance operations if the new coordinator died).

               

              I've had a look through the ClusterTopologyManagerImpl and the rebalance code but I still can't see where we can access actual cache objects to do the conflict resolution. I expect I'm missing something obvious there but if you could give me another pointer that would be great...

               

              The actual conflict resolution you'd have to modify StateConsumerImpl as well, as that's the component that handles the data received via state transfer (see the doApplyState method).