4 Replies Latest reply on Feb 26, 2013 11:38 AM by Dan Berindei

    State Transfer after cluster merge in Infinispan 5.2 (CR3)

    Matthew Wellard Newbie



      When running under Infinispan 5.1.5 we saw state transfer happening on cluster merge events. This could potentially swap data if the same key existed in both partitions and had been updated in each.


      When running under 5.2 (CR3) I can see the merge event and see the cluster reforming but I'm not seeing any state transfer at all. Data that was changed is only seen by members of the cluster where it was changed. If new clients are started, they correctly join the cluster and get a copy of the data (I can see DataRehashedEvents).


      I can see a lot has changed for state transfer in 5.2. Are there any changes I need in the config to get this to work? Or should it at least behave like 5.1.5 out of the box?


      Many thanks,


        • 1. Re: State Transfer after cluster merge in Infinispan 5.2 (CR3)
          Erik Salter Newbie

          Hi Matt,


          Without knowing too much about your configuration, this appears to be working as designed.  The big difference (and this is a MASSIVE improvement) is non-blocking state transfer.  The state migration is separate from the view change event, whereas in 5.1.x it was "stop the world."  Due to the new segment design, this has the (very welcome) effect of data stickiness -- data is far more likely to stay with the node it was written to than with the virtual node implementation.  (YMMV with different CH Factory implementations)


          So if you're seeing DataRehashed events (two -- one before and one after), the data is accessible from every node in the cluster, and the ownership policy is correct (data replicas == numOwners), you're golden.
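          If it helps to confirm you're seeing both halves of each event, a minimal listener along these lines will log them (the class name and log wording are mine; the annotations are Infinispan's standard cache-listener API -- this is a sketch, not a drop-in from your code):

```java
import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachelistener.annotation.DataRehashed;
import org.infinispan.notifications.cachelistener.annotation.TopologyChanged;
import org.infinispan.notifications.cachelistener.event.DataRehashedEvent;
import org.infinispan.notifications.cachelistener.event.TopologyChangedEvent;

@Listener
public class RehashLogger {

    @DataRehashed
    public void onRehash(DataRehashedEvent<?, ?> event) {
        // Fired once with isPre() == true before state transfer starts,
        // and once with isPre() == false after it completes.
        System.out.println("DATA_REHASHED pre=" + event.isPre());
    }

    @TopologyChanged
    public void onTopologyChange(TopologyChangedEvent<?, ?> event) {
        System.out.println("TOPOLOGY_CHANGED pre=" + event.isPre());
    }
}
```

          Register it with `cache.addListener(new RehashLogger())` and you should see the pre/post pairs described above.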


          See HERE for the NBST design.  If any of the above aren't true, post back here and I'll see about helping you debug.





          • 2. Re: State Transfer after cluster merge in Infinispan 5.2 (CR3)
            Matthew Wellard Newbie

            Hi Erik,


            Thanks for the response. I am seeing some of that, so I'll go through my simple scenario (which I've now also run with 5.2.Final) and hopefully we can figure out what is going wrong.


            Node A has a simple replicated string-based cache with key "a" - value "a".


            Node B starts and joins the cluster, and we get the following events:

              CACHE_ENTRY_CREATED (pre=true, key=a)

                 CACHE_ENTRY_MODIFIED (pre=true)

                 CACHE_ENTRY_MODIFIED (pre=false value=a)

              CACHE_ENTRY_CREATED (pre=false)

              TOPOLOGY_CHANGED (pre=true)

                  DATA_REHASHED (pre=false)

              TOPOLOGY_CHANGED (pre=false)

            Node A has the following events:


              TOPOLOGY_CHANGED (pre=true)

                  DATA_REHASHED (pre=false)

              TOPOLOGY_CHANGED (pre=false)

              TOPOLOGY_CHANGED (pre=true)

                  DATA_REHASHED (pre=false)

              TOPOLOGY_CHANGED (pre=false)


            Node B now also has key="a" - value="a"


            The network is now disconnected between the two...

            Both nodes get:


              TOPOLOGY_CHANGED (pre=true)

              TOPOLOGY_CHANGED (pre=false)

            Both have access to the data no problem.


            On Node A:

              Update key "a" - value "b"

              Add key "b" - value "b"

            On Node B:

               Update key "a" - value "c"

               Add key "c" - value "c"

            Both updated and can see their own data.


            Everything so far is as expected.


            Reconnect the network. Following events seen on both nodes:


              TOPOLOGY_CHANGED (pre=true)

              TOPOLOGY_CHANGED (pre=false)


            There are no DATA_REHASHED events on either node and each node can only see the data it had.

            Attempting, from Node A, to get data that exists only on Node B returns "null".


            I appreciate that conflicting changes to the same key are not resolved in a controlled way, but I'm not seeing transfer of the new data between the two nodes either.


            Adding another node to the cluster, I see DATA_REHASHED events, and the new node gets a copy from only one of the existing nodes, so our cache content is still inconsistent across the cluster.


            I've tried playing with the "stateTransfer" properties in the Infinispan config XML, but saw no change.
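            For reference, the kind of stateTransfer settings I've been toggling look like this (a sketch against the 5.2 configuration schema; the cache name and the timeout/chunkSize values are just placeholders, not recommendations):

```xml
<infinispan>
   <namedCache name="myCache">
      <clustering mode="replication">
         <stateTransfer fetchInMemoryState="true"
                        timeout="240000"
                        chunkSize="512"/>
      </clustering>
   </namedCache>
</infinispan>
```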


            Any help or suggestions gratefully received.







            • 3. Re: State Transfer after cluster merge in Infinispan 5.2 (CR3)
              Mircea Markus Master

              I'd expect a DATA_REHASH to be triggered on a view merge. Dan/Adrian, can you please comment on this?

              • 4. Re: State Transfer after cluster merge in Infinispan 5.2 (CR3)
                Dan Berindei Expert

                Mircea Markus wrote:


                I'd expect a DATA_REHASH to be triggered on a view merge. Dan/Adrian, can you please comment on this?


                No, we don't actually rehash any data on a merge: we assume that every member was up-to-date before the merge and that it doesn't need to receive anything from the other partition. Each segment ends up with num_partitions * num_owners owners, and we just send a CH_UPDATE command to trim the number of owners per segment to num_owners.


                This doesn't really work, except perhaps when the cache is using a shared cache store (though in that scenario, invalidating the entire cache on a merge might be even better).


                One alternative would be to consider the largest partition the "authoritative version", and make the nodes from all the other partitions request data from the largest partition's owners. This would be relatively simple to implement (just send a REBALANCE_START command with currentCH == largest_partition_currentCH and pendingCH = balanced_CH_for_the_merged_cluster), but it would still lose data.


                Another option would be to make each partition exchange data with the other partition, as discussed in https://community.jboss.org/thread/220703. This would be a lot trickier to implement, because we'd have to send a different REBALANCE_START command to each partition (and if the merge coordinator died, we'd have to be able to recover the status of each partition without any information from JGroups). Supporting merges with 3 partitions would be even more complicated.