9 Replies Latest reply on Dec 22, 2006 8:16 AM by belaban

    Buddy state transfer

    vblagojevic

      Currently, in the HEAD branch, it seems that each buddy collects its own state and then sends it across the wire to its buddies in an RPC call.

      Should we change this approach so that, upon receipt of a buddy reconfiguration message, each buddy retrieves state from each of its buddies using regular state transfer rather than RPC?

      I suspect the main problem with the current approach is that the state transfer is not done with the state transfer semantics already in place, i.e. FLUSH and the byte or streaming transfer defined in the configuration file.

      What do you think?

        • 1. Re: Buddy state transfer
          brian.stansberry

          In general, I think shifting to a pull approach using the same process as the other state transfer types is the right thing to do. One set of code to maintain, etc, etc. The following bits look like I'm arguing in the opposite direction, but take that as "food for thought", i.e. inputs into the discussion.

          There are a couple problems with using a regular pull-type state transfer:

          1) Deadlock problems, which is what led last spring to the push approach. The problem is that a new cache C comes online, decides A and B are its buddies, and sends A and B an RPC telling them so. A and B get the RPC and request state. But they are using the JG up_thread to request state, so it deadlocks.

          That's probably solvable, but at the cost of making the buddy group formation protocol more complex.

          2) A and B now have to independently request state rather than C doing a single push. Don't know if there are any issues with the coordination involved there. Now that we are using anycast, the network utilization is likely the same either way.
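The deadlock in point 1 can be illustrated with a small sketch. This is a plain java.util.concurrent analogy, not JGroups code: a fixed-size executor stands in for the thread that delivers messages up the stack, and all class and method names are hypothetical. The handler for the "you are my buddy" RPC blocks waiting for a state response that only the same pool can deliver.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical analogy: the executor plays the role of the message delivery thread(s).
final class UpThreadDeadlockSketch {

    /**
     * Runs a handler on a delivery pool of the given size. The handler requests
     * state and blocks until the response arrives, but the response can only be
     * delivered by the same pool. Returns true if the handler finishes in time.
     */
    static boolean handlerCompletes(int deliveryThreads, long timeoutMillis) {
        ExecutorService delivery = Executors.newFixedThreadPool(deliveryThreads);
        try {
            Future<String> handler = delivery.submit(() -> {
                // The "state response" is queued on the same delivery pool.
                Future<String> stateResponse = delivery.submit(() -> "state-of-C");
                // Blocking here holds a delivery thread until the response is delivered.
                return stateResponse.get();
            });
            handler.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException stuck) {
            // The handler never got its response within the timeout.
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Interrupts any blocked worker so the JVM can exit.
            delivery.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("one delivery thread completes:  " + handlerCompletes(1, 500));
        System.out.println("two delivery threads complete: " + handlerCompletes(2, 500));
    }
}
```

With a single delivery thread the handler waits on work that can never be scheduled, which is the shape of the problem described above. A second thread, or handing the state request off to a separate thread, lets the response through.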

          One thing to think about is whether the JBCACHE-315 issue applies here. Or can it be made to go away without relying on FLUSH? Here the data in question is *owned* by the node that sends it. Anything that changes that state originates on that server, either via a locally originating put/remove call or via a response to a gravitation call that the node sent out.

          • 2. Re: Buddy state transfer
            manik

            I too agree that, from a manageability + intuitiveness standpoint, we should use the same process, but here's more food for thought (hope you're hungry!) :-)

            "bstansberry@jboss.com" wrote:


            2) A and B now have to independently request state rather than C doing a single push. Don't know if there are any issues with the coordination involved there. Now that we are using anycast, the network utilization is likely the same either way.



            Even with anycast, it is not the same. Currently, anycast forces a unicast to the members individually. IIRC, if the transport is UDP and multicast is enabled, this is achieved by multicasting and only named recipients accepting the comms (Bela or Vlad, perhaps you could confirm this?).

            If this is true, then the pull approach is more expensive on UDP/multicast, even with anycast.

            Another cost with the pull approach is the processing time spent building the state transfer payload, plus the concurrency cost on the data owner of locking the tree to generate this payload. This happens once with push, n times with pull.

            Both of these costs grow linearly (O(n)) with the number of buddies, though, so with just 1 buddy per node (the default) pull should not be any more expensive.
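The payload cost argument above can be written down as a tiny sketch. It only counts how often the data owner would have to build a state payload (and lock the tree to do so) under each model. The class, enum, and method names are hypothetical and are not part of JBoss Cache.

```java
// Hypothetical cost model: counts payload builds on the data owner, nothing else.
final class StateTransferCostSketch {

    enum Model { PUSH, PULL }

    /**
     * Number of times the data owner builds a state payload for its buddies.
     * Push builds it once and sends it to everyone; pull builds it once per request.
     */
    static int payloadBuilds(Model model, int buddies) {
        if (buddies < 0) {
            throw new IllegalArgumentException("buddies must not be negative");
        }
        if (buddies == 0) {
            return 0; // nobody to transfer state to
        }
        return model == Model.PUSH ? 1 : buddies;
    }

    public static void main(String[] args) {
        for (int buddies = 1; buddies <= 3; buddies++) {
            System.out.println(buddies + " buddies: push=" + payloadBuilds(Model.PUSH, buddies)
                    + " pull=" + payloadBuilds(Model.PULL, buddies));
        }
    }
}
```

With the default of one buddy per node the two models come out equal, and the gap only opens up as the number of buddies grows.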

            • 3. Re: Buddy state transfer
              manik

              Perhaps the correct approach would be to implement the state transfer subsystem to deal with both push and pull models? Sure, this is a more complex and longer term approach, but I dare say more correct, and offering more flexibility?

              • 4. Re: Buddy state transfer
                brian.stansberry

                When you talk about dealing with both, do you mean in JGroups or in JBC?

                The good news is the actual state transfer code in JBC already deals with both models quite well, except that there is no way to do streaming state transfer via push. As for the FLUSH stuff Vladimir is doing, I've no idea if that could be adapted. But maybe it doesn't need to be, if JBCACHE-315 isn't an issue there.

                • 5. Re: Buddy state transfer
                  belaban

                   

                  "manik.surtani@jboss.com" wrote:

                  IIRC, if the transport is UDP and multicast is enabled, this is achieved by multicasting and only named recipients accepting the comms (Bela or Vlad, perhaps you could confirm this?).


                  No, we always use unicasts. It is up to the caller of RpcDispatcher.callRemoteMethods() to decide whether a multicast or multiple unicasts are used.
                  With TCP as transport, you should of course *always* use anycasting. With UDP (IP multicasting), it depends on the subset to which you want to send a message, e.g.
                  1-3 out of 10: anycast
                  4-10 out of 10: multicast




                  • 6. Re: Buddy state transfer
                    belaban

                    Another issue I see with the state transfer approach is that we need to run the flush protocol *once* for each state transfer. Unless we can have the coordinator start and stop the flush protocol, this might be quite expensive.

                    Regarding deadlocks: we could wait until JGroups 2.5 is in place; with the threadless stack and parallel and out-of-band processing, deadlocks should not occur anymore.

                    • 7. Re: Buddy state transfer
                      manik

                       

                      When you talk about dealing with both, do you mean in JGroups or in JBC?


                      In JBC, using whatever support we can get from JG.

                      • 8. Re: Buddy state transfer
                        manik

                         

                        "bela@jboss.com" wrote:
                        ... With UDP (IP multicasting), it depends on the subset to which you want to send a message, e.g.
                        1-3 out of 10: anycast
                        4-10 out of 10: multicast


                        Is "3" a hard coded limit? Or is this something that is configurable?

                        • 9. Re: Buddy state transfer
                          belaban

                          No, this is just an example: for UDP, if you call RpcDispatcher.callRemoteMethods() with use_anycast=false, I'll use regular multicast. If it is true, I'll use single unicasts *no matter how many members there are in the target list*!

                          So the decision whether to use anycast or not would be made by JBossCache, possibly based on my previous example. Maybe even configurable? Although we already have too many attributes... :-)
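A minimal sketch of how the caller might make that decision, following the "1-3 out of 10" example from earlier in the thread. Everything here is an assumption for illustration: the class name, the method, and the maxAnycastTargets threshold are hypothetical and are not existing JBoss Cache or JGroups attributes. The boolean it returns would simply be what the caller passes as use_anycast.

```java
// Hypothetical helper: decides whether the caller should ask for anycast.
final class AnycastDecisionSketch {

    /**
     * @param tcpTransport      true if the transport is TCP (no IP multicast available)
     * @param targets           number of members the message is addressed to
     * @param clusterSize       total number of members in the cluster
     * @param maxAnycastTargets assumed, configurable cut-off (3 in the thread's example)
     * @return true to use anycast (individual unicasts), false to use multicast
     */
    static boolean useAnycast(boolean tcpTransport, int targets, int clusterSize, int maxAnycastTargets) {
        if (targets < 0 || targets > clusterSize) {
            throw new IllegalArgumentException("targets must be between 0 and clusterSize");
        }
        if (tcpTransport) {
            return true; // with TCP, always anycast
        }
        // With UDP, send unicasts to small subsets and multicast to large ones.
        return targets <= maxAnycastTargets;
    }

    public static void main(String[] args) {
        System.out.println("UDP, 3 of 10:  anycast=" + useAnycast(false, 3, 10, 3));
        System.out.println("UDP, 4 of 10:  anycast=" + useAnycast(false, 4, 10, 3));
        System.out.println("TCP, 10 of 10: anycast=" + useAnycast(true, 10, 10, 3));
    }
}
```

An absolute cut-off is only one option. A fraction of the cluster size would work just as well, and either choice adds one more attribute to configure.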