7 Replies Latest reply on Mar 12, 2008 1:02 PM by brian.stansberry

    Eager state push on failure with Buddy Replication

    manik

      At the moment, when buddy replication is used and a data owner fails, the backup stays on a buddy instance and is only gravitated into an instance's primary tree when someone requests that data.

      It was designed this way to minimise network traffic and load when a server crashes; eagerly reorganising a lot of state across the cluster at that moment could cause a network storm.
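      To make the current lazy behaviour concrete, here is a minimal sketch of on-demand gravitation. It assumes a flat map of string paths in place of the real cache tree and a simplified /_BUDDY_BACKUP/<owner>/... layout; the class and method names are hypothetical and this is not the JBoss Cache API. A lookup that misses in the primary tree searches the backup subtrees and moves the node across only at that point, so nothing is moved while nobody asks for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not JBoss Cache API: a flat map of string paths stands in
// for the cache tree, with backups kept under /_BUDDY_BACKUP/<owner>/...
class LazyGravitationSketch {
    static final String BACKUP_ROOT = "/_BUDDY_BACKUP/";

    final Map<String, Object> tree = new HashMap<>();

    // Returns the value at fqn. On a miss in the primary tree, looks for the same
    // path under any buddy backup subtree and only then moves it into the
    // primary tree ("gravitates" it). Nothing moves until someone asks.
    Optional<Object> get(String fqn) {
        if (tree.containsKey(fqn)) {
            return Optional.ofNullable(tree.get(fqn));   // already owned locally
        }
        String found = null;
        for (String key : tree.keySet()) {
            if (!key.startsWith(BACKUP_ROOT)) continue;
            String rest = key.substring(BACKUP_ROOT.length());   // "<owner>/JSESSION/..."
            int slash = rest.indexOf('/');
            if (slash >= 0 && rest.substring(slash).equals(fqn)) {
                found = key;
                break;
            }
        }
        if (found == null) {
            return Optional.empty();                      // not in any backup either
        }
        Object value = tree.remove(found);                // drop the backup copy
        tree.put(fqn, value);                             // now owned in the primary tree
        return Optional.ofNullable(value);
    }
}
```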

      Despite that, people have asked for an option to eagerly push backup state to a new data owner (or even to have the assigned buddy take on the data as though it were the owner) and to create appropriate backups.

      Here are some thoughts on how this can be implemented:

      * When an instance fails, this option forces the buddy to take ownership of the failed node’s state.
      * The buddy should first wait a defined amount of time, to allow data gravitation calls to move data organically.
      * There is no need to block data gravitation (DG) calls while taking ownership, since DG looks in both the primary and backup trees.
      * Ownership could be taken in chunks to prevent a network storm (since the new owner will also be backing the data up). This would need some additional "hints" as to which subtrees should be considered "related" data, though.
      * When an instance failure is detected, buddies should rename the backup region to prevent the original instance from re-appearing and overwriting backed-up state. For example, rename /_B_B_/CacheA/ to /_B_B_/CacheA:dead_n/, where n is a counter, since A could die and rejoin several times before the state is re-owned.
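      The renaming and chunking points above can be sketched together as follows. This is a minimal, single-process model under stated assumptions: the tree is a sorted map of string paths, the method names onOwnerFailure and takeOwnershipChunk are hypothetical, the grace period is left to the caller, and replicating the migrated state to the new owner's own buddy is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not JBoss Cache API: a sorted map of string paths
// stands in for the cache tree on the surviving buddy.
class DeadBuddyRegionSketch {
    static final String BACKUP_ROOT = "/_BUDDY_BACKUP/";

    final TreeMap<String, Object> tree = new TreeMap<>();
    // Number of times each owner has died so far; supplies the n in dead_n.
    private final Map<String, Integer> deathCounts = new HashMap<>();

    // Called when the owner leaves: renames its live backup region so that a
    // rejoining owner starts a fresh region instead of overwriting this one.
    String onOwnerFailure(String owner) {
        int n = deathCounts.merge(owner, 1, Integer::sum) - 1;   // 0 on first death
        String live = BACKUP_ROOT + owner + "/";
        String dead = BACKUP_ROOT + owner + ":dead_" + n + "/";
        List<String> keys = new ArrayList<>();
        for (String key : tree.keySet()) {
            if (key.startsWith(live)) keys.add(key);
        }
        for (String key : keys) {
            tree.put(dead + key.substring(live.length()), tree.remove(key));
        }
        return dead;
    }

    // Moves at most maxEntries nodes from a dead region into the primary tree and
    // returns how many were moved. The caller would first wait a grace period
    // (so gravitation can move data organically), then call this repeatedly,
    // pausing between calls, until it returns 0.
    int takeOwnershipChunk(String deadRegion, int maxEntries) {
        List<String> batch = new ArrayList<>();
        for (String key : tree.keySet()) {
            if (batch.size() >= maxEntries) break;
            if (key.startsWith(deadRegion)) batch.add(key);
        }
        for (String key : batch) {
            // deadRegion ends with '/', so keep that slash as the primary path's root.
            tree.put(key.substring(deadRegion.length() - 1), tree.remove(key));
        }
        return batch.size();
    }
}
```

      Keeping a per-owner counter rather than a single dead region is what lets A die, rejoin and die again without the second failure clobbering state that was never re-owned after the first.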

      Thoughts and comments? How important/useful do you think this is in the first place?

        • 1. Re: Eager state push on failure with Buddy Replication
          brian.stansberry

          The presence of a region can act as a "hint" about related data: a region root node represents the lowest level of unrelated data, so each child of the root can be treated as an independent unit. For example, say we have a region

          /JSESSION/localhost/webapp1

          In the buddy backup we have:

          /_BUDDY_BACKUP/CacheA:dead_0//JSESSION/localhost/webapp1/session1/attrA
          /_BUDDY_BACKUP/CacheA:dead_0//JSESSION/localhost/webapp1/session1/attrB
          /_BUDDY_BACKUP/CacheA:dead_0//JSESSION/localhost/webapp1/session2/attrA
          /_BUDDY_BACKUP/CacheA:dead_0//JSESSION/localhost/webapp1/session2/attrB
          /_BUDDY_BACKUP/CacheA:dead_0//JSESSION/localhost/webapp1/session3/attrA
          /_BUDDY_BACKUP/CacheA:dead_0//JSESSION/localhost/webapp1/session3/attrB

          The background thread recognizes that the /JSESSION/localhost/webapp1 region exists and therefore iterates over the children of /_BUDDY_BACKUP/CacheA:dead_0//JSESSION/localhost/webapp1/, migrating one child (here, one session) at a time.
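          A minimal sketch of that background pass, assuming the region root is known on the buddy and the tree is a sorted map of string paths. childrenOf, migrateChild and migrateRegion are hypothetical names, and paths here are joined with a single slash. Each direct child of the region (one session with all of its attributes) is moved as one unit.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch, not JBoss Cache API: the region root is used as the
// "related data" hint, and each direct child of it is migrated on its own.
class RegionChunkedMigration {
    final TreeMap<String, Object> tree = new TreeMap<>();

    // Names of the direct children of region under the given dead backup root.
    Set<String> childrenOf(String deadRoot, String region) {
        String prefix = deadRoot + region + "/";
        Set<String> children = new LinkedHashSet<>();
        for (String key : tree.keySet()) {
            if (key.startsWith(prefix)) {
                String rest = key.substring(prefix.length());   // e.g. "session1/attrA"
                int slash = rest.indexOf('/');
                children.add(slash < 0 ? rest : rest.substring(0, slash));
            }
        }
        return children;
    }

    // Moves one child subtree (e.g. session1 and its attributes) into the primary tree.
    int migrateChild(String deadRoot, String region, String child) {
        String backupChild = deadRoot + region + "/" + child;
        List<String> keys = new ArrayList<>();
        for (String key : tree.keySet()) {
            if (key.equals(backupChild) || key.startsWith(backupChild + "/")) {
                keys.add(key);
            }
        }
        for (String key : keys) {
            // Strip the dead backup root to get the primary-tree path.
            tree.put(key.substring(deadRoot.length()), tree.remove(key));
        }
        return keys.size();
    }

    // Background-thread body: migrate one related unit at a time.
    int migrateRegion(String deadRoot, String region) {
        int units = 0;
        for (String child : childrenOf(deadRoot, region)) {   // a snapshot set
            migrateChild(deadRoot, region, child);
            units++;
            // A real implementation might pause here to spread the network load.
        }
        return units;
    }
}
```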

          I recognize this example is very much tailored to my particular use case, but in every JBC app I've written, a region has had that kind of meaning.

          A "structural" node marker can be used instead of a region, and more cleanly indicates the meaning, since the region concept is so overloaded.


          Re: usefulness, I think it's pretty necessary. With buddy groups / data partitions having 2 members by default, one member leaving means only 1 copy of the data remains. Admins have to be very careful 1) to know which node has that backup data and 2) not to remove that node from service until they are sure that data isn't needed any longer, which typically means waiting half an hour or more. That means a simple rolling upgrade of a 4-node cluster takes over 2 hours, which is probably longer than a lot of service windows. Larger clusters take longer still.
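          The rolling-upgrade estimate works out as follows, taking the minimum wait of half an hour per node for a cluster of N nodes and ignoring the time needed to restart each node, which is why the real figure is "over" 2 hours:

```latex
T_{\text{upgrade}} \;\ge\; N \cdot t_{\text{wait}} \;=\; 4 \times 30\ \text{min} \;=\; 120\ \text{min} \;=\; 2\ \text{h}
```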

          • 2. Re: Eager state push on failure with Buddy Replication
            manik

            Are regions replicated to the buddy backup as well? The region marker, I mean. I'm just wondering whether the concept in itself still holds true.

            • 3. Re: Eager state push on failure with Buddy Replication
              brian.stansberry

              I don't understand what you mean by "region marker". Are you referring to

              A "structural" node marker can be used instead of a region, and more cleanly indicates the meaning, since the region concept is so overloaded.
              ?

              If so, AIUI the "structural" node marker concept doesn't really exist yet. There's the "resident" flag, which IIRC is not replicated.

              • 4. Re: Eager state push on failure with Buddy Replication
                manik

                Well, either case really. In the case of the "resident" flag, this is not replicated.

                Even if we use a traditional region (created using the RegionManager), I don't believe it is recreated on the buddy backup instance, since Regions are explicitly created on each instance. I could be wrong, as I haven't checked the code yet, but I'm pretty sure regions aren't reflected in a buddy backup subtree.

                • 5. Re: Eager state push on failure with Buddy Replication
                  brian.stansberry

                  OK, so you're talking about a case where Node B is Node A's buddy, but whatever application created region /JSESSION/localhost/webapp1 on A hasn't been deployed on B.

                  Yeah, that's a problem.

                  If the "resident" flag were replicated, that would be a better solution. Replicating it would be a good thing anyway, although it would add cost to replication/invalidation messages.

                  • 6. Re: Eager state push on failure with Buddy Replication
                    manik

                    The problem with the resident flag is that it could be used for anything, such as arbitrarily marking a node so that it doesn't get evicted.

                    • 7. Re: Eager state push on failure with Buddy Replication
                      brian.stansberry

                      Yes, "structural node" != "resident". I mean a "structural node flag", and I should be disciplined and use the exact terminology. :-) You're right, substituting one concept for the other isn't correct.
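                      To illustrate keeping the two concepts apart, here is a minimal sketch in which "resident" stays a purely local hint (not replicated, as stated in this thread) while a proposed "structural" flag travels with the node's replicated state, so a buddy could find related-data boundaries even when the application that created the region is not deployed there. NodeMeta, Replicated, toMessage and onReceive are hypothetical names; this is not the JBoss Cache API or wire format.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and message layout are illustrative, not JBoss Cache API.
class StructuralFlagSketch {
    // Per-node metadata on one cache instance.
    static final class NodeMeta {
        final boolean resident;    // local hint only, e.g. "don't evict this node here"
        final boolean structural;  // proposed: marks the root of a unit of related data

        NodeMeta(boolean resident, boolean structural) {
            this.resident = resident;
            this.structural = structural;
        }
    }

    // What a replication message would carry for one node.
    static final class Replicated {
        final String fqn;
        final Map<String, Object> data;
        final boolean structural;  // the extra per-message cost of replicating a flag

        Replicated(String fqn, Map<String, Object> data, boolean structural) {
            this.fqn = fqn;
            this.data = data;
            this.structural = structural;
        }
    }

    // Sender: copies the data and the structural flag, but never the resident flag.
    static Replicated toMessage(String fqn, Map<String, Object> data, NodeMeta meta) {
        return new Replicated(fqn, new HashMap<>(data), meta.structural);
    }

    // Receiver (e.g. a buddy): structural survives; resident is a local decision
    // and starts out false, whatever the sender had.
    static NodeMeta onReceive(Replicated msg) {
        return new NodeMeta(false, msg.structural);
    }
}
```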