1 Reply Latest reply on Aug 31, 2010 2:34 PM by timfox

    Backup FSM

    jmesnil

      To get my mind clear on HA failover, I drew the FSM for a backup, both with a shared store and without one.

       

      With shared store configuration, the FSM is:

       

       

                    start                    locked backup                locked live           stop
      [stopped] ----------->[waiting backup]--------------->[main backup]-------------> [live] ------> [stopped]
                                    \                                 \                                ^  ^
                                     \                                 \           stop               /  /
                                      \            stop                 -----------------------------   /
                                       ----------------------------------------------------------------
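The shared-store diagram above can also be written down as a transition function. This is only a minimal sketch: the class, state and event names are hypothetical, chosen to mirror the labels in the drawing, and are not taken from the HornetQ code base. Only the edges that appear in the diagram are encoded.

```java
import java.util.Optional;

/** Hypothetical encoding of the shared-store backup FSM drawn above. */
final class SharedStoreBackupFsm {
    enum State { STOPPED, WAITING_BACKUP, MAIN_BACKUP, LIVE }
    enum Event { START, LOCKED_BACKUP, LOCKED_LIVE, STOP }

    /** Returns the next state, or empty if the diagram has no such edge. */
    static Optional<State> next(State state, Event event) {
        switch (state) {
            case STOPPED:
                return event == Event.START
                        ? Optional.of(State.WAITING_BACKUP) : Optional.empty();
            case WAITING_BACKUP:
                // Either we obtain backup.lock, or we are stopped while waiting.
                if (event == Event.LOCKED_BACKUP) return Optional.of(State.MAIN_BACKUP);
                if (event == Event.STOP) return Optional.of(State.STOPPED);
                return Optional.empty();
            case MAIN_BACKUP:
                // Either we obtain live.lock, or we are stopped while waiting.
                if (event == Event.LOCKED_LIVE) return Optional.of(State.LIVE);
                if (event == Event.STOP) return Optional.of(State.STOPPED);
                return Optional.empty();
            case LIVE:
                return event == Event.STOP
                        ? Optional.of(State.STOPPED) : Optional.empty();
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        State s = State.STOPPED;
        for (Event e : new Event[] {Event.START, Event.LOCKED_BACKUP, Event.LOCKED_LIVE}) {
            s = next(s, e).orElseThrow(IllegalStateException::new);
        }
        System.out.println(s);
    }
}
```

Returning an empty Optional for missing edges keeps illegal transitions (for example LOCKED_LIVE while still waiting for backup.lock) visible to the caller instead of silently ignoring them.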
      
      
      

       

      When a backup is started, it first tries to lock the backup.lock file.

      The first backup to acquire this lock becomes the main backup. All subsequent backups wait to lock the backup.lock file.

      The main backup will go further in its initialization and wait to lock the live.lock file.

      When it locks this file, it becomes the live server until it is stopped.
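The two-step locking described above could be sketched with plain java.nio file locks. This is an illustration only: the class and method names are hypothetical, and the thread does not say how the locks are actually implemented. Releasing backup.lock once live.lock is held is also an assumption of this sketch, made so that the next waiting backup can become the main backup.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of the backup.lock / live.lock sequence. */
final class SharedStoreLocks {

    /** Blocks until an exclusive lock on the given file is obtained. */
    static FileLock lockFile(Path file) throws IOException {
        FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        // Blocks while another process holds a lock on this file.
        return channel.lock();
    }

    /** Runs the backup sequence and returns the live lock once this node is live. */
    static FileLock becomeLive(Path sharedDir) throws IOException {
        // Step 1: the first node to lock backup.lock is the main backup.
        FileLock backupLock = lockFile(sharedDir.resolve("backup.lock"));
        // Step 2: the main backup waits here until live.lock is free.
        FileLock liveLock = lockFile(sharedDir.resolve("live.lock"));
        // Assumption (not stated in the thread): free backup.lock for the next backup.
        backupLock.release();
        backupLock.channel().close();
        return liveLock;
    }
}
```

Note that Java file locks are held on behalf of the whole JVM: a second attempt to lock the same file from the same JVM throws OverlappingFileLockException instead of blocking, so the waiting behaviour only shows up between separate server processes. Locking behaviour is also platform-dependent.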

       

      Without shared store, the FSM is a bit more complex:

       

       

                                          sync w/ live
                                         --------------
                    start               /              V
      [stopped] ----------->[queuing up]           [syncing with live]
                              ^        ^               /  /     
                               \        \  queue up   /  /
                                \        ------------   /
                                 \                     /        
                          another \                   / synced      
                            backup \                 / with live
                            becomes \               /         
                                live \             /     
                                      \           V    becomes live          stop
                                     [backup active] ---------------> [live] ---> [stopped]
                                             \                                      ^
                                              \              stop                  /
                                               -----------------------------------   
      

       

      When a backup is started, it synchronizes with the live server in two phases: it first queues up resources (without taking a lock on the live server), then goes through a sync phase.

      Once the backup is fully synced, it becomes "active" (not the right word...).

      If the live server stops or crashes and another backup becomes live, the backup will sync again with this new live server.

      If the backup is the one which becomes live, it will remain live until it is stopped.
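The second FSM above, including the resync loop when another backup becomes live, can be captured as a transition table. As before, this is a hedged sketch: all names are hypothetical and follow the labels in the drawing, and only the edges shown in the diagram are included (so stop is only accepted from the "backup active" and "live" states).

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical transition table for the backup FSM without shared store. */
final class ReplicatedBackupFsm {
    enum State { STOPPED, QUEUING_UP, SYNCING_WITH_LIVE, BACKUP_ACTIVE, LIVE }
    enum Event { START, SYNC_WITH_LIVE, QUEUE_UP, SYNCED_WITH_LIVE,
                 ANOTHER_BACKUP_BECOMES_LIVE, BECOMES_LIVE, STOP }

    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    private static void edge(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    static {
        edge(State.STOPPED, Event.START, State.QUEUING_UP);
        edge(State.QUEUING_UP, Event.SYNC_WITH_LIVE, State.SYNCING_WITH_LIVE);
        edge(State.SYNCING_WITH_LIVE, Event.QUEUE_UP, State.QUEUING_UP);
        edge(State.SYNCING_WITH_LIVE, Event.SYNCED_WITH_LIVE, State.BACKUP_ACTIVE);
        // A different backup took over: start syncing again with the new live.
        edge(State.BACKUP_ACTIVE, Event.ANOTHER_BACKUP_BECOMES_LIVE, State.QUEUING_UP);
        edge(State.BACKUP_ACTIVE, Event.BECOMES_LIVE, State.LIVE);
        edge(State.BACKUP_ACTIVE, Event.STOP, State.STOPPED);
        edge(State.LIVE, Event.STOP, State.STOPPED);
    }

    /** Applies the events in order; fails on an edge the diagram does not contain. */
    static State run(State start, List<Event> events) {
        State current = start;
        for (Event event : events) {
            Map<Event, State> edges =
                    TABLE.getOrDefault(current, Collections.<Event, State>emptyMap());
            State next = edges.get(event);
            if (next == null) {
                throw new IllegalStateException("No edge for " + event + " in " + current);
            }
            current = next;
        }
        return current;
    }
}
```

For example, feeding START, SYNC_WITH_LIVE, SYNCED_WITH_LIVE, ANOTHER_BACKUP_BECOMES_LIVE into run(State.STOPPED, ...) leaves the backup in QUEUING_UP again, which is the "sync again with this new live" case described above.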

        • 1. Re: Backup FSM
          timfox

          Shared store looks fine.

           

          Don't worry about the replicated store yet. It's not part of the task you're currently doing; initially we're just getting shared store working.