Backup FSM
jmesnil Aug 31, 2010 10:44 AM
To get my mind clear on HA failover, I drew the FSM for a backup, both without a shared store and with a shared store.
With a shared-store configuration, the FSM is:
             start                       locked backup                  locked live           stop
[stopped] -----------> [waiting backup] ---------------> [main backup] -------------> [live] ------> [stopped]
                               \                                \                                       ^ ^
                                \                                \              stop                   / /
                                 \  stop                          ------------------------------------- /
                                  ----------------------------------------------------------------------
When a backup is started, it first tries to lock the backup.lock file.
The first one to do this is the main backup. All subsequent backups wait to lock this backup.lock file.
The main backup then continues its initialization and waits to lock the live.lock file.
When it locks this file, it becomes the live server and stays live until it is stopped.
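The steps above can be sketched in Java as a small transition function plus two blocking file locks. This is only an illustration under stated assumptions, not HornetQ code: the class `SharedStoreBackup`, its enums, and its methods are hypothetical names, and using `java.nio` `FileChannel.lock()` on backup.lock and live.lock is my guess at one way to get the "wait until locked" behaviour. Holding both locks until stop is also an assumption, since the post does not say when backup.lock is released.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of the shared-store backup FSM; not HornetQ code. */
class SharedStoreBackup {
    enum State { STOPPED, WAITING_BACKUP, MAIN_BACKUP, LIVE }
    enum Event { START, LOCKED_BACKUP, LOCKED_LIVE, STOP }

    /** Returns the next state, or throws if the diagram has no such arrow. */
    static State next(State s, Event e) {
        switch (s) {
            case STOPPED:
                if (e == Event.START) return State.WAITING_BACKUP;
                break;
            case WAITING_BACKUP:
                if (e == Event.LOCKED_BACKUP) return State.MAIN_BACKUP;
                if (e == Event.STOP) return State.STOPPED;
                break;
            case MAIN_BACKUP:
                if (e == Event.LOCKED_LIVE) return State.LIVE;
                if (e == Event.STOP) return State.STOPPED;
                break;
            case LIVE:
                if (e == Event.STOP) return State.STOPPED;
                break;
        }
        throw new IllegalStateException("no transition from " + s + " on " + e);
    }

    private State state = State.STOPPED;
    private FileLock backupLock;
    private FileLock liveLock;

    State state() { return state; }

    /** Blocks until this process holds an exclusive lock on the file. */
    private static FileLock lockFile(Path file) throws IOException {
        FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        return ch.lock();
    }

    /** Walks STOPPED -> WAITING_BACKUP -> MAIN_BACKUP -> LIVE, blocking on each lock. */
    void start(Path sharedDir) throws IOException {
        state = next(state, Event.START);
        backupLock = lockFile(sharedDir.resolve("backup.lock"));
        state = next(state, Event.LOCKED_BACKUP);
        liveLock = lockFile(sharedDir.resolve("live.lock"));
        state = next(state, Event.LOCKED_LIVE);
    }

    /** Assumption: both locks are kept until stop, then released together. */
    void stop() throws IOException {
        state = next(state, Event.STOP);
        if (liveLock != null) liveLock.channel().close();     // closing the channel releases the lock
        if (backupLock != null) backupLock.channel().close();
        liveLock = null;
        backupLock = null;
    }
}
```

Because `start` blocks the calling thread, the two "stop" arrows that leave [waiting backup] and [main backup] are only represented in `next`; a real server would need a way to interrupt the wait.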
Without a shared store, the FSM is a bit more complex:
[stopped]            -- start -->                        [queuing up]
[queuing up]         -- sync w/ live -->                 [syncing with live]
[syncing with live]  -- synced with live -->             [backup active]
[backup active]      -- another backup becomes live -->  [queuing up]
[backup active]      -- becomes live -->                 [live]
[backup active]      -- stop -->                         [stopped]
[live]               -- stop -->                         [stopped]
(a further arrow labelled "queue up" also points into [queuing up])
When a backup is started, it synchronizes with the live server by queuing up resources (without taking a lock on the live server) and then going through a sync phase.
Once the backup is fully synced, it becomes "active" (not the right word...).
If the live server stops or crashes and another backup becomes live, it syncs again with this new live server.
If the backup is the one that becomes live, it remains live until it is stopped.
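For comparison, here is a table-driven sketch of this second FSM (the backup that syncs with the live server). The class `SyncingBackupFsm` and the enum names are hypothetical and only mirror the labels in the drawing. Two points are my reading rather than something the post states outright: that "another backup becomes live" sends the backup back to [queuing up] so it can sync again, and that the extra "queue up" arrow can be left out of the table.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch of the syncing (non-shared-store) backup FSM. */
class SyncingBackupFsm {
    enum State { STOPPED, QUEUING_UP, SYNCING_WITH_LIVE, BACKUP_ACTIVE, LIVE }
    enum Event {
        START, SYNC_WITH_LIVE, SYNCED_WITH_LIVE,
        ANOTHER_BACKUP_BECOMES_LIVE, BECOMES_LIVE, STOP
    }

    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    private static void arrow(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    static {
        arrow(State.STOPPED, Event.START, State.QUEUING_UP);
        arrow(State.QUEUING_UP, Event.SYNC_WITH_LIVE, State.SYNCING_WITH_LIVE);
        arrow(State.SYNCING_WITH_LIVE, Event.SYNCED_WITH_LIVE, State.BACKUP_ACTIVE);
        // Assumption: syncing again with the new live server restarts at QUEUING_UP.
        arrow(State.BACKUP_ACTIVE, Event.ANOTHER_BACKUP_BECOMES_LIVE, State.QUEUING_UP);
        arrow(State.BACKUP_ACTIVE, Event.BECOMES_LIVE, State.LIVE);
        arrow(State.BACKUP_ACTIVE, Event.STOP, State.STOPPED);
        arrow(State.LIVE, Event.STOP, State.STOPPED);
    }

    /** Returns the next state, or throws if the table has no such arrow. */
    static State next(State from, Event on) {
        Map<Event, State> row = TABLE.get(from);
        State to = (row == null) ? null : row.get(on);
        if (to == null) {
            throw new IllegalStateException("no transition from " + from + " on " + on);
        }
        return to;
    }

    /** Applies the events in order, starting from the given state. */
    static State run(State start, Event... events) {
        State s = start;
        for (Event e : events) {
            s = next(s, e);
        }
        return s;
    }
}
```

Keeping the arrows in a table makes it easy to see what the drawing leaves open: for example, there is no "stop" arrow out of [queuing up] or [syncing with live], so `next` rejects STOP in those states.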