Shared store and file locks
jmesnil Sep 2, 2010 9:33 AM
I am testing shared store with real file locks.
First case:
- node0 server is started
=> create live.lock
=> lock it
=> node 0 is live
- node1 server is started
=> wait to lock live.lock
=> node1 is backup waiting to failover
Now what happens when node0 is stopped?
Currently, it deletes the file and unlocks it.
Since node0 no longer holds a lock on it, node1 locks it in turn and becomes "live".
But the file is no longer there!
When node0 is restarted (still with a live server configuration), it does not check whether the file exists
but always recreates it and locks it
=> node0 is "live" at the same time as node1!
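
The double-live situation can be reproduced in a single process. The sketch below is only an illustration under assumptions, not HornetQ code: the class StaleLockDemo and the method bothLocked are hypothetical names, and it relies on POSIX-style behaviour, where a deleted file that is still open keeps its lock and a file recreated at the same path is a different file. On other platforms (Windows in particular) the delete or the second open may fail instead.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of HornetQ.
class StaleLockDemo {

    // Returns true if a lock on a recreated live.lock can be taken while
    // an earlier lock on the deleted live.lock is still held.
    static boolean bothLocked(Path lockFile) throws IOException {
        try (FileChannel first = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // Plays the role of node1: it holds the lock on the original file.
            FileLock firstLock = first.lock();

            // Plays the role of node0 stopping: the file is deleted.
            Files.delete(lockFile);

            // Plays the role of node0 restarting: recreate the file and lock it.
            try (FileChannel second = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                FileLock secondLock = second.tryLock();
                return firstLock.isValid()
                        && secondLock != null
                        && secondLock.isValid();
            } finally {
                Files.deleteIfExists(lockFile);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempDirectory("shared-store").resolve("live.lock");
        System.out.println("two locks held at once: " + bothLocked(file));
    }
}
```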
To fix this, I'll change the algorithm:
Shared Live Activation
* in SharedStoreLiveActivation.run(), if there is *already* a live.lock file, we wait until it no longer exists
=> this way, a live node will not start if a backup has failed over and become live
* in SharedStoreLiveActivation.close(), we still delete the file and unlock
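
A minimal sketch of the live side, assuming java.nio file locks. LiveLockSketch, activate, isLive and the polling interval are hypothetical names chosen for illustration, not the actual SharedStoreLiveActivation code. Using CREATE_NEW to close the gap between the existence check and the creation is my own assumption, not something stated above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not the real SharedStoreLiveActivation.
class LiveLockSketch implements AutoCloseable {
    private final Path lockFile;
    private FileChannel channel;
    private FileLock lock;

    LiveLockSketch(Path lockFile) {
        this.lockFile = lockFile;
    }

    // Waits while live.lock already exists, then creates it and locks it.
    void activate(long pollMillis) throws IOException, InterruptedException {
        while (channel == null) {
            while (Files.exists(lockFile)) {
                Thread.sleep(pollMillis); // another node is live, keep waiting
            }
            try {
                // Assumption: CREATE_NEW fails if someone created the file
                // between the check above and this call.
                channel = FileChannel.open(lockFile,
                        StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            } catch (FileAlreadyExistsException e) {
                // Lost the race, go back to waiting.
            }
        }
        lock = channel.lock();
    }

    boolean isLive() {
        return lock != null && lock.isValid();
    }

    // Deletes the file and unlocks, so no live.lock is left behind.
    @Override
    public void close() throws IOException {
        if (channel == null) {
            return;
        }
        Files.deleteIfExists(lockFile);
        if (lock != null) {
            lock.release();
        }
        channel.close();
        lock = null;
        channel = null;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempDirectory("shared-store").resolve("live.lock");
        try (LiveLockSketch live = new LiveLockSketch(file)) {
            live.activate(100);
            System.out.println("live: " + live.isLive());
        }
        System.out.println("live.lock left behind: " + Files.exists(file));
    }
}
```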
Shared Backup Activation
* in SharedStoreBackupAction.run(), we wait to lock the live.lock file. When the method returns, we check whether the file still
exists. If it does not, we lock again to recreate the file (that's ok, we already hold the lock)
=> this way, a backup node which has failed over will have the lock and the file will still exist
* in SharedStoreBackupAction.close(), if we unlock live.lock (i.e. the node became live), we also make sure to delete the file.
=> whether we stop a live node or a backup node which has failed over, there must be no live.lock file once the server is stopped
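
The backup side could be sketched along the same lines. BackupLockSketch and its methods acquire, ensureFileExists and holdsLock are hypothetical names, not the real backup activation class. The sketch assumes java.nio file locks and a file system that allows live.lock to be recreated while the old, deleted file is still open (POSIX-style behaviour).

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not the real backup activation class.
class BackupLockSketch implements AutoCloseable {
    private final Path lockFile;
    private FileChannel channel;
    private FileLock lock;

    BackupLockSketch(Path lockFile) {
        this.lockFile = lockFile;
    }

    // Blocks until the lock on live.lock is obtained.
    void acquire() throws IOException {
        channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        lock = channel.lock(); // waits while a live node in another process holds it
    }

    // If the live node deleted live.lock on shutdown, recreate it and lock
    // the new file before letting go of the old lock.
    void ensureFileExists() throws IOException {
        if (Files.exists(lockFile)) {
            return;
        }
        FileChannel fresh = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock freshLock = fresh.lock();
        lock.release();
        channel.close();
        channel = fresh;
        lock = freshLock;
    }

    boolean holdsLock() {
        return lock != null && lock.isValid();
    }

    // If this node holds the lock (it became live), delete the file too.
    @Override
    public void close() throws IOException {
        if (lock != null) {
            Files.deleteIfExists(lockFile);
            lock.release();
            lock = null;
        }
        if (channel != null) {
            channel.close();
            channel = null;
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempDirectory("shared-store").resolve("live.lock");
        try (BackupLockSketch backup = new BackupLockSketch(file)) {
            backup.acquire();
            backup.ensureFileExists();
            System.out.println("failed over, holds lock: " + backup.holdsLock());
        }
        System.out.println("live.lock left behind: " + Files.exists(file));
    }
}
```

Splitting acquire from ensureFileExists is only a convenience for the sketch; in the description above both steps happen inside run().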