2PC replication and recovery
timfox May 30, 2006 6:45 AMHi Chaps-
I was interesting in how you've approached the problem of 2PC replication and recovery in JBoss Cache, since I am currently approaching an analagous problem and looking at ways to solve it.
We have a master node, and X replica nodes which we can failover over on to, each node can have it's own, non-shared persistent store (if it's shared the problem becomes a lot easier but we are considering the non shared case).
Something happens (e.g. we add a reliable jms message) to the master node and we want to replicate that state change to each of the replica nodes in a reliable way, so we can guarantee they all get it or none of them get it.
We can't tolerate a situation where one of the nodes gets the update but another crashes before it records it's update resulting in different nodes have different states whent they are brought back up.
Each nodes persistent state is stored in it's persistent store.
It seems some kind of 2PC can be used to solve this problem (I think you have done something similar?).
So first we multicast a "prepare" to all the nodes, then if all nodes respond "ok" we multicast a "commit". otherwise we multicast a "rollback". This is all pretty standard 2PC stuff.
There are a couple of edge conditions though:
1) We multicast the prepare, and at least one of the participants says "no" (it's crashed say), so we multicast the "rollback" but the crashed node doesn't get it since it's crashed. So when it is brought up (maybe days later) it has some state hanging in a prepared state which needs to be rolledback. How does it know to rollback the update?
2) We multicast the prepare, and all partipants say "yes". The commit is then multicasted, some of them manage to commit, but a node crashes before it's commit is processed. When the node comes back it, it needs to process the commit, but how does it know to commit or rollback the update.
3) Other related edge cases, e.g. all nodes crash before commit is processed.
So in other words, to handle all these edge cases properly, the co-ordinator needs to have a recovery protocol and a persistent transaction log.
I.e. basically it needs to have much of the functionality of a recovering transaction manager as JBoss Transactions.
I'm wondering does the JBoss Cache co-ordinator handle these edge cases, or perhaps you have found some other way around this?
In JBoss Messaging we need to be able to provide a very strong reliability guarantee, so we need to cope with the recovery cases.