Overview of HA / Failover design approach in JBoss Messaging 1.2
We need to maintain a cluster-wide consistent mapping of node -> List corresponding to the unacked messages in each of the sessions on the connection that failed. Based on this mapping, the server recreates the delivery list in the server consumer delegates.
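As a rough illustration of the bookkeeping described above, the following Java sketch keeps, for each node, the unacked message ids of each session, and hands them over when a node fails. It is only a sketch under stated assumptions: UnackedRegistry, recordDelivery, acknowledge and takeOver are hypothetical names, not JBoss Messaging classes or methods; message ids are assumed to be longs and session ids strings; and the map lives in a single JVM, so the cluster-wide replication that the design requires is not modelled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from JBoss Messaging. */
final class UnackedRegistry {
    // nodeId -> (sessionId -> ids of delivered but unacked messages, in delivery order)
    private final Map<Integer, Map<String, List<Long>>> byNode = new LinkedHashMap<>();

    /** Records that a message was delivered to a session on the given node. */
    synchronized void recordDelivery(int nodeId, String sessionId, long messageId) {
        byNode.computeIfAbsent(nodeId, n -> new LinkedHashMap<>())
              .computeIfAbsent(sessionId, s -> new ArrayList<>())
              .add(messageId);
    }

    /** Forgets a message once the session has acknowledged it. */
    synchronized void acknowledge(int nodeId, String sessionId, long messageId) {
        Map<String, List<Long>> sessions = byNode.get(nodeId);
        if (sessions == null) {
            return;
        }
        List<Long> ids = sessions.get(sessionId);
        if (ids != null) {
            ids.remove(Long.valueOf(messageId)); // remove by value, not by index
        }
    }

    /**
     * Removes the failed node's entry and returns its per-session unacked ids,
     * from which the failover node could rebuild its delivery lists.
     */
    synchronized Map<String, List<Long>> takeOver(int failedNodeId) {
        Map<String, List<Long>> sessions = byNode.remove(failedNodeId);
        if (sessions == null) {
            return new LinkedHashMap<>();
        }
        return sessions;
    }
}
```

In this sketch the failover node would call takeOver with the failed node's id and replay each returned list, in order, into the matching session's delivery list.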
If the failed node is subsequently resurrected, it is not a simple matter to move the connections back to the original node, since there may be unacknowledged messages in live sessions. If we move the connections, then any non-persistent messages might get redelivered.
Therefore we can only safely move connections back if there are no unacked messages in any session.
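The rule above amounts to a simple check before failing back. The sketch below assumes the unacked state is available as a map from session id to the list of unacked message ids; FailbackCheck and canMoveBack are hypothetical names used only for illustration, not part of JBoss Messaging.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the fail-back precondition; not a JBoss Messaging API. */
final class FailbackCheck {
    private FailbackCheck() {
    }

    /**
     * Returns true only if no session has an unacked message, in which case
     * moving the connections back cannot cause redelivery.
     */
    static boolean canMoveBack(Map<String, List<Long>> unackedBySession) {
        for (List<Long> ids : unackedBySession.values()) {
            if (!ids.isEmpty()) {
                return false; // at least one unacked message: not safe to move
            }
        }
        return true; // also true when there are no sessions at all
    }
}
```

A connection with no sessions passes the check trivially, which seems the natural reading of the rule.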
This is probably part of a bigger question of how we redistribute connections over many nodes when we suddenly add a lot of nodes to the cluster.
For the first pass we should probably not bother, since this is tricky. However, if we want to spread load smoothly and automatically, and to benefit from adding new nodes to a cluster that already has connections, we should consider this.
We should also consider being able to bring down a node smoothly from the management console without losing sessions - i.e. move them transparently to another node. Again this is not a high priority but something to think about.