That is being improved in the next release.
For now, you need to copy the data manually after failover, for example by redirecting the traffic on that node to another node in the cluster and then restarting the whole pair.
I know this is not ideal for GA, but it is being worked on at the moment.
I re-read the doc section, but I am still confused about how exactly we should fail clients back. As far as I can see, the doc says very little about it.
Should we shut down the clients, copy the data from the backup node to the primary node, and then start the clients back up, expecting them to connect to the primary?
If these nodes are participating in a cluster and we cleanly shut down the backup after a primary has failed catastrophically, will the clients cleanly try to reconnect to another node in the cluster? If that were possible, I think it would solve most problems, because it would give us time to move the data from backup to primary without client downtime while still remaining ACID.
I don't think you are suggesting that we move the data files while clients are still connected to the backup. I don't see how we would ever get an accurate snapshot of the state, because objects would still be shifting around.
Any help would be appreciated.
I think this is the statement I don't fully understand:
"redirecting the traffic on that node to another node on the cluster"
How do you administratively direct the clients to another node in the cluster while the applications are still running? I think that is all I am really missing.
There is currently no way to reinstate a live node with a new backup node while it is running. So, if a node fails over to its backup, that backup node becomes live and has to run for some time with no backup, until you can take it down and bring it back up with a backup. (Actually, that is no different from ActiveMQ.)
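To make that lifecycle concrete, here is a toy sketch in Python. It is purely illustrative: the `Pair` class, its methods, and the node names are made up for this post and are not part of any broker API. It only models the states described above, assuming the one rule that a backup cannot be attached while the live node is running.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pair:
    """Toy model of a live/backup pair (hypothetical, not a real broker API)."""
    live: Optional[str]
    backup: Optional[str]

    def fail_live(self) -> None:
        # The live node dies: the backup is promoted and now runs unprotected.
        if self.backup is None:
            raise RuntimeError("no backup available: the pair is down")
        self.live, self.backup = self.backup, None

    def attach_backup(self, node: str, running: bool) -> None:
        # Current limitation: a backup cannot be added to a running live node.
        if running:
            raise RuntimeError("live node must be stopped before adding a backup")
        self.backup = node

    @property
    def protected(self) -> bool:
        # The pair is protected only when both roles are filled.
        return self.live is not None and self.backup is not None


pair = Pair(live="node-A", backup="node-B")
pair.fail_live()
print(pair.live, pair.protected)               # node-B False
pair.attach_backup("node-A", running=False)    # only possible after a stop
print(pair.live, pair.backup, pair.protected)  # node-B node-A True
```

The window between `fail_live` and `attach_backup` is the period in which the promoted node runs with no backup.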
Before GA we're going to be making some significant changes in the area of replication and failover, and I hope to address this.
I agree that we really need a seamless process for adding a backup to a live node so we can continue with zero downtime after failure.
Thanks for the information. This is currently not a show stopper for us; I just wanted to fully understand my options. Since we have no requirement to use this right now, I think we will just come back to it after GA.