1 Reply Latest reply on Oct 28, 2009 7:00 AM by timfox

HA documentation

jmesnil Oct 28, 2009 6:28 AM

I'm rewriting the HA chapter of the user manual to take into account the changes made to failover & replication.

The outline looks like:

* Live-Backup Pair
 * Configuring Live-Backup Pair
* HA Modes
 * Data Replication
  * Configuration
  * Synchronize Live-Backup Pairs
 * Shared Store
  * Configuration
  * Synchronize Live-Backup Pairs
* Queue Activation Timeout
* Client Failover
 * Configuring Clients For Automatic Failover
 * Transacted Session & Failover
* Application-Level Client Failover

I need a few clarifications to be sure the new doc is correct for the final implementation.

* Shared store configuration

To configure shared store, both servers must set <shared-store>true</shared-store> and point to the same data location (on a shared file system). Is there anything else to configure?

* Sync live-backup with shared store mode

No sync is required, as both servers use the same store. However, after failover, the backup server must be taken down at the first opportunity, and then the live server and the backup server must be restarted, in that order.
It would be possible to restart the previous live server as the backup, but this would require configuration changes on both servers (flag the previous live as backup, make sure the current live has a backup-connector, etc.). What should we suggest? Stopping the backup, then restarting the live and then the backup, is simpler.

* Data replication configuration

As it is the default configuration, there is nothing to do

* Sync live-backup with data replication

The procedure is the same as in 2.0.Beta5, right? After failover, stop the backup, copy the backup journal to the live, and restart the live and backup servers.

* Split-brain

As failover is activated by a client connecting to the backup server, we need to make clear under which conditions split-brain occurs and what the consequences are (especially when sharing the store).

Do you see other things that I need to add to the documentation?

1. Re: HA documentation

timfox Oct 28, 2009 7:00 AM (in response to jmesnil)
"jmesnil" wrote:
I'm rewriting the HA chapter of the user manual to take into account the changes made to failover & replication.

The outline looks like:

* Live-Backup Pair
 * Configuring Live-Backup Pair
* HA Modes
 * Data Replication
  * Configuration
  * Synchronize Live-Backup Pairs
 * Shared Store
  * Configuration
  * Synchronize Live-Backup Pairs
* Queue Activation Timeout

Queue activation timeout doesn't exist any more

* Client Failover
 * Configuring Clients For Automatic Failover
 * Transacted Session & Failover
* Application-Level Client Failover

Basically there are 3 modes now:

1) 100% transparent re-attach - this only works when reconnecting to the same node (since we don't replicate any more). For this, the channel needs to maintain the confirmation buffer. (This will be clearer in my next commit)

2) Automatic failover - the sessions/consumers are recreated but session state does not survive. The session can be transacted or non-transacted. For transacted sessions, exceptions are thrown on commit. Reconnect attempts must be -1 or >1 for this to occur.

3) "Application level failover". This is when the user manually recreates the connections/sessions via an ExceptionListener. Normally reconnect attempts would be zero for this.
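For mode 2 with a transacted session, the application has to be ready for a commit that fails after failover and redo the whole transaction. A minimal sketch of that retry loop follows. It uses no HornetQ or JMS types: `CommitFailedException` and `TransactedWork` are hypothetical stand-ins for whatever exception the client actually throws and for the application's own send/acknowledge/commit code.

```java
final class TransactedRetry {

    /** Hypothetical stand-in for the exception a commit throws after failover. */
    static final class CommitFailedException extends RuntimeException {
        CommitFailedException(String message) {
            super(message);
        }
    }

    /** Hypothetical unit of work: all sends/acks of one transaction, then commit. */
    interface TransactedWork {
        void runAndCommit();
    }

    /**
     * Runs the work and repeats it from the start whenever the commit is rejected.
     * Session state did not survive failover, so partial work cannot be resumed.
     * Returns the number of attempts that were needed.
     */
    static int runWithRetry(TransactedWork work, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        CommitFailedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                work.runAndCommit();
                return attempt;
            } catch (CommitFailedException e) {
                last = e; // transaction was lost; redo it on the next pass
            }
        }
        throw last; // every attempt failed; surface the last failure
    }
}
```

The work is repeated as a whole because a failed commit leaves nothing to resume; the cap on attempts is an arbitrary choice for the sketch.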

Also, in the current code, the ExceptionListener *always* gets called irrespective of whether the code automatically reconnected.
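Since the listener fires even after an automatic reconnect, a handler that rebuilds connections unconditionally could replace a connection that is still good. A minimal sketch of a guarded handler follows. Everything in it is hypothetical: the `Connection` interface and its `isUsable()` check are placeholders rather than JMS or HornetQ API, and the factory is assumed to throw a `RuntimeException` when it cannot connect.

```java
import java.util.function.Supplier;

final class AppLevelFailover {

    /** Hypothetical minimal view of a connection; not a JMS or HornetQ type. */
    interface Connection {
        boolean isUsable();
    }

    private final Supplier<Connection> factory; // hypothetical: creates a new connection
    private final int maxAttempts;
    private Connection current;

    AppLevelFailover(Supplier<Connection> factory, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.factory = factory;
        this.maxAttempts = maxAttempts;
        this.current = factory.get();
    }

    synchronized Connection current() {
        return current;
    }

    /**
     * Intended to be called from the exception listener callback.
     * Returns true if a new connection was created, false if the existing
     * one was still usable (for example after an automatic reconnect).
     */
    synchronized boolean onException(Exception cause) {
        if (current != null && current.isUsable()) {
            return false; // already reconnected; nothing to rebuild
        }
        RuntimeException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                current = factory.get();
                return true;
            } catch (RuntimeException e) {
                last = e; // try again until the attempts run out
            }
        }
        throw new IllegalStateException(
                "could not reconnect after " + maxAttempts + " attempts", last);
    }
}
```

With reconnect attempts set to zero the guard never triggers and the handler always rebuilds, which matches mode 3 as described above.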

I need a few clarifications to be sure the new doc is correct for the final implementation.

* Shared store configuration

To configure shared store, both servers must set <shared-store>true</shared-store> and point to the same data location (on a shared file system). Is there anything else to configure?

* Sync live-backup with shared store mode

No sync is required as they use the same store.

However, after failover, the backup server must be taken down at the first opportunity, and then the live server and the backup server must be restarted, in that order.
It would be possible to restart the previous live server as the backup, but this would require configuration changes on both servers (flag the previous live as backup, make sure the current live has a backup-connector, etc.). What should we suggest? Stopping the backup, then restarting the live and then the backup, is simpler.

For shared store, backup-connector is not used - that's just used for replicating the store.

* Data replication configuration

As it is the default configuration, there is nothing to do

* Sync live-backup with data replication

The procedure is the same as in 2.0.Beta5, right? After failover, stop the backup, copy the backup journal to the live, and restart the live and backup servers.

No, clebert is working on this now, it will be automatic.

* Split-brain

As failover is activated by a client connecting to the backup server, we need to make clear under which conditions split-brain occurs and what the consequences are (especially when sharing the store).

Do you see other things that I need to add to the documentation?

Split brain is currently a TODO