1 Reply Latest reply on Oct 28, 2009 7:00 AM by timfox

    HA documentation

    jmesnil

      I'm rewriting the HA chapter of the user manual to take into account the changes made to failover & replication.

      The outline looks like:

      * Live-Backup Pair
       * Configuring Live-Backup Pair
      * HA Modes
       * Data Replication
         * Configuration
         * Synchronize Live-Backup Pairs
       * Shared Store
         * Configuration
         * Synchronize Live-Backup Pairs
      * Queue Activation Timeout
      * Client Failover
       * Configuring Clients For Automatic Failover
       * Transacted Session & Failover
      * Application-Level Client Failover
      


      I need a few clarifications to be sure the new doc is correct for the final implementation.

      * Shared store configuration

      To configure shared store, both servers must set <shared-store>true</shared-store> and point to the same data location (on a shared file system). Is there anything else to configure?
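A minimal sketch of the check described above. It assumes a hypothetical `ServerConfig` holder with a `sharedStore` flag and a `journalDirectory` path. These names are illustrative only and are not HornetQ's configuration API. The sketch accepts a pair only if both servers enable shared store and both resolve to the same data location.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical holder for the two settings discussed above; not the HornetQ config API.
final class ServerConfig {
    final boolean sharedStore;     // mirrors <shared-store>true</shared-store>
    final Path journalDirectory;   // data location on the shared file system

    ServerConfig(boolean sharedStore, String journalDirectory) {
        this.sharedStore = sharedStore;
        // Normalize so that "a/x/../b" and "a/b" compare equal.
        this.journalDirectory = Paths.get(journalDirectory).toAbsolutePath().normalize();
    }
}

final class SharedStoreCheck {
    // True only if live and backup both enable shared store
    // and point at the same (normalized) data location.
    static boolean isValidSharedStorePair(ServerConfig live, ServerConfig backup) {
        return live.sharedStore
            && backup.sharedStore
            && live.journalDirectory.equals(backup.journalDirectory);
    }
}
```

Comparing normalized path strings is only a rough proxy. Two hosts can mount the same share under different paths, so a real check would need to compare the underlying store itself.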

      * Syncing a live-backup pair in shared store mode

      No sync is required, as both servers use the same store. However, after failover, the backup server must be taken down at the first opportunity, and then the live server and the backup server must be restarted, in that order.
      It would be possible to restart the previous live server as a backup, but this would require configuration changes to both servers (flag the previous live server as a backup, make sure the current live server has a backup-connector, etc.). What should we suggest? Stopping the backup, then restarting the live server followed by the backup, is simpler.
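The simpler option can be sketched as a fixed restart order. `Broker` is a hypothetical lifecycle handle and not a HornetQ type. The sketch also assumes that nothing else touches the shared store while the servers restart.

```java
// Hypothetical lifecycle handle for a server process; not a HornetQ API.
interface Broker {
    void stop();
    void start();
}

final class SharedStoreRecovery {
    // After failover the backup is the server running on the shared store.
    // Take it down first, bring the original live server back, and only then
    // restart the backup so that it comes up in its backup role again.
    static void restoreOriginalRoles(Broker live, Broker backup) {
        backup.stop();
        live.start();
        backup.start();
    }
}
```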

      * Data replication configuration

      As it is the default configuration, there is nothing to do.

      * Sync live-backup with data replication

      The procedure is the same as in 2.0.Beta5, right? After failover, stop the backup, copy the backup journal to the live server, then restart the live and backup servers.

      * Split-brain

      As failover is triggered by a client connecting to the backup server, we need to make clear under which conditions split brain occurs and what the consequences are (especially when sharing the store).

      Do you see other things that I need to add to the documentation?

        • 1. Re: HA documentation
          timfox

           

          "jmesnil" wrote:
          I'm rewriting the HA chapter of the user manual to take into account the changes made to failover & replication.

          The outline looks like:

          * Live-Backup Pair
           * Configuring Live-Backup Pair
          * HA Modes
           * Data Replication
             * Configuration
             * Synchronize Live-Backup Pairs
           * Shared Store
             * Configuration
             * Synchronize Live-Backup Pairs
          * Queue Activation Timeout
          



          Queue activation timeout doesn't exist any more


          * Client Failover
           * Configuring Clients For Automatic Failover
           * Transacted Session & Failover
          * Application-Level Client Failover
          



          Basically there are 3 modes now:

          1) 100% transparent re-attach: this only works when reconnecting to the same node (since we no longer replicate). For this, the channel needs to maintain the confirmation buffer. (This will be clearer in my next commit.)

          2) Automatic failover: the sessions/consumers are recreated, but session state does not survive. This can be transacted or non-transacted. For non-transacted, exceptions are thrown on commit. Reconnect attempts must be -1 or >1 for this to occur.

          3) "Application-level failover": this is when the user manually recreates the connections/sessions via an ExceptionListener. Normally, reconnect attempts would be zero for this.
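The reconnect-attempts rules for modes 2 and 3 can be written as a pair of predicates. `ReconnectPolicy` is a hypothetical helper and not a HornetQ class. It encodes only the values stated in this reply and applies ">1" literally, so a value of exactly 1 does not count as enabling automatic failover.

```java
// Hypothetical helper; only the numeric rules come from the thread.
final class ReconnectPolicy {
    // Mode 2 (automatic failover): reconnect attempts must be -1 or > 1.
    static boolean allowsAutomaticFailover(int reconnectAttempts) {
        return reconnectAttempts == -1 || reconnectAttempts > 1;
    }

    // Mode 3 (application-level failover): reconnect attempts are normally 0,
    // so the client does not retry by itself and user code takes over.
    static boolean isTypicalForApplicationLevel(int reconnectAttempts) {
        return reconnectAttempts == 0;
    }
}
```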

          Also, in the current code, the ExceptionListener *always* gets called irrespective of whether the code automatically reconnected.
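A rough sketch of application-level failover follows, loosely modeled on the JMS `ExceptionListener` pattern. `FailureListener`, `ClientConnection`, `ConnectionFactory` and `ApplicationFailover` are hypothetical stand-ins so the sketch compiles without a JMS provider. The `automaticReconnectEnabled` flag is also an assumption. It reflects the note that the listener is called whether or not the client reconnected automatically, so the listener must not rebuild a connection the client has already recovered.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins, loosely shaped like javax.jms.ExceptionListener / Connection.
interface FailureListener {
    void onFailure(Exception cause);
}

interface ClientConnection {
    void setFailureListener(FailureListener listener);
    void close();
}

interface ConnectionFactory {
    ClientConnection connect() throws Exception;
}

final class ApplicationFailover implements FailureListener {
    private final ConnectionFactory backupFactory;      // factory aimed at the backup server
    private final boolean automaticReconnectEnabled;    // assumed known from client config
    // Held in an AtomicReference so other threads see the replacement connection.
    private final AtomicReference<ClientConnection> active = new AtomicReference<>();

    ApplicationFailover(ConnectionFactory backupFactory, boolean automaticReconnectEnabled) {
        this.backupFactory = backupFactory;
        this.automaticReconnectEnabled = automaticReconnectEnabled;
    }

    void install(ClientConnection connection) {
        active.set(connection);
        connection.setFailureListener(this);
    }

    ClientConnection current() {
        return active.get();
    }

    @Override
    public void onFailure(Exception cause) {
        if (automaticReconnectEnabled) {
            // The listener fires even after automatic reconnection, so leave
            // the already-recovered connection alone.
            return;
        }
        try {
            active.get().close();  // best effort; the old connection is likely dead
        } catch (RuntimeException ignored) {
        }
        try {
            install(backupFactory.connect());
            // Recreate sessions, producers and consumers on the new connection here.
        } catch (Exception e) {
            throw new IllegalStateException("failover to backup failed", e);
        }
    }
}
```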


          I need a few clarifications to be sure the new doc is correct for the final implementation.

          * Shared store configuration

          To configure shared store, both servers must set <shared-store>true</shared-store> and point to the same data location (on a shared file system). Is there anything else to configure?

          * Syncing a live-backup pair in shared store mode

          No sync is required, as both servers use the same store.

          However, after failover, the backup server must be taken down at the first opportunity, and then the live server and the backup server must be restarted, in that order.
          It would be possible to restart the previous live server as a backup, but this would require configuration changes to both servers (flag the previous live server as a backup, make sure the current live server has a backup-connector, etc.). What should we suggest? Stopping the backup, then restarting the live server followed by the backup, is simpler.


          With shared store, the backup-connector is not used; it is only used for replicating the store.


          * Data replication configuration

          As it is the default configuration, there is nothing to do.

          * Sync live-backup with data replication

          The procedure is the same as in 2.0.Beta5, right? After failover, stop the backup, copy the backup journal to the live server, then restart the live and backup servers.


          No, Clebert is working on this now; it will be automatic.



          * Split-brain

          As failover is triggered by a client connecting to the backup server, we need to make clear under which conditions split brain occurs and what the consequences are (especially when sharing the store).

          Do you see other things that I need to add to the documentation?


          Split brain is currently a TODO.