11 Replies Latest reply on Oct 29, 2009 10:03 AM by clebert.suconic

    Starting a backup for a live node?

    clebert.suconic

      jmesnil, jbossfox: my question is: what do we want to do in terms of backup/live replication?

      Do we want to be able to add a backup to any live node, or should it always be pre-configured?

      The way I thought about it:

      - You have nodes A and B. B is the backup for A.
      - A fails, and B comes online.
      - B keeps looking for A to come back.
      - As soon as A is back, B transfers/copies the data to A.
      - Now A is the replica.
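The failback sequence described above can be sketched as a small state machine. This is only an illustration under assumed names: `Role`, `PairedNode`, `onPeerDown` and `onPeerBack` are hypothetical, not HornetQ classes, and the data transfer is reduced to copying an in-memory journal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical roles for a pre-configured live/backup pair (not HornetQ API).
enum Role { LIVE, BACKUP, DOWN }

// Hypothetical node model: each node knows its peer and holds a journal of messages.
class PairedNode {
    final String name;
    Role role;
    PairedNode peer;
    final List<String> journal = new ArrayList<>();

    PairedNode(String name, Role role) {
        this.name = name;
        this.role = role;
    }

    // Called when this node detects that its peer has failed.
    void onPeerDown() {
        peer.role = Role.DOWN;
        if (role == Role.BACKUP) {
            role = Role.LIVE; // B takes over as live
        }
    }

    // Called when the failed peer comes back: copy our data to it and make it the replica.
    void onPeerBack() {
        if (role != Role.LIVE || peer.role != Role.DOWN) {
            return; // nothing to fail back
        }
        peer.journal.clear();
        peer.journal.addAll(journal); // transfer/copy the data to the returning node
        peer.role = Role.BACKUP;      // the returning node is now the replica
    }
}
```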

      Now, if we want to be able to add a backup node to *any* live node, we would need management operations to add that configuration.


        • 1. Re: Starting a backup for a live node?
          clebert.suconic

          I'm fine either way. I just want to know how we want to accomplish this: pre-configured, or being able to add a backup to any node.

          • 2. Re: Starting a backup for a live node?
            timfox

            How can it be pre-configured?

            Say you have nodes A and B, where B is the backup of A.

            Node A knows who its backup is, by virtue of the backup-connector-ref element.

            The backup *does not know* who its live node is.

            If node A now dies, node B will become live.

            Note that node B *does not have any backup-connector-ref*.

            So I don't see how it can be pre-configured like you suggest.
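Tim's point can be illustrated with a toy model of the two server configurations. `NodeConfig`, `activate` and `hasBackup` are hypothetical names, and the `backupConnectorRef` field only stands in for the backup-connector-ref element; this is not HornetQ's configuration API.

```java
import java.util.Optional;

// Hypothetical, simplified view of the HA-related settings of one server.
final class NodeConfig {
    final String name;
    final boolean backup;                      // the "this server is a backup" flag
    final Optional<String> backupConnectorRef; // stand-in for backup-connector-ref

    NodeConfig(String name, boolean backup, Optional<String> backupConnectorRef) {
        this.name = name;
        this.backup = backup;
        this.backupConnectorRef = backupConnectorRef;
    }

    // When a backup activates it becomes live, but keeps its own (empty) reference.
    NodeConfig activate() {
        return new NodeConfig(name, false, backupConnectorRef);
    }

    // A live server only has a failover target if it names a backup connector.
    boolean hasBackup() {
        return !backup && backupConnectorRef.isPresent();
    }
}
```

With A configured as `new NodeConfig("A", false, Optional.of("B"))` and B as `new NodeConfig("B", true, Optional.empty())`, `b.activate().hasBackup()` is false: after failover, B is live with no backup, which is exactly the gap described above.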

            • 3. Re: Starting a backup for a live node?
              clebert.suconic

              I can think of two ways to solve this:

              I - Pre-configured.

              1.1 The server would only connect to a pre-configured node.

              # On server A:
              The backup-connector-ref points to ServerB. ServerB is the backup.

              # On server B:
              The backup-connector-ref points to ServerA. That config won't take effect until B is activated.

              1.2 The server would keep trying to connect to its backup. As soon as the backup is connected, the data is copied and that node becomes operational.
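Option 1.2 amounts to a retry loop against a single pre-configured target. The sketch below assumes hypothetical callbacks for the connection attempt and the full data copy; `BackupConnector` and `connectAndSync` are illustrative names, not HornetQ API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of option 1.2 (not HornetQ code): the live server only ever
// tries its one pre-configured backup, and copies its data once the backup answers.
final class BackupConnector {
    private final BooleanSupplier tryConnect; // true once the pre-configured backup accepts
    private final Runnable copyData;          // full copy of the live data to the backup
    private final long retryIntervalMillis;

    BackupConnector(BooleanSupplier tryConnect, Runnable copyData, long retryIntervalMillis) {
        this.tryConnect = tryConnect;
        this.copyData = copyData;
        this.retryIntervalMillis = retryIntervalMillis;
    }

    // Returns the attempt number that succeeded, or -1 if all attempts failed
    // or the thread was interrupted while waiting between attempts.
    int connectAndSync(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryConnect.getAsBoolean()) {
                copyData.run(); // as soon as the backup is connected, copy the data
                return attempt;
            }
            try {
                Thread.sleep(retryIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return -1;
            }
        }
        return -1;
    }
}
```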


              II - Management Operation

              2.1 An administrator could connect through management and call a method such as configureBackup(String connector).

              2.2 When the method is called, the live node sends all its data to the backup, and that node becomes the backup.
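Option II could look roughly like this. Only the `configureBackup(String connector)` signature comes from the proposal above; `LiveServerControl`, `store`, the connector-to-store map and any connector names are hypothetical, and "send all the data" is modelled as a list copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical management operation for option II (only the method signature comes from the thread).
interface BackupManagement {
    void configureBackup(String connector);
}

// Hypothetical live server: a connector name resolves to a remote backup's store.
final class LiveServerControl implements BackupManagement {
    private final List<String> journal = new ArrayList<>();
    private final Map<String, List<String>> reachableBackups; // connector -> backup's store
    private String backupConnector;                            // null until configured

    LiveServerControl(Map<String, List<String>> reachableBackups) {
        this.reachableBackups = reachableBackups;
    }

    // Persist a message locally and, once a backup is configured, replicate it.
    synchronized void store(String message) {
        journal.add(message);
        if (backupConnector != null) {
            reachableBackups.get(backupConnector).add(message);
        }
    }

    // 2.1/2.2: called by an administrator; sends all current data, then keeps replicating.
    @Override
    public synchronized void configureBackup(String connector) {
        List<String> backupStore = reachableBackups.get(connector);
        if (backupStore == null) {
            throw new IllegalArgumentException("unknown connector: " + connector);
        }
        backupStore.clear();
        backupStore.addAll(journal); // full initial sync
        backupConnector = connector;
    }

    synchronized String backupConnector() {
        return backupConnector;
    }
}
```

Doing the full copy and the switch to incremental replication under the same lock means no message stored concurrently can fall between the two.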


              If we decide on option II, how can we propagate the backup change to the clients?

              • 4. Re: Starting a backup for a live node?
                timfox

                When the live node dies, it might have blown up, so you cannot assume it is available for the new live node to use as its backup.

                • 5. Re: Starting a backup for a live node?
                  clebert.suconic

                   

                  Regarding "it might have blown up": I thought about that, but I assumed the user would be able to deploy a similar node.

                  OK, then we will need a management operation. That means option II.


                  I'm assuming it is already possible to update the ConnectionFactory at the client side. Do you have any pointers on how this is currently done?

                  • 6. Re: Starting a backup for a live node?
                    timfox

                     

                    "clebert.suconic@jboss.com" wrote:
                    I'm assuming it is already possible to update the ConnectionFactory at the client side. Do you have any pointers on how this is currently done?


                    See the section on discovery in the user manual.

                    • 7. Re: Starting a backup for a live node?
                      clebert.suconic

                      I don't see the backupConnectoFactory being updated anywhere.

                      It seems there is also some work to be done in the failover code to make this work properly.

                      • 8. Re: Starting a backup for a live node?
                        clebert.suconic

                        By backupConnectoFactory, I mean FailoverManager::backupConnectorFactory, backupTransports, etc.

                        It seems the backup can only be initialized at startup. Right now, updates to the CF only affect load balancing. (Maybe I'm missing something here?)
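To contrast a backup fixed at startup with one that can change at runtime, the client-side failover target could be held in an `AtomicReference` and swapped when a topology update arrives. `FailoverTarget` and `ClientFailoverState` are hypothetical names; this is not how HornetQ's FailoverManager is implemented.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical pair of connectors a client uses: the live server and its backup.
final class FailoverTarget {
    final String liveConnector;
    final String backupConnector; // null when there is no backup

    FailoverTarget(String liveConnector, String backupConnector) {
        this.liveConnector = liveConnector;
        this.backupConnector = backupConnector;
    }
}

// Hypothetical client-side state. Unlike a field assigned once at startup,
// the target can be replaced atomically when the server announces a new backup.
final class ClientFailoverState {
    private final AtomicReference<FailoverTarget> target;

    ClientFailoverState(FailoverTarget initial) {
        target = new AtomicReference<>(initial);
    }

    FailoverTarget current() {
        return target.get();
    }

    // Apply a topology update: same live server, new backup.
    void updateBackup(String newBackupConnector) {
        target.updateAndGet(t -> new FailoverTarget(t.liveConnector, newBackupConnector));
    }

    // The live server failed: the backup becomes the live target, with no backup until
    // one is announced. Returns the connector to reconnect to, or null if there is none.
    String failOver() {
        FailoverTarget previous = target.getAndUpdate(old ->
                old.backupConnector == null ? old : new FailoverTarget(old.backupConnector, null));
        return previous.backupConnector;
    }
}
```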

                        • 9. Re: Starting a backup for a live node?
                          clebert.suconic

                          BTW: It should also be possible to add a backup to a live node that uses shared storage, so this feature won't be exclusive to replication.

                          • 10. Re: Starting a backup for a live node?
                            jmesnil

                            Can we step back and agree on what the end game is here?

                            I need to document the tasks related to HA and failover, and regardless of the implementation, I am not even sure what we want to achieve here.
                            As far as I understand, we need to provide the ability to add a backup to a running live server on the fly.
                            This is the fundamental step in going from a working HA env, through failover, to another working HA env.

                            1. The user starts with live server A and backup server B.
                            2. Live server A crashes.
                            => clients fail over to server B and activate it.
                            3. There is no longer any HA, as server B is the only server up and running.

                            What operations must HornetQ/admins perform to get back to an HA environment?

                            * this depends on the type of HA (replicated store vs shared store)
                            * this should take into account the clients (the clients must be aware of the new HA)
                            - this will depend on the way clients are informed of servers (discovery groups vs static connectors)

                            At step 3, the admin must:
                            * start a new backup server C (it might not be a good idea to restart server A, see below)
                              - copy server B's config and flag server C as backup
                              - if server B is using a shared store, server C must also use it
                              - start server C
                            * update the configuration on running server B:
                              - if server B is using a shared store, nothing to do
                              - if server B has a replicated store:
                                - add a connector to server C (management operation)
                                - sync server B's data to C (depending on the HA mode; triggered by a management operation)
                            * inform the clients of the new HA env (B is live, C is backup)
                              - modify B's hornetq-jms.xml for JMS with JNDI resources
                              - broadcast the new config
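The step-3 procedure branches on the store type and on how clients learn about servers. The sketch below encodes it as a checklist; the enums and `RestoreHaPlan.steps` are hypothetical, and mapping discovery groups to "broadcast" and static connectors to "update the JMS config" is an assumption about how the two client-notification sub-steps divide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enums for the two dimensions the procedure depends on.
enum StoreType { SHARED, REPLICATED }
enum ClientDiscovery { DISCOVERY_GROUP, STATIC_CONNECTORS }

final class RestoreHaPlan {
    // Ordered admin steps to restore HA after B has taken over from A, with C as the new backup.
    static List<String> steps(StoreType store, ClientDiscovery discovery) {
        List<String> steps = new ArrayList<>();
        steps.add("copy server B config to server C and flag C as backup");
        if (store == StoreType.SHARED) {
            steps.add("point server C at server B's shared store");
        }
        steps.add("start server C");
        if (store == StoreType.REPLICATED) {
            // Shared store: nothing to change on B. Replicated store: B needs a connector and a sync.
            steps.add("add a connector to server C on running server B (management operation)");
            steps.add("sync server B data to server C (management operation)");
        }
        if (discovery == ClientDiscovery.DISCOVERY_GROUP) {
            steps.add("broadcast the new pair (B live, C backup)");
        } else {
            steps.add("update the JMS connection factory config on server B (B live, C backup)");
        }
        return steps;
    }
}
```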

                            Am I missing any steps?
                            When all these operations are performed, HornetQ will again offer a HA env.

                            A few things to mention:

                            * difference between server B's running config and its configuration file:
                              - in the configuration file, server B is flagged as backup and has no connector to server C
                              - running server B is a live server and has a connector to server C
                              => restarting server B without changing its config file will break the HA env
                            * do not use server A as server B's backup
                              - there could be failed-over clients that are configured to connect to server A<->live and server B<->backup;
                                if these clients are restarted in the new HA env (A<->backup, B<->live), they'll connect to server A first,
                                activate it, and trigger a split brain
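The split-brain hazard can be reduced to a toy model. It assumes, as in step 2 above, that a client connecting to a backup activates it; `Server`, `StaleClient` and `SplitBrainCheck` are hypothetical names, not HornetQ classes.

```java
import java.util.List;

// Hypothetical server: a backup becomes live when a client connects to it (as in step 2).
final class Server {
    final String name;
    boolean live;

    Server(String name, boolean live) {
        this.name = name;
        this.live = live;
    }

    void acceptClient() {
        live = true; // connecting activates a backup; a live server stays live
    }
}

// Hypothetical client that connects to the first server in its (possibly stale) configuration.
final class StaleClient {
    static Server connect(List<Server> configuredOrder) {
        Server first = configuredOrder.get(0);
        first.acceptClient();
        return first;
    }
}

final class SplitBrainCheck {
    // More than one live server in the same HA pair is a split brain.
    static boolean isSplitBrain(List<Server> servers) {
        return servers.stream().filter(s -> s.live).count() > 1;
    }
}
```

A client restarted with the old order (A, B) picks A, activates it while B is still live, and `isSplitBrain` then reports true.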




                            • 11. Re: Starting a backup for a live node?
                              clebert.suconic

                               




                              What you describe here is exactly what we are doing. I'm not restarting A anymore.

                              My problem is with this step:

                              - add a connector to server C (management operation).

                              The clients connected to B will need to be informed about a new backup server (B->C). I believe that would involve changes to the current Failover code.
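On the server side, informing the clients could take the form of a push to every connected session once the management operation has added the backup. `BackupAnnouncer`, `register` and `announceBackup` are hypothetical names, and each session is modelled as a `BiConsumer` receiving the (live, backup) connector pair; this is not the existing HornetQ failover code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical server-side announcer: when a backup is added to a running live
// server, every connected client session is told the new (live, backup) pair.
final class BackupAnnouncer {
    private final String liveConnector;
    private final List<BiConsumer<String, String>> sessions = new CopyOnWriteArrayList<>();
    private volatile String backupConnector; // null until a backup is added

    BackupAnnouncer(String liveConnector) {
        this.liveConnector = liveConnector;
    }

    // Register a connected client; it immediately learns the current pair.
    void register(BiConsumer<String, String> session) {
        sessions.add(session);
        session.accept(liveConnector, backupConnector);
    }

    // Called after the "add a connector to server C" management operation succeeds.
    void announceBackup(String newBackupConnector) {
        backupConnector = newBackupConnector;
        for (BiConsumer<String, String> session : sessions) {
            session.accept(liveConnector, newBackupConnector);
        }
    }
}
```

A session that registers while an announcement is in flight may be told the new pair twice, but it cannot miss it, so client-side handling of the update should be idempotent.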