4 Replies Latest reply on Oct 11, 2012 11:27 AM by borges

    Replication in hornetq 2.3.0 beta

    qtm

      Hi,

       

      I'm testing replication and I have 2 servers on my localhost for this test. If I stop the live server, the backup becomes live and replication takes place. Everything as expected so far. But when the live server restarts, the backup (now live) transfers the journal to the live server and then shuts itself down. It goes down without an error, with the usual shutdown message:

      INFO  [org.hornetq.core.server] HQ111004: HornetQ Server version 2.3.0.BETA1 (HornetQ sting, 122) [5d35fd3f-12a1-11e2-b231-edb2c3349cd7] stopped.

      If the live goes down again, there is no back-up available. I've attached my config files. Is there something wrong?

       

      Also, after each transfer, the old journal is still kept in a different folder. Over time, this could cause some size problems on the disk.

       

       

      Thanks

        • 1. Re: Replication in hornetq 2.3.0 beta
          qtm

          Hi,

          Is this a known issue or is just my config messed up?

          • 2. Re: Replication in hornetq 2.3.0 beta
            gaohoward

            Sounds to me like a problem. If you can reproduce it, do you mind creating a Jira?

            • 3. Re: Replication in hornetq 2.3.0 beta
              clebert.suconic

              Also, after each transfer, the old journal is still kept in a different folder. Over time, this could cause some size problems on the disk.

               

              That is by design. It requires an admin to delete the old journal in order to avoid losing data.

              Maybe we could/should log an INFO message asking the admin to verify whether these files can be removed.

              • 4. Re: Replication in hornetq 2.3.0 beta
                borges

                qtm wrote:

                 

                Hi,

                Is this a known issue or is just my config messed up?

                This is a known issue.

                 

                We haven't gotten around to adding support for reverting the old-backup-now-live server to a backup after the live is manually restarted. Ideally the two servers should switch roles without having to sync any more data. The locking is a little tricky, so it hasn't been implemented.

                 

                Some comments:

                 

                • When you restart the original live and it kicks the original backup from its place (we call that fail-back), this only works if fail-back has been allowed at the backup server. So while it might be surprising, the backup configuration is allowing it through

                            <allow-failback>true</allow-failback>

                • The replication code is new, and we can't allow it to go deleting all the local journals you have whenever you start a backup. I understand that you may start accumulating data that has been moved out of the way. We do log a warning, so that users are (somehow) notified of it. I really do not want to add a configuration option to delete these files blindly: users would copy configuration files around without checking all the values, then start the wrong server, and that server would delete precious data. I am much more afraid of this than of 'moved out' files accumulating and using disk space. I know this is not ideal, and I am open to suggestions about how to deal with this elegantly.
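
                For reference, here is roughly where the fail-back setting from the first comment sits in the backup's hornetq-configuration.xml. This is only a sketch of a fragment, not a complete file: it assumes a replication (non-shared-store) setup like the one described in this thread, and the surrounding elements should be checked against the documentation for your exact version.

                ```xml
                <!-- backup server: hornetq-configuration.xml (fragment only) -->
                <configuration xmlns="urn:hornetq">
                   <!-- assumption: this server starts as a backup and replicates, with no shared store -->
                   <backup>true</backup>
                   <shared-store>false</shared-store>

                   <!-- true lets a restarted live server take its place back (fail-back) -->
                   <allow-failback>true</allow-failback>
                </configuration>
                ```

                With this set to true, the shutdown of the backup-now-live server after the original live comes back is the behaviour described above.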