11 Replies Latest reply on Apr 27, 2010 2:50 AM by BJ Chippindale

    Sync  Live-Backup pairs inside JBoss

    BJ Chippindale Master

      If I have a live-backup pair set up inside JBoss AS (can I even do this? Clustering I am sure of, but this? One of my co-workers wants me to, and I haven't tried it yet), how would I go about forcing a sync before starting JBoss back up?

       

      I am assuming that the JBoss AS is taken down completely on both nodes.

       

      (The standalone HornetQ looks SO much easier...)

       

      Thanks

      BJ

        • 1. Re: Sync  Live-Backup pairs inside JBoss
          Clebert Suconic Master

          HornetQ inside the application server is pretty much the same as standalone. We are just sharing the VM with the application server, while in standalone mode HornetQ runs alone.

           

          You should probably just sync the data directories, /default/data/journal and bindings, between live and backup.

          • 2. Re: Sync  Live-Backup pairs inside JBoss
            BJ Chippindale Master

            OK... tar the ../data/journal from the live to the backup.   That's clear.  

             

            Could you expand on "sync the bindings?"  

             

            The connectors would all be the same except for the asymmetry that one is the "backup", which is what I thought of first when you said bindings... or do you mean the JMS queues and topics? That must be it. I only know about the location of the JMS definitions in conf; I haven't looked at that data directory yet.

             

            Thanks

            BJ

            • 3. Re: Sync  Live-Backup pairs inside JBoss
              Clebert Suconic Master
              Under data you will see the bindings directory, which is the journal for the binding definitions. That's all I meant: the data directories.
              • 5. Re: Sync  Live-Backup pairs inside JBoss
                BJ Chippindale Master

                Well, everything was OK until we decided to use this config in anger.

                 

                We have a couple of "issues" that complicate the arrangement. However, it isn't doing what I'd expect.

                 

                Our sequence is to copy (using scp -r) from the live to the backup. We get:

                hornetq-data-904.hq
                hornetq-data-906.hq
                hornetq-data-903.hq
                hornetq-data-902.hq
                hornetq-data-901.hq
                hornetq-data-905.hq
                hornetq-data-900.hq
                hornetq-bindings-1.bindings
                hornetq-bindings-2.bindings

                 

                Then we start the backup and, within about 10 to 30 seconds, the live server.

                 

                This worked reliably for a week.  Then we went live and within a day.....

                 

                We get the message on the live server that the backup could not start because the data differs, and neither actually starts correctly. I think this is because of the complication: there are applications that require durable connections to topics (which they can't share between the two nodes), as the architecture does not yet include a bridge to a queue to ensure the correct distribution.

                 

                So instead they normally start the two up and then shut down applications on the backup server.

                 

                Which raises the following question.

                 

                If we had a failover event (possible in some part of the tests we have been doing), where would we find out about it? I am not seeing any indication in the logs of which is live and which is backup. How do I know?

                 

                 

                ???

                 

                respectfully

                BJ

                • 6. Re: Sync  Live-Backup pairs inside JBoss
                  Tim Fox Master

                  This is probably happening because a connection has failed from your live server to your backup server. Once that has happened the live server data is "invalid" and cannot be restarted.

                   

                  Alternatively, this can be caused by a misconfigured client connecting directly to the backup server, which would trigger the backup server to activate.

                   

                  Have a look for "Activating backup server" in the logs of the backup server.
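
A minimal sketch of that log check, assuming the backup's log is a plain text file you can read directly. The function name and the `markers` parameter are hypothetical helpers, not part of HornetQ; the only message text used is the one quoted above, and its exact wording may differ between HornetQ versions.

```python
def find_failover_markers(log_path, markers=("Activating backup server",)):
    """Return (line_number, marker, line) for every log line containing a marker.

    The default marker is the message quoted in this thread; pass other
    fragments in `markers` to search for additional messages.
    """
    hits = []
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            for marker in markers:
                if marker in line:
                    hits.append((number, marker, line.rstrip("\n")))
    return hits
```

Run it against the backup server's log file (the location depends on your JBoss installation). An empty result means the marker was not found in that file, which is not proof that no failover happened if the logs have rotated.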

                  • 7. Re: Sync  Live-Backup pairs inside JBoss
                    BJ Chippindale Master

                    So the first would be flagged by:

                     

                    WARN [org.hornetq.core.replication.impl.ReplicationManagerImpl]  Connection to the backup node failed, removing replication now

                     

                     

                    ?

                     

                    This being the message I see in the log of the live server.

                     

                    However, it would not fail immediately, but only when a restart is attempted.    

                     

                    Which leaves the question of how to recover. 

                     

                    Syncing the backup server to the live and restarting with the backup first (and thus available) did not do the job.

                     

                    Syncing the live to the backup seems counterintuitive. Wouldn't the backup going offline mean that it is the one not keeping up?

                     

                    I had to set the live node to "normal" mode, leaving live-backup off for the moment, in order to start that node.

                     

                    I believe I will be able to restart the live-backup configuration now, but what is the "right" way to do this? 

                     

                    Thanks

                     

                    BJ

                    • 8. Re: Sync  Live-Backup pairs inside JBoss
                      Tim Fox Master

                      BJ Chippindale wrote:

                       

                      So the first would be flagged by:

                       

                      WARN [org.hornetq.core.replication.impl.ReplicationManagerImpl]  Connection to the backup node failed, removing replication now

                       

                       


                      This warning means the connection from the live to the backup node, that is used to replicate the data, failed. Probably due to a temporary network failure.

                       

                      If that connection is down, then clearly no more data can be replicated to the backup, so they'll get out of sync. When you bring down the system and restart it, this will be detected and you'll get the warnings that the servers are not synced.

                       

                      In this situation you need to wipe the backup and copy across the data from the live before restarting them both.

                      • 9. Re: Sync  Live-Backup pairs inside JBoss
                        BJ Chippindale Master

                        So the scp -r from the live to the backup leaves something in the backup?

                         

                        That makes sense.    I will modify my sync script.  Wipe data and bindings first.  Then scp.
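
A rough sketch of that wipe-then-copy order, assuming both servers are stopped and that the live and backup data directories are both reachable as local paths (for example over a mounted share). That is an assumption for illustration; with scp, as used in this thread, the copy step would be the remote copy instead. The function name and the `subdirs` default are hypothetical, taken from the journal and bindings directory names mentioned above.

```python
import shutil
from pathlib import Path


def wipe_and_copy(live_data, backup_data, subdirs=("journal", "bindings")):
    """Replace each backup subdirectory with a fresh copy of the live one."""
    live_data, backup_data = Path(live_data), Path(backup_data)

    # Check every source first so a missing directory aborts before anything is wiped.
    for name in subdirs:
        if not (live_data / name).is_dir():
            raise FileNotFoundError(f"missing live directory: {live_data / name}")

    for name in subdirs:
        target = backup_data / name
        # Wipe first: a plain recursive copy overwrites files with the same
        # name but leaves behind files that exist only on the backup.
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(live_data / name, target)
```

The wipe matters because a recursive copy never deletes anything at the destination, so journal files that exist only on the backup would otherwise survive the sync.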

                         

                        Thanks

                        BJ

                         

                         

                        BTW: I realized that I left out a detail when I wrote up that problem with the server not seeing the correct IP in the multihomed setup. I realized it a few days ago when looking at your response. The server is inside JBoss-4.2.3.GA. Feeling fairly dumb about that.

                         

                        respectfully

                        BJ

                        • 10. Re: Sync  Live-Backup pairs inside JBoss
                          BJ Chippindale Master

                          That worked as expected.

                           

                          Which leaves just one more dumb question.

                           

                          To recover after the switch from live to backup (which is not what I just did or the problem I just had):

                           

                          I will be looking at a backup server that has taken on the full load....

                          ... and a live system that has just had (for example) a power event that caused it to disappear for some minutes but has come up invalid.

                           

                          I have to:

                           

                          • stop the live server,
                          • stop the backup server,
                          • wipe the hornetq data from the live server,
                          • copy the hornetq data from the backup server,
                          • restart the backup,
                          • within a few seconds, restart the live.
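
The proposed sequence above, which nothing in this thread has confirmed yet, could be scripted roughly as follows. This is only a sketch of the ordering: every argument is a hypothetical caller-supplied callable (for instance a wrapper around a site-specific stop or copy script), and the default delay is a guess at "a few seconds".

```python
import time


def recover_after_failover(stop_live, stop_backup, wipe_live_data,
                           copy_backup_to_live, start_backup, start_live,
                           delay_seconds=5.0, sleep=time.sleep):
    """Run the proposed recovery steps strictly in the order listed above."""
    stop_live()
    stop_backup()
    wipe_live_data()
    copy_backup_to_live()
    start_backup()
    # "Within a few seconds": the actual safe interval is an assumption.
    sleep(delay_seconds)
    start_live()
```

Passing the steps in as callables keeps the ordering in one place and lets the sequence be dry-run with stand-in functions before it touches a real server. If any step raises, the later steps are not run.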

                           

                          Is this correct?

                           

                          respectfully

                          BJ