15 Replies Latest reply on Nov 1, 2012 6:31 AM by ataylor

    Backup server can not be started if master server is offline

    dengyong

      HornetQ: 2.3 BETA

      Topology: Two HornetQ servers have been configured to form an HA live-backup group. The HA mode is data replication.

       

      I am using UDP broadcast.

       

      Description:

          1. The HornetQ master is stopped.

          2. Start the HornetQ backup server; it fails with the exception below.

          3. Start the HornetQ master. After the master has fully started, the backup does not recover.

       

          Why can't the backup server start while the master server is offline? Suppose the backup is activated after
          the master dies, and then the backup itself dies before the master has replicated the state back. In that
          situation it is a problem if the backup server cannot start while the master is offline.

       

          Configuration is attached.

       

       

       

      19:06:50,176 ERROR [org.hornetq.core.server] HQ114002: Failure in initialisation: HornetQException[errorType=CONNECTION_TIMEDOUT message=HQ119031: Timed out waiting to receive initial broadcast from cluster]
              at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:760) [hornetq-core-client.jar:]
              at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:594) [hornetq-core-client.jar:]
              at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:578) [hornetq-core-client.jar:]
              at org.hornetq.core.server.LiveNodeLocator.connectToCluster(LiveNodeLocator.java:82) [hornetq-core.jar:]
              at org.hornetq.core.server.impl.HornetQServerImpl$SharedNothingBackupActivation.run(HornetQServerImpl.java:2222) [hornetq-core.jar:]
              at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_26]

      HornetQException[errorType=CONNECTION_TIMEDOUT message=HQ119031: Timed out waiting to receive initial broadcast from cluster]
              at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:760)
              at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:594)
              at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:578)
              at org.hornetq.core.server.LiveNodeLocator.connectToCluster(LiveNodeLocator.java:82)
              at org.hornetq.core.server.impl.HornetQServerImpl$SharedNothingBackupActivation.run(HornetQServerImpl.java:2222)
              at java.lang.Thread.run(Thread.java:662)

        • 1. Re: Backup server can not be started if master server is offline
          clebert.suconic

          A backup will connect to the live server to request data, so a backup needs to talk to the live server.

          • 2. Re: Backup server can not be started if master server is offline
            dengyong

            It is OK if the master being online is a prerequisite for starting the backup. But currently the backup server fails to initialize and never starts, even after the master starts. This behavior is undesirable. The backup needs a retry mechanism: it should retry at an interval until the master is online. Do you agree?
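            The retry mechanism requested above could look roughly like the following. This is a minimal sketch in plain Java, not HornetQ code: `BackupRetrySketch`, `LiveConnector`, `LiveUnavailableException`, and `connectWithRetry` are hypothetical names invented for illustration, and the connector simply stands in for whatever call waits for the initial broadcast from the cluster.

```java
// Hypothetical sketch: none of these names are part of HornetQ.
final class BackupRetrySketch {

    /** Signals that one connection attempt found no live server in time. */
    static final class LiveUnavailableException extends Exception {
        LiveUnavailableException(String message) {
            super(message);
        }
    }

    /** One attempt to reach the live server; stands in for the real discovery call. */
    interface LiveConnector {
        void connect() throws LiveUnavailableException;
    }

    /**
     * Calls connector.connect() until it succeeds, sleeping between failed attempts.
     *
     * @param retryIntervalMillis pause between attempts
     * @param maxAttempts         upper bound on attempts; a negative value retries forever
     * @return the number of attempts that were needed
     */
    static int connectWithRetry(LiveConnector connector, long retryIntervalMillis, int maxAttempts)
            throws LiveUnavailableException, InterruptedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                connector.connect();
                return attempts;
            } catch (LiveUnavailableException e) {
                // Give up only when a bound was set and has been reached.
                if (maxAttempts >= 0 && attempts >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(retryIntervalMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated live server that becomes reachable on the third attempt.
        int[] calls = {0};
        LiveConnector flaky = () -> {
            if (++calls[0] < 3) {
                throw new LiveUnavailableException("no broadcast received yet");
            }
        };
        int attempts = connectWithRetry(flaky, 10L, -1);
        System.out.println("connected after " + attempts + " attempts");
    }
}
```

            With a negative `maxAttempts` the backup would keep waiting until the master appears, instead of failing once during initialisation and never recovering.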

            • 3. Re: Backup server can not be started if master server is offline
              dengyong

              A question on the backup start prerequisite (master must be online): what if the backup server was the last live server in the group (meaning it has the latest JMS state)? If it cannot start because it is configured as a backup server, won't that cause problems?

              • 4. Re: Backup server can not be started if master server is offline
                ataylor

                It is OK if the master being online is a prerequisite for starting the backup. But currently the backup server fails to initialize and never starts, even after the master starts. This behavior is undesirable. The backup needs a retry mechanism: it should retry at an interval until the master is online. Do you agree?

                That is currently what happens: once the live comes up, the backup should start replicating. If it doesn't, could you provide a test?

                A question on the backup start prerequisite (master must be online): what if the backup server was the last live server in the group (meaning it has the latest JMS state)? If it cannot start because it is configured as a backup server, won't that cause problems?

                A backup server can't just make an arbitrary decision to start; this could cause split-brain issues. It needs to connect to the cluster before deciding what should happen. You should configure your cluster so that what you suggested can't happen.

                • 5. Re: Backup server can not be started if master server is offline
                  dengyong

                  Andy Taylor wrote:

                  A question on the backup start prerequisite (master must be online): what if the backup server was the last live server in the group (meaning it has the latest JMS state)? If it cannot start because it is configured as a backup server, won't that cause problems?

                  A backup server can't just make an arbitrary decision to start; this could cause split-brain issues. It needs to connect to the cluster before deciding what should happen. You should configure your cluster so that what you suggested can't happen.

                   

                  If starting the backup server has a prerequisite (it cannot be the first live member, or the master must be online), it will be hard to support the following real production case:

                  1. We configure master A and backup B to form an HA live group
                  2. Master A and backup B are both started
                  3. Master A's host machine has a hardware problem and needs some time to repair
                  4. With this prerequisite, backup B can never be brought down for maintenance before master A is repaired

                  How do we support such a real production case?

                   

                  I know there is currently a reason for this prerequisite: avoiding the split-brain issue. But that is an implementation-level matter. We can find a way to improve it, right?

                  • 6. Re: Backup server can not be started if master server is offline
                    dengyong

                    Andy Taylor wrote:

                     

                    It is OK if the master being online is a prerequisite for starting the backup. But currently the backup server fails to initialize and never starts, even after the master starts. This behavior is undesirable. The backup needs a retry mechanism: it should retry at an interval until the master is online. Do you agree?

                    That is currently what happens: once the live comes up, the backup should start replicating. If it doesn't, could you provide a test?

                     

                    I am using HornetQ 2.3 BETA. The behavior is not as you described. If the master is offline when the backup server starts, the backup fails to start and never recovers, even after the master comes online again.

                    I have provided detailed steps in my original post. Let me know if you need more.

                    • 7. Re: Backup server can not be started if master server is offline
                      ataylor

                      Like I said, the backup can't just arbitrarily start as a live; it needs to connect to the cluster to avoid split brain. In your case you know the live is down, so you can just configure the backup to be live, i.e. <backup>false</backup>
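                      The reasoning in this reply can be written down as a small decision table. The sketch below is a toy model of that reasoning only; it is not HornetQ's activation logic, and `BackupStartDecision`, `Action`, and `decide` are hypothetical names. The point it illustrates is that "no contact with a live server" is ambiguous (the live may be down or merely unreachable), so only an explicit operator setting such as `<backup>false</backup>` lets the node start as live.

```java
// Toy model of the reasoning in this thread; hypothetical names,
// not HornetQ's actual activation logic.
final class BackupStartDecision {

    enum Action { START_AS_LIVE, REPLICATE_FROM_LIVE, KEEP_WAITING }

    /**
     * @param configuredAsBackup value of the <backup> setting for this server
     * @param liveReachable      whether this server has found a live server in the cluster
     */
    static Action decide(boolean configuredAsBackup, boolean liveReachable) {
        if (!configuredAsBackup) {
            // The operator has declared this node live, e.g. <backup>false</backup>.
            return Action.START_AS_LIVE;
        }
        if (liveReachable) {
            // A live server answered, so the backup can request data from it.
            return Action.REPLICATE_FROM_LIVE;
        }
        // No contact: the live may be down or only unreachable from here.
        // Starting as live on this evidence alone risks split brain.
        return Action.KEEP_WAITING;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false));
        System.out.println(decide(true, true));
        System.out.println(decide(false, false));
    }
}
```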

                      • 8. Re: Backup server can not be started if master server is offline
                        dengyong

                        Andy Taylor wrote:

                         

                        Like I said, the backup can't just arbitrarily start as a live; it needs to connect to the cluster to avoid split brain. In your case you know the live is down, so you can just configure the backup to be live, i.e. <backup>false</backup>

                         

                        OK, I think this is acceptable if you have no better idea for improving the behavior.

                         

                         

                        Andy Taylor wrote:

                         

                        It is OK if the master being online is a prerequisite for starting the backup. But currently the backup server fails to initialize and never starts, even after the master starts. This behavior is undesirable. The backup needs a retry mechanism: it should retry at an interval until the master is online. Do you agree?

                        That is currently what happens: once the live comes up, the backup should start replicating. If it doesn't, could you provide a test?

                         

                        I am using HornetQ 2.3 BETA. The behavior is not as you described. If the master is offline when the backup server starts, the backup fails to start and never recovers, even after the master comes online again.

                        I have provided detailed steps in my original post. Let me know if you need more.

                         

                        Do you plan to fix the above issue? Do I need to raise a JIRA bug?

                        • 9. Re: Backup server can not be started if master server is offline
                          dengyong

                          HornetQ bug HORNETQ-1076 has been filed.

                          • 10. Re: Backup server can not be started if master server is offline
                            ataylor

                            As I have explained, this is not a bug; it is expected behaviour. I will close the JIRA.

                            • 11. Re: Backup server can not be started if master server is offline
                              ataylor

                              Actually, ignore my last comment; it is an issue. I misunderstood your original post.

                              • 12. Re: Backup server can not be started if master server is offline
                                ataylor

                                I've renamed the JIRA to be clearer.

                                • 13. Re: Backup server can not be started if master server is offline
                                  ataylor

                                  By the way Yong, thanks for testing out our beta. If you want to try any fixes, you can build from master by building a distro via mvn -Prelease package

                                  • 14. Re: Backup server can not be started if master server is offline
                                    dengyong

                                    Sure, I will keep an eye on it. When will 2.3 final be released? Is there a defined date?
