6 Replies Latest reply on Jun 14, 2010 7:55 AM by mzeijen

    What is the behavior of the core bridge on different kinds of network problems or server outages?

    mzeijen

      I would like to know how the HornetQ core bridge (version 2.1.0.Final) behaves under different kinds of network connection problems. The situation is that we have two sets of HornetQ servers. Each set has a production and a backup server. The two sets are separated by a WAN. HornetQ server set 1 has a core bridge to HornetQ server set 2. The core bridge of set 1 is configured with both the production and the backup server of set 2. The "reconnect-attempts" setting is set to -1.

       

      Here are the different kinds of network connection problems:

       

      1. The WAN connection is lost and both the production and backup server of set 2 are unreachable for a couple of minutes.
        • Does the core bridge see the difference between a network problem and a crashed HornetQ server?
        • Does the core bridge try to reconnect to the production server or does it also try to connect to the backup server?
        • When the WAN connection is recovered, will it then always reconnect to the production server, or can it also happen that it tries to reconnect to the backup server because it doesn't know the difference between a network problem and a crashed HornetQ server?
      2. Only the connection to the production HornetQ server of set 2 is lost but the backup server is still reachable.
        • Does the core bridge failover to the backup server?
      3. Both the live and backup servers of set 2 go down (for instance due to power loss).
        • Will the bridge reconnect to the production server as soon as the production server is back online?

       

      I know the HornetQ team is working on an improved HA solution for HornetQ. These points won't be an issue anymore as soon as a version with these improvements is released. But until then I need to know how the bridge will behave in these situations to prevent a split-brain problem.

       

      Thanks for your time.

        • 1. Re: What is the behavior of the core bridge on different kinds of network problems or server outages?
          timfox

          Maurice Zeijen wrote:

           

          1. The WAN connection is lost and both the production and backup server of set 2 are unreachable for a couple of minutes.
            • Does the core bridge see the difference between a network problem and a crashed HornetQ server?

          It's not possible to distinguish the two cases. How would you do this?

           

          Maurice Zeijen wrote:

            • Does the core bridge try to reconnect to the production server or does it also try to connect to the backup server?

          If you've configured the client with knowledge of the backup, it will attempt to fail over to the backup; otherwise it will try to reconnect to the live server.

           

          Maurice Zeijen wrote:

            • When the WAN connection is recovered, will it then always reconnect to the production server, or can it also happen that it tries to reconnect to the backup server because it doesn't know the difference between a network problem and a crashed HornetQ server?

          See above

           

          Maurice Zeijen wrote:

           

          2. Only the connection to the production HornetQ server of set 2 is lost but the backup server is still reachable.
            • Does the core bridge failover to the backup server?

          Yes, if you've configured a backup, otherwise no.

           

          Maurice Zeijen wrote:

           

          3. Both the live and backup servers of set 2 go down (for instance due to power loss).
            • Will the bridge reconnect to the production server as soon as the production server is back online?

           

           

          If you've configured a backup, it will attempt to connect to the backup; otherwise it will attempt to reconnect to the live server.
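The answers in this reply can be condensed into a toy model. This is only a sketch of the behaviour as described above, not HornetQ code: the class, enum, and method names are all hypothetical. It makes one point visible: because the bridge only observes a lost connection, the cause of the loss cannot influence where it connects next.

```java
// Toy model of the reconnect behaviour described in this reply.
// Not HornetQ code: every name here is hypothetical.
class BridgeTargetModel {
    enum Cause { NETWORK_OUTAGE, SERVER_CRASH }
    enum Target { LIVE, BACKUP }

    // The bridge only sees "connection lost", so the cause is deliberately
    // ignored; it is a parameter purely to show that it has no effect.
    static Target nextTarget(boolean backupConfigured, Cause unobservedCause) {
        return backupConfigured ? Target.BACKUP : Target.LIVE;
    }

    public static void main(String[] args) {
        for (Cause cause : Cause.values()) {
            System.out.println(cause + ", backup configured: " + nextTarget(true, cause));
            System.out.println(cause + ", no backup configured: " + nextTarget(false, cause));
        }
    }
}
```

In this model a WAN outage with a backup configured yields the same target as a real crash, which is exactly the situation that raises the split-brain concern in the original question.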

          • 2. Re: What is the behavior of the core bridge on different kinds of network problems or server outages?
            mzeijen

            Thanks for the information.

             

            I also thought that it is not possible to distinguish between network problems and a crashed HornetQ server. But as I am not a network specialist I just wanted to be sure.

             

            Until a new version of HornetQ with improved HA is available I will use the JMS bridge instead of the core bridge. I can configure the JMS bridge in such a way that it is HA but never activates the backup server. For my use case the JMS bridge suffices.

            • 3. Re: What is the behavior of the core bridge on different kinds of network problems or server outages?
              timfox

              Maurice Zeijen wrote:


               

              Until a new version of HornetQ with improved HA is available I will use the JMS bridge instead of the core bridge. I can configure the JMS bridge in such a way that it is HA but never activates the backup server. For my use case the JMS bridge suffices.

              You can do the same with the core bridge. Just don't tell it about the backup server.

              • 4. Re: What is the behavior of the core bridge on different kinds of network problems or server outages?
                mzeijen

                Tim Fox wrote:

                 

                You can do the same with the core bridge. Just don't tell it about the backup server.

                But then it will never connect to the backup server when the production server is down. I want it to connect to the backup server to keep the messaging service going; otherwise the whole HA setup would be pretty useless (except for the clients that post messages to the server).

                • 5. Re: What is the behavior of the core bridge on different kinds of network problems or server outages?
                  timfox

                  I don't understand your point here.

                   

                  The JMS bridge doesn't know about backup servers so this will not help you either.

                  • 6. Re: What is the behavior of the core bridge on different kinds of network problems or server outages?
                    mzeijen

                    Tim Fox wrote:

                     

                    I don't understand your point here.

                     

                    The JMS bridge doesn't know about backup servers so this will not help you either.

                    I have a normal JMS bridge setup, but I did the following to make sure that it can connect to the backup server without ever causing the backup server to activate because of a connection problem:

                     

                    In the JNDI properties of the JMS bridge I use a java.naming.provider.url containing both the JNDI addresses of the production and backup server (separated with a comma). As soon as the bridge loses the connection to the production server it will try to reconnect via JNDI to the live and backup server. If the network is down then it will reconnect to the live server as soon as the network is up. If the live server is really down and the backup is not then one of the local clients should have caused a failover and the backup server should be active. The bridge then connects to the backup server.
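The JNDI properties described above could be assembled roughly as follows. This is a minimal sketch, not the poster's actual configuration: the host names and ports are hypothetical, and the use of the JBoss JNP context factory is an assumption about the naming provider in use. Only the comma-separated java.naming.provider.url comes from the post.

```java
import java.util.Properties;

// Sketch of JNDI properties for a JMS bridge that lists both servers.
// Host names, ports, and the class name are hypothetical.
class JmsBridgeJndiSketch {
    static Properties bridgeJndiProperties(String liveUrl, String backupUrl) {
        Properties props = new Properties();
        // Assumption: the servers expose JNDI through the JBoss JNP provider.
        props.setProperty("java.naming.factory.initial",
                "org.jnp.interfaces.NamingContextFactory");
        // Both JNDI addresses, separated by a comma, as described in the post.
        props.setProperty("java.naming.provider.url", liveUrl + "," + backupUrl);
        return props;
    }

    public static void main(String[] args) {
        Properties props = bridgeJndiProperties(
                "jnp://set2-live.example.com:1099",    // hypothetical live server
                "jnp://set2-backup.example.com:1099"); // hypothetical backup server
        System.out.println(props.getProperty("java.naming.provider.url"));
    }
}
```

The lookup itself only locates a server through JNDI. Whether the backup is active at that moment still depends on a local client having triggered the failover, as the post explains.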

                     

                    To make sure that local clients can still cause failovers, we need an extra connection factory configuration for the bridge (which I named "JmsBridgeConnectionFactory"). Unlike the connection factory for the local clients, its configuration does not declare the corresponding backup server.

                     

                    This setup is a bit more complicated than the core bridge and it lacks the advantages of the core bridge, but it does offer me the behaviour that I need.

                     

                    Thanks for your feedback.