7 Replies Latest reply on Feb 21, 2019 6:30 PM by dshifrin

    Hornetq 2.4.7 keep alive between live and backup servers

    jcrowley42

      We are running HornetQ 2.4.7 on WildFly 9. Our configuration is a live server with 3 backup servers, using replication for the messages. We are using JGroups for our discovery and broadcast groups. We have experienced the split-brain scenario, which has led the operations team to shut down the backup servers. I just want to understand how the backup server identifies that it has lost connectivity to the live server. Does it use a ping? Does it use JGroups?

       

      "the backup will become active when it loses connection to its live server. This can be problematic because this can also happen because of a temporary network problem. In order to address this issue, the backup will try to determine whether it still can connect to the other servers in the cluster. If it can connect to more than half the servers, it will become active, if more than half the servers also disappeared with the live, the backup will wait and try reconnecting with the live. This avoids a split brain situation."

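      A minimal sketch of the majority rule in the quoted passage, for illustration only. The function name and parameters are hypothetical and are not part of HornetQ; the sketch also assumes that reaching exactly half of the other servers does not count as "more than half".

```python
def backup_may_activate(reachable_servers: int, other_servers: int) -> bool:
    """Return True when strictly more than half of the other servers respond.

    Hypothetical helper: it only mirrors the wording of the quoted
    documentation and is not taken from the HornetQ code base.
    """
    # Integer comparison avoids floating-point division.
    return 2 * reachable_servers > other_servers


# Example: with 3 live/backup pairs, backup1 has 4 other servers besides live1.
print(backup_may_activate(3, 4))  # 3 of 4 reachable
print(backup_may_activate(2, 4))  # exactly half reachable
```

      Under this reading, a backup that can reach only half of the other servers keeps waiting and retrying its live server instead of activating.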
       

      When the backup tries to connect to the rest of the servers, does that mean all the servers that make up the cluster, regardless of whether they are running HornetQ?

      Are there configuration settings for how many times, or for how long, the backup server will try to communicate with the live server?

       

      Thanks

        • 1. Re: Hornetq 2.4.7 keep alive between live and backup servers
          jbertram

          Discovery (whether it uses JGroups, standard UDP multicast, or standard static TCP) is only used to discover other cluster members. Once discovery is done, the nodes form TCP connections between them. In your case, the backup and the live servers have a TCP connection between them. When that connection dies or fails, the backup will activate because it assumes the live has died. Usually you'll want to have 3 live/backup pairs so that you have a quorum to mitigate split brain, since the danger of split brain is much greater with just a single live/backup pair.

           

          A HornetQ backup can only connect to a HornetQ live server, so if HornetQ isn't running on a server, the backup can't connect to it. I can't recall off the top of my head whether there are ways to configure how the backup will try to connect to the live. You'll need to consult the documentation and/or the source code to determine that.

           

          In general, I recommend you move to Apache ActiveMQ Artemis as the HornetQ code-base was donated to the Apache ActiveMQ community several years ago now and hasn't been maintained since.

          • 2. Re: Hornetq 2.4.7 keep alive between live and backup servers
            jcrowley42

            Thanks Jason.

             

            I’m not sure I understand how having additional live/backup pairs helps to reduce the risk of split brain.

            If backup1 can’t communicate with live1, backup1 will take over managing the queue if it can communicate with more than 50% of the servers in the cluster. Does the quorum work by backup1 trying to communicate with the rest of the servers (live2, backup2, live3, backup3) in the cluster, or will the live2 and live3 servers try to communicate with the live1 server?

            Is just having the additional live/backup servers defined in the cluster enough, i.e. I don’t need to have those servers processing messages unless I specifically have a consumer or producer use them?

            • 3. Re: Hornetq 2.4.7 keep alive between live and backup servers
              jbertram
              "Thanks Jason."

              I assume you're replying to me, although my name isn't Jason.

               

               

              "If backup1 can’t communicate with live1, backup1 will take over managing the queue if it can communicate with more than 50% of the servers in the cluster."

              That's sort of correct.

               

               

              "Does the quorum work by backup1 trying to communicate with the rest of the servers (live2, backup2, live3, backup3) in the cluster, or will the live2 and live3 servers try to communicate with the live1 server?"

              When the replication connection fails, the backup initiates a quorum vote and asks every other live node in the cluster to indicate whether or not it can "see" the live server for which replication failed. If the other live nodes reply in the affirmative, the backup concludes that only the replication connection failed (i.e. rather than the whole broker) and it will not activate.
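              The vote described above can be sketched roughly as follows. This is an illustration, not HornetQ's implementation: the function name is hypothetical, and the strict-majority threshold and the handling of ties are assumptions, since the thread does not say how many affirmative replies are needed.

```python
from typing import Iterable


def vote_allows_activation(votes: Iterable[bool]) -> bool:
    """Decide whether a backup should activate after a quorum vote.

    Each vote comes from one other live node; True means that node can
    still "see" the live server whose replication connection failed.
    Hypothetical helper, not part of the HornetQ API.
    """
    replies = list(votes)
    still_visible = sum(replies)  # each True counts as 1
    # Assumption: the live is treated as alive only when a strict
    # majority of the replies say it is still visible.
    live_confirmed_alive = 2 * still_visible > len(replies)
    return not live_confirmed_alive


# Example: backup1 loses replication to live1 and asks live2 and live3.
print(vote_allows_activation([True, True]))    # both still see live1
print(vote_allows_activation([False, False]))  # neither sees live1
```

              With three live/backup pairs there are two other live nodes to ask, which is what gives the backup a second opinion that a single pair cannot provide.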

               

              "Is just having the additional live/backup servers defined in the cluster enough, i.e. I don’t need to have those servers processing messages unless I specifically have a consumer or producer use them?"

              I believe so. At this point it's hard for me to remember the particulars of HornetQ's automatic load balancing.

               

              For what it's worth, you can avoid all this by just using shared storage.

              • 4. Re: Hornetq 2.4.7 keep alive between live and backup servers
                jcrowley42

                Thanks Justin. I apologize for the wrong name.

                • 5. Re: Hornetq 2.4.7 keep alive between live and backup servers
                  dshifrin

                  Hi Justin,

                   

                  I am having similar issues and was wondering whether using shared storage will fix this if I only have a single pair of live/backup servers. Or does shared storage only work if I have multiple pairs of live/backups, as you are all discussing above? Thanks in advance.

                  • 6. Re: Hornetq 2.4.7 keep alive between live and backup servers
                    jbertram

                    Shared store fundamentally mitigates split brain since there is only one copy of the data between the pair (unlike with replication, where there are multiple copies of the data).

                    • 7. Re: Hornetq 2.4.7 keep alive between live and backup servers
                      dshifrin

                      Great. Thanks a lot Justin for the quick reply!