4 Replies · Latest reply on Apr 20, 2016 6:09 AM by mnovak

    Co-located Failover/Failback Issue Wildfly 8.2.1

    amsinha

      Setup:

      Node1 hosts: the primary HornetQ server for Node1, plus the co-located backup for Node2's primary.

      Node2 hosts: the primary HornetQ server for Node2, plus the co-located backup for Node1's primary.

       

      Submit a job to our application, and the console log shows the job being processed on both nodes (messages are being consumed by both nodes).
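      For orientation, this co-located layout corresponds to two <hornetq-server> definitions per node in the messaging subsystem of standalone-full-ha.xml. The sketch below is only illustrative: the server names are made up, it assumes a replication-based setup (shared-store=false) rather than a shared journal, and connectors, acceptors, journal and cluster-connection settings are omitted.

          <!-- Node1: its own primary plus the co-located backup for Node2 (sketch only) -->
          <hornetq-server name="live-node1">
              <backup>false</backup>
              <shared-store>false</shared-store>
          </hornetq-server>
          <hornetq-server name="backup-for-node2">
              <backup>true</backup>
              <shared-store>false</shared-store>
          </hornetq-server>
          <!-- Node2 mirrors this: its own primary plus a backup for Node1's primary -->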

       

      Steps:

      1. On Node1, press 'CTRL + C' in the window that started 'standalone.bat'.

      2. The backup on the alternate node (Node2) activates just fine. All records in the submitted job get processed.

      3. Start Node1 again by running 'standalone.bat'.

      4. Submit the same bulk job, and now it only gets processed on Node1 (a split-brain-type scenario, although there is no network issue).


      On the same nodes and the same network, we deployed the same application using EAP 6.4.0.

      The same steps were executed and everything works well: after bringing Node1 back and submitting a bulk job, it gets processed by both nodes.


      <allow-failback> and <failover-on-shutdown> are set to true on all four HornetQ servers (in both the Wildfly and the EAP configuration tests).
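      For reference, these are direct child elements of a <hornetq-server> definition; set on all four servers, each looks roughly like the sketch below (only the relevant elements are shown, and the server name is illustrative).

          <hornetq-server name="live-node1">
              <allow-failback>true</allow-failback>
              <failover-on-shutdown>true</failover-on-shutdown>
          </hornetq-server>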


      This is easily reproducible. Bug?

        • 1. Re: Co-located Failover/Failback Issue Wildfly 8.2.1
          jbertram

          Are you setting <check-for-live-server> as discussed in the documentation?

           

           Also, if you really think there is a bug, I recommend you reproduce it on the latest version (which in this case would be Wildfly 10.0.0.Final).

          • 2. Re: Co-located Failover/Failback Issue Wildfly 8.2.1
            amsinha

             Yes, <check-for-live-server> is set to true on both nodes for each primary HornetQ server (true in both the Wildfly 8.2.1 and EAP 6.4.0 environments).
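             For reference, that element sits on each node's primary <hornetq-server> definition, roughly as sketched below (the server name is illustrative and the rest of the server configuration is omitted).

                 <hornetq-server name="live-node1">
                     <backup>false</backup>
                     <check-for-live-server>true</check-for-live-server>
                 </hornetq-server>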

             

             Is there anything else that can be done to either confirm or rule out whether this is a bug?

             

             And if this does turn out to be a bug, is trying 10.0.0 the only path forward, or would a bug patch to 8.2.1 be a possibility?

             

            Thanks!

            • 3. Re: Co-located Failover/Failback Issue Wildfly 8.2.1
              jbertram

               Is there anything else that can be done to either confirm or rule out whether this is a bug?

               I personally do not have many resources to spend on an issue in a legacy release that may already be fixed upstream, which is why I encourage you to reproduce the problem on a current release.  If there actually is an issue that hasn't already been fixed, then I'd have some motivation to investigate further and fix it.  That said, my guess is that this is some kind of configuration error or incorrect expectation, since EAP releases are run through a series of colocated, replicated tests before making it out the door.  Colocated, replicated use-cases involving fail-over are relatively complex, and a four-point explanation of what's happening isn't much to go on.  If you had a test case that reproduces the behavior you're seeing and that I could run easily, that would help your cause.

               

               And if this does turn out to be a bug, is trying 10.0.0 the only path forward, or would a bug patch to 8.2.1 be a possibility?

               HornetQ is no longer under active development, so there wouldn't be any kind of patch release for it.  Also, I don't think Wildfly does patch releases either.  Community projects are more like the bleeding edge of development; you really need to stay on the current release to have all the latest fixes.  If you want long-term stability, then I recommend not just using EAP but getting a support subscription from Red Hat so that you can actually get patches for bugs.  Of course, if something is fixed upstream that you want in a previous release, you can always back-port it and recompile it yourself.  This is open source, after all.

              • 4. Re: Co-located Failover/Failback Issue Wildfly 8.2.1
                mnovak

                Could you share your configuration with us?

                 

                Thanks,

                Mirek