6 Replies Latest reply on Jan 23, 2018 3:24 AM by Miroslav Novak

    Co-located replication failover configuration in standalone-ha.xml EAP  7

    Kavintha Maduranga Newbie

      Is it possible to configure two messaging server nodes so that each node has a live/backup pair configured in standalone-ha.xml, and to demonstrate failover and failback scenarios? If so, please post the XML files.

        • 1. Re: Co-located replication failover configuration in standalone-ha.xml EAP  7
          Kavintha Maduranga Newbie

          I could manage the two instances as follows:

          group 1 (co-located: default server on node 1, backup server on node 2)

          group 2 (co-located: default server on node 2, backup server on node 1)
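To make the pairing explicit, here is a minimal sketch of how the two ha-policy declarations mirror each other across the nodes (attribute values taken from the full config below; node 2's file is assumed to be the mirror image of node 1's):

```xml
<!-- node 1, standalone-ha.xml: live for group1, backup for group2 -->
<server name="default">
    <replication-master check-for-live-server="true" cluster-name="my-cluster" group-name="group1"/>
</server>
<server name="backup">
    <replication-slave cluster-name="my-cluster" group-name="group2" allow-failback="true" restart-backup="true"/>
</server>

<!-- node 2, standalone-ha.xml: the mirror image - live for group2, backup for group1 -->
<server name="default">
    <replication-master check-for-live-server="true" cluster-name="my-cluster" group-name="group2"/>
</server>
<server name="backup">
    <replication-slave cluster-name="my-cluster" group-name="group1" allow-failback="true" restart-backup="true"/>
</server>
```

The `group-name` attributes are what tie each live server to its backup on the other node.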


          And the messaging-activemq subsystem in standalone-ha.xml is as follows:





          <subsystem xmlns="urn:jboss:domain:messaging-activemq:1.0">

                      <server name="default">

                          <cluster password="12345"/>

                          <replication-master check-for-live-server="true" cluster-name="my-cluster" group-name="group1"/>

                          <security-setting name="#">

                              <role name="admin" send="true" consume="true" create-non-durable-queue="true" delete-non-durable-queue="true"/>
                          </security-setting>


                          <address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10" redistribution-delay="1000" max-delivery-attempts="-1"/>

                          <http-connector name="http-connector" socket-binding="http" endpoint="http-acceptor"/>

                          <http-connector name="http-connector-throughput" socket-binding="http" endpoint="http-acceptor-throughput">

                              <param name="batch-delay" value="50"/>
                          </http-connector>


                          <remote-connector name="netty" socket-binding="messaging"> 

                              <param name="use-nio" value="true"/> 

                              <param name="use-nio-global-worker-pool" value="true"/>
                          </remote-connector>


                          <in-vm-connector name="in-vm" server-id="0"/>

                          <http-acceptor name="http-acceptor" http-listener="default"/>

                          <http-acceptor name="http-acceptor-throughput" http-listener="default">

                              <param name="batch-delay" value="50"/>

                              <param name="direct-deliver" value="false"/>
                          </http-acceptor>


                          <remote-acceptor name="netty" socket-binding="messaging"> 

                              <param name="use-nio" value="true"/>
                          </remote-acceptor>


                          <in-vm-acceptor name="in-vm" server-id="0"/>

                          <broadcast-group name="bg-group1" connectors="netty" jgroups-channel="activemq-cluster"/>

                          <discovery-group name="dg-group1" jgroups-channel="activemq-cluster"/>

                          <cluster-connection name="my-cluster" address="jms" connector-name="netty" discovery-group="dg-group1"/>

                          <jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>

                          <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>

                          <jms-queue name="ConnectPublish_error" entries="java:jboss/exported/jms/queue/ConnectPublish_error"/>

                          <connection-factory name="InVmConnectionFactory" connectors="in-vm" entries="java:/ConnectionFactory"/>

                          <connection-factory name="RemoteConnectionFactory" ha="true" block-on-acknowledge="true" reconnect-attempts="-1" connectors="netty" entries="java:jboss/exported/jms/RemoteConnectionFactory"/>

                          <pooled-connection-factory name="activemq-ra" transaction="xa" connectors="in-vm" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory"/>
                      </server>


                      <server name="backup"> 

                          <security enabled="false"/> 

                          <cluster password="12345"/> 

                          <replication-slave cluster-name="my-cluster" group-name="group2" allow-failback="true" restart-backup="true"/>

                          <bindings-directory path="activemq/bindings-B"/>

                          <journal-directory path="activemq/journal-B"/>

                          <large-messages-directory path="activemq/largemessages-B"/>

                          <paging-directory path="activemq/paging-B"/>

                          <security-setting name="#"> 

                              <role name="guest" manage="true" delete-non-durable-queue="true" create-non-durable-queue="true" delete-durable-queue="true" create-durable-queue="true" consume="true" send="true"/>
                          </security-setting>


                          <address-setting name="#" redistribution-delay="0" page-size-bytes="524288" max-size-bytes="1048576" max-delivery-attempts="200"/> 

                          <remote-connector name="netty-backup" socket-binding="messaging-backup"/> 

                          <in-vm-connector name="in-vm" server-id="0"/> 

                          <remote-acceptor name="netty-backup" socket-binding="messaging-backup"/> 

                          <broadcast-group name="bg-group-backup" connectors="netty-backup" broadcast-period="1000" jgroups-channel="activemq-cluster"/> 

                          <discovery-group name="dg-group-backup" refresh-timeout="1000" jgroups-channel="activemq-cluster"/> 

                          <cluster-connection name="my-cluster" retry-interval="1000" connector-name="netty-backup" address="jms" discovery-group="dg-group-backup"/>
                      </server>
                  </subsystem>







          But the failback from node 2 to node 1 still does not occur when node 1 comes back up.

          • 2. Re: Co-located replication failover configuration in standalone-ha.xml EAP  7
            Miroslav Novak Master



            I've attached standalone-full-ha.xml configs for EAP 7.1/WF11 for both of the servers. I reviewed yours but could not find any issues in the messaging subsystem. There might be an issue with the JGroups config, but it's hard to say. If you attach the whole config, maybe I'll be able to find the issue.




            1 of 1 people found this helpful
            • 3. Re: Co-located replication failover configuration in standalone-ha.xml EAP  7
              Kavintha Maduranga Newbie

              Hi Miroslav,


              I've used a separate remote-acceptor and remote-connector with separate socket-bindings. Please find the attached XML files.

              1 of 1 people found this helpful
              • 4. Re: Co-located replication failover configuration in standalone-ha.xml EAP  7
                Miroslav Novak Master

                I tried your config and it worked; I could do failover and failback. However, I believe I can see a possible problem in your case.


                I believe the problem is that max-saved-replicated-journal-size is not set in <replication-slave ... /> in the configuration of the backup servers. By default it is set to 2, which means that after 2 failover/failback cycles, when the live server starts again, the backup no longer restarts itself and no longer syncs with the live server. At that point, if the live is stopped/killed, the backup does not start. It's not exactly what you're describing as the issue, but I believe this is what you're hitting.


                Could you try to delete the journal directories and set <replication-slave max-saved-replicated-journal-size="-1" ... />, please?
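Applied to the backup server's ha-policy from the config above, that would look like this (a sketch; only the max-saved-replicated-journal-size attribute is new, the other values are carried over from the posted config):

```xml
<server name="backup">
    <!-- -1 = keep an unlimited number of saved replicated journal copies,
         so the backup keeps restarting and re-syncing with the live server
         even after repeated failover/failback cycles -->
    <replication-slave cluster-name="my-cluster" group-name="group2"
                       allow-failback="true" restart-backup="true"
                       max-saved-replicated-journal-size="-1"/>
</server>
```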


                WF10 has a number of issues when configuring a replicated journal in a colocated topology, and I recommend using a shared store instead if possible. The problem here is that every time the backup syncs with the live server, it moves its journal aside in the journal directory and creates a new one (with an up-to-date copy of the live's journal). Later, during failback, when the live is starting, it also moves its journal aside and creates a new one (with an up-to-date copy of the backup's journal). These additional copies of the journal directories keep accumulating until the disk runs out of space. This can be limited on the backup servers by setting max-saved-replicated-journal-size, but unfortunately not on the live servers. This annoying behavior was fixed in WF11, which also contains lots of fixes for other replicated-journal issues. I strongly recommend moving to that version.


                Please let me know if setting max-saved-replicated-journal-size helped. There might be other issues :-)




                1 of 1 people found this helpful
                • 5. Re: Co-located replication failover configuration in standalone-ha.xml EAP  7
                  Kavintha Maduranga Newbie

                  Hi Miroslav,


                  Thanks for the help; I could manage failover and failback using max-saved-replicated-journal-size="-1" as well. And as you said, the backup restart issue was there. Anyway, with the previous configuration I could create a simple two-node cluster with a graceful client-switching mechanism that balances the load when a server fails. So both nodes act as identical servers, and once a failover occurs they get synced and the second node processes the new messages plus the uncommitted ones from node 1. But if you want processing to go back to node 1, you have to manually break the client connection with node 2. This behavior is pretty OK for now. Thanks again for the reply; I'll get back to you if anything goes wrong.

                  • 6. Re: Co-located replication failover configuration in standalone-ha.xml EAP  7
                    Miroslav Novak Master

                    Happy to help. Writing a standalone JMS client that can handle failover without losing or duplicating messages is quite hard. I would recommend having another WF server with the client application deployed, where consuming or sending messages is part of an XA transaction (managed by the transaction manager). That way the complexity is left to the server.