9 Replies Latest reply on Oct 25, 2017 9:10 AM by Wayne Wang

    Cluster of wildfly instances in domain mode and configuration of message high availability

    Wayne Wang Apprentice

      Hi,

       

      I worked on a cluster of standalone WildFly instances with messaging configured for high availability using the shared-store approach.

       

      Now I am trying to work on a cluster of WildFly instances running in domain mode, configured for message high availability. The shared-store approach seems to be a problem, since domain.xml is a shared configuration.

       

      Does that mean I will need to configure domain.xml differently on the master vs. the slave?

       

      Thanks,

       

      Wayne

        • 1. Re: Cluster of wildfly instances in domain mode and configuration of message high availability
          Miroslav Novak Master

          You can use system properties to set the shared-store location for each server in the server group.
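
          For example (a sketch only, not tested in domain mode): each host's host.xml can attach per-server system properties, which the shared domain.xml then references via expressions. The property names messageDir and messageBackDir below are purely illustrative.

          ```xml
          <!-- host.xml on each machine: per-server system properties (illustrative names/values) -->
          <servers>
              <server name="server-one" group="main-server-group">
                  <system-properties>
                      <property name="messageDir" value="A"/>
                      <property name="messageBackDir" value="B"/>
                  </system-properties>
              </server>
          </servers>
          ```

          The shared domain.xml would then use ${messageDir} and ${messageBackDir} in the shared-store paths, so each server resolves them to its own directories.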

          • 2. Re: Cluster of wildfly instances in domain mode and configuration of message high availability
            Wayne Wang Apprentice

            Hi Miroslav,

             

            I am working on configuring the cluster of instances running in domain mode.

             

            I did a basic walk-through to set up instances in domain mode with a very simple web example (no messaging HA), and I can confirm that everything looks fine. This is basically the walk-through from the JBoss website (Clustering and Domain Setup Walkthrough - WildFly 10 - Project Documentation Editor).

             

            Then I started to configure (manually for now) the domain.xml on the domain controller and the host controller. I ported what had been a working configuration for a cluster of instances in standalone mode into domain mode (domain.xml). The only problem now is that the domain controller started up fine, but the host controller did not start up completely. Basically, a console message such as the following line is missing (it appears on the domain controller, but not on the host controller):

             

            [Server:server-one] 12:51:15,459 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 10.1.0.Final (WildFly Core 2.2.0.Final) started in 7904ms - Started 444 of 719 services (510 services are lazy, passive or on-demand)

             

            In host controller, I found the following:

             

            #####################################################################################################################################

            [Host Controller] 13:05:05,617 INFO  [org.jboss.as.host.controller] (Controller Boot Thread) WFLYHC0023: Starting server server-one-slave

            13:05:05,667 INFO  [org.jboss.as.process.Server:server-one-slave.status] (ProcessController-threads - 3) WFLYPC0018: Starting process 'Server:server-one-slave'

            [Server:server-one-slave] 13:05:06,013 INFO  [org.jboss.modules] (main) JBoss Modules version 1.5.2.Final

            [Server:server-one-slave] 13:05:06,159 INFO  [org.jboss.msc] (main) JBoss MSC version 1.2.6.Final

            [Server:server-one-slave] 13:05:06,221 INFO  [org.jboss.as] (MSC service thread 1-2) WFLYSRV0049: WildFly Full 10.1.0.Final (WildFly Core 2.2.0.Final) starting

            .......

            .......

             

            [Server:server-one-slave] 13:05:09,719 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 68) AMQ221034: Waiting indefinitely to obtain live lock

            [Server:server-one-slave] 13:05:11,667 INFO  [org.apache.activemq.artemis.core.server] (Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@63ef6d3-1756878130)) AMQ221031: backup announced

             

            ######################################################################################################################################

             

            After about 5 minutes, then the following error messages:

             

            ########################################################################################################################################

             

            [Server:server-one-slave] 13:10:08,797 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[("interface" => "unsecure")]'

            [Server:server-one-slave] 13:10:08,810 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-2) WFLYUT0008: Undertow HTTPS listener https suspending

            [Server:server-one-slave] 13:10:08,812 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-2) WFLYUT0007: Undertow HTTPS listener https stopped, was bound to 10.35.1.209:8443

            [Server:server-one-slave] 13:10:08,826 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-2) ISPN000080: Disconnecting JGroups channel custom

            [Server:server-one-slave] 13:10:08,827 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-2) ISPN000082: Stopping the RpcDispatcher for channel custom

            [Server:server-one-slave] 13:10:08,828 INFO  [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=37e8f297-b8c9-11e7-a4b1-a161b4512d80) AMQ221033: ** got backup lock

            [Server:server-one-slave] 13:10:08,830 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000080: Disconnecting JGroups channel hibernate

            [Server:server-one-slave] 13:10:08,830 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000082: Stopping the RpcDispatcher for channel hibernate

            [Server:server-one-slave] 13:10:08,833 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000080: Disconnecting JGroups channel server

            [Server:server-one-slave] 13:10:08,833 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000082: Stopping the RpcDispatcher for channel server

            [Server:server-one-slave] 13:10:08,835 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000080: Disconnecting JGroups channel ejb

            [Server:server-one-slave] 13:10:08,835 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000082: Stopping the RpcDispatcher for channel ejb

            [Server:server-one-slave] 13:10:08,850 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 78) AMQ221002: Apache ActiveMQ Artemis Message Broker version 1.1.0.wildfly-017 [37e8f297-b8c9-11e7-a4b1-a161b4512d80] stopped

            [Server:server-one-slave] 13:10:08,853 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000080: Disconnecting JGroups channel web

            [Server:server-one-slave] 13:10:08,854 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000082: Stopping the RpcDispatcher for channel web

             

            It looks like the host controller is missing some configuration and therefore cannot start up properly.

             

            What do you think could be missing or wrong?

             

            Thanks,

             

            Wayne

            • 3. Re: Cluster of wildfly instances in domain mode and configuration of message high availability
              Wayne Wang Apprentice

              Hi Miroslav,

               

              I would like to mention that I also tried to start the server from the management console, and it did not work either.

               

              Wayne

              • 4. Re: Cluster of wildfly instances in domain mode and configuration of message high availability
                Wayne Wang Apprentice

                Hi Miroslav,

                 

                I just found out that the host controller's server will start up if I remove the files on the shared drive (shared-store approach). However, this is not a solution.

                 

                The following is from the console after I removed all the files and folders.

                 

                #########################################################################################

                [Server:server-one-slave] 14:36:57,496 INFO  [org.apache.activemq.artemis.core.server] (Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@56b903b6-525344043)) AMQ221031: backup announced

                [Server:server-one-slave] 14:37:45,533 ERROR [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 59) AMQ224000: Failure in initialisation: java.io.IOException: Cannot open file:No such file or directory

                [Server:server-one-slave]       at org.apache.activemq.artemis.jlibaio.LibaioContext.open(Native Method)

                [Server:server-one-slave]       at org.apache.activemq.artemis.jlibaio.LibaioContext.openControlFile(LibaioContext.java:308)

                [Server:server-one-slave]       at org.apache.activemq.artemis.core.server.impl.AIOFileLockNodeManager.tryLock(AIOFileLockNodeManager.java:56)

                [Server:server-one-slave]       at org.apache.activemq.artemis.core.server.impl.AIOFileLockNodeManager.lock(AIOFileLockNodeManager.java:76)

                [Server:server-one-slave]       at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.startLiveNode(FileLockNodeManager.java:167)

                [Server:server-one-slave]       at org.apache.activemq.artemis.core.server.impl.SharedStoreLiveActivation.run(SharedStoreLiveActivation.java:63)

                [Server:server-one-slave]       at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.start(ActiveMQServerImpl.java:396)

                [Server:server-one-slave]       at org.apache.activemq.artemis.jms.server.impl.JMSServerManagerImpl.start(JMSServerManagerImpl.java:381)

                [Server:server-one-slave]       at org.wildfly.extension.messaging.activemq.jms.JMSService.doStart(JMSService.java:199)

                [Server:server-one-slave]       at org.wildfly.extension.messaging.activemq.jms.JMSService.access$000(JMSService.java:63)

                [Server:server-one-slave]       at org.wildfly.extension.messaging.activemq.jms.JMSService$1.run(JMSService.java:97)

                [Server:server-one-slave]       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

                [Server:server-one-slave]       at java.util.concurrent.FutureTask.run(FutureTask.java:266)

                [Server:server-one-slave]       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

                [Server:server-one-slave]       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                [Server:server-one-slave]       at java.lang.Thread.run(Thread.java:745)

                [Server:server-one-slave]       at org.jboss.threads.JBossThread.run(JBossThread.java:320)

                [Server:server-one-slave]

                [Server:server-one-slave] 14:37:45,534 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 59) AMQ221001: Apache ActiveMQ Artemis Message Broker version 1.1.0.wildfly-017 [nodeID=49d31e5e-b8ea-11e7-a4ee-85ed33f5e6dc]

                 

                [Server:server-one-slave] 14:37:45,660 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 10.1.0.Final  (WildFly Core 2.2.0.Final) started in 52405ms - Started 426 of 714 services (510 services are lazy, passive or on-demand)

                 

                [Server:server-one-slave] 14:37:45,762 ERROR [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=49deb71f-b8ea-11e7-a4ee-85ed33f5e6dc) AMQ224000: Failure in initialisation: java.io.IOException:   Cannot open file:No such file or directory

                [Server:server-one-slave]       at org.apache.activemq.artemis.jlibaio.LibaioContext.open(Native Method)

                [Server:server-one-slave]       at org.apache.activemq.artemis.jlibaio.LibaioContext.openControlFile(LibaioContext.java:308)

                [Server:server-one-slave]       at org.apache.activemq.artemis.core.server.impl.AIOFileLockNodeManager.tryLock(AIOFileLockNodeManager.java:56)

                [Server:server-one-slave]       at org.apache.activemq.artemis.core.server.impl.AIOFileLockNodeManager.lock(AIOFileLockNodeManager.java:76)

                [Server:server-one-slave]       at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.startBackup(FileLockNodeManager.java:153)

                [Server:server-one-slave]       at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:61)

                [Server:server-one-slave]       at java.lang.Thread.run(Thread.java:745)

                [Server:server-one-slave]

                • 5. Re: Cluster of wildfly instances in domain mode and configuration of message high availability
                  Wayne Wang Apprentice

                  Hi Miroslav,

                   

                  This seems to be related to the fact that the domain controller creates a server.lock file in the journal directory. The host controller's server will start up if I remove the lock file.

                   

                  Do you have any comments?

                   

                  Thanks,

                   

                  Wayne

                  • 6. Re: Cluster of wildfly instances in domain mode and configuration of message high availability
                    Wayne Wang Apprentice

                    Hi Miroslav,

                     

                    It looks like the host controller started with the same configuration as the domain controller, since the messages showed the same active/backup message server.

                    I will try system properties to reconfigure the two servers.

                     

                    Regards,

                     

                    Wayne

                    • 7. Re: Cluster of wildfly instances in domain mode and configuration of message high availability
                      Wayne Wang Apprentice

                      Hi Miroslav,

                       

                      It is working now.

                       

                      I passed the following command-line options to the domain controller:

                      -DmessageDir=A -DmessageBackDir=B

                       

                      I passed the following command-line options to the host controller:

                      -DmessageDir=B -DmessageBackDir=A
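
                      If it helps anyone, the same -D options can also be fixed per server in host.xml instead of being passed on the command line (a sketch only, untested in this exact setup; server and JVM names are illustrative):

                      ```xml
                      <!-- host.xml on the slave machine: bake the properties into the server's JVM options -->
                      <server name="server-one-slave" group="main-server-group">
                          <jvm name="default">
                              <jvm-options>
                                  <option value="-DmessageDir=B"/>
                                  <option value="-DmessageBackDir=A"/>
                              </jvm-options>
                          </jvm>
                      </server>
                      ```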

                       

                      When I created two distinct configuration files, only the domain controller's version was actually used.

                       

                      Thanks,

                       

                      Wayne

                      • 8. Re: Cluster of wildfly instances in domain mode and configuration of message high availability
                        Miroslav Novak Master

                        Hi Wayne, I've never tried to configure this in domain mode and haven't had much time to take a look. So the issue you're seeing is that the domain controller's configuration is used on all brokers? Could you share your configs?

                         

                        Thanks,

                        Mirek

                        • 9. Re: Cluster of wildfly instances in domain mode and configuration of message high availability
                          Wayne Wang Apprentice

                          Hi Mirek,

                           

                          The issue is that once I have a hard-coded configuration in the domain controller's domain.xml, whatever configuration I hard-code in the host controller's domain.xml will not be read as-is. The host controller's domain.xml may contain values different from those in the domain controller's domain.xml, but those values are ignored; the values from the domain controller's domain.xml are used instead.

                           

                          However, if I pass a system property with a different value on the command line to the domain controller and to the host controller, with domain.xml using variables such as ${messageDir} or ${messageBackDir}, then the domain controller and the host controller end up with different values in their runtime configurations, which is what the shared-store approach requires. Note that this problem does not happen with a cluster of WildFly instances running in standalone mode.
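
                          As a minimal illustration (assuming WildFly's standard expression syntax, which also allows a fallback value after a colon), the shared domain.xml can even carry a default so a server started without the property still resolves a path; the "live" fallback below is purely illustrative:

                          ```xml
                          <!-- shared domain.xml: each host resolves the expression from its own -DmessageDir value;
                               "live" after the colon is an illustrative fallback used when the property is unset -->
                          <journal-directory path="/nfsshare/journal-${messageDir:live}"/>
                          ```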

                           

                          You can see that the two configurations are identical when variables are used. If I hard-code the value as A or B in the configuration files, it does not work.

                           

                          The following is a snapshot of the messaging configuration for the domain controller:

                           

                                          <server name="backup">

                                              <cluster password="password"/>

                                              <shared-store-slave failover-on-server-shutdown="true"/>

                                              <bindings-directory path="/nfsshare/bindings-${messageBackDir}"/>

                                              <journal-directory path="/nfsshare/journal-${messageBackDir}"/>

                                              <large-messages-directory path="/nfsshare/largemessages-${messageBackDir}"/>

                                              <paging-directory path="/nfsshare/paging-${messageBackDir}"/>

                              <address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10" redistribution-delay="1000"/>

                                              <remote-connector name="netty-backup" socket-binding="messaging-backup"/>

                                              <remote-acceptor name="netty-backup" socket-binding="messaging-backup"/>

                                              <broadcast-group name="bg-group1" jgroups-channel="activemq-cluster" connectors="netty-backup"/>

                                              <discovery-group name="dg-group1" jgroups-channel="activemq-cluster"/>

                                              <cluster-connection name="my-cluster" address="jms" connector-name="netty-backup" discovery-group="dg-group1"/>

                                          </server>

                           

                           

                          The following is a snapshot of the messaging configuration for the host controller:

                           

                                          <server name="backup">

                                              <cluster password="password"/>

                                              <shared-store-slave failover-on-server-shutdown="true"/>

                                              <bindings-directory path="/nfsshare/bindings-${messageBackDir}"/>

                                              <journal-directory path="/nfsshare/journal-${messageBackDir}"/>

                                              <large-messages-directory path="/nfsshare/largemessages-${messageBackDir}"/>

                                              <paging-directory path="/nfsshare/paging-${messageBackDir}"/>

                              <address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10" redistribution-delay="1000"/>

                                              <remote-connector name="netty-backup" socket-binding="messaging-backup"/>

                                              <remote-acceptor name="netty-backup" socket-binding="messaging-backup"/>

                                              <broadcast-group name="bg-group1" jgroups-channel="activemq-cluster" connectors="netty-backup"/>

                                              <discovery-group name="dg-group1" jgroups-channel="activemq-cluster"/>

                                              <cluster-connection name="my-cluster" address="jms" connector-name="netty" discovery-group="dg-group1"/>

                                          </server>

                           

                          Regards,

                           

                          Wayne