0 Replies Latest reply on Dec 5, 2016 12:30 PM by java.food

    What is the correct way to configure domain controller failover in WF10?

    java.food

      We plan to run our application in domain mode.  I know that there is no automatic domain controller failover functionality at the moment.  However, the WF documentation on manual domain controller failover is unclear.  Here is our setup:

       

      Domain Controllers:

      1. dc1 -- this is the active controller.  There are no servers defined in its host.xml file.

      2. dc2 -- this is the backup controller.  As with dc1, there are no servers defined in its host.xml file.  The domain-controller section in the host.xml points to dc1.  This host is normally started with the --backup option.
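      For illustration, a domain-controller section of this kind in dc2's host.xml might look roughly like the sketch below. It is not the actual file from this setup: the security realm (ManagementRealm, the shipped default) and the discovery option name are assumptions, and the protocol and port are taken from the log output further down.

```xml
<!-- Sketch of dc2's host.xml: a backup host controller that registers with dc1. -->
<!-- Assumed values: security-realm and the discovery option name "primary". -->
<domain-controller>
    <remote security-realm="ManagementRealm">
        <discovery-options>
            <!-- dc1 is the only controller this host tries to reach -->
            <static-discovery name="primary" protocol="remote" host="dc1" port="9999"/>
        </discovery-options>
    </remote>
</domain-controller>
```

      With --backup, the host controller keeps a local copy of the domain configuration it receives from dc1.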

       

      App servers:

      app1, app2 and app3.  These hosts each have 2 servers defined in the host.xml file.  In the domain-controller section, they are configured to talk to dc1 first and then fail over to dc2.
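      A minimal sketch of such a domain-controller section on the app hosts, assuming static discovery with one entry per controller. The option names ("primary", "backup") and the security realm are illustrative assumptions. The protocol and port match the StaticDiscovery entry shown in the app server log below.

```xml
<!-- Sketch of the app hosts' host.xml (app1, app2, app3). -->
<!-- Discovery options are listed in the order described above: dc1 first, then dc2. -->
<domain-controller>
    <remote security-realm="ManagementRealm">
        <discovery-options>
            <!-- first choice: the active controller -->
            <static-discovery name="primary" protocol="remote" host="dc1" port="9999"/>
            <!-- fallback: the backup controller -->
            <static-discovery name="backup" protocol="remote" host="dc2" port="9999"/>
        </discovery-options>
    </remote>
</domain-controller>
```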

       

      Load Balancer:

      lb -- running Apache with modcluster.

       

      Under normal operation, everything works fine.  However, we run into problems when testing this domain controller failure scenario:

      Step 1: Kill dc1.  All the app servers fail over to dc2 correctly.

      Step 2: Start dc1 again

      Step 3: Kill dc2.  We expect the app servers to fail back to dc1.  However, the app servers don't go back to dc1, and we see this message repeating in dc1's log:

       

      [Host Controller] 12:06:23,080 INFO  [org.jboss.as.protocol] (management I/O-1) WFLYPRT0057:  cancelled task by interrupting thread Thread[Host Controller Service Threads - 37,5,Host Controller Service Threads]

      [Host Controller] 12:06:24,406 INFO  [org.jboss.as.protocol] (management I/O-1) WFLYPRT0057:  cancelled task by interrupting thread Thread[Host Controller Service Threads - 37,5,Host Controller Service Threads]

       

      The app servers log an exception:

      [Host Controller] 12:17:57,495 ERROR [org.jboss.as.host.controller] (Host Controller Service Threads - 169) WFLYHC0143: Failed to apply domain-wide configuration from master host controller. Operation outcome: failed. Failure description "WFLYCTL0158: Operation handler failed: java.lang.IllegalArgumentException: WFLYCTL0156: step is null"

      [Host Controller] 12:17:57,497 INFO  [org.jboss.as.protocol] (Host Controller Service Threads - 20) WFLYPRT0057:  cancelled task by interrupting thread Thread[Host Controller Service Threads - 169,5,Host Controller Service Threads]

      [Host Controller] 12:17:57,502 WARN  [org.jboss.as.host.controller] (Host Controller Service Threads - 20) WFLYHC0146: Could not discover master using discovery option StaticDiscovery{protocol=remote,host=dc1,port=9999}. Error was: 1-$-

      [Host Controller] 12:18:12,507 INFO  [org.jboss.as.host.controller] (Host Controller Service Threads - 20) WFLYHC0150: Trying to reconnect to master host controller.

      [Host Controller] 12:18:12,593 ERROR [org.jboss.as.controller.management-operation] (Host Controller Service Threads - 171) WFLYCTL0013: Operation ("apply-remote-domain-model") failed - address: ([]): java.lang.IllegalArgumentException: WFLYCTL0156: step is null

      [Host Controller]       at org.jboss.as.controller.AbstractOperationContext.addStep(AbstractOperationContext.java:290)

      [Host Controller]       at org.jboss.as.controller.AbstractOperationContext.addStep(AbstractOperationContext.java:270)

      [Host Controller]       at org.jboss.as.controller.AbstractOperationContext.addStep(AbstractOperationContext.java:246)

      [Host Controller]       at org.jboss.as.domain.controller.operations.SyncServerStateOperationHandler$1.execute(SyncServerStateOperationHandler.java:112)

      [Host Controller]       at org.jboss.as.controller.AbstractOperationContext.executeStep(AbstractOperationContext.java:890)

      [Host Controller]       at org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:659)

      [Host Controller]       at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:370)

      [Host Controller]       at org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1329)

      [Host Controller]       at org.jboss.as.controller.ModelControllerImpl.internalExecute(ModelControllerImpl.java:400)

      [Host Controller]       at org.jboss.as.controller.AbstractControllerService.internalExecute(AbstractControllerService.java:409)

      [Host Controller]       at org.jboss.as.host.controller.DomainModelControllerService.access$1000(DomainModelControllerService.java:179)

      [Host Controller]       at org.jboss.as.host.controller.DomainModelControllerService$InternalExecutor.execute(DomainModelControllerService.java:1255)

      [Host Controller]       at org.jboss.as.host.controller.RemoteDomainConnectionService.applyRemoteDomainModel(RemoteDomainConnectionService.java:575)

      [Host Controller]       at org.jboss.as.host.controller.RemoteDomainConnectionService.access$1100(RemoteDomainConnectionService.java:131)

      [Host Controller]       at org.jboss.as.host.controller.RemoteDomainConnectionService$2.applyDomainModel(RemoteDomainConnectionService.java:518)

      [Host Controller]       at org.jboss.as.host.controller.RemoteDomainConnection.applyDomainModel(RemoteDomainConnection.java:311)

      [Host Controller]       at org.jboss.as.host.controller.RemoteDomainConnection$RegisterSubsystemsRequest$1.execute(RemoteDomainConnection.java:454)

      [Host Controller]       at org.jboss.as.protocol.mgmt.AbstractMessageHandler$ManagementRequestContextImpl$1.doExecute(AbstractMessageHandler.java:363)

      [Host Controller]       at org.jboss.as.protocol.mgmt.AbstractMessageHandler$AsyncTaskRunner.run(AbstractMessageHandler.java:472)

      [Host Controller]       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

      [Host Controller]       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

      [Host Controller]       at java.lang.Thread.run(Thread.java:745)

      [Host Controller]       at org.jboss.threads.JBossThread.run(JBossThread.java:320)

       

      My questions are:

      1. What is the correct way to configure the domain controller failover?

      2. What is the purpose of the --cached-dc flag?  It seems we don't need this flag at all in Step 2 of our testing.