2 Replies Latest reply on Mar 28, 2018 1:19 PM by Oscar Islas

    Failover problem in WildFly 12 in domain mode

    Oscar Islas Newbie

      Hello,

      I was following this tutorial to create a cluster, but its configuration is either incomplete or simply doesn't work for me. I've been fixing each new issue as it appears, but the one I can't seem to fix is failover.

      When I shut down the first server so that the second one should pick up the work, nothing takes over and the browser just says "Site can't be reached".

      The log doesn't report an error either, only warnings, and it still doesn't work:

      2018-03-27 12:24:28,183 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) ISPN000310: Starting cluster-wide rebalance for cache default-server, topology CacheTopology{id=2, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[master: 256]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (2)[master: 126, slave: 130]}, unionCH=null, phase=READ_OLD_WRITE_ALL, actualMembers=[master, slave], persistentUUIDs=[501fd582-6eeb-4f6a-bbf3-60505aed7e37, 7897b9ee-2fe6-44dd-9939-8cf5e4d73489]}
      2018-03-27 12:24:28,183 INFO  [org.infinispan.CLUSTER] (remote-thread--p6-t2) ISPN000310: Starting cluster-wide rebalance for cache client-mappings, topology CacheTopology{id=2, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[master: 256]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (2)[master: 126, slave: 130]}, unionCH=null, phase=READ_OLD_WRITE_ALL, actualMembers=[master, slave], persistentUUIDs=[ebfb150a-2304-4e69-ae9d-1c04b4b336ae, afd55427-e889-46bf-836d-99b448fdde3e]}
      2018-03-27 12:24:28,201 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) [Context=default-server][Scope=master]ISPN100002: Started rebalance with topology id 2
      2018-03-27 12:24:28,201 INFO  [org.infinispan.CLUSTER] (remote-thread--p6-t2) [Context=client-mappings][Scope=master]ISPN100002: Started rebalance with topology id 2
      2018-03-27 12:24:28,232 INFO  [org.infinispan.CLUSTER] (transport-thread--p15-t11) [Context=default-server][Scope=master]ISPN100003: Node master finished rebalance phase with topology id 2
      2018-03-27 12:24:28,281 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t8) [Context=client-mappings][Scope=master]ISPN100003: Node master finished rebalance phase with topology id 2
      2018-03-27 12:24:28,296 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) ISPN000310: Starting cluster-wide rebalance for cache cluster-demo-master.war, topology CacheTopology{id=2, rebalanceId=2, currentCH=DefaultConsistentHash{ns=256, owners = (1)[master: 256+0]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[master: 126+130, slave: 130+126]}, unionCH=null, phase=READ_OLD_WRITE_ALL, actualMembers=[master, slave], persistentUUIDs=[501fd582-6eeb-4f6a-bbf3-60505aed7e37, 7897b9ee-2fe6-44dd-9939-8cf5e4d73489]}
      2018-03-27 12:24:28,297 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) [Context=cluster-demo-master.war][Scope=master]ISPN100002: Started rebalance with topology id 2
      2018-03-27 12:24:28,383 INFO  [org.infinispan.CLUSTER] (transport-thread--p15-t14) [Context=cluster-demo-master.war][Scope=master]ISPN100003: Node master finished rebalance phase with topology id 2
      2018-03-27 12:24:31,622 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) [Context=cluster-demo-master.war][Scope=slave]ISPN100003: Node slave finished rebalance phase with topology id 2
      2018-03-27 12:24:31,622 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) ISPN000336: Finished cluster-wide rebalance for cache cluster-demo-master.war, topology id = 2
      2018-03-27 12:24:31,639 INFO  [org.infinispan.CLUSTER] (transport-thread--p15-t19) [Context=cluster-demo-master.war][Scope=master]ISPN100003: Node master finished rebalance phase with topology id 3
      2018-03-27 12:24:31,748 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) [Context=cluster-demo-master.war][Scope=slave]ISPN100003: Node slave finished rebalance phase with topology id 3
      2018-03-27 12:24:31,752 INFO  [org.infinispan.CLUSTER] (transport-thread--p15-t22) [Context=cluster-demo-master.war][Scope=master]ISPN100003: Node master finished rebalance phase with topology id 4
      2018-03-27 12:24:31,755 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) [Context=cluster-demo-master.war][Scope=slave]ISPN100003: Node slave finished rebalance phase with topology id 4
      2018-03-27 12:24:33,353 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) [Context=default-server][Scope=slave]ISPN100003: Node slave finished rebalance phase with topology id 2
      2018-03-27 12:24:33,354 INFO  [org.infinispan.CLUSTER] (remote-thread--p6-t2) [Context=client-mappings][Scope=slave]ISPN100003: Node slave finished rebalance phase with topology id 2
      2018-03-27 12:24:33,354 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) ISPN000336: Finished cluster-wide rebalance for cache default-server, topology id = 2
      2018-03-27 12:24:33,355 INFO  [org.infinispan.CLUSTER] (remote-thread--p6-t2) ISPN000336: Finished cluster-wide rebalance for cache client-mappings, topology id = 2
      2018-03-27 12:24:33,358 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t12) [Context=client-mappings][Scope=master]ISPN100003: Node master finished rebalance phase with topology id 3
      2018-03-27 12:24:33,359 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) [Context=default-server][Scope=slave]ISPN100003: Node slave finished rebalance phase with topology id 3
      2018-03-27 12:24:33,365 INFO  [org.infinispan.CLUSTER] (transport-thread--p15-t3) [Context=default-server][Scope=master]ISPN100003: Node master finished rebalance phase with topology id 3
      2018-03-27 12:24:33,368 INFO  [org.infinispan.CLUSTER] (remote-thread--p6-t2) [Context=client-mappings][Scope=slave]ISPN100003: Node slave finished rebalance phase with topology id 3
      2018-03-27 12:24:33,371 INFO  [org.infinispan.CLUSTER] (remote-thread--p5-t1) [Context=default-server][Scope=slave]ISPN100003: Node slave finished rebalance phase with topology id 4
      2018-03-27 12:24:33,369 INFO  [org.infinispan.CLUSTER] (transport-thread--p15-t3) [Context=default-server][Scope=master]ISPN100003: Node master finished rebalance phase with topology id 4
      2018-03-27 12:24:33,378 INFO  [org.infinispan.CLUSTER] (transport-thread--p14-t15) [Context=client-mappings][Scope=master]ISPN100003: Node master finished rebalance phase with topology id 4
      2018-03-27 12:24:33,384 INFO  [org.infinispan.CLUSTER] (remote-thread--p6-t2) [Context=client-mappings][Scope=slave]ISPN100003: Node slave finished rebalance phase with topology id 4
      2018-03-27 12:24:36,648 INFO  [org.apache.activemq.artemis.core.server] (Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@1444d2c)) AMQ221027: Bridge ClusterConnectionBridge@17bb730 [name=sf.my-cluster.faac71f2-312b-11e8-bac3-b86920524153, queue=QueueImpl[name=sf.my-cluster.faac71f2-312b-11e8-bac3-b86920524153, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=f7d3fd88-312b-11e8-b695-b86920524153]]@8a2675 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@17bb730 [name=sf.my-cluster.faac71f2-312b-11e8-bac3-b86920524153, queue=QueueImpl[name=sf.my-cluster.faac71f2-312b-11e8-bac3-b86920524153, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=f7d3fd88-312b-11e8-b695-b86920524153]]@8a2675 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8380&host=192-168-80-190], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@31549909[nodeUUID=f7d3fd88-312b-11e8-b695-b86920524153, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8330&host=192-168-80-187, address=jms, server=ActiveMQServerImpl::serverUUID=f7d3fd88-312b-11e8-b695-b86920524153])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8380&host=192-168-80-190], discoveryGroupConfiguration=null]] is connected
      2018-03-27 12:27:06,491 INFO  [stdout] (default task-1) Putting date now
      
      
      2018-03-27 12:29:08,802 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false
      2018-03-27 12:29:13,741 INFO  [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-19,ejb,master) ISPN000094: Received new cluster view for channel ejb: [master|2] (1) [master]
      2018-03-27 12:29:13,748 INFO  [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-19,ejb,master) ISPN100001: Node slave left the cluster
      2018-03-27 12:29:13,749 INFO  [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-19,ejb,master) ISPN000094: Received new cluster view for channel ejb: [master|2] (1) [master]
      2018-03-27 12:29:13,749 INFO  [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-19,ejb,master) ISPN100001: Node slave left the cluster
      2018-03-27 12:29:13,750 INFO  [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-19,ejb,master) ISPN000094: Received new cluster view for channel ejb: [master|2] (1) [master]
      2018-03-27 12:29:13,750 INFO  [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-19,ejb,master) ISPN100001: Node slave left the cluster
      2018-03-27 12:29:13,750 INFO  [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-19,ejb,master) ISPN000094: Received new cluster view for channel ejb: [master|2] (1) [master]
      2018-03-27 12:29:13,752 INFO  [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-19,ejb,master) ISPN100001: Node slave left the cluster
      2018-03-27 12:29:13,774 WARN  [org.infinispan.CLUSTER] (transport-thread--p14-t19) [Context=client-mappings]ISPN000314: Lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [master], lost members are [slave], stable members are [master, slave]
      2018-03-27 12:29:13,775 WARN  [org.infinispan.CLUSTER] (transport-thread--p15-t12) [Context=default-server]ISPN000314: Lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [master], lost members are [slave], stable members are [master, slave]
      2018-03-27 12:29:13,780 WARN  [org.infinispan.CLUSTER] (transport-thread--p15-t12) [Context=cluster-demo-master.war]ISPN000314: Lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [master], lost members are [slave], stable members are [master, slave]


      I'm working on a single Windows 7 machine with a master node and a slave node. To test, I kill the process with "taskkill /PID"; I have also tried shutting the server down gracefully, and pretty much every other way, but it doesn't work.

      I'm using a very simple application that prints and saves a Date on one page and then displays that Date on another page. And yes, my web.xml has the <distributable/> tag.

      I'm going to attach the logs of both servers as well as my domain.xml configuration, which is where I think the problem is.

      Any help is greatly appreciated, thank you.

        • 1. Re: Failover problem in WildFly 12 in domain mode
          Radoslav Husar Master

          How are you testing?


          It seems like "Site can't be reached" is a message from the Google Chrome browser.


          This site can’t be reached

          localhost refused to connect.

          ERR_CONNECTION_REFUSED


          You are trying to access the server that you shut down. You either need to change the port to the second node's port (to simulate what a load balancer would do on failover) or set up and access the load balancer address.
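          To make the load-balancer suggestion concrete, here is a rough Java sketch of what a load balancer does on failover, written as a hypothetical stand-alone test client (it is not mod_cluster or WildFly's own load-balancer profile). It probes the nodes in order, sends the request to the first one that accepts a connection, and forwards the JSESSIONID obtained earlier so the surviving node can look up the replicated session. The node URLs are the ones given later in this thread; the class and method names and the /cluster-demo-master/ context path (guessed from the cluster-demo-master.war cache name in the log) are assumptions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical test helper that mimics a load balancer on failover:
// route to the first node that accepts a connection and keep sending the
// same JSESSIONID so the surviving node can find the replicated session.
class FailoverClient {

    // Return the first node the probe reports as up, or null if none is.
    static String pickNode(List<String> nodes, Predicate<String> isUp) {
        for (String node : nodes) {
            if (isUp.test(node)) {
                return node;
            }
        }
        return null;
    }

    // A stopped node refuses the TCP connection (Chrome's ERR_CONNECTION_REFUSED);
    // HttpURLConnection reports that as an IOException, treated here as "down".
    static boolean isReachable(String baseUrl) {
        try {
            HttpURLConnection c = (HttpURLConnection) new URL(baseUrl).openConnection();
            c.setConnectTimeout(2000);
            c.setRequestMethod("HEAD");
            c.getResponseCode();
            c.disconnect();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Cookie header value that carries the existing session to another node.
    static String sessionCookie(String jsessionId) {
        return "JSESSIONID=" + jsessionId;
    }

    public static void main(String[] args) throws IOException {
        List<String> nodes = Arrays.asList(
                "http://192.168.80.187:8330",   // master
                "http://192.168.80.190:8380");  // slave
        String node = pickNode(nodes, FailoverClient::isReachable);
        if (node == null) {
            System.out.println("No node is reachable");
            return;
        }
        // Assumed context path, based on the deployment name in the log.
        HttpURLConnection c = (HttpURLConnection)
                new URL(node + "/cluster-demo-master/").openConnection();
        if (args.length > 0) {
            // JSESSIONID copied from the browser before master was stopped.
            c.setRequestProperty("Cookie", sessionCookie(args[0]));
        }
        System.out.println(node + " -> HTTP " + c.getResponseCode());
        c.disconnect();
    }
}
```

          For example, pass the JSESSIONID value copied from the browser's developer tools after saving the Date; once master is stopped, the probe should fall through to the slave while the same session ID is still sent. A real load balancer also does sticky routing and health checks, which this sketch leaves out.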


          Also, the document is out of date and needs an update.

          • 2. Re: Failover problem in WildFly 12 in domain mode
            Oscar Islas Newbie

            Hi Radoslav,


            I'm using two static IPs, 192.168.80.187:8330 (master) and 192.168.80.190:8380 (slave). This is the process I follow to test:

            1. Start both servers and confirm that the cluster formed without errors. All good here.

            2. Access master in the browser, then print and put a Date. All good here.

            3. Shut down master by killing its process.

            4. Try to access master, slave, master:8380, or slave:8330. No combination works: the only positive result I get is from slave, which returns a null Date because, for some reason, it acts as a separate instance and thinks a Date was never saved.
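            The null Date in step 4 may be explained, at least in part, by how browsers scope cookies. Assuming the default session cookie configuration (no Domain attribute on JSESSIONID), RFC 6265 treats the cookie as host-only: the browser returns it only to the exact host that set it, and ports are not compared. So when the browser goes straight to 192.168.80.190, it may send no JSESSIONID at all, and the slave would then start a fresh session even if replication is working. The hypothetical helper below models only that host-only rule; it is a simplified sketch, not a browser's full cookie algorithm.

```java
import java.net.URI;

// Simplified model of the RFC 6265 "host-only" cookie rule: a cookie set
// without a Domain attribute is sent back only to the exact host that set it.
// Ports are not compared (RFC 6265, section 8.5). Path, Secure and expiry
// checks are deliberately left out of this sketch.
class CookieScope {

    static boolean hostOnlyCookieSent(String setByUrl, String requestUrl) {
        String origin = URI.create(setByUrl).getHost();
        String target = URI.create(requestUrl).getHost();
        return origin != null && origin.equalsIgnoreCase(target);
    }

    public static void main(String[] args) {
        String master = "http://192.168.80.187:8330/";
        String[] targets = {
            "http://192.168.80.190:8380/",  // slave: different host
            "http://192.168.80.187:8380/"   // same host as master, other port
        };
        for (String t : targets) {
            System.out.println(t + " receives master's JSESSIONID: "
                    + hostOnlyCookieSent(master, t));
        }
    }
}
```

            This is why a single load balancer address (or copying the cookie by hand, as a test client can) keeps the session ID attached across a failover, while switching the browser to the other node's IP does not.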


            Also, I don't understand what you mean by "the document is out of date and needs an update."

            Thanks for the help.