1 Reply Latest reply on Apr 6, 2016 7:40 AM by jprio

    Wildfly 10.0.0.Final slow shutdown

    jprio

      Hello,

      Since our migration from JB7 to WF10, the nodes of our production cluster are taking a very long time to stop. Here is our configuration :

      • WF10.0.0.Final, java build 1.8.0_72-b15, mod_cluster 1.2.6.Final
      • Domain mode, full-ha profile
      • <distributable/> app with ~700 sessions (we’re using Wicket as our web fmk)
      • 4 nodes in the cluster
      • Default configuration of :
        • Infinispan for session replication (except for the locking isolation : READ_COMMITED)
        • Mod cluster

       

      The nodes are stopping very quickly in QA (2/3s).

       

      In production, our application takes always approx. 242s to stop and the server takes always approx. 320s to stop.

       

      What we saw in the logs :

       

      • Stop server, session draining :

       

      [root@s-pho-tmp ~]# grep 'All pending' /usr/local/apps/wildfly-10.0.0.Final/domain/servers/s-pho-tmp/log/server.log

      2016-03-23 08:18:53,822 INFO [org.jboss.as.server] (ServerService Thread Pool -- 80) WFLYSRV0211: Suspending server with 0ms timeout.

      2016-03-23 08:21:02,151 INFO [org.jboss.modcluster] (ServerService Thread Pool -- 80) MODCLUSTER000021: All pending requests drained from default-host:/phoenix-tmp in 80.1 seconds

      2016-03-23 08:21:02,162 INFO [org.jboss.as.server] (Thread-2) WFLYSRV0220: Server shutdown has been requested.

      2016-03-23 08:23:42,276 INFO [org.jboss.modcluster] (ServerService Thread Pool -- 86) MODCLUSTER000021: All pending requests drained from default-host:/phoenix-tmp in 80.1 seconds

      2016-03-23 08:25:03,915 INFO [org.jboss.as.server.deployment] (MSC service thread 1-8) WFLYSRV0028: Stopped deployment aecetia-web-15.1.5-tmp.war (runtime-name: aecetia-web-15.1.5-tmp.war) in 241744ms

      2016-03-23 08:25:33,216 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000080: Disconnecting JGroups channel web

      2016-03-23 08:25:33,216 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000082: Stopping the RpcDispatcher for channel web

      2016-03-23 08:26:22,432 INFO [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0008: Undertow HTTP listener default suspending

      2016-03-23 08:26:22,434 INFO [org.jboss.modcluster] (ServerService Thread Pool -- 97) MODCLUSTER000002: Initiating mod_cluster shutdown

      2016-03-23 08:26:22,436 INFO [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0007: Undertow HTTP listener default stopped, was bound to 192.168.123.196:8080

      2016-03-23 08:26:22,438 INFO [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0004: Undertow 1.3.15.Final stopping

      2016-03-23 08:26:22,472 INFO [org.jboss.as] (MSC service thread 1-5) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320294ms

       

      • Webapp stop duration :

       

      [root@s-pho-tmp ~]# grep 'Stopped deployment aecetia-web-15.1.5-tmp.war' /usr/local/apps/wildfly-10.0.0.Final/domain/servers/s-pho-tmp/log/server.log

      2016-03-23 08:25:03,915 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-8) WFLYSRV0028: Stopped deployment aecetia-web-15.1.5-tmp.war (runtime-name: aecetia-web-15.1.5-tmp.war) in 241744ms

      2016-03-23 08:41:14,881 INFO [org.jboss.as.server.deployment] (MSC service thread 1-4) WFLYSRV0028: Stopped deployment aecetia-web-15.1.5-tmp.war (runtime-name: aecetia-web-15.1.5-tmp.war) in 241812ms

      2016-03-23 09:52:21,457 INFO [org.jboss.as.server.deployment] (MSC service thread 1-2) WFLYSRV0028: Stopped deployment aecetia-web-15.1.5-tmp.war (runtime-name: aecetia-web-15.1.5-tmp.war) in 242510ms

      2016-03-23 10:31:00,402 INFO [org.jboss.as.server.deployment] (MSC service thread 1-2) WFLYSRV0028: Stopped deployment aecetia-web-15.1.5-tmp.war (runtime-name: aecetia-web-15.1.5-tmp.war) in 242063ms

       

      • Server stop duration :

       

      [root@s-pho-tmp ~]# grep 'WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped' /usr/local/apps/wildfly-10.0.0.Final/domain/servers/s-pho-tmp/log/server.log

      2016-03-24 08:34:56,689 INFO [org.jboss.as] (MSC service thread 1-3) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320227ms

      2016-03-24 09:36:04,231 INFO [org.jboss.as] (MSC service thread 1-3) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320227ms

      2016-03-24 09:50:27,389 INFO [org.jboss.as] (MSC service thread 1-2) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320226ms

      2016-03-24 10:27:36,769 INFO [org.jboss.as] (MSC service thread 1-3) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320224ms

      2016-03-24 10:51:56,552 INFO [org.jboss.as] (MSC service thread 1-5) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320317ms

      2016-03-24 11:13:24,815 INFO  [org.jboss.as] (MSC service thread 1-3) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320255ms

      2016-03-24 11:27:07,772 INFO [org.jboss.as] (MSC service thread 1-7) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320274ms

       

      • ISPN timeout

       

      2016-03-23 15:49:19,426 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p16-t6) ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: Timed out waiting for valid responses!

       

      What we tried :

      • Modify the transport timeout (->40000) for ISPN => no effect
      • Modify the state-transfert time out (-> 120000) for ISPN => no effect
      • Modifify the drainingSessionStrategy (-> NEVER) for the mod cluster config => no effect

       

       

      Any help appreciated. The impact for us : a long time to restart a node and deliver a new version of the application.

       

       

      Regards,

      JP

        • 1. Re: Wildfly 10.0.0.Final slow shutdown
          jprio

          The problem was on our side (network configuration). During the startup, we saw this log :

           

          2016-04-06 10:29:46,340 ERROR [org.jboss.modcluster] (UndertowEventHandlerAdapter - 1) MODCLUSTER000043: Failed to send INFO command to phoenix-vip-02/192.168.122.20:6666: connect timed out


          One of our web proxies was not reachable because of a fw rule. Once the rule modified, everything was ok (fast startup and shutdown).

           

          JP

          1 of 1 people found this helpful