Wildfly 10.0.0.Final slow shutdown
jprio Mar 24, 2016 7:00 AMHello,
Since our migration from JB7 to WF10, the nodes of our production cluster are taking a very long time to stop. Here is our configuration :
- WF10.0.0.Final, java build 1.8.0_72-b15, mod_cluster 1.2.6.Final
- Domain mode, full-ha profile
- <distributable/> app with ~700 sessions (we’re using Wicket as our web fmk)
- 4 nodes in the cluster
- Default configuration of :
- Infinispan for session replication (except for the locking isolation : READ_COMMITED)
- Mod cluster
The nodes are stopping very quickly in QA (2/3s).
In production, our application takes always approx. 242s to stop and the server takes always approx. 320s to stop.
What we saw in the logs :
- Stop server, session draining :
[root@s-pho-tmp ~]# grep 'All pending' /usr/local/apps/wildfly-10.0.0.Final/domain/servers/s-pho-tmp/log/server.log
2016-03-23 08:18:53,822 INFO [org.jboss.as.server] (ServerService Thread Pool -- 80) WFLYSRV0211: Suspending server with 0ms timeout.
2016-03-23 08:21:02,151 INFO [org.jboss.modcluster] (ServerService Thread Pool -- 80) MODCLUSTER000021: All pending requests drained from default-host:/phoenix-tmp in 80.1 seconds
2016-03-23 08:21:02,162 INFO [org.jboss.as.server] (Thread-2) WFLYSRV0220: Server shutdown has been requested.
…
2016-03-23 08:23:42,276 INFO [org.jboss.modcluster] (ServerService Thread Pool -- 86) MODCLUSTER000021: All pending requests drained from default-host:/phoenix-tmp in 80.1 seconds
…
2016-03-23 08:25:03,915 INFO [org.jboss.as.server.deployment] (MSC service thread 1-8) WFLYSRV0028: Stopped deployment aecetia-web-15.1.5-tmp.war (runtime-name: aecetia-web-15.1.5-tmp.war) in 241744ms
…
2016-03-23 08:25:33,216 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000080: Disconnecting JGroups channel web
2016-03-23 08:25:33,216 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000082: Stopping the RpcDispatcher for channel web
2016-03-23 08:26:22,432 INFO [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0008: Undertow HTTP listener default suspending
2016-03-23 08:26:22,434 INFO [org.jboss.modcluster] (ServerService Thread Pool -- 97) MODCLUSTER000002: Initiating mod_cluster shutdown
2016-03-23 08:26:22,436 INFO [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0007: Undertow HTTP listener default stopped, was bound to 192.168.123.196:8080
2016-03-23 08:26:22,438 INFO [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0004: Undertow 1.3.15.Final stopping
2016-03-23 08:26:22,472 INFO [org.jboss.as] (MSC service thread 1-5) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320294ms
- Webapp stop duration :
[root@s-pho-tmp ~]# grep 'Stopped deployment aecetia-web-15.1.5-tmp.war' /usr/local/apps/wildfly-10.0.0.Final/domain/servers/s-pho-tmp/log/server.log
2016-03-23 08:25:03,915 INFO [org.jboss.as.server.deployment] (MSC service thread 1-8) WFLYSRV0028: Stopped deployment aecetia-web-15.1.5-tmp.war (runtime-name: aecetia-web-15.1.5-tmp.war) in 241744ms
2016-03-23 08:41:14,881 INFO [org.jboss.as.server.deployment] (MSC service thread 1-4) WFLYSRV0028: Stopped deployment aecetia-web-15.1.5-tmp.war (runtime-name: aecetia-web-15.1.5-tmp.war) in 241812ms
2016-03-23 09:52:21,457 INFO [org.jboss.as.server.deployment] (MSC service thread 1-2) WFLYSRV0028: Stopped deployment aecetia-web-15.1.5-tmp.war (runtime-name: aecetia-web-15.1.5-tmp.war) in 242510ms
2016-03-23 10:31:00,402 INFO [org.jboss.as.server.deployment] (MSC service thread 1-2) WFLYSRV0028: Stopped deployment aecetia-web-15.1.5-tmp.war (runtime-name: aecetia-web-15.1.5-tmp.war) in 242063ms
- Server stop duration :
[root@s-pho-tmp ~]# grep 'WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped' /usr/local/apps/wildfly-10.0.0.Final/domain/servers/s-pho-tmp/log/server.log
2016-03-24 08:34:56,689 INFO [org.jboss.as] (MSC service thread 1-3) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320227ms
2016-03-24 09:36:04,231 INFO [org.jboss.as] (MSC service thread 1-3) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320227ms
2016-03-24 09:50:27,389 INFO [org.jboss.as] (MSC service thread 1-2) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320226ms
2016-03-24 10:27:36,769 INFO [org.jboss.as] (MSC service thread 1-3) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320224ms
2016-03-24 10:51:56,552 INFO [org.jboss.as] (MSC service thread 1-5) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320317ms
2016-03-24 11:13:24,815 INFO [org.jboss.as] (MSC service thread 1-3) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320255ms
2016-03-24 11:27:07,772 INFO [org.jboss.as] (MSC service thread 1-7) WFLYSRV0050: WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final) stopped in 320274ms
- ISPN timeout
2016-03-23 15:49:19,426 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p16-t6) ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: Timed out waiting for valid responses!
What we tried :
- Modify the transport timeout (->40000) for ISPN => no effect
- Modify the state-transfert time out (-> 120000) for ISPN => no effect
- Modifify the drainingSessionStrategy (-> NEVER) for the mod cluster config => no effect
Any help appreciated. The impact for us : a long time to restart a node and deliver a new version of the application.
Regards,
JP