Issue analysis for the request to invoke the WildFly graceful shutdown logic when the server process is terminated via an operating system signal and not by a management operation.
WildFly 10 introduced the "graceful shutdown" feature: as part of shutdown, the server first moves into a suspended mode in which new requests are rejected while in-process requests are given a configurable amount of time to complete, and only then shuts down. See the Admin Guide in the WildFly documentation for details on this feature. Currently, however, graceful shutdown only occurs if the server is stopped via a management operation. If shutdown is triggered by an OS signal (e.g. from a kill -15 or a Ctrl-C), the shutdown is not graceful. This makes managing shutdown of WildFly servers more difficult for tooling such as Kubernetes, which has to use general-purpose facilities like OS signals to manage a wide variety of process types. The idea here is to enable graceful shutdown when the server process is terminated via a signal.
EAP ISSUE: https://issues.jboss.org/browse/EAP7-732
DEV CONTACTS: Brian Stansberry
QE CONTACTS: Jan Stourac
AFFECTED PROJECTS OR COMPONENTS: WildFly Core kernel, OpenShift
OTHER INTERESTED PARTIES:
HARD REQUIREMENTS:
- Perform graceful shutdown in a manner consistent with what occurs when the 'shutdown' management operation is performed if the following are true:
  - The JVM has received an OS signal that causes shutdown hooks to run.
  - The state of the server is RUNNING; i.e. the server has completed boot and has not begun any sort of shutdown or reload.
- Management operations like 'shutdown' accept an operation parameter that sets the maximum time to wait for in-process requests to complete before proceeding with shutdown. That parameter is not available in the OS signal case, so instead this value can be configured via a new org.wildfly.sigterm.suspend.timeout system property.
  - The value of the property is an integer that represents the number of seconds to wait for requests to complete, counted from when the server suspend-state enters state SUSPENDING, before proceeding with shutdown.
  - A value of less than 0 means "wait indefinitely".
  - The default value is 0.
  - The value of the system property will be read at the time of handling a signal, so whatever value is in effect in the JVM at that time will be applied.
    - If the system property is set as part of server boot (e.g. via -D to java, via inclusion in a properties file passed via -p, via configuration in the xml config file processed at boot, or, for a domain server, via inclusion in the config provided by the Host Controller at boot), then the value of the property will take immediate effect.
    - If the value of the property is set on a server following boot via update of a system-property resource in the management model (e.g. using CLI or HAL console), the value of the property will also take immediate effect.
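The requirements above can be sketched roughly as follows. This is a minimal illustration, not the WildFly implementation: the class, the ServerState enum, the request counter and the method names are all hypothetical, and only the property name and its semantics (seconds, negative means wait indefinitely, default 0, read when the signal is handled) come from the text. Falling back to 0 for an unparsable value is an assumption; the requirements do not say how malformed values are treated.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SigtermSuspendSketch {
    // Property name taken from the requirements above.
    static final String TIMEOUT_PROP = "org.wildfly.sigterm.suspend.timeout";

    // Hypothetical, simplified server states; only RUNNING qualifies for graceful handling.
    enum ServerState { STARTING, RUNNING, STOPPING }

    // Reads the property at call time, i.e. when the signal is being handled.
    // Returns 0 when the property is unset; returning 0 for an unparsable
    // value is an assumption of this sketch.
    static int readTimeoutSeconds() {
        return Integer.getInteger(TIMEOUT_PROP, 0);
    }

    // Waits for in-process requests to finish. A negative timeout waits indefinitely.
    // Returns true if all requests completed, false if the timeout expired
    // (or the waiting thread was interrupted) first.
    static boolean awaitRequests(AtomicInteger activeRequests, int timeoutSeconds) {
        long start = System.nanoTime();
        long limit = TimeUnit.SECONDS.toNanos(Math.max(timeoutSeconds, 0));
        while (activeRequests.get() > 0) {
            if (timeoutSeconds >= 0 && System.nanoTime() - start >= limit) {
                return false;
            }
            try {
                Thread.sleep(10); // simple polling keeps the sketch short
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    // Body of the hypothetical shutdown hook.
    // Returns true if the graceful path was taken.
    static boolean onShutdownSignal(ServerState state, AtomicInteger activeRequests) {
        if (state != ServerState.RUNNING) {
            return false; // keep the existing, non-graceful behavior
        }
        int timeout = readTimeoutSeconds();
        // A real server would begin rejecting new requests here (suspend).
        awaitRequests(activeRequests, timeout);
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger activeRequests = new AtomicInteger(0);
        // The JVM runs shutdown hooks on signals such as SIGTERM and SIGINT.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
            System.out.println("graceful="
                + onShutdownSignal(ServerState.RUNNING, activeRequests))));
    }
}
```

Because readTimeoutSeconds() is called inside the hook rather than cached at boot, a value changed after boot is the one that applies, which matches the "read at the time of handling a signal" requirement.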
NICE-TO-HAVE REQUIREMENTS:
- Disable this new behavior if the org.wildfly.sigterm.suspend.timeout system property is not set. This isn't really a nice-to-have; it's more a "do this if necessary". The reason to do this would be to be conservative about the contexts in which this behavior is enabled, e.g. to give it more time to bake in situations where it is wanted before making it the behavior always. The ideal behavior is to always do the graceful shutdown, but with a timeout of 0 if the property is not configured. This is how shutdown is done when a management op is used. If we did initially require setting the system property, we could always remove that requirement in a later release without introducing an incompatibility.
- Have a proper attribute in the management model (e.g. in subsystem=core-management) instead of or in addition to using a system property for configuration. This is only a nice-to-have because my expectation is that the target audience for this feature is oriented toward generic management mechanisms rather than toward ensuring that an attribute in the user-controlled WildFly xml config is set. Requiring configuration of this value via a management attribute and disallowing configuration via a system property is the standard rule for most WildFly configuration settings, but it would actually be an anti-requirement in this case. (Note: this nice-to-have is now not expected to be implemented in the version of the feature described in this document.)
- Support this behavior for a domain mode server. It's my expectation that we will support this, as the code involved is the same for a standalone or domain mode server. But I'm describing it as a nice-to-have just in case some unforeseen issue arises. The key use case here is in the cloud and in cloud scenarios our emphasis is on standalone servers.
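The difference between the conservative opt-in behavior and the ideal always-on behavior discussed above can be sketched as two small policies. This is an illustration only: the class and method names are hypothetical, and letting a malformed value raise NumberFormatException is an assumption, since the text does not specify how invalid values are handled.

```java
import java.util.OptionalInt;

final class SuspendTimeoutPolicy {
    // Conservative variant: an unset property (null) yields an empty result,
    // meaning "do not attempt graceful shutdown on a signal".
    static OptionalInt optIn(String raw) {
        return raw == null
            ? OptionalInt.empty()
            : OptionalInt.of(Integer.parseInt(raw.trim()));
    }

    // Ideal variant: graceful shutdown always happens; an unset property
    // means a timeout of 0, as with the 'shutdown' management operation.
    static OptionalInt alwaysOn(String raw) {
        return OptionalInt.of(raw == null ? 0 : Integer.parseInt(raw.trim()));
    }

    public static void main(String[] args) {
        String raw = System.getProperty("org.wildfly.sigterm.suspend.timeout");
        System.out.println("opt-in: " + optIn(raw));
        System.out.println("always-on: " + alwaysOn(raw));
    }
}
```

Moving from optIn to alwaysOn only changes the outcome when the property is unset, which is why the opt-in requirement could be dropped in a later release without breaking anyone who already sets the property.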
NON-REQUIREMENTS:
- Determine what sort of signal (e.g. SIGTERM vs SIGINT) is causing the JVM to exit and vary the behavior based on this information.
- Do anything different from current behavior if the server is not in RUNNING state when the signal is handled or if the server kernel has indicated the process needs to terminate due to a failure.
- Any special graceful shutdown behavior when shutdown is triggered via this mechanism that is different from what occurs when shutdown is triggered by a management operation.
- Perform graceful shutdown in response to a signal if the VM is configured not to run shutdown hooks (e.g. with the -Xrs command line argument to java).
- Perform graceful shutdown in response to signals that do not result in shutdown hooks being run (e.g. SIGKILL).
- Implement some variant of this feature for a domain mode Process Controller or Host Controller process.