WildFly suspend and resume (aka graceful shutdown)

Version 2

    Introduction

     

    The purpose of the suspend and resume feature is to allow a server to be taken out of service gracefully, so that all current requests finish as normal. One of the main use cases is graceful shutdown, although it is also possible to suspend a server without shutting it down (e.g. to perform some admin operations).

     

    Graceful shutdown applies only to runtime behaviour; the management interfaces are not affected.

     

    Implementation

     

    Graceful shutdown is coordinated at a server-wide level and focuses mostly on the entry points at which requests enter the server. Entry points track the number of active requests; if the server is suspending, they reject new requests and allow existing ones to complete.

     

    org.jboss.as.server.suspend.SuspendController

     

    SuspendController coordinates the suspend. There are two phases: the pre-suspend phase and the suspend phase. During the pre-suspend phase the server operates as normal, but subsystems are expected to notify external parties that the server is about to go away. For example, mod_cluster will notify the load balancer, clustering will notify the cluster that this node is going away, JMS will stop delivery, etc.

     

    When all subsystems have reported that the pre-suspend phase is done, the server goes into a suspending state. In this state all endpoints reject new requests and notify the suspend controller once all currently active requests are complete. Once all active requests have completed (plus any other resources the subsystem needs to wait on while they shut down), the server is considered suspended. If this is a graceful shutdown request rather than a plain suspend, the server will now shut down.

     

    Subsystems register themselves with the SuspendController by registering instances of org.jboss.as.server.suspend.ServerActivity. This interface allows them to listen for suspend events directly and to notify the controller when they consider each event complete. Many subsystems will not need to use these constructs directly, however. If a subsystem only cares about tracking active requests, it should use the RequestController instead.
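
    The two-phase handshake described above can be sketched in plain Java. The types below (Callback, Activity, Controller, DeferredActivity) are simplified, hypothetical stand-ins written for illustration only; they are not the real org.jboss.as.server.suspend classes, whose signatures may differ. The sketch is single-threaded and assumes each activity signals completion exactly once per phase.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the two-phase suspend; not WildFly code. */
final class SuspendSketch {

    /** Invoked by an activity when it has finished the current phase. */
    interface Callback {
        void done();
    }

    /** Simplified analogue of a registered server activity (assumed shape). */
    interface Activity {
        void preSuspend(Callback callback); // notify external parties
        void suspended(Callback callback);  // reject new work, drain old work
        void resume();
    }

    enum State { RUNNING, PRE_SUSPEND, SUSPENDING, SUSPENDED }

    static final class Controller {
        private final List<Activity> activities = new ArrayList<>();
        private State state = State.RUNNING;
        private int pending; // activities that have not yet reported done()

        void register(Activity activity) { activities.add(activity); }

        State state() { return state; }

        void suspend() {
            if (state == State.RUNNING) {
                enter(State.PRE_SUSPEND);
            }
        }

        void resume() {
            state = State.RUNNING;
            for (Activity a : activities) {
                a.resume();
            }
        }

        /** Starts a phase and asks every activity to report back. */
        private void enter(State phase) {
            state = phase;
            pending = activities.size();
            if (pending == 0) {
                advance();
                return;
            }
            Callback callback = this::phaseDone;
            for (Activity a : new ArrayList<>(activities)) {
                if (phase == State.PRE_SUSPEND) {
                    a.preSuspend(callback);
                } else {
                    a.suspended(callback);
                }
            }
        }

        private void phaseDone() {
            if (--pending == 0) {
                advance();
            }
        }

        /** Moves to the next phase once every activity has reported. */
        private void advance() {
            if (state == State.PRE_SUSPEND) {
                enter(State.SUSPENDING);
            } else if (state == State.SUSPENDING) {
                state = State.SUSPENDED;
            }
        }
    }

    /** Finishes pre-suspend at once, but holds the suspend callback until drained. */
    static final class DeferredActivity implements Activity {
        Callback heldCallback;

        @Override public void preSuspend(Callback callback) { callback.done(); }
        @Override public void suspended(Callback callback) { heldCallback = callback; }
        @Override public void resume() { heldCallback = null; }
    }

    public static void main(String[] args) {
        Controller controller = new Controller();
        DeferredActivity activity = new DeferredActivity();
        controller.register(activity);

        controller.suspend();
        System.out.println(controller.state()); // SUSPENDING: waiting for the activity

        activity.heldCallback.done();           // in-flight work has drained
        System.out.println(controller.state()); // SUSPENDED
    }
}
```

    In this sketch the controller only moves from the pre-suspend phase to the suspending state, and from there to suspended, after every registered activity has reported back. A graceful shutdown would simply stop the server at the point where the state becomes SUSPENDED.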

     

    org.wildfly.extension.requestcontroller.RequestController

     

    This service is responsible for tracking active requests in a server and for notifying the SuspendController when they are complete. Subsystems first obtain from the controller a ControlPoint instance that corresponds to the deployment and entry point (in future we may support suspending an individual interface or deployment). The ControlPoint (named entryPoint in the snippet below) should then be used as follows:


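    // entryPoint is the ControlPoint obtained from the RequestController
    RunResult result = entryPoint.beginRequest();
    if (result != RunResult.RUN) {
        // the request may not run (e.g. the server is suspending): reject it
        return;
    }
    try {
        // handle the request
    } finally {
        // always report completion so the active request count stays accurate
        entryPoint.requestComplete();
    }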
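
    To illustrate the bookkeeping a control point has to do behind beginRequest() and requestComplete(), here is a minimal sketch, assuming a simple request counter guarded by a lock. ControlPointSketch, its pause() and resume() methods, and the REJECTED constant are hypothetical names chosen for this example; this is not the actual RequestController implementation.

```java
/** Hypothetical, simplified model of an entry point; not WildFly code. */
final class ControlPointSketch {

    /** Assumed result values; only RUN appears in the usage snippet above. */
    enum RunResult { RUN, REJECTED }

    static final class ControlPoint {
        private int activeRequests;
        private boolean paused;
        private Runnable drainedListener; // fired once when the last request ends

        /** Admits a request unless the control point is paused. */
        synchronized RunResult beginRequest() {
            if (paused) {
                return RunResult.REJECTED;
            }
            activeRequests++;
            return RunResult.RUN;
        }

        /** Must be called exactly once for every request that was admitted. */
        synchronized void requestComplete() {
            activeRequests--;
            notifyIfDrained();
        }

        /** Stops admitting requests; the listener runs once none are active. */
        synchronized void pause(Runnable listener) {
            paused = true;
            drainedListener = listener;
            notifyIfDrained();
        }

        synchronized void resume() {
            paused = false;
            drainedListener = null;
        }

        // Called with the lock held; a real implementation would likely
        // invoke the listener outside the lock.
        private void notifyIfDrained() {
            if (paused && activeRequests == 0 && drainedListener != null) {
                Runnable listener = drainedListener;
                drainedListener = null;
                listener.run();
            }
        }
    }

    public static void main(String[] args) {
        ControlPoint controlPoint = new ControlPoint();
        boolean[] drained = {false};

        RunResult first = controlPoint.beginRequest();  // admitted before the pause
        controlPoint.pause(() -> drained[0] = true);    // suspend begins
        RunResult second = controlPoint.beginRequest(); // arrives too late

        System.out.println(first + " " + second + " drained=" + drained[0]);
        controlPoint.requestComplete();                 // the in-flight request ends
        System.out.println("drained=" + drained[0]);
    }
}
```

    The request that was admitted before the pause is allowed to finish, the later one is turned away at the entry point, and the listener fires only when the active count reaches zero. That listener is where a control point would report back to the suspend machinery.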
    
    

     

    Reasons for this global, entry-point-based design

    Graceful shutdown has to take place at a global level, because it is impossible to predict which resources an active request may try to use. For example, if you attempt to gracefully shut down a datasource by tracking the connections that are allocated, some requests may complete successfully, but other requests already running in the container that then attempt to use the datasource will fail.

     

    While graceful shutdown is in progress, the container must operate normally from the point of view of any request that is already running. The only point at which requests can be rejected is the entry point.

     

    Email Discussions:

    http://wildfly-development.1055759.n5.nabble.com/Design-Proposal-Server-suspend-resume-AKA-Graceful-Shutdown-tc5714243.h…

    http://wildfly-development.1055759.n5.nabble.com/Graceful-shutdown-tc5714814.html