13 Replies. Latest reply on Sep 24, 2010 11:14 AM by Emanuel Muckenhuber

    Overall control of graceful shutdown

    Brian Stansberry Master

      Wanted to document some thoughts (mostly David's) on overall handling of graceful shutdown in AS 7.

       

      The first core requirement is that users need to be able to ask the server to shut down gracefully. Shutting down gracefully means allowing ongoing work, including long-running tasks, to complete before shutting down or, if possible, shifting the long-running work to another server and cleanly informing the client of the change. The second core requirement is letting the user set an overall timeout for how long ongoing work may run before the server accepts possible disruption to that work and pushes forward with the shutdown.

       

      The individual services within the AS are the ones that understand what the ongoing tasks are, how to monitor them for completion, how to push them to other servers and how to terminate them "gracelessly" if needed.

       

      The overall AS knows how long it's been since graceful shutdown was requested and thus whether a timeout has been exceeded.

       

      The gist of our thoughts on this is that the normal behaviour of any AS subsystem should be to shut down gracefully. That is, the stop() method of the services that make up the subsystem should understand what the ongoing tasks are, monitor them for completion, try if possible to push them to other servers, and only shut down when ongoing work is complete. The stop() method should allow this to take as long as necessary.

       

      However, the services should also provide a hook that lets the overall AS signal that the shutdown timeout has passed and that shutdown now needs to proceed without regard for disruption to ongoing work. That signal would arrive on a different thread from the one that invoked the service's stop() method. When the signal is received, the thread executing stop() should be informed (e.g. by setting a flag that stop() periodically checks, or by interrupting the thread running stop()), and stop() should then proceed down a code path that promptly stops the service.
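      A minimal sketch of the stop() behaviour described above, in plain Java with no MSC types. The class and method names (GracefulStopSketch, beginTask(), endTask()) are hypothetical; the sketch assumes ongoing work can simply be counted, and it treats both the terminate hook and an interrupt of the stopping thread as the signal to stop waiting.

```java
// Hypothetical service illustrating a "graceful by default" stop() plus a terminate hook.
final class GracefulStopSketch {
    private int activeTasks = 0;              // ongoing long-running work
    private boolean accepting = true;         // false once stop() has begun
    private boolean terminateRequested = false;

    /** Registers a unit of work; refused once stop() has started. */
    synchronized boolean beginTask() {
        if (!accepting) {
            return false;
        }
        activeTasks++;
        return true;
    }

    /** Marks a unit of work as finished and wakes up a waiting stop(). */
    synchronized void endTask() {
        activeTasks--;
        notifyAll();
    }

    /**
     * Waits, as long as necessary, until ongoing work completes or the AS signals
     * that the graceful period is over. Returns true if no work was cut off.
     */
    synchronized boolean stop() {
        accepting = false;
        while (activeTasks > 0 && !terminateRequested) {
            try {
                wait(); // releases the monitor so endTask()/terminateGracefulShutdown() can run
            } catch (InterruptedException e) {
                // Interrupting the stopping thread is treated like the terminate signal
                terminateRequested = true;
                Thread.currentThread().interrupt();
            }
        }
        if (activeTasks > 0) {
            // Prompt path: a real service would abort or hand off the remaining work here
            return false;
        }
        return true;
    }

    /** The hook: called by the AS on another thread once the shutdown timeout has passed. */
    synchronized void terminateGracefulShutdown() {
        terminateRequested = true;
        notifyAll();
    }
}
```

      In a real server stop() would run on the thread stopping the service while terminateGracefulShutdown() is called from the AS's timeout thread; the single monitor keeps the flag and the task count consistent between the two.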

       

      The hook provided by the AS would be a service that can be injected into any service that supports graceful shutdown:

       

      public interface GracefulShutdownService {
         void registerGracefulShutdownTerminator(GracefulShutdownTerminator terminator);
      }
      

       

      The service that supports graceful shutdown would register a callback with GracefulShutdownService that the AS would use to signal that the "graceful" aspect of shutdown no longer applies:

       

      public interface GracefulShutdownTerminator {
          void terminateGracefulShutdown();
      }
      
      

      I think it might be better, though, to simplify this: skip GracefulShutdownTerminator and have services register a plain Runnable with GracefulShutdownService.

       

      public interface GracefulShutdownService {
         void registerGracefulShutdownTerminator(Runnable terminator);
      }
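      To show how the AS side of this might fit together, here is a hypothetical implementation of the Runnable-based GracefulShutdownService. TimedGracefulShutdownService, shutdownRequested(), terminateAll() and shutdownComplete() are illustrative names, not AS 7 APIs; the sketch assumes the AS calls shutdownRequested() when a graceful shutdown begins and shutdownComplete() once every service has stopped.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// The Runnable-based variant from the post.
interface GracefulShutdownService {
    void registerGracefulShutdownTerminator(Runnable terminator);
}

// Hypothetical AS-side implementation: it owns the clock, the services own the work.
final class TimedGracefulShutdownService implements GracefulShutdownService {
    private final List<Runnable> terminators = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pendingTimeout;

    @Override
    public void registerGracefulShutdownTerminator(Runnable terminator) {
        terminators.add(terminator);
    }

    /** Graceful shutdown requested: run the terminators if it takes longer than the timeout. */
    synchronized void shutdownRequested(long timeout, TimeUnit unit) {
        pendingTimeout = timer.schedule(this::terminateAll, timeout, unit);
    }

    /** Tells every registered service to stop promptly (also usable for a non-graceful stop). */
    void terminateAll() {
        for (Runnable terminator : terminators) {
            terminator.run();
        }
    }

    /** All services have stopped, so the timeout is no longer needed. */
    synchronized void shutdownComplete() {
        if (pendingTimeout != null) {
            pendingTimeout.cancel(false);
        }
        timer.shutdown();
    }
}
```

      The AS only decides *when* graceful shutdown is over; how each service winds down its work stays inside the Runnable it registered, which is what makes a plain Runnable sufficient.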
      

       

       

      All this is just about the overall control of the shutdown. The complexity lies in each subsystem's graceful shutdown handling. That's where the tricky stuff needs to happen, e.g.:

       

      1) Identifying existing long-running work that extends beyond a single request

      2) Preventing external clients from creating new long-running work that extends beyond a single request

      3) Allowing internal clients to continue creating new long-running tasks (i.e. trusting the internal client to handle step 2 properly, and assuming any new work it submits is needed to let some existing overall work complete gracefully)

      4) Moving long-running tasks to other servers or letting them complete

      5) Leaving external connectors open until step 4 is done, but stopping them as soon as possible once externally generated long-running work is handled

      6) Final shutdown
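      Steps 1 to 5 could be tracked per subsystem with something like the following gate. It's a sketch under assumptions: LongRunningWorkGate and its Origin enum are hypothetical names, and it assumes each piece of long-running work can be tagged as coming from an external or an internal client when it starts.

```java
// Hypothetical per-subsystem gate for long-running work (steps 1 to 5 above).
final class LongRunningWorkGate {
    enum Origin { EXTERNAL, INTERNAL }

    private boolean shuttingDown = false;
    private int active = 0;

    /** Step 1: every long-running unit of work is registered here, so it can be identified. */
    synchronized boolean tryBegin(Origin origin) {
        if (shuttingDown && origin == Origin.EXTERNAL) {
            return false;   // step 2: external clients can't start new long-running work
        }
        active++;           // step 3: internal clients are trusted and may keep submitting
        return true;
    }

    synchronized void end() {
        active--;
    }

    synchronized void beginShutdown() {
        shuttingDown = true;
    }

    /** Step 4 done? Once true, connectors can be closed (step 5) and the subsystem stopped (step 6). */
    synchronized boolean isDrained() {
        return active == 0;
    }
}
```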

        • 1. Re: Overall control of graceful shutdown
          Brian Stansberry Master

          Something to think about is how all this can apply to a "graceful undeploy" notion. A graceful undeploy is just a more limited scope for the above. So, possibly a service that is part of a deployment could have a "GracefulUndeployService" injected, with which it could register a callback. The trick would be having that GracefulUndeployService have the correct scope; i.e. it's not an AS-wide singleton like GracefulShutdownService.

          • 2. Re: Overall control of graceful shutdown
            Jason Greene Master

            Brian Stansberry wrote:

             

            Something to think about is how all this can apply to a "graceful undeploy" notion. A graceful undeploy is just a more limited scope for the above. So, possibly a service that is part of a deployment could have a "GracefulUndeployService" injected, with which it could register a callback. The trick would be having that GracefulUndeployService have the correct scope; i.e. it's not an AS-wide singleton like GracefulShutdownService.

            Is this really needed? Couldn't the services that make up a deployment just use the same GracefulShutdownService? E.g. you could have a DeploymentSessionManager that owns all sessions associated with a deployment, or has the ability to clean them up.

            • 3. Re: Overall control of graceful shutdown
              Brian Stansberry Master

              It's not needed by the services that make up the deployment. It's needed so whatever is controlling the overall undeployment knows which terminators to invoke. There needs to be some sort of mapping of terminators to deployments.
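              One hypothetical shape for that mapping: terminators keyed by deployment name, so a timed-out graceful undeploy terminates only that deployment's work while a timed-out server shutdown terminates everything. DeploymentTerminatorRegistry and the string deployment keys are assumptions for illustration, not AS 7 classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry mapping terminators to the deployment they belong to.
final class DeploymentTerminatorRegistry {
    private final Map<String, List<Runnable>> byDeployment = new HashMap<>();

    synchronized void register(String deployment, Runnable terminator) {
        byDeployment.computeIfAbsent(deployment, d -> new ArrayList<>()).add(terminator);
    }

    /** Graceful undeploy timed out: terminate only this deployment's work. */
    void terminate(String deployment) {
        List<Runnable> toRun;
        synchronized (this) {
            toRun = byDeployment.remove(deployment);
        }
        if (toRun != null) {
            toRun.forEach(Runnable::run); // run outside the lock
        }
    }

    /** Graceful server shutdown timed out: terminate every deployment's work. */
    void terminateAll() {
        List<String> deployments;
        synchronized (this) {
            deployments = new ArrayList<>(byDeployment.keySet());
        }
        deployments.forEach(this::terminate);
    }
}
```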

              • 4. Re: Overall control of graceful shutdown
                Emanuel Muckenhuber Master

                Not sure if I understand you correctly... Are you saying that graceful shutdown would be part of the normal service lifecycle? By service I mean org.jboss.msc.Service, whose lifecycle is also managed by MSC. Would e.g. stop(StopContext) register the GracefulShutdownTerminator that the AS uses to signal the service to shut down promptly?

                If that's the case, I'm not sure that will work, since deployments usually depend on the connector, which means that when the server stops, deployments would be undeployed before connector.stop() is invoked. In fact, the connector should be started after deployment has completed and stopped (gracefully or not) before undeployment begins.

                • 5. Re: Overall control of graceful shutdown
                  Brian Stansberry Master

                  The GracefulShutdownTerminator would be registered in start(StartContext). If the shutdown isn't meant to be graceful, the MSC would trigger an overall stop (probably by stopping a root service all the others depend on) and then would immediately invoke all the GracefulShutdownTerminators. So waiting until stop() to register the terminator would not work.

                   

                  I smell a possible race there, though, even when registering in start().
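                  The race isn't spelled out here; one plausible version is a service that registers its terminator just after the AS has already run all terminators, so it never learns that the graceful period is over. A hypothetical registry that closes that particular window runs any late registration immediately (RaceFreeTerminatorRegistry is an illustrative name, and the sketch assumes this is the race meant):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry: a terminator registered after termination has already
// been signalled is run immediately instead of being silently lost.
final class RaceFreeTerminatorRegistry {
    private final List<Runnable> terminators = new ArrayList<>();
    private boolean terminated = false;

    void register(Runnable terminator) {
        synchronized (this) {
            if (!terminated) {
                terminators.add(terminator);
                return;
            }
        }
        terminator.run(); // too late to be graceful: stop promptly right away
    }

    void terminateAll() {
        List<Runnable> toRun;
        synchronized (this) {
            terminated = true;            // later registrations take the fast path above
            toRun = new ArrayList<>(terminators);
            terminators.clear();
        }
        toRun.forEach(Runnable::run);     // run outside the lock
    }
}
```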

                   

                  Your question about the deployments/connector gets into the real heart of the issue, how subsystems and services that are part of a deployment can be composed such that the dependencies work out. I think it's worthwhile to think through how to do that, since whether or not the GracefulShutdownTerminator idea makes sense, using the MSC dependency mechanism is very likely going to be needed to get graceful shutdown to work.

                   

                  For example, as *part* of a war deployment there could be a service that monitors the existence of active sessions during shutdown. It depends on some other service in the deployment (e.g. the session manager) and on the connector. The connector will not stop until that service stops. That service registers the GracefulShutdownTerminator.
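                  The ordering argument relies on stop order being the reverse of start order, so a service stops before anything it depends on. A small sketch of that rule using a reverse topological sort; the service names and the StopOrder helper are hypothetical, and it assumes the dependency graph has no cycles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: derive a stop order from "depends on" edges (assumed acyclic).
final class StopOrder {
    /** Start order: every service appears after the services it depends on. */
    static List<String> startOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String service : dependsOn.keySet()) {
            visit(service, dependsOn, visited, order);
        }
        return order;
    }

    // Depth-first post-order: dependencies are added before the service itself.
    private static void visit(String service, Map<String, List<String>> dependsOn,
                              Set<String> visited, List<String> order) {
        if (!visited.add(service)) {
            return;
        }
        for (String dependency : dependsOn.getOrDefault(service, Collections.emptyList())) {
            visit(dependency, dependsOn, visited, order);
        }
        order.add(service);
    }

    /** Stop order is the reverse: dependents stop before what they depend on. */
    static List<String> stopOrder(Map<String, List<String>> dependsOn) {
        List<String> order = startOrder(dependsOn);
        Collections.reverse(order);
        return order;
    }
}
```

                  With sessionMonitor depending on sessionManager and connector, sessionMonitor comes before both in the stop order, so the connector can't stop until the monitor has finished.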

                  • 6. Re: Overall control of graceful shutdown
                    Emanuel Muckenhuber Master

                    Brian Stansberry wrote:

                     

                    Your question about the deployments/connector gets into the real heart of the issue, how subsystems and services that are part of a deployment can be composed such that the dependencies work out. I think it's worthwhile to think through how to do that, since whether or not the GracefulShutdownTerminator idea makes sense, using the MSC dependency mechanism is very likely going to be needed to get graceful shutdown to work.

                     

                    For example, as *part* of a war deployment there could be a service that monitors the existence of active sessions during shutdown. It depends on some other service in the deployment (e.g. the session manager) and on the connector. The connector will not stop until that service stops. That service registers the GracefulShutdownTerminator.

                    Hmm, yeah, maybe having a service as part of the deployment with some "artificial" dependencies that prevent the connector from stopping could do the trick. However, this basically means that the connector cannot be stopped until the deployment is undeployed, which might not even matter if you use mod_cluster (at least for webapps).

                    • 7. Re: Overall control of graceful shutdown
                      Brian Stansberry Master

                      The dependencies aren't really artificial. The service that depends on the connector has a real job to do: figuring out when long-running work that needs an open connector has completed. It will of course delegate to other services for a lot of things, but I bet it will end up having some significant logic.

                       

                      If mod_cluster is integrated, part of what this service does could be interacting with mod_cluster to trigger sending a global DISABLE_APP to httpd.

                      • 8. Re: Overall control of graceful shutdown
                        Emanuel Muckenhuber Master

                        Brian Stansberry wrote:

                         

                        The dependencies aren't really artificial. The service that depends on the connector has a real job to do: figuring out when long-running work that needs an open connector has completed. It will of course delegate to other services for a lot of things, but I bet it will end up having some significant logic.

                        By "artificial dependencies" I meant that, e.g. in jboss.web, nothing really depends on a connector, whereas this graceful shutdown handler would need a dependency on every configured connector to prevent them from shutting down. That basically means connectors have to be started before the deployment is deployed and stopped after it is undeployed. This seems to conflict a bit with JBAS-8423; maybe we can keep the two concerns separate, starting connectors late in the process and using the dependencies to control the graceful shutdown.

                        • 9. Re: Overall control of graceful shutdown
                          Emanuel Muckenhuber Master

                          I guess startup could differ from case to case, but we could provide a way to register a runnable that we run after deployment has completed. More precisely: a service that uses MSC's asynchronous() feature and registers a runnable which does something like:

                           

                           

                              // Runs once deployment has completed. startContext is the StartContext
                              // passed to start(), which has already called startContext.asynchronous().
                              public void run() {
                                  try {
                                      connector.start();
                                      // Tell MSC the asynchronous start finished successfully
                                      startContext.complete();
                                  } catch (Throwable t) {
                                      // Report every failure, so the service doesn't stay stuck in STARTING
                                      startContext.failed(new StartException(t));
                                  }
                              }
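                          The same "start the connector only after deployment has completed" idea can be expressed without MSC using java.util.concurrent.CompletableFuture. This swaps MSC's asynchronous start for plain futures and is only meant to illustrate the ordering; LateConnectorStart is a hypothetical name, and each deployment is assumed to expose a future that completes when it has finished deploying.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Plain-Java stand-in (not MSC) for "start the connector after deployment completed".
final class LateConnectorStart {
    /**
     * Runs startConnector once every deployment future has completed normally.
     * If any deployment fails, the returned future fails and the connector is not started.
     */
    static CompletableFuture<Void> startAfterDeployments(
            List<? extends CompletableFuture<?>> deployments, Runnable startConnector) {
        CompletableFuture<?>[] all = deployments.toArray(new CompletableFuture<?>[0]);
        return CompletableFuture.allOf(all).thenRun(startConnector);
    }
}
```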
                          
                          • 10. Re: Overall control of graceful shutdown
                            Emanuel Muckenhuber Master

                            After posting the last comment I realized that it doesn't make sense with dependencies either. I confused myself.

                             

                            Maybe the best option would be to have those graceful shutdown hooks and dependencies only as part of deployments, separate from other services such as connectors. This would require undeploying all deployments before shutting down the rest of the server. That seems better, since it would also support graceful undeployment of a single deployment.
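                            A sketch of the two-phase shutdown this suggests, where a full shutdown reuses the single-deployment graceful undeploy path before stopping shared services such as connectors. TwoPhaseShutdown and its Deployment interface are hypothetical, and the sketch assumes undeployGracefully() blocks until that deployment's own long-running work is done (or terminated).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-phase shutdown: every deployment is gracefully undeployed first,
// then shared services such as connectors are stopped.
final class TwoPhaseShutdown {
    /** Assumed to block until the deployment's own long-running work is done or terminated. */
    interface Deployment {
        void undeployGracefully();
    }

    private final List<Deployment> deployments = new ArrayList<>();
    private final List<Runnable> sharedServiceStops = new ArrayList<>();

    void addDeployment(Deployment deployment) {
        deployments.add(deployment);
    }

    void addSharedServiceStop(Runnable stop) {
        sharedServiceStops.add(stop);
    }

    /** Graceful undeploy of a single deployment; server shutdown reuses it. */
    void undeploy(Deployment deployment) {
        deployment.undeployGracefully();
        deployments.remove(deployment);
    }

    void shutdown() {
        // Phase 1: iterate over a copy, since undeploy() removes entries from the list
        for (Deployment deployment : new ArrayList<>(deployments)) {
            undeploy(deployment);
        }
        // Phase 2: only now stop connectors and other shared services
        sharedServiceStops.forEach(Runnable::run);
    }
}
```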

                            • 11. Re: Overall control of graceful shutdown
                              Brian Stansberry Master

                              A problem we've had in the past with not stopping endpoints early as part of graceful shutdown is that an invocation returns from the deployment code, so the deployment and its CL are unloaded, but the invocation is still alive on its return path: it hasn't yet passed through the response marshalling logic in the endpoint. As a result, when it hits the response marshalling logic, calls are made to the unloaded CL, and they fail.

                               

                              This is more of an issue with EJBs. Web requests happily don't do any marshalling in the connectors.
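                              One way to picture the problem: if the endpoint counted an invocation as in flight until its response has been marshalled, rather than until it returns from deployment code, the deployment's classloader would only be unloaded once nothing for that deployment is still on its return path. A hypothetical per-deployment counter (EndpointInvocationTracker is an illustrative name; it assumes the endpoint knows which deployment each invocation targets):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical endpoint-side counter: an invocation counts as in flight until its
// response has been marshalled, not merely until it returns from deployment code.
final class EndpointInvocationTracker {
    private final Map<String, Integer> inFlight = new HashMap<>();

    /** Call before dispatching into the deployment. */
    synchronized void enter(String deployment) {
        inFlight.merge(deployment, 1, Integer::sum);
    }

    /** Call after the response has been marshalled, at the very end of the return path. */
    synchronized void exit(String deployment) {
        inFlight.computeIfPresent(deployment, (d, n) -> n == 1 ? null : n - 1);
        notifyAll();
    }

    synchronized int count(String deployment) {
        return inFlight.getOrDefault(deployment, 0);
    }

    /**
     * Waits until nothing for this deployment is on its way in or out; only then
     * would it be safe to unload the deployment's classloader. Returns false on timeout.
     */
    synchronized boolean awaitQuiescent(String deployment, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (inFlight.containsKey(deployment)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

                              Because the counts are per deployment, one deployment behind a shared endpoint can become quiescent and be unloaded while another is still busy; the endpoint itself could only close once every count has reached zero.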

                              • 13. Re: Overall control of graceful shutdown
                                Emanuel Muckenhuber Master

                                I see. Hmm, doesn't this problem exist in general when multiple EJB deployments use the same endpoint (not sure exactly how that works)? In the end, one deployment might finish shutting down gracefully faster than another, at a point where the endpoint still can't be closed.