8 Replies Latest reply on Sep 7, 2010 5:48 PM by brian.stansberry

Asynchronous results from executing a deployment plan

brian.stansberry Sep 7, 2010 10:15 AM

One of the main things I've been struggling with when implementing deployments is how to provide information to the caller on the results of executing a deployment plan. In the deployment API discussed at https://community.jboss.org/thread/155937?tstart=0 the StandaloneDeploymentPlan.execute(DeploymentPlan) method returns a DeploymentPlanResult object. That's straightforward enough; the interesting part is making retrieval of the details of those results asynchronous.

There are two facets of the asynchronous problem:

1) As discussed on https://community.jboss.org/thread/154922?tstart=0 there's a desire to immediately return control to the user and let them come back later to check for results. I'm looking at doing that by encapsulating the result details in a Future and executing the plan on another thread. Simple enough.

2) Even if we made the caller always block waiting for the results, it's still complex, because the deployment process itself is asynchronous. More specifically, installing the services generated from the deployment is multi-threaded -- MSC breaks down all service start/stop work into tasks that are executed by threads from an Executor. So, the thread that's executing the deployment plan and trying to assemble the results can't just invoke BatchBuilder.install() and assume everything is done when that call returns. After install() returns, that thread needs to find a way to detect what all the services associated with the deployment are and monitor their status as other threads actually register and start them.
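
For facet 1), a minimal sketch of the Future-based approach, using only java.util.concurrent. The names PlanResultSketch and AsyncPlanExecutorSketch are hypothetical stand-ins, not the real deployment API, and the plan "execution" is just a placeholder:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for DeploymentPlanResult; carries only a summary string. */
final class PlanResultSketch {
    final String summary;

    PlanResultSketch(String summary) {
        this.summary = summary;
    }
}

/** Hypothetical wrapper that runs a plan on another thread; not the real deployment API. */
final class AsyncPlanExecutorSketch {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Returns immediately; the plan itself runs on the executor thread. */
    Future<PlanResultSketch> execute(final String planName) {
        // Placeholder body: real code would run the deployer chain and install the batch here.
        Callable<PlanResultSketch> task = () -> new PlanResultSketch("executed " + planName);
        return executor.submit(task);
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

The caller gets control back right away and can call Future.get() (optionally with a timeout) whenever it wants the result details.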

The 2) issue is the focus of the rest of this post.

The approach I'm looking at using is making use of the o.j.as.deployment.DeploymentService class to facilitate this. Currently that class is being used as a sort of empty placeholder on which all other services associated with a deployment depend. Telling the MSC to stop/remove a DeploymentService instance is thus a simple way to trigger removal of all the associated services. This is how undeploy and rollback of a failed deployment are working.

What I'm looking at doing is giving DeploymentService a richer set of behaviors. Basically giving it the ability to track what the services are that were associated with a deployment and an API that lets callers find out about those services. Users interested in finding out details of the results of executing a deployment plan could indirectly call into that API.

The DeploymentService learns about the services associated with a deployment by getting callbacks from a ServiceListener that is registered with the sub-batch that's actually doing the deployment:

public void activate(final ServiceActivatorContext context) {
        .......
        final BatchBuilder batchBuilder = context.getBatchBuilder();
        // Create deployment service
        final ServiceName deploymentServiceName = DeploymentService.SERVICE_NAME.append(deploymentName);
        DeploymentService deploymentService = new DeploymentService();
        batchBuilder.addService(deploymentServiceName, deploymentService);

        // Create a sub-batch for this deployment
        final BatchBuilder deploymentSubBatch = batchBuilder.subBatchBuilder();

        // Setup a batch level dependency on deployment service
        deploymentSubBatch.addDependency(deploymentServiceName);

        // Let deploymentService listen to services in the subbatch
        deploymentSubBatch.addListener(deploymentService.getDependentStartupListener());
            
        // Add a deployment failure listener to the batch
        deploymentSubBatch.addListener(new DeploymentFailureListener(deploymentServiceName));

        ..... go on and create deployment unit and pass it to deployer chain

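The important (i.e. new) bit is the listener registration: deploymentSubBatch.addListener(deploymentService.getDependentStartupListener()).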

The DeploymentService (and the listener class used above) look like this:

public class DeploymentService implements Service<DeploymentService> {
    public static final ServiceName SERVICE_NAME = ServiceName.JBOSS.append("deployment");
    private static Logger logger = Logger.getLogger("org.jboss.as.deployment");
    
    private final Map<ServiceName, ServiceController<?>> dependents = new HashMap<ServiceName, ServiceController<?>>();
    /** Dependent services that have not yet reached a terminal state in their initial startup (UP, FAILED, DOWN, REMOVED) */
    private final Set<ServiceName> incompleteDependents = new HashSet<ServiceName>();
    private final Lock lock = new ReentrantLock();
    private final Condition startupCondition = lock.newCondition();
    private final Condition stoppedCondition = lock.newCondition();
    /** Whether start() has been invoked since initialization or the last stop() call */
    private boolean started = false;
    /** Whether stop() has been invoked since initialization or the last start() call */
    private boolean stopped = false;
    
    /**
     * Start the deployment.  This will re-mount the deployment root if service is restarted.
     *
     * @param context The start context
     * @throws StartException if any problems occur
     */
    public void start(StartContext context) throws StartException {
        lock.lock();
        try {
            started = true;
            stopped = false;
            startupCondition.notifyAll();
        }
        finally {
            lock.unlock();
        }
    }

    /**
     * Stop the deployment.  This will close the virtual file mount.
     * 
     * @param context The stop context
     */
    public void stop(StopContext context) {
        lock.lock();
        try {
            stopped = true;
            started = false;
            stoppedCondition.notifyAll();
        }
        finally {
            lock.unlock();
        }
    }

    /** {@inheritDoc} **/
    public DeploymentService getValue() throws IllegalStateException {
        return this;
    }
    
    /**
     * Blocks until all services associated with this deployment have
     * completed startup (not necessarily successfully).
     * 
     * @throws InterruptedException
     */
    public void awaitDependentStartup() throws InterruptedException {
        lock.lock();
        try {        
            while (!stopped && (!started || incompleteDependents.size() > 0)) {
                if (onlyNeverMode()) {
                    break;
                }
                startupCondition.await();
            }
        }
        finally {
            lock.unlock();
        }
    }
    
    /**
     * Blocks until all services associated with this deployment have
     * completed startup (not necessarily successfully) or the specified
     * timeout occurs.
     * 
     * @throws InterruptedException
     */
    public void awaitDependentStartup(long timeout, TimeUnit timeUnit) throws InterruptedException {
        lock.lock();
        try {        
            while (!stopped && (!started || incompleteDependents.size() > 0)) {
                if (onlyNeverMode()) {
                    break;
                }
                startupCondition.await(timeout, timeUnit);
            }
        }
        finally {
            lock.unlock();
        }        
    }
    
    /**
     * Blocks until this service is stopped.
     * 
     * @throws InterruptedException
     */
    public void awaitStop() throws InterruptedException {
        lock.lock();
        try {        
            while (!stopped) {
                stoppedCondition.await();
            }
        }
        finally {
            lock.unlock();
        }
    }
    
    /**
     * Blocks until this service is stopped or the specified
     * timeout occurs.
     * 
     * @throws InterruptedException
     */
    public void awaitStop(long timeout, TimeUnit timeUnit) throws InterruptedException {
        lock.lock();
        try {        
            while (!stopped) {
                stoppedCondition.await(timeout, timeUnit);
            }
        }
        finally {
            lock.unlock();
        }        
    }

    /**
     * Gets any exceptions that occurred during start of the services that
     * are associated with this deployment.
     * 
     * @return the exceptions keyed by the name of the service. Will not be <code>null</code>
     */
    public Map<ServiceName, StartException> getDependentStartupExceptions() {
        lock.lock();
        try {
            Map<ServiceName, StartException> result = new HashMap<ServiceName, StartException>();
            for (Map.Entry<ServiceName, ServiceController<?>> entry : dependents.entrySet()) {
                StartException se = entry.getValue().getStartException();
                if (se != null)
                    result.put(entry.getKey(), se);
            }
            return result;
        }
        finally {
            lock.unlock();
        }
    }
    
    /**
     * Gets the {@link ServiceController.State state} of the services that
     * are associated with this deployment.
     * 
     * @return the services and their current state. Will not be <code>null</code>
     */
    public Map<ServiceName, ServiceController.State> getDependentStates() {
        lock.lock();
        try {
            Map<ServiceName, ServiceController.State> result = new HashMap<ServiceName, ServiceController.State>(dependents.size());
            for (Map.Entry<ServiceName, ServiceController<?>> entry : dependents.entrySet()) {
                result.put(entry.getKey(), entry.getValue().getState());
            }
            return result;
        }
        finally {
            lock.unlock();
        }
    }
    
    /**
     * Gets a {@link ServiceListener} that can track startup events for 
     * services associated with the deployment this service represents. This 
     * listener should be associated with a
     * {@link BatchBuilder#subBatchBuilder() sub-batch} of this service's
     * batch that encapsulates the creation of services that are associated
     * with the deployment.
     * 
     * @return the service listener
     */
    public ServiceListener<Object> getDependentStartupListener() {
        return new DependentServiceListener();
    }

    
    /** Checks whether all incomplete dependents are Mode.NEVER. Must be called with the lock held */
    private boolean onlyNeverMode() {
        int ever = incompleteDependents.size();
        for (ServiceName name : incompleteDependents) {
            ServiceController<?> controller = dependents.get(name);
            if (controller == null || controller.getMode() == Mode.NEVER)
                ever--;
        }
        return ever == 0;
    }
    
    private class DependentServiceListener extends AbstractServiceListener<Object> {

        /** 
         * This will be called for all dependent services before the 
         * BatchBuilder.install() call returns. So at that point we know what
         * the dependent services are; other threads will invoke the other
         * callbacks as services are started.
         */
        @Override
        public void listenerAdded(ServiceController<? extends Object> controller) {
            lock.lock();
            try {            
                dependents.put(controller.getName(), controller);
                incompleteDependents.add(controller.getName());
            }
            finally {
                lock.unlock();
            }            
        }

        @Override
        public void serviceFailed(ServiceController<? extends Object> controller, StartException reason) {
            lock.lock();
            try {
                incompleteDependents.remove(controller.getName());
                startupCondition.notifyAll();
            }
            finally {
                lock.unlock();
            }
        }

        @Override
        public void serviceRemoved(ServiceController<? extends Object> controller) {
            lock.lock();
            try  {
                incompleteDependents.remove(controller.getName());
                startupCondition.notifyAll();
            }
            finally {
                lock.unlock();
            }
        }

        @Override
        public void serviceStopped(ServiceController<? extends Object> controller) {
            lock.lock();
            try  {
                incompleteDependents.remove(controller.getName());
                startupCondition.notifyAll();
            }
            finally {
                lock.unlock();
            }
        }
        
    }
    
}

Besides the listener, the other interesting bit in the above is the set of awaitXXX methods. Those are what allow a caller that actually wants to find out what happened with a deployment to block until the asynchronous MSC tasks complete.

The awaitStop() methods are straightforward enough.
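
One caveat on the timed variants: calling Condition.await(timeout, unit) inside a while loop restarts the full timeout after every wake-up, so the method may wait much longer than requested, or never give up at all. A minimal sketch of a deadline-bounded wait using the standard Condition.awaitNanos call follows. The class name StopGate, the markStopped() method and the boolean return value are assumptions made for illustration; they are not part of the DeploymentService above.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper showing a timed wait that honors its overall deadline. */
final class StopGate {
    private final Lock lock = new ReentrantLock();
    private final Condition stoppedCondition = lock.newCondition();
    private boolean stopped;

    void markStopped() {
        lock.lock();
        try {
            stopped = true;
            stoppedCondition.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** @return true if stopped, false if the timeout elapsed first */
    boolean awaitStop(long timeout, TimeUnit unit) throws InterruptedException {
        long remainingNanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!stopped) {
                if (remainingNanos <= 0L) {
                    return false; // deadline passed while still not stopped
                }
                // awaitNanos returns an estimate of the time still left to wait
                remainingNanos = stoppedCondition.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

Returning a boolean lets the caller distinguish "stopped" from "gave up waiting", which a void method cannot do.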

The awaitDependentStartup() implementation is more subtle. It depends on the fact that the listener's listenerAdded() method should be invoked passing in any services associated with the deployment before the BatchBuilder.install() method returns. My understanding of how MSC works tells me this is the case -- all listeners associated with a batch are passed to ServiceBuilderImpl and the listenerAdded method is invoked as part of ServiceBuilderImpl.doCreate(). This is all done as part of executing BatchBuilder.install(). This seems like a logical and necessary part of the BatchBuilder.install() contract; it would be good if it were documented as such.

There is a subtle race here though. In BatchBuilder.install() the DeploymentService itself could have all dependencies satisfied and tasks executed by another thread to start it before the thread executing install() processes the dependent services and calls listenerAdded(). If a caller invoked awaitDependentStartup() during this window, it would return even though the dependent services are not yet started. I'm dealing with that by having the thread that executes BatchBuilder.install() not expose DeploymentService.awaitDependentStartup() to any calling threads until the install() method returns.

1. Re: Asynchronous results from executing a deployment plan

dmlloyd Sep 7, 2010 11:01 AM (in response to brian.stansberry)

I don't like the await() methods. Talking from MSC experience, using a caller-blocking methodology doesn't mix well with the callback-driven methodology used by MSC. In particular, if you use an await()-like method from within an MSC task (listener, service start/stop, etc) which depends on a change in another service (and believe me, people will do this), you may be introducing a deadlock since that service's completion may ultimately depend upon you.
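
To illustrate the callback-driven alternative, here is a minimal sketch (plain Java, not MSC code) in which interested parties register a Runnable instead of blocking a thread. CompletionNotifier and its methods are hypothetical names chosen for this example:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical notifier: callers register callbacks rather than blocking in await(). */
final class CompletionNotifier {
    private final Object lock = new Object();
    private final List<Runnable> callbacks = new ArrayList<>();
    private boolean complete;

    /** Runs the callback once completion happens, or immediately if it already has. */
    void onComplete(Runnable callback) {
        synchronized (lock) {
            if (!complete) {
                callbacks.add(callback);
                return;
            }
        }
        callback.run(); // already complete; invoked outside the lock
    }

    /** Marks completion and fires pending callbacks; later calls are no-ops. */
    void complete() {
        List<Runnable> toRun;
        synchronized (lock) {
            if (complete) {
                return;
            }
            complete = true;
            toRun = new ArrayList<>(callbacks);
            callbacks.clear();
        }
        // Callbacks run outside the lock so they cannot deadlock against this object.
        for (Runnable callback : toRun) {
            callback.run();
        }
    }
}
```

No thread is parked waiting on another service here, so a task that registers a callback cannot end up blocking the very work it depends on.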

That said - looks like you have no provision for supporting on-demand services. This is actually a tricky problem because a service's mode can change irrespective of the current controller state (i.e. the fact that a listener is running doesn't "preserve" the current controller mode). Thus a service's mode can change on you, so you can't just say "if this service is automatic/immediate, add it to the set of incomplete dependents" because changing the mode to on-demand can "complete" it. Also, an AUTOMATIC service which doesn't start because of an ON_DEMAND dependency could also be considered "complete" for the purposes of deployment. However not all AUTOMATIC services can be considered complete.

So the question is, what does it mean for a deployment to be "done"? Saying that all services in the deployment have fully started won't work due to on-demand and other services which are not expected to start immediately, and possibly changing modes etc. So you really need to track a specific subset of services which represent the meat of the deployment, which can vary based on deployment type and probably other factors as well. For some deployments, it may not even be possible to wait for any services to start. This means that the listener would be applied to a select set of services only.

As for the race condition, that's not too hard to solve since you're using locks & conditions: just add a flag "started" which you set to "true" (with a signalAll() [btw, you should use signal*, not notify* with Condition]) which is evaluated as part of the condition for readiness. The flag would be set once the batch is installed (then you'd know all the listeners were added).
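
A minimal sketch of that idea, using plain strings in place of ServiceName and a flag named "installed" (the class and method names are hypothetical, chosen only for this example). Readiness requires both the flag and an empty set of incomplete dependents, and all wake-ups go through signalAll():

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical tracker; readiness is gated on the batch having been installed. */
final class StartupTracker {
    private final Lock lock = new ReentrantLock();
    private final Condition ready = lock.newCondition();
    private final Set<String> incomplete = new HashSet<String>();
    private boolean installed;

    /** Corresponds to listenerAdded(): called while the batch is being installed. */
    void dependentAdded(String name) {
        lock.lock();
        try {
            incomplete.add(name);
        } finally {
            lock.unlock();
        }
    }

    /** Called when a dependent reaches a terminal state. */
    void dependentCompleted(String name) {
        lock.lock();
        try {
            incomplete.remove(name);
            ready.signalAll(); // Condition uses signal*, not notify*
        } finally {
            lock.unlock();
        }
    }

    /** Called by the installing thread once install has returned. */
    void batchInstalled() {
        lock.lock();
        try {
            installed = true;
            ready.signalAll();
        } finally {
            lock.unlock();
        }
    }

    boolean isReady() {
        lock.lock();
        try {
            return installed && incomplete.isEmpty();
        } finally {
            lock.unlock();
        }
    }

    void awaitReady() throws InterruptedException {
        lock.lock();
        try {
            while (!(installed && incomplete.isEmpty())) {
                ready.await();
            }
        } finally {
            lock.unlock();
        }
    }
}
```

Before batchInstalled() is called the tracker is never ready, even if no dependents have been registered yet, which closes the window described above.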
2. Re: Asynchronous results from executing a deployment plan

jason.greene Sep 7, 2010 11:11 AM (in response to dmlloyd)

Well, it looks as though Brian is trying to use this to tell the user a deploy() was successful. IMO that doesn't necessarily mean that the deployment was successful and all of its runtime services have been started.
3. Re: Asynchronous results from executing a deployment plan

dmlloyd Sep 7, 2010 12:06 PM (in response to jason.greene)

If that were the case, then one merely has to confirm that the batch install succeeded. I'd be all for that, but I have a feeling (from prior discussions) that Brian really wants to ensure that every (relevant) service has started.
4. Re: Asynchronous results from executing a deployment plan

brian.stansberry Sep 7, 2010 12:14 PM (in response to jason.greene)

We have flexibility in what we tell the user. We just need to come up with something that's meaningful and useful. See the Deployment API thread for some of the things people are looking for. Some of it is pretty rich and detailed. So part of what I'm trying to work toward here is how to lay the foundation for supporting that.

For detailed, intelligent reporting on what's happened with a deployment, for sure some sort of processing based on the type of the deployment is needed. Whether we want to do such detailed, intelligent reporting needs discussion. Part of what I was getting at here though is that the DeploymentService seems like a natural place to encapsulate information about a deployment, whatever it may be.

But, ignoring the really rich, detailed reporting cases, what's a reasonable minimum to tell a user?

Simplest is just to report whether the deployment unit passed through the deployment chain without exception and BatchBuilder.install() returned without exception. That's easy enough, since it's all done by one thread. Not sure how useful that is though, since nothing at all is actually started by that thread.

David's concept of tracking the "meat" of the deployment is better, but it's not clear how to implement that. Does that become a deployer responsibility? Deciding that service X is "meat" and somehow registering it with whatever is tracking the overall deployment?
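
Purely as a sketch of what "deployer registers the meat" might look like (all names here are hypothetical, and strings stand in for ServiceName): the deployer marks the services it considers significant, completion events arrive for every service, and only the marked ones affect whether the deployment counts as done.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker that only waits on services a deployer marked as significant. */
final class SignificantServiceTracker {
    private final Set<String> pending = new HashSet<>();

    /** Called by a deployer for each service it considers the "meat" of the deployment. */
    synchronized void markSignificant(String name) {
        pending.add(name);
    }

    /** Called for any service reaching a terminal state; unmarked services are ignored. */
    synchronized void serviceCompleted(String name) {
        pending.remove(name);
    }

    /** True once every significant service has completed. */
    synchronized boolean isDone() {
        return pending.isEmpty();
    }
}
```

On-demand services that a deployer never marks simply don't participate, so they can't hold up the result.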

Re: signalAll: oops! thanks for that! The IDE isn't always your friend.
5. Re: Asynchronous results from executing a deployment plan

brian.stansberry Sep 7, 2010 12:39 PM (in response to brian.stansberry)

Another quick thought on this: we can eliminate any requirement that the results of the deployment plan are stable over time. We can simply say that we won't report on a particular deployment until BatchBuilder.install() has returned. Thereafter, the information can change.
6. Re: Asynchronous results from executing a deployment plan

brian.stansberry Sep 7, 2010 12:44 PM (in response to brian.stansberry)

Semi-OT, this discussion makes me see a flaw in my impl of the DeploymentPlan notion. One thing a deployment plan supports is the notion of grouping a set of actions (e.g. separate deploy/undeploy operations) together and rolling them all back if any of them fails. Again, simple enough to do if a problem occurs on the thread that calls BatchBuilder.install(). But more complex if the failure is on another thread that's installing a service.
7. Re: Asynchronous results from executing a deployment plan

jason.greene Sep 7, 2010 4:39 PM (in response to brian.stansberry)

Hmm, what if the deployment API returned a list of services, and then you could query the status of those services using some other API?
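
Roughly something like the following sketch. Everything in it is hypothetical (SketchState, DeployerSketch, string service names); it only illustrates the split between "deploy returns names" and "a separate call reports current state":

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified service states for this sketch. */
enum SketchState { DOWN, UP }

/** Hypothetical API: deploy() returns service names; stateOf() is the separate query. */
final class DeployerSketch {
    private final Map<String, SketchState> states = new ConcurrentHashMap<>();

    /** Registers the services as DOWN and returns their names; nothing is started here. */
    List<String> deploy(String deployment, String... services) {
        List<String> names = new ArrayList<>();
        for (String service : services) {
            String name = deployment + "." + service;
            states.put(name, SketchState.DOWN);
            names.add(name);
        }
        return Collections.unmodifiableList(names);
    }

    /** Called by whichever thread actually starts the service. */
    void markUp(String name) {
        states.put(name, SketchState.UP);
    }

    /** Separate query API; returns null for names it does not know. */
    SketchState stateOf(String name) {
        return states.get(name);
    }
}
```

The list of names is stable once returned; only the answers from the query call change over time.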
8. Re: Asynchronous results from executing a deployment plan

brian.stansberry Sep 7, 2010 5:48 PM (in response to jason.greene)

Yeah, that works too; eliminates the unclean feeling of having a "DeploymentPlanResult" object whose results are unstable over time.

For now I'll generate the DeploymentPlanResult based off the status when BatchBuilder.install() returns. If later we figure out a mechanism to identify and track the status of services that represent the "meat" of a particular deployment type, then we can defer generating the DeploymentPlanResult until those services are "complete". That change wouldn't affect API.