Graceful Shutdown | JBoss.org Content Archive (Read Only)

1. Re: Graceful Shutdown

brian.stansberry Nov 23, 2009 5:06 PM (in response to alrubinger)

Preserving for posterity a long discussion on this today on #jboss-dev on freenode:

<bstansberry> got your note on graceful shutdown
<bstansberry> i'm thinking a bit about what to work on to actually get something useful in M2
<bstansberry> that's my priority -- get something actually useful, not just progress in the right direction
<ALR> bstansberry: OK, let me look a bit at my notes from before to remember the issues involved.
<bstansberry> k
<ALR> bstansberry: So the requirements we discussed were that:
<ALR> 1) Tied into the bootstrap
<ALR> 2) 2-phase prepare/commit-like cycle.
<ALR> Where in the first phase you continue to process requests but block incoming.
<ALR> And then when done w/ processing signal that you're ready for the 2nd phase
<bstansberry> yep
<ALR> Also the mechanism impl has to be decoupled well so we don't leak anything.
<bstansberry> yes
<ALR> But I hadn't yet fleshed out how the impl of the shutdown registry works. I think it can be concurrent (i.e. send requests to all subsystems to shut down, then get a Future back, and when all are .isCompleted we can move to phase 2).
<bstansberry> what i'm thinking about now is who needs to be involved in that vs clustered deployments to actually get things done
<bstansberry> for sure it needs to be concurrent
<ALR> bstansberry: I can assume everything to be part of the lifecycle is an MC bean?
* balunasj has quit ()
<ALR> bstansberry: Then they can autowire/inject a "ShutdownRegistry" type and use it. Then provide an event listener mechanism which each subsystem will implement
<ALR> So on install, everyone gets a ShutdownRegistry if they want, and it's up to the component to handle the events.
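A minimal sketch of the two-phase, concurrent registry described above, in plain Java. `ShutdownRegistry` and `ShutdownListener` are hypothetical names chosen for illustration, not an existing JBoss Bootstrap or MC API. Note that the JDK's completion check on a `Future` is `isDone()`, and `get()` blocks until the task has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical callback a subsystem implements to opt in to graceful shutdown. */
interface ShutdownListener {
    /** Phase 1: block new incoming work, finish in-flight work, then return. */
    void prepareShutdown();

    /** Phase 2: called only after every listener has returned from phase 1. */
    void commitShutdown();
}

/** Hypothetical registry that a subsystem could have injected at install time. */
class ShutdownRegistry {
    private final List<ShutdownListener> listeners = new CopyOnWriteArrayList<>();

    void register(ShutdownListener listener) {
        listeners.add(listener);
    }

    /** Broadcasts phase 1 concurrently, waits on every Future, then runs phase 2 in registration order. */
    void shutdown() {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<Future<?>> prepared = new ArrayList<>();
            for (ShutdownListener listener : listeners) {
                prepared.add(pool.submit(listener::prepareShutdown));
            }
            for (Future<?> future : prepared) {
                try {
                    future.get(); // returns once this subsystem is ready for phase 2
                } catch (ExecutionException e) {
                    // A failing subsystem must not stall the rest; a real impl would log e.getCause().
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and abandon phase 2
                    return;
                }
            }
            for (ShutdownListener listener : listeners) {
                listener.commitShutdown();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Phase 1 runs on a cached pool so one slow subsystem does not delay the others from starting to drain. Ordering between dependent subsystems is deliberately not handled here; that is the gap the rest of the discussion turns to.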
<bstansberry> being an MC bean is a fair requirement. might take some doing for all the services, but what you describe is what i was thinking as well
<bstansberry> is AS trunk actually using Bootstrap now?
<ALR> bstansberry: Not a recent enough version, no, I need to merge everything in.
<bstansberry> is that part of your M2 plan?
<ALR> bstansberry: Which, judging from Branch_5_x integration time, is a bit short of a day. (unless the IDE is playing nicer w/ AS modules and I can see compilation errors more readily)
<ALR> bstansberry: It is now. Might as well start knocking off my action items.
<bstansberry> ok
<ALR> bstansberry: But unlikely I'll have it for you this week. You're chomping at the bit or just want to be sure it's in M2?
<bstansberry> neither
<bstansberry> i'm evaluating working on graceful vs working on clustered deployments
<bstansberry> figuring out what needs to happen to have something useful for either, who needs to be involved
<ALR> bstansberry: If you work on clustered deployments for now that'll give me some time to first clean up with EmbeddedAS release/6.0.0.M1, merge stuff to trunk, then incorporate a new API in bootstrap.
<bstansberry> and will then pick one or the other to ensure something useful gets in M2
<bstansberry> yeah. so for GS, the "who" is 1) you for bootstrap stuff 2) me for web containers 3) me or paul or someone for EJB3 containers 4) transactions ??? 5) JBM/HornetQ ???
<bstansberry> JBM/HornetQ is IMO optional for "useful". actually just 1) and 2) is enough for "useful"
<bstansberry> so now i'll think a bit about clustered deployments; same kind of analysis
<ALR> Sounds about right, although I think for EJB3 there's more than just clustered...we need to also halt incoming regular requests.
* maeste has quit (Remote closed the connection)
<bstansberry> yes, all of 1-5 isn't just clustered
* mazz (n=mazz@redhat/jboss/mazz) has left #jboss-dev
<bstansberry> my earlier intent was to work on domain model stuff, but that's deferred, so i really want to get something actually done :-)
<ALR> bstansberry: I also have a vested interest in domain model. :)
<ALR> For now I'll draw up some notes on a design Wiki and we'll be better able to pick up after Thanksgiving.
<bstansberry> ALR: sure, we all do. but we gotta keep doing stuff that actually gets released
<bstansberry> RERO
<bstansberry> sounds good
* maeste (n=maeste4@194.185.94.10) has joined #jboss-dev
<dmlloyd> don't forget that domain model is likely to be an AS7 thing
<ALR> Yup.
<bstansberry> yes, hence my shift to graceful shutdown or clustered deployments
<ALR> Though I was hoping for some decoupled thing too. Imagine I want to use the same interface for starting any arbitrary service, but also configure it.
<ALR> server.getConfiguration().addService(GrizzlyAdaptor.class);
<ALR> server.getConfiguration().as(GrizzlyAdaptor.class).setBindPort(8081).setBindHost("localhost");
<ALR> server.start() < Grizzly comes up.
<dmlloyd> which still doesn't let you configure more than one GrizzlyAdaptor :)
<ALR> Excuuuuuuuse my pseudocode
<ALR> Though IMO more than one Grizzly gets dangerous; they're difficult to outrun.
<dmlloyd> play dead
<ALR> We already support that.
<bstansberry> OT: why is Grizzly getting into the AS?
<ALR> bstansberry: It's not necessarily into AS. I've been playing with the idea of just making some runtime into which you can install/start any service with some adaptor, and Grizzly is an easy impl target.
<ALR> Because they were built for embeddability and have a unified config API, etc
<ALR> May be a poor example. Pretend I said "JBossWeb". :)
<bstansberry> ALR: LOL. ok; i'm just channeling remy a bit here
<ALR> bstansberry: IMO JBossWeb should be as easy to setup.
<ALR> Else we provide something that is. Or have any number of options for a servlet container.
<ALR> And let the user choose.
<ALR> Again, not necessarily for AS.
<jpederse> ALR: well, I would go with Grizzly or Jetty as the PoC ;)
<ALR> pgier: Ping.
* maeste has quit (Remote closed the connection)
<pgier> ALR: hi
<ALR> pgier: Sorry, so how do I set the profile in a local build to build the dist?
<pgier> you mean the zip?
<ALR> (in tags/6.0.0.M1? )
<ALR> Yeah.
<pgier> -Pdist-zip
<ALR> pgier: ./build.sh -Pdist-zip ?
<ALR> Or the maven build from trunk works there now?
<pgier> ah, right, I forgot that it's using ant
<pgier> you have to do build.sh first
<pgier> and then mvn -Pdist-zip package
<ALR> pgier: Ah OK. Thanks.
* rploski (n=rploski@redhat/jboss/rploski) has joined #jboss-dev
* ChanServ gives voice to rploski
<pgier> we could probably add a target to the ant build to include the zip
<ALR> Looking forward to trunk and one build to rule them all.
<ALR> pgier: It's tagged already; just so long as it gets released into the M2 repo eventually we'll be all good.
<ALR> My EmbeddedAS testing for the past few weeks has been stale
<pgier> ok, I'll check with Rajesh to make sure he uploads that
<ALR> After renaming artifacts, I've been using the same ol' snapshots of the old locations
* gerdogdu (n=gurkaner@85.104.134.232) has joined #jboss-dev
* aslak (n=aslak@212-71-93-70.dsl.no.powertech.net) has joined #jboss-dev
<ALR> pgier: Sorry.
<pgier> ALR: ?
<ALR> pgier: "mvn clean install -Pdist-zip" doesn't give me an org/jboss/jbossas/jboss-as-distribution in my local.
<pgier> I think the problem is a hard coded path in the assembly descriptor
<pgier> when the version changed, then it broke the assembly
<pgier> take a look at src/assembly/jboss-dist.xml
<ALR> pgier: Are we considering the tag immutable?
<ALR> Ooh yeah.
<pgier> probably up to Nihility whether it should be fixed
<ALR> IMHO it should be this assembly which is also the AS distribution which goes to SF.net; so we don't have different things there and in M2 repo.
* gerdogdu (n=gurkaner@85.104.134.232) has left #jboss-dev
<pgier> ALR: yeah, I think that's a good idea
<Nihility> i just patched the version
<Nihility> do an update
<ALR> pgier: :) Which opens a bad door. It'd have to be that artifact which also gets tested in QE.
<ALR> Nihility: Thanks.
* cbrock (n=cbrock@redhat/jboss/cbrock) has joined #jboss-dev
* ChanServ gives voice to cbrock
<ALR> Or we one-off this M1, and then in trunk it's all a Maven build anyway, and this problem goes away.
* asoldano has quit ("I'm leaving")
<ALR> But today I'll test EmbeddedAS with it, which should give some decent coverage of major moving parts.
<ALR> bstansberry: If you feel voyeuristic: https://jira.jboss.org/jira/browse/JBBOOT-116
<bstansberry> ALR: thanks
<ALR> I suppose a design Wiki is in order, but I think the description in the JIRA should suffice for now.
* rploski has quit ()
<Jaikiran> Nihility: btw, would there be a merge of the tagged M1 with trunk or is it up to individuals to port their fixes from M1 to trunk?
<Jaikiran> fixes within the AS code base
<ALR> Jaikiran: A mass merge wouldn't work.
<dmlloyd> yeah, it's too divergent
<Nihility> it has to be individual updates
<ALR> 1) Too much has changed 2) The source locations have moved
<ALR> src/main/java in turnk
<dmlloyd> I did a diff of just component matrix and it made me sad :)
<ALR> *trunk
<Jaikiran> hmm, yeah
<Jaikiran> i'll scan through some of the jiras i fixed for M1 and see if i have some pending ports
<ALR> I have enough ports to be a harbor.
<Jaikiran> :)
<jpederse> ALR: regarding JBBOOT-116, there has to be a layer in MC also, as services should be stopped in the reverse order from how they were started
<jpederse> ALR, bstansberry: so maybe talk with Ales about that
<ALR> jpederse: They already are...(well the bootstraps are shut down in reverse order anyway)
<ALR> jpederse: But bootstrap sits above MC. So they really shouldn't depend upon this at all.
<jpederse> ALR: I'm thinking more along the lines of parallel deployments once MC trunk hits AS trunk
<jpederse> ALR: but yeah, ideally you shouldn't worry
<ALR> jpederse: Yeah, this is an opt-in mechanism for subsystems.
<dmlloyd> if the thing which accepts requests depends on the thing that processes requests, then the request acceptor should naturally stop before the request executor
<jpederse> ALR: yup, the real issue is really proper dependency definitions
<ALR> Hmm, I wonder if I should use jboss-threads as the concurrent broadcast phase 1 executor. :) Naaaah. :D
<bstansberry> jpederse, ALR: still, it's a good point. for a full undeploy, the MC handles the dependencies, but for the "phase 1" part there needs to be similar behavior
<jpederse> ALR: no dependencies, thank you
<ALR> jpederse: No API deps.
<ALR> bstansberry: Then I'm not understanding what we'd ask of MC to do here.
<bstansberry> ALR: well just imagine a weld-type app with an SFSB container fronted by a web container
<ALR> k.
<bstansberry> you don't want the SFSB container to start rejecting new sessions before
<bstansberry> hmm, never mind :)
<jpederse> ALR: you are f.ex. asking JCA to shutdown before the SLSB container
<bstansberry> no, don't never mind ;-)
<ALR> Not a bad point.
<bstansberry> yeah, exactly
<ALR> Right, they need to stop accepting new session in order.
<ALR> *sessions
<jpederse> ALR: but the SLSB already have the request
<jpederse> ALR: but the request hasn't reached JCA yet
<ALR> bstansberry: So for stuff with explicit deps, those can't be concurrent
<jpederse> ALR: kaboom
<ALR> Web > EJB3 > JCA, in that order.
<bstansberry> if they are smart, the SFSB container knows it's only fronted by the web container, so it doesn't register at all
<ALR> Hmm, I don't want to work on this anymore. :)
<jpederse> ALR: so in a sense it is up to the kernel to notify in the correct order
<bstansberry> ALR: they can be concurrent, but they need to be mediated by the MC dependency mechanism. analogous to parallel deployment
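The idea that shutdown can be concurrent yet mediated by dependencies (analogous to parallel deployment) can be sketched with `CompletableFuture`: each component's stop action starts only after every component that depends on it has stopped, while unrelated components stop in parallel. `DependencyShutdown` is a hypothetical class, not MC's controller, and the sketch assumes an acyclic graph in which every dependency is itself registered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical dependency graph: "a depends on b" means b must outlive a during shutdown. */
class DependencyShutdown {
    private final Map<String, Set<String>> dependsOn = new LinkedHashMap<>();

    void add(String name, String... dependencies) {
        dependsOn.put(name, new LinkedHashSet<>(Arrays.asList(dependencies)));
    }

    /** Stops all components, in parallel where the graph allows, and blocks until all are stopped. */
    void stopAll(Executor executor, Consumer<String> stopAction) {
        Map<String, CompletableFuture<Void>> stopped = new HashMap<>();
        for (String name : dependsOn.keySet()) {
            stopFuture(name, stopped, executor, stopAction);
        }
        CompletableFuture.allOf(stopped.values().toArray(new CompletableFuture<?>[0])).join();
    }

    private CompletableFuture<Void> stopFuture(String name,
                                               Map<String, CompletableFuture<Void>> stopped,
                                               Executor executor,
                                               Consumer<String> stopAction) {
        CompletableFuture<Void> existing = stopped.get(name);
        if (existing != null) {
            return existing;
        }
        // Collect the futures of every component that depends on this one (its dependents).
        List<CompletableFuture<Void>> dependents = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            if (entry.getValue().contains(name)) {
                dependents.add(stopFuture(entry.getKey(), stopped, executor, stopAction));
            }
        }
        // This component stops only once all of its dependents have finished stopping.
        CompletableFuture<Void> future = CompletableFuture
                .allOf(dependents.toArray(new CompletableFuture<?>[0]))
                .thenRunAsync(() -> stopAction.accept(name), executor);
        stopped.put(name, future);
        return future;
    }
}
```

For the Web > EJB3 > JCA chain mentioned in the log, this stops web, then ejb3, then jca, while an unrelated service can stop at any point concurrently. As the discussion points out, the result is only as good as the declared dependencies.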
<ALR> bstansberry: The thing is, anything that opts in needs to know its dependencies
<jpederse> bstansberry: kernel level service IMHO
<ALR> bstansberry, jpederse: Right, so now this becomes a feature of MC. Not an add-on.
<jpederse> bstansberry: it's the kernel, the single entity that knows the dependency chain
<bstansberry> jpederse: yep. having the SFSB container try to understand that is a hack
<jpederse> ALR, bstansberry: and then we have all the OSGi stuff to worry about too
<dmlloyd> OSGi is just another name for "don't worry about deps, that's someone else's problem"
<jpederse> ALR, bstansberry: so def. something for the kernel ;)
<jpederse> ALR, bstansberry: well, at least some of it
<jpederse> dmlloyd: until you start deploying multiple containers
<ALR> Ah, I remember discussing this w/ Carlo
<ALR> We'd also looked at a ThreadLocal mechanism.
<jpederse> dmlloyd: f.ex. two different JCA containers - 1.5 and 1.6
<ALR> Where you must know the entry/exit point of each request.
<ALR> So in other words, EJB3 can gracefully shut down.
<ALR> And block all incoming requests.
<ALR> UNLESS there's a ThreadLocal saying "hey I came in from JBossWeb, serve me".
<jpederse> yuck
<ALR> Ugly with its advantages.
<jpederse> ThreadLocal is not a good contract for inter container communication
<jpederse> ALR: well, the whole thing needs a PoC, so it 'could' be a first implementation - just to expose all the problems
<bstansberry> jpederse: yes. we need to think of ways to add useful functionality
<ALR> Personally I dislike things that fly under the API like sysprops and ThreadLocal. But here we can say: "Whoever sets it is responsible for unsetting it".
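A sketch of the ThreadLocal gate ALR describes, assuming a hypothetical `InternalCallMarker` that the front-end container sets while it dispatches into EJB3, and a hypothetical `EjbGate` that rejects unmarked calls once shutdown begins. The setter restores the previous state in a `finally` block, which is the "whoever sets it unsets it" rule. It also illustrates jpederse's objection: a `ThreadLocal` value is not visible to work handed off to another thread, such as JCA work threads.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical per-thread marker meaning "this call entered through another container". */
final class InternalCallMarker {
    private static final ThreadLocal<Boolean> INTERNAL = ThreadLocal.withInitial(() -> Boolean.FALSE);

    private InternalCallMarker() {
    }

    /** Runs the task marked as internal; whoever sets the marker restores it before returning. */
    static <T> T runInternal(Supplier<T> task) {
        boolean previous = INTERNAL.get();
        INTERNAL.set(Boolean.TRUE);
        try {
            return task.get();
        } finally {
            if (!previous) {
                INTERNAL.remove(); // outermost caller clears it so pooled threads don't leak state
            }
        }
    }

    static boolean isInternal() {
        return INTERNAL.get();
    }
}

/** Hypothetical EJB3 entry point that blocks new external requests while shutting down. */
class EjbGate {
    private final AtomicBoolean shuttingDown = new AtomicBoolean();

    void beginShutdown() {
        shuttingDown.set(true);
    }

    String invoke(String argument) {
        if (shuttingDown.get() && !InternalCallMarker.isInternal()) {
            throw new IllegalStateException("Shutting down; new external requests are rejected");
        }
        return "ok:" + argument;
    }
}
```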
<ALR> Else what's the difference between this 2-phase graceful shutdown and a regular MC lifecycle phase?
<ALR> stop, destroy
<jpederse> ALR: yeah, but f.ex. all work done in JCA is done in its own thread - and they can be long running processes - so there has to be a well-defined contract
<ALR> Or adding a new lifecycle phase? (pre-stop) to halt processing of new requests
<ALR> Actually that seems more likely. New lifecycle phase.
<ALR> Then it's all built into MC from the get-go.
<jpederse> ALR: yeah, that could be a possible solution
<ALR> Looks easiest too.
<jpederse> ALR: but you still have the problem with my use-case
* kconner (n=kevin@redhat/jboss/kconner) has joined #jboss-dev
* ChanServ gives voice to kconner
<bstansberry> jpederse: what problem is that again?
<ALR> jpederse: I think when getting the lifecycle callback to @PreStop your subsystem would have to determine if it wanted to halt/interrupt any long-running Tasks.
<jpederse> bstansberry: incoming request to SLSB container that hasn't reached f.ex. JCA yet
<jpederse> bstansberry: and JCA is notified of the shutdown before the requests hits
<jpederse> bstansberry: so it'll block since there are currently no active work
<ALR> bstansberry: That's handled.
<ALR> SLSB depends on JCA. So JCA won't get the @PreStop event until EJB3 @PreStop is done.
<ALR> I mean jpederse^
<jpederse> ALR: ok, that would solve it
<jpederse> ALR: now you will have to determine to which beans to send the notification to first...
<ALR> Which again means that all subsystems must have explicitly set their deps correctly. :)
<bstansberry> ALR: are there a lot of unexpressed dependencies, e.g. within an EJB3 app?
<ALR> I'm SURE there are.
<bstansberry> e.g. SFSB calls into SLSB
<ALR> bstansberry: That should tie into EJB3 Containers becoming first-class MC beans.
* bstansberry starts thinking in terms of a web-tier only initial version
<jpederse> ALR, bstansberry: I think that the first use-case to solve is to look at the entire AS as one component
<jpederse> ALR, bstansberry: then solve the problem with requests coming from the "outside" -- web, ...
<ALR> Why draw any distinction?
<jpederse> bstansberry: brain clustering again :)
<dmlloyd> the problem with making random things be MC beans is that the MC lifecycle states don't make sense
<ALR> A deployment may create any number of components.
<ALR> The components should have well-defined dependencies anyway, otherwise they boot by luck alone.
<dmlloyd> there should only be two states: "up" and "not up". If you depend on e.g. classloading "phase", then that should be a separate target for that "phase"
<ALR> So when we bring them down, do so in reverse order. The only thing we're adding now is a new phase to keep servicing but stop listening on new requests.
<dmlloyd> but that's just a pet peeve of mine I guess
* whitingjr (n=whitingj@gondolin.ncl.ac.uk) has left #jboss-dev
<jpederse> dmlloyd: I agree, the rest are internal kernel states only
<ALR> "Phase" is a concern of the environment, not an individual bean.
<dmlloyd> how much time do they spend struggling over the fact that "states" need to be extensible, or that the default ones do not suffice
<ALR> Beans opt-in to take action on phase events triggered by the environment though.
<jpederse> ALR: yeah, so start simple with the web container as a PoC
<jpederse> ALR: or JNDI
<ALR> Everything depends on JNDI.
<ALR> :)
<jpederse> ALR: yeah, I mean not in-vm requests ;)
* cbrock_ (n=cbrock__@redhat/jboss/cbrock) has joined #jboss-dev
* ChanServ gives voice to cbrock_
<jpederse> ALR: I think it would be a lot simpler to alter the naming server to allow graceful shutdown
<jpederse> bstansberry: another idea ^
<bstansberry> jpederse: JNDI would help, but i don't think it would be reliable enough though
<ALR> jpederse: And why wouldn't that just be another addition of @PreStop lifecycle event handler in the naming server?
<bstansberry> who knows when the client is going to invoke on whatever they looked up?
<jpederse> ALR: I said, first use-case
<ALR> jpederse: MC dep mechanism would fire stuff off in order; JNDI would come near last as everything depends upon it...
<ALR> jpederse: Right, I guess I'm still working the angle where we decompose into parts:
<ALR> 1) A mechanism (probably new lifecycle phase)
<ALR> 2) Every subsystem takes advantage of it
<ALR> 3) Analyze the MC dep graphs to ensure that each component is declaring its deps fully
<jpederse> ALR: correct, but what happens if a container returns false for the gracefulShutdown() method -- e.g. I can't stop now, ask later -- or do you want it blocking in the entire chain ?
<ALR> jpederse: There's no return value on lifecycle callbacks; it's all blocking.
<jpederse> ALR: ok, in that case - throws an exception
* ggear (n=ggear@host-83-146-13-110.dslgb.com) has joined #jboss-dev
<jpederse> ALR: but I'm getting too far into the use-cases currently....
<ALR> jpederse: Same as if any lifecycle method throws an exception. MC is handling it.
<ALR> jpederse: Probably pages of stack.
<jpederse> ALR: yeah, but the feature is called "Graceful shutdown"...
<bstansberry> jpederse: a container doesn't get to veto a shutdown. whoever requests it provides a max wait time and if that is exceeded, the shutdown occurs
<jpederse> ALR: not shutdown-kaboom
<jpederse> bstansberry: that would be a way
<jpederse> bstansberry: but its blocking all the way down the chain
<jpederse> bstansberry: f.ex. stopping long running JCA threads can take some time
<jpederse> bstansberry: just something to think about...
<bstansberry> jpederse: sessions as well. could easily be 30 minutes
<jpederse> bstansberry: yup
<bstansberry> jpederse: but the max timeout would need to be an overall time, i.e. not applied independently for each item in the chain
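The requirement that the maximum wait be one overall budget, rather than a fresh timeout for each item in the chain, can be sketched by computing a single deadline up front and handing each step only what is left. `DrainStep` and `ShutdownDeadline` are hypothetical; the clock is injected as a `LongSupplier` (for example `System::nanoTime`) so the behavior can be checked without real waiting.

```java
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical step in a shutdown chain: drains its work within the given budget. */
interface DrainStep {
    /** Returns true if all of this step's work completed within budgetNanos. */
    boolean drain(long budgetNanos);
}

final class ShutdownDeadline {
    private ShutdownDeadline() {
    }

    /**
     * Runs the steps in order against one overall deadline, so the whole chain
     * never waits longer than maxWaitNanos. Returns how many steps drained fully.
     */
    static int runWithin(List<DrainStep> steps, long maxWaitNanos, LongSupplier clock) {
        long deadline = clock.getAsLong() + maxWaitNanos;
        int drained = 0;
        for (DrainStep step : steps) {
            // Once the budget is used up, later steps get 0 and must stop without waiting.
            long remaining = Math.max(0L, deadline - clock.getAsLong());
            if (step.drain(remaining)) {
                drained++;
            }
        }
        return drained;
    }
}
```

With a budget of 300 units and three steps that each need 200, the first step drains, the second is cut off after the remaining 100, and the third gets nothing, so the chain ends at 300 instead of the 600 that independent 300-unit timeouts per step would allow.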
<ALR> jpederse: So now we've got a discussion about the shutdown impl vs. clients.
<ALR> Lifecycle states are an open-ended contract, anyone can abuse it.
<ALR> Same thing with startup.
<ALR> I can make AS start in 30 minutes by installing a bean w/ Thread.sleep in @Start
<bstansberry> yeah, but a 30 minute session timeout is the default for most webapps
<dmlloyd> I could make the container die by using System.exit()
<jpederse> the KaboomMCService :)
<ALR> So far as the impl mechanism as a lifecycle state, I don't see the problem. We notify the subsystem, and now its their responsibility to cleanly shut down as quickly as possible.
<dmlloyd> let's assume that start() will generally execute as quickly as possible (non-blocking) and stop() will block until everything is down.
<ALR> bstansberry: We have to consider sessions, or just requests?
<dmlloyd> nothing else really makes sense
<ALR> I think graceful shutdown just means that requests in process come out OK.
<ALR> We don't guarantee that new requests for a current session are gonna be processed...do we?
<bstansberry> graceful shutdown == i don't lose my session.
<jpederse> ALR: well, the request have to be migrated to another server in the cluster
<bstansberry> the whole point is to not lose sessions
<ALR> bstansberry: On reboot your session can still be there.
<ALR> bstansberry: Or maybe that's a config option for JBossWeb. "Wait on all sessions". During @PreStop it can handle how it likes.
<jpederse> bstansberry: yeah, I guess we also have the case where there is only one machine - good point
<bstansberry> ALR: yes. a clustered web app can be a lot quicker since it can understand that it has replicated the session
<bstansberry> ALR: problem with counting on session persistence is it can take a long time to restart the server
<ALR> bstansberry, jpederse: I'll start a Thread on AS Development forum
<bstansberry> cool. we haven't even gotten into the transaction manager issues :)
<jpederse> haha
<ALR> All ears.
<bstansberry> client has no open SFSB sessions, no ongoing SLSB requests.
<bstansberry> so, EJB can stop, right?
<bstansberry> wrong
<bstansberry> client has an ongoing transaction
<bstansberry> and the TM is later in the chain vs the EJB, so counting on @PreStop in reverse order doesn't handle it either
<dmlloyd> well the EJB shutdown process just takes open transactions into account, that's all
<bstansberry> dmlloyd: yeah, i guess the container will have to track all open transactions
<bstansberry> dmlloyd: whether or not it actually has to do anything when they commit/rollback
<ALR> bstansberry, dmlloyd, jpederse: http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4267115
<ALR> bstansberry: In that case an open Tx is the same as a session
<dmlloyd> I still don't see any case where a separate, graceful stage is needed
<ALR> bstansberry: So we have to ask the TxManager for all Txs for that component...I'm not familiar enough with those APIs to know how that works in practice though.
<dmlloyd> just gonna make things more complex...
<ALR> dmlloyd: What would you do? Put it all in @Stop?
<dmlloyd> <dmlloyd> if the thing which accepts requests depends on the thing that processes requests, then the request acceptor should naturally stop before the request executor
<ALR> Right.
<dmlloyd> then stop just has to wait until everything's done, no matter how long it takes
<ALR> This puts the request acceptor stuff stopping in @PreStop, Executors stop in @Stop
<dmlloyd> ungraceful shutdown can be implemented via interrupt. If stop is interrupted with the "kill" flag set to true, burn down the house and get out of there as quick as possible.
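dmlloyd's alternative needs no extra phase: `stop()` closes the container to new work and waits, however long it takes, for in-flight work (requests or open transactions) to finish, and an interrupt combined with a "kill" flag turns it into an ungraceful stop. A minimal sketch follows; `DrainingContainer` and its `enter`/`exit`/`requestKill` methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical container that counts in-flight work (requests, open transactions). */
class DrainingContainer {
    private final Object lock = new Object();
    private final AtomicBoolean kill = new AtomicBoolean(); // the "kill" flag from the discussion
    private int inFlight;             // guarded by lock
    private boolean accepting = true; // guarded by lock

    /** Called when a request or transaction starts; false once stop() has begun. */
    boolean enter() {
        synchronized (lock) {
            if (!accepting) {
                return false;
            }
            inFlight++;
            return true;
        }
    }

    /** Called when that request or transaction completes. */
    void exit() {
        synchronized (lock) {
            inFlight--;
            if (inFlight == 0) {
                lock.notifyAll();
            }
        }
    }

    /** Marks the next interrupt of stop() as a request to burn down the house. */
    void requestKill() {
        kill.set(true);
    }

    /**
     * Stops accepting work, then waits as long as it takes for in-flight work.
     * An interrupt without the kill flag is ignored; with it, stop() gives up
     * and returns the number of units of work it abandoned.
     */
    int stop() {
        synchronized (lock) {
            accepting = false;
            while (inFlight > 0) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    if (kill.get()) {
                        Thread.currentThread().interrupt(); // let the caller see it too
                        return inFlight;
                    }
                }
            }
            return 0;
        }
    }
}
```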
<ALR> But it's not like things are now separated out enough that "acceptors" are different components.
<ALR> For example the entry point to EJB3 stuff can be remoting.
<ALR> And we don't have one remoting acceptor component per Container/Deployment
<dmlloyd> that seems like an easier solution than adding another phase.
<dmlloyd> if you have one MC component per container, why not two?
<jpederse> reminds me of http://www.youtube.com/watch?v=NXbhfwHorlI
<dmlloyd> "acceptor" and "doer"
<ALR> Because of unified entry points.
<ALR> dmlloyd: Each webapp gets its own frontend acceptor?
<ALR> dmlloyd: Or we break up JBossWeb to have a separate acceptor?
<dmlloyd> nah, each container gets its own acceptor
<dmlloyd> jbossweb (for example) talks to that, not the actual container
<ALR> EJB3 has container per bean.
<ALR> But one remoting connector
<dmlloyd> then when the EJB is undeployed, the acceptor is stopped

2. Re: Graceful Shutdown

emuckenhuber Nov 24, 2009 7:30 AM (in response to alrubinger)

Hmm, graceful shutdown should not be part of the MC bean lifecycle, as AS would then always shut down in a graceful manner. This should be an optional way to shut down AS, triggered with a different signal or management action.
Additionally, a graceful shutdown will most likely have a timeout after which AS is simply stopped. This does not seem to fit very well with MC lifecycle actions, since we would basically need to interrupt an action during a state transition (pre_stop -> stop).

I see the problem with the order of calling the graceful shutdown and agree that we should try to leverage existing dependencies. Thinking about it, though, I'm not really sure whether using bean dependencies would make sense.
Using MC bean dependencies would mean that jboss.web would somehow need a dependency on EJB3, which does not really make sense, since the bean itself does not need this dependency.
Maybe one thing worth looking at is whether we can use Profile dependencies for that, since we are going to have something like optional dependencies as well, to better control the boot sequence. At least once a profile describes something like a service/container, this set of dependencies could make more sense.

3. Re: Graceful Shutdown

emuckenhuber Nov 24, 2009 8:11 AM (in response to alrubinger)

Actually this also brings up the question of whether we want something like an "acceptor" phase in our bootstrap process, which could be part of the Profile definition. This could be useful in general; otherwise we might start the connector before user applications are deployed. This does not make the ProfileService part any easier, but it might help us handle that part better as well.

4. Re: Graceful Shutdown

alrubinger Nov 24, 2009 8:33 AM (in response to alrubinger)

"emuckenhuber" wrote:
Actually this also brings up the question of whether we want something like an "acceptor" phase in our bootstrap process

To highlight DML's suggestion in the long chat, one idea on the table is to pull out "acceptors" and "processors", where acceptors depend on processors. Then there's no additional lifecycle phase, just a typical dependency.

S,
ALR

5. Re: Graceful Shutdown

brian.stansberry Nov 24, 2009 10:42 AM (in response to alrubinger)

"emuckenhuber" wrote:
Hmm, graceful shutdown should not be part of the MC bean lifecycle, as AS would then always shut down in a graceful manner. This should be an optional way to shut down AS, triggered with a different signal or management action.
Additionally, a graceful shutdown will most likely have a timeout after which AS is simply stopped. This does not seem to fit very well with MC lifecycle actions, since we would basically need to interrupt an action during a state transition (pre_stop -> stop).

With the acceptor concept we should be able to deal with these issues even though we're essentially using the MC lifecycle.

The acceptors can all be registered with a central management bean that sets how long they should wait before returning from stop(): -1, don't wait, just return; 0, wait as long as it takes; > 0, wait that long. The default is -1, or something configurable at the server level. The management console sets it to something else when a graceful shutdown is invoked.
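A sketch of such a central bean, using the same wait convention (-1: return immediately; 0: wait as long as it takes; > 0: wait that many milliseconds). `GracefulShutdownManager` is a hypothetical name, and modelling an acceptor's outstanding work as a `CountDownLatch` is an assumption made only to keep the example small.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical central bean that all acceptors are registered with. */
class GracefulShutdownManager {
    static final long NO_WAIT = -1L;     // stop() returns without waiting
    static final long WAIT_FOREVER = 0L; // stop() waits as long as it takes

    private final AtomicLong waitMillis;

    /** Takes the server-level default, typically NO_WAIT. */
    GracefulShutdownManager(long defaultWaitMillis) {
        this.waitMillis = new AtomicLong(defaultWaitMillis);
    }

    /** What a management action would call when a graceful shutdown is requested. */
    void setWaitMillis(long millis) {
        waitMillis.set(millis);
    }

    /** Called from an acceptor's stop(); returns true if its outstanding work has drained. */
    boolean awaitDrain(CountDownLatch outstandingWork) {
        long wait = waitMillis.get();
        try {
            if (wait < 0) {
                return outstandingWork.getCount() == 0; // -1: just report, don't wait
            }
            if (wait == 0) {
                outstandingWork.await(); // 0: wait as long as it takes
                return true;
            }
            return outstandingWork.await(wait, TimeUnit.MILLISECONDS); // > 0: bounded wait
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```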

The acceptors should have a thread pool injected so they use a pool thread to actually perform the graceful shutdown work (with a Future). So the thread calling in from the MC won't have to be interrupted in some arbitrary code; it's just blocking in a future.

It's the responsibility of the acceptor impls (or more likely some element of the container they're associated with) to ensure that if a graceful shutdown is requested and is still in progress they can still cleanly perform their normal stop() processing. I don't think this should be hard. As I see it, the graceful shutdown task would be to:

1) If possible, signal load balancing mechanism that this node shouldn't receive new sessions, or new requests if requests are the relevant unit of work.

.. some details skipped...

2) Wait for existing work to complete.

3) Close a gate such that requests for new work raise an exception, generate an HTTP 503 etc.

It shouldn't be hard to ensure that a normal stop() can proceed even if the above isn't complete.
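The three steps above, run as proposed here, can be sketched as follows: the graceful work executes on a thread from an injected pool, the thread calling `stop()` only blocks on the resulting `Future`, and the ordinary stop processing (closing the gate) happens whether or not the graceful part finished in time. `Acceptor`, the `advertised` flag standing in for load-balancer registration, and the 200/503 status codes are illustrative assumptions.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical acceptor gating external requests into one container. */
class Acceptor {
    private final ExecutorService pool; // injected thread pool
    private final AtomicBoolean gateOpen = new AtomicBoolean(true);
    private final AtomicInteger active = new AtomicInteger();
    private volatile boolean advertised = true; // stand-in for load-balancer registration

    Acceptor(ExecutorService pool) {
        this.pool = pool;
    }

    /** Runs one unit of external work and returns an HTTP-style status code. */
    int accept(Runnable work) {
        if (!gateOpen.get()) {
            return 503; // gate closed: Service Unavailable
        }
        active.incrementAndGet();
        try {
            work.run();
            return 200;
        } finally {
            active.decrementAndGet();
        }
    }

    boolean isAdvertised() {
        return advertised;
    }

    /** Returns true if the graceful part completed within timeoutMillis. */
    boolean stop(long timeoutMillis) {
        Future<?> graceful = pool.submit(() -> {
            advertised = false;          // 1) ask the balancer for no new sessions
            while (active.get() > 0) {   // 2) wait for existing work to complete
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return;              // cancelled by stop(); give up draining
                }
            }
            gateOpen.set(false);         // 3) close the gate: new work gets 503
        });
        boolean completed = false;
        try {
            graceful.get(timeoutMillis, TimeUnit.MILLISECONDS);
            completed = true;
        } catch (TimeoutException | ExecutionException e) {
            // graceful part overran or failed; fall through to normal stop processing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        graceful.cancel(true); // no effect if it already finished
        gateOpen.set(false);   // normal stop() proceeds even if the graceful part did not finish
        return completed;
    }
}
```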

6. Re: Graceful Shutdown

emuckenhuber Nov 24, 2009 11:10 AM (in response to alrubinger)

Ok, thanks for the explanation.

I was actually referring more to a problem I see with ProfileService deploying profiles. With more and smaller profiles, it's likely that services are started before user applications are deployed. The same goes for stopping AS (gracefully or not): deployments should be undeployed before the services.
So we might need to resurrect the DeploymentPhase we had before and artificially create a deployers and deploy phase or similar - but this might be more a topic for a different thread.

7. Re: Graceful Shutdown

brian.stansberry Nov 24, 2009 11:25 AM (in response to alrubinger)

Could the deployment of the acceptor be part of the deployment of the service? It's just a separate MC bean that depends on the core service.

That eliminates the issue of deployment phases.

I want to restate my understanding of David's idea in case I have it wrong. :-) The key concept is to separate internal requests from external requests, with the acceptor acting as a gate on the external requests. So graceful shutdown ends external requests, while still leaving the possibility of internal requests, e.g. web tier calling into an EJB. For the internal requests, the normal dependency mechanism (web app depends on ejb) ensures the EJB doesn't undeploy before the webapp is undeployed.

8. Re: Graceful Shutdown

alrubinger Nov 24, 2009 12:22 PM (in response to alrubinger)

"brian.stansberry" wrote:
The key concept is to separate internal requests from external requests, with the acceptor acting as a gate on the external requests. So graceful shutdown ends external requests, while still leaving the possibility of internal requests, e.g. web tier calling into an EJB. For the internal requests, the normal dependency mechanism (web app depends on ejb) ensures the EJB doesn't undeploy before the webapp is undeployed.

To copy myself a bit from a #jboss-dev IRC talk:

What's an invocation which was triggered by an EJB Timer; internal or external? While I like the notion of separating out acceptors/processors, I think the context of the request/session in progress is not always so clear.

Also we've talked about a separation between services and deployments, but services are deployments themselves. Deployments may depend both upon each other and upon services. Services may depend upon each other. In this light I think the standard MC dependency mechanism will suffice.

To me the tricky part is extracting out all the endpoints (acceptors) and ensuring all moving parts are explicitly wired together.

S,
ALR

9. Re: Graceful Shutdown

jason.greene Nov 24, 2009 2:38 PM (in response to alrubinger)

"ALRubinger" wrote:

What's an invocation which was triggered by an EJB Timer; internal or external?

Internal.

Also we've talked about a separation between services and deployments, but services are deployments themselves. Deployments may depend both upon each other and upon services. Services may depend upon each other. In this light I think the standard MC dependency mechanism will suffice.

MC dependencies don't make sense in this case. MC deps are internal service implementation details, and don't necessarily reflect the current runtime behavior. Graceful shutdown is all about the enclosing request (or transaction).

To me the tricky part is extracting out all the endpoints (acceptors) and ensuring all moving parts are explicitly wired together.

We can start small. First implement this kind of thing for the web layer, that will handle 90% of what people are after (Brian's idea).

10. Re: Graceful Shutdown

brian.stansberry Nov 24, 2009 2:40 PM (in response to alrubinger)

Perhaps better than the concept of internal/external is whether the AS can be expected to work out the dependencies between components and establish proper MC dependencies. If it can, then the normal MC "shut down dependent items first" behavior should handle it. The acceptor comes into play to cut off further requests from clients the MC can't know about.

So, can the MC be expected to know about the relationship between the EJB Timer and the EJB(s) it invokes on?

11. Re: Graceful Shutdown

alrubinger Nov 24, 2009 2:44 PM (in response to alrubinger)

"jason.greene@jboss.com" wrote:
MC dependencies don't make sense in this case. MC deps are internal service implementation details, and don't necessarily reflect the current runtime behavior. Graceful shutdown is all about the enclosing request (or transaction).

If the MC deps don't represent the current runtime, it's wired up incorrectly and works by luck. To get graceful shutdown means to shut the runtime down in order such that no request/session gets orphaned in the process.

S,
ALR

12. Re: Graceful Shutdown

brian.stansberry Nov 24, 2009 2:50 PM (in response to alrubinger)

A semi-nasty wrinkle that occurred to me as I wrote the above:

For a container involving sessions or other long running units of work, we want to allow new requests for that session until it has expired/been removed/been replicated. This is the work of the acceptor, which needs to interact with the container to track the status of sessions.

But actually it's only interested in sessions associated with its endpoint. The fact that calls coming in via some other endpoint are keeping sessions alive shouldn't prevent the acceptor completing its stop method.

Perhaps this isn't a huge issue; e.g. if a SFSB takes requests via a Remoting connector and also via the web tier, there are 2 acceptors involved -- the SFSB's and the web app's. If the shutdown is multithreaded, stop() can proceed on both in parallel. So all sources of new sessions will be cut off, and eventually the SFSB+Remoting acceptor can complete stop().

This requires multithreaded shutdown though.
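A sketch of the per-endpoint point: each acceptor records only the sessions created through its own endpoint, keeps serving requests for those while stopping, and waits only for them to end, while a multithreaded shutdown stops all acceptors in parallel. `SessionAcceptor` and `ParallelStop` are hypothetical names, and "ended" stands for expired, removed, or replicated.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical acceptor that tracks only sessions created through its own endpoint. */
class SessionAcceptor {
    private final String endpoint;
    private final Set<String> ownSessions = new HashSet<>();
    private boolean acceptingNewSessions = true;

    SessionAcceptor(String endpoint) {
        this.endpoint = endpoint;
    }

    synchronized boolean openSession(String id) {
        return acceptingNewSessions && ownSessions.add(id);
    }

    /** Requests for a session this endpoint created are still served while stopping. */
    synchronized boolean request(String id) {
        return ownSessions.contains(id);
    }

    /** The session expired, was removed, or was replicated elsewhere. */
    synchronized void endSession(String id) {
        if (ownSessions.remove(id) && ownSessions.isEmpty()) {
            notifyAll();
        }
    }

    /** Refuses new sessions, then waits only for this endpoint's own sessions to end. */
    synchronized String stop() throws InterruptedException {
        acceptingNewSessions = false;
        while (!ownSessions.isEmpty()) {
            wait();
        }
        return endpoint;
    }
}

final class ParallelStop {
    private ParallelStop() {
    }

    /** Stops all acceptors concurrently; returns endpoint names in the order given. */
    static List<String> stopAll(List<SessionAcceptor> acceptors)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, acceptors.size()));
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (SessionAcceptor acceptor : acceptors) {
                tasks.add(acceptor::stop);
            }
            List<String> stopped = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                stopped.add(future.get());
            }
            return stopped;
        } finally {
            pool.shutdownNow();
        }
    }
}
```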

13. Re: Graceful Shutdown

jason.greene Nov 24, 2009 6:29 PM (in response to alrubinger)

"ALRubinger" wrote:
"jason.greene@jboss.com" wrote:
MC dependencies don't make sense in this case. MC deps are internal service implementation details, and don't necessarily reflect the current runtime behavior. Graceful shutdown is all about the enclosing request (or transaction).

If the MC deps don't represent the current runtime, it's wired up incorrectly and works by luck. To get graceful shutdown means to shut the runtime down in order such that no request/session gets orphaned in the process.

Let me give an example. Someone defines a service A, which supports remote invocation. There is no dependency between service A and the EJB container (Service B), because it doesn't make sense (they have nothing to do with each other). Then instance SFSB Foo of Service B *dynamically* decides to call service A during an invocation. Then during "graceful" shutdown, service A is stopped first.

The point is that runtime callflow is orthogonal to service dependencies.

14. Re: Graceful Shutdown

alrubinger Nov 24, 2009 6:39 PM (in response to alrubinger)

"jason.greene@jboss.com" wrote:
Let me give an example. Someone defines a service A, which supports remote invocation. There is no dependency between service A and the EJB container (Service B), because it doesn't make sense (they have nothing to do with each other). Then instance SFSB Foo of Service B *dynamically* decides to call service A during an invocation. Then during "graceful" shutdown, service A is stopped first.

Haha, that's exactly my point. There's a missing dependency between B->A. B requires A to function correctly, regardless of whether it's a dynamic thing or something we support w/ injection. So if they're deployed in the same container/cluster, I'd expect there to be an explicit dep declared.

For what use case is this not possible?

S,
ALR