Where are the 500 errors coming from? AS7 or httpd?
What you describe is weird; mod_cluster should fail over the request from one node to the other without problems.
The 500 error comes from JBoss, with the standard JBoss error page formatting. Yes, I would have thought it would catch that failover too.
Is it because the redeploy issues a DISABLE_APP, not a STOP_APP, or at least not quickly enough? Maybe it should just call STOP_APP?
As a temporary work-around I changed my script to call DISABLE_APP on the web server instances, then wait for 5 minutes (my clients typically have small sessions [mobile]), then do a STOP_APP, followed by the redeploy. That forces any leftover sessions to fail over. What is interesting is that when I then redeploy the archive the status goes back to DISABLE_APP, which technically would allow some outlier sessions to be routed through.
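The ordering of that work-around can be sketched roughly as follows. This is only an illustration in Python, not the actual curl script: `send_command` and `redeploy` are hypothetical callables standing in for whatever sends the command to each Apache instance and whatever triggers the redeploy, and the command names are simply the ones used in this thread.

```python
import time
from typing import Callable, Iterable


def drain_then_redeploy(
    proxies: Iterable[str],
    send_command: Callable[[str, str], None],  # hypothetical: (proxy, command) -> None
    redeploy: Callable[[], None],              # hypothetical: triggers the actual redeploy
    drain_seconds: float = 300.0,              # the 5 minute wait described above
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Disable the app on every proxy, wait for sessions to drain, stop it, then redeploy."""
    proxies = list(proxies)

    # Step 1: stop new sessions from being routed to this node.
    for proxy in proxies:
        send_command(proxy, "DISABLE_APP")

    # Step 2: give existing sessions time to finish or expire.
    sleep(drain_seconds)

    # Step 3: force any leftover sessions to fail over to another node.
    for proxy in proxies:
        send_command(proxy, "STOP_APP")

    # Step 4: only now touch the deployment itself.
    redeploy()
```

Passing `sleep` in as a parameter is just a convenience so the ordering can be checked without actually waiting 5 minutes.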
The redeploy should do:
wait for existing sessions to move to another node
Between the DISABLE_APP and the STOP_APP, only requests with a session ID will be routed to the node.
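As a simplified model of that routing rule (an illustrative Python sketch only, not mod_cluster's actual code; the `ContextState` enum and `may_route_to_node` function are hypothetical names), the three context states behave roughly like this:

```python
from enum import Enum


class ContextState(Enum):
    ENABLED = "ENABLED"    # normal operation
    DISABLED = "DISABLED"  # after DISABLE_APP
    STOPPED = "STOPPED"    # after STOP_APP


def may_route_to_node(state: ContextState, has_session_on_node: bool) -> bool:
    """Return True if a request may be sent to the node in this simplified model."""
    if state is ContextState.ENABLED:
        # New and existing sessions are both accepted.
        return True
    if state is ContextState.DISABLED:
        # Only requests that already carry a session ID for this node get through.
        return has_session_on_node
    # STOPPED: nothing is routed; existing sessions must fail over.
    return False
```

In this model, a context that drops back to DISABLED during a redeploy would again accept requests that still carry an old session ID, which is the window being discussed here.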
I am not sure about the other problem:
"when I then redeploy the archive the status goes back to DISABLE_APP"
Do you mean that if you do an undeploy of myapp.war and then a deploy of myapp.war, the status of the context is DISABLED for a while?
Well, to force the session movement I issue the DISABLE_APP and STOP_APP commands to the Apache server farm via a curl script, essentially mimicking the mod_cluster commands that are issued on undeploy, but I wait for 5 minutes in between, then I do the actual redeploy of the war. While the app is redeploying (it's a ROOT.war) the status goes back to DISABLED. I'm thinking the redeploy doesn't check whether it's already in a STOP state. Then, once it's redeployed, it goes back to ENABLED as expected.
In the case where I didn't have the script do the DISABLE, wait, then STOP, I believe the 500 errors occurred while the container was in the undeploy process. I would also sometimes get CDI errors when a page was being processed, as if the undeploy was occurring while a page request was in progress.
Any update? BTW what are the CDI errors?