Error 404 after successful cluster switch
erhard Dec 29, 2011 3:38 PM
I have the setup from http://community.jboss.org/thread/176649 (with that patch applied) and two JBoss 4 instances as backends. The following happens:
When I shut down the JBoss instance holding the active session, the second one takes over successfully. But sometimes I then get a 404 for a single hit. Excerpts from the log file (debug level):
{code}
[Thu Dec 29 19:16:07 2011] [debug] mod_proxy_ajp.c(644): proxy: AJP: serving URL ajp://192.168.40.129:8109/demo/
...
[Thu Dec 29 19:16:07 2011] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 200
... Shutdown of first node
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2296): manager_handler DISABLE-APP (/) processing: "JVMRoute=cluster1&Alias=localhost&Context=%2Fdemo"
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2339): manager_handler DISABLE-APP OK
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(1653): manager_trans STOP-APP (/)
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2296): manager_handler STOP-APP (/) processing: "JVMRoute=cluster1&Alias=localhost&Context=%2Fdemo"
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2339): manager_handler STOP-APP OK
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(1653): manager_trans REMOVE-APP (/)
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2296): manager_handler REMOVE-APP (/) processing: "JVMRoute=cluster1&Alias=localhost&Context=%2Fdemo"
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2339): manager_handler REMOVE-APP OK
... Successfully removed route
[Thu Dec 29 19:16:08 2011] [debug] mod_proxy_ajp.c(644): proxy: AJP: serving URL ajp://192.168.123.231:8209/demo/
...
[Thu Dec 29 19:16:08 2011] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 200
... But then it uses ajp://192.168.40.129:8109/demo/ again and fails with 404
[Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(1937): proxy: BALANCER: retrying the worker for (192.168.40.129)
[Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(1943): proxy: BALANCER: worker for (192.168.40.129) has been marked for retry
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy.c(993): Running scheme balancer handler (attempt 0)
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_http.c(1930): proxy: HTTP: declining URL ajp://192.168.40.129:8109/demo/
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_scgi.c(508): [client 127.0.0.1] proxy: SCGI: declining URL ajp://192.168.40.129:8109/demo/
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(644): proxy: AJP: serving URL ajp://192.168.40.129:8109/demo/
[Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(1999): proxy: AJP: has acquired connection for (192.168.40.129)
[Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(2055): proxy: connecting ajp://192.168.40.129:8109/demo/ to 192.168.40.129:8109
[Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(2153): proxy: connected /demo/ to 192.168.40.129:8109
[Thu Dec 29 19:16:09 2011] [debug] ajp_utils.c(31): Into ajp_handle_cping_cpong
[Thu Dec 29 19:16:09 2011] [debug] ajp_utils.c(102): ajp_handle_cping_cpong: Done
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[0] [Accept-Encoding] = [identity]
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[1] [Host] = [devjava]
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[2] [Cookie] = [JSESSIONID=D6AcPj+99EObFqH+F9wjew**.cluster1]
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[3] [Connection] = [close]
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[4] [User-agent] = [Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)]
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(450): ajp_marshal_into_msgb: Done
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(265): proxy: APR_BUCKET_IS_EOS
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(270): proxy: data to read (max 8186 at 4)
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(285): proxy: got 0 bytes of data
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 04
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(697): ajp_parse_type: got 04
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 404
{code}
The cause seems to be in find_route_worker() in mod_proxy_cluster.c:
{code}
        /*
         * If the worker is in error state run
         * retry on that worker. It will be marked as
         * operational if the retry timeout is elapsed.
         * The worker might still be unusable, but we try
         * anyway.
         */
        ap_proxy_retry_worker("BALANCER", worker, r->server);
        if (PROXY_WORKER_IS_USABLE(worker)) {
            return worker;
        }
{code}
Here the worker for (192.168.40.129) "has been marked for retry" (see the log above), even though its node is going down, and the request fails with a 404. Why do we retry anyway? Can't we rely on mod_manager to determine which workers are usable?
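To make the problem concrete, here is a minimal self-contained model of the decision, not the real httpd API: all type and function names below are illustrative. The point is that once mod_manager has processed REMOVE-APP for a node, reviving its worker via a blind retry only produces 404s:

{code}
#include <stdio.h>
#include <stdbool.h>

/* Illustrative model of a balancer worker (not the real proxy_worker). */
typedef struct {
    const char *route;
    bool in_error_state;     /* proxy has marked the worker as failed   */
    bool removed_by_manager; /* REMOVE-APP already processed for node   */
} worker_t;

/* Returns true if the worker may serve the request. */
static bool worker_usable(const worker_t *w)
{
    if (w->removed_by_manager)
        return false; /* trust mod_manager: the node is gone */
    if (w->in_error_state) {
        /* The original code called ap_proxy_retry_worker() here,
         * which could resurrect a worker whose node was shutting
         * down; skipping the retry closes that 404 window. */
        return false;
    }
    return true;
}

int main(void)
{
    worker_t dying   = { "cluster1", true,  true  };
    worker_t healthy = { "cluster2", false, false };
    printf("cluster1 usable: %d\n", worker_usable(&dying));
    printf("cluster2 usable: %d\n", worker_usable(&healthy));
    return 0;
}
{code}

Printed for the two sample workers: "cluster1 usable: 0" and "cluster2 usable: 1", i.e. the dying node is never retried while the healthy one keeps serving.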
I removed this code (the first hunk is the patch from http://community.jboss.org/thread/176649):
{code}
===================================================================
--- mod_proxy_cluster.c (revision 663)
+++ mod_proxy_cluster.c (working copy)
@@ -1858,9 +1858,7 @@
#endif
if (node_storage->find_node(&ou, route) == APR_SUCCESS) {
if (!strcmp(balancer, ou->mess.balancer)) {
- if (ou->mess.Domain[0] != '\0') {
- *domain = ou->mess.Domain;
- }
+ *domain = ou->mess.Domain;
return APR_SUCCESS;
}
}
@@ -2208,14 +2206,6 @@
else
return NULL; /* application has been removed from the node */
} else {
- /*
- * If the worker is in error state run
- * retry on that worker. It will be marked as
- * operational if the retry timeout is elapsed.
- * The worker might still be unusable, but we try
- * anyway.
- */
- ap_proxy_retry_worker("BALANCER", worker, r->server);
if (PROXY_WORKER_IS_USABLE(worker)) {
return worker;
} else {
@@ -2232,16 +2222,6 @@
rworker = find_route_worker(r, balancer, worker->s->redirect,
vhost_table, context_table);
/* Check if the redirect worker is usable */
- if (rworker && !PROXY_WORKER_IS_USABLE(rworker)) {
- /*
- * If the worker is in error state run
- * retry on that worker. It will be marked as
- * operational if the retry timeout is elapsed.
- * The worker might still be unusable, but we try
- * anyway.
- */
- ap_proxy_retry_worker("BALANCER", rworker, r->server);
- }
if (rworker && PROXY_WORKER_IS_USABLE(rworker))
return rworker;
}
{code}
This seems to fix the problem in my setup.
Erhard