0 Replies Latest reply on Dec 29, 2011 3:38 PM by erhard

Error 404 after sucessfull cluster switch

erhard Dec 29, 2011 3:38 PM

I have the setup from http://community.jboss.org/thread/176649 (that patch included) with two JBoss 4 as backend. the following happens:

When I shutdown the JBoss with the active session, the second takes over sucessfully. But then sometimes I get a 404 for a single hit. Parts from the logfile (with debug):

{code}
[Thu Dec 29 19:16:07 2011] [debug] mod_proxy_ajp.c(644): proxy: AJP: serving URL ajp://192.168.40.129:8109/demo/
...
[Thu Dec 29 19:16:07 2011] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 200
... Shutdown of first node
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2296): manager_handler DISABLE-APP (/) processing: "JVMRoute=cluster1&Alias=localhost&Context=%2Fdemo"
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2339): manager_handler DISABLE-APP  OK
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(1653): manager_trans STOP-APP (/)
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2296): manager_handler STOP-APP (/) processing: "JVMRoute=cluster1&Alias=localhost&Context=%2Fdemo"
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2339): manager_handler STOP-APP  OK
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(1653): manager_trans REMOVE-APP (/)
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2296): manager_handler REMOVE-APP (/) processing: "JVMRoute=cluster1&Alias=localhost&Context=%2Fdemo"
[Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2339): manager_handler REMOVE-APP  OK
... Successfully removed route
[Thu Dec 29 19:16:08 2011] [debug] mod_proxy_ajp.c(644): proxy: AJP: serving URL ajp://192.168.123.231:8209/demo/
...
[Thu Dec 29 19:16:08 2011] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 200
 
... But then it uses ajp://192.168.40.129:8109/demo/ again and fails with 404
 
[Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(1937): proxy: BALANCER: retrying the worker for (192.168.40.129)
[Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(1943): proxy: BALANCER: worker for (192.168.40.129) has been marked for retry
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy.c(993): Running scheme balancer handler (attempt 0)
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_http.c(1930): proxy: HTTP: declining URL ajp://192.168.40.129:8109/demo/
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_scgi.c(508): [client 127.0.0.1] proxy: SCGI: declining URL ajp://192.168.40.129:8109/demo/
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(644): proxy: AJP: serving URL ajp://192.168.40.129:8109/demo/
[Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(1999): proxy: AJP: has acquired connection for (192.168.40.129)
[Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(2055): proxy: connecting ajp://192.168.40.129:8109/demo/ to 192.168.40.129:8109
[Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(2153): proxy: connected /demo/ to 192.168.40.129:8109
[Thu Dec 29 19:16:09 2011] [debug] ajp_utils.c(31): Into ajp_handle_cping_cpong
[Thu Dec 29 19:16:09 2011] [debug] ajp_utils.c(102): ajp_handle_cping_cpong: Done
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[0] [Accept-Encoding] = [identity]
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[1] [Host] = [devjava]
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[2] [Cookie] = [JSESSIONID=D6AcPj+99EObFqH+F9wjew**.cluster1]
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[3] [Connection] = [close]
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[4] [User-agent] = [Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)]
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(450): ajp_marshal_into_msgb: Done
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(265): proxy: APR_BUCKET_IS_EOS
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(270): proxy: data to read (max 8186 at 4)
[Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(285): proxy: got 0 bytes of data
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 04
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(697): ajp_parse_type: got 04
[Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 404
{code}

The reason seems to be in mod_proxy_cluster.c, find_route_worker():

{code}
                    /*
                     * If the worker is in error state run
                     * retry on that worker. It will be marked as
                     * operational if the retry timeout is elapsed.
                     * The worker might still be unusable, but we try
                     * anyway.
                     */
                    ap_proxy_retry_worker("BALANCER", worker, r->server);
                    if (PROXY_WORKER_IS_USABLE(worker)) {
                            return worker;
                    } 
{code}

Here the "worker for (192.168.40.129) has been marked for retry" (see log) although the node is going down and we get a 404. Why do we try anyway? Can't we rely on mod_manager to determine the usable workers?

I removed this code (the first change is from http://community.jboss.org/thread/176649):

{code}
===================================================================
--- mod_proxy_cluster.c (revision 663)
+++ mod_proxy_cluster.c (working copy)
@@ -1858,9 +1858,7 @@
 #endif
     if (node_storage->find_node(&ou, route) == APR_SUCCESS) {
         if (!strcmp(balancer, ou->mess.balancer)) {
-            if (ou->mess.Domain[0] != '\0') {
-                *domain = ou->mess.Domain;
-            }
+            *domain = ou->mess.Domain;
             return APR_SUCCESS;
         }
     }
@@ -2208,14 +2206,6 @@
                     else
                        return NULL; /* application has been removed from the node */
                 } else {
-                    /*
-                     * If the worker is in error state run
-                     * retry on that worker. It will be marked as
-                     * operational if the retry timeout is elapsed.
-                     * The worker might still be unusable, but we try
-                     * anyway.
-                     */
-                    ap_proxy_retry_worker("BALANCER", worker, r->server);
                     if (PROXY_WORKER_IS_USABLE(worker)) {
                             return worker;
                     } else {
@@ -2232,16 +2222,6 @@
                             rworker = find_route_worker(r, balancer, worker->s->redirect,
                                    vhost_table, context_table);
                             /* Check if the redirect worker is usable */
-                            if (rworker && !PROXY_WORKER_IS_USABLE(rworker)) {
-                                /*
-                                 * If the worker is in error state run
-                                 * retry on that worker. It will be marked as
-                                 * operational if the retry timeout is elapsed.
-                                 * The worker might still be unusable, but we try
-                                 * anyway.
-                                 */
-                                ap_proxy_retry_worker("BALANCER", rworker, r->server);
-                            }
                             if (rworker && PROXY_WORKER_IS_USABLE(rworker))
                                 return rworker;
                         }
{code}

That seems to help for this problem in my setup.

Erhard