0 Replies Latest reply on Dec 29, 2011 3:38 PM by erhard

    Error 404 after sucessfull cluster switch

    erhard

      I have the setup from http://community.jboss.org/thread/176649 (that patch included) with two JBoss 4 as backend. the following happens:

      When I shutdown the JBoss with the active session, the second takes over sucessfully. But then sometimes I get a 404 for a single hit. Parts from the logfile (with debug):

       

       

      {code}

      [Thu Dec 29 19:16:07 2011] [debug] mod_proxy_ajp.c(644): proxy: AJP: serving URL ajp://192.168.40.129:8109/demo/

      ...

      [Thu Dec 29 19:16:07 2011] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 200

      ... Shutdown of first node

      [Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2296): manager_handler DISABLE-APP (/) processing: "JVMRoute=cluster1&Alias=localhost&Context=%2Fdemo"

      [Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2339): manager_handler DISABLE-APP  OK

      [Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(1653): manager_trans STOP-APP (/)

      [Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2296): manager_handler STOP-APP (/) processing: "JVMRoute=cluster1&Alias=localhost&Context=%2Fdemo"

      [Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2339): manager_handler STOP-APP  OK

      [Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(1653): manager_trans REMOVE-APP (/)

      [Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2296): manager_handler REMOVE-APP (/) processing: "JVMRoute=cluster1&Alias=localhost&Context=%2Fdemo"

      [Thu Dec 29 19:16:07 2011] [debug] mod_manager.c(2339): manager_handler REMOVE-APP  OK

      ... Successfully removed route

      [Thu Dec 29 19:16:08 2011] [debug] mod_proxy_ajp.c(644): proxy: AJP: serving URL ajp://192.168.123.231:8209/demo/

      ...

      [Thu Dec 29 19:16:08 2011] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 200

       

      ... But then it uses ajp://192.168.40.129:8109/demo/ again and fails with 404

       

      [Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(1937): proxy: BALANCER: retrying the worker for (192.168.40.129)

      [Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(1943): proxy: BALANCER: worker for (192.168.40.129) has been marked for retry

      [Thu Dec 29 19:16:09 2011] [debug] mod_proxy.c(993): Running scheme balancer handler (attempt 0)

      [Thu Dec 29 19:16:09 2011] [debug] mod_proxy_http.c(1930): proxy: HTTP: declining URL ajp://192.168.40.129:8109/demo/

      [Thu Dec 29 19:16:09 2011] [debug] mod_proxy_scgi.c(508): [client 127.0.0.1] proxy: SCGI: declining URL ajp://192.168.40.129:8109/demo/

      [Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(644): proxy: AJP: serving URL ajp://192.168.40.129:8109/demo/

      [Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(1999): proxy: AJP: has acquired connection for (192.168.40.129)

      [Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(2055): proxy: connecting ajp://192.168.40.129:8109/demo/ to 192.168.40.129:8109

      [Thu Dec 29 19:16:09 2011] [debug] proxy_util.c(2153): proxy: connected /demo/ to 192.168.40.129:8109

      [Thu Dec 29 19:16:09 2011] [debug] ajp_utils.c(31): Into ajp_handle_cping_cpong

      [Thu Dec 29 19:16:09 2011] [debug] ajp_utils.c(102): ajp_handle_cping_cpong: Done

      [Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb

      [Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[0] [Accept-Encoding] = [identity]

      [Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[1] [Host] = [devjava]

      [Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[2] [Cookie] = [JSESSIONID=D6AcPj+99EObFqH+F9wjew**.cluster1]

      [Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[3] [Connection] = [close]

      [Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[4] [User-agent] = [Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)]

      [Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(450): ajp_marshal_into_msgb: Done

      [Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(265): proxy: APR_BUCKET_IS_EOS

      [Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(270): proxy: data to read (max 8186 at 4)

      [Thu Dec 29 19:16:09 2011] [debug] mod_proxy_ajp.c(285): proxy: got 0 bytes of data

      [Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 04

      [Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(697): ajp_parse_type: got 04

      [Thu Dec 29 19:16:09 2011] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 404

      {code}

       

      The reason seems to be in mod_proxy_cluster.c,  find_route_worker():

       

      {code}

                          /*

                           * If the worker is in error state run

                           * retry on that worker. It will be marked as

                           * operational if the retry timeout is elapsed.

                           * The worker might still be unusable, but we try

                           * anyway.

                           */

                          ap_proxy_retry_worker("BALANCER", worker, r->server);

                          if (PROXY_WORKER_IS_USABLE(worker)) {

                                  return worker;

                          }

      {code}

      Here the "worker for (192.168.40.129) has been marked for retry" (see log) although the node is going down and we get a 404. Why do we try anyway? Can't we rely on mod_manager to determine the usable workers?

      I removed this code (the first change is from http://community.jboss.org/thread/176649):

       

      {code}

      ===================================================================

      --- mod_proxy_cluster.c (revision 663)

      +++ mod_proxy_cluster.c (working copy)

      @@ -1858,9 +1858,7 @@

      #endif

           if (node_storage->find_node(&ou, route) == APR_SUCCESS) {

               if (!strcmp(balancer, ou->mess.balancer)) {

      -            if (ou->mess.Domain[0] != '\0') {

      -                *domain = ou->mess.Domain;

      -            }

      +            *domain = ou->mess.Domain;

                   return APR_SUCCESS;

               }

           }

      @@ -2208,14 +2206,6 @@

                           else

                              return NULL; /* application has been removed from the node */

                       } else {

      -                    /*

      -                     * If the worker is in error state run

      -                     * retry on that worker. It will be marked as

      -                     * operational if the retry timeout is elapsed.

      -                     * The worker might still be unusable, but we try

      -                     * anyway.

      -                     */

      -                    ap_proxy_retry_worker("BALANCER", worker, r->server);

                           if (PROXY_WORKER_IS_USABLE(worker)) {

                                   return worker;

                           } else {

      @@ -2232,16 +2222,6 @@

                                   rworker = find_route_worker(r, balancer, worker->s->redirect,

                                          vhost_table, context_table);

                                   /* Check if the redirect worker is usable */

      -                            if (rworker && !PROXY_WORKER_IS_USABLE(rworker)) {

      -                                /*

      -                                 * If the worker is in error state run

      -                                 * retry on that worker. It will be marked as

      -                                 * operational if the retry timeout is elapsed.

      -                                 * The worker might still be unusable, but we try

      -                                 * anyway.

      -                                 */

      -                                ap_proxy_retry_worker("BALANCER", rworker, r->server);

      -                            }

                                   if (rworker && PROXY_WORKER_IS_USABLE(rworker))

                                       return rworker;

                               }

      {code}

       

      That seems to help for this problem in my setup.

       

      Erhard