4 Replies Latest reply on May 16, 2013 8:16 AM by rhusar

    503 with a single node

    nichele

      Hi all,

      I'm running some stress tests and I'm seeing strange behaviour.

       

      The main components of my system are:

      tomcat 7 + java 1.7 + mod_cluster 1.2.0 + httpd 2.2.24 

       

      At the moment I have both Tomcat and httpd running on a single node. My stress client runs on a different host.

       

      My client simulates "just" 15 concurrent users.

      The issue I'm having is that after a while (usually a few minutes) some requests fail with a 503. I say "some requests" because I'm talking about 5 to 8 failures per 500k requests. The strange thing is that the system keeps working apart from those failing requests.

       

      In the httpd log I see:

       

      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(698): ajp_read_header: ajp_ilink_received 04
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(708): ajp_parse_type: got 04
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(527): ajp_unmarshal_response: status = 200
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(548): ajp_unmarshal_response: Number of headers is = 5
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(610): ajp_unmarshal_response: Header[0] [X-aaaa] = [aaaaaa]
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(610): ajp_unmarshal_response: Header[1] [Set-Cookie] = []
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(610): ajp_unmarshal_response: Header[2] [Accept-Encoding] = [gzip,deflate]
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(610): ajp_unmarshal_response: Header[3] [Content-Type] = [application/vnd.syncml+xml]
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(620): ajp_unmarshal_response: ap_set_content_type done
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(610): ajp_unmarshal_response: Header[4] [Content-Length] = [628]
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(698): ajp_read_header: ajp_ilink_received 03
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(708): ajp_parse_type: got 03
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(698): ajp_read_header: ajp_ilink_received 03
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(708): ajp_parse_type: got 03
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(698): ajp_read_header: ajp_ilink_received 05
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(708): ajp_parse_type: got 05
      [Wed Apr 10 07:20:28 2013] [debug] mod_proxy_ajp.c(625): proxy: got response from 10.32.20.35:8009 (10.32.20.35)
      [Wed Apr 10 07:20:28 2013] [debug] proxy_util.c(2031): proxy: AJP: has released connection for (10.32.20.35)
      [Wed Apr 10 07:20:28 2013] [debug] mod_proxy_cluster.c(1543): proxy: byrequests balancer FAILED
      [Wed Apr 10 07:20:28 2013] [debug] mod_proxy_cluster.c(1543): proxy: byrequests balancer FAILED
      [Wed Apr 10 07:20:28 2013] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state
      [Wed Apr 10 07:20:28 2013] [debug] mod_proxy_cluster.c(1543): proxy: byrequests balancer FAILED
      [Wed Apr 10 07:20:28 2013] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(698): ajp_read_header: ajp_ilink_received 04
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(708): ajp_parse_type: got 04
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(527): ajp_unmarshal_response: status = 200
      [Wed Apr 10 07:20:28 2013] [debug] ajp_header.c(548): ajp_unmarshal_response: Number of headers is = 5

      ...

      ....
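For anyone who wants to quantify how often this happens over a long run, here is a minimal sketch that counts the "All workers are in error state" lines per timestamp. It assumes the httpd 2.2 error-log layout shown in the excerpt above; the function name and the sample lines are only for illustration.

```python
import re
from collections import Counter

# Matches "[timestamp] [level] message", the layout of the excerpt above.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")
ERROR_MARKER = "All workers are in error state"


def count_error_states(lines):
    """Return a Counter mapping timestamp -> number of matching error lines."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a log line in the expected layout (e.g. "...")
        if match.group("level") == "error" and ERROR_MARKER in match.group("msg"):
            hits[match.group("ts")] += 1
    return hits


# Three lines copied from the excerpt above, used as sample input.
sample = [
    "[Wed Apr 10 07:20:28 2013] [debug] mod_proxy_cluster.c(1543): proxy: byrequests balancer FAILED",
    "[Wed Apr 10 07:20:28 2013] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state",
    "[Wed Apr 10 07:20:28 2013] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state",
]

for timestamp, count in count_error_states(sample).items():
    print(timestamp, count)
```

To run it against a real log, pass an open file object instead of the sample list; the path of the error log depends on the httpd installation.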

       

      In my Tomcat server.xml I have:

       

        <Listener className="org.jboss.modcluster.container.catalina.standalone.ModClusterListener"
                  advertise="false"
                  proxyList="perf-ds-01:6666"
                  maxAttempts="3"
                  nodeTimeout="600"
                  workerTimeout="60"
                  ping="60"
                  stickySession="true"
                  stickySessionRemove="false"
                  stickySessionForce="false"
                  loadMetricClass="org.jboss.modcluster.load.metric.impl.AverageSystemLoadMetric"
                  loadMetricCapacity="5"
        />

       

       

      Do you have any idea what the reason might be?

       

      Thanks a lot in advance,

      ste

        • 1. Re: 503 with a single node
          nichele

          Hi,

          good news..

          I think there are two bugs here:

          1. (minor) when the load of a node is 0, mod_cluster-manager shows the latest value (if I remember correctly this is already known, or am I wrong?)

          2. (major) when the load of a node is 0, some requests (at least in a single-node environment) can fail

           

          Workaround for me: increasing loadMetricCapacity in order to avoid Load=0.
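To illustrate why a larger loadMetricCapacity keeps the node away from Load=0, here is a rough model of the calculation. It is an assumption on my part, not mod_cluster's actual code: the raw metric reading is divided by the capacity, clamped to 0..1, and inverted into a 0..100 factor. The real provider also applies metric weights and a load history, which this sketch ignores. The function name and the `floor` parameter are hypothetical.

```python
def load_factor(metric_load, capacity, floor=0):
    """Simplified model: map a raw metric reading to a 0..100 load factor.

    metric_load: raw reading, e.g. the system load average
    capacity:    the value configured as loadMetricCapacity
    floor:       lowest factor the model may return (hypothetical knob)
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Normalize to 0..1; anything at or above capacity counts as fully loaded.
    normalized = min(max(metric_load / capacity, 0.0), 1.0)
    # Invert: an idle node gets 100, a saturated node gets 0 (before the floor).
    factor = 100 - round(100 * normalized)
    return max(floor, factor)


# A load average of 6.0 with capacity 5 saturates the metric in this model.
print(load_factor(6.0, 5))
# The same reading with capacity 20 leaves headroom.
print(load_factor(6.0, 20))
# With a floor of 1 the saturated node never reports 0.
print(load_factor(6.0, 5, floor=1))
```

In this model, any reading at or above the configured capacity collapses to 0 unless a floor is applied, which would match what I see with loadMetricCapacity="5".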

           

          Let me know if this is reasonable and if I need to file bugs for both things.

           

          cheers,

          ste

          • 2. Re: 503 with a single node
            mbabacek

            Hi Stefano, having one worker node is...let's say: "an edge case" :-) I wonder why you are experiencing Load 0? Essentially, Load 0 means an invalid state: the node appears to the balancer as being desperately overloaded, and in any sane setup the balancer would send further requests to a different node.

             

            Conclusion: If your single worker node is loaded so heavily that it reports Load=0, getting an error is expected, in my opinion.

             

            BTW: I hope it is clear that Load=0 actually means "OMG full load" and Load=100 means "No load at all"...

            • 3. Re: 503 with a single node
              nichele

              Hi Michal,

              I tend to agree with you, but the strange thing is that only some requests fail. Following your reasoning, all requests should fail until the load goes back to a non-zero value.

               

              About Load=0, yes, I know that it means full load :-)

               

              thx

              ste

              • 4. Re: 503 with a single node
                rhusar

                Michal Babacek wrote:

                 

                BTW: I hope it is clear that Load=0 actually means "OMG full load" and Load=100 means "No load at all"...

                Watch out! This is not correct.

                 

                A load of 0 means ERROR state, and the node will be put into the error state. That is exactly what you are seeing:

                [Wed Apr 10 07:20:28 2013] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state

                If your system has a load of over 4 and you have 4 threads, the load will be 1, meaning full load, but the server will be in OK state and requests will still be sent to it if it is the only one in this group.

                 

                E.g.:

                 

                Balancer: mycluster,LBGroup: ,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 26,Ttl: 60000000,Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: 1
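The line above is a flat list of comma-separated "Key: value" pairs, so Load and Status are easy to check from a script. A minimal sketch, assuming the format is exactly as shown (the function name is made up, and values that themselves contain commas are not handled):

```python
def parse_status_line(line):
    """Split a 'Key: value,Key: value' status line into a dict of strings."""
    fields = {}
    for part in line.split(","):
        key, sep, value = part.partition(":")
        if sep:  # skip fragments that have no "Key:" prefix
            fields[key.strip()] = value.strip()
    return fields


# The status line quoted above, used verbatim as input.
line = (
    "Balancer: mycluster,LBGroup: ,Flushpackets: Off,Flushwait: 10000,"
    "Ping: 10000000,Smax: 26,Ttl: 60000000,Status: OK,Elected: 0,Read: 0,"
    "Transferred: 0,Connected: 0,Load: 1"
)

status = parse_status_line(line)
print(status["Status"], status["Load"])
```

All values stay strings, so convert Load with int() before comparing it against 0.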

                 

                If a metric is returning 0 as the actual representation of a load, it might be an old bug. If you keep seeing it with later versions, please report an issue.