4 Replies Latest reply on Apr 2, 2010 8:47 PM by Neeraj Tickoo

    What can cause - All workers are in error state

    Adam Karl Novice

      [Wed Feb 24 06:54:47 2010] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state

       

      I'm currently running mod_cluster 1.0GA, but this is more of a general question.  I see the error message above from time to time.  Last night, for example, it affected all incoming connections for about 15 seconds.  What are the known causes of this?

      More detail: I have two app servers connecting via the HA AJP method.  During that same window I see STATUS messages succeeding from the master app server node, so connectivity should not have been an issue.  I also don't see any red flags on the app servers themselves.  There are no errors in their logs at that time, and their CPUs were not loaded (I use average load as my balancing metric).

        • 1. Re: What can cause - All workers are in error state
          Jean-Frederic Clere Master
          Any other error messages before those "All workers are in error state" messages?
          • 2. Re: What can cause - All workers are in error state
            Adam Karl Novice
            Nothing else in the error log.  There were 9 instances of the message above, corresponding to 9 HTTP calls made at that time, all of which failed with status code 503.  They were the only errors in the log.  In the access log I saw STATUS messages succeeding during this time period.
            • 3. Re: What can cause - All workers are in error state
              Matthias Hueller Novice

              I'm having the same issue with the JBoss CirrAS Images:


              [Wed Mar 24 15:36:23 2010] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
              [Wed Mar 24 15:36:23 2010] [error] ajp_handle_cping_cpong: ajp_ilink_receive failed
              [Wed Mar 24 15:36:23 2010] [error] (120006)APR does not understand this error code: proxy: AJP: cping/cpong failed to 10.192.178.166:8009 (10.192.178.166)
              [Wed Mar 24 15:36:23 2010] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
              [Wed Mar 24 15:36:23 2010] [error] ajp_handle_cping_cpong: ajp_ilink_receive failed
              [Wed Mar 24 15:36:23 2010] [error] (120006)APR does not understand this error code: proxy: AJP: cping/cpong failed to 10.192.178.166:8009 (10.192.178.166)
              [Wed Mar 24 15:36:23 2010] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
              [Wed Mar 24 15:36:23 2010] [error] ajp_handle_cping_cpong: ajp_ilink_receive failed
              [Wed Mar 24 15:36:23 2010] [error] (120006)APR does not understand this error code: proxy: AJP: cping/cpong failed to 10.215.18.195:8009 (10.215.18.195)
              [Wed Mar 24 15:36:23 2010] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state
              [Wed Mar 24 15:36:24 2010] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state
              [Wed Mar 24 15:36:24 2010] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state
              [Wed Mar 24 15:36:24 2010] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state
              [Wed Mar 24 15:36:24 2010] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state
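
              To see which backends the cping/cpong failures point at, the lines above can be grouped by worker address. This is only a minimal sketch, assuming the error_log format shown in this post; the names `summarize` and `SAMPLE` are made up for illustration:

```python
import re
from collections import Counter

# Matches the tail of lines such as:
#   ... proxy: AJP: cping/cpong failed to 10.192.178.166:8009 (10.192.178.166)
CPING_FAILED = re.compile(r"cping/cpong failed to (\S+):(\d+)")
ALL_WORKERS_DOWN = "All workers are in error state"


def summarize(lines):
    """Count cping/cpong failures per backend and balancer-wide errors."""
    failures = Counter()
    all_down = 0
    for line in lines:
        match = CPING_FAILED.search(line)
        if match:
            failures[f"{match.group(1)}:{match.group(2)}"] += 1
        elif ALL_WORKERS_DOWN in line:
            all_down += 1
    return failures, all_down


# A few lines copied from the log excerpt above.
SAMPLE = [
    "[Wed Mar 24 15:36:23 2010] [error] (120006)APR does not understand this error code: proxy: AJP: cping/cpong failed to 10.192.178.166:8009 (10.192.178.166)",
    "[Wed Mar 24 15:36:23 2010] [error] (120006)APR does not understand this error code: proxy: AJP: cping/cpong failed to 10.192.178.166:8009 (10.192.178.166)",
    "[Wed Mar 24 15:36:23 2010] [error] (120006)APR does not understand this error code: proxy: AJP: cping/cpong failed to 10.215.18.195:8009 (10.215.18.195)",
    "[Wed Mar 24 15:36:23 2010] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state",
    "[Wed Mar 24 15:36:24 2010] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state",
]

if __name__ == "__main__":
    failures, all_down = summarize(SAMPLE)
    for backend, count in failures.most_common():
        print(f"{backend}: {count} cping/cpong failure(s)")
    print(f"balancer-wide errors: {all_down}")
```

              In the excerpt, every worker logs a cping/cpong failure in the same second, right before the first "All workers are in error state" line.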

               

              Any ideas?

              • 4. Re: What can cause - All workers are in error state
                Neeraj Tickoo Newbie

                Hi,

                Did it cause your application to fail? I see the same error when httpd starts, but it stops after roughly 20-30 seconds.  It has never crashed my application, so I just ignore it.

                This is what I know about the above issue (courtesy of Paul Ferraro):

                 

                "This is probably due to the short period of time (i.e. 17 seconds) between when the CONFIG command is sent to the proxy (after the WebServer is started), and when the AJP connector (over which ajp_cping_cpong operates) is itself started.  In the AS, the connectors are the very last things to start - this is evident in the AS log.  So, for the time being - ignore these messages.  A workaround for this would be to defer all mod_cluster startup until the connectors are started."
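
                To observe that window from the outside, one could probe the AJP port with the same CPing/CPong exchange that the log messages refer to. This is only a rough sketch and not mod_cluster's own mechanism; the function names are hypothetical, and the host and port in the usage comment are simply the ones from the log above:

```python
import socket
import time

# AJP13 CPing request (web server -> container): magic 0x1234, length 1, type 10.
CPING = bytes([0x12, 0x34, 0x00, 0x01, 0x0A])
# AJP13 CPong reply (container -> web server): magic "AB", length 1, type 9.
CPONG = bytes([0x41, 0x42, 0x00, 0x01, 0x09])


def ajp_cping(host, port, timeout=2.0):
    """Return True if the AJP connector answers a CPing with a CPong."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(CPING)
            reply = b""
            # Read until the full 5-byte reply arrives or the peer closes.
            while len(reply) < len(CPONG):
                chunk = sock.recv(len(CPONG) - len(reply))
                if not chunk:
                    break
                reply += chunk
            return reply == CPONG
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


def wait_for_ajp(host, port, deadline_s=60.0, interval_s=1.0):
    """Poll until the connector answers CPing or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if ajp_cping(host, port):
            return True
        time.sleep(interval_s)
    return False


# Example (address taken from the log excerpt earlier in the thread):
# wait_for_ajp("10.192.178.166", 8009)
```

                If the probe only starts succeeding some seconds after the node registers with the proxy, that would be consistent with the startup-ordering explanation quoted above.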

                 

                Maybe the same reason is behind your error as well.

                 

                Hope this helps.

                 

                Neeraj