
    mod_cluster gives error 503 sometimes

    tomcsanyid

      Hi Everyone,

       

      So here is my situation: I have a system running Apache 2.4 on Ubuntu 14.04 with a self-compiled version of mod_cluster built from the GitHub sources. I set this up about 3 weeks ago. We have two JBoss nodes with different contexts deployed on them: node1 runs /context1, while node2 runs /context2 and /context2admin.

      The problem we are facing is that we sometimes get 503 and 404 errors when browsing the sites. Sometimes the whole site just doesn't load, sometimes only the CSS is missing; to summarize: some requests simply fail. We use PersistSlots On, and for a while I thought the issue was related to that, because deleting all the cached node information from /etc/apache2/logs and restarting Apache fixed the issue for a few days, but later it returned.
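
      For reference, the mod_cluster-related part of my Apache configuration looks roughly like this (simplified; the LoadModule lines are omitted, and the listen port and allowed IP range are placeholders rather than the real values):

      # PersistSlots keeps node/context info in files under the logs directory
      # (these are the cached files I deleted in /etc/apache2/logs)
      PersistSlots On
      # "mycluster" is the default balancer name, as seen in the logs below
      ManagerBalancerName mycluster

      <VirtualHost *:6666>
          # accept MCMP registration messages from the JBoss nodes
          EnableMCPMReceive
          # advertise this proxy to the nodes via multicast
          ServerAdvertise On
          <Directory />
              Require ip 10.60
          </Directory>
      </VirtualHost>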

      I have been running Apache with debug logging to get some more information, and now I think the problem is related to having two nodes with different contexts mounted on them: it seems that mod_cluster is trying to route requests to the wrong node (at least that's how I interpret the first log line below, for example).

       

      [Sun Aug 10 12:07:59.913096 2014] [:debug] [pid 17186:tid 140677308421888] mod_proxy_cluster.c(2283): proxy: byrequests balancer FAILED
      [Sun Aug 10 12:07:59.913102 2014] [:error] [pid 17186:tid 140677308421888] proxy: CLUSTER: (balancer://mycluster). All workers are in error state
      [Sun Aug 10 12:07:59.913391 2014] [proxy_ajp:debug] [pid 17187:tid 140677316814592] mod_proxy_ajp.c(625): [client 192.168.0.129:40827] AH00892: got response from 10.60.2.80:8009 (10.60.2.80), referer: http://10.60.100.40/planet/public/login.xhtml
      [Sun Aug 10 12:07:59.913404 2014] [proxy:debug] [pid 17187:tid 140677316814592] proxy_util.c(2035): AH00943: AJP: has released connection for (10.60.2.80)
      [Sun Aug 10 12:07:59.914294 2014] [:debug] [pid 17186:tid 140677400741632] mod_proxy_cluster.c(2777): cluster: balancer://mycluster Found value 3-92sjh4DDfyxfcrHDCmU1J6.node01:context1-prod-01 for stickysession JSESSIONID|jsessionid
      [Sun Aug 10 12:07:59.914316 2014] [authz_core:debug] [pid 17186:tid 140677400741632] mod_authz_core.c(802): [client 192.168.0.129:40985] AH01626: authorization result of Require all granted: granted, referer: http://10.60.100.40/context1/public/login.xhtml
      [Sun Aug 10 12:07:59.914323 2014] [authz_core:debug] [pid 17186:tid 140677400741632] mod_authz_core.c(802): [client 192.168.0.129:40985] AH01626: authorization result of <RequireAny>: granted, referer: http://10.60.100.40/planet/public/login.xhtml
      [Sun Aug 10 12:07:59.914350 2014] [:debug] [pid 17186:tid 140677400741632] mod_proxy_cluster.c(2283): proxy: byrequests balancer FAILED
      [Sun Aug 10 12:07:59.914356 2014] [:error] [pid 17186:tid 140677400741632] proxy: CLUSTER: (balancer://mycluster). All workers are in error state
      [Sun Aug 10 12:07:59.920666 2014] [:debug] [pid 17186:tid 140677383956224] mod_proxy_cluster.c(2777): cluster: balancer://mycluster Found value 3-92sjh4DDfyxfcrHDCmU1J6.node01:context1-prod-01 for stickysession JSESSIONID|jsessionid
      [Sun Aug 10 12:07:59.920693 2014] [authz_core:debug] [pid 17186:tid 140677383956224] mod_authz_core.c(802): [client 192.168.0.129:40986] AH01626: authorization result of Require all granted: granted, referer: http://10.60.100.40/context1/public/login.xhtml
      [Sun Aug 10 12:07:59.920699 2014] [authz_core:debug] [pid 17186:tid 140677383956224] mod_authz_core.c(802): [client 192.168.0.129:40986] AH01626: authorization result of <RequireAny>: granted, referer: http://10.60.100.40/context1/public/login.xhtml
      [Sun Aug 10 12:07:59.920729 2014] [:debug] [pid 17186:tid 140677383956224] mod_proxy_cluster.c(2283): proxy: byrequests balancer FAILED
      [Sun Aug 10 12:07:59.920735 2014] [:error] [pid 17186:tid 140677383956224] proxy: CLUSTER: (balancer://mycluster). All workers are in error state
      [Sun Aug 10 12:08:00.614594 2014] [:debug] [pid 17187:tid 140677232887552] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.80:8009
      [Sun Aug 10 12:08:00.631046 2014] [:debug] [pid 17186:tid 140677333600000] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.90:8009
      [Sun Aug 10 12:08:10.617085 2014] [:debug] [pid 17187:tid 140677325207296] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.80:8009
      [Sun Aug 10 12:08:10.633239 2014] [:debug] [pid 17186:tid 140677367170816] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.90:8009
      [Sun Aug 10 12:08:20.619930 2014] [:debug] [pid 17187:tid 140677400741632] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.80:8009
      [Sun Aug 10 12:08:20.635693 2014] [:debug] [pid 17186:tid 140677274851072] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.90:8009
      [Sun Aug 10 12:08:30.622450 2014] [:debug] [pid 17187:tid 140677224494848] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.80:8009
      [Sun Aug 10 12:08:30.639547 2014] [:debug] [pid 17186:tid 140677350385408] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.90:8009
      [Sun Aug 10 12:08:40.625373 2014] [:debug] [pid 17187:tid 140677300029184] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.80:8009
      [Sun Aug 10 12:08:40.642499 2014] [:debug] [pid 17186:tid 140677266458368] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.90:8009
      [Sun Aug 10 12:08:50.629604 2014] [:debug] [pid 17187:tid 140677358778112] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.80:8009
      [Sun Aug 10 12:08:50.644822 2014] [:debug] [pid 17186:tid 140677291636480] mod_proxy_cluster.c(314): Created: reusing worker for ajp://10.60.2.90:8009
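
      For completeness: when the 503s show up, the balancer's view of the nodes and contexts can also be checked on the mod_cluster status page, which can be enabled with something like this (the Location path and the allowed IP range are just examples):

      <Location /mod_cluster-manager>
          SetHandler mod_cluster-manager
          # restrict the status page to the internal network (example range)
          Require ip 10.60
      </Location>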
      
      

       

      We are not using any ProxyPass directives, just mod_cluster's default automatic mounting of contexts.

      It is a really bad kind of error, since it can't be easily reproduced; it just turns up after some time: hours, maybe even days or weeks. As far as I can see there is nothing wrong with the JBoss nodes themselves, since the same system with simple ProxyPass directives (without mod_cluster load balancing) works without any problems.
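
      That fallback looks roughly like this (no mod_cluster involved; assuming, as the logs above suggest, that node1 is 10.60.2.80 and node2 is 10.60.2.90):

      # plain AJP reverse proxying, one line per context; this has been stable for us
      ProxyPass /context1      ajp://10.60.2.80:8009/context1
      ProxyPass /context2      ajp://10.60.2.90:8009/context2
      ProxyPass /context2admin ajp://10.60.2.90:8009/context2admin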

       

      I was thinking of creating two multicast groups and then using Apache virtual hosts to separate the nodes with different contexts on them, but as far as I can see mod_cluster can't use two different multicast groups, so the only option left is to run two Apache instances; that, however, makes it impossible to reach all three contexts through the same IP address.
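
      Just to illustrate, the two-instance setup I had in mind would look something like this on the Apache side (the ports and multicast addresses are made-up examples; each node's "modcluster" socket binding on the JBoss side would have to listen on the matching group):

      # Apache instance #1: only node1 (/context1) joins this balancer
      <VirtualHost *:6666>
          EnableMCPMReceive
          ServerAdvertise On
          # node1's "modcluster" socket binding listens on this multicast group
          AdvertiseGroup 224.0.1.105:23364
      </VirtualHost>

      # Apache instance #2 would be the same except for its own ports and e.g.
      # AdvertiseGroup 224.0.1.106:23364, which only node2 (/context2, /context2admin) uses.
      # Since the two instances cannot share port 80 on one IP, the three contexts
      # end up behind two different front-end addresses.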

       

      Thank you!
      Domonkos