16 Replies. Latest reply on Feb 20, 2012 10:37 AM by erhard

    Error 503 for several seconds until session failover

    erhard

      Hi,

       

      I tried a simple demo application and democlient (see attachments) that opens a session and increments a session variable on each hit. The democlient hits the server once every second. I have an Apache httpd with mod_cluster and two JBoss servers. When I shut down the active JBoss, I get Error 503 for about 10 seconds, then the other server gets the requests and continues the session:

       

      $ ./democlient.py http://devjava/demo7/

      0 unknown

      1 ef730190

      2 ef730190

      3 ef730190

      4 ef730190

      ...

      28 ef730190

      29 ef730190

      30 ef730190 <--- Now I shut down the active server

      Failed to open "http://devjava/demo7/". Error code - 503.

      Broken between 30 and 0 for 0 seconds.

      Failed to open "http://devjava/demo7/". Error code - 503.

      Broken between 0 and 0 for 1 seconds.

      Failed to open "http://devjava/demo7/". Error code - 503.

      Broken between 0 and 0 for 2 seconds.

      Failed to open "http://devjava/demo7/". Error code - 503.

      Broken between 0 and 0 for 3 seconds.

      Failed to open "http://devjava/demo7/". Error code - 503.

      Broken between 0 and 0 for 4 seconds.

      Failed to open "http://devjava/demo7/". Error code - 503.

      Broken between 0 and 0 for 5 seconds.

      Failed to open "http://devjava/demo7/". Error code - 503.

      Broken between 0 and 0 for 6 seconds.

      Failed to open "http://devjava/demo7/". Error code - 503.

      Broken between 0 and 0 for 7 seconds.

      Failed to open "http://devjava/demo7/". Error code - 503.

      Broken between 0 and 0 for 8 seconds.

      Failed to open "http://devjava/demo7/". Error code - 503.

      Broken between 0 and 0 for 9 seconds.

      Broken between 0 and 31 for 10 seconds. <-- After about 10 seconds the second server continues with the session

      32 dc1ca871

      33 dc1ca871

      34 dc1ca871

      35 dc1ca871

      ...
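The attached democlient is not reproduced here. As a rough sketch of the kind of polling loop described above, assuming Python 3 and only the standard library, the following hits a URL at a fixed interval, keeps the session cookie, and records how long each run of non-200 responses lasts. The names `make_fetch` and `poll` are illustrative and are not taken from the attachment.

```python
import time
import urllib.error
import urllib.request
import http.cookiejar


def make_fetch(url, timeout=5):
    """Return a fetch() callable that keeps the session cookie between calls."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    def fetch():
        try:
            with opener.open(url, timeout=timeout) as resp:
                return resp.status, resp.read().decode("utf-8", "replace").strip()
        except urllib.error.HTTPError as err:
            # The proxy answered with an error status such as 503 or 404.
            return err.code, ""
        except OSError:
            # Connection refused, timeout, DNS failure and similar.
            return None, ""

    return fetch


def poll(fetch, hits, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch() `hits` times and return the duration of each outage.

    An outage starts at the first non-200 response and ends at the next 200.
    """
    outages = []
    broken_since = None
    for _ in range(hits):
        status, _body = fetch()
        now = clock()
        if status == 200:
            if broken_since is not None:
                outages.append(now - broken_since)
                broken_since = None
        elif broken_since is None:
            broken_since = now
        sleep(interval)
    if broken_since is not None:
        # Still broken when polling stopped.
        outages.append(clock() - broken_since)
    return outages
```

For example, `poll(make_fetch("http://devjava/demo7/"), hits=60)` would return a list of outage durations in seconds. Passing a fake `fetch`, `sleep` and `clock` lets the loop be checked without a running server.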

       

      Especially when I shut down JBoss gracefully I would have expected that a failover occurs without errors.

      Do I expect too much?

      Is it a configuration error or a bug?

      Anybody got this running without errors with mod_cluster?

       

      I tried this with JBoss 4.2.3, the latest JBoss 7 Snapshot (with AJP), mod_cluster 1.0.0, 1.1.3 and the latest 1.1.4-SNAPSHOT, with the same results. The Apache configuration is the one from the mod_cluster download.

      The problem might be related to

      http://community.jboss.org/message/625242

      http://community.jboss.org/message/643133

       

      Greetings

      Erhard

        • 1. Re: Error 503 for several seconds until session failover
          erhard

          It seems that the problem is in mod_proxy_cluster.c:

           

          {code}

                  if (domain == NULL) {

                      /*

                       * We have a route provided that doesn't match the

                       * balancer name. See if the provider route is the

                       * member of the same balancer in which case return 503

                       */

                      ap_log_error(APLOG_MARK, APLOG_ERR, 0, r->server,

                                   "proxy: CLUSTER: (%s). All workers are in error state for route (%s)",

                                   (*balancer)->name, route);

              ...

          {code}

           

          I don't use domain-mode and domain is only set when ou->mess.Domain[0] != '\0'. Clustering should be independent from domain-mode. The following helps with this problem:

           

          {code}

          Index: mod_proxy_cluster.c

          ===================================================================

          --- mod_proxy_cluster.c          (revision 663)

          +++ mod_proxy_cluster.c          (working copy)

          @@ -1858,9 +1858,7 @@

          #endif

               if (node_storage->find_node(&ou, route) == APR_SUCCESS) {

                   if (!strcmp(balancer, ou->mess.balancer)) {

          -            if (ou->mess.Domain[0] != '\0') {

          -                *domain = ou->mess.Domain;

          -            }

          +            *domain = ou->mess.Domain;

                       return APR_SUCCESS;

                   }

               }

          {code}

           

          Greetings

          Erhard

          • 2. Re: Error 503 for several seconds until session failover
            jfclere

            Hm it seems you are using stickySessionForce = true, aren't you?

            • 3. Re: Error 503 for several seconds until session failover
              erhard

              Yes, I use the defaults.

              ssl
              advertise=true
              advertise-socket=modcluster
              auto-enable-contexts=true
              balancer=mycluster
              excluded-contexts=ROOT,admin-console,invoker,jbossws,jmx-console,juddi,web-console
              flush-packets=false
              flush-wait=-1
              max-attemps=1
              node-timeout=-1
              ping=10
              proxy-list=/
              socket-timeout=20
              sticky-session=1
              sticky-session-force=true
              sticky-session-remove=false
              stop-context-timeout=10
              ttl=60
              worker-timeout=-1

              • 4. Re: Error 503 for several seconds until session failover
                jfclere

                try with stickySessionForce = false

                • 5. Re: Error 503 for several seconds until session failover
                  erhard

                  No noticeable difference. In the logfile:

                  [Mon Jan 02 11:38:09 2012] [error] proxy: CLUSTER: (balancer://mycluster). All workers are in error state for route (2b727b8c-faf0-37d7-9cab-e2af94cd7bea)

                   

                   

                  ls subsystem=modcluster/mod-cluster-config=configuration   

                  ssl
                  advertise=true
                  advertise-socket=modcluster
                  auto-enable-contexts=true
                  balancer=mycluster
                  excluded-contexts=ROOT,admin-console,invoker,jbossws,jmx-console,juddi,web-console
                  flush-packets=false
                  flush-wait=-1
                  max-attemps=1
                  node-timeout=-1
                  ping=10
                  proxy-list=/
                  socket-timeout=20
                  sticky-session=1
                  sticky-session-force=false
                  sticky-session-remove=false
                  stop-context-timeout=10
                  ttl=60
                  worker-timeout=-1

                  • 6. Re: Error 503 for several seconds until session failover
                    jfclere

                    Please try with the original mod_cluster code and HAVE_CLUSTER_EX_DEBUG 1 (mod_proxy_cluster/mod_proxy_cluster.c). I am not able to reproduce the problem.

                    • 7. Re: Error 503 for several seconds until session failover
                      erhard

                      Attached the error log with HAVE_CLUSTER_EX_DEBUG 1. The requests with the democlient look like this:

                       

                      ./democlient.py http://devjava/demo/

                      0 unknown

                      Switch to node cluster1

                      1 cluster1

                      2 cluster1

                      3 cluster1

                      4 cluster1

                      5 cluster1

                      6 cluster1

                      7 cluster1

                      8 cluster1

                      9 cluster1

                      10 cluster1

                      11 cluster1

                      12 cluster1

                      Failed to open "http://devjava/demo/". Error code - 503.

                      Broken between 12 and 0 for 0 seconds.

                      Failed to open "http://devjava/demo/". Error code - 404.

                      Broken between 0 and 0 for 1 seconds.

                      Failed to open "http://devjava/demo/". Error code - 503.

                      Broken between 0 and 0 for 2 seconds.

                      Failed to open "http://devjava/demo/". Error code - 503.

                      Broken between 0 and 0 for 3 seconds.

                      Failed to open "http://devjava/demo/". Error code - 503.

                      Broken between 0 and 0 for 4 seconds.

                      Failed to open "http://devjava/demo/". Error code - 503.

                      Broken between 0 and 0 for 5 seconds.

                      Failed to open "http://devjava/demo/". Error code - 503.

                      Broken between 0 and 0 for 6 seconds.

                      Failed to open "http://devjava/demo/". Error code - 503.

                      Broken between 0 and 0 for 7 seconds.

                      Failed to open "http://devjava/demo/". Error code - 503.

                      Broken between 0 and 0 for 8 seconds.

                      Failed to open "http://devjava/demo/". Error code - 503.

                      Broken between 0 and 0 for 9 seconds.

                      Failed to open "http://devjava/demo/". Error code - 503.

                      Broken between 0 and 0 for 10 seconds.

                      Broken between 0 and 13 for 11 seconds.

                      Switch to node cluster2

                      14 cluster2

                      15 cluster2

                      16 cluster2

                      17 cluster2

                       

                       

                      These tests were done with JBoss 4 because something strange happened: when I tried with JBoss 7, I suddenly couldn't reproduce the problem anymore. After some testing with JBoss 4 and JBoss 7, it looks like a clean restart of Apache and the JBoss 4 application leads to the error, and a clean restart of Apache and JBoss 7 is OK, but stopping all JBoss 4 instances and starting JBoss 7 instances without restarting Apache leads to the error. (It seems that starting JBoss 4 after stopping JBoss 7 without an Apache restart doesn't lead to the error, but it's too late right now to confirm this for sure.) In other words:

                      Stop Apache and all JBoss instances

                      start Apache

                      start JBoss4-1

                      start JBoss4-2

                      start democlient

                      stop JBoss4-1 -> Error 503

                      stop democlient

                      stop JBoss4-2

                      start JBoss7-1

                      start JBoss7-2

                      start democlient

                      stop JBoss7-1 -> Error 503

                      stop democlient

                      stop JBoss7-2

                      restart Apache

                      start JBoss7-1

                      start JBoss7-2

                      start democlient

                      stop JBoss7-1 -> No Error!

                       

                      A colleague of mine reproduced the error today solely with JBoss 7; I will investigate the details tomorrow. The error also occurred consistently with jboss-as-7.1.0.CR1-SNAPSHOT; in the meantime I upgraded to jboss-as-7.1.0.Final-SNAPSHOT (because of another bug). Maybe this fixed the problem with JBoss 7 (ou->mess.Domain[0] != '\0' ???). I also installed the new mod_cluster.jar in JBoss 4, but it didn't help either.

                       

                      If JBoss 4 is not supposed to work with mod_cluster 1.1.4, it's not too much of a problem, since I have a workaround with my patch. Otherwise I would be happy to help with more information if necessary.

                       

                      Erhard

                       

                      Message edited by Erhard Siegl

                      • 8. Re: Error 503 for several seconds until session failover
                        erhard

                        Apparently it was too late yesterday to think straight. The reason it suddenly worked was the restart of Apache after setting sticky-session-force=false, while in JBoss 4 I still had sticky-session-force=true. So it seems that after changing sticky-session-force one has to stop all servers and restart Apache.

                        Since sticky-session-force=true is the default, what are the plans? Change the defaults, make it work or change the documentation?

                        Do you still need a logfile with HAVE_CLUSTER_EX_DEBUG 1?

                         

                        Erhard

                        • 9. Re: Error 503 for several seconds until session failover
                          jfclere

                          The documentation says that sticky-session-force=true is the default, so it is correct. If you think the default should be false, open a JIRA.

                          I don't need a logfile with HAVE_CLUSTER_EX_DEBUG 1 if it works.

                          Anyway, the 404 you have is weird; it may need some investigation. Did it occur with AS7?

                          • 10. Re: Error 503 for several seconds until session failover
                            erhard

                            The 404 occurred with JBoss 4. (At first glimpse it looks like http://community.jboss.org/message/643850 but you said this came from my patch.)

                             

                            I think the defaults should work, and I think getting a 503 is not OK. It took me a couple of days and your (much appreciated) help to get a simple demo running (I still have to fix it for JBoss 4). I think mod_cluster and AS 7 are great, but I have run into about 5 problems (I still have open issues) since I started to play around with them. I want my customers to use mod_cluster, but in order to recommend it, it has to work out of the box. That's why I try to help with these issues.

                            I think the 503 with sticky-session-force=true should be fixed. It seems to be the same problem as in https://issues.jboss.org/browse/MODCLUSTER-257, which is still unresolved.

                             

                            I hope this doesn't sound like a rant; it's not meant to be negative.

                             

                            Erhard

                            • 11. Re: Error 503 for several seconds until session failover
                              jfclere

                              According to the latest trace you provided, the 404 is triggered by a bug in the remove / update logic. I will create a JIRA for it.

                               

                              MODCLUSTER-257 is probably several small bugs and misconfiguration (read bad defaults too).

                               

                              The 503 with sticky-session-force=true won't be fixed for the moment (most of them are expected).

                              • 12. Re: Error 503 for several seconds until session failover
                                erhard

                                I got AS4 and AS7 both working with stickySessionForce = false. Thank you. I didn't understand stickySessionForce properly; making it the default is questionable.
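To make the difference concrete, here is a toy Python model of the routing decision, assuming the behaviour the mod_cluster documentation describes for stickySessionForce: with true, a request whose sticky node is unavailable gets an error; with false, it may be routed to another node. This is only an illustration under that assumption, not mod_cluster's code, and `route_request` is a made-up name. It also ignores the eventual failover after about 10 seconds seen in the output earlier in this thread.

```python
def route_request(sticky_node, available, sticky_session_force):
    """Return (http_status, node) for a request that carries a session route.

    sticky_node: the node the session is stuck to
    available: list of nodes currently able to take requests
    sticky_session_force: hypothetical stand-in for the stickySessionForce setting
    """
    if sticky_node in available:
        # Normal case: the sticky node is up, so it serves the request.
        return 200, sticky_node
    if sticky_session_force:
        # Forced stickiness: no other node may take the session.
        return 503, None
    if available:
        # Stickiness is only a preference: fail over to another node.
        return 200, available[0]
    # No node left at all.
    return 503, None
```

With `cluster1` stopped and `cluster2` still running, `route_request("cluster1", ["cluster2"], True)` yields a 503, while the same call with `False` is served by `cluster2`.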

                                 

                                As for having to restart Apache in order to activate the property: is that a bug or intended behaviour?

                                • 13. Re: Error 503 for several seconds until session failover
                                  jfclere

                                  The need to restart Apache httpd is a bug: need a JIRA.

                                  • 14. Re: Error 503 for several seconds until session failover
                                    rhusar

                                    BTW, here is a link to the issue about the required Apache restart: https://issues.jboss.org/browse/MODCLUSTER-273
