13 Replies Latest reply on Oct 22, 2014 6:51 AM by mbabacek

    Server load threshold

    leaqui

      Hi, I want to know if it is possible (in a sticky session scenario) to automatically send new requests to other servers when server load exceeds some configurable threshold in one server.

      Thanks in advance

       

      Leandro

        • 1. Re: Server load threshold
          mbabacek

           

          Dear Leandro,

          no, there is no directly configurable attribute that would allow that on the balancer side, but you can tweak the capacity attribute of your load metrics on the AS/WildFly side, within the mod_cluster subsystem configuration. If the use case is that one or some of the servers are weaker than others, this tweaking might help.

           

          Anyway, with sticky sessions, the load balancer will keep sending requests to that one server until it becomes unavailable. Note that I mean requests within the same session, not requests for new sessions.

           

          If you feel there is a need for an additional feature, please file a feature request in Jira and state the expected behaviour clearly.

           

          HTH

           

          K.

          • 2. Re: Server load threshold
            leaqui

            Thanks Michal, I think I can reach this behavior by implementing some metric that returns 0 or -1 when the load exceeds some threshold, what do you think?

            Thanks again

             

            Leandro

            • 3. Re: Server load threshold
              mbabacek

              Well, yes, doing it on the application server side would definitely be more in the "mod_cluster way", i.e. not introducing static settings on the balancer side.

              Take a look at this example of a custom load metric. It takes the load number from a file, so it's handy for testing.

              • 4. Re: Server load threshold
                godiedelrio

                Hi Michal, I'm also interested in the "application server side" taking into account a metric threshold so that the balancer side stops sending requests to the node in which the threshold was reached. Subsequent requests from the sessions on that node and requests from new sessions should then be redirected to other nodes in which the threshold isn't reached yet.

                Should the ModClusterService send a STATUS message to the balancer side with value 0 or -1 to reflect this situation? LoadBalanceFactorProvider should have the responsibility, in collaboration with LoadMetric, for calculating whether any metric has exceeded the threshold.

                By the way, digging into the code I couldn't find the place on the application server side where a STATUS message is created with value 0 or -2. Are these values currently used?

                Thanks in advance

                • 5. Re: Server load threshold
                  godiedelrio

                  Reading some related issues in the mod_cluster Jira, I've just realized that STATUS messages with load 0 are generated by using a SimpleLoadBalanceFactorProvider with loadBalanceFactor 0. As I understand it, this value is used to indicate that a node is stand-by, which makes that node eligible when another node fails. Therefore, that value can't be used to indicate that a node has exceeded some metric threshold.

                  This leads us to resort to using -1 to indicate that a node has reached a threshold in some metric. To do so, DynamicLoadBalanceFactorProvider could return -1 in this circumstance.

                  In a sticky-session scenario, taking into account a threshold prevents subsequent requests from these sessions from overloading the node, maintaining quality of service.

                  Thanks again.

                  • 6. Re: Server load threshold
                    mbabacek

                    Dear Diego, I'm sorry, I'm not following you. Is this a question?

                    Yes, load 0, since MODCLUSTER-235, indicates a stand-by node, whereas -1 indicates the worker is in an error state.
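The load values mentioned so far in this thread can be summarized in a small sketch. The class, enum, and method names below are hypothetical and are not part of mod_cluster; the mapping only restates what is said here (-1 for an error state, 0 for stand-by, 1 to 100 for a regular load factor).

```java
// Hypothetical helper for illustration only; not a mod_cluster API.
final class StatusLoad {
    enum NodeState { ERROR, STANDBY, ACTIVE, UNKNOWN }

    // Maps a STATUS load value to the meaning described in this thread.
    static NodeState classify(int load) {
        if (load == -1) return NodeState.ERROR;    // worker is in an error state
        if (load == 0) return NodeState.STANDBY;   // stand-by node (since MODCLUSTER-235)
        if (load >= 1 && load <= 100) return NodeState.ACTIVE; // regular load factor
        return NodeState.UNKNOWN;                  // anything else is not covered here
    }

    public static void main(String[] args) {
        for (int load : new int[] {-1, 0, 1, 100}) {
            System.out.println(load + " -> " + classify(load));
        }
    }
}
```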

                    Feel free to set -1 with your custom load metric and/or play with capacity and history values of the current metrics.

                     

                    About the balancing logic in general

                    Note that one has to send a substantial number of requests to see the balancing behaviour, i.e. if you send only three or five requests, it might seem to you that the balancer is targeting overloaded nodes.

                    You might want to take a look at this article: FAQ · modcluster/mod_cluster Wiki · GitHub

                     

                    Cheers

                    • 7. Re: Server load threshold
                      godiedelrio

                      Hi Michal, what I am trying to say is that sending a STATUS message with -1 requires either modifying DynamicLoadBalanceFactorProvider or writing a new implementation of LoadBalanceFactorProvider. It doesn't depend only on the value returned by the LoadMetric. Currently, DynamicLoadBalanceFactorProvider normalizes the load factor, whatever its value, to a number between 1 and 100. In fact, if the load metric value were -1, DynamicLoadBalanceFactorProvider would normalize it to a value of 100.
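A minimal sketch of that normalization, assuming the provider applies the clamp-and-invert expression quoted in the following reply. The class and method names are hypothetical; only the arithmetic is taken from this thread.

```java
// Hypothetical demo class; only the expression inside toLoadFactor comes from the thread.
final class LoadFactorDemo {
    // Apply ceiling and floor, then invert, so the result is always within 1..100.
    static int toLoadFactor(int load) {
        return 100 - Math.max(0, Math.min(load, 99));
    }

    public static void main(String[] args) {
        // -1 and 0 both become 100, while 99 and anything above become 1.
        for (int load : new int[] {-1, 0, 50, 99, 150}) {
            System.out.println("load " + load + " -> factor " + toLoadFactor(load));
        }
    }
}
```

Under this assumption, no value returned by a metric can come out as 0 or -1, which is why a metric alone cannot signal stand-by or error.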

                      Hope I've made myself clear.

                      • 8. Re: Re: Server load threshold
                        mbabacek

                        Hmm, I see, you don't like:

                        // apply ceiling & floor and invert to express as "load factor"
                        // result should be a value between 1-100
                        return 100 - Math.max(0, Math.min(load, 99));
                        
                        

                        right?

                         

                        Well, there are two options:

                         

                        1) Set your returned load and capacity so that your metric does something like this:

                         

                        measured something - returning 80

                        measured something - returning 60

                        measured something - returning 50

                        measured something - internal threshold crossed - returning 1

                        returning 1

                        ...

                         

                        This way, with a low history setting, you will easily make the balancer avoid this worker (see the FAQ article on GitHub I linked to previously). On the other hand, you are right, the requests within active sessions will keep coming until the sessions become inactive.
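Option 1 could be sketched roughly as follows. This is a plain Java illustration under stated assumptions, not a real mod_cluster LoadMetric implementation: the class ThresholdLoad, its constructor, and loadPercent are hypothetical names, the measured value is assumed to be a fraction of capacity between 0.0 and 1.0, and the final inversion reuses the expression quoted earlier in this reply.

```java
import java.util.function.DoubleSupplier;

// Hypothetical sketch; it does not implement any mod_cluster interface.
final class ThresholdLoad {
    private final DoubleSupplier measured; // assumed: load as a fraction of capacity
    private final double threshold;        // internal threshold, same unit as measured

    ThresholdLoad(DoubleSupplier measured, double threshold) {
        this.measured = measured;
        this.threshold = threshold;
    }

    // Reports the measured load in percent, or a saturated 100 once the threshold is crossed.
    int loadPercent() {
        double value = measured.getAsDouble();
        if (value >= threshold) {
            return 100;
        }
        return (int) Math.round(value * 100);
    }

    // Clamp-and-invert expression quoted in this thread.
    static int toLoadFactor(int load) {
        return 100 - Math.max(0, Math.min(load, 99));
    }

    public static void main(String[] args) {
        double[] current = {0.0};
        ThresholdLoad metric = new ThresholdLoad(() -> current[0], 0.70);
        for (double sample : new double[] {0.20, 0.40, 0.50, 0.75}) {
            current[0] = sample;
            System.out.println("measured " + sample + " -> returning "
                    + toLoadFactor(metric.loadPercent()));
        }
    }
}
```

With a threshold of 0.70, the samples above give 80, 60, 50 and then 1, mirroring the sequence listed in option 1. The lowest value reachable this way is 1, so requests within active sessions still arrive at the node.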

                         

                        2) Open a MODCLUSTER JIRA feature request

                        This might be something along the lines of wanting the power to programmatically disable the worker node (switch it to the error state) from within your own custom load metric, thus forcing failover.

                         

                        By the way, you might not be aware of it, but it's possible to trigger failover from your web applications by returning special predefined HTTP codes, e.g. you might define HTTP 203 as the code on which the balancer fails over to another box. See https://issues.jboss.org/browse/MODCLUSTER-390

                         

                        Cheers

                        K.

                        • 9. Re: Server load threshold
                          godiedelrio

                          Yes, I've been playing around with option 1 and concluded that this can't be solved solely in a LoadMetric, so perhaps we should file a feature request in Jira.

                          Meanwhile, I'm going to look into what you said about the application triggering a failover through a predefined HTTP code.

                           

                          Many thanks.

                          • 10. Re: Server load threshold
                            rhusar

                            Very roughly scanning through this thread, it seems the correct solution would be writing a custom threshold load metric (that can delegate to others or whatever you want to do) and indeed returning a load of 0 once the threshold is reached. This way no new sessions will be sent to the node, but the remaining ones will still stay sticky to that node.

                            • 11. Re: Re: Server load threshold
                              mbabacek

                              IMHO nope, see my previous comment, line 4.  It's not possible with the current code.

                               

                              Besides, Diego doesn't want even the current sessions to carry on; he wants to force a failover at a certain threshold. I understand it as some kind of emergency overload precaution, for cases when it's actually better for the client to be routed to a new worker, because fetching the session data there is still faster than continuing with the former, overloaded worker.

                              Anyway, whatever you set as Load in your custom load metric, it won't be lower than 1, hence my recommendations 1) and 2), previous comment.

                               

                              K.

                              • 12. Re: Re: Server load threshold
                                rhusar

                                Ok, I guess it used to work, the commit that breaks my recommendation is:

                                 

                                MODCLUSTER-279 mod_cluster returns 503s after STATUS · 1fbff78 · modcluster/mod_cluster · GitHub