No, there is no directly configurable attribute that would allow that on the balancer side, but you can tweak the capacity attribute of your load metrics on the AS/WildFly side, within the mod_cluster subsystem configuration. If the use case is that one or some of the servers are weaker than the others, this tweaking might help.
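For reference, the capacity knob sits on the load metrics inside the dynamic load provider of the modcluster subsystem. A hedged fragment (the metric type and values here are purely illustrative): since a metric's load is normalized against its capacity, a weaker box gets a lower capacity so it reports a relatively higher load and receives less traffic.

```xml
<!-- Illustrative fragment of the WildFly modcluster subsystem configuration.
     Lowering capacity on a weaker node makes its reported load higher,
     so the balancer sends it proportionally fewer new sessions. -->
<dynamic-load-provider history="9" decay="2">
    <load-metric type="busyness" weight="1" capacity="50"/>
</dynamic-load-provider>
```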
Anyway, with sticky sessions, the load balancer will keep sending requests to that one server until it becomes unavailable. Note that I mean requests within the same session, not new-session requests...
If you feel there is a need for an additional feature, please state the expected behaviour clearly on the Jira, filing a feature request.
Thanks Michal, I think I can achieve this behavior by implementing some metric that returns 0 or -1 when the load exceeds some threshold. What do you think?
Well, yes, doing it on the application server side would definitely be more in the "mod_cluster way", i.e. not introducing static settings on the balancer side.
Take a look at this example of a custom load metric. It takes the load number from a file, so it's handy for testing.
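In that spirit, here is a minimal self-contained sketch of a file-backed metric. The `FileLoadMetric` name, the local stand-in interface, and the file format are assumptions for illustration only; the real thing would implement `org.jboss.modcluster.load.metric.LoadMetric` from the mod_cluster SPI.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Stand-in for org.jboss.modcluster.load.metric.LoadMetric (illustrative,
// so this sketch compiles without the mod_cluster jar).
interface LoadMetric {
    double getLoad() throws Exception;
}

// Reads the current load from a plain text file, which makes it easy to
// drive the balancer by hand while testing: just echo a new number into
// the file and watch where new sessions land.
public class FileLoadMetric implements LoadMetric {
    private final Path file;

    public FileLoadMetric(Path file) {
        this.file = file;
    }

    @Override
    public double getLoad() throws IOException {
        // The file is expected to contain a single numeric value.
        return Double.parseDouble(Files.readString(file).trim());
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("load", ".txt");
        Files.writeString(f, "0.75");
        System.out.println(new FileLoadMetric(f).getLoad()); // prints 0.75
    }
}
```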
Hi Michal, I'm also interested in having the application server side take a metric threshold into account, so that the balancer stops sending requests to the node on which the threshold was reached. Subsequent requests from the sessions on that node, as well as requests from new sessions, should then be redirected to other nodes on which the threshold hasn't been reached yet.
Should the ModClusterService send a STATUS message to the balancer side with value 0 or -1 to reflect this situation? LoadBalanceFactorProvider should have the responsibility, in collaboration with LoadMetric, of calculating whether any metric exceeded the threshold.
By the way, digging into the code, I couldn't find the place on the application server side where a STATUS message is created with value 0 or -2. Are these values used at present?
Thanks in advance
Reading some related issues in the mod_cluster JIRA, I've just realized that STATUS messages with load 0 are generated by using a SimpleLoadBalanceFactorProvider with loadBalanceFactor 0. As I understand it, this value is used to indicate a node is on stand-by, which makes that node eligible when another node fails. Therefore, that value can't be used to indicate a node has exceeded some metric threshold.
This leads us to resort to using -1 to indicate a node has reached a threshold in some metric. To do so, DynamicLoadBalanceFactorProvider could return -1 in this circumstance.
In a sticky-session scenario, taking a threshold into account prevents subsequent requests from these sessions from overloading the node, maintaining quality of service.
Dear Diego, I'm sorry, I'm not following you. Is this a question?
Yes, load 0, since MODCLUSTER-235, indicates a stand-by node, whereas -1 indicates the worker is in an error state.
Feel free to set -1 with your custom load metric and/or play with the capacity and history values of the current metrics.
About the balancing logic in general
It is noteworthy that one has to send a substantial number of requests to see the balancing behaviour; if you send only three or five requests, it might look as if the balancer is targeting overloaded nodes.
You might want to take a look at this article: FAQ · modcluster/mod_cluster Wiki · GitHub
Hi Michal, what I am trying to say is that sending a STATUS message with -1 requires DynamicLoadBalanceFactorProvider to be modified, or a new implementation of LoadBalanceFactorProvider. It doesn't depend only on the value returned by the LoadMetric. Currently, DynamicLoadBalanceFactorProvider normalizes the load factor, whatever its value, to a number between 1 and 100. In fact, if the load metric value were -1, DynamicLoadBalanceFactorProvider would normalize it to a value of 100.
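To see why a metric alone can't get -1 through, here is a self-contained sketch of that clamp-and-invert normalization (`normalize` is just a local helper mirroring the expression in DynamicLoadBalanceFactorProvider, not the actual class):

```java
public class NormalizeDemo {
    // Same ceiling/floor-and-invert expression used when computing
    // the load factor: clamp the load to [0, 99], then invert.
    static int normalize(int load) {
        return 100 - Math.max(0, Math.min(load, 99));
    }

    public static void main(String[] args) {
        System.out.println(normalize(50)); // prints 50
        System.out.println(normalize(-1)); // prints 100: -1 is clamped to 0,
                                           // so the node looks completely idle
    }
}
```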
Hope I've made myself clear.
Hmm, I see, you don't like:
// apply ceiling & floor and invert to express as "load factor"
// result should be a value between 1-100
return 100 - Math.max(0, Math.min(load, 99));
Well, there are two options:
1) Set your returning load and capacity so as your metric does something like this:
measured something - returning 80
measured something - returning 60
measured something - returning 50
measured something - internal threshold crossed - returning 1
This way, having a low history, you will easily make the balancer avoid this worker (see that FAQ article on GitHub I linked to previously). On the other hand, you are right: the requests within active sessions will keep coming until those sessions become inactive.
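Option 1 could be sketched like this, self-contained for illustration: the threshold value and the `loadFactor` helper are assumptions, standing in for whatever your metric (via its load and capacity settings) arranges the resulting load factor to be. The factor sequence mirrors the example above.

```java
public class ThresholdFactor {
    // Turns a measured value (e.g. CPU usage in percent) into a load
    // factor between 1 and 100, collapsing to the minimum factor of 1
    // once the internal threshold is crossed so the balancer strongly
    // prefers other workers for new sessions.
    static int loadFactor(double measured, double threshold) {
        if (measured >= threshold) {
            return 1; // threshold crossed: minimum factor
        }
        // otherwise express remaining headroom as a factor in [1, 100]
        return Math.max(1, (int) (100 - measured));
    }

    public static void main(String[] args) {
        double threshold = 90.0; // illustrative threshold
        System.out.println(loadFactor(20.0, threshold)); // prints 80
        System.out.println(loadFactor(40.0, threshold)); // prints 60
        System.out.println(loadFactor(95.0, threshold)); // prints 1
    }
}
```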
2) Open a MODCLUSTER JIRA feature request
This might be something along the lines that you want the power to programmatically disable (switch to error state) the worker node from within your own custom load metric, thus forcing failover.
By the way, you might not be aware of it, but it's possible to trigger failover from your web applications by returning a special predefined HTTP code, e.g. you might define HTTP 203 as the code on which the balancer fails over to another box: see https://issues.jboss.org/browse/MODCLUSTER-390
Yes, I've been playing around with option 1, and have concluded that this can't be solved solely in a LoadMetric, so perhaps we should file a feature request in JIRA.
Meanwhile I'm going to look around what you said about the application triggering a failover through a predefined HTTP code.
Very roughly scanning through this thread, it seems the correct solution would be writing a custom threshold load metric (that can delegate to others or whatever you want to do) and indeed returning a load of 0 once the threshold is reached. This way no new sessions will be sent to the node, but the remaining ones will still stay sticky to that node.
IMHO nope, see my previous comment, line 4. It's not possible with the current code.
Besides, Diego doesn't want even the current sessions to carry on, he wants to force a failover at a certain threshold. I understand it as some kind of an emergency overload precaution, when it's actually better for the client to be routed to a new worker, because digging out the session data is still faster than continuing with the former overloaded worker.
Anyway, whatever you set as load in your custom load metric, it won't be lower than 1; hence my recommendations 1) and 2) in my previous comment.