I read through the load balancing design doc, and I have a couple of observations.
First, I think you need to consider giving the load manager component at least a master/slave concept (and maybe many slaves). What happens to the HASingleton if that node crashes? There wouldn't be anything communicating load information back to httpd.
Second, I think you need to consider the common topology where httpd sits in a DMZ and communication crosses the inner firewall. At my last employer, our security policy would not allow http/s outbound over a non-standard port (and the AS side would be going outbound to mod_cluster). On the inbound side, from httpd through the inner firewall to the AS, we wouldn't allow http/s either, as that was only allowed to be open on the outer firewall into the httpd servers. Only application-specific protocols like AJP were allowed through the inner firewall. We also preferred to have those protocols encrypted with TLS, so for our configuration we would use stunnel, but that is a band-aid. I think we should consider a TLS option on AJP, or whatever protocol we use for mod_cluster to talk to the AS side.
Re: HASingleton crash
If there are any nodes in the cluster, one of them will become master and will restart the HASingleton. Obviously, there would be a gap during which information would not be sent back to httpd, but it's something that we can live with. While httpd does not receive load information, it can carry on applying existing rules.
This is much easier than trying to keep load information, which changes every minute, replicated across the cluster.
That's what I was looking for, at a minimum, but it's not clear in the document's diagram that that would be done. In an alternative approach to HASingleton, you don't necessarily have to replicate load information across the cluster. You could use buddy replication, but as long as there is a master/slave relationship on the HASingleton, then I agree, this is all that is needed. It just wasn't clear in the document.
I agree. I'll work with Brian to include a section on failure/failover scenarios in the wiki.
Thanks for the input, Andy. The discussion of how the server side works on the wiki doc is still very thin. We also had a conf call right before I went on holiday; I need to update the docs for that.
How the HASingleton should work is an open issue. If the load balance calculation is stateless, it's a trivial HASingleton problem; if the current master fails/shuts down, the new one takes over and starts sending data to the mod_cluster instances. But, in reality the load balance calculation is unlikely to be stateless; e.g. most will probably use some sort of time-decay function. So, we need something like a master/slave, where the slaves maintain the necessary state to take over the load balance calculation if they are elected master.
A few approaches to this come to mind:
1) Nodes multicast their node data to the cluster. So, any node that is interested in maintaining state has it. Don't much like that, as it's pretty chatty in a large cluster. If the underlying JGroups channel isn't UDP-based, then it's a lot of traffic.
2) Nodes are aware who the master and slaves are, and send multiple unicasts. Again chatty.
3) Master knows who the slaves are and sends a copy of aggregated, pre-digested state to them every time it recalculates. This seems the most logical.
In any case, a good load balancing algorithm should have smoothing functions built in to ensure the results do not change too radically if some state gets dropped.
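As a rough illustration of option 3 combined with a smoothing function, here is a minimal sketch. It assumes an exponentially weighted average as the time-decay function; the class DecayedLoadTable, its methods, and the alpha parameter are all hypothetical names made up for this example, not anything from mod_cluster or JBoss AS. The master would fold raw samples in with record(), send snapshot() to the slaves after each recalculation, and a newly elected master would call restore().

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: smoothed per-node load whose state can be copied to slaves. */
final class DecayedLoadTable {
    // Weight given to the newest sample; older samples decay by (1 - alpha) each update.
    private final double alpha;
    private final Map<String, Double> smoothed = new HashMap<>();

    DecayedLoadTable(double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Folds a raw load sample into the node's smoothed value and returns the result. */
    double record(String node, double sample) {
        Double previous = smoothed.get(node);
        double next = (previous == null)
                ? sample
                : alpha * sample + (1.0 - alpha) * previous;
        smoothed.put(node, next);
        return next;
    }

    /** Copy of the aggregated state, i.e. what the master would send to its slaves. */
    Map<String, Double> snapshot() {
        return new HashMap<>(smoothed);
    }

    /** Rebuilds the table from a snapshot, as a newly elected master would. */
    static DecayedLoadTable restore(double alpha, Map<String, Double> snapshot) {
        DecayedLoadTable table = new DecayedLoadTable(alpha);
        table.smoothed.putAll(snapshot);
        return table;
    }
}
```

Because only the smoothed values travel, a slave that misses one snapshot is off by at most one update, and the decay pulls it back toward the true value over the following samples.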
The current HASingleton infrastructure should support these options pretty well; any HASingletonSupport subclass is a regular service that adds a couple of extra lifecycle stages to the standard 4:
They also get callbacks for cluster topology changes. So, it's easy for the ModClusterService to run on each node in the "started" mode, monitoring state if it believes its position in the topology makes it a "slave", and then to take over interacting with the mod_cluster instances when it "becomes master".
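A minimal sketch of that role switching follows. It is a self-contained stand-in, not the real HASingletonSupport class: ModClusterRoleSketch, its Role enum, and onTick() are hypothetical, and the startSingleton()/stopSingleton() method names are assumptions chosen to resemble the extra lifecycle callbacks described above.

```java
/** Hypothetical stand-in for a service with extra "singleton" lifecycle callbacks. */
final class ModClusterRoleSketch {
    enum Role { STOPPED, SLAVE, MASTER }

    private Role role = Role.STOPPED;
    private int reportsSent = 0;

    /** Regular start: the service runs on every node, only tracking state. */
    void start() {
        role = Role.SLAVE;
    }

    /** Assumed callback for "this node was elected master". */
    void startSingleton() {
        if (role == Role.SLAVE) {
            role = Role.MASTER;
        }
    }

    /** Assumed callback for "this node is no longer master". */
    void stopSingleton() {
        if (role == Role.MASTER) {
            role = Role.SLAVE;
        }
    }

    /** Regular stop: the service no longer tracks state or reports. */
    void stop() {
        role = Role.STOPPED;
    }

    /** One recalculation tick; only the master reports to the httpd side. */
    boolean onTick() {
        if (role != Role.MASTER) {
            return false; // slaves keep their state current but stay silent
        }
        reportsSent++;    // placeholder for sending load data to mod_cluster
        return true;
    }

    Role role() {
        return role;
    }

    int reportsSent() {
        return reportsSent;
    }
}
```

The point of the design is that a slave is already started and holding state when it is elected, so taking over is a role flip rather than a cold start.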
Re: communication across the firewall, the intent is to use http/https from the AS side to the httpd side for the load balancing information. What port and whether http or https is used depends on how the user configures httpd. Basically, mod_cluster functions as an httpd mount, similar to the jkstatus web app that comes with mod_jk. Here's the config you add to httpd.conf for that; I assume mod_cluster would use something similar (with a different "Allow from"):
<Location /status/>
    JkMount jkstatus
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>
For normal request traffic from mod_cluster to JBossWeb, communication will use AJP. The AS-side endpoint will be the standard AJP Connector; there is no intent to modify that at all for this. So, if Mladen et al decide to add TLS support to AJP, then it would be handled.