Cluster Member Starvation
groovesoftware Jul 17, 2002 11:28 AM
I am experiencing a situation where some members of a cluster are not getting any requests. The following is my configuration.
OS: Solaris
Java version: 1.3
JBoss version: jboss_3.0.1RC1-tomcat_4.0.4
I have 2 partitions. One called ProdEJB has only SFSB and SLSB clustering turned on. The other called ProdJNDI has only JNDI clustering turned on. At present all servers live on the same physical computer. Each server participating in the ProdJNDI partition is running HA-JNDI on a different port.
5 or 6 servers participate in the ProdEJB partition. All of the 5 or 6 servers that participate in the ProdEJB partition also participate in the ProdJNDI partition. We refer to these servers as EJB servers.
There is one other server which runs servlets only and no EJBs that participates in the ProdJNDI partition. The servlets in this server use ejb-ref declarations to talk to EJBs in the ProdEJB partition. Each ejb-ref points to a JNDI name in HA-JNDI on this server. We refer to this server as the web server.
So you can see that for each EJB looked up by the web server the lookup will talk to HA-JNDI running in the web server. HA-JNDI will not find the EJB name in the global namespace or the local namespace so it will begin asking the other (EJB) servers if they have the name in their local namespace. The first EJB server it asks will have the name and will return a clustered home interface. The web server now has a reference to the clustered home of an EJB. Calls to the create methods in the home interface should be made round-robin on the 5 or 6 EJB servers. For SLSB calls to any remote method should be made round-robin on the 5 or 6 EJB servers.
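To make the expectation above concrete, here is a minimal sketch of plain round-robin selection over a fixed list of targets. It is only an illustration of the behavior I expect, not the JBoss load-balancing code; the class name RoundRobinSketch and the target names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the JBoss load-balancing code.
class RoundRobinSketch {
    private final List<String> targets;
    private int cursor = 0;

    RoundRobinSketch(List<String> targets) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("need at least one target");
        }
        // Copy the list so later changes by the caller do not affect rotation.
        this.targets = new ArrayList<String>(targets);
    }

    // Returns the current target, then advances the cursor, wrapping at the end.
    synchronized String next() {
        String chosen = targets.get(cursor);
        cursor = (cursor + 1) % targets.size();
        return chosen;
    }
}
```

With 5 targets and 50 calls to next(), each target is returned exactly 10 times. That even spread is what I expected to see across the EJB servers.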
One final note is that the web server holds on to the home reference in a Service Locator object rather than looking it up each time it is needed.
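The caching pattern looks roughly like the sketch below. This is a simplified stand-in rather than our actual code: NameLookup is a hypothetical interface standing in for a JNDI lookup, and CachingServiceLocator is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a JNDI lookup such as Context.lookup(name).
interface NameLookup {
    Object lookup(String name);
}

// Illustrative Service Locator: looks each home up once, then reuses it.
class CachingServiceLocator {
    private final NameLookup jndi;
    private final Map<String, Object> homes = new HashMap<String, Object>();

    CachingServiceLocator(NameLookup jndi) {
        this.jndi = jndi;
    }

    // Returns the cached home for the name, performing the lookup only on first use.
    synchronized Object getHome(String jndiName) {
        Object home = homes.get(jndiName);
        if (home == null) {
            home = jndi.lookup(jndiName);
            homes.put(jndiName, home);
        }
        return home;
    }
}
```

With this pattern every call made in the web server goes through the same home reference, so whatever balancing takes place depends on that one object.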
With this setup we are seeing the following behavior.
1. With 4 or fewer servers every server receives calls made to create methods in the home or business methods in the remote in a more or less fair manner. Over time all servers get roughly the same number of requests.
2. With 5 servers 1 of the servers gets starved. In other words it never gets any requests. In addition to this some other server gets about twice as many requests as the other servers. The remaining 3 servers get roughly the same number of requests.
3. With 6 servers 2 of the servers get starved and some other server gets about twice as many requests as the other servers.
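For anyone who wants to check for the same pattern, a per-server tally along these lines is enough to spot starved members. This is a hedged sketch, not the instrumentation I actually used: RequestTally is a hypothetical helper, and something on each server (for example the bean itself) would have to call record() with that server's name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for counting requests per server; not part of JBoss.
class RequestTally {
    private final Map<String, Integer> counts = new LinkedHashMap<String, Integer>();

    // Registers every known server with a count of zero.
    RequestTally(List<String> servers) {
        for (String server : servers) {
            counts.put(server, 0);
        }
    }

    // Records one request handled by the named server.
    synchronized void record(String server) {
        Integer current = counts.get(server);
        if (current == null) {
            throw new IllegalArgumentException("unknown server: " + server);
        }
        counts.put(server, current + 1);
    }

    // Returns how many requests the named server has handled so far.
    synchronized int count(String server) {
        Integer current = counts.get(server);
        if (current == null) {
            throw new IllegalArgumentException("unknown server: " + server);
        }
        return current;
    }

    // Returns the servers that have not handled any request at all.
    synchronized List<String> starved() {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 0) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

In the 5-server case described in item 2, starved() would list one server, and one of the remaining servers would show roughly double the count of the other three.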
The particular servers that get starved seem fairly random, but somewhat related to the order in which I start the servers.
I realize that this is a somewhat more complicated clustering situation than most people are using, but it is reasonable, and suits our needs. It enables us to have 1 (and later more) servers handling servlets and talking to a cluster of EJB servers. This will perform well in our environment.
Has anyone else seen problems like this with the Round Robin algorithms? Any ideas?