I have three Intel boxes running RH9.0 and JBoss 3.2.1, set up as a cluster. In addition to the default partition, I have created an additional partition for these machines (TestPartition). The deployed application contains three SLSBs with clustering enabled for the TestPartition partition, all configured to use the round-robin load-balancing policy.
A fourth Intel box, also running RH9.0, runs a separate stand-alone Java process that uses this cluster. It retrieves the EJB remote interfaces via HA-JNDI with HA-JNDI auto discovery. The Hashtable passed to the InitialContext constructor contains: provider_url=null, url_pkg_prefixes=org.jboss.naming:org.jnp.interfaces, jnp.partitionName=TestPartition, jnp.discoveryGroup=230.0.0.5, jnp.discoveryPort=1102. The 230.0.0.5:1102 discovery group and port match the values specified in the cluster deployment descriptor. At a regular interval, the stand-alone process invokes methods on the clustered beans, which in turn query database tables.
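For reference, the client environment described above could be built roughly like this. It is a minimal sketch that assumes the JBoss 3.2 client JARs are on the classpath; the property names `jnp.partitionName`, `jnp.discoveryGroup` and `jnp.discoveryPort` and the factory `org.jnp.interfaces.NamingContextFactory` are the ones the JBoss naming client reads, while the class name `HaJndiClient` and setting the factory in code (rather than in a `jndi.properties` file) are assumptions. "provider_url=null" amounts to leaving `java.naming.provider.url` out entirely, because a `Hashtable` cannot hold null values; that leaves the client relying on auto discovery.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical helper class; only the property names and factory are JBoss's.
class HaJndiClient {

    // Builds the environment for an HA-JNDI InitialContext that relies on
    // multicast auto discovery. No Context.PROVIDER_URL entry is added:
    // Hashtable rejects null values, so "provider_url=null" means "omit it".
    static Hashtable<String, String> haJndiEnvironment(String partition,
                                                       String group,
                                                       int port) {
        Hashtable<String, String> env = new Hashtable<>();
        // Assumption: the factory is set here instead of in jndi.properties.
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.jnp.interfaces.NamingContextFactory");
        env.put(Context.URL_PKG_PREFIXES, "org.jboss.naming:org.jnp.interfaces");
        env.put("jnp.partitionName", partition);  // restrict discovery to this partition
        env.put("jnp.discoveryGroup", group);     // multicast address HA-JNDI listens on
        env.put("jnp.discoveryPort", Integer.toString(port)); // plain port number only
        return env;
    }

    // Creating the context needs the JBoss client JARs at runtime.
    static Context connect() throws NamingException {
        return new InitialContext(haJndiEnvironment("TestPartition", "230.0.0.5", 1102));
    }
}
```

The beans' home interfaces would then be looked up from `connect()` and narrowed as usual; their JNDI names depend on the deployment and are not given in the post.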
I bring up the three JBoss instances, and the log files show all three machines successfully joining the cluster. I start my stand-alone app and can see the work being distributed amongst all three machines. If I kill one of the app server instances with kill or kill -9, the remaining two app servers report that instance as a dead member and continue working. When I bring the instance back up, it rejoins the cluster and work is distributed to it again.
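The failover seen in the kill scenario can be pictured with a simplified model of a round-robin client proxy. This is a hypothetical illustration (`RoundRobinProxy`, `chooseTarget` and `invoke` are made-up names, not JBoss's RoundRobin policy or its proxy classes): targets are used in turn, and a call that fails with an exception drops that target and is retried on the next one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of a round-robin proxy with failover; not JBoss code.
class RoundRobinProxy<T> {
    private final List<T> targets;
    private int cursor = 0;

    RoundRobinProxy(List<T> initialTargets) {
        this.targets = new ArrayList<>(initialTargets);
    }

    // Picks the next live target in round-robin order.
    T chooseTarget() {
        if (targets.isEmpty()) {
            throw new IllegalStateException("no live targets");
        }
        int i = cursor % targets.size(); // modulo keeps the index valid after removals
        cursor = i + 1;
        return targets.get(i);
    }

    // Runs the call on the next target. If that target throws, it is removed
    // and the call is retried elsewhere. A call that blocks forever never
    // throws, so in this model it is never failed over.
    <R> R invoke(Function<T, R> call) {
        while (true) {
            T target = chooseTarget();
            try {
                return call.apply(target);
            } catch (RuntimeException e) {
                targets.remove(target);
            }
        }
    }

    int liveCount() {
        return targets.size();
    }
}
```

The key assumption in the model is that failover is triggered by an exception reaching the proxy; an invocation that simply waits on the network never gets to the `catch` block.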
The problem occurs when I pull the network cable out of one of the machines or shut down its network interface. When I do that, the other members of the cluster mark that machine as dead, but method invocations already routed to it are not failed over. They hang until the machine/instance comes back online.
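A plausible reason for the difference, offered as an assumption rather than something verified on this setup: after kill or kill -9 the dead host's operating system is still up and resets the client's TCP connection, so the pending call fails at once and can be retried, whereas a pulled cable produces no reset and a blocking socket read keeps waiting. One generic client-side guard is to run each invocation on a worker thread and stop waiting after a deadline. The sketch below uses only `java.util.concurrent`; `TimeoutGuard` and `callWithTimeout` are hypothetical names, and the stuck worker thread is not actually freed, because a plain blocking socket read generally ignores interrupts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long the caller waits for a remote call.
class TimeoutGuard {
    // Daemon threads, so a worker stuck on a dead connection cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "ejb-call");
        t.setDaemon(true);
        return t;
    });

    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws Exception {
        Future<T> f = POOL.submit(call);
        try {
            return f.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupts the worker; a thread blocked in a plain socket read
            // usually ignores this and stays blocked in the background.
            f.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Surface whatever the call itself threw.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}
```

On a `TimeoutException` the caller could then repeat the lookup or invocation so that it lands on a surviving member; whether a retried call is safe depends on what the bean method does.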
Has anyone else encountered this problem, or does anyone have suggestions on how to fix or work around it?
Thanks!
-Tom