We are running two separate JBoss clusters of two machines each, and the two clusters communicate with each other using Axis.
The two clusters use different multicast ports, but both keep the same partition name, "DefaultPartition".
The two clusters are on physically different networks separated by a firewall, so they cannot communicate with each other except through a designated port.
On one cluster, we use HA-JNDI to store and look up some key-value data. Under normal operation, even under high load, the two clusters communicate properly. However, if the machines of the cluster that does not use HA-JNDI reboot during high load, the following happens:
1. While the rebooted cluster machines are coming up, communication from the other cluster gets socket exceptions (which is correct).
2. Once the rebooted machines are back up and JBoss on them has started, the machines that stayed alive get the expected response to their requests (which is again correct).
3. After a successful response, the machines that stayed alive use HA-JNDI to look up some previous data and bind some new data. At this point the transaction hangs and gets no response from InitialContext. The hang generally lasts about 1 minute, and in a very few cases as long as 30 minutes.
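To check whether the hang in step 3 really sits inside the JNDI call, a probe along the following lines could be used. This is only a minimal sketch, not our actual code: the class name HaJndiProbe, its method names, and the provider URL passed in are hypothetical, and the client properties ("jnp.partitionName", "jnp.disableDiscovery", "jnp.timeout", "jnp.sotimeout") are the JBoss JNP client settings as we understand them, so they should be checked against the JBoss version in use. It runs the lookup on a worker thread and gives up after a fixed time instead of blocking the caller.

```java
import java.util.Hashtable;
import java.util.Optional;
import java.util.concurrent.*;
import javax.naming.Context;
import javax.naming.InitialContext;

class HaJndiProbe {
    // Daemon threads, so a lookup stuck in a blocking socket read cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "ha-jndi-probe");
        t.setDaemon(true);
        return t;
    });

    // Builds a client environment that points at an explicit HA-JNDI address
    // instead of relying on multicast discovery (assumed property names, see lead-in).
    static Hashtable<String, String> haJndiEnv(String providerUrl, String partitionName,
                                               int socketTimeoutMillis) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
        env.put("java.naming.factory.url.pkgs", "org.jboss.naming:org.jnp.interfaces");
        env.put("java.naming.provider.url", providerUrl);
        env.put("jnp.partitionName", partitionName);
        env.put("jnp.disableDiscovery", "true");
        env.put("jnp.timeout", String.valueOf(socketTimeoutMillis));   // connect timeout
        env.put("jnp.sotimeout", String.valueOf(socketTimeoutMillis)); // read timeout
        return env;
    }

    // Runs the task on a worker thread and waits at most timeoutMillis.
    // Returns Optional.empty() on timeout (and also if the task returned null).
    static <T> Optional<T> callWithTimeout(Callable<T> task, long timeoutMillis) {
        Future<T> future = POOL.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupt is best effort; a thread blocked in socket I/O may not react to it.
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("JNDI call failed", e.getCause());
        }
    }

    // Looks up a name with an upper bound on how long the caller waits.
    static Optional<Object> lookup(Hashtable<String, String> env, String name,
                                   long timeoutMillis) {
        return callWithTimeout(() -> {
            Context ctx = new InitialContext(env);
            try {
                return ctx.lookup(name);
            } finally {
                ctx.close();
            }
        }, timeoutMillis);
    }

    public static void main(String[] args) {
        // Simulated hang: the task sleeps far longer than the caller is willing to wait.
        Optional<String> result = callWithTimeout(() -> {
            Thread.sleep(5_000);
            return "late";
        }, 200);
        System.out.println("timed out: " + !result.isPresent());
    }
}
```

Against a real server the call would look like lookup(haJndiEnv("somehost:1100", "DefaultPartition", 5000), "some/key", 10000), where the host, port and key are placeholders. If the lookup still times out with discovery disabled and an explicit provider URL, that would suggest the hang is on the server side of HA-JNDI rather than in client discovery.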
All symptoms point to a problem in HA-JNDI, or to some communication between the two clusters that causes HA-JNDI to hang whenever the machines of one cluster reboot, but we are not sure of this.
The only way to stop the hang is to restart one of the machines in the cluster that uses HA-JNDI, that is, a machine that was not rebooted in the first place.
I would appreciate any help in this regard.
Thanks in advance,