IMHO, this is normal and by design.
This is not normal behavior. If I cluster two Win2K machines that are on the same subnet, the second node joins the partition as soon as the ClusterPartition starts its JGroups (JG) channel. All the docs and posts I have seen about clustering on JBoss say this is the norm, but none of them give detailed troubleshooting steps for when it does not happen.
Does anyone have any idea why this is happening? I am struggling to get our systems fully configured and tested for a pilot launch of our application!
The delayed join of the nodes is causing the HAPartition to not initialize the DistributedState with the current values from the partition (which have been set by the other node during startup/operation, before the second node was started).
I know this doesn't help, but for what it's worth, I'm seeing the same problem. Did you ever figure out a solution?
In your current config it takes several seconds to detect a crashed member. I'd suggest you use timeout=1500 and max_tries=3. The caveat: each member will send a heartbeat every 1.5 seconds. You also increase the probability of a 'false' suspicion. If VERIFY_SUSPECT doesn't catch that, your suspected member will be shunned and then re-admitted later. If you have a slow member whose response time is just above the 1500 ms timeout, it will constantly be shunned and re-joined.
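To make the trade-off in the reply above a bit more concrete, here is a minimal Java sketch. It is a simplified model of heartbeat-style failure detection in the spirit of the FD protocol, not JGroups code: the class and method names are hypothetical, the 1500 ms timeout is an assumption inferred from the 1.5-second heartbeat interval mentioned above, and VERIFY_SUSPECT is not modelled at all. It shows roughly how long a crashed member takes to be suspected (about timeout × max_tries), and why a member that always answers just slower than the timeout gets suspected over and over.

```java
import java.util.List;

/**
 * Hypothetical model of heartbeat-based failure detection (FD-style).
 * Not JGroups code; names and numbers are illustrative only.
 */
public final class FailureDetectionSketch {

    private FailureDetectionSketch() {}

    /** Rough upper bound on detection time: max_tries missed heartbeats of timeout ms each. */
    public static long worstCaseDetectionMillis(long timeoutMillis, int maxTries) {
        return timeoutMillis * maxTries;
    }

    /**
     * Replays heartbeat response times and returns the zero-based index of the
     * heartbeat at which the member is first suspected, or -1 if it never is.
     * A response slower than timeoutMillis counts as a miss; a timely response
     * resets the miss counter.
     */
    public static int firstSuspicion(List<Long> responseMillis, long timeoutMillis, int maxTries) {
        int missed = 0;
        for (int i = 0; i < responseMillis.size(); i++) {
            if (responseMillis.get(i) > timeoutMillis) {
                missed++;
                if (missed >= maxTries) {
                    return i; // would be reported as suspected here
                }
            } else {
                missed = 0; // a timely heartbeat ack clears earlier misses
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long timeout = 1500; // assumption: matches the 1.5-second heartbeat interval
        int maxTries = 3;
        System.out.println("Worst-case detection: "
                + worstCaseDetectionMillis(timeout, maxTries) + " ms");

        // A member answering in 1550 ms, just above the timeout, misses every heartbeat.
        List<Long> slowMember = List.of(1550L, 1550L, 1550L, 1550L);
        System.out.println("Slow member suspected at heartbeat index "
                + firstSuspicion(slowMember, timeout, maxTries));

        // An occasional slow reply is tolerated because timely replies reset the counter.
        List<Long> jittery = List.of(1600L, 200L, 1600L, 200L);
        System.out.println("Jittery member suspected at: "
                + firstSuspicion(jittery, timeout, maxTries));
    }
}
```

With these numbers a crashed member is suspected after roughly 4500 ms, the consistently slow member is suspected at heartbeat index 2, and the jittery one never is. In a real stack, VERIFY_SUSPECT would get a second chance to clear the slow member before it is shunned. The sketch needs Java 9 or later for `List.of`.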
Post your current cluster-service.xml configuration and I'll probably be able to help you further.
I'm using the cluster-service.xml right out of the box.