1. Re: Proper cluster behavior on machine failure
sannegrinovero May 7, 2013 5:17 AM (in response to moia)
Hi,
it's JGroups that controls which nodes are in or out of the group; the mechanisms that do this are called failure detection (FD) protocols. The ones I'm aware of, however, don't trigger on such exceptions, as JGroups might not be aware of what's going on at the higher layer. You could extend one of JGroups's FD protocols, then catch such TimeoutExceptions, and if they happen more often than some threshold you could grab a reference to your custom FD protocol and force it to kick the badly behaving node out of the group.
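A rough sketch of the threshold-counting part of that idea, in case it helps. It deliberately contains no JGroups or Infinispan calls: the class name `TimeoutEvictionTracker`, its methods, and the `evictAction` callback are all hypothetical names made up for illustration. The callback is where you would hand the node over to your own extended FD protocol, and how that protocol actually removes the member is not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/**
 * Counts consecutive timeouts per node and fires a callback once a node
 * reaches the configured threshold. Hypothetical helper, not a JGroups API.
 */
final class TimeoutEvictionTracker {
    private final int threshold;
    // Assumption: this hook would delegate to your custom FD protocol.
    private final Consumer<String> evictAction;
    private final Map<String, AtomicInteger> timeouts = new ConcurrentHashMap<>();

    TimeoutEvictionTracker(int threshold, Consumer<String> evictAction) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
        this.evictAction = evictAction;
    }

    /** Call this from the catch block that handles a TimeoutException for the node. */
    void recordTimeout(String node) {
        int count = timeouts
                .computeIfAbsent(node, k -> new AtomicInteger())
                .incrementAndGet();
        // Fire only when the count first reaches the threshold, so further
        // timeouts do not trigger the callback again until the count is reset.
        if (count == threshold) {
            evictAction.accept(node);
        }
    }

    /** Call this after a successful operation to reset the node's counter. */
    void recordSuccess(String node) {
        timeouts.remove(node);
    }
}
```

You would wrap your remote calls so that a caught TimeoutException calls `recordTimeout(node)` and a successful call invokes `recordSuccess(node)`. Resetting on success means only a run of consecutive failures counts, which is one possible policy; a time-window count would be another.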
Totally unrelated: assuming your "hardware problem" might be caused by very high stress, it might be worth upgrading to Infinispan 5.2.6.Final, as it performs significantly fewer lock operations, so it might be more "gentle" on your hardware and avoid the problem altogether.