I don't know if this helps, but I read the following in the JBoss 3.2.6RC1 change notes (from http://sourceforge.net/docman/display_doc.php?docid=23847&group_id=22866):
* A fix for the following clustering scenario has been added. There are a number of nodes in a partition. One node is selected as master, and runs the singleton service. The master replica is shunned. The other nodes remove its keys from the DRM. Another master replica is selected. It runs the singleton service. The shunned node returns. Two bad things happen:
1. It doesn't check if it should still run the singleton, and assumes it is still the master (not true, he is now the last node in the DRM, not the first). From now on, two nodes are running the singleton.
2. The other nodes don't update the shunned node's keys. From this point on, as far as the other nodes are concerned, that node can never be a master replica for that singleton service.
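To make the scenario concrete, here is a minimal sketch of it. It is not JBoss code: the class and method names are made up for illustration, and the DRM is modelled as a plain ordered list in which the first entry is assumed to be the master, as the change note implies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JBoss code: the DRM is modelled as an ordered list of node names.
class SingletonElectionSketch {
    // Ordered replicant list; the first entry is treated as the master.
    private final List<String> replicants = new ArrayList<>();

    // A node that joins (or returns after being shunned) goes to the end of the list.
    void join(String node) {
        if (!replicants.contains(node)) {
            replicants.add(node);
        }
    }

    // The other nodes remove the shunned node's keys.
    void shun(String node) {
        replicants.remove(node);
    }

    boolean isMaster(String node) {
        return !replicants.isEmpty() && replicants.get(0).equals(node);
    }

    public static void main(String[] args) {
        SingletonElectionSketch drm = new SingletonElectionSketch();
        drm.join("A");
        drm.join("B");
        drm.join("C");

        boolean aRunsSingleton = drm.isMaster("A"); // A is first, so it starts the singleton

        drm.shun("A"); // B becomes the first entry and starts the singleton
        drm.join("A"); // A returns, now last in the list

        // Problem 1 is A keeping its stale flag here. Re-checking after every
        // membership change avoids two nodes running the singleton.
        aRunsSingleton = drm.isMaster("A");

        System.out.println("B is master: " + drm.isMaster("B"));
        System.out.println("A still runs singleton: " + aRunsSingleton);
    }
}
```

With the re-check in place, only B ends up running the singleton after A returns.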
Could this also be the cause of the abnormal behavior you are seeing?
BTW, I'm neither a JBoss developer nor an expert user. Just an ordinary user trying to help.
Thanks for your answer. I experienced both of those problems, and it is very good that they are fixed in 3.2.6RC1. Unfortunately, I don't think this resolves everything. One main problem remains: when the master node loses its network connection, no new master node is elected. I think the root cause is that the HA Partition implementation doesn't react to a suspect-node event, so nothing triggers master re-election when the existing master loses its connection. Can someone confirm that this is a bug? Could it be resolved in some way? If it is a bug, are there any plans to fix it in an upcoming release?
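To illustrate the behavior I would expect, here is a rough sketch. All names in it are hypothetical and none of them come from the actual HA Partition API. It only assumes that the failure detector delivers some kind of suspect notification and that the master is the first remaining member.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the JBoss HA Partition API.
class SuspectReelectionSketch {
    private final List<String> members = new ArrayList<>();
    private String master;

    void join(String node) {
        if (!members.contains(node)) {
            members.add(node);
        }
        electMaster();
    }

    // Assumed callback: invoked when a node is suspected,
    // for example after it loses its network connection.
    void onSuspect(String node) {
        members.remove(node);
        electMaster(); // the step that seems to be missing today
    }

    // The first remaining member becomes the master.
    private void electMaster() {
        master = members.isEmpty() ? null : members.get(0);
    }

    String getMaster() {
        return master;
    }

    public static void main(String[] args) {
        SuspectReelectionSketch partition = new SuspectReelectionSketch();
        partition.join("A");
        partition.join("B");
        System.out.println("Master before: " + partition.getMaster());
        partition.onSuspect("A");
        System.out.println("Master after A is suspected: " + partition.getMaster());
    }
}
```

This only shows the view of the surviving nodes. What the disconnected node itself should do is a separate question.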