2 Replies Latest reply on Aug 23, 2004 3:12 AM by canghel

    HA Singleton not working when the master node loses network

    canghel

      Hi JBoss guys,

      I am trying to use the HA Singleton mechanism in a JBoss cluster.

      To test the cluster's behaviour, I unplug the network cable from the master node. The cluster does not report that the node is down, so the controller does not elect a new master node, and I end up with a cluster that has no master node.

      I noticed that when I plug the cable back in, the cluster updates its topology. Why is the cluster topology not updated when a node goes down (due to a network connection failure), but only when a node comes back or re-joins the cluster?

      I also noticed that if the master node is shut down normally, the cluster topology gets updated, a new master node is elected, and everything works fine (after the 60-second delay caused by the hardcoded value in HaPartitionImpl, of course).

      Is there any way to make the controller elect a new master node when the current master node loses its network connection?

      Is there any way to make the cluster react when a node loses network connection and to make the cluster update its topology?

      thanks in advance,
      Claudiu

        • 1. Re: HA Singleton not working when the master node loses netw
          pedrosalazar

          Claudiu,

          I don't know if this helps, but I read the following in the JBoss 3.2.6RC1 change notes (from http://sourceforge.net/docman/display_doc.php?docid=23847&group_id=22866):

          * A fix for the following clustering scenario has been added. There are a number of nodes in a partition. One node is selected as master, and runs the singleton service. The master replica is shunned. The other nodes remove its keys from the DRM. Another master replica is selected. It runs the singleton service. The shunned node returns. Two bad things happen:
          1. It doesn't check if it should still run the singleton, and assumes it is still the master (not true, he is now the last node in the DRM, not the first). From now on, two nodes are running the singleton.
          2. The other nodes don't update the shunned node's keys. From this point on, as far as the other nodes are concerned, that node can never be a master replica for that singleton service.
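
          To make the scenario in the note easier to follow, here is a minimal sketch of the election rule it implies: the first node in the DRM list is the master, and a node that returns is appended at the end. This is a simplified model and not JBoss code; the class name, the method names, and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the election rule described in the release note.
// NOT JBoss code: the class and its methods are hypothetical.
class SingletonElectionSketch {

    // Ordered replicant list, standing in for the DRM entries of one service.
    private final List<String> replicants = new ArrayList<>();

    // A node that joins (or re-joins) is appended at the end of the list.
    void join(String node) {
        if (!replicants.contains(node)) {
            replicants.add(node);
        }
    }

    // The remaining nodes remove the key of a shunned or stopped node.
    void remove(String node) {
        replicants.remove(node);
    }

    // Assumed rule from the note: the first entry in the list is the master.
    boolean isMaster(String node) {
        return !replicants.isEmpty() && replicants.get(0).equals(node);
    }

    public static void main(String[] args) {
        SingletonElectionSketch drm = new SingletonElectionSketch();
        drm.join("nodeA");
        drm.join("nodeB");
        drm.join("nodeC");

        drm.remove("nodeA"); // nodeA is shunned, nodeB becomes first
        drm.join("nodeA");   // nodeA returns and is now last in the list

        // A returning node has to re-check its role instead of assuming it.
        System.out.println("nodeA master? " + drm.isMaster("nodeA"));
        System.out.println("nodeB master? " + drm.isMaster("nodeB"));
    }
}
```

          In this model the returning node is no longer first in the list, so if it keeps running the singleton without re-checking, two nodes end up running it, which is problem 1 above.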


          Could this also be the cause of the abnormal behavior you are facing?

          BTW, I'm neither a JBoss developer nor an expert user. Just a dummy user trying to help.

          regards,
          Pedro Salazar

          • 2. Re: HA Singleton not working when the master node loses netw
            canghel

            Hi Pedro,

            thanks for your answer. I experienced both of those problems, and it is very good that they are fixed in 3.2.6RC1. Unfortunately, I don't think this resolves everything. One main problem still remains: when the master node loses its network connection, no new master node is elected. I think the cause is that the HA Partition implementation doesn't react to a suspect-node event, so there is no event to trigger master node re-election when the existing master node loses its connection. Can someone confirm that this is a bug? Can it be resolved in some way? If it is a bug, are there any plans to fix it in the next releases?
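
            To make my guess concrete, here is a rough model of what I mean. It is only a sketch of the hypothesis and not the actual HA Partition or JGroups code; the class, the `reactToSuspect` flag, and the callback names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the suspected behaviour.
// NOT the JBoss HA Partition or JGroups API.
class SuspectAwarePartitionSketch {

    private final List<String> view = new ArrayList<>();
    private final boolean reactToSuspect;

    SuspectAwarePartitionSketch(List<String> initialView, boolean reactToSuspect) {
        this.view.addAll(initialView);
        this.reactToSuspect = reactToSuspect;
    }

    // Called when a node is suspected to be dead (e.g. its cable was pulled).
    void onSuspect(String node) {
        if (reactToSuspect) {
            view.remove(node);
        }
        // Otherwise the suspicion is ignored and the stale view is kept.
    }

    // Called on a clean shutdown: the node leaves and the view is updated.
    void onLeave(String node) {
        view.remove(node);
    }

    // The first member of the view acts as master; null if the view is empty.
    String currentMaster() {
        return view.isEmpty() ? null : view.get(0);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("nodeA", "nodeB", "nodeC");

        SuspectAwarePartitionSketch ignoring = new SuspectAwarePartitionSketch(nodes, false);
        ignoring.onSuspect("nodeA");
        System.out.println("suspect ignored: master = " + ignoring.currentMaster());

        SuspectAwarePartitionSketch reacting = new SuspectAwarePartitionSketch(nodes, true);
        reacting.onSuspect("nodeA");
        System.out.println("suspect handled: master = " + reacting.currentMaster());
    }
}
```

            If the suspicion is ignored, the unreachable node stays first in the view and nobody else takes over. If it is handled, the next node becomes master, which is the behaviour I would expect.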

            thanks,
            Claudiu