2 Replies Latest reply on Apr 27, 2006 6:15 PM by coachvargo

    Cluster problem: Farm works, but HASingleton service does not work

    coachvargo Newbie

      I am having a problem with clustering on two servers running Red Hat Enterprise Linux. I have set up the clustering to use the TCP configuration. It works fine with two servers I have locally, but not with two servers at my ISP. Both servers start up, and I can see in the logs that they recognize each other... sort of. Farming works fine: I can copy an XXXXX-ds.xml file to the farm directory and it is properly sent to the other server.

      Both servers list the cluster as having 2 members, but neither of them will run the HASingleton services I set up. When I start just the first server, the services run correctly and MasterNode = true. When I start the second server, it joins the cluster, the HASingleton services are destroyed on the main node (they no longer appear in the JMX console on either server), and then BOTH nodes show MasterNode = false for the HASingleton service. The .sar files exist on both servers in the deploy-hasingleton directory, so that isn't the issue.
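      To make the expected behavior concrete, here is a simplified model of how I understand HA-singleton master election to work: exactly one node is master, namely the first entry in the cluster's membership list, and a node reports MasterNode = true only if its own address matches that entry. The class and method names are hypothetical and not the JBoss API, and the addresses are made up. The point it illustrates is that a node which cannot identify itself (its own address is null, or it believes it is 127.0.0.1 while the list holds its real IP) never matches, so both nodes can end up reporting false at once.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified model of HA-singleton master election.
// Assumption: the master is the first entry in the membership list, and a
// node reports MasterNode = true only if its own address equals that entry.
// These names are illustrative and are not part of the JBoss API.
class SingletonElectionModel {

    static boolean isMasterNode(String myAddress, List<String> members) {
        if (myAddress == null || members.isEmpty()) {
            // A node that does not know its own address can never match an
            // entry in the list, so it never considers itself master.
            return false;
        }
        return Objects.equals(members.get(0), myAddress);
    }

    public static void main(String[] args) {
        // Single node running alone: it is first in the list, so it is master.
        System.out.println(isMasterNode("10.0.0.1:7800", List.of("10.0.0.1:7800"))); // true

        List<String> members = List.of("10.0.0.1:7800", "10.0.0.2:7800");

        // Healthy two-node cluster: exactly one node is master.
        System.out.println(isMasterNode("10.0.0.1:7800", members)); // true
        System.out.println(isMasterNode("10.0.0.2:7800", members)); // false

        // Node whose own address is unknown (compare "I am (null)" in the log).
        System.out.println(isMasterNode(null, members)); // false

        // Node that thinks it is "127.0.0.1:7800" while the list holds its
        // real address: no match either, so it also reports false.
        System.out.println(isMasterNode("127.0.0.1:7800", members)); // false
    }
}
```

      In a real cluster the membership list comes from JGroups and the singleton logic lives in JBoss; this only shows why "no node recognizes itself" and "both nodes report false" go together.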

      Does anyone have any ideas? Here is a log sample from the second node in the cluster.
      I can see my configuration is making it into the log OK:

      2005-12-06 11:51:53,486 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Setting JGProps from xml to: TCP(bind_addr=;loopback=true;start_port=7800):TCPPING(down_thread=true;
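      The logged property string uses the old JGroups format, where protocols are separated by ':' and each protocol's parameters sit in parentheses separated by ';'. A quick way to see what each protocol actually received is to split the string apart. The sketch below is a hypothetical helper (not part of JGroups) and assumes no ':' appears inside a parameter value. Run on the truncated line above, it shows that TCP's bind_addr is empty. That may just be because the address was blanked out for posting, but if it really is empty, the address JGroups binds to is not being pinned by the config, which is worth ruling out given the localhost members further down.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits an old-style JGroups property string of the
// form "PROTO(k=v;k=v):PROTO(...)" into protocol name -> parameters.
// Assumes no ':' appears inside parameter values.
class JGroupsPropsParser {

    static Map<String, Map<String, String>> parse(String props) {
        Map<String, Map<String, String>> stack = new LinkedHashMap<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < props.length(); i++) {
            char c = props.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ':' && depth == 0) {
                // A top-level ':' separates two protocols in the stack.
                addProtocol(stack, props.substring(start, i).trim());
                start = i + 1;
            }
        }
        // The last protocol (also handles a string that was cut off mid-way).
        addProtocol(stack, props.substring(start).trim());
        return stack;
    }

    private static void addProtocol(Map<String, Map<String, String>> stack, String spec) {
        if (spec.isEmpty()) {
            return;
        }
        Map<String, String> params = new LinkedHashMap<>();
        int open = spec.indexOf('(');
        String name = open < 0 ? spec : spec.substring(0, open);
        if (open >= 0) {
            int close = spec.lastIndexOf(')');
            String body = spec.substring(open + 1, close > open ? close : spec.length());
            for (String pair : body.split(";")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                // "bind_addr=" yields an empty string value, not a missing key.
                params.put(eq < 0 ? pair : pair.substring(0, eq), eq < 0 ? "" : pair.substring(eq + 1));
            }
        }
        stack.put(name, params);
    }

    public static void main(String[] args) {
        // The (truncated) value from the DEBUG line in the post.
        String fromLog = "TCP(bind_addr=;loopback=true;start_port=7800):TCPPING(down_thread=true;";
        Map<String, Map<String, String>> stack = parse(fromLog);
        System.out.println(stack);
        if (stack.get("TCP").get("bind_addr").isEmpty()) {
            System.out.println("TCP bind_addr is empty in the logged config");
        }
    }
}
```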

      Here are the results of the TCPPING requests. I find them a little strange, since the IP address I have in my config for the other machine is being resolved to its network alias:

      2005-12-06 11:51:54,021 DEBUG [org.jgroups.protocols.TCPPING] [FIND_INITIAL_MBRS] sending PING request to st2clxll13:7800
      2005-12-06 11:51:54,022 DEBUG [org.jgroups.protocols.TCP] dest=st2clxll13:7800, hdrs:
      TCP: [TCP:group_addr=DefaultPartition]
      TCPPING: [PING: type=GET_MBRS_REQ, arg=null]
      2005-12-06 11:51:54,023 DEBUG [org.jgroups.protocols.TCPPING] [FIND_INITIAL_MBRS] sending PING request to st2clxll13:7801
      2005-12-06 11:51:54,024 DEBUG [org.jgroups.protocols.TCPPING] [FIND_INITIAL_MBRS] sending PING request to st2clxll13:7802
      2005-12-06 11:51:54,032 DEBUG [org.jgroups.protocols.TCP] opened connection to st2clxll13:7800
      2005-12-06 11:51:54,032 INFO [org.jgroups.blocks.ConnectionTable] connection was created to st2clxll13:7800
      2005-12-06 11:51:54,032 INFO [org.jgroups.blocks.ConnectionTable] created socket to st2clxll13:7800
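      The three PING requests above go to consecutive ports (7800, 7801, 7802) because TCPPING probes a small range of ports on each initial host, and only 7800 answered. The sketch below is a hypothetical helper (not the JGroups API) that expands an initial_hosts-style list into probe targets. It assumes entries of the form host[port] and that port_range counts the total number of ports probed per host; how many ports a given port_range value covers has differed between JGroups versions, so the value 3 here is an assumption chosen to match the log. The hostname is used only so the output lines up with the log; the actual config holds the other machine's IP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JGroups: expands "host[port],host[port]"
// into host:port probe targets, probing portRange consecutive ports per host
// starting at the listed port.
class TcpPingTargets {

    static List<String> expand(String initialHosts, int portRange) {
        List<String> targets = new ArrayList<>();
        for (String entry : initialHosts.split(",")) {
            String e = entry.trim();
            // Assumes every entry has the form host[port].
            int open = e.indexOf('[');
            int close = e.indexOf(']');
            String host = e.substring(0, open);
            int port = Integer.parseInt(e.substring(open + 1, close));
            for (int p = port; p < port + portRange; p++) {
                targets.add(host + ":" + p);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Hypothetical values: the post's actual initial_hosts setting is not shown.
        System.out.println(expand("st2clxll13[7800]", 3));
    }
}
```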

      Here's the membership info from the log. It lists both members of the cluster as localhost, and notice that "I am" is (null), where it should be the IP address of the host machine:

      2005-12-06 11:51:57,637 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] I am (null) received membershipChanged event:
      2005-12-06 11:51:57,638 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] Dead members: 0 ([])
      2005-12-06 11:51:57,638 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] New Members : 0 ([])
      2005-12-06 11:51:57,638 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] All Members : 2 ([,])
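      Since both members show up as localhost and "I am" is (null), one thing worth checking (an assumption on my part; the log does not prove it) is what each machine's own hostname resolves to. A Red Hat /etc/hosts that maps the machine's hostname to 127.0.0.1 is often mentioned as a cause of cluster members identifying themselves as localhost. The small program below is a hypothetical diagnostic, not part of JBoss or JGroups; it uses only java.net.InetAddress to print what the local hostname and any given peer address resolve to, and flags loopback results.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Diagnostic sketch (hypothetical helper, not part of JBoss or JGroups):
// shows what a name or IP resolves to and whether the result is loopback.
class AddressCheck {

    static String describe(String nameOrIp) {
        try {
            InetAddress addr = InetAddress.getByName(nameOrIp);
            String kind = addr.isLoopbackAddress() ? "LOOPBACK" : "non-loopback";
            // For a literal IP, getHostName() does a reverse lookup, which is
            // one way an IP from the config can be printed as a host alias.
            return nameOrIp + " -> " + addr.getHostAddress()
                    + " (" + addr.getHostName() + ", " + kind + ")";
        } catch (UnknownHostException e) {
            return nameOrIp + " -> does not resolve";
        }
    }

    static String describeLocalHost() {
        try {
            // The address the JVM picks for this machine's own hostname.
            return describe(InetAddress.getLocalHost().getHostName());
        } catch (UnknownHostException e) {
            return "local hostname does not resolve: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("this host: " + describeLocalHost());
        // Optional extra arguments, e.g. the peer's IP from the cluster config.
        for (String arg : args) {
            System.out.println(describe(arg));
        }
    }
}
```

      Run on each node with the other node's IP as an argument. If "this host" comes back as LOOPBACK, fixing /etc/hosts or setting TCP's bind_addr explicitly to the machine's real IP would usually be the next thing to try.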