2 Replies Latest reply on Apr 27, 2006 6:15 PM by coachvargo

Cluster problem: Farm works, but HASingleton service does no

coachvargo Dec 6, 2005 1:50 PM

I am having a problem with clustering on 2 servers running Red Hat Enterprise Linux. I have set up the clustering to use the TCP config. It works fine with 2 servers I have locally, but not on 2 servers at my ISP. The 2 servers get up and running, and I can see in the logs that they recognize each other... sort of. Farming works fine: I can copy a XXXXX-ds.xml file to the farm directory and it is properly sent to the other server. Both list the cluster as having 2 members, but neither of them wants to run the HASingleton services I set up.

When I start just the first server, the service runs correctly and MasterNode = true. When I start the second server, it joins the cluster, the HASingleton services I set up are destroyed on the main node (they no longer appear in the JMX console on either server), and BOTH nodes show MasterNode = false in the HASingleton service. The .sar files exist on both servers in the deploy-hasingleton directory, so that isn't the issue here.

Anyone have any ideas? Here is a log sample from the second node in the cluster.
I can see my config is making it ok to the log:

2005-12-06 11:51:53,486 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Setting JGProps from xml to: TCP(bind_addr=172.25.5.30;loopback=true;start_port=7800):TCPPING(down_thread=true;
initial_hosts=172.25.5.30[7800],172.25.5.29[7800];num_initial_members=3;port_range=3;
timeout=3500;up_thread=true):MERGE2(max_interval=10000;min_interval=5000):
FD(down_thread=true;max_tries=5;shun=true;timeout=2500;up_thread=true):
VERIFY_SUSPECT(down_thread=false;timeout=1500;up_thread=false):
pbcast.NAKACK(down_thread=true;gc_lag=100;retransmit_timeout=3000;up_thread=true):
pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):
pbcast.GMS(down_thread=true;join_retry_timeout=2000;join_timeout=5000;
print_local_addr=true;shun=false;up_thread=true):
pbcast.STATE_TRANSFER(down_thread=true;up_thread=true)

Here are the results of the TCPPING requests, which I think are a little strange, since the IP address I have in my config for the other machine is being resolved to its network alias:

2005-12-06 11:51:54,021 DEBUG [org.jgroups.protocols.TCPPING] [FIND_INITIAL_MBRS] sending PING request to st2clxll13:7800
2005-12-06 11:51:54,022 DEBUG [org.jgroups.protocols.TCP] dest=st2clxll13:7800, hdrs:
TCP: [TCP:group_addr=DefaultPartition]
TCPPING: [PING: type=GET_MBRS_REQ, arg=null]
2005-12-06 11:51:54,023 DEBUG [org.jgroups.protocols.TCPPING] [FIND_INITIAL_MBRS] sending PING request to st2clxll13:7801
2005-12-06 11:51:54,024 DEBUG [org.jgroups.protocols.TCPPING] [FIND_INITIAL_MBRS] sending PING request to st2clxll13:7802
2005-12-06 11:51:54,032 DEBUG [org.jgroups.protocols.TCP] opened connection to st2clxll13:7800
2005-12-06 11:51:54,032 INFO [org.jgroups.blocks.ConnectionTable] connection was created to st2clxll13:7800
2005-12-06 11:51:54,032 INFO [org.jgroups.blocks.ConnectionTable] created socket to st2clxll13:7800

Here's the membership info from the log. It lists both members of the cluster as localhost (127.0.0.1), and notice that "I am" is null, where it should be the IP address of the host machine:

2005-12-06 11:51:57,637 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] I am (null) received membershipChanged event:
2005-12-06 11:51:57,638 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] Dead members: 0 ([])
2005-12-06 11:51:57,638 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] New Members : 0 ([])
2005-12-06 11:51:57,638 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] All Members : 2 ([127.0.0.1:1099, 127.0.0.1:1099])

1. Re: Cluster problem: Farm works, but HASingleton service doe

anubisman Apr 27, 2006 8:07 AM (in response to coachvargo)

Hi,
I am having problems with farming.
It doesn't work with my configuration (on Windows XP).

Could you please post your farming configuration, if possible?

Best regards,
Aram.
2. Re: Cluster problem: Farm works, but HASingleton service doe

coachvargo Apr 27, 2006 6:15 PM (in response to coachvargo)

I figured this out long ago, but I thought I'd leave the solution that worked for me. There were some extraneous entries in the /etc/hosts file on the servers, put there by my ISP for a few extra server aliases. Once I removed those, the servers were able to find each other as expected.
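
For anyone hitting the same symptom, here is a rough sketch of how the hosts file could be checked from Java. It is not something from my original setup: the class name HostsCheck and the method loopbackNames are made up for illustration, and it assumes the troublesome entries are ones that map a name to a loopback address, which is what the 127.0.0.1:1099 entries in the membership log suggest but which I have not verified for every case.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JBoss or JGroups.
public class HostsCheck {

    /** Returns every name that the given hosts-file text maps to 127.x.x.x or ::1. */
    public static List<String> loopbackNames(String hostsText) {
        List<String> names = new ArrayList<>();
        for (String rawLine : hostsText.split("\\R")) {
            // Drop anything after '#', then skip blank lines.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            if (line.isEmpty()) {
                continue;
            }
            // First field is the address, the rest are names and aliases.
            String[] fields = line.split("\\s+");
            String addr = fields[0];
            if (addr.startsWith("127.") || addr.equals("::1")) {
                for (int i = 1; i < fields.length; i++) {
                    names.add(fields[i]);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // What does this machine's own hostname resolve to?
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println(local.getHostName() + " -> " + local.getHostAddress());
            if (local.isLoopbackAddress()) {
                System.out.println("WARNING: local hostname resolves to a loopback address");
            }
        } catch (UnknownHostException e) {
            System.out.println("Local hostname could not be resolved: " + e.getMessage());
        }

        // List the names that /etc/hosts maps to loopback, if the file is readable.
        Path hosts = Paths.get("/etc/hosts");
        if (Files.isReadable(hosts)) {
            String text = new String(Files.readAllBytes(hosts));
            System.out.println("Names mapped to loopback: " + loopbackNames(text));
        }
    }
}
```

Running it on each node prints what the local hostname resolves to and which names the hosts file ties to loopback. If the machine's real hostname or one of the cluster aliases shows up in that list, that entry is a candidate for removal.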