TCP clustering problem

tyke16 Feb 26, 2008 3:14 PM

I am having difficulty getting a four-node, TCP-based JGroups (2.4.1) cluster to operate properly.
The cluster will function as expected until the coordinator dies or is gracefully shut down. At that point the three remaining nodes do not 'elect' a new coordinator and are forever waiting for the old coordinator to come back online.
When I try to re-introduce the previous coordinator into the cluster, it hangs while trying to rejoin: the other nodes still regard it as the coordinator, so it is effectively trying to join itself.
I've tested this same scenario using UDP multicast communication and it works fine. However, TCP is the only option we have in our target production environment.

Any help would be great. Here is a snippet of my cluster configuration:
++++++++++++++
<TCP loopback="true"
     start_port="6006"
     bind_addr="10.10.21.73"/>
<TCPPING initial_hosts="vhcertrh01[6006],vhcertrh01[6106],vhcertrh02[6006],vhcertrh02[6106]"
         port_range="10"
         timeout="3000"
         num_initial_members="2"/>
<pbcast.NAKACK gc_lag="50"
               retransmit_timeout="600,1200,2400,4800"
               max_xmit_size="8192"
               up_thread="false"
               down_thread="false"/>
<UNICAST timeout="600,1200,2400"
         window_size="100"
         min_threshold="10"
         down_thread="false"/>
<pbcast.STABLE desired_avg_gossip="20000"
               up_thread="false"
               down_thread="false"/>
<FRAG frag_size="8192"
      down_thread="false"
      up_thread="false"/>
<pbcast.GMS join_timeout="5000"
            join_retry_timeout="2000"
            shun="true"
            print_local_addr="true"/>
<pbcast.STATE_TRANSFER
            up_thread="true"
            down_thread="true"/>
<FD timeout="2500"
    max_tries="3"
    shun="true"/>
<FD_SOCK />
++++++++++++++++++++++++++
Thanks,
Tyke

1. Re: TCP clustering problem

belaban Feb 26, 2008 5:53 PM (in response to tyke16)

Where did you get this config from? FD and FD_SOCK at the top of the stack?

I suggest taking one of the configs shipped with JGroups (e.g. tcp.xml) and using it with modifications...
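
For illustration, here is a rough sketch of what a reordered stack might look like, reusing only the protocols and attribute values from the original post. The one change is that FD_SOCK and FD are moved down to sit just above TCPPING, so that they end up below pbcast.GMS. The reasoning, as far as I understand it, is that the failure detection protocols pass their suspicions up the stack, and GMS can only react to a dead coordinator if those events actually travel through it. This is not the tcp.xml that ships with JGroups, which typically contains additional protocols (such as MERGE2 and VERIFY_SUSPECT) and different values; the <config> wrapper is assumed here because the original post shows only a fragment.

```xml
<config>
    <!-- Transport and discovery stay at the bottom (values copied from the original post) -->
    <TCP loopback="true"
         start_port="6006"
         bind_addr="10.10.21.73"/>
    <TCPPING initial_hosts="vhcertrh01[6006],vhcertrh01[6106],vhcertrh02[6006],vhcertrh02[6106]"
             port_range="10"
             timeout="3000"
             num_initial_members="2"/>

    <!-- Failure detection moved down here, below pbcast.GMS,
         so that suspicions raised by FD_SOCK/FD pass through GMS on their way up -->
    <FD_SOCK/>
    <FD timeout="2500"
        max_tries="3"
        shun="true"/>

    <!-- Remaining protocols keep the relative order of the original post -->
    <pbcast.NAKACK gc_lag="50"
                   retransmit_timeout="600,1200,2400,4800"
                   max_xmit_size="8192"
                   up_thread="false"
                   down_thread="false"/>
    <UNICAST timeout="600,1200,2400"
             window_size="100"
             min_threshold="10"
             down_thread="false"/>
    <pbcast.STABLE desired_avg_gossip="20000"
                   up_thread="false"
                   down_thread="false"/>
    <FRAG frag_size="8192"
          down_thread="false"
          up_thread="false"/>
    <pbcast.GMS join_timeout="5000"
                join_retry_timeout="2000"
                shun="true"
                print_local_addr="true"/>
    <pbcast.STATE_TRANSFER up_thread="true"
                           down_thread="true"/>
</config>
```

A simple way to check a layout like this is to repeat the original test: kill the coordinator and watch whether the three surviving nodes install a new view with a new coordinator.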
2. Re: TCP clustering problem

tyke16 Feb 27, 2008 11:59 AM (in response to tyke16)

Thanks for the info. That configuration seems to be working.

Can you give me the Reader's Digest version of why the position of each protocol in the stack matters?

Thank you,
Tyke
3. Re: TCP clustering problem

belaban Feb 27, 2008 12:05 PM (in response to tyke16)

Read the documentation and the wiki (the latter has a good discussion of the protocols).