3 Replies Latest reply on Feb 27, 2008 12:05 PM by belaban

    TCP clustering problem

    tyke16

      I am having difficulty getting a four-node, TCP-based JGroups (2.4.1) cluster to operate properly.
      The cluster functions as expected until the coordinator dies or is gracefully shut down. At that point the three remaining nodes do not 'elect' a new coordinator and wait indefinitely for the old coordinator to come back online.
      When I then try to reintroduce the previous coordinator into the cluster, it hangs while trying to rejoin through the coordinator that the other nodes still have on record, which is itself.
      I've tested this same scenario using UDP multicast communication and it works fine. However, TCP is the only option we have in our target production environment.
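      For context on the behaviour I'm expecting: as far as I know, JGroups treats the first (oldest) member of the current view as the coordinator. Once the old coordinator has been removed from the view, the next member in line should take over. The sketch below models only that rule in plain Java. It does not use the JGroups API, and the class name, method names, and member strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, JGroups-free model of the coordinator rule:
 * the coordinator is the first (oldest) member of the current view.
 */
public class CoordinatorSketch {

    /** Returns the coordinator of a view (its first member), or null for an empty view. */
    static String coordinator(List<String> view) {
        return view.isEmpty() ? null : view.get(0);
    }

    /** Builds the next view after a member has left or been excluded. */
    static List<String> withoutMember(List<String> view, String departed) {
        List<String> next = new ArrayList<>(view);
        next.remove(departed); // remove(Object): drops the departed member, keeps join order
        return next;
    }

    public static void main(String[] args) {
        // Four members in join order; the host:port strings are illustrative only.
        List<String> view = List.of("vhcertrh01:6006", "vhcertrh01:6106",
                                    "vhcertrh02:6006", "vhcertrh02:6106");
        System.out.println("coordinator: " + coordinator(view));

        // Model the coordinator crashing: it is removed, the next-oldest member takes over.
        List<String> next = withoutMember(view, coordinator(view));
        System.out.println("after crash: " + coordinator(next));
    }
}
```

      Running main prints vhcertrh01:6006 as the coordinator and then vhcertrh01:6106 once the first member is removed. In my TCP setup that second step never seems to happen.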

      Any help would be great. Here is a snippet of my cluster configuration:
      ++++++++++++++
      <TCP loopback="true"
           start_port="6006"
           bind_addr="10.10.21.73"/>
      <TCPPING initial_hosts="vhcertrh01[6006],vhcertrh01[6106],vhcertrh02[6006],vhcertrh02[6106]"
               port_range="10"
               timeout="3000"
               num_initial_members="2"/>
      <pbcast.NAKACK gc_lag="50"
                     retransmit_timeout="600,1200,2400,4800"
                     max_xmit_size="8192"
                     up_thread="false"
                     down_thread="false"/>
      <UNICAST timeout="600,1200,2400"
               window_size="100"
               min_threshold="10"
               down_thread="false"/>
      <pbcast.STABLE desired_avg_gossip="20000"
                     up_thread="false"
                     down_thread="false"/>
      <FRAG frag_size="8192"
            down_thread="false"
            up_thread="false"/>
      <pbcast.GMS join_timeout="5000"
                  join_retry_timeout="2000"
                  shun="true"
                  print_local_addr="true"/>
      <pbcast.STATE_TRANSFER
          up_thread="true"
          down_thread="true"/>
      <FD timeout="2500"
          max_tries="3"
          shun="true"/>
      <FD_SOCK />
      ++++++++++++++++++++++++++
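      In a JGroups XML stack the protocols are listed bottom-up: the first element (TCP here) is the transport at the bottom, and each following element sits above the previous one, so FD_SOCK is the topmost layer in the snippet above. Here is a minimal sketch, using only the JDK's DOM parser, that prints a stack fragment from top to bottom so the layering is easy to read. The <config> wrapper and the shortened fragment in main are assumptions for illustration, not part of my actual file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StackOrder {

    /** Returns protocol element names in document order, i.e. bottom of the stack first. */
    static List<String> protocols(String fragment) {
        // The snippet has no root element, so wrap it in a hypothetical <config> element.
        String xml = "<config>" + fragment + "</config>";
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> names = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(n.getNodeName()); // e.g. "TCP", "pbcast.GMS"
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed stack fragment", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the stack above (attributes trimmed for brevity).
        String fragment = "<TCP start_port=\"6006\"/><TCPPING timeout=\"3000\"/>"
                + "<pbcast.GMS join_timeout=\"5000\"/><FD timeout=\"2500\"/><FD_SOCK/>";
        List<String> order = protocols(fragment);
        // Print from the top of the stack down to the transport.
        for (int level = order.size() - 1; level >= 0; level--) {
            System.out.println(order.get(level));
        }
    }
}
```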
      Thanks,
      Tyke