1 Reply Latest reply on Feb 14, 2007 5:42 PM by belaban

    Problem merging a split cluster

    xyoungli

      I am seeing a cluster split in our production JBoss environment and am wondering why the merge process is not working. The configuration is as follows:
      - JBoss: 4.0.3SP1
      - OS: Solaris
      - Cluster: 5 server instances on the same box, using TCP as the transport:

      <TCP bind_addr="localhost" start_port="${jboss.cluster.tcp.port:7800}" loopback="true"/>
      <TCPPING initial_hosts="localhost[${jboss.cluster.tcp.port:7800}]" port_range="${jboss.cluster.tcp.port.range:5}" timeout="3500"
          num_initial_members="${jboss.cluster.tcp.members:5}" up_thread="true" down_thread="true"/>
      <MERGE2 min_interval="5000" max_interval="10000"/>
      <FD shun="true" timeout="5000" max_tries="5" up_thread="false" down_thread="false" />
      <VERIFY_SUSPECT timeout="4000" down_thread="false" up_thread="false" />
      <pbcast.NAKACK down_thread="true" up_thread="true" gc_lag="100"
          retransmit_timeout="3000"/>
      <pbcast.STABLE desired_avg_gossip="20000" down_thread="false" up_thread="false" />
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="false"
          print_local_addr="true" down_thread="true" up_thread="true"/>
      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
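      For reference, the ${name:default} placeholders in the stack above resolve to the system property if it is set and to the value after the colon otherwise, so with no -D overrides TCPPING would look for members on localhost starting at port 7800. The Java sketch below is a hypothetical helper (TcpPingPorts, resolve and probedAddresses are illustrative names, not JBoss or JGroups API) that performs that substitution and lists the ports a TCPPING-style discovery would probe. It assumes the probe covers port through port + port_range - 1 on each host; the exact upper bound has differed between JGroups releases, so treat it as an approximation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not JBoss's own property replacer: resolves
 * "${name:default}" placeholders and lists the ports a TCPPING-style
 * discovery would probe for a given initial_hosts / port_range pair.
 */
public class TcpPingPorts {
    // Matches ${name} or ${name:default}; group 2 is null when there is no default.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    /** Replaces each placeholder with the property value, else its default, else leaves it as-is. */
    public static String resolve(String value, Properties props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fallback = m.group(2) != null ? m.group(2) : m.group(0);
            String replacement = props.getProperty(m.group(1), fallback);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /**
     * Expands "host[port],host[port]" into host:port entries.
     * ASSUMPTION: probes port .. port + portRange - 1 on each host.
     */
    public static List<String> probedAddresses(String initialHosts, int portRange) {
        List<String> result = new ArrayList<>();
        for (String entry : initialHosts.split(",")) {
            String trimmed = entry.trim();
            String host = trimmed.substring(0, trimmed.indexOf('['));
            int port = Integer.parseInt(
                    trimmed.substring(trimmed.indexOf('[') + 1, trimmed.indexOf(']')));
            for (int p = port; p < port + portRange; p++) {
                result.add(host + ":" + p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties(); // no -D overrides, so the defaults apply
        String hosts = resolve("localhost[${jboss.cluster.tcp.port:7800}]", props);
        int range = Integer.parseInt(resolve("${jboss.cluster.tcp.port.range:5}", props));
        System.out.println(hosts + " -> " + probedAddresses(hosts, range));
    }
}
```

      Passing System.getProperties() instead of an empty Properties object would show the values a given instance actually picks up from its -D flags, which can then be compared with the member ports listed in the views.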


      Four instances formed one cluster and the fifth formed another on its own. The following views are from the JMX console:
      1) CurrentView java.util.Vector R [162.111.75.85:3599, 162.111.75.85:3799, 162.111.75.85:3499, 162.111.75.85:3899]

      2) CurrentView java.util.Vector R [162.111.75.85:3699]
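      In JGroups the coordinator of a view is its first member, so the two views above have different coordinators (162.111.75.85:3599 and 162.111.75.85:3699). Roughly speaking, MERGE2 has a coordinator rerun discovery at a random interval between min_interval and max_interval, and a merge is started only when the discovery responses reveal more than one coordinator. The sketch below is a simplified model of that check, not JGroups' MERGE2 code; SplitCheck and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Simplified model (not JGroups' MERGE2 implementation): a merge is only
 * warranted when a discovery round turns up more than one distinct coordinator.
 */
public class SplitCheck {
    /** The coordinator of a JGroups view is its first (oldest) member. */
    public static String coordinator(List<String> view) {
        return view.get(0);
    }

    /** Distinct coordinators among the views a discovery round managed to reach. */
    public static Set<String> coordinators(List<List<String>> discoveredViews) {
        Set<String> coords = new LinkedHashSet<>();
        for (List<String> view : discoveredViews) {
            coords.add(coordinator(view));
        }
        return coords;
    }

    public static void main(String[] args) {
        List<String> viewA = Arrays.asList("162.111.75.85:3599", "162.111.75.85:3799",
                "162.111.75.85:3499", "162.111.75.85:3899");
        List<String> viewB = Arrays.asList("162.111.75.85:3699");

        // Both subgroups discovered: two coordinators, so a merge should be attempted.
        Set<String> both = coordinators(Arrays.asList(viewA, viewB));
        System.out.println("coordinators=" + both + ", merge needed=" + (both.size() > 1));

        // Only one subgroup discovered: a single coordinator, so no merge is attempted.
        Set<String> onlyA = coordinators(Arrays.asList(viewA));
        System.out.println("coordinators=" + onlyA + ", merge needed=" + (onlyA.size() > 1));
    }
}
```

      Under this model, if discovery from one coordinator never reaches the other's address, each side only ever sees itself and no merge is attempted, so the TCPPING settings may be worth checking alongside MERGE2.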

      Why doesn't the merge process work?
      Is the TCP configuration wrong somewhere? The cluster had been working fine for a few weeks, and it split after a restart today.