7 Replies Latest reply on May 18, 2009 9:36 AM by Bela Ban

    Facing Reincarnation Error?

    Larry Jiang Newbie

      I am using JBoss 4.0.3SP1 (which ships with JGroups 2.2.7) to run a 3-node cluster on Windows 2003 with JDK 1.4.2_17.

      I configured a TCP stack for the underlying communication in both cluster-service.xml and tc5-cluster-service.xml, as follows:

       <TCP bind_addr="10.200.**.1" start_port="7800" loopback="true"/>
       <TCPPING initial_hosts="10.200.**.1[7800],10.200.**.2[7800],10.200.**.3[7800]" port_range="3" timeout="10000"
       num_initial_members="3" up_thread="true" down_thread="true"/>
       <MERGE2 min_interval="5000" max_interval="10000"/>
       <FD shun="false" timeout="15000" max_tries="5" up_thread="true" down_thread="true" />
       <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false" />
       <pbcast.NAKACK down_thread="true" up_thread="true" gc_lag="100"/>
       <pbcast.STABLE desired_avg_gossip="20000" down_thread="false" up_thread="false" />
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="false"
       print_local_addr="true" down_thread="true" up_thread="true"/>
       <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
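
      As a rough sketch of what these timeouts imply: assuming FD suspects a member only after max_tries missed heartbeats of timeout ms each, and VERIFY_SUSPECT then waits its own timeout before confirming, the worst-case delay before a dead node is removed from the view can be estimated as below. The function name is made up for illustration, and the formula is an assumption about how the two protocols combine, not something taken from the JGroups source.

```python
# Values copied from the FD and VERIFY_SUSPECT lines of the stack above.
FD_TIMEOUT_MS = 15000
FD_MAX_TRIES = 5
VERIFY_SUSPECT_TIMEOUT_MS = 1500


def worst_case_detection_ms(fd_timeout_ms, fd_max_tries, verify_timeout_ms):
    """Estimate the delay before a crashed member is suspected and verified.

    Assumption: FD needs fd_max_tries missed heartbeats, each taking
    fd_timeout_ms, and VERIFY_SUSPECT adds one more verify_timeout_ms.
    """
    return fd_timeout_ms * fd_max_tries + verify_timeout_ms


detection_ms = worst_case_detection_ms(
    FD_TIMEOUT_MS, FD_MAX_TRIES, VERIFY_SUSPECT_TIMEOUT_MS
)
print(f"Estimated detection delay: {detection_ms / 1000:.1f} s")
```

      Under that assumption the stack needs roughly 76.5 seconds to notice that a node is gone, so a node that restarts sooner may still be listed in the coordinator's view.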

      The problem is that if I shut down one server (say, node B) and restart it immediately, the following ERROR messages appear:
      on the B side:
      09:07:08|ERROR [ParticipantGmsImpl] handleJoinResponse() should not be invoked on an instance of org.jgroups.protocols.pbcast.ParticipantGmsImpl

      and after a while also:
      ERROR [GMS] [B:7800(additional data:18bytes)] received view <= current view; discarding it (current vid:[A:7800(additional data:18bytes)|2],new vid:[A:7800(additional data:18bytes)|2])

      on the A side:
      09:07:03|ERROR [CoordGmsImpl] member B:7800 already present; return existing view [A:7800,B:7800]
      09:07:07|ERROR [GMS] [A:7800] received view <= current view; discarding it (current vid:[A:7800|1], new vid:[B:7800|1])

      After B has started up, the web console shows that B has joined the cluster view. Is this a properly formed cluster? Can I simply ignore these ERROR messages? It would be great if someone could explain exactly what these messages mean and what is happening behind the scenes.

      Is this a typical reincarnation error?
      Is there a way to fix this problem? It happens a lot and is really annoying.

      My workaround (it works sometimes, but it feels clumsy):
      After I shut down a node, I wait about 4 minutes before restarting it. The ERROR messages then no longer show up.

      Thanks for any ideas.