3 Replies Latest reply on Apr 7, 2006 5:44 PM by Bela Ban

    JGroups UNICAST - previous_members not updated on node rejoi

    Fredrik Johansson Newbie

      I am not sure whether this forum is the right place for JGroups-specific issues, but here goes.

      I am running a cluster with 4 nodes (1, 2, 3, 4). They all share a DistributedHashtable (JGroups-specific and, yes, deprecated).

      When the cluster is started, everything is fine. I then kill node2 and wait for the other nodes to detect the member loss (FD timeout). Once they have detected it, I restart node2, which rejoins the cluster.

      After the restart, when I add an entry to the DistributedHashtable on node2, the call hangs in the .put(...) method.

      I have pinpointed the problem: node3 and node4 do not send a response to node2 for the _put RPC call. They discard the response message because they still think that node2 has left the group:

      2006-04-06 17:16:52,178 UpHandler (COMPRESS) DEBUG org.jgroups.blocks.RequestCorrelator handleRequest:639 - sending rsp for 1144336568899 to
      2006-04-06 17:16:52,178 DownHandler (UNICAST) DEBUG org.jgroups.protocols.UNICAST down:242 - discarding message to as this member left the group, previous_members=[ ]

      At line 242 of UNICAST we see that if the list previous_members contains node2, the message is discarded (which is obviously what happens here). However, node2 should be removed from previous_members as soon as data arrives from node2. That removal is done in handleDataReceived in UNICAST, but for this channel it never happens. It does happen on my other JChannels (replicated hashtables), just not on this particular one.
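
      To make the mechanism concrete, here is a minimal sketch of the behaviour as I understand it. It is a simplified model, not the JGroups source: only the names previous_members and handleDataReceived come from UNICAST; the class name, memberLeft, and the boolean return of down are assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified, hypothetical model of the behaviour described above.
// This is NOT the JGroups UNICAST implementation.
class PreviousMembersSketch {
    // Members that have left the group and have not been heard from since.
    private final Set<String> previousMembers = new LinkedHashSet<>();

    // Hypothetical hook: a view change reports that a member left.
    void memberLeft(String member) {
        previousMembers.add(member);
    }

    // Receive path: data from a member proves it is back, so forget that it left.
    void handleDataReceived(String sender) {
        previousMembers.remove(sender);
    }

    // Send path: returns true if the message would be sent,
    // false if it would be discarded because dest is still in previousMembers.
    boolean down(String dest) {
        return !previousMembers.contains(dest);
    }

    public static void main(String[] args) {
        PreviousMembersSketch unicast = new PreviousMembersSketch();
        unicast.memberLeft("node2");
        System.out.println(unicast.down("node2")); // false: discarded
        unicast.handleDataReceived("node2");
        System.out.println(unicast.down("node2")); // true: sent
    }
}
```

      In this model, the hang I see corresponds to down being called for node2 without handleDataReceived ever having run for node2 after the rejoin.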

      This happens with 2.2.9 final. When I revert to 2.2.8, it works: the put RPC is acknowledged and answered by all members in the view. We do, however, need the fine-grained interface binding in 2.2.9.

      Does anyone have any input on why handleDataReceived is not called in UNICAST, or know of a good solution or workaround?

      Fredrik Johansson