-
1. Re: Issues with shunned node rejoining a cluster
manik Oct 11, 2006 7:14 AM (in response to brian.stansberry)
1) At the moment, you are correct, an exception is thrown. This should propagate up to the caller though.
2) Yes again; state is transferred when new buddies join (or shunned ones rejoin). No assumption is made about whether the buddy already holds some state.
3) We need to think about what the correct behaviour here should be. You are correct in that the current behaviour is incorrect.
4) Agreed.
The thing is,
1) How do we detect a shun? Do we have a JGroups notification for this?
2) How can we transparently deal with the merge? Simple case of wiping the joiner's in-memory cache? :( -
2. Re: Issues with shunned node rejoining a cluster
brian.stansberry Oct 11, 2006 9:48 AM (in response to brian.stansberry)
"manik.surtani@jboss.com" wrote:
How do we detect a shun? Do we have a JGroups notification for this?
The org.jgroups.ChannelListener interface provides a callback when the channel is disconnected. That happens either with a shun or during a cache stop(), so the listener can tell the two apart as long as it knows whether the cache is stopping.
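A rough sketch of that idea, assuming the listener is told when a stop begins. To keep the example compilable without JGroups, `DisconnectCallback` is a hypothetical stand-in for the disconnect callback on org.jgroups.ChannelListener; `ShunAwareListener`, `beginStop()` and `wasShunned()` are illustrative names, not JBoss Cache API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the disconnect callback of org.jgroups.ChannelListener.
interface DisconnectCallback {
    void channelDisconnected();
}

// Tells a shun apart from a deliberate stop() by remembering whether
// the cache itself initiated the disconnect.
class ShunAwareListener implements DisconnectCallback {
    private final AtomicBoolean stopping = new AtomicBoolean(false);
    private final AtomicBoolean shunned = new AtomicBoolean(false);

    // Assumed to be called by the cache at the start of stop(),
    // before it disconnects the channel.
    void beginStop() {
        stopping.set(true);
    }

    @Override
    public void channelDisconnected() {
        if (!stopping.get()) {
            // Nobody asked for this disconnect, so treat it as a shun.
            shunned.set(true);
        }
    }

    boolean wasShunned() {
        return shunned.get();
    }
}
```

The flag has to be set before the channel is disconnected during stop(); otherwise a normal shutdown would be misread as a shun.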
Re: merging state, I'm not sure what the answer to this is. Probably the way to think it through is to identify in what cases the reconnected node holds state that's worth preserving. Then see if those cases justify the cost of merging vs. doing things like invalidating tx's and just wiping clean. Once we figure out how to do a merge for the non-shunning case we'll have a better idea of the cost of a merge.
I realize the above paragraph is a bunch of mush :( -
3. Re: Issues with shunned node rejoining a cluster
manik Oct 11, 2006 10:44 AM (in response to brian.stansberry)
Ok, so that's fine for detecting such an event - but then again, doesn't the coord detect such an event as well and issue a new View? So we'd need to wire things so we don't reassign buddies twice - when we detect a shun/stop and again when we get a new View.
Re: merging, I think the wipe-clean approach is probably the only thing we can reliably do, as there is no way of knowing how to merge data without implementing a call-back for the calling application to handle. -
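To make the trade-off concrete, here is a minimal sketch of wipe-clean as the default with an optional application callback. Everything in it is hypothetical: `RejoinHandler`, `WipeCleanHandler` and `RejoiningNode` are illustrative names, and a flat string map stands in for the cache's in-memory state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback an application could implement to decide what
// happens to local state when a shunned node rejoins.
interface RejoinHandler {
    // Returns the state the node should hold after rejoining.
    Map<String, String> onRejoin(Map<String, String> localState,
                                 Map<String, String> clusterState);
}

// Default behaviour: discard local state and accept the cluster's copy.
class WipeCleanHandler implements RejoinHandler {
    @Override
    public Map<String, String> onRejoin(Map<String, String> localState,
                                        Map<String, String> clusterState) {
        return new HashMap<>(clusterState);
    }
}

// Toy node holding a flat key/value state.
class RejoiningNode {
    private Map<String, String> state = new HashMap<>();
    private final RejoinHandler handler;

    RejoiningNode(RejoinHandler handler) {
        // Fall back to wipe-clean when the application supplies no handler.
        this.handler = (handler != null) ? handler : new WipeCleanHandler();
    }

    void put(String key, String value) { state.put(key, value); }

    String get(String key) { return state.get(key); }

    // Called after the node has reconnected and received the cluster's state.
    void rejoin(Map<String, String> clusterState) {
        state = new HashMap<>(handler.onRejoin(state, clusterState));
    }
}
```

With no handler the node simply ends up with the cluster's state; an application that knows how to reconcile its own data could pass a handler that keeps selected local entries.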
4. Re: Issues with shunned node rejoining a cluster
brian.stansberry Oct 11, 2006 11:00 AM (in response to brian.stansberry)
I've found that you don't get a new view when you disconnect.
Re: merging, agreed that wipe-clean is the most likely outcome, but if we end up implementing merge policies that require callbacks anyway, perhaps we can piggyback on them. -
5. Re: Issues with shunned node rejoining a cluster
manik Oct 11, 2006 12:28 PM (in response to brian.stansberry)
That is really strange - a bug in JGroups? Shouldn't the coord broadcast a new view when someone disconnects, since the cluster membership changes?
-
6. Re: Issues with shunned node rejoining a cluster
brian.stansberry Oct 11, 2006 12:48 PM (in response to brian.stansberry)
Bela can comment better as to whether not getting one last view before you leave is a flaw or intended behavior.
Looks like when a DISCONNECT comes down the channel, GMS sends a LEAVE_REQ to the coord, who, before sending out a new view, replies with a LEAVE_RSP. Upon receipt of the LEAVE_RSP, ParticipantGmsImpl sends a DISCONNECT_OK up to the app, and then passes the DISCONNECT down the channel. When the DISCONNECT gets to NAKACK, it puts it into a state where no further messages are passed up. Thus any subsequent view sent by the coord is dropped.
Any change to this would have to watch out for breaking this bit in JChannel.up():

    case Event.VIEW_CHANGE:
        my_view=(View)evt.getArg();

        // crude solution to bug #775120: if we get our first view *before* the CONNECT_OK,
        // we simply set the state to connected
        if(connected == false) {
            connected=true;
            synchronized(connect_mutex) { // bug fix contributed by Chris Wampler (bug #943881)
                connect_ok_event_received=true;
                connect_mutex.notify();
            }
        }
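The leave sequence described above can be mimicked with a toy model, which makes it easier to see why the last view never reaches the application. This is not JGroups code: `LeavingMemberModel` and its methods are made up for illustration, and the whole protocol stack is collapsed into a single flag.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the leaving member's stack; not JGroups code.
class LeavingMemberModel {
    // True while NAKACK still passes messages up the stack.
    private boolean passingUp = true;
    private final List<String> deliveredToApp = new ArrayList<>();

    // Something arriving from the network, i.e. sent by the coordinator.
    void receiveFromNetwork(String msg) {
        if (!passingUp) {
            return; // dropped: NAKACK no longer passes anything up
        }
        if (msg.equals("LEAVE_RSP")) {
            // GMS reports the acknowledged leave to the application...
            deliveredToApp.add("DISCONNECT_OK");
            // ...then the DISCONNECT travels down and reaches NAKACK.
            passingUp = false;
        } else {
            deliveredToApp.add(msg);
        }
    }

    List<String> deliveredToApp() {
        return deliveredToApp;
    }
}
```

Feeding the model a view, then LEAVE_RSP, then the coord's new view leaves only the first view and DISCONNECT_OK in the delivered list; the final view is silently dropped.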
-
7. Re: Issues with shunned node rejoining a cluster
belaban Oct 11, 2006 3:03 PM (in response to brian.stansberry)
That's intended behavior: when you're disconnected you're not a member of the group anymore, so you won't receive any messages or view changes.