-
1. Re: Node Starting / Rehash Inconsistency
manik Dec 8, 2010 4:58 AM (in response to shane_dev)
ReplicationTask.call() contains the call for all 3 messages (in your case). All 3 are broadcast in parallel, but the call should return as soon as the first one responds (it is non-blocking; see the JGroups mode used in opts: GET_FIRST).
Do you have a stack trace, log or test case that proves otherwise?
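A minimal sketch of the GET_FIRST behaviour described above, using only java.util.concurrent as a stand-in for JGroups: every target is asked without waiting on any single one, and the caller gets whichever reply completes first. GetFirstSketch and firstReply are hypothetical names; the real call goes through JGroups request options, not an ExecutorService.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration of GET_FIRST-style semantics with java.util.concurrent only;
// this is not the JGroups API. Each Callable stands in for one remote target.
final class GetFirstSketch {
    static <T> T firstReply(List<Callable<T>> targets) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(targets.size());
        try {
            // invokeAny starts the tasks without waiting on any single one and
            // returns the result of the first to complete successfully,
            // cancelling the rest.
            return pool.invokeAny(targets);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> targets = new ArrayList<>();
        targets.add(() -> { Thread.sleep(2_000); return "reply from slow node"; });
        targets.add(() -> "reply from fast node");
        System.out.println(firstReply(targets)); // prints "reply from fast node"
    }
}
```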
-
2. Re: Node Starting / Rehash Inconsistency
shane_dev Dec 8, 2010 9:45 AM (in response to manik)
This part was a little confusing to me. The comments in the code seem to indicate otherwise:
// This is possibly a remote GET.
// TODO this is sub-optimal and sequential (for now), until JGroups provides notifying futures - JGRP-1030
I can see how futures are used in the next if/else block (GET_ALL), but in this one the response appears to be evaluated, and possibly returned, before any subsequent iteration of the for loop (lines 301-305). Perhaps I am just not interpreting the code correctly.
Our logs seem to indicate that we are waiting for the first request.
A 21:43:50,411 [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=XYZ, value=XYZ...
B 21:43:50,430 [CommandAwareRpcDispatcher] Attempting to execute command: ClusteredGetCommand{key=XYZ...
B 21:43:50,430 [CommandAwareRpcDispatcher] Enough waiting; replayIgnored = false, sr STATE_PREEXISTED
B 21:44:00,453 [InboundInvocationHandlerImpl] Cache named [ABC] exists but isn't in a state to handle invocations. Its state is INSTANTIATED.
A 21:44:00,477 [CommandAwareRpcDispatcher] responses: [sender=C, retval=null, received=true, suspected=false]
A 21:44:00,477 [CallInterceptor] Executing command: PutKeyValueCommand{key=XYZ, value=XYZ...
It appears that A requests an entry that is stored on B and C. A first sends the request to B, which returns a "request ignored" response only after the 10-second timeout. A then sends the request to C and receives a response.
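To see why the sequential loop hurts here, a rough cost model, assuming each target has a fixed reply latency and that a node still in state INSTANTIATED replies with something the filter rejects. RemoteGetCost, Target and the 15-second timeout are hypothetical; only the roughly 10-second ignored reply from B comes from the log.

```java
import java.util.List;

// Hypothetical cost model, not Infinispan code. Each target answers after
// latencyMs; "usable" says whether that answer passes the response filter
// (a node whose cache is still INSTANTIATED answers, but not usefully).
final class RemoteGetCost {
    record Target(String name, long latencyMs, boolean usable) {}

    // One synchronous call per target, in order, until the filter accepts a reply.
    static long sequentialMs(List<Target> targets, long timeoutMs) {
        long elapsed = 0;
        for (Target t : targets) {
            elapsed += Math.min(t.latencyMs(), timeoutMs);
            if (t.usable() && t.latencyMs() <= timeoutMs) {
                return elapsed; // first acceptable reply ends the loop
            }
        }
        return elapsed; // no target produced an acceptable reply
    }

    // All targets asked at once: done when the fastest acceptable reply arrives,
    // otherwise when the last target has answered or timed out.
    static long parallelMs(List<Target> targets, long timeoutMs) {
        long fastestUsable = targets.stream()
                .filter(Target::usable)
                .mapToLong(Target::latencyMs)
                .filter(ms -> ms <= timeoutMs)
                .min()
                .orElse(-1);
        if (fastestUsable >= 0) {
            return fastestUsable;
        }
        return targets.stream()
                .mapToLong(t -> Math.min(t.latencyMs(), timeoutMs))
                .max()
                .orElse(0);
    }

    public static void main(String[] args) {
        // B ignores the request after about 10 s (as in the log); C's 20 ms
        // latency and the 15 s timeout are assumptions.
        List<Target> targets = List.of(
                new Target("B", 10_000, false),
                new Target("C", 20, true));
        System.out.println("sequential: " + sequentialMs(targets, 15_000) + " ms");
        System.out.println("parallel:   " + parallelMs(targets, 15_000) + " ms");
    }
}
```

With these assumed values the sequential loop takes 10,020 ms and the parallel send 20 ms: the slow first target dominates only when targets are asked one at a time.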
-
3. Re: Node Starting / Rehash Inconsistency
mircea.markus Dec 8, 2010 2:18 PM (in response to shane_dev)
Shane, I think you are reading the code correctly. In the scenario you described in the first post, a get on A can indeed go to D first: this is just a matter of chance, as the order in which the addresses are called is "randomized" based on their hash codes:
Set<Address> targets = new HashSet<Address>(dests); // should sufficiently randomize order
The good news is that JGroups already supports this out of the box: JGRP-1030. Manik, do we have a JIRA for integrating it?
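The quotes around "randomized" matter: a HashSet's iteration order is derived from the elements' hash codes and the table size, so it is repeatable for a given set of addresses rather than random per call. A small sketch, using String addresses as a hypothetical stand-in for JGroups Address objects, contrasting it with an explicit shuffle:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TargetOrder {
    // HashSet iteration order depends on hash codes, table capacity and (for
    // collisions) insertion order, so it is repeatable, not random.
    static List<String> hashSetOrder(List<String> dests) {
        Set<String> targets = new HashSet<>(dests);
        return new ArrayList<>(targets);
    }

    // An explicit shuffle gives a different order on each call.
    static List<String> shuffledOrder(List<String> dests) {
        List<String> targets = new ArrayList<>(new HashSet<>(dests)); // de-duplicate first
        Collections.shuffle(targets);
        return targets;
    }

    public static void main(String[] args) {
        List<String> dests = List.of("B", "C", "D");
        System.out.println(hashSetOrder(dests) + " " + hashSetOrder(dests)); // same order twice
        System.out.println(shuffledOrder(dests));
    }
}
```

Shuffling would only spread the delay across requests; it would not remove the wait whenever the slow node happens to be asked first.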
-
4. Re: Node Starting / Rehash Inconsistency
shane_dev Dec 8, 2010 3:55 PM (in response to mircea.markus)
Cool. I imagine this would simply entail replacing sendMessage with castMessage (more or less) using the GET_FIRST mode?
Out of curiosity, if that is the case, couldn't the same approach be used in the GET_ALL block, so that a single castMessage call replaces the multiple sendMessageWithFuture calls? Since get() blocks on each future, the result (performance) seems to be the same, but there would be less code.
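For the GET_ALL comparison, a sketch of the one-future-per-target pattern, assuming plain java.util.concurrent in place of sendMessageWithFuture: all requests are in flight before the first get(), so blocking on each future in turn costs about as much as the slowest reply, with a shared deadline standing in for the RPC timeout. GetAllSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch of the GET_ALL pattern with one future per target (plain
// java.util.concurrent, not the Infinispan/JGroups code).
final class GetAllSketch {
    static <T> List<T> getAll(List<Callable<T>> targets, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(targets.size());
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> target : targets) {
                futures.add(pool.submit(target)); // every request is in flight at once
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                // get() blocks on one future at a time, but the others keep running,
                // so the total wait is about the slowest reply, not the sum of all.
                long remaining = Math.max(0, deadline - System.nanoTime());
                results.add(f.get(remaining, TimeUnit.NANOSECONDS)); // TimeoutException past the deadline
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```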
-
5. Re: Node Starting / Rehash Inconsistency
shane_dev Dec 9, 2010 12:58 PM (in response to shane_dev)
It just dawned on me that even though the mode is set to GET_FIRST, we still have to apply the filter and essentially ignore the mode. After reading more of the JGroups manual, I wonder whether it would help to loop and call isDone() rather than get().
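The isDone() idea could look like this sketch (hypothetical names, plain java.util.concurrent): instead of blocking in get() on the first target, the loop checks every outstanding future and returns the first completed reply that the filter accepts. The trade-off is a thread looping with a short sleep while it waits, which is the cost notifying futures are meant to avoid.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Sketch of the isDone() polling idea; not Infinispan code.
final class PollingSketch {
    // Returns the first finished reply the filter accepts, or null if every
    // reply was rejected or the deadline passed.
    static <T> T firstAcceptable(List<? extends Future<T>> futures, Predicate<T> filter,
                                 long timeoutMs) throws InterruptedException {
        List<Future<T>> pending = new ArrayList<>(futures);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!pending.isEmpty() && System.nanoTime() < deadline) {
            Iterator<Future<T>> it = pending.iterator();
            while (it.hasNext()) {
                Future<T> f = it.next();
                if (!f.isDone()) {
                    continue; // still waiting on this target; check the next one
                }
                it.remove();
                try {
                    T reply = f.get(); // does not block: the future is done
                    if (filter.test(reply)) {
                        return reply;
                    }
                } catch (ExecutionException | CancellationException e) {
                    // a failed or cancelled request counts as an unusable reply
                }
            }
            Thread.sleep(1); // back off briefly rather than spinning at full speed
        }
        return null;
    }
}
```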
-
6. Re: Node Starting / Rehash Inconsistency
manik Dec 17, 2010 10:37 AM (in response to mircea.markus)
Yes, we use this.
-
7. Re: Node Starting / Rehash Inconsistency
manik Dec 17, 2010 10:46 AM (in response to shane_dev)
Regardless of the mode, if a filter is used, we set the mode to GET_FIRST. The filter then determines whether we have enough good results:
If you have access to the sources, perhaps add more logging and see where it starts to misbehave?
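The filter's job is to say whether a reply is usable and whether more replies are still needed. A minimal sketch of that contract, using a hypothetical ResponseFilter interface modelled loosely on JGroups' RspFilter (the names and signatures here are illustrative, not the library's):

```java
import java.util.List;

// Hypothetical interface; not the JGroups RspFilter signature.
interface ResponseFilter<T> {
    boolean isAcceptable(T response, String sender); // is this reply usable?
    boolean needMoreResponses();                     // keep waiting for more?
}

// Accepts the first non-null reply, then reports that it has enough.
final class FirstNonNullFilter<T> implements ResponseFilter<T> {
    private boolean found;

    @Override
    public boolean isAcceptable(T response, String sender) {
        if (response != null) {
            found = true;
            return true;
        }
        return false;
    }

    @Override
    public boolean needMoreResponses() {
        return !found;
    }
}

final class FilterDemo {
    // Feeds replies to the filter in arrival order and stops as soon as the
    // filter has enough; returns the accepted reply, or null if none was usable.
    static <T> T collect(List<T> replies, List<String> senders, ResponseFilter<T> filter) {
        T accepted = null;
        for (int i = 0; i < replies.size() && filter.needMoreResponses(); i++) {
            if (filter.isAcceptable(replies.get(i), senders.get(i))) {
                accepted = replies.get(i);
            }
        }
        return accepted;
    }
}
```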
-
8. Re: Node Starting / Rehash Inconsistency
shane_dev Dec 17, 2010 11:04 AM (in response to manik)
What I see is that the filter is not added to the request options. Instead, a for loop sends a sync message to one target at a time, and the filter is applied manually to each response. The mode may be set to GET_FIRST, but we are still sending separate sync messages to one target at a time. As a result, we block on the first message, and it may time out for the reasons mentioned initially.
My understanding is that it could be changed to use a sync castMessage where all of the targets are passed in and the filter is added to the request options. This way we block only until the first appropriate message is returned.
Something along the lines of...
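if (filter != null) {
    // let the response filter decide when enough valid responses have arrived
    opts.setRspFilter(filter);
    opts.setAnycasting(false);
    // one sync call covering all targets instead of one call per target
    retval = castMessage(targets, constructMessage(buf, null), opts);
}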
Does this make sense, or is there something about castMessage that makes it inappropriate here?
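A sketch of the behaviour this proposal aims for, using ExecutorCompletionService as a stand-in for castMessage with a response filter: requests go to all targets at once, replies are examined in the order they complete, and the call returns as soon as one passes the filter. FilteredFirst and its parameters are hypothetical, and the sketch says nothing about how JGroups actually routes the message to the listed targets.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Not the JGroups castMessage code; an executor plays the part of the targets.
final class FilteredFirst {
    static <T> T firstAcceptable(List<Callable<T>> targets, Predicate<T> filter, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(targets.size());
        CompletionService<T> done = new ExecutorCompletionService<>(pool);
        try {
            for (Callable<T> target : targets) {
                done.submit(target); // all requests are sent before any reply is read
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            for (int received = 0; received < targets.size(); received++) {
                long remaining = deadline - System.nanoTime();
                // poll hands back futures in completion order, not submission order
                Future<T> next = done.poll(remaining, TimeUnit.NANOSECONDS);
                if (next == null) {
                    return null; // timed out waiting for an acceptable reply
                }
                try {
                    T reply = next.get(); // already complete, does not block
                    if (filter.test(reply)) {
                        return reply;
                    }
                } catch (ExecutionException e) {
                    // a failed target counts as an unusable reply
                }
            }
            return null; // every target replied, none acceptable
        } finally {
            pool.shutdownNow(); // interrupt targets that are still running
        }
    }
}
```

With one target that answers null (like B in the log) and another that answers after a short delay, the caller waits only for the first acceptable reply, not for the slow or ignored ones.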
-
9. Re: Node Starting / Rehash Inconsistency
manik Jan 5, 2011 3:11 PM (in response to shane_dev)
Are you referring to this TODO in the code?
Also, do you have a test that reproduces this? It would help me trace it properly.
-
10. Re: Node Starting / Rehash Inconsistency
shane_dev Jan 6, 2011 8:52 AM (in response to manik)
Indeed, that is what I was referring to. I don't have a unit test at the moment; the problem is somewhat random, depending on the hash ring and the order of the target set. I am curious, though, why futures would be used instead of a castMessage call.
-
11. Re: Node Starting / Rehash Inconsistency
manik Jan 6, 2011 11:00 AM (in response to shane_dev)
The castMessage() method will broadcast to the entire cluster. This causes a lot of unnecessary noise on nodes that shouldn't care about this request. The correct solution is to have notifying futures in JGroups; I'll ping the JGroups team and see if we have this now.
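Notifying futures let the caller react to each reply as it lands instead of blocking or polling. As a sketch of that idea, java.util.concurrent.CompletableFuture stands in below for the JGroups futures (their actual API is not shown); NotifyingFirst is a hypothetical name, and the combined future completes with the first reply the filter accepts, or with null once every reply has been rejected.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// CompletableFuture used as a stand-in for notifying futures; not JGroups code.
final class NotifyingFirst {
    // No thread blocks while waiting: each reply's callback decides the outcome.
    static <T> CompletableFuture<T> firstAcceptable(List<CompletableFuture<T>> replies,
                                                    Predicate<T> filter) {
        CompletableFuture<T> result = new CompletableFuture<>();
        AtomicInteger outstanding = new AtomicInteger(replies.size());
        if (replies.isEmpty()) {
            result.complete(null);
        }
        for (CompletableFuture<T> reply : replies) {
            reply.whenComplete((value, error) -> {
                if (error == null && filter.test(value)) {
                    result.complete(value); // first acceptable reply wins; later calls are no-ops
                } else if (outstanding.decrementAndGet() == 0) {
                    result.complete(null);  // every reply arrived and none was acceptable
                }
            });
        }
        return result;
    }
}
```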
-
12. Re: Node Starting / Rehash Inconsistency
manik Jan 6, 2011 11:05 AM (in response to manik)
OK, it looks like JGroups has had this since 2.11, which means we can now make use of it.
I've created a JIRA for this:
-
13. Re: Node Starting / Rehash Inconsistency
shane_dev Jan 6, 2011 11:07 AM (in response to manik)
I believe that castMessage() only broadcasts to the entire cluster if the destinations are null. In our case, the list contains the appropriate nodes.
-
14. Re: Node Starting / Rehash Inconsistency
shane_dev Jan 6, 2011 11:09 AM (in response to manik)
Great!
Thanks,
Shane