-
1. Re: Block at containsKey() until timeout at JGroups FlowControl
tfromm Mar 19, 2012 6:29 AM (in response to tfromm) If this happens often, a FlowControl adjustment is necessary: http://www.jgroups.org/papers/FlowControl.html
-
2. Re: Block at containsKey() until timeout at JGroups FlowControl
galder.zamarreno Mar 20, 2012 5:05 AM (in response to tfromm) Hmmm, does the node eventually start working again? If not, did you get any thread dumps from the receiver nodes to see if they have some kind of deadlock that could stop them from sending credits back to the senders?
If instead what you're getting is momentary blocks which eventually recover, then it might be a matter of tweaking the FC settings. Try the following:
1. Increase FC.max_credits. This is the number of bytes a sender may send before it needs new credits, so it must stay below the heap size.
2. Increase FC.min_threshold. This is a percentage of max_credits; raising it makes slow receivers send credits earlier, so senders are less likely to block.
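To make the effect of the two settings more concrete, here is a toy model of credit-based flow control. It is not JGroups code: the class and method names are made up for illustration, and it assumes a receiver tops the sender back up to max_credits as soon as the sender's remaining credits (as seen by the receiver) drop to max_credits * min_threshold or below.

```java
// Toy model only: names and the exact replenish rule are assumptions, not the JGroups implementation.
class CreditSketch {
    final long maxCredits;
    final long minCredits;   // max_credits * min_threshold
    long senderCredits;      // bytes the sender may still send
    long receiverView;       // receiver's view of the sender's remaining credits

    CreditSketch(long maxCredits, double minThreshold) {
        this.maxCredits = maxCredits;
        this.minCredits = Math.round(maxCredits * minThreshold);
        this.senderCredits = maxCredits;
        this.receiverView = maxCredits;
    }

    /** Sender side: returns false where a real sender would block. */
    boolean trySend(int bytes) {
        if (bytes > senderCredits) {
            return false;
        }
        senderCredits -= bytes;
        return true;
    }

    /** Receiver side: called per processed message; returns true if credits were sent back. */
    boolean onProcessed(int bytes) {
        receiverView -= bytes;
        if (receiverView <= minCredits) {
            senderCredits += maxCredits - receiverView; // grant back what was consumed
            receiverView = maxCredits;
            return true;
        }
        return false;
    }

    /** How many messages the receiver must process before a blocked sender gets credits again. */
    static int processedBeforeReplenish(long maxCredits, double minThreshold, int msgBytes) {
        CreditSketch fc = new CreditSketch(maxCredits, minThreshold);
        int sent = 0;
        while (fc.trySend(msgBytes)) {
            sent++; // sender runs until it would block
        }
        int processed = 0;
        while (processed < sent) {
            processed++;
            if (fc.onProcessed(msgBytes)) {
                return processed;
            }
        }
        return -1; // never replenished in this run
    }

    public static void main(String[] args) {
        System.out.println(processedBeforeReplenish(1000, 0.1, 100));
        System.out.println(processedBeforeReplenish(1000, 0.5, 100));
    }
}
```

With these toy numbers (1000 credit bytes, 100-byte messages), a threshold of 0.1 leaves the sender waiting until the receiver has worked through 9 messages, while 0.5 releases it after 5. A larger max_credits simply lets the sender go longer before it runs dry in the first place.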
-
3. Re: Block at containsKey() until timeout at JGroups FlowControl
tfromm Mar 20, 2012 5:16 AM (in response to galder.zamarreno) That was the good thing: after the timeout the node resumes normal operation. I have not noticed any data loss or anything like that. :-)
I'll try the tweaks once this situation appears more frequently; otherwise I cannot tell whether the configuration changes have a positive effect.
-
4. Re: Block at containsKey() until timeout at JGroups FlowControl
galder.zamarreno Mar 20, 2012 6:18 AM (in response to tfromm) Ok. If it happens again, make sure you get thread dumps from all nodes in the cluster, because that way we can see what's up with not only the senders but the receivers as well.
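The usual way to grab a dump is `jstack <pid>` or `kill -3 <pid>` on each node. If that is awkward to do at the moment of the block, a dump can also be taken from inside the JVM. The following is only a sketch using the standard java.lang.management API; the class name ThreadDumpSketch is made up, and how you trigger it (JMX, a timer, a debug hook) is up to you.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: builds a plain-text dump of all live threads plus a deadlock summary.
class ThreadDumpSketch {
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only ask for lock details the running JVM says it can provide.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }

        // Returns null when no deadlock is found.
        long[] deadlocked = synchronizers
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        sb.append(deadlocked == null
                ? "no deadlock detected"
                : deadlocked.length + " deadlocked threads").append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```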
-
5. Re: Block at containsKey() until timeout at JGroups FlowControl
tfromm Apr 24, 2012 8:28 AM (in response to galder.zamarreno) Since 5.1.3 the issue appears more frequently.
I've attached thread dumps of all 3 nodes; "castor" is the one that blocks.
Meanwhile I'll change the credit values...
-
thread-dump-pollux.txt.zip 16.8 KB
-
thread-dump-helena.txt.zip 16.4 KB
-
thread-dump-castor.txt.zip 18.6 KB
-
6. Re: Block at containsKey() until timeout at JGroups FlowControl
galder.zamarreno Apr 30, 2012 9:54 AM (in response to tfromm) That is very weird. Castor is waiting for responses, but there is no trace of any processing on the other nodes. There's no FC wait here though; it is just waiting for a reply.
This smells like a UDP problem in your env, since I had a similar issue on my Mac due to small UDP buffers. I'd suggest you try running the same test with TCP.
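Before switching transports, one quick way to see whether small UDP buffers could be the culprit is to ask the OS for a large receive buffer and check what it actually grants. This is a rough sketch with the plain java.net API, not anything JGroups-specific; the class name and the requested size of 20,000,000 bytes are arbitrary choices for illustration. If the granted value comes back far below the request, the OS-level limit (for example `net.core.rmem_max` on Linux) is capping it.

```java
import java.net.DatagramSocket;
import java.net.SocketException;

// Hypothetical check: compares a requested UDP receive buffer with what the OS grants.
class UdpBufferCheck {
    /** Requests a receive buffer of the given size and returns the size actually in effect. */
    static int grantedReceiveBuffer(int requestedBytes) throws SocketException {
        try (DatagramSocket socket = new DatagramSocket()) {
            // The requested size is only a hint; the OS may silently cap it.
            socket.setReceiveBufferSize(requestedBytes);
            return socket.getReceiveBufferSize();
        }
    }

    public static void main(String[] args) throws SocketException {
        int requested = 20_000_000; // arbitrary example value
        int granted = grantedReceiveBuffer(requested);
        System.out.println("requested=" + requested + " granted=" + granted);
        if (granted < requested) {
            System.out.println("The OS capped the buffer; consider raising the system limit.");
        }
    }
}
```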