Problems with JGroups loopback in clustered environment
javagirlie Mar 2, 2015 8:30 AM
Hello ModeShape team,
since upgrading to ModeShape 4.2 we have been getting the following error in our clustered environment:
[org.modeshape.jcr.bus.ClusteredChangeBus] - <Loopback changeset '0e6258e4-a736-4517-b5f3-fc9e77128a47' was never received back on 'ClusteringService[cluster_name='persistence', address=...]'. Make sure your JGroups configuration uses 'loopback=true' and if applicable 'loopback_separate_thread=true'>
Our JGroups config originally contained the entry loopback="true". loopback_separate_thread was not configured (it cannot even be configured with jgroups-3.4.3.Final, and according to the JGroups manual it should not be used anyway).
Looking at the code of ClusteredChangeBus, we found a reference to [MODE-2409], which says that loopback should NOT be disabled, to ensure the correct order of events.
But even with loopback enabled, we get the above error.
Looking at the JGroups code (org.jgroups.protocols.TP), we are a little confused by the comment on the loopback property:
/**
* If true, messages sent to self are treated specially: unicast messages are looped back immediately,
* multicast messages get a local copy first and - when the real copy arrives - it will be discarded. Useful for
* Window media (non)sense
*/
@Property(description="Messages to self are looped back immediately if true")
protected boolean loopback=true;
According to this, a multicast message is first delivered as a local copy, and the real message is discarded when it arrives.
As stated in ClusteredChangeBus.notify, ModeShape waits until JGroups has dispatched its own message back to it. But is it waiting for the 'right' message, given that JGroups dispatches a copy instead of the original?
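To make the question concrete, here is a minimal sketch of the wait as we understand it. It is not ModeShape's or JGroups' actual code: `LoopbackWaitSketch`, `Transport`, `notifyAndAwait` and `awaitLoopback` are hypothetical names, and the sketch assumes the wait is keyed on the changeset id (as the id in the error message suggests). Under that assumption, a locally looped-back copy would release the wait just as well as the original, and the error would only appear if nothing at all is delivered back within the timeout.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative only: a simplified model of "send, then wait for our own message".
final class LoopbackWaitSketch {

    /** Hypothetical stand-in for the transport; not a JGroups type. */
    interface Transport {
        void send(String changeSetId, Consumer<String> localReceiver);
    }

    // Changesets we sent and are still waiting to see again, keyed by id.
    private final Map<String, CountDownLatch> pending = new ConcurrentHashMap<>();
    private final Transport transport;

    LoopbackWaitSketch(Transport transport) {
        this.transport = transport;
    }

    /** Sends a changeset and blocks until it is received back or the timeout expires. */
    boolean notifyAndAwait(String changeSetId, long timeoutMillis) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        pending.put(changeSetId, latch); // register before sending to avoid a race
        transport.send(changeSetId, this::onReceive);
        boolean received = latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        pending.remove(changeSetId);
        return received; // false corresponds to "was never received back"
    }

    /** Called for every delivered message; only the id is compared. */
    void onReceive(String changeSetId) {
        CountDownLatch latch = pending.get(changeSetId);
        if (latch != null) {
            latch.countDown();
        }
    }

    /** Runs one round trip with a fake transport that either loops back a copy or drops it. */
    static boolean awaitLoopback(boolean deliverLocalCopy, long timeoutMillis) {
        Transport transport = (id, receiver) -> {
            if (deliverLocalCopy) {
                // Deliver on another thread, as a transport typically would.
                new Thread(() -> receiver.accept(id)).start();
            }
            // Otherwise the message is silently dropped.
        };
        LoopbackWaitSketch bus = new LoopbackWaitSketch(transport);
        try {
            return bus.notifyAndAwait(UUID.randomUUID().toString(), timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("local copy delivered: " + awaitLoopback(true, 1000));
        System.out.println("nothing delivered:    " + awaitLoopback(false, 100));
    }
}
```

If the real implementation matches on something other than the id (for example, message identity), the copy-versus-original distinction could matter; the sketch cannot settle that.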
We got rid of the above error message by setting loopback to false in our clustered environment, but we are not really confident in this solution, since [MODE-2409] states that loopback should not be disabled.
Do you have any suggestions as to what else could cause the error above, and any tips on how to get rid of it other than disabling 'loopback'? Or is disabling 'loopback' not as dangerous as it seems?
Thanks in advance, Susanne.