Clustering issue with JGroups on JBoss 3.2.5
rcostanzo Aug 26, 2004 5:09 PM
I'm running a cluster on JBoss 3.2.5 and am having issues with my instances joining. It seems to only happen when starting up a 3rd or 4th instance to join the cluster. For example, say I have instance A and B running already, and go to start instance C:
1. serverA will acknowledge serverC as a member and update its cluster view properly
2. serverB will spit out the following warning:
2004-08-26 16:52:38,999 WARN [org.jgroups.protocols.pbcast.NAKACK] [serverC:32794 (additional data: 17 bytes)] discarded message from non-member serverC:32799 (additional data: 17 bytes)
3. serverC will hang at this point
Why does serverB think serverC is a non-member, when serverA is cool with serverC? Since serverA and serverB are already in the same cluster, that makes me think it's not a config issue.
Is there any way to hardcode who your members are to avoid this issue? I found that the issue doesn't happen when using TCP rather than multicast, but TCP is way too slow in comparison (just clicking around as a single user on my site, I saw page load times go up by 3 seconds or so).
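On hardcoding members: as far as I understand it, a static member list in JGroups means swapping the UDP and PING entries for TCP and TCPPING, with TCPPING's initial_hosts naming the members to contact. A rough sketch of just those two entries follows. The host names serverA, serverB, serverC and port 7800 are placeholders for illustration, not my real values, and the rest of the stack is assumed to stay unchanged:

```xml
<!-- Sketch only: host names and port 7800 are assumed placeholder values -->
<TCP bind_addr="XX.XX.XX.XXX" start_port="7800" loopback="true" />
<!-- initial_hosts is the fixed list of members probed at startup, as host[port] -->
<TCPPING initial_hosts="serverA[7800],serverB[7800],serverC[7800]"
    port_range="1" timeout="3000" num_initial_members="3"
    up_thread="true" down_thread="true" />
```

This takes multicast discovery out of the picture entirely, which would fit with the join problem not showing up over TCP, but it is also the setup that was too slow for me.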
Any help/suggestions would be greatly appreciated. I've included the jgroups settings for one of my servers below:
<UDP bind_addr="XX.XX.XX.XXX" mcast_addr="228.1.2.1" mcast_port="45566"
    ip_ttl="32" ip_mcast="true"
    mcast_send_buf_size="800000" mcast_recv_buf_size="150000"
    ucast_send_buf_size="800000" ucast_recv_buf_size="150000"
    loopback="false" />
<PING timeout="2000" num_initial_members="4"
    up_thread="true" down_thread="true" />
<MERGE2 min_interval="10000" max_interval="20000" />
<FD shun="true" up_thread="true" down_thread="true"
    timeout="2500" max_tries="5" />
<VERIFY_SUSPECT timeout="3000" num_msgs="3"
    up_thread="true" down_thread="true" />
<pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"
    max_xmit_size="8192"
    up_thread="true" down_thread="true" />
<UNICAST timeout="300,600,1200,2400,4800" window_size="100" min_threshold="10"
    down_thread="true" />
<pbcast.STABLE desired_avg_gossip="20000"
    up_thread="true" down_thread="true" />
<FRAG frag_size="8192"
    down_thread="true" up_thread="true" />
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
    shun="true" print_local_addr="true" />
<pbcast.STATE_TRANSFER up_thread="true" down_thread="true" />