1 Reply Latest reply on Jan 13, 2009 11:31 AM by lovelyliatroim

Shunning

lovelyliatroim Jan 7, 2009 12:26 PM

Hi Guys,

My cluster config looks like this:

<attribute name="ClusterConfig">
  <config>
    <UDP mcast_addr="228.10.10.10"
         mcast_port="45588"
         singleton_name="transport_one"
         tos="8"
         ucast_recv_buf_size="20000000"
         ucast_send_buf_size="640000"
         mcast_recv_buf_size="25000000"
         mcast_send_buf_size="640000"
         loopback="false"
         discard_incompatible_packets="true"
         max_bundle_size="64000"
         max_bundle_timeout="30"
         use_incoming_packet_handler="true"
         ip_ttl="2"
         enable_bundling="false"
         enable_diagnostics="true"

         use_concurrent_stack="true"

         thread_naming_pattern="pl"

         thread_pool.enabled="true"
         thread_pool.min_threads="1"
         thread_pool.max_threads="25"
         thread_pool.keep_alive_time="30000"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="10"
         thread_pool.rejection_policy="Run"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="4"
         oob_thread_pool.keep_alive_time="10000"
         oob_thread_pool.queue_enabled="true"
         oob_thread_pool.queue_max_size="10"
         oob_thread_pool.rejection_policy="Run"/>

    <PING timeout="2000" num_initial_members="3"/>
    <MERGE2 max_interval="30000" min_interval="10000"/>
    <FD_SOCK/>
    <FD timeout="10000" max_tries="5" shun="true"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK
        use_mcast_xmit="false" gc_lag="0"
        retransmit_timeout="300,600,1200,2400,4800"
        discard_delivered_msgs="true"/>
    <UNICAST timeout="300,600,1200,2400,3600"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="400000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="5000"
                shun="false"
                view_bundling="true" view_ack_collection_timeout="5000"/>
    <FRAG2 frag_size="60000"/>
    <pbcast.STREAMING_STATE_TRANSFER />
    <!-- <pbcast.STATE_TRANSFER/> -->
    <pbcast.FLUSH timeout="0"/>

  </config>
</attribute>

What I am interested in are these two:

<FD timeout="10000" max_tries="5" shun="true"/>
........
<pbcast.GMS print_local_addr="true" join_timeout="5000"
            shun="false"
            view_bundling="true" view_ack_collection_timeout="5000"/>

If a node becomes unresponsive for a certain time and leaves the group, would it be allowed back in once it recovers, given the current config?

I have read here http://www.jgroups.org/javagroupsnew/docs/manual/html_single/index.html#d0e2873 that I need to set both to false.

Would the above configuration stop a node from rejoining the group after it left?
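
For reference, a minimal sketch of what "both set to false" would look like for the two protocols above. It assumes only the shun attribute on FD changes and every other value stays exactly as posted; this is my reading of the manual, not a verified fix.

```xml
<!-- Sketch only: same values as the posted config, with shun disabled in both protocols -->
<FD timeout="10000" max_tries="5" shun="false"/>
<pbcast.GMS print_local_addr="true" join_timeout="5000"
            shun="false"
            view_bundling="true" view_ack_collection_timeout="5000"/>
```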

Thanks,
LL

1. Re: Shunning

lovelyliatroim Jan 13, 2009 11:31 AM (in response to lovelyliatroim)

Hi Guys,

I'm having a problem with my node rejoining the group. I have 2 nodes on the same machine; one node runs into GC problems but recovers after a few minutes. When it recovers, the other node seems to shun it and won't let it back into the group.

I'm seeing messages like this:

16:59:41,523 [OOB-1,10.251.145.84:48336] WARN jgroups.protocols.pbcast.NAKACK - 10.251.145.84:48336] discarded message from non-member 10.251.145.84:48337, my view is [10.251.145.84:48336|26] [10.251.145.84:48336]

I have set shun to false in both the "FD" protocol and "pbcast.GMS", and auto_reconnect has also been set to true, but I still can't get a member to rejoin the group after it runs into GC problems and recovers.

Any ideas as to what I must do?

Thanks,
LL