1 Reply Latest reply on Jan 13, 2009 11:31 AM by lovelyliatroim

    Shunning

    lovelyliatroim

      Hi Guys,

      My cluster config looks like this:


      <attribute name="ClusterConfig">
        <config>
          <UDP mcast_addr="228.10.10.10"
               mcast_port="45588"
               singleton_name="transport_one"
               tos="8"
               ucast_recv_buf_size="20000000"
               ucast_send_buf_size="640000"
               mcast_recv_buf_size="25000000"
               mcast_send_buf_size="640000"
               loopback="false"
               discard_incompatible_packets="true"
               max_bundle_size="64000"
               max_bundle_timeout="30"
               use_incoming_packet_handler="true"
               ip_ttl="2"
               enable_bundling="false"
               enable_diagnostics="true"

               use_concurrent_stack="true"

               thread_naming_pattern="pl"

               thread_pool.enabled="true"
               thread_pool.min_threads="1"
               thread_pool.max_threads="25"
               thread_pool.keep_alive_time="30000"
               thread_pool.queue_enabled="true"
               thread_pool.queue_max_size="10"
               thread_pool.rejection_policy="Run"

               oob_thread_pool.enabled="true"
               oob_thread_pool.min_threads="1"
               oob_thread_pool.max_threads="4"
               oob_thread_pool.keep_alive_time="10000"
               oob_thread_pool.queue_enabled="true"
               oob_thread_pool.queue_max_size="10"
               oob_thread_pool.rejection_policy="Run"/>

          <PING timeout="2000" num_initial_members="3"/>
          <MERGE2 max_interval="30000" min_interval="10000"/>
          <FD_SOCK/>
          <FD timeout="10000" max_tries="5" shun="true"/>
          <VERIFY_SUSPECT timeout="1500"/>
          <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
                         retransmit_timeout="300,600,1200,2400,4800"
                         discard_delivered_msgs="true"/>
          <UNICAST timeout="300,600,1200,2400,3600"/>
          <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                         max_bytes="400000"/>
          <pbcast.GMS print_local_addr="true" join_timeout="5000"
                      shun="false"
                      view_bundling="true" view_ack_collection_timeout="5000"/>
          <FRAG2 frag_size="60000"/>
          <pbcast.STREAMING_STATE_TRANSFER />
          <!-- <pbcast.STATE_TRANSFER/> -->
          <pbcast.FLUSH timeout="0"/>
        </config>
      </attribute>


      What I am interested in are these two settings:




      <FD timeout="10000" max_tries="5" shun="true"/>
      ........
      <pbcast.GMS print_local_addr="true" join_timeout="5000"
                  shun="false"
                  view_bundling="true" view_ack_collection_timeout="5000"/>


      If a node becomes unresponsive for a certain time and leaves the group, would it be allowed back in once it recovers, given the current config?

      I have read here http://www.jgroups.org/javagroupsnew/docs/manual/html_single/index.html#d0e2873 that I need to set both to false.

      Would the above configuration stop a node from rejoining the group after it has left?
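      For comparison, here is a sketch of the same two elements with shunning disabled in both, which is how I read the manual section linked above. All other attributes are copied unchanged from my config; whether this change alone is enough to let a recovered node back into the group is exactly what I am asking, so treat it as an assumption rather than a verified fix.

```xml
<!-- Sketch only: shun disabled in both FD and GMS, per my reading of the
     linked manual section. All other attributes unchanged from the config above. -->
<FD timeout="10000" max_tries="5" shun="false"/>
<pbcast.GMS print_local_addr="true" join_timeout="5000"
            shun="false"
            view_bundling="true" view_ack_collection_timeout="5000"/>
```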

      Thanks,
      LL




        • 1. Re: Shunning
          lovelyliatroim

          Hi Guys,

          I'm having a problem with my node rejoining the group. I have 2 nodes on the same machine; one node runs into GC problems but recovers after a few minutes. When it recovers, the other node seems to shun it and won't let it back into the group.

          I'm seeing messages like this:


          16:59:41,523 [OOB-1,10.251.145.84:48336] WARN jgroups.protocols.pbcast.NAKACK - 10.251.145.84:48336] discarded message from non-member 10.251.145.84:48337, my view is [10.251.145.84:48336|26] [10.251.145.84:48336]


          I have set shun to false in both the "FD" protocol and "pbcast.GMS", and auto_reconnect has also been set to true, but I still can't get a member to rejoin the group after it runs into GC problems and recovers.

          Any ideas as to what I must do?

          Thanks,
          LL