0 Replies Latest reply on Jul 10, 2007 6:02 PM by quinine

    FD, replication timeouts, and shunning

    quinine

      The short version of this question: I have two nodes, A and B, that replicate a Hibernate second-level cache through TreeCache. Several times a day, node B runs a GC that makes it unresponsive for minutes at a time, and during these pauses node A throws ReplicationExceptions. I want to configure JBossCache/JGroups so that node A can withstand these GC events on node B, either by temporarily shunning B, by setting appropriate timeouts, or by some combination of both.

      Long Version:

      I have inherited a JBoss cluster. I have done a great deal of reading and have gone through the JGroups, JBossCache, and Hibernate JBossCache wikis here on jboss.org.

      We have two nodes, A and B, in a cluster. Both are running:

      JBoss 4.0.5
      JBossCache 1.4.1.SP3
      JGroups 2.4.1-SP1.


      The jboss-config.xml for the hibernate treecache (identical on both):

      <?xml version="1.0" encoding="UTF-8"?>
      <server>
       <mbean code="org.jboss.cache.TreeCache"
       name="jboss.cache:service=HibernateTreeCache">
      
       <depends>jboss:service=Naming</depends>
       <depends>jboss:service=TransactionManager</depends>
      
       <attribute name="ClusterName">Hibernate-${jboss.partition.name:Cluster}</attribute>
      
       <attribute name="IsolationLevel">READ_COMMITTED</attribute>
      
       <attribute name="CacheMode">REPL_SYNC</attribute>
      
       <attribute name="UseRegionBasedMarshalling">false</attribute>
      
       <attribute name="InactiveOnStartup">false</attribute>
      
       <attribute name="TransactionManagerLookupClass">org.jboss.cache.BatchModeTransactionManagerLookup</attribute>
      
       <attribute name="ClusterConfig">
       <config>
       <TCP bind_addr="${partition.tcphost:HIB3-MISCONFIGURED}"
       start_port="${partition.tcpport.hib3:HIB3-MISCONFIGURED}" loopback="false"
       tcp_nodelay="false" up_thread="false" down_thread="false"/>
       <TCPPING initial_hosts="${partition.tcphosts.hib3:HIB3-MISCONFIGURED}"
       port_range="3" timeout="3500"
       num_initial_members="3" up_thread="false" down_thread="false"/>
       <MERGE2 min_interval="20000" max_interval="100000"
       down_thread="false" up_thread="false"/>
       <FD_SOCK down_thread="false" up_thread="false"/>
       <FD shun="true" down_thread="false" up_thread="false"
       timeout="20000" max_tries="5"/>
       <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
       <pbcast.NAKACK up_thread="false" down_thread="false" gc_lag="100"
       retransmit_timeout="60000"/>
       <pbcast.STABLE desired_avg_gossip="50000" up_thread="false" down_thread="false" />
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true"
       print_local_addr="true" down_thread="false" up_thread="false"/>
       <pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
       </config>
      
      
       </attribute>
      
       <attribute name="FetchInMemoryState">false</attribute>
       <attribute name="InitialStateRetrievalTimeout">20000</attribute>
      
       <attribute name="SyncReplTimeout">20000</attribute>
      
       <attribute name="LockAcquisitionTimeout">15000</attribute>
      
       <attribute name="BuddyReplicationConfig">
       <config>
       <buddyReplicationEnabled>false</buddyReplicationEnabled>
       <buddyLocatorClass>org.jboss.cache.buddyreplication.NextMemberBuddyLocator</buddyLocatorClass>
       <buddyLocatorProperties>
       numBuddies = 1
       ignoreColocatedBuddies = true
       </buddyLocatorProperties>
      
       <buddyPoolName>default</buddyPoolName>
       <buddyCommunicationTimeout>2000</buddyCommunicationTimeout>
      
       <autoDataGravitation>false</autoDataGravitation>
       <dataGravitationRemoveOnFind>true</dataGravitationRemoveOnFind>
       <dataGravitationSearchBackupTrees>true</dataGravitationSearchBackupTrees>
      
       </config>
       </attribute>
      
       </mbean>
      
      </server>
      


      Our cluster-service.xml:
      <?xml version="1.0" encoding="UTF-8"?>
      
      <server>
      
       <mbean code="org.jboss.ha.framework.server.ClusterPartition"
       name="jboss:service=${jboss.partition.name:DefaultPartition}">
      
       <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
      
       <attribute name="NodeAddress">${jboss.bind.address}</attribute>
      
       <attribute name="DeadlockDetection">False</attribute>
      
       <attribute name="StateTransferTimeout">30000</attribute>
      
       <attribute name="PartitionConfig">
       <Config>
       <TCP bind_addr="${partition.tcphost:CLUSTERCONFIG-MISCONFIGURED}" start_port="${partition.tcpport.cluster:CLUSTERCONFIG-MISCONFIGURED}" loopback="false"
       recv_buf_size="2000000" send_buf_size="640000"
       tcp_nodelay="true" up_thread="true" down_thread="true"/>
       <TCPPING initial_hosts="${partition.tcphosts.cluster:CLUSTERCONFIG-MISCONFIGURED}"
       port_range="3" timeout="3500"
       num_initial_members="3" up_thread="true" down_thread="true"/>
       <MERGE2 min_interval="10000" max_interval="20000" />
       <FD_SOCK down_thread="true" up_thread="true"/>
       <FD shun="true" up_thread="true" down_thread="true"
       timeout="10000" max_tries="5"/>
       <VERIFY_SUSPECT timeout="3000" down_thread="true" up_thread="true" />
       <pbcast.NAKACK up_thread="true" down_thread="true" gc_lag="100"
       retransmit_timeout="300,600,1200,2400,4800"/>
       <pbcast.STABLE desired_avg_gossip="20000" max_bytes="400000"
       down_thread="true" up_thread="true" />
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true"
       print_local_addr="true" up_thread="true" down_thread="true"/>
       <FC max_credits="2000000" down_thread="true" up_thread="true"
       min_threshold="0.10"/>
       <FRAG2 frag_size="60000" down_thread="true" up_thread="true"/>
       <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
       </Config>
       </attribute>
       <depends>jboss:service=Naming</depends>
       </mbean>
      
      
       <mbean code="org.jboss.ha.hasessionstate.server.HASessionStateService"
       name="jboss:service=HASessionState">
       <depends>jboss:service=Naming</depends>
       <!-- We now inject the partition into the HAJNDI service instead
       of requiring that the partition name be passed -->
       <depends optional-attribute-name="ClusterPartition"
       proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
       <!-- JNDI name under which the service is bound -->
       <attribute name="JndiName">/HASessionState/Default</attribute>
       <!-- Max delay before cleaning unreclaimed state.
       Defaults to 30*60*1000 => 30 minutes -->
       <attribute name="BeanCleaningDelay">0</attribute>
       </mbean>
      
       <mbean code="org.jboss.ha.jndi.HANamingService"
       name="jboss:service=HAJNDI">
       <!-- We now inject the partition into the HAJNDI service instead
       of requiring that the partition name be passed -->
       <depends optional-attribute-name="ClusterPartition"
       proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
       <!-- Bind address of bootstrap and HA-JNDI RMI endpoints -->
       <attribute name="BindAddress">${jboss.bind.address}</attribute>
       <!-- Port on which the HA-JNDI stub is made available -->
       <attribute name="Port">1100</attribute>
       <!-- RmiPort to be used by the HA-JNDI service once bound. 0 => auto. -->
       <attribute name="RmiPort">1101</attribute>
       <!-- Accept backlog of the bootstrap socket -->
       <attribute name="Backlog">50</attribute>
       <!-- The thread pool service used to control the bootstrap and
       auto discovery lookups -->
       <depends optional-attribute-name="LookupPool"
       proxy-type="attribute">jboss.system:service=ThreadPool</depends>
      
       <!-- A flag to disable the auto discovery via multicast -->
       <attribute name="DiscoveryDisabled">false</attribute>
       <!-- Set the auto-discovery bootstrap multicast bind address. If not
       specified and a BindAddress is specified, the BindAddress will be used. -->
       <attribute name="AutoDiscoveryBindAddress">${jboss.bind.address}</attribute>
       <!-- Multicast Address and group port used for auto-discovery -->
       <attribute name="AutoDiscoveryAddress">${jboss.partition.udpGroup:230.0.0.4}</attribute>
       <attribute name="AutoDiscoveryGroup">1102</attribute>
       <!-- The TTL (time-to-live) for autodiscovery IP multicast packets -->
       <attribute name="AutoDiscoveryTTL">16</attribute>
       <!-- The load balancing policy for HA-JNDI -->
       <attribute name="LoadBalancePolicy">org.jboss.ha.framework.interfaces.RoundRobin</attribute>
      
       <!-- Client socket factory to be used for client-server
       RMI invocations during JNDI queries
       <attribute name="ClientSocketFactory">custom</attribute>
       -->
       <!-- Server socket factory to be used for client-server
       RMI invocations during JNDI queries
       <attribute name="ServerSocketFactory">custom</attribute>
       -->
       </mbean>
      
       <mbean code="org.jboss.invocation.jrmp.server.JRMPInvokerHA"
       name="jboss:service=invoker,type=jrmpha">
       <attribute name="ServerAddress">${jboss.bind.address}</attribute>
       <attribute name="RMIObjectPort">4447</attribute>
       <!--
       <attribute name="RMIClientSocketFactory">custom</attribute>
       <attribute name="RMIServerSocketFactory">custom</attribute>
       -->
       <depends>jboss:service=Naming</depends>
       </mbean>
      
       <!-- the JRMPInvokerHA creates a thread per request. This implementation uses a pool of threads -->
       <mbean code="org.jboss.invocation.pooled.server.PooledInvokerHA"
       name="jboss:service=invoker,type=pooledha">
       <attribute name="NumAcceptThreads">1</attribute>
       <attribute name="MaxPoolSize">300</attribute>
       <attribute name="ClientMaxPoolSize">300</attribute>
       <attribute name="SocketTimeout">60000</attribute>
       <attribute name="ServerBindAddress">${jboss.bind.address}</attribute>
       <attribute name="ServerBindPort">4446</attribute>
       <attribute name="ClientConnectAddress">${jboss.bind.address}</attribute>
       <attribute name="ClientConnectPort">0</attribute>
       <attribute name="EnableTcpNoDelay">false</attribute>
       <depends optional-attribute-name="TransactionManagerService">jboss:service=TransactionManager</depends>
       <depends>jboss:service=Naming</depends>
       </mbean>
      
       <mbean code="org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge"
       name="jboss.cache:service=InvalidationBridge,type=JavaGroups">
       <!-- We now inject the partition into the HAJNDI service instead
       of requiring that the partition name be passed -->
       <depends optional-attribute-name="ClusterPartition"
       proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
       <depends>jboss.cache:service=InvalidationManager</depends>
       <attribute name="InvalidationManager">jboss.cache:service=InvalidationManager</attribute>
       <attribute name="BridgeName">DefaultJGBridge</attribute>
       </mbean>
      
      </server>
      


      We occasionally get ReplicationExceptions on A, and we have verified that these occur during long GCs on B (up to 4-5 minutes), during which the JVM is unresponsive.

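      As I read the Hibernate TreeCache config above, node A will not shun B until it has gone without a heartbeat for at least 101.5 seconds: FD timeout (20 s) x max_tries (5) = 100 s, plus the VERIFY_SUSPECT timeout (1.5 s). The replication timeout (SyncReplTimeout), however, is only 20 seconds.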

      So my questions are:

      1) If I change the config so that A shuns B before SyncReplTimeout expires, will this stop A from replicating to B while B is unresponsive (and so prevent the ReplicationExceptions)?
      2) I would then expect B to be shunned somewhat regularly, but I always want B to be able to rejoin the cluster once it becomes responsive again. From what I'm reading, this means setting shun=false. Without shunning, how do I prevent replication to the unresponsive node B?
      3) Are there further caveats that I need to consider? Will I need to make similar timeout/config changes to the cluster-service.xml?
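
      To make question 1 concrete, here is a rough sketch of the kind of change I mean in the Hibernate TreeCache ClusterConfig. The FD timeout of 3000 is a hypothetical, untested value chosen only so that detection (3 s x 5 tries + 1.5 s = 16.5 s) completes before the 20 s SyncReplTimeout. Every other attribute is copied from the config above. I do not know whether this alone is enough to stop the ReplicationExceptions.

```xml
<!-- Hypothetical, untested values: FD timeout lowered from 20000 to 3000 ms.
     3000 ms x 5 tries = 15 s to suspect, plus 1500 ms VERIFY_SUSPECT = 16.5 s,
     which is below the 20000 ms SyncReplTimeout. -->
<FD shun="true" down_thread="false" up_thread="false"
    timeout="3000" max_tries="5"/>
<VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
```

      For question 2, the only difference would be shun="false" on FD and pbcast.GMS, if I have understood the docs correctly.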

      Thank you very much for your time.