3 Replies Latest reply on May 16, 2007 6:38 PM by Bela Ban

    Increasing threads handling replicate() messages for a clust

    Himadri Saha Newbie

      Hi,
      Under load, a replicate() message sent on the cluster takes too long to arrive at the other node. The log snippets below show this.
      Note: the clocks on both nodes are synchronized.

      Node1:
      2007-05-16 23:37:59,583 DEBUG [app-nbdvjlljy7o3|10.253.205.12-1503236585|5fbb8cb142088f50] invoking method _put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@7b24c2, true), members=[10.253.205.16:53498, 10.253.205.15:50987], mode=REPL_SYNC, exclude_self=true, timeout=10000
      2007-05-16 23:37:59,583 DEBUG [app-nbdvjlljy7o3|10.253.205.12-1503236585|5fbb8cb142088f50] Broadcasting call _put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@7b24c2, true) to recipient list null
      2007-05-16 23:37:59,583 DEBUG [app-nbdvjlljy7o3|10.253.205.12-1503236585|5fbb8cb142088f50] callRemoteMethods(): valid members are [10.253.205.15:50987] method: _replicate; id:13(_put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@7b24c2, true))
      2007-05-16 23:37:59,583 DEBUG [app-nbdvjlljy7o3|10.253.205.12-1503236585|5fbb8cb142088f50] Marshalling object _replicate; id:13(_put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@7b24c2, true))
      2007-05-16 23:37:59,583 DEBUG [app-nbdvjlljy7o3|10.253.205.12-1503236585|5fbb8cb142088f50] Warning: using object serialization for class com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask
      
      Node2:
      2007-05-16 23:38:12,726 DEBUG [] 10.253.205.15:50987 received call _put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@1d38cb3, true)
      2007-05-16 23:38:12,726 DEBUG [] (10.253.205.15:50987) call on method [_put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@1d38cb3, true)]
      2007-05-16 23:38:12,726 DEBUG [] PessimisticLockInterceptor invoked for method _put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@1d38cb3, true)
      2007-05-16 23:38:12,726 DEBUG [] Attempting to lock node /ipunity/mgcpstack/sentCommandCache/48400013 for owner Thread[UpHandler (STATE_TRANSFER),5,Pooled Threads]
      


      Note that the replicate() message arrived on Node2 only after about 13 seconds (sent at 23:37:59,583, received at 23:38:12,726). My replication timeout is 10 seconds, so I get replication exceptions from node2 in the cluster. I do not want to increase the replication timeout, because that would cause performance issues in my application.
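
      To double-check the numbers, here is a minimal sketch that computes the gap between the Node1 and Node2 log timestamps and compares it with the 10000 ms timeout. The class and method names are only illustrative, and the pattern assumes the `yyyy-MM-dd HH:mm:ss,SSS` timestamp layout seen in the logs above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative helper (not part of JBoss Cache or JGroups).
final class ReplicationDelay {
    // Timestamp layout as it appears in the log snippets above (assumed).
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds between the sender's and the receiver's log timestamps.
    static long delayMillis(String sent, String received) {
        LocalDateTime s = LocalDateTime.parse(sent, LOG_TS);
        LocalDateTime r = LocalDateTime.parse(received, LOG_TS);
        return Duration.between(s, r).toMillis();
    }

    public static void main(String[] args) {
        long syncReplTimeout = 10000; // SyncReplTimeout from my treecache.xml
        long delay = delayMillis("2007-05-16 23:37:59,583",
                                 "2007-05-16 23:38:12,726");
        System.out.println("delay = " + delay + " ms, exceeds timeout: "
                + (delay > syncReplTimeout));
    }
}
```

      This only makes sense because the clocks on the two nodes are synchronized.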

      My guess is that all the JGroups receive threads were busy handling other messages in the cluster. Is there a way to specify the thread pool size, or to turn off thread pooling so that messages are handled as they arrive?
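
      To show what I mean by busy receive threads, here is a toy model written with plain java.util.concurrent. It is not JGroups code and makes no claim about how JGroups actually dispatches messages; the class name, method name, and parameters are hypothetical. It only measures how long a new message waits when every handler thread is occupied by slower messages.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: a fixed pool of "handler" threads, not the JGroups stack.
final class BusyReceiverDemo {
    // Returns how long (ms) a new message waits before a handler thread
    // picks it up, when slowMessages slow messages were queued ahead of it.
    static long queueingDelayMillis(int handlerThreads, int slowMessages, long slowMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(handlerThreads);
        try {
            Callable<Void> slowHandler = () -> { Thread.sleep(slowMillis); return null; };
            for (int i = 0; i < slowMessages; i++) {
                pool.submit(slowHandler);
            }
            long enqueued = System.nanoTime();
            Callable<Long> probe = System::nanoTime; // records when handling starts
            Future<Long> pickedUp = pool.submit(probe);
            return (pickedUp.get() - enqueued) / 1_000_000L;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 handler thread:  " + queueingDelayMillis(1, 2, 200) + " ms");
        System.out.println("4 handler threads: " + queueingDelayMillis(4, 2, 200) + " ms");
    }
}
```

      With a single handler thread the new message waits behind both slow ones; with spare threads it is picked up almost immediately. That is the effect I suspect is behind the delay in the logs above.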

      My treecache.xml is as follows:
      <?xml version="1.0" encoding="UTF-8"?>
      
      <!-- ===================================================================== -->
      <!-- -->
      <!-- Sample TreeCache Service Configuration -->
      <!-- -->
      <!-- ===================================================================== -->
      
      <server>
      
       <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar"/>
      
      
       <!-- ==================================================================== -->
       <!-- Defines TreeCache configuration -->
       <!-- ==================================================================== -->
      
       <mbean code="org.jboss.cache.TreeCache"
       name="jboss.cache:service=TreeCache">
      
       <depends>jboss:service=Naming</depends>
       <depends>jboss:service=TransactionManager</depends>
      
       <!--
       Configure the TransactionManager
       -->
       <attribute name="TransactionManagerLookupClass">com.ipunity.common.cache.WeblogicTransactionManagerLookup</attribute>
      
       <!--
       Isolation level : SERIALIZABLE
       REPEATABLE_READ (default)
       READ_COMMITTED
       READ_UNCOMMITTED
       NONE
       -->
       <attribute name="IsolationLevel">READ_COMMITTED</attribute>
      
       <!--
       Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC
       -->
       <attribute name="CacheMode">REPL_SYNC</attribute>
      
       <!--
       Just used for async repl: use a replication queue
       -->
       <attribute name="UseReplQueue">false</attribute>
      
       <!--
       Replication interval for replication queue (in ms)
       -->
       <attribute name="ReplQueueInterval">0</attribute>
      
       <!--
       Max number of elements which trigger replication
       -->
       <attribute name="ReplQueueMaxElements">0</attribute>
      
       <!-- Name of cluster. Needs to be the same for all clusters, in order
       to find each other
       -->
       <attribute name="ClusterName">IPUnity-Cluster-2</attribute>
      
       <!-- JGroups protocol stack properties. Can also be a URL,
       e.g. file:/home/bela/default.xml
       <attribute name="ClusterProperties"></attribute>
       -->
      
       <attribute name="ClusterConfig">
       <config>
       <!-- UDP: if you have a multihomed machine,
       set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2"
       -->
       <!-- UDP: On Windows machines, because of the media sense feature
       being broken with multicast (even after disabling media sense)
       set the loopback attribute to true -->
       <UDP mcast_addr="224.10.10.16" mcast_port="45568"
       ip_ttl="64" ip_mcast="true"
       mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
       ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
       loopback="false" bind_addr="10.253.205.16"/>
       <PING timeout="2000" num_initial_members="3"
       up_thread="false" down_thread="false"/>
       <MERGE2 min_interval="10000" max_interval="20000"/>
       <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
       <FD_SOCK/>
       <VERIFY_SUSPECT timeout="1500"
       up_thread="false" down_thread="false"/>
       <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
       max_xmit_size="8192" up_thread="false" down_thread="false"/>
       <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
       down_thread="false"/>
       <pbcast.STABLE desired_avg_gossip="20000"
       up_thread="false" down_thread="false"/>
       <FRAG frag_size="8192"
       down_thread="false" up_thread="false"/>
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
       shun="true" print_local_addr="true"/>
       <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
       </config>
       </attribute>
      
      
       <!--
       Whether or not to fetch state on joining a cluster
       -->
       <attribute name="FetchStateOnStartup">true</attribute>
      
       <!--
       The max amount of time (in milliseconds) we wait until the
       initial state (ie. the contents of the cache) are retrieved from
       existing members in a clustered environment
       -->
       <attribute name="InitialStateRetrievalTimeout">5000</attribute>
      
       <!--
       Number of milliseconds to wait until all responses for a
       synchronous call have been received.
       -->
       <attribute name="SyncReplTimeout">10000</attribute>
      
       <!-- Max number of milliseconds to wait for a lock acquisition -->
       <attribute name="LockAcquisitionTimeout">15000</attribute>
      
       <!-- Name of the eviction policy class. Not supported now. -->
       <attribute name="EvictionPolicyClass"></attribute>
      
       <!--
       <attribute name="CacheLoaderClass">org.jboss.cache.loader.bdbje.BdbjeCacheLoader</attribute>
       <attribute name="CacheLoaderConfig">c:\tmp\bdbje</attribute>
       <attribute name="CacheLoaderShared">true</attribute>
       <attribute name="CacheLoaderPreload">/</attribute>
       -->
      
      <!--
       <attribute name="CacheLoaderClass">org.jboss.cache.loader.FileCacheLoader</attribute>
       <attribute name="CacheLoaderConfig">/tmp</attribute>
       <attribute name="CacheLoaderShared">true</attribute>
       <attribute name="CacheLoaderPreload">/</attribute>
      -->
      
      
       </mbean>
      
      
       <!-- Uncomment to get a graphical view of the TreeCache MBean above -->
       <!-- <mbean code="org.jboss.cache.TreeCacheView" name="jboss.cache:service=TreeCacheView">-->
       <!-- <depends>jboss.cache:service=TreeCache</depends>-->
       <!-- <attribute name="CacheService">jboss.cache:service=TreeCache</attribute>-->
       <!-- </mbean>-->
      
      
      </server>
      


      Version Details follow:
      JBC version - 1.4.1SP3
      Application server - Weblogic

      Any help would be appreciated.

      Regards,
      Himadri