4 Replies · Latest reply on Jul 28, 2004 12:12 PM by belaban

    Cluster Problem: Is there a cache object size limit?

    hanson

      I am testing the clustering function of TreeCache.

      Steps:
      1) The first cache is started and about 100,000 objects are put into it.
      2) The second cache is then started; it retrieves the data from the first cache.


      The test source is:
      /////////////////////////////////////////////////////////////////////////////////
      import org.jboss.cache.*;
      import java.io.*;

      public class MyTreeCache {

          public static void main(String[] args) {
              try {
                  TreeCache tree = new TreeCache();
                  PropertyConfigurator config = new PropertyConfigurator();
                  config.configure(tree, "META-INF/replAsync-service.xml");
                  tree.start(); // kick-start the tree cache

                  // These two lines are commented out in the second cache,
                  // since the second cache only "reads" data from the first.
                  long time = fillCache(tree, "a", 100000);
                  System.out.println("time = " + time);

                  while (true) {
                      Thread.sleep(5000);
                      Node node = tree.get(new Fqn(new Object[] { "a" }));
                      System.out.println("data size " + node.getDataKeys().size());
                  }
              } catch (Exception e) {
                  e.printStackTrace();
              }
          }

          // Puts 'count' CacheMessage objects under /regionRoot and returns the elapsed time in ms.
          private static long fillCache(TreeCache cache, String regionRoot, int count)
                  throws Exception {
              long time = System.currentTimeMillis();
              for (int i = 0; i < count; i++) {
                  String item = "item" + i;
                  CacheMessage value = new CacheMessage(i);
                  cache.put(new Fqn(new Object[] { regionRoot }), item, value);
              }
              return System.currentTimeMillis() - time;
          }
      }

      class CacheMessage implements Serializable {
          public int index;
          public byte[] body;

          CacheMessage(int index) {
              body = new byte[100];
              this.index = index;
          }
      }
      ///////////////////////////////////////////////////////////////////////////////////


      When the size of CacheMessage's "body" member is 100 bytes, the second cache is not able to get the data from the first cache.

      This is the error message:
      10:36:22,171 WARN [AckReceiverWindow] discarded msg with seqno=159 (next msg to
      receive is 164)
      10:36:22,171 WARN [AckReceiverWindow] discarded msg with seqno=145 .............................

      But if the size is 50 bytes, it works fine.

      The cache config file replAsync-service.xml was copied from the etc/META-INF directory. I adjusted the log level to INFO.
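      (For reference, a minimal log4j.properties sketch for that log-level change, assuming logging goes through log4j as in the standalone TreeCache examples; the file name and appender here are assumptions, and the pattern just mirrors the output format shown below:)

      # Assumed standalone log4j setup -- adjust to your own logging config
      log4j.rootLogger=INFO, stdout
      log4j.appender.stdout=org.apache.log4j.ConsoleAppender
      log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
      log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %-5p [%c{1}] %m%n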

      Is there any size limit in JBoss TreeCache?




      ////////////////////////////////////////////////////////////////////////////////////
      The TreeCache config file (replAsync-service.xml):

      <?xml version="1.0" encoding="UTF-8"?>

      <!-- ===================================================================== -->
      <!--                                                                       -->
      <!--  Sample TreeCache Service Configuration                               -->
      <!--                                                                       -->
      <!-- ===================================================================== -->

      <server>

          <!-- ==================================================================== -->
          <!-- Defines TreeCache configuration                                      -->
          <!-- ==================================================================== -->

          <mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">

              <depends>jboss:service=Naming</depends>
              <depends>jboss:service=TransactionManager</depends>

              <!--
                  Configure the TransactionManager
              -->
              <attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>

              <!--
                  Isolation level : SERIALIZABLE
                                    REPEATABLE_READ (default)
                                    READ_COMMITTED
                                    READ_UNCOMMITTED
                                    NONE
              -->
              <attribute name="IsolationLevel">REPEATABLE_READ</attribute>

              <!--
                  Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC
              -->
              <attribute name="CacheMode">REPL_ASYNC</attribute>

              <!--
                  Just used for async repl: use a replication queue
              -->
              <attribute name="UseReplQueue">false</attribute>

              <!--
                  Replication interval for replication queue (in ms)
              -->
              <attribute name="ReplQueueInterval">0</attribute>

              <!--
                  Max number of elements which trigger replication
              -->
              <attribute name="ReplQueueMaxElements">0</attribute>

              <!-- Name of cluster. Needs to be the same for all clusters, in order
                   to find each other
              -->
              <attribute name="ClusterName">TreeCache-Cluster</attribute>

              <!-- JGroups protocol stack properties. Can also be a URL,
                   e.g. file:/home/bela/default.xml
              -->
              <attribute name="ClusterConfig">
                  <config>
                      <!-- UDP: if you have a multihomed machine,
                           set the bind_addr attribute to the appropriate NIC IP address -->
                      <!-- UDP: On Windows machines, because of the media sense feature
                           being broken with multicast (even after disabling media sense)
                           set the loopback attribute to true -->
                      <UDP mcast_addr="228.1.2.3" mcast_port="45566"
                           ip_ttl="64" ip_mcast="true"
                           mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
                           ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
                           loopback="false"/>
                      <PING timeout="2000" num_initial_members="3"
                            up_thread="false" down_thread="false"/>
                      <MERGE2 min_interval="10000" max_interval="20000"/>
                      <!-- <FD shun="true" up_thread="true" down_thread="true"/> -->
                      <FD_SOCK/>
                      <VERIFY_SUSPECT timeout="1500"
                                      up_thread="false" down_thread="false"/>
                      <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
                                     max_xmit_size="8192" up_thread="false" down_thread="false"/>
                      <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
                               down_thread="false"/>
                      <pbcast.STABLE desired_avg_gossip="20000"
                                     up_thread="false" down_thread="false"/>
                      <FRAG frag_size="8192"
                            down_thread="false" up_thread="false"/>
                      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                                  shun="true" print_local_addr="true"/>
                      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
                  </config>
              </attribute>

              <!--
                  Max number of entries in the cache. If this is exceeded, the
                  eviction policy will kick some entries out in order to make
                  more room
              -->
              <attribute name="MaxCapacity">2000000</attribute>

              <!--
                  Whether or not to fetch state on joining a cluster
              -->
              <attribute name="FetchStateOnStartup">true</attribute>

              <!--
                  The max amount of time (in milliseconds) we wait until the
                  initial state (ie. the contents of the cache) are retrieved from
                  existing members in a clustered environment
              -->
              <attribute name="InitialStateRetrievalTimeout">5000</attribute>

              <!--
                  Number of milliseconds to wait until all responses for a
                  synchronous call have been received.
              -->
              <attribute name="SyncReplTimeout">10000</attribute>

              <!-- Max number of milliseconds to wait for a lock acquisition -->
              <attribute name="LockAcquisitionTimeout">15000</attribute>

              <!-- Max number of milliseconds we hold a lock (not currently
                   implemented) -->
              <attribute name="LockLeaseTimeout">60000</attribute>

              <!-- Name of the eviction policy class. Not supported now. -->
              <attribute name="EvictionPolicyClass"></attribute>

          </mbean>

          <!-- Uncomment to get a graphical view of the TreeCache MBean above -->
          <!--
          <mbean code="org.jboss.cache.TreeCacheView" name="jboss.cache:service=TreeCacheView">
              <depends>jboss.cache:service=TreeCache</depends>
              <attribute name="CacheService">jboss.cache:service=TreeCache</attribute>
          </mbean>
          -->

      </server>


        • 1. Re: Cluster Problem: Is there a cache object size limit?
          hanson


          The problem is solved. The default "InitialStateRetrievalTimeout" is 5 secs; I changed it to 500 secs and now it works.
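          (In the cache XML this is the InitialStateRetrievalTimeout attribute, which takes milliseconds, so 500 secs looks roughly like this:)

          <attribute name="InitialStateRetrievalTimeout">500000</attribute>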
          But I have a new problem: retrieving the data from the other cache takes much longer than inserting the same data into the local cache.
          Inserting the data into the local cache takes about 8 secs, but retrieving the data from the other cache takes:
          1) about 30 secs when the "body" size is 100 bytes
          2) about 120 secs when the "body" size is 500 bytes

          I ran the two caches on different machines. Each machine is a single-processor Intel Pentium 4 1.8GHz with 256M RAM.

          I noticed that when the second cache starts up, its JVM memory usage does not increase until the last several seconds of the retrieval.
          I don't know what the cluster is doing before that. When the retrieval is over, the first JVM uses almost 600M of memory; I don't
          know whether that is a memory leak.



          • 2. Re: Cluster Problem: Is there a cache object size limit?
            belaban

            Increase the 5000 (the InitialStateRetrievalTimeout) in your cache XML file.

            Bela

            • 3. Re: Cluster Problem: Is there a cache object size limit?
              hanson

              Bela, thanks for your reply.

              I just tested replication performance on the Solaris platform and ran into some questions.

              Steps:
              Start up the first JBoss cache and insert 100,000 objects into it; each object is about 500 bytes. The insert time is about 10 secs.

              -------------------------------------------------------
              GMS: address is fire1:38833
              -------------------------------------------------------
              16:57:36,433 INFO [TreeCache] viewAccepted(): new members: [fire1:38833]
              16:57:36,447 INFO [TreeCache] state could not be retrieved (must be first member in group)
              16:57:36,447 INFO [TreeCache] setState(): new cache is null (maybe first member in cluster)
              time = 10319
              data size 100000


              Then start the second JBoss cache to back up the contents of the first; the replication takes about 20 secs:
              17:03:56,372 WARN [AckReceiverWindow] discarded msg with seqno=6232 (next msg to receive is 6558)
              17:03:56,373 WARN [AckReceiverWindow] discarded msg with seqno=6278 (next msg to receive is 6558)
              17:03:56,373 WARN [AckReceiverWindow] discarded msg with seqno=6360 (next msg to receive is 6558)
              17:03:56,374 WARN [AckReceiverWindow] discarded msg with seqno=6412 (next msg to receive is 6558)
              17:03:59,545 INFO [TreeCache] setState(): locking the old tree
              17:03:59,567 INFO [TreeCache] setState(): locking the old tree was successful
              17:03:59,568 INFO [TreeCache] setState(): forcing release of all locks in old tree
              17:03:59,568 INFO [TreeCache] state was retrieved successfully (in 21450 milliseconds


              The memory usage on the first node:
              1597 hanson 26 29 10 570M 474M sleep 0:21 0.95% java

              The memory usage on the backup node:
              18990 hanson 24 28 10 566M 277M sleep 0:10 0.03% java


              If I insert 400,000 objects, the insert time is about 40 secs:

              -------------------------------------------------------
              GMS: address is fire1:38856
              -------------------------------------------------------
              17:03:55,271 INFO [TreeCache] viewAccepted(): new members: [fire1:38856]
              17:03:55,291 INFO [TreeCache] setState(): new cache is null (maybe first member in cluster)
              17:03:55,292 INFO [TreeCache] state could not be retrieved (must be first member in group)
              time = 40046


              and the replication takes about 100 secs:

              17:13:46,654 WARN [AckReceiverWindow] discarded msg with seqno=25746 (next msg to receive is 26175)
              17:13:46,655 WARN [AckReceiverWindow] discarded msg with seqno=25810 (next msg to receive is 26175)
              17:13:46,655 WARN [AckReceiverWindow] discarded msg with seqno=25852 (next msg to receive is 26175)
              17:13:46,655 WARN [AckReceiverWindow] discarded msg with seqno=26036 (next msg to receive is 26175)
              17:14:00,739 INFO [TreeCache] setState(): locking the old tree
              17:14:00,763 INFO [TreeCache] setState(): locking the old tree was successful
              17:14:00,764 INFO [TreeCache] setState(): forcing release of all locks in old tree
              17:14:00,764 INFO [TreeCache] state was retrieved successfully (in 98070 milliseconds
              data size 400000


              The memory usage on the first node:
              23659 hanson 26 29 10 1881M 1632M sleep 1:37 0.12% java
              The memory usage on the backup node:
              19012 hanson 25 28 10 613M 579M sleep 0:35 0.16% java

              My questions are:

              [1] Why does the first JBoss cache use so much memory after replicating? Before replicating, it only used about 560M:

              26793 hanson 24 29 10 566M 320M sleep 0:41 8.87% java

              [2] When I insert 800,000 objects into the first cache and then start the backup cache, the replication fails within the 500 secs, and the first JBoss cache uses almost all of the memory (1800M).
              [3] Is there any way to improve performance when replicating a huge volume of data (>500M)? How much memory is required?

              • 4. Re: Cluster Problem: Is there a cache object size limit?
                belaban

                2 issues:

                #1 When we do state transfer, we have to copy the state (actually worse: serialize it) into a byte[] buffer. The same happens on the receiver. This means that you will have a memory spike that is double the size of your state. If your state is 400M, then allocate at least 1GB of memory to your JVM.
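                (As a rough worked example with the figures from the previous post: a ~400M state means a ~400M serialized byte[] on the sender and a matching buffer on the receiver, on top of the live cache itself, so each JVM needs on the order of 400M of extra headroom, e.g. started with something like:)

                java -Xmx1024m -cp <your classpath> MyTreeCache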

                #2 For large states I have an item on the todo list to provide a streaming state transfer API, where you transfer chunks (e.g. 10K in size) of state across the network. Your app (sender and receiver) therefore doesn't have to allocate double the memory of its state, but only an additional <chunk-size>, e.g. 10K.
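                (Purely as an illustration of that idea, not an actual TreeCache API: a minimal sketch of a sender that serializes entries straight onto the network stream in small chunks, so the extra memory stays around one chunk instead of the whole serialized state:)

                import java.io.*;
                import java.util.Iterator;

                // Hypothetical chunked sender -- assumed names, for illustration only.
                class ChunkedStateSender {
                    static final int CHUNK_SIZE = 10 * 1024; // e.g. 10K, as suggested above

                    static void sendState(Iterator entries, OutputStream network) throws IOException {
                        // Entries are written one by one onto a buffered stream whose buffer
                        // is a single chunk, so the sender never builds the full serialized
                        // state in memory.
                        ObjectOutputStream out = new ObjectOutputStream(
                                new BufferedOutputStream(network, CHUNK_SIZE));
                        while (entries.hasNext()) {
                            out.writeObject(entries.next());
                            out.reset(); // keep the serialization back-reference table from growing
                        }
                        out.flush();
                    }
                }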

                Bela