5 Replies Latest reply on Sep 14, 2004 2:10 AM by belaban

    Replicated cache performance

    tmark

      Hi all,

      I am building a partitioned, fault tolerant cache using TreeCache and jgroups. At this stage I'm conducting performance tests with sustained write volume in the range of 50 req/s. Using 4 cache instances distributed across two Sun v210s (with JDK 1.4.2 in server mode), I am seeing significant degradation of TreeCache responsiveness on put operations after 5-10 minutes. Eventually all requests result in exceptions of the following type:

      16:51:21,316 ERROR [TreeCacheInstance] [Thread-132] TreeCacheInstance.put : exception thrown : org.jboss.util.NestedRuntimeException: rsp=sender=ever-sun2:33393, retval=null, received=false, suspected=false; - nested throwable: (org.jboss.cache.lock.TimeoutException: rsp=sender=ever-sun2:33393, retval=null, received=false, suspected=false)

      My question is whether I am running into limits of the jgroups architecture (one event queue per channel) or if some aspect of my application or configuration is causing an unnecessary bottleneck. Does anyone have anecdotal evidence of stable, sustainable transaction rates higher than 12/s for a single cache?

      My cache jgroups configuration is as follows:



      <UDP mcast_addr="228.1.2.3" mcast_port="45565"
      ip_ttl="64" ip_mcast="true"
      mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
      ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
      loopback="false"/>
      <PING timeout="2000" num_initial_members="3"
      up_thread="false" down_thread="false"/>
      <MERGE2 min_interval="10000" max_interval="20000"/>
      <FD_SOCK/>
      <VERIFY_SUSPECT timeout="1500"
      up_thread="false" down_thread="false"/>
      <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
      max_xmit_size="8192" up_thread="false" down_thread="false"/>
      <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
      down_thread="false"/>
      <pbcast.STABLE desired_avg_gossip="20000"
      up_thread="false" down_thread="false"/>
      <FRAG frag_size="8192"
      down_thread="false" up_thread="false"/>
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
      shun="true" print_local_addr="true"/>
      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>



      Thanks,

      Thomas

        • 1. Re: Replicated cache performance
          belaban

          What props do you use for the cache itself? Async/sync repl? Do you begin/commit a TX after each op, or after multiple ops?

          Bela

          • 2. Re: Replicated cache performance
            tmark

            I am not explicitly demarcating transactions, and I am using synchronous replication (though to be fair I tried async, which was not noticeably faster). My full cache XML is as follows:







            jboss:service=Naming
            jboss:service=TransactionManager

            org.jboss.cache.DummyTransactionManagerLookup

            REPEATABLE_READ

            REPL_SYNC

            false
            0
            0

            TreeCache-Cluster1



            <UDP mcast_addr="228.1.2.3" mcast_port="45565"
            ip_ttl="64" ip_mcast="true"
            mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
            ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
            loopback="false"/>
            <PING timeout="2000" num_initial_members="3"
            up_thread="false" down_thread="false"/>
            <MERGE2 min_interval="10000" max_interval="20000"/>
            <FD_SOCK/>
            <VERIFY_SUSPECT timeout="1500"
            up_thread="false" down_thread="false"/>
            <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
            max_xmit_size="8192" up_thread="false" down_thread="false"/>
            <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
            down_thread="false"/>
            <pbcast.STABLE desired_avg_gossip="20000"
            up_thread="false" down_thread="false"/>
            <FRAG frag_size="8192"
            down_thread="false" up_thread="false"/>
            <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
            shun="true" print_local_addr="true"/>
            <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>



            false
            5000

            10000

            15000

            org.jboss.cache.eviction.LRUPolicy


            30
            <!-- Cache wide default -->

            150000
            1800





            • 3. Re: Replicated cache performance
              belaban

              So you
              #1 do *not* use transactions
              #2 use repl sync

              #1 disables bundling of modifications, so each modification is replicated separately and you get more traffic across the wire (after *each* modification)

              #2 is *much* slower than async replication
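
To illustrate point #1, here is a minimal Java sketch of why grouping modifications into a transaction reduces replication traffic. It is a toy model, not the JBoss Cache or JGroups API: `ToyReplicatedCache`, `CountingTransport`, and the `begin`/`commit` methods are hypothetical names invented for illustration, and the model assumes one network message per non-transactional put and one per commit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: none of these classes belong to JBoss Cache or JGroups.
class ReplicationBundlingSketch {

    // Hypothetical stand-in for the cluster transport; it only counts messages.
    static final class CountingTransport {
        private int messagesSent;

        void send(List<String> modifications) {
            messagesSent++; // one message, however many modifications it carries
        }

        int messagesSent() {
            return messagesSent;
        }
    }

    // Hypothetical cache: replicates immediately outside a transaction and
    // bundles modifications until commit inside one (assumed behaviour).
    static final class ToyReplicatedCache {
        private final CountingTransport transport;
        private List<String> pending; // non-null only while a transaction is open

        ToyReplicatedCache(CountingTransport transport) {
            this.transport = transport;
        }

        void begin() {
            pending = new ArrayList<String>();
        }

        void put(String fqn, String key, String value) {
            String modification = fqn + "/" + key + "=" + value;
            if (pending != null) {
                pending.add(modification); // held back until commit
            } else {
                transport.send(Collections.singletonList(modification));
            }
        }

        void commit() {
            if (pending == null) {
                throw new IllegalStateException("no transaction in progress");
            }
            if (!pending.isEmpty()) {
                transport.send(pending); // all bundled modifications in one message
            }
            pending = null;
        }
    }

    // Messages sent when every put is replicated on its own.
    static int messagesWithoutTx(int puts) {
        CountingTransport transport = new CountingTransport();
        ToyReplicatedCache cache = new ToyReplicatedCache(transport);
        for (int i = 0; i < puts; i++) {
            cache.put("/a/b", "key" + i, "value" + i);
        }
        return transport.messagesSent();
    }

    // Messages sent when puts are grouped into transactions of batchSize puts.
    static int messagesWithTx(int puts, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        CountingTransport transport = new CountingTransport();
        ToyReplicatedCache cache = new ToyReplicatedCache(transport);
        for (int i = 0; i < puts; i++) {
            if (i % batchSize == 0) {
                cache.begin();
            }
            cache.put("/a/b", "key" + i, "value" + i);
            if ((i + 1) % batchSize == 0 || i == puts - 1) {
                cache.commit();
            }
        }
        return transport.messagesSent();
    }

    public static void main(String[] args) {
        System.out.println("no tx:          " + messagesWithoutTx(100));
        System.out.println("tx per 10 puts: " + messagesWithTx(100, 10));
        System.out.println("tx per put:     " + messagesWithTx(100, 1));
    }
}
```

In this model, 100 puts cost 100 messages without a transaction, 10 messages with one transaction per 10 puts, and 100 messages again with one transaction per put. A real cache may add its own protocol messages per commit, so treat the counts as relative rather than absolute.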

              It would help if you have a small test case, zipped up and including the configuration that shows how to reproduce this.

              Bela

              • 4. Re: Replicated cache performance
                tmark

                Bela,

                Thanks for the insight. You are correct in that I am not currently using transactions, but if I were, I would simply have a begin and commit surrounding every put operation, so I'm not sure I would accrue any tangible benefits.

                I will see if I can mock up a small test case that consistently reproduces the problem. One additional note: the objects I'm placing in the cache are of moderate size (3-4 KB).

                One final question - is ASYNC replication compatible with the default LRU eviction policy?

                Thanks,

                Thomas

                • 5. Re: Replicated cache performance
                  belaban

                  Last q: yes. Replication and eviction are orthogonal.