5 Replies Latest reply on Sep 14, 2004 2:10 AM by belaban

Replicated cache performance

tmark Sep 11, 2004 6:36 PM

Hi all,

I am building a partitioned, fault tolerant cache using TreeCache and jgroups. At this stage I'm conducting performance tests with sustained write volume in the range of 50 req/s. Using 4 cache instances distributed across two Sun v210s (with JDK 1.4.2 in server mode), I am seeing significant degradation of TreeCache responsiveness on put operations after 5-10 minutes. Eventually all requests result in exceptions of the following type:

16:51:21,316 ERROR [TreeCacheInstance] [Thread-132] TreeCacheInstance.put : exception thrown : org.jboss.util.NestedRuntimeException: rsp=sender=ever-sun2:33393, retval=null, received=false, suspected=false; - nested throwable: (org.jboss.cache.lock.TimeoutException: rsp=sender=ever-sun2:33393, retval=null, received=false, suspected=false)

My question is whether I am running into limits of the jgroups architecture (one event queue per channel) or if some aspect of my application or configuration is causing an unnecessary bottleneck. Does anyone have anecdotal evidence of stable, sustainable transaction rates higher than 12/s for a single cache?

My cache jgroups configuration is as follows:

<UDP mcast_addr="228.1.2.3" mcast_port="45565"
ip_ttl="64" ip_mcast="true"
mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
loopback="false"/>
<PING timeout="2000" num_initial_members="3"
up_thread="false" down_thread="false"/>
<MERGE2 min_interval="10000" max_interval="20000"/>
<FD_SOCK/>
<VERIFY_SUSPECT timeout="1500"
up_thread="false" down_thread="false"/>
<pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
max_xmit_size="8192" up_thread="false" down_thread="false"/>
<UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
down_thread="false"/>
<pbcast.STABLE desired_avg_gossip="20000"
up_thread="false" down_thread="false"/>
<FRAG frag_size="8192"
down_thread="false" up_thread="false"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
shun="true" print_local_addr="true"/>
<pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>

Thanks,

Thomas

1. Re: Replicated cache performance

belaban Sep 12, 2004 6:35 AM (in response to tmark)

What properties do you use for the cache itself? Async or sync replication? Do you begin/commit a TX after each op, or after multiple ops?

Bela
2. Re: Replicated cache performance

tmark Sep 13, 2004 10:00 AM (in response to tmark)

I am not explicitly demarcating transactions, and I am using synchronous replication (though to be fair I tried async, which was not noticeably faster). My full cache XML is as follows:

jboss:service=Naming
jboss:service=TransactionManager

org.jboss.cache.DummyTransactionManagerLookup

REPEATABLE_READ

REPL_SYNC

false
0
0

TreeCache-Cluster1

<UDP mcast_addr="228.1.2.3" mcast_port="45565"
ip_ttl="64" ip_mcast="true"
mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
loopback="false"/>
<PING timeout="2000" num_initial_members="3"
up_thread="false" down_thread="false"/>
<MERGE2 min_interval="10000" max_interval="20000"/>
<FD_SOCK/>
<VERIFY_SUSPECT timeout="1500"
up_thread="false" down_thread="false"/>
<pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
max_xmit_size="8192" up_thread="false" down_thread="false"/>
<UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
down_thread="false"/>
<pbcast.STABLE desired_avg_gossip="20000"
up_thread="false" down_thread="false"/>
<FRAG frag_size="8192"
down_thread="false" up_thread="false"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
shun="true" print_local_addr="true"/>
<pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>

false
5000

10000

15000

org.jboss.cache.eviction.LRUPolicy

30


150000
1800
3. Re: Replicated cache performance

belaban Sep 13, 2004 10:42 AM (in response to tmark)

So you
#1 do *not* use transactions
#2 use repl sync

#1 disables bundling of modifications, therefore you get more traffic across the wire (after *each* modification)

#2 is *much* slower than async replication
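
As a rough illustration of point #1 (a toy model, not JBoss Cache or JGroups code): if each commit ships its buffered modifications as one replication message, the number of messages depends on how many puts share a transaction. The class name, `messagesSent`, and `opsPerTx` are made up for this sketch, and the real protocol may exchange more than one message per commit (for example with synchronous replication).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not JBoss Cache code) of replication messages sent by one writer. */
public class ReplicationBundlingModel {

    /** Counts messages when modifications are replicated in groups of opsPerTx. */
    static int messagesSent(int totalPuts, int opsPerTx) {
        if (opsPerTx < 1) {
            throw new IllegalArgumentException("opsPerTx must be >= 1");
        }
        List<Integer> pending = new ArrayList<>(); // modifications buffered in the current tx
        int messages = 0;
        for (int i = 0; i < totalPuts; i++) {
            pending.add(i);
            if (pending.size() == opsPerTx) {
                messages++;      // commit: the whole batch goes out as one message
                pending.clear();
            }
        }
        if (!pending.isEmpty()) {
            messages++;          // final, partially filled batch
        }
        return messages;
    }

    public static void main(String[] args) {
        int puts = 3000; // 50 puts/s sustained for one minute
        System.out.println("one put per tx (or no tx): " + messagesSent(puts, 1));
        System.out.println("10 puts per tx:            " + messagesSent(puts, 10));
    }
}
```

In this model, one put per transaction produces the same message count as no transaction at all, so bundling only helps when several modifications are committed together.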

It would help if you have a small test case, zipped up and including the configuration that shows how to reproduce this.

Bela
4. Re: Replicated cache performance

tmark Sep 13, 2004 10:56 PM (in response to tmark)

Bela,

Thanks for the insight. You are correct in that I am not currently using transactions, but if I were, I would simply have a begin and commit surrounding every put operation, so I'm not sure I would accrue any tangible benefits.

I will see if I can mock a small test case that consistently reproduces the problem. One additional note is that the objects I'm placing in the cache are of moderate (3-4k) size.
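
For scale, a back-of-the-envelope estimate of the write volume, using the figures from this thread (50 req/s, 4 cache instances, 3-4k objects). It assumes that the 50 req/s is the total across all four instances and that 1k means 1024 bytes, and it ignores serialization and protocol overhead. The class and method names are hypothetical.

```java
/** Back-of-the-envelope write volume estimate (hypothetical helper, not part of any library). */
public class WriteVolumeEstimate {

    /** Writes originated per cache if the total rate is spread evenly across instances. */
    static double perCacheRate(double totalReqPerSec, int caches) {
        return totalReqPerSec / caches;
    }

    /** Raw payload bytes per second, ignoring serialization and protocol overhead. */
    static double payloadBytesPerSec(double reqPerSec, int objectBytes) {
        return reqPerSec * objectBytes;
    }

    public static void main(String[] args) {
        double totalRate = 50.0; // req/s, assumed to be the aggregate over all instances
        int caches = 4;
        System.out.println("writes/s per cache: " + perCacheRate(totalRate, caches));
        System.out.println("payload bytes/s at 3k objects: " + payloadBytesPerSec(totalRate, 3 * 1024));
        System.out.println("payload bytes/s at 4k objects: " + payloadBytesPerSec(totalRate, 4 * 1024));
    }
}
```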

One final question - is ASYNC replication compatible with the default LRU eviction policy?

Thanks,

Thomas
5. Re: Replicated cache performance

belaban Sep 14, 2004 2:10 AM (in response to tmark)

Last q: yes. Replication and eviction are orthogonal.
