Replicated cache performance
tmark Sep 11, 2004 6:36 PM
Hi all,
I am building a partitioned, fault-tolerant cache using TreeCache and JGroups. At this stage I'm running performance tests with a sustained write volume of around 50 req/s. Using 4 cache instances distributed across two Sun v210s (JDK 1.4.2 in server mode), I see TreeCache put operations become significantly slower after 5-10 minutes. Eventually all requests fail with exceptions of the following type:
16:51:21,316 ERROR [TreeCacheInstance] [Thread-132] TreeCacheInstance.put : exception thrown : org.jboss.util.NestedRuntimeException: rsp=sender=ever-sun2:33393, retval=null, received=false, suspected=false; - nested throwable: (org.jboss.cache.lock.TimeoutException: rsp=sender=ever-sun2:33393, retval=null, received=false, suspected=false)
My question is whether I am running into limits of the JGroups architecture (one event queue per channel) or whether some aspect of my application or configuration is causing an unnecessary bottleneck. Does anyone have anecdotal evidence of stable, sustainable transaction rates higher than 12/s for a single cache?
My JGroups configuration for the cache is as follows:
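For anyone who wants to reproduce this kind of load, here is a minimal sketch of a paced write driver. It is not the actual test harness: `PutTarget` is a hypothetical interface standing in for the cache wrapper, the stub in `main` is a plain in-memory map rather than TreeCache, and the code uses current Java syntax rather than JDK 1.4.2.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.LockSupport;

public class PutLoadSketch {

    /** Hypothetical stand-in for the cache wrapper; not a JBoss Cache interface. */
    public interface PutTarget {
        void put(String fqn, String key, Object value) throws Exception;
    }

    /** Outcome of one run: success/failure counts and the slowest observed put. */
    public static final class Result {
        public final int succeeded;
        public final int failed;
        public final long maxLatencyNanos;

        Result(int succeeded, int failed, long maxLatencyNanos) {
            this.succeeded = succeeded;
            this.failed = failed;
            this.maxLatencyNanos = maxLatencyNanos;
        }
    }

    /**
     * Issues totalRequests puts at a fixed rate. Each request is scheduled
     * relative to the start time, so a slow put does not lower the offered load.
     */
    public static Result run(PutTarget target, int totalRequests, double requestsPerSecond) {
        long intervalNanos = (long) (1_000_000_000L / requestsPerSecond);
        long start = System.nanoTime();
        int ok = 0;
        int failed = 0;
        long maxLatency = 0;
        for (int i = 0; i < totalRequests; i++) {
            long due = start + i * intervalNanos;
            long wait;
            // Sleep until this request's scheduled time (no wait if already late).
            while ((wait = due - System.nanoTime()) > 0) {
                LockSupport.parkNanos(wait);
            }
            long t0 = System.nanoTime();
            try {
                target.put("/test/node" + (i % 100), "key", Integer.valueOf(i));
                ok++;
            } catch (Exception e) {
                // Count failures (e.g. timeouts) instead of aborting the run.
                failed++;
            }
            maxLatency = Math.max(maxLatency, System.nanoTime() - t0);
        }
        return new Result(ok, failed, maxLatency);
    }

    public static void main(String[] args) {
        // In-memory stub; a real test would delegate to the cache here.
        Map<String, Object> store = new ConcurrentHashMap<>();
        PutTarget stub = (fqn, key, value) -> store.put(fqn + "#" + key, value);
        Result r = run(stub, 100, 50.0); // 100 puts at 50 req/s, about 2 seconds
        System.out.println("ok=" + r.succeeded + " failed=" + r.failed
                + " maxLatencyMs=" + (r.maxLatencyNanos / 1_000_000.0));
    }
}
```

Tracking the maximum latency and the failure count over time is what would show the gradual degradation described above, as opposed to a sudden failure.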
<UDP mcast_addr="228.1.2.3" mcast_port="45565"
    ip_ttl="64" ip_mcast="true"
    mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
    ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
    loopback="false"/>
<PING timeout="2000" num_initial_members="3"
    up_thread="false" down_thread="false"/>
<MERGE2 min_interval="10000" max_interval="20000"/>
<FD_SOCK/>
<VERIFY_SUSPECT timeout="1500"
    up_thread="false" down_thread="false"/>
<pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
    max_xmit_size="8192" up_thread="false" down_thread="false"/>
<UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
    down_thread="false"/>
<pbcast.STABLE desired_avg_gossip="20000"
    up_thread="false" down_thread="false"/>
<FRAG frag_size="8192"
    down_thread="false" up_thread="false"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
    shun="true" print_local_addr="true"/>
<pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
Thanks,
Thomas