4 Replies Latest reply on Mar 1, 2007 2:28 PM by Manik Surtani

    Performance Tests and TimeoutException

    Gabriel Grise Newbie

      Hi,
      We are doing some performance tests on the POJO cache to see if it will meet our requirements. We plan to go above 100 nodes, but right now our test only includes 8 nodes. The first test is simple: one node writes a POJO into the cache (c.putObject("/hashObj", hash)) and all the others read/write a HashMap inside that POJO, and we are already experiencing locking problems.

      Test specs:
      ========

      Java 1.5
      POJO pre-compiled with aopc
      The cache version is 1.4.1.SP1 "Cayenne"

      The exception occurring
      ========================

      java.lang.RuntimeException: org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/hashObj/map, caller=GlobalTransaction:<10.105.204.141:4104>:1, lock=write
      owner=GlobalTransaction:<10.105.204.141:4117>:14 (activeReaders=0, activeWriter=Thread[UpHandler (STATE_TRANSFER),5,JGroups threads], waitingReaders=0, waitingWriters=1, waitingUpgrader=0)
       at org.jboss.cache.aop.collection.CachedMapImpl.put(CachedMapImpl.java:80)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
       at java.lang.reflect.Method.invoke(Unknown Source)
       at org.jboss.cache.aop.collection.CollectionInterceptorUtil.invoke(CollectionInterceptorUtil.java:124)
       at org.jboss.cache.aop.collection.CachedMapInterceptor.invoke(CachedMapInterceptor.java:117)
       at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:98)
       at AOPClassProxy$0.put(AOPClassProxy$0.java)
       at com.cae.launchpad.dao.test.CommonObjectHash.putObject(CommonObjectHash.java:16)
       at com.cae.launchpad.dao.test.CollectionTest.run(CollectionTest.java:67)
       at com.cae.launchpad.dao.test.TestRunner.runTest(TestRunner.java:52)
       at com.cae.launchpad.dao.test.CacheTestJMX$1.run(CacheTestJMX.java:93)
      Caused by: org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/hashObj/map, caller=GlobalTransaction:<10.105.204.141:4104>:1, lock=write
      owner=GlobalTransaction:<10.105.204.141:4117>:14 (activeReaders=0, activeWriter=Thread[UpHandler (STATE_TRANSFER),5,JGroups threads], waitingReaders=0, waitingWriters=1, waitingUpgrader=0)
       at org.jboss.cache.Node.acquire(Node.java:500)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.acquireNodeLock(PessimisticLockInterceptor.java:342)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:287)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java:172)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.UnlockInterceptor.invoke(UnlockInterceptor.java:32)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:39)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:364)
       at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:160)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:157)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5710)
       at org.jboss.cache.TreeCache.put(TreeCache.java:3782)
       at org.jboss.cache.aop.PojoCache.lockPojo(PojoCache.java:631)
       at org.jboss.cache.aop.PojoCache.putObject(PojoCache.java:461)
       at org.jboss.cache.aop.collection.CachedMapImpl.put(CachedMapImpl.java:78)
       ... 12 more
      Caused by: org.jboss.cache.lock.TimeoutException: write lock for /hashObj/map could not be acquired after 10000 ms. Locks: Read lock owners: []
      Write lock owner: GlobalTransaction:<10.105.204.141:4117>:14 (caller=GlobalTransaction:<10.105.204.141:4104>:1, lock info: write
      owner=GlobalTransaction:<10.105.204.141:4117>:14 (activeReaders=0, activeWriter=Thread[UpHandler (STATE_TRANSFER),5,JGroups threads], waitingReaders=0, waitingWriters=1, waitingUpgrader=0))
       at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:206)
       at org.jboss.cache.Node.acquireWriteLock(Node.java:529)
       at org.jboss.cache.Node.acquire(Node.java:476)
       ... 29 more
      





      The test
      ======

      // NOTE: One VM is writing and the others are reading
      import java.util.Arrays;

      import org.jboss.cache.aop.PojoCache;

      public class CollectionTest {

          public void run(PojoCache c) {

              // Give all nodes time to join the cluster
              try {
                  Thread.sleep(2000);
              } catch (InterruptedException e) {
                  e.printStackTrace();
              }

              if ("runnerNode".equals(System.getProperty("test.name"))) {
                  // The writer node puts the shared POJO into the cache
                  CommonObjectHash hash = new CommonObjectHash();
                  try {
                      c.putObject("/hashObj", hash);
                  } catch (Exception e) {
                      e.printStackTrace();
                  }
              } else {
                  try {
                      Thread.sleep(500);
                  } catch (InterruptedException e) {
                      e.printStackTrace();
                  }
              }

              String key = System.getProperty("test.name");

              // Wait until the shared reference is available
              CommonObjectHash obj = null;
              while (obj == null) {
                  try {
                      obj = (CommonObjectHash) c.getObject("/hashObj");
                      Thread.sleep(500);
                  } catch (Exception e) {
                      e.printStackTrace();
                  }
              }

              // Each VM writes five 1 KB objects into the shared map
              for (int i = 0; i < 5; i++) {
                  CommonObject objC = new CommonObject();
                  byte[] b = new byte[1024];
                  Arrays.fill(b, (byte) ('A' + i));
                  objC.setString(new String(b));
                  obj.putObject(key + "-" + i, objC);
              }

              // Wait until the writer node's objects become visible
              for (int i = 0; i < 5; i++) {
                  CommonObject co = null;
                  while (co == null) {
                      co = obj.getObject("runnerNode-" + i);
                      try {
                          Thread.sleep(500);
                      } catch (InterruptedException e) {
                          e.printStackTrace();
                      }
                  }
              }

              // Print back the objects this VM wrote
              for (int i = 0; i < 5; i++) {
                  CommonObject co = obj.getObject(key + "-" + i);
                  System.out.println("My objects: " + co);
              }

              try {
                  Thread.sleep(15000);
              } catch (InterruptedException e) {
                  e.printStackTrace();
              }
          }
      }
      



      The POJO:
      =========
      import java.util.HashMap;

      @org.jboss.cache.aop.annotation.PojoCacheable
      public class CommonObjectHash {

          private HashMap map;

          public CommonObjectHash() {
              map = new HashMap();
          }

          public void putObject(String key, CommonObject obj) {
              map.put(key, obj);
          }

          public void removeObject(String key) {
              map.remove(key);
          }

          public CommonObject getObject(String key) {
              return (CommonObject) map.get(key);
          }

          public HashMap getHash() {
              return map;
          }
      }
      




      the cache setting:
      =============
      <?xml version="1.0" encoding="UTF-8"?>
      
      <!-- ===================================================================== -->
      <!-- -->
      <!-- Sample TreeCache Service Configuration -->
      <!-- -->
      <!-- ===================================================================== -->
      
      <server>
      
       <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar"/>
      
      
       <!-- ==================================================================== -->
       <!-- Defines TreeCache configuration -->
       <!-- ==================================================================== -->
      
       <mbean code="org.jboss.cache.aop.PojoCache"
       name="jboss.cache:service=PojoCache">
      
       <depends>jboss:service=Naming</depends>
       <depends>jboss:service=TransactionManager</depends>
      
       <!--
       Configure the TransactionManager
       -->
       <attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>
      
       <!--
       Isolation level : SERIALIZABLE
       REPEATABLE_READ (default)
       READ_COMMITTED
       READ_UNCOMMITTED
       NONE
       -->
       <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
      
       <!--
       Valid modes are LOCAL
       REPL_ASYNC
       REPL_SYNC
       INVALIDATION_ASYNC
       INVALIDATION_SYNC
       -->
       <attribute name="CacheMode">REPL_SYNC</attribute>
      
       <!--
       Just used for async repl: use a replication queue
       -->
       <attribute name="UseReplQueue">false</attribute>
      
       <!--
       Replication interval for replication queue (in ms)
       -->
       <attribute name="ReplQueueInterval">0</attribute>
      
       <!--
       Max number of elements which trigger replication
       -->
       <attribute name="ReplQueueMaxElements">0</attribute>
      
       <!-- Name of cluster. Needs to be the same for all clusters, in order
       to find each other
       -->
       <attribute name="ClusterName">TreeCache-Cluster</attribute>
      
       <!-- JGroups protocol stack properties. Can also be a URL,
       e.g. file:/home/bela/default.xml
       <attribute name="ClusterProperties"></attribute>
       -->
      
       <attribute name="ClusterConfig">
       <config>
       <!-- UDP: if you have a multihomed machine,
       set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2"
       -->
       <!-- UDP: On Windows machines, because of the media sense feature
       being broken with multicast (even after disabling media sense)
       set the loopback attribute to true -->
       <UDP mcast_addr="228.1.2.3" mcast_port="48866"
       ip_ttl="64" ip_mcast="true"
       mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
       ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
       loopback="false"/>
       <PING timeout="2000" num_initial_members="3"
       up_thread="false" down_thread="false"/>
       <MERGE2 min_interval="10000" max_interval="20000"/>
       <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
       <FD_SOCK/>
       <VERIFY_SUSPECT timeout="1500"
       up_thread="false" down_thread="false"/>
       <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
       max_xmit_size="8192" up_thread="false" down_thread="false"/>
       <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
       down_thread="false"/>
       <pbcast.STABLE desired_avg_gossip="20000"
       up_thread="false" down_thread="false"/>
       <FRAG frag_size="8192"
       down_thread="false" up_thread="false"/>
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
       shun="true" print_local_addr="true"/>
       <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
       </config>
       </attribute>
      
      
       <!--
       Whether or not to fetch state on joining a cluster
       NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive.
       -->
       <attribute name="FetchInMemoryState">true</attribute>
      
       <!--
       The max amount of time (in milliseconds) we wait until the
       initial state (ie. the contents of the cache) are retrieved from
       existing members in a clustered environment
       -->
       <attribute name="InitialStateRetrievalTimeout">15000</attribute>
      
       <!--
       Number of milliseconds to wait until all responses for a
       synchronous call have been received.
       -->
       <attribute name="SyncReplTimeout">15000</attribute>
      
       <!-- Max number of milliseconds to wait for a lock acquisition -->
       <attribute name="LockAcquisitionTimeout">10000</attribute>
      
       <!-- Name of the eviction policy class. -->
       <attribute name="EvictionPolicyClass"></attribute>
      
       <!--
       Indicate whether to use marshalling or not. Set this to true if you are running under a scoped
       class loader, e.g., inside an application server. Default is "false".
       -->
       <attribute name="UseMarshalling">false</attribute>
       </mbean>
      
      
       <!-- Uncomment to get a graphical view of the TreeCache MBean above
       <mbean code="org.jboss.cache.TreeCacheView" name="jboss.cache:service=TreeCacheView">
       <depends>jboss.cache:service=PojoCache</depends>
       <attribute name="CacheService">jboss.cache:service=PojoCache</attribute>
       </mbean> -->
      
      
      </server>
      


      We must have misconfigured the cache somewhere.

      Can someone help?
      Thanks for your support

      Gabriel Grise

        • 1. Re: Performance Tests and TimeoutException
          Ben Wang Master

          So in the 8-node cluster, only one node is doing single-threaded writes while the others are reading?

          • 2. Re: Performance Tests and TimeoutException
            Gabriel Grise Newbie

             More information about my problem:

             We still experience concurrency problems whether the cache is configured as optimistic or pessimistic. Our test is simple:

             1- A POJO containing a collection (HashMap) is written into the cache by JVM1 (lines 5-6).
             2- Then JVM1 to JVM8 put 1024-byte objects into the collection (lines 13-15) and read the objects back from the collection (lines 16-22).

             1  JVM1-8:
             2    PojoCache cache = new PojoCache(); // including start()
             3    sleep 2
             ----------------------------------------------------------------------
             4  JVM1:                                               | JVM2-8:
             5    ObjectWithHashMap owhm = new ObjectWithHashMap(); |   sleep 0.5
             6    cache.putObject("/hashObj", owhm);                |
             ----------------------------------------------------------------------
             7  JVM1-8:
             8    ObjectWithHashMap owhm = cache.getObject("/hashObj");
             9    while (owhm == null) {
             10     owhm = cache.getObject("/hashObj");
             11     sleep 0.5
             12   }
             13   for (int i = 0; i < 5; i++) {
             14     owhm.put("JVM${JVM_NUMBER}" + i, new PojoOf1024bytes());
             15   }
             16   for (int i = 0; i < 5; i++) {
             17     PojoOf1024bytes pi = owhm.get("JVM1" + i);
             18     while (pi == null) {
             19       pi = owhm.get("JVM1" + i);
             20       sleep 0.5
             21     }
             22   }
            


             Observations with optimistic locking:
             ============================
             Some JVMs are unable to retrieve objects from inside the collection (lines 16-22), and in other cases are unable to retrieve the POJO containing the collection.

             Observations with pessimistic locking:
             ============================
             We get TimeoutExceptions, or the same behavior as with optimistic locking.
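
             A common client-side mitigation for transient lock timeouts (it does not fix the underlying contention) is to retry a timed-out put with a short backoff. Below is a minimal stdlib-only sketch of that pattern; RetryPut and withRetry are hypothetical names, and the Callable stands in for a real putObject call rather than showing the actual PojoCache API:

```java
import java.util.concurrent.Callable;

public class RetryPut {

    // Retries the action up to maxAttempts times, sleeping backoffMs between
    // attempts; rethrows the last failure if every attempt fails.
    public static <T> T withRetry(Callable<T> action, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(backoffMs);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated cache put that fails twice (standing in for lock
        // timeouts) before succeeding on the third attempt.
        final int[] calls = {0};
        String result = withRetry(new Callable<String>() {
            public String call() throws Exception {
                if (++calls[0] < 3) {
                    throw new RuntimeException("failure acquiring lock");
                }
                return "stored";
            }
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

             Note that under REPL_SYNC with pessimistic locking, retries only help if the competing transaction eventually releases its lock; if every node spins on the same Fqn, the retries just serialize the writers.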

            thanks

            Gabriel.


            • 3. Re: Performance Tests and TimeoutException
              Pat Lap Newbie

              Hi everyone,

               JBossCache seems to be a great cache, but our tests did not demonstrate the expected results. We have only 8 nodes so far and we would like to go up to 100 nodes. With 8 nodes, we already experienced lock problems in both modes (optimistic, pessimistic).


               So a few questions come to mind:

               1- How many nodes can we expect to run in the cluster for best results?
               2- What is the maximum number of nodes that have been tried with the cache so far?

               Thanks for your support

              • 4. Re: Performance Tests and TimeoutException
                Manik Surtani Master

                 The problem with going beyond, say, 5 or 6 nodes is that replication and two-phase commits across the cluster become a big overhead, and you will find that the app does not scale. I'm assuming you're using synchronous replication.

                 As a solution, to help things scale better, use either asynchronous replication or buddy replication.
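
                 For reference, switching the configuration posted above to asynchronous replication is a change to the CacheMode attribute, optionally combined with the replication-queue attributes already present in that file. This is a sketch of the relevant fragment only, with illustrative interval/element values (buddy replication needs additional configuration not shown here; consult the JBossCache 1.4.x documentation for its exact elements):

```xml
<!-- Asynchronous replication: puts return without waiting for
     acknowledgement from the rest of the cluster -->
<attribute name="CacheMode">REPL_ASYNC</attribute>

<!-- Optionally batch replication messages through a queue
     (only used with async replication) -->
<attribute name="UseReplQueue">true</attribute>
<attribute name="ReplQueueInterval">100</attribute>
<attribute name="ReplQueueMaxElements">500</attribute>
```

                 The trade-off is consistency: readers on other nodes may briefly see stale state, which matters for a test like the one above that polls until a value appears.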