4 Replies Latest reply on Mar 1, 2007 2:28 PM by Manik Surtani

    Performance Tests and TimeoutException

    Gabriel Grise Newbie

      Hi,
      We are doing some performance tests on the POJO cache to see if it will meet our requirements. We plan to go above 100 nodes, but right now our test only includes 8 nodes. The first test is simple: one node writes a POJO into the cache (c.putObject("/hashObj", hash)) and all the others read/write a HashMap inside that POJO, and we are already experiencing locking problems.

      Test specs:
      ========

      Java 1.5
      POJO pre-compiled with aopc
      The cache version is 1.4.1.SP1 "Cayenne"

      The exception occurring
      ========================

      java.lang.RuntimeException: org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/hashObj/map, caller=GlobalTransaction:<10.105.204.141:4104>:1, lock=write
      owner=GlobalTransaction:<10.105.204.141:4117>:14 (activeReaders=0, activeWriter=Thread[UpHandler (STATE_TRANSFER),5,JGroups threads], waitingReaders=0, waitingWriters=1, waitingUpgrader=0)
       at org.jboss.cache.aop.collection.CachedMapImpl.put(CachedMapImpl.java:80)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
       at java.lang.reflect.Method.invoke(Unknown Source)
       at org.jboss.cache.aop.collection.CollectionInterceptorUtil.invoke(CollectionInterceptorUtil.java:124)
       at org.jboss.cache.aop.collection.CachedMapInterceptor.invoke(CachedMapInterceptor.java:117)
       at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:98)
       at AOPClassProxy$0.put(AOPClassProxy$0.java)
       at com.cae.launchpad.dao.test.CommonObjectHash.putObject(CommonObjectHash.java:16)
       at com.cae.launchpad.dao.test.CollectionTest.run(CollectionTest.java:67)
       at com.cae.launchpad.dao.test.TestRunner.runTest(TestRunner.java:52)
       at com.cae.launchpad.dao.test.CacheTestJMX$1.run(CacheTestJMX.java:93)
      Caused by: org.jboss.cache.lock.TimeoutException: failure acquiring lock: fqn=/hashObj/map, caller=GlobalTransaction:<10.105.204.141:4104>:1, lock=write
      owner=GlobalTransaction:<10.105.204.141:4117>:14 (activeReaders=0, activeWriter=Thread[UpHandler (STATE_TRANSFER),5,JGroups threads], waitingReaders=0, waitingWriters=1, waitingUpgrader=0)
       at org.jboss.cache.Node.acquire(Node.java:500)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.acquireNodeLock(PessimisticLockInterceptor.java:342)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.lock(PessimisticLockInterceptor.java:287)
       at org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java:172)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.UnlockInterceptor.invoke(UnlockInterceptor.java:32)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:39)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:364)
       at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:160)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
       at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:157)
       at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5710)
       at org.jboss.cache.TreeCache.put(TreeCache.java:3782)
       at org.jboss.cache.aop.PojoCache.lockPojo(PojoCache.java:631)
       at org.jboss.cache.aop.PojoCache.putObject(PojoCache.java:461)
       at org.jboss.cache.aop.collection.CachedMapImpl.put(CachedMapImpl.java:78)
       ... 12 more
      Caused by: org.jboss.cache.lock.TimeoutException: write lock for /hashObj/map could not be acquired after 10000 ms. Locks: Read lock owners: []
      Write lock owner: GlobalTransaction:<10.105.204.141:4117>:14 (caller=GlobalTransaction:<10.105.204.141:4104>:1, lock info: write
      owner=GlobalTransaction:<10.105.204.141:4117>:14 (activeReaders=0, activeWriter=Thread[UpHandler (STATE_TRANSFER),5,JGroups threads], waitingReaders=0, waitingWriters=1, waitingUpgrader=0))
       at org.jboss.cache.lock.IdentityLock.acquireWriteLock(IdentityLock.java:206)
       at org.jboss.cache.Node.acquireWriteLock(Node.java:529)
       at org.jboss.cache.Node.acquire(Node.java:476)
       ... 29 more
      





      The test
      ======

      // NOTE: One VM is writing and the others are reading
      import java.util.Arrays;

      import org.jboss.cache.aop.PojoCache;

      public class CollectionTest {

          public void run(PojoCache c) {

              // Give all nodes time to join the cluster
              try {
                  Thread.sleep(2000);
              } catch (InterruptedException e) {
                  e.printStackTrace();
              }

              if ("runnerNode".equals(System.getProperty("test.name"))) {
                  // The writer node puts the shared POJO into the cache
                  CommonObjectHash hash = new CommonObjectHash();
                  try {
                      c.putObject("/hashObj", hash);
                  } catch (Exception e) {
                      e.printStackTrace();
                  }
              } else {
                  try {
                      Thread.sleep(500);
                  } catch (InterruptedException e) {
                      e.printStackTrace();
                  }
              }

              String key = System.getProperty("test.name");

              // Wait until the shared reference is available
              CommonObjectHash obj = null;
              while (obj == null) {
                  try {
                      obj = (CommonObjectHash) c.getObject("/hashObj");
                      Thread.sleep(500);
                  } catch (Exception e) {
                      e.printStackTrace();
                  }
              }

              // Each VM writes five 1 KB objects into the shared map
              for (int i = 0; i < 5; i++) {
                  CommonObject objC = new CommonObject();
                  byte[] b = new byte[1024];
                  Arrays.fill(b, (byte) ('A' + i));
                  objC.setString(new String(b));
                  obj.putObject(key + "-" + i, objC);
              }

              // Wait until the writer node's objects become visible
              for (int i = 0; i < 5; i++) {
                  CommonObject co = null;
                  while (co == null) {
                      co = obj.getObject("runnerNode-" + i);
                      try {
                          Thread.sleep(500);
                      } catch (InterruptedException e) {
                          e.printStackTrace();
                      }
                  }
              }

              // Print back the objects this VM wrote
              for (int i = 0; i < 5; i++) {
                  CommonObject co = obj.getObject(key + "-" + i);
                  System.out.println("My objects: " + co);
              }

              try {
                  Thread.sleep(15000);
              } catch (InterruptedException e) {
                  e.printStackTrace();
              }
          }
      }
      



      The POJO:
      =========
      import java.util.HashMap;

      @org.jboss.cache.aop.annotation.PojoCacheable
      public class CommonObjectHash {

          private HashMap map;

          public CommonObjectHash() {
              map = new HashMap();
          }

          public void putObject(String key, CommonObject obj) {
              map.put(key, obj);
          }

          public void removeObject(String key) {
              map.remove(key);
          }

          public CommonObject getObject(String key) {
              return (CommonObject) map.get(key);
          }

          public HashMap getHash() {
              return map;
          }
      }
      




      the cache setting:
      =============
      <?xml version="1.0" encoding="UTF-8"?>
      
      <!-- ===================================================================== -->
      <!-- -->
      <!-- Sample TreeCache Service Configuration -->
      <!-- -->
      <!-- ===================================================================== -->
      
      <server>
      
       <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar"/>
      
      
       <!-- ==================================================================== -->
       <!-- Defines TreeCache configuration -->
       <!-- ==================================================================== -->
      
       <mbean code="org.jboss.cache.aop.PojoCache"
       name="jboss.cache:service=PojoCache">
      
       <depends>jboss:service=Naming</depends>
       <depends>jboss:service=TransactionManager</depends>
      
       <!--
       Configure the TransactionManager
       -->
       <attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>
      
       <!--
       Isolation level : SERIALIZABLE
       REPEATABLE_READ (default)
       READ_COMMITTED
       READ_UNCOMMITTED
       NONE
       -->
       <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
      
       <!--
       Valid modes are LOCAL
       REPL_ASYNC
       REPL_SYNC
       INVALIDATION_ASYNC
       INVALIDATION_SYNC
       -->
       <attribute name="CacheMode">REPL_SYNC</attribute>
      
       <!--
       Just used for async repl: use a replication queue
       -->
       <attribute name="UseReplQueue">false</attribute>
      
       <!--
       Replication interval for replication queue (in ms)
       -->
       <attribute name="ReplQueueInterval">0</attribute>
      
       <!--
       Max number of elements which trigger replication
       -->
       <attribute name="ReplQueueMaxElements">0</attribute>
      
       <!-- Name of cluster. Needs to be the same for all clusters, in order
       to find each other
       -->
       <attribute name="ClusterName">TreeCache-Cluster</attribute>
      
       <!-- JGroups protocol stack properties. Can also be a URL,
       e.g. file:/home/bela/default.xml
       <attribute name="ClusterProperties"></attribute>
       -->
      
       <attribute name="ClusterConfig">
       <config>
       <!-- UDP: if you have a multihomed machine,
       set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2"
       -->
       <!-- UDP: On Windows machines, because of the media sense feature
       being broken with multicast (even after disabling media sense)
       set the loopback attribute to true -->
       <UDP mcast_addr="228.1.2.3" mcast_port="48866"
       ip_ttl="64" ip_mcast="true"
       mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
       ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
       loopback="false"/>
       <PING timeout="2000" num_initial_members="3"
       up_thread="false" down_thread="false"/>
       <MERGE2 min_interval="10000" max_interval="20000"/>
       <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
       <FD_SOCK/>
       <VERIFY_SUSPECT timeout="1500"
       up_thread="false" down_thread="false"/>
       <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
       max_xmit_size="8192" up_thread="false" down_thread="false"/>
       <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
       down_thread="false"/>
       <pbcast.STABLE desired_avg_gossip="20000"
       up_thread="false" down_thread="false"/>
       <FRAG frag_size="8192"
       down_thread="false" up_thread="false"/>
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
       shun="true" print_local_addr="true"/>
       <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
       </config>
       </attribute>
      
      
       <!--
       Whether or not to fetch state on joining a cluster
       NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive.
       -->
       <attribute name="FetchInMemoryState">true</attribute>
      
       <!--
       The max amount of time (in milliseconds) we wait until the
       initial state (ie. the contents of the cache) are retrieved from
       existing members in a clustered environment
       -->
       <attribute name="InitialStateRetrievalTimeout">15000</attribute>
      
       <!--
       Number of milliseconds to wait until all responses for a
       synchronous call have been received.
       -->
       <attribute name="SyncReplTimeout">15000</attribute>
      
       <!-- Max number of milliseconds to wait for a lock acquisition -->
       <attribute name="LockAcquisitionTimeout">10000</attribute>
      
       <!-- Name of the eviction policy class. -->
       <attribute name="EvictionPolicyClass"></attribute>
      
       <!--
       Indicate whether to use marshalling or not. Set this to true if you are running under a scoped
       class loader, e.g., inside an application server. Default is "false".
       -->
       <attribute name="UseMarshalling">false</attribute>
       </mbean>
      
      
       <!-- Uncomment to get a graphical view of the TreeCache MBean above
       <mbean code="org.jboss.cache.TreeCacheView" name="jboss.cache:service=TreeCacheView">
       <depends>jboss.cache:service=PojoCache</depends>
       <attribute name="CacheService">jboss.cache:service=PojoCache</attribute>
       </mbean> -->
      
      
      </server>
      


      We must have misconfigured the cache somewhere.

      Can someone help?
      Thanks for your support

      Gabriel Grise

        • 1. Re: Performance Tests and TimeoutException
          Ben Wang Master

          So in the 8-node cluster, only one node is doing single-threaded writes while the others are reading?

          • 2. Re: Performance Tests and TimeoutException
            Gabriel Grise Newbie

             More information about my problem:

             We still experience concurrency problems whether the cache is configured as optimistic or pessimistic. Our test is simple:

             1- A POJO containing a collection (HashMap) is written into the cache by JVM1 (lines 5-6).
             2- Then JVM1 to JVM8 put 1024-byte objects into the collection (lines 13-15) and read the objects back from the collection (lines 16-22).

             1  JVM1-8:
             2    PojoCache cache = new PojoCache(); // including start()
             3    sleep 2
             ----------------------------------------------------------------------
             4  JVM1:                                               | JVM2-8:
             5    ObjectWithHashMap owhm = new ObjectWithHashMap(); |   sleep 0.5
             6    cache.putObject("/hashObj", owhm);                |
             ----------------------------------------------------------------------
             7  JVM1-8:
             8    ObjectWithHashMap owhm = cache.getObject("/hashObj");
             9    while (owhm == null) {
             10     owhm = cache.getObject("/hashObj");
             11     sleep 0.5
             12   }
             13   for (int i = 0; i < 5; i++) {
             14     owhm.put("JVM${JVM_NUMBER}" + i, new PojoOf1024bytes());
             15   }
             16   for (int i = 0; i < 5; i++) {
             17     PojoOf1024bytes pi = owhm.get("JVM1" + i);
             18     while (pi == null) {
             19       pi = owhm.get("JVM1" + i);
             20       sleep 0.5
             21     }
             22   }
            


             Observations with optimistic locking:
             ============================
             Some JVMs are unable to retrieve objects from inside the collection (lines 16-22), and in other cases are unable to retrieve the POJO containing the collection.

             Observations with pessimistic locking:
             ============================
             We get TimeoutExceptions, or the same behavior as with optimistic locking.
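
             A common client-side mitigation for transient lock timeouts (it does not fix the underlying contention) is to retry a timed-out put with a short backoff. Below is a minimal stdlib-only sketch of that pattern; RetryPut and withRetry are hypothetical names, and the Callable stands in for a real putObject call rather than showing the actual PojoCache API:

```java
import java.util.concurrent.Callable;

public class RetryPut {

    // Retries the action up to maxAttempts times, sleeping backoffMs between
    // attempts; rethrows the last failure if every attempt fails.
    public static <T> T withRetry(Callable<T> action, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(backoffMs);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated cache put that fails twice (standing in for lock
        // timeouts) before succeeding on the third attempt.
        final int[] calls = {0};
        String result = withRetry(new Callable<String>() {
            public String call() throws Exception {
                if (++calls[0] < 3) {
                    throw new RuntimeException("failure acquiring lock");
                }
                return "stored";
            }
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

             Note that under REPL_SYNC with pessimistic locking, retries only help if the competing transaction eventually releases its lock; if every node spins on the same Fqn, the retries just serialize the writers.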

            thanks

            Gabriel.


            • 3. Re: Performance Tests and TimeoutException
              Pat Lap Newbie

              Hi everyone,

               JBossCache seems to be a great cache, but our tests did not demonstrate the expected results. We have only 8 nodes so far and we would like to go up to 100 nodes. With 8 nodes, we already experienced lock problems in both modes (optimistic, pessimistic).


               So a few questions come to mind:

               1- How many nodes can we expect to run in the cluster for best results?
               2- What is the maximum number of nodes that have been tried with the cache so far?

               Thanks for your support

              • 4. Re: Performance Tests and TimeoutException
                Manik Surtani Master

                 The problem with going beyond, say, 5 or 6 nodes is that replication and two-phase commits across the cluster become a big overhead, and you will find that the app does not scale. I'm assuming you're using synchronous replication.

                 As a solution, to help things scale better, use either asynchronous replication or buddy replication.
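
                 For reference, switching the configuration posted above to asynchronous replication is a change to the CacheMode attribute, optionally combined with the replication-queue attributes already present in that file. This is a sketch of the relevant fragment only, with illustrative interval/element values (buddy replication needs additional configuration not shown here; consult the JBossCache 1.4.x documentation for its exact elements):

```xml
<!-- Asynchronous replication: puts return without waiting for
     acknowledgement from the rest of the cluster -->
<attribute name="CacheMode">REPL_ASYNC</attribute>

<!-- Optionally batch replication messages through a queue
     (only used with async replication) -->
<attribute name="UseReplQueue">true</attribute>
<attribute name="ReplQueueInterval">100</attribute>
<attribute name="ReplQueueMaxElements">500</attribute>
```

                 The trade-off is consistency: readers on other nodes may briefly see stale state, which matters for a test like the one above that polls until a value appears.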