1 Reply Latest reply on Aug 11, 2004 12:39 AM by ben.wang

    Transaction time-out causes a cache malfunction

    lbasili

      Hello,
      I am using TreeCache on JBoss-3.2.3 in a cluster partition of two nodes (their ip addresses are respectively 192.168.1.249, 192.168.1.250).
      The problem is that after 3-4 hours of heavy traffic, with a lot of reads and few updates on the cache, I encounter some time-outs and lock contentions,
      after which the cache stops responding.

      Here is the MBean configuration (the cluster config is cut-and-pasted from the cluster-service.xml)

      <mbean code="org.jboss.cache.TreeCache"
       name="jboss.cache:service=TreeCache">
      
       <depends>jboss:service=Naming</depends>
       <depends>jboss:service=TransactionManager</depends>
       <attribute name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute>
       <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
       <attribute name="CacheMode">REPL_SYNC</attribute>
       <attribute name="UseReplQueue">false</attribute>
       <attribute name="ReplQueueInterval">0</attribute>
       <attribute name="ReplQueueMaxElements">0</attribute>
       <attribute name="ClusterName">TreeCache-Cluster</attribute>
       <attribute name="ClusterConfig">
       <Config>
       <UDP mcast_addr="228.1.2.3" mcast_port="45566"
       ip_ttl="64" ip_mcast="true"
       bind_addr="192.168.1.250"
       mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
       ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
       loopback="false" />
       <PING timeout="2000" num_initial_members="3"
       up_thread="true" down_thread="true" />
       <MERGE2 min_interval="5000" max_interval="10000" />
       <FD shun="true" up_thread="true" down_thread="true"
       timeout="2500" max_tries="5" />
       <VERIFY_SUSPECT timeout="3000" num_msgs="3"
       up_thread="true" down_thread="true" />
       <pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"
       up_thread="true" down_thread="true" />
       <pbcast.STABLE desired_avg_gossip="20000"
       up_thread="true" down_thread="true" />
       <UNICAST timeout="5000" window_size="100" min_threshold="10"
       down_thread="true" />
       <FRAG frag_size="8192"
       down_thread="true" up_thread="true" />
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
       shun="true" print_local_addr="true" />
       <pbcast.STATE_TRANSFER up_thread="true" down_thread="true" />
       </Config>
       </attribute>
       <!--
       Max number of entries in the cache. If this is exceeded, the
       eviction policy will kick some entries out in order to make
       more room
       -->
       <attribute name="MaxCapacity">20000</attribute>
      
       <!--
       Whether or not to fetch state on joining a cluster
       -->
       <attribute name="FetchStateOnStartup">true</attribute>
      
       <!--
       The max amount of time (in milliseconds) we wait until the
       initial state (ie. the contents of the cache) are retrieved from
       existing members in a clustered environment
       -->
       <attribute name="InitialStateRetrievalTimeout">5000</attribute>
      
       <!--
       Number of milliseconds to wait until all responses for a
       synchronous call have been received.
       -->
       <attribute name="SyncReplTimeout">10000</attribute>
       <!-- Max number of milliseconds to wait for a lock acquisition -->
       <attribute name="LockAcquisitionTimeout">15000</attribute>
      
       <!-- Max number of milliseconds we hold a lock (not currently
       implemented) -->
       <attribute name="LockLeaseTimeout">60000</attribute>
      
      
       <!-- Name of the eviction policy class. -->
       <attribute name="EvictionPolicyClass">org.jboss.cache.eviction.LRUPolicy</attribute>
       <!-- Specific eviction policy configurations. This is LRU -->
       <attribute name="EvictionPolicyConfig">
       <config>
       <attribute name="wakeUpIntervalSeconds">5</attribute>
       <!-- Cache wide default -->
       <region name="/_default_">
       <attribute name="maxNodes">5000</attribute>
       <attribute name="timeToIdleSeconds">1000</attribute>
       </region>
       <region name="/org/jboss/data">
       <attribute name="maxNodes">5000</attribute>
       <attribute name="timeToIdleSeconds">1000</attribute>
       </region>
       <region name="/org/jboss/test/data">
       <attribute name="maxNodes">5</attribute>
       <attribute name="timeToIdleSeconds">4</attribute>
       </region>
       </config>
       </attribute>
       </mbean>
      

      After some hours during which the cache behaves correctly, I encounter the following message
      2004-08-05 20:56:29,927 WARN [org.jboss.tm.TransactionImpl] Lock contention, tx=Transacti
      onImpl:XidImpl [FormatId=257, GlobalId=ztc2.voip.stre.com//43797, BranchQual=]
      2004-08-05 20:56:29,938 ERROR [org.jboss.logging.Log4jService$ThrowableListenerLoggingAdap
      ter] Unhandled Throwable
      java.util.ConcurrentModificationException
       at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:448)
       at java.util.AbstractList$Itr.next(AbstractList.java:419)
       at org.jboss.tm.TransactionImpl.interruptThreads(TransactionImpl.java:823)
       at org.jboss.tm.TransactionImpl.timedOut(TransactionImpl.java:273)
       at org.jboss.util.timeout.TimeoutFactory$TimeoutWorker.run(TimeoutFactory.java:100
      )
      

      Then, I begin to see the following warning and a consequent exception on stderr (these messages are dumped a lot of times):

      2004-08-06 00:38:03,259 WARN [org.jboss.tm.TransactionImpl] Transaction TransactionImpl:X
      idImpl [FormatId=257, GlobalId=ztc2.voip.stre.com//43851, BranchQual=] timed out. status=STATUS_ACTIVE
      2004-08-06 00:38:03,259 ERROR [STDERR] java.lang.InterruptedException
      2004-08-06 00:38:03,261 ERROR [STDERR] at java.lang.Object.wait(Native Method)
      2004-08-06 00:38:03,261 ERROR [STDERR] at java.lang.Object.wait(Object.java:429)
      2004-08-06 00:38:03,261 ERROR [STDERR] at EDU.oswego.cs.dl.util.concurrent.BoundedBuffer.
      put(BoundedBuffer.java:109)
      2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.eviction.Region.putNodeEvent(Re
      gion.java:96)
      2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.eviction.Region.setVisitedNode(
      Region.java:90)
      2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.eviction.LRUPolicy.nodeVisited(
      LRUPolicy.java:95)
      2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.TreeCache.notifyNodeVisisted(Tr
      eeCache.java:3005)
      2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.TreeCache.get(TreeCache.java:11
      67)
      2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.TreeCache.get(TreeCache.java:11
      57)
      2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.TreeCache.get(TreeCache.java:11
      


      and the following errors

      2004-08-06 09:49:43,439 ERROR [org.jboss.cache.lock.IdentityLock] acquireReadLock(): attem
      pt failed. lock map ownership Read lock owners: [<192.168.1.250:32774>:14817]
      Write lock owner: <192.168.1.249:32779>:6198
      


      2004-08-06 09:50:41,069 WARN [org.jboss.ejb.plugins.AbstractInstanceCache] Unable to passivate due to ctx lock, id=172.30.1.101:1099:dxj8z7ya-5x3
      



      Then every time I try to read the cache, I obtain the exception:
      2004-08-06 09:54:29,626 ERROR [ztc.util.CacheHelper] Cache malfunction. Won't get key=1
      MBeanException: Exception in MBean operation 'get(java.lang.String,java.lang.Object)'
      Cause: org.jboss.cache.lock.LockingException: IdentityLock.attempt()
       at org.jboss.mx.capability.ReflectedMBeanDispatcher.invoke(ReflectedMBeanDispatche
      r.java:303)
       at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:546)
       at ztc.util.CacheHelper.get(Unknown Source)
       at ztc.ldapdao.ra.ci.DAOFactoryImpl.getMetaData(Unknown Source)
       at ztc.dao.DAOContext.prepareName(Unknown Source)
       at ztc.dao.DAOContext.addChildNode(Unknown Source)
       at ztc.ldapdao.GenericDAO.getChildContext(Unknown Source)
       at ztc.ldapdao.GenericDAO.getChild(Unknown Source)
       at ztc.ejb.beanimpl.postlogger.PostLoggerBean.doPostLogin(Unknown Source)
       at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja
      va:25)
       at java.lang.reflect.Method.invoke(Method.java:324)
       at org.jboss.ejb.StatelessSessionContainer$ContainerInterceptor.invoke(StatelessSe
      ssionContainer.java:683)
       at org.jboss.resource.connectionmanager.CachedConnectionInterceptor.invoke(CachedC
      onnectionInterceptor.java:185)
       at org.jboss.ejb.plugins.StatelessSessionInstanceInterceptor.invoke(StatelessSessi
      onInstanceInterceptor.java:72)
       at org.jboss.ejb.plugins.AbstractTxInterceptor.invokeNext(AbstractTxInterceptor.ja
      va:84)
       at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(TxInterceptorCMT.jav
      



      Can you help me to understand what is happening ?

      Thank you in advance,

      --
      L