Transaction time-out causes a cache malfunction
lbasili Aug 6, 2004 4:35 AMHello,
I am using TreeCache on JBoss-3.2.3 in a cluster partition of two nodes (their ip addresses are respectively 192.168.1.249, 192.168.1.250).
The problem is that after 3-4 hours of heavy traffic, with a lot of reads and few updates on the cache, I encounter some time-outs and lock contentions,
after which the cache stops responding.
Here is the MBean configuration (the cluster config is cut-and-pasted from the cluster-service.xml)
<mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache"> <depends>jboss:service=Naming</depends> <depends>jboss:service=TransactionManager</depends> <attribute name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute> <attribute name="IsolationLevel">REPEATABLE_READ</attribute> <attribute name="CacheMode">REPL_SYNC</attribute> <attribute name="UseReplQueue">false</attribute> <attribute name="ReplQueueInterval">0</attribute> <attribute name="ReplQueueMaxElements">0</attribute> <attribute name="ClusterName">TreeCache-Cluster</attribute> <attribute name="ClusterConfig"> <Config> <UDP mcast_addr="228.1.2.3" mcast_port="45566" ip_ttl="64" ip_mcast="true" bind_addr="192.168.1.250" mcast_send_buf_size="150000" mcast_recv_buf_size="80000" ucast_send_buf_size="150000" ucast_recv_buf_size="80000" loopback="false" /> <PING timeout="2000" num_initial_members="3" up_thread="true" down_thread="true" /> <MERGE2 min_interval="5000" max_interval="10000" /> <FD shun="true" up_thread="true" down_thread="true" timeout="2500" max_tries="5" /> <VERIFY_SUSPECT timeout="3000" num_msgs="3" up_thread="true" down_thread="true" /> <pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800" up_thread="true" down_thread="true" /> <pbcast.STABLE desired_avg_gossip="20000" up_thread="true" down_thread="true" /> <UNICAST timeout="5000" window_size="100" min_threshold="10" down_thread="true" /> <FRAG frag_size="8192" down_thread="true" up_thread="true" /> <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" /> <pbcast.STATE_TRANSFER up_thread="true" down_thread="true" /> </Config> </attribute> <!-- Max number of entries in the cache. If this is exceeded, the eviction policy will kick some entries out in order to make more room --> <attribute name="MaxCapacity">20000</attribute> <!-- Whether or not to fetch state on joining a cluster --> <attribute name="FetchStateOnStartup">true</attribute> <!-- The max amount of time (in milliseconds) we wait until the initial state (ie. the contents of the cache) are retrieved from existing members in a clustered environment --> <attribute name="InitialStateRetrievalTimeout">5000</attribute> <!-- Number of milliseconds to wait until all responses for a synchronous call have been received. --> <attribute name="SyncReplTimeout">10000</attribute> <!-- Max number of milliseconds to wait for a lock acquisition --> <attribute name="LockAcquisitionTimeout">15000</attribute> <!-- Max number of milliseconds we hold a lock (not currently implemented) --> <attribute name="LockLeaseTimeout">60000</attribute> <!-- Name of the eviction policy class. --> <attribute name="EvictionPolicyClass">org.jboss.cache.eviction.LRUPolicy</attribute> <!-- Specific eviction policy configurations. This is LRU --> <attribute name="EvictionPolicyConfig"> <config> <attribute name="wakeUpIntervalSeconds">5</attribute> <!-- Cache wide default --> <region name="/_default_"> <attribute name="maxNodes">5000</attribute> <attribute name="timeToIdleSeconds">1000</attribute> </region> <region name="/org/jboss/data"> <attribute name="maxNodes">5000</attribute> <attribute name="timeToIdleSeconds">1000</attribute> </region> <region name="/org/jboss/test/data"> <attribute name="maxNodes">5</attribute> <attribute name="timeToIdleSeconds">4</attribute> </region> </config> </attribute> </mbean>
After some hours during which the cache behaves correctly, I encounter the following message
2004-08-05 20:56:29,927 WARN [org.jboss.tm.TransactionImpl] Lock contention, tx=Transacti onImpl:XidImpl [FormatId=257, GlobalId=ztc2.voip.stre.com//43797, BranchQual=] 2004-08-05 20:56:29,938 ERROR [org.jboss.logging.Log4jService$ThrowableListenerLoggingAdap ter] Unhandled Throwable java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:448) at java.util.AbstractList$Itr.next(AbstractList.java:419) at org.jboss.tm.TransactionImpl.interruptThreads(TransactionImpl.java:823) at org.jboss.tm.TransactionImpl.timedOut(TransactionImpl.java:273) at org.jboss.util.timeout.TimeoutFactory$TimeoutWorker.run(TimeoutFactory.java:100 )
Then, I begin to see the following warning and a consequent exception on stderr (these messages are dumped a lot of times):
2004-08-06 00:38:03,259 WARN [org.jboss.tm.TransactionImpl] Transaction TransactionImpl:X idImpl [FormatId=257, GlobalId=ztc2.voip.stre.com//43851, BranchQual=] timed out. status=STATUS_ACTIVE 2004-08-06 00:38:03,259 ERROR [STDERR] java.lang.InterruptedException 2004-08-06 00:38:03,261 ERROR [STDERR] at java.lang.Object.wait(Native Method) 2004-08-06 00:38:03,261 ERROR [STDERR] at java.lang.Object.wait(Object.java:429) 2004-08-06 00:38:03,261 ERROR [STDERR] at EDU.oswego.cs.dl.util.concurrent.BoundedBuffer. put(BoundedBuffer.java:109) 2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.eviction.Region.putNodeEvent(Re gion.java:96) 2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.eviction.Region.setVisitedNode( Region.java:90) 2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.eviction.LRUPolicy.nodeVisited( LRUPolicy.java:95) 2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.TreeCache.notifyNodeVisisted(Tr eeCache.java:3005) 2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.TreeCache.get(TreeCache.java:11 67) 2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.TreeCache.get(TreeCache.java:11 57) 2004-08-06 00:38:03,261 ERROR [STDERR] at org.jboss.cache.TreeCache.get(TreeCache.java:11
and the following errors
2004-08-06 09:49:43,439 ERROR [org.jboss.cache.lock.IdentityLock] acquireReadLock(): attem pt failed. lock map ownership Read lock owners: [<192.168.1.250:32774>:14817] Write lock owner: <192.168.1.249:32779>:6198
2004-08-06 09:50:41,069 WARN [org.jboss.ejb.plugins.AbstractInstanceCache] Unable to passivate due to ctx lock, id=172.30.1.101:1099:dxj8z7ya-5x3
Then every time I try to read the cache, I obtain the exception:
2004-08-06 09:54:29,626 ERROR [ztc.util.CacheHelper] Cache malfunction. Won't get key=1 MBeanException: Exception in MBean operation 'get(java.lang.String,java.lang.Object)' Cause: org.jboss.cache.lock.LockingException: IdentityLock.attempt() at org.jboss.mx.capability.ReflectedMBeanDispatcher.invoke(ReflectedMBeanDispatche r.java:303) at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:546) at ztc.util.CacheHelper.get(Unknown Source) at ztc.ldapdao.ra.ci.DAOFactoryImpl.getMetaData(Unknown Source) at ztc.dao.DAOContext.prepareName(Unknown Source) at ztc.dao.DAOContext.addChildNode(Unknown Source) at ztc.ldapdao.GenericDAO.getChildContext(Unknown Source) at ztc.ldapdao.GenericDAO.getChild(Unknown Source) at ztc.ejb.beanimpl.postlogger.PostLoggerBean.doPostLogin(Unknown Source) at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja va:25) at java.lang.reflect.Method.invoke(Method.java:324) at org.jboss.ejb.StatelessSessionContainer$ContainerInterceptor.invoke(StatelessSe ssionContainer.java:683) at org.jboss.resource.connectionmanager.CachedConnectionInterceptor.invoke(CachedC onnectionInterceptor.java:185) at org.jboss.ejb.plugins.StatelessSessionInstanceInterceptor.invoke(StatelessSessi onInstanceInterceptor.java:72) at org.jboss.ejb.plugins.AbstractTxInterceptor.invokeNext(AbstractTxInterceptor.ja va:84) at org.jboss.ejb.plugins.TxInterceptorCMT.runWithTransactions(TxInterceptorCMT.jav
Can you help me to understand what is happening ?
Thank you in advance,
--
L