9 Replies Latest reply on Mar 28, 2006 5:48 PM by akardell

JBossCache 1.3 Beta 2

akardell Mar 21, 2006 11:39 AM

We have run a load test against our application swapping in the JBossCache 1.3 Beta 2 jars, using INVALIDATION_ASYNC as the CacheMode and READ_COMMITTED as the IsolationLevel. Approximately 5 minutes into the test, we start to receive many IdentityLock errors on various objects. Is there anything we can do to help troubleshoot this? Is this to be expected -- I thought the invalidation scheme avoided locking altogether, but I may not have an appropriate understanding of it.

All of the errors look similar to the following:

1052265 [PoolThread-47] ERROR org.jboss.cache.lock.IdentityLock - write lock for /com/abc/def/orm/Term/com.abc.def.orm.Term#6 could not be acquired after 0 ms. Locks: Read lock owners: {}
Write lock owner: PoolThread-63
(caller=PoolThread-47, lock info: write owner=PoolThread-63 (org.jboss.cache.lock.LockStrategyReadCommitted@b15853))

Eventually, we start to get read-lock timeouts also, like the following:

org.hibernate.cache.CacheException: org.jboss.cache.lock.TimeoutException: read lock for /org/hibernate/cache/UpdateTimestampsCache/[Accounts] could not be acquired by PoolThread-86 after 15000 ms. Locks: Read lock owners: {}
Write lock owner: GlobalTransaction:<10.40.58.13:2603>:1
, lock info: write owner=GlobalTransaction:<10.40.58.13:2603>:1 (org.jboss.cache.lock.LockStrategyReadCommitted@2d4ce3)

Do we need new Hibernate jars to make use of the latest JBossCache jars?

Any other thoughts / strategies to try? I'll help provide whatever information I can.

Thanks,

Aaron

1. Re: JBossCache 1.3 Beta 2

manik Mar 21, 2006 11:58 AM (in response to akardell)

Hi there - do you still see this when using JBossCache 1.2.4.SP2 with REPL_ASYNC?

And no, you don't need to upgrade your Hibernate jars as long as you're using Hibernate >= 3.0.2.
2. Re: JBossCache 1.3 Beta 2

akardell Mar 21, 2006 12:46 PM (in response to akardell)

Under a substantial amount of load, it was not uncommon to get TimeoutExceptions and IdentityLock errors in 1.2.4.SP2 with REPL_ASYNC.
3. Re: JBossCache 1.3 Beta 2

manik Mar 21, 2006 1:30 PM (in response to akardell)

Does this change if you have a really high timeout? Threads will block for longer (as expected), but I'd like to see if this affects anything, since the log message says the lock could not be acquired after 0 ms.

4. Re: JBossCache 1.3 Beta 2

akardell Mar 21, 2006 4:37 PM (in response to akardell)

Perhaps I'm missing a setting? None of my timeouts are set to 0, as seen below. I can re-run a test, but which timeouts should I increase?

Thanks!

Aaron

<?xml version="1.0" encoding="UTF-8" ?>
<server>

 <!-- ==================================================================== -->
 <!-- Defines TreeCache configuration -->
 <!-- ==================================================================== -->
 <mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">
 <depends>jboss:service=Naming</depends>
 <depends>jboss:service=TransactionManager</depends>


 <!-- Configure the TransactionManager -->
 <attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>

 <!--
 Node locking level : SERIALIZABLE
 REPEATABLE_READ (default)
 READ_COMMITTED
 READ_UNCOMMITTED
 NONE
 -->
 <attribute name="IsolationLevel">READ_COMMITTED</attribute>

 <!-- Valid modes are LOCAL
 REPL_ASYNC
 REPL_SYNC
 INVALIDATION_ASYNC
 INVALIDATION_SYNC
 -->
 <attribute name="CacheMode">INVALIDATION_ASYNC</attribute>

 <!-- Name of cluster. Needs to be the same for all members of the
 cluster, in order for them to find each other -->
 <attribute name="ClusterName">TreeCache-Cluster</attribute>

 <attribute name="ClusterConfig">
 <config>
 <!-- UDP: if you have a multihomed machine,
 set the bind_addr attribute to the appropriate NIC IP address
 -->
 <!-- UDP: On Windows machines, because of the media sense feature
 being broken with multicast (even after disabling media sense)
 set the loopback attribute to true
 -->
 <UDP mcast_addr="228.8.8.8" mcast_port="45567" ip_ttl="64" ip_mcast="true"
 mcast_send_buf_size="150000" mcast_recv_buf_size="80000" ucast_send_buf_size="150000"
 ucast_recv_buf_size="80000" loopback="true" bind_addr="0.0.0.0" />
 <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false" />
 <MERGE2 min_interval="10000" max_interval="20000" />
 <FD shun="true" up_thread="true" down_thread="true" />
 <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
 <pbcast.NAKACK gc_lag="50" max_xmit_size="8192" retransmit_timeout="600,1200,2400,4800" up_thread="false"
 down_thread="false" />
 <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false" />
 <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />
 <FRAG frag_size="8192" down_thread="false" up_thread="false" />
 <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" />
 <pbcast.STATE_TRANSFER up_thread="false" down_thread="false" />
 </config>
 </attribute>

 <!-- The max amount of time (in milliseconds) we wait until the
 initial state (ie. the contents of the cache) are retrieved from
 existing members in a clustered environment
 -->
 <attribute name="InitialStateRetrievalTimeout">5000</attribute>

 <!-- Number of milliseconds to wait until all responses for a
 synchronous call have been received.
 -->
 <attribute name="SyncReplTimeout">10000</attribute>

 <!-- Max number of milliseconds to wait for a lock acquisition -->
 <attribute name="LockAcquisitionTimeout">15000</attribute>

 <!-- Name of the eviction policy class. -->
 <attribute name="EvictionPolicyClass">org.jboss.cache.eviction.LRUPolicy</attribute>

 <!-- Specific eviction policy configurations. This is LRU -->
 <attribute name="EvictionPolicyConfig">
 <config>
 <attribute name="wakeUpIntervalSeconds">5</attribute>
 <!-- Cache wide default -->
 <region name="/_default_">
 <attribute name="maxNodes">1000</attribute>
 <attribute name="timeToLiveSeconds">3600</attribute>
 </region>
 </config>
 </attribute>

 </mbean>
</server>

5. Re: JBossCache 1.3 Beta 2

manik Mar 21, 2006 6:53 PM (in response to akardell)

The lock acquisition timeout (the LockAcquisitionTimeout attribute).
6. Re: JBossCache 1.3 Beta 2

akardell Mar 23, 2006 4:24 PM (in response to akardell)
I tried increasing the timeout from 15000 to 150000. Similar results.

However, I noticed that there's a new option with 1.3, in addition to the new INVALIDATION_ASYNC option...

<attribute name="NodeLockingScheme">OPTIMISTIC</attribute>

Setting this caused all of the lock exceptions to go away!
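
For reference, a minimal sketch of how that attribute might sit next to the other settings in the TreeCache mbean posted earlier in this thread. The placement and the comments are assumptions, not a verified configuration. In particular, the note that IsolationLevel is ignored under the optimistic scheme reflects my reading of the 1.3 documentation and should be checked against it.

```xml
<!-- Fragment only: these attributes go inside the existing
     <mbean code="org.jboss.cache.TreeCache" ...> element. -->

<!-- New in 1.3: copy node data per transaction instead of locking it. -->
<attribute name="NodeLockingScheme">OPTIMISTIC</attribute>

<!-- Assumption: believed to be ignored when NodeLockingScheme is OPTIMISTIC. -->
<attribute name="IsolationLevel">READ_COMMITTED</attribute>

<!-- Unchanged from the original configuration. -->
<attribute name="CacheMode">INVALIDATION_ASYNC</attribute>
```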

I'm now getting OutOfMemory errors about 11 minutes into the test, but I still need to confirm the root cause. It may be unrelated to JBossCache -- I'm not sure yet.

Thanks for your help.
7. Re: JBossCache 1.3 Beta 2

manik Mar 24, 2006 6:14 AM (in response to akardell)

Optimistic locking will always bypass these locking issues, because the very concept of optimistic locking is that node data is copied, rather than locked, for each transaction. The OutOfMemory errors are probably due to its extra memory requirements (additional memory space to copy node data, etc.).
8. Re: JBossCache 1.3 Beta 2

manik Mar 24, 2006 11:19 AM (in response to akardell)

Aaron, this problem seems to be specific to use with Hibernate. Which version of Hibernate is this tested against?

Also, what do you do in your load test? Do you start transactions on the same objects to induce concurrency?

Cheers,
Manik
9. Re: JBossCache 1.3 Beta 2

akardell Mar 28, 2006 5:48 PM (in response to akardell)

Hi Manik,

We are using Hibernate 3.0.5. The load test isn't specific to JBossCache -- it is a full load test of our application; we aren't going out of our way to create transactions on the same object to induce concurrency, but it probably happens as a 'side effect' of our test.

As best we can tell, we have isolated JBossCache 1.3 as the source of the out-of-memory errors in some way, shape, or form -- if we swap in the 1.2 jars we don't get the out-of-memory errors. I am now trying to use a profiler to help identify the source of the memory leak. Our objects are small enough that I don't think copying nodes would be enough to cause out-of-memory errors unless those nodes are being retained indefinitely.

I'll help out however I can here -- hopefully our use of a profiler will highlight the problem area(s).

Thanks,

Aaron