1. Re: JBossCache 1.3 Beta 2
manik Mar 21, 2006 11:58 AM (in response to akardell)
Hi there, do you still see this when using JBossCache 1.2.4.SP2 with REPL_ASYNC?
And no, you don't need to upgrade your Hibernate jars as long as you're using Hibernate >= 3.0.2.
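For that comparison, a minimal sketch, assuming the rest of the TreeCache service descriptor stays exactly as posted in reply 4: the invalidation modes are new in 1.3, so the only attribute that has to change for a 1.2.4.SP2 run is the cache mode.

```xml
<!-- Fragment of the TreeCache mbean; every other attribute stays unchanged -->
<!-- JBossCache 1.2.4.SP2 has no INVALIDATION_* modes, so use asynchronous replication -->
<attribute name="CacheMode">REPL_ASYNC</attribute>
```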
2. Re: JBossCache 1.3 Beta 2
akardell Mar 21, 2006 12:46 PM (in response to akardell)
Under a substantial amount of load, it was not uncommon to get TimeoutExceptions and IdentityLock errors in 1.2.4.SP2 with REPL_ASYNC.
3. Re: JBossCache 1.3 Beta 2
manik Mar 21, 2006 1:30 PM (in response to akardell)
Does this change if you use a really high timeout? Threads will block for longer (as expected), but I'd like to see whether it affects anything, since the log message reports a timeout after 0 secs.
4. Re: JBossCache 1.3 Beta 2
akardell Mar 21, 2006 4:37 PM (in response to akardell)
Perhaps I'm missing a setting? None of my timeouts are set to 0, as you can see below. I can re-run the test, but which timeouts should I increase?
Thanks!
Aaron

<?xml version="1.0" encoding="UTF-8" ?>
<server>
  <!-- ==================================================================== -->
  <!-- Defines TreeCache configuration                                      -->
  <!-- ==================================================================== -->
  <mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">
    <depends>jboss:service=Naming</depends>
    <depends>jboss:service=TransactionManager</depends>

    <!-- Configure the TransactionManager -->
    <attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>

    <!-- Node locking level : SERIALIZABLE
                              REPEATABLE_READ (default)
                              READ_COMMITTED
                              READ_UNCOMMITTED
                              NONE -->
    <attribute name="IsolationLevel">READ_COMMITTED</attribute>

    <!-- Valid modes are LOCAL
                         REPL_ASYNC
                         REPL_SYNC
                         INVALIDATION_ASYNC
                         INVALIDATION_SYNC -->
    <attribute name="CacheMode">INVALIDATION_ASYNC</attribute>

    <!-- Name of cluster. Needs to be the same on all cluster members, in order for them to find each other -->
    <attribute name="ClusterName">TreeCache-Cluster</attribute>

    <attribute name="ClusterConfig">
      <config>
        <!-- UDP: if you have a multihomed machine, set the bind_addr attribute to the appropriate NIC IP address -->
        <!-- UDP: On Windows machines, because of the media sense feature being broken with multicast
             (even after disabling media sense) set the loopback attribute to true -->
        <UDP mcast_addr="228.8.8.8" mcast_port="45567" ip_ttl="64" ip_mcast="true"
             mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
             ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
             loopback="true" bind_addr="0.0.0.0" />
        <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false" />
        <MERGE2 min_interval="10000" max_interval="20000" />
        <FD shun="true" up_thread="true" down_thread="true" />
        <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
        <pbcast.NAKACK gc_lag="50" max_xmit_size="8192" retransmit_timeout="600,1200,2400,4800"
                       up_thread="false" down_thread="false" />
        <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false" />
        <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />
        <FRAG frag_size="8192" down_thread="false" up_thread="false" />
        <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" />
        <pbcast.STATE_TRANSFER up_thread="false" down_thread="false" />
      </config>
    </attribute>

    <!-- The max amount of time (in milliseconds) we wait until the initial state (i.e. the contents of the cache)
         is retrieved from existing members in a clustered environment -->
    <attribute name="InitialStateRetrievalTimeout">5000</attribute>

    <!-- Number of milliseconds to wait until all responses for a synchronous call have been received. -->
    <attribute name="SyncReplTimeout">10000</attribute>

    <!-- Max number of milliseconds to wait for a lock acquisition -->
    <attribute name="LockAcquisitionTimeout">15000</attribute>

    <!-- Name of the eviction policy class. -->
    <attribute name="EvictionPolicyClass">org.jboss.cache.eviction.LRUPolicy</attribute>

    <!-- Specific eviction policy configurations. This is LRU -->
    <attribute name="EvictionPolicyConfig">
      <config>
        <attribute name="wakeUpIntervalSeconds">5</attribute>
        <!-- Cache wide default -->
        <region name="/_default_">
          <attribute name="maxNodes">1000</attribute>
          <attribute name="timeToLiveSeconds">3600</attribute>
        </region>
      </config>
    </attribute>
  </mbean>
</server>
5. Re: JBossCache 1.3 Beta 2
manik Mar 21, 2006 6:53 PM (in response to akardell)
The lock acquisition timeout (the LockAcquisitionTimeout attribute, currently 15000 ms in your descriptor).
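For a diagnostic run, raising it inside the TreeCache mbean might look like the fragment below. The 150000 ms value is only an illustration (ten times the posted 15000), not a recommended production setting.

```xml
<!-- Max number of milliseconds to wait for a lock acquisition -->
<!-- Deliberately very high for a load test: threads will block longer, -->
<!-- but it shows whether the timeouts are simply too short under load -->
<attribute name="LockAcquisitionTimeout">150000</attribute>
```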
6. Re: JBossCache 1.3 Beta 2
akardell Mar 23, 2006 4:24 PM (in response to akardell)
I tried increasing LockAcquisitionTimeout from 15000 to 150000 ms, with similar results.
However, I noticed that 1.3 has another new option besides the new INVALIDATION_ASYNC cache mode:
<attribute name="NodeLockingScheme">OPTIMISTIC</attribute>
Setting this caused all of the lock exceptions to go away!
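In context, this is one extra attribute next to the cache mode inside the TreeCache mbean. A sketch, assuming everything else in the descriptor posted earlier stays the same (the elided parts are marked with comments):

```xml
<mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">
  <!-- depends, TransactionManagerLookupClass, IsolationLevel, etc. as before -->

  <!-- Both of these options are new in JBossCache 1.3 -->
  <attribute name="CacheMode">INVALIDATION_ASYNC</attribute>
  <attribute name="NodeLockingScheme">OPTIMISTIC</attribute>

  <!-- ClusterConfig, timeouts and eviction settings unchanged -->
</mbean>
```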
I'm now getting OutOfMemoryErrors about 11 minutes into the test, but I still need to confirm the root cause. It may be unrelated to JBossCache; I'm not sure yet.
Thanks for your help.
7. Re: JBossCache 1.3 Beta 2
manik Mar 24, 2006 6:14 AM (in response to akardell)
Optimistic locking will always bypass these locking issues, because the very concept of optimistic locking is that node data is copied for each transaction rather than locked; conflicting changes are detected when the transaction commits instead. The OutOfMemoryErrors are probably due to the extra memory that optimistic locking requires (additional space to hold the copied node data, etc.).
8. Re: JBossCache 1.3 Beta 2
manik Mar 24, 2006 11:19 AM (in response to akardell)
Aaron, this problem seems to be specific to using JBossCache with Hibernate. Which version of Hibernate are you testing against?
Also, what do you do in your load test? Do you start transactions on the same objects to induce concurrency?
Cheers,
Manik
9. Re: JBossCache 1.3 Beta 2
akardell Mar 28, 2006 5:48 PM (in response to akardell)
Hi Manik,
We are using Hibernate 3.0.5. The load test isn't specific to JBossCache; it is a full load test of our application. We aren't going out of our way to create transactions on the same objects to induce concurrency, but it probably happens as a side effect of our test.
As best we can tell, JBossCache 1.3 is the source of the out-of-memory errors in some way, shape or form: if we swap in the 1.2 jars, we don't get them. I am now using a profiler to try to identify the source of the memory leak. Our objects are small enough that I don't think copying nodes alone would cause out-of-memory errors, unless those copies are being retained indefinitely.
I'll help out however I can here -- hopefully our use of a profiler will highlight the problem area(s).
Thanks,
Aaron