3 Replies Latest reply on Jun 11, 2007 6:57 PM by nanreh

BdbjeCacheLoader exception with JTA distributed transactions

nanreh May 23, 2007 1:35 PM

I have a simple cache configuration that specifies a single BdbjeCacheLoader. I'm using JBoss Transactions and specifying transaction boundaries in my code using JTA. Here's one of the two transactional methods in my application:

 public byte[] loadData(byte[] key) throws ShardStoreException {

 Validate.notNull(key,"key is null");

 UserTransaction tx = null;
 try {

 String keyStr = HexUtil.toHexString(key);

 String fqn = new StringBuilder("/shard_").append(shardId).append("/").append(keyStr).toString();
 Fqn nodeFqn = Fqn.fromString(fqn);

 tx = com.arjuna.ats.jta.UserTransaction.userTransaction();
 logger.info("-------- About to begin TX");
 tx.begin();
 logger.info("-------- Transaction details: " + tx);
 logger.info("-------- Transaction started");

 Object value = cache.get(nodeFqn, KEY_PAYLOAD);

 logger.info("-------- About to commit tx.");
 tx.commit();
 logger.info("-------- TX committed.");

 if(null==value) {
 return null;
 }

 if( !(value instanceof byte[]) ) {
 throw new IllegalArgumentException("Invalid payload found in cache.");
 }
 return (byte[])value;

 } catch(Exception e) {
 try {
 logger.error("Error encountered. Rolling back transaction.",e);
 tx.rollback();
 } catch(Exception re) {
 logger.error("Error in rollback.",re);
 }
 throw new ShardStoreException(e);
 }
 }

Pretty basic usage of JBoss cache with lots of logger calls. With my current cache configuration I can call this method once and it appears to succeed normally. When I call it again, however, I get this exception:

jvm 1 | INFO 10:11.03 [AnonymousIoService-4] JBossCacheShardStore - -------- About to begin TX
jvm 1 | INFO 10:11.03 [AnonymousIoService-4] JBossCacheShardStore - -------- Transaction details: com.arjuna.ats.internal.jta.transaction.jts.UserTransactionImple@1b2d7df
jvm 1 | INFO 10:11.03 [AnonymousIoService-4] JBossCacheShardStore - -------- Transaction started
jvm 1 | [23-05-2007 10:11:03:651] jacorb.poa:INFO: oid:
jvm 1 | 05 15 0C 47 48 3F 29 14 36 40 ...GH?).6@
jvm 1 | object is activated
jvm 1 | ERROR 10:11.04 [AnonymousIoService-4] JBossCacheShardStore - Error encountered. Rolling back transaction.
jvm 1 | java.lang.RuntimeException: com.sleepycat.je.DeadlockException: (JE 3.2.23) Lock expired. Locker -1_AnonymousIoService-4_ThreadLocker: waited for lock on database=SHARD_CLUSTER node=28 type=READ grant=WAIT_NEW timeoutMillis=500 startTime=1179940263653 endTime=1179940264155
jvm 1 | Owners: [<LockInfo locker="6_RequestProcessor-5_Txn" type="WRITE"/>]
jvm 1 | Waiters: []
jvm 1 |
jvm 1 | at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:3764)
jvm 1 | at org.jboss.cache.CacheImpl.get(CacheImpl.java:1323)
jvm 1 | at org.jboss.cache.CacheImpl.get(CacheImpl.java:1298)
jvm 1 | at com.scea.scne.acs.shard.cache.JBossCacheShardStore.loadData(JBossCacheShardStore.java:110)
jvm 1 | at com.scea.scne.acs.shard.server.ShardFetchHandler.handleLoadRequest(ShardFetchHandler.java:72)
jvm 1 | at com.scea.scne.acs.shard.server.ShardFetchHandler.messageReceived(ShardFetchHandler.java:55)
jvm 1 | at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.messageReceived(AbstractIoFilterChain.java:703)
jvm 1 | at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:362)
jvm 1 | at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54)
jvm 1 | at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:800)
jvm 1 | at org.apache.mina.filter.LoggingFilter.messageReceived(LoggingFilter.java:97)
jvm 1 | at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:362)
jvm 1 | at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54)
jvm 1 | at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:800)
jvm 1 | at org.apache.mina.filter.codec.support.SimpleProtocolDecoderOutput.flush(SimpleProtocolDecoderOutput.java:62)
jvm 1 | at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:200)
jvm 1 | at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:362)
jvm 1 | at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54)
jvm 1 | at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:800)
jvm 1 | at org.apache.mina.filter.executor.ExecutorFilter.processEvent(ExecutorFilter.java:247)
jvm 1 | at org.apache.mina.filter.executor.ExecutorFilter$ProcessEventsRunnable.run(ExecutorFilter.java:307)
jvm 1 | at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
jvm 1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
jvm 1 | at java.lang.Thread.run(Thread.java:595)
jvm 1 | Caused by: com.sleepycat.je.DeadlockException: (JE 3.2.23) Lock expired. Locker -1_AnonymousIoService-4_ThreadLocker: waited for lock on database=SHARD_CLUSTER node=28 type=READ grant=WAIT_NEW timeoutMillis=500 startTime=1179940263653 endTime=1179940264155
jvm 1 | Owners: [<LockInfo locker="6_RequestProcessor-5_Txn" type="WRITE"/>]
jvm 1 | Waiters: []
jvm 1 |
jvm 1 | at com.sleepycat.je.txn.LockManager.lock(LockManager.java:266)
jvm 1 | at com.sleepycat.je.txn.BasicLocker.lockInternal(BasicLocker.java:90)
jvm 1 | at com.sleepycat.je.txn.Locker.lock(Locker.java:257)
jvm 1 | at com.sleepycat.je.dbi.CursorImpl.lockLNDeletedAllowed(CursorImpl.java:2349)
jvm 1 | at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2271)
jvm 1 | at com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:1963)
jvm 1 | at com.sleepycat.je.Cursor.searchInternal(Cursor.java:1178)
jvm 1 | at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:1148)
jvm 1 | at com.sleepycat.je.Cursor.search(Cursor.java:1014)
jvm 1 | at com.sleepycat.je.Database.get(Database.java:547)
jvm 1 | at org.jboss.cache.loader.bdbje.BdbjeCacheLoader.get(BdbjeCacheLoader.java:380)
jvm 1 | at org.jboss.cache.interceptors.CacheLoaderInterceptor.loadData(CacheLoaderInterceptor.java:515)
jvm 1 | at org.jboss.cache.interceptors.CacheLoaderInterceptor.loadNode(CacheLoaderInterceptor.java:395)
jvm 1 | at org.jboss.cache.interceptors.CacheLoaderInterceptor.loadIfNeeded(CacheLoaderInterceptor.java:181)
jvm 1 | at org.jboss.cache.interceptors.CacheLoaderInterceptor.invoke(CacheLoaderInterceptor.java:152)
jvm 1 | at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:75)
jvm 1 | at org.jboss.cache.interceptors.UnlockInterceptor.invoke(UnlockInterceptor.java:33)
jvm 1 | at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:75)
jvm 1 | at org.jboss.cache.interceptors.PessimisticLockInterceptor.invoke(PessimisticLockInterceptor.java:202)
jvm 1 | at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:75)
jvm 1 | at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:34)
jvm 1 | at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:75)
jvm 1 | at org.jboss.cache.interceptors.NotificationInterceptor.invoke(NotificationInterceptor.java:22)
jvm 1 | at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:75)
jvm 1 | at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:294)
jvm 1 | at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:128)
jvm 1 | at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:75)
jvm 1 | at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:126)
jvm 1 | at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:75)
jvm 1 | at org.jboss.cache.interceptors.InvocationContextInterceptor.invoke(InvocationContextInterceptor.java:66)
jvm 1 | at org.jboss.cache.CacheImpl.invokeMethod(CacheImpl.java:3752)
jvm 1 | ... 23 more

I'm not sure how to interpret this stack trace but it looks like the BerkleyDB is holding onto a lock from the first transaction and the second transaction fails to acquire it and fails.

When I tell the transaction manager to do local transactions the error goes away. I'm wondering if I need to send the Berkley DB a few properties to set up its transactional behavior but I thought I'd post in case anyone out there recognizes the issue.

Here's my cache configuration file:

<?xml version="1.0" encoding="UTF-8"?>

<server>
 <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar"/>

 <mbean code="org.jboss.cache.CacheImpl" name="jboss.cache:service=Cache">
 <depends>jboss:service=Naming</depends>
 <depends>jboss:service=TransactionManager</depends>

 <!-- Use our custom TxManagerLookup class -->
 <attribute name="TransactionManagerLookupClass">org.defy.tx.CustomTxManagerLookup</attribute>

 <!--
 Node locking level : SERIALIZABLE
 REPEATABLE_READ (default)
 READ_COMMITTED
 READ_UNCOMMITTED
 NONE
 -->
 <attribute name="IsolationLevel">REPEATABLE_READ</attribute>

 <!-- Lock parent before doing node additions/removes -->
 <attribute name="LockParentForChildInsertRemove">true</attribute>

 <!--
 Valid modes are LOCAL
 REPL_ASYNC
 REPL_SYNC
 INVALIDATION_ASYNC
 INVALIDATION_SYNC
 -->

 <attribute name="CacheMode">REPL_SYNC</attribute>
 <!--<attribute name="CacheMode">LOCAL</attribute>-->

 <!-- Name of cluster. Needs to be the same for all JBoss Cache nodes in a
 cluster in order to find each other. -->
 <attribute name="ClusterName">SHARD_CLUSTER</attribute>

 <!--Uncomment next three statements to enable JGroups multiplexer.
 This configuration is dependent on the JGroups multiplexer being
 registered in an MBean server such as JBossAS. -->
 <!--
 <depends>jgroups.mux:name=Multiplexer</depends>
 <attribute name="MultiplexerService">jgroups.mux:name=Multiplexer</attribute>
 <attribute name="MultiplexerStack">fc-fast-minimalthreads</attribute>
 -->

 <!-- JGroups protocol stack properties.
 ClusterConfig isn't used if the multiplexer is enabled and successfully initialized.
 -->
 <attribute name="ClusterConfig">
 <config>
 <!-- UDP: if you have a multihomed machine,
 set the bind_addr attribute to the appropriate NIC IP address -->
 <!-- UDP: On Windows machines, because of the media sense feature
 being broken with multicast (even after disabling media sense)
 set the loopback attribute to true -->
 <UDP mcast_addr="228.1.2.3" mcast_port="48866"
 ip_ttl="64" ip_mcast="true"
 mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
 ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
 loopback="true"/>
 <PING timeout="2000" num_initial_members="3"
 up_thread="false" down_thread="false"/>
 <MERGE2 min_interval="10000" max_interval="20000"/>
 <FD shun="true" up_thread="true" down_thread="true"/>
 <VERIFY_SUSPECT timeout="1500"
 up_thread="false" down_thread="false"/>
 <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
 max_xmit_size="8192" up_thread="false" down_thread="false"/>
 <UNICAST timeout="600,1200,2400" down_thread="false"/>
 <pbcast.STABLE desired_avg_gossip="20000"
 up_thread="false" down_thread="false"/>
 <FRAG frag_size="8192"
 down_thread="false" up_thread="false"/>
 <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
 shun="true" print_local_addr="true"/>
 <pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
 </config>
 </attribute>


 <!--
 The max amount of time (in milliseconds) we wait until the
 initial state (ie. the contents of the cache) are retrieved from
 existing members in a clustered environment
 -->
 <attribute name="InitialStateRetrievalTimeout">20000</attribute>
 <attribute name="FetchInMemoryState">true</attribute>

 <!--
 Number of milliseconds to wait until all responses for a
 synchronous call have been received.
 -->
 <attribute name="SyncReplTimeout">20000</attribute>

 <!-- Max number of milliseconds to wait for a lock acquisition -->
 <attribute name="LockAcquisitionTimeout">15000</attribute>

 <attribute name="EvictionPolicyConfig">
 <config>
 <attribute name="wakeUpIntervalSeconds">5</attribute>
 <!-- This defaults to 200000 if not specified -->
 <attribute name="eventQueueSize">200000</attribute>
 <attribute name="policyClass">org.jboss.cache.eviction.FIFOPolicy</attribute>

 <!-- Cache wide default -->
 <region name="/_default_">
 <!-- We're allowing an unlimited number of nodes for now. -->
 <attribute name="maxNodes">0</attribute>
 </region>

 </config>
 </attribute>

 <attribute name="CacheLoaderConfiguration">
 <config>
 <!-- comma delimited FQNs to preload -->
 <passivation>false</passivation>
 <preload>/</preload>
 <shared>false</shared>

 <cacheloader>
 <class>org.jboss.cache.loader.bdbje.BdbjeCacheLoader</class>

 <properties>
 location=/tmp/filestore
 transactional=false
 </properties>
 <async>false</async>
 <fetchPersistentState>false</fetchPersistentState>
 <ignoreModifications>false</ignoreModifications>
 <purgeOnStartup>true</purgeOnStartup>
 </cacheloader>
 </config>
 </attribute>

 </mbean>
</server>

thanks in advance,
frank

1. Re: BdbjeCacheLoader exception with JTA distributed transact

nanreh May 23, 2007 1:47 PM (in response to nanreh)

One additional possible hint: when I shut the process down a shutdown hook runs to do an orderly shutdown of jacorb which is being used by JBoss Transactions for distributed transactions. BdbjeCacheLoader throws an exception when it's asked to shutdown apparently due to a lingering open transaction:

jvm 1 | com.sleepycat.je.DatabaseException: (JE 3.2.23) There is 1 existing transaction opened against the Environment.
jvm 1 | Aborting open transactions ...
jvm 1 |
jvm 1 | at com.sleepycat.je.Environment.close(Environment.java:331)
jvm 1 | at org.jboss.cache.loader.bdbje.BdbjeCacheLoader.stop(BdbjeCacheLoader.java:259)
jvm 1 | at org.jboss.cache.loader.CacheLoaderManager.stopCacheLoader(CacheLoaderManager.java:333)
jvm 1 | at org.jboss.cache.CacheImpl.stop(CacheImpl.java:870)
jvm 1 | at com.scea.scne.acs.shard.cache.JBossCacheShardStore$3.run(JBossCacheShardStore.java:72)

thanks...
f

2. Re: BdbjeCacheLoader exception with JTA distributed transact

manik Jun 8, 2007 6:48 AM (in response to nanreh)

I just wrote a unit test loosely based on your use case above and it works okay. Can you confirm that this is specific to the transaction manager you use? Could you try with org.jboss.cache.DummyTransactionManager and see how it behaves with that?

I'm NOT advocating the use of the DummyTransactionManager; just using it to try and pinpoint the fault. :-)
Actions
3. Re: BdbjeCacheLoader exception with JTA distributed transact

nanreh Jun 11, 2007 6:57 PM (in response to nanreh)

Thanks Manik for looking into this issue--I had another forum post which cleared it up for me:

http://www.jboss.com/index.html?module=bb&op=viewtopic&t=109512

I was running a distributed transaction manager with JBoss Cache and none of my transactions were being committed. I'm now running local transactions + a REPL_SYNC policy and letting JBoss do its internal prepare/commit across the cluster. For now I'm handling sync failures in my application as you mentioned.

thanks again,
Frank[/url]
Actions

Go to original post