5 Replies Latest reply on Aug 20, 2009 5:06 PM by rsmysore

    Possible deadlock with JBoss Cache? - Timeout exceptions whi

      Hi there,

      I am working on a standalone java application that uses Jboss cache and atomikos JTA transactions implementation. Here are the details on what is being used and some of the configuration we use.

      Java 1.5/1.6
      Jboss Cache 3.1.0 (transactional cache)
      Atmoikos 3.5.4 (standalone JTA)
      Eclipselink JPA 1.1.0 (All DB interaction)
      Spring 2.5 (dependency injection, transactional aspects)

      Jboss configuration (the locking scheme)

      isolationLevel="REPEATABLE_READ"
      lockParentForChildInsertRemove="true"
      lockAcquisitionTimeout="20000"
      nodeLockingScheme="mvcc"
      writeSkewCheck="false"
      concurrencyLevel="500"

      On server startup, using JPA a bunch of objects are fetched from DB and put into the jboss cache. Later on, we have a call that this server processes that does the following.

      Start transaction (not explicitly coded, but configured as transactional advice using spring. )
      Get objects from cache if needed and analyze
      for each object that needs updating
      start transaction (propagation REQUIRES_NEW, so failure of one object doesnt affect others)
      Get object from cache, apply changes
      update object in jboss cache <-- Problem occurs here
      update object in DB using JPA
      commit/rollback transaction


      Now for the most part this works fine. But recently we started seeing some exceptions when an existing object is being set back into the cache after modification. In this case the outer transaction hasnt even done a get on the cache for this object. The exact steps from above in this case looks something like this

      Start transaction
      for each object that needs updating (only one object needs to be updated)
      start transaction
      Get object from cache, apply changes
      update object in jboss cache <-- Problem occurs here, see exception below


      Some other test cases that run the same exact steps work fine. If i keep trying this again and again with the same object, i can reproduce the problem consistently. However, if I delete this object from database and cache. Re-import it and run the same exact scenario again, it works fine. The problem is gone. Makes me wonder if it has to do with the order or where it belongs in the cache. I have also made sure that the external transaction does not do any get or update of this object in cache. There is no other thread accessing this object in cache either, actually, I have made sure that there is no other activity on the server at all. I have also tried NESTED (though that is not what we want) for propagation (in spring transactional advice), and that doesnt seem to help either. I have changed isolation level for jboss cache to READ_COMMITTED and that doesnt help either.

      I have tried turning on debug logging and that doesnt seem to give any further info on the transaction holding the lock.

      Any ideas on what is going wrong here. Could it be a bug in jboss cache? Any help is appreciated.

      Thanks
      Prasad

      Here is the exception
      ERROR (8790): org.jboss.cache.lock.TimeoutException: Unable to acquire lock on Fqn [/Metadata/Form/Objects/14] after [40000] milliseconds for requestor [GlobalTransaction::418]! Lock held by [GlobalTransaction::17]
      at org.jboss.cache.mvcc.MVCCNodeHelper.acquireLock(MVCCNodeHelper.java:157)
      at org.jboss.cache.mvcc.MVCCNodeHelper.wrapNodeForWriting(MVCCNodeHelper.java:195)
      at org.jboss.cache.mvcc.MVCCNodeHelper.wrapNodeForWriting(MVCCNodeHelper.java:184)
      at org.jboss.cache.interceptors.MVCCLockingInterceptor.handlePutKeyValueCommand(MVCCLockingInterceptor.java:101)
      at org.jboss.cache.interceptors.base.PrePostProcessingCommandInterceptor.visitPutKeyValueCommand(PrePostProcessingCommandInterceptor.java:88)
      at org.jboss.cache.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:100)
      at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
      at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
      at org.jboss.cache.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:65)
      at org.jboss.cache.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:100)
      at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
      at org.jboss.cache.interceptors.TxInterceptor.attachGtxAndPassUpChain(TxInterceptor.java:301)
      at org.jboss.cache.interceptors.TxInterceptor.handleDefault(TxInterceptor.java:283)
      at org.jboss.cache.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:65)
      at org.jboss.cache.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:100)
      at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
      at org.jboss.cache.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:119)
      at org.jboss.cache.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:100)
      at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
      at org.jboss.cache.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:178)
      at org.jboss.cache.interceptors.InvocationContextInterceptor.visitPutKeyValueCommand(InvocationContextInterceptor.java:82)
      at org.jboss.cache.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:100)
      at org.jboss.cache.interceptors.InterceptorChain.invoke(InterceptorChain.java:287)
      at org.jboss.cache.invocation.CacheInvocationDelegate.put(CacheInvocationDelegate.java:555)
      at org.jboss.cache.invocation.NodeInvocationDelegate.put(NodeInvocationDelegate.java:373)


        • 1. Re: Possible deadlock with JBoss Cache? - Timeout exceptions

          Additional Note:

          Though we do configure for clustering, there is only one node in this case. See more configuration details below.


          Eviction configuration.

          <eviction wakeUpInterval="3000">
           <default algorithmClass="org.jboss.cache.eviction.LRUAlgorithm" eventQueueSize="200000">
           <property name="maxNodes" value="10000"/>
           </default>
           <region name="/Metadata/Form">
           <property name="maxNodes" value="10000"/>
           </region>
          
           <region name="/Workflow/Escalation">
           <property name="maxNodes" value="5000"/>
           </region>
          
           <region name="/Workflow/Filter">
           <property name="maxNodes" value="5000"/>
           </region>
          
           <region name="/Workflow/ActiveLink">
           <property name="maxNodes" value="5000"/>
           </region>
          
           <region name="/Metadata/Menu">
           <property name="maxNodes" value="5000"/>
           </region>
          
           <region name="/Metadata/Container">
           <property name="maxNodes" value="5000"/>
           </region>
          
           <region name="/AccessControl/User">
           <property name="maxNodes" value="10000"/>
           </region>
          
           <region name="/Metadata/Container/Application">
           <property name="maxNodes" value="0"/>
           </region>
          
           <region name="/Config">
           <property name="maxNodes" value="0"/>
           </region>
          
           </eviction>
          
          <clustering mode="invalidation" clusterName="JBossCache-Cluster">
          
           <!--
           Defines whether to retrieve state on startup
           -->
           <stateRetrieval timeout="20000" fetchInMemoryState="false"/>
          
           <!--
           Network calls are synchronous.
           -->
           <sync replTimeout="20000"/>
          
           <!--
           Configures the JGroups channel. Looks up a JGroups config file on the classpath or filesystem. udp.xml
           ships with jgroups.jar and will be picked up by the class loader.
           -->
           <jgroupsConfig configFile="jgroups-config.xml"/>
           </clustering>
          



          The atomikos transaction manager is injected using the DummyTransactionManager. See configuration below.

          <!--
           Used to register a transaction manager and participate in ongoing transactions. There is no
           JNDI present, since we are out of the container. Dummy lookup.
           The transaction manager is injected programatically.
           -->
           <transaction transactionManagerLookupClass="org.jboss.cache.transaction.DummyTransactionManagerLookup"
           syncRollbackPhase="true"
           syncCommitPhase="true"/>


          • 2. Re: Possible deadlock with JBoss Cache? - Timeout exceptions
            manik

            do you have any concurrent transactions accessing (writing to) that same fqn? If so, it could mean that you just need a greater lock acquisition timeout...

            • 3. Re: Possible deadlock with JBoss Cache? - Timeout exceptions

              I have already tried bumping up the timeouts to very high numbers and that doesnt help. There are no concurrent transactions that are accessing this same fqn. However, as described above, it is a nested (with a requires_new propagation) transaction.The external transaction is suspended when this happens in the internal transaction. This object/fqn is never accessed by the external transaction though. In fact the cache isnt even touched in the external transaction.

              There is never a problem when the same operation is done in a single transaction (non-nested)

              • 4. Re: Possible deadlock with JBoss Cache? - Timeout exceptions
                manik

                Try increasing your concurrencyLevel.

                • 5. Re: Possible deadlock with JBoss Cache? - Timeout exceptions

                  Changing concurrency level didnt seem to help. However, If I get rid of the external transaction, it seems to solve the issue, even though there was no cache interaction in the external transaction at all, and it happens with some objects and works fine in some cases. I think when nested transactions are in play, there could be a problem in associating the right transaction to the lock, thereby causing this issue (the defect could be in jboss cache or atomikos or spring). I will see if I can put together a simpler program that can reproduce this problem and post it here.