0 Replies Latest reply on Nov 22, 2013 3:57 AM by danielnuss

    EAP 6.1 Infinispan transaction problem: CacheException: Local transaction for transaction not found

    danielnuss

      Short version:

      I occasionally get the following exception:

       

      org.infinispan.CacheException: Local transaction for transaction (TransactionImple < ac, BasicAction: 0:ffff7f000001:1cf63113:528e2fe0:392 status: ActionStatus.COMMITTING >) not found
      
      
      
      
      
      
      

       

      What does that mean and what can i do to prevent this exception?

       

      Long version:

      I have the requirement to assign numbers to incoming jms messages in a clustered JBoss EAP 6.1 system. To do so, i use infinispan (the one embedded in JBoss) to implement a cluster-wide counter inside a MDB, and assign the resulting counter values to the incoming message, saved as key-value pairs in the cache. Now i'm testing a failover scenario with two cluster nodes on two physical separated machines, each backuped with a hornetq backup node (with replication, no shared journal). I'm sending 100 JMS messages to the cluster from outside JBoss with a simple Java client; within the MDB i increment and get the current counter value, log it, write the content of the message inside the cache (each time with the same key for this testing scenario, in real world scenario i would save each message with the counter value as key) and log old and new value of this cache key. So a log block looks like this (i send  100 messages several times to be able to reproduce the described issue, that's the reason the counter is already greater than 100):

       

      17:14:02,069 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) current counter value: 163
      17:14:02,070 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) received following message: message66
      17:14:02,070 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) old content in cache for TESTKEY: message64
      17:14:02,071 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) new content in cache for TESTKEY: message66
      
      
      
      
      
      
      

       

      While processing these 100 messages, i kill one of the nodes with kill -9. The backup node receives the JMS messages then, failover works, everything is fine - most of the times. But in some, rare cases, the same counter value is logged twice on the remaining node, each time with a different received message, but with the same old message in cache. This leads to the scenario, that the first of these two received messages was never saved in cache, so it would be lost in the real world scenario, and the counter ends with value 99 instead of 100. Each time when this happens, the exception above occurs, too, within the same thread the first message is received. So i have the impression as if there is a rollback on the infinispan transaction, but not on the hornetq transaction - because the JMS message is not received a second time, but the cache commit did not succeed obviously (because the counter and the value of the message currently saved in the cache are the same in two subsequent log blocks). The subsequent log blocks e.g. look like this:

       

      17:14:02,069 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) current counter value: 163
      17:14:02,070 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) received following message: message66
      17:14:02,070 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) old content in cache for TESTKEY: message64
      17:14:02,071 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) new content in cache for TESTKEY: message66
      ...
      17:14:14,545 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-26 (HornetQ-client-global-threads-1700424464)) current counter value: 163
      17:14:14,547 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-26 (HornetQ-client-global-threads-1700424464)) received following message: message68
      17:14:14,547 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-26 (HornetQ-client-global-threads-1700424464)) old content in cache for TESTKEY: message64
      17:14:14,548 INFO  [de.patronas.jms.test.FailQueueMDB] (Thread-26 (HornetQ-client-global-threads-1700424464)) new content in cache for TESTKEY: message68
      
      
      
      
      
      
      

      Can anyone explain me what exactly is happening here and what i can do to prevent this exception?

       

       

      Please find some relevant information regarding the scenario, like code snippets and configuration description underneath. I'll attach the full log of the node which is not killed (meaning this is the node which takes over messages from the killed node) and the configuration of one of the nodes (configuration of other nodes is the same, except backup-group and failover flag of hornetq configuration, which differs obviously).

      JBoss version: EAP 6.1

      Hornetq version: 2.3.1.Final-redhat-1

      Infinispan version: infinispan-core-5.2.6.Final-redhat-1

       

      Counter implementation:

        private long obtainCounter() {
          cache.getAdvancedCache().lock(CacheKeys.MESSAGE_COUNTER.name());
          final Object counter = cache.get(CacheKeys.MESSAGE_COUNTER.name());
          final long currentCounter = counter == null ? 1 : (Long) counter + 1;
          cache.put(CacheKeys.MESSAGE_COUNTER.name(), currentCounter);
          return currentCounter;
        }
      
      
      
      
      
      
      

       

      Logging block:

          final TextMessage textMessage = (TextMessage) inMessage;
      
          try {
            final long counter = obtainCounter();
            LOG.info("current counter value: {}", counter);
      
            final String content = textMessage.getText();
            LOG.info("received following message: {}", content);
            LOG.info("old content in cache for TESTKEY: {}", cache.get("TESTKEY"));
      
            cache.put("TESTKEY", content);
            LOG.info("new content in cache for TESTKEY: {}", cache.get("TESTKEY"));
      
      
      
      
      
      
      

       

       

      Cache configuration:

      <cache-container name="cacheTest" aliases="cachetest" default-cache="default_repl_cache" jndi-name="infinispan/cacheTest">
          <transport lock-timeout="60000"/>
          <replicated-cache name="default_repl_cache" mode="SYNC" start="EAGER" jndi-name="MyCache">
          <transaction mode="FULL_XA" locking="PESSIMISTIC"/>
          </replicated-cache>
      </cache-container>
      
      
      
      
      
      
      

       

      Full stacktrace of CacheException:

      17:14:03,654 WARN  [com.arjuna.ats.jta] (Thread-25 (HornetQ-client-global-threads-1700424464)) ARJUNA016036: commit on < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff7f000001:1cf63113:528e2fe0:392, node_name=1, branch_uid=0:ffff7f000001:1cf63113:528e2fe0:396, subordinatenodename=null, eis_name=unknown eis name > (TransactionXaAdapter{localTransaction=LocalXaTransaction{xid=< 131077, 29, 36, 0000000000-1-112700128-10491982-11447-32003-11049, 0000000000-1-112700128-10491982-11447-32003-10600000000 >} LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, lockedKeys=null, backupKeyLocks=null, topologyId=9, isFromStateTransfer=false} org.infinispan.transaction.xa.recovery.RecoveryAwareLocalTransaction@7e}) failed with exception $-: org.infinispan.CacheException: Local transaction for transaction (TransactionImple < ac, BasicAction: 0:ffff7f000001:1cf63113:528e2fe0:392 status: ActionStatus.COMMITTING >) not found
              at org.infinispan.transaction.xa.recovery.RecoveryAwareTransactionTable.failureCompletingTransaction(RecoveryAwareTransactionTable.java:149) [infinispan-core-5.2.6.Final-redhat-1.jar:5.2.6.Final-redhat-1]
              at org.infinispan.transaction.TransactionCoordinator.handleCommitFailure(TransactionCoordinator.java:222) [infinispan-core-5.2.6.Final-redhat-1.jar:5.2.6.Final-redhat-1]
              at org.infinispan.transaction.TransactionCoordinator.commit(TransactionCoordinator.java:177) [infinispan-core-5.2.6.Final-redhat-1.jar:5.2.6.Final-redhat-1]
              at org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:127) [infinispan-core-5.2.6.Final-redhat-1.jar:5.2.6.Final-redhat-1]
              at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelCommit(XAResourceRecord.java:451)
              at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2732)
              at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2648)
              at com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1813)
              at com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1504)
              at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:98)
              at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)
              at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1165)
              at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:126)
              at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:75)
              at org.jboss.as.ejb3.inflow.MessageEndpointInvocationHandler.afterDelivery(MessageEndpointInvocationHandler.java:72) [jboss-as-ejb3-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8]
              at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) [:1.6.0_21]
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_21]
              at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_21]
              at org.jboss.as.ejb3.inflow.AbstractInvocationHandler.handle(AbstractInvocationHandler.java:60) [jboss-as-ejb3-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8]
              at org.jboss.as.ejb3.inflow.MessageEndpointInvocationHandler.doInvoke(MessageEndpointInvocationHandler.java:136) [jboss-as-ejb3-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8]
              at org.jboss.as.ejb3.inflow.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:73) [jboss-as-ejb3-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8]
              at $Proxy96.afterDelivery(Unknown Source)       at org.hornetq.ra.inflow.HornetQMessageHandler.onMessage(HornetQMessageHandler.java:322)
              at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1085) [hornetq-core-client-2.2.13.Final.jar:]
              at org.hornetq.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:57) [hornetq-core-client-2.2.13.Final.jar:]
              at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1220) [hornetq-core-client-2.2.13.Final.jar:]
              at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.2.13.Final.jar:]
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_21]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_21]
              at java.lang.Thread.run(Thread.java:619) [rt.jar:1.6.0_21]