EAP 6.1 Infinispan transaction problem: CacheException: Local transaction for transaction not found
danielnuss Nov 22, 2013 3:57 AMShort version:
I occasionally get the following exception:
org.infinispan.CacheException: Local transaction for transaction (TransactionImple < ac, BasicAction: 0:ffff7f000001:1cf63113:528e2fe0:392 status: ActionStatus.COMMITTING >) not found
What does that mean and what can i do to prevent this exception?
Long version:
I have the requirement to assign numbers to incoming jms messages in a clustered JBoss EAP 6.1 system. To do so, i use infinispan (the one embedded in JBoss) to implement a cluster-wide counter inside a MDB, and assign the resulting counter values to the incoming message, saved as key-value pairs in the cache. Now i'm testing a failover scenario with two cluster nodes on two physical separated machines, each backuped with a hornetq backup node (with replication, no shared journal). I'm sending 100 JMS messages to the cluster from outside JBoss with a simple Java client; within the MDB i increment and get the current counter value, log it, write the content of the message inside the cache (each time with the same key for this testing scenario, in real world scenario i would save each message with the counter value as key) and log old and new value of this cache key. So a log block looks like this (i send 100 messages several times to be able to reproduce the described issue, that's the reason the counter is already greater than 100):
17:14:02,069 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) current counter value: 163 17:14:02,070 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) received following message: message66 17:14:02,070 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) old content in cache for TESTKEY: message64 17:14:02,071 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) new content in cache for TESTKEY: message66
While processing these 100 messages, i kill one of the nodes with kill -9. The backup node receives the JMS messages then, failover works, everything is fine - most of the times. But in some, rare cases, the same counter value is logged twice on the remaining node, each time with a different received message, but with the same old message in cache. This leads to the scenario, that the first of these two received messages was never saved in cache, so it would be lost in the real world scenario, and the counter ends with value 99 instead of 100. Each time when this happens, the exception above occurs, too, within the same thread the first message is received. So i have the impression as if there is a rollback on the infinispan transaction, but not on the hornetq transaction - because the JMS message is not received a second time, but the cache commit did not succeed obviously (because the counter and the value of the message currently saved in the cache are the same in two subsequent log blocks). The subsequent log blocks e.g. look like this:
17:14:02,069 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) current counter value: 163 17:14:02,070 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) received following message: message66 17:14:02,070 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) old content in cache for TESTKEY: message64 17:14:02,071 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-25 (HornetQ-client-global-threads-1700424464)) new content in cache for TESTKEY: message66 ... 17:14:14,545 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-26 (HornetQ-client-global-threads-1700424464)) current counter value: 163 17:14:14,547 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-26 (HornetQ-client-global-threads-1700424464)) received following message: message68 17:14:14,547 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-26 (HornetQ-client-global-threads-1700424464)) old content in cache for TESTKEY: message64 17:14:14,548 INFO [de.patronas.jms.test.FailQueueMDB] (Thread-26 (HornetQ-client-global-threads-1700424464)) new content in cache for TESTKEY: message68
Can anyone explain me what exactly is happening here and what i can do to prevent this exception?
Please find some relevant information regarding the scenario, like code snippets and configuration description underneath. I'll attach the full log of the node which is not killed (meaning this is the node which takes over messages from the killed node) and the configuration of one of the nodes (configuration of other nodes is the same, except backup-group and failover flag of hornetq configuration, which differs obviously).
JBoss version: EAP 6.1
Hornetq version: 2.3.1.Final-redhat-1
Infinispan version: infinispan-core-5.2.6.Final-redhat-1
Counter implementation:
private long obtainCounter() { cache.getAdvancedCache().lock(CacheKeys.MESSAGE_COUNTER.name()); final Object counter = cache.get(CacheKeys.MESSAGE_COUNTER.name()); final long currentCounter = counter == null ? 1 : (Long) counter + 1; cache.put(CacheKeys.MESSAGE_COUNTER.name(), currentCounter); return currentCounter; }
Logging block:
final TextMessage textMessage = (TextMessage) inMessage; try { final long counter = obtainCounter(); LOG.info("current counter value: {}", counter); final String content = textMessage.getText(); LOG.info("received following message: {}", content); LOG.info("old content in cache for TESTKEY: {}", cache.get("TESTKEY")); cache.put("TESTKEY", content); LOG.info("new content in cache for TESTKEY: {}", cache.get("TESTKEY"));
Cache configuration:
<cache-container name="cacheTest" aliases="cachetest" default-cache="default_repl_cache" jndi-name="infinispan/cacheTest"> <transport lock-timeout="60000"/> <replicated-cache name="default_repl_cache" mode="SYNC" start="EAGER" jndi-name="MyCache"> <transaction mode="FULL_XA" locking="PESSIMISTIC"/> </replicated-cache> </cache-container>
Full stacktrace of CacheException:
17:14:03,654 WARN [com.arjuna.ats.jta] (Thread-25 (HornetQ-client-global-threads-1700424464)) ARJUNA016036: commit on < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff7f000001:1cf63113:528e2fe0:392, node_name=1, branch_uid=0:ffff7f000001:1cf63113:528e2fe0:396, subordinatenodename=null, eis_name=unknown eis name > (TransactionXaAdapter{localTransaction=LocalXaTransaction{xid=< 131077, 29, 36, 0000000000-1-112700128-10491982-11447-32003-11049, 0000000000-1-112700128-10491982-11447-32003-10600000000 >} LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, lockedKeys=null, backupKeyLocks=null, topologyId=9, isFromStateTransfer=false} org.infinispan.transaction.xa.recovery.RecoveryAwareLocalTransaction@7e}) failed with exception $-: org.infinispan.CacheException: Local transaction for transaction (TransactionImple < ac, BasicAction: 0:ffff7f000001:1cf63113:528e2fe0:392 status: ActionStatus.COMMITTING >) not found at org.infinispan.transaction.xa.recovery.RecoveryAwareTransactionTable.failureCompletingTransaction(RecoveryAwareTransactionTable.java:149) [infinispan-core-5.2.6.Final-redhat-1.jar:5.2.6.Final-redhat-1] at org.infinispan.transaction.TransactionCoordinator.handleCommitFailure(TransactionCoordinator.java:222) [infinispan-core-5.2.6.Final-redhat-1.jar:5.2.6.Final-redhat-1] at org.infinispan.transaction.TransactionCoordinator.commit(TransactionCoordinator.java:177) [infinispan-core-5.2.6.Final-redhat-1.jar:5.2.6.Final-redhat-1] at org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:127) [infinispan-core-5.2.6.Final-redhat-1.jar:5.2.6.Final-redhat-1] at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelCommit(XAResourceRecord.java:451) at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2732) at com.arjuna.ats.arjuna.coordinator.BasicAction.doCommit(BasicAction.java:2648) at com.arjuna.ats.arjuna.coordinator.BasicAction.phase2Commit(BasicAction.java:1813) at com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1504) at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:98) at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162) at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1165) at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:126) at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:75) at org.jboss.as.ejb3.inflow.MessageEndpointInvocationHandler.afterDelivery(MessageEndpointInvocationHandler.java:72) [jboss-as-ejb3-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8] at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) [:1.6.0_21] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_21] at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_21] at org.jboss.as.ejb3.inflow.AbstractInvocationHandler.handle(AbstractInvocationHandler.java:60) [jboss-as-ejb3-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8] at org.jboss.as.ejb3.inflow.MessageEndpointInvocationHandler.doInvoke(MessageEndpointInvocationHandler.java:136) [jboss-as-ejb3-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8] at org.jboss.as.ejb3.inflow.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:73) [jboss-as-ejb3-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8] at $Proxy96.afterDelivery(Unknown Source) at org.hornetq.ra.inflow.HornetQMessageHandler.onMessage(HornetQMessageHandler.java:322) at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1085) [hornetq-core-client-2.2.13.Final.jar:] at org.hornetq.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:57) [hornetq-core-client-2.2.13.Final.jar:] at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1220) [hornetq-core-client-2.2.13.Final.jar:] at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.2.13.Final.jar:] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_21] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_21] at java.lang.Thread.run(Thread.java:619) [rt.jar:1.6.0_21]
-
server-CacheException.log.zip 79.3 KB
-
standalone-full-ha.xml 28.6 KB