TimeoutException while running in cluster
sirocchj Nov 4, 2009 12:12 PMHi all,
apologies if this was answered already but I have not yet found the solution.
I am running a JBoss cluster, 2 machines, Java 5 with: JBossCache 3.2.0.GA, JBoss 5.1.0.GA and JBossESB 4.6 and essentially performing the following:
try { Node<Object, Object> seriesNode = _addSeries(seriesUuid); _sanityCheck(info, seriesNode); _storeInCache(seriesNode, message, info); if (_isSequenceComplete(seriesNode)) result = _createAggregatedMessage(seriesNode); } catch (Exception e) { _logger.error("Exception intercepted while processing message: " + e.getLocalizedMessage()); e.printStackTrace(); }
relevant code is also the following:
private Node<Object, Object> _addSeries(final String seriesUuid) throws Exception { final Fqn<String> seriesFQN = Fqn.fromRelativeElements(FQN_BASE, seriesUuid); try { txManager.begin(); Node<Object, Object> seriesNode = rootNode.addChild(seriesFQN); txManager.commit(); _logger.info("Added series " + seriesUuid + " to cache!"); return seriesNode; } catch (Exception e) { _logger.warn("Could not add series " + seriesUuid + " to cache!"); throw e; } } private void _sanityCheck(MessageDetails info, Node<Object, Object> seriesNode) throws Exception { try { if (info.getMessageTerminationCondition() instanceof Boolean && Boolean.TRUE.equals(info.getMessageTerminationCondition())) { txManager.begin(); seriesNode.put("total", info.getMessageNumber()); txManager.commit(); _logger.info("Series total set (" + info.getMessageNumber() + ")"); } } catch (Exception e) { _logger.warn("Could not save termination condition in cache!"); throw e; } } private void _storeMessageInCache(final Node<Object, Object> seriesNode, final Message message, final MessageDetails info) throws Exception { try { txManager.begin(); Node<Object, Object> messageNode = seriesNode.addChild(Fqn.fromString(info.getMessageNumber().toString())); messageNode.put("payload", Util.serialize(message)); txManager.commit(); _logger.info("\n******************\nStored message number " + info.getMessageNumber() + ": " + info + "\n******************\n"); } catch (Exception e) { _logger.warn("Could not store message " + info.getMessageNumber()); throw e; } }
I'm omitting the rest as it is not relevant, as far as I can see.
Also, these are the properties I use:
<property name="transactionManagerLookupClass">org.jboss.cache.transaction.GenericTransactionManagerLookup</property> <property name="clusterName">${jboss.partition.name:DefaultPartition}-MyCache</property> <property name="multiplexerStack">${jboss.default.jgroups.stack:udp}</property> <property name="fetchInMemoryState">true</property> <property name="isolationLevel">READ_COMMITTED</property> <property name="writeSkewCheck">true</property> <property name="useLockStriping">true</property> <property name="lockParentForChildInsertRemove">false</property> <property name="cacheMode">REPL_SYNC</property> <property name="syncReplTimeout">35000</property> <property name="stateRetrievalTimeout">60000</property> <property name="lockAcquisitionTimeout">30000</property> <property name="useRegionBasedMarshalling">true</property> <property name="serializationExecutorPoolSize">0</property> <property name="listenerAsyncPoolSize">0</property> <property name="cacheLoaderConfig"> <bean class="org.jboss.cache.config.CacheLoaderConfig"> <property name="passivation">false</property> <property name="shared">true</property> <property name="preload">/</property> <property name="individualCacheLoaderConfigs"> <list> <bean class="org.jboss.cache.loader.JDBCCacheLoaderConfig"> <property name="properties"> cache.jdbc.datasource=java:/pojoCacheDS cache.jdbc.table.drop=false cache.jdbc.table.name=jbosscache cache.jdbc.table.create=true cache.jdbc.table.primarykey=jbosscache_pk cache.jdbc.fqn.column=fqn cache.jdbc.fqn.type=VARCHAR(255) cache.jdbc.node.column=node cache.jdbc.node.type=BLOB cache.jdbc.parent.column=parent </property> <property name="async">false</property> <property name="fetchPersistentState">true</property> <property name="purgeOnStartup">false</property> <property name="ignoreModifications">false</property> </bean> </list> </property> </bean> </property>
So, when this is run with 3 threads per machine, 2 machines in cluster, 2000 messages coming in asynchronously, at some stage, sometimes, I see in my stacktrace:
15:15:48,386 WARN [TxInterceptor] Caught exception, will now set transaction to roll back org.jboss.cache.lock.TimeoutException: Unable to acquire lock on Fqn [/aggregator/b41f62ab-4bbf-4314-9f79-341710c4957a/1996] after [30000] milliseconds for requestor [GlobalTransaction:<10.3.0.181:53730>:1329]! Lock held by [GlobalTransaction:<10.1.0.30:58999>:2852] at org.jboss.cache.mvcc.MVCCNodeHelper.acquireLock(MVCCNodeHelper.java:159) at org.jboss.cache.mvcc.MVCCNodeHelper.wrapNodeForWriting(MVCCNodeHelper.java:253) at org.jboss.cache.mvcc.MVCCNodeHelper.wrapNodeForWriting(MVCCNodeHelper.java:186) at org.jboss.cache.interceptors.CacheLoaderInterceptor.loadIfNeeded(CacheLoaderInterceptor.java:300) at org.jboss.cache.interceptors.CacheLoaderInterceptor.visitPutDataMapCommand(CacheLoaderInterceptor.java:124) at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104) at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116) at org.jboss.cache.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:150) at org.jboss.cache.interceptors.ReplicationInterceptor.visitPutDataMapCommand(ReplicationInterceptor.java:113) at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104) at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116) at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131) at org.jboss.cache.commands.AbstractVisitor.visitPutDataMapCommand(AbstractVisitor.java:60) at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104) at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116) at org.jboss.cache.interceptors.TxInterceptor.replayModifications(TxInterceptor.java:501) at org.jboss.cache.interceptors.TxInterceptor.handleRemotePrepare(TxInterceptor.java:388) at org.jboss.cache.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:134) at org.jboss.cache.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:68) at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116) at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131) at org.jboss.cache.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:140) at org.jboss.cache.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:68) at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116) at org.jboss.cache.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:178) at org.jboss.cache.interceptors.InvocationContextInterceptor.visitPrepareCommand(InvocationContextInterceptor.java:106) at org.jboss.cache.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:68) at org.jboss.cache.interceptors.InterceptorChain.invokeRemote(InterceptorChain.java:316) at org.jboss.cache.commands.remote.ReplicateCommand.processSingleCommand(ReplicateCommand.java:139) at org.jboss.cache.commands.remote.ReplicateCommand.perform(ReplicateCommand.java:115) at org.jboss.cache.marshall.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:319) at org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:95) at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:637) at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:545) at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:368) at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:775) at org.jgroups.JChannel.up(JChannel.java:1339) at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:462) at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:501) at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:310) at org.jgroups.protocols.FRAG2.up(FRAG2.java:188) at org.jgroups.protocols.FC.up(FC.java:473) at org.jgroups.protocols.pbcast.GMS.up(GMS.java:824) at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233) at org.jgroups.protocols.UNICAST.up(UNICAST.java:299) at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:884) at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:716) at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167) at org.jgroups.protocols.FD.up(FD.java:284) at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309) at org.jgroups.protocols.MERGE2.up(MERGE2.java:144) at org.jgroups.protocols.Discovery.up(Discovery.java:264) at org.jgroups.protocols.PING.up(PING.java:273) at org.jgroups.protocols.TP.passMessageUp(TP.java:1285) at org.jgroups.protocols.TP.access$100(TP.java:49) at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1838) at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1812) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676) at java.lang.Thread.run(Thread.java:613) 15:15:48,392 WARN [ClusteredTest] Could not store message 1996
10.3.0.181 definitively wants to perform a _storeMessageInCache for Fqn /aggregator/b41f62ab-4bbf-4314-9f79-341710c4957a/1996 while 10.1.0.30 knows nothing about that Fqn but owns some lock...
Btw, /aggregator/b41f62ab-4bbf-4314-9f79-341710c4957a/1996 IS in the DB...
I wonder if this has anything to do with https://jira.jboss.org/jira/browse/JBCACHE-1540 and, if so, if you would recommend the proposed solution (i.e.: has anyone verified the implications of that workaround?)
Kind Regards
\j