0 Replies Latest reply on Nov 4, 2009 12:12 PM by sirocchj

    TimeoutException while running in cluster

    sirocchj

      Hi all,

      apologies if this was answered already but I have not yet found the solution.

      I am running a JBoss cluster, 2 machines, Java 5 with: JBossCache 3.2.0.GA, JBoss 5.1.0.GA and JBossESB 4.6 and essentially performing the following:

      try {
       Node<Object, Object> seriesNode = _addSeries(seriesUuid);
       _sanityCheck(info, seriesNode);
       _storeInCache(seriesNode, message, info);
       if (_isSequenceComplete(seriesNode))
       result = _createAggregatedMessage(seriesNode);
      } catch (Exception e) {
       _logger.error("Exception intercepted while processing message: " + e.getLocalizedMessage());
       e.printStackTrace();
      }
      


      relevant code is also the following:

      private Node<Object, Object> _addSeries(final String seriesUuid) throws Exception {
       final Fqn<String> seriesFQN = Fqn.fromRelativeElements(FQN_BASE, seriesUuid);
       try {
       txManager.begin();
       Node<Object, Object> seriesNode = rootNode.addChild(seriesFQN);
       txManager.commit();
       _logger.info("Added series " + seriesUuid + " to cache!");
       return seriesNode;
       } catch (Exception e) {
       _logger.warn("Could not add series " + seriesUuid + " to cache!");
       throw e;
       }
      }
      
      private void _sanityCheck(MessageDetails info, Node<Object, Object> seriesNode) throws Exception {
       try {
       if (info.getMessageTerminationCondition() instanceof Boolean && Boolean.TRUE.equals(info.getMessageTerminationCondition())) {
       txManager.begin();
       seriesNode.put("total", info.getMessageNumber());
       txManager.commit();
       _logger.info("Series total set (" + info.getMessageNumber() + ")");
       }
       } catch (Exception e) {
       _logger.warn("Could not save termination condition in cache!");
       throw e;
       }
      }
      
      private void _storeMessageInCache(final Node<Object, Object> seriesNode, final Message message, final MessageDetails info) throws Exception {
       try {
       txManager.begin();
       Node<Object, Object> messageNode = seriesNode.addChild(Fqn.fromString(info.getMessageNumber().toString()));
       messageNode.put("payload", Util.serialize(message));
       txManager.commit();
       _logger.info("\n******************\nStored message number " + info.getMessageNumber() + ": " + info + "\n******************\n");
       } catch (Exception e) {
       _logger.warn("Could not store message " + info.getMessageNumber());
       throw e;
       }
      }
      


      I'm omitting the rest as it is not relevant, as far as I can see.
      Also, these are the properties I use:

      <property name="transactionManagerLookupClass">org.jboss.cache.transaction.GenericTransactionManagerLookup</property>
      <property name="clusterName">${jboss.partition.name:DefaultPartition}-MyCache</property>
      <property name="multiplexerStack">${jboss.default.jgroups.stack:udp}</property>
      <property name="fetchInMemoryState">true</property>
      <property name="isolationLevel">READ_COMMITTED</property>
      <property name="writeSkewCheck">true</property>
      <property name="useLockStriping">true</property>
      <property name="lockParentForChildInsertRemove">false</property>
      <property name="cacheMode">REPL_SYNC</property>
      <property name="syncReplTimeout">35000</property>
      <property name="stateRetrievalTimeout">60000</property>
      <property name="lockAcquisitionTimeout">30000</property>
      <property name="useRegionBasedMarshalling">true</property>
      <property name="serializationExecutorPoolSize">0</property>
      <property name="listenerAsyncPoolSize">0</property>
      <property name="cacheLoaderConfig">
       <bean class="org.jboss.cache.config.CacheLoaderConfig">
       <property name="passivation">false</property>
       <property name="shared">true</property>
       <property name="preload">/</property>
       <property name="individualCacheLoaderConfigs">
       <list>
       <bean class="org.jboss.cache.loader.JDBCCacheLoaderConfig">
       <property name="properties">
       cache.jdbc.datasource=java:/pojoCacheDS
       cache.jdbc.table.drop=false
       cache.jdbc.table.name=jbosscache
       cache.jdbc.table.create=true
       cache.jdbc.table.primarykey=jbosscache_pk
       cache.jdbc.fqn.column=fqn
       cache.jdbc.fqn.type=VARCHAR(255)
       cache.jdbc.node.column=node
       cache.jdbc.node.type=BLOB
       cache.jdbc.parent.column=parent
       </property>
       <property name="async">false</property>
       <property name="fetchPersistentState">true</property>
       <property name="purgeOnStartup">false</property>
       <property name="ignoreModifications">false</property>
       </bean>
       </list>
       </property>
       </bean>
      </property>
      


      So, when this is run with 3 threads per machine, 2 machines in cluster, 2000 messages coming in asynchronously, at some stage, sometimes, I see in my stacktrace:

      15:15:48,386 WARN [TxInterceptor] Caught exception, will now set transaction to roll back
      org.jboss.cache.lock.TimeoutException: Unable to acquire lock on Fqn [/aggregator/b41f62ab-4bbf-4314-9f79-341710c4957a/1996] after [30000] milliseconds for requestor [GlobalTransaction:<10.3.0.181:53730>:1329]! Lock held by [GlobalTransaction:<10.1.0.30:58999>:2852]
       at org.jboss.cache.mvcc.MVCCNodeHelper.acquireLock(MVCCNodeHelper.java:159)
       at org.jboss.cache.mvcc.MVCCNodeHelper.wrapNodeForWriting(MVCCNodeHelper.java:253)
       at org.jboss.cache.mvcc.MVCCNodeHelper.wrapNodeForWriting(MVCCNodeHelper.java:186)
       at org.jboss.cache.interceptors.CacheLoaderInterceptor.loadIfNeeded(CacheLoaderInterceptor.java:300)
       at org.jboss.cache.interceptors.CacheLoaderInterceptor.visitPutDataMapCommand(CacheLoaderInterceptor.java:124)
       at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
       at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
       at org.jboss.cache.interceptors.ReplicationInterceptor.handleCrudMethod(ReplicationInterceptor.java:150)
       at org.jboss.cache.interceptors.ReplicationInterceptor.visitPutDataMapCommand(ReplicationInterceptor.java:113)
       at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
       at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
       at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
       at org.jboss.cache.commands.AbstractVisitor.visitPutDataMapCommand(AbstractVisitor.java:60)
       at org.jboss.cache.commands.write.PutDataMapCommand.acceptVisitor(PutDataMapCommand.java:104)
       at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
       at org.jboss.cache.interceptors.TxInterceptor.replayModifications(TxInterceptor.java:501)
       at org.jboss.cache.interceptors.TxInterceptor.handleRemotePrepare(TxInterceptor.java:388)
       at org.jboss.cache.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:134)
       at org.jboss.cache.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:68)
       at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
       at org.jboss.cache.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:131)
       at org.jboss.cache.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:140)
       at org.jboss.cache.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:68)
       at org.jboss.cache.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
       at org.jboss.cache.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:178)
       at org.jboss.cache.interceptors.InvocationContextInterceptor.visitPrepareCommand(InvocationContextInterceptor.java:106)
       at org.jboss.cache.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:68)
       at org.jboss.cache.interceptors.InterceptorChain.invokeRemote(InterceptorChain.java:316)
       at org.jboss.cache.commands.remote.ReplicateCommand.processSingleCommand(ReplicateCommand.java:139)
       at org.jboss.cache.commands.remote.ReplicateCommand.perform(ReplicateCommand.java:115)
       at org.jboss.cache.marshall.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:319)
       at org.jboss.cache.marshall.InactiveRegionAwareRpcDispatcher.handle(InactiveRegionAwareRpcDispatcher.java:95)
       at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:637)
       at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:545)
       at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:368)
       at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:775)
       at org.jgroups.JChannel.up(JChannel.java:1339)
       at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:462)
       at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:501)
       at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:310)
       at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
       at org.jgroups.protocols.FC.up(FC.java:473)
       at org.jgroups.protocols.pbcast.GMS.up(GMS.java:824)
       at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)
       at org.jgroups.protocols.UNICAST.up(UNICAST.java:299)
       at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:884)
       at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:716)
       at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
       at org.jgroups.protocols.FD.up(FD.java:284)
       at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
       at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
       at org.jgroups.protocols.Discovery.up(Discovery.java:264)
       at org.jgroups.protocols.PING.up(PING.java:273)
       at org.jgroups.protocols.TP.passMessageUp(TP.java:1285)
       at org.jgroups.protocols.TP.access$100(TP.java:49)
       at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1838)
       at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1812)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
       at java.lang.Thread.run(Thread.java:613)
      
      15:15:48,392 WARN [ClusteredTest] Could not store message 1996
      


      10.3.0.181 definitively wants to perform a _storeMessageInCache for Fqn /aggregator/b41f62ab-4bbf-4314-9f79-341710c4957a/1996 while 10.1.0.30 knows nothing about that Fqn but owns some lock...
      Btw, /aggregator/b41f62ab-4bbf-4314-9f79-341710c4957a/1996 IS in the DB...

      I wonder if this has anything to do with https://jira.jboss.org/jira/browse/JBCACHE-1540 and, if so, if you would recommend the proposed solution (i.e.: has anyone verified the implications of that workaround?)

      Kind Regards
      \j