5 Replies Latest reply on Aug 30, 2012 4:43 AM by Galder Zamarreño

    Timeout sending message when using DIST_SYNC mode

    Hans Peter Newbie

      Hi there,

      I'm currently working on a small Infinispan example using distribution clustering mode on two local nodes. The example code realizes some kind of AtomicLong (first get() and then replace() if not changed), where I'm trying to get 20000 unique numbers. When I set numOwners = 1, everything is working fine. But when I set it to numOwners=2, I immediately get a "org.jgroups.TimeoutException: timeout sending message" when the second node joins the cluster.

       

      My config.xml is:

       

      <infinispan
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
            xmlns="urn:infinispan:config:5.1">
         
         <global>
            <transport clusterName="demoCluster1">
               <properties>
                  <property name="configurationFile" value="jgroups-tcp.xml" />
               </properties>
            </transport>
         </global>
           
         <default>
              <clustering mode="distribution">
                  <sync replTimeout="25000" />
                  <l1 enabled="false" lifespan="180000" onRehash="false" />
                  <hash numOwners="2" />
              </clustering>
      
              <locking writeSkewCheck="false" useLockStriping="false"
                  lockAcquisitionTimeout="60000" concurrencyLevel="1000" />
          </default>
      </infinispan>
      

       

      My example code:

       

       

      DefaultCacheManager cm;
      try {
              cm = new DefaultCacheManager("myconfig.xml");
                  
      } catch (IOException e1) {
          e1.printStackTrace();
          return;
      }
              
      Cache<String, Long> idCache = cm.getCache("currentID");
              
      idCache.put("currentID", 0L);
      System.out.println("Initialized counter");
              
      long current = 0L;
      boolean success = false;
      
      for (int i = 0; i < 20000; i++) {        while (true) {
                      current = idCache.get("currentID");
                      success = idCache.replace("currentID", current, current + 1);
                      
                      if (success) {
                          break;
                      }
              }
      }
      

       

      Any ideas?

        • 2. Re: Timeout sending message when using DIST_SYNC mode
          Hans Peter Newbie

          12/08/23 13:00:31 ERROR interceptors.InvocationContextInterceptor: ISPN000136: Execution error

          org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to JENA-40974

              at org.infinispan.util.Util.rewrapAsCacheException(Util.java:526)

              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:172)

              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:489)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:161)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:183)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:240)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:227)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:222)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:217)

              at org.infinispan.interceptors.DistributionInterceptor.handleWriteCommand(DistributionInterceptor.java:506)

              at org.infinispan.interceptors.DistributionInterceptor.visitReplaceCommand(DistributionInterceptor.java:304)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:212)

              at org.infinispan.interceptors.EntryWrappingInterceptor.visitReplaceCommand(EntryWrappingInterceptor.java:165)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitReplaceCommand(NonTransactionalLockingInterceptor.java:111)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)

              at org.infinispan.commands.AbstractVisitor.visitReplaceCommand(AbstractVisitor.java:72)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.StateTransferLockInterceptor.handleWithRetries(StateTransferLockInterceptor.java:213)

              at org.infinispan.interceptors.StateTransferLockInterceptor.handleWriteCommand(StateTransferLockInterceptor.java:181)

              at org.infinispan.interceptors.StateTransferLockInterceptor.visitReplaceCommand(StateTransferLockInterceptor.java:162)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:132)

              at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:91)

              at org.infinispan.commands.AbstractVisitor.visitReplaceCommand(AbstractVisitor.java:72)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)

              at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1006)

              at org.infinispan.CacheImpl.replace(CacheImpl.java:754)

              at org.infinispan.CacheImpl.replace(CacheImpl.java:747)

              at org.infinispan.CacheSupport.replace(CacheSupport.java:148)

              at de.cinovo.deploy.globalid.InfinispanTest.main(InfinispanTest.java:69)

          Caused by: org.jgroups.TimeoutException: timeout sending message to JENA-40974

              at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:384)

              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:275)

              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:165)

              ... 37 more

          Exception in thread "main" org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to JENA-40974

              at org.infinispan.util.Util.rewrapAsCacheException(Util.java:526)

              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:172)

              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:489)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:161)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:183)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:240)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:227)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:222)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:217)

              at org.infinispan.interceptors.DistributionInterceptor.handleWriteCommand(DistributionInterceptor.java:506)

              at org.infinispan.interceptors.DistributionInterceptor.visitReplaceCommand(DistributionInterceptor.java:304)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:212)

              at org.infinispan.interceptors.EntryWrappingInterceptor.visitReplaceCommand(EntryWrappingInterceptor.java:165)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitReplaceCommand(NonTransactionalLockingInterceptor.java:111)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)

              at org.infinispan.commands.AbstractVisitor.visitReplaceCommand(AbstractVisitor.java:72)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.StateTransferLockInterceptor.handleWithRetries(StateTransferLockInterceptor.java:213)

              at org.infinispan.interceptors.StateTransferLockInterceptor.handleWriteCommand(StateTransferLockInterceptor.java:181)

              at org.infinispan.interceptors.StateTransferLockInterceptor.visitReplaceCommand(StateTransferLockInterceptor.java:162)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

              at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:132)

              at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:91)

              at org.infinispan.commands.AbstractVisitor.visitReplaceCommand(AbstractVisitor.java:72)

              at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

              at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)

              at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1006)

              at org.infinispan.CacheImpl.replace(CacheImpl.java:754)

              at org.infinispan.CacheImpl.replace(CacheImpl.java:747)

              at org.infinispan.CacheSupport.replace(CacheSupport.java:148)

              at de.cinovo.deploy.globalid.InfinispanTest.main(InfinispanTest.java:69)

          Caused by: org.jgroups.TimeoutException: timeout sending message to JENA-40974

              at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:384)

              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:275)

              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:165)

              ... 37 more

          • 3. Re: Timeout sending message when using DIST_SYNC mode
            Galder Zamarreño Master

            Weird. Do you see any other WARN or ERROR messages?

             

            Seems like the 2nd node is not replying, maybe cos it's running out memory, or is hanging... get some thread dumps of the 2nd node to see what is doing. And check the garbage collection logs on that node.

            • 4. Re: Timeout sending message when using DIST_SYNC mode
              Hans Peter Newbie

              No, there are no other Error/Warning messages.

               

              The communication between the 2 nodes must be working because they have the same view of the currentID cache. When I keep trying to get a number, it eventually works after a certain amount of timeouts (depending on chosen timeout values in config file). So it seems that if both nodes try to lock the key at the same time, they run into a deadlock that cannot be resolved. Another thing is that it works pretty well when I only try to get a few numbers per second.

               

              I think it's just a configuration thing, but I don't know what I'm doing wrong...

              • 5. Re: Timeout sending message when using DIST_SYNC mode
                Galder Zamarreño Master

                What Infinispan version are you using?


                Have you made any changes to the default jgroups-tcp.xml?

                 

                Looks to me there's a network, or communications issue here. Because the issue is with the JGroups transport layer, rather than Infinispan.

                 

                It might be best to try the test with two nodes in your own machine and see how it works.

                 

                You can also try with UDP transport and see if that works better.

                 

                And again, thread dumps...

                 

                Finally, run some JGroups performance tests like the ones suggested in http://www.jgroups.org/perf/perf2008/Report.html