5 Replies Latest reply on Aug 30, 2012 4:43 AM by galder.zamarreno

    Timeout sending message when using DIST_SYNC mode

    thofro

      Hi there,

      I'm currently working on a small Infinispan example using distribution clustering mode on two local nodes. The example code realizes some kind of AtomicLong (first get() and then replace() if not changed), where I'm trying to get 20000 unique numbers. When I set numOwners = 1, everything is working fine. But when I set it to numOwners=2, I immediately get a "org.jgroups.TimeoutException: timeout sending message" when the second node joins the cluster.

       

      My config.xml is:

       

      <infinispan
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
            xmlns="urn:infinispan:config:5.1">
         
         <global>
            <transport clusterName="demoCluster1">
               <properties>
                  <property name="configurationFile" value="jgroups-tcp.xml" />
               </properties>
            </transport>
         </global>
           
         <default>
              <clustering mode="distribution">
                  <sync replTimeout="25000" />
                  <l1 enabled="false" lifespan="180000" onRehash="false" />
                  <hash numOwners="2" />
              </clustering>
      
              <locking writeSkewCheck="false" useLockStriping="false"
                  lockAcquisitionTimeout="60000" concurrencyLevel="1000" />
          </default>
      </infinispan>
      

       

      My example code:

       

       

      DefaultCacheManager cm;
      try {
              cm = new DefaultCacheManager("myconfig.xml");
                  
      } catch (IOException e1) {
          e1.printStackTrace();
          return;
      }
              
      Cache<String, Long> idCache = cm.getCache("currentID");
              
      idCache.put("currentID", 0L);
      System.out.println("Initialized counter");
              
      long current = 0L;
      boolean success = false;
      
      for (int i = 0; i < 20000; i++) {        while (true) {
                      current = idCache.get("currentID");
                      success = idCache.replace("currentID", current, current + 1);
                      
                      if (success) {
                          break;
                      }
              }
      }
      

       

      Any ideas?

        • 1. Re: Timeout sending message when using DIST_SYNC mode
          galder.zamarreno

          What's the full stacktrace?

          • 2. Re: Timeout sending message when using DIST_SYNC mode
            thofro

            12/08/23 13:00:31 ERROR interceptors.InvocationContextInterceptor: ISPN000136: Execution error

            org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to JENA-40974

                at org.infinispan.util.Util.rewrapAsCacheException(Util.java:526)

                at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:172)

                at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:489)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:161)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:183)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:240)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:227)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:222)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:217)

                at org.infinispan.interceptors.DistributionInterceptor.handleWriteCommand(DistributionInterceptor.java:506)

                at org.infinispan.interceptors.DistributionInterceptor.visitReplaceCommand(DistributionInterceptor.java:304)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:212)

                at org.infinispan.interceptors.EntryWrappingInterceptor.visitReplaceCommand(EntryWrappingInterceptor.java:165)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitReplaceCommand(NonTransactionalLockingInterceptor.java:111)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)

                at org.infinispan.commands.AbstractVisitor.visitReplaceCommand(AbstractVisitor.java:72)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.StateTransferLockInterceptor.handleWithRetries(StateTransferLockInterceptor.java:213)

                at org.infinispan.interceptors.StateTransferLockInterceptor.handleWriteCommand(StateTransferLockInterceptor.java:181)

                at org.infinispan.interceptors.StateTransferLockInterceptor.visitReplaceCommand(StateTransferLockInterceptor.java:162)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:132)

                at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:91)

                at org.infinispan.commands.AbstractVisitor.visitReplaceCommand(AbstractVisitor.java:72)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)

                at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1006)

                at org.infinispan.CacheImpl.replace(CacheImpl.java:754)

                at org.infinispan.CacheImpl.replace(CacheImpl.java:747)

                at org.infinispan.CacheSupport.replace(CacheSupport.java:148)

                at de.cinovo.deploy.globalid.InfinispanTest.main(InfinispanTest.java:69)

            Caused by: org.jgroups.TimeoutException: timeout sending message to JENA-40974

                at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:384)

                at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:275)

                at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:165)

                ... 37 more

            Exception in thread "main" org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to JENA-40974

                at org.infinispan.util.Util.rewrapAsCacheException(Util.java:526)

                at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:172)

                at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:489)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:161)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:183)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:240)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:227)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:222)

                at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:217)

                at org.infinispan.interceptors.DistributionInterceptor.handleWriteCommand(DistributionInterceptor.java:506)

                at org.infinispan.interceptors.DistributionInterceptor.visitReplaceCommand(DistributionInterceptor.java:304)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:212)

                at org.infinispan.interceptors.EntryWrappingInterceptor.visitReplaceCommand(EntryWrappingInterceptor.java:165)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitReplaceCommand(NonTransactionalLockingInterceptor.java:111)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)

                at org.infinispan.commands.AbstractVisitor.visitReplaceCommand(AbstractVisitor.java:72)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.StateTransferLockInterceptor.handleWithRetries(StateTransferLockInterceptor.java:213)

                at org.infinispan.interceptors.StateTransferLockInterceptor.handleWriteCommand(StateTransferLockInterceptor.java:181)

                at org.infinispan.interceptors.StateTransferLockInterceptor.visitReplaceCommand(StateTransferLockInterceptor.java:162)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)

                at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:132)

                at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:91)

                at org.infinispan.commands.AbstractVisitor.visitReplaceCommand(AbstractVisitor.java:72)

                at org.infinispan.commands.write.ReplaceCommand.acceptVisitor(ReplaceCommand.java:66)

                at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)

                at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1006)

                at org.infinispan.CacheImpl.replace(CacheImpl.java:754)

                at org.infinispan.CacheImpl.replace(CacheImpl.java:747)

                at org.infinispan.CacheSupport.replace(CacheSupport.java:148)

                at de.cinovo.deploy.globalid.InfinispanTest.main(InfinispanTest.java:69)

            Caused by: org.jgroups.TimeoutException: timeout sending message to JENA-40974

                at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:384)

                at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:275)

                at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:165)

                ... 37 more

            • 3. Re: Timeout sending message when using DIST_SYNC mode
              galder.zamarreno

              Weird. Do you see any other WARN or ERROR messages?

               

              Seems like the 2nd node is not replying, maybe cos it's running out memory, or is hanging... get some thread dumps of the 2nd node to see what is doing. And check the garbage collection logs on that node.

              • 4. Re: Timeout sending message when using DIST_SYNC mode
                thofro

                No, there are no other Error/Warning messages.

                 

                The communication between the 2 nodes must be working because they have the same view of the currentID cache. When I keep trying to get a number, it eventually works after a certain amount of timeouts (depending on chosen timeout values in config file). So it seems that if both nodes try to lock the key at the same time, they run into a deadlock that cannot be resolved. Another thing is that it works pretty well when I only try to get a few numbers per second.

                 

                I think it's just a configuration thing, but I don't know what I'm doing wrong...

                • 5. Re: Timeout sending message when using DIST_SYNC mode
                  galder.zamarreno

                  What Infinispan version are you using?


                  Have you made any changes to the default jgroups-tcp.xml?

                   

                  Looks to me there's a network, or communications issue here. Because the issue is with the JGroups transport layer, rather than Infinispan.

                   

                  It might be best to try the test with two nodes in your own machine and see how it works.

                   

                  You can also try with UDP transport and see if that works better.

                   

                  And again, thread dumps...

                   

                  Finally, run some JGroups performance tests like the ones suggested in http://www.jgroups.org/perf/perf2008/Report.html