2 Replies Latest reply on Apr 30, 2015 7:01 AM by Wolf-Dieter Fink

    Inconsistent cluster view

    sokrates_sd Newbie

      Setup

      • 2 node clustered Wildfly 8.0.2 with Infinispan 6.0.2.Final
      • Connected via JGroups TCPPING
      • Replicated synchronous cache

       

      On my development cluster I encountered the following

      erroneous behaviour. For better readability I put [NodeX]

      into the log lines below and pasted the actual stacktraces

      at the end of the text.

       

      At 10:57 Node2 logs that it received a new cluster view,

      while Node1 did not log anything.

       

      2015-04-28 10:57:03,494 [Node2][INFO ][Incoming-17,shared=tcp][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000094: Received new cluster view: [va2/va-cluster|94] (1) [va2/va-cluster]

       

       

      From that time on, the JGroups layer had this views:

      node1: [va2/va-cluster|93] (2) [va2/va-cluster, va1/va-cluster]

      node2: [va2/va-cluster|94] (1) [va2/va-cluster]

       

       

      Over the whole day I could observe the following behaviour, e.g.

       

      2015-04-28 20:01:13,447 [Node1] putting object in cache

       

       

      Nothing happens for some time, thread seems blocked, then after 17 seconds Node1 logs

       

      2015-04-28 20:01:30,951 [Node1] [ERROR][default task-5][org.infinispan.interceptors.InvocationContextInterceptor] ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 171

       

      And immediately this on node2

       

      2015-04-28 20:01:30,974 [Node2] [ERROR][remote-thread-72][org.infinispan.remoting.InboundInvocationHandlerImpl] ISPN000260: Exception executing command: org.infinispan.statetransfer.OutdatedTopologyException: Cache topology changed while the command was executing: expect 170, got 171

       

      Putting object in caches on node2 works fine, since obviously it does not try to replicte it to node1.

       

       

      Then I rebooted node2 with the following interesting log messages, note again a 12 second delay:

       

      2015-04-28 20:06:46,286 [Node2][INFO ][MSC service thread 1-7][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000078: Starting JGroups Channel

      2015-04-28 20:06:58,812 [Node1][INFO ][Incoming-7,shared=tcp][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000094: Received new cluster view: [va1/va-cluster|95] (1) [va1/va-cluster]

      2015-04-28 20:07:01,188 [Node2][INFO ][MSC service thread 1-7][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000094: Received new cluster view: [va1/va-cluster|96] (2) [va1/va-cluster, va2/va-cluster]

       

      After that cache entries became replicated again in either direction.

       

       

      My questions now are:

       

      1. Why does this asymmetric view happen?

       

      2. Any hints on how to debug further? How to provoke this situation?

       

      3. What to do about it?

       

      Thanks for any thoughts on that,

       

      Here the detailed stack traces:

       

       

      2015-04-28 20:01:30,951 [Node1] [ERROR][default task-5][org.infinispan.interceptors.InvocationContextInterceptor] ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 171

        at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:93)

        at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:234)

        at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:141)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)

        at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)

        at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73)

        at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)

        at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1306)

        at org.infinispan.CacheImpl.putInternal(CacheImpl.java:878)

        at org.infinispan.CacheImpl.put(CacheImpl.java:870)

        at org.infinispan.CacheImpl.put(CacheImpl.java:1351)

        at org.infinispan.CacheImpl.put(CacheImpl.java:215)

        at org.infinispan.AbstractDelegatingCache.put(AbstractDelegatingCache.java:276)

        [... more stuff in custom classes ...]

       

       

      2015-04-28 20:01:30,974 [Node2] [ERROR][remote-thread-72][org.infinispan.remoting.InboundInvocationHandlerImpl] ISPN000260: Exception executing command: org.infinispan.statetransfer.OutdatedTopologyException: Cache topology changed while the command was executing: expected 170, got 171

        at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.handleNonTxWriteCommand(BaseDistributionInterceptor.java:184)

        at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutKeyValueCommand(NonTxDistributionInterceptor.java:72)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:321)

        at org.infinispan.interceptors.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForDataCommand(EntryWrappingInterceptor.java:402)

        at org.infinispan.interceptors.EntryWrappingInterceptor.visitPutKeyValueCommand(EntryWrappingInterceptor.java:164)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutKeyValueCommand(AbstractLockingInterceptor.java:70)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)

        at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)

        at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:213)

        at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:141)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)

        at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)

        at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73)

        at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)

        at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:39)

        at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:48)

        at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:95)

        at org.infinispan.remoting.InboundInvocationHandlerImpl.access$000(InboundInvocationHandlerImpl.java:50)

        at org.infinispan.remoting.InboundInvocationHandlerImpl$2.run(InboundInvocationHandlerImpl.java:172)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)