2 Replies Latest reply on Apr 30, 2015 7:01 AM by wdfink

    Inconsistent cluster view

    sokrates_sd

      Setup

      • 2 node clustered Wildfly 8.0.2 with Infinispan 6.0.2.Final
      • Connected via JGroups TCPPING
      • Replicated synchronous cache

       

      On my development cluster I encountered the following

      erroneous behaviour. For better readability I put [NodeX]

      into the log lines below and pasted the actual stacktraces

      at the end of the text.

       

      At 10:57 Node2 logs that it received a new cluster view,

      while Node1 did not log anything.

       

      2015-04-28 10:57:03,494 [Node2][INFO ][Incoming-17,shared=tcp][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000094: Received new cluster view: [va2/va-cluster|94] (1) [va2/va-cluster]

       

       

      From that time on, the JGroups layer had this views:

      node1: [va2/va-cluster|93] (2) [va2/va-cluster, va1/va-cluster]

      node2: [va2/va-cluster|94] (1) [va2/va-cluster]

       

       

      Over the whole day I could observe the following behaviour, e.g.

       

      2015-04-28 20:01:13,447 [Node1] putting object in cache

       

       

      Nothing happens for some time, thread seems blocked, then after 17 seconds Node1 logs

       

      2015-04-28 20:01:30,951 [Node1] [ERROR][default task-5][org.infinispan.interceptors.InvocationContextInterceptor] ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 171

       

      And immediately this on node2

       

      2015-04-28 20:01:30,974 [Node2] [ERROR][remote-thread-72][org.infinispan.remoting.InboundInvocationHandlerImpl] ISPN000260: Exception executing command: org.infinispan.statetransfer.OutdatedTopologyException: Cache topology changed while the command was executing: expect 170, got 171

       

      Putting object in caches on node2 works fine, since obviously it does not try to replicte it to node1.

       

       

      Then I rebooted node2 with the following interesting log messages, note again a 12 second delay:

       

      2015-04-28 20:06:46,286 [Node2][INFO ][MSC service thread 1-7][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000078: Starting JGroups Channel

      2015-04-28 20:06:58,812 [Node1][INFO ][Incoming-7,shared=tcp][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000094: Received new cluster view: [va1/va-cluster|95] (1) [va1/va-cluster]

      2015-04-28 20:07:01,188 [Node2][INFO ][MSC service thread 1-7][org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000094: Received new cluster view: [va1/va-cluster|96] (2) [va1/va-cluster, va2/va-cluster]

       

      After that cache entries became replicated again in either direction.

       

       

      My questions now are:

       

      1. Why does this asymmetric view happen?

       

      2. Any hints on how to debug further? How to provoke this situation?

       

      3. What to do about it?

       

      Thanks for any thoughts on that,

       

      Here the detailed stack traces:

       

       

      2015-04-28 20:01:30,951 [Node1] [ERROR][default task-5][org.infinispan.interceptors.InvocationContextInterceptor] ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 171

        at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:93)

        at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:234)

        at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:141)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)

        at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)

        at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73)

        at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)

        at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1306)

        at org.infinispan.CacheImpl.putInternal(CacheImpl.java:878)

        at org.infinispan.CacheImpl.put(CacheImpl.java:870)

        at org.infinispan.CacheImpl.put(CacheImpl.java:1351)

        at org.infinispan.CacheImpl.put(CacheImpl.java:215)

        at org.infinispan.AbstractDelegatingCache.put(AbstractDelegatingCache.java:276)

        [... more stuff in custom classes ...]

       

       

      2015-04-28 20:01:30,974 [Node2] [ERROR][remote-thread-72][org.infinispan.remoting.InboundInvocationHandlerImpl] ISPN000260: Exception executing command: org.infinispan.statetransfer.OutdatedTopologyException: Cache topology changed while the command was executing: expected 170, got 171

        at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.handleNonTxWriteCommand(BaseDistributionInterceptor.java:184)

        at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitPutKeyValueCommand(NonTxDistributionInterceptor.java:72)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:321)

        at org.infinispan.interceptors.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForDataCommand(EntryWrappingInterceptor.java:402)

        at org.infinispan.interceptors.EntryWrappingInterceptor.visitPutKeyValueCommand(EntryWrappingInterceptor.java:164)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutKeyValueCommand(AbstractLockingInterceptor.java:70)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)

        at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)

        at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:213)

        at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:141)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)

        at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)

        at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73)

        at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32)

        at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)

        at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)

        at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:39)

        at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:48)

        at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:95)

        at org.infinispan.remoting.InboundInvocationHandlerImpl.access$000(InboundInvocationHandlerImpl.java:50)

        at org.infinispan.remoting.InboundInvocationHandlerImpl$2.run(InboundInvocationHandlerImpl.java:172)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

        • 1. Re: Inconsistent cluster view
          rvansa

          That sounds like obviously buggy behaviour. If you can reproduce this, could you try to replace JGroups with latest version (3.6.3)? Also TRACE level on org.jgroups would give us more hints.

          • 2. Re: Inconsistent cluster view
            wdfink

            Do you use the infinispan cache inside WildFly or have an application using embedded mode?

             

            If you use the WildFly configuration it matter whether the WF-cluster is correct or not.

            If you use embedded mode you will have a WF cluster and a Infinispan-cluster which might be different.

             

            Attaching the configuration might be helpful.