Cluster leaver still owns key?
mdobrinin Feb 2, 2015 12:39 PM
Hi, I'm having an issue where the owner of a key can leave the cluster, yet the key is not rehashed to the remaining nodes, and parts of the system still believe the leaver is a member of the replicated cache. As a result, the node performing an operation (a remove) asks the old owner to execute it on its behalf, but of course that node has already left. For more detail: the remove() operation happens in response to cluster topology change events, so it is clear the other node has already left, but some other parts of Infinispan must not be aware of that yet. The issue only occurs when going from 2 nodes to 1, i.e., when the second-to-last node leaves the cluster. Here are some logs demonstrating the issue:
The leaver is removed from the cluster:
2015-01-30 12:07:09,566 TRACE [41 ] UNICAST2 [] CALVIN-8383: removing non members [CALVIN-44280] []
2015-01-30 12:07:09,566 TRACE [41 ] UNICAST2 [] CALVIN-8383 --> STABLE(CALVIN-44280: 243-243, conn_id=3) []
2015-01-30 12:07:09,566 TRACE [41 ] UDP [] CALVIN-8383: sending msg to CALVIN-44280, src=CALVIN-8383, headers are UNICAST2: STABLE, seqno=243, conn_id=3, UDP: [channel_name=MOCA_TEST_CLUSTER] []
2015-01-30 12:07:09,566 DEBUG [41 ] NAKACK [] CALVIN-8383: removed CALVIN-44280 from xmit_table (not member anymore) []
2015-01-30 12:07:09,566 TRACE [41 ] FD_SOCK [] adjusted suspected_mbrs: [] []
2015-01-30 12:07:09,566 DEBUG [41 ] FD_SOCK [] VIEW_CHANGE received: [CALVIN-8383] []
2015-01-30 12:07:09,570 TRACE [41 ] STABLE [] CALVIN-8383: reset digest to CALVIN-8383: [-1] []
2015-01-30 12:07:09,570 TRACE [41 ] UFC [] new membership: [CALVIN-8383] []
2015-01-30 12:07:09,570 TRACE [41 ] MFC [] new membership: [CALVIN-8383] []
2015-01-30 12:07:09,570 TRACE [41 ] FRAG2 [] CALVIN-8383: removed CALVIN-44280 from fragmentation table []
The remove() command is attempted:
2015-01-30 12:07:09,584 TRACE [170 ] InvocationContextInterceptor [] Invoked with command RemoveCommand{key=InfinispanNode [address=CALVIN-44280], value=null, flags=null, valueMatcher=MATCH_ALWAYS} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@15450db] []
2015-01-30 12:07:09,584 TRACE [170 ] StateTransferInterceptor [] handleNonTxWriteCommand for command RemoveCommand{key=InfinispanNode [address=CALVIN-44280], value=null, flags=null, valueMatcher=MATCH_ALWAYS} []
2015-01-30 12:07:09,584 TRACE [170 ] NonTransactionalLockingInterceptor [] Are (CALVIN-8383) we the lock owners for key 'InfinispanNode [address=CALVIN-44280]'? false []
2015-01-30 12:07:09,584 TRACE [170 ] EntryWrappingInterceptor [] Wrapping entry 'InfinispanNode [address=CALVIN-44280]'? false []
2015-01-30 12:07:09,584 TRACE [170 ] CallInterceptor [] Executing command: RemoveCommand{key=InfinispanNode [address=CALVIN-44280], value=null, flags=null, valueMatcher=MATCH_ALWAYS}. []
2015-01-30 12:07:09,584 TRACE [170 ] BaseDistributionInterceptor [] I'm not the primary owner, so sending the command to the primary owner(CALVIN-44280) in order to be forwarded []
2015-01-30 12:07:09,584 TRACE [170 ] RpcManagerImpl [] CALVIN-8383 invoking RemoveCommand{key=InfinispanNode [address=CALVIN-44280], value=null, flags=null, valueMatcher=MATCH_ALWAYS} to recipient list [CALVIN-44280] with options RpcOptions{timeout=20000, unit=MILLISECONDS, fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS, skipReplicationQueue=false} []
2015-01-30 12:07:09,585 TRACE [177 ] JGroupsTransport [] dests=null, command=CacheTopologyControlCommand{cache=moca-current-roles, type=CH_UPDATE, sender=CALVIN-8383, joinInfo=null, topologyId=9, currentCH=ReplicatedConsistentHash{members=[CALVIN-8383], numSegments=60, primaryOwners=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}, pendingCH=null, throwable=null, viewId=7}, mode=ASYNCHRONOUS_WITH_SYNC_MARSHALLING, timeout=20000 []
2015-01-30 12:07:09,585 TRACE [72 ] GlobalComponentRegistry [] Injecting dependencies for method [public void org.infinispan.topology.CacheTopologyControlCommand.init(org.infinispan.topology.LocalTopologyManager,org.infinispan.topology.ClusterTopologyManager,org.infinispan.topology.RebalancePolicy)] on an instance of [org.infinispan.topology.CacheTopologyControlCommand]. []
2015-01-30 12:07:09,585 TRACE [177 ] DefaultRebalancePolicy [] Cache moca-current-roles status changed: joiners=[], topology=CacheTopology{id=9, currentCH=ReplicatedConsistentHash{members=[CALVIN-8383], numSegments=60, primaryOwners=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}, pendingCH=null} []
2015-01-30 12:07:09,585 TRACE [170 ] JGroupsTransport [] dests=[CALVIN-44280], command=SingleRpcCommand{cacheName='moca-node-urls', command=RemoveCommand{key=InfinispanNode [address=CALVIN-44280], value=null, flags=null, valueMatcher=MATCH_ALWAYS}}, mode=SYNCHRONOUS, timeout=20000 []
Eventually the call fails with a SuspectException:
org.infinispan.remoting.transport.jgroups.SuspectException: One or more nodes have left the cluster while replicating command SingleRpcCommand{cacheName='moca-node-urls', command=RemoveCommand{key=InfinispanNode [address=CALVIN-44280], value=null, flags=null, valueMatcher=MATCH_ALWAYS}}
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:484) ~[infinispan-core-jda.jar:6.0.2-SNAPSHOT]
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:281) [infinispan-core-jda.jar:6.0.2-SNAPSHOT]
at org.infinispan.interceptors.distribution.BaseDistributionInterceptor.handleNonTxWriteCommand(BaseDistributionInterceptor.java:233) [infinispan-core-jda.jar:6.0.2-SNAPSHOT]
at org.infinispan.interceptors.distribution.NonTxDistributionInterceptor.visitRemoveCommand(NonTxDistributionInterceptor.java:110) [infinispan-core-jda.jar:6.0.2-SNAPSHOT]
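The workaround I'm experimenting with is sketched below. It is only an assumption on my part, not a confirmed fix: since the remove() is issued in reaction to the topology change anyway, the SuspectException can be caught and the remove retried with the CACHE_MODE_LOCAL flag, so no RPC is sent to the departed "owner". The method and variable names (removeLeaverEntry, leaverKey) are placeholders; the API (AdvancedCache, Flag, SuspectException) is the Infinispan 6.x API visible in the stack trace above.

```java
import org.infinispan.AdvancedCache;
import org.infinispan.Cache;
import org.infinispan.context.Flag;
import org.infinispan.remoting.transport.jgroups.SuspectException;

public class LeaverCleanup {

    // Remove the entry keyed by a departed node. If the stale topology routes
    // the command to the leaver itself and the RPC fails with a
    // SuspectException, retry locally and let state transfer reconcile the
    // remaining nodes. Sketch only -- not a confirmed Infinispan-recommended fix.
    static void removeLeaverEntry(Cache<Object, Object> cache, Object leaverKey) {
        AdvancedCache<Object, Object> advanced = cache.getAdvancedCache();
        try {
            advanced.remove(leaverKey);
        } catch (SuspectException e) {
            // The primary owner in the (stale) consistent hash was the leaver;
            // perform the remove on this node only, skipping replication.
            advanced.withFlags(Flag.CACHE_MODE_LOCAL).remove(leaverKey);
        }
    }
}
```

This avoids the 20-second synchronous RPC timeout shown in the log, but it doesn't explain why the topology update (CH_UPDATE to topologyId=9) hadn't been applied before the remove was dispatched, so I'd still like to understand the underlying ordering issue.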