3 Replies Latest reply on May 30, 2016 9:33 AM by dan.berindei

    Replicated cache fails to replicate, likely during an ownership transfer (ISPN 7.2.3)

    phillavoie

      Hi,

       

      We are using ISPN 7.2.3 as an embedded cache.

      This particular cache is replicated. Writes are asynchronous and we don't await initial state transfer.

       

      We have a cluster with two nodes.

      The first one is put down using "kill -9".

      A new process is then launched, which joins the preexisting cluster that is now confined to the second node.

      The new first node is elx74379whc-35754.

      The 2nd node is elx74379whc-27681.


      First node

      On node startup, we put something in a replicated cache. We are being told, we are not the primary owner, so the put is sent remotely (you can see in the logs a get happening concurrently, I left it there in case it helps to debug):

       

      11:51:49,832 TRACE [org.infinispan.interceptors.InvocationContextInterceptor] (transport-thread--p8-t13) Invoked with command PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@258b52ae]

      11:51:49,832 TRACE [org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller] (stateTransferExecutor-thread--p11-t7) Stop marshaller

      11:51:49,832 TRACE [org.infinispan.statetransfer.StateTransferInterceptor] (transport-thread--p8-t13) handleNonTxWriteCommand for command PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}, topology id -1

      11:51:49,832 TRACE [org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor] (transport-thread--p8-t13) Are (elx74379whc-35754) we the lock owners for key 'MemberIdentity [internalId=elx74379whc-35754]'? false

      11:51:49,833 TRACE [org.infinispan.interceptors.EntryWrappingInterceptor] (transport-thread--p8-t13) Wrapping entry 'MemberIdentity [internalId=elx74379whc-35754]'? false

      11:51:49,833 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (stateTransferExecutor-thread--p11-t7) Response: null

      11:51:49,833 TRACE [org.infinispan.interceptors.CallInterceptor] (transport-thread--p8-t13) Executing command: PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}.

      11:51:49,833 TRACE [org.infinispan.executors.SemaphoreCompletionService] (stateTransferExecutor-thread--p11-t7) Task finished, tasks in queue 0, available permits 0

      11:51:49,833 TRACE [org.infinispan.executors.SemaphoreCompletionService] (stateTransferExecutor-thread--p11-t7) All queued tasks finished, available permits 1

      11:51:49,833 TRACE [org.infinispan.interceptors.distribution.BaseDistributionInterceptor] (transport-thread--p8-t13) Command topology id is 17, current topology id is 17, successful? true

      11:51:49,833 TRACE [org.infinispan.interceptors.distribution.BaseDistributionInterceptor] (transport-thread--p8-t13) I'm not the primary owner, so sending the command to the primary owner(elx74379whc-27681) in order to be forwarded

      11:51:49,833 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (transport-thread--p8-t13) elx74379whc-35754 invoking PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} to recipient list [elx74379whc-27681] with options RpcOptions{timeout=15000, unit=MILLISECONDS, deliverOrder=PER_SENDER, responseFilter=null, responseMode=ASYNCHRONOUS_WITH_SYNC_MARSHALLING, skipReplicationQueue=false}

      11:51:49,833 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (transport-thread--p8-t13) dests=[elx74379whc-27681], command=SingleRpcCommand{cacheName='TrafficNodeStore', command=PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}}, mode=ASYNCHRONOUS_WITH_SYNC_MARSHALLING, timeout=15000

      11:51:49,833 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (transport-thread--p8-t13) Replication task sending SingleRpcCommand{cacheName='TrafficNodeStore', command=PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}} to single recipient elx74379whc-27681 with response mode GET_NONE

      11:51:49,833 TRACE [org.infinispan.statetransfer.StateConsumerImpl] (remote-thread--p9-t2) After applying the received state the data container of cache TrafficNodeStore has 3 keys

      11:51:49,834 TRACE [org.infinispan.statetransfer.StateConsumerImpl] (remote-thread--p9-t2) Segments not received yet for cache TrafficNodeStore: {}

      11:51:49,834 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread--p9-t2) About to send back response null for command StateResponseCommand{cache=TrafficNodeStore, origin=elx74379whc-27681, topologyId=17}

      11:51:49,834 TRACE [org.infinispan.marshall.core.VersionAwareMarshaller] (remote-thread--p9-t2) Wrote version 510

      11:51:49,834 TRACE [org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller] (remote-thread--p9-t2) Stop marshaller

      11:51:49,811 FINE  [com.ericsson.container.sip.distributor.clusters.sds.nodestore.NodeStore] (MSC service thread 1-6) getting from the store - id: MemberIdentity [internalId=elx74379whc-35754]

      11:51:49,833 TRACE [org.infinispan.commons.marshall.MarshallableTypeHints] (transport-thread--p8-t13) Cache a buffer size predictor for 'org.infinispan.commands.remote.SingleRpcCommand' assuming its serializability is unknown

      11:51:49,835 TRACE [org.infinispan.commons.marshall.AdaptiveBufferSizePredictor] (transport-thread--p8-t13) Next predicted buffer size for object type 'org.infinispan.commands.remote.SingleRpcCommand' will be 512

      11:51:49,835 TRACE [org.infinispan.marshall.core.VersionAwareMarshaller] (transport-thread--p8-t13) Wrote version 510

      11:51:49,836 TRACE [org.infinispan.commons.marshall.MarshallableTypeHints] (transport-thread--p8-t13) Cache a buffer size predictor for 'org.infinispan.commands.remote.SingleRpcCommand' assuming its serializability is unknown

      11:51:49,836 TRACE [org.infinispan.commons.marshall.AdaptiveBufferSizePredictor] (transport-thread--p8-t13) Next predicted buffer size for object type 'org.infinispan.commands.remote.SingleRpcCommand' will be 512

      11:51:49,836 TRACE [org.infinispan.marshall.core.VersionAwareMarshaller] (transport-thread--p8-t13) Wrote version 510

      11:51:49,836 TRACE [org.infinispan.interceptors.InvocationContextInterceptor] (MSC service thread 1-6) Invoked with command GetKeyValueCommand {key=MemberIdentity [internalId=elx74379whc-35754], flags=null} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@7232ba78]

      11:51:49,837 TRACE [org.infinispan.statetransfer.StateTransferInterceptor] (MSC service thread 1-6) handleTopologyAffectedCommand for command GetKeyValueCommand {key=MemberIdentity [internalId=elx74379whc-35754], flags=null}, origin null

      11:51:49,837 TRACE [org.infinispan.container.EntryFactoryImpl] (MSC service thread 1-6) Exists in context? null

      11:51:49,837 TRACE [org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller] (transport-thread--p8-t13) Stop marshaller

      11:51:49,837 TRACE [org.infinispan.container.EntryFactoryImpl] (MSC service thread 1-6) Retrieved from container null (forceFetch=false, isLocal=true)

      11:51:49,838 TRACE [org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller] (transport-thread--p8-t13) Stop marshaller

       

      Second node

      On the other end, we can see the RPC call being executed successfully:

       

      11:51:49,843 TRACE [org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler] (Incoming-1,elx74379whc-27681) Attempting to execute CacheRpcCommand: SingleRpcCommand{cacheName='TrafficNodeStore', command=PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}} [sender=elx74379whc-35754]

      11:51:49,843 TRACE [org.infinispan.remoting.inboundhandler.NonTotalOrderPerCacheInboundInvocationHandler] (Incoming-1,elx74379whc-27681) Calling perform() on SingleRpcCommand{cacheName='TrafficNodeStore', command=PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}}

      11:51:49,844 TRACE [org.infinispan.commands.remote.BaseRpcInvokingCommand] (Incoming-1,elx74379whc-27681) Invoking command PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}, with originLocal flag set to false

      11:51:49,844 TRACE [org.infinispan.interceptors.InvocationContextInterceptor] (Incoming-1,elx74379whc-27681) Invoked with command PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@62e9c22c]

      11:51:49,844 TRACE [org.infinispan.statetransfer.StateTransferInterceptor] (Incoming-1,elx74379whc-27681) handleNonTxWriteCommand for command PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}, topology id 17

      11:51:49,844 TRACE [org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor] (Incoming-1,elx74379whc-27681) Are (elx74379whc-27681) we the lock owners for key 'MemberIdentity [internalId=elx74379whc-35754]'? false

      11:51:49,844 TRACE [org.infinispan.interceptors.EntryWrappingInterceptor] (Incoming-1,elx74379whc-27681) Wrapping entry 'MemberIdentity [internalId=elx74379whc-35754]'? true

      11:51:49,844 TRACE [org.infinispan.container.EntryFactoryImpl] (Incoming-1,elx74379whc-27681) Exists in context? null

      11:51:49,844 TRACE [org.infinispan.container.EntryFactoryImpl] (Incoming-1,elx74379whc-27681) Retrieved from container null (forceFetch=false, isLocal=true)

      11:51:49,844 TRACE [org.infinispan.container.EntryFactoryImpl] (Incoming-1,elx74379whc-27681) Creating new entry.

      11:51:49,844 TRACE [org.infinispan.container.EntryFactoryImpl] (Incoming-1,elx74379whc-27681) Wrap MemberIdentity [internalId=elx74379whc-35754] for put. Entry=ReadCommittedEntry(3db18ef){key=MemberIdentity [internalId=elx74379whc-35754], value=null, oldValue=null, isCreated=true, isChanged=false, isRemoved=false, isValid=true, skipRemoteGet=false, metadata=EmbeddedMetadata{version=null}}

      11:51:49,844 TRACE [org.infinispan.interceptors.CallInterceptor] (Incoming-1,elx74379whc-27681) Executing command: PutKeyValueCommand{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}.

      11:51:49,844 TRACE [org.infinispan.interceptors.distribution.BaseDistributionInterceptor] (Incoming-1,elx74379whc-27681) Command topology id is 17, current topology id is 18, successful? true

      11:51:49,844 TRACE [org.infinispan.interceptors.EntryWrappingInterceptor] (Incoming-1,elx74379whc-27681) About to commit entry ReadCommittedEntry(3db18ef){key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45], oldValue=null, isCreated=true, isChanged=true, isRemoved=false, isValid=true, skipRemoteGet=false, metadata=EmbeddedMetadata{version=null}}

      11:51:49,844 TRACE [org.infinispan.statetransfer.CommitManager] (Incoming-1,elx74379whc-27681) Trying to commit. Key=MemberIdentity [internalId=elx74379whc-35754]. Operation Flag=null, L1 invalidation=false

      11:51:49,844 TRACE [org.infinispan.statetransfer.CommitManager] (Incoming-1,elx74379whc-27681) Committing key=MemberIdentity [internalId=elx74379whc-35754]. It is a L1 invalidation or a normal put and no tracking is enabled!

      11:51:49,844 TRACE [org.infinispan.container.entries.ReadCommittedEntry] (Incoming-1,elx74379whc-27681) Updating entry (key=MemberIdentity [internalId=elx74379whc-35754] removed=false valid=true changed=true created=true loaded=false value=[B0x343200687474703a..[45] metadata=EmbeddedMetadata{version=null}, providedMetadata=EmbeddedMetadata{version=null})

      11:51:49,844 TRACE [org.infinispan.container.DefaultDataContainer] (Incoming-1,elx74379whc-27681) Creating new ICE for writing. Existing=null, metadata=EmbeddedMetadata{version=null}, new value=[B@5227a9e2

      11:51:49,844 TRACE [org.infinispan.container.DefaultDataContainer] (Incoming-1,elx74379whc-27681) Store ImmortalCacheEntry{key=MemberIdentity [internalId=elx74379whc-35754], value=[B0x343200687474703a..[45]} in container

      11:51:49,849 TRACE [org.infinispan.interceptors.EntryWrappingInterceptor] (Incoming-1,elx74379whc-27681) The return value is null


      First node

      It seems like the ownership has been transferred concurrently with the call, but the replication is lost because, later on, we try to access that very same key on the first node, and we get this:


      11:52:16,665 TRACE [org.infinispan.interceptors.InvocationContextInterceptor] (sip-message-processor-1) Invoked with command GetKeyValueCommand {key=MemberIdentity [internalId=elx74379whc-35754], flags=null} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@3c5d3500]

      11:52:16,665 TRACE [org.infinispan.statetransfer.StateTransferInterceptor] (sip-message-processor-1) handleTopologyAffectedCommand for command GetKeyValueCommand {key=MemberIdentity [internalId=elx74379whc-35754], flags=null}, origin null

      11:52:16,666 TRACE [org.infinispan.container.EntryFactoryImpl] (sip-message-processor-1) Exists in context? null

      11:52:16,666 TRACE [org.infinispan.container.EntryFactoryImpl] (sip-message-processor-1) Retrieved from container null (forceFetch=false, isLocal=true)

      11:52:16,666 TRACE [org.infinispan.container.EntryFactoryImpl] (sip-message-processor-1) Wrap MemberIdentity [internalId=elx74379whc-35754] for read. Entry=null

      11:52:16,666 TRACE [org.infinispan.interceptors.CallInterceptor] (sip-message-processor-1) Executing command: GetKeyValueCommand {key=MemberIdentity [internalId=elx74379whc-35754], flags=null}.

      11:52:16,666 TRACE [org.infinispan.commands.read.GetKeyValueCommand] (sip-message-processor-1) Entry not found

      11:52:16,666 TRACE [org.infinispan.interceptors.distribution.BaseDistributionInterceptor] (sip-message-processor-1) Not doing a remote get for key MemberIdentity [internalId=elx74379whc-35754] since entry is mapped to current node (elx74379whc-35754) or is in L1. Owners are [elx74379whc-35754, elx74379whc-27681]

       

      Any thoughts?

      Thanks,

      Phil