Hello,
Thank you all for your tips, but I don't think this is related to GC or memory. I did a fresh installation of Infinispan Server 9.1.4 on the Red Hat server and started it with ./standalone.sh -c clustered.xml, with no modifications. I configured my application and the connector to put my values into the default repl cache. The connector uses the Hot Rod protocol.
In server.log I see several different error messages, and I am not sure which of them is the root cause:
Sometimes, when we restart the second node, I see the following on the first node:
2018-04-09 14:48:20,108 ERROR [org.infinispan.CLUSTER] (transport-thread--p4-t17) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 10 from wwhelapp0120
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:620)
at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:484)
at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:359)
at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:338)
at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:83)
at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:765)
at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 10 from wwhelapp0120
at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:163)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:86)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:21)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
... 3 more
2018-04-09 17:09:42,952 FATAL [org.infinispan.CLUSTER] (transport-thread--p4-t17) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [wwhelapp0119, wwhelapp0120].
2018-04-09 12:52:13,903 WARN [org.infinispan.server.hotrod.Decoder2x] (HotRod-ServerWorker-5-4) ISPN006011: Operation 'REMOVE' forced to return previous value should be used on transactional caches, otherwise data inconsistency issues could arise under failure situations
2018-04-09 12:52:28,913 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p3-t1) ISPN000136: Error executing command RemoveCommand, writing keys [WrappedByteArray{bytes=[B0x033E104445534B54..[19], hashCode=-1420082805}]: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 18 from wwhelapp0120
at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:163)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:86)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:21)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-04-09 13:15:06,446 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (jgroups-16,wwhelapp0119) ISPN000136: Error executing command RemoveCommand, writing keys [WrappedByteArray{bytes=[B0x033E104445534B54..[19], hashCode=-1420082805}]: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from wwhelapp0120, see cause for remote stack trace
at org.infinispan.remoting.transport.ResponseCollectors.wrapRemoteException(ResponseCollectors.java:27)
at org.infinispan.remoting.transport.ValidSingleResponseCollector.withException(ValidSingleResponseCollector.java:41)
at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:25)
at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:51)
at org.infinispan.remoting.transport.impl.SingleTargetRequest.onResponse(SingleTargetRequest.java:35)
at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:53)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1328)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1238)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:121)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.receive(JGroupsTransport.java:1366)
at org.jgroups.JChannel.up(JChannel.java:819)
at org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:134)
at org.jgroups.stack.Protocol.up(Protocol.java:340)
at org.jgroups.protocols.FORK.up(FORK.java:134)
at org.jgroups.protocols.FRAG3.up(FRAG3.java:171)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:343)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:343)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:864)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240)
at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1002)
at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:728)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:383)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:600)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:119)
at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:199)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:252)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:276)
at org.jgroups.protocols.Discovery.up(Discovery.java:267)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1229)
at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key WrappedByteArray{bytes=[B0x033E104445534B54..[19], hashCode=-1420082805} and requestor CommandInvocation:wwhelapp0119:764. Lock is held by CommandInvocation:wwhelapp0119:763
at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.lock(DefaultLockManager.java:253)
at org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockAndRecord(AbstractLockingInterceptor.java:269)
at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitNonTxDataWriteCommand(AbstractLockingInterceptor.java:130)
at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitDataWriteCommand(NonTransactionalLockingInterceptor.java:38)
at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitRemoveCommand(AbstractLockingInterceptor.java:105)
at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:63)
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:58)
at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:306)
at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:252)
at org.infinispan.statetransfer.StateTransferInterceptor.visitRemoveCommand(StateTransferInterceptor.java:108)
at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:63)
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:58)
at org.infinispan.interceptors.impl.CacheMgmtInterceptor.visitRemoveCommand(CacheMgmtInterceptor.java:214)
at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:63)
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:127)
at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:96)
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:60)
at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54)
at org.infinispan.interceptors.DDAsyncInterceptor.visitRemoveCommand(DDAsyncInterceptor.java:65)
at org.infinispan.commands.write.RemoveCommand.acceptVisitor(RemoveCommand.java:63)
at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50)
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234)
at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommandAsync(BaseRpcInvokingCommand.java:63)
at org.infinispan.commands.remote.SingleRpcCommand.invokeAsync(SingleRpcCommand.java:57)
at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:102)
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99)
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71)
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40)
... 3 more
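
The excerpts above are not in chronological order (14:48, 17:09, 12:52, 13:15), which makes it harder to see which error came first. Below is a minimal Python sketch that pulls the timestamp, level, ISPN code and thread name out of each log entry header and sorts the entries by time. The regular expression is an assumption based only on the line layout visible in the excerpts above, and the function name `summarize` is made up for illustration; stack-trace lines and "Caused by:" lines are simply skipped.

```python
import re
from datetime import datetime

# Header of a server.log entry as it appears in the excerpts above, e.g.
# 2018-04-09 14:48:20,108 ERROR [org.infinispan.CLUSTER] (thread-name) ISPN000196: ...
# (Assumed layout; adjust if your log pattern differs.)
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "  # timestamp
    r"(\w+)\s+"                                        # level
    r"\[([^\]]+)\] "                                   # logger category
    r"\(([^)]+)\) "                                    # thread name
    r"(ISPN\d{6}):"                                    # message code
)


def summarize(lines):
    """Return (timestamp, level, code, thread) tuples sorted by timestamp."""
    entries = []
    for line in lines:
        m = ENTRY.match(line)
        if m is None:
            continue  # stack-trace frame, "Caused by:" line, or other text
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        entries.append((ts, m.group(2), m.group(5), m.group(4)))
    return sorted(entries)


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py server.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for ts, level, code, thread in summarize(f):
            print(ts.isoformat(sep=" ", timespec="milliseconds"), level, code, thread)
```

Run against the full server.log of each node, this gives a short timeline of ISPN codes, so it is easier to check whether the lock timeout (ISPN000299) or the response timeouts (ISPN000476) show up first.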