Hi William
Thanks for help. Would be nice if you could help us to resolve where problem is.
Yes the problem may be as you have described. We start collection over cache and this hangs. Unpleasantly it not finishes. How can we detect such state to not got hang in that thread? Can we use some timeout for operation? Or is chance that Infinispan itself recovers from that state?
Regarding configuration - we used attached one. I think we do not use some special things. Can you briefly check if there is something suspicion?
How could we secure that we do not have topology update issues? To which configuration it could be connected?
I have additional information and also logs.
I've observed that this behavior can be connected to change of time? We have NTP configured on our nodes and when NTP triggers time synchronization such a things starts happening. I also created another post for that Change of date/time on nodes. How Infinispan/JGroups is affected? but there I have no answers yet.
Or is this change of time a dead end and it has no connection to it?
Anyway here are additional logs from Infinispan:
2014-03-24 17:05:21,207 WARN [remote-thread-0065f36a-2860-4107-810a-d087224c9105-p6-t25] (BasePerCacheInboundInvocationHandler.java:110) - ISPN000071: Caught exception when handling command StreamRequestCommand{cacheName='cache'}
org.infinispan.remoting.transport.jgroups.SuspectException: One or more nodes have left the cluster while replicating command StreamResponseCommand{cacheName='cache'}
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotelyAsync(JGroupsTransport.java:567) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotelyAsync(RpcManagerImpl.java:162) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:203) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.stream.impl.LocalStreamManagerImpl.streamOperationRehashAware(LocalStreamManagerImpl.java:271) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.stream.impl.StreamRequestCommand.perform(StreamRequestCommand.java:93) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokePerform(BasePerCacheInboundInvocationHandler.java:92) [infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:34) [infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_71]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_71]
2014-03-24 17:05:21,503 WARN [remote-thread-0065f36a-2860-4107-810a-d087224c9105-p6-t25] (BasePerCacheInboundInvocationHandler.java:110) - ISPN000071: Caught exception when handling command StreamRequestCommand{cacheName='cache'}
org.infinispan.remoting.transport.jgroups.SuspectException: One or more nodes have left the cluster while replicating command StreamResponseCommand{cacheName='cache'}
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotelyAsync(JGroupsTransport.java:567) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotelyAsync(RpcManagerImpl.java:162) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:203) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.stream.impl.LocalStreamManagerImpl.streamOperationRehashAware(LocalStreamManagerImpl.java:271) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.stream.impl.StreamRequestCommand.perform(StreamRequestCommand.java:93) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokePerform(BasePerCacheInboundInvocationHandler.java:92) [infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:34) [infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_71]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_71]
Thas starts rolling for a long time and seems not to stop 9over 30 minutes). In fact in that time actually should no node left cluster. Can here change of time play some role?
Another log:
2014-03-28 16:11:46,442 DEBUG [transport-thread-0066f36a-2860-4107-810a-d087224c9105-p8-t20] (LocalTopologyManagerImpl.java:324) - Updating local topology for cache cacheName: CacheTopology{id=6, rebalanceId=3, currentCH=DefaultConsistentHash{ns=256, owners = (4)[1005f36a-2860-4107-810a-d087224c9105-63959: 63+193, 0065f36a-2860-4107-810a-d087224c9105-59337: 61+195, 0066f36a-2860-4107-810a-d087224c9105-31598: 67+189, 0005f36a-2860-4107-810a-d087224c9105-63473: 65+191]}, pendingCH=null, unionCH=null, actualMembers=[1005f36a-2860-4107-810a-d087224c9105-63959, 0065f36a-2860-4107-810a-d087224c9105-59337, 0066f36a-2860-4107-810a-d087224c9105-31598, 0005f36a-2860-4107-810a-d087224c9105-63473]}
2014-03-28 16:11:46,476 DEBUG [transport-thread-0066f36a-2860-4107-810a-d087224c9105-p8-t20] (StateConsumerImpl.java:921) - Removing no longer owned entries for cache cacheName
2014-03-24 15:11:14,630 DEBUG [qtp28535554-180] (ServiceFinder.java:52) - No service impls found: FilterIndexingServiceProvider
2014-03-24 15:11:14,952 DEBUG [qtp28535554-180] (DelegatingBasicLogger.java:384) - Interceptor chain size: 11
2014-03-24 15:11:15,006 DEBUG [qtp28535554-180] (InterceptorChain.java:55) - Interceptor chain is:
>> org.infinispan.interceptors.distribution.DistributionBulkInterceptor
>> org.infinispan.interceptors.InvocationContextInterceptor
>> org.infinispan.interceptors.CacheMgmtInterceptor
>> org.infinispan.statetransfer.StateTransferInterceptor
>> org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor
>> org.infinispan.interceptors.distribution.L1LastChanceInterceptor
>> org.infinispan.interceptors.EntryWrappingInterceptor
>> org.infinispan.interceptors.DeadlockDetectingInterceptor
>> org.infinispan.interceptors.distribution.L1NonTxInterceptor
>> org.infinispan.interceptors.distribution.NonTxDistributionInterceptor
>> org.infinispan.interceptors.CallInterceptor
2014-03-24 15:11:15,294 DEBUG [qtp28535554-180] (JmxUtil.java:75) - Object name org.infinispan:type=Cache,name="defaultCache(dist_async)",manager="caches",component=Cache already registered
2014-03-24 15:11:15,363 DEBUG [qtp28535554-180] (LocalTopologyManagerImpl.java:116) - Node 0066f36a-2860-4107-810a-d087224c9105-31598 joining cache defaultCache
2014-03-24 15:32:15,208 WARN [RMI TCP Connection(17)-192.168.0.42] (ClusterCacheStatsImpl.java:532) - Could not execute cluster wide cache stats operation
java.lang.NullPointerException
at org.infinispan.remoting.rpc.RpcManagerImpl.getMembers(RpcManagerImpl.java:422) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.distexec.DefaultExecutorService.getMembers(DefaultExecutorService.java:222) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.distexec.DefaultExecutorService.executionCandidates(DefaultExecutorService.java:229) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.distexec.DefaultExecutorService.submitEverywhere(DefaultExecutorService.java:469) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.distexec.DefaultExecutorService.submitEverywhere(DefaultExecutorService.java:462) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.stats.impl.ClusterCacheStatsImpl.fetchClusterWideStatsIfNeeded(ClusterCacheStatsImpl.java:529) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.stats.impl.ClusterCacheStatsImpl.getActivations(ClusterCacheStatsImpl.java:457) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_71]
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:1.8.0_71]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:1.8.0_71]
at java.lang.reflect.Method.invoke(Unknown Source) ~[?:1.8.0_71]
at org.infinispan.jmx.ResourceDMBean$InvokableSetterBasedMBeanAttributeInfo.invoke(ResourceDMBean.java:394) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.jmx.ResourceDMBean.getNamedAttribute(ResourceDMBean.java:298) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.jmx.ResourceDMBean.getAttributes(ResourceDMBean.java:197) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttributes(Unknown Source) ~[?:1.8.0_71]
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttributes(Unknown Source) ~[?:1.8.0_71]
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source) ~[?:1.8.0_71]
at javax.management.remote.rmi.RMIConnectionImpl.access$300(Unknown Source) ~[?:1.8.0_71]
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source) ~[?:1.8.0_71]
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source) ~[?:1.8.0_71]
at javax.management.remote.rmi.RMIConnectionImpl.getAttributes(Unknown Source) ~[?:1.8.0_71]
at sun.reflect.GeneratedMethodAccessor467.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:1.8.0_71]
at java.lang.reflect.Method.invoke(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.transport.Transport$1.run(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.transport.Transport$1.run(Unknown Source) ~[?:1.8.0_71]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_71]
at sun.rmi.transport.Transport.serviceCall(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$95(Unknown Source) ~[?:1.8.0_71]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_71]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source) [?:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_71]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_71]
2014-03-24 15:33:11,284 WARN [RMI TCP Connection(17)-192.168.0.42] (ClusterCacheStatsImpl.java:532) - Could not execute cluster wide cache stats operation
java.lang.NullPointerException
at org.infinispan.remoting.rpc.RpcManagerImpl.getMembers(RpcManagerImpl.java:422) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.distexec.DefaultExecutorService.getMembers(DefaultExecutorService.java:222) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.distexec.DefaultExecutorService.executionCandidates(DefaultExecutorService.java:229) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.distexec.DefaultExecutorService.submitEverywhere(DefaultExecutorService.java:469) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.distexec.DefaultExecutorService.submitEverywhere(DefaultExecutorService.java:462) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.stats.impl.ClusterCacheStatsImpl.fetchClusterWideStatsIfNeeded(ClusterCacheStatsImpl.java:529) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.stats.impl.ClusterCacheStatsImpl.getActivations(ClusterCacheStatsImpl.java:457) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_71]
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:1.8.0_71]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:1.8.0_71]
at java.lang.reflect.Method.invoke(Unknown Source) ~[?:1.8.0_71]
at org.infinispan.jmx.ResourceDMBean$InvokableSetterBasedMBeanAttributeInfo.invoke(ResourceDMBean.java:394) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.jmx.ResourceDMBean.getNamedAttribute(ResourceDMBean.java:298) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.jmx.ResourceDMBean.getAttributes(ResourceDMBean.java:197) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttributes(Unknown Source) ~[?:1.8.0_71]
at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttributes(Unknown Source) ~[?:1.8.0_71]
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source) ~[?:1.8.0_71]
at javax.management.remote.rmi.RMIConnectionImpl.access$300(Unknown Source) ~[?:1.8.0_71]
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source) ~[?:1.8.0_71]
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source) ~[?:1.8.0_71]
at javax.management.remote.rmi.RMIConnectionImpl.getAttributes(Unknown Source) ~[?:1.8.0_71]
at sun.reflect.GeneratedMethodAccessor467.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:1.8.0_71]
at java.lang.reflect.Method.invoke(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.transport.Transport$1.run(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.transport.Transport$1.run(Unknown Source) ~[?:1.8.0_71]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_71]
at sun.rmi.transport.Transport.serviceCall(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source) ~[?:1.8.0_71]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$95(Unknown Source) ~[?:1.8.0_71]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_71]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source) [?:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_71]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_71]
2014-03-24 17:58:31,369 ERROR [Camel (camel-1) thread #2 - seda://provider] (InvocationContextInterceptor.java:136) - ISPN000136: Error executing command PutKeyValueCommand, writing keys [eafc6606-5d27-45d8-bca7-fb8c16e01188]
org.infinispan.util.concurrent.TimeoutException: Replication timeout for 0065f36a-2860-4107-810a-d087224c9105-59337
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:801) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$83(JGroupsTransport.java:633) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at java.util.concurrent.CompletableFuture.uniApply(Unknown Source) ~[?:1.8.0_71]
at java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source) ~[?:1.8.0_71]
at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:1.8.0_71]
at java.util.concurrent.CompletableFuture.complete(Unknown Source) ~[?:1.8.0_71]
at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:46) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:17) ~[infinispan-embedded-8.2.0.Final.jar:8.2.0.Final]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.8.0_71]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) ~[?:1.8.0_71]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_71]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_71]
As here can be seen the exception is thrown close where date was changed on node - from 28th to 24th. And keeps rolling from that time.
Do you have idea how we can avoid those situations?
Thanks
Kind regards
Tomas