-
1. Re: Troubleshooting JGroups/Infinispan timeouts?
rhusar Feb 10, 2016 12:06 PM (in response to thiago.presa)
Can you paste a stack trace?
-
2. Re: Troubleshooting JGroups/Infinispan timeouts?
thiago.presa Feb 10, 2016 12:43 PM (in response to rhusar)
I meant it as a general question, but here's a particular case:
15:00:10,107 WARN [org.infinispan.statetransfer.StateConsumerImpl] (ServerService Thread Pool -- 68) ISPN000286: Issue when retrieving cluster listeners from <slave-node>:<app-name>: org.infinispan.util.concurrent.TimeoutException: Replication timeout for <slave-node>:<app-name>
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:755)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$80(JGroupsTransport.java:589)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:46)
at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:17)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This is the only stack trace that seems related to the timeout issue. It appears before the application server (AS) boot timeout:
15:00:54,649 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[("interface" => "unsecure")]'
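To confirm the ordering of the two lines above, you can compute the gap between them from their timestamps. The following is a minimal Java sketch. It assumes the "HH:mm:ss,SSS LEVEL [category]" prefix used in these logs and assumes both lines come from the same day. The class and method names are hypothetical and are not part of WildFly or Infinispan:

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogGap {
    // Matches the "HH:mm:ss,SSS LEVEL [category]" prefix of the log lines quoted in this thread.
    private static final Pattern PREFIX =
            Pattern.compile("^(\\d{2}:\\d{2}:\\d{2},\\d{3})\\s+(\\w+)\\s+\\[([^\\]]+)\\]");
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Extracts the time of day from a log line, or fails if the prefix is not recognized.
    static LocalTime timestamp(String line) {
        Matcher m = PREFIX.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized log line: " + line);
        }
        return LocalTime.parse(m.group(1), TIME);
    }

    // Gap between two log lines from the same day (no midnight roll-over handling).
    static Duration gap(String earlier, String later) {
        return Duration.between(timestamp(earlier), timestamp(later));
    }

    public static void main(String[] args) {
        // Abbreviated copies of the two log lines quoted above.
        String ispnWarn = "15:00:10,107 WARN [org.infinispan.statetransfer.StateConsumerImpl] ISPN000286 ...";
        String bootTimeout = "15:00:54,649 ERROR [org.jboss.as.controller.management-operation] WFLYCTL0348 ...";
        System.out.println(gap(ispnWarn, bootTimeout).toMillis() + " ms"); // 44542 ms
    }
}
```

For these two lines, the ISPN000286 warning precedes the WFLYCTL0348 boot timeout by about 44.5 seconds.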
After the timeout, there are many stack traces, probably related to the server shutdown. For instance:
15:01:19,909 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0190: Step handler org.jboss.as.controller.AbstractAddStepHandler$1@7e599053 for operation {"operation" => "add","address" => [("socket-binding-group" => "ha-sockets"),("remote-destination-outbound-socket-binding" => "mc-prox1p")],"host" => "<ip-address>","port" => 6666,"source-interface" => undefined,"source-port" => undefined,"fixed-source-port" => undefined} at address [
("socket-binding-group" => "ha-sockets"),
("remote-destination-outbound-socket-binding" => "mc-prox1p")
] failed handling operation rollback -- java.util.concurrent.TimeoutException: java.util.concurrent.TimeoutException
at org.jboss.as.controller.OperationContextImpl.waitForRemovals(OperationContextImpl.java:506)
at org.jboss.as.controller.AbstractOperationContext$Step.handleResult(AbstractOperationContext.java:1369)
at org.jboss.as.controller.AbstractOperationContext$Step.finalizeInternal(AbstractOperationContext.java:1328)
at org.jboss.as.controller.AbstractOperationContext$Step.finalizeStep(AbstractOperationContext.java:1311)
at org.jboss.as.controller.AbstractOperationContext$Step.access$300(AbstractOperationContext.java:1185)
at org.jboss.as.controller.AbstractOperationContext.executeResultHandlerPhase(AbstractOperationContext.java:767)
at org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:644)
at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:370)
at org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1336)
at org.jboss.as.controller.ModelControllerImpl.boot(ModelControllerImpl.java:485)
at org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:387)
at org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:349)
at org.jboss.as.server.ServerService.boot(ServerService.java:392)
at org.jboss.as.server.ServerService.boot(ServerService.java:365)
at org.jboss.as.controller.AbstractControllerService$1.run(AbstractControllerService.java:299)
at java.lang.Thread.run(Thread.java:745)
It seems to me that if I upgrade to WF10 Final, I'll get the fix for this bug [1], which will probably give me more information about why the timeout happened.
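The first stack trace hints at how the Infinispan timeout surfaces. A task run by a ScheduledThreadPoolExecutor (SingleResponseFuture.call) completes a CompletableFuture. The dependent stage registered in JGroupsTransport.invokeRemotelyAsync then calls checkRsp, which throws the TimeoutException.

The sketch below illustrates that general pattern in plain Java. It is simplified and standalone, it is not Infinispan's actual code, and the names Response, check and invokeRemotely are hypothetical. It shows why the exception is reported from the timer thread rather than from the thread that sent the request:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ScheduledTimeoutSketch {

    // Hypothetical response holder: either a reply value or a "no reply received" marker.
    static final class Response {
        final String value;
        final boolean received;

        Response(String value, boolean received) {
            this.value = value;
            this.received = received;
        }
    }

    // Plays the role of a checkRsp-style validation: a missing reply becomes a TimeoutException.
    static String check(Response rsp, String target) {
        if (!rsp.received) {
            throw new CompletionException(new TimeoutException("Replication timeout for " + target));
        }
        return rsp.value;
    }

    static CompletableFuture<String> invokeRemotely(ScheduledExecutorService timer,
                                                    String target, long timeoutMillis) {
        CompletableFuture<Response> pending = new CompletableFuture<>();
        // Attach the validation stage first, so it runs on whichever thread completes 'pending'.
        CompletableFuture<String> result = pending.thenApply(rsp -> check(rsp, target));
        // Timeout task: marks the request as unanswered. complete() is a no-op if a real
        // reply already completed 'pending'.
        timer.schedule(() -> pending.complete(new Response(null, false)),
                timeoutMillis, TimeUnit.MILLISECONDS);
        return result;
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            // No reply is ever delivered in this sketch, so the timeout task wins.
            CompletableFuture<String> f = invokeRemotely(timer, "node-b", 100);
            try {
                f.join();
            } catch (CompletionException e) {
                System.out.println("Failed: " + e.getCause());
            }
        } finally {
            timer.shutdownNow();
        }
    }
}
```

In this pattern, the exception only means that no reply arrived before the deadline. The useful question is therefore why the target node did not answer in time, not which local thread logged the failure.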
-
3. Re: Troubleshooting JGroups/Infinispan timeouts?
rhusar Feb 11, 2016 2:43 PM (in response to thiago.presa)
A significant number of issues have been resolved, and components upgraded with fixes, in WF 10, with more in the next version (see master and the PR queue on GitHub). Please test on the latest version (at least the latest released one), because it's unmanageable to investigate bugs that were likely already resolved.
-
4. Re: Troubleshooting JGroups/Infinispan timeouts?
thiago.presa Feb 11, 2016 3:19 PM (in response to rhusar)
Yes, I'm already working on upgrading our WildFly clusters. Thanks!