Why does incoming message processing stop?
richmayfield Aug 21, 2015 3:31 PM
Infinispan 7.0.3-Final
This one has me stumped. We have two nodes in our Infinispan cluster. All caches are invalidation caches; we do not distribute the content. Occasionally one node stops processing incoming messages from the other. No errors are logged and no exceptions are thrown.
The node that is no longer processing messages always ends up waiting in StateTransferLockImpl.waitForTransactionData().
"Incoming-2,lfvsfcp19626-23230" #289 prio=5 os_prio=0 tid=0x00002b3761e4c000 nid=0x8941 waiting on condition [0x00002b377c704000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000004d887bc00> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:90)
    at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:206)
    at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:86)
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:267)
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:211)
    at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:460)
    at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:377)
    at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:250)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:677)
    at org.jgroups.JChannel.up(JChannel.java:755)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1033)
    at org.jgroups.protocols.RSVP.up(RSVP.java:237)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:182)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:447)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:447)
    at org.jgroups.stack.Protocol.up(Protocol.java:420)
    at org.jgroups.stack.Protocol.up(Protocol.java:420)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
    at org.jgroups.protocols.UNICAST3.deliverBatch(UNICAST3.java:1087)
    at org.jgroups.protocols.UNICAST3.removeAndDeliver(UNICAST3.java:886)
    at org.jgroups.protocols.UNICAST3.handleBatchReceived(UNICAST3.java:867)
    at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:517)
    at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:710)
    at org.jgroups.stack.Protocol.up(Protocol.java:420)
    at org.jgroups.protocols.FD.up(FD.java:274)
    at org.jgroups.stack.Protocol.up(Protocol.java:420)
    at org.jgroups.protocols.MERGE2.up(MERGE2.java:252)
    at org.jgroups.stack.Protocol.up(Protocol.java:420)
    at org.jgroups.protocols.TP.passBatchUp(TP.java:1605)
    at org.jgroups.protocols.TP$BatchHandler.run(TP.java:1855)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
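In case it helps anyone trying to catch this state as it happens rather than after the fact, here is a minimal sketch of how the check could be automated from inside the same JVM. It uses only the standard JDK `ThreadMXBean`; nothing in it is Infinispan API. The class name `StuckThreadFinder` and the method `threadsIn` are hypothetical names made up for this illustration, and the Infinispan class and method names are simply the ones taken from the dump above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Infinispan or JGroups): lists the names of
// live threads whose current stack contains a given class and method.
public class StuckThreadFinder {

    public static List<String> threadsIn(String className, String methodName) {
        List<String> names = new ArrayList<>();
        // Snapshot all live threads with full stack traces; lock details are not needed here.
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip entries for threads that are no longer alive
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    names.add(info.getThreadName());
                    break; // one matching frame is enough for this thread
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Class and method names copied from the thread dump above.
        List<String> parked = threadsIn(
                "org.infinispan.statetransfer.StateTransferLockImpl",
                "waitForTransactionData");
        for (String name : parked) {
            System.out.println("Waiting in waitForTransactionData: " + name);
        }
    }
}
```

This only sees threads in the JVM it runs in, so `threadsIn` would have to be called from within the application (for example from a periodic task), not from a separate process. A single hit is not necessarily a problem, since the wait is timed; the same thread showing up across repeated checks is what would match the behaviour described here.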
I have not configured these caches to be transactional. I cannot correlate these failures with any other activity. This is relatively infrequent - sometimes just once a day. Nonetheless, the only recovery appears to be restarting our application.
Is this a known issue? Any ideas?
Thanks so much