3 Replies Latest reply on Oct 24, 2017 4:58 PM by Karl von Randow

    CLI ls command is very slow to complete

    Karl von Randow Newbie

      We've been having a lot of trouble with our Infinispan 9.1.1 Server cluster:

      * High CPU usage.

      * Timeouts on global lock acquisition

       

      I'm wondering if it could be due to our monitoring, which connects to each of our four Infinispan servers every minute and runs this command:

       

      /opt/infinispan/bin/ispn-cli.sh --connect "container clustered,ls"

       

      Often this command hangs for about 5 minutes. The stats command is also slow, though not as slow; more like 30 seconds:

       

      /opt/infinispan/bin/ispn-cli.sh --connect "container clustered,cache 'NAME',stats"

       

      Are these issues expected, or do they suggest something is wrong with my cluster? What would be a useful approach to debugging them?

        • 1. Re: CLI ls command is very slow to complete
          Sebastian Łaskawiec Apprentice

          Long CLI command execution times are often caused by high load on the system. Could you please confirm this is not the case?

           

          Probably one of the easiest debugging tricks is to use Flight Recorder and record a `.jfr` file. You could attach it to this thread so that we can analyze it.
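          For example, something like the following (a sketch, assuming the server runs on an Oracle JDK 8 JVM where JFR must be unlocked first; on JDK 11+ the unlock step is unnecessary, and the pid, recording name, and file path are all placeholders):

```shell
# Placeholder pid for illustration; point this at the Infinispan server JVM
# (e.g. via jps or pgrep).
PID="${INFINISPAN_PID:-12345}"

# Commands to run against the live server JVM (printed here rather than
# executed, since they require a running process):
cat <<EOF
jcmd $PID VM.unlock_commercial_features
jcmd $PID JFR.start name=slow-ls duration=300s filename=/tmp/slow-ls.jfr
jcmd $PID JFR.check
EOF
```

          Starting the recording just before running the slow `ls` command, with a duration long enough to cover the hang, should capture what the server threads are doing.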

          • 2. Re: CLI ls command is very slow to complete
            Karl von Randow Newbie

            Thanks Sebastian,

             

            The load on the system does not appear to be high. The servers each have 12+ cores and the load average is between 2 and 6. While the command is running, the Infinispan process does use a lot of CPU (still not enough to push the load average up). I'm not sure if that is useful?
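            For reference, here is how I'm checking which threads are burning that CPU while the command runs (a rough sketch; the pid and thread id below are placeholders to illustrate the hex conversion):

```shell
PID=12345        # replace with the Infinispan server pid
NATIVE_TID=67890 # replace with a hot thread id from: top -b -H -n1 -p $PID

# jstack prints native thread ids in hex as nid=0x...; convert to match:
NID=$(printf 'nid=0x%x' "$NATIVE_TID")
echo "$NID"      # -> nid=0x10932; search a jstack dump for this token
# jstack "$PID" | grep -B2 -A10 "$NID"
```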

             

            I haven't used Flight Recorder before, so I'm getting up to speed with it, and I will come back with a recording.

             

            Related to our general problems, and perhaps to this one, here is a type of exception we often see in our Tomcat log:

             

            org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for messageId=78202414 returned server error (status=0x86): org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 30 seconds for key WrappedByteArray{bytes=[B0x033E26706572736F..[41], hashCode=-2043581116} and requestor GlobalTx:app2:64740206. Lock is held by GlobalTx:app2:64740137

                    at org.infinispan.client.hotrod.impl.protocol.Codec20.checkForErrorsInResponseStatus(Codec20.java:350)

                    at org.infinispan.client.hotrod.impl.protocol.Codec20.readPartialHeader(Codec20.java:139)

                    at org.infinispan.client.hotrod.impl.protocol.Codec20.readHeader(Codec20.java:125)

                    at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56)

            • 3. Re: CLI ls command is very slow to complete
              Karl von Randow Newbie

              I think the stack trace above may be explained by the logs below; it seems that app1 is temporarily dropping out of the cluster. I'm guessing it's GC pauses, and I wonder if the slow CLI responses could also be due to GC pauses or heavy GC activity. Looking at the heap stats, I think this is correct. Time for some GC tuning! I'll report back.

               

              app2 log:

               

              09:47:15,971 WARN  [org.jboss.as.txn] (HotRod-ServerHandler-4-50) WFLYTX0027: The pre-jca synchronization SynchronizationAdapter{localTransaction=LocalTransaction{remoteLockedNodes=[app1, app2], isMarkedForRollback=false, lockedKeys=[], backupKeyLocks=[], topologyId=376, stateTransferFlag=null} org.infinispan.transaction.synchronization.SyncLocalTransaction@3dbdb30} org.infinispan.transaction.synchronization.SynchronizationAdapter@3dbdb4f associated with tx TransactionImple < ac, BasicAction: 0:ffffc7c3c774:5ebbc1c9:59ed1ec5:7b7d94b status: ActionStatus.COMMITTED > failed during after completion: org.infinispan.commons.CacheException: Could not commit.

                      at org.infinispan.transaction.impl.TransactionTable.afterCompletion(TransactionTable.java:853)

                      at org.infinispan.transaction.synchronization.SynchronizationAdapter.afterCompletion(SynchronizationAdapter.java:33)

                      at org.jboss.as.txn.service.internal.tsr.JCAOrderedLastSynchronizationList.afterCompletion(JCAOrderedLastSynchronizationList.java:147)

                      at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.afterCompletion(SynchronizationImple.java:96)

              ...

              Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 73164592 from app1

                      at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:163)

                      at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:86)

                      at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:21)

                      at java.util.concurrent.FutureTask.run(FutureTask.java:266)

                      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)

                      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

                      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

                      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

                      ... 1 more

               

               

              And on app1:

               

              09:35:34,399 WARN  [org.infinispan.remoting.inboundhandler.NonTotalOrderTxPerCacheInboundInvocationHandler] (remote-thread--p2-t22) ISPN000071: Caught exception when handling command PrepareCommand {modifications=[PutKeyValueCommand{key=WrappedByteArray{bytes=[B0x033E37636F6E7472..[58], hashCode=814350529}, value=WrappedByteArray{bytes=[B0x030409000000176A..[203], hashCode=1502081560}, flags=[IGNORE_RETURN_VALUES], commandInvocationId=CommandInvocation:local:0, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=86400000, maxIdle=-1, version=NumericVersion{version=24206856864276646}}, successful=true, topologyId=361}], onePhaseCommit=true, retried=false, gtx=GlobalTx:app3:557081434, cacheName='letterboxd.metadata', topologyId=361}: org.infinispan.commons.CacheException: ISPN000482: Cannot create remote transaction GlobalTx:app3:557081434, already completed

                      at org.infinispan.transaction.impl.TransactionTable.lambda$getOrCreateRemoteTransaction$1(TransactionTable.java:387)

                      at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)

                      at org.infinispan.transaction.impl.TransactionTable.getOrCreateRemoteTransaction(TransactionTable.java:380)

                      at org.infinispan.transaction.impl.TransactionTable.getOrCreateRemoteTransaction(TransactionTable.java:360)

                      at org.infinispan.commands.tx.PrepareCommand.createContext(PrepareCommand.java:120)

                      at org.infinispan.commands.tx.PrepareCommand.invokeAsync(PrepareCommand.java:101)

                      at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:102)

                      at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99)

                      at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71)

                      at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40)

                      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

                      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

                      at java.lang.Thread.run(Thread.java:748)

               

              09:35:36,395 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-5799,app1) app1: failed to collect all ACKs (expected=3) for view [app1|87] after 2000ms, missing 3 ACKs from (3) app2, app3, app4
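
              For the tuning step, my plan is to start by turning on GC logging so the pauses are visible. Something like this in bin/standalone.conf (a sketch, assuming JDK 8 flag syntax; on JDK 9+ -Xlog:gc* replaces these, and the log path is just illustrative):

```shell
# Hypothetical additions to bin/standalone.conf. These flags log each GC
# event with timestamps plus total stop-the-world time, which should show
# whether pauses line up with the slow CLI responses and the missed ACKs.
JAVA_OPTS="${JAVA_OPTS:-}"
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCApplicationStoppedTime"
JAVA_OPTS="$JAVA_OPTS -Xloggc:/var/log/infinispan/gc.log"
```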