3 Replies Latest reply on Oct 24, 2017 4:58 PM by Karl von Randow

    CLI ls command is very slow to complete

    Karl von Randow Newbie

      We've been having a lot of trouble with our Infinispan 9.1.1 Server cluster:

      * High CPU usage.

      * Timeouts on global lock acquisition

       

      I'm wondering if it could be due to our monitoring, which connects to each of our four Infinispan servers every minute and runs this command:

       

      /opt/infinispan/bin/ispn-cli.sh --connect "container clustered,ls"

       

      Often this command hangs for about 5 minutes. The stats command is also slow, though not as slow; more like 30 seconds:

       

      /opt/infinispan/bin/ispn-cli.sh --connect "container clustered,cache 'NAME',stats"

       

      Are these issues expected, or do they suggest something is wrong with my cluster? What would be a useful approach to debugging them?

        • 1. Re: CLI ls command is very slow to complete
          Sebastian Łaskawiec Apprentice

          Long CLI command execution times are often caused by high load on the system. Could you please confirm this is not the case?

           

          Probably one of the easiest debugging tricks is to use Flight Recorder and record a `.jfr` file. You could attach it to this thread so that we can analyze it.
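          For example, something like the following (a sketch, assuming the server runs on an Oracle JDK 8 JVM where JFR must be unlocked first; on JDK 11+ the unlock step is unnecessary, and the pid, recording name, and file path are all placeholders):

```shell
# Placeholder pid for illustration; point this at the Infinispan server JVM
# (e.g. via jps or pgrep).
PID="${INFINISPAN_PID:-12345}"

# Commands to run against the live server JVM (printed here rather than
# executed, since they require a running process):
cat <<EOF
jcmd $PID VM.unlock_commercial_features
jcmd $PID JFR.start name=slow-ls duration=300s filename=/tmp/slow-ls.jfr
jcmd $PID JFR.check
EOF
```

          Starting the recording just before running the slow `ls` command, with a duration long enough to cover the hang, should capture what the server threads are doing.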

          • 2. Re: CLI ls command is very slow to complete
            Karl von Randow Newbie

            Thanks Sebastian,

             

            The load on the system does not appear to be high. The servers each have 12+ cores and the load average is between 2 and 6. While the command is running, the Infinispan process does use a lot of CPU (still not enough to push the load average up). I'm not sure if that is useful?
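            For reference, here is how I'm checking which threads are burning that CPU while the command runs (a rough sketch; the pid and thread id below are placeholders to illustrate the hex conversion):

```shell
PID=12345        # replace with the Infinispan server pid
NATIVE_TID=67890 # replace with a hot thread id from: top -b -H -n1 -p $PID

# jstack prints native thread ids in hex as nid=0x...; convert to match:
NID=$(printf 'nid=0x%x' "$NATIVE_TID")
echo "$NID"      # -> nid=0x10932; search a jstack dump for this token
# jstack "$PID" | grep -B2 -A10 "$NID"
```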

             

            I haven't used Flight Recorder before, so I'm getting up to speed with it, and I will come back with a recording.

             

            Related to our general problems, and perhaps to this one, here is a type of exception we often see in our Tomcat log:

             

            org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for messageId=78202414 returned server error (status=0x86): org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 30 seconds for key WrappedByteArray{bytes=[B0x033E26706572736F..[41], hashCode=-2043581116} and requestor GlobalTx:app2:64740206. Lock is held by GlobalTx:app2:64740137

                    at org.infinispan.client.hotrod.impl.protocol.Codec20.checkForErrorsInResponseStatus(Codec20.java:350)

                    at org.infinispan.client.hotrod.impl.protocol.Codec20.readPartialHeader(Codec20.java:139)

                    at org.infinispan.client.hotrod.impl.protocol.Codec20.readHeader(Codec20.java:125)

                    at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56)

            • 3. Re: CLI ls command is very slow to complete
              Karl von Randow Newbie

              I think the stack trace above may be explained by the logs below; it seems that app1 is temporarily dropping out of the cluster. I'm guessing it's GC pauses, and I wonder if the slow CLI responses could also be due to GC pauses or heavy GC activity. Looking at the heap stats, I think this is correct. Time for some GC tuning! I'll report back.

               

              app2 log:

               

              09:47:15,971 WARN  [org.jboss.as.txn] (HotRod-ServerHandler-4-50) WFLYTX0027: The pre-jca synchronization SynchronizationAdapter{localTransaction=LocalTransaction{remoteLockedNodes=[app1, app2], isMarkedForRollback=false, lockedKeys=[], backupKeyLocks=[], topologyId=376, stateTransferFlag=null} org.infinispan.transaction.synchronization.SyncLocalTransaction@3dbdb30} org.infinispan.transaction.synchronization.SynchronizationAdapter@3dbdb4f associated with tx TransactionImple < ac, BasicAction: 0:ffffc7c3c774:5ebbc1c9:59ed1ec5:7b7d94b status: ActionStatus.COMMITTED > failed during after completion: org.infinispan.commons.CacheException: Could not commit.

                      at org.infinispan.transaction.impl.TransactionTable.afterCompletion(TransactionTable.java:853)

                      at org.infinispan.transaction.synchronization.SynchronizationAdapter.afterCompletion(SynchronizationAdapter.java:33)

                      at org.jboss.as.txn.service.internal.tsr.JCAOrderedLastSynchronizationList.afterCompletion(JCAOrderedLastSynchronizationList.java:147)

                      at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.afterCompletion(SynchronizationImple.java:96)

              ...

              Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 73164592 from app1

                      at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:163)

                      at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:86)

                      at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:21)

                      at java.util.concurrent.FutureTask.run(FutureTask.java:266)

                      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)

                      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

                      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

                      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

                      ... 1 more

               

               

              And on app1:

               

              09:35:34,399 WARN  [org.infinispan.remoting.inboundhandler.NonTotalOrderTxPerCacheInboundInvocationHandler] (remote-thread--p2-t22) ISPN000071: Caught exception when handling command PrepareCommand {modifications=[PutKeyValueCommand{key=WrappedByteArray{bytes=[B0x033E37636F6E7472..[58], hashCode=814350529}, value=WrappedByteArray{bytes=[B0x030409000000176A..[203], hashCode=1502081560}, flags=[IGNORE_RETURN_VALUES], commandInvocationId=CommandInvocation:local:0, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=86400000, maxIdle=-1, version=NumericVersion{version=24206856864276646}}, successful=true, topologyId=361}], onePhaseCommit=true, retried=false, gtx=GlobalTx:app3:557081434, cacheName='letterboxd.metadata', topologyId=361}: org.infinispan.commons.CacheException: ISPN000482: Cannot create remote transaction GlobalTx:app3:557081434, already completed

                      at org.infinispan.transaction.impl.TransactionTable.lambda$getOrCreateRemoteTransaction$1(TransactionTable.java:387)

                      at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)

                      at org.infinispan.transaction.impl.TransactionTable.getOrCreateRemoteTransaction(TransactionTable.java:380)

                      at org.infinispan.transaction.impl.TransactionTable.getOrCreateRemoteTransaction(TransactionTable.java:360)

                      at org.infinispan.commands.tx.PrepareCommand.createContext(PrepareCommand.java:120)

                      at org.infinispan.commands.tx.PrepareCommand.invokeAsync(PrepareCommand.java:101)

                      at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:102)

                      at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99)

                      at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71)

                      at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40)

                      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

                      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

                      at java.lang.Thread.run(Thread.java:748)

               

              09:35:36,395 WARN  [org.jgroups.protocols.pbcast.GMS] (jgroups-5799,app1) app1: failed to collect all ACKs (expected=3) for view [app1|87] after 2000ms, missing 3 ACKs from (3) app2, app3, app4
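
              For the tuning step, my plan is to start by turning on GC logging so the pauses are visible. Something like this in bin/standalone.conf (a sketch, assuming JDK 8 flag syntax; on JDK 9+ -Xlog:gc* replaces these, and the log path is just illustrative):

```shell
# Hypothetical additions to bin/standalone.conf. These flags log each GC
# event with timestamps plus total stop-the-world time, which should show
# whether pauses line up with the slow CLI responses and the missed ACKs.
JAVA_OPTS="${JAVA_OPTS:-}"
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCApplicationStoppedTime"
JAVA_OPTS="$JAVA_OPTS -Xloggc:/var/log/infinispan/gc.log"
```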