4 Replies Latest reply on Jun 15, 2017 5:15 PM by paulcb

    High CPU & GC In Infinispan Replicated Cache under low load

    paulcb

      Hi,

       

      My environment is as follows:

       

OS: Linux Ubuntu Xenial on VMware with 4 CPUs

      JVM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11, mixed mode)

      Java: version 1.8.0_131, vendor Oracle Corporation

      Mem: 2GB Heap (min and max)

      GC: G1GC

App Server: WildFly 10

      Infinispan 8.1 (being used by Restcomm JAIN SLEE)

      JVM Args:

-D[Standalone]

      -XX:MetaspaceSize=96M

      -XX:MaxMetaspaceSize=256m

      -Djava.net.preferIPv4Stack=true

      -Djboss.modules.system.pkgs=org.jboss.byteman

      -Djava.awt.headless=true

      -Xmx2000m

      -Xms2000m

      -XX:+UseG1GC

      -XX:MaxGCPauseMillis=100

      -Dorg.jboss.resolver.warning=true

      -Dsun.rmi.dgc.client.gcInterval=3600000

      -Dsun.rmi.dgc.server.gcInterval=3600000

      -Dorg.jboss.boot.log.file=/opt/JSLEE/wildfly-10.1.0.Final/standalone/log/server.log

      -Dlogging.configuration=file:/opt/JSLEE/wildfly-10.1.0.Final/standalone/configuration/logging.properties

       

      2 Nodes with Infinispan config as follows:

       

      <infinispan

          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

          xsi:schemaLocation="urn:infinispan:config:8.2 http://www.infinispan.org/schemas/infinispan-config-8.2.xsd"

          xmlns="urn:infinispan:config:8.2">

        <jgroups>

          <stack-file name="udp" path="default-configs/default-jgroups-udp.xml"/>

        </jgroups>

        <cache-container name="${jboss.partition.name:DefaultPartition}-restcomm-diameter" statistics="false" default-cache="asyncCache">

          <transport stack="udp" cluster="DIAMETER_CLUSTER" node-name="DIAMETER_CLUSTER-${smile.diameter.cluster.node.name:NODE}"/>

          <jmx duplicate-domains="true"/>

          <replicated-cache name="asyncCache" mode="ASYNC">

            <locking isolation="READ_COMMITTED" acquire-timeout="20000"/>

            <transaction mode="BATCH"/>

            <state-transfer enabled="false" await-initial-transfer="true"/>

          </replicated-cache>

        </cache-container>

      </infinispan>

       

Restcomm SLEE appears to use TreeCache for its get/put operations, but that is not in my control as I am merely using the SLEE itself and not Infinispan directly.

       

One node receives traffic that updates entries in the Infinispan cache, which are in turn replicated to the second node. The traffic load is about 1 TPS. A pcap between the nodes shows Infinispan replicating at about 1 UDP packet per second, which is what I would expect.

       

      My issue:

       

After the nodes have been running for a few hours, CPU utilization starts to grow, then averages around 50% and peaks at 100% on both nodes (yes, even the node not actually processing any requests other than keeping the cache in sync). Our actual application-level code is probably using about 1% CPU (I know this because that is what it uses on older nodes running JBoss Cache on AS7, while this new version uses Infinispan). After much analysis in VisualVM I can see that the CPU load is not really coming from bytecode running in the JVM but rather from the garbage collector. Running top -H (screenshot included, as well as a thread dump showing the GC threads have IDs 0x83 to 0x87, i.e. 131 to 135 decimal as per top) shows that the threads doing the work are the G1GC threads. These often sit at around 10-20% CPU each.
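For anyone reproducing this cross-referencing step: thread dump nids are hexadecimal while top -H lists the same native thread IDs in decimal, so a quick decode (using the nid range from the dump above) lines the two views up:

```java
public class NidToTid {
    public static void main(String[] args) {
        // nids from the thread dump are hex; top -H shows the same native
        // thread IDs in decimal, so decode each one to match them up
        for (String nid : new String[] {"0x83", "0x84", "0x85", "0x86", "0x87"}) {
            System.out.println(nid + " -> " + Integer.decode(nid));
        }
    }
}
```

This prints 0x83 -> 131 through 0x87 -> 135, matching the decimal thread IDs observed in top.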

       

I investigated what could be causing so much pain for the garbage collector: heap analysis on the standby node using VisualVM shows about 40 MB per second of FastCopyHashMap$Entry[] being created and destroyed (screenshot attached for a point in time). Other than Infinispan objects, nothing else really compares in terms of object creation/destruction; the rest of the heap is fairly quiet and has plenty of space. This makes me believe the root cause of our issue is a problem in Infinispan or its configuration. What would make a server that is receiving one update per second of maybe a few KB end up thrashing the heap at 40 MB/s with Infinispan data?

       

Any ideas on what to look at or fix would be much appreciated. I've pretty much exhausted my investigation.

       

      Thanks

      Paul

        • 1. Re: High CPU & GC In Infinispan Replicated Cache under low load
          sebastian.laskawiec

          I see you're using Oracle JVM. Would it be possible to record a JFR file from a test run?

          • 2. Re: High CPU & GC In Infinispan Replicated Cache under low load
            paulcb

            I've managed to work out that the thrashing is happening as per this stack trace:

             

            "Incoming-2865,JSLEE_CLUSTER-TZRE-OCS2-43868" #11872 prio=5 os_prio=0 tid=0x00007fcf000e4800 nid=0x7894 runnable [0x00007fce7e9fd000]

            java.lang.Thread.State: RUNNABLE

                    at org.infinispan.commons.util.AbstractMap.hash(AbstractMap.java:32)

                    at org.infinispan.commons.util.FastCopyHashMap.put(FastCopyHashMap.java:224)

                    at org.infinispan.atomic.impl.PutOperation.replay(PutOperation.java:46)

                    at org.infinispan.atomic.impl.AtomicHashMapDelta.merge(AtomicHashMapDelta.java:45)

                    at org.infinispan.commands.write.PutKeyValueCommand.performPut(PutKeyValueCommand.java:253)

                    at org.infinispan.commands.write.PutKeyValueCommand.perform(PutKeyValueCommand.java:109)

                    at org.infinispan.interceptors.CallInterceptor.handleDefault(CallInterceptor.java:125)

                    at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:43)

                    at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:78)

                    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)

                    at org.infinispan.interceptors.distribution.TxDistributionInterceptor.handleTxWriteCommand(TxDistributionInterceptor.java:284)

                    at org.infinispan.interceptors.distribution.TxDistributionInterceptor.visitPutKeyValueCommand(TxDistributionInterceptor.java:112)

                    at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:78)

                    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)

                    at org.infinispan.interceptors.EntryWrappingInterceptor$EntryWrappingVisitor.visitPutKeyValueCommand(EntryWrappingInterceptor.java:619)

                    at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:78)

                    at org.infinispan.interceptors.EntryWrappingInterceptor.wrapEntriesForPrepare(EntryWrappingInterceptor.java:678)

                    at org.infinispan.interceptors.EntryWrappingInterceptor.visitPrepareCommand(EntryWrappingInterceptor.java:105)

                    at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)

                    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)

                    at org.infinispan.interceptors.NotificationInterceptor.visitPrepareCommand(NotificationInterceptor.java:37)

                    at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)

                    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)

                    at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.invokeNextAndCommitIf1Pc(AbstractTxLockingInterceptor.java:93)

                    at org.infinispan.interceptors.locking.OptimisticLockingInterceptor.visitPrepareCommand(OptimisticLockingInterceptor.java:83)

                    at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)

                    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)

                    at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:158)

                    at org.infinispan.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:145)

                    at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)

                    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)

                    at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:113)

                    at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)

                    at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitPrepareCommand(TransactionSynchronizerInterceptor.java:39)

                    - locked <0x00000000fc97d320> (a org.infinispan.transaction.impl.RemoteTransaction)

                    at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)

                    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)

                    at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:113)

                    at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)

                    at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)

                    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)

                    at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:114)

                    at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:83)

                    at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)

                    at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)

                    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)

                    at org.infinispan.interceptors.BatchingInterceptor.handleDefault(BatchingInterceptor.java:59)

                    at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)

                    at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)

                    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)

                    at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:113)

                    at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)

                    at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:176)

                    at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:335)

                    at org.infinispan.commands.tx.PrepareCommand.perform(PrepareCommand.java:100)

                    at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokePerform(BasePerCacheInboundInvocationHandler.java:92)

                    at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:34)

                    at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.handleRunnable(BasePerCacheInboundInvocationHandler.java:132)

                    at org.infinispan.remoting.inboundhandler.NonTotalOrderTxPerCacheInboundInvocationHandler.handle(NonTotalOrderTxPerCacheInboundInvocationHandler.java:99)

                    at org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.handleCacheRpcCommand(GlobalInboundInvocationHandler.java:126)

                    at org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.handleFromCluster(GlobalInboundInvocationHandler.java:75)

                    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:205)

                    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:175)

                    at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:455)

                    at org.jgroups.blocks.RequestCorrelator.dispatch(RequestCorrelator.java:406)

                    at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:357)

                    at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:245)

                    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:664)

                    at org.jgroups.JChannel.up(JChannel.java:760)

                    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1034)

                    at org.jgroups.protocols.FRAG2.up(FRAG2.java:182)

                    at org.jgroups.protocols.FlowControl.up(FlowControl.java:438)

                    at org.jgroups.protocols.FlowControl.up(FlowControl.java:438)

                    at org.jgroups.stack.Protocol.up(Protocol.java:426)

                    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)

                    at org.jgroups.protocols.UNICAST3.deliverBatch(UNICAST3.java:1100)

                    at org.jgroups.protocols.UNICAST3.removeAndDeliver(UNICAST3.java:899)

                    at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:803)

                    at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)

                    at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:649)

                    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)

                    at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)

                    at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:325)

                    at org.jgroups.protocols.MERGE3.up(MERGE3.java:292)

                    at org.jgroups.protocols.Discovery.up(Discovery.java:296)

                    at org.jgroups.protocols.TP.passMessageUp(TP.java:1657)

                    at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1872)

                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

                    at java.lang.Thread.run(Thread.java:748)

             

             

My hunch is that there are lots of entries in the changelog being looped through, and hence line 60 is a large loop.

             

@Override

53    public DeltaAware merge(DeltaAware d) {

54       AtomicHashMap<Object, Object> other;

55       if (d != null && (d instanceof AtomicHashMap))

56          other = (AtomicHashMap<Object, Object>) d;

57       else

58          other = new AtomicHashMap();

59       if (changeLog != null) {

60          for (Operation<Object, Object> o : changeLog) o.replay(other.delegate);

61       }

62       return other;

63    }

             

            What would be the reason for this?
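To put some numbers behind the hunch, here is a stdlib-only sketch (hypothetical classes, not Infinispan's own) of what happens if the changelog is never cleared by a commit: every merge() replays the entire history into a fresh map, so the replay work, and the FastCopyHashMap$Entry garbage it produces, grows quadratically even though the real update rate is constant.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Infinispan's actual classes: illustrates why a
// changelog that is never cleared makes each merge() replay the whole history.
public class ChangelogSketch {
    public static void main(String[] args) {
        List<String> changeLog = new ArrayList<>(); // accumulated put operations
        long replayedOps = 0;
        for (int update = 1; update <= 1000; update++) {
            changeLog.add("k" + (update % 10) + "=v" + update);
            // merge(): replay the WHOLE changelog into a fresh delegate map,
            // allocating a new map entry per replayed operation
            Map<String, String> delegate = new HashMap<>();
            for (String op : changeLog) {
                int eq = op.indexOf('=');
                delegate.put(op.substring(0, eq), op.substring(eq + 1));
                replayedOps++;
            }
        }
        // 1 + 2 + ... + 1000 = 500500 replayed ops for only 1000 real updates
        System.out.println(replayedOps);
    }
}
```

That 500x amplification after only 1000 updates would be consistent with a few KB/s of real traffic producing tens of MB/s of short-lived entry arrays.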


            • 3. Re: High CPU & GC In Infinispan Replicated Cache under low load
              rvansa

What you describe really sounds like an infinite loop caused by a race condition, or maybe by a dangling transaction (the updates to the atomic map that should be applied at commit time are stored in a list, but if the transaction is never committed the list grows beyond all limits). The good news is that Infinispan 9.1 will contain a new implementation of these AtomicMaps, so if this is a bug in Infinispan, it's likely to be fixed. Neither the API nor the Tree API changed, so replacing the dependency could work. However, if it's a bug in the Restcomm SLEE, there will still be a list that aggregates the operations until you commit the transaction.

JBossTM has a time limit for the transaction (60 s by default), but I am not sure whether Infinispan checks it at any time other than when the transaction is being committed.

              • 4. Re: High CPU & GC In Infinispan Replicated Cache under low load
                paulcb

                Thanks Radim,

                 

I think you are right that this could be related to uncommitted transactions. My question on that is: what determines the transaction boundaries when the transaction mode is BATCH? Surely Infinispan demarcates the transaction boundaries itself? What would be the best way to track down transactions that are not committed?
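To clarify my own question, my understanding is that with transaction mode BATCH the application can demarcate explicitly via Infinispan's startBatch()/endBatch(true), and otherwise each operation gets its own implicit batch. A stdlib-only toy model (not Infinispan itself, just the semantics as I understand them) of how a batch that is never ended would leave its changelog growing:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of BATCH-mode demarcation (names mirror Infinispan's batching
// API but this is not Infinispan code). A batch that is never ended leaves
// its changelog accumulating operations indefinitely.
public class BatchModel {
    private final List<String> changeLog = new ArrayList<>();
    private boolean inBatch;

    void startBatch() { inBatch = true; }

    void put(String key, String value) {
        changeLog.add(key + "=" + value);
        if (!inBatch) endBatch(true); // implicit single-operation batch
    }

    void endBatch(boolean commit) {
        // on commit (or rollback) the accumulated operations are flushed
        changeLog.clear();
        inBatch = false;
    }

    int pendingOps() { return changeLog.size(); }

    public static void main(String[] args) {
        BatchModel cache = new BatchModel();
        cache.put("a", "1");                    // auto-demarcated: log flushed
        System.out.println(cache.pendingOps()); // prints 0

        cache.startBatch();
        cache.put("b", "2");
        cache.put("c", "3");                    // batch never ended: log keeps growing
        System.out.println(cache.pendingOps()); // prints 2
    }
}
```

If the SLEE (or something in its interceptor chain) opens batches it never closes, this would match the ever-growing changelog seen in the stack trace.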