2 Replies Latest reply on Dec 6, 2014 9:34 PM by pferraro

    High CPU usage in cluster

    cmosher01

      We are running into a problem with WildFly 8.1.0.Final, running a standalone cluster. After several minutes of high production-level load, we notice some nodes bogging down, pegging the CPU. Every thread dump we take shows the following stack trace consuming all the CPU:

       

      "KeyAffinityService Thread Pool -- 1" prio=10 tid=0x00007f4ec413d000 nid=0x20fe runnable [0x00007f4e904c3000]

      java.lang.Thread.State: RUNNABLE

      at java.io.FileInputStream.readBytes(Native Method)

      at java.io.FileInputStream.read(FileInputStream.java:272)

      at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:202)

      at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(NativePRNG.java:264)

      at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:278)

      - locked <0x000000059884afd0> (a java.lang.Object)

      at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:125)

      at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:114)

      at java.security.SecureRandom.nextBytes(SecureRandom.java:455)

      - locked <0x000000057b251e40> (a java.security.SecureRandom)

      at io.undertow.server.session.SecureRandomSessionIdGenerator.createSessionId(SecureRandomSessionIdGenerator.java:44)

      at org.wildfly.clustering.web.undertow.IdentifierFactoryAdapter.createIdentifier(IdentifierFactoryAdapter.java:42)

      at org.wildfly.clustering.web.undertow.IdentifierFactoryAdapter.createIdentifier(IdentifierFactoryAdapter.java:32)

      at org.wildfly.clustering.web.infinispan.AffinityIdentifierFactory.getKey(AffinityIdentifierFactory.java:55)

      at org.infinispan.affinity.KeyAffinityServiceImpl$KeyGeneratorWorker.generateKeys(KeyAffinityServiceImpl.java:247)

      at org.infinispan.affinity.KeyAffinityServiceImpl$KeyGeneratorWorker.run(KeyAffinityServiceImpl.java:220)

      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

      at java.lang.Thread.run(Thread.java:745)

      at org.jboss.threads.JBossThread.run(JBossThread.java:122)

       

      This is coupled with a massive number of reads of random data. We even introduced an external program on the host to feed entropy to /dev/random, which helped alleviate the problem, but only for a short time. After half an hour or so we saw the same behavior. This caused such a problem that we had to abandon clustering in our production environment.

       

      Any idea what is happening here? Any workarounds? Why is it trying to generate so many keys? Why is it using so much CPU? Is there any way to reduce the number of keys needed? If not, then is there at least some way to configure it to use an internal random number generator instead of using /dev/random or /dev/urandom?

        • 1. Re: High CPU usage in cluster
          ctomc

          It looks like /dev/random is not providing enough entropy, so reads from it block and take a long time.

           

          Can you try adding the -Djava.security.egd=file:/dev/./urandom system property (and yes, the "." in the middle is intentional) to the server, so that the security provider uses the non-blocking /dev/urandom?
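
          For example, a minimal sketch of one way to pass that property, assuming a Linux install started via bin/standalone.sh, which sources bin/standalone.conf (adjust for your own setup; on Windows it would be standalone.conf.bat):

              # bin/standalone.conf -- append the property to the existing JAVA_OPTS
              JAVA_OPTS="$JAVA_OPTS -Djava.security.egd=file:/dev/./urandom"

              # Alternatively, the JRE-wide default seed source can be changed in
              # $JAVA_HOME/jre/lib/security/java.security:
              #   securerandom.source=file:/dev/./urandom

          Note that this only changes where the JVM reads seed material from; it does not reduce how many session identifiers the cluster generates.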

          • 2. Re: High CPU usage in cluster
            pferraro

            Session identifier generation in a cluster is more costly than in the non-clustered case, since the probability that a given call to SecureRandom will generate a valid session identifier for the local node is approximately 1/N, where N is the cluster size.  Thus, a distributable web application will, on average, invoke SecureRandom N times more often than a non-distributable web application.  Normally, requests are shielded from this cost by a pool of pre-generated session identifiers that hash to the local node.
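
            To make the 1/N point concrete, here is a simplified, hypothetical sketch (not the actual KeyAffinityService code; the class and the nodeOwnsKey() check below are illustrative stand-ins for Infinispan's consistent-hash ownership test). Each attempt succeeds with probability roughly 1/N, so producing one locally-owned identifier costs about N SecureRandom calls on average:

                // Illustrative only: roughly what "N times more SecureRandom work" means.
                import java.security.SecureRandom;
                import java.util.Base64;

                public class AffinityIdSketch {
                    private static final SecureRandom RANDOM = new SecureRandom();

                    static String createLocallyOwnedId(int clusterSize) {
                        while (true) {
                            byte[] bytes = new byte[16];
                            RANDOM.nextBytes(bytes);            // each attempt hits SecureRandom
                            String id = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
                            if (nodeOwnsKey(id, clusterSize)) { // succeeds with probability ~1/N
                                return id;                      // => ~N attempts on average
                            }
                        }
                    }

                    // Hypothetical stand-in for the "does this key hash to the local node?" test.
                    static boolean nodeOwnsKey(String id, int clusterSize) {
                        return Math.floorMod(id.hashCode(), clusterSize) == 0;
                    }
                }

            With a healthy entropy source, the background pool keeps request threads from ever paying this cost directly; when /dev/random blocks, the pool cannot refill and the generator thread ends up in the state shown in the thread dump above.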