6 Replies Latest reply on Nov 20, 2017 4:34 AM by adinn

    Thread dump with many threads blocked on getInitiatedClasses

    tylerbenson

      I have a thread dump from a customer's Tomcat app with 150 threads blocked on calls to sun.instrument.InstrumentationImpl.getInitiatedClasses0(Native Method).

       

      The common calls for all of these are as follows:

       

      at sun.instrument.InstrumentationImpl.getInitiatedClasses0(Native Method)

      at sun.instrument.InstrumentationImpl.getInitiatedClasses(InstrumentationImpl.java:182)

      at org.jboss.byteman.agent.check.LoadCache.lookupClass(LoadCache.java:94)

      at org.jboss.byteman.agent.Transformer.getClassChecker(Transformer.java:908)

      at org.jboss.byteman.agent.Transformer.transform(Transformer.java:362) (some calls from line 324 instead...)

       

      Does anyone know what is going on here and what might be blocking the thread?

       

      The Byteman version being used here is: '4.0.0-BETA5'.

       

      Thanks!

       

      Message was edited by: Tyler Benson Removed confusing `dd.deps` prefix, an unintentional artifact from shadow jar.

        • 1. Re: Thread dump with many threads blocked on getInitiatedClasses
          adinn

          Hi Tyler,

           

          Thanks for reporting this problem.

           

          Tyler Benson wrote:

          . . .

          at sun.instrument.InstrumentationImpl.getInitiatedClasses0(Native Method)

          at sun.instrument.InstrumentationImpl.getInitiatedClasses(InstrumentationImpl.java:182)

          at org.jboss.byteman.agent.check.LoadCache.lookupClass(LoadCache.java:94)

          at org.jboss.byteman.agent.Transformer.getClassChecker(Transformer.java:908)

          at dd.deps.org.jboss.byteman.agent.Transformer.transform(Transformer.java:362) (some calls from line 324 instead...)

           

          Does anyone know what is going on here and what might be blocking the thread?

           

          I really need to have more details of what is deployed in the Tomcat container and underlying JVM to answer that question. In particular, which version of Byteman is this, which JVM is being used, what rules are being loaded into the runtime and when. I am particularly puzzled as to why the stack trace shows the top level call as being in package

           

          dd.deps.org.jboss.byteman.agent

           

          There is no such package in the standard Byteman releases. So, this does not look to me like it is a standard Byteman release. If so then I am not really going to be able to help without a fairly detailed account of what has been done to build this version.

           

          I can make a few surmises and suggestions based on the stack backtrace. First thing, let's establish a little context. I suggest that because I think your diagnosis is probably wrong -- the system may not be wedged but instead just running very, very slowly.

           

          1) Transformer.transform gets called when a class is first loaded. It allows Byteman to look at the bytecode and decide whether or not to inject a rule into it's bytecode.

           

          2) Transformer.transform may also get called if a rule is loaded into a running JVM which needs injecting into one or more already loaded classes.

           

          In the second case the call would happen from one thread only (the Byteman TransformListener thread). So, we need to assume we are in case 1.

           

          The two calls to Transformer.getClassChecker at the lines mentioned in the backtrace occur when Byteman is checking for rules that use overriding injection i.e. they specify

           

          CLASS ^Foobar

           

          or

           

          INTERFACE ^FOOBAR

           

          So, we know that there is at least one rule in the system that uses this syntax. One of the consequences of using overriding rules is that class loading can be slowed down. Here is why.

           

          When Byteman's Transformer is handed a class that is being loaded it has to decide if a rule applies to that class. So, let's compare the case where we don't use overriding with the one where we do. First, assume we have a rule for CLASS Foobar and we are checking a class with name org.my.X. Rules are stored in a hash table using the CLASS or INTERFACE name string as a key. If the class being loaded is org.my.Baz then we check whether we have a rule for it by looking in the hash table using "Baz" and "org.my.Baz" as the lookup key. So,  Byteman can very quickly decide there is no need to transform class org.my.X.

           

          Now assume we have a rule for CLASS ^Foobar and we are checking org.my.X. We can immediately identify that there are no rules mentioning CLASS X or CLASS org.my.X because the hash lookup fails. But Byteman knows that the rule system contains overriding rules. How do we decide if the rule for CLASS ^Foobar applies to org.my.X? The rule will only apply if X is a subclass of org.my.X. So, Byteman has to find out whether X is actually a subclass of some other class called Foobar.

           

          That sounds simple enough. Just call X.getSuperClass() repeatedly and do a hash lookup for the resulting class. Unfortunately, that doesn't work. Class X may not yet be fully resolved (it is in the middle of being loaded) so calling X.getSuperClass() may have side effects. In particular, assume class X has superclass Y and Y has not yet been loaded (yes, this can and does happen). If you call X.getSuperClass() it will force class Y to be loaded and resolved. The problem is that this is happening under a call out to an agent. The JVM will do the load for Y but won't make another agent call out (the agent transformation operation is deliberately designed not to be re-rentrant). So, if Y was actually Foobar then this would mean that the chance to inject into Foobar would get missed.

           

          So, instead Byteman has to resort to tricks to try to identify the superclass of X. It obtains the first super name for X from the initial byte[] for the class being transformed and then searches the loaded class base to find a class with the relevant name which belongs to the same classloader or to a parent loader (n.b. only Java agents and JDK runtime code can do this). If the class is found in the loaded classes list Byteman can continue to search the super hierarchy by using getSuperClass(). If not Byteman tries to obtain the class format  byte[] by asking the class loader to load the .class file for the class as a resource. This can be scanned to identify the next superclass and so on until a loaded class is found. (n.b. whenever a byte[] is found it is stored in a cache (using a weak reference) so that a second lookup will happen quicker).

           

          Normally, this trick works ok because it simply involves loading some bytes from disk and runs quite quickly with no side-edfects. However, if the app is using a classloader which does something unusual like load over a network then each attempt to load a super class may take a lot longer. With many loads happening at once this could run very slowly and cause a bottleneck. Also, if a normal class load and a resource file load both require some global classloader lock to be obtained then this might possibly cause a deadlock between a load in one thread which has not yet entered the transformer and another load in a different thread which has read its original class file, entered the transformer and is trying to execute a resource load for some superclass.

           

          I suspect one or other of these situations is happening. If so then Byteman is just not going to work.

           

          If you can let me know more details of the problematic deployment I will be very happy (and interested) to look further into this.

           

          regards,

           

          Andrew DInn

           

          • 2. Re: Thread dump with many threads blocked on getInitiatedClasses
            adinn

            Hi Tyler,

             

            Thanks for adding the extra details about the Byteman version and clarifying the situation regarding the dd.deps prefix.

             

            Is this deployment also running on JDK9 or JDK8? (or perhaps even 7?). Knowing this would make it easier to identify whether something is going wrong in the JVM underneath the call to getIntiatedClasses -- changes made in JDK9 to implement the module system could perhaps have altered the way that such lookups work and cause potential for a deadlock. I have not seen anything like that from any of the users who have been beta-testing the Byteman-4.0.0 release series but, of course, they may not have used Byteman in similar scenarios to the one you are reporting.

             

            If possible it would be useful to switch to the latest series 3.X release of Byteman (3.0.10) to see if this is to do with a change made as part of the update to 4.0.0. That would be appropriate in either these 2 cases:

             

            1. the deployment is not on JDK9
            2. the deployment is on JDK9 but the  Byteman rules only inject into classes of exported packages (that includes all classpath deployed classes)

             

            n.b. there is no significant difference in functionality between 3.0.10 and 4.0.0-BETA5 apart from the fact that 4.0.0 is aware of the module system and knows how to inject code into classes located in unexported packages. However,there are some significant differences in the implementation.

             

            Modulo what that tack might reveal, in order to diagnose further I'll probably need to know what rule are being loaded and into what code or, preferably, have a test case that reproduces the problem. I understand that you may not be able to disclose that sort of information but I'm very keen to follow up this problem so please let me know anything you can.

             

            regards,

             

             

            Andrew Dinn

            • 3. Re: Thread dump with many threads blocked on getInitiatedClasses
              tylerbenson

              Here is the version details for the app this happened on:

               

              java -version

              java version "1.8.0_121"

              Java(TM) SE Runtime Environment (build 1.8.0_121-b13)

              Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

               

              I will try using the 3.0.10 version to see if that makes any difference.

               

              Thanks!

              • 4. Re: Thread dump with many threads blocked on getInitiatedClasses
                adinn

                Tyler Benson wrote:

                 

                I will try using the 3.0.10 version to see if that makes any difference.

                Ok, good. Let me know what happens.

                 

                regards,

                 

                Andrew Dinn

                • 5. Re: Thread dump with many threads blocked on getInitiatedClasses
                  tylerbenson

                  So I was able to reproduce the problem by creating a request that tries to load a lot of classes.  The problem was still present in 3.0.10.

                  Basically the problem ended up being the fact that I had several rules with super class matching (^).  By removing those rules, the performance returned to an acceptable level.

                   

                  Thanks for your help.  Let me know if you have further questions.

                  ~Tyler

                  • 6. Re: Thread dump with many threads blocked on getInitiatedClasses
                    adinn

                    Hi Tyler,

                     

                    Tyler Benson wrote:

                     

                    So I was able to reproduce the problem by creating a request that tries to load a lot of classes. The problem was still present in 3.0.10.

                    Basically the problem ended up being the fact that I had several rules with super class matching (^). By removing those rules, the performance returned to an acceptable level.

                     

                    Thanks very much for checking this and identifying/reporting that the problem relates to use of overriding injection. I don't think there is much I can do to improve performance when using this feature but I will take another look to see if there is anything that might help.

                     

                    regards,

                     

                     

                    Andrew Dinn