6 Replies Latest reply on Aug 25, 2009 8:56 AM by jaikiran

    VFSClassLoaderPolicy backed jboss-cl performance issue

    jaikiran

      While debugging through performance and boot time issues, we have been seeing that the BaseClassLoader.loadClass() which is backed by a VFSClassLoaderPolicy is consistently expensive. A call to BaseClassloader.loadClass() results in the following flow:

      1) BaseClassLoader.loadClass() leads to loadFromDomain
      2) The BaseClassLoaderDomain internally starts looking for loaders through findLoader()
      3) The findLoader() invokes findLoaderInExports() which tries to look for the correct loader in the "EXPORT"
      4) This ultimately leads to the VFSClassLoaderPolicy.getResource() which through it's findChild() API, starts
      traversing the jar(s).
      5) This traversing ultimately leads to AbstractVirtualFileHandler.structuredFindChild which consistently appears as an expensive operation given that it does:
      * ZipEntryHandler.isLeaf() - Haven't looked at why the isLeaf implementation contains the logic that it has right now.
      * PathTokenizer.getTokens() - Lot of string manipulations
      * ZipEntryHandler.createChildHandler() - Haven't looked at it.

      Here's the code from AbstractVirtualFileHandler.structuredFindChild http://anonsvn.jboss.org/repos/jbossas/projects/vfs/branches/Branch_2_1/src/main/java/org/jboss/virtual/plugins/context/AbstractVirtualFileHandler.java:

      /**
       * Structured implementation of get child
       *
       * @param path the path
       * @return the handler or <code>null</code> if it doesn't exist
       * @throws IOException for any error accessing the virtual file system
       * @throws IllegalArgumentException for a null name
       */
       public VirtualFileHandler structuredFindChild(String path) throws IOException
       {
       checkClosed();
      
       // Parse the path
       List<String> tokens = PathTokenizer.getTokens(path);
       if (tokens == null || tokens.size() == 0)
       return this;
      
       // Go through each context starting from ours
       // check the parents are not leaves.
       VirtualFileHandler current = this;
       for (int i = 0; i < tokens.size(); ++i)
       {
       if (current == null)
       return null;
      
       String token = tokens.get(i);
       if (PathTokenizer.isCurrentToken(token))
       continue;
      
       if (PathTokenizer.isReverseToken(token))
       {
       VirtualFileHandler parent = current.getParent();
       if (parent == null) // TODO - still IOE or null?
       throw new IOException("Using reverse path on top file handler: " + current + ", " + path);
       else
       current = parent;
      
       continue;
       }
      
       if (current.isLeaf())
       {
       return null;
       }
       else if (current instanceof StructuredVirtualFileHandler)
       {
       StructuredVirtualFileHandler structured = (StructuredVirtualFileHandler)current;
       current = structured.createChildHandler(token);
       }
       else
       {
       String remainingPath = PathTokenizer.getRemainingPath(tokens, i);
       return current.getChild(remainingPath);
       }
       }
      
       // The last one is the result
       return current;
       }
      


      I haven't yet browsed completely through any of this code. But the classloader performance is appearing as one of the expensive operations in every profiled snapshot that i have, not just at deployment time but at runtime too.

      Thoughts on how this can be improved?


      P.S: Although during boot time analysis of AS, i had mentioned that classloading backed by VFS was showing up as one of the expensive operation, at that time i had highlighted that it was affecting deployment time of the application. That issue is a separate one where the deployer used to traverse the jar(s) to find the package name capabilities of that jar.

        • 1. Re: VFSClassLoaderPolicy backed jboss-cl performance issue
          jaikiran

          I looked at this in a bit more detail and i think this can be optimized, although i don't have the exact changes that we need to do. Here's a more detailed version of the flow:

          1) The BaseClassLoader is invoked to load a class
          2) The BaseClassloader delegates this to the appropriate BaseClassLoaderDomain
          3) The domain has knowledge of the EXPORTs and also the package capabilities (i.e. which package is available in which jar) of those EXPORTs.
          4) The domain then selects a collection of such EXPORTs to identify the correct classloader out of those.
          5) In order to identify the correct loader, it asks each of those, through a getResource(String classResourceName) call whether it contains that resource. Actually, this is more of a isResourceAvailable type of boolean call where it doesn't care of the returned URL.
          6) If #5 indicates that the loader contains the necessary resource, then this loader is selected and scheduled for loading the class.

          The problem starts in #5 when the VFSClassLoaderPolicy is used. Specifically when the VFSClassLoaderPolicy.getResource(String classResourceName) is called, it starts looking in various "roots" to see if that resource is available within any of its known roots. While doing this, internally the each of the file handlers are invoked which ultimately lead to AbstractVirtualFileHandler.structureFindChild(String classResourceName) first splits the resource path name into multiple tokens (through PathTokenizer). This in itself is expensive because of the various String operations involved:

          PathTokenizer.getTokens():

          public static List<String> getTokens(String path)
           {
           if (path == null)
           throw new IllegalArgumentException("Null path");
          
           int start = -1, length = path.length(), state = STATE_INITIAL;
           char ch;
           List<String> list = new ArrayList<String>();
          
           for (int index = 0; index < length; index ++) {
           ch = path.charAt(index);
           switch (ch) {
           case '/': {
           switch (state) {
           case STATE_INITIAL: {
           // skip extra leading /
           continue;
           }
           case STATE_MAYBE_CURRENT_PATH: {
           // it's '.'
           list.add(CURRENT_PATH);
           state = STATE_INITIAL;
           continue;
           }
           case STATE_MAYBE_REVERSE_PATH: {
           // it's '..'
           list.add(REVERSE_PATH);
           state = STATE_INITIAL;
           continue;
           }
           case STATE_NORMAL: {
           // it's just a normal path segment
           list.add(path.substring(start, index));
           state = STATE_INITIAL;
           continue;
           }
           }
           continue;
           }
           case '.': {
           switch (state) {
           case STATE_INITIAL: {
           // . is the first char; might be a special token
           state = STATE_MAYBE_CURRENT_PATH;
           start = index;
           continue;
           }
           case STATE_MAYBE_CURRENT_PATH: {
           // the second . in a row...
           state = STATE_MAYBE_REVERSE_PATH;
           continue;
           }
           case STATE_MAYBE_REVERSE_PATH: {
           // the third . in a row, guess it's just a weird path name
           if (errorOnSuspiciousTokens) {
           throw new IllegalArgumentException("Illegal suspicious token in path: " + path);
           }
           state = STATE_NORMAL;
           continue;
           }
           }
           continue;
           }
           default: {
           switch (state) {
           case STATE_INITIAL: {
           state = STATE_NORMAL;
           start = index;
           continue;
           }
           case STATE_MAYBE_CURRENT_PATH:
           case STATE_MAYBE_REVERSE_PATH: {
           if (errorOnSuspiciousTokens) {
           throw new IllegalArgumentException("Illegal suspicious token in path: " + path);
           }
           state = STATE_NORMAL;
           }
           }
           }
           }
           }
           // handle the last token
           switch (state) {
           case STATE_INITIAL: {
           // trailing /
           break;
           }
           case STATE_MAYBE_CURRENT_PATH: {
           list.add(CURRENT_PATH);
           break;
           }
           case STATE_MAYBE_REVERSE_PATH: {
           list.add(REVERSE_PATH);
           break;
           }
           case STATE_NORMAL: {
           list.add(path.substring(start));
           break;
           }
           }
          
           return list;
           }
          


          For each of this token, the control is passed to the FileHandlers. One such handler is the ZipEntryHandler. And from what i see, the main reason why the loading of a class shows up as expensive is a result of our ZipEntryHandler. Here are some of the things it does almost for all operations. But in the current use case let's just consider the getChild(String path) method:

          i) ZipEntryHandler.getChild -> First gets the correct ZipEntryContext and then calls the getChild on the context. Any operations on the ZipEntryContext go through a isModified check which involves a call to System.currentTimeInMillis to get the current time and then comparing it with the earlier time. Do we need to do this modification check? This check happens almost on all operations on the context:

          VirtualFileHandler getChild(ZipEntryHandler parent, String name)
           {
           if (parent == null)
           throw new IllegalArgumentException("Null parent");
          
           checkIfModified();
          
           String pathName = parent.getLocalPathName();
           if("".equals(pathName))
           pathName = name;
           else
           pathName = pathName + "/" + name;
          
           EntryInfo ei = entries.get(pathName);
           if(ei != null)
           return ei.handler;
          
           return null;
           }
          
          


          private synchronized void checkIfModified()
           {
           // TODO: if zipSource represents a nested archive we should maybe delegate lastModified to its parent
           if (initStatus == InitializationStatus.NOT_INITIALIZED)
           {
           ensureEntries();
           }
           else if (initStatus == InitializationStatus.INITIALIZED && getZipSource().hasBeenModified())
           {
           EntryInfo rootInfo = entries.get("");
           entries.clear();
           entries.put("", rootInfo);
          
           if (getZipSource().exists())
           {
           try
           {
           initEntries();
           }
           catch(Exception ignored)
           {
           log.warn("IGNORING: Failed to reinitialize context: " + getRootURI(), ignored);
           }
           }
           }
           }
          



          ii) Within this flow, there's even an invocation to ZipEntryHandler.isLeaf(). Similar to the isModified call, this internally checks whether the zip context is closed. The isLeaf() implementation itself is a bit expensive since it's not a simple boolean check and instead does a .equals() on AbstractVirtualFileHandler:

          ZipEntryContext.isLeaf:
           boolean isLeaf(ZipEntryHandler handler)
           {
           if (handler == null)
           throw new IllegalArgumentException("Null handler");
          
           if (getRoot().equals(handler) == false)
           checkIfModified();
          
           EntryInfo ei = entries.get(handler.getLocalPathName());
           if (ei == null || ei.entry == null)
           return false;
          
           return ei.entry.isDirectory() == false;
           }
          


          AbstractVirtualFileHandler.equals()
          public boolean equals(Object obj)
           {
           if (this == obj)
           return true;
           if (obj == null || obj instanceof VirtualFileHandler == false)
           return false;
           VirtualFileHandler other = (VirtualFileHandler) obj;
           if (getVFSContext().equals(other.getVFSContext()) == false)
           return false;
           if (getPathName().equals(other.getPathName()) == false)
           return false;
           return true;
           }
          
          


          So overall, in order to find whether a zip (jar) file contains a given class resource (remember we already know it contains the package, we are just checking whether it contains the specific class) we do a lot of expensive operations. I haven't completely looked at the ZipEntryHandler/Context, but would it be possible to have some sort of a mapping which has the entries contained in that jar? I see that we are doing something similar in the initEntries() but not sure whether it can be used.

          So if we wanted to check if a given resource is available in the ZipEntryContext, maybe we could just do a entries.containsKey(resourcePath)? Would it be possible to simplify the zip entry handling?

          • 2. Re: VFSClassLoaderPolicy backed jboss-cl performance issue
            mstruk

             

            "jaikiran" wrote:
            While doing this, internally the each of the file handlers are invoked which ultimately lead to AbstractVirtualFileHandler.structureFindChild(String classResourceName) first splits the resource path name into multiple tokens (through PathTokenizer). This in itself is expensive because of the various String operations involved:


            Accross the whole VFS impl there is a lot of pathname parsing. One way to alleviate this would be to introduce a Path object that contains an already parsed path. This way all the methods taking a path in the form of a String could be rewritten to taking Path instead. Parsing would then happen only once for each path.

            "jaikiran" wrote:

            i) ZipEntryHandler.getChild -> First gets the correct ZipEntryContext and then calls the getChild on the context. Any operations on the ZipEntryContext go through a isModified check which involves a call to System.currentTimeInMillis to get the current time and then comparing it with the earlier time. Do we need to do this modification check?


            I doubt you can improve performance simply by skipping a few tens of thousands of System.currentTimeMillis().

            For example - the following code executes in 50-60ms on my laptop:

            import java.util.ArrayList;
            
            class Test {
            
             public static void main(String [] args) {
            
             long start = System.currentTimeMillis();
             ArrayList<Long> tstamps = new ArrayList<Long>(100000);
            
             for(int i=0; i<100000; i++) {
             long now = System.currentTimeMillis();
             tstamps.add(now);
             if (now - start > 1000)
             throw new RuntimeException("Your comp is slow, man!");
             }
            
             System.out.println("Time elapsed for " + tstamps.size() + " iterations: " + (tstamps.get(tstamps.size()-1) - tstamps.get(0)));
             }
            }
            


            isModified() check simply ensures that when you say getChild(name) you get the datastructure reflecting the actual state of the underlying file (someone may have dropped a modified ear in jboss deploy dir). The really expensive stuff only happens if the underlying file actually is changed. In this case reinit of children is triggered.

            Actually to make sure the datastructure reflects the current state of the file you would have to hit file's lastModified() method continuously which is pretty expensive. There's optimization in there that doesn't hit the file more often than once a second - see ZipWrapper . hasBeenModified(). In fact this should brobably be stated in the javadoc of this method which currently is not.

            This optimization is purposefully done as close to the file communication layer as possible to avoid introducing unexpected behaviour (read: vfs taking unusually long to pick up changes in the file).


            "jaikiran" wrote:


            ii) Within this flow, there's even an invocation to ZipEntryHandler.isLeaf(). Similar to the isModified call, this internally checks whether the zip context is closed. The isLeaf() implementation itself is a bit expensive since it's not a simple boolean check and instead does a .equals() on AbstractVirtualFileHandler:

            ZipEntryContext.isLeaf:
             boolean isLeaf(ZipEntryHandler handler)
             {
             if (handler == null)
             throw new IllegalArgumentException("Null handler");
            
             if (getRoot().equals(handler) == false)
             checkIfModified();
            
             EntryInfo ei = entries.get(handler.getLocalPathName());
             if (ei == null || ei.entry == null)
             return false;
            
             return ei.entry.isDirectory() == false;
             }
            



            You can't just do a boolean check here because ZipEntryContext represents a mounted filesystem. What you're saying here is: 'Hey, you zip filesystem, check this file for me - is it a file, or is it a dir'.

            And filesystem goes: "Let's see what he passed to me. Ok .. it's not null. Is it a zip file - the one that I as a filesystem am taking care of? It's not. In that case before doing anything else - to reflect the current state of the file on the disk - let's check if maybe the file was modified and let's reinit ourselves before continuing to reflect the latest.
            Now let's see if we have a file entry for the pathname of the passed handler. Ok, we have an entry - let's see is it a file?"

            It looks kind of convoluted, but it's because it tries to be lazy on one hand, deferring the really slow stuff until really necessary, and tries to be current on the other, detecting file changes promptly.


            I don't see how you could reimplement AbstractVirtualFileHandler.equals() to stay semantically correct and take zero cycles to compute :)

            "jaikiran" wrote:

            So overall, in order to find whether a zip (jar) file contains a given class resource (remember we already know it contains the package, we are just checking whether it contains the specific class) we do a lot of expensive operations. I haven't completely looked at the ZipEntryHandler/Context, but would it be possible to have some sort of a mapping which has the entries contained in that jar? I see that we are doing something similar in the initEntries() but not sure whether it can be used.

            So if we wanted to check if a given resource is available in the ZipEntryContext, maybe we could just do a entries.containsKey(resourcePath)? Would it be possible to simplify the zip entry handling?


            The code was originally written to support any usage scenario. That means that for some specific usage scenarios it could be optimized. If you know from the outside that the underlying file won't change, then all the code in vfszip that checks if the file was changed is redundant for your use-case. So you could for example introduce additional option UNCHANGABLE or something, and skip continous isModified checks when option is enabled.

            Cheers,

            - marko


            • 3. Re: VFSClassLoaderPolicy backed jboss-cl performance issue
              jaikiran

               

              "mstruk" wrote:


              I doubt you can improve performance simply by skipping a few tens of thousands of System.currentTimeMillis().



              Nah, that's not going to improve performance :) I posted that as a question on why we check for modifications on each operation (from a functionality point rather than a performance point).


              • 4. Re: VFSClassLoaderPolicy backed jboss-cl performance issue
                jaikiran

                A bit more on a higher level - Assuming we have a ZipEntryContext/Handler, does it know of the entries within that zip? I see a ZipEntryContext.entries Map in the code, but haven't yet figured out whether looking directly into this would indicate the presence/absence of a resource within the zip.

                So if a getChild(String path) is invoked, maybe looking into this "entries" would tell us whether it's present or not?

                • 5. Re: VFSClassLoaderPolicy backed jboss-cl performance issue
                  mstruk

                   

                  "jaikiran" wrote:
                  A bit more on a higher level - Assuming we have a ZipEntryContext/Handler, does it know of the entries within that zip? I see a ZipEntryContext.entries Map in the code, but haven't yet figured out whether looking directly into this would indicate the presence/absence of a resource within the zip.

                  So if a getChild(String path) is invoked, maybe looking into this "entries" would tell us whether it's present or not?


                  Yes, ZipEntryContext.entries contains relative path of the entry within a zip as the key and EntryInfo representing the entry as the value.
                  It reflects the state of a zip archive in a specific point in time.

                  Since entries is initialized in a lazy manner you need to make sure it was initialized first - before relying on direct entries.get(path).

                  ensureEntries() is the method that does that and it is called from checkIfModified().

                  But you can call ensureEntries() directly of course - without going through checkIfModified().

                  There is initStatus field you can use to check that entries have been initialized:

                  if (initStatus == InitializationStatus.INITIALIZED) {
                   EntryInfo entry = entries.get(path);
                  }
                  


                  But again, if you want to detect changes in the underlying zip file, you have to call checkIfModified which does the reinit of entries for you if it detects a change.

                  - marko

                  • 6. Re: VFSClassLoaderPolicy backed jboss-cl performance issue
                    jaikiran

                    Thanks Marko, for these details. I'll see if this can help out in some way to reduce the processing that we do. Probably the need to split the path on tokens might be eliminated if we get hold of the correct context and just call the entries.get(path) - ofcourse after ensuring the entries have been inited.