12 Replies Latest reply on Mar 26, 2008 5:23 AM by adrian.brock

    JarEntry as VFS root does not return empty vfspath

    mstruk

      Hi,

      I've been looking at http://jira.jboss.com/jira/browse/JBVFS-4?rc=1 to get it fixed and get acquainted with VFS in the process.

      I figure the current quick fix is called a hack because transient field vfsPath in AbstractVirtualFileHandler.java set through setPathName() only keeps it's value until it's serialized/deserialized. setPathName() should not exist at all, and AbstractVirtualFileHandler should contain enough information to properly lazy-init vfsPath.

      So the idea of a fix is (correct me if I'm wrong):

      a) get rid of setPathName()
      b) pass JARVFSContextUnitTestCase . testPathIsEmptryForJarEntryAsRoot
      c) don't break any tests


      First question, since VFS is such a central component, and I don't want to commit something that will break things for others - is the 'don't break any tests' a reasonable enough criteria for code 'commitability' or should I also at least make sure jboss5 default config runs with the new jboss-vfs.jar, or is there something else I should check as well?

      The second question, is there any VFS documentation? There's API class diagram on wiki, and there's some javadoc, but that's all I could find.

      vfsPath for example, what is its contract? Based on test cases, the contract seems to be:

      "vfsPath is a full path name relative to context root path"

      JarEntry as VFS root fails to stick to this contract, and that is the essence of this bug.


      For example, if full path name is: /dir1/jar2.jar!/org/jboss/someclass

      and context root path is /dir1/jar2.jar!/

      then vfsPath is: org/jboss/someclass

      And code for it is very simple:

      fullPathName.substring(contextPath.length());


      If naming scheme is consistently applied, then it's all very clean and simple.


      AbstractVirtualFileHandler currently has access to context root (context.getRootURI()), parent path name (parent.getPathName()), and name.

      From this information intuitively (and that's how the code is currently written) you get pathName by putting together parent's pathName and 'name'. The problem is that contract for 'name' apparently is not consistent with the naming scheme.

      In our case 'name' is not empty - it contains the name of the jar file.

      Question: is this expected and proper behaviour - that 'name' is not consistent with the naming scheme - that in this case it is not empty, but contains the name of the jar file?

      If this is proper behaviour, then maybe we need to provide extra information to AbstractVirtualFileHandler, because I don't know if we can properly generate pathName from existing information.

      If it's not proper behaviour, then we simply fix 'name' (all the code that passes bad name to file handler constructor) to conform to the contract and pathName will automatically get fixed.

      Feedback appreciated,

      Cheers,

      - marko


        • 1. Re: JarEntry as VFS root does not return empty vfspath
          starksm64

           

          "mstruk" wrote:

          So the idea of a fix is (correct me if I'm wrong):

          a) get rid of setPathName()
          b) pass JARVFSContextUnitTestCase . testPathIsEmptryForJarEntryAsRoot
          c) don't break any tests


          First question, since VFS is such a central component, and I don't want to commit something that will break things for others - is the 'don't break any tests' a reasonable enough criteria for code 'commitability' or should I also at least make sure jboss5 default config runs with the new jboss-vfs.jar, or is there something else I should check as well?

          The vfs tests and jboss5 should be sufficient. If they are not we need more tests.

          "mstruk" wrote:

          The second question, is there any VFS documentation? There's API class diagram on wiki, and there's some javadoc, but that's all I could find.

          javadocs and tests are all that exists currently.

          "mstruk" wrote:

          vfsPath for example, what is its contract? Based on test cases, the contract seems to be:

          "vfsPath is a full path name relative to context root path"

          Correct.

          "mstruk" wrote:

          JarEntry as VFS root fails to stick to this contract, and that is the essence of this bug.


          We have a number of issues with relying on the jdk jar classes, the JarEntry naming, jar: urls not supporting nested jars, JarFile not being serializable, JarFile requiring a java.io.File are the issue off of the top of my head.

          Not using the JarEntry name, removing AbstractVirtualFileHandler.setFileName, and fixing the existing usage of setFileName may work, but I suspect we need to redo how we handle jars to not rely on JarFile.


          • 2. Re: JarEntry as VFS root does not return empty vfspath
            alesj

             

            "scott.stark@jboss.org" wrote:
            but I suspect we need to redo how we handle jars to not rely on JarFile.

            Scott, what are some of the things you've already looked at, for this issue?
            If we just ironed these details from those projects...

            How does this relate with the urlconn-work branch?
            Or what's that status of that?

            • 3. Re: JarEntry as VFS root does not return empty vfspath
              mstruk

              I took the 'fix setting of a name' approach and I have a working fix for this issue. Still have to do some code cleanup before committing.

              There is just one inconsistency left.

              The unit test for this issue actually expects the wrong behavior (as far as I understand).

              This is the code:

               URL url = getResource("/vfs/context/jar/simple.jar");
               URL entry = new URL("jar:" + url.toString() + "!/child");
               JarContext context = new JarContext(entry);
               assertEquals("child", context.getRoot().getName()); // WRONG: we should expect ""
               assertEquals("", context.getRoot().getPathName());
              
               url = getResource("/vfs/test/outer.jar");
               entry = new URL("jar:" + url.toString() + "!/jar2.jar ");
               context = new JarContext(entry);
               assertEquals("jar2.jar", context.getRoot().getName()); // WRONG: we should expect ""
               assertEquals("", context.getRoot().getPathName());
              


              It's wrong because root node should always have "" name, and "" pathName.

              If it doesn't it's a violation of naming rules:

              fullUrl = contextUrl + pathName

              pathName = parent.pathName + "/" + name



              I can fix the unit test, and commit everything, unless someone finds something wrong with my logic :)




              • 4. Re: JarEntry as VFS root does not return empty vfspath
                starksm64

                Right, the test is wrong if JarContext(entry) is no longer simply returning the JarEntry name, and it should not be.

                • 5. Re: JarEntry as VFS root does not return empty vfspath

                   

                  "scott.stark@jboss.org" wrote:
                  Right, the test is wrong if JarContext(entry) is no longer simply returning the JarEntry name, and it should not be.


                  The VFS "name" should never be the empty string.

                  This is used in many places, e.g. VFSDeployment.getSimpleName()
                  and will break things if it is empty.

                  The idea of the "name" is say what artifact you are referring to without the path.

                  If you want to add a "contextName" which is just the relevant section used
                  to construct the pathName then that is fine, but don't change the "name".

                  Suppose you have

                  url://some/path/outer.jar/lib/inner.jar
                  and construct a vfs context from (context name obviously doesn't exist)

                  root = VFS.getRoot("url://some/path");

                  root.getName() : "path" (this should not be empty)
                  root.getPathName() : ""
                  root.getContextName() : "" (this would be empty because it is the root of the vfs)

                  outer = root.getChild("outer.jar");
                  outer.getName() : "outer.jar");
                  outer.getPathName() : "outer.jar"
                  outer.getContextName() : "outer.jar")

                  inner = outer.getChild("lib/inner.jar");
                  inner.getName() : "inner.jar"
                  inner.getPathName() : "outer.jar/lib/inner.jar"
                  inner.getContextName() : "inner.jar"

                  As you can see the ContextName is the same as the Name
                  except when it is the root of the VFS (i.e. no parent), so I'm not sure the ContextName
                  is really necessary. Or if it is, it could easily be written in the abstract

                  public String getContextName()
                  {
                  if (getParent() == null)
                  return "";
                  return getName();
                  }

                  • 6. Re: JarEntry as VFS root does not return empty vfspath
                    starksm64

                    Currently the name is "" for the root is the contract. Make the difference between the name and vfs path context name explicit to document the different, JBVFS-19

                     /**
                     * Get the simple VF name (X.java)
                     *
                     * @return the simple, non-empty file name of the VF
                     */
                     String getName();
                    
                     /**
                     * Get the simple VF context name (X.java). This is the name portion of the VF in the pathName.
                     *
                     * @return the path context name for the VFS relative path name. This will be empty for root VFs.
                     */
                     String getContextName();
                    
                     /**
                     * Get the VFS relative path name (org/jboss/X.java)
                     *
                     * @return the VFS relative path name. This will be empty for root VFs.
                     */
                     String getPathName();
                    



                    • 7. Re: JarEntry as VFS root does not return empty vfspath
                      mstruk

                      Ok, I'm implementing the fix to work with root-name-not-an-empty-string contract.

                      Root node test:

                      isContextRoot = parent == null
                      


                      does not work for situations where context root is not the jar itself but a path within it (jar://some/path.jar!/child).

                      In the absence of isRoot field on AbstractVirtualFileHandler one approach is along the line:

                      public boolean isContextRoot() {
                      
                       try
                       {
                       VirtualFileHandler ctxRoot = getVFSContext().getRoot();
                       return this == ctxRoot;
                       }
                       catch (IOException e)
                       {
                       throw new RuntimeException("Failed to get context root: ", e);
                       }
                      
                       }
                      


                      but it blows up on getVFSContext() if handler has been closed - it goes into infinite recursion (because I call isContextRoot() inside initPath() which is called from getPathName() which is called from toString() which is called from Exception generating constructor within getVFSContext()... a mess :)

                      I think it's overkill to blow on getVFSContext() which is just a field retrieval - no logic invoked. We should let the code retrieve the context and let the context blow up later if some operation is invoked on it.

                      Lazy init of contextPath is also a mine that makes things blow up when you poke around and play with code. Things would be much more predictable if we just set it through constructor and make it non transient.

                      To stick to the issue - I think it would be best to introduce non-transient field boolean isContextRoot to AbstractVirtualFileHandler and init it through constructor.

                      BTW: Is there a wiki page I can use for a little VFS guide I'm putting together? Where others in-the-know can point out inaccuracies I commit :)


                      - marko



                      • 8. Re: JarEntry as VFS root does not return empty vfspath

                         

                        "mstruk" wrote:

                        Lazy init of contextPath is also a mine that makes things blow up when you poke around and play with code. Things would be much more predictable if we just set it through constructor and make it non transient.


                        I don't remember the code being that difficult, but then I haven't worked on it for a while?
                        IIRC, the path originally was setup lazily, except for the now obsolete jar handler
                        that went through every entry up-front?

                        I don't understand why you can't fall back to getParent().getParent().etc.
                        at the worst?


                        To stick to the issue - I think it would be best to introduce non-transient field boolean isContextRoot to AbstractVirtualFileHandler and init it through constructor.


                        Sounds like a simple solution, but maybe not so good on memory usage?


                        BTW: Is there a wiki page I can use for a little VFS guide I'm putting together? Where others in-the-know can point out inaccuracies I commit :)


                        Besides this, I don't know? :-)
                        http://wiki.jboss.org/wiki/Wiki.jsp?page=JBoss5VirtualFileSystem
                        The original design(s) - it's been few a couple of iterations - were
                        mostly described in the forums. I don't know that anybody has collected the
                        implementation details into a more formal document?

                        • 9. Re: JarEntry as VFS root does not return empty vfspath
                          mstruk

                           


                          IIRC, the path originally was setup lazily, except for the now obsolete jar handler that went through every entry up-front?


                          I'd like to learn more about JarHandler being obsolete ...



                          I don't understand why you can't fall back to getParent().getParent().etc.
                          at the worst?



                          This is our context URI:

                          file:/vfs/test/outer.jar!/org/jboss/test/vfs/support

                          These are parent child relationships for handlers within JarContext:


                          null
                          JarHandler (represents: file:/vfs/test/outer.jar)
                          JarEntryHandler (represents org)
                          JarEntryHandler (represents jboss)
                          JarEntryHandler (represents test)
                          JarEntryHandler (represents vfs)
                          root = JarEntryHandler (represents support)
                          JarEntryHandler (represents CommonClass.class)
                          JarEntryHandler (represents jar1)
                          JarEntryHandler (represents jar2)


                          root handler has VF handler representing 'vfs' for its parent. getParent().getParent() returns VF handler representing 'test'.


                          The solution would be to set root's parent to null.



                          I don't remember the code being that difficult, but then I haven't worked on it for a while?


                          I've approached the problem with a simple guideline: make a fix with a minimum amount of code change.
                          Don't change method signatures, try not to introduce additional fields ...

                          AbstractVirtualFileHandler gets three pieces of information: VFSContext, parent, name. To properly compose pathName I have to start generating the path at root node. I need to know which node is root and I need to make this check during pathName generation.

                          The simplest way was - let's use naming convention. Root should always have '', and it makes no sense for any normal file or directory to have '' name so we can test for root like this: isRoot = "".equals(name).

                          But this fell through because root in this VFS implementation is not '' and shouldn't be forced to be ''.

                          Ok, next I tried using parent: isRoot = parent == null. But turns out it's not null.

                          There are ways to fix this - we just set it to null somewhere, but who knows what dragons this will wake up so I saved this approach for later maybe.

                          Let's use VFSContext then. Let's get root handler from it and see if it's equal to this.

                          This exposed infinite recursion mines. First VFSContext closed exception that calls toString() that calls getPathName(), then in equals() which again calls getPathName(), the only thing left was == comparison which is not reliable due to unreliable context root retrieval, and serialization/deserialization ...

                          So this approach fell through as well.




                          Sounds like a simple solution, but maybe not so good on memory usage?


                          Ok, I'm trying another thing first - 'set parent to null' approach, that won't require any additional fields. I see two ways of doing it. One is to propagate some extra info based on which each handler can determine - as it's being created - if it is a root handler or not. That approach is processor intensive and requires small changes in many places. Another approach is to introduce setParent(VirtualFileHandler) or setAsContextRoot() method and call it on only one handler as it's assigned to be context root. That approach requires one extra method and one extra call - very elegant. But ... makes handler creation a two-phase process for handlers that become root and makes these handlers mutable. Looking at AbstractVirtualFileHandler I find no reliance on immutability so I don't expect problems.


                          I'm testing a fix made using this approach, and so far everything is working nicely.


                          In the end I'll remove more code cleaning out the current quick fix than add new code for a proper solution :)


                          - marko


                          • 10. Re: JarEntry as VFS root does not return empty vfspath

                             

                            "mstruk" wrote:

                            IIRC, the path originally was setup lazily, except for the now obsolete jar handler that went through every entry up-front?


                            I'd like to learn more about JarHandler being obsolete ...


                            When I reworked the VFS there were two different implementation patterns.

                            One was for things have directory structures like the file system
                            (see AbstractVirtualHandler::structuredFindChild)
                            the other was things like jars where you list jar entries but they aren't structured
                            in directories
                            (see AbstractVirtualHandler::simpleFindChild)

                            The JarHandler later got changed by somebody else (Scott?) to simulate
                            a directory structure.

                            • 11. Re: JarEntry as VFS root does not return empty vfspath

                               

                              "mstruk" wrote:

                              This is our context URI:

                              file:/vfs/test/outer.jar!/org/jboss/test/vfs/support

                              These are parent child relationships for handlers within JarContext:

                              <pre>
                              null
                              JarHandler (represents: file:/vfs/test/outer.jar)
                              JarEntryHandler (represents org)
                              JarEntryHandler (represents jboss)
                              JarEntryHandler (represents test)
                              JarEntryHandler (represents vfs)
                              root = JarEntryHandler (represents support)
                              JarEntryHandler (represents CommonClass.class)
                              JarEntryHandler (represents jar1)
                              JarEntryHandler (represents jar2)
                              </pre>

                              root handler has VF handler representing 'vfs' for its parent. getParent().getParent() returns VF handler representing 'test'.


                              The solution would be to set root's parent to null.


                              Ok so the problem is that whatever is creating the JAR VFS
                              is letting you leak off the top when you create a VFS based on a jar entry
                              all the way back to the jar file.

                              In fact, the code says it is a hack in JarContext
                               protected VirtualFileHandler createVirtualFileHandler(VirtualFileHandler parent, URL url) throws IOException
                               {
                               if (url == null)
                               throw new IllegalArgumentException("Null url");
                              
                               String urlStr = url.toString();
                               String jarName = extractJarName(urlStr);
                               String entryPath = urlStr;
                               entryPath = entryPath(entryPath);
                               JarHandler jar = new JarHandler(this, parent, url, jarName);
                               if (entryPath == null)
                               return jar;
                              
                               // todo This is a hack until we can fix http://jira.jboss.com/jira/browse/JBMICROCONT-164
                               AbstractVirtualFileHandler result = (AbstractVirtualFileHandler)jar.getChild(entryPath);
                               result.setPathName("");
                               return result;
                               }
                              


                              The issue has been around for a year. ;-)

                              • 12. Re: JarEntry as VFS root does not return empty vfspath

                                 

                                "mstruk" wrote:

                                Ok, I'm trying another thing first - 'set parent to null' approach, that won't require any additional fields. I see two ways of doing it. One is to propagate some extra info based on which each handler can determine - as it's being created - if it is a root handler or not. That approach is processor intensive and requires small changes in many places. Another approach is to introduce setParent(VirtualFileHandler) or setAsContextRoot() method and call it on only one handler as it's assigned to be context root. That approach requires one extra method and one extra call - very elegant. But ... makes handler creation a two-phase process for handlers that become root and makes these handlers mutable. Looking at AbstractVirtualFileHandler I find no reliance on immutability so I don't expect problems.


                                I'm testing a fix made using this approach, and so far everything is working nicely.


                                In the end I'll remove more code cleaning out the current quick fix than add new code for a proper solution :)


                                - marko


                                Sounds overly complicated. The problem with complicated things
                                is that they generally need workarounds when they don't work as expected
                                leading to workarounds for workarounds (like the state you are in now :-)

                                Why can't a JarEntryHandler just be a root in its own right
                                with JarContext knowing how to properly construct a root handler for a jar entry?

                                Looks to me like "laziness" in that all the code to do the entry handling
                                requires a top level JarHandler where it does the initJarFile()
                                and once that is done, it just does findChild().
                                i.e. you don't really have a VFS Context for the entry, its for the jar.