2 Replies Latest reply on Nov 10, 2010 1:26 PM by jesper.pedersen

    New Analyzers architecture proposal

    michelemauro

      Hi all!

       

      After Jesper's presentation of Tattletale at JUDCon Berlin 2010, and some discussion with him, I started looking through the Tattletale code to work out how to support the addition of new Analyzers, both to cover use cases that have already been raised (analyzing WARs and EARs) and some of my own (recording class usage in Spring configuration files, web.xml files and jBPM Process Definitions). Here I'll try to explain my idea for a more extensible Analyzer architecture that makes it easier to add features to Tattletale.

       

      The main inspiration for my solution is the Visitor pattern: by unifying how the analyzer walks the "stuff to be analyzed" and returns its results, we can make each individual analysis pluggable and extensible.

       

      There are three major concepts, which will probably become interfaces:

       

      • Element: an element is something to be analyzed. It is the input for the analyzer.
      • Result: a result is the result of the analysis. It can contain these values:
        •   whether the analyzed element is an Archive or not
        •   classes provided by the element
        •   classes on which the element depends
        •   packages on which the element depends
        •   platform classes on which the element depends
        •   blacklisted classes on which the element depends
      • Analyzer: the analyzer can accept an element and return a list of further elements and the results of the analysis. It can also reject the element if it can't handle it. A rejected element remains unanalyzed.
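
As a minimal sketch of how these three concepts might look in Java: every name below (ConceptsSketch, Element, Result, Outcome, Analyzer, ClassNameAnalyzer) is hypothetical and not part of the existing Tattletale code. The toy analyzer only exists to show the accept/reject contract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the three concepts; not existing Tattletale API. */
final class ConceptsSketch {

    /** Something to be analyzed; here just a wrapper around an arbitrary payload. */
    static final class Element {
        final Object payload;
        Element(Object payload) { this.payload = payload; }
    }

    /** The values a single analysis can report, mirroring the list above. */
    static final class Result {
        boolean archive;
        final Set<String> providedClasses = new TreeSet<>();
        final Set<String> requiredClasses = new TreeSet<>();
        final Set<String> requiredPackages = new TreeSet<>();
        final Set<String> platformClasses = new TreeSet<>();
        final Set<String> blacklistedClasses = new TreeSet<>();
    }

    /** What analyze() hands back: further elements to visit plus results. */
    static final class Outcome {
        final List<Element> elements = new ArrayList<>();
        final List<Result> results = new ArrayList<>();
    }

    interface Analyzer {
        /** Returns false to reject an element this analyzer cannot handle. */
        boolean accepts(Element element);
        Outcome analyze(Element element);
    }

    /** Toy analyzer: treats a String payload as the name of one required class. */
    static final class ClassNameAnalyzer implements Analyzer {
        @Override public boolean accepts(Element element) {
            return element.payload instanceof String;
        }
        @Override public Outcome analyze(Element element) {
            Result result = new Result();
            result.requiredClasses.add((String) element.payload);
            Outcome outcome = new Outcome();
            outcome.results.add(result);
            return outcome;
        }
    }

    public static void main(String[] args) {
        Analyzer analyzer = new ClassNameAnalyzer();
        Element accepted = new Element("org.example.Foo");
        Element rejected = new Element(Integer.valueOf(42));
        System.out.println(analyzer.accepts(accepted));
        System.out.println(analyzer.accepts(rejected));
        System.out.println(analyzer.analyze(accepted).results.get(0).requiredClasses);
    }
}
```

Keeping accepts() separate from analyze() is one possible way to express "reject"; returning an empty or special outcome from a single method would work as well.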

       

      The algorithm can be outlined with these steps:

       

      a stack of elements is initialized with the elements specified in the command line arguments;
      while (the stack is not empty) {
        find an analyzer that accepts the top of the stack;
        analyze the top of the stack;
        collect and aggregate all the results returned;
        push on the stack all the elements returned;
      }
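
The loop above could be sketched in Java roughly as follows. This is only an illustration under my own assumptions: elements are plain Objects, results are reduced to a set of used class names, and DriverSketch, BundleAnalyzer, NameAnalyzer and Report are hypothetical names. Elements that no analyzer accepts are recorded as unanalyzed rather than dropped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stack-based driver; not the existing Tattletale main loop. */
final class DriverSketch {

    static final class Outcome {
        final List<Object> elements = new ArrayList<>();
        final Set<String> usedClasses = new TreeSet<>();
    }

    interface Analyzer {
        boolean accepts(Object element);
        Outcome analyze(Object element);
    }

    /** Toy container analyzer: a List stands in for a directory or archive. */
    static final class BundleAnalyzer implements Analyzer {
        @Override public boolean accepts(Object element) { return element instanceof List; }
        @Override public Outcome analyze(Object element) {
            Outcome outcome = new Outcome();
            outcome.elements.addAll((List<?>) element);
            return outcome;
        }
    }

    /** Toy leaf analyzer: a String stands in for a class usage. */
    static final class NameAnalyzer implements Analyzer {
        @Override public boolean accepts(Object element) { return element instanceof String; }
        @Override public Outcome analyze(Object element) {
            Outcome outcome = new Outcome();
            outcome.usedClasses.add((String) element);
            return outcome;
        }
    }

    static final class Report {
        final Set<String> usedClasses = new TreeSet<>();
        final List<Object> unanalyzed = new ArrayList<>();
    }

    static Report run(List<Object> roots, List<Analyzer> analyzers) {
        Deque<Object> stack = new ArrayDeque<>(roots);
        Report report = new Report();
        while (!stack.isEmpty()) {
            Object top = stack.pop();
            Analyzer chosen = null;
            for (Analyzer candidate : analyzers) {
                if (candidate.accepts(top)) { chosen = candidate; break; }
            }
            if (chosen == null) {            // rejected by everyone: stays unanalyzed
                report.unanalyzed.add(top);
                continue;
            }
            Outcome outcome = chosen.analyze(top);
            report.usedClasses.addAll(outcome.usedClasses);   // aggregate the results
            for (Object found : outcome.elements) {
                stack.push(found);                            // visit new elements later
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<Object> roots = new ArrayList<>();
        roots.add(Arrays.<Object>asList("a.A", "b.B", 7));
        roots.add("c.C");
        Report report = run(roots, Arrays.<Analyzer>asList(new BundleAnalyzer(), new NameAnalyzer()));
        System.out.println(report.usedClasses + " unanalyzed=" + report.unanalyzed);
    }
}
```

In this sketch the first analyzer that accepts an element wins, so the order of the analyzer list matters; that is a design choice of the example, not something the proposal fixes.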

       

      Some examples:

       

      An Element can be:

      • a String (a path coming from the command line)
      • a directory
      • a file (of any type)
      • a JarEntry

       

      A Result can be:

      • a not yet analyzed element, found inside the analyzed one
      • a named archive, with all its data (provided and used classes)
      • a list of used classes, with no specific archive name, from a configuration file for example
      • one or more provided classes and list of dependencies, but no archive name (or a default one), from a .class file

       

      Specific analyzers will be:

      • PathAnalyzer: used especially at the beginning, will accept a string and return as Elements the filesystem objects it refers to (be they directories or single files)
      • DirectoryAnalyzer: will accept a directory, and return an element for every file or directory contained for further analysis
      • ClassfileAnalyzer: will accept a .class file or JarEntry, and return a result detailing the class and inner classes provided, and the classes that they depend on
      • JarAnalyzer: will accept a .jar file (or an entry in a WAR/EAR), and return an archive obtained by recursively analyzing the jar's content with the appropriate analyzers and summing up the results.
      • SpringXmlAnalyzer: will accept a .xml file or jar entry, and return it if it's not a Spring configuration file, or return a result with all the class names it can find inside it
      • WebXmlAnalyzer: will accept a file (or jar entry) named web.xml, and return all the usages declared inside (filters, servlets, etc.)
      • PersistenceXmlAnalyzer: will accept a file (or jar entry) named persistence.xml, and return all the usages declared inside (entity classes, persistence providers, etc.)
      • WarAnalyzer: will accept a .war file (or an entry in a .ear) and return a set of archives, one for each jar it finds inside (analyzed with a JarAnalyzer), all the provided and used classes from the .class files in WEB-INF/classes (using ClassfileAnalyzer), plus the usages in web.xml (WebXmlAnalyzer) and in any other files it can find an analyzer for (persistence.xml, beans.xml, etc.)
      • EarAnalyzer: same thing as WarAnalyzer, only for EARs. Will recurse into internal WARs, and so on.
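
To make one of these concrete, here is a rough sketch of the core of a SpringXmlAnalyzer, using only the JDK's DOM parser. It assumes, as a simplification of mine, that a Spring configuration file is recognizable by a root element named beans and that class usage appears in the class attribute of bean elements; SpringXmlSketch and usedClasses are hypothetical names. An empty Optional plays the role of "not a Spring file, hand it back".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch; a real analyzer would also look at other elements and attributes. */
final class SpringXmlSketch {

    /** Returns the class names used by bean definitions, or empty if the root is not "beans". */
    static Optional<Set<String>> usedClasses(String xml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not parseable XML", e);
        }
        // Assumption: a root element called "beans" marks a Spring configuration file.
        if (!"beans".equals(document.getDocumentElement().getTagName())) {
            return Optional.empty();
        }
        Set<String> classes = new TreeSet<>();
        NodeList beans = document.getElementsByTagName("bean");
        for (int i = 0; i < beans.getLength(); i++) {
            String className = ((Element) beans.item(i)).getAttribute("class");
            if (!className.isEmpty()) {       // beans without a class attribute are skipped
                classes.add(className);
            }
        }
        return Optional.of(classes);
    }

    public static void main(String[] args) {
        String xml = "<beans xmlns=\"http://www.springframework.org/schema/beans\">"
                + "<bean id=\"dao\" class=\"org.example.Dao\"/>"
                + "<bean id=\"child\" parent=\"dao\"/>"
                + "</beans>";
        System.out.println(usedClasses(xml));
        System.out.println(usedClasses("<web-app/>"));
    }
}
```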

       

      New analyzers should be easy to add, via a configuration file or command line options. The whole analysis driver should be encapsulated in its own class so it can be reused in the more complex analyzers.
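
One possible way to make analyzers pluggable is to list their class names in a configuration file or option and instantiate them reflectively. The sketch below assumes each analyzer has a no-argument constructor; RegistrySketch, load and the two toy analyzers are hypothetical names, and the list of class names stands in for whatever the configuration would provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of loading analyzers by class name. */
final class RegistrySketch {

    interface Analyzer {
        boolean accepts(Object element);
    }

    static final class JarAnalyzer implements Analyzer {
        @Override public boolean accepts(Object element) {
            return element instanceof String && ((String) element).endsWith(".jar");
        }
    }

    static final class WarAnalyzer implements Analyzer {
        @Override public boolean accepts(Object element) {
            return element instanceof String && ((String) element).endsWith(".war");
        }
    }

    /** Instantiates each named class; assumes a no-argument constructor. */
    static List<Analyzer> load(List<String> classNames) {
        List<Analyzer> analyzers = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class<? extends Analyzer> type = Class.forName(name).asSubclass(Analyzer.class);
                analyzers.add(type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("Cannot load analyzer " + name, e);
            }
        }
        return analyzers;
    }

    public static void main(String[] args) {
        // In a real setup these names would come from a configuration file or the command line.
        List<Analyzer> analyzers = load(Arrays.asList(
                JarAnalyzer.class.getName(), WarAnalyzer.class.getName()));
        for (Analyzer analyzer : analyzers) {
            System.out.println(analyzer.getClass().getSimpleName()
                    + " accepts app.war: " + analyzer.accepts("app.war"));
        }
    }
}
```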

       

      The current code will have to be split between the directory, jar and class analyzers and the main driver. The other analyzers can then be added. The main loop must be inverted and refactored to use a stack of elements to evaluate: it finds the right analyzer for the topmost element, collects and aggregates the results, and pushes any newly found elements onto the stack. When the stack has been emptied, the aggregated result will be passed to the reporting engine.

       

      Unlike the current implementation, the Analyzer will be used through instance methods: its life cycle is creation (with any dependencies and services it needs), asking whether it accepts an element, and analyzing that element. Instead of having the analyzer accumulate the discovered data in maps that are passed around, the outermost loop will be responsible for merging the returned results into the growing database.

       

      I will think about it a little more, and of course discuss any feedback, before I start coding a possible implementation.

      Thank you for your attention,

       

      What do you think?

       

      Michele Mauro

        • 1. Re: New Analyzers architecture proposal
          barnettc

          Hi Michele

           

          If you pursue this, what would you think of producing results in an XML format? I started looking at an issue (TTALE-118) to format reports in XML. After reviewing the code, it seemed to me that the most direct approach was to produce XSL-FO XML in parallel with the current code producing the HTML reports. I did not see much value in producing an XML format for each report and then converting the XML to XSL. If your suggestion is implemented, then a more comprehensive reporting system would make sense. XML results would also be a nice way of providing analysis results to other projects. If you are interested, I would be glad to take a stab at producing an XML schema.

           

          Charlie Barnett

          • 2. Re: New Analyzers architecture proposal
            jesper.pedersen

            Sounds like a good structure.

             

            Today there is NestableArchive, which can be used to represent, for example, a .war file. However, adding support for recording and parsing additional metadata (like web.xml) could benefit the generated reports.

             

            As Charlie said, he is looking at producing an XML schema that can be transformed into different output formats, which will be one of the last stages in the pipeline. So it's best to coordinate through the forum.

             

            Thanks for looking into this.