3 Replies Latest reply on May 21, 2012 4:20 PM by Aslak Knutsen

    arquillian.org The Reference Dictionary

    Aslak Knutsen Master

      I've been playing with something I call The Reference Dictionary.


      Think of it as a Term / Concept Dictionary based on JavaDoc data, not as a JavaDoc replacement. User-facing APIs in Arquillian are mostly Annotations, so the JavaDoc can describe the concept of what Annotation X is and does, and the incoming/outgoing references give a hint about its surrounding Concepts.


      The incoming/outgoing references are based on a CamelCase word scan of the description, like a wiki, but only words that are other concepts in the dictionary are accepted. Each concept gets a Ranking based on how many other concepts refer to it, how many times, etc. (the small numbers you can see on the different results/links).
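
      To make the scan concrete, here is a minimal sketch in Java. It is not the actual backend code: the class name ReferenceScanner, the method outgoingReferences, the word pattern, and the choice to skip self-references are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Arquillian.
final class ReferenceScanner {

    // Assumed word shape: starts with an upper-case letter, then letters or digits.
    private static final Pattern WORD = Pattern.compile("\\b[A-Z][A-Za-z0-9]*\\b");

    /**
     * Returns every known concept mentioned in the description,
     * mapped to the number of times it is mentioned.
     */
    static Map<String, Integer> outgoingReferences(
            String self, String description, Set<String> knownConcepts) {
        Map<String, Integer> references = new LinkedHashMap<>();
        Matcher matcher = WORD.matcher(description);
        while (matcher.find()) {
            String word = matcher.group();
            // Only words that are other concepts in the dictionary count (assumption: no self-links).
            if (!word.equals(self) && knownConcepts.contains(word)) {
                references.merge(word, 1, Integer::sum);
            }
        }
        return references;
    }
}
```

      The mention counts could then feed the ranking, and inverting the map across all concepts would give the incoming references.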


      The search results are ordered by these conditions (the same applies to the incoming/outgoing links):


      - API is more important than SPI

      - Concept Rank

      - Concepts with equal Rank are ordered alphabetically
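
      The ordering rules above can be sketched as a chained Comparator. This is only an illustration: RankedConcept and ConceptOrdering are hypothetical names, and treating a higher rank as "sorts first" and comparing names case-insensitively are assumptions.

```java
import java.util.Comparator;

// Hypothetical model of a search result, not the real data model.
final class RankedConcept {
    final String name;
    final boolean api; // true for API, false for SPI
    final int rank;

    RankedConcept(String name, boolean api, int rank) {
        this.name = name;
        this.api = api;
        this.rank = rank;
    }
}

final class ConceptOrdering {
    static final Comparator<RankedConcept> SEARCH_ORDER =
            // false sorts before true, so API (negated to false) comes before SPI
            Comparator.comparing((RankedConcept c) -> !c.api)
                    // assumption: a higher rank is listed first
                    .thenComparing(Comparator.comparingInt((RankedConcept c) -> c.rank).reversed())
                    // equal ranks fall back to alphabetical order
                    .thenComparing((RankedConcept c) -> c.name, String.CASE_INSENSITIVE_ORDER);
}
```

      Sorting a result list is then a single call such as results.sort(ConceptOrdering.SEARCH_ORDER).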


      The 'engine' currently scans both API and SPI (though API is more useful since it actually has some JavaDoc).


      It works by scanning the Latest.Tag of an artifact for *.java files, then parsing the HEAD version of those files. I do this to see what is in the latest release, while parsing HEAD so we can get any textual fixes in as quickly as possible without having to wait for the next release. We do risk that Concept A has changed slightly between Latest.Tag and HEAD, but for now, until the JavaDoc for the concepts is up to date and as helpful as possible, I think it's more useful to be able to change the doc as we go.


      It scans all Tags to find the first appearance of a given X.java, which determines the since attribute.
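
      A minimal sketch of that lookup, assuming the file listing per tag has already been collected into a map that iterates from the oldest tag to the newest. SinceResolver, its method, and the tag and path values in the usage note are hypothetical; reading the tags from git is left out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not the real engine.
final class SinceResolver {

    /**
     * Returns the first tag whose file set contains the given path.
     * Assumes filesByTag iterates oldest tag first (e.g. a LinkedHashMap).
     */
    static Optional<String> since(String path, Map<String, Set<String>> filesByTag) {
        for (Map.Entry<String, Set<String>> entry : filesByTag.entrySet()) {
            if (entry.getValue().contains(path)) {
                return Optional.of(entry.getKey());
            }
        }
        // The file exists only on HEAD and has not been released yet.
        return Optional.empty();
    }
}
```

      For example, with made-up tags "1.0" and "1.1" where only "1.1" contains api/Foo.java, since("api/Foo.java", filesByTag) yields "1.1".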


      It currently doesn't support members/methods parsing, but I do plan to extract more data about the individual APIs, especially the Annotation members, as they add more description to the usage of the Concepts. Not 100% sure if we would want to extract SPI methods etc. in the same way; we might just link out to the 'real' JavaDoc?


      The Reference Dictionary is exposed as a JSON API, so anyone can fetch the data and play with it if they want to, e.g. to include it in Tools or Forge.


      Another use case for this on arquillian.org would be the Guides or Blogs: where we talk about "Deployment" without explaining in detail what that is, we can auto-link to the Reference Dictionary for an in-depth description.


      Then we have the Release Notes: when releasing an artifact we can scan the Reference Dictionary for Concepts that are new in this release and add a link in the notes. (By adding support for "Deprecated since" in the model, we could also add notes about what has been removed.) That gives us an automated API/SPI version diff.


      I've added a little "Help improve the docs!" link at the bottom that does a little 'analysis' of the current data to display the state of the docs. Don't treat this list as an absolute list of what needs to be fixed; see it more as possible improvements.

      A Concept will show up in that list if it:

      - has a short description (< 150 characters)

      - is missing the word 'example' in the description

      - has no incoming links

      - has no outgoing links
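
      The four checks above could be expressed roughly like this. It is a sketch only: DocCheck, the issue labels, and the case-insensitive match on 'example' are assumptions, not the code behind the link.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not the real analysis code.
final class DocCheck {

    static final int MIN_DESCRIPTION_LENGTH = 150;

    /** Returns the reasons a concept would be listed; an empty list means it passes. */
    static List<String> issues(String description, int incomingLinks, int outgoingLinks) {
        List<String> issues = new ArrayList<>();
        if (description.length() < MIN_DESCRIPTION_LENGTH) {
            issues.add("short description");
        }
        // Assumption: the word is matched regardless of case.
        if (!description.toLowerCase(Locale.ROOT).contains("example")) {
            issues.add("no example");
        }
        if (incomingLinks == 0) {
            issues.add("no incoming links");
        }
        if (outgoingLinks == 0) {
            issues.add("no outgoing links");
        }
        return issues;
    }
}
```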


      Currently all known Arquillian + ShrinkWrap + ShrinkWrap Resolver + ShrinkWrap Descriptors API/SPI artifacts are scanned.





      Duplicate Concept Names



      We have some APIs that have the same wiki name, e.g. Event from core-api and graphene-selenium-api. Currently the backend will link to the first one found, as it really has no clue which one you're actually trying to refer to. But we could add some opinionated attempts to resolve them, e.g.:

      - Assume that if one of the duplicate Concepts is from the same module as the linker, the local one is preferred.

      - Assume an API is referring to another API Concept?

      - Check if the linker is referring to the link via e.g. @LinkName, and use the duplicate that is an Annotation
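
      As an illustration of the first heuristic only, here is a sketch that prefers the duplicate from the linker's own module and otherwise falls back to the current "first one found" behaviour. DuplicateResolver and its method are hypothetical, and candidates are reduced to their module names to keep it short.

```java
import java.util.List;

// Hypothetical helper, not the real backend.
final class DuplicateResolver {

    /**
     * Picks which module's concept a link should point to.
     * candidateModules lists, in discovery order, the modules that define the name.
     */
    static String pickModule(String linkerModule, List<String> candidateModules) {
        if (candidateModules.isEmpty()) {
            throw new IllegalArgumentException("no candidates to choose from");
        }
        // Prefer the local concept; otherwise keep the first one found.
        return candidateModules.contains(linkerModule) ? linkerModule : candidateModules.get(0);
    }
}
```

      With Event defined in core-api and graphene-selenium-api, a link from graphene-selenium-api would resolve locally, while a link from any other module would still get the first match.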



      User-facing API vs Internal API



      As of now, API vs SPI is just based on Artifact naming, but we probably need to distinguish between User Facing API and internal API/SPI. Take arquillian-core-api: while that is the API of core, a user will never see it; they see arquillian-test-api, for instance. So from a user's point of view that's all internal stuff.
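
      One way this distinction might look, as a sketch under assumptions: the "-api" / "-spi" suffix rule is a guess at what "based on Artifact naming" means, and ArtifactKind with its USER_FACING allow-list is hypothetical (only arquillian-test-api is named as user-facing above).

```java
import java.util.Set;

// Hypothetical helper, not the real backend.
final class ArtifactKind {

    enum Kind { API, SPI, OTHER }

    // Assumed allow-list of artifacts a user actually sees.
    static final Set<String> USER_FACING = Set.of("arquillian-test-api");

    /** Classifies an artifact purely by its name suffix (assumed convention). */
    static Kind kindOf(String artifactId) {
        if (artifactId.endsWith("-api")) {
            return Kind.API;
        }
        if (artifactId.endsWith("-spi")) {
            return Kind.SPI;
        }
        return Kind.OTHER;
    }

    /** An API artifact counts as user-facing only if it is on the allow-list. */
    static boolean isUserFacing(String artifactId) {
        return kindOf(artifactId) == Kind.API && USER_FACING.contains(artifactId);
    }
}
```

      An explicit list is the bluntest option; it trades maintenance effort for not having to infer user-facing status from names alone.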


      Also wondering if we should exclude internal APIs and SPIs from the default search results to avoid confusion, and rather have a check box or similar that says: Hey, I'm an Arquillian Developer, show me the good parts! We could then even reorder the results, preferring SPI over API?



      ShrinkWrap Descriptors not in source



      The generated Descriptors are not found in the git repo, as they are generated during the build. So e.g. WebAppDescriptor is not currently included. I'm thinking we change things a bit here and use the deployed sources jars as the source to parse. (Not quite sure what to do with since in this case, or whether we should download all versions to look for them.)




      I've published it to a new staging site of arquillian.org, have a look: http://staging-arquilliandev.rhcloud.com/modules/reference/