2 Replies Latest reply on Sep 17, 2012 10:39 PM by Andrew Rubinger

    Immutable archives

    Dan Allen

      As concurrency and parallelism enter mainstream programming now that commodity hardware ships with multiple cores, immutability, one of the building blocks of functional programming, is more vital than ever. Permitting uncoordinated mutable operations can lead to all sorts of bugs and unintended consequences. In the Clojure world, these pitfalls are referred to as incidental complexity. Working with immutable values eliminates these sorts of problems and is thus a powerful simplifying force.

       

      While studying Clojure, I quickly recognized that a ShrinkWrap archive is a prime candidate for an immutable data structure. The current archive implementation is a mutable map, which I think we need to reevaluate.

       

      An immutable data structure doesn't mean that change is impossible, or that all values must be provided in the constructor (which is roughly what I thought until I studied a functional programming language like Clojure). It just means that a change is never applied in place: the operation returns a reference to a new data structure that reflects the change, while the original stays exactly as it was.

       

      Does that mean the archive has to be cloned on every operation? Certainly not. Cloning the object would satisfy the contract, of course, but it would be a terribly inefficient and naive strategy. Being that naive myself, I wrote immutability off long ago as a nice idea that wouldn't work in practice. That's only because I was missing the other half of the equation.
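To make the cost of the naive strategy concrete, here is a minimal sketch, assuming an archive is modeled as a map from path to content. `NaiveImmutableArchive` and its methods are hypothetical names for illustration, not ShrinkWrap API. Every `add` returns a new archive and leaves the receiver untouched, which satisfies the immutability contract, but it copies the entire map, so each add costs O(n) time and memory.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, naive immutable archive: every change copies all entries.
final class NaiveImmutableArchive {
    private final String name;
    private final Map<String, String> entries; // path -> content (stand-in for real assets)

    NaiveImmutableArchive(String name) {
        this(name, Map.of());
    }

    private NaiveImmutableArchive(String name, Map<String, String> entries) {
        this.name = name;
        this.entries = entries;
    }

    // Returns a new archive; the receiver is left untouched.
    NaiveImmutableArchive add(String path, String content) {
        Map<String, String> copy = new HashMap<>(entries); // full copy: O(n) per add
        copy.put(path, content);
        return new NaiveImmutableArchive(name, Collections.unmodifiableMap(copy));
    }

    boolean contains(String path) {
        return entries.containsKey(path);
    }

    int size() {
        return entries.size();
    }

    String name() {
        return name;
    }
}
```

Adding the thousandth entry this way copies 999 existing ones, which is exactly the overhead that makes the idea look impractical.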

       

      Where there is immutability, there is likely persistence (not the database type of persistence, but rather the data structure type). To achieve immutability without sacrificing performance and memory, you use persistent data structures that implement structural sharing. That is, they never perform deep copies to satisfy an operation. Instead, only the portions of the data structure affected by a change are swapped out, while references are retained to those parts that are uninvolved. On the surface it appears as though a clone has happened, but behind the scenes it looks like a series of patches applied to an original source.
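Structural sharing can be illustrated with a minimal persistent binary search tree using path copying, a standard textbook technique that is simpler than the balanced trees and hash tries production libraries use. `PersistentTree` is a hypothetical name for this sketch. `insert` copies only the nodes on the path from the root to the insertion point; every subtree off that path is reused by reference.

```java
// Hypothetical persistent (immutable) binary search tree using path copying.
// Unbalanced for brevity; real libraries use balanced trees or hash tries.
final class PersistentTree {
    // Immutable node: fields are final and never reassigned.
    static final class Node {
        final String key;
        final Node left, right;

        Node(String key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    static final PersistentTree EMPTY = new PersistentTree(null);

    final Node root;

    private PersistentTree(Node root) {
        this.root = root;
    }

    // Returns a new tree containing key; this tree is unchanged.
    PersistentTree insert(String key) {
        return new PersistentTree(insert(root, key));
    }

    // Copies only the nodes on the search path; untouched subtrees are shared.
    private static Node insert(Node node, String key) {
        if (node == null) {
            return new Node(key, null, null);
        }
        int cmp = key.compareTo(node.key);
        if (cmp < 0) {
            return new Node(node.key, insert(node.left, key), node.right);
        } else if (cmp > 0) {
            return new Node(node.key, node.left, insert(node.right, key));
        }
        return node; // key already present: reuse the existing node
    }

    boolean contains(String key) {
        Node node = root;
        while (node != null) {
            int cmp = key.compareTo(node.key);
            if (cmp == 0) {
                return true;
            }
            node = cmp < 0 ? node.left : node.right;
        }
        return false;
    }
}
```

Inserting `a` into a tree holding `m`, `c` and `x` allocates new copies of `m` and `c` plus the new `a` node; the `x` subtree is shared by both versions, so the old tree remains fully intact at the cost of a few nodes.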

       

      With a data structure that is immutable and persistent, change is easy, but not dangerous.

       

      Here's how this would affect ShrinkWrap. Archives are currently mutable, so you would expect this behavior:

       

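      import org.jboss.shrinkwrap.api.ShrinkWrap;
      import org.jboss.shrinkwrap.api.spec.JavaArchive;

      JavaArchive jar = ShrinkWrap.create(JavaArchive.class, "archive.jar").addClass(org.example.ClassA.class);
      jar.addClass(org.example.ClassB.class); // mutates jar in place
      System.out.println(jar.toString(true));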
      

       

      archive.jar:
      /org/example
      /org/example/ClassA.class
      /org/example/ClassB.class
      

       

      Here's how it would work if archives were immutable:

       

      JavaArchive jar = ShrinkWrap.create(JavaArchive.class, "archive.jar").addClass(org.example.ClassA.class);
      JavaArchive jar2 = jar.addClass(org.example.ClassB.class); // returns a new archive; jar is unchanged
      System.out.println(jar.toString(true));
      

       

      archive.jar:
      /org/example
      /org/example/ClassA.class
      

       

      The archive referenced by jar remains unaffected by the second addClass() operation. We would only get the output from the first example if we printed the contents of jar2. Clearly, though, we would need to open up access to the archive name (setting it, mind you, wouldn't change the archive, but would instead apply to the new archive that is returned).
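The proposed behavior can be sketched as follows, assuming a hypothetical `ImmutableArchive` type (not part of ShrinkWrap) whose entries live in a persistent singly linked list. `addClass` allocates one new node that points at the existing entries, so the derived archive shares everything with the archive it came from. The hypothetical `withName` follows the same rule: it returns a renamed archive rather than renaming the receiver.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable archive; NOT the ShrinkWrap API.
final class ImmutableArchive {
    // Persistent list node: a new entry points at the (shared) older entries.
    private static final class Entry {
        final String path;
        final Entry next;

        Entry(String path, Entry next) {
            this.path = path;
            this.next = next;
        }
    }

    private final String name;
    private final Entry head; // most recently added entry, or null when empty

    private ImmutableArchive(String name, Entry head) {
        this.name = name;
        this.head = head;
    }

    static ImmutableArchive create(String name) {
        return new ImmutableArchive(name, null);
    }

    // Returns a new archive with the class added; this archive is unchanged.
    ImmutableArchive addClass(Class<?> type) {
        String path = "/" + type.getName().replace('.', '/') + ".class";
        return new ImmutableArchive(name, new Entry(path, head));
    }

    // Renaming also yields a new archive that shares every entry with this one.
    ImmutableArchive withName(String newName) {
        return new ImmutableArchive(newName, head);
    }

    String name() {
        return name;
    }

    // Paths in insertion order (oldest first).
    List<String> paths() {
        List<String> result = new ArrayList<>();
        for (Entry e = head; e != null; e = e.next) {
            result.add(0, e.path);
        }
        return result;
    }
}
```

Used like the example above, `jar.paths()` still lists only the first class, while `jar2.paths()` lists both, and the two archives share the first entry node. Duplicate paths are not handled in this sketch.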

       

      I believe this change would also make the ShrinkWrap API easier to understand and simplify the task of reusing a base archive, which we see often in Arquillian test suites.

       

      One way to implement this behavior would be to use the pcollections library, which provides persistent versions of the interfaces in the Java Collections API.

       

      If you don't see the benefit of this proposal, I encourage you to read at least the first chapter of Clojure Programming (which is available for free). The ShrinkWrap model is well suited to the style of API that functional programming espouses.

        • 1. Re: Immutable archives
          Jason Porter

          Honestly, I'd prefer to have another method on Archive that would "finalize" a jar. Technically, your example would produce three archives: the empty archive returned by ShrinkWrap.create(), the archive containing ClassA (referenced by jar), and the result of jar.addClass(org.example.ClassB.class) (referenced by jar2), which contains both ClassA and ClassB.
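The "finalize" alternative could look roughly like this minimal sketch. `SealableArchive` and `seal()` are hypothetical (ShrinkWrap has no such method), and the method is called `seal()` here because `finalize()` already exists on `java.lang.Object`. The archive stays mutable and fluent until it is sealed, after which any further add fails fast.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical mutable archive that can be frozen; NOT the ShrinkWrap API.
final class SealableArchive {
    private final String name;
    private final List<String> paths = new ArrayList<>();
    private boolean sealed; // not volatile: this sketch assumes single-threaded use

    SealableArchive(String name) {
        this.name = name;
    }

    // Mutates in place and returns this, so calls can be chained fluently.
    SealableArchive addClass(Class<?> type) {
        if (sealed) {
            throw new IllegalStateException("archive " + name + " is sealed");
        }
        paths.add("/" + type.getName().replace('.', '/') + ".class");
        return this;
    }

    // Freezes the archive; later adds throw IllegalStateException.
    SealableArchive seal() {
        sealed = true;
        return this;
    }

    boolean isSealed() {
        return sealed;
    }

    // Read-only view of the entries.
    List<String> paths() {
        return Collections.unmodifiableList(paths);
    }
}
```

Compared with the persistent approach, sealing avoids intermediate archives, but reusing a sealed base as the starting point for another archive would still require copying it.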

           

          What you're proposing would work fine, I think. It would create a lot of intermediate archives, which would be garbage-collected fairly quickly, but I don't think that's really an issue.

          • 2. Re: Immutable archives
            Andrew Rubinger

            Ha, is this a request for immutable archives, or a lecture on the benefits of immutable types?

             

            Yes, immutability is the first line of defense when dealing with concurrency. That's the end goal here anyway, if you're talking about taking advantage of N cores: access to shared mutable state has to be protected when passing these things around.
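One common Java idiom for sharing state between threads without locking the data itself is to keep an immutable snapshot in an `AtomicReference` and publish each change as a new value. This is a minimal sketch under stated assumptions: `SharedArchiveState` is a hypothetical name, and an unmodifiable `Map` that is copied on every update stands in for an archive's entry map (a persistent map would avoid the full copy).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder: readers always see a complete, immutable snapshot.
final class SharedArchiveState {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Map.<String, String>of());

    // Atomically replaces the snapshot with a copy that includes the new entry.
    // updateAndGet may retry the function under contention, so it must be side-effect free.
    void add(String path, String content) {
        current.updateAndGet(old -> {
            Map<String, String> next = new HashMap<>(old);
            next.put(path, content);
            return Map.copyOf(next);
        });
    }

    // No lock needed: a published snapshot can never change afterwards.
    Map<String, String> snapshot() {
        return current.get();
    }
}
```

A thread holding an older snapshot keeps seeing a consistent view while other threads publish newer ones; no update is lost because each one is applied with a compare-and-set.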

             

            But there's a reason we didn't bother to make ShrinkWrap Archives thread-safe from the get-go: for memory constraints, we DO NOT hold byte content when it's added. Meaning that when you add URL, File, or InputStream content, the bytes aren't read on "add", but rather only when the archive is read out (be it in export or other direct "get" operations). Otherwise we'd have to hold the entire contents in memory, and adds would register much more slowly.

             

            So even if we did lock down immutability in the container Map which holds the VFS, the content could always be swapped out under the hood anyway by manipulating a File after it's added, etc.
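The deferred-read behavior can be demonstrated with a small sketch. `LazyArchive` and `contentOf` are hypothetical stand-ins, not ShrinkWrap's implementation: the archive stores only a `Path` when a file is added and reads the bytes when they are requested, so editing the file after the add changes what comes out, even though the archive's own entry map never changed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical archive that defers reading file content; NOT ShrinkWrap code.
final class LazyArchive {
    // Archive path -> file on disk; only the reference is stored, not the bytes.
    private final Map<String, Path> entries = new LinkedHashMap<>();

    LazyArchive add(Path file, String target) {
        entries.put(target, file); // cheap: nothing is read yet
        return this;
    }

    // Bytes are read only here, so they reflect the file as it is right now.
    byte[] contentOf(String target) {
        Path file = entries.get(target);
        if (file == null) {
            throw new IllegalArgumentException("no entry at " + target);
        }
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Making `entries` immutable or persistent would not change this outcome: immutability of the container alone does not freeze the content it points to.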

             

            Plus, the first design goal of ShrinkWrap is to be a friendly, mutable type anyway. So what's the concrete use case for needing an immutable structure to hold the VFS info, aside from "good practice" and conjecture?

             

            S,

            ALR