Immutable archives
dan.j.allen Aug 10, 2012 1:41 AM
As concurrency and parallelism enter mainstream programming as a result of multiple cores being available on commodity hardware, immutability, one of the building blocks of functional programming, is more vital than ever. When uncoordinated mutable operations are permitted, all sorts of bugs and unintended consequences can follow. In the Clojure world, these pitfalls are referred to as incidental complexity. Working with immutable values eliminates these sorts of problems and is thus a powerful simplifying force.
As I study Clojure, I quickly recognize that a ShrinkWrap archive is a prime candidate to be an immutable data structure. The current archive implementation is a mutable map, which I think we need to reevaluate.
Using immutable data structures doesn't mean that change isn't possible, or that all values must be provided in the constructor (which is sort of what I thought until I studied a functional programming language like Clojure). It just means that when change occurs, it is only applied to the return value of the operation, which is a reference to a new data structure.
Does that mean the archive has to be cloned on every operation? Certainly not. Cloning the object would satisfy the contract, of course, but it would be a terribly inefficient and naive strategy. Since I was that naive, I wrote it off long ago as being a nice idea, but something that wouldn't work in practice. That's only because I was missing the other half of the equation.
Where there is immutability, there is likely persistence (not the database type of persistence, but rather the data structure type). To achieve immutability without sacrificing performance and memory, you use persistent data structures that implement structural sharing. That is, they never perform deep copies to satisfy an operation. Instead, only the portions of the data structure affected by a change are swapped out, while references are retained to those parts that are uninvolved. On the surface it appears as though a clone has happened, but behind the scenes it looks like a series of patches applied to an original source.
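To make structural sharing a bit more concrete, here is a minimal sketch of the simplest persistent structure I can think of, a singly linked list. The names `PList`, `prepend` and `StructuralSharingDemo` are made up for illustration; they are not part of ShrinkWrap, Clojure or any library, and a real persistent map would use a tree rather than a list.

```java
// Hypothetical illustration only; not part of ShrinkWrap or any library.
final class PList<T> {
    final T head;          // value held by this node (null in the empty list)
    final PList<T> tail;   // rest of the list, shared rather than copied
    final int size;        // number of elements reachable from this node

    private PList(T head, PList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    static <T> PList<T> empty() {
        return new PList<>(null, null, 0);
    }

    // Returns a NEW list with one extra node. The receiver is left untouched
    // and simply becomes the tail of the result, so nothing is copied.
    PList<T> prepend(T value) {
        return new PList<>(value, this, size + 1);
    }
}

class StructuralSharingDemo {
    public static void main(String[] args) {
        PList<String> base = PList.<String>empty().prepend("ClassA.class");
        PList<String> extended = base.prepend("ClassB.class");

        System.out.println(base.size);             // 1
        System.out.println(extended.size);         // 2
        System.out.println(extended.tail == base); // true: the old list is reused as-is
    }
}
```

The last line is the point: `extended` looks like a modified copy of `base`, but it holds a reference to the original nodes instead of duplicating them.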
With a data structure that is immutable and persistent, change is easy, but not dangerous.
Here's how this would affect ShrinkWrap. As of today, the archives are mutable. Thus, you would expect this behavior:
JavaArchive jar = ShrinkWrap.create(JavaArchive.class, "archive.jar")
    .addClass(org.example.ClassA.class);
jar.addClass(org.example.ClassB.class);
System.out.println(jar.toString(true));

archive.jar:
/org/example
/org/example/ClassA.class
/org/example/ClassB.class
Here's how it would work if archives were immutable:
JavaArchive jar = ShrinkWrap.create(JavaArchive.class, "archive.jar")
    .addClass(org.example.ClassA.class);
JavaArchive jar2 = jar.addClass(org.example.ClassB.class);
System.out.println(jar.toString(true));

archive.jar:
/org/example
/org/example/ClassA.class
The archive referenced by jar remains unaffected by the second addClass() operation. We would only get the output from the first example if we printed the contents of jar2. Clearly, though, we need to open up access to the archive name (setting it, mind you, wouldn't change the existing archive; the name would be applied to the new archive that is returned).
I believe this change also makes the ShrinkWrap API easier to understand and it simplifies the task of reusing a base archive, which we see often in Arquillian test suites.
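Here is a rough sketch of what reusing a base archive could look like under this model. Everything in it is an assumption made for illustration: `ImmutableArchive`, `withName`, `contains` and `size` are hypothetical names, not the ShrinkWrap API, and JDK classes stand in for `org.example.ClassA` and friends so that the snippet compiles on its own.

```java
// Hypothetical sketch: ImmutableArchive is NOT a ShrinkWrap type.
final class ImmutableArchive {
    // One node per added entry; older nodes are shared between archives.
    private static final class Node {
        final String path;
        final Node previous;

        Node(String path, Node previous) {
            this.path = path;
            this.previous = previous;
        }
    }

    private final String name;
    private final Node last; // most recently added entry, or null when empty

    private ImmutableArchive(String name, Node last) {
        this.name = name;
        this.last = last;
    }

    static ImmutableArchive create(String name) {
        return new ImmutableArchive(name, null);
    }

    // Returns a new archive containing the extra class; 'this' is never modified.
    ImmutableArchive addClass(Class<?> type) {
        String path = "/" + type.getName().replace('.', '/') + ".class";
        return new ImmutableArchive(name, new Node(path, last));
    }

    // Renaming also yields a new archive, which shares every existing entry.
    ImmutableArchive withName(String newName) {
        return new ImmutableArchive(newName, last);
    }

    String name() {
        return name;
    }

    int size() {
        int count = 0;
        for (Node n = last; n != null; n = n.previous) {
            count++;
        }
        return count;
    }

    boolean contains(String path) {
        for (Node n = last; n != null; n = n.previous) {
            if (n.path.equals(path)) {
                return true;
            }
        }
        return false;
    }
}

class BaseArchiveReuseDemo {
    public static void main(String[] args) {
        ImmutableArchive base = ImmutableArchive.create("base.jar").addClass(String.class);

        // Two test archives derived from the same base; neither affects the other.
        ImmutableArchive testA = base.addClass(Integer.class).withName("test-a.jar");
        ImmutableArchive testB = base.addClass(Long.class).withName("test-b.jar");

        System.out.println(base.size());                                // 1
        System.out.println(testA.size());                               // 2
        System.out.println(testB.contains("/java/lang/Integer.class")); // false
        System.out.println(base.name());                                // base.jar
    }
}
```

With mutable archives, the two test archives would need either a defensive copy of the base or a factory method that rebuilds it each time. Here the base can be held in a shared constant and extended freely.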
One way to implement this behavior would be to use the pcollections library, which provides persistent versions of the interfaces in the Java collections API.
If you don't understand the benefit of this proposal, I encourage you to read at least the first chapter of Clojure Programming (which is free). The ShrinkWrap model is well suited for the style of API that functional programming espouses.