27 Replies. Latest reply on Feb 22, 2010 5:27 PM by alrubinger

Proposal: tree structure for Archive

germanescobar Feb 6, 2010 3:57 PM

This is a proposal of a tree structure for the Archive. I've done some research and made basically a simplified version of the JSR 170/283 (Content Repository API for Java). So, this is what I've come up with:

We would need the following new API interfaces:

Item: is a Node or an Asset. You can check if it's a Node using Item.isNode().

Node: is a directory. I didn't use the directory term as an Archive is also a Node. You can check if the Node is an Archive using the Node.isArchive().

This change would make the structure much more consistent. It will also be easier to traverse the tree using recursion. For example (haven't tried it yet):

public void format(Node node, StringBuffer sb) {
     Collection<Item> items = node.getChildren();
     for (Item item : items) {
          sb.append(item.getName() + SEPARATOR);
          if (item.isNode()) {
               // recurse into the child node, passing the same buffer along
               format(Node.class.cast(item), sb);
          }
     }
}
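A minimal, runnable sketch of the recursion above, under stated assumptions: the `Item` and `Node` classes here are hypothetical stand-ins written only for illustration (they are not ShrinkWrap types), and `SEPARATOR` is assumed to be a newline.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch of the proposed Item/Node tree; not ShrinkWrap API. */
public class ItemTreeSketch {
    static final String SEPARATOR = "\n"; // assumed value

    /** An Item is either a Node (directory) or a leaf standing in for an Asset. */
    static class Item {
        private final String name;
        Item(String name) { this.name = name; }
        String getName() { return name; }
        boolean isNode() { return false; }
    }

    /** A Node is a directory-like Item that holds child Items. */
    static class Node extends Item {
        private final List<Item> children = new ArrayList<>();
        Node(String name) { super(name); }
        @Override boolean isNode() { return true; }
        Node add(Item child) { children.add(child); return this; }
        Collection<Item> getChildren() { return children; }
    }

    /** Depth-first walk: append each child's name, then recurse into child Nodes. */
    static void format(Node node, StringBuilder sb) {
        for (Item item : node.getChildren()) {
            sb.append(item.getName()).append(SEPARATOR);
            if (item.isNode()) {
                format((Node) item, sb);
            }
        }
    }

    public static void main(String[] args) {
        Node root = new Node("test.jar")
            .add(new Node("org").add(new Item("A.class")))
            .add(new Item("README.txt"));
        StringBuilder sb = new StringBuilder();
        format(root, sb);
        System.out.print(sb);
    }
}
```

The walk visits a directory's name before its contents, so the output lists `org`, then `A.class`, then `README.txt`.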

Adding and accessing items won't be as trivial as it is right now, but it's still nothing too complex. For example, to add an Item:

public void add(Item item, String path) {
     List<String> resolvedPath = ...; // tokenize
     
     // traverse the items until we find the deepest node of the path
     Node node = null;
     for (String segment : resolvedPath) {
          // helper method that will create the node if nonexistent
          Item i = obtainNode(segment);
          node = Node.class.cast(i);
     }
     
     node.add(item, path);     
}

ArchivePaths and ArchivePath would remain the same. We would need to do something about Archive.getContent() as it returns a Map<ArchivePath,Asset>.

1. Re: Proposal: tree structure for Archive

alrubinger Feb 6, 2010 5:38 PM (in response to germanescobar)

Nice docs.

I suppose my first question is "why do we need to change anything in the current user API"? From our discussions, I thought the main intent was to organize the internal structure of archives in a way that the content could be more easily traversed given some Node. So in essence this would primarily replace MemoryMapArchiveBase with a new implementation. And then we could hide things like "Node" and "Item" in the SPI component.

germanescobar wrote:
Item: is a Node or an Asset. You can check if it's a Node using Item.isNode().
Let's avoid this if we can? IMO when you need to check capabilities like this, or throw "UnsupportedOperationException", it's a sign to evaluate the design a bit. Same for Node.isArchive().

I might suggest instead:

Set<Node> Node.getChildren()

If this returns an empty Set, it's an empty directory. If it returns a bunch of Nodes, those can be either subdirectories or Assets.

In other words, why do we need both Item and Node? At first glance I think of a "Node" as a container which can have a Parent and Children, and Asset can be a type of Node. The "ArchivePath" of an Asset is then the names of all parents, tokenized by "/".

S,
ALR
2. Re: Proposal: tree structure for Archive

alrubinger Feb 6, 2010 5:31 PM (in response to alrubinger)

Correction: "Asset" cannot be a type of Node. It'd inherit stuff hidden to the user. The Node can "contain" an Asset, however.
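One possible reading of "a Node can contain an Asset", as a rough sketch only: every name below (`ContainingNodeSketch`, `child`, `getPath`, the String-backed `Asset`) is a hypothetical stand-in, not ShrinkWrap API. A Node knows its parent and children, a null Asset means "directory", and the path is derived from the ancestors' names joined by "/".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a Node has a parent and children and may contain an Asset. */
public class ContainingNodeSketch {
    /** Stand-in for an Asset; here it just wraps a String. */
    static class Asset {
        final String content;
        Asset(String content) { this.content = content; }
    }

    static class Node {
        private final String name;
        private final Node parent;                 // null for the root
        private final Map<String, Node> children = new LinkedHashMap<>();
        private Asset asset;                       // null means "directory"

        Node(String name, Node parent) { this.name = name; this.parent = parent; }

        /** Returns the named child, creating it when absent. */
        Node child(String childName) {
            return children.computeIfAbsent(childName, n -> new Node(n, this));
        }

        void setAsset(Asset asset) { this.asset = asset; }
        Asset getAsset() { return asset; }
        Map<String, Node> getChildren() { return children; }

        /** Path = names from just below the root down to this node, joined by "/". */
        String getPath() {
            Deque<String> names = new ArrayDeque<>();
            for (Node n = this; n.parent != null; n = n.parent) {
                names.addFirst(n.name);
            }
            return "/" + String.join("/", names);
        }
    }

    /** Walks (and creates) one node per path segment, then stores the Asset in the last one. */
    static Node add(Node root, String path, Asset asset) {
        Node node = root;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                node = node.child(segment);
            }
        }
        node.setAsset(asset);
        return node;
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node leaf = add(root, "/org/example/A.class", new Asset("bytes"));
        System.out.println(leaf.getPath());
        System.out.println(root.child("org").getAsset() == null);
    }
}
```

With this shape the Asset itself carries no name or path; that information lives only in the Node that holds it.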

S,
ALR
3. Re: Proposal: tree structure for Archive

lightguard Feb 6, 2010 10:50 PM (in response to alrubinger)

I agree with Andrew here. I know we're still in alpha, but there are a few blogs out there and other people using ShrinkWrap, so we should probably try to avoid completely changing the user's API to something vastly different.
4. Re: Proposal: tree structure for Archive

alrubinger Feb 6, 2010 10:53 PM (in response to lightguard)

FYI my reasoning isn't that we shouldn't break the API at this point. We only get to be in alpha once per major revision, so any changes that need to be made I think we should flush out now. We have *no* backwards-compat requirements at the moment. Once we go beta, that luxury goes out the door.

I raised the point only because I haven't seen any user responses wishing for a change in API, leading me to think that we're doing something right and that the tree-structure backend can be kept to internals/SPI only.

S,
ALR
5. Re: Proposal: tree structure for Archive

germanescobar Feb 8, 2010 10:00 AM (in response to alrubinger)
I had a really interesting talk with Andrew through #jbosstesting on freenode about this subject. I want to summarize that conversation here so we can all discuss it (Andrew, please correct me if I'm wrong or missing something).

Item interface and isNode/isArchive methods

These are two different concerns that are related here:
The need of the Item interface.
The methods isNode/isArchive could be removed (I agree that this kind of checking is a bit of a hack).

Starting with the second point, we need to know, somehow, if we are dealing with a Node, an Archive or an Asset while traversing the tree. I'm assuming that knowing if a Node is an Archive is a good idea b/c you could manipulate archives inside other archives.

The idea that Andrew proposed (again, correct me if I'm wrong) is to leave only a tree of Node objects with the following methods (the Item interface would disappear):

getNodes() - returns child nodes.
getAssets() - returns the assets that are inside the node.
getArchives() - returns the archives that are inside the node.

In this case, the Node interface would have a lot more methods because there is no consistent way of adding/removing items (you would have to call add/removeNode, add/removeAsset and add/removeArchive independently).

Basically, the idea of the Item interface was to have a consistent, unified way of manipulating things inside an archive. However, it also adds information to the archive and asset that they don't need to know (like the name, path, etc.). So, I definitely like Andrew's proposal if we could somehow unify the Archive and Asset interfaces inside Node (so we don't have methods to manipulate them independently). Maybe Archive extends Asset? We wouldn't need the ArchiveAsset anymore.

Changing the current API

The other subject we discussed was whether we need to change the current API (you can also see previous posts from Jason and Andrew). The main problem here lies in the Archive.getContent() method that returns a Map<ArchivePath,Asset>. I think we haven't seen any requests to change this, mainly because the most common use cases involve the creation of the archive, not its manipulation. However, we have already seen this need multiple times inside the group (I have in mind the full formatter Jason did and the JdkZipExporterDelegate). So, I definitely think users will start asking for this in the future. I also agree with Jason that we shouldn't be changing the API every time we want. However, I think we can find a balance here.

Currently, a new Asset called DirectoryAsset was created to return empty directories, which was a hack. The thing is that we need to be consistent here, meaning that we should return the full tree hierarchy or none of it (return all the directories, not only the empty ones). The idea that Andrew proposed, and that I support, is to return null as the value of the map entry if it is a directory. That way we can still always support that method. That doesn't mean that we can't add another method to return the direct children of the Archive. For example, if Archive extends Node, it will inherit those methods to traverse the archive and anyone could use them.
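A rough sketch of how a getContent()-style map could be derived from a Node tree with null marking directories. Assumptions: `ContentMapSketch`, `put` and `collect` are hypothetical names, and plain Strings stand in for both ArchivePath and Asset to keep the example self-contained.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: flatten a Node tree into path -> asset, null for directories. */
public class ContentMapSketch {
    static class Node {
        final Map<String, Node> children = new TreeMap<>();
        String asset; // null means this Node is a directory
    }

    /** Creates any missing nodes along the path; a null asset leaves a directory. */
    static void put(Node root, String path, String asset) {
        Node node = root;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                node = node.children.computeIfAbsent(segment, s -> new Node());
            }
        }
        node.asset = asset;
    }

    /** Returns every path in the tree, directories included. */
    static Map<String, String> getContent(Node root) {
        Map<String, String> content = new TreeMap<>(); // TreeMap permits null values
        collect(root, "", content);
        return content;
    }

    private static void collect(Node node, String prefix, Map<String, String> out) {
        for (Map.Entry<String, Node> child : node.children.entrySet()) {
            String path = prefix + "/" + child.getKey();
            out.put(path, child.getValue().asset); // null value marks a directory
            collect(child.getValue(), path, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        put(root, "/org/A.class", "bytes");
        put(root, "/META-INF", null); // empty directory
        System.out.println(getContent(root));
    }
}
```

One consequence of this design: `map.get(path) == null` no longer distinguishes "directory" from "no such path", so callers would need `containsKey` for that check.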

Sorry for the lengthy comment. However, I think it's a really important subject to address that still needs some more discussion before making any decision.
6. Re: Proposal: tree structure for Archive

alrubinger Feb 8, 2010 1:37 PM (in response to germanescobar)
germanescobar wrote:
These are two different concerns that are related here:
The need of the Item interface.
The methods isNode/isArchive could be removed (I agree that this kind of checking is a bit of a hack).

Starting with the second point, we need to know, somehow, if we are dealing with a Node, an Archive or an Asset while traversing the tree. I'm assuming that knowing if a Node is an Archive is a good idea b/c you could manipulate archives inside other archives.

I think ideally, we should always be dealing with a Node. Node is then the container type for traversal, and it can optionally contain an Asset. If no Asset, the "Node" is just a directory.

This opens the problem again: ArchiveAsset. "Archive" is kind of a special type of Asset, and I think that's a standing mismatch. Archives can't support "openStream" unless we encode them with some format (like ZIP, or our own thing that wouldn't require compression).

Right now, we comply with the "openStream" contract by *always* sending over ZIP. Which means that if we ever support TAR.GZ, for example, roundtripping a nested archive structure will result in the nested archive getting written out as ZIP, not the TAR.GZ from which it was imported.

So I think what we've gotta do is:
Take the existing "ArchiveAsset" impl and make it "ZipArchiveAsset"
Make a new API Archive type: "ZipArchive". Which means that any export it does will be as "ZIP" format.
And also ZipArchiveImpl
Change "archive.add(ArchivePath,Archive)" to instead accept "ZipArchive".
It means the user must be explicit about the encoding mechanism used to support nested archives. This is needed to preserve the proper format when roundtripping.
ZipImporter.import should return "ZipArchive" now

Now we'll always be able to represent archives as regular Assets with no special handling. ZipArchiveAsset knows the encoding type to use, and it opens the door for TarGzArchiveAsset / TarGzArchive in the future.

With all that in place I *think* we can then simply have Node.getContent(), which is always of type Asset. (No checking to see if it contains an Archive or an Asset). Or is there a case I haven't covered here?
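To make the "archives as regular Assets" idea concrete, here is a hedged sketch built on `java.util.zip` (Java 9+ for `transferTo`/`readAllBytes`). The type names `ZipArchive` and `ZipArchiveAsset` come from the list above, but their bodies, `ByteArrayAsset` as written here, and the `readZip` helper are assumptions made for illustration, not the ShrinkWrap implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch; type names follow the post, the bodies are assumptions. */
public class ZipArchiveAssetSketch {
    /** Stand-in for the Asset contract: anything that can be opened as a stream. */
    interface Asset {
        InputStream openStream();
    }

    static class ByteArrayAsset implements Asset {
        private final byte[] bytes;
        ByteArrayAsset(byte[] bytes) { this.bytes = bytes.clone(); }
        @Override public InputStream openStream() { return new ByteArrayInputStream(bytes); }
    }

    /** An archive whose export format is fixed to ZIP by its type. */
    static class ZipArchive {
        private final Map<String, Asset> content = new LinkedHashMap<>();
        ZipArchive add(String path, Asset asset) { content.put(path, asset); return this; }
        /** Nesting takes an explicit ZipArchive, so the encoding is never guessed. */
        ZipArchive add(String path, ZipArchive nested) { return add(path, new ZipArchiveAsset(nested)); }
        Map<String, Asset> getContent() { return content; }
    }

    /** Presents a ZipArchive as a plain Asset by encoding it as ZIP on demand. */
    static class ZipArchiveAsset implements Asset {
        private final ZipArchive archive;
        ZipArchiveAsset(ZipArchive archive) { this.archive = archive; }

        @Override public InputStream openStream() {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
                for (Map.Entry<String, Asset> entry : archive.getContent().entrySet()) {
                    zip.putNextEntry(new ZipEntry(entry.getKey()));
                    try (InputStream in = entry.getValue().openStream()) {
                        in.transferTo(zip); // nested archives are just another stream
                    }
                    zip.closeEntry();
                }
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
            return new ByteArrayInputStream(buffer.toByteArray());
        }
    }

    /** Helper for the demo: reads every entry of a ZIP stream into memory. */
    static Map<String, byte[]> readZip(InputStream in) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                entries.put(e.getName(), zip.readAllBytes());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }

    public static void main(String[] args) {
        ZipArchive jar = new ZipArchive()
            .add("hello.txt", new ByteArrayAsset("hi".getBytes(StandardCharsets.UTF_8)));
        ZipArchive ear = new ZipArchive().add("lib/test.jar", jar);
        Map<String, byte[]> outer = readZip(new ZipArchiveAsset(ear).openStream());
        Map<String, byte[]> inner = readZip(new ByteArrayInputStream(outer.get("lib/test.jar")));
        System.out.println(new String(inner.get("hello.txt"), StandardCharsets.UTF_8));
    }
}
```

The export loop never checks whether an entry is an archive or a plain asset; the nested archive's format is decided by the Asset type that wraps it, which is the property the post is after.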

germanescobar wrote:
The idea that Andrew proposed (again, correct me if I'm wrong) is to leave only a tree of Node objects with the following methods (the Item interface would disappear):

getNodes() - returns child nodes.
getAssets() - returns the assets that are inside the node.
getArchives() - returns the archives that are inside the node.

If we take the above into account, I think we can have:

interface Node {
     Set<Node> getChildren(); // Subdirectories
     Set<Asset> getContents(); // Stuff in this directory only, or call it "getAssets();"
}

germanescobar wrote:
For example, if Archive extends Node, it will inherit those methods to traverse the archive and anyone could use them.

"Archive", an API element, can't extend (or be a type of) "Node", an SPI element. For traversal inside the implementation, an SPI "ArchiveProvider" might be the hook in:

interface ArchiveProvider { Node getRoot(); }

germanescobar wrote:
Sorry for the lengthy comment. However, I think it's a really important subject to address that still needs some more discussion before making any decision.

Ah, don't apologize for being thorough!

Thoughts?

S,
ALR
7. Re: Proposal: tree structure for Archive

germanescobar Feb 9, 2010 8:24 AM (in response to alrubinger)
ALRubinger wrote:

I *think* we can then simply have Node.getContent(), which is always of type Asset. (No checking to see if it contains an Archive or an Asset). Or is there a case I haven't covered here?

Maybe the case in which you want to add content directly to an archive inside another archive? For example:

earArchive.add(asset, "test.jar/org/...");
8. Re: Proposal: tree structure for Archive

alrubinger Feb 9, 2010 9:14 AM (in response to germanescobar)
I think we can get away with either:

Requiring an import of that archive to another view first
Detecting the runtime type and doing the import for the user transparently

S,
ALR
9. Re: Proposal: tree structure for Archive

aslak Feb 9, 2010 9:43 AM (in response to germanescobar)
I just want to add a couple of my ideas on how it could work without going too deep into the details of the impl.

The simple case is Path and Asset. A Path is a chain of nodes that results in an Asset/Content.
On the other hand, an Archive is both Node and Content.

Archives can be libraries or modules, added from File/URL/ClassLoaderResource or other ShrinkWrap Archives.
An Archive can be imported, exported, merged and contain nested Archives.
An Archive can be of any type (exploded, zip, tar, etc.) and can contain nested Archives of any type.
A nested Archive should be able to be exported in the same fashion it was imported.
It should be possible to reach Paths/Nodes inside a nested Archive.
Most Asset types we have now are handled in a lazy fashion, and Archives shouldn't be any different.

So with these requirements I'm thinking:

An added Archive should not be imported on add, but rather hold a ref to its source. If the user attempts to read the Archive, we should import its structure, but the original Archive is still unchanged, so on export we can export the original ref. If the user attempts to write to the Archive, then we can mark it as 'dirty' and a complete export is needed.
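The read/write/dirty behaviour described above might look roughly like this. It is only a sketch under loud assumptions: `LazyArchive` and its methods are hypothetical, and a toy "path=value per line" text format stands in for a real archive format such as ZIP or TAR.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a lazily imported nested archive with a dirty flag. */
public class LazyNestedArchiveSketch {
    /** Toy stand-in for a real importer: one "path=value" entry per line. */
    static Map<String, String> importEntries(byte[] source) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : new String(source, StandardCharsets.UTF_8).split("\n")) {
            String[] parts = line.split("=", 2);
            if (parts.length == 2) {
                entries.put(parts[0], parts[1]);
            }
        }
        return entries;
    }

    /** Toy stand-in for a real exporter, writing the same format back out. */
    static byte[] exportEntries(Map<String, String> entries) {
        StringBuilder sb = new StringBuilder();
        entries.forEach((path, value) -> sb.append(path).append('=').append(value).append('\n'));
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    static class LazyArchive {
        private final byte[] source;          // original bytes, never modified
        private Map<String, String> entries;  // null until the first read or write
        private boolean dirty;

        LazyArchive(byte[] source) { this.source = source.clone(); }

        /** Imports the structure on first access only. */
        private Map<String, String> entries() {
            if (entries == null) {
                entries = importEntries(source);
            }
            return entries;
        }

        String get(String path) { return entries().get(path); }

        void put(String path, String value) {
            entries().put(path, value);
            dirty = true; // from now on the original bytes are stale
        }

        boolean isImported() { return entries != null; }
        boolean isDirty() { return dirty; }

        /** Untouched archives are copied through; modified ones are re-encoded. */
        byte[] export() {
            return dirty ? exportEntries(entries) : source.clone();
        }
    }

    public static void main(String[] args) {
        LazyArchive nested = new LazyArchive("a.txt=1\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(nested.isImported()); // nothing parsed yet
        nested.put("b.txt", "2");
        System.out.print(new String(nested.export(), StandardCharsets.UTF_8));
    }
}
```

Note that a read alone triggers the import but leaves the archive clean, so export still returns the original bytes.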

If a Zip Archive contains a Tar, we can add an Asset to the Tar, and on export of the parent Archive the nested Archive is still exported as a Tar. The same goes for adding an Exploded Archive with nested Zip Archives: adding an Asset to the nested Zip Archive and exporting the parent should result in an Exploded directory with Zipped Archives. The same applies to Exploded directories inside Zip Archives, for instance.

Examples
{code}
// This will simply result in a stream copy from FileA to FileB, nothing else is needed
Archives.create("some.jar", Importer.class).import(FileA).as(Exporter.class).exportFile(FileB)

// This will export to FileB, but FileA streamed from File
Archives.create("some.jar", WebArchive.class)
   .addLibrary(FileA)
   .get(ArchivePaths.create("/FileA/test.txt"))
   .as(Exporter.class)
   .export(FileB)

// This will export to FileB and FileA is recreated on export due to the add
Archives.create("some.jar", WebArchive.class)
   .addLibrary(FileA)
   .add(ArchivePaths.create("/FileA/test.txt"))
   .as(Exporter.class)
   .export(FileB)
{code}

WDYT?
10. Re: Proposal: tree structure for Archive

aslak Feb 9, 2010 9:59 AM (in response to aslak)

oh, and I forgot..

You should be able to tell the Exporter that you want all nested Archives exported as the parent.
That means you can have Zips inside an Exploded Directory that get exported as Exploded, or Tars inside Zips that get exported as Zips, etc.
11. Re: Proposal: tree structure for Archive

alrubinger Feb 9, 2010 10:42 AM (in response to aslak)

aslak wrote:

The simple case is Path and Asset. A Path is a chain of nodes that results in an Asset/Content.
On the other hand, an Archive is both Node and Content.
A Path is an address; a representation of a Node and hierarchy.

An archive *has* a Node (which is the root for all content). Also, a Node may contain an archive (in the case of nested archives). But in my proposals above, an Archive is not a Node, nor is a Node an Archive.

aslak wrote:
An added Archive should not be imported on add, but rather hold a ref to its source. If the user attempts to read the Archive, we should import its structure, but the original Archive is still unchanged, so on export we can export the original ref. If the user attempts to write to the Archive, then we can mark it as 'dirty' and a complete export is needed.

Agree that an archive should not be imported on add. But I don't think we should do any automounting at all; if the user wants to mutate the archive's contents, he/she should import and add the result.

As we've seen with the recent VFS work, autoimporting/mounting opens the door to much more complex, and sometimes unsolvable issues. Starting with marking as "dirty" etc.

S,
ALR
12. Re: Proposal: tree structure for Archive

aslak Feb 9, 2010 12:10 PM (in response to alrubinger)

Do you have any ref to the issues they are/were having? mailinglist/forum/jira....

.. curious ..
13. Re: Proposal: tree structure for Archive

germanescobar Feb 12, 2010 6:50 AM (in response to aslak)
This is what I've done so far:

2 new interfaces in the org.jboss.shrinkwrap.spi package:
Node - it has a getAsset() method; if it returns null, we know it is a directory.
NodeArchive - with a single getRoot() method that returns a Node.
The implementations in the org.jboss.shrinkwrap.impl.base package:
NodeArchiveBase
NodeArchiveImpl - which implements NodeArchive.
NodeImpl - which implements Node

Things I would like to change soon:
I'm still using type checking to see if an Asset is an ArchiveAsset (and be able to work with nested archives).
The NodeArchiveBase.getContent() method, which returns a Map<ArchivePath,Asset>, is returning the assets and empty directories only (as a DirectoryAsset) to comply with the current tests. We said that it should return all directories (empty or not) as null.

I still don't understand how clients are supposed to use the SPI NodeArchive interface as casting is not possible (the archive could have been assigned to an extension). For example, for a Formatter:

public String format(final Archive<?> archive) throws IllegalArgumentException {
     // how am I going to cast to NodeArchive?
}

Because of this same question, I'm having trouble when working with nested archives.
14. Re: Proposal: tree structure for Archive

aslak Feb 12, 2010 7:32 AM (in response to germanescobar)

Formatter is API while Node is SPI, so strictly speaking it shouldn't know about it.

The Formatters can use Archive.getContent(), as they do today?