Design notes for supporting managed exploded deployments

Version 18

Created by brian.stansberry on Apr 27, 2016 11:30 AM. Last modified by ehugonnet on Sep 20, 2016 8:05 AM.

Issue analysis and design document for the feature request to support managed exploded deployments.

Overview

The intent of this feature is to have equivalent runtime semantics to what WildFly/EAP currently provides for unmanaged exploded deployments, but for deployments whose content is stored in and managed by the kernel content repository. Instead of the user needing to manipulate the local filesystem to make changes to the deployment, as is required for an unmanaged deployment, the user would invoke operations against the WildFly/EAP management API.

Background

The WildFly/EAP management layer supports deployments presented to it by the user either in the form of a single zipped archive file, or in the form of a directory tree whose layout corresponds to an unzipped version of the archive. The latter is referred to as an "exploded deployment". Exploded deployments have some advantages, particularly the ability to update static content files (e.g. .html and .css files) and have that updated content reflected in the running application without needing to redeploy the application.

WildFly/EAP also supports both managed and unmanaged deployments. With a managed deployment the server takes the deployment content and copies it into an internal content repository and thereafter uses that copy of the content, not the original user-provided content. The server is thereafter responsible for the content it uses. With an unmanaged deployment the user provides the local filesystem path of deployment content, and the server directly uses that content. However, the user is responsible for that content, e.g. for making sure that no changes are made to it that will negatively impact the functioning of the deployed application.

Managed deployments have a number of benefits over unmanaged:

They can be manipulated by remote management clients, not requiring access to the server filesystem.
In a managed domain, WildFly/EAP will take responsibility for replicating a copy of the deployment to all hosts/servers in the domain where it is needed. With an unmanaged deployment, it is the user's responsibility to have the deployment available on the local filesystem on all relevant hosts, at a consistent path.
The deployment content actually used is stored on the filesystem in the internal content repository, which should help shelter it from unintended changes.

WildFly/EAP currently supports both managed and unmanaged archive deployments, and unmanaged exploded deployments. This RFE is to add support for the managed exploded permutation.

Issue Metadata

EAP ISSUE: https://issues.jboss.org/browse/EAP7-204

Other at least somewhat related JIRAs are [WFCORE-379] Give option to make the content repository browsable, and [WFCORE-380] Operations to read content from the content repository.

DEV CONTACTS: Emmanuel Hugonnet (primary), Brian Stansberry (secondary), Jean-François Denise (CLI), Claudio Miranda (HAL)

QE CONTACTS: Michal Jurc

AFFECTED PROJECTS OR COMPONENTS: WildFly Core kernel, CLI ([WFCORE-1594] CLI to support attached file streams), HAL (https://issues.jboss.org/browse/EAP7-583)

OTHER INTERESTED PARTIES: JBDS

Requirements

Hard Requirements

These items must be satisfied in order to have a satisfactory feature.

Ability to have equivalent runtime semantics to what WildFly currently provides for unmanaged exploded deployments, but instead for deployments whose content is stored in and managed by the content repository.
The deployment resource's management API will include a read-only boolean managed attribute which indicates whether the deployment content is managed by the server. This is a convenience to the user since the behavior of some operations will depend on whether the deployment is managed.
The deployment resource's management API will include an explode operation, which will take an archive deployment and explode it.
- This operation will fail if the deployment is not managed.
- This operation will fail if the process type is ProcessType.SELF_CONTAINED (i.e. WildFly Swarm).
- This operation will fail if the content is already exploded, unless the path parameter discussed below is defined.
- This operation will fail if the deployment is currently installed in the runtime.
  - Supporting explode + redeploy in one op is theoretically possible, but the user can get the same semantic with a 3 op batch/composite (undeploy/explode/deploy) so I see little reason to invest resources and complicate the existing deployment handling by trying to account for this case.
- The exploding will be completely "non-recursive" by default. Non-recursive does not refer to not recursing into directory trees while exploding; rather it refers to not recursing into nested zip files inside the content and exploding those as well. Some other term than non-recursive is welcome.
- A successful invocation of the explode operation will make the archived version of the deployment immediately eligible for removal from the content repository, unless there is another management resource that refers to its hash. This isn't really a requirement of this feature; it's a general requirement of the content repo in that unreferenced content is eligible for removal, and that removal can be immediate in the case where the only reference is actively deleted.
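The failure conditions listed above for explode can be summarised as a small precondition check. The following is only a sketch: ExplodePreconditions, DeploymentState and check are hypothetical names invented for illustration and are not part of the WildFly API, and the selfContained flag merely stands in for a ProcessType.SELF_CONTAINED test.

```java
import java.util.Optional;

/** Hypothetical model of the explode preconditions; none of these types are WildFly API. */
final class ExplodePreconditions {

    /** Assumed minimal view of a deployment's state. */
    static final class DeploymentState {
        final boolean managed;
        final boolean exploded;
        final boolean installedInRuntime;

        DeploymentState(boolean managed, boolean exploded, boolean installedInRuntime) {
            this.managed = managed;
            this.exploded = exploded;
            this.installedInRuntime = installedInRuntime;
        }
    }

    /**
     * Returns a failure description if explode must be rejected, or empty if it may proceed.
     * selfContained stands in for ProcessType.SELF_CONTAINED; path is the optional
     * (nice-to-have) path parameter and may be null.
     */
    static Optional<String> check(DeploymentState d, boolean selfContained, String path) {
        if (!d.managed) {
            return Optional.of("deployment is not managed");
        }
        if (selfContained) {
            return Optional.of("process type is SELF_CONTAINED");
        }
        if (d.exploded && path == null) {
            return Optional.of("content is already exploded and no path was given");
        }
        if (d.installedInRuntime) {
            return Optional.of("deployment is currently installed in the runtime");
        }
        return Optional.empty();
    }
}
```

With this shape, a handler would call check first and fail the operation with the returned message, which keeps all rejections in one place.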
The deployment resource's management API will include an add-content operation with a non-null target-path parameter and a content parameter that is similar to the content parameter in the add operation.
- This operation will fail if the deployment is not managed.
- This operation will fail if the content is not exploded.
- This operation will fail if any content between the root of the deployment and the content referred to by the target-path parameter is an unexploded archive.
- Unless it fails in Stage.MODEL, this operation will succeed regardless of whether the deployment is currently deployed in the runtime, and the new content will be visible to runtime services the same as if a user copied a new file into an unmanaged exploded deployment.
- This operation will have an overwrite parameter (true by default). If overwrite is set to false and the target path of the content already exists, the operation will fail.
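As an illustration of the add-content rules above, here is a minimal sketch that works directly on a directory tree. AddContentSketch and addContent are hypothetical names; a real handler would operate on the content repository rather than a plain directory. The sketch also assumes that any regular file found on the way to the target is an unexploded archive, and that a leading slash in the target path is relative to the deployment root.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of add-content semantics on an exploded directory tree. */
final class AddContentSketch {

    static void addContent(Path deploymentRoot, String targetPath, InputStream content,
                           boolean overwrite) throws IOException {
        Path root = deploymentRoot.toAbsolutePath().normalize();
        // Assumption: a leading '/' means "relative to the deployment root".
        Path target = root.resolve(targetPath.replaceFirst("^/+", "")).normalize();
        if (target.equals(root) || !target.startsWith(root)) {
            throw new IllegalArgumentException("invalid target path: " + targetPath);
        }
        // Fail if anything between the root and the target is a file (e.g. an unexploded archive).
        for (Path p = target.getParent(); !p.equals(root); p = p.getParent()) {
            if (Files.isRegularFile(p)) {
                throw new IllegalStateException("not exploded: " + root.relativize(p));
            }
        }
        if (!overwrite && Files.exists(target)) {
            throw new IllegalStateException("content already exists at " + targetPath);
        }
        Files.createDirectories(target.getParent());
        if (overwrite) {
            Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
        } else {
            Files.copy(content, target);
        }
    }
}
```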
The deployment resource's management API will include a remove-content operation with a non-null path parameter.
- This operation will fail if the deployment is not managed.
- This operation will fail if the content is not exploded.
- This operation will fail if any content between the root of the deployment and the content referred to by the path parameter is an unexploded archive.
- Unless it fails in Stage.MODEL, if the deployment is currently deployed in the runtime this operation will attempt to update the content visible to runtime services the same as if a user removed a file from an unmanaged exploded deployment. However there are some additional semantics:
  - If the operation is unable to remove the file, (e.g. there is a file locking problem) the operation must not block indefinitely. After a fairly short time it should fail, with a Stage.RUNTIME failure, with the effect of that failure on the overall operation execution consistent with that of any other Stage.RUNTIME failure (e.g. respect the rollback-on-runtime-failure header.)
The deployment resource's add operation will allow no actual content to be provided with the op. This allows initial creation of an exploded managed deployment, with subsequent operations to add content to follow.
- This scenario must be formally specified via a new field empty in the content parameter. It's a requirement that ambiguity between a user wanting a new empty exploded deployment and just forgetting to provide content not be allowed.
- The CLI's deploy command should support this properly.
- This kind of add operation will fail if the process type is ProcessType.SELF_CONTAINED (i.e. WildFly Swarm).
- It is a requirement that there be clean semantics if the user attempts to deploy an empty exploded deployment into the runtime. Either:
  - The deployment succeeds, but does nothing, similar to deploying an empty file or a text file that no DUP converts into runtime services.
  - Or, preferably, but nice to have, deployment fails with a clear, targeted failure message.
  - Failing uncleanly, e.g. with an NPE in some DeploymentUnitProcessor, is not acceptable.
The deployment resource's management API will include a read-content operation with a non-null path parameter. The return value will be the index of an associated stream from which the content at the given path can be read.
- This operation will fail if the deployment is not managed.
- This operation will fail if the content is not exploded.
- This operation will fail if any content between the root of the deployment and the content referred to by the path parameter is an unexploded archive.
- This operation will fail if the content referred to by the path parameter is a directory.
- The return value will be the content that is present in the content repository, which will not necessarily be equivalent to the content installed in the runtime.
  - For example, if an update-content or remove-content operation previously failed in Stage.RUNTIME, but the rollback-on-runtime-failure=false header was set, the content repository content will not reflect the current runtime content.
- The read-content operation must be handled as a read-only operation but it also must not be interfered with by any content repository activity (writing or deleting files). This means the existing mechanism of locking the operation out of the repository using the ModelController's exclusive write lock is insufficient; some additional safeguards within the repository itself are probably needed. (The exclusive MC lock is probably still fine for preventing conflicts between gc and content writes, though.)
It is a requirement that unreferenced content be removed from the content repository by the background garbage collection processing, with semantics consistent with other content garbage collection. It should take one gc pass to identify that a piece of content is unreferenced (no matter how deeply nested it was in an exploded deployment) and then a second pass to remove the item. (If the last sentence is inaccurate in some way regarding how gc works, please assume the intent is not to require a change in content repo gc.)
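The two-pass behaviour described above (one pass to identify unreferenced content, a second to remove it) can be pictured with the following sketch. It is an in-memory model only: TwoPassGcSketch, store and gcPass are hypothetical names, content is identified by a hash string, and no claim is made that the real content repository gc is implemented this way.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory model of mark-then-remove garbage collection. */
final class TwoPassGcSketch {

    private final Set<String> stored = new HashSet<>();
    private final Set<String> markedObsolete = new HashSet<>();

    void store(String hash) {
        stored.add(hash);
    }

    /**
     * Runs one gc pass. The referenced set holds the hashes still referenced from the
     * configuration model. Returns the hashes removed by this pass.
     */
    Set<String> gcPass(Set<String> referenced) {
        Set<String> removed = new HashSet<>();
        // Content that became referenced again since the last pass is no longer obsolete.
        markedObsolete.removeAll(referenced);
        for (String hash : new HashSet<>(stored)) {
            if (referenced.contains(hash)) {
                continue;
            }
            if (markedObsolete.remove(hash)) {
                // Marked by an earlier pass and still unreferenced: remove it now.
                stored.remove(hash);
                removed.add(hash);
            } else {
                // First time seen as unreferenced: only mark it.
                markedObsolete.add(hash);
            }
        }
        return removed;
    }
}
```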
Timestamps. Operations which create new deployment content in the repo must support a mechanism for associating a timestamp with that item, which then becomes the timestamp that runtime services see for the file.
- For operations that take a content param, perhaps this can be done via a new timestamp field for that param.
- If no timestamp is provided, the timestamp is the execution time of the operation.
- The explode operation must preserve the per-file timestamp information contained in the zip being exploded (i.e. ZipEntry.getTime()) and use it for the exploded items.
- If a piece of content with a given hash is used in more than one deployment, the timestamp associated with that content must be that provided by the most recent operation that explicitly set a timestamp. Not the timestamp with the largest (i.e. most recent) value; the most recent operation.
  - However, only operations that explicitly set a timestamp are relevant. If an operation submits an undefined timestamp parameter or field, any existing timestamp should not be changed.
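A minimal sketch of a non-recursive explode that preserves the per-file timestamps from the zip follows. ExplodeSketch and explode are hypothetical names, and the sketch writes to a plain target directory instead of the content repository. Nested archives are copied as ordinary files, and ZipEntry.getTime() is applied to each exploded file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch: explode a zip one level deep, keeping entry timestamps. */
final class ExplodeSketch {

    static void explode(Path archive, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (InputStream in = Files.newInputStream(archive);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException("entry escapes target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                Files.createDirectories(out.getParent());
                // Nested zips (e.g. WEB-INF/lib/*.jar) are copied as-is: non-recursive.
                Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                long time = entry.getTime();
                if (time != -1) {
                    // Preserve the timestamp recorded in the zip for this file.
                    Files.setLastModifiedTime(out, FileTime.fromMillis(time));
                }
            }
        }
    }
}
```

Zip timestamps in the classic format have a two-second granularity, so the preserved value can differ slightly from the time the file originally carried before it was zipped.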
All functions must work properly in a managed domain.
- The explode, add-content and remove-content operations must propagate to all slave HCs, but can be ignored by slaves that are ignoring the target deployment.
- The explode, add-content and remove-content operations must propagate to all running servers whose server group has the deployment mapped.
The ability for the CLI to associate streams with the add-content and update-content operations is a nice-to-have. Doing so would require new CLI commands or changes to the existing deploy command.
Deployment overlays take precedence in the runtime over the exploded managed content the same as if the deployment was unmanaged or a managed archive. The add-content and update-content operations must not result in an overlay no longer taking precedence. The effect of remove-content when the content being removed has an overlay needs investigation. My inclination is to treat reacting to this as a nice-to-have as it's a corner case and there is a workaround where the user can first do remove-content and then remove the overlay.
HAL support. See Design Notes for Managed Exploded Deployment in HAL
Update the Controller-client module to provide support for these operations:
- being able to add an empty deployment,
- being able to add an exploded deployment,
- being able to explode a deployment,
- adding streams of data,
- removing contents

Nice-to-Have Requirements

These items are not required but would be nice to have if possible. Before the design phase completes it should be decided a) whether dev intends to meet the requirement, b) whether QE will test the requirement and c) what will happen if the requirement is met but not tested.

Support for some sort of recursion control param(s) to the explode operation that result in exploding the nested zips as well. This one is a nice-to-have that borders on a hard requirement, but it's something that could be deferred if not doable within a particular release's dev cycle.
The explode operation can take an optional path parameter, the value of which is a path within the deployment that should be exploded.
- The use case for this is scenarios like an exploded ear that contains an unexploded war, and then the user later wishes the war to be exploded.
- This is much closer to a hard requirement if the nice-to-have requirement for recursively exploding is not supported.
- This operation will fail if the content referred to by the path parameter is already exploded or is not a zip file.
- This operation will fail if any content between the root of the deployment and the content referred to by the path parameter is an unexploded archive.
  - So, /deployment=foo.ear:explode(path=/thewar.war/WEB-INF/lib/thejar.jar) would fail if thewar.war hadn't previously been exploded.
Instead of failing if the read-content operation is invoked for archived content or with a path parameter indicating a path inside archived child content, extract the content from the archive and return it.
Support for a browse-content operation that would return a tree of all the files and directories under the path given by a relative path parameter, down to the depth given by a depth parameter, and maybe an archive parameter that would restrict the listing to explodable files. This could help completion for the CLI as well as cover related issues. The browse-content operation will return a list of a complex type that provides some information about each path. It will have:
- a relative-path attribute : the relative path to the content for the specified browsing start point.
- a file attribute of type boolean to indicate if it is a file or a folder
- a size attribute (we won't compute the size of folder at the moment)
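One possible shape for the browse-content result is sketched below against a plain directory tree. BrowseContentSketch, Entry and browse are hypothetical names; the three fields mirror the relative-path, file and size attributes listed above, and -1 is used here as an assumed marker for "size not computed" on folders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the data a browse-content operation could return. */
final class BrowseContentSketch {

    /** One listed path: relative-path, file and size attributes. */
    static final class Entry {
        final String relativePath;
        final boolean file;
        final long size; // -1 for folders: folder sizes are not computed

        Entry(String relativePath, boolean file, long size) {
            this.relativePath = relativePath;
            this.file = file;
            this.size = size;
        }
    }

    /** Lists everything under start, at most depth levels deep, sorted by path. */
    static List<Entry> browse(Path start, int depth) throws IOException {
        List<Path> sorted;
        try (Stream<Path> paths = Files.walk(start, depth)) {
            sorted = paths.filter(p -> !p.equals(start)).sorted().collect(Collectors.toList());
        }
        List<Entry> result = new ArrayList<>();
        for (Path p : sorted) {
            boolean isFile = Files.isRegularFile(p);
            String relative = start.relativize(p).toString().replace('\\', '/');
            result.add(new Entry(relative, isFile, isFile ? Files.size(p) : -1L));
        }
        return result;
    }
}
```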
(Very low priority) Instead of failing if the read-content operation is invoked with a path parameter that points to a directory, instead create a zip from the directory tree at that location and return it.

Non-Requirements

These are items that are explicitly not required.

Additional runtime features beyond those currently provided by unmanaged exploded deployments are not a requirement of this design. For example, dynamic class reloading is out of scope.
For any of add-content and remove-content, if the deployment is deployed into the runtime when the operation is executed it is not a requirement of this feature that the server detect or react to any negative effect on the application. Using these operations with a deployed deployment is done at the user's own risk. This is consistent with the treatment of unmanaged exploded deployments.
Exposing individual pieces of exploded content as a management resource is not a requirement. (A reason for doing so would be to expose additional metadata beyond the content bits available via the read-content operation, e.g. timestamps.)
It is not a requirement that execution of a management operation that results in some piece of content no longer being referenced from the configuration model (directly or indirectly) also results in synchronous removal of that content from the repository. Removal can be deferred until a garbage collection task occurs. Note however that synchronous removal is not an anti-requirement; i.e. if it can be properly implemented it is fine.
It is not a requirement that operations which simply upload content to the repo without themselves associating it with a deployment provide a way to store a timestamp, so long as by the time the content is associated with a deployment the user has a way to specify the timestamp.
It is not a requirement that identical content be shared between deployments within the content repository (i.e. in order to save disk space.) For example, if two ears both contain an identical foo.jar in their lib directory, it is not a requirement to only keep one copy of foo.jar.

Design Details

Design information should be added to this section once the analysis phase is complete.

Installation of the exploded content into the runtime will consist of creating a standalone/tmp/managed-exploded/foo.war dir and then copying the content out of the repo into that dir. Deployment will then proceed as if standalone/tmp/managed-exploded/foo.war was the location of an unmanaged exploded deployment. This is simple and provides the required semantics that managed and unmanaged exploded deployments behave consistently in the runtime.
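The copy step described above could look roughly like the following. This is a sketch under assumptions: InstallSketch and install are hypothetical names, the repository content is taken to be an ordinary directory tree, and the tmp directory is passed in rather than resolved from the server environment. File attributes are copied so that timestamps stored with the content stay visible to runtime services.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: copy exploded content into tmp/managed-exploded/<name>. */
final class InstallSketch {

    static Path install(Path repoContent, Path tmpDir, String deploymentName) throws IOException {
        Path target = tmpDir.resolve("managed-exploded").resolve(deploymentName);
        Files.createDirectories(target);
        List<Path> sources;
        try (Stream<Path> paths = Files.walk(repoContent)) {
            // Files.walk visits a directory before its children, so parents are created first.
            sources = paths.collect(Collectors.toList());
        }
        for (Path src : sources) {
            Path dest = target.resolve(repoContent.relativize(src).toString());
            if (Files.isDirectory(src)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.COPY_ATTRIBUTES);
            }
        }
        // The returned directory is what would be handed on as an unmanaged exploded deployment.
        return target;
    }
}
```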

The following are some initial design thoughts which are no longer completely relevant. They are kept for the record or in case some of the verbiage can be repurposed.

Terms

ContentRepoItem -- an item in the content repository. Can represent a file or directory. If a file, it can either be the entire content of a non-exploded deployment, or a leaf in an exploded deployment.

Details

I expect runtime execution of the add-content, remove-content and update-content ops will consist of manipulating the contents of the standalone/tmp/managed-exploded/foo.war dir.
All ContentRepoItems will be represented by a directory in the content repo, with the path of the directory derived from a SHA-1 hash of an InputStream provided by the ContentRepoItem. The path will consist of a directory whose name is the first two chars of the string representation of the hash, and then a subdirectory whose name is the remaining chars. (This is how things are already done; this point is really just restating the existing design and suggesting that it continue with the new types of content.)
The InputStream used for hashing depends on the type of the ContentRepoItem:
- If it represents a non-directory file (non-exploded deployment or a leaf in an exploded deployment) the stream is a FileInputStream from the file.
- If it represents a non-empty directory, the stream is a SequenceInputStream, with the sequence consisting of another SequenceInputStream for each child, ordered by child name. That inner SequenceInputStream will consist of two streams, one ByteArrayInputStream containing the String name of the child, the other being whatever stream the ContentRepoItem for the child provides. The effect of this is the hash will reflect both the content of the children and the names in the parent under which that content is hashed. This ensures that two directories, each of which includes an identical file, but one has it in a child named "a" and the other in a child named "b", will hash differently.
- If it represents an empty directory, the stream will be a kind of "null input stream", i.e. one that always returns -1 from read().
Exploding a file requires hashing each ContentRepoItem, leaves and all directories. Updating an exploded deployment requires hashing the new content (if not a remove) and updating the hash of all parent directories. In both cases to make this efficient a large SequenceInputStream will be created that contains the entire tree of streams returned by the tree of ContentRepoItems, in a reproducible order. For each ContentRepoItem that needs a new hash value, the stream returned by the ContentRepoItem will be wrapped in a DigestInputStream and a ref to that DIS retained. The final resulting SequenceInputStream will be read fully, with the result that all child streams will be read once and only once, with the reads passing through the DigestInputStreams if present. Then the digest can be read from each DigestInputStream and the result used in creating and updating the persistent form of the ContentRepoItems.
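The hashing scheme described in this list can be sketched as follows. ContentHashSketch, streamFor, sha1Hex and repoPath are hypothetical names, and a ContentRepoItem is approximated by a filesystem path. For brevity the sketch hashes one item per call instead of collecting every per-item digest in a single pass, and it opens all child streams eagerly, which would need more care for large trees.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of hashing files and directories as described above. */
final class ContentHashSketch {

    /** File: its bytes. Directory: for each child ordered by name, the name then the child's stream. */
    static InputStream streamFor(Path item) throws IOException {
        if (!Files.isDirectory(item)) {
            return Files.newInputStream(item);
        }
        List<Path> children;
        try (Stream<Path> listing = Files.list(item)) {
            children = listing.sorted().collect(Collectors.toList());
        }
        List<InputStream> parts = new ArrayList<>();
        for (Path child : children) {
            byte[] name = child.getFileName().toString().getBytes(StandardCharsets.UTF_8);
            parts.add(new SequenceInputStream(new ByteArrayInputStream(name), streamFor(child)));
        }
        // An empty directory yields an empty sequence, i.e. a stream that returns -1 at once.
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    static String sha1Hex(Path item) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        try (DigestInputStream in = new DigestInputStream(streamFor(item), digest)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading through the DigestInputStream is what updates the digest.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** First two hex chars name the directory, the remaining chars the subdirectory. */
    static String repoPath(String hex) {
        return hex.substring(0, 2) + "/" + hex.substring(2);
    }
}
```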

Decomposition

This work doesn't seem highly decomposable into pieces that can be worked on in parallel, but the following are some conceptual pieces that can perhaps allow parallel work or more likely handing off of the work from one person to another.

Basic content repo changes to support exploding and updating of exploded content.
Management API changes, new handlers. Most likely this and the basic content repo changes would be done hand in hand as the handlers require the content repo and the management API is a simple way to experiment with the effect of the content repo changes.
Installation of the exploded content into the runtime and update thereof.
Content repo garbage collection.
CLI deploy command changes.
Web console changes.
