Issue analysis for the request to allow registration of operations that do not affect configuration in the "profile" section of the domain-wide management resource tree.
Overview
A WildFly managed domain uses the "profile" section of the resource tree to centrally manage the subsystems that are used on domain servers that are configured to use that profile. This central management has focused on ensuring all servers using the profile are using the persistent configuration values stored in domain.xml, with the DC rolling out changes to the persistent configuration to all the servers. But, conceptually, the DC could also roll out requests that do not involve configuration, e.g. a request that all servers running the profile flush a connection pool. We have historically not allowed such usage as a matter of policy, even though the software could allow it. This task is about changing that policy.
The basic issue here is considering and addressing the implications of the policy change.
This issue is solely about whether this policy should be relaxed and any changes to the kernel that would be made in order to allow this policy to be relaxed. It is not about any particular set of operations a subsystem may wish to register against profile resources if the policy is relaxed. If the policy is relaxed, a subsystem taking advantage of that is implementing a feature and that feature should be independently analyzed and developed. It is not necessarily a good idea for a subsystem to expose a runtime operation on the profile tree resource. Doing so means the operation may be executed on many servers, quite likely concurrently. If that operation has independent effects on each server this is probably not an issue. But if each server is updating some shared external resource, then the subsystem author needs to consider whether that shared resource will properly handle multiple concurrent updates.
Issue Metadata
EAP ISSUE: https://issues.jboss.org/browse/EAP7-285
RELATED ISSUES:
[WFCORE-389] Alllow non persistent configuration(runtime) changes for server groups and domain using CLI - JBoss Issue T…
DEV CONTACTS: Brian Stansberry
QE CONTACTS:
AFFECTED PROJECTS OR COMPONENTS: WildFly Core kernel
OTHER INTERESTED PARTIES: Elytron, mod_cluster
Background
There are three basic things a WildFly management resource does:
1. Maintains a chunk of persistent configuration, reading the values from disk on boot, storing them in memory, making them available to users, and persisting changes back to disk.
2. Manages a set of runtime services, typically MSC services.
3. In a managed domain, acts as an execution point for coordinated rollout of operations to all of the relevant processes in a domain. For domain-wide resources (i.e. those persisted to domain.xml), the Domain Controller ensures that the operation is rolled out to all Host Controllers and to all relevant servers. For some host-specific resources (i.e. those persisted to a host.xml), the Host Controller ensures that the operation is rolled out to all servers managed by that HC. So, a user executing an operation against one of these resources can conveniently have the effect of that operation realized on all the relevant processes in the domain.
Not all operations against a resource need do all of these things. In particular, an operation may not affect persistent configuration, but instead only result in some change to the resource's runtime services. Such an operation is a runtime-only operation.
The /profile=* portion of the WildFly management resource tree is a domain-wide portion of the tree, managed by the Domain Controller to ensure that all operations executed against those resources are executed by all slave HCs and all servers that use the profile. For these resources, we do not allow #2 above; i.e. these resources are not allowed to manage any runtime services. There is no intention to change this. However, along with disallowing #2, we have also disallowed #3 (coordinated rollout) for runtime-only operations. Yet just because an operation neither affects persistent configuration nor is allowed to affect runtime on the HC, that doesn't mean it can't benefit from coordinated rollout across the domain.
Why no Runtime Services on the Profile Resources?
There are a number of reasons we do not allow profile resources to manage runtime services. Before getting into those, it's important to remember that the profile=*/** resources only exist on Host Controller processes (including of course the master Domain Controller.) We're not talking about the /host=x/server=y/subsystem=z resources here.
So why can't these resources manage runtime services?
- Convenience. Extensions generally register the same set of resource definitions under the profile=* tree on the HC and on servers. This is quite convenient. The standard code provided by the kernel makes it quite easy for these extensions to avoid trying to execute runtime on the HC. Since in the vast bulk of use cases there is no benefit to a runtime service on the HC, it is important to keep this convenient.
- Likely source of bugs. Runtime service management that works well on a server will almost certainly fail on an HC, because on an HC the same subsystem resource can appear in multiple profiles. If the subsystem does not account for this, almost certainly there will be service name conflicts. So service handling code written for a server resource cannot be applied to the related profile resources.
- Unclear execution semantics. If we allowed runtime service management for profile resources, all hosts in the domain would need to manage those services. This would not be a "DC only" thing. If it were a DC-only thing, how are cases like DC failover to be handled?
- More difficult subsystem generation. We really need to make it easier to generate subsystems and eliminate much of the boilerplate currently involved. Adding the potential for runtime services on the profile resources to the mix makes this already difficult task even more difficult.
While addressing these issues is conceptually possible, doing so is outside the scope of this work. There is no intention to support runtime services behind profile resources as part of this work. To the contrary, the intent is to more strictly enforce prevention of such, in order to prevent subsystem authoring mistakes.
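The service name conflict described under "Likely source of bugs" above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the ServiceNameConflictSketch class is a hypothetical stand-in for a service container (it is not the MSC API), and the "example.datasources.ExampleDS" naming scheme is invented. It only shows why a name derived from the subsystem address alone collides once the same subsystem resource appears in two profiles on one HC.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a service container; NOT the real MSC API.
final class ServiceNameConflictSketch {

    // Installed service name -> description of the resource that installed it.
    private final Map<String, String> installed = new HashMap<>();

    /** Installs a service and fails on a duplicate name, as a container would. */
    void install(String serviceName, String owner) {
        String previous = installed.putIfAbsent(serviceName, owner);
        if (previous != null) {
            throw new IllegalStateException(
                    "Service " + serviceName + " already installed by " + previous);
        }
    }

    /** Server-style naming (assumed): derived from the subsystem address only. */
    static String serverStyleName(String subsystem, String resource) {
        return "example." + subsystem + "." + resource;
    }

    /** Returns true if installing the same subsystem resource once per profile conflicts. */
    static boolean conflictsAcrossProfiles(String... profiles) {
        ServiceNameConflictSketch container = new ServiceNameConflictSketch();
        try {
            for (String profile : profiles) {
                container.install(serverStyleName("datasources", "ExampleDS"),
                        "profile=" + profile);
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A server sees one profile's worth of resources: no conflict.
        System.out.println(conflictsAcrossProfiles("default"));       // prints false
        // An HC holds every profile: the same name is installed twice.
        System.out.println(conflictsAcrossProfiles("default", "ha")); // prints true
    }
}
```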
Management Operation Workflow
All WildFly management operations follow the same basic workflow:
1. Any changes made by the operation to the target resource's in-memory management model are performed by the handler for the operation.
2. The handler for the operation decides if the operation needs to make any changes to the local process' runtime services. If so, a step to do that is added.
3. That step updates runtime services.
4. Service container verification checks are performed by the kernel.
5. If the operation affects multiple processes in a domain, domain rollout occurs:
   5.1. DC sends the op to all slave HCs, which locally repeat steps 1-4.
   5.2. All HCs analyze which of their servers are affected by the op and send back to the DC information about what operations it needs to invoke on which servers to make the change.
   5.3. DC formulates a rollout plan for the op, deciding in which order to invoke on which servers.
   5.4. Each server receives the op and locally repeats steps 1-4.
6. If all is well, configuration changes are persisted on the DC.
7. If the operation affects multiple processes in a domain, the DC instructs all slaves and affected servers to commit the change.
There are a couple of aspects of this that are relevant to the question of runtime-only ops on "profile" resources:
- Operation handlers for a /profile=*/subsystem=* resource should never make runtime changes while executing on a DC or slave HC. That is, item 2. above should never add a step, item 3. doesn't happen and item 4. isn't necessary
- A "runtime-only" operation is one that does not involve changes to the resource's persistent configuration. That is, item 1. above is a no-op.
The upshot of all this is that a "runtime-only" operation on a profile resource is a very simple thing. The handler doesn't do anything to update the model (same as how it functions on a server) and then ensures that it doesn't register a runtime step if executing on a DC or HC. The kernel then takes over from item 5 above and rolls the operation out to the slave HCs, where the handler makes no model changes and doesn't add a runtime step. Finally, when the op is rolled out to the servers (item 5.4), the handler on each server decides to add the runtime step and item 3 occurs.
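The behaviour described above can be sketched as a small simulation. This is a toy model under stated assumptions, not kernel code: ProcessKind, Outcome and the methods below are hypothetical names invented for the sketch. A real handler would consult the OperationContext (for example via the isDefaultRequiresRuntime() method mentioned in the requirements) rather than an enum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a runtime-only op on a profile resource; all types are hypothetical.
final class RuntimeOnlyHandlerSketch {

    enum ProcessKind { DOMAIN_CONTROLLER, SLAVE_HOST_CONTROLLER, DOMAIN_SERVER }

    /** What one process did while executing the operation locally. */
    static final class Outcome {
        boolean modelChanged;     // workflow item 1
        boolean runtimeStepAdded; // workflow item 2
        boolean runtimeStepRan;   // workflow item 3
    }

    /** Local execution of the handler on a single process. */
    static Outcome execute(ProcessKind process) {
        Outcome outcome = new Outcome();
        // Item 1: a runtime-only operation never touches the model, on any process.
        outcome.modelChanged = false;
        // Item 2: only a server adds the runtime step; a DC or HC must not.
        if (process == ProcessKind.DOMAIN_SERVER) {
            outcome.runtimeStepAdded = true;
            // Item 3: the added step then updates runtime services.
            outcome.runtimeStepRan = true;
        }
        return outcome;
    }

    /** Item 5: walks the rollout order and reports where runtime work happened. */
    static List<ProcessKind> processesThatRanRuntimeStep(List<ProcessKind> rolloutOrder) {
        List<ProcessKind> ran = new ArrayList<>();
        for (ProcessKind process : rolloutOrder) {
            if (execute(process).runtimeStepRan) {
                ran.add(process);
            }
        }
        return ran;
    }

    public static void main(String[] args) {
        List<ProcessKind> rollout = Arrays.asList(
                ProcessKind.DOMAIN_CONTROLLER,
                ProcessKind.SLAVE_HOST_CONTROLLER,
                ProcessKind.DOMAIN_SERVER,
                ProcessKind.DOMAIN_SERVER);
        // Only the two servers perform runtime work.
        System.out.println(processesThatRanRuntimeStep(rollout));
    }
}
```

The point of the sketch is that the same handler logic runs everywhere; the only branch is whether the local process is a server.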
Requirements
Hard Requirements
- If an operation is registered with a ManagementResourceRegistration in the profile=* tree with an OperationDefinition that includes the OperationEntry.Flag.RUNTIME_ONLY flag, the DC must roll it out to the domain the same as it rolls out operations without that flag (with one exception noted below).
- This is already the case, and likely has been the case since AS 7.0. As mentioned in the Overview, this issue is primarily a policy change question, not a software change request.
- Basic verification and regression testing of this fact must be added to the WildFly Core domain testsuite.
- The one exception to the point above is that an operation targeting a domain-wide resource that has the OperationEntry.Flag.RUNTIME_ONLY flag should be rolled out to the domain even if it also contains the OperationEntry.Flag.READ_ONLY flag. Currently read-only ops against domain-wide resources are not rolled out, because only the model is being read, and the model is present locally. But for a domain-wide resource, there are no runtime services to read on the DC/HC, so RUNTIME_ONLY + READ_ONLY can only mean to roll the op out to the servers and gather the results. See [WFCORE-2858] Roll out READ_ONLY + RUNTIME_ONLY ops to the domain - JBoss Issue Tracker
- The kernel must make it easy for OperationStepHandler implementations to avoid trying to execute on a DC/HC, i.e. make it easy to implement item 2. from the Background section:
- The OperationContext already exposes an isDefaultRequiresRuntime() method which will provide the correct answer for most use cases, returning false if the process type is not a server or if the target address isn't in the /host=*/subsystem=* tree.
- The AbstractRuntimeOnlyHandler class, a convenient base class for OSHs that perform runtime-only ops, must be updated to perform such a check before adding the runtime-only step. The check should be overridable so classes that have other rules than OperationContext.isDefaultRequiresRuntime() (e.g. elytron) can apply their own rules. See [WFCORE-2850] AbstractRuntimeOnlyHandler should not add its step on a profile=* resource - JBoss Issue Tracker
- The OperationContext should reject invocations of its methods that indicate a desire to modify the runtime if the active step is in the /profile=* tree. See [WFCORE-2815] Reject service modification in the /profile=* resource tree - JBoss Issue Tracker
- The OperationContext should reject attempts to register Stage.RUNTIME steps if the step address is in the /profile=* tree. See [WFCORE-2849] Disallow addition of Stage.RUNTIME steps for /profile=*/subsystem=* resources - JBoss Issue Tracker
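The rollout rule and the profile-tree restriction from the hard requirements above can be summarized in a short sketch. It is illustrative only: the Flag enum is a hypothetical mirror of the two OperationEntry.Flag values named above, addresses are modelled as plain strings, and neither method corresponds to an actual kernel API.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative decision rules only; none of these names are kernel APIs.
final class ProfileRolloutRulesSketch {

    /** Hypothetical mirror of the two OperationEntry.Flag values discussed above. */
    enum Flag { READ_ONLY, RUNTIME_ONLY }

    /** Should the DC roll an op on a domain-wide resource out to the domain? */
    static boolean shouldRollOut(Set<Flag> flags) {
        boolean readOnly = flags.contains(Flag.READ_ONLY);
        boolean runtimeOnly = flags.contains(Flag.RUNTIME_ONLY);
        // A pure model read is answered locally. Anything else is rolled out,
        // including READ_ONLY + RUNTIME_ONLY, since only servers have runtime to read.
        return !readOnly || runtimeOnly;
    }

    /** May a runtime-stage step be registered for this step address? */
    static boolean runtimeStepAllowed(String stepAddress) {
        // Steps addressed into the profile tree must never touch the runtime.
        return !stepAddress.startsWith("/profile=");
    }

    public static void main(String[] args) {
        System.out.println(shouldRollOut(EnumSet.noneOf(Flag.class)));                  // prints true
        System.out.println(shouldRollOut(EnumSet.of(Flag.READ_ONLY)));                  // prints false
        System.out.println(shouldRollOut(EnumSet.of(Flag.RUNTIME_ONLY)));               // prints true
        System.out.println(shouldRollOut(EnumSet.of(Flag.READ_ONLY, Flag.RUNTIME_ONLY))); // prints true
        System.out.println(runtimeStepAllowed("/profile=default/subsystem=x"));         // prints false
        System.out.println(runtimeStepAllowed("/host=a/server=b/subsystem=x"));         // prints true
    }
}
```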
Nice-to-Have Requirements
- Allow subsystem authors to register runtime-only operations on the /profile=*/subsystem=x resources while disallowing their execution by end users on the related /host=*/server=*/subsystem=x resource. A management operation that writes persistent configuration cannot be directly executed by a user against address /host=*/server=*/subsystem=x. The operation must be targeted at the /profile=*/subsystem=x resource, and then the DC rolls that out to the domain, thus ensuring all servers are consistent. But a runtime-only operation does not have such a restriction. Registering one on a /profile=*/subsystem=x resource does not mean that the operation can only be invoked that way; it is generally just a convenience to allow a change to multiple servers that could otherwise be performed individually on one or more servers and not on others. However, in theory at least, some subsystems may want to disallow direct invocation and only allow domain-wide invocation. Since the resolution of [WFCORE-13] End users can call non-published management API operations - JBoss Issue Tracker, it should be possible to do this by having the subsystem author register the operation with two different OperationDefinitions, one for use on a DC/HC or standalone server and one for use on a domain server. The domain server one would use OperationEntry.EntryType.PRIVATE while the DC/HC/standalone one would use PUBLIC. This will make the operation accessible to end users only on the DC while still allowing the DC to roll out the operation to the servers.
- Updates to the management API version difference utility to identify and report changes in what miscellaneous operations are registered for a resource or in parameters to existing miscellaneous operations. This utility is used to do API version checks that we use to ensure that appropriate management operation transformers are in place for mixed domain scenarios. The utility is primarily focused on resource types and attributes, but this issue may result in more misc operations being added, so we want to be able to identify missing transformers for those.
- This item can come later as it is not expected that in WildFly 11 there will be any use of this feature by existing subsystems.
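The two-registration idea in the first nice-to-have item above (PRIVATE on domain servers, PUBLIC elsewhere) can be sketched as follows. This is a simplified model under assumptions: ProcessKind and the local EntryType enum are hypothetical stand-ins (the latter mirrors the PUBLIC and PRIVATE values of OperationEntry.EntryType named above), and the access check is a simplification of whatever the kernel actually enforces.

```java
// Simplified model of choosing an entry type per process; names are hypothetical.
final class EntryTypeSelectionSketch {

    enum ProcessKind { DOMAIN_CONTROLLER, HOST_CONTROLLER, STANDALONE_SERVER, DOMAIN_SERVER }

    /** Stand-in for the PUBLIC and PRIVATE values of OperationEntry.EntryType. */
    enum EntryType { PUBLIC, PRIVATE }

    /** Picks which of the two registrations a process would use. */
    static EntryType entryTypeFor(ProcessKind process) {
        // Domain servers hide the operation from end users; everyone else exposes it.
        return process == ProcessKind.DOMAIN_SERVER ? EntryType.PRIVATE : EntryType.PUBLIC;
    }

    /**
     * Assumed access rule: PUBLIC ops are open to all callers, PRIVATE ops only
     * to internal callers such as the DC rolling the operation out.
     */
    static boolean canInvoke(ProcessKind process, boolean callerIsEndUser) {
        return entryTypeFor(process) == EntryType.PUBLIC || !callerIsEndUser;
    }

    public static void main(String[] args) {
        // End user invoking on the DC: allowed.
        System.out.println(canInvoke(ProcessKind.DOMAIN_CONTROLLER, true)); // prints true
        // End user invoking directly on a domain server: rejected.
        System.out.println(canInvoke(ProcessKind.DOMAIN_SERVER, true));     // prints false
        // DC rollout reaching the same domain server: allowed.
        System.out.println(canInvoke(ProcessKind.DOMAIN_SERVER, false));    // prints true
    }
}
```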
Non-Requirements
- Actual management of runtime services by the profile=* resources. To the contrary, some of the requirements above will help enforce the existing rule against this.
- A more convenient way to let subsystem authors register runtime-only operations on the /profile=*/subsystem=x resources while disallowing their execution by end users on the related /host=*/server=*/subsystem=x resource, without having to use separate OperationDefinition instances for the two registrations.
- Special views in the HAL console for exposing such operations. The profile resources are primarily represented in the HAL configuration views. Even though runtime-only operations on a profile are not truly configuration, there is no requirement for HAL to expose them in some other way. This issue also doesn't create an overarching requirement for HAL to expose such operations in the configuration view. Any such requirement should be part of the requirement list for whatever subsystem feature is driving the addition of the operations.