Some notes on discussions we've had re: the various types of processes and/or inter-process coordination modules that will exist in a domain.
The following seem certain:
1. A running Server
2. The DomainController (DC), which is a variant of a Server
3. A process that can listen for commands from the DC and start/stop Servers
4. A module that can synchronize the InstalledImage with the DC prior to launch of a Server
5. A process that can parse the host.xml to extract any JVM configuration values specified for a Server and use those values to spawn the Server VM (#1 above)
The general consensus is to try to collapse 3, 4 and 5 into a single process, the ServerManager (SM).
The ServerManager would use java.lang.ProcessBuilder to spawn Server processes.
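As a rough illustration of the spawning step, here is a minimal sketch. It assumes the JVM options have already been extracted from host.xml into a list; the class name ServerSpawner, the main class org.example.ServerMain and the option values in the usage note are hypothetical placeholders, not anything agreed in the discussions.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of any agreed design.
final class ServerSpawner {

    /** Builds the command line for a Server JVM from already-parsed values. */
    static List<String> buildCommand(List<String> jvmOptions, String classpath,
                                     String mainClass, String serverName) {
        List<String> command = new ArrayList<>();
        // Assumption: reuse the JVM installation that is running the ServerManager.
        command.add(System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java");
        command.addAll(jvmOptions);   // e.g. heap settings taken from host.xml
        command.add("-cp");
        command.add(classpath);
        command.add(mainClass);
        command.add(serverName);      // passed to the Server as a program argument
        return command;
    }

    /** Starts the Server process; the caller must then consume its stdout/stderr. */
    static Process spawn(List<String> command, File workingDir) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(workingDir);
        try {
            return builder.start();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not start Server process", e);
        }
    }
}
```

A call might look like ServerSpawner.buildCommand(Arrays.asList("-Xms64m", "-Xmx256m"), "server.jar", "org.example.ServerMain", "server-one"), with the result handed to spawn together with the Server's working directory.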
There are also some other roles the ServerManager could play.
- Mediation between the DC and a Server in the handling of management operations. Two concepts are driving this notion:
  - The SM is responsible for maintaining the InstalledImage. Many management operations involve updating the InstalledImage (e.g. new deployment content or a changed domain.xml). Having the SM mediate all management operations lets it handle this portion of the task.
  - Reduction in the number of inter-process network connections. This assumes a ServerManager interacts with the Servers it manages via stdio. So, with N hosts in a domain and M Servers per host, the DC would only need to maintain socket connections to N processes rather than N * M. For example, 20 hosts with 10 Servers each means 20 connections instead of 200. In a large domain this could be significant.
- (This was briefly mentioned in a chat and not discussed much.) The SM could expose a command line API allowing updates to the local host.xml when the DC is down. This is consistent with its role as manager of the InstalledImage.
The risk of the above is that each Server process depends on the SM: if the SM goes down, the Server's stdin/stdout/stderr are no longer consumed and the Server will likely die. This makes it difficult (probably impossible) to upgrade or patch the SM without restarting all of its associated Servers.
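To make the dependency concrete, the sketch below shows the kind of stream draining the SM would have to keep running for every Server it spawns. It does not remove the risk: if the SM dies, these threads die with it, and a child that keeps writing can block once its pipe buffer fills. The names StreamDrainer, drain and runAndCollect are hypothetical, and the "out:"/"err:" prefixes are only there to show which stream a line came from.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical helper; illustrates the draining duty, not an agreed design.
final class StreamDrainer {

    /** Starts a daemon thread that forwards every line of the stream to the sink. */
    static Thread drain(InputStream stream, Consumer<String> sink, String threadName) {
        Thread thread = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(stream, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sink.accept(line);
                }
            } catch (IOException e) {
                sink.accept("[stream closed: " + e.getMessage() + "]");
            }
        }, threadName);
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    /** Runs a command to completion, draining stdout and stderr while it runs. */
    static List<String> runAndCollect(List<String> command) {
        List<String> lines = new CopyOnWriteArrayList<>();
        try {
            Process process = new ProcessBuilder(command).start();
            Thread out = drain(process.getInputStream(), l -> lines.add("out: " + l), "server-stdout");
            Thread err = drain(process.getErrorStream(), l -> lines.add("err: " + l), "server-stderr");
            process.waitFor();
            out.join();   // wait until both streams have reached end-of-file
            err.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for process", e);
        }
        return lines;
    }
}
```

In a real SM the sink would more likely be a log file or a protocol parser than an in-memory list, and the drain threads would run for the lifetime of the Server rather than until a short command exits.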
The SM is associated with the InstalledImage, and a patch/upgrade is a change to the InstalledImage, so to an extent having to restart the Servers following a patch to the SM isn't unreasonable. It does, however, make a rolling deployment of a patch more complex. An SM may be controlling Servers from multiple different ServerGroups/Clusters, which interferes with a rolling upgrade approach of first restarting one ServerGroup, then another, etc.
Another risk of the above is controlling the Server via stdio. We haven't done that much (maybe some have). Anything in the Server can write to stdout/stderr, so any control traffic on those streams would be interleaved with arbitrary output.
An alternative is to have the SM launch Server processes by executing run.sh/run.bat or something similar. The script launches the Server as a background process. The SM then communicates with the Server via a socket connection.
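A minimal sketch of what the socket-based alternative could look like follows. Several things here are assumptions rather than anything decided: that the Server connects back to a loopback port opened by the SM (the direction of the connection was not discussed), that commands are single text lines, and that the Server answers each with "ack <command>". The Server side runs as a thread only to keep the example self-contained; in the real arrangement it would be the separately launched Server process. All class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of SM <-> Server control traffic over a socket.
final class SocketControlSketch {

    /** Stand-in for the Server: connects to the SM and acknowledges each command line. */
    static Thread startServerSide(int smPort) {
        Thread thread = new Thread(() -> {
            try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), smPort);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String command;
                while ((command = in.readLine()) != null) {
                    out.println("ack " + command);
                }
            } catch (IOException e) {
                // The SM closed the connection; nothing more to do in this sketch.
            }
        }, "server-side");
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    /** SM side: accepts one Server connection, sends the commands, returns the replies. */
    static List<String> exchange(List<String> commands) {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            startServerSide(listener.getLocalPort());
            try (Socket socket = listener.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
                List<String> replies = new ArrayList<>();
                for (String command : commands) {
                    out.println(command);        // auto-flushed by the PrintWriter
                    replies.add(in.readLine());  // one reply line per command
                }
                return replies;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this arrangement stray writes to stdout/stderr cannot corrupt the control channel, and a restarted SM could in principle re-establish the connection, at the cost of one extra local socket per Server.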