14 Replies Latest reply on Jun 28, 2010 11:26 AM by emuckenhuber

    Domain Topology

    brian.stansberry

      Some notes on discussions we've had re:  the various types of processes and/or inter-process coordination modules that will exist in a domain.

       

      The following seem certain:

       

      1. A running Server
      2. The DomainController, which is a variant of a Server
      3. A process that can listen for commands from the DC and start/stop Servers
      4. A module that can synchronize the InstalledImage with the DC prior to launch of a Server
      5. A process that can parse the host.xml to extract any JVM configuration values specified for a Server and use those values to spawn the Server VM (#1 above)

       

      The general consensus is to try to collapse 3, 4 and 5 into a single process, the ServerManager.

       

      The ServerManager would use java.lang.ProcessBuilder to spawn Server processes.
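As a rough illustration of roles 4 and 5 combined with the ProcessBuilder launch, here is a minimal sketch. The class name `ServerLauncher`, the main class `org.example.ServerMain`, and the option values are hypothetical. The host.xml schema is not defined in this thread, so parsing is left out and the JVM settings are passed in as plain values.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ServerLauncher {

    /** Builds the argument vector for a Server JVM from already-parsed settings. */
    static List<String> buildCommand(String javaHome, List<String> jvmOptions,
                                     String classpath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(Paths.get(javaHome, "bin", "java").toString());
        cmd.addAll(jvmOptions);          // e.g. heap sizes taken from host.xml
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    /** Starts the Server; by default its stdin/stdout/stderr are pipes held by the SM. */
    static Process launch(List<String> command) throws IOException {
        return new ProcessBuilder(command).start();
    }

    public static void main(String[] args) {
        // Hypothetical values; only prints the command rather than spawning anything.
        List<String> cmd = buildCommand(System.getProperty("java.home"),
                List.of("-Xms64m", "-Xmx512m"), "server.jar", "org.example.ServerMain");
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping command construction separate from the actual `start()` call makes the host.xml-to-JVM-arguments mapping easy to check without spawning a process.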

       

      There are also some other roles the ServerManager could play.

       

      1. Mediation between the DC and a Server in the handling of management operations. Two concepts are driving this notion:
        1. The SM is responsible for maintaining the InstalledImage. Many management operations involve updating the InstalledImage (e.g. new deployment content or a changed domain.xml). Having the SM mediate all management operations lets it handle this portion of the task.
        2. Reduction in the number of inter-process network connections. This assumes a ServerManager interacts with the various Servers it's managing via stdio. So, if you had N hosts in a domain, and M Servers per host, the DC would only need to maintain socket connections to N processes, rather than N * M. In a large domain this could be significant.
      2. (This is something that was briefly mentioned on a chat; not discussed much.) The SM could expose a command-line API allowing updates to the local host.xml when the DC is down. This is consistent with its role as manager of the InstalledImage.
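To put numbers on the N versus N * M argument in item 1.2, here is a small worked sketch. The class name `ConnectionCount` and the figures of 50 hosts and 8 Servers per host are made up for illustration only.

```java
final class ConnectionCount {

    /** Sockets the DC would hold if it talked to every Server directly. */
    static long direct(int hosts, int serversPerHost) {
        return (long) hosts * serversPerHost;
    }

    /** Sockets the DC would hold if it talked only to one SM per host. */
    static long mediated(int hosts) {
        return hosts;
    }

    public static void main(String[] args) {
        int hosts = 50;          // illustrative value, not from the discussion
        int serversPerHost = 8;  // illustrative value, not from the discussion
        System.out.println("direct:   " + direct(hosts, serversPerHost)); // 400
        System.out.println("mediated: " + mediated(hosts));               // 50
    }
}
```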

       

      The risk of the above is that the Server process depends on the SM; if the SM goes down, the Server process's stdin/stdout/stderr are no longer consumed and it will likely die. This makes it difficult (probably impossible) to upgrade/patch the SM without restarting all its associated Servers.
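The dependency comes from the pipes themselves: a pipe has a bounded buffer, so a child that keeps writing to a pipe nobody reads will eventually block. A minimal sketch of what the SM would have to keep doing for each Server it spawned is shown here. The class and method names are hypothetical, and the sinks stand in for whatever logging the SM would actually use.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

final class StreamDrainer {

    /** Reads lines until end of stream, handing each one to the sink. */
    static void drain(InputStream in, Consumer<String> sink) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sink.accept(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a daemon thread that keeps one child stream drained. */
    static Thread drainInBackground(InputStream in, Consumer<String> sink, String name) {
        Thread t = new Thread(() -> drain(in, sink), name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Attaches drainers to both output streams of a spawned Server. */
    static void attach(Process server, Consumer<String> out, Consumer<String> err) {
        drainInBackground(server.getInputStream(), out, "server-stdout");
        drainInBackground(server.getErrorStream(), err, "server-stderr");
    }

    public static void main(String[] args) {
        // Demonstrates drain() on an in-memory stream instead of a real child process.
        byte[] data = "started\nlistening\n".getBytes(StandardCharsets.UTF_8);
        drain(new ByteArrayInputStream(data), System.out::println);
    }
}
```

The drainer threads live inside the SM process, which is exactly why an SM restart takes the read side of the pipes with it.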

       

      The SM is associated with the InstalledImage, and a patch/upgrade is a change to the InstalledImage, so to an extent having to restart the Servers following a patch to the SM isn't unreasonable. It does however make doing a rolling deployment of a patch more complex. An SM may be controlling Servers from multiple different ServerGroups/Clusters, so that interferes with a rolling upgrade approach of first restarting one ServerGroup, then another, etc.

       

      Another risk of the above is controlling the Server via stdio. We haven't done that much (maybe some have). Anything in the Server can write to stdout/stderr.

       

      An alternative is to have SM launch Server processes by executing run.sh/run.bat or something similar. The script launches the Server as a background process. The SM then communicates with the Server via a socket connection.

        • 1. Re: Domain Topology
          dmlloyd

          Brian Stansberry wrote:

           

          The risk of the above is that the Server process depends on the SM; if the SM goes down, the Server process's stdin/stdout/stderr are no longer consumed and it will likely die. This makes it difficult (probably impossible) to upgrade/patch the SM without restarting all its associated Servers.

          It should be noted that this risk is mitigated (at least somewhat) by the possibility that the SM will have few external dependencies and thus should not need to be upgraded as often as the components consumed by the Servers and, of course, the deployments themselves.

           

          Brian Stansberry wrote:

           

          Another risk of the above is controlling the Server via stdio. We haven't done that much (maybe some have). Anything in the Server can write to stdout/stderr.

           

          An alternative is to have SM launch Server processes by executing run.sh/run.bat or something similar. The script launches the Server as a background process. The SM then communicates with the Server via a socket connection.

          Using jboss-stdio we can prevent this problem by replacing System.out/System.err at boot (which we have to do anyway, in order to capture those streams for logging purposes).  In addition, any JVM output to stderr should be reflected in the error log of the SM.  The remainder of this risk can be mitigated by using a simple protocol over stdout which can recover to some extent from the interjection of random messages (though I don't believe I've ever run across a case where things come to stdout which do not come via System.out).
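One way such a "simple protocol over stdout" could tolerate stray output is to make it line-based and tag every protocol line with a marker, so the reader resynchronizes at each newline. The sketch below is an assumption-laden illustration only: it is not the jboss-stdio API, and the marker string `@@SM ` and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class StdoutFraming {

    // Hypothetical marker; any prefix unlikely to appear in ordinary output would do.
    static final String MARKER = "@@SM ";

    /** Wraps a single-line payload as one protocol line. */
    static String encode(String payload) {
        if (payload.indexOf('\n') >= 0 || payload.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("payload must be a single line");
        }
        return MARKER + payload;
    }

    /** Returns the payload if the line is a protocol line, otherwise empty. */
    static Optional<String> decode(String line) {
        return line.startsWith(MARKER)
                ? Optional.of(line.substring(MARKER.length()))
                : Optional.empty();
    }

    /** Splits raw stdout lines into protocol messages; stray lines go to strayOut. */
    static List<String> messages(List<String> lines, List<String> strayOut) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Optional<String> msg = decode(line);
            if (msg.isPresent()) {
                result.add(msg.get());
            } else {
                strayOut.add(line);   // e.g. forward to the SM's log
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stray = new ArrayList<>();
        List<String> raw = List.of(encode("STARTED"), "some stray output", encode("READY"));
        System.out.println(messages(raw, stray)); // [STARTED, READY]
        System.out.println(stray);                // [some stray output]
    }
}
```

This only recovers "to some extent", as noted above: a stray write with no trailing newline would be glued to the front of the next protocol line, and this sketch would then treat that whole line as stray.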

          • 2. Re: Domain Topology
            brian.stansberry

            David Lloyd wrote:

             

            Brian Stansberry wrote:

             

            The risk of the above is that the Server process depends on the SM; if the SM goes down, the Server process's stdin/stdout/stderr are no longer consumed and it will likely die. This makes it difficult (probably impossible) to upgrade/patch the SM without restarting all its associated Servers.

            It should be noted that this risk is mitigated (at least somewhat) by the possibility that the SM will have few external dependencies and thus should not need to be upgraded as often as the components consumed by the Servers and, of course, the deployments themselves.

             

             

            We should aim to make the SM as modular as possible, so that as much of its functionality as possible can be upgraded without a restart.

            • 3. Re: Domain Topology
              emuckenhuber

              Having the SM synchronize the InstalledImage seems to lean more toward provisioning. It would be interesting to define more clearly what the InstalledImage is. I guess it mostly refers to a versioned artifact/module repository, so maybe an InstalledImage is really more a descriptor of the versioned modules contained in an AS release, i.e. something like a component-matrix.

              This also brings the requirement that updates/synchronization of these artifacts/modules don't affect other domains/server-groups running the same/previous InstalledImage. For example, say you download the zip distribution and run 2 different domains from the same InstalledImage, and for whatever reason domain1 gets updated. This should not affect the other domain.

              • 4. Re: Domain Topology
                johnbailey

                One major benefit of having the installed image be a component-matrix-like config is that upgrades and upgrade rollbacks would be pretty easy. You can always add new artifacts to the repo, leaving the old ones in place. Boot the server with a new installed image descriptor and possibly a tweaked domain.xml/host.xml (if schema changes or new configs are needed). If something goes wrong, you use the original domain.xml and installed image descriptor and you are set. I think David mentioned something related to providing a cleanup utility that could be used down the road to clean up artifacts no longer used in active installed images. My guess is this will never be used by a customer, because it will likely be a minor disk space savings and possibly perceived as an unnecessary risk. But having the cleanup would be a good feature for anyone worried about keeping things tidy, and would be a snap to implement, so there is no reason not to have it.
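To illustrate why the cleanup would be "a snap to implement" under this model: if each installed image descriptor is just a set of versioned artifact ids, the unused artifacts are the repository contents minus everything any active descriptor references. A minimal sketch follows; the `name:version` id format, the class name, and the sample artifacts are all hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class RepoCleanup {

    /** Returns artifacts in the repository not referenced by any active descriptor. */
    static Set<String> unused(Set<String> repository,
                              Collection<Set<String>> activeDescriptors) {
        Set<String> result = new TreeSet<>(repository);
        for (Set<String> descriptor : activeDescriptors) {
            result.removeAll(descriptor);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical repository after an upgrade that left the old artifact in place.
        Set<String> repo = Set.of("web:1.0", "web:1.1", "ejb:2.0");
        Set<String> current = Set.of("web:1.1", "ejb:2.0");
        System.out.println(unused(repo, List.of(current))); // [web:1.0]
    }
}
```

Rollback in this picture is simply booting with the older descriptor again, which works as long as the cleanup has not been run against it.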

                • 5. Re: Domain Topology
                  emuckenhuber

                  Yeah, the thing I'm wondering is what to do with our other platforms extending or building on top of AS. So maybe the InstalledImage is really the initial zip distribution, containing one or more "repository roots" or update URLs - basically pointing to this descriptor containing the artifacts/modules for a given release.

                  • 6. Re: Domain Topology
                    dmlloyd

                    Emanuel Muckenhuber wrote:

                     

                    Yeah, the thing I'm wondering is what to do with our other platforms extending or building on top of AS. So maybe the InstalledImage is really the initial zip distribution, containing one or more "repository roots" or update URLs - basically pointing to this descriptor containing the artifacts/modules for a given release.

                     

                    This is more of a provisioning question though isn't it?  So far it doesn't look like any fancy provisioning architecture is part of our current requirements - though it would make sense to lay some sane groundwork for it.

                    • 7. Re: Domain Topology
                      emuckenhuber

                      David Lloyd wrote:


                      This is more of a provisioning question though isn't it?  So far it doesn't look like any fancy provisioning architecture is part of our current requirements - though it would make sense to lay some sane groundwork for it.

                      Yeah, more related to provisioning. IMHO it goes in that direction anyway as soon as we talk about synchronizing the InstalledImage. Where user deployments are scoped to a domain, the bits in an InstalledImage are cross-domain - meaning more or less that all domains running an EAP 6.0.1 have the same binaries. Having the InstalledImage as a sort of component-matrix seems to move the complexity mostly to some build/release tools, although this does not really handle things like one-off patches.

                      For now I'm more interested in some basic (sane) rules by which the SM knows when to synchronize parts of the InstalledImage.

                      • 8. Re: Domain Topology
                        dmlloyd

                        Emanuel Muckenhuber wrote:

                         

                        David Lloyd wrote:


                        This is more of a provisioning question though isn't it?  So far it doesn't look like any fancy provisioning architecture is part of our current requirements - though it would make sense to lay some sane groundwork for it.

                        Yeah, more related to provisioning. IMHO it goes in that direction anyway as soon as we talk about synchronizing the InstalledImage. Where user deployments are scoped to a domain, the bits in an InstalledImage are cross-domain - meaning more or less that all domains running an EAP 6.0.1 have the same binaries.

                        Well, maybe.  I don't see deployments as being part of the installed image personally.  Rather they're part of the domain model itself.  Forcing all systems in the domain to have a sync'd installed image is in violation of the requirement that the same version family can be intermixed within a domain, so long as all servers in each group have the same version.

                        • 9. Re: Domain Topology
                          emuckenhuber

                          David Lloyd wrote:

                           

                          Well, maybe.  I don't see deployments as being part of the installed image personally.  Rather they're part of the domain model itself.  Forcing all systems in the domain to have a sync'd installed image is in violation of the requirement that the same version family can be intermixed within a domain, so long as all servers in each group have the same version.

                          Obviously deployments are part of the domain, I thought that's what I said. So in what cases do we need to synchronize an InstalledImage then?

                          • 10. Re: Domain Topology
                            dmlloyd

                            Emanuel Muckenhuber wrote:

                             

                            David Lloyd wrote:

                             

                            Well, maybe.  I don't see deployments as being part of the installed image personally.  Rather they're part of the domain model itself.  Forcing all systems in the domain to have a sync'd installed image is in violation of the requirement that the same version family can be intermixed within a domain, so long as all servers in each group have the same version.

                            Obviously deployments are part of the domain, I thought that's what I said.

                            Oh, sorry, then I misunderstood you.

                             

                            Emanuel Muckenhuber wrote:

                             

                            So in what cases do we need to synchronize an InstalledImage then?

                             

                            Hmm, I have no idea!  Maybe when new hosts are installed?  Though I think the easiest way to do that is to just copy the image over to another box and start it up, but that's just me.

                            • 11. Re: Domain Topology
                              brian.stansberry

                              The basic initial idea behind "syncing the installed image" was 1) to pull down the current domain.xml (now domain model, transaction log) and 2) to get the current version of user deployments (and user libs -- think jdbc drivers in server/default/lib) down onto the local filesystem. Assume a host has been offline for a while and has missed some updates.

                               

                              Very true that the issue of upgrading the SM itself goes more into the area of provisioning; i.e. it's not something that the current set of requirements puts as a responsibility of the DC. That said, we should think about it when we design, i.e. even if it's not our responsibility, try to make life less painful for those who do have to handle it.

                               

                              Multiple domains from the same installed image? Interesting. If we're using a versioned-repository for all of our jars, I guess that's reasonable. OTOH I don't think it's a requirement, so if we find that supporting that is causing us headaches, drop it.  Again, even if we don't have to handle patching ourselves, let's not make life miserable for those who do.

                               

                              What then is the cardinality between DC, SM and InstalledImage? DC --> SM is 1..n. We'd been discussing in terms of SM --> II as 1..1. If II --> Domain is n..n, then either SM --> II is n..1 or DC --> SM is n..n. If patching an SM on one host means simultaneously taking down servers out of 9 server groups running over 3 domains, that's going to be a coordination nightmare for someone.

                              • 12. Re: Domain Topology
                                emuckenhuber

                                Hmm, ok, then maybe I just misunderstood the term "InstalledImage" - for me the InstalledImage was the binaries we ship (e.g. the zip distribution), so mostly about the module repository. In case the InstalledImage is about domain deployments and libs, then yes, the cardinality of SM --> II should be 1:1.

                                • 13. Re: Domain Topology
                                  brian.stansberry

                                  We could limit the term InstalledImage to purely what we ship. Or maybe think of a different term, since your interpretation is quite reasonable.

                                   

                                  Either way there is going to be an area on the host where we write stuff -- local copies of deployments and libs, plus the local representation of the domain model. The SM manages that, so the cardinality of SM --> writeable area is 1:1.

                                  • 14. Re: Domain Topology
                                    emuckenhuber

                                    Is patching actually part of the domain requirements?