10 Replies Latest reply on May 3, 2010 10:39 AM by dmlloyd

    Partition and Node identities

    dmlloyd

      In Remoting 3, we introduce the concept of an "Endpoint".  An Endpoint is an instance of a node in a connected graph of Remoting participants.  Each Endpoint has a name which identifies it to its peers; thus one would generally ensure that the name is unique within this graph.

       

      In an appserver setting, there would typically be a single Endpoint instance per AS instance.  Thus, it meshes strongly with the notion of having a single identity per AS instance.  And since it is common to have a single AS instance per host, it would appear to make sense to default the AS identity to be equivalent to the host's identity (the host name, to be more specific - not the DNS name of an interface, but the name of the host itself).

       

      Today, we configure clustered JBossAS instances with a "jboss.partition.name" which represents the identity of the cluster of which it is a part.  However as far as I've been able to figure out, we do not have a hard-and-fast concept of a node name.  I would like to propose a system property "jboss.node.name" which represents the identity of the node itself.  This property would have a sensible default but would also be modifiable just as partition name is today.

       

      In terms of implementation, my few minutes of research on the topic indicate that the best way to calculate a default node name at boot would be as follows.

      1. Define a system property, "jboss.host.qualified.name" or just "host.qualified.name", which, if unspecified, defaults to the value of:
        1. the HOSTNAME env var, or if that is not specified,
        2. the COMPUTERNAME env var, or if that is not specified,
        3. the value of InetAddress.getLocalHost().getHostName(), or if that does not turn up anything,
        4. a default value such as "unknown-host.unknown-domain"
      2. Define a system property, "jboss.host.name" or just "host.name", which, if unspecified, defaults to the host portion of the above property
      3. Define a system property, "jboss.node.name" which, if unspecified, defaults to the value of the above property

       

      (The host name identification scheme described above has been reported to work on Windows, POSIX-like OSes, and Mac OS, barring the event of the environment being cleared for some reason, of course.)
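The defaulting chain above could be sketched roughly as follows. This is only an illustration, not the eventual AS bootstrap code: the class `NodeNameDefaults` and its methods are hypothetical, the "jboss."-prefixed property names are picked from the two spellings suggested above, and "host portion" is assumed to mean the text before the first dot.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; not part of JBoss AS or Remoting.
final class NodeNameDefaults {
    static final String FALLBACK = "unknown-host.unknown-domain";

    // Returns the first candidate that is non-null and non-blank, or null.
    static String firstNonEmpty(String... candidates) {
        for (String c : candidates) {
            if (c != null && !c.trim().isEmpty()) {
                return c.trim();
            }
        }
        return null;
    }

    // Step 1.3: ask the JDK; a failed lookup is treated as "nothing found".
    static String lookupLocalHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    // Fills in any of the three properties that the user did not set.
    static void apply(Properties props, Map<String, String> env) {
        // Steps 1.1 and 1.2: explicit property, then HOSTNAME, then COMPUTERNAME.
        String qualified = firstNonEmpty(
                props.getProperty("jboss.host.qualified.name"),
                env.get("HOSTNAME"),
                env.get("COMPUTERNAME"));
        if (qualified == null) {
            // Steps 1.3 and 1.4: only touch the resolver when the environment is empty.
            qualified = firstNonEmpty(lookupLocalHostName(), FALLBACK);
        }

        String host = firstNonEmpty(props.getProperty("jboss.host.name"));
        if (host == null) {
            // Assumption: the host portion is everything before the first dot.
            int dot = qualified.indexOf('.');
            host = dot > 0 ? qualified.substring(0, dot) : qualified;
        }

        // Step 3: the node name defaults to the host name.
        String node = firstNonEmpty(props.getProperty("jboss.node.name"), host);

        props.setProperty("jboss.host.qualified.name", qualified);
        props.setProperty("jboss.host.name", host);
        props.setProperty("jboss.node.name", node);
    }
}
```

At boot this would presumably be called as `NodeNameDefaults.apply(System.getProperties(), System.getenv())`, so that anything passed with -D on the command line wins over the computed defaults.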

       

      When the Remoting Endpoint is instantiated, it will then use the value of this property to name the endpoint, which also acts as the default authentication realm for incoming connections.

       

      The concept of a node name is also used in a few other places that I've found:

      • It appears to be used by mod_jk for identification purposes
      • Seems that it might be used by jgroups and/or jbcache and/or infinispan for something?

       

      Thoughts/feedback?

        • 1. Re: Partition and Node identities
          jaikiran

          david.lloyd@jboss.com wrote:

           


           

          In terms of implementation, my few minutes of research on the topic indicate that the best way to calculate a default node name at boot would be as follows.

          1. Define a system property, "jboss.host.qualified.name" or just "host.qualified.name", which, if unspecified, defaults to the value of:
            1. the HOSTNAME env var, or if that is not specified,
            2. the COMPUTERNAME env var, or if that is not specified,
            3. the value of InetAddress.getLocalHost().getHostName(), or if that does not turn up anything,
            4. a default value such as "unknown-host.unknown-domain"
          2. Define a system property, "jboss.host.name" or just "host.name", which, if unspecified, defaults to the host portion of the above property
          3. Define a system property, "jboss.node.name" which, if unspecified, defaults to the value of the above property

           

          Given the above scheme, two independent (non-clustered) instances of JBossAS running on the same machine might end up with the same node name unless the user explicitly specifies a unique jboss.node.name for each of those instances, wouldn't they? Maybe we should by default take into account the -b option to determine the default node name? That way, the user can just continue doing:

           

          run.sh -c default -b 127.0.0.1
          

           

          run.sh -c custom -b 127.0.0.2
          

           

          and the node names would then be 127.0.0.1 and 127.0.0.2 respectively.

          • 2. Re: Partition and Node identities
            dmlloyd
            That particular case is especially problematic: if nodes were using the bind address for their identity and a remote node attempted to connect, then (for example) 0.0.0.0 would end up trying to talk to 0.0.0.0.  For this reason I think the bind address explicitly cannot be this identity.  Using a network interface is also undesirable for a similar reason.  I would think that binding to localhost or 0.0.0.0, or having multiple network interfaces on a box, are both more common than running two instances on the same system.  That's why I suggested the host name.
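To make the objection concrete, here is a small sketch of a check for whether a -b value could distinguish one machine from another. The class `BindAddressIdentity` is hypothetical and only uses standard `java.net.InetAddress` calls; it is not something the AS does today.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration; not part of JBoss AS.
final class BindAddressIdentity {
    // True only if the bind address could tell this node apart from a node on another machine.
    static boolean usableAsIdentity(String bindAddress) {
        try {
            InetAddress addr = InetAddress.getByName(bindAddress);
            // 0.0.0.0 (wildcard) and 127.x.x.x (loopback) look the same on every host.
            return !addr.isAnyLocalAddress() && !addr.isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a resolvable address: " + bindAddress, e);
        }
    }
}
```

Under this check both 0.0.0.0 and the 127.0.0.1 / 127.0.0.2 values from the earlier example are rejected, which is why the bind address is a poor default even though it separates two instances on one box.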
            • 3. Re: Partition and Node identities
              brian.stansberry

              I was out yesterday; sorry for slow reply.

               

              100% agree we need a unique name for nodes, and the -b value isn't a good default, at least not if -b is 0.0.0.0.  I'll dig up a link to an earlier thread about this general topic that kind of died out. This touches on lots of areas, so I suspect this discussion will end up moving to the jboss-development list.

               

              I don't think it's the end of the world if people have to actually specify the name if they want to run two instances on the same machine both bound to 0.0.0.0. They'd have to set -Djboss.service.binding.set=xxxx and -Djboss.messaging.ServerPeerID=y on at least one node anyway, so it's not like their startup command was totally trivial and now they are forced to add complexity.

               

              I think when we move to a proper domain model we should require each server instance in the configuration to have a name. For that reason also, I don't think driving people in some cases to actually configure the name is the end of the world. Actually, that's the one thing that concerns me in your proposal for determining the name, which otherwise sounds fine: it introduces 3 system properties. If in a future domain configuration people are just required to do something like:

               

              <server name="AS1"...
              

               

              then for AS 6 we've probably introduced 2 configuration system properties that will disappear in AS 7?

              • 4. Re: Partition and Node identities
                dmlloyd

                bstansberry@jboss.com wrote:

                 

                I was out yesterday; sorry for slow reply.

                 

                100% agree we need a unique name for nodes, and the -b value isn't a good default, at least not if -b is 0.0.0.0.  I'll dig up a link to an earlier thread about this general topic that kind of died out. This touches on lots of areas, so I suspect this discussion will end up moving to the jboss-development list.

                 

                I don't think it's the end of the world if people have to actually specify the name if they want to run two instances on the same machine both bound to 0.0.0.0. They'd have to set -Djboss.service.binding.set=xxxx and -Djboss.messaging.ServerPeerID=y on at least one node anyway, so it's not like their startup command was totally trivial and now they are forced to add complexity.


                Yeah, I think that no matter what, if someone is running two instances on the same machine, they'll have to distinguish the node name somehow.  The case I was describing to Jaikiran was the case where there are two machines with instances bound to 0.0.0.0 in the same cluster - using the name of the bound interface would not work in this (common) case.

                 

                bstansberry@jboss.com wrote:

                 

                I think when we move to a proper domain model we should require each server instance in the configuration to have a name. For that reason also, I don't think driving people in some cases to actually configure the name is the end of the world. Actually, that's the one thing that concerns me in your proposal for determining the name, which otherwise sounds fine: it introduces 3 system properties. If in a future domain configuration people are just required to do something like:

                 

                <server name="AS1"...
                

                 

                then for AS 6 we've probably introduced 2 configuration system properties that will disappear in AS 7?

                In my research, I've seen enough people struggle to get ahold of the system's host name that it's my belief that that is something we should be providing anyway, if we can get it.  This way, all the various "tricks" for getting the host name that crop up over time can be centrally located.  Think of it as added value (if only a little tiny bit).  So I think keeping a pair of host name properties is something that we would have no reason to get rid of.

                 

                In the future domain configuration scenario, it might still be useful for any per-host configuration to have a default server name as well.  I really think that many administrators are just going to use their server host name for this anyway.  The only time you should be required to specify a server/node/host name is when you're not talking about your own.  Either way, again it might be a handy thing if users could always count on getting the node's name from a property.

                • 5. Re: Partition and Node identities
                  brian.stansberry

                  david.lloyd@jboss.com wrote:


                  Yeah, I think that no matter what, if someone is running two instances on the same machine, they'll have to distinguish the node name somehow.  The case I was describing to Jaikiran was the case where there are two machines with instances bound to 0.0.0.0 in the same cluster - using the name of the bound interface would not work in this (common) case.

                   

                  DOH! Yes, of course.

                   

                  Re: the extra properties; what you say makes sense: jboss.host.qualified.name and jboss.host.name are useful whether or not they end up as part of the server identity.

                   

                  Even if in AS 7 we force people to specify a name, jboss.node.name is still a useful way for users to access that value.

                   

                  A problem with using defaults is the node name leaks outside the data center, and if the default is used that means information about the server configuration is leaking. For example, the node name would become the logical value for the jvmRoute used by mod_jk and mod_cluster, and that ends up being appended to the session id for web sessions, which anyone can look at in a browser.

                   

                  So, defaults are convenient but leak information. Part of my thinking on requiring their configuration once the domain configuration is in place is that the inconvenience is less -- you set up your configuration once and re-use it; you don't have to retype -Djboss.node.name=xxx every time.

                   

                  The previous discussion around this basic topic is at http://community.jboss.org/thread/91830?tstart=60

                   

                  Besides the use cases listed on that thread, JBossTS also needs a unique integer id for each server on the same host: http://community.jboss.org/thread/145892?start=0&tstart=0

                  • 6. Re: Partition and Node identities
                    dmlloyd

                    bstansberry@jboss.com wrote:

                    Re: the extra properties; what you say makes sense: jboss.host.qualified.name and jboss.host.name are useful whether or not they end up as part of the server identity.

                    OK, then pending a final review in jboss-dev I'll plan on setting this up.

                     

                    bstansberry@jboss.com wrote:


                    A problem with using defaults is the node name leaks outside the data center, and if the default is used that means information about the server configuration is leaking. For example, the node name would become the logical value for the jvmRoute used by mod_jk and mod_cluster, and that ends up being appended to the session id for web sessions, which anyone can look at in a browser.

                    This is a good point.  If we make jvmRoute configurable and default to node.name (yeah, that's a lot of defaulting) then at least it can be "hardened".  Alternately, using a short hex- or base64-encoded hash of the node name plus a configurable (cluster-wide?) salt might be workable too.  Or some combination thereof?  In the meantime we can "punt" the issue and just force manual configuration until/unless a satisfactory solution is found.
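A rough sketch of the salted-hash idea, for illustration only. The class `JvmRouteHasher`, the choice of SHA-256, and the 12-character truncation are all assumptions made for this example; nothing here is an existing mod_jk, mod_cluster, or AS API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of "short hex-encoded hash of node name plus salt".
final class JvmRouteHasher {
    // Derives an opaque route id so the real node name never reaches the browser.
    static String route(String nodeName, String salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator so ("ab","c") and ("a","bc") hash differently
            byte[] digest = md.digest(nodeName.getBytes(StandardCharsets.UTF_8));

            // Keep only the first 6 bytes (12 hex characters) to keep session ids short.
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 6; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JRE", e);
        }
    }
}
```

The same node name and salt always produce the same route, so it stays stable across restarts. Truncating the digest does raise the chance of two nodes colliding, so the length is a trade-off rather than a recommendation.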

                     

                    So, defaults are convenient but leak information. Part of my thinking on requiring their configuration once the domain configuration is in place is that the inconvenience is less -- you set up your configuration once and re-use it; you don't have to retype -Djboss.node.name=xxx every time.

                    That seems like a reasonable plan to me.  For 6, set up all the properties in bootstrap; for 7, generate them from the domain configuration.

                     

                    The previous discussion around this basic topic is at http://community.jboss.org/thread/91830?tstart=60

                     

                    Besides the use cases listed on that thread, JBossTS also needs a unique integer id for each server on the same host: http://community.jboss.org/thread/145892?start=0&tstart=0

                    I don't think we can solve the JBossTS case with a node name solution.  A hash wouldn't suffice, because administrators may want more direct control over the port and we probably can't avoid collisions in any case.  Apart from that, there's not a lot of options for getting from a node name to a port number, let alone an unbound one.

                     

                    JBM is a similar issue, though it's possible we could get away with a hash in this case.  Maybe JBM won't matter though.  Not yet sure if HornetQ uses a peer ID of some sort, but I'm inclined to take the same approach: if they use a name *and* the name isn't security sensitive, default to the node name, otherwise, solve later.

                     

                    Otherwise, everything I'm reading in these other threads indicates that I should be able to move forward with this idea, would you agree?

                     

                    P.S. I also found http://community.jboss.org/thread/90161?tstart=0

                    • 7. Re: Partition and Node identities
                      brian.stansberry

                      IMHO what you've laid out makes sense and is the way to move forward. I posted a link back to this thread on the other one, in case anyone following that forum wants to weigh in.

                       

                      Re: JBossTS, I really just linked that because you're a smart guy and I wanted you to be aware of the entire problem domain; i.e. we've got use cases that need ints and others that need strings.

                       

                      For the JBossTS case there's no need to come up with a port, just an int. The only reason that particular socket exists is to generate a unique-on-the-host int. They create the socket bound to 127.0.0.1 and then they can use the port number as a unique id because they know no other process could concurrently do the same thing. The whole thing is pluggable, and if another technique were available the whole socket would go away.
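The socket trick described above can be sketched like this. It is a minimal illustration of the technique, not the actual JBossTS code: the class `HostUniqueId` is hypothetical, and letting the OS pick the port by binding to port 0 is an assumption about how the port is chosen.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical illustration; the real JBossTS implementation may differ.
final class HostUniqueId implements AutoCloseable {
    private final ServerSocket socket;

    private HostUniqueId(ServerSocket socket) {
        this.socket = socket;
    }

    // Binds a listening socket on 127.0.0.1 and lets the OS choose a free port.
    static HostUniqueId acquire() {
        try {
            InetAddress loopback = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
            return new HostUniqueId(new ServerSocket(0, 1, loopback));
        } catch (IOException e) {
            throw new IllegalStateException("could not bind a loopback socket", e);
        }
    }

    // The port number is unique on this host for as long as the socket stays open.
    int value() {
        return socket.getLocalPort();
    }

    @Override
    public void close() {
        try {
            socket.close();
        } catch (IOException ignored) {
            // Nothing useful to do; the id is simply released.
        }
    }
}
```

The id is only guaranteed unique while the socket is held, so a process would keep the `HostUniqueId` open for its whole lifetime and close it on shutdown.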

                       

                      But hashing the name would still leave the chance of collisions.

                       

                      Re: HornetQ, yeah, I hope the need for an int goes away, and my guess is it does. We can find out quickly enough; either when those guys get back from meetings or in a day or so when I get a minute to poke around in their AS integration branch to see.

                      • 8. Re: Partition and Node identities
                        dmlloyd
                        https://jira.jboss.org/jira/browse/JBAS-7779 and done for now.  If it's a problem we can revisit.
                        • 9. Re: Partition and Node identities
                          brian.stansberry

                          Snippet from a separate exchange, where Bela Ban is replying to me, that I want to merge into this thread:

                          >
                          >> Jason, as discussed in that thread I was thinking that for AS 7 in the
                          >> domain model we should make "name" be a required attribute of the
                          >> server element.
                          >>
                          >> [1] https://jira.jboss.org/jira/browse/JBAS-7779
                          >> [2] http://community.jboss.org/message/529257#529257
                          >
                          > If name is required, then that's fine, but that makes deploying of JBoss
                          > instances dynamically (e.g. in a cloud) difficult.
                          >

                           

                          Yes, that's the flaw in the idea; the thing that needs to be worked out.

                           

                          Servers will be able to join a domain dynamically, passing any required information to the DomainController as they register themselves. The name info would have to come from the command line. So this would mean forcing whatever tool is spinning up new instances on the cloud to generate and pass a synthetic name. TBH, having our own code generate a synthetic name (e.g. a UUID) in such a case seems reasonable.

                           

                          The tricky part is the domain.xml is meant to be a persistent store of configuration info for all nodes in a domain. So once you spin up a dynamic node like that, it has an entry in domain.xml. Over time your domain.xml will fill up with useless entries.

                           

                          Each named server would also get its own writable work area on its local filesystem (which will let your https://jira.jboss.org/jira/browse/MODCLUSTER-147 approach work); over time a local filesystem could get littered with discarded write areas. This would probably be less of an issue with cloud-based deployments. It's more of a problem if people continually launch unnamed servers from the same filesystem image.

                           

                          These issues aren't unsolvable; just need some thought.

                          • 10. Re: Partition and Node identities
                            dmlloyd

                            Brian Stansberry wrote:

                             

                            Servers will be able to join a domain dynamically, passing any required information to the DomainController as they register themselves. The name info would have to come from the command line. So this would mean forcing whatever tool is spinning up new instances on the cloud to generate and pass a synthetic name. TBH, having our own code generate a synthetic name (e.g. a UUID) in such a case seems reasonable.

                            Even better would be to use a name already present and available to the instance.  If for example, the virtual host in question has a predictable host name, then our current solution would continue to work.
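Combining the two suggestions, the preference order might look like the sketch below: an explicitly configured name first, then a host name already available to the instance, and a generated UUID only as a last resort. The class `ServerNames` and its method are hypothetical and are not part of any domain model code.

```java
import java.util.UUID;

// Hypothetical illustration of a name preference order for dynamically started servers.
final class ServerNames {
    // Returns the configured name, else the host name, else a synthetic UUID.
    static String resolve(String configuredName, String hostName) {
        if (configuredName != null && !configuredName.trim().isEmpty()) {
            return configuredName.trim();
        }
        if (hostName != null && !hostName.trim().isEmpty()) {
            return hostName.trim();
        }
        // Synthetic fallback: unique, but a fresh name on every launch.
        return UUID.randomUUID().toString();
    }
}
```

The UUID branch is the one that would fill domain.xml with stale entries over time, since every launch produces a new name; the host name branch avoids that whenever the virtual host's name is predictable.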