5 Replies Latest reply on Nov 20, 2014 5:18 AM by pilhuhn

    ID generation

    mazz

      I commented on this in Lukas' doc, but my comment got long and it is really more of a discussion topic, so here it is.

       

      I've been thinking about ID generation: agent IDs, resource IDs, anything that a "feed" needs to identify when it sends data for it (like metric data or configuration data).

       

      I think each resource managed by an agent can be assigned an ID by the agent (not the server). That resource ID would be paired with the agent ID to make it unique server-side (i.e. an ID is really a tuple: <agent-ID>:<resource ID>). I've been reading up on UUIDs: versions 3 and 5 have the concept of a namespace and a name, which fits this idea well. Notice that it would now be the agent's job, not the server's, to make sure it assigns unique IDs to its resources. This could be as simple as generating a name string unique among peers and pairing it with the parent UUID as a namespace. That is analogous to resource keys in RHQ today: keys are names (idempotently discovered by plugins) that are unique among peer resources (i.e. all children under a single parent have unique keys), but not unique among other trees in the inventory. Since this happens on the agent/feed, each parent resource's ID is a "sub-namespace": each parent is a namespace for its children's IDs, while the parent itself is in the agent namespace. These "agent-generated IDs" remove the need for the server to have idempotent ID generation for resources.
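To make the namespace idea concrete, here is a minimal sketch using version 5 UUIDs. The root namespace, the agent name, and the resource keys are made up for illustration; nothing here is an agreed format.

```python
import uuid

# Assumption: a fixed root namespace shared by all feeds. Any constant UUID
# would do; "feeds.example.org" is a hypothetical name.
FEED_ROOT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "feeds.example.org")


def agent_id(agent_name: str) -> uuid.UUID:
    """Derive the agent ID from the name the agent suggests for itself."""
    return uuid.uuid5(FEED_ROOT_NAMESPACE, agent_name)


def resource_id(parent_id: uuid.UUID, resource_key: str) -> uuid.UUID:
    """Derive a child ID: the parent's UUID is the namespace, the key is the name.

    The key only has to be unique among siblings, like RHQ resource keys.
    """
    return uuid.uuid5(parent_id, resource_key)


# Hypothetical hierarchy: agent -> app server -> datasource.
agent = agent_id("agent-on-host-a")
server = resource_id(agent, "as7-standalone")
datasource = resource_id(server, "datasource=ExampleDS")
```

Because a version 5 UUID is a hash of namespace plus name, the agent gets the same IDs back after a restart or re-install, as long as it rediscovers the same agent name and resource keys. The same key under two different parents yields two different IDs.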

       

      How do we solve this for agent (or feed) IDs? The agent could suggest or request its own ID (kind of like the agent name today - each agent tells the server what its agent name should be). It would be required that the agent ID never change. Since it is the agent that requested the ID, it can theoretically regenerate or rediscover it upon restart or even re-install (thus, it too would be idempotent). It would have to be understood that if the agent ID cannot be rediscovered by the agent, it will not be able to send data to the server-side - effectively, the agent would have forgotten who it is since it can't remember its ID. We could provide some way for the agent to "recover" the ID through some server-side API, but that will not be 100% foolproof; there will be cases where the agent just flat out cannot get its ID back without manual admin intervention. We'd want to minimize those cases and yet still provide some mechanism to recover (even if it's through manual admin intervention).

       

      What this means is we need an "ID generation" service, or "Agent registration service" (or call it "Feed registration", if we want to move away from the word "Agent") that takes an agent registration request with its suggested name and returns a success/fail message. There would be no need for server-side calls to generate resource IDs since it's just an algorithm the agent would use to generate IDs agent-side. We could provide a client .jar for Java agents so they have a local API to call to generate IDs. But the point is there would be no need for the agent to ask the server for IDs, since the agent would now be responsible for ID generation. The agent could then immediately start sending data with its own home-cooked IDs. For example, once an agent has its own ID registered on the server, it can start pumping data to rhq-metrics with the resource IDs it generated (along with the metrics for those resources, obviously).
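A rough sketch of what such a registration call could look like, as an in-memory stand-in for the real service. The class and method names are hypothetical, and the secret token used to tell "same feed registering again" from "another feed wants the same ID" is my assumption; the discussion only calls for a suggested name and a success/fail answer.

```python
from dataclasses import dataclass


@dataclass
class RegistrationResult:
    success: bool
    message: str


class FeedRegistry:
    """Hypothetical in-memory feed registration service."""

    def __init__(self) -> None:
        # Maps a registered feed ID to the token of the feed that owns it.
        self._owners: dict[str, str] = {}

    def register(self, suggested_id: str, token: str) -> RegistrationResult:
        owner = self._owners.get(suggested_id)
        if owner is None:
            # First request for this ID: the feed gets what it asked for.
            self._owners[suggested_id] = token
            return RegistrationResult(True, "registered")
        if owner == token:
            # Same feed registering again, e.g. after a restart.
            return RegistrationResult(True, "already registered")
        return RegistrationResult(False, "ID already taken by another feed")
```

After a successful register() the feed never has to call the server again for IDs; everything below the feed ID is generated agent-side.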

       

      I envision using UUIDs because they are easy to generate, obviously make it easy to come up with unique combinations, have a small fixed width (only 16 bytes of binary data, the string form is small and fixed width as well), and have this notion of namespace and name when you generate them, so it should be easy to build unique UUIDs based on the agent/parent/child tuple I discuss above. But this is an implementation detail - if it is easier to just use free-form strings, that's doable (though that has some drawbacks such as variable lengths).

        • 1. Re: ID generation
          jayshaughnessy

          This is in line with some of the comments I've added to Lukas's doc as well.  I do think a Feed [1] should be able to self-assign an ID if it thinks it can, like today's agent name, but it would have to guarantee uniqueness, which may be difficult.  The Feed assigning the Resource ID makes sense; we need to get away from having to sync on server-assigned [int] resource IDs, I think.  The "recursive" nature of the resource ID, incorporating the parent ID, echoes Lukas's and my own comments, so that is good.  Lukas's agent-type://agentName/resourceAddr format is like what you describe. Using UUIDs is possible and would shorten what could end up being long IDs, which could be cumbersome and take up bandwidth as well.

           

          One thing about [generated] Feed IDs is that they would very likely shy away from hostnames/IPs unless they were actually Feeds for machine-level resources. In that way they become more portable and the parent-child hierarchy does not need to be rooted at a platform.  But it may make sense for Feeds to tag their messages with host information, and we may want to think about how to relate data to a host, because otherwise correlation of data with the machine may be difficult.  For example, how would we trace a failure in an application back to a disk space issue on its host if we have no idea which host it was running on at the time?

           

          [1] The terms Feed and Agent are sort of interchangeable right now, for this response I'll use Feed, and consider today's RHQ Agent one type of Feed.

          • 2. Re: ID generation
            pilhuhn

            We need to make sure (and I think this is what you already envision with "Feed") that we also include the cases where an application is just sending data to the server directly without any other component of RHQ.next being involved.

             

            For the format, I think we need a URI with URI templates. The above agent-type://agentName/resourceAddr is a start here. I'll try to explain that a bit more:

             

            We have often seen in the past that customers deploy (independent) instances of RHQ into test, integration and production environments. While test is often a lot different from the other two, integration and production are very often very similar, with the major difference being different IP addresses for machines (of course there are/may be others). Here users want to set up not only their app in the integration env, but also the monitoring, and then port that over to production.

            With that in mind, I propose PUIDs, partial unique IDs, that are part of a URI like   scheme://<env>/.../PUID where the PUID would be the same for a certain resource inside env=integration or env=production. This would allow us to, e.g., dump a list of alert definitions from integration, exchange the environment in the URI and load it into production.

             

            The scheme may be used to identify the kind of object the URI identifies, e.g. metric://<env>/.../PUID#metric_name. Similarly for resource:// or perhaps also alertDef:// (and more).

             

            Within the URI template, the agent name is probably part of the /.../ I penciled in above. Also here we need to make sure that the agents can be "easily" exchanged when moving the environment. The agent name may already be a relative name (relative to the environment).

             

            In fact, with the multi-tenancy proposal ("Thoughts on tenants for rhq.next") the tenant ID needs to be part of the URI as well, so the URI template could look like  <scheme>://<tenant>/<env>/<agent>/PUID
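To illustrate the environment swap, a small sketch that builds a URI from the <scheme>://<tenant>/<env>/<agent>/PUID template and exchanges only the <env> part. The function names and the sample tenant, agent and PUID values are invented for the example.

```python
def build_uri(scheme: str, tenant: str, env: str, agent: str, puid: str) -> str:
    """Fill in the <scheme>://<tenant>/<env>/<agent>/PUID template."""
    return f"{scheme}://{tenant}/{env}/{agent}/{puid}"


def swap_env(uri: str, new_env: str) -> str:
    """Return the same URI with only the <env> segment exchanged."""
    scheme, rest = uri.split("://", 1)
    # rest is "<tenant>/<env>/<agent>/PUID"; split off the first two segments
    # and keep everything after them (including any #fragment) untouched.
    tenant, _old_env, remainder = rest.split("/", 2)
    return f"{scheme}://{tenant}/{new_env}/{remainder}"


# Hypothetical alert definition URI dumped from the integration environment.
integration_uri = build_uri("alertDef", "acme", "integration", "agent1", "puid-123")
production_uri = swap_env(integration_uri, "production")
```

Plain string splitting is used here on purpose so that a mixed-case scheme such as alertDef is preserved exactly as written.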

            • 3. Re: ID generation
              mazz

              Using URIs would involve variable-length IDs, and they could be pretty long. So that could be an issue we have to worry about (e.g. make sure we design any storage schema to support arbitrarily long URIs).

               

              Also, does this mean each feed has to know its tenant ID, agent ID and env ID? That is not something I would expect a "dumb agent" or "dumb feed" to be able to generate on its own. I can see it being able to generate its own agent ID, but the others I don't think so (how will it know it's in a test environment and not production, for example?). It would have to "register" to get those IDs. Somehow, then, the server would need to know what tenant and env this new "feed" belongs to and send that back as part of the registration response.

              • 4. Re: ID generation
                pilhuhn

                 

                > Using URIs would involve variable length IDs - and they could be pretty long. So that could be an issue we have to worry about (e.g. make sure we design any storage schema to support randomly long URIs).

                 

                It may be that we only need the (full) URI in central places (server, rhq-metrics, alerting), while the "agent" itself only needs to know the PUID.

                 

                > Also, does this mean each feed has to know its tenant ID, agent ID and env ID? That is not something I would expect

                 

                The feed needs to know some credentials, of which the tenant ID would be a part. I think the central instance would need to know the agent ID etc. Or in "RHQ classic" speak, I guess the agent would know its ID and the plugin would provide the PUID; the agent would in this case also know its tenant. Environment (prod, int, test) would be supplied by the user when he takes the agent into inventory.

                 

                > a "dumb agent" or "dumb feed" to be able to generate on its own. It can see it being able to generate its own agent ID, but the others I don't think so (how will it know its in a test environment and not production, for example?) It would

                 

                The dumb agent would need a "proxy" on the server side that supplies that data, either as part of some initial registration or when the feed sends data, to bring it into a form that is usable internally.
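One possible shape for such a proxy, as a sketch only: the server keeps what it learned when the feed was taken into inventory and expands the bare PUIDs a dumb feed sends into full URIs. The class, the method names, and the idea of keying the lookup by a credential string are assumptions for illustration.

```python
class FeedProxy:
    """Hypothetical server-side proxy that fills in what a dumb feed cannot know."""

    def __init__(self) -> None:
        # Maps a feed credential to (tenant, env, agent).
        self._feeds: dict[str, tuple[str, str, str]] = {}

    def take_into_inventory(self, credential: str, tenant: str, env: str, agent: str) -> None:
        # The tenant comes with the credentials; the environment is supplied
        # by the user when the feed is taken into inventory.
        self._feeds[credential] = (tenant, env, agent)

    def expand(self, credential: str, scheme: str, puid: str) -> str:
        """Turn a bare PUID sent by the feed into the full internal URI.

        Raises KeyError if the credential was never taken into inventory.
        """
        tenant, env, agent = self._feeds[credential]
        return f"{scheme}://{tenant}/{env}/{agent}/{puid}"
```

With this split the feed only ever handles its credential and its PUIDs, and the full URI exists only on the server side.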

                • 5. Re: ID generation
                  pilhuhn

                  Another thought that I forgot in my previous comments: if we obtain ("link to") resources from another inventory service, we could probably just take the PUID from that service and have a URI like  kubernetes://k-host/PUID or manageIQ:/....  

                  If we want to address sub-items like a metric on kubernetes, the scheme may become  metric+kubernetes://k-host/PUID#metric-name  or similar.
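A quick sketch of how such a combined URI could be taken apart again with Python's standard urllib.parse. The function name and the returned field names are made up, and treating the part before the "+" as the object kind is just my reading of the example above.

```python
from urllib.parse import urlsplit


def parse_linked_uri(uri: str) -> dict:
    """Split e.g. metric+kubernetes://k-host/PUID#metric-name into its parts."""
    parts = urlsplit(uri)  # note: urlsplit lowercases the scheme
    if "+" in parts.scheme:
        kind, provider = parts.scheme.split("+", 1)
    else:
        # Plain provider URI such as kubernetes://k-host/PUID
        kind, provider = None, parts.scheme
    return {
        "kind": kind,
        "provider": provider,
        "host": parts.netloc,
        "puid": parts.path.lstrip("/"),
        "metric": parts.fragment or None,
    }


metric_ref = parse_linked_uri("metric+kubernetes://k-host/PUID#metric-name")
resource_ref = parse_linked_uri("kubernetes://k-host/PUID")
```

One thing this shows: because standard URI parsers normalize the scheme to lower case, a mixed-case scheme name would not survive a round trip through them.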