OK, this can get complicated, so hopefully I'll explain it properly here.
First: when the agent is initially installed, it knows nothing about the server cloud it should be talking to. Yes, there is an "auto-discovery via multicast" feature inside the agent to help it find servers, but we cannot require the agent to run in a network that allows multicast traffic (many network admins disable multicast). So we need a mechanism that works anywhere, in any network. (The agent's multicast auto-detection feature is only used to detect when a known Jopr server has gone up or down; it is not used to discover unknown Jopr servers.) For the same reason, the server cannot rely on multicast traffic to detect new agents: many (most?) network admins disable multicast, so it wouldn't work for them.
So, when you first start the agent, it needs to know its "registration server". This is the first server the agent will ever talk to. If you run "rhq-agent.sh", it will ask you the setup questions, one of which is the server's endpoint. THIS is the "registration" server the agent will talk to. The agent connects to the registration server and registers with the Jopr server cloud.
At this point, the agent has joined the Jopr environment: it is registered, and all Jopr servers in the cloud know about this agent. So even though the agent talked to only this one server, the agent isn't "owned" by that server. ALL servers now know about this agent and are able to talk to it. As part of the registration handshake, the registration server sends the agent a "failover list". This list contains all the servers in the cloud (the public endpoint addresses and ports that the servers are listening on). You can see this failover list by using the agent's "failover" prompt command (type "help failover" for information about this prompt command). The failover list is customized per agent, so the agents are spread out over all the servers in the cloud, which spreads out the load. Once the agent gets its failover list, it immediately switches over to its "primary" server (i.e. the server at the top of the failover list). This primary server can be different from the registration server you gave the agent at startup.
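To make the handshake concrete, here is a minimal Python sketch of the idea; it is not Jopr's actual code. It assumes a simple round-robin rotation to customize each agent's failover list (the real server uses its own distribution logic plus affinity), and the names `Server`, `Agent`, `build_failover_list` and `register` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Server:
    name: str
    public_endpoint: str  # host:port that every agent must be able to route to


def build_failover_list(servers: list[Server], agent_number: int) -> list[Server]:
    """Rotate the cloud's server list so consecutive agents get different primaries.

    Round-robin rotation is an assumption made for this sketch only.
    """
    if not servers:
        raise ValueError("the cloud has no servers")
    start = agent_number % len(servers)
    return servers[start:] + servers[:start]


@dataclass
class Agent:
    name: str
    registration_endpoint: str  # the endpoint answered during agent setup
    failover_list: list[Server] = field(default_factory=list)

    @property
    def primary(self) -> str:
        # Before registering, the only server the agent knows is the registration server.
        # Afterwards, its primary is whatever sits at the top of its failover list.
        if self.failover_list:
            return self.failover_list[0].public_endpoint
        return self.registration_endpoint


def register(agent: Agent, cloud: list[Server], registered_agents: list[str]) -> None:
    """Hypothetical registration: record the agent, then send down its failover list."""
    registered_agents.append(agent.name)  # every server in the cloud now knows this agent
    agent.failover_list = build_failover_list(cloud, len(registered_agents) - 1)
```

With three servers in the cloud, three agents that all register through the same registration server end up with three different primaries, which is the load-spreading effect described above.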
If the primary server goes down, or is switched to "maintenance mode", the agent uses its failover list to try another server in the cloud, working its way down the list until it finds a server that is up and can process its request. (And yes, you can set up affinity so that agents prefer to talk to one server over another. Affinity groups are discussed in the docs; they help determine the order of the failover list.)
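A rough sketch of that failover walk follows. It assumes the agent can somehow ask whether a server is up and not in maintenance mode; the `can_serve` callback stands in for that check. The affinity step is a simplification assumed only for illustration (it moves the agent's preferred servers to the front); the real affinity-group ordering is described in the docs and may differ.

```python
from typing import Callable, Optional


def apply_affinity(failover: list[str], preferred: set[str]) -> list[str]:
    """Move endpoints in the agent's (hypothetical) affinity group to the front.

    sorted() is stable and False sorts before True, so preferred endpoints lead
    and the original order is otherwise preserved.
    """
    return sorted(failover, key=lambda endpoint: endpoint not in preferred)


def pick_server(failover: list[str], can_serve: Callable[[str], bool]) -> Optional[str]:
    """Walk the failover list top to bottom and return the first server that can serve."""
    for endpoint in failover:
        if can_serve(endpoint):
            return endpoint
    return None  # every server is down or in maintenance mode
```

For example, if the first server on the list is down and the second is in maintenance mode, `pick_server` skips both and returns the third.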
Note that the IP addresses in your failover list are the "public endpoints" configured for your servers. If you start your agent and get odd connection errors after it registers, go to the Administration page in the UI, click the "Manage Servers" link, and look at the endpoints for all your servers. These are the public endpoints, and they MUST be routable by ALL agents. You can edit the endpoints from this UI screen if you need to change them.
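If you suspect a routing problem, one quick check is to try a plain TCP connection from the agent's machine to each public endpoint listed on the "Manage Servers" page. The Python sketch below does that with the standard library's `socket.create_connection`. It assumes endpoints are written as plain `host:port` (no bracketed IPv6 literals), and a successful connect only proves the port is reachable, not that the Jopr server behind it is healthy.

```python
import socket


def is_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to "host:port" succeeds within timeout seconds."""
    host, _, port = endpoint.rpartition(":")  # assumes plain host:port
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, no route to host, ...
        return False


def unreachable_endpoints(endpoints: list[str], timeout: float = 3.0) -> list[str]:
    """List the public endpoints this machine cannot open a TCP connection to."""
    return [ep for ep in endpoints if not is_reachable(ep, timeout)]
```

Run `unreachable_endpoints([...])` on the agent's machine with the endpoints copied from the UI; any endpoint it returns is one that agent cannot route to.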
(from user "jbosstcs"): "wow !!!... must say that was an awesome explanation... i have a doubt... how is that "cloud" that you were referring to configured ?... where can i see it ?"
oh.. sorry mazz.. forgot to move it to a new group... anyways, now that you have done the needful, can you please also answer my question about the "cloud" you were referring to?
The server cloud is really trivial to "configure" because there really is no "configuration". All you do is start a new server and *poof* you have a server cloud! :)
If you start more than one server (all using the same backend database), they automatically become "part of the cloud". The servers use internal mechanisms to register themselves in the cloud. Of course, when you install your servers, you have to give each one a unique server name; this is a field you set in the installer GUI. The installer docs talk more about this, so I refer you to them. (http://www.redhat.com/docs/en-US/JBoss_ON//html/Installation_Guide/Installation_Guide-Installation_Settings-Server_Name.html is one page that briefly covers it.)
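As a rough analogy (not Jopr's real schema or code), you can think of the shared backend database as the cloud's membership list: each server started against it adds a row for itself, and the unique server name is what keeps those rows apart. The in-memory SQLite table and the `create_schema`, `join_cloud` and `cloud_members` helpers below are hypothetical stand-ins for that idea.

```python
import sqlite3


def create_schema(db: sqlite3.Connection) -> None:
    # Hypothetical table: the PRIMARY KEY is what enforces unique server names.
    db.execute(
        "CREATE TABLE IF NOT EXISTS server ("
        " name TEXT PRIMARY KEY,"
        " public_endpoint TEXT NOT NULL)"
    )


def join_cloud(db: sqlite3.Connection, name: str, endpoint: str) -> None:
    """A starting server registers itself; a duplicate name is rejected."""
    try:
        with db:  # commits on success, rolls back if the INSERT fails
            db.execute(
                "INSERT INTO server (name, public_endpoint) VALUES (?, ?)",
                (name, endpoint),
            )
    except sqlite3.IntegrityError:
        raise ValueError(f"server name {name!r} is already used in this cloud") from None


def cloud_members(db: sqlite3.Connection) -> list[str]:
    """Every server that shares this database is, by definition, in the cloud."""
    return [row[0] for row in db.execute("SELECT name FROM server ORDER BY name")]
```

The point of the analogy is that there is nothing to configure: pointing another server at the same database is what makes it a member.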
To "see" your cloud (more specifically, the servers in your cloud), simply log into the GUI and go to the Administration page, then click the "Manage Servers" link (it's called "List Servers" in earlier versions; we changed the name recently in trunk). This takes you to a page listing the servers in your cloud. These are the servers your agents can talk to (the servers that will be in your agents' failover lists).
There are some docs on this - here's some:
that link is really good... so, mazz, what i can do is maintain the jopr server on a single machine and bind all the agents on all the nodes i want to monitor to that one server... that way i can avoid unnecessary complexity... the only downside could be if that single jopr server of mine goes down or something like that... right?
Right - you can designate one of your servers as the "registration" server and point all of your new agents to it. This server is "special": it handles all new agent registrations. So yes, make sure you keep it running. If it goes down, any new agents that are started will sit and wait for it to come back up; once they see the server come up, they wake up and continue their registration.
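The "sit and wait" behavior can be pictured as a simple polling loop. This is only a sketch: the fixed 5-second interval and the `is_up` probe are assumptions for illustration, not the agent's actual retry settings, and `sleep` is injectable purely so the loop can be tested without real delays.

```python
import time
from typing import Callable


def wait_for_registration_server(
    is_up: Callable[[], bool],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until is_up() reports the registration server is up.

    Returns the number of failed polls, so a caller can log how long it waited.
    """
    failed_polls = 0
    while not is_up():
        failed_polls += 1
        sleep(poll_seconds)  # hang out, then check again
    return failed_polls
```

Once the function returns, a new agent would go on to send its registration request and receive its failover list.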
Of course, once you've deployed all of your agents and you do not need to add new agents anymore, that server is nothing "special" anymore. :)
BTW: there is technically no configuration setting that marks a server as a "registration" server (you won't see "registration server" anywhere in the UI). It's just the server you pick for your new agents to be configured to talk to.