-
1. Re: Resource's availability report shows Intermittent availability
mazz Jun 2, 2011 11:08 AM (in response to rafaelcba)
Try to INCREASE the availability scan period to something like 10 minutes. What's probably happening is that you have so many resources that they are collectively taking too long to report back.
You also might want to change the value of the server-side "Agent Quiet Time" system setting. If the agent doesn't report availability within that amount of time, the server will assume the agent is down and mark it as such. The default is 15 minutes IIRC, so maybe that isn't the issue (I find it hard to believe that it takes your agent longer than 15 minutes to collect all availabilities and report the changes up to the server), but you never know. In older versions, the agent quiet time setting on the server was 5 minutes, and in very large environments that was in fact too low for some people. So, see what value you have in there (Administration > System Settings).
Anyway, those are two settings I would look at.
What version of JBoss AS instances do you have? Is it AS/EAP 4? or AS/EAP 5?
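A rough sketch of the quiet-time idea described above, written in Python for brevity (RHQ itself is Java). The server treats an agent as down when its last availability report is older than the quiet time. The function name and the timestamps are illustrative assumptions, not RHQ's actual code.

```python
from datetime import datetime, timedelta


def agent_considered_down(last_report: datetime, now: datetime,
                          quiet_time: timedelta) -> bool:
    """Return True when no report has arrived within the quiet time."""
    return now - last_report > quiet_time


# Hypothetical case: the agent last reported 8 minutes ago.
now = datetime(2011, 6, 2, 12, 0)
last_report = datetime(2011, 6, 2, 11, 52)

# With the old 5 minute quiet time the agent would be marked down.
print(agent_considered_down(last_report, now, timedelta(minutes=5)))
# With a 15 minute quiet time it would still count as up.
print(agent_considered_down(last_report, now, timedelta(minutes=15)))
```

Under these assumptions, an agent that needs 8 minutes to report is flagged only when the quiet time is shorter than that.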
-
2. Re: Resource's availability report shows Intermittent availability
rafaelcba Jun 2, 2011 12:48 PM (in response to mazz)
Hi John. Thanks for your reply.
I'm using JON 2.4.1, so the 'agent quiet time' is already set to 15 min. The AS version is 4.2.2.GA - there is a plan to upgrade to EAP soon.
I'll try to INCREASE the availability scan period to 10 minutes as you suggested.
I'll report any news here.
Thanks.
-
3. Re: Resource's availability report shows Intermittent availability
rafaelcba Jun 3, 2011 10:15 AM (in response to mazz)
Hi Mazz.
I've increased the availability scan period, but no success yet. As you can see in the screenshot, the "false positive" still occurs on the Availability Report. The DOWN events are short (2 to 7 minutes), but they look strange when someone (the admins) takes a look at the report. Are there any other Agent params that would be important to verify? Is there any blog post/wiki page discussing how to tune Agent params when running on boxes with a large number of resources?
Thanks.
-
4. Re: Resource's availability report shows Intermittent availability
rafaelcba Jun 8, 2011 11:49 AM (in response to rafaelcba)
I was wondering whether it would be interesting to have some kind of "Dampening" for ResourceComponents' getAvailability() methods. This could help avoid some false positives regarding the availability of Resources. What do you think?
Regards.
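To make the dampening idea concrete, here is a minimal sketch in Python (not RHQ code, and not an existing RHQ feature). It wraps a raw availability check and only reports DOWN after a number of consecutive failed checks. The class name, the `failures_needed` parameter, and the choice to start in the UP state are all assumptions made for illustration.

```python
from typing import Callable


class DampenedAvailability:
    """Report DOWN only after several consecutive failed checks."""

    def __init__(self, check: Callable[[], bool], failures_needed: int = 3):
        if failures_needed < 1:
            raise ValueError("failures_needed must be at least 1")
        self._check = check
        self._failures_needed = failures_needed
        self._consecutive_failures = 0
        self._reported_up = True  # assumption: start out as UP

    def get_availability(self) -> bool:
        if self._check():
            # Any successful check resets the failure streak.
            self._consecutive_failures = 0
            self._reported_up = True
        else:
            self._consecutive_failures += 1
            if self._consecutive_failures >= self._failures_needed:
                self._reported_up = False
        return self._reported_up


# Hypothetical raw results: one isolated failure, then a real outage.
raw_results = iter([True, False, True, False, False, False, True])
dampened = DampenedAvailability(lambda: next(raw_results), failures_needed=3)
reported = [dampened.get_availability() for _ in range(7)]
print(reported)
```

The isolated failure is absorbed, and only the run of three failures is reported as DOWN. The trade-off is that a real outage is reported later, by up to `failures_needed` scan periods.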
-
5. Re: Resource's availability report shows Intermittent availability
rafaelcba Jun 10, 2011 11:03 AM (in response to rafaelcba)
Hello Guys!
Has anyone else had this kind of issue, with failures in collecting metrics and sending them to the Server?
I observed this on rhq-agents running on boxes with too many resources and short metric collection intervals.
I'd appreciate any sort of direction...
Thanks.
-
6. Re: Resource's availability report shows Intermittent availability
mazz Jun 10, 2011 2:26 PM (in response to rafaelcba)
The only thing we could recommend is either increasing the metric collection intervals OR reducing the number of metrics that are actually collected (in other words, go through your resources and ONLY ENABLE those metrics that you really care about).
That is for metric collections.
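As a back-of-the-envelope illustration of those two levers, the Python sketch below estimates how many metric collections per minute an agent has to perform. The metric counts and intervals are made-up numbers, and the function is a simplification for illustration, not something taken from RHQ.

```python
def collections_per_minute(enabled_metrics: int, interval_seconds: float) -> float:
    """Average number of metric collections the agent performs per minute."""
    return enabled_metrics * 60.0 / interval_seconds


# Hypothetical agent: 2000 enabled metrics collected every 60 seconds.
baseline = collections_per_minute(2000, 60)

# Lever 1: keep all metrics but collect every 5 minutes instead.
longer_interval = collections_per_minute(2000, 300)

# Lever 2: keep the 60 second interval but enable only 500 metrics.
fewer_metrics = collections_per_minute(500, 60)

print(baseline, longer_interval, fewer_metrics)
```

Either change cuts the load substantially in this example, and the two can be combined.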
As for availability collections, since availability is collected and reported differently than metrics, you can't just "disable" availability for a particular set of resources.
There is talk about refactoring/redesigning the way availability is collected and reported, but nothing to date has been done on this.
We actually haven't seen many people complain about this - usually, people aren't monitoring so many resources per agent that the availability collection slows down to the point where it causes problems. Tweaking the availability scan and quiet time settings is right now the only way to try to work around problems.
-
7. Re: Resource's availability report shows Intermittent availability
rafaelcba Jun 14, 2011 7:17 PM (in response to mazz)
mazz wrote: "The only thing we could recommend is either increasing the metric collection intervals OR reducing the number of metrics that are actually collected (in other words, go through your resources and ONLY ENABLE those metrics that you really care about)."
Mazz, what about tuning the RHQ Agent to be more robust in these scenarios? I observed that after tuning some parameters on the Agent, under:
RHQ Agent > CONFIGURATION > Plugin Container and Client Sender
the agent can handle more metrics and seems to be more stable. But there isn't a doc discussing how to tune the agent parameters. I've found [1], but it has no info about agent perf tuning. Maybe the caching approach in [2] could address this kind of issue :-)
[1] http://www.rhq-project.org/display/RHQ/Performance+Engineering
[2] http://www.rhq-project.org/display/RHQ/Ideas+about+Caching
-
8. Re: Resource's availability report shows Intermittent availability
vladcrc Nov 23, 2012 11:39 AM (in response to rafaelcba)
Hi,
We have a similar situation: an agent monitoring a machine with a JBoss 4.2; there are many resources in JBoss, but otherwise that JBoss is the only application running on that machine. The availability goes up, then down, and back again every minute (the availability metric is set to 1 minute), even though JBoss is running fine. We use RHQ 4.4. We have an alert set on it and the admins go crazy. We increased the collection interval for the availability metric, but it would be interesting to know why this happens.
Regards,
Vlad
-
9. Re: Resource's availability report shows Intermittent availability
jayshaughnessy Nov 24, 2012 5:10 PM (in response to vladcrc)
The advice above is not relevant to your RHQ 4.4 installation, as much has changed in the area of availability collection, agent quiet time, etc. (see the release notes for more). As for your situation, that's odd; it seems like the server's availability check is just failing quite often. Perhaps timing out? Look in your agent log for errors and see if it sheds any light on what may be happening. You can change your avail collection interval to be higher, but I'm not sure that will really help the situation.