I'm not sure why you need a recovery alert. It sounds like you just need a single Alert Definition. By default (with no special dampening) it will fire each time the metric is reported in the problem range. If the metric is reported with a normal value it will stop firing.
In general, recovery alerts are used when you want an alert to fire once and not again until the situation is corrected. In that case you set the standard alert to disable after firing, and you associate a recovery alert definition with it. The recovery alert just serves to re-enable the alert when the situation has been corrected.
I'm not sure about the REST interface; it may not yet support what you are looking for.
The reason for the recovery alert is to replicate the existing monitoring system that RHQ is replacing, in which an alert event has a start alert and an end alert. The existing system generates a 'cleared alert' to signal to the user that a condition has ended or been cleared.
I also wanted to use RHQ's ability to generate an alert every time the metric is reported, to give the end user a history of the alert event so that trends can be analyzed. Is this type of combined functionality possible with RHQ?
OK, I think I get what you are saying. You are looking to record an "Alert Event" in some way: something that demarcates the beginning and end of related alerts for a particular issue. Is that right?
So what you are looking for is sort of a way to fire a "Problem Start" alert, then 0 or more "Problem Not Solved" alerts, then a "Problem Solved" alert.
We don't really have that concept, and recovery alerts don't do quite what you want. You need a little more manual control over what is going on. The only thing I can think of that could help you out would be to incorporate Alert Notification scripts.
For example, and this is just off the top of my head: say you want to do this for Resource R where metric M > value V. For R, create 3 alert defs:
AD-1: PROBLEM FIXED! Condition: M <= V
This will be the recovery alert for AD-2
AD-2: PROBLEM START! Condition: M > V
Set AD-1 as recovery alert
AD-3: PROBLEM NOT SOLVED! Condition: M > V
Now, on AD-1 and AD-2 you also have Alert Notification scripts:
- For AD-1 it disables AD-3 (using the AlertDefinitionManagerRemote method to do so).
- For AD-2 it enables AD-3 (using the AlertDefinitionManagerRemote method to do so).
So then, if M exceeds V, AD-2 will fire. It will then disable itself and wait for the recovery alert AD-1 to fire. This is the standard recovery alert feature. But AD-2 will also enable AD-3 via its script. AD-3 will then keep firing until the problem is fixed and AD-1 fires, at which point AD-1's script disables AD-3.
Or, at least I think it would behave that way.
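To sanity-check that sequence, here is a rough simulation in plain Java. It does not call RHQ at all; it only models the three alert defs as boolean flags and walks a series of metric readings through them. The class and method names are made up for illustration. It also assumes that AD-3 starts out disabled, that AD-1 only fires once per problem, and that AD-3 does not fire on the same report that triggers AD-2. The real timing in RHQ may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the AD-1 / AD-2 / AD-3 setup (not RHQ code).
class AlertEventSimulation {

    // Returns the names of the alerts that would fire, in order, for the given readings of M.
    static List<String> simulate(double[] readings, double threshold) {
        boolean startEnabled = true;       // AD-2: PROBLEM START
        boolean fixedArmed = false;        // AD-1: PROBLEM FIXED (assumed armed once AD-2 has fired)
        boolean notSolvedEnabled = false;  // AD-3: PROBLEM NOT SOLVED (assumed to start disabled)
        List<String> fired = new ArrayList<>();

        for (double m : readings) {
            if (m > threshold) {
                if (startEnabled) {
                    fired.add("PROBLEM START");
                    startEnabled = false;      // AD-2 disables itself after firing
                    fixedArmed = true;         // now waiting for the recovery alert
                    notSolvedEnabled = true;   // AD-2's notification script enables AD-3
                } else if (notSolvedEnabled) {
                    fired.add("PROBLEM NOT SOLVED");
                }
            } else if (fixedArmed) {
                fired.add("PROBLEM FIXED");
                fixedArmed = false;
                startEnabled = true;           // recovery alert re-enables AD-2
                notSolvedEnabled = false;      // AD-1's notification script disables AD-3
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        double[] readings = {1, 5, 6, 7, 2, 1, 8};
        for (String name : simulate(readings, 4)) {
            System.out.println(name);
        }
    }
}
```

With V = 4 and the readings above, the model fires PROBLEM START, two PROBLEM NOT SOLVED alerts, PROBLEM FIXED, and then a new PROBLEM START for the last reading.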
As for grouping these alerts in a report: there is no way to link alerts together by default. You may be able to come up with some sort of script to figure it out, based on naming and such. See AlertManagerRemote.findAlertsByCriteria() for various query options.
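As a sketch of the "based on naming" idea, the grouping step could look something like the following in plain Java. It assumes you have already fetched the alerts for the resource, sorted them by time, and reduced each one to the name of its alert def. The fetch itself is not shown, and the class and method names here are hypothetical, not part of RHQ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups time-ordered alert def names into "alert events".
class AlertEventGrouper {

    // An event opens on a "PROBLEM START" name and closes on a "PROBLEM FIXED" name.
    static List<List<String>> group(List<String> namesInTimeOrder) {
        List<List<String>> events = new ArrayList<>();
        List<String> current = null; // the event currently open, if any

        for (String name : namesInTimeOrder) {
            if (name.startsWith("PROBLEM START")) {
                current = new ArrayList<>();
                events.add(current);
            }
            if (current != null) {
                current.add(name);
                if (name.startsWith("PROBLEM FIXED")) {
                    current = null; // event is complete
                }
            }
            // Alerts seen while no event is open are ignored in this sketch.
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("PROBLEM START");
        names.add("PROBLEM NOT SOLVED");
        names.add("PROBLEM FIXED");
        names.add("PROBLEM START");
        System.out.println(group(names));
    }
}
```

An event that has a start but no fix yet simply stays open as the last group, which may be useful for showing problems still in progress.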