8 Replies Latest reply on Nov 7, 2012 9:38 AM by jayshaughnessy

    Can alert conditions be created for metric averages?

    josepho

      Hi,

       

      I am trying to create alerts that monitor metric averages over x minutes/hours/days. I have tried using the dampening conditions to achieve this, but none of them can accurately account for averages, for the following reasons:

       

      1. The metric average can exceed the threshold without every individual measurement exceeding it, so the 'Consecutive' condition is never satisfied (see the quick example after this list); and since I want the first occurrence, the 'Last N Evaluations' condition is effectively equivalent to 'Consecutive' for my purposes.
      2. Neither 'Time Period' nor 'Consecutive' will catch every instance, because an agent under high load does not have to send metrics at every collection interval, so checking a set number of measurement intervals within a fixed time frame cannot be guaranteed to be accurate unless the alert definition is flexible enough to watch the current average for the set time period.
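
      To make point 1 concrete with made-up numbers (the readings and the threshold of 60 are purely hypothetical): the window's average exceeds the threshold even though the individual measurements never exceed it consecutively, so a 'Consecutive' dampening rule never fires:

      # hypothetical window of readings against a made-up threshold of 60
      readings = [40, 95, 40, 95, 40, 95, 40]
      threshold = 60

      average = sum(readings) / len(readings)              # ~63.6 -> the average is over the threshold
      all_exceed = all(r > threshold for r in readings)    # False -> a 'Consecutive' rule over the window never fires

      print(average > threshold, all_exceed)               # True False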

       

      From what I can find, it looks like the solution would be to add an alert condition that monitors averages over a set time period, but it also appears that RHQ does not store the averages displayed in the GUI or REST, so this may also require changing the schema to hold those values. Can anyone offer insight on a simpler solution that would let alerts accurately monitor a metric average over a custom time period?

       

      Thanks,

      Joseph

        • 1. Re: Can alert conditions be created for metric averages?
          mazz

           You can set alerts on metric baselines. Not sure if it gets you exactly what you want, but it's something to look into.

          • 2. Re: Can alert conditions be created for metric averages?
            josepho

             I forgot to mention that I have also looked into using baselines, but they did not seem to work the way I had hoped either.

             

             I think that to accurately trigger alerts on baselines when monitoring averages (15 minutes in my case), the baselines would have to be recalculated every time metrics were received from an agent (every 30 seconds in my case). The baseline window would also have to be settable to 15 minutes, and it currently can only go as low as 1 day. This raises a couple of questions I still have about baselines:

             1. Is the baseline average the average of the metrics over the time period set by the 'Baseline Dataset' field in the server administration?
             2. If my assumption in #1 is accurate, is it possible to set the 'Baseline Dataset' to 15 minutes and the 'Baseline Calculation Frequency' to 30-60 seconds? If so, would this cause significant performance issues for a server monitoring ~150 agents, even in a clustered blade environment?

             

            Thanks,

            Joseph

            • 3. Re: Can alert conditions be created for metric averages?
              pilhuhn

              Actually, there is a way in RHQ (> 4.3?) to achieve what you want, but it needs a little bit of external tooling (e.g. a shell script).

               

              Have a look at http://pilhuhn.blogspot.de/2012/01/pushing-metrics-baselines-via-rest.html

               

              What you would need to do is (in a loop):

               

              - via the REST API, obtain the metrics for the last 15 minutes (via the raw-data endpoint)

              - calculate the baselines as you want them

              - write them back for the schedule of the metric

              - sleep some time

               

              And then just use the alerting where the data point is x% above/below the baseline.
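
              Not tested, but a rough sketch of that loop in Python could look like the following. The base URL, credentials, schedule id, endpoint paths and the request/response shapes are all placeholders/assumptions here (the blog post above describes the actual baseline REST resource), so treat it as an outline rather than a working client:

              import time
              import requests  # third-party HTTP library, assumed to be available

              RHQ = "http://localhost:7080/rest"   # placeholder base URL
              SCHEDULE_ID = 10001                  # placeholder metric schedule id
              AUTH = ("rhqadmin", "rhqadmin")      # placeholder credentials
              WINDOW_MS = 15 * 60 * 1000           # 15-minute averaging window

              while True:
                  now = int(time.time() * 1000)

                  # 1) fetch the raw data points for the last 15 minutes
                  #    (endpoint path is an assumption -- adjust to your RHQ version)
                  resp = requests.get(f"{RHQ}/metrics/data/{SCHEDULE_ID}/raw",
                                      params={"startTime": now - WINDOW_MS, "endTime": now},
                                      headers={"Accept": "application/json"},
                                      auth=AUTH)
                  points = resp.json()  # assumed to be a list of {"timeStamp": ..., "value": ...}
                  values = [p["value"] for p in points if p.get("value") is not None]

                  if values:
                      # 2) calculate the "baseline" you actually want (here: min/avg/max of the window)
                      avg = sum(values) / len(values)

                      # 3) write it back as the baseline for the schedule of the metric
                      #    (again, the exact path and payload depend on your RHQ version)
                      requests.put(f"{RHQ}/metrics/data/{SCHEDULE_ID}/baseline",
                                   json={"min": min(values), "mean": avg, "max": max(values)},
                                   auth=AUTH)

                  # 4) sleep some time before recomputing
                  time.sleep(60)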

               

              And if it works, write a blog post :-)

              • 4. Re: Can alert conditions be created for metric averages?
                josepho

                Setting alerts on baselines like that will trigger if a single datapoint goes over x% of the specified baseline value, right?

                 

                For my situation I want to raise alerts when the average value exceeds a set threshold. For updating the baselines via REST as described to work, I think the alert condition would have to look like the conditions for metrics: you could define a threshold value, select a comparator (<, >, =), then select which value from the baseline (min, max, or avg) to compare against. Then manipulating the baselines should work.

                • 5. Re: Can alert conditions be created for metric averages?
                  pilhuhn

                  Yes, go to alert definitions and then add a condition of type "metric baseline threshold". In the popup you can then select:

                  - the metric to compare (with its baseline)

                  - the comparator, which can be <, =, >

                  - the "exceeds baseline" factor in %

                  - and the reference entry of the baseline (avg, min, max)

                   

                  If you want to alert on "being outside the band", you need to add two conditions, one comparing with > and the other with <, and then use the "fire alert if ANY of the conditions matches" option.
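
                  For illustration only (this is not RHQ code, just a sketch of what those two conditions combined with ANY evaluate to, assuming the factor is applied relative to the baseline reference value):

                  def outside_band(value, baseline_ref, factor_pct):
                      # the '>' condition: value more than factor_pct above the baseline reference
                      upper = baseline_ref * (1 + factor_pct / 100.0)
                      # the '<' condition: value more than factor_pct below the baseline reference
                      lower = baseline_ref * (1 - factor_pct / 100.0)
                      # 'fire alert if ANY of the conditions matches'
                      return value > upper or value < lower

                  # e.g. with a baseline avg of 100 and a 20% band: 130 and 75 fire, 110 does not
                  print(outside_band(130, 100, 20), outside_band(75, 100, 20), outside_band(110, 100, 20))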

                   

                  What I was describing with the REST interface was only how to compute the baselines with an external job. The alerting itself will stay as is (i.e. with the internal computation mechanism).

                  • 6. Re: Can alert conditions be created for metric averages?
                    josepho

                    But an alert definition configured like that would still be triggered if a single datapoint met the condition, correct?

                    Whereas I am looking for a way to alert on the baseline average itself, i.e. when the baseline average (that I set through REST) exceeds a set threshold value.

                     

                    The reason I want to alert against an average of the datapoints over 15 minutes is to smooth out spikes in the metrics, and the current baseline implementation for alerts looks like it would still be triggered by a single high datapoint.
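
                    To make that concrete with made-up numbers (30-second samples over 15 minutes and a hypothetical threshold of 100): a single spike trips a per-datapoint condition, while the 15-minute average stays below the threshold:

                    # hypothetical 15 minutes of 30-second samples: steady around 50 with one spike to 500
                    samples = [50] * 29 + [500]
                    threshold = 100

                    spike_fires = any(s > threshold for s in samples)   # True  -> a per-datapoint condition fires on the spike
                    window_avg = sum(samples) / len(samples)            # 65.0  -> the 15-minute average stays below
                    avg_fires = window_avg > threshold                  # False -> an average-based condition stays quiet

                    print(spike_fires, window_avg, avg_fires)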

                    • 7. Re: Can alert conditions be created for metric averages?
                      genman

                      You can do something like: if a measurement exceeds the threshold N times in a row, then trigger an alert. (I don't know exactly what 'consecutive' means, though. Does it mean every time the measurement is scheduled to be collected, or something else?)

                       

                      This is like checking an average.

                      • 8. Re: Can alert conditions be created for metric averages?
                        jayshaughnessy

                        Yes, I agree with Elias; it sounds like you could use dampening against the raw metrics being reported to make the alerting tolerant of spikes. The dampening feature is designed for exactly this purpose.