10 Replies Latest reply on May 15, 2014 1:39 PM by spolti

    RHQ JBoss and Tomcat plugins false alerts.

    spolti

      Hello Guys.

       

      Yesterday I updated my RHQ Server from version 4.6 to 4.10 and everything is working fine except the availability alerts for JBoss and Tomcat servers.

      RHQ is generating many false alerts.
      This did not occur in version 4.6.


      How can I make RHQ reliably recognize that a server is really stopped, so that it does not send false alerts?

      Is there any timeout that can be configured?



      Any help is welcome.

      Thanks.

        • 1. Re: RHQ JBoss and Tomcat plugins false alerts.
          spolti

          Anyone?

          • 2. Re: RHQ JBoss and Tomcat plugins false alerts.
            genman

            What do you see in the logs for the agent?

             

            I've seen issues where monitoring stops working after the plugin is upgraded.

            • 3. Re: RHQ JBoss and Tomcat plugins false alerts.
              spolti

              Hi.

              Thanks for your reply.

               

              The only unusual message that I saw in the agent logs was:

               

              2014-05-12 11:53:06,397 WARN  [ConfigurationManager.threadpool-1] (rhq.core.pc.configuration.ConfigurationCheckExecutor)- Plugin Error: Invalid Core Address resource configuration returned by JBossAS7 plugin - Required property 'delete-durable-queue' has a null value in PropertyMap[id=0, name=role, map={name=PropertySimple[id=0, name=name, value=guest, override=null], send=PropertySimple[id=0, name=send, value=true, override=null], consume=PropertySimple[id=0, name=consume, value=true, override=null], create-durable-queue=PropertySimple[id=0, name=create-durable-queue, value=false, override=null], delete-durable-queue=PropertySimple[id=0, name=delete-durable-queue, value=null, override=null], manage=PropertySimple[id=0, name=manage, value=null, override=null]}].

               

              On the server side I didn't see any strange logs.

              I already tried raising the agent's operation-invoker thread pool to 10 threads (<entry key="rhq.agent.plugins.operation-invoker.threadpool-size" value="5"/>) and changing the connector transport params:

               

                         <entry key="rhq.communications.connector.transport-params" value="serverBindAddress=10.1.0.203&amp;serverBindPort=16163&amp;numAcceptThreads=3&amp;maxPoolSize=303&amp;clientMaxPoolSize=304&amp;socketTimeout=60000&amp;enableTcpNoDelay=true&amp;backlog=200" />
                        

               

               

              Nothing I have tried has worked so far...

               

              Thanks.

              • 4. Re: RHQ JBoss and Tomcat plugins false alerts.
                jayshaughnessy

                Not related, but to let you know, RHQ 4.11 was just released.   Also probably not related but did you install 4.10 from scratch or upgrade?  

                 

                Can you be a little more specific about your alert definitions?   The conditions, whether or not you are using recovery alerting, dampening, notification types, etc.  There are no general known issues with this type of alerting so we'll likely need to dig a little deeper.

                • 5. Re: RHQ JBoss and Tomcat plugins false alerts.
                  spolti

                  Hi Jay, thanks for your reply.

                   

                  I upgraded from version 4.6; it is not a fresh installation. All agents were upgraded automatically when they started again.

                   

                  I read that the RHQ server migration from 4.10 to 4.11 is not working yet (I could be wrong).

                   

                  The alert definitions are just Availability[Goes Down] for both, AS7 Standalone and Tomcat Servers (version 6).

                  The Notifications configured are Direct E-mails and Mobicents (SMS).

                  And I don't have recovery or dampening configured yet.

                   

                  To configure the alerts I'm using the alert definition templates.

                   

                  I've also tried increasing the agents' memory to min 128M and max 256M, with no success either.. =//

                   

                  Thanks.

                  • 6. Re: RHQ JBoss and Tomcat plugins false alerts.
                    pathduck

                    Hello Filippe,

                    I wonder if this is something like what I am seeing in our monitoring of JBoss CE and Tomcat servers. There are a lot of short timespans where availability goes Down, even though I know that the server was up. There are no indications of high load or network issues, and each time it's usually just 1-2 minutes of Down. See the screenshot of how it looks:

                     

                    Clipboard01.png

                     

                    Is this the same you're seeing?

                     

                    I am guessing this has to do with some timeout/concurrency issues in the Agents and their avail. scanning, and possibly it's better in later versions - we're running 4.5.1 in production.

                     

                    This means that if you want alerts, they have to be set to "down for 10 minutes", which is not very nice - and even then we sometimes get false alerts.

                    • 7. Re: RHQ JBoss and Tomcat plugins false alerts.
                      jayshaughnessy

                      The 4.11 upgrade has one issue which requires a small manual DB fix after the upgrade and before startup.  It should not prevent you from upgrading if you are interested in doing so. Although, there are no changes relevant to your issue here.

                       

                      You are using GOES DOWN AVAIL conditions.  It is good that you are not using dampening because dampening actually does not have any effect on availability conditions, because they trigger on discrete changes, not temporal or repetitive conditions (meaning availability changes are reported once).

                       

                      If you are getting alert triggers it indicates (as you would expect) that the availability was not DOWN and then changed to DOWN.  This can happen for various reasons, typically due to, as Stian said, load issues that somehow are timing out your avail checks.  I'd take a look at your agent machine and look at the load.  Also check the agent log file to see if you can learn why the avail checks are intermittently failing.

                       

                      In version 4.10 we implemented https://bugzilla.redhat.com/show_bug.cgi?id=971556, which helps with slow avail checking overall.  If you can't seem to get around the failures, you could change to AVAIL_DURATION conditions.  Something like GOES DOWN and STAYS DOWN 3 minutes, or whatever duration helps you avoid your short blips in avail.
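
                      To illustrate why a duration condition filters out short blips, here is a minimal Python sketch. It is not RHQ's implementation; the function name `duration_alerts`, the data layout, and the sample timestamps are all made up for the example. It assumes availability changes arrive as discrete (timestamp, state) events, and that a resource whose last reported state is DOWN stays down.

```python
from datetime import datetime, timedelta

def duration_alerts(changes, stays_down_for):
    """Return the times at which a 'STAYS DOWN for N' condition would fire.

    changes: list of (timestamp, state) availability changes, sorted by
             time, where state is "UP" or "DOWN" (hypothetical layout).
    """
    fired = []
    for i, (ts, state) in enumerate(changes):
        if state != "DOWN":
            continue
        # The DOWN period ends at the next reported change, if there is one.
        next_ts = changes[i + 1][0] if i + 1 < len(changes) else None
        # Assumption: with no later change, the resource is still DOWN.
        if next_ts is None or next_ts - ts >= stays_down_for:
            fired.append(ts + stays_down_for)
    return fired

# Made-up sample data: a 1-minute blip followed by a real outage.
t0 = datetime(2014, 5, 12, 11, 0)
changes = [
    (t0, "UP"),
    (t0 + timedelta(minutes=10), "DOWN"),  # short blip
    (t0 + timedelta(minutes=11), "UP"),
    (t0 + timedelta(minutes=30), "DOWN"),  # real outage
]
alerts = duration_alerts(changes, timedelta(minutes=3))
print(alerts)
```

                      With this sample data a plain GOES DOWN condition would trigger on both DOWN changes, while the 3-minute duration condition triggers only for the second one, three minutes after it starts.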

                      • 8. Re: RHQ JBoss and Tomcat plugins false alerts.
                        spolti

                        Hi Stian/Jay,

                         

                        Yes, this is exactly what's happening here.

                         

                        Sorry about this question, but where can I change the AVAIL_DURATION condition values?

                        Is it in the database?

                         

                         

                        Thanks.

                        • 9. Re: RHQ JBoss and Tomcat plugins false alerts.
                          pathduck

                          Sorry about this question, but where can I change the AVAIL_DURATION condition values?

                          Is it in the database?

                           

                          You just need to create alerts with the condition "Stays Down for 10 Minutes". Or five minutes.... For instance, I have an alert "Jboss-Server-DOWN" with Availability Duration = Stays Down for 10 Minutes.

                           

                          This avoids most of the false alerts we had been getting because of the small glitches in avail. checking. Of course the result is also that it might take 10+ minutes for a server to be reported Down by RHQ, but for our situation this is OK. For others it might not be.

                           

                          However, like I said, we're using 4.5.1 in Production, and I think (like Jay mentions) that this has improved quite a lot since this rather old (2+ years?) version. So if you are still getting minor drops in availability for no good reason, it might be something else.

                          • 10. Re: RHQ JBoss and Tomcat plugins false alerts.
                            spolti

                            Maybe when GC is triggered it makes the JVM unresponsive for X seconds, and the RHQ server then sees the resource as down. I'll configure concurrent GC, check whether the problem goes away, and post the results here.
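
                            One way to test this idea is to check whether each short Down blip overlaps a long GC pause. The Python sketch below is only an illustration: the function names, the 5-second threshold, and the sample numbers are hypothetical, and it assumes the blip windows and GC pause windows have already been extracted by hand (for example from the availability history and a GC log) as (start, end) pairs in seconds.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if the two time intervals share any moment."""
    return a_start < b_end and b_start < a_end

def blips_during_gc(down_blips, gc_pauses, min_pause=5.0):
    """Return the Down blips that overlap a GC pause of at least min_pause seconds.

    down_blips, gc_pauses: lists of (start, end) pairs in seconds
    (hypothetical layout, collected manually).
    """
    long_pauses = [(s, e) for s, e in gc_pauses if e - s >= min_pause]
    return [
        blip for blip in down_blips
        if any(overlaps(blip[0], blip[1], s, e) for s, e in long_pauses)
    ]

# Made-up sample values, not real measurements.
down_blips = [(100.0, 160.0), (900.0, 960.0)]
gc_pauses = [(95.0, 104.0), (500.0, 500.2)]
matched = blips_during_gc(down_blips, gc_pauses)
print(matched)
```

                            If most blips match a long pause, the GC explanation looks plausible; if none do, the cause is probably elsewhere.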

                             

                            Anyway, thank you guys for your attention.

                             

                            Regards.