7 Replies Latest reply on Apr 2, 2016 12:42 AM by karthikraj

    Delay in collecting metrics

    karthikraj

      Hello guys,

       

      I am always seeing the below messages in my JON agent.

       

      2016-03-31 10:11:16,156 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Rescheduling [1] late collections: [ScheduledMeasurementInfo[res=103142, name=connectionAvailable, sched=512527]]

      2016-03-31 10:11:16,156 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection is falling behind... Missed requested time by [30044ms]

      2016-03-31 10:11:16,156 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Rescheduling [1] late collections: [ScheduledMeasurementInfo[res=106146, name=wait-time, sched=588438]]

      2016-03-31 10:11:16,156 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection is falling behind... Missed requested time by [30043ms]

      2016-03-31 10:11:16,156 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Rescheduling [1] late collections: [ScheduledMeasurementInfo[res=101100, name=connectionAvailable, sched=552083]]

      2016-03-31 10:11:16,156 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection is falling behind... Missed requested time by [30035ms]

      2016-03-31 10:11:16,156 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Rescheduling [1] late collections: [ScheduledMeasurementInfo[res=100959, name=connectionAvailable, sched=565576]]

      2016-03-31 10:11:16,156 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection is falling behind... Missed requested time by [30004ms]

      2016-03-31 10:11:16,156 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Rescheduling [1] late collections: [ScheduledMeasurementInfo[res=107684, name=collection-fetch-count, sched=634375]]

      2016-03-31 10:11:16,202 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection is falling behind... Missed requested time by [30035ms]

      2016-03-31 10:11:16,202 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- The most recent measurement requests prior to this detected delay:

        Thu Mar 31 10:11:16 IST 2016 ScheduledMeasurementInfo[res=107636, name=invocations, sched=633151]

      2016-03-31 10:11:16,202 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Rescheduling [1] late collections: [ScheduledMeasurementInfo[res=101348, name=connectionAvailable, sched=557111]]

      2016-03-31 10:11:16,202 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection is falling behind... Missed requested time by [30001ms]

      2016-03-31 10:11:16,202 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Rescheduling [1] late collections: [ScheduledMeasurementInfo[res=107131, name=peak-concurrent-invocations, sched=620931]]

      2016-03-31 10:11:16,248 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection is falling behind... Missed requested time by [30026ms]

      2016-03-31 10:11:16,248 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- The most recent measurement requests prior to this detected delay:

        Thu Mar 31 10:11:16 IST 2016 ScheduledMeasurementInfo[res=109534, name=collection-recreated-count, sched=611831]

      2016-03-31 10:11:16,248 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Rescheduling [1] late collections: [ScheduledMeasurementInfo[res=107281, name=second-level-cache-put-count, sched=624388]]

      2016-03-31 10:11:16,293 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection is falling behind... Missed requested time by [30031ms]

      2016-03-31 10:11:16,293 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- The most recent measurement requests prior to this detected delay:

        Thu Mar 31 10:11:16 IST 2016 ScheduledMeasurementInfo[res=107688, name=second-level-cache-miss-count, sched=634555]

      2016-03-31 10:11:16,293 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Rescheduling [1] late collections: [ScheduledMeasurementInfo[res=102549, name=MaxWaitTime, sched=534066]]

      2016-03-31 10:11:16,293 DEBUG [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection is falling behind... Missed requested time by [30010ms]

       


      However  getting values in the dashboard , I suspect they are not current values because of above message logs.(Measurement collection is falling behind... Missed requested time by [30001ms]) .
      Does the dashboard showing the present values or prior values.?
      Please help on this

        • 1. Re: Delay in collecting metrics
          pathduck

          The values shown in JON/RHQ metrics are collected values, i.e. they are "prior" values from the last time the metric was collected. So they are actually never 'current' values. Can't really think of any tools that show "current" values, possibly apart from Jconsole, they are all dependent on being collected from the JVM at intervals. JConsole/JVisualvm just does it more often.

           

          When this collection time is, depends on the collection interval/schedule set for the specific metric.

           

          If the measurement collection is falling behind, then it might be a good idea to check your JON Server/Platforms for excessive load. However, note that if you log DEBUG, you might see warnings that are not really things you need to worry about. Turn off DEBUG and concentrate on WARNings and ERRORs, and see if your metrics are getting collected.

           

          Hope that explains it,

          Stian

          • 2. Re: Delay in collecting metrics
            jayshaughnessy

            This is the second time you've reported something that indicates you're under too much load. (the first being the UI message about the server not responding fast enough).  Ensure your server and agent machines are not under too much load, your network connections are reliable, etc, and that your db is performing.  Your Agent could be trying to do too much, turn off metric collection that isn't necessary. Things like that.  How many app servers are you trying to monitor?

            • 3. Re: Delay in collecting metrics
              pathduck

              Developers always seem to think it's a Good Idea(tm) to log debug in dev, and then they "forget" to turn it off when deploying to production. Then they complain their application is running slow... sad but true...

              • 4. Re: Delay in collecting metrics
                karthikraj

                Hello Jay,

                 

                 

                Q.How many app servers are you trying to monitor?

                A.our environment has 28 groups(52 nodes)

                 

                 

                *Ensure your server and agent machines are not under too much load ---> I don't see much load and CPU utilization in my server and agents.
                How to watch my db performance ?? do u want me to have separate tool to monitor db performance?

                 

                *XMX values for both agent and servers are 2048M
                *Still I am using debug mode to analyze the logs (hope this will not have much impact for delay in measurement collection   "Measurement collection is falling behind... Missed requested time by [30026ms]" )

                * As of now The metrics I am monitoring is RAM usage , CPU utilization of individual nodes(52 nodes) , average load for all the nodes and max,used heap usages
                *Scheduled intervals  are
                      Load average  -- every 3 minutes
                     CPU user load --2 minutes
                      RAM usage -- 5 minutes

                      used heap - 20 mts
                     max heap usage - 40 mts

                 

                These are the metrics I have enabled with purpose and I am not sure what are the other default metrics agents is collecting.

                • 5. Re: Delay in collecting metrics
                  karthikraj

                  Adding to the things , I could see much cpu utilization in agent since the agent machine has 8 cores almost all of them are 60% busy (since all the nodes are running i could see more java processes.)

                  • 6. Re: Delay in collecting metrics
                    jayshaughnessy

                    Is that 1 agent per node or 1 agent monitoring everything?  Certainly your agent looks to be under strain.  How many resources have been imported for the one agent?  The metric collection seems fine/conservative for the CPUs and JVM so there must be something else, with metric collection outside of these resources or just a vast number of resources.

                    • 7. Re: Delay in collecting metrics
                      karthikraj

                      Hello Jay,

                       

                      One agent is monitoring everything and have imported a single resource (EAP)

                       

                       


                      I want to look the cassandra schemas /tables.Can you please suggest me tool