6 Replies Latest reply on Jan 4, 2013 9:15 AM by mazz

    Agent fails to send all custom metrics to the server

    dfradkov

      Hello,

       

      We have created a custom plugin for additional monitoring. However, we ran into an issue: the agent waits 30 seconds before sending data to the server, but when there are problems with the monitored resources, collecting all the metrics takes a lot longer. The funny part is that the agent does finish collecting the metrics, but only AFTER the report has been sent to the server, so many of them never make it to the server. Is there any way to set the timeout to, say, 60 seconds?

       

      We run RHQ 4.4.0.

       

      Here is the warning:

       

      2012-12-18 13:38:53,970 WARN  [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=10561, uuid=f9ac79ca-9901-476b-a336-11797ed4aa13, type={CrHealthChecks}Script Server, key=/opt/rhq/custom-monitoring/scripts/crCustomMonitoring, name=crCustomMonitoring, parent=local-rhq] - cause: org.rhq.core.pc.inventory.TimeoutException:Call to [org.rhq.plugins.script.ScriptServerComponent.getValues()] with args [[org.rhq.core.domain.measurement.MeasurementReport@e5b4a2b, [ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met inspector -env prod}|, sched=18894], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met wshealthcheck -env crstagingsdk1}|, sched=18871], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met wshealthcheck -env crprodsdk1}|, sched=18881], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met mllphealthcheck -env crstagingweb1}|, sched=18885], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met wshealthcheck -env crprodsdk2}|, sched=18882], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met ev -env staging}|, sched=18889], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met ev -env prod}|, sched=18890], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met webReports -env staging}|, sched=18891], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met mllphealthcheck -env crprodweb1}|, sched=18887], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met wshealthcheck -env crstagingweb2}|, sched=18874], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met wshealthcheck -env crstagingsdk2}|, sched=18872], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met wshealthcheck -env crprodweb2}|, sched=18884], 
ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met wshealthcheck -env crprodweb1}|, sched=18883], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met mllphealthcheck -env crprodweb2}|, sched=18888], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met wshealthcheck -env crstagingweb1}|, sched=18873], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met mllphealthcheck -env crstagingweb2}|, sched=18886], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met inspector -env staging}|, sched=18893], ScheduledMeasurementInfo[res=10561, name={-config config_CRHealthCheck -met webReports -env prod}|, sched=18892]]]] timed out after 30000 milliseconds - invocation thread will be interrupted.

       

      I believe the following error is caused by the collecting thread having been interrupted:

      2012-12-18 13:38:53,971 ERROR [ResourceContainer.invoker.daemon-1] (org.rhq.plugins.script.ScriptServerComponent)- Failed to obtain measurement [{-config config_CRHealthCheck -met wshealthcheck -env crprodsdk2}|]. Cause: java.lang.NumberFormatException: empty String
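
      To illustrate what I think is happening, here is a minimal Java sketch. It is not RHQ's actual code: the class and method names (TimeoutSketch, callWithTimeout, parseMetric) are hypothetical, and it assumes the agent wraps the plugin call in a timed wait and interrupts the worker on timeout, and that the script component parses the script's output as a double. It uses java.util.concurrent.TimeoutException, not RHQ's own TimeoutException.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {

    /** Parses script output as a number; an empty string raises NumberFormatException. */
    public static double parseMetric(String scriptOutput) {
        return Double.parseDouble(scriptOutput.trim());
    }

    /**
     * Runs the task with a time limit. On timeout the worker thread is
     * interrupted and null is returned, so any value the task would have
     * produced later is lost.
     */
    public static String callWithTimeout(Callable<String> task, long timeoutMillis)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the thread that is still collecting
            return null;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical slow health check: interrupted before it produces output.
        Callable<String> slowScript = () -> {
            try {
                Thread.sleep(500);
                return "42.0";
            } catch (InterruptedException e) {
                return ""; // no output was captured before the interrupt
            }
        };

        String result = callWithTimeout(slowScript, 100);
        System.out.println("timed out: " + (result == null));

        try {
            parseMetric(""); // what an interrupted script run would hand to the parser
        } catch (NumberFormatException e) {
            System.out.println("parse failed: " + e.getMessage());
        }
    }
}
```

      If this is roughly how the agent behaves, the NumberFormatException would be a side effect of the timeout rather than a separate problem in the script.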

       

      The next line tells me that metric collection took just over 34 seconds to finish:

      2012-12-18 13:38:53,975 INFO  [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementSenderRunner)- Measurement collection for [688] metrics took 34005ms - sending report to Server...

       

      Thank you.