1 Reply Latest reply on Jun 25, 2013 4:39 PM by mazz

    Agent memory monitoring

    genman

      I've had a few agents run out of memory, mostly due to too many resources, but sometimes due to memory leaks in bad plugins.


      I believe the agent is supposed to restart when it runs low on memory. How does the agent do this, and why doesn't it always work? How can we check or alert for this?


      I also did some memory profiling, mostly to check for leaks. The numbers below are instance counts, not memory usage, but one thing that struck me is that each resource, whether a MySQL table or something quite small, uses a lot of memory.


      One thing I noticed is that each Resource.childResources object is a Set wrapping a ConcurrentHashMap, which is quite expensive in terms of memory. This was probably the only thing that seemed excessive. The JDOM objects probably come from parsing JBoss XML files, and I wonder whether those references need to be kept at all.


      Instance Counts for All Classes (including platform)

      Instances  Class
         328758  java.lang.String
         294973  [C
         128861  java.util.HashMap$Entry
          96602  [Ljava.util.HashMap$Entry;
          92602  [Ljava.lang.Object;
          90346  java.util.ArrayList
          71976  java.util.concurrent.locks.ReentrantLock$NonfairSync
          71856  java.util.concurrent.ConcurrentHashMap$Segment
          71856  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
          67739  java.util.HashMap
          64907  java.util.LinkedHashMap$Entry
          60993  org.rhq.core.domain.measurement.MeasurementScheduleRequest
          55696  java.util.HashMap$KeySet
          52850  java.util.HashSet
          41600  java.lang.Integer
          29953  org.rhq.core.pc.measurement.ScheduledMeasurementInfo
          28859  java.util.LinkedHashMap
          26987  java.lang.Long
          25175  java.util.TreeMap$Entry
          19993  [I
          18786  org.rhq.core.domain.configuration.PropertySimple
          17563  org.rhq.core.pc.plugin.CanonicalResourceKey$KeyTypePlugin
          17430  java.util.TreeMap
          17298  java.util.Hashtable$Entry
          13148  javax.management.modelmbean.DescriptorSupport$FieldName
          13052  org.rhq.core.pc.inventory.ResourceContainer$ResourceComponentInvocationHandler
          12755  java.util.LinkedHashSet
          12488  [B
          11886  java.util.LinkedList$Entry
          11845  java.util.LinkedList
          11821  org.mc4j.ems.store.CompleteValueHistory
          11671  org.rhq.core.domain.configuration.Configuration
          11664  org.mc4j.ems.impl.jmx.connection.bean.attribute.DAttribute
           7775  [S
           6956  java.util.HashMap$Values
           6927  java.util.HashMap$EntrySet
           6669  javax.xml.bind.JAXBElement
           6552  org.jdom.Text
           6378  org.rhq.core.domain.configuration.definition.PropertyDefinitionSimple
           6170  java.lang.Class
           6164  org.rhq.core.clientapi.descriptor.configuration.SimpleProperty
           6156  org.jdom.Attribute
           4933  [Ljava.util.Hashtable$Entry;
           4915  java.util.concurrent.ConcurrentHashMap$HashEntry
           4913  java.io.File
           4844  java.util.Hashtable
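
      If the agent was running on a Java 6 JVM, the ConcurrentHashMap numbers above hang together: there, a ConcurrentHashMap built with default settings eagerly allocates 16 segments, each a ReentrantLock (hence the similar NonfairSync count) with its own HashEntry[] table. If all of those maps used the defaults, the 71856 segments would correspond to 71856 / 16 = 4491 maps holding only 4915 HashEntry objects between them, roughly one entry per map. One hypothetical lighter-weight construction (not what RHQ actually does; the class and method names are made up for illustration) is a Set view over a ConcurrentHashMap with a small initial capacity and a concurrency level of 1:

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ChildSetSketch {
    // Hypothetical factory for a concurrent Set backed by a ConcurrentHashMap.
    // concurrencyLevel = 1 asks for a single lock segment on Java 6/7 JVMs
    // (the default is 16); on Java 8+ the argument is only a sizing hint.
    static <T> Set<T> newChildSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>(4, 0.75f, 1));
    }

    public static void main(String[] args) {
        Set<String> children = newChildSet();
        children.add("mysql-table-1");
        children.add("mysql-table-1"); // duplicates are ignored, as in any Set
        children.add("mysql-table-2");
        System.out.println(children.size()); // prints 2
    }
}
```

      On Java 8 and later ConcurrentHashMap no longer allocates segments at all, so the savings there would be much smaller.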

        • 1. Re: Agent memory monitoring
          mazz

          > I believe the agent is supposed to restart when it runs low on memory. How does the agent do this, and why doesn't it always work? How can we check or alert for this?


          The agent "tries" to do this through the VM health check thread.


          But this isn't foolproof, and if the VM is in a really bad state, it may not help at all.


          However, if you want to see what it does, look at the source code for org.rhq.enterprise.agent.VMHealthCheckThread. It essentially checks the memory pools every 5 seconds. If memory usage hits a certain threshold, it attempts a GC; if memory is still critically low after that, the underlying plugin container and communications subsystems are shut down and then restarted in the hope of clearing up the situation. That relief may only be temporary, and it probably will be if you have an extremely large inventory or a leaking plugin.
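
          A rough sketch of that check, GC, recheck, restart sequence in plain Java, for illustration only: it is not the real VMHealthCheckThread code. It simplifies "the memory pools" to overall heap usage from the standard MemoryMXBean, and the 0.90 threshold and the restart hook are hypothetical placeholders.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

public class HealthCheckSketch {
    private final double threshold;           // fraction of max heap, e.g. 0.90 (hypothetical)
    private final DoubleSupplier usageRatio;  // returns current used/max
    private final Runnable gc;                // normally System::gc
    private final Runnable restartSubsystems; // hypothetical hook: stop/start plugin container + comm layer

    HealthCheckSketch(double threshold, DoubleSupplier usageRatio,
                      Runnable gc, Runnable restartSubsystems) {
        this.threshold = threshold;
        this.usageRatio = usageRatio;
        this.gc = gc;
        this.restartSubsystems = restartSubsystems;
    }

    /** One pass: GC if over the threshold, restart only if still over it. Returns true if restarted. */
    boolean checkOnce() {
        if (usageRatio.getAsDouble() < threshold) {
            return false;          // healthy, nothing to do
        }
        gc.run();                  // first try to reclaim garbage
        if (usageRatio.getAsDouble() < threshold) {
            return false;          // the GC was enough
        }
        restartSubsystems.run();   // still critically low: restart in-process, same JVM
        return true;
    }

    /** Used/max for the whole heap; 0.0 if the JVM reports no defined max. */
    static double heapUsageRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getMax() > 0 ? (double) heap.getUsed() / heap.getMax() : 0.0;
    }

    public static void main(String[] args) throws InterruptedException {
        HealthCheckSketch check = new HealthCheckSketch(0.90,
                HealthCheckSketch::heapUsageRatio,
                System::gc,
                () -> System.out.println("would restart plugin container and comm services"));
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(check::checkOnce, 0, 5, TimeUnit.SECONDS); // every 5 seconds
        Thread.sleep(11_000);      // let it run a few passes for the demo
        timer.shutdownNow();
    }
}
```

          Injecting the usage supplier and the GC/restart actions keeps the decision logic testable without actually filling the heap. It also shows why this can't rescue every case: if the restart itself needs memory the JVM can no longer allocate, the hook fails like anything else.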


          Note that this will not shut down the entire JVM; nothing in this health check thread starts a new JVM. This is all done internally within the same agent VM.
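
          On the "how can we check or alert" question: independent of the health check thread, the JDK's own memory MXBeans can emit a notification when a heap pool crosses a usage threshold. A minimal sketch, assuming you can run code inside the agent JVM; the class name, the 85% figure, and the stderr "ALERT" line are placeholders. Pools that don't support usage thresholds (typically the young-generation eden space) are skipped.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

public class MemoryAlertSketch {
    /** Sets a usage threshold at `fraction` of max on each heap pool that supports one; returns the count. */
    static int armHeapThresholds(double fraction) {
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()
                    || usage == null || usage.getMax() <= 0) {
                continue;                        // non-heap pools, unsupported pools, or no defined max
            }
            pool.setUsageThreshold((long) (usage.getMax() * fraction));
            armed++;
        }
        return armed;
    }

    /** Registers a listener on the MemoryMXBean that fires when an armed pool crosses its threshold. */
    static void listen() {
        NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notification.getType())) {
                MemoryNotificationInfo info =
                        MemoryNotificationInfo.from((CompositeData) notification.getUserData());
                // Placeholder action: hook your own logging or alerting in here.
                System.err.println("ALERT: heap pool '" + info.getPoolName() + "' is using "
                        + info.getUsage().getUsed() + " of " + info.getUsage().getMax() + " bytes");
            }
        };
        emitter.addNotificationListener(listener, null, null);
    }

    public static void main(String[] args) {
        int armed = armHeapThresholds(0.85);
        listen();
        System.out.println("Armed usage thresholds on " + armed + " heap pool(s)");
    }
}
```

          Because the notification fires on the crossing, this gives an early warning before the health check thread has to resort to a restart.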