6 Replies Latest reply on Dec 7, 2014 4:53 AM by david.novak

    Garbage Collection pauses Ispn for 10s

    david.novak

      Hi,

       

      I am running a single-node Ispn 6.0.2 cache with 20M entries backed by a file loader with 10K entries in memory:

              <eviction strategy="LRU" maxEntries="10000"/>
              <persistence passivation="false">
                  <singleFile shared="false" preload="false" ignoreModifications="false" purgeOnStartup="false" fetchPersistentState="true" location="ispn-cache-store">
                      <async enabled="false"/>
                  </singleFile>
              </persistence>

      My system is running requests approximately every 1-2 seconds and each request extracts about 3000 entries from Ispn.

      Approximately every 10th request triggers a GC that takes over 10s (so the request itself takes over 10s), which is of course not usable in production :-/

       

      Question: What can I do about it? :-)  Are there any recommended Java memory settings (at the moment I set only -Xmx5000m), or should I use a different Oracle garbage collector?

       

      Thank you for any help

       

      David

       

      ispn-garbage-collection.png

        • 1. Re: Garbage Collection pauses Ispn for 10s
          rvansa

          We usually run Infinispan with a 1 GB new generation collected by ParNew, plus CMS for the old generation. However, your behaviour is quite strange. By the way, is that a minor GC (the chart you've posted suggests so) or a full GC (although 10 seconds is a lot even for a 5 GB heap)?

           

          What is the size of those entries? Are they somehow linked to other objects, or are these just data holders?

          Do you use any listeners on the cache?

           

          I guess you have to find which objects were created and caused the GC, possibly by diffing heap dumps before and after requests.
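One way to tell minor from full collections without a profiler is to read the JVM's own GC counters around a request. The sketch below uses the standard `java.lang.management` API; the class name `GcCounters` and the dummy allocation loop are illustrative only, and the collector names it prints depend on which GC is active (for example, HotSpot typically reports `ParNew` and `ConcurrentMarkSweep`, or `G1 Young Generation` and `G1 Old Generation`).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

final class GcCounters {
    /** Cumulative {collection count, collection time in ms} per collector. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    public static void main(String[] args) {
        Map<String, long[]> before = snapshot();

        // Stand-in for one request: allocate short-lived garbage.
        // Replace this loop with the real cache reads.
        for (int i = 0; i < 100_000; i++) {
            byte[] garbage = new byte[16 * 1024];
            garbage[0] = 1;
        }

        Map<String, long[]> after = snapshot();
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] start = before.get(e.getKey());
            long runs = e.getValue()[0] - start[0];
            long millis = e.getValue()[1] - start[1];
            System.out.printf("%s: %d collections, %d ms%n", e.getKey(), runs, millis);
        }
    }
}
```

If the long pauses show up under the old-generation collector, they are full (or old-generation) collections rather than minor ones.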

          • 2. Re: Garbage Collection pauses Ispn for 10s
            david.novak

            Radim, thanks for your reply.

            Entry size is a bit over 16KB in main memory (4096-dim float array + metadata); on disk it is about 6KB (due to compression). I have no listeners, and the objects are simple data holders.

            Each "request" reads 3000 objects, which means about 60MB of memory (+ some temporary data); the request also executes some operation on the read data (DistributedCallable) but it does not allocate anything special. After 10 requests, over 0.8GB memory is allocated and this causes the full GC which takes these 10s...
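The figures above can be checked with a few lines of arithmetic. This is only a sketch: the class name `RequestFootprint` is made up, and the per-entry metadata and object overhead are not stated in the thread, so only the raw float payload is computed exactly.

```java
final class RequestFootprint {
    static final int DIMENSIONS = 4096;          // from the post: 4096-dim float array
    static final int ENTRIES_PER_REQUEST = 3000; // from the post

    /** Raw payload of one float vector in bytes (no object header, no metadata). */
    static long arrayPayloadBytes() {
        return (long) DIMENSIONS * Float.BYTES;
    }

    /** Bytes read by one request for a given per-entry size. */
    static long perRequestBytes(int entries, long bytesPerEntry) {
        return entries * bytesPerEntry;
    }

    public static void main(String[] args) {
        long payload = arrayPayloadBytes();                               // 16384 bytes = 16 KB
        long perRequest = perRequestBytes(ENTRIES_PER_REQUEST, payload);  // 49,152,000 bytes
        System.out.printf("payload per entry: %d bytes%n", payload);
        System.out.printf("payload per request: %.3f MiB%n", perRequest / (1024.0 * 1024.0));
        System.out.printf("payload for 10 requests: %.1f MiB%n", 10 * perRequest / (1024.0 * 1024.0));
    }
}
```

The payload alone comes to roughly 47 MiB per request, so the quoted 60MB corresponds to about 20KB per entry once metadata and temporary objects are included.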

             

            Setting -XX:-UseConcMarkSweepGC -XX:NewSize=1g did not help much and the heap dumps did not show anything unexpected :-/  I don't seem to have any "memory leak"...

            • 3. Re: Garbage Collection pauses Ispn for 10s
              rvansa

              You're talking about a full GC, but as long as you don't hold references to those ~0.8 GB of data for too long, they should stay in the new generation only. Collecting the new generation usually takes a few milliseconds, so your 10 seconds suggests that something is set up wrong.

               

              To check whether the problem is in Infinispan, try mocking the request by creating the same amount of dummy data instead.
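A minimal sketch of such a mock, assuming the sizes mentioned earlier in the thread (3000 entries per request, each a 4096-element float array). The class and method names are hypothetical, and the code deliberately involves no Infinispan classes, so any long pause it shows would point at the JVM settings rather than the cache.

```java
import java.util.ArrayList;
import java.util.List;

final class DummyRequestBenchmark {
    /**
     * Allocates `entries` float vectors of `dimensions` elements, keeps them
     * reachable for the duration of the "request", and returns a checksum so
     * the allocations cannot be optimized away.
     */
    static double runDummyRequest(int entries, int dimensions) {
        List<float[]> batch = new ArrayList<>(entries);
        for (int i = 0; i < entries; i++) {
            float[] vector = new float[dimensions];
            vector[0] = i;
            batch.add(vector);
        }
        double checksum = 0;
        for (float[] vector : batch) {
            checksum += vector[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        // Run with the same -Xmx and GC flags as the real application.
        for (int request = 1; request <= 30; request++) {
            long start = System.nanoTime();
            double checksum = runDummyRequest(3000, 4096);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("request %d: %d ms (checksum %.0f)%n", request, elapsedMs, checksum);
        }
    }
}
```

If every dummy request finishes in milliseconds, the data that survives between requests (for example, entries held by the cache) is the more likely cause of the long collections.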

              • 4. Re: Garbage Collection pauses Ispn for 10s
                david.novak

                Dear Radim and others,

                It seems I have figured out what is going on using the "Visual GC" tool (available as a plugin for VisualVM). It's a great tool!

                ispn-garbage-collection-3.png

                As I said, I have all data on disk and only a limited number of entries in memory: <eviction strategy="LRU" maxEntries="10000"/>

                All entries extracted from the cache stay in Eden (as they should), BUT when the GC (I am now using G1) clears Eden, the 10000 entries that Ispn keeps cached in memory move to Survivor space :-)  In this way, the OldGen slowly grows and once in a while a full GC has to kick in (with G1, only the G1 Evacuation phase stops the process, which decreased the pause time to 7-8s).

                 

                The full solution for me would be to decrease the number of entries actually cached in memory from tens of thousands down (to zero?), but that is not a very nice solution :-)
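To get a feel for how long a cached entry stays alive under this configuration, the following sketch simulates an LRU cache with a plain `LinkedHashMap`. This is a stand-in, not Infinispan's actual eviction implementation, and it assumes the worst case in which every request loads 3000 entries that are not yet cached. The class name `LruResidency` is made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruResidency {
    /** Average number of requests an entry stays cached before LRU eviction. */
    static double averageResidencyInRequests(final int maxEntries, int entriesPerRequest, int requests) {
        final long[] evicted = {0};
        final long[] totalAge = {0};
        final int[] currentRequest = {0};

        // Value = index of the request that loaded the entry.
        Map<Long, Integer> lru = new LinkedHashMap<Long, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Integer> eldest) {
                if (size() <= maxEntries) {
                    return false;
                }
                evicted[0]++;
                totalAge[0] += currentRequest[0] - eldest.getValue();
                return true;
            }
        };

        long nextKey = 0;
        for (int r = 0; r < requests; r++) {
            currentRequest[0] = r;
            for (int i = 0; i < entriesPerRequest; i++) {
                lru.put(nextKey++, r); // always a new key: worst-case churn
            }
        }
        return evicted[0] == 0 ? 0.0 : (double) totalAge[0] / evicted[0];
    }

    public static void main(String[] args) {
        double avg = averageResidencyInRequests(10_000, 3_000, 100);
        System.out.printf("average residency: %.2f requests%n", avg);
    }
}
```

Under these assumptions an entry survives between three and four requests before it is evicted. That may be long enough to outlive several young collections and be promoted, only to become garbage shortly afterwards, which would match the slowly growing OldGen described above.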

                • 5. Re: Garbage Collection pauses Ispn for 10s
                  wdfink

                  The OldGen space should be large enough to hold all entries plus some extra space to work with. That should cause no issues.

                   

                  It might be problematic if you have a small Eden or Survivor space and the OldGen fills up with objects that will become garbage in the near future. In that case, consider a different Eden setting to ensure that those objects are dead before they are moved to OldGen.

                  • 6. Re: Garbage Collection pauses Ispn for 10s
                    david.novak

                    I managed to find suitable memory and GC settings for my scenario: use G1GC with a high tenuring threshold. Specifically:

                    -Xmx5500m -XX:-UseConcMarkSweepGC -XX:NewSize=1000m -XX:SurvivorRatio=3 -XX:G1ReservePercent=5 -XX:MaxTenuringThreshold=25 -XX:InitialTenuringThreshold=10 -XX:+UseG1GC

                    With these settings, the long "stop-the-world" collection in OldGen is not executed at all and everything is handled within Eden.
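When tuning flags like these, it can help to confirm which values the running JVM actually uses, since a JVM may adjust or reject a value it does not accept. The sketch below is HotSpot-specific: it relies on `com.sun.management.HotSpotDiagnosticMXBean`, which is not part of the standard Java SE API, and the class name `GcFlagCheck` is made up for the example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

final class GcFlagCheck {
    /** Arguments the JVM was started with, e.g. -Xmx5500m and -XX:+UseG1GC. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Effective value of a HotSpot option; throws IllegalArgumentException if the option is unknown. */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hotspot.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArguments());
        String[] options = {
            "UseG1GC", "MaxTenuringThreshold", "InitialTenuringThreshold", "SurvivorRatio"
        };
        for (String name : options) {
            System.out.println(name + " = " + vmOption(name));
        }
    }
}
```

Running it inside the application's JVM (or with the same command-line flags) shows whether the tenuring thresholds and collector choice took effect as intended.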

                    Thanks, everybody.