We usually run Infinispan with a 1 GB new generation collected by ParNew, plus CMS for the old generation. The behaviour you describe is quite strange, though. Btw, is that a minor GC (the chart you've posted suggests so) or a full GC (although 10 seconds is a lot even for a 5 GB heap)?
What is the size of those entries? Are they somehow linked to other objects, or are these just data holders?
Do you use any listeners on the cache?
I guess you have to find which objects were created and caused the GC, possibly by diffing heap dumps before and after requests.
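If the chart is ambiguous, the JVM's own counters can help tell minor from full collections. Below is a minimal sketch using the standard `java.lang.management` API; the class name `GcCounters` is just an illustration, and the collector names it prints depend on the GC in use (for example, ParNew/ConcurrentMarkSweep or G1 Young Generation/G1 Old Generation).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints one line per collector registered in this JVM.
class GcCounters {

    // Returns "<collector>: count=<n>, timeMs=<t>" for every collector.
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

Calling `snapshot()` before and after a batch of requests and comparing the counts should show whether the young or the old collector is the one accumulating the time.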
Radim, thanks for your reply.
Entry size is a bit over 16KB in main memory (4096-dim float array + metadata), on disk it is about 6KB (due to compression). I have no listeners, the objects are simple data holders.
Each "request" reads 3000 objects, which means about 60MB of memory (+ some temporary data); the request also executes some operation on the read data (DistributedCallable), but it does not allocate anything special. After 10 requests, over 0.8GB of memory has been allocated, and this triggers the full GC, which takes those 10s...
Setting -XX:-UseConcMarkSweepGC -XX:NewSize=1g did not help much and the heap dumps did not show anything unexpected :-/ I don't seem to have any "memory leak"...
You're talking about a full GC, but as long as you don't hold references to those ~0.8 GB of data for too long, they should live only in the new generation. Collecting the new generation usually takes a few milliseconds, so your 10 seconds suggest that something is set up wrong.
To check whether the problem is in Infinispan, try mocking the request by creating the same amount of dummy data instead.
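Such a mock could look roughly like the sketch below. It assumes the figures mentioned in this thread (3000 entries per request, each a 4096-element float array) and does not touch Infinispan at all; the class and method names are made up for illustration.

```java
// Hypothetical mock of one "request": same allocation volume, no Infinispan.
class AllocationMock {
    static final int ENTRIES_PER_REQUEST = 3000; // figure taken from the thread
    static final int DIMENSIONS = 4096;          // 4096 floats = 16 KB of payload

    // Allocates the entries, does trivial work on them, then lets them
    // become garbage when the method returns.
    static double simulateRequest(int entries, int dimensions) {
        float[][] data = new float[entries][];
        for (int i = 0; i < entries; i++) {
            data[i] = new float[dimensions];
            data[i][0] = i; // touch the array so it is really used
        }
        double checksum = 0;
        for (float[] vector : data) {
            checksum += vector[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        for (int request = 1; request <= 10; request++) {
            long start = System.nanoTime();
            double checksum = simulateRequest(ENTRIES_PER_REQUEST, DIMENSIONS);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("request " + request + ": " + elapsedMs
                    + " ms (checksum " + checksum + ")");
        }
    }
}
```

Run it with the same heap and GC options as the real node (adding `-verbose:gc` helps). If no request stalls for seconds here, the long pause is more likely related to what the cache keeps alive than to the allocation volume itself.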
Dear Radim and others,
It seems I have figured out what is going on using the "Visual GC" tool (available as a plugin for VisualVM). It's a great tool!
As I said, I have all the data on disk and only a limited number of entries in memory: <eviction strategy="LRU" maxEntries="10000"/>
All entries extracted from the cache stay in Eden (as they should) BUT when the GC (I am now using G1) clears Eden, those 10000 entries that Ispn caches in memory move to Survivor space :-) In this way, the OldGen slowly grows and once in a while a full GC has to kick in (in the case of G1, only the G1 Evacuation phase stops the process, so the pause time decreased to 7-8s).
The full solution for me is to decrease the number of entries actually cached in memory from tens of thousands down (to zero?), but that is not a very nice solution :-)
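For anyone who wants to watch the same thing without Visual GC, here is a rough sketch that reads the heap pool sizes through the standard `MemoryPoolMXBean` API. The class name `HeapPools` is hypothetical, and the pool names it prints (Eden, Survivor, Old Gen) vary with the collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports how many bytes each heap pool currently uses.
class HeapPools {

    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, etc.
            }
            MemoryUsage usage = pool.getUsage();
            if (usage != null) { // null when the pool is no longer valid
                used.put(pool.getName(), usage.getUsed());
            }
        }
        return used;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> e : usedBytesByPool().entrySet()) {
            System.out.println(e.getKey() + ": " + (e.getValue() / (1024 * 1024)) + " MB");
        }
    }
}
```

Logging this after every request should make a slowly growing old generation visible as a steadily rising number for the old-gen pool.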
The OldGen space should be large enough to hold all entries plus some extra space to work with. That by itself should not cause any issues.
It might become a problem if you have a small Eden or Survivor space and the OldGen fills up with objects that will become garbage in the near future. Then you need to consider different Eden settings to ensure that those objects are dead before they would be moved to OldGen.
I managed to find suitable memory and GC settings for my scenario: use G1GC with a high tenuring threshold. Specifically:
-Xmx5500m -XX:-UseConcMarkSweepGC -XX:NewSize=1000m -XX:SurvivorRatio=3 -XX:G1ReservePercent=5 -XX:MaxTenuringThreshold=25 -XX:InitialTenuringThreshold=10 -XX:+UseG1GC
With these settings, the long "stop-the-world" collection in OldGen is not executed at all and everything is handled within Eden.
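In case a launcher script or container adds its own options, it may be worth checking which flags actually reached the JVM. A small sketch using the standard `RuntimeMXBean`; the class `JvmFlagCheck` and its filter are only an illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the heap-size and -XX: options of the running JVM.
class JvmFlagCheck {

    // Keeps only -Xm* (e.g. -Xmx, -Xms, -Xmn) and -XX: arguments.
    static List<String> memoryAndGcFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:") || arg.startsWith("-Xm")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String flag : memoryAndGcFlags(jvmArgs)) {
            System.out.println(flag);
        }
    }
}
```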