Did you analyze the garbage collection?
Most pauses are caused by bad GC tuning.
JVM information (CPU and memory usage across all heap regions) would really help to find out what's going on. According to the startup script you're running with the default GC; try CMS instead, even with such small heaps (1303 MB).
Also, Hot Rod request performance can vary with key/value size; are yours roughly constant?
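A minimal sketch of how CMS could be enabled by appending to `JAVA_OPTS` in the startup conf; the flags are standard HotSpot options, but these are illustrative defaults, not tuned recommendations:

```shell
# Sketch: switch from the default collector to CMS (with ParNew for the
# young generation). Append to the existing JAVA_OPTS in clustered.conf.
# Flag choices here are illustrative, not tuned values.
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:+CMSParallelRemarkEnabled"
```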
We updated the clustered.conf and adjusted the garbage collection. This seems to help.
# This file is optional; it may be removed if not needed.
# Specify the maximum file descriptor limit, use "max" or "maximum" to use
# the default, as queried by the system.
# Defaults to "maximum"
# Specify the profiler configuration file to load.
# Default is to not load profiler configuration file.
# Specify the location of the Java home directory. If set then $JAVA will
# be defined to $JAVA_HOME/bin/java, else $JAVA will be "java".
# Specify the exact Java VM executable to use.
if [ "x$JBOSS_MODULES_SYSTEM_PKGS" = "x" ]; then
   JBOSS_MODULES_SYSTEM_PKGS="org.jboss.byteman"
fi
# Uncomment the following line to prevent manipulation of JVM options
# by shell scripts.
# Specify options to pass to the Java VM.
if [ "x$JAVA_OPTS" = "x" ]; then
   JAVA_OPTS="-Xms4G -Xmx8g -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true \
   -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 \
   -Dsun.rmi.dgc.server.gcInterval=3600000 -Djboss.modules.system.pkgs=$JBOSS_MODULES_SYSTEM_PKGS \
   -Djava.awt.headless=true -Djgroups.bind_addr=infinispan-node -Djboss.bind.address=infinispan-node \
   -Djgroups.udp.ip_ttl=1 -Djboss.bind.address.management=infinispan-node -Dsun.nio.ch.bugLevel=''"
else
   echo "JAVA_OPTS already set in environment; overriding default settings with values: $JAVA_OPTS"
fi
# Sample JPDA settings for remote socket debugging
# Sample JPDA settings for shared memory debugging
# Uncomment to not use JBoss Modules lockless mode
# Uncomment to gather JBoss Modules metrics
You should set the heap size to -Xms8g -Xmx8g; equal initial and maximum sizes prevent the JVM from resizing the heap at runtime.
Also, you might add -Xloggc:mylog.log -verbose:gc to analyse the behaviour; you'll find lots of explanations if you search for those flags.
If you use the latest Java 6 or Java 7, you could also try the G1 collector, which might perform better.
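Taken together, the suggestions above could look like this in clustered.conf. This is a sketch: the log path and the 200 ms pause goal are placeholders, not tuned values, and I've kept only the heap/GC flags for clarity:

```shell
# Sketch of the JAVA_OPTS changes suggested above; values are placeholders.
# Fixed heap: equal -Xms/-Xmx avoids heap resizing at runtime.
JAVA_OPTS="-Xms8g -Xmx8g -XX:MaxPermSize=256m"
# GC logging for offline analysis (PrintGCDetails / PrintGCDateStamps
# are standard HotSpot flags that add per-collection detail).
JAVA_OPTS="$JAVA_OPTS -Xloggc:mylog.log -verbose:gc \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# Alternatively, on a recent Java 6/7, try G1 with a pause-time goal:
# JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=200"
```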
Yes, we have been tuning the GC. But I now think our pauses have more to do with the fact that we may be hitting the server too hard with too many connections. When I did the thread dump, the Hot Rod client was waiting for long stints to perform IO.
We have been playing around with the GC, trying to get our max read time below 5 ms and max write time below 10 ms, without much luck. We do observe that the more client threads we have, the more outliers we see. Is there any reason a dirty read would take longer than a write, given that we are not doing locking? All our calls are synchronous.
Why do you expect max latency below 10 ms when you're setting the GC pause target to 2000 ms? Obviously, max latency can exceed 2000 ms if you run long enough: eventually the GC will use the full pause budget you've given it.
Regrettably, with state-of-the-art Java GCs this is not possible; Java is not a hard real-time platform. Moreover, if you run in clustered mode, network latencies (or lost messages delivered later) complicate it a lot more.
You can only optimize certain quantiles; there will always be some outliers.
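To see which quantiles the GC is actually costing you, the log produced by -Xloggc can be summarized with standard tools. A sketch, assuming the classic HotSpot "real=N.NN secs" timing entries (the exact log format varies by JVM version and flags):

```shell
# Sketch: summarize stop-the-world pause times from a HotSpot GC log
# (as produced by -Xloggc with -XX:+PrintGCDetails). Assumes the classic
# "real=N.NN secs" entries; the format varies across JVM versions.
gc_pause_summary() {
  # $1: path to the GC log file
  grep -o 'real=[0-9.]*' "$1" \
    | cut -d= -f2 \
    | sort -n \
    | awk '{ a[NR] = $1 }
           END {
             if (NR == 0) { print "no pauses found"; exit }
             p = int(NR * 0.99); if (p < 1) p = 1
             printf "pauses=%d max=%.3fs p99=%.3fs\n", NR, a[NR], a[p]
           }'
}
```

Usage: `gc_pause_summary mylog.log`. Comparing the reported max and p99 against your 5/10 ms targets makes it obvious when the outliers are GC pauses rather than server-side IO.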