
Hawkular Cassandra Issues/Tuning

timothy.allen May 11, 2016 5:16 PM

I am experiencing stability issues running Hawkular Metrics on OpenShift. Every 10 minutes or so my 3-node Cassandra cluster becomes unavailable and I lose metrics. I'm guessing it's either garbage collection or compaction, because it only happens after the cluster has been running for a while.

Some notes:

Each node has ample resources: 64 GB of RAM and 24 cores

The nodes never use more than 10 GB of memory or go above 600 millicores of CPU

While the cluster is unavailable, the Cassandra nodes report high numbers in the mutation pending column

I'm not really collecting that many metrics right now (maybe 20 containers). God help me when there are hundreds of containers running!
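
In case it helps anyone reproduce the pending-mutation observation, here is a rough Python sketch of how the pending counts could be pulled out of thread pool statistics. It assumes the numbers come from `nodetool tpstats` and that the output has the usual "Pool Name / Active / Pending / ..." columns; the sample text and the threshold of 1000 are made up for illustration, not taken from my cluster.

```python
# Sketch: extract the "Pending" column per thread pool from tpstats-style text.
# Assumption: each pool row looks like "<PoolName> <Active> <Pending> ...".

def pending_by_pool(tpstats_text):
    """Return {pool_name: pending_count} for every pool row found."""
    pending = {}
    for line in tpstats_text.splitlines():
        parts = line.split()
        # Pool rows have a name followed by at least two integer columns.
        # Header rows and the two-column "Dropped" section are skipped.
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            pending[parts[0]] = int(parts[2])
    return pending


def backed_up_pools(pending, threshold=1000):
    """Names of pools whose pending count exceeds a (hypothetical) threshold."""
    return sorted(name for name, count in pending.items() if count > threshold)


# Illustrative sample only; these numbers are invented.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                    32      4821         905113         0                 0
ReadStage                         0         0          12044         0                 0

Message type           Dropped
MUTATION                   117
"""

if __name__ == "__main__":
    stats = pending_by_pool(SAMPLE)
    print(stats)
    print(backed_up_pools(stats))
```

Running something like this on each node every few seconds should show whether the MutationStage backlog lines up with the 10-minute outages.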

Some adjustments and ideas:

I changed the JVM heap according to this document ("Tuning Java resources"); however, I did not change the garbage collector type.
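
For reference, this is the heap arithmetic as I understand it, written out as a small Python sketch. It assumes the default sizing rule used by `cassandra-env.sh` (max heap is the larger of "half of RAM, capped at 1 GB" and "a quarter of RAM, capped at 8 GB"; new generation is the smaller of 100 MB per core and a quarter of the heap). The function names are my own, and the image I am running may size things differently.

```python
# Sketch of the default heap sizing rule, assuming the cassandra-env.sh logic.
# All sizes are in megabytes. Function names are hypothetical.

def default_max_heap_mb(system_ram_mb):
    """max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB))"""
    half_capped = min(system_ram_mb // 2, 1024)
    quarter_capped = min(system_ram_mb // 4, 8192)
    return max(half_capped, quarter_capped)


def default_new_size_mb(max_heap_mb, cpu_cores):
    """min(100 MB per core, heap / 4)"""
    return min(100 * cpu_cores, max_heap_mb // 4)


if __name__ == "__main__":
    ram_mb = 64 * 1024   # 64 GB of RAM per node
    cores = 24           # 24 cores per node
    heap = default_max_heap_mb(ram_mb)
    print("max heap (MB):", heap)
    print("new gen  (MB):", default_new_size_mb(heap, cores))
```

If that rule applies, the default heap tops out at 8 GB no matter how much RAM the node has, which would fit with the nodes never using more than about 10 GB.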

I noticed that Hawkular is using the LCS compaction strategy. Wouldn't the DTCS strategy be more appropriate for time-series data? (See "Configuring compaction".)
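
To make the compaction question concrete, this is roughly the change I have in mind, as a Python sketch that only builds the CQL statement and does not run it. The keyspace and table names (`hawkular_metrics`, `data`) are my assumption about the schema, and the helper name is made up; `DateTieredCompactionStrategy` is the class name for DTCS.

```python
# Sketch: build (not execute) a CQL statement that switches a table's
# compaction strategy. Keyspace and table names are assumptions.

def alter_compaction_cql(keyspace, table, strategy_class):
    """Return an ALTER TABLE statement setting the compaction class."""
    return (
        "ALTER TABLE {ks}.{tbl} "
        "WITH compaction = {{'class': '{cls}'}};"
    ).format(ks=keyspace, tbl=table, cls=strategy_class)


if __name__ == "__main__":
    # Assumed schema location of the raw metric data.
    print(alter_compaction_cql("hawkular_metrics", "data",
                               "DateTieredCompactionStrategy"))
```

The printed statement could then be pasted into `cqlsh` on one node, ideally on a test cluster first, since switching strategies triggers recompaction of existing SSTables.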

Any help would be great! Let me know if you need any more info.