0 Replies Latest reply on May 11, 2016 5:16 PM by timothy.allen

    Hawkular Cassandra Issues/Tuning

    timothy.allen

      I am experiencing stability issues running Hawkular Metrics on OpenShift. Every 10 minutes or so my 3-node Cassandra cluster becomes unavailable and I lose metrics. I'm guessing it's either garbage collection or compaction, because it only happens after the cluster has been running for a while.

       

      Some notes:

      Each node has ample resources: 64 GB of RAM and 24 cores

      The nodes never use more than 10 gigs or get above 600 millicores

      While the cluster is unavailable, the Cassandra nodes report high numbers in the mutation pending column

      I'm not really collecting that many metrics right now (maybe 20 containers). God help me when there are hundreds of containers running!
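
      For reference, here is a minimal sketch of how the pending counts could be pulled out of `nodetool tpstats` text to watch for the backlog. The function name and the sample text are illustrative assumptions (the numbers are made up, not output from my cluster), and it assumes the usual six-column pool rows.

```python
def pending_by_pool(tpstats_text):
    """Return {pool_name: pending_count} parsed from `nodetool tpstats` text."""
    pending = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        # Pool rows: name, active, pending, completed, blocked, all-time blocked.
        # The header line splits into more than six fields, so it is skipped.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            pending[fields[0]] = int(fields[2])
    return pending


# Made-up sample in the tpstats layout (hypothetical numbers).
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                    32      4821        1203344         0                 0
ReadStage                         0         0          88210         0                 0
"""

if __name__ == "__main__":
    print(pending_by_pool(SAMPLE))
```

      Running something like this every few seconds during an outage should show whether the MutationStage backlog lines up with the unavailable periods.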

       

      Some adjustments and ideas:

      I changed the JVM heap according to this document (Tuning Java resources); however, I did not change the garbage collector type.

      I noticed that Hawkular is using the LCS compaction strategy. Wouldn't the DTCS strategy be more appropriate? (See: Configuring compaction)
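
      To make the DTCS idea concrete, this is a rough sketch of the CQL statement I would try. The keyspace and table names (`hawkular_metrics`, `data`) and the option values are assumptions on my part and would need checking against the actual schema before running anything.

```python
def dtcs_alter_statement(keyspace, table, base_time_seconds=3600, max_sstable_age_days=7):
    """Build a CQL ALTER TABLE statement that switches a table to DTCS.

    The default option values are placeholders, not tuned recommendations.
    """
    options = {
        "class": "DateTieredCompactionStrategy",
        "base_time_seconds": str(base_time_seconds),
        "max_sstable_age_days": str(max_sstable_age_days),
    }
    body = ", ".join("'%s': '%s'" % (key, value) for key, value in options.items())
    return "ALTER TABLE %s.%s WITH compaction = {%s};" % (keyspace, table, body)


if __name__ == "__main__":
    # Keyspace and table names are assumed, not confirmed from my deployment.
    print(dtcs_alter_statement("hawkular_metrics", "data"))
```

      The printed statement could then be pasted into cqlsh on one of the nodes.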

       

      Any help would be great! Let me know if you need any more info.