1 Reply Latest reply on Jun 17, 2013 3:48 PM by john.sanda

    Cassandra and RHQ installation questions

    genman

      I've been following the discussion about Cassandra metrics storage in the upcoming RHQ release, but some of my questions remain unanswered.

      1. Is migration to Cassandra metrics storage mandatory? If not, will the existing RDBMS metrics storage continue to work in the future?
      2. Can RHQ work (and be well supported) with a Cassandra instance installed from the standard RPM packages? (e.g. something like http://www.datastax.com/docs/0.8/install/install_package )
      3. Are there advantages (performance-wise) to co-locating Cassandra and RHQ? Likewise, would it make sense for every Cassandra node to run RHQ? (I don't think having more RHQ instances will really improve UI performance, but maybe it does?)
      4. Are there plans to migrate anything other than metrics to Cassandra? (Events, alerts, etc. come to mind.)
        • 1. Re: Cassandra and RHQ installation questions
          john.sanda

          These are excellent questions. I will do my best to answer each one. I also want to introduce a bit of terminology that will be helpful in these sorts of discussions. We refer to the Cassandra instances that RHQ uses as RHQ storage nodes (or just storage nodes), in part to differentiate them from arbitrary Cassandra nodes.

          Is migration to Cassandra metrics storage mandatory? If not, will the existing RDBMS metrics storage continue to work in the future?

          Yes, it is mandatory. The new metrics backend will be the only implementation used by RHQ.

          Can RHQ work (and be well supported) with a Cassandra instance installed from the standard RPM packages? (e.g. something like http://www.datastax.com/docs/0.8/install/install_package )

          RHQ will not support an arbitrary Cassandra instance. We neither expect nor want to burden users with having to manage another database in order to use RHQ. To the greatest extent possible, the storage node is intended to be an implementation detail. The storage node installed by RHQ is pre-configured and customized specifically for use with RHQ. The RHQ server will automate management of the storage node instances used for metrics storage; consequently, you will have to run an agent alongside Cassandra so that RHQ can manage it.

          Are there advantages (performance-wise) to co-locating Cassandra and RHQ? Likewise, would it make sense for every Cassandra node to run RHQ? (I don't think having more RHQ instances will really improve UI performance, but maybe it does?)

          In terms of performance, the primary advantage would be avoiding network I/O. A bigger motivation for co-locating them is to keep the added complexity of using and managing RHQ to a minimum. While the default will be to co-locate the server and storage node, you will have the ability to install them on separate machines. When you reach the point where you need to scale out and deploy another RHQ storage node, installing another RHQ server won't really have any impact on that.

          Are there plans to migrate anything other than metrics to Cassandra? (Events, alerts, etc.)

          In RHQ 4.8 we will only be storing numeric metrics in Cassandra. We are considering storing call time and trait metrics as well as events. Baselines are another possibility that has been discussed.

          - John