2 Replies Latest reply on Dec 15, 2014 10:23 AM by john.sanda

    Shared backend of rhq-inventory and rhq-metrics?

    lkrejci

      For RHQ inventory, we're thinking of using some sort of graph database for storing the resources and their relationships. One of the candidates is the Titan graph db. Titan supports multiple storage backends, one of which is Cassandra.

       

      Titan can also use Elastic Search for global, graph-wide indexes. These improve the performance of queries spanning the whole graph, which will likely be used in rhq-inventory as a means of finding the "roots" of our graph in different "views" of it ("select all platforms" for the "classical RHQ" view, "select all applications" for an app-centric view, etc).

       

      While I don't follow rhq-metrics discussion in detail, I've noticed mentions of Elastic Search as a possible choice for processing log events.

       

      IMHO being able to share the storage backend between 2 of our subsystems would benefit the user very much, but of course much more investigation would be needed. For example, the performance characteristics required from Cassandra might be totally different for rhq-metrics and rhq-inventory, which could make the configuration of the shared Cassandra cluster more difficult. I just don't know.

       

      But I want to throw this idea out there so that we can take this into account when we make the decisions about rhq-inventory and log events storage.

        • 1. Re: Shared backend of rhq-inventory and rhq-metrics?
          pilhuhn

          Do we know how graph performance compares between Titan/C* and Titan/ES? Same question for metric storage/retrieval.

           

          I agree that we should keep the number of backend datastores as small as possible, as otherwise the "management of the management" app will kill us.

          • 2. Re: Shared backend of rhq-inventory and rhq-metrics?
            john.sanda

            I am not too familiar with Titan, but I am not sure that the usage of Cassandra and Elastic Search is mutually exclusive. Titan can use Cassandra as a primary data store, and can use Elastic Search as an index. This leads me to believe that we could have Titan/C*, Titan/ES/some_other_storage, or Titan/C*/ES.
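To make the combinations above a bit more concrete, here is a rough sketch of what the Titan side of the configuration might look like. It only renders text in the style of a Titan properties file. The keys `storage.backend`, `storage.hostname`, `index.search.backend` and `index.search.hostname` follow Titan's documented naming as far as I know, but the helper `titan_properties`, the index name `search` and the host addresses are assumptions made for illustration, so check everything against the docs for whichever Titan version we evaluate.

```python
def titan_properties(storage_backend, storage_host,
                     index_backend=None, index_host=None):
    """Render a Titan-style properties file as text (hypothetical helper)."""
    lines = [
        f"storage.backend={storage_backend}",
        f"storage.hostname={storage_host}",
    ]
    if index_backend is not None:
        # "search" is an arbitrary index name chosen for this sketch.
        lines.append(f"index.search.backend={index_backend}")
        # Assumption: fall back to the storage host if no index host is given.
        lines.append(f"index.search.hostname={index_host or storage_host}")
    return "\n".join(lines) + "\n"


# Titan/C*: Cassandra as the only backend, no external index.
cassandra_only = titan_properties("cassandra", "127.0.0.1")

# Titan/C*/ES: Cassandra as primary store, Elastic Search as the index backend.
cassandra_with_es = titan_properties(
    "cassandra", "127.0.0.1", "elasticsearch", "127.0.0.1"
)

print(cassandra_with_es)
```

The point is just that the storage backend and the index backend are configured independently, which is why the two do not look mutually exclusive to me.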

             

            With respect to sharing a Cassandra cluster, I think it is feasible and something to consider. I would expect the performance characteristics of rhq-metrics and rhq-inventory to be quite different as well. Compaction, compression, and row caching are configurable per table. Consistency is configurable per request. Data directories can be stored in different locations as well, making it easier to utilize hardware better suited for different workloads. I think that the shared JVM heap is something to think about; however, most things have been moved off heap, as you can read about in http://www.datastax.com/dev/blog/off-heap-memtables-in-cassandra-2-1.
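As a sketch of what "configurable per table" could mean in practice, the snippet below builds CQL `ALTER TABLE` statements that give two tables in one cluster different compaction and compression settings. The keyspace and table names (`rhq_metrics.data`, `rhq_inventory.edgestore`) are made up, and the chosen strategies are purely illustrative, not tuning recommendations. The `sstable_compression` option name is the Cassandra 2.x spelling as far as I know; option names have changed between Cassandra versions, so treat the exact spelling as an assumption.

```python
def cql_map(options):
    """Render a dict as a CQL map literal, e.g. {'class': 'X'}."""
    return "{" + ", ".join(f"'{k}': '{v}'" for k, v in options.items()) + "}"


def alter_table(keyspace, table, compaction, compression):
    """Build an ALTER TABLE statement setting per-table storage options."""
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {cql_map(compaction)} "
        f"AND compression = {cql_map(compression)};"
    )


# Hypothetical write-heavy metrics table.
metrics_stmt = alter_table(
    "rhq_metrics", "data",
    compaction={"class": "SizeTieredCompactionStrategy"},
    compression={"sstable_compression": "LZ4Compressor"},
)

# Hypothetical read-heavy inventory table.
inventory_stmt = alter_table(
    "rhq_inventory", "edgestore",
    compaction={"class": "LeveledCompactionStrategy"},
    compression={"sstable_compression": "SnappyCompressor"},
)

print(metrics_stmt)
print(inventory_stmt)
```

Consistency is not part of the table definition at all. It would be chosen by the client for each read or write, so the two subsystems could pick different levels against the same cluster.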