You are not exactly running into an open door, but at least it is not locked shut :-) In fact, we have already discussed using such a NoSQL store for the events subsystem (Events tab, following of log files). But you are absolutely right that the metrics do not require full relational and transactional guarantees, and eventual consistency should be good enough.
Pushing them to Hadoop/HBase would also have the advantage that more sophisticated computations like baselines (or future statistical analysis) could be run on the Hadoop nodes as map-reduce jobs.
Having said this, it is not entirely trivial to "just switch", but most of the metric storage and retrieval is hidden behind one session bean, which could be swapped out for a Hadoop version. The harder part would be rewriting the baseline computation.
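To make the "swap out one bean" idea concrete, here is a minimal sketch of what such a storage seam could look like. All names (`MetricStore`, `DataPoint`, `InMemoryMetricStore`) are hypothetical and not taken from the RHQ code base; the in-memory class only stands in for whatever relational or Hadoop-backed implementation would sit behind the interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in RHQ under these names.
class MetricStoreDemo {
    public static void main(String[] args) {
        // Callers only see the interface, so the backend can be exchanged here.
        MetricStore store = new InMemoryMetricStore();
        store.addNumericData(42, 1000L, 0.5);
        store.addNumericData(42, 2000L, 0.7);
        System.out.println(store.findRawData(42, 0L, 1500L).size());
    }
}

// The seam: storage and retrieval of raw numeric metrics.
interface MetricStore {
    void addNumericData(int scheduleId, long timestamp, double value);

    // Returns the raw points with beginTime <= timestamp <= endTime.
    List<DataPoint> findRawData(int scheduleId, long beginTime, long endTime);
}

final class DataPoint {
    final long timestamp;
    final double value;

    DataPoint(long timestamp, double value) {
        this.timestamp = timestamp;
        this.value = value;
    }
}

// Stand-in backend; a Hadoop/HBase version would implement the same interface.
class InMemoryMetricStore implements MetricStore {
    private final Map<Integer, List<DataPoint>> data = new HashMap<>();

    @Override
    public void addNumericData(int scheduleId, long timestamp, double value) {
        data.computeIfAbsent(scheduleId, k -> new ArrayList<>())
            .add(new DataPoint(timestamp, value));
    }

    @Override
    public List<DataPoint> findRawData(int scheduleId, long beginTime, long endTime) {
        List<DataPoint> result = new ArrayList<>();
        for (DataPoint p : data.getOrDefault(scheduleId, new ArrayList<>())) {
            if (p.timestamp >= beginTime && p.timestamp <= endTime) {
                result.add(p);
            }
        }
        return result;
    }
}
```

The baseline computation is deliberately left out of the interface, since that is the part that would need a real rewrite rather than a swap.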
Would you be interested in doing some coding in that area?
Of course I'm interested, but it comes down to whether I can prove a need for this, whether I can work on it, and whether I can share my changes back with the community. So there are a lot of "ifs."
I think a proof of concept would be nice, where the storage and retrieval of raw metrics (rhq_meas_data_num_r?? tables) would go to Hadoop.
This would let us get a feel for it (what is needed setup-wise, what the performance could look like).
If this looks good, it could serve as a starting point for more work. Having a PoC would also let us get feedback from other users.
How can I help to get you going here?
I have learned a lot about Hadoop. Hive provides a SQL-like interface on top of Hadoop's storage.
Steps off the top of my head:
* Define the Hive schema, including the partitions.
* Necessary query changes
* Necessary code changes... Not sure Hibernate would work out of the box?
* Necessary logic changes... For example, it may not be necessary to compact old data. It may be necessary to cache certain things, since Hive is quite slow.
* Build integration
* Configuration and installation. As part of setup, you'd indicate a secondary data store.
* Testing, etc.
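As a rough illustration of the partitioning step, one option would be to partition raw metrics by UTC day derived from the collection timestamp. This is only an assumption about how the data might be split, not a decided design; the class name `MetricPartitions` and the partition column `dt` are hypothetical. The `dt=YYYY-MM-DD` form follows the `key=value` naming Hive uses for partition directories.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: the "dt" column and day granularity are assumptions.
class PartitionKeyDemo {
    public static void main(String[] args) {
        System.out.println(MetricPartitions.dayPartition(0L)); // prints "dt=1970-01-01"
    }
}

final class MetricPartitions {
    private MetricPartitions() {
    }

    // Maps a metric timestamp (epoch milliseconds) to a per-day partition name in UTC.
    static String dayPartition(long epochMillis) {
        LocalDate day = Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneOffset.UTC)
                .toLocalDate();
        return "dt=" + DateTimeFormatter.ISO_LOCAL_DATE.format(day);
    }
}
```

With time-based partitions, dropping old raw data could become a matter of dropping whole partitions, which ties in with the point above that compacting old data may not be necessary.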
Do you have access to a Hadoop cluster at Red Hat?