As you may be aware, the metrics storage backend was re-implemented in RHQ 4.8, so this could very well be a bug. I think the easiest way to determine what is happening is to look at the actual data points. Since RHQ does not provide APIs to directly access the data points (except for raw data), the easiest thing to do is to query Cassandra directly. If you want to do that, here are the steps you need to perform:
- Log into RHQ
- Navigate to Administration --> Storage Nodes
- Select the storage node row so that it is highlighted
- In the footer, click on the Operation button and then in the pop-up click on Enable Debug Mode
- Create a text file named raw_metrics.cql that contains the text, SELECT * FROM raw_metrics WHERE schedule_id = <schedule_id>;
- Create a text file named one_hour_metrics.cql that contains the text, SELECT * FROM one_hour_metrics WHERE schedule_id = <schedule_id>;
- In a terminal, cd into <rhq-server-dir>/rhq-storage/bin
- Execute ./cqlsh -u rhqadmin -p rhqadmin -k rhq -f raw_metrics.cql > raw_metrics.txt
- Execute ./cqlsh -u rhqadmin -p rhqadmin -k rhq -f one_hour_metrics.cql > one_hour_metrics.txt
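The file-creation and cqlsh steps above can be scripted. The following is only a rough sketch: the schedule ID 12345 is a made-up placeholder, the helper name write_query is hypothetical, and the cqlsh call is skipped unless the script is run from the directory that contains cqlsh.

```shell
#!/bin/sh
# Sketch: generate the query files from the steps above and run them.
# 12345 is a hypothetical placeholder; substitute your real schedule ID.
SCHEDULE_ID=12345

# write_query TABLE ID: write TABLE.cql containing the SELECT statement.
write_query() {
    printf 'SELECT * FROM %s WHERE schedule_id = %s;\n' "$1" "$2" > "$1.cql"
}

for table in raw_metrics one_hour_metrics; do
    write_query "$table" "$SCHEDULE_ID"
    # Only call cqlsh when it exists in the current directory,
    # i.e. when run from <rhq-server-dir>/rhq-storage/bin.
    if [ -x ./cqlsh ]; then
        ./cqlsh -u rhqadmin -p rhqadmin -k rhq -f "$table.cql" > "$table.txt"
    fi
done
```

Run from any other directory, this only writes the two .cql files, which makes it easy to inspect the queries before executing them.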
The above assumes you are querying for one-hour data. If you are querying for older data, it is just a matter of changing the table name. If you have a lot of data, we can apply a date filter to limit the results. If you can share the data points, I would be more than happy to take a closer look and try to figure out what is happening. Lastly, what date ranges are you using in your queries?
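As a rough illustration of the date filter mentioned above, a filtered query file could be generated like this. Note the assumptions: the timestamp column name "time" is my guess and is not stated in this thread (running DESCRIBE TABLE raw_metrics in cqlsh should show the actual column names), and the schedule ID and dates are made-up placeholders.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: the timestamp column is named "time";
# check the real schema before relying on this query.
SCHEDULE_ID=12345                     # hypothetical schedule ID
START='2013-09-01 00:00:00+0000'      # hypothetical start of range
END='2013-09-02 00:00:00+0000'        # hypothetical end of range

# Write a query restricted to the [START, END) time window.
printf "SELECT * FROM raw_metrics WHERE schedule_id = %s AND time >= '%s' AND time < '%s';\n" \
    "$SCHEDULE_ID" "$START" "$END" > raw_metrics_filtered.cql

cat raw_metrics_filtered.cql
```

The resulting file can be passed to cqlsh with -f in the same way as the unfiltered queries.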
I can't get the data that John asked for - I don't know what the schedule ID is for the metric that is bad.
I attached a debugger, found my schedule ID and ran the CQL. It turns out I still have data from days ago. I had -Ddbsetup my RDBMS this morning, but did not clean the storage node data. So my testing was bogus - I'm sure my data was not in sync between the RDBMS and the storage node. I closed the BZ as "not a bug".
So, if you re-install RHQ and blow away your RDBMS data, make sure you blow away the storage node data as well.
I am unable to click the Operation button after selecting the storage node, and when I attempt to expand the node for more information, it throws an error. This is starting to sound like an issue with my storage installation, though data collection has seemed fairly functional thus far.
I'll plan on posting again after a fresh install. Thanks for all your help.
Alternatively, you can do this:
- Open <rhq-server-dir>/rhq-storage/conf/cassandra.yaml
- Find the start_rpc property and set it to true
- Restart the storage node
After that change, you can use cqlsh and run the queries I posted earlier.
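For anyone who prefers to script the cassandra.yaml edit, here is a minimal sketch. The helper name enable_rpc is hypothetical, and the demonstration runs against a throwaway file rather than the real configuration; to use it for real, point it at <rhq-server-dir>/rhq-storage/conf/cassandra.yaml and then restart the storage node.

```shell
#!/bin/sh
# enable_rpc FILE: set start_rpc to true in a cassandra.yaml-style file.
enable_rpc() {
    # Keep a .bak copy, then rewrite the start_rpc line from the backup.
    cp "$1" "$1.bak" &&
    sed 's/^start_rpc:.*/start_rpc: true/' "$1.bak" > "$1"
}

# Demonstration on a temporary file with made-up contents.
demo=$(mktemp)
printf 'cluster_name: demo\nstart_rpc: false\n' > "$demo"
enable_rpc "$demo"
grep '^start_rpc' "$demo"
```

The sketch assumes start_rpc appears at the start of a line and is not commented out; if it is missing or commented, the file is left unchanged and needs a manual edit.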