I have been thinking some about what kinds of ingestion rates we want to or need to support, partly because this could affect schema design. By ingestion rate, I guess I really mean the collection rate for a single metric. Let me talk a little about RHQ to better frame the discussion. Unlike RHQ Metrics, RHQ has only one metrics collector, the RHQ agent (let's set aside the RHQ REST API for the moment). Due to limitations on the agent side, the maximum collection frequency for any single metric is 30 seconds. RHQ has a fixed retention period of seven days for raw data. This works out to a maximum of 20,160 live values per partition. RHQ partitions data by measurement schedule id. Similarly, RHQ Metrics is partitioning data by metric id thus far. If we intend to support much faster ingestion rates per metric, then we may want to consider partitioning data differently. Suppose we support an ingestion rate of 1 second (again, this is the rate at which we consume data points for a single metric), and assume for the moment the same retention period of 7 days. That works out to a maximum of 604,800 live cells/values. Personally, I do not see any reason why we would not support or allow sub-second or even sub-millisecond ingestion rates. In fact Cassandra offers the TimeUUID data type since Java does not support sub-millisecond timestamp resolution. And I definitely think we want to allow for retention periods longer longer than 7 days. The numbers get pretty big pretty fast. Storing all of those data points in a single partition would be detrimental to performance.
Depending on the ingestion rates and retention periods we support, we might want to partition data by metric id and by month, or by week, or even by day for example. Doing so will not have any negative impact on write performance, but it will certainly make queries more complex as multiple reads will have to be performed with results being merged client side.
It would be nice if there was a "one size fits all solution", but I am skeptical that we will find that to be the case. I think it is more likely that we will need to dynamically adjust our partitioning based on ingestion rates and retention periods.
Is there a maximum ingestion rate that we should impose? What should we support, handle, plan for, etc. out of the box?
I have similar questions regarding retention periods, but I will address those in a separate thread because there are some other details that I want to discuss that do not pertain to ingestion rates.