-
1. Re: Aggregation schema changes
john.sanda Apr 9, 2014 7:45 AM (in response to heiko.braun)
Doing that single query would be equivalent to a distributed full table scan across all cluster nodes. For any appreciable amount of data, we would probably see a lot of timeouts. The coordinator node, the node that receives the client request, might time out waiting for other nodes to reply. And the client could time out waiting for the coordinator, because the coordinator does not send its response until it has the full result set. When lots of data has to be fetched, it can be a lot more efficient to execute multiple queries in parallel and start processing results as they arrive instead of waiting for the full result set.
The most efficient type of query is one executed against a single partition. The request can be serviced by a single replica and requires the fewest disk seeks.
There is one other aspect to consider: some schedules may have no data to process for a given time slice, which could result in a lot of unnecessary reads.
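Roughly, the per-schedule fan-out could look like the sketch below. This is only an illustration, not RHQ code; fetchRawData is a placeholder for a single-partition read, not a real driver call.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ParallelPartitionReads {

        // placeholder for a single-partition read of one schedule's row
        static List<Double> fetchRawData(int scheduleId, long timeSlice) {
            return List.of();
        }

        public static void main(String[] args) {
            long timeSlice = System.currentTimeMillis();
            int[] scheduleIds = {100, 101, 102};
            ExecutorService pool = Executors.newFixedThreadPool(4);

            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (int scheduleId : scheduleIds) {
                pending.add(CompletableFuture
                    .supplyAsync(() -> fetchRawData(scheduleId, timeSlice), pool)
                    .thenAccept(rows ->
                        // aggregate this schedule's data as soon as it arrives,
                        // instead of waiting on one cluster-wide result set
                        System.out.println("schedule " + scheduleId + ": " + rows.size() + " rows")));
            }
            pending.forEach(CompletableFuture::join);
            pool.shutdown();
        }
    }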
-
2. Re: Aggregation schema changes
heiko.braun Apr 9, 2014 9:12 AM (in response to john.sanda)
"The most efficient type of query is one executed against a single partition."
I see what you mean. That makes sense.
Just out of curiosity, what would it look like if we had time slices as row keys and schedule ids as column keys? That would restrict all queries for a certain time slice to a single partition, right? I am currently just brainstorming and trying to understand the benefits and drawbacks of each approach, but since you have a lot more experience with the C* data model layout, I am wondering what your thoughts are.
-
3. Re: Aggregation schema changes
john.sanda Apr 9, 2014 10:09 AM (in response to heiko.braun)
Great question. Let's call the number of schedule ids N. If N is < 100,000 this might work out well, but as N increases, the query is going to get more resource intensive in terms of heap usage. It could also lead to hot spots, meaning writes for the time slice are not distributed across the cluster.
To avoid these problems, we would probably want to partition the row. There are different ways of doing it. One is to append an integer, so that keys would look like {2014-04-09 03:00:00:0}, {2014-04-09 03:00:00:1}, {2014-04-09 03:00:00:2}, etc. On writes we could choose the partition randomly or in a round-robin fashion. Another approach is to divide the time slice into smaller units; instead of {2014-04-09 03:00:00} as a key, we could divide it into four, giving {2014-04-09 03:00:00}, {2014-04-09 03:15:00}, etc. Of course you need to keep track of the number of partitions so that you know how many queries to execute. As for how many partitions are needed, it depends on the number of schedules, the resources allocated to each node (JVM heap, RAM on the box, etc.), and the number of nodes.
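To make the two schemes concrete, here is a rough sketch. It is not RHQ code; NUM_PARTITIONS, the key format, and the 15-minute bucket size are made-up illustrations.

    import java.time.Instant;
    import java.time.ZoneOffset;
    import java.time.temporal.ChronoUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class TimeSliceKeys {

        static final int NUM_PARTITIONS = 4;
        static final AtomicLong counter = new AtomicLong();

        // Scheme 1: append an integer chosen round robin, so writes for the same
        // time slice are spread across NUM_PARTITIONS row keys.
        static String suffixKey(Instant timeSlice) {
            long partition = counter.getAndIncrement() % NUM_PARTITIONS;
            return timeSlice + ":" + partition;
        }

        // Scheme 2: shrink the bucket itself, e.g. four 15-minute keys in place
        // of one 1-hour key.
        static Instant subSliceKey(Instant timestamp) {
            int minute = timestamp.atZone(ZoneOffset.UTC).getMinute();
            int bucketStart = (minute / 15) * 15;
            return timestamp.truncatedTo(ChronoUnit.HOURS).plus(bucketStart, ChronoUnit.MINUTES);
        }

        public static void main(String[] args) {
            Instant hour = Instant.parse("2014-04-09T03:00:00Z");
            System.out.println(suffixKey(hour));                                    // e.g. 2014-04-09T03:00:00Z:0
            System.out.println(subSliceKey(Instant.parse("2014-04-09T03:20:00Z"))); // 2014-04-09T03:15:00Z
        }
    }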
One nice thing is that it is easy to dynamically increase or decrease the number of partitions. If we have an efficient way to track the number of schedule ids, we could use that information to decide whether more or less partitioning is needed. Whenever we query the time slice, we can make future adjustments based on the query results. One challenge I see is that data for a schedule id could be spread across multiple partitions, which means we would need to do an in-memory sort. Without thinking about it some more, I am not sure how this could be done without loading all of the data for the time slice into memory.
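A rough sketch of that concern follows; again, this is not RHQ code, and RawData and readPartition are hypothetical placeholders.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class MergePartitions {

        // hypothetical row type standing in for a raw metric row
        record RawData(int scheduleId, long timestamp, double value) {}

        // placeholder for a single-partition query
        static List<RawData> readPartition(long timeSlice, int partition) {
            return List.of();
        }

        static List<RawData> loadTimeSlice(long timeSlice, int numPartitions) {
            List<RawData> all = new ArrayList<>();
            for (int p = 0; p < numPartitions; p++) {
                // every partition of the slice ends up on the heap at once
                all.addAll(readPartition(timeSlice, p));
            }
            // rows are only ordered within a partition, so the whole slice has
            // to be re-sorted in memory before aggregation
            all.sort(Comparator.comparingInt(RawData::scheduleId)
                               .thenComparingLong(RawData::timestamp));
            return all;
        }
    }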
With the changes described in Aggregation Schema Changes - RHQ - Project Documentation Editor, we store all of the data for a time slice in a single partition, but only for a subset of schedules. All of the data for those schedules is co-located within that single partition.
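For illustration only (this is not the actual schema from that page, just one way the idea could be expressed; BLOCK_SIZE is a made-up knob), the partition key could combine the time slice with a block of schedule ids, so a schedule's data never crosses partitions within a slice:

    public class SchedulePartitionKey {

        static final int BLOCK_SIZE = 1000;

        // every schedule in the same block shares one partition for the time slice
        static String partitionKey(String timeSlice, int scheduleId) {
            int scheduleBlock = (scheduleId / BLOCK_SIZE) * BLOCK_SIZE;
            return timeSlice + ":" + scheduleBlock;
        }

        public static void main(String[] args) {
            System.out.println(partitionKey("2014-04-09 03:00:00", 123));   // 2014-04-09 03:00:00:0
            System.out.println(partitionKey("2014-04-09 03:00:00", 1501));  // 2014-04-09 03:00:00:1000
        }
    }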