1. Re: Why a continuous hash?
imbng Aug 11, 2009 7:08 PM (in response to imbng)
Not sure where my brain was when I was reading the docs, but they plainly say you're using a consistent hash.
What may have confused me was the section on rehashing. What's the need for a rehash if you're using a consistent hash?
My guess is that there is no way to define the number of shards/partitions. Clusters simply come and go, so the total number is always changing, and the cluster is what the keys hash to.
Other implementations I've played with all let you define the number of shards/partitions up front and then scatter those across all the running containers. The keys hash to shards/partitions.
Using the cluster as the unit of partition seems complex, as you would need to rehash and move data around. You would also have to move the shards/partitions in the other case, but rules can be written for what moves where and when, since the shard/partition is decoupled from the hashing.
I may have the terminology wrong here, since I'm just starting to look at this and everyone seems to use different terms.
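The fixed-partition approach described above can be sketched roughly as follows. This is a hypothetical illustration, not code from this product or from the other implementations mentioned: the class `PartitionTable`, the round-robin initial placement and the rule that a joining node takes partitions from the most-loaded nodes are all assumptions chosen to keep it short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fixed-partition router: the partition count is chosen up front
// and never changes, so key -> partition is a pure function of the key.
// Only the partition -> node table changes when nodes come and go.
class PartitionTable {
    private final String[] owner;                   // owner[p] = node holding partition p
    private final List<String> nodes = new ArrayList<>();

    PartitionTable(int partitionCount, List<String> initialNodes) {
        owner = new String[partitionCount];
        nodes.addAll(initialNodes);
        for (int p = 0; p < partitionCount; p++) {
            owner[p] = nodes.get(p % nodes.size());  // assumed round-robin initial placement
        }
    }

    // Depends only on the fixed partition count, never on membership.
    // floorMod keeps the index non-negative even for negative hashCode() values.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), owner.length);
    }

    String nodeFor(String key) {
        return owner[partitionFor(key)];
    }

    // Assumed rule for "what moves where": the newcomer takes partitions one at a
    // time from the most-loaded existing node until it holds its fair share.
    // Returns the partitions that moved; only their data would need copying.
    List<Integer> addNode(String newNode) {
        Map<String, List<Integer>> held = new LinkedHashMap<>();
        for (String n : nodes) held.put(n, new ArrayList<>());
        for (int p = 0; p < owner.length; p++) held.get(owner[p]).add(p);

        nodes.add(newNode);
        int fairShare = owner.length / nodes.size();
        List<Integer> moved = new ArrayList<>();
        while (moved.size() < fairShare) {
            List<Integer> donor = null;
            for (List<Integer> ps : held.values()) {
                if (donor == null || ps.size() > donor.size()) donor = ps;
            }
            int p = donor.remove(donor.size() - 1);
            owner[p] = newNode;
            moved.add(p);
        }
        return moved;
    }
}
```

Because the partition count never changes, `partitionFor` returns the same value before and after a join; only the owner table is edited, and only the partitions returned by `addNode` would need their data transferred.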
2. Re: Why a continuous hash?
manik Aug 12, 2009 4:33 AM (in response to imbng)
You're pretty much spot-on. The need for rehashing is due to nodes joining/leaving the cluster.
I have considered the shard/partition approach (or virtual nodes, as I called them), but that would require some form of global metadata.
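To see why a join forces a rehash, and how much less a consistent hash moves than a plain modulo over the node count, here is a small hypothetical comparison. It is not this product's rehash code; the node names, the murmur3-style mixing function and the single ring position per node are assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Compares how many keys change owner when a 4th node joins a 3-node cluster,
// under naive "hash mod node count" placement versus a consistent-hash ring.
class RehashDemo {
    // murmur3's 32-bit finalizer; scatters String.hashCode() values over the int range.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Naive placement: owner index = hash mod number of nodes.
    static int moduloOwner(String key, int nodeCount) {
        return Math.floorMod(mix(key.hashCode()), nodeCount);
    }

    // Consistent hashing: nodes and keys share one hash space; a key belongs to
    // the first node at or after its position, wrapping around at the end.
    static String ringOwner(TreeMap<Integer, String> ring, String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(mix(key.hashCode()));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    static TreeMap<Integer, String> ringOf(String... nodes) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        for (String n : nodes) ring.put(mix(n.hashCode()), n);
        return ring;
    }

    // Returns {keys moved under modulo, keys moved under the ring,
    //          ring moves whose new owner is not the joining node}.
    static int[] compare(int keyCount) {
        TreeMap<Integer, String> before = ringOf("A", "B", "C");
        TreeMap<Integer, String> after = ringOf("A", "B", "C", "D");
        int[] counts = new int[3];
        for (int i = 0; i < keyCount; i++) {
            String k = "key-" + i;
            if (moduloOwner(k, 3) != moduloOwner(k, 4)) counts[0]++;
            String oldOwner = ringOwner(before, k);
            String newOwner = ringOwner(after, k);
            if (!oldOwner.equals(newOwner)) {
                counts[1]++;
                if (!newOwner.equals("D")) counts[2]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = compare(10_000);
        System.out.printf("modulo moved %d, ring moved %d, ring moves not to D: %d%n", c[0], c[1], c[2]);
    }
}
```

With well-spread hashes the modulo scheme moves roughly three quarters of the keys (a key stays put only when h mod 3 equals h mod 4, i.e. when h mod 12 is 0, 1 or 2), whereas the ring only moves the keys that fall into D's arc, and every one of them moves to D. With a single ring position per node the arcs can be quite uneven, which is one common motivation for virtual nodes.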
3. Re: Why a continuous hash?
imbng Aug 12, 2009 11:54 AM (in response to imbng)
Yes, there would be a need for global state, but there is already some of that in the current implementation, is there not?
You have to know how many clusters there are (to hash correctly) and where all the clusters are (to route requests).
Isn't that basically the same metadata you'd need if going with virtual nodes?
4. Re: Why a continuous hash?
manik Aug 12, 2009 1:02 PM (in response to imbng)
No, for virtual nodes you'd need some added metadata, including what each vnode hashes to in a given hash space, as well as which vnodes map to real nodes. The latter would be prone to change if there is a cluster reorganisation event (nodes joining or leaving), as vnodes could be assigned to different actual nodes.
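The two pieces of metadata described here can be sketched as two tables: one from hash-space position to vnode, which is fixed once configured, and one from vnode to real node, which is the part a reorganisation rewrites. Everything below (`VnodeMetadata`, equal-sized hash ranges, `String.hashCode()` as the hash, round-robin initial owners) is a hypothetical illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VnodeMetadata {
    // (1) Start of each vnode's range in the 32-bit hash space; fixed once configured.
    private final TreeMap<Integer, Integer> vnodeByStart = new TreeMap<>();
    // (2) Real node currently owning each vnode; rewritten on join/leave.
    private final Map<Integer, String> ownerByVnode = new HashMap<>();

    VnodeMetadata(int vnodeCount, List<String> nodes) {
        long width = (1L << 32) / vnodeCount;                  // hypothetical: equal-sized ranges
        for (int v = 0; v < vnodeCount; v++) {
            int start = (int) (Integer.MIN_VALUE + width * v); // stays within the int range
            vnodeByStart.put(start, v);
            ownerByVnode.put(v, nodes.get(v % nodes.size()));  // round-robin initial owners
        }
    }

    // Key -> vnode uses only table (1). Never null: vnode 0 starts at Integer.MIN_VALUE.
    int vnodeFor(String key) {
        return vnodeByStart.floorEntry(key.hashCode()).getValue();
    }

    // Key -> real node goes through both tables.
    String nodeFor(String key) {
        return ownerByVnode.get(vnodeFor(key));
    }

    // A reorganisation only rewrites table (2); every key keeps its vnode.
    void reassign(int vnode, String newOwner) {
        ownerByVnode.put(vnode, newOwner);
    }
}
```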
5. Re: Why a continuous hash?
imbng Aug 12, 2009 1:27 PM (in response to imbng)
True, you'd need the initial configuration setting that specifies how many total vnodes there would be, and that would need to be global. From that configuration you can easily calculate where a key hashes using a modulo hash (key_hashcode % total_vnode_count).
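Written out in Java, key_hashcode % total_vnode_count needs one adjustment: `hashCode()` may be negative, and Java's `%` keeps the sign of the dividend, so a negative hash would produce a negative vnode index. The hypothetical helper below uses `Math.floorMod` instead.

```java
class VnodeHashing {
    // key_hashcode % total_vnode_count, made safe for negative hash codes.
    // (Math.abs(h) % n is not a safe fix: Math.abs(Integer.MIN_VALUE) is still negative.)
    static int vnodeFor(Object key, int totalVnodeCount) {
        return Math.floorMod(key.hashCode(), totalVnodeCount);
    }

    public static void main(String[] args) {
        Integer key = -7;                        // Integer.hashCode() returns the value itself
        System.out.println(key.hashCode() % 4);  // prints -3: not a usable index
        System.out.println(vnodeFor(key, 4));    // prints 1
    }
}
```

Since total_vnode_count is fixed by configuration, this mapping never changes; only the vnode-to-node table has to be updated when nodes join or leave.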
As for the second bit of state, don't you already have that today, or at least most of it? I've looked into JGroups, which I believe you're using to some degree, so that state/membership info is already tracked and available. You may not have the mapping of vnodes to real nodes (depending on how they're registered), but I'd guess the infrastructure and semantics are there to support it.
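If membership is already tracked (for instance through the membership views a group-communication library such as JGroups reports), the missing piece is a rule for updating the vnode-to-node table when the member list changes. The sketch below is hypothetical: `onMembershipChange` is not a JGroups callback, and the balancing rule (orphaned vnodes go to the least-loaded member, then vnodes shift from the fullest to the emptiest member until loads differ by at most one) is just one simple choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VnodeRebalancer {
    // Hypothetical hook, invoked with the new member list after a membership change.
    // Returns a new vnode -> node table; the input table is left untouched.
    static Map<Integer, String> onMembershipChange(Map<Integer, String> current, List<String> members) {
        if (members.isEmpty()) throw new IllegalArgumentException("no members left");
        Map<Integer, String> next = new TreeMap<>(current);
        Map<String, List<Integer>> held = new LinkedHashMap<>();
        for (String m : members) held.put(m, new ArrayList<>());

        List<Integer> orphans = new ArrayList<>();
        for (Map.Entry<Integer, String> e : next.entrySet()) {
            List<Integer> owned = held.get(e.getValue());
            if (owned == null) orphans.add(e.getKey());   // owner has left
            else owned.add(e.getKey());
        }
        // 1. Vnodes whose owner left go to whichever member holds the fewest.
        for (int v : orphans) move(v, leastLoaded(held), held, next);
        // 2. Shift vnodes from the fullest to the emptiest member until loads differ
        //    by at most one; after a join, this is what feeds the new member.
        while (true) {
            String from = mostLoaded(held);
            String to = leastLoaded(held);
            List<Integer> src = held.get(from);
            if (src.size() - held.get(to).size() <= 1) break;
            int v = src.remove(src.size() - 1);
            move(v, to, held, next);
        }
        return next;
    }

    private static void move(int v, String to, Map<String, List<Integer>> held, Map<Integer, String> next) {
        held.get(to).add(v);
        next.put(v, to);
    }

    private static String leastLoaded(Map<String, List<Integer>> held) {
        String best = null;
        for (Map.Entry<String, List<Integer>> e : held.entrySet()) {
            if (best == null || e.getValue().size() < held.get(best).size()) best = e.getKey();
        }
        return best;
    }

    private static String mostLoaded(Map<String, List<Integer>> held) {
        String best = null;
        for (Map.Entry<String, List<Integer>> e : held.entrySet()) {
            if (best == null || e.getValue().size() > held.get(best).size()) best = e.getKey();
        }
        return best;
    }
}
```

Only vnodes whose owner changed need their data moved; keys never rehash, because key-to-vnode depends only on the fixed vnode count.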
Anyhow, interesting discussion and I'm glad to see this product in the portfolio.