7 Replies Latest reply on Dec 18, 2009 6:58 AM by manik

    Implementation of consistent hashing

      hi all,

      I noticed a class named DefaultConsistentHash, and found code like this in its locate() method:

      while (results.size() < numCopiesToFind) {
         // We mod the index a second time to make sure it wraps back to the beginning
         // when it reaches the end. E.g., in a cluster of 10 with 3 copies of data,
         // a key that maps to node index 9 gets its next 2 backups at indexes 0 and 1.
         int index = ((hash % clusterSize) + copyNumber) % clusterSize;
         Address candidate = addresses.get(index);
         results.add(candidate);
         copyNumber++;
      }
      


      Is this really consistent hashing? I don't think it is. Is this implementation finished?

      thanks
      Patrick


        • 1. Re: Implementation of consistent hashing
          manik

          No, it is not complete. DIST is still a work in progress; crucially, rehashing is still in development.

          The algorithm you see does work, provided the cluster size does not change. The plan is to create a new DCH instance with a new cluster topology every time the "shape" of the cluster changes.
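
          To make that concrete, here is a minimal sketch of the idea (not the actual DCH code; the class SimpleModuloHash below is made up for illustration): placement is a pure function of an immutable member list, so a view change is handled by building a fresh instance rather than mutating the old one.

          import java.util.ArrayList;
          import java.util.List;

          // Hypothetical sketch of the modulo-based placement above. Placement
          // depends only on a snapshot of the member list, so a topology change
          // means creating a brand-new instance with the new list.
          final class SimpleModuloHash<A> {
             private final List<A> addresses;

             SimpleModuloHash(List<A> addresses) {
                this.addresses = new ArrayList<>(addresses); // snapshot of the topology
             }

             List<A> locate(Object key, int numCopies) {
                List<A> owners = new ArrayList<>();
                int clusterSize = addresses.size();
                if (clusterSize == 0) return owners; // no members yet
                int hash = (key.hashCode() & Integer.MAX_VALUE) % clusterSize; // non-negative slot
                // walk forward from the key's slot, wrapping via the second mod
                for (int copy = 0; copy < Math.min(numCopies, clusterSize); copy++) {
                   owners.add(addresses.get((hash + copy) % clusterSize));
                }
                return owners;
             }
          }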

          • 2. Re: Implementation of consistent hashing

            So how do you plan to handle a situation like a node failing? Redistribute the data across all nodes?

            Why "create a new DCH instance with a new cluster topology every time the "shape" of the cluster changes"?

            thanks
            Patrick

            • 3. Re: Implementation of consistent hashing
              manik

              So that, during the window in which data is being redistributed, you can still track where the data "was" and check both locations.
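
              As a rough illustration (hypothetical names, building on the SimpleModuloHash sketch above, not the actual implementation):

              import java.util.List;

              // Hypothetical sketch of a read during the rehash window: keep the
              // previous hash alongside the current one and check both locations.
              final class RehashAwareLookup<K, V, A> {
                 private final SimpleModuloHash<A> currentHash;  // new topology
                 private final SimpleModuloHash<A> previousHash; // old topology; null once rehashing completes
                 private final int numOwners;

                 RehashAwareLookup(SimpleModuloHash<A> current, SimpleModuloHash<A> previous, int numOwners) {
                    this.currentHash = current;
                    this.previousHash = previous;
                    this.numOwners = numOwners;
                 }

                 V get(K key) {
                    V value = fetchFrom(currentHash.locate(key, numOwners), key); // where the data should be
                    if (value == null && previousHash != null) {
                       value = fetchFrom(previousHash.locate(key, numOwners), key); // where the data was
                    }
                    return value;
                 }

                 // Stand-in for a remote fetch from the given owners; a real cache
                 // would issue a clustered GET here.
                 private V fetchFrom(List<A> owners, K key) {
                    return null;
                 }
              }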

              • 4. Re: Implementation of consistent hashing

                So when data is being redistributed, do we need to halt the whole system?

                Will the system be unresponsive during that time?

                thanks
                Patrick

                • 5. Re: Implementation of consistent hashing
                  manik

                  The system should not be unresponsive - that is one of the goals of the design.

                  • 6. Re: Implementation of consistent hashing
                    dtenny

                    It's possible my question would be better served by a thread titled "partitioned cache rebalancing".

                     

                    What is the present state of rebalancing partitioned cache data when cluster membership changes?

                     

                    If I have 10 nodes and each is hosting 10% of the data, and I add 10 more, will each node rebalance to host 5% of the data?

                     

                    As was mentioned in the previous reply, I'd expect not to suffer responsiveness problems while rebalancing occurs; it should happen in the background.

                     

                    What's the reality w.r.t. the current implementation and the road map?

                     

                    Thanks!

                    • 7. Re: Implementation of consistent hashing
                      manik

                      Our CH impl does not guarantee perfectly equal rebalancing, since it does not change the position of existing nodes on the hash wheel.  Instead, as new nodes are added, they are placed on the hash wheel without causing existing nodes to "shift".  This means that each node may carry a disproportionate share of the load - but that would be the case anyway, since it depends on the hash function of your keys.  And the larger the cluster gets, the smaller this problem of disproportionate load becomes.
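
                      To make the fixed-position idea concrete, here is a minimal sketch (hypothetical code, not our actual impl; positions here are derived from each node's own hashCode()):

                      import java.util.ArrayList;
                      import java.util.List;
                      import java.util.TreeMap;

                      // Hypothetical sketch of a fixed-position hash wheel: each node's
                      // position is derived from the node itself, so adding a node inserts
                      // one new point on the wheel without shifting any existing node.
                      final class HashWheel<A> {
                         private final TreeMap<Integer, A> wheel = new TreeMap<>();

                         void add(A node) {
                            wheel.put(node.hashCode() & Integer.MAX_VALUE, node); // position fixed by the node
                         }

                         List<A> locate(Object key, int numOwners) {
                            int point = key.hashCode() & Integer.MAX_VALUE;
                            int wanted = Math.min(numOwners, wheel.size());
                            List<A> owners = new ArrayList<>();
                            // walk clockwise from the key's point...
                            for (A node : wheel.tailMap(point).values()) {
                               if (owners.size() == wanted) return owners;
                               owners.add(node);
                            }
                            // ...and wrap around to the start of the wheel if necessary
                            for (A node : wheel.values()) {
                               if (owners.size() == wanted) return owners;
                               owners.add(node);
                            }
                            return owners;
                         }
                      }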

                       

                      W.r.t. this happening in the background, that is what it is designed to do: rebalance with minimal impact.  Because node positions are fixed, only nodes up to "numOwners - 1" positions behind and ahead of the new joiner may be affected by any rebalancing, and even then only up to 25% of the keys on each of these nodes *may* have to move.  This means that in a cluster of 20 nodes with a "numOwners" setting of 2, adding a new node leaves 18 nodes completely unaffected by the rebalancing.
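
                      As a quick sanity check of that arithmetic (just plugging in the numbers from the example):

                      // Back-of-the-envelope check of the example above.
                      public class RebalanceImpact {
                         public static void main(String[] args) {
                            int clusterSize = 20, numOwners = 2;
                            int affected = 2 * (numOwners - 1);      // up to numOwners - 1 positions behind and ahead of the joiner
                            int unaffected = clusterSize - affected; // existing nodes untouched by the rebalance
                            System.out.println(unaffected + " nodes unaffected"); // prints "18 nodes unaffected"
                         }
                      }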

                       

                      One drawback, though, is that the current impl uses RPC to push or pull state between the affected nodes.  This has some impact, since the necessary entries need to be loaded into memory and then pushed as part of an RPC message.  In the future we plan to replace this with a streaming mechanism to remove the unnecessary serializing/deserializing, as well as to better control memory spikes during this process.  See ISPN-284 for more details.

                       

                      Cheers

                      Manik