Status -- Design Phase
Problem
Some users report significantly uneven data distribution in Infinispan despite using good hash functions. This skews memory usage across nodes and also affects CPU usage (due to excessive GC on the overloaded nodes).
Challenges
- Rehashing/Rebalancing on join/leave
- Distribution/replication is used to back up data, so the copies of an entry must be placed on different physical nodes, not merely on different virtual nodes
- Communication with other virtual nodes on the same physical node doesn't need to go through the network layer at all
- Network storm when physical nodes join
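The design is still open, but the constraint that backup copies must land on different physical nodes can be illustrated with a minimal sketch. The class below is hypothetical and not part of Infinispan: the name `VirtualNodeRing`, the `node#i` naming of virtual nodes, and the MD5-based ring position are all assumptions made for illustration. Each physical node is placed on the hash wheel several times, and `locate` walks clockwise from the key's position, skipping virtual nodes whose physical node is already an owner.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a hash wheel with virtual nodes; not an Infinispan class. */
public class VirtualNodeRing {
    // Ring position -> physical node owning the virtual node at that position.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int distinctNodes;

    public VirtualNodeRing(List<String> physicalNodes, int virtualNodesPerNode) {
        for (String node : physicalNodes) {
            for (int i = 0; i < virtualNodesPerNode; i++) {
                // "node#i" is an assumed naming scheme for the i-th virtual node.
                ring.put(hash(node + "#" + i), node);
            }
        }
        distinctNodes = new HashSet<>(ring.values()).size();
    }

    /** Returns up to numOwners distinct physical nodes, walking clockwise from the key. */
    public List<String> locate(Object key, int numOwners) {
        List<String> owners = new ArrayList<>();
        int wanted = Math.min(numOwners, distinctNodes);
        if (wanted <= 0) {
            return owners;
        }
        long position = hash(String.valueOf(key));
        // First the part of the ring at or after the key, then wrap around to the start.
        for (Map<Long, String> segment
                : List.of(ring.tailMap(position, true), ring.headMap(position, false))) {
            for (String node : segment.values()) {
                if (!owners.contains(node)) {
                    owners.add(node); // skip virtual nodes of an already chosen physical node
                }
                if (owners.size() == wanted) {
                    return owners;
                }
            }
        }
        return owners;
    }

    /** Ring position: the first 8 bytes of the MD5 digest (an assumption, any well-mixed hash works). */
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        VirtualNodeRing ring = new VirtualNodeRing(List.of("A", "B", "C"), 100);
        System.out.println(ring.locate("some-key", 2));
    }
}
```

Because duplicates are filtered by physical node name, a request for more owners than there are physical nodes simply returns every physical node once.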
Design
TODO
References
- ISPN-870
- Forum Thread
- Amazon Dynamo Paper (Section 4.2 for design, 6.2 for a general discussion on ensuring uniform distribution)
Notes
- Classes to look at: RehashTask, JoinTask, InvertedLeaveTask, DefaultConsistentHash, TopologyAwareConsistentHash, DistributionManagerImpl, DistributionInterceptor
- Will need benchmarking to establish recommended parameters
- Consider staggering start of virtual nodes to prevent network storm
- Aim to do all the work in the consistent hashing and rehashing code, avoiding the need to build this deep into the architecture
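As a starting point for the benchmarking mentioned above, the following sketch measures how evenly keys spread over physical nodes as the number of virtual nodes per physical node changes. It is a hypothetical helper, not Infinispan code: the class name `VirtualNodeSpread`, the synthetic `node-n#v` and `key-k` names, and the MD5-based ring position are assumptions, and only the primary owner of each key is counted. The reported figure is the busiest node's key count divided by the mean, so 1.0 would be a perfectly even spread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical benchmark helper; not part of Infinispan. */
public class VirtualNodeSpread {

    /** Counts keys per physical node (primary owner only). Assumes all arguments are positive. */
    public static int[] keysPerNode(int physicalNodes, int virtualNodesPerNode, int keys) {
        // Ring position -> index of the physical node owning that virtual node.
        TreeMap<Long, Integer> ring = new TreeMap<>();
        for (int n = 0; n < physicalNodes; n++) {
            for (int v = 0; v < virtualNodesPerNode; v++) {
                ring.put(hash("node-" + n + "#" + v), n);
            }
        }
        int[] counts = new int[physicalNodes];
        for (int k = 0; k < keys; k++) {
            Map.Entry<Long, Integer> owner = ring.ceilingEntry(hash("key-" + k));
            if (owner == null) {
                owner = ring.firstEntry(); // wrap around the end of the ring
            }
            counts[owner.getValue()]++;
        }
        return counts;
    }

    /** Ratio of the busiest node's key count to the mean key count. */
    public static double maxOverMean(int[] counts) {
        int max = 0;
        long total = 0;
        for (int c : counts) {
            max = Math.max(max, c);
            total += c;
        }
        return total == 0 ? 1.0 : max / ((double) total / counts.length);
    }

    /** Ring position: the first 8 bytes of the MD5 digest (an assumption). */
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 10, 100, 1000}) {
            double ratio = maxOverMean(keysPerNode(8, v, 100_000));
            System.out.printf("virtual nodes per node = %d, max/mean = %.3f%n", v, ratio);
        }
    }
}
```

A real benchmark would also need to account for rehashing cost and join traffic, which this sketch does not model.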