Distributed file storage
gciuloaica Apr 12, 2011 3:10 PMHi,
I'm looking for an advise to the following approach/problems:
Story
Distributed File Storage [DFS] is a simple distributed storage of unstructured data (files). It should expose a simple CRUD interface to be accessed from different platforms (browser, mobile apps...). The storage should be able to scale horizontally easily such it should be able to accommodate increasing need of storage.
The service should be able to respond to a client request under 30 ms.
It should be able to store billions of small files (under 50kb) and large files (Gb).
Medium load request that should be supported is 300 req/s.
System should be resilient to data centre failure - to be able to replicate data cross -data centre.
Approach
After some architecture research following non- functional requirements and approaches has been identified:
Scalability: Shared nothing architecture - this will allow to horizontally scale the system theoretically infinitely by adding new nodes. HDFS model is not ideal, because it use NameNodes to identify the location of the file in the cluster, that in case of a large system may become a bottleneck, and also because HDFS is optimized for large files.
High Availability: Async replication of data inside data centre and cross-data centre. To evaluate if 3 copies inside a data centre (in different racks) and 2 copies in other 2 different data centres will be a reasonable approach.
Routing: - client side routing - it will be hard to support dumb clients like browsers/ mobile clients
- server side routing - How to resolve the file location and file retrieving with a minimum of network calls? Ideal it should be able to retrieve a file content in one hope in more 50% of cases... to be able support the 30ms requirement for response, it should not touch disk at all...
Clustering: - few hundred nodes - there are plenty of systems available to handle properly medium size clusters.
- thousands of nodes - not aware of any available system...
In order to be able to build a shared nothing system we will have to store routing information in a distributed data grid. We'll not be able to store files in the grid. Most of the data grids are using consistent hashing to split the keys in all available nodes. All consistent hashing algorithms that I'm aware of, are assuming that value is a reasonable size blob, but in our case we can't assume this. How to build a routing algorithm that is able to not route write requests to full nodes?In order to support load requested, it should be able to read from all 3 copies from a datacentre.Re-balancing is not a solution to spread the data uniform in the system. This is because it may end-up moving gigabytes of data between notes.
Concepts, Terminology
- Replication area - a group of nodes that mirrors same data.
- Extended replication area - a replication area that has associated nodes from other data centres.
Practical approaches:
HTTP SERVER THAT OFFER CRUD OPERATIONS ON EACH NODE.
- Netty used to implement HTTP server, taking advantages of zero-copy byte buffer support.
- Routing information kept in Hazelcast
Impressed by the number of concurrent request that I was able to achieve on a single node, without using any special caching (just native files system caching). Able to get up to 12k concurrent connection on a regular desktop machine (Dell Dimension, quad code proc, 6 Gb Ram, running CentOS, updated to allow to have unlimited number of connections.). Hazelcast running in same jvm with Netty
Issues: - Hazelcast is not stable at all - serialization of routing data is not properly working. - I was not able to keep a cluster of 3 nodes up and running enough time to be able to run a performance test on the cluster.
HTTP SERVER THAT OFFER CRUD OPERATIONS ON EACH NODE.
- Netty used to implement HTTP server, taking advantages of zero-copy byte buffer support.
- Routing information kept in Voldemort
Issues: - Voldemort is more stable than Hazelcast but consume more resource and much complex to install/configure.
HTTP SERVER THAT OFFER CRUD OPERATIONS ON EACH NODE. - TBD
- Netty used to implement HTTP server, taking advantages of zero-copy byte buffer support.
- Routing information kept in Infinispan
So, will Infinispan be able to provide a reliable distributed storage for routing information?
Thanks,
Gabi