HDFS
Scale Requirements
- Number of nodes – 10 thousand.
- Total data size – 10 PB.
This assumes 10,000 nodes capable of storing 1TB each and is an order-of-magnitude estimate. With 750GB disks becoming commodity, we could reasonably expect to have to support 750GB*4/node = 3TB/node = 30PB total.
- Number of files – 100 million.
o If DFS data size is 10^16 bytes and the block size is 10^8 bytes, then under the assumption that each file has exactly one block we need to support 10^8 files.
o Our current installation holds 32TB of data in 55,000 files and folders. Scaling 32TB to 10PB under the assumption that the average file size remains the same gives an estimate of about 18,000,000 files.
- Number of concurrent clients – 100 thousand.
If each node of a 10,000-node cluster has one task tracker running 4 tasks, as per the current map/reduce defaults, then we need to support 40,000 simultaneous clients.
- Acceptable level of data loss – 1 hour.
Any data created or updated in DFS at least 1 hour before a system failure is guaranteed to be recoverable.
- Acceptable downtime level – 2 hours.
A DFS failure requires manual system recovery. The system is guaranteed to be available again no later than 2 hours after recovery starts.
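
The estimates above can be rechecked with a short back-of-envelope script. This is only a sketch: it assumes decimal units (1TB = 10^12 bytes), which the requirements do not state, and all variable names are made up for illustration.

```python
# Back-of-envelope check of the scale requirements.
# Assumption: decimal units (1 TB = 10**12 bytes); the requirements do not say.
GB = 10**9
TB = 10**12
PB = 10**15

nodes = 10_000

# Storage: 1 TB per node, versus four 750 GB disks per node.
total_1tb = nodes * 1 * TB
total_750 = nodes * 4 * 750 * GB

# Files, upper estimate: 10^16 bytes of data, 10^8-byte blocks, one block per file.
files_one_block = 10**16 // 10**8

# Files, scaled estimate: current installation (32 TB, 55,000 files and folders)
# scaled to 10 PB with the average file size unchanged.
files_scaled = 55_000 * (10 * PB) // (32 * TB)

# Clients: one task tracker per node, 4 tasks each.
clients = nodes * 4

print("total at 1TB/node:", total_1tb // PB, "PB")
print("total at 4x750GB/node:", total_750 // PB, "PB")
print("files, one block per file:", files_one_block)
print("files, scaled from current installation:", files_scaled)
print("simultaneous clients:", clients)
```

With decimal units the scaled file count comes out near 17.2 million; with binary units (1PB = 1024TB) it is 17.6 million. Both are consistent with the rounded figure of 18,000,000 and sit well under the 100 million target, just as the 40,000 simultaneous clients sit under the 100 thousand target.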

