HDFS

Revision as of 19:43, 13 September 2007 (Thursday)

Scale Requirements

Number of nodes – 10 thousand.
Total data size – 10 PB.

This assumes 10,000 nodes, each capable of storing 1 TB, and is an order-of-magnitude estimate. With 750 GB disks becoming commodity, we could reasonably expect to have to support 4 × 750 GB = 3 TB per node, or 30 PB in total.
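The capacity arithmetic above can be checked with a short sketch. It assumes decimal units (1 TB = 10^12 bytes), and the function and variable names are illustrative only; they are not part of any HDFS API.

```python
# Capacity estimate for the cluster sizes discussed above.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12, 1 PB = 10**15).
GB = 10**9
TB = 10**12
PB = 10**15


def total_capacity_bytes(nodes: int, disks_per_node: int, disk_bytes: int) -> int:
    """Raw cluster capacity: nodes x disks per node x bytes per disk."""
    return nodes * disks_per_node * disk_bytes


# Baseline requirement: 10,000 nodes storing 1 TB each.
baseline = total_capacity_bytes(10_000, 1, 1 * TB)

# Commodity-disk case: four 750 GB disks per node.
commodity = total_capacity_bytes(10_000, 4, 750 * GB)

print(baseline // PB, "PB baseline")    # 10 PB
print(commodity // PB, "PB commodity")  # 30 PB
```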

Number of files – 100 million.

o If the DFS data size is 10^16 bytes and the block size is 10^8 bytes, then under the assumption that each file has exactly one block we need to support 10^8 files.

o Our current installation holds 32 TB of data in 55,000 files and folders. Scaling 32 TB to 10 PB, under the assumption that the average file size remains the same, gives an estimate of about 18,000,000 files.
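Both file-count estimates can be reproduced with a minimal sketch, again assuming decimal units; the helper names are hypothetical. With these units the scaled estimate comes out near 17.2 million, which is consistent with the rounded figure of 18,000,000, and both estimates fall below the 100 million requirement.

```python
# Two ways of estimating how many files the namespace must hold.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
TB = 10**12
PB = 10**15


def files_one_block_each(total_bytes: int, block_bytes: int) -> int:
    """File count if every file occupies exactly one full block."""
    return total_bytes // block_bytes


def files_scaled(current_files: int, current_bytes: int, target_bytes: int) -> int:
    """File count if the average file size stays the same as data grows."""
    return current_files * target_bytes // current_bytes


one_block_estimate = files_one_block_each(10**16, 10**8)
scaled_estimate = files_scaled(55_000, 32 * TB, 10 * PB)

print(one_block_estimate)  # 100000000
print(scaled_estimate)     # 17187500
```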

Number of concurrent clients – 100 thousand.

If, on a 10,000-node cluster, each node has one task tracker running 4 tasks, as per the current m/r defaults, then we need to support 40,000 simultaneous clients.
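As a quick check of the client estimate, the sketch below multiplies nodes by tasks per node and compares the result with the stated requirement. The names are illustrative, and the headroom ratio is simply derived from the two figures in the text.

```python
# Concurrent-client estimate: every running task is treated as one DFS client.
NODES = 10_000
TASKS_PER_NODE = 4            # per the current m/r defaults cited above
REQUIRED_CLIENTS = 100_000    # the stated requirement


def expected_clients(nodes: int, tasks_per_node: int) -> int:
    """Simultaneous clients if each task tracker runs a fixed number of tasks."""
    return nodes * tasks_per_node


clients = expected_clients(NODES, TASKS_PER_NODE)
headroom = REQUIRED_CLIENTS / clients

print(clients)   # 40000
print(headroom)  # 2.5
```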

Acceptable level of data loss – 1 hour.

Any data created or updated in DFS at least 1 hour before a system failure is guaranteed to be recoverable.

Acceptable downtime level – 2 hours.

A DFS failure requires manual system recovery. The system is guaranteed to be available again no later than 2 hours after recovery starts.
