HDFS

From Open Encyclopedia (开放百科) - 灰狐
 
Revision as of 19:43, 13 September 2007 (Thursday)


Scale Requirements

  • Number of nodes – 10 thousand.
  • Total data size – 10 PB.

This assumes 10,000 nodes, each capable of storing 1 TB, and is an order-of-magnitude estimate. With 750 GB disks becoming commodity, we could reasonably expect to have to support 4 × 750 GB = 3 TB per node, or 30 PB in total.
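The capacity arithmetic above can be checked with a short sketch. It assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) and raw disk capacity with no allowance for replication; the function and constant names are illustrative only and do not come from the original page.

```python
# Back-of-the-envelope cluster capacity, using decimal units (an assumption).
NODES = 10_000
GB_PER_TB = 1_000
TB_PER_PB = 1_000


def total_capacity_pb(nodes: int, disks_per_node: int, disk_gb: int) -> float:
    """Return raw cluster capacity in PB for a given per-node disk layout."""
    per_node_tb = disks_per_node * disk_gb / GB_PER_TB
    return nodes * per_node_tb / TB_PER_PB


# Baseline from the requirement: 1 TB per node.
baseline_pb = total_capacity_pb(NODES, disks_per_node=1, disk_gb=1_000)
# Projection from the text: four 750 GB disks per node.
projected_pb = total_capacity_pb(NODES, disks_per_node=4, disk_gb=750)

print(f"baseline:  {baseline_pb} PB")
print(f"projected: {projected_pb} PB")
```

The baseline matches the 10 PB requirement, and the projection matches the 30 PB figure quoted for 3 TB per node.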

  • Number of files – 100 million.

o If DFS data size is 10^16 bytes and the block size is 10^8 bytes, then under the assumption that each file has exactly one block we need to support 10^8 files.

o Our current installation holds 32 TB of data in 55,000 files and folders. Scaling 32 TB to 10 PB, under the assumption that the average file size remains the same, gives us an estimate of roughly 18,000,000 files.
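Both file-count estimates can be reproduced with a few lines of arithmetic. The sketch below assumes a data size of 10^16 bytes (10 PB) and a block size of 10^8 bytes for the first estimate. For the second, the page does not say which unit convention was used, so both decimal (1 PB = 1,000 TB) and binary (1 PB = 1,024 TB) scaling are shown; the helper names are hypothetical.

```python
# Two independent estimates of the number of files DFS must support.
DATA_SIZE_BYTES = 10**16   # 10 PB, decimal
BLOCK_SIZE_BYTES = 10**8   # 100 MB, decimal


def files_one_block_each(data_size: int, block_size: int) -> int:
    """Upper estimate: every file occupies exactly one full block."""
    return data_size // block_size


def files_scaled(current_files: int, current_tb: int, target_tb: int) -> int:
    """Scale the current file count linearly with total data size,
    keeping the average file size constant."""
    return round(current_files * target_tb / current_tb)


one_block_estimate = files_one_block_each(DATA_SIZE_BYTES, BLOCK_SIZE_BYTES)
scaled_decimal = files_scaled(55_000, 32, 10 * 1_000)   # 1 PB = 1,000 TB
scaled_binary = files_scaled(55_000, 32, 10 * 1_024)    # 1 PB = 1,024 TB

print(f"one block per file: {one_block_estimate:,}")
print(f"scaled (decimal):   {scaled_decimal:,}")
print(f"scaled (binary):    {scaled_binary:,}")
```

The scaled figures come out at about 17.2 million (decimal) and 17.6 million (binary), so the 18,000,000 quoted above is a rounded value. Both sit well below the 100 million requirement set by the one-block-per-file estimate.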

  • Number of concurrent clients – 100 thousand.

If each node of a 10,000-node cluster runs one task tracker with 4 tasks, according to the current map/reduce defaults, then we need to support 40,000 simultaneous clients.
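As a quick check of the client count, the sketch below multiplies out the figures from the paragraph above and compares the result with the 100 thousand requirement. It assumes each running task acts as exactly one DFS client, which the text implies but does not state; all names are illustrative.

```python
# Expected number of simultaneous DFS clients versus the stated requirement.
NODES = 10_000
TASK_TRACKERS_PER_NODE = 1
TASKS_PER_TRACKER = 4          # "current m/r defaults" per the text
REQUIRED_CLIENTS = 100_000     # the stated scale requirement


def simultaneous_clients(nodes: int, trackers_per_node: int,
                         tasks_per_tracker: int) -> int:
    """Assume one DFS client per running task."""
    return nodes * trackers_per_node * tasks_per_tracker


expected_clients = simultaneous_clients(
    NODES, TASK_TRACKERS_PER_NODE, TASKS_PER_TRACKER
)
headroom = REQUIRED_CLIENTS / expected_clients

print(f"expected clients: {expected_clients:,}")
print(f"headroom factor:  {headroom}")
```

Under these assumptions the requirement leaves a factor of 2.5 above the expected map/reduce load, which would cover non-task clients or higher per-node task counts.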

  • Acceptable level of data loss – 1 hour.

Any data created or updated in DFS at least 1 hour before a system failure is guaranteed to be recoverable.

  • Acceptable downtime level – 2 hours.

A DFS failure requires manual system recovery. The system is guaranteed to be available again no later than 2 hours after recovery starts.
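The two availability targets can be expressed as simple time-window checks. This is only a sketch of how the requirements read, not part of any DFS implementation; the function names and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

MAX_DATA_LOSS = timedelta(hours=1)   # acceptable level of data loss
MAX_DOWNTIME = timedelta(hours=2)    # acceptable downtime level


def is_guaranteed_recoverable(written_at: datetime, failed_at: datetime) -> bool:
    """Data written at least MAX_DATA_LOSS before the failure must survive."""
    return failed_at - written_at >= MAX_DATA_LOSS


def meets_downtime_target(recovery_started_at: datetime,
                          available_at: datetime) -> bool:
    """The system must be back within MAX_DOWNTIME of the recovery start."""
    return available_at - recovery_started_at <= MAX_DOWNTIME


failure = datetime(2007, 9, 13, 12, 0)
old_write = datetime(2007, 9, 13, 10, 30)      # 90 minutes before the failure
recent_write = datetime(2007, 9, 13, 11, 30)   # 30 minutes before the failure
back_online = datetime(2007, 9, 13, 13, 45)    # 105 minutes after recovery start

print(is_guaranteed_recoverable(old_write, failure))
print(is_guaranteed_recoverable(recent_write, failure))
print(meets_downtime_target(failure, back_online))
```

Here the write made 90 minutes before the failure falls under the guarantee, the one made 30 minutes before does not, and a recovery finishing after 105 minutes meets the downtime target.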
