
Hadoop: The Definitive Guide, ch06: How MapReduce Works

2012-07-07 16:01

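1. How MapReduce works

1) The client, which submits the MapReduce job.

2) The jobtracker, which coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.

3) The tasktrackers, which run the tasks that the job has been split into. A tasktracker is a Java application whose main class is TaskTracker.

4) The distributed filesystem (normally HDFS), which is used for sharing job files between the other entities.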

2. The job submission process implemented by JobClient's submitJob() method is as follows

a. Asks the jobtracker for a new job ID (by calling getNewJobId() on JobTracker) (step 2).

b. Checks the output specification of the job. For example, if the output directory has not been specified or it already exists, the job is not submitted and an error is thrown to the MapReduce program.

c. Computes the input splits for the job. If the splits cannot be computed, because the input paths don’t exist, for example, then the job is not submitted and an error is thrown to the MapReduce program.

d. Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the jobtracker’s filesystem in a directory named after the job ID. The job JAR is copied with a high replication factor (controlled by the mapred.submit.replication property, which defaults to 10) so that there are lots of copies across the cluster for the tasktrackers to access when they run tasks for the job (step 3).

e. Tells the jobtracker that the job is ready for execution (by calling submitJob() on JobTracker) (step 4).
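Steps a to e can be sketched as a toy, in-memory model. This is only an illustration of the control flow, not Hadoop's implementation: ToyJobTracker, ToyJobClient, SubmitSketch, the existingPaths and staged fields, and the "one split per input path" rule are all hypothetical. Only the method names getNewJobId() and submitJob() and the default replication of 10 come from the text above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical driver: submits one job against the toy classes below.
class SubmitSketch {
    public static void main(String[] args) {
        ToyJobTracker tracker = new ToyJobTracker();
        ToyJobClient client = new ToyJobClient();
        client.existingPaths.add("/in/a"); // pretend this input path exists
        String id = client.submitJob(tracker, Arrays.asList("/in/a"), "/out");
        System.out.println(id + " staged " + client.staged.get(id));
    }
}

// Toy stand-in for the jobtracker; not Hadoop's JobTracker class.
class ToyJobTracker {
    private int nextId = 1;
    final List<String> readyJobs = new ArrayList<>();

    // Step a: hand out a fresh job ID.
    String getNewJobId() {
        return "job_" + (nextId++);
    }

    // Step e: the client reports that the job is ready for execution.
    void submitJob(String jobId) {
        readyJobs.add(jobId);
    }
}

// Toy stand-in for the client side of submission; not Hadoop's JobClient.
class ToyJobClient {
    // Stand-in for the shared filesystem: the set of paths that exist.
    final Set<String> existingPaths = new HashSet<>();
    // Stand-in for the per-job directory: job ID -> copied resources.
    final Map<String, List<String>> staged = new HashMap<>();
    // Mirrors the default of mapred.submit.replication quoted in the text.
    int submitReplication = 10;

    String submitJob(ToyJobTracker tracker, List<String> inputPaths, String outputDir) {
        // Step a: ask the jobtracker for a new job ID.
        String jobId = tracker.getNewJobId();

        // Step b: reject an unspecified or already existing output directory.
        if (outputDir == null || existingPaths.contains(outputDir)) {
            throw new IllegalStateException("bad output directory: " + outputDir);
        }

        // Step c: compute splits (assumption: one split per input path).
        List<String> splits = new ArrayList<>();
        for (String in : inputPaths) {
            if (!existingPaths.contains(in)) {
                throw new IllegalStateException("input path does not exist: " + in);
            }
            splits.add("split:" + in);
        }

        // Step d: copy JAR, configuration, and splits into a directory
        // named after the job ID.
        List<String> resources = new ArrayList<>();
        resources.add("job.jar (replication=" + submitReplication + ")");
        resources.add("job.xml");
        resources.addAll(splits);
        staged.put(jobId, resources);

        // Step e: tell the jobtracker the job is ready for execution.
        tracker.submitJob(jobId);
        return jobId;
    }
}
```

In this sketch a failed check in step b or c throws before anything is staged or reported to the tracker, which matches the "the job is not submitted" wording in the list.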

3. The relationship of the Streaming and Pipes executables to the tasktracker and its child process
