《Hadoop The Definitive Guide》ch06 How MapReduce Works
2012-07-07 16:01
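1. How MapReduce works
1) The client, which submits the MapReduce job.
2) The jobtracker, which coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.
3) The tasktrackers, which run the tasks that the job has been split into. Tasktrackers are Java applications whose main class is TaskTracker.
4) The distributed filesystem (normally HDFS), which is used for sharing job files between the other entities.
2. The job submission process implemented by JobClient's submitJob() method is as follows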
a. Asks the jobtracker for a new job ID (by calling getNewJobId() on JobTracker) (step 2).
b. Checks the output specification of the job. For example, if the output directory has not been specified or it already exists, the job is not submitted and an error is
thrown to the MapReduce program.
c. Computes the input splits for the job. If the splits cannot be computed, because the input paths don’t exist, for example, then the job is not submitted and an error
is thrown to the MapReduce program.
d. Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the jobtracker’s filesystem in a
directory named after the job ID. The job JAR is copied with a high replication factor (controlled by the mapred.submit.replication property, which defaults to
10) so that there are lots of copies across the cluster for the tasktrackers to access when they run tasks for the job (step 3).
e. Tells the jobtracker that the job is ready for execution (by calling submitJob() on JobTracker) (step 4).
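To make the order of steps a through e easier to follow, here is a minimal in-memory sketch in plain Java. It does not use Hadoop's classes: FakeJobTracker, FakeFileSystem, the /staging path, the file names, and the replication factor of 3 for the non-JAR files are all hypothetical stand-ins chosen for illustration. Only the method names getNewJobId() and submitJob() and the default of 10 for mapred.submit.replication come from the text above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the submission steps; these are not Hadoop's actual classes. */
public class SubmitSketch {

    /** Hypothetical stand-in for the jobtracker. */
    static class FakeJobTracker {
        private int nextId = 1;
        final List<String> readyJobs = new ArrayList<>();

        String getNewJobId() { return String.format("job_%04d", nextId++); }
        void submitJob(String jobId) { readyJobs.add(jobId); }
    }

    /** Hypothetical stand-in for the shared filesystem. */
    static class FakeFileSystem {
        final Set<String> existingPaths = new HashSet<>();
        final Map<String, Integer> replication = new HashMap<>();

        boolean exists(String path) { return existingPaths.contains(path); }

        void copy(String path, int replicas) {
            existingPaths.add(path);
            replication.put(path, replicas);
        }
    }

    // Default of mapred.submit.replication, as stated in the text.
    static final int SUBMIT_REPLICATION = 10;
    // Assumed replication for the other job files (illustrative only).
    static final int ASSUMED_DEFAULT_REPLICATION = 3;

    static String submit(FakeJobTracker jt, FakeFileSystem fs,
                         List<String> inputPaths, String outputDir) {
        // a. Ask the jobtracker for a new job ID.
        String jobId = jt.getNewJobId();

        // b. Check the output specification.
        if (outputDir == null) {
            throw new IllegalStateException("Output directory not specified");
        }
        if (fs.exists(outputDir)) {
            throw new IllegalStateException("Output directory already exists: " + outputDir);
        }

        // c. Compute input splits (simplified here to one split per input path).
        List<String> splits = new ArrayList<>();
        for (String in : inputPaths) {
            if (!fs.exists(in)) {
                throw new IllegalStateException("Input path does not exist: " + in);
            }
            splits.add(in + "#split0");
        }

        // d. Copy job resources into a directory named after the job ID.
        String jobDir = "/staging/" + jobId; // hypothetical location
        fs.copy(jobDir + "/job.jar", SUBMIT_REPLICATION);
        fs.copy(jobDir + "/job.xml", ASSUMED_DEFAULT_REPLICATION);
        fs.copy(jobDir + "/job.split", ASSUMED_DEFAULT_REPLICATION);

        // e. Tell the jobtracker that the job is ready for execution.
        jt.submitJob(jobId);
        return jobId;
    }

    public static void main(String[] args) {
        FakeJobTracker jt = new FakeJobTracker();
        FakeFileSystem fs = new FakeFileSystem();
        fs.existingPaths.add("/input/books");
        String jobId = submit(jt, fs, List.of("/input/books"), "/output/wordcount");
        System.out.println("Submitted " + jobId);
    }
}
```

Note that in this sketch, as in the steps above, a failed check in b or c aborts before anything is copied or handed to the jobtracker, so a job that cannot run never reaches step e.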
3. The relationship of the Streaming and Pipes executables to the tasktracker and its child process