
Compiling the Hadoop 2.2.0 src Package on CentOS


Hadoop 2.2.0: Build and Installation Steps (64-bit CentOS) (to be continued)

Highlights:
This is the first stable release built on the YARN compute framework and a highly available HDFS.

Note 1: the official site only offers a 32-bit release; on a 64-bit machine you must compile by hand.
Note 2: almost every Hadoop 2.2 installation guide circulating online has problems; not one is completely correct. If you do not understand the internals of the new framework, do not copy them blindly.
I. Compiling Hadoop 2.2 (username: hadoop)
Because our CentOS install is 64-bit and the official Hadoop 2.2.0 release has no 64-bit package, we have to build it ourselves.
First download the 64-bit JDK from Oracle:

$ su root

# wget http://download.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz
Note: a % prompt means the current (ordinary) user and # means root; watch the prompt type in each step below.
The Hadoop build steps follow. (The boxed passages in between are supplementary remarks only; do not run any commands inside them.)
(1) Change BOOTPROTO to "dhcp"

# su root

# sed -i s/static/dhcp/g /etc/sysconfig/network-scripts/ifcfg-eth0

# service network restart


(2) Download the Hadoop 2.2.0 source

# su hadoop

$ cd ~

$ wget http://apache.dataguru.cn/hadoop/common/stable/hadoop-2.2.0-src.tar.gz
(3) Install Maven

# su root; cd /opt

# wget http://apache.fayea.com/apache-mirror/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz

# tar zxvf apache-maven-3.1.1-bin.tar.gz

# cd apache-maven-3.1.1


Modify the system environment variables.
There are two ways: edit /etc/profile directly, or drop a custom shell file into /etc/profile.d/.
Given how critical profile is, avoid adding anything to it; the officially recommended approach is the second one, which keeps profile untouched and safe.
We use the second approach here.
Create a small shell script and add the needed content:

# cd /etc/profile.d/

# touch maven.sh


Add the following to maven.sh:

# cat maven.sh

# environment variable settings for maven

export MAVEN_HOME='/opt/apache-maven-3.1.1'

PATH=$MAVEN_HOME/bin:$PATH


Next:

# source /etc/profile

# mvn -version


The version string "Apache Maven 3.1.1" should be displayed.
(4) Install protobuf
Mind the hint on the Apache site: "NOTE: You will need protoc 2.5.0 installed."

# su root; cd /opt

# wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.bz2

# tar xvf protobuf-2.5.0.tar.bz2 (mind the archive suffix: the Maven package above is gzip, which needs -z; this one is bzip2)

# cd protobuf-2.5.0

# ./configure


Ouch: it fails with "configure: error: C++ preprocessor "/lib/cpp" fails sanity check".
Install gcc (if the error persists, gcc-c++ is usually needed as well):

# yum install gcc
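Once the compiler is in place, re-run configure and finish the build; these are the standard protobuf build steps (my addition, not in the original; the version check at the end confirms the install):

# ./configure

# make && make install

# protoc --version
libprotoc 2.5.0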


(5) Compile Hadoop
First grab the Hadoop 2.2.0 source code from the official site (the same tarball as in step (2)):

# su hadoop; cd ~hadoop/

# wget http://apache.dataguru.cn/hadoop/common/stable/hadoop-2.2.0-src.tar.gz

Now for the painful part: the build itself.
Unpack the source:

% tar zxvf hadoop-2.2.0-src.tar.gz

% cd hadoop-2.2.0-src


Note: the quoted passage below is explanatory only; do not execute it!

Here is what the official site says:
You should be able to obtain the MapReduce tarball from the release. If not, you should be able to create a tarball from the source.
$ mvn clean install -DskipTests
$ cd hadoop-mapreduce-project
$ mvn clean install assembly:assembly -Pnative
NOTE: You will need protoc 2.5.0 installed.
To ignore the native builds in mapreduce you can omit the -Pnative argument for maven. The tarball should be available in the target/ directory.
See how terse it is; everything sounds rosy.
Fine, follow the official instructions:
$ mvn clean install -DskipTests
Result: errors everywhere, and the downloads crawl along:
Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.1:build-classpath (build-classpath)…
After some digging, the GFW looked like the culprit: the default repository was probably blocked. Switching Maven's default mirror to a domestic (Chinese) one made everything work.

The steps:

Step 1. Switch to root and edit /opt/apache-maven-3.1.1/conf/settings.xml:

% su root

# vim /opt/apache-maven-3.1.1/conf/settings.xml


(1) Add a domestic mirror inside <mirrors>…</mirrors> (the inner <mirror> element is the new part; the enclosing <mirrors> tags are already in the file):

<mirrors>

  <mirror>
    <id>nexus-osc</id>
    <mirrorOf>*</mirrorOf>
    <name>Nexus osc</name>
    <url>http://maven.oschina.net/content/groups/public/</url>
  </mirror>

</mirrors>


(2) Add the following inside the <profiles> tag (leave <profiles>…</profiles> itself alone; only the <profile> block below is new):

<profile>
  <id>jdk-1.7</id>
  <activation>
    <jdk>1.7</jdk>
  </activation>
  <repositories>
    <repository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>
  <pluginRepositories>
    <pluginRepository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </pluginRepository>
  </pluginRepositories>
</profile>


Step 2.

# su hadoop

$ sudo cp /opt/apache-maven-3.1.1/conf/settings.xml ~/.m2/


(If it complains that the user is not in sudoers, do the following:
$ su root
then add the current user around line 99 of /etc/sudoers (do not type the line numbers):
# cat /etc/sudoers
98 root ALL=(ALL) ALL
99 grid ALL=(ALL) ALL
)
Now run:

$ mvn clean install -DskipTests


After a long wait, the install finishes without a hitch.
Continue with the build:

Note: the passage below is explanatory only; do not follow it verbatim!

Run the official build steps:

$ cd hadoop-mapreduce-project
$ mvn clean install assembly:assembly -Pnative
It compiles for a long time and finally ends in ERROR. The official site says:

To ignore the native builds in mapreduce you can omit the -Pnative argument for maven. The tarball should be available in the target/ directory.
So try the ignore-native route instead:

$ mvn clean install assembly:assembly
Again, errors of every kind.

After some Googling, it turns out the build command everyone actually uses is the following (run it directly):
$ cd hadoop-2.2.0-src
$ mvn package -Pdist,native -DskipTests -Dtar

Another long wait…
At the end, two typical errors remain:

Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1 -> [Help 1]…

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake"
These errors point at three missing packages:

cmake

ncurses-devel

openssl-devel

% su root

# yum install ncurses-devel

# yum install openssl-devel

# yum install cmake


With those installed, switch back to the hadoop user:

# su hadoop; cd ~/hadoop-2.2.0-src


Build:

$ mvn package -Pdist,native -DskipTests -Dtar


After another long wait, check the result:

Everything is fine. With that, the Hadoop 2.2.0 build is complete.
Verification:
Let's confirm the build output matches expectations. Note that we are currently in ~/hadoop-2.2.0-src.

$ cd hadoop-dist/

$ ls

pom.xml  target


pom.xml above is the Maven build configuration file.

% cd target

$ ls -l

antrun
dist-tar-stitching.sh
hadoop-2.2.0.tar.gz
hadoop-dist-2.2.0-javadoc.jar
maven-archiver
dist-layout-stitching.sh
hadoop-2.2.0
hadoop-dist-2.2.0.jar
javadoc-bundle-options
test-dir


These are the directories and files generated by the Maven build. Enter hadoop-2.2.0:

$ cd hadoop-2.2.0

$ ls

bin  etc  include  lib  libexec  sbin  share


This is the same directory layout as the official 2.2.0 release (which ships 32-bit only).
Two things to verify:
a. The version number

$ bin/hadoop version

Hadoop 2.2.0

Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768

Compiled by hortonmu on 2013-10-07T06:28Z

Compiled with protoc 2.5.0

From source with checksum 79e53ce7994d1628b240f09af91e1af4

This command was run using /home/grid/yarn/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar


You can see the Hadoop version number, the build toolchain (protoc 2.5.0 matches the official requirement) and the build date.
b. The word size of the Hadoop native libraries

% file lib/native/*

lib/native/libhadoop.a:        current ar archive
lib/native/libhadooppipes.a:   current ar archive
lib/native/libhadoop.so:       symbolic link to `libhadoop.so.1.0.0'
lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
lib/native/libhadooputils.a:   current ar archive
lib/native/libhdfs.a:          current ar archive
lib/native/libhdfs.so:         symbolic link to `libhdfs.so.0.0.0'
lib/native/libhdfs.so.0.0.0:   ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped


The "ELF 64-bit LSB" entries prove that the 64-bit Hadoop 2.2.0 build succeeded. Inspect our earlier Hadoop 0.20.3 install the same way and you will find its lib/native/libhadoop.so.1.0.0 is 32-bit, which is wrong! ^_^
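If you want to script the same check, a one-liner along these lines works (my addition, not part of the original steps):

% file lib/native/libhadoop.so.1.0.0 | grep -q 'ELF 64-bit' && echo "64-bit build OK"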
II. Configuring Hadoop 2.2
(1) Home directory
To set it apart from MRv1, the 2.2 home directory is named simply yarn:

# su hadoop

$ cd ~

$ mkdir -p yarn/yarn_data

$ cp -a ~hadoop/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0 ~hadoop/yarn


(2) Environment variables
Add the new variables to .bashrc:

# java env
export JAVA_HOME="/usr/java/jdk1.7.0_45"
export PATH="$JAVA_HOME/bin:$PATH"

# hadoop variable settings
export HADOOP_HOME="$HOME/yarn/hadoop-2.2.0"
export HADOOP_PREFIX="$HADOOP_HOME/"
export YARN_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop/"
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"


Two things to watch here:
1. The JDK absolutely must be 64-bit.
2. HADOOP_PREFIX is critically important. It exists mainly for MRv1 compatibility and takes the highest precedence: when locating the conf directory, for example, the startup scripts consult $HADOOP_PREFIX/conf/ first even when HADOOP_CONF_DIR is set. Make sure it is configured correctly!
If you want to run the MRv1 framework on 2.2, this variable is a convenient way to point at the MRv1 conf files; otherwise, comment it out to avoid unpredictable errors.
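For reference, the conf-directory resolution inside $YARN_HOME/libexec/hadoop-config.sh has roughly the following shape (a paraphrased sketch from memory, not a verbatim quote; read the actual script on your machine), which is why a stray conf/ directory under $HADOOP_PREFIX can silently change which configuration gets loaded:

if [ -e "${HADOOP_PREFIX}/conf/hadoop-env.sh" ]; then
  DEFAULT_CONF_DIR="conf"          # MRv1-style layout found under the prefix
else
  DEFAULT_CONF_DIR="etc/hadoop"    # standard 2.x layout
fi
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_PREFIX/$DEFAULT_CONF_DIR}"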
(3) Fixing a bug in the official startup scripts

Even though this is a stable release, it still ships with some remarkably silly bugs.

Note: the analysis below is for understanding only; you do not need to execute it!

Normally you can specify target nodes when starting daemons via $YARN_HOME/sbin/hadoop-daemons.sh.

Its usage text says:

% $YARN_HOME/sbin/hadoop-daemons.sh

Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args...

So --hosts should accept the name of a file that lists the nodes to start.

But it does not work: the nested script $YARN_HOME/libexec/hadoop-config.sh has a bug.

Run the startup script:

% $YARN_HOME/sbin/hadoop-daemons.sh --hosts my_datanodes start datanode

cat: /home/grid/yarn/hadoop-2.2.0/etc/hadoop//126571: No such file or directory

Tracing through the scripts leads to line 96 of the nested $YARN_HOME/libexec/hadoop-config.sh:

96 export HADOOP_SLAVES="${HADOOP_CONF_DIR}/%$1"

The %$1 is wrong. Change it to:

96 export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"

Now run:

$ hadoop-daemons.sh --hosts nodes start datanode

Slave: starting datanode, logging to /home/grid/yarn/hadoop-2.2.0//logs/hadoop-grid-datanode-Slave.out

Remark 1: this release has been out since early November, and no tutorial I have seen so far, Chinese or English, mentions this bug.

Remark 2: the hostfile-parsing logic in $YARN_HOME/libexec/hadoop-config.sh is rather brain-dead:

if [ "--hosts" = "$1" ]

then

shift

export HADOOP_SLAVES="${HADOOP_CONF_DIR}/%$1"

shift


As a result, your hostfile can only live under ${HADOOP_CONF_DIR}/; placing it anywhere else has no effect.
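A more tolerant variant would accept absolute paths too; here is a sketch of how the block could have been written (my suggestion, not the shipped code):

if [ "--hosts" = "$1" ]
then
    shift
    case "$1" in
        /*) export HADOOP_SLAVES="$1" ;;                     # absolute path: use as given
        *)  export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1" ;;  # otherwise: relative to the conf dir
    esac
    shift
fi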

Remark 3: by the logic of $YARN_HOME/libexec/hadoop-config.sh there is one more way to specify a host:

$ hadoop-daemons.sh --hostnames Slave start datanode. Note that because bash splits arguments on whitespace, this form can only name one single node.

Apply the fix:

% cd $YARN_HOME/libexec/

% vim hadoop-config.sh

Change line 96 to:

export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"


Save and quit vim.
(4) Configuration files

etc/hadoop/core-site.xml

<configuration>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>${user.home}/yarn/yarn_data/tmp/hadoop-${user.name}</value>
  </property>

</configuration>


Remark 1: fs.defaultFS is the new property name, replacing the old fs.default.name.
Remark 2: the tmp directory lives under the $HOME/yarn/yarn_data/ directory we just created.
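Optionally, pre-create the tmp root so its ownership and permissions are under your control (my addition; the path assumes the layout above):

$ mkdir -p ~/yarn/yarn_data/tmp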
etc/hadoop/hdfs-site.xml

<configuration>

  <property>
    <name>dfs.replication</name>
    <value>x</value>
  </property>

</configuration>

(Replace x with your desired replica count.)
Remark 1: the new name dfs.namenode.name.dir replaces the old dfs.name.dir, and dfs.datanode.data.dir replaces dfs.data.dir.
Remark 2: dfs.replication sets the number of replicas per data block. With rack awareness, Hadoop replicates each block 3 times by default (two copies on one rack, one on another, reading whichever copy is closest; cross-rack block traffic is rare unless a whole rack goes down).
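If you later need a different replica count for files that already exist, fs -setrep changes it per path; a sketch with a hypothetical path (-w waits until the target replication is reached):

% $YARN_HOME/bin/hadoop fs -setrep -w 2 /user/grid/test/some_file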
etc/hadoop/yarn-site.xml

<configuration>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

</configuration>

(Note: the class property key must match the aux-service name, i.e. mapreduce_shuffle with an underscore.)


etc/hadoop/mapred-site.xml

<configuration>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

</configuration>


Remark 1: the new compute layer does away with a physical jobtracker, so there is no mapreduce.jobtracker.address to set; instead you name a framework, and here we choose yarn.
Remark 2: Hadoop 2.2 also supports third-party compute frameworks, but I have not looked into them.
(5) Startup
Before starting, make sure that on every host: /etc/hosts maps the hostname of every node (e.g. node1 198.0.0.1); iptables is off; /etc/sysconfig/network-scripts/ifcfg-eth0 has BOOTPROTO=static; /etc/sysconfig/network sets each host's hostname and static IP, and every machine has been rebooted since; the JDK and Hadoop are both 64-bit; and passwordless ssh is configured. Once all of that is done, Hadoop 2.2.0 is ready to start.
Notice that Master and Slave, namenode and datanode, have not been mentioned from start to finish: in the new compute framework and the new HDFS there is no physical Master node; all nodes are equal.
Taking two nodes as the example, the following layout was planned to make the new framework easier to follow (a quick sanity-check sketch comes right after it):
node1 resourcemanager, nodemanager, proxyserver, historyserver, datanode, namenode
node2 datanode, nodemanager
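A one-liner that exercises passwordless ssh, hostname resolution and the iptables state across the planned nodes in one pass (my addition; the node names assume the layout above):

% for h in node1 node2; do ssh $h 'hostname; service iptables status | tail -1'; done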

5.1 Format:

% $YARN_HOME/bin/hdfs namenode -format

(Note: the Hadoop 2.2.0 format step differs from older releases, where it was $YARN_HOME/bin/hadoop namenode -format.)
5.2 Start:
Startup method (1): manual
On node1, start resourcemanager, nodemanager, proxyserver, historyserver, datanode and namenode one by one.
On node2, start datanode and nodemanager.
Remark: if the resourcemanager were standalone, every node except the resourcemanager would need a nodemanager. We can create a nodehosts file under $YARN_HOME/etc/hadoop/ listing every node other than the resourcemanager; since our resourcemanager and namenode share a host here, that host needs a nodemanager as well (see the sketch right below).
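For instance, the file could be created like this (a sketch; the name and contents simply mirror the plan above):

% cat > $YARN_HOME/etc/hadoop/nodehosts <<EOF
node1
node2
EOF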

The steps:

% hostname

node1

% $YARN_HOME/sbin/hadoop-daemon.sh --script hdfs start namenode

% $YARN_HOME/sbin/hadoop-daemon.sh --script hdfs start datanode

% $YARN_HOME/sbin/yarn-daemon.sh start nodemanager

% $YARN_HOME/sbin/yarn-daemon.sh start resourcemanager

% $YARN_HOME/sbin/yarn-daemon.sh start proxyserver

% $YARN_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

% ssh node2

% hostname

node2

% $YARN_HOME/sbin/yarn-daemon.sh start nodemanager

% $YARN_HOME/sbin/hadoop-daemon.sh --script hdfs start datanode

Startup method (2): scripted

Step 1. Make sure you are logged into node1:

$ hostname

node1

Create namenodehosts under $YARN_HOME/etc/hadoop/ and list every namenode:

$ cat $YARN_HOME/etc/hadoop/namenodehosts

node1

Create datanodehosts under $YARN_HOME/etc/hadoop/ and list every datanode:

$ cat $YARN_HOME/etc/hadoop/datanodehosts

node1

node2

Create nodehosts under $YARN_HOME/etc/hadoop/ and list every datanode and namenode:

$ cat $YARN_HOME/etc/hadoop/nodehosts

node1

node2
Remark: the hostfile names above are arbitrary (file1, file2, file3 would do just as well), but they must live under $YARN_HOME/etc/hadoop/!
Step 2. Run:

% $YARN_HOME/sbin/hadoop-daemons.sh --hosts namenodehosts --script hdfs start namenode

% $YARN_HOME/sbin/hadoop-daemons.sh --hosts datanodehosts --script hdfs start datanode

% $YARN_HOME/sbin/yarn-daemons.sh --hostnames node1 start resourcemanager

% $YARN_HOME/sbin/yarn-daemons.sh --hosts nodehosts start nodemanager

% $YARN_HOME/sbin/yarn-daemons.sh --hostnames node1 start proxyserver

% $YARN_HOME/sbin/mr-jobhistory-daemon.sh start historyserver


Step 3. Check what is running.
On node1:

$ jps

20698 DataNode
21041 JobHistoryServer
20888 NodeManager
21429 Jps
20606 NameNode
20792 ResourceManager

On node2:

$ jps

8147 DataNode
8355 Jps
8234 NodeManager


Step 4. Check node status and the YARN cluster status.
(1) Node status
Open http://node1:50070 in Firefox (node1 hosts the namenode); on the main page, click Live Nodes to inspect the status of each live node. (Screenshots omitted.)

(2) Cluster status on the resourcemanager
Open http://node1:8088 in Firefox (node1 hosts the resourcemanager).

Step 5. MapReduce tests on the cluster.
Three test cases follow.
Test Case 1: estimated_value_of_pi

% $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
pi 10 1000000


Excerpt from the console output:

Number of Maps  = 10

Samples per Map = 1000000

Wrote input for Map #0

Wrote input for Map #1

Wrote input for Map #2

Wrote input for Map #3

Wrote input for Map #4

Wrote input for Map #5

Wrote input for Map #6

Wrote input for Map #7

Wrote input for Map #8

Wrote input for Map #9

Starting Job

13/11/06 23:20:07 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

13/11/06 23:20:07 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class

13/11/06 23:20:07 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir

13/11/06 23:20:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1383806445149_0001

13/11/06 23:20:15 INFO impl.YarnClientImpl: Submitted application application_1383806445149_0001 to ResourceManager at /0.0.0.0:8032

13/11/06 23:20:16 INFO mapreduce.Job: The url to track the job: http://Node1:8088/proxy/application_1383806445149_0001/
13/11/06 23:20:16 INFO mapreduce.Job: Running job: job_1383806445149_0001

13/11/06 23:21:09 INFO mapreduce.Job: Job job_1383806445149_0001 running in uber mode : false

13/11/06 23:21:10 INFO mapreduce.Job:  map 0% reduce 0%

13/11/06 23:24:28 INFO mapreduce.Job:  map 20% reduce 0%

13/11/06 23:24:30 INFO mapreduce.Job:  map 30% reduce 0%

13/11/06 23:26:56 INFO mapreduce.Job:  map 57% reduce 0%

13/11/06 23:26:58 INFO mapreduce.Job:  map 60% reduce 0%

13/11/06 23:28:33 INFO mapreduce.Job:  map 70% reduce 20%

13/11/06 23:28:35 INFO mapreduce.Job:  map 80% reduce 20%

13/11/06 23:28:39 INFO mapreduce.Job:  map 80% reduce 27%

13/11/06 23:30:06 INFO mapreduce.Job:  map 90% reduce 27%

13/11/06 23:30:09 INFO mapreduce.Job:  map 100% reduce 27%

13/11/06 23:30:12 INFO mapreduce.Job:  map 100% reduce 33%

13/11/06 23:30:25 INFO mapreduce.Job:  map 100% reduce 100%

13/11/06 23:30:54 INFO mapreduce.Job: Job job_1383806445149_0001 completed successfully

13/11/06 23:31:10 INFO mapreduce.Job: Counters: 43

File System Counters

FILE: Number of bytes read=226
FILE: Number of bytes written=879166
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2590
HDFS: Number of bytes written=215
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=3

Job Counters

Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=1349359
Total time spent by all reduces in occupied slots (ms)=190811

Map-Reduce Framework

Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1410
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=45355
CPU time spent (ms)=29860
Physical memory (bytes) snapshot=1481818112
Virtual memory (bytes) snapshot=9214468096
Total committed heap usage (bytes)=1223008256

Shuffle Errors

BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0

File Input Format Counters

Bytes Read=1180

File Output Format Counters

Bytes Written=97

13/11/06 23:31:15 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

Job Finished in 719.041 seconds

Estimated value of Pi is 3.14158440000000000000


Note: the job used 10 maps, the job id was job_1383806445149_0001, and the final estimate of Pi is 3.14158440000000000000. Job ids follow the pattern job_<timestamp>_<job serial number>; serial numbers start at 0 and are capped at 1000. Task ids follow job_<timestamp>_<job serial>_<task serial>_m or _r, where m marks a map task slot and r a reduce task slot; task serials also start at 0 and are capped at 1000.
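To see these ids from the shell after the fact, the ResourceManager's application list can be queried (a standard yarn subcommand; this example is my addition):

% $YARN_HOME/bin/yarn application -list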

Test Case 2: random_writing

% $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
randomwriter /user/grid/test/test_randomwriter/out


Excerpt from the console output:

Running 10 maps.

Job started: Wed Nov 06 23:42:17 PST 2013

13/11/06 23:42:17 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

13/11/06 23:42:19 INFO mapreduce.JobSubmitter: number of splits:10

13/11/06 23:42:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1383806445149_0002

13/11/06 23:42:21 INFO impl.YarnClientImpl: Submitted application application_1383806445149_0002 to ResourceManager at /0.0.0.0:8032

13/11/06 23:42:21 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1383806445149_0002/
13/11/06 23:42:21 INFO mapreduce.Job: Running job: job_1383806445149_0002

13/11/06 23:42:40 INFO mapreduce.Job: Job job_1383806445149_0002 running in uber mode : false

13/11/06 23:42:40 INFO mapreduce.Job:  map 0% reduce 0%

13/11/06 23:55:02 INFO mapreduce.Job:  map 10% reduce 0%

13/11/06 23:55:14 INFO mapreduce.Job:  map 20% reduce 0%

13/11/06 23:55:42 INFO mapreduce.Job:  map 30% reduce 0%

13/11/07 00:06:55 INFO mapreduce.Job:  map 40% reduce 0%

13/11/07 00:07:10 INFO mapreduce.Job:  map 50% reduce 0%

13/11/07 00:07:36 INFO mapreduce.Job:  map 60% reduce 0%

13/11/07 00:13:47 INFO mapreduce.Job:  map 70% reduce 0%

13/11/07 00:13:54 INFO mapreduce.Job:  map 80% reduce 0%

13/11/07 00:13:58 INFO mapreduce.Job:  map 90% reduce 0%

13/11/07 00:16:29 INFO mapreduce.Job:  map 100% reduce 0%

13/11/07 00:16:37 INFO mapreduce.Job: Job job_1383806445149_0002 completed successfully

File Output Format Counters

Bytes Written=10772852496

Job ended: Thu Nov 07 00:16:40 PST 2013

The job took 2062 seconds.


Note: if your machine has enough disk space, you can pull the output down from HDFS and take a look (a sketch follows).
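Pulling one part file down for inspection might look like this (my addition; each part is roughly 1 GB, so mind the space, and the path matches this run's output):

% $YARN_HOME/bin/hadoop fs -get /user/grid/test/test_randomwriter/out/part-m-00000 /tmp/
% ls -lh /tmp/part-m-00000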
For now, let's just look at how the output files are laid out:

% $YARN_HOME/bin/hadoop fs -ls /user/grid/test/test_randomwriter/out/

Found 11 items

-rw-r--r--   2 grid supergroup          0 2013-11-07 00:16 /user/grid/test/test_randomwriter/out/_SUCCESS
-rw-r--r--   2 grid supergroup 1077278214 2013-11-06 23:54 /user/grid/test/test_randomwriter/out/part-m-00000
-rw-r--r--   2 grid supergroup 1077282751 2013-11-06 23:55 /user/grid/test/test_randomwriter/out/part-m-00001
-rw-r--r--   2 grid supergroup 1077280298 2013-11-06 23:55 /user/grid/test/test_randomwriter/out/part-m-00002
-rw-r--r--   2 grid supergroup 1077303152 2013-11-07 00:07 /user/grid/test/test_randomwriter/out/part-m-00003
-rw-r--r--   2 grid supergroup 1077284240 2013-11-07 00:06 /user/grid/test/test_randomwriter/out/part-m-00004
-rw-r--r--   2 grid supergroup 1077286604 2013-11-07 00:07 /user/grid/test/test_randomwriter/out/part-m-00005
-rw-r--r--   2 grid supergroup 1077284336 2013-11-07 00:13 /user/grid/test/test_randomwriter/out/part-m-00006
-rw-r--r--   2 grid supergroup 1077284829 2013-11-07 00:13 /user/grid/test/test_randomwriter/out/part-m-00007
-rw-r--r--   2 grid supergroup 1077289706 2013-11-07 00:13 /user/grid/test/test_randomwriter/out/part-m-00008
-rw-r--r--   2 grid supergroup 1077278366 2013-11-07 00:16 /user/grid/test/test_randomwriter/out/part-m-00009


Test Case 3: word_count
(1) Create the input files locally:

% mkdir input

% echo 'hello,world' >> input/file1.in

% echo 'hello, ruby' >> input/file2.in


(2) Upload them to HDFS:

% $YARN_HOME/bin/hadoop fs -mkdir -p /user/grid/test/test_wordcount/

% $YARN_HOME/bin/hadoop fs -put input /user/grid/test/test_wordcount/in


(3) Run the MapReduce job on the new YARN framework:

% $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/grid/test/test_wordcount/in /user/grid/test/test_wordcount/out


Excerpt from the console output:

13/11/07 00:35:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

13/11/07 00:35:05 INFO input.FileInputFormat: Total input paths to process : 2

13/11/07 00:35:05 INFO mapreduce.JobSubmitter: number of splits:2

13/11/07 00:35:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1383806445149_0003

13/11/07 00:35:08 INFO impl.YarnClientImpl: Submitted application application_1383806445149_0003 to ResourceManager at /0.0.0.0:8032

13/11/07 00:35:08 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1383806445149_0003/
13/11/07 00:35:08 INFO mapreduce.Job: Running job: job_1383806445149_0003

13/11/07 00:35:25 INFO mapreduce.Job: Job job_1383806445149_0003 running in uber mode : false

13/11/07 00:35:25 INFO mapreduce.Job:  map 0% reduce 0%

13/11/07 00:37:50 INFO mapreduce.Job:  map 33% reduce 0%

13/11/07 00:37:54 INFO mapreduce.Job:  map 67% reduce 0%

13/11/07 00:37:55 INFO mapreduce.Job:  map 83% reduce 0%

13/11/07 00:37:58 INFO mapreduce.Job:  map 100% reduce 0%

13/11/07 00:38:51 INFO mapreduce.Job:  map 100% reduce 100%

13/11/07 00:38:54 INFO mapreduce.Job: Job job_1383806445149_0003 completed successfully

13/11/07 00:38:56 INFO mapreduce.Job: Counters: 43


Now check the word count results:

% $YARN_HOME/bin/hadoop fs -cat /user/grid/test/test_wordcount/out/*

hadoop     1

hello  1

ruby   1


One more note: because the new YARN keeps compatibility with the old MRv1 framework, many old APIs still work, but they emit INFO messages. You can turn off the configuration deprecation warnings by editing $YARN_HOME/etc/hadoop/log4j.properties.
Uncomment line 138 (optional) to pin the deprecation logger at WARN (the root logger defaults to INFO; see line 20: hadoop.root.logger=INFO,console):
138 log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
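If you would rather not open an editor, a sed one-liner can do the uncommenting, assuming line 138 starts with a '#' as in the stock file (check first; this is a sketch):

% sed -i '138s/^#[[:space:]]*//' $YARN_HOME/etc/hadoop/log4j.properties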