
Installing the Hadoop 2.x Plugin on Win7 and Running MapReduce Programs on Win7/Linux

2015-05-13 11:48

http://www.it165.net/admin/html/201505/5427.html

I. On Win7

(1) Environment and installation packages

Win7 32-bit

JDK 7

eclipse-java-juno-SR2-win32.zip

hadoop-2.2.0.tar.gz

hadoop-eclipse-plugin-2.2.0.jar

hadoop-common-2.2.0-bin.rar

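(2) Installation

This assumes that the JDK and Eclipse are already installed and that Hadoop is already configured in pseudo-distributed mode.

1. Copy the hadoop-eclipse-plugin-2.2.0.jar plugin into the plugins subdirectory of the Eclipse installation directory, then restart Eclipse.

2. Set the environment variables.

3. Configure the Hadoop installation directory in Eclipse.

Extract hadoop-2.2.0.tar.gz.

4. Extract hadoop-common-2.2.0-bin.rar.

Copy the files it contains into the bin folder of the Hadoop installation directory.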
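Steps 2 to 4 can be sanity-checked from Java before opening Eclipse. The following is only a minimal sketch under two assumptions that the original does not state: that the environment variable is named HADOOP_HOME, and that the extracted archive provides a file called winutils.exe. The class name HadoopHomeCheck is made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class HadoopHomeCheck {
    // Returns true when <hadoopHome>/bin/<fileName> exists as a regular file.
    static boolean hasBinFile(Path hadoopHome, String fileName) {
        return Files.isRegularFile(hadoopHome.resolve("bin").resolve(fileName));
    }

    public static void main(String[] args) {
        // Assumption: the variable set in step 2 is HADOOP_HOME.
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        // Assumption: winutils.exe is one of the files copied in step 4.
        boolean ok = hasBinFile(Paths.get(home), "winutils.exe");
        System.out.println("winutils.exe present in bin: " + ok);
    }
}
```

If the check prints false, the files from step 4 were probably copied to a different directory than the one the environment variable points to.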

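(3) On Win7, running MapReduce on YARN

Create a new project.

Click Window -> Show View -> Map/Reduce Locations.

Click New Hadoop Location…

Enter the following configuration and click Finish.

From this point on you can browse the contents of HDFS.

Write the MapReduce program.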
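The original does not include the program itself. Judging from the output shown later (the words hadoop, hello, me, you), it appears to be a word count, but that is an assumption. The sketch below shows the map, group, and reduce steps of a word count in plain Java with no Hadoop dependency, so it is not the author's job class. The class name WordCountSketch and the input lines are made up.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordCountSketch {
    // "Map" step: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" and "reduce" steps: group pairs by word (sorted by key) and sum the ones.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up input; the content of the article's /hello file is not shown.
        List<String> input = Arrays.asList("hello you", "hello me", "hello hadoop");
        run(input).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

A real job would express the same two steps as a Mapper and a Reducer class and let the framework do the grouping between them.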

Add a file named log4j.properties under the src directory with the following content:

<code class="hljs avrasm">log4j.rootLogger=debug,appender1
log4j.appender.appender1=org.apache.log4j.ConsoleAppender
log4j.appender.appender1.layout=org.apache.log4j.TTCCLayout
</code>
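These three lines set the root logger to the debug level and attach one console appender named appender1. As a rough illustration of how the rootLogger value breaks down, the sketch below reads the same text with java.util.Properties. It does not use log4j itself, and the class name Log4jConfigPeek is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

final class Log4jConfigPeek {
    // Parses properties-file text held in memory.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return props;
    }

    // The first comma-separated token of log4j.rootLogger is the level.
    static String rootLevel(Properties props) {
        return props.getProperty("log4j.rootLogger", "").split(",")[0].trim();
    }

    // The remaining tokens are appender names.
    static List<String> rootAppenders(Properties props) {
        String[] parts = props.getProperty("log4j.rootLogger", "").split(",");
        return Arrays.stream(parts).skip(1).map(String::trim).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Properties props = parse(String.join("\n",
                "log4j.rootLogger=debug,appender1",
                "log4j.appender.appender1=org.apache.log4j.ConsoleAppender",
                "log4j.appender.appender1.layout=org.apache.log4j.TTCCLayout"));
        System.out.println("level: " + rootLevel(props));
        System.out.println("appenders: " + rootAppenders(props));
    }
}
```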

Run it; the result is as follows:

II. On Linux

(1) On Linux, running MapReduce on YARN

Run:

<code class="hljs vhdl">[root@liguodong Documents]# yarn jar  test.jar hdfs://liguodong:8020/hello hdfs://liguodong:8020/output
INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0
………………
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1430648117067_0001
INFO impl.YarnClientImpl: Submitted application application_1430648117067_0001 to ResourceManager at /0.0.0.0
INFO mapreduce.Job: The url to track the job: http://liguodong:8088/proxy/application_1430648117067_0001/
INFO mapreduce.Job: Running job: job_1430648117067_0001
INFO mapreduce.Job: Job job_1430648117067_0001 running in uber mode : false
INFO mapreduce.Job:  map % reduce
INFO mapreduce.Job:  map % reduce
INFO mapreduce.Job:  map % reduce
INFO mapreduce.Job: Job job_1430648117067_0001 completed successfully
INFO mapreduce.Job: Counters:
    File System Counters
        FILE: Number of bytes read=
        FILE: Number of bytes written=
        FILE: Number of read operations=
        FILE: Number of large read operations=
        FILE: Number of write operations=
        HDFS: Number of bytes read=
        HDFS: Number of bytes written=
        HDFS: Number of read operations=
        HDFS: Number of large read operations=
        HDFS: Number of write operations=
    Job Counters
        Launched map tasks=
        Launched reduce tasks=
        Data-local map tasks=
        Total time spent by all maps in occupied slots (ms)=
        Total time spent by all reduces in occupied slots (ms)=
    Map-Reduce Framework
        Map input records=
        Map output records=
        Map output bytes=
        Map output materialized bytes=
        Input split bytes=
        Combine input records=
        Combine output records=
        Reduce input groups=
        Reduce shuffle bytes=
        Reduce input records=
        Reduce output records=
        Spilled Records=
        Shuffled Maps =
        Failed Shuffles=
        Merged Map outputs=
        GC time elapsed (ms)=
        CPU time spent (ms)=
        Physical memory (bytes) snapshot=211070976
        Virtual memory (bytes) snapshot=777789440
        Total committed heap usage (bytes)=130879488
    Shuffle Errors
        BAD_ID=
        CONNECTION=
        IO_ERROR=
        WRONG_LENGTH=
        WRONG_MAP=
        WRONG_REDUCE=
    File Input Format Counters
        Bytes Read=
    File Output Format Counters
        Bytes Written=
</code>
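In the log above, the job ID and the application ID share the same numeric suffix, and the tracking URL is built from the application ID. The sketch below reproduces that relationship as it appears in this log. It is an illustration of the naming pattern seen here, not a Hadoop API, and the class and method names are made up.

```java
final class JobIdSketch {
    private static final String JOB_PREFIX = "job_";

    // job_<timestamp>_<sequence> -> application_<timestamp>_<sequence>
    static String toApplicationId(String jobId) {
        if (!jobId.startsWith(JOB_PREFIX)) {
            throw new IllegalArgumentException("not a job ID: " + jobId);
        }
        return "application_" + jobId.substring(JOB_PREFIX.length());
    }

    // Builds the proxy URL in the form printed by "The url to track the job".
    static String trackingUrl(String resourceManagerWebAddress, String jobId) {
        return "http://" + resourceManagerWebAddress + "/proxy/" + toApplicationId(jobId) + "/";
    }

    public static void main(String[] args) {
        // Host, port, and job ID are taken from the log above.
        System.out.println(trackingUrl("liguodong:8088", "job_1430648117067_0001"));
    }
}
```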

Check the result:

<code class="hljs applescript">[root@liguodong Documents]# hdfs dfs -ls  /
Found items
-rw-r--r-- root supergroup /hello
drwxr-xr-x   - root supergroup /output
drwx------   - root supergroup /tmp
[root@liguodong Documents]# hdfs dfs -ls  /output
Found items
-rw-r--r-- root supergroup /output/_SUCCESS
-rw-r--r-- root supergroup /output/part-r-
[root@liguodong Documents]# hdfs dfs -text  /output/pa*
hadoop
hello
me
you
</code>

Problem encountered

<code class="hljs coffeescript">File /output/………  could only be replicated to 0 nodes instead of minReplication (=1).
There are datanode(s) running and no node(s) are excluded in this operation.</code>

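I tried many of the fixes suggested online without success. Read literally, the message says the file was replicated to 0 nodes instead of the required minimum of 1.

I first set dfs.replication.min (dfs.namenode.replication.min in Hadoop 2.x naming) to 0, but on the next run it turned out that the value must be greater than 0, so I set it back to 1.

I then listed several paths in dfs.datanode.data.dir, thinking of them as multiple copies on one machine, and after that the job succeeded. Strictly speaking, these paths are extra storage directories for the same DataNode rather than extra replicas.

The setting is as follows; add it to hdfs-site.xml.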

<code class="hljs avrasm"><property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/dn,file://${hadoop.tmp.dir}/dfs/dn1,file://${hadoop.tmp.dir}/dfs/dn2</value>
</property>
</code>
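The value is a comma-separated list in which ${hadoop.tmp.dir} is replaced by the value of that property. The sketch below imitates the substitution in plain Java to show which directories the setting resolves to. It is not Hadoop's own Configuration class, and the example value /tmp/hadoop-root for hadoop.tmp.dir is an assumption (the original does not show it).

```java
import java.util.ArrayList;
import java.util.List;

final class DataDirSketch {
    // Splits the comma-separated value and substitutes ${hadoop.tmp.dir} in each entry.
    static List<String> expand(String value, String hadoopTmpDir) {
        List<String> dirs = new ArrayList<>();
        for (String item : value.split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed.replace("${hadoop.tmp.dir}", hadoopTmpDir));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        String value = "file://${hadoop.tmp.dir}/dfs/dn,"
                + "file://${hadoop.tmp.dir}/dfs/dn1,"
                + "file://${hadoop.tmp.dir}/dfs/dn2";
        // Assumed hadoop.tmp.dir; check core-site.xml for the real value.
        for (String dir : expand(value, "/tmp/hadoop-root")) {
            System.out.println(dir);
        }
    }
}
```

All three resulting directories sit under the same hadoop.tmp.dir, so with this setting they would normally end up on the same disk.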

(2) On Linux, running MapReduce in local mode

Add the following configuration to mapred-site.xml.

<code class="hljs xml"><configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
</configuration></code>

In this mode the ResourceManager and NodeManager do not need to be started.

Run:

<code class="hljs ruby">[root@liguodong Documents]# hadoop jar  test.jar hdfs://liguodong:8020/hello hdfs://liguodong:8020/output</code>

III. MapReduce run modes

The run mode is selected in mapred-site.xml.

1) Local mode (the default)

<code class="hljs xml"><configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
</configuration></code>

2) Running on YARN

<code class="hljs xml"><configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration></code>
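To see which mode a given mapred-site.xml selects, the file can be read with the JDK's XML parser. The sketch below is only an illustration: it parses the XML text directly rather than going through Hadoop's configuration loading, and it falls back to local when the property is absent, matching the default noted above. The class name FrameworkNameReader is made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class FrameworkNameReader {
    // Returns the value of mapreduce.framework.name, or "local" when it is not set.
    static String frameworkName(String mapredSiteXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(mapredSiteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if ("mapreduce.framework.name".equals(childText(prop, "name"))) {
                    String value = childText(prop, "value");
                    return (value == null || value.isEmpty()) ? "local" : value;
                }
            }
            return "local";
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    // Trimmed text of the first <tag> element under parent, or null if there is none.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>mapreduce.framework.name</name><value>yarn</value>"
                + "</property></configuration>";
        System.out.println(frameworkName(xml));
    }
}
```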

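IV. Uber Mode

Uber mode is an optimization in Hadoop 2.x for small MapReduce jobs; it works by reusing a JVM.

A small job here means a MapReduce job whose input data is smaller than one HDFS block (128 MB).

It is disabled by default.

To enable it, set the following in mapred-site.xml: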

<code class="hljs xml"><property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
</property></code>
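The "small job" test can be sketched as a simple predicate. This is a simplified illustration, not Hadoop's actual decision code: the limits of 9 map tasks and 1 reduce task are the defaults of mapreduce.job.ubertask.maxmaps and mapreduce.job.ubertask.maxreduces, which the original does not mention, and the real check also looks at memory and CPU settings that are left out here. The class name UberEligibility is made up.

```java
final class UberEligibility {
    // Defaults of mapreduce.job.ubertask.maxmaps / maxreduces (not stated in the article).
    static final int MAX_MAPS = 9;
    static final int MAX_REDUCES = 1;

    // Simplified check: uber mode enabled, few tasks, and input no larger than one block.
    static boolean isUberCandidate(boolean enabled, int mapTasks, int reduceTasks,
                                   long inputBytes, long blockSizeBytes) {
        return enabled
                && mapTasks <= MAX_MAPS
                && reduceTasks <= MAX_REDUCES
                && inputBytes <= blockSizeBytes;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 128 MB block, as in the text above
        // A tiny job with one map and one reduce task; the 1 KB input size is made up.
        System.out.println(isUberCandidate(true, 1, 1, 1024, blockSize));
        // The same job with uber mode left at its default (disabled).
        System.out.println(isUberCandidate(false, 1, 1, 1024, blockSize));
    }
}
```

With the flag left disabled, the job log reports "running in uber mode : false", as in the YARN run shown earlier.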
