Getting Started with Hadoop: WordCount

2016-02-22 21:37
Reference: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

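The previous post set up a pseudo-distributed Hadoop 2.7.2 cluster on Ubuntu running in a virtual machine. This post runs a MapReduce job on it that counts how many times each word occurs in a set of input files.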

1. Download the files to a local directory:

hduser@ubuntu:/home/miranda/usr/hadoop$ ls -l /home/miranda/Downloads/
total 3596
-rw-rw-r-- 1 miranda miranda 1428841 Feb 21 23:42 5000-8.txt
-rw-rw-r-- 1 miranda miranda 674570 Feb 21 23:41 pg20417.txt
-rw-rw-r-- 1 miranda miranda 1573151 Feb 21 23:45 pg4300.txt

2. Restart the Hadoop cluster

Stop:

hduser@ubuntu:/home/miranda/usr/hadoop$ bash /home/miranda/usr/hadoop/sbin/stop-all.sh

Start:

hduser@ubuntu:/home/miranda/usr/hadoop$ bash /home/miranda/usr/hadoop/sbin/start-all.sh

Check the running processes:

hduser@ubuntu:/home/miranda/usr/hadoop$ jps
10756 NodeManager
10460 SecondaryNameNode
10118 NameNode
10246 DataNode
5763 GetConf
10624 ResourceManager
11068 Jps

All daemons started normally.
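
Reading the jps listing by eye works, but the same check can be scripted. The following is a minimal sketch, assuming the five daemons shown above are the ones a pseudo-distributed Hadoop 2.x node should run; the names `EXPECTED_DAEMONS` and `missing_daemons` are made up for illustration and are not part of Hadoop.

```python
# Daemons assumed to be required on a pseudo-distributed Hadoop 2.x node
# (taken from the jps listing in this post).
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def missing_daemons(jps_output):
    """Return the expected daemons that do not appear in the jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # A normal jps line is "<pid> <main class name>"; ignore anything else.
        if len(parts) == 2 and parts[0].isdigit():
            running.add(parts[1])
    return EXPECTED_DAEMONS - running


# The jps output shown in this post.
sample = """10756 NodeManager
10460 SecondaryNameNode
10118 NameNode
10246 DataNode
5763 GetConf
10624 ResourceManager
11068 Jps"""

print(sorted(missing_daemons(sample)))  # prints []
```

An empty list means every expected daemon is up. Extra entries such as GetConf and Jps are ignored.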

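3. Copy the local files into HDFS

Create the target directory in HDFS:

hduser@ubuntu:/home/miranda/usr/hadoop$ hadoop fs -mkdir /input

Copy the files. The source is a directory, so it is copied as a whole and ends up at /input/Downloads. The `hadoop dfs` form used here still works but prints a deprecation warning; `hdfs dfs` is the current equivalent.

hduser@ubuntu:/home/miranda/usr/hadoop$ bin/hadoop dfs -copyFromLocal /home/miranda/Downloads /input

List the files on HDFS: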

hduser@ubuntu:/home/miranda/usr/hadoop$ bin/hadoop dfs -ls /input
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 1 items
drwxr-xr-x - hduser supergroup 0 2016-02-22 05:22 /input/Downloads
hduser@ubuntu:/home/miranda/usr/hadoop$ bin/hadoop dfs -ls /input/Downloads
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 3 items
-rw-r--r-- 1 hduser supergroup 1428841 2016-02-22 05:22 /input/Downloads/5000-8.txt
-rw-r--r-- 1 hduser supergroup 674570 2016-02-22 05:22 /input/Downloads/pg20417.txt
-rw-r--r-- 1 hduser supergroup 1573151 2016-02-22 05:22 /input/Downloads/pg4300.txt

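4. Run the MapReduce job

Create the parent directory for the output. The output directory itself (/testdata/output) must not exist before the job starts, because Hadoop refuses to write into an existing output directory:

hduser@ubuntu:/home/miranda/usr/hadoop$ hadoop fs -mkdir -p /testdata

The job uses the wordcount program from the examples jar that ships with Hadoop:

hduser@ubuntu:/home/miranda/usr/hadoop$ bin/hadoop jar /home/miranda/usr/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input/Downloads /testdata/output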

The job takes a while to run.
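
Conceptually, the job maps each line to (word, 1) pairs, groups the pairs by word, and sums each group. The sketch below imitates that flow in plain Python on a single machine. It is an illustration only, not the code inside the examples jar, and the function names are invented here. It assumes tokens are split on whitespace alone, which is roughly what the bundled example does, so punctuation stays attached to words.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (token, 1) pair for every whitespace-separated token."""
    for token in line.split():
        yield token, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Emit keys in sorted order, similar to how reducer output is ordered by key.
    return dict(reduce_phase(key, grouped[key]) for key in sorted(grouped))


print(word_count(["to be or", "not to be"]))
# prints {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

On a real cluster the map and reduce steps run as separate tasks and the grouping happens in the framework's shuffle stage; the three functions above only mirror that division of work.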

The job finished successfully and wrote its output:

hduser@ubuntu:/home/miranda/usr/hadoop$ bin/hadoop dfs -ls /testdata/output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 2 items
-rw-r--r-- 1 hduser supergroup 0 2016-02-22 05:29 /testdata/output/_SUCCESS
-rw-r--r-- 1 hduser supergroup 883509 2016-02-22 05:29 /testdata/output/part-r-00000
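
The empty _SUCCESS file is the marker written when a job completes successfully, and part-r-00000 holds the reducer's output. If this check needs to be automated, the listing can be parsed as in the sketch below. It assumes the eight-column layout shown above and paths without spaces; `parse_listing` and `job_succeeded` are hypothetical helper names.

```python
def parse_listing(ls_output):
    """Map each listed path to its size in bytes."""
    sizes = {}
    for line in ls_output.splitlines():
        fields = line.split()
        # Entry rows have 8 fields:
        # permissions, replication, owner, group, size, date, time, path.
        # Warning lines and the "Found N items" header are skipped.
        if len(fields) == 8 and fields[0][0] in "-d":
            sizes[fields[7]] = int(fields[4])
    return sizes


def job_succeeded(ls_output):
    """True if the listing contains the _SUCCESS marker file."""
    return any(path.endswith("/_SUCCESS") for path in parse_listing(ls_output))


# The listing shown in this post.
listing = """DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 2 items
-rw-r--r-- 1 hduser supergroup 0 2016-02-22 05:29 /testdata/output/_SUCCESS
-rw-r--r-- 1 hduser supergroup 883509 2016-02-22 05:29 /testdata/output/part-r-00000"""

print(job_succeeded(listing))  # prints True
print(parse_listing(listing)["/testdata/output/part-r-00000"])  # prints 883509
```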

5. View the output on HDFS

hduser@ubuntu:/home/miranda/usr/hadoop$ bin/hadoop dfs -cat /testdata/output/part-r-00000

An excerpt of the output:
zenith 5
zephyrs, 1
zerfielen. 1
zero 3
zero; 1
zest 1
zest. 2
zigzag 2
zigzagging 1
zigzags, 1
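
The excerpt shows that "zero" and "zero;" are counted as different words, as are "zest" and "zest.", because tokens are split on whitespace and keep their punctuation. One possible way to merge such variants after the fact is sketched below. It assumes each output line is a word followed by its count, and that stripping leading and trailing ASCII punctuation and lowercasing is an acceptable normalization; `merge_normalized` is a name chosen for this example.

```python
import string
from collections import Counter


def merge_normalized(lines):
    """Sum counts of tokens that differ only by case or surrounding punctuation."""
    totals = Counter()
    for line in lines:
        # Split off the trailing count; the word is everything before it.
        parts = line.rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip lines that are not "<word> <count>"
        word = parts[0].strip(string.punctuation).lower()
        if word:
            totals[word] += int(parts[1])
    return totals


# The excerpt shown in this post (not the full output file).
excerpt = [
    "zenith 5", "zephyrs, 1", "zerfielen. 1", "zero 3", "zero; 1",
    "zest 1", "zest. 2", "zigzag 2", "zigzagging 1", "zigzags, 1",
]

totals = merge_normalized(excerpt)
print(totals["zero"], totals["zest"])  # prints 4 3
```

These totals cover only the ten lines of the excerpt; running the same function over the whole part-r-00000 file would give the merged counts for the full data set.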

The job ran successfully!