
Running the WordCount example end to end (Hadoop-1.0.0)

2012-02-27 16:25
The previous article covered setting up a simple distributed Hadoop HDFS deployment. The next step is to run WordCount against it to get a feel for how the system actually works.

cd into the directory containing WordCount.java and create a new directory there named WordCount. Then run:

]$javac -classpath ~/hadoop-1.0.0/hadoop-core-1.0.0.jar -d WordCount WordCount.java

After compilation, three class files are generated under the WordCount directory (the WordCount class plus its inner mapper and reducer classes). Package them into a jar:

]$jar -cvf ~/wordcount.jar -C WordCount/ .

You should now see wordcount.jar in your home directory. Next, prepare the input: write two test files and copy them into HDFS. Do not create an output directory yourself — WordCount creates it on its own, and the job will fail if it already exists. Assuming you created two files, file01 and file02:
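For concreteness, the two test files can be seeded with the classic WordCount sample text — the exact contents below are an illustrative choice, not something the original setup requires:

```shell
# Write two small test files into the home directory. This sample text is
# the canonical Hadoop WordCount example input, used here for illustration.
echo "Hello World Bye World" > ~/file01
echo "Hello Hadoop Goodbye Hadoop" > ~/file02
```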

]$~/hadoop-1.0.0/bin/hadoop fs -mkdir /user/fangpei.pt/input

]$~/hadoop-1.0.0/bin/hadoop fs -copyFromLocal ~/file01 /user/fangpei.pt/input/
]$~/hadoop-1.0.0/bin/hadoop fs -copyFromLocal ~/file02 /user/fangpei.pt/input/


With the input in place, run the job (the input and output paths must match the input directory created above):
]$~/hadoop-1.0.0/bin/hadoop jar ~/wordcount.jar org.apache.hadoop.examples.WordCount /user/fangpei.pt/input /user/fangpei.pt/output
The run produces output like this:
12/02/27 14:42:30 INFO input.FileInputFormat: Total input paths to process : 2
12/02/27 14:42:30 INFO mapred.JobClient: Running job: job_201202131150_0002
12/02/27 14:42:31 INFO mapred.JobClient: map 0% reduce 0%
12/02/27 14:42:46 INFO mapred.JobClient: map 100% reduce 0%
12/02/27 14:42:59 INFO mapred.JobClient: map 100% reduce 100%
12/02/27 14:43:03 INFO mapred.JobClient: Job complete: job_201202131150_0002
12/02/27 14:43:03 INFO mapred.JobClient: Counters: 29
12/02/27 14:43:03 INFO mapred.JobClient: Job Counters
12/02/27 14:43:03 INFO mapred.JobClient: Launched reduce tasks=1
12/02/27 14:43:03 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=19155
12/02/27 14:43:03 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/02/27 14:43:03 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/02/27 14:43:03 INFO mapred.JobClient: Launched map tasks=2
12/02/27 14:43:03 INFO mapred.JobClient: Data-local map tasks=2
12/02/27 14:43:03 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10685
12/02/27 14:43:03 INFO mapred.JobClient: File Output Format Counters
12/02/27 14:43:03 INFO mapred.JobClient: Bytes Written=41
12/02/27 14:43:03 INFO mapred.JobClient: FileSystemCounters
12/02/27 14:43:03 INFO mapred.JobClient: FILE_BYTES_READ=79
12/02/27 14:43:03 INFO mapred.JobClient: HDFS_BYTES_READ=320
12/02/27 14:43:03 INFO mapred.JobClient: FILE_BYTES_WRITTEN=65371
12/02/27 14:43:03 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=41
12/02/27 14:43:03 INFO mapred.JobClient: File Input Format Counters
12/02/27 14:43:03 INFO mapred.JobClient: Bytes Read=50
12/02/27 14:43:03 INFO mapred.JobClient: Map-Reduce Framework
12/02/27 14:43:03 INFO mapred.JobClient: Map output materialized bytes=85
12/02/27 14:43:03 INFO mapred.JobClient: Map input records=2
12/02/27 14:43:03 INFO mapred.JobClient: Reduce shuffle bytes=85
12/02/27 14:43:03 INFO mapred.JobClient: Spilled Records=12
12/02/27 14:43:03 INFO mapred.JobClient: Map output bytes=82
12/02/27 14:43:03 INFO mapred.JobClient: CPU time spent (ms)=2220
12/02/27 14:43:03 INFO mapred.JobClient: Total committed heap usage (bytes)=348389376
12/02/27 14:43:03 INFO mapred.JobClient: Combine input records=8
12/02/27 14:43:03 INFO mapred.JobClient: SPLIT_RAW_BYTES=270
12/02/27 14:43:03 INFO mapred.JobClient: Reduce input records=6
12/02/27 14:43:03 INFO mapred.JobClient: Reduce input groups=5
12/02/27 14:43:03 INFO mapred.JobClient: Combine output records=6
12/02/27 14:43:03 INFO mapred.JobClient: Physical memory (bytes) snapshot=383037440
12/02/27 14:43:03 INFO mapred.JobClient: Reduce output records=5
12/02/27 14:43:03 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1138081792
12/02/27 14:43:03 INFO mapred.JobClient: Map output records=8
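The counters in the log are consistent with the classic sample input (8 map output records, 5 reduce output records). Assuming file01 and file02 hold the canonical sample text, what the job computes can be reproduced locally with a plain shell pipeline — split into words (map), sort (shuffle), count duplicates (reduce):

```shell
# Recreate the assumed sample inputs locally.
printf 'Hello World Bye World\n' > file01
printf 'Hello Hadoop Goodbye Hadoop\n' > file02

# map: emit one word per line; shuffle: sort; reduce: count per word.
tr -s ' ' '\n' < file01 > words.txt
tr -s ' ' '\n' < file02 >> words.txt
sort words.txt | uniq -c | awk '{print $2 "\t" $1}'
# Prints 5 lines: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2 --
# matching Reduce output records=5 and Map output records=8 in the log.
```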

Check HDFS afterwards and you will see that the output directory has indeed been generated, with the word counts inside it.
Questions are welcome in the comments~