A Full Run of the WordCount Example Program (Hadoop 1.0.0)
2012-02-27 16:25
The previous post covered setting up a simple distributed Hadoop HDFS cluster. Now let's run WordCount on it to see how the system actually behaves.
cd into the directory containing WordCount.java and create a new directory there named WordCount. Then run:
]$javac -classpath ~/hadoop-1.0.0/hadoop-core-1.0.0.jar:~/hadoop-1.0.0/lib/commons-cli-1.2.jar -d WordCount WordCount.java
Compilation produces three class files under the WordCount directory (the main class plus its inner mapper and reducer classes). Package them into a jar:
]$jar -cvf ~/wordcount.jar -C WordCount/ .
You should now see wordcount.jar in your home directory. Next, prepare the input. Note that WordCount creates the output directory itself (and the job fails if it already exists), so only an input directory is needed. Write two test files (say file01 and file02), then copy them into HDFS:
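The post never shows the contents of file01 and file02. Assuming the two classic sample files from the Hadoop WordCount tutorial (an assumption, not taken from this post), you can create them and preview the counts WordCount should produce using plain shell:

```shell
# Hypothetical test inputs -- the classic Hadoop tutorial sample files.
# The original post does not show the file contents; these are assumptions.
printf 'Hello World Bye World\n'        > ~/file01
printf 'Hello Hadoop Goodbye Hadoop\n'  > ~/file02

# Local preview of the expected per-word counts:
# Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2
cat ~/file01 ~/file02 | tr ' ' '\n' | sort | uniq -c
```

This is the same map-then-aggregate idea the job performs, just without the distribution.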
]$~/hadoop-1.0.0/bin/hadoop fs -mkdir /user/fangpei.pt/input
]$~/hadoop-1.0.0/bin/hadoop fs -copyFromLocal ~/file01 /user/fangpei.pt/input/
]$~/hadoop-1.0.0/bin/hadoop fs -copyFromLocal ~/file02 /user/fangpei.pt/input/
The input is now ready. Run the job:
]$~/hadoop-1.0.0/bin/hadoop jar ~/wordcount.jar org.apache.hadoop.examples.WordCount /user/fangpei.pt/input /user/fangpei.pt/output
The run produces output like this:
12/02/27 14:42:30 INFO input.FileInputFormat: Total input paths to process : 2
12/02/27 14:42:30 INFO mapred.JobClient: Running job: job_201202131150_0002
12/02/27 14:42:31 INFO mapred.JobClient: map 0% reduce 0%
12/02/27 14:42:46 INFO mapred.JobClient: map 100% reduce 0%
12/02/27 14:42:59 INFO mapred.JobClient: map 100% reduce 100%
12/02/27 14:43:03 INFO mapred.JobClient: Job complete: job_201202131150_0002
12/02/27 14:43:03 INFO mapred.JobClient: Counters: 29
12/02/27 14:43:03 INFO mapred.JobClient: Job Counters
12/02/27 14:43:03 INFO mapred.JobClient: Launched reduce tasks=1
12/02/27 14:43:03 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=19155
12/02/27 14:43:03 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/02/27 14:43:03 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/02/27 14:43:03 INFO mapred.JobClient: Launched map tasks=2
12/02/27 14:43:03 INFO mapred.JobClient: Data-local map tasks=2
12/02/27 14:43:03 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10685
12/02/27 14:43:03 INFO mapred.JobClient: File Output Format Counters
12/02/27 14:43:03 INFO mapred.JobClient: Bytes Written=41
12/02/27 14:43:03 INFO mapred.JobClient: FileSystemCounters
12/02/27 14:43:03 INFO mapred.JobClient: FILE_BYTES_READ=79
12/02/27 14:43:03 INFO mapred.JobClient: HDFS_BYTES_READ=320
12/02/27 14:43:03 INFO mapred.JobClient: FILE_BYTES_WRITTEN=65371
12/02/27 14:43:03 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=41
12/02/27 14:43:03 INFO mapred.JobClient: File Input Format Counters
12/02/27 14:43:03 INFO mapred.JobClient: Bytes Read=50
12/02/27 14:43:03 INFO mapred.JobClient: Map-Reduce Framework
12/02/27 14:43:03 INFO mapred.JobClient: Map output materialized bytes=85
12/02/27 14:43:03 INFO mapred.JobClient: Map input records=2
12/02/27 14:43:03 INFO mapred.JobClient: Reduce shuffle bytes=85
12/02/27 14:43:03 INFO mapred.JobClient: Spilled Records=12
12/02/27 14:43:03 INFO mapred.JobClient: Map output bytes=82
12/02/27 14:43:03 INFO mapred.JobClient: CPU time spent (ms)=2220
12/02/27 14:43:03 INFO mapred.JobClient: Total committed heap usage (bytes)=348389376
12/02/27 14:43:03 INFO mapred.JobClient: Combine input records=8
12/02/27 14:43:03 INFO mapred.JobClient: SPLIT_RAW_BYTES=270
12/02/27 14:43:03 INFO mapred.JobClient: Reduce input records=6
12/02/27 14:43:03 INFO mapred.JobClient: Reduce input groups=5
12/02/27 14:43:03 INFO mapred.JobClient: Combine output records=6
12/02/27 14:43:03 INFO mapred.JobClient: Physical memory (bytes) snapshot=383037440
12/02/27 14:43:03 INFO mapred.JobClient: Reduce output records=5
12/02/27 14:43:03 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1138081792
12/02/27 14:43:03 INFO mapred.JobClient: Map output records=8
Checking HDFS afterwards shows that the output directory has been created, containing the word counts.
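The framework counters in the log are internally consistent with the classic tutorial inputs. Assuming those two sample files (again, an assumption; the post does not show them), a short shell sketch reproduces the arithmetic behind the key counters:

```shell
# Hypothetical inputs -- the classic tutorial files, not shown in the post.
printf 'Hello World Bye World\n'        > /tmp/wc_file01
printf 'Hello Hadoop Goodbye Hadoop\n'  > /tmp/wc_file02

# Bytes Read=50: 22 bytes + 28 bytes of input.
cat /tmp/wc_file01 /tmp/wc_file02 | wc -c

# Map input records=2: one line per file.
cat /tmp/wc_file01 /tmp/wc_file02 | wc -l

# Map output records=8: one (word, 1) pair per word.
cat /tmp/wc_file01 /tmp/wc_file02 | tr ' ' '\n' | wc -l

# Combine output records=6: each map combines its own 4 words down to
# 3 distinct words (the combiner runs per map task), 3 + 3 = 6.
for f in /tmp/wc_file01 /tmp/wc_file02; do
    tr ' ' '\n' < "$f" | sort -u
done | wc -l

# Reduce output records=5 (= Reduce input groups): distinct words overall.
cat /tmp/wc_file01 /tmp/wc_file02 | tr ' ' '\n' | sort -u | wc -l
```

Note how the combiner explains the gap between Map output records=8 and Reduce input records=6: Hello appears in both files, so it survives each per-map combine and is only merged down to one record at the reducer.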
Questions are welcome in the comments!