What is the WordCount example in Hadoop really about?
2014-05-30 10:29
1. At home and abroad, whenever the introductory Hadoop program comes up, the first one is always WordCount: given a set of files, count how many times each word appears in them.
2. The program these tutorials usually present looks like this:
Map:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused across calls so a new object is not allocated for every token.
    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // The key is the byte offset of the line; only the line text matters here.
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            // Emit (word, 1) for every occurrence; the framework groups by word.
            output.collect(word, one);
        }
    }
}
Reduce:
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // All the 1s emitted for this word arrive together; summing them gives the count.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

3. But none of these tutorials explain why the Map phase emits an IntWritable whose value is 1. I googled for a while and found the two diagrams below, which explain this question quite well.
[Figure: The MapReduce workflow]
[Figure: The word count flow]
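In short, the diagrams show that each mapper turns every word occurrence into a (word, 1) pair, the shuffle phase groups all pairs with the same word together, and the reducer simply sums the 1s. The following standalone sketch simulates that flow in plain Java, without Hadoop; the class name WordCountFlowDemo and the sample input are only illustrative:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountFlowDemo {
    public static void main(String[] args) {
        String[] lines = {"Hello World", "Hello Hadoop"};

        // Map phase: emit a (word, 1) pair for every token, like WordCountMapper does.
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line.toLowerCase());
            while (itr.hasMoreTokens()) {
                emitted.add(Map.entry(itr.nextToken(), 1));
            }
        }
        // emitted = [(hello,1), (world,1), (hello,1), (hadoop,1)]

        // Shuffle phase: group values by key (Hadoop does this between map and reduce).
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        // grouped = {hello=[1, 1], world=[1], hadoop=[1]}

        // Reduce phase: sum the 1s for each word, like WordCountReducer does.
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = entry.getValue().stream().mapToInt(Integer::intValue).sum();
            System.out.println(entry.getKey() + "\t" + sum);
        }
        // Prints: hello 2, world 1, hadoop 1 (order may vary)
    }
}

Seen this way, emitting the constant 1 makes sense: the mapper does not know the total count for a word, so it just marks "I saw this word once" and leaves the summation to the reducer after the shuffle has brought all occurrences of the same word together.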
References:
http://blog.gopivotal.com/pivotal/products/hadoop-101-programming-mapreduce-with-native-libraries-hive-pig-and-cascading
http://kickstarthadoop.blogspot.de/2011/04/word-count-hadoop-map-reduce-example.html
https://developer.yahoo.com/hadoop/tutorial/
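Finally, to actually run the two classes above you also need a small driver that configures and submits the job. The snippet below is only a minimal sketch assuming the same old org.apache.hadoop.mapred API; the class name WordCount and the argument handling are illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // Output types for the job; here they also describe the map output.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);
        // Because summing is associative, the reducer can also act as a combiner
        // to pre-aggregate the (word, 1) pairs on the map side.
        conf.setCombinerClass(WordCountReducer.class);
        conf.setReducerClass(WordCountReducer.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

Packaged into a jar, it would typically be launched with something like: hadoop jar wordcount.jar WordCount /input /output.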