
What Is Actually Going On in Hadoop's Word Count Example?

2014-05-30 10:29
1. At home and abroad alike, the first introductory Hadoop program is invariably WordCount: given a pile of files, count how many times each word appears in them.
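As a point of reference, on a single machine this computation is just a hash map. A minimal sketch (the class name `LocalWordCount` is mine, not part of the article) could look like:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Single-machine word count: the same result WordCount computes on Hadoop.
public class LocalWordCount {

    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text.toLowerCase());
        while (itr.hasMoreTokens()) {
            // merge(word, 1, Integer::sum) inserts 1 or adds 1 to the count.
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("Hello Hadoop hello world"));
        // {world=1, hello=2, hadoop=1}
    }
}
```

MapReduce exists to do the same thing when the files no longer fit on one machine.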

2. The program they usually present looks like this:

Map:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused across calls to avoid allocating a new object per record.
    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {

        // key is the byte offset of the line; value is the line itself.
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            // Emit a (word, 1) pair; the framework groups these by word.
            output.collect(word, one);
        }
    }
}


Reduce:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {

        // Sum the 1s the mapper emitted for this word.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }

        output.collect(key, new IntWritable(sum));
    }
}
3. But they never explain why the Map phase emits an IntWritable with the value 1. After googling for a while, I found the two diagrams below, which explain this question well.
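The point the diagrams make is that between map and reduce, the framework shuffles and sorts all the (word, 1) pairs so that every pair with the same word lands at the same reducer, which then only has to sum the 1s. That shuffle step can be simulated in plain Java, with no Hadoop types (a simplified sketch; all names here are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulates map -> shuffle/sort -> reduce for word count, without Hadoop.
public class ShuffleDemo {

    // Map phase: emit a (word, 1) pair for every token, as the mapper does.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            pairs.add(Map.entry(token, 1));
        }
        return pairs;
    }

    // Shuffle phase: group values by key, sorted - this is what the
    // framework does between map and reduce.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the list of 1s collected for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(reduce(shuffle(map("to be or not to be"))));
        // {be=2, not=1, or=1, to=2}
    }
}
```

Emitting a 1 per occurrence works precisely because the grouping happens for free in the shuffle: the reducer for "be" receives [1, 1] and simply adds them up.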

(figure: The MapReduce workflow)

(figure: The word count flow)

refs:

http://blog.gopivotal.com/pivotal/products/hadoop-101-programming-mapreduce-with-native-libraries-hive-pig-and-cascading

http://kickstarthadoop.blogspot.de/2011/04/word-count-hadoop-map-reduce-example.html

https://developer.yahoo.com/hadoop/tutorial/