
Word Count: the Map-Reduce Process

2016-04-28 10:17

Raw data in HDFS:

hello a

hello b

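Map phase:

Input: key-value pairs. The key is the offset at which the line starts in the file (each character counts as one offset, and the newline counts too); the value is the text of the line.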

<0,"hello a">
<8,"hello b">
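The offsets above can be checked with a small sketch. It assumes the file content is the two sample lines, each ending in a single "\n", and that offsets are counted in UTF-8 bytes (which equals the character count for this ASCII data). The class and method names are made up for illustration and are not part of Hadoop.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes the starting offset of every line,
// counting one extra position for the newline that ends each line.
final class LineOffsets {
    static List<Long> offsets(String fileContent) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            result.add(offset);
            // Line length in bytes, plus 1 for the trailing newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // "hello a" is 7 characters, so the second line starts at 7 + 1 = 8.
        System.out.println(offsets("hello a\nhello b\n")); // prints [0, 8]
    }
}
```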

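Output: written to context, which stores the emitted data (pseudocode below)

map(key, value, context) {
    String line = value;                  // e.g. "hello a"
    String[] words = line.split("\\s+");  // split on spaces or tabs; the sample lines are space-separated
    for (String word : words) {
        // call 1 (key 0): word is hello, then a
        // call 2 (key 8): word is hello, then b
        context.write(word, 1);           // word is the current word, 1 means one occurrence
    }
}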

The map output is:

<hello,1>
<a,1>
<hello,1>
<b,1>
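To make the map step concrete, here is a minimal plain-Java sketch that reproduces the four pairs above without a Hadoop cluster. It is only a simulation: the returned list stands in for context, and MapPhaseSketch and its map method are hypothetical names rather than Hadoop's Mapper API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simulation of the map phase: one call per input line,
// emitting a <word,1> pair for every word on that line.
final class MapPhaseSketch {
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        // The offset key is accepted but unused, as in the pseudocode.
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1)); // plays the role of context.write(word, 1)
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        all.addAll(map(0, "hello a"));
        all.addAll(map(8, "hello b"));
        System.out.println(all); // prints [hello=1, a=1, hello=1, b=1]
    }
}
```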

Reduce phase (the map output is first grouped by key and sorted):

Input:

<a,1>
<b,1>
<hello,{1,1}>
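The grouping and sorting step can be imitated in plain Java with a TreeMap, which keeps its keys in sorted order. This is only a sketch of the idea under that assumption, not how Hadoop implements its shuffle, and ShuffleSketch is a made-up name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Simulation of the grouping step: collect all values that share a key
// and keep the keys in sorted order.
final class ShuffleSketch {
    static SortedMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            // Create the value list on first sight of a key, then append.
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        mapOutput.add(new SimpleEntry<>("hello", 1));
        mapOutput.add(new SimpleEntry<>("a", 1));
        mapOutput.add(new SimpleEntry<>("hello", 1));
        mapOutput.add(new SimpleEntry<>("b", 1));
        System.out.println(group(mapOutput)); // prints {a=[1], b=[1], hello=[1, 1]}
    }
}
```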

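Output (pseudocode):

reduce(key, values, context) {
    int count = 0;
    String word = key;           // e.g. "hello"
    for (int i : values) {       // values holds every 1 emitted for this word, e.g. {1,1}
        count += i;
    }
    context.write(word, count);  // e.g. <hello,2>
}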
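The reduce step can be sketched the same way. The code below assumes the grouped input shown earlier and sums each list of ones; the final counts follow directly from that input. ReducePhaseSketch and reduceAll are hypothetical names, and the returned map stands in for context.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Simulation of the reduce phase: one call per key, summing that key's values.
final class ReducePhaseSketch {
    static int reduce(String key, Iterable<Integer> values) {
        // The key is unused here; it is kept to mirror the pseudocode signature.
        int count = 0;
        for (int v : values) {
            count += v;
        }
        return count;
    }

    // Applies reduce to every group; the result plays the role of context.
    static SortedMap<String, Integer> reduceAll(SortedMap<String, List<Integer>> grouped) {
        SortedMap<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            totals.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("a", Arrays.asList(1));
        grouped.put("b", Arrays.asList(1));
        grouped.put("hello", Arrays.asList(1, 1));
        System.out.println(reduceAll(grouped)); // prints {a=1, b=1, hello=2}
    }
}
```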
