您的位置：首页 > 运维架构

mapreduce计算平均值

2017-09-17 21:10 302 查看

当我们有每一位同学的每一科成绩时，我们计算他们的平均成绩，用传统的方法比较麻烦，如果我们用hadoop中MapReduce组件的话就比较简单了。

测试数据如下：

从上面的数据可以看到，计算每一位同学的平均成绩，在map阶段，我们可以用同学的姓名作为key，成绩作为value；在reduce阶段，key值相同的value值相加计算出总成绩，并且计算出科目的数量，然后用总成绩来除以科目数量就可以得出每一位同学的平均成绩了。

代码如下：

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Socre {
public static class Map extends Mapper<Object,Text,Text,IntWritable>{
public void map(Object key,Text value,Context context){
//把读入的每一行转换为String
String line=value.toString();
//以“\n”切分数据
StringTokenizer tokenizer=new StringTokenizer(line,"\n");
while(tokenizer.hasMoreElements()){
//以空格切分数据
StringTokenizer tokenizerLine=new StringTokenizer(tokenizer.nextToken());
//获取姓名
String nameString=tokenizerLine.nextToken();
//获取成绩
String scoreString=tokenizerLine.nextToken();
int scoreInt=Integer.parseInt(scoreString);
try {
//key:姓名，value:成绩
context.write(new Text(nameString),new IntWritable(scoreInt));
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}

public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable>{
public void reduce(Text key,Iterable<IntWritable> values,Context context){
//计算给同学的总成绩
int sum=0;
//获取科目数目
int count=0;
Iterator<IntWritable> iterable=values.iterator();
while(iterable.hasNext()){
sum+=iterable.next().get();
count++;
}
int avg=(int)sum/count;
try {
context.write(key,new IntWritable(avg));
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}

public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException{
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(Socre.class);
job.setJobName("avgscore1");
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/avgscoreinput"));
FileOutputFormat.setOutputPath(job, new Path("/avgscoreoutput"));
job.waitForCompletion(true);

}

}

运行结果：

程序还是有没有考虑到的情况，比如成绩时小数的话，应该用double或float来定义变量，但是该程序只是提供一个算法思想。

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签： mapreduce hadoop 计算平均值数据

相关文章推荐

新的分享

章节导航