
A Small Hadoop Exercise: Computing Averages with MapReduce

2015-07-05 20:37
Having studied the ideas behind MapReduce and gotten some feel for them, it's time to strike while the iron is hot: a small exercise to consolidate the theory, since practice is the sole criterion for testing truth.

This is a MapReduce example that computes average scores. I'm following an approach a predecessor recommended, which I find very sound:

Work out what the map phase takes as input, what the map step does, and what it outputs; then what the reduce phase takes as input, what it does, and what it outputs. Once those points are clear, the MapReduce program practically writes itself. Here:
Map: records in a fixed format such as "张三  60" — input; split each record and write it to the Context as a key-value pair — processing; typed key-value pairs such as (new Text("张三"), new IntWritable(60)) — output.
Reduce: the map output — input; sum each student's scores, then divide by that student's number of courses — processing; typed key-value pairs of (name, average) — output.
Given the map and reduce steps above, we arrive at the following code:

package com.linxiaosheng.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ScoreAvgTest {

    /**
     * @author hadoop
     * KEYIN:    byte offset of the line within the split (0, 1, 8, 9, ...)
     * VALUEIN:  the text of the line
     * KEYOUT:   the student's name
     * VALUEOUT: a single score
     */
    public static class MapperClass extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable score = new IntWritable();
        private Text name = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String lineText = value.toString();
            System.out.println("Before Map:" + key + "," + lineText);
            // Split the line on whitespace: first token is the name, second the score.
            // Blank lines yield no tokens and are skipped.
            StringTokenizer stringTokenizer = new StringTokenizer(lineText);
            while (stringTokenizer.hasMoreTokens()) {
                name.set(stringTokenizer.nextToken());
                score.set(Integer.parseInt(stringTokenizer.nextToken()));
                System.out.println("Aefore Map:" + name + "," + score);
                context.write(name, score);
            }
        }
    }

    /**
     * @author hadoop
     * KEYIN:    name emitted by the mapper
     * VALUEIN:  one score
     * KEYOUT:   name
     * VALUEOUT: average score
     */
    public static class ReducerClass extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text name, Iterable<IntWritable> scores, Context context)
                throws IOException, InterruptedException {
            StringBuffer sb = new StringBuffer();
            int sum = 0;
            int num = 0;
            for (IntWritable score : scores) {
                int s = score.get();
                sum += s;
                num++;
                sb.append(s).append(",");
            }
            int avg = sum / num; // integer division: the fractional part is truncated
            System.out.println("Bfter Reducer:" + name + "," + sb.toString());
            System.out.println("After Reducer:" + name + "," + avg);
            result.set(avg);
            context.write(name, result);
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: ScoreAvgTest <in1> <in2> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "ScoreAvgTest");

        job.setJarByClass(ScoreAvgTest.class);
        job.setMapperClass(MapperClass.class);
        job.setCombinerClass(ReducerClass.class);
        job.setReducerClass(ReducerClass.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
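One subtlety in the reducer above: avg = sum / num is integer division, so the fractional part of every average is silently truncated (which is also why an IntWritable suffices as the output value). A plain-Java sketch, using student A's eight scores from score1.txt, shows the difference against a floating-point average:

```java
public class AvgTruncationDemo {
    // Integer average, exactly as ReducerClass computes it
    public static int intAvg(int[] scores) {
        int sum = 0;
        for (int s : scores) sum += s;
        return sum / scores.length; // integer division truncates
    }

    // Floating-point variant, as one would emit via a FloatWritable
    public static double doubleAvg(int[] scores) {
        int sum = 0;
        for (int s : scores) sum += s;
        return (double) sum / scores.length;
    }

    public static void main(String[] args) {
        int[] a = {55, 45, 51, 85, 35, 65, 85, 50}; // student A, score1.txt
        System.out.println(intAvg(a));    // 58
        System.out.println(doubleAvg(a)); // 58.875
    }
}
```

If the exact value matters, change the reducer's VALUEOUT type to FloatWritable and register it with job.setOutputValueClass(FloatWritable.class) accordingly.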


Data set: I created the data by hand, mainly because I wanted to watch MapReduce run, so there are just two files (and naturally no attempt to make the scores normally distributed...).
The data covers 11 students, A through K, with 16 course scores each (8 per file). The blank line after every record is part of the files; the mapper simply skips empty lines:
score1.txt:

A   55

B   65

C   44

D   87

E   66

F   90

G   70

H   59

I   61

J   58

K   40

A   45

B   62

C   64

D   77

E   36

F   50

G   80

H   69

I   71

J   70

K   49

A   51

B   64

C   74

D   37

E   76

F   80

G   50

H   51

I   81

J   68

K   80

A   85

B   55

C   49

D   67

E   69

F   50

G   80

H   79

I   81

J   68

K   80

A   35

B   55

C   40

D   47

E   60

F   72

G   76

H   79

I   68

J   78

K   50

A   65

B   45

C   74

D   57

E   56

F   50

G   60

H   59

I   61

J   58

K   60

A   85

B   45

C   74

D   67

E   86

F   70

G   50

H   79

I   81

J   78

K   60

A   50

B   69

C   40

D   89

E   69

F   95

G   75

H   59

I   60

J   59

K   45


score2.txt:

A   65

B   75

C   64

D   67

E   86

F   70

G   90

H   79

I   81

J   78

K   60

A   65

B   82

C   84

D   97

E   66

F   70

G   80

H   89

I   91

J   90

K   69

A   71

B   84

C   94

D   67

E   96

F   80

G   70

H   71

I   81

J   98

K   80

A   85

B   75

C   69

D   87

E   89

F   80

G   70

H   99

I   81

J   88

K   60

A   65

B   75

C   60

D   67

E   80

F   92

G   76

H   79

I   68

J   78

K   70

A   85

B   85

C   74

D   87

E   76

F   60

G   60

H   79

I   81

J   78

K   80

A   85

B   65

C   74

D   67

E   86

F   70

G   70

H   79

I   81

J   78

K   60

A   70

B   69

C   60

D   89

E   69

F   95

G   75

H   59

I   60

J   79

K   65


First, configure the run arguments: two input paths followed by one output path (here hdfs://localhost:9000/user/hadoop/input/score1.txt, hdfs://localhost:9000/user/hadoop/input/score2.txt, and hdfs://localhost:9000/user/hadoop/output, as the log shows).
The console output during execution:

15/07/05 20:35:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/05 20:35:33 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/07/05 20:35:33 INFO input.FileInputFormat: Total input paths to process : 2
15/07/05 20:35:33 WARN snappy.LoadSnappy: Snappy native library not loaded
15/07/05 20:35:34 INFO mapred.JobClient: Running job: job_local577202179_0001
15/07/05 20:35:34 INFO mapred.LocalJobRunner: Waiting for map tasks
15/07/05 20:35:34 INFO mapred.LocalJobRunner: Starting task: attempt_local577202179_0001_m_000000_0
15/07/05 20:35:34 INFO util.ProcessTree: setsid exited with exit code 0
15/07/05 20:35:34 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1a6a5f9
15/07/05 20:35:34 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/hadoop/input/score1.txt:0+704
15/07/05 20:35:34 INFO mapred.MapTask: io.sort.mb = 100
15/07/05 20:35:34 INFO mapred.MapTask: data buffer = 79691776/99614720
15/07/05 20:35:34 INFO mapred.MapTask: record buffer = 262144/327680
15/07/05 20:35:35 INFO mapred.JobClient:  map 0% reduce 0%
Before Map:0,
Before Map:1,A   55
Aefore Map:A,55
Before Map:8,
Before Map:9,B   65
Aefore Map:B,65
Before Map:16,
Before Map:17,C   44
Aefore Map:C,44
Before Map:24,
Before Map:25,D   87
Aefore Map:D,87
Before Map:32,
Before Map:33,E   66
Aefore Map:E,66
Before Map:40,
Before Map:41,F   90
Aefore Map:F,90
Before Map:48,
Before Map:49,G   70
Aefore Map:G,70
Before Map:56,
Before Map:57,H   59
Aefore Map:H,59
Before Map:64,
Before Map:65,I   61
Aefore Map:I,61
Before Map:72,
Before Map:73,J   58
Aefore Map:J,58
Before Map:80,
Before Map:81,K   40
Aefore Map:K,40
Before Map:88,
Before Map:89,A   45
Aefore Map:A,45
Before Map:96,
Before Map:97,B   62
Aefore Map:B,62
Before Map:104,
Before Map:105,C   64
Aefore Map:C,64
Before Map:112,
Before Map:113,D   77
Aefore Map:D,77
Before Map:120,
Before Map:121,E   36
Aefore Map:E,36
Before Map:128,
Before Map:129,F   50
Aefore Map:F,50
Before Map:136,
Before Map:137,G   80
Aefore Map:G,80
Before Map:144,
Before Map:145,H   69
Aefore Map:H,69
Before Map:152,
Before Map:153,I   71
Aefore Map:I,71
Before Map:160,
Before Map:161,J   70
Aefore Map:J,70
Before Map:168,
Before Map:169,K   49
Aefore Map:K,49
Before Map:176,
Before Map:177,A   51
Aefore Map:A,51
Before Map:184,
Before Map:185,B   64
Aefore Map:B,64
Before Map:192,
Before Map:193,C   74
Aefore Map:C,74
Before Map:200,
Before Map:201,D   37
Aefore Map:D,37
Before Map:208,
Before Map:209,E   76
Aefore Map:E,76
Before Map:216,
Before Map:217,F   80
Aefore Map:F,80
Before Map:224,
Before Map:225,G   50
Aefore Map:G,50
Before Map:232,
Before Map:233,H   51
Aefore Map:H,51
Before Map:240,
Before Map:241,I   81
Aefore Map:I,81
Before Map:248,
Before Map:249,J   68
Aefore Map:J,68
Before Map:256,
Before Map:257,K   80
Aefore Map:K,80
Before Map:264,
Before Map:265,A   85
Aefore Map:A,85
Before Map:272,
Before Map:273,B   55
Aefore Map:B,55
Before Map:280,
Before Map:281,C   49
Aefore Map:C,49
Before Map:288,
Before Map:289,D   67
Aefore Map:D,67
Before Map:296,
Before Map:297,E   69
Aefore Map:E,69
Before Map:304,
Before Map:305,F   50
Aefore Map:F,50
Before Map:312,
Before Map:313,G   80
Aefore Map:G,80
Before Map:320,
Before Map:321,H   79
Aefore Map:H,79
Before Map:328,
Before Map:329,I   81
Aefore Map:I,81
Before Map:336,
Before Map:337,J   68
Aefore Map:J,68
Before Map:344,
Before Map:345,K   80
Aefore Map:K,80
Before Map:352,
Before Map:353,A   35
Aefore Map:A,35
Before Map:360,
Before Map:361,B   55
Aefore Map:B,55
Before Map:368,
Before Map:369,C   40
Aefore Map:C,40
Before Map:376,
Before Map:377,D   47
Aefore Map:D,47
Before Map:384,
Before Map:385,E   60
Aefore Map:E,60
Before Map:392,
Before Map:393,F   72
Aefore Map:F,72
Before Map:400,
Before Map:401,G   76
Aefore Map:G,76
Before Map:408,
Before Map:409,H   79
Aefore Map:H,79
Before Map:416,
Before Map:417,I   68
Aefore Map:I,68
Before Map:424,
Before Map:425,J   78
Aefore Map:J,78
Before Map:432,
Before Map:433,K   50
Aefore Map:K,50
Before Map:440,
Before Map:441,A   65
Aefore Map:A,65
Before Map:448,
Before Map:449,B   45
Aefore Map:B,45
Before Map:456,
Before Map:457,C   74
Aefore Map:C,74
Before Map:464,
Before Map:465,D   57
Aefore Map:D,57
Before Map:472,
Before Map:473,E   56
Aefore Map:E,56
Before Map:480,
Before Map:481,F   50
Aefore Map:F,50
Before Map:488,
Before Map:489,G   60
Aefore Map:G,60
Before Map:496,
Before Map:497,H   59
Aefore Map:H,59
Before Map:504,
Before Map:505,I   61
Aefore Map:I,61
Before Map:512,
Before Map:513,J   58
Aefore Map:J,58
Before Map:520,
Before Map:521,K   60
Aefore Map:K,60
Before Map:528,
Before Map:529,A   85
Aefore Map:A,85
Before Map:536,
Before Map:537,B   45
Aefore Map:B,45
Before Map:544,
Before Map:545,C   74
Aefore Map:C,74
Before Map:552,
Before Map:553,D   67
Aefore Map:D,67
Before Map:560,
Before Map:561,E   86
Aefore Map:E,86
Before Map:568,
Before Map:569,F   70
Aefore Map:F,70
Before Map:576,
Before Map:577,G   50
Aefore Map:G,50
Before Map:584,
Before Map:585,H   79
Aefore Map:H,79
Before Map:592,
Before Map:593,I   81
Aefore Map:I,81
Before Map:600,
Before Map:601,J   78
Aefore Map:J,78
Before Map:608,
Before Map:609,K   60
Aefore Map:K,60
Before Map:616,
Before Map:617,A   50
Aefore Map:A,50
Before Map:624,
Before Map:625,B   69
Aefore Map:B,69
Before Map:632,
Before Map:633,C   40
Aefore Map:C,40
Before Map:640,
Before Map:641,D   89
Aefore Map:D,89
Before Map:648,
Before Map:649,E   69
Aefore Map:E,69
Before Map:656,
Before Map:657,F   95
Aefore Map:F,95
Before Map:664,
Before Map:665,G   75
Aefore Map:G,75
Before Map:672,
Before Map:673,H   59
Aefore Map:H,59
Before Map:680,
Before Map:681,I   60
Aefore Map:I,60
Before Map:688,
Before Map:689,J   59
Aefore Map:J,59
Before Map:696,
Before Map:697,K   45
Aefore Map:K,45
15/07/05 20:35:39 INFO mapred.MapTask: Starting flush of map output
Bfter Reducer:A,55,45,51,85,35,65,85,50,
After Reducer:A,58
Bfter Reducer:B,45,64,65,45,55,69,62,55,
After Reducer:B,57
Bfter Reducer:C,64,49,44,74,74,40,40,74,
After Reducer:C,57
Bfter Reducer:D,67,67,77,37,87,57,89,47,
After Reducer:D,66
Bfter Reducer:E,36,66,76,86,69,69,60,56,
After Reducer:E,64
Bfter Reducer:F,90,95,70,50,80,50,50,72,
After Reducer:F,69
Bfter Reducer:G,60,76,50,50,80,70,75,80,
After Reducer:G,67
Bfter Reducer:H,59,69,51,79,59,79,59,79,
After Reducer:H,66
Bfter Reducer:I,60,61,81,81,61,71,68,81,
After Reducer:I,70
Bfter Reducer:J,58,59,78,68,78,68,70,58,
After Reducer:J,67
Bfter Reducer:K,40,50,49,60,60,45,80,80,
After Reducer:K,58
15/07/05 20:35:39 INFO mapred.MapTask: Finished spill 0
15/07/05 20:35:39 INFO mapred.Task: Task:attempt_local577202179_0001_m_000000_0 is done. And is in the process of commiting
15/07/05 20:35:39 INFO mapred.LocalJobRunner:
15/07/05 20:35:39 INFO mapred.Task: Task 'attempt_local577202179_0001_m_000000_0' done.
15/07/05 20:35:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local577202179_0001_m_000000_0
15/07/05 20:35:39 INFO mapred.LocalJobRunner: Starting task: attempt_local577202179_0001_m_000001_0
15/07/05 20:35:39 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@10c2696
15/07/05 20:35:39 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/hadoop/input/score2.txt:0+704
15/07/05 20:35:39 INFO mapred.MapTask: io.sort.mb = 100
15/07/05 20:35:39 INFO mapred.JobClient:  map 50% reduce 0%
15/07/05 20:35:39 INFO mapred.MapTask: data buffer = 79691776/99614720
15/07/05 20:35:39 INFO mapred.MapTask: record buffer = 262144/327680
Before Map:0,
Before Map:1,A   65
Aefore Map:A,65
Before Map:8,
Before Map:9,B   75
Aefore Map:B,75
Before Map:16,
Before Map:17,C   64
Aefore Map:C,64
Before Map:24,
Before Map:25,D   67
Aefore Map:D,67
Before Map:32,
Before Map:33,E   86
Aefore Map:E,86
Before Map:40,
Before Map:41,F   70
Aefore Map:F,70
Before Map:48,
Before Map:49,G   90
Aefore Map:G,90
Before Map:56,
Before Map:57,H   79
Aefore Map:H,79
Before Map:64,
Before Map:65,I   81
Aefore Map:I,81
Before Map:72,
Before Map:73,J   78
Aefore Map:J,78
Before Map:80,
Before Map:81,K   60
Aefore Map:K,60
Before Map:88,
Before Map:89,A   65
Aefore Map:A,65
Before Map:96,
Before Map:97,B   82
Aefore Map:B,82
Before Map:104,
Before Map:105,C   84
Aefore Map:C,84
Before Map:112,
Before Map:113,D   97
Aefore Map:D,97
Before Map:120,
Before Map:121,E   66
Aefore Map:E,66
Before Map:128,
Before Map:129,F   70
Aefore Map:F,70
Before Map:136,
Before Map:137,G   80
Aefore Map:G,80
Before Map:144,
Before Map:145,H   89
Aefore Map:H,89
Before Map:152,
Before Map:153,I   91
Aefore Map:I,91
Before Map:160,
Before Map:161,J   90
Aefore Map:J,90
Before Map:168,
Before Map:169,K   69
Aefore Map:K,69
Before Map:176,
Before Map:177,A   71
Aefore Map:A,71
Before Map:184,
Before Map:185,B   84
Aefore Map:B,84
Before Map:192,
Before Map:193,C   94
Aefore Map:C,94
Before Map:200,
Before Map:201,D   67
Aefore Map:D,67
Before Map:208,
Before Map:209,E   96
Aefore Map:E,96
Before Map:216,
Before Map:217,F   80
Aefore Map:F,80
Before Map:224,
Before Map:225,G   70
Aefore Map:G,70
Before Map:232,
Before Map:233,H   71
Aefore Map:H,71
Before Map:240,
Before Map:241,I   81
Aefore Map:I,81
Before Map:248,
Before Map:249,J   98
Aefore Map:J,98
Before Map:256,
Before Map:257,K   80
Aefore Map:K,80
Before Map:264,
Before Map:265,A   85
Aefore Map:A,85
Before Map:272,
Before Map:273,B   75
Aefore Map:B,75
Before Map:280,
Before Map:281,C   69
Aefore Map:C,69
Before Map:288,
Before Map:289,D   87
Aefore Map:D,87
Before Map:296,
Before Map:297,E   89
Aefore Map:E,89
Before Map:304,
Before Map:305,F   80
Aefore Map:F,80
Before Map:312,
Before Map:313,G   70
Aefore Map:G,70
Before Map:320,
Before Map:321,H   99
Aefore Map:H,99
Before Map:328,
Before Map:329,I   81
Aefore Map:I,81
Before Map:336,
Before Map:337,J   88
Aefore Map:J,88
Before Map:344,
Before Map:345,K   60
Aefore Map:K,60
Before Map:352,
Before Map:353,A   65
Aefore Map:A,65
Before Map:360,
Before Map:361,B   75
Aefore Map:B,75
Before Map:368,
Before Map:369,C   60
Aefore Map:C,60
Before Map:376,
Before Map:377,D   67
Aefore Map:D,67
Before Map:384,
Before Map:385,E   80
Aefore Map:E,80
Before Map:392,
Before Map:393,F   92
Aefore Map:F,92
Before Map:400,
Before Map:401,G   76
Aefore Map:G,76
Before Map:408,
Before Map:409,H   79
Aefore Map:H,79
Before Map:416,
Before Map:417,I   68
Aefore Map:I,68
Before Map:424,
Before Map:425,J   78
Aefore Map:J,78
Before Map:432,
Before Map:433,K   70
Aefore Map:K,70
Before Map:440,
Before Map:441,A   85
Aefore Map:A,85
Before Map:448,
Before Map:449,B   85
Aefore Map:B,85
Before Map:456,
Before Map:457,C   74
Aefore Map:C,74
Before Map:464,
Before Map:465,D   87
Aefore Map:D,87
Before Map:472,
Before Map:473,E   76
Aefore Map:E,76
Before Map:480,
Before Map:481,F   60
Aefore Map:F,60
Before Map:488,
Before Map:489,G   60
Aefore Map:G,60
Before Map:496,
Before Map:497,H   79
Aefore Map:H,79
Before Map:504,
Before Map:505,I   81
Aefore Map:I,81
Before Map:512,
Before Map:513,J   78
Aefore Map:J,78
Before Map:520,
Before Map:521,K   80
Aefore Map:K,80
Before Map:528,
Before Map:529,A   85
Aefore Map:A,85
Before Map:536,
Before Map:537,B   65
Aefore Map:B,65
Before Map:544,
Before Map:545,C   74
Aefore Map:C,74
Before Map:552,
Before Map:553,D   67
Aefore Map:D,67
Before Map:560,
Before Map:561,E   86
Aefore Map:E,86
Before Map:568,
Before Map:569,F   70
Aefore Map:F,70
Before Map:576,
Before Map:577,G   70
Aefore Map:G,70
Before Map:584,
Before Map:585,H   79
Aefore Map:H,79
Before Map:592,
Before Map:593,I   81
Aefore Map:I,81
Before Map:600,
Before Map:601,J   78
Aefore Map:J,78
Before Map:608,
Before Map:609,K   60
Aefore Map:K,60
Before Map:616,
Before Map:617,A   70
Aefore Map:A,70
Before Map:624,
Before Map:625,B   69
Aefore Map:B,69
Before Map:632,
Before Map:633,C   60
Aefore Map:C,60
Before Map:640,
Before Map:641,D   89
Aefore Map:D,89
Before Map:648,
Before Map:649,E   69
Aefore Map:E,69
Before Map:656,
Before Map:657,F   95
Aefore Map:F,95
Before Map:664,
Before Map:665,G   75
Aefore Map:G,75
Before Map:672,
Before Map:673,H   59
Aefore Map:H,59
Before Map:680,
Before Map:681,I   60
Aefore Map:I,60
Before Map:688,
Before Map:689,J   79
Aefore Map:J,79
Before Map:696,
Before Map:697,K   65
Aefore Map:K,65
15/07/05 20:35:42 INFO mapred.MapTask: Starting flush of map output
Bfter Reducer:A,65,65,71,85,65,85,85,70,
After Reducer:A,73
Bfter Reducer:B,65,84,75,85,75,69,82,75,
After Reducer:B,76
Bfter Reducer:C,84,69,64,74,94,60,60,74,
After Reducer:C,72
Bfter Reducer:D,67,87,97,67,67,87,89,67,
After Reducer:D,78
Bfter Reducer:E,66,86,96,86,89,69,80,76,
After Reducer:E,81
Bfter Reducer:F,70,95,70,70,80,60,80,92,
After Reducer:F,77
Bfter Reducer:G,60,76,70,70,80,90,75,70,
After Reducer:G,73
Bfter Reducer:H,79,89,71,99,59,79,79,79,
After Reducer:H,79
Bfter Reducer:I,60,81,81,81,81,91,68,81,
After Reducer:I,78
Bfter Reducer:J,78,79,78,88,78,98,90,78,
After Reducer:J,83
Bfter Reducer:K,60,70,69,60,80,65,60,80,
After Reducer:K,68
15/07/05 20:35:42 INFO mapred.MapTask: Finished spill 0
15/07/05 20:35:42 INFO mapred.Task: Task:attempt_local577202179_0001_m_000001_0 is done. And is in the process of commiting
15/07/05 20:35:42 INFO mapred.LocalJobRunner:
15/07/05 20:35:42 INFO mapred.Task: Task 'attempt_local577202179_0001_m_000001_0' done.
15/07/05 20:35:42 INFO mapred.LocalJobRunner: Finishing task: attempt_local577202179_0001_m_000001_0
15/07/05 20:35:42 INFO mapred.LocalJobRunner: Map task executor complete.
15/07/05 20:35:42 INFO mapred.JobClient:  map 100% reduce 0%
15/07/05 20:35:43 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@8f544b
15/07/05 20:35:43 INFO mapred.LocalJobRunner:
15/07/05 20:35:43 INFO mapred.Merger: Merging 2 sorted segments
15/07/05 20:35:43 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 180 bytes
15/07/05 20:35:43 INFO mapred.LocalJobRunner:
Bfter Reducer:A,58,73,
After Reducer:A,65
Bfter Reducer:B,76,57,
After Reducer:B,66
Bfter Reducer:C,57,72,
After Reducer:C,64
Bfter Reducer:D,78,66,
After Reducer:D,72
Bfter Reducer:E,64,81,
After Reducer:E,72
Bfter Reducer:F,77,69,
After Reducer:F,73
Bfter Reducer:G,67,73,
After Reducer:G,70
Bfter Reducer:H,79,66,
After Reducer:H,72
Bfter Reducer:I,70,78,
After Reducer:I,74
Bfter Reducer:J,83,67,
After Reducer:J,75
Bfter Reducer:K,58,68,
After Reducer:K,63
15/07/05 20:35:44 INFO mapred.Task: Task:attempt_local577202179_0001_r_000000_0 is done. And is in the process of commiting
15/07/05 20:35:44 INFO mapred.LocalJobRunner:
15/07/05 20:35:44 INFO mapred.Task: Task attempt_local577202179_0001_r_000000_0 is allowed to commit now
15/07/05 20:35:44 INFO output.FileOutputCommitter: Saved output of task 'attempt_local577202179_0001_r_000000_0' to hdfs://localhost:9000/user/hadoop/output
15/07/05 20:35:44 INFO mapred.LocalJobRunner: reduce > reduce
15/07/05 20:35:44 INFO mapred.Task: Task 'attempt_local577202179_0001_r_000000_0' done.
15/07/05 20:35:44 INFO mapred.JobClient:  map 100% reduce 100%
15/07/05 20:35:44 INFO mapred.JobClient: Job complete: job_local577202179_0001
15/07/05 20:35:44 INFO mapred.JobClient: Counters: 22
15/07/05 20:35:44 INFO mapred.JobClient:   File Output Format Counters
15/07/05 20:35:44 INFO mapred.JobClient:     Bytes Written=55
15/07/05 20:35:44 INFO mapred.JobClient:   FileSystemCounters
15/07/05 20:35:44 INFO mapred.JobClient:     FILE_BYTES_READ=1703
15/07/05 20:35:44 INFO mapred.JobClient:     HDFS_BYTES_READ=3520
15/07/05 20:35:44 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=205902
15/07/05 20:35:44 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=55
15/07/05 20:35:44 INFO mapred.JobClient:   File Input Format Counters
15/07/05 20:35:44 INFO mapred.JobClient:     Bytes Read=1408
15/07/05 20:35:44 INFO mapred.JobClient:   Map-Reduce Framework
15/07/05 20:35:44 INFO mapred.JobClient:     Reduce input groups=11
15/07/05 20:35:44 INFO mapred.JobClient:     Map output materialized bytes=188
15/07/05 20:35:44 INFO mapred.JobClient:     Combine output records=22
15/07/05 20:35:44 INFO mapred.JobClient:     Map input records=352
15/07/05 20:35:44 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/07/05 20:35:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
15/07/05 20:35:44 INFO mapred.JobClient:     Reduce output records=11
15/07/05 20:35:44 INFO mapred.JobClient:     Spilled Records=44
15/07/05 20:35:44 INFO mapred.JobClient:     Map output bytes=1056
15/07/05 20:35:44 INFO mapred.JobClient:     CPU time spent (ms)=0
15/07/05 20:35:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=444071936
15/07/05 20:35:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
15/07/05 20:35:44 INFO mapred.JobClient:     Combine input records=176
15/07/05 20:35:44 INFO mapred.JobClient:     Map output records=176
15/07/05 20:35:44 INFO mapred.JobClient:     SPLIT_RAW_BYTES=230
15/07/05 20:35:44 INFO mapred.JobClient:     Reduce input records=22
Inspecting the output, part-r-00000 contains (consistent with the final "After Reducer" lines and Bytes Written=55 above):

A	65
B	66
C	64
D	72
E	72
F	73
G	70
H	72
I	74
J	75
K	63

Two issues I ran into along the way deserve a mention:
(1) The map function uses StringTokenizer token = new StringTokenizer(textline). What is this for? StringTokenizer(String str) takes a plain string. In the wordcount example I never understood how the raw text got split into individual words. It turns out the default input format is TextInputFormat, which has already split the file into lines, so what map receives is one line per record; the line then only needs to be split into tokens. StringTokenizer also has the constructor StringTokenizer(String str, String delim), where delim is the delimiter set; the default is " \t\n\r\f" (space, tab, newline, carriage return, form feed). Back to our case: we only need to decide how to split each line, and since the name and score are separated by a tab, new StringTokenizer(textline) works as-is; there is no need to spell out "\t" (I previously couldn't get the split to work when writing it out explicitly, perhaps the "\t" wasn't being recognized).
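The claim above is easy to check in isolation: the no-delimiter StringTokenizer constructor already treats both tabs and spaces as separators. A minimal sketch, with a hypothetical split helper mirroring what the mapper does:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Split one input line the same way the mapper does
    public static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        // Default delimiters are " \t\n\r\f", so no explicit "\t" is needed
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("A\t55"));  // tab-separated record -> [A, 55]
        System.out.println(split("B   65")); // space-separated works too -> [B, 65]
    }
}
```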
(2) The printed execution trace initially puzzled me. From the messages it looks as though score1.txt is split and fed line by line into map, and immediately after the file's last record a reduce seems to run; score2.txt behaves the same way; and only after both files have gone through their map and "reduce" does one more reduce appear to run. If that were really the case, it would conflict with the theory that reduce starts only after map has finished.

A look at the code reveals the answer:

  job.setMapperClass(MapperClass.class);
  job.setCombinerClass(ReducerClass.class);
  job.setReducerClass(ReducerClass.class);

Yes, here lies the clue. The actual sequence is: map runs first, which is why every score record is printed, and the "reduce" that follows each map is really the combiner step that sits between map and reduce. So what is this Combiner class? The API makes it clear that a combiner is essentially a local, map-side reduce pass, and here the same ReducerClass is explicitly registered as both the combiner and the reducer. That is why each file appears to go through map and then reduce, and only after both maps finish does the real reduce run. One caveat this example glosses over: reusing a reducer as a combiner is only correct when the operation can be applied repeatedly without changing the result (sums, counts, maxima). Averaging cannot: the final reduce here averages the two per-file averages instead of all sixteen scores per student, so the output can deviate from the true average (and the integer division adds further rounding on top).
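This combiner pitfall is easy to demonstrate without Hadoop. Using student A's scores from the two input splits (the second set read off the run log), averaging each split first and then averaging those two results does not equal averaging all sixteen scores at once:

```java
public class CombinerPitfallDemo {
    // Integer average, as ReducerClass computes it
    public static int avg(int[] values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum / values.length;
    }

    public static void main(String[] args) {
        int[] split1 = {55, 45, 51, 85, 35, 65, 85, 50}; // student A, score1.txt
        int[] split2 = {65, 65, 71, 85, 65, 85, 85, 70}; // student A, score2.txt

        // With the combiner: each split is averaged first, then the averages are averaged
        int withCombiner = avg(new int[]{avg(split1), avg(split2)});

        // Without the combiner: all sixteen raw scores reach the reducer
        int[] all = new int[16];
        System.arraycopy(split1, 0, all, 0, 8);
        System.arraycopy(split2, 0, all, 8, 8);
        int withoutCombiner = avg(all);

        System.out.println(withCombiner);    // 65, as in the job's final output
        System.out.println(withoutCombiner); // 66: the true average differs
    }
}
```

Here the two happen to differ by only one point, but with splits of unequal sizes the gap can grow arbitrarily. A correct combiner for averaging would have to forward partial (sum, count) pairs rather than averages.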