A Small Hadoop Exercise — Computing Averages with MapReduce
2015-07-05 20:37
Having studied the ideas behind MapReduce and gained some understanding of them, it is time to strike while the iron is hot: a small exercise is the best way to consolidate the theory, since practice is the only criterion for testing truth.
Here we build a MapReduce example that computes average scores. I am borrowing an approach a predecessor described, which I find very sound: spell out what the map stage takes as input, what the map step does, and what it outputs; then what the reduce stage takes as input, what it does, and what it outputs. Once those points are clear, the MapReduce program practically writes itself. In this case:

Map: input — records in a fixed format (e.g. "张三 60"); processing — split each record and write it to the context as a key-value pair; output — key-value pairs of the declared types (e.g. (new Text("张三"), new IntWritable(60))).

Reduce: input — the map output; processing — total each individual's scores, then divide by that individual's number of courses; output — key-value pairs of the declared types.

Given the map and reduce steps above, we arrive at the following code:
package com.linxiaosheng.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ScoreAvgTest {

    /**
     * KEYIN:    byte offset of the current line within the file (0, 1, 8, 9, ...)
     * VALUEIN:  the text of the current line
     * KEYOUT:   the student name
     * VALUEOUT: the score
     */
    public static class MapperClass extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable score = new IntWritable();
        private Text name = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String lineText = value.toString();
            System.out.println("Before Map:" + key + "," + lineText);
            // Default delimiters (" \t\n\r\f") split the "name<TAB>score" record.
            StringTokenizer stringTokenizer = new StringTokenizer(lineText);
            while (stringTokenizer.hasMoreTokens()) {
                name.set(stringTokenizer.nextToken());
                score.set(Integer.parseInt(stringTokenizer.nextToken()));
                // "Aefore" (sic) — kept as-is so it matches the console log below.
                System.out.println("Aefore Map:" + name + "," + score);
                context.write(name, score);
            }
        }
    }

    /**
     * KEYIN:    the student name
     * VALUEIN:  that student's scores
     * KEYOUT:   the student name
     * VALUEOUT: the averaged score
     */
    public static class ReducerClass extends Reducer<Text, IntWritable, Text, IntWritable> {
        public IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text name, Iterable<IntWritable> scores, Context context)
                throws IOException, InterruptedException {
            StringBuffer sb = new StringBuffer();
            int sum = 0;
            int num = 0;
            for (IntWritable score : scores) {
                int s = score.get();
                sum += s;
                num++;
                sb.append(s + ",");
            }
            int avg = sum / num;
            // "Bfter" (sic) — kept as-is so it matches the console log below.
            System.out.println("Bfter Reducer:" + name + "," + sb.toString());
            System.out.println("After Reducer:" + name + "," + avg);
            result.set(avg);
            context.write(name, result);
        }
    }

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = new Job(conf, "ScoreAvgTest");
        job.setJarByClass(ScoreAvgTest.class);
        job.setMapperClass(MapperClass.class);
        job.setCombinerClass(ReducerClass.class); // the combiner reuses the reducer
        job.setReducerClass(ReducerClass.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Data set: I created the data by hand, mainly because I wanted to watch how MapReduce runs, so there are just two files; naturally, no thought went into whether the scores follow a normal distribution...
The data covers 11 students, A through K, with 16 courses each (8 per file). The specifics:
score1.txt:
A 55 B 65 C 44 D 87 E 66 F 90 G 70 H 59 I 61 J 58 K 40
A 45 B 62 C 64 D 77 E 36 F 50 G 80 H 69 I 71 J 70 K 49
A 51 B 64 C 74 D 37 E 76 F 80 G 50 H 51 I 81 J 68 K 80
A 85 B 55 C 49 D 67 E 69 F 50 G 80 H 79 I 81 J 68 K 80
A 35 B 55 C 40 D 47 E 60 F 72 G 76 H 79 I 68 J 78 K 50
A 65 B 45 C 74 D 57 E 56 F 50 G 60 H 59 I 61 J 58 K 60
A 85 B 45 C 74 D 67 E 86 F 70 G 50 H 79 I 81 J 78 K 60
A 50 B 69 C 40 D 89 E 69 F 95 G 75 H 59 I 60 J 59 K 45
(the actual file stores one tab-separated name-score pair per line; shown here in rows of eleven for readability)
score2.txt:
A 65 B 75 C 64 D 67 E 86 F 70 G 90 H 79 I 81 J 78 K 60
A 65 B 82 C 84 D 97 E 66 F 70 G 80 H 89 I 91 J 90 K 69
A 71 B 84 C 94 D 67 E 96 F 80 G 70 H 71 I 81 J 98 K 80
A 85 B 75 C 69 D 87 E 89 F 80 G 70 H 99 I 81 J 88 K 60
A 65 B 75 C 60 D 67 E 80 F 92 G 76 H 79 I 68 J 78 K 70
A 85 B 85 C 74 D 87 E 76 F 60 G 60 H 79 I 81 J 78 K 80
A 85 B 65 C 74 D 67 E 86 F 70 G 70 H 79 I 81 J 78 K 60
A 70 B 69 C 60 D 89 E 69 F 95 G 75 H 59 I 60 J 79 K 65
(same layout as score1.txt; values as recorded in the map log below)
First, configure the run arguments:
The console output during execution:
15/07/05 20:35:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/07/05 20:35:33 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 15/07/05 20:35:33 INFO input.FileInputFormat: Total input paths to process : 2 15/07/05 20:35:33 WARN snappy.LoadSnappy: Snappy native library not loaded 15/07/05 20:35:34 INFO mapred.JobClient: Running job: job_local577202179_0001 15/07/05 20:35:34 INFO mapred.LocalJobRunner: Waiting for map tasks 15/07/05 20:35:34 INFO mapred.LocalJobRunner: Starting task: attempt_local577202179_0001_m_000000_0 15/07/05 20:35:34 INFO util.ProcessTree: setsid exited with exit code 0 15/07/05 20:35:34 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1a6a5f9 15/07/05 20:35:34 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/hadoop/input/score1.txt:0+704 15/07/05 20:35:34 INFO mapred.MapTask: io.sort.mb = 100 15/07/05 20:35:34 INFO mapred.MapTask: data buffer = 79691776/99614720 15/07/05 20:35:34 INFO mapred.MapTask: record buffer = 262144/327680 15/07/05 20:35:35 INFO mapred.JobClient: map 0% reduce 0% Before Map:0, Before Map:1,A 55 Aefore Map:A,55 Before Map:8, Before Map:9,B 65 Aefore Map:B,65 Before Map:16, Before Map:17,C 44 Aefore Map:C,44 Before Map:24, Before Map:25,D 87 Aefore Map:D,87 Before Map:32, Before Map:33,E 66 Aefore Map:E,66 Before Map:40, Before Map:41,F 90 Aefore Map:F,90 Before Map:48, Before Map:49,G 70 Aefore Map:G,70 Before Map:56, Before Map:57,H 59 Aefore Map:H,59 Before Map:64, Before Map:65,I 61 Aefore Map:I,61 Before Map:72, Before Map:73,J 58 Aefore Map:J,58 Before Map:80, Before Map:81,K 40 Aefore Map:K,40 Before Map:88, Before Map:89,A 45 Aefore Map:A,45 Before Map:96, Before Map:97,B 62 Aefore Map:B,62 Before Map:104, Before Map:105,C 64 Aefore Map:C,64 Before Map:112, Before Map:113,D 77 Aefore 
Map:D,77 Before Map:120, Before Map:121,E 36 Aefore Map:E,36 Before Map:128, Before Map:129,F 50 Aefore Map:F,50 Before Map:136, Before Map:137,G 80 Aefore Map:G,80 Before Map:144, Before Map:145,H 69 Aefore Map:H,69 Before Map:152, Before Map:153,I 71 Aefore Map:I,71 Before Map:160, Before Map:161,J 70 Aefore Map:J,70 Before Map:168, Before Map:169,K 49 Aefore Map:K,49 Before Map:176, Before Map:177,A 51 Aefore Map:A,51 Before Map:184, Before Map:185,B 64 Aefore Map:B,64 Before Map:192, Before Map:193,C 74 Aefore Map:C,74 Before Map:200, Before Map:201,D 37 Aefore Map:D,37 Before Map:208, Before Map:209,E 76 Aefore Map:E,76 Before Map:216, Before Map:217,F 80 Aefore Map:F,80 Before Map:224, Before Map:225,G 50 Aefore Map:G,50 Before Map:232, Before Map:233,H 51 Aefore Map:H,51 Before Map:240, Before Map:241,I 81 Aefore Map:I,81 Before Map:248, Before Map:249,J 68 Aefore Map:J,68 Before Map:256, Before Map:257,K 80 Aefore Map:K,80 Before Map:264, Before Map:265,A 85 Aefore Map:A,85 Before Map:272, Before Map:273,B 55 Aefore Map:B,55 Before Map:280, Before Map:281,C 49 Aefore Map:C,49 Before Map:288, Before Map:289,D 67 Aefore Map:D,67 Before Map:296, Before Map:297,E 69 Aefore Map:E,69 Before Map:304, Before Map:305,F 50 Aefore Map:F,50 Before Map:312, Before Map:313,G 80 Aefore Map:G,80 Before Map:320, Before Map:321,H 79 Aefore Map:H,79 Before Map:328, Before Map:329,I 81 Aefore Map:I,81 Before Map:336, Before Map:337,J 68 Aefore Map:J,68 Before Map:344, Before Map:345,K 80 Aefore Map:K,80 Before Map:352, Before Map:353,A 35 Aefore Map:A,35 Before Map:360, Before Map:361,B 55 Aefore Map:B,55 Before Map:368, Before Map:369,C 40 Aefore Map:C,40 Before Map:376, Before Map:377,D 47 Aefore Map:D,47 Before Map:384, Before Map:385,E 60 Aefore Map:E,60 Before Map:392, Before Map:393,F 72 Aefore Map:F,72 Before Map:400, Before Map:401,G 76 Aefore Map:G,76 Before Map:408, Before Map:409,H 79 Aefore Map:H,79 Before Map:416, Before Map:417,I 68 Aefore Map:I,68 Before 
Map:424, Before Map:425,J 78 Aefore Map:J,78 Before Map:432, Before Map:433,K 50 Aefore Map:K,50 Before Map:440, Before Map:441,A 65 Aefore Map:A,65 Before Map:448, Before Map:449,B 45 Aefore Map:B,45 Before Map:456, Before Map:457,C 74 Aefore Map:C,74 Before Map:464, Before Map:465,D 57 Aefore Map:D,57 Before Map:472, Before Map:473,E 56 Aefore Map:E,56 Before Map:480, Before Map:481,F 50 Aefore Map:F,50 Before Map:488, Before Map:489,G 60 Aefore Map:G,60 Before Map:496, Before Map:497,H 59 Aefore Map:H,59 Before Map:504, Before Map:505,I 61 Aefore Map:I,61 Before Map:512, Before Map:513,J 58 Aefore Map:J,58 Before Map:520, Before Map:521,K 60 Aefore Map:K,60 Before Map:528, Before Map:529,A 85 Aefore Map:A,85 Before Map:536, Before Map:537,B 45 Aefore Map:B,45 Before Map:544, Before Map:545,C 74 Aefore Map:C,74 Before Map:552, Before Map:553,D 67 Aefore Map:D,67 Before Map:560, Before Map:561,E 86 Aefore Map:E,86 Before Map:568, Before Map:569,F 70 Aefore Map:F,70 Before Map:576, Before Map:577,G 50 Aefore Map:G,50 Before Map:584, Before Map:585,H 79 Aefore Map:H,79 Before Map:592, Before Map:593,I 81 Aefore Map:I,81 Before Map:600, Before Map:601,J 78 Aefore Map:J,78 Before Map:608, Before Map:609,K 60 Aefore Map:K,60 Before Map:616, Before Map:617,A 50 Aefore Map:A,50 Before Map:624, Before Map:625,B 69 Aefore Map:B,69 Before Map:632, Before Map:633,C 40 Aefore Map:C,40 Before Map:640, Before Map:641,D 89 Aefore Map:D,89 Before Map:648, Before Map:649,E 69 Aefore Map:E,69 Before Map:656, Before Map:657,F 95 Aefore Map:F,95 Before Map:664, Before Map:665,G 75 Aefore Map:G,75 Before Map:672, Before Map:673,H 59 Aefore Map:H,59 Before Map:680, Before Map:681,I 60 Aefore Map:I,60 Before Map:688, Before Map:689,J 59 Aefore Map:J,59 Before Map:696, Before Map:697,K 45 Aefore Map:K,45 15/07/05 20:35:39 INFO mapred.MapTask: Starting flush of map output Bfter Reducer:A,55,45,51,85,35,65,85,50, After Reducer:A,58 Bfter Reducer:B,45,64,65,45,55,69,62,55, After 
Reducer:B,57 Bfter Reducer:C,64,49,44,74,74,40,40,74, After Reducer:C,57 Bfter Reducer:D,67,67,77,37,87,57,89,47, After Reducer:D,66 Bfter Reducer:E,36,66,76,86,69,69,60,56, After Reducer:E,64 Bfter Reducer:F,90,95,70,50,80,50,50,72, After Reducer:F,69 Bfter Reducer:G,60,76,50,50,80,70,75,80, After Reducer:G,67 Bfter Reducer:H,59,69,51,79,59,79,59,79, After Reducer:H,66 Bfter Reducer:I,60,61,81,81,61,71,68,81, After Reducer:I,70 Bfter Reducer:J,58,59,78,68,78,68,70,58, After Reducer:J,67 Bfter Reducer:K,40,50,49,60,60,45,80,80, After Reducer:K,58 15/07/05 20:35:39 INFO mapred.MapTask: Finished spill 0 15/07/05 20:35:39 INFO mapred.Task: Task:attempt_local577202179_0001_m_000000_0 is done. And is in the process of commiting 15/07/05 20:35:39 INFO mapred.LocalJobRunner: 15/07/05 20:35:39 INFO mapred.Task: Task 'attempt_local577202179_0001_m_000000_0' done. 15/07/05 20:35:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local577202179_0001_m_000000_0 15/07/05 20:35:39 INFO mapred.LocalJobRunner: Starting task: attempt_local577202179_0001_m_000001_0 15/07/05 20:35:39 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@10c2696 15/07/05 20:35:39 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/hadoop/input/score2.txt:0+704 15/07/05 20:35:39 INFO mapred.MapTask: io.sort.mb = 100 15/07/05 20:35:39 INFO mapred.JobClient: map 50% reduce 0% 15/07/05 20:35:39 INFO mapred.MapTask: data buffer = 79691776/99614720 15/07/05 20:35:39 INFO mapred.MapTask: record buffer = 262144/327680 Before Map:0, Before Map:1,A 65 Aefore Map:A,65 Before Map:8, Before Map:9,B 75 Aefore Map:B,75 Before Map:16, Before Map:17,C 64 Aefore Map:C,64 Before Map:24, Before Map:25,D 67 Aefore Map:D,67 Before Map:32, Before Map:33,E 86 Aefore Map:E,86 Before Map:40, Before Map:41,F 70 Aefore Map:F,70 Before Map:48, Before Map:49,G 90 Aefore Map:G,90 Before Map:56, Before Map:57,H 79 Aefore Map:H,79 Before Map:64, Before Map:65,I 81 
Aefore Map:I,81 Before Map:72, Before Map:73,J 78 Aefore Map:J,78 Before Map:80, Before Map:81,K 60 Aefore Map:K,60 Before Map:88, Before Map:89,A 65 Aefore Map:A,65 Before Map:96, Before Map:97,B 82 Aefore Map:B,82 Before Map:104, Before Map:105,C 84 Aefore Map:C,84 Before Map:112, Before Map:113,D 97 Aefore Map:D,97 Before Map:120, Before Map:121,E 66 Aefore Map:E,66 Before Map:128, Before Map:129,F 70 Aefore Map:F,70 Before Map:136, Before Map:137,G 80 Aefore Map:G,80 Before Map:144, Before Map:145,H 89 Aefore Map:H,89 Before Map:152, Before Map:153,I 91 Aefore Map:I,91 Before Map:160, Before Map:161,J 90 Aefore Map:J,90 Before Map:168, Before Map:169,K 69 Aefore Map:K,69 Before Map:176, Before Map:177,A 71 Aefore Map:A,71 Before Map:184, Before Map:185,B 84 Aefore Map:B,84 Before Map:192, Before Map:193,C 94 Aefore Map:C,94 Before Map:200, Before Map:201,D 67 Aefore Map:D,67 Before Map:208, Before Map:209,E 96 Aefore Map:E,96 Before Map:216, Before Map:217,F 80 Aefore Map:F,80 Before Map:224, Before Map:225,G 70 Aefore Map:G,70 Before Map:232, Before Map:233,H 71 Aefore Map:H,71 Before Map:240, Before Map:241,I 81 Aefore Map:I,81 Before Map:248, Before Map:249,J 98 Aefore Map:J,98 Before Map:256, Before Map:257,K 80 Aefore Map:K,80 Before Map:264, Before Map:265,A 85 Aefore Map:A,85 Before Map:272, Before Map:273,B 75 Aefore Map:B,75 Before Map:280, Before Map:281,C 69 Aefore Map:C,69 Before Map:288, Before Map:289,D 87 Aefore Map:D,87 Before Map:296, Before Map:297,E 89 Aefore Map:E,89 Before Map:304, Before Map:305,F 80 Aefore Map:F,80 Before Map:312, Before Map:313,G 70 Aefore Map:G,70 Before Map:320, Before Map:321,H 99 Aefore Map:H,99 Before Map:328, Before Map:329,I 81 Aefore Map:I,81 Before Map:336, Before Map:337,J 88 Aefore Map:J,88 Before Map:344, Before Map:345,K 60 Aefore Map:K,60 Before Map:352, Before Map:353,A 65 Aefore Map:A,65 Before Map:360, Before Map:361,B 75 Aefore Map:B,75 Before Map:368, Before Map:369,C 60 Aefore Map:C,60 Before Map:376, 
Before Map:377,D 67 Aefore Map:D,67 Before Map:384, Before Map:385,E 80 Aefore Map:E,80 Before Map:392, Before Map:393,F 92 Aefore Map:F,92 Before Map:400, Before Map:401,G 76 Aefore Map:G,76 Before Map:408, Before Map:409,H 79 Aefore Map:H,79 Before Map:416, Before Map:417,I 68 Aefore Map:I,68 Before Map:424, Before Map:425,J 78 Aefore Map:J,78 Before Map:432, Before Map:433,K 70 Aefore Map:K,70 Before Map:440, Before Map:441,A 85 Aefore Map:A,85 Before Map:448, Before Map:449,B 85 Aefore Map:B,85 Before Map:456, Before Map:457,C 74 Aefore Map:C,74 Before Map:464, Before Map:465,D 87 Aefore Map:D,87 Before Map:472, Before Map:473,E 76 Aefore Map:E,76 Before Map:480, Before Map:481,F 60 Aefore Map:F,60 Before Map:488, Before Map:489,G 60 Aefore Map:G,60 Before Map:496, Before Map:497,H 79 Aefore Map:H,79 Before Map:504, Before Map:505,I 81 Aefore Map:I,81 Before Map:512, Before Map:513,J 78 Aefore Map:J,78 Before Map:520, Before Map:521,K 80 Aefore Map:K,80 Before Map:528, Before Map:529,A 85 Aefore Map:A,85 Before Map:536, Before Map:537,B 65 Aefore Map:B,65 Before Map:544, Before Map:545,C 74 Aefore Map:C,74 Before Map:552, Before Map:553,D 67 Aefore Map:D,67 Before Map:560, Before Map:561,E 86 Aefore Map:E,86 Before Map:568, Before Map:569,F 70 Aefore Map:F,70 Before Map:576, Before Map:577,G 70 Aefore Map:G,70 Before Map:584, Before Map:585,H 79 Aefore Map:H,79 Before Map:592, Before Map:593,I 81 Aefore Map:I,81 Before Map:600, Before Map:601,J 78 Aefore Map:J,78 Before Map:608, Before Map:609,K 60 Aefore Map:K,60 Before Map:616, Before Map:617,A 70 Aefore Map:A,70 Before Map:624, Before Map:625,B 69 Aefore Map:B,69 Before Map:632, Before Map:633,C 60 Aefore Map:C,60 Before Map:640, Before Map:641,D 89 Aefore Map:D,89 Before Map:648, Before Map:649,E 69 Aefore Map:E,69 Before Map:656, Before Map:657,F 95 Aefore Map:F,95 Before Map:664, Before Map:665,G 75 Aefore Map:G,75 Before Map:672, Before Map:673,H 59 Aefore Map:H,59 Before Map:680, Before Map:681,I 60 
Aefore Map:I,60 Before Map:688, Before Map:689,J 79 Aefore Map:J,79 Before Map:696, Before Map:697,K 65 Aefore Map:K,65 15/07/05 20:35:42 INFO mapred.MapTask: Starting flush of map output Bfter Reducer:A,65,65,71,85,65,85,85,70, After Reducer:A,73 Bfter Reducer:B,65,84,75,85,75,69,82,75, After Reducer:B,76 Bfter Reducer:C,84,69,64,74,94,60,60,74, After Reducer:C,72 Bfter Reducer:D,67,87,97,67,67,87,89,67, After Reducer:D,78 Bfter Reducer:E,66,86,96,86,89,69,80,76, After Reducer:E,81 Bfter Reducer:F,70,95,70,70,80,60,80,92, After Reducer:F,77 Bfter Reducer:G,60,76,70,70,80,90,75,70, After Reducer:G,73 Bfter Reducer:H,79,89,71,99,59,79,79,79, After Reducer:H,79 Bfter Reducer:I,60,81,81,81,81,91,68,81, After Reducer:I,78 Bfter Reducer:J,78,79,78,88,78,98,90,78, After Reducer:J,83 Bfter Reducer:K,60,70,69,60,80,65,60,80, After Reducer:K,68 15/07/05 20:35:42 INFO mapred.MapTask: Finished spill 0 15/07/05 20:35:42 INFO mapred.Task: Task:attempt_local577202179_0001_m_000001_0 is done. And is in the process of commiting 15/07/05 20:35:42 INFO mapred.LocalJobRunner: 15/07/05 20:35:42 INFO mapred.Task: Task 'attempt_local577202179_0001_m_000001_0' done. 15/07/05 20:35:42 INFO mapred.LocalJobRunner: Finishing task: attempt_local577202179_0001_m_000001_0 15/07/05 20:35:42 INFO mapred.LocalJobRunner: Map task executor complete. 
15/07/05 20:35:42 INFO mapred.JobClient: map 100% reduce 0% 15/07/05 20:35:43 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@8f544b 15/07/05 20:35:43 INFO mapred.LocalJobRunner: 15/07/05 20:35:43 INFO mapred.Merger: Merging 2 sorted segments 15/07/05 20:35:43 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 180 bytes 15/07/05 20:35:43 INFO mapred.LocalJobRunner: Bfter Reducer:A,58,73, After Reducer:A,65 Bfter Reducer:B,76,57, After Reducer:B,66 Bfter Reducer:C,57,72, After Reducer:C,64 Bfter Reducer:D,78,66, After Reducer:D,72 Bfter Reducer:E,64,81, After Reducer:E,72 Bfter Reducer:F,77,69, After Reducer:F,73 Bfter Reducer:G,67,73, After Reducer:G,70 Bfter Reducer:H,79,66, After Reducer:H,72 Bfter Reducer:I,70,78, After Reducer:I,74 Bfter Reducer:J,83,67, After Reducer:J,75 Bfter Reducer:K,58,68, After Reducer:K,63 15/07/05 20:35:44 INFO mapred.Task: Task:attempt_local577202179_0001_r_000000_0 is done. And is in the process of commiting 15/07/05 20:35:44 INFO mapred.LocalJobRunner: 15/07/05 20:35:44 INFO mapred.Task: Task attempt_local577202179_0001_r_000000_0 is allowed to commit now 15/07/05 20:35:44 INFO output.FileOutputCommitter: Saved output of task 'attempt_local577202179_0001_r_000000_0' to hdfs://localhost:9000/user/hadoop/output 15/07/05 20:35:44 INFO mapred.LocalJobRunner: reduce > reduce 15/07/05 20:35:44 INFO mapred.Task: Task 'attempt_local577202179_0001_r_000000_0' done. 
15/07/05 20:35:44 INFO mapred.JobClient: map 100% reduce 100%
15/07/05 20:35:44 INFO mapred.JobClient: Job complete: job_local577202179_0001
15/07/05 20:35:44 INFO mapred.JobClient: Counters: 22
15/07/05 20:35:44 INFO mapred.JobClient: File Output Format Counters
15/07/05 20:35:44 INFO mapred.JobClient: Bytes Written=55
15/07/05 20:35:44 INFO mapred.JobClient: FileSystemCounters
15/07/05 20:35:44 INFO mapred.JobClient: FILE_BYTES_READ=1703
15/07/05 20:35:44 INFO mapred.JobClient: HDFS_BYTES_READ=3520
15/07/05 20:35:44 INFO mapred.JobClient: FILE_BYTES_WRITTEN=205902
15/07/05 20:35:44 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=55
15/07/05 20:35:44 INFO mapred.JobClient: File Input Format Counters
15/07/05 20:35:44 INFO mapred.JobClient: Bytes Read=1408
15/07/05 20:35:44 INFO mapred.JobClient: Map-Reduce Framework
15/07/05 20:35:44 INFO mapred.JobClient: Reduce input groups=11
15/07/05 20:35:44 INFO mapred.JobClient: Map output materialized bytes=188
15/07/05 20:35:44 INFO mapred.JobClient: Combine output records=22
15/07/05 20:35:44 INFO mapred.JobClient: Map input records=352
15/07/05 20:35:44 INFO mapred.JobClient: Reduce shuffle bytes=0
15/07/05 20:35:44 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
15/07/05 20:35:44 INFO mapred.JobClient: Reduce output records=11
15/07/05 20:35:44 INFO mapred.JobClient: Spilled Records=44
15/07/05 20:35:44 INFO mapred.JobClient: Map output bytes=1056
15/07/05 20:35:44 INFO mapred.JobClient: CPU time spent (ms)=0
15/07/05 20:35:44 INFO mapred.JobClient: Total committed heap usage (bytes)=444071936
15/07/05 20:35:44 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
15/07/05 20:35:44 INFO mapred.JobClient: Combine input records=176
15/07/05 20:35:44 INFO mapred.JobClient: Map output records=176
15/07/05 20:35:44 INFO mapred.JobClient: SPLIT_RAW_BYTES=230
15/07/05 20:35:44 INFO mapred.JobClient: Reduce input records=22
Query the output results:
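The record counters line up with the shape of the data set (11 students, 8 records per student per file, 2 files, one combiner pass per map task as the trace shows). A quick sanity check in plain Java:

```java
// Cross-checks the Map-Reduce Framework record counters printed above
// against the structure of the hand-made data set.
public class CounterCheck {
    public static void main(String[] args) {
        int students = 11, rowsPerFile = 8, files = 2;
        int mapOutput = students * rowsPerFile * files; // one (name, score) pair per record
        int combineOutput = students * files;           // one partial average per student per map task
        int reduceOutput = students;                    // one final average per student
        // Matches: Map output records=176, Combine output records=22, Reduce output records=11
        System.out.println(mapOutput + " " + combineOutput + " " + reduceOutput); // prints "176 22 11"
    }
}
```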
Two issues I ran into along the way are worth noting:

(1) The map function uses StringTokenizer token = new StringTokenizer(textline); a word on what this does. StringTokenizer(String str) takes a String. In the wordcount example I was long puzzled how the raw text got split into individual words. The answer is that the input arrives through the default TextInputFormat, which has already divided the file into line records, so each call to map receives a single line, which is then split into words on whitespace. StringTokenizer also offers the constructor StringTokenizer(String str, String delim), where delim is the delimiter set; the default set is " \t\n\r\f". Back to this program: we only need to decide how to split a single line. Here the name and the score are separated by a tab, so plain new StringTokenizer(textline) is enough; there is no need to spell out "\t". (Earlier I passed "\t" explicitly and the split kept failing, perhaps because the escape was not interpreted as a real tab.)
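To illustrate the point about default delimiters, here is a minimal standalone sketch (not part of the job itself):

```java
import java.util.StringTokenizer;

// With no delimiter argument, StringTokenizer uses " \t\n\r\f",
// so a tab-separated record splits without naming "\t" explicitly.
public class TokenizerDemo {
    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("A\t55"); // one record, as in score1.txt
        String name = st.nextToken();                      // "A"
        int score = Integer.parseInt(st.nextToken());      // 55
        System.out.println(name + "," + score);            // prints "A,55"
    }
}
```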
(2) The printed trace puzzled me at first. From the messages it looks as though score1.txt is split into line records and fed through map, and the moment its last record is processed a reduce pass runs immediately; score2.txt behaves the same way; and once both files have gone through map plus this "reduce", one more reduce runs at the end. Is that really what happens? If so, it would seem to contradict the theory that reduce starts only after map has finished.
Looking at the code, we find:
job.setMapperClass(MapperClass.class);
job.setCombinerClass(ReducerClass.class);
job.setReducerClass(ReducerClass.class);
And there lies the answer. The real sequence is: map runs first, which is why every score record is printed; the "reduce" that follows each map is actually the combiner step that sits between map and reduce. So what is this Combiner class? The API shows that a combine is essentially a local reduce pass, and the job above explicitly uses ReducerClass for both the combiner and the reducer. That is why each file appears to go through map and then "reduce" on its own, and the real reduce runs only after both map tasks have finished.
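One caveat worth adding: reusing the reducer as a combiner is only safe when the operation is associative, and averaging is not. The job therefore computes an average of per-map averages, with integer truncation at each stage, rather than the true average over all 16 scores. Student A's numbers from the log above make the difference visible; a minimal standalone sketch:

```java
// Illustrates why an averaging reducer is not a safe combiner:
// combining per-file averages (what the job above does) differs
// from the true average over all scores.
public class AvgCombineDemo {
    public static void main(String[] args) {
        // Student A's scores, taken from the map log above
        int[] file1 = {55, 45, 51, 85, 35, 65, 85, 50}; // score1.txt
        int[] file2 = {65, 65, 71, 85, 65, 85, 85, 70}; // score2.txt
        int sum1 = 0, sum2 = 0;
        for (int s : file1) sum1 += s;
        for (int s : file2) sum2 += s;
        // What the job computes: integer average of the two combiner averages
        int avgOfAvgs = (sum1 / file1.length + sum2 / file2.length) / 2; // (58 + 73) / 2 = 65
        // The true integer average over all 16 scores
        int trueAvg = (sum1 + sum2) / (file1.length + file2.length);     // 1062 / 16 = 66
        System.out.println(avgOfAvgs + " vs " + trueAvg);                // prints "65 vs 66"
    }
}
```

A safer combiner would emit (sum, count) pairs and leave the division to the final reduce.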