A First Look at MapReduce Programs -------------- WordCount
2017-06-22 14:50
Program code
```java
package test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        /*
         * Object (a LongWritable offset) is the input key type,
         * Text is the input value type,
         * Text-IntWritable is the output key-value pair type.
         */
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Convert the value produced by TextInputFormat to a String and tokenize it
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        /*
         * Text-IntWritable is the key-value pair type coming from the map side;
         * the output is a word -> frequency key-value pair.
         */
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();           // job configuration
        Job job = Job.getInstance(conf, "word count");      // initialize the job
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
Create the directories
```shell
[root@Vm90 wxl]# ls
output
[root@Vm90 wxl]# mkdir wordcount01
[root@Vm90 wxl]# cd wordcount01
[root@Vm90 wordcount01]# mkdir src
[root@Vm90 wordcount01]# mkdir classes
[root@Vm90 wordcount01]# ls
classes  src
[root@Vm90 wordcount01]# cd src/
[root@Vm90 src]# vim WordCount.java
# Paste the code above into WordCount.java, then compile
[root@Vm90 src]# cd ..
[root@Vm90 wordcount01]# ls
classes  src
# Compilation needs three jars:
#   hadoop-common-2.6.0.jar
#   hadoop-mapreduce-client-core-2.6.0.jar
#   hadoop-test-1.2.1.jar
# Pick the jars matching your own Hadoop version; this experiment used:
#   hadoop-common-2.6.0-cdh5.5.0.jar
#   hadoop-mapreduce-client-core-2.6.0-cdh5.5.0.jar
#   hadoop-test-2.6.0-mr1-cdh5.5.0.jar
[root@Vm90 wordcount01]# javac -Xlint:deprecation -classpath /opt/cloudera/parcels/CDH/jars/hadoop-common-2.6.0-cdh5.5.0.jar:/opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-core-2.6.0-cdh5.5.0.jar:/opt/cloudera/parcels/CDH/jars/hadoop-test-2.6.0-mr1-cdh5.5.0.jar -d classes/ src/*.java
# Output (the warning can be ignored):
/opt/cloudera/parcels/CDH/jars/hadoop-common-2.6.0-cdh5.5.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
1 warning
# Build the jar
[root@Vm90 wordcount01]# jar -cvf wordcount.jar classes/*
added manifest
adding: classes/test/(in = 0) (out = 0)(stored 0%)
adding: classes/test/WordCount.class(in = 1516) (out = 815)(deflated 46%)
adding: classes/test/WordCount$TokenizerMapper.class(in = 1746) (out = 758)(deflated 56%)
adding: classes/test/WordCount$IntSumReducer.class(in = 1749) (out = 742)(deflated 57%)
[root@Vm90 wordcount01]# ls
classes  src  wordcount.jar
# Upload the test data
[root@Vm90 input]# cat 2.txt
hello hadoop
bye hadoop
good java
great pytho
[root@Vm90 wxl]# ls
input  output  wordcount01
[root@Vm90 wxl]# hadoop fs -put input /hbase/
```
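As an aside, hard-coding CDH parcel jar paths into the `javac` command is brittle across versions. A sketch of an alternative, assuming the `hadoop` client command is on the PATH: `hadoop classpath` prints the full client classpath, so the compile step can reuse it directly.

```shell
# Sketch: compile against whatever Hadoop is installed, instead of
# hard-coding parcel jar paths. 'hadoop classpath' prints the client
# classpath as a colon-separated string.
javac -Xlint:deprecation -classpath "$(hadoop classpath)" -d classes/ src/*.java
```

This also keeps the command working after a CDH upgrade changes the jar version suffixes.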
Running the program at this point fails with an error
```shell
[root@Vm90 wxl]# hadoop jar wordcount01/wordcount.jar test.WordCount /hbase/input/ /hbase/output11
17/06/22 14:29:48 INFO client.RMProxy: Connecting to ResourceManager at Vm90/172.16.2.90:8032
17/06/22 14:29:49 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/06/22 14:29:49 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
17/06/22 14:29:49 INFO input.FileInputFormat: Total input paths to process : 1
17/06/22 14:29:49 INFO mapreduce.JobSubmitter: number of splits:1
17/06/22 14:29:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1497340925516_0008
17/06/22 14:29:50 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
17/06/22 14:29:50 INFO impl.YarnClientImpl: Submitted application application_1497340925516_0008
17/06/22 14:29:50 INFO mapreduce.Job: The url to track the job: http://Vm90:8088/proxy/application_1497340925516_0008/
17/06/22 14:29:50 INFO mapreduce.Job: Running job: job_1497340925516_0008
17/06/22 14:29:58 INFO mapreduce.Job: Job job_1497340925516_0008 running in uber mode : false
17/06/22 14:29:58 INFO mapreduce.Job:  map 0% reduce 0%
17/06/22 14:30:03 INFO mapreduce.Job: Task Id : attempt_1497340925516_0008_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class test.WordCount$TokenizerMapper not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:196)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class test.WordCount$TokenizerMapper not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
        ... 8 more
```
The cause is a packaging-path problem. The error says `test.WordCount$TokenizerMapper` was not found, and the jar was built with `jar -cvf wordcount.jar classes/*`, so every entry carries a `classes/` prefix: the class file sits at `classes/test/WordCount$TokenizerMapper.class` instead of `test/WordCount$TokenizerMapper.class`, which is the entry path the classloader derives from the package name. (The driver still ran because `hadoop jar` unpacks the jar and also puts its top-level `classes/` directory on the local classpath; that is also why the earlier "No job jar file set" warning appeared: `setJarByClass` found the class loaded from a directory rather than from a jar, so no jar was shipped to the map tasks.)
Repackage (this time from inside `classes/`, so entries start at `test/`):
```shell
[root@Vm90 wordcount01]# cd classes/
[root@Vm90 classes]# ls
test
[root@Vm90 classes]# jar -cvf ../wordcount.jar test
added manifest
adding: test/(in = 0) (out = 0)(stored 0%)
adding: test/WordCount.class(in = 1516) (out = 815)(deflated 46%)
adding: test/WordCount$TokenizerMapper.class(in = 1746) (out = 758)(deflated 56%)
adding: test/WordCount$IntSumReducer.class(in = 1749) (out = 742)(deflated 57%)
[root@Vm90 classes]# ls
test
[root@Vm90 classes]# cd ..
[root@Vm90 wordcount01]# ls
classes  src  wordcount.jar
# Run again
[root@Vm90 wxl]# hadoop jar wordcount01/wordcount.jar test.WordCount /hbase/input/ /hbase/output12
17/06/22 14:38:15 INFO client.RMProxy: Connecting to ResourceManager at Vm90/172.16.2.90:8032
17/06/22 14:38:16 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/06/22 14:38:16 INFO input.FileInputFormat: Total input paths to process : 1
17/06/22 14:38:16 INFO mapreduce.JobSubmitter: number of splits:1
17/06/22 14:38:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1497340925516_0009
17/06/22 14:38:17 INFO impl.YarnClientImpl: Submitted application application_1497340925516_0009
17/06/22 14:38:17 INFO mapreduce.Job: The url to track the job: http://Vm90:8088/proxy/application_1497340925516_0009/
17/06/22 14:38:17 INFO mapreduce.Job: Running job: job_1497340925516_0009
17/06/22 14:38:24 INFO mapreduce.Job: Job job_1497340925516_0009 running in uber mode : false
17/06/22 14:38:24 INFO mapreduce.Job:  map 0% reduce 0%
17/06/22 14:38:31 INFO mapreduce.Job:  map 100% reduce 0%
17/06/22 14:38:38 INFO mapreduce.Job:  map 100% reduce 50%
17/06/22 14:38:39 INFO mapreduce.Job:  map 100% reduce 100%
17/06/22 14:38:40 INFO mapreduce.Job: Job job_1497340925516_0009 completed successfully
17/06/22 14:38:40 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=120
                FILE: Number of bytes written=344705
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=146
                HDFS: Number of bytes written=54
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=2
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=4784
                Total time spent by all reduces in occupied slots (ms)=11119
                Total time spent by all map tasks (ms)=4784
                Total time spent by all reduce tasks (ms)=11119
                Total vcore-seconds taken by all map tasks=4784
                Total vcore-seconds taken by all reduce tasks=11119
                Total megabyte-seconds taken by all map tasks=4898816
                Total megabyte-seconds taken by all reduce tasks=11385856
        Map-Reduce Framework
                Map input records=4
                Map output records=8
                Map output bytes=79
                Map output materialized bytes=112
                Input split bytes=99
                Combine input records=8
                Combine output records=7
                Reduce input groups=7
                Reduce shuffle bytes=112
                Reduce input records=7
                Reduce output records=7
                Spilled Records=14
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=134
                CPU time spent (ms)=3220
                Physical memory (bytes) snapshot=874729472
                Virtual memory (bytes) snapshot=4710256640
                Total committed heap usage (bytes)=860356608
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=47
        File Output Format Counters
                Bytes Written=54
[root@Vm90 wxl]#
```
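With the job finished, the word counts land in `part-r-*` files under the output directory, one per reducer (two here, per the "Launched reduce tasks=2" counter). A sketch of inspecting them, assuming the same paths as above:

```shell
# List and print the reducer output files
hadoop fs -ls /hbase/output12
hadoop fs -cat /hbase/output12/part-r-*
```

Given the test data in 2.txt, `hadoop` should appear with a count of 2 and the other six words (hello, bye, good, java, great, pytho) with 1, matching the "Reduce output records=7" counter.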
Other references:
My First MapReduce Program: WordCount
MapReduce Programs: the map and reduce phases