Hadoop learning notes - 2013.08.22.2 -- Modifying and compiling WordCount on hadoop1.2.1
2013-08-28 22:09
1. Code:
```java
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1) for every token.
    // The System.out calls are the debugging output added to the stock example. On a cluster
    // they go to each task attempt's stdout log, not to the client console.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            System.out.print("(" + key + ":" + value + ") ---map---> [");
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                System.out.print("(" + word + ":" + one + "), ");
                context.write(word, one);
            }
            System.out.println("]");
        }
    }

    // Reducer, also registered as the combiner: sums all counts for one word.
    // It prints the incoming values as #v1,v2,...# before writing (word, sum).
    // Because it is the combiner too, these prints can also show up in map task logs.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            String vals = "#";
            int sum = 0;
            for (IntWritable val : values) {
                vals += (val + ",");
                sum += val.get();
            }
            vals += "#";
            System.out.print("(" + key + ":" + vals + ") ---reduce---> ");
            result.set(sum);
            System.out.println("(" + key + ":" + result + ")");
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (-D, -fs, -jt, ...); what remains should be <in> <out>.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
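The debugging prints in the modified code trace each word through map, grouping and reduce. You can see the same flow without a cluster in the hypothetical plain-Java sketch below. The class name `LocalWordCountSketch` is made up, and no Hadoop jars are needed. The sketch tokenizes lines, groups the (word, 1) pairs by key in sorted order, and sums each group. It only illustrates the idea and does not reflect how Hadoop actually runs the job: there is no partitioning or spilling. The combiner step is also left out, because with a single in-memory "task" it would not change the result.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of WordCount's data flow:
// map -> group by key (stand-in for shuffle/sort) -> reduce.
public class LocalWordCountSketch {

    // Map step: emit (word, 1) per whitespace-separated token, mirroring TokenizerMapper.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<String, Integer>(itr.nextToken(), 1));
        }
        return out;
    }

    // Grouping step: collect all values per key; TreeMap keeps keys sorted,
    // roughly like the sorted keys a Hadoop reducer receives.
    static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the values for one key, mirroring IntSumReducer.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps over all lines and returns word -> count in sorted key order.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : group(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

On a real run, the job writes the same counts as tab-separated `word	count` lines to a `part-r-00000` file under the output directory.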
2. Required jars
hadoop-1.2.1\hadoop-core-1.2.1.jar
hadoop-1.2.1\lib\commons-cli-1.2.jar
3. Compile and run
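In Eclipse, create a new Java project and add the WordCount code from step 1 and the two jars from step 2 to its build path. Export the project as wordcount.jar, then run it on the Hadoop cluster. The command is: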
hadoop jar xxx/wordcount.jar org.apache.hadoop.examples.WordCount input output
(For the detailed run steps, see the article "Running WordCount on hadoop-1.2.1".)
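The command above can fail if a main class was set when exporting the jar from Eclipse, which writes a `Main-Class` entry into the manifest. My understanding of Hadoop 1.x's `RunJar` is that `hadoop jar` then launches that class directly. It passes `org.apache.hadoop.examples.WordCount` to `main` as the first program argument, so WordCount sees three arguments and prints `Usage: wordcount <in> <out>`. The hypothetical helper below reads the manifest with the standard `java.util.jar` API so you can tell which form of the command to use. `JarMainClassCheck` is not part of Hadoop.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper (not part of Hadoop): reports whether a jar's manifest declares Main-Class.
public class JarMainClassCheck {

    // Returns the Main-Class value, or null if the jar has no manifest or no Main-Class entry.
    public static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: JarMainClassCheck <path/to/wordcount.jar>");
            System.exit(2);
        }
        String mainClass = mainClassOf(args[0]);
        if (mainClass == null) {
            // No Main-Class: the class name must follow the jar path on the command line.
            System.out.println("No Main-Class: use hadoop jar <jar> org.apache.hadoop.examples.WordCount input output");
        } else {
            // Main-Class set: (assumed RunJar behavior) drop the class name from the command.
            System.out.println("Main-Class is " + mainClass + ": use hadoop jar <jar> input output");
        }
    }
}
```

If the helper reports a Main-Class, either drop the class name from the command or re-export the jar without a main class.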