A simple Hadoop application: counting words in a text file
2011-11-04 18:07
=============hadoop-0.12.2-core version===========================
MyMap.java
The map method emits every word of the input text file to the intermediate output as <key, value> pairs, for example:
Hadoop 1
Bye 1
Hadoop 1
World 1
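The tokenization that drives the map step can be tried in isolation; a minimal plain-Java sketch (no Hadoop required, class name TokenizeDemo is made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {
    public static void main(String[] args) {
        // StringTokenizer splits on whitespace by default,
        // which is exactly how the map method finds its words
        StringTokenizer stz = new StringTokenizer("Hadoop Bye Hadoop World");
        List<String> tokens = new ArrayList<String>();
        while (stz.hasMoreTokens()) {
            tokens.add(stz.nextToken());
        }
        System.out.println(tokens); // [Hadoop, Bye, Hadoop, World]
    }
}
```

Each token then becomes one <word, 1> pair in the intermediate output.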
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyMap extends MapReduceBase implements Mapper {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(WritableComparable key, Writable value,
                    OutputCollector output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer stz = new StringTokenizer(line);
        while (stz.hasMoreTokens()) {
            // emit <word, 1> for every whitespace-separated token on the line
            word.set(stz.nextToken());
            output.collect(word, one);
        }
    }
}
MyReduce.java
The reduce method receives each key together with an iterator; iterating over values yields every value emitted for that key.
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MyReduce extends MapReduceBase implements Reducer {

    public void reduce(WritableComparable key, Iterator values,
                       OutputCollector output, Reporter reporter) throws IOException {
        int sum = 0;
        // all values for the same key arrive through one iterator
        while (values.hasNext()) {
            sum += Integer.parseInt(values.next().toString());
        }
        output.collect(key, new IntWritable(sum));
    }
}
The job driver (main entry point):
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class JobTest {

    public int run(String... args) throws IOException {
        JobConf conf = new JobConf(new Configuration());
        conf.setJobName("wordCount");
        conf.setInputPath(new Path(args[0]));   // input file
        conf.setOutputPath(new Path(args[1]));  // output directory
        conf.setMapperClass(MyMap.class);
        conf.setReducerClass(MyReduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) {
        try {
            new JobTest().run("D:\\files\\wordCount.txt", "D:\\files\\wordCoutOut");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Opening D:\files\wordCoutOut\part-00000 shows the result:
Bye 3
Hadoop 4
Hello 3
World 2
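The whole map → shuffle → reduce pipeline above can be simulated locally in a few lines of plain Java. This is a hypothetical sketch for understanding the data flow, not Hadoop code (the class LocalWordCount is made up):

```java
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // map + shuffle + reduce collapsed into one local pass
    public static TreeMap<String, Integer> count(String text) {
        TreeMap<String, Integer> counts = new TreeMap<String, Integer>();
        StringTokenizer stz = new StringTokenizer(text);
        while (stz.hasMoreTokens()) {
            String word = stz.nextToken();               // map: emit <word, 1>
            Integer old = counts.get(word);              // shuffle: group by word
            counts.put(word, old == null ? 1 : old + 1); // reduce: sum the 1s
        }
        return counts; // TreeMap sorts keys, like the sorted part-00000 output
    }

    public static void main(String[] args) {
        System.out.println(count("Hello Hadoop Bye Hadoop"));
        // prints {Bye=1, Hadoop=2, Hello=1}
    }
}
```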
===========hadoop-0.20.2-core version========================
MyMap.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // in the new API, OutputCollector and Reporter are merged into Context
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
MyReduce.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // sum the 1s emitted for this word
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
JobTest.java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobTest {

    public int run(String... args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(JobTest.class);
        job.setMapperClass(MyMap.class);
        job.setCombinerClass(MyReduce.class); // the reducer doubles as a combiner
        job.setReducerClass(MyReduce.class);
        job.setOutputKeyClass(Text.class);          // reduce output key type
        job.setOutputValueClass(IntWritable.class); // reduce output value type
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        return 0;
    }

    public static void main(String[] args) throws InterruptedException, ClassNotFoundException {
        try {
            new JobTest().run("D:\\files\\wordCount.txt", "D:\\files\\wordCoutOut");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
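Reusing MyReduce as the combiner works because summation is associative and commutative: partial sums computed on the map side can themselves be summed again on the reduce side without changing the final count. A plain-Java sketch of that idea (class and method names are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class CombinerDemo {

    // sum a list of counts, as the reducer (or combiner) does for one key
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // counts for one word emitted by two separate map tasks
        List<Integer> mapTask1 = Arrays.asList(1, 1, 1);
        List<Integer> mapTask2 = Arrays.asList(1);

        // without a combiner: the reducer sees all four 1s
        int direct = sum(Arrays.asList(1, 1, 1, 1));

        // with a combiner: each map task pre-sums, the reducer sums the partials
        int combined = sum(Arrays.asList(sum(mapTask1), sum(mapTask2)));

        System.out.println(direct + " == " + combined); // both are 4
    }
}
```

A reducer is only safe to reuse as a combiner when its operation has this property; an average, for example, could not be plugged in the same way.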