Hadoop for Beginners: WordCount Word Frequency Count
2017-09-01 16:15
Contents
1. WordCount source code
2. Compiling the source
3. Running the job
4. Viewing the results
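Author: 何海洋
Source: http://hehaiyang.cnblogs.com/
This blog is mainly for learning, research, and sharing. Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.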
1. WordCount source code
Place the source file WordCount.java in the Hadoop 2.6.0 folder.
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for every whitespace-separated token in a line.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // Reducer (also used as the combiner): sums the counts for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      // Driver: args[0] is the input path, args[1] is the output path.
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }
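To make the data flow easier to follow, here is a minimal sketch of the same counting logic in plain Java, with no Hadoop dependency. The class name `LocalWordCount` and the method `count` are made up for illustration and are not part of the original program; the sketch only mirrors what `TokenizerMapper` and `IntSumReducer` do together, in memory and on a single thread.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of the original WordCount.java.
public class LocalWordCount {

    // Splits each line on whitespace (as TokenizerMapper does) and adds 1 to
    // the running total of every token (as IntSumReducer does per key).
    public static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> totals = new TreeMap<>(); // keys kept sorted for readable output
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                totals.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = count(Arrays.asList("hello hadoop", "hello world"));
        // Print in the same "word<TAB>count" shape as the job's text output.
        totals.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Like the mapper, this treats punctuation as part of a token, so `Hadoop` and `Hadoop,` are counted as different words.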
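2. Compiling the source
Compile WordCount.java into three .class files, then package them into a jar. This assumes JAVA_HOME is set and HADOOP_CLASSPATH includes ${JAVA_HOME}/lib/tools.jar, as described in the official Hadoop MapReduce tutorial.
    $ bin/hadoop com.sun.tools.javac.Main WordCount.java   # compile WordCount.java into three .class files
    $ jar cf wc.jar WordCount*.class                       # package the three .class files into a jar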
![](https://oscdn.geek-share.com/Uploads/Images/Content/202004/03/68cbd3620735c465fd30747166820583.png)
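The compiler produces three .class files because each nested class is written to its own file, named after the outer class, a `$`, and the nested class name. That is why the `WordCount*.class` pattern picks up all of them. The sketch below illustrates the naming rule with a stand-in class; `ClassFileNames` and `classFileName` are hypothetical names used only for this illustration.

```java
// Hypothetical stand-in for WordCount, used only to show the naming rule.
public class ClassFileNames {

    public static class TokenizerMapper {}
    public static class IntSumReducer {}

    // The binary name of a class (without any package prefix) plus ".class"
    // is the name of the file that javac writes for it.
    public static String classFileName(Class<?> c) {
        String binaryName = c.getName();
        return binaryName.substring(binaryName.lastIndexOf('.') + 1) + ".class";
    }

    public static void main(String[] args) {
        System.out.println(classFileName(ClassFileNames.class));
        System.out.println(classFileName(TokenizerMapper.class));
        System.out.println(classFileName(IntSumReducer.class));
    }
}
```

For the real program, the corresponding files are WordCount.class, WordCount$TokenizerMapper.class, and WordCount$IntSumReducer.class.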
3. Running the job
Create an input folder to hold the text to be counted.
    cd /opt/hadoop-2.6.0
    mkdir input
Copy the .txt files from the hadoop-2.6.0 folder into the input folder.
    cp *.txt /opt/hadoop-2.6.0/input
![](https://oscdn.geek-share.com/Uploads/Images/Content/202004/03/926e05652f5a411f625bd878c49432f3.png)
Run the job. The output folder is created automatically and holds the word count results.
    bin/hadoop jar wc.jar WordCount /opt/hadoop-2.6.0/input /opt/hadoop-2.6.0/output
![](https://oscdn.geek-share.com/Uploads/Images/Content/202004/03/5bc0dc5dbefb139ef8863e53f7a6d7d7.png)
![](https://oscdn.geek-share.com/Uploads/Images/Content/202004/03/7876dd93ea7390e614d29bbf0cc38697.png)
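One thing to watch when running the job a second time: Hadoop refuses to start a job whose output directory already exists, so the old output folder has to be removed first. In standalone mode the paths are on the local file system, so a plain `rm -r` on the folder works. The sketch below does the same thing in Java; `CleanOutputDir` and `deleteIfPresent` are hypothetical names, and the code assumes the path is a local directory rather than an HDFS location.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for local (standalone-mode) paths only.
public class CleanOutputDir {

    // Recursively deletes dir if it exists; returns true if something was removed.
    public static boolean deleteIfPresent(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CleanOutputDir <local output dir>");
            return;
        }
        Path out = Paths.get(args[0]);
        System.out.println(deleteIfPresent(out) ? "removed " + out : "nothing to remove at " + out);
    }
}
```

The directory must be passed explicitly; there is no default, to avoid deleting the wrong folder by accident.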
4. Viewing the results
    bin/hdfs dfs -cat /opt/hadoop-2.6.0/output/part-r-00000
![](https://oscdn.geek-share.com/Uploads/Images/Content/202004/03/fda8eb54ef3b99be6ee3aa348c830330.png)
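With the default text output format, each line of part-r-00000 is a word, a tab character, and its count, ordered by word rather than by frequency. As a rough sketch of how to find the most frequent words, the following plain Java program parses those lines and sorts them by count. `TopWords` and `top` are hypothetical names introduced here, and the file path is passed in as an argument rather than assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing helper, not part of the original article.
public class TopWords {

    // Parses "word<TAB>count" lines and returns the n entries with the highest counts.
    public static List<Map.Entry<String, Integer>> top(List<String> lines, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines that are not in word<TAB>count form
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            entries.add(new AbstractMap.SimpleEntry<>(word, count));
        }
        // Descending by count; the sort is stable, so ties keep their input order.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TopWords <path to part-r-00000>");
            return;
        }
        for (Map.Entry<String, Integer> e : top(Files.readAllLines(Paths.get(args[0])), 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In standalone mode the result file is on the local disk, so it can be passed to this program directly, for example as /opt/hadoop-2.6.0/output/part-r-00000.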
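At this point the WordCount job has run successfully, which confirms that the Hadoop standalone (single-machine) environment is set up correctly.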