Implementing a simple WordCount example in Hadoop
2017-03-29 10:47
Prerequisites:
Download a Hadoop release archive from the official site; this walkthrough uses version 2.7.3. Extract it to a directory on your machine, here D:\dev\hadoop-2.7.3.
Download link: http://apache.fayea.com/hadoop/common/hadoop-2.7.3/
![](http://img.blog.csdn.net/20170329103834176?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvdTAxMjY3OTU4Mw==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)
The src archive contains the source code; download it if you want to study the internals.
Configure the environment variable:
HADOOP_HOME = D:\dev\hadoop-2.7.3
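On Windows the variable can be set once from a command prompt, for example (the path matches the extraction directory used above; note that running Hadoop locally on Windows also expects winutils.exe under %HADOOP_HOME%\bin, which is not bundled in the Apache release):

```bat
:: assumes the archive was extracted to D:\dev\hadoop-2.7.3
setx HADOOP_HOME "D:\dev\hadoop-2.7.3"
setx PATH "%PATH%;%HADOOP_HOME%\bin"
```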
1. Create a new Maven project in IntelliJ IDEA.
2. Edit the project's pom.xml and add the following dependencies:
```xml
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
</dependency>
```
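For orientation, these dependencies sit inside the `<dependencies>` element of the pom.xml; a minimal enclosing skeleton might look like this (the groupId/artifactId coordinates here are placeholders, not from the original project):

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <!-- placeholder coordinates; use your own -->
    <groupId>com.hadoop</groupId>
    <artifactId>wordcount-demo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <!-- the junit and hadoop-* dependencies listed above go here -->
    </dependencies>
</project>
```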
3. Under the java source folder, create a package com.hadoop.wordcount (any name works).
Inside the package, create a class named WordCount with the following contents:
```java
package com.hadoop.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * WordCount
 *
 * @author: wychen
 * @time: 2017/3/20 20:25
 */
public class WordCount {

    static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(Object key, Text value,
                           Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // split the line on whitespace
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                // skip words shorter than 5 characters
                String tmp = itr.nextToken();
                if (tmp.length() < 5) {
                    continue;
                }
                word.set(tmp);
                context.write(word, one);
            }
        }
    }

    static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();
        private Text keyEx = new Text();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Reducer<Text, IntWritable, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                // double the map output count (each val is 1, so add 1 extra)
                sum += val.get() + 1;
            }
            result.set(sum);
            keyEx.set("输出:" + key.toString());
            context.write(keyEx, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // job configuration
        Configuration conf = new Configuration();
        // job name
        Job job = Job.getInstance(conf, "mywordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // input and output paths come from the program arguments
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // exit when the job finishes
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
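The mapper's length filter and the reducer's summing quirk can be sketched as plain Java outside of Hadoop (the class and method names below are made up for illustration, not part of the original project):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountLogic {

    // Mapper side: tokenize on whitespace and count only words of 5+ characters.
    static Map<String, Integer> mapCounts(String line) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            String tmp = itr.nextToken();
            if (tmp.length() < 5) {
                continue; // same filter as MyMapper
            }
            counts.merge(tmp, 1, Integer::sum);
        }
        return counts;
    }

    // Reducer side: mirrors sum += val.get() + 1 with val == 1,
    // so every raw occurrence count comes out doubled.
    static int reduceCount(int occurrences) {
        int sum = 0;
        for (int i = 0; i < occurrences; i++) {
            sum += 1 + 1;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = mapCounts("hello hadoop hello world ab");
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            System.out.println(e.getKey() + " -> " + reduceCount(e.getValue()));
        }
        // prints: hello -> 4, hadoop -> 2, world -> 2 ("ab" is filtered out)
    }
}
```

This makes the reducer's behavior easy to see: because every map output value is 1, `val.get() + 1` adds 2 per occurrence, so the final counts are exactly twice the true word frequencies.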
4. Create a log configuration file log4j.properties under the resources folder:
```properties
log4j.rootLogger=DEBUG, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%c{1} - %m%n
log4j.logger.java.sql.PreparedStatement=DEBUG
```
Now you can run the main function of the WordCount class directly.
First, configure the run parameters (the two program arguments are the input path and the output path):
![](http://img.blog.csdn.net/20170329104344444?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvdTAxMjY3OTU4Mw==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)
Then right-click inside the WordCount class and choose Run. The job's progress is printed to the console, and the word counts are written to the output path you specified.
The author's output looks like this:
![](http://img.blog.csdn.net/20170329104601851?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvdTAxMjY3OTU4Mw==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast)
That's it: a simple MapReduce word-count example is complete.