Running the Hadoop WordCount Example in IDEA
2017-12-05 19:37
1. Extract the Hadoop distribution on the local machine, then add two system environment variables: HADOOP_HOME and HADOOP_USER_NAME, where HADOOP_USER_NAME is the user that actually runs jobs on the cluster.
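As an alternative to the system environment variable, the submitting user can also be set from code before the job is created; the Hadoop client falls back to the `HADOOP_USER_NAME` system property when the environment variable is absent. A minimal sketch, assuming the cluster user is named `hadoop` (replace with your actual user):

```java
public class HadoopUserSetup {
    public static void main(String[] args) {
        // Equivalent to setting the HADOOP_USER_NAME environment variable:
        // the Hadoop client reads this system property as a fallback to
        // decide which user submits the job.
        // "hadoop" is a placeholder; use your real cluster user name.
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        System.out.println(System.getProperty("HADOOP_USER_NAME"));
    }
}
```

Call this before `Job.getInstance(...)` so the property is in place when the client first resolves the current user.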
2. Modify the project's pom.xml
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-common</artifactId>
        <version>2.9.0</version>
    </dependency>
    <dependency>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
        <version>1.2</version>
    </dependency>
</dependencies>
3. Copy core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and log4j.properties into the resources directory
Add the following settings to mapred-site.xml:
<property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
</property>
<property>
    <name>mapred.jar</name>
    <value>E:\Projects\hadoop\HadoopExercise\target\HadoopExercise-1.0-SNAPSHOT.jar</value>
</property>
4. Example program
Mapper
package org.zheng.demo;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.StringTokenizer;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
Reducer
package org.zheng.demo;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
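Outside of Hadoop, the combined effect of the Mapper and Reducer above can be sketched in plain Java: tokenize each line on whitespace, emit (word, 1) pairs, then sum the counts per key. A minimal stand-alone sketch (the sample input lines are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {
    // Replicates TokenizerMapper + IntSumReducer on in-memory data:
    // tokenize on whitespace (map phase), then sum counts per word (reduce phase).
    public static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                count(new String[] {"hello world", "hello hadoop"});
        // Prints each word with its total count, e.g. hello=2, world=1, hadoop=1
        System.out.println(counts);
    }
}
```

This mirrors what the cluster job computes, minus the shuffle and the distributed I/O; it is useful for sanity-checking the logic before submitting to YARN.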
Main
package org.zheng.demo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Set the ResourceManager address here if it is not already
        // supplied by the yarn-site.xml on the classpath.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
5. Edit the run configuration and set the program arguments
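The two program arguments map to `args[0]` (input path) and `args[1]` (output path) in `WordCount.main`. The HDFS paths below are hypothetical examples; note that the output directory must not already exist, or the job will fail at submission:

```java
public class ArgsExample {
    public static void main(String[] args) {
        // Hypothetical program arguments as entered in the IDEA run
        // configuration, separated by a space:
        //   /user/hadoop/input /user/hadoop/output
        String[] programArgs = {"/user/hadoop/input", "/user/hadoop/output"};
        System.out.println("input  = " + programArgs[0]);
        System.out.println("output = " + programArgs[1]);
    }
}
```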
6. Run