
Writing and Running Your Own WordCount Program on Linux

2015-04-29 09:47

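I. Environment

Ubuntu 14.04 + JDK 1.8.0_25 + Eclipse 3.8 + Hadoop 2.5.1

Three Linux machines (VirtualBox virtual machines on a bridged network with static IPs), already deployed as a fully distributed cluster.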

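II. Writing the WordCount Program

1. Start Eclipse and create a Java Project.

2. Configure the Java Project's build path. This step matters, and it took me a while to get right. The project needs external jar files; the Hadoop 2.5.1 jars are under hadoop-2.5.1/share/hadoop:

The configuration, filesystem, I/O, and utility classes are in hadoop-2.5.1/share/hadoop/common/hadoop-common-2.5.1.jar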

(
org.apache.hadoop.conf.Configuration
org.apache.hadoop.fs.Path
org.apache.hadoop.io.IntWritable
org.apache.hadoop.io.Text
org.apache.hadoop.util.GenericOptionsParser
)

The MapReduce classes are in hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.5.1.jar

(
org.apache.hadoop.mapreduce.Job
org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Reducer
org.apache.hadoop.mapreduce.lib.input.FileInputFormat
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
)

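The WordCount program needs three jars in total: hadoop-common-2.5.1.jar, hadoop-mapreduce-client-core-2.5.1.jar, and commons-cli-1.2.jar (required by GenericOptionsParser).

The other jars are not needed for now, and I am not sure what they are for, so I did not import them.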

3. Write the program

The code is as follows:

package com.zju.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1) per token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts received for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Let Hadoop consume its generic options; what remains are our own arguments.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        // Job.getInstance replaces the deprecated Job(Configuration, String) constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Block until the job finishes; exit code 0 on success, 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The code above is written against the new API (the org.apache.hadoop.mapreduce package).
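To make the data flow concrete, here is a minimal plain-Java sketch of what the job computes, with no Hadoop dependency. It is only an illustration under stated assumptions: the class name LocalWordCountSketch, its two helper methods, and the sample lines are all made up for this example, and the "splits" are ordinary in-memory lists. Each split is tokenized and pre-summed locally (the role the combiner plays), and the partial counts are then summed per word (the role of the reducer).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCountSketch {

    // "Map" plus "combine" for one input split: tokenize each line the same way
    // TokenizerMapper does, then pre-sum the (word, 1) pairs locally.
    static Map<String, Integer> mapAndCombine(List<String> lines) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                partial.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return partial;
    }

    // "Reduce": sum the partial counts for each word across all splits.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, standing in for two HDFS input splits.
        List<String> split1 = Arrays.asList("hello world", "hello hadoop");
        List<String> split2 = Arrays.asList("bye world");
        Map<String, Integer> totals =
                reduce(Arrays.asList(mapAndCombine(split1), mapAndCombine(split2)));
        // Print one "word<TAB>count" line per word.
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because addition is associative and commutative, summing partial sums gives the same totals as summing the individual ones, which is why the real job can reuse IntSumReducer as its combiner.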

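4. Compile the source and export a jar file

Build the MapReduce program and, once compilation finishes, export it as a jar file. When exporting, remember to specify the Main Class, that is, the class you want to run; here it is com.zju.hadoop.WordCount.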
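Specifying a main class at export time amounts to writing a Main-Class entry into the jar's META-INF/MANIFEST.MF. The following is a small sketch using the standard java.util.jar API to build such a manifest in memory and read the entry back; the class name ManifestSketch and its helper methods are made up for illustration, and no real jar file is touched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestSketch {

    // Build the kind of manifest an exported jar with a main class carries.
    static Manifest withMainClass(String mainClass) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Set Manifest-Version first, as jar manifests conventionally do.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        return manifest;
    }

    // Serialize the manifest, parse it again, and read the Main-Class entry back.
    static String roundTripMainClass(Manifest manifest) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            manifest.write(out);
            Manifest parsed = new Manifest(new ByteArrayInputStream(out.toByteArray()));
            return parsed.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Manifest manifest = withMainClass("com.zju.hadoop.WordCount");
        manifest.write(System.out); // show the manifest text
        System.out.println("Main-Class read back: " + roundTripMainClass(manifest));
    }
}
```

To check a jar you have actually exported, unpack it and look at META-INF/MANIFEST.MF; the Main-Class line should name the fully qualified class.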

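III. Running the WordCount Program

1. Switch to the user that has ssh configured; in my case this is the user jsj.

2. Results of an earlier run may interfere with this one, so to be safe I reformatted HDFS. Note that formatting erases all data stored in HDFS.

Before formatting, clear the contents of the directory that 'hadoop.tmp.dir' points to, on both the master and the slaves.

Also clear the contents of the directory that 'dfs.datanode.data.dir' points to on the slave nodes. Otherwise the old data still carries the previous cluster ID, which no longer matches the freshly formatted NameNode, and the DataNode process on the slaves will fail to start.

Then format the NameNode on the master (run the command from the /home/jsj/hadoop-2.5.1 directory):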

bin/hdfs namenode -format

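3. Start Hadoop (start-all.sh still works in Hadoop 2.x but is marked deprecated in favor of start-dfs.sh and start-yarn.sh):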

sbin/start-all.sh

Use the jps command to check which Hadoop processes are running, or open http://localhost:50070 in a browser on the master to view the NameNode status page.

4. Create the input directory

bin/hadoop fs -mkdir /input

5. Upload the test files to HDFS

bin/hadoop fs -put /home/jsj/test/file* /input

Check that the upload succeeded:

bin/hadoop fs -ls /input

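6. Put WordCount.jar in the /home/jsj directory and run the command below. The /output directory must not already exist, or the job will fail.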

bin/hadoop jar /home/jsj/WordCount.jar /input /output

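Note that the class name must not be given here. The jar was exported with a Main Class, so its manifest already names com.zju.hadoop.WordCount and hadoop jar runs that class directly. A class name on the command line would be passed through as an extra program argument, and the otherArgs.length != 2 check would then print the usage message and exit. This depends on the jar's manifest rather than on the Hadoop version: for a jar exported without a Main Class, the class name goes right after the jar path.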
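The argument handling can be modeled in a few lines. This is a simplified sketch of the rule hadoop jar follows, not Hadoop's actual code: the class RunJarArgsSketch and the method argsPassedToMain are hypothetical names, cliArgs stands for everything typed after the jar path, and manifestMainClass is the jar's Main-Class entry (null when the manifest has none).

```java
import java.util.Arrays;

public class RunJarArgsSketch {

    // Returns the arguments that would reach the program's main(String[]).
    static String[] argsPassedToMain(String manifestMainClass, String[] cliArgs) {
        if (manifestMainClass != null) {
            // The main class is already known, so every argument is passed through.
            return cliArgs.clone();
        }
        if (cliArgs.length == 0) {
            throw new IllegalArgumentException("no main class given");
        }
        // No Main-Class in the manifest: the first argument names the class.
        return Arrays.copyOfRange(cliArgs, 1, cliArgs.length);
    }

    public static void main(String[] args) {
        String mainClass = "com.zju.hadoop.WordCount";

        // Jar exported with a Main Class, invoked as in step 6: two arguments.
        String[] ok = argsPassedToMain(mainClass, new String[] {"/input", "/output"});
        System.out.println(ok.length + " args: " + Arrays.toString(ok));

        // Same jar, class name repeated on the command line: three arguments,
        // so WordCount's "otherArgs.length != 2" check would reject them.
        String[] extra = argsPassedToMain(mainClass,
                new String[] {mainClass, "/input", "/output"});
        System.out.println(extra.length + " args: " + Arrays.toString(extra));

        // Jar without a Main-Class entry: the class name is consumed first.
        String[] noManifest = argsPassedToMain(null,
                new String[] {mainClass, "/input", "/output"});
        System.out.println(noManifest.length + " args: " + Arrays.toString(noManifest));
    }
}
```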

7. View the results

bin/hadoop fs -ls /output

bin/hadoop fs -cat /output/*

This shows the word counts computed by WordCount.
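Since the job does not set an output format, it should use the default text output, which writes one line per key with the key and value separated by a tab. If you want to post-process the output of the cat command, a sketch like the following could parse it; the class OutputParseSketch, the parse method, and the sample lines are all hypothetical, and the tab separator is an assumption based on that default.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutputParseSketch {

    // Parse "word<TAB>count" lines into a map, preserving the order of the lines.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank or malformed lines
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed output format.
        List<String> sample = Arrays.asList("bye\t1", "hadoop\t1", "hello\t2", "world\t2");
        Map<String, Integer> counts = parse(sample);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " appears " + e.getValue() + " time(s)");
        }
    }
}
```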
