
[hadoop] Running the WordCount example on Win7 with IDEA, Maven and Hadoop

2017-09-21 18:26
This post runs a simple Hadoop example; I have tested it successfully myself.

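The following environment is assumed to be in place already:

1. Win7 hosts three Ubuntu virtual machines, and Hadoop 2.8.1 is installed and working on them;

2. IDEA 2017 is installed on Win7;

3. Hadoop 2.8.1 is installed on Win7, with the related environment variables configured;

4. The prebuilt Windows binaries hadoop.dll and winutils.exe have been copied in; they must be the 2.8.1 build. See https://github.com/steveloughran/winutils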
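Prerequisites 3 and 4 are easy to get half right, so a quick check before running the job can save time. The sketch below is only an illustration: it assumes the environment variable from prerequisite 3 is named HADOOP_HOME and that both files were copied into its bin directory, neither of which is stated above. The class and method names are made up for this example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class NativeFilesCheck {
    // File names come from prerequisite 4; the bin location is an assumption.
    static final String[] REQUIRED = {"winutils.exe", "hadoop.dll"};

    /** Returns the required file names that are missing from hadoopHome/bin. */
    static List<String> missingNativeFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        Path bin = hadoopHome.resolve("bin");
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(bin.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: HADOOP_HOME is the variable configured in prerequisite 3.
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        Path homePath = Paths.get(home);
        System.out.println("Missing under " + homePath.resolve("bin") + ": "
                + missingNativeFiles(homePath));
    }
}
```

An empty list means both files were found. The check does not verify that the binaries are the 2.8.1 build.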
[Steps]

1. Refer to http://blog.csdn.net/u011654631/article/details/70037219, called the reference page below;

2. Create a Maven Java project in IDEA;

3. Add the Hadoop jars to pom.xml as shown on the reference page: hadoop-mapreduce-client-core, hadoop-hdfs, hadoop-mapreduce-client-jobclient (be sure to remove its provided scope), hadoop-mapreduce-client-common and hadoop-common;

4. Finally, view the counts with $hdfs dfs -cat /test/out/part-r-00000.
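The job writes its result with TextOutputFormat, which by default puts one key and one value on each line, separated by a tab. If you want to post-process what hdfs dfs -cat prints, a small parser like the following may help. It is a sketch only: the class name and the sample lines are invented for illustration and are not real output from this job.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartFileParser {
    /** Parses "key<TAB>count" lines into a map, keeping the order of the file. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Use the last tab, because the key itself may contain tabs.
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines that do not match the expected layout
            }
            String key = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(key, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not taken from a real run.
        List<String> sample = Arrays.asList("hadoop\t2", "hello\t3");
        System.out.println(parse(sample));
    }
}
```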

WordCount code

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;

public class WordCount extends Configured implements Tool {
    public int run(String[] strings) throws Exception {
        try {
            // Local Hadoop install directory on Win7.
            System.setProperty("hadoop.home.dir", "C:\\LearnTool\\hadoop");
            // User name the job is submitted as on the cluster.
            System.setProperty("HADOOP_USER_NAME", "chendajian");

            Configuration conf = getConf();
            // Jar built from this project; it is shipped to the cluster with the job.
            conf.set("mapreduce.job.jar", "C:\\Workspace\\javaweb\\hadoop\\out\\artifacts\\hadoop_jar\\hadoop.jar");
            //            conf.set("yarn.resourcemanager.hostname", "10.0.10.231");
            // Needed when submitting from Windows to a Linux cluster.
            conf.set("mapreduce.app-submission.cross-platform", "true");

            Job job = Job.getInstance(conf);
            job.setJarByClass(WordCount.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);

            job.setMapperClass(WcMapper.class);
            job.setReducerClass(WcReducer.class);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            // Remove the output directory left over from a previous run;
            // the job fails if the output path already exists.
            String out = "hdfs://10.0.10.231:9000/test/out";
            Path outPath = new Path(out);
            // Resolve the file system from the path itself, so this works
            // even when fs.defaultFS does not point at this HDFS instance.
            FileSystem fs = outPath.getFileSystem(conf);
            if (fs.exists(outPath)) {
                fs.delete(outPath, true);
            }

            FileInputFormat.setInputPaths(job, "hdfs://master:9000/test/testvim.txt");
            FileOutputFormat.setOutputPath(job, outPath);

            // Report job failure through the exit code instead of always returning 0.
            return job.waitForCompletion(true) ? 0 : 1;
        } catch (Exception e) {
            e.printStackTrace();
            return 1;
        }
    }

    public static class WcMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Note: the whole input line is used as the key, so this counts identical
            // lines; it equals a word count only when the input has one word per line.
            String mVal = value.toString();
            context.write(new Text(mVal), new LongWritable(1));
        }
    }

    public static class WcReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            // Sum the 1s emitted by the mapper for this key.
            long sum = 0;
            for (LongWritable lVal : values) {
                sum += lVal.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCount(), args));
    }
}
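The mapper above passes each input line through as one key. If the input file holds several words per line, the line would have to be split before counting. The following plain-Java sketch shows that split-then-sum logic without any Hadoop classes, so it can be tried locally; the class name, the method names and the whitespace-based splitting rule are my own assumptions, not part of the original job.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalWordCount {
    /** "Map" side: splits one line on runs of whitespace, dropping empty tokens. */
    static List<String> tokenize(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** "Reduce" side: adds 1 for every occurrence of every word. */
    static Map<String, Long> count(List<String> lines) {
        Map<String, Long> sums = new TreeMap<>();
        for (String line : lines) {
            for (String word : tokenize(line)) {
                sums.merge(word, 1L, Long::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Hypothetical input lines, for illustration only.
        System.out.println(count(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

In the real mapper the same idea would mean calling context.write once per token instead of once per line.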



A few additional notes:

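1. Copy core-site.xml, mapred-site.xml, yarn-site.xml and the other configuration files into the project's resources directory;

2. If access to hdfs://master:9000 is refused, try replacing master with the IP address;

3. The input file lives inside HDFS; on Linux it can only be accessed through hdfs dfs commands;

4. The 2.8.1 builds of hadoop.dll and winutils.exe have to be downloaded separately, see https://github.com/steveloughran/winutils;

5. For user permission problems, add the environment variable HADOOP_USER_NAME on Win7, set to the Hadoop user name;

6. Add a log4j.xml logging configuration file to the project's resources directory; for the XML content see http://www.cnblogs.com/ftrako/p/7570094.html

7. Remove the provided scope from the hadoop-mapreduce-client-jobclient dependency in pom.xml; if it is left in, the job does not use YARN mode and runs in local mode instead.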
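Notes 1 and 6 both depend on files in the resources directory actually ending up on the runtime classpath, which is where Hadoop's Configuration looks for core-site.xml and the other site files. A small check like the one below can confirm that before the job is submitted. It is a sketch under that assumption; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResourceCheck {
    /** Returns the given resource names that cannot be found on the classpath. */
    static List<String> missingOnClasspath(List<String> names) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            if (loader.getResource(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // File names taken from notes 1 and 6.
        List<String> expected = Arrays.asList(
                "core-site.xml", "mapred-site.xml", "yarn-site.xml", "log4j.xml");
        System.out.println("Not found on classpath: " + missingOnClasspath(expected));
    }
}
```

Run it from the same IDEA module as WordCount so that it sees the same classpath.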