
Notes on setting up a Hadoop runtime on OS X and a Hadoop development environment inside Eclipse

2014-08-13 17:57

Download the Hadoop package from the official website.

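I use the stable Hadoop 1.2.1 release, which is the version I did almost all of my earlier testing on. The 2.x line is newer, but I passed on it out of concern about compatibility problems.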

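Official download page:
http://hadoop.apache.org/#Download+Hadoop
Because OS X is Unix underneath, after downloading you only need to unpack the archive on the command line and set JAVA_HOME in hadoop-env (conf/hadoop-env.sh in 1.2.1; OS X comes with its own JDK 1.6). After that you can run ./hadoop from the bin directory.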

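Then export environment variables containing the full installation path, so that hadoop can be run from any directory.

For example: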
export HADOOP_INSTALL=/Users/user/Hadoop/hadoop-1.2.1/
export PATH=$PATH:$HADOOP_INSTALL/bin
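A quick way to confirm that the exports took effect is to ask a Java process what it sees. The sketch below is a hypothetical helper (the class name HadoopEnvCheck is made up for illustration): it reads HADOOP_INSTALL and PATH, normalizes the double slash that the trailing '/' in HADOOP_INSTALL produces in PATH, and checks that bin/hadoop exists and is executable. It avoids Java 7+ features, so it should also compile on the bundled JDK 1.6.

```java
import java.io.File;

// Hypothetical helper (not part of Hadoop): checks that the HADOOP_INSTALL and
// PATH exports are visible to a Java process.
class HadoopEnvCheck {

    // Collapses repeated '/' and drops a trailing '/', so that
    // "/Users/user/Hadoop/hadoop-1.2.1//bin" and ".../hadoop-1.2.1/bin" compare equal.
    static String normalize(String path) {
        String n = path.replaceAll("/+", "/");
        if (n.length() > 1 && n.endsWith("/")) {
            n = n.substring(0, n.length() - 1);
        }
        return n;
    }

    // True if `dir` is one of the entries of a PATH-style variable.
    static boolean pathContains(String pathVar, String dir) {
        if (pathVar == null || dir == null) {
            return false;
        }
        String wanted = normalize(dir);
        for (String entry : pathVar.split(File.pathSeparator)) {
            if (normalize(entry).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String install = System.getenv("HADOOP_INSTALL");
        if (install == null) {
            System.err.println("HADOOP_INSTALL is not set in this environment");
            System.exit(1);
        }
        // HADOOP_INSTALL in the example ends with '/', so normalize after joining.
        String binDir = normalize(install + "/bin");
        File script = new File(binDir, "hadoop");
        System.out.println("bin directory on PATH: " + pathContains(System.getenv("PATH"), binDir));
        System.out.println("bin/hadoop exists:     " + script.isFile());
        System.out.println("bin/hadoop executable: " + script.canExecute());
    }
}
```

Run it from the same terminal session in which you exported the variables; a program started some other way (for example Eclipse launched from the Dock) may not see variables exported in a shell profile.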

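With that, your machine can run Hadoop in standalone (local) mode, or as a pseudo-distributed cluster once the configuration files under conf/ are set up.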
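Standalone mode needs no extra configuration. For pseudo-distributed mode, the Hadoop 1.x Single Node Setup guide sets one property in each of conf/core-site.xml, conf/hdfs-site.xml and conf/mapred-site.xml. As a sketch, this hypothetical Java program prints those three files; the ports 9000 and 9001 are the guide's example values, so adjust them if they clash with something else on your machine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: prints minimal pseudo-distributed settings for Hadoop 1.x,
// using the example values from the Single Node Setup guide.
class PseudoDistributedConfig {

    // Renders one *-site.xml file from an ordered name -> value map.
    // Values are not XML-escaped; the ones used here contain no special characters.
    static String siteXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n");
            sb.append("    <name>").append(e.getKey()).append("</name>\n");
            sb.append("    <value>").append(e.getValue()).append("</value>\n");
            sb.append("  </property>\n");
        }
        sb.append("</configuration>\n");
        return sb.toString();
    }

    // Convenience: a map holding a single property.
    static Map<String, String> single(String name, String value) {
        Map<String, String> m = new LinkedHashMap<String, String>();
        m.put(name, value);
        return m;
    }

    public static void main(String[] args) {
        System.out.println("== conf/core-site.xml ==");
        System.out.print(siteXml(single("fs.default.name", "hdfs://localhost:9000")));
        System.out.println("== conf/hdfs-site.xml ==");
        System.out.print(siteXml(single("dfs.replication", "1")));
        System.out.println("== conf/mapred-site.xml ==");
        System.out.print(siteXml(single("mapred.job.tracker", "localhost:9001")));
    }
}
```

After copying the output into the three files, HDFS has to be formatted once (bin/hadoop namenode -format) and the daemons started (bin/start-all.sh) before jobs can run against the pseudo-cluster.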

Next, integrate Hadoop with Eclipse.

Download page:
http://wiki.apache.org/hadoop/EclipsePlugIn
Simply pick the eclipse-plugin listed there.

To install it, copy the plugin into Eclipse's plugins directory and restart Eclipse.

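After restarting, watch for an error in red text when you create a Map/Reduce project. It means the plugin cannot find the Hadoop installation:

Invalid Hadoop Runtime specified; please click 'Configure Hadoop install directory' or fill in library location input field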

To fix it, open Window -> Preferences -> Map/Reduce in Eclipse and enter the Hadoop root directory there.
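If the error persists, a likely cause is a directory that is not the Hadoop root (for example its bin subdirectory). The sketch below checks a candidate directory for two things a Hadoop 1.2.1 root contains: a hadoop-core-&lt;version&gt;.jar at the top level and the bin/hadoop script. Exactly which files the plugin itself looks for is an assumption here, and the class name HadoopHomeCheck is hypothetical.

```java
import java.io.File;

// Hypothetical sanity check for the directory entered in the Map/Reduce preferences.
// Assumes a Hadoop 1.x layout: <root>/hadoop-core-<version>.jar and <root>/bin/hadoop.
class HadoopHomeCheck {

    static boolean looksLikeHadoop1Home(File root) {
        if (root == null || !root.isDirectory()) {
            return false;
        }
        boolean hasCoreJar = false;
        File[] children = root.listFiles();
        if (children != null) {
            for (File f : children) {
                String name = f.getName();
                if (f.isFile() && name.startsWith("hadoop-core-") && name.endsWith(".jar")) {
                    hasCoreJar = true;
                    break;
                }
            }
        }
        boolean hasScript = new File(new File(root, "bin"), "hadoop").isFile();
        return hasCoreJar && hasScript;
    }

    public static void main(String[] args) {
        // Check the directory given on the command line, or fall back to HADOOP_INSTALL.
        String dir = args.length > 0 ? args[0] : System.getenv("HADOOP_INSTALL");
        if (dir == null) {
            System.err.println("Usage: HadoopHomeCheck <hadoop-root> (or set HADOOP_INSTALL)");
            System.exit(1);
        }
        System.out.println(dir + " looks like a Hadoop 1.x root: " + looksLikeHadoop1Home(new File(dir)));
    }
}
```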

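Now create a project and try it out. The test program below checks whether the environment can actually run a job.

Copy the sample code into the project, then change the package name at the top of the file to match your own.

Here is the code: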

package testMapReduce;

import java.io.File;
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line into words and emits (word, 1) for each one.
    public static class DataMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // @Override makes the compiler reject a signature that does not really
        // override Mapper.map (otherwise the default identity map would run silently).
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Recursively deletes a file or directory on the LOCAL file system.
    // It does not touch HDFS, so it only helps when the job runs in standalone mode.
    public static void delFile(File file) {
        if (file.exists()) {
            if (file.isDirectory()) {
                File[] files = file.listFiles();
                if (files != null) { // listFiles() returns null on an I/O error
                    for (File child : files) {
                        delFile(child);
                    }
                }
            }
            file.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

        // Validate the arguments first; reading otherArgs[1] before this check
        // throws ArrayIndexOutOfBoundsException when fewer than two are given.
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        // The output directory must not exist, so remove one left over from a
        // previous local run.
        File out = new File(otherArgs[1]);
        if (out.isDirectory()) {
            delFile(out);
        }

        Job job = new Job(conf, "wordcount"); // Hadoop 1.x constructor
        job.setJarByClass(WordCount.class);
        job.setMapperClass(DataMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
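To see what the job should produce before running it, the same computation can be traced in plain Java with no Hadoop dependency. This hypothetical LocalWordCount emits a (word, 1) pair per token like the StringTokenizer-based mapper, groups the pairs by word in sorted key order (reducers receive their keys sorted), and sums each group like IntSumReducer. It is only an in-memory illustration of the data flow, not how Hadoop executes the job.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the WordCount data flow.
class LocalWordCount {

    // "Map": one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // "Shuffle": group values by key (TreeMap keeps keys sorted).
    // "Reduce": sum each group, as IntSumReducer does.
    static TreeMap<String, Integer> countWords(List<String> lines) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                List<Integer> values = grouped.get(pair.getKey());
                if (values == null) {
                    values = new ArrayList<Integer>();
                    grouped.put(pair.getKey(), values);
                }
                values.add(pair.getValue());
            }
        }
        TreeMap<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int v : group.getValue()) {
                sum += v;
            }
            result.put(group.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hello hadoop", "hello world");
        for (Map.Entry<String, Integer> e : countWords(input).entrySet()) {
            // Same "word<TAB>count" layout that the job writes to its part file.
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
        // prints:
        // hadoop	1
        // hello	2
        // world	1
    }
}
```

Registering IntSumReducer as the combiner as well only pre-sums counts on the map side; because addition is associative and commutative, the final totals are the same as without it.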

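The following split()-based map() can be used in place of the StringTokenizer version. (StringTokenizer is said to be on its way out: its Javadoc describes it as a legacy class and recommends String.split for new code. In terms of speed, though, StringTokenizer beats split() by a wide margin; see http://my.oschina.net/kyo153/blog/41829 if you are interested.)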

public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // Split on runs of whitespace, as StringTokenizer does. split(" ") would
    // yield empty strings for repeated spaces and leave tabs inside "words".
    String[] sp = value.toString().split("\\s+");
    for (int i = 0; i < sp.length; i++) {
        if (sp[i].isEmpty()) {
            continue; // a line that starts with whitespace yields an empty first element
        }
        word.set(sp[i]);
        context.write(word, one);
    }
}
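The two approaches also differ in behavior, not just speed. StringTokenizer's default delimiters are space, tab, newline, carriage return and form feed, and it never returns empty tokens. split(" ") splits on single spaces only, so two consecutive spaces produce an empty string (which a mapper would count as a word) and tabs stay inside tokens. split("\\s+") is close to StringTokenizer but returns one leading empty string when the line starts with whitespace. The hypothetical class below compares the three on the same line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical comparison of three ways to tokenize a line of text.
class TokenizerComparison {

    // Default delimiters " \t\n\r\f"; never yields empty tokens.
    static List<String> withStringTokenizer(String line) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            tokens.add(itr.nextToken());
        }
        return tokens;
    }

    // Splits on each single space; repeated spaces give empty strings.
    static List<String> withSingleSpaceSplit(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Splits on runs of whitespace and drops the possible leading empty string.
    static List<String> withWhitespaceSplit(String line) {
        List<String> tokens = new ArrayList<String>();
        for (String t : line.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "hello  hadoop\tworld"; // two spaces, then a tab
        System.out.println(withStringTokenizer(line));  // [hello, hadoop, world]
        System.out.println(withSingleSpaceSplit(line)); // 3 elements: "hello", "", "hadoop\tworld"
        System.out.println(withWhitespaceSplit(line));  // [hello, hadoop, world]
    }
}
```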

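WordCount needs two arguments to run: the input and output paths that main() passes to FileInputFormat.addInputPath and FileOutputFormat.setOutputPath (otherArgs[0] and otherArgs[1]). You can either edit the code and hard-code the two paths in place of otherArgs[0] and otherArgs[1] (also commenting out the otherArgs.length != 2 check) and then run and debug it directly, or give WordCount a run configuration with program arguments. We take the second approach here.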

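From the menu, choose Run -> Run Configurations, select Java Application, and click the icon in the upper-left corner of the dialog to create a new launch configuration.

In the new configuration, switch the dialog to the Arguments tab.

Fill in Program arguments, i.e. the parameters the program needs at run time. As written, the program expects two of them: the path of the source files to process and the path where the results are written, separated by a space. Set them to match your environment, then click Run.

The first argument is the input path and the second is the output path. The output path must not exist before the job runs; otherwise the job fails with an error.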
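Since the job fails when the output path already exists, a pre-flight check can report the problem with a clearer message before the job is submitted. This hypothetical sketch checks the two Program arguments with java.io.File, so it only covers local paths (standalone mode); for HDFS paths the check would have to go through Hadoop's FileSystem API instead.

```java
import java.io.File;

// Hypothetical pre-flight check for WordCount's two arguments (local paths only).
class JobPathCheck {

    // Returns null when the arguments look usable, otherwise a description of the problem.
    static String checkPaths(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: wordcount <in> <out>";
        }
        File in = new File(args[0]);
        File out = new File(args[1]);
        if (!in.exists()) {
            return "Input path does not exist: " + in.getPath();
        }
        if (out.exists()) {
            return "Output path already exists: " + out.getPath()
                    + " (delete it or choose a new directory)";
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = checkPaths(args);
        if (problem != null) {
            System.err.println(problem);
            System.exit(2);
        }
        System.out.println("Arguments look fine: " + args[0] + " -> " + args[1]);
    }
}
```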

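If the console prints the job's progress messages and the run ends without an exception, it has basically succeeded.

When it finishes, look in the output directory: if it contains a _SUCCESS file, the job succeeded.
Your results are in the part files (for example part-r-00000). :)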
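The same check can be scripted. This hypothetical sketch looks for the _SUCCESS marker in a local output directory and loads every part-* file, assuming the default TextOutputFormat layout of one word&lt;TAB&gt;count line per record. It reads files with java.io, so for output stored in HDFS you would first copy it to the local disk (for example with bin/hadoop fs -get).

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that inspects a WordCount output directory on the local disk.
class WordCountOutputReader {

    // A successfully completed job leaves an empty _SUCCESS marker file.
    static boolean succeeded(File outDir) {
        return new File(outDir, "_SUCCESS").isFile();
    }

    // Reads every part-* file, assuming "key<TAB>value" lines.
    // Other files (the _SUCCESS marker, hidden checksum files) are skipped.
    static Map<String, Integer> readCounts(File outDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        File[] files = outDir.listFiles();
        if (files == null) {
            return counts; // not a directory, or not readable
        }
        for (File f : files) {
            if (!f.isFile() || !f.getName().startsWith("part-")) {
                continue;
            }
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(f), "UTF-8"));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    int tab = line.lastIndexOf('\t');
                    if (tab < 0) {
                        continue; // not a key<TAB>value line
                    }
                    counts.put(line.substring(0, tab),
                            Integer.valueOf(line.substring(tab + 1).trim()));
                }
            } finally {
                reader.close();
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: WordCountOutputReader <output-dir>");
            System.exit(2);
        }
        File outDir = new File(args[0]);
        System.out.println("_SUCCESS present: " + succeeded(outDir));
        for (Map.Entry<String, Integer> e : readCounts(outDir).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```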
