
Hadoop in Action, Part 1

Preface:
  People say this is the era of the grassroots. In recent years Hadoop and Spark have grown increasingly popular in China and are gradually becoming new favorites of enterprises. Before the DT (data technology) era fully arrives, getting hands-on with big data technology early puts you a step ahead. As the first post in this Hadoop series, this article walks through the two core pieces, HDFS and MapReduce, with two simple examples, in the hope that it offers some reference to readers just getting started with Hadoop.

--HDFS

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFStest {

    // HDFS paths used in this demo (NameNode at hadoop0:9000)
    final static String P_IN = "hdfs://hadoop0:9000/data";
    final static String P_F1 = "hdfs://hadoop0:9000/a.txt";

    public static void main(String[] args) throws IOException {

        // Obtain a FileSystem handle for the HDFS cluster; passing the URI
        // works even when fs.defaultFS is not set in core-site.xml
        FileSystem fileSystem = FileSystem.get(URI.create(P_IN), new Configuration());

        System.out.println("make directory:");
        fileSystem.mkdirs(new Path(P_IN));

        System.out.println("check whether the file exists:");
        System.out.println(fileSystem.exists(new Path(P_F1)));
    }
}
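
Beyond mkdirs() and exists(), the same FileSystem handle can copy local files into the cluster and read them back. Below is a minimal sketch of that, not part of the original post; the local path /tmp/a.txt and the class name are placeholders for illustration.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HDFSCopyTest {

    public static void main(String[] args) throws Exception {
        String dst = "hdfs://hadoop0:9000/a.txt";
        FileSystem fs = FileSystem.get(URI.create(dst), new Configuration());

        // Upload a local file into HDFS (the local path is a placeholder)
        fs.copyFromLocalFile(new Path("/tmp/a.txt"), new Path(dst));

        // Read the file back from HDFS and print it to stdout
        FSDataInputStream in = fs.open(new Path(dst));
        IOUtils.copyBytes(in, System.out, 4096, true);
    }
}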


--MapReduce

The following job counts how many times each word appears in a text file:


import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WC {

    static String INPUT = "hdfs://hadoop0:9000/hello";
    static String OUTPUT = "hdfs://hadoop0:9000/output";

    public static void main(String[] args) throws Exception {

        Job job = Job.getInstance(new Configuration(), WC.class.getSimpleName());
        job.setJarByClass(WC.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Key/value types of the map output and of the final (reduce) output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Input and output paths (the output directory must not already exist)
        FileInputFormat.setInputPaths(job, INPUT);
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT));

        // Submit the job and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            // Split the line into words and emit (word, 1) for each one
            String[] words = value.toString().split(" ");
            for (String word : words) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

    static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {

            // Sum all the 1s emitted for this word
            long sum = 0L;
            for (LongWritable c : values) {
                sum += c.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }
}


The code above is fairly simple: map() receives each line of the input as a Text value, splits it into words with split(" "), and emits each word with a count of 1.
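
As a concrete illustration (the input line below is an arbitrary example, not from the original post), a single line flows through the map and shuffle phases like this:

    input line:     "hello world hello"
    map emits:      (hello, 1), (world, 1), (hello, 1)
    after shuffle:  hello -> [1, 1]    world -> [1]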



The reduce() step is essentially an accumulation over a collection: for each word it receives all the 1s emitted by the mappers, sums them, and writes the total.
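
Continuing the same illustrative example, the reducer would produce:

    reduce(hello, [1, 1])  ->  (hello, 2)
    reduce(world, [1])     ->  (world, 1)

Once the class is packaged into a jar (wc.jar here is just a placeholder name), the job can be submitted with "hadoop jar wc.jar WC", and the word counts appear in the output directory as part-r-00000.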
