
MapReduce Distributed Cache

2013-01-07 16:05

Overview
Purpose
Use cases
Example

Purpose

Copy a file stored in HDFS to the machines where the map/reduce tasks run, so that the map/reduce code can read it locally.

Use cases

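Joining a large file with a small one, for example a 10 GB file with a 10 MB file, where the two inputs may have completely different formats. The small file is loaded into memory in every task, so each record of the large file can be matched against it directly inside the mapper.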
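To make the join idea concrete, here is a minimal plain-Java sketch of a map-side join with no Hadoop dependency. The class name `MapSideJoin` and both record layouts (a comma-separated small file of `userId,userName` and a tab-separated large file of `timestamp<TAB>userId<TAB>url`) are hypothetical. In a real job the small lines would be read once in setup(), and the large records would arrive one at a time through map().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch of a map-side (replicated) join: the small input is loaded
 * into a HashMap once, then each record of the large input is looked up in it.
 */
final class MapSideJoin {

    // Small file, assumed format "userId,userName"; in a task these lines would be read in setup()
    static Map<String, String> loadSmall(Iterable<String> smallLines) {
        Map<String, String> nameById = new HashMap<>();
        for (String line : smallLines) {
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                nameById.put(parts[0].trim(), parts[1].trim());
            }
        }
        return nameById;
    }

    // Large-file record, assumed format "timestamp\tuserId\turl" (a different layout).
    // Returns "userName\turl", or null if the record is malformed or the id is unknown (inner join).
    static String joinRecord(String record, Map<String, String> nameById) {
        String[] fields = record.split("\t");
        if (fields.length < 3) {
            return null;
        }
        String name = nameById.get(fields[1]);
        return name == null ? null : name + "\t" + fields[2];
    }

    public static void main(String[] args) {
        Map<String, String> nameById = loadSmall(Arrays.asList("1,alice", "2,bob"));
        List<String> joined = new ArrayList<>();
        for (String record : Arrays.asList("t1\t1\t/home", "t2\t3\t/about", "t3\t2\t/cart")) {
            String out = joinRecord(record, nameById);
            if (out != null) {
                joined.add(out);
            }
        }
        joined.forEach(System.out::println);
    }
}
```

Running `main` prints the joined records for users 1 and 2. The record for user 3 has no match in the small table and is dropped, which is inner-join behavior. Keeping the small side in a HashMap is what lets the two inputs use unrelated formats: each side is parsed by its own code, and only the join key has to agree.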

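Example
Note that this example does not call the distributed cache API itself: the driver passes the file's HDFS path through the job configuration, and each task opens and reads the file directly from HDFS in setup().
Driver (main method) code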

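import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf); // replaces the deprecated new Job(conf)
    // Pass the small file's HDFS path ("fileHdfsLocation" is a placeholder) to the tasks under key "xyz"
    job.getConfiguration().set("xyz", "fileHdfsLocation");
    // ... set the mapper class and input/output paths, then call job.waitForCompletion(true)
}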

Mapper or Reducer class

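// Imports needed by this class (place them at the top of the enclosing file)
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.LineReader;

// The output key/value types were left as "xxx" in the original; Text/Text is assumed here.
// The input value type must be Text to match the map() signature below.
public static class LogMapper extends Mapper<Object, Text, Text, Text> {
    // Lines of the small file, loaded once per task
    private final Set<String> smallCollection = new HashSet<String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        Path fileIn = new Path(context.getConfiguration().get("xyz"));
        FileSystem hdfs = fileIn.getFileSystem(context.getConfiguration());
        // try-with-resources closes the HDFS stream even if reading fails
        try (FSDataInputStream hdfsReader = hdfs.open(fileIn)) {
            LineReader lineReader = new LineReader(hdfsReader);
            Text line = new Text();
            while (lineReader.readLine(line) > 0) {
                // you can do something here, e.g. parse the line before storing it
                smallCollection.add(line.toString());
            }
        }
    }

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // use smallCollection here, e.g. look up a field of value and emit the matching record
    }
}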
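The setup()/map() split above can be tried out without a cluster. The following sketch is a hypothetical, Hadoop-free stand-in for LogMapper: the class name `SmallSetFilter`, the IP-address contents of the small file, and the tab-separated log records are all assumptions made for illustration, and a BufferedReader replaces Hadoop's LineReader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/**
 * Hadoop-free sketch of the LogMapper pattern: setup() fills a set from a
 * stream once, and map() is then called once per record of the large input.
 */
final class SmallSetFilter {
    private final Set<String> smallCollection = new HashSet<>();

    // Counterpart of setup(): read every line of the small file into the set
    void setup(InputStream in) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                smallCollection.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counterpart of map(): keep a record only if its first tab-separated field is in the set
    boolean map(String value) {
        String key = value.split("\t", 2)[0];
        return smallCollection.contains(key);
    }

    public static void main(String[] args) {
        SmallSetFilter mapper = new SmallSetFilter();
        mapper.setup(new ByteArrayInputStream("10.0.0.1\n10.0.0.2\n".getBytes(StandardCharsets.UTF_8)));
        for (String record : new String[] {"10.0.0.1\tGET /", "10.0.0.9\tGET /x", "10.0.0.2\tPOST /y"}) {
            if (mapper.map(record)) {
                System.out.println(record); // a real mapper would call context.write(...) here
            }
        }
    }
}
```

Because the set is filled once in setup() and only read in map(), the cost of loading the small file is paid once per task rather than once per record. Running `main` prints the two records whose IP appears in the small file.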
