
Hadoop Study Notes: MultipleOutputs, Writing Results to Multiple Specified Files or Directories

2015-10-12 17:33, 344 views

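MultipleOutputs lets a MapReduce job write its results to multiple specified files or directories.

There are three main steps:

1. Create a MultipleOutputs object in the map or reduce class: initialize it in setup() and close it in cleanup();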

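import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

class TestReducer extends Reducer<Text, Text, Text, Text> {

    // Writes results to multiple files or directories
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        mos = new MultipleOutputs<>(context);  // initialize mos
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();  // close mos so that all buffered output is flushed
    }
}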

2. In the map or reduce method, write data through the MultipleOutputs object instead of context.write();

// This method goes inside the TestReducer class from step 1
@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    .... // compute key and value
    // Write the data through the MultipleOutputs object
    if (key.toString().equals("file1")) {
        mos.write("file1", key, value);
    } else if (key.toString().equals("file2")) {
        mos.write("file2", key, value);
    }
}
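The title also mentions directories, which the snippet above does not show. A minimal sketch, assuming the same two named outputs (file1 and file2) are registered in the driver: the four-argument overload write(namedOutput, key, value, baseOutputPath) takes a path relative to the job output directory, so a baseOutputPath containing "/" places the files in a subdirectory. The class name DirReducer and the "/part" file prefix are illustrative choices, not from the original.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer: routes each key's records into its own subdirectory.
public class DirReducer extends Reducer<Text, Text, Text, Text> {

    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String name = key.toString();
        if (!name.equals("file1") && !name.equals("file2")) {
            return; // like the original snippet, other keys are not written
        }
        for (Text value : values) {
            // Files land under <job output dir>/<name>/, e.g. file1/part-r-00000
            mos.write(name, key, value, name + "/part");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // flush and close every record writer opened by mos
    }
}
```

The first argument must still be a registered named output; only the last argument controls where the file ends up.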

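3. When creating the job, define the additional named outputs. The names used here must match the names used in step 2;

Note that Hadoop does not accept a namedOutput that has not been registered. Each name must be registered in the main function before it can be written to; otherwise the job fails at runtime with a "not defined" error. So call MultipleOutputs.addNamedOutput in the main function for every namedOutput you use.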
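Registration is not the only constraint on the name. As far as I recall from the MultipleOutputs source, addNamedOutput also rejects an empty name, a name containing anything other than ASCII letters and digits, and the reserved name "part". The standalone sketch below approximates that rule so candidate names can be checked without a cluster. NamedOutputNames and isValid are hypothetical helpers, not Hadoop API, and the rule itself is an assumption that should be confirmed against your Hadoop version.

```java
// Hypothetical helper, not part of Hadoop: approximates the name rule
// that MultipleOutputs.addNamedOutput is assumed to enforce.
public class NamedOutputNames {

    public static boolean isValid(String name) {
        // Assumed rule: non-empty, and not the reserved default prefix "part"
        if (name == null || name.isEmpty() || name.equals("part")) {
            return false;
        }
        // Assumed rule: ASCII letters and digits only
        for (char ch : name.toCharArray()) {
            boolean letter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
            boolean digit = ch >= '0' && ch <= '9';
            if (!letter && !digit) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("file1"));   // true
        System.out.println(isValid("file_1"));  // false: underscore
        System.out.println(isValid("part"));    // false: reserved
    }
}
```

Under this assumption, names such as file1 and file2 are fine, while something like file_1 or out.txt would need to be renamed before registration.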

// Define the additional named outputs
MultipleOutputs.addNamedOutput(job, "file1", TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, "file2", TextOutputFormat.class, Text.class, Text.class);
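For context, the two registration calls sit in the driver alongside the usual job setup. A minimal sketch, assuming the TestReducer from step 1 is in the same package, the default identity Mapper is used, and the input is tab-separated text read with KeyValueTextInputFormat so that records arrive as Text key/value pairs. The class name MultipleOutputsDriver, the job name, and the use of LazyOutputFormat (which avoids creating empty default part-r-* files when all records go through mos) are my additions, not from the original.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver class; args[0] is the input path, args[1] the output path.
public class MultipleOutputsDriver {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multiple-outputs-demo");
        job.setJarByClass(MultipleOutputsDriver.class);

        // The identity Mapper is the default; each tab-separated line becomes (Text, Text).
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setReducerClass(TestReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Register every name that is later passed to mos.write(...)
        MultipleOutputs.addNamedOutput(job, "file1", TextOutputFormat.class, Text.class, Text.class);
        MultipleOutputs.addNamedOutput(job, "file2", TextOutputFormat.class, Text.class, Text.class);

        // Create the default part-r-* files only if something is actually written to them
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the three-argument mos.write from step 2, the reducer output files are named after the named output, for example file1-r-00000 and file2-r-00000 in the job output directory.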
