
MR -> OutputFormat -> MultipleOutputs (multiple-file-name output format)

2015-01-15 10:44, 661 views

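Using MultipleOutputs in Hadoop 1.2.1 to write results to multiple files or directories

There are three main steps:

1. Create a MultipleOutputs object in the reducer or mapper class; it is what writes the results, and it must be closed when the task finishes

Java code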

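import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

class reduceStatistics extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Writes results to multiple files or multiple directories
    private MultipleOutputs<Text, IntWritable> mos;

    // Create the MultipleOutputs object once, when the task starts
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        mos = new MultipleOutputs<Text, IntWritable>(context);
    }

    // Close the MultipleOutputs object when the task ends
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}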

2. In the map or reduce method, write data through the MultipleOutputs object instead of calling context.write()

Java code

@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    IntWritable V = new IntWritable();
    int sum = 0;
    for (IntWritable value : values) {
        sum = sum + value.get();
    }
    System.out.println("word:" + key.toString() + "     sum = " + sum);
    V.set(sum);

    // Write the data through the MultipleOutputs object
    if (key.toString().equals("hello")) {
        mos.write("filePreFix1", key, V);
    } else if (key.toString().equals("world")) {
        mos.write("filePreFix2", key, V);
    } else if (key.toString().equals("hadoop")) {
        // Written to the file hadoop/hadoopfile-r-00000; the base output path
        // must include the file-name prefix, not only the directory
        mos.write("filePreFix3", key, V, "hadoop/hadoopfile");
    }
}
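To make the resulting file names concrete, here is a small plain-Java sketch that mimics the naming pattern seen in the hadoop/hadoopfile-r-00000 example: the base path (assumed to be the named output itself when no base path is passed), then -m or -r for a map or reduce task, then the partition number padded to five digits. OutputFileNames and partFile are hypothetical names written for illustration; they are not part of Hadoop, and the pattern is an assumption based on the file name quoted above.

```java
import java.util.Locale;

// Hypothetical helper, NOT a Hadoop class: it only mimics the assumed pattern
// <basePath>-<m|r>-<5-digit partition> for output file names.
final class OutputFileNames {
    private OutputFileNames() {}

    /**
     * @param basePath  the named output, or the explicit base output path
     * @param isReduce  true for reduce-task output, false for map-task output
     * @param partition the partition number of the task
     */
    static String partFile(String basePath, boolean isReduce, int partition) {
        String taskType = isReduce ? "r" : "m";
        return String.format(Locale.ROOT, "%s-%s-%05d", basePath, taskType, partition);
    }

    public static void main(String[] args) {
        System.out.println(partFile("filePreFix1", true, 0));       // filePreFix1-r-00000
        System.out.println(partFile("hadoop/hadoopfile", true, 0)); // hadoop/hadoopfile-r-00000
        System.out.println(partFile("hadoop/", true, 0));           // hadoop/-r-00000
    }
}
```

Under this assumed pattern, a base path that ends in a slash yields a file whose name starts with a dash, so it is probably worth passing the file-name prefix along with the directory.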

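3. When creating the job, define the additional named outputs; the names used here must be the same as the ones passed to mos.write() in step 2

Java code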

// Define the additional named outputs
MultipleOutputs.addNamedOutput(job, "filePreFix1", TextOutputFormat.class, Text.class, IntWritable.class);
MultipleOutputs.addNamedOutput(job, "filePreFix2", TextOutputFormat.class, Text.class, IntWritable.class);
MultipleOutputs.addNamedOutput(job, "filePreFix3", TextOutputFormat.class, Text.class, IntWritable.class);

=====================================================================================================

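4. Common pitfalls

4.1 In mos.write("filePreFix1", key, V); the named output (here filePreFix1) may contain only characters from [A-Za-z0-9]; the Hadoop source code enforces this.

Otherwise an exception is thrown: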

java.lang.IllegalArgumentException: Named cannot be have a '' char
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.checkNamedOutputName(MultipleOutputs.java:160)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:186)
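The character rule in 4.1 can be checked before the job is submitted. The following is a minimal sketch of such a check in plain Java; NamedOutputNames and isValid are hypothetical names, this is not the Hadoop source, and rejecting null or empty names is an assumption made here for safety.

```java
// Hypothetical pre-flight check, NOT the Hadoop implementation: it approximates
// the rule that a named output may contain only characters from [A-Za-z0-9].
final class NamedOutputNames {
    private NamedOutputNames() {}

    static boolean isValid(String name) {
        // Assumption: null and empty names are treated as invalid
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (char ch : name.toCharArray()) {
            boolean letter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
            boolean digit = ch >= '0' && ch <= '9';
            if (!letter && !digit) {
                return false; // e.g. '_', '-', '/' or '.' are rejected
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"filePreFix1", "file_prefix", "file-prefix"}) {
            System.out.println(name + " -> " + isValid(name));
        }
    }
}
```

A name such as file_prefix looks harmless but fails this rule because of the underscore; directory separators belong in the base output path argument, not in the named output.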



4.2 Every name passed to mos.write("filePreFix1", key, V); in reduce must have a matching MultipleOutputs.addNamedOutput(job,"filePreFix1",TextOutputFormat.class,Text.class,IntWritable.class); call in the main function. Otherwise an error is reported:

java.lang.IllegalArgumentException: Named output '' not defined
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.checkNamedOutputName(MultipleOutputs.java:193)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:363)
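One way to keep the two sides in sync is to declare each named output once as a constant and to compare the names that are written against the names that are registered. The sketch below shows that idea in plain Java; NamedOutputCheck, its constants, and undefinedNames are hypothetical and are not part of the Hadoop API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, NOT part of Hadoop: shared constants plus a check for
// names that are written to but were never registered with addNamedOutput.
final class NamedOutputCheck {
    // Reference these constants from both the driver and the reducer
    static final String HELLO_OUTPUT = "filePreFix1";
    static final String WORLD_OUTPUT = "filePreFix2";
    static final String HADOOP_OUTPUT = "filePreFix3";

    private NamedOutputCheck() {}

    /** Returns the used names that are missing from the registered names, sorted. */
    static Set<String> undefinedNames(Collection<String> registered, Collection<String> used) {
        Set<String> missing = new TreeSet<String>(used);
        missing.removeAll(registered);
        return missing;
    }

    public static void main(String[] args) {
        Collection<String> registered = Arrays.asList(HELLO_OUTPUT, WORLD_OUTPUT);
        Collection<String> used = Arrays.asList(HELLO_OUTPUT, WORLD_OUTPUT, HADOOP_OUTPUT);
        // filePreFix3 is written to but was never registered
        System.out.println(undefinedNames(registered, used)); // [filePreFix3]
    }
}
```

With shared constants, a typo in a name becomes a compile error instead of a failure inside a running task.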
