
MapReduce UnitTest

2013-08-01
We usually want to unit test our map and reduce functions against small data sets. The Mockito framework lets us mock the OutputCollector object (for Hadoop versions before 0.20.0) or the Context object (0.20.0 and later), so the functions can be exercised in isolation.

Below is a simple WordCount example (using the new API).

Before starting, add the following to the classpath:

1. All jar files under the Hadoop installation directory and its lib directory.

2. JUnit 4.

3. Mockito.

The map function:

Java code:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();   // the content of this line
        String[] words = line.split(";"); // split the line into words (';' is the separator in this example)
        for (String w : words) {
            word.set(w);
            context.write(word, one);
        }
    }
}

The reduce function:

Java code:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        Iterator<IntWritable> iterator = values.iterator(); // the values that share this key
        while (iterator.hasNext()) {
            sum += iterator.next().get();
        }
        context.write(key, new IntWritable(sum));
    }
}

The test class:

Java code:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.junit.Test;

public class WordCountMapperReducerTest {

    @Test
    @SuppressWarnings("unchecked")
    public void processValidRecord() throws IOException, InterruptedException {
        WordCountMapper mapper = new WordCountMapper();
        Text value = new Text("hello");
        // mock the Context object so we can verify what the mapper writes to it
        Mapper<LongWritable, Text, Text, IntWritable>.Context context = mock(Mapper.Context.class);
        mapper.map(null, value, context);
        verify(context).write(new Text("hello"), new IntWritable(1));
    }

    @Test
    @SuppressWarnings("unchecked")
    public void processResult() throws IOException, InterruptedException {
        WordCountReducer reducer = new WordCountReducer();
        Text key = new Text("hello");
        // {"hello", [1, 1, 2]}
        Iterable<IntWritable> values =
                Arrays.asList(new IntWritable(1), new IntWritable(1), new IntWritable(2));
        Reducer<Text, IntWritable, Text, IntWritable>.Context context = mock(Reducer.Context.class);
        reducer.reduce(key, values, context);
        verify(context).write(key, new IntWritable(4)); // {"hello", 4}
    }
}
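Since the mapper splits each line on ';', a line containing several words should produce one write per word. Here is a sketch of an additional test method for WordCountMapperReducerTest, following the same pattern (the method name processMultiWordRecord is ours):

@Test
@SuppressWarnings("unchecked")
public void processMultiWordRecord() throws IOException, InterruptedException {
    WordCountMapper mapper = new WordCountMapper();
    Mapper<LongWritable, Text, Text, IntWritable>.Context context = mock(Mapper.Context.class);
    // "hello;world" splits into two words, so the mapper should write twice
    mapper.map(null, new Text("hello;world"), context);
    verify(context).write(new Text("hello"), new IntWritable(1));
    verify(context).write(new Text("world"), new IntWritable(1));
}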

Concretely, the test feeds a single line of input, "hello", to the map function.

The map function processes the data and emits {"hello", 1}.

The reduce function takes the map output, sums the values that share the same key, and emits the result.
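The example above uses the new API. For Hadoop versions before 0.20.0, as noted at the start, you mock OutputCollector from org.apache.hadoop.mapred instead of Context. A minimal sketch, assuming a hypothetical OldApiWordCountMapper written against the old Mapper interface:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.junit.Test;

public class OldApiWordCountMapperTest {

    @Test
    @SuppressWarnings("unchecked")
    public void processValidRecordOldApi() throws IOException {
        // OldApiWordCountMapper is hypothetical: a WordCount mapper implementing
        // the old org.apache.hadoop.mapred.Mapper interface
        OldApiWordCountMapper mapper = new OldApiWordCountMapper();
        OutputCollector<Text, IntWritable> output = mock(OutputCollector.class);
        // the old-API map() takes an OutputCollector and a Reporter instead of a Context
        mapper.map(null, new Text("hello"), output, Reporter.NULL);
        verify(output).collect(new Text("hello"), new IntWritable(1));
    }
}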