MapReduce UnitTest
2013-08-01 19:20
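Typically we want to unit test our map and reduce functions against a small data set. With the Mockito framework we can mock the OutputCollector object (old API, Hadoop versions before 0.20.0) or the Context object (new API, 0.20.0 and later).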
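As a rough, dependency-free illustration of what mocking does here, the sketch below replaces the Hadoop Context with a hand-written double that records every write so the test can inspect it afterwards. The names `Emitter`, `RecordingEmitter` and `splitAndEmit` are hypothetical and exist only for this illustration; they are not part of Hadoop or Mockito. Mockito's `mock` and `verify` automate the same record-then-check pattern.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordingDoubleSketch {
    /** Hypothetical stand-in for the single Context method the map logic needs. */
    interface Emitter {
        void write(String key, int value);
    }

    /** Test double: records each write as "key=value" instead of handing it to Hadoop. */
    static class RecordingEmitter implements Emitter {
        final List<String> calls = new ArrayList<>();

        @Override
        public void write(String key, int value) {
            calls.add(key + "=" + value);
        }
    }

    /** Logic under test: emit (word, 1) for every ';'-separated token of the line. */
    static void splitAndEmit(String line, Emitter out) {
        for (String w : line.split(";")) {
            out.write(w, 1);
        }
    }

    public static void main(String[] args) {
        RecordingEmitter out = new RecordingEmitter();
        splitAndEmit("hello", out);
        // Similar in spirit to verify(context).write(new Text("hello"), new IntWritable(1)).
        System.out.println(out.calls); // prints [hello=1]
    }
}
```

The point of the design is that the logic under test only sees an interface, so the test can substitute anything that records calls; Mockito simply generates that substitute at runtime.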
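Below is a simple WordCount example (using the new API).
Before starting, add the following to the classpath:
1. All jars in the Hadoop installation directory and its lib directory.
2. JUnit 4
3. Mockito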
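The map function:
Java code
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();    // content of the input line
        String[] words = line.split(";");  // words on a line are separated by ';'
        for (String w : words) {
            word.set(w);
            context.write(word, one);      // emit (word, 1)
        }
    }
}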
The reduce function:
Java code
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // values holds every count emitted for this key
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
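The test class:
Java code
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.junit.Test;

// Must be in the same package as the mapper and reducer, because map() and reduce() are protected.
@SuppressWarnings({"rawtypes", "unchecked"})
public class WordCountMapperReducerTest {
    @Test
    public void processValidRecord() throws IOException, InterruptedException {
        WordCountMapper mapper = new WordCountMapper();
        Text value = new Text("hello");
        Mapper.Context context = mock(Mapper.Context.class);
        mapper.map(null, value, context);  // the mapper ignores the key
        verify(context).write(new Text("hello"), new IntWritable(1));
    }

    @Test
    public void processResult() throws IOException, InterruptedException {
        WordCountReducer reducer = new WordCountReducer();
        Text key = new Text("hello");
        // input: {"hello",[1,1,2]}
        Iterable<IntWritable> values = Arrays.asList(
                new IntWritable(1), new IntWritable(1), new IntWritable(2));
        Reducer.Context context = mock(Reducer.Context.class);
        reducer.reduce(key, values, context);
        verify(context).write(key, new IntWritable(4));  // output: {"hello",4}
    }
}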
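Concretely, the first test passes a single line of data, "hello", to the map function, which processes it and emits {"hello",1}.
The second test hands the reduce function grouped values of the kind the map phase produces, {"hello",[1,1,2]}; the reduce function sums the values that share a key and emits {"hello",4}.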
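To make the data flow between the two functions concrete, here is a minimal plain-Java sketch of the map, group-by-key, and reduce steps with no Hadoop dependency. The class `WordCountFlowSketch` and its methods are hypothetical names for illustration, and the in-memory `shuffle` method only approximates the grouping that the Hadoop framework performs between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountFlowSketch {
    /** Map step: one (word, 1) pair per ';'-separated token, mirroring WordCountMapper. */
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : line.split(";")) {
            pairs.add(new SimpleEntry<>(w, 1));
        }
        return pairs;
    }

    /** Shuffle step: collect the values of each key into one list, sorted by key. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce step: sum the values of a single key, mirroring WordCountReducer. */
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map over every line, groups the pairs by key, then reduces each group. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(mapLine(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("hello;world", "hello")));
        // prints {hello=2, world=1}
    }
}
```

Calling `reduce(Arrays.asList(1, 1, 2))` returns 4, which matches the {"hello",[1,1,2]} case checked in the reducer test.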