
Some Customizations in Hadoop

2015-03-27 10:49
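1. Custom counters
Counters are used to monitor the progress and status of a running Hadoop job.
Suppose the source file contains the following records (the code below splits each line on tabs):
a b
c d e f
g h i
To count how many records have more than 3 fields and how many have fewer than 3, a counter can be used. The code is as follows: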

// map() method of an old-API (org.apache.hadoop.mapred) Mapper.
public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
    // Split the record into tab-separated fields.
    String[] split = value.toString().split("\t");
    if (split.length > 3) {
        // More than 3 fields: increment counter "isLong" in group "MyCounter".
        reporter.getCounter("MyCounter", "isLong").increment(1);
    } else if (split.length < 3) {
        // Fewer than 3 fields: increment counter "isShort" in the same group.
        reporter.getCounter("MyCounter", "isShort").increment(1);
    }
}
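To see which of the sample records would hit which counter, the same rule can be tried outside Hadoop. The following is only a sketch: `FieldCountSketch` and its `count` method are hypothetical names, a plain `Map` stands in for the real counters, and the sample lines are assumed to be tab-separated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the mapper's counter logic; no Hadoop classes are used.
class FieldCountSketch {

    // Applies the same rule as the map() method to each line and returns
    // the two counter values keyed by counter name.
    static Map<String, Long> count(String[] lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("isLong", 0L);
        counters.put("isShort", 0L);
        for (String line : lines) {
            int fields = line.split("\t").length;
            if (fields > 3) {
                counters.merge("isLong", 1L, Long::sum);
            } else if (fields < 3) {
                counters.merge("isShort", 1L, Long::sum);
            }
            // Exactly three fields: neither counter changes.
        }
        return counters;
    }

    public static void main(String[] args) {
        // The sample input from the text, assuming tab-separated fields.
        String[] lines = {"a\tb", "c\td\te\tf", "g\th\ti"};
        System.out.println(count(lines)); // prints {isLong=1, isShort=1}
    }
}
```

The record with three fields ("g h i") matches neither condition, so only one record is counted as long and one as short.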
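2. Custom data types in Hadoop
Hadoop's built-in data types include:
BooleanWritable: a boolean value

ByteWritable: a single-byte value

DoubleWritable: a double-precision floating-point value

FloatWritable: a single-precision floating-point value

IntWritable: an integer value

LongWritable: a long integer value

Text: text stored in UTF-8 format

NullWritable: a placeholder used when the key or value in a <key, value> pair is empty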

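Implementing a custom data type:
1. Implement the Writable interface and override its write() and readFields() methods, so that the object can be serialized for network transfer and for file input or output.
2. If the data type is used as a key in MapReduce, the key must be comparable: implement the WritableComparable interface and override write(), readFields(), and compareTo().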

The code is as follows:

Code 1:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class Person implements Writable {
    long id;
    String name;
    long age;

    // Deserialize the fields in exactly the order write() emits them.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readLong();
        this.name = in.readUTF();
        this.age = in.readLong();
    }

    // Serialize the fields: id, then name, then age.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(name);
        out.writeLong(age);
    }

    @Override
    public String toString() {
        return "id:" + id + " name:" + name + " age:" + age;
    }

    public long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public long getAge() {
        return age;
    }

}
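A quick way to check that write() and readFields() agree is to serialize an object to a byte array and read it back. The sketch below does that with plain java.io streams. `PersonRecord` and `roundTrip` are hypothetical names; the class only mirrors the Person fields and does not implement Hadoop's Writable, so that it compiles with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical mirror of the Person fields, with the same method signatures
// as Writable but without the Hadoop dependency.
class PersonRecord {
    long id;
    String name;
    long age;

    PersonRecord() { }

    PersonRecord(long id, String name, long age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(name);
        out.writeLong(age);
    }

    // Reads the fields in the same order write() emitted them.
    void readFields(DataInput in) throws IOException {
        this.id = in.readLong();
        this.name = in.readUTF();
        this.age = in.readLong();
    }

    // Serializes a record to bytes and reads it back into a fresh instance.
    static PersonRecord roundTrip(PersonRecord original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            PersonRecord copy = new PersonRecord();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PersonRecord copy = roundTrip(new PersonRecord(1L, "alice", 30L));
        System.out.println(copy.id + " " + copy.name + " " + copy.age); // prints 1 alice 30
    }
}
```

If readFields() read the values in a different order, or discarded them instead of assigning them, the copy would not match the original, which is the kind of mistake this check is meant to catch.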
Code 2: comparison for a type used as a key

package cn.com.bonc.hadoop;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class PersonSortByAge implements WritableComparable<PersonSortByAge> {

    long id;
    String name;
    long age;

    // Assign each value to its field; reading without assigning would
    // leave the object empty after deserialization.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readLong();
        this.name = in.readUTF();
        this.age = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(name);
        out.writeLong(age);
    }

    // Order keys by age, as the class name indicates. Long.compare avoids
    // the overflow that casting a long difference to int can cause.
    @Override
    public int compareTo(PersonSortByAge o) {
        return Long.compare(this.age, o.age);
    }

    @Override
    public String toString() {
        return "id:" + id + " name:" + name + " age:" + age;
    }
}
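When the compared fields are of type long, casting their difference to int can produce the wrong sign, because the cast keeps only the low 32 bits. The sketch below contrasts that approach with Long.compare. `LongCompareSketch` and its two methods are hypothetical names used only for illustration.

```java
// Hypothetical helper class contrasting two ways of comparing long values.
class LongCompareSketch {

    // Subtraction followed by an int cast: the cast keeps only the low 32 bits.
    static int bySubtraction(long a, long b) {
        return (int) (a - b);
    }

    // Long.compare returns a negative, zero, or positive int without overflow.
    static int byLongCompare(long a, long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        long big = 3_000_000_000L;
        long small = 0L;
        // Negative result, although big > small.
        System.out.println(bySubtraction(big, small));
        // Positive result, as expected.
        System.out.println(byLongCompare(big, small));
    }
}
```

For small values such as ages the two approaches agree, but Long.compare stays correct for any pair of long values, so it is the safer default in compareTo().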