
Hive UDAF Programming: Computing the Geometric Mean

2015-03-23 22:43
(1) Create a Map/Reduce project in Eclipse named GeoMeanPro. Before creating it, first copy the jar files under the hive/lib directory into the hadoop/lib directory;

(2) Add a class to the project: create the package com.hive.geomean.udaf and create GeoMean.java inside it;

(3) The code for GeoMean.java is:

package com.hive.geomean.udaf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.IntWritable;

public class GeoMean extends UDAF {

    public static class GeoMeanUDAFEval implements UDAFEvaluator {

        // Partial aggregation state shipped between tasks.
        public static class PartialResult {
            double sum;   // running product of the values seen so far (despite the name)
            long count;   // number of values aggregated so far
        }

        private PartialResult pResult;

        @Override
        public void init() {
            pResult = null;
        }

        // Entry point for each input value.
        public boolean iterate(IntWritable value) {
            if (value == null) {
                return true;
            }
            if (pResult == null) {
                pResult = new PartialResult();
                pResult.sum = 1;
                pResult.count = 0;
            }
            pResult.sum *= value.get();
            pResult.count++;
            return true;
        }

        // Returns the partial state accumulated so far (map side).
        public PartialResult terminatePartial() {
            return pResult;
        }

        // Combines a partial result from another task into this one.
        public boolean merge(PartialResult other) {
            if (other == null) {
                return true;
            }
            if (pResult == null) {
                pResult = new PartialResult();
                pResult.sum = 1;
                pResult.count = 0;
            }
            pResult.sum *= other.sum;
            pResult.count += other.count;
            return true;
        }

        // Final result: the count-th root of the product.
        public Double terminate() {
            if (pResult == null) {
                return null;
            }
            return Double.valueOf(Math.pow(pResult.sum, 1.0 / pResult.count));
        }
    }
}
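To see the evaluator lifecycle outside of Hive, the following standalone harness (not part of the original post; the class name GeoMeanTest and the map/reduce split are made up for illustration) drives init(), iterate(), terminatePartial(), merge(), and terminate() in the same order Hive would, using the three sample grades queried below:

package com.hive.geomean.udaf;

import org.apache.hadoop.io.IntWritable;

public class GeoMeanTest {
    public static void main(String[] args) {
        // "Map side": one evaluator aggregates 90 and 80...
        GeoMean.GeoMeanUDAFEval evalA = new GeoMean.GeoMeanUDAFEval();
        evalA.init();
        evalA.iterate(new IntWritable(90));
        evalA.iterate(new IntWritable(80));

        // ...while a second evaluator aggregates 70.
        GeoMean.GeoMeanUDAFEval evalB = new GeoMean.GeoMeanUDAFEval();
        evalB.init();
        evalB.iterate(new IntWritable(70));

        // "Reduce side": merge B's partial state into A and finish.
        evalA.merge(evalB.terminatePartial());
        System.out.println(evalA.terminate());  // expected: 79.58114415792782
    }
}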

(4) Export the project as a jar named geomean.jar and upload it to the /home/hadoop/class directory.

(5) Use the UDAF in Hive as follows:

hive> add jar /home/hadoop/class/geomean.jar;
Added /home/hadoop/class/geomean.jar to class path
Added resource: /home/hadoop/class/geomean.jar

hive> create temporary function geomean as 'com.hive.geomean.udaf.GeoMean';
OK
Time taken: 0.038 seconds

hive> select * from grade;
OK
1 90
2 80
3 70
Time taken: 0.112 seconds

hive> select geomean (grade) from grade;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201503221120_0057, Tracking URL = http://Masterpc.hadoop:50030/jobdetails.jsp?jobid=job_201503221120_0057
Kill Command = /usr/hadoop/libexec/../bin/hadoop job -kill job_201503221120_0057
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-03-23 22:36:57,988 Stage-1 map = 0%, reduce = 0%
2015-03-23 22:37:04,042 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
2015-03-23 22:37:05,063 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
......
2015-03-23 22:37:22,264 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.87 sec
MapReduce Total cumulative CPU time: 3 seconds 870 msec
Ended Job = job_201503221120_0057
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.87 sec HDFS Read: 228 HDFS Write: 18 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 870 msec
OK
79.58114415792782
Time taken: 44.677 seconds
hive>
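As a sanity check, the geometric mean of 90, 80 and 70 is (90 × 80 × 70)^(1/3) = 504000^(1/3) ≈ 79.5811, which matches the query output above.

One caveat about the implementation: because iterate() multiplies the raw values, pResult.sum can overflow a double (or collapse to 0) once a group grows large. A common alternative, not from the original post, is to accumulate the sum of logarithms and exponentiate at the end; it assumes all inputs are positive. A minimal sketch of the two methods that change (merge() would likewise add other.sum instead of multiplying):

// Hypothetical log-based variant: PartialResult.sum now holds the sum of ln(x).
public boolean iterate(IntWritable value) {
    if (value == null) {
        return true;
    }
    if (pResult == null) {
        pResult = new PartialResult();
        pResult.sum = 0;   // a sum of logs starts at 0, not 1
        pResult.count = 0;
    }
    pResult.sum += Math.log(value.get());  // assumes value > 0
    pResult.count++;
    return true;
}

public Double terminate() {
    if (pResult == null) {
        return null;
    }
    // exp(mean of the logs) is the geometric mean
    return Double.valueOf(Math.exp(pResult.sum / pResult.count));
}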