Hive UDF Programming
- Write a class that extends org.apache.hadoop.hive.ql.exec.UDF
Add an evaluate method to the class.
"evaluate" should never be a void method. However, it can return "null" if needed.
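package com.cloudyhadoop.bigdata.udf;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

// Assumption: StringUtils comes from commons-lang; adjust to lang3 if that is what your build uses.
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class UDFLastDay extends UDF {
    private final SimpleDateFormat inputFormatter = new SimpleDateFormat("yyyy-MM-dd");
    private final SimpleDateFormat outFormatter = new SimpleDateFormat("yyyy-MM-dd");
    private final Calendar calendar = Calendar.getInstance();
    // Reused across calls to avoid allocating a new Text per row.
    private final Text result = new Text();

    // 2015-03-01 ==> 2015-03-31
    public Text evaluate(Text input) {
        if (null == input || StringUtils.isBlank(input.toString())) {
            return null;
        }
        try {
            calendar.setTime(inputFormatter.parse(input.toString()));
            // Largest day number in the parsed date's month
            int lastDate = calendar.getActualMaximum(Calendar.DATE);
            calendar.set(Calendar.DATE, lastDate);
            result.set(outFormatter.format(calendar.getTime()));
            return result;
        } catch (ParseException e) {
            // Unparseable input yields NULL rather than failing the query.
            e.printStackTrace();
            return null;
        }
    }
}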
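The date arithmetic inside evaluate can be tried without a Hive classpath. The following is a minimal sketch of the same "last day of the month" logic using java.time instead of SimpleDateFormat and Calendar. It is not the author's implementation: the class name LastDayOfMonth and the method lastDayOfMonth are illustrative, and unlike the lenient SimpleDateFormat, ISO_LOCAL_DATE rejects inputs such as 2015-3-1.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper, not part of Hive: mirrors the logic of UDFLastDay.evaluate on plain Strings.
final class LastDayOfMonth {
    // DateTimeFormatter is immutable, so a shared constant is safe.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ISO_LOCAL_DATE;

    // Returns the last day of the input's month as yyyy-MM-dd,
    // or null when the input is null, blank, or not a valid date.
    static String lastDayOfMonth(String input) {
        if (input == null || input.trim().isEmpty()) {
            return null;
        }
        try {
            LocalDate date = LocalDate.parse(input.trim(), FORMAT);
            return date.with(TemporalAdjusters.lastDayOfMonth()).format(FORMAT);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(lastDayOfMonth("2015-03-01")); // 2015-03-31
        System.out.println(lastDayOfMonth("2016-02-10")); // 2016-02-29 (leap year)
        System.out.println(lastDayOfMonth("not a date")); // null
    }
}
```

Inside a real UDF, the returned String would still have to be wrapped in a Text before being returned from evaluate.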
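- Package the class into a jar and copy it to a directory on the Linux host, for example: /home/hadoop/software/lib/udf.jar
- How do you make the UDF usable in Hive?
Method 1 (valid for the current session only):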
add jar /home/hadoop/software/lib/udf.jar ;
create temporary function getLastDay as 'com.cloudyhadoop.bigdata.udf.UDFLastDay';
show functions;
select empno, ename, hiredate, getLastDay(hiredate) last_day from emp;
Method 2 (jar available globally):
Add the following configuration to hive-site.xml:
<property>
<name>hive.aux.jars.path</name>
<value>file:///home/hadoop/software/lib/udf.jar</value>
</property>
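After Hive starts, running add jar /home/hadoop/software/lib/udf.jar ; is no longer necessary. The function itself still has to be created:
create temporary function getLastDay as 'com.cloudyhadoop.bigdata.udf.UDFLastDay';
temporary: the function exists only in the current session and is lost after you exit or restart Hive.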
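How can the function itself be made globally available?
1. Create a permanent function (supported since Hive 0.13.0), see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/DropFunction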
CREATE FUNCTION [db_name.]function_name AS class_name
[USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
2. Modify the Hive source code and register the UDF as a built-in function in FunctionRegistry:
https://github.com/cloudera/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
registerUDF("getLastDay", UDFLastDay.class, false);
Then recompile and redeploy Hive.
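For option 1 above, the CREATE FUNCTION grammar can be made concrete with a small sketch that assembles the statement for this article's function. The helper class CreateFunctionSql and the HDFS path in the usage example are hypothetical; the function name and class name are the ones used earlier. The helper does no quoting or escaping, so it is only suitable for trusted, literal inputs.

```java
// Hypothetical helper: builds a statement of the form
// CREATE FUNCTION [db_name.]function_name AS class_name [USING JAR 'file_uri' [, JAR 'file_uri'] ]
final class CreateFunctionSql {
    static String build(String dbName, String functionName, String className, String... jarUris) {
        StringBuilder sql = new StringBuilder("CREATE FUNCTION ");
        // The database prefix is optional in the grammar.
        if (dbName != null && !dbName.isEmpty()) {
            sql.append(dbName).append('.');
        }
        sql.append(functionName).append(" AS '").append(className).append("'");
        // Each resource gets its own JAR 'uri' clause, separated by commas.
        for (int i = 0; i < jarUris.length; i++) {
            sql.append(i == 0 ? " USING " : ", ");
            sql.append("JAR '").append(jarUris[i]).append("'");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // The jar location below is an assumed example path, not taken from the article.
        System.out.println(build(null, "getLastDay",
                "com.cloudyhadoop.bigdata.udf.UDFLastDay",
                "hdfs:///user/hadoop/lib/udf.jar") + ";");
    }
}
```

The printed statement can be pasted into the Hive CLI. Unlike a temporary function, a function created this way is recorded in the metastore, so it does not have to be recreated in every session.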