Converting a Parquet timestamp to a long
2016-05-20 17:05
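When Spark SQL stores data in Parquet files, it writes Timestamp columns as 96-bit integers (INT96) to preserve precision, and the option spark.sql.parquet.int96AsTimestamp, which defaults to true, makes it interpret INT96 values as timestamps. This makes it awkward to parse Timestamp values from plain Java code. Below is a conversion utility class (the code was found on GitHub; I did not write it).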
package com.test.util;

import java.util.concurrent.TimeUnit;

import org.apache.parquet.io.api.Binary;

import com.google.common.primitives.Ints;
import com.google.common.primitives.Longs;

public class ParquetTimestampUtils {

    /**
     * Julian day offset: Julian day 2440588 corresponds to 1970-01-01.
     */
    private static final int JULIAN_EPOCH_OFFSET_DAYS = 2440588;
    private static final long MILLIS_IN_DAY = TimeUnit.DAYS.toMillis(1);
    private static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1);

    private ParquetTimestampUtils() {}

    /**
     * Returns GMT timestamp from binary encoded parquet timestamp
     * (12 bytes - julian date + time of day nanos).
     *
     * @param timestampBinary INT96 parquet timestamp
     * @return timestamp in millis, GMT timezone (0 if the input is not 12 bytes long)
     */
    public static long getTimestampMillis(Binary timestampBinary) {
        if (timestampBinary.length() != 12) {
            // Malformed input is mapped to 0 here; the original source threw instead:
            // throw new PrestoException(HIVE_BAD_DATA, "Parquet timestamp must be 12 bytes, actual " + timestampBinary.length());
            return 0;
        }
        byte[] bytes = timestampBinary.getBytes();

        // little endian encoding - need to invert byte order
        // bytes 0..7 hold the nanoseconds within the day, bytes 8..11 hold the Julian day
        long timeOfDayNanos = Longs.fromBytes(bytes[7], bytes[6], bytes[5], bytes[4], bytes[3], bytes[2], bytes[1], bytes[0]);
        int julianDay = Ints.fromBytes(bytes[11], bytes[10], bytes[9], bytes[8]);

        return julianDayToMillis(julianDay) + (timeOfDayNanos / NANOS_PER_MILLISECOND);
    }

    private static long julianDayToMillis(int julianDay) {
        return (julianDay - JULIAN_EPOCH_OFFSET_DAYS) * MILLIS_IN_DAY;
    }
}
Usage:
long date = ParquetTimestampUtils.getTimestampMillis(record.getInt96("CREATE_TIME", 0));
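To check the byte layout the utility class relies on, here is a minimal sketch of the same decoding using only JDK classes (ByteBuffer in little-endian order instead of Guava). The names Int96Sketch, toEpochMillis and encode are made up for this illustration and are not part of the original code, and encode exists only to build test input. Unlike the class above, this sketch throws on a wrong length rather than returning 0.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.TimeUnit;

public class Int96Sketch {

    // Julian day number of 1970-01-01, the same constant the utility class uses
    private static final int JULIAN_EPOCH_OFFSET_DAYS = 2440588;

    /**
     * Decodes a 12-byte INT96 timestamp into epoch milliseconds.
     * Layout (little endian): bytes 0..7 = nanoseconds within the day,
     * bytes 8..11 = Julian day.
     */
    public static long toEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException(
                    "INT96 timestamp must be 12 bytes, actual " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long timeOfDayNanos = buf.getLong(); // bytes 0..7
        int julianDay = buf.getInt();        // bytes 8..11
        return TimeUnit.DAYS.toMillis(julianDay - JULIAN_EPOCH_OFFSET_DAYS)
                + TimeUnit.NANOSECONDS.toMillis(timeOfDayNanos);
    }

    /** Hypothetical helper: builds an INT96 value in the same layout, for testing only. */
    public static byte[] encode(int julianDay, long timeOfDayNanos) {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(timeOfDayNanos)
                .putInt(julianDay)
                .array();
    }

    public static void main(String[] args) {
        // Julian day 2440588 at midnight is the Unix epoch
        System.out.println(toEpochMillis(encode(2440588, 0L))); // prints 0

        // One day and 1.5 seconds after the epoch
        System.out.println(toEpochMillis(encode(2440589, 1_500_000_000L))); // prints 86401500
    }
}
```

With a Parquet Binary value such as the one returned by record.getInt96("CREATE_TIME", 0), the byte array from getBytes() could be passed to toEpochMillis in the same way.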