
Hive Dynamic Partitioning Explained

2014-05-27 12:27

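Set the following parameters to enable dynamic partitioning:
hive.exec.dynamic.partition=true
Default: false (true in Hive 0.9.0 and later)
Description: whether dynamic partitioning is allowed
hive.exec.dynamic.partition.mode=nonstrict
Default: strict
Description: strict mode prevents all partition columns from being dynamic; at least one partition column must be given a static value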
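In a Hive session these two parameters are typically applied with SET statements before the insert runs. The following is a minimal sketch, assuming the settings are wanted only for the current session rather than globally in the site configuration:

```sql
-- Allow dynamic partition inserts in this session.
set hive.exec.dynamic.partition=true;
-- Allow every partition column to be dynamic (no static partition value required).
set hive.exec.dynamic.partition.mode=nonstrict;
```

Issuing `set hive.exec.dynamic.partition.mode;` with no value prints the current setting, which is a quick way to confirm the change took effect.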
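Set the following parameters to configure the limits that apply to dynamic partitioning:
hive.exec.max.dynamic.partitions.pernode=100
Default: 100
Description: maximum number of dynamic partitions that each mapper or reducer may create
hive.exec.max.dynamic.partitions=1000
Default: 1000
Description: maximum total number of dynamic partitions that one DML statement may create
hive.exec.max.created.files=100000
Default: 100000
Description: maximum number of files that one DML statement may create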
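Before running a large dynamic partition insert, it can help to check how many partitions the statement would create and compare that against the limits above. The sketch below uses the j_web table from the example later in this article; the raised limit values are illustrative assumptions, not recommendations:

```sql
-- Count the distinct log_day values, i.e. the number of partitions the insert would create.
select count(distinct substring(time_stamp, 1, 8)) as partition_count
from j_web;

-- If the count exceeds the defaults, raise the limits for this session.
-- These numbers are placeholders chosen for illustration only.
set hive.exec.max.dynamic.partitions=2000;
set hive.exec.max.dynamic.partitions.pernode=500;
```

For the six sample rows in this article the count is 3, well under the defaults, so no change is needed there.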
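Set the following parameters to work around a restriction (Hive 0.7 and later no longer have this restriction; earlier versions could not combine dynamic partition inserts with output-file merging):
hive.merge.mapfiles=false
Default: true
Description: whether to merge the small files output by a map-only job
hive.merge.mapredfiles=false
Default: false
Description: whether to merge the small files output by a map-reduce job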
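On Hive 0.7 and later the merge settings can stay enabled alongside dynamic partition inserts, which may help when many partitions each receive only a few small files. A sketch, assuming merging is wanted for both job types:

```sql
-- Merge small output files at the end of map-only jobs.
set hive.merge.mapfiles=true;
-- Merge small output files at the end of map-reduce jobs.
set hive.merge.mapredfiles=true;
```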
Example

create table if not exists j_web
(
time_stamp string,
active_type string,
num string
)
row format delimited fields terminated by '|'
stored as textfile;

Load the following data into the j_web table: load data local inpath '/root/j_web' into table j_web;
Columns: timestamp | login method | count
20140220_111|web|1
20140220_222|web|2
20140221_111|web|1
20140221_222|web|2
20140222_111|web|1
20140222_222|web|2

create table if not exists j_test
(
time_stamp string,
active_type string,
num string
)
partitioned by (log_day string)
row format delimited fields terminated by '|'
stored as textfile;
Traditional (static) partitioning, one insert per partition:
insert overwrite table j_test
partition(log_day=20140220)
select time_stamp,active_type,num
from j_web
where time_stamp like '20140220%';
insert overwrite table j_test
partition(log_day=20140221)
select time_stamp,active_type,num
from j_web
where time_stamp like '20140221%';
insert overwrite table j_test
partition(log_day=20140222)
select time_stamp,active_type,num
from j_web
where time_stamp like '20140222%';
Dynamic partitioning, where the last column of the select supplies the partition value:
insert overwrite table j_test
partition(log_day)
select t.*,substring(t.time_stamp,1,8) as log_day
from (
select
time_stamp,
active_type,
num
from j_web
) t;
-- Equivalent form without the subquery
insert overwrite table j_test
partition(log_day)
select time_stamp,
active_type,
num,
substring(time_stamp,1,8) as log_day
from j_web;
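To confirm that the dynamic insert created one partition per day, the partitions can be listed and one of them read back. A minimal sketch against the j_test table above:

```sql
-- List the partitions created by the dynamic insert.
show partitions j_test;
-- With the sample data this should list:
-- log_day=20140220
-- log_day=20140221
-- log_day=20140222

-- Read back one partition; the predicate on log_day limits the scan to that partition.
select time_stamp, active_type, num
from j_test
where log_day = '20140221';
```

With the six sample rows, the select should return the two rows whose time_stamp starts with 20140221.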

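In strict mode at least one partition column must be static, which only makes sense for a table with more than one partition column. The sketch below illustrates this with a hypothetical table j_test_by_type that is not part of the original example; the static column has to come before the dynamic one in the partition clause:

```sql
-- Hypothetical table with two partition columns (an assumption for illustration).
create table if not exists j_test_by_type
(
time_stamp string,
num string
)
partitioned by (active_type string, log_day string)
row format delimited fields terminated by '|'
stored as textfile;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;

-- active_type is given a static value; log_day is filled in dynamically
-- from the last column of the select.
insert overwrite table j_test_by_type
partition(active_type='web', log_day)
select time_stamp,
num,
substring(time_stamp,1,8) as log_day
from j_web
where active_type = 'web';
```

Leaving out the static value, as in partition(active_type, log_day), would be rejected while the mode is strict.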