Running Hive jobs in parallel

2012-12-28 17:21

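Anyone who has used Oracle RAC will know what parallel execution is for.

Parallel execution can greatly speed up a task. It runs more work at the same time, but it does not reduce the total resources the task consumes.

Hive has a parallel execution option as well: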

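```sql
-- enable parallel execution of independent jobs (default: false)
set hive.exec.parallel=true;
-- maximum number of jobs a single SQL statement may run at the same time (default: 8)
set hive.exec.parallel.thread.number=16;
```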
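These settings apply to the current session. In the Hive CLI, issuing `set` with only a property name prints that property's current value. This is a quick way to confirm the options took effect before running the job:

```sql
-- print the current values in this session; each is echoed back as name=value
set hive.exec.parallel;
set hive.exec.parallel.thread.number;
```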

When a single SQL statement is compiled into several jobs, any jobs that do not depend on one another are launched in parallel.

For example:

```sql
from (
  select phone, to_phone, substr(to_phone,-1) as key
  from youni_contact4_lxw
  where youni_id='1'
    and length(to_phone) = 11
    and substr(to_phone,1,2) IN ('13','14','15','18')
  group by phone, to_phone, substr(to_phone,-1)
) t
insert overwrite table youni_contact41_lxw partition(pt='0')
select phone, to_phone where key='0'
insert overwrite table youni_contact41_lxw partition(pt='1')
select phone, to_phone where key='1'
insert overwrite table youni_contact41_lxw partition(pt='2')
select phone, to_phone where key='2'
insert overwrite table youni_contact41_lxw partition(pt='3')
select phone, to_phone where key='3'
insert overwrite table youni_contact41_lxw partition(pt='4')
select phone, to_phone where key='4'
insert overwrite table youni_contact41_lxw partition(pt='5')
select phone, to_phone where key='5'
insert overwrite table youni_contact41_lxw partition(pt='6')
select phone, to_phone where key='6'
insert overwrite table youni_contact41_lxw partition(pt='7')
select phone, to_phone where key='7'
insert overwrite table youni_contact41_lxw partition(pt='8')
select phone, to_phone where key='8'
insert overwrite table youni_contact41_lxw partition(pt='9')
select phone, to_phone where key='9';
```

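This SQL produces 11 jobs. The first job builds the intermediate (temporary) result, and all the other jobs depend on it, so nothing runs in parallel while it executes.

Once the first job completes, the remaining 10 jobs are all launched in parallel.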
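You can inspect this dependency structure before running anything. Prefixing a statement with `EXPLAIN` makes Hive print a `STAGE DEPENDENCIES` section showing which stages must wait for which. The sketch below uses a trimmed copy of the query above, keeping only two of the ten inserts for brevity. Stage names and counts, including extra move or stats stages, vary across Hive versions, so use it to look at the plan rather than as a prediction of its exact output:

```sql
-- show the stage plan and the dependencies between stages without running the job
EXPLAIN
from (
  select phone, to_phone, substr(to_phone,-1) as key
  from youni_contact4_lxw
  where youni_id='1'
    and length(to_phone) = 11
    and substr(to_phone,1,2) IN ('13','14','15','18')
  group by phone, to_phone, substr(to_phone,-1)
) t
insert overwrite table youni_contact41_lxw partition(pt='0')
select phone, to_phone where key='0'
insert overwrite table youni_contact41_lxw partition(pt='1')
select phone, to_phone where key='1';
```

Stages listed as depending only on the group-by stage, and not on each other, are the ones `hive.exec.parallel` can launch together.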

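Run time comparison:

Parallel execution disabled: 35 minutes

Parallel execution, hive.exec.parallel.thread.number=8: 10 minutes

Parallel execution, hive.exec.parallel.thread.number=16: 6 minutes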
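The speedups implied by these measurements are worked out below. The second line is only a rough model. It assumes the ten dependent jobs take similar time and that the thread number caps how many of them run at once:

$$
S_8 = \frac{35\ \text{min}}{10\ \text{min}} = 3.5\times, \qquad S_{16} = \frac{35\ \text{min}}{6\ \text{min}} \approx 5.8\times
$$

$$
\text{rounds}_8 = \left\lceil \frac{10}{8} \right\rceil = 2, \qquad \text{rounds}_{16} = \left\lceil \frac{10}{16} \right\rceil = 1
$$

Under that assumption, a limit of 8 forces the ten dependent jobs into two rounds, while a limit of 16 lets them all start together. That is consistent with the further drop from 10 to 6 minutes.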

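Of course, this only pays off when the cluster has spare resources. Otherwise the extra jobs simply wait for resources, and the parallelism never materializes.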
http://superlxw1234.iteye.com/blog/1703713
