
Hive Parameter Tuning

2015-12-02 07:39
I. Where to start optimizing

Session scope (each set statement below applies only to the current session)

Job priority (mapred.job.priority; values VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW)

set mapred.job.priority=VERY_HIGH;

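Reducers are launched only after a given fraction of all map tasks has finished, e.g. 90% (in Hadoop this threshold is mapred.reduce.slowstart.completed.maps, default 0.05)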

Intermediate MapReduce compression

set hive.exec.compress.intermediate=true;   enable compression of Hive's intermediate data

set mapred.compress.map.output=true;     enable Hadoop map-output compression

set mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;   use LZO as the codec (requires the hadoop-lzo library)

set mapred.compress.map.output.type=BLOCK;(Default)   block-level compression (record-level compression also exists): data is compressed one block at a time
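To see why block-level compression usually wins, here is a small Python sketch that compresses the same hypothetical rows once per record and once per block of 100 records. It uses the standard-library zlib module in place of LZO (an assumption made for portability), and the sample rows and block size are made up; it only illustrates the RECORD vs BLOCK trade-off, not Hadoop's actual on-disk format.

```python
import zlib

# Hypothetical sample rows; zlib stands in for LZO here.
records = [f"user_{i},click,2015-12-02\n".encode("utf-8") for i in range(1000)]


def record_level_size(rows):
    """RECORD-style: every row is compressed on its own."""
    return sum(len(zlib.compress(row)) for row in rows)


def block_level_size(rows, rows_per_block=100):
    """BLOCK-style: rows are buffered and compressed together, one block at a time."""
    total = 0
    for start in range(0, len(rows), rows_per_block):
        block = b"".join(rows[start:start + rows_per_block])
        total += len(zlib.compress(block))
    return total


print("raw bytes:         ", sum(len(row) for row in records))
print("record-level bytes:", record_level_size(records))
print("block-level bytes: ", block_level_size(records))
```

Compressing many similar rows together lets the codec exploit repetition across rows, which per-record compression cannot see.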

Split-size algorithm

goalSize=totalSize/mapred.map.tasks

splitSize=max(minSplitSize,min(goalSize,dfs.block.size))

Parameter settings

set mapred.min.split.size=32000000;

set mapred.map.tasks=20;

The default block size is 128M; suppose the input is a 4G file.

Applying the split algorithm

goalSize=4G/20=204.8M≈200M

splitSize=max(minSplitSize,min(goalSize,dfs.block.size))

            =max(32M,min(200M,128M))=128M
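The split arithmetic above can be checked with a short Python sketch. It assumes the old-API Hadoop FileInputFormat behavior, including the 10% "split slop" that lets the last split be slightly larger than splitSize; the function names are illustrative, not Hadoop APIs.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than splitSize


def compute_split_size(total_size, num_map_tasks, min_split_size, block_size):
    """splitSize = max(minSplitSize, min(goalSize, blockSize)), sizes in bytes."""
    goal_size = total_size // max(num_map_tasks, 1)
    return max(min_split_size, min(goal_size, block_size))


def count_splits(file_size, split_size):
    """Number of splits cut from one file of file_size bytes."""
    splits, remaining = 0, file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # the leftover tail becomes the final split
    return splits


total = 4 * 1024 * MB  # the 4G file from the example
size = compute_split_size(total, num_map_tasks=20,
                          min_split_size=32_000_000, block_size=128 * MB)
print(size // MB, "MB per split")         # 128 MB per split
print(count_splits(total, size), "maps")  # 32 maps
```

Note that the file still yields 32 maps although mapred.map.tasks=20: the value only feeds goalSize, so it acts as a hint rather than a hard count.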

Number of reducers

Dynamic reducer count

set hive.exec.reducers.bytes.per.reducer=200000000;       each reducer processes about 200M of input

set hive.exec.reducers.max=1500;            at most 1500 reducers
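A minimal sketch of the dynamic estimate, assuming Hive derives the reducer count roughly as ceil(input size / hive.exec.reducers.bytes.per.reducer), clamped to between 1 and hive.exec.reducers.max. Newer Hive versions may adjust this further, and estimate_reducers is a hypothetical name.

```python
import math


def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=200_000_000,  # hive.exec.reducers.bytes.per.reducer
                      max_reducers=1500):             # hive.exec.reducers.max
    """Approximate reducer count when mapred.reduce.tasks is not fixed."""
    reducers = math.ceil(total_input_bytes / bytes_per_reducer)
    reducers = max(1, reducers)          # always at least one reducer
    return min(max_reducers, reducers)   # never more than the configured cap


print(estimate_reducers(4 * 1024 ** 3))  # 4 GB input -> 22
print(estimate_reducers(1024 ** 4))      # 1 TB input hits the cap -> 1500
```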

Static reducer count

set mapred.reduce.tasks=900;    better not to hard-code this

Merging small input files

If each file is only a few KB, 10 files mean 10 map tasks. After merging, a single task can process them all, and the start-up time of the other tasks is saved.

set mapred.max.split.size=256000000;    maximum split size is 256M

set mapred.min.split.size.per.node=100000000;    minimum split size per node is 100M

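set mapred.min.split.size.per.rack=100000000;     minimum split size per rack is 100M (network cost: blocks left over on a node that add up to less than the per-node minimum are combined with other blocks on the same rack, and rack leftovers below the per-rack minimum are combined across racks, so data crosses the network only when necessary)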

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;(Default)   input format that combines Hive's small input files into larger splits
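The effect of combining can be pictured as packing small files into splits of at most mapred.max.split.size. The Python sketch below is a deliberately simplified greedy packer: it ignores the node/rack locality and the per-node/per-rack minimums that CombineHiveInputFormat actually honors, and combine_files is a hypothetical name.

```python
def combine_files(file_sizes, max_split_size=256_000_000):
    """Greedily pack file sizes (bytes) into splits no larger than max_split_size.

    Simplified: no locality, no per-node/per-rack minimums. A file larger than
    max_split_size simply gets a split of its own here.
    """
    splits, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > max_split_size:
            splits.append(current)  # close the split before it overflows
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits


# Ten 5 KB files end up in a single split, i.e. one map task instead of ten.
print(len(combine_files([5_000] * 10)))
```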

Merging small output files

set mapred.min.split.size=256000000;   minimum split size

set hive.merge.mapredfiles=true;   whether to merge the output files of map-reduce jobs (the reduce output)

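set hive.merge.smallfiles.avgsize=100000000;    average output file size (total reduce output divided by the number of reducers): a merge is triggered only when this average is below 100M, and files are then merged into about 256M each (hive.merge.size.per.task, default 256000000)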
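The merge trigger can be sketched as follows, assuming the check is simply "average output file size below hive.merge.smallfiles.avgsize" and that the merge job produces roughly total size / 256M files. plan_output_merge is a hypothetical helper, not a Hive API.

```python
import math


def plan_output_merge(output_file_sizes,
                      avg_threshold=100_000_000,  # hive.merge.smallfiles.avgsize
                      target_size=256_000_000):   # hive.merge.size.per.task
    """Return the approximate number of merged files, or None if no merge runs."""
    if not output_file_sizes:
        return None
    total = sum(output_file_sizes)
    average = total / len(output_file_sizes)
    if average >= avg_threshold:
        return None  # files are already large enough on average
    return max(1, math.ceil(total / target_size))


print(plan_output_merge([10_000_000] * 100))  # 100 x 10M outputs -> merged into 4 files
print(plan_output_merge([200_000_000] * 10))  # average 200M -> None (no merge)
```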

set mapred.combine.input.format.local.only=false;  whether to combine only files on the local node; false allows combining across machines as well

Parallel HQL execution: useless for simple HQL; it only helps when a query compiles into many independent jobs

set hive.exec.parallel=true;

set hive.exec.parallel.thread.number=8;(Default)   at most 8 jobs run at the same time
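Why parallel execution only helps queries with many jobs can be illustrated with a wave-based sketch: stages whose dependencies have finished may run together, capped at hive.exec.parallel.thread.number. This is a simplification (a real scheduler starts each stage as soon as its own inputs are ready), and the stage names are made up.

```python
def parallel_waves(dependencies, max_threads=8):
    """Group stages into waves that could run concurrently.

    dependencies maps each stage to the stages it depends on. A stage joins
    a wave once all of its dependencies are done; each wave is chunked so at
    most max_threads stages (hive.exec.parallel.thread.number) run at once.
    """
    done, waves = set(), []
    pending = dict(dependencies)
    while pending:
        ready = sorted(s for s, deps in pending.items() if set(deps) <= done)
        if not ready:
            raise ValueError("cyclic or missing stage dependencies")
        for start in range(0, len(ready), max_threads):
            waves.append(ready[start:start + max_threads])
        done.update(ready)
        for stage in ready:
            del pending[stage]
    return waves


# Two independent subqueries feeding a final stage: the first wave runs both.
print(parallel_waves({"union_a": [], "union_b": [], "final": ["union_a", "union_b"]}))
```

A linear chain of stages, typical of simple HQL, gets one stage per wave and therefore no parallelism.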

Data skew

set hive.map.aggr=true;(Default)   enable map-side aggregation: rows are pre-aggregated on the map side, which reduces network transfer and makes data skew less likely

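set hive.groupby.mapaggr.checkinterval=100000;   with map-side aggregation enabled, Hive checks the aggregation hash table every 100,000 rows (its size, and whether it reduces the row count enough to be worth keeping)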

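set hive.groupby.skewindata=true;   automatic load balancing for skewed data, at the cost of one extra job: the first job sends map output to reducers at random and aggregates locally, and the second job aggregates those partial results by the group-by key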
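The two-job plan behind hive.groupby.skewindata=true can be simulated in a few lines of Python. The sketch assumes a COUNT(*) ... GROUP BY key query and four reducers; two_phase_count and the seeded random generator are illustrative only.

```python
import random
from collections import Counter


def two_phase_count(keys, num_reducers=4, seed=0):
    """Count rows per key using two jobs, as hive.groupby.skewindata=true does.

    Job 1: each row goes to a random reducer, which counts locally, so a hot
    key is spread over all reducers instead of piling onto one.
    Job 2: the partial counts are merged by the real group-by key.
    """
    rng = random.Random(seed)
    partials = [Counter() for _ in range(num_reducers)]
    for key in keys:
        partials[rng.randrange(num_reducers)][key] += 1  # job 1: random distribution
    final = Counter()
    for partial in partials:                              # job 2: merge by key
        final.update(partial)
    return final, partials


final, partials = two_phase_count(["hot"] * 1000 + ["a", "b", "c"])
print(final["hot"], [p["hot"] for p in partials])
```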

Handling data skew with rand()

Add a random value to the skewed field so that its rows are spread randomly across different nodes.
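A sketch of key salting, assuming rows are routed to reducers by hashing the key. md5 is used as a deterministic stand-in for Hive's own hash function, and reducer_loads, partition_of and salt_buckets are hypothetical names. After aggregating on the salted key, a second step must strip the salt and combine the partial results.

```python
import hashlib
import random


def partition_of(key, num_reducers):
    """Deterministic hash partitioner (md5 stands in for Hive's hash)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_reducers


def reducer_loads(keys, num_reducers=4, salt_buckets=0, seed=0):
    """Rows per reducer; with salt_buckets > 0 each key gets a random suffix,
    similar in spirit to concat(key, rand()) in HQL."""
    rng = random.Random(seed)
    loads = [0] * num_reducers
    for key in keys:
        if salt_buckets:
            key = f"{key}_{rng.randrange(salt_buckets)}"
        loads[partition_of(key, num_reducers)] += 1
    return loads


hot_keys = ["hot"] * 1000
print("unsalted:", reducer_loads(hot_keys))  # every row lands on one reducer
print("salted:  ", reducer_loads(hot_keys, salt_buckets=32))
```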

Setting map and reduce concurrency

set mapred.job.map.capacity=2000;   maximum number of map tasks this job runs at once; without a limit, the job grabs resources from other jobs

set mapred.job.reduce.capacity=2000;

Automatic MapJoin conversion

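Put the small table on the left: in a regular join Hive buffers the earlier tables and streams the last (rightmost) one, so the small table can be held in memory.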

set hive.auto.convert.join=true;  enable automatic conversion

set hive.mapjoin.smalltable.filesize=25000000;   convert automatically when the small table is under 25M
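The idea behind MapJoin, building an in-memory hash table from the small table and streaming the big table past it without a shuffle, can be sketched in Python. The 25M threshold mirrors hive.mapjoin.smalltable.filesize; map_join, can_map_join and the sample columns are hypothetical.

```python
def can_map_join(small_table_bytes, threshold=25_000_000):
    """Mirror of hive.mapjoin.smalltable.filesize: small enough to load in memory?"""
    return small_table_bytes < threshold


def map_join(big_rows, small_rows, key):
    """Inner join: hash the small table in memory, then stream the big table."""
    table = {}
    for row in small_rows:                     # build side (small table)
        table.setdefault(row[key], []).append(row)
    for row in big_rows:                       # probe side, read once
        for match in table.get(row[key], []):
            yield {**match, **row}


users = [{"uid": 1, "name": "a"}, {"uid": 2, "name": "b"}]
orders = [{"uid": 1, "amount": 10}, {"uid": 2, "amount": 5}, {"uid": 3, "amount": 7}]
print(list(map_join(orders, users, "uid")))
```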

Tables with few duplicate keys can also go on the left, which makes the traversal cheaper.

sort by

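Sorts rows by the given fields before they enter each reducer, so each reducer's output is sorted; this local sort is much faster than sorting the whole data set.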

ORDER BY, by contrast, sorts globally and can only be done in a single reducer.

distribute by

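Has the same effect as the partitioner in MapReduce: rows with the same value of the given fields go to the same reducer.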

cluster by 

Combines SORT BY and DISTRIBUTE BY on the same fields (CLUSTER BY x is equivalent to DISTRIBUTE BY x SORT BY x).
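The difference between these clauses can be simulated with rows routed to hypothetical reducers. The sketch assumes hash partitioning with Python's built-in hash (which, for small integers, is the integer itself) and is illustrative only; the function names mirror the HQL clauses but are not Hive APIs.

```python
def distribute_by(rows, key, num_reducers):
    """DISTRIBUTE BY: route each row to a reducer by hashing the key."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[hash(row[key]) % num_reducers].append(row)
    return buckets


def sort_by(buckets, key):
    """SORT BY: each reducer sorts only its own rows."""
    return [sorted(bucket, key=lambda r: r[key]) for bucket in buckets]


def cluster_by(rows, key, num_reducers):
    """CLUSTER BY key is DISTRIBUTE BY key followed by SORT BY key."""
    return sort_by(distribute_by(rows, key, num_reducers), key)


def order_by(rows, key):
    """ORDER BY: one global sort, i.e. everything through a single reducer."""
    return sorted(rows, key=lambda r: r[key])


rows = [{"id": i} for i in [5, 3, 8, 1, 9, 2, 7]]
print([[r["id"] for r in b] for b in cluster_by(rows, "id", 3)])
print([r["id"] for r in order_by(rows, "id")])
```

Each reducer's output is sorted, but the concatenation is not: only ORDER BY gives a total order, at the cost of a single reducer.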

join   only rows that match on both sides are output

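left outer join   every row of the left table is output, matched or not (unmatched rows get NULLs for the right table's columns)
left semi join    a row of A is returned if a.id appears in table B and dropped otherwise; only A's columns are returned (more efficient: once a match is found, it stops searching for that key)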
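The three join types can be sketched with in-memory hash lookups. Rows are plain dicts, None stands in for SQL NULL, and the function names are hypothetical; this illustrates the semantics only, not how Hive executes joins.

```python
def _index(rows, key):
    """Group rows by their join key."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def inner_join(a_rows, b_rows, key):
    """JOIN: only pairs that match on both sides."""
    index = _index(b_rows, key)
    return [(a, b) for a in a_rows for b in index.get(a[key], [])]


def left_outer_join(a_rows, b_rows, key):
    """LEFT OUTER JOIN: every A row; None stands in for B when unmatched."""
    index = _index(b_rows, key)
    result = []
    for a in a_rows:
        matches = index.get(a[key])
        if matches:
            result.extend((a, b) for b in matches)
        else:
            result.append((a, None))
    return result


def left_semi_join(a_rows, b_rows, key):
    """LEFT SEMI JOIN: A rows whose key exists in B; only A's columns, and each
    A row at most once, however many B rows match (a set lookup stops at the first hit)."""
    b_keys = {b[key] for b in b_rows}
    return [a for a in a_rows if a[key] in b_keys]


A = [{"id": 1}, {"id": 2}, {"id": 3}]
B = [{"id": 1, "v": "x"}, {"id": 1, "v": "y"}, {"id": 3, "v": "z"}]
print(len(inner_join(A, B, "id")), len(left_outer_join(A, B, "id")), left_semi_join(A, B, "id"))
```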

1. Without animal experiments [ɪk'sperɪmənt], when we lie on the operating ['ɒpəreɪtɪŋ] table, humans become the subjects of the experiment.

If we could choose between the little mouse and your family, which would you choose? I think everyone would choose the former.

2. Animals are better models of humans than computer models are.

People have tried to build a computer model of a mouse brain and run it for 10 seconds, and found that it ran 10 times slower than a real mouse brain [breɪn].

3. Human trials ['traɪəl] without prior animal testing violate ['vaɪəlet] ethics ['eθɪks].

Only after animal testing can we carry out human testing.

We don't want untested trials on humans to happen.

4. The life cycle ['saɪkl] of experimental animals is often very short.

Most of these animals (such as mice and frogs) have short life cycles and reproduce [,riprə'dus] [riːprə'djuːs] quickly.

So we can see the test results soon and save more patients,

and learn the impact on our lives and on our children's lives.

2003 ['θaʊznd]: SARS animal tests

Doctors chose 18 [,e'tin] healthy ['hɛlθi] monkeys as test models and eventually [ɪ'ventʃʊəlɪ] developed [dɪ'veləpt] a SARS vaccine [væk'sin].

Insulin ['ɪnsəlɪn], the treatment ['tritmənt] for diabetes [,daɪə'bitiz], was tested on dogs.

Infantile paralysis [pə'ræləsɪs] (polio) was researched in monkeys that had contracted the disease [dɪ'ziz].

In kidney ['kɪdnɪ] transplant [træns'plænt] surgery ['sɝdʒəri], the first subject was a dog.

The same goes for leukemia [lʊ'kimɪə] treatments, antibiotics [,æntɪbaɪ'ɑtɪks], anesthetics [,ænɪs'θɛtɪk], and so on.

Without these tests, none of these drugs [drʌgz] would exist.

People treat [trit] animals as food,

we let animals guard [ɡɑrd] us,

we let animals comfort ['kʌmfɚt] patients ['peʃnt],

and we have dogs search for bombs in very harsh environments [ɪn'vaɪrənmənt].

Animal testing is just one of these uses.

Would you dare [deə] to take drugs that were never tested on animals?