您的位置:首页 > 数据库

DW 星形联接SQL调优一则

2015-04-20 17:28 120 查看
下面这个SQL,优化前执行时间8.5分钟,优化后执行时间0.8秒,性能提高638倍:

SELECT 

SUM(t623064.mtd_quantity) AS c1,

t622947.year_name AS c2

FROM dw.dim_caliber_d         t621885 /* AD Dim11 Caliber */,

dw.dim_manage_org_v      t621142 /* AD Dim03 Manager Org */,

dw.dim_account_v         t621070 /* AD Dim02 Account */,

dw.dim_period_v          t622947 /* AD Dim22 Year */,

dw.dm_fin_om_sales_pln_a t623064 /* AD Fct022 Fin Om Sales Year Pln  */

WHERE t621070.account_wid = t623064.account_wid

AND t621070.language_code = 'ZHS'

AND t621070.account_category = 'Q'

AND t621142.bd_short_name = '家用'

AND t621142.bg_code = 'M2'

AND t621142.language_code = 'ZHS'

AND t621142.manage_org_wid = t623064.manage_org_wid

AND t621885.caliber_wid = t623064.caliber_wid

AND t622947.language_code = 'ZHS'

AND t622947.period_type = 'Y'

AND t622947.period_wid = t623064.period_wid

AND (t621070.account_code IN ('FIN0201', 'FIN070104', 'FIN070106', 'MFG0101'))

AND (t621885.caliber_code IN ('0', '10', '40', '60'))

AND t621142.bd_code <> 'M100'

GROUP BY t622947.year_name;

这是一个比较典型的BI星形联接进行统计分析的SQL,dw.dm_fin_om_sales_pln_a表是fact表(300万条数据),其他几个以DIM开头的表(视图)是维度表(几百到几万条数据),执行计划如下:



其中,ID=26的索引DM_FIN_OM_SALES_PLN_A_N5是创建在MANAGE_ORG_WID字段上的单字段索引。
从这个执行计划看,应该不需要8分钟这么多的时间(估算时间是7秒)。但是通过sql monitor显示的实际数据来看,是优化器对各维度表使用谓词之后返回的结果估值偏小,导致最后的笛卡尔join返回了一个巨大的值986M:



从sqlhc收集的信息显示,上面用到的多个表的最后统计信息收集时间是在一两年前(而且经过count(*)确认,Num Rows列显示的记录数基本与实际记录数一致):



看到这里,很多人首先想到的是要对这些表重新收集统计信息,让优化器能够正确的评估行源(系统开启了自动收集直方图,各维度表相关字段都有直方图信息)。

但是,sqlhc同时又告诉我们,统计信息并没有问题(系统开启了自动统计信息收集,记录改变超过10%会自动重新收集) Stale Stats=NO:



针对这种典型的星形联接统计分析的BI SQL,我使用了下面两种方法进行优化:

1、SQL改写,将普通的join改为in的semi-join(前提是各维度表的XXX_id字段是唯一的,这样的改写结果才是等价的):

SELECT SUM(t623064.mtd_quantity) AS c1,

t622947.year_name AS c2

FROM dw.dim_period_v          t622947 /* AD Dim22 Year */,

dw.dm_fin_om_sales_pln_a t623064 /* AD Fct022 Fin Om Sales Year Pln  */

where t622947.period_wid = t623064.period_wid AND t622947.language_code = 'ZHS' AND t622947.period_type = 'Y' 

and t623064.account_wid in (select account_wid from dw.dim_account_v t621070 where  t621070.language_code = 'ZHS' AND t621070.account_category = 'Q' and t621070.account_code IN ('FIN0201', 'FIN070104', 'FIN070106', 'MFG0101') )

and t623064.manage_org_wid in (select manage_org_wid from dw.dim_manage_org_v t621142 where t621142.bd_short_name = '家用' AND t621142.bg_code = 'M2' AND t621142.language_code = 'ZHS' AND t621142.bd_code <> 'M100')

and t623064.caliber_wid in (select caliber_wid from dw.dim_caliber_d t621885 where t621885.caliber_code IN ('0', '10', '40', '60'))

group by t622947.year_name;

为什么dw.dim_period_v没有使用in,那是因为最后统计结果需要用到这个表除了ID之外的其他字段,不能用IN。

2、使用Hint:

SELECT /*+ STAR_TRANSFORMATION opt_param('star_transformation_enabled' 'true') */

SUM(t623064.mtd_quantity) AS c1,

t622947.year_name AS c2

FROM dw.dim_caliber_d         t621885 /* AD Dim11 Caliber */,

dw.dim_manage_org_v      t621142 /* AD Dim03 Manager Org */,

dw.dim_account_v         t621070 /* AD Dim02 Account */,

dw.dim_period_v          t622947 /* AD Dim22 Year */,

dw.dm_fin_om_sales_pln_a t623064 /* AD Fct022 Fin Om Sales Year Pln  */

WHERE t621070.account_wid = t623064.account_wid

AND t621070.language_code = 'ZHS'

AND t621070.account_category = 'Q'

AND t621142.bd_short_name = '家用'

AND t621142.bg_code = 'M2'

AND t621142.language_code = 'ZHS'

AND t621142.manage_org_wid = t623064.manage_org_wid

AND t621885.caliber_wid = t623064.caliber_wid

AND t622947.language_code = 'ZHS'

AND t622947.period_type = 'Y'

AND t622947.period_wid = t623064.period_wid

AND (t621070.account_code IN ('FIN0201', 'FIN070104', 'FIN070106', 'MFG0101'))

AND (t621885.caliber_code IN ('0', '10', '40', '60'))

AND t621142.bd_code <> 'M100'

GROUP BY t622947.year_name;

上面两个SQL的执行时间不到1秒(0.8s),执行计划都是一样的:



执行计划显示有Bitmap Index字样,实际上各表都没有创建bitmap index,这个是由这个隐含参数控制的:_b_tree_bitmap_plans,这个参数默认是True,很多情况下这个参数的问题比较多,很多系统都把这个参数改成了False,这星形联接的情况下,这个参数还是比较有用的。

针对这种星形联接的SQL,还有其他多种优化方法,比如创建事实表上的联合索引、创建位图联接索引等,前者组合较多,效果一般;后者改动比较复杂,性价比不高,这里不再赘述。
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: