您的位置:首页 > 数据库

高效SQL语句必杀技(下篇)

2014-11-25 18:12 246 查看
二、合理使用索引以提高性能

索引依赖于表而存在,是真实表的一个缩影,类似于一本书的目录,通过目录以更快获得所需的结果。

Oracle使用了一个复杂的自平衡B树数据结构。即任意记录的DML操作将打破索引的平衡,而定期重构索引使得索引重新获得平衡。

通常,通过索引查找数据比全表扫描更高效。

任意的DQL或DML操作,SQL优化引擎优先使用索引来计算当前操作的成本以生成最佳的执行计划。

一旦使用索引超出参数optimizer_index_cost_adj设定的值才使用全表扫描。

同样对于多表连接使用索引也可以提高效率。同时索引也提供主键(primary key)的唯一性验证。

除了那些LONG或LONG RAW数据类型,你可以索引几乎所有的列.

通常,在大型表中使用索引特别有效.当然,你也会发现,在扫描小表时,使用索引同样能提高效率。

虽然使用索引能得到查询效率的提高,但是索引需要空间来存储,需要定期维护.尤其是在有大量DML操作的表上,任意的DML操作都将引起索

引的变更。这意味着每条记录的INSERT,DELETE,UPDATE将为此多付出4 , 5 次的磁盘I/O .

索引需要额外的存储空间和处理时间,那些不必要的索引反而会使查询反应时间变慢。

DML操作使索引上存在碎片而失去高度均衡,因此定期的重构索引是有必要的.

1)
避免基于索引列的计算

where
子句中的谓词上存在索引,而此时基于该列的计算将使得索引失效

-->低效:

SELECT employee_id,
first_name

FROM
employees

WHERE employee_id 10 >
150;
-->索引列上使用了计算,因此索引失效,走全表扫描方式

-->高效:

SELECT employee_id,
first_name

FROM
employees

WHERE employee_id >
160;
-->走索引范围扫描方式

例外情形

上述规则不适用于SQL中的MIN和MAX函数

hr@CNMMBO> SELECT MAX( employee_id )
max_id

2
FROM
employees

3 WHERE
employee_id

4
10 >
150;

1 row
selected.

Execution
Plan

----------------------------------------------------------

Plan hash value:
1481384439

---------------------------------------------------------------------------------------------

| Id |
Operation
|
Name
| Rows | Bytes | Cost (%CPU)|
Time
|

---------------------------------------------------------------------------------------------

| 0 | SELECT
STATEMENT
|
|
1
|
4
|
1 (0)| 00:00:01
|

| 1 | SORT
AGGREGATE
|
|
1
|
4
|
|
|

| 2
| FIRST
ROW
|
|
5 | 20
|
1 (0)| 00:00:01
|

|* 3
| INDEX FULL
SCAN (MIN/MAX)| EMP_EMP_ID_PK
|
5 | 20
|
1 (0)| 00:00:01
|

---------------------------------------------------------------------------------------------

2)
避免在索引列上使用NOT运算或不等于运算(<>,!=)

通常,我们要避免在索引列上使用NOT或<>,两者会产生和在索引列上使用函数相同的影响。

当 ORACLE遇到NOT或不等运算时,他就会停止使用索引转而执行全表扫
描。

-->低效:

SELECT
*

FROM
emp

WHERE NOT ( deptno = 20
); -->实际上NOT (
deptno = 20 )等同于deptno <> 20,即deptno
<>20同样会限制索引

-->高效:

SELECT
*

FROM
emp

WHERE deptno > 20 OR deptno
<
20;

-->尽管此方式可以实现上述结果,但依然走全表扫描,如果是单纯的 > 或
<
运算,则此时为索引范围扫描

需要注意的是,在某些时候,
ORACLE优化器会自动将NOT转化成相对应的关系操作符

其次如果是下列运算符进行NOT运算,依然有可能选择走索引, 仅仅除了NOT = 之外,因为 NOT = 等价于
<>

“NOT >” to
<=

“NOT >=” to
<

“NOT <” to
>=

“NOT <=” to
>

来看一个实际的例子

hr@CNMMBO> SELECT
*

2
FROM
employees

3 where not
employee_id<100;
-->索引列上使用了not,但是该查询返回了所有的记录,即107条,因此此时选择走全表扫描

107 rows
selected.

Execution
Plan

----------------------------------------------------------

Plan hash value:
1445457117

-------------------------------------------------------------------------------

| Id |
Operation
|
Name
| Rows | Bytes | Cost (%CPU)|
Time
|

-------------------------------------------------------------------------------

| 0 | SELECT
STATEMENT
|
| 107 | 7276
|
3 (0)| 00:00:01
|

|* 1 | TABLE ACCESS FULL|
EMPLOYEES | 107
| 7276
|
3 (0)| 00:00:01 |
-->执行计划中使用了走全表扫描方式

-------------------------------------------------------------------------------

Predicate Information (identified by operation
id):

---------------------------------------------------

1 -
filter("EMPLOYEE_ID">=100)
-->查看这里的谓词信息被自动转换为 >=
运算符

hr@CNMMBO> SELECT
*

2
FROM
employees

3 where not
employee_id<140;
-->此例与上面的语句相同,仅仅是查询范围不同返回67条记录,而此时选择了索引范围扫描

67 rows
selected.

Execution
Plan

----------------------------------------------------------

Plan hash value:
603312277

---------------------------------------------------------------------------------------------

| Id |
Operation
|
Name
| Rows | Bytes | Cost (%CPU)|
Time
|

---------------------------------------------------------------------------------------------

| 0 | SELECT
STATEMENT
|
| 68
| 4624
|
3 (0)| 00:00:01
|

| 1 | TABLE
ACCESS BY INDEX ROWID|
EMPLOYEES
| 68
| 4624
|
3 (0)| 00:00:01
|

|* 2 | INDEX
RANGE
SCAN
| EMP_EMP_ID_PK
| 68
|
|
1 (0)| 00:00:01 |
-->索引范围扫描方式

---------------------------------------------------------------------------------------------

Predicate Information (identified by operation
id):

---------------------------------------------------

2 -
access("EMPLOYEE_ID">=140)

3) 用UNION
替换OR(适用于索引列)

通常情况下,使用UNION
替换WHERE子句中的OR将会起到较好的效果.基于索引列使用OR使得优化器倾向于使用全表扫描,而不是扫描索引.

注意,以上规则仅适用于多个索引列有效。 如果有column没有被索引,
查询效率可能会因为你没有选择OR而降低。

-->低效:

SELECT deptno,
dname

FROM
dept

WHERE loc = 'DALLAS' OR deptno =
20;

-->高效:

SELECT deptno,
dname

FROM
dept

WHERE loc =
'DALLAS'

UNION

SELECT deptno,
dname

FROM
dept

WHERE deptno =
30

-->经测试,由于数据量较少,此时where子句中的谓词上都存在索引列时,两者性能相当.

-->假定where子句中存在两列

scott@CNMMBO> create table t6 as select
object_id,owner,object_name from dba_objects where owner='SYS' and
rownum<1001;

scott@CNMMBO> insert into t6 select
object_id,owner,object_name from dba_objects where owner='SCOTT'
and
rownum<6;

scott@CNMMBO> create index i_t6_object_id on
t6(object_id);

scott@CNMMBO> create index i_t6_owner on
t6(owner);

scott@CNMMBO> insert into t6 select
object_id,owner,object_name from dba_objects where owner='SYSTEM'
and
rownum<=300;

scott@CNMMBO>
commit;

scott@CNMMBO> exec
dbms_stats.gather_table_stats('SCOTT','T6',cascade=>true);

scott@CNMMBO> select owner,count(*) from t6 group by
owner;

OWNER
COUNT(*)

--------------------
----------

SCOTT
5

SYSTEM
300

SYS
1000

scott@CNMMBO> select * from t6 where owner='SCOTT'
and
rownum<2;

OBJECT_ID
OWNER
OBJECT_NAME

---------- --------------------
--------------------

69450
SCOTT
T_TEST

scott@CNMMBO> select * from t6 where object_id=69450
or owner='SYSTEM';

301 rows
selected.

Execution
Plan

----------------------------------------------------------

Plan hash value:
238853296

-----------------------------------------------------------------------------------------------

| Id |
Operation
|
Name
| Rows | Bytes | Cost (%CPU)|
Time
|

-----------------------------------------------------------------------------------------------

| 0 | SELECT
STATEMENT
|
| 300 | 7200
|
5 (0)| 00:00:01
|

| 1 |
CONCATENATION
|
|
|
|
|
|

| 2
| TABLE ACCESS BY INDEX ROWID|
T6
|
1 | 24
|
2 (0)| 00:00:01
|

|* 3
| INDEX RANGE
SCAN
| I_T6_OBJECT_ID
|
1
|
|
1 (0)| 00:00:01
|

|* 4 | TABLE
ACCESS BY INDEX ROWID|
T6
| 299 | 7176
|
3 (0)| 00:00:01
|

|* 5
| INDEX RANGE
SCAN
|
I_T6_OWNER
| 300
|
|
1 (0)| 00:00:01
|

-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation
id):

---------------------------------------------------

3 -
access("OBJECT_ID"=69450)

4 -
filter(LNNVL("OBJECT_ID"=69450))

5 -
access("OWNER"='SYSTEM')

Statistics

----------------------------------------------------------

0 recursive
calls

0 db block
gets

46 consistent
gets

0 physical
reads

0 redo
size

11383 bytes sent via SQL*Net to
client

712 bytes received via SQL*Net from
client

22 SQL*Net roundtrips to/from
client

0 sorts
(memory)

0 sorts
(disk)

301 rows
processed

scott@CNMMBO> select * from
t6

2 where
object_id=69450

3
union

4 select * from
t6

5 where
owner='SYSTEM';

301 rows
selected.

Execution
Plan

----------------------------------------------------------

Plan hash value:
370530636

------------------------------------------------------------------------------------------------

| Id |
Operation
|
Name
| Rows | Bytes | Cost (%CPU)|
Time
|

------------------------------------------------------------------------------------------------

| 0 | SELECT
STATEMENT
|
| 301 | 7224
|
7 (72)| 00:00:01
|

| 1 | SORT
UNIQUE
|
| 301 | 7224
|
7 (72)| 00:00:01
|

| 2
|
UNION-ALL
|
|
|
|
|
|

| 3
| TABLE
ACCESS BY INDEX ROWID|
T6
|
1 | 24
|
2 (0)| 00:00:01
|

|* 4
|
INDEX RANGE
SCAN
| I_T6_OBJECT_ID
|
1
|
|
1 (0)| 00:00:01
|

| 5
| TABLE
ACCESS BY INDEX ROWID|
T6
| 300 | 7200
|
3 (0)| 00:00:01
|

|* 6
|
INDEX RANGE
SCAN
|
I_T6_OWNER
| 300
|
|
1 (0)| 00:00:01
|

------------------------------------------------------------------------------------------------

Predicate Information (identified by operation
id):

---------------------------------------------------

4 -
access("OBJECT_ID"=69450)

6 -
access("OWNER"='SYSTEM')

Statistics

----------------------------------------------------------

1 recursive
calls

0 db block
gets

7 consistent
gets

0 physical
reads

0 redo
size

11383 bytes sent via SQL*Net to
client

712 bytes received via SQL*Net from client

22 SQL*Net roundtrips to/from
client

1 sorts
(memory)

0 sorts
(disk)

301 rows
processed

-->从上面的统计信息可知,consistent
gets由46下降为7,故当where子句中谓词上存在索引时,使用union替换or更高效

-->即使当列object_id与owner上不存在索引时,使用union仍然比or更高效(在Oracle
10g R2与Oracle 11g
R2测试)

4)
避免索引列上使用函数

-->下面是一个来自实际生产环境的例子

-->表acc_pos_int_tbl上business_date列存在索引,由于使用了SUBSTR函数,此时索引失效,使用全表扫描

SELECT
acc_num

,
curr_cd

, DECODE(
'20110728'

, ( SELECT TO_CHAR( LAST_DAY( TO_DATE( '20110728', 'YYYYMMDD' ) ),
'YYYYMMDD' ) FROM dual ),
0

,
adj_credit_int_lv1_amt

adj_credit_int_lv2_amt

-
adj_debit_int_lv1_amt

- adj_debit_int_lv2_amt
)

AS
interest

FROM
acc_pos_int_tbl

WHERE SUBSTR( business_date, 1, 6 ) = SUBSTR(
'20110728', 1, 6 ) AND business_date <=
'20110728';

-->改进的办法

SELECT
acc_num

,
curr_cd

, DECODE(
'20110728'

, ( SELECT TO_CHAR( LAST_DAY( TO_DATE( '20110728', 'YYYYMMDD' ) ),
'YYYYMMDD' ) FROM dual ), 0

,
adj_credit_int_lv1_amt

adj_credit_int_lv2_amt

-
adj_debit_int_lv1_amt

- adj_debit_int_lv2_amt
)

AS
interest

FROM
acc_pos_int_tbl

WHERE business_date >= TO_CHAR(
LAST_DAY( ADD_MONTHS( TO_DATE( '20110728', 'yyyymmdd' ), -1 )
)

1, 'yyyymmdd'
)

AND business_date <=
'20110728';

-->下面的例子虽然没有使用函数,但字符串连接同样导致索引失效

-->低效:

SELECT account_name,
amount

FROM
transaction

WHERE
account_name

|| account_type =
'AMEXA';

-->高效:

SELECT account_name,
amount

FROM
transaction

WHERE account_name = 'AMEX' AND account_type =
'A';

5) 比较不匹配的数据类型

-->低效:

SELECT
*

FROM
acc_pos_int_tbl

WHERE business_date =
20090201;

Execution
Plan

----------------------------------------------------------

Plan hash value:
2335235465

-------------------------------------------------------------------------------------

| Id |
Operation
|
Name
| Rows | Bytes | Cost (%CPU)|
Time
|

-------------------------------------------------------------------------------------

| 0 | SELECT
STATEMENT
|
| 37516 |
2857K| 106K
(1)| 00:21:17
|

|* 1 | TABLE ACCESS FULL|
ACC_POS_INT_TBL | 37516 |
2857K| 106K
(1)| 00:21:17
|

-------------------------------------------------------------------------------------

Predicate Information (identified by operation
id):

---------------------------------------------------

1 -
filter(TO_NUMBER("BUSINESS_DATE")=20090201)
-->这里可以看到产生了类型转换

-->高效:

SELECT
*

FROM
acc_pos_int_tbl

WHERE business_date = '20090201'

6) 索引列上使用 NULL 值

IS NULL和IS
NOT
NULL会限制索引的使用,因为数据中没有值等于NULL值,即便是NULL值也不等于NULL值.且NULL值不存储在于索引之中
因此应尽可能避免在索引类上使用NULL


SELECT
acc_num

,
pl_cd

,
order_qty

,
trade_date

FROM
trade_client_tbl

WHERE input_date IS NOT
NULL;

Execution
Plan

----------------------------------------------------------

Plan hash value:
901462645

--------------------------------------------------------------------------------------

| Id |
Operation
|
Name
| Rows | Bytes | Cost (%CPU)|
Time
|

--------------------------------------------------------------------------------------

| 0 | SELECT
STATEMENT
|
|
1 | 44
|
15 (0)| 00:00:01
|

|* 1 | TABLE ACCESS FULL|
TRADE_CLIENT_TBL
|
1 | 44
|
15 (0)| 00:00:01
|

--------------------------------------------------------------------------------------

alter table trade_client_tbl modify (input_date not
null);

不推荐使用的查询方式

SELECT * FROM table_name WHERE col IS NOT
NULL

SELECT * FROM table_name WHERE col IS
NULL

推荐使用的方式

SELECT * FROM table_name WHERE col >= 0 --尽可能的使用 =,
>=, <=, like 等运算符

-->Author: Robinson
Cheng

-->Blog: http://blog.csdn.net/robinson_0612
三、总结

1、尽可能最小化基表数据以及中间结果集(通过过滤条件避免后续产生不必要的计算与聚合)

2、为where子句中的谓词信息提供最佳的访问路径(rowid访问,索引访问)

3、使用合理的SQL写法来避免过多的Oracle内部开销以提高性能

4、合理的使用提示以提高表之间的连接来提高连接效率(如避免迪卡尔集,将不合理的嵌套连接改为hash连接等)

转自:http://blog.csdn.net/leshami/article/details/7406672
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: