
The Effect of the Clustering Factor on Indexes

2016-08-22 16:22

I. What the Oracle documentation says

1. Basic concepts

Index Clustering Factor

 

For a B-tree index, the index clustering factor measures the physical grouping of rows in relation to an index value, such as last name. The index clustering factor helps the optimizer decide whether an index scan or full table scan is more efficient for certain queries. A low clustering factor indicates an efficient index scan.

A clustering factor that is close to the number of blocks in a table indicates that the rows are physically ordered in the table blocks by the index key. If the database performs a full table scan, then the database tends to retrieve the rows as they are stored on disk sorted by the index key. A clustering factor that is close to the number of rows indicates that the rows are scattered randomly across the database blocks in relation to the index key. If the database performs a full table scan, then the database would not retrieve rows in any sorted order by this index key.

The clustering factor is a property of a specific index, not a table (see Oracle Database Concepts for an overview). If multiple indexes exist on a table, then the clustering factor for one index might be small while the factor for another index is large. An attempt to reorganize the table to improve the clustering factor for one index may degrade the clustering factor of the other index.
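The two reference points above (close to the number of blocks, close to the number of rows) follow from the way the clustering factor is usually described: walk the index entries in key order and add one every time an entry's rowid points to a different table block than the previous entry's. The Python sketch below models that walk. It is an illustration under stated assumptions, not Oracle's implementation: a row is a hypothetical `(key, (block_no, slot_no))` pair, a simplification of a real ROWID, and `clustering_factor` is an illustrative name.

```python
from typing import List, Tuple

# Hypothetical model: a row is (index_key, (block_no, slot_no)).
# A real ROWID also encodes the data object and file numbers.
Row = Tuple[int, Tuple[int, int]]


def clustering_factor(rows: List[Row]) -> int:
    """Walk the index in (key, rowid) order and count table-block changes."""
    entries = sorted(rows)  # index order: by key, then by rowid for duplicates
    cf = 0
    prev_block = None
    for _key, (block_no, _slot) in entries:
        if block_no != prev_block:  # first entry, or a jump to another block
            cf += 1
            prev_block = block_no
    return cf


# 12 rows in 3 blocks of 4 rows each.
# Rows stored in key order: each block is entered once, so CF = number of blocks.
ordered = [(k, (k // 4, k % 4)) for k in range(12)]
# Consecutive keys stored in different blocks: CF = number of rows.
scattered = [(k, (k % 3, k // 3)) for k in range(12)]

print(clustering_factor(ordered))    # 3
print(clustering_factor(scattered))  # 12
```

Every block that holds at least one row is entered at least once, and each index entry adds at most one, so in this model the value always lies between the number of non-empty blocks and the number of rows.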

 

 

2. Examples of the clustering factor's effect

Effect of Index Clustering Factor on Cost: Example

To illustrate how the index clustering factor can influence the cost of table access, consider the following scenario:

• A table contains 9 rows that are stored in 3 data blocks.

• The col1 column currently stores the values A, B, and C.

• A nonunique index named col1_idx exists on col1 for this table.

Example 10-4 Collocated Data

Assume that the rows are stored in the data blocks as follows:

Block 1   Block 2   Block 3
-------   -------   -------
A A A     B B B     C C C

In this example, the index clustering factor for col1_idx is low. The rows that have the same indexed column values for col1 are in the same data blocks in the table. Thus, the cost of using an index range scan to return all rows with value A is low because only one block in the table must be read.
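Example 10-4 can be checked by counting block changes along the index. The sketch below is a hypothetical in-memory model: each inner list stands for one table block holding the col1 values of its rows, and `index_entries` and `blocks_read_for` are illustrative helpers, not Oracle APIs. `blocks_read_for` counts the distinct table blocks an index range scan on one value would have to visit.

```python
from typing import List

# Hypothetical model of Example 10-4: each inner list is one table block
# (Block 1, Block 2, Block 3) holding the col1 values of its rows.
collocated: List[List[str]] = [["A", "A", "A"], ["B", "B", "B"], ["C", "C", "C"]]


def index_entries(blocks: List[List[str]]):
    """(col1, block_no, slot_no) tuples in col1_idx order: by key, then rowid."""
    return sorted(
        (value, block_no, slot_no)
        for block_no, block in enumerate(blocks, start=1)
        for slot_no, value in enumerate(block)
    )


def clustering_factor(blocks: List[List[str]]) -> int:
    cf, prev_block = 0, None
    for _value, block_no, _slot in index_entries(blocks):
        if block_no != prev_block:  # count every switch to another table block
            cf += 1
            prev_block = block_no
    return cf


def blocks_read_for(blocks: List[List[str]], key: str) -> int:
    """Distinct table blocks an index range scan on col1 = key has to visit."""
    return len({b for value, b, _ in index_entries(blocks) if value == key})


print(clustering_factor(collocated))     # 3 (= number of blocks)
print(blocks_read_for(collocated, "A"))  # 1
```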

 

Example 10-5 Scattered Data

Assume that the same rows are scattered across the data blocks as follows:

Block 1   Block 2   Block 3
-------   -------   -------
A B C     A C B     B A C

In this example, the index clustering factor for col1_idx is higher. The database must read all three blocks in the table to retrieve all rows with the value A in col1.
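Applying the same block-change count to Example 10-5 gives the opposite extreme. In this hypothetical model (again one inner list per table block), the index walk leaves the current block on every entry, so the clustering factor equals the row count, and the rows with value A are found in all three blocks.

```python
# Hypothetical model of Example 10-5: each inner list is one table block
# (Block 1, Block 2, Block 3) holding the col1 values of its rows.
scattered = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]

# col1_idx entries (value, block_no, slot_no), sorted by key and then rowid.
entries = sorted(
    (value, block_no, slot_no)
    for block_no, block in enumerate(scattered, start=1)
    for slot_no, value in enumerate(block)
)
block_sequence = [block_no for _, block_no, _ in entries]

# The first entry always counts; after that, count every change of block.
clustering_factor = 1 + sum(
    1 for prev, cur in zip(block_sequence, block_sequence[1:]) if cur != prev
)
blocks_for_a = sorted({block_no for value, block_no, _ in entries if value == "A"})

print(block_sequence)     # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(clustering_factor)  # 9 (= number of rows)
print(blocks_for_a)       # [1, 2, 3]
```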

 

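The lower the index clustering factor, the better for the index. A low clustering factor means the table rows are well ordered by the index key, so a single data block read returns many rows with the same key value; with a high clustering factor, the rows for the same key value may have to be read from many different blocks.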

II. Experiment

Note: this demonstration was run on Oracle 11g Release 2 (11gR2).

1. The effect of a high clustering factor

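-- Create an empty test table with the same structure as DBA_OBJECTS
create table factor_test1 as select * from dba_objects where 1=0;

-- Append all of DBA_OBJECTS 50 times, one full copy after another.
-- "order by i" sorts by a constant, so it imposes no meaningful order.
begin
  for i in 1..50 loop
    insert /*+ append */ into factor_test1 select * from dba_objects order by i;
    commit;  -- a direct-path insert must be committed before the table is modified again
  end loop;
end;
/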

-- Check the size of the test table and how many blocks it uses
SQL> select segment_name,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='FACTOR_TEST1';

SEGMENT_NAME        BLOCKS    EXTENTS size
--------------- ---------- ---------------------------------------------------
FACTOR_TEST1         11264         82 88M

-- Create an index on the test table
create index factor_test1_ind on factor_test1(object_id);

-- Check the clustering factor: CLUSTERING_FACTOR is very high, identical to the row count (NUM_ROWS) and far larger than the BLOCKS figure queried above
select index_name,clustering_factor,num_rows from user_indexes where index_name='FACTOR_TEST1_IND';

INDEX_NAME                     CLUSTERING_FACTOR   NUM_ROWS
------------------------------ ----------------- ----------
FACTOR_TEST1_IND                          842700     842700

-- Gather table statistics (cascade=>true also gathers statistics on the table's indexes)
exec dbms_stats.gather_table_stats(user,'factor_test1',cascade=>true);

-- Check the clustering factor again: it has not changed
select index_name,clustering_factor,num_rows from user_indexes where index_name='FACTOR_TEST1_IND';

INDEX_NAME                     CLUSTERING_FACTOR   NUM_ROWS
------------------------------ ----------------- ----------
FACTOR_TEST1_IND                          842700     842700
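Why is CLUSTERING_FACTOR exactly equal to NUM_ROWS here? The PL/SQL loop stores 50 complete copies of DBA_OBJECTS one after another (842,700 / 50 = 16,854 rows per copy), so the 50 rows for each object_id sit in 50 different regions of the table. Walking the index in (object_id, rowid) order therefore moves to a new block on every entry: from copy 1 to copy 2 and so on within one key, and from copy 50 back to copy 1 when moving to the next key. The Python sketch below reproduces this on a scaled-down model. The 1,000 object_ids per copy and 75 rows per block (842,700 / 11,264 is about 75) are assumed stand-in values, not measurements, and the shuffle only shows that the order inside a copy does not change the result.

```python
import random


def clustering_factor(keys_in_storage_order, rows_per_block):
    """CF of an index on a heap table filled row by row in the given order.

    Row number pos is placed in block pos // rows_per_block; the index is then
    walked in (key, rowid) order and every change of block adds one.
    """
    entries = sorted(
        (key, pos // rows_per_block, pos)
        for pos, key in enumerate(keys_in_storage_order)
    )
    cf, prev_block = 0, None
    for _key, block_no, _pos in entries:
        if block_no != prev_block:
            cf += 1
            prev_block = block_no
    return cf


# Hypothetical stand-ins for the real sizes: 1,000 object_ids per copy
# instead of 16,854, and 75 rows per block.
N_OBJECTS, COPIES, ROWS_PER_BLOCK = 1_000, 50, 75

one_copy = list(range(1, N_OBJECTS + 1))
random.Random(0).shuffle(one_copy)  # order inside a copy does not change the result

# The PL/SQL loop appends 50 identical copies, one after another.
factor_test1 = one_copy * COPIES

rows = len(factor_test1)
blocks = -(-rows // ROWS_PER_BLOCK)  # ceiling division
cf = clustering_factor(factor_test1, ROWS_PER_BLOCK)
print(rows, blocks, cf)  # 50000 667 50000
```

The result holds as long as one copy spans more than one block, which is easily true for 16,854 rows per copy.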

-- Look at the execution plans
set autotrace traceonly;

-- The plan for object_id = 666 uses an index range scan
select * from FACTOR_TEST1 where object_id=666;

50 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 643739684

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |    50 |  4450 |    54   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| FACTOR_TEST1     |    50 |  4450 |    54   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | FACTOR_TEST1_IND |    50 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID"=666)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
57  consistent gets
0  physical reads
0  redo size
6487  bytes sent via SQL*Net to client
556  bytes received via SQL*Net from client
5  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
50  rows processed
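With CLUSTERING_FACTOR equal to NUM_ROWS, the 50 rows for object_id 666 are spread over the 50 copies, so TABLE ACCESS BY INDEX ROWID has to visit roughly 50 different table blocks; that is of the same order as the 57 consistent gets reported, which also include index block reads. The sketch below counts those blocks in the same hypothetical scaled-down model (1,000 object_ids per copy, 50 copies, 75 rows per block), and for comparison in a copy of the table stored in object_id order. `blocks_visited` is an illustrative helper.

```python
# Hypothetical scaled-down model of FACTOR_TEST1: 1,000 object_ids per copy,
# 50 copies appended one after another, 75 rows per block.
N_OBJECTS, COPIES, ROWS_PER_BLOCK = 1_000, 50, 75
factor_test1 = list(range(1, N_OBJECTS + 1)) * COPIES
factor_test2 = sorted(factor_test1)  # same rows stored in object_id order


def blocks_visited(table, key, rows_per_block):
    """Distinct table blocks holding the rows for one key (row pos -> pos // rows_per_block)."""
    return len({pos // rows_per_block for pos, k in enumerate(table) if k == key})


print(blocks_visited(factor_test1, 666, ROWS_PER_BLOCK))  # 50
print(blocks_visited(factor_test2, 666, ROWS_PER_BLOCK))  # 1
```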

-- Flush the buffer cache
alter system flush buffer_cache;

-- Plan for 666 < object_id < 2666: although an index exists, the optimizer chooses a full table scan
set autotrace traceonly;

select * from FACTOR_TEST1 where object_id>666 and object_id<2666;

99950 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3621494151

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              | 72412 |  6293K|  3058   (1)| 00:00:37 |
|*  1 |  TABLE ACCESS FULL| FACTOR_TEST1 | 72412 |  6293K|  3058   (1)| 00:00:37 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_ID"<2666 AND "OBJECT_ID">666)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
17735  consistent gets
11110  physical reads
0  redo size
5251616  bytes sent via SQL*Net to client
73816  bytes received via SQL*Net from client
6665  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
99950  rows processed


2. Behavior with a low clustering factor

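-- Create a second test table from the first, inserting the rows in ascending object_id order. The table is then ordered the same way as the index on object_id, so the clustering factor is low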
create table FACTOR_TEST2 as select * from FACTOR_TEST1 order by object_id;

-- Check the size of the test table and how many blocks it uses
SQL> select segment_name,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='FACTOR_TEST2';

SEGMENT_NAME        BLOCKS    EXTENTS size
--------------- ---------- ---------------------------------------------------
FACTOR_TEST1         11264         82 88M

-- Create the index
create index factor_test2_ind on factor_test2(object_id);

-- Check CLUSTERING_FACTOR: it is now low, close to the BLOCKS figure above and far smaller than NUM_ROWS
select index_name,clustering_factor,num_rows from user_indexes where index_name='FACTOR_TEST2_IND';

INDEX_NAME                     CLUSTERING_FACTOR   NUM_ROWS
------------------------------ ----------------- ----------
FACTOR_TEST2_IND                           11084     842700
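Sorting by object_id in the CREATE TABLE ... AS SELECT writes the rows to the table in index order, so the index walk enters each block once and the clustering factor drops to about the number of blocks. The sketch below uses the same hypothetical scaled-down model as before (1,000 object_ids per copy, 50 copies, 75 rows per block; assumed values, not measurements). In the real test the CF of 11,084 is slightly below the BLOCKS figure, which is expected because BLOCKS in USER_SEGMENTS counts all allocated blocks, including ones that hold no rows.

```python
def clustering_factor(keys_in_storage_order, rows_per_block):
    """CF of an index on a heap table filled row by row in the given order."""
    entries = sorted(
        (key, pos // rows_per_block, pos)
        for pos, key in enumerate(keys_in_storage_order)
    )
    cf, prev_block = 0, None
    for _key, block_no, _pos in entries:
        if block_no != prev_block:  # every switch to another block adds one
            cf += 1
            prev_block = block_no
    return cf


# Same hypothetical stand-ins: 1,000 object_ids, 50 copies, 75 rows per block.
N_OBJECTS, COPIES, ROWS_PER_BLOCK = 1_000, 50, 75
factor_test1 = list(range(1, N_OBJECTS + 1)) * COPIES

# CREATE TABLE ... AS SELECT ... ORDER BY object_id writes the rows in key order.
factor_test2 = sorted(factor_test1)

blocks = -(-len(factor_test2) // ROWS_PER_BLOCK)  # ceiling division
print(clustering_factor(factor_test1, ROWS_PER_BLOCK))          # 50000
print(blocks, clustering_factor(factor_test2, ROWS_PER_BLOCK))  # 667 667
```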

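-- The same range query (666 < object_id < 2666): this time the optimizer uses an index range scan, with far fewer physical reads than before (1790 vs 11110) and a much lower cost (699 vs 3058)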
set autotrace traceonly;

select * from FACTOR_TEST2 where object_id>666 and object_id<2666;

99950 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3993865951

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  | 53481 |    10M|   699   (1)| 00:00:09 |
|   1 |  TABLE ACCESS BY INDEX ROWID| FACTOR_TEST2     | 53481 |    10M|   699   (1)| 00:00:09 |
|*  2 |   INDEX RANGE SCAN          | FACTOR_TEST2_IND | 53481 |       |   102   (0)| 00:00:02 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID">666 AND "OBJECT_ID"<2666)

Note
------
dynamic sampling used for this statement(level=2)

Statistics
----------------------------------------------------------
24  recursive calls
0  db block gets
14857  consistent gets
1790  physical reads
0  redo size
10481097  bytes sent via SQL*Net to client
73816  bytes received via SQL*Net from client
6665  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
99950  rows processed
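The cost difference between the two range-query plans follows from how the clustering factor enters the optimizer's estimate. A commonly cited approximation for an index range scan (described, for example, in Jonathan Lewis's Cost-Based Oracle Fundamentals) is cost ≈ blevel + ceil(leaf_blocks * index selectivity) + ceil(clustering_factor * table selectivity). The sketch below computes only the clustering-factor term, using the CF values and estimated row counts printed in the experiment. blevel and leaf_blocks are not shown in the post, so they are left out, and the result is not expected to reproduce the optimizer's exact figures (the FACTOR_TEST2 plan also used dynamic sampling). `table_access_term` is an illustrative name.

```python
NUM_ROWS = 842_700  # NUM_ROWS reported for both indexes


def table_access_term(clustering_factor: int, estimated_rows: int,
                      num_rows: int = NUM_ROWS) -> int:
    """ceil(clustering_factor * table_selectivity), in exact integer arithmetic.

    table_selectivity is approximated as estimated_rows / num_rows, using the
    cardinality the optimizer printed in each plan.
    """
    return -(-clustering_factor * estimated_rows // num_rows)


# CF values and estimated row counts copied from the experiment above.
test1_term = table_access_term(clustering_factor=842_700, estimated_rows=72_412)
test2_term = table_access_term(clustering_factor=11_084, estimated_rows=53_481)
FULL_SCAN_COST = 3_058  # TABLE ACCESS FULL cost in the FACTOR_TEST1 plan

print(test1_term, test2_term)                     # 72412 704
print(test2_term < FULL_SCAN_COST < test1_term)   # True
```

With the CF equal to the row count, the table-access term alone (72,412) far exceeds the full scan's cost of 3,058, which is consistent with the optimizer choosing TABLE ACCESS FULL for FACTOR_TEST1; with the CF near the block count, the term falls to roughly 700, well under 3,058, and the index path becomes the cheaper option.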