您的位置:首页 > 其它

有关位图连接索引(bitmap join index)的一些测试

2010-12-22 20:00 267 查看
数据库版本:10.2.0.4

create table zsj_obj as select owner,object_id from dba_objects where object_id is not null;

create table zsj_obj_detail pctfree 90
as select object_id,object_name,object_type,last_ddl_time+rownum last_ddl_time,status
from dba_objects where object_id is not null;

alter table zsj_obj_detail add constraint pk_zsj_obj_detail primary key(object_id);

create bitmap index ind_zsj_obj_detail
on zsj_obj_detail(o.owner)
from zsj_obj o,zsj_obj_detail d
where o.object_id=d.object_id;

指向zsj_obj_detail的数据行.where子句中必须连接另一个表zsj_obj的主键或者是唯一键.如果zsj_obj是复合主键的话,where子句中必须包含zsj_obj的所有主键列.

create bitmap index ind_zsj_obj_detail
on zsj_obj_detail(o.owner)
from zsj_obj o,zsj_obj_detail d
where o.object_id=d.object_id
ORA-25954: missing primary key or unique constraint on dimension
这里报错了,因为zsj_obj的object_id不是主键也不是唯一键,虽然说zsj_obj_detail的object_id是主键吧,但那不是我这里要求的.

alter table zsj_obj_detail drop constraint pk_zsj_obj_detail; 我把zsj_obj_detail上的主键给去掉了
alter table zsj_obj add constraint pk_zsj_obj primary key(object_id); 加上了zsj_obj的主键约束
SQL> desc zsj_obj;
Name Type Nullable Default Comments
--------- ------------ -------- ------- --------
OWNER VARCHAR2(30) Y
OBJECT_ID NUMBER
SQL> desc zsj_obj_detail;
Name Type Nullable Default Comments
------------- ------------- -------- ------- --------
OBJECT_ID NUMBER Y
OBJECT_NAME VARCHAR2(128) Y
OBJECT_TYPE VARCHAR2(19) Y
LAST_DDL_TIME DATE Y
STATUS VARCHAR2(7) Y

这样就可以创建上面的位图连接索引了:
create bitmap index ind_zsj_obj_detail
on zsj_obj_detail(o.owner)
from zsj_obj o,zsj_obj_detail d
where o.object_id=d.object_id;

Index created
索引创建成功了.

SQL> SELECT INDEX_TYPE,join_index,TABLE_NAME FROM USER_INDEXES WHERE INDEX_NAME=UPPER('ind_zsj_obj_detail');
INDEX_TYPE JOIN_INDEX TABLE_NAME
--------------------------- ---------- ------------------------------
BITMAP YES ZSJ_OBJ_DETAIL
这里显示是zsj_obj_detail表上的索引.

exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ',cascade=>true,estimate_percent=>100,method_opt=>'for columnS OWNER SIZE 254,OBJECT_ID SIZE 1');
exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',cascade=>true,estimate_percent=>100,method_opt=>'for all columnS size 1');
分析表zsj_obj_detail连带它的索引也分析的时候,ind_zsj_obj_detail也被分析了,可以知道这个索引确实是zsj_obj_detail这个表的索引.

SQL> SELECT OWNER,COUNT(1) CNT FROM ZSJ_OBJ GROUP BY OWNER ORDER BY CNT DESC;
OWNER CNT
------------------------------ ----------
SYS 23328
......
TSMSYS 3

21 rows selected

SQL> select TABLE_NAME,COLUMN_NAME,NUM_BUCKETS,HISTOGRAM from User_Tab_Columns where table_name in('ZSJ_OBJ','ZSJ_OBJ_DETAIL');
TABLE_NAME COLUMN_NAME NUM_BUCKETS HISTOGRAM
------------------------------ ------------------------------ ----------- ---------------
ZSJ_OBJ OWNER 21 FREQUENCY
ZSJ_OBJ OBJECT_ID 1 NONE
ZSJ_OBJ_DETAIL OBJECT_ID 1 NONE
ZSJ_OBJ_DETAIL OBJECT_NAME 1 NONE
ZSJ_OBJ_DETAIL OBJECT_TYPE 1 NONE
ZSJ_OBJ_DETAIL LAST_DDL_TIME 1 NONE
ZSJ_OBJ_DETAIL STATUS 1 NONE

7 rows selected

select COUNT(1)
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='SYS';
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | BITMAP CONVERSION COUNT | | 2509 | 12545 | 2 (0)| 00:00:01 |
|* 3 | BITMAP INDEX SINGLE VALUE| IND_ZSJ_OBJ_DETAIL | | | | |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("B"."SYS_NC00006$"='SYS')
Statistics
----------------------------------------------------------
0 db block gets
4 consistent gets
1 rows processed
只使用索引,不需要访问表数据.

select B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 183 | 710 (1)| 00:00:09 |
|* 1 | HASH JOIN | | 3 | 183 | 710 (1)| 00:00:09 |
|* 2 | TABLE ACCESS FULL| ZSJ_OBJ | 3 | 30 | 23 (5)| 00:00:01 |
| 3 | TABLE ACCESS FULL| ZSJ_OBJ_DETAIL | 52691 | 2624K| 687 (1)| 00:00:09 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
2 - filter("A"."OWNER"='TSMSYS')
Statistics
----------------------------------------------------------
0 db block gets
4000 consistent gets
3 rows processed
默认是不使用这个索引的,为什么呢?强制使用这个索引看看:
select /*+ INDEX(B ind_zsj_obj_detail)*/B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2509 | 124K| 1524 (1)| 00:00:19 |
| 1 | TABLE ACCESS BY INDEX ROWID | ZSJ_OBJ_DETAIL | 2509 | 124K| 1524 (1)| 00:00:19 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 3 | BITMAP INDEX SINGLE VALUE | IND_ZSJ_OBJ_DETAIL | | | | |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("B"."SYS_NC00006$"='TSMSYS')
Statistics
----------------------------------------------------------
0 db block gets
5 consistent gets
3 rows processed
明显,这里的逻辑io要低得多(当然,cpu time时间之类的统计信息这里看不到),但为什么默认情况下,优化器没有使用这个执行计划呢?可以看到这里的cost是1524,比710要大,所以被优化器给抛弃了.但为什么这个代价这个高呢?和可能和这里估算出来的card=2509有关.但我在zsj_obj.owner上是收集了柱状图统计信息的呀,所以可以看到在默认的执行计划中估算最终的card=3是完全准确的呀,可这里为什么估算出了2509呢,这个数字是如何得来的呢?
SQL> select table_name,round(num_rows/21,2) from user_tables where table_name in('ZSJ_OBJ','ZSJ_OBJ_DETAIL');
TABLE_NAME ROUND(NUM_ROWS/21,2)
------------------------------ --------------------
ZSJ_OBJ_DETAIL 2509.1
ZSJ_OBJ 2509.05
这里正好是2509,也就是说又认为这21个值是平均分布的了.
可以想象,这里的位图连接索引也会像FBI一样,对应于zsj_obj_detail上的一个虚拟列.
SQL> select column_name,num_distinct,num_buckets,histogram,virtual_column,hidden_column from user_tab_cols where table_name='ZSJ_OBJ_DETAIL';
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM VIRTUAL_COLUMN HIDDEN_COLUMN
------------------------------ ------------ ----------- --------------- -------------- -------------
OBJECT_ID 52691 1 NONE NO NO
OBJECT_NAME 32241 1 NONE NO NO
OBJECT_TYPE 41 1 NONE NO NO
LAST_DDL_TIME 52691 1 NONE NO NO
STATUS 2 1 NONE NO NO
SYS_NC00006$ NONE YES YES
这里的SYS_NC00006$正是上面的执行计划中出现的access filter: 3 - access("B"."SYS_NC00006$"='TSMSYS')
可以看到这个列上连基本的优化器统计信息都没有收集.我们收集一下它上的柱状图统计信息:
exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',estimate_percent=>100,method_opt=>'for columns SYS_NC00006$ size 254');
exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',estimate_percent=>100,method_opt=>'for columns "SYS_NC00006$" size 254');
exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',estimate_percent=>100,method_opt=>'for all hidden columns size 254');
exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',cascade=>true,estimate_percent=>100,method_opt=>'for all columns size 254');
可使用上面的任何一个选项,都不收集SYS_NC00006$这个虚拟列上的优化器统计信息,上面2个sql的执行计划,card也都没有任何的改变.
那就试试cardinality这个提示:
select /*+ cardinality(a 3)*/B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
不管用.还是原来默认的执行计划.

select /*+ cardinality(b 3)*/B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
也同样不是我要的结果.
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 183 | 688 (1)| 00:00:09 |
| 1 | NESTED LOOPS | | 3 | 183 | 688 (1)| 00:00:09 |
| 2 | TABLE ACCESS FULL | ZSJ_OBJ_DETAIL | 3 | 153 | 686 (1)| 00:00:09 |
|* 3 | TABLE ACCESS BY INDEX ROWID| ZSJ_OBJ | 1 | 10 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_ZSJ_OBJ | 1 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("A"."OWNER"='TSMSYS')
4 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Statistics
----------------------------------------------------------
109267 consistent gets
3 rows processed

伪造一下数据,让优化器认为这个虚拟里上的不同值很多,并且它原来就没有收集柱状图统计信息,按说优化器会认为a.owner='TSMSYS'返回很少的数据行了呀.这里就是10(52691/5000)行多点儿而已.
exec dbms_stats.set_column_stats(user,'ZSJ_OBJ_DETAIL','SYS_NC00006$',distcnt => 5000,density => 1/5000,nullcnt => 0);
SQL> select column_name,num_distinct,num_buckets,histogram,virtual_column,hidden_column,c.last_analyzed from user_tab_cols c where table_name='ZSJ_OBJ_DETAIL';
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM VIRTUAL_COLUMN HIDDEN_COLUMN LAST_ANALYZED
------------------------------ ------------ ----------- --------------- -------------- ------------- -------------
SYS_NC00006$ 5000 1 NONE YES YES 2010-12-21 20

但根本就不管用:
select B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
还是使用原来的执行计划.

select /*+ INDEX(B ind_zsj_obj_detail)*/B.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='TSMSYS';
card,cost根本就没有改变.
这个先到这里吧,继续其它的实验

*************************
SQL> exec dbms_stats.delete_table_stats(user,'ZSJ_OBJ_DETAIL',cascade_columns => TRUE,cascade_indexes => TRUE);
SQL> exec dbms_stats.gather_table_stats(user,'ZSJ_OBJ_DETAIL',cascade=>true,estimate_percent=>100,method_opt=>'for all columnS size 1');
继续实验吧!
create bitmap index ind2_zsj_obj_detail
on zsj_obj_detail(o.owner,d.object_type)
from zsj_obj o,zsj_obj_detail d
where o.object_id=d.object_id;

Index created
新建一个索引.
select COUNT(1)
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='SYS' and b.object_type='INDEX';

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 14 | | |
|* 2 | TABLE ACCESS BY INDEX ROWID | ZSJ_OBJ_DETAIL | 235 | 3290 | 1 (0)| 00:00:01 |
| 3 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 4 | BITMAP INDEX SINGLE VALUE | IND2_ZSJ_OBJ_DETAIL | | | | |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("B"."OBJECT_TYPE"='INDEX')
4 - access("B"."SYS_NC00007$"='SYS' AND "B"."SYS_NC00008$"='INDEX')
这个执行计划为什么还需要回访表呢?

create bitmap index ind3_zsj_obj_detail
on zsj_obj_detail(o.owner,d.last_ddl_time desc)
from zsj_obj o,zsj_obj_detail d
where o.object_id=d.object_id;

Index created
是可以创建成功的

select * from
(
select b.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='SYS' order by b.last_ddl_time desc
)
where rownum<=10;
默认,是不使用这个索引的

select * from
(
select /*+ index(b ind3_zsj_obj_detail)*/b.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='SYS' order by b.last_ddl_time desc
)
where rownum<=10;
强制使用这个索引,但还是有排序操作.
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 1040 | 12874 (1)| 00:02:35 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 10328 | 1048K| 12874 (1)| 00:02:35 |
|* 3 | SORT ORDER BY STOPKEY | | 10328 | 615K| 12874 (1)| 00:02:35 |
|* 4 | HASH JOIN | | 10328 | 615K| 12872 (1)| 00:02:35 |
|* 5 | TABLE ACCESS FULL | ZSJ_OBJ | 23328 | 227K| 23 (5)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID | ZSJ_OBJ_DETAIL | 23328 | 1161K| 12849 (1)| 00:02:35 |
| 7 | BITMAP CONVERSION TO ROWIDS| | | | | |
|* 8 | BITMAP INDEX RANGE SCAN | IND3_ZSJ_OBJ_DETAIL | | | | |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=10)
3 - filter(ROWNUM<=10)
4 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
5 - filter("A"."OWNER"='SYS')
8 - access("B"."SYS_NC00009$"='SYS')
filter("B"."SYS_NC00009$"='SYS')
Statistics
----------------------------------------------------------
3364 consistent gets
10 rows processed

SQL> alter table zsj_obj_detail modify(last_ddl_time not null);
last_ddl_time列上加上非空的约束条件.

select * from
(
select /*+ index(b ind3_zsj_obj_detail)*/b.*
from zsj_obj a,zsj_obj_detail b
where a.object_id=b.object_id and a.owner='SYS' order by b.last_ddl_time desc
)
where rownum<=10;
仍然需要排序.而无法像普通索引那样避免排序操作.
当然,这里一系列的疑问,包括索引里的字段都包括了,为什么还需要回访表;这里为什么不能避免排序操作,应该来说都和位图连接索引的内部结构和实现相关,毕竟它不是常规的平衡树索引,无法实现常规的平衡树索引的功能.

我这里做这些实验,其实是因为我想通过位图连接索引实现一些sql性能上的优化(假如它具有平衡树索引的一些特性的话):
有一个产品表product,里面有一些通用的产品信息,包括创建时间,pageview等非空字段,经常需要某个小类或者是品牌的top10新品,top10热品信息,如果这个表有catalog_id,brand_id字段记录这个产品所属的小类,品牌的话,创建类似index(catalog_id,pageview desc)这样的索引,那么实现某个小类或者是品牌的top10新品,top10热品信息的检索完全可以使用这些索引来高效的完成,根本就不需要排序的(它的资源消耗几乎是确定的,只和索引的blevel有关,和这个小类,品牌下的产品数量多少没有关系).但因为一些历史的原因:原来一个产品是可以属于多个小类的,原来一个产品要在多个网站上展现,而这多个网站下相同的小类,品牌的id是不相同的,导致catalog_id,brand_id不能作为product的一个属性记录在product表中了,所以新加了一个表catalogrelateproduct记录这种对应关系(catalogid,productid),其中catalogid可以是小类id,也可以是品牌id.
SELECT *
FROM (SELECT P.*
FROM PRODUCT P, CATALOGRELATEPRODUCT CATAP
WHERE CATAP.CATALOGID = 122
AND P.ID = CATAP.PRODUCTID
AND P.PAGEVIEW IS NOT NULL
ORDER BY P.PAGEVIEW DESC)
WHERE ROWNUM <= 10;
对于这样的语句来说,性能上的损失是不可避免的了.其实我是想尝试一下,看看位图连接索引是否可以有所帮助,能够像普通的平衡树索引一样建立index(CATAP.CATALOGID,P.PAGEVIEW desc) 而快速的找到前10行数据.这样的话,现在的表结构不用修改,业务代码不用修改,sql也不需要修改.看来是不可能了.
因为后来确定一个产品只能属于一个小类了,并且现在只有一个站点在使用这批数据了,所以是可以使用product上的字段来记录它所属的小类,品牌了.是可以高效实现这样的功能,不需要排序的.但修改这批相关的sql而提高性能看来是不可避免的了.

当然了,即使确实能够如我所想的那样实现,在OLTP这样频繁的进行dml操作的数据库系统上,它的使用也是一个问题:
普通的位图索引,在修改一行数据的时候,锁定的并不只是修改的这行数据,而是相关的一组数据行,也就是说存在着锁扩容的问题,但至少在表上放置的还是TM3的锁.位图连接索引就不是这样了:

create index ind4_zsj_obj_detail on zsj_obj_detail(object_id);

523> insert into zsj_obj(Owner,OBJECT_ID) VALUES('ZHAOSJ',100000);
这需要先在zsj_obj上放置tm3的锁,然后在zsj_obj_detail上放置tm4的锁.
524> INSERT INTO ZSJ_OBJ_DETAIL(OBJECT_ID,OBJECT_NAME,OBJECT_TYPE,LAST_DDL_TIME,STATUS) VALUES(100002,'TEST1','TABLE',SYSDATE,'VALID');
这需要先在zsj_obj上先放置一个tm4的锁,然后在zsj_obj_detail上放置一个tm3的锁.
若在同一个事务中同时操作这两个表,最终会在zsj_obj,zsj_obj_detail上都持有tm5的锁.

都需要tm4,甚至tm5的表级锁了,显然这是OLTP系统所不能承受的.
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: