Pipelined Table Function Statistics and Dynamic Sampling
2013-12-16 10:32
This is an extraction of Adrian Billington's article:
http://www.oracle-developer.net/display.php?id=429
At some point, you might need to join a pipelined function to another rowsource (such as a table, a view, or the intermediate output of other joins within a SQL execution plan). Rowsource statistics (such as cardinality, data distribution, and nulls) are critical to achieving efficient execution plans, but in the case of pipelined functions (or indeed any table function), the cost-based optimizer doesn't have much information to work with.
Cardinality heuristics for pipelined table functions
Up to and including Oracle Database 11g Release 1, the CBO applies a heuristic cardinality to pipelined and table functions in SQL statements, and this can sometimes lead to inefficient execution plans. The default cardinality appears to depend on the value of the DB_BLOCK_SIZE initialization parameter, but on a database with a standard 8KB block size Oracle uses a heuristic of 8,168 rows. I can demonstrate this quite easily with a pipelined function that pipes a subset of columns from the employees table. Using Autotrace in SQL*Plus to generate an execution plan, I see the following.
/* Files on web: cbo_setup.sql and cbo_test.sql */
SQL> SELECT *
2 FROM TABLE(pipe_employees) e;
Execution Plan
----------------------------------------------------------
Plan hash value: 1802204150
--------------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 |
| 1 | COLLECTION ITERATOR PICKLER FETCH| PIPE_EMPLOYEES | |
--------------------------------------------------------------------
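The setup script cbo_setup.sql is not reproduced in this extraction, but a pipelined function of this kind might be sketched as follows. This is an illustrative assumption, not the article's actual code: the object type, collection type, and column list are invented for the sketch (the real pipe_employees pipes 50,000 rows).

```sql
-- Hypothetical sketch of a function like pipe_employees
-- (the actual definition lives in cbo_setup.sql on the web).
CREATE TYPE employee_ot AS OBJECT
( employee_id   NUMBER
, department_id NUMBER
, last_name     VARCHAR2(25)
);
/

CREATE TYPE employee_ntt AS TABLE OF employee_ot;
/

CREATE FUNCTION pipe_employees
   RETURN employee_ntt PIPELINED AS
BEGIN
   FOR r IN ( SELECT employee_id, department_id, last_name
              FROM   employees )
   LOOP
      -- Each PIPE ROW call streams one row back to the consumer
      -- without materializing the whole collection in memory.
      PIPE ROW ( employee_ot(r.employee_id, r.department_id, r.last_name) );
   END LOOP;
   RETURN;
END pipe_employees;
/
```

The key point for the optimizer discussion is that nothing in this definition tells the CBO how many rows the function will pipe, which is why it falls back to the block-size-based heuristic.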
This pipelined function actually returns 50,000 rows, so if I join this pipelined function to the departments table, I run the risk of getting a suboptimal plan.
/* File on web: cbo_test.sql */
SQL> SELECT *
2 FROM departments d
3 , TABLE(pipe_employees) e
4 WHERE d.department_id = e.department_id;
Execution Plan
----------------------------------------------------------
Plan hash value: 4098497386
----------------------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 |
| 1 | MERGE JOIN | | 8168 |
| 2 | TABLE ACCESS BY INDEX ROWID | DEPARTMENTS | 27 |
| 3 | INDEX FULL SCAN | DEPT_ID_PK | 27 |
|* 4 | SORT JOIN | | 8168 |
| 5 | COLLECTION ITERATOR PICKLER FETCH| PIPE_EMPLOYEES | |
----------------------------------------------------------------------
As predicted, this appears to be a suboptimal plan; it is unlikely that a sort-merge join will be more efficient than a hash join in this scenario. So how do I influence the CBO? For this example I could use simple access hints such as LEADING
and USE_HASH to effectively override the CBO’s cost-based decision and secure a hash join between the table and pipelined function. However, for more complex SQL statements, it is quite difficult to provide all the hints necessary to “lock down” an execution
plan. It is often far better to provide the CBO with better statistics with which to make its decisions.
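For this particular query, the hinted version might look like the following sketch (the hint arguments reference the d and e aliases from the query above):

```sql
-- Force departments to lead and a hash join to the pipelined
-- function, overriding the CBO's heuristic-driven sort-merge plan.
SELECT /*+ LEADING(d) USE_HASH(e) */
       *
FROM   departments d
,      TABLE(pipe_employees) e
WHERE  d.department_id = e.department_id;
```

This works for a two-rowsource join, but as noted above it scales poorly: every additional rowsource multiplies the hints needed to pin the plan down.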
Optimizer dynamic sampling
Dynamic sampling is an extremely useful feature that enables the optimizer to take a small statistics sample of one or more objects in a query during the parse phase. You might use dynamic sampling when you haven't gathered statistics on all of your tables in a query, or when you are using transient objects such as global temporary tables. Starting with version 11.1.0.7, Oracle Database 11g enhanced this feature to include sampling for table and pipelined functions.
To see what difference this feature can make, I’ll repeat my previous query but include a DYNAMIC_SAMPLING hint for the pipe_employees function.
/* File on web: cbo_test.sql */
SQL> SELECT /*+ DYNAMIC_SAMPLING(5) */
2 *
3 FROM departments d
4 , TABLE(pipe_employees) e
5 WHERE d.department_id = e.department_id;
Execution Plan
----------------------------------------------------------
Plan hash value: 815920909
---------------------------------------------------------------------
| Id | Operation | Name | Rows |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50000 |
|* 1 | HASH JOIN | | 50000 |
| 2 | TABLE ACCESS FULL | DEPARTMENTS | 27 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| PIPE_EMPLOYEES | |
---------------------------------------------------------------------
This time, the CBO has correctly computed the 50,000 rows that my function returns and has generated a more suitable plan. Note that I used the word “computed” and not “estimated” because in version 11.1.0.7 and later, the optimizer takes a 100%
sample of the table or pipelined function, regardless of the dynamic sampling level being used (this is also the case in Oracle Database 11g Release 2). I used level 5, but I could have used anything between level 2 and level 10 to get exactly the same result.
This means, of course, that dynamic sampling can be costly or time-consuming when used for queries involving high-volume or long-running pipelined functions.
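When sampling every rowsource in a larger statement is undesirable, the DYNAMIC_SAMPLING hint also accepts a table-alias form that scopes sampling to a single rowsource. A sketch using the same query (the alias form shown here is documented hint syntax, but this variant is not demonstrated in the original article):

```sql
-- Scope dynamic sampling to just the pipelined function's
-- rowsource rather than the whole statement.
SELECT /*+ DYNAMIC_SAMPLING(e 5) */
       *
FROM   departments d
,      TABLE(pipe_employees) e
WHERE  d.department_id = e.department_id;
```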