CBO

Richard Foote's picture

The CPU Costing Model: A Few Thoughts Part III (Bang Bang)

One of the advantages of system statistics and the CPU costing model is in how the CBO deals with multiblock reads and the impact of the db_file_multiblock_read_count parameter. When performing a full table (or fast full index) scan, the CBO needs to determine just how many multiblock read operations are likely to be performed so the associated operation can [...]

Richard Foote's picture

The CPU Costing Model – A Few Thoughts Part II

As previously discussed, the formula used by the CBO using the CPU costing model is basically:
 
(sum of all the single block I/Os x average wait time for a single block I/O +
 sum of all the multiblock I/Os x average wait time for a multiblock I/O +
 sum of all the required CPU cycles / CPU cycles per second)
/
average [...]

Richard Foote's picture

The CPU Costing Model: A Few Thoughts Part I

In the coming days, I’ll post a series of little entries highlighting a specific point in relation to the use of system statistics and the CPU cost model. In my previous post, we looked at how the cost of a FTS is calculated using the CPU costing model and how this generally results in an increase in the associated FTS [...]

randolf.geist's picture

Multi-column joins

Consider the following scenario with four tables. Two of them represent master data, the third one uses a concatenated primary key consisting of foreign keys to the first two, and the fourth one has a foreign key to the third one.

drop table t1 cascade constraints purge;

drop table t2 cascade constraints purge;

drop table t3 cascade constraints purge;

drop table t4 cascade constraints purge;

create table t1 (
t1_id integer not null constraint pk_t1 primary key,
filler1 varchar2(40),
filler2 varchar2(40)
);

create table t2 (
t2_id integer not null constraint pk_t2 primary key,
filler1 varchar2(40),
filler2 varchar2(40)
);

create table t3 (
t1_id integer not null,
t2_id integer not null,
filler1 varchar2(40),
filler2 varchar2(40),
constraint pk_t3 primary key (t1_id, t2_id) using index (
create index pk_t3 on t3 (t1_id, t2_id)
),
constraint fk_t3_1 foreign key (t1_id) references t1 (t1_id),
constraint fk_t3_2 foreign key (t2_id) references t2 (t2_id));

create table t4 (
t4_id integer not null constraint pk_t4 primary key,
t1_id integer,
t2_id integer,
filler1 varchar2(40),
filler2 varchar2(40),
constraint t4_fk_1 foreign key (t1_id, t2_id) references t3 (t1_id, t2_id)
);

Notice that the primary key of "t3" is using a non-unique index, which is supported and can be used e.g. for deferrable constraints or when loading data into tables that might be non-unique so that the constraint can be disabled without dropping the (unique) index. This allows to simply re-enable the constraint after cleaning up the non-unique rows instead of re-creating an unique index (and the risk of losing the index if anything goes wrong).

Now when using an uncorrelated data set for the concatenated keys, Oracle's default (join) selectivity formulas apply and the estimated cardinalities are correct. Table "t1" has 10,000 rows, "t2" 3 rows, table "t3" holds 30,000 rows combining "t1" and "t2" data. "t4" has 300,000 rows.

-- non-correlated column values
exec dbms_stats.set_table_stats(null, 't1', numrows=>10000, numblks=>100, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't2', numrows=>3, numblks=>1, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't3', numrows=>30000, numblks=>300, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't4', numrows=>300000, numblks=>3000, avgrlen=>100)

exec dbms_stats.set_column_stats(null, 't1', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't2', 't2_id', distcnt=>3, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't2_id', distcnt=>3, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't4_id', distcnt=>300000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't2_id', distcnt=>3, nullcnt=>0)

exec dbms_stats.set_index_stats(null, 'pk_t1', numdist=>10000, clstfct=>10000, indlevel=>2, numlblks=>10, numrows=>10000)

exec dbms_stats.set_index_stats(null, 'pk_t2', numdist=>3, clstfct=>3, indlevel=>1, numlblks=>1, numrows=>3)

exec dbms_stats.set_index_stats(null, 'pk_t3', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>30000)

exec dbms_stats.set_index_stats(null, 'pk_t4', numdist=>300000, clstfct=>300000, indlevel=>2, numlblks=>300, numrows=>30000)

Joining t4 to t3 results in a correct estimate of 300K rows:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1456 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 300K| 14M| 1120K| 1456 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

Things look however different if we have the awkward situation of correlated column values for the concatenated keys:

-- correlated column values
exec dbms_stats.set_table_stats(null, 't1', numrows=>10000, numblks=>100, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't2', numrows=>20000, numblks=>200, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't3', numrows=>30000, numblks=>300, avgrlen=>100)

exec dbms_stats.set_table_stats(null, 't4', numrows=>300000, numblks=>3000, avgrlen=>100)

exec dbms_stats.set_column_stats(null, 't1', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't2', 't2_id', distcnt=>20000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't3', 't2_id', distcnt=>20000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't4_id', distcnt=>300000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't1_id', distcnt=>10000, nullcnt=>0)

exec dbms_stats.set_column_stats(null, 't4', 't2_id', distcnt=>20000, nullcnt=>0)

exec dbms_stats.set_index_stats(null, 'pk_t1', numdist=>10000, clstfct=>10000, indlevel=>2, numlblks=>10, numrows=>10000)

exec dbms_stats.set_index_stats(null, 'pk_t2', numdist=>20000, clstfct=>20000, indlevel=>2, numlblks=>20, numrows=>20000)

exec dbms_stats.set_index_stats(null, 'pk_t3', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>30000)

exec dbms_stats.set_index_stats(null, 'pk_t4', numdist=>300000, clstfct=>300000, indlevel=>2, numlblks=>300, numrows=>30000)

Here we simulate 20,000 distinct values in one column, 10,000 distinct values in the second one, but only 30,000 distinct values for the combination of both columns. In this case Oracle's default selectivity formula underestimates the cardinality since it is assuming uncorrelated values:

select /*+ opt_param('_optimizer_join_sel_sanity_check', 'false') */
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 45 | 2340 | 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

Oracle simply multiplies the selectivity of the two columns and arrives at a join cardinality of 45 rows (1/20,000*1/10,000*300,000*30,000).

You'll notice that I had to use a undocumented optimizer parameter to arrive at that default selectivity. If you run an EXPLAIN PLAN for the same statement without the hint, you'll get the following estimate:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 30000 | 1523K| 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

It can be seen from a 10053 optimizer trace file that Oracle uses a "Multi-column cardinality sanity check" by default in cases where the calculated multi-column selectivity falls below a certain limit, obviously using the smaller selectivity available from the different 1/num_rows of the tables/row sources involved in the join, arriving at an estimate 30,000 rows in this particular case.

Changing the non-unique index used for the primary key on "t3" to a unique index will bring another sanity check into the picture: The "concatenated index" sanity check that uses the number of distinct values of an unique index that corresponds exactly to the join columns used.

create table t3 (
t1_id integer not null,
t2_id integer not null,
filler1 varchar2(40),
filler2 varchar2(40),
constraint pk_t3 primary key (t1_id, t2_id) using index (
create unique index pk_t3 on t3 (t1_id, t2_id)
),
constraint fk_t3_1 foreign key (t1_id) references t1 (t1_id),
constraint fk_t3_2 foreign key (t2_id) references t2 (t2_id));

With this unique index in place Oracle uses the number of distinct keys from this index to calculate the selectivity of the join and therefore arrives at the correct cardinality again:

select
count(*)
from
t4
, t3
where
t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 1455 (3)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 52 | | | |
|* 2 | HASH JOIN | | 300K| 14M| 1120K| 1455 (3)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
----------------------------------------------------------------------------------------

So this is another case where the uniqueness of an index makes a significant difference for optimizer calculations.

Note that from 11g on there more options to help the optimizer to come up with a better estimate even with the non-unique index on (t3.t1_id, t3.t2_id). Obviously 11g introduced extended statistics on column groups, so we can do the following:

variable ext_name varchar2(30)

exec :ext_name := dbms_stats.create_extended_stats(null, 't3', '(t1_id, t2_id)')

exec dbms_stats.set_column_stats(null, 't3', :ext_name, distcnt=>30000, nullcnt=>0)

This allows to derive the correct selectivity for these correlated column values using the extended statistics set.

Another option in 11g is adding an index on (t4.t1_id, t4.t2_id), like that:

create index ix_t4 on t4 (t1_id, t2_id);

exec dbms_stats.set_index_stats(null, 'ix_t4', numdist=>30000, clstfct=>30000, indlevel=>2, numlblks=>30, numrows=>300000)

Having now two non-unique indexes Oracle 11g comes up again with the correct join cardinality of 300K. Notice that this doesn't work in pre-11g. Pre-11g versions require the index on t3 to be unique to take advantage of the "concatenated index" sanity check.

Having demonstrated all these sanity checks available for multi-column joins (the general multi-column and the concatenated index sanity check), let's see what happens when joining three tables:

select
count(*)
from
t1
, t3
, t4
where
t4.t1_id = t1.t1_id
and t3.t1_id = t4.t1_id
and t3.t2_id = t4.t2_id;

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | | 1468 (4)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 65 | | | |
|* 2 | HASH JOIN | | 300K| 18M| | 1468 (4)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN | PK_T1 | 10000 | 126K| | 4 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 300K| 14M| 1120K| 1455 (3)| 00:00:18 |
| 5 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
-----------------------------------------------------------------------------------------

All I've done is to add "t1", in this case joining to "t4" on "t1_id". Thanks to the concatenated index sanity check (or the extended column group statistics in 11g) the calculated join cardinality is still 300K.

Now what happens if one decides to join "t3" to "t1" on "t1_id" instead of "t4.t1_id"? From a logical point of view this should lead to exactly the same result, since we can deduce that if "t4.t1_id" = "t1.t1_id" and "t3.t1_id = t1.t1_id" then "t3.t1_id = t4.t1_id".

select
count(*)
from
t1
, t3
, t4
where
t4.t1_id = t1.t1_id
and t3.t1_id = t1.t1_id
and t3.t2_id = t4.t2_id;

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 65 | | 1475 (4)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 65 | | | |
|* 2 | HASH JOIN | | 45 | 2925 | | 1475 (4)| 00:00:18 |
| 3 | INDEX FAST FULL SCAN | PK_T1 | 10000 | 126K| | 4 (0)| 00:00:01 |
|* 4 | HASH JOIN | | 450K| 22M| 1120K| 1459 (3)| 00:00:18 |
| 5 | INDEX FAST FULL SCAN| PK_T3 | 30000 | 761K| | 11 (10)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T4 | 300K| 7617K| | 833 (3)| 00:00:10 |
-----------------------------------------------------------------------------------------

The result is astonishing. By making this simple change we have effectively disabled all available sanity checks and arrive at the result based on the the default, uncorrelated selectivity.

So whenever you perform multi-column joins and the column data is correlated, be very careful how you join the tables - it might make a significant difference to the calculations of the optimizer.

alberto.dellera's picture

CBO: NewDensity for Frequency Histograms,11g-10.2.0.4 (densities part IV)

As we have seen in the previous posts of this series, in 11g a new figure named "NewDensity" has been introduced as a replacement for the "density" column statistic for columns whose histogram has been collected; this change has been backported in 10.2.0.4 also.
In the previous post we discussed how NewDensity influences the CBO [...]

rshamsud's picture

Is plan_hash_value a final say?

I was reviewing a performance issue with a client recently. Problem is that increased global cache waits causing application slowdown affecting few critical business functions. Using one of my script gc_traffic.sql > and graphing the results with Excel spreadsheet, it is established that there is a marked increase in GC traffic today compared to week earlier. Similar jobs runs every day and so comparing two week days is sufficient to show the GC traffic increase. Graph is between total blocks and AWR snap time in 30 minutes interval. [Click the picture below to review the graph clearly.]

gc comparison

Identifying the object creating this increased GC traffic is essential to identify root cause. We were able to quickly determine that this increase in GC traffic was localized around few SQL statements using ADDM and AWR reports. We decided to focus on one SQL with an obvious increase in elapsed time compared to prior week. So, first question asked, is there a change in the plan? plan_hash_value was reviewed and quickly determined that there is no change in the plan_hash_value.

Little bit of history, there were few changes performed by the vendor over the weekend as a part of few bug fixes. Vendor’s argument was that since there is no change to the plan_hash_value, SQL access plan did not change and so, this can’t be due to vendor changes. My Client’s argument was that there were no changes to the environment and problem started after the application changes.

There are many different things that can go wrong without changes to the execution plan. We can ignore those conditions for now since (a) there has been no changes to the environment (b) no visible changes to the data (c) no error message and average CR recv time is consistent with prior weeks. Well, Let’s cut to the chase. It boiled down to a question “Can SQL plan change without a change in plan_hash_value?”. What do you think? Please answer in the poll below.

plan_hash_value and hash_value

Hash_value of a SQL statement is generated from the text of an SQL statement and plan_hash_value is generated from the execution plan of that SQL statement[ More accurately, from that child cursors' execution plan and exactly what is involved in generating plan_hash_value is not published]. It is a general belief that plan_hash_value will change even if there is a slightest change in the execution plan. But, that is not always the case!

Test case

We will use a small table to explore this issue.

prompt Test case #1: Initial test case with index on columns (n1, n2).
prompt ==========
create table t1 (n1 number, n2 number, v1 varchar2(100));
create index t1_n1 on t1(n1,n2);

explain plan for select * from t1 where n1=:b1 and n2=:b2;
select * from table (dbms_xplan.display);
prompt plan hash value in this case is 626762252

Plan hash value: 626762252
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N1"=TO_NUMBER(:B1) AND "N2"=TO_NUMBER(:B2))

So, we got the plan_hash_value as highlighted above. In the following test case we will recreate the index reordering the columns as (n2, n1).

prompt Test case #2: Index with re-ordered columns
prompt ==========
drop index t1_n1;
create index t1_n1 on t1(n2,n1);

Plan hash value: 626762252

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |  5298 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N2"=TO_NUMBER(:B2) AND "N1"=TO_NUMBER(:B1))

Notice that in the test result above filter predicate changed reflecting index column reordering. But the plan_hash_value did not change. Point is that execution plan can change without a change in plan_hash_value (due to change in the underlying tables).

Let’s modify that index dropping a column from the index in the next test case.

prompt Test case #3: dropping a column from index t1_n1.

drop index t1_n1;
create index t1_n1 on t1(n1);

explain plan for select * from t1 where n1=:b1 and n2=:b2;

Plan hash value: 626762252

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |    41   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |    41   (3)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |  2119 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=TO_NUMBER(:B2))
   2 - access("N1"=TO_NUMBER(:B1))

We see that predicates changed but the plan_hash_value did not change. In the test case below, we will modify index to be a function based index and test this SQL statement. There are also few more self-evident test cases below.

prompt Test case #3: Index is a function based index. Still, no change in the plan_hash_value.

create index t1_n1 on t1(to_char(n1));
explain plan for select * from t1 where to_char(n1)=:b1 and n2=:b2;

Plan hash value: 626762252
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |   127   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |   127   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |  2119 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=TO_NUMBER(:B2))
   2 - access(TO_CHAR("N1")=:B1)

prompt Test case #4: Different schema and same SQL. plan_hash_value did not change.

Plan hash value: 626762252
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    78 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |    78 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |     3 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("N2"=TO_NUMBER(:B2))
   2 - access("N1"=TO_NUMBER(:B1))

plan_hash_value is also case sensitive for index names. In the test cases below, we will create case sensitive indices.

prompt Test case #5: Lower case index_name.. plan_hash_value changed.
create index "t1_n1" on t1(n1,n2);
explain plan for select * from t1 where n1=:b1 and n2=:b2;

Plan hash value: 2252949961
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | t1_n1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

prompt Test case #6: Upper case index_name.. plan_hash_value did not change.
create index "T1_N1" on t1(n1,n2);
explain plan for select * from t1 where n1=:b1 and n2=:b2;
Plan hash value: 626762252

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

prompt Test case #7: Space in the index name changes plan_hash_value though.
create index "T1_N1 " on t1(n1,n2);
explain plan for select * from t1 where n1=:b1 and n2=:b2;
Plan hash value: 1377541522
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1  |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Summary

In summary, plan_hash_value is a very good indicator to see if the plan changed or not. Don’t get me wrong, I also use plan_hash_value, but in addition to comparing CPU time and elapsed time. Execution plan can change even when there is no change to the plan_hash_value. Salient points from these test cases are:

  1. Plan_hash_value is dependent upon partial execution plan not on complete execution plan.
  2. If a predicate is moved from filter_predicate to access_predicate or vice-versa, it doesn’t affect plan_hash_value.
  3. Changes in the parallelism of the queries does not affect plan_hash_value. For example, if a query used 4 parallel slaves today and 16 parallel slaves yesterday, that change is not visible through plan_hash_value. This behavior is expected as parallel_adaptive_multiuser can allocate slaves depending upon the state of that instance.
  4. Plan_hash_value is case-sensitive to index/table names. Also sensitive to white-space characters.
  5. Plan_hash_value is not sensitive to index types. For example, if the index type is a function based index as in our test case #4, as long as, index name did not change plan_hash_value will remain the same.
  6. Plan_hash_value is not sensitive to schema either. SQL statement accessing different schemas can have same plan_hash_value too.

Back to our problem. One of the index was recreated removing a column and caused optimizer to apply filter predicates at table level increasing number of accesses to table block tremendously, leading to more logical reads, more Global cache waits etc. This problem was amplified since this SQL was executed very frequently and concurrently from all RAC instances.

In my client’s defense, this application change was tested thoroughly. But, alas, test data chosen for this performance test was not probing this specific issue. This performance issue did not show up in development. Essentially, chosen data in the performance benchmark suite was not an excellent choice.

As they say in Gaelic language “Go raimh maith agat” to my client for allowing me to post this blog.
This can be read in traditional format in
plan_hash_value_and_gc
Update 1: Added the document and also corrected a typo.

rshamsud's picture

Exciting seminars in Dallas arena

If you live in Dallas area, there are two very important seminars coming up:

  1. Tanel Poder is presenting his Advanced Oracle troubleshooting seminar on September 9-11, 2009. .
  2. Jonathan Lewis presenting two day seminar cost based optimization and writing efficient SQL on Oct 28 and Oct 29 2009
  3. .

These two gentlemen needs no introduction. Tanel Poder is an expert in Oracle internals and Jonathan Lewis is a Guru in Cost based optimization and writing efficient SQL.

Tell them I sent ya :-)

Update: I just heard from Tanel and he is unable to make it to Dallas this September. He is rescheduling his seminar and I can’t wait for his seminar. I will update this blog with his rescheduled dates.

randolf.geist's picture

Optimizer cleverness

At present I'm quite busy and therefore don't have much time to spent on writing blog notes, but I couldn't resist to publish this small and simple test case.

Often you can read (mostly unqualified) rants in various places and forums about the Cost Based Optimizer how stupid, unpredictable etc. it seems to be.

So I think it's time to demonstrate how clever the optimizer sometimes can be.

Consider the following setup:

drop table t_opt_clever purge;

-- Use PCTFREE 99 so that only one row per (leaf) block
-- This can tell us how many "rows" had to be inspected
-- by checking the number of (leaf) blocks accessed
-- Unfortunately Oracle (usually) doesn't provide the information
-- how many rows have been accessed in the execution plan,
-- but only how many rows are returned by an operation
create table t_opt_clever (
id not null constraint pk_opt_clever primary key,
col1 not null,
col2 not null,
col3 not null,
col4 not null,
col5 not null,
filler
)
pctfree 99
pctused 1
as
select
level as id
, round(dbms_random.value(0, 200)) as col1
, round(dbms_random.value(0, 400)) as col2
, case
when level <= 666
then 'FIRST_BUCKET'
when level <= 833
then 'SECOND_BUCKET'
when level <= 1000
then 'THIRD_BUCKET'
end as col3
, round(dbms_random.value(0, 600)) as col4
, round(dbms_random.value(0, 800)) as col5
, rpad('x', 100, 'x') as filler
from
dual
connect by
level <= 1000;

create index idx_opt_clever1 on t_opt_clever (col5, col1, col4, col2) pctfree 99 compute statistics;

create index idx_opt_clever2 on t_opt_clever (col5, col1, col3, col4, col2) pctfree 99 compute statistics;

exec dbms_stats.gather_table_stats(null, 'T_OPT_CLEVER')

-- scale the table and index by factor 1000
exec dbms_stats.set_table_stats(null, 'T_OPT_CLEVER', numrows => 1000000, numblks => 30000)

exec dbms_stats.set_index_stats(null, 'PK_OPT_CLEVER', numrows=> 1000000, numlblks => 2000, numdist=>1000000, clstfct => 100000, indlevel => 3)

exec dbms_stats.set_index_stats(null, 'IDX_OPT_CLEVER1', numrows=> 1000000, numlblks => 14000, numdist=>1000000, clstfct => 1000000, indlevel => 3)

exec dbms_stats.set_index_stats(null, 'IDX_OPT_CLEVER2', numrows=> 1000000, numlblks => 16000, numdist=>1000000, clstfct => 1000000, indlevel => 3)

Basically this simulates a 1,000,000 rows table with two suboptimal indexes given the following Top 100 query:

-- Now which index can be efficiently used by the optimizer?
select
*
from (
select
*
from
t_opt_clever
where
col3 = 'FIRST_BUCKET'
order by
col3, col5, col1, col4, col2
)
where
rownum <= 100;

Now what do you think, can one of these indexes efficiently be used by the optimizer, and if yes, which one?

At first sight both indexes can't be used to satisfy the requested sort order to avoid a costly full scan of data and a corresponding SORT ORDER BY (STOPKEY) operation, and can't be used efficiently to filter the data because the filter predicate is not among the leading columns.

Let's check the result:

SQL> select * from table(dbms_xplan.display_cursor(null, null, '+COST ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID fz6vky8n5a3xq, child number 0
-------------------------------------
select * from ( select * from t_opt_clever where
col3 = 'FIRST_BUCKET' order by col3, col5, col1, col4, col2 ) where
rownum <= 100

Plan hash value: 4203008252

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------
|* 1 | COUNT STOPKEY | | 1 | | | 100 |00:00:00.29 | 256 | 100 |
| 2 | VIEW | | 1 | 101 | 109 (0)| 100 |00:00:00.29 | 256 | 100 |
| 3 | TABLE ACCESS BY INDEX ROWID| T_OPT_CLEVER | 1 | 333K| 109 (0)| 100 |00:00:00.29 | 256 | 100 |
|* 4 | INDEX FULL SCAN | IDX_OPT_CLEVER2 | 1 | 101 | 8 (0)| 100 |00:00:00.01 | 156 | 0 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=100)
4 - access("COL3"='FIRST_BUCKET')
filter("COL3"='FIRST_BUCKET')

24 rows selected.

That is quite interesting, the index IDX_OPT_CLEVER2 is used and no SORT ORDER BY operation can be found in the execution plan, although the index doesn't match the requested sort order. And here comes the cleverness of the optimizer: It recognizes that due to the filter predicate on COL3 this index can actually be used to satisfy the sort order because it is not relevant for the resulting order since COL3 will always be the constant value of the filter predicate. And the same applies to IDX_OPT_CLEVER1, by the way.

But IDX_OPT_CLEVER2 is more efficient than using IDX_OPT_CLEVER1 because the filter predicate can be evaluated on the index data already eliminating some of the rows before visiting the table. Depending on the clustering factor this can make a significant difference to the cost of the operation, since random row accesses to table rows potentially require to access a different block per row.

This can be seen when forcing the usage of IDX_OPT_CLEVER1:

SQL> select * from table(dbms_xplan.display_cursor(null, null, '+COST ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5tgmgfvyyx6z6, child number 0
-------------------------------------
select * from ( select /*+ index(t_opt_clever idx_opt_clever1) */ * from
t_opt_clever where col3 = 'FIRST_BUCKET' order by col3,
col5, col1, col4, col2 ) where rownum <= 100

Plan hash value: 678132971

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------
|* 1 | COUNT STOPKEY | | 1 | | | 100 |00:00:00.20 | 310 | 54 |
| 2 | VIEW | | 1 | 101 | 312 (1)| 100 |00:00:00.20 | 310 | 54 |
|* 3 | TABLE ACCESS BY INDEX ROWID| T_OPT_CLEVER | 1 | 101 | 312 (1)| 100 |00:00:00.20 | 310 | 54 |
| 4 | INDEX FULL SCAN | IDX_OPT_CLEVER1 | 1 | 1000K| 8 (0)| 154 |00:00:00.01 | 156 | 0 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=100)
3 - filter("COL3"='FIRST_BUCKET')

23 rows selected.

Two things can be seen here:

1. The optimizer is again smart and is able to avoid the SORT ORDER BY operation, because the index IDX_OPT_CLEVER1 can also be used to return in the data in the requested order, again because COL3 is constant.

2. Using IDX_OPT_CLEVER1 is less efficient because more table rows have to be visited to apply the filter predicate.

The fact that the indexes can only be used efficiently under this special circumstance can be verified by changing the filter predicate so that COL3 can have more than a single value and therefore it's no longer possible to avoid an ORDER BY operation:

-- Change the filter predicate and force index
select
*
from (
select /*+ index(t_opt_clever idx_opt_clever2) */
*
from
t_opt_clever
where
col3 in ('FIRST_BUCKET', 'SECOND_BUCKET')
order by
col5, col1, col4, col2
)
where
rownum <= 100;
SQL> select * from table(dbms_xplan.display_cursor(null, null, '+COST ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID axr6u0yvdk50f, child number 0
-------------------------------------
select * from ( select /*+ index(t_opt_clever idx_opt_clever2) */ * from
t_opt_clever where col3 in ('FIRST_BUCKET', 'SECOND_BUCKET') order by col3, col5, col1,
col4, col2 ) where rownum <= 100

Plan hash value: 2229390605

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | COUNT STOPKEY | | 1 | | | 100 |00:00:00.02 | 1835 | | | |
| 2 | VIEW | | 1 | 666K| 703K (1)| 100 |00:00:00.02 | 1835 | | | |
|* 3 | SORT ORDER BY STOPKEY | | 1 | 666K| 703K (1)| 100 |00:00:00.02 | 1835 | 20480 | 20480 |18432 (0)|
| 4 | TABLE ACCESS BY INDEX ROWID| T_OPT_CLEVER | 1 | 666K| 683K (1)| 833 |00:00:00.01 | 1835 | | | |
|* 5 | INDEX FULL SCAN | IDX_OPT_CLEVER2 | 1 | 666K| 16100 (1)| 833 |00:00:00.01 | 1002 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=100)
3 - filter(ROWNUM<=100)
5 - filter(("COL3"='FIRST_BUCKET' OR "COL3"='SECOND_BUCKET'))

25 rows selected.

Without the index hint the optimizer chooses a full table scan. Forcing e.g. the index IDX_OPT_CLEVER2 shows that indeed all rows had to be processed first and additionally a sort operation was necessary.

So it's interesting to note that the optimizer recognizes special cases where single value predicates allow an index usage that otherwise wouldn't be possible. This is a nice move, since it allows to perform above query in quite an efficient manner although the setup is suboptimal (e.g. a different index with COL3 as leading column or an appropriate IOT could be more suitable, depending on what else is done with the table). Under these (simulated) circumstances this optimization makes quite a difference compared to the otherwise only possible full table scan operation of a 30,000 blocks table.

By the way, above results could be reproduced on 10.2.0.4 and 11.1.0.7 Win32 using default system statistics and an 8KB LMT MSSM tablespace.

randolf.geist's picture

PLAN_HASH_VALUE - How equal (and stable?) are your execution plans - part 1

Oracle provides in recent releases the PLAN_HASH_VALUE information, which according to the documentation, is the following:

"Numerical representation of the SQL plan for the cursor. Comparing one PLAN_HASH_VALUE to another easily identifies whether or not two plans are the same (rather than comparing the two plans line by line)."

So according to the documentation the PLAN_HASH_VALUE can be used as a shortcut to quickly and easily determine if two execution plans are the "same".

I think that the statement "the two execution plans are the same" suggests that execution plans having the same PLAN_HASH_VALUE yield the same or at least similar performance at runtime. An interesting point that I would like to cover here in the following test cases.

This raises the interesting question, what exactly is the PLAN_HASH_VALUE based upon? Obviously it is a hash value calculated using the execution plan information as input. Apart from the fact that hash values are in theory always subject to potential hash collisions, which means that two different inputs can lead to the same hash value, the more interesting question is, which attributes of the execution plan are used as input?

Having this information at hand allows us to get a better understanding if two execution plans with the same PLAN_HASH_VALUE actually have to a yield similar execution profile (which Oracle doesn't say but I assume is a common assumption - or may be misconception?).

Let's start with a simple example to determine what makes the PLAN_HASH_VALUE different. All results shown below come from an 11.1.0.7 Win32 instance with a 8KB default block size, a MSSM 8KB LMT tablespace and default system statistics.

SQL>
SQL> drop table plan_hash_value_test1 purge;

Table dropped.

SQL>
SQL> drop table plan_hash_value_test2 purge;

Table dropped.

SQL>
SQL> drop table plan_hash_value_test3 purge;

Table dropped.

SQL>
SQL> drop user test_user_plan_hash_value cascade;

User dropped.

SQL>
SQL> create user test_user_plan_hash_value
2 identified by test_user_plan_hash_value
3 default tablespace test_8k
4 quota unlimited on test_8k;

User created.

SQL>
SQL> create table plan_hash_value_test1
2 as
3 select
4 id as id1
5 , mod(id, 2) as id2
6 , rpad('x', 20) as small_vc
7 from
8 (
9 select
10 level as id
11 from
12 dual
13 connect by
14 level <= 1000
15 );

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'plan_hash_value_test1')

PL/SQL procedure successfully completed.

SQL>
SQL> create index plan_hash_value_test1_idx1 on plan_hash_value_test1 (id1, id2);

Index created.

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> select /*+ find_me
2 index(a, plan_hash_value_test1_idx1) */
3 *
4 from
5 plan_hash_value_test1 a
6 where
7 id1 = 1
8 and rownum <= 1;

ID1 ID2 SMALL_VC
---------- ---------- --------------------
1 1 x

SQL>
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID 3u4sfg1kzqtct, child number 0
-------------------------------------
select /*+ find_me index(a, plan_hash_value_test1_idx1) */
* from plan_hash_value_test1 a where id1 = 1
and rownum <= 1

Plan hash value: 2655295642

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| PLAN_HASH_VALUE_TEST1 | 1 | 27 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | PLAN_HASH_VALUE_TEST1_IDX1 | 1 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=1)
3 - access("ID1"=1)

23 rows selected.

SQL>
SQL> select
2 object#, object_owner, object_name
3 from
4 v$sql_plan
5 where
6 operation = 'INDEX'
7 and sql_id = (select
8 sql_id
9 from
10 v$sql
11 where
12 sql_text like 'select /*+ find_me%'
13 );

OBJECT# OBJECT_OWNER OBJECT_NAME
---------- ------------------------------ ------------------------------
76267 CBO_TEST PLAN_HASH_VALUE_TEST1_IDX1

SQL>
SQL> pause

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> drop index plan_hash_value_test1_idx1;

Index dropped.

SQL>
SQL> create index test_user_plan_hash_value.plan_hash_value_test1_idx1 on plan_hash_value_test1 (id2, id1);

Index created.

SQL>
SQL> exec dbms_stats.set_index_stats('test_user_plan_hash_value', 'plan_hash_value_test1_idx1', numdist=>0, numrows=>0)

PL/SQL procedure successfully completed.

SQL>
SQL> select /*+ find_me
2 index(a, plan_hash_value_test1_idx1) */
3 *
4 from
5 plan_hash_value_test1 a
6 where
7 id2 = 1
8 and rownum <= 2;

ID1 ID2 SMALL_VC
---------- ---------- --------------------
1 1 x
3 1 x

SQL>
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID dsjuqc6fkfffj, child number 0
-------------------------------------
select /*+ find_me index(a, plan_hash_value_test1_idx1) */
* from plan_hash_value_test1 a where id2 = 1
and rownum <= 2

Plan hash value: 2655295642

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| PLAN_HASH_VALUE_TEST1 | 2 | 54 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | PLAN_HASH_VALUE_TEST1_IDX1 | 1 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=2)
3 - access("ID2"=1)

23 rows selected.

SQL>
SQL> select
2 object#, object_owner, object_name
3 from
4 v$sql_plan
5 where
6 operation = 'INDEX'
7 and sql_id = (select
8 sql_id
9 from
10 v$sql
11 where
12 sql_text like 'select /*+ find_me%'
13 );

OBJECT# OBJECT_OWNER OBJECT_NAME
---------- ------------------------------ ------------------------------
76268 TEST_USER_PLAN_HASH_VALUE PLAN_HASH_VALUE_TEST1_IDX1

SQL>
SQL> pause

SQL>
SQL> drop index test_user_plan_hash_value.plan_hash_value_test1_idx1;

Index dropped.

SQL>
SQL> create index plan_hash_value_test1_idx1 on plan_hash_value_test1 (id2, id1);

Index created.

SQL>
SQL> select /*+ find_me
2 index(a, plan_hash_value_test1_idx1) */
3 *
4 from
5 plan_hash_value_test1 a
6 where
7 id2 = 1
8 and rownum <= 2;

ID1 ID2 SMALL_VC
---------- ---------- --------------------
1 1 x
3 1 x

SQL>
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID dsjuqc6fkfffj, child number 0
-------------------------------------
select /*+ find_me index(a, plan_hash_value_test1_idx1) */
* from plan_hash_value_test1 a where id2 = 1
and rownum <= 2

Plan hash value: 2655295642

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| PLAN_HASH_VALUE_TEST1 | 2 | 54 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | PLAN_HASH_VALUE_TEST1_IDX1 | 500 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=2)
3 - access("ID2"=1)

23 rows selected.

SQL>
SQL> select
2 object#, object_owner, object_name
3 from
4 v$sql_plan
5 where
6 operation = 'INDEX'
7 and sql_id = (select
8 sql_id
9 from
10 v$sql
11 where
12 sql_text like 'select /*+ find_me%'
13 );

OBJECT# OBJECT_OWNER OBJECT_NAME
---------- ------------------------------ ------------------------------
76269 CBO_TEST PLAN_HASH_VALUE_TEST1_IDX1

SQL>

Note the subtle differences: We have an index on a simple table, but these are actually three different objects, once owned by a different user and once with a different definition, but all three statements get the same PLAN_HASH_VALUE.

So this simple example already raises some of the most important points:

- The OBJECT_ID of an object obviously doesn't get used for the PLAN_HASH_VALUE calculation. Although present in the V$SQL_PLAN view, checking the PLAN_TABLE definition suggests why: It's not part of the PLAN_TABLE definition. This means that e.g. replacing an index with a different one but keeping the name will result in the same PLAN_HASH_VALUE if the operations and their order remain the same, like demonstrated (Although I have to admit that doing so might be called bad practice).

- The owner of an object doesn't get used either. Why this is so, remains unclear, since it seems to be part of all related object definitions (OBJECT_OWNER attribute in PLAN_TABLE/V$SQL_PLAN), but might be an very important point if you have multiple schemas with objects of the same name but representing different data or data volume. As you can see from the example the PLAN_HASH_VALUE is the same but the number of rows estimated is different due to the different underlying object (and its intentionally manipulated statistics).

- One of the most crucial aspects is also demonstrated: The FILTER_PREDICATES and ACCESS_PREDICATES information is NOT part of the PLAN_HASH_VALUE. This means that two executions plans can have the same PLAN_HASH_VALUE but behave significantly differently at actual execution time.

- What also is obvious from this example is that the ROWS and BYTES information is not used as part of the PLAN_HASH_VALUE. More on this later.

Let's move on to the next example:

SQL>
SQL> variable b2 number
SQL>
SQL> exec :b2 := 1;

PL/SQL procedure successfully completed.

SQL>
SQL> select /*+ find_me
2 index(a, plan_hash_value_test1_idx1) */
3 *
4 from
5 plan_hash_value_test1 a
6 where
7 id2 = :b2
8 and rownum <= 1;

ID1 ID2 SMALL_VC
---------- ---------- --------------------
1 1 x

SQL>
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID 4y6b2pu1hzs3h, child number 0
-------------------------------------
select /*+ find_me index(a, plan_hash_value_test1_idx1) */
* from plan_hash_value_test1 a where id2 = :b2
and rownum <= 1

Plan hash value: 2655295642

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| PLAN_HASH_VALUE_TEST1 | 1 | 27 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | PLAN_HASH_VALUE_TEST1_IDX1 | 500 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=1)
3 - access("ID2"=:B2)

23 rows selected.

SQL>
SQL> variable b2 varchar2(20)
SQL>
SQL> exec :b2 := '1';

PL/SQL procedure successfully completed.

SQL>
SQL> select /*+ find_me
2 index(a, plan_hash_value_test1_idx1) */
3 *
4 from
5 plan_hash_value_test1 a
6 where
7 id2 = :b2
8 and rownum <= 1;

ID1 ID2 SMALL_VC
---------- ---------- --------------------
1 1 x

SQL>
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID 4y6b2pu1hzs3h, child number 1
-------------------------------------
select /*+ find_me index(a, plan_hash_value_test1_idx1) */
* from plan_hash_value_test1 a where id2 = :b2
and rownum <= 1

Plan hash value: 2655295642

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| PLAN_HASH_VALUE_TEST1 | 1 | 27 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | PLAN_HASH_VALUE_TEST1_IDX1 | 500 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<=1)
3 - access("ID2"=TO_NUMBER(:B2))

23 rows selected.

SQL>

Whereas the previous examples mainly used different SQLs (and therefore had different SQL_HASH_VALUEs or SQL_IDs) this example uses the same SQL and demonstrates the following:

- Bad application behaviour (in this case different types of bind variables) or other reasons can lead to unshared cursors, i.e. multiple child cursors for the same SQL (not actually the main topic here)

- But although the different cursors have different access predicates (in this case the implicit type conversion), as already shown, different access predicates don't lead to different PLAN_HASH_VALUEs

For this particular statement the difference in the predicates very likely doesn't represent a threat, but there are more subtle cases where these differences can lead to significant changes in behaviour. Possible reasons are different order of the predicate evaluation (e.g. when having system statistics enabled, which is the default from 10g on) which can make a significant difference in CPU usage or general resource consumption depending on the actual cost of the predicate evaluation (e.g. a costly PL/SQL function call), or even the evaluation of (filter) predicates at different steps of the same execution plan which could lead to significant differences in the number of rows generated by each operation step of the execution plan, and therefore make an execution plan much more inefficient due to the larger number of rows processed.

What about the actual estimates associated with the particular operations of an execution plan?

SQL>
SQL> explain plan for
2 select /*+ full(a) */
3 *
4 from
5 plan_hash_value_test1 a
6 where
7 id2 = 1;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3758120161

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 500 | 13500 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| PLAN_HASH_VALUE_TEST1 | 500 | 13500 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ID2"=1)

13 rows selected.

SQL>
SQL> exec dbms_stats.set_table_stats(null, 'plan_hash_value_test1', numrows=>100000, numblks=>10000)

PL/SQL procedure successfully completed.

SQL>
SQL> explain plan for
2 select /*+ full(a) */
3 *
4 from
5 plan_hash_value_test1 a
6 where
7 id2 = 1;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3758120161

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50000 | 1318K| 2733 (1)| 00:00:33 |
|* 1 | TABLE ACCESS FULL| PLAN_HASH_VALUE_TEST1 | 50000 | 1318K| 2733 (1)| 00:00:33 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ID2"=1)

13 rows selected.

SQL>

It's obvious that none of the ROWS, BYTES, COST nor derived information like TIME in the later releases are used to calculate the PLAN_HASH_VALUE. So again this makes clear that the same PLAN_HASH_VALUE of two statements doesn't say anything about the similarity of the runtime performance.

Another example that demonstrates this point with a slightly more complex plan:

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'plan_hash_value_test1')

PL/SQL procedure successfully completed.

SQL>
SQL> create table plan_hash_value_test2
2 as
3 select
4 id as id1
5 , mod(id, 2) as id2
6 , rpad('x', 20) as small_vc
7 from
8 (
9 select
10 level as id
11 from
12 dual
13 connect by
14 level <= 1000
15 );

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'plan_hash_value_test2')

PL/SQL procedure successfully completed.

SQL>
SQL> create index plan_hash_value_test2_idx1 on plan_hash_value_test2 (id2, id1);

Index created.

SQL>
SQL> explain plan for
2 select /*+
3 use_nl(a, b)
4 index(b)
5 leading(a)
6 */
7 *
8 from
9 plan_hash_value_test1 a
10 , plan_hash_value_test2 b
11 where
12 a.id1 = b.id1
13 and a.id2 = b.id2
14 and a.id1 between 1 and 10;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 458854847

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 486 | 23 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 9 | 486 | 23 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST1 | 10 | 270 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | PLAN_HASH_VALUE_TEST2_IDX1 | 1 | | 1 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| PLAN_HASH_VALUE_TEST2 | 1 | 27 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("A"."ID1"<=10 AND "A"."ID1">=1)
4 - access("A"."ID2"="B"."ID2" AND "A"."ID1"="B"."ID1")
filter("B"."ID1"<=10 AND "B"."ID1">=1)

19 rows selected.

SQL>
SQL> explain plan for
2 select /*+
3 use_nl(a, b)
4 index(b)
5 leading(a)
6 */
7 *
8 from
9 plan_hash_value_test1 a
10 , plan_hash_value_test2 b
11 where
12 a.id2 = b.id2
13 and a.id1 between 1 and 10;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 458854847

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5005 | 263K| 74 (2)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 5005 | 263K| 74 (2)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST1 | 10 | 270 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | PLAN_HASH_VALUE_TEST2_IDX1 | 500 | | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| PLAN_HASH_VALUE_TEST2 | 500 | 13500 | 7 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("A"."ID1"<=10 AND "A"."ID1">=1)
4 - access("A"."ID2"="B"."ID2")

18 rows selected.

SQL>

Notice how one statement performs an effective join using appropriate join predicates whereas the other one generates duplicate records. Again, since the filter and access predicates are not evaluated, these two plans get the same PLAN_HASH_VALUE, although we can say that one of them is potentially suboptimal (and usually would result in a significantly different execution plan which in turn would have different PLAN_HASH_VALUEs but there are certainly more complex scenarios where the optimizer goes wrong for whatever reason resulting in such potentially inefficient execution plans).

What about extended execution plan information like partitioning and parallel execution?

Let's first address partitioning:

SQL> create table plan_hash_value_test3
2 (
3 invoice_no number,
4 sale_year integer not null,
5 sale_month integer not null,
6 sale_day integer not null
7 )
8 partition by range (invoice_no)
9 (
10 partition part_001 values less than (100),
11 partition part_002 values less than (400),
12 partition part_003 values less than (800),
13 partition part_004 values less than (maxvalue)
14 );

Table created.

SQL>
SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

SQL>
SQL> insert into plan_hash_value_test3 (
2 invoice_no,
3 sale_year,
4 sale_month,
5 sale_day
6 )
7 select rownum,
8 2000 + round(dbms_random.value(0, 8)) as sale_year,
9 trunc(dbms_random.value(1, 13)) as sale_month,
10 trunc(dbms_random.value(1, 29)) as sale_day
11 from dual
12 connect by level <= 1000;

1000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'plan_hash_value_test3')

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_stats.set_table_stats(null, 'plan_hash_value_test3', partname=>'part_002', numblks=>100000)

PL/SQL procedure successfully completed.

SQL>
SQL> select
2 partition_name
3 , blocks
4 from
5 user_tab_statistics
6 where
7 table_name = 'PLAN_HASH_VALUE_TEST3';

PARTITION_NAME BLOCKS
------------------------------ ----------
5
PART_001 1
PART_002 100000
PART_003 2
PART_004 1

SQL>
SQL> explain plan for
2 select
3 *
4 from
5 plan_hash_value_test3
6 where
7 invoice_no < 100;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4079248530

----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 1188 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 99 | 1188 | 2 (0)| 00:00:01 | 1 | 1 |
| 2 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST3 | 99 | 1188 | 2 (0)| 00:00:01 | 1 | 1 |
----------------------------------------------------------------------------------------------------------------

9 rows selected.

SQL>
SQL> explain plan for
2 select
3 *
4 from
5 plan_hash_value_test3
6 where
7 invoice_no >= 100 and invoice_no < 400;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4079248530

----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 300 | 3900 | 27256 (1)| 00:05:28 | | |
| 1 | PARTITION RANGE SINGLE| | 300 | 3900 | 27256 (1)| 00:05:28 | 2 | 2 |
| 2 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST3 | 300 | 3900 | 27256 (1)| 00:05:28 | 2 | 2 |
----------------------------------------------------------------------------------------------------------------

9 rows selected.

SQL>
SQL> variable b1 number
SQL> variable b2 number
SQL>
SQL> exec :b1 := 0;

PL/SQL procedure successfully completed.

SQL> exec :b2 := 100;

PL/SQL procedure successfully completed.

SQL>
SQL> select
2 count(*)
3 from
4 plan_hash_value_test3
5 where
6 invoice_no >= :b1 and invoice_no < :b2;

COUNT(*)
----------
99

SQL>
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID 23m1jk6dbsvbx, child number 0
-------------------------------------
select count(*) from plan_hash_value_test3 where
invoice_no >= :b1 and invoice_no < :b2

Plan hash value: 2447373661

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| | | |
| 1 | SORT AGGREGATE | | 1 | 3 | | | | |
|* 2 | FILTER | | | | | | | |
| 3 | PARTITION RANGE ITERATOR| | 99 | 297 | 2 (0)| 00:00:01 | KEY | KEY |
|* 4 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST3 | 99 | 297 | 2 (0)| 00:00:01 | KEY | KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(:B1<:B2)
4 - filter(("INVOICE_NO">=:B1 AND "INVOICE_NO"<:B2))

23 rows selected.

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> exec :b1 := 100;

PL/SQL procedure successfully completed.

SQL> exec :b2 := 400;

PL/SQL procedure successfully completed.

SQL>
SQL> select
2 count(*)
3 from
4 plan_hash_value_test3
5 where
6 invoice_no >= :b1 and invoice_no < :b2;

COUNT(*)
----------
300

SQL>
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID 23m1jk6dbsvbx, child number 0
-------------------------------------
select count(*) from plan_hash_value_test3 where
invoice_no >= :b1 and invoice_no < :b2

Plan hash value: 2447373661

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 27256 (100)| | | |
| 1 | SORT AGGREGATE | | 1 | 4 | | | | |
|* 2 | FILTER | | | | | | | |
| 3 | PARTITION RANGE ITERATOR| | 300 | 1200 | 27256 (1)| 00:05:28 | KEY | KEY |
|* 4 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST3 | 300 | 1200 | 27256 (1)| 00:05:28 | KEY | KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(:B1<:B2)
4 - filter(("INVOICE_NO">=:B1 AND "INVOICE_NO"<:B2))

23 rows selected.

SQL>

The PSTART and PSTOP information is not used either for the PLAN_HASH_VALUE, so different execution plans accessing different partitions of the same object might get the same PLAN_HASH_VALUE, but again the runtime performance might be dramatically different.

By the way above example demonstrates that the CBO peeks at the binds and uses then the partition level statistics of the corresponding partition defined by the bind values (in case the bind values prune to a single partition), although the execution plan might be executed with different bind values actually accessing different partitions at runtime.

What about parallel execution?

SQL>
SQL> explain plan for
2 select /*+
3 full(a)
4 full(b)
5 use_hash(a, b)
6 parallel(a)
7 parallel(b)
8 pq_distribute(b, none, broadcast)
9 */
10 *
11 from
12 plan_hash_value_test1 a
13 , plan_hash_value_test2 b
14 where
15 a.id1 = b.id1
16 and a.id2 = b.id2
17 and a.id1 between 1 and 10;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1365899609

-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 486 | 5 (20)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 9 | 486 | 5 (20)| 00:00:01 | Q1,01 | P->S | QC (RAND) |
|* 3 | HASH JOIN | | 9 | 486 | 5 (20)| 00:00:01 | Q1,01 | PCWP | |
| 4 | PX BLOCK ITERATOR | | 10 | 270 | 2 (0)| 00:00:01 | Q1,01 | PCWC | |
|* 5 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST1 | 10 | 270 | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
| 6 | BUFFER SORT | | | | | | Q1,01 | PCWC | |
| 7 | PX RECEIVE | | 10 | 270 | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
| 8 | PX SEND BROADCAST | :TQ10000 | 10 | 270 | 2 (0)| 00:00:01 | Q1,00 | P->P | BROADCAST |
| 9 | PX BLOCK ITERATOR | | 10 | 270 | 2 (0)| 00:00:01 | Q1,00 | PCWC | |
|* 10 | TABLE ACCESS FULL| PLAN_HASH_VALUE_TEST2 | 10 | 270 | 2 (0)| 00:00:01 | Q1,00 | PCWP | |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("A"."ID1"="B"."ID1" AND "A"."ID2"="B"."ID2")
5 - filter("A"."ID1"<=10 AND "A"."ID1">=1)
10 - filter("B"."ID1"<=10 AND "B"."ID1">=1)

24 rows selected.

SQL>
SQL> explain plan for
2 select /*+
3 full(a)
4 full(b)
5 use_hash(a, b)
6 parallel(a)
7 parallel(b)
8 pq_distribute(b, broadcast, none)
9 */
10 *
11 from
12 plan_hash_value_test1 a
13 , plan_hash_value_test2 b
14 where
15 a.id1 = b.id1
16 and a.id2 = b.id2
17 and a.id1 between 1 and 10;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3903716067

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 486 | 5 (20)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 9 | 486 | 5 (20)| 00:00:01 | Q1,01 | P->S | QC (RAND) |
|* 3 | HASH JOIN | | 9 | 486 | 5 (20)| 00:00:01 | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | 10 | 270 | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
| 5 | PX SEND BROADCAST | :TQ10000 | 10 | 270 | 2 (0)| 00:00:01 | Q1,00 | P->P | BROADCAST |
| 6 | PX BLOCK ITERATOR | | 10 | 270 | 2 (0)| 00:00:01 | Q1,00 | PCWC | |
|* 7 | TABLE ACCESS FULL| PLAN_HASH_VALUE_TEST1 | 10 | 270 | 2 (0)| 00:00:01 | Q1,00 | PCWP | |
| 8 | PX BLOCK ITERATOR | | 10 | 270 | 2 (0)| 00:00:01 | Q1,01 | PCWC | |
|* 9 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST2 | 10 | 270 | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("A"."ID1"="B"."ID1" AND "A"."ID2"="B"."ID2")
7 - filter("A"."ID1"<=10 AND "A"."ID1">=1)
9 - filter("B"."ID1"<=10 AND "B"."ID1">=1)

23 rows selected.

SQL>
SQL> explain plan for
2 select /*+
3 full(a)
4 full(b)
5 use_hash(a, b)
6 parallel(a)
7 parallel(b)
8 pq_distribute(b, hash, hash)
9 */
10 *
11 from
12 plan_hash_value_test1 a
13 , plan_hash_value_test2 b
14 where
15 a.id1 = b.id1
16 and a.id2 = b.id2
17 and a.id1 between 1 and 10;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 904614956

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9 | 486 | 5 (20)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 9 | 486 | 5 (20)| 00:00:01 | Q1,02 | P->S | QC (RAND) |
|* 3 | HASH JOIN BUFFERED | | 9 | 486 | 5 (20)| 00:00:01 | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 10 | 270 | 2 (0)| 00:00:01 | Q1,02 | PCWP | |
| 5 | PX SEND HASH | :TQ10000 | 10 | 270 | 2 (0)| 00:00:01 | Q1,00 | P->P | HASH |
| 6 | PX BLOCK ITERATOR | | 10 | 270 | 2 (0)| 00:00:01 | Q1,00 | PCWC | |
|* 7 | TABLE ACCESS FULL| PLAN_HASH_VALUE_TEST1 | 10 | 270 | 2 (0)| 00:00:01 | Q1,00 | PCWP | |
| 8 | PX RECEIVE | | 10 | 270 | 2 (0)| 00:00:01 | Q1,02 | PCWP | |
| 9 | PX SEND HASH | :TQ10001 | 10 | 270 | 2 (0)| 00:00:01 | Q1,01 | P->P | HASH |
| 10 | PX BLOCK ITERATOR | | 10 | 270 | 2 (0)| 00:00:01 | Q1,01 | PCWC | |
|* 11 | TABLE ACCESS FULL| PLAN_HASH_VALUE_TEST2 | 10 | 270 | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("A"."ID1"="B"."ID1" AND "A"."ID2"="B"."ID2")
7 - filter("A"."ID1"<=10 AND "A"."ID1">=1)
11 - filter("B"."ID1"<=10 AND "B"."ID1">=1)

25 rows selected.

SQL>

The example uses different parallel distribution options for the same execution plan. In 10g and later this is reflected in different operations (like PX SEND BROADCAST) and this suggests that the PLAN_HASH_VALUEs are going to be different due to the different operations.

Running a similar test case on 9.2.0.8 shows it seems that actually some of the attributes related to parallel execution are also used to calculate the PLAN_HASH_VALUE:

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> explain plan for
2 select /*+ find_me
3 full(a)
4 full(b)
5 use_hash(a, b)
6 parallel(a)
7 parallel(b)
8 pq_distribute(b, none, broadcast)
9 */
10 *
11 from
12 plan_hash_value_test1 a
13 , plan_hash_value_test2 b
14 where
15 a.id1 = b.id1
16 and a.id2 = b.id2
17 and a.id1 between 1 and 10;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 270 | 6 (34)| | | |
|* 1 | HASH JOIN | | 5 | 270 | 6 (34)| 55,01 | P->S | QC (RAND) |
|* 2 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST1 | 10 | 270 | 3 (34)| 55,01 | PCWP | |
|* 3 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST2 | 10 | 270 | 3 (34)| 55,00 | P->P | BROADCAST |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("A"."ID1"="B"."ID1" AND "A"."ID2"="B"."ID2")
2 - filter("A"."ID1"<=10 AND "A"."ID1">=1)
3 - filter("B"."ID1"<=10 AND "B"."ID1">=1)

16 rows selected.

SQL>
SQL> rollback;

Rollback complete.

SQL>
SQL> select /*+ find_me
2 full(a)
3 full(b)
4 use_hash(a, b)
5 parallel(a)
6 parallel(b)
7 pq_distribute(b, none, broadcast)
8 */
9 *
10 from
11 plan_hash_value_test1 a
12 , plan_hash_value_test2 b
13 where
14 a.id1 = b.id1
15 and a.id2 = b.id2
16 and a.id1 between 1 and 10;

ID1 ID2 SMALL_VC ID1 ID2 SMALL_VC
---------- ---------- -------------------- ---------- ---------- --------------------
1 1 x 1 1 x
2 0 x 2 0 x
3 1 x 3 1 x
4 0 x 4 0 x
5 1 x 5 1 x
6 0 x 6 0 x
7 1 x 7 1 x
8 0 x 8 0 x
9 1 x 9 1 x
10 0 x 10 0 x

10 rows selected.

SQL>
SQL> select
2 plan_hash_value
3 , hash_value
4 from
5 v$sql
6 where
7 sql_text like 'select /*+ find_me%';

PLAN_HASH_VALUE HASH_VALUE
--------------- ----------
1709875781 508205717

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> explain plan for
2 select /*+ find_me
3 full(a)
4 full(b)
5 use_hash(a, b)
6 parallel(a)
7 parallel(b)
8 pq_distribute(b, broadcast, none)
9 */
10 *
11 from
12 plan_hash_value_test1 a
13 , plan_hash_value_test2 b
14 where
15 a.id1 = b.id1
16 and a.id2 = b.id2
17 and a.id1 between 1 and 10;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 270 | 6 (34)| | | |
|* 1 | HASH JOIN | | 5 | 270 | 6 (34)| 57,01 | P->S | QC (RAND) |
|* 2 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST1 | 10 | 270 | 3 (34)| 57,00 | P->P | BROADCAST |
|* 3 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST2 | 10 | 270 | 3 (34)| 57,01 | PCWP | |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("A"."ID1"="B"."ID1" AND "A"."ID2"="B"."ID2")
2 - filter("A"."ID1"<=10 AND "A"."ID1">=1)
3 - filter("B"."ID1"<=10 AND "B"."ID1">=1)

16 rows selected.

SQL>
SQL> rollback;

Rollback complete.

SQL>
SQL> select /*+ find_me
2 full(a)
3 full(b)
4 use_hash(a, b)
5 parallel(a)
6 parallel(b)
7 pq_distribute(b, broadcast, none)
8 */
9 *
10 from
11 plan_hash_value_test1 a
12 , plan_hash_value_test2 b
13 where
14 a.id1 = b.id1
15 and a.id2 = b.id2
16 and a.id1 between 1 and 10;

ID1 ID2 SMALL_VC ID1 ID2 SMALL_VC
---------- ---------- -------------------- ---------- ---------- --------------------
1 1 x 1 1 x
2 0 x 2 0 x
3 1 x 3 1 x
4 0 x 4 0 x
5 1 x 5 1 x
6 0 x 6 0 x
7 1 x 7 1 x
8 0 x 8 0 x
9 1 x 9 1 x
10 0 x 10 0 x

10 rows selected.

SQL>
SQL> select
2 plan_hash_value
3 , hash_value
4 from
5 v$sql
6 where
7 sql_text like 'select /*+ find_me%';

PLAN_HASH_VALUE HASH_VALUE
--------------- ----------
441284116 3983849749

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> alter system flush shared_pool;

System altered.

SQL>
SQL> explain plan for
2 select /*+ find_me
3 full(a)
4 full(b)
5 use_hash(a, b)
6 parallel(a)
7 parallel(b)
8 pq_distribute(b, hash, hash)
9 */
10 *
11 from
12 plan_hash_value_test1 a
13 , plan_hash_value_test2 b
14 where
15 a.id1 = b.id1
16 and a.id2 = b.id2
17 and a.id1 between 1 and 10;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 270 | 6 (34)| | | |
|* 1 | HASH JOIN | | 5 | 270 | 6 (34)| 59,02 | P->S | QC (RAND) |
|* 2 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST1 | 10 | 270 | 3 (34)| 59,00 | P->P | HASH |
|* 3 | TABLE ACCESS FULL | PLAN_HASH_VALUE_TEST2 | 10 | 270 | 3 (34)| 59,01 | P->P | HASH |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("A"."ID1"="B"."ID1" AND "A"."ID2"="B"."ID2")
2 - filter("A"."ID1"<=10 AND "A"."ID1">=1)
3 - filter("B"."ID1"<=10 AND "B"."ID1">=1)

16 rows selected.

SQL>
SQL> rollback;

Rollback complete.

SQL>
SQL> select /*+ find_me
2 full(a)
3 full(b)
4 use_hash(a, b)
5 parallel(a)
6 parallel(b)
7 pq_distribute(b, hash, hash)
8 */
9 *
10 from
11 plan_hash_value_test1 a
12 , plan_hash_value_test2 b
13 where
14 a.id1 = b.id1
15 and a.id2 = b.id2
16 and a.id1 between 1 and 10;

ID1 ID2 SMALL_VC ID1 ID2 SMALL_VC
---------- ---------- -------------------- ---------- ---------- --------------------
1 1 x 1 1 x
3 1 x 3 1 x
5 1 x 5 1 x
8 0 x 8 0 x
10 0 x 10 0 x
2 0 x 2 0 x
4 0 x 4 0 x
6 0 x 6 0 x
7 1 x 7 1 x
9 1 x 9 1 x

10 rows selected.

SQL>
SQL> select
2 plan_hash_value
3 , hash_value
4 from
5 v$sql
6 where
7 sql_text like 'select /*+ find_me%';

PLAN_HASH_VALUE HASH_VALUE
--------------- ----------
353704519 1797852549

SQL>

Although the main operations and their order of execution stays the same, the calculated PLAN_HASH_VALUE is different in all three cases.

So in summary the following conclusions can be made:

- The same PLAN_HASH_VALUE is merely an indicator that the same operations on objects of the same name are performed in the same order.

- It tells nothing about the similarity of the expected runtime performance of the execution plan, due to various reasons as demonstrated. The most significant information that is not covered by the PLAN_HASH_VALUE are the filter and access predicates, but there are other attributes, too, that are not part of the hash value calculation.

- Still it's very likely that for the same SQL statement and the same unchanged underlying data the same PLAN_HASH_VALUE indicates similar expected runtime performance, except for some particular cases where the execution plan itself already contains "conditional" execution paths which might be taken by the runtime engine and therefore lead to significantly different runtimes. Some of these examples have already been covered by Jonathan Lewis: Conditional Plan and Hierarchical Queries in some versions of Oracle.

In the second part I'll demonstrate a possible approach how to calculate your own PLAN_HASH_VALUE that covers some of the information omitted if you suspect that you might have encountered a scenario where the same PLAN_HASH_VALUE suggests similar runtime performance but there are significant differences in the execution plans that you would like to have covered by the hash value calculation to detect these plan changes easily.

randolf.geist's picture

Dynamic sampling and partitioned tables

------------------------------------------------------------------------------

Update January 2010: A thread on OTN mentioned this blog post and another, actually contradicting blog post by Asif Momen.

So why are these two blog posts coming to different conclusions regarding Dynamic Sampling and partitions with missing statistics?

This is the good thing with documented test cases - I reproduced what Asif has done and found out that the significant difference between these two test cases is the existence of global level statistics.

In my test case below, I have explicitly gathered statistics only on partition level, and there are no statistics on global level/table (which can be seen from the output of the query against user_tab_statistics below).

Asif has actually gathered statistics on global/table level which can be seen from his blog post.

So the conclusion seems to be: If you prune to a single partition, but this partition has no statistics, then Dynamic sampling will be used if no global/table level statistics are available. If global/table level statistics are available, the optimizer won't perform dynamic sampling and revert to these global/table level statistics instead.

Oddly this obviously doesn't apply to the subpartition/partition level case: Repeating a similar setup with subpartitions having no statistics, but statistics on partition level are available, Dynamic Sampling still was used (tested on 11.1.0.7 Win32).

------------------------------------------------------------------------------

To prevent automated spam submissions leave this field empty.
Syndicate content