Execution plans

Jonathan Lewis's picture

Subquery Selectivity – 2

Here’s an item I thought I’d published a few years ago as a follow-up to an article on a 10g bug-fix for subquery selectivity. I was reminded of my oversight when a question came up on OTN that looked like an example of the bug introduced by the bug-fix – and I found that I couldn’t simply supply a link to explain the problem.

We start with some simple SQL to create a test data set:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,729)		id1,
	rownum			id2,
	mod(rownum,11)		n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t1 add constraint t1_pk primary key(id1, id2);

create table t2
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,737)		id1,
	trunc((rownum-1)/737)	id2,
	mod(rownum,11)		n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t2 add constraint t2_pk primary key(id1, id2);

-- gather statistics, compute, no histograms

Having created these data sets, we can run the following query against them. It’s a very simple query with an aggregate subquery. I’ve included a no_unnest hint because many newer versions of Oracle would do a cost-based transformation on something this simple and unnest the subquery, and I want to keep the code and numbers simple while demonstrating a principle. (In the earlier article I used a two-table join with two existence subqueries to make a point about how a change in arithmetic could cause a change in execution plan; in this article I’m just going to show you the change in arithmetic.)


select
	*
from
	t1
where
	1 = (
		select
			/*+ no_unnest */
			max(n1)
		from
			t2
		where
			t2.id1 = t1.id1
	)
;

Here are the execution plans from 9.2.0.8 and 11.1.0.6 respectively (the 10.2 plan matches the 11g plan). For consistency I am not using CPU costing (system statistics) in either case:


Execution Plan from 9.2.0.8
-----------------------------------------------------------------------------
| Id  | Operation                     |  Name       | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             | 10000 |  1201K|    13M|
|*  1 |  FILTER                       |             |       |       |       |
|   2 |   TABLE ACCESS FULL           | T1          | 10000 |  1201K|  2712 |
|   3 |   SORT AGGREGATE              |             |     1 |     7 |       |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2          |  1357 |  9499 |  1363 |
|*  5 |     INDEX RANGE SCAN          | T2_PK       |  1357 |       |     6 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ NO_UNNEST */ MAX("T2"."N1") FROM "T2" "T2" WHERE
              "T2"."ID1"=:B1)=1)
   5 - access("T2"."ID1"=:B1)

Execution Plan from 11.1.0.6 (matches plan from 10.2.0.3)
-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |    14 |  1722 |   996K|
|*  1 |  FILTER                       |       |       |       |       |
|   2 |   TABLE ACCESS FULL           | T1    |  1000K|   117M|  2712 |
|   3 |   SORT AGGREGATE              |       |     1 |     7 |       |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2    |  1357 |  9499 |  1363 |
|*  5 |     INDEX RANGE SCAN          | T2_PK |  1357 |       |     6 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ NO_UNNEST */ MAX("N1") FROM "T2" "T2" WHERE
              "T2"."ID1"=:B1)=1)
   5 - access("T2"."ID1"=:B1)

You will notice, of course, that the estimated cardinality of the result (line 0) is different, and the estimated cardinality of the scan on t1 (line 2) is different. There was an important code change introduced in 10g relating to the handling of subqueries that explains this effect.
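As an aside, the figure the two plans agree on is easy to verify: lines 4 and 5 estimate 1,357 rows for each probe into t2 because, with no histogram, the optimizer uses num_rows/num_distinct. A quick sketch of that arithmetic (in Python, just to mirror the calculation):

```python
# Where the 1,357-row estimate for the t2 index range scan comes from:
# with no histogram the optimizer assumes num_rows / num_distinct.

rows_in_t2   = 1_000_000
distinct_id1 = 737            # t2.id1 was generated with mod(rownum, 737)

print(round(rows_in_t2 / distinct_id1))   # 1357
```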

In the previous article I pointed out that an existence subquery introduces a selectivity of 5% – and that the upgrade to 10g changed the point in the optimizer’s arithmetic where this 5% was applied (hence where the plan showed a change in cardinality). In the example above my code is of the form “= (subquery)”, which means Oracle applies a “magic” 1% rather than 5%. (For a range-based predicate – i.e. <, >, etc. – it would have been the 5% that we got for existence.)

So in the 9i plan, line 2 shows a cardinality of 10,000 because that’s 1% of the original 1,000,000 rows but, as I pointed out in the earlier article, this means the optimizer will be working out the rest of the plan based on the estimated effect of a filter subquery that has not yet run. (Do read the previous article if you don’t follow that last comment.)
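The arithmetic behind that 10,000 is trivial, but worth writing down. Here is a sketch in Python (the 1% is the fixed “magic” selectivity described above, not anything derived from the data):

```python
# 9i behaviour: the fixed subquery selectivity is applied directly to
# the driving table, before the filter subquery has (notionally) run.
#   "= (subquery)"       -> 1%
#   range or existence   -> 5%

rows_in_t1   = 1_000_000
subquery_sel = 0.01           # the fixed 1% for "= (subquery)"

print(round(rows_in_t1 * subquery_sel))   # 10000
```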

From 10g onwards (and in our 11.1 plan) the cardinality for line 2 is 1,000,000 because that’s how many rows will appear in the rowsource as the table is scanned. But then you have to ask why Oracle didn’t print a cardinality on line 1 (the FILTER line) because it’s at that line that we should see a cardinality of 10,000 as the filter selectivity of 1% is applied.

But there is another problem. The final cardinality of the 11.1 plan is NOT reported as 10,000; it’s reported as 14. As Oracle has fixed one defect in the optimizer it has introduced another (whether by accident or by design, I don’t know). The estimate of 14 rows is effectively double-counting. The optimizer has applied the 1% selectivity for the subquery, and then applied the selectivity of the predicate “t1.id1 = {unknown value}”. (If the predicate had been “> (subquery)”, or some other range-based operator, or existence, then the arithmetic would have applied a 5% selectivity on top of that “t1.id1 = {unknown value}” selectivity.)
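For the record, the 14 drops out of the double-counting quite precisely. My sketch of the arithmetic (the 729 is the number of distinct values of t1.id1, from the mod(rownum,729) in the build script):

```python
# 11g double-counting: the fixed 1% for "= (subquery)" is applied on
# top of the selectivity of the correlation predicate
# "t1.id1 = {unknown value}", which is 1 / num_distinct.

rows_in_t1   = 1_000_000
distinct_id1 = 729            # t1.id1 was generated with mod(rownum, 729)

subquery_sel = 0.01           # the fixed 1% for "= (subquery)"
column_sel   = 1 / distinct_id1

print(round(rows_in_t1 * subquery_sel * column_sel))   # 14
```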

In this case, of course, it probably doesn’t matter that the final cardinality is wrong – but in more complex cases this error may be introduced somewhere in the middle of the plan, making the optimizer do something very silly from that point onwards.

There is a surprising “workaround” to this issue – which may not be relevant in all cases and may not even work in some. If you force Oracle into running the subquery early (with the push_subq hint) the double-counting doesn’t occur:

select
	/*+ push_subq */	-- 9i hint
	*
from
	t1
where
	1 = (
		select
			/*+ no_unnest push_subq */	-- 10g+ hint (if you don't use query block naming)
			max(n1)
		from
			t2
		where
			t2.id1 = t1.id1
	)
;

-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       | 10000 |  1201K|  4075 |
|*  1 |  TABLE ACCESS FULL            | T1    | 10000 |  1201K|  2712 |
|   2 |   SORT AGGREGATE              |       |     1 |     7 |       |
|   3 |    TABLE ACCESS BY INDEX ROWID| T2    |  1357 |  9499 |  1363 |
|*  4 |     INDEX RANGE SCAN          | T2_PK |  1357 |       |     6 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ PUSH_SUBQ NO_UNNEST */ MAX("N1") FROM "T2"
              "T2" WHERE "T2"."ID1"=:B1)=1)
   4 - access("T2"."ID1"=:B1)

It strikes me as a little bizarre that this hint has any effect at all in this case since there is no “earlier point” at which the subquery can run. But it has pushed the optimizer through a different code path, which has produced a plan with a different shape and avoided the double-counting.

Footnote: I’ve included in the final query text a reminder that the push_subq hint changed in the upgrade from 9i to 10g. Originally it went at the top of the query and caused every subquery to be pushed; from 10g onwards you have to be selective about which subqueries you want pushed. A hint at the top of the query will be “ignored”, and you apply the hint to each subquery that you want pushed either by writing it in the subquery (as I have here) or by naming each subquery with the qb_name hint and then putting the push_subq(@{your subquery name}) hint(s) at the top of the query, naming the subqueries to be pushed.


Fake Baselines

SQL Baselines in 11g are the new Stored Outlines – and one of the nicest features of SQL Baselines is that you are allowed to fake them; or rather, it’s legal to generate an execution plan for one query and transfer its execution plan to another query using the packaged procedure dbms_spm.load_plans_from_cursor_cache(). This posting is a demonstration of the technique.

We start with a sample data set and a query that is going to do “the wrong thing”. As usual I’ve got a locally managed tablespace with 8KB blocks and 1MB uniform extents, freelist management, and I’m running with CPU Costing disabled (and running 11.1.0.6 in this case):


set serveroutput off

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	mod(rownum,1000)	n1000,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create index t1_n1 on t1(n1000);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/
select
	/*+ target_query */
	id
from
	t1
where
	n1000 = 500
;

The final query is the one that I want to fake a baseline for. With my current set up it does an index range scan to pick up 10 rows, but I’m going to make it do a tablescan instead. I’m going to need to pull the exact text of the query from memory in a moment, so I’ll find its sql_id and child_number by searching for the “pseudo-hint” that I’ve included in the text, and I’ll report the execution plan to show that I’ve picked up the right thing (I’m assuming that there’s only one piece of SQL that’s going to include the text “target_query”, of course):

column	sql_id			new_value	m_sql_id_1
column 	plan_hash_value		new_value	m_plan_hash_value_1
column	child_number		new_value	m_child_number_1

select
	sql_id, plan_hash_value, child_number
from
	v$sql
where
	sql_text like '%target_query%'
and	sql_text not like '%v$sql%'
and	rownum = 1
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_1',&m_child_number_1));

SQL_ID  306m98cpu9yz7, child number 0
-------------------------------------
select  /*+ target_query */  id from  t1 where  n1000 = 500

Plan hash value: 1420382924

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |       |       |    11 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    10 |    80 |    11 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |    10 |       |     1 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1000"=500)
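(The 10-row estimate in that plan is just the standard no-histogram arithmetic again – num_rows / num_distinct – which is easy to check:)

```python
# Cardinality estimate for "n1000 = 500" with no histogram:
# num_rows / num_distinct.

num_rows       = 10_000
distinct_n1000 = 1_000        # n1000 was generated with mod(rownum, 1000)

print(num_rows // distinct_n1000)   # 10
```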

The next step is to create a new query (and in a production system that might simply mean running a heavily hinted version of the target query) that uses the execution plan I want to see; and I’ll use the same search technique to find it and report its plan:

select
	/*+ full(t1) alternate_query */
	id
from
	t1
where
	n1000 = 500
;

column	sql_id			new_value	m_sql_id_2
column 	plan_hash_value		new_value	m_plan_hash_value_2
column	child_number		new_value	m_child_number_2

select
	sql_id, plan_hash_value, child_number
from
	v$sql
where
	sql_text like '%alternate_query%'
and	sql_text not like '%v$sql%'
and	rownum = 1
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_2',&m_child_number_2));

SQL_ID  bvpb73bb6c6uy, child number 0
-------------------------------------
select  /*+ full(t1) alternate_query */  id from  t1 where  n1000 = 500

Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    28 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |    80 |    28 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1000"=500)

Having run and reported these queries and their plans, I’ve captured the sql_id, child_number, and plan_hash_value for each query; and this is more than enough information to make it possible to create an SQL Baseline for one query using the execution plan for the other query.

declare
	m_clob	clob;
begin
	select
		sql_fulltext
	into
		m_clob
	from
		v$sql
	where
		sql_id = '&m_sql_id_1'
	and	child_number = &m_child_number_1
	;

	dbms_output.put_line(m_clob);

	dbms_output.put_line(
		dbms_spm.load_plans_from_cursor_cache(
			sql_id 			=> '&m_sql_id_2',
			plan_hash_value		=> &m_plan_hash_value_2,
			sql_text		=> m_clob,
			fixed			=> 'NO',
			enabled			=> 'YES'
		)
	);

end;
/

I used the sql_id and child_number from the first query to get the full SQL text of the query into an in-memory CLOB, then used the sql_id and plan_hash_value from the second query to associate the second plan with the first query – storing the result as a SQL Baseline that is enabled and ready for use.

You’ll have to take my word that I haven’t faked the following text, but this is what I get when I re-run the original query (flushing the shared pool first to make sure that I don’t accidentally end up picking up the original child cursor):

alter system flush shared_pool;

select
	/*+ target_query */
	id
from
	t1
where
	n1000 = 500
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_1',null));

SQL_ID  306m98cpu9yz7, child number 1
-------------------------------------
select  /*+ target_query */  id from  t1 where  n1000 = 500

Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    28 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |    80 |    28 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1000"=500)

Note
-----
   - cpu costing is off (consider enabling it)
   - SQL plan baseline SYS_SQL_PLAN_3c0ea7f3dbd90e8e used for this statement

It’s so much easier than the fiddling one had to do for stored outlines – which was quite easy in 8i, but got a little nasty in 9i because of the extra (undocumented) details that looked as if they might be necessary when the third table appeared in the outln schema. In 10g, however, the dbms_outln package was enhanced to allow you to create outlines from the library cache – see this note from Randolf Geist for more details – but remember that stored outlines will be deprecated in Oracle 12.

Footnote: the dbms_spm package is created by script $ORACLE_HOME/rdbms/admin/dbmsspm.sql, and there is a comment near the top saying:


--           Grant the execute privilege on SPM package to public.           --
--           But ADMINISTER SQL MANAGEMENT OBJECT privilege is checked       --
--           before a user is allowed to execute.                            --


Cost – again

Browsing through some postings on Tony Hasler’s blog a little while ago I found this response to a note he had posted on some anomalies (i.e. bugs) in the costing of the “(min/max)” index scans:

My current understanding is it is not valid to try to compare costs across different queries (even if you just alter it by adding a hint). In general a better plan will have a lower cost but you cannot rely on this metric. The metric is really for the CBO to choose between alternative plans for this specific query, not to compare plans generated for different queries.

Now I know that this is a statement that pretty much paraphrases something that Tom Kyte wrote on AskTom several years ago – but it’s wrong. As I’ve pointed out in the past, “Cost is Time”. The cost of a query represents the optimizer’s estimate of how long it will take that query to run – so it is perfectly valid to compare the cost of two queries to see which one the optimizer thinks will be faster; but, thanks to limitations and defects in the optimizer, it may not be entirely sensible to do so.
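To make “Cost is Time” concrete: with system statistics enabled, the reported cost is (approximately) the estimated run time expressed in units of single-block read time, so converting back to seconds is a single multiplication. A sketch, with an assumed sreadtim of 12ms (a hypothetical figure, not one taken from this article):

```python
# "Cost is Time": with CPU costing enabled the reported cost is
# roughly estimated_elapsed_time / sreadtim, i.e. time measured in
# units of one single-block read.

def cost_to_seconds(cost: float, sreadtim_ms: float) -> float:
    """Estimated run time in seconds for a given optimizer cost."""
    return cost * sreadtim_ms / 1000.0

# Hypothetical: a plan cost of 1000 with a 12ms single-block read time
print(cost_to_seconds(1000, 12.0))   # 12.0
```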

The point I want to address in this post though is the comment that “it’s valid to compare the cost of different plans, but not to compare the cost of two different queries”. Consider the following queries:

select
	t1.v1
from
	t1
where	t1.id in (
		select
			t2.id
		from	t2
		where	t2.n1 = 15
	)
;

select
	t1.v1
from
	t1
where	exists (
		select
			t2.id
		from	t2
		where	t2.n1 = 15
		and	t2.id = t1.id
	)
;

select
	t1.v1
from
	(
		select	distinct t2.id
		from	t2
		where	t2.n1 = 15
	) t2,
	t1
where
	t1.id = t2.id
;

select
	v1.v1
from	(
	select
		distinct
			t1.rowid,
			t2.id,
			t1.v1
	from
		t2, t1
	where
		t2.n1 = 15
	and	t1.id = t2.id
	) v1
;

Tables t1 and t2 have the following definitions, so all four queries are logically equivalent:

Name                    Null?    Type
----------------------- -------- ----------------
ID                      NOT NULL NUMBER
N1                      NOT NULL NUMBER
V1                               VARCHAR2(6)
PADDING                          VARCHAR2(100)

According to the claim, it is not valid to compare the costs that the optimizer gives you for these four different queries – but they are all, logically, the same query. In principle the optimizer might transform the IN to an EXISTS, it might simply unnest, it might unnest and merge – so when the optimizer is “comparing different costs for the same query”, it will also be “comparing costs of different queries”.

Cost IS time – but only in theory. The “trick” to sorting out optimization problems lies in recognising where the optimizer model is not right for your data, or the optimizer arithmetic is too simplistic or has a bug.


SQL Plan Baselines

Here’s another of my little catalogues of articles – this one on SQL Plan Baselines.

Be a little careful as you read through these notes – there are various changes in internal mechanisms, storage, etc. as you go through different versions of Oracle, so check which version the author is writing about.

Maria Colgan of Oracle Corp.

Christian Antognini:

Kerry Osborne

Tim Hall

Jason Arneil


Existence

Here’s a posting on OTN that demonstrates a piece of SQL that uses inline scalar subqueries which are all “existence” tests to produce (presumably) a set of flags describing the state of a particular item of data.

I’ve linked to it because I contributed a comment about the implications of the cost figures that appeared in the execution plan for two of the “exists” subqueries. Essentially “existence” is optimized as a “first_rows(1)” operation – which results in two lines of the plan showing two different costs for table scans of the same table.

Update 30th Dec:
If you follow the OTN note you’ll see that the original poster was confused by my comments about the relative costs of the two tablescans, so I’ve whipped up a quick script to clarify the point. It uses my typical “reproducible” setup of 1MB uniform extents, 8KB block size, freelist management and disabling system statistics. Here’s the starting data set:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	mod(rownum,1000)	n_1000,
	mod(rownum,5000)	n_5000,
	mod(rownum,10000)	n_10000,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 100000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 1'

	);
end;
/

Notice that any given value of n_1000 appears every 1,000 rows, any given value of n_5000 appears every 5,000 rows, and any given value for n_10000 appears every 10,000 rows. Think about the effect this has on Oracle’s prediction of how much work it has to do when asked to find a particular row under first_rows(1) optimisation (which is the optimisation strategy triggered by the “rownum = 1” predicates below):


set autotrace traceonly explain

select count(*) from t1;
select id from t1 where	n_1000  =  500 and rownum = 1;
select id from t1 where	n_5000  = 2500 and rownum = 1;
select id from t1 where	n_10000 = 5000 and rownum = 1;

set autotrace off

The more rows you have to scan (on average) to find a given value, the more costly the tablescan becomes. (The initial select count(*) is there to demonstrate Oracle’s estimate of the cost of scanning the whole table).

---------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost  |
---------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |   283 |
|   1 |  SORT AGGREGATE    |      |     1 |       |
|   2 |   TABLE ACCESS FULL| T1   |   100K|   283 |
---------------------------------------------------

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     9 |     5 |
|*  1 |  COUNT STOPKEY     |      |       |       |       |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |    18 |     5 |
-----------------------------------------------------------
   1 - filter(ROWNUM=1)
   2 - filter("N_1000"=500)

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     9 |    16 |
|*  1 |  COUNT STOPKEY     |      |       |       |       |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |    18 |    16 |
-----------------------------------------------------------
   1 - filter(ROWNUM=1)
   2 - filter("N_5000"=2500)

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     9 |    30 |
|*  1 |  COUNT STOPKEY     |      |       |       |       |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |    18 |    30 |
-----------------------------------------------------------
   1 - filter(ROWNUM=1)
   2 - filter("N_10000"=5000)
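Those three costs (5, 16, 30) line up with a simple proportional model: under first_rows(1) the optimizer expects to stop the scan at the first match, and with mod(rownum,m) data a given value recurs once every m rows. The sketch below is my approximation of the arithmetic, not Oracle’s exact formula – the reported figures include a small fixed overhead it ignores:

```python
# Proportional model for the stopkey tablescan costs: scan only
# m / num_rows of the table (one cycle of the mod(rownum, m) data),
# so scale the full-scan cost (283) by that fraction.

full_scan_cost = 283          # cost of scanning all 100,000 rows
num_rows       = 100_000

for m in (1_000, 5_000, 10_000):
    scaled = full_scan_cost * m / num_rows
    print(m, round(scaled, 1))    # compare with reported costs 5, 16, 30
```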

And now, check the costs of the tablescans in the existence subqueries for the following:


select
	(select 1 from dual where exists (select null from t1 where n_1000  =  500)) n_1000,
	(select 1 from dual where exists (select null from t1 where n_5000  = 2500)) n_5000,
	(select 1 from dual where exists (select null from t1 where n_10000 = 5000)) n_10000
from
	dual
;

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |       |     2 |
|*  1 |  FILTER            |      |       |       |       |
|   2 |   FAST DUAL        |      |     1 |       |     2 |
|*  3 |   TABLE ACCESS FULL| T1   |     2 |     8 |     5 |
|*  4 |  FILTER            |      |       |       |       |
|   5 |   FAST DUAL        |      |     1 |       |     2 |
|*  6 |   TABLE ACCESS FULL| T1   |     2 |     8 |    16 |
|*  7 |  FILTER            |      |       |       |       |
|   8 |   FAST DUAL        |      |     1 |       |     2 |
|*  9 |   TABLE ACCESS FULL| T1   |     2 |     8 |    30 |
|  10 |  FAST DUAL         |      |     1 |       |     2 |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "T1" "T1" WHERE "N_1000"=500))
   3 - filter("N_1000"=500)
   4 - filter( EXISTS (SELECT /*+ */ 0 FROM "T1" "T1" WHERE "N_5000"=2500))
   6 - filter("N_5000"=2500)
   7 - filter( EXISTS (SELECT /*+ */ 0 FROM "T1" "T1" WHERE "N_10000"=5000))
   9 - filter("N_10000"=5000)


Index join – 4

In a recent note I wrote about index joins, I made a passing comment about limitations in the optimizer’s available strategies that might make you choose to write your code to emulate an index join through explicit SQL references.

Here are two similar SQL statements (with execution plans) that demonstrate the initial problem – the first is just a restatement of the basic example I supplied in the first article:

create table indjoin
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rpad('x',500) padding
from all_objects where rownum <= 3000
;

-- collect stats, compute, no histograms

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);

select
	val1, val2
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

---------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     3 |    24 |    24 |
|*  1 |  VIEW                  | index$_join$_001 |     3 |    24 |    24 |
|*  2 |   HASH JOIN            |                  |       |       |       |
|*  3 |    INDEX FAST FULL SCAN| IJ_V1            |     3 |    24 |    11 |
|*  4 |    INDEX FAST FULL SCAN| IJ_V2            |     3 |    24 |    11 |
---------------------------------------------------------------------------

select
	val1, val2, rowid
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    60 |    17 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| INDJOIN |     3 |    60 |    17 |
|*  2 |   INDEX FULL SCAN           | IJ_V1   |   102 |       |     9 |
-----------------------------------------------------------------------
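As an aside, the 102-row estimate for the index full scan comes from the standard range-predicate arithmetic for a between with no histogram – (high − low)/(max − min), plus 1/NDV for each closed end of the range. A sketch of that calculation:

```python
# Cardinality for "val1 between 100 and 200" on a column holding
# 1..3000 (3000 rows, 3000 distinct values, no histogram):
#   selectivity = (high - low)/(max - min) + 1/NDV + 1/NDV

num_rows              = 3000
lo, hi                = 100, 200
col_min, col_max, ndv = 1, 3000, 3000

sel = (hi - lo) / (col_max - col_min) + 2 / ndv
print(round(num_rows * sel))   # 102
```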

When we include the rowid in the query the optimizer stops using the index join – and it won’t even use the mechanism if we hint it. Apparently, for the purposes of analysing the query, Oracle doesn’t recognise the rowid as a column in the table and this automatically precludes the possibility of using the index join as the access method. So we have to use the manual rewrites I introduced in an earlier article.

You might wonder why this matters – but consider a case where a “perfect” index doesn’t exist for the following query:

select
	padding
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

The only access path available to the optimizer at this point is a full tablescan – but what if the two indexes are very small compared to the table; wouldn’t it be a good idea to use an index hash join between the two indexes to get a list of rowids and visit the table only for those rows? Unfortunately that isn’t a path the optimizer can derive – so we might try something like:


select
	t.padding
from
	(
	select
		/*+
			index_join(ij ij_v1 ij_v2)
			no_merge
		*/
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	and	val2 between 50 and 150
	)	v1,
	indjoin	t
where
	t.rowid = v1.rowid
;

But, as we’ve just seen, you can’t do an index join if you select the rowid, so this code won’t follow the strategy we want. (In fact, when I tried it, there was something distinctly bug-like about the plan – but I won’t go into that now). But we can do the following:


select
	t.padding
from
	(
	select
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	)	v1,
	(
	select
		rowid
	from
		indjoin		ij
	where
		val2 between 50 and 150
	)	v2,
	indjoin	t
where
	v2.rowid = v1.rowid
and	t.rowid = v2.rowid
;

-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |  1632 |    10 |
|   1 |  NESTED LOOPS               |         |     3 |  1632 |    10 |
|*  2 |   HASH JOIN                 |         |     3 |    96 |     7 |
|*  3 |    INDEX FAST FULL SCAN     | IJ_V1   |   102 |  1632 |     3 |
|*  4 |    INDEX FAST FULL SCAN     | IJ_V2   |   102 |  1632 |     3 |
|   5 |   TABLE ACCESS BY USER ROWID| INDJOIN |     1 |   512 |     1 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V2".ROWID="V1".ROWID)
   3 - filter("VAL1">=100 AND "VAL1"<=200)
   4 - filter("VAL2">=50 AND "VAL2"<=150)

It’s amazing what you can make the optimizer do (even without hinting) if you think about the mechanics underneath the basic operations.

[Further reading on Index Joins]


Join Surprise

Imagine I have a simple SQL statement with a “where clause” that looks like this:


	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2

Would you expect it to run several times faster (25 minutes instead of a few hours) when the only change you made was to swap the order of the join predicates to:


	t2.id2(+) = t1.id2
and	t2.id1(+) = t1.id1

You may recall that a couple of years ago I wrote about some bugs in the optimizer, and pointed you to a blog article by Alberto Dell’Era that demonstrated an anomaly in cardinality calculations that made this type of thing possible. But here’s an example which has nothing to do with cardinality errors. We start with a suitable dataset – running on 11.1.0.6.


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	trunc(dbms_random.value(1,1000))	id1,
	trunc(dbms_random.value(1,1000))	id2,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',1000)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 7
)
select
	t1.id1,
	t1.id2,
	v1.id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',70)		padding
from
	t1		t1,
	generator	v1
;

-- collect stats, compute, no histograms

This data set models a problem – stripped to the bare essentials – that I came across at a client site some time ago. We have a “parent/child” relationship between the tables (although I haven’t declared the referential integrity), with roughly seven child rows per parent. The parent rows are quite long, the child rows are quite short. Some parents may not have children (although in this data set they do).

We now run a “report” that generates data for a number-crunching tool that extracts all the data from the tables – using an outer join so that parent rows don’t get lost. For various reasons the tool wanted the data sorted in a certain order – so there’s also an order by clause in the query. I’m going to show you the original query – first unhinted, and then hinted to use a merge join:


select
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2
order by
	t1.id2,
	t1.id1
;

---------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      | 10000 |    10M|       |  3720   (1)| 00:00:45 |
|   1 |  SORT ORDER BY         |      | 10000 |    10M|    22M|  3720   (1)| 00:00:45 |
|*  2 |   HASH JOIN RIGHT OUTER|      | 10000 |    10M|  6224K|  1436   (1)| 00:00:18 |
|   3 |    TABLE ACCESS FULL   | T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
|   4 |    TABLE ACCESS FULL   | T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")

select
	/*+ leading(t1 t2) use_merge(t2) */
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2
order by
	t1.id2,
	t1.id1
;

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 10000 |    10M|       |  6343   (1)| 00:01:17 |
|   1 |  SORT ORDER BY       |      | 10000 |    10M|    22M|  6343   (1)| 00:01:17 |
|   2 |   MERGE JOIN OUTER   |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   3 |    SORT JOIN         |      | 10000 |  9853K|    19M|  2509   (1)| 00:00:31 |
|   4 |     TABLE ACCESS FULL| T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
|*  5 |    SORT JOIN         |      | 70000 |  5400K|    12M|  1549   (1)| 00:00:19 |
|   6 |     TABLE ACCESS FULL| T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")
       filter("T2"."ID2"(+)="T1"."ID2" AND "T2"."ID1"(+)="T1"."ID1")

But there’s something a little odd about how the optimizer has chosen to do the merge join. Although our join condition references the join columns in the order (id1, id2) our final sort order is on (id2, id1) – and the optimizer hasn’t taken advantage of the fact that it could do the “sort join” operations in the order (id2, id1) and avoid the final “sort order by” at line 1.

So let’s rewrite the query to make the order of the join predicates match the order of the order by clause, and see what happens to the plan:


select
	/*+ leading(t1 t2) use_merge(t2) */
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id2(+) = t1.id2
and	t2.id1(+) = t1.id1
order by
	t1.id2,
	t1.id1
;

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   1 |  MERGE JOIN OUTER   |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   2 |   SORT JOIN         |      | 10000 |  9853K|    19M|  2509   (1)| 00:00:31 |
|   3 |    TABLE ACCESS FULL| T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
|*  4 |   SORT JOIN         |      | 70000 |  5400K|    12M|  1549   (1)| 00:00:19 |
|   5 |    TABLE ACCESS FULL| T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."ID2"(+)="T1"."ID2" AND "T2"."ID1"(+)="T1"."ID1")
       filter("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")

The plan no longer has the final “sort order by” operation – and the cost of the plan is much lower as a consequence. You’ll also notice that the predicate sections (always check the predicate section) are a little different – the order of evaluation has been reversed.

In my test case the cost of the merge join still hasn’t fallen below the cost of the hash join – but in the client’s case, changing the order of predicates – without adding any hints – made the cost of the merge join much cheaper than the cost of the hash join. Fortunately this was a case where the cost was a realistic indication of run time, and avoiding a sort operation on some 35GB of join result was a very good move.

So watch out – with multi-column joins, the order of the join predicates can make a big difference to the way Oracle operates a merge join.
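The principle can be modelled outside Oracle. The following Python sketch (a toy model, not Oracle internals; all names are mine) shows why a merge join whose sort-join keys match the order by clause produces output that needs no further sorting:

```python
# Toy sort-merge outer join: rows come out in the order of the sort-join
# keys, so if both inputs are sorted on (id2, id1) the result is already
# in the order the ORDER BY asked for and the final sort can be skipped.
import random

def merge_outer_join(left, right, key):
    """Sort-merge left outer join; 'key' extracts the join columns."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    result, j = [], 0
    for l in left:
        # advance the right-hand pointer past smaller keys
        while j < len(right) and key(right[j]) < key(l):
            j += 1
        k, matched = j, False
        while k < len(right) and key(right[k]) == key(l):
            result.append((l, right[k]))
            matched = True
            k += 1
        if not matched:
            result.append((l, None))     # preserve unmatched "parent" rows

    return result

random.seed(1)
t1 = [(random.randrange(10), random.randrange(10)) for _ in range(50)]
t2 = [(random.randrange(10), random.randrange(10)) for _ in range(100)]

# join keyed on (id2, id1) - the same column order as the ORDER BY
rows = merge_outer_join(t1, t2, key=lambda r: (r[1], r[0]))
keys = [(l[1], l[0]) for l, _ in rows]
print(keys == sorted(keys))   # True: the output needs no further sort
```

The point is simply that a sort-merge join emits rows in the order of its join keys – so arranging for the sort-join key order to be (id2, id1) makes the final sort redundant.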


Quiz Night

I have four simple (non-partitioned, non-clustered, not views, not object type – really I’m not trying to be cunning or devious here) heap tables, and write a query that joins them:

select
	/*+
		leading(t1 t2 t3 t4)
		use_hash(t2) use_hash(t3) use_hash(t4)
	*/
	count(t1.small_vc),
	count(t2.small_vc),
	count(t3.small_vc),
	count(t4.small_vc)
from
	t1,
	t2,
	t3,
	t4
where
	t2.id2 = t1.id1
and	t3.id3 = t2.id2
and	t4.id4 = t3.id3
;

Given that I’ve specified the join order and every join method, and given that there are no indexes on any of the tables, how many different execution plans are there for this query?

Update: When I say I’m not trying to be cunning or devious, it means I’m not trying to be cunning or devious – it doesn’t mean that I’m being really, really cunning and devious. Honestly, I’m just trying to make a simple but important point that isn’t terribly well known.

Update 2: And when I said simple tables I should also have said they weren’t declared parallel, weren’t going to be hinted parallel, weren’t external or remote/non-oracle, etc. etc. etc.

The Answer:

The answer to the question I was trying to ask is eight – as stated then demonstrated by Pavol Babel and explained very succinctly by Greg Rahn.

The answer to the question I actually asked is four as stated by Vyacheslav Rasskazov. In my attempts to describe the situation as clearly and simply as possible I forgot about a special case that I will comment on at the end of this note.

Going back to the “expected” answer. The key point is this:

    You have NOT defined a hash join completely until you have specified which rowsource should be used as the build table and which as the probe table – so every time you supply the use_hash() hint for a table, you should also supply the swap_join_inputs() hint or the no_swap_join_inputs() hint.

So my original query is suffering from incomplete hinting. There are three hash joins, so there should be three hints about swapping inputs or not. For example:

/*+
	leading(t1 t2 t3 t4)
	use_hash(t2) no_swap_join_inputs(t2)
	use_hash(t3) no_swap_join_inputs(t3)
	use_hash(t4) no_swap_join_inputs(t4)
*/

Since there are two possibilities for the swap/no_swap option, there are 2 x 2 x 2 = 8 possibilities in total for the execution plan – even though only one join order is examined. (If you check the 10053 trace file for this query you will find all the computation for these execution plans under the single line that reads “Join order[1]”; there will not be a “Join order[2]”.)
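The counting argument is easy to make concrete. A trivial Python sketch (hint text only – it generates the eight hint combinations, it doesn’t talk to an optimizer):

```python
# Each of the three hash joins independently takes swap or no_swap,
# so one join order yields 2**3 = 8 distinct fully-hinted plans.
from itertools import product

tables = ['t2', 't3', 't4']
plans = [
    ' '.join(f"use_hash({t}) {choice}({t})" for t, choice in zip(tables, combo))
    for combo in product(['no_swap_join_inputs', 'swap_join_inputs'],
                         repeat=len(tables))
]
print(len(plans))   # 8
```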

Pavol did list all the different patterns of hints with their execution plans – but I’m going to do it again, stripped to the minimum output:

use_hash(t2) no_swap_join_inputs(t2)
use_hash(t3) no_swap_join_inputs(t3)
use_hash(t4) no_swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|*  3 |    HASH JOIN          |      |    70 |  1260 |     8 |
|*  4 |     HASH JOIN         |      |    70 |   840 |     5 |
|   5 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   6 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   7 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|   8 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2)    swap_join_inputs(t2)
use_hash(t3) no_swap_join_inputs(t3)
use_hash(t4) no_swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|*  3 |    HASH JOIN          |      |    70 |  1260 |     8 |
|*  4 |     HASH JOIN         |      |    70 |   840 |     5 |
|   5 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   6 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   7 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|   8 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2) no_swap_join_inputs(t2)
use_hash(t3)    swap_join_inputs(t3)
use_hash(t4) no_swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|*  3 |    HASH JOIN          |      |    70 |  1260 |     8 |
|   4 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|*  5 |     HASH JOIN         |      |    70 |   840 |     5 |
|   6 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   7 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   8 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2)    swap_join_inputs(t2)
use_hash(t3)    swap_join_inputs(t3)
use_hash(t4) no_swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|*  3 |    HASH JOIN          |      |    70 |  1260 |     8 |
|   4 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|*  5 |     HASH JOIN         |      |    70 |   840 |     5 |
|   6 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   7 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   8 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2) no_swap_join_inputs(t2)
use_hash(t3) no_swap_join_inputs(t3)
use_hash(t4)    swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|   3 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
|*  4 |    HASH JOIN          |      |    70 |  1260 |     8 |
|*  5 |     HASH JOIN         |      |    70 |   840 |     5 |
|   6 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   7 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   8 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2)    swap_join_inputs(t2)
use_hash(t3) no_swap_join_inputs(t3)
use_hash(t4)    swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|   3 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
|*  4 |    HASH JOIN          |      |    70 |  1260 |     8 |
|*  5 |     HASH JOIN         |      |    70 |   840 |     5 |
|   6 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   7 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   8 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2) no_swap_join_inputs(t2)
use_hash(t3)    swap_join_inputs(t3)
use_hash(t4)    swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|   3 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
|*  4 |    HASH JOIN          |      |    70 |  1260 |     8 |
|   5 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|*  6 |     HASH JOIN         |      |    70 |   840 |     5 |
|   7 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   8 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2)    swap_join_inputs(t2)
use_hash(t3)    swap_join_inputs(t3)
use_hash(t4)    swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|   3 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
|*  4 |    HASH JOIN          |      |    70 |  1260 |     8 |
|   5 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|*  6 |     HASH JOIN         |      |    70 |   840 |     5 |
|   7 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   8 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
--------------------------------------------------------------

Note the extreme change in shape and apparent order of tables in the plan. Despite this the join order really is t1 -> t2 -> t3 -> t4 in every case. I’ll give a quick description of the first and last plans to explain this.

First plan (no_swap all the way):

    We build a hash table from t1 and probe it with t2 to create a result set
    As this result set is generated we build a new hash table from it
    As the result set completes we discard the hash table from t1
    We probe the result set with t3 to create a second result set
    As the second result set is generated we build a new hash table from it
    As the second result set completes we discard the hash table from the first result set
    We probe the second result set with t4 to create a third result set
    As the third result set is generated we pass the results up to the aggregation step to count the output.

It is very obvious from this description that the tables are being joined in the order we dictated.
Last plan (swap all the way):

    We build a hash table from t4
    We build a hash table from t3
    We build a hash table from t2
    We pick a row from t1 and probe the t2 hash,
    if the row “survives” we probe the t3 hash
    if the row “survives” we probe the t4 hash
    if the row “survives” we pass it up to the aggregation step to be counted.

With this description it becomes clear that, once again, the tables are being joined in the order we dictated.

Notice: the number of in-memory hash tables we build in the first case is two and, no matter how many tables are involved in this pattern, the number of in-memory hash tables will always be two. The actual size of the two hash tables is a little unpredictable, though, and, as a very crude guideline, you might expect the size to grow as more tables are joined into the result set. In the last case the number of in-memory hash tables we build is N-1 (where N is the number of tables joined). We can predict the approximate size of each hash table because it is based on the data we expect to extract from the corresponding “real” table. If you have enough memory to hold all the hash tables in memory at once you will find that this join pattern is likely to be the fastest option you can produce.
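The two extreme shapes can be modelled in a few lines of Python. This is a toy (materialized lists rather than the row-at-a-time pipeline Oracle actually runs, and all names are mine), but it shows that both shapes honour the same t1 -> t2 -> t3 -> t4 join order and produce the same rows – they differ only in which rowsources get hashed:

```python
# Toy model of the two extreme hash-join plan shapes from the quiz.
# Rows are (key, tuple_of_columns); every join is an equi-join on key.

def build(rows):
    """Hash a rowsource on its join key."""
    h = {}
    for k, cols in rows:
        h.setdefault(k, []).append(cols)
    return h

def probe(h, rows, hash_first):
    """Probe hash table h with rows, concatenating the column tuples."""
    if hash_first:
        return [(k, bv + pv) for k, pv in rows for bv in h.get(k, [])]
    return [(k, pv + bv) for k, pv in rows for bv in h.get(k, [])]

def left_deep(first, *rest):
    """no_swap everywhere: hash the intermediate result, probe it with
    the next table; only one hash table is alive at any moment."""
    result = first
    for t in rest:
        result = probe(build(result), t, hash_first=True)
    return result

def right_deep(first, *rest):
    """swap everywhere: hash every remaining table up front (N-1 hash
    tables), then drive rows from the first table through them all."""
    hashes = [build(t) for t in rest]
    result = first
    for h in hashes:
        result = probe(h, result, hash_first=False)
    return result

t1 = [(k % 3, (f"t1.{k}",)) for k in range(6)]
t2 = [(k % 3, (f"t2.{k}",)) for k in range(6)]
t3 = [(k % 3, (f"t3.{k}",)) for k in range(6)]
t4 = [(k % 3, (f"t4.{k}",)) for k in range(6)]

a = sorted(left_deep(t1, t2, t3, t4))
b = sorted(right_deep(t1, t2, t3, t4))
print(a == b)   # True: same join result from both plan shapes
```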

Footnote: although a hash join is not fully specified unless you have also supplied an associated “swap/no swap” hint, the no_swap_join_inputs() hint didn’t become available until 10g!

Footnote 2: I’ve had this note on my draft list for nearly a year now – after answering this question on OTN. Unfortunately I think it’s a bit too late to update the thread now.

Special Case

And now we come to the point made by Vyacheslav Rasskazov. I have changed the original code to hint the eight different paths that could come from a single join order with three hash joins. But if I were limited to changing the data (technically the statistics) I would only be able to persuade the optimizer to pick four of those paths for this join order. The four paths that the optimizer will not consider are the ones which I have hinted with /*+ swap_join_inputs(t2) */ – i.e. the 2nd, 4th, 6th and 8th above.

As far as I can tell the built-in decision process is this: when considering a hash join, the optimizer will examine the cost of swapping the inputs if the expected volume (i.e. predicted cardinality * row length) of the second input is smaller than the expected volume of the first input. However, in the special case of the first join of the first join order, this does not happen. This is fairly easy to demonstrate by setting up a two-table example where selected content is a small number of large rows from one table and a large number of small rows from the other table – the first join order is dictated by number of rows, swapping is dictated by volume of data. I suspect, but cannot confirm, that this is an accident (i.e. bug) since I can think of no rational explanation of why this has to happen.
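That conjectured decision process can be written down as a trivial predicate – this Python sketch is my own formulation of the suspected rule (inferred from observed behaviour, not documented Oracle internals):

```python
# Suspected rule: the optimizer examines swapping the hash join inputs
# only when the second input's volume (cardinality * row length) is
# smaller than the first's - EXCEPT at the first join of the first
# join order, where (apparently by accident) it never looks.

def may_consider_swap(first_vol, second_vol, is_first_join_of_first_order):
    """Return True if the optimizer would examine swapping the inputs."""
    if is_first_join_of_first_order:
        return False                 # the special case noted above
    return second_vol < first_vol

# few large rows vs many small rows: second input is smaller by volume
t1_volume = 100 * 1000    # e.g. 100 rows of roughly 1,000 bytes
t2_volume = 10000 * 7     # e.g. 10,000 rows of roughly 7 bytes
print(may_consider_swap(t1_volume, t2_volume, True))    # False
print(may_consider_swap(t1_volume, t2_volume, False))   # True
```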

Since my original text only allowed for one join order (the leading() hint) the optimizer will not examine the option for swapping the (t1, t2) step of that join order – which is why Vyacheslav’s answer is the answer to the question I actually asked.

My apologies – this wasn’t intended to be a cunning trap – I simply forgot that odd little glitch when I was writing up the question.

In passing, I have launched (yet another) poll in response to a question from a reader. In this case the poll is whether I should carry on blogging, or stop blogging for a year to write a book. To my mind it makes more sense to blog because it gets more information out in a more timely fashion – but I am curious to see what other people think.


Index Join – 3

I’ve recently been writing about the index join mechanism and ways of emulating it. Those notes were originally inspired by an example of an index join that appeared on OTN a little while ago.

It was a plan that combined “bitmap/btree conversion” with the basic index join strategy so, with hindsight, it was an “obvious” and brilliant execution plan for a certain type of query. The query in the original posting was a simple select (with no predicates) against a huge table in a data warehouse – presumably extracting a small number of columns from a much wider row.

SELECT DISTINCT
	ECP_ITEM_MASTER_DIM.ORGANIZATION_ID,
	ECP_ITEM_MASTER_DIM.INV_MFG_PRODUCTION_LINE,
	ECP_ITEM_MASTER_DIM.INV_PRODUCT_FAMILY,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_3,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_4,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_5
FROM
	ecp.ECP_ITEM_MASTER_DIM
;

(I really hate reading SQL where the whole table name has been repeated as the alias all the way through the SQL – it makes the code so hard to read, especially when it’s all in upper case. It’s important to use aliases, of course, but 3 or 4 letters is a sensible length.)

Here’s the execution plan:


---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                              |  1799K|    42M|       | 51279   (1)| 00:10:16 |
|   1 |  HASH UNIQUE                       |                              |  1799K|    42M|   151M| 51279   (1)| 00:10:16 |
|   2 |   VIEW                             | index$_join$_001             |  1799K|    42M|       | 38227   (1)| 00:07:39 |
|*  3 |    HASH JOIN                       |                              |       |       |       |            |          |
|*  4 |     HASH JOIN                      |                              |       |       |       |            |          |
|*  5 |      HASH JOIN                     |                              |       |       |       |            |          |
|*  6 |       HASH JOIN                    |                              |       |       |       |            |          |
|*  7 |        HASH JOIN                   |                              |       |       |       |            |          |
|   8 |         BITMAP CONVERSION TO ROWIDS|                              |  1799K|    42M|       |   485   (1)| 00:00:06 |
|   9 |          BITMAP INDEX FULL SCAN    | ECP_ITEM_MASTER_DIM_IMPL_BMX |       |       |       |            |          |
|  10 |         BITMAP CONVERSION TO ROWIDS|                              |  1799K|    42M|       |   230   (0)| 00:00:03 |
|  11 |          BITMAP INDEX FULL SCAN    | ECP_ITEM_MASTER_DIM_IPF_BMX  |       |       |       |            |          |
|  12 |        BITMAP CONVERSION TO ROWIDS |                              |  1799K|    42M|       |   229   (0)| 00:00:03 |
|  13 |         BITMAP INDEX FULL SCAN     | ECP_ITEM_MASTER_DIM_IS3_BMX  |       |       |       |            |          |
|  14 |       BITMAP CONVERSION TO ROWIDS  |                              |  1799K|    42M|       |   228   (0)| 00:00:03 |
|  15 |        BITMAP INDEX FULL SCAN      | ECP_ITEM_MASTER_DIM_IS4_BMX  |       |       |       |            |          |
|  16 |      BITMAP CONVERSION TO ROWIDS   |                              |  1799K|    42M|       |   201   (0)| 00:00:03 |
|  17 |       BITMAP INDEX FULL SCAN       | ECP_ITEM_MASTER_DIM_IS5_BMX  |       |       |       |            |          |
|  18 |     BITMAP CONVERSION TO ROWIDS    |                              |  1799K|    42M|       |   207   (0)| 00:00:03 |
|  19 |      BITMAP INDEX FULL SCAN        | ECP_ITEM_MASTER_DIM_OI_BMX   |       |       |       |            |          |
---------------------------------------------------------------------------------------------------------------------------

Isn’t it brilliant! The optimizer has seen that all the required columns can be found in indexes (six of them) – but they happen to be bitmap indexes so the optimizer has done a “bitmap conversion to rowid” on all six indexes one after the other with five consecutive hash joins – carrying the column values with each conversion and hash join.

Unfortunately the owner of this query wasn’t happy with the resulting plan because a full tablescan turned out to be faster – nevertheless, it’s a very clever concept: the size of the table was measured in gigabytes while the indexes were only a few megabytes each, allowing for a significant saving in I/O time.
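To see how hash joins on rowids can stitch single-column indexes back into rows, here’s a small Python sketch (a toy model with made-up column names and data; Oracle’s real mechanism works through the bitmap-to-rowid conversions shown in the plan above):

```python
# Toy index join: each index contributes (rowid, column_value) pairs;
# hash-joining on rowid reconstructs the projected rows without ever
# visiting the table itself.

def index_join(indexes):
    """indexes: list of dicts mapping rowid -> column value.
    Successively hash-join on rowid, carrying the columns along."""
    acc = {rid: (val,) for rid, val in indexes[0].items()}
    for idx in indexes[1:]:
        # keep rowids present on both sides, appending the new column
        acc = {rid: cols + (idx[rid],)
               for rid, cols in acc.items() if rid in idx}
    return acc

# three hypothetical single-column indexes on the same table
org_idx  = {1: 'ORG_A', 2: 'ORG_B', 3: 'ORG_A'}
line_idx = {1: 'LINE1', 2: 'LINE2', 3: 'LINE1'}
fam_idx  = {1: 'FAM_X', 2: 'FAM_Y', 3: 'FAM_X'}

rows = index_join([org_idx, line_idx, fam_idx])
distinct = set(rows.values())        # the HASH UNIQUE / DISTINCT step
print(len(distinct))                 # 2: rowids 1 and 3 collapse together
```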

I was a little curious, though, about the final join strategy. It’s annoying that Oracle didn’t report any costs on the hash join lines themselves because that could be very revealing. It’s remarkable that the value in the Bytes column for the final view (which is six columns of data) is the same as the bytes column for each index conversion (and remember that the projection from each conversion is just one data column with an accompanying rowid) – there’s clearly something wrong with the arithmetic.

This may explain why the optimizer has decided to run the 6-way join using only two running hash joins (rather than first setting up five hash tables in memory then passing the last table through them). If you think about this, when Oracle gets to the last hash join (lines 3, 4 and 18) it has to build a hash table from the result of the previous four joins and (in this case) that’s going to need a similar amount of memory as five in-memory hash tables. With that thought in mind I was puzzled that Oracle hadn’t just built five in-memory hash tables then walked through each in turn with the sixth.

Still – it’s not my (or my client’s) problem; maybe one day I’ll need to look more closely at a similar case.

[Further reading on Index Joins]


ANSI – argh

I’m not keen on ANSI standard SQL – even though it is, technically, the strategic option and even though you have to use it for full outer joins and partitioned outer joins.

One reason for disliking it is that it “separates join predicates from filter predicates” – a reason often given in praise of the syntax which, to my mind, claims a spurious distinction and introduces a mechanism that makes it harder to keep mental track of what’s going to happen as you walk through the join order.

The other reason for disliking ANSI SQL in Oracle databases is that sometimes it really is necessary to add hints to the SQL to make the optimizer do what needs to be done – and ANSI makes it so much harder and messier to add hints to code. Here’s a wonderful example that Tony Hasler presented in our recent debate “Does Oracle Ignore Hints” at the UKOUG annual conference:

WITH q1 as (
	SELECT /*+ qb_name(q1block) */
		*
	FROM	t1
	JOIN	t2
	ON	t1_i1 = t2_i1
	AND	t1_i1 < 10
),
q2 AS (
	SELECT
		/*+ qb_name(q2block) */
		*
	FROM
		t3
	JOIN	t4
	ON	t3_i1 = t4_i1
	AND	t3_i1 < 10
)
SELECT
	/*+
		no_merge(@q1block)
		no_merge(@q2block)
		leading (@q1block t2)
		use_nl  (@q1block t1)
	*/
	*
FROM
	q1
JOIN
	q2
ON	t1_i1 + t2_i1 = t3_i1 + t4_i1
;

Just to make life really hard, he’s included a couple of “factored subqueries” – and there are a few outstanding optimizer defects with handling subquery factoring – so when he claimed that this was an example of Oracle ignoring hints I had two different directions of investigation to worry about.

Here’s the execution plan (from my 10.2.0.3 system with the data generation, constraints and indexing that Tony supplied):

------------------------------------------------------------------------
| Id  | Operation                      | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |   250K|    33M|    54 |
|*  1 |  HASH JOIN                     |       |   250K|    33M|    54 |
|   2 |   VIEW                         |       |  5000 |   327K|    11 |
|*  3 |    HASH JOIN                   |       |  5000 |   581K|    11 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T4    |  5000 |   253K|     3 |
|*  5 |      INDEX RANGE SCAN          | T4_I1 |   900 |       |     2 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T3    |  5000 |   327K|     3 |
|*  7 |      INDEX RANGE SCAN          | T3_I1 |   900 |       |     2 |
|   8 |   VIEW                         |       |  5000 |   361K|    12 |
|*  9 |    HASH JOIN                   |       |  5000 |   615K|    12 |
|  10 |     TABLE ACCESS BY INDEX ROWID| T1    |  5000 |   297K|     3 |
|* 11 |      INDEX RANGE SCAN          | T1_I1 |   900 |       |     2 |
|  12 |     TABLE ACCESS BY INDEX ROWID| T2    |  5000 |   317K|     3 |
|* 13 |      INDEX RANGE SCAN          | T2_I1 |   900 |       |     2 |
------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$16C51A37
   2 - SEL$4C69CCA2 / Q2@SEL$1
   3 - SEL$4C69CCA2
   4 - SEL$4C69CCA2 / T4@SEL$2
   5 - SEL$4C69CCA2 / T4@SEL$2
   6 - SEL$4C69CCA2 / T3@SEL$2
   7 - SEL$4C69CCA2 / T3@SEL$2
   8 - SEL$7939585E / Q1@SEL$1
   9 - SEL$7939585E
  10 - SEL$7939585E / T1@SEL$3
  11 - SEL$7939585E / T1@SEL$3
  12 - SEL$7939585E / T2@SEL$3
  13 - SEL$7939585E / T2@SEL$3

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1_I1"+"T2_I1"="T3_I1"+"T4_I1")

As you can see, Oracle has copied the two factored subqueries inline (they appear just once each in the body of the query so this is – probably – inevitable). Then Oracle has obeyed the no_merge() hints – which I could check by deleting the hints and watching the plan change. So why, in lines 10 through 13, has Oracle not obeyed the leading() and use_nl() hints?

By changing the ANSI syntax to traditional Oracle syntax, I got a different plan:

-------------------------------------------------------------------------
| Id  | Operation                       | Name  | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |       |   250K|    33M| 10045 |
|*  1 |  HASH JOIN                      |       |   250K|    33M| 10045 |
|   2 |   VIEW                          |       |  5000 |   327K|    11 |
|*  3 |    HASH JOIN                    |       |  5000 |   581K|    11 |
|   4 |     TABLE ACCESS BY INDEX ROWID | T4    |  5000 |   253K|     3 |
|*  5 |      INDEX RANGE SCAN           | T4_I1 |   900 |       |     2 |
|   6 |     TABLE ACCESS BY INDEX ROWID | T3    |  5000 |   327K|     3 |
|*  7 |      INDEX RANGE SCAN           | T3_I1 |   900 |       |     2 |
|   8 |   VIEW                          |       |  5000 |   361K| 10003 |
|   9 |    TABLE ACCESS BY INDEX ROWID  | T1    |     1 |    61 |     2 |
|  10 |     NESTED LOOPS                |       |  5000 |   615K| 10003 |
|  11 |      TABLE ACCESS BY INDEX ROWID| T2    |  5000 |   317K|     3 |
|* 12 |       INDEX RANGE SCAN          | T2_I1 |   900 |       |     2 |
|* 13 |      INDEX RANGE SCAN           | T1_I1 |     1 |       |     1 |
-------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - Q2BLOCK / Q2@SEL$1
   3 - Q2BLOCK
   4 - Q2BLOCK / T4@Q2BLOCK
   5 - Q2BLOCK / T4@Q2BLOCK
   6 - Q2BLOCK / T3@Q2BLOCK
   7 - Q2BLOCK / T3@Q2BLOCK
   8 - Q1BLOCK / Q1@SEL$1
   9 - Q1BLOCK / T1@Q1BLOCK
  11 - Q1BLOCK / T2@Q1BLOCK
  12 - Q1BLOCK / T2@Q1BLOCK
  13 - Q1BLOCK / T1@Q1BLOCK

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1_I1"+"T2_I1"="T3_I1"+"T4_I1")
   3 - access("T3_I1"="T4_I1")
   5 - access("T4_I1"<10)
   7 - access("T3_I1"<10)
  12 - access("T2_I1"<10)
  13 - access("T1_I1"="T2_I1")
       filter("T1_I1"<10)

Notice how the optimizer is now obeying the leading() and use_nl() hints.

The problem is this: Oracle doesn’t optimise ANSI SQL, it transforms it and then optimises the transformed text. Transformation can change query blocks, and Tony’s hints apply to specific query blocks. After a little testing and checking I worked out what the SQL looked like AFTER transformation and BEFORE optimisation; and it’s this:
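As a reminder of the mechanism involved (this is a generic sketch, not Tony’s actual statement – the table and alias names are invented for illustration): the qb_name() hint labels a query block, and hints placed at the outer level can then target that block with the “@name” syntax. The catch, which is the whole point of this article, is that such hints are only valid if the named objects are still directly inside that block after transformation:

```sql
select	/*+ leading(@subq t2 t1) use_nl(@subq t1) */
	v.*
from	(
	select	/*+ qb_name(subq) no_merge */
		t1.id	t1_id,
		t2.id	t2_id
	from
		t1, t2
	where
		t1.id = t2.id
	)	v
;
```

If a transformation pushes t1 and t2 down into a new inline view inside “subq”, the leading() and use_nl() hints above silently stop applying – exactly what happened to Tony’s ANSI query.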

select
	/*+ qb_name(sel$4) */
	*
from
	(
	SELECT
		/*+ qb_name(sel$1) */
		Q1.T1_I1 T1_I1, Q1.T1_I2 T1_I2, Q1.T1_D1 T1_D1, Q1.T2_I1 T2_I1, Q1.T2_I2 T2_I2, Q1.T2_TS T2_TS,
		Q2.T3_I1 T3_I1, Q2.T3_I2 T3_I2, Q2.T3_TSTZ T3_TSTZ, Q2.T4_I1 T4_I1, Q2.T4_I2 T4_I2
	FROM
		(
		SELECT
			/*+ NO_MERGE QB_NAME (Q1BLOCK) */
			from$_subquery$_003.T1_I1_0 T1_I1, from$_subquery$_003.T1_I2_1 T1_I2,
			from$_subquery$_003.T1_D1_2 T1_D1, from$_subquery$_003.T2_I1_3 T2_I1,
			from$_subquery$_003.T2_I2_4 T2_I2, from$_subquery$_003.T2_TS_5 T2_TS
		FROM	(
			SELECT
				/*+ qb_name(sel$3) */
				T1.T1_I1 T1_I1_0, T1.T1_I2 T1_I2_1, T1.T1_D1 T1_D1_2,
				T2.T2_I1 T2_I1_3, T2.T2_I2 T2_I2_4, T2.T2_TS T2_TS_5
			FROM
				TEST_USER.T1 T1,
				TEST_USER.T2 T2
			WHERE
				T1.T1_I1 = T2.T2_I1
			AND	T1.T1_I1 < 10
			)	from$_subquery$_003
		)	Q1,
		(
		SELECT
			/*+ NO_MERGE QB_NAME (Q2BLOCK) */
			from$_subquery$_006.T3_I1_0 T3_I1, from$_subquery$_006.T3_I2_1 T3_I2,
			from$_subquery$_006.T3_TSTZ_2 T3_TSTZ, from$_subquery$_006.T4_I1_3 T4_I1,
			from$_subquery$_006.T4_I2_4 T4_I2
		FROM
			(
			SELECT
				/*+ qb_name(sel$2) */
				T3.T3_I1 T3_I1_0, T3.T3_I2 T3_I2_1, T3.T3_TSTZ T3_TSTZ_2,
				T4.T4_I1 T4_I1_3, T4.T4_I2 T4_I2_4 FROM TEST_USER.T3 T3, TEST_USER.T4 T4
			WHERE
				T3.T3_I1 = T4.T4_I1
			AND	T3.T3_I1 < 10
			)	from$_subquery$_006
		)	Q2
	WHERE
		Q1.T1_I1 + Q1.T2_I1 = Q2.T3_I1 + Q2.T4_I1
	)
;

I got most of this from the “Query Block Name / Object Alias” section of the ANSI execution plan (there are some important clues there, like ‘T1@SEL$3’) and the “unparsed” SQL from the 10053 trace.

Notice how the query blocks q1block and q2block still exist – that’s why the no_merge() hints can survive the transformation. Notice, though, that the transformation engine has introduced a layer of inline views inside q1block and q2block, which is why the leading(@q1block t2) and use_nl(@q1block t1) hints are no longer valid: they reference objects which are not in q1block. To get his hints to work at the global level, Tony would have to change the last two hints to reference sel$3 rather than q1block.
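Under that correction the last two hints would switch their target from q1block to the inner block sel$3, where T1 and T2 actually live after transformation. A sketch of the repaired hints (assuming the block name sel$3 that appeared in the unparsed SQL above – Tony’s original statement isn’t reproduced here, so the exact spelling is an assumption):

```sql
/*+
	leading(@sel$3 t2)
	use_nl(@sel$3 t1)
*/
```

One caveat worth noting: sel$3 is a system-generated query block name, so it could change if the query text (or Oracle’s transformation) changes – another reason to be nervous about hinting heavily-transformed ANSI SQL.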

So, next time you write a complicated piece of ANSI SQL, make sure you think carefully about what you’re going to have to do if you subsequently have to add hints to force a particular execution plan. (And bear in mind that one day the transformation engine might be modified to transform the query differently.)

[Further reading on "ignoring hints"]
