Indexing

Jonathan Lewis's picture

Clustering_Factor

Being a very reserved British type of character I’m not really one to make a big fuss about advertising myself, which is why it’s taken me five years to realise that I ought to make it easy for people to find the free download of Chapter 5 (Clustering Factor) of Cost Based Oracle Fundamentals.

Apress changes the relevant URL from time to time, and I’ve just discovered that they’ve now bundled the pdf file of the chapter into this zip file.

The thing that prompted me to post this special note was that some time ago Mohamed Houri translated the chapter into French as a gesture of appreciation for the fact that I had written the book, and Apress has given me permission to post the translation, which is this pdf file.

While I’m on the topic of French translations I’ll just add a temporary note to point people to the listing of articles translated into French by Franck Pachot, where I’ve just added seven new items, five of mine and two from Greg Rahn.


ASSM wreck

Yesterday I introduced a little framework I use to avoid the traps inherent in writing PL/SQL loops when modelling a session that does lots of simple calls to the database. I decided to publish the framework because I had recently come across an example where a series of SQL statements gives a very different result from a single PL/SQL block.

The model starts with a simple data set – which in this case is created in a tablespace using ASSM (automatic segment space management), an 8KB block size and 1MB uniform extents (in a locally managed tablespace).


create table t1
tablespace test_8k_assm
as
select
	trunc((rownum-1)/100)	n1,
	lpad('x',40)		v1,
	rpad('x',100)		padding
from
	dual
connect by
	rownum <= 20000
;

create index t1_i1 on t1(n1, v1)
tablespace test_8k_assm
;

validate index t1_i1;
execute print_table('select * from index_stats');

You can see that the n1 column is defined to have 100 rows for each of 200 different values, and that each set of one hundred rows is stored (at least initially) in a very small cluster of blocks.
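If you want to confirm the shape of the data after loading it, a quick aggregate (a sanity check I’ve added here, not part of the original test) shows the key distribution:

```sql
select
	n1, count(*) row_count
from
	t1
group by
	n1
order by
	n1
;

-- Expect 200 distinct values of n1 (0 to 199), each with row_count = 100 -
-- which agrees with the DISTINCT_KEYS and ROWS_PER_KEY figures reported
-- by index_stats after the validate.
```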

With the data set in place I am now going to pick one of the two hundred key values at random, delete its set of one hundred rows, re-insert them, and commit; and I’m going to repeat that process 1,000 times.

declare
	rand	number(3);
begin
	for i in 1..1000 loop

		rand := trunc(dbms_random.value(0,200));

		delete from t1
		where n1 = rand
		;

		insert into t1
		select
			rand,
			lpad('x',40),
			rpad('x',100)
		from
			dual
		connect by
			rownum <= 100
		;

		commit;

	end loop;
end;
/

validate index t1_i1;
execute print_table('select * from index_stats');

You might think that this piece of code is a little strange – but it is a model of some processing that I’ve recently seen on a client site, and it has crossed my mind that it might appear in a number of systems hidden underneath the covers of dbms_job. So what does it do to the index ?

Given the delay that usually appears between the time an index entry is marked as deleted and the time that the space can be reused, and given the way I’ve engineered my data so that the space needed for the 200 rows (100 marked deleted plus 100 newly inserted) for each key value is little more than a block (an important feature of this case), I wouldn’t be too surprised if the index had stabilised at nearly twice its original size. But that’s not what happened to my example running under ASSM. Here are the “before” and “after” results from my test:


                       Before         After
                    ---------     ---------
LF_ROWS                20,000        70,327
LF_BLKS                   156           811
LF_ROWS_LEN         1,109,800     3,877,785
BR_ROWS                   155           810
BR_BLKS                     3            10
BR_ROWS_LEN             8,903        45,732
DEL_LF_ROWS                 0        50,327
DEL_LF_ROWS_LEN             0     2,767,985
DISTINCT_KEYS             200           190
MOST_REPEATED_KEY         100         1,685
BTREE_SPACE         1,272,096     6,568,320
USED_SPACE          1,118,703     3,923,517
PCT_USED                   88            60
ROWS_PER_KEY              100           370
BLKS_GETS_PER_ACCESS       54           189

It’s a small disaster – our index has grown in size by a factor of about five, and we have more deleted rows than “real” rows. (Note, by the way, that the awfulness of the index is NOT really indicated by the PCT_USED figure – one which is often suggested as an indicator of the state of an index).

Unfortunately this is the type of problem that doesn’t surprise me when using ASSM; it’s supposed to help with highly concurrent OLTP activity (typified by a large number of very small transactions) but runs into problems updating free space bitmaps whenever you get into “batch-like” activity.
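For reference, the difference between the two space-management strategies is fixed at tablespace creation time. Here’s a sketch (assuming Oracle-managed files so the datafile name can be omitted; the tablespace names are just for illustration) of the two variants:

```sql
-- ASSM: free space tracked in bitmap blocks
create tablespace test_8k_assm
	datafile size 100m
	extent management local uniform size 1m
	segment space management auto
;

-- Freelist management: free space tracked on segment freelists
create tablespace test_8k_free
	datafile size 100m
	extent management local uniform size 1m
	segment space management manual
;
```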

However, there is a special consideration in play here – I’ve run the entire operation as a single pl/sql loop. Would the same problem appear if I ran each delete/insert cycle as a completely independent SQL script using the “start_1000.sql” script from my previous note ?

To test the effect of running 1,000 separate tasks, rather than executing a single pl/sql loop, I wrote the following code into the start_1.sql script that I described in the article before running start_1000.sql:


declare
	rand	number(3);
begin

	rand := trunc(dbms_random.value(0,200));

	delete from t1
	where n1 = rand
	;

	insert into t1
	select
		rand,
		lpad('x',40),
		rpad('x',100)
	from
		dual
	connect by
		rownum <= 100
	;

	commit;

end;
/

The impact was dramatically different. (Still very wasteful, but quite a lot closer to the scale of the results that you might expect from freelist management).


                       Before         After
                    ---------     ---------
LF_ROWS                20,000        39,571
LF_BLKS                   156           479
LF_ROWS_LEN         1,109,800     2,196,047
BR_ROWS                   155           478
BR_BLKS                     3             6
BR_ROWS_LEN             8,903        26,654
DEL_LF_ROWS                 0        19,571
DEL_LF_ROWS_LEN             0     1,086,247
DISTINCT_KEYS             200           199
MOST_REPEATED_KEY         100           422
BTREE_SPACE         1,272,096     3,880,192
USED_SPACE          1,118,703     2,222,701
PCT_USED                   88            58
ROWS_PER_KEY              100           199
BLKS_GETS_PER_ACCESS       54           102

I haven’t yet investigated why the pl/sql loop should have produced such a damaging effect – although I suspect that it might be a side effect of the pinning of bitmap blocks (amongst others, of course) that takes place within a single database call. It’s possible that the repeated database calls from SQL*Plus keep “rediscovering” bitmap blocks that show free space while the pinning effects stop the pl/sql from “going back” to bitmap blocks that have recently acquired free space.

Interestingly the impact of using ASSM was dramatically reduced if one object used freelists and the other used ASSM – and with my specific example the combination of a freelist table with an ASSM index even did better than the expected 50% usage from the “traditional” option of using freelists for both the table and index.

Note – the purpose of this note is NOT to suggest that you should avoid using ASSM in general; but if you can identify code in your system that is doing something similar to the model then it’s worth checking the related indexes (see my index efficiency note) to see if any of them are displaying the same problem as this test case. If they are you may want to do one of two things: think about a schedule for coalescing, or even rebuilding, problem indexes on a regular basis, or see if you can move the table, index, or both, into a tablespace using freelist management.
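Both remedial options translate into very simple commands – a sketch only, with t1 and t1_i1 standing in for the problem objects and the freelist tablespace name hypothetical:

```sql
-- Option 1: tidy up (or rebuild) the problem index on a regular schedule
alter index t1_i1 coalesce;
-- or, more aggressively:
-- alter index t1_i1 rebuild online;

-- Option 2: move the objects into a tablespace using freelist management
alter table t1 move tablespace test_8k_free;
-- moving the table invalidates its indexes, so a rebuild is needed anyway:
alter index t1_i1 rebuild tablespace test_8k_free;
```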


Index Rebuilds

A couple of days ago I found several referrals coming in from a question about indexing on the Russian Oracle Forum. Reading the thread I found a pointer to a comment I’d written for the Oracle-L list server a couple of years ago about Advanced Queueing and why you might find that it was necessary to rebuild the IOTs (index organized tables) that support AQ.

The queue tables are, of course, a perfect example of what I call the “FIFO” index so it’s not a surprise that they might need special consideration. Rather than rewrite the whole note I’ll just link to it from here. (One of the notes in the rest of the Oracle-L thread also points to MOS document 271855.1 which describes the whys and hows of rebuilding AQ tables.)


Prefixed

From time to time the question about whether local indexes on partitioned tables should be prefixed or non-prefixed appears on the Oracle forums and mailing lists.

It’s possible that I’m wrong – although no-one has come up with a counter-example to the statement I keep repeating – but the whole prefixed/non-prefixed thing for local indexes dates back to a limitation in the optimizer somewhere in the 8.0 time line where Oracle couldn’t do partition elimination on indexes properly but the overhead of the error it made could be dramatically reduced (in most cases) by sticking the partition key at the start of the index.

The guidelines for local indexes are the same as they would be for an index on a non-partitioned heap table – the partitioning column(s) are just columns in the table that might be worth including in the index, and the order of the index columns is dictated by the important queries that have to access the table.
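To make that concrete, here’s a sketch (table, column and index names invented for illustration) of a local index that a “prefixed-only” rule would forbid, but which simply follows the normal column-ordering argument:

```sql
create table orders (
	order_date	date,
	customer_id	number,
	status		varchar2(10),
	padding		varchar2(100)
)
partition by range (order_date) (
	partition p2011 values less than (to_date('01-Jan-2012','dd-Mon-yyyy')),
	partition p_max values less than (maxvalue)
);

-- A "non-prefixed" local index: the partition key is not the leading
-- column, but if the important queries drive off customer_id this is
-- the right column order regardless of the partitioning.
create index ord_cust_idx on orders(customer_id, order_date) local;
```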

For further comments, there’s a note I wrote (which I’ve just been reminded of) on the OTN database forum that adds a little detail to this argument.


Index Rebuilds

There are many suggestions floating around the internet about identifying which Oracle indexes to rebuild. One of these involves running the validate (or analyze index validate) command on an index and checking the resulting figures in view index_stats to determine whether or not del_lf_rows exceeds some percentage (usually 20%) of lf_rows and rebuilding the index if so. Since this suggestion came up, yet again, in a recent OTN thread about Oracle index rebuilding I thought I’d write up a quick note highlighting one of the obvious flaws in the argument.

I’m not going to bother pointing out the threat inherent in the table-locking when you use the “validate index” command (or the “analyze index validate” equivalent) but I think it’s worth making some comments about the misunderstanding built into this policy. So let’s start by building some data, creating an index on it, then deleting 90% of the data:


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	1000000 + rownum	id,
	lpad(rownum,10,'0')	small_vc
from
	generator	v1,
	generator	v2
where
	rownum <= 40000
;

create index t1_i1 on t1(id);

validate index t1_i1;
select
	100 * del_lf_rows/lf_rows deletion_percent
from
	index_stats
;

delete from t1
where
	mod(id,100) >= 10
;

commit;

validate index t1_i1;
select
	100 * del_lf_rows/lf_rows deletion_percent
from
	index_stats
;

I haven’t bothered to collect statistics in this code as I’m not interested in execution plans, only in the amount of data deleted and what this does to the physical structure of the index. Here’s the output of my script starting from the moment just after I’ve created the index:


Index analyzed.

DELETION_PERCENT
----------------
               0

1 row selected.

36000 rows deleted.

Commit complete.

Index analyzed.

DELETION_PERCENT
----------------
              90

1 row selected.

According to the general advice, this index is now in need of a rebuild since del_lf_rows is far more than 20% of lf_rows – but before we rebuild the index let’s delete a little more data.


delete from t1
where
	mod(id,100) = 0
;

commit;

validate index t1_i1;
select
	100 * del_lf_rows/lf_rows deletion_percent
from
	index_stats
;

My first delete statement got rid of 90% of the data leaving the 4,000 rows where mod(id,100) was between zero and nine. So my second delete has eliminated 10% of the remaining 4,000 rows. Let’s see what we get when we validate the index:


400 rows deleted.

Commit complete.

Index analyzed.

DELETION_PERCENT
----------------
              10

1 row selected.

How wonderful – by deleting a few more rows we’ve got to a state where we don’t need to rebuild the index after all !

Warning

This note is not intended to tell you when you should, or should not, rebuild an index. It is simply trying to highlight the fact that anyone who thinks that a figure exceeding 20% for del_lf_rows / lf_rows automatically means an index needs rebuilding does not have a proper understanding of how indexes work, and is basing their assumption on an over-simplistic model. In this case the error in the model is that it allows you to miss indexes which might actually benefit from some action.

The problem is based on a fundamental misunderstanding, which is this: if you believe that an index entry reserves a spot in an index and leaves a little hole that can never be reused when it is deleted (unless you re-insert exactly the same value) then inevitably you will think that the del_lf_rows figure is some sort of measure of actual space that could be reclaimed.

But, as the Oracle myth busters like Richard Foote have been saying for years,  that’s not how Oracle’s B-tree indexes work. When you delete an index entry, Oracle marks it as deleted but leaves it in place. When you commit your transaction Oracle does nothing to the index entry – but other processes now know that the entry can be wiped from the block allowing the space to be re-used.

This is what del_lf_rows is about – it’s the number of rows that are marked as deleted by transactions that have committed; and since the validate command can only run if it has locked the table in exclusive mode, any index entries marked for deletion will inevitably be committed deletes. So after I had deleted (and committed) 36,000 rows there were 36,000 entries in the index marked as deleted and committed; when I deleted a few more entries my second transaction wiped the deleted entries from any leaf blocks it visited, tidying the blocks (with a “heap block compress”) before marking a few more rows as deleted.

The upshot of this is that many systems (especially OLTP systems) will see del_lf_rows as a fairly small fraction of lf_rows because most systems tend to do little updates scattered randomly across lots of leaf blocks – constantly wiping out the rows marked as deleted by earlier transactions.

It’s really only in the case where a single large delete has taken place in the recent past that you’re likely to see a lot of deleted rows still in the index when you validate it and – as most people are now aware – a process that is supposed to do a very large delete is a process that should be considered in the design phase as a candidate for special treatment such as dropping/disabling some indexes before the delete then rebuilding afterwards. It won’t be a process where you will have to validate the index to decide roughly how much data you’ve deleted, and where in the index that data was, it’s a process where you’ll know what’s going to happen before it happens.

[Further reading on rebuilding indexes]


Index join – 4

In a recent note I wrote about index joins I made a passing comment about limitations in the optimizer’s available strategies that might make you choose to write your code to emulate an index join through explicit SQL references.

Here are two similar SQL statements (with execution plans) that demonstrate the initial problem – the first is just a restatement of the basic example I supplied in the first article:

create table indjoin
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rpad('x',500) padding
from all_objects where rownum <= 3000
;

-- collect stats, compute, no histograms

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);

select
	val1, val2
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

---------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     3 |    24 |    24 |
|*  1 |  VIEW                  | index$_join$_001 |     3 |    24 |    24 |
|*  2 |   HASH JOIN            |                  |       |       |       |
|*  3 |    INDEX FAST FULL SCAN| IJ_V1            |     3 |    24 |    11 |
|*  4 |    INDEX FAST FULL SCAN| IJ_V2            |     3 |    24 |    11 |
---------------------------------------------------------------------------

select
	val1, val2, rowid
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    60 |    17 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| INDJOIN |     3 |    60 |    17 |
|*  2 |   INDEX FULL SCAN           | IJ_V1   |   102 |       |     9 |
-----------------------------------------------------------------------

When we include the rowid in the query the optimizer stops using the index join – and it won’t even use the mechanism if we hint it. Apparently, for the purposes of analysing the query, Oracle doesn’t recognise the rowid as a column in the table and this automatically precludes the possibility of using the index join as the access method. So we have to use the manual rewrites I introduced in an earlier article.

You might wonder why this matters – but consider a case where a “perfect” index doesn’t exist for the following query:

select
	padding
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

The only access path available to the optimizer at this point is a full tablescan – but what if the two indexes are very small compared to the table; wouldn’t it be a good idea to use an index hash join between the two indexes to get a list of rowids and visit the table only for those rows? Unfortunately this isn’t a path the optimizer can derive – so we might try something like:


select
	t.padding
from
	(
	select
		/*+
			index_join(ij ij_v1 ij_v2)
			no_merge
		*/
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	and	val2 between 50 and 150
	)	v1,
	indjoin	t
where
	t.rowid = v1.rowid
;

But, as we’ve just seen, you can’t do an index join if you select the rowid, so this code won’t follow the strategy we want. (In fact, when I tried it, there was something distinctly bug-like about the plan – but I won’t go into that now). But we can do the following:


select
	t.padding
from
	(
	select
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	)	v1,
	(
	select
		rowid
	from
		indjoin		ij
	where
		val2 between 50 and 150
	)	v2,
	indjoin	t
where
	v2.rowid = v1.rowid
and	t.rowid = v2.rowid
;

-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |  1632 |    10 |
|   1 |  NESTED LOOPS               |         |     3 |  1632 |    10 |
|*  2 |   HASH JOIN                 |         |     3 |    96 |     7 |
|*  3 |    INDEX FAST FULL SCAN     | IJ_V1   |   102 |  1632 |     3 |
|*  4 |    INDEX FAST FULL SCAN     | IJ_V2   |   102 |  1632 |     3 |
|   5 |   TABLE ACCESS BY USER ROWID| INDJOIN |     1 |   512 |     1 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V2".ROWID="V1".ROWID)
   3 - filter("VAL1">=100 AND "VAL1"<=200)
   4 - filter("VAL2">=50 AND "VAL2"<=150)

It’s amazing what you can make the optimizer do (even without hinting) if you think about the mechanics underneath the basic operations.

[Further reading on Index Joins]


FBI oddities

Function-based indexes are wonderful things – but they don’t always work exactly as expected. Here’s an example of one such anomaly.

Imagine you have some type of “orders” table where most orders are in a “finished with” state, and you have a requirement to access the small number of orders in the “new” state. Here’s a sample data set to emulate this type of data requirement (created in 11.1.0.6, with 1MB uniform extents, freelist management and 8KB blocks).


create table t1 (
	state		varchar2(10),
	n1		number,
	v1		varchar2(10),
	padding	varchar2(100)
);

insert into t1
select
	decode(mod(rownum,100),0,'OPEN','CLOSED'),
	rownum,
	lpad(rownum,10,0),
	rpad('x',100,'x')
from
	all_objects
where
	rownum <= 5000
;

I’ve generated this data set so that every 100th row is marked as ‘OPEN’ and all the rest are marked as ‘CLOSED’ – in a real system the percentage of ‘OPEN’ orders would probably be much smaller so we could easily decide to have an index on state to give us an efficient access path to the open orders. But such an index would be very large, because it would also hold entries for the huge number of closed orders; we’d also have to create a histogram on the column (possibly by writing a simple script) so that Oracle could recognise the skewed data distribution.

If we wanted to be clever, though, and if we were able to edit the SQL that addressed this table, we could minimise the size of the index and avoid the need for a histogram by creating a function-based index that held values just for the rows where the state was ‘OPEN’. For example, I could create an index which held the order number only for those rows where the state was open; and there are several ways I could do this, for example:


create index t1_f1 on t1(decode(state,'CLOSED', to_number(null), n1 ));
create index t1_f2 on t1(to_number(decode(state,'CLOSED', null, n1 )));
create index t1_f3 on t1(case when state = 'CLOSED' then to_number(null) else n1 end);
create index t1_f4 on t1(to_number(case when state = 'CLOSED' then null else n1 end));
create index t1_f5 on t1(decode(state,'CLOSED', null, n1 ));
create index t1_f6 on t1(decode(state,'CLOSED', cast(null as number), n1 ));

If you’re wondering why I’ve included a “to_number()” in the first index, remember that NULL is implicitly assumed to be a NULL of type character by Oracle – so I’ve got to do something to tell Oracle that this NULL is supposed to be a numeric NULL. Index t1_f5 is the same as t1_f1, but without the to_number(), and index t1_f6 is the same again but using the more modern cast() to supply the conversion.

You’ll note that I haven’t yet shown any attempt to collect statistics. If we create the indexes AFTER we’ve collected stats on the table we’ll have to collect some extra table stats once the indexes exist because each function-based index will have added a new (hidden) column to the table and, although the “create index” commands will have created statistics for the indexes (from 10g onwards), we will not yet have stats on these hidden columns. So I’m going to wait until after creating the indexes to generate the stats:


begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=>'T1',
		estimate_percent	=> 100,
		method_opt		=> 'for all columns size 1'
	);
end;
/

The question is now this – given the definitions of the indexes above, which of the following six queries – each one designed to be an exact match for one of the index definitions – will use “its” index? (Note that I have hinted the queries to ensure that if the optimizer is allowed to use an index it will use an index – and I’ve included the name of the relevant index as a comment at the end of each hint.)


select
	/*+ index(t1) t1_f1 */
	v1
from
	t1
where
	decode(state,'CLOSED', to_number(null), n1 ) = 100
;

select
	/*+ index(t1) t1_f2 */
	v1
from
	t1
where
	to_number(decode(state,'CLOSED', null, n1 )) = 100
;

select
	/*+ index(t1) t1_f3 */
	v1
from
	t1
where
	case when state = 'CLOSED' then to_number(null) else n1 end = 100
;

select
	/*+ index(t1) t1_f4 */
	v1
from
	t1
where
	to_number(case when state = 'CLOSED' then null else n1 end) = 100
;

select
	/*+ index(t1) t1_f5 */
	v1
from
	t1
where
	decode(state,'CLOSED', null, n1 ) = 100
;

select
	/*+ index(t1) t1_f6 */
	v1
from
	t1
where
	decode(state,'CLOSED', cast(null as number), n1 ) = 100
;

The answer depends on the version of Oracle. Under Oracle 11.1.0.6 I got the following results. First, the attempt to create t1_f5 resulted in the following Oracle error (and that’s an important clue to what has happened in another part of the test):

create index t1_f5 on t1(decode(state,'CLOSED', null, n1 ))
                         *
ERROR at line 1:
ORA-01408: such column list already indexed

The index usage was as follows:

    t1_f1		not used	(decode)
    t1_f2		not used	(decode)
    t1_f3		used		(case)
    t1_f4		used		(case)
    t1_f5		non-existent - but used t1_f1
    t1_f6		used		(cast)
    

If you want it in a sound-bite: newer technologies do better than older technologies. But why do the results look the way they do ? You can find the answer in the index definitions that have been stored in the database:


column index_name format a10		heading "Index"
column column_position format 999	heading "Posn"
column column_expression format a72	heading "Expression"

select
	index_name, column_position, column_expression
from
	user_ind_expressions
where
	table_name = 'T1'
;

Index      Posn Expression
---------- ---- ------------------------------------------------------------------------
T1_F1         1 DECODE("STATE",'CLOSED',NULL,"N1")
T1_F2         1 TO_NUMBER(DECODE("STATE",'CLOSED',NULL,TO_CHAR("N1")))
T1_F3         1 CASE "STATE" WHEN 'CLOSED' THEN NULL ELSE "N1" END
T1_F4         1 TO_NUMBER(TO_CHAR(CASE "STATE" WHEN 'CLOSED' THEN NULL ELSE "N1" END ))
T1_F6         1 DECODE("STATE",'CLOSED',CAST(NULL AS number),"N1")

Compare the stored definitions with the original definitions. Notice how the decodes and NULLs don’t work happily together.

In t1_f1 the explicit to_number() that I included has disappeared – that’s why I was unable to create index t1_f5 – its definition was identical to the modified t1_f1 definition. Then, of course, my predicate no longer matches the exact index definition.

In the t1_f2 definition, because NULL is implicitly character Oracle has added an explicit to_char() to the n1 column I supplied so that its type agrees with the NULL, thus allowing the final to_number() to work. So, again, my predicate no longer matches the index definition.

In t1_f3 and t1_f4 I didn’t include any explicit conversions, and Oracle didn’t add any implicit conversions – but if you look closely it has transformed the version of the case statement I supplied into the simpler form – and everything happened to work. (There was an earlier version of Oracle where Oracle would do this transformation for the predicate at run time but not for the index at index creation time – with the result that the “specially created” index wouldn’t work.)

Index t1_f5 was not created because my explicit definition matched Oracle’s implicit conversion of t1_f1 – and then my explicit rendition of the matching predicate allowed the optimizer to use index t1_f1.

Finally, with the cast() operator the decode() wasn’t “clever enough” to eliminate my explicit conversion, so the predicate matched the index definition and the index was used.

So the message is this – be careful how you define your function-based indexes, and check what Oracle has stored as the index definition before you commit too much effort to rewriting code to use your new index.

Footnote: Inevitably there are more questions you could ask to fill in further details here. For example, if you created a “genuine” virtual column in 11g using one of my “unusable” decode() expressions, and then indexed the virtual column, would Oracle use the index ? If I had included some cast() operators in my case expressions and corresponding predicates would Oracle still have been able to use the indexes or would I have found the index definitions and predicates were transformed differently and ceased to match ? Is the behaviour shown consistent across all popular versions of Oracle ? (the answer to that last one is No)

These questions (and others) are left as exercises for the interested reader to carry out in the privacy and quiet of their own workplaces.
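If you want a starting point for the first of those exercises, the 11g virtual column version might begin something like this (a sketch only – the column and index names are invented, and the outcome is the exercise):

```sql
-- Add a virtual column using one of the decode() expressions that was
-- transformed when used directly in a function-based index, then index
-- the virtual column and see whether the optimizer will use the index.
alter table t1 add (
	open_n1 number
		generated always as (decode(state,'CLOSED', to_number(null), n1)) virtual
);

create index t1_v1 on t1(open_n1);
```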


Index Join – 3

I’ve recently been writing about the index join mechanism and ways of emulating it. Those notes were originally inspired by an example of an index join that appeared on OTN a little while ago.

It was a plan that combined “bitmap/btree conversion” with the basic index join strategy so, with hindsight, it was an “obvious” and brilliant execution plan for a certain type of query. The query in the original posting was a simple select (with no predicates) against a huge table in a data warehouse – presumably extracting a small number of columns from a much wider row.

SELECT DISTINCT
	ECP_ITEM_MASTER_DIM.ORGANIZATION_ID,
	ECP_ITEM_MASTER_DIM.INV_MFG_PRODUCTION_LINE,
	ECP_ITEM_MASTER_DIM.INV_PRODUCT_FAMILY,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_3,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_4,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_5
FROM
	ecp.ECP_ITEM_MASTER_DIM
;

(I really hate reading SQL where the whole table name has been repeated as the alias all the way through the SQL – it makes the code so hard to read, especially when it’s all in upper case. It’s important to use aliases, of course, but 3 or 4 letters is a sensible length.)

Here’s the execution plan:


---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                              |  1799K|    42M|       | 51279   (1)| 00:10:16 |
|   1 |  HASH UNIQUE                       |                              |  1799K|    42M|   151M| 51279   (1)| 00:10:16 |
|   2 |   VIEW                             | index$_join$_001             |  1799K|    42M|       | 38227   (1)| 00:07:39 |
|*  3 |    HASH JOIN                       |                              |       |       |       |            |          |
|*  4 |     HASH JOIN                      |                              |       |       |       |            |          |
|*  5 |      HASH JOIN                     |                              |       |       |       |            |          |
|*  6 |       HASH JOIN                    |                              |       |       |       |            |          |
|*  7 |        HASH JOIN                   |                              |       |       |       |            |          |
|   8 |         BITMAP CONVERSION TO ROWIDS|                              |  1799K|    42M|       |   485   (1)| 00:00:06 |
|   9 |          BITMAP INDEX FULL SCAN    | ECP_ITEM_MASTER_DIM_IMPL_BMX |       |       |       |            |          |
|  10 |         BITMAP CONVERSION TO ROWIDS|                              |  1799K|    42M|       |   230   (0)| 00:00:03 |
|  11 |          BITMAP INDEX FULL SCAN    | ECP_ITEM_MASTER_DIM_IPF_BMX  |       |       |       |            |          |
|  12 |        BITMAP CONVERSION TO ROWIDS |                              |  1799K|    42M|       |   229   (0)| 00:00:03 |
|  13 |         BITMAP INDEX FULL SCAN     | ECP_ITEM_MASTER_DIM_IS3_BMX  |       |       |       |            |          |
|  14 |       BITMAP CONVERSION TO ROWIDS  |                              |  1799K|    42M|       |   228   (0)| 00:00:03 |
|  15 |        BITMAP INDEX FULL SCAN      | ECP_ITEM_MASTER_DIM_IS4_BMX  |       |       |       |            |          |
|  16 |      BITMAP CONVERSION TO ROWIDS   |                              |  1799K|    42M|       |   201   (0)| 00:00:03 |
|  17 |       BITMAP INDEX FULL SCAN       | ECP_ITEM_MASTER_DIM_IS5_BMX  |       |       |       |            |          |
|  18 |     BITMAP CONVERSION TO ROWIDS    |                              |  1799K|    42M|       |   207   (0)| 00:00:03 |
|  19 |      BITMAP INDEX FULL SCAN        | ECP_ITEM_MASTER_DIM_OI_BMX   |       |       |       |            |          |
---------------------------------------------------------------------------------------------------------------------------

Isn’t it brilliant! The optimizer has seen that all the required columns can be found in indexes (six of them). They happen to be bitmap indexes, so the optimizer has done a “bitmap conversion to rowid” on all six indexes one after the other, with five consecutive hash joins, carrying the column values along through each conversion and hash join.

Unfortunately the owner of this plan wasn’t happy with it, because a full tablescan turned out to be faster. Nevertheless it’s a very clever concept: the table was measured in gigabytes while the indexes were only a few megabytes each, which offered the potential for a significant saving in I/O time.

I was a little curious, though, about the final join strategy. It’s annoying that Oracle didn’t report any costs on the hash join lines themselves, because those figures could be very revealing. It’s also remarkable that the value in the Bytes column for the final view (which covers six columns of data) is the same as the Bytes value for each individual index conversion, even though the projection from each conversion is just one data column with an accompanying rowid; there’s clearly something wrong with the arithmetic.

This may explain why the optimizer has decided to run the 6-way join using only two running hash joins (rather than first setting up five hash tables in memory then passing the last rowsource through them). If you think about it, when Oracle gets to the last hash join (lines 3, 4 and 18) it has to build a hash table from the result of the previous four joins, and (in this case) that’s going to need a similar amount of memory as five in-memory hash tables. With that thought in mind I was puzzled that Oracle hadn’t simply built five in-memory hash tables and then walked the sixth rowsource through each of them in turn.

Still – it’s not my (or my client’s) problem; maybe one day I’ll need to look more closely at a similar case.

[Further reading on Index Joins]

Index Join – 2

In an earlier article introducing the index join I raised a question that came up at the first E2SN virtual conference:

    “If you hint an index hash join, is there any way of telling Oracle the order in which it should use the indexes?”

Consider the following example:

create table indjoin
nologging
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rownum	val3,
	rpad('x',500) padding
from
	all_objects
where
	rownum <= 5000
;

/*

alter table indjoin
add constraint ij_pk primary key (id)

*/

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);
create unique index ij_v3 on indjoin(id, val3);

-- gather statistics: without histograms

select
	/*+ index_join(ij) */
	count(*)
from
	indjoin	ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
and	val3 between 250 and 550
;
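For completeness, the stats-gathering step indicated by the comment might look something like the following. This is just a sketch of one reasonable call, assuming the script is run from the owning schema; the method_opt value is what keeps histograms out of the picture:

begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'INDJOIN',
		cascade		=> true,			-- gather index stats as well
		method_opt	=> 'for all columns size 1'	-- basic column stats, no histograms
	);
end;
/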

The query plan for this query is (thanks to the hint) a three-way index hash join:

-----------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |     1 |    12 |    74 |
|   1 |  SORT AGGREGATE          |                  |     1 |    12 |       |
|*  2 |   VIEW                   | index$_join$_001 |     1 |    12 |    74 |
|*  3 |    HASH JOIN             |                  |       |       |       |
|*  4 |     HASH JOIN            |                  |       |       |       |
|*  5 |      INDEX FAST FULL SCAN| IJ_V1            |     1 |    12 |    18 |
|*  6 |      INDEX FAST FULL SCAN| IJ_V2            |     1 |    12 |    18 |
|*  7 |     INDEX FAST FULL SCAN | IJ_V3            |     1 |    12 |    18 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("VAL1">=100 AND "VAL1"<=200 AND "VAL2">=50 AND
              "VAL2"<=150 AND "VAL3">=250 AND "VAL3"<=550)
   3 - access(ROWID=ROWID)
   4 - access(ROWID=ROWID)
   5 - filter("VAL1"<=200 AND "VAL1">=100)
   6 - filter("VAL2"<=150 AND "VAL2">=50)
   7 - filter("VAL3"<=550 AND "VAL3">=250)

But what if you know the data better than Oracle, and know that the join order for the three indexes should be different? There are no direct hints you can add to the code to tell Oracle the best order for the hash joins. (You might, of course, be able to make use of the cardinality() hint, or play around with the undocumented, hence unsupported, opt_estimate(), column_stats() or index_stats() hints, but I wouldn’t be keen to use such an indirect approach.)
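To show the sort of indirect approach I mean, a cardinality() hint on the table might look like this. It’s only a sketch: the hint is undocumented, the figure of 100 is an arbitrary illustrative estimate, and (this being the problem with the approach) there’s no guarantee that adjusting the estimate will steer the optimizer towards the index order you actually want:

select
	/*+ index_join(ij) cardinality(ij 100) */
	count(*)
from
	indjoin	ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
and	val3 between 250 and 550
;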

But you CAN rewrite the query to get the same mechanism working under your control. The code looks more complex – but we often have to make a trade between clarity (simplicity) and speed in critical cases, so you may find some examples where the complexity is acceptable:

select
	count(*)
from
	(
	select
		/*+ no_merge */
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	)	v1,
	(
	select
		/*+ no_merge */
		rowid
	from
		indjoin		ij
	where
		val2 between 50 and 150
	)	v2,
	(
	select
		/*+ no_merge */
		rowid
	from
		indjoin		ij
	where
		val3 between 250 and 550
	)	v3
where
	v2.rowid = v1.rowid
and	v3.rowid = v1.rowid
;

It’s another example of referencing a table twice (or three times) in a query because multiple references allow you to define a better execution path than a single reference. The execution plan we get from this query (running under 10.2.0.3) is as follows:


------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    36 |    14 |
|   1 |  SORT AGGREGATE          |       |     1 |    36 |       |
|*  2 |   HASH JOIN              |       |   102 |  3672 |    14 |
|*  3 |    HASH JOIN             |       |   102 |  2448 |     9 |
|   4 |     VIEW                 |       |   102 |  1224 |     4 |
|*  5 |      INDEX FAST FULL SCAN| IJ_V1 |   102 |  1632 |     4 |
|   6 |     VIEW                 |       |   102 |  1224 |     4 |
|*  7 |      INDEX FAST FULL SCAN| IJ_V2 |   102 |  1632 |     4 |
|   8 |    VIEW                  |       |   302 |  3624 |     4 |
|*  9 |     INDEX FAST FULL SCAN | IJ_V3 |   302 |  4832 |     4 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V3".ROWID="V1".ROWID)
   3 - access("V2".ROWID="V1".ROWID)
   5 - filter("VAL1">=100 AND "VAL1"<=200)
   7 - filter("VAL2">=50 AND "VAL2"<=150)
   9 - filter("VAL3">=250 AND "VAL3"<=550)

By creating three explicit query blocks (which I’ve ring-fenced with no_merge hints), one for each index, I’ve made Oracle extract the same three sets of data that it was using in the index hash join. I’ve then left Oracle to join the three result sets – which it has done with hash joins. Interestingly this seems to have done a little less work than the original index join – the final complex filter action doesn’t appear in the manual rewrite.

Since I’ve now got a query that seems to be “just” a three table join, I can dictate the join order, guarantee the hash joins, and dictate which rowsources should be used as build rowsources, and which as probe. For example, let’s apply the following hints:

select
	/*+
		leading (v1 v3 v2)
		use_hash(v3) no_swap_join_inputs(v3)
		use_hash(v2) swap_join_inputs(v2)
	*/
	count(*)
from
        ....

The resulting plan is as follows:

------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    36 |    14 |
|   1 |  SORT AGGREGATE          |       |     1 |    36 |       |
|*  2 |   HASH JOIN              |       |   102 |  3672 |    14 |
|   3 |    VIEW                  |       |   102 |  1224 |     4 |
|*  4 |     INDEX FAST FULL SCAN | IJ_V2 |   102 |  1632 |     4 |
|*  5 |    HASH JOIN             |       |   102 |  2448 |     9 |
|   6 |     VIEW                 |       |   102 |  1224 |     4 |
|*  7 |      INDEX FAST FULL SCAN| IJ_V1 |   102 |  1632 |     4 |
|   8 |     VIEW                 |       |   302 |  3624 |     4 |
|*  9 |      INDEX FAST FULL SCAN| IJ_V3 |   302 |  4832 |     4 |
------------------------------------------------------------------

The join order (you can check the trace file to confirm this) is: ij_v1, ij_v3, ij_v2 – but because of the swap_join_inputs(v2) hint the ij_v2 index appears first in the plan.
We build a hash table from ij_v2, then build a hash table from ij_v1, which we probe (join) with ij_v3.
We then use the result of joining ij_v1/ij_v3 to probe (join) ij_v2 – which means v2 really is the last object in the join order.

It may look complex, but all we’ve done is describe an index join in detail, and that has allowed us to specify which indexes are joined when. I’ve already pointed out that the manual version appears to be slightly more efficient than the original. It’s also more powerful, and addresses a defect in the current implementation of the index join. But that’s a topic for another blog.

[Further reading on Index Joins]

Index Join

One of the less well known access paths available to the optimizer is the “index join” also known as the “index hash join” path. It’s an access path that can be used when the optimizer decides that it doesn’t need to visit a table to supply the select list because there are indexes on the table that, between them, hold all the required columns. A simple example might look something like the following:


create table indjoin
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rpad('x',500) padding
from
	all_objects
where
	rownum <= 3000
;

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);

-- collect stats on the table and indexes

select
	ij.id
from
	indjoin		ij
where
	ij.val1 between 100 and 200
and	ij.val2 between 50 and 150
;

Note that the columns in the where clause appear in (some) indexes, and the column(s) in the select list exist in (at least) some indexes. Under these circumstances the optimizer can produce the following plan (the test script was one I wrote for 8i – but this plan comes from an 11.1 instance):


---------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     3 |    36 |    24 |
|   1 |  VIEW                  | index$_join$_001 |     3 |    36 |    24 |
|*  2 |   HASH JOIN            |                  |       |       |       |
|*  3 |    INDEX FAST FULL SCAN| IJ_V1            |     3 |    36 |    11 |
|*  4 |    INDEX FAST FULL SCAN| IJ_V2            |     3 |    36 |    11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(ROWID=ROWID)
   3 - filter("VAL1"<=200 AND "VAL1">=100)
   4 - filter("VAL2"<=150 AND "VAL2">=50)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ID"[NUMBER,22]
   2 - (#keys=1) "ID"[NUMBER,22]
   3 - ROWID[ROWID,10], "ID"[NUMBER,22]
   4 - ROWID[ROWID,10]

We do a fast full scan of the two indexes, extracting the rowid and id from index ij_v1 and just the rowid from index ij_v2. We can then get the result we want by hash joining these two rowsources on their rowid values: any time the two rowsources have a rowid in common, it’s the rowid of a row where val1 is between 100 and 200 and val2 is between 50 and 150, and the first rowsource is carrying the id, which is the thing we need to report.

There are a couple of little observations that we can make about this example.

    First, although I’ve only used two indexes in this example Oracle is not limited to just two indexes. The number of indexes that could be used is effectively unlimited.
    Second, the index_join path is strictly limited to cases where the optimizer can see that every column in the query can be found in indexes on the table.
    Third, although my example uses index fast full scans that’s not a necessary feature of the plan. Just like any other hash join, Oracle could use an index range (or full) scan to get some of the data.
    Finally, there are clearly a couple of bugs in the code.

Bugs:

If you check the Rows/Bytes columns in the plan you’ll see that the predicted number of rows selected is the same for both indexes (lines 3 and 4), but we extract the rowid and the id from the first index (see the projection detail for line 3), so the total data volume expected from line 3 is slightly larger than the total data volume from line 4, where we extract only the rowid. Theoretically, therefore, the optimizer has used the indexes in the wrong order: the one supplying the smaller volume of data should have been used as the first (build) rowsource.

More significantly, though, a quick check of the code that generates the data tells you that each index will supply 101 rows to the hash join – and you can even show that for other query execution plans the optimizer will calculate this cardinality (nearly) correctly. In the case of the index join the optimizer seems to have lost the correct individual cardinalities and has decided to use the size of the final result set as the cardinality of the two driving index scans.

There’s more, of course. One of the strangest things about the index join is that if your select list includes the table’s rowid, the optimizer doesn’t consider that pseudocolumn to be a column that can be found in the indexes. So even though the projection information of the plan shows the rowids being carried through the hash join, Oracle won’t use an index join for a query returning the rowid!
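In terms of the data set above, this means a query of the following shape will not take the index join path, even though both predicate columns, the id, and the rowids themselves are all available from the two indexes:

select
	ij.rowid, ij.id
from
	indjoin		ij
where
	ij.val1 between 100 and 200
and	ij.val2 between 50 and 150
;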

Footnote: The reason I’ve written this brief introduction to the index join is because an interesting question came up at the first E2SN virtual conference.

“If you hint an index hash join, is there any way of telling Oracle the order in which it should use the indexes?”

The answer is no – but there are ways of creating code that will do what you want, and that will be the topic of my next blog.

[Further reading on Index Joins]
