Execution plans

Jonathan Lewis's picture

Index Join

One of the less well known access paths available to the optimizer is the “index join” also known as the “index hash join” path. It’s an access path that can be used when the optimizer decides that it doesn’t need to visit a table to supply the select list because there are indexes on the table that, between them, hold all the required columns. A simple example might look something like the following:


create table indjoin
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rpad('x',500) padding
from
	all_objects
where
	rownum <= 3000
;

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);

-- collect stats on the table and indexes

select
	ij.id
from
	indjoin		ij
where
	ij.val1 between 100 and 200
and	ij.val2 between 50 and 150
;

Note that the columns in the where clause appear in (some) indexes, and the column(s) in the select list exist in (at least) some indexes. Under these circumstances the optimizer can produce the following plan (the test script was one I wrote for 8i – but this plan comes from an 11.1 instance):


---------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     3 |    36 |    24 |
|   1 |  VIEW                  | index$_join$_001 |     3 |    36 |    24 |
|*  2 |   HASH JOIN            |                  |       |       |       |
|*  3 |    INDEX FAST FULL SCAN| IJ_V1            |     3 |    36 |    11 |
|*  4 |    INDEX FAST FULL SCAN| IJ_V2            |     3 |    36 |    11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(ROWID=ROWID)
   3 - filter("VAL1"<=200 AND "VAL1">=100)
   4 - filter("VAL2"<=150 AND "VAL2">=50)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ID"[NUMBER,22]
   2 - (#keys=1) "ID"[NUMBER,22]
   3 - ROWID[ROWID,10], "ID"[NUMBER,22]
   4 - ROWID[ROWID,10]

We do a fast full scan of the two indexes extracting the rowid and id from index ij_v1 and just the rowid from index ij_v2. We can then get the result we want by doing a hash join between these two result sets on the rowid values because any time the two rowsources have a rowid in common, it’s a rowid for a row where val1 is between 100 and 200, and val2 is between 50 and 150 and the first rowsource is carrying the id - which is the thing we need to report.

There are a couple of little observations that we can make about this example.

    First, although I’ve only used two indexes in this example Oracle is not limited to just two indexes. The number of indexes that could be used is effectively unlimited.
    Second, the index_join path is strictly limited to cases where the optimizer can see that every column in the query can be found in indexes on the table.
    Third, although my example uses index fast full scans that’s not a necessary feature of the plan. Just like any other hash join, Oracle could use an index range (or full) scan to get some of the data.
    Finally, there are clearly a couple of bugs in the code.

Bugs:

If you check the rows/bytes columns in the plan you’ll see that the predicted number of rows selected is the same for both indexes (lines 3 and 4) – but we extract the rowid and the id from the first index (projection detail for line 3), so the total data volume expected from line 3 is slightly larger than the total data volume from line 4 where we extract only the rowid; theoretically, therefore, the optimizer has used the tables (indexes) in the wrong order – the one supplying the smaller volume of data should have been used as the first (build) rowsource.

More significantly, though, a quick check of the code that generates the data tells you that each index will supply 101 rows to the hash join – and you can even show that for other query execution plans the optimizer will calculate this cardinality (nearly) correctly. In the case of the index join the optimizer seems to have lost the correct individual cardinalities and has decided to use the size of the final result set as the cardinality of the two driving index scans.

There’s more, of course – one of the strangest things about the index join is that if your select list includes the table’s rowid, the optimizer doesn’t consider that to be a column in the index. So even though the predicate section of the plan shows the rowids being projected in the hash join, Oracle won’t use an index join for a query returning the rowid !

Footnote: The reason I’ve written this brief introduction to the index join is because an interesting question came up at the first E2SN virtual conference.

“If you hint an index hash join, is there any way of telling Oracle the order in which it should use the indexes?”

The answer is no – but there are ways of creating code that will do what you want, and that will be the topic of my next blog.

[Further reading on Index Joins]

Jonathan Lewis's picture

Quiz Night

Apart from the fact that the “Rows” figure for the FILTER operation at line 6 is blank, what’s the obvious error in this extract from an execution plan:

-------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  |
-------------------------------------------------------------------------
        ...
|   5 |      NESTED LOOPS                    |                  |  3864 |
|   6 |       FILTER                         |                  |       |
|   7 |        HASH JOIN OUTER               |                  |  3864 |
|   8 |         HASH JOIN OUTER              |                  |   282K|
|   9 |          TABLE ACCESS BY INDEX ROWID | PRODUCT          |   282K|
|  10 |           INDEX RANGE SCAN           | PRD_SUPP_I1      |   282K|
|  11 |          VIEW                        |                  |  2293K|
|  12 |           HASH GROUP BY              |                  |  2293K|
|  13 |            PARTITION LIST SINGLE     |                  |  5790K|
|  14 |             TABLE ACCESS FULL        | PRODUCT_PRICING  |  5790K|
|  15 |         VIEW                         |                  |  2307K|
|  16 |          HASH GROUP BY               |                  |  2307K|
|  17 |           PARTITION LIST SINGLE      |                  |  5703K|
|  18 |            TABLE ACCESS FULL         | PRODUCT_PRICING  |  5703K|
        ...
-------------------------------------------------------------------------

Update 21/Nov/2010:
Once again I am reminded of two things – it’s important to be precise in your use of language if you want people to understand the question; and you can see a lot if you look carefully.

If you start to think about the activity that the plan represents, and the SQL that might have produced it, there are some ideas you might get about re-writing the query to be more efficient – but the point I was trying to make is that there is clearly an error in the content that the optimizer is displaying. The error suggests either that the optimizer has done the wrong arithmetic, or that the output is not a correct copy of the results produced by the optimizer.

The answer I was expecting comes from line 7. Stripping the error back to the bare minimum we see this:

--------------------------------------------------------------------------
| Id  | Operation                             | Name             | Rows  |
--------------------------------------------------------------------------
        ...
|   7 |        HASH JOIN OUTER                |                  |  3864 |
|   8 |         rowsource 1 (HASH JOIN OUTER) |                  |   282K|
|  15 |         rowsource 2 (VIEW)            |                  |  2307K|
        ...
--------------------------------------------------------------------------

As Milo points out in comment 3, In an outer join the result set cannot have fewer rows than the “preserved” rowsource (which, in this case, is the result set from line 8). I mentioned the fact that the “Rows” figure for the FILTER operation at line 6 was blank – it’s just possible that the optimizer has overwritten the figure in line 7 with the figure that should have been in line 6; there are cases where a FILTER operation and the operation you would normally think of as its first child are combined, so it’s possible that a little storage glitch has appeared in some cases where the combination rule doesn’t apply.

Someone did mention the FILTER operation and pointed out that it wasn’t filtering any data. The commonest forms of FILTER operation essentially check that some predicate it true for each row in their first child rowsource – and it is possible for someone to write code that has a filter that doesn’t eliminate any rows. In fact, though, this plan is probably saying: “line 7 will produce 282K rows, and the filter at line 6 will reduce that to 3,684.” (There’s also a comment about a “group by” not reducing the size of the rowsource – the comment was caused by a parallax error, but it is possible, of course, for Oracle to decide that a “group by” is going to produce an output with just as many rows as the input.)

Sean Molloy’s opening comment asks how you can get two different estimates from the same tablescan — and follows up with one answer which is that since we are looking at PARTITION LIST SINGLE the two tablescans could be of different partitions. But it’s only a puzzle if there were no predicates on the tablescans and, as Pavol points out in comment 7, there are no “star flags” in the ID column to suggest the presence of any predicates – but there are no stars anywhere – and there have to be some predicates in the plan, since you can’t do a hash join, index range scan, or filter without a predicate. As Timur points out – you don’t get the predicate section in the report from dbms_xplan.display_awr(), so you don’t get the stars.

Speaking of missing information, Dave Costa in comment 4 suggests that the user made an error in choosing which bits of the plan to copy. I did the choosing – how could it possible by wrong ! Given the number of times I’ve said “you must include the predicate section”, why isn’t it there ? (answer: it’s display_awr). In fact lines 5 and 6 were redundant as far as the “obvious problem” was concerned – but I thought that the blank and reappearance of the same cardinality might be a helpful visual clue.

The SQL:

Several people have commented on the rationale for code that does two outer join aggregations on the same table. It does look a little unusual, but it’s not possible to come to any conclusion about whether it’s a good thing or a bad thing without knowing the data and the intent of the SQL. For example the intent could be something like:


select
        product_name, min(offer_price), max(offer_price) ...
from

(You can assume that in the actual code, the min() and max() would be hidden inside a couple of inline views)
In this case code which visits the product_pricing table twice might be a very good idea – because there is a special “index (min/max) range scan” optimisation that works very well if (a) have the right indexes in place (b) use a nested loop and (c) only ask for min, or max, but not both. Perhaps our problem is that the optimizer is doing a hash join when it should be doing a nested loop.

Another possibility is that we have something like:


select
        product_name, min(gbp_offer_price), min(usd_offer_price) ...
from

(Again you can assume the min() and max() would be hidden inside inline view, and the different column names would be derived names rather than being from two separate columns in a table).
Notice that the plan shows list partitioning – maybe we have one currency per partition, and we’ve written the query to maximise the benefits of partition elimination (as well as leaving the data nicely normalised, thus maximising efficiency of maintenance).

Bottom line on the SQL – in this case I wasn’t asking people to guess what was wrong with the code; but it’s very interesting to see how many different topics of thought can come out from a starting point of a few lines extracted from a plan.

Greg Rahn's picture

Reading Parallel Execution Plans With Bloom Pruning And Composite Partitioning

You’ve probably heard sayings like “sometimes things aren’t always what they seem” and “people lie”. Well, sometimes execution plans lie. It’s not really by intent, but it is sometimes difficult (or impossible) to represent everything in a query execution tree in nice tabular format like dbms_xplan gives. One of the optimizations that was introduced back in 10gR2 was the use of bloom filters. Bloom filters can be used in two ways: 1) for filtering or 2) for partition pruning (bloom pruning) starting with 11g. Frequently the data models used in data warehousing are dimensional models (star or snowflake) and most Oracle warehouses use simple range (or interval) partitioning on the fact table date key column as that is the filter that yields the largest I/O reduction from partition pruning (most queries in a time series star schema include a time window, right!). As a result, it is imperative that the join between the date dimension and the fact table results in partition pruning. Let’s consider a basic two table join between a date dimension and a fact table. For these examples I’m using STORE_SALES and DATE_DIM which are TPC-DS tables (I frequently use TPC-DS for experiments as it uses a [...]

Jonathan Lewis's picture

Manual Optimisation

Here’s an example of “creative SQL” that I wrote in response to a question on OTN about combining data from two indexes to optimise access to a table. It demonstrates the principle that you can treat an index as a special case of a table – allowing you to make a query go faster by referencing the same table more times.

Unfortunately you shouldn’t use this particular example in a production system because it relies on the data appearing in the right order without having an “order by” clause. This type of thing makes me really keen to have a hint that says something like: /*+ qb_name(my_driver) assume_ordered(@my_driver) */ so that you could tell the optimizer that it can assume that the rowset from a given query block will appear in the order of the final “order by” clause.

Jonathan Lewis's picture

CBO Surprise

Well, it surprised me!

I’ve said for a very long time that in principle, ignoring bugs and restrictions, the optimizer will always choose the lowest cost option during its search for an execution path. It turns out that this isn’t true. In a comment attached to a note I had written about a possible bug relating to function-based indexes I was told that there are cases where the optimizer follows a rule that allows it to ignore the lowest cost path if it is derived from a range-based predicate involving unpeekable bind variables.

The trouble is, any statement made about “bind variables” may also apply in any circumstances where the optimizer sees: “unknown value”. Here’s a simplified example that I find a little worrying (running on 11.1):

Jonathan Lewis's picture

Not NULL

Here’s a little detail that I’ve known for years – but keep forgetting until I embarrass myself by rediscovering it (usually in front of a client). I’ll start with a demonstration of a useful little feature of mandatory columns:


drop table t1 purge;

create table t1
as
select
	*
from
	all_objects
where
	rownum <= 10000
;

execute dbms_stats.gather_table_stats(user,'t1')

create index t1_i1 on t1(object_name);

set autotrace traceonly explain

select count(*) from t1;

/*

--------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)| Time
--------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    13   (0)| 00:00:0
|   1 |  SORT AGGREGATE       |       |     1 |            |
|   2 |   INDEX FAST FULL SCAN| T1_I1 | 10000 |    13   (0)| 00:00:0
--------------------------------------------------------------------

*/

Oracle can use the index on column object_name to count the number of rows in the table because the column has been declared NOT NULL, so every row in the table also has to appear in the index. Let’s just demonstrate that by changing the column definition:


alter table t1 modify object_name null;
select count(*) from t1;

/*

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    40   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |    40   (0)| 00:00:01 |
-------------------------------------------------------------------

*/

Now let’s make the column mandatory again – by adding a constraint:

Jonathan Lewis's picture

Joins – MJ

The final join mechanism in my “all joins are nested loop joins” argument is the Merge Join – a join mechanism that depends on both its row sources being pre-sorted on the join columns. The note on hash joins pointed out that a “traditional” nested loop join may result in repeated visits to data blocks [...]

Jonathan Lewis's picture

Joins – HJ

In the second note on my thesis that “all joins are nested loop joins with different startup costs” I want to look at hash joins, and I’ll start by going back to the execution plan I posted on “Joins – NLJ”. --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| [...]

Jonathan Lewis's picture

Joins – NLJ

This is part one of my thesis that “all joins are nested loop joins – it’s just the startup overheads that vary”; there will be a note on “Joins – HJ” and “Joins – MJ” to follow. In some ways, the claim is trivially obvious – a join simply takes two row sources and compares [...]

Jonathan Lewis's picture

double trouble

In the latest Quiz Night, I asked how you could make a query more efficient by changing a two table join into a three table join – with the clue that my third table was a repeat of the first table. Gary Myers, in comment 4,  provided the type of answer I was looking for. Sometimes [...]

To prevent automated spam submissions leave this field empty.
Syndicate content