Indexing

Jonathan Lewis's picture

FBI oddities

Function-based indexes are wonderful things – but they don’t always work exactly as expected. Here’s an example of one such anomaly.

Imagine you have some type of “orders” table where most orders are in a “finished with” state, and you have a requirement to access the small number of orders in the “new” state. Here’s a sample data set to emulate this type of data requirement (created in 11.1.0.6, 1MB uniforma extents, freelist management and 8KB blocks).


create table t1 (
	state		varchar2(10),
	n1		number,
	v1		varchar2(10),
	padding	varchar2(100)
);

insert into t1
select
	decode(mod(rownum,100),0,'OPEN','CLOSED'),
	rownum,
	lpad(rownum,10,0),
	rpad('x',100,'x')
from
	all_objects
where
	rownum <= 5000
;

I’ve generated this data set so that every 100th row is marked as ‘OPEN’ and all the rest are marked as ‘CLOSED’ – in a real system the percentage of ‘OPEN’ orders would probably be much smaller so we could easily decide to have an index on state to give us an efficient access path to the open orders. But such an index would be very large, because it would also hold entries for the huge number of closed orders; we’d also have to create a histogram on the column (possibly by writing a simple script) so that Oracle could recognise the skewed data distribution.

If we wanted to be clever, though, and if we were able to edit the SQL that addressed this table, we could minimise the size of the index and avoid the need for a histogram by creating a function-based index that held values just for the rows where the state was ‘OPEN’. For example, I could create an index which held the order number only for those rows where the state was open; and there are several ways I could do this, for example:


create index t1_f1 on t1(decode(state,'CLOSED', to_number(null), n1 ));
create index t1_f2 on t1(to_number(decode(state,'CLOSED', null, n1 )));
create index t1_f3 on t1(case when state = 'CLOSED' then to_number(null) else n1 end);
create index t1_f4 on t1(to_number(case when state = 'CLOSED' then null else n1 end));
create index t1_f5 on t1(decode(state,'CLOSED', null, n1 ));
create index t1_f6 on t1(decode(state,'CLOSED', cast(null as number), n1 ));

If you’re wondering why I’ve included a “to_number()” in the first index, remember that NULL is implicitly assumed to be a NULL of type character by Oracle – so I’ve got to do something to tell Oracle that this NULL is supposed to be a numeric NULL. Index t1_f5 is the same as t1_f1, but without the to_number(), and index t1_f6 is the same again but using the more modern cast() to supply the conversion.

You’ll note that I haven’t yet shown any attempt to collect statistics. If we create the indexes AFTER we’ve collected stats on the table we’ll have to collect some extra table stats once the indexes exist because each function-based index will have added a new (hidden) column to the table and, although the “create index” commands will have created statistics for the indexes (from 10g onwards), we will not yet have stats on these hidden columns. So I’m going to wait until after creating the indexes to generate the stats:


begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=>'T1',
		estimate_percent	=> 100,
		method_opt		=> 'for all columns size 1'
	);
end;
/

The question is now this – given the definitions of the indexes above, which of the following six queries – each one designed to be an exact match for one of the index definitions – will use “its” index. (Note that I have hinted the queries to ensure that if the optimizer is allowed to use an index it will use an index – and I’ve included the name of the relevant index as a comment at the end of each hint):


select
	/*+ index(t1) t1_f1 */
	v1
from
	t1
where
	decode(state,'CLOSED', to_number(null), n1 ) = 100
;

select
	/*+ index(t1) t1_f2 */
	v1
from
	t1
where
	to_number(decode(state,'CLOSED', null, n1 )) = 100
;

select
	/*+ index(t1) t1_f3 */
	v1
from
	t1
where
	case when state = 'CLOSED' then to_number(null) else n1 end = 100
;

select
	/*+ index(t1) t1_f4 */
	v1
from
	t1
where
	to_number(case when state = 'CLOSED' then null else n1 end) = 100
;

select
	/*+ index(t1) t1_f5 */
	v1
from
	t1
where
	decode(state,'CLOSED', null, n1 ) = 100
;

select
	/*+ index(t1) t1_f6 */
	v1
from
	t1
where
	decode(state,'CLOSED', cast(null as number), n1 ) = 100
;

The answer depends on the version of Oracle. Under Oracle 11.1.0.6 I got the following results. First, the attempt to create t1_f5 resulted in the following Oracle error (and that’s an important clue to what has happened in another part of the test):

create index t1_f5 on t1(decode(state,'CLOSED', null, n1 ))
                         *
ERROR at line 1:
ORA-01408: such column list already indexed

The index usage was as follows:

    t1_f1		not used	(decode)
    t1_f2		not used	(decode)
    t1_f3		used		(case)
    t1_f4		used		(case)
    t1_f5		non-existent - but used t1_f1
    t1_f6		used		(cast)
    

If you want it in a sound-bite: newer technologies do better than older technologies. But why do the results look the way they do ? You can find the answer in the index definitions that have been stored in the database:


column index_name format a10		heading "Index"
column column_position format 999	heading "Posn"
column column_expression format a72	heading "Expression"

select
	index_name, column_position, column_expression
from
	user_ind_expressions
where
	table_name = 'T1'
;

Index      Posn Expression
---------- ---- ------------------------------------------------------------------------
T1_F1         1 DECODE("STATE",'CLOSED',NULL,"N1")
T1_F2         1 TO_NUMBER(DECODE("STATE",'CLOSED',NULL,TO_CHAR("N1")))
T1_F3         1 CASE "STATE" WHEN 'CLOSED' THEN NULL ELSE "N1" END
T1_F4         1 TO_NUMBER(TO_CHAR(CASE "STATE" WHEN 'CLOSED' THEN NULL ELSE "N1" END ))
T1_F6         1 DECODE("STATE",'CLOSED',CAST(NULL AS number),"N1")

Compare the stored definition with the orginal definitions. Notice how the decodes and NULLs don’t work happily together.

In t1_f1 the explicit to_number() that I included has disappeared – that’s why I was unable to create index t1_f5 – its definition was identical to the modified t1_f1 definition. Then, of course, my predicate no longer matches the exact index definition.

In the t1_f2 definition, because NULL is implicitly character Oracle has added an explicit to_char() to the n1 column I supplied so that its type agrees with the NULL, thus allowing the final to_number() to work. So, again, my predicate no longer matches the index definition.

In t1_f3 and t1_f4 I didn’t include any explicit conversions, and Oracle didn’t add any implicit conversions – but if you look closely it has transformed the version of the case statement I supplied into the simpler form – and everything happened to work (there was an earlier version of Oracle where Oracle would do this transformation for the predicate at run time but not for the index at index creation time – with the result that the “specially created” index wouldn’t work.

Index t1_f5 was not created because my explicit definition matched Oracle’s implicit conversion of t1_f1 – and then my explicit rendition of the matching predicate allowed the optimizer to use index t1_f1.

Finally, with the cast() operator the decode() wasn’t “clever enough” to eliminate my explicit conversion, so the predicate matched the index definition and the index was used.

So the message is this – be careful how you define your function-based indexes, and check what Oracle has stored as the index definition before you commit too much effort to rewriting code to use your new index.

Footnote: Inevitably there are more questions you could ask to fill in further details here. For example, if you created a “genuine” virtual column in 11g using one of my “unusable” decode() expressions, and then indexed the virtual column, would Oracle use the index ? If I had included some cast() operators in my case expressions and corresponding predicates would Oracle still have been able to use the indexes or would I have found the index definitions and predicates were transformed differently and ceased to match ? Is the behaviour shown consistent across all popular versions of Oracle ? (the answer to that last one is No)

These questions (and others) are left as exercises for the interested reader to carry out in the privacy and quiet of their own workplaces.

Jonathan Lewis's picture

Index Join – 3

I’ve recently been writing about the index join mechanism and ways of emulating it. Those notes were originally inspired by an example of an index join that appeared on OTN a little while ago.

It was a plan that combined “bitmap/btree conversion” with the basic index join strategy so, with hindsight, it was an “obvious” and brilliant execution plan for a certain type of query. The query in the original posting was a simple select (with no predicates) against a huge table in a data warehouse – presumably extracting a small number of columns from a much wider row.

SELECT DISTINCT
	ECP_ITEM_MASTER_DIM.ORGANIZATION_ID,
	ECP_ITEM_MASTER_DIM.INV_MFG_PRODUCTION_LINE,
	ECP_ITEM_MASTER_DIM.INV_PRODUCT_FAMILY,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_3,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_4,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_5
FROM
	ecp.ECP_ITEM_MASTER_DIM
;

(I really hate reading SQL where the whole table name has been repeated as the alias all the way through the SQL – it makes the code so hard to read, especially when it’s all in upper case. It’s important to use aliases, of course, but 3 or 4 letters is a sensible length.)

Here’s the execution plan:


---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                              |  1799K|    42M|       | 51279   (1)| 00:10:16 |
|   1 |  HASH UNIQUE                       |                              |  1799K|    42M|   151M| 51279   (1)| 00:10:16 |
|   2 |   VIEW                             | index$_join$_001             |  1799K|    42M|       | 38227   (1)| 00:07:39 |
|*  3 |    HASH JOIN                       |                              |       |       |       |            |          |
|*  4 |     HASH JOIN                      |                              |       |       |       |            |          |
|*  5 |      HASH JOIN                     |                              |       |       |       |            |          |
|*  6 |       HASH JOIN                    |                              |       |       |       |            |          |
|*  7 |        HASH JOIN                   |                              |       |       |       |            |          |
|   8 |         BITMAP CONVERSION TO ROWIDS|                              |  1799K|    42M|       |   485   (1)| 00:00:06 |
|   9 |          BITMAP INDEX FULL SCAN    | ECP_ITEM_MASTER_DIM_IMPL_BMX |       |       |       |            |          |
|  10 |         BITMAP CONVERSION TO ROWIDS|                              |  1799K|    42M|       |   230   (0)| 00:00:03 |
|  11 |          BITMAP INDEX FULL SCAN    | ECP_ITEM_MASTER_DIM_IPF_BMX  |       |       |       |            |          |
|  12 |        BITMAP CONVERSION TO ROWIDS |                              |  1799K|    42M|       |   229   (0)| 00:00:03 |
|  13 |         BITMAP INDEX FULL SCAN     | ECP_ITEM_MASTER_DIM_IS3_BMX  |       |       |       |            |          |
|  14 |       BITMAP CONVERSION TO ROWIDS  |                              |  1799K|    42M|       |   228   (0)| 00:00:03 |
|  15 |        BITMAP INDEX FULL SCAN      | ECP_ITEM_MASTER_DIM_IS4_BMX  |       |       |       |            |          |
|  16 |      BITMAP CONVERSION TO ROWIDS   |                              |  1799K|    42M|       |   201   (0)| 00:00:03 |
|  17 |       BITMAP INDEX FULL SCAN       | ECP_ITEM_MASTER_DIM_IS5_BMX  |       |       |       |            |          |
|  18 |     BITMAP CONVERSION TO ROWIDS    |                              |  1799K|    42M|       |   207   (0)| 00:00:03 |
|  19 |      BITMAP INDEX FULL SCAN        | ECP_ITEM_MASTER_DIM_OI_BMX   |       |       |       |            |          |
---------------------------------------------------------------------------------------------------------------------------

Isn’t it brilliant! The optimizer has seen that all the required columns can be found in indexes (six of them) – but they happen to be bitmap indexes so the optimizer has done a “bitmap conversion to rowid” on all six indexes one after the other with five consecutive hash joins – carrying the column values with each conversion and hash join.

Unfortunately the owner of this plan wasn’t happy with the resulting plan because a full tablescan turned out to be faster – nevertheless, it’s a very clever concept as the size of the table was measured in Gigabytes while the indexes were only a few megabytes each, allowing for a significant saving in I/O time.

I was a little curious, though, about the final join strategy. It’s annoying that Oracle didn’t report any costs on the hash join lines themselves because that could be very revealing. It’s remarkable that the value in the Bytes column for the final view (which is six columns of data) is the same as the bytes column for each index conversion (and remember that the projection from each conversion is just one data column with an accompanying rowid) – there’s clearly something wrong with the arithmetic.

This may explain why the optimizer has decided to run the 6-way join using only two running hash joins (rather than first setting up five hash tables in memory than passing the last table through them). If you think about this, when Oracle gets to the last hash join (lines 3, 4 and 18) it has to build a hash table from the result of the previous four joins and (in this case) that’s going to need a similar amount of memory as five in-memory hash tables. With that thought in mind I was puzzled that Oracle hadn’t just built five in-memory hash tables then walked through each in turn with the sixth.

Still – it’s not my (or my client’s) problem; maybe one day I’ll need to look more closely at a similar case.

[Further reading on Index Joins]

Jonathan Lewis's picture

Index Join – 2

In an earlier article introducing the index join I raised a question that came up at the first ES2N virtual conference:

    “If you hint an index hash join, is there any way of telling Oracle the order in which it should use the indexes?”

Consider the following example:

create table indjoin
nologging
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rownum	val3,
	rpad('x',500) padding
from
	all_objects
where
	rownum <= 5000
;

/*

alter table indjoin
add constraint ij_pk primary key (id)

*/

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);
create unique index ij_v3 on indjoin(id, val3);

-- gather statistics: without histograms

select
	/*+ index_join(ij) */
	count(*)
from
	indjoin	ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
and	val3 between 250 and 550
;

The query plan for this query is (thanks to the hint) a three-way index hash join:

-----------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |     1 |    12 |    74 |
|   1 |  SORT AGGREGATE          |                  |     1 |    12 |       |
|*  2 |   VIEW                   | index$_join$_001 |     1 |    12 |    74 |
|*  3 |    HASH JOIN             |                  |       |       |       |
|*  4 |     HASH JOIN            |                  |       |       |       |
|*  5 |      INDEX FAST FULL SCAN| IJ_V1            |     1 |    12 |    18 |
|*  6 |      INDEX FAST FULL SCAN| IJ_V2            |     1 |    12 |    18 |
|*  7 |     INDEX FAST FULL SCAN | IJ_V3            |     1 |    12 |    18 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("VAL1">=100 AND "VAL1"<=200 AND "VAL2">=50 AND
              "VAL2"<=150 AND "VAL3">=250 AND "VAL3"<=550)
   3 - access(ROWID=ROWID)
   4 - access(ROWID=ROWID)
   5 - filter("VAL1"<=200 AND "VAL1">=100)
   6 - filter("VAL2"<=150 AND "VAL2">=50)
   7 - filter("VAL3"<=550 AND "VAL3">=250)

But what if you know the data better than Oracle, and know that the join order for the three indexes should be different – there are no extra direct hints you can add to the code to tell Oracle the best order for the hash join. (You might, of course, be able to make use of the cardinality() hint – or plan around with the undocumented, hence unsupported, opt_estimate() or column_stats() or index_stats() hints, but I wouldn’t be keen to use such an indirect approach.)

But you CAN rewrite the query to get the same mechanism working under your control. The code looks more complex – but we often have to make a trade between clarity (simplicity) and speed in critical cases, so you may find some examples where the complexity is acceptable:

select
	count(*)
from
	(
	select
		/*+ no_merge */
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	)	v1,
	(
	select
		/*+ no_merge */
		rowid
	from
		indjoin		ij
	where
		val2 between 50 and 150
	)	v2,
	(
	select
		/*+ no_merge */
		rowid
	from
		indjoin		ij
	where
		val3 between 250 and 550
	)	v3
where
	v2.rowid = v1.rowid
and	v3.rowid = v1.rowid
;

It’s another example of referencing a table twice (or three times) in the query because multiple references allow you to define a better execution path than a single reference. The execution we get from this plan (running under 10.2.0.3) is as follows:


------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    36 |    14 |
|   1 |  SORT AGGREGATE          |       |     1 |    36 |       |
|*  2 |   HASH JOIN              |       |   102 |  3672 |    14 |
|*  3 |    HASH JOIN             |       |   102 |  2448 |     9 |
|   4 |     VIEW                 |       |   102 |  1224 |     4 |
|*  5 |      INDEX FAST FULL SCAN| IJ_V1 |   102 |  1632 |     4 |
|   6 |     VIEW                 |       |   102 |  1224 |     4 |
|*  7 |      INDEX FAST FULL SCAN| IJ_V2 |   102 |  1632 |     4 |
|   8 |    VIEW                  |       |   302 |  3624 |     4 |
|*  9 |     INDEX FAST FULL SCAN | IJ_V3 |   302 |  4832 |     4 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V3".ROWID="V1".ROWID)
   3 - access("V2".ROWID="V1".ROWID)
   5 - filter("VAL1">=100 AND "VAL1"<=200)
   7 - filter("VAL2">=50 AND "VAL2"<=150)
   9 - filter("VAL3">=250 AND "VAL3"<=550)

By creating three explicit query blocks (which I’ve ring-fenced with no_merge hints), one for each index, I’ve made Oracle extract the same three sets of data that it was using in the index hash join. I’ve then left Oracle to join the three result sets – which it has done with hash joins. Interestingly this seems to have done a little less work than the original index join – the final complex filter action doesn’t appear in the manual rewrite.

Since I’ve now got a query that seems to be “just” a three table join, I can dictate the join order, guarantee the hash joins, and dictate which rowsources should be used as build rowsources, and which as probe. For example, let’s apply the following hints:

select
	/*+
		leading (v1 v3 v2)
		use_hash(v3) no_swap_join_inputs(v3)
		use_hash(v2) swap_join_inputs(v2)
	*/
	count(*)
from
        ....

The resulting plan is as follows:

------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    36 |    14 |
|   1 |  SORT AGGREGATE          |       |     1 |    36 |       |
|*  2 |   HASH JOIN              |       |   102 |  3672 |    14 |
|   3 |    VIEW                  |       |   102 |  1224 |     4 |
|*  4 |     INDEX FAST FULL SCAN | IJ_V2 |   102 |  1632 |     4 |
|*  5 |    HASH JOIN             |       |   102 |  2448 |     9 |
|   6 |     VIEW                 |       |   102 |  1224 |     4 |
|*  7 |      INDEX FAST FULL SCAN| IJ_V1 |   102 |  1632 |     4 |
|   8 |     VIEW                 |       |   302 |  3624 |     4 |
|*  9 |      INDEX FAST FULL SCAN| IJ_V3 |   302 |  4832 |     4 |
------------------------------------------------------------------

The join order (you can check the trace file to confirm this) is: ij_v1, ij_v3, ij_v2 – but because of the swap_join_inputs(v2) hint the ij_v2 index appears first in the plan.
We build a hash table with ij_v2, then build a hash table with ij_v1 with we probe (join) ij_v3.
We then use the result of joining ij_v1/ij_v3 to probe (join) ij_v2 – which means v2 really is the last object in the join order.

It may look complex – but all we’ve done is describe an index-join in detail, and that has allowed us to specify which indexes are joined when. I’ve already pointed out that the manual version appears to be slightly more efficien than the original. It’s also more powerful, and addresses a defect in the current implementation of the index join. But that’s a topic for another blog.

[Further reading on Index Joins]

Jonathan Lewis's picture

Index Join

One of the less well known access paths available to the optimizer is the “index join” also known as the “index hash join” path. It’s an access path that can be used when the optimizer decides that it doesn’t need to visit a table to supply the select list because there are indexes on the table that, between them, hold all the required columns. A simple example might look something like the following:


create table indjoin
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rpad('x',500) padding
from
	all_objects
where
	rownum <= 3000
;

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);

-- collect stats on the table and indexes

select
	ij.id
from
	indjoin		ij
where
	ij.val1 between 100 and 200
and	ij.val2 between 50 and 150
;

Note that the columns in the where clause appear in (some) indexes, and the column(s) in the select list exist in (at least) some indexes. Under these circumstances the optimizer can produce the following plan (the test script was one I wrote for 8i – but this plan comes from an 11.1 instance):


---------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     3 |    36 |    24 |
|   1 |  VIEW                  | index$_join$_001 |     3 |    36 |    24 |
|*  2 |   HASH JOIN            |                  |       |       |       |
|*  3 |    INDEX FAST FULL SCAN| IJ_V1            |     3 |    36 |    11 |
|*  4 |    INDEX FAST FULL SCAN| IJ_V2            |     3 |    36 |    11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(ROWID=ROWID)
   3 - filter("VAL1"<=200 AND "VAL1">=100)
   4 - filter("VAL2"<=150 AND "VAL2">=50)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ID"[NUMBER,22]
   2 - (#keys=1) "ID"[NUMBER,22]
   3 - ROWID[ROWID,10], "ID"[NUMBER,22]
   4 - ROWID[ROWID,10]

We do a fast full scan of the two indexes extracting the rowid and id from index ij_v1 and just the rowid from index ij_v2. We can then get the result we want by doing a hash join between these two result sets on the rowid values because any time the two rowsources have a rowid in common, it’s a rowid for a row where val1 is between 100 and 200, and val2 is between 50 and 150 and the first rowsource is carrying the id - which is the thing we need to report.

There are a couple of little observations that we can make about this example.

    First, although I’ve only used two indexes in this example Oracle is not limited to just two indexes. The number of indexes that could be used is effectively unlimited.
    Second, the index_join path is strictly limited to cases where the optimizer can see that every column in the query can be found in indexes on the table.
    Third, although my example uses index fast full scans that’s not a necessary feature of the plan. Just like any other hash join, Oracle could use an index range (or full) scan to get some of the data.
    Finally, there are clearly a couple of bugs in the code.

Bugs:

If you check the rows/bytes columns in the plan you’ll see that the predicted number of rows selected is the same for both indexes (lines 3 and 4) – but we extract the rowid and the id from the first index (projection detail for line 3), so the total data volume expected from line 3 is slightly larger than the total data volume from line 4 where we extract only the rowid; theoretically, therefore, the optimizer has used the tables (indexes) in the wrong order – the one supplying the smaller volume of data should have been used as the first (build) rowsource.

More significantly, though, a quick check of the code that generates the data tells you that each index will supply 101 rows to the hash join – and you can even show that for other query execution plans the optimizer will calculate this cardinality (nearly) correctly. In the case of the index join the optimizer seems to have lost the correct individual cardinalities and has decided to use the size of the final result set as the cardinality of the two driving index scans.

There’s more, of course – one of the strangest things about the index join is that if your select list includes the table’s rowid, the optimizer doesn’t consider that to be a column in the index. So even though the predicate section of the plan shows the rowids being projected in the hash join, Oracle won’t use an index join for a query returning the rowid !

Footnote: The reason I’ve written this brief introduction to the index join is because an interesting question came up at the first E2SN virtual conference.

“If you hint an index hash join, is there any way of telling Oracle the order in which it should use the indexes?”

The answer is no – but there are ways of creating code that will do what you want, and that will be the topic of my next blog.

[Further reading on Index Joins]

Jonathan Lewis's picture

Good Nulls

I’ve often been heard to warn people of the accidents that can happen when they forget about the traps that appear when you start allowing columns to be NULL – but sometimes NULLs are good, especially when it helps Oracle understand where the important (e.g. not null) data might be.

An interesting example of this came up on OTN a few months ago where someone was testing the effects of changing a YES/NO column into a YES/NULL column (which is a nice idea because it allows you to create a very small index on the YESes, and avoid creating a histogram to tell the optimizer that the number of YESes is small).

They were a little puzzled, though, about why their tests showed Oracle using an index to find data in the YES/NO case, but not using the index in the YES/NULL case. I supplied a short explanation on the thread, and was planning to post a description on the blog, but someone on the thread supplied a link to AskTom where Tom Kyte had already answered the question, so I’m just going to leave you with a link to his explanation.

Jonathan Lewis's picture

Manual Optimisation

Here’s an example of “creative SQL” that I wrote in response to a question on OTN about combining data from two indexes to optimise access to a table. It demonstrates the principle that you can treat an index as a special case of a table – allowing you to make a query go faster by referencing the same table more times.

Unfortunately you shouldn’t use this particular example in a production system because it relies on the data appearing in the right order without having an “order by” clause. This type of thing makes me really keen to have a hint that says something like: /*+ qb_name(my_driver) assume_ordered(@my_driver) */ so that you could tell the optimizer that it can assume that the rowset from a given query block will appear in the order of the final “order by” clause.

Jonathan Lewis's picture

Index degeneration

There’s a thread on OTN that talks about a particular deletion job taking increasing amounts of time each time it is run.

It looks like an example where some thought needs to go into index maintenance and I’ve contributed a few comments to the thread – so this is a lazy link so that I don’t have to repeat myself on the blog.

Jonathan Lewis's picture

Index rebuilds

A couple of years ago I wrote about a poster on the OTN db forum who was puzzled by the fact that when he started rebuilding tables they got bigger.  (Bad luck, sometimes that’s what happens !)

A few days ago a related question appeared: I rebuilt some indexes and my query got slower. (Bad luck, sometimes that’s what happens – again!)

If you rebuild an index it’s physically different and its statistics are different. Plans can change and go slower because the index stats look sufficiently different; plans can stay the same and go slower because the index is physically different. 

I’ve added a couple of comments to the thread – there may still be some further mileage in it.

Jonathan Lewis's picture

Index Space

I’ve just been reminded of a thread on OTN which turned into a Q&A about index space usage and various related topics. On re-reading it, I decided it was too good to waste in the backwaters of OTN, so here’s a link to it.

Index Space Utilization.

Jonathan Lewis's picture

FBI Bug

Here’s an odd little bug (I think) in the optimizer that showed itself in a thread on the OTN database forum some time ago. The optimizer can choose an index which is NOT the cheapest index for satisfying a query against a single table. Here’s the demo – which I’ve run against 11.1.0.6 using an 8KB block size, ASSM and system allocated extent sizes:

To prevent automated spam submissions leave this field empty.
Syndicate content