Partitioning

Jonathan Lewis

Truncate Partition

A recent posting on the OTN database forum (which I didn’t answer correctly on my first attempt) raised the problem of truncating partitions when there are referential integrity constraints in place. Let’s start with a demonstration of the problem:
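The original demonstration isn't included in this excerpt, but a minimal sketch of the issue (table and constraint names invented here) looks like this: while an enabled foreign key references the parent's primary key, any truncate of the parent, including a partition truncate, fails with ORA-02266.

```sql
create table parent (
	id	number primary key,
	dt	date
)
partition by range (id) (
	partition p1 values less than (100),
	partition p2 values less than (200)
);

create table child (
	id	number constraint child_fk references parent(id),
	txt	varchar2(30)
);

alter table parent truncate partition p1;
-- ORA-02266: unique/primary keys in table referenced by enabled foreign keys

-- The obvious (but not always acceptable) workaround:
alter table child disable constraint child_fk;
alter table parent truncate partition p1;
alter table child enable constraint child_fk;
```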

randolf.geist

Incremental Partition Statistics Review

Introduction

Here is a summary of my findings from evaluating the Incremental Partition Statistics feature introduced in Oracle 11g.

The most important point to understand is that Incremental Partition Statistics are not "cost-free", so anyone who is telling you that you can gather statistics on the lowest level (partition or sub-partition in case of composite partitioning) without any noticeable overhead in comparison to non-incremental statistics (on the lowest level) is not telling you the truth.

Although this might be obvious, I've personally heard someone making such claims, so it's probably worth mentioning.

In principle you need to test on your individual system whether the overhead added to each statistics update on the lowest level outweighs the overhead of actually gathering statistics on the higher levels, in particular the global level.
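For reference, incremental statistics are enabled through a table preference; a minimal sketch (the table name T1 is an assumption) looks like this:

```sql
begin
	-- enable synopsis maintenance for this partitioned table
	dbms_stats.set_table_prefs(
		ownname	=> user,
		tabname	=> 'T1',
		pname	=> 'INCREMENTAL',
		pvalue	=> 'TRUE'
	);

	-- with INCREMENTAL = TRUE, granularity AUTO lets Oracle derive
	-- global statistics from the partition-level synopses
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'T1',
		granularity	=> 'AUTO'
	);
end;
/
```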

randolf.geist

Dynamic Sampling On Multiple Partitions - Bugs

In a recent OTN thread I've been reminded of two facts about Dynamic Sampling that I already knew but had forgotten in the meantime:

1. The table-level dynamic sampling hint uses a different number of blocks for sampling than session- or cursor-level dynamic sampling. So even if, for example, level 5 is used in both cases, the number of sampled blocks will be different for most of the 10 available levels (obviously levels 0 and 10 are exceptions).

2. The Dynamic Sampling code uses a different approach for partitioned objects when it is faced with the situation that there are more partitions than blocks to sample according to the level (and type: table/cursor/session) of Dynamic Sampling.

Note that all this here applies to the case where no statistics have been gathered for the table - I don't cover the case when Dynamic Sampling gets used on top of existing statistics.
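As a reminder of the two hint forms involved (t1 is an invented table name), the cursor-level form takes only a level, while the table-level form names the table, and the two sample different numbers of blocks at the same level:

```sql
select /*+ dynamic_sampling(5) */    count(*) from t1;	-- cursor/session level
select /*+ dynamic_sampling(t1 5) */ count(*) from t1;	-- table level
```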

Jonathan Lewis

Table rebuilds

Here’s a question from the OTN forum that I’ve seen fairly frequently in the last couple of years:

I have a table abc(c_date date, c_id varchar2(20), c_name varchar2);
The table is already range partitioned on the “c_date” column and I have a lot of data in that table.
Now my task is to create hash partitions under the range partitions; the hash partitioning needs 16 partitions.
I don’t want to create another table with range and hash partitions and migrate the data. The amount of data I have is 3 terabytes.
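The answer from the thread isn't reproduced in this excerpt. One approach commonly used for this kind of change is online redefinition; the following is only a sketch under stated assumptions (the c_name length, the partition bound, and the interim table name are all invented, and the calls to copy dependents, handle errors, and synchronise are omitted):

```sql
begin
	-- check the table can be redefined (uses the primary key by default)
	dbms_redefinition.can_redef_table(user, 'ABC');
end;
/

-- interim table with the desired range-hash layout
create table abc_interim (
	c_date	date,
	c_id	varchar2(20),
	c_name	varchar2(50)
)
partition by range (c_date)
subpartition by hash (c_id) subpartitions 16 (
	partition p2011 values less than (date '2012-01-01')
);

begin
	dbms_redefinition.start_redef_table(user, 'ABC', 'ABC_INTERIM');
	-- ... copy_table_dependents(), sync_interim_table() etc. omitted ...
	dbms_redefinition.finish_redef_table(user, 'ABC', 'ABC_INTERIM');
end;
/
```

Even so, with 3TB of data the space and redo requirements of the copy would need careful planning.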

Jonathan Lewis

Indexing

A question about partitioning came up on OTN a few days ago – but the really interesting part of the thread was about the choice of indexes, and how the choice of partitioning, combined with the data patterns, could make a big difference to the choice of indexes. I think the comments I made about indexes are worth seeing, so I’ve linked to the thread.


Jonathan Lewis

Partitioned Bitmaps

The following question appeared in a comment to an earlier posting on multi-column bitmap indexes and the inability of Oracle to create a bitmap index join when (to the human eye) the strategy was an obvious choice.

    I have a query which is using 2 indexes, both bitmap indexes (sizes are 37MB and 24MB), and the table size is 17GB. When I ran the following query, which could very well be satisfied by the indexes alone, it took around 6-8 minutes and used around 3GB of PGA.

Could you please explain why?

SQL_ID  5z0a50a2yzdky, child number 0
-------------------------------------
select count(1) from (select distinct SRC_SYS,PROD_KEY from dds.REV_F)

Plan hash value: 867747470

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                      |       |       |   221K(100)|          |       |       |
|   1 |  SORT AGGREGATE                   |                      |     1 |       |            |          |       |       |
|   2 |   VIEW                            |                      | 24533 |       |   221K  (6)| 00:44:22 |       |       |
|   3 |    HASH UNIQUE                    |                      | 24533 |   479K|   221K  (6)| 00:44:22 |       |       |
|   4 |     VIEW                          | index$_join$_002     |    63M|  1209M|   212K  (2)| 00:42:28 |       |       |
|*  5 |      HASH JOIN                    |                      |       |       |            |          |       |       |
|   6 |       PARTITION LIST ALL          |                      |    63M|  1209M|  3591   (1)| 00:00:44 |     1 |   145 |
|   7 |        BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M|  3591   (1)| 00:00:44 |       |       |
|   8 |         BITMAP INDEX FULL SCAN    | REV_F_IDX1           |       |       |            |          |     1 |   145 |
|   9 |       PARTITION LIST ALL          |                      |    63M|  1209M| 13724   (1)| 00:02:45 |     1 |   145 |
|  10 |        BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M| 13724   (1)| 00:02:45 |       |       |
|  11 |         BITMAP INDEX FULL SCAN    | REV_F_IDX5           |       |       |            |          |     1 |   145 |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(ROWID=ROWID)

28 rows selected.

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2    610.89    1464.86     707459      17090          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4    610.90    1464.87     707459      17090          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=17090 pr=707459 pw=446115 time=1464867976 us)
  26066   VIEW  (cr=17090 pr=707459 pw=446115 time=1464795748 us)
  26066    HASH UNIQUE (cr=17090 pr=707459 pw=446115 time=1464769678 us)
63422824     VIEW  index$_join$_002 (cr=17090 pr=707459 pw=446115 time=1084846545 us)
63422824      HASH JOIN  (cr=17090 pr=707459 pw=446115 time=958000889 us)
63422824       PARTITION LIST ALL PARTITION: 1 145 (cr=3561 pr=0 pw=0 time=63423134 us)
63422824        BITMAP CONVERSION TO ROWIDS (cr=3561 pr=0 pw=0 time=9554 us)
   7112         BITMAP INDEX FULL SCAN REV_F_IDX1 PARTITION: 1 145 (cr=3561 pr=0 pw=0 time=155525 us)(object id 366074)
63422824       PARTITION LIST ALL PARTITION: 1 145 (cr=13529 pr=8864 pw=0 time=190268771 us)
63422824        BITMAP CONVERSION TO ROWIDS (cr=13529 pr=8864 pw=0 time=63553723 us)
 432700         BITMAP INDEX FULL SCAN REV_F_IDX5 PARTITION: 1 145 (cr=13529 pr=8864 pw=0 time=3157351 us)(object id 366658)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  direct path write temp                      29741        1.62        107.05
  db file sequential read                      8864        0.20          2.35
  direct path read temp                       46573        0.79        211.02
  SQL*Net message from client                     2       29.22         29.22

In this case Oracle is clearly doing the bitmap join, but it’s managing to do something very inefficient. The problem lies in the partitioning or, to be more precise, Oracle’s failure to take advantage of partitioning. The OP complains of using 3GB of memory and several minutes of elapsed time. The plan shows us that we have 145 partitions (PARTITION LIST ALL PARTITION: 1 145), and we have been told that the table is about 17GB in size, so the “average” partition is about 120MB – so why isn’t Oracle using a partition-wise approach and processing the data one partition at a time? The answer is simple – it can’t be done (at present).

An index join works by doing a hash join between rowids – and since we are using bitmap indexes we have to convert bitmaps to rowids as part of the plan. In this query we then want to count the number of distinct combinations of (SRC_SYS,PROD_KEY) – and the same combination may appear in different partitions, so Oracle has had to generate a plan that handles the entire data set in a single join rather than trying to handle each partition separately (notice how the “partition list all” operator appears twice, once for each index).

The “Row Source Operation” tells us we only had to scan a few thousand blocks, but we have to build a hash table of 64 million entries:

63422824        BITMAP CONVERSION TO ROWIDS (cr=3561 pr=0 pw=0 time=9554 us)

At 10 bytes per rowid (for a partitioned table), plus the length of the input column, plus linked list overheads (8 bytes per pointer) you can see the 3 GB beginning to appear out of the volume of work being done. And then Oracle dumped the whole thing to disc (perhaps in anticipation of the impending “hash unique”, perhaps because it still needed to do a one-pass operation – it would have been interesting to see the full row source execution statistics from a call to dbms_xplan.display_cursor()).

What we need to see is a plan more like the following:

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                      |       |       |   221K(100)|          |       |       |
|   1 |  SORT AGGREGATE                    |                      |     1 |       |            |          |       |       |
|   2 |   VIEW                             |                      | 24533 |       |   221K  (6)| 00:44:22 |       |       |
|   3 |    HASH UNIQUE                     |                      | 24533 |   479K|   221K  (6)| 00:44:22 |       |       |
|   4 |     PARTITION LIST ALL             |                      |    63M|  1209M|  3591   (1)| 00:00:44 |     1 |   145 |
|*  5 |      HASH UNIQUE                   |                      |       |       |            |          |       |       |
|   6 |       VIEW                         | index$_join$_002     |    63M|  1209M|   212K  (2)| 00:42:28 |       |       |
|*  7 |        HASH JOIN                   |                      |       |       |            |          |       |       |
|   8 |         BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M|  3591   (1)| 00:00:44 |       |       |
|   9 |          BITMAP INDEX FULL SCAN    | REV_F_IDX1           |       |       |            |          |     1 |   145 |
|  10 |         BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M| 13724   (1)| 00:02:45 |       |       |
|  11 |          BITMAP INDEX FULL SCAN    | REV_F_IDX5           |       |       |            |          |     1 |   145 |
---------------------------------------------------------------------------------------------------------------------------

Line 4 shows us that we do something for each partition in turn. The sub-plan to line 4 tells us that we are collecting the unique combinations of (SRC_SYS,PROD_KEY) for a given partition. Line 3 tells us that we are collating the results from the different partitions and generating the set of values that is unique across all partitions.

The problem is this: can we engineer a strange piece of SQL that makes this plan appear? Because the optimizer isn’t going to do it automatically (yet).

Obviously it would be pretty easy to write some sort of solution using PL/SQL and pipelined functions – perhaps a function that takes a table name and loops through each partition of the table in turn, returning the distinct combinations for that partition. This would allow you to “select count(distinct(…)) from table_function(…);” to get your result.
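A hedged sketch of that pipelined-function idea follows; every name here is invented (the object type, the function, and the assumed varchar2(30) column sizes), and a production version would take the table name as a parameter and use dbms_assert to avoid SQL injection:

```sql
create or replace type src_prod_t as object (
	src_sys		varchar2(30),
	prod_key	varchar2(30)
);
/

create or replace type src_prod_tab as table of src_prod_t;
/

create or replace function distinct_by_partition return src_prod_tab pipelined
as
	c	sys_refcursor;
	v_src	varchar2(30);
	v_prod	varchar2(30);
begin
	for p in (
		select	partition_name
		from	user_tab_partitions
		where	table_name = 'REV_F'
	) loop
		-- partition-extended syntax can't take a bind, hence dynamic SQL
		open c for
			'select distinct src_sys, prod_key ' ||
			'from rev_f partition (' || p.partition_name || ')';
		loop
			fetch c into v_src, v_prod;
			exit when c%notfound;
			pipe row (src_prod_t(v_src, v_prod));
		end loop;
		close c;
	end loop;
	return;
end;
/

select	count(*)
from	(select distinct src_sys, prod_key from table(distinct_by_partition()));
```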

You might be able to avoid PL/SQL by creating a piece of SQL joining the table’s metadata to the table by partition identifier – except you would probably need to use a lateral view, which Oracle doesn’t support, and make the partition extended syntax part of the lateral dependency … which Oracle definitely doesn’t support.

So is there an alternative, purely SQL, strategy?

I’m thinking about it – I have a cunning plan, but I haven’t had time to test it yet.

to be continued …

Update (7th July 2011):

I’ve finally managed to make a bit of time to write up my notes about this – and in the interim Randolf Geist has said everything that needs to be said, and the OP has pointed out that a full tablescan works faster anyway. However, here goes …

My cunning plans involved finding ways of forcing Oracle into doing a single partition hash join for each partition in turn by playing clever tricks with dbms_mview.pmarker or the dba_tab_partitions view, or even the tbl$or$idx$part$num() function. But I realised I was in trouble as soon as I wrote the first little test that was supposed to do an index hash join on a single explicitly referenced partition. Here’s my table description, index definitions, query and execution plan:

SQL> desc t1
 Name                    Null?    Type
 ----------------------- -------- ----------------
 NVCUSTATUS                       VARCHAR2(20)        -- list partitioned on this column, multiple values in some partitions
 FREQ_COLUMN                      NUMBER(4)
 HBAL_COLUMN                      NUMBER(8)
 RAND_COLUMN                      NUMBER(8)

create bitmap index t1_freq on t1(freq_column) local;
create bitmap index t1_hbal on t1(hbal_column) local;

select
	count(1)
from	(
	select
		/*+ index_join(t1) */
		distinct freq_column, hbal_column
	from t1 partition (p6)
	)
;

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |   894K|  6112K|  2966   (2)| 00:00:36 |       |       |
|   1 |  VIEW                          | index$_join$_001 |   894K|  6112K|  2966   (2)| 00:00:36 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION LIST SINGLE       |                  |   894K|  6112K|   208   (0)| 00:00:03 |   KEY |   KEY |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |   894K|  6112K|   208   (0)| 00:00:03 |       |       |
|   5 |      BITMAP INDEX FULL SCAN    | T1_FREQ          |       |       |            |          |     6 |     6 |
|   6 |    PARTITION LIST SINGLE       |                  |   894K|  6112K|   299   (0)| 00:00:04 |   KEY |   KEY |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |   894K|  6112K|   299   (0)| 00:00:04 |       |       |
|   8 |      BITMAP INDEX FULL SCAN    | T1_HBAL          |       |       |            |          |     6 |     6 |
-------------------------------------------------------------------------------------------------------------------

Even though we address a single partition explicitly, and even though the optimizer recognises it as partition 6 in lines 5 and 8, the plan isn’t doing a “partition-wise” join between the two indexes – the hash join calls two independent partition operations. In effect, Oracle is joining two different tables, and there isn’t a join condition on the partitioning column.

If we change the index definitions, though, to append the partitioning column (and given the small number of distinct values in each partition this didn’t affect the index size by much), and then rewrite the query to include the join on the partitioning column, we get a better result – but only as a manual rewrite of the query (and only if the partitioning column is declared not null).


alter table t1 modify nvcustatus not null;

drop index t1_freq;
drop index t1_hbal;

create bitmap index t1_freq on t1(freq_column, nvcustatus) local;
create bitmap index t1_hbal on t1(hbal_column, nvcustatus) local;

select
	count(*)
from	(
	select
		/*+
			qb_name(main)
			no_eliminate_join(@SEL$96BB32CC t1@hbal)
			no_eliminate_join(@SEL$96BB32CC t1@freq)
		*/
		distinct ftb.freq_column, htb.hbal_column
	from
		(
		select
			/*+ qb_name(freq) index(t1 t1_freq) */
			nvcustatus, freq_column, rowid frid
		from
			t1
		)	ftb,
		(
		select
			/*+ qb_name(hbal) index(t1 t1_hbal) */
			nvcustatus, hbal_column, rowid hrid
		from
			t1
		)	htb
	where
		ftb.frid = htb.hrid
	and	ftb.nvcustatus= htb.nvcustatus
	)
;

-------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |         |      1 |        |      1 |00:00:05.44 |     719 |   4380 |   3660 |       |       |          |         |
|   1 |  SORT AGGREGATE                  |         |      1 |      1 |      1 |00:00:05.44 |     719 |   4380 |   3660 |       |       |          |         |
|   2 |   VIEW                           |         |      1 |  35118 |  23223 |00:00:05.50 |     719 |   4380 |   3660 |       |       |          |         |
|   3 |    HASH UNIQUE                   |         |      1 |  35118 |  23223 |00:00:05.46 |     719 |   4380 |   3660 |  1268K|  1268K| 2647K (0)|         |
|   4 |     PARTITION LIST ALL           |         |      1 |    330K|   1000K|00:00:18.44 |     719 |   4380 |   3660 |       |       |          |         |
|*  5 |      HASH JOIN                   |         |      8 |    330K|   1000K|00:00:15.34 |     719 |   4380 |   3660 |   283M|    16M|   27M (1)|   31744 |
|   6 |       BITMAP CONVERSION TO ROWIDS|         |      8 |   1000K|   1000K|00:00:00.99 |     296 |    289 |      0 |       |       |          |         |
|   7 |        BITMAP INDEX FULL SCAN    | T1_FREQ |      8 |        |   1831 |00:00:00.30 |     296 |    289 |      0 |       |       |          |         |
|   8 |       BITMAP CONVERSION TO ROWIDS|         |      7 |   1000K|   1000K|00:00:06.48 |     423 |    416 |      0 |       |       |          |         |
|   9 |        BITMAP INDEX FULL SCAN    | T1_HBAL |      7 |        |   7353 |00:00:00.41 |     423 |    416 |      0 |       |       |          |         |
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("NVCUSTATUS"="NVCUSTATUS" AND ROWID=ROWID)

As you can see from the plan, I ran one of my tests with rowsource execution statistics enabled, so that I could count starts.

My query basically breaks the implicit join of two indexes into an explicit join so that I can include the join on the partitioning column. You’ll notice that I now iterate through each partition – line 4 – and do a hash join for each pair of partitions. This is the target I was hoping for. The theory was that by doing (up to) eight small hash joins, each join will operate in memory, rather than flooding one large build table to disc. Of course, the thing I’m building the hash table with is now three values (rowid, freq_value, nvcustatus) rather than the two values from the implicit join (rowid, freq_value), so if I’m unlucky this version of the query could take even longer than the original because it may have to do more I/O.

If you’re wondering why I only started lines 8 and 9 once, this was because one of my partitions was empty, so Oracle had to scan it once as the build table to find that it was empty, then didn’t need to scan it as the probe table.

Footnote: Although I got the plan I wanted – it really didn’t make much difference, and even with the million rows I was using it was fast to solve the problem with a tablescan. The relative benefit of the different plans would be dictated by the length of rows, the lengths of the relevant columns, the number of partitions, and the available memory for hashing.

Jonathan Lewis

Prefixed

From time to time the question about whether local indexes on partitioned tables should be prefixed or non-prefixed appears on the Oracle forums and mailing lists.

It’s possible that I’m wrong – although no-one has come up with a counter-example to the statement I keep repeating – but the whole prefixed/non-prefixed thing for local indexes dates back to a limitation in the optimizer somewhere in the 8.0 timeline, where Oracle couldn’t do partition elimination on indexes properly, but the overhead of the error it made could be dramatically reduced (in most cases) by sticking the partition key at the start of the index.

The guidelines for local indexes are the same as they would be for a non-partitioned index on a non-partitioned heap table – the partitioning column(s) are just columns in the table that might be worth including in the index, and the order of the index columns is dictated by the important queries that have to access the table.

For further comments, there’s a note I wrote (which I’ve just been reminded of) on the OTN database forum that adds a little detail to this argument.

Jonathan Lewis

Upgrade issues

Here’s an example of how a bug-fix can create problems. It’s a code change in 11.2.x.x and (I believe) 10.2.0.5 relating to the costing of queries involving (but perhaps not restricted to) composite partitioned tables. I first saw this change in an email from Doug Burns, who sent me the 10053 traces from a very simple query that had started using the wrong index after an upgrade from 10.2.0.4 to 11.2.0.2.

As part of his testing he had set the optimizer_features_enable parameter back to 10.2.0.4 and found that not only did the choice of index change back to the expected index, but the costs of the two indexes changed dramatically. (The cost of using the “right” index changed from 15 to something in excess of 9,000 on the upgrade!)

The example I’m going to show you demonstrates the principles of the cost change – but with the data set and statistics I’ve generated you won’t see a change of execution plan. This is typical of some of the models I create – it’s enough to establish a principle, after which it’s simple enough to recognise the problems that the principle can cause.

So here’s a data set. I created it in a default 11.2.0.2 install on Windows 32-bit, so it’s running with 8KB blocks, autoallocate extents, and ASSM – but I’ve disabled CPU costing:

create table t1(
	list_col	number,
	range_col	number,
	v100		varchar2(10),
	v10		varchar2(10),
	vid		varchar2(10),
	padding		varchar2(100)
)
partition by range(range_col)
subpartition by list (list_col)
subpartition template (
	subpartition s0		values(0),
	subpartition s1		values(1),
	subpartition s2		values(2),
	subpartition s3		values(3),
	subpartition s4		values(4),
	subpartition s5		values(5),
	subpartition s6		values(6),
	subpartition s7		values(7),
	subpartition s8		values(8),
	subpartition s9		values(9)
)
(
	partition p_10 values less than (10),
	partition p_20 values less than (20),
	partition p_30 values less than (30),
	partition p_40 values less than (40),
	partition p_50 values less than (50),
	partition p_60 values less than (60),
	partition p_70 values less than (70),
	partition p_80 values less than (80),
	partition p_90 values less than (90),
	partition p_100 values less than (100)
)
;

insert into t1
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	mod(rownum,10),
	mod(rownum,100),
	lpad(mod(rownum,10),10),
	lpad(mod(rownum,100),10),
	lpad(rownum,10),
	rpad('x',100)
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t1 add constraint t1_pk primary key(v10, vid, v100, range_col, list_col) using index local;

create index t1_one_col on t1(v100) local;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'

	);
end;
/

You’ll notice that one of the indexes I’ve created has only one column, while the other has five columns (including the single column of the smaller index). Since I’ve created only 1,000,000 rows spread across 100 partitions every partition is small and the corresponding index partitions are also small, so I’m going to use dbms_stats.get_index_stats() and dbms_stats.set_index_stats() to make the indexes appear much larger. Specifically I will set the blevel on the single column index to 2, and the blevel on the multi-column index to 3. (This difference in settings isn’t just whimsy, it helps to emulate Doug’s problem.) I’ve previously published the type of code to make this possible; in this case I only set the table-level stats because the queries I’m going to use will trigger Oracle to use just the table-level stats.
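The kind of adjustment described might look something like the following sketch, which follows the usual get-then-set pattern (the variable names are mine; only the blevel is changed, the other values are written back unmodified):

```sql
declare
	m_numrows	number;
	m_numlblks	number;
	m_numdist	number;
	m_avglblk	number;
	m_avgdblk	number;
	m_clstfct	number;
	m_indlevel	number;
begin
	dbms_stats.get_index_stats(
		ownname		=> user,
		indname		=> 'T1_ONE_COL',
		numrows		=> m_numrows,
		numlblks	=> m_numlblks,
		numdist		=> m_numdist,
		avglblk		=> m_avglblk,
		avgdblk		=> m_avgdblk,
		clstfct		=> m_clstfct,
		indlevel	=> m_indlevel
	);

	dbms_stats.set_index_stats(
		ownname		=> user,
		indname		=> 'T1_ONE_COL',
		numrows		=> m_numrows,
		numlblks	=> m_numlblks,
		numdist		=> m_numdist,
		avglblk		=> m_avglblk,
		avgdblk		=> m_avgdblk,
		clstfct		=> m_clstfct,
		indlevel	=> 2		-- faked blevel for the small index
	);
end;
/
```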

select
	/*+ index(t1(v100)) */
	count(*)
from
	t1
where
	v10 = '0000000005'
and	v100 = '0000000005'
;

select
	/*+ index(t1(v10, vid, v100)) */
	count(*)
from
	t1
where
	v10 = '0000000005'
and	v100 = '0000000005'
;

I’m going to show you two sets of execution plans. The first one is where I’ve set optimizer_features_enable to ’10.2.0.4′, the second is where I’ve left it to default.

10.2.0.4 execution plans:
===================
---------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Rows  | Bytes | Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |     1 |    22 |  2350 |       |       |
|   1 |  SORT AGGREGATE                      |            |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL                |            |  1000 | 22000 |  2350 |     1 |    10 |
|   3 |    PARTITION LIST ALL                |            |  1000 | 22000 |  2350 |     1 |    10 |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID| T1         |  1000 | 22000 |  2350 |     1 |   100 |
|*  5 |      INDEX RANGE SCAN                | T1_ONE_COL |   100K|       |   330 |     1 |   100 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("V10"='0000000005')
   5 - access("V100"='0000000005')

------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost  | Pstart| Pstop |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    22 |   100 |       |       |
|   1 |  SORT AGGREGATE      |       |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL|       |  1000 | 22000 |   100 |     1 |    10 |
|   3 |    PARTITION LIST ALL|       |  1000 | 22000 |   100 |     1 |    10 |
|*  4 |     INDEX RANGE SCAN | T1_PK |  1000 | 22000 |   100 |     1 |   100 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V10"='0000000005' AND "V100"='0000000005')
       filter("V100"='0000000005')

11.2.0.2 execution plans:
===================
Execution Plan
----------------------------------------------------------
Plan hash value: 3019183742

---------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Rows  | Bytes | Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |     1 |    22 |  2530 |       |       |
|   1 |  SORT AGGREGATE                      |            |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL                |            |  1000 | 22000 |  2530 |     1 |    10 |
|   3 |    PARTITION LIST ALL                |            |  1000 | 22000 |  2530 |     1 |    10 |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID| T1         |  1000 | 22000 |  2530 |     1 |   100 |
|*  5 |      INDEX RANGE SCAN                | T1_ONE_COL |   100K|       |   510 |     1 |   100 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("V10"='0000000005')
   5 - access("V100"='0000000005')

------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost  | Pstart| Pstop |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    22 |   370 |       |       |
|   1 |  SORT AGGREGATE      |       |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL|       |  1000 | 22000 |   370 |     1 |    10 |
|   3 |    PARTITION LIST ALL|       |  1000 | 22000 |   370 |     1 |    10 |
|*  4 |     INDEX RANGE SCAN | T1_PK |  1000 | 22000 |   370 |     1 |   100 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V10"='0000000005' AND "V100"='0000000005')
       filter("V100"='0000000005')

Regardless of the setting of optimizer_features_enable, if I hint the single-column index I have a plan that visits each partition and subpartition in turn for the index then table; and if I hint the primary key I do that same walk through each partition and subpartition and visit just the index.

The big difference is in the cost. By default, both execution plans cost more in 11.2.0.2 – but the odd thing is that the change in costs is different. The cost of using the single-column index has gone up by 180, the cost of using the primary key index has gone up by 270; and the change in costs can be attributed completely to the indexes.

Is there a pattern to the change? Yes. It’s not immediately obvious, of course, and needs a few extra tests to confirm it, but the change in cost can be calculated as: (number of range partitions * (number of list partitions – 1) * blevel of index).

    For the single column index this is: 10 * 9 * 2 = 180
    For the primary key index this is: 10 * 9 * 3 = 270

Once you’ve seen this change in arithmetic you don’t need to see an example of a plan actually changing. In certain circumstances the cost of using composite partitions goes up when you upgrade to 11.2 – and the change in cost is proportional to the number of data segments (subpartitions) and the blevel of the indexes.

In Doug’s case he had a table with roughly 4,000 subpartitions in total – and he had a single column index with a blevel of 2, and a multi-column index with a blevel of 3: so the cost of using the (smaller, but less efficient) index went up by about 8,000 and the cost of using the (larger but efficient) index went up by about 12,000. The difference of 4,000 between these two increases was far higher than the original cost of using the bad index – so Oracle switched from the good index to the bad index.

The nasty thing about this problem is that it’s correcting an error – 11.2 is allowing for the cost of probing every single index subpartition (that’s what the extra multiples of the blevel represent); unfortunately the optimizer’s underlying paradigm of “every block visit is a disk I/O” makes this correction a threat.

Solutions and Workarounds

There is an /*+ optimizer_features_enable(‘x.x.x.x’) */ hint that could be used if the problem applies to just a couple of queries.

If the problem applies to classes of queries involving a couple of big tables and indexes you could use dbms_stats.set_index_stats() to adjust the blevel of the critical indexes.

If the problem appears all over the place you could set the optimizer_features_enable parameter to ’10.2.0.4′ in the parameter file, or in a session logon trigger.
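For the logon-trigger variant, a minimal sketch (the trigger name is invented) might be:

```sql
create or replace trigger set_ofe_on_logon
after logon on database
begin
	-- DDL/parameter changes in a trigger need dynamic SQL
	execute immediate
		q'[alter session set optimizer_features_enable = '10.2.0.4']';
end;
/
```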

If the problem appears all over the place, but there are other features of the 11.2.0.2 optimizer that are very helpful you could take advantage of the “_fix_control” parameter – after talking to Oracle support. This particular problem comes under bug fix 7132684, which appears in v$system_fix_control under 10.2.0.5, with the description “Fix costing for non prefixed local index”. If you’ve got the right symptoms, then the following statement will bypass the problem:

alter session set "_fix_control"='7132684:OFF';

Final Warning:

This note comes from 30 minutes looking at Doug’s trace files, one hour building a test case, and an hour spent writing this blog. This is not a complete determination of the problem; it’s just an initial outline of what can happen and the symptoms to watch out for, posted as an early warning to help a few people save some time. Do not treat this note as a definitive reference.

Ichiro Obata

Technologies Supporting Exadata 6: Advanced Compression

Table compression with Oracle Advanced Compression also has a major impact on Parallel Query performance. From the Oracle Advanced Compression White Paper:

For a large DWH system, reducing the volume of data read from physical disk is the key to improving performance. Looking at the impact with the TPC-H benchmark (from http://ichiroobata.blogspot.com/2010/08/partitioning.html):

Ichiro Obata

Technologies Supporting Exadata 5: Oracle Partitioning

From the Oracle Data Sheet:

Partitioning delivers a variety of availability, performance, and manageability benefits across a wide range of applications.
