Oakies Blog Aggregator

pete's picture

Oracle Privilege Analysis Now Free in EE from 18c and back ported to all 12c

Wow!!, i just got an email from someone in Oracle to let me know that the Privilege Analysis feature of Database Vault has had its licensing changed from this week to now be free as part of an Enterprise Edition....[Read More]

Posted by Pete On 20/11/18 At 10:06 PM

Jonathan Lewis's picture

Table order

Over the last few days I’ve highlighted on Twitter a couple of older posts showing how a change in the order that tables appear in the from clause could affect the execution plan of a query. In one case the note was purely theoretical describing a feature of the way the optimizer works with simple query blocks, in the other case the note was about an anomaly with table elimination that could appear with both “ANSI” and “traditional” Oracle syntax.

Here’s another note that might be more generally useful – an example of an odd side effect of ordering and “ANSI” syntax, with a suggestion for a pattern for writing ANSI SQL. It’s based on a test I wrote to play around with a problem that showed up on the Oracle database forum more than six years ago and shows a strange inconsistency. The setup is a little long-winded as the example involves 4 tables, so I’ll leave the script to create, load and index the tables to the end of the note. Here’s the query that introduced the problem; it’s a fairly straightforward 4 table join with two (left) outer joins:


select
        episode.episode_id , episode.cross_ref_id , episode.date_required ,
        product.number_required,
        request.site_id
from
        episode
left join
        request
on      episode.cross_ref_id = request.cross_ref_id
join
        product
ON      episode.episode_id = product.episode_id
left join
        product_sub_type
ON      product.prod_sub_type_id = product_sub_type.prod_sub_type_id
where
        episode.department_id = 2
and     product.status = 'I'
order by
        episode.date_required
;

And here’s the execution plan:


----------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         | 33333 |  1725K|       | 17135   (4)| 00:00:01 |
|   1 |  SORT ORDER BY       |         | 33333 |  1725K|  2112K| 17135   (4)| 00:00:01 |
|*  2 |   HASH JOIN OUTER    |         | 33333 |  1725K|  1632K| 16742   (4)| 00:00:01 |
|*  3 |    HASH JOIN         |         | 33333 |  1236K|       |   436   (8)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| PRODUCT | 33333 |   325K|       |    54  (12)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| EPISODE |   300K|  8203K|       |   375   (6)| 00:00:01 |
|   6 |    TABLE ACCESS FULL | REQUEST |  4000K|    57M|       | 13542   (3)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EPISODE"."CROSS_REF_ID"="REQUEST"."CROSS_REF_ID"(+))
   3 - access("EPISODE"."EPISODE_ID"="PRODUCT"."EPISODE_ID")
   4 - filter("PRODUCT"."STATUS"='I')
   5 - filter("EPISODE"."DEPARTMENT_ID"=2)

The first thing you’ll notice, of course, is that the plan reports a three table join. Thanks to various referential integrity constraints, the absence of the table in the final select list, and the nature of the join to that table, the optimizer has determined that the product_sub_type table could be eliminated from the join without changing the result set.

What you can’t tell from the plan is that there’s an index on the request table that holds all the columns needed to satisfy the query, and an index fast full scan on the index would be significantly more efficient than the tablescan that appears at operation 6.

Having noticed from the plan that product_sub_type is redundant, the obvious thing to do before investigating further is to rewrite the statement to remove the table . Here’s the resulting query, with execution plan:

----------------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |             | 33333 |  1725K|       |  5525   (6)| 00:00:01 |
|   1 |  SORT ORDER BY         |             | 33333 |  1725K|  2112K|  5525   (6)| 00:00:01 |
|*  2 |   HASH JOIN OUTER      |             | 33333 |  1725K|  1632K|  5132   (7)| 00:00:01 |
|*  3 |    HASH JOIN           |             | 33333 |  1236K|       |   436   (8)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | PRODUCT     | 33333 |   325K|       |    54  (12)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL  | EPISODE     |   300K|  8203K|       |   375   (6)| 00:00:01 |
|   6 |    INDEX FAST FULL SCAN| IX4_REQUEST |  4000K|    57M|       |  1932   (7)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EPISODE"."CROSS_REF_ID"="REQUEST"."CROSS_REF_ID"(+))
   3 - access("EPISODE"."EPISODE_ID"="PRODUCT"."EPISODE_ID")
   4 - filter("PRODUCT"."STATUS"='I')
   5 - filter("EPISODE"."DEPARTMENT_ID"=2)

So – when the optimizer removes the product_sub_type from the query the plan reports a tablescan of request, when we remove product_sub_type the plan reports an index fast full scan of an appropriate index – which appears to be roughly one seventh (1,932/13,542) of the size of the table. It’s a little surprising that the optimizer didn’t get it right by itself – but “ANSI” style SQL often displays quirky little side effects because of the way the optimizer transforms it into traditional Oracle style.

We could stop at that point, of course, but then you’d wonder about the significance of the title of the post. So let’s play around with the join order of the original query, without removing the product_sub_type table.

As a general strategy (though not an absolute rule) I tend to arrange code so that outer joins don’t appear before “inner” joins. In this example that means I would have written the original statement as follows:


select
        episode.episode_id, episode.cross_ref_id, episode.date_required,
        product.number_required,
        request.site_id
from
        episode
join
        product
ON      product.episode_id = episode.episode_id
left join
        product_sub_type
ON      product_sub_type.prod_sub_type_id = product.prod_sub_type_id
left join
        request
on      request.cross_ref_id = episode.cross_ref_id
where
        episode.department_id = 2
and     product.status        = 'I'
order by
        episode.date_required
;

All I’ve done is move the join between episode and product up the SQL, following it with the outer join to product_sub_type, finally closing with the outer join between episode and request. Here’s the execution plan – which you might expect to look exactly like the original plan:


----------------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |             | 33333 |  1725K|       |  5525   (6)| 00:00:01 |
|   1 |  SORT ORDER BY         |             | 33333 |  1725K|  2112K|  5525   (6)| 00:00:01 |
|*  2 |   HASH JOIN OUTER      |             | 33333 |  1725K|  1632K|  5132   (7)| 00:00:01 |
|*  3 |    HASH JOIN           |             | 33333 |  1236K|       |   436   (8)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | PRODUCT     | 33333 |   325K|       |    54  (12)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL  | EPISODE     |   300K|  8203K|       |   375   (6)| 00:00:01 |
|   6 |    INDEX FAST FULL SCAN| IX4_REQUEST |  4000K|    57M|       |  1932   (7)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("REQUEST"."CROSS_REF_ID"(+)="EPISODE"."CROSS_REF_ID")
   3 - access("PRODUCT"."EPISODE_ID"="EPISODE"."EPISODE_ID")
   4 - filter("PRODUCT"."STATUS"='I')
   5 - filter("EPISODE"."DEPARTMENT_ID"=2)

The product_sub_type table has been eliminated and we’re doing an index fast full scan of the ix4_request index instead of a tablescan of the much larger request table.

tl;dr

Changing the order of the tables in an ANSI join – especially when there are outer joins involved – could make a significant difference to the way the query is transformed and optimised. While it is nice to write the table ordering so that “chains” of joins are easily visible, bear in mind that re-ordering the join to postpone outer joins may be enough to help the optimizer produce a better execution plan.

Footnote

If you want to play around with the example, here’s the code to create and load the tables. The code doesn’t follow my usual style as most of it is cut-n-pasted from the Oracle forum thread:


rem
rem     script:         Ansi_outer_5.sql
rem     Dated:          July 2012
rem     Author:         Jonathan Lewis
rem
rem     Last tested
rem             18.3.0.0        iffs still not used by default
rem             12.2.0.1        iffs still not used by default
rem

create table episode (
        episode_id number (*,0),
        department_id number (*,0),
        date_required date,
        cross_ref_id varchar2 (11),
        padding varchar2 (80),
        constraint pk_episode primary key (episode_id)
)
;

create table product_sub_type (
        prod_sub_type_id number (*,0),
        sub_type_name varchar2 (20),
        units varchar2 (20),
        padding varchar2 (80),
        constraint pk_product_sub_type primary key (prod_sub_type_id)
)
;

create table product (
        product_id number (*,0),
        prod_type_id number (*,0),
        prod_sub_type_id number (*,0),
        episode_id number (*,0),
        status varchar2 (1),
        number_required number (*,0),
        padding varchar2 (80),
        constraint pk_product primary key (product_id),
        constraint nn_product_episode check (episode_id is not null) 
)
;

alter table product add constraint fk_product 
        foreign key (episode_id) references episode (episode_id)
;

alter table product add constraint fk_prod_sub_type
        foreign key (prod_sub_type_id) references product_sub_type (prod_sub_type_id)
;

create table request (
        request_id number (*,0),
        department_id number (*,0),
        site_id number (*,0),
        cross_ref_id varchar2 (11),
        padding varchar2 (80),
        padding2 varchar2 (80),
        constraint pk_request primary key (request_id),
        constraint nn_request_department check (department_id is not null),
        constraint nn_request_site_id check (site_id is not null)
)
;

prompt  ===================
prompt  Loading episode ...
prompt  ===================

insert /*+ append */ into episode
with generator as 
(select rownum r
          from (select rownum r from dual connect by rownum <= 1000) a,
               (select rownum r from dual connect by rownum <= 1000) b,
               (select rownum r from dual connect by rownum <= 1000) c
         where rownum <= 1e6
       ) 
select r, 2,
    sysdate + mod (r, 14),
    to_char (r, '0000000000'),
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ' || to_char (r, '000000')
  from generator g
where g.r <= 3e5
/ 

commit;

prompt  ============================
prompt  Loading product_sub_type ...
prompt  ============================

insert /*+ append */ into product_sub_type
with generator as 
(select rownum r
          from (select rownum r from dual connect by rownum <= 1000) a,
               (select rownum r from dual connect by rownum <= 1000) b,
               (select rownum r from dual connect by rownum <= 1000) c
         where rownum <= 1e6
       ) 
select r, 
       to_char (r, '000000'),
       to_char (mod (r, 3), '000000'),
       'ABCDE' || to_char (r, '000000')
  from generator g
where g.r <= 15
/ 

commit;

prompt  ===================
prompt  Loading product ...
prompt  ===================

insert /*+ append */ into product
with generator as 
(select rownum r
          from (select rownum r from dual connect by rownum <= 1000) a,
               (select rownum r from dual connect by rownum <= 1000) b,
               (select rownum r from dual connect by rownum <= 1000) c
         where rownum <= 1e6
       ) 
select r, mod (r, 12) + 1, mod (r, 15) + 1, mod (r, 300000) + 1,
       decode (mod (r, 3), 0, 'I', 1, 'C', 2, 'X', 'U'),
       dbms_random.value (1, 100), NULL
  from generator g
where g.r <= 1e5
/ 

commit;

prompt  ===================
prompt  Loading request ...
prompt  ===================

insert /*+ append */ into request
with generator as 
(select rownum r
          from (select rownum r from dual connect by rownum <= 1000) a,
               (select rownum r from dual connect by rownum <= 1000) b,
               (select rownum r from dual connect by rownum <= 1000) c
         where rownum <= 1e7
       ) 
select 
        r, mod (r, 4) + 1, 1, to_char (r, '0000000000'),
        'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890123456789' || to_char (r, '000000'),
        'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789012345678' || to_char (r, '000000')
  from generator g
where g.r <= 4e6
/ 

commit;

create index ix1_episode_cross_ref on episode (cross_ref_id);

create index ix1_product_episode on product (episode_id);
create index ix2_product_type on product (prod_type_id);

create index ix1_request_site on request (site_id);
create index ix2_request_dept on request (department_id);
create index ix3_request_cross_ref on request (cross_ref_id);
create index ix4_request on request (cross_ref_id, site_id);

exec dbms_stats.gather_schema_stats ('test_user')

Note that there is a call to gather_schema_stats() at the end, rather than a set of 4 calls to gather_table_stats(); you may want to change this. The entire data set, including indexes, will need about 1.5GB of free space.

 

davidkurtz's picture

How Not to Build A(n Autonomous) Data Warehouse

My day job involves investigating and resolving performance problems, so I get to see a lot of bad stuff.  Often, these problems have their roots in poor design.  It is not surprising. but is nonetheless disappointing, that when I point this out I am told that the system is either delivered this way by the vendor, or it has already been built and it is too late to change.
In the last couple of years, I have worked on several data warehouse applications that have provided the inspiration for a new presentation that I am giving at the DOAG and UKOUG conferences this year.
The presentation and this series of related blogs have several objectives:

  • Partly, it is an attempt to get some advice out in the hope that some of these poor design decisions are not made in the first place. 
  • I get to have a bit of a rant about how to design a data warehouse properly, and I will feel better at the end it.
  • I get to attend two really good conferences where I will learn (or be reminded of) much.

This particular blog post is an index to various topics within the presentation, that I have turned into separate blog postings.

Oracle Sales History (SH) Sample Schema

I cannot illustrate issues with actual examples from client systems, no matter how well anonymised they are. So, I have recreated some of the most egregious mistakes using the Oracle Sales History example schema, SH.  Everything I am going to show you does have an origin in the real world!  The SH sample schema is a simple and very typical star schema.  It is well designed and implemented, and represents good practice. I encourage you to install and play with it for yourself.
My examples use a single fact table (SALES), that has five dimensions around it (CUSTOMERS, CHANNELS, PRODUCTS, PROMOTIONS and TIMES), although I will only use three. COUNTRIES is a dimension on the CUSTOMERS dimension, sometimes called a 'snowflake'.

In some of my examples, I have deliberately broken the data model in exactly the same way that I have seen it broken in real life. So, if you find yourself saying "nobody would actually do that!", let me assure you that they did!

Common Mistakes in Data Warehouse Design and Build

As each blog post is published, I will add a link to them in this section.

davidkurtz's picture

Data Warehouse Design Mistakes 3: Date Dimensions Without Date Primary Keys

This post is part of a series that discusses some common issues in data warehouses.

Good Practice 

It is not uncommon to see a time dimension with one row for every day. This approach saves putting functions on the date column of the fact table.  For example, in the Oracle Sales History sample schema:

  • The primary key on the time dimension is a date.
  • There are 37 different attribute columns. This saves coding complex SQL functions to group dates.
#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">Name                                      Null?    Type
----------------------------------------- -------- -----------
TIME_ID NOT NULL DATE
DAY_NAME NOT NULL VARCHAR2(9)
DAY_NUMBER_IN_WEEK NOT NULL NUMBER(1)
DAY_NUMBER_IN_MONTH NOT NULL NUMBER(2)
CALENDAR_WEEK_NUMBER NOT NULL NUMBER(2)
FISCAL_WEEK_NUMBER NOT NULL NUMBER(2)
WEEK_ENDING_DAY NOT NULL DATE
WEEK_ENDING_DAY_ID NOT NULL NUMBER
CALENDAR_MONTH_NUMBER NOT NULL NUMBER(2)
FISCAL_MONTH_NUMBER NOT NULL NUMBER(2)
CALENDAR_MONTH_DESC NOT NULL VARCHAR2(8)
CALENDAR_MONTH_ID NOT NULL NUMBER
FISCAL_MONTH_DESC NOT NULL VARCHAR2(8)
FISCAL_MONTH_ID NOT NULL NUMBER
DAYS_IN_CAL_MONTH NOT NULL NUMBER
DAYS_IN_FIS_MONTH NOT NULL NUMBER
END_OF_CAL_MONTH NOT NULL DATE
END_OF_FIS_MONTH NOT NULL DATE
CALENDAR_MONTH_NAME NOT NULL VARCHAR2(9)
FISCAL_MONTH_NAME NOT NULL VARCHAR2(9)
CALENDAR_QUARTER_DESC NOT NULL CHAR(7)
CALENDAR_QUARTER_ID NOT NULL NUMBER
FISCAL_QUARTER_DESC NOT NULL CHAR(7)
FISCAL_QUARTER_ID NOT NULL NUMBER
DAYS_IN_CAL_QUARTER NOT NULL NUMBER
DAYS_IN_FIS_QUARTER NOT NULL NUMBER
END_OF_CAL_QUARTER NOT NULL DATE
END_OF_FIS_QUARTER NOT NULL DATE
CALENDAR_QUARTER_NUMBER NOT NULL NUMBER(1)
FISCAL_QUARTER_NUMBER NOT NULL NUMBER(1)
CALENDAR_YEAR NOT NULL NUMBER(4)
CALENDAR_YEAR_ID NOT NULL NUMBER
FISCAL_YEAR NOT NULL NUMBER(4)
FISCAL_YEAR_ID NOT NULL NUMBER
DAYS_IN_CAL_YEAR NOT NULL NUMBER
DAYS_IN_FIS_YEAR NOT NULL NUMBER
END_OF_CAL_YEAR NOT NULL DATE
END_OF_FIS_YEAR NOT NULL DATE

I have seen some time dimensions with 200 years of dates (eg 1900 – 2100), this is fine if your queries specify a date range on the dimension, but if you have an unbounded inequality such as simply after a date then you will get a lot of rows from the time dimension. This can also bring challenges, so keep it sensible.

Not Good Practice 

You should represent a date as a date and not as a string or a number.

  • 31st December 2018
  • Not 20181231 
  • Not '20181231' 

If you represent a date as a string you can get miscalculations in the optimizer.

  • For example, the difference between 31st December 2018 and 1st January 2019 should be 1 day.
  • However, if you use the string representation of a day it is 20190101-20181231=8870 
  • This can lead to bad cardinality estimates, and in more complex queries it can cause subsequent bad optimizer decisions later in the query. 

I will create a copy of the TIMES dimension, but with a numeric primary key, and I will add the original date in a new attribute column CALENDAR_DATE.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">INSERT /*+APPEND*/ INTO bad_times
( TIME_ID, CALENDAR_DATE, DAY_NAME
...
)
SELECT
TO_NUMBER(TO_CHAR(TIME_ID,'YYYYMMDD')) time_id, TIME_ID calendar_date, DAY_NAME
...
FROM times
/

TIME_ID is now a number that contains the date string, and CALENDAR_DATE is the original date value

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">   TIME_ID CALENDAR_
---------- ---------
19980101 01-JAN-98
19980102 02-JAN-98
19980103 03-JAN-98
19980104 04-JAN-98
19980105 05-JAN-98
19980106 06-JAN-98
19980107 07-JAN-98
19980108 08-JAN-98
19980109 09-JAN-98
19980110 10-JAN-98
...

I will also create a copy of sales where I have used the same numeric format for TIME_ID. The foreign key between these is still on TIME_ID, but that column is now a number.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">INSERT /*+APPEND*/ INTO bad_sales
(prod_id, cust_id, time_id, channel_id, promo_id, quantity_sold, amount_sold)
SELECT prod_id, cust_id, TO_NUMBER(TO_CHAR(time_id,'yyyymmdd')), channel_id, promo_id, quantity_sold, amount_sold
FROM sales
/

I will demonstrate the problem with a simple monthly sales analysis report for FY 1999.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT  t.fiscal_year, t.fiscal_month_id, t.fiscal_month_desc
, COUNT(*) num_sales
, SUM(s.amount_sold) total_amount_sold
FROM sales s
, customers u
, products p
, times t
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND u.cust_id = s.cust_id
AND t.time_id >= TO_DATE('27121998','DDMMYYYY')
AND t.time_id < TO_DATE('27121999','DDMMYYYY')
GROUP BY t.fiscal_year, t.fiscal_month_id, t.fiscal_month_desc
ORDER BY 1
/

Plan hash value: 3667272686

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 165 (100)| | | | 13 |00:00:00.28 | 500 | 303 | | | |
| 1 | SORT GROUP BY | | 1 | 365 | 21170 | 165 (14)| 00:00:01 | | | 13 |00:00:00.28 | 500 | 303 | 2048 | 2048 | 2048 (0)|
|* 2 | HASH JOIN | | 1 | 365 | 21170 | 164 (13)| 00:00:01 | | | 365 |00:00:00.28 | 500 | 303 | 1298K| 1298K| 1527K (0)|
| 3 | PART JOIN FILTER CREATE | :BF0000 | 1 | 366 | 8784 | 14 (0)| 00:00:01 | | | 365 |00:00:00.01 | 14 | 0 | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| TIMES | 1 | 366 | 8784 | 14 (0)| 00:00:01 | | | 365 |00:00:00.01 | 14 | 0 | | | |
|* 5 | INDEX RANGE SCAN | TIMES_PK | 1 | 366 | | 3 (0)| 00:00:01 | | | 365 |00:00:00.01 | 3 | 0 | | | |
| 6 | VIEW | VW_GBC_5 | 1 | 366 | 12444 | 150 (14)| 00:00:01 | | | 365 |00:00:00.28 | 486 | 303 | | | |
| 7 | HASH GROUP BY | | 1 | 366 | 4758 | 150 (14)| 00:00:01 | | | 365 |00:00:00.28 | 486 | 303 | 1063K| 1063K| 2552K (0)|
| 8 | PARTITION RANGE AND | | 1 | 230K| 2924K| 133 (4)| 00:00:01 |KEY(AP)|KEY(AP)| 247K|00:00:00.21 | 486 | 303 | | | |
|* 9 | TABLE ACCESS FULL | SALES | 5 | 230K| 2924K| 133 (4)| 00:00:01 |KEY(AP)|KEY(AP)| 247K|00:00:00.21 | 486 | 303 | | | |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ITEM_1"="T"."TIME_ID")
5 - access("T"."TIME_ID">=TO_DATE(' 1998-12-27 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T"."TIME_ID"=TO_DATE(' 1998-12-27 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

The literal predicate on T.TIME_ID is repeated on S.TIME_ID in the predicate section of the execution plan due to the optimizer performing transitive closure.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT  t.fiscal_year, t.fiscal_month_id, t.fiscal_month_desc
, COUNT(*) num_sales
, SUM(s.amount_sold) total_amount_sold
from bad_sales s
, customers u
, products p
, bad_times t
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND u.cust_id = s.cust_id
AND t.calendar_date >= TO_DATE('27121998','DDMMYYYY')
AND t.calendar_date < TO_DATE('27121999','DDMMYYYY')
GROUP BY t.fiscal_year, t.fiscal_month_id, t.fiscal_month_desc
ORDER BY 1
/

Plan hash value: 4232725394

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 509 (100)| | | | 13 |00:00:00.15 | 540 | | | |
| 1 | SORT GROUP BY | | 1 | 366 | 22692 | 509 (19)| 00:00:01 | | | 13 |00:00:00.15 | 540 | 2048 | 2048 | 2048 (0)|
|* 2 | HASH JOIN | | 1 | 366 | 22692 | 508 (18)| 00:00:01 | | | 365 |00:00:00.15 | 540 | 1335K| 1335K| 1207K (0)|
| 3 | JOIN FILTER CREATE | :BF0001 | 1 | 366 | 10980 | 16 (0)| 00:00:01 | | | 365 |00:00:00.01 | 56 | | | |
| 4 | PART JOIN FILTER CREATE | :BF0000 | 1 | 366 | 10980 | 16 (0)| 00:00:01 | | | 365 |00:00:00.01 | 56 | | | |
|* 5 | TABLE ACCESS FULL | BAD_TIMES | 1 | 366 | 10980 | 16 (0)| 00:00:01 | | | 365 |00:00:00.01 | 56 | | | |
| 6 | VIEW | VW_GBC_5 | 1 | 1460 | 46720 | 492 (19)| 00:00:01 | | | 370 |00:00:00.15 | 483 | | | |
| 7 | HASH GROUP BY | | 1 | 1460 | 16060 | 492 (19)| 00:00:01 | | | 370 |00:00:00.15 | 483 | 1079K| 1079K| 2561K (0)|
| 8 | JOIN FILTER USE | :BF0001 | 1 | 918K| 9870K| 423 (6)| 00:00:01 | | | 250K|00:00:00.10 | 483 | | | |
| 9 | PARTITION RANGE JOIN-FILTER| | 1 | 918K| 9870K| 423 (6)| 00:00:01 |:BF0000|:BF0000| 250K|00:00:00.09 | 483 | | | |
|* 10 | TABLE ACCESS FULL | BAD_SALES | 5 | 918K| 9870K| 423 (6)| 00:00:01 |:BF0000|:BF0000| 250K|00:00:00.09 | 483 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

If we do the same query on the new CALENDAR_DATE attribute column on BAD_TIMES dimension we still expect 366 rows from TIMES, but then we expect 918K rows from BAD_SALES instead of 230K rows from SALES.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT  t.fiscal_year, t.fiscal_month_id, t.fiscal_month_desc
, COUNT(*) num_sales
, SUM(s.amount_sold) total_amount_sold
FROM bad_sales s
, customers u
, products p
, bad_times t
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND u.cust_id = s.cust_id
AND t.time_id >= 19982712
AND t.time_id < 19992712
GROUP BY t.fiscal_year, t.fiscal_month_id, t.fiscal_month_desc
ORDER BY 1
/

Plan hash value: 1098183223

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 176 (100)| | | | 13 |00:00:00.16 | 459 | | | |
| 1 | SORT GROUP BY | | 1 | 445 | 24030 | 176 (18)| 00:00:01 | | | 13 |00:00:00.16 | 459 | 2048 | 2048 | 2048 (0)|
|* 2 | HASH JOIN | | 1 | 445 | 24030 | 175 (18)| 00:00:01 | | | 365 |00:00:00.16 | 459 | 1355K| 1355K| 1540K (0)|
| 3 | PART JOIN FILTER CREATE| :BF0000 | 1 | 445 | 9790 | 16 (0)| 00:00:01 | | | 365 |00:00:00.01 | 56 | | | |
|* 4 | TABLE ACCESS FULL | BAD_TIMES | 1 | 445 | 9790 | 16 (0)| 00:00:01 | | | 365 |00:00:00.01 | 56 | | | |
| 5 | VIEW | VW_GBC_5 | 1 | 470 | 15040 | 159 (19)| 00:00:01 | | | 365 |00:00:00.16 | 402 | | | |
| 6 | HASH GROUP BY | | 1 | 470 | 5170 | 159 (19)| 00:00:01 | | | 365 |00:00:00.16 | 402 | 1079K| 1079K| 2576K (0)|
| 7 | PARTITION RANGE AND | | 1 | 295K| 3177K| 138 (7)| 00:00:01 |KEY(AP)|KEY(AP)| 247K|00:00:00.10 | 402 | | | |
|* 8 | TABLE ACCESS FULL | BAD_SALES | 4 | 295K| 3177K| 138 (7)| 00:00:01 |KEY(AP)|KEY(AP)| 247K|00:00:00.09 | 402 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

But, now if I use the numeric value of the date in the predicate it incorrectly calculates the cardinality of TIMES as 445 instead of 366. The estimate of rows from BAD_SALES is not as far off as the previous query at only 296K rows, but it is still higher than the original 230K.
We see that if you query the BAD_TIMES dimension by an attribute column, then things will mostly work quite well, but if you put a criterion directly on the TIME_ID column, you will get cardinality misestimates. The queries in this example are very simple, but if more tables were joined, cardinality misestimates would have more opportunity to cause plan regressions.

Workaround

If you are stuck with a numeric primary key on your time dimension that you cannot change, you might be able to change the value you actually put into it. If the TIME_ID was the Julian date value all the arithmetic would correctly because there would be no gaps between months and years.

#8EAADB .5pt; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-padding-alt: 0cm 5.4pt 0cm 5.4pt; mso-yfti-tbllook: 1184;">
#4472C4; border-right: none; border: solid #4472C4 1.0pt; mso-background-themecolor: accent1; mso-border-bottom-alt: solid #4472C4 .5pt; mso-border-bottom-themecolor: accent1; mso-border-left-alt: solid #4472C4 .5pt; mso-border-left-themecolor: accent1; mso-border-themecolor: accent1; mso-border-top-alt: solid #4472C4 .5pt; mso-border-top-themecolor: accent1; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
#4472C4; border-bottom: solid #4472C4 1.0pt; border-left: none; border-right: none; border-top: solid #4472C4 1.0pt; mso-background-themecolor: accent1; mso-border-bottom-alt: solid #4472C4 .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themecolor: accent1; mso-border-top-alt: solid #4472C4 .5pt; mso-border-top-themecolor: accent1; mso-border-top-themecolor: accent1; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
From
#4472C4; border-bottom: solid #4472C4 1.0pt; border-left: none; border-right: none; border-top: solid #4472C4 1.0pt; mso-background-themecolor: accent1; mso-border-bottom-alt: solid #4472C4 .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themecolor: accent1; mso-border-top-alt: solid #4472C4 .5pt; mso-border-top-themecolor: accent1; mso-border-top-themecolor: accent1; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
To
#4472C4; border-bottom: solid #4472C4 1.0pt; border-left: none; border-right: none; border-top: solid #4472C4 1.0pt; mso-background-themecolor: accent1; mso-border-bottom-alt: solid #4472C4 .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themecolor: accent1; mso-border-top-alt: solid #4472C4 .5pt; mso-border-top-themecolor: accent1; mso-border-top-themecolor: accent1; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
Difference
#4472C4; border-left: none; border: solid #4472C4 1.0pt; mso-background-themecolor: accent1; mso-border-bottom-alt: solid #4472C4 .5pt; mso-border-bottom-themecolor: accent1; mso-border-right-alt: solid #4472C4 .5pt; mso-border-right-themecolor: accent1; mso-border-themecolor: accent1; mso-border-top-alt: solid #4472C4 .5pt; mso-border-top-themecolor: accent1; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
Function
#D9E2F3; border-top: none; border: solid #8EAADB 1.0pt; mso-background-themecolor: accent1; mso-background-themetint: 51; mso-border-alt: solid #8EAADB .5pt; mso-border-themecolor: accent1; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
Date
#D9E2F3; border-bottom: solid #8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 51; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
31st December 2018
#D9E2F3; border-bottom: solid #8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 51; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
1st January 2019
#D9E2F3; border-bottom: solid #8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 51; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
1
#D9E2F3; border-bottom: solid #8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 51; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
#8EAADB 1.0pt; mso-border-alt: solid #8EAADB .5pt; mso-border-themecolor: accent1; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
Number
#8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
20181231
#8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
20190101
#8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
8070
#8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
TO_CHAR(…,'YYYMMDD')
#D9E2F3; border-top: none; border: solid #8EAADB 1.0pt; mso-background-themecolor: accent1; mso-background-themetint: 51; mso-border-alt: solid #8EAADB .5pt; mso-border-themecolor: accent1; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
Julian
#D9E2F3; border-bottom: solid #8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 51; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
2458484
#D9E2F3; border-bottom: solid #8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 51; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
2458485
#D9E2F3; border-bottom: solid #8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 51; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
1
#D9E2F3; border-bottom: solid #8EAADB 1.0pt; border-left: none; border-right: solid #8EAADB 1.0pt; border-top: none; mso-background-themecolor: accent1; mso-background-themetint: 51; mso-border-alt: solid #8EAADB .5pt; mso-border-bottom-themecolor: accent1; mso-border-bottom-themetint: 153; mso-border-left-alt: solid #8EAADB .5pt; mso-border-left-themecolor: accent1; mso-border-left-themetint: 153; mso-border-right-themecolor: accent1; mso-border-right-themetint: 153; mso-border-themecolor: accent1; mso-border-themetint: 153; mso-border-top-alt: solid #8EAADB .5pt; mso-border-top-themecolor: accent1; mso-border-top-themetint: 153; padding: 0cm 5.4pt 0cm 5.4pt;" valign="top">
TO_CHAR(….'J')

davidkurtz's picture

Data Warehouse Design Mistakes 1: Lack of Foreign Key Constraints

This post is part of a series that discusses some common issues in data warehouses.

What is the Purpose of Foreign Keys?

  • In the context of relational databases, a foreign key is a column, or group of columns, on one table that uniquely identifies a row in another table.  So, a foreign key on a child record uniquely identifies a row on the parent table.
  • If a foreign key constraint is enforced (which they are when the foreign key constraint is enabled, which they are by default) then you cannot insert a child for which you cannot find a parent, and you cannot delete a parent for which children exist.  This guarantees that data is, and remains, referentially integral.
A foreign key constraint can also permit the Oracle optimizer to perform certain optimisations in executing a SQL query.  
  • If you are querying a child and parent table (or if you prefer a fact and dimension table) without referencing any attribute on the parent (or dimension), then the optimizer can rely on the foreign key constraint to know that the parent row will always be present and so omit the parent table from the query.  This is called 'foreign key join elimination', and I will discuss this in more detail below.
I think that documenting relationships between tables through foreign keys helps developers build sensible SQL code that follows the data model, performs better, and is easier to understand and therefore easier to maintain.
If you have foreign key constraints defined in an OLTP system, it is typical to see them enabled exactly because they enforce referential integrity.
In a data warehouse there is less emphasis on revalidating data, but saving effort though join elimination is important.  Unfortunately, it is all too common to see downstream data warehouses without foreign key constraints at all.  It seems to be very easy to find an excuse not to build them. While researching this blog I came across this blog posting that I thought had the ring of truth of about it.

"9 Reasons Why There Are No Foreign Keys in Your Database" (Piotr Kononow)

  1. Performance: degrades DML performance as foreign keys are validated
  2. Legacy data is not referentially integral in the first place.
  3. Full Table Reload.  Should disable, reload, and then re-enable and revalidate constraints.
  4. High-Level Framework doesn't create foreign keys.
  5. Cross-Database relations
  6. Database platform agnosticism (eg. PeopleSoft)
  7. Open for Change
  8. Lazy Architect
  9. Table relationships are not clear/revealed.

I'll add one more unacceptable excuse this list.

  • Extracting referentially integral data from an OLTP system into the data warehouse, so there is no need for more foreign keys to revalidate it there again.

Foreign Key Join Elimination

If you are querying a child and parent table, without referencing any attribute on the parent, then the optimizer can rely on the foreign key constraint to know that the parent row will always be present and so omit the parent table from the query.  
Let's start with a very simple demonstration, on the Sales History demo schema, of a query of the fact table (SALES), and three dimension tables (PRODUCTS, TIMES, CUSTOMERS).  I am referencing attribute columns on the PRODUCT and TIMES tables, but nothing on the CUSTOMERS table other than the primary key column CUST_ID in a join predicate.

Without Foreign Key Constraints

I have disabled the foreign key constraints between these tables to model the situation without foreign key constraints.
#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">ALTER TABLE sales MODIFY CONSTRAINT sales_channel_fk DISABLE NOVALIDATE;
ALTER TABLE sales MODIFY CONSTRAINT sales_customer_fk DISABLE NOVALIDATE;
ALTER TABLE sales MODIFY CONSTRAINT sales_product_fk DISABLE NOVALIDATE;
ALTER TABLE sales MODIFY CONSTRAINT sales_promo_fk DISABLE NOVALIDATE;
ALTER TABLE sales MODIFY CONSTRAINT sales_time_fk DISABLE NOVALIDATE;

This is a query for certain products (Electronics) in a single year (1999).

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT  p.prod_category
, t.fiscal_year
, COUNT(*)
FROM sales s
, products p
, times t
, customers c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND t.fiscal_year = 2001
AND c.cust_id = s.cust_id
AND p.prod_category = 'Electronics'
AND p.prod_category = 'Software/Other'
GROUP BY p.prod_category
, t.fiscal_year
ORDER BY 1
/
Throughout these tests, having set STATISTICS_LEVEL=ALL in order to collect additional runtime statistics, I have will extract execution plans, as follows

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">select * from table(dbms_xplan.display_cursor(null,null,'ADVANCED +ADAPTIVE PROJECTION +ALLSTATS LAST, IOSTATS'));

You can see that all 4 tables (or indexes on them) referenced in the SQL appear in the execution plan (in bold).

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 80%; line-height: 90%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1167 (100)| | | | 1 |00:00:00.21 | 3386 | 472 | | | |
| 1 | SORT GROUP BY NOSORT | | 1 | 1 | 44 | 1167 (3)| 00:00:01 | | | 1 |00:00:00.21 | 3386 | 472 | | | |
| 2 | NESTED LOOPS | | 1 | 2840 | 122K| 1167 (3)| 00:00:01 | | | 2840 |00:00:00.21 | 3386 | 472 | | | |
| 3 | VIEW | VW_GBF_35 | 1 | 2840 | 108K| 1167 (3)| 00:00:01 | | | 2840 |00:00:00.19 | 544 | 472 | | | |
| 4 | HASH GROUP BY | | 1 | 2840 | 138K| 1167 (3)| 00:00:01 | | | 2840 |00:00:00.19 | 544 | 472 | 1137K| 1137K| 1403K (0)|
|* 5 | HASH JOIN | | 1 | 41362 | 2019K| 1163 (3)| 00:00:01 | | | 23678 |00:00:00.17 | 544 | 472 | 1695K| 1695K| 1571K (0)|
| 6 | PART JOIN FILTER CREATE | :BF0000 | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
|* 7 | TABLE ACCESS FULL | TIMES | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
|* 8 | HASH JOIN | | 1 | 165K| 6156K| 1146 (3)| 00:00:01 | | | 26637 |00:00:00.17 | 488 | 472 | 1572K| 1572K| 1390K (0)|
|* 9 | VIEW | index$_join$_002 | 1 | 13 | 273 | 2 (0)| 00:00:01 | | | 13 |00:00:00.01 | 5 | 0 | | | |
|* 10 | HASH JOIN | | 1 | | | | | | | 13 |00:00:00.01 | 5 | 0 | 1355K| 1355K| 1376K (0)|
|* 11 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 | 13 | 273 | 1 (0)| 00:00:01 | | | 13 |00:00:00.01 | 1 | 0 | | | |
| 12 | INDEX FAST FULL SCAN | PRODUCTS_PK | 1 | 13 | 273 | 1 (0)| 00:00:01 | | | 72 |00:00:00.01 | 4 | 0 | | | |
| 13 | PARTITION RANGE JOIN-FILTER| | 1 | 918K| 14M| 1136 (2)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.13 | 482 | 472 | | | |
| 14 | TABLE ACCESS FULL | SALES | 5 | 918K| 14M| 1136 (2)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.13 | 482 | 472 | | | |
|* 15 | INDEX UNIQUE SCAN | CUSTOMERS_PK | 2840 | 1 | 5 | 0 (0)| | | | 2840 |00:00:00.01 | 2842 | 0 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

With Foreign Key Constraints

I have re-enabled the foreign key constraints

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 90%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">ALTER TABLE sales MODIFY CONSTRAINT sales_channel_fk enable validate;
ALTER TABLE sales MODIFY CONSTRAINT sales_customer_fk enable validate;
ALTER TABLE sales MODIFY CONSTRAINT sales_product_fk enable validate;
ALTER TABLE sales MODIFY CONSTRAINT sales_promo_fk enable validate;
ALTER TABLE sales MODIFY CONSTRAINT sales_time_fk enable validate;

Now the CUSTOMERS table is no longer referenced due to foreign key join elimination.  The nested loop operation (at line 2 of the previous plan) to join the CUSTOMERS table is no longer required because the query does not need anything from that dimension table.  The foreign key constraint tells the optimizer that there will always be a corresponding row for every SALES record.  Thus not joining to it does not change the result of the query.  There is one less operation requiring PGA memory, and the number of buffers required, and the optimizer cost has also dropped.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 80%; line-height: 90%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1163 (100)| | | | 1 |00:00:00.87 | 544 | 472 | | | |
| 1 | SORT GROUP BY NOSORT | | 1 | 1 | 45 | 1163 (3)| 00:00:01 | | | 1 |00:00:00.87 | 544 | 472 | | | |
|* 2 | HASH JOIN | | 1 | 41362 | 1817K| 1163 (3)| 00:00:01 | | | 23678 |00:00:00.85 | 544 | 472 | 1695K| 1695K| 1669K (0)|
| 3 | PART JOIN FILTER CREATE | :BF0000 | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
|* 4 | TABLE ACCESS FULL | TIMES | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
|* 5 | HASH JOIN | | 1 | 165K| 5346K| 1146 (3)| 00:00:01 | | | 26637 |00:00:00.79 | 488 | 472 | 1572K| 1572K| 1331K (0)|
|* 6 | VIEW | index$_join$_002 | 1 | 13 | 273 | 2 (0)| 00:00:01 | | | 13 |00:00:00.01 | 5 | 0 | | | |
|* 7 | HASH JOIN | | 1 | | | | | | | 13 |00:00:00.01 | 5 | 0 | 1355K| 1355K| 1377K (0)|
|* 8 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 | 13 | 273 | 1 (0)| 00:00:01 | | | 13 |00:00:00.01 | 1 | 0 | | | |
| 9 | INDEX FAST FULL SCAN | PRODUCTS_PK | 1 | 13 | 273 | 1 (0)| 00:00:01 | | | 72 |00:00:00.01 | 4 | 0 | | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 1 | 918K| 10M| 1136 (2)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.43 | 482 | 472 | | | |
| 11 | TABLE ACCESS FULL | SALES | 5 | 918K| 10M| 1136 (2)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.25 | 482 | 472 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note
-----
- rely constraint used for this statement

From Oracle 12.2, when a constraint has been used to remove a table from an execution plan, you get a note in the execution plan "rely constraint used for this statement".  Though it doesn't tell you which table has been eliminated, nor due to which constraint.
Admittedly, the actual run time of the query has gone up because there is more work for the optimizer to do.  However, this is an unusually small test, so it has swamped the savings in not visiting the CUSTOMERS table.  Foreign key join elimination should generally deliver a performance improvement because the query processes less data.

With Disabled Reliable Constraints

If you don't want the overhead and complexity of enforced foreign key constraints, you can still get the benefit of foreign key join elimination.  It is possible to disable the constraint from SALES to CUSTOMER so that it is not enforced, but it can still tell the optimizer that it can RELY upon the referential integrity of the data as if the constraint were enforced.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">ALTER TABLE sales MODIFY CONSTRAINT sales_customer_fk RELY DISABLE NOVALIDATE;

This behaviour has been available since constraints were introduced in Oracle 8 (c. 1998), but there has been a change in Oracle 12c.  Foreign key join elimination on disabled RELY constraints does not occur if QUERY_REWRITE_INTEGRITY is set to its default value of ENFORCED.  It must be set to either TRUSTED or STALE_TOLERATED.  However, bear in mind that setting it to STALE_TOLERATED also affects how materialized views can be used by query rewrite.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">ALTER SESSION SET query_rewrite_integrity = TRUSTED;

And now we are back to the same plan as before without the CUSTOMERS table, and with the "rely constraint used for this statement" note.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 80%; line-height: 90%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1164 (100)| | | | 1 |00:00:01.55 | 544 | 472 | | | |
| 1 | SORT GROUP BY NOSORT | | 1 | 1 | 45 | 1164 (3)| 00:00:01 | | | 1 |00:00:01.55 | 544 | 472 | | | |
|* 2 | HASH JOIN | | 1 | 81133 | 3565K| 1164 (3)| 00:00:01 | | | 110K|00:00:01.46 | 544 | 472 | 1476K| 1476K| 1528K (0)|
|* 3 | VIEW | index$_join$_002 | 1 | 26 | 546 | 2 (0)| 00:00:01 | | | 26 |00:00:00.01 | 5 | 0 | | | |
|* 4 | HASH JOIN | | 1 | | | | | | | 26 |00:00:00.01 | 5 | 0 | 1298K| 1298K| 1612K (0)|
|* 5 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX | 1 | 26 | 546 | 1 (0)| 00:00:01 | | | 26 |00:00:00.01 | 1 | 0 | | | |
| 6 | INDEX FAST FULL SCAN | PRODUCTS_PK | 1 | 26 | 546 | 1 (0)| 00:00:01 | | | 72 |00:00:00.01 | 4 | 0 | | | |
|* 7 | HASH JOIN | | 1 | 229K| 5369K| 1160 (3)| 00:00:01 | | | 246K|00:00:01.00 | 538 | 472 | 1695K| 1695K| 1683K (0)|
| 8 | PART JOIN FILTER CREATE | :BF0000 | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
|* 9 | TABLE ACCESS FULL | TIMES | 1 | 364 | 4368 | 16 (0)| 00:00:01 | | | 364 |00:00:00.01 | 55 | 0 | | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 1 | 918K| 10M| 1136 (2)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.47 | 482 | 472 | | | |
| 11 | TABLE ACCESS FULL | SALES | 5 | 918K| 10M| 1136 (2)| 00:00:01 |:BF0000|:BF0000| 296K|00:00:00.28 | 482 | 472 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note
-----
- rely constraint used for this statement

Multi-Column Foreign Key Join Elimination 

It has always been possible to create foreign keys on multiple columns, sometimes called composite foreign keys.  From Oracle 12.2, it is possible to get join elimination on composite foreign keys. Jonathan Lewis has published a blog with demonstration scripts.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 90%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">ALTER TABLE child 
add constraint child_fk_parent foreign key (id_g, id_p)
references parent (id_g, id);

However, he has also found a case where join elimination depends on the order of tables in the from clause – (currently unpublished) bug 22228669.  The workaround is to list parents in the from clause before their children.
The need for single column keys inevitably leads to meaningless keys, often generated from a sequence.  Another of Jonathan's blogs and its comments discusses the pros and cons. 
Updated 19th November: It depends how important foreign key join elimination is to you.  How much use a system makes of this feature will depend upon how it is written.  If you think that you may need to use this feature then, at least until this bug is resolved, I would stick with single column keys in Oracle 12.2.

Summary of Good Practice

  • Primary keys (with unique indexes) on all tables.
  • Define foreign key constraints on all dimension columns of fact tables referencing the primary or unique keys on the dimension tables through equality joins only. 
    • I think that even from Oracle 12.2 I would still avoid creating multi-column primary and foreign keys because there appears to be a bug with this feature.
  • If foreign key constraints are to be enforced, then the key columns should also be indexed avoid TM locking during DML operations.
  • Otherwise, in order to achieve join elimination mark the constraints as reliable:
  • #eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">ALTER TABLE … MODIFY CONSTRAINT … RELY NOVALIDATE DISABLE
    • From Oracle 12 you must also set QUERY_REWRITE_INTEGRITY to TRUSTED or STALE_TOLERATED.

Richard Foote's picture

Presenting at AUSOUG Connect 2018 Conference in Melbourne, 21 November 2018.

  After initially not being in a position to make it this year, I will now be presenting at the AUSOUG Connect 2018 Conference in Melbourne this coming Wednesday, 21 November 2018. My presentation will be: “12c Release 2 and 18c – New Indexing Related Features” Oracle Database 12.2 and 18.3 releases have introduced a number […]

davidkurtz's picture

Data Warehouse Design Mistakes 2: Effective Dating Dimensions

This post is part of a series that discusses some common issues in data warehouses.

I have seen situations where some dimensions are effective-dated.  That is to say that there are multiple rows in the dimension table for the same main dimension id, but for different date ranges.  At least one of the date columns has to become part of the primary key.
The dimensions in the Sales History sample schema have got effective from and to dates, but these columns are not part of the primary key.  Let's imagine that the products get updated every year and a new version of the product is sold, requiring a new dimension row.

The Wrong Way

I am going to build a copy of the PRODUCTS table with a new effective dated row for each calendar year in which the product was sold.  The structure and indexing of the BAD_PROUCTS dimension table is unchanged, except that the primary key is not just on PROD_ID but now additionally includes PROD_EFF_FROM

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">CREATE TABLE bad_products (
prod_id NUMBER(6) NOT NULL,
prod_name VARCHAR2(50) NOT NULL,
prod_desc VARCHAR2(4000) NOT NULL,
prod_subcategory VARCHAR2(50) NOT NULL,
prod_subcategory_id NUMBER NOT NULL,
prod_subcategory_desc VARCHAR2(2000) NOT NULL,
prod_category VARCHAR2(50) NOT NULL,
prod_category_id NUMBER NOT NULL,
prod_category_desc VARCHAR2(2000) NOT NULL,
prod_weight_class NUMBER(3) NOT NULL,
prod_unit_of_measure VARCHAR2(20) ,
prod_pack_size VARCHAR2(30) NOT NULL,
supplier_id NUMBER(6) NOT NULL,
prod_status VARCHAR2(20) NOT NULL,
prod_list_price NUMBER(8,2) NOT NULL,
prod_min_price NUMBER(8,2) NOT NULL,
prod_total VARCHAR2(13) NOT NULL,
prod_total_id NUMBER NOT NULL,
prod_src_id NUMBER ,
prod_eff_from DATE ,
prod_eff_to DATE ,
prod_valid VARCHAR2(1) )
/
ALTER TABLE bad_products
ADD CONSTRAINT bad_products_pk PRIMARY KEY (prod_id, prod_eff_from)
/

Now, I will populate BAD_PRODUCTS by copying PRODUCTS.  I will create rows for each product for each year in which it was sold.  I started with 72 rows in PRODUCTS, but I have 275 rows in BAD_PRODUCTS.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">INSERT INTO bad_products
WITH s AS (
SELECT s.prod_id
, TRUNC(s.time_id,'YYYY') time_id
from sales s
GROUP BY s.prod_id, TRUNC(s.time_id,'YYYY')
)
SELECT p.prod_id
,p.prod_name
,p.prod_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_subcategory
,p.prod_subcategory_id
,p.prod_subcategory_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_category
,p.prod_category_id
,p.prod_category_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_weight_class
,p.prod_unit_of_measure
,p.prod_pack_size
,p.supplier_id
,p.prod_status
,p.prod_list_price
,p.prod_min_price
,p.prod_total
,p.prod_total_id
,p.prod_src_id
,s.time_id PROD_EFF_FROM
,ADD_MONTHS(s.time_id,12)-1 PROD_EFF_TO
,p.prod_valid
from products p
, s
where s.prod_id = p.prod_id
/

Finally, I will build the same indexes as on the PRODUCTS table.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">CREATE INDEX bad_products_prod_cat_ix ON bad_products (prod_category);
CREATE BITMAP INDEX bad_products_prod_status_bix ON bad_products (prod_status);
CREATE INDEX bad_products_prod_subcat_ix ON bad_products (prod_subcategory);

I can't now create a foreign key for the product on the SALES table.
I can’t use PROD_ID only because that does not match the primary key.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">ALTER TABLE sales
ADD CONSTRAINT sales_bad_product_fk
FOREIGN KEY (prod_id) REFERENCES bad_products (prod_id)
;
ERROR at line 3:
ORA-02270: no matching unique or primary key for this column-list

I can't build the foreign key on PROD_ID and TIME_ID, because the TIME_IDs are merely inside the effective range but are not on the PRODUCTS table

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">ALTER TABLE sales
ADD CONSTRAINT sales_bad_product_fk
FOREIGN KEY (prod_id, time_id) REFERENCES bad_products (prod_id, prod_eff_from)
;
ERROR at line 2:
ORA-02298: cannot validate (SH.SALES_BAD_PRODUCT_FK) - parent keys not found

When I come to query the data, I have no choice but to code two inequality conditions on the product table with one of them on a column that is not part of the foreign key.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">
from sales s
, bad_products p
, times t
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND t.time_id >= p.prod_eff_from
AND (t.time_id <= p.prod_eff_to OR p.prod_eff_to IS NULL)

So, even on Oracle 12.2, even if I create the above multi-column constraint NOVALIDATE RELY and set QUERY_REWRITE_INTEGRITY to TRUSTED, I have no chance of being able to eliminate the product table with foreign key join elimination.

The Right Way 

If the product changes and we need a new dimension row for an updated product we will have a new product, with a new product ID, but will with effective dates for that year. Here, I have created an exact copy of the SALES table, called SALES2, but now the product ID is prefixed with the calendar year.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">INSERT /*+APPEND*/ INTO sales2
SELECT
prod_id+1000*TO_NUMBER(TO_CHAR(time_id,'YYYY')) prod_id
,cust_id
,time_id
,channel_id
,promo_id
,quantity_sold
,amount_sold
FROM sales
/

And similarly, I have created a copy of the PRODUCTS dimension table, called PRODUCTS2 with additional product rows for each year that the product was sold, but again the product ID has been prefixed with the calendar year. The primary key is still just PROD_ID and all the indexes and foreign keys are exactly the same as on the original tables.  Both BAD_PRODUCTS and PRODUCTS2 have the 275 rows.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">INSERT INTO products2
WITH s AS (
SELECT s.prod_id
, TRUNC(s.time_id,'YYYY') time_id
FROM sales s
GROUP BY s.prod_id, TRUNC(s.time_id,'YYYY')
)
SELECT p.PROD_ID+(1000*TO_NUMBER(TO_CHAR(s.time_id,'YYYY')))
,p.prod_name
,p.prod_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_subcategory
,p.prod_subcategory_id
,p.prod_subcategory_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_category
,p.prod_category_id
,p.prod_category_desc||' '||TO_CHAR(s.time_id,'YYYY')
,p.prod_weight_class
,p.prod_unit_of_measure
,p.prod_pack_size
,p.supplier_id
,p.prod_status
,p.prod_list_price
,p.prod_min_price
,p.prod_total
,p.prod_total_id
,p.prod_src_id
,s.time_id PROD_EFF_FROM
,ADD_MONTHS(s.time_id,12)-1 PROD_EFF_TO
,p.prod_valid
FROM products p
, s
WHERE s.prod_id = p.prod_id
/

This is effectively the same query on SALES2 and PRODUCTS2, and it returns the same results. I have kept the inequality criteria on the production effective dates with both SALES and TIMES, although they are not needed here. However, we will see that the way the tables are structured and accessed will have a cost implication.
This time I want to analyse software sales (product category 205) for the USA by fiscal year.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; line-height: 95%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT c.country_name
, p.prod_category_id
, p.prod_category
, t.fiscal_year
, COUNT(*) num_sales
, SUM(s.amount_sold) total_amount_sold
FROM sales2 s
, customers u
, products2 p
, times t
, countries c
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND u.cust_id = s.cust_id
AND u.country_id = c.country_id
AND c.country_iso_code = 'US'
AND p.prod_category_id = 205
AND t.time_id >= p.prod_eff_from
AND (t.time_id <= p.prod_eff_to OR p.prod_eff_to IS NULL)
AND s.time_id >= p.prod_eff_from
AND (s.time_id <= p.prod_eff_to OR p.prod_eff_to IS NULL)
GROUP BY c.country_name
, p.prod_category_id
, p.prod_category
, t.fiscal_year
ORDER BY 1,2
/

With Query Transformation

First the wrong way with the BAD_PROUCTS table

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 80%; line-height: 90%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">Plan hash value: 2522704620

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2000 (100)| | | | 5 |00:00:06.60 | 305K| | | |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | | | 5 |00:00:06.60 | 305K| | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D7230_A4BC21 | 1 | | | | | | | 0 |00:00:00.08 | 1525 | 1024 | 1024 | |
|* 3 | HASH JOIN | | 1 | 2921 | 81788 | 418 (1)| 00:00:01 | | | 18520 |00:00:00.05 | 1524 | 1185K| 1185K| 721K (0)|
|* 4 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 18 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
| 5 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 541K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.02 | 1521 | | | |
| 6 | SORT GROUP BY | | 1 | 5 | 495 | 1582 (1)| 00:00:01 | | | 5 |00:00:06.52 | 303K| 2048 | 2048 | 2048 (0)|
|* 7 | HASH JOIN | | 1 | 11568 | 1118K| 1580 (1)| 00:00:01 | | | 237K|00:00:06.29 | 303K| 2290K| 1666K| 2158K (0)|
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7230_A4BC21 | 1 | 2921 | 43815 | 4 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | | | |
|* 9 | HASH JOIN | | 1 | 11568 | 948K| 1576 (1)| 00:00:01 | | | 237K|00:00:06.18 | 303K| 1744K| 1744K| 1704K (0)|
| 10 | TABLE ACCESS FULL | TIMES | 1 | 1826 | 21912 | 16 (0)| 00:00:01 | | | 1826 |00:00:00.01 | 55 | | | |
|* 11 | HASH JOIN | | 1 | 30764 | 2163K| 1559 (1)| 00:00:01 | | | 237K|00:00:06.09 | 303K| 1115K| 1115K| 1352K (0)|
|* 12 | TABLE ACCESS FULL | BAD_PRODUCTS | 1 | 102 | 4284 | 5 (0)| 00:00:01 | | | 102 |00:00:00.01 | 14 | | | |
| 13 | VIEW | VW_ST_F7251F01 | 1 | 48359 | 1416K| 1554 (1)| 00:00:01 | | | 237K|00:00:05.02 | 303K| | | |
| 14 | NESTED LOOPS | | 1 | 48359 | 2361K| 1526 (1)| 00:00:01 | | | 237K|00:00:04.82 | 303K| | | |
| 15 | PARTITION RANGE ALL | | 1 | 48359 | 991K| 397 (2)| 00:00:01 | 1 | 28 | 237K|00:00:03.36 | 302K| | | |
| 16 | BITMAP CONVERSION TO ROWIDS | | 28 | 48359 | 991K| 397 (2)| 00:00:01 | | | 237K|00:00:03.15 | 302K| | | |
| 17 | BITMAP AND | | 28 | | | | | | | 16 |00:00:03.01 | 302K| | | |
| 18 | BITMAP MERGE | | 28 | | | | | | | 16 |00:00:00.04 | 1819 | 1024K| 512K|36864 (0)|
| 19 | BITMAP KEY ITERATION | | 28 | | | | | | | 1572 |00:00:00.03 | 1819 | | | |
| 20 | BUFFER SORT | | 28 | | | | | | | 2856 |00:00:00.01 | 9 | 73728 | 73728 | |
|* 21 | TABLE ACCESS BY INDEX ROWID BATCHED| BAD_PRODUCTS | 1 | 102 | 816 | 24 (0)| 00:00:01 | | | 102 |00:00:00.01 | 9 | | | |
| 22 | BITMAP CONVERSION TO ROWIDS | | 1 | | | | | | | 275 |00:00:00.01 | 1 | | | |
| 23 | BITMAP INDEX FULL SCAN | BAD_PRODUCTS_PROD_STATUS_BIX | 1 | | | | | | | 1 |00:00:00.01 | 1 | | | |
|* 24 | BITMAP INDEX RANGE SCAN | SALES_PROD_BIX | 2856 | | | | | 1 | 28 | 1572 |00:00:00.02 | 1810 | | | |
| 25 | BITMAP MERGE | | 28 | | | | | | | 16 |00:00:02.97 | 301K| 8316K| 1039K| 317K (0)|
| 26 | BITMAP KEY ITERATION | | 28 | | | | | | | 19186 |00:00:02.85 | 301K| | | |
| 27 | BUFFER SORT | | 28 | | | | | | | 296K|00:00:00.42 | 0 | 26M| 1871K| 865K (0)|
| 28 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7230_A4BC21 | 5 | 2921 | 14605 | 4 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | | | |
|* 29 | BITMAP INDEX RANGE SCAN | SALES_CUST_BIX | 296K| | | | | 1 | 28 | 19186 |00:00:01.97 | 301K| | | |
| 30 | TABLE ACCESS BY USER ROWID | SALES | 237K| 1 | 29 | 1157 (1)| 00:00:01 | ROWID | ROWID | 237K|00:00:00.92 | 829 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

11 - access("ITEM_3"="P"."PROD_ID")
filter(("ITEM_1">="P"."PROD_EFF_FROM" AND ("ITEM_1"<="P"."PROD_EFF_TO" OR "P"."PROD_EFF_TO" IS NULL)))

And now the right way with PRODUCTS2. At first glance the plans are not very different, both plans do a star transformation, but the devil is in the detail.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 80%; line-height: 90%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1207 (100)| | | | 5 |00:00:05.08 | 304K| | | |
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | | | | | | 5 |00:00:05.08 | 304K| | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D7231_A4BC21 | 1 | | | | | | | 0 |00:00:00.07 | 1525 | 1024 | 1024 | |
|* 3 | HASH JOIN | | 1 | 2921 | 81788 | 418 (1)| 00:00:01 | | | 18520 |00:00:00.05 | 1524 | 1185K| 1185K| 689K (0)|
|* 4 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 18 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
| 5 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 541K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.02 | 1521 | | | |
| 6 | SORT GROUP BY | | 1 | 5 | 435 | 789 (1)| 00:00:01 | | | 5 |00:00:05.01 | 302K| 2048 | 2048 | 2048 (0)|
|* 7 | HASH JOIN | | 1 | 17849 | 1516K| 787 (1)| 00:00:01 | | | 237K|00:00:04.83 | 302K| 2290K| 1666K| 2123K (0)|
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7231_A4BC21 | 1 | 2921 | 43815 | 4 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | | | |
|* 9 | HASH JOIN | | 1 | 17849 | 1255K| 782 (1)| 00:00:01 | | | 237K|00:00:04.74 | 302K| 1744K| 1744K| 1704K (0)|
| 10 | TABLE ACCESS FULL | TIMES | 1 | 1826 | 21912 | 16 (0)| 00:00:01 | | | 1826 |00:00:00.01 | 55 | | | |
|* 11 | HASH JOIN | | 1 | 17849 | 1045K| 766 (1)| 00:00:01 | | | 237K|00:00:04.68 | 302K| 1298K| 1298K| 1658K (0)|
|* 12 | TABLE ACCESS FULL | PRODUCTS2 | 1 | 102 | 2856 | 5 (0)| 00:00:01 | | | 102 |00:00:00.01 | 14 | | | |
| 13 | VIEW | VW_ST_F7251F01 | 1 | 17937 | 560K| 761 (1)| 00:00:01 | | | 237K|00:00:04.18 | 302K| | | |
| 14 | NESTED LOOPS | | 1 | 17937 | 963K| 733 (1)| 00:00:01 | | | 237K|00:00:03.99 | 302K| | | |
| 15 | PARTITION RANGE ALL | | 1 | 17936 | 402K| 395 (1)| 00:00:01 | 1 | 28 | 237K|00:00:02.76 | 302K| | | |
| 16 | BITMAP CONVERSION TO ROWIDS | | 28 | 17936 | 402K| 395 (1)| 00:00:01 | | | 237K|00:00:02.60 | 302K| | | |
| 17 | BITMAP AND | | 28 | | | | | | | 16 |00:00:02.49 | 302K| | | |
| 18 | BITMAP MERGE | | 28 | | | | | | | 16 |00:00:00.03 | 1700 | 1024K| 512K| 9216 (0)|
| 19 | BITMAP KEY ITERATION | | 28 | | | | | | | 399 |00:00:00.02 | 1700 | | | |
| 20 | BUFFER SORT | | 28 | | | | | | | 2856 |00:00:00.01 | 9 | 73728 | 73728 | |
|* 21 | TABLE ACCESS BY INDEX ROWID BATCHED| PRODUCTS2 | 1 | 102 | 1020 | 24 (0)| 00:00:01 | | | 102 |00:00:00.01 | 9 | | | |
| 22 | BITMAP CONVERSION TO ROWIDS | | 1 | | | | | | | 275 |00:00:00.01 | 1 | | | |
| 23 | BITMAP INDEX FULL SCAN | PRODUCTS2_PROD_STATUS_BIX | 1 | | | | | | | 1 |00:00:00.01 | 1 | | | |
|* 24 | BITMAP INDEX RANGE SCAN | SALES2_PROD_BIX | 2856 | | | | | 1 | 28 | 399 |00:00:00.01 | 1691 | | | |
| 25 | BITMAP MERGE | | 28 | | | | | | | 16 |00:00:02.46 | 300K| 8400K| 1050K| 312K (0)|
| 26 | BITMAP KEY ITERATION | | 28 | | | | | | | 19192 |00:00:02.36 | 300K| | | |
| 27 | BUFFER SORT | | 28 | | | | | | | 296K|00:00:00.33 | 0 | 26M| 1871K| 865K (0)|
| 28 | TABLE ACCESS FULL | SYS_TEMP_0FD9D7231_A4BC21 | 5 | 2921 | 14605 | 4 (0)| 00:00:01 | | | 18520 |00:00:00.01 | 0 | | | |
|* 29 | BITMAP INDEX RANGE SCAN | SALES2_CUST_BIX | 296K| | | | | 1 | 28 | 19192 |00:00:01.63 | 300K| | | |
| 30 | TABLE ACCESS BY USER ROWID | SALES2 | 237K| 1 | 32 | 366 (1)| 00:00:01 | ROWID | ROWID | 237K|00:00:00.76 | 830 | | | |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

9 - access("ITEM_3"="T"."TIME_ID")
11 - access("ITEM_2"="P"."PROD_ID")

The key differences are:

  • At line 24 of both plans. 1572 rows are returned from the SALES_PROD_BIX index because it is returning every effective dated software product, but only 399 rows are returned from SALES2_PROD_BID because the PROD_ID is different for every fiscal year.
  • At line 11, hashing the BAD_PRODUCTS table took 1.07s because it was necessary to evaluate the inequality conditions, whereas it only took 0.50s to hash PRODUCTS2 with a simple equality join. 
  • The optimizer cost with PRODUCTS2 is much lower, and the overall response time is 5.08s instead of 6.60s.

Without Query Transformation 

If I repeat the test without star transformation enabled, I again see differences in the hash joining of the product dimensions. First on BAD_PRODUCTS

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 80%; line-height: 90%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">Plan hash value: 2324880021

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 1592 (100)| | | | 5 |00:00:01.94 | 5805 | 4128 | | | |
| 1 | SORT GROUP BY | | 1 | 11 | 1144 | 1592 (3)| 00:00:01 | | | 5 |00:00:01.94 | 5805 | 4128 | 2048 | 2048 | 2048 (0)|
|* 2 | HASH JOIN | | 1 | 9845 | 999K| 1590 (3)| 00:00:01 | | | 237K|00:00:01.74 | 5805 | 4128 | 1744K| 1744K| 1620K (0)|
| 3 | PART JOIN FILTER CREATE | :BF0000 | 1 | 1826 | 21912 | 16 (0)| 00:00:01 | | | 1826 |00:00:00.01 | 55 | 0 | | | |
| 4 | TABLE ACCESS FULL | TIMES | 1 | 1826 | 21912 | 16 (0)| 00:00:01 | | | 1826 |00:00:00.01 | 55 | 0 | | | |
|* 5 | HASH JOIN | | 1 | 26182 | 2352K| 1574 (3)| 00:00:01 | | | 237K|00:00:01.65 | 5700 | 4128 | 1115K| 1115K| 1390K (0)|
|* 6 | TABLE ACCESS FULL | BAD_PRODUCTS | 1 | 102 | 4284 | 5 (0)| 00:00:01 | | | 102 |00:00:00.01 | 14 | 0 | | | |
|* 7 | HASH JOIN | | 1 | 48360 | 2361K| 1568 (3)| 00:00:01 | | | 526K|00:00:01.07 | 5685 | 4128 | 2290K| 1666K| 2216K (0)|
|* 8 | HASH JOIN | | 1 | 2921 | 81788 | 418 (1)| 00:00:01 | | | 18520 |00:00:00.04 | 1524 | 0 | 1236K| 1236K| 771K (0)|
|* 9 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 18 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | 0 | | | |
| 10 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 541K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.02 | 1521 | 0 | | | |
| 11 | PARTITION RANGE JOIN-FILTER| | 1 | 918K| 19M| 1142 (3)| 00:00:01 |:BF0000|:BF0000| 918K|00:00:00.77 | 4160 | 4128 | | | |
| 12 | TABLE ACCESS FULL | SALES | 20 | 918K| 19M| 1142 (3)| 00:00:01 |:BF0000|:BF0000| 918K|00:00:00.76 | 4160 | 4128 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("S"."TIME_ID"="T"."TIME_ID")
filter((("T"."TIME_ID"<="P"."PROD_EFF_TO" OR "P"."PROD_EFF_TO" IS NULL) AND "T"."TIME_ID">="P"."PROD_EFF_FROM"))
5 - access("S"."PROD_ID"="P"."PROD_ID")
filter(("S"."TIME_ID">="P"."PROD_EFF_FROM" AND ("S"."TIME_ID"<="P"."PROD_EFF_TO" OR "P"."PROD_EFF_TO" IS NULL)))

And then on PRODUCTS2

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 80%; line-height: 90%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">Plan hash value: 1297664275

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 875 (100)| | | | 5 |00:00:00.92 | 3126 | | | |
| 1 | SORT GROUP BY | | 1 | 11 | 1012 | 875 (4)| 00:00:01 | | | 5 |00:00:00.92 | 3126 | 2048 | 2048 | 2048 (0)|
|* 2 | HASH JOIN | | 1 | 17849 | 1603K| 873 (4)| 00:00:01 | | | 237K|00:00:00.75 | 3126 | 1744K| 1744K| 1621K (0)|
| 3 | PART JOIN FILTER CREATE | :BF0000 | 1 | 1826 | 21912 | 16 (0)| 00:00:01 | | | 1826 |00:00:00.01 | 55 | | | |
| 4 | TABLE ACCESS FULL | TIMES | 1 | 1826 | 21912 | 16 (0)| 00:00:01 | | | 1826 |00:00:00.01 | 55 | | | |
|* 5 | HASH JOIN | | 1 | 17849 | 1394K| 857 (4)| 00:00:01 | | | 237K|00:00:00.70 | 3070 | 1298K| 1298K| 1703K (0)|
|* 6 | TABLE ACCESS FULL | PRODUCTS2 | 1 | 102 | 2856 | 5 (0)| 00:00:01 | | | 102 |00:00:00.01 | 14 | | | |
|* 7 | HASH JOIN | | 1 | 48360 | 2455K| 852 (4)| 00:00:01 | | | 526K|00:00:00.59 | 3055 | 2290K| 1666K| 2216K (0)|
|* 8 | HASH JOIN | | 1 | 2921 | 81788 | 418 (1)| 00:00:01 | | | 18520 |00:00:00.04 | 1524 | 1236K| 1236K| 774K (0)|
|* 9 | TABLE ACCESS FULL | COUNTRIES | 1 | 1 | 18 | 2 (0)| 00:00:01 | | | 1 |00:00:00.01 | 2 | | | |
| 10 | TABLE ACCESS FULL | CUSTOMERS | 1 | 55500 | 541K| 416 (1)| 00:00:01 | | | 55500 |00:00:00.03 | 1521 | | | |
| 11 | PARTITION RANGE JOIN-FILTER| | 1 | 918K| 21M| 426 (6)| 00:00:01 |:BF0000|:BF0000| 918K|00:00:00.35 | 1530 | | | |
| 12 | TABLE ACCESS FULL | SALES2 | 20 | 918K| 21M| 426 (6)| 00:00:01 |:BF0000|:BF0000| 918K|00:00:00.35 | 1530 | | | |

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  • One difference is where we hash join the product dimension. BAD_PRODUCTS takes 0.58s, PRODUCTS2 takes just 0.11s.
  • The Bloom filter on SALES2 takes just 0.35s, while the filter on SALES takes 0.76s and requires fewer buffers because the PROD_ID equality join is more selective.

Summary 

  • Effective dating leads to inequality conditions between fact and dimension tables.
    • Can't express inequalities in foreign key – it has to be an equality join to primary key/unique key.
    • It is impossible to achieve foreign key join elimination on these dimensions.
    • You have to do more work evaluating the inequality conditions when you eventually hash join these dimensions into the result set. 
    • Bloom Filters only work with equality predicates. They rough match data, but there can be false positive results. The subsequent hash join does an exact match on a reduced volume of data. The Bloom cannot reduce the data on the inequality conditions, so they return more data to the hash operation that then takes longer to execute.
  • Conclusion
    • You should rigorously follow the star schema principles so you can always define a foreign key between facts and dimensions.
Jonathan Lewis's picture

num_index_keys

The title is the name of an Oracle hint that came into existence in Oracle 10.2.0.3 and made an appearance recently in a question on the rarely used “My Oracle Support” Community forum (you’ll need a MOS account to be able to read the original). I wouldn’t have found it but the author also emailed me the link asking if I could take a look at it.  (If you want to ask me for help – without paying me, that is – then posting a public question in the Oracle (ODC) General Database or SQL forums and emailing me a private link is the strategy most likely to get an answer, by the way.)

The question was about a very simple query using a straightforward index – with a quirky change of plan after upgrading from 10.2.0.3 to 12.2.0.1. Setting the optimizer_features_enable to ‘10.2.0.3’ in the 12.2.0.1 system re-introduced the 10g execution plan. Here’s the query:


SELECT t1.*
   FROM   DW1.t1
  WHERE   t1.C1 = '0001' 
    AND   t1.C2 IN ('P', 'F', 'C')
    AND   t1.C3 IN (
                    '18110034450001',
                    '18110034450101',
                    '18110034450201',
                    '18110034450301',
                    '18110034450401',
                    '18110034450501'
          );
 

Information supplied: t1 holds about 500 million rows at roughly 20 rows per block, the primary key index is (c1, c2, c3, c4), there are just a few values for each of c1, c2 and c4, while c3 is “nearly unique” (which, for clarity, was expanded to “the number of distinct values of c3 is virtually the same as the number of rows in the table”).

At the moment we don’t have any information about histograms and we don’t known whether or not “nearly unique” might still allow a few values of c3 to have a large number of duplicates, so that’s something we might want to follow up on later.

Here are the execution plans – the fast one (from 10g) first, then the slow (12c) plan – and you should look carefully at the predicate section of the two plans:


10g (pulled from memory with rowsource execution statistics enabled)
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   1 |  INLIST ITERATOR             |                  |      1 |        |      6 |00:00:00.01 |      58 |      5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1               |     18 |      5 |      6 |00:00:00.01 |      58 |      5 |
|*  3 |    INDEX RANGE SCAN          | PK_T1            |     18 |      5 |      6 |00:00:00.01 |      52 |      4 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P')) AND (("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR "C3"='18110034450401' OR
              "C3"='18110034450501')))

 

12c (from explain plan)
---------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |     1 |   359 |     7   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                  |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1               |     1 |   359 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | PK_T1            |     1 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND ("T1"."C2"='C' OR "T1"."C2"='F' OR
              "T1"."C2"='P'))
       filter("C3"='18110034450001' OR "C3"='18110034450101' OR
              "C3"='18110034450201' OR "C3"='18110034450301' OR
              "C3"='18110034450401' OR "C3"='18110034450501')
  

When comparing plans it’s better, of course, to present the same sources from the two systems, it’s not entirely helpful to have the generated plan from explain plan in one version and a run-time plan with stats in the other – given the choice I’d like to see the run-time from both. Despite this, I felt fairly confident that the prediction would match the run-time for 12c and that I could at least guess the “starts” figure for 12c.

The important thing to notice is the way that the access predicate in 10g has split into an access predicate followed by a filter predicate in 12c. So 12c is going to iterate three times (once for each of the values  ‘C’, ‘F’, ‘P’) and then walk a potentially huge linked list of index leaf blocks looking for 6 values of c3, while 10g is going to probe the index 18 times (3 combinations of c2 x six combinations of c3) to find “nearly unique” rows which means probably one leaf block per probe.

The 12c plan was taking minutes to run, the 10g plan was taking less than a second. The difference in execution time was probably the effect of the 12c plan ranging through (literally) thousands of index leaf blocks.

There are many bugs and anomalies relating to in-list iteration and index range scans and cardinality calculations – here’s a quick sample of v$system_fix_control in 12.2.0.1:


select optimizer_feature_enable ofe, sql_feature, bugno, description
from v$system_fix_control
where
	optimizer_feature_enable between '10.2.0.4' and '12.2.0.1'
and	(   sql_feature like '%CBO%'
	 or sql_feature like '%CARDINALITY%'
	)
and	(    lower(description) like '%list%'
	 or  lower(description) like '%iterat%'
	 or  lower(description) like '%multi%col%'
	)
order by optimizer_feature_enable, sql_feature, bugno
;

OFE        SQL_FEATURE                      BUGNO DESCRIPTION
---------- --------------------------- ---------- ----------------------------------------------------------------
10.2.0.4   QKSFM_CBO_5259048              5259048 undo unused inlist
           QKSFM_CBO_5634346              5634346 Relax equality operator restrictions for multicolumn inlists

10.2.0.5   QKSFM_CBO_7148689              7148689 Allow fix of bug 2218788 for in-list predicates

11.1.0.6   QKSFM_CBO_5139520              5139520 kkoDMcos: For PWJ on list dimension, use part/subpart bits

11.2.0.1   QKSFM_CBO_6818410              6818410 eliminate redundant inlist predicates

11.2.0.2   QKSFM_CBO_9069046              9069046 amend histogram column tracking for multicolumn stats

11.2.0.3   QKSFM_CARDINALITY_11876260    11876260 use index filter inlists with extended statistics
           QKSFM_CBO_10134677            10134677 No selectivity for transitive inlist predicate from equijoin
           QKSFM_CBO_11834739            11834739 adjust NDV for list partition key column after pruning
           QKSFM_CBO_11853331            11853331 amend index cost compare with inlists as filters
           QKSFM_CBO_12591120            12591120 check inlist out-of-range values with extended statistics

11.2.0.4   QKSFM_CARDINALITY_12828479    12828479 use dynamic sampling cardinality for multi-column join key check
           QKSFM_CARDINALITY_12864791    12864791 adjust for NULLs once for multiple inequalities on nullable colu
           QKSFM_CARDINALITY_13362020    13362020 fix selectivity for skip scan filter with multi column stats
           QKSFM_CARDINALITY_14723910    14723910 limit multi column group selectivity due to NDV of inlist column
           QKSFM_CARDINALITY_6873091      6873091 trim histograms based on in-list predicates
           QKSFM_CBO_13850256            13850256 correct estimates for transitive inlist predicate with equijoin

12.2.0.1   QKSFM_CARDINALITY_19847091    19847091 selectivity caching for inlists
           QKSFM_CARDINALITY_22533539    22533539 multi-column join sanity checks for table functions
           QKSFM_CARDINALITY_23019286    23019286 Fix cdn estimation with multi column stats on fixed data types
           QKSFM_CARDINALITY_23102649    23102649 correction to inlist element counting with constant expressions
           QKSFM_CBO_17973658            17973658 allow partition pruning due to multi-inlist iterator
           QKSFM_CBO_21057343            21057343 order predicate list
           QKSFM_CBO_22272439            22272439 correction to inlist element counting with bind variables

There are also a number of system parameters relating to inlists that are new (or have changed values) in 12.2.0.1 when compared with 10.2.0.3 – but I’m not going to go into those right now.

I was sufficiently curious about this anomaly that I emailed the OP to say I would be happy to take a look at the 10053 trace files for the query – the files probably weren’t going to be very large given that it was only a single table query – but in the end it turned out that I solved the problem before he’d had time to email them. (Warning – don’t email me a 10053 file on spec; if I want one I’ll ask for it.)

Based on the description I created an initial model of the problem – it took about 10 minutes to code:


rem     Tested on 12.2.0.1, 18.3.0.1

drop table t1 purge;

create table t1 (
	c1 varchar2(4) not null,
	c2 varchar2(1) not null,
	c3 varchar2(15) not null,
	c4 varchar2(4)  not null,
	v1 varchar2(250)
)
;

insert into t1
with g as (
	select rownum id 
	from dual
	connect by level <= 1e4 -- > hint to avoid wordpress format issue
)
select
	'0001',
	chr(65 + mod(rownum,11)),
	'18110034'||lpad(1+100*rownum,7,'0'),
	lpad(mod(rownum,9),4,'0'),
	rpad('x',250,'x')
from
	g,g
where
        rownum <= 1e5 -- > hint to avoid wordpress format issue
;


create unique index t1_i1 on t1(c1, c2, c3, c4);

begin
        dbms_stats.gather_table_stats(
                null,
                't1',
                method_opt => 'for all columns size 1'
        );
end;
/

alter session set statistics_level = all;
set serveroutput off

prompt	==========================
prompt	Default optimizer features
prompt	==========================

select
        /*+ optimizer_features_enable('12.2.0.1') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

select 
        /*+ optimizer_features_enable('10.2.0.3') */
	t1.*
FROM	t1
WHERE
	t1.c1 = '0001' 
AND	t1.c2 in ('H', 'F', 'C')
AND	t1.c3 in (
		'18110034450001',
		'18110034450101',
		'18110034450201',
		'18110034450301',
		'18110034450401',
		'18110034450501'
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

alter session set statistics_level = all;
set serveroutput off

The two queries produced the same plan – regardless of the setting for optimizer_features_enable – it was the plan originally used by the OP’s 10g setting:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |    20 (100)|      0 |00:00:00.01 |      35 |
|   1 |  INLIST ITERATOR             |       |      1 |        |            |      0 |00:00:00.01 |      35 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     18 |      2 |    20   (0)|      0 |00:00:00.01 |      35 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     18 |      2 |    19   (0)|      0 |00:00:00.01 |      35 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')) AND
              (("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501')))

There was one important difference between the 10g and the 12c plans – in 10g the cost of the table access (hence the cost of the total query) was 20; in 12c it jumped to 28 – maybe there’s a change in the arithmetic for costing the iterator, and maybe that’s sufficient to cause a problem.

Before going further it’s worth checking what the costs would look like (and, indeed, if the plan is possible in both versions) if we force Oracle into the “bad” plan. That’s where we finally get to the hint in the title of this piece. If I add the hint /*+ num_index_keys(t1 t1_i1 2) */ what’s going to happen ? (Technically I’ve included a hint to use the index, and specified the query block name to make sure Oracle doesn’t decide to switch to a tablescan):


select
        /*+
            optimizer_features_enable('12.2.0.1')
            index_rs_asc(@sel$1 t1@sel$1 (t1.c1 t1.c2 t1.c3 t1.c4))
            num_index_keys(@sel$1 t1@sel$1 t1_i1 2)
        */
        t1.*
FROM        t1
WHERE
        t1.c1 = '0001'
AND        t1.c2 in ('H', 'F', 'C')
AND        t1.c3 in (
                '18110034450001',
                '18110034450101',
                '18110034450201',
                '18110034450301',
                '18110034450401',
                '18110034450501'
        )
;

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   150 (100)|      0 |00:00:00.01 |     154 |      1 |
|   1 |  INLIST ITERATOR                     |       |      1 |        |            |      0 |00:00:00.01 |     154 |      1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      3 |     18 |   150   (2)|      0 |00:00:00.01 |     154 |      1 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |      3 |     18 |   142   (3)|      0 |00:00:00.01 |     154 |      1 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."C1"='0001' AND (("T1"."C2"='C' OR "T1"."C2"='F' OR "T1"."C2"='H')))
       filter(("T1"."C3"='18110034450001' OR "T1"."C3"='18110034450101' OR "T1"."C3"='18110034450201' OR
              "T1"."C3"='18110034450301' OR "T1"."C3"='18110034450401' OR "T1"."C3"='18110034450501'))

This was the plan from 12.2.0.1 – and again the plan for 10.2.0.3 was identical except for costs which became 140 for the index range scan and 141 for the table access. At first sight it looks like 10g may be using the total selectivity of the entire query as the scaling factor for the index clustering_factor to find the table cost while 12c uses the cost of accessing the table for one iteration (rounding up) before multiplying by the number of iterations.

Having observed this detail I thought I’d do a quick test of what happened by default if I requested 145 distinct values of c3. Both versions defaulted to the access/filter path rather than the pure access path – but again there was a difference in costs. The 10g index cost was 140 with a table access cost of 158, while 12c had an index cost of 179 and a table cost of 372. So both versions switch plans at some point – do they switch at the same point ? Reader, I could not resist temptation, so I ran a test loop. With my data set the 12c version switched paths at 61 values in the in-list and 10g switched at 53 values –

Conclusion: there’s been a change in the selectivity calculations for the use of in-list iterators, which leads to a change in costs, which can lead to a change in plans; the OP was just unlucky with his data set and stats. Possibly there’s something about his data or stats that makes the switch appear with a much smaller in-list than mine.

Footnote:

When I responded to the thread on MOSC with the suggestion that the problem was in part due to statistics and might be affected by out of date stats (or a histogram on the (low-frequency) c2 column) the OP noted that stats hadn’t been gathered since some time in August – and found that the 12c path changed to the efficient (10g) one after re-gathering stats on the table.

 

Richard Foote's picture

Enable Index To Search For NULLs By Adding Constant to Index List. But Some Constants Better Than Others (Never Let Me Down)

By default, Oracle doesn’t index an entry if all columns within the index are NULL. However, (as I’ve blogged previously), it’s possible to index all possible NULL values by simply adding a constant value to the index column list. Importantly, the CBO knows when a column has all it’s NULL values indexed and can potentially […]

fritshoogland's picture

opatch investigations

This blogpost is about opatch and how to obtain information about the current oracle home(s), and how to obtain information about the patches to be applied.

Patches that can be applied using opatch are provided by oracle as zip files which have the following naming convention:
p[patchnumber]_[baseversion]_[platform]-[architecture].zip. The patch normally contains an XML file called ‘PatchSearch.xml’ and a directory with the patch number. Inside the patch number directory there is a README.txt which is lame, because it says ‘Refer to README.html’, and a README.html that contains the readme information that is also visible when the [README] button for this patch is selected in MOS.

I spend my time on the CLI exclusively. This is because I spend my time on remote servers all the time, and using the X window system would be unusable. The best part of using the CLI is that when done correctly, it gives you almost infinite control over what you do, while when clicking through an interface toggling selections and filling out fields makes you entering information that you have to copy from a document or make up on the spot, which then is quickly hidden by entering another tab or window. In fact, based on my experience, this is a guaranteed way of generating wrong or inconsistent results.

Because of being on the CLI almost exclusively on servers, I sometimes need to read the README.html. I can do that in MOS, but sometimes you want that information on the spot. You can open up the HTML file in ‘vim’ or ‘less’, but it will show you a lot of HTML making it very hard to read. What I find useful is installing an executable called ‘elinks’ (available on oracle linux via yum), and then read the README.html in this text based browser:

$ elinks README.html

Result:

                                             Oracle® Database Patch % psuBug % - Database Release Update % product_version % (1/12)
                                                       Go to primary content

   Patch 28655784 - Database Release Update 18.4.0.0.181016

   ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

   The script content on this page is for navigation purposes only and does not alter the content in any way.

                                                          Oracle® Database

   Patch 28655784 - Database Release Update 18.4.0.0.181016

   This document is accurate at the time of release. For any changes and additional information regarding Database Release Update
   18.4.0.0.181016, see these related documents that are available at My Oracle Support (http://support.oracle.com/):

     * Document 2433586.1 Oracle DB/GI/OJVM Update/Revision R18 Oct 2018 Known Issues

   This document includes the following sections:

     * Section 1, "Patch Information"

     * Section 2, "Prerequisites"

     * Section 3, "Installation"

     * Section 4, "Deinstallation"

     * Section 5, "Known Issues"

     * Section 6, "References"

     * Section 7, "Bugs Fixed by This Patch"

     * Section 8, "Documentation Accessibility"

1 Patch Information

   Database Release Update 18.4.0.0.181016 patches are cumulative. That is, the content of all previous Database bundles is
   included in the latest Database bundle patch.

   To install the Database Release Update 18.4.0.0.181016 patch, the Oracle home must have the 18.1 Database installed.
http://support.oracle.com/                                                                                                 [------]

The first thing I do, is obtain information about this patch. Based on my experience, very carefully examine the readme. It contains vital information about the patch, but I found inconsistencies in it in the past. I got the impression the readme’s of patches that are created with a certain interval are simply copied from a previous version and then corrected by hand, and things might be entered wrong or are forgotten and not touched, leaving wrong information.

opatch lspatches on an unzipped patch

Luckily, a lot of information in the readme is actually stored in the metadata of the patch. You can query the metadata of the patch to be applied using opatch lspatches by pointing it to the patch directory:
(in this example I am querying patch 28655784, which is RU 18.4, and my database oracle home is version 18.3)

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch lspatches 28655784/
patch_id:28655784
unique_patch_id:22509982
date_of_patch:8 Oct 2018, 21:27:28 hrs PST8PDT
patch_description:Database Release Update : 18.4.0.0.181016 (28655784)
component:oracle.rdbms.rsf.ic,18.0.0.0.0,optional; oracle.oracore.rsf,18.0.0.0.0,optional; oracle.ctx.atg,18.0.0.0.0,optional; oracle.rdbms.rman,18.0.0.0.0,optional; oracle.rdbms.rsf,18.0.0.0.0,optional; oracle.sdo.locator.jrf,18.0.0.0.0,optional; ....
platform:226,Linux x86-64
executable:ORACLE_HOME/lib/libclntsh.so.18.1; ORACLE_HOME/lib/libasmclntsh18.so; ORACLE_HOME/lib/libskgxp11.so; ORACLE_HOME/lib/libskgxp18.so; ORACLE_HOME/lib/libsqlplus.so; ORACLE_HOME/bin/oracle; ...
instance_shutdown:true
online_rac_installable:true
patch_type:singleton
product_family:db
auto:false
bug:28571483, TRACKING BUG TO REGRESS ALL BLR/CIS OF 27502420
...

(edited for brevity)
Line 2: patch_id: patch number
Line 5: patch_description: description </p />
</p></div>
    <div class=»

To prevent automated spam submissions leave this field empty.