Kerry Osborne's picture

How to Lock SQL Profiles Generated by SQL Tuning Advisor

I’ve mentioned (many times) that I think SQL Profiles that are generated by the SQL Tuning Advisor (STA) tend to sour over time.

After seeing it happen at a few sites I began to wonder why. So first a few facts about the SQL Profiles that STA generates:

  1. They are simply a set hints that get applied to statements behind the scenes during parsing
  2. They consist mainly of OPT_ESTIMATE hints which modify optimizer calculations
  3. They also may contain direct statistics modification hints (COLUMN_STATS, TABLE_STATS)
  4. They usually contain a OPTIMIZER_FEATURES_ENABLED hint
  5. They very occasionally contain other environment type hints (FIRST_ROWS, etc…)
  6. They do not contain directive hints (FULL, INDEX, NESTED_LOOP, etc..)
  7. The names of STA profiles start with SYS_SQLPROF
  8. STA’s goal is to do a more through job of analyzing a SQL statement to get a better plan

I wrote a little query (sql_profile_distinct_hint.sql) to pull a list of hints from a 10g database along with the number of their occurrences and ran it on several production systems where STA Profiles had been created. Here’s the output from a  system that had 14 STA Profiles.

SQL> @sql_profile_distinct_hints
Enter value for profile_name: SYS_SQLPROF%
HINT                                                 COUNT(*)
-------------------------------------------------- ----------
COLUMN_STATS                                               13
FIRST_ROWS                                                  1
IGNORE_OPTIM_EMBEDDED_HINTS                                 1
INDEX_STATS                                                 1
OPTIMIZER_FEATURES_ENABLE                                  14
OPT_ESTIMATE                                              178
TABLE_STATS                                                 2

Notice that the vast majority of hints are of the OPT_ESTIMATE variety. Now let’s have a look at the actual hints contained in a STA Profile.

SYS@LAB112> @sql_profile_hints
Enter value for profile_name: SYS_SQLPROF_0126f1743c7d0005
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$6", SCALE_ROWS=2207.090256)
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$5", SCALE_ROWS=2261.586312)
COLUMN_STATS("KSO"."SKEW", "PK_COL", scale, length=5)
COLUMN_STATS("KSO"."SKEW", "COL1", scale, length=4 distinct=828841 nulls=12.8723033 min=1 max=1000000)
TABLE_STATS("KSO"."SKEW", scale, blocks=162294 rows=35183107.66)
7 rows selected.

So on this particular STA Profile, the OPT_ESTIMATE hint has been used to tell the optimizer to change the estimate of rows for table A in query block SEL$6 by multiplying it by 2207 (roughly). In addition, there are hints which are basically hard coding table stats and column stats. So as you can see, these hints, while they may be accurate when the Profile is created, are unlikely to remain accurate over the long haul. In fairness, the OPT_ESTIMATE hint does make sense in situations where the optimizer will never get a calculation correct because of a short coming in it’s abilities (correlated columns is a good example of this type of situation). And in those conditions, implementing a STA generated Profile is a valid long term approach. But in my experience this is the exception rather than the rule.

So what are STA Profiles good for? Well two things:

First, they are very good at showing us where the optimizer is having a problem. If you look at the hints that are generated, it is easy to identify the OPT_ESTIMATE hints where the scaling factors are off the chart (hint: anything with an exponent is a place where the optimizer is struggling). This is easy to do with my sql_profile_hints.sql script by the way. Here’s a set of OPT_ESTIMATE hints. Can you spot the place where the optimizer is really having a problem?

OPT_ESTIMATE(@"SEL$5DA710D3", INDEX_FILTER, "F"@"SEL$1", IDX$$_1AA260002, SCALE_ROWS=8.883203639e-06) 
OPT_ESTIMATE(@"SEL$5DA710D3", INDEX_SKIP_SCAN, "F"@"SEL$1", IDX$$_1AA260002, SCALE_ROWS=8.883203639e-06) 
OPT_ESTIMATE(@"SEL$5DA710D3", JOIN, ("B"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=4.446153275) 
OPT_ESTIMATE(@"SEL$5DA710D3", JOIN, ("C"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=7.884506683) 
OPT_ESTIMATE(@"SEL$5DA710D3", JOIN, ("E"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=25.60960842) 
OPT_ESTIMATE(@"SEL$5DA710D3", JOIN, ("F"@"SEL$1", "B"@"SEL$1"), SCALE_ROWS=26.34181566) 
OPT_ESTIMATE(@"SEL$5DA710D3", JOIN, ("F"@"SEL$1", "B"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=839.9683673) 
OPT_ESTIMATE(@"SEL$5DA710D3", TABLE, "D"@"SEL$1", SCALE_ROWS=5.083144565e+11)

It’s the first two lines and whatever alias F refers to is our problem area. The OPT_ESTIMATE hint tells the optimizer to decrease it’s estimate by a factor of 8.883203639e-06. So the optimizer has vastly overestimated the rows that will be returned by the index.

Second, STA Profiles are sometimes capable of producing better plans. This is primarily due to the fact that STA can take as long as you give it to analyze a statement, making sure that all the optimizer’s calculations are correct. It does this by running various pieces of the statement and checking that the number of rows the optimizer has estimated are actually correct. Obviously this can take a while on complex statements, much longer than the optimizer is allowed when parsing a statement. But as I’ve already shown, the SQL Profiles that get created to enable those better plans have a pretty good chance of going sour on us over time.

Which leads me to the point of this post. We can have our cake and eat it too! We can create the SQL Profile as recommended by STA and then “lock” the plan into place by converting the OPT_ESTIMATE hints to directive type hints. I put the word “lock” in quotes because there is really no such thing as “locking” a plan. It’s just that using directive hints as opposed to OPT_ESTIMATE hints, significantly lowers the probability of the plan changing in the future. So how do we make this conversion. Well I have a script for that called lock_STA_profile.sql. Here’s an example showing how it works.

SYS@LAB112> @sql_profiles
Enter value for sql_text: 
Enter value for name: 
NAME                           CATEGORY        STATUS   SQL_TEXT                                                               FORCE
------------------------------ --------------- -------- ---------------------------------------------------------------------- -----
PROFILE_fgn6qzrvrjgnz          DEFAULT         DISABLED select /*+ index(a SKEW_COL1) */ avg(pk_col) from kso.skew a           NO
PROFILE_8hjn3vxrykmpf          DEFAULT         DISABLED select /*+ invalid_hint (doda) */ avg(pk_col) from kso.skew where col1 NO
PROFILE_69k5bhm12sz98          DEFAULT         DISABLED SELECT dbin.instance_number,        dbin.db_name, dbin.instance_name,  NO
PROFILE_8js5bhfc668rp          DEFAULT         DISABLED select /*+ index(a SKEW_COL2_COL1) */ avg(pk_col) from kso.skew a wher NO
PROFILE_bxd77v75nynd8          DEFAULT         DISABLED select /*+ parallel (a 4) */ avg(pk_col) from kso.skew a where col1 >  NO
PROFILE_7ng34ruy5awxq          DEFAULT         DISABLED select i.obj#,i.ts#,i.file#,i.block#,i.intcols,i.type#,i.flags,i.prope NO
PROF_6kymwy3guu5uq_1388734953  DEFAULT         ENABLED  select 1                                                               YES
PROFILE_cnpx9s9na938m_MANUAL   DEFAULT         ENABLED  select /*+ opt_param('statistics_level','all') */ * from kso.skew wher NO
PROF_79m8gs9wz3ndj_3723858078  DEFAULT         ENABLED  /* SQL Analyze(252,1) */ select avg(pk_col) from kso.skew              NO
PROFILE_9ywuaagwscbj7_GPS      DEFAULT         ENABLED  select avg(pk_col) from kso.skew                                       NO
PROF_arcvrg5na75sw_3723858078  DEFAULT         ENABLED  select /*+ index(skew@sel$1 skew_col1) */ avg(pk_col) from kso.skew wh NO
SYS_SQLPROF_01274114fc2b0006   DEFAULT         ENABLED  select i.table_owner, i.table_name, i.index_name, FUNCIDX_STATUS, colu NO
SYS_SQLPROF_0127d10ffaa60000   DEFAULT         ENABLED  select table_owner||'.'||table_name tname , index_name, index_type, st NO
coe_abwg9nwg8prsj_3723858078   DEFAULT         ENABLED                                                                         NO
PROF_84q0zxfzn5u6s_2650913906  TEST            ENABLED  select avg(pk_col) from kso.skew                                       NO
PROF_0pvj94afp6faw_FULL        DEFAULT         ENABLED  select /* test 1 hint */ avg(pk_col) from kso.skew a where col1 = 2222 NO
PROF_875qbqc2gw2qz_4201340344  DEFAULT         ENABLED  select /* NOT IN */ department_name                                    NO
PROF_09gdkwq1bs48h_167097056   DEFAULT         ENABLED  select /*+ index (skew skew_col3_col2_col1) */ count(*) from kso.skew  NO
PROFILE_4cp821ufcwvgc_moved    DEFAULT         ENABLED  select count(*) from kso.skew where col3 = '01-jan-10'                 NO
PROF_8wvgj0n4kh6dx_2650913906  DEFAULT         ENABLED  select avg(pk_col) from kso.skew a where col1 = 333333                 NO
PROFILE_g737q1pfmbvjj_moved    DEFAULT         ENABLED  select /*+ full (skew) */ avg(pk_col) from kso.skew where col1 = 13613 NO
PROFILE_cvdnr0b8dcxzz_MANUAL   DEFAULT         ENABLED  select /* aasdas */ avg(pk_col) from kso.skew where col1 = 136133      NO
PROF_g4gp07gt2z920_105323984   DEFAULT         ENABLED  update sys.scheduler$_job set  last_start_date = :1, running_instance  NO
25 rows selected.
SYS@LAB112> @sql_profile_hints
Enter value for profile_name: SYS_SQLPROF_01281e513ace0000
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$6", SCALE_ROWS=2207.090256)
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$5", SCALE_ROWS=2261.586312)
5 rows selected.
SYS@LAB112> @find_sql
Enter value for sql_text: SELECT TASK_LIST.TASK_ID FROM (SELECT /*+ NO_MERGE(T) ORD%
Enter value for sql_id: 
------------- ------ ---------- ----- --------- -------- --------------------------------------------------
bqfx5q2jas08u      0 2496534803    86       .00       12 SELECT TASK_LIST.TASK_ID FROM (SELECT /*+ NO_MERGE
                                                         (T) ORDERED */ T.TASK_ID FROM (SELECT * FROM DBA_A
                                                         DVISOR_TASKS ORDER BY TASK_ID DESC) T, DBA_ADVISOR
                                                         _PARAMETERS_PROJ P1, DBA_ADVISOR_PARAMETERS_PROJ P
                                                         2 WHERE T.ADVISOR_NAME='ADDM' AND T.STATUS = 'COMP
                                                         LETED' AND T.EXECUTION_START >= (SYSDATE - 1) AND
                                                         T.HOW_CREATED = 'AUTO' AND T.TASK_ID = P1.TASK_ID
                                                         AND P1.PARAMETER_NAME = 'INSTANCE' AND P1.PARAMETE
                                                         R_VALUE = SYS_CONTEXT('USERENV','INSTANCE') AND T.
                                                         TASK_ID = P2.TASK_ID AND P2.PARAMETER_NAME = 'DB_I
                                                         D' AND P2.PARAMETER_VALUE = TO_CHAR(:B1 ) ORDER BY
                                                          T.TASK_ID DESC) TASK_LIST WHERE ROWNUM = 1
1 row selected.
SYS@LAB112> @dplan
Enter value for sql_id: bqfx5q2jas08u
Enter value for child_no: 
SQL_ID  bqfx5q2jas08u, child number 0
Plan hash value: 2496534803
| Id  | Operation                          | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT                   |                        |       |       |     9 (100)|          |
|*  1 |  COUNT STOPKEY                     |                        |       |       |            |          |
|   2 |   VIEW                             |                        |     2 |    26 |     9   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                    |                        |       |       |            |          |
|   4 |     NESTED LOOPS                   |                        |     2 |   240 |     9   (0)| 00:00:01 |
|*  5 |      FILTER                        |                        |       |       |            |          |
|   6 |       NESTED LOOPS OUTER           |                        |     2 |   188 |     7   (0)| 00:00:01 |
|   7 |        NESTED LOOPS                |                        |     2 |   126 |     5   (0)| 00:00:01 |
|*  8 |         TABLE ACCESS BY INDEX ROWID| WRI$_ADV_TASKS         |     2 |    74 |     3   (0)| 00:00:01 |
|   9 |          INDEX FULL SCAN DESCENDING| WRI$_ADV_TASKS_PK      |   822 |       |     2   (0)| 00:00:01 |
|* 10 |         TABLE ACCESS BY INDEX ROWID| WRI$_ADV_PARAMETERS    |     1 |    26 |     1   (0)| 00:00:01 |
|* 11 |          INDEX UNIQUE SCAN         | WRI$_ADV_PARAMETERS_PK |     1 |       |     0   (0)|          |
|* 12 |        TABLE ACCESS BY INDEX ROWID | WRI$_ADV_EXECUTIONS    |     1 |    31 |     1   (0)| 00:00:01 |
|* 13 |         INDEX UNIQUE SCAN          | WRI$_ADV_EXECS_PK      |     1 |       |     0   (0)|          |
|* 14 |      INDEX UNIQUE SCAN             | WRI$_ADV_PARAMETERS_PK |     1 |       |     0   (0)|          |
|* 15 |     TABLE ACCESS BY INDEX ROWID    | WRI$_ADV_PARAMETERS    |     1 |    26 |     1   (0)| 00:00:01 |
Predicate Information (identified by operation id):
   1 - filter(ROWNUM=1)
  10 - filter("A"."VALUE"=TO_CHAR(:B1))
  11 - access("A"."ID"="A"."TASK_ID" AND "A"."NAME"='DB_ID')
  12 - filter("A"."ADVISOR_ID"="E"."ADVISOR_ID")
  13 - access("A"."ID"="E"."TASK_ID" AND "A"."LAST_EXEC_NAME"="E"."NAME")
  14 - access("A"."ID"="A"."TASK_ID" AND "A"."NAME"='INSTANCE')
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
   - SQL profile SYS_SQLPROF_01281e513ace0000 used for this statement
57 rows selected.
SYS@LAB112> @lock_STA_profile
Enter value for sql_id: bqfx5q2jas08u
Enter value for child_no (0): 0
Enter value for new_profile_name (PROF_sqlid_planhash): 
Enter value for force_matching (FALSE): 
PL/SQL procedure successfully completed.
SYS@LAB112> @sql_profiles
Enter value for sql_text: 
Enter value for name: %bqfx5q2jas08u%
NAME                           CATEGORY        STATUS   SQL_TEXT                                                               FORCE
------------------------------ --------------- -------- ---------------------------------------------------------------------- -----
SYS@LAB112> @sql_profile_hints
Enter value for profile_name: PROF_bqfx5q2jas08u_2496534803
INDEX_DESC(@"SEL$86A1760A" "A"@"SEL$4" ("WRI$_ADV_TASKS"."ID"))
LEADING(@"SEL$86A1760A" "A"@"SEL$4" "A"@"SEL$6" "E"@"SEL$4" "A"@"SEL$5")
USE_NL(@"SEL$86A1760A" "A"@"SEL$6")
USE_NL(@"SEL$86A1760A" "E"@"SEL$4")
USE_NL(@"SEL$86A1760A" "A"@"SEL$5")
NLJ_BATCHING(@"SEL$86A1760A" "A"@"SEL$5")
29 rows selected.
SYS@LAB112> @sql_profiles
Enter value for sql_text: 
Enter value for name: SYS%
NAME                           CATEGORY        STATUS   SQL_TEXT                                                               FORCE
------------------------------ --------------- -------- ---------------------------------------------------------------------- -----
SYS_SQLPROF_01274114fc2b0006   DEFAULT         ENABLED  select i.table_owner, i.table_name, i.index_name, FUNCIDX_STATUS, colu NO
SYS_SQLPROF_0127d10ffaa60000   DEFAULT         ENABLED  select table_owner||'.'||table_name tname , index_name, index_type, st NO
3 rows selected.
SYS@LAB112> @sql_profile_hints
Enter value for profile_name: SYS_SQLPROF_01281e513ace0000
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$6", SCALE_ROWS=2207.090256)
OPT_ESTIMATE(@"SEL$86A1760A", TABLE, "A"@"SEL$5", SCALE_ROWS=2261.586312)
5 rows selected.

So in this example I listed all the SQL Profiles in existence on the system (using sql_profile.sql). Then I showed the hints associated with STA Profile, SYS_SQLPROF_01281e513ace0000 with sql_profile_hints.sql. Then I located the sql statement in v$sql using the find_sql.sql script. Then I used dbms_xplan (via the dplan.sql script) to show the plan for the statement (proving that it was using the STA Profile). Then I used the lock_STA_profile.sql script to create a directive hint based Profile in place of the OPT_ESTIMATE hint based Profile. Then I showed the hints for the new SQL Profile. Note that the original STA Profile is not dropped, but rather moved to the SAVED category, so you can still look at its hints as I have done at the end of this example.

So that’s it. This is a complex topic and I have blogged about it before on numerous occasions. You may want to look back at this post, Oracle Support Sanctions Manually Created SQL Profiles, to get a better feel for where the hints came from that are used to replace the OPT_ESTIMATE hints. By the way, Jonathan Lewis and Tom Kyte have also written about this feature. (I trust you can find them via Google)

Also, I have written a chapter on Plan Stability in the upcoming Apress book, Pro Oracle SQL. The chapter is 65 or so pages long and it covers SQL Profiles in depth, so if you are hungry for more info on this topic, I highly recommend it. ;)

You can pre-order the book here: Pro Oracle SQL (if you are so inclined)

It should be released in a few weeks.

Jonathan Lewis's picture

Frequency Histograms – 6

In an earlier post on frequency histograms I described how Oracle creates an approximate histogram when dealing with character columns, and warned you that the strategy could lead to a couple of anomalies if you were unlucky. I’ve already published a note about one such anomaly that can occur with fairly long character strings, this note describes another anomaly that could appear in less extreme cases. Again, we start by constructing a data set.

create table t1  (v1 varchar2(42));

insert	into t1
select	'next day'
from	all_objects
where 	rownum <= 150

insert into t1
select	'same day'
from 	all_objects
where 	rownum <= 350

insert into t1
select 	'priority high'
from 	all_objects
where 	rownum <= 500

		ownname		 => user,
		tabname		 =>'t1',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 254'

        endpoint_number - nvl(prev_endpoint,0)  frequency,
        chr(to_number(substr(hex_val, 2,2),'XX')) ||
        chr(to_number(substr(hex_val, 4,2),'XX')) ||
        chr(to_number(substr(hex_val, 6,2),'XX')) ||
        chr(to_number(substr(hex_val, 8,2),'XX')) ||
        chr(to_number(substr(hex_val,10,2),'XX')) ||
from    (
                lag(endpoint_number,1) over(
                        order by endpoint_number
                )                                                       prev_endpoint,
                table_name = 'T1'
        and     column_name = 'V1'
order by

--------------- ---------- ------------------------------- ------ ------------------------------------------
            150        150  6E65787420644D6D50DD04A6400000 next d
            650        500  7072696F72698970D6651967200000 priori
           1000        350  73616D6520645C36B7AD479D600000 same d

3 rows selected.

This example captures a histogram at a point when my data set has only three popular values in it, and those values are sufficiently different that their “6-byte encoding” is enough for Oracle to tell the difference between them so the histogram doesn’t use the endpoint_actual_value column.

But I have a problem. During the course of the working day I introduce some data with different status codes, and all the queries I run are about these status codes, but by the end of the day (when, possibly, the stats will be collected again by the automatic stats collection job) all the “interesting” data has been updated to one of the three popular values.

What does Oracle do about queries for my “rare” values ? It depends what they look like. Here’s a couple of examples from

	v1 = 'overnight'

	v1 = 'priority low'

set autotrace off

Neither of these values is in the histogram, so I would like Oracle to give a low cardinality estimate to them. But one of the values is very similar to a value that DOES exist in the histogram. This leads to an interesting variation in the numbers that appear in the execution plans:

| Id  | Operation          | Name | Rows  | Bytes | Cost  |
|   0 | SELECT STATEMENT   |      |     1 |    11 |     2 |
|   1 |  SORT AGGREGATE    |      |     1 |    11 |       |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |    11 |     2 |

Predicate Information (identified by operation id):
   2 - filter("V1"='overnight')

Execution Plan
Plan hash value: 3724264953

| Id  | Operation          | Name | Rows  | Bytes | Cost  |
|   0 | SELECT STATEMENT   |      |     1 |    11 |     2 |
|   1 |  SORT AGGREGATE    |      |     1 |    11 |       |
|*  2 |   TABLE ACCESS FULL| T1   |   500 |  5500 |     2 |

Predicate Information (identified by operation id):
   2 - filter("V1"='priority low')

In the first case (‘overnight’) the optimizer has decided that the value doesn’t exist in the histogram. Since this database is running it has set the resulting cardinality to 1 (this would be 75 in or 11g – half the lowest known value in the histogram). But in the second case (‘priority low’) the optimizer has done its 6-byte manipulation and decided that ‘priority low’ matches ‘priority high’, and given it a cardinality of 500.

If Oracle had seen ‘priority low’ when creating the histogram it would have used the endpoint_actual_value column and generated two separate rows for ‘priority low’ and ‘priority high’. But it is perfectly feasible that because of timing (the interesting data exists only briefly during the day) or sampling (the interesting data is tiny compared to the popular values) you could see Oracle taking a “large data” execution path when you know that you’re only expecting to handle a small data set.

The solution, of course, is to write code to create a histogram that represents the situation as you want Oracle to see it.

Jonathan Lewis's picture

Good Nulls

I’ve often been heard to warn people of the accidents that can happen when they forget about the traps that appear when you start allowing columns to be NULL – but sometimes NULLs are good, especially when it helps Oracle understand where the important (e.g. not null) data might be.

An interesting example of this came up on OTN a few months ago where someone was testing the effects of changing a YES/NO column into a YES/NULL column (which is a nice idea because it allows you to create a very small index on the YESes, and avoid creating a histogram to tell the optimizer that the number of YESes is small).

They were a little puzzled, though, about why their tests showed Oracle using an index to find data in the YES/NO case, but not using the index in the YES/NULL case. I supplied a short explanation on the thread, and was planning to post a description on the blog, but someone on the thread supplied a link to AskTom where Tom Kyte had already answered the question, so I’m just going to leave you with a link to his explanation.

Jonathan Lewis's picture

Manual Optimisation

Here’s an example of “creative SQL” that I wrote in response to a question on OTN about combining data from two indexes to optimise access to a table. It demonstrates the principle that you can treat an index as a special case of a table – allowing you to make a query go faster by referencing the same table more times.

Unfortunately you shouldn’t use this particular example in a production system because it relies on the data appearing in the right order without having an “order by” clause. This type of thing makes me really keen to have a hint that says something like: /*+ qb_name(my_driver) assume_ordered(@my_driver) */ so that you could tell the optimizer that it can assume that the rowset from a given query block will appear in the order of the final “order by” clause.

tanelpoder's picture

A: The most fundamental difference between hash and nested loop joins

Ok guys, thanks for waiting!

I ended up expanding the article quite a lot compared to what I had originally planned. In fact I only wrote 50% of what I plan to write, I’ll update the rest… um… later… Instead of just stating the difference between the joins I took a step back and elaborated something what I often see people doing (and talking about in newsgroups and lists too).

Basically the most fundamental (or biggest or most important) difference between nested loop and hash joins is that:

  • Hash joins can not look up rows from the inner (probed) row source based on values retrieved from the outer (driving) row source, nested loops can.

In other words, when joining table A and B (A is driving table, B is the probed table), then a nested loop join can take 1st row from A and perform a lookup to B using that value (of the column(s) you join by). Then nested loop takes the next row from A and performs another lookup to table B using the new value. And so on and so on and so on.

This opens up additional access paths to the table B, for example when joining ORDERS and ORDER_ITEMS by ORDER_ID (and ORDER_ID is leading column of PK in ORDER_ITEMS table), then for whatever orders are taken from ORDERS table, we can perform a focused, narrow index range scan on ORDER_ITEMS for every ORDER_ID retrieved from the driving ORDERS table. A hash join can’t do that.

Of course this doesn’t mean that hash joins can’t use any indexes for tables they read – index range scans and unique lookups can still be used under a hash join, but only if there are constant values in the query text (in form of literal or bind variables). If there are no such constant (filter) conditions under a hash join, then the other options to use that index would be to do an INDEX FULL SCAN (which is a range scan from end to end of the index) or INDEX FAST FULL SCAN (which is like a full table scan through the entire index segment). However none of these opportunities give the same benefits as nested loops looking up rows from row source B dynamically based on what was retrieved from A during runtime.

Note that this nested loops benefit isn’t limited to indexes only on table B, the difference is more fundamental than just a specific access path. For example, if table B happens to be a single table hash cluster or indexed X$ table, then the nested loop is also able to do “optimized” lookups from these row-sources, based on the values retrieved from table A.

So, my article with a lot of (loosely) related details is here:

In the comments section of my question, Tom, Bernard Polarski, Christian Antognini and Marc Musette got the closest to what I had in my mind when I asked the question. However, of course your mileage may vary somewhat depending on what kind of problems you have experienced the most over all the years. Also, Jonathan Lewis had a valid comment regarding that the answer depends on what exactly does one mean by “fundamental” and yeah this was open to interpretation.

Nevertheless, I wanted to emphasize that there’s a more important difference between NL and hash joins, than the usual stuff you see in training material which talk about implementation details like hash tables and memory allocation…

Some day I will complete that article, I plan to add some design advice in there, like denormalization opportunities for getting the best of the both worlds etc. But now I’m gonna get a beer instead.

Thanks for reading and answering my blog, I was quite impressed by the volume of comments & answers to my question. I must do this more often!


tanelpoder's picture

New online seminars – Advanced Oracle Troubleshooting v2.0 Deep Dives

As I mentioned in a previous post, I won’t be doing much flying anymore and so am changing all my seminar offering to online seminars.

So, I’ve changed and re-arranged my seminar content into self-contained 4-hour deep dives and thanks to the online nature (no travel needed), people can choose which days they want to attend. If you’re interested in latch contention only, you can attend the Latch Contention deep dive for example etc. Or you can still attend all the deep dives. The cool thing is that these deep dive sessions take only half a day, too (and are priced accordingly). That way you don’t have to skip work for the whole day (or week) and still can get some of your daily work done too. Hopefully it makes your life a bit easier when getting approval to attend the sessions.

As the main feedback from my seminars has been that “there’s too much to learn” within the short 2-3 days I used to do my seminars in, I have arranged the material so that there will be more time to go deep into the subject area. Also, I have planned plenty of time for questions & answers (1 hour out of the 4 hours is planned Q&A sessions and attendees can also ask questions any time during the lecture & demos).

It looks like I will only offer my Advanced Oracle Troubleshooting v2.0 class online this year. I will probably schedule my Advanced SQL Tuning deep dives in January/February 2011 and the Advanced Troubleshooting class again in March/April and so on (until I go public with my other business, when I won’t have time for full length training anymore).

You can check the current schedule and pricing out here:

Here’s a brief outline of individual half-day Deep Dives I offer:

  1. AOT deep dive 1: Systematic approach for Advanced Oracle Troubleshooting
  2. AOT deep dive 2: Troubleshooting physical IO and buffer cache issues
  3. AOT deep dive 3: Troubleshooting commit, redo, undo and transaction issues
  4. AOT deep dive 4: Troubleshooting Oracle SGA/PGA/UGA and OS memory issues
  5. AOT deep dive 5: Troubleshooting shared pool and library cache issues
  6. AOT deep dive 6: Troubleshooting enqueue lock waits and deadlocks
  7. AOT deep dive 7: Troubleshooting latch contention
  8. AOT deep dive 8: Troubleshooting Mutex and “cursor: pin” contention
  9. AOT deep dive 9: Troubleshooting complex hangs and spins
  10. AOT deep dive 10: Troubleshooting crashes, bugs and ORA-600/ORA-7445 errors

So, sign up now, seats are limited ;-)


Jonathan Lewis's picture

Index degeneration

There’s a thread on OTN that talks about a particular deletion job taking increasing amounts of time each time it is run.

It looks like an example where some thought needs to go into index maintenance and I’ve contributed a few comments to the thread – so this is a lazy link so that I don’t have to repeat myself on the blog.

Jonathan Lewis's picture

Subquery Factoring (4)

I’ve written before about the effects of subquery factoring (common table expressions – or CTEs) on the optimizer, and the way that the optimizer can “lose” some strategies when you start factoring out subquery expressions. Here’s another example I came across quite recently. It involved a join of about 15 tables so I’ve only extracted a few lines from the SQL and resulting execution plans.

We start with the original query, which had factored out an aggregate subquery then used it in place of an inline view:

with max_cust_comm as (
		customer_communications ccm
	group by
left join
	max_cust_comm	mcc
on	mcc.order_id = ord.order_id

The execution path for this query included the following lines:

|   6 |     HASH JOIN OUTER            |                            |     1 |
|   7 |      NESTED LOOP               |                            |     1 |
|  41 |      VIEW                      |                            |   798K|
|  42 |       HASH GROUP BY            |                            |   798K|
|  43 |        TABLE ACCESS FULL       | CUSTOMER_COMMUNICATIONS    |   798K|

You can see that the optimizer has created a result set (VIEW) at line 41 by scanning the entire customer_communications table, for a total of about 800,000 rows, aggregating the data by order_id. This is not very efficient becauase (a) I happen to have a very useful index on the customer_communications table that contains all the data I need, and (b) there are just a few input rows where I need to find this max(comm_date).

Kerry Osborne's picture

Hotsos Symposium 2010 Presentations

I got an email a few days ago asking if I would provide the scripts from my Hotsos Symposium 2010 presentations. I didn’t even realize the presentations had been posted anywhere but I managed to find them on my company’s website. So anyway, I decided to go ahead and post a link to the PDF’s and the scripts here as well. So click on the pretty pictures to get the PDFs and the cleverly titled text links to get the accompanying zip files with the scripts.

Controlling Execution Plans Zip File

My Favorite Scripts 2010 Zip File

Oh and Bob Sneed as “Disco Duck” (Thanks Marco)

Jonathan Lewis's picture

Cardinalilty One

I think anyone who has read Wolfgang Breitling’s material about the optimizer will be familiar with the concept of Cardinality Feedback and one particular detail that when Oracle gets a cardinality estimate of one for a “driving” table then there’s a good chance that the execution plan will go wrong. (That’s not rule, by the way, just a fairly common observation after things have gone wrong.)

A recent note on OTN reminded me of a particular scenario where this specific problem can occur. It’s not particularly common, but it may hit people who are building data warehouses from multiple different sources. We start with an unlikely looking data set and very simple query:

drop table t1;

create table t1 as
	rownum id1,
	rownum id2
	rownum <= 10000

execute dbms_stats.gather_table_stats(user,'t1');

set autotrace traceonly

	id1 = id2

What do you think Oracle estimated cardinality will be for this predciate ? We know, because we saw the data being built, that we’re going to identify 10,000 rows. But the optimizer doesn’t see it that way – check line 2 of the execution plan. The optimizer thinks it will find just one row:

To prevent automated spam submissions leave this field empty.
Syndicate content