Jonathan Lewis

Rownum effects

Here’s a hidden threat in the optimizer strategy that may cause performance problems if you’re running a series of batch updates (or batch deletes).

In the past I’ve pointed out that a predicate like “rownum <= N” generally makes the optimizer use “first_rows(N)” optimisation methods – known in the code as first_k_rows optimisation.

This isn’t true for updates and deletes, as the following simple example indicates:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create index t1_i1 on t1(id);

-- gather_table_stats, no histograms, compute, cascade

explain plan for
update t1 set
	small_vc = upper(small_vc)
where
	id > 100
and	rownum <= 200
;

select * from table(dbms_xplan.display);

explain plan for
select
	small_vc
from	t1
where
	id > 100
and	rownum <= 200
;

select * from table(dbms_xplan.display);

As usual I ran this with system statistics (CPU costing) disabled, using a locally managed tablespace with uniform 1MB extents and freelist management – simply because this leads to a repeatable test. I didn’t set the db_file_multiblock_read_count parameter (thus allowing the _db_file_optimizer_read_count to default to 8). These are the plans I got for the update and select respectively:

-----------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | UPDATE STATEMENT    |      |   200 |  3000 |    27 |
|   1 |  UPDATE             | T1   |       |       |       |
|*  2 |   COUNT STOPKEY     |      |       |       |       |
|*  3 |    TABLE ACCESS FULL| T1   |  9901 |   145K|    27 |
-----------------------------------------------------------

Predicate Information (identified by operation id):
   2 - filter(ROWNUM<=200)
   3 - filter("ID">100)

---------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |   200 |  3000 |     6 |
|*  1 |  COUNT STOPKEY               |       |       |       |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |   200 |  3000 |     6 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |       |       |     2 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
   1 - filter(ROWNUM<=200)
   3 - access("ID">100)

Note how the select statement uses an index range scan with stop key as the best strategy for finding 200 rows and then stopping – and the total cost of 6 is the cost of visiting the (well-clustered) table data for two hundred rows. The update statement uses a full tablescan to find the first 200 rows with a total cost of 27 – which happens to be the cost of a completed tablescan, not the cost of “enough of the tablescan to find 200 rows”. The update statement has NOT been optimized using the first_k_rows strategy – it has used the all_rows strategy.

The demonstration is just a starting-point of course – you need to do several more checks and tests to convince yourself that first_k_rows optimisation isn’t going to appear for updates (and deletes) and to discover why it can be a problem that needs to be addressed. One of the simplest checks is to look at the 10053 (CBO) trace files to see the critical difference, especially to notice what’s in the trace for the select but missing from the trace for the update. The critical lines show the following type of information – but only in the trace file for the select:

First K Rows: K = 200.00, N = 9901.00
First K Rows: Setup end

First K Rows: K = 200.00, N = 9901.00
First K Rows: old pf = -1.0000000, new pf = 0.0202000


First K Rows: unchanged join prefix len = 1

Final cost for query block SEL$1 (#0) - First K Rows Plan:

But why might it matter anyway? Here’s the shape of a piece of SQL, embedded in pl/sql, that I found recently at a client site:

update	tabX set
	col1 = {constant}
where
	col2 in (
		complex subquery
	)
and	{list of other predicates}
and	rownum <= 200
returning
	{id column} bulk collect into {array}
;

For most of the calls to this SQL there would be a small number of rows ready for update, and the pl/sql calling this update statement would populate an array (note the “returning” clause) with the ids for the rows updated and then do something with those ids. Unfortunately there were occasions when the data (and the statistics about the data) covered tens of thousands of rows that needed the update. When this happened the optimizer chose to unnest the complex subquery – instead of using a very precise and efficient filter subquery approach – and do a massive hash semi-join that took a couple of CPU minutes per 200 rows and hammered the system to death for a couple of hours.

If Oracle had followed the first_k_rows optimizer strategy it would have used the “small data” access path and taken much less time to complete the task. As it was we ended up using hints to force the desired access path – in this case it was sufficient to add a /*+ no_unnest */ hint to the subquery.
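In outline, the hinted statement kept the same shape as the one shown earlier, with the hint placed inside the subquery (the braces and ellipsis are placeholders, as in the original):

```sql
update	tabX set
	col1 = {constant}
where	col2 in (
		select /*+ no_unnest */ ...	-- the complex subquery, hinted
	)
and	{list of other predicates}
and	rownum <= 200
;
```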

Tanel Poder

Q: The most fundamental difference between HASH and NESTED LOOP joins?

So, what do you think is the most fundamental difference between NESTED LOOPS and HASH JOINS?

This is not a trick question. You’re welcome to write your opinion in the comments section – and I’ll follow up with an article about it (my opinion) later today…

Update: The answer article is here:


Cary Millsap

My Actual OTN Interview

And now, the actual OTN interview (9:11) is online. Thank you, Justin; it was a lot of fun. And thank you to Oracle Corporation for another great show. It's an ever-growing world we work in, and I'm thrilled to be a part of it.

Karl Arao

Oracle Closed World and Unconference Presentations

There are so many things to blog about from these past few days – mainly the cool stuff around OCW and OOW, the sessions I attended (OCW, unconference, OOW), plus the interesting people I’ve met across various areas of expertise. So I’ll be posting some highlights (and a lot of photos) in the next posts.

Last Monday (Sept. 20) I was able to present at the Oracle Closed World @ Thirsty Bear. The full agenda is here

Glenn Fawcett

Oracle Open World presentation uploaded… Optimizing Oracle Databases on SPARC Enterprise M-Series Servers.

As promised, I have attached the slide deck we used for our presentation at Oracle Open World.  A big thanks to Allan for asking me to help… glad to do it!

ID#: S315915
Title: Optimizing Oracle Databases on Sun SPARC Enterprise M-Series Servers
Track: Sun SPARC Servers
Date: 20-SEP-10
Time: 12:30 – 13:30
Venue: Moscone South
Room: Rm 270
Slides : oow2010_db_on_mseries.pdf

There are a bunch of OTN white papers that were produced to show how to run the Oracle stack best on Sun servers.  I will try to post an index soon but feel free to peruse the OTN site, there is lots of new content.

Jonathan Lewis

Index degeneration

There’s a thread on OTN that talks about a particular deletion job taking increasing amounts of time each time it is run.

It looks like an example where some thought needs to go into index maintenance and I’ve contributed a few comments to the thread – so this is a lazy link so that I don’t have to repeat myself on the blog.

Glenn Fawcett

Oracle Open World 2010… Optimizing Oracle Databases on Sun SPARC Enterprise M-Series Servers

This year at OOW I will be co-presenting on Oracle performance on Sun servers. Stop by and say hi if you get a chance.

ID#: S315915
Title: Optimizing Oracle Databases on Sun SPARC Enterprise M-Series Servers
Track: Sun SPARC Servers
Date: 20-SEP-10
Time: 12:30 – 13:30
Venue: Moscone South
Room: Rm 270

Arup Nanda

A Tool to Enable Stats Collection for Future Sessions for Application Profiling

The other day I was putting together my presentation for Oracle Open World on Application Profiling in RAC. I was going to describe a methodology for putting a face to an app by measuring how it behaves in a database – a sort of signature of that application. I was going to use the now-ubiquitous 10046 trace for wait events and other activities inside the database. For resource consumption such as redo generated, logical I/Os, etc., I used v$sesstat; but then I was stuck. How would I collect the stats of a session when the session has not even started and I don’t know the SID? That problem led to the development of this tool, where the stats of a future session can be recorded based on some identifying factors such as username, module, etc. I hope this helps in your performance management efforts.

The Problem

Suppose you want to find out the resource consumed by a session. The resources could be redo generation, CPU used, logical I/O, undo records generated – the list is endless. This is required for a lot of things. Consider a case where you want to find out which apps are generating the most redo; you would issue a query like this:

select sid, value
from v$sesstat s, v$statname n
where n.statistic# = s.statistic#
and n.name = 'redo size';

The value column will show the redo generated. From the SID you can identify the session. Your next stop is v$session to get the other relevant information such as username, module, authentication scheme, etc. Problem solved, right?
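As a sketch (with the column list trimmed for brevity), that second step might look like this:

```sql
-- Join the stats to v$session while the session still exists;
-- once the session disconnects, both sets of rows are gone.
select st.sid, se.username, se.module, st.value as redo_size
from   v$sesstat st, v$statname n, v$session se
where  n.statistic# = st.statistic#
and    n.name       = 'redo size'
and    se.sid       = st.sid;
```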

Not so fast. Look at the above query; it selects from v$sesstat. When a session disconnects, its stats disappear from v$sesstat. If you run the query after that, you will not find the session. You would have to select from v$sesstat constantly, hoping to capture the stats of each session before it disconnects – but that is not guaranteed. Some short sessions will be missed between collection samples. Even if you are lucky enough to capture some stats of a short session, the other relevant information from v$session will be gone.

Oracle provides a package, dbms_monitor, in which a procedure named client_id_stat_enable allows you to enable stats collection on future sessions where the client_id matches a specific value, e.g. CLIENT1. Here is an example:

execute dbms_monitor.client_id_stat_enable('CLIENT1');

However, there are three issues:

(1) It collects only about 27 stats, out of 400+.

(2) It offers only three choices for selecting sessions – client_id, module_name and service_name.

(3) It aggregates the stats, summing them up for a specific client_id. That is pretty much useless without session-level detail.

So, in short, I didn’t have a readily available solution.


Well, necessity is the mother of invention. When you can’t find a decent tool, you build it – and so did I. I built this tool to capture the stats. This is version 1 of the tool. It has some limitations, as shown at the end. These limitations do not apply to all situations, so the tool should be useful in the majority of cases. Later I will expand the tool to overcome these limitations.


The fundamental problem, as you recall, is not the dearth of data (v$sesstat has plenty); it’s the sessions in the future. To capture those sessions, the tool relies on a post-logon database trigger to capture the values.

The second problem was persistence. V$SESSTAT is a dynamic performance view, which means the records of the session will be gone when the session disappears. So, the tool relies on a table to store the data.

The third problem is getting the values at the very end of the session. The stats are the differences between the values captured at the end and at the beginning of the session. To capture the values at the very end, not any time before, the tool relies on a pre-logoff database trigger.

The fourth challenge is identification of sessions. The SID of a session is not unique; it can be reused for a new session, and it will definitely be reused when the database is recycled. So the tool uses a column named CAPTURE_ID, a sequentially incremented number for each capture. Since we capture once at the beginning of the session and once at the end, I must use the same capture_id both times; I use a package variable to store that capture_id.

Finally, the tool allows you to enable stats collections based on some session attributes such as username, client_id, module, service_name, etc. For instance you may want to enable stats for any session where the username = ‘SCOTT’ or where the os_user is ‘ananda’, etc. These preferences are stored in a table reserved for that purpose.


Now that you understand how the tool is structured, let me show the actual code and scripts to create the tool.

(1) First, we create the table that holds the preferences. Let’s call this table RECSTATS_ENABLED. This table holds all the different session attributes (IP address, username, module, etc.) that can enable stats collection in a session.
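The CREATE TABLE statement itself did not survive in this copy of the post. A minimal sketch, with column names inferred from the triggers shown later and sizes following the VARCHAR2(2000) convention used elsewhere, might be:

```sql
-- Sketch only: reconstructed, not the author's original DDL.
create table recstats_enabled
(
	session_attribute	varchar2(2000),
	attribute_value		varchar2(2000)
);
```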


If you want to enable stats collection for a session based on a session attribute, insert a record into this table with the attribute name and the value. Here are some examples. Say I want to collect stats on all sessions where client_info matches ‘MY_CLIENT_INFO1’; I would insert a record like this:

insert into recstats_enabled values ('CLIENT_INFO','MY_CLIENT_INFO1');

Here are some more examples. All sessions where ACTION is ‘MY_ACTION1’:

insert into recstats_enabled values ('ACTION','MY_ACTION1');

Those of user SCOTT:

insert into recstats_enabled values ('SESSION_USER','SCOTT');

Those with service name APP:

insert into recstats_enabled values ('SERVICE_NAME','APP');

You can insert as many preferences as you want. You can even insert multiple values of a specific attribute. For instance, to enable stats on sessions with service names APP1 and APP2, insert two records.
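Following the pattern of the inserts above:

```sql
insert into recstats_enabled values ('SERVICE_NAME','APP1');
insert into recstats_enabled values ('SERVICE_NAME','APP2');
```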

Important: the session attribute names follow the naming convention of the USERENV context used in SYS_CONTEXT function.

(2) Next, we will create a table to hold the statistics
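The DDL for this table was also lost from this copy. A hedged sketch, consistent with the columns discussed below and with the queries used later in the post (only a few of the session-attribute columns are shown), could be:

```sql
-- Sketch only: reconstructed, not the author's original DDL.
-- The session-attribute columns follow the USERENV naming convention.
create table recstats
(
	capture_id	number,
	capture_point	varchar2(5),		-- 'START' or 'END'
	statistic_name	varchar2(2000),
	statistic_value	number,
	session_user	varchar2(2000),
	module		varchar2(2000),
	client_info	varchar2(2000),
	host		varchar2(2000)
)
tablespace users;
```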


Note, I used the tablespace USERS because I don’t want this table, which can potentially grow to a huge size, to overwhelm the SYSTEM tablespace. The STATISTIC_NAME and STATISTIC_VALUE columns record the stats collected. The other columns record the other relevant data from the sessions. All the attributes here have been shown as VARCHAR2(2000) for simplicity; of course they don’t need that much space. In future versions I will put a more meaningful limit on them, but 2000 does not hurt since it is a varchar2.

The capture point will show when the values were captured – START or END of the session.

(3) We will also need a sequence to identify the sessions. Each session will have 400+ stats; we capture one set at the beginning and one at the end. We could choose the SID as an identifier, but SIDs can be reused. So we need something that is truly unique – a sequence number. This will be recorded in the CAPTURE_ID column of the stats table.

SQL> create sequence seq_recstats;

(4) To store the capture ID when the post-logon trigger fires, so that it can be used inside the pre-logoff trigger, we need a variable that is visible for the entire session. A package variable is best for that:

create or replace package pkg_recstats
as
	g_recstats_id	number;
end pkg_recstats;
/

(5) Finally, we get to the meat of the tool – the triggers. First, the post-logon trigger to capture the stats at the beginning of the session:

CREATE OR REPLACE TRIGGER SYS.tr_post_logon_recstats
after logon on database
declare
	l_stmt		varchar2(32000);
	l_attr_val	recstats_enabled.attribute_value%TYPE;
	l_capture_point	recstats.capture_point%type := 'START';
	l_matched	boolean := FALSE;
begin
	pkg_recstats.g_recstats_id := null;
	for r in (
		select session_attribute, attribute_value
		from recstats_enabled
		order by session_attribute
	) loop
		exit when l_matched;
		-- we select the userenv; but the null values may cause
		-- problems in matching; so let’s use a value for NVL
		-- that will never be used - !_!_!_!
		l_stmt := 'select nvl(sys_context(''USERENV'','''||
			r.session_attribute||'''),''!_!_!_!'') from dual';
		execute immediate l_stmt into l_attr_val;
		if l_attr_val = r.attribute_value then
			-- match; we should record the stats
			-- and exit the loop, since stats should
			-- be recorded only for one match.
			l_matched := TRUE;
			select seq_recstats.nextval
			into pkg_recstats.g_recstats_id
			from dual;
			-- the original insert also recorded the session
			-- attributes (username, module, etc.); that part
			-- of the select list was lost from this copy
			insert into recstats
				(capture_id, capture_point,
				 statistic_name, statistic_value)
			select pkg_recstats.g_recstats_id, l_capture_point,
			       n.name, s.value
			from v$mystat s, v$statname n
			where s.statistic# = n.statistic#;
		end if;
	end loop;
end;
/

The code is self explanatory. I have provided more explanation as comments where needed.

(6) Next, the pre-logoff trigger to capture the stats at the end of the session:

CREATE OR REPLACE TRIGGER SYS.tr_pre_logoff_recstats
before logoff on database
declare
	l_capture_point recstats.capture_point%type := 'END';
begin
	if (pkg_recstats.g_recstats_id is not null) then
		insert into recstats
			(capture_id, capture_point,
			 statistic_name, statistic_value)
		select pkg_recstats.g_recstats_id, l_capture_point,
		       n.name, s.value
		from v$mystat s, v$statname n
		where s.statistic# = n.statistic#;
	end if;
end;
/

Again the code is self explanatory. We capture the stats only if the global capture ID has been set by the post-logon trigger. If we didn’t do that, all sessions would record stats at their completion.


Now that the setup is complete, let’s perform a test by connecting as a user with the service name APP:

SQL> connect arup/arup@app

In this session, perform some actions that will generate a lot of activity. The following SQL will do nicely:

SQL> create table t as select * from all_objects;

SQL> exit

Now check the RECSTATS table to see the stats for this capture_id, which happens to be 1330.

col name format a60
col value format 999,999,999
select a.statistic_name name, b.statistic_value - a.statistic_value value
from recstats a, recstats b
where a.capture_id = 1330
and a.capture_id = b.capture_id
and a.statistic_name = b.statistic_name
and a.capture_point = 'START'
and b.capture_point = 'END'
and (b.statistic_value - a.statistic_value) != 0
order by 2;

Here is the output:

NAME                                                                VALUE
------------------------------------------------------------ ------------
workarea memory allocated -2
change write time 1
parse time cpu 1
table scans (long tables) 1
cursor authentications 1
sorts (memory) 1
user commits 2
opened cursors current 2
IMU Flushes 2
index scans kdiixs1 2
parse count (hard) 2
workarea executions - optimal 2
redo synch writes 2
redo synch time 3
rows fetched via callback 5
table fetch by rowid 5
parse time elapsed 5
recursive cpu usage 8
switch current to new buffer 10
cluster key scan block gets 10
cluster key scans 10
deferred (CURRENT) block cleanout applications 10
Heap Segment Array Updates 10
table scans (short tables) 12
messages sent 13
index fetch by key 15
physical read total multi block requests 15
SQL*Net roundtrips to/from client 18
session cursor cache hits 19
session cursor cache count 19
user calls 25
CPU used by this session 28
CPU used when call started 29
buffer is not pinned count 33
execute count 34
parse count (total) 35
opened cursors cumulative 36
physical read total IO requests 39
physical read IO requests 39
calls to get snapshot scn: kcmgss 45
non-idle wait count 67
user I/O wait time 116
non-idle wait time 120
redo ordering marks 120
calls to kcmgas 143
enqueue releases 144
enqueue requests 144
DB time 149
hot buffers moved to head of LRU 270
recursive calls 349
active txn count during cleanout 842
cleanout - number of ktugct calls 842
consistent gets - examination 879
IMU undo allocation size 968
physical reads cache prefetch 997
physical reads 1,036
physical reads cache 1,036
table scan blocks gotten 1,048
commit cleanouts 1,048
commit cleanouts successfully completed 1,048
no work - consistent read gets 1,060
redo subscn max counts 1,124
Heap Segment Array Inserts 1,905
calls to kcmgcs 2,149
consistent gets from cache (fastpath) 2,153
free buffer requested 2,182
free buffer inspected 2,244
HSC Heap Segment Block Changes 2,519
db block gets from cache (fastpath) 2,522
consistent gets 3,067
consistent gets from cache 3,067
bytes received via SQL*Net from client 3,284
bytes sent via SQL*Net to client 5,589
redo entries 6,448
db block changes 9,150
db block gets 10,194
db block gets from cache 10,194
session logical reads 13,261
IMU Redo allocation size 16,076
table scan rows gotten 72,291
session uga memory 88,264
session pga memory 131,072
session uga memory max 168,956
undo change vector size 318,640
session pga memory max 589,824
physical read total bytes 8,486,912
cell physical IO interconnect bytes 8,486,912
physical read bytes 8,486,912
redo size 8,677,104

This clearly shows you all the stats of that session. Of course the table recorded all other details of the session as well – such as username, client_id, etc., which are useful later for more detailed analysis. You can perform aggregations as well now. Here is an example of the stats collected for redo size:

select session_user, sum(STATISTIC_VALUE) STVAL
from recstats
where STATISTIC_NAME = 'redo size'
group by session_user;

SESSION_USER     STVAL
------------ ---------
ARUP            278616
APEX           4589343
… and so on …

You can break the aggregates down by several attributes as well. Here is an example where you want to find the redo generated by different users coming from different client machines:

select session_user, host, sum(STATISTIC_VALUE) stval
from recstats
where STATISTIC_NAME = 'redo size'
group by session_user, host;

SESSION_USER HOST          STVAL
------------ ----------- -------
ARUP         oradba2       12356
ARUP         oradba1      264567
APEX         oradba2       34567
… and so on …

Granularity like this shows you how the application behaved from the different client servers, not just per username.


As I mentioned, there are some limitations you should be aware of. I will address them in future iterations of the tool. They are not serious and apply only in certain cases; as long as you don’t encounter those cases, you should be fine.

(1) The logoff trigger does not fire when the user exits the session ungracefully, for example by closing the SQL*Plus window or killing the program before exiting. In such cases the stats at the end of the session will not be recorded. In most application infrastructures this does not happen, but it could happen for ad hoc user sessions such as people connecting through TOAD.

(2) Session attributes such as module, client_id and action can be altered within the session. If that happens, this tool does not record the change, since there is no triggering event; the logoff trigger records the module, action and client_id set at that time. These attributes are not usually changed in application code, so this may not apply to your case.
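For reference, these are the standard Oracle-supplied calls an application can issue mid-session to change those attributes – changes this version of the tool will not see:

```sql
-- dbms_application_info and dbms_session are Oracle-supplied packages;
-- nothing here fires a logon or logoff trigger.
begin
	dbms_application_info.set_module('NEW_MODULE', 'NEW_ACTION');
	dbms_session.set_identifier('NEW_CLIENT_ID');
end;
/
```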

(3) Parallel query slave sessions present a special issue, since no logoff trigger fires for them; so for parallel queries you will not see differential stats. If you don’t use PQ, as is the case in most OLTP applications, you will not be affected.

(4) If the session just sits there without disconnecting, the logoff trigger will not fire and the end-of-session stats will not be captured until the session finally exits.

Once again, these limitations apply only to certain occasions. As long as you are aware of these caveats, you will be able to use this tool to profile many of your applications.

Happy Profiling!
Marco

Oracle – Small but important changes

From this version onward the default storage model for XMLType is no longer CLOB but binary XML. This has the advantage that, because the format of the XML document or instance is known, Oracle can optimize XML handling. Oracle can also, where possible, transport data in and out of the database in the binary XML format – which is smaller in size – when used with the binary XML APIs. Another advantage, when handling content-driven queries, is the possibility of query rewrites and optimized in-memory objects (so-called XOB objects), which are more efficient and smaller in size.

See the Oracle XMLDB Developers Guide 11.2.0.x for more info.

…and this concludes my first post via an Android phone…


Cary Millsap

My OTN Interview at OOW2010 (which hasn’t happened yet)

Yesterday, Justin Kestelyn from Oracle Technology Network sent me a note specifying some logistics for our OTN interview together called “Coding for Performance, with Cary Millsap.” It will take place at Oracle OpenWorld on Monday, September 20 at 11:45am Pacific Time.

One of Justin’s requests in his note was, “Topics: Please suggest 5 topics for discussion?” So, I thought for a couple of minutes about questions I’d like him to ask me, and I wrote down the first one. And then I thought to myself that I might as well write down the answer I would hope to say to this; maybe it’ll help me remember everything I want to say. Then I wrote another question, and the answer just flowed, and then another, and another. Fifteen minutes later, I had the whole thing written out.
