Internals

tanelpoder's picture

Secret hacking session – full scans, direct path reads, object level checkpoints, ORA-8103s! (again)

I’m mentioning this again just in case you missed the announcement (because I posted it on the weekend):

 

There will be anotner free Secret hacking session – about full scans, direct path reads, object level checkpoints, ORA-8103s!

It will happen tomorrow, Tuesday 9th August, online!

 

Register here:

See you soon!

 

tanelpoder's picture

Training Schedule for 2011 and Public Appearances

Online Seminars
A lot of people have asked me about whether I’d be doing any more seminars in the future. And the answer is yes – at least this year (might be too busy running a company the next year ;-)
I have finally put together the schedule for my 2011 seminars. In addition to the Advanced Oracle Troubleshooting seminar I will also deliver my Advanced Oracle SQL Tuning and Oracle Partitioning and Parallel Execution for Performance seminars, which I have done only onsite in past.
So, check out the seminars page:
Also don’t forget the Expert Oracle Exadata virtual conference next week!
Public Appearances

Oracle OpenWorld 2. October
  • I will talk about Large-Scale Consolidation onto Oracle Exadata: Planning, Execution, and Validation
  • Session ID 09355
Maybe I’ll lurk around the UKOUG venue as well in december ;-)

mwidlake's picture

Index Organized Tables – the Basics.

IOT2 – Examples and proofs..>
IOT3 – Greatly reducing IO with IOTs….>
IOT4 – Boosting Buffer Cache Efficiency……>

I think Index Organized Tables(IOTs) are a much under-used and yet very useful feature of Oracle. Over the next few postings I’m going to cover some aspect of Index Organised Tables, both good and not-so-good. I am going to cover some benefits of IOTs that I think many people are unaware of. In this first post I am just going to run through the basics of IOTs.

The idea behind an IOT is simple. You hold all the data for the table in the ordered structure of an index. Why would you want to do that? Let us consider a very common requirement, accessing a row in a “large” table via a known, unique key.

Traditionally you have a heap table holding the data you want to access and a standard index to support access to that table. See the first diagram below. The 4-layer triangle represents the index, with a root block, two levels of branch blocks and then the leaf blocks at the “bottom”. The blue rectangle represents the table with the squares being individual rows. Of course, in a large table there would be thousands or millions of “squares”, this is just a simple diagram to show the idea.

When you issue a SQL statement to select the row via the indexed column(s) then oracle will read the root block (1), find the relevent block in the first level of branch blocks (2), then the relevant block in the second level of branch blocks (3) and finally (as far as the index is concerned) the relevant Leaf Block for the unique key. The leaf block holds the indexed column(s) and also the rowid. The rowid is the fastest way to look up a record, it states the file, block and row offset for the row. This allows oracle to go straight to the block and get the row. That is read number (5).
The number of branch blocks {and thus the number of blocks that need to be read to find a row} will vary depending on how much data is indexed, the number and size of the columns in the index, how efficiently the space has been used in the blocks and one or two other factors. In my experience most indexes for tables with thousands or millions of rows have one, two or three levels of branch blocks.

The second diagram shows a representation of the Index Organized Table. The table has in effect disappeared as a distinct object and the information has been moved into the leaf blocks of the index {part of me feels Index Organized Tables should really be called Table Organized Indexes or Table Containing Indexes as that would better indicate what is physically done}:

So with the IOT oracle reads the root block (1), the two branch level blocks (2 and 3) and finally the leaf block (4). The leaf block does not hold the rowid but rather the rest of the columns for the table {this can be changed, a more advanced feature allows you to store some or all the extra columns in an overflow segment}. Thus to access the same data, Oracle has to read only 4 blocks, not 5. Using an IOT saves one block read per unique lookup.

This saving of block reads is probably the main feature that IOTs are known for, but there are others which I will cover in later posts. Two things I will mention now is that, firstly, the use of IOTs is potentially saving disc space. An index is in effect duplication of data held in the table. When you create an index no new information is created but space is used up holding some of the table information in a structure suitable for fast lookup. Secondly, the index and table have to be maintained whenever a change is made to the columns that are indexed. IOTs reduce this maintenance overhead as there is only one thing to maintain.

Now for some drawbacks.

  • The IOT has to be indexed on the primary key. There is no option to create an IOT based on other indexes. As such you have to either be accessing the table via the primary key to get the benefit – or you have to be a little cunning.
  • The index is going to be larger than it was and very often larger than the original table. This can slow down range scans or full scans of the index and a “full table scan” will now be a full index scan on this large object, so that can also negatively impact performance. However, if a range scan would then have resulted in access to the table to get extra columns, the IOT gives a similar benefit in reducing IO to that for single row lookups.
  • I just want to highlight that you now have no rowid for the rows.
  • Secondary indexes are supported but will potentially be less efficient due to this lack of rowid.

So, a brief summary is that Index Organised Tables effectively move the table data into the Primary Key index, reduce the number of block lookups needed to select one row, can save some disc space. But you can only organize the table via the Primary Key and it can make full or partial table scans and lookups via other indexes slower.

There are several more important benefits to IOTs {in my opinion} which I will come to over the next week or two.

tanelpoder's picture

RAC hack!

In other words – FREE STUFF!!!

Riyaj Shamsudeen does a free RAC hacking session on 12 July!

He will demonstrate how the LMS background process works, with the help of OS tracing tools like truss and DTrace.

Sign up here!

 

Update:

A video recording of the session can be found here: http://www.vimeo.com/26798681

 

 

tanelpoder's picture

What is the purpose of segment level checkpoint before DROP/TRUNCATE of a table?

There was a very good question asked in Oracle-L list today, which has bothered me too in past.
The question was:
What is the purpose of a segment level checkpoint before DROP/TRUNCATE of a table?
 
In other words, why do we have to wait for the enq: RO – fast object reuse wait event (and in 11.2 the enq: CR – block range reuse ckpt wait) when dropping & truncating segments?
 
I’m not fully confident that I know all the real reasons behind this, but it could be related to the need to get rid of segment’s dirty buffers in buffer cache, before dropping the object.
 
Imagine this:
 
  • You have a large buffer cache and you drop table A without checkpointing the dirty buffers.  
  • Immediately after the drop succeeds (some buffers are still dirty in cache) some other segment (table B) reuses that space for itself and writes stuff into it.
  • A few seconds later, DBWR wakes up to find & write some dirty buffers to disk (anything it finds from its lists). As there are some old & dirty blocks of table A still in the cache, they get written to disk too, overwriting some of the new table B blocks!

  

So, this is one reason why you should checkpoint the blocks to disk before dropping (or truncating) a segment. Of course you might ask that why doesn’t DBWR just check whether the dirty buffer is part of an existing object or a dropped one when it walks through its dirty list? It could just discard the dirty buffers of dropped objects it finds. It would be doable – but I also think it would get quite complex. DBWR is a low level background proces, understanding the cache layer and dealing with physical datablocks in a file# = X block offset = Y. It doesn’t really know anything about the segments/objects which use these blocks. If it should start checking for logical existence of an object, it would have to start running code to access (a much higher level concept) data dictionary cache – and possibly query data dictionary tables via recursive calls, etc, so making it much more complicated.
 
So, this logic may just be matter of implementation, it’d be too complex to implement such selective discarding of dirty buffers, based on a higher-level concept of existence of a segment or object. Dropping and truncating tables so frequently, that these waits become a serious problem (consuming significant % of response time) indicate a design problem anyway. For example, former SQL server developers creating multiple temporary tables in Oracle – for breaking a complex query down into smaller parts, just like they had been doing it in SQL Server.
 
Anyway, here’s what I think about this – I’d love to hear other opinions, if you think otherwise!

tanelpoder's picture

Advanced RAC Training by Oracle RAC expert Riyaj Shamsudeen

If you’ve troubleshooted (or tuned) RAC then you probably already know Riyaj Shamsudeen and his Orainternals blog & website (links below).

Anyway, since I started delivering my Advanced Oracle Troubleshooting classes some years ago, many people asked whether I would do a similar class for RAC. I had deliberately left out the RAC-specific stuff from my troubleshooting material, because it’s a very wide and complex topic and I feel like before trying to master RAC troubleshooting, you should master troubleshooting of regular single instance databases anyway. I realized that I didn’t have the time to build (and maintain) yet another set of trainig material, especially on so complex topic as RAC performance & troubleshooting. 

So, having seen Riyaj’s impressive work and his presentations at various conferences, I asked whether he would be interested in building a RAC troubleshooting class, going from fundamentals to advanced topics – and he said yes. By now we are that far that I’m happy to announce the first Advanced RAC online seminars by Riyaj Shamsudeen (split across two weeks of online sessions, 4-hours per day, in end of august and september).

We initially called the seminar “Advanced RAC Troubleshooting” but then realized, that there are some closely related non-troubleshooting topics to be covered, like fundamental concepts, internals and also how to configure RAC for performance (so that you wouldn’t have to troubleshoot performance later :-)

We’ll use the same infrastructure and seminar philosophy as I do in my own online seminars, it’s just that this is Riyaj’s material and he will deliver it too.

You can read more about the seminar content, dates and sign up at the seminars page:

Seminars:

Riyaj’s blog:

Riyaj’s website (articles, slides etc):

 

Let the RAC hacking begin! ;-)

 

tanelpoder's picture

Tech Reviewer, Tech Reviewer! ;-)

I just noticed that Jonathan Lewis has announced that he’s writing a new Oracle (fundamental) internals book, due to be out in November.

So, I’m happy to add to Jonathan’s announcement, that I’m the tech reviewer of that book!

After all the hard work on the Exadata book, I didn’t want to hear about working on any book again (even if it’s just tech reviewing work), but as this is Jonathan’s book, about exactly these topics I love and focus on, I had no choice but to make an exception and become a reviewer ;-)

I’ve already reviewed a couple of chapters and this book is going to be awesome!

 

tanelpoder's picture

Reminder and Public Appearances 2011

First, a reminder – my Advanced Oracle Troubleshooting v2.0 online seminar starts next week already. Last chance to sign up, I can accept registrations until Sunday :-)

I won’t do another AOT seminar before Oct (or Nov) this year. More details and sign-up here:

I have rescheduled my Advanced SQL Tuning and Partitioning & Parallel Execution for Performance seminars too. I will do them in September/October. Unfortunately I’m too busy right now to do them before the summer.

Public Appearances:

  • I will be speaking at the UKOUG Exadata Special Event in London on 18th April
  • I have submitted a few papers for Oracle OpenWorld in San Francisco as well (end of Sep/beginning of Oct), all about Exadata. Let’s see how it goes, but I’ll be there anyway, which means that I’ll probably show up at the Oracle Closed World event too!

And that’s all the travel I will do this year…

Virtual Conferences:

I’ll soon announce the 2nd EsSN virtual conference too ;-)

Free online stuff:

Perhaps in a month or so I will do another hacking session (I’ll plan 2 hours this time, 1 hour isn’t nearly enough for going deep). The topic will probably be about low-level details of SQL plan execution internals… stay tuned!

tanelpoder's picture

Oracle Troubleshooting TV Show: Season 1, Episode 01 ;-)

Ok, it’s official – the first and only Oracle Troubleshooting TV show is live now!

The first show is almost 2 hours about the ORA-4031 errors and shared pool hacking. It’s a recording of the US/EMEA timezone online hacking session I did some days ago.

There are a couple of things to note:

  1. The text still isn’t as sharp as in the original recording, but it’s much better than in my previous upload attempts and is decently readable. I’ll try some more variations with my next shows so I hope the text quality will get better! Or maybe I should just switch to GUI tools or powerpoint slides? ;-)
  2. You probably should view this video in full screen (otherwise the text will be tiny and unreadable)
  3. There’s advertising in the beginning (and maybe end) of this show! I’ll see how much money I’ll make out of this – maybe these shows start contributing towards the awesome beer selection I’ll have in my fridge some day (right now I have none). Viewing a 30-sec advert is small price to pay for 2 hours of kick-ass shared pool hacking content !!!
  4. You can download the scripts and tools used in the demos from http://tech.e2sn.com/oracle-scripts-and-tools/
  5. Make sure you check out my online Oracle troubleshooting seminars too (this April and May already)

View the embedded video below or go to my official Oracle Troubleshooting TV show channel:

http://tanelpoder.blip.tv

Enjoy!

tanelpoder's picture

LOBREAD SQL Trace entry in Oracle 11.2 (and tracing OPI calls with event 10051)

A few days ago I looked into a SQL Tracefile of some LOB access code and saw a LOBREAD entry there. This is a really welcome improvement (or should I say, bugfix of a lacking feature) for understanding resource consumption by LOB access OPI calls. Check the bottom of the output below:

*** 2011-03-17 14:34:37.242
WAIT #47112801352808: nam='SQL*Net message from client' ela= 189021 driver id=1413697536 #bytes=1 p3=0 obj#=99584 tim=1300390477242725
WAIT #0: nam='gc cr multi block request' ela= 309 file#=10 block#=20447903 class#=1 obj#=99585 tim=1300390477243368
WAIT #0: nam='cell multiblock physical read' ela= 283 cellhash#=379339958 diskhash#=787888372 bytes=32768 obj#=99585 tim=1300390477243790
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390477243865
[...snipped...]
WAIT #0: nam='SQL*Net more data to client' ela= 2 driver id=1413697536 #bytes=2048 p3=0 obj#=99585 tim=1300390477244205
WAIT #0: nam='SQL*Net more data to client' ela= 4 driver id=1413697536 #bytes=2048 p3=0 obj#=99585 tim=1300390477244221
WAIT #0: nam='gc cr multi block request' ela= 232 file#=10 block#=20447911 class#=1 obj#=99585 tim=1300390477244560
WAIT #0: nam='cell multiblock physical read' ela= 882 cellhash#=379339958 diskhash#=787888372 bytes=32768 obj#=99585 tim=1300390477245579
WAIT #0: nam='SQL*Net more data to client' ela= 16 driver id=1413697536 #bytes=2020 p3=0 obj#=99585 tim=1300390477245685
WAIT #0: nam='SQL*Net more data to client' ela= 6 driver id=1413697536 #bytes=2048 p3=0 obj#=99585 tim=1300390477245706
WAIT #0: nam='SQL*Net more data to client' ela= 5 driver id=1413697536 #bytes=1792 p3=0 obj#=99585 tim=1300390477245720
#ff0000;">LOBREAD: c=1000,e=2915,p=8,cr=5,cu=0,tim=1300390477245735

In past versions of Oracle the CPU (c=) usage figures and other stats like number of physical/logical reads of the LOB chunk read OPI call were just lost – they were never reported in the tracefile. In past only the most common OPI calls, like PARSE, EXEC, BIND, FETCH (and recently CLOSE cursor) were instrumented with SQL Tracing. But since 11.2(.0.2?) the LOBREAD’s are printed out too. This is good, as it reduces the amount of guesswork needed to figure out what are those WAITs for cursor #0 – which is really a pseudocursor.

Why cursor#0? It’s because normally, with PARSE/EXEC/BIND/FETCH, you always had to specify a cursor slot number you operated on (if you fetch from cursor #5, it means that Oracle process went to slot #5 in the open cursor array in your session’s UGA and followed the pointers to shared cursor’s executable parts in library cache from there). But LOB interface works differently – if you select a LOB column using your query (cursor), then all your application gets is a LOB LOCATOR (sort of a pointer with LOB item ID and consistent read/version SCN). Then it’s your application which must issue another OPI call (LOBREAD) to read the chunks of that LOB out from the database. And the LOB locator is independent from any cursors, it doesn’t follow the same cursor API as regular SQL statements (as it requires way different functionality compared to a regular select or update statement).

So, whenever a wait happened in your session due to an access using a LOB locator, then there’s no specific cursor responsible for it (as far as Oracle sees internally) and that’s why a fake, pseudocursor #0 is used.

Note that on versions earlier than 11.2(.0.2?) when the LOBREAD wasn’t printed out to trace – you can use OPI call tracing (OPI stands for Oracle Program Interface and is the server-side counterpart to OCI API in the client side) using event 10051. First enable SQL Trace and then the event 10051 (or the other way around if you like):

SQL> @oerr 10051

ORA-10051: trace OPI calls

SQL> alter session set events '10051 trace name context forever, level 1';

Session altered.

Now run some LOB access code and check the tracefile:

*** 2011-03-17 14:37:07.178
WAIT #47112806168696: nam='SQL*Net message from client' ela= 6491763 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627178602
OPI CALL: type=105 argc= 2 cursor=  0 name=Cursor close all
CLOSE #47112806168696:c=0,e=45,dep=0,type=1,tim=1300390627178731
OPI CALL: type=94 argc=28 cursor=  0 name=V8 Bundled Exec
=====================
PARSING IN CURSOR #47112802701552 len=19 dep=0 uid=93 oct=3 lid=93 tim=1300390627179807 hv=1918872834 ad='271cc1480' sqlid='3wg0udjt5zb82'
select * from t_lob
END OF STMT
PARSE #47112802701552:c=1000,e=1027,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3547887701,tim=1300390627179805
EXEC #47112802701552:c=0,e=29,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3547887701,tim=1300390627179884
WAIT #47112802701552: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627179939
WAIT #47112802701552: nam='SQL*Net message from client' ela= 238812 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627418785
OPI CALL: type= 5 argc= 2 cursor= 26 name=FETCH
WAIT #47112802701552: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627418945
FETCH #47112802701552:c=0,e=93,p=0,cr=5,cu=0,mis=0,r=1,dep=0,og=1,plh=3547887701,tim=1300390627418963
WAIT #47112802701552: nam='SQL*Net message from client' ela= 257633 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627676629
#ff0000;">OPI CALL: type=96 argc=21 cursor=  0 name=#ff0000;">LOB/FILE operations
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=99585 tim=1300390627676788
[...snip...]
WAIT #0: nam='SQL*Net more data to client' ela= 2 driver id=1413697536 #bytes=1792 p3=0 obj#=99585 tim=1300390627677054
LOBREAD: c=0,e=321,p=0,cr=5,cu=0,tim=1300390627677064

Check the bold and especially the red string above.  Tracing OPI calls gives you some extra details of what kind of tasks are executed in the session. The “LOB/FILE operations” call indicates that whatever lines come after it (unlike SQL trace call lines where all the activity happens before a call line is printed (with some exceptions of course)) are done for this OPI call (until a next OPI call is printed out). OPI call tracing should work even on ancient database versions…

By the way, if you are wondering, what’s the cursor number 47112801352808 in the “WAIT #47112801352808″ above? Shouldn’t the cursor numbers be small numbers?

Well, in 11.2.0.2 this was also changed. Before that, the X in CURSOR #X (and PARSE #X, BIND #X, EXEC #X, FETCH #X) represented the slot number in your open cursor array (controlled by open_cursors) in your session’s UGA. Now, the tracefile dumps out the actual address of that cursor. 47112801352808 in HEX is 2AD94DC9FC68 and it happens to reside in the UGA of my session.

Naturally I asked Cary Millsap about whether he had spotted this LOBREAD already and yes, Cary’s way ahead of me – he said that Method-R’s mrskew tool v2.0, which will be out soon, will support it too.

It’s hard to not end up talking about Cary’s work when talking about performance profiling and especially Oracle SQL trace, so here are a few very useful bits which you should know about:

If you want to understand the SQL trace & profiling stuff more, then the absolute must document is Cary’s paper on the subject – Mastering Performance with Extended SQL Trace:

Also, if you like to optimize your work like me (in other words: you’re proactively lazy ;-) and you want to avoid some boring “where-the-heck-is-this-tracefile-now” and “scp-copy-it-over-to-my-pc-for-analysis” work then check out Cary’s MrTrace plugin (costs ~50 bucks and has a 30-day trial) for SQL Developer. I’ve ended up using it myself regularly although I still tend to avoid GUIs:

Share

To prevent automated spam submissions leave this field empty.
Syndicate content