Oakies Blog Aggregator

oradebug's picture

How Incompetence Stimulates the Information Technology Economy

As information technology professionals, we constantly complain about mismanaged projects in which we have the misfortune to be involved.  Frequently, someone with more power than knowledge – usually in management and under the sway of a persuasive vendor sales team – has come up with a systems design most politely described as novel. Because of our experience and understanding of the technology, we can clearly see the plan will fail.  Heedlessly, despite our loud protests, the project plods onward toward oblivion, consuming money, time and our human spirit along the way.

We feel bitter and exasperated at being ignored. We are shocked at the audacity of people who spend a corporation’s money when they don’t know what they are doing. Our most common reaction is to escalate, complain, and engage in a battle of wills with the author of the flawed plan.

We must change our thinking. To engage in resistance to doomed projects is both futile and self-destructive.  By releasing our attachment to the idea that projects must be done right, we begin to see the virtuous effects of projects being done wrong.

Of course it is our duty to register our objections to a flawed plan with those in charge, in email, using clear, technical, and unambiguous terms. Beyond that, however, our emotional involvement should end. In the event our customer or employer deploys a badly designed system, it will surely require many skilled, competent troubleshooters to fix it. That’s us! Incompetent systems design results in strong demand for competent technical people.

But the benefits to the I.T. economy are not restricted to the aftermath. During implementation of a failed system, a company pays vendors and employees to build it, even though they lack the skills or knowledge to realize the system is doomed. Without doomed projects, these people would not even be part of an I.T. economy. It is like a kind of employment stimulus based on stupidity.

So don’t feel hate and loathing if you are in the midst of a doomed project and nobody will listen to you. Release your attachment to the need for others to do things right, and wait for failure. When it comes, do your best to help.

I’m not saying I think this is how things should be.  I’m just saying it works paradoxically to the benefit of good technical contributors.  It certainly is not something to get bent out of shape about.

marco's picture

Full House @MOW 2010 in Denmark

Wow, it will be a full house during Miracle OpenWorld 2010, as Moans announced in this video. Maybe there won’t be enough beds, and maybe there won’t be enough food, but Moans has at least promised us, the 208+ attendees, the new Miracle Brew. Anyway, have a look...

alberto.dellera's picture

Xtrace: an Oracle session trace browser (introduction)

Xtrace is a graphical tool that can navigate Oracle trace files, manipulate them, and optionally get them back as a text file. It actually does (much) more, but in this first post we are going to focus on its basic browsing capabilities.

Let’s see the tool in action on the trace file produced by this simple PL/SQL block:

begin
  for r in (select * from t) loop
    null;
  end loop;
end;
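
For reference, a trace like the one below can be captured by enabling extended SQL trace around the block; a minimal sketch (the tracefile_identifier value is just an arbitrary tag of my choosing), assuming the table t already exists:

alter session set tracefile_identifier = 'xtrace_demo';                -- tag the trace file name
alter session set events '10046 trace name context forever, level 8';  -- level 8 = include wait events
-- ... run the PL/SQL block above ...
alter session set events '10046 trace name context off';               -- stop tracing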

The resulting trace file is:

WAIT #2: nam='SQL*Net message from client' ela= 61126 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636384898
=====================
PARSING IN CURSOR #26 len=66 dep=0 uid=73 oct=47 lid=73 tim=5789636385129 hv=3421439103 ad='aeb809c8'
begin
  for r in (select * from t) loop
    null;
  end loop;
end;
END OF STMT
PARSE #26:c=0,e=153,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=5789636385122
BINDS #26:
=====================
PARSING IN CURSOR #28 len=15 dep=1 uid=73 oct=3 lid=73 tim=5789636386184 hv=1406298530 ad='a0503300'
SELECT * FROM T
END OF STMT
PARSE #28:c=0,e=804,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,tim=5789636386181
BINDS #28:
EXEC #28:c=0,e=64,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=5789636386284
WAIT #28: nam='db file sequential read' ela= 19 file#=4 block#=59 blocks=1 obj#=76357 tim=5789636386383
WAIT #28: nam='db file sequential read' ela= 11 file#=4 block#=60 blocks=1 obj#=76357 tim=5789636386457
FETCH #28:c=0,e=243,p=2,cr=3,cu=0,mis=0,r=100,dep=1,og=1,tim=5789636386566
FETCH #28:c=0,e=54,p=0,cr=1,cu=0,mis=0,r=100,dep=1,og=1,tim=5789636386663
FETCH #28:c=0,e=3,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=5789636386693
EXEC #26:c=0,e=1543,p=2,cr=4,cu=0,mis=0,r=1,dep=0,og=1,tim=5789636386746
WAIT #26: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636387057
WAIT #26: nam='SQL*Net message from client' ela= 42743 driver id=1413697536 #bytes=1 p3=0 obj#=76357 tim=5789636429824
STAT #28 id=1 cnt=200 pid=0 pos=1 obj=76357 op='TABLE ACCESS FULL T (cr=4 pr=2 pw=0 time=363 us)'

Even for this artificially simple trace file, it takes a lot of effort to read and understand; for example, it takes a while to associate the recursive SQL lines with the execution of the PL/SQL block (the “EXEC #26” line).

With Xtrace, the trace reading experience is remarkably better:

Note the indentation by recursive level (which is provided out-of-the-box) and the coloring of the lines by statement (which takes perhaps a minute to set up).
If you are interested, you can try this example live by pressing the “Launch” button above; in particular, try the “Options” button of the middle pane and the “set color” popup menus of the top pane.

Suggestion: you might also check the hyperlinks that link the lines together; for example, the xct pointer that links the recursive SQL calls to the parent “EXEC #26” (check the interactive manual for more information).

You can also get the trace back as a text file, if so desired:

000 line zero
001 xtrace: log file 'E:\localCVS30\TrilogyLectures\MioSitoWeb\xtrace\dist\xtrace.log'
002 VIRTUAL CALL #-4: 'null call - ignore this'
003 VIRTUAL CALL #-4: 'null call - ignore this'
004 +WAIT #2: nam='SQL*Net message from client' xe=ela=61126 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636384898
005 VIRTUAL CALL #-8: 'wait-for-client'
006 VIRTUAL CALL #-5: 'client-message-received'
007 ---------------------PARSING IN CURSOR #26: len=66 dep=0 uid=73 oct=47 lid=73 tim=5789636385129 hv=3421439103 ad='0eb809c8'
    begin
      for r in (select * from t) loop
        null;
      end loop;
    end;
    END OF STMT
008 PARSE #26: mis=0 r=0 dep=0 og=1 tim=5789636385122 e=153 c=0 p=0 cr=0 cu=0
009 BINDS #26:
010   ---------------------PARSING IN CURSOR #28: len=15 dep=1 uid=73 oct=3 lid=73 tim=5789636386184 hv=1406298530 ad='00503300'
      SELECT * FROM T
      END OF STMT
011   PARSE #28: mis=1 r=0 dep=1 og=1 tim=5789636386181 e=804 c=0 p=0 cr=0 cu=0
012   BINDS #28:
013   EXEC  #28: mis=0 r=0 dep=1 og=1 tim=5789636386284 e=64 c=0 p=0 cr=0 cu=0
014   +WAIT #28: nam='db file sequential read' xe=ela=19 p1='file#'=4 p2='block#'=59 p3='blocks'=1 xphy=1 obj#=76357 tim=5789636386383
015   +WAIT #28: nam='db file sequential read' xe=ela=11 p1='file#'=4 p2='block#'=60 p3='blocks'=1 xphy=1 obj#=76357 tim=5789636386457
016   FETCH #28: mis=0 r=100 dep=1 og=1 tim=5789636386566 e=243 c=0 p=2 cr=3 cu=0
017   FETCH #28: mis=0 r=100 dep=1 og=1 tim=5789636386663 e=54 c=0 p=0 cr=1 cu=0
018   FETCH #28: mis=0 r=0 dep=1 og=1 tim=5789636386693 e=3 c=0 p=0 cr=0 cu=0
019 EXEC  #26: mis=0 r=1 dep=0 og=1 tim=5789636386746 e=1543 c=0 p=2 cr=4 cu=0
020 -WAIT #26: nam='SQL*Net message to client' xe=ela=2 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636387057
021 +WAIT #26: nam='SQL*Net message from client' xe=ela=42743 p1='driver id'=1413697536 p2='#bytes'=1 p3=''=0 xphy=0 obj#=76357 tim=5789636429824
022 VIRTUAL CALL #-8: 'wait-for-client'
023 VIRTUAL CALL #-5: 'client-message-received'
024   STAT  #28: id=1 pid=0 pos=1 obj=76357 op='TABLE ACCESS FULL T' cnt=200 avg(cnt)=200.0 card=n/a cr=4 avg(cr)=4.0 cost=n/a pr=2 pw=0 time=363 size=n/a xnexecs=1 xstatn=0 xplannum=0
025
026 VIRTUAL CALL #-4: 'null call - ignore this'

This can be obtained using the “save as text” popup menu of the middle pane.

We are going to keep exploring Xtrace in the upcoming posts.

CONNECT BY oddity

This week I’ve seen an issue with a CONNECT BY query: for some reason Oracle 10.2.0.4 decided to build a weird plan (the query is weird too, but that’s not my point here). An explanation of why that happened looks interesting, so here it is. Set up: drop table t2 cascade constraints purge; drop [...]

Unintended Consequences

Hi all,

My apologies for my extended absence. Well, I ran into something interesting today, thought it would be appropriate for a blog post, and apparently got inspired. No startling revelation here; this is just a bit of a cautionary tale about the unintended consequences of using a new feature. A quick search of the Oracle documentation confirms that the skip locked directive of the select for update statement was introduced in Oracle 11g, version 11.1. But before we dive into that, let’s review the functionality of select for update through the versions of Oracle preceding 11g.

Select for update

First, a bit about select for update in general. The select for update functionality has been available in Oracle since some very ancient release. It allows the user to lock specific rows in a table without changing them. This is generally used as part of a pessimistic locking strategy. (See here for an explanation of optimistic and pessimistic locking.) The default behavior of select for update is to wait for locks. That is, Oracle will take a read-consistent view of the data to identify the set of rows that will be locked, and then do current-mode gets to impose row-level locks on that set of rows. If, when attempting to lock the set of rows, a row is encountered that is already locked by another session, the session will wait indefinitely on a TX enqueue for a commit or rollback to free the lock, and then execution will continue.
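
A minimal sketch of the pattern (the orders table and its columns are hypothetical, purely for illustration):

-- Session 1: identify and lock the rows of interest without changing them.
-- Any other session trying to lock or update the same rows will wait on a
-- TX enqueue until this transaction commits or rolls back.
select id, status
from   orders
where  status = 'NEW'
for    update;

update orders set status = 'PROCESSED' where status = 'NEW';
commit;   -- the commit (or rollback) releases the row-level locks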

Select for Update with nowait

Oracle also provides a nowait option. If the developer codes select ... from .... for update nowait, then the select for update will proceed in much the same way described above, except that if it encounters a row locked by a different session, rather than waiting, it will raise an ORA-00054 (resource busy and acquire with NOWAIT specified) error, and a statement-level rollback will occur.
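
A hedged sketch of how this might be handled in PL/SQL (same hypothetical orders table; the resource_busy exception below is mapped to ORA-00054):

declare
  resource_busy exception;
  pragma exception_init(resource_busy, -54);   -- ORA-00054
  cursor c is
    select id from orders where status = 'NEW' for update nowait;
begin
  open c;    -- rows are locked at open time; ORA-00054 is raised here if any row is already locked
  close c;
exception
  when resource_busy then
    dbms_output.put_line('Rows are locked by another session - try again later');
end;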

Select for Update with wait n

The functionality of select for update, with the option of nowait, was unchanged from some very ancient version of Oracle up until Oracle 9i Release 2. Starting with Oracle 9i Release 2 (version 9.2.0), Oracle introduced the wait n clause. This allows the developer to modify the behavior of select for update to wait up to n seconds. If a lock cannot be acquired on a particular row within n seconds, Oracle will raise an ORA-30006 (resource busy; acquire with WAIT timeout expired) error, and a statement-level rollback will occur.
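
A short sketch (same hypothetical table); ORA-30006 can be trapped just like ORA-00054 above, by associating a named exception with error code -30006:

select id
from   orders
where  status = 'NEW'
for    update wait 5;   -- give up with ORA-30006 if the locks cannot be acquired within 5 seconds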

New in 11g!

Starting with 11g (version 11.1.0), Oracle introduced the skip locked clause. This clause allows the select for update to skip any rows it encounters that are already locked. In this way, it is neither subject to waiting for locks nor to the ORA-00054 or ORA-30006 errors. It simply locks what it can and moves on. This can be quite useful when the application design calls for single or multiple producers of data (programs that add rows to a table) and multiple consumers (concurrently executing programs that consume, or delete, rows from the table). In this scenario, and absent any strategy to partition the data so that the consumers do not interfere with each other, implementing either a wait or nowait strategy with select for update simply won’t work. But the skip locked functionality fits the bill very well. Each consumer can query the table, locking the rows that are not locked and skipping over any rows already locked by someone else. All the consumers can work, and none of them serialize behind each other. Note that this functionality has actually been available in Oracle since at least version 8.0 (the Advanced Queueing feature is built on it), but it was not documented, and therefore not supported by Oracle, until version 11.1.
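
For illustration, a sketch of the behavior with two concurrent sessions (hypothetical table again):

-- Session 1: locks every matching row it can.
select id from orders where status = 'NEW' for update skip locked;

-- Session 2, run while session 1 still holds its locks: returns immediately with
-- only the rows session 1 did not lock (possibly none) - no waiting, no ORA-00054,
-- and no ORA-30006.
select id from orders where status = 'NEW' for update skip locked;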

Unintended Consequences

So, here begins my tale of unintended consequences. We have an application in which there are several concurrent processes, all running the same code, consuming rows from a table. The program does a select for update skip locked, does some processing, and then finally deletes the locked rows. The SQL is similar in form to:

select ...
from [table_name]
where [some predicates]
and rownum < 101
for update skip locked;

So, find the first 100 rows in the table that satisfy the predicates, skipping over any rows that are already locked. Now, one would expect, given that the number of rows in the table is far greater than 100 * (number of concurrent consumers), that each consumer process should have no problem identifying 100 rows for processing. I was a bit puzzled, then, when it was reported to me that Oracle was actually returning significantly fewer than 100 rows. The code is simple, the logic is simple, so what’s going on? How could Oracle return fewer than 100 rows when there’s plenty of data in the table? Note that there is no error being raised; Oracle is simply going through the table, locking what it can, skipping what it can’t, and coming up with fewer than 100 rows. This in spite of the fact that there are 50 or fewer consumer processes, and that there are tens of thousands (sometimes hundreds of thousands) of rows in the table. The most rows that could be locked at any one time would be 50 consumer processes * 100 rows per process = 5,000 rows. So, what’s going on?

After reviewing the application-level log files, and after several discussions and walk-throughs of the code, we were stumped. So, I tried enabling 10046 trace, and sure enough, significantly fewer than 100 rows were being locked. Simple queries proved that there were more than enough eligible rows in the table. So, again, we were stumped. Another test, running only one consumer process, showed that the code behaves as expected when run single-threaded. So, there’s a concurrency problem… but what? Is there some bizarre bug in the skip locked functionality? It seemed premature to consider that, so I decided to rule that possibility out as extremely unlikely unless I could find more direct evidence of it.

So, what next? Well, I wondered, perhaps there is a diagnostic event that would reveal what’s happening under the covers with skip locked? I tried poking around and looking for such an event, but, to date, I’m not aware of any such functionality. So, how else to continue the investigation? Remember, as I said before, no error is being raised; the only symptom is that the number of rows returned is much smaller than expected. After another brainstorming session with the architect who designed and wrote the code, we came up with the idea of changing the code in our test environment to remove the skip locked clause, causing the code to block and allowing us (hopefully) to see clearly where the locking was happening. Sure enough, when we did that, the problem revealed itself almost immediately, and the solution was clear and obvious. When we started multiple concurrent consumer programs, one wait event was immediately obvious and overwhelming:

enq: TX - allocate ITL entry
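
For what it’s worth, a sketch of how such a wait might be spotted from another session while the consumers are blocked (assuming access to v$session):

select sid, sql_id, event, blocking_session, seconds_in_wait
from   v$session
where  event = 'enq: TX - allocate ITL entry';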

So, clearly, the root cause of all my problems and headaches is a shortage of ITL entries in the header of the data block. The blocking is happening at the block level, not the row level. This explains why locks were not being taken on rows that were not themselves locked, even when the cursor had not yet identified 100 candidate rows. Due to the high level of concurrency, the number of ITL slots (8k block size * 10% PCTFREE = 819 bytes of free space per block / 24 bytes per ITL entry = a maximum of 34 ITL entries per block) is not sufficient. We are running up to 50 concurrent consumer processes, may want to go as high as 100, and the number of rows per block is well over 100 in many cases. So, now that we’ve gotten to the bottom of the problem, the solution is fairly straightforward: a simple alter table ... move nologging initrans 110 and a rebuild of the indexes, and we are back in business.
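
A sketch of the fix, with hypothetical object names standing in for the real table and index (the post does not name them), and the arithmetic from above spelled out in the comments:

-- 8192-byte block * 10% PCTFREE = ~819 bytes of reserved free space per block;
-- 819 / 24 bytes per ITL entry  = at most ~34 ITL entries that can be added dynamically.
-- Pre-allocating enough ITL slots for ~100 concurrent consumers avoids the
-- 'enq: TX - allocate ITL entry' waits.
alter table work_queue move nologging initrans 110;

-- The move invalidates the indexes, so rebuild them (with a matching initrans).
alter index work_queue_pk rebuild initrans 110;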

Conclusion

In my opinion, the interesting part of this little puzzle is the effect of the skip locked functionality. The whole point is to allow the code to skip rows that have been locked by another session and continue running. That’s great, and it allows for coding applications that can be highly scalable and performant. However, it’s important to understand that the skip locked clause means Oracle will skip any time a TX lock is encountered, including TX locks that are happening at the block level. The problem is, if you’re encountering non-row-level locking, skip locked is still going to skip over the locked resource and move on. No error will be reported! This may lead to unpredictable results. The moral of the story is that skip locked doesn’t necessarily mean skip locked rows; it means skip any locked resource.

It seems there is always a gap between how we think a feature works and how it actually works.

So, this concludes my investigation. I hope it helps someone in the future avoid a similar problem.

I’m going to make an attempt to blog more regularly, but I know better than to make any promises! Until next time….

doug's picture

11gR2 for Windows 64 on OTN

Thanks to Marco mentioning it on Twitter, I downloaded and installed 11gR2 for Windows from OTN this evening. The reason I had switched to OEL on VMWare was the delay in the 11gR2 Windows release, but I'll give the Windows version a try and see which works best for me. Installation was certainly a breeze.

Jonathan Lewis's picture

Failed Login

Here’s a piece of code I found recently running every half hour on a client site:

SQL_ID = 2trtpvb5jtr53
SELECT
TO_CHAR(current_timestamp AT TIME ZONE :"SYS_B_0", :"SYS_B_1") AS curr_timestamp,
COUNT(username) AS failed_count
FROM
sys.dba_audit_session
WHERE
[...]

harald's picture

A few words on Index Compression

For the last couple of months I have been quite busy delivering Oracle11g New Features courses and one of the topics covered is table compression. To introduce this topic I ask my students in which version the first compression feature was introduced. Strangely enough almost no one seems to know the correct answer, which is […]

kevinclosson's picture

Another “New” Blog Worth Reading.

Earlier this week I was taking a gander at the in-bound traffic my site was getting, and one referrer stood out in my initial perusal. While I don’t know Gavin Soorma, just a bit of reading on his blog led me to conclude that his site is a jewel worth visiting:
Gavin Soorma
[...]

doug's picture

Outages

Some people might have noticed that the blog has been down a few times over the past week, particularly on Friday. The server that the site and blog are hosted on has been subject to some sort of hack attack; as a result, it was moved to a new server and all privileges were locked down for a while, particularly on the blog directory.

One factor was probably that my Serendipity installation was fairly out of date - the punchline being that one of my Bank Holiday weekend tasks was to upgrade it. Too late, unfortunately :-(

But, thanks to the efforts of Mike Scott, everything is back online and both blogs have been updated to the latest version of Serendipity.

Many thanks to Mike, and if I can tear myself away from my year-end accounts long enough, I might still have some time to reply to some comments and get the stats stuff moving again.
