tanelpoder's picture

COUNT STOPKEY operation (the where ROWNUM

I was running tests on some very large tables on an Exadata cluster and found an interesting bug.

Instead of having to query 4Billion row tables, I have reproduced this case with a cartesian join…

Check this. I’m generating 8 Billion rows using a 3-way cartesian join of set of 2000 rows. So, this results in 2000 * 2000 * 2000 rows, which is 8 billion rows.

SQL>  with sq as (select null from dual connect by level <= 2000)
      select count(*)
      from sq a, sq b, sq c;


Everything worked well as you see. All 8 billion rows were nicely counted. Let’s modify this query a bit, by adding a WHERE rownum <= 8 000 000 000 predicate, which shouldn’t modify the outcome of my query as 8 billion rows is exactly what I have:

SQL> with sq as (select null from dual connect by level <= 2000)
     select count(*)
     from sq a, sq b, sq c
     where rownum <= 8000000000;


Ding! (as I’m unable to come up with a better word to present this problem :)

Seems like we have a problem! The COUNT operation thinks that we have only a bit over 4 billion rows returned from the query plan branches below it.

We clearly have a(n easily reproducible ) bug here!

What happens here is that whenever you use ROWNUM < N operator in a query predicate like I have done, you will introduce an additional step to the query plan (COUNT STOPKEY).

Check this simple query:


Plan hash value: 508354683

| Id  | Operation          | Name | E-Rows |
|   0 | SELECT STATEMENT   |      |        |
|*  1 |  COUNT STOPKEY     |      |        |
|   2 |   TABLE ACCESS FULL| T    |     41 |

Predicate Information (identified by operation id):

   1 - filter(ROWNUM<=1000)

As the above plan explains, the “WHERE rownum <” predicate is applied in the execution plan line (row source) 1 – COUNT STOPKEY. You won’t see this line when you are just counting rows without any “ROWNUM <” predicate.

How this works is that the COUNT STOPKEY knows how many rows you want and will just keep calling its child function under it in the execution plan tree to get more and more rows, until the required amount of rows have been returned. And then the COUNT STOPKEY function just stops calling its child row sources and declares the end-of-data condition.

And here’s the catch – due to a bug, the variable used to keep track of number of processed rows by COUNT STOPKEY is only 4 bytes, even on 64bit platforms. So, it can hold values up to 4294967295 in it (the count returned above), which is 2 ^ 32 – 1. That wouldn’t be so much of a problem in practical real world applications, but what I don’t like is that the operation will silently return wrong data – it will just stop fetching more rows, even though we haven’t reached the ROWNUM <= 8 000 000 000 limit yet and there are rows to fetch from the child row-source, but COUNT STOPKEY declares end-of-data condition and returns ~4B as a result.

This is (obviously a bug) and now there’s a bug number for that as well (thanks to Greg Rahn for letting me know) – Bug# 10214991 (unfortunately it’s not public in MOS).

Now, there’s a reason why this bug has gone unnoticed for so long despite that 4+ billion-row tables have existed for long time (I worked first with such Oracle database in year 2000 – on Windows NT :).

  1. A real-life business query with a construct of WHERE ROWNUM <= X makes sense only when the data is ordered by some meaningful business attribute (a column). Otherwise you’ll get the query results in quite random order. Note that I’m talking about real, business queries here.
  2. The only right way to order data in SQL is via an ORDER BY clause. There is no other right way to do it, even though some hacks sometimes work (and will stop working after the next upgrade)
  3. Nobody (or not many people) have written queries like: give me the sum of 5 billion biggest purchases ordered by the selling price or give me the sum of last 5 billion purchases up to this date. If you replace the billion by just ten, hundred, or just a thousand, then hell yeah, such types of queries are being executed every day (or should I say minute, second).
    Whether using ROWNUM in your business queries (instead of actual date range or purchase price in $$$) is a good or bad design is a completely different question – the fact is that ROWNUM is used in such SQLs.

Now we’ll just need to wait until the data volumes (and reports) get so big that someone actually hits this problem in real life!

So, better patch your database before the end of this decade! :-)


oraclebase's picture

Melbourne Update

The Melbourne class was a little bigger than the previous classes, with 12 people including several people from Oracle Support. Being the 3rd class in quick succession, everything felt really smooth, with no real surprises.

I actually managed to get a reasonable amount of sleep before both classes, which was a bonus. Saying that, I fell asleep at 20:00 last night, so I ended up waking up at 03:00, which gave me some time to play catch-up with emails and forum questions. It does mean that today will end up being a pretty long day, since my flight to Hong Kong isn’t until 00:15 (including a slight delay).

Apparently there have been typhoon warnings in Hong Kong and the schools have been closed, but surprisingly it hasn’t really affected flights. Let’s see how that works out.

This was my third visit to Australia, but only my second visit to Sydney and Melbourne. The real take-home message from this trip is I felt really at ease all the time I’ve been in Australia. I’m not surprised a lot of British people decide to move to Australia. It’s probably one of the easiest places for a Brit to relocate to.



Update: I’m in the hotel in Hong Kong now. No signs of a typhoon, but I’ve received a mail saying the class may be cancelled if the weather is too bad. Weird. I’m just off to lunch in the tallest building in Hong Kong, or so my mate tells me.

oradebug's picture

How I got access to My Oracle Support (MOS) for US$2.67

Oracle professionals know that the two main sites for information and support on running Oracle products are the Oracle Technology Network (OTN), and My Oracle Support (MOS).

kevinclosson's picture

Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It? Part – II.

After my recent blog entry entitled   Configuring Linux Hugepages for Oracle Is Just Too Difficult! Isnt It? Part I, I engaged in a couple of email threads and a thread on oracle-l about when to employ hugepages.  In those exchanges I was amazed to find that it is still a borderline issue for folks. I feel it is very cut and dried and thus I prepared the following guidelines that more or less spell it out.

  1. Reasons for Using Hugepages
    1. Use hugepages if OLTP or ERP. Full stop.
    2. Use hugepages if DW/BI with large numbers of dedicated connections or a large SGA. Full stop.
    3. Use hugepages if you don’t like the amount of memory page tables are costing you (/proc/meminfo). Full stop.
  2. SGA Memory Management Models
    1. AMM does not support hugepages. Full stop.
    2. ASMM supports hugepages.
  3. Instance Type
    1. ASM uses AMM by default. ASM instances do not need hugepages. Full stop.
    2. All non-ASM instances should be considered candidate for hugepages. See 1.1->1.3 above.
  4. Configuration
    1. Limits (multiple layers)
      1. /etc/security/limits.conf establishes limits for hugepages for processes. Note, setting these values does not pre-allocate any resources.
      2. Ulimit also establishes hugepages limits for processes.
  5. Allocation
    1. /etc/sysctl.conf vm.nr_hugepages allocates memory to the hugepages pool.
  6. Sizing
    1. Read MOS 401749.1 for information on tools available to aid in the configuration of vm/nr_hugepages

To make the point of how urgently  Oracle DBAs need to qualify their situation against list items 1.1 through 1.3 above, please consider the following quote from an internal email I received. The email is real and the screen output came from a real customer system. Yes, 120+ gigabytes of memory wasted in page tables. Fact is often stranger than fiction!

And here is an example of kernel pagetables usage, with a 24GB SGA, and 6000+ connections ..  with no hugepages in use ..

# grep PageT /proc/meminfo

PageTables:   123731372 kB

Filed under: oracle

cary.millsap's picture

Virtual Seminar: "Systematic Oracle SQL Optimization in Real Life"

On November 18 and 19, I’ll be presenting along with Tanel Põder, Jonathan Lewis, and Kerry Osborne in a virtual (GoToWebinar) seminar called Systematic Oracle SQL Optimization in Real Life. Here are the essentials:

What: Systematic Oracle SQL Optimization in Real Life.
oraclebase's picture

Sydney Update

I’m now sitting in my hotel room in Melbourne, so the Sydney experience is complete.

Originally I was told the Oracle University classes would be cancelled if there were less than 10 people. In Singapore I got 9 people, which was the smallest class I had ever taught for Oracle University. I immediately beat that record in Sydney by having 7 people in the class. I’m guessing that from an expenses point of view, the costs are lower because I’m doing four courses in what amounts to a single round trip to Australia. If this were just a single class requiring a requiring a return flight from UK to Sydney it wouldn’t have happened. Anyway…

Day 1 went smoothly. I got a few questions that made me think, which is always good. Once I had finished the class, it was straight off to the train station to get into town to speak at the Oracle Meetup organised by the Pythian guys. The train journey took longer than I expected, so I arrived about 20 minutes late, by which point the projector was irretrievable locked away. So instead I did my presentation with my laptop pointing at the people around the table and did a lot of zooming. :) Despite this setback, which was totally my fault, it seemed to go OK.

It’s always good to meet new people, but I was especially happy to finally meet Nuno “Noons” Souto and Gary Myers, whose blogs I’ve been following for ages and who have both helped me in the past by correcting my numerous mistakes. After the presentation finished and we had an informal chat, it was back to the train station and then the hotel.

Day 2 of the course went smoothly enough. I had finally got something resembling sleep, so I felt a bit more on top of my game. As always, I over ran. If they gave me 3 days I’m sure I would still over run. :)

This morning was a 06:30 flight to Melbourne, so I had to get up at about 04:00 to get ready and get the taxi to the airport. I actually woke up at 03:00, so I guess I’m going to feel a bit rough later. Luckily I’ve got the rest of the day off to recover before I start the Melbourne class tomorrow.

Melbourne was the first class to confirm, so I’m guessing it has a minimum of 10 people, but you never know. Perhaps I can break my record again. :)

On a slightly worrying note, I’m having a bit of trouble with my credit and debit cards. I warned the banks involved that I was travelling so my usage might look a little odd. I also banged some cash onto them to preempt any problems with bills coming in while I was away. Even so, it’s all been a bit random as to which cards are accepted and which cards fail. If all else fails I’ll get Larry to fly me over some cash… :)



Kerry Osborne's picture

Virtual Oracle Conference

Tanel Poder has put together an Online Virtual Oracle Conference and he asked me to participate. I had to think about it for about 2 seconds before I said yes. The speakers are Tanel, Cary Millsap, Jonathan Lewis and myself. Wow! How did I sneak into that line up you might wonder. Apparently Tanel asked Tom Kyte first, but Tom had to decline. It’s not the first time I’ve played second fiddle to Tom – and hopefully won’t be the last. I feel like I’m in pretty tall cotton (as we say in Texas).

Here’s how Tanel pitched the idea to me:

My reasoning is that we start from higher level and then drill down:

1) Cary talking about how to approach things right (and eventually find the problem SQL or operations)
2) I’m drilling down inside a SQL with row-source profiling (and eventually find where is the problem)
3) Jonathan talks why it happens and what to do about it (how to help CBO by writing better SQL)
4) Kerry talks how to help CBO and fix SQL when you can’t actually change the SQL text to any better

The whole idea of running a seminar on-line without anyone traveling is a very interesting topic. The technology has progressed to the point where it is quite possible. The idea of a short conference with multiple pretty well known speakers is also pretty cool. It will be interesting to see what kind of participation we get.

The conference is scheduled to be 8 hours spread over two days, Nov. 18 and 19. Follow the link below to read more about the event including the schedule, cost, how to sign up and abstracts for the talks.

Virtual Oracle Conference

Hope to see you there!

P.S. – There is a pretty good discount if you sign up before Nov. 1.

tanelpoder's picture

Read currently running SQL statement’s bind variable values using V$SQL_MONITOR.BIND_XML in Oracle 11.2

The title pretty much says it. In Oracle 11.2, if you have the Diag+Tuning Pack licenses and the SQL monitoring kicks in for your SQL statement, then instead of the old fashioned ERRORSTACK dump reading you can just query the V$SQL_MONITOR.BIND_XML to find the values and metadata of your SQL statement’s bind variables.

I’ve written an example here:

And a related comment – V$SQL_BIND_CAPTURE is not a reliable way for identifying the current bind variable values in use. Oracle’s bind capture mechanism does not capture every single bind variable into SGA (it would slow down apps which run lots of short statements with bind variables). The bind capture only selectively samples bind values, during the first execution of a new cursor and then every 15 minutes from there (controlled by _cursor_bind_capture_interval parameter), assuming that new executions of that same cursor are still being started (the capture happens only when execution starts, not later during the execution).


tanelpoder's picture

Few new blogs worth checking out

Here are few blogs which I’ve found lately and think that you should also check out if you’re interested in Oracle stuff (and take time to read blogs :)

Oracle performance & Grid Control

Oracle Exadata

Oracle, Unix, storage (for DW)

Oracle, Storage & Virtualization (VMWare)

Julian Dyke’s new blog (RAC, Data Guard so far – probably internals later :-)


oraclebase's picture

Singapore Update

Call me paranoid, but when I started filling in a customs declaration that mentions the death penalty for drug traffickers, I suddenly got worried about the cold cures and vitamin C pills in my bag.

When I walked out of the airport in Singapore I couldn’t believe how humid it was. Dubai is hot, and Washington was very humid this year, but Singapore was something else.

I got to the hotel pretty late and I was teaching the next day, so I went straight to bed. The Oracle University office was about 1 km from the hotel so I walked to it. By the time I got there I felt like I needed a shower. Even at 08:00 in the morning it felt like hard work. After a couple of hundred yards I could feel myself panting like a dog.

The class was pretty small, with nine paying attendees and one Oracle University instructor who came along for the ride. Big classes can be exciting, but small classes feel more informal, so either way it’s pretty cool. We went out as a group at lunch time on both days, so there was plenty of time for gossiping, as well as the teaching stuff.

Singapore is full of restaurants. It feels like every building has a food court in it, and most of the places are busy. Speaking to a couple of people on the course, it seems many of the people never cook. Eating out is so cheap that unless you are cooking for about 6 people, it’s cheaper to eat out that cook at home. Sounds good to me. :)

On the first evening I went for a walk around the bay area. It was dark by the time I got out, so all the photos are night shots. I’m not great with a camera at the best of times, but add in darkness and you know it’s going to go wrong. I took a lot of shots, but I’ve deleted most of them because they were just too blurry. The best of them made it here. The “altitude” shots, where you can see my reflection in the glass, were taken from the Singapore Flyer. My photos don’t do it justice.

The skyline would make you believe the place is like a scene out of Blade Runner, but when you are walking around the central business district it actually feels quite open. There is a lot of space between the buildings, so it never feels claustrophobic. Another nice thing is the streets are very clean. No litter, no chewing gum stuck everywhere and no dog crap on the pavements. If only the residents of Birmingham could act like this.

The bay area is obviously where all the money is. There is a crazy casino built to look like a boat is sitting on three towers. Next to that there is an absolutely giant shopping mall, half of which is still under construction. They were prepping for a Christian Dior show when I was walking round. The other side of the bay seems to be where the young and rich go out to pose in the evening. Lots of very expensive cars and people trying hard to be noticed. I’m surprised I wasn’t escorted away by the authorities. :)

On the second day of the course we had a couple of delays, so I ended up running over by an hour. No big surprise there. :)

I was going to go to the Night Safari on the second evening, but I foolishly turned on my laptop when I got back to the hotel and the next thing I knew it was 22:00, so I went to bed.

This morning it was a 06:00 start to get to the airport. I waited for a few minutes for a taxi and started to break a sweat standing still. Even at silly o’clock in the morning it was boiling.

The journey from Singapore to Sydney was pretty standard seven hour flight, so nothing major to report, but a few little incidents will be added to a general travel rant (about passengers) I am in the process of writing. :)



To prevent automated spam submissions leave this field empty.
Syndicate content