IOT Bug

Jonathan Lewis's picture

Here’s a worrying bug that showed up a couple of days ago on the Oracle-L mailing list. It’s a problem that I’ve tested against 12.2.0.1 and 19.3.0.0 – it may be present on earlier versions of Oracle. One of the nastiest things about it is that you might not notice it until you get an “out of space” error from the operating system. You won’t get any wrong results from it, but it may well be adding an undesirable performance overhead.

Basically it seems that (under some circumstances, at least) Oracle is setting the “block guess” component of the secondary index on Index Organized Tables (IOTs) to point to blocks in the overflow segment instead of blocks in the primary key segment. As a result, when you execute a query that accesses the IOT through the secondary index and has to do reads from disc to satisfy the query – your session goes through the following steps:

  • Identify index entry from secondary index – acquire “block guess”
  • Read indicated block and discover the object number on the block is wrong, and the block type is wrong
  • Write a (silent) ORA-01410 error and do a block dump into the trace file
  • Use the “logical rowid” from the secondary index (i.e. the stored primary key value) to access the primary key index by key value

So your query runs to completion and you get the right result because Oracle eventually gets there using the primary key component stored in the secondary index, but it always starts with the guess[see sidebar] and for every block you read into the cache because of the guess you get a dump to the trace file.

Here’s a little code to demonstrate. The problem with this code is that everything appears to works perfectly, you have to be able to find the trace file for your session to see what’s gone wrong. First we create some data – this code is largely copied from the original posting on Oracle-L, with a few minor changes:


rem
rem     Script:         iot_bug_12c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem
rem     Notes
rem     THe OP had tested on 19.5.0.0 to get the same effect, see:
rem     //www.freelists.org/post/oracle-l/IOT-cannot-get-valid-consensus-bug-or-unexplained-behavio
rem

drop table randomload purge;

create table randomload(
        roll number,
        name varchar2(40),
        mark1 number,
        mark2 number,
        mark3 number,
        mark4 number,
        mark5 number,
        mark6 number,
        primary key (roll)
) 
organization index 
including mark3 overflow
;

create index randomload_idx on randomload(mark6);

insert into randomload 
select 
        rownum, 
        dbms_random.string(0,40) name, 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,100)), 
        round(dbms_random.value(0,10000)) 
from 
        dual 
connect by 
        level < 1e5 -- > comment to avoid wordpress format issue
;

commit;

exec dbms_stats.gather_table_stats(null,'randomload', cascade=>true);

prompt  ==================================================
prompt  pct_direct_access should be 100 for randomload_idx
prompt  ==================================================

select 
        table_name, index_name, num_rows, pct_direct_access, iot_redundant_pkey_elim  
from 
        user_indexes
where
        table_name = 'RANDOMLOAD'
;

It should take just a few seconds to build the data set and you should check that the pct_direct_access is 100 for the index called randomload_idx.

The next step is to run a query that will do an index range scan on the secondary index.

 
column mark6 new_value m_6

select 
        mark6, count(*) 
from
        randomload 
group by 
        mark6
order by 
        count(*)
fetch first 5 rows only
;

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
set serveroutput off

select avg(mark3) 
from 
        randomload 
where 
        mark6 = &m_6
;

select * from table(dbms_xplan.display_cursor);

alter session set events '10046 trace name context off';
set serveroutput on

I’ve started by selecting one of the least frequencly occuring values of m_6 (a column I know to be in the overflow); then I’ve flushed the buffer cache so that any access I make to the data will have to start with disk reads (the original poster suggested restarting the database at this point, but that’s not necessary).

Then I’ve enabled sql_trace to show wait states (to capture details of what blocks were read and which object they belong to),, and I’ve run a query for m_3 (a column that is in the primary key (TOP) segment of the IOT) and pulled its execution plan from memory to check that the query did use a range scan of the secondary index. Here’s the plan:

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |       |       |    11 (100)|          |
|   1 |  SORT AGGREGATE    |                   |     1 |     7 |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_IOT_TOP_77298 |    10 |    70 |    11   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN| RANDOMLOAD_IDX    |    10 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("MARK6"=1316)
   3 - access("MARK6"=1316)

As you can see the plan shows what we are hoping to see – an index range scan of the secondary index that let’s it follow up with a unique scan of the primary key segment. It’s just a little odd that the access predicate reported for operation 2 (unique scan of TOP) suggests that the access is on a column that isn’t in the primary key and isn’t even in the TOP section.

So the query works and gives the right answer. But what do we find in the trace directory ? If you’re running 12c (possibly only 12.2), each time the error occurs the following pattern of information will be written to the alert log (it didn’t appear in 19.3)


ORCL(3):Hex dump of (file 22, block 16747) in trace file /u01/app/oracle/diag/rdbms/orcl12c/orcl12c/trace/orcl12c_ora_7888.trc
ORCL(3):
ORCL(3):Corrupt block relative dba: 0x0580416b (file 22, block 16747)
ORCL(3):Bad header found during multiblock buffer read (logical check)
ORCL(3):Data in bad block:
ORCL(3): type: 6 format: 2 rdba: 0x0580416b
ORCL(3): last change scn: 0x0000.0b86.0e36484c seq: 0x1 flg: 0x06
ORCL(3): spare3: 0x0
ORCL(3): consistency value in tail: 0x484c0601
ORCL(3): check value in block header: 0x4408
ORCL(3): computed block checksum: 0x0
ORCL(3):

And the following pattern of information is written to the trace file [Update: a follow-up test on 11.2.0.4 suggests that the basic “wrong block address” error also happens in that version of Oracle, but doesn’t result in a dump to the trace file]:


kcbzibmlt:: encounter logical error ORA-1410, try re-reading from other mirror..
cursor valid? 1 makecr 0 line 15461 ds_blk (22, 16747) bh_blk (22, 16747)
kcbds 0x7ff1ca8c0b30: pdb 3, tsn 8, rdba 0x0580416b, afn 22, objd 135348, cls 1, tidflg 0x8 0x80 0x0
    dsflg 0x108000, dsflg2 0x0, lobid 0x0:0, cnt 0, addr 0x0, exf 0x10a60af0, dx 0x0, ctx 0
    whr: 'qeilwh03: qeilbk'
env [0x7ff1ca8e3e54]: (scn: 0x00000b860e364893   xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid:  0x0000.000.00000000  st-scn: 0x0000000000000000  hi-scn: 0x0000000000000000  ma-scn: 0x00000b860e364879  flg: 0x00000660)
BH (0xb1fd6278) file#: 22 rdba: 0x0580416b (22/16747) class: 1 ba: 0xb1c34000
  set: 10 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 763,14
  dbwrid: 0 obj: 135348 objn: 135348 tsn: [3/8] afn: 22 hint: f
  hash: [0x9eff0528,0x77cff808] lru: [0xb1fd2578,0x9ff84658]
  ckptq: [NULL] fileq: [NULL]
  objq: [0xb6f654c0,0x9ff84680] objaq: [0xb6f654d0,0x9ff84690]
  use: [0x77b78128,0x77b78128] wait: [NULL]
  st: READING md: EXCL tch: 0
  flags: only_sequential_access
  Printing buffer operation history (latest change first):
  cnt: 5
  01. sid:10 L122:zgb:set:st          02. sid:10 L830:olq1:clr:WRT+CKT
  03. sid:10 L951:zgb:lnk:objq        04. sid:10 L372:zgb:set:MEXCL
  05. sid:10 L123:zgb:no:FEN          06. sid:10 L083:zgb:ent:fn
  07. sid:08 L192:kcbbic2:bic:FBD     08. sid:08 L191:kcbbic2:bic:FBW
  09. sid:08 L604:bic2:bis:REU        10. sid:08 L190:kcbbic2:bic:FAW
  11. sid:08 L602:bic1_int:bis:FWC    12. sid:08 L822:bic1_int:ent:rtn
  13. sid:08 L832:oswmqbg1:clr:WRT    14. sid:08 L930:kubc:sw:mq
  15. sid:08 L913:bxsv:sw:objq        16. sid:08 L608:bxsv:bis:FBW
Hex dump of (file 22, block 16747)

   ... etc.

Corrupt block relative dba: 0x0580416b (file 22, block 16747)
Bad header found during multiblock buffer read (logical check)
Data in bad block:
 type: 6 format: 2 rdba: 0x0580416b
 last change scn: 0x0000.0b86.0e36484c seq: 0x1 flg: 0x06
 spare3: 0x0
 consistency value in tail: 0x484c0601
 check value in block header: 0x4408
 computed block checksum: 0x0
TRCMIR:kcf_reread     :start:  16747:0:/u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf
TRCMIR:kcf_reread     :done :  16747:0:/u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf

The nasty bit, of course, is the bit I’ve removed and replaced with just “etc.”: it’s a complete block dump (raw and symbolic) which in my example was somthing like 500 lines and 35KB in size.

It’s not immediately obvious exactly what’s going on and why, but the 10046 trace helps a little. From another run of the test (on 19.3.0.0) I got the following combination of details – which is an extract showing the bit of the wait state trace leading into the start of the first block dump:

WAIT #140478118667016: nam='db file scattered read' ela= 108 file#=13 block#=256 blocks=32 obj#=77313 tim=103574529210
WAIT #140478118667016: nam='db file scattered read' ela= 2236 file#=13 block#=640 blocks=32 obj#=77313 tim=103574531549
WAIT #140478118667016: nam='db file scattered read' ela= 534 file#=13 block#=212 blocks=32 obj#=77312 tim=103574532257
kcbzibmlt: encounter logical error ORA-1410, try re-reading from other mirror..
cursor valid? 1 warm_up abort 0 makecr 0 line 16082 ds_blk (13, 212) bh_blk (13, 212)

Object 77313 is the secondary index, object 77312 is the primary key index (IOT_TOP). It may seem a little odd that Oracle is using db file scattered reads of 32 blocks to read the indexes but this is a side effect of flushing the buffer – Oracle may decide to prefeetch many extra blocks of an object to “warmup” the cache just after instance startup or a flush of the buffer cache. The thing I want to check, though, is what’s wrong with the blocks that Oracle read from object 77312:


alter system dump datafile 13 block min 212 block max 243;

BH (0xc8f68e68) file#: 13 rdba: 0x034000d4 (13/212) class: 1 ba: 0xc8266000
  set: 10 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xa7fd6c38) file#: 13 rdba: 0x034000d4 (13/212) class: 1 ba: 0xa7c2a000
  set: 12 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xa5f75780) file#: 13 rdba: 0x034000d5 (13/213) class: 0 ba: 0xa5384000
  set: 11 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

BH (0xdafe9220) file#: 13 rdba: 0x034000d5 (13/213) class: 1 ba: 0xdadcc000
  set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77311 objn: 77311 tsn: [3/6] afn: 13 hint: f

I’ve reported the first few lines of the symbolic dump for the first few blocks of the resulting trace file. Look at the third line of each group of three BH lines: it’s reporting object 77311 (the overflow segment), not 77312 (the TOP segment). And every single block reported in the db file scattered read of 32 blocks for object 77312 reports itself, when dumped, as being part of object 77311. And that’s possibly the immediate cause of the ORA-01410.

We can take the investigation a little further by dumping a leaf block or two from the secondary index.


alter session set events 'immediate trace name treedump level 77313';

----- begin tree dump
branch: 0x3400104 54526212 (0: nrow: 542, level: 1)
   leaf: 0x340010d 54526221 (-1: row:278.278 avs:2479)
   leaf: 0x340075e 54527838 (0: row:132.132 avs:5372)
   leaf: 0x34005fb 54527483 (1: row:41.41 avs:7185)

alter system dump datafile 13 block 1886   -- leaf: 0x340075e

BH (0xd5f5d090) file#: 13 rdba: 0x0340075e (13/1886) class: 1 ba: 0xd5158000
  set: 9 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 77313 objn: 77313 tsn: [3/6] afn: 13 hint: f
...
row#6[5796] flag: K------, lock: 0, len=18
col 0; len 2; (2):  c1 1d
col 1; len 4; (4):  c3 07 41 5c
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  03 40 05 7c

I’ve done a treedump of the secondary index and picked a leaf block address from the treedump and dumped that leaf block, and from that leaf block I’ve extracted one index entry to show you the three components: the key value (c1 1d), the primary key for the row (c3 07 41 5c), and the block guess (03 40 05 75). Read the block guess as a 4 byte hex number, and it translates to file 13, block 1397 – which should belong to the TOP segment. So the exciting question is – what object does block (13, 1397) think it belongs to ?


alter system dump datafile 13 block 1397;

Block header dump:  0x03400575
 Object id on Block? Y
 seg/obj: 0x12dff  csc:  0x00000b860e308c46  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x3400501 ver: 0x01 opc: 0
     inc: 0  exflg: 0

Converting from Hex to Decimal: obj: 0x12dff = 77311 which is the overflow segment. The secondary index block guess is pointing at a block in the overflow segment.

There are two ways to handle this problem – you could simply rebuild the index (alter index rebuild) or, as the original poster did, use the “update block references” command to correct all the block guesses: “alter index randomload_idx update block references;”. Neither is desirable, but if you’re seeing a lot of large trace files following the pattern above then it may be necessary.

There was one particular inconsistency in the tests – which I ran many times – occasionally the pct_direct_access for the secondary index would be reported as zero (which, technically, should always happen given the error).  If it did, of course, Oracle wouldn’t follow the guess but would go straight to the step where it used the primary key “logical rowid” – thus bypassing the error and block dump.

tl;dr

In some circumstances the block guesses in the secondary indexes of IOTs may be pointing to the overflow segment instead of the primary key (TOP) segment. If this happens then queries will still run and give the right answers, but whenever they read a “guessed” block from disc they will report an ORA-01410 error and dump a block trace. This will affect performance and may cause space problems at the O/S level.

Sidebar

An entry in the secondary index of an Index Organized Table (IOT) consists of three parts, which intially we can think in the form:

({key-value}, {logical rowid}, {block guess})

Since IOTs don’t have real rowids the “logical rowid” is actually the primary key of the row where the {key value} will be found. As a short cut for efficient execution Oracle includes the block address (4 bytes) where that primary key value was stored when the row was inserted. Because an IOT is an index “rows” in the IOT can move as new data is inserted and leaf blocks split, so eventually any primary key may move to a different block – this is why we refer to the block address as a guess – a few days, hours, or minutes after you’ve inserted the row the block address may no longer be correct.)

To help the runtime engine do the right thing Oracle collects a statistic called pct_direct_access for secondary indexes of IOTs. This is a measure of what percentage of the block guesses are still correct at the time that the statistics are gathered. If this value is high enough the run-time engine will choose to try using the block guesses while executing a query (falling back to using the logical rowid if it turns out that the guess is invalid), but if the value drops too low the optimizer will ignore the block guesses and only use the logical rowid.

Not relevant to this note – but a final point about secondary indexes and logical rowids – if the definition of the index includes  some of the columns from the primary keys Oracle won’t store those columns twice (in more recent version, that is) – the code is clever enough to use the values stored in the (key value) component when it needs to use the (logical rowid) component.

 

Update (Jan 2020)

I passed this example on to Oracle, and there are now two bugs recorded for it:

  • Bug 30733525 – ALERT LOG ENTRIES RE BLOCK GUESSES IN THE SECONDARY INDEXES OF IOTS POINTING TO OVERFLOW SEGMENT INSTEAD OF INDEX SEGMENT
  • Bug 30733563 – WRONG GUESS DBA IN INDEX

 

 

 

To prevent automated spam submissions leave this field empty.