statistics

Jonathan Lewis

System Stats

A quick collation – and warning – for 11.2

Bottom line – be careful about what you do with system stats on 11.2

Footnote: the MOS link is a search string producing a list of references. I set it up like that because one of the articles referencing the bug is called “Things to consider before upgrade to 11.2.0.2” and it’s worth reading.

Addendum: one of the people on the two-day course I’ve just run in Berlin sent me a link for a quick note on how to set your own values for the system stats if you hit this bug. It’s actually quite a reasonable thing to do whether or not you hit the bug, given the way that gathering the stats can produce unsuitable figures anyway: setting system stats. (I’ve also added their company blog to the links on the right; they have a number of interesting items and post fairly regularly.)
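
For reference, the shape of the code for setting your own values is roughly as follows – a minimal sketch only, and the figures are purely illustrative assumptions, not recommendations:

begin
	dbms_stats.delete_system_stats;
	dbms_stats.set_system_stats('MBRC', 8);		-- multiblock read count
	dbms_stats.set_system_stats('MREADTIM', 10);	-- multiblock read time (ms)
	dbms_stats.set_system_stats('SREADTIM', 5);	-- single block read time (ms)
	dbms_stats.set_system_stats('CPUSPEED', 1000);	-- CPU speed (MHz)
end;
/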

Jonathan Lewis

blevel=1

Here’s one of those quick answers I give sometimes on forums or newsgroups. I forget when I wrote this, and where, and what the specific question was – but it was something to do with rebuilding an index on a small table where data was constantly being deleted and inserted.

Another problem with high insert/delete rates appears with very small indexes.

If you have a table that is small but constantly recycles its space you may also find you have an index where the number of leaf blocks puts you close to the borderline between having blevel = 1 and blevel = 2. If the size crosses that border occasionally and the statistics are updated to reflect the change – which is quite likely for a table subject to lots of updates and deletes if you have automatic stats collection enabled – then execution plans could change, resulting in dramatic changes in performance.
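
If you want to see how close an index is to that boundary, a quick sketch of the check (t1 is a hypothetical table name):

select
	index_name, blevel, leaf_blocks
from
	user_indexes
where
	table_name = 'T1'
;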

The workaround is fairly obvious – don’t let Oracle collect stats automatically on that table, instead create a stats-collection strategy for eliminating the change in blevel. For example, keep the stats locked except when you run your own code to deal with the stats, making sure that you overwrite the index blevel with 1 even if it has just crossed the boundary to 2.
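
As a sketch of what that strategy might look like – the table and index names here are hypothetical, and this assumes that set_index_stats leaves any statistics you pass as null unchanged:

begin
	dbms_stats.unlock_table_stats(user, 't1');

	dbms_stats.gather_index_stats(
		ownname	=> user,
		indname	=> 't1_i1'
	);

	-- force the blevel back to 1, whatever the gather just recorded
	dbms_stats.set_index_stats(
		ownname		=> user,
		indname		=> 't1_i1',
		indlevel	=> 1
	);

	dbms_stats.lock_table_stats(user, 't1');
end;
/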

Footnote: the reason why a change from 1 to 2 is dramatic is that Oracle ignores the blevel in the optimizer arithmetic when it is set to 1; so the change from 1 to 2 actually has the impact of a change from zero to 2. Then the cost of a nested loop access is “cost of single access multiplied by number of times you do it” – so the sudden appearance of a 2 in the formula gives an increment in cost of “2 * number of times you visit the table” if your small table is the second table in a nested loop – and suddenly a nested loop becomes much more expensive without a real change in the data size.
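
To put some purely illustrative numbers on that: the standard index costing formula is roughly cost = blevel + ceil(leaf_blocks * effective index selectivity) + ceil(clustering_factor * effective table selectivity), with the blevel term dropped when blevel = 1. So an index probe that costs 2 at blevel = 1 costs 4 at blevel = 2 – and if a nested loop drives that probe 1,000 times, the join cost jumps by roughly 2 * 1,000 = 2,000.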

Footnote 2: it should be obvious that you don’t need to rebuild the index once you know what the problem is; but since we’re talking about a small index with a blevel that is usually 1 it probably won’t take more than a fraction of a second to rebuild the index and there’s a fair chance you can find a safe moment to do it. In terms of complexity the solution is just as simple as the stats solution – so you might as well consider it. The only thing you need to be careful about is that you don’t happen to rebuild the index at a time when the blevel is likely to be 2.

Footnote 3: For an example of the type of code that will adjust the blevel of an index see this URL. (Note, the example talks about copying stats from one place to another – but the principle is the same.)

Jonathan Lewis

Virtual bug

I’ve said in the past that one of the best new features, in my view, in 11g was the appearance of proper virtual columns; and I’ve also been very keen on the new “approximate NDV” that makes it viable to collect stats with the “auto_sample_size”.

Who’d have guessed that if you put them both together, then ran a parallel stats collection, it would break :(

The bug number Karen quotes (10013177.8) doesn’t (appear to) mention extended stats – but since virtual columns, function-based indexes, and extended stats share a number of implementation details I’d guess that they might be affected as well.

Jonathan Lewis

Rows per block

A recent question on the OTN database forum:

Can any one please point to me a document or a way to calculate the average number of rows per block in oralce 10.2.0.3

One answer would be to collect stats and then approximate it as block size / avg_row_len – although you have to worry about things like row overheads, the row directory, and block overheads before you can be sure you’ve got it right. On top of this, the average might not be too helpful anyway. So here’s another (not necessarily fast) option that gives you more information about the blocks that have any rows in them (I picked the source$ table from a 10g system because source$ is often good for some extreme behaviour).


break on report

compute sum of tot_blocks on report
compute sum of tot_rows   on report

column avg_rows format 999.99

select
	twentieth,
	min(rows_per_block)			min_rows,
	max(rows_per_block)			max_rows,
	sum(block_ct)				tot_blocks,
	sum(row_total)				tot_rows,
	round(sum(row_total)/sum(block_ct),2)	avg_rows
from
	(
	select
		ntile(20) over (order by rows_per_block) twentieth,
		rows_per_block,
		count(*)			block_ct,
		rows_per_block * count(*)	row_total
	from
		(
		select
			fno, bno, count(*) rows_per_block
		from
			(
			select
				dbms_rowid.rowid_relative_fno(rowid)	as fno,
			dbms_rowid.rowid_block_number(rowid)	as bno
			from
				source$
			)
		group by
			fno, bno
		)
	group by
		rows_per_block
	order by
		rows_per_block
	)
group by
	twentieth
order by
	twentieth
;

I’ve used the ntile() function to split the results into 20 lines; obviously you might want to change this according to the expected variation in rowcounts for your target table. In my case the results looked like this:

 TWENTIETH   MIN_ROWS   MAX_ROWS TOT_BLOCKS   TOT_ROWS AVG_ROWS
---------- ---------- ---------- ---------- ---------- --------
         1          1         11       2706       3470     1.28
         2         12         22         31        492    15.87
         3         23         34         30        868    28.93
         4         35         45         20        813    40.65
         5         46         57         13        664    51.08
         6         59         70         18       1144    63.56
         7         71         81         23       1751    76.13
         8         82         91         47       4095    87.13
         9         92        101         79       7737    97.94
        10        102        111        140      14976   106.97
        11        112        121        281      32799   116.72
        12        122        131        326      41184   126.33
        13        132        141        384      52370   136.38
        14        142        151        325      47479   146.09
        15        152        161        225      35125   156.11
        16        162        171        110      18260   166.00
        17        172        181         58      10207   175.98
        18        182        191         18       3352   186.22
        19        193        205         22       4377   198.95
        20        206        222         16       3375   210.94
                                 ---------- ----------
sum                                    4872     284538

Of course, the moment you see a result like this it prompts you to ask more questions.

Is the “bell curve” effect that you can see centred around the 13th ntile indicative of a normal distribution of row lengths? If so, why is the first ntile such an extreme outlier – is that indicative of a number of peculiarly long rows, did time of arrival have a special effect, is it the result of a particular pattern of delete activity … and so on.
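
As one way to start digging into that outlier (a sketch only – it assumes you can query sys.source$, and uses vsize(source) as a crude proxy for row length):

select
	dbms_rowid.rowid_relative_fno(rowid)	fno,
	dbms_rowid.rowid_block_number(rowid)	bno,
	count(*)				rows_in_block,
	round(avg(vsize(source)),1)		avg_source_len
from
	sys.source$
group by
	dbms_rowid.rowid_relative_fno(rowid),
	dbms_rowid.rowid_block_number(rowid)
having
	count(*) <= 11
order by
	rows_in_block
;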

Averages are generally very bad indicators if you’re worried about the behaviour of an Oracle system.

Jonathan Lewis

Stats collection

From time to time I see people asking how they can check how far their system has got in its call to collect stats. The question is often a little ambiguous – they know which call to dbms_stats they’ve used, so they know whether they’re trying to monitor stats collection against a single table (taking advantage of v$session_longops springs to mind) or against an entire schema.

Here’s one simple-minded approach that I whipped up a few years ago – it’s possible you could do better with the latest versions of dbms_stats. Its purpose is simply to give you some way of checking what work dbms_stats needs to do (in this example, for a schema), so that you can check how much has been done and how much remains. The basic method can be modified in various ways to match your tastes.

We start with the simple observation that many calls to dbms_stats have an “options” parameter that allows you to specify things like ‘LIST STALE’, rather than ‘GATHER STALE’. The “list” options populate an object table type with details of the data segments whose statistics would be gathered by the corresponding “gather” option. So we could start with a simple piece of code to do the following:

declare
	m_object_list	dbms_stats.objecttab;
begin

	dbms_stats.gather_schema_stats(
		ownname		=> 'test_user',
		options		=> 'LIST AUTO',
--		options		=> 'LIST STALE',
--		options		=> 'LIST EMPTY',
		objlist		=> m_object_list
	);

	for i in 1..m_object_list.count loop
		dbms_output.put_line(
			rpad(m_object_list(i).ownname,30)     ||
			rpad(m_object_list(i).objtype, 6)     ||
			rpad(m_object_list(i).objname,30)     ||
			rpad(m_object_list(i).partname,30)    ||
			rpad(m_object_list(i).subpartname,30) ||
			lpad(m_object_list(i).confidence,4)
		);
	end loop;
end;
/

Call this before you make your call to gather stats (and in general it might be better to use the utl_file package to write to a file rather than using dbms_output and capturing screen output) then, as time passes, you can check the “last_analyzed” column on the relevant view to see which objects are still waiting for their stats to be collected. Of course, this approach is a little clunky, and requires a certain amount of tedious manual labour to get the check done – something along the lines of the sketch below.
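
A minimal sketch of that manual check (assuming, purely for illustration, a serial run that started within the last hour – adjust the time window to the start of your own run):

select
	table_name, last_analyzed
from
	user_tables
where
	last_analyzed >= sysdate - 1/24
order by
	last_analyzed
;

But once we have the basic principle the rest is easy. Let’s start by using the code in a pipelined function.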

create or replace function list_stale (
	i_option	in	varchar2,
	i_user		in	varchar2 default user
)
return dbms_stats.objecttab pipelined
as
	pragma autonomous_transaction;
	m_object_list	dbms_stats.objecttab;
begin

	if i_option not in (
		'LIST AUTO', 'LIST STALE','LIST EMPTY'
	) then
		null;
	else
		dbms_stats.gather_schema_stats(
			ownname		=> i_user,
			options		=> i_option,
			objlist		=> m_object_list
		);

		for i in 1..m_object_list.count loop
			pipe row (m_object_list(i));
		end loop;

	end if;

	return;
end;
/

You’ll notice that I’ve declared the function to run as an autonomous transaction – the call to dbms_stats does various things (such as updating the mon_mods$ and col_usage$ tables) that you’re not supposed to do in a pipelined function, but you can hide these by using the autonomous_transaction pragma. (Strangely the error message you get in the absence of the pragma is: “ORA-04092: cannot COMMIT in a trigger”.)

With this definition I can call the function with code like:

select  *
from    table(list_stale('LIST AUTO'))

OWNNAME      OBJTYP OBJNAME         PARTNAME         SUBPARTNAME            CONFIDENCE
------------ ------ --------------- ---------------- ---------------------- ----------
TEST_USER    TABLE  PT_RANGE                                                       100
TEST_USER    TABLE  PT_RANGE        P600                                           100
TEST_USER    TABLE  PT_RL                                                          100
TEST_USER    TABLE  PT_RL           P_2002_MAR                                     100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_THE_REST           100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_MN                 100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_EAST_COAST         100
TEST_USER    TABLE  PT_RL           P_2002_MAR       P_2002_MAR_CA                 100
TEST_USER    TABLE  T1                                                             100
TEST_USER    INDEX  PT_PK                                                          100
TEST_USER    INDEX  PT_PK           P600                                           100
TEST_USER    INDEX  T1_PK                                                          100
        ... etc ...

This will give me a list of all objects in my schema that need stats collected – either because they have no stats, or their stats are stale.

But I can be more subtle than this. I could, for example, write a query that joins this “table” to the view dba_segments and keep running it as the stats collection proceeds to report the objects that are still waiting for stats and the sizes of those objects. The SQL needs just a little thought as the objtype is only reported as “TABLE” or “INDEX” so you have to do a UNION ALL in the join and work out which rows are really for partitions and which for subpartitions. And then you have to mess about a bit with outer joins because Oracle will, for example, want to collect stats on a table when a partition is stale – and there is no data segment at the table level of a partitioned table.

Your code might look something like this:

with list_result as (
	select *
	from	table(list_stale('LIST AUTO'))
)
select
	lst.objname, nvl(seg.segment_type,lst.objtype) segment_type, lst.partname, seg.blocks
from
	list_result	lst,
	dba_segments	seg
where
	lst.partname is null
and	seg.owner(+) = lst.ownname
and	seg.segment_type(+) = lst.objtype
and	seg.segment_name(+) = lst.objname
and	seg.partition_name is null
union all
select
	lst.objname, nvl(seg.segment_type,lst.objtype || ' PARTITION') segment_type, lst.partname, seg.blocks
from
	list_result	lst,
	dba_segments	seg
where
	lst.subpartname is null
and	lst.partname is not null
and	seg.owner(+) = lst.ownname
and	substr(seg.segment_type(+),1,5) = lst.objtype
and	seg.segment_name(+) = lst.objname
and	seg.partition_name(+) = lst.partname
union all
select
	lst.objname, segment_type, lst.partname, seg.blocks
from
	list_result	lst,
	dba_segments	seg
where
	lst.subpartname is not null
and	seg.owner = lst.ownname
and	substr(seg.segment_type,1,5) = lst.objtype
and	seg.segment_name = lst.objname
and	seg.partition_name = lst.subpartname
order by
	1,2,3
/

In a tiny schema, where every data segment is one of my “1MB uniform extents”, this is the sample of output I got:

OBJNAME         SEGMENT_TYPE       PARTNAME                           BLOCKS
--------------- ------------------ ------------------------------ ----------
PT_PK           INDEX
PT_PK           INDEX PARTITION    P200                                  128
PT_PK           INDEX PARTITION    P400                                  128
PT_PK           INDEX PARTITION    P600                                  128
PT_RANGE        TABLE
PT_RANGE        TABLE PARTITION    P200                                  128
PT_RANGE        TABLE PARTITION    P400                                  128
PT_RANGE        TABLE PARTITION    P600                                  128
PT_RL           TABLE
PT_RL           TABLE PARTITION    P_2002_FEB
PT_RL           TABLE PARTITION    P_2002_MAR
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_FEB                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
PT_RL           TABLE SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX
RL_PK           INDEX PARTITION    P_2002_FEB
RL_PK           INDEX PARTITION    P_2002_MAR
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_FEB                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
RL_PK           INDEX SUBPARTITION P_2002_MAR                            128
T1              TABLE                                                    128
T1_PK           INDEX                                                    128
T2              TABLE                                                    128
T2_I1           INDEX                                                    128

Obviously you could do something more complex to associate indexes with their tables, then sum subpartitions back to partitions and partitions back to table level stats – but a simple list of objects with primary physical sizes might be sufficient to give you an idea of how much work is still to be done. If you want to be really elegant (or extravagant) you could even combine this SQL with another piece reporting the objects with last_analyzed more recent than the start of the run, including, if you’re not running a parallel collection, the approximate time taken for each object (think lag() and comparing consecutive times – see the sketch below).
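
A sketch of that last idea (assuming a serial collection that started today, and looking only at tables for brevity):

select
	table_name,
	last_analyzed,
	round(
		(last_analyzed - lag(last_analyzed) over (order by last_analyzed)) * 86400
	)	approx_seconds
from
	user_tables
where
	last_analyzed >= trunc(sysdate)
order by
	last_analyzed
;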

Distinct placement

As a follow-up to a recent post on different names Oracle can use for the intermediate views, here is a quick example of the technique called distinct placement.
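
As a quick illustrative sketch of the idea (using two hypothetical tables t1 and t2): the optimizer may rewrite the first query below into something logically like the second, placing the distinct inside an internal view so that duplicates are removed before the join:

select	distinct t1.owner
from	t1, t2
where	t2.object_id = t1.object_id
;

-- may be transformed into something logically equivalent to:

select	distinct t1.owner
from	t1,
	(select distinct t2.object_id from t2)	vw_dtp
where	vw_dtp.object_id = t1.object_id
;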

Niall Litchfield

Distinctly Odd

I recently got involved with a performance investigation for an Oracle 9.2 database. The process of investigation threw up some interesting information for me regarding the accuracy of statistics collection in Oracle. It also highlights how different defaults in different versions of Oracle can lead to remarkably different statistics and hence execution plans.  Finally, it [...]

randolf.geist

Pending Statistics

This is just a quick heads-up to those who plan to use the Pending Statistics feature introduced in Oracle 11.1.

It looks like, in all currently available versions that support this feature, Pending Statistics have not been implemented consistently across all possible DBMS_STATS calls, so you have to be very careful which calls you use. Having enabled pending statistics for a particular table you might start to manipulate the statistics under the impression that the modifications performed are not reflected in the actual dictionary statistics (by "dictionary statistics" in this case I don't mean the statistics of the data dictionary objects themselves but the actual statistics of database objects stored in the data dictionary) but only in the pending statistics area, allowing you to test statistics modifications in an isolated environment using the OPTIMIZER_USE_PENDING_STATISTICS parameter at session level.

You might therefore be in for a surprise when you find out that this holds true only for a limited set of DBMS_STATS calls, but not for all.

This effectively means that particular changes to the statistics will take effect immediately even though pending statistics have been enabled.

In particular, manipulations of the statistics using the SET_*_STATS procedures of DBMS_STATS seem to ignore the pending statistics setting and update the dictionary statistics immediately, without further notice.

This is rather unfortunate, since it means that Pending Statistics cannot be used in a straightforward way to test user-defined statistics – something that can be very helpful under certain circumstances but requires extensive testing before use on a live system.

Other calls, too, such as gathering statistics only for a particular set of columns, show unexpected behaviour: it looks like both sets of statistics get modified – the pending statistics area, but also the dictionary statistics.

Note that setting the GLOBAL preference (DBMS_STATS.SET_GLOBAL_PREFS) for PUBLISH to FALSE seems to fix this particular issue – in that case only the pending statistics get updated and the dictionary statistics are left unchanged. Unfortunately this fix does not apply to the SET_*_STATS procedures, which seem always to update the dictionary statistics.

The worst thing, however, is that the statistics history that has been maintained automatically since Oracle 10g does not reflect these (unintended) changes properly, so you cannot easily recover from the potentially unwanted modifications by calling DBMS_STATS.RESTORE_TABLE_STATS.

Finally, I was, apparently, able to activate the pending statistics by calling DBMS_STATS.RESTORE_TABLE_STATS – you can see this behaviour rather clearly when using the SET_GLOBAL_PREFS('PUBLISH', 'FALSE') variant of the following script.

The following is a small demonstration of the issues encountered – please note, should you intend to run this test yourself, that it modifies the GLOBAL preference for the PUBLISH setting.

set echo on timing on linesize 130 tab off trimspool on

alter session set nls_date_format = 'DD.MM.YYYY HH24:MI:SS';

drop table t purge;

create table t
as
select * from all_objects
where rownum <= 1000;

exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

-- Our baseline, no histograms, basic column statistics for all columns
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 1')

-- Verify the result
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Enable pending statistics for table T
-- You can try these different calls
--
-- The GATHER_*_STATS procedures seem to behave correctly
-- only when setting the GLOBAL PREFS to FALSE
--
-- "Correctly" means that the results are reflected in the
-- pending area only but not in the dictionary statistics
--
-- Note that the SET_*_STATS procedures seem to ignore the setting
-- always and publish directly to the dictionary
-- no matter what the PUBLISH setting is on any level (TABLE, GLOBAL)
--
-- exec dbms_stats.set_global_prefs('PUBLISH', 'FALSE')
exec dbms_stats.set_table_prefs(null, 'T', 'PUBLISH', 'FALSE')
-- exec dbms_stats.set_schema_prefs(user, 'PUBLISH', 'FALSE')

-- Verify the current setting, statistics will not be published
select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- This is supposed to go to the pending statistics area
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 254')

-- Yes, it worked, the dictionary statistics are not modified
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- The pending statistics area contains now the new statistics including histograms
select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- Let's gather statistics only for the OBJECT_NAME column
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 1')

-- Oops, why do my dictionary statistics reflect that change (Note the LAST_ANALYZED column)
-- Unless you set the GLOBAL preference for PUBLISH to FALSE
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- I now have the statistics updated in both the pending statistics area and the dictionary statistics
-- Unless you set the GLOBAL preference for PUBLISH to FALSE
select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- Let's recreate the histogram only on the OBJECT_NAME column
exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 254')

-- Oops, I did it again...
-- Unless you set the GLOBAL preference for PUBLISH to FALSE
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- Wait for two seconds to make the LAST_ANALYZED column meaningful
exec dbms_lock.sleep(2)

-- Let's define a manually crafted NDV and DENSITY value
-- Again I expect this to go to the pending statistics area
declare
srec dbms_stats.statrec;
novals dbms_stats.numarray;
distcnt number;
avgclen number;
nullcnt number;
density number;
begin
dbms_stats.get_column_stats(null, 't', 'object_name', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
dbms_stats.set_column_stats(
ownname=>null,
tabname=>'t',
colname=>'object_name',
distcnt=>distcnt*100,
nullcnt=>nullcnt,
srec=>srec,
avgclen=>avgclen,
density=>density/100
);
end;
/

-- Nope, no change here
select
num_distinct
, density
, last_analyzed
from
user_col_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- But I just changed it in the dictionary statistics
-- Even if you set the GLOBAL preference to FALSE
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

-- And what is even worse: The statistics history does not reflect all these changes to the dictionary statistics
select table_name, stats_update_time from USER_TAB_STATS_HISTORY where table_name = 'T';

exec dbms_stats.restore_table_stats(null, 'T', systimestamp)

-- But which statistics have been restored now?
-- It looks like this actually restored the PENDING statistics
-- according to the LAST_ANALYZED information??
select
num_distinct
, density
, last_analyzed
, num_buckets
, user_stats
from
user_tab_col_statistics
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histgrm_pending_stats
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

select
count(*)
from
user_tab_histograms
where
table_name = 'T'
and column_name = 'OBJECT_NAME';

exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

And this is what I get from running this on 11.1.0.7, 11.2.0.1 or 11.2.0.2:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> alter session set nls_date_format = 'DD.MM.YYYY HH24:MI:SS';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> drop table t purge;

Table dropped.

Elapsed: 00:00:00.20
SQL>
SQL> create table t
2 as
3 select * from all_objects
4 where rownum <= 1000;

Table created.

Elapsed: 00:00:00.35
SQL>
SQL> exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

DBMS_STATS.GET_PREFS('PUBLISH',NULL,'T')
----------------------------------------------------------------------------------------------------------------------------------
TRUE

Elapsed: 00:00:00.00
SQL>
SQL> -- Our baseline, no histograms, basic column statistics for all columns
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> -- Verify the result
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .001005025 18.01.2011 18:58:24 1 NO

Elapsed: 00:00:00.01
SQL>
SQL> -- Enable pending statistics for table T
SQL> -- You can try these different calls
SQL> --
SQL> -- The GATHER_*_STATS procedures seem to behave correctly
SQL> -- only when setting the GLOBAL PREFS to FALSE
SQL> --
SQL> -- "Correctly" means that the results are reflected in the
SQL> -- pending area only but not in the dictionary statistics
SQL> --
SQL> -- Note that the SET_*_STATS procedures seem to ignore the setting
SQL> -- always and publish directly to the dictionary
SQL> -- no matter what the PUBLISH setting is on any level (TABLE, GLOBAL)
SQL> --
SQL> -- exec dbms_stats.set_global_prefs('PUBLISH', 'FALSE')
SQL> exec dbms_stats.set_table_prefs(null, 'T', 'PUBLISH', 'FALSE')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL> -- exec dbms_stats.set_schema_prefs(user, 'PUBLISH', 'FALSE')
SQL>
SQL> -- Verify the current setting, statistics will not be published
SQL> select dbms_stats.get_prefs('PUBLISH', null, 'T') from dual;

DBMS_STATS.GET_PREFS('PUBLISH',NULL,'T')
----------------------------------------------------------------------------------------------------------------------------------
FALSE

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- This is supposed to go to the pending statistics area
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR ALL COLUMNS SIZE 254')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.99
SQL>
SQL> -- Yes, it worked, the dictionary statistics are not modified
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .001005025 18.01.2011 18:58:24 1 NO

Elapsed: 00:00:00.01
SQL>
SQL> -- The pending statistics area contains now the new statistics including histograms
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .00101 18.01.2011 18:58:26

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
2

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- Let's gather statistics only for the OBJECT_NAME column
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.14
SQL>
SQL> -- Oops, why do my dictionary statistics reflect that change (Note the LAST_ANALYZED column)
SQL> -- Unless you set the GLOBAL preference for PUBLISH to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .001005025 18.01.2011 18:58:29 1 NO

Elapsed: 00:00:00.01
SQL>
SQL> -- I now have the statistics updated in both the pending statistics area and the dictionary statistics
SQL> -- Unless you set the GLOBAL preference for PUBLISH to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .001005025 18.01.2011 18:58:29

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
0

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
2

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- Let's recreate the histogram only on the OBJECT_NAME column
SQL> exec dbms_stats.gather_table_stats(null, 'T', estimate_percent => null, cascade => false, method_opt => 'FOR COLUMNS OBJECT_NAME SIZE 254')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.34
SQL>
SQL> -- Oops, I did it again...
SQL> -- Unless you set the GLOBAL preference for PUBLISH to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .00101 18.01.2011 18:58:32 254 NO

Elapsed: 00:00:00.01
SQL>
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .00101 18.01.2011 18:58:32

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> -- Wait for two seconds to make the LAST_ANALYZED column meaningful
SQL> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.01
SQL>
SQL> -- Let's define a manually crafted NDV and DENSITY value
SQL> -- Again I expect this to go to the pending statistics area
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 begin
9 dbms_stats.get_column_stats(null, 't', 'object_name', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
10 dbms_stats.set_column_stats(
11 ownname=>null,
12 tabname=>'t',
13 colname=>'object_name',
14 distcnt=>distcnt*100,
15 nullcnt=>nullcnt,
16 srec=>srec,
17 avgclen=>avgclen,
18 density=>density/100
19 );
20 end;
21 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.17
SQL>
SQL> -- Nope, no change here
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 from
6 user_col_pending_stats
7 where
8 table_name = 'T'
9 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED
------------ ---------- -------------------
995 .00101 18.01.2011 18:58:32

Elapsed: 00:00:00.01
SQL>
SQL> -- But I just changed it in the dictionary statistics
SQL> -- Even if you set the GLOBAL preference to FALSE
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
99500 .0000101 18.01.2011 18:58:34 254 YES

Elapsed: 00:00:00.01
SQL>
SQL> -- And what is even worse: The statistics history does not reflect all these changes to the dictionary statistics
SQL> select table_name, stats_update_time from USER_TAB_STATS_HISTORY where table_name = 'T';

TABLE_NAME STATS_UPDATE_TIME
------------------------------ ---------------------------------------------------------------------------
T 18-JAN-11 06.58.24.391000 PM +01:00

Elapsed: 00:00:00.01
SQL>
SQL> exec dbms_stats.restore_table_stats(null, 'T', systimestamp)

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.29
SQL>
SQL> -- But which statistics have been restored now?
SQL> -- It looks like this actually restored the PENDING statistics
SQL> -- according to the LAST_ANALYZED information??
SQL> select
2 num_distinct
3 , density
4 , last_analyzed
5 , num_buckets
6 , user_stats
7 from
8 user_tab_col_statistics
9 where
10 table_name = 'T'
11 and column_name = 'OBJECT_NAME';

NUM_DISTINCT DENSITY LAST_ANALYZED NUM_BUCKETS USE
------------ ---------- ------------------- ----------- ---
995 .00101 18.01.2011 18:58:32 254 NO

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histgrm_pending_stats
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> select
2 count(*)
3 from
4 user_tab_histograms
5 where
6 table_name = 'T'
7 and column_name = 'OBJECT_NAME';

COUNT(*)
----------
255

Elapsed: 00:00:00.01
SQL>
SQL> exec dbms_stats.set_global_prefs('PUBLISH', 'TRUE')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>

It is also interesting to note that, although Oracle has added an array of new dictionary views that allow access to the pending statistics area, these views are inconsistent with the existing statistics-related views in terms of naming conventions, columns and behaviour.

For example, the *_col_pending_stats view does not have a NUM_BUCKETS column, and the corresponding *_tab_histgrm_pending_stats view for histogram details shows 0 rows if basic column statistics have been defined but no histogram, whereas the original *_tab_histograms view returns two rows (representing the low and high values of the column) once basic column statistics have been collected.

Summary

The Pending Statistics feature clearly shows some unexpected behaviour. In particular, you had better not rely on it to prevent the dictionary statistics from being updated by DBMS_STATS calls while the PUBLISH attribute is set to FALSE.

Some of this clearly looks like a bug, but I haven't yet been able to find a corresponding bug entry in My Oracle Support.

Note that this post does not cover the actual usage of Pending Statistics by the optimizer - I haven't done any extensive testing in this regard, but some quick checks showed that it seems to work as expected: the optimizer picks the pending statistics for those tables that have them defined when OPTIMIZER_USE_PENDING_STATISTICS is set to TRUE, but still uses the statistics from the dictionary for those tables that don't have any pending statistics defined.
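
A minimal sketch of such a quick check, assuming the table T from the script above still has pending statistics in place (the literal 'X' is just an arbitrary value):

alter session set optimizer_use_pending_statistics = true;

explain plan for
select count(*) from t where object_name = 'X';

select * from table(dbms_xplan.display);

alter session set optimizer_use_pending_statistics = false;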

Jonathan Lewis

System Statistics

I wrote an article about system statistics / CPU Costing for Oracle magazine a few years ago – and last week I realised that I’ve never supplied a link to it in the notes and comments I’ve made about system statistics. So I’ve just run a search through the Oracle website trying to find it – and discovered that it’s no longer available. Apparently the editors have decided that any technical articles over a certain age should be withdrawn in case they are out of date and misleading. (Clearly they’ve read my blog on trust – I wish the people maintaining Metalink would do the same as the magazine editors – but they probably have a much larger volume to worry about).

However, I have discovered translations of the article in Russian, Korean and Chinese – so if you can read any of these languages, you might want to take a look at them before they disappear too.

If you want an original English version – dated April 2004, which is when I sent it in to Oracle Magazine, and before it underwent some editing – I’ve posted it as a pdf file.

[More on System Statistics]

Jonathan Lewis

Frequency Histograms – 6

In an earlier post on frequency histograms I described how Oracle creates an approximate histogram when dealing with character columns, and warned you that the strategy could lead to a couple of anomalies if you were unlucky. I’ve already published a note about one such anomaly that can occur with fairly long character strings; this note describes another anomaly that could appear in less extreme cases. Again, we start by constructing a data set.


create table t1  (v1 varchar2(42));

insert	into t1
select	'next day'
from	all_objects
where 	rownum <= 150
;

insert into t1
select	'same day'
from 	all_objects
where 	rownum <= 350
;

insert into t1
select 	'priority high'
from 	all_objects
where 	rownum <= 500
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'t1',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 254'
	);
end;
/

select
        endpoint_number,
        endpoint_number - nvl(prev_endpoint,0)  frequency,
        hex_val,
        chr(to_number(substr(hex_val, 2,2),'XX')) ||
        chr(to_number(substr(hex_val, 4,2),'XX')) ||
        chr(to_number(substr(hex_val, 6,2),'XX')) ||
        chr(to_number(substr(hex_val, 8,2),'XX')) ||
        chr(to_number(substr(hex_val,10,2),'XX')) ||
        chr(to_number(substr(hex_val,12,2),'XX')),
        endpoint_actual_value
from    (
        select
                endpoint_number,
                lag(endpoint_number,1) over(
                        order by endpoint_number
                )                                                       prev_endpoint,
                to_char(endpoint_value,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')hex_val,
                endpoint_actual_value
        from
                user_tab_histograms
        where
                table_name = 'T1'
        and     column_name = 'V1'
        )
order by
        endpoint_number
;

ENDPOINT_NUMBER  FREQUENCY HEX_VAL                         CHR(TO ENDPOINT_ACTUAL_VALUE
--------------- ---------- ------------------------------- ------ ------------------------------------------
            150        150  6E65787420644D6D50DD04A6400000 next d
            650        500  7072696F72698970D6651967200000 priori
           1000        350  73616D6520645C36B7AD479D600000 same d

3 rows selected.

This example captures a histogram at a point when my data set has only three popular values in it, and those values are sufficiently different that their “6-byte encoding” is enough for Oracle to tell the difference between them so the histogram doesn’t use the endpoint_actual_value column.

But I have a problem. During the course of the working day I introduce some data with different status codes, and all the queries I run are about these status codes, but by the end of the day (when, possibly, the stats will be collected again by the automatic stats collection job) all the “interesting” data has been updated to one of the three popular values.

What does Oracle do about queries for my “rare” values? It depends what they look like. Here’s a couple of examples from 10.2.0.3:

select
	count(*)
from
	t1
where
	v1 = 'overnight'
;

select
	count(*)
from
	t1
where
	v1 = 'priority low'
;

set autotrace off

Neither of these values is in the histogram, so I would like Oracle to give a low cardinality estimate to them. But one of the values is very similar to a value that DOES exist in the histogram. This leads to an interesting variation in the numbers that appear in the execution plans:

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    11 |     2 |
|   1 |  SORT AGGREGATE    |      |     1 |    11 |       |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |    11 |     2 |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1"='overnight')

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    11 |     2 |
|   1 |  SORT AGGREGATE    |      |     1 |    11 |       |
|*  2 |   TABLE ACCESS FULL| T1   |   500 |  5500 |     2 |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("V1"='priority low')

In the first case (‘overnight’) the optimizer has decided that the value doesn’t exist in the histogram. Since this database is running 10.2.0.3 it has set the resulting cardinality to 1 (this would be 75 in 10.2.0.4 or 11g – half the lowest known value in the histogram). But in the second case (‘priority low’) the optimizer has done its 6-byte manipulation and decided that ‘priority low’ matches ‘priority high’, and given it a cardinality of 500.

If Oracle had seen ‘priority low’ when creating the histogram it would have used the endpoint_actual_value column and generated two separate rows for ‘priority low’ and ‘priority high’. But it is perfectly feasible that because of timing (the interesting data exists only briefly during the day) or sampling (the interesting data is tiny compared to the popular values) you could see Oracle taking a “large data” execution path when you know that you’re only expecting to handle a small data set.

The solution, of course, is to write code to create a histogram that represents the situation as you want Oracle to see it – along the lines of the sketch below.
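
A minimal sketch of that approach, based on the data set above (the frequencies are illustrative assumptions – in particular the frequency of 1 for the two “rare” values – and note that prepare_column_values expects the values in sorted order):

declare
	m_srec		dbms_stats.statrec;
	m_charvals	dbms_stats.chararray;
begin
	-- the five values we want the optimizer to know about, in sorted order
	m_charvals := dbms_stats.chararray(
		'next day', 'overnight', 'priority high', 'priority low', 'same day'
	);

	m_srec.epc    := 5;
	m_srec.bkvals := dbms_stats.numarray(150, 1, 500, 1, 350);

	dbms_stats.prepare_column_values(m_srec, m_charvals);

	dbms_stats.set_column_stats(
		ownname	=> user,
		tabname	=> 't1',
		colname	=> 'v1',
		distcnt	=> 5,
		nullcnt	=> 0,
		srec	=> m_srec,
		avgclen	=> 10		-- illustrative average column length
	);
end;
/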
