statistics

Jonathan Lewis's picture

Unique Fail

As in – how come a unique (or primary key) index is predicted to return more than one row using a unique scan, for example (running on 10.2.0.3 – but the same type of thing happens on newer versions):

mwidlake's picture

Table High Water Mark and How Empty the Table Is

On a recent post about reducing the high water mark in a table by shrinking it someone asked if you could tell how much “space” there was in a table that could be recovered by compressing it.

Yes. As pseudo-code:

select number_of_rows*average_row_length. That is how much data there is.
Divided by the block size, to give how many blocks-worth of data you have.
Divided this by a “overhead” fudge factor to allow for wasted space (see later) of about 20%
Compare to the number of blocks used by the table.

Here is some code to do it, showing some extra data and how we calculate the effective percentage of space used (pct_spc_used). It runs against an example table ‘HWM’

Statistics poll

Thanks to everyone for spreading the link and participating in the poll on statistics. As it was pointed out in the comments, the poll isn’t very specific, I know. My point was to get an idea how many people use the defaults (I considered Oracle version 11g) for stats collection, how many do a very simple and reasonable tweak to get more stable execution plans, and how many people prefer custom statistics management. Well, I think the results will not change much and here are as following (326 votes at the time of writing):

Jonathan Lewis's picture

Extended Stats

I’m very keen on the 11g extended stats feature, but I’ve just discovered a critical weakness in one of the implementation details that could lead to some surprising instability in execution plans. It’s a combination of “column group” statistics and “out of range” predicates. Let’s start with  some sample data. (Note: I was running this test on 11.2.0.3):

create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 10000
)
select
	mod(rownum,100)		col1,
	mod(rownum,10)		col2
from
	generator	v1,
	generator	v2
where
	rownum <= 50000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

METHOD_OPT

Here is a poll prompted by today twitter talks on the default METHOD_OPT value (which is ‘for all columns size auto’ since 10g) and automatic statistics gathering job (auto-task since 11g). CBO development team suggests to use the default job to gather statistics with default options. Do you use the job exactly this way with real production databases? I’m very interested to see some numbers. Please share the poll link http://poll.fm/3n7pn so that more people vote. Thanks.

randolf.geist's picture

Incremental Partition Statistics Review

Introduction

Here is a summary of the findings while evaluating Incremental Partition Statistics that have been introduced in Oracle 11g.

The most important point to understand is that Incremental Partition Statistics are not "cost-free", so anyone who is telling you that you can gather statistics on the lowest level (partition or sub-partition in case of composite partitioning) without any noticeable overhead in comparison to non-incremental statistics (on the lowest level) is not telling you the truth.

Although this might be obvious I've already personally heard someone making such claims so it's probably worth to mention.

In principle you need to test on your individual system whether the overhead that is added to each statistics update on the lowest level outweighs the overhead of actually gathering statistics on higher levels, of course in particular on global level.

Jonathan Lewis's picture

NewDensity

A recent comment on a note I wrote some time ago about faking histograms asked about the calculations of selectivity in the latest versions of Oracle. As I read the question, I realised that I had originally supplied a formula for calculating cardinality, rather than selectivity, so I thought I’d supply a proper example.

We’ll start with a script to create some data and stats – and I’m going to start with a script I wrote in Jan 2001 (which is why it happens to use the analyze command rather than dbms_stats.gather_table_stats, even though this example comes from an instance of 11.2.0.2).

Jonathan Lewis's picture

I Wish

A couple of days ago I wrote about some things I’d like to see in index statistics, which means changing dbms_stats.gather_index_stats(). Here’s an idea for dbms_stats.gather_table_stats(). I’d like to see the number of chained and migrated rows counted separately in dba_tables when we run the packaged procedure. The optimizer will use information about chained or migrated rows, but the information is only collected if you use the analyze command (and even then the two figures are summed into the single value chain_cnt) .

Jonathan Lewis's picture

I wish

Here are a few thoughts on dbms_stats – in particular the procedure gather_index_stats.

The procedure counts the number of used leaf blocks and the number of distinct keys using a count distinct operation, which means you get an expensive aggregation operation when you gather stats on a large index. It would be nice efficiency feature if Oracle changed the code to use the new Approximate NDV mechanism for these counts.

Jonathan Lewis's picture

System Stats

A quick collation – and warning – for 11.2

Bottom line – be careful about what you do with system stats on 11.2

Footnote: the MOS link is a search string  producing a list of references. I set it up like that because one of the articles referencing the bug is called “Things to consider before upgrade to 11.2.0.2″ and it’s worth reading.

Addendum: one of the people on the two-day course I’ve just run in Berlin sent me a link for a quick note on how to set your own values for the system stats if you hit this bug. It’s actually quite a reasonable thing to do whether or not you hit the bug given the way that gathering the stats can produce unsuitable figures anyway:  setting system stats. (I’ve also added their company blog to the links on the right, they have a number interesting items and post fairly regularly.)

To prevent automated spam submissions leave this field empty.
Syndicate content