histograms

Jonathan Lewis's picture

Join Cardinality – 5

So far in this series I’ve written about the way that the optimizer estimates cardinality for an equijoin where one end of the join has a frequency histogram and the other end has a histogram of type:

Jonathan Lewis's picture

Upgrades – again

I’ve got a data set which I’ve recreated in 11.2.0.4 and 12.2.0.1.

I’ve generated stats on the data set, and the stats are identical.

I don’t have any indexes or extended stats, or SQL Plan directives or SQL Plan Profiles, or SQL Plan Baselines, or SQL Patches to worry about.

I’m joining two tables, and the join column on one table has a frequency histogram while the join column on the other table has a height-balanced histogram.  The histograms were created with estimate_percent => 100%. (which explains why I’ve got a height-balanced histogram in 12c rather than a hybrid histogram.)

Here are the two execution plans, 11.2.0.4 first, pulled from memory by dbms_xplan.display_cursor():

Jonathan Lewis's picture

Join Cardinality – 4

In previous installments of this series I’ve been describing how Oracle estimates the join cardinality for single column joins with equality where the columns have histograms defined. So far I’ve  covered two options for the types of histogram involved: frequency to frequency, and frequency to top-frequency. Today it’s time to examine frequency to hybrid.

My first thought about this combination was that it was likely to be very similar to frequency to top-frequency because a hybrid histogram has a list of values with “repeat counts” (which is rather like a simple frequency histogram), and a set of buckets with variable sizes that could allow us to work out an “average selectivity” of the rest of the data.

Jonathan Lewis's picture

Upgrade threat

Here’s one I’ve just discovered while trying to build a reproducible test case – that didn’t reproduce because an internal algorithm has changed.

If you upgrade from 12c to 18c and have a number of hybrid histograms in place you may find that some execution plans change because of a change in the algorithm for producing hybrid histograms (and that’s not just if you happen to get the patch that fixes the top-frequency/hybrid bug relating to high values).

Here’s a little test to demonstrate how I wasted a couple of hours trying to solve the wrong problem – first a simple data set:

Jonathan Lewis's picture

Hybrid Fake

Oracle 12c introduced the “Hybrid” histogram – a nice addition to the available options and one that (ignoring the bug for which a patch has been created) supplies the optimizer with better information about the data than the equivalent height-balanced histogram. There is still a problem, though, in the trade-off between accuracy and speed: just as it does with height-balanced histograms when using auto_sample_size Oracle samples (typically) about 5,500 rows to create a hybrid histogram, and the SQL it uses to generate the necessary summary is essentially an aggregation of the sample, so either you have a small sample with the risk of lower accuracy or a large sample with an increase in workload.

Jonathan Lewis's picture

Join Cardinality – 3

In the previous posting I listed the order of precision of histograms as:

Jonathan Lewis's picture

Hacking for Skew

In my presentation to the UKOUG SIG yesterday “Struggling with Statistics – part 2” I described a problem that I wrote about a few months ago: when you join a fact table with a massively skewed distribution on one of the surrogate key columns to a dimension holding the unique list of keys and descriptions a query against a description “loses” the skew. Here’s an demo of the problem that’s a little simpler than the one in the previous article.

Jonathan Lewis's picture

Extended Histograms – 2

Following on from the previous posting which raised the idea of faking a frequency histogram for a column group (extended stats), this is just a brief demonstration of how you can do this. It’s really only a minor variation of something I’ve published before, but it shows how you can use a query to generate a set of values for the histogram and it pulls in a detail about how Oracle generates and stores column group values.

We’ll start with the same table as we had before – two columns which hold only the combinations (‘Y’, ‘N’) or (‘N’, ‘Y’) in a very skewed way, with a requirement to ensure that the optimizer provides an estimate of 1 if a user queries for (‘N’,’N’) … and I’m going to go the extra mile and create a histogram that does the same when the query is for the final possible combination of (‘Y’,’Y’).

Jonathan Lewis's picture

Extended Histograms

Today’s little puzzle comes courtesy of the Oracle-L mailing list. A table has two columns (c2 and c3), which contain only the values ‘Y’ and ‘N’, with the following distribution:


select   c2, c3, count(*)
from     t1
group by c2, c3
;

C C   COUNT(*)
- - ----------
N Y       1994
Y N      71482

2 rows selected.

The puzzle is this – how do you get the optimizer to predict a cardinality of zero (or, using its best approximation, 1) if you execute a query where the predicate is:

where   c2 = 'N' and c3 = 'N'

Here are 4 tests you might try:

Jonathan Lewis's picture

exp catch

No-one should be using exp/imp to export and import data any more, they should be using the datapump equivalents expdp/impdp – but if you’re on an older (pre-12c) version of Oracle and still using exp/imp to do things like moving tables with their production statistics over to test systems then be careful that you don’t fall into an obsolescence trap when you finally upgrade to 12c (or Oracle 18).

exp/imp will mess up some of your histograms if you’re still using them to move tables/statistics in 12c.

To prevent automated spam submissions leave this field empty.
Syndicate content