## Join Cardinality – 5

Posted November 1, 2018So far in this series I’ve written about the way that the optimizer estimates cardinality for an equijoin where one end of the join has a frequency histogram and the other end has a histogram of type:

## Upgrades – again

Posted October 28, 2018I’ve got a data set which I’ve recreated in 11.2.0.4 and 12.2.0.1.

I’ve generated stats on the data set, and the stats are identical.

I don’t have any indexes or extended stats, or SQL Plan directives or SQL Plan Profiles, or SQL Plan Baselines, or SQL Patches to worry about.

I’m joining two tables, and the join column on one table has a * frequency histogram* while the join column on the other table has a

*. The histograms were created with*

**height-balanced histogram***=> 100%. (which explains why I’ve got a height-balanced histogram in 12c rather than a*

**estimate_percent***.)*

**hybrid histogram**Here are the two execution plans, 11.2.0.4 first, pulled from memory by * dbms_xplan.display_cursor()*:

## Join Cardinality – 4

Posted October 25, 2018In previous installments of this series I’ve been describing how Oracle estimates the join cardinality for single column joins with equality where the columns have histograms defined. So far I’ve covered two options for the types of histogram involved: * frequency to frequency*, and

**. Today it’s time to examine**

*frequency to top-frequency**frequency to hybrid*.

My first thought about this combination was that it was likely to be very similar to *frequency to top-frequency* because a hybrid histogram has a list of values with “repeat counts” (which is rather like a simple frequency histogram), and a set of buckets with variable sizes that could allow us to work out an *“average selectivity”* of the rest of the data.

## Upgrade threat

Posted October 23, 2018Here’s one I’ve just discovered while trying to build a reproducible test case – that didn’t reproduce because an internal algorithm has changed.

If you upgrade from 12c to 18c and have a number of * hybrid* histograms in place you may find that some execution plans change because of a change in the algorithm for producing hybrid histograms (and that’s not just if you happen to get the patch that fixes

**the top-frequency/hybrid bug**relating to high values).

Here’s a little test to demonstrate how I wasted a couple of hours trying to solve the wrong problem – first a simple data set:

## Hybrid Fake

Posted October 10, 2018Oracle 12c introduced the * “Hybrid” histogram* – a nice addition to the available options and one that (ignoring

*) supplies the optimizer with better information about the data than the equivalent height-balanced histogram. There is still a problem, though, in the trade-off between accuracy and speed: just as it does with height-balanced histograms when using*

**the bug for which a patch has been created***Oracle samples (typically) about 5,500 rows to create a hybrid histogram, and the SQL it uses to generate the necessary summary is essentially an aggregation of the sample, so either you have a small sample with the risk of lower accuracy or a large sample with an increase in workload.*

**auto_sample_size**## Join Cardinality – 3

Posted October 9, 2018In the * previous posting* I listed the order of precision of histograms as:

## Hacking for Skew

Posted September 28, 2018In my presentation to the UKOUG SIG yesterday *“Struggling with Statistics – part 2”* I described a problem that I wrote about a few months ago: when you join a fact table with a massively skewed distribution on one of the surrogate key columns to a dimension holding the unique list of keys and descriptions a query against a description “loses” the skew. Here’s an demo of the problem that’s a little simpler than the one in * the previous article*.

## Extended Histograms – 2

Posted August 2, 2018Following on from * the previous posting* which raised the idea of faking a frequency histogram for a column group (extended stats), this is just a brief demonstration of how you can do this. It’s really only a minor variation of something I’ve published before, but it shows how you can use a query to generate a set of values for the histogram and it pulls in a detail about how Oracle generates and stores column group values.

We’ll start with the same table as we had before – two columns which hold only the combinations (‘Y’, ‘N’) or (‘N’, ‘Y’) in a very skewed way, with a requirement to ensure that the optimizer provides an estimate of 1 if a user queries for (‘N’,’N’) … and I’m going to go the extra mile and create a histogram that does the same when the query is for the final possible combination of (‘Y’,’Y’).

## Extended Histograms

Posted July 31, 2018Today’s little puzzle comes courtesy of * the Oracle-L mailing list*. A table has two columns (

*and*

**c2***), which contain only the values*

**c3***‘Y’*and

*‘N’*, with the following distribution:

select c2, c3, count(*) from t1 group by c2, c3 ; C C COUNT(*) - - ---------- N Y 1994 Y N 71482 2 rows selected.

The puzzle is this – how do you get the optimizer to predict a cardinality of zero (or, using its best approximation, 1) if you execute a query where the predicate is:

where c2 = 'N' and c3 = 'N'

Here are 4 tests you might try:

## exp catch

Posted April 10, 2018No-one should be using * exp/imp* to export and import data any more, they should be using the

*equivalents*

**datapump***– but if you’re on an older (pre-12c) version of Oracle and still using exp/imp to do things like moving tables with their production statistics over to test systems then be careful that you don’t fall into an obsolescence trap when you finally upgrade to 12c (or Oracle 18).*

**expdp/impdp***exp/imp will mess up some of your histograms if you’re still using them to move tables/statistics in 12c.*