statistics

connor_mc_d's picture

No more stale statistics in 19c

There is an odd contradiction that we all encounter for most databases, especially if they are predominantly used during the business day. Here is how that contradiction comes to be – it is in the way that we obtain and use optimizer  statistics on those databases. The contradiction runs like this:

  • To minimize service disruption, we gather statistics at a quiet time, for example, in the middle of the night
  • We then use those statistics during the business day whilst user activity is at its highest.
  • Highest user activity will typically mean the highest frequency of data changes.
  • Hence the statistics are at their peak accuracy when no-one is using them to optimize queries, and they are at their least accurate when everyone is using them to optimize queries!

We can demonstrate this easily with the following script run in 18c.

Jonathan Lewis's picture

gather_system_stats

What happens when you execute dbms_stats.gather_system_stats() with the ‘Exadata’ option ?

Here’s what my system stats look like (12.2.0.1 test results) after doing so. (The code to generate the two different versions is at the end of the note).

Jonathan Lewis's picture

opt_estimate 5

If you’ve been wondering why I resurrected my drafts on the opt_estimate() hint, a few weeks ago I received an email containing an example of a query where a couple of opt_estimate() hints were simply not working. The critical features of the example was that the basic structure of the query was of a type that I had not previously examined. That’s actually a common type of problem when trying to investigate any Oracle feature from cold – you can spend days thinking about all the possible scenarios you should model then the first time you need to do apply your knowledge to a production system the requirement falls outside every model you’ve examined.

Before you go any further reading this note, though, I should warn you that it ends in frustration because I didn’t find a solution to the problem I wanted to fix – possibly because there just isn’t a solution, possibly because I didn’t look hard enough.

Jonathan Lewis's picture

opt_estimate 4

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specify query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given an query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:

Jonathan Lewis's picture

opt_estimate 3

This is just a quick note to throw out a couple of of the lesser-known options for the opt_estimate() hint – and they may be variants that are likely to be most useful since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):

Jonathan Lewis's picture

opt_estimate 2

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:

Franck Pachot's picture

When Oracle Statistic Gathering times out.

When Oracle Statistic Gathering times out — I

This first part is about running manually, killing the job, and locking the stats

In a previous post, I explained how to see where the Auto Stats job has been running and timed out:

SYS.STATS_TARGET$

I got a case where it always timed out at the end of the standard maintenance window. One table takes many hours, longer than the largest maintenance window, it will always be killed at the end. And, because it stayed stale, and staler each day, this table was always listed first by the Auto Stat job. And many tables never got their chance to get their stats gathered for … years.

In that case, the priority is to gather statistics. That can be long. Then I run the job manually:

Jonathan Lewis's picture

In-table predicates

This note was prompted by a recent email asking about the optimizer’s method for estimating the selectivity of a predicate which compared two columns in the same table – for example:  “where orders.amount_invoiced = orders.amount_paid”. It’s been about 14 years since I wrote “Cost Based Oracle – Fundamentals” so my memory of what I wrote (and whether I even mentioned this case) was rather hazy, so I sent off a quick reply and decided to do a little checking.

Jonathan Lewis's picture

Stats advisor

This is just a little shout-out about the Stats Advisor – if you decide to give it a go, what sort of things is it likely to tell you. The answer is in a dynamic performance view called v$stats_advisor_rules – which I’ve list below from an instance running 18.3.0.0.

Jonathan Lewis's picture

Hash Partitions

Here’s an important thought if you’ve got any large tables which are purely hash partitioned. As a general guideline you should not need partition level stats on those tables. The principle of hash partitioned tables is that the rows are distributed uniformly and randomly based on the hash key so, with the assumption that the number of different hash keys is “large” compared to the number of partitions, any one partition should look the same as any other partition.

To prevent automated spam submissions leave this field empty.
Syndicate content