Jonathan Lewis's picture

Following on from a Twitter reference and an update to an old posting about a side effect of  constraints on the work done inserting data, I decided to have a closer look at the more general picture of default values and inserts. Here’s a script that I’ve tested against,, and (original install, no patches applied in all cases):

rem     Script:         defaults_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2017

create table t1 (
        column1  varchar2(10),
        column2  varchar2(10),
        column3  varchar2(10),
        column4  varchar2(10),
        column32 varchar2(32)   default 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
        column33 varchar2(33)   default 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
        virtual1      generated always as (
                column4 || column3 || column2 || column1
        ) virtual
segment creation immediate

execute dbms_output.put_line(dbms_stats.create_extended_stats(user,'t1','(column1 || column2 || column3 || column4)'))
execute dbms_output.put_line(dbms_stats.create_extended_stats(user,'t1','(column1,column2,column3)'))

create or replace function plsqlfunction_with_30char_name(
        i_in varchar2
return varchar
        return initcap(i_in);

create index t1_i1 on t1(substr(plsqlfunction_with_30char_name(column1),1,10));

When you create a function-based index you get a hidden, virtual column supporting the index expression; when you create extended stats (of either type) you get a hidden virtual column holding the extension definition, when you create any type of virtual column, including a “real” virtual column you get a data dictionary entry holding the column name and the expression definition: all these options use the “data_default” column from user_tab_cols to display the defining information – as we can see when we the following query:

select  column_name, data_default
from    user_tab_cols
where   table_name = 'T1'
order by

COLUMN_NAME                      DATA_DEFAULT
-------------------------------- --------------------------------------------------------------------------------
COLUMN32                         'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
COLUMN33                         'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
VIRTUAL1                         "COLUMN4"||"COLUMN3"||"COLUMN2"||"COLUMN1"
SYS_NC00010$                     SUBSTR("TEST_USER"."PLSQLFUNCTION_WITH_30CHAR_NAME"("COLUMN1"),1,10)

Apart from the special cases I’ve just listed, you’ll also see the “default values” I specified for column32 and column33 – you’ll notice that I’ve supplied a 30 character string as the default for column32, and a 31 character string as the default for column33 – this is a convenience that means the used space in the data_default (which is a long column) corresponds to the name of the column once you include the single quotes in the their character count.

Having set my data up I’m going to emulate a bad application that uses lots of literal string SQL and leaves Oracle to fill in the default values (and, of course, derive the various virtual values it might need).

alter session set events '10046 trace name context forever, level 4';

        for i in 1..10 loop
                execute immediate '
                        insert into t1 (column1, column2, column3, column4)
                        values( ' || i || ', ' || i || ', ' || i || ', ' || i || ')'
        end loop;

alter session set events '10046 trace name context off';

This code generates 10 strings that populate column1 through to column4 only. But you’ll notice the call to enable SQL tracing – and here’s the interesting bit of the output from applying tkprof to the trace file:

  SQL ID: 47r1y8yn34jmj Plan Hash: 2191121161

select default$
 col$ where rowid=:1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       50      0.00       0.00          0          0          0           0
Execute     50      0.00       0.00          0          0          0           0
Fetch       50      0.00       0.00          0        100          0          50
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      150      0.00       0.00          0        100          0          50

The summary is the same for all three versions of Oracle I tested*** – we’ve queried for a default value from col$ 5 times for each row we insert! (Technically that’s 5 times for each insert statement we’ve had to (hard-)parse; this anomaly wouldn’t appear if we have been using a bind-variable method and reusing the insert statement.) *** There is one difference in 12.2 – the number of parse calls reported for this statement was 1 rather than 50 but, judging by the various cursor cache hit stats, that may be due to a change in accounting rather than a change in workload.

Check the table definition: there are two “real defaults” and 4 expressions due to the various virtual columns – so why 5 calls per insert and not 6 ? The answer lies in the length of the actual value involved – if the text that appears in the (long) data_default column is 32 characters or shorter it will be stored in the the dictionary cache (rowcache), but only one of our 6 values is that short, so Oracle looks up the other five on each insert (hard parse).

This is a little strange on two counts: first, given the enormous memories that have been available for years and the extreme usefulness of virtual columns and extended stats it seems something of an oversight to limit the memory available to the cache that holds such critical definitions; secondly, we don’t need to evaluate the virtual columns (unless they are indexed) or extended stats on inserts so why is Oracle querying the definitions anyway ? [Possible answer: perhaps Oracle is using generic code that allows for column check constraints – which might exist on users’ virtual columns – and hasn’t catered for bypassing system-generated virtual columns.]

A key point to remember before you start worrying too much about the impact of the execution count for this query against col$ is that it’s only relevant to “hard” parses – so you’re only likely to notice it if you have a large number of different “literal string” inserts that should be using bind variables; and that means you’ve probably got an important coding defect to address before you worry too much about the extra impact caused by this particular call. Nevertheless there are a couple of bug reports on MoS that have been raised against this query and, after writing this note, I did a quick Google search for the critical SQL_ID and found (among others) this production example from Franck Pachot.