Hacking together faster INSERTs

connor_mc_d's picture

Most developers tools out there have some mechanism to unload a table into a flat file, either as CSV, or Excel, and some even allow you to unload the data as INSERT statements. The latter is pretty cool because it’s a nice way of having a self-contained file that does not need Excel or DataPump or any tool additional to the one you’re probably using to unload the data.

SQLcl and SQL Developer are perhaps the easiest to utilize for such an extract. You simply add the pseudo-hint INSERT to get the output as insert statements. For example:


SQLcl: Release 18.4 Production on Wed Apr 17 17:05:49 2019

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Wed Apr 17 2019 17:05:49 +08:00

Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.6.0.0.0

SQL> select /*insert*/ * from t where rownum <= 10;
REM INSERTING into T
SET DEFINE OFF;
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_FILE#_BLOCK#','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_OBJ3','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_TS1','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_CON1','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','IND$','TABLE');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','CDEF$','TABLE');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','C_TS#','CLUSTER');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_CCOL2','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_PROXY_DATA$','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_CDEF4','INDEX');
...
...

That’s pretty cool but no matter what tool you use, ultimately you end up with a set of insert statements that will load 1 row at a time into the database. For a small number of rows, that is no problem. But for larger numbers of rows then you might sitting around for a long time waiting for that script to finish. There is two things that are going to slow us down here.

1) Parsing

Every insert is a brand new statement since it contains literal values. Even for a simple INSERT, that’s a lot of parsing work for the database. This is easily solved with a couple of ALTER SESSION wrappers at the head and tail of the script.


alter session set cursor_sharing = force;
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_FILE#_BLOCK#','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_OBJ3','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_TS1','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_CON1','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','IND$','TABLE');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','CDEF$','TABLE');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','C_TS#','CLUSTER');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_CCOL2','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_PROXY_DATA$','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_CDEF4','INDEX');
...
...
alter session set cursor_sharing = exact;

That will replace the literals with bind variable values and hence all of the INSERTs will be sharable.

2) Row by Row

A set of INSERT statements is inherently a row by row operation. One INSERT = one row. Of course, we developers have all had drummed into our programming minds over the years that database operations should be done in sets not row by row. But how can we get to that utopia with set of insert statements? We could totally rewrite the script with bind arrays and the like, but that’s a big job. So here is my quick hack to get convert single row inserts into multiple row inserts without too much effort.

A set of individual INSERTS such as


Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_FILE#_BLOCK#','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_OBJ3','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_TS1','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','I_CON1','INDEX');
Insert into T (OWNER,OBJECT_NAME,OBJECT_TYPE) values ('SYS','IND$','TABLE');

can also be written as a multi-table INSERT, where all insert targets are the same table and all WHEN conditions are omitted and hence are implicitly true. The above set of INSERTs can be phrased as a single statement like this:


insert all
into t(OWNER,OBJECT_NAME,OBJECT_TYPE)  values (q'{SYS}',q'{I_FILE#_BLOCK#}',q'{INDEX}')
into t(OWNER,OBJECT_NAME,OBJECT_TYPE)  values (q'{SYS}',q'{I_OBJ3}',q'{INDEX}')
into t(OWNER,OBJECT_NAME,OBJECT_TYPE)  values (q'{SYS}',q'{I_TS1}',q'{INDEX}')
into t(OWNER,OBJECT_NAME,OBJECT_TYPE)  values (q'{SYS}',q'{I_CON1}',q'{INDEX}')
into t(OWNER,OBJECT_NAME,OBJECT_TYPE)  values (q'{SYS}',q'{IND$}',q'{TABLE}')
select * from dual;

You could write some awk/sed etc to take an existing INSERT script and make it into a multiple table one. I took a slightly different approach and whipped up some PL/SQL to allow pipelining the appropriate multi-table inserts out to a spool file. Here is the function I wrote to do it.


create or replace function as_insert(p_query varchar2, p_batch int default 10) return sys.odcivarchar2list pipelined as
    l_theCursor     integer default dbms_sql.open_cursor;
    l_columnValue   varchar2(4000);
    l_status        integer;
    l_descTbl       dbms_sql.desc_tab;
    l_colCnt        number;
    n number := 0;
    
    l_tname varchar2(200) := substr(p_query,instr(p_query,' ',-1,1)+1);
    l_collist varchar2(32000);
    l_colval varchar2(32000);
    l_dml varchar2(32000);
    
    l_nls sys.odcivarchar2list := sys.odcivarchar2list();
    
begin
   if l_tname is null then l_tname := '@@TABLE@@'; end if;

   select value
   bulk collect into l_nls
   from v$nls_parameters
   where parameter in (   
      'NLS_DATE_FORMAT',
      'NLS_TIMESTAMP_FORMAT',
      'NLS_TIMESTAMP_TZ_FORMAT')
   order by parameter;

    execute immediate 'alter session set nls_date_format=''yyyy-mm-dd hh24:mi:ss'' ';
    execute immediate 'alter session set nls_timestamp_format=''yyyy-mm-dd hh24:mi:ssff'' ';
    execute immediate 'alter session set nls_timestamp_tz_format=''yyyy-mm-dd hh24:mi:ssff tzr'' ';

    dbms_sql.parse(  l_theCursor,  p_query, dbms_sql.native );
    dbms_sql.describe_columns( l_theCursor, l_colCnt, l_descTbl );

    for i in 1 .. l_colCnt loop
        dbms_sql.define_column(l_theCursor, i, l_columnValue, 4000);
        l_collist := l_collist || l_descTbl(i).col_name||',';
    end loop;
    l_collist := 'into '||l_tname||'('||rtrim(l_collist,',')||')';

    l_status := dbms_sql.execute(l_theCursor);

    pipe row('alter session set cursor_sharing = force;');
    while ( dbms_sql.fetch_rows(l_theCursor) > 0 ) loop
       n := n + 1;
  
       if mod(n,p_batch) = 1 then
          pipe row('insert all ');
       end if;
       
        for i in 1 .. l_colCnt loop
            dbms_sql.column_value( l_theCursor, i, l_columnValue );
            if l_columnValue is null then
              l_colval := l_colval || 'null,';
            elsif l_descTbl(i).col_type in (1,8,9,96,112) then
              l_colval := l_colval || 'q''{'||l_columnValue ||'}''' || ',';
            elsif l_descTbl(i).col_type in (2,100,101) then
              l_colval := l_colval || l_columnValue || ',';
            elsif l_descTbl(i).col_type in (12) then
              l_colval := l_colval || 'to_date('''||l_columnValue||''',''yyyy-mm-dd hh24:mi:ss'')' || ',';
            elsif l_descTbl(i).col_type in (180) then
              l_colval := l_colval || 'to_timestamp('''||l_columnValue||''',''yyyy-mm-dd hh24:mi:ssff'')' || ',';
            elsif l_descTbl(i).col_type in (181) then
              l_colval := l_colval ||'to_timestamp_tz('''||l_columnValue||''',''yyyy-mm-dd hh24:mi:ssff tzr'')' || ',';
            elsif l_descTbl(i).col_type in (231) then
              l_colval := l_colval || 'to_timestamp('''||l_columnValue||''',''yyyy-mm-dd hh24:mi:ssff'')' || ',';
            elsif l_descTbl(i).col_type in (182) then
              l_colval := l_colval || 'to_yminterval('''||l_columnValue||''')' || ',';
            elsif l_descTbl(i).col_type in (183) then
              l_colval := l_colval ||'to_dsinterval('''||l_columnValue||''')'  || ',';
            end if;
        end loop;
        l_colval := rtrim(l_colval,',')||')';
        pipe row( l_collist  );
        pipe row( '  values ('||l_colval );
        if mod(n,p_batch) = 0 then
          pipe row('select * from dual;');
        end if;
        l_colval := null;
    end loop;
    if n = 0 then
      pipe row( 'No data found ');
    elsif mod(n,p_batch) != 0 then
      pipe row('select * from dual;');
    end if;
    pipe row('alter session set cursor_sharing = exact;');

    execute immediate 'alter session set nls_date_format='''||l_nls(1)||''' ';
    execute immediate 'alter session set nls_timestamp_format='''||l_nls(2)||''' ';
    execute immediate 'alter session set nls_timestamp_tz_format='''||l_nls(3)||''' ';
    return;
end;
/

and here’s an example of the output you’d expect to see from it


alter session set cursor_sharing = force;
insert all
into scott.emp(EMPNO,ENAME,SAL)
  values (7369,q'{SMITH}',800)
into scott.emp(EMPNO,ENAME,SAL)
  values (7499,q'{ALLEN}',1600)
into scott.emp(EMPNO,ENAME,SAL)
  values (7521,q'{WARD}',1250)
into scott.emp(EMPNO,ENAME,SAL)
  values (7566,q'{JONES}',2975)
into scott.emp(EMPNO,ENAME,SAL)
  values (7654,q'{MARTIN}',1250)
into scott.emp(EMPNO,ENAME,SAL)
  values (7698,q'{BLAKE}',2850)
into scott.emp(EMPNO,ENAME,SAL)
  values (7782,q'{CLARK}',2450)
into scott.emp(EMPNO,ENAME,SAL)
  values (7788,q'{SCOTT}',3000)
into scott.emp(EMPNO,ENAME,SAL)
  values (7839,q'{KING}',5000)
into scott.emp(EMPNO,ENAME,SAL)
  values (7844,q'{TURNER}',1500)
select * from dual;
insert all
into scott.emp(EMPNO,ENAME,SAL)
  values (7876,q'{ADAMS}',1100)
into scott.emp(EMPNO,ENAME,SAL)
  values (7900,q'{JAMES}',950)
into scott.emp(EMPNO,ENAME,SAL)
  values (7902,q'{FORD}',3000)
into scott.emp(EMPNO,ENAME,SAL)
  values (7934,q'{MILLER}',1300)
select * from dual;
alter session set cursor_sharing = exact;

Lets do a performance test on a large sample set. In this test, I’m loading a little over one million rows into an empty table, where the performance harness for each INSERT mechanism looks like the following to output the start and end times for the script:


select systimestamp from dual;
alter session set cursor_sharing = force;
set feedback off


[ Set of INSERT, or set of MULTI-TABLE inserts with 50 rows per INSERT]
set feedback on
alter session set cursor_sharing = exact;
select systimestamp from dual;

Running each gave the following results.


SQL> @c:\tmp\normal_insert.sql

SYSTIMESTAMP
---------------------------------------
17-APR-19 03.39.53.189000 PM +08:00


Session altered.


Session altered.


SYSTIMESTAMP
---------------------------------------
17-APR-19 03.43.47.920000 PM +08:00


SQL> @c:\tmp\multi_insert.sql

SYSTIMESTAMP
---------------------------------------
17-APR-19 03.43.40.402000 PM +08:00


Session altered.


Session altered.


SYSTIMESTAMP
---------------------------------------
17-APR-19 03.44.27.319000 PM +08:00

So that’s a nice four-fold speed boost, down from 3mins 54seconds to less than 50seconds.

Just to reiterate – I’m not suggesting you need to go this level for all INSERT scripts. Most of the time I just add the cursor_sharing options to my scripts, and that makes them plenty fast enough for the job. But if you have got a large number of inserts to do, then converting them to a multi-table equivalent might give you a nice boost.

Here’s the video version

To prevent automated spam submissions leave this field empty.