dbms_random

Jonathan Lewis's picture

In a recent ODC thread someone had a piece of SQL that was calling dbms_random.string(‘U’,20) to generate random values for a table of 100,000,000 rows. The thread was about how to handle the ORA-30009 error (not enough memory for operation) that is almost inevitable when you use the “select from dual connect by level <= n” strategy for generating very large numbers of rows, but this example of calling dbms_random.string() so frequently prompted me to point out an important CPU saving , and then publicise through this blog a little known fact (or deduction) about the dbms_random.string() function.

If you generate a random string of length 6 using only upper-case letters there are 308,915,766 different combinations (266); so if you’re after “nearly unique” values for 100 million rows then a six character string is probably good enough – it might give you a small percentage of values which appear in a handful rows but most of the values are likely to be unique or have two rows. If you want to get closer to uniqueness then 7 characters will do it, and 8 will make it almost certain that you will get a unique value in every row.

So if you want “nearly unique” and “random 20 character strings” it’s probably sufficient to generate random strings of 6 to 8 characters and then rpad() them up to 20 characters with spaced – the saving in CPU will be significant; roughly a factor of 3 (which is going to matter when you’re trying to generate 100 million rows. As a little demo I supplied the OP with a script to create a table of just one million random strings – first of 20 random characters, then of 6 random characters with 14 spaces appended. The run time (mostly CPU) dropped from 1 minute 55 seconds to 41 seconds.

Why is there such a difference ? Because to generate a random string of 6 characters Oracle generates a random string of one character six times in a row and concatenates them. The difference between 6 calls and 20 calls per row gives you that factor of around 3. For a quick demo, try running the following anonymous PL/SQL block:

rem
rem     Script:         random_speed.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2010
rem

begin
        dbms_random.seed(0);
        dbms_output.put_line(dbms_random.string('U',6));
        dbms_output.new_line;

        dbms_random.seed(0);
        dbms_output.put_line(dbms_random.string('U',1));
        dbms_output.put_line(dbms_random.string('U',1));
        dbms_output.put_line(dbms_random.string('U',1));
        dbms_output.put_line(dbms_random.string('U',1));
        dbms_output.put_line(dbms_random.string('U',1));
        dbms_output.put_line(dbms_random.string('U',1));
end;
/

Jere are the results I got from instances of 12.1.0.2, 12.2.0.1, and 18.1.0.0 (from LiveSQL):


BVGFJB
B
V
G
F
J
B

I haven’t shown the tests for all the possible dbms_random.string() options but, unsurprisingly, changing the test to use the ‘L’ (lower case alpha) option produces the same effect (and the same 6 letters changed to lower case). The same effect, with different characters, also appeared using the ‘A’ (mixed case alpha), ‘X’ (uppercase alphanumeric) and ‘P’ (all printable characters) options.

I haven’t considered the effect of using a multi-byte character set – maybe Oracle calls its random number generator once per byte rather than once per character. The investigation is left as an exercise to the interested reader.

tl;dr

When generating a very large number of random strings – keep the “operational” part of the string as short as you can and leave the rest to be rpad()‘ed.

To prevent automated spam submissions leave this field empty.