Oakies Blog Aggregator

Franck Pachot

Oracle literal vs bind-variable in partition pruning and Top-N queries

Here is a query that retrieves the most recent value for a variable within a time range. The table has the variable ID, the timestamp and the value. All is properly indexed (it is actually an IOT): an index on (VARIABLE_ID, TIMESTAMP, VALUE), locally partitioned on TIMESTAMP.

For whatever reason, the optimizer chooses an index fast full scan. With small partitions, this should not be a problem as we do not need to read all partitions: we want only the last value (rownum=1 after order by desc).
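
To fix ideas, here is a minimal sketch of the query shape (the table and column names here are assumptions, not the original ones):

select *
from (
  select VAL
  from DEMO_VALUES
  where VARIABLE_ID = 42
    and TS between timestamp'2018-12-01 00:00:00'
                and timestamp'2018-12-31 23:59:59'
  order by TS desc
)
where rownum = 1;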

literals

Here is the execution plan with literals:

The interesting information is in the ‘Starts’ column: only two partitions have been read. That’s because we have found our row and there is no need to continue. Line 5 of the plan is there for that, with a STOPKEY on ROWNUM=1. This is possible because the partitions are read, by PARTITION RANGE ITERATOR, in the same order as the ORDER BY wants: descending on TIMESTAMP. There’s a clue in Pstart/Pstop, but the best way to see it is with partition pruning event 10128 at level 2 (see https://support.oracle.com/epmos/faces/DocContentDisplay?id=166118.1).
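
If you want to reproduce this, the event can be set at session level before running the query (at level 2 the dump also populates the KKPAP_PRUNING table, which the MOS note above asks you to create first):

alter session set events '10128 trace name context forever, level 2';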

The dump mentions a DESCENDING RANGE SCAN:

Partition Iterator Information:
partition level = PARTITION
call time = RUN
order = DESCENDING
Partition iterator for level 1:
iterator = RANGE [3954, 4266]
index = 4266
current partition: part# = 4266, subp# = 1048576, abs# = 4266

and the KKPAP_PRUNING table lists which ones were actually read:

SQL> select ID,TYPE,LEVEL,ORDER,CALL,PARTNUM,APNUM 
from KKPAP_PRUNING;
ID TYPE  LEVEL     ORDER      CALL PARTNUM APNUM
-- ----- --------- ---------- ---- ------- -----
 3 RANGE PARTITION DESCENDING RUN     4267  4266
 3 RANGE PARTITION DESCENDING RUN     4266  4265

Those are the two partitions counted by ‘Starts’.

bind variables

Here is the same query with bind variables for the timestamp range:

The big difference is that the STOPKEY is above the partition iterator here. This means that we cannot stop before having read all partitions. And that was the case: 313 ‘Starts’ — one for each partition — and 513 rows read before deciding which was the last one in the timestamp order.

It is different because, with bind variables, the partitions are not known at parse time. This is why we see KEY/KEY rather than partition identifiers in Pstart/Pstop.

The event 10128 level 2 confirms that we have read all partitions:

SQL> select ID,TYPE,LEVEL,ORDER,CALL,PARTNUM,APNUM 
from KKPAP_PRUNING;
ID TYPE  LEVEL     ORDER      CALL PARTNUM APNUM
-- ----- --------- ---------- ---- ------- -----
 6 RANGE PARTITION DESCENDING RUN     4267  4266
 6 RANGE PARTITION DESCENDING RUN     4266  4265
 6 RANGE PARTITION DESCENDING RUN     4265  4264
 6 RANGE PARTITION DESCENDING RUN     4264  4263
...

However, as PARTITION DESCENDING is mentioned, I would have expected the same optimization to be possible as with literals. That’s a big difference in execution time when the predicate covers many partitions, and we can see above that this is not considered by the optimizer: the cost is the same.

Note that Mauro Pagano has a note about the bind variables case where he mentions that the SORT step above the partition iterator happens even with RANGE SCAN:

What I forgot and had to relearn about DESC indexes

In summary, whether it is necessary or not, the optimizer may be more clever with partition iteration when it knows the partitions at optimization time, which means when parameters are passed as literals rather than bind variables. Of course, you must also consider the execution frequency, because of the parsing (CPU and latch) overhead.

connor_mc_d

2018-what grabbed your attention

Here are the blog posts that you hit most this year, with the most viewed entry on top. Unsurprisingly it is about the release of 18c, but interestingly the ORA-14758 and the Active Sessions posts have come up again from last year's list, so it appears they are still common issues for the modern Oracle professional. And of course, it is nice to see that my OpenWorld Mega-download is helping the community.

Thanks for supporting the blog, and as always, there will be more content next year!

Franck Pachot

Data Guard gap history

V$ARCHIVED_LOG has a lot of information, and one column that is interesting in a Data Guard configuration is APPLIED, which is a boolean, or rather a VARCHAR2(3) YES/NO as there are no booleans in Oracle SQL. But I would love to see a DATE or TIMESTAMP for it. As a workaround, here is a little AWK script that parses the standby alert.log to get this information and joins it with V$ARCHIVED_LOG.COMPLETION_TIME, to see if we had gaps in the past between the archived logs and the applied ones.

awk '
BEGIN {
# sqlplus formatting for the generated script
print "set linesize 200 pagesize 1000"
print "column completion_time format a32"
print "column applied_time format a38"
print "column gap format a30"
s="with a as("
}
# remember the last alert.log timestamp line (ISO format with a T separator)
/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9]*[+][0-9]*:[0-9]*/ {
sub(/T/," ");ts=$0
}
# for each applied log, generate one row with the apply time and the file name
/Media Recovery Log/{
print s" select timestamp" q ts q "applied_time,"q $NF q "name from dual "
s=" union all"
}
# join the generated rows with V$ARCHIVED_LOG to compute the apply gap
END{
print ") select thread#,sequence#,cast(completion_time as timestamp) completion_time,applied_time,applied_time-completion_time gap"
print " from a right outer join v$archived_log using(name) order by completion_time, applied_time;"
}
' q="'" $ORACLE_BASE/diag/rdbms/?*/?*/trace/alert_*.log | sqlplus -s / as sysdba

The result is something like this:

This tells me that I had a two-hour gap around 12:15 that was resolved quickly.

Another example (RAC):

Here is an example in MAA (Data Guard + RAC) which encountered some problems. On one node I can see 3 recovery attempts for Thread 1 Sequence 8:

Actually, the recovery failed 3 times on this instance. Here is the detail in the alert.log:

[oracle@cdb1-dg02-2 ~]$ grep -A 2 o1_mf_1_8_g1sqm9kw $ORACLE_BASE/diag/rdbms/?*/?*/trace/alert_*.log
PR00 (PID:30983): Media Recovery Log /u03/app/oracle/fast_recovery_area/ORCLDG02/archivelog/2018_12_21/o1_mf_1_8_g1sqm9kw_.arc
2018-12-21T12:37:05.588750+00:00
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT NODELAY
--
Media Recovery Log /u03/app/oracle/fast_recovery_area/ORCLDG02/archivelog/2018_12_21/o1_mf_1_8_g1sqm9kw_.arc
2019-01-05T09:37:13.802072+00:00
Starting background process NSV0
--
Errors with log /u03/app/oracle/fast_recovery_area/ORCLDG02/archivelog/2018_12_21/o1_mf_1_8_g1sqm9kw_.arc
2019-01-05T09:38:12.389363+00:00
Media Recovery user canceled with status 16016
--
Media Recovery Log /u03/app/oracle/fast_recovery_area/ORCLDG02/archivelog/2018_12_21/o1_mf_1_8_g1sqm9kw_.arc
2019-01-05T09:38:39.837743+00:00
Starting background process NSV0
--
Errors with log /u03/app/oracle/fast_recovery_area/ORCLDG02/archivelog/2018_12_21/o1_mf_1_8_g1sqm9kw_.arc
2019-01-05T09:39:37.412067+00:00
Media Recovery user canceled with status 16016
--
Media Recovery Log /u03/app/oracle/fast_recovery_area/ORCLDG02/archivelog/2018_12_21/o1_mf_1_8_g1sqm9kw_.arc
2019-01-05T09:40:04.846300+00:00
Starting background process NSV0
--
Errors with log /u03/app/oracle/fast_recovery_area/ORCLDG02/archivelog/2018_12_21/o1_mf_1_8_g1sqm9kw_.arc
2019-01-05T09:41:02.277054+00:00
Media Recovery user canceled with status 16016
--
Media Recovery Log /u03/app/oracle/fast_recovery_area/ORCLDG02/archivelog/2018_12_21/o1_mf_1_8_g1sqm9kw_.arc
2019-01-05T14:32:06.257452+00:00
Starting background process RSM0

Then the other node attempted the recovery and failed 4 times before it succeeded:

I used this only once. Usually, the apply gap is monitored in real time from V$DATAGUARD_STATS (the apply lag with its TIME_COMPUTED). If you have any better idea for verifying past gaps, please add a comment here or on the Twitter thread:
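
For reference, here is the kind of real-time check I mean, as a minimal sketch (V$DATAGUARD_STATS exposes the lag values together with the time they were computed):

select name, value, time_computed
from v$dataguard_stats
where name in ('apply lag','transport lag');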

Of course, the best would be an Enhancement Request to get an APPLIED_TIME in V$ARCHIVED_LOG. Anyway, if you open an Active Data Guard standby for reporting, it is not a bad idea to have a heartbeat table with a scheduled timestamp update, so that you can include the ‘as of’ date in your report.
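
A minimal sketch of such a heartbeat, assuming a one-row table refreshed every minute by a scheduler job on the primary:

create table heartbeat (ts timestamp);
insert into heartbeat values (systimestamp);
commit;
begin
  dbms_scheduler.create_job(
   job_name        => 'HEARTBEAT_UPDATE',
   job_type        => 'PLSQL_BLOCK',
   job_action      => 'update heartbeat set ts = systimestamp; commit;',
   repeat_interval => 'FREQ=MINUTELY',
   enabled         => true);
end;
/

On the Active Data Guard standby, selecting TS from this table gives the ‘as of’ time for the data in your report.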

Franck Pachot

Oracle listener static service hi-jacking

We must be careful about the services that are registered with a listener. The user connects to them with a good idea of the database she wants to connect to, but another database or PDB can dynamically register a service with the same name, and then get the connections which were expected to go to another destination. Of course, as a security best practice, user/password should not be the same in different databases, but what if the connection is done by a common user in multitenant? By creating a service, you can hi-jack the connections to CDB$ROOT and have them connected to your PDB.

You may think that static registration (the SID_LIST_LISTENER in listener.ora) is a solution, especially with the (STATIC_SERVICE=TRUE) introduced in 12cR2, but this defines only the target instance. The PDB is resolved dynamically.

This post is here only to raise awareness. I’ll not expose the possible exploits. But you can imagine: something that is scheduled to run as SYSDBA on CDB$ROOT can be hi-jacked to run on a PDB where you have more privileges, to attempt some privilege escalation.

Here is my CDB1 database with PDB1 pluggable database. In addition to the default services, I’ve defined a static service CDB1_DGMGRL.

# lsnrctl status
LSNRCTL for Linux: Version 19.0.0.0.0 - Development on 05-JAN-2019 18:06:26
Copyright (c) 1991, 2018, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 19.0.0.0.0 - Development
Start Date 16-DEC-2018 17:33:58
Uptime 20 days 0 hr. 32 min. 28 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/19.0.0/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/VM190/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=VM190)(PORT=1521)))
Services Summary...
Service "7cd707c6f0a7108ce053be38a8c0058b" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1XDB" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1_DGMGRL" has 1 instance(s).
Instance "CDB1", status UNKNOWN, has 1 handler(s) for this service...
Service "pdb1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
The command completed successfully

I am the PDB administrator. I connect with the local user DEMO to PDB1 and create a service here, with the same name, CDB1_DGMGRL:

SQL> connect demo/demo@//localhost/PDB1
Connected.
SQL> show con_name
CON_NAME
------------------------------
PDB1
SQL> exec dbms_service.create_service(service_name=>'CDB1_DGMGRL', network_name=>'CDB1_DGMGRL');
PL/SQL procedure successfully completed.
SQL> exec dbms_service.start_service('CDB1_DGMGRL');
PL/SQL procedure successfully completed.

As I have started the service, it is registered with the listener, in addition to the static one.

Services Summary...
Service "7cd707c6f0a7108ce053be38a8c0058b" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1XDB" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1_DGMGRL" has 2 instance(s).
Instance "CDB1", status UNKNOWN, has 1 handler(s) for this service...
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "pdb1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
The command completed successfully

The registration itself is not a problem, as it goes to the same instance. The problem is that the instance knows this service as belonging to PDB1. Then, when a common user connects with this service, the session switches to the PDB1 container:

# sqlplus -s sys/oracle@"(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=CDB1_DGMGRL))(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))" as sysdba <<<'show con_name'
CON_NAME
------------------------------
PDB1

Do not expect (STATIC_SERVICE=TRUE) to change anything, because the static registration has no information about the PDB:

# sqlplus -s sys/oracle@"(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=CDB1_DGMGRL)(INSTANCE_NAME=CDB1)(STATIC_SERVICE=TRUE))(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=1521)))" as sysdba <<<'show con_name'
CON_NAME
------------------------------
PDB1

By the way, even when the service is stopped:

SQL> connect demo/demo@//localhost/PDB1
Connected.
SQL> exec dbms_service.stop_service('CDB1_DGMGRL');
PL/SQL procedure successfully completed.

and then the listener knows only the static declaration:

Services Summary...
Service "7cd707c6f0a7108ce053be38a8c0058b" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1XDB" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
Service "CDB1_DGMGRL" has 1 instance(s).
Instance "CDB1", status UNKNOWN, has 1 handler(s) for this service...
Service "pdb1" has 1 instance(s).
Instance "CDB1", status READY, has 1 handler(s) for this service...
The command completed successfully

the instance will still redirect to PDB1 even if the session is not connected with a service:

SQL> connect sys/oracle@"(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=CDB1_DGMGRL))(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))" as sysdba
Connected.

SQL> select sys_context('userenv','service_name') from dual;
SYS_CONTEXT('USERENV','SERVICE_NAME')
--------------------------------------------------------------------
SYS$USERS
SQL> show con_name
CON_NAME
------------------------------
PDB1

Only when the service is deleted do the connections stay in CDB$ROOT:

SQL> exec dbms_service.delete_service('CDB1_DGMGRL');
PL/SQL procedure successfully completed.
# sqlplus -s sys/oracle@"(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=CDB1_DGMGRL)(INSTANCE_NAME=CDB1)(STATIC_SERVICE=TRUE))(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=1521)))" as sysdba <<<'show con_name'
CON_NAME
------------------------------
CDB$ROOT

Solution(s)

Basics: be careful with powerful privileges, such as INHERIT PRIVILEGES, which can allow a PDB DBA to overwrite a common user's procedure that has AUTHID CURRENT_USER.

Safe attitude: when you expect your connection to run in CDB$ROOT, be sure to do it explicitly with ALTER SESSION SET CONTAINER=CDB$ROOT.
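
For example, a maintenance script connecting through the listener can protect itself with an explicit switch (a sketch with hypothetical credentials; SYS, as a common user, can always switch back to the root):

connect sys/oracle@//localhost/CDB1 as sysdba
alter session set container = CDB$ROOT;
show con_name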

Isolation: dedicate a listener for your system administration. Another listener will be the target of REMOTE_LISTENER for dynamic registration.
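
A hypothetical listener.ora sketch of this isolation: the admin listener below listens on its own port and is not referenced by any LOCAL_LISTENER or REMOTE_LISTENER parameter, so no database can dynamically register a service with it:

LISTENER_ADMIN =
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1522)))

SID_LIST_LISTENER_ADMIN =
  (SID_LIST=
    (SID_DESC=
      (SID_NAME=CDB1)
      (ORACLE_HOME=/u01/app/oracle/product/19.0.0)
    )
  )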

Least privileges: when you only need access to CDB$ROOT, connect with a user that has no CONNECT privilege on the PDBs.

DBaaS: if you are using multitenant for on-demand database provisioning, where the user can choose the name of the PDB, be sure that the name cannot be the same as a static service you use. Naming conventions may help there.

Franck Pachot

UUID (aka GUID) vs. Oracle sequence number

When you want to generate a unique number, for a primary key, the most common idea (and the right one) is to get a number from an always increasing generator. Oracle provides SEQUENCE for this purpose, which is optimized, easy to use and scalable.

But some application developers have the idea to use a UUID (Universally Unique Identifier), also known as GUID (Globally Unique Identifier). The main advantage is that you do not need a shared object like a sequence, because there is no need for a central authority: a UUID is generated with a hash function on some local identifiers (host, process, thread), and the hash value is large enough to reduce the probability of collisions without a shared central authority.

However, in the context of Oracle Database:

  • having a shared object is not a problem, as we have a database
  • large values are less efficient (more space in disk, memory and more CPU to process)

NUMBER from SEQUENCE

In order to put some numbers behind this argument, I create a 10-million-row table with an ID generated from a SEQUENCE:

SQL> create sequence DEMO_NUM_ID cache 1e6;
Sequence DEMO_NUM_ID created.
SQL> create table DEMO_NUM
(ID constraint DEMO_NUM_ID primary key, NUM)
as select DEMO_NUM_ID.nextval, rownum
from xmltable('1 to 10000000');
Table DEMO_NUM created.
Elapsed: 00:04:37.796

The data type returned by the sequence is a NUMBER:

SQL> desc DEMO_NUM
Name Null?    Type
---- -------- ----------
ID   NOT NULL NUMBER(38)
NUM           NUMBER

A NUMBER stores two digits per byte. Here, my 10 million values take less than 5 bytes on average:

SQL> select sum(vsize(ID))/count(*),max(ID),count(*) from DEMO_NUM;
SUM(VSIZE(ID))/COUNT(*)    MAX(ID)   COUNT(*)
----------------------- ---------- ----------
              4.8888893   10000000   10000000

This is very efficient. It is even smaller than a ROWID which is the internal identification of a row.

UUID from SYS_GUID()

Here is a similar table with the ID generated as a GUID:

SQL> create table DEMO_GUID
(ID constraint DEMO_GUID_ID primary key, NUM)
as select sys_guid(), rownum
from xmltable('1 to 10000000');
Table DEMO_GUID created.
Elapsed: 00:05:45.900

You can already see that it takes longer to generate. SYS_GUID() returns a RAW datatype of 16 bytes.

SQL> desc DEMO_GUID
Name Null?    Type
---- -------- ------------
ID   NOT NULL RAW(16 BYTE)
NUM           NUMBER

This is very large. Each primary key value, each foreign key referencing it, and the indexes on them will take 16 bytes per row, and a large part of each value is always the same (hashed from the host and process). Note that a RAW is displayed as its hexadecimal translation, which is 32 characters here, but do not store it as a VARCHAR2(32): that would be 2 times larger and adds some character-set processing.

No surprise here, each value is 16 bytes:

SQL> select sum(vsize(ID))/count(*) from DEMO_GUID;
SUM(VSIZE(ID))/COUNT(*)
-----------------------
16

RAW from NUMBER

I see only one advantage in GUID primary keys: they are RAW datatypes. I like that because we don’t want arithmetic operations on a key, and, in addition, nobody will complain about gaps in the numbering. Then, can we store our NUMBER from the sequence as a RAW?

Here, I’m converting to a hexadecimal varchar2 and then to a raw. There’s probably a more efficient method to convert a number to a binary raw. UTL_RAW has a CAST_FROM_NUMBER, but that returns the internal NUMBER representation. There’s also a CAST_FROM_BINARY_INTEGER. Here, I did a simple conversion through a varchar2, not very nice, but the execution time is correct.

SQL> create sequence DEMO_RAW_ID cache 1e6;
Sequence DEMO_RAW_ID created.
SQL> create table DEMO_RAW 
(ID constraint DEMO_RAW_ID primary key, NUM)
as select hextoraw(to_char(DEMO_RAW_ID.nextval,rpad('FM',65,'X')))
, rownum
from xmltable('1 to 10000000');
Table DEMO_RAW created.
Elapsed: 00:04:21.259

This storage is even smaller than the NUMBER: 3 bytes instead of 5 bytes on average for those 10 million values:

SQL> select * from 
(select sum(vsize(ID))/count(*) "vsize(NUM)" from DEMO_NUM),
(select sum(vsize(ID))/count(*) "vsize(RAW)" from DEMO_RAW),
(select sum(vsize(ID))/count(*) "vsize(GUID)" from DEMO_GUID);
vsize(NUM) vsize(RAW) vsize(GUID)
---------- ---------- -----------
 4.8888893   2.993421          16

This is totally expected, as a byte can store 256 different values, but NUMBER uses only 100 of them with its two-digit representation. In my opinion, it would make sense to have a SEQUENCE that returns an integer as a RAW binary representation. But as I don’t think people would actually use it, I will not file an Enhancement Request for that.
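
As an aside, here is what the CAST_FROM_BINARY_INTEGER route mentioned above looks like (a quick sketch: it returns a fixed 4-byte RAW in two's complement, so it loses the variable-length benefit and is limited to 32-bit integers):

SQL> select utl_raw.cast_from_binary_integer(1000000) raw_id from dual;
RAW_ID
--------
000F4240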

Here is the size for the tables and the index on this ID column:

SQL> select segment_name,segment_type,bytes/1024/1024 MBytes 
from dba_segments
where owner=user and segment_name like 'DEMO%' order by mbytes;
SEGMENT_NAME                   SEGMENT_TYPE           MBYTES
------------------------------ ------------------ ----------
DEMO_RAW_ID                    INDEX                     160
DEMO_RAW                       TABLE                     168
DEMO_NUM_ID                    INDEX                     174
DEMO_NUM                       TABLE                     188
DEMO_GUID_ID                   INDEX                     296
DEMO_GUID                      TABLE                     304

Of course, this confirms what we have seen with the average size. The GUID is definitely not a good solution.

Compression

12cR2 introduced a compression algorithm for indexes which is interesting even when there are no repeated column values: ADVANCED HIGH (available in Enterprise Edition with the Advanced Compression Option):

SQL> alter index DEMO_GUID_ID rebuild compress advanced high;
Index DEMO_GUID_ID altered.
Elapsed: 00:01:30.035
SQL> alter index DEMO_RAW_ID rebuild compress advanced high;
Index DEMO_RAW_ID altered.
Elapsed: 00:00:57.193
SQL> alter index DEMO_NUM_ID rebuild compress advanced high;
Index DEMO_NUM_ID altered.
Elapsed: 00:00:49.574

This reduced all 3 indexes. But the GUID one is still the largest, even though a large part of each value is repeated.

SQL> select segment_name,segment_type,bytes/1024/1024 MBytes from dba_segments where owner=user and segment_name like 'DEMO%' order by mbytes;
SEGMENT_NAME                   SEGMENT_TYPE           MBYTES
------------------------------ ------------------ ----------
DEMO_RAW_ID                    INDEX                      80
DEMO_NUM_ID                    INDEX                      80
DEMO_GUID_ID                   INDEX                     136
DEMO_RAW                       TABLE                     168
DEMO_NUM                       TABLE                     188
DEMO_GUID                      TABLE                     304

It is interesting to see that the benefit of RAW number vs. NUMBER datatype is smaller once compressed.

When is GUID smaller than a NUMBER?

Here is where a NUMBER starts to be larger than a 16-byte GUID:

1E30+1 is the first number that reaches 17 bytes. You will never reach this number with a sequence: just calculate the size of a database storing these numbers, even with huge gaps in the sequences. That’s impossible. And anyway, a GUID is not a solution there, given the high probability of collisions.
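
A quick sanity check with VSIZE (1E30 is 100^15, so it fits in one mantissa byte plus the exponent byte, while 1E30+1 needs 16 base-100 digits plus the exponent byte):

SQL> select vsize(1e30) "vsize(1e30)", vsize(1e30+1) "vsize(1e30+1)" from dual;
vsize(1e30) vsize(1e30+1)
----------- -------------
          2            17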

In summary: do not use GUID or UUID for your primary keys.

A NUMBER generated from a SEQUENCE is the right/efficient/scalable way to get a unique identifier. And if you want to be sure to have no collision when merging or replicating from another system, then just add a ‘system identifier’ as an additional column in the primary key. This can be a node number in a distributed system, or a 3-letter identifier of the company. When column values have a repeated subset, there is no efficient deduplication or compression technique. But when a separate column is fully repeated, table and index compression can be used. So, if you really need to add a hashed hostname, put it in an additional column rather than mixing everything into one value as UUID does.

connor_mc_d

Your New Year's Resolution

Aligning roughly with the calendar year, based on the Chinese zodiac we’re about to go from the year of the dog to the year of the pig. But for me, in the “Information Technology Zodiac” Smile , 2018 was the year of the hack, just as it was in 2017 and just as it will be for 2019.

I’ve not dedicated much time to keeping a record of all of the high profile breaches this year, but just off the top of my head I can think of:

  • The Elasticsearch breach,
  • The Starwood breach which, as a frequent traveller, probably guarantees I’ve been impacted in some as yet unknown way,
  • The Quora breach, a site that is very popular amongst I.T. professionals,
  • The Google+ breach, which peeved me more because it was hush-hushed by Google in order to avoid reputational damage. Um…what happened to looking after customers?

Adding to that list, whilst we were discussing security this year at Sangam 18, Indian friends reminded me of the enormous Aadhaar breach at the start of the year, which is perhaps one of the largest of all time in terms of the number of people impacted.

And to be honest, I lost track of the number of dramas that Facebook had: how many, which, where, etc.

I’m a realist so I’m not saying that hacks can always be avoided, or that anyone who gets hacked is being negligent. Sometimes even the best efforts are not good enough, because the hacks get more and more sophisticated all the time. But at the very least, I think we should set a goal for 2019 that all of our data at rest should be encrypted. If we can start with that small step, then if our data at rest is somehow nefariously obtained, it will still have a veil of protection.

So what’s your I.T. resolution for next year? I hope it’s security related.

connor_mc_d

Another day…another "use the right datatype" post

Here’s an interesting little oddity (aka bug) with scalar subqueries.

We’ll start with a simple working example


SQL> create table t1 ( c1 number );

Table created.

SQL> insert into t1 values (1);

1 row created.

SQL> create table t2 ( c1 int, c2 varchar2(10));

Table created.

SQL>
SQL> insert into t2 values(1,'t1');

1 row created.

SQL> insert into t2 values(1,'t01');

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','T1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('','T2')

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> select a.c1,
  2    ( select max(b.c2) from t2 b where a.c1 = b.c1 )
  3  from t1 a;

        C1 (SELECTMAX
---------- ----------
         1 t1

1 row selected.

That all seems straightforward:

  • We got the value 1 from T1.C1,
  • used that as an input to the query into T2
  • got the maximum of the 2 matching rows from table T2
  • return the result as a column of the outer query

Let us now repeat the example, but this time we’ll make column C1 in table T2 a VARCHAR2. The query will remain unchanged, so now we are comparing a numeric “1” with a varchar2 value of “1”.


SQL> create table t1 ( c1 number );

Table created.

SQL> insert into t1 values (1);

1 row created.

SQL> create table t2 ( c1 varchar2(10), c2 varchar2(10));

Table created.

SQL>
SQL> insert into t2 values('1','t1');

1 row created.

SQL> insert into t2 values('01','t01');

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','T1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('','T2')

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> select a.c1,
  2    ( select max(b.c2) from t2 b where a.c1 = b.c1 )
  3  from t1 a;

        C1 (SELECTMAX
---------- ----------
         1 t1
         1 t01

2 rows selected.

Whoops…that doesn’t look right.

Don’t get me wrong, that is an error in the way that we are processing the query, because we should not be getting 2 rows back from it. But it is another example of where you can encounter boundary cases (and hence bugs) when you stray from the well-trodden route. Something is obviously going awry during the implicit conversion, because if we explicitly take care of it, then things are fine.


SQL> select a.c1,
  2    ( select max(b.c2) from t2 b where a.c1 = to_number(b.c1) )
  3  from t1 a;

        C1 (SELECTMAX
---------- ----------
         1 t1

1 row selected.

I’d love it if there was an init.ora parameter that could be used in (say) development environments that would error out any SQL statement with an implicit data type conversion, but until that day comes (and it may be never!) then please make sure you take care with your data types!

Happy New Year everyone!

Franck Pachot

Index Only access with Oracle, MySQL, PostgreSQL, and Microsoft SQL Server

In my previous post about the advantages of index access over full table scans, I mentioned covering indexes: this is when an Index Range Scan can retrieve all columns without going to the table. Adding to an index all the columns used by the SELECT or WHERE clause is an important tuning technique for queries around the inflection point between index access and full table scan. But not all RDBMS are equal. What PostgreSQL calls ‘Index Only’ actually reads the table, except for static data with no concurrent modifications.

I’ll show the execution plans for this Index Only access on Oracle, MySQL, PostgreSQL, and MS SQL Server. As my skills on the non-Oracle ones are very limited, do not hesitate to comment if you think something is not correct.

I create a DEMO table

As I don’t want to bother with datatype names, I use CREATE TABLE AS SELECT.

This works on MySQL and PostgreSQL:

create table DEMO as select 1 A,1 B,1 C union all select 1 A,1 B,1 C

With Oracle I need to mention a table and DUAL is there for this goal:

create table DEMO as select 1 A, 1 B, 1 C from dual 
union all select 1 A, 1 B , 1 C from dual;

With SQL Server, there is no ‘Create Table As Select’, but the INTO syntax is used to name the table to create:

select 1 A, 1 B, 1 C into DEMO union all select 1 A, 1 B , 1 C;

This table has 2 rows. I’ll add more rows so that the optimizer/query planner can base its estimations on statistics.

Insert rows

My goal is to query with a predicate ‘where A=2’ returning a small number of rows, as I want to benefit from index access. This works on all databases: insert a lot of (1,1,1) rows with a cartesian product, and a few (2,0,0) ones:

insert into DEMO select 1,1,1 
from DEMO a,DEMO b,DEMO c,DEMO d,DEMO e,DEMO f,DEMO g,DEMO h;
insert into DEMO values (2,0,0);
insert into DEMO select a.A, a.B, a.C from DEMO a;
create index DEMO_AB on DEMO(A,B);
select A,count(*) from DEMO group by A;

Because this version of Oracle gathers statistics during the CTAS but not for further DML, I delete statistics to rely on dynamic sampling:

begin dbms_stats.delete_table_stats('','DEMO'); end;
/

Explain plan

For MySQL I don’t know how to get the execution plan with execution statistics, so I simply used EXPLAIN:

explain select sum(A) from DEMO where A=2;
explain select sum(B) from DEMO where A=2;
explain select sum(C) from DEMO where A=2;

The goal is to show the plan when I select only the column used for access (A), an additional column found in the index (B), or a column which exists only in the table (C). The one with B is the one where I expect Index Only access.

For PostgreSQL the EXPLAIN statement can execute and show execution statistics:

explain (analyze,verbose,costs,buffers)
select sum(A) from DEMO where A=2;

For SQL Server, this is done with SET STATISTICS PROFILE before running the statements:

set statistics profile on
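
The statements themselves are the same three queries as in the MySQL example above:

select sum(A) from DEMO where A=2;
select sum(B) from DEMO where A=2;
select sum(C) from DEMO where A=2;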

For Oracle, I used DBMS_XPLAN :

select /*+ gather_plan_statistics */ sum(B) from DEMO where A=2;
select plan_table_output from dbms_xplan.display_cursor(format=>'allstats last +projection');

Results

I’ve run them with db<>fiddle so that you can see the exact execution with the link provided.

Oracle 18c

https://dbfiddle.uk/?rdbms=oracle_18&fiddle=400775c8c1a7af79529312b4d3d080c6

When I access only A or B there is no need to go to the table with Oracle. This is a true Index Only access.

The +PROJECTION format shows which columns are returned by each operation and here it is clear that B comes from the index. No need to go to the table for each index entry.

The last example which selects column C is the only one where Oracle has to access the table:

In summary, with Oracle, I have to go to the table only when I need a column from it that is not in the index. This is a very interesting solution when I have a range scan that selects a large number of rows, where accessing rows scattered in the table is expensive, but where the table is too large for a full table scan to be efficient. Then Index Only access is a solution. This technique is also known as a Fat Index (from Tapio Lahdenmaki's work).

PostgreSQL 11

https://dbfiddle.uk/?rdbms=postgres_11&fiddle=f674aab989f0bdf0b9972841081e1cac

PostgreSQL has had, since 9.2, an access path explicitly called Index Only, but if you run my example, you will not see it. Even when you query only column A, you will access the table:

This is just an optimized version of going to the table for each index entry, where a bitmap of index entries is built before scanning the table. The query planner could have chosen Index Only access, but it would be less efficient because, despite its name, Index Only access will finally access the table rows. You see ‘Heap Fetches’ when this is the case. I blogged about it in the past. The reason is that PostgreSQL MVCC (multi-versioning, to be able to read without blocking writes) is at the tuple level, and the index has no information about what is committed or not.

Postgres has a little optimization for this, using the visibility map, but this is maintained asynchronously by the vacuum process. In my db<>fiddle example you will see a real Index Only after a full vacuum:
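
If you want to force it without waiting for autovacuum, something like this should do (a sketch: a plain VACUUM sets the all-visible bits in the visibility map):

vacuum analyze DEMO;
explain (analyze,verbose,costs,buffers)
select sum(B) from DEMO where A=2;

Once all heap pages are marked all-visible, the plan shows an Index Only Scan with ‘Heap Fetches: 0’.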

In summary, PostgreSQL Index Only is really ‘Index Only’ when you see ‘Heap Fetches: 0’, and this means that the ‘fat index’ optimization is possible only for tables with rare modifications and frequent vacuums.

MySQL 8.0

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=780df4e2a007bf9607c0711441400b5e

MySQL with the InnoDB engine is MVCC at block level like Oracle, where versioned index blocks contain the visibility information. You can see that when I select column A or B, which are in the index, the ‘Extra’ column of the execution plan mentions ‘Using index’, which means that there is no access to the table:


SQL Server 2017

https://dbfiddle.uk/?rdbms=sqlserver_2017l&fiddle=4e2bc8cb87ffc7c004dff1ddb01d4308

SQL Server can do Index Only:

I can see the difference when selecting the C column, which is not in the index. The need to go to the table for each index entry (RID Lookup) makes the optimizer choose a Table Scan for this example (except when hinted):


SQLite

As db<>fiddle was down for SQLite when writing this, I ran it with SQL Fiddle (http://sqlfiddle.com/#!5/59a60/4) and I see ‘COVERING’ mentioned when selecting only A or B:

SEARCH TABLE DEMO USING COVERING INDEX DEMO_AB (A=?)

In summary

Oracle and MySQL with the InnoDB engine are the ones that can really do Index Only, aka Covering Index, in high OLTP activity (where readers do not block writers). SQL Server can also do it, but with reads blocking writes (except with snapshot isolation). PostgreSQL can also do it, with non-blocking reads, but not efficiently when the tables have concurrent modifications.

Of course, my favorite is Oracle, as this access path, and MVCC at block level, have been there for decades. Here is the Oracle 7.3 execution plan for the same example:

connor_mc_d

A Christmas Carol

You better watch out,
Let me tell you a fact.
If your SQL has literals,
You’re gonna be hacked.

SQL Injection is comin’ to town

We’ve got a library cache,
It’s memory all linked twice.
But it only works well,
When you’re binding all nice.

SQL Injection is comin’ to town

We know when you’ve been lazy,
And concatenated simple strings.
So just make sure you bind your stuff,
And don’t let the bad guys in.

So… you better watch out,
Let me tell you a fact.
If your SQL has literals,
You’re gonna be hacked.

SQL Injection is comin’ to town

 

Merry Christmas everyone! Smile

pete

Oracle Security Blog Posts

I teach many training classes on Oracle security to lots of students worldwide, both on-site and on-line, and one area I often cover quite briefly is where you can find more information or keep up to date on Oracle security....[Read More]

Posted by Pete On 23/12/18 At 05:53 PM
