Oakies Blog Aggregator

pete's picture

Hardening and Securing The Oracle Database Training in London

I posted last week that I will teach my two-day class "How to Perform a Security Audit of an Oracle Database" with Oracle University in London on the 29th and 30th April 2019. We have now added....[Read More]

Posted by Pete On 11/03/19 At 11:52 AM

connor_mc_d's picture

Oracle Magazine

Generally my blog is just snippets of tech content that take my interest as I encounter them (most commonly when looking at AskTOM). If I think they’ll be useful, I’ll just plonk them out right there and then. If you prefer your content in longer (and more structured) form, I also publish longer-form articles in Oracle Magazine every couple of months. Below is a consolidated list of my articles. I’ll try to keep this list updated as I add new ones.

Old Dog, New Tricks, Part 2
Here’s a new SQL syntax for hierarchy processing.

Improved Defaults in Oracle Database 12c
The new DEFAULT clause provides better values for getting started and better performance.

Excellent Extensions
New features in Oracle Database 12c Release 2 make external tables even more flexible.

Tighter PL/SQL and SQL Integration
PL/SQL functions perform better in SQL in Oracle Database 12c.

All Aboard the SQL*Loader Express
A new processing mode takes the hassle out of dealing with flat file loading.

Long and Overflowing
LISTAGG in Oracle Database 12c Release 2 solves the problem of excessively long lists.

Assume the Best; Plan for the Worst
Here’s a technique for delivering better performance through optimistic code programming.

A Fresh Look at Auditing Row Changes
Triggers can provide auditing information, but there’s a future in flashback.

Better Tools for Better Data
New functions in Oracle Database 12c Release 2 solve data validation challenges.

Unintended Side Effects
Ensure that the code you write does not create problems elsewhere in your applications.

Write in a Read-Only Database
Run reports and DML against your standby database with Oracle Active Data Guard.

Open for Exchange
FOR EXCHANGE in Oracle Database 12c Release 2 takes the detective work out of partition exchanges.

A Higher-Level Perspective on SQL Tuning
The commonly missed first steps of tuning a SQL statement

Are We All on the Same Page?
Pagination of data can make (or break) your database.

Old Dog, New Tricks
Take advantage of SQL extensions for hierarchy processing.

Franck Pachot's picture

SQL prevents database corruption and injection, except in ridiculous movie hacker scenes.

SQL is the Structured Query Language used to define and manipulate data in most of the databases in the world, including the most critical ones (banks, hospitals, airlines, secret services…). It can give the impression that with SQL you can do whatever you want, bypassing all application control, as if it were a backdoor to your database, wide open on the network.

Superman 3 “overide all security” command

Programmers always laugh when seeing ridiculous hacking scenes in movies. In 2016 there was the “use SQL to corrupt their database” line in Jason Bourne (nothing to do with JSON or /bin/sh, by the way, it’s a movie), and recently in the Star Trek: Discovery series a hack was explained as “The probe used multiple SQL injections”. I’ve put the links at the end of this post.

I just want to explain that SQL is not the problem in this area, but can be the solution. SQL is not only a powerful language to manipulate data: it also has all the features needed to encapsulate and protect access to the database.

Actually, SQL is the implementation of Edgar F. Codd’s Rule 5:

Pro SQL Server Relational Database Design and Implementation
By Louis Davidson, Jessica Moss

When one says that SQL is DDL (Data Definition Language) and DML (Data Manipulation Language) and forgets the DCL (Data Control Language), he suggests that we can do anything, like dropping or changing data, and ignores the whole ‘authorization’ part. DDL is not only about CREATE and DROP for tables: we can (and should) create views to encapsulate data access, like the ‘private’ and ‘public’ declarations we find in other languages. And stored procedures when we need a procedural language to implement more complex rules.

If we can “use SQL to corrupt their database”, that’s not because of SQL. That’s because of the developers who did not care about security and maybe didn’t know that:

  • schemas should be used to isolate the different layers — internal ones vs. the ones exposed to the users
  • views must be used to hide the internal private columns or rows, and expose only the public ones. Views are not only for queries (SELECT): modifications (INSERT, UPDATE, DELETE) can also be made through views.
  • grant/revoke table access must be done according to the visibility specification. Never do a GRANT ALL PRIVILEGES TO PUBLIC thinking that the front-end will guard all access.
  • stored procedures should be used to expose the application services, rather than giving direct DML access on tables or views to the front-end (a minimal sketch follows this list)
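A minimal sketch of this layering, where the schema, table, and account names are invented purely for illustration:

-- internal schema APP owns the data; exposed schema APP_API owns only views and procedures
create view app_api.students_public as
  select student_id, name          -- expose only the "public" columns
  from   app.students
  where  published = 'Y';          -- expose only the "public" rows

grant select on app_api.students_public to front_end;   -- never grant on app.students itself

-- modifications go through a stored procedure rather than direct DML on the table
create or replace procedure app_api.register_student ( p_name in varchar2 ) as
begin
  insert into app.students ( student_id, name, published )
  values ( app.students_seq.nextval, p_name, 'Y' );
end;
/
grant execute on app_api.register_student to front_end;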

Here is a clear definition, by Philipp Salvisberg, of the right design of a database application, where data is not exposed directly but through a secured application API defined in the database:

The Pink Database Paradigm (PinkDB) - Philipp Salvisberg's Blog

With this, it is impossible to “use SQL to corrupt their database” because the SQL statements are not written by the user, but coded by the developer who implemented this API, with all access rules enforced. But then, what about hackers trying to change the API behavior with Star Trek’s “multiple SQL injections”?

SQL Injection

The idea is exposed in the following, where a user can pass SQL text through the application API to tweak a SELECT statement and turn it into a DROP TABLE:

xkcd: Exploits of a Mom

But this, again, is not a weakness of SQL but the wrong design and a misunderstanding of Codd’s Rule 5, which states that there are two ways to manipulate data: “interactive and by program”. If the user is able to do this SQL injection, it is because the developer used the wrong one: probably out of laziness, he used the ad-hoc query approach instead of the programmer’s one.

Interactive

Yes, SQL can be used by end-users, and this is why it is a 4th generation declarative language, looking like an English sentence describing the result we want. Here, users can do data manipulation without needing to learn a procedural language. Security, in this case, relies entirely upon the DDL (view definitions) and DCL (privileges granted) after end-user authentication (each user logs in with his own username). A user who has not been granted the DROP TABLE privilege will never be able to drop a table.

But that protection does not hold when the application connects to the database with a generic account, as when one program serves multiple users and transactions, because this account probably has more privileges in order to cope with the various user profiles.

By program

As Rule 5 mentions, the SQL language can be used by programs. There, it is not the user who writes the SQL statement. The SQL statement is coded in advance by the application developer. It is parsed and compiled, and then executed, many times, for different end-users, with different parameters. In this case, SQL is not a simple text command passed as a character string to the database engine. Here, SQL defines a cursor that is prepared from a SQL text containing some variables, and the cursor is executed, passing values to those variables. There is no way to ‘inject’ SQL here, because the user can inject only values into the parameters: he cannot inject code.

Would you take user input and build from it a string to pass to Runtime.exec() in Java or os.system() in Python? That is exactly what you are doing with something like:

Statement stmt = con.createStatement();
ResultSet rs=stmt.executeQuery(
"select * from students where name='"+argv[1]+"'"
);

This is subject to SQL injection because you can pass a value that will change the WHERE clause predicate.

The right way to program database access is to prepare the statement (once), and then execute it:

PreparedStatement stmt = con.prepareStatement(
  "select * from students where name = ?"
);
stmt.setString(1, argv[1]);
ResultSet rs = stmt.executeQuery();

Here there is no way to pass anything other than a String value for the name column.

This is mandatory for security, but also for performance. Can you imagine compiling a different program for each user input?

In summary, are you an end-user running ad-hoc queries, or a professional programmer who builds a secure and efficient application? SQL is the right language for both, but you must pick the right approach. Otherwise your code will be as ridiculous as these movie hacking scenes:

“Use SQL to corrupt their database” (Jason Bourne, 2016)


“The probe used multiple SQL injections” (Star Trek: Discovery, 2019)

dbakevlar's picture

Using SSMS with Power BI

I’m curious by nature, and many have shown interest when I connect Power BI to SQL Profiler to collect performance information. But if you can do that, what happens when you connect it to SQL Server Management Studio (SSMS)?

That can seem quite foreign, but if you can connect it to SQL Profiler, you can connect it to SSMS. Why this works becomes clear when you begin to look under the covers of the PBIX file and the processes that run on your desktop.

The PBIX File

All Power BI files end with .pbix. As I and others have written about, you can make a copy of the file (because we DBAs always like to work from copies rather than originals) and then rename it from .pbix to .zip. You can then unzip the file and see the base components that make up a Power BI report and its visuals:

(Image: https://dbakevlar.com/wp-content/uploads/2019/03/enrollmentzip-768x249.jpg)

You now have a breakdown of the DataModel, DataMashup, the Diagram Layouts, the state, the Metadata, Security, Settings and Version. It’s kind of cool to know you can dissect the file once unzipped and make changes to specific features this way, as folks in the community have documented.

The Desktop Processes and Ports

While the Power BI report is open, you can see what is running by using the netstat -b -n command from a command prompt run as administrator:

(Image: https://dbakevlar.com/wp-content/uploads/2019/03/netstatpbi-768x705.jpg)

You’ll notice that pbidesktop.exe is using port 54125, with multiple child threads. You’ll also notice there is an executable sourced from the pbidesktop.exe process called msmdsrv.exe. That’s the executable for an Analysis Server: your data model is being run by a trimmed-down Analysis Server inside Power BI. If you have an Analysis Server port, you can connect to it with SSMS using the port for Power BI Desktop. In our example above, the port is 54125 and, as the desktop is running on your PC, “localhost” is the server name. Open SSMS, set the server type to Analysis Server, and type in the following, using an Active Directory login to connect:

(Image: https://dbakevlar.com/wp-content/uploads/2019/03/ssmsas.jpg)

You will connect to the trimmed-down Analysis Server behind the Power BI Desktop report running on your screen, with limited interactive options. If you need proof, here’s the expanded list from SSMS:

(Image: https://dbakevlar.com/wp-content/uploads/2019/03/ssmsas2.jpg)

Notice that it matches my list of tables from the Power BI interface for the same report:

(Image: https://dbakevlar.com/wp-content/uploads/2019/03/ssms_pbi.jpg)

If we expand the SSMS interface for the Analysis Server data model further, we can match the columns, calculated columns, and measures as well:

(Image: https://dbakevlar.com/wp-content/uploads/2019/03/ssms_as4-768x578.jpg)

There are limitations. In a full Analysis Server data model, you would be able to execute SQL, MDX, XMLA and DAX queries. This appears to be disabled on the back end of the PBI data model, so any DAX or MDX has to be performed in the Power BI UI instead.

(Image: https://dbakevlar.com/wp-content/uploads/2019/03/ssas_ms5-1-768x66.jpg)

As you can see above, the Execute button is missing from the toolbar whether you open an XMLA, DAX, or MDX window, and for a SQL query you can’t connect to a database engine. The database isn’t a full SQL Server, nor does it appear to be a Windows Internal Database (WID).

What you can do from here is connect SQL Profiler from the SSMS UI and trace performance for the Power BI report (data loads, data refreshes, DAX, MDX, etc.), along with durations for distributed work sent to remote data sources.

Well, I hope my little run through Power BI with SSMS was interesting. Have a great weekend.





Copyright © DBAKevlar [Using SSMS with Power BI], All Rights Reserved. 2019.

davidkurtz's picture

Effective PeopleSoft Performance Monitoring

This advice note describes how to configure PeopleSoft systems on Oracle so that performance metrics are collected that are useful for performance monitoring and subsequent performance tuning.

Contents

  • Oracle RDBMS Instrumentation
    • On-Line Component/Page Information
    • Application Engine Step Information
    • Cobol/nVision Instrumentation
    • nVision Layout Instrumentation
    • 2nd Database Connection Instrumentation
  • PeopleTools Performance Metrics
    • Cobol and Application Engine Batch Timings
    • PeopleSoft Performance Monitor

Summary of Recommendations

  • Set EnableAEMonitoring=1 in all Application Server and process scheduler domains in order to enable PeopleSoft instrumentation on-line, in the integration broker, and in Application Engine programs.
  • Implement the PSFTAPI package and trigger described above to set module and action at the start of all processes.
  • Implement Fine-Grained Audit policy and handler to instrument nVision processes if required.
  • Reduce AWR snapshot frequency from the default of 1 hour to every 15 minutes.
  • Enable Application Engine batch timings to at least database and preferably also file by setting AETrace=1152 in all Application Server and Process Scheduler domain configuration files.
  • Ensure the Cobol statement timings report is written to the log file by setting TraceSQL=128 in all Process Scheduler domain configuration files.
  • Performance Monitor is complex to set up and the delivered analytics are limited.  Nonetheless, the information that can be obtained from the data collected can be worth the effort of configuration to address on-line configuration and performance issues.

Oracle RDBMS Instrumentation

Oracle provides the dbms_application_info package to set certain attributes on the current database session.  These attributes are visible in some of the dynamic performance views, are picked up by Active Session History (ASH), and can also be seen in AWR reports, Enterprise Manager screens, SQL trace files, and other performance utilities such as EDB360.  The package was first documented in Oracle 7.3.3.

"Application developers can use the DBMS_APPLICATION_INFO package to record the name of the executing module or transaction in the database for use later when tracking the performance of various modules… System administrators can also use this information to track resource usage by module. When an application registers with the database, its name and actions are recorded in the V$SESSION and V$SQLAREA views. Your applications should set the name of the module and name of the action automatically each time a user enters that module. The module name could be the name of a … code segment in an … application. The action name should usually be the name or description of the current transaction within a module."

Oracle 7 Tuning, release 7.3.3 ©Oracle 1997, Chapter 23 Registering Applications

See also One of my Favourite Database Things: DBMS_APPLICATION_INFO
If module and action are consistently set to meaningful values as the application executes, it is then possible to determine from where SQL statements originated and how different parts of the application are performing.
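As a minimal sketch of the mechanism (the module and action values here simply mirror the Application Engine examples later in this note):

begin
  -- label the current session; the values then appear in V$SESSION,
  -- ASH samples, AWR reports, SQL trace files, etc.
  dbms_application_info.set_module(
    module_name => 'PSAE.PSPMCSOSUM.1448',      -- e.g. AE program and process instance
    action_name => 'PSPMCSOSUM.GETCNT.CNT.P');  -- e.g. program.section.step.type
end;
/

-- see what each session has registered
select sid, serial#, module, action, client_info
from   v$session
where  module is not null;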
ASH is separately licenced as part of the Diagnostics Pack, which is only available on Enterprise Edition of the database.  However, most PeopleSoft customers running on Oracle do so on Enterprise Edition and are licenced for the Diagnostics Pack.
PeopleSoft has set client_info since PeopleTools 7.53 so that the PeopleSoft operator ID can be associated with the database session, mainly to allow auditing to be done with database triggers.  That attribute is not persisted to the ASH data; however, client_id is also set to the operator ID, and this is collected by ASH.
PeopleTools has set module and action since PeopleTools 8.50, though the exact values used have changed sometimes with the PeopleTools version.
A new application server/process scheduler domain parameter, EnableAEMonitoring, was introduced in PeopleTools 8.54 to control this behaviour, and it is not enabled by default.  If monitoring is not enabled, module defaults to the program name and action remains blank.
Recommendation: Set EnableAEMonitoring=1 in all application server and process scheduler domains in order to enable PeopleSoft instrumentation on-line, in the integration broker, and in Application Engine programs.

On-Line Component/Page Information

In the online application, module and action are set to the component and page name respectively.  In a search dialogue, Action is set to 'xyzzy'.
In the Integration Broker module and action are set to service name and queue name.

Application Engine Step Information

In Application Engine, module is set to a string that includes the name of the Application Engine main program that was called and the sessionid_num for the current process instance number recorded on the process scheduler request table PSPRCSQUE.  For example: PSAE.PSPMCSOSUM.1448
Later on, it may be necessary to use regular expressions in SQL to process this string before profiling the ASH data.
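For example, a sketch of splitting that string back into its components while profiling ASH data, assuming module always takes the form PSAE.&lt;program&gt;.&lt;process instance&gt;:

select regexp_substr(module, '[^.]+', 1, 2)            as ae_program
,      to_number(regexp_substr(module, '[^.]+', 1, 3)) as process_instance
,      count(*)                                        as ash_samples
from   v$active_session_history
where  module like 'PSAE.%'
group by regexp_substr(module, '[^.]+', 1, 2)
,        to_number(regexp_substr(module, '[^.]+', 1, 3));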
Action is set to a string concatenated from the Application Engine program, section, step name, and step type.  For example: PSPMCSOSUM.GETCNT.CNT.P
The program name may be different from that shown in module if one AE program calls another.
Note: Application Engine still doesn't reset ACTION on DO SELECT steps

Cobol/nVision Instrumentation

Cobol, nVision, and SQR do not set module and action.  Instead, they can be set at the start of every program initiated by Process Scheduler using a trigger on PSPRCSRQST.  The first thing a process does when it is initiated is to set the run status on its scheduler request record to 7, indicating that it is processing.  This is visible in the Process Monitor component.  A trigger on this transition can set module and action for the session to the program and process instance number on that row of data.
This technique was used prior to PeopleSoft adding instrumentation to PeopleTools.  It can still be applied to all processes, including Application Engine, because any PeopleSoft instrumentation will simply overwrite the value set by the trigger.
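A simplified sketch of the idea (assuming the SYSADM schema and the standard RUNSTATUS, PRCSNAME and PRCSINSTANCE columns of PSPRCSRQST; the actual PSFTAPI implementation is more complete):

create or replace trigger sysadm.psftapi_set_module_action
before update of runstatus on sysadm.psprcsrqst
for each row
when (new.runstatus = '7' and old.runstatus != '7')
begin
  -- label the session with the process name and instance as soon as
  -- the process marks its scheduler request as "processing"
  dbms_application_info.set_module(
    module_name => :new.prcsname,
    action_name => 'PI='||:new.prcsinstance);
end;
/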

Recommendation: Implement the PSFTAPI package and trigger described above in order to set module and action at the start of all processes.
The same technique was also used prior to the introduction of ASH to enable Oracle SQL Trace if a particular run control was specified, and is still occasionally useful.

nVision Layout Instrumentation

One of the challenges of tuning and monitoring nVision is to be able to identify each report being run.  nVision reports always run under the same process name: either NVSRUN for a single report, RPTBOOK for a report book of many reports, or DRILLDWN for an nVision drill-down query.  Knowing the process instance is useful because then we can look up the operator and run control ID.
However, it would also be useful to know the report ID being run.  When each individual nVision report starts, it queries the runtime parameters from the PS_NVS_REPORT PeopleTools table.  There is no update, so it is not possible to capture this with a DML trigger.  Instead, it is possible to define a fine-grained audit policy on the query and set module within a PL/SQL handler package that is invoked by the fine-grained audit.

Recommendation: Implement Fine-Grained Audit policy and handler to instrument nVision processes if required
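A hedged sketch of that approach follows; the handler and policy names are hypothetical, and a real handler would derive the report ID from the triggering query rather than setting a fixed label:

-- handler invoked whenever PS_NVS_REPORT is queried at the start of an nVision report
create or replace procedure sysadm.nvision_fga_handler (
  object_schema varchar2,
  object_name   varchar2,
  policy_name   varchar2) as
begin
  dbms_application_info.set_action('nVision layout query');
end;
/

begin
  dbms_fga.add_policy(
    object_schema   => 'SYSADM',
    object_name     => 'PS_NVS_REPORT',
    policy_name     => 'NVS_REPORT_MODULE',
    handler_schema  => 'SYSADM',
    handler_module  => 'NVISION_FGA_HANDLER',
    statement_types => 'SELECT');
end;
/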

2nd Database Connection Information

PeopleSoft programs use a second database connection to increment sequence numbers to minimise the row level locking on such tables.  It is like an AUTONOMOUS_TRANSACTION in PL/SQL.  There is no PeopleSoft instrumentation on this session.  It is possible to use an AFTER LOGON trigger to set client_info, module and action.

Oracle Automatic Workload Repository (AWR) Snapshots

PeopleSoft generates a lot of non-shareable SQL.

  • Dynamically generated SQL, often in PeopleCode, concatenates strings of SQL with bind variables, thus the bind variables become literals in the final SQL statement.  Statements with different literal values are considered to be different statements with different SQL_IDs.
  • Similarly, dynamic Cobol statements result in literal values in the SQL statement.
  • %BIND() variables in Application Engine will also become literal values in the SQL statement unless the ReUseStatement attribute is set on the AE step; however, this cannot be set on statements where dynamic fragments of code are introduced with %BIND(…,NOQUOTES).
  • Application Engine programs that use temporary records can use different non-shared instances of the record in different executions, and this also results in different statements with different SQL_IDs.
  • See also Performance Benefits of ReUse Statement Flag in Application Engine

Consequently, library cache usage is not particularly efficient in PeopleSoft, and dynamic SQL statements are often quickly aged out of the library cache.  AWR snapshots can only capture the SQL that is in the library cache at the time of the snapshot.  If the SQL statement, or at least a statement with the same force matching signature or plan hash value, cannot be found in the AWR, it cannot be identified or analysed.  Therefore, it is advantageous to increase the snapshot frequency on PeopleSoft systems.
Recommendation: Reduce AWR snapshot frequency from the default of 1 hour to every 15 minutes.  This change results in only a modest increase in processing overhead and AWR space, but it is worth it for the additional information that is captured.
This advice also applies to Statspack, which may be used if you are not licenced for the Diagnostics Pack.
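A minimal sketch of implementing that recommendation (the interval is specified in minutes; retention is left at its existing value):

begin
  dbms_workload_repository.modify_snapshot_settings(interval => 15);
end;
/

-- confirm the current AWR settings
select snap_interval, retention from dba_hist_wr_control;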

PeopleTools Performance Metrics

Batch Timings

PeopleSoft Application Engine and Cobol programs can emit batch timings reports on successful completion.

Application Engine

Application Engine batch timings are controlled by the AETrace flag in the Process Scheduler domain configuration file and, for on-line AE programs, in the Application Server domain configuration files.

  • AETrace=128: the batch timings report is written to the AE trace file
  • AETrace=1024: batch timings are written to PS_BAT_TIMINGS% tables in the database

The overhead of batch timings is negligible while the program is running because the timings are accumulated in memory and only written to file or database when the process completes successfully.
Recommendation: Enable Application Engine batch timings to at least database and preferably also file by setting AETrace=1152 in all Application Server and Process Scheduler domain configuration files.
The trace setting in the process scheduler configuration can be overridden by setting process-specific command line parameter overrides in the process definition.  This is often done to set other trace settings, and it is also common to see these overrides unintentionally left in place longer than necessary.  If trace is set in this way, it should always also set the batch timings flags.
See PeopleBooks -> Development Tools -> Application Engine -> Enabling Application Engine Tracing

Cobol

PeopleSoft Cobol programs can only write batch timings reports to file and not to the database.  This is controlled by a different parameter.

  • TraceSQL = 128: Enable Cobol statement timings report 

Recommendation: Ensure the Cobol statement timings report is written to the log file by setting TraceSQL=128 in all Process Scheduler domain configuration files.
This trace setting can also be overridden by setting process specific command line parameter overrides in the process definition.  If trace is set in this way it should always also set the batch timings flags.

PeopleSoft Performance Monitor

This provides information about the performance of the PIA, including response times for the online transactions.  Metrics are stored in a separate monitoring PeopleSoft system to minimize the effect of measurement intrusion.  It optionally samples the state of each web server, application server and process scheduler, collecting operating system and Tuxedo metrics.  It also has a PIA session trace capability.
The sampled data includes the number of busy application server processes and length of inbound Tuxedo service queues.  This data can be used to validate the sizing of the application servers.
Recommendation: Performance Monitor is complex to set up and the delivered analytics are limited.  Nonetheless, the information that can be obtained from the data collected can be worth the effort of configuration to address on-line configuration and performance issues.

pete's picture

Stop The DBA Reading Data in Subtle Ways

The Problem: Dan asked me a question about whether the DBA can be stopped from accessing views such as V$SQL or V$SQL_BIND_CAPTURE with Database Vault because these views can be used to read data from the SGA. I have covered....[Read More]

Posted by Pete On 08/03/19 At 03:41 PM

Jonathan Lewis's picture

Append hint

One of the questions that came up at the CBO Panel Session at the UKOUG Tech2018 conference was about the /*+ append */ hint – specifically how to make sure it was ignored when it came from a 3rd party tool that was used to load data into the database. The presence of the hint resulted in increasing amounts of space in the table being “lost”: as older data was deleted by the application the freed space was never reused, because the inserts always went above the table’s highwater mark; and it wasn’t possible to change the application code.

The first suggestion aired was to create an SQL Patch to associate the hint /*+ ignore_optim_embedded_hints */ with the SQL in the hope that this would make Oracle ignore the append hint. This won’t work, of course, because the append hint is not an optimizer hint, it’s a “behaviour” hint.

There are, however, various things that will make the append hint invalid – sometimes to the great surprise of the people using it. The three things I can think of at present are:

  • row-level triggers
  • enabled foreign key constraints
  • non-unique indexes enforcing unique constraints

It’s been some time since I last ran a detailed check so I’m not going to guarantee that the following claims are still true – if it matters to you then it’s easy enough to create a little test of (say) 10,000 rows inserted into a non-empty, indexed table.

  • Row level triggers turn array processing into single row processing, so there’s a significant increase in redo generation.
  • Non-unique indexes enforcing unique constraints have (small but) potentially undesirable effects on the optimizer and on run-time and may turn array processing into single row processing.
  • Foreign key constraints require checking, which may have some impact, but they don’t turn array processing into single row processing.

Of the three options the foreign key constraint seemed to me to be the best strategy to disable the hint with minimal side effects, so my answer was:

“Create a new table with no data and a primary key constraint; add an invisible column to the target table, index it (online) and add a foreign key constraint from the column to the new table”.

My thinking on this was that the foreign key will disable the append hint. The column will always be null, which means it will always satisfy the foreign key constraint without being checked, and it won’t introduce any index maintenance overheads. Net effect: /*+ append */ is disabled with no extra overheads.
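A sketch of that suggestion, with all the object names invented purely for illustration:

-- empty parent table whose only job is to supply a primary key
create table append_blocker ( id number constraint append_blocker_pk primary key );

-- invisible column on the target table; it is never populated, so it is always null
alter table target_table add ( append_blocker_id number invisible );

-- index it online, then add the foreign key to the new parent table
create index target_table_ab_idx on target_table ( append_blocker_id ) online;

alter table target_table add constraint target_table_ab_fk
  foreign key ( append_blocker_id ) references append_blocker ( id );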

Footnote

The append hint is also ignored if the table is an Index Organized Table (IOT), but changing a heap table to an IOT is probably not a sensible choice if all you want to do is disable the hint – the potential for unexpected client side anomalies is too high, and if the table is heavily indexed the processing overhead for the secondary indexes could be fairly significant.

If I recall correctly the person asking the question said that the “do nothing” trigger option sounded like something they would be most comfortable with and they’d live with the overheads. I understand the feeling – an invisible column with an invisible index and extra foreign key constraint sounds too close to the boundary where mixing and matching simple features ends up hitting some unexpected behaviour (i.e. a bug).

Update 14th March 2019

A note on Twitter has reminded me that distributed transactions introduce another limitation.  Inserting across a database link will work only if the insert is into a local table; the first insert below will append, the second will not.


insert /*+ append */ into local_table select * from table@remote_location;

insert /*+ append */ into table@remote_location select * from local_table;

There are a couple more limitations given in the comments – disable table locking, and adding a security policy (VPD / RLS / FGAC / OLS).

Uwe Hesse's picture

Comparison between #Oracle and #Exasol

After having worked with both databases for quite some time, this is what I consider to be the key differences between Oracle and Exasol. Of course the two have much in common: Both are relational databases with a transaction management system that supports the ACID model and both follow the ANSI SQL standard – both with some enhancements. Coming from Oracle as I do, much in Exasol looks quite familiar. But let’s focus on the differences:

Strengths

Oracle is leading technology for Online Transaction Processing (OLTP). If you have a high data volume with many users doing concurrent changes, this is where Oracle shines particularly.

Exasol is leading technology for analytical workloads. If you want to do real-time ad hoc reporting on high data volume, this is where Exasol shines particularly.

Architecture

Data Format & In-Memory processing

Oracle uses a row-oriented data format, which is well suited for OLTP but not so much for analytical workloads. That’s why Hybrid Columnar Compression (only available on Engineered Systems or on Oracle proprietary storage) and the In-Memory Column Store (extra charged option) have been added in recent years.

Exasol natively uses a compressed columnar data format and processes this format in memory. That is very good for analytical queries but bad for OLTP, because one session that does DML on a table locks that table against DML from other sessions. Read-consistent SELECTs are possible for these other sessions, though.

Oracle was designed for OLTP at times when memory was scarce and expensive. Exasol was designed to process analytical workloads in memory.

Clustering

Oracle started as a non-clustered (single instance) system. Real Application Clusters (RAC) have been added much later. The majority of Oracle installations is still non-clustered. RAC (extra charged option) is rather an exception than the rule. Most RAC installations are 2-node clusters with availability as the prime reason, scalability being rather a side aspect.

Exasol was designed from the start to run on clustered commodity Intel servers. Prime reasons were MPP performance and scalability with availability being rather a side aspect.

Data Distribution

This doesn’t matter for most Oracle installations, only for RAC. Here, Oracle uses a shared disk architecture while Exasol uses a shared nothing architecture, which is optimal for performance because every Exasol cluster node can operate on a different part of the data in parallel. The drawback is that after adding nodes to an Exasol cluster, the data has to be re-distributed.

With Exadata, Oracle tries to compensate the performance disadvantage of the shared disk architecture by enabling the storage servers to filter data locally for analytical workloads. This approach leads to better performance than Oracle can deliver on other (non-proprietary) platforms.

Availability & Recoverability

Clearly, Oracle is better in this area. A non-clustered Oracle database running in archive log mode will enable you to recover every single committed transaction you did since you took the last backup. With Exasol, you can only restore the last backup and all changes since then are lost. You can safeguard an Oracle database against site failure with a standby database at large distance without performance impact. Exasol doesn’t have that. With RAC, you can protect an Oracle database against node failure. The database stays up (the Global Resource Directory is frozen for a couple of seconds, though) upon node failure with no data loss.

If an Exasol cluster node fails, this leads to a database restart. That means no availability for a couple of seconds, and all sessions get disconnected, but also no data loss. Optionally, Exasol can be configured as a Synchronous Dual Data Center – similar to Oracle’s Extended RAC.

Complexity & Manageability

I realized that there’s a big difference between Exasol and Oracle in this area when I was teaching an Exasol Admin class recently: Some seasoned Oracle DBAs in the audience kept asking questions like “We can do this and that in Oracle, how does that work with Exasol?” (e.g. creating Materialized Views or Bitmap Indexes or an extra Keep Cache) and my answer was always like “We don’t need that with Exasol to get good performance”.

Let’s face it, an Oracle database is probably one of the most complex commercial software products ever developed. You need years of experience to administer an Oracle database with confidence. See this recent Oracle Database Administration manual to get an impression. It has 1690 pages! And that’s not yet Real Application Clusters, which is additionally 492 pages. Over 2100 pages of documentation to dig through, and after having worked with Oracle for over 20 years, I can proudly say that I actually know most of it.

In comparison, Exasol is very easy to use and to manage, because the system takes care of itself largely. Which is why our Admin class can have a duration of only two days and attendees feel empowered to manage Exasol afterwards.

That was intentionally so from the start: Exasol customers are not supposed to study the database for years (or pay someone who did) in order to get great performance. Oracle realized that being complex and difficult to manage is an obstacle and came out with the Autonomous Database – but that is only available in the proprietary Oracle Cloud.

Performance

Using comparable hardware and processing the same (analytical) workload, Exasol outperforms any competitor. That includes Oracle on Exadata. Our Presales consultants regard Exadata as a sitting duck, waiting to get shot on a POC. I was personally shocked to learn that, after drinking the Oracle Kool-Aid myself for years.

In my opinion, these two points are most important: Exasol is faster and at the same time much easier to manage! I mean anything useless could be easy to manage, so that’s not an asset on its own. But together with delivering striking performance, that’s really a big deal.

Licensing

This is and has always been a pain point for Oracle customers: the licensing of an Oracle database is so complex and fine-grained that you always wonder “Am I allowed to do this without violating my license? Do we really need these features that we paid for? Are we safe if Oracle does a License Audit?” With Exasol, all features are always included, and the two most popular license types are totally easy to understand: you pay either for the data volume loaded into the cluster or for the amount of memory assigned to the database. No sleepless nights because of that!

Cloud

This topic becomes increasingly important as many of our new customers want to deploy Exasol in the cloud. And you may have noticed that Oracle pushes going cloud seriously over the last years.

Exasol runs with all features enabled in the cloud: you can choose between Amazon Web Services (AWS), Microsoft Azure and ExaCloud.

AWS

This is presently the most popular way our customers run Exasol in the cloud. See here for more details.

MS Azure

Microsoft’s cloud can also be used to run Exasol, which gives you the option to choose between two major public cloud platforms. See here for more details.

ExaCloud

Hosted and managed by Exasol, ExaCloud is a full database-as-a-service offering. See here for more details.

Hybrid Exasol deployments that combine cloud with on-prem can also be used, just depending on customer requirements.

Oracle offers RAC only on the Oracle Cloud platform, not on other public clouds. Various other features are also restricted to being available only in Oracle’s own cloud. The licensing model has been tweaked to favor the usage of Oracle’s own cloud over other public clouds generally.

Customer Experience

Customers love Exasol, as the recent Dresner report confirms. We get a perfect recommendation score. I can also tell that from personal encounters: Literally every customer I met is pleased with our product and our services!

Conclusion

Oracle is great for OLTP and okay for analytical workloads – especially if you pay extra for things like Partitioning, RAC, In-Memory Column Store and Exadata. Then the performance you get for your analytical workload might suit your present demand.

Exasol is totally bad for OLTP but best in the world for analytical workloads. Do you think your data volume and your analytic demands will grow?

connor_mc_d's picture

External table preprocessor on Windows

There are plenty of blog posts about using the pre-processor facility in external tables to get OS level information available from inside the database. Here’s a simple example of getting a directory listing:


run_os.bat
==========
@echo off
cd \oracle
dir /b

SQL> create table fs_size (
  2   disk varchar2(64)
  3  )
  4  organization external
  5  (  type oracle_loader
  6     default directory temp
  7     access parameters
  8     ( records delimited by newline
  9       preprocessor  temp:'run_os.bat'
 10     )
 11     location ( temp:'empty.txt' )
 12  ) reject limit unlimited ;

Table created.

SQL>
SQL> select * from fs_size;

DISK
------------------------------------------------
18c_cloud
admin
adw_cloud
atp_cloud
audit
cfgtoollogs
checkpoints
datamodeler
dbsat
diag
instantclient
jdbc183
ords.184
ords122
ords181
ords184
ordsconf
product
sql
sqlcl
sqldeveloper
swingbench
wallet

So far so good. But on Windows, it can be quite particular about the runtime environment. For example, if I fire up a command prompt, as a logged in user I can easily get a listing of disk partitions on my machine:


C:\>wmic logicaldisk get caption
Caption
C:
D:
E:
F:
G:
M:
P:
X:
Z:

It would seem logical that this would be a simple drop-in replacement for the directory listing batch file I used before, so I’ll amend the batch file to list out disk partitions:


run_os.bat
==========
@echo off
wmic logicaldisk get caption

and now I’ll run my query again.


SQL> select * from fs_size;

no rows selected

Hmmm… that didn’t go as well as planned. Looking in the log file, there are no clues about the cause.


 LOG file opened at 03/06/19 14:33:42

KUP-05007:   Warning: Intra source concurrency disabled because the preprocessor option is being used.

Field Definitions for table FS_SIZE
  Record format DELIMITED BY NEWLINE
  Data in file has same endianness as the platform
  Rows with all null fields are accepted

  Fields in Data Source: 

    DISK                            CHAR (255)
      Terminated by ","
      Trim whitespace same as SQL Loader

The issue here is that many Windows programs require a minimum environment configuration before they will run. In this case, I need to let Windows know the location of the OS installation.


run_os.bat
==========
@echo off
set SystemRoot=C:\WINDOWS
wmic logicaldisk get caption

And voila! I get access to many more Windows commands to probe information from the operating system.


SQL> select * from fs_size;

DISK
--------------------------------
Caption
C:
D:
E:
F:
G:
M:
P:
X:
Z:
pete's picture

Oracle Security Training in London with Oracle University

I have just agreed some training dates with Oracle University in London and I will be teaching my very popular two day class How to Perform a security audit of an Oracle database on the 29th and 30th April 2019....[Read More]

Posted by Pete On 07/03/19 At 12:15 PM
