Oakies Blog Aggregator

tim.evdbt@gmail.com's picture

Delphix XPP explained

This article was originally posted on the Delphix Support blog on 15-Nov 2015, but with the deprecation of the XPP feature with the new 6.0 release and higher, it was decided best to remove this article.

So, I have saved it and posted it here instead, almost 4 years to the day after it was originally posted…

The topic of converting Oracle databases from one of the proprietary
UNIX platforms (i.e. Solaris, AIX, or HP-UX) to Linux seems at first pretty
esoteric and far-fetched.

Meh. Right?

Plus, a lot of folks who have used Solaris, AIX, and HP-UX over the years can
argue that those operating systems have far more capabilities and
technical advantages than Linux, and they may be absolutely correct.  Who wants to get caught up in a religious debate?

Meh again.  Right?

But interestingly enough, the big issue here is not the relative technical merit.

The Cloud is about the commoditization of servers and services.  Unless you are using the hosting services or Cloud services offered by the operating system vendor themselves (i.e. Oracle for Solaris, IBM for AIX, or by HP or HP-UX), then the only operating systems being offered by hosting or Cloud vendors are either Linux or Windows on Intel x86 or x86_64.

So, converting an Oracle database from UNIX to Linux is a prerequisite to moving it to the Cloud.  Betcha didn’t think of it that way, eh?

This is why Delphix uses the term modernization to describe the capability of cross-platform provisioning (i.e. XPP), also known as UNIX-to-Linux (i.e. U2L).  It is not because Linux on x86/x86_64 platforms are any more modern than Solaris on SPARC, AIX on Power, or HP-UX on PA-RISC/Itanium, but because modernization involves moving the data center to hosting or the Cloud.

I’m not willing to engage in an argument about which platform has more technical merit than one another.  However, I will assert that if you’re not running Oracle on Linux, then you are not positioned for the future, just based on the economics.  If you really think technical merit beats economics, then I have a nice VAX running VMS for you.

So, Delphix XPP a.k.a. U2L is the on-ramp to the Cloud.  I prefer the term “XPP” rather than “U2L”, so I’m going to use that going forward…

Now let’s be clear on a few things.  Only Oracle can write the software to migrate an existing Oracle database from one operating system to another, from one “endian” chipset to another.  So Delphix has not, and could not, write the basic processes to perform conversion, because Oracle datafile formats are themselves proprietary.  But what Delphix has done is take the processes for conversion created by Oracle and actually made them feasible.

Here’s how…

The capability of cross-platform migration was introduced to Oracle RMAN in Oracle10g database release 1 (i.e. 10.1).  This was the CONVERT DATABASE functionality, and it was only capable of converting same-endian chipsets.

The terms “big endian” and “little endian” were invented in Jonathan Swift’s satirical novel “Gulliver’s Travels”, first published in 1726.

The term refers to a civil war in the novel between the people of Lilliput, some of whom ate their soft-boiled eggs by cracking the blunt or “big” end of an egg, others who ate their soft-boiled eggs by cracking the pointy or “little” end of an egg.  This wickedly sharp satire accurately portrays controversy and conflicts over the choices of ordering of bytes within words by processor manufacturers in the 1960s and 1970s, which have subsequently been propagated forward to this day.

-Jonathan Swift “Gulliver’s Travels”, 1726

All three of the proprietary UNIX variants (i.e. Solaris on SPARC, AIX on Power, and HP-UX on both PA-RISC and Itanium) are big-endian.  All x86 and x86_64 chips are little-endian, so XPP involves converting numerics from big-endian to little-endian.

In Oracle10 release 2 (i.e. 10.2), Oracle added the CONVERT TABLESPACE and CONVERT DATAFILE functionality which permit conversion between endian platforms (i.e. big to little or little to big).

So, at this point in time, XPP is functionality that has existed in Oracle for about 10 years now, or as long as Oracle10gR2 has been available.

But XPP hasn’t been a frequently-used feature over that time, as you are no doubt aware.  I know lots of people have known about it, but very few have actually done it.  And even fewer have done it outside of playgrounds, out in real-life databases.

There are two reasons for this: 1) the trend toward commoditization of x86/x86_64 is only now accelerating with the maturation of the Cloud and 2) there are substantial obstacles.

The most substantial obstacle is the fact that the functionality involving endian conversion (i.e. CONVERT DATAFILE/TABLESPACE) also includes Oracle’s transportable tablespace (a.k.a. TTS) functionality.  TTS requires that the source database be in read-only state during any tablespace transport operations.

Now, if you’re trying to convert a Linux copy of a UNIX production database, you don’t want to have to interrupt service in the production database by making it read-only.  In the IT biz, we call that down time and it is bad.  Instead, what you’re going to have to do is create a full copy of the UNIX production database, and then make that copy read-only.

This sounds innocuous enough, but let’s think that through?  If our production database on UNIX is several terabytes or larger, and we wish to simply test the conversion to Linux using CONVERT DATAFILE / TABLESPACE, then we need enough space for two additional copies of the production database:  one for the source UNIX copy (to be set to read-only) and one for the target converted Linux copy.  To paraphrase an old saying, “A couple terabytes here and a couple terabytes there, and pretty soon you’re talking real storage“.  And of course, it is not just about allocating twice the volume of the production storage for this test, there is also the time involved in copying two complete sets of the production database.  Needless to say, it is not easy to enter into this exercise lightly.  Nor repeatedly.  And so, this 10 year old functionality remains generally a big mystery.

Enter Delphix data virtualization.

Delphix can create a virtual database (VDB) for the read-only UNIX copy of the production database, requiring only a few minutes to completely provision regardless of the size of the database.  Of course, practically no additional storage is required for this virtual database which is set to read-only almost immediately and stays read-only throughout its brief lifetime.

Even better, Delphix can also create a VDB for the converted Linux copy of the database, and because the Delphix file-system is able to transparently deduplicate (or mooch) blocks that contain the same contents but in different endian format, even the converted Linux copy of the database consumes very little storage as a result of the conversion.

To use numbers from actual experience, converting a 5 TB production UNIX database without Delphix requires the allocation and copying of 10 TB of additional storage.  Converting the same 5 TB production UNIX database using Delphix requires less than 12 GB of additional storage.

Please note the change from “TB” to “GB” in that last sentence? That is an order of magnitude difference.

Admittedly, while the storage savings are consistent, the amount of time needed to perform XPP potentially has one or both of two long-running operations, which is one fewer long-running operation than when not using Delphix.  Those long-running operations are…

  1. Copying the production UNIX database to create the non-production UNIX database copy
  2. Copying database metadata for TTS operations from the non-production UNIX database copy to the converted Linux database
  3. Using the RMAN CONVERT DATAFILE command to convert all datafile blocks from UNIX to Linux

Item #1 is a long-running and expensive step only without Delphix XPP; with XPP, it is fast and inexpensive.

Item #2 is only an issue for certain databases supporting enormous applications like Peoplesoft which contain millions of objects like tables and indexes.  For most applications, there are only several thousand objects, so the single-threaded export/import of database metadata is not a concern

Item #3 might be an issue if the database is very large, because the RMAN CONVERT DATAFILE operation is converting every single block in the database.

Both items #2 and #3 could issues regardless of whether Delphix XPP is used or not.

So please understand from the beginning that XPP might be a long-running process because of the one or two long-running steps.  At least, with Delphix XPP there are only the one or two potentially long-running steps, because without Delphix XPP there is always one more long-running step.

Also, please understand that without the orchestration provided by Delphix, there are about a dozen non-trivial smaller steps to be performed manually in addition to the 2-3 long-running steps mentioned above.  For example, after the RMAN CONVERT DATAFILE processes are complete, there are several small post-processing tasks, such as recompiling all PL/SQL objects in the database, and so on.  Doing these steps manually as documented by Oracle is prone to human error, and the additional time and effort of automating these steps is strongly recommended without Delphix.

From a cookbook standpoint, there are two major phases of a Delphix XPP operation:  1) validation and 2) conversion, as documented online here.

The validation step, or validating an Oracle dSource for XPP, is documented online here.  It automates the creation of a UNIX database copy to be set to read-only so that validation procedures like the CHECK_DB and CHECK_EXTERNAL procedures in the DBMS_TDB package and the TRANSPORT_SET_CHECK procedure in the DBMS_TTS package can be executed automatically.  These procedures will fail if the database is not in read-only mode, and will verify whether the database contains any of the many restrictions listed in Oracle documentation for conversions online here and for transporting tablespaces online here.  Validation with Delphix XPP is fast and easy, so it can be performed over and over, to ensure that any restrictions encountered have been addressed and eliminated, one way or the other.  Delphix XPP also allows the upload of a SQL script to fix issues identified during validation, if it is not feasible to fix the problem in the source production database itself.  The creation and upload of scripts to fix identified XPP issues is documented online here.

The conversion step, or provisioning a Linux VDB from a UNIX dSource, is fully automated and is documented online here.  Conversion is possible only after validation is successfully completed.  If a SQL script was uploaded to address restrictions encountered during validation, then that script should continue to be used during conversion.  During conversion, Delphix automates the many small and large steps, including the provisioning of a read-only UNIX database copy, the export/import of database metadata, and the RMAN CONVERT DATAFILE processes.

So you can see that a lot of blood, sweat, and tears have been put into this feature.

XPP is another example of the enabling power of data virtualization.  Without it, Oracle cross-platform migration is a complicated, expensive, and slow process with many manual steps.  With Delphix data virtualization, XPP is simple and inexpensive.  It is still not a fast process due to the long-running steps described earlier, but even so it may be fast or easy enough to become feasible as your organization moves toward migration into the Cloud.

connor_mc_d's picture

Do you work in I.T?

If you do, then you know that only 20% of your I.T work is spent at the office. And no, I’m not talking about telecommuting or working remotely here. I’m talking about the 80% of our time we spend handling all those I.T calls from friends and family Smile

So I was motivated to put together this little ode – dedicated to all of us that have to go the extra mile to keep our loved one’s I.T devices working as they should.

Enjoy and share with all those I.T people in your life Smile

Uwe Hesse's picture

Getting started with #Exasol on #AWS

It’s amazingly easy to run an Exasol Cluster on Amazon Web Services (AWS).

Subscribe Exasol in AWS marketplace

After having registered and having logged in to your AWS account, go to the AWS marketplace and search for Exasol:

https://uhesse.files.wordpress.com/2019/11/aws01.png?w=1240&h=530 1240w, https://uhesse.files.wordpress.com/2019/11/aws01.png?w=150&h=64 150w, https://uhesse.files.wordpress.com/2019/11/aws01.png?w=300&h=128 300w, https://uhesse.files.wordpress.com/2019/11/aws01.png?w=768&h=328 768w, https://uhesse.files.wordpress.com/2019/11/aws01.png?w=1024&h=437 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Click on the Exasol Single Node and Cluster BYOL link and then on Continue to Subscribe:

https://uhesse.files.wordpress.com/2019/11/aws02.png?w=1240&h=282 1240w, https://uhesse.files.wordpress.com/2019/11/aws02.png?w=150&h=34 150w, https://uhesse.files.wordpress.com/2019/11/aws02.png?w=300&h=68 300w, https://uhesse.files.wordpress.com/2019/11/aws02.png?w=768&h=174 768w, https://uhesse.files.wordpress.com/2019/11/aws02.png?w=1024&h=232 1024w" sizes="(max-width: 620px) 100vw, 620px" />

After having reviewed the T&C, click on Accept Terms. It shows this message afterwards:

https://uhesse.files.wordpress.com/2019/11/aws03.png?w=1240&h=540 1240w, https://uhesse.files.wordpress.com/2019/11/aws03.png?w=150&h=65 150w, https://uhesse.files.wordpress.com/2019/11/aws03.png?w=300&h=131 300w, https://uhesse.files.wordpress.com/2019/11/aws03.png?w=768&h=335 768w, https://uhesse.files.wordpress.com/2019/11/aws03.png?w=1024&h=446 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Create Key Pair

Now login to the AWS Management Console, select a region close to your location and open the EC2 Dashboard. Click on Key Pairs:

https://uhesse.files.wordpress.com/2019/11/aws04.png?w=1238&h=482 1238w, https://uhesse.files.wordpress.com/2019/11/aws04.png?w=150&h=58 150w, https://uhesse.files.wordpress.com/2019/11/aws04.png?w=300&h=117 300w, https://uhesse.files.wordpress.com/2019/11/aws04.png?w=768&h=299 768w, https://uhesse.files.wordpress.com/2019/11/aws04.png?w=1024&h=399 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Click on Create Key Pair now and enter a name for the new Key Pair, then click on Create:

https://uhesse.files.wordpress.com/2019/11/aws05.png?w=1240&h=494 1240w, https://uhesse.files.wordpress.com/2019/11/aws05.png?w=150&h=60 150w, https://uhesse.files.wordpress.com/2019/11/aws05.png?w=300&h=120 300w, https://uhesse.files.wordpress.com/2019/11/aws05.png?w=768&h=306 768w, https://uhesse.files.wordpress.com/2019/11/aws05.png?w=1024&h=408 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Now you are ready to use the Exasol Cloud Deployment Wizard. Stay logged in with AWS Management Console as you will be routed back there by the Deployment Wizard soon.

Using the Cloud Deployment Wizard

Put this URL into your browser: https://cloudtools.exasol.com/ and click on AWS then:

https://uhesse.files.wordpress.com/2019/11/aws06.png?w=1238&h=562 1238w, https://uhesse.files.wordpress.com/2019/11/aws06.png?w=150&h=68 150w, https://uhesse.files.wordpress.com/2019/11/aws06.png?w=300&h=136 300w, https://uhesse.files.wordpress.com/2019/11/aws06.png?w=768&h=349 768w, https://uhesse.files.wordpress.com/2019/11/aws06.png?w=1024&h=465 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Select a region close to your location and click on Continue:

https://uhesse.files.wordpress.com/2019/11/aws07.png?w=1240&h=622 1240w, https://uhesse.files.wordpress.com/2019/11/aws07.png?w=150&h=75 150w, https://uhesse.files.wordpress.com/2019/11/aws07.png?w=300&h=151 300w, https://uhesse.files.wordpress.com/2019/11/aws07.png?w=768&h=385 768w, https://uhesse.files.wordpress.com/2019/11/aws07.png?w=1024&h=514 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Click on Advanced Configuration and specify

https://uhesse.files.wordpress.com/2019/11/aws08.png?w=1240&h=782 1240w, https://uhesse.files.wordpress.com/2019/11/aws08.png?w=150&h=95 150w, https://uhesse.files.wordpress.com/2019/11/aws08.png?w=300&h=189 300w, https://uhesse.files.wordpress.com/2019/11/aws08.png?w=768&h=484 768w, https://uhesse.files.wordpress.com/2019/11/aws08.png?w=1024&h=645 1024w" sizes="(max-width: 620px) 100vw, 620px" />

License Model Bring-your-own-license, System Type Enterprise Cluster, Instance Family Memory Optimized, Instance Type r5, Instance Model r5 large, Number of DB Node 1 then click Continue.

BYOL works without license file with a limit of 20 GB memory for the database. Means no costs are charged by Exasol (But by Amazon) for this environment.

Select create new VPC and click on Launch Stack on this page now:

https://uhesse.files.wordpress.com/2019/11/aws09.png?w=150&h=118 150w, https://uhesse.files.wordpress.com/2019/11/aws09.png?w=300&h=237 300w, https://uhesse.files.wordpress.com/2019/11/aws09.png?w=768&h=606 768w, https://uhesse.files.wordpress.com/2019/11/aws09.png 942w" sizes="(max-width: 620px) 100vw, 620px" />

Using CloudFormation

This takes you to the Quick create stack page of CloudFormation in AWS Management Console:

https://uhesse.files.wordpress.com/2019/11/aws10.png?w=1238&h=612 1238w, https://uhesse.files.wordpress.com/2019/11/aws10.png?w=150&h=74 150w, https://uhesse.files.wordpress.com/2019/11/aws10.png?w=300&h=148 300w, https://uhesse.files.wordpress.com/2019/11/aws10.png?w=768&h=380 768w, https://uhesse.files.wordpress.com/2019/11/aws10.png?w=1024&h=506 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Enter these details on the page:

Stack name

Key Pair (select the key pair created previously)

SYS User Password

ADMIN User Password

Public IPs (true)

Tick the acknowledge box and click on Create stack

Now go to the EC2 details page and copy the Public IP of the management node:

https://uhesse.files.wordpress.com/2019/11/aws11.png?w=1240&h=630 1240w, https://uhesse.files.wordpress.com/2019/11/aws11.png?w=150&h=76 150w, https://uhesse.files.wordpress.com/2019/11/aws11.png?w=300&h=153 300w, https://uhesse.files.wordpress.com/2019/11/aws11.png?w=768&h=391 768w, https://uhesse.files.wordpress.com/2019/11/aws11.png?w=1024&h=521 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Put that with an https:// prefix into a browser and click on Advanced:

https://uhesse.files.wordpress.com/2019/11/aws12.png?w=1240&h=752 1240w, https://uhesse.files.wordpress.com/2019/11/aws12.png?w=150&h=91 150w, https://uhesse.files.wordpress.com/2019/11/aws12.png?w=300&h=182 300w, https://uhesse.files.wordpress.com/2019/11/aws12.png?w=768&h=466 768w, https://uhesse.files.wordpress.com/2019/11/aws12.png?w=1024&h=621 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Then you should see a progress bar like this:

https://uhesse.files.wordpress.com/2019/11/aws13.png?w=1238&h=646 1238w, https://uhesse.files.wordpress.com/2019/11/aws13.png?w=150&h=78 150w, https://uhesse.files.wordpress.com/2019/11/aws13.png?w=300&h=157 300w, https://uhesse.files.wordpress.com/2019/11/aws13.png?w=768&h=401 768w, https://uhesse.files.wordpress.com/2019/11/aws13.png?w=1024&h=534 1024w" sizes="(max-width: 620px) 100vw, 620px" />

That screen changes after about 30 Minutes to the EXAoperation login screen.

Login as user admin with the password, you specified previously on the CloudFormation Quick create stack page. There should be a database running:

https://uhesse.files.wordpress.com/2019/11/aws14.png?w=1240&h=374 1240w, https://uhesse.files.wordpress.com/2019/11/aws14.png?w=150&h=45 150w, https://uhesse.files.wordpress.com/2019/11/aws14.png?w=300&h=90 300w, https://uhesse.files.wordpress.com/2019/11/aws14.png?w=768&h=232 768w, https://uhesse.files.wordpress.com/2019/11/aws14.png?w=1024&h=309 1024w" sizes="(max-width: 620px) 100vw, 620px" />

As you can see now, you have a database, a remote archive volume using an Amazon S3 bucket ready for backup & restore and a log service to monitor your system.

This database is limited to 20 GB memory only unless a license file is uploaded to the license server aka management node. For educational purposes, I don’t need more.

Use Elastic IPs

The public IPs of your data nodes will change upon every restart, which is probably not convenient.
Therefore, click on Elastic IPs in the EC2 dashboard, then click on Allocate new address:

https://uhesse.files.wordpress.com/2019/11/aws15.png?w=1240&h=382 1240w, https://uhesse.files.wordpress.com/2019/11/aws15.png?w=150&h=46 150w, https://uhesse.files.wordpress.com/2019/11/aws15.png?w=300&h=92 300w, https://uhesse.files.wordpress.com/2019/11/aws15.png?w=768&h=236 768w, https://uhesse.files.wordpress.com/2019/11/aws15.png?w=1024&h=315 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Select Amazon pool then click on Allocate:

https://uhesse.files.wordpress.com/2019/11/aws16.png?w=1240&h=428 1240w, https://uhesse.files.wordpress.com/2019/11/aws16.png?w=150&h=52 150w, https://uhesse.files.wordpress.com/2019/11/aws16.png?w=300&h=104 300w, https://uhesse.files.wordpress.com/2019/11/aws16.png?w=768&h=265 768w, https://uhesse.files.wordpress.com/2019/11/aws16.png?w=1024&h=354 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Click on the IP on the following screen:

https://uhesse.files.wordpress.com/2019/11/aws17.png?w=150&h=44 150w, https://uhesse.files.wordpress.com/2019/11/aws17.png?w=300&h=87 300w, https://uhesse.files.wordpress.com/2019/11/aws17.png?w=768&h=223 768w, https://uhesse.files.wordpress.com/2019/11/aws17.png?w=1024&h=297 1024w, https://uhesse.files.wordpress.com/2019/11/aws17.png 1026w" sizes="(max-width: 620px) 100vw, 620px" />

Select the action Associate address on the next screen:

https://uhesse.files.wordpress.com/2019/11/aws18.png?w=150&h=54 150w, https://uhesse.files.wordpress.com/2019/11/aws18.png?w=300&h=108 300w, https://uhesse.files.wordpress.com/2019/11/aws18.png?w=768&h=276 768w, https://uhesse.files.wordpress.com/2019/11/aws18.png 777w" sizes="(max-width: 620px) 100vw, 620px" />

Select the data node from the Select instance pull down menu and click on Associate:

https://uhesse.files.wordpress.com/2019/11/aws19.png?w=150&h=69 150w, https://uhesse.files.wordpress.com/2019/11/aws19.png?w=300&h=138 300w, https://uhesse.files.wordpress.com/2019/11/aws19.png?w=768&h=353 768w, https://uhesse.files.wordpress.com/2019/11/aws19.png?w=1024&h=471 1024w, https://uhesse.files.wordpress.com/2019/11/aws19.png 1137w" sizes="(max-width: 620px) 100vw, 620px" />

Close the next screen and go to the EC2 instance page. You should see the elastic IP assigned to the data node there:

https://uhesse.files.wordpress.com/2019/11/aws20.png?w=1240&h=534 1240w, https://uhesse.files.wordpress.com/2019/11/aws20.png?w=150&h=65 150w, https://uhesse.files.wordpress.com/2019/11/aws20.png?w=300&h=129 300w, https://uhesse.files.wordpress.com/2019/11/aws20.png?w=768&h=331 768w, https://uhesse.files.wordpress.com/2019/11/aws20.png?w=1024&h=441 1024w" sizes="(max-width: 620px) 100vw, 620px" />

Connect with a SQL Client to your Exasol database on AWS

This is how that looks with DbVisualizer:

https://uhesse.files.wordpress.com/2019/11/aws21.png?w=150 150w, https://uhesse.files.wordpress.com/2019/11/aws21.png?w=300 300w" sizes="(max-width: 510px) 100vw, 510px" />

And that’s it: Now you have an Exasol 1+0 cluster running on AWS. That’s not the same as a single node system, because this 1+0 cluster can be enlarged with more data nodes. I will show how to do that in future posts.

A word about costs: Instead of using our corporate AWS account, I registered myself to see how much that will take. It was less than 80 Euro with a 2+1 cluster environment I used for about one month, shutting down the EC2 instances whenever I didn’t need them for testing and for creating courseware. It should be well below 10 Euro per day with the very moderate resource consumption configured for the environment subject to my postings.

Stay tuned for some more to come about Exasol on AWS </p />
    <div class=»

oraclebase's picture

WordPress 5.3 “Kirk”

WordPress 5.3 “Kirk” has been released.

I guess many of the people out there are running on hosted solutions, so your upgrade will happen when your hosting company decides to apply it. For those that self-host, you’re just a button press away.

This morning I’ve updated 5 separate blogs, all hosted independently, and they all went through fine. At the moment the upgrades have to be manually initiated, but I’m guessing in a few days they’ll just run through automatically.

Since it was introduced, the Site Health feature (Tools > Site Health) has been improved with each release. It’s worth running this to check if there are any recommended security or performance settings. The new release picked up some things the previous release didn’t. My website was bounced a couple of times while I made changes to sort them.

If I’m honest, I can barely notice a difference. I’m not a “power user” of WordPress. I just write blog posts with it. There are allegedly improvements to the block editing, but I really can’t tell. A couple of the popup controls look a little different, but from the way I use it, that’s about all. I’m sure people with different usage patterns will have a different reaction to this release.

I was interested to see what the Twenty Twenty theme would look like. Not my cup of tea! </p />
    <div class=»

mwidlake's picture

UKOUG TechFest19 Survival Guide

Brighton, December 1st-4th 2019, Grand Hotel, Techfest2019. This is the big Technology event for the UKOUG this year, one of the largest Oracle Tech events in Europe.

The UKOUG conference is traditionally the one to finish the European year of conferencing and it is always incredibly popular with both delegates and presenters. There are two things that are not traditional about this year’s UKOUG December conference:

  • It is Technology Focused. We asked our members when they wanted their annual conference and there was a strong split between Business Applications and Technology members, with many Business Apps members finding December a bad time to be out of the office and most of them preferring May/June, so we split the conference and the big Business Apps conference will be in June 2020. However, our Tech members wanted to stick to December.
  • The conference is in the South of England. Birmingham was our conference home for many years and we have been to Manchester & Liverpool, so time to try the South.

I’m really please we are in Brighton as it is a lively, fascinating place. Also, being that little bit further south, it might be less cold. Slightly!

Why Come?

Because there will be fantastic talks, round tables, Q&A sessions, experts to quiz, people with the the same technical challenges as you, Partners able to offer services and, last but not least, it will be fun!

Technical Content

The UKOUG conferences are very popular with presenters. On average we get 5 submissions per presenting slot, more for some streams. We could fill the conference with talks from Oracle ACEs, Oracle Certified Masters, and the best Oracle Corp offer. What we actually do is have stream-specific teams that select not just known speakers but also topics we know are hot, new presenters, avoid repeating content. It’s damned hard work but we aim to give you:

  • Independent experts who will tell you exactly how it is, like Richard Foote on indexes (all the way from Auz, so a rare chance to see him), Frank Pachot from CERN, Security guru Pete Finnigan, Abigail Giles-Haigh, Craig Shallahamer, Jonathan Lewis, Zahid Anwar, Loneke Dikmans…
  • Oracle giving you the latest information “from the horses mouth” and, just as important, the chance to meet product managers and other experts. People like Maria Colgan, Mike Deitrich, Jeff Smith, Nigel Bayliss, Susan Duncan
  • 9 or more concurrent streams across Development, Analytics & Data Science, Database, Systems & Infrastrructure, and APEX. No matter what your interest in the Oracle Tech world we hope your problem will not be “is there a session of interest” but “which session of interest do I go to now?”
  • Roundtable discussions, panels, keynotes, presentations – and the chance to meet the experts around the conference and at the socials


Learning stuff at conference is the name of the game, but so is having some fun. The more enjoyable the conference and the social times after are, the more you you will get out of the content. I know from personal experience that if a conference is just information and being serious, after a few hours my brain shuts off.

Also, it’s when you are more relaxed that the magic thing about attending an event in person happens – you meet people and get to know them better. This opens doors to industry experts, you find people dealing with the same PIA technical issues as you, you exchange war stories. You make friends. I get just as much (if not more) from the people I meet at conference than the official presentations.

Monday evening there will be networking drinks, Tuesday will be the big party (and I’ve been promised No Loud Music!!!). If you are a UKOUG volunteer or speaker, there is a drinks reception Sunday night. (I know of a couple of other events being put on by other companies too, such as Rittman Mead).

We will be having the retro games consoles scattered around the venue again.

And, we are in Brighton! Of course as the UKOUG President I would never encourage you to leave the conference hotel… But as a human being I would say go and look around Brighton, have a bit of fun! You might want to do what I am doing and be in Brighton a day or two before the event (or after) and really enjoy what the town has to offer.  Mrs Widlake is coming with me on Saturday so we can have a mini break.

One other fun thing – Mark Rittman is organising a gentle cycle ride Sunday morning. Details can be found {here},it will be a couple of hours via a cafe, prior to Super Sunday starting. I plan to take part.

Now, the practical stuff:

Getting There


Basically, if you can get to London OK, you can get to Brighton just fine. Trains go from Victoria in under an hour, from St Pancras (very convenient if you come to London on Eurostar), London Bridge (both about 90 mins) and, if you live near Cambridge, you can get a direct train through London to Brighton. There is a direct service from Gatwick Airport taking about half an hour.

I’d strongly advise booking *now*. If you come down on Saturday or Sunday, it can cost as little as £15-20 from London, £40 from Birmingham, Bristol or Leeds.

If you don’t often travel by train just be aware that “open” tickets and booking only a few days ahead can be eye-wateringly expensive. Plan ahead, decide when you are travelling, and book ASAP.


The best international airport to fly to for Brighton is Gatwick, as there is a fast (1/2 hour) train service direct to Brighton for as little as £10. A taxi will take 40-50 minutes and cost that many pounds.

Heathrow is also sort-of on the same side of London as Brighton but you will either have to go into London to Victoria by the slow Tube line and then out on the normal train services to Brighton, or take the Heathrow Express (15 mins, about £15 each way) to London Paddington and take the tube Central Line around to Victoria.

If you come in to Stansted, basically get into London (Stansted Express) and work it out from there!

For Luton (and Stansted, sort of) Niall Litchfield says

If you are flying into Luton, don’t go into London and change. Take the shuttle bus to Luton Airport Parkway station (10 minutes) and take the direct train to Brighton. If you are going to Stanstead then you should consider your life choices!



UPDATE – see comments by Niall Litchfield (again, helpful chap), a local who says to not drive in to Brighton as parking is so bad. He is 20 mins away and will take the local train. Best bet if you must is Park and Ride

It’s relatively simple to drive to Brighton. You go around the M25 to the M23 and down that, and keep going when it turns into the A23. I’m not so sure about coming along the more coastal road (A27) – I have bad memories of it taking ages to get anywhere.

But parking can be expensive. If you are not being provided parking by a hotel you are using or you plan to come in and go home each day then you might like to look at https://www.visitbrighton.com/plan-your-visit/travel-information/parking or similar. I’m no expert on parking in Brighton (I last did it 30 years ago) but I’ll ask someone local and update this accordingly. My one hint would be avoid NCP car parks – they are usually very expensive and, as a company, they are terrible. Ask anyone who commutes by train into London or any other major city and they probably hate NCP with a passion.


Don’t be daft, unless you are local, in which case you know more than I do!


Where to Stay

I’m afraid you missed the special deal to stay at the Grand (the location of the conference) but you might still be able to book there. However, at the time of writing (see image), there are many, many hotels available around Brighton and you might want to look at Air B&B for something cheaper.

I personally use Trivago to find accommodation but other websites are available. They should all allow you to what I do which is choose the lowest “comfort” level you want and the price range. I then use the map view as it makes things a lot easier than a list of hotels with no idea where they actually are!

I’m actually staying at the conference venue – as President I have a lot of duties so it makes sense for me to be on-site. I also know that there are a lot of presenters etc staying at the hotel so it should add to the vibe, but sometimes I specifically choose to stay a 5, 10 minute walk from a Conference, so I can get away from it all if I should wish. I find a 10 minutes stroll before a conference wakes me up and doing so after gives my brain a chance to turn off a little.

Coffee, Refreshments etc.

It’s been a problem for years at UKOUG conferences. Getting coffee (or tea or whatever) has been a real challenge as the venues always wanted a fortune to provide catering all day. Catering! Just hot drinks and maybe some biscuits! This year, tea & coffee will be available throughout the conference! I’m not guaranteeing it will be good tea and coffee, I’m not daft, but Brighton has a big coffee culture so I have hopes.

Water should always be available.

If your are a coffee snob (looking at one person in particular here) then, look, we are IN BRIGHTON! Go out the hotel and walk 2 minutes, you will soon find a hipster cafe and can get your double espresso skinny latte with raw cane sugar there. And in fact, yeah, do it! Pop out the venue for 10 mins and go to a local cafe. Or get an ice cream. Or, if you are inclined, a glass of wine and a cake. Cafe culture is all around you.

If you don’t like the provided coffee at the conference, don’t tell me. Tell me about other things that are right or wrong but, honestly, the quality of the coffee is not something I want to hear anything more about. This is the UK and it is an I.T. conference, the coffee is supposed to bad!

You will have been asked when you registered for the event if you have dietary requirements and this should be catered for. Vegetarian options should be provided at all meals as a matter of course. Any issues, as the UKOUG staff and they will sort it out for you.

At the social events there will be soft drinks as well as alcoholic ones. Some people like alcohol, some do not, it really is not that important if you drink or not. BUT if you find there are no soft options then let the UKOUG staff know immediately – we had a problem one year where the caterers only provided beer & wine and no one mentioned it for ages. They just got angry and slagged us off after the event.

There will be no secret whisky tasting this year. There never has been. It’s just a rumour. If whisky is not your thing then feel free to not bring a different thing to share at the non-existing tasting.

Chocolate. I’ve also not heard rumours about a chocolate tasting happening…

Other Hints

Go to at least one talk you know nothing about, that is not your core work area. You will probably learn something unexpectedly useful! You might even get a peak at a shift in your career.

Speak to the famous people. They are human, they are *just like you* (only, of course, much much smarter…). Honestly, just say “hi” or “isn’t it a shame about the Rugby world cup final” or “what bread do you like to bake?” (this is surprisingly likely to get an interested response from a growing number of speakers). Have a little chat. But also, please do not stalk. If you find yourself hanging about after a session to chat to the same person you chatted to three time already, you have become a scary stalker and need to stop.

If you don’t know many people at the conference, go to a panel session or a round table. If you can build up the courage, when you see a circle of half a dozen people chatting and you recognise some of them as “in your area”, go and join in. (And, if you are one of those people in a circle of mates, chatting, keep an eye out for people hanging about nearby looking nervous. I wish we did not stand in these circles, backs to each other, but I can’t think of a good way to break the circle.)

Take breaks. If you do 7 sessions in succession I am willing to bet nothing is going into the brain anymore. If you happen to find yourself talking with people just before a session starts and you are enjoying the conversation, maybe keep it going and have a coffee/water. I really do believe that those contacts you make/develop at conferences and the ad-hoc things you learn as just as valuable as listening to Connor McDonald bang on about SQL in his boring monotone again. He does rubbish slides.



davidkurtz's picture

nVision Bug in PeopleTools 8.55/8.56 Impacts Performance

I have recently come across an interesting bug in nVision that has a significant performance impact on nVision reports in particular and can impact the database as a whole.

Problem nVision SQL

This is an example of the problematic SQL generated by nVision.  The problem is that all of the SQL looks like this. There is never any group by clause, nor any grouping columns in the select clause in from of the SUM().

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT SUM(A.POSTED_BASE_AMT) 

Each query only returns a single row, that only populates a single cell in the report, and therefore a different SQL statement is generated and executed for every cell in the report.  Therefore, more statements are parsed and executed, and more scans of the ledger indexes and look-ups of the ledger table and performed.  This consumes more CPU, more logical I/O.

Normal nVision SQL

This is how I would expect normal nVision SQL to look.  This example, although obfuscated, came from a real customer system.  Note how the query is grouped by TREE_NODE_NUM from two of the tree selector tables, so this one query now populates a block of cells.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT L2.TREE_NODE_NUM,L3.TREE_NODE_NUM,SUM(A.POSTED_TOTAL_AMT) 

The Bug

This Oracle note details an nVision bug:

"UPTO SET2A-C Fixes - Details-only nPlosion not happening for Single Chart-field nPlosion Criteria.
And also encountered a performance issue when enabled details-only nPlosion for most of the row criteria in the same layout
Issue was introduced on build 8.55.19.
Condition: When most of the row filter criteria enabled Details-only nPlosion. This is solved in 8.55.22 & 8.56.07.
UPTO SET3 Fixes - Performance issue due to the SET2A-C fixes has solved but encountered new one. Performance issue when first chart-field is same for most of the row criteria in the same layout.
Issue was introduced on builds 8.55.22 & 8.56.07.
Condition: When most of the filter criteria’s first chart-field is same. The issue is solved in 8.55.25 & 8.56.10."

In summary

  • Bug introduced in PeopleTools 8.55.19, fully resolved in 8.55.25.
  • Bug introduced in PeopleTools 8.56.07, fully resolved in 8.56.10.

davidkurtz's picture

Consumer Group Mapping Rules Use Pattern Matching from 12.1

I recently noticed a small, but I think significant, change in the way consumer group mapping rules behave from Oracle 11.2.04.  Session attributes can be matched to resource groups using LIKE expressions and simple regular expressions specified in the matching rules, though only for certain attributes.
(Updated 12.11.2019) I am grateful to Mikhail Velikikh for his comment.  It depends on which version of Oracle's documentation for 11.2 you read.  Pattern matching does work in for the attributes listed in the 12.1 documentation. My testing indicates that pattern matching does not happen in
You cannot pattern match the SERVICE_NAME in 11.2.  The attribute value is validated against the list of valid services.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">exec DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'WIBBLE', 'MYGROUP1');
ORA-00042: Unknown Service name WIBBLE

However, you can pattern match SERVICE_NAME in 12.1, although SERVICE_NAME is not in the list of attributes for which the documentation says pattern matching is available.  This may be documentation bug (see Oracle support note 1992704.1).

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
attribute IN VARCHAR2,
value IN VARCHAR2,
consumer_group IN VARCHAR2 DEFAULT NULL);

11.2 Documentation (c)2010

You use the SET_CONSUMER_GROUP_MAPPING procedure to map a session attribute/value pair to a consumer group. The parameters for this procedure are the following:

attribute The session attribute type, specified as a package constant
value The value of the attribute
consumer group The consumer group to map to for this attribute/value pair

11.2 Documentation (c)2015

This procedure adds, deletes, or modifies entries that map sessions to consumer groups, based on the session's login and runtime attributes.

attribute Mapping attribute to add or modify. It can be one of the Constants listed.
value Attribute value to match. This includes both absolute mapping and regular expressions.
consumer group Name of the mapped consumer group, or NULL to delete a mapping

Usage Notes 
If no mapping exists for the given attribute and value, a mapping to the given consumer group will be created. If a mapping already exists for the given attribute and value, the mapped consumer group will be updated to the one given. If the consumer_group argument is NULL, then any mapping from the given attribute and value will be deleted.
The subprogram supports simple regex expressions for the value parameter. It implements the same semantics as the SQL 'LIKE' operator. Specifically, it uses '%' as a multi-character wildcard and '_' as a single character wildcard. The '\' character can be used to escape the wildcards. Note that wildcards can only be used if the attribute is one of the following:


*Consumer group mapping comparisons for DBMS_RESOURCE_MANAGER.CLIENT_PROGRAM are performed by stripping the @ sign and following characters from V$SESSION.PROGRAM before comparing it to the CLIENT_PROGRAM value supplied.

Autonomous Database Example

The mappings are reported in the view DBA_RSRC_GROUP_MAPPINGS. An example of using a LIKE expression in a mapping can be found in the standard configuration of autonomous cloud databases. There are 5 standard mappings for 5 services to 5 consumer groups. The automatically created service names are prefixed with the database name, then HIGH, LOW, MEDIUM, TP or TPURGENT., and are suffixed .atp.oraclecloud.com.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT * FROM v$services ORDER BY 1;


However, the mappings are defined using pattern matching rather than referencing the explicit name of the service, thus exactly the same mappings can be defined on every autonomous database.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">select * from DBA_RSRC_GROUP_MAPPINGS order by 1,2;

------------ --------------- ---------- ----------


  • By default, the ORACLE_USER attribute's priority 6 takes precedence over SERVICE_NAME whose priority is 7. Therefore, GGADMIN, the GoldenGate user, always uses the LOW consumer group irrespective of the service that it connects to. The default priorities can be changed.
  • Wildcards cannot be used for the ORACLE_USER attribute.

oraclebase's picture

Video : Oracle REST Data Services (ORDS) : Database Authentication

Today’s video is a run through the Database Authentication functionality in Oracle REST Data Services (ORDS).

As always, this is based on an article on the same subject.

There are better methods of authentication and authorization available from ORDS, which you can read about here.

The star of today’s video is Stew Ashton, who is barely audible over the noise of the bar. </p />
    <div class=»

Jonathan Lewis's picture

Nested Tables

This note is a little side effect of answering a question about the total space associated with a table, including its indexes, LOB columns and nested tables. The first suggestion I published failed to take account of the fact that nested tables can contain their own nested tables so I had to modify the code.

The basic change was easy – in the original code I had joined to the view dba_nested_tables to translate between a table_name and its parent_table_name. The only change I needed to make was to replace the reference to the view with a different view that joined a table_name to its “oldest ancestor”. To achieve this I had to do two things: first, create a table with multiple levels of nesting, then create a suitable view definition.  For reference – because it may help somebody – I’ve published the two stages here.

A revolting nested table:

The following code creates three table types, but the second table type

rem     Script:         nested_tables.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem     Last tested 

create or replace type jpl_item3 as object (v1 varchar2(10), n3 number);

create or replace type jpl_tab3_type as table of jpl_item3;

create or replace type jpl_item2 as object(v2 varchar2(10), jpl3 jpl_tab3_type);

create or replace type jpl_tab2_type as table of jpl_item2;

create or replace type jpl_item1 as object(v3 varchar2(10), jpl2 jpl_tab2_type);

create or replace type jpl_tab1_type as table of jpl_item1;

create table demo_nest_2 (
        id      number  constraint d2_pk primary key,
        jpl1    jpl_tab1_type
segment creation immediate
nested table jpl1 store as t_jpl1
        nested table jpl2  store as t_jpl2
                nested table jpl3 store as t_jpl3 
                return as locator
        return as locator
return as locator

I’ve never seen nested tables manually created in a production system though I believe they are likely to appear (along with varrays and LOBs) as a side effect of some XML or JSON mechanisms, but many years ago (in Practical Oracle 8i, possibly) I discussed the pros and cons of returning them “by value” or (as I have here) “by reference”. As you can see, you need to exercise some care with brackets and locating the text as you try to refine multiple levels of nesting.

Tne Ancestor View

We’ll take this in three steps – first, report from user_nested_tables. (The final script for reporting space used dba_nested_tables with a predicate available on the owner column, but I don’t give myself DBA privileges while writing quick and dirty bits of SQL.).

        parent_table_name, parent_table_column, table_name, 
        ltrim(storage_spec) storage_spec,       -- the DBA view definition includes lpad( , 30) !!
        ltrim(return_type ) return_type         -- the DBA view definition includes lpad( , 20) !!
order by
        parent_table_name, parent_table_column

-------------------- -------------------- -------------------- -------------------- --------------------
DEMO_NEST_2          JPL1                 T_JPL1               USER_SPECIFIED       LOCATOR
T_JPL1               JPL2                 T_JPL2               USER_SPECIFIED       LOCATOR
T_JPL2               JPL3                 T_JPL3               DEFAULT              LOCATOR

You’ll notice the odd ltrim() – I have no idea why the view defines these columns to be left-padded the way they are, possibly it dates back to the time when something like cast(… as vachar2(30)) wasn’t a possible option.

Next a simple “connect by” query what uses the above list in a materialize “with subquery” (CTE):

with my_nested_tables as (
        /*+ materialize */
        parent_table_name, table_name
        parent_table, child_table, pathname
from    (
                sys_connect_by_path(table_name, '/')    pathname,
                connect_by_root parent_table_name parent_table,
                table_name child_table
        connect by
                parent_table_name = prior table_name
order by
        parent_table, child_table, pathname

-------------------- -------------------- ----------------------------------------
DEMO_NEST_2          T_JPL1               /T_JPL1
DEMO_NEST_2          T_JPL2               /T_JPL1/T_JPL2
DEMO_NEST_2          T_JPL3               /T_JPL1/T_JPL2/T_JPL3
T_JPL1               T_JPL2               /T_JPL2
T_JPL1               T_JPL3               /T_JPL2/T_JPL3
T_JPL2               T_JPL3               /T_JPL3

As required this shows me demo_nest_2 as the owning ancestor of t_jpl1, t_jpl2 and t_jpl3. Unfortunately it has also produced three rows that we don’t want to see in our final space-summing code. But it’s easy enough to get rid of those – the only rows we want are the rows with a parent_table that doesn’t appear as a child_table:

with my_nested_tables as (
        /*+ materialize */
        parent_table_name, table_name
select  parent_table, child_table, pathname
from    (
                sys_connect_by_path(table_name, '/')    pathname,
                connect_by_root parent_table_name parent_table,
                table_name child_table
        connect by
                parent_table_name = prior table_name
where   (parent_table) not in (
                select table_name
                from   my_nested_tables
order by
        parent_table, child_table, pathname

-------------------- -------------------- ----------------------------------------
DEMO_NEST_2          T_JPL1               /T_JPL1
DEMO_NEST_2          T_JPL2               /T_JPL1/T_JPL2
DEMO_NEST_2          T_JPL3               /T_JPL1/T_JPL2/T_JPL3

3 rows selected.

And there’s the result we wanted to see. A first simple corroboration of the fact is that the output corresponds with the “NESTED TABLE” segments reported by user_segments.

Of course, having written a query that gets the right result from a table definition we used to help us define the query in the first place we now ought to create a few more tables with different structures to see if the query continues to give the right results in more complex cases.

What happens, for example, if I create a table with two columns of nested tables, and one of the nested table typed also contained two nested tables ? What happens if the base table is an index organized table ?

It’s easy to do the second test – just add “organization index” immediately after “segment creation immediate” in the table creation statement. The correct results drop out.

As for the first test – here’s the SQL to create one example (and the query still gets the right answers – even if you change the table to be index organized).

drop type jpl_tab1_type;
drop type jpl_item1;

drop type jpl_tab2_type;
drop type jpl_item2;

drop type jpl_tab3_type;
drop type jpl_item3;

purge recyclebin;

create or replace type jpl_item3 as object (v1 varchar2(10), n3 number);

create or replace type jpl_tab3_type as table of jpl_item3;

create or replace type jpl_item2 as object(v2 varchar2(10), jpl3 jpl_tab3_type, jpl3x jpl_tab3_type);

create or replace type jpl_tab2_type as table of jpl_item2;

create or replace type jpl_item1 as object(v3 varchar2(10), jpl2 jpl_tab2_type)

create or replace type jpl_tab1_type as table of jpl_item1;

create table demo_nest_3 (
        id      number  constraint d2_pk primary key,
        jpl1    jpl_tab1_type,
        jpl1a   jpl_tab1_type
segment creation immediate
-- organization index
nested table jpl1 store as t_jpl1
        nested table jpl2  store as t_jpl2
                nested table jpl3 store as t_jpl3 
                return as locator
                nested table jpl3x store as t_jpl3x 
                return as locator
        return as locator
return as locator
nested table jpl1a store as t_jpl1a
        nested table jpl2  store as t_jpl2a
                nested table jpl3 store as t_jpl3a
                return as locator
                nested table jpl3x store as t_jpl3xa
                return as locator
        return as locator
return as locator

All that remains now is to modify the code to use the equivalent DBA views, with the addition of the owner column, then slot the resulting code into the original query in place of the simple references to dba_nested_tables. If you go to the original posting you’ll see that I’ve done this by wrapping the code into a CTE so that the only changes to the rest of the code appear as two (flagged) changes where the CTE is then used.

jeremy.schneider's picture

PostgreSQL Invalid Page and Checksum Verification Failed

At the Seattle PostgreSQL User Group meetup this past Tuesday, we got onto the topic of invalid pages in PostgreSQL. It was a fun discussion and it made me realize that it’d be worth writing down a bunch of the stuff we talked about – it might be interesting to a few more people too!

Invalid Page In Block

You see an error message that looks like this:

ERROR: invalid page in block 1226710 of relation base/16750/27244

First and foremost – what does this error mean? I like to think of PostgreSQL as having a fairly strong “boundary” between (1) the database itself and (2) the operating system [and by extension everything else… firmware, disks, network, remote storage, etc]. PostgreSQL maintains page validity primarily on the way in and out of its buffer cache.

https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=150 150w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=300 300w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png?w=768 768w, https://ardentperf.files.wordpress.com/2019/11/pg-page-validity.png 1629w" sizes="(max-width: 1024px) 100vw, 1024px" />

What does this mean in practice? Suppose there’s a physical memory failure and somehow the ECC parity is unable to detect it. This means that a little bit of physical memory on the server now has incorrect garbage and the correct data from that memory is lost.

  • If the garbage bytes map to part of the kernel page cache, then when PostgreSQL tries to copy the page into it’s buffer cache then it will (if possible) detect that something is wrong, refuse to poison its buffer cache with this invalid 8k page, and error out any queries that require this page for processing with the ERROR message above.
  • If the garbage bytes map to part of PostgreSQL’s database buffer cache, then PostgreSQL will quite happily assume nothing is wrong and attempt to process the data on the page. Results are unpredictable; probably all sorts of ERROR messages, crashes and failure modes could result – or maybe even incorrect data returned with no ERROR message at all. (Note that this is probably the case for nearly all software… and also note that ECC is pretty good.)

How PostgreSQL Checks Page Validity

PostgreSQL has two main “validity checks” that it performs on pages. You can read the code in the function PageIsVerified() but I’ll summarize here. You can tell from your error message which validity check failed. It depends on whether you see a second additional WARNING right before the ERROR. The warning would look like this:

WARNING: page verification failed, calculated checksum 3482 but expected 32232
  1. If the above warning is not present, this means the page header failed a basic sanity check. This could conceivably be caused by both problems inside and outside of PostgreSQL.
  2. If you see the above warning (page verification failed), this means the checksum recorded in the block did not match the checksum calculated for the block. This most likely indicates that there was a problem outside of (underneath) the database – operating system, memory, networking, storage, etc.

About Basic Page Header Sanity Checks

As of when I’m writing this article in 2019, the following basic sanity checks are performed on the page header:

  • There are 32 bits reserved for page flag bits; at present only three are used and the other 29 bits should always be zero/off.
  • Every page is divided into four parts (header, free space, tuples, special space).  Offsets for the divisions are stored as 16-bit numbers in the page header; the offsets should go in order and should not have a value pointing off the page.
  • The offset of the special space should always be aligned.

About PostgreSQL Checksums

PostgreSQL version 9.3 (released in 2013) added the ability to calculate a checksum on data pages and store the checksum in the page. There are two inputs to the checksum: (1) every single byte of the data page, with zeros in the four bytes where the checksum will be stored later and (2) the page offset/address. This means that PostgreSQL doesn’t just detect if a byte is changed in the page – it also detects if a perfectly valid page gets somehow dropped into the wrong place.

Checksums are not maintained for blocks while they are in the shared buffers – so if you look at a buffer in the PostgreSQL page cache with pageinspect and you see a checksum value, note that it’s probably just leftover from the last read but wouldn’t have been maintained when the page was changed. The checksum is calculated and stamped onto the page when the page is written out of the buffer cache into the operating system page cache (remember the pages get flushed to disk later by a separate fsync call).

The checksum algorithm is specifically designed to take advantage of SIMD instructions. The slowest part of the algorithm is multiplication, so if possible PostgreSQL will be compiled to perform 32 multiplications at a time in parallel. In 2013 there were two platforms specifically documented to support this: x86 SSE4.1 and ARM NEON. The implementation is specifically tuned for optimal performance on x86 SSE. This is an important point actually – particularly for folks running PostgreSQL on embedded devices – PostgreSQL certainly compiles and works well on a lot of platforms, but evidently the checksum implementation is explicitly optimized to run the fastest on Intel. (To be clear… I think everyone should use checksums on every platform unless you have a really really good reason not to – just don’t be surprised if you start comparing benchmarks with Intel and you see a difference!)

For folks interested in digging a little more into the history… here’s the original commit using a CRC algorithm which never shipped in an actual PostgreSQL release (Simon Riggs, Jeff Davis and Greg Smith), here’s the subsequent commit introducing the FNV-1a algorithm instead of CRC which is what we still use today (Ants Aasma), and here’s the commit doing the major refactor which moved the algorithm into a header file for easier external use (Tom Lane).

More Ways To Check Validity

At the SEAPUG meetup this led to a discussion about checking validity. Do checksums cover everything? (hint: no) Are there more ways we can validate our databases? (hint: yes)

I haven’t yet made a full list of which files are covered by checksums and which ones aren’t, but I know that not everything is. For example: I’m pretty sure that the visiblity map and SLRU files aren’t covered with checksums. But for what it’s worth, there are two extra tools we can use verification in PostgreSQL databases:

  • The amcheck extension can scan a B-Tree index for a number of logical problems – for example, verifying that all B-Tree pages have items in “logical” order. (This could be useful, for example, if you’re not specifying ICU collation and you recently upgraded your operating system collation libraries… since PostgreSQL uses OS collation by default.)
  • The pg_visibility_map extension includes two functions to check for corruption in the visibility map – pg_check_frozen() and pg_check_visible().

Responding To Checksum Failures

Finally, what if you actually run into a checksum failure? What should you do, and are there any additional tools you should know about?

First of all – on any database – there are a few things you should always do immediately when you see something indicating that a data corruption may have occurred:

  • Verify that your backup retention and your log file retention are sufficiently long – I recommend at least a month (this is a Happiness Hint). You never know how long the investigation might take, or how long ago something important might have happened.
  • Take a moment to articulate and write out the impact to the business. (Are important queries failing right now? Is this causing an application outage?) This seems small but it can be very useful in making decisions later. Don’t exaggerate the impact but don’t understate it either. It can also be helpful to note important timelines that you already know about. For example: management is willing to use yesterday’s backup and take a day of data loss to avoid an 12 hour outage, but not to avoid a 4 hour outage …or… management needs a status update at 11:00am Eastern Time.
  • If there’s a larger team working on the system in question, communicate a freeze on changes until things are resolved.
  • Make a list or inventory of all copies of the data. Backups, physical replicas or hot standbys, logical replicas, copies, etc. Sometimes the very process of making this list can immediately give you ideas for simple solutions (like checking if a hot standby has the block intact). The next thing you probably want to do is check all items in this list to see if they have a valid copy of the data in question. Do not take any actions to remediate the problem right away, collect all of the information first. The data you collect now might useful during RCA investigation after you’re back online.
  • If there was one checksum failure, then you probably want to make sure there aren’t more.
    • If it’s a small database, consider whether you can scan the whole thing and verify the checksum on every single block.
    • If it’s a large database, consider whether you can at least scan all of the system/catalog tables and perhaps scan the tables which are throwing errors in their entirety. (PostgreSQL stops on the first error, so there isn’t an easy way to check if other blocks in the same table also have checksum problems.)
  • A few general best practices… (1) have a second person glance at your screen before you execute any actual changes, (2) never delete anything but always rename/move instead, (3) when investigating individual blocks, also look at the block before and the block after to verify they look like what you’d normally expect, (4) test the remediation plan before running it in production, and (5) document everything. If you’ve never seen Paul Vallée’s FIT-ACER checklist then it’s worth reviewing.

There’s no single prescriptive process for diagnosing the scope of the problem and finding the right path forward for you. It involves learning what you need to know about PostgreSQL, a little creative thinking about possible resolutions, and balancing the needs of the business (for example, how long can you be down and how much data loss can you tolerate).

That being said, there are a few tools I know about which can be very useful in the process. (And there are probably more that I don’t know about; please let me know and I can add them to this list!)

Data investigation:

Unix/Linux Commands
You might be surprised at what you can do with the tools already installed on your operating system. I’ve never been on a unix system that didn’t have dd and od installed, and I find that many Linux systems have hexdump and md5sum installed as well. A few examples of how useful these tools are: dd can extract the individual block with invalid data on the primary server and extract the same block on the hot standby, then od/hexdump can be used to create a human-readable textual dump of the binary data. You can even use diff to find the differences between the blocks. If you have a standby cluster with storage-level replication then you could use md5sum to see at a glance if the blocks match. (Quick word of caution on comparing hot standbys: last I checked, PostgreSQL doesn’t seem to maintain the free space identically on hot standbys, so the checksums might differ on perfectly healthy blocks. You can still look at the diff and verify whether free space is the only difference.) Drawbacks: low-level utilities can only do binary dumps but cannot interpret the data. Also, utilities like dd are “sharp knives” – powerful tools which can cause damage if misused!

For a great example of using dd and od, read the code in Bertrand Drouvot‘s pg_toolkit script collection.
Data investigation and checksum verification:

This is a crazy awesome utility and I have no idea why it’s not in core PostgreSQL. It makes an easy-to-read textual dump of the binary contents of PostgreSQL data blocks. You can process a whole file or just specify a range of blocks to dump. It can verify checksums and it can even decode the contents of the tuples. As far as I can tell, it was originally written by Patrick Macdonald at Red Hat some time in the 2000’s and then turned over to the PostgreSQL Global Development Group for stewardship around 2011. It’s now in a dedicated repository at git.postgresql.org and it seems that Tom Lane, Christoph Berg and Teodor Sigaev keep it alive but don’t invest heavily in it. Drawbacks: be aware that it doesn’t address the race condition with a running server (see Credativ pg_checksums below). For dumping only a block with a checksum problem, this is not an issue since the server won’t let the block into its buffer cache anyway.
Checksum verification:

PostgreSQL pg_checksums
PostgreSQL itself starting in version 11 has a command-line utility to scan one relation or everything and verify the checksum on every single block. It’s called pg_verify_checksums in v11 and pg_checksums in v12. Drawbacks: first, this utility requires you to shut down the database before it will run. It will throw an error and refuse to run if the database is up. Second, you can scan a single relation but you can’t say which database it’s in… so if the OID exists in multiple databases, there’s no way to just scan the one you care about.
Checksum verification:

Credativ pg_checksums
The fine engineers of Credativ have published an enhanced version of pg_checksums which can verify checksums on a running database. It looks to me like the main case they needed to protect against was the race condition between pg_checksum reading blocks while the running PostgreSQL server was writing those same blocks. Linux of course work on a 4k page size; so if an 8k database page is half written when pg_checksum reads it then we will get a false positive. The version from credativ however is smart enough to deal with this. Drawbacks: check the github issues; there are a couple notable drawbacks but this project was only announced last week and all the drawbacks might be addressed by the time you read this article. Also, being based on the utility in PostgreSQL, the same limitation about scanning a single relation applies.

Note that both Credativ’s and PostgreSQL’s pg_checksums utilities access the control file, even when just verifying checksums. As a result, you need to make sure you compile against the same version of PostgreSQL code as the target database you’re scanning.
Checksum verification:

Satoshi Nagayasu postgres-toolkit
I’m not sure if this is still being maintained, but Satoshi Nagayasu wrote postgres-toolkit quite a long time ago which includes a checksum verification utility. It’s the oldest one I have seen so far – and it still compiles and works! (Though if you want to compile it on PostgreSQL 11 or newer then you need to use the patch in this pull request.) Satoshi’s utility also has the very useful capability of scanning an arbitrary file that you pass in – like pg_filedump but stripped down to just do the checksum verification. It’s clever enough to infer the segment number from the filename and scan the file, even if the file isn’t part of a PostgreSQL installation. This would be useful, for example, if you were on a backup server and wanted to extract a single file from your backup and check if the damaged block has valid checksum in the backup. Drawbacks: be aware that it doesn’t address the race condition with a running server.
Checksum verification:

Google pg_page_verification
Simple program; you pass in a data directory and it will scan every file in the data directory to verify the checksums on all blocks. Published to Github in early 2018. Drawbacks: be aware that it doesn’t address the race condition with a running server. Probably superseded by the built-in PostgreSQL utilities.

PostgreSQL Developer Options
PostgreSQL has hundreds of “parameters” – knobs and button you can use to configure how it runs. There are 294 entries in the pg_settings table on version 11. Buried in these parameters are a handful of “Developer Options” providing powerful (and dangerous) tools for mitigating data problems – such as ignore_checksum_failure, zero_damaged_pages and ignore_system_indexes. Read very carefully and exercise great care with these options – when not fully understood, they can have unexpected side effects including unintended data loss. Exercise particular care with the ignore_checksum_failure option – even if you set that in an individual session, the page will be readable to all connections… think of it as poisoning the buffer cache. That being said, sometimes an option like zero_damaged_pages is the fastest way to get back up and running. (Just make sure you’ve saved a copy of that block!) By the way… a trick to trigger a read of one specific block is to SELECT * FROM table WHERE ctid='(blockno,1)'

Unix/Linux Commands
I would discourage the use of dd to mitigate invalid data problems. It’s dangerous even for experienced engineers; simple mistakes can compound the problem. I can’t imagine a situation where this is a better approach than the zero_damaged_pages developer option and a query to read a specific block. That said, I have seen cases where dd was used to zero out a page.

More About Data Investigation

In order to put some of this together, I’ll just do a quick example session. I’m running PostgreSQL 11.5 an on EC2 instance and I used dd to write a few evil bytes into a couple blocks of my database.

First, lets start by just capturing the information from the log files:

$ grep "invalid page" ../log/postgresql.log|sed 's/UTC.*ERROR//'
 2019-10-15 19:53:37 :  invalid page in block 0 of relation base/16385/16493
 2019-10-16 22:26:30 :  invalid page in block 394216 of relation base/16385/16502
 2019-10-16 22:43:24 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:55:33 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:57:58 :  invalid page in block 394216 of relation base/16385/16502
 2019-11-05 23:59:14 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:21 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:22 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-05 23:59:23 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-06 00:01:12 :  invalid page in block 262644 of relation base/16385/16502
 2019-11-06 00:01:16 :  invalid page in block 0 of relation base/16385/16493
 2019-11-06 00:02:05 :  invalid page in block 250 of relation base/16385/16492

With a little command-line karate we can list each distinct block and see the first time we got an error on that block:

$ grep "invalid page" ../log/postgresql.log |
sed 's/UTC.*ERROR//' |
awk '{print $1" "$2" "$11" invalid_8k_block "$8" segment "int($8/131072)" offset "($8%131072)}' |
sort -k3,5 -k1,2 |
uniq -f2
   2019-11-06 00:02:05 base/16385/16492 invalid_8k_block 250 segment 0 offset 250
   2019-10-15 19:53:37 base/16385/16493 invalid_8k_block 0 segment 0 offset 0
   2019-11-05 23:59:14 base/16385/16502 invalid_8k_block 262644 segment 2 offset 500
   2019-10-16 22:26:30 base/16385/16502 invalid_8k_block 394216 segment 3 offset 1000 

So we know that there are at least 4 blocks corrupt. Lets scan the whole data directory using Credativ’s pg_checksum (without shutting down the database) to see if there are any more blocks with bad checksums:

$ pg_checksums -D /var/lib/pgsql/11.5/data |& fold -s
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.3", block 1000: calculated checksum
 2ED4 but block contains 4EDF
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.3", block 1010: calculated checksum
 9ECF but block contains ACBE
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16502.2", block 500: calculated checksum
 5D6 but block contains E459
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16493", block 0: calculated checksum E7E4
 but block contains 78F9
 pg_checksums: error: checksum verification failed in file
 "/var/lib/pgsql/11.5/data/base/16385/16492", block 250: calculated checksum
 44BA but block contains 3ABA
 Checksum operation completed
 Files scanned:  1551
 Blocks scanned: 624158
 Bad checksums:  5
 Data checksum version: 1

Ah-ha… there was one more bad checksum which didn’t show up in the logs! Next lets choose one of the bad blocks and dump the contents using unix command line tools.

$ dd status=none if=base/16385/16492 bs=8192 count=1 skip=250 | od -A d -t x1z -w16
0000000 00 00 00 00 e0 df 6b b0 ba 3a 04 00 0c 01 80 01  >......k..:......<
0000016 00 20 04 20 00 00 00 00 80 9f f2 00 00 9f f2 00  >. . ............<
0000032 80 9e f2 00 00 9e f2 00 80 9d f2 00 00 9d f2 00  >................<
0000048 80 9c f2 00 00 9c f2 00 80 9b f2 00 00 9b f2 00  >................<
0000064 80 9a f2 00 00 9a f2 00 80 99 f2 00 00 99 f2 00  >................<
0000080 80 98 f2 00 00 98 f2 00 80 97 f2 00 00 97 f2 00  >................<

Here we see the page header and the beginning of the line pointers. One thing I think it’s easy to remember is that the first 8 bytes are the page LSN and the next 2 bytes are the page checksum. Notice that the page checksum bytes contain “ba 3a” which matches the error message from the scan above (3ABA). Sometimes it can be useful to know just the very top of the page even if you don’t remember anything else!

This is useful, but lets try the pg_filedump utility next. This utility takes a lot of options. In this example I’m going to ask it to verify the checksum (-k), only scan one block at offset 250 (-R 250 250) and even to decode the tuples (table row data) to a human-readable format (-D int,int,int,charN). There’s another argument (-f) that can even tell pg_filedump to show hexdump/od style raw data inline but I won’t demonstrate that here.

$ pg_filedump -k -R 250 250 -D int,int,int,charN base/16385/16492

* PostgreSQL File/Block Formatted Dump Utility - Version 11.0
* File: base/16385/16492
* Options used: -k -R 250 250 -D int,int,int,charN
* Dump created on: Fri Nov  8 21:48:38 2019

Block  250 ********************************************************
----- Block Offset: 0x001f4000 Offsets: Lower 268 (0x010c) Block: Size 8192 Version 4 Upper 384 (0x0180) LSN: logid 0 recoff 0xb06bdfe0 Special 8192 (0x2000) Items: 61 Free Space: 116 Checksum: 0x3aba Prune XID: 0x00000000 Flags: 0x0004 (ALL_VISIBLE) Length (including item array): 268 Error: checksum failure: calculated 0x44ba. ------ Item 1 -- Length: 121 Offset: 8064 (0x1f80) Flags: NORMAL COPY: 15251 1 0 Item 2 -- Length: 121 Offset: 7936 (0x1f00) Flags: NORMAL COPY: 15252 1 0 Item 3 -- Length: 121 Offset: 7808 (0x1e80) Flags: NORMAL COPY: 15253 1 0

That was the block header and the first few item. (Item pointer data is displayed first, then the table row data itself is displayed on the following line after the word COPY.) Looking down a little bit, we can even see where I wrote the bytes “budstuff” into a random location in this block – it turns out those bytes landed in the middle of a character field. This means that without checksums, PostgreSQL would not have thrown any errors at all but just returned an incorrect string the next time that row was queried!

COPY: 15297	1	0
 Item  48 -- Length:  121  Offset: 2048 (0x0800)  Flags: NORMAL
COPY: 15298	1	0
 Item  49 -- Length:  121  Offset: 1920 (0x0780)  Flags: NORMAL
COPY: 15299	1	0	                                           badstuff
 Item  50 -- Length:  121  Offset: 1792 (0x0700)  Flags: NORMAL
COPY: 15300	1	0

It’s immediately clear how useful this is (and easier to read ). The part where it decodes the table row data into a human readable form is an especially cool trick. Two notes about this.

  • First, the lines are prefixed with the word COPY for a reason – they are actually intended to be formatted so you can grep on the word COPY and then use the “copy” command (or it’s psql cousin) to feed the data directly back into a staging table in the database for cleanup. How cool is that!
  • Second, it can decode only a set of fairly standard data types and relation types.
$ pg_filedump -h

Version 11.0 (for PostgreSQL 11.x)
Copyright (c) 2002-2010 Red Hat, Inc.
Copyright (c) 2011-2018, PostgreSQL Global Development Group

Usage: pg_filedump [-abcdfhikxy] [-R startblock [endblock]] 
          [-D attrlist] [-S blocksize] [-s segsize] [-n segnumber] file

Display formatted contents of a PostgreSQL heap/index/control file
Defaults are: relative addressing, range of the entire file, block
               size as listed on block 0 in the file

The following options are valid for heap and index files:
  -D  Decode tuples using given comma separated list of types
      Supported types:
        bigint bigserial bool char charN date float float4 float8 int
        json macaddr name oid real serial smallint smallserial text
        time timestamp timetz uuid varchar varcharN xid xml
      ~ ignores all attributes left in a tuple

Now you see what happens when I start having fun… a quick update about our SEAPUG meetup this past Tuesday turned into a blog article that’s way too long. :) Hope it’s useful, and as always let me know what I can improve!

To prevent automated spam submissions leave this field empty.