Oakies Blog Aggregator

kevinclosson's picture

Announcing pgio (The SLOB Method for PostgreSQL) Is Released Under Apache 2.0 and Available at GitHub

This is just a quick post to advise readers that I have released pgio (The SLOB Method for PostgreSQL) under Apache 2.0. The bits are available at the following link: https://github.com/therealkevinc/pgio/releases/tag/1.0. The README is quite informative.

My last testing before the release showed “out of the box” data loading into Amazon Aurora with PostgreSQL compatibility at a rate of 1.69 TB/h. I only modified the pgio.conf file to specify the connection string and to set scale to 128 GB per schema:

#000000;" src="https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-setup.png?w=500&h=159" alt="" width="500" height="159" srcset="https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-setup.png?w=500&h=159 500w, https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-setup.png?w... 1000w, https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-setup.png?w... 150w, https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-setup.png?w... 300w, https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-setup.png?w... 768w" sizes="(max-width: 500px) 100vw, 500px" />

After loading data I edited pgio.conf to increase the number of threads per schema to 16 and then easily drove IOPS to the current, advertised IOPS limit of 120,000 for Amazon Aurora with PostgreSQL compatibility.

#000000;" src="https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-runit.png?w=500&h=154" alt="" width="500" height="154" srcset="https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-runit.png?w=500&h=154 500w, https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-runit.png?w... 1000w, https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-runit.png?w... 150w, https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-runit.png?w... 300w, https://kevinclosson.files.wordpress.com/2019/09/pgio-aurora-runit.png?w... 768w" sizes="(max-width: 500px) 100vw, 500px" />

Testing PostgreSQL physical I/O on any platform could not be any easier, repeatable, nor understandable.

Enjoy pgio!

 

 

Uwe Hesse's picture

Why I prefer VirtualBox over Hyper-V on my notebook

https://uhesse.files.wordpress.com/2019/09/vboxvshyperv.png?w=150&h=88 150w, https://uhesse.files.wordpress.com/2019/09/vboxvshyperv.png?w=300&h=176 300w, https://uhesse.files.wordpress.com/2019/09/vboxvshyperv.png?w=768&h=450 768w, https://uhesse.files.wordpress.com/2019/09/vboxvshyperv.png?w=1024&h=600 1024w, https://uhesse.files.wordpress.com/2019/09/vboxvshyperv.png 1029w" sizes="(max-width: 620px) 100vw, 620px" />

For development, demonstrations and testing, I need different database environments: Oracle, Postgres and Exasol in the first place. Having them available as VMs on my notebook is quite convenient. I consider my current corporate notebook an upper middleclass one. It’s a Dell Latitude 7480 with 2 cores, an SSD disk and 16 GB memory running Windows 10. Not too shabby but also not extremely powerful.

After having used VirtualBox for years, recently an opportunity came up to become a bit more familiar with Hyper-V, because one of our customers insisted to use only that for a team training. Yes, I’m a bit biased towards VirtualBox. Why do i prefer it over Hyper-V? Because it’s way faster for what I do with it. Especially, I observed that Hyper-V consumes much more CPU resources for the VMs than VirtualBox does. And that slows down everything of course.

For example, when I do an Exasol cluster node installation, it takes more than 30 Minutes with Hyper-V compared to 5 Minutes with VirtualBox! And the setup is the same for both: On my notebook, I create 4 VMs: 1 license server with 1500 MB memory and 3 data nodes each with 2500 MB memory. Each VM gets 1 virtual core. That’s no problem for VirtualBox but Hyper-V struggles and raises the CPU utilization on my notebook to 100% or close during the whole install.

In general, both Hyper-V and VirtualBox can do the same or very similar things. I’m sure there are use cases where Hyper-V performs well on a notebook too, and probably it’s better suited for dedicated virtualization servers than for a notebook anyway. So don’t get me wrong: I do not say VirtualBox is better than Hyper-V overall.

But if you want to run database sandboxes on your notebook, I strongly recommend to use VirtualBox instead of Hyper-V.

kevinclosson's picture

Announcing SLOB 2.5 for Download at Github.

This is just a quick blog post to announce that SLOB 2.5 is now available for downloading at: https://github.com/therealkevinc/SLOB_distribution/tree/SLOB_2.5.0.

There is an important bug fix in this release that corrects redo logging payload generation when testing with non-zero slob.conf->UPDATE_PCT.  I recommend downloading and using this release accordingly.  The bug is described in the release notes.

A special thanks to Maris Elsins for finding, blogging and reporting the bug.

If you adopt this release there is no need to reload SLOB (via setup.sh). Data loaded with SLOB 2.4 is compatible with SLOB 2.5. Simply deploy the tar archive and bring over your slob.conf and you’re ready to test with SLOB 2.5.

Uwe Hesse's picture

How to add a reserve node to an existing 2+0 #Exasol Cluster

After having installed a 2+0 Cluster in Hyper-V, now let’s expand that to a 2+1 Cluster.

Add the node as a VM first

Add another VM in Hyper-V with the same attributes as the existing two data nodes n11 and n12 and name it n13:

  • 2500 MB memory, not dynamically extending
  • Legacy network adapter, connected to edu-cluster-interconnect
  • Network adapter, connected to edu-public-network
  • Boot order with legacy network adapter first
  • Two hard disk of type VHDX with max. size 100 GB

Give it a static MAC for the Legacy Network Adapter 08:00:27:58:03:21 and a static MAC for normal Network Adapter 08:00:27:71:27:26 and make sure they do not conflict with other existing MAC addresses in your environment.

Copy an existing node in EXAoperation

Go to the nodes branch and click on the link under n11:

https://uhesse.files.wordpress.com/2019/09/addnode1.png?w=150&h=66 150w, https://uhesse.files.wordpress.com/2019/09/addnode1.png?w=300&h=133 300w, https://uhesse.files.wordpress.com/2019/09/addnode1.png?w=768&h=340 768w, https://uhesse.files.wordpress.com/2019/09/addnode1.png 930w" sizes="(max-width: 620px) 100vw, 620px" />

On the nodes detail page, click on Copy:

https://uhesse.files.wordpress.com/2019/09/addnode2.png?w=141&h=150 141w, https://uhesse.files.wordpress.com/2019/09/addnode2.png?w=282&h=300 282w, https://uhesse.files.wordpress.com/2019/09/addnode2.png 677w" sizes="(max-width: 620px) 100vw, 620px" />

Enter 13 as the node numbers and change the MAC addresses as listed above, then click on Copy Node:

https://uhesse.files.wordpress.com/2019/09/addnode3.png?w=150 150w, https://uhesse.files.wordpress.com/2019/09/addnode3.png?w=300 300w" sizes="(max-width: 387px) 100vw, 387px" />

Click on the n0013(13) link and check the disk configuration of the new node. It should look like this:

https://uhesse.files.wordpress.com/2019/09/addnode4.png?w=150&h=23 150w, https://uhesse.files.wordpress.com/2019/09/addnode4.png?w=300&h=46 300w, https://uhesse.files.wordpress.com/2019/09/addnode4.png?w=768&h=118 768w, https://uhesse.files.wordpress.com/2019/09/addnode4.png 967w" sizes="(max-width: 620px) 100vw, 620px" />

Don’t forget to set the install flag

On the nodes branch, tick the checkbox of node n13 and Execute the action Set install flag for it:

https://uhesse.files.wordpress.com/2019/09/addnode5.png?w=150&h=65 150w, https://uhesse.files.wordpress.com/2019/09/addnode5.png?w=300&h=131 300w, https://uhesse.files.wordpress.com/2019/09/addnode5.png?w=768&h=335 768w, https://uhesse.files.wordpress.com/2019/09/addnode5.png 907w" sizes="(max-width: 620px) 100vw, 620px" />

The state changes to Unknown To install. Now power on n13 in Hyper-V.

After a while the logservice should display the new node getting installed:

https://uhesse.files.wordpress.com/2019/09/addnode6.png?w=150&h=46 150w, https://uhesse.files.wordpress.com/2019/09/addnode6.png?w=300&h=92 300w, https://uhesse.files.wordpress.com/2019/09/addnode6.png?w=768&h=235 768w, https://uhesse.files.wordpress.com/2019/09/addnode6.png?w=1024&h=314 1024w, https://uhesse.files.wordpress.com/2019/09/addnode6.png 1135w" sizes="(max-width: 620px) 100vw, 620px" />

This can be time consuming (took me more than 30 Minutes on my notebook ) but in the end, it lists: Boot process finished after x seconds. When you refresh the nodes branch then, the state of the node n13 changes to Running To Install.

Now tick the checkbox of n13 again and Execute the action Set active flag:

https://uhesse.files.wordpress.com/2019/09/addnode7.png?w=150&h=82 150w, https://uhesse.files.wordpress.com/2019/09/addnode7.png?w=300&h=164 300w, https://uhesse.files.wordpress.com/2019/09/addnode7.png 715w" sizes="(max-width: 620px) 100vw, 620px" />

The state changes to Running Active for n13, same as for the other nodes. Notice that you can’t tell from this page if a node is an active node or a reserve node.

Add the disk capacity of the new node to the storage service

On the Storage branch, tick the checkbox for n13 and click on Add Unused Disks:

https://uhesse.files.wordpress.com/2019/09/addnode8.png?w=150&h=55 150w, https://uhesse.files.wordpress.com/2019/09/addnode8.png?w=300&h=109 300w, https://uhesse.files.wordpress.com/2019/09/addnode8.png?w=768&h=280 768w, https://uhesse.files.wordpress.com/2019/09/addnode8.png?w=1024&h=373 1024w, https://uhesse.files.wordpress.com/2019/09/addnode8.png 1035w" sizes="(max-width: 620px) 100vw, 620px" />

Add the new node as reserve node to the running database

On the database detail pages, click Edit and add n13 as a Reserve node, then click Apply:

https://uhesse.files.wordpress.com/2019/09/addnode9.png?w=73 73w, https://uhesse.files.wordpress.com/2019/09/addnode9.png?w=145 145w" sizes="(max-width: 503px) 100vw, 503px" />

The database should look like this now:

https://uhesse.files.wordpress.com/2019/09/addnode10.png?w=150&h=31 150w, https://uhesse.files.wordpress.com/2019/09/addnode10.png?w=300&h=62 300w, https://uhesse.files.wordpress.com/2019/09/addnode10.png?w=768&h=160 768w, https://uhesse.files.wordpress.com/2019/09/addnode10.png 898w" sizes="(max-width: 620px) 100vw, 620px" />

Notice that you didn’t need to shutdown the database to add a reserve node to it.

Now your cluster has been extended to a 2+1 Cluster. Next article will explain how you can enlarge the database and make this a 3+0 Cluster.

Stay tuned </p />
</p></div>
    <div class=»

Franck Pachot's picture

The Oracle Cloud Free Tier

The New “Always Free Service”s announced at OOW19

Every software vendor has also some free offers, to attract users, demonstrate their product, and support advocacy. What is free at Oracle? Today, the target is about the products which help to attract developers. We have the Oracle XE database that can be installed everywhere for free, with some limits on the capacity, but mostly every features. There are the developer tools that ease the use of the database, like SQL Developer. But what about Cloud?

Cloud free trials and promotions

You may have tested the 30-days free trial, and find it not so easy as it is for only one month, with an e-mail address, phone number, and credit card information that you cannot reuse. The limit on credit is not a problem as they are burned slowlier than in paid subscription (good to test many features, not good to evaluate the real price). As an ACE Director, I have access to longer trials. Actually, those trials are just promotions: you subscribe like for a paid account but are not be charged.

view-source:https://myservices.us.oraclecloud.com/mycloud/signup?language=en&sourceType=:ow:o:p:feb:0916FreePageBannerButton&intcmp=:ow:o:p:feb:0916FreePageBannerButton

As far as I know, there are 5 types of promotions.

  • “Free Trial” : $300 / 30 days
  • “Developer” : $500 / 30 days
  • “Student” : $5000 / 365 days
  • “Startup” : $100000 / 365 days
  • “Educator” : $25000 / 365 days

Where did I get this information from? It is just a guess when looking at the trial sign-up form source code which contains a JavaScript “promoMap”

Where did I get to this sign-up page? I just clicked on “Start for free” in the new Oracle Cloud Free Tier that was just announced by Larry Elison at Oracle Open World 2019. And that’s the purpose of this blog post.

The 30-days “Free Trial” is the one available from the Oracle website (the annoying pop-up that you get even when reading Oracle blogs). The “Developer” one can be available for Hands-On Labs. The “Student” is for Oracle University customers, the “Educator” is for the Oracle University instructors. The “Student” is also the one we can get through the ACED program. The “Startup” one has higher limits (like 20 OCPU instead of 6 in the other promotions)

Oracle Cloud Free Tier

Here it is, an extension of the current free trial (300$ on mostly all services, up to 8 instances and 5TB, for 30 days) where, in addition to this free trial, some services are offered free for un unlimited time.

https://www.oracle.com/cloud/free/

Oracle Cloud Free Tier

It is an extension. You still need to create a trial account (with a new e-mail, phone number, credit card) but beyond the trial you will continue to have access to some free service, forever.

What is free?

The unlimited free tier lets you create at most 2 database services and 2 compute services.

2 database services

Those are the common autonomous services: ATP serverless (for OLTP) and ADW (for datawarehouse). They come with many tools for administration, development and reporting: APEX (low code rapid application development), SQL Developer (Web version), Zepplin notebooks (through Oracle Machine Learning),…

Each database service is limited to 1 OCPU and can go up to 20 GB.

2 compute services

Each VM is limited to 1/8th of OCPU and 1 GB RAM.

It includes the Block storage (100GB) for 2 volumes (to be associated to the 2 VMs), Object Storage (10GB), Archive storage (10GB), and one load balancer (10 Mbps).

How free and unlimited?

First, you must sign-up for the 30-days trial, where you have to provide credit card information.

But you will not be billed.

What you do with the 30-days trial can be upgraded later to a paid subscription. Or not, and you still keep the free tier.

You need to provide a unique e-mail, phone, and credit card. You need to access the free service at least every 3 months or it can be removed. Note that nothing prevents you to run production on it (except the limits of course).

More info on the “Always Free Cloud Services” in the Universal Credits document:

Here is what I get after the subscription:

Jonathan Lewis's picture

Updatable Join Views

Here’s a quick “how to”.

If you want to update a column in table A with a column value from table B, then there’s a simple way to check if the required result can be achieved through an updatable join view.

Step 1: write a query that joins table A to table B and reports the rows in table A that you want to update, with the value from table B that should be used to update them, e.g.


select  a.rowid, a.col1, b.col2 
from    
        tableA a,
        tableB b
where
        a.status = 'Needs Update'
and     b.colX   = a.colX
and     b.colY   = a.colY
and     b.colZ   = a.colZ
/

Step 2: If there is a uniqueness constraint (or suitable index) on table B (the table from which you are copying a value) that enforces the restriction that there should be at most one row in B for any combination of the join columns (colX, colY, colZ) then you can take this query, and turn it into an inline-view in an update statement:


update (
        select a.rowid, a.col1, b.col2 
        from    
                tableA a,
                tableB b
        where
                a.status = 'Needs Update'
        and     b.colX   = a.colX
        and     b.colY   = a.colY
        and     b.colZ   = a.colZ
)  v
set     v.col1 = v.col2
/

If there is nothing enforcing the uniqueness of (colX, colY, colZ) this statement will result in Oracle raising error ORA-01779 “cannot modify a column which maps to a non key-preserved table”. This error will appear even if there are currently no actual duplicates in table B that could cause a problem.

Footnote

This example ignores the extra bit of mess that is needed to deal with the case where B rows are supposed to match A rows when the columns in the join predicates can be null; but that just means your original query will probably have to include some predicates like (b.colX = a.colX or (a.colX is null and b.colX is null)) or make use of the sys_op_map_nonnull() function.

 

Jonathan Lewis's picture

Little sleeps

A peripheral question in a recent comment (made in response to me asking whether a loop had been written with a sleep time of 1/100th or 1/1000th of a second) asked “How do you sleep for 1/1000th of a second in pure PL/SQL?”

The answer starts with “How pure is pure ?” Here’s a “pure” PL/SQL solution that “cheats” by calling one of the routines in Oracle’s built-in Java library:


create or replace procedure milli_sleep( i_milliseconds in number) 
as 
        language java
        name 'java.lang.Thread.sleep(long)';
/


create or replace procedure nano_sleep( i_milliseconds in number, i_nanoseconds in number)
as
        language java
        name 'java.lang.Thread.sleep(long, int)';
/

prompt  Milli sleep
prompt  ===========
execute milli_sleep(18)

prompt  Nano sleep
prompt  ==========
execute  nano_sleep(0,999999)


The “nano-second” component of the nano_sleep() procedure is restricted to the ranage 0 – 999999. In both cases the “milli-second” component has to be positive.

Whether your machine is good at handling sleeps of less than 1/100th of a second is another question, of course.

Update – due to popular demand

If you want to find out what else is available in the database you can query view all_java_methods searching by partial name (which is very slow) for something you think might exist, for example:

SQL> select owner, name , method_name from all_java_methods where upper(method_name) like '%MILLI%'

OWNER           NAME                                     METHOD_NAME
--------------- ---------------------------------------- ----------------------------------------
SYS             java/util/concurrent/TimeUnit$4          toMillis
SYS             java/util/concurrent/TimeUnit$5          toMillis
SYS             java/util/concurrent/TimeUnit$6          toMillis
SYS             java/util/concurrent/TimeUnit$7          toMillis
SYS             java/util/concurrent/TimeUnit            toMillis
SYS             java/lang/System                         currentTimeMillis
SYS             javax/swing/ProgressMonitor              setMillisToDecideToPopup
SYS             javax/swing/ProgressMonitor              getMillisToDecideToPopup

There’s a lot more than the few listed above – but I just wanted to pick up currentTimeMillis. If you spot something that looks interesting the easiest next step is probably to do a google search with (for example): Oracle java.lang.system currenttimemillis (alternatively you could just keep a permanent link to Oracle’s manual pages for the Java and serarch them. In my case this link was high on the list of google hits, giving me the following method description:


static long 	currentTimeMillis​() 	Returns the current time in milliseconds.

Conveniently this is easy to embed in pl/sql (be careful with case sensitivity):


create or replace function milli_time return number
as
        language java
        name 'java.lang.System.currentTimeMillis() return long';
/

execute dbms_output.put_line(milli_time)
execute dbms_lock.sleep(1)
execute dbms_output.put_line(milli_time)

----------------

SQL> @ java_procs

Function created.

1568719117725

PL/SQL procedure successfully completed.


PL/SQL procedure successfully completed.

1568719118734

PL/SQL procedure successfully completed.

You now have a PL/SQL function that will return the number of millisecond since 1st January 1970.

Jonathan Lewis's picture

Nologging

Bobby Durrett recently published a note about estimating the volume of non-logged blocks written by an instance with the aim of getting some idea of the extra redo that would be generated if a database were switched to “force logging”.

Since my most recent blog notes have included various extracts and summaries from the symbolic dumps of redo logs it occurred to me that another strategy for generating the same information would be to dump the redo generated by Oracle when it wanted to log some information about non-logged blocks. This may sound like a contradiction, of course, but it’s the difference between data and meta-data: if Oracle wants to write data blocks to disc without logging their contents it needs to write a note into the redo log saying “there is no log of the contents of these blocks”.

In terms of redo op codes this is done through “layer 19”, the set of op codes relating to direct path loads, with op code 19.2 being the specific “invalidate range” one that we are (probably)interested in.

So here’s a little demo of extracting the information we need:

rem
rem     Script:         nologging_writes.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2019
rem
rem     Last tested 
rem             12.2.0.1
rem

column c_scn new_value m_scn_1
select to_char(current_scn,'999999999999999999999999') c_scn from v$database;

create table t1 nologging
as
select  * 
from    all_objects
where   rownum <= 10000
/

column c_scn new_value m_scn_2
select to_char(current_scn,'999999999999999999999999') c_scn from v$database;

alter session set tracefile_identifier = 'TABLE';
alter system dump redo scn min &m_scn_1 scn max &m_scn_2 layer 19;


create index t1_i1
on t1(object_name, owner, object_id)
pctfree 80
nologging
/

column c_scn new_value m_scn_3
select to_char(current_scn,'999999999999999999999999') c_scn from v$database;

alter session set tracefile_identifier = 'INDEX';
alter system dump redo scn min &m_scn_2 scn max &m_scn_3 layer 19;


insert /*+ append */ into t1
select * from t1
/

column c_scn new_value m_scn_4
select to_char(current_scn,'999999999999999999999999') c_scn from v$database;

alter session set tracefile_identifier = 'APPEND';
alter system dump redo scn min &m_scn_3 scn max &m_scn_4 layer 19;


I’ve executed a “create table nologging”, a “create index nologging”, then an “insert /*+ append */” into the nologging table. I’ve captured the current SCN before and after each call, added an individual identifier to the tracefile name for each call, then dumped the redo between each pair of SCNs, restricting the dump to layer 19. (I could have been more restrictive and said “layer 19 opcode 2”, but there is an opcode 19.4 which might also be relevant – though I don’t know when it might appear.)

Here’s the list of trace files I generated, plus a couple extra that appeared around the same time:

 ls -ltr *.trc | tail -6
-rw-r----- 1 oracle oinstall 361355 Sep 12 19:44 orcl12c_ora_23630.trc
-rw-r----- 1 oracle oinstall   5208 Sep 12 19:44 orcl12c_ora_23630_TABLE.trc
-rw-r----- 1 oracle oinstall  27434 Sep 12 19:44 orcl12c_ora_23630_INDEX.trc
-rw-r----- 1 oracle oinstall   2528 Sep 12 19:44 orcl12c_ora_23630_APPEND.trc
-rw-r----- 1 oracle oinstall 162633 Sep 12 19:45 orcl12c_mmon_3042.trc
-rw-r----- 1 oracle oinstall 162478 Sep 12 19:45 orcl12c_gen0_2989.trc


And having identified the trace files we can now extract the block invalidation records (I’ve inserted blank lines to separate the results from the three separate files):

grep OP orcl12c_ora_23630_*.trc

orcl12c_ora_23630_APPEND.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x058001bd BLKS:0x0043 OBJ:125947 SCN:0x00000b860da6e1a5 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_APPEND.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800482 BLKS:0x006e OBJ:125947 SCN:0x00000b860da6e1a5 SEQ:1 OP:19.2 ENC:0 FLG:0x0000

orcl12c_ora_23630_INDEX.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x058000c4 BLKS:0x0004 OBJ:125948 SCN:0x00000b860da6e162 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_INDEX.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x058000c8 BLKS:0x0004 OBJ:125948 SCN:0x00000b860da6e162 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
...
...     70 lines deleted
...
orcl12c_ora_23630_INDEX.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800424 BLKS:0x0004 OBJ:125948 SCN:0x00000b860da6e181 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_INDEX.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800428 BLKS:0x0004 OBJ:125948 SCN:0x00000b860da6e181 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_INDEX.trc:CHANGE #1 MEDIA RECOVERY MARKER CON_ID:3 SCN:0x0000000000000000 SEQ:0 OP:18.3 ENC:0 FLG:0x0000

orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800103 BLKS:0x000d OBJ:125947 SCN:0x00000b860da6e13e SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800111 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e140 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800121 BLKS:0x0007 OBJ:125947 SCN:0x00000b860da6e141 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800268 BLKS:0x0008 OBJ:125947 SCN:0x00000b860da6e142 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800271 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e144 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800081 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e146 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800091 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e148 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x058000a1 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e14a SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x058000b1 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e14c SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800182 BLKS:0x003b OBJ:125947 SCN:0x00000b860da6e14c SEQ:1 OP:19.2 ENC:0 FLG:0x0000

Each line records the number of blocks (BLKS:) allocated and, as you can see, the APPEND trace shows much larger allocations than the TABLE trace (except for the last one) because the tablespace is locally managed with system allocated extents, and the first few invalidation records for the table creation are in the initial 8 block (64KB) extents; by the time we get to the last few blocks of the initial table creation we’ve just allocated the first 128 block (1MB) extent, which is why the last invalidation record for the table can cover so many more blocks than than the first few.

It is interesting to note, though, that the invalidation record for the INDEX trace are all small, typically 4 blocks, sometimes 3, even when we’ve obviously got to a point where we’re allocating from extents of 128 blocks.

I believe that somewhere I have a note explaining that the invalidation records always identified batches of 5 blocks in older versions of Oracle – but that may simply have been a consequence of the way that freelist management used to work (allocating 5 blocks at a time from the segment to the master freelist).

Although we could simply list all the invalidation records and sum the block counts manually we could be a little smarter with our code, summing them with awk, for example.

grep -n "OP:19.2" orcl12c_ora_23630_TABLE.trc |
     sed 's/.*BLKS://' |
     sed 's/ .*$//'  |
     awk '{m = m + strtonum($0) ; printf("%8i %8i \n",strtonum($0),m)}'
 
      13       13 
      15       28 
       7       35 
       8       43 
      15       58 
      15       73 
      15       88 
      15      103 
      15      118 
      59      177 

It’s left as an exercise to the Unix enthusiast to work out how to take the base tracefile name extract all the sets of data, cater for the odd 18.3 records (whose presence I didn’t request), report any lines for 19.x rows other than 19.2 and sum BLKS separately by TABLE, INDEX, and APPEND.

Once you’ve summed the number of blocks across all the invalidation records (and assuming you’re using the standard 8KB block size) the increease in the volume of redo generated if you alter the database to force logging will be (8KB + a little bit) * number of blocks.  The “little bit” will be close to 44 bytes.

If you’ve set your database up to use multiple block sizes you’ll have to aggregate the invalidation recrords by the AFN (absolute file number) entry and check which files use which block size and multiply up accordingly. And if you’re using a pluggable database (as I was) inside a container database you might also want to filter the redo dump by CON_ID.

If you do set the database to force logging and repeat the search for layer 19 in the redo  you’ll find that each individual data block written using a direct path write generates its own redo record, which will have length “data block size + 44” bytes and hold a single change vector of type 19.1 (Direct Loader block redo entry).

Footnote

It’s worth mentioning, that the dump of redo will go back into the archived redo logs in order to cover the entire range requested by the SCN man/max valeus; so it would be perfectly feasible (though possibly a little time and I/O consuming) to run the report across a full 24 hour window.

martin.bach's picture

Dead Connection Detection (DCD) and the Oracle database

Dead Connection Detection is a useful feature of the Oracle database: it allows for the cleanup of “dead” sessions so they don’t linger around consuming memory and other system resources. The idea is simple: if the database detects that a client process is no longer connected to its server process, it cleans up. This can happen in many ways, in most cases this kind of problem is triggered by an end user.

A dead connection shouldn’t be confused with idle connections: an idle connection still maintains the network link between client and server process, except that there is no activity. Idle connections aren’t maintained/controlled via DCD, there are other tools in the database handling such cases.

As a by product, DCD can also help with overly eager firewalls forcibly removing seemingly idle network connections. I found the following posts and the references therein very useful:

With Oracle 12c Oracle changed the way DCD works by no longer relying on its network layer but rather pushing the functionality into the TCP stack on platforms that support it. This change in behaviour is also explained in an Oracle white paper from 2014.

For the first part of this post I set sqlnet.expire_time to 1 as per a MOS note I found, your value is probably different. The parameter is documented in the Net*8 reference, please use it to work out what the best value is for you. As others have pointed out, this parameter has to go into the RDBMS home, more specifically $ORACLE_HOME/network/admin/sqlnet.ora. I am using Oracle 19.4 on Oracle Linux 7.7 for this blog post. I have seen the same behaviour in 12.1.0.2 as well in my lab.

Background

In addition to the proven methods of checking whether TCP_KEEPALIVE is enabled for a given session I wanted to show another one. Using the ss(8) utility it is possible to show socket options. I also tried lsof but on my system I couldn’t get it to print the options:

SQL> select spid from v$process where addr = (select paddr from v$session where username = 'MARTIN');

SPID
------------------------
13656

SQL> exit 

...

[root@server2 ~]# lsof -nP -p 13656 -T f
lsof: unsupported TCP/TPI info selection: f
lsof 4.87

Although the man-page for lsof reads:

       -T [t]   controls the reporting of some TCP/TPI information,  also  reported  by  net‐
                stat(1),  following  the network addresses.  In normal output the information
                appears in parentheses, each item except TCP or TPI state name identified  by
                a keyword, followed by `=', separated from others by a single space:

                     
                     QR=
                     QS=
                     SO=
                     SS=
                     TF=
                     WR=
                     WW=

                Not all values are reported for all UNIX dialects.  Items values (when avail‐
                able) are reported after the item name and '='.

                When the field output mode is in effect  (See  OUTPUT  FOR  OTHER  PROGRAMS.)
                each item appears as a field with a `T' leading character.

                -T with no following key characters disables TCP/TPI information reporting.

                -T with following characters selects the reporting of specific TCP/TPI infor‐
                mation:

                     f    selects reporting of socket options,
                          states and values, and TCP flags and
                          values.

So let’s try something else: ss(8) – another utility to investigate sockets

Revealing socket options

I have used ss(8) in the past when I didn’t have netstat available, which is more and more common now that netstat is deprecated and its designated successor is ss :)

As far as I know you can’t limit ss to show information just for a PID, I use grep to limit the output. The output is in fact very wide, which is why this might not look pretty on the blog depending on whether the renderer decides to wrap output or not.

[root@server2 ~]# ss -nop | egrep 'NetidState|13656'
NetidState Recv-Q Send-Q                                    Local Address:Port                                      Peer Address:Port                                                                                                           
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,4.412ms,0)        
[root@server2 ~]# 

I used the following options:

  • -n for “Do not try to resolve service names”
  • -o for “Show timer information” and finally
  • -p to “Show process using socket”.

The main option here is -o. As per the man page:

       -o, --options
              Show timer information. For tcp protocol, the output format is:

              timer:(,,)

              
                     the name of the timer, there are five kind of timer names:

                     on: means one of these timers: tcp retrans timer, tcp early retrans timer and tail loss probe timer
                     keepalive: tcp keep alive timer
                     timewait: timewait stage timer
                     persist: zero window probe timer
                     unknown: none of the above timers

              
                     how long time the timer will expire

              
                     how many times the retran occurs

With a little shell loop I can show how that timer is decrementing:

[root@server2 ~]# for i in $(seq 1 5); do ss -nop | grep 13656; sleep 1 ; done
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,20sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,19sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,18sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,17sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,16sec,0)          
[root@server2 ~]# 

Summary

Using the ss utility it is possible to check if a keepalive timer is implemented as a means to support DCD with 12.1 and later releases. Invoking ss(8) hasn’t caused any problems on my system, but as with every such tool you need to ensure it’s safe to use before attempting to look at an important system.

To prevent automated spam submissions leave this field empty.