Oakies Blog Aggregator

oraclebase

RHEL7 and Oracle Linux 7 beta

Nearly two weeks ago, Oracle announced the Oracle Linux 7 Beta 1. Being the Linux fanboy I am, I downloaded it straight away from here.

Oracle Linux is a clone of the Red Hat Enterprise Linux (RHEL) distribution. The RHEL7 beta, and therefore the OL7 beta, distro is based on a cut of Fedora 19, although depending on who you ask, it’s possibly more a mix of Fedora 18, 19 and 20… Suffice it to say, there are a lot of changes compared to the RHEL6/OL6 distribution.

As I’ve mentioned several times before, my desktop at home is running Fedora 20, so I’m pretty used to most of the changes, but I’ve not written much about them, apart from the odd blog post. It’s not a high priority for me, since I’m not a sysadmin, but I’ll be updating/rewriting a few of the Linux articles on the site to include the new stuff.

When Surachart Opun mentioned having to look at systemd and firewalld, it seemed like the perfect opportunity to update my firewall and services articles. You can see the new versions here.
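If you’ve not met systemd and firewalld before, the flavour of the changes is easy to demonstrate. A minimal sketch, assuming you want the Oracle listener port open (the port and service names are just examples):

# systemd replaces the old "service" and "chkconfig" commands.
systemctl status firewalld      # was: service iptables status
systemctl enable firewalld      # was: chkconfig iptables on

# firewalld replaces direct manipulation of iptables rules.
firewall-cmd --permanent --add-port=1521/tcp    # open the listener port
firewall-cmd --reload                           # apply the permanent rules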

RHEL7/OL7 is only in beta, and even after the production release I’m sure it will be a long time before Oracle actually certify any products against it, but if you are not a Fedora user, it’s probably worth you having a play around with this stuff.

Cheers

Tim…


Richard Foote

Presenting at ODTUG Kscope14 Conference in Seattle June 22-26 2014

  Just a short note to say I’ll be presenting at the Oracle Development Tools User Group (ODTUG) Kaleidoscope 14 Conference this year in beautiful Seattle, Washington on June 22-26 2014. I had a fantastic time when I attended this conference a few years ago when it was held in Monterey so I’m really looking forward to […]

arupnanda

Restoring Controlfile When AUTOBACKUP Fails

Allow me to present a snapshot of a day in the life of John--the DBA at Acme Bank. On this particular day a database John manages crashed entirely and had to be restored from the backup. He takes regular (backupset) RMAN backups to tape. Since everything--including the controlfile--had been lost, John had to first restore the controlfile and then restore the database. The controlfile is always backed up with the backup database command; John was sure of that. However, restore controlfile from autobackup gave the error:

RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece

Without the controlfile, the recovery was stuck, even though all the valid pieces were there. It was a rather alarming situation. Others would have panicked; but not John. As always, he managed to resolve the situation by completing recovery. Interested to learn how? Read on.

Background

Since the controlfile was also damaged, the first task at hand was to restore it. To restore the controlfile, John needs one very special piece of information: the DBID--the database identifier. This is not available until the database is at least mounted. In the unmounted state--which is where the database is right now--John couldn't just go and get it from the database. Fortunately, he follows a best practice: he records the DBID in a safe place.
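Capturing the DBID while the database is healthy is trivial. A minimal sketch (the output shown assumes John's database, whose DBID appears throughout this post and whose db_unique_name, prodb3, appears in the RMAN configuration later):

SQL> select dbid, name from v$database;

      DBID NAME
---------- ---------
2553024456 PRODB3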

These are the commands John used to restore the controlfile from the backup. They assume the use of Data Domain Boost, the media management layer (MML) plugin for the Data Domain backup appliance; but the approach applies to any MML--NetBackup, TSM, etc.

SQL> startup nomount;

RMAN> run {
2>      allocate channel c1 type sbt_tape PARMS 'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNIT=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3>      send 'set username ddboostadmin password password servername prolin1.proligence.com';
4>      set dbid = 2553024456;
5>      restore controlfile from autobackup;
6>      release channel c1;
7> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API
sent command to channel: c1

executing command: SET DBID

Starting restore at 22-APR-14

channel c1: looking for AUTOBACKUP on day: 20140422
channel c1: looking for AUTOBACKUP on day: 20140421
channel c1: looking for AUTOBACKUP on day: 20140420
channel c1: looking for AUTOBACKUP on day: 20140419
channel c1: looking for AUTOBACKUP on day: 20140418
channel c1: looking for AUTOBACKUP on day: 20140417
channel c1: looking for AUTOBACKUP on day: 20140416
channel c1: no AUTOBACKUP in 7 days found

released channel: c1

RMAN-00571:===========================================================
RMAN-00569:=============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571:===========================================================
RMAN-03002: failure of restore command at 04/22/2014 16:08:25
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece

So RMAN couldn't locate the backup of the controlfile. John knew that by default RMAN searches only the last 7 days for an autobackup. Thinking that perhaps the controlfile had somehow not been backed up in the last seven days, he expanded the search to 20 days using the special parameter maxdays, shown below:

RMAN> run {
2>      allocate channel c1 type sbt_tape PARMS 'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNIT=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3>      send 'set username ddboostadmin password password servername prolin1.proligence.com';
4>      set dbid = 2553024456;
5>      restore controlfile from autobackup maxdays 20;
6>      release channel c1;
7> }

allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API
sent command to channel: c1
executing command: SET DBID

Starting restore at 22-APR-14
channel c1: looking for AUTOBACKUP on day: 20140422
channel c1: looking for AUTOBACKUP on day: 20140421
channel c1: looking for AUTOBACKUP on day: 20140420
channel c1: looking for AUTOBACKUP on day: 20140419
channel c1: looking for AUTOBACKUP on day: 20140418
channel c1: looking for AUTOBACKUP on day: 20140417
channel c1: looking for AUTOBACKUP on day: 20140416
channel c1: looking for AUTOBACKUP on day: 20140415
channel c1: looking for AUTOBACKUP on day: 20140414
channel c1: looking for AUTOBACKUP on day: 20140413
channel c1: looking for AUTOBACKUP on day: 20140412
channel c1: looking for AUTOBACKUP on day: 20140411
channel c1: looking for AUTOBACKUP on day: 20140410
channel c1: looking for AUTOBACKUP on day: 20140409
channel c1: looking for AUTOBACKUP on day: 20140408
channel c1: looking for AUTOBACKUP on day: 20140407
channel c1: looking for AUTOBACKUP on day: 20140406
channel c1: looking for AUTOBACKUP on day: 20140405
channel c1: looking for AUTOBACKUP on day: 20140404
channel c1: looking for AUTOBACKUP on day: 20140403
channel c1: no AUTOBACKUP in 20 days found

released channel: c1

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/22/2014 16:17:56
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece  

No luck; it gave the same error. So--John concluded--it was not an issue with the absence of a controlfile backup. Something else was making the controlfile backup invisible. He knew that the controlfiles are backed up along with the regular backups; but without the database in mounted mode, he couldn't find out the location of those controlfile backups. If this database had been registered in a recovery catalog, he could have got that information from the catalog; unfortunately, being a new database, it was not yet registered. That avenue was closed.

He did, however, follow another best practice--saving the RMAN log files. As a rule, he sends the RMAN output logs to tape along with the backup. He retrieved the most recent backup log and checked it for the names of the backup pieces. Here is an excerpt from the log:

... output truncated ...
channel c8: starting piece 1 at 21-APR-14
channel c8: finished piece 1 at 21-APR-14
piece handle=14p69u7q_1_1 tag=TAG20140421T141608 comment=API Version 2.0,MMS Version 1.1.1.0
channel c8: backup set complete, elapsed time: 00:00:01
channel c5: finished piece 1 at 21-APR-14
piece handle=10p69rhb_1_1 tag=TAG20140421T141608 comment=API Version 2.0,MMS Version 1.1.1.0
channel c5: backup set complete, elapsed time: 00:47:33
channel c6: finished piece 1 at 21-APR-14
... output truncated ...

Looking at the output, John noted the names of the backup pieces created, listed next to "piece handle"--14p69u7q_1_1, 10p69rhb_1_1, etc. He still did not know exactly which one contained the controlfile backup; but it was not difficult to try them one by one. He tried to get the controlfile from the first backup piece, using the following command with a special clause: restore controlfile from 'location'.

RMAN> run {
2>      allocate channel c1 type sbt_tape PARMS 'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNIT=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3>      send 'set username ddboostadmin password password servername prolin1.proligence.com';
4>      set dbid = 2553024456;
5>      restore controlfile from '14p69u7q_1_1';
6>      release channel c1;
7> }

allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API

sent command to channel: c1
executing command: SET DBID
Starting restore at 22-APR-14

channel c1: restoring control file
channel c1: restore complete, elapsed time: 00:01:25
output file name=+prodb3CTL1/prodb3/control01.ctl
output file name=+prodb3DATA1/prodb3/control02.ctl
output file name=+prodb3INDX1/prodb3/control03.ctl

Finished restore at 22-APR-14
released channel: c1  

It worked; the controlfile was restored! If it hadn't worked, John would have tried the other backup pieces one by one until he hit the one with the controlfile backup. After this, John mounted the database.

RMAN> alter database mount;
database mounted

The rest was easy; all he had to do was issue "restore database" and "recover database using backup controlfile". But first, now that the database was mounted, John checked the controlfile autobackup setting:

RMAN> show CONTROLFILE AUTOBACKUP;

RMAN configuration parameters for database with db_unique_name prodb3 are:
CONFIGURE CONTROLFILE AUTOBACKUP OFF; #default

Well, lesson learned. John immediately changed it to ON.

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP on;

new RMAN configuration parameters:

CONFIGURE CONTROLFILE AUTOBACKUP ON;
new RMAN configuration parameters are successfully stored

Someone suggested that he could have tried to restore the controlfile from the TAG instead of the actual backup piece. Had he attempted the restore from the TAG, he would have got a different error:

RMAN> run {
2>      allocate channel c1 type sbt_tape PARMS 'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNIT=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3>      send 'set username ddboostadmin password password servername prolin1.proligence.com';
4>      set dbid = 2553024456;
5>      restore controlfile from tag=TAG20140421T141608;
6>      release channel c1;
7> }

allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API

sent command to channel: c1
 
executing command: SET DBID
 
Starting restore at 22-APR-14
 
released channel: c1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/22/2014 16:10:04
RMAN-06563: control file or SPFILE must be restored using FROM AUTOBACKUP

So that command could not have helped John.

No Backup of Controlfile

Let's consider another scenario: there was no backup of the controlfile at all. Dreaded as it sounds, it's still not the end of the world. John could create the controlfile from a backup taken by other means. Such a backup can be created by two special commands:

SQL> alter database backup controlfile to '/tmp/cont.dbf'
2 /

Database altered.

The above command creates a copy of the controlfile with the data as of the time of the command. The other command is:

SQL> alter database backup controlfile to trace as '/tmp/cbk.trc' reuse;

Database altered.

This command creates a text file containing a SQL statement that you can use (after some minor editing) to create a controlfile. The major difference between the two approaches is that the first produces a snapshot of the controlfile as of that time, along with all its data--the backups, the archived logs, etc. The second creates a brand new "blank" controlfile that you must bring up to date after creation. John uses both options as a Plan B. In another post we will see how he saved the day using these two special controlfile backups.
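For a flavour of what the trace file contains, the statement it generates has roughly this shape (a sketch only; the file names, limits and character set here are illustrative, not taken from John's database):

CREATE CONTROLFILE REUSE DATABASE "PRODB3" NORESETLOGS ARCHIVELOG
    MAXLOGFILES 16
    MAXDATAFILES 1024
    MAXINSTANCES 8
LOGFILE
  GROUP 1 '+prodb3DATA1/prodb3/redo01.log' SIZE 512M,
  GROUP 2 '+prodb3DATA1/prodb3/redo02.log' SIZE 512M
DATAFILE
  '+prodb3DATA1/prodb3/system01.dbf',
  '+prodb3DATA1/prodb3/sysaux01.dbf'
CHARACTER SET AL32UTF8
;
-- The new controlfile knows nothing about backups or archived logs;
-- you must recover the database and catalog them after creating it.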

Takeaways

What did you learn from the story? Here are some key takeaways:

  1. Always write down the DBID of all your databases somewhere. If you use a recovery catalog, it's there; but it's good to note it down separately. This number does not change unless you use the NID utility, so recording it once is enough.
  2. Always configure controlfile autobackup. The default is OFF; make it ON.
  3. Always save the backup log files. In a crunch, they yield valuable information not available otherwise.
  4. When the controlfile autobackup cannot be found, you can use the restore controlfile from 'location' syntax in RMAN to pull the controlfile from a specific backup piece. If that location does not contain a controlfile backup, don't worry; just try all the available locations. One might contain what you are looking for. You have nothing to lose and everything to gain.
  5. Always use a script for this type of RMAN restore activity instead of typing at the prompt. You will find it easier to change details such as backup locations, and you will make fewer mistakes that way (see the sketch after this list).
  6. Always create a backup controlfile every day, even if you don't think you need it. You may need it someday, and you will thank yourself when you do.
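As an illustration of takeaways 3 and 5, a restore script could look something like the sketch below. The channel and send parameters are John's from earlier in the post; the script name, log directory and usage are hypothetical and would change per site:

#!/bin/bash
# restore_cf.sh -- restore the controlfile from a named backup piece.
# Usage: ./restore_cf.sh <backup_piece_name>
PIECE=$1
rman target / log=/backup/logs/restore_cf_$(date +%Y%m%d%H%M).log <<EOF
run {
  allocate channel c1 type sbt_tape PARMS 'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNIT=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
  send 'set username ddboostadmin password password servername prolin1.proligence.com';
  set dbid = 2553024456;
  restore controlfile from '$PIECE';
  release channel c1;
}
EOF

The log= option keeps a copy of the RMAN output (takeaway 3), and trying a different backup piece is a matter of changing one argument (takeaway 5).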

oraclebase

Some new Oracle Database 12c Articles

I’ve recently put some new Oracle 12c articles on the website.

The privilege usage stuff is really cool. Normally, trying to figure out what you can remove from a user is always a complete pain in the ass. Some of the databases I’m currently working with have used GRANT like it’s going out of fashion. Trying to identify what is really necessary is a tough job. Features like this are going to be hard to wait for…

Cheers

Tim…

 


martin.bach

RAC 12c enhancements: adding an additional SCAN-part 1

Based on customer requests, Oracle has added the ability to create a second SCAN, completely independent of the SCAN defined/created during cluster creation. Why would you want to use this feature? A few reasons spring to mind:

  • Consolidation: customers insist on using a different network
  • Separate network for Data Guard traffic

To demonstrate the concept I am going to show you in this blog post how I

  1. Add a new network resource
  2. Create new VIPs
  3. Add a new SCAN
  4. Add a new SCAN listener

It actually sounds more complex than it is, but I have a feeling I need to split this article in multiple parts as it’s far too long.

The lab setup

When you install RAC 11.2 or 12.1 you are prompted to specify a Single Client Access Name, or SCAN. This SCAN is usually defined in the corporate DNS server and resolves to 3 IP addresses, which allows for an easy way to implement client-side load balancing. The SCAN is explained in more detail in Pro Oracle Database 11g RAC on Linux for 11.2, and on OTN for 11.2 and 12.1. To spice the whole configuration up a little bit I decided to use RAC One Node on the clusters I am using for this demonstration.
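You can see the client-side view of SCAN resolution with a simple lookup. A sketch, using the Data Guard SCAN added later in this post (the three addresses match the SCAN configuration shown further down):

$ nslookup ron12cpri-dgscan.dg.example.com
Name:    ron12cpri-dgscan.dg.example.com
Address: 192.168.102.54
Name:    ron12cpri-dgscan.dg.example.com
Address: 192.168.102.55
Name:    ron12cpri-dgscan.dg.example.com
Address: 192.168.102.56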

I created two 12.1.0.1.2 clusters for this Data Guard test. Hosts ron12cprinode1 and ron12cprinode2 form the primary cluster; ron12csbynode1 and ron12csbynode2 form the standby cluster. The RAC One Node database is named RON:

[oracle@ron12cprinode1 ~]$ srvctl config database -db ron
Database unique name: ron
Database name: ron
Oracle home: /u01/app/oracle/product/12.1.0.1/dbhome_1
Oracle user: oracle
Spfile: +DATA/ron/spfilepri.ora
Password file: +DATA/ron/orapwpri
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ron
Database instances:
Disk Groups: RECO
Mount point paths:
Services: ron12c
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: pri
Candidate servers: ron12cprinode1,ron12cprinode2
Database is administrator managed
[oracle@ron12cprinode1 ~]$

To make things even more interesting I defined my ORACLE_SID prefix on the primary to be “pri” and “sby” on the standby.

[oracle@ron12cprinode1 ~]$ ps -ef | grep smon
oracle    2553     1  0 Feb06 ?        00:00:09 asm_smon_+ASM1
oracle   15660 15578  0 05:05 pts/3    00:00:00 grep smon
oracle   28241     1  0 Feb07 ?        00:00:18 ora_smon_pri_1
[oracle@ron12cprinode1 ~]$

A quick check with gpnptool reveals the network usage before the addition of the second SCAN:

[gpnptool get XML output omitted]

There is the default network ("netnum 1") that is created on the network defined as "public" during the installation. I have another spare network port (eth2) reserved for the new network and Data Guard traffic. Currently network 1 is the only one available.

[root@ron12cprinode1 ~]# srvctl config network
Network 1 exists
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, static
Subnet IPv6:

As you can see, RAC 12c now supports IPv6. I have another network available that I want to make available for Data Guard traffic. For this purpose I added all nodes into DNS. I am a bit old-fashioned when it comes to DNS; I am still using bind most of the time. Here is an excerpt from my reverse name resolution file:

; hosts - primary cluster
50	PTR	ron12cprinode1.dg.example.com.
51	PTR	ron12cprinode1-vip.dg.example.com.
52	PTR	ron12cprinode2.dg.example.com.
53	PTR	ron12cprinode2-vip.dg.example.com.
; Data Guard SCAN - primary cluster
54	PTR	ron12cpri-scan.dg.example.com.
55	PTR	ron12cpri-scan.dg.example.com.
56	PTR	ron12cpri-scan.dg.example.com.

; hosts - standby cluster
57	PTR	ron12csbynode1.dg.example.com.
58	PTR	ron12csbynode1-vip.dg.example.com.
59	PTR	ron12csbynode2.dg.example.com.
60	PTR	ron12csbynode2-vip.dg.example.com.
; Data Guard SCAN - standby cluster
61	PTR	ron12csby-scan.dg.example.com.
62	PTR	ron12csby-scan.dg.example.com.
63	PTR	ron12csby-scan.dg.example.com.

The domain is *.dg.example.com; the primary database client traffic will be routed through *.example.com.
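For completeness, the forward zone for dg.example.com needs matching A records for the SCAN, one per address. A sketch, reconstructed from the reverse zone above and using the name under which the SCAN is registered later in this post (the 192.168.102.x addresses are confirmed by the SCAN configuration shown further down):

; Data Guard SCAN - primary cluster
ron12cpri-dgscan	IN	A	192.168.102.54
ron12cpri-dgscan	IN	A	192.168.102.55
ron12cpri-dgscan	IN	A	192.168.102.56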

Adding the new network

The first step is to make Clusterware aware of the second network. I am doing this on both clusters. Notice that the primary nodes are called *pri* whereas the standby nodes are called *sby*.

[root@ron12cprinode1 ~]# srvctl add network -netnum 2 -subnet 192.168.102.0/255.255.255.0/eth2 -nettype static -verbose
Successfully added Network.

[root@ron12csbynode1 ~]# srvctl add network -netnum 2 -subnet 192.168.102.0/255.255.255.0/eth2 -nettype static -verbose
Successfully added Network.

So this worked, now I have 2 networks:

[root@ron12cprinode1 ~]# srvctl config network
Network 1 exists
Subnet IPv4: 192.168.100.0/255.255.255.0/eth0, static
Subnet IPv6:
Network 2 exists
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2, static
Subnet IPv6:
[root@ron12cprinode1 ~]#

In the next step I have to add VIPs for the new nodes on the *.dg.example.com subnet. The VIPs must be added on all cluster nodes, 4 in my case.

[oracle@ron12cprinode2 ~]$ srvctl add vip -h

Adds a VIP to the Oracle Clusterware.

Usage: srvctl add vip -node <node_name> -netnum <network_number> -address {<name>|<ip>}/<netmask>[/if1[|if2...]] [-skip] [-verbose]
    -node <node_name>              Node name
    -address <vip_name|ip>/<netmask>[/if1[|if2...]]       VIP address specification for node applications
    -netnum <net_num>              Network number (default number is 1)
    -skip                          Skip reachability check of VIP address
    -verbose                       Verbose output
    -help                          Print usage
[oracle@ron12cprinode2 ~]$

So I did this for each node in the two clusters:

[root@ron12cprinode1 ~]# srvctl add vip -node ron12cprinode1 -netnum 2 -address 192.168.102.51/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.
[root@ron12cprinode2 ~]# srvctl add vip -node ron12cprinode2 -netnum 2 -address 192.168.102.53/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.

[root@ron12csbynode1 ~]# srvctl add vip -node ron12csbynode1 -netnum 2 -address 192.168.102.58/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.
[root@ron12csbynode2 ~]# srvctl add vip -node ron12csbynode2 -netnum 2 -address 192.168.102.60/255.255.255.0/eth2 -verbose
Network exists: 2/192.168.102.0/255.255.255.0/eth2, type static
Successfully added VIP.

And I need to start the VIPs. They have some funny names, as you can see in crsctl status resource (the names can’t be defined; see the output of srvctl add vip -h above).

[root@ron12cprinode1 ~]# srvctl status vip -vip ron12cprinode1_2
VIP 192.168.102.51 is enabled
VIP 192.168.102.51 is not running
[root@ron12cprinode1 ~]# srvctl start vip -vip ron12cprinode1_2
[root@ron12cprinode1 ~]# srvctl start vip -vip ron12cprinode2_2
[root@ron12cprinode1 ~]# srvctl status vip -vip ron12cprinode1_2
VIP 192.168.102.51 is enabled
VIP 192.168.102.51 is running on node: ron12cprinode1
[root@ron12cprinode1 ~]#

Add the second SCAN

At this time you can add the second SCAN. The command syntax is shown here:

[oracle@ron12cprinode1 ~]$ srvctl add scan -h

Adds a SCAN VIP to the Oracle Clusterware.

Usage: srvctl add scan -scanname <scan_name> [-netnum <network_number>]
    -scanname <scan_name>          Domain name qualified SCAN name
    -netnum <net_num>              Network number (default number is 1)
    -subnet <subnet>/<netmask>[/if1[|if2...]]             NET address specification for network
    -help                          Print usage

Implemented on my first cluster node, the command is easier to comprehend:

[root@ron12cprinode1 ~]# srvctl add scan -scanname ron12cpri-dgscan.dg.example.com -netnum 2

[root@ron12cprinode1 ~]# srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node ron12cprinode2
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node ron12cprinode1
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node ron12cprinode1

You need to create the SCAN on both clusters. On my primary cluster the SCAN has been created with the following configuration:

[root@ron12cprinode1 ~]# srvctl config scan -netnum 2
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 0 IPv4 VIP: 192.168.102.54
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 1 IPv4 VIP: 192.168.102.55
SCAN name: ron12cpri-dgscan.dg.example.com, Network: 2
Subnet IPv4: 192.168.102.0/255.255.255.0/eth2
Subnet IPv6:
SCAN 2 IPv4 VIP: 192.168.102.56
[root@ron12cprinode1 ~]#

You can see the new VIPs in the output of ifconfig, just as you would with the primary SCAN:

eth2      Link encap:Ethernet  HWaddr 52:54:00:FE:E2:D5
          inet addr:192.168.102.50  Bcast:192.168.102.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fefe:e2d5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:523 errors:0 dropped:0 overruns:0 frame:0
          TX packets:339 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:118147 (115.3 KiB)  TX bytes:72869 (71.1 KiB)

eth2:1    Link encap:Ethernet  HWaddr 52:54:00:FE:E2:D5
          inet addr:192.168.102.55  Bcast:192.168.102.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

So there is nothing too surprising in the output, it’s exactly the same as before with the public SCAN created during the installation.

End of part 1

This already seems like a lot of text to me so I think it’s time to pause here. The next parts will demonstrate the addition of the SCAN listeners, the new node listeners on the *.dg.example.com network and finally the duplication of the primary RAC One Node database for use as a standby database.

Richard Foote

Indexing Foreign Key Constraints With Invisible Indexes (Invisible People)

In my previous post I discussed when deleting rows from parent tables, how Bitmap Indexes based on the FK constraint can prevent the expensive Full Tables Scans (FTS) on the child tables but not the associated exclusive table locks. Last year, I discussed how it was possible in Oracle Database 12c to have multiple indexes on the same column list. […]

dbakevlar

ASH Analytics- Tracking Down an Issue, Part I

This is the first of many posts where we take a particular issue and troubleshoot it. I’ll go through multiple scenarios to show the common steps to take in Enterprise Manager 12c, from discovery to identification to research to explanation.

This blog will use the following features:

  • ASH Analytics
  • SQL Details
  • ASH Report

The first thing we need is an issue-

ash_b_0421

Note the CPU red line: the IO is considerably over it, and there is quite a lot of IO, along with Commit and Concurrency waits. We’ll start with this as our issue, which we can see has quickly escalated over our timeline (a 15-minute view was requested…)

Scanning down to the Top SQL (by default), we can then see that this IO pressure is primarily caused by two or three SQL_IDs during the timeline:

ash_b_0421_2

We can click on the first SQL_ID and display the insert statement.  We also note the escalation from 1pm to 1:15pm in the Statistics tab that shows the active sessions.

ash_b_0421_3

So I know a few of you are asking, “Why isn’t there any Plan Hash Value? If it is a simple insert, with no subquery, etc., would it have a Plan Hash Value?” Think about that as we inspect the next tab, Activity.

ash_b_0421_4

We are now inspecting just the wait events connected with THIS SQL_ID and the sessions that are involved with it.  On the right, we can quickly see that there is a huge percentage of cell single block physical read waits involved with this insert.  What does that tell you about the hardware that the insert is running on?  Yes, it’s part of an Exadata environment.

Scrolling down allows us to view the sessions involved in this timeline. These are not just the sessions for the 15 minutes of escalation we are interested in; these are ALL the sessions in the timeline shown above. From the top we can see 0.77% of the activity for the first one, and note that it decreases as we go through the list. Also notice that it’s 1-10 of 200 sessions total that were captured during the whole timeline. We already know a majority of them are part of our escalation in IO on the system.

ash_b_0421_5

Inspecting the last 10 helps us see the decrease in IO resources as concurrency waits increased-

ash_b_0421_8

We are down to 0.39% by the time we get to the 200th session captured in this timeline. Now, getting back to our “Should we have a plan hash value for our insert statement?” question: no, we shouldn’t, and there isn’t a plan, nor is there any plan control to be found for this simple insert. This is verified when you inspect the Plan and Plan Control tabs for this SQL_ID.

ash_b_0421_6

We then can look at the Tuning History.

ash_b_0421_7

You might wonder how this insert, which doesn’t have a plan or plan hash value, can be found in the tuning history. If it is a top SQL statement that is deemed an impact to performance, then it will show up here. There is no escape for this insert.

At this point, we can see that we are looking at a concurrency issue that is not coming from a user or from a specific session in any way. So the next step is to get a clearer look at the 5+ minutes that lead up to the heavy IO, to see what is really going on and eliminate assumptions.

ash_b_0421_9

I simply click on “Run ASH Report” in the SQL Details window, change the times to 1:10pm –> 1:16pm, and note that we are still focused on this SQL_ID.

The report shows the standard header and high level info, but then once you dig in, you’ll see the top events:

ash_b_0421_10

We see that yes, it is almost 96% of the activity, with an average of 21 active sessions running at any given time.

ash_b_0421_11

If you had traced this session, you’d see something similar to the following in the trace file, representing the data in the first three lines that we now see in the ASH report:

WAIT #3: nam='cell single block physical read' ela= 945 cellhash#=3983857215 diskhash#=3558547146 bytes=8192 
obj#=12894 tim=1912764532115123
WAIT #3: nam='cell single block physical read' ela= 483 cellhash#=1017296094 diskhash#=1180032005 bytes=8192 
obj#=46129 tim=1213898462783761
WAIT #3: nam='cell single block physical read' ela= 697 cellhash#=1017296094 diskhash#=3896957901 bytes=8192 
obj#=47883 tim=1158963811267355

Note that there are THREE cell single block physical reads for the insert event. Why would we see this? This is the one question for the readers of this blog: what would cause the single block physical reads on these three indexes?

 

ash_b_0421_13

It’s not the table we are having an issue with, but the indexes that are on the table. Three indexes show up very clearly in our Top DB Objects section showing our wait event, cell single block physical read. The ASH report gained this information from the obj#, just like the ones you can see in the trace file excerpt above. For each of those entries, a very similar percentage of the event is displayed further up in the Top Event P1/P2/P3 Values.
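If all you have is the trace file, mapping an obj# back to its object is a one-liner. A sketch, using the obj# from the first wait line above (for some waits the number recorded is the data object number, in which case query data_object_id instead):

SQL> select owner, object_name, object_type
  2  from   dba_objects
  3  where  object_id = 12894;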

So from here, I think we’ve inspected the SQL_ID pretty fully and we know why the problem is occurring: there are three indexes that require a scan.

The final answer we want is why this heavy load is occurring. Well, ASH answers this for us, too-

ash_b_0421_answer

Yeah, someone is TRYING to put a heavy load on the database with Swingbench, and its average active sessions match our average active sessions overall. We’ll leave it as is- no one wants to be responsible for killing someone’s load test… :)

 




oraclebase

Divergent

I’ve just got back from watching Divergent. It is a really good film!

Don’t believe any bullshit comparisons to that crappy Hunger Games. It is so much better than that rubbish! It’s probably a good idea not to read any summaries of the plot either, since it sounds kind-of stupid on paper. In the film it makes a lot more sense and just works.

Very pleased I went to see it. One of the better Sci-Fi/action films I’ve seen in quite a while. Much more worthwhile than that spider-thing…

Cheers

Tim…


khailey

Performance Testing with Agile Data

apple_orange

Performance testing requires full, fresh data

Many organizations don’t even attempt to test performance until very late in their development cycle because it is only in the UAT (or equivalent) environment that they have a full copy of their production data set.  Errors and inefficiencies found at this stage are expensive to fix and are often promoted to production due to pressures from the business to meet release schedules.

Delphix customers give each developer, or team of developers, a full, fresh copy of the database where they can validate the performance of their code in the UNIT TESTING phase of their projects.  Poorly written queries are identified by the developer and can be fixed immediately, before their code is submitted for integration with the rest of the application.   The test/fix iteration is much tighter and results in higher quality, better performing application releases.

How does Delphix enable this?

VDBs created by Delphix have many advantages over a physical database, and therefore can be used in unique ways.  Consider the following:

  • Self service.  Delphix automates all the complexity required to make changes to a database, allowing developers and testers to get what they need without waiting on associated support organizations.
  • Fast provisioning.  VDBs require no data movement at the time they are created, so even large databases can be created in a few minutes.
  • Easy data refresh.  Refreshing a VDB with the latest data from production can be done with 3 mouse clicks.  Never test against synthetic data.
  • Data rewind/reset.  Delphix tracks all changes made to a VDB and can rewind the state of the database to any point in time at the request of the user.  Run a test, rewind, change parameters, run the test again.
  • Efficient use of infrastructure.  VDBs run in a tiny storage footprint, allowing teams to run many more database environments in parallel.
  • Efficient use of licenses.  Turning VDBs on and off is trivial.  Test environments can be spun up as needed and suspended when testing is finished.  Suspended VDBs use no resources of the DBMS.
  • Database relocation.  VDBs are easily moved between database hosts, even across datacenters.

Following are examples of performance changes that can easily be tested in VDBs:

Database Configuration

  • changes to initialization parameters (e.g. optimizer_index_cost_adj, optimizer_index_caching, etc.; see the sketch after this list)
  • changes to redo size, parameters
  • DBMS version and patch set
  • SGA size
  • CPU type, speed (move VDB between database hosts)
  • Different DB statistics, statistics gathering methods
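As an example of the first item, a parameter experiment on a VDB can be as simple as the sketch below; the parameter value is arbitrary, and big_table is a hypothetical stand-in for your own workload:

-- On the VDB only: try a different optimizer setting...
SQL> alter system set optimizer_index_cost_adj = 50 scope=both;

-- ...time the critical part of the workload...
SQL> set timing on
SQL> select /* perf_test */ count(*) from big_table;

-- ...then rewind the VDB from the Delphix console and repeat
-- with a different value.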

Data Modeling

  • Index changes
  • SQL Profiles – Like the old stored outlines, you can set up a complex system of profiles on a VDB and test different explain plans
  • Run complex and potentially debilitating queries on a VDB to minimize impact, use TKPROF and heavy tracing you can’t do elsewhere

Application  Configuration

  • Testing application server connection pool sizes/limits
  • network bandwidth testing for multi-hop/firewall configuration
  • theoretical maximums for concurrent batch jobs (not just at the DB, but the app tier as well)
  • testing database monitoring solutions/thresholds/configuration impact
  • Oracle trace event impact when turned on (deviation from a baseline)

Enabling a physical UAT environment with Delphix

As mentioned above, many Delphix customers will still maintain a final testing environment that matches the production setup exactly.  They will have fibre channel (or equivalent) connections to the SAN directly from their DBMS host.  Even in this environment, which bypasses the Delphix Engine, our software can provide great benefit to the testing process.

The V2P feature can be used to migrate any data set contained within Delphix to a physical storage environment.  That means any data set captured from production, or any data set modified by a VDB can be pushed to UAT in an automated fashion by Delphix.  Running a V2P operation is not as fast as creating/refreshing a VDB because it requires data movement, but it is faster than restoring a traditional database backup and automates all the instance creation and configuration tasks.

Bringing it all together

The high level life cycle of performance testing on the Delphix Agile Data Platform looks something like the following:

  1. Create and/or refresh development environments with the latest full data set from production.
  2. Use VDBs to iterate quickly on unit tests of new code, data modeling changes, DBMS configuration changes.
  3. Integrate and test all changes in a highly parallelized QA environment, using VDBs to minimize the setup time between test runs.
  4. Run V2P to migrate release candidates to UAT for final performance verification.
  5. Promote changes to production.

 


morten

The Poetry of Science

So easter was spent with the kids at a family oriented vacation spot. When everyone else was sleeping, I spent some time watching my list of videos "to watch" that I compile during my working days, so I dont forget to wathc all those wonderfull videos out there on the net. One of the videos I watched this holiday, was The Poetry of Science, with Richard Dawkins and Neil deGrasse Tyson. If I could only choose one video that my kids could watch during their education, this would be the one. The discussion between these two men, is really inspiring and if I was ever asked to define the meaning of life, this videos content would be a big part of my answer.