RAC, parallel query and udpsnoop

I talked about various performance myths in my ‘battle of the nodes’ presentation. One of the myths was that spawning parallel query slaves across multiple RAC instances can cause a major bottleneck in the interconnect. In fact, that myth was a direct result of a lessons-learnt presentation from a client engagement. The client was suffering from performance issues, with enormous global cache waits running to 30+ ms average response time for global cache CR traffic and crippling application performance. Essentially, their data warehouse was running hundreds of parallel queries concurrently, with slaves spawning across the instances of a three-node RAC cluster.

Of course, I had to hide the client details, so I simplified the problem into a test case to explain the myth. It looks like either a) my test case is bad, or b) some sort of bug I encountered in version, or c) I made a mistake in my analysis somewhere. Most likely it is the last one :-( . Greg Rahn questioned that example, and this topic deserves more research to understand it a little bit further. At this point, I don’t have and database is in and so we will test this in


UDP is one of the protocols used for cache fusion traffic in RAC, and it is the Oracle-recommended protocol. For this article, the size of the UDP traffic must be measured. Measuring global cache traffic using AWR reports was not precise, so I decided to use a DTrace Toolkit tool, udpsnoop.d, to measure the traffic between RAC nodes. There are two RAC nodes in this setup. You can read more about udpsnoop.d. The tool can be downloaded from the DTrace Toolkit. Output of this script is of the form:

Diagnosing and Resolving “gc block lost”

Last week, one of our clients had a sudden slowdown on all of their applications, which run on a two-node RAC environment.

Below is the summary of the setup:
– Server and Storage: SunFire X4200 with LUNs on EMC CX300
– OS: RHEL 4.3 ES
– Oracle (database and clusterware)
– Database Files, Flash Recovery Area, OCR, and Voting disk are located on OCFS2 filesystems
– Application: Forms and Reports (6i and also lower)

As per the DBA, the workload on the database was normal and there were no changes on the RAC nodes and on the applications. Hmm, I can’t really tell because I haven’t really looked into their workload so I don’t have past data to compare.

Collaborate 09: Don’t miss these sessions

Collaborate 09 starts on Sunday, May 3 (a few days from now!) in Orlando. I’ve been offline for several weeks (more on that later), but will be returning to the world of computers and technology in full force in Orlando. I’ve had a few inquiries about whether or not I’ll be at Collaborate, so I thought I’d resurrect my blog with a post about where I’ll be and some of the highlights I see at Collaborate 09.

First, where I’ll be presenting:

Single Instance and RAC Kernel/OS upgrade

This document will serve as a guide for the Kernel and OS upgrade activities for

  1. Single Instance on ASM using raw devices
  2. RAC with ASM (using ASMlib) and OCFS2

Upgrading the Kernel and OS is easy and needs just a few commands. The critical part is the dependencies once the Kernel gets updated. If you’re using ASMlib and OCFS2, you’ll notice that after the upgrade they’re not working anymore: you can’t start up ASM, and if your OCR and Voting Disk are on OCFS2, the CRS stack won’t start. This is all because the RPMs of ASMlib and OCFS2 are kernel dependent. There are other components and software that are kernel dependent too, so you have to check them and do a risk analysis before doing the upgrade.
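As a rough illustration of that dependency check, here is a minimal Python sketch. It assumes the kernel-dependent driver packages embed the kernel release in their names, in the form oracleasm-<kernel release>-<version> and ocfs2-<kernel release>-<version>. The function name, package list and version strings are made up for the example; on a real node the inputs would come from `rpm -qa` and `uname -r`.

```python
def missing_kernel_modules(installed, kernel_release,
                           prefixes=("oracleasm", "ocfs2")):
    """Return the prefixes that have no package built for kernel_release.

    Assumption: kernel-dependent packages are named
    '<prefix>-<kernel release>-<package version>'.
    """
    missing = []
    for prefix in prefixes:
        wanted = f"{prefix}-{kernel_release}-"
        # Packages such as 'oracleasm-support' or 'ocfs2-tools' do not
        # carry a kernel release, so they never match here.
        if not any(pkg.startswith(wanted) for pkg in installed):
            missing.append(prefix)
    return missing


if __name__ == "__main__":
    # Hypothetical package list, as `rpm -qa` might report it.
    installed = [
        "oracleasm-2.6.9-42.ELsmp-2.0.3-1",
        "oracleasm-support-2.0.3-1",
        "ocfs2-2.6.9-42.ELsmp-1.2.3-1",
        "ocfs2-tools-1.2.2-1",
    ]
    # Current kernel: both drivers are present.
    print(missing_kernel_modules(installed, "2.6.9-42.ELsmp"))
    # Planned target kernel: both drivers need new RPMs first.
    print(missing_kernel_modules(installed, "2.6.9-78.ELsmp"))
```

Running the check against the target kernel release before the upgrade shows which driver RPMs have to be downloaded in advance, so ASM and the CRS stack can come back up after the reboot.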

ADV: RAC Attack Hands-on Event at Collaborate09

The RAC SIG, Oracle and IOUG are thrilled to present the hands-on event dubbed “RAC Attack!” at Collaborate09 in Orlando, FL. It is a half-day University Session in the IOUG Forum scheduled for the morning of Thursday, May 7th.

Each participant will have their own private RAC cluster to use. You’ll be able to install a new cluster, test session failover, perform backup and recovery and just about anything else you’d like to try (time permitting). The session will have lab outlines with very specific instructions that cater to beginners. Advanced users are welcome to test anything they like. If you try something that doesn’t work, we have mechanisms in place to help “reset” your cluster in 15 minutes and let you continue working and testing.

Congratulations New Oracle ACE, Jeremy Schneider!

I’ll be the first to offer a large congratulations to Jeremy Schneider on being the most recent appointment to the Oracle ACE program. He certainly deserves it (I nominated him, so I suppose I would think so) and I continue to look for great things to come.

Start Database Services automatically after instance startup

Those of us that have dealt with RAC environments for a while are familiar with the behavior of Oracle Services in an Oracle Cluster. Services are an essential component for managing workload in a RAC environment. If you’re not defining any non-default services in your RAC database, you’re making a mistake. To learn more about services, I strongly recommend reading the definitive whitepaper by Jeremy Schneider on the topic.

Security, Forecasting Oracle Performance and Some stuff to post… soon…

I’ve been busy this February “playing around/studying” on the following:

1) Oracle Security products (Advanced Security Option, Database Vault, Audit Vault, Data Masking, etc.). Well, every organization must guard its digital assets against any threat (external/internal), because once compromised it could lead to negative publicity, lost revenue, litigation, loss of trust.. and the list goes on.. I’m telling you, Oracle has a lot to offer in this area (breadth of products and features, some of them are even free!) and you just need to have the knowledge to stitch them together..

