Oakies Blog Aggregator

tanelpoder's picture

Oracle Performance Visualization…

Coskan Gundogar and Karl Arao have written two interesting articles about Oracle performance analysis and visualization; check them out!

Coskan’s article:

Karl’s article:

Note that in March I will be releasing PerfSheet v3.0, which will have lots of improvements! ;-)


tanelpoder's picture

Bind Variable Peeking – execution plan inefficiency

In my Beyond Oracle Wait Interface article I troubleshot a test case where an execution plan somehow went “crazy”: it started burning CPU and lots of logical I/Os, and the query never completed.

I have uploaded the test case I used to my new website, to a section where I will upload some of my demo scripts which I show at my seminars (and people can download & test these themselves too):

http://tech.e2sn.com/oracle-seminar-demo-scripts

Basically what I do is this:

  1. I run the query with bind variable values where only a handful of rows match the filter condition, so Oracle picks a nested loop join (and an indexed access path).
  2. Then I run the same query with different bind values, where a lot of rows match the filter condition. Oracle reuses the existing execution plan (with nested loops!!!) and ends up visiting a lot of blocks again and again, because a nested loop visits the “right” side of the join once for every row coming from the “left” side of the join.
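The shape of the repro can be sketched like this (a minimal illustration using hypothetical tables `customers` and `orders` and a SQL*Plus bind variable; this is not the actual demo script):

```sql
-- Hypothetical schema: customers(id, region) with few 'RARE' rows and many
-- 'COMMON' rows; orders(customer_id) with an index on customer_id.
VARIABLE region VARCHAR2(10)

-- 1st execution: the peeked bind matches only a handful of rows, so the
-- optimizer picks a nested loop join with indexed access.
EXEC :region := 'RARE';
SELECT COUNT(*)
FROM   customers c
JOIN   orders o ON o.customer_id = c.id
WHERE  c.region = :region;

-- 2nd execution: the bind now matches most of the table, but the cached
-- nested loop plan is reused; the inner side is probed once per driving
-- row, burning CPU and logical I/O.
EXEC :region := 'COMMON';
SELECT COUNT(*)
FROM   customers c
JOIN   orders o ON o.customer_id = c.id
WHERE  c.region = :region;
```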

Using nested loops over lots of rows is a sure way to kill your performance.

And an interesting thing with my script is that the problem still happens in Oracle 11.1 and 11.2 too!

Oracle 11g has Adaptive Cursor Sharing, right? This should take care of such a problem, right? Well, no: Adaptive Cursor Sharing is a reactive technique – it only kicks in after the problem has happened!
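You can observe this reactive behavior yourself in 11g by checking the cursor's Adaptive Cursor Sharing flags in V$SQL (the LIKE pattern below is just a hypothetical way to find your own statement):

```sql
-- After the first bad execution the child cursor is typically marked
-- bind-sensitive but not yet bind-aware; only on a later execution does
-- ACS create a new, bind-aware child cursor with a better plan.
SELECT sql_id, child_number, executions,
       is_bind_sensitive, is_bind_aware
FROM   v$sql
WHERE  sql_text LIKE '%your query text here%';
```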

So feel free to download the script, review it and test it out!


marco's picture

Small Intermezzo Towards Miracle OpenWorld 2010

I created a small YouTube introduction movie, as requested by Moans, for the Miracle OpenWorld 2010 event this year. I am happy they more or less resolved their issues with Oracle Legal regarding the naming of the event (see the movie and news section on the Miracle OpenWorld 2010 main page).

Anyway, I made a small attempt (in HD) to tell a bit about XML on YouTube.

As said:

    “An exploration in mending and gluing things together, while it absolutely doesn’t make any sense what I am saying, but then again, who cares…its bogus anyway…”

Hope to CU around somewhere this year.

:-)

khailey's picture

Tuning the Application, Database and Hardware


Tuning Example in 3 parts



Part 1 - "The Database is hanging!" AKA "the application has problems, good grief"


I wonder if you can imagine, or have had the experience of, the application guys calling with anger and panic in their voices: "the database is sooo slow, you've got to speed it up."
What's your first reaction? What tools do you use? How long does it take to figure out what's going on?
Let's take a look at how this works with DB Optimizer. When I get a call like this, I take a look at the database:



I can clearly see that the database is not bottlenecked, and that there must be a problem in the application.
Why do I think it's the application and not the database? The database is showing plenty of free CPU in the load chart (the largest chart, at the top of the image above). In the load chart there is a horizontal red line, which represents the number of CPUs on the system, in this case 2. The red line is rarely crossed by the bars, which represent the load on the database measured in average active sessions. The session activity is averaged over 5 samples taken over 5 seconds, so each bar is 5 seconds wide.

The bars above hover around 1 average active session and are rarely green. Green represents CPU load; any other color indicates sessions waiting. The main wait in this case is orange, which is "log file sync", i.e. waits for commits.

Why is the database more or less idle, and why are most of the waits we do see for commits? I look at the code coming to the database and see something like this:
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
Doing single-row inserts and committing after each one is very inefficient. A lot of time is wasted on network communication, which is why the database is mainly idle: while the application thinks it's running full speed ahead, it is actually waiting mainly on network round trips and commits. If we commit less often and batch the work we send to the database, reducing network communication, we will run much more efficiently. Changing the code to
begin
  for i in 1..1000 loop
    insert into foo values ('a');
    -- commit;
  end loop;
end;
/
commit;
removes most of the communication delay. Now we get a fully loaded database, but we run into database configuration issues.


Part 2 It *is* the database (ie DBA get to work)


In the above DB Optimizer screen, the same workload was run 4 times. We can see that the elapsed time (the width of each load) shrank and the percentage of activity on CPU increased.
Runs:
1. "log file sync", the orange color, is the biggest area, which means users are still waiting on commits even though we are committing less in the code. In this case we moved the log files to a faster device (you can see the checkpoint activity just after run 1, where we moved the log files).
2. "buffer busy wait", the burnt red, is the biggest area. We drilled down on the buffer busy wait event in the Top Event section, and the details tell us to move the table from a normal tablespace to an Automatic Segment Space Managed tablespace.
3. "log file switch (checkpoint incomplete)", the dark brown, is the largest area, so we increased the size of the log files (you can see the I/O time spent creating the new redo logs just after run 3).
4. The run time is the shortest and all the time is spent on CPU, which was our goal: to take advantage of all the processors and run the batch job as quickly as possible.


Part 3 It's the machine (rock paper scissors)
Now that the application is tuned and the database is tuned let's run a bigger load:

We can see that the CPU load is constantly over the max CPU line. How can the CPU load be bigger than the number of CPUs on the machine? It means that the demand for CPU is higher than the CPU available. In the image above there are 2 CPUs on the machine but an average of 3 sessions that want to be on CPU, which means that on average 1 session is not actually running but is ready to run and waiting for a CPU.
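That "average active sessions vs. CPU count" comparison can be approximated straight from ASH (a rough sketch assuming a Diagnostic Pack license; this is not DB Optimizer's actual query):

```sql
-- v$active_session_history samples active sessions once per second, so
-- count(*) / 300 over a 5-minute window approximates average active sessions.
SELECT ROUND(COUNT(*) / 300, 1) AS avg_active_sessions,
       (SELECT value FROM v$osstat
        WHERE  stat_name = 'NUM_CPUS') AS num_cpus
FROM   v$active_session_history
WHERE  sample_time > SYSDATE - 5 / (24 * 60);
```

If avg_active_sessions stays above num_cpus, demand for CPU exceeds supply and sessions are queuing for the processors.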
At this point we have two options. In this case we are only running one kind of load, i.e. the insert, so we can actually go even further tuning this insert and use Oracle's bulk load (FORALL) syntax:
declare
  TYPE IDX IS TABLE OF INTEGER INDEX BY BINARY_INTEGER;
  MY_IDX IDX;
BEGIN
  for i in 1..8000 loop
    MY_IDX(i) := 1;
  end loop;
  FORALL indx IN MY_IDX.FIRST .. MY_IDX.LAST
    INSERT INTO foo ( dummy )
    VALUES ( MY_IDX(indx) );
  COMMIT;
end;
/

But if this were an application with a lot of different SQL, and the SQL load were well distributed across the system, then we'd have a case for adding more hardware. Making the decision to add hardware can be difficult because the information needed to make it is generally unknown, unclear or just plain confusing. DB Optimizer makes it easy and clear, which can save weeks and months of wasteful meetings and debates. For example:

If we look at the bottom left, there is no SQL that takes up a significant amount of load, i.e. there is no outlier SQL that we could tune to gain back a lot of wasted CPU. We'd have to tune many, many SQL statements and improve most of them to get our load back down below the max CPU line. In this case, adding CPUs to the machine might be the easiest and most cost-effective solution.
Conclusion:
With the load chart we can quickly and easily identify the bottlenecks in the database, take corrective actions, and see the results. In part 1 we had an application problem, in part 2 we had three database configuration issues, and in part 3 we had a hardware sizing issue. In all three parts DB Optimizer provided a clear and easy presentation of the data and issues, making the solutions clear.


James Morle's picture

Upcoming Presentation Slots, Spring 2010

I will be presenting the "Oracle Wait Interface is Useless" Dog and Pony show a couple of times this spring. It was first presented at UKOUG 2009 with Tanel Poder, and will be seeing the light of day at the following events:

I hope to see some of you there, and will get the final part posted before RMOUG – so the picture will be complete. If you’ve got any questions, it’s a good opportunity to come and heckle at one of those events!

karlarao's picture

Craig Shallahamer is now blogging!

Craig Shallahamer is now blogging… check out his initial post here!

Earlier I had a few mail exchanges with him and got to check out his new work, called the Stress Identifier. Hmm.. so how will it be helpful? Well, if you have already characterized your workload, you may want to definitively determine which internal database activity is stressing your CPU, IO, and network subsystems. This tool avoids the guesswork of selecting that “unit of work”… check out the link for more details…

And most of all.. grab a copy of his new book Oracle Performance Firefighting… which has lots of good stuff on performance.

khailey's picture

Hung database and can't connect?

It can be exasperating when the database is slow or hanging and you can't even connect to see what's happening. In those cases, here is a trick from Tanel Poder:

If you have such an instance hang you can use a preliminary connection (which starts the process and attaches to SGA, but doesn't initialize SGA structs nor allocate any state objects):

sqlplus -prelim "/as sysdba"

oradebug setmypid
oradebug dump latches 1
oradebug dump hanganalyze 4

Here's a good write up by Arup Nanda

karlarao's picture

Workload characterization using DBA_HIST tables and kSar

Been busy these past few days..

Well, this post is about taking advantage of the built-in data store that started in 10gR1.. yeah, it’s AWR, and it’s very much like a “Statspack on steroids” (got that from Kyle Hailey’s modules). What’s exciting about 10g and above is that we can utilize the “Time Model” and the OSSTAT view, and use them together with the SYSSTAT view, to get far better workload information when going through all the SNAP_IDs.

I had this “little” issue before of getting lazy when generating multiple AWR reports… yeah, it’s just so daunting: you’ll get tired just generating those reports by hand, and the catch is… you haven’t analyzed anything yet.. :p   (but yeah, you’re lucky if you’ve got access to the OEM performance page)

So.. I was thinking, if I get to see all the relevant workload info across SNAP_IDs it would be easier for me to notice trends and even possible for me to visualize the data, or even possible to do some statistics out of it.

In Chapter 9 of Craig Shallahamer’s book (Oracle Performance Firefighting), he explains in detail what information you need to gather to be able to define the database server’s capacity, requirements, and utilization (U=R/C)…
I’ve outline some of the points here (see the Firefighting tab)

…. since the AWR report is based on DBA_HIST tables, I was able to make a script that shows the following:

1) CPU capacity – see image below, columns 4 to 6
2) CPU requirements – see image below, columns 7 to 14
3) Memory requirements – see image below, column 15
4) IO requirements – see image below, columns 16 to 21
5) some SYSSTAT delta values – see image below, columns 22 to 23
6) Utilization – see image below, the last 6 columns

…. I’ve used the tables below for the query:

  • dba_hist_snapshot
  • dba_hist_osstat
  • dba_hist_sys_time_model
  • dba_hist_sysstat
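A minimal sketch of the kind of query those tables support (assuming the standard OSSTAT stat names and a single instance; this is an illustration, not the actual awr_genwl.sql):

```sql
-- CPU capacity vs. requirement per snapshot: NUM_CPUS is a point-in-time
-- value, BUSY_TIME is cumulative in centiseconds, so take deltas per SNAP_ID.
WITH os AS (
  SELECT snap_id,
         MAX(CASE WHEN stat_name = 'NUM_CPUS'  THEN value END) AS num_cpus,
         MAX(CASE WHEN stat_name = 'BUSY_TIME' THEN value END) AS busy_time
  FROM   dba_hist_osstat
  GROUP  BY snap_id
)
SELECT snap_id,
       num_cpus,
       (busy_time - LAG(busy_time) OVER (ORDER BY snap_id)) / 100
         AS cpu_busy_secs
FROM   os
ORDER  BY snap_id;
```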

Check out the scripts here:
For Linux/Unix: awr_genwl.sql
For Windows: awr_genwl_win.sql

IMPORTANT NOTE: the Diagnostic Pack license is needed for these scripts. I update my scripts frequently, so if the download link is not working, check out the Scripts section for the file name and the most recent version of the script.

Now time for some action!
