Troubleshooting

tanelpoder's picture

How to log on to an instance when even SYSDBA can't do so?

When you have a situation where let say the archiver process is stuck, regular users and even users with DBA role may not be able to log on. This usually because new sessions logging on generate audit records or (sometimes) update SEQ$ table to get new values from a sequence for populating AUDSID. If the archiver is stuck, then any DML will get blocked as well as they can't generate any new redo until the archiver stuck issue is fixed.

In these cases the SYSDBA (and SYSOPER) privlege holders will still be able to log on and create their session. This is because SYSDBA connections do not generated any audit records to the database tables (that's why there's the audit_file_dest directory where SYSDBA audit files will be written) and SYSDBA connections also don't get their AUDSID from a sequence but use a hardcoded value (0xFFFFFFFF) instead.

But there are cases when even SYSDBA privilege holders can't log on! One situation what I've seen was due a bug, where a process never released a shared pool latch it had taken (in fact it was stuck spinning in free memory searching code). That instance happened to have only one shared pool subpool in use, protected by a single latch. 

That meant that no-one else could allocate/free memory from shared pool, causing most sessions to eventually hang.

Read on how to resolve such situations here:
http://tech.e2sn.com/oracle/troubleshooting/hang/how-to-log-on-even-when-sysdba-can-t-do-so

tanelpoder's picture

Systematic Oracle Latch Contention troubleshooting

As an Oracle DBA, developer or performance analyst, you may have run into what is termed "latch contention" at various points. So what exactly is a "latch" and why do we have contention on this "latch". In this article, we will take an in-depth look at latches and how we determine and resolve such contention. Whether you are a newbie or an experienced old-timer, we hope this article will cast a little more light on this ill-understood subject.
Read the article here:
 http://tech.e2sn.com/oracle/troubleshooting/latch-contention-troubleshooting
 

tanelpoder's picture

How to read an Oracle ERRORSTACK output

Errorstack tracefiles are very useful for troubleshooting ORA-600's, crashes, hangs and even just bad performance.
Errorstack dumps are dumped automatically by Oracle when critical errors such as ORA-600 happen. This is when you see an error reported in alert.log and the generated tracefile has a "ksedmp: internal or fatal error" string in its beginning, followed by the error code (usually ORA-7445 or ORA-600 with some parameters). 
"ksedmp" means Kernel Service Error DuMP, so everything below that line is the errorstack dump.

Errorstack dumps can also be manually invoked by issuing ORADEBUG ERRORSTACK 3 (when being connected to the target process using ORADEBUG SETOSPID). This can be useful when a session seems to be hung (but without showing a reasonable wait event in V$SESSION_WAIT) or is consuming much more resources than normally and you want to know which exact bind variable values are currently used for executing the SQL.

Read more here:
http://tech.e2sn.com/oracle/troubleshooting/how-to-read-errorstack-output
 

tanelpoder's picture

Non-trivial performance problems

Gwen Shapira has written an article about a good example of a non-trivial performance problem.

I’m not talking about anything advanced here (such as bugs or problems arising at OS/Oracle touchpoint) but that sometimes the root cause of a problem (or at least the reason why you notice this problem now) is not something deeply technical or related to some specific SQL optimizer feature or a configuration issue. Instead of focusing on the first symptom you see immediately, it pays off to take a step back and see how the problem task/application/SQL is actually used by the users or client applications.

In other words, talk to the users, ask how exactly they experience the problem and then drill down from there.

Share/Bookmark

tanelpoder's picture

Oracle Latch Contention Troubleshooting

I wrote a latch contention troubleshooting article for IOUG Select journal last year (it was published earlier this year). I have uploaded this to tech.E2SN too, I recommend you to read it if you want to become systematic about latch contention troubleshooting:

http://tech.e2sn.com/oracle/troubleshooting

I’m working on getting the commenting & feedback work at tech.E2SN site too, but for now you can comment here at this blog entry…

Share/Bookmark

tanelpoder's picture

Session Snapper v3.11 – bugfix update – now ASH report works properly on Oracle 10.1 too

This is an updated version of Snapper, which works ok on Oracle 10.1 now as well (9i support is coming some time in the future :)

Thanks to Jamey Johnston for sending me the fix info (and saving me some time that way :)

So if you have some problems with Snapper on Oracle 10.1, please make sure you have the latest version v3.11, which you can get from here:

http://tech.e2sn.com/oracle-scripts-and-tools/session-snapper

The output below is from Snapper 3.11 on Oracle 10.1.0.5, the ASH columns in the bottom part of the output are displayed correctly now:

Jonathan Lewis's picture

Index too big

I thought I’d posted this a couple of years ago – but maybe it was something I put on the OTN database forum in response to a question. If it was, the same (or similar) question has recently appeared.  “How come my index is so big when there’s no data in the table ?”
Of course, [...]

tanelpoder's picture

Oracle Session Snapper v3.10

Hi all, long time no see!  =8-)

Now as I’m done with the awesome Hotsos Symposium (and the training day which I delivered) and have got some rest, I’ll start publishing some of the cool things I’ve been working on over the past half a year or so.

The first is Oracle Session Snapper version 3!

There are some major improvements in Snapper 3, like ASH style session activity sampling!

When you troubleshoot a session’s performance (or instance performance) then the main things you want to know first are very very simple:

  1. Which SQL statements are being executed
  2. What are they doing, are they working on CPU or waiting.
  3. If waiting, then for what

Often this is enough for troubleshooting what’s wrong. For example, if a session is waiting for a lock, then wait interface will show you that. If a single SQL statement is taking 99% of total response time, the V$SESSION (ASH style) samples will point out the problem SQL and so on. Simple stuff.

However there are cases where you need to go beyond wait interface and use V$SESSTAT (and other) counters and even take a “screwdriver” and open Oracle up from outside by stack tracing :-)

When I wrote the first version of Snapper for my own use some 4-5 years ago I wrote it mainly having the “beyond wait interface” part in mind. So I focused on V$SESSTAT and various other counters and left the basic troubleshooting to other tools. I used to manually sample V$SESSION/V$SESSION_WAIT a few times in a row to get a rough overview of what a session was doing or some other special-purpose scripts.

However after Snapper got more popular and I started getting some feedback about it I saw the need for covering more with Snapper, not just the “beyond wait interface” part, but also the “wait interface” and “which SQL” part too.

tanelpoder's picture

A Free afternoon seminar in Singapore (24th Feb)

If you are in Singapore and have 24th Feb afternoon available then you can register and join a free Oracle performance troubleshooting seminar I’m doing in Singapore Management University’s (SMU) campus.

The seminar will be about:

  • Systematic Oracle Performance Troubleshooting
  • Identifying performance troublemakers
  • Understanding execution plans

The date is Wednesday, 24th Feb

The seminar time is from 15:30-19:00 (don’t be late)

Registration and more details are here:

Share/Bookmark

tanelpoder's picture

New versions of LatchProf and LatchProfX for latch contention troubleshooting and tuning

The LatchProf and LatchProfX scripts allow you to be more systematic with latch contention troubleshooting and tuning. No more guesswork is needed as these scripts give you exact session IDs and in this version also SQLIDs of the troublemaking applications.

You can download the new versions here:

Example output (with SQLID info) is below:

To prevent automated spam submissions leave this field empty.
Syndicate content