Using ASH
The ASH report allows you to focus on short-lived SQL statements that have been recently run and may have only executed for a brief amount of time. Use the following script to generate an ASH report: SQL> @?/rdbms/admin/ashrpt
Search the output for the section labeled “Top SQL.” Here is some sample output: Sampled #
SQL ID Planhash of Executions % Activity
----------------------- -------------------- -------------------- --------------
Event % Event Top Row Source % RwSrc
------------------------------ ------- --------------------------------- -------
4k8runghhh31d 3219321046 12 51.61
CPU + Wait for CPU 51.61 HASH JOIN 12.26
select countryimp0_.COUNTRY_ID as COUNTRY_ID, countryimp0_.COUNTRY_NAME
The previous output indicates that the query is waiting for CPU resources. In this scenario, it may actually be another query that is consuming the CPU resources that is the problem.
When is the ASH report more useful than the AWR or ADDM reports? The AWR and ADDM output shows top-consuming SQL in terms of total database time. If the SQL performance problem is transient and short-lived, it may not appear on the AWR and ADDM reports. In these situations, an ASH report is more useful.
Using Statspack
If you don’t have a license to use the AWR, ADDM, and ASH reports, the free Statspack utility can help you identify poorly performing SQL statements. Run the following script as SYS to install Statspack: SQL> @?/rdbms/admin/spcreate.sql
This script creates a PERFSTAT user that owns the Statspack repository. To enable the automatic gathering of Statspack statistics, run this script:
SQL> @?/rdbms/admin/spauto.sql
After some snapshots have been gathered, you can run the following script as the PERFSTAT user to create a Statspack report:
SQL> @?/rdbms/admin/spreport.sql
Once the report is created, search for the section labeled “SQL ordered by CPU.” Here is some sample output:
SQL ordered by CPU DB/Inst: DW11/DW11 Snaps: 11-14
-> Total DB CPU (s): 107
-> Captured SQL accounts for 246.0% of Total DB CPU
-> SQL reported below exceeded 1.0% of Total DB CPU
615
CHAPTER 22 ■ DATABASE TROUBLESHOOTING
CPU CPU per Elapsd Old Time (s) Executions Exec (s) %Total Time (s) Buffer Gets Hash Value
---------- ----------- ---------- ------ ---------- --------------- ----------
254.95 4 63.74 238.1 249.74 12,811 2873951798
Module: SQL*Plus
select count(*) from dba_indexes, dba_tables
■
Tip
View the ORACLE_HOME/rdbms/admin/spdoc.txt file for Statspack documentation.
Detecting and Resolving Locking Issues
Sometimes a developer or application user will report that a process that normally takes seconds to run is now taking several minutes and doesn't appear to be doing anything. In these situations, the problem is usually one of the following:
•
•
In this scenario, I first check the alert.log to see if there are any obvious issues that have occurred recently (like a tablespace not being able to allocate another extent). If there is nothing obvious in the alert.log file, I run a SQL query to check for locking issues. The query listed here is a more sophisticated version of the lock detecting script introduced in Chapter 3. This query shows information such as the locking session SQL statement and the waiting SQL statement:
set lines 80
col blkg_user form a10
col blkg_machine form a10
col blkg_sid form 99999999
col wait_user form a10
col wait_machine form a10
col wait_sid form 9999999
col obj_own form a10
col obj_name form a10
col blkg_sql form a50
col wait_sql form a50
--
select
s1.username blkg_user
,s1.machine blkg_machine
,s1.sid blkg_sid
,s1.serial# blkg_serialnum
,s1.process blkg_OS_PID
,substr(b1.sql_text,1,50) blkg_sql
,chr(10)
,s2.username wait_user
,s2.machine wait_machine
,s2.sid wait_sid
616
CHAPTER 22 ■ DATABASE TROUBLESHOOTING
,s2.serial# wait_serialnum
,s2.process wait_OS_PID
,substr(w1.sql_text,1,50) wait_sql
,lo.object_id blkd_obj_id
,do.owner obj_own
,do.object_name obj_name
from v$lock l1
,v$session s1
,v$lock l2
,v$session s2
,v$locked_object lo
,v$sqlarea b1
,v$sqlarea w1
,dba_objects do
where s1.sid = l1.sid
and s2.sid = l2.sid
and l1.id1 = l2.id1
and s1.sid = lo.session_id
and lo.object_id = do.object_id
and l1.block = 1
and s1.prev_sql_addr = b1.address
and s2.sql_address = w1.address
and l2.request > 0;
The output from this query does not fit well on one page. When running this query, you will have to format it so that it fits within the size of your terminal. Here is some sample output indicating that the INV table is locked by the STAR2 user and that the CIA_SEL user is waiting for the lock to be released: BLKG_USER BLKG_MACHI BLKG_SID BLKG_SERIALNUM BLKG_OS_PID
---------- ---------- --------- -------------- ------------
BLKG_SQL C WAIT_USER WAIT_MACHI
-------------------------------------------------- - ---------- ----------
STAR2 dwdb 1084 265 3153
update inv set inv_id=5 where inv_id = 7 CIA_SEL xengdb 1086 222 21436
WAIT_SID WAIT_SERIALNUM WAIT_OS_PID
-------- -------------- ------------
WAIT_SQL BLKD_OBJ_ID OBJ_OWN
-------------------------------------------------- ----------- ----------
OBJ_NAME
----------
update star2.inv set inv_id=3 where inv_id=7 150553 STAR2
INV
This situation is typical when applications don't explicitly issue a COMMIT or ROLLBACK at appropriate times in the code. This leaves a lock on a row and prevents one transaction from continuing until the lock is released. In this scenario, you can try to locate the user who is blocking the transaction and see if they need to push a button on the screen that says something like "commit your changes." If that's not possible, you can manually kill one of the sessions. Keep in mind that terminating a session may have unforeseen side effects (like rolling back data that a user thought was committed).
If you decide to kill one of the sessions, you need to identify the SID and serial number of the session you want to terminate. Once identified, use the ALTER SYSTEM KILL SESSION statement to 617
CHAPTER 22 ■ DATABASE TROUBLESHOOTING
terminate a user session. In this example, I decide to kill the lock held by the STAR2 user with the SID of 1072 and a serial number of 29, so from a DBA privileged account I run the following statement: SQL> alter system kill session '1084,265';
Again, be careful when killing sessions. Ensure that you know the impact of killing a session and thereby rolling back any active transactions currently open in that session.
The other way to kill a session is to use an operating system command such as kill. From the prior output, you can identify the operating system processes from the BLKG_OS_PID column and WAIT_OS_PID
column. Before you terminate a process from the operating system, ensure that it isn't a critical process.
For this example, to terminate the blocking OS process, first check the blocking process ID: $ ps -ef | grep 3153
Here is some sample output:
oracle 3153 1690 0 10:40:30 pts/1 0:00 sqlplus star2....
Next use the kill command as shown:
$ kill -9 3153
The kill command will unceremoniously terminate a process. Any open transactions associated with the process will be rolled back by the Oracle process monitor.
Resolving Open Cursor Issues
The OPEN_CURSORS initialization parameter determines the maximum number of cursors a session can have open. This setting is per each session. The default value of 50 is usually too low for any application.
When an application exceeds the number of open cursors allowed, the following error is thrown: ORA-01000: maximum open cursors exceeded
Usually the prior error is encountered when:
• OPEN_CURSORS initialization parameter is set too low.
• Developers write code that doesn't close cursors properly.
To investigate this issue, first determine the current setting of the parameter: SQL> show parameter open_cursors;
If the value is less than 300, consider setting it higher. I typically set this value to 1000 for busy OLTP
systems. You can dynamically modify this value while your database is open as shown: SQL> alter system set open_cursors=1000;
If you're using an spfile, consider making the change both in memory and in the spfile at the same time:
SQL> alter system set open_cursors=1000 scope=both;
After setting OPEN_CURSORS to a higher value, if the application still continues to exceed the maximum value, you probably have an issue with code that is not properly closing cursors. Run a query such as the following to determine the number of open cursors each session has opened: 618
CHAPTER 22 ■ DATABASE TROUBLESHOOTING
select
a.value
,c.username
,c.machine
,c.sid
,c.serial#
from v$sesstat a
,v$statname b
,v$session c
where a.statistic# = b.statistic#
and c.sid = a.sid
and b.name = 'opened cursors current'
and a.value != 0
and c.username IS NOT NULL
order by 1,2;
■
Tip
I recommend that you query V$SESSION instead of V$OPEN_CURSOR to determine the number of open cursors. V$SESSION provides a more accurate number of the cursors currently open.
If you work in an environment that has thousands of connections to the database, you may want to view only the top cursor-consuming sessions. The following query uses an inline view and the pseudocolumn ROWNUM to display the top twenty values:
select * from (
select
a.value
,c.username
,c.machine
,c.sid
,c.serial#
from v$sesstat a
,v$statname b
,v$session c
where a.statistic# = b.statistic#
and c.sid = a.sid
and b.name = 'opened cursors current'
and a.value != 0
and c.username IS NOT NULL
order by 1 desc,2)
where rownum < 21;
If a single session has over 1000 open cursors, there is probably something in the code that is not closing a cursor. When this limit is reached, somebody should inspect the application code to determine if a cursor is not being closed.
619
CHAPTER 22 ■ DATABASE TROUBLESHOOTING
Troubleshooting Undo Tablespace Issues
Problems with the undo tablespace are usually of the following nature:
• ORA-01555: snapshot too old
• ORA-30036: unable to extend segment by ... in undo tablespace 'UNDOTBS1'
The prior listed errors can be caused by many different issues such as the undo tablespace not being sized correctly or poorly written SQL or PL/SQL code.
Determining if Undo is Correctly Sized
Suppose that you have a long-running SQL statement that is throwing an ORA-01555 “snapshot too old”
error and you want to determine if adding space to the undo tablespace might help alleviate the issue.
Run this next query to identify potential issues with your undo tablespace. The query checks for issues with the undo tablespace that have occurred within the last day:
select
to_char(begin_time,'MM-DD-YYYY HH24:MI') begin_time
,ssolderrcnt ORA_01555_cnt
,nospaceerrcnt no_space_cnt
,txncount max_num_txns
,maxquerylen max_query_len
,expiredblks blck_in_expired
from v$undostat
where begin_time > sysdate - 1
order by begin_time;
Here is some sample output. Part of the output has been omitted to fit this on the page: BEGIN_TIME ORA_01555_CNT NO_SPACE_CNT MAX_NUM_TXNS MAX_QUERY_LEN
---------------- ------------- ------------ ------------ -------------
07-20-2009 18:10 0 0 249 0
07-20-2009 18:20 0 0 290 0
07-20-2009 18:30 0 0 244 0
07-20-2009 18:40 0 0 179 0
The ORA_01555_CNT column indicates the number of times your database has encountered the ORA-01555 “snapshot too old” error. If this column reports a non-zero value, you need to do one or more of the following:
• Ensure that code does not contain COMMIT statements within cursor loops.
• Tune the SQL statement throwing the error so that it runs faster.
• Ensure that you have good statistics (so your SQL runs efficiently).
• Increase the UNDO_RETENTION initialization parameter.
The NO_SPACE_CNT column displays the number of times space was requested in the undo tablespace but none was to be found. If the NO_SPACE_CNT is reporting a non-zero value, you may need to add more space to your undo tablespace.
There is a maximum of four days’ worth of information stored in the V$UNDOSTAT view. The statistics are gathered every ten minutes for a maximum of 576 rows in the table. If you’ve stopped and started 620