CHAPTER 19 ■ RMAN RESTORE AND RECOVERY
other server were intact. I was able to restore and recover the production database from these backups.
We lost about a days worth of data (between corrupt archive logs and downtime when no incoming transactions were allowed), but we were able to get the database restored and recovered about 20 hours after the initial phone call. That was a long day.
Most situations in which you need to restore and recovery will not be as bad as the one just described. However, the previous scenario does highlight the need for:
• A Backup strategy.
• A DBA with backup & recovery skills.
• A restore and recovery strategy, including a requirement to periodically test the restore and recovery.
This chapter walks you through restore and recovery using RMAN. It covers many of the common tasks you will have to perform when dealing with media failures.
Determining Media Recovery Required
The term “media recovery” refers to the need to restore files that have been lost or damaged due to failure of the underlying storage media (usually a disk of some sort). Usually, you know that media recovery is required by some sort of an error like the following:
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/u02/oracle/oradata/E64208/system01.dbf'
The error may be displayed on your screen when performing DBA tasks such as stopping and starting the database. Or you might see such an error in a trace file or the alert.log file. If you don't notice the issue right away, with a severe media failure, the database will stop processing transactions and users will start calling you.
To understand how Oracle is determining that media recovery is required, you must understand how Oracle determines that everything is okay. When Oracle shuts down normally (IMMEDIATE, TRANSACTIONAL, NORMAL), part of the shutdown process is to flush all modified blocks to disk and mark the header of each datafile with the current SCN and to update the control file with the current SCN
information.
Upon startup, Oracle checks to see if the SCN in the control file matches the SCN in the header of the datafile. If there is a match, then Oracle attempts to open the datafiles and online redo log files. If all files are available and can be opened, Oracle starts normally. The following query compares the SCN in the control file (for each datafile) with the SCN in the datafile header: SET LINES 132
COL name FORM a40
COL status FORM A8
COL file# FORM 9999
COL control_file_SCN FORM 999999999999999
COL datafile_SCN FORM 999999999999999
--
SELECT
a.name
,a.status
,a.file#
,a.checkpoint_change# control_file_SCN
,b.checkpoint_change# datafile_SCN
,CASE
510
CHAPTER 19 ■ RMAN RESTORE AND RECOVERY
WHEN ((a.checkpoint_change# - b.checkpoint_change#) = 0) THEN 'Startup Normal'
WHEN ((b.checkpoint_change#) = 0) THEN 'File Missing?'
WHEN ((a.checkpoint_change# - b.checkpoint_change#) > 0) THEN 'Media Rec. Req.'
WHEN ((a.checkpoint_change# - b.checkpoint_change#) < 0) THEN 'Old Control File'
ELSE 'what the ?'
END datafile_status
FROM v$datafile a -- control file SCN for datafile
,v$datafile_header b -- datafile header SCN
WHERE a.file# = b.file#
ORDER BY a.file#;
If the control file SCN values are greater than the datafile SCN values, then media recovery is most likely required.
Determining What to Restore
Media recovery requires that you perform manual tasks to get your database back in one piece. These tasks usually involve a combination of RESTORE and RECOVER commands. You will have to issue an RMAN
RESTORE command if your datafiles have experienced media failure. This could be because of somebody accidentally deleting files or a disk failure.
How the Process Works
When you issue the RESTORE command, RMAN will automatically determine how to extract the datafiles from any of the following available backups:
• Full database backup
• Incremental level 0 backup
• Image copy backup generated by BACKUP AS COPY command
After the files are restored from a backup, you are required to apply redo to them via the RECOVER
command. When you issue the RECOVER command, Oracle will examine the SCNs in the affected datafiles and determine whether any of them need to be recovered. If the SCN in the datafile is less than the corresponding SCN in the control file, then media recovery will be required.
Oracle will retrieve the datafile SCN and then look for the corresponding SCN in the redo stream to determine where to start the recovery process. If the starting recovery SCN is in the online redo log files, the archived redo log files are not required for recovery.
During a recovery, RMAN will automatically determine how to apply redo. First, RMAN will apply any incremental backups available that are greater than zero, such as the incremental level 1. Next, any archived redo log files on disk will be applied. If the archived redo log files do not exist on disk, RMAN
will attempt to retrieve them from a backup set.
To be able to perform a complete recovery, all of the following conditions need to be true:
• Your database is in archivelog mode.
• You have a good baseline backup of your database.
• You have any required redo that has been generated since the backup (archived redo log files, online redo log files, or incremental backups that RMAN can use for recovery instead of applying redo).
511
CHAPTER 19 ■ RMAN RESTORE AND RECOVERY
There are a wide variety of restore and recovery scenarios. How you restore and recover depends directly on your backup strategy and what files have been damaged. Listed next are the general steps to follow when facing a media failure:
1. Determine what files need to be restored.
2. Depending on the damage, set your database mode to nomount, mount, or open.
3. Use the RESTORE command to retrieve files from RMAN backups.
4. Use the RECOVER command for datafiles requiring recovery.
5. Open your database.
Your particular restore and recovery scenario may not require that all of the previous steps be performed. For example, you may just want to restore your spfile, which doesn’t require a recovery step.
The first step in a restore and recovery process is to determine what files have experienced media failure. You can usually determine what files need to be restored from the following sources:
• Error messages displayed on your screen, either from RMAN or SQL*Plus
• Alert.log file and corresponding trace files
• Data dictionary views
If you're using Oracle Database 11
g
or higher, then in addition to the previously listed methods you should consider the Data Recovery Advisor for obtaining information about the extent of a failure and corresponding corrective action.
Using Data Recovery Advisor
The Data Recovery Advisor tool was introduced in Oracle Database 11
g
. In the event of a media failure, this tool will display the details of the failure, recommend corrective actions, and it will perform the recommended actions if you specify it to do so. It's like having another set of eyes to provide feedback when in a restore and recovery situation. There are three modes to Data Recovery Advisor:
• Listing failures.
• Suggesting corrective action.
• Running commands to repair failures.
The Data Recovery Advisor is invoked from RMAN. You can think of the Data Recovery Advisor as a set of RMAN commands that can assist you when dealing with media failures.
Listing Failures
When using the Data Recovery Advisor, the LIST FAILURE command is used to display any issues with the datafiles, control files, or online redo logs:
RMAN> list failure;
If there are no detected failures, you'll see a message indicating that there are no failures. Here is some sample output indicating that there may be an issue with a datafile: 512
CHAPTER 19 ■ RMAN RESTORE AND RECOVERY
List of Database Failures
=========================
Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
662 HIGH OPEN 16-SEP-10 One or more non-system datafiles are missing The prior message doesn’t indicate which specific file may be experiencing a failure. To dig a little deeper, use the DETAIL clause:
RMAN> list failure 662 detail;
List of Database Failures
=========================
Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
662 HIGH OPEN 16-SEP-10
One or more non-system datafiles are missing
Impact: See impact for individual child failures
List of child failures for parent failure ID 662
Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
665 HIGH OPEN 16-SEP-10 Datafile 7:
'/ora01/dbfile/O11R2/users02.dbf' is missing
Impact: Some objects in tablespace USERS might be unavailable
This output details which file has experienced a failure and the nature of the problem (file missing).
Suggesting Corrective Action
The ADVISE FAILURE command gives advice about how to recover from potential problems detected by the Data Recovery Advisor. If you have multiple failures with your database, you can directly specify the failure ID to get advice on a given failure like so:
RMAN> advise failure 665;
Here is some sample output for this particular issue:
=======================
========================
------ ------------------
1 Restore and recover datafile 7
Strategy: The repair includes complete media recovery with no data loss Repair script: /ora01/app/oracle/diag/rdbms/o11r2/O11R2/hm/reco_1184243250.hm In this case, the Data Recovery Advisor created a script that can be used to potentially fix the problem. The contents of the repair script can be viewed with an operating system editor. For example: $ vi /ora01/app/oracle/diag/rdbms/o11r2/O11R2/hm/reco_1184243250.hm
513
CHAPTER 19 ■ RMAN RESTORE AND RECOVERY
Here are the contents of the script (for this particular example):
# restore and recover datafile
sql 'alter database datafile 7 offline';
restore datafile 7;
recover datafile 7;
sql 'alter database datafile 7 online';
After reviewing the script, you can decide to manually run the suggested commands, or you can have the Data Recovery Advisor run the script via the REPAIR command (see the next section for details).
Repairing Failures
If you have identified a failure and viewed the recommended advice, you can proceed to actually repairing a failure. If you want to inspect what the REPAIR FAILURE command will do without actually running the commands, use the PREVIEW clause. Before you run the command, make sure you first run the LIST FAILURE and ADVISE FAILURE commands from the same connected session. In other words, the RMAN session that you're in must run the LIST and ADVISE commands within the same session before running the REPAIR command.
RMAN> repair failure preview;
If you're satisfied with the repair suggestions, then run the REPAIR FAILURE command.
RMAN> repair failure;
You'll be prompted at this point for confirmation.
Do you really want to execute the above repair (enter YES or NO)?
Type in YES to proceed.
YES
If all goes well, you should see a final message like this:
media recovery complete, elapsed time: 00:00:02
Finished recover at 16-SEP-10
sql statement: alter database datafile 7 online
repair failure complete
■
Note
You can run the Data Recovery Advisor commands from the RMAN command prompt or from Enterprise Manager.
In this way, you can use the RMAN commands of LIST FAILURE, ADVISE FAILURE, and REPAIR
FAILURE to resolve media failures. The Data Recovery Advisor can assist with most issues that you'll run into. I did have a couple scenarios where the following recommendation was provided: 514
CHAPTER 19 ■ RMAN RESTORE AND RECOVERY
Mandatory Manual Actions
========================
Optional Manual Actions
=======================
Automated Repair Options
========================
This output indicates that you (the DBA) aren't quite out of a job yet. ;) Using RMAN to Stop/Start Oracle
You can use RMAN to stop and start your database with methods that are almost identical to those available through SQL*Plus. When performing restore and recovery operations, it’s often more convenient to stop and start your database from within RMAN. The following RMAN commands can be used to stop and start your database:
• SHUTDOWN
• STARTUP
• ALTER DATABASE
Shutting Down
The SHUTDOWN command works the same from RMAN as it does from SQL*Plus. There are four types of shutdown: ABORT, IMMEDIATE, NORMAL, and TRANSACTIONAL. I usually first attempt to use SHUTDOWN
IMMEDIATE to stop a database. If that doesn't work, don't hesitate to use SHUTDOWN ABORT. Here are some examples:
RMAN> shutdown immediate;
RMAN> shutdown abort;
If you don't specify a shutdown option, NORMAL is the default. Shutting a database down with NORMAL
is rarely viable as this mode waits for currently connected users to disconnect at their leisure. I never use NORMAL when shutting down a database.