My Journey as Oracle DBA+DevOps Engineer: Backup and Recovery Instance Architecture

The Oracle instance consists of the system global area (SGA), which is the memory allocated to the Oracle instance, and a set of Oracle processes called the background processes. The Oracle processes start when you start the instance and keep running as long as the instance is alive. Each of the Oracle background processes is in charge of a specific activity, such as writing
changed data to the datafiles, cleaning up after disconnected user sessions, and so on.
We’ll briefly review the key Oracle background processes that perform critical backup and recovery–related tasks, which are the checkpoint process, the log writer process, and the archiver process.
The Checkpoint Process
The checkpoint process does three things:
• It signals the database write process (DBWn) at each checkpoint.
• It updates the datafile headers with the checkpoint information.
• It updates the control files with the checkpoint information.
The Log Writer Process
Oracle’s online redo log files record all changes made to the database. Oracle uses a “writeahead” protocol, meaning the logs are written to before the datafiles are. Therefore, it is critical to always protect the online logs against loss by ensuring they are multiplexed. Any changes made to the database are first recorded in the redo log buffer, which is part of the SGA.
Redo log files come into play when a database instance fails or crashes. Upon restart, the instance will read the redo log files looking for any committed changes that need to be applied to the datafiles. Remember, when you commit, Oracle ensures that what you are committing has first been written to the redo log files before these changes are recorded in the actual datafiles. The redo log is the ultimate source of truth for all changes to the data in an Oracle database, since an instance failure before the changes are written to the datafiles means that the changes are only in the redo log files but not in the datafiles.
The log writer (LGWR) process is responsible for transferring the contents of the redo log buffer to the online redo log files. The log writer writes to the online redo files under the following circumstances:
• At each commit
• Every three seconds
• When the redo log buffer is one-third full
The important thing to remember here is that the log writer process writes before the database writer does, because of the write-ahead protocol. Data changes aren’t necessarily written to datafiles when you commit a transaction, but they are always written to the redo log.
The Archiver Process
The archiver (ARCn) is an optional background process and is in charge of archiving the filled online redo log files, before they can be overwritten by new data. The archiver background process is used only if you’re running your database in archivelog mode.

Physical Database Structures Used in Recovering Data
You need to deal with four major physical database structures during a database recovery:
• Datafiles
• Redo logs (archived and online)
• Control files
• Undo records
In a basic database recovery situation, you’d need to first restore datafiles by using backups (from a past period, of course).Once the restoration of the datafiles is completed, you issue the recover command, which results in the database rolling forward all committed data and thus bringing the database up-to-date. The database also rolls back any uncommitted data that’s recorded in the undo segments that are part of the undo tablespace. The database server automatically performs the rollback of uncommitted data by using undo records in the undo tablespace to undo all uncommitted changes that were applied to the datafiles from the
redo logs during the recovery process. This rolling back of uncommitted data takes place by using the information about all the changes made since the last database start-up. Oracle records all changes made to the database in files called the online redo log files. Since Oracle uses a round-robin method of writing the online redo log members, it is critical that you save the filled online redo logs before they are written. The process of saving the filled redo log files is called archiving, and the saved redo log files are termed archived redo log files. A media recovery process uses both the archived redo log files and the online redo log files.
The control file is essential for the Oracle instance to function, because it contains critical information concerning tablespace and datafile records, checkpoints, redo log threads in the current online redo log, log sequence numbers, and so on.
RMAN lets you back up all the files you need for a database recovery, including datafiles, control files, and archived redo logs. RMAN also lets you make image copies of both datafiles and control files, in addition to the standard RMAN-formatted backup pieces. You should never back up online redo log files; instead, always duplex these files to protect against the loss of an online redo log.
HOW TO DETECT UNDO:There are some views that show information related to undo activity:

V$UNDOSTAT: histogram-like view that shows statistics for 10-minute intervals.
V$TRANSACTION: present time view providing information on current transactions.
V$SESSTAT: individual session statistics, which includes one for undo usage.
V$UNDOSTAT will provide who did hint, recording the longest running query for that 10-interval, through the MAXQUERYID column which may be linked to V$SQL and use columns PARSING_USER_ID or PARSING_SCHEMA_NAME the get a grip on the suspect.
V$TRANSACTION linked with V$SESSION will show current used undo blocks for ongoing transactions. This query may help:

SELECT a.sid, a.username, b.used_urec, b.used_ublk

FROM v$session a, v$transaction b

WHERE a.saddr = b.ses_addr

ORDER BY b.used_ublk DESC

V$SESSTAT provides another view, which uses the undo kind of view, but we must avoid getting lost in the maze of Oracle statistics and focusing on just one: Undo change vector size, which will accumulate the bytes of undo used during the session lifetime. Following query is designed to pinpoint who is having a high undo activity.

SELECT a.sid, b.name, a.value

FROM v$sesstat a, v$statname b

WHERE a.statistic# = b.statistic#

AND a.statistic# = 176<– Which stands for undo change vector size

ORDER BY a.value DESC

HOW TO DETECT REDO:To find sessions generating lots of redo, you can use either of the following methods. Both methods examine the amount of undo generated. When a transaction generates undo, it will automatically generate redo as well. The methods are: 1) Query V$SESS_IO. This view contains the column BLOCK_CHANGES which indicates how much blocks have been changed by the session. High values indicate a session generating lots of redo.

The query you can use is:

SELECT s.sid, s.serial#, s.username, s.program, i.block_changes

FROM v$session s, v$sess_io i

WHERE s.sid = i.sid

ORDER BY 5 desc, 1, 2, 3, 4;

Run the query multiple times and examine the delta between each occurrence of BLOCK_CHANGES. Large deltas indicate high redo generation by the session.
Query V$TRANSACTION. These view contains information about the amount of undo blocks and undo records accessed by the transaction (as found in the USED_UBLK and USED_UREC columns).

The query you can use is:

SELECT s.sid, s.serial#, s.username, s.program, t.used_ublk, t.used_urec

FROM v$session s, v$transaction t

WHERE s.taddr = t.addr

ORDER BY 5 desc, 6 desc, 1, 2, 3, 4;

Run the query multiple times and examine the delta between each occurrence of USED_UBLK and USED_UREC. Large deltas indicate high redo generation by the session. You use the first query when you need to check for programs generating lots of redo when these programs activate more than one transaction. The latter query can be used to find out which particular transactions are generating redo.

Archivelog and Noarchivelog Mode of Operation
You can operate your Oracle database in either archivelog mode or noarchivelog mode. In noarchivelog mode, Oracle will overwrite the filled online redo logs, instead of archiving (saving) the online redo logs. In this mode, you’re protected only from instance failures, such as those caused by a power failure, for example, but not from a media failure. Thus, if there is a media failure, such as a damaged disk drive, the changes that were overwritten are gone forever, and the database won’t be able to access those data modifications to recover the database up to the current point in time. The transactions made since the last backup are lost forever, and you can restore the database only to the point of the last backup you made.

If you are running your database in noarchivelog mode and you happen to lose a datafile, for example, you follow these steps to get back to work again:
1. If the instance isn’t already shut down, first shut it down.
2. Restore the entire database (datafiles and control files) from the backups.
3. Restart the database by using the startup (open mode) command.
4. Users lose any data that was changed or newly entered in the database since you took the backup that was just restored. You can enter the data if you have a source, or you’re going to have a data loss situation.

If you are running a production database—or if you want to make sure that all the data changes made to any database, for that matter, are always protected—you must operate your database in archivelog mode. Only a database running in archivelog mode can recover from both instance and media failures. You can’t perform a media recovery on a database running in noarchivelog mode.

If you’re running the database in noarchivelog mode, remember that you can make a whole-database backup only after first shutting the database down. You can’t make any online tablespace backups in such a database. A database in noarchivelog mode also can’t use the tablespace point-in-time recovery technique. Make sure you take frequent whole-database backups if an important database is running in noarchivelog mode for some reason.

My Journey as Oracle DBA+DevOps Engineer

Wednesday, November 5, 2014

Backup and Recovery Instance Architecture–Undo Redo

No comments:

Post a Comment