Oracle DBA Tips Corner |
Log Gap Detection and Resolution
by Jeff Hunter, Sr. Database Administrator
Contents
Overview
Primary Database Heartbeat
Determining if an Archive Gap Exists
We start with the following assumptions:
Now, let's simulate a network failure by shutting down the Oracle TNS Listener process
and physical standby database on vmlinux4.idevelopment.info:
We can then further investigate the trace file (testdb_arc1_3495.trc)
written to the alert.log above:
Now let's perform a few log switches from the primary database:
Finally, let's query the V$ARCHIVE_GAP view from the physical standby database
to determine if there are any gaps:
After identifying a gap (as shown above), the DBA will need to query
the primary database to locate the archived redo logs on the primary
database. The following query assumes the local archive destination
on the primary database is LOG_ARCHIVE_DEST_1:
Copy the above redo log files to the physical standby database and register them
using the ALTER DATABASE REGISTER LOGFILE ... SQL statement on
the physical standby database. For example:
Using FAL_CLIENT and FAL_SERVER
FAL_CLIENT and FAL_SERVER are initialization parameters used to
configure log gap detection and resolution at the standby database side of a
physical database configuration. This functionality is provided by log apply
services and is used by the physical standby database to manage the detection
and resolution of archived redo logs.
FAL server is a background process that is generally located on the primary database
server (however it can also be found on another standby database). FAL server is
responsible for servicing incoming requests from the FAL client.
Further Reading
All articles, scripts and material located at the Internet address of http://www.idevelopment.info is the copyright of Jeffrey M. Hunter
and is protected under copyright laws of the United States. This document may not be hosted on any other site without my express,
prior, written permission. Application to host any of the material elsewhere can be made by contacting me at jhunter@idevelopment.info.
I have made every effort and taken great care in making sure that the material included on my web site is technically accurate,
but I disclaim any and all responsibility for any loss, damage or destruction of data or any other property which may arise from
relying on it. I will in no case be liable for any monetary damages arising from such loss, damage or destruction.
Archive gaps are simply a range of archived redo logs that were created at a time when the
standby database was not available to receive them as they were being generated by the
primary database. These archive gaps occur most often during network unavailability
between the primary and standby database. During network outages like this, the standby
database fails to receive any redo data from the primary database. When the network failure
has been resolved, automatic transmission of redo data from the primary to the standby
database is resumed. The missing archived redo logs signify the gap.
In general, it is the primary database that first discovers there is a gap. Every one
minute, the primary database polls all of its standby databases to determine if there
is a gap in the sequence of archived redo logs. This polling between the primary and
standby database is often referred to as the heartbeat. This heartbeat is performed
by the primary database serially. The DBA can always determine if there is a gap
in the archived redo logs by querying the V$ARCHIVE_GAP view
(from the physical standby database) as described
in the next section.
As mentioned in the previous section, the DBA can determine if
there is a there is a gap in the archived redo logs by querying
the V$ARCHIVE_GAP view from the physical standby database. Let's
walk through a quick example of how to manually check and resolve
a gap in the archived redo logs.
With the above assumptions, perform a query
of the V$ARCHIVE_GAP view from the physical standby database:
SQL> select * from v$archive_gap;
no rows selected
From the output of the above query, we can conclude that there are no gaps
in the archived redo logs on the physical standby database!
$ lsnrctl stop
$ sqlplus "/ as sysdba" <<EOF
> shutdown abort
> EOF
SQL> ORACLE instance shut down.
At this point, the physical standby database is completely down and unavailable to
the primary database. One of the first things to notice is a series of RFS
errors in the alert.log for the primary database:
Errors in alert.log from the Primary Database
...
Sat Jul 22 23:19:04 2006
ARC1: Evaluating archive log 3 thread 1 sequence 16
ARC1: Beginning to archive log 3 thread 1 sequence 16
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
Creating archive destination LOG_ARCHIVE_DEST_1: '/u02/oraarchive/TESTDB/arch_t1_s16.dbf'
ARC1: Completed archiving log 3 thread 1 sequence 16
Sat Jul 22 23:21:25 2006
Errors in file /u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc:
ORA-12541: TNS:no listener
Sat Jul 22 23:22:25 2006
Errors in file /u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc:
ORA-12541: TNS:no listener
Sat Jul 22 23:23:25 2006
Errors in file /u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc:
ORA-12541: TNS:no listener
...
Errors in testdb_arc1_3495.trc from the Primary Database
/u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc
Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production
ORACLE_HOME = /u01/app/oracle/product/9.2.0
System name: Linux
Node name: vmlinux3
Release: 2.6.9-22.EL
Version: #1 Sat Oct 8 17:48:27 CDT 2005
Machine: i686
Instance name: TESTDB
Redo thread mounted by this instance: 0 SQL> alter system switch logfile;
System altered.
SQL> alter system switch logfile;
System altered.
SQL> alter system switch logfile;
System altered.
Next, bring up the Oracle TNS Listener process and mount the physical standby database:
$ lsnrctl start
$ sqlplus "/ as sysdba"
SQL> startup nomount
SQL> alter database mount standby database;
SQL> alter database recover managed standby database disconnect from session;
After a minute has passed, the primary database (the ARCH1 process) will
wake up and attempt to ping the physical standby database. Once it is
determined that the physical standby database is mounted (and the TNS
Listener process is up), it will start to push any archived redo
logs to the standby database. This can be see in the alert.log from the
primary database:
RFS Process Recovery in alert.log from the Primary Database
...
Sat Jul 22 23:28:26 2006
ARC1: Begin FAL archive (thread 1 sequence 17 destination TESTDB_VMLINUX4)
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
ARC1: Complete FAL archive (thread 1 sequence 17 destination TESTDB_VMLINUX4)
ARC1: Begin FAL archive (thread 1 sequence 18 destination TESTDB_VMLINUX4)
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
ARC1: Complete FAL archive (thread 1 sequence 18 destination TESTDB_VMLINUX4)
ARC1: Begin FAL archive (thread 1 sequence 19 destination TESTDB_VMLINUX4)
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
ARC1: Complete FAL archive (thread 1 sequence 19 destination TESTDB_VMLINUX4)
...SQL> select * from v$archive_gap;
no rows selected
As we can see, there are no gaps in the archive redo logs being reported!
When the DBA queries the V$ARCHIVE_GAP view and has a record returned, this indicates
a gap in the archived redo logs as illustrated below and may require manual intervention
by the DBA:
SQL> select * from v$archive_gap;
THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
-------- -------------- --------------
1 24 28
From the output above, the physical standby database is currently missing logs from
sequence 24 to sequence 28 for thread 1. Note that this view only returns the next
gap that is currently blocking managed recovery from continuing. After resolving
the identified gap and starting managed recovery, the DBA should query
the V$ARCHIVE_GAP view again on the physical standby database to
determine the next (if any) gap sequence. This process should be repeated
until there are no more gaps.
SELECT name
FROM v$archived_log
WHERE thread# = 1
AND dest_id = 1
AND sequence# BETWEEN 24 and 28;
NAME
--------------------------------------
/u02/oraarchive/TESTDB/arch_t1_s24.dbf
/u02/oraarchive/TESTDB/arch_t1_s25.dbf
/u02/oraarchive/TESTDB/arch_t1_s26.dbf
/u02/oraarchive/TESTDB/arch_t1_s27.dbf
/u02/oraarchive/TESTDB/arch_t1_s28.dbf
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s24.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s25.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s26.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s27.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s28.dbf';
After the redo logs have been registered on the physical standby database, the DBA
can restart the managed recovery operations. For example, to put the physical
standby database into automatic recovery managed mode:
SQL> alter database recover managed standby database disconnect from session;
As mentioned throughout this article, we can see that Data Guard provides gap detection and resolution
automatically by functionality provided in the primary database. The primary database
polls its standby databases using a heartbeat every one minute. No extra configuration
settings are required and no intervention is needed by the DBA to detect and resolve
these gaps. In addition to this built-in functionality; however, it is also possible
to configure log apply services to take part in automatically resolving archive
gaps as they occur on the physical standby database side.
Note that FAL_CLIENT and FAL_SERVER only need to be defined in the initialization
parameter file for the standby database(s). It is possible; however, to define these
two parameters in the initialization parameter for the primary database server to
ease the amount of work that would need to be performed if the primary database were
required to transition its role.
FAL_CLIENT=<net_service_name_of_standby_database>
FAL_SERVER=<net_service_name_of_primary_database>
Note that the value provided for FAL_SERVER is not limited to defining only
one FAL server; it can define multiple FAL servers (possible a FAL server that
is running on another standby database) that are separated by commas as in
the following example:
FAL_SERVER=primary_db,standby_db2
Additional information on Oracle Data Guard can be found
in the Data Guard Concepts and Administration guide
which is available from the Oracle Documentation Library:
Data Guard Concepts and Administration - (Oracle9i Release 2)
Data Guard Concepts and Administration - (Oracle10g Release 2)
Wednesday, 02-Jan-2008 16:49:19 EST
Page Count: 20985