DBA Tips Archive for Oracle

  


Log Gap Detection and Resolution

by Jeff Hunter, Sr. Database Administrator

Contents

  Introduction
  Primary Database Heartbeat
  Determining if an Archive Gap Exists
  Using FAL_CLIENT and FAL_SERVER
  Further Reading
  About the Author

Introduction

An archive gap is simply a range of archived redo logs that were created at a time when the standby database was unavailable to receive them as they were being generated by the primary database. Archive gaps occur most often during periods of network unavailability between the primary and standby database. During a network outage like this, the standby database fails to receive any redo data from the primary database. When the network failure has been resolved, automatic transmission of redo data from the primary to the standby database resumes. The missing archived redo logs constitute the gap.

Primary Database Heartbeat

In general, it is the primary database that first discovers there is a gap. Every minute, the primary database polls each of its standby databases to determine if there is a gap in the sequence of archived redo logs. This polling between the primary and standby databases is often referred to as the heartbeat, and it is performed by the primary database serially (one standby at a time). The DBA can always determine if there is a gap in the archived redo logs by querying the V$ARCHIVE_GAP view (from the physical standby database) as described in the next section.

Determining if an Archive Gap Exists

As mentioned in the previous section, the DBA can determine if there is a gap in the archived redo logs by querying the V$ARCHIVE_GAP view from the physical standby database. Let's walk through a quick example of how to manually check for and resolve a gap in the archived redo logs.

We start with the following assumptions:

  1. The Data Guard configuration consists of a physical primary database and one physical standby database.
  2. The Data Guard configuration is set for Maximum Performance.
  3. The name of the physical primary database is TESTDB.IDEVELOPMENT.INFO and resides on the node vmlinux3.idevelopment.info.
  4. The name of the physical standby database is TESTDB.IDEVELOPMENT.INFO and resides on the node vmlinux4.idevelopment.info.
  5. The primary database is configured to send redo to the physical standby database by defining: log_archive_dest_2='service=TESTDB_VMLINUX4 optional reopen=15'
  6. Neither FAL_CLIENT nor FAL_SERVER is defined on the physical standby database.
  7. The standby database is in managed recovery mode and is current (all archived redo logs have been applied) with the primary database. The current log sequence on the primary database is 16.
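
For reference, the archive destination parameters on the primary database in this configuration might look like the following sketch. The log_archive_dest_2 value is taken verbatim from assumption 5; the log_archive_dest_1 location matches the archive file path that appears later in the primary's alert.log, and the dest_state parameters are shown with typical values:

```
# Primary database (TESTDB on vmlinux3) - archive destination parameters
log_archive_dest_1='location=/u02/oraarchive/TESTDB'
log_archive_dest_2='service=TESTDB_VMLINUX4 optional reopen=15'
log_archive_dest_state_1=enable
log_archive_dest_state_2=enable
```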

With the above assumptions, perform a query of the V$ARCHIVE_GAP view from the physical standby database:


SQL> select * from v$archive_gap;

no rows selected

From the output of the above query, we can conclude that there are no gaps in the archived redo logs on the physical standby database!

Now, let's simulate a network failure by shutting down the Oracle TNS Listener process and physical standby database on vmlinux4.idevelopment.info:


$ lsnrctl stop

$ sqlplus "/ as sysdba" <<EOF
> shutdown abort
> EOF

SQL> ORACLE instance shut down.

At this point, the physical standby database is completely down and unavailable to the primary database. One of the first things to notice is a series of RFS errors in the alert.log for the primary database.

Errors in alert.log from the Primary Database


...
Sat Jul 22 23:19:04 2006
ARC1: Evaluating archive   log 3 thread 1 sequence 16
ARC1: Beginning to archive log 3 thread 1 sequence 16
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
Creating archive destination LOG_ARCHIVE_DEST_1: '/u02/oraarchive/TESTDB/arch_t1_s16.dbf'
ARC1: Completed archiving  log 3 thread 1 sequence 16
Sat Jul 22 23:21:25 2006
Errors in file /u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc:
ORA-12541: TNS:no listener
Sat Jul 22 23:22:25 2006
Errors in file /u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc:
ORA-12541: TNS:no listener
Sat Jul 22 23:23:25 2006
Errors in file /u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc:
ORA-12541: TNS:no listener
...

We can then further investigate the trace file (testdb_arc1_3495.trc) referenced in the alert.log entries above.

Errors in testdb_arc1_3495.trc from the Primary Database


/u01/app/oracle/admin/TESTDB/bdump/testdb_arc1_3495.trc
Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production
ORACLE_HOME = /u01/app/oracle/product/9.2.0
System name:    Linux
Node name:      vmlinux3
Release:        2.6.9-22.EL
Version:        #1 Sat Oct 8 17:48:27 CDT 2005
Machine:        i686
Instance name: TESTDB
Redo thread mounted by this instance: 0
Oracle process number: 11
Unix process pid: 3495, image: oracle@vmlinux3 (ARC1)

*** SESSION ID:(10.1) 2006-07-22 23:18:25.890
Destination LOG_ARCHIVE_DEST_2 is in CLUSTER CONSISTENT mode
Destination LOG_ARCHIVE_DEST_2 is in MAXIMUM PERFORMANCE mode
*** 2006-07-22 23:20:25.902
RFS network connection lost at host 'TESTDB_VMLINUX4'
Fail to ping standby 'TESTDB_VMLINUX4', error = 3113
Error 3113 when pinging standby TESTDB_VMLINUX4.
*** 2006-07-22 23:20:25.903
kcrrfail: dest:2 err:3113 force:0
*** 2006-07-22 23:21:25.908
Error 12541 connecting to destination LOG_ARCHIVE_DEST_2 standby host 'TESTDB_VMLINUX4'
Error 12541 attaching to destination LOG_ARCHIVE_DEST_2 standby host 'TESTDB_VMLINUX4'
Heartbeat failed to connect to standby 'TESTDB_VMLINUX4'. Error is 12541.
*** 2006-07-22 23:21:25.909
kcrrfail: dest:2 err:12541 force:0
ORA-12541: TNS:no listener
...

Now let's perform a few log switches from the primary database:


SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

Next, bring up the Oracle TNS Listener process and mount the physical standby database:


$ lsnrctl start

$ sqlplus "/ as sysdba"

SQL> startup nomount

SQL> alter database mount standby database;

SQL> alter database recover managed standby database disconnect from session;

After a minute has passed, the primary database (the ARC1 process) will wake up and attempt to ping the physical standby database. Once it determines that the physical standby database is mounted (and the TNS Listener process is up), it will start to push any missing archived redo logs to the standby database. This can be seen in the alert.log from the primary database:

RFS Process Recovery in alert.log from the Primary Database


...
Sat Jul 22 23:28:26 2006
ARC1: Begin FAL archive (thread 1 sequence 17 destination TESTDB_VMLINUX4)
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
ARC1: Complete FAL archive (thread 1 sequence 17 destination TESTDB_VMLINUX4)
ARC1: Begin FAL archive (thread 1 sequence 18 destination TESTDB_VMLINUX4)
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
ARC1: Complete FAL archive (thread 1 sequence 18 destination TESTDB_VMLINUX4)
ARC1: Begin FAL archive (thread 1 sequence 19 destination TESTDB_VMLINUX4)
Creating archive destination LOG_ARCHIVE_DEST_2: 'TESTDB_VMLINUX4'
ARC1: Complete FAL archive (thread 1 sequence 19 destination TESTDB_VMLINUX4)
...

Finally, let's query the V$ARCHIVE_GAP view from the physical standby database to determine if there are any gaps:


SQL> select * from v$archive_gap;

no rows selected

As we can see, there are no gaps in the archived redo logs being reported!

 

When a query of the V$ARCHIVE_GAP view returns a record, a gap exists in the archived redo logs, as illustrated below, and may require manual intervention by the DBA:

SQL> select * from v$archive_gap;

THREAD#  LOW_SEQUENCE#  HIGH_SEQUENCE#
-------- -------------- --------------
       1            24              28

From the output above, the physical standby database is currently missing logs from sequence 24 to sequence 28 for thread 1. Note that this view only returns the next gap that is currently blocking managed recovery from continuing. After resolving the identified gap and starting managed recovery, the DBA should query the V$ARCHIVE_GAP view again on the physical standby database to determine the next (if any) gap sequence. This process should be repeated until there are no more gaps.
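
To make the repeat-until-clear procedure concrete, the following Python sketch (purely illustrative, not Oracle code) mimics the semantics of V$ARCHIVE_GAP, which reports only the first contiguous run of missing sequences, and loops until no gap remains. The sequence numbers here are hypothetical:

```python
def next_gap(received, current_seq):
    """Return (low, high) for the first contiguous run of missing log
    sequence numbers, or None if no gap exists -- like V$ARCHIVE_GAP,
    which reports only the next gap blocking managed recovery."""
    have = set(received)
    gap = []
    for seq in range(1, current_seq + 1):
        if seq in have:
            if gap:
                break              # end of the first (blocking) gap
        else:
            gap.append(seq)
    return (gap[0], gap[-1]) if gap else None

# Standby has applied 1-9, 15-16 and 20: two gaps exist (10-14 and 17-19),
# but each "query" only reveals the next one.
received = list(range(1, 10)) + [15, 16, 20]
resolved = []
while (gap := next_gap(received, 20)) is not None:
    resolved.append(gap)
    received.extend(range(gap[0], gap[1] + 1))  # "copy and register" the logs
print(resolved)  # -> [(10, 14), (17, 19)]
```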

After identifying a gap (as shown above), the DBA will need to query the primary database to locate the archived redo logs on the primary database. The following query assumes the local archive destination on the primary database is LOG_ARCHIVE_DEST_1:

SELECT name
FROM v$archived_log
WHERE thread# = 1
  AND dest_id = 1
  AND sequence# BETWEEN 24 AND 28;

NAME
--------------------------------------
/u02/oraarchive/TESTDB/arch_t1_s24.dbf
/u02/oraarchive/TESTDB/arch_t1_s25.dbf
/u02/oraarchive/TESTDB/arch_t1_s26.dbf
/u02/oraarchive/TESTDB/arch_t1_s27.dbf
/u02/oraarchive/TESTDB/arch_t1_s28.dbf

Copy the above redo log files to the physical standby database and register them using the ALTER DATABASE REGISTER LOGFILE ... SQL statement on the physical standby database. For example:

SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s24.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s25.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s26.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s27.dbf';
SQL> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s28.dbf';
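
When a gap spans many sequences, typing these statements by hand is tedious. The following Python sketch generates the copy commands and REGISTER statements for a given gap, using the archive naming scheme and hosts from this example. The scp destination directory and the oracle account are assumptions that must be adapted to your environment:

```python
# Hypothetical helper: build the copy and register steps for a gap.
# Paths and hosts follow this article's example; adapt to your site.
ARCH_DIR = "/u02/oraarchive/TESTDB"
STANDBY_HOST = "vmlinux4.idevelopment.info"

def gap_commands(thread, low_seq, high_seq):
    copies, registers = [], []
    for seq in range(low_seq, high_seq + 1):
        logfile = f"{ARCH_DIR}/arch_t{thread}_s{seq}.dbf"
        copies.append(f"scp {logfile} oracle@{STANDBY_HOST}:{ARCH_DIR}/")
        registers.append(f"ALTER DATABASE REGISTER LOGFILE '{logfile}';")
    return copies, registers

copies, registers = gap_commands(1, 24, 28)
print(registers[0])
# -> ALTER DATABASE REGISTER LOGFILE '/u02/oraarchive/TESTDB/arch_t1_s24.dbf';
```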

After the redo logs have been registered on the physical standby database, the DBA can restart the managed recovery operations. For example, to put the physical standby database into automatic recovery managed mode:

SQL> alter database recover managed standby database disconnect from session;

Using FAL_CLIENT and FAL_SERVER

As shown throughout this article, Data Guard provides gap detection and resolution automatically through functionality built into the primary database. The primary database polls its standby databases with a heartbeat every minute. No extra configuration settings are required, and no intervention is needed by the DBA to detect and resolve these gaps. In addition to this built-in functionality, however, it is also possible to configure log apply services to take part in automatically resolving archive gaps as they occur, from the physical standby database side.

FAL_CLIENT and FAL_SERVER are initialization parameters used to configure log gap detection and resolution on the standby database side of a physical standby configuration. This functionality is provided by log apply services and is used by the physical standby database to manage the detection and resolution of gaps in the archived redo logs.

 

Note that FAL_CLIENT and FAL_SERVER only need to be defined in the initialization parameter file for the standby database(s). It is possible, however, to define these two parameters in the initialization parameter file for the primary database as well, to reduce the work required if the primary database were ever to transition to the standby role.

The FAL server is a background process, generally located on the primary database server (although it can also run on another standby database), that is responsible for servicing incoming fetch archive log (FAL) requests from the FAL client.
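
As an illustrative sketch, the FAL parameters in the standby's initialization parameter file might look like the following. TESTDB_VMLINUX4 is the TNS service name used earlier in this article; TESTDB_VMLINUX3 is a hypothetical TNS alias assumed to resolve to the primary database on vmlinux3:

```
# Physical standby (TESTDB on vmlinux4) - FAL configuration
fal_server='TESTDB_VMLINUX3'   # where to fetch missing archived redo logs from
fal_client='TESTDB_VMLINUX4'   # how the FAL server connects back to this standby
```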

Further Reading

Additional information on Oracle Data Guard can be found in the Data Guard Concepts and Administration guide which is available from the Oracle Documentation Library:

  Data Guard Concepts and Administration - (Oracle9i Release 2)
  Data Guard Concepts and Administration - (Oracle10g Release 2)

About the Author

Jeffrey Hunter is an Oracle Certified Professional, Java Development Certified Professional, Author, and an Oracle ACE. Jeff currently works as a Senior Database Administrator for The DBA Zone, Inc. located in Pittsburgh, Pennsylvania. His work includes advanced performance tuning, Java and PL/SQL programming, developing high availability solutions, capacity planning, database security, and physical / logical database design in a UNIX / Linux server environment. Jeff's other interests include mathematical encryption theory, tutoring advanced mathematics, programming language processors (compilers and interpreters) in Java and C, LDAP, writing web-based database administration tools, and of course Linux. He has been a Sr. Database Administrator and Software Engineer for over 20 years and maintains his own website at: http://www.iDevelopment.info. Jeff graduated from Stanislaus State University in Turlock, California, with a Bachelor's degree in Computer Science and Mathematics.



Copyright (c) 1998-2014 Jeffrey M. Hunter. All rights reserved.

All articles, scripts and material located at the Internet address of http://www.idevelopment.info are the copyright of Jeffrey M. Hunter and are protected under copyright laws of the United States. This document may not be hosted on any other site without my express, prior, written permission. Application to host any of the material elsewhere can be made by contacting me at jhunter@idevelopment.info.

I have made every effort and taken great care in making sure that the material included on my web site is technically accurate, but I disclaim any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on it. I will in no case be liable for any monetary damages arising from such loss, damage or destruction.

Last modified on
Wednesday, 28-Dec-2011 14:00:47 EST