DBA Tips Archive for Oracle

  


Differences in df and du on Oracle Cluster File System (OCFS2) and Orphan Files

by Jeff Hunter, Sr. Database Administrator

Contents

  1. Overview
  2. Troubleshooting Steps
  3. Upgrade OCFS2
  4. Nightly Check Script for Orphan Files
  5. About the Author

Overview

Recently, it was noticed that the df and du commands were reporting significantly different disk usage for two OCFS2 file systems on every clustered database node in an Oracle RAC 10g configuration.

Mount Point   LUN                                Purpose                  OCFS2 Kernel Driver   OCFS2 Tools    OCFS2 Console
/u02          /dev/iscsi/thingdbcrsvol1/part1    Oracle CRS Components    1.4.2-1.el5           1.4.2-1.el5    1.4.2-1.el5
/u03          /dev/iscsi/thingdbfravol1/part1    Flash Recovery Area      1.4.2-1.el5           1.4.2-1.el5    1.4.2-1.el5

For example:


[oracle@thing1 ~]$ df -k
Filesystem            1K-blocks       Used  Available Use% Mounted on
/dev/mapper/VolGroup01-LogVol00
                       33708824   17234092   14734752  54% /
/dev/hda1                101086      12188      83679  13% /boot
tmpfs                   1036988          0    1036988   0% /dev/shm
domo:Public          4799457152 1876358656 2923098496  40% /domo
/dev/sda1              10485728     337920   10147808   4% /u02
/dev/sdb1             943714272  326912416  616801856  35% /u03

[oracle@thing1 ~]$ du -sk /u03
78463904        /u03

In the above example, the difference between df and du on the /u03 cluster file system is roughly 248 GB (326912416 KB - 78463904 KB = 248448512 KB)!

The problem appeared to be isolated to the /u03 cluster file system which is used exclusively for the global Flash Recovery Area (FRA). The files stored in the FRA include RMAN backups, archived redo logs, and flashback database logs:

/u03/flash_recovery_area/thingdb/archivelog
/u03/flash_recovery_area/thingdb/autobackup
/u03/flash_recovery_area/thingdb/backupset
/u03/flash_recovery_area/thingdb/flashback

These directories are all managed by the Oracle RDBMS software — no files in the FRA (or /u03 in general) are manually added or removed.
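
Before suspecting the file system itself, it can be useful to compare the operating system's view of /u03 with what the database believes it is storing in the FRA. The following is a quick sketch using the standard 10g recovery-area views, run as SYSDBA from one of the instances; if the database reports far less space used than df does, the missing space is being consumed outside of Oracle's accounting:

[oracle@thing1 ~]$ sqlplus -s / as sysdba <<'EOF'
set linesize 120 pagesize 100
col name format a40
-- Oracle's view of the total FRA space used and reclaimable
select name, space_limit, space_used, space_reclaimable, number_of_files
  from v$recovery_file_dest;
-- Breakdown of FRA usage by file type
select file_type, percent_space_used, percent_space_reclaimable, number_of_files
  from v$flash_recovery_area_usage;
EOF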

Researching this problem yielded the following My Oracle Support notes (both are discussed below):

  Note ID 468923.1
  Note ID 806554.1

After several days of research, it was determined that orphan files on the OCFS2 cluster file system were responsible for the significant difference between the df and du commands. OCFS2 was apparently leaving some files in the orphan directory (the //orphan_dir name space in OCFS2) after they had been deleted.

Note ID 468923.1 on the My Oracle Support website explains that even after deleting files and/or directories on an OCFS2 cluster file system, orphaned files may still exist. These are files that have been deleted but are still being accessed by running processes on one or more OCFS2 cluster nodes.

When an object (file or directory) is deleted, the file system unlinks the object's entry from its parent directory and links it as an entry in that cluster node's orphan directory (the //orphan_dir name space in OCFS2). Once the object is no longer in use anywhere in the cluster, the file system frees its inode along with all of the disk space associated with it.
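
This behavior is easy to demonstrate. The following is a minimal sketch, assuming /u03 is mounted and writable and using a purely hypothetical file name for the test file; the same sequence applies to any POSIX file system:

[oracle@thing1 ~]$ dd if=/dev/zero of=/u03/orphan_demo.dat bs=1M count=100   # create a 100 MB test file (hypothetical name)
[oracle@thing1 ~]$ tail -f /u03/orphan_demo.dat &                            # hold the file open in the background
[oracle@thing1 ~]$ rm /u03/orphan_demo.dat                                   # unlink the file while it is still open
[oracle@thing1 ~]$ du -sk /u03                                               # du no longer sees the file ...
[oracle@thing1 ~]$ df -k /u03                                                # ... but df still counts its blocks
[oracle@thing1 ~]$ kill %1                                                   # release the last open reference
[oracle@thing1 ~]$ df -k /u03                                                # space is returned once the inode is freed

In the case described in this article, however, the space was never returned even though no holder process could be found, which is what ultimately pointed to a bug rather than to normal orphan-file behavior.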

So it appears that, in order to completely avoid orphaned files being created, the application or database should guarantee that files are not open in other processes when they are deleted. Again, the problem seemed to be isolated to the /u03 file system, which was used exclusively for the global Flash Recovery Area and managed solely by the Oracle RDBMS software. No files on this file system were being manually added or deleted.

The orphaned files in /u03 consumed a significant amount of space given they were RMAN backups, archived logs, flashback logs, etc. The initial plan was to manually remove all orphan files using /sbin/fsck.ocfs2 -fy /dev/iscsi/thingdbfravol1/part1. Since one of the prerequisites is to unmount the file system(s), a production outage had to be scheduled.

It was later found that a bug in OCFS2 was responsible for leaving these deleted files stranded in the orphan directory. The solution was to upgrade the OCFS2 configuration to version 1.4.4 or higher. At the time of this writing, the latest version of OCFS2 was 1.4.7-1.

This article discusses some of the troubleshooting steps that were involved in resolving the problem of OCFS2 not removing orphan files.

Troubleshooting Steps

Reboot Cluster Nodes

During the initial troubleshooting phase, all nodes in the cluster were rebooted to see what effect that would have.


[root@thing1 ~]# reboot

[root@thing2 ~]# reboot

When the nodes came back online, the problem appeared to be resolved and df and du were calculating the same disk space usage. The victory was short-lived, however: in less than 24 hours, df and du were once again reporting stark differences on the /u03 cluster file system.

Identify Holders Associated with the OCFS2 "//orphan_dir" Name Space

Testing resumed with the assumption that the cluster file system(s) were having an issue removing orphaned files. Note ID 468923.1 on the My Oracle Support website included directions to identify which node, application, or user (holders) were associated with //orphan_dir name space entries (if any). To determine whether or not the orphaned files were truly being held by a process (from one of the nodes in the RAC), the following was run from all OCFS2 clustered nodes:


[root@thing1 ~]# find /proc -name fd -exec ls -l {} \; | grep deleted

[root@thing2 ~]# find /proc -name fd -exec ls -l {} \; | grep deleted

Similarly, the lsof command can be used to produce the same results:


[root@thing1 ~]# lsof | grep -i deleted

[root@thing2 ~]# lsof | grep -i deleted

Although the find /proc and lsof commands both produced output, the files listed didn't appear to have anything to do with the files on either of the clustered file systems (/u02 or /u03). It was now evident that while excessive orphan files did indeed exist on the /u03 cluster file system, no processes were identified as holding a lock on them.
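
A more targeted variation, assuming the installed lsof supports the +L and +a options, is to restrict the search to the file system in question. The +L1 option lists only open files whose on-disk link count is zero (i.e., deleted), and +a ANDs that selection with the mount point:

[root@thing1 ~]# lsof +aL1 /u03

[root@thing2 ~]# lsof +aL1 /u03

Had any process on either node still been holding one of the orphaned files open, it would have shown up here against /u03.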

Query "//orphan_dir" Name Space using "debugfs.ocfs2"

The next round of troubleshooting involved querying the //orphan_dir name space in OCFS2 using the debugfs.ocfs2 command. The general syntax used to query the //orphan_dir name space is:


debugfs.ocfs2 -R "ls -l //orphan_dir:<OCFS2_SLOT_NUM>" <DISK_DEVICE_NAME>

Where:

  <OCFS2_SLOT_NUM>    : the four-digit OCFS2 slot number assigned to the cluster node being queried (e.g., 0000 for the first node, 0001 for the second)
  <DISK_DEVICE_NAME>  : the disk device backing the OCFS2 file system (e.g., /dev/iscsi/thingdbfravol1/part1)
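
If it is not obvious which slot number is assigned to which cluster node, the slot map can be queried directly. This is a quick sketch; recent releases of debugfs.ocfs2 provide a slotmap command for this purpose:

[root@thing1 ~]# /sbin/debugfs.ocfs2 -R "slotmap" /dev/iscsi/thingdbfravol1/part1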

For example:


Node 1 - (thing1)

[root@thing1 ~]# /sbin/debugfs.ocfs2 -R "ls -l //orphan_dir:0000" /dev/iscsi/thingdbfravol1/part1
        24  drwxr-xr-x   2     0     0          3896 18-Aug-2010 10:27 .
        18  drwxr-xr-x   6     0     0          3896 27-Oct-2009 19:48 ..
  83865735  -rw-r-----   0   501   501    2147483648 13-Aug-2010 00:14 00000000000bd021
  83865734  -rw-r-----   0   501   501    2147483648 13-Aug-2010 00:13 00000000000bd020
  83865720  -rw-r-----   0   501   501    2147483648 12-Apr-2010 00:09 0000000004ffb078
  83865719  -rw-r-----   0   501   501    2147483648 12-Apr-2010 00:08 0000000004ffb077
  83865718  -rw-r-----   0   501   501    2147483648 12-Apr-2010 00:08 0000000004ffb076
  ... <snip> ...

Node 2 - (thing2)

[root@thing1 ~]# /sbin/debugfs.ocfs2 -R "ls -l //orphan_dir:0001" /dev/iscsi/thingdbfravol1/part1
        25  drwxr-xr-x   2     0     0          3896 18-Aug-2010 10:27 .
        18  drwxr-xr-x   6     0     0          3896 27-Oct-2009 19:48 ..

The above output shows that cluster node 1 (thing1) has entries (files) in the //orphan_dir name space on /u03, meaning they are orphan files. debugfs.ocfs2 lists each entry by inode number along with a hexadecimal file name derived from that inode (for example, inode 83865720 appears as 0000000004ffb078). Note that cluster node 2 (thing2) has no entries in the //orphan_dir name space, indicating there are no orphan files for that node's slot. This makes sense since the nightly RMAN process is only run from the first node (thing1).
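
Counting the entries per slot provides a quick number that can be tracked over time. The following is a small sketch, assuming two slots (0000 and 0001) and subtracting two from the line count to account for the "." and ".." entries; the exact output format of debugfs.ocfs2 can vary between releases, so the count should be sanity-checked against the full listing:

[root@thing1 ~]# for SLOT in 0000 0001; do
    COUNT=$(/sbin/debugfs.ocfs2 -R "ls -l //orphan_dir:${SLOT}" /dev/iscsi/thingdbfravol1/part1 2>/dev/null | wc -l)
    echo "slot ${SLOT}: $((COUNT - 2)) orphan file(s)"
done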

Manually Remove Orphan Files using "fsck.ocfs2"

By the time it was identified that orphan files were not being removed from the OCFS2 cluster file system, immediate action was required to clear out the entries found in the //orphan_dir name space and allow the file system to free their inodes and the disk space associated with them. The orphan files were both numerous and large given they were RMAN backups and archived logs.

To manually clear all orphaned files, schedule an outage on all nodes in the cluster. Bring down the clustered database and all Oracle RAC services, unmount the OCFS2 file system(s) that contain the orphan files to be removed, and run the fsck.ocfs2 command from all nodes in the cluster as follows:


[root@thing1 ~]# umount /u03
[root@thing2 ~]# umount /u03

[root@thing1 ~]# /sbin/fsck.ocfs2 -fy /dev/iscsi/thingdbfravol1/part1
[root@thing2 ~]# /sbin/fsck.ocfs2 -fy /dev/iscsi/thingdbfravol1/part1

After removing all orphan files, mount the OCFS2 cluster file systems and restart all Oracle RAC services.
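
For reference, the surrounding outage steps looked roughly like the following sketch. The database name (thingdb) is taken from the file paths above, and the srvctl commands and /etc/fstab-based mounts are assumptions that will vary with the local configuration:

# Stop the clustered database before unmounting (run once, from either node)
[oracle@thing1 ~]$ srvctl stop database -d thingdb

# ... unmount /u03 and run fsck.ocfs2 against the device as shown above ...

# Remount the file system on every node (assumes /etc/fstab entries exist) and restart the database
[root@thing1 ~]# mount /u03
[root@thing2 ~]# mount /u03
[oracle@thing1 ~]$ srvctl start database -d thingdb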

Upgrade OCFS2

The permanent solution in this case was to upgrade OCFS2 to version 1.4.4 or higher according to Note ID 806554.1 from the My Oracle Support website. At the time of this writing, the latest version of OCFS2 was 1.4.7-1.

Upgrading from OCFS2 version 1.4.2-1 to 1.4.7-1 does not require any on-disk format change. In essence, it is a simple kernel driver update, which means the upgrade can be performed in a rolling manner. While this would avoid a cluster-wide outage, a full outage was still scheduled since it was unknown whether cleaning out the orphan files using fsck.ocfs2 could be performed on a disk device that other cluster nodes still had mounted.

The following paper provides step-by-step instructions to upgrade an installation of Oracle Cluster File System 2 (OCFS2) 1.4 on the Linux platform.

  Upgrading OCFS2 - 1.4
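
In outline, the per-node portion of a rolling upgrade looks roughly like the following sketch. The instance name (thingdb1) and the package file names are assumptions only; the exact versions, architecture, and steps should be taken from the paper above:

[oracle@thing1 ~]$ srvctl stop instance -d thingdb -i thingdb1    # stop the local database instance (instance name assumed)
[root@thing1 ~]# umount /u02
[root@thing1 ~]# umount /u03
[root@thing1 ~]# /etc/init.d/o2cb stop                            # shut down the O2CB cluster stack on this node
[root@thing1 ~]# rpm -Uvh ocfs2-`uname -r`-1.4.7-1.el5.x86_64.rpm \
                          ocfs2-tools-1.4.*.el5.x86_64.rpm \
                          ocfs2console-1.4.*.el5.x86_64.rpm       # example package file names only
[root@thing1 ~]# /etc/init.d/o2cb start
[root@thing1 ~]# mount /u02
[root@thing1 ~]# mount /u03
[oracle@thing1 ~]$ srvctl start instance -d thingdb -i thingdb1

Repeating the same sequence on the remaining node(s), one at a time, keeps the database available throughout the upgrade.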

Nightly Check Script for Orphan Files

To ensure orphan files are not quietly accumulating on an OCFS2 cluster file system, the following script should be scheduled to run nightly from all nodes in the cluster.

The purpose of this script is to identify any orphan files on an OCFS2 cluster file system and warn the DBA. The script queries the //orphan_dir name space using the debugfs.ocfs2 command on a clustered node to determine the number of orphaned files for that node's slot.

This script should be scheduled to run on a nightly basis through CRON as the root user account.

  ocfs2_check_orphaned_files.ksh

The ocfs2_check_orphaned_files.ksh script takes three parameters:

  1. The disk device backing the OCFS2 cluster file system to check (e.g., /dev/iscsi/thingdbfravol1/part1)
  2. The OCFS2 slot number for the node running the check (e.g., 0000 for node 1, 0001 for node 2)
  3. The maximum number of orphan files allowed before a warning email is sent (e.g., 5)

For example, the following is scheduled nightly from the two nodes in an Oracle RAC cluster, thing1 and thing2. Note that node 1 (thing1) passes in slot number 0000 while node 2 (thing2) passes in slot number 0001. The third parameter (5) specifies that at most five orphan files may exist in the specified OCFS2 cluster file system before the script issues a warning email.


Node 1 - (thing1)

ocfs2_check_orphaned_files.ksh /dev/iscsi/thingdbcrsvol1/part1 0000 5
ocfs2_check_orphaned_files.ksh /dev/iscsi/thingdbfravol1/part1 0000 5

Node 2 - (thing2)

ocfs2_check_orphaned_files.ksh /dev/iscsi/thingdbcrsvol1/part1 0001 5
ocfs2_check_orphaned_files.ksh /dev/iscsi/thingdbfravol1/part1 0001 5
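
The actual script is available at the link above. For reference, a minimal sketch of its core logic might look like the following; the mail recipient, the wc-based counting, and the install path in the crontab entries are assumptions only, and the real script is more robust:

#!/bin/ksh
# ocfs2_check_orphaned_files.ksh  <device>  <slot>  <max_orphans>   -- minimal sketch only

DEVICE=$1          # e.g. /dev/iscsi/thingdbfravol1/part1
SLOT=$2            # e.g. 0000
MAX_ORPHANS=$3     # e.g. 5
DBA_EMAIL="dba@yourdomain.com"   # assumption: change to a real address

# Count the entries in this node's //orphan_dir slot, ignoring "." and ".."
COUNT=$(/sbin/debugfs.ocfs2 -R "ls -l //orphan_dir:${SLOT}" ${DEVICE} 2>/dev/null | wc -l)
ORPHANS=$((COUNT - 2))
[ ${ORPHANS} -lt 0 ] && ORPHANS=0

if [ ${ORPHANS} -gt ${MAX_ORPHANS} ]; then
    echo "WARNING: ${ORPHANS} orphan file(s) found in slot ${SLOT} on ${DEVICE}" | \
        mailx -s "OCFS2 orphan file warning on $(hostname)" ${DBA_EMAIL}
fi

# Example root crontab entries on thing1 (node 2 would use slot 0001; script location assumed)
00 23 * * * /u01/app/oracle/dba_scripts/bin/ocfs2_check_orphaned_files.ksh /dev/iscsi/thingdbcrsvol1/part1 0000 5
05 23 * * * /u01/app/oracle/dba_scripts/bin/ocfs2_check_orphaned_files.ksh /dev/iscsi/thingdbfravol1/part1 0000 5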

About the Author

Jeffrey Hunter is an Oracle Certified Professional, Java Development Certified Professional, Author, and an Oracle ACE. Jeff currently works as a Senior Database Administrator for The DBA Zone, Inc. located in Pittsburgh, Pennsylvania. His work includes advanced performance tuning, Java and PL/SQL programming, developing high availability solutions, capacity planning, database security, and physical / logical database design in a UNIX / Linux server environment. Jeff's other interests include mathematical encryption theory, tutoring advanced mathematics, programming language processors (compilers and interpreters) in Java and C, LDAP, writing web-based database administration tools, and of course Linux. He has been a Sr. Database Administrator and Software Engineer for over 20 years and maintains his own website at: http://www.iDevelopment.info. Jeff graduated from Stanislaus State University in Turlock, California, with a Bachelor's degree in Computer Science and Mathematics.



Copyright (c) 1998-2017 Jeffrey M. Hunter. All rights reserved.

All articles, scripts and material located at the Internet address of http://www.idevelopment.info are the copyright of Jeffrey M. Hunter and are protected under copyright laws of the United States. This document may not be hosted on any other site without my express, prior, written permission. Application to host any of the material elsewhere can be made by contacting me at jhunter@idevelopment.info.

I have made every effort and taken great care in making sure that the material included on my web site is technically accurate, but I disclaim any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on it. I will in no case be liable for any monetary damages arising from such loss, damage or destruction.

Last modified on
Saturday, 18-Sep-2010 17:44:04 EDT