DBA Tips Archive for Oracle

  


Building an Inexpensive Oracle RAC 10g R2 on Linux - (RHEL 4.5 / iSCSI)

by Jeff Hunter, Sr. Database Administrator


Contents

  1. Overview
  2. Oracle RAC 10g Overview
  3. Shared-Storage Overview
  4. iSCSI Technology
  5. Hardware and Costs
  6. Install the Linux Operating System
  7. Network Configuration
  8. Install Openfiler
  9. Configure iSCSI Volumes using Openfiler
  10. Configure iSCSI Volumes on Oracle RAC Nodes
  11. Create "oracle" User and Directories
  12. Configure the Linux Servers for Oracle
  13. Configure the "hangcheck-timer" Kernel Module
  14. Configure RAC Nodes for Remote Access using SSH
  15. All Startup Commands for Both Oracle RAC Nodes
  16. Install and Configure Oracle Cluster File System (OCFS2)
  17. Install and Configure Automatic Storage Management (ASMLib 2.0)
  18. Download Oracle RAC 10g Software
  19. Pre-Installation Tasks for Oracle10g Release 2
  20. Install Oracle Clusterware 10g Software
  21. Install Oracle Database 10g Software
  22. Install Oracle Database 10g Companion CD Software
  23. Create TNS Listener Process
  24. Create the Oracle Cluster Database
  25. Post-Installation Tasks - (Optional)
  26. Verify TNS Networking Files
  27. Create / Alter Tablespaces
  28. Verify the RAC Cluster & Database Configuration
  29. Starting / Stopping the Cluster
  30. Transparent Application Failover - (TAF)
  31. Troubleshooting
  32. Conclusion
  33. Building an Oracle RAC Cluster Remotely
  34. Acknowledgements
  35. About the Author



Overview

One of the most efficient ways to become familiar with Oracle Real Application Clusters (RAC) 10g technology is to have access to an actual Oracle RAC 10g cluster. In learning this new technology, you will soon start to realize the benefits Oracle RAC 10g has to offer, such as fault tolerance, new levels of security, load balancing, and the ease of upgrading capacity. The problem, however, is the price of the hardware required for a typical production RAC configuration. A small two-node cluster, for example, could run anywhere from US$10,000 to well over US$20,000. This would not even include the heart of a production RAC environment, the shared storage. In most cases, this would be a Storage Area Network (SAN), which generally starts at US$10,000.

For those who simply want to become familiar with Oracle RAC 10g, this article provides a low-cost alternative: configuring an Oracle RAC 10g system for educational purposes using commercial off-the-shelf components and downloadable software. The estimated cost for this configuration could be anywhere from US$2,000 to US$2,700. The system will consist of a two-node cluster (each node with a single processor), both running Linux (CentOS 4.5 or Red Hat Enterprise Linux 4 Update 5), Oracle10g Release 2, OCFS2, and ASMLib 2.0. All shared disk storage for Oracle RAC will be based on iSCSI using a network storage server, namely Openfiler Release 2.2 (respin 2).

Powered by rPath Linux, Openfiler is a free browser-based network storage management utility that delivers file-based Network Attached Storage (NAS) and block-based Storage Area Networking (SAN) in a single framework. Openfiler supports CIFS, NFS, HTTP/DAV, and FTP; however, we will only be making use of its iSCSI capabilities to implement an inexpensive SAN for the shared storage components required by Oracle RAC 10g. A 500GB external hard drive will be connected to the network storage server (sometimes referred to in this article as the Openfiler server) via its USB 2.0 interface. The Openfiler server will be configured to use this disk for iSCSI-based storage, which our Oracle RAC 10g configuration will use to store the shared files required by Oracle Clusterware as well as all Oracle ASM volumes.

  This article is provided for educational purposes only, so the setup is kept simple to demonstrate ideas and concepts. For example, the disk mirroring configured in this article will be set up on one physical disk only, while in practice it should be performed on multiple physical drives. Also note that while this article provides detailed instructions for successfully installing a complete Oracle RAC 10g system, it is by no means a substitute for the official Oracle documentation. In addition to this article, users should also consult the following Oracle documents to gain a full understanding of alternative configuration options, installation, and administration with Oracle RAC 10g. Oracle's official documentation site is docs.oracle.com.

  Oracle Clusterware and Oracle Real Application Clusters Installation Guide - 10g Release 2 (10.2) for Linux
  Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide - 10g Release 2 (10.2)
  2 Day + Real Application Clusters Guide - 10g Release 2 (10.2)

Although in past articles I used raw partitions for storing files on shared storage, here we will make use of the Oracle Cluster File System V2 (OCFS2) and Oracle Automatic Storage Management (ASM). The two Oracle RAC nodes will be configured as follows:

Oracle Database Files
RAC Node Name   Instance Name   Database Name   $ORACLE_BASE      File System / Volume Manager for DB Files
linux1          orcl1           orcl            /u01/app/oracle   ASM
linux2          orcl2           orcl            /u01/app/oracle   ASM

Oracle Clusterware Shared Files
File Type                       File Name                   iSCSI Volume Name   Mount Point   File System
Oracle Cluster Registry (OCR)   /u02/oradata/orcl/OCRFile   crs                 /u02          OCFS2
Voting Disk                     /u02/oradata/orcl/CSSFile   crs                 /u02          OCFS2

  With Oracle Database 10g Release 2 (10.2), Cluster Ready Services, or CRS, is now called Oracle Clusterware.

The Oracle Clusterware software will be installed to /u01/app/crs on both of the nodes that make up the RAC cluster. Starting with Oracle Database 10g Release 2 (10.2), Oracle Clusterware should be installed in a separate Oracle Clusterware home directory that is not release-specific (such as /u01/app/crs, rather than a release-specific path like /u01/app/oracle/product/10.2.0/...) and must never be a subdirectory of the ORACLE_BASE directory (/u01/app/oracle for example). This is a change to the Optimal Flexible Architecture (OFA) rules. Note that the Oracle Clusterware and Oracle Real Application Clusters installation documentation from Oracle incorrectly states that the Oracle Clusterware home directory can be a subdirectory of the ORACLE_BASE directory. For example, in Chapter 2, "Preinstallation", in the section "Oracle Clusterware home directory", it lists the path /u01/app/oracle/product/crs as a possible Oracle Clusterware home (or CRS home) path. This is incorrect: the default ORACLE_BASE path is /u01/app/oracle, and the Oracle Clusterware home must never be a subdirectory of the ORACLE_BASE directory. This issue is tracked with Oracle documentation bug "5843155" (B14203-08 HAS CONFLICTING CRS_HOME LOCATIONS) and is fixed in Oracle 11g.
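The placement rule for the Clusterware home can be expressed as a small path check. This is a minimal Python sketch, not an Oracle-provided tool: the function name is_valid_crs_home is hypothetical, and it assumes absolute POSIX paths. It normalizes both paths and then compares whole path components, so a sibling directory such as /u01/app/oraclex is not mistaken for a child of /u01/app/oracle.

```python
import posixpath
from pathlib import PurePosixPath


def is_valid_crs_home(crs_home: str, oracle_base: str) -> bool:
    """Hypothetical helper: True if crs_home is neither ORACLE_BASE nor below it.

    Assumes absolute POSIX paths; normpath() collapses "..", "." and
    trailing slashes before the comparison.
    """
    crs = PurePosixPath(posixpath.normpath(crs_home))
    base = PurePosixPath(posixpath.normpath(oracle_base))
    # .parents compares whole path components, not string prefixes.
    return crs != base and base not in crs.parents


if __name__ == "__main__":
    oracle_base = "/u01/app/oracle"
    # The home used in this article, then the path the Oracle documentation
    # incorrectly lists as a possible CRS home.
    for candidate in ("/u01/app/crs", "/u01/app/oracle/product/crs"):
        print(f"{candidate}: {is_valid_crs_home(candidate, oracle_base)}")
```

Running it prints True for /u01/app/crs and False for /u01/app/oracle/product/crs.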

Although the Oracle Clusterware software itself is installed locally to /u01/app/crs on each node, two of its files, the "Oracle Cluster Registry (OCR)" file and the "Voting Disk" file, must be shared by both nodes in the cluster. These two files will be installed on shared storage using Oracle Cluster File System, Release 2 (OCFS2). It is also possible to use raw devices for these files; however, it is not possible to use ASM for these two shared Clusterware files.
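To make the storage rule concrete, here is a hedged Python sketch that checks the planned layout from the "Oracle Clusterware Shared Files" table against it. The labels "OCFS2", "raw", and "ASM" are illustrative strings chosen for this example, not values read from any Oracle tool, and the ASM disk group name +DATA is hypothetical.

```python
# Storage allowed for the OCR and voting disk according to the text above:
# OCFS2 or raw devices, but not ASM. The labels are illustrative only.
ALLOWED_CLUSTERWARE_STORAGE = {"OCFS2", "raw"}

# Planned layout, copied from the "Oracle Clusterware Shared Files" table.
CLUSTERWARE_FILES = {
    "Oracle Cluster Registry (OCR)": ("/u02/oradata/orcl/OCRFile", "OCFS2"),
    "Voting Disk": ("/u02/oradata/orcl/CSSFile", "OCFS2"),
}


def invalid_placements(layout):
    """Return the names of shared Clusterware files placed on unsupported storage."""
    return [
        name
        for name, (_path, storage) in layout.items()
        if storage not in ALLOWED_CLUSTERWARE_STORAGE
    ]


if __name__ == "__main__":
    print(invalid_placements(CLUSTERWARE_FILES))  # the article's layout passes
    bad = dict(CLUSTERWARE_FILES)
    bad["Voting Disk"] = ("+DATA", "ASM")  # hypothetical ASM placement
    print(invalid_placements(bad))
```

The first call returns an empty list; the second flags the voting disk.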

The Oracle Database 10g Release 2 software will be installed into a separate Oracle home, namely /u01/app/oracle/product/10.2.0/db_1, on both of the nodes that make up the RAC cluster. All of the Oracle physical database files (data, online redo logs, control files, archived redo logs) will be installed to shared volumes being managed by Automatic Storage Management (ASM).

  The Oracle database files could just as well have been stored on the Oracle Cluster File System (OCFS2). Using ASM, however, makes the article that much more interesting!

  This article is only designed to work as documented with absolutely no substitutions!

The only exception here is the choice of vendor hardware (i.e. machines, networking equipment, and internal / external hard drives). Ensure that the hardware you purchase from the vendor is supported on Red Hat Enterprise Linux 4. I tend to stick with Dell hardware given its superb quality and compatibility with Linux. For a test system of this nature, I highly recommend purchasing pre-owned or refurbished Dell hardware from a reputable company like Stallard Technologies, Inc. Stallard Technologies has a proven track record of delivering the best value on pre-owned hardware combined with a commitment to superior customer service. I base my recommendation on my own outstanding personal experience with their organization. To learn more about Stallard Technologies, visit their website or contact William Buchanan.

The following is a list of past articles which describe configuring a similar Oracle RAC cluster using various versions of Oracle, operating system, and shared storage medium:

  Building an Inexpensive Oracle RAC 10g Release 2 on Linux - (CentOS 4.2 / FireWire)
  Building an Inexpensive Oracle RAC 10g Release 1 on Linux - (WBEL 3.0 / FireWire)
  Building an Inexpensive Oracle RAC 9i on Linux - (Fedora Core 1 / FireWire)



Oracle RAC 10g Overview

Oracle RAC, introduced with Oracle9i, is the successor to Oracle Parallel Server (OPS). Oracle RAC allows multiple instances to access the same database (storage) simultaneously. RAC provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out. At the same time, because all nodes access the same database, the failure of one instance will not cause the loss of access to the database.

At the heart of Oracle RAC 10g is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files and parameter files for all nodes in the cluster. The data disks must be globally available in order to allow all nodes to access the database. Each node has its own redo log file(s) and UNDO tablespace, but the other nodes must be able to access them (and the shared control file) in order to recover that node in the event of a system failure.

The biggest difference between Oracle RAC and OPS is the addition of Cache Fusion. With OPS, a request for data from one node to another required the data to be written to disk first before the requesting node could read it. With Cache Fusion, data is passed along a high-speed interconnect using a sophisticated locking algorithm.

Not all clustering solutions use shared storage. Some vendors use an approach known as a Federated Cluster, in which data is spread across several machines rather than shared by all. With Oracle RAC 10g, however, multiple nodes use the same set of disks for storing data. With Oracle RAC 10g, the data files, redo log files, control files, and archived log files reside on shared storage on raw-disk devices, a NAS, ASM, or on a clustered file system. Oracle's approach to clustering leverages the collective processing power of all the nodes in the cluster and at the same time provides failover security.

Pre-configured Oracle RAC 10g solutions are available from vendors such as Dell, IBM and HP for production environments. This article, however, focuses on putting together your own Oracle RAC 10g environment for development and testing by using Linux servers and a low-cost shared disk solution: iSCSI.

For more background about Oracle RAC, visit the Oracle RAC Product Center on OTN.



Shared-Storage Overview

Today, fibre channel is one of the most popular solutions for shared storage. Fibre channel is a high-speed serial-transfer interface that is used to connect systems and storage devices in either point-to-point (FC-P2P), arbitrated loop (FC-AL), or switched (FC-SW) topologies. Protocols supported by Fibre Channel include SCSI and IP. Fibre channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second in each direction, with 4.25 Gbps expected.

Fibre channel, however, is very expensive. The fibre channel switch alone can start at around US$1,000. This does not even include the fibre channel storage array and high-end drives, which can cost about US$300 for a 36GB drive. A typical fibre channel setup, including fibre channel cards for the servers, is roughly US$10,000, which does not include the cost of the servers that make up the cluster.

A less expensive alternative to fibre channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget, at around US$2,000 to US$5,000 for a two-node cluster.

Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for shared storage but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol, and read/write block sizes of 32K.

The shared storage that will be used for this article is based on iSCSI technology using a network storage server installed with Openfiler. This solution offers a low-cost alternative to fibre channel for testing and educational purposes, but given the low-end hardware being used, it is not recommended for a production environment.



iSCSI Technology

For many years, the only technology that existed for building a network based storage solution was a Fibre Channel Storage Area Network (FC SAN). Based on an earlier set of ANSI protocols called Fiber Distributed Data Interface (FDDI), Fibre Channel was developed to move SCSI commands over a storage network.

Several of the advantages of FC SAN include greater performance, increased disk utilization, improved availability, better scalability, and, most important to us, support for server clustering! Still today, however, FC SANs suffer from three major disadvantages. The first is price. While the costs involved in building a FC SAN have come down in recent years, the cost of entry still remains prohibitive for small companies with limited IT budgets. The second is incompatible hardware components. Since its adoption, many product manufacturers have interpreted the Fibre Channel specifications differently from each other, which has resulted in scores of interconnect problems. When purchasing Fibre Channel components from a common manufacturer, this is usually not a problem. The third disadvantage is the fact that a Fibre Channel network is not Ethernet! It requires a separate network technology along with a second set of skills among the datacenter staff.

With the popularity of Gigabit Ethernet and the demand for lower cost, Fibre Channel has recently been given a run for its money by iSCSI-based storage systems. Today, iSCSI SANs remain the leading competitor to FC SANs.

Ratified on February 11th, 2003 by the Internet Engineering Task Force (IETF), the Internet Small Computer System Interface, better known as iSCSI, is an Internet Protocol (IP)-based storage networking standard for establishing and managing connections between IP-based storage devices, hosts, and clients. iSCSI is a data transport protocol defined in the SCSI-3 specifications framework and is similar to Fibre Channel in that it is responsible for carrying block-level data over a storage network. Block-level communication means that data is transferred between the host and the client in chunks called blocks. Database servers depend on this type of communication (as opposed to the file-level communication used by most NAS systems) in order to work properly. Like a FC SAN, an iSCSI SAN should be a separate physical network devoted entirely to storage; however, its components can be much the same as in a typical IP network (LAN).

While iSCSI has a promising future, many of its early critics were quick to point out some of its inherent shortcomings with regard to performance. The beauty of iSCSI is its ability to utilize an already familiar IP network as its transport mechanism. The TCP/IP protocol, however, is very complex and CPU intensive. With iSCSI, most of the processing of the data (both TCP and iSCSI) is handled in software and is much slower than Fibre Channel, which is handled completely in hardware. The overhead incurred in mapping every SCSI command onto an equivalent iSCSI transaction is excessive. For many, the solution is to do away with iSCSI software initiators and invest in specialized cards that can offload TCP/IP and iSCSI processing from a server's CPU. These specialized cards are sometimes referred to as an iSCSI Host Bus Adapter (HBA) or a TCP Offload Engine (TOE) card. Also consider that 10-Gigabit Ethernet is a reality today!

As with any new technology, iSCSI comes with its own set of acronyms and terminology. For the purpose of this article, it is only important to understand the difference between an iSCSI initiator and an iSCSI target.

iSCSI Initiator

Basically, an iSCSI initiator is a client device that connects and initiates requests to some service offered by a server (in this case an iSCSI target). The iSCSI initiator software will need to exist on each of the Oracle RAC nodes (linux1 and linux2).

An iSCSI initiator can be implemented using either software or hardware. Software iSCSI initiators are available for most major operating system platforms. For this article, we will be using the free Linux iscsi-sfnet software driver found in the iscsi-initiator-utils RPM — developed as part of the Linux-iSCSI Project. The iSCSI software initiator is generally used with a standard network interface card (NIC) — a Gigabit Ethernet card in most cases. A hardware initiator is an iSCSI HBA (or a TCP Offload Engine (TOE) card), which is basically just a specialized Ethernet card with a SCSI ASIC on-board to offload all the work (TCP and SCSI commands) from the system CPU. iSCSI HBAs are available from a number of vendors, including Adaptec, Alacritech, Intel, and QLogic.

iSCSI Target

An iSCSI target is the "server" component of an iSCSI network. This is typically the storage device that contains the information you want and answers requests from the initiator(s). For the purpose of this article, the node openfiler1 will be the iSCSI target.

So with all of this talk about iSCSI, does this mean the death of Fibre Channel anytime soon? Probably not. Fibre Channel has clearly demonstrated its capabilities over the years with its capacity for extremely high speeds, flexibility, and robust reliability. Customers who have strict requirements for high performance storage, large complex connectivity, and mission critical reliability will undoubtedly continue to choose Fibre Channel.

Before closing out this section, I thought it would be appropriate to present the following chart that shows speed comparisons of the various types of disk interfaces and network technologies. For each interface, I provide the maximum transfer rates per second in kilobits (Kb), kilobytes (KB), megabits (Mb), megabytes (MB), gigabits (Gb), and gigabytes (GB).

Disk Interface / Network / Bus Speed (maximum transfer rate per second)
Interface                                     Kb       KB       Mb       MB       Gb       GB
Serial                                        115      14.375   0.115    0.014    -        -
Parallel (standard)                           920      115      0.92     0.115    -        -
10Base-T Ethernet                             -        -        10       1.25     -        -
IEEE 802.11b wireless Wi-Fi (2.4 GHz band)    -        -        11       1.375    -        -
USB 1.1                                       -        -        12       1.5      -        -
Parallel (ECP/EPP)                            -        -        24       3        -        -
SCSI-1                                        -        -        40       5        -        -
IEEE 802.11g wireless WLAN (2.4 GHz band)     -        -        54       6.75     -        -
SCSI-2 (Fast SCSI / Fast Narrow SCSI)         -        -        80       10       -        -
100Base-T Ethernet (Fast Ethernet)            -        -        100      12.5     -        -
IDE                                           -        -        133.6    16.7     -        -
Fast Wide SCSI (Wide SCSI)                    -        -        160      20       -        -
Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow)  -        -        160      20       -        -
Ultra IDE                                     -        -        264      33       -        -
Wide Ultra SCSI (Fast Wide 20)                -        -        320      40       -        -
Ultra2 SCSI                                   -        -        320      40       -        -
FireWire 400 - (IEEE1394a)                    -        -        400      50       -        -
USB 2.0                                       -        -        480      60       -        -
Wide Ultra2 SCSI                              -        -        640      80       -        -
Ultra3 SCSI                                   -        -        640      80       -        -
FireWire 800 - (IEEE1394b)                    -        -        800      100      -        -
ATA/100 (parallel)                            -        -        800      100      -        -
Gigabit Ethernet                              -        -        1000     125      1        -
PCI - (33 MHz / 32-bit)                       -        -        1064     133      1.064    -
Serial ATA I - (SATA I)                       -        -        1200     150      1.2      -
Wide Ultra3 SCSI                              -        -        1280     160      1.28     -
Ultra160 SCSI                                 -        -        1280     160      1.28     -
PCI - (33 MHz / 64-bit)                       -        -        2128     266      2.128    -
PCI - (66 MHz / 32-bit)                       -        -        2128     266      2.128    -
AGP 1x - (66 MHz / 32-bit)                    -        -        2128     266      2.128    -
Serial ATA II - (SATA II)                     -        -        2400     300      2.4      -
Ultra320 SCSI                                 -        -        2560     320      2.56     -
FC-AL Fibre Channel                           -        -        3200     400      3.2      -
PCI-Express x1 - (bidirectional)              -        -        4000     500      4        -
PCI - (66 MHz / 64-bit)                       -        -        4256     532      4.256    -
AGP 2x - (133 MHz / 32-bit)                   -        -        4264     533      4.264    -
Serial ATA III - (SATA III)                   -        -        4800     600      4.8      -
PCI-X - (100 MHz / 64-bit)                    -        -        6400     800      6.4      -
PCI-X - (133 MHz / 64-bit)                    -        -        -        1064     8.512    1
AGP 4x - (266 MHz / 32-bit)                   -        -        -        1066     8.528    1
10G Ethernet - (IEEE 802.3ae)                 -        -        -        1250     10       1.25
PCI-Express x4 - (bidirectional)              -        -        -        2000     16       2
AGP 8x - (533 MHz / 32-bit)                   -        -        -        2133     17.064   2.1
PCI-Express x8 - (bidirectional)              -        -        -        4000     32       4
PCI-Express x16 - (bidirectional)             -        -        -        8000     64       8
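The columns in the chart are related by two fixed factors: decimal prefixes (1 Mb = 1,000 Kb) and 8 bits per byte. The minimal Python sketch below (the function name rate_columns is hypothetical) derives every column from a megabit figure. It also makes visible that, on paper, the USB 2.0 link to the external drive (60 MB/s) rather than the Gigabit Ethernet storage network (125 MB/s) is the lower theoretical ceiling on the storage path used in this article.

```python
def rate_columns(megabits_per_sec: float) -> dict:
    """Express a rate given in Mb/s in each unit used by the chart.

    Assumes decimal prefixes (1 Mb = 1000 Kb) and 8 bits per byte, which is
    how the chart's columns relate to one another.
    """
    return {
        "Kb": megabits_per_sec * 1000,
        "KB": megabits_per_sec * 1000 / 8,
        "Mb": megabits_per_sec,
        "MB": megabits_per_sec / 8,
        "Gb": megabits_per_sec / 1000,
        "GB": megabits_per_sec / 8000,
    }


if __name__ == "__main__":
    # The two links on the storage path in this article.
    for name, mbps in (("USB 2.0", 480), ("Gigabit Ethernet", 1000)):
        cols = rate_columns(mbps)
        print(f"{name:<17} {cols['Mb']:g} Mb/s = {cols['MB']:g} MB/s")
```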



Hardware and Costs

The hardware used to build our example Oracle RAC 10g environment consists of three Linux servers (two Oracle RAC nodes and one Network Storage Server) and components that can be purchased at many local computer stores or over the Internet (e.g. Stallard Technologies, Inc.).

Oracle RAC Node 1 - (linux1)
  Dell Dimension 2400 Series

     - Intel(R) Pentium(R) 4 Processor at 2.80GHz
     - 1GB DDR SDRAM (at 333MHz)
     - 40GB 7200 RPM Internal Hard Drive
     - Integrated Intel 3D AGP Graphics
     - Integrated 10/100 Ethernet - (Broadcom BCM4401)
     - CDROM (48X Max Variable)
     - 3.5" Floppy
     - No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)

US$620
  1 - Ethernet LAN Card

Each Linux server for Oracle RAC should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private network (RAC interconnect and Openfiler networked storage). Select the appropriate NIC adapter that is compatible with the maximum data transmission speed of the network switch to be used for the private network. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network.

Used for RAC interconnect to linux2 and Openfiler networked storage.

     Gigabit Ethernet

       Intel 10/100/1000Mbps PCI Desktop Adapter - (PWLA8391GT)

US$35

Oracle RAC Node 2 - (linux2)
  Dell Dimension 2400 Series

     - Intel(R) Pentium(R) 4 Processor at 2.80GHz
     - 1GB DDR SDRAM (at 333MHz)
     - 40GB 7200 RPM Internal Hard Drive
     - Integrated Intel 3D AGP Graphics
     - Integrated 10/100 Ethernet - (Broadcom BCM4401)
     - CDROM (48X Max Variable)
     - 3.5" Floppy
     - No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)

US$620
  1 - Ethernet LAN Card

Each Linux server for Oracle RAC should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private network (RAC interconnect and Openfiler networked storage). Select the appropriate NIC adapter that is compatible with the maximum data transmission speed of the network switch to be used for the private network. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network.

Used for RAC interconnect to linux1 and Openfiler networked storage.

     Gigabit Ethernet

       Intel 10/100/1000Mbps PCI Desktop Adapter - (PWLA8391GT)

US$35

Network Storage Server - (openfiler1)
  Dell PowerEdge 1800

     - Dual 3.0GHz Xeon / 1MB Cache / 800FSB (SL7PE)
     - 2GB of ECC Memory
     - 40GB IDE Hard Drive
     - Single embedded Intel 10/100/1000 Gigabit NIC
     - 4 x Integrated USB 2.0 Ports
     - No Keyboard, Monitor, or Mouse - (Connected to KVM Switch)

US$650
  1 - Ethernet LAN Card

The Network Storage Server (Openfiler server) should contain two NIC adapters. The Dell PowerEdge 1800 machine includes an integrated 10/100/1000 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private network (Openfiler networked storage). Select the appropriate NIC adapter that is compatible with the maximum data transmission speed of the network switch to be used for the private network. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network.

Used for networked storage on the private network.

     Gigabit Ethernet

       Intel 10/100/1000Mbps PCI Desktop Adapter - (PWLA8391GT)

US$35

Miscellaneous Components
  Storage Device(s) - External Hard Drive

For the database storage I used a single external LaCie d2 Hard Drive Extreme with Triple Interface (500GB) drive which was connected to the Openfiler server via its USB 2.0 interface. The Openfiler server will be configured to use this disk for iSCSI based storage and will be used in our Oracle RAC 10g configuration to store the shared files required by Oracle Clusterware as well as all Oracle ASM volumes.

Note: Since the writing of this article, LaCie has discontinued the 500GB version of this external hard drive and only the 250GB and 320GB capacities exist. Please be aware that any type of hard disk (internal or external) should work for database storage as long as it can be recognized by the network storage server (Openfiler) and has adequate space.

       LaCie d2 Hard Drive Extreme with Triple Interface

US$260
  1 - Ethernet Switch

Used for the interconnect between linux1-priv and linux2-priv. This switch will also be used for network storage traffic for Openfiler. For the purpose of this article, I used a Gigabit Ethernet switch (and 1Gb Ethernet cards) for the private network.

     Gigabit Ethernet

       D-Link 8-port 10/100/1000 Desktop Switch - (DGS-2208)

US$50

  6 - Network Cables

       Category 5e patch cable - (Connect linux1 to public network)                     US$5
       Category 5e patch cable - (Connect linux2 to public network)                     US$5
       Category 5e patch cable - (Connect openfiler1 to public network)                 US$5
       Category 5e patch cable - (Connect linux1 to interconnect Ethernet switch)       US$5
       Category 5e patch cable - (Connect linux2 to interconnect Ethernet switch)       US$5
       Category 5e patch cable - (Connect openfiler1 to interconnect Ethernet switch)   US$5

Optional Components
  KVM Switch

This article requires access to the console of all nodes (servers) in order to install the operating system and perform several of the configuration tasks. When managing a very small number of servers, it might make sense to connect each server with its own monitor, keyboard, and mouse in order to access its console. However, as the number of servers to manage increases, this solution becomes impractical. A better approach is to configure a dedicated computer with a single monitor, keyboard, and mouse that has direct access to the console of each server. This solution is made possible using a Keyboard, Video, Mouse switch, better known as a KVM switch. A KVM switch is a hardware device that allows a user to control multiple computers from a single keyboard, video monitor and mouse. Avocent provides a high quality and economical 4-port switch which includes four 6' cables:

       SwitchView® 1000 - (4SV1000BND1-001)

For a detailed explanation and guide on the use of KVM switches, please see the article "KVM Switches For the Home and the Enterprise".

US$340
Total     US$2,675  
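As a quick sanity check on the total, the prices listed above can be summed in a few lines of Python. The item labels are abbreviations of the entries above. The sketch simply confirms that the required components come to US$2,335 and that the optional KVM switch brings the total to US$2,675.

```python
# Prices in US$, as listed in the hardware section above.
REQUIRED = [
    ("linux1 - Dell Dimension 2400", 620),
    ("linux1 - Intel PWLA8391GT Gigabit NIC", 35),
    ("linux2 - Dell Dimension 2400", 620),
    ("linux2 - Intel PWLA8391GT Gigabit NIC", 35),
    ("openfiler1 - Dell PowerEdge 1800", 650),
    ("openfiler1 - Intel PWLA8391GT Gigabit NIC", 35),
    ("LaCie d2 Hard Drive Extreme (500GB)", 260),
    ("D-Link DGS-2208 Gigabit switch", 50),
    ("Category 5e patch cables (6 x US$5)", 6 * 5),
]
OPTIONAL = [
    ("Avocent SwitchView 1000 KVM switch", 340),
]


def total(items):
    """Sum the price column of a list of (description, price) pairs."""
    return sum(price for _description, price in items)


if __name__ == "__main__":
    print(f"Required components: US${total(REQUIRED):,}")
    print(f"With optional KVM:   US${total(REQUIRED + OPTIONAL):,}")
```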


We are about to start the installation process. Now that we have talked about the hardware that will be used in this example, let's take a conceptual look at what the environment will look like:

Figure 1: Oracle RAC 10g Release 2 Testing Configuration

As we start to go into the details of the installation, it should be noted that most of the tasks within this document will need to be performed on both Oracle RAC nodes (linux1 and linux2). I will indicate at the beginning of each section whether or not the task(s) should be performed on both Oracle RAC nodes or on the network storage server (openfiler1).



Install the Linux Operating System


  Perform the following installation on both Oracle RAC nodes in the cluster!

After procuring the required hardware, it is time to start the configuration process. The first task we need to perform is to install the Linux operating system. As already mentioned, this article will use CentOS 4.5. Although I have used Red Hat Fedora in the past, I wanted to switch to a Linux environment that would support all of the functionality required by Oracle. This is where CentOS comes in. The CentOS project takes the Red Hat Enterprise Linux 4 source RPMs and compiles them into a free clone of the Red Hat Enterprise Linux 4 product. This provides a free and stable version of the Red Hat Enterprise Linux 4 (AS/ES) operating environment that I can use for testing different Oracle configurations. I have moved away from Fedora as I need a stable environment that is not only free, but as close to the actual Oracle-supported operating system as possible. While CentOS is not the only project performing the same function, I tend to stick with it as it is stable and responds quickly to updates released by Red Hat.


Downloading CentOS

Use the link below to download CentOS 4.5. After downloading CentOS, burn each of the ISO images to CD.

  CentOS.org

  If you are downloading the above ISO files to a MS Windows machine, there are many options for burning these images (ISO files) to a CD. You may already be familiar with and have the proper software to burn images to CD. If you are not familiar with this process and do not have the required software to burn images to CD, here are just two (of many) software packages that can be used:

  UltraISO
  Magic ISO Maker


Installing CentOS

This section provides a summary of the screens used to install CentOS. For more detailed installation instructions, you can use the Red Hat Linux manuals at http://www.redhat.com/docs/manuals/. I would suggest, however, that you use the instructions provided below for this Oracle RAC 10g configuration.

  Before installing the Linux operating system on both nodes, you should have the two NIC interfaces (cards) installed.

After downloading and burning the CentOS images (ISO files) to CD, insert CentOS Disk #1 into the first server (linux1 in this example), power it on, and answer the installation screen prompts as noted below. After completing the Linux installation on the first node, perform the same Linux installation on the second node, substituting linux2 for the node name linux1 and using the appropriate IP addresses where required.

Boot Screen

The first screen is the CentOS boot screen. At the boot: prompt, hit [Enter] to start the installation process.
Media Test
When asked to test the CD media, tab over to [Skip] and hit [Enter]. If there were any errors, the media burning software would have warned us. After several seconds, the installer should then detect the video card, monitor, and mouse. The installer then goes into GUI mode.
Welcome to CentOS
At the welcome screen, click [Next] to continue.
Language / Keyboard Selection
The next two screens prompt you for the Language and Keyboard settings. In almost all cases, you can accept the defaults. Make the appropriate selection for your configuration and click [Next] to continue.
Installation Type
Choose the [Custom] option and click [Next] to continue.
Disk Partitioning Setup
Select [Automatically partition] and click [Next] to continue.

If there were a previous installation of Linux on this machine, the next screen will ask if you want to "remove" or "keep" old partitions. Select the option to [Remove all partitions on this system]. Also, ensure that the [hda] drive is selected for this installation. I also keep the checkbox [Review (and modify if needed) the partitions created] selected. Click [Next] to continue.

You will then be prompted with a dialog window asking if you really want to remove all partitions. Click [Yes] to acknowledge this warning.

Partitioning
The installer will then allow you to view (and modify if needed) the disk partitions it automatically selected. For most automatic layouts, the installer will choose 100MB for /boot, double the amount of RAM (systems with < 2GB RAM) or an amount equal to RAM (systems with > 2GB RAM) for swap, and the rest for the root (/) partition. Starting with EL 4, the installer creates this same layout but builds it using the Logical Volume Manager (LVM). For example, it will partition the first hard drive (/dev/hda for my configuration) into two partitions: one for the /boot partition (/dev/hda1) and the remainder of the disk dedicated to an LVM volume group named VolGroup00 (/dev/hda2). The volume group (VolGroup00) is then divided into two logical volumes: one for the root filesystem (/) and another for swap.

The main concern during the partitioning phase is to ensure enough swap space is allocated as required by Oracle (which is a multiple of the available RAM). The following is Oracle's requirement for swap space:

Available RAM            Swap Space Required
Between 1 GB and 2 GB    1.5 times the size of RAM
Between 2 GB and 8 GB    Equal to the size of RAM
More than 8 GB           0.75 times the size of RAM

For the purpose of this install, I accepted all of the automatically chosen sizes, including 2GB for swap since I have 1GB of RAM installed (comfortably above the 1.5GB that Oracle requires for 1GB of RAM).
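The swap table above can be turned into a quick check on each node. The following is a minimal sketch, not an Oracle tool: required_swap_mb (a hypothetical helper name) applies the three rules to a RAM size in MB, and check_swap (also hypothetical) compares the result with the MemTotal and SwapTotal values in /proc/meminfo. Two assumptions are labeled in the comments: exactly 2048 MB is treated as part of the 1.5x band, and sizes below 1 GB (which the table does not cover) also use the 1.5x rule. Note that MemTotal reports slightly less than the physically installed RAM.

```shell
#!/bin/bash
# Minimal sketch: Oracle 10g R2 swap requirement from the table above.
# Function names are hypothetical helpers, not Oracle tools.

# required_swap_mb RAM_MB -> prints the minimum swap size in MB
required_swap_mb() {
    local ram_mb=$1
    if [ "$ram_mb" -le 2048 ]; then
        # 1 GB - 2 GB: 1.5 times RAM.
        # Assumption: exactly 2048 MB and anything below 1 GB use this rule too.
        echo $(( ram_mb * 3 / 2 ))
    elif [ "$ram_mb" -le 8192 ]; then
        # 2 GB - 8 GB: equal to RAM
        echo "$ram_mb"
    else
        # More than 8 GB: 0.75 times RAM
        echo $(( ram_mb * 3 / 4 ))
    fi
}

# check_swap -> compares the configured swap against the requirement.
# /proc/meminfo reports sizes in kB; MemTotal is slightly below installed RAM.
check_swap() {
    local ram_mb swap_mb need_mb
    ram_mb=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) / 1024 ))
    swap_mb=$(( $(awk '/^SwapTotal:/ {print $2}' /proc/meminfo) / 1024 ))
    need_mb=$(required_swap_mb "$ram_mb")
    if [ "$swap_mb" -ge "$need_mb" ]; then
        echo "OK: ${swap_mb} MB swap >= ${need_mb} MB required"
    else
        echo "WARNING: ${swap_mb} MB swap < ${need_mb} MB required"
        return 1
    fi
}
```

After the installation, running check_swap on each node should report the automatically sized swap volume as sufficient for a 1GB machine.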

If for any reason, the automatic layout does not configure an adequate amount of swap space, you can easily change that from this screen. To increase the size of the swap partition, [Edit] the volume group VolGroup00. This will bring up the "Edit LVM Volume Group: VolGroup00" dialog. First, [Edit] and decrease the size of the root file system (/) by the amount you want to add to the swap partition. For example, to add another 512MB to swap, you would decrease the size of the root file system by 512MB (i.e. 36,032MB - 512MB = 35,520MB). Now add the space you decreased from the root file system (512MB) to the swap partition. When completed, click [OK] on the "Edit LVM Volume Group: VolGroup00" dialog.

Once you are satisfied with the disk layout, click [Next] to continue.

Boot Loader Configuration
The installer will use the GRUB boot loader by default. To use the GRUB boot loader, accept all default values and click [Next] to continue.
Network Configuration
I made sure to install both NIC interfaces (cards) in each of the Linux machines before starting the operating system installation. This screen should have successfully detected each of the network devices.

First, make sure that [Active on boot] is checked for each of the network devices. The installer may choose not to activate eth1 by default.

Second, [Edit] both eth0 and eth1 as follows. You may choose to use different IP addresses for both eth0 and eth1 and that is OK. Configure eth1 (the interconnect and storage network) on a different subnet than eth0 (the public network):

eth0:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.1.100
- Netmask: 255.255.255.0

eth1:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.2.100
- Netmask: 255.255.255.0

Continue by setting your hostname manually. I used "linux1" for the first node and "linux2" for the second. Finish this dialog off by supplying your gateway and DNS servers.

Firewall
On this screen, make sure to select [No firewall]. Also under the option to "Enable SELinux?", select [Disabled] and click [Next] to continue.

You will be prompted with a warning dialog about not setting the firewall. If this occurs, simply hit [Proceed] to continue.

Additional Language Support / Time Zone
The next two screens allow you to select additional language support and time zone information. In almost all cases, you can accept the defaults. Make the appropriate selection for your configuration and click [Next] to continue.
Set Root Password
Select a root password and click [Next] to continue.
Package Group Selection
Scroll down to the bottom of this screen and select [Everything] under the "Miscellaneous" section. Click [Next] to continue.

Please note that the installation of Oracle does not require all Linux packages to be installed. My decision to install all packages was for the sake of brevity. Please see section "Pre-Installation Tasks for Oracle10g Release 2" for a more detailed look at the critical packages required for a successful Oracle installation.

Also note that with some RHEL 4 distributions, you will not get the "Package Group Selection" screen by default. Instead, you are asked to choose between "Install default software packages" and "Customize software packages to be installed". Select the option to "Customize software packages to be installed" and click [Next] to continue. This will then bring up the "Package Group Selection" screen. Now, scroll down to the bottom of this screen and select [Everything] under the "Miscellaneous" section. Click [Next] to continue.

About to Install
This screen is basically a confirmation screen. Click [Next] on this screen and then the [Continue] button on the dialog box to start the installation. During the installation process, you will be asked to switch disks to Disk #2, Disk #3, and then Disk #4.

Note that with CentOS 4.5, the installer would ask to switch to Disk #2, Disk #3, Disk #4, Disk #1, and then back to Disk #4.

Graphical Interface (X) Configuration
With most RHEL 4 distributions (not the case with CentOS 4.5), when the installation is complete, the installer will attempt to detect your video hardware. Ensure that the installer has detected and selected the correct video hardware (graphics card and monitor) to properly use the X Windows server. You will continue with the X configuration in the next several screens.
Congratulations
And that's it. You have successfully installed CentOS on the first node (linux1). The installer will eject the CD from the CD-ROM drive. Take out the CD and click [Reboot] to reboot the system.

When the system boots into Linux for the first time, it will prompt you with another Welcome screen. The following wizard allows you to configure the date and time, add any additional users, test the sound card, and install any additional CDs. The only screen I care about is the time and date (and if you are using CentOS 4.x, the monitor/display settings). As for the others, simply run through them as there is nothing additional that needs to be installed (at this point anyway!). If everything was successful, you should now be presented with the login screen.

Perform the same installation on the second node
After completing the Linux installation on the first node, repeat the above steps for the second node (linux2). When configuring the machine name and networking, be sure to use the proper values. For my installation, this is what I configured for linux2:

First, make sure that [Active on boot] is checked for each of the network devices. The installer may choose not to activate eth1 by default.

Second, [Edit] both eth0 and eth1 as follows:

eth0:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.1.101
- Netmask: 255.255.255.0

eth1:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.2.101
- Netmask: 255.255.255.0

Continue by setting your hostname manually. I used "linux2" for the second node. Finish this dialog off by supplying your gateway and DNS servers.



Network Configuration


  Perform the following network configuration on both Oracle RAC nodes in the cluster!


Introduction to Network Settings

Although we configured several of the network settings during the installation of CentOS, it is important to not skip this section as it contains critical steps that are required for a successful RAC environment.

During the Linux O/S install, we already configured the IP address and host name for both of the Oracle RAC nodes. We now need to configure the /etc/hosts file and adjust several of the network settings for the interconnect.

Both of the Oracle RAC nodes should have one static IP address for the public network and one static IP address for the private cluster interconnect. The private interconnect should only be used by Oracle to transfer Cluster Manager and Cache Fusion related data, along with data for the network storage server (Openfiler). Note that Oracle does not support using the public network interface for the interconnect. You must have one network interface for the public network and another network interface for the private interconnect. For a production RAC implementation, the interconnect should be at least gigabit and used only by Oracle, and the network storage server (Openfiler) should be on its own separate gigabit network.


Configuring Public and Private Network

In our two node example, we need to configure the network on both Oracle RAC nodes for access to the public network as well as their private interconnect.

The easiest way to configure network settings in Red Hat Linux is with the program Network Configuration. This application can be started from the command-line as the "root" user account as follows:

# su -
# /usr/bin/system-config-network &

  Do not use DHCP for the public IP address or the interconnects; we need static IP addresses!

Using the Network Configuration application, you need to configure both NIC devices as well as the /etc/hosts file. Both of these tasks can be completed using the Network Configuration GUI. Notice that the /etc/hosts entries are the same for both nodes.

Our example configuration will use the following settings:

Oracle RAC Node 1 - (linux1)
Device  IP Address     Subnet         Gateway      Purpose
eth0    192.168.1.100  255.255.255.0  192.168.1.1  Connects linux1 to the public network
eth1    192.168.2.100  255.255.255.0               Connects linux1 (interconnect) to linux2 (linux2-priv)
/etc/hosts
127.0.0.1        localhost.localdomain localhost

# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2

# Private Interconnect - (eth1)
192.168.2.100    linux1-priv
192.168.2.101    linux2-priv

# Public Virtual IP (VIP) addresses - (eth0:1)
192.168.1.200    linux1-vip
192.168.1.201    linux2-vip

# Private Storage Network for Openfiler - (eth1)
192.168.1.195    openfiler1
192.168.2.195    openfiler1-priv

Oracle RAC Node 2 - (linux2)
Device  IP Address     Subnet         Gateway      Purpose
eth0    192.168.1.101  255.255.255.0  192.168.1.1  Connects linux2 to the public network
eth1    192.168.2.101  255.255.255.0               Connects linux2 (interconnect) to linux1 (linux1-priv)
/etc/hosts
127.0.0.1        localhost.localdomain localhost

# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2

# Private Interconnect - (eth1)
192.168.2.100    linux1-priv
192.168.2.101    linux2-priv

# Public Virtual IP (VIP) addresses - (eth0:1)
192.168.1.200    linux1-vip
192.168.1.201    linux2-vip

# Private Storage Network for Openfiler - (eth1)
192.168.1.195    openfiler1
192.168.2.195    openfiler1-priv

  Note that the virtual IP addresses only need to be defined in the /etc/hosts file (or your DNS) for both Oracle RAC nodes. The public virtual IP addresses will be configured automatically by Oracle when you run the Oracle Universal Installer, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA). All virtual IP addresses will be activated when the srvctl start nodeapps -n <node_name> command is run. Although I am getting ahead of myself, this is the Host Name/IP Address that will be configured in the client(s) tnsnames.ora file for each Oracle Net Service Name. All of this will be explained much later in this article!
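Because both nodes must carry identical /etc/hosts entries, a quick scripted comparison can catch typos before the Oracle installers do. The sketch below is only an illustration: the function name check_hosts_entries and the EXPECTED list are my own, mirroring the addresses used in this article, so edit EXPECTED if you chose different ones. It checks the file contents only, not reachability (the VIP addresses will not respond until Oracle activates them).

```shell
#!/bin/bash
# Minimal sketch: confirm a hosts file contains every address/name pair
# used in this article. EXPECTED and check_hosts_entries are hypothetical.
EXPECTED="192.168.1.100 linux1
192.168.1.101 linux2
192.168.2.100 linux1-priv
192.168.2.101 linux2-priv
192.168.1.200 linux1-vip
192.168.1.201 linux2-vip
192.168.1.195 openfiler1
192.168.2.195 openfiler1-priv"

# check_hosts_entries [FILE] -> prints OK/MISSING per pair, returns 1 if any are missing
check_hosts_entries() {
    local file=${1:-/etc/hosts} rc=0 ip name
    while read -r ip name; do
        # Look for a line whose first field is the IP and whose remaining
        # fields include the host name (comment lines start with "#").
        if awk -v ip="$ip" -v name="$name" '
              $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
              END { exit !found }' "$file"; then
            echo "OK       $ip $name"
        else
            echo "MISSING  $ip $name"
            rc=1
        fi
    done <<EOF
$EXPECTED
EOF
    return $rc
}
```

Run check_hosts_entries /etc/hosts on both linux1 and linux2; any line marked MISSING points at an entry to fix.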


In the screen shots below, only Oracle RAC Node 1 (linux1) is shown. Be sure to make all the proper network settings on both Oracle RAC nodes!



Figure 2: Network Configuration Screen - Node 1 (linux1)



Figure 3: Ethernet Device Screen - eth0 (linux1)



Figure 4: Ethernet Device Screen - eth1 (linux1)



Figure 5: Network Configuration Screen - /etc/hosts (linux1)


Once the network is configured, you can use the ifconfig command to verify everything is working. The following example is from linux1:

# /sbin/ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:14:6C:76:5C:71
          inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::214:6cff:fe76:5c71/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3059 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1539 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3460697 (3.3 MiB)  TX bytes:145612 (142.1 KiB)
          Interrupt:169 Base address:0xef00

eth1      Link encap:Ethernet  HWaddr 00:0E:0C:64:D1:E5
          inet addr:192.168.2.100  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20e:cff:fe64:d1e5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:782 (782.0 b)
          Base address:0xddc0 Memory:fe9c0000-fe9e0000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1764 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1764 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1991946 (1.8 MiB)  TX bytes:1991946 (1.8 MiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
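Rather than reading the full ifconfig listing by eye, the IPv4 address of each interface can be pulled out with awk. This is a minimal sketch assuming the RHEL 4 era net-tools output format shown above (inet addr:...); newer ifconfig versions print inet 192.168... instead and would need a different pattern. The names iface_addrs and check_iface are hypothetical helpers.

```shell
#!/bin/bash
# Minimal sketch: extract "<interface> <IPv4 address>" pairs from
# `ifconfig -a` output read on stdin (old net-tools "inet addr:" format).
iface_addrs() {
    awk '
        /^[^ \t]/ { iface = $1 }     # a non-indented line starts a new interface
        /inet addr:/ {               # does not match "inet6 addr:" lines
            split($2, a, ":")        # "addr:192.168.1.100" -> a[2] = address
            print iface, a[2]
        }'
}

# check_iface IFACE EXPECTED_IP -> returns 0 if IFACE has exactly that address
check_iface() {
    iface_addrs | grep -qxF "$1 $2"
}
```

For example, /sbin/ifconfig -a | check_iface eth1 192.168.2.100 && echo "eth1 OK" on linux1 (use 192.168.2.101 on linux2).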


About Virtual IP

Why do we have a Virtual IP (VIP) in 10g? Why does it just return a dead connection when its primary node fails?

It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen.

  1. The new node re-arps the world indicating a new MAC address for the address. For directly connected clients, this usually causes them to see errors on their connections to the old address.

  2. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.

Going one step further means making use of Transparent Application Failover (TAF). With TAF successfully configured, it is possible to avoid ORA-3113 errors altogether! TAF will be discussed in more detail in the section "Transparent Application Failover - (TAF)".

Without using VIPs, clients connected to a node that died will often wait a 10 minute TCP timeout period before getting an error. As a result, you don't really have a good HA solution without using VIPs.

Source - Metalink: "RAC Frequently Asked Questions" (Note:220970.1)


Confirm the RAC Node Name is Not Listed in Loopback Address

Ensure that the node names (linux1 or linux2) are not included for the loopback address in the /etc/hosts file. If the machine name is listed in the loopback address entry as below:
    127.0.0.1        linux1 localhost.localdomain localhost
it will need to be removed as shown below:
    127.0.0.1        localhost.localdomain localhost

  If the RAC node name is listed for the loopback address, you will receive the following error during the RAC installation:
ORA-00603: ORACLE server session terminated by fatal error
or
ORA-29702: error occurred in Cluster Group Service operation


Confirm localhost is defined in the /etc/hosts file for the loopback address

Ensure that the entries for localhost.localdomain and localhost are included for the loopback address in the /etc/hosts file on each of the Oracle RAC nodes:
    127.0.0.1        localhost.localdomain localhost

  If an entry does not exist for localhost in the /etc/hosts file, Oracle Clusterware will be unable to start the application resources, notably the ONS process. The error will indicate "Failed to get IP for localhost" and will be written to the log file for ONS. For example:
CRS-0215 could not start resource 'ora.linux1.ons'. Check log file
"/u01/app/crs/log/linux1/racg/ora.linux1.ons.log"
for more details.
The ONS log file will contain lines similar to the following:

Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2007-04-14 13:10:02.729: [ RACG][3086871296][13316][3086871296][ora.linux1.ons]: Failed to get IP for localhost (1)
Failed to get IP for localhost (1)
Failed to get IP for localhost (1)
onsctl: ons failed to start
...
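Both loopback rules from the last two sections can be checked in one pass. The following is a minimal sketch (on_loopback and check_loopback are hypothetical helper names) that inspects only lines whose first field is 127.0.0.1. By default it checks /etc/hosts against the short host name reported by hostname -s.

```shell
#!/bin/bash
# Minimal sketch of the two /etc/hosts loopback rules above.
# Helper names are hypothetical, not part of any Oracle tool.

# on_loopback FILE NAME -> returns 0 if NAME appears on a 127.0.0.1 line of FILE
on_loopback() {
    awk -v n="$2" '$1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == n) hit = 1 }
                   END { exit !hit }' "$1"
}

# check_loopback [FILE] [NODE] -> prints problems and returns 1 if any are found
check_loopback() {
    local file=${1:-/etc/hosts} node=${2:-$(hostname -s)} rc=0 name
    # Rule 1: the RAC node name must NOT be listed for the loopback address
    if on_loopback "$file" "$node"; then
        echo "ERROR: $node is listed for 127.0.0.1 (risk of ORA-00603 / ORA-29702)"
        rc=1
    fi
    # Rule 2: localhost.localdomain and localhost MUST be listed for it
    for name in localhost.localdomain localhost; do
        if ! on_loopback "$file" "$name"; then
            echo "ERROR: $name is not listed for 127.0.0.1 (ONS will fail to start)"
            rc=1
        fi
    done
    return $rc
}
```

Running check_loopback with no arguments on linux1 should print nothing and return 0 when /etc/hosts matches the entries shown in this article.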


Adjusting Network Settings

Starting with Oracle 9.2.0.1, Oracle uses UDP as the default protocol on Linux for inter-process communication (IPC), such as Cache Fusion and Cluster Manager buffer transfers between instances within the RAC cluster.

Oracle strongly suggests adjusting the default and maximum send buffer size (SO_SNDBUF socket option) to 256 KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256 KB (262144 bytes).

The receive buffers are used by TCP and UDP to hold received data until it is read by the application. With TCP, the receive buffer cannot overflow because the peer is not allowed to send data beyond the advertised window. UDP has no such flow control: datagrams that do not fit in the socket receive buffer are simply discarded, so a fast sender can overwhelm the receiver.

  The default and maximum buffer sizes can be changed in the /proc file system without a reboot:
# su - root

# sysctl -w net.core.rmem_default=262144
net.core.rmem_default = 262144

# sysctl -w net.core.rmem_max=262144
net.core.rmem_max = 262144

# sysctl -w net.core.wmem_default=262144
net.core.wmem_default = 262144

# sysctl -w net.core.wmem_max=262144
net.core.wmem_max = 262144

The above commands made the changes to the already running O/S. You should now make the above changes permanent (for each reboot) by adding the following lines to the /etc/sysctl.conf file for both nodes in your RAC cluster:

# +---------------------------------------------------------+
# | ADJUSTING NETWORK SETTINGS                              |
# +---------------------------------------------------------+
# | With Oracle 9.2.0.1 and onwards, Oracle now makes use   |
# | of UDP as the default protocol on Linux for             |
# | inter-process communication (IPC), such as Cache Fusion |
# | and Cluster Manager buffer transfers between instances  |
# | within the RAC cluster. Oracle strongly suggests to     |
# | adjust the default and maximum receive buffer size      |
# | (SO_RCVBUF socket option) to 256 KB, and the default    |
# | and maximum send buffer size (SO_SNDBUF socket option)  |
# | to 256 KB. The receive buffers are used by TCP and UDP  |
# | to hold received data until it is read by the           |
# | application. The receive buffer cannot overflow because |
# | the peer is not allowed to send data beyond the buffer  |
# | size window. This means that datagrams will be          |
# | discarded if they don't fit in the socket receive       |
# | buffer. This could cause the sender to overwhelm the    |
# | receiver.                                               |
# +---------------------------------------------------------+

# +---------------------------------------------------------+
# | Default setting in bytes of the socket "receive" buffer |
# | which may be set by using the SO_RCVBUF socket option.  |
# +---------------------------------------------------------+
net.core.rmem_default=262144

# +---------------------------------------------------------+
# | Maximum setting in bytes of the socket "receive" buffer |
# | which may be set by using the SO_RCVBUF socket option.  |
# +---------------------------------------------------------+
net.core.rmem_max=262144

# +---------------------------------------------------------+
# | Default setting in bytes of the socket "send" buffer    |
# | which may be set by using the SO_SNDBUF socket option.  |
# +---------------------------------------------------------+
net.core.wmem_default=262144

# +---------------------------------------------------------+
# | Maximum setting in bytes of the socket "send" buffer    |
# | which may be set by using the SO_SNDBUF socket option.  |
# +---------------------------------------------------------+
net.core.wmem_max=262144
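After editing /etc/sysctl.conf, running sysctl -p as root loads the file into the running kernel. To confirm that the values actually took effect on each node, the minimal sketch below reads them back from /proc/sys/net/core. The function name check_net_buffers is hypothetical, and the optional directory argument exists only so the function can be exercised against a copy of those files; it treats any value of at least 262144 bytes as acceptable.

```shell
#!/bin/bash
# Minimal sketch: confirm the socket buffer settings are active.
# check_net_buffers is a hypothetical helper; DIR defaults to the live
# kernel values under /proc/sys/net/core.
check_net_buffers() {
    local dir=${1:-/proc/sys/net/core} want=262144 rc=0 p val
    for p in rmem_default rmem_max wmem_default wmem_max; do
        val=$(cat "$dir/$p" 2>/dev/null)
        # A missing or unreadable file leaves val empty, which counts as a failure
        if [ -n "$val" ] && [ "$val" -ge "$want" ]; then
            echo "OK   net.core.$p = $val"
        else
            echo "LOW  net.core.$p = ${val:-unreadable} (want >= $want)"
            rc=1
        fi
    done
    return $rc
}
```

Running check_net_buffers after a reboot is a simple way to confirm that /etc/sysctl.conf was applied.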


Check and turn off UDP ICMP rejections:

During the Linux installation process, I chose not to configure the firewall. (By default, the installer selects the option to configure a firewall.) This has burned me several times, so I like to double-check that the firewall is not configured and that UDP ICMP filtering is turned off.

If UDP ICMP is blocked or rejected by the firewall, the Oracle Clusterware software will crash after several minutes of running. When the Oracle Clusterware process fails, you will have something similar to the following in the <machine_name>_evmocr.log file:

08/29/2005 22:17:19
oac_init:2: Could not connect to server, clsc retcode = 9
08/29/2005 22:17:19
a_init:12!: Client init unsuccessful : [32]
ibctx:1:ERROR: INVALID FORMAT
proprinit:problem reading the bootblock or superbloc 22
When experiencing this type of error, the solution is to remove the UDP ICMP (iptables) rejection rule, or simply to turn the firewall off. The Oracle Clusterware software will then operate normally and not crash. The following commands should be executed as the root user account:

  1. Check to ensure that the firewall is turned off. If the firewall is stopped (as it is in my example below), you do not need to proceed with the following steps.
    # /etc/rc.d/init.d/iptables status
    Firewall is stopped.

  2. If the firewall is running, you will first need to stop it manually to disable UDP ICMP rejections:
    # /etc/rc.d/init.d/iptables stop
    
    Flushing firewall rules: [  OK  ]
    Setting chains to policy ACCEPT: filter [  OK  ]
    Unloading iptables modules: [  OK  ]

  3. Then, to keep UDP ICMP rejections turned off after the next server reboot (they should always be off), disable the iptables service at boot:
    # chkconfig iptables off 
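To double-check step 3 later, the output of chkconfig --list iptables can be parsed to see whether the service is still enabled in any runlevel. This is a minimal sketch assuming the usual "iptables 0:off 1:off 2:on ..." output format; iptables_on_at_boot is a hypothetical helper name.

```shell
#!/bin/bash
# Minimal sketch: does iptables still start at boot?
# Reads `chkconfig --list iptables` output on stdin (assumed format:
# "iptables  0:off 1:off 2:on 3:on 4:on 5:on 6:off").
# Returns 0 and lists the runlevels if any are "on"; returns 1 otherwise.
iptables_on_at_boot() {
    awk '$1 == "iptables" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /:on$/) { split($i, r, ":"); levels = levels " " r[1] }
         }
         END {
             if (levels != "") { print "iptables enabled in runlevel(s):" levels; exit 0 }
             exit 1
         }'
}
```

chkconfig --list iptables | iptables_on_at_boot prints the offending runlevels and returns 0 if iptables would still start at boot; after chkconfig iptables off it should print nothing and return 1.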



Install Openfiler


  Perform the following installation on the network storage server (openfiler1)!

With the network configured on both Oracle RAC nodes, the next step is to install the Openfiler software to the network storage server (openfiler1). Later in this article, the network storage server will be configured as an iSCSI storage device for all Oracle RAC 10g shared storage requirements.

Powered by rPath Linux, Openfiler is a free browser-based network storage management utility that delivers file-based Network Attached Storage (NAS) and block-based Storage Area Networking (SAN) in a single framework. The entire software stack interfaces with open source applications such as Apache, Samba, LVM2, ext3, Linux NFS and iSCSI Enterprise Target. Openfiler combines these ubiquitous technologies into a small, easy to manage solution fronted by a powerful web-based management interface.

Openfiler supports CIFS, NFS, HTTP/DAV, and FTP; however, we will only be making use of its iSCSI capabilities to implement an inexpensive SAN for the shared storage components required by Oracle RAC 10g. A 500GB external hard drive will be connected to the Openfiler server via its USB 2.0 interface. The Openfiler server will be configured to use this disk for iSCSI based storage and will be used in our Oracle RAC 10g configuration to store the shared files required by Oracle Clusterware as well as all Oracle ASM volumes.

To learn more about Openfiler, please visit their website at http://www.openfiler.com/


Download Openfiler

Use the links (below) to download Openfiler 2.2 x86 (respin 2). After downloading Openfiler, you will then need to burn the ISO image to CD.

Note: At this time, Openfiler 2.3 is not supported for use with this article. Please use Openfiler 2.2 (respin 2). My current plans are to have this article updated and fully tested to work with CentOS 5.3 and Openfiler 2.3 (Final) by Q2 2009.

    openfiler-2.2-x86-disc1.iso

Processor Type x86
Size 323 MB
SHA1SUM cae69e2452eb660a3b73c315c6435c99fc25976d

    openfiler-2.2-x86_64-disc1.iso

Processor Type x86_64
Size 329 MB
SHA1SUM bbe345362a49db5ff7c19ac5768fc2c67f48037c
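Before burning the image, it is worth confirming that the download is intact by comparing its SHA1 checksum against the value listed above. A minimal sketch using the standard sha1sum utility from GNU coreutils; the function name verify_iso is hypothetical.

```shell
#!/bin/bash
# Minimal sketch: compare a downloaded ISO's SHA1 against the published value.
# verify_iso is a hypothetical helper name.
verify_iso() {
    local iso=$1 expected=$2 actual
    # sha1sum prints "<hash>  <file>"; keep only the hash
    actual=$(sha1sum "$iso" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $iso"
    else
        echo "MISMATCH: $iso (got ${actual:-nothing})" >&2
        return 1
    fi
}
```

For example: verify_iso openfiler-2.2-x86-disc1.iso cae69e2452eb660a3b73c315c6435c99fc25976d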


  If you are downloading the above ISO file to a MS Windows machine, there are many options for burning the ISO image (ISO file) to a CD. You may already be familiar with and have the proper software to burn images to CD. If you are not familiar with this process and do not have the required software to burn images to CD, here are just two (of many) software packages that can be used:

  UltraISO
  Magic ISO Maker


Install Openfiler

This section provides a summary of the screens used to install the Openfiler software. For the purpose of this article, I opted to install Openfiler with all default options. The only manual change required was for configuring the local network settings.

Once the install has completed, the server will reboot to make sure all required components, services and drivers are started and recognized. After the reboot, the external hard drive should be discovered by the Openfiler server as the device /dev/sda.

For more detailed installation instructions, please visit http://www.openfiler.com/docs/. I would suggest, however, that the instructions I have provided below be used for this Oracle RAC 10g configuration.

  Before installing the Openfiler software to the network storage server, you should have both NIC interfaces (cards) installed and any external hard drives connected and turned on.

After downloading and burning the Openfiler ISO image (ISO file) to CD, insert the CD into the network storage server (openfiler1 in this example), power it on, and answer the installation screen prompts as noted below.

Boot Screen

The first screen is the Openfiler boot screen. At the boot: prompt, hit [Enter] to start the installation process.
Media Test
When asked to test the CD media, tab over to [Skip] and hit [Enter]. If there were any errors, the media burning software would have warned us. After several seconds, the installer should then detect the video card, monitor, and mouse. The installer then goes into GUI mode.
Welcome to Openfiler NAS/SAN Appliance
At the welcome screen, click [Next] to continue.
Keyboard Configuration
The next screen prompts you for the Keyboard settings. Make the appropriate selection for your configuration.
Disk Partitioning Setup
The next screen asks whether to perform disk partitioning using "Automatic Partitioning" or "Manual Partitioning with Disk Druid". You can choose either method here, although the official Openfiler documentation suggests using Manual Partitioning. Since the internal hard drive I will be using for this install is small and only going to be used to store the Openfiler software (I will not be using any space on the internal 40GB hard drive for iSCSI storage), I opted to use "Automatic Partitioning".

Select [Automatically partition] and click [Next] to continue.

If there were a previous installation of Linux on this machine, the next screen will ask if you want to "remove" or "keep" old partitions. Select the option to [Remove all partitions on this system].

Important: Ensure that ONLY the hard drive you are going to use for the Openfiler software is selected for this installation (i.e. [hda]). If Openfiler detected any other internal or external disks that will be used for database storage, un-select them now. For example, in addition to the [hda] drive showing up, Openfiler also detected and selected the external 500GB SATA hard drive (as [sda]) which I needed to "un-select".

I also keep the checkbox [Review (and modify if needed) the partitions created] selected. Click [Next] to continue.

You will then be prompted with a dialog window asking if you really want to remove all partitions. Click [Yes] to acknowledge this warning.

Partitioning
The installer will then allow you to view (and modify if needed) the disk partitions it automatically selected for /dev/hda. In almost all cases, the installer will choose 100MB for /boot, double the amount of RAM for swap, and the rest going to the root (/) partition. I like to have a minimum of 1GB for swap. For the purpose of this install, I will accept all automatically preferred sizes. (Including 2GB for swap since I have 2GB of RAM installed.)
Network Configuration
I made sure to install both NIC interfaces (cards) in the network storage server before starting the Openfiler installation. This screen should have successfully detected each of the network devices.

First, make sure that [Active on boot] is checked for each of the network devices. The installer may choose not to activate eth1 by default.

Second, [Edit] both eth0 and eth1 as follows. You may choose to use different IP addresses for both eth0 and eth1 and that is OK. You must, however, configure eth1 (the storage network) to be on the same subnet you configured for eth1 on linux1 and linux2:

eth0:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.1.195
- Netmask: 255.255.255.0

eth1:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.2.195
- Netmask: 255.255.255.0

Continue by setting your hostname manually. I used a hostname of "openfiler1". Finish this dialog off by supplying your gateway and DNS servers.
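The subnet rules in this article (eth1 on the RAC nodes on a different subnet than eth0, and Openfiler's eth1 on the same subnet as the nodes' eth1) come down to comparing each address ANDed with the netmask. A minimal sketch of that comparison; ip2int and same_subnet are hypothetical helper names, and no input validation is performed.

```shell
#!/bin/bash
# Minimal sketch: test whether two IPv4 addresses share a subnet.
# ip2int and same_subnet are hypothetical helper names.

# ip2int A.B.C.D -> prints the address as a 32-bit integer
ip2int() {
    local IFS=.
    set -- $1          # split the dotted quad into $1..$4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet IP1 IP2 NETMASK -> returns 0 if (IP1 AND MASK) == (IP2 AND MASK)
same_subnet() {
    local a b m
    a=$(ip2int "$1")
    b=$(ip2int "$2")
    m=$(ip2int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

For example, same_subnet 192.168.2.195 192.168.2.100 255.255.255.0 succeeds (openfiler1-priv and linux1-priv share 192.168.2.0/24), while same_subnet 192.168.1.100 192.168.2.100 255.255.255.0 fails, as intended for eth0 and eth1 on linux1.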

Time Zone Selection
The next screen allows you to configure your time zone information. Make the appropriate selection for your location.
Set Root Password
Select a root password and click [Next] to continue.
About to Install
This screen is basically a confirmation screen. Click [Next] to start the installation.
Congratulations
And that's it. You have successfully installed Openfiler on the network storage server. The installer will eject the CD from the CD-ROM drive. Take out the CD and click [Reboot] to reboot the system.

If everything was successful after the reboot, you should now be presented with a text login screen and the URL(s) to use for administering the Openfiler server.

Modify /etc/hosts File on Openfiler Server
Although not mandatory, I typically copy the contents of the /etc/hosts file from one of the Oracle RAC nodes to the new Openfiler server. This allows convenient name resolution when testing the network for the cluster.



Configure iSCSI Volumes using Openfiler


  Perform the following configuration tasks on the network storage server (openfiler1)!

Openfiler administration is performed using the Openfiler Storage Control Center, a browser-based tool accessed over an https connection on port 446. For example:

https://openfiler1:446/
From the Openfiler Storage Control Center home page, log in as an administrator. The default administration login credentials for Openfiler are:

The first page the administrator sees is the [Accounts] / [Authentication] screen. Configuring user accounts and groups is not necessary for this article and will therefore not be discussed.

To use Openfiler as an iSCSI storage server, we have to perform three major tasks: set up iSCSI services, configure network access, and create physical storage.


Services

To control services, use the Openfiler Storage Control Center and navigate to [Services] / [Enable/Disable]:


Figure 6: Enable iSCSI Openfiler Service

To enable the iSCSI service, click on 'Enable' under the 'iSCSI target' service name. After that, the 'iSCSI target' status should change to 'Enabled' and the link next to it should change to 'Disable'.

The ietd program implements the user-level part of the iSCSI Enterprise Target software for building an iSCSI storage system on Linux. With the iSCSI target enabled, we should be able to SSH into the Openfiler server and see the iscsi-target service running:

[root@openfiler1 ~]# service iscsi-target status
ietd (pid 3784) is running...
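If you would rather script this check than eyeball it, the status line can be matched directly. The sketch below assumes the exact message format shown above ("ietd (pid N) is running..."); the helper name ietd_is_running is hypothetical. The commented netstat line checks that something is listening on TCP port 3260, the port that appears in the iSCSI target addresses reported to the RAC nodes.

```shell
#!/bin/bash
# Hypothetical helper: succeed if "service iscsi-target status" reports ietd running.
# The status text is read from stdin so the check can be tried without the service.
ietd_is_running() {
    grep -q '^ietd (pid [0-9][0-9]*) is running'
}

# Usage on openfiler1 (as root):
#   service iscsi-target status | ietd_is_running && echo "iSCSI target is up"
#   netstat -tln | grep ':3260 '    # the iSCSI target should be listening on TCP 3260
```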


Network Access Restriction

The next step is to configure network access in Openfiler so both Oracle RAC nodes (linux1 and linux2) have permission to access our iSCSI volumes through the storage (private) network.

  iSCSI volumes will be created in the next section!

Again, this task can be completed using the Openfiler Storage Control Center by navigating to [General] / [Local Networks]. The Local Networks screen allows an administrator to set up networks and/or hosts that will be allowed to access resources exported by the Openfiler appliance. For the purpose of this article, we will add both Oracle RAC nodes individually rather than allowing the entire 192.168.2.0 network to have access to Openfiler resources.

When entering each of the Oracle RAC nodes, note that the 'Name' field is just a logical name used for reference only. As a convention when entering nodes, I simply use the node name defined for that IP address. Next, when entering the actual node in the 'Network/Host' field, always use its IP address even though its host name may already be defined in your /etc/hosts file or DNS. Lastly, when entering actual hosts in our Class C network, use a subnet mask of 255.255.255.255.

It is important to remember that you will be entering the IP address of the private network (eth1) for each of the RAC nodes in the cluster.
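Why 255.255.255.255? A netmask is a 32-bit value, and a mask with all 32 bits set matches exactly one address, so each entry admits a single host rather than the whole 192.168.2.0 network (whose 255.255.255.0 mask leaves 8 host bits, or 256 addresses). The small bash function below, mask_to_prefix (a hypothetical name), simply counts the set bits to make that concrete.

```shell
#!/bin/bash
# Convert a dotted-quad netmask to its prefix length by counting the set bits.
mask_to_prefix() {
    local IFS=.
    local -a octets=($1)      # split "255.255.255.0" on the dots
    local bits=0 octet
    for octet in "${octets[@]}"; do
        while [ "$octet" -gt 0 ]; do
            bits=$(( bits + (octet & 1) ))
            octet=$(( octet >> 1 ))
        done
    done
    echo "$bits"
}

# mask_to_prefix 255.255.255.255   -> 32 (a single host)
# mask_to_prefix 255.255.255.0     -> 24 (the whole Class C network)
```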

The following image shows the results of adding both Oracle RAC nodes:


Figure 7: Configure Openfiler Host Access for Oracle RAC Nodes


Physical Storage

In this section, we will be creating the five iSCSI volumes to be used as shared storage by both of the Oracle RAC nodes in the cluster. This involves multiple steps that will be performed on the external USB hard drive connected to the Openfiler server.

Storage devices like internal IDE/SATA/SCSI disks, external USB or FireWire drives, or any other storage can be connected to the Openfiler server, and served to the clients. Once these devices are discovered at the OS level, Openfiler Storage Control Center can be used to set up and manage all that storage.

In our case, we have a 500GB external USB hard drive for our storage needs. On the Openfiler server this drive is seen as /dev/sda (HDS725050KLAT80). To see this and to start the process of creating our iSCSI volumes, navigate to [Volumes] / [Physical Storage Mgmt.] from the Openfiler Storage Control Center:


Figure 8: Openfiler Physical Storage

Partitioning the Physical Disk

The first step we will perform is to create a single primary partition on the /dev/sda external USB hard drive. By clicking on the /dev/sda link, we are presented with the options to 'Edit' or 'Create' a partition. Since we will be creating a single primary partition that spans the entire disk, most of the options can be left at their default settings; the only modification is to change the 'Partition Type' from 'Extended partition' to 'Physical volume'. Here are the values I specified to create the primary partition on /dev/sda:

Mode: Primary
Partition Type: Physical volume
Starting Cylinder: 1
Ending Cylinder: 60801

The size now shows 465.76 GB. To accept that, we click on the Create button. This results in a new partition (/dev/sda1) on our external hard drive:


Figure 9: Partition the Physical Volume
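The 465.76 GB figure follows directly from the disk geometry: each cylinder holds heads * sectors per track * 512 bytes, and Openfiler's "GB" here is 1024^3 bytes. The helper below is a sketch that assumes the usual 255-head, 63-sector translated geometry (the same geometry fdisk reports for the iSCSI volumes later in this article); cyl_to_gib is a hypothetical name.

```shell
#!/bin/bash
# Size of a cylinder range in GiB from CHS geometry:
#   cylinders * heads * sectors-per-track * 512-byte sectors / 1024^3
# Heads and sectors default to 255 and 63 (an assumption; pass your own values).
cyl_to_gib() {
    local cylinders=$1 heads=${2:-255} sectors=${3:-63}
    awk -v c="$cylinders" -v h="$heads" -v s="$sectors" \
        'BEGIN { printf "%.2f\n", c * h * s * 512 / (1024 ^ 3) }'
}
```

With these defaults, cyl_to_gib 60801 prints 465.76, matching the size Openfiler displays for cylinders 1 through 60801.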

Volume Group Management
The next step is to create a Volume Group. We will be creating a single volume group named rac1 that contains the newly created primary partition.

From the Openfiler Storage Control Center, navigate to [Volumes] / [Volume Group Mgmt.]. There we would see any existing volume groups, or none as in our case. Using the Volume Group Management screen, enter the name of the new volume group (rac1), click on the checkbox in front of /dev/sda1 to select that partition, and finally click on the 'Add volume group' button. After that we are presented with the list that now shows our newly created volume group named "rac1":


Figure 10: New Volume Group Created

Logical Volumes
We can now create the five logical volumes in the newly created volume group (rac1).

From the Openfiler Storage Control Center, navigate to [Volumes] / [Create New Volume]. There we will see the newly created volume group (rac1) along with its block storage statistics. Also available at the bottom of this screen is the option to create a new volume in the selected volume group. Use this screen to create the following five logical (iSCSI) volumes. After creating each logical volume, the application will point you to the "List of Existing Volumes" screen. You will then need to click back to the "Create New Volume" tab to create the next logical volume until all five iSCSI volumes are created:

iSCSI / Logical Volumes
Volume Name   Volume Description    Required Space (MB)   Filesystem Type
crs           Oracle Clusterware    2,048                 iSCSI
asm1          Oracle ASM Volume 1   118,720               iSCSI
asm2          Oracle ASM Volume 2   118,720               iSCSI
asm3          Oracle ASM Volume 3   118,720               iSCSI
asm4          Oracle ASM Volume 4   118,720               iSCSI
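Before creating the volumes it is worth checking that they fit. The sketch below adds up the sizes from the table above (in MB of 1024 * 1024 bytes) and compares the total with the capacity of the 60801-cylinder partition, again assuming 255 heads and 63 sectors per track. It ignores LVM metadata and extent rounding, which eat into the small amount of headroom that remains.

```shell
#!/bin/bash
# Requested logical volume sizes in MB: crs, asm1, asm2, asm3, asm4
volume_sizes_mb=(2048 118720 118720 118720 118720)

# Sum of all requested volume sizes
total_requested_mb() {
    local sum=0 size
    for size in "${volume_sizes_mb[@]}"; do
        sum=$(( sum + size ))
    done
    echo "$sum"
}

# Capacity of /dev/sda1: 60801 cylinders * 255 heads * 63 sectors * 512 bytes, in MB
pv_capacity_mb() {
    echo $(( 60801 * 255 * 63 * 512 / 1048576 ))
}

echo "requested: $(total_requested_mb) MB, available: $(pv_capacity_mb) MB"
```

Run as-is, it prints requested: 476928 MB, available: 476937 MB, so the five volumes use essentially the whole disk.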

In effect we have created five iSCSI disks that can now be presented to iSCSI clients (linux1 and linux2) on the network. The "List of Existing Volumes" screen should look as follows:


Figure 11: New Logical (iSCSI) Volumes

Grant Access Rights to New Logical Volumes
Before an iSCSI client can access the newly created iSCSI volumes, it needs to be granted the appropriate permissions. A while back, we configured Openfiler with two hosts (the Oracle RAC nodes) that can be given access rights to resources. We now need to grant both of the Oracle RAC nodes access to each of the newly created iSCSI volumes.

From the Openfiler Storage Control Center, navigate to [Volumes] / [List of Existing Volumes]. This will present the screen shown in the previous section. For each of the logical volumes, click on the 'Edit' link (under the Properties column). This will bring up the 'Edit properties' screen for that volume. Scroll to the bottom of this screen; change both hosts from 'Deny' to 'Allow' and click the 'Update' button. Perform this task for all five logical volumes.


Figure 12: Grant Host Access to Logical (iSCSI) Volumes

Make iSCSI Targets Available to Clients
Every time a new logical volume is added, we need to restart the associated service on the Openfiler server. In our case we created five iSCSI logical volumes, so we have to restart the iSCSI target (iscsi-target) service. This will make the new iSCSI targets available to all clients on the network who have privileges to access them.

To restart the iSCSI target service, use the Openfiler Storage Control Center and navigate to [Services] / [Enable/Disable]. The iSCSI target service should already be enabled (several sections back). If so, disable the service then enable it again. (See Figure 6)

The same task can be achieved through an SSH session on the Openfiler server:

[root@openfiler1 ~]# service iscsi-target restart
Stopping iSCSI target service: [  OK  ]
Starting iSCSI target service: [  OK  ]



Configure iSCSI Volumes on Oracle RAC Nodes


  Configure the iSCSI initiator on both Oracle RAC nodes in the cluster! Creating partitions, however, should only be executed on one of the nodes in the RAC cluster.

An iSCSI client can be any system (Linux, Unix, MS Windows, Apple Mac, etc.) for which iSCSI support (a driver) is available. In our case, the clients are two Linux servers, linux1 and linux2, running Red Hat Enterprise Linux 4.

In this section we will be configuring the iSCSI initiator on both of the Oracle RAC nodes. This involves configuring the /etc/iscsi.conf file on both Oracle RAC nodes with the name of the network storage server (openfiler1) so they can discover the iSCSI volumes created in the previous section. We then go through the arduous task of mapping the iSCSI target names discovered from Openfiler to the local SCSI device names on one of the nodes, namely linux1 (the node from which we will partition the iSCSI volumes). This is often considered a lengthy task, but it only needs to be performed in this section and when formatting the iSCSI volumes with the Oracle Cluster File System (OCFS2) and Automatic Storage Management (ASM). Knowing the local SCSI device name and which iSCSI target it maps to is required in order to know which volume (device) is to be used for OCFS2 and which volumes belong to ASM.

Note that every time one of the Oracle RAC nodes is rebooted, the mappings may be different. For example, the iSCSI target name "iqn.2006-01.com.openfiler:rac1.crs" may have been discovered as /dev/sdd on linux1 during the process of configuring the volumes (as it was for me when writing this section!). After rebooting this node, however, "iqn.2006-01.com.openfiler:rac1.crs" may get discovered as /dev/sde. This will not be a problem for normal operations after installation and configuration, since all disks will be labeled by either OCFS2 or ASM (later in this article). When either of these services attempts to mount a volume, it will do so using the volume's label and not its local SCSI device name.


iSCSI (initiator) service

On each of the Oracle RAC nodes, we have to make sure the iSCSI (initiator) service is up and running. If it was not installed as part of the operating system setup, the iscsi-initiator-utils RPM (e.g. iscsi-initiator-utils-4.0.3.0-5.i386.rpm) should be downloaded and installed on each of the Oracle RAC nodes.

  Both of the Oracle RAC nodes must have the iscsi-initiator-utils RPM installed. To determine if this package is installed, perform the following on both Oracle RAC nodes:
# rpm -qa | grep iscsi
iscsi-initiator-utils-4.0.3.0-5
If not installed, the iscsi-initiator-utils RPM package can be found on disk 3 of 4 of the RHEL4 Update 5 distribution or downloaded from one of the Internet RPM resources.

Use the following command to install the iscsi-initiator-utils RPM package if not present:

# rpm -Uvh iscsi-initiator-utils-4.0.3.0-5.i386.rpm
warning: iscsi-initiator-utils-4.0.3.0-5.i386.rpm: V3 DSA signature: NOKEY, key ID 443e1821
Preparing...                ########################################### [100%]
   1:iscsi-initiator-utils  ########################################### [100%]

After verifying that the iscsi-initiator-utils RPM is installed, the only configuration step required on the Oracle RAC nodes (iSCSI client) is to specify the network storage server (iSCSI server) in the /etc/iscsi.conf file. Edit the /etc/iscsi.conf file and include an entry for DiscoveryAddress which specifies the hostname of the Openfiler network storage server. In our case that was:

...
DiscoveryAddress=openfiler1-priv
...
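If you prefer to make this edit from the command line, the sketch below appends the entry only when an identical line is not already present, so running it twice does no harm. The function name ensure_discovery_address is hypothetical, and it deliberately leaves any other DiscoveryAddress lines already in the file untouched.

```shell
#!/bin/bash
# Hypothetical helper: make sure FILE contains the line "DiscoveryAddress=HOST".
# Appends the line only if it is not already there (safe to run repeatedly).
ensure_discovery_address() {
    local file=$1 host=$2
    if ! grep -q "^DiscoveryAddress=${host}\$" "$file"; then
        echo "DiscoveryAddress=${host}" >> "$file"
    fi
}

# Usage (as root on linux1 and linux2), before restarting the initiator:
#   cp /etc/iscsi.conf /etc/iscsi.conf.orig
#   ensure_discovery_address /etc/iscsi.conf openfiler1-priv
```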
After making the change to the /etc/iscsi.conf file on both Oracle RAC nodes, we can start (or restart) the iscsi initiator service on both nodes:
# service iscsi restart
Searching for iscsi-based multipath maps
Found 0 maps
Stopping iscsid: iscsid not running

Checking iscsi config:  [  OK  ]
Loading iscsi driver:  [  OK  ]
Starting iscsid: [  OK  ]
We should also configure the iSCSI service to be active across machine reboots for both Oracle RAC nodes. The Linux command chkconfig can be used to achieve that as follows:
# chkconfig --level 345 iscsi on
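To confirm that the change took effect, chkconfig --list iscsi prints a single line with the on/off state of the service in each runlevel. The sketch below, with the hypothetical name runlevels_345_on, assumes the usual "iscsi 0:off 1:off ... 6:off" layout of that line and succeeds only if runlevels 3, 4 and 5 are all on.

```shell
#!/bin/bash
# Succeed if a "chkconfig --list" line (read from stdin) shows runlevels 3, 4 and 5 on.
# Usage:  chkconfig --list iscsi | runlevels_345_on && echo "iscsi starts at boot"
runlevels_345_on() {
    local line rl
    read -r line
    for rl in 3 4 5; do
        case "$line" in
            *"${rl}:on"*) ;;          # this runlevel is on; keep checking
            *) return 1 ;;            # runlevel missing or off
        esac
    done
}
```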


Discovering iSCSI Targets

Although the iSCSI initiator service has been configured and is running on both of the Oracle RAC nodes, the discovery instructions in this section only need to be run from the node we will be partitioning and labeling volumes from; namely linux1.

When the Openfiler server publishes available iSCSI targets, configured clients get the message that new iSCSI disks are now available. This happens when the iscsi-target service gets started/restarted on the Openfiler server or when the iSCSI initiator service is started/restarted on the client. We would see something like this in the client's /var/log/messages file:

...
Aug 18 12:39:39 linux1 iscsi: iscsid startup succeeded
Aug 18 12:39:39 linux1 iscsid[13822]: Connected to Discovery Address 192.168.2.195
Aug 18 12:39:39 linux1 kernel: iscsi-sfnet:host0: Session established
Aug 18 12:39:39 linux1 kernel: iscsi-sfnet:host2: Session established
Aug 18 12:39:39 linux1 kernel: iscsi-sfnet:host1: Session established
Aug 18 12:39:39 linux1 kernel: scsi0 : SFNet iSCSI driver
Aug 18 12:39:39 linux1 kernel: scsi2 : SFNet iSCSI driver
Aug 18 12:39:39 linux1 kernel: scsi1 : SFNet iSCSI driver
Aug 18 12:39:39 linux1 kernel:   Vendor: Openfile  Model: Virtual disk      Rev: 0
Aug 18 12:39:39 linux1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Aug 18 12:39:39 linux1 kernel: SCSI device sda: 243138560 512-byte hdwr sectors (124487 MB)
Aug 18 12:39:39 linux1 kernel: SCSI device sda: drive cache: write through
Aug 18 12:39:39 linux1 kernel:   Vendor: Openfile  Model: Virtual disk      Rev: 0
Aug 18 12:39:39 linux1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Aug 18 12:39:39 linux1 kernel: SCSI device sda: 243138560 512-byte hdwr sectors (124487 MB)
Aug 18 12:39:39 linux1 kernel: iscsi-sfnet:host3: Session established
Aug 18 12:39:39 linux1 kernel: iscsi-sfnet:host4: Session established
Aug 18 12:39:39 linux1 kernel: scsi3 : SFNet iSCSI driver
Aug 18 12:39:39 linux1 kernel: SCSI device sda: drive cache: write through
Aug 18 12:39:39 linux1 kernel:  sda: unknown partition table
Aug 18 12:39:39 linux1 kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Aug 18 12:39:39 linux1 kernel:   Vendor: Openfile  Model: Virtual disk      Rev: 0
Aug 18 12:39:39 linux1 scsi.agent[13934]: disk at /devices/platform/host0/target0:0:0/0:0:0:0
Aug 18 12:39:39 linux1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Aug 18 12:39:39 linux1 kernel:   Vendor: Openfile  Model: Virtual disk      Rev: 0
Aug 18 12:39:39 linux1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Aug 18 12:39:39 linux1 kernel: scsi4 : SFNet iSCSI driver
Aug 18 12:39:39 linux1 kernel: SCSI device sdb: 243138560 512-byte hdwr sectors (124487 MB)
Aug 18 12:39:39 linux1 kernel:   Vendor: Openfile  Model: Virtual disk      Rev: 0
Aug 18 12:39:39 linux1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Aug 18 12:39:39 linux1 kernel: SCSI device sdb: drive cache: write through
Aug 18 12:39:39 linux1 scsi.agent[13983]: disk at /devices/platform/host2/target2:0:0/2:0:0:0
Aug 18 12:39:39 linux1 scsi.agent[13996]: disk at /devices/platform/host3/target3:0:0/3:0:0:0
Aug 18 12:39:40 linux1 kernel: SCSI device sdb: 243138560 512-byte hdwr sectors (124487 MB)
Aug 18 12:39:40 linux1 kernel: SCSI device sdb: drive cache: write through
Aug 18 12:39:40 linux1 kernel:  sdb: unknown partition table
Aug 18 12:39:40 linux1 kernel: Attached scsi disk sdb at scsi2, channel 0, id 0, lun 0
Aug 18 12:39:40 linux1 kernel: SCSI device sdc: 243138560 512-byte hdwr sectors (124487 MB)
Aug 18 12:39:40 linux1 kernel: SCSI device sdc: drive cache: write through
Aug 18 12:39:40 linux1 kernel: SCSI device sdc: 243138560 512-byte hdwr sectors (124487 MB)
Aug 18 12:39:40 linux1 kernel: SCSI device sdc: drive cache: write through
Aug 18 12:39:40 linux1 kernel:  sdc: unknown partition table
Aug 18 12:39:40 linux1 kernel: Attached scsi disk sdc at scsi3, channel 0, id 0, lun 0
Aug 18 12:39:40 linux1 kernel: SCSI device sdd: 243138560 512-byte hdwr sectors (124487 MB)
Aug 18 12:39:40 linux1 kernel: SCSI device sdd: drive cache: write through
Aug 18 12:39:40 linux1 kernel: SCSI device sdd: 243138560 512-byte hdwr sectors (124487 MB)
Aug 18 12:39:40 linux1 kernel: SCSI device sdd: drive cache: write through
Aug 18 12:39:40 linux1 kernel:  sdd: unknown partition table
Aug 18 12:39:40 linux1 kernel: Attached scsi disk sdd at scsi1, channel 0, id 0, lun 0
Aug 18 12:39:40 linux1 kernel: SCSI device sde: 4194304 512-byte hdwr sectors (2147 MB)
Aug 18 12:39:40 linux1 scsi.agent[14032]: disk at /devices/platform/host4/target4:0:0/4:0:0:0
Aug 18 12:39:40 linux1 scsi.agent[14045]: disk at /devices/platform/host1/target1:0:0/1:0:0:0
Aug 18 12:39:40 linux1 kernel: SCSI device sde: drive cache: write through
Aug 18 12:39:40 linux1 kernel: SCSI device sde: 4194304 512-byte hdwr sectors (2147 MB)
Aug 18 12:39:40 linux1 kernel: SCSI device sde: drive cache: write through
Aug 18 12:39:40 linux1 kernel:  sde: unknown partition table
Aug 18 12:39:40 linux1 kernel: Attached scsi disk sde at scsi4, channel 0, id 0, lun 0
...
The above entries show that the client (linux1) was able to establish the iSCSI sessions with the iSCSI storage server (openfiler1-priv at 192.168.2.195).
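With five volumes exported by openfiler1, you would expect five "Session established" messages after a restart, one per target (host0 through host4 above). The helper below is a minimal sketch that counts them; the function name is hypothetical, and because it counts every matching line in the file, sessions logged by earlier restarts are included too.

```shell
#!/bin/bash
# Count "Session established" messages from the SFNet iSCSI driver in a log file
# (defaults to /var/log/messages). Older entries in the same file are counted too.
count_iscsi_sessions() {
    grep -c 'iscsi-sfnet:host[0-9]*: Session established' "${1:-/var/log/messages}"
}

# Usage (as root on linux1):
#   count_iscsi_sessions
```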

We also see how the local SCSI device names map to iSCSI targets' host IDs and LUNs:

Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi2, channel 0, id 0, lun 0
Attached scsi disk sdc at scsi3, channel 0, id 0, lun 0
Attached scsi disk sdd at scsi1, channel 0, id 0, lun 0
Attached scsi disk sde at scsi4, channel 0, id 0, lun 0


Another way to determine how local SCSI device names map to iSCSI targets' host IDs and LUNs is with the dmesg command:

# dmesg | sort | grep '^Attached scsi disk'
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi2, channel 0, id 0, lun 0
Attached scsi disk sdc at scsi3, channel 0, id 0, lun 0
Attached scsi disk sdd at scsi1, channel 0, id 0, lun 0
Attached scsi disk sde at scsi4, channel 0, id 0, lun 0


We now have to work out the mapping of iSCSI target names to local SCSI IDs (which gets displayed as HOST ID below), by running the iscsi-ls command on the client (linux1):

# iscsi-ls
*******************************************************************************
SFNet iSCSI Driver Version ...4:0.1.11-4(15-Jan-2007)
*******************************************************************************
TARGET NAME             : iqn.2006-01.com.openfiler:rac1.asm4
TARGET ALIAS            :
HOST ID                 : 0
BUS ID                  : 0
TARGET ID               : 0
TARGET ADDRESS          : 192.168.2.195:3260,1
SESSION STATUS          : ESTABLISHED AT Sat Aug 18 12:39:39 EDT 2007
SESSION ID              : ISID 00023d000001 TSIH 100
*******************************************************************************
TARGET NAME             : iqn.2006-01.com.openfiler:rac1.asm3
TARGET ALIAS            :
HOST ID                 : 1
BUS ID                  : 0
TARGET ID               : 0
TARGET ADDRESS          : 192.168.2.195:3260,1
SESSION STATUS          : ESTABLISHED AT Sat Aug 18 12:39:39 EDT 2007
SESSION ID              : ISID 00023d000001 TSIH 300
*******************************************************************************
TARGET NAME             : iqn.2006-01.com.openfiler:rac1.asm2
TARGET ALIAS            :
HOST ID                 : 2
BUS ID                  : 0
TARGET ID               : 0
TARGET ADDRESS          : 192.168.2.195:3260,1
SESSION STATUS          : ESTABLISHED AT Sat Aug 18 12:39:39 EDT 2007
SESSION ID              : ISID 00023d000001 TSIH 200
*******************************************************************************
TARGET NAME             : iqn.2006-01.com.openfiler:rac1.asm1
TARGET ALIAS            :
HOST ID                 : 3
BUS ID                  : 0
TARGET ID               : 0
TARGET ADDRESS          : 192.168.2.195:3260,1
SESSION STATUS          : ESTABLISHED AT Sat Aug 18 12:39:39 EDT 2007
SESSION ID              : ISID 00023d000001 TSIH 400
*******************************************************************************
TARGET NAME             : iqn.2006-01.com.openfiler:rac1.crs
TARGET ALIAS            :
HOST ID                 : 4
BUS ID                  : 0
TARGET ID               : 0
TARGET ADDRESS          : 192.168.2.195:3260,1
SESSION STATUS          : ESTABLISHED AT Sat Aug 18 12:39:39 EDT 2007
SESSION ID              : ISID 00023d000001 TSIH 500
*******************************************************************************
By combining the mapping of local SCSI device names to host IDs (from dmesg) with the mapping of iSCSI target names to host IDs (from iscsi-ls), we can generate a full mapping from iSCSI target name to local SCSI device name for the host linux1:

iSCSI Target Name to local SCSI Device Name
iSCSI Target Name                     Host / SCSI ID   SCSI Device Name
iqn.2006-01.com.openfiler:rac1.asm4   0                /dev/sda
iqn.2006-01.com.openfiler:rac1.asm3   1                /dev/sdd
iqn.2006-01.com.openfiler:rac1.asm2   2                /dev/sdb
iqn.2006-01.com.openfiler:rac1.asm1   3                /dev/sdc
iqn.2006-01.com.openfiler:rac1.crs    4                /dev/sde

Note that the method I used above to create the mapping of iSCSI Target Names to local SCSI Device Names can become pretty cumbersome and is very prone to errors.

A much more efficient process in generating this mapping comes from a script written by Martin Jones:

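iscsi-ls-map.sh
#!/bin/bash
# ---------------------
# FILE: iscsi-ls-map.sh
# ---------------------
# Maps local SCSI host IDs to local SCSI device names and to the short
# iSCSI target names reported by iscsi-ls on this node.

RUN_USERID=root
export RUN_USERID

RUID=$(id -un)
if [[ "${RUID}" != "${RUN_USERID}" ]]; then
    echo " "
    echo "You must be logged in as $RUN_USERID to run this script."
    echo "Exiting script."
    echo " "
    exit 1
fi

TMP_DEV=/tmp/tmp_scsi_dev
TMP_TGT=/tmp/tmp_scsi_targets

# "Attached scsi disk sda at scsi0, channel 0, id 0, lun 0" -> "/dev/sda 0"
# (sorted lexically on the host ID, which is the order join expects)
dmesg | grep "^Attach"  \
      | awk '{ print "/dev/"$4 " " $6 }'  \
      | sed -e 's/,//' -e 's/scsi//'  \
      | sort -k2,2  \
      | sed -e '/disk1/d' > ${TMP_DEV}

# "TARGET NAME : iqn.2006-01.com.openfiler:rac1.asm4" + "HOST ID : 0" -> "asm4 0"
iscsi-ls | egrep -e "TARGET NAME" -e "HOST ID"  \
         | awk '{ if ($0 ~ /^TARGET/) printf "%s", $4; if ($0 ~ /^HOST/) printf " %s\n", $4 }'  \
         | sort -k2,2  \
         | cut -d':' -f2-  \
         | cut -d'.' -f2- > ${TMP_TGT}

# Join both lists on the host ID and print aligned columns in numeric order
printf "%-18s%-26s%s\n" "Host / SCSI ID" "SCSI Device Name" "iSCSI Target Name"
printf "%-18s%-26s%s\n" "----------------" "------------------------" "-----------------"

join -1 2 -2 2 ${TMP_DEV} ${TMP_TGT}  \
     | sort -n -k1,1  \
     | awk '{ printf "%-18s%-26s%s\n", $1, $2, $3 }'

rm -f ${TMP_DEV} ${TMP_TGT}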

Example run:

# ./iscsi-ls-map.sh
Host / SCSI ID    SCSI Device Name          iSCSI Target Name
----------------  ------------------------  -----------------
0                 /dev/sda                  asm4
1                 /dev/sdd                  asm3
2                 /dev/sdb                  asm2
3                 /dev/sdc                  asm1
4                 /dev/sde                  crs


Create Partitions on iSCSI Volumes

The next step is to create a single primary partition on each of the iSCSI volumes that spans the entire size of the volume. As mentioned earlier in this article, I will be using Oracle's Cluster File System, Release 2 (OCFS2) to store the two files to be shared for Oracle's Clusterware software. We will then be using Automatic Storage Management (ASM) to create four ASM volumes; two for all physical database files (data/index files, online redo log files, and control files) and two for the Flash Recovery Area (RMAN backups and archived redo log files).

The following table lists the five iSCSI volumes and what file systems they will support:

Oracle Shared Drive Configuration
File System Type   iSCSI Target (short) Name   Size     Mount Point   ASM Diskgroup Name     File Types
OCFS2              crs                         2 GB     /u02                                 Oracle Cluster Registry (OCR) File (~100 MB), Voting Disk (~20 MB)
ASM                asm1                        118 GB   ORCL:VOL1     +ORCL_DATA1            Oracle Database Files
ASM                asm2                        118 GB   ORCL:VOL2     +ORCL_DATA1            Oracle Database Files
ASM                asm3                        118 GB   ORCL:VOL3     +FLASH_RECOVERY_AREA   Oracle Flash Recovery Area
ASM                asm4                        118 GB   ORCL:VOL4     +FLASH_RECOVERY_AREA   Oracle Flash Recovery Area
Total                                          474 GB

As shown in the table above, we will need to create a single Linux primary partition on each of the five iSCSI volumes. The fdisk command is used in Linux for creating (and removing) partitions. For each of the five iSCSI volumes, you can use the default values when creating the primary partition as the default action is to use the entire disk. You can safely ignore any warnings that may indicate the device does not contain a valid DOS partition (or Sun, SGI or OSF disklabel).

For the purpose of this example, I will be running the fdisk command from linux1 to create a single primary partition for each of the local SCSI devices identified in the previous section:

Please note that creating the partitions on the iSCSI volumes must be done from only one of the nodes in the Oracle RAC cluster!

# ---------------------------------------

# fdisk /dev/sda
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15134, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134

Command (m for help): p

Disk /dev/sda: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       15134   121563823+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

# ---------------------------------------

# fdisk /dev/sdb
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15134, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134

Command (m for help): p

Disk /dev/sdb: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       15134   121563823+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

# ---------------------------------------

# fdisk /dev/sdc
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15134, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134

Command (m for help): p

Disk /dev/sdc: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       15134   121563823+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

# ---------------------------------------

# fdisk /dev/sdd
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15134, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-15134, default 15134): 15134

Command (m for help): p

Disk /dev/sdd: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       15134   121563823+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

# ---------------------------------------

# fdisk /dev/sde
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1009, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-1009, default 1009): 1009

Command (m for help): p

Disk /dev/sde: 2147 MB, 2147483648 bytes
67 heads, 62 sectors/track, 1009 cylinders
Units = cylinders of 4154 * 512 = 2126848 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1        1009     2095662   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

# ---------------------------------------
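Typing the same fdisk dialog five times is tedious. Because fdisk reads its commands from standard input, the keystrokes from the sessions above can be piped into it instead. The sketch below assumes each volume is still unpartitioned, so the prompts appear exactly as shown; on a disk that already has partitions the dialog differs and this sequence should not be used. The function names are hypothetical, and the wrapper is destructive, so verify each device name against your own iSCSI mapping first.

```shell
#!/bin/bash
# Keystrokes for the interactive fdisk session shown above:
#   n (new), p (primary), 1 (partition number), Enter (default first cylinder),
#   Enter (default last cylinder = whole disk), w (write table and exit)
fdisk_whole_disk_keys() {
    printf 'n\np\n1\n\n\nw\n'
}

# Hypothetical wrapper: feed the keystrokes to fdisk for one device.
# DESTRUCTIVE: run it on linux1 only, and only against the iSCSI devices.
partition_whole_disk() {
    local dev=$1
    fdisk_whole_disk_keys | fdisk "$dev"
}

# Usage (as root on linux1 only, device names taken from your own mapping):
#   for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde; do partition_whole_disk "$dev"; done
```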
After creating all required partitions, inform the kernel of the partition changes by running the following command as the "root" user account on both of the Oracle RAC nodes in the cluster. Note that the mapping between the iSCSI target names discovered from Openfiler and the local SCSI device names may be different on each Oracle RAC node. This will not cause any problems, since the volumes will be mounted by their labels rather than by their local SCSI device names.

linux2

# partprobe

# fdisk -l

Disk /dev/hda: 40.0 GB, 40000000000 bytes
255 heads, 63 sectors/track, 4863 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   83  Linux
/dev/hda2              14        4863    38957625   8e  Linux LVM

Disk /dev/sda: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       15134   121563823+  83  Linux

Disk /dev/sdb: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       15134   121563823+  83  Linux

Disk /dev/sdc: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       15134   121563823+  83  Linux

Disk /dev/sdd: 124.4 GB, 124486942720 bytes
255 heads, 63 sectors/track, 15134 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       15134   121563823+  83  Linux

Disk /dev/sde: 2147 MB, 2147483648 bytes
67 heads, 62 sectors/track, 1009 cylinders
Units = cylinders of 4154 * 512 = 2126848 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1        1009     2095662   83  Linux
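A quick way to confirm that the kernel on each node sees all five new partitions is to read /proc/partitions, which lists every block device and partition the kernel knows about (major, minor, size in 1K blocks, name). The helper below is a sketch with a hypothetical name; it assumes the iSCSI partitions are the only sd*1 devices on the node, which holds for linux1 and linux2 here because their local disk is /dev/hda.

```shell
#!/bin/bash
# List the first partition of every SCSI disk (sda1, sdb1, ...) with its size in 1K blocks.
# Reads /proc/partitions unless another file is given.
list_sd_partitions() {
    awk '$4 ~ /^sd[a-z]+1$/ { print "/dev/" $4, $3 }' "${1:-/proc/partitions}"
}

# Usage on each node after partprobe (five lines are expected):
#   list_sd_partitions
```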



Create "oracle" User and Directories


  Perform the following tasks on both Oracle RAC nodes in the cluster!

I will be using the Oracle Cluster File System, Release 2 (OCFS2) to store the files required to be shared for the Oracle Clusterware software. When using OCFS2, the UID of the UNIX user "oracle" and GID of the UNIX group "oinstall" must be the same on both of the Oracle RAC nodes in the cluster. If either the UID or GID is different, the files on the OCFS2 file system will show up as "unowned" or may even be owned by a different user. For this article, I will use 501 for the "oracle" UID and 501 for the "oinstall" GID.

Note that members of the UNIX group oinstall are considered the "owners" of the Oracle software. Members of the dba group can administer Oracle databases, for example starting up and shutting down databases. In this article, we are creating the oracle user account to have both responsibilities!

  This guide adheres to the Optimal Flexible Architecture (OFA) for naming conventions used in creating the directory structure.


Create Group and User for Oracle

Let's start this section by creating the UNIX oinstall and dba groups and the oracle user account:

# groupadd -g 501 oinstall
# groupadd -g 502 dba
# useradd -m -u 501 -g oinstall -G dba -d /home/oracle -s /bin/bash -c "Oracle Software Owner" oracle
# id oracle
uid=501(oracle) gid=501(oinstall) groups=501(oinstall),502(dba)
Set the password for the oracle account:
# passwd oracle
Changing password for user oracle.
New UNIX password: xxxxxxxxxxx
Retype new UNIX password: xxxxxxxxxxx
passwd: all authentication tokens updated successfully.
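Because OCFS2 requires the oracle UID and the oinstall GID to match on both nodes, it is worth comparing the id oracle output from linux1 and linux2 before moving on. The following is a minimal bash sketch of that comparison; the function names uid_gid and same_uid_gid are hypothetical helpers, not part of any Oracle or Red Hat tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare "id oracle" output captured on two RAC nodes.

# Extract "uid:gid" from one line of id output, e.g.
# "uid=501(oracle) gid=501(oinstall) groups=501(oinstall),502(dba)".
uid_gid() {
  local line=$1 uid gid
  uid=$(sed -n 's/^uid=\([0-9]*\).*/\1/p' <<< "$line")
  gid=$(sed -n 's/.* gid=\([0-9]*\).*/\1/p' <<< "$line")
  echo "$uid:$gid"
}

# Succeeds (and prints "match (uid:gid)") when both lines agree.
same_uid_gid() {
  local a b
  a=$(uid_gid "$1")
  b=$(uid_gid "$2")
  if [ "$a" = "$b" ]; then
    echo "match ($a)"
    return 0
  fi
  echo "MISMATCH: $a vs $b" >&2
  return 1
}
```

Run id oracle on each node and pass both lines in, for example same_uid_gid "$(id oracle)" followed by the line copied from linux2. Once SSH user equivalence is configured later in this article, the second argument could come from ssh linux2 id oracle instead.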


Verify That the User nobody Exists

Before installing the Oracle software, complete the following procedure to verify that the user nobody exists on the system:

  1. To determine if the user exists, enter the following command:
    # id nobody
    uid=99(nobody) gid=99(nobody) groups=99(nobody)
    If this command displays information about the nobody user, then you do not have to create that user.

  2. If the user nobody does not exist, then enter the following command to create it:
    # /usr/sbin/useradd nobody

  3. Repeat this procedure on all the other Oracle RAC nodes in the cluster.


Create the Oracle Base Directory

The next step is to create a new directory that will be used to store the Oracle Database software. When configuring the oracle user's environment (later in this section) we will be assigning the location of this directory to the $ORACLE_BASE environment variable.

The following assumes that the directories are being created in the root file system. Please note that this is being done for the sake of simplicity and is not recommended as a general practice. Normally, these directories would be created on a separate file system.

After the directory is created, you must then specify the correct owner, group, and permissions for it. Perform the following on both Oracle RAC nodes:

# mkdir -p /u01/app/oracle
# chown -R oracle:oinstall /u01/app/oracle
# chmod -R 775 /u01/app/oracle

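At the end of this procedure, /u01/app/oracle will be owned by the oracle user and the oinstall group with permissions 775 (drwxrwxr-x). The parent directories /u01 and /u01/app, created by mkdir -p, remain owned by root.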


Create the Oracle Clusterware Home Directory

Next, create a new directory that will be used to store the Oracle Clusterware software. When configuring the oracle user's environment (later in this section) we will be assigning the location of this directory to the $ORA_CRS_HOME environment variable.

As noted in the previous section, the following assumes that the directories are being created in the root file system. This is being done for the sake of simplicity and is not recommended as a general practice. Normally, these directories would be created on a separate file system.

After the directory is created, you must then specify the correct owner, group, and permissions for it. Perform the following on both Oracle RAC nodes:

# mkdir -p /u01/app/crs
# chown -R oracle:oinstall /u01/app/crs
# chmod -R 775 /u01/app/crs

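At the end of this procedure, /u01/app/crs will be owned by the oracle user and the oinstall group with permissions 775 (drwxrwxr-x). The parent directories /u01 and /u01/app remain owned by root.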


Create Mount Point for OCFS2 / Clusterware

Let's now create the mount point for the Oracle Cluster File System, Release 2 (OCFS2) that will be used to store the two Oracle Clusterware shared files: the Oracle Cluster Registry (OCR) and the voting disk.

Perform the following on both Oracle RAC nodes:

# mkdir -p /u02
# chown -R oracle:oinstall /u02
# chmod -R 775 /u02
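To confirm that the three directories created above have the expected owner, group, and permissions, a small check can be run on each node. This is a minimal bash sketch assuming GNU stat (as shipped with RHEL); check_dir is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a directory's owner:group and octal mode
# against expected values. Uses GNU coreutils "stat -c".
check_dir() {
  local path=$1 want_owner=$2 want_mode=$3 got_owner got_mode
  if [ ! -d "$path" ]; then
    echo "$path: missing" >&2
    return 1
  fi
  got_owner=$(stat -c '%U:%G' "$path")
  got_mode=$(stat -c '%a' "$path")
  if [ "$got_owner" = "$want_owner" ] && [ "$got_mode" = "$want_mode" ]; then
    echo "$path: OK ($got_owner $got_mode)"
    return 0
  fi
  echo "$path: expected $want_owner $want_mode, found $got_owner $got_mode" >&2
  return 1
}
```

For example: for d in /u01/app/oracle /u01/app/crs /u02; do check_dir "$d" oracle:oinstall 775; done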


Create Login Script for oracle User Account

To ensure that the environment is set up correctly for the "oracle" UNIX userid on both Oracle RAC nodes, use the following .bash_profile:

  When you are setting the Oracle environment variables for each Oracle RAC node, be sure to assign each RAC node a unique Oracle SID!

For this example, I used:

  • linux1 : ORACLE_SID=orcl1
  • linux2 : ORACLE_SID=orcl2

Log in to each node as the oracle user account:

# su - oracle
.bash_profile for Oracle User
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
      . ~/.bashrc
fi

alias ls="ls -FA"
alias s="screen -DRRS iPad -t iPad"

export JAVA_HOME=/usr/local/java

# User specific environment and startup programs
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/10.2.0/db_1
export ORA_CRS_HOME=/u01/app/crs
export ORACLE_PATH=$ORACLE_BASE/dba_scripts/sql:.:$ORACLE_HOME/rdbms/admin
export CV_JDKHOME=/usr/local/java

# Each RAC node must have a unique ORACLE_SID. (i.e. orcl1, orcl2,...)
export ORACLE_SID=orcl1

export PATH=.:${JAVA_HOME}/bin:${PATH}:$HOME/bin:$ORACLE_HOME/bin
export PATH=${PATH}:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
export PATH=${PATH}:$ORACLE_BASE/dba_scripts/bin
export ORACLE_TERM=xterm
export TNS_ADMIN=$ORACLE_HOME/network/admin
export ORA_NLS10=$ORACLE_HOME/nls/data
export NLS_DATE_FORMAT="DD-MON-YYYY HH24:MI:SS"
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export CLASSPATH=.:$ORACLE_HOME/jdbc/lib/ojdbc14.jar
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
export THREADS_FLAG=native
export TEMP=/tmp
export TMPDIR=/tmp
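The profile above hard-codes ORACLE_SID=orcl1, so the copy on linux2 must be edited to use orcl2. If you would rather keep an identical .bash_profile on both nodes, one option is to derive the SID from the short host name. This is a hypothetical variation, not the approach used in this article; the function name sid_for_host and its mapping are assumptions that simply mirror the linux1/orcl1 and linux2/orcl2 names used here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a RAC node's short host name to its instance SID.
# The mapping mirrors this article (linux1 -> orcl1, linux2 -> orcl2).
sid_for_host() {
  case "$1" in
    linux1) echo orcl1 ;;
    linux2) echo orcl2 ;;
    *)      echo "unknown RAC node: $1" >&2; return 1 ;;
  esac
}

# In .bash_profile this could replace "export ORACLE_SID=orcl1":
#   ORACLE_SID=$(sid_for_host "$(hostname -s)") && export ORACLE_SID
```

The trade-off is one more thing to keep in sync when nodes are added, in exchange for a single profile file that can be copied between nodes unchanged.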



Configure the Linux Servers for Oracle


  Perform the following configuration procedures on both Oracle RAC nodes in the cluster!

  The kernel parameters and shell limits discussed in this section will need to be defined on both Oracle RAC nodes in the cluster every time the machine is booted. This section provides very detailed information about setting those kernel parameters required for Oracle. Instructions for placing them in a startup script (/etc/sysctl.conf) are included in the section "All Startup Commands for Both Oracle RAC Nodes".


Overview

This section focuses on configuring both Oracle RAC Linux servers, getting each one prepared for the Oracle RAC 10g installation. This includes verifying enough swap space, setting shared memory and semaphores, setting the maximum number of file handles, setting the IP local port range, setting shell limits for the oracle user, activating all kernel parameters for the system, and finally verifying the correct date and time on both nodes in the cluster.

Throughout this section you will notice that there are several different ways to configure (set) these parameters. For the purpose of this article, I will be making all changes permanent (through reboots) by placing all commands in the /etc/sysctl.conf file.


Swap Space Considerations


Configuring Kernel Parameters and Shell Limits

The kernel parameters and shell limits presented in this section are recommended values only as documented by Oracle. For production database systems, Oracle recommends that you tune these values to optimize the performance of the system.

On both Oracle RAC nodes, verify that the kernel parameters described in this section are set to values greater than or equal to the recommended values. Also note that when setting the four semaphore values, all four values need to be entered on one line.


Setting Shared Memory

Shared memory allows processes to access common structures and data by placing them in a shared memory segment. This is the fastest form of Inter-Process Communication (IPC) available, mainly because no kernel involvement occurs when data is being passed between the processes. With shared memory, data does not need to be copied between processes.

Oracle makes use of shared memory for its Shared Global Area (SGA), which is an area of memory that is shared by all Oracle background and foreground processes. Adequate sizing of the SGA is critical to Oracle performance since it is responsible for holding the database buffer cache, shared SQL, access paths, and so much more.

To determine all shared memory limits, use the following:

# ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 32768
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1

Setting SHMMAX

The SHMMAX parameter defines the maximum size (in bytes) of a shared memory segment. The Oracle SGA is comprised of shared memory, and it is possible that incorrectly setting SHMMAX could limit the size of the SGA. When setting SHMMAX, keep in mind that the size of the SGA should fit within one shared memory segment. An inadequate SHMMAX setting could result in the following:
ORA-27123: unable to attach to shared memory segment

You can determine the value of SHMMAX by performing the following:

# cat /proc/sys/kernel/shmmax
33554432
The default value for SHMMAX is 32MB. This is often too small to configure the Oracle SGA. I generally set the SHMMAX parameter to 2GB using the following methods:
  • You can alter the default setting for SHMMAX without rebooting the machine by making the changes directly to the /proc file system (/proc/sys/kernel/shmmax) using the following command:
    # sysctl -w kernel.shmmax=2147483648

  • You should then make this change permanent by inserting the kernel parameter in the /etc/sysctl.conf startup file:
    # echo "kernel.shmmax=2147483648" >> /etc/sysctl.conf
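Appending with echo >> works, but running the same command twice leaves duplicate entries in /etc/sysctl.conf. The following minimal bash/awk sketch rewrites an existing entry for the key, or appends one if none exists. set_sysctl_conf is a hypothetical helper, not a standard tool, and it assumes the simple key = value lines used throughout this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "key = value" in a sysctl.conf-format file without
# creating duplicates. The first matching entry is rewritten, any later
# duplicates are dropped, and the entry is appended if the key is absent.
set_sysctl_conf() {
  local key=$1 value=$2 file=${3:-/etc/sysctl.conf} tmp rc
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    BEGIN { done = 0 }
    {
      line = $0
      sub(/^[ \t]+/, "", line)              # ignore leading blanks
      split(line, kv, "=")
      name = kv[1]
      gsub(/[ \t]+$/, "", name)             # ignore blanks before "="
      if (name == k && index(line, "=") > 0) {
        if (!done) print k " = " v          # rewrite the first entry
        done = 1
        next                                # drop later duplicates
      }
      print                                 # keep every other line as-is
    }
    END { if (!done) print k " = " v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner/permissions
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

For example, as root: set_sysctl_conf kernel.shmmax 2147483648, then sysctl -p to load the file.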

Setting SHMMNI

We now look at the SHMMNI parameter. This kernel parameter is used to set the maximum number of shared memory segments system wide. The default value for this parameter is 4096.

You can determine the value of SHMMNI by performing the following:

# cat /proc/sys/kernel/shmmni
4096
The default setting for SHMMNI should be adequate for our Oracle RAC 10g Release 2 installation.

Setting SHMALL

Finally, we look at the SHMALL shared memory kernel parameter. This parameter controls the total amount of shared memory (in pages) that can be used at one time on the system. In short, the value of this parameter should always be at least:
ceil(SHMMAX/PAGE_SIZE)
The default size of SHMALL is 2097152 and can be queried using the following command:
# cat /proc/sys/kernel/shmall
2097152
The default setting for SHMALL should be adequate for our Oracle RAC 10g Release 2 installation.

  The page size in Red Hat Linux on the i386 platform is 4096 bytes. You can, however, use HugePages (the successor to the bigpages feature of earlier Red Hat kernels), which supports the configuration of larger memory page sizes.
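As a worked example of the SHMALL formula: with SHMMAX set to 2147483648 bytes and a 4096-byte page, ceil(2147483648 / 4096) = 524288 pages, well below the default SHMALL of 2097152. The same arithmetic in bash, as a minimal sketch (shmall_min is a hypothetical helper; it falls back to getconf PAGE_SIZE when no page size is given):

```shell
#!/usr/bin/env bash
# Hypothetical helper: minimum SHMALL (in pages) needed for one segment of
# SHMMAX bytes, i.e. ceil(SHMMAX / PAGE_SIZE), using integer arithmetic.
shmall_min() {
  local shmmax=$1
  local page_size=${2:-$(getconf PAGE_SIZE)}
  echo $(( (shmmax + page_size - 1) / page_size ))
}
```

For example, shmall_min 2147483648 4096 prints 524288, which can be compared against cat /proc/sys/kernel/shmall.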


Setting Semaphores

Now that we have configured our shared memory settings, it is time to take care of configuring the semaphores. The best way to describe a semaphore is as a counter that is used to provide synchronization between processes (or threads within a process) for shared resources like shared memory. Semaphore sets are supported in System V where each one is a counting semaphore. When an application requests semaphores, it does so using "sets".

To determine all semaphore limits, use the following:

# ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
You can also use the following command:
# cat /proc/sys/kernel/sem
250     32000   32      128

Setting SEMMSL

The SEMMSL kernel parameter is used to control the maximum number of semaphores per semaphore set.

Oracle recommends setting SEMMSL to the largest PROCESSES instance parameter setting in the init.ora file for all databases on the Linux system, plus 10. Also, Oracle recommends setting SEMMSL to a value of no less than 100.
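A minimal bash sketch of the SEMMSL rule above; semmsl_for is a hypothetical helper that takes the PROCESSES setting of each database on the server. For example, a single database with PROCESSES=150 gives 160, while a database with PROCESSES=50 still gives the floor of 100.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recommended SEMMSL = largest PROCESSES value + 10,
# but never less than 100. Arguments: PROCESSES of each database.
semmsl_for() {
  local max=0 p v
  for p in "$@"; do
    (( p > max )) && max=$p
  done
  v=$(( max + 10 ))
  (( v < 100 )) && v=100
  echo "$v"
}
```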

Setting SEMMNI

The SEMMNI kernel parameter is used to control the maximum number of semaphore sets in the entire Linux system.

Oracle recommends setting the SEMMNI to a value of no less than 100.

Setting SEMMNS

The SEMMNS kernel parameter is used to control the maximum number of semaphores (not semaphore sets) in the entire Linux system.

Oracle recommends setting the SEMMNS to the sum of the PROCESSES instance parameter setting for each database on the system, adding the largest PROCESSES twice, and then finally adding 10 for each Oracle database on the system.

Use the following calculation to determine the maximum number of semaphores that can be allocated on a Linux system. It will be the lesser of:

SEMMNS  -or-  (SEMMSL * SEMMNI)
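The SEMMNS recommendation and the lesser-of rule can be sketched the same way (semmns_for and max_semaphores are hypothetical helpers). For two databases with PROCESSES of 150 and 300, the recommendation is 150 + 300 + (2 * 300) + (2 * 10) = 1070. With the default values shown by ipcs -ls above (SEMMSL=250, SEMMNS=32000, SEMMNI=128), the system can allocate the lesser of 32000 and 250 * 128 = 32000, that is 32000 semaphores.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recommended SEMMNS = sum of PROCESSES for every
# database + the largest PROCESSES twice + 10 per database.
semmns_for() {
  local sum=0 max=0 p
  for p in "$@"; do
    sum=$(( sum + p ))
    (( p > max )) && max=$p
  done
  echo $(( sum + 2 * max + 10 * $# ))
}

# Hypothetical helper: semaphores that can actually be allocated, which is
# the lesser of SEMMNS and SEMMSL * SEMMNI.
max_semaphores() {
  local semmsl=$1 semmns=$2 semmni=$3
  local product=$(( semmsl * semmni ))
  if (( semmns < product )); then
    echo "$semmns"
  else
    echo "$product"
  fi
}
```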

Setting SEMOPM

The SEMOPM kernel parameter is used to control the number of semaphore operations that can be performed per semop system call.

The semop system call (function) provides the ability to do operations for multiple semaphores with one semop system call. Because a semaphore set can contain at most SEMMSL semaphores, it is recommended to set SEMOPM equal to SEMMSL.

Oracle recommends setting the SEMOPM to a value of no less than 100.

Setting Semaphore Kernel Parameters

Finally, we see how to set all semaphore parameters. In the following, the only parameter I care about changing (raising) is SEMOPM. All other default settings should be sufficient for our example installation.
  • You can alter the default setting for all semaphore settings without rebooting the machine by making the changes directly to the /proc file system (/proc/sys/kernel/sem) using the following command:
    # sysctl -w kernel.sem="250 32000 100 128"

  • You should then make this change permanent by inserting the kernel parameter in the /etc/sysctl.conf startup file:
    # echo "kernel.sem=250 32000 100 128" >> /etc/sysctl.conf
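The four numbers in kernel.sem are positional and easy to mix up. Comparing the /proc/sys/kernel/sem output with the ipcs -ls output earlier in this section shows the order is SEMMSL, SEMMNS, SEMOPM, SEMMNI, which is why the commands above change only the third value (SEMOPM, from 32 to 100). A small bash sketch that labels each field (describe_sem is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: label the four positional fields of kernel.sem.
# Tabs (as in /proc/sys/kernel/sem) and spaces are both accepted.
describe_sem() {
  local semmsl semmns semopm semmni
  read -r semmsl semmns semopm semmni <<< "$1"
  echo "SEMMSL=$semmsl SEMMNS=$semmns SEMOPM=$semopm SEMMNI=$semmni"
}
```

For example: describe_sem "$(cat /proc/sys/kernel/sem)"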


Setting File Handles

When configuring the Red Hat Linux server, it is critical to ensure that the maximum number of file handles is large enough. The setting for file handles denotes the number of open files that you can have on the Linux system.

Use the following command to determine the maximum number of file handles for the entire system:

# cat /proc/sys/fs/file-max
102462

Oracle recommends that the file handles for the entire system be set to at least 65536.

  You can query the current usage of file handles by using the following:
# cat /proc/sys/fs/file-nr
825     0       65536
The file-nr file displays three parameters:
  • Total allocated file handles
  • Allocated but unused (free) file handles (always reported as 0 on Linux 2.6 kernels)
  • Maximum file handles that can be allocated

  If you need to increase the value in /proc/sys/fs/file-max, then make sure that the ulimit is set properly. Usually for Linux 2.4 and 2.6 it is set to unlimited. Verify the ulimit setting by issuing the ulimit command:
# ulimit
unlimited
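To turn the three file-nr fields into a quick usage figure, the following minimal bash sketch subtracts the unused handles from the allocated ones and compares the result against the maximum. file_nr_summary is a hypothetical helper; with no argument it reads /proc/sys/fs/file-nr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize /proc/sys/fs/file-nr. Its three fields are
# allocated handles, allocated-but-unused handles, and the maximum.
file_nr_summary() {
  local allocated unused max in_use
  read -r allocated unused max <<< "${1:-$(cat /proc/sys/fs/file-nr)}"
  in_use=$(( allocated - unused ))
  echo "in_use=$in_use max=$max pct=$(( 100 * in_use / max ))%"
}
```

For the sample file-nr output above (825 0 65536) it prints in_use=825 max=65536 pct=1%.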


Setting IP Local Port Range

Configure the system to allow a local port range of 1024 through 65000.

Use the following command to determine the value of ip_local_port_range:

# cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000
The default value for ip_local_port_range is ports 32768 through 61000. Oracle recommends a local port range of 1024 to 65000.
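The new range can be applied with the same sysctl -w / sysctl.conf pattern used for the other parameters in this section; the resulting value appears in the sysctl -p output later in this section. The sketch below (port_range_ok is a hypothetical helper) checks whether a range is at least as wide as 1024 through 65000, with the apply commands shown as comments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an ip_local_port_range value ("low high")
# spans at least 1024 through 65000. With no argument, reads the live value.
port_range_ok() {
  local low high
  read -r low high <<< "${1:-$(cat /proc/sys/net/ipv4/ip_local_port_range)}"
  (( low <= 1024 && high >= 65000 ))
}

# Applying the recommended range as root, in the same style as the other
# parameters in this section:
#   sysctl -w net.ipv4.ip_local_port_range="1024 65000"
#   echo "net.ipv4.ip_local_port_range = 1024 65000" >> /etc/sysctl.conf
```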


Setting Shell Limits for the oracle User

To improve the performance of the software on Linux systems, Oracle recommends you increase the following shell limits for the oracle user:

Shell Limit                                              Item in limits.conf   Hard Limit
------------------------------------------------------   -------------------   ----------
Maximum number of open file descriptors                  nofile                65536
Maximum number of processes available to a single user   nproc                 16384

To make these changes, run the following as root:

cat >> /etc/security/limits.conf <<EOF
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536
EOF

cat >> /etc/pam.d/login <<EOF
session required /lib/security/pam_limits.so
EOF
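To confirm the entries just appended, the value for a given user, type, and item can be read back from limits.conf. This is a minimal bash/awk sketch; limits_value is a hypothetical helper, and if an entry appears more than once it reports the last one. After the oracle user logs in again, the limits actually in effect can be checked with ulimit -Sn, ulimit -Hn, ulimit -Su, and ulimit -Hu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one limits.conf entry, e.g.
#   limits_value oracle hard nofile  ->  65536
# Comment lines are skipped; returns non-zero if no entry matches.
limits_value() {
  local user=$1 type=$2 item=$3 file=${4:-/etc/security/limits.conf}
  awk -v u="$user" -v t="$type" -v i="$item" '
    $1 ~ /^#/ { next }
    $1 == u && $2 == t && $3 == i { v = $4 }
    END { if (v != "") print v; else exit 1 }
  ' "$file"
}
```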
Update the default shell startup file for the "oracle" UNIX account.


Activating All Kernel Parameters for the System

At this point, we have covered all of the required Linux kernel parameters needed for a successful Oracle installation and configuration. Within each section above, we configured the Linux system to persist each of the kernel parameters through reboots on system startup by placing them all in the /etc/sysctl.conf file.

We could reboot at this point to ensure all of these parameters are set in the kernel or we could simply "run" the /etc/sysctl.conf file by running the following command as root. Perform this on both Oracle RAC nodes in the cluster!

# sysctl -p

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_max = 262144
kernel.shmmax = 2147483648
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
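After running sysctl -p, the live values under /proc/sys can be compared against what was intended. A minimal bash sketch (check_proc_value is a hypothetical helper) that treats tabs and spaces as equivalent, since /proc/sys/kernel/sem separates its fields with tabs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a /proc/sys value with the expected value,
# treating any run of spaces or tabs as a single separator.
check_proc_value() {
  local file=$1 expected=$2
  local -a actual want
  read -r -a actual < "$file"
  read -r -a want <<< "$expected"
  if [ "${actual[*]}" = "${want[*]}" ]; then
    echo "OK        $file = ${actual[*]}"
  else
    echo "MISMATCH  $file = ${actual[*]} (expected ${want[*]})" >&2
    return 1
  fi
}

# Example (on each RAC node):
#   check_proc_value /proc/sys/kernel/shmmax 2147483648
#   check_proc_value /proc/sys/kernel/sem "250 32000 100 128"
#   check_proc_value /proc/sys/net/ipv4/ip_local_port_range "1024 65000"
```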


Setting the Correct Date and Time on Both Oracle RAC Nodes

During the installation of Oracle Clusterware, the Database, and the Companion CD, the Oracle Universal Installer (OUI) first installs the software to the local node running the installer (i.e. linux1). The software is then copied remotely to all of the remaining nodes in the cluster (i.e. linux2). During the remote copy process, the OUI will execute the UNIX "tar" command on each of the remote nodes to extract the files that were archived and copied over. If the date and time on the node performing the install is greater than that of the node it is copying to, the OUI will throw an error from the "tar" command indicating it is attempting to extract files stamped with a time in the future:

Error while copying directory 
    /u01/app/crs with exclude file list 'null' to nodes 'linux2'.
[PRKC-1002 : All the submitted commands did not execute successfully]
---------------------------------------------
linux2:
   /bin/tar: ./bin/lsnodes: time stamp 2006-09-13 09:21:34 is 735 s in the future
   /bin/tar: ./bin/olsnodes: time stamp 2006-09-13 09:21:34 is 735 s in the future
   ...(more errors on this node)

Please note that although this would seem like a severe error from the OUI, it can safely be disregarded as a warning. The "tar" command DOES actually extract the files; however, when you perform a listing of the files (using ls -l) on the remote node, they will be missing the time field until the time on the remote server is greater than the timestamp of the file.

Before starting any of the above noted installations, ensure that each member node of the cluster is set as closely as possible to the same date and time. Oracle strongly recommends using the Network Time Protocol feature of most operating systems for this purpose, with all nodes using the same reference Network Time Protocol server.

Accessing a Network Time Protocol server, however, may not always be an option. In this case, when manually setting the date and time for the nodes in the cluster, ensure that the date and time of the node you are performing the software installations from (linux1) is less than all other nodes in the cluster (linux2). I generally use a 20 second difference as shown in the following example:

Setting the date and time from linux1:

# date -s "9/2/2007 01:12:00"

Setting the date and time from linux2:

# date -s "9/2/2007 01:12:20"

The two-node RAC configuration described in this article does not make use of a Network Time Protocol server.
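Before starting an installation, it helps to measure how far apart the two clocks actually are. A minimal bash sketch assuming GNU date (clock_lead is a hypothetical helper): it prints how many seconds the second node's clock is ahead of the first, so with the example times above it prints 20. A positive value means linux1 is behind linux2, as this section recommends.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds by which time B is ahead of time A.
# Accepts anything GNU "date -d" understands, including "@<epoch seconds>".
clock_lead() {
  local a b
  a=$(date -d "$1" +%s) || return 1
  b=$(date -d "$2" +%s) || return 1
  echo $(( b - a ))
}
```

Once SSH user equivalence is configured for the oracle account, a live check from linux1 could be clock_lead "@$(date +%s)" "@$(ssh linux2 date +%s)".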



Configure the "hangcheck-timer" Kernel Module


  Perform the following configuration procedures on both Oracle RAC nodes in the cluster!

Oracle 9.0.1 and 9.2.0.1 used a userspace watchdog daemon called watchdogd to monitor the health of the cluster and to restart a RAC node in case of a failure. Starting with Oracle 9.2.0.2 (and still available in Oracle10g Release 2), the watchdog daemon has been superseded by a Linux kernel module named hangcheck-timer, which addresses availability and reliability problems much better. The hangcheck-timer module is loaded into the Linux kernel and checks whether the system hangs: it sets a timer and checks the timer after a certain amount of time. If a configurable threshold is exceeded, it reboots the machine. Although the hangcheck-timer module is not required for Oracle Clusterware (Cluster Manager) operation, it is highly recommended by Oracle.


The hangcheck-timer.ko Module

The hangcheck-timer module uses a kernel-based timer that periodically checks the system task scheduler to catch delays in order to determine the health of the system. If the system hangs or pauses, the timer resets the node. The hangcheck-timer module uses the Time Stamp Counter (TSC) CPU register, which is a counter that is incremented at each clock signal. The TSC offers much more accurate time measurements since this register is updated by the hardware automatically.



Installing the hangcheck-timer.ko Module

The hangcheck-timer was originally shipped only by Oracle; however, this module is now included with Red Hat Linux AS starting with kernel versions 2.4.9-e.12 and higher, so it should already be present. Use the following to confirm that the module is installed:
# find /lib/modules -name "hangcheck-timer.ko"
/lib/modules/2.6.9-55.EL/kernel/drivers/char/hangcheck-timer.ko
In the above output, we care about the hangcheck timer object (hangcheck-timer.ko) in the /lib/modules/2.6.9-55.EL/kernel/drivers/char directory.


Configuring and Loading the hangcheck-timer Module

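There are two key parameters to the hangcheck-timer module:

  • hangcheck_tick: how often, in seconds, the module checks the health of the system. This article uses 30.
  • hangcheck_margin: the maximum hang delay, in seconds, that is tolerated before the module resets the RAC node. This article uses 180.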

  The two hangcheck-timer module parameters indicate how long a RAC node must hang before it will reset the system. A node reset will occur when the following is true:
system hang time > (hangcheck_tick + hangcheck_margin)
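Working the condition through with the values used in this article (hangcheck_tick=30, hangcheck_margin=180): a node is reset only if it hangs for more than 30 + 180 = 210 seconds. A minimal bash sketch of the same test (would_reset is a hypothetical helper that defaults to those two values):

```shell
#!/usr/bin/env bash
# Hypothetical helper: would a hang of $1 seconds trigger a node reset?
# A reset occurs only when hang time > hangcheck_tick + hangcheck_margin.
would_reset() {
  local hang=$1 tick=${2:-30} margin=${3:-180}
  local threshold=$(( tick + margin ))
  if (( hang > threshold )); then
    echo "reset (threshold ${threshold}s)"
  else
    echo "no reset (threshold ${threshold}s)"
    return 1
  fi
}
```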


Configuring Hangcheck Kernel Module Parameters

Each time the hangcheck-timer kernel module is loaded (manually or by Oracle), it needs to know what value to use for each of the two parameters just discussed: hangcheck_tick and hangcheck_margin.

These values need to be available after each reboot of the Linux server. To do this, make an entry with the correct values to the /etc/modprobe.conf file as follows:

# su -
# echo "options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180" >> /etc/modprobe.conf
Each time the hangcheck-timer kernel module gets loaded, it will use the values defined by the entry I made in the /etc/modprobe.conf file.


Manually Loading the Hangcheck Kernel Module for Testing

Oracle is responsible for loading the hangcheck-timer kernel module when required. It is for this reason that it is not required to perform a modprobe or insmod of the hangcheck-timer kernel module in any of the startup files (i.e. /etc/rc.local).

It is only out of pure habit that I continue to include a modprobe of the hangcheck-timer kernel module in the /etc/rc.local file. Someday I will get over it, but realize that it does not hurt to include a modprobe of the hangcheck-timer kernel module during startup.

So to keep myself sane and able to sleep at night, I always configure the loading of the hangcheck-timer kernel module on each startup as follows:

# echo "/sbin/modprobe hangcheck-timer" >> /etc/rc.local

  You don't have to manually load the hangcheck-timer kernel module using modprobe or insmod after each reboot. The hangcheck-timer module will be loaded by Oracle (automatically) when needed.

Now, to test the hangcheck-timer kernel module and verify that it picks up the parameters we defined in the /etc/modprobe.conf file, use the modprobe command. Although you could load the hangcheck-timer kernel module by passing it the appropriate parameters on the command line (e.g. modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180), we want to verify that it is picking up the options we set in the /etc/modprobe.conf file.

To manually load the hangcheck-timer kernel module and verify it is using the correct values defined in the /etc/modprobe.conf file, run the following command:

# su -
# modprobe hangcheck-timer
# grep Hangcheck /var/log/messages | tail -2
Sep  2 01:16:37 linux1 kernel: Hangcheck: starting hangcheck timer 0.9.0 (tick is 30 seconds, margin is 180 seconds).
Sep  2 01:16:37 linux1 kernel: Hangcheck: Using monotonic_clock().
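To check the loaded values without reading the log line by eye, the tick and margin can be extracted with sed. A minimal sketch (hangcheck_from_log is a hypothetical helper) that assumes the message format shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log lines on stdin and print
# "tick=N margin=M" for each "Hangcheck: starting hangcheck timer" message.
hangcheck_from_log() {
  sed -n 's/.*Hangcheck: starting hangcheck timer .*(tick is \([0-9]*\) seconds, margin is \([0-9]*\) seconds).*/tick=\1 margin=\2/p'
}
```

For example: grep Hangcheck /var/log/messages | hangcheck_from_log | tail -1, whose output can be compared with the options line in /etc/modprobe.conf.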



Configure RAC Nodes for Remote Access using SSH


  Perform the following configuration procedures on both Oracle RAC nodes in the cluster!

Before you can install Oracle RAC 10g, you must configure secure shell (SSH) for the UNIX user account you plan to use to install Oracle Clusterware and the Oracle Database software. The installation and configuration tasks described in this section will need to be performed on both Oracle RAC nodes. As configured earlier in this article, the software owner for Oracle Clusterware and the Oracle Database software will be "oracle".

The goal here is to setup user equivalence for the oracle UNIX user account. User equivalence enables the oracle UNIX user account to access all other nodes in the cluster (running commands and copying files) without the need for a password. Oracle added support in 10g Release 1 for using the SSH tool suite for setting up user equivalence. Before Oracle Database 10g, user equivalence had to be configured using remote shell (RSH).

  If SSH is not available, then OUI attempts to use rsh and rcp instead. These services, however, are disabled by default on most Linux systems. The use of RSH will not be discussed in this article.

You need either an RSA or a DSA key for the SSH protocol. With OpenSSH, you can use either. For the purpose of this article, we will configure SSH using an RSA key generated with ssh-keygen -t rsa, which creates a key for SSH protocol version 2 (~/.ssh/id_rsa); keys for the older SSH1 protocol would instead be created with -t rsa1.

If your SSH installation requires a different setup (for example, DSA keys only), refer to your SSH distribution documentation. That type of configuration is beyond the scope of this article and will not be discussed.

So, why do we have to setup user equivalence? Installing Oracle Clusterware and the Oracle Database software is only performed from one node in a RAC cluster. When running the Oracle Universal Installer (OUI) on that particular node, it will use the ssh and scp commands to run remote commands on and copy files (the Oracle software) to all other nodes within the RAC cluster. The oracle UNIX user account on the node running the OUI (runInstaller) must be trusted by all other nodes in your RAC cluster. This means that you must be able to run the secure shell commands (ssh or scp) on the Linux server you will be running the OUI from against all other Linux servers in the cluster without being prompted for a password.

  Please note that the use of secure shell is not required for normal RAC operation. This configuration, however, must be enabled for RAC and patchset installations as well as for creating the clustered database.

The methods required for configuring SSH, an RSA key, and user equivalence are described in the following sections.


Configuring the Secure Shell

To determine if SSH is installed and running, enter the following command:
# pgrep sshd
3797
If SSH is running, then the response to this command is a list of process ID number(s). Run this command on both Oracle RAC nodes in the cluster to verify the SSH daemons are installed and running!

  To find out more about SSH, refer to the man page:
# man ssh


Creating the RSA Keys on Both Oracle RAC Nodes

The first step in configuring SSH is to create an RSA public/private key pair on both Oracle RAC nodes in the cluster. The command to do this will create a public and private key for RSA (for a total of two keys per node). The content of the RSA public keys will then need to be copied into an authorized key file which is then distributed to both Oracle RAC nodes in the cluster.

Use the following steps to create the RSA key pair. Please note that these steps will need to be completed on both Oracle RAC nodes in the cluster:

  1. Log on as the "oracle" UNIX user account.
    # su - oracle

  2. If necessary, create the .ssh directory in the "oracle" user's home directory and set the correct permissions on it:
    $ mkdir -p ~/.ssh
    $ chmod 700 ~/.ssh

  3. Enter the following command to generate an RSA key pair (public and private key) for the SSH protocol:
    $ /usr/bin/ssh-keygen -t rsa
    At the prompts:
    • Accept the default location for the key files.
    • Enter and confirm a pass phrase. This should be different from the "oracle" UNIX user account password; however, this is not a requirement.

    This command will write the public key to the ~/.ssh/id_rsa.pub file and the private key to the ~/.ssh/id_rsa file. Note that you should never distribute the private key to anyone!

  4. Repeat the above steps for each Oracle RAC node in the cluster.

Now that both Oracle RAC nodes contain a public and private key for RSA, you will need to create an authorized key file on one of the nodes. An authorized key file is nothing more than a single file that contains a copy of everyone's (every node's) RSA public key. Once the authorized key file contains all of the public keys, it is then distributed to all other nodes in the cluster.

Complete the following steps on one of the nodes in the cluster to create and then distribute the authorized key file. For the purpose of this article, I am using linux1:

  1. First, determine if an authorized key file already exists on the node (~/.ssh/authorized_keys). In most cases this will not exist since this article assumes you are working with a new install. If the file doesn't exist, create it now:
    $ touch ~/.ssh/authorized_keys
    $ cd ~/.ssh
    $ ls -l *.pub
    -rw-r--r-- 1 oracle oinstall 223 Sep  2 01:18 id_rsa.pub
    The listing above should show the id_rsa.pub public key created in the previous section.

  2. In this step, use ssh (or SCP/SFTP) to copy the content of the ~/.ssh/id_rsa.pub public key from both Oracle RAC nodes in the cluster into the authorized key file just created (~/.ssh/authorized_keys). Again, this will be done from linux1. You will be prompted for the oracle UNIX user account password for both Oracle RAC nodes accessed.

    The following example is being run from linux1 and assumes a two-node cluster, with nodes linux1 and linux2:

    $ ssh linux1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    The authenticity of host 'linux1 (192.168.1.100)' can't be established.
    RSA key fingerprint is 9a:8b:9b:12:23:6d:87:4d:04:66:44:97:f9:d8:57:10.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'linux1,192.168.1.100' (RSA) to the list of known hosts.
    oracle@linux1's password: xxxxx
    
    $ ssh linux2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    The authenticity of host 'linux2 (192.168.1.101)' can't be established.
    RSA key fingerprint is 2b:6d:15:55:8e:ba:7a:fc:a9:f2:0b:ba:01:48:30:f2.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'linux2,192.168.1.101' (RSA) to the list of known hosts.
    oracle@linux2's password: xxxxx

      The first time you use SSH to connect to a node from a particular system, you will see a message similar to the following:
    The authenticity of host 'linux1 (192.168.1.100)' can't be established.
    RSA key fingerprint is 9a:8b:9b:12:23:6d:87:4d:04:66:44:97:f9:d8:57:10.
    Are you sure you want to continue connecting (yes/no)? yes
    Enter yes at the prompt to continue. You should not see this message again when you connect from this system to the same node.

  3. At this point, we have the RSA public key from every node in the cluster in the authorized key file (~/.ssh/authorized_keys) on linux1. We now need to copy it to the remaining nodes in the cluster. In our two-node cluster example, the only remaining node is linux2. Use the scp command to copy the authorized key file to all remaining nodes in the RAC cluster:
    $ scp ~/.ssh/authorized_keys linux2:.ssh/authorized_keys
    oracle@linux2's password: xxxxx
    authorized_keys                             100%  446     0.4KB/s   00:00

  4. Change the permission of the authorized key file for both Oracle RAC nodes in the cluster by logging into the node and running the following:
    $ chmod 600 ~/.ssh/authorized_keys

  5. At this point, if you use ssh to log in to or run a command on another node, you are prompted for the pass phrase that you specified when you created the RSA key. For example, test the following from linux1:
    $ ssh linux1 hostname
    Enter passphrase for key '/home/oracle/.ssh/id_rsa': xxxxx
    linux1
    
    $ ssh linux2 hostname
    Enter passphrase for key '/home/oracle/.ssh/id_rsa': xxxxx
    linux2

      If you see any other messages or text, apart from the host name, then the Oracle installation can fail. Make any changes required to ensure that only the host name is displayed when you enter these commands. Ensure that any part of a login script that generates output or asks questions is modified so that it acts only when the shell is an interactive shell.
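The check above can also be scripted. The following is a minimal sketch, assuming bash on linux1 and that each node's hostname command prints exactly the name used to reach it; the function name check_ssh_output is hypothetical. It captures both standard output and standard error, so SSH warnings count as unwanted output too. At this stage ssh still asks for the pass phrase on the terminal, so type it when prompted.

```shell
#!/bin/bash
# Sketch: confirm that "ssh <node> hostname" prints nothing but the host name.
# Assumption: each node's hostname output matches the name used to reach it.
check_ssh_output() {
    local node out rc=0
    for node in "$@"; do
        # Capture stdout and stderr: login-script noise or SSH warnings on
        # either stream can break the Oracle installer.
        out=$(ssh "$node" hostname 2>&1)
        if [ "$out" = "$node" ]; then
            echo "OK:   $node"
        else
            echo "FAIL: $node returned unexpected output:"
            printf '%s\n' "$out"
            rc=1
        fi
    done
    return $rc
}

# Usage (from linux1 as the "oracle" user):
#   check_ssh_output linux1 linux2
```

A non-zero return status means at least one node printed something extra that should be tracked down in that node's login scripts.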


Enabling SSH User Equivalency for the Current Shell Session

The OUI needs to run the secure shell tool commands (ssh and scp) without being prompted for a pass phrase. Even though SSH is now configured on both Oracle RAC nodes in the cluster, the secure shell tool commands will still prompt for a pass phrase. Before running the OUI, you need to enable user equivalence for the terminal session you plan to run the OUI from. For the purpose of this article, all Oracle installations will be performed from linux1.

User equivalence will need to be enabled on any new terminal shell session before attempting to run the OUI. If you log out and log back in to the node you will be performing the Oracle installation from, you must enable user equivalence for the terminal shell session as this is not done by default.

To enable user equivalence for the current terminal shell session, perform the following steps:

  1. Log on to the node where you want to run the OUI from (linux1) as the "oracle" UNIX user account.
    # su - oracle

  2. Enter the following commands:
    $ exec /usr/bin/ssh-agent $SHELL
    $ /usr/bin/ssh-add
    Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
    Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)
    At the prompt, enter the pass phrase for each key that you generated.

  3. If SSH is configured correctly, you will be able to use the ssh and scp commands without being prompted for a password or pass phrase from this terminal session:
    $ ssh linux1 "date;hostname"
    Sun Sep  2 01:22:46 EST 2007
    linux1
    
    $ ssh linux2 "date;hostname"
    Sun Sep  2 01:23:28 EST 2007
    linux2

      The commands above should display the date set on each Oracle RAC node along with its hostname. If any of the nodes prompt for a password or pass phrase then verify that the ~/.ssh/authorized_keys file on that node contains the correct public keys.

    Also, if you see any other messages or text, apart from the date and hostname, then the Oracle installation can fail. Make any changes required to ensure that only the date and hostname are displayed when you enter these commands. Ensure that any part of a login script that generates output or asks questions is modified so that it acts only when the shell is an interactive shell.

  4. The Oracle Universal Installer is a GUI application and requires an X server. From the terminal session enabled for user equivalence (the node you will be performing the Oracle installations from), set the environment variable DISPLAY to a valid X Window display:

    Bourne, Korn, and Bash shells:

    $ DISPLAY=<Any X-Windows Host>:0
    $ export DISPLAY
    C shell:
    $ setenv DISPLAY <Any X-Windows Host>:0
    After setting the DISPLAY variable to a valid X Windows display, you should perform another test of the current terminal session to ensure that X11 forwarding is not enabled:
    $ ssh linux1 hostname
    linux1
    
    $ ssh linux2 hostname
    linux2

      If you are using a remote client to connect to the node performing the installation, and you see a message similar to: "Warning: No xauth data; using fake authentication data for X11 forwarding." then this means that your authorized keys file is configured correctly; however, your SSH configuration has X11 forwarding enabled. For example:
    $ export DISPLAY=melody:0
    $ ssh linux2 hostname
    Warning: No xauth data; using fake authentication data for X11 forwarding.
    linux2
    Note that having X11 Forwarding enabled will cause the Oracle installation to fail. To correct this problem, create a user-level SSH client configuration file for the "oracle" UNIX user account that disables X11 Forwarding:

    • Using a text editor, edit or create the file ~/.ssh/config
    • Make sure that the ForwardX11 attribute is set to no. For example, insert the following into the ~/.ssh/config file:
      Host *
              ForwardX11 no

  5. You must run the Oracle Universal Installer from this terminal session or remember to repeat the steps to enable user equivalence (steps 2, 3, and 4 from this section) before you start the Oracle Universal Installer from a different terminal session.
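The ~/.ssh/config change from step 4 can be applied idempotently. This is a minimal sketch, assuming bash and GNU coreutils; the function name disable_x11_forwarding is hypothetical. It only appends a "Host *" block when the file has no ForwardX11 setting at all, so an existing "ForwardX11 yes" entry still has to be corrected by hand.

```shell
#!/bin/bash
# Sketch: make sure the oracle user's SSH client config disables X11 forwarding.
# Appends a "Host *" block only if no ForwardX11 setting is present yet.
disable_x11_forwarding() {
    local cfg="${1:-$HOME/.ssh/config}"
    # -m 700 applies only if the directory has to be created.
    mkdir -p -m 700 "$(dirname "$cfg")"
    touch "$cfg"
    # SSH keywords are case-insensitive, hence grep -i.
    if ! grep -qi '^[[:space:]]*ForwardX11[[:space:]]' "$cfg"; then
        printf 'Host *\n        ForwardX11 no\n' >> "$cfg"
    fi
    # ssh rejects a config file that is writable by group or others.
    chmod 600 "$cfg"
}

# Usage (as the "oracle" user on the installation node):
#   disable_x11_forwarding
```

Because ssh uses the first value it finds for each option, a "Host *" block appended at the end of the file does not override more specific Host entries that appear earlier.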


Remove any stty Commands

When installing the Oracle software, any hidden shell startup files (e.g. .bashrc, .cshrc, .profile) that contain stty commands will cause the installation process to fail.

To avoid this problem, you must modify these files so that stty commands run only in an interactive session and produce no output on STDERR when the files are read by a remote shell.

  If there are hidden files that contain stty commands that are loaded by the remote shell, then OUI indicates an error and stops the installation.
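A minimal sketch for Bourne, Bash, and Korn shell login scripts follows. The stty setting itself (intr ^C) is only a placeholder for whatever your files contain; the important part is the guard around it. The same idea covers any other output or questions in login scripts.

```shell
# Example ~/.bashrc (or ~/.profile) fragment.
# [ -t 0 ] is true only when standard input is a terminal, which is not the
# case when OUI runs commands through ssh or scp.
if [ -t 0 ]; then
    stty intr '^C'    # quoted: ^ is a pipe character in the old Bourne shell
fi

# Anything else that prints text or asks questions belongs behind a guard;
# $- contains "i" only in interactive shells.
case $- in
    *i*) echo "Logged in to $(hostname)" ;;
esac
```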



All Startup Commands for Both Oracle RAC Nodes


  Verify that the following startup commands are included on both of the Oracle RAC nodes in the cluster!

Up to this point, we have talked in great detail about the parameters and resources that need to be configured on both nodes in the Oracle RAC 10g configuration. This section will take a deep breath and recap those parameters, commands, and entries (in previous sections of this document) that need to happen on both Oracle RAC nodes when they are booted.

In this section, I provide all of the commands, parameters, and entries that have been discussed so far that will need to be included in the startup scripts for each Linux node in the RAC cluster. Each startup file below is shown in full; the entries relevant to Oracle RAC are the ones covered in the previous sections (for example, the hangcheck-timer options in /etc/modprobe.conf).


/etc/modprobe.conf

All parameters and values to be used by kernel modules.

/etc/modprobe.conf
alias eth0 b44
alias eth1 tulip
alias snd-card-0 snd-intel8x0
options snd-card-0 index=0
alias usb-controller ehci-hcd
alias usb-controller1 uhci-hcd
options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
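According to Oracle's Clusterware documentation, the hangcheck-timer module restarts the node if the kernel fails to respond within the sum of hangcheck_tick and hangcheck_margin, which is 30 + 180 = 210 seconds with the options above. The following minimal sketch reads that options line from a modprobe.conf-style file and prints the window; the function name hangcheck_window is hypothetical.

```shell
#!/bin/bash
# Sketch: report the hangcheck-timer reset window configured in modprobe.conf.
# The window is hangcheck_tick + hangcheck_margin seconds.
hangcheck_window() {
    local conf="${1:-/etc/modprobe.conf}" line tick margin
    # Use the last "options hangcheck-timer" line if there are several.
    line=$(grep -E '^[[:space:]]*options[[:space:]]+hangcheck-timer' "$conf" | tail -n 1)
    tick=$(printf '%s\n' "$line" | sed -n 's/.*hangcheck_tick=\([0-9]*\).*/\1/p')
    margin=$(printf '%s\n' "$line" | sed -n 's/.*hangcheck_margin=\([0-9]*\).*/\1/p')
    if [ -z "$tick" ] || [ -z "$margin" ]; then
        echo "hangcheck-timer options not found in $conf" >&2
        return 1
    fi
    echo "tick=${tick}s margin=${margin}s window=$((tick + margin))s"
}

# Usage: hangcheck_window
#   With the file above this prints: tick=30s margin=180s window=210s
```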


/etc/sysctl.conf

We wanted to adjust the default and maximum send buffer size as well as the default and maximum receive buffer size for the interconnect. This file also contains those parameters responsible for configuring shared memory, semaphores, file handles, and local IP range for use by the Oracle instance.

/etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1


# +---------------------------------------------------------+
# | ADJUSTING NETWORK SETTINGS                              |
# +---------------------------------------------------------+
# | With Oracle 9.2.0.1 and onwards, Oracle now makes use   |
# | of UDP as the default protocol on Linux for             |
# | inter-process communication (IPC), such as Cache Fusion |
# | and Cluster Manager buffer transfers between instances  |
# | within the RAC cluster. Oracle strongly recommends      |
# | adjusting the default and maximum receive buffer size   |
# | (SO_RCVBUF socket option) to 256 KB, and the default    |
# | and maximum send buffer size (SO_SNDBUF socket option)  |
# | to 256 KB. The receive buffers are used by TCP and UDP  |
# | to hold received data until it is read by the           |
# | application. With TCP, the receive buffer cannot        |
# | overflow because the peer is not allowed to send data   |
# | beyond the advertised window. UDP has no such flow      |
# | control, so datagrams that don't fit in the socket      |
# | receive buffer are discarded, which means a fast        |
# | sender can overwhelm a receiver with small buffers.     |
# +---------------------------------------------------------+

# +---------------------------------------------------------+
# | Default setting in bytes of the socket "receive" buffer |
# | which may be set by using the SO_RCVBUF socket option.  |
# +---------------------------------------------------------+
net.core.rmem_default=262144

# +---------------------------------------------------------+
# | Maximum setting in bytes of the socket "receive" buffer |
# | which may be set by using the SO_RCVBUF socket option.  |
# +---------------------------------------------------------+
net.core.rmem_max=262144

# +---------------------------------------------------------+
# | Default setting in bytes of the socket "send" buffer    |
# | which may be set by using the SO_SNDBUF socket option.  |
# +---------------------------------------------------------+
net.core.wmem_default=262144

# +---------------------------------------------------------+
# | Maximum setting in bytes of the socket "send" buffer    |
# | which may be set by using the SO_SNDBUF socket option.  |
# +---------------------------------------------------------+
net.core.wmem_max=262144


# +---------------------------------------------------------+
# | ADJUSTING ADDITIONAL KERNEL PARAMETERS FOR ORACLE       |
# +---------------------------------------------------------+
# | Configure the kernel parameters for all Oracle Linux    |
# | servers by setting shared memory and semaphores,        |
# | setting the maximum amount of file handles, and setting |
# | the IP local port range.                                |
# +---------------------------------------------------------+

# +---------------------------------------------------------+
# | SHARED MEMORY                                           |
# +---------------------------------------------------------+
kernel.shmmax=2147483648

# +---------------------------------------------------------+
# | SEMAPHORES                                              |
# | ----------                                              |
# |                                                         |
# | SEMMSL_value  SEMMNS_value  SEMOPM_value  SEMMNI_value  |
# |                                                         |
# +---------------------------------------------------------+
kernel.sem=250 32000 100 128

# +---------------------------------------------------------+
# | FILE HANDLES                                            |
# +---------------------------------------------------------+
fs.file-max=65536

# +---------------------------------------------------------+
# | LOCAL IP RANGE                                          |
# +---------------------------------------------------------+
net.ipv4.ip_local_port_range=1024 65000

  Verify that each of the required kernel parameters (above) is configured in the /etc/sysctl.conf file. Then, ensure that each of these parameters is truly in effect by running the following command on both Oracle RAC nodes in the cluster:
# sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_max = 262144
kernel.shmmax = 2147483648
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
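Scanning the sysctl -p output by eye works, but a scripted comparison makes it harder to miss a value. The following is a minimal sketch, assuming bash; the function name check_kernel_params is hypothetical, and the expected values are the Oracle-related ones from the /etc/sysctl.conf shown above. sysctl -n prints multi-value settings such as kernel.sem separated by tabs, so whitespace is normalized before comparing.

```shell
#!/bin/bash
# Sketch: compare live kernel settings with the values this article expects.
check_kernel_params() {
    local rc=0 key want have
    while read -r key want; do
        [ -z "$key" ] && continue
        # Collapse tabs/spaces so "250<TAB>32000..." compares as "250 32000 ...".
        have=$(sysctl -n "$key" </dev/null 2>/dev/null | tr -s ' \t' ' ')
        if [ "$have" = "$want" ]; then
            echo "OK    $key = $have"
        else
            echo "WRONG $key = '${have:-<unset>}' (expected '$want')"
            rc=1
        fi
    done <<'EOF'
net.core.rmem_default 262144
net.core.rmem_max 262144
net.core.wmem_default 262144
net.core.wmem_max 262144
kernel.shmmax 2147483648
kernel.sem 250 32000 100 128
fs.file-max 65536
net.ipv4.ip_local_port_range 1024 65000
EOF
    return $rc
}

# Usage (as root on each Oracle RAC node): check_kernel_params
```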


/etc/hosts

All machine/IP entries for nodes in the RAC cluster.

/etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.

127.0.0.1        localhost.localdomain   localhost

# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2

# Private Interconnect - (eth1)
192.168.2.100    linux1-priv
192.168.2.101    linux2-priv

# Public Virtual IP (VIP) addresses - (eth0:1)
192.168.1.200    linux1-vip
192.168.1.201    linux2-vip

# Openfiler storage server - public (eth0) and private storage network (eth1)
192.168.1.195    openfiler1
192.168.2.195    openfiler1-priv

192.168.1.106    melody
192.168.1.102    alex
192.168.1.105    bartman
192.168.1.120    cartman
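Because every node must resolve the same set of public, private, and virtual names, a quick check against /etc/hosts on each node can catch a missing entry early. This is a minimal sketch, assuming bash and awk; the function name check_hosts is hypothetical. Commented-out lines are ignored, and a name is accepted in any alias column after the address.

```shell
#!/bin/bash
# Sketch: confirm that every required name appears in a hosts file and
# report the address it maps to.
check_hosts() {
    local file="$1" name addr rc=0
    shift
    for name in "$@"; do
        addr=$(awk -v n="$name" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }' "$file")
        if [ -n "$addr" ]; then
            echo "$name -> $addr"
        else
            echo "MISSING: $name" >&2
            rc=1
        fi
    done
    return $rc
}

# Usage: check_hosts /etc/hosts linux1 linux2 linux1-priv linux2-priv \
#            linux1-vip linux2-vip openfiler1 openfiler1-priv
```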


/etc/rc.local

Loading the hangcheck-timer kernel module.

/etc/rc.local
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local

# +---------------------------------------------------------+
# | HANGCHECK TIMER                                         |
# | (I do not believe this is required, but doesn't hurt)   |
# +---------------------------------------------------------+

/sbin/modprobe hangcheck-timer



Install and Configure Oracle Cluster File System (OCFS2)


  Most of the installation and configuration procedures in this section should be performed on both Oracle RAC nodes in the cluster! Creating the OCFS2 file system, however, should only be executed on one of the nodes in the RAC cluster.


Overview

It is now time to install the Oracle Cluster File System, Release 2 (OCFS2). OCFS2, developed by Oracle Corporation, is a Cluster File System which allows all nodes in a cluster to concurrently access a device via the standard file system interface. This allows for easy management of applications that need to run across a cluster.

OCFS (Release 1) was released in December 2002 to enable Oracle Real Application Cluster (RAC) users to run the clustered database without having to deal with RAW devices. The file system was designed to store database related files, such as data files, control files, redo logs, archive logs, etc. OCFS2 is the next generation of the Oracle Cluster File System. It has been designed to be a general purpose cluster file system. With it, one can store not only database related files on a shared disk, but also store Oracle binaries and configuration files (shared Oracle Home) making management of RAC even easier.

In this article, I will be using the latest release of OCFS2 (OCFS2 Release 1.2.5-6 at the time of this writing) to store the two files that are required to be shared by the Oracle Clusterware software. Along with these two files, I will also be using this space to store the shared ASM SPFILE for all Oracle RAC instances.

See the following page for more information on OCFS2 (including Installation Notes) for Linux:

  OCFS2 Project Documentation


Download OCFS2

First, let's download the latest OCFS2 distribution. The OCFS2 distribution comprises two sets of RPMs; namely, the kernel module and the tools. The latest kernel module is available for download from http://oss.oracle.com/projects/ocfs2/files/ and the tools from http://oss.oracle.com/projects/ocfs2-tools/files/.

Download the appropriate RPMs starting with the latest OCFS2 kernel module (the driver). With CentOS 4.5, I am using kernel release 2.6.9-55.EL. The appropriate OCFS2 kernel module was found in the latest release of OCFS2 at the time of this writing (OCFS2 Release 1.2.5-6). The available OCFS2 kernel modules for Linux kernel 2.6.9-55.EL are listed below. Always download the latest OCFS2 kernel module that matches the distribution, platform, kernel version and the kernel flavor (smp, hugemem, psmp, etc).

  ocfs2-2.6.9-55.EL-1.2.5-6.i686.rpm - (for single processor)
  ocfs2-2.6.9-55.ELsmp-1.2.5-6.i686.rpm - (for multiple processors)
  ocfs2-2.6.9-55.ELhugemem-1.2.5-6.i686.rpm - (for hugemem)
For the tools, simply match the platform and distribution. You should download both the OCFS2 tools and the OCFS2 console applications.
  ocfs2-tools-1.2.4-1.i386.rpm - (OCFS2 tools)
  ocfs2console-1.2.4-1.i386.rpm - (OCFS2 console)

  The OCFS2 Console is optional but highly recommended. The ocfs2console application requires e2fsprogs, glib2 2.2.3 or later, vte 0.11.10 or later, pygtk2 (EL4) or python-gtk (SLES9) 1.99.16 or later, python 2.3 or later and ocfs2-tools.

  If you were curious as to which OCFS2 driver release you need, use the OCFS2 release that matches your kernel version. To determine your kernel release:
$ uname -a
Linux linux1 2.6.9-55.EL #1 Wed May 2 13:52:16 EDT 2007 i686 i686 i386 GNU/Linux
In the absence of the string "smp" after the string "EL", we are running a single processor (Uniprocessor) machine. If the string "smp" were to appear, then you would be running on a multi-processor machine.
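The same matching rule can be expressed as a small helper that builds the kernel-module RPM name from the kernel release string. This is a minimal sketch, assuming the naming pattern of the files listed above; the OCFS2_VERSION and ARCH values and the function name ocfs2_module_rpm are hypothetical and must be adjusted to the release and platform you actually download.

```shell
#!/bin/bash
# Sketch: derive the OCFS2 kernel-module RPM name from the running kernel.
# Assumed values matching the files listed above; adjust for your download.
OCFS2_VERSION="1.2.5-6"
ARCH="i686"

ocfs2_module_rpm() {
    local release="${1:-$(uname -r)}" flavor
    case "$release" in
        *psmp)    flavor="psmp" ;;
        *smp)     flavor="smp (multiple processors)" ;;
        *hugemem) flavor="hugemem" ;;
        *)        flavor="uniprocessor" ;;
    esac
    echo "kernel flavor: $flavor" >&2
    echo "ocfs2-${release}-${OCFS2_VERSION}.${ARCH}.rpm"
}

# Usage: ocfs2_module_rpm
#   On linux1 (kernel 2.6.9-55.EL) this prints ocfs2-2.6.9-55.EL-1.2.5-6.i686.rpm
```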


Install OCFS2

I will be installing the OCFS2 files onto two single-processor machines. The installation process is simply a matter of running the following command on both Oracle RAC nodes in the cluster as the root user account:
$ su -
# rpm -Uvh ocfs2-2.6.9-55.EL-1.2.5-6.i686.rpm \
       ocfs2console-1.2.4-1.i386.rpm \
       ocfs2-tools-1.2.4-1.i386.rpm
Preparing...                ########################################### [100%]
   1:ocfs2-tools            ########################################### [ 33%]
   2:ocfs2-2.6.9-55.EL      ########################################### [ 67%]
   3:ocfs2console           ########################################### [100%]


Disable SELinux (RHEL4 U2 and higher)

Users of RHEL4 U2 and higher (CentOS 4.5 is based on RHEL4 U5) are advised that OCFS2 currently does not work with SELinux enabled. If you are using RHEL4 U2 or higher (which includes us since we are using CentOS 4.5), you will need to disable SELinux (using the system-config-securitylevel tool) to get the O2CB service to execute.

  A ticket has been logged with Red Hat on this issue.

If you followed the installation instructions I provided for the CentOS operating system, the SELinux option should already be disabled in which case this section can be skipped. During the CentOS installation, the SELinux option was disabled in the Firewall section.

If you did not follow the instructions to disable the SELinux option during the installation of CentOS (or if you simply want to verify it is truly disabled), run the "Security Level Configuration" GUI utility:

# /usr/bin/system-config-securitylevel &


This will bring up the following screen:


Figure 13: Security Level Configuration Opening Screen


Now, click the SELinux tab and uncheck the "Enabled" checkbox. After clicking [OK], you will be presented with a warning dialog. Simply acknowledge this warning by clicking "Yes". Your screen should now look like the following after disabling the SELinux option:


Figure 14: SELinux Disabled


After making this change on both nodes in the cluster, each node will need to be rebooted to implement the change. SELinux must be disabled before you can continue with configuring OCFS2!

# init 6
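If you prefer to confirm the setting from the command line, the mode used at the next boot can be read from /etc/selinux/config. This is a minimal sketch, assuming that file holds a SELINUX= line (as it does on RHEL4-based systems, to the best of my knowledge); the function name selinux_boot_mode is hypothetical. After the reboot, the getenforce command, if installed, should also report Disabled.

```shell
#!/bin/bash
# Sketch: read the SELinux mode that will apply at the next boot.
# Assumption: the setting lives in /etc/selinux/config as SELINUX=<mode>.
selinux_boot_mode() {
    local cfg="${1:-/etc/selinux/config}" mode
    # Anchored match skips comments and the separate SELINUXTYPE= line.
    mode=$(sed -n 's/^[[:space:]]*SELINUX=\([A-Za-z]*\).*/\1/p' "$cfg" 2>/dev/null | tail -n 1)
    echo "${mode:-unknown}"
}

# Usage (on both nodes, before configuring OCFS2):
#   [ "$(selinux_boot_mode)" = "disabled" ] || echo "SELinux is still enabled"
```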


Configure OCFS2

The next step is to generate and configure the /etc/ocfs2/cluster.conf file on both Oracle RAC nodes in the cluster. The easiest way to accomplish this is to run the GUI tool ocfs2console. In this section, we will not only create and configure the /etc/ocfs2/cluster.conf file using ocfs2console, but will also create and start the cluster stack O2CB. When the /etc/ocfs2/cluster.conf file is not present (as will be the case in our example), the ocfs2console tool will create this file along with a new cluster stack service (O2CB) with a default cluster name of ocfs2. This will need to be done on both Oracle RAC nodes in the cluster as the root user account:

  Note that OCFS2 will be configured to use the private network (192.168.2.0) for all of its network traffic as recommended by Oracle. While OCFS2 does not take much bandwidth, it does require the nodes to be alive on the network and sends regular keepalive packets to ensure that they are. To avoid a network delay being interpreted as a node disappearing from the network, which could lead to node self-fencing, a private interconnect is recommended. It is safe to use the same private interconnect for both Oracle RAC and OCFS2.

A popular question then is what node name should be used and should it be related to the IP address? The node name needs to match the hostname of the machine. The IP address need not be the one associated with that hostname. In other words, any valid IP address on that node can be used. OCFS2 will not attempt to match the node name (hostname) with the specified IP address.

$ su -
# ocfs2console &
This will bring up the GUI as shown below:


Figure 15: ocfs2console Screen


Using the ocfs2console GUI tool, perform the following steps:

  1. Select [Cluster] -> [Configure Nodes...]. This will start the OCFS2 Cluster Stack (Figure 16). Acknowledge this Information dialog box by clicking [Close]. You will then be presented with the "Node Configuration" dialog.
  2. On the "Node Configuration" dialog, click the [Add] button.
    • This will bring up the "Add Node" dialog.
    • In the "Add Node" dialog, enter the Host name and IP address for the first node in the cluster. Leave the IP Port set to its default value of 7777. In my example, I added both nodes using linux1 / 192.168.2.100 for the first node and linux2 / 192.168.2.101 for the second node.
      Note: The node name you enter "must" match the hostname of the machine and the IP addresses will use the private interconnect.
    • Click [Apply] on the "Node Configuration" dialog - All nodes should now be "Active" as shown in Figure 17.
    • Click [Close] on the "Node Configuration" dialog.
  3. After verifying all values are correct, exit the application using [File] -> [Quit]. This needs to be performed on both Oracle RAC nodes in the cluster.



Figure 16: Starting the OCFS2 Cluster Stack


The following dialog shows the OCFS2 settings I used for the node linux1 and linux2:


Figure 17: Configuring Nodes for OCFS2


  See the Troubleshooting section if you get the error:
o2cb_ctl: Unable to access cluster service while creating node


After exiting the ocfs2console, you will have a /etc/ocfs2/cluster.conf similar to the following. This process needs to be completed on both Oracle RAC nodes in the cluster and the OCFS2 configuration file should be exactly the same for both of the nodes:

/etc/ocfs2/cluster.conf
node:
        ip_port = 7777
        ip_address = 192.168.2.100
        number = 0
        name = linux1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.2.101
        number = 1
        name = linux2
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2
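Since the file must be identical on both nodes and consistent with itself, a quick sanity check can help. This is a minimal sketch, assuming bash and awk and the stanza layout shown above; the function name check_cluster_conf is hypothetical. It confirms that node_count matches the number of node: stanzas and that node numbers and names are unique.

```shell
#!/bin/bash
# Sketch: sanity-check an OCFS2 cluster.conf laid out as shown above.
check_cluster_conf() {
    awk '
        # Stanza headers ("node:", "cluster:") start in column 1.
        /^[a-z_]+:/ { section = $1; if (section == "node:") nodes++ }
        section == "node:" && $1 == "number" {
            if (num[$3]++) { print "duplicate node number " $3; bad = 1 }
        }
        section == "node:" && $1 == "name" {
            if (nam[$3]++) { print "duplicate node name " $3; bad = 1 }
        }
        section == "cluster:" && $1 == "node_count" { count = $3 }
        END {
            if (count != nodes) { print "node_count=" count " but " nodes " node stanzas"; bad = 1 }
            if (!bad) print "OK: " nodes " nodes"
            exit bad
        }
    ' "${1:-/etc/ocfs2/cluster.conf}"
}

# Usage (on each node): check_cluster_conf
```

Running md5sum /etc/ocfs2/cluster.conf on both nodes and comparing the results is a simple way to confirm the two copies are identical.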


O2CB Cluster Service

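Before we can do anything with OCFS2 like formatting or mounting the file system, we need to first have OCFS2's cluster stack, O2CB, running (which it will be as a result of the configuration process performed above). The stack includes the following services:

  • NM: Node Manager that keeps track of all the nodes in /etc/ocfs2/cluster.conf
  • HB: Heartbeat service that issues up/down notifications when nodes join or leave the cluster
  • TCP: Handles communication between the nodes
  • DLM: Distributed lock manager that keeps track of all locks, their owners, and status
  • CONFIGFS: User space driven configuration file system mounted at /config
  • DLMFS: User space interface to the kernel space DLM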

All of the above cluster services have been packaged in the o2cb system service (/etc/init.d/o2cb). Here is a short listing of some of the more useful commands and options for the o2cb system service.

  The following commands are for demonstration purposes only and should not be run when installing and configuring OCFS2 for this article!


Configure O2CB to Start on Boot and Adjust O2CB Heartbeat Threshold

We would now like to configure the on-boot properties of the O2CB driver so that the cluster stack services will start on each boot. We will also be adjusting the OCFS2 heartbeat threshold from its default setting of 7 to 61. All of the tasks within this section will need to be performed on both nodes in the cluster.

  With releases of OCFS2 prior to 1.2.1, a bug existed where the driver would not get loaded on each boot even after configuring the on-boot properties to do so. This bug was fixed in release 1.2.1 of OCFS2 and does not need to be addressed in this article. If however you are using a release of OCFS2 prior to 1.2.1, please see the Troubleshooting section for a workaround to this bug.

Set the on-boot properties as follows:

# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot.  The current values will be shown in brackets ('[]').  Hitting
<ENTER> without typing an answer will keep that current value.  Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [n]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Specify heartbeat dead threshold (>=7) [7]: 61
Specify network idle timeout in ms (>=5000) [10000]: 10000
Specify network keepalive delay in ms (>=1000) [5000]: 5000
Specify network reconnect delay in ms (>=2000) [2000]: 2000
Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting O2CB cluster ocfs2: OK
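The heartbeat dead threshold is a count of 2-second heartbeat iterations rather than a time. The OCFS2 FAQ gives the timeout as (threshold - 1) * 2 seconds, so the default of 7 corresponds to about 12 seconds and the value 61 used here to about 120 seconds. The following minimal sketch does that conversion; the function names are hypothetical, and it assumes that "o2cb configure" stores the answer as O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb.

```shell
#!/bin/bash
# Sketch: convert an O2CB heartbeat dead threshold into seconds, using the
# relationship from the OCFS2 FAQ: timeout = (threshold - 1) * 2.
hb_timeout() {
    local threshold="$1"
    echo $(( (threshold - 1) * 2 ))
}

# Assumption: "o2cb configure" writes O2CB_HEARTBEAT_THRESHOLD=<n> here.
hb_threshold_from_config() {
    local cfg="${1:-/etc/sysconfig/o2cb}"
    sed -n 's/^O2CB_HEARTBEAT_THRESHOLD="\{0,1\}\([0-9]*\).*/\1/p' "$cfg" | tail -n 1
}

# Usage: hb_timeout "$(hb_threshold_from_config)"
#   With the threshold of 61 entered above this prints 120.
```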


Format the OCFS2 File System

  Unlike the other tasks in this section, creating the OCFS2 file system should only be executed on one of the nodes in the RAC cluster. I will be executing all commands in this section from linux1 only.

We can now start to make use of the iSCSI volume we partitioned for OCFS2 in the section "Create Partitions on iSCSI Volumes".

It is extremely important to note that at this point in the article you may have rebooted linux1 after partitioning the iSCSI volume to be used for OCFS2 (e.g. /dev/sde1). This means that the mapping of iSCSI target names discovered from Openfiler to the local SCSI device name on linux1 may be different. Please repeat the procedures documented in the section "Discovering iSCSI Targets" to determine if the iSCSI target names have been discovered as a different local SCSI device name on linux1 if you have rebooted.

For example, when creating the primary partition on the iSCSI volume to be used for OCFS2, I documented that the iSCSI target name "iqn.2006-01.com.openfiler:rac1.crs" mapped to the local SCSI device name /dev/sde. Then, earlier in this section, I had to reboot both nodes when disabling SELinux. After working through the procedures documented in the section "Discovering iSCSI Targets", it may be the case that "iqn.2006-01.com.openfiler:rac1.crs" is no longer mapped to the local SCSI device name /dev/sde. Please note that the local SCSI device name will most likely be different on your machine.

If the O2CB cluster is offline, start it. The format operation needs the cluster to be online, as it needs to ensure that the volume is not mounted on some node in the cluster.

Earlier in this document, we created the directory /u02 under the section "Create Mount Point for OCFS2 / Clusterware". This section contains the commands to create and mount the file system to be used for the Cluster Manager.

  Note that it is possible to create the OCFS2 file system using either the GUI tool ocfs2console or the command-line tool mkfs.ocfs2. From the ocfs2console utility, use the menu [Tasks] -> [Format].

See the instructions below on how to create the OCFS2 file system using the command-line tool mkfs.ocfs2.

To create the file system, we can use the Oracle executable mkfs.ocfs2. For the purpose of this example, I run the following command only from linux1 as the root user account using the local SCSI device name mapped to the iSCSI volume for crs (/dev/sde1). Also note that I specified a label named "oracrsfiles" which will be referred to when mounting or un-mounting the volume:

$ su -
# mkfs.ocfs2 -b 4K -C 32K -N 4 -L oracrsfiles /dev/sde1

mkfs.ocfs2 1.2.4
Filesystem label=oracrsfiles
Block size=4096 (bits=12)
Cluster size=32768 (bits=15)
Volume size=2145943552 (65489 clusters) (523912 blocks)
3 cluster groups (tail covers 977 clusters, rest cover 32256 clusters)
Journal size=67108864
Initial number of node slots: 4
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 1 block(s)
Formatting Journals: done
Writing lost+found: done
mkfs.ocfs2 successful
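The geometry that mkfs.ocfs2 reports follows directly from the options used: -b 4K sets a 4096-byte block, -C 32K a 32768-byte cluster, and -N 4 creates four node slots, leaving room for two more nodes later. The following small worked check uses bash arithmetic; the function name ocfs2_geometry is hypothetical.

```shell
#!/bin/bash
# Worked check of the geometry reported by mkfs.ocfs2 above.
ocfs2_geometry() {
    local bytes="$1" block="$2" cluster="$3"
    echo "blocks=$(( bytes / block )) clusters=$(( bytes / cluster ))"
}

# ocfs2_geometry 2145943552 4096 32768   -> blocks=523912 clusters=65489
# The three cluster groups add up as well: 2 * 32256 + 977 = 65489 clusters.
```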


Mount the OCFS2 File System

Now that the file system is created, we can mount it. Let's first do it using the command-line, then I'll show how to include it in the /etc/fstab to have it mount on each boot.

  Mounting the file system will need to be performed on both nodes in the Oracle RAC cluster as the root user account using the OCFS2 label oracrsfiles!

First, here is how to manually mount the OCFS2 file system from the command-line. Remember that this needs to be performed as the root user account:

$ su -
# mount -t ocfs2 -o datavolume,nointr -L "oracrsfiles" /u02
If the mount was successful, you will simply get your prompt back. We should, however, run the following checks to ensure the file system is mounted correctly. Let's use the mount command to ensure that the new file system is really mounted. This should be performed on both nodes in the RAC cluster:
# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
cartman:SHARE2 on /cartman type nfs (rw,addr=192.168.1.120)
configfs on /config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sde1 on /u02 type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)

  Please take note of the datavolume option I am using to mount the new file system. Oracle database users must mount any volume that will contain the Voting Disk file, Cluster Registry (OCR), data files, redo logs, archive logs, and control files with the datavolume mount option to ensure that the Oracle processes open the files with the O_DIRECT flag. The nointr option ensures that I/Os are not interrupted by signals.

Any other type of volume, including an Oracle home (which I will not be using for this article), should not be mounted with this mount option.
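Because a missing datavolume option is easy to overlook in the full mount listing, the check can be narrowed to one mount point. This is a minimal sketch, assuming bash and awk; the function name check_ocfs2_mount is hypothetical. It reads /etc/mtab by default, which is what the mount command lists on this release; whether /proc/mounts shows the same OCFS2 options is not assumed here.

```shell
#!/bin/bash
# Sketch: confirm a mount point is an OCFS2 file system mounted with the
# datavolume and nointr options.
check_ocfs2_mount() {
    local mnt="$1" mounts="${2:-/etc/mtab}" opts opt rc=0
    # Fields: device, mount point, type, options, dump, pass.
    opts=$(awk -v m="$mnt" '$2 == m && $3 == "ocfs2" { print $4 }' "$mounts" | tail -n 1)
    if [ -z "$opts" ]; then
        echo "$mnt is not mounted as ocfs2" >&2
        return 1
    fi
    for opt in datavolume nointr; do
        # Surrounding commas make the match exact (no partial option names).
        case ",$opts," in
            *",$opt,"*) echo "OK      $opt" ;;
            *)          echo "MISSING $opt"; rc=1 ;;
        esac
    done
    return $rc
}

# Usage (on both nodes): check_ocfs2_mount /u02
```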

  Why does it take so much time to mount the volume? It takes around 5 seconds for a volume to mount, which gives the heartbeat thread time to stabilize. In a later release, Oracle plans to add support for a global heartbeat, which will make most mounts instant.


Configure OCFS2 to Mount Automatically at Startup

Let's take a look at what we have done so far. We downloaded and installed the Oracle Cluster File System, Release 2 (OCFS2), which will be used to store the files needed by the Cluster Manager. After going through the install, we loaded the OCFS2 module into the kernel and then formatted the clustered file system. Finally, we mounted the newly created file system using the OCFS2 label "oracrsfiles". This section walks through the steps responsible for mounting the new OCFS2 file system by its label each time the machine(s) are booted.

We start by adding the following line to the /etc/fstab file on both nodes in the RAC cluster:

LABEL=oracrsfiles     /u02          ocfs2   _netdev,datavolume,nointr     0 0

  Notice the "_netdev" option for mounting this file system. The _netdev mount option is a must for OCFS2 volumes. This mount option indicates that the volume is to be mounted after the network is started and dismounted before the network is shutdown.

Now, let's make sure that the ocfs2.ko kernel module is being loaded and that the file system will be mounted during the boot process.

If you have been following along with the examples in this article, the actions to load the kernel module and mount the OCFS2 file system should already be enabled. However, we should still check those options by running the following on both nodes in the RAC cluster as the root user account:

$ su -
# chkconfig --list o2cb
o2cb            0:off   1:off   2:on    3:on    4:on    5:on    6:off
The flags for runlevels 2 through 5 should be set to "on".
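Rather than checking the runlevel flags by eye, the chkconfig output can be tested in a script. A minimal sketch, assuming the chkconfig --list output format shown above (fields separated by spaces or tabs); runlevels_2_to_5_on is a hypothetical helper name:

```shell
# runlevels_2_to_5_on: hypothetical helper. Takes one line of
# "chkconfig --list <service>" output and returns 0 only if runlevels
# 2, 3, 4 and 5 are all "on"; reports each runlevel that is not.
runlevels_2_to_5_on() {
    local field level found status=0
    for level in 2 3 4 5; do
        found=no
        for field in $1; do     # unquoted on purpose: split on spaces/tabs
            [ "$field" = "$level:on" ] && found=yes
        done
        if [ "$found" = no ]; then
            echo "runlevel $level is not on" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example: runlevels_2_to_5_on "$(chkconfig --list o2cb)" && echo "o2cb enabled". If a runlevel is off, chkconfig o2cb on enables the service for runlevels 2 through 5. The same helper works for any other service listed by chkconfig.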


Check Permissions on New OCFS2 File System

Use the ls command to check ownership. The permissions should be set to 0775 with owner "oracle" and group "oinstall".

The following tasks only need to be executed on one of the nodes in the RAC cluster. I will be executing all commands in this section from linux1 only.

Let's first check the permissions:

# ls -ld /u02
drwxr-xr-x  3 root root 4096 Sep  3 00:42 /u02
As we can see from the listing above, the oracle user account (and the oinstall group) will not be able to write to this directory. Let's fix that:
# chown oracle:oinstall /u02
# chmod 775 /u02
Let's now go back and re-check that the permissions are correct for both Oracle RAC nodes in the cluster:
# ls -ld /u02
drwxrwxr-x  3 oracle oinstall 4096 Sep  3 00:42 /u02
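The owner, group, and mode can also be checked in a single step with GNU stat (part of coreutils on RHEL 4). A minimal sketch; check_dir_perms is a hypothetical helper name:

```shell
# check_dir_perms: hypothetical helper. Compares a directory's octal mode,
# owner and group (via GNU "stat -c") with the expected values and
# returns non-zero, with a message on stderr, on any mismatch.
check_dir_perms() {
    local dir="$1" want_owner="$2" want_group="$3" want_mode="$4"
    local actual
    actual=$(stat -c '%a %U %G' "$dir") || return 1
    if [ "$actual" != "$want_mode $want_owner $want_group" ]; then
        echo "$dir: have '$actual', want '$want_mode $want_owner $want_group'" >&2
        return 1
    fi
}
```

For example, check_dir_perms /u02 oracle oinstall 775 should succeed after the chown and chmod above, and the same call works for /u02/oradata/orcl once that directory has been created.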


Create Directory for Oracle Clusterware Files

The last mandatory task is to create the appropriate directory on the new OCFS2 file system that will be used for the Oracle Clusterware shared files. We will also modify the permissions of this new directory to allow the "oracle" owner and group "oinstall" read/write access.

The following tasks only need to be executed on one of the nodes in the RAC cluster. I will be executing all commands in this section from linux1 only.

# mkdir -p /u02/oradata/orcl
# chown -R oracle:oinstall /u02/oradata
# chmod -R 775 /u02/oradata
# ls -l /u02/oradata
total 4
drwxrwxr-x 2 oracle oinstall 4096 Sep  3 00:45 orcl


Reboot Both Nodes

Before starting the next section, this would be a good place to reboot both of the nodes in the RAC cluster. When the machines come up, ensure that the cluster stack services are being loaded and the new OCFS2 file system is being mounted:
# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
cartman:SHARE2 on /cartman type nfs (rw,addr=192.168.1.120)
configfs on /config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sde1 on /u02 type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)
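The same verification can be scripted against the mount output after each reboot. A minimal sketch using awk, assuming the "<device> on <dir> type <fstype> (<options>)" format shown above; is_ocfs2_datavolume_mounted is a hypothetical helper name:

```shell
# is_ocfs2_datavolume_mounted: hypothetical helper. Reads "mount" output
# on stdin and returns 0 if the given mount point is an ocfs2 file system
# whose option list contains datavolume.
is_ocfs2_datavolume_mounted() {
    awk -v mnt="$1" '
        $2 == "on" && $3 == mnt && $4 == "type" && $5 == "ocfs2" {
            opts = $6
            gsub(/[()]/, "", opts)          # strip the surrounding parentheses
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "datavolume") found = 1
        }
        END { exit found ? 0 : 1 }
    '
}
```

For example: mount | is_ocfs2_datavolume_mounted /u02 && echo "/u02 mounted as ocfs2 with datavolume".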


If you modified the O2CB heartbeat threshold, you should verify that it is set correctly:

# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
61
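The value in hb_dead_threshold is an iteration count, not a number of seconds. Assuming the OCFS2 1.2 relationship timeout = (threshold - 1) * 2 seconds (one heartbeat iteration every 2 seconds), a threshold of 61 corresponds to a 120-second timeout. The conversion in both directions as a small bash sketch; the function names are hypothetical:

```shell
# Hypothetical helpers for the O2CB heartbeat dead threshold.
# Assumes timeout_seconds = (threshold - 1) * 2, i.e. 2-second iterations.
hb_threshold_to_secs() { echo $(( ($1 - 1) * 2 )); }
hb_secs_to_threshold() { echo $(( $1 / 2 + 1 )); }
```

For example, hb_threshold_to_secs "$(cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold)" converts the value above to seconds, and hb_secs_to_threshold 120 gives the threshold to configure for a 120-second timeout.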


How to Determine OCFS2 Version

To determine which version of OCFS2 is running, use:
# cat /proc/fs/ocfs2/version
OCFS2 1.2.5 Mon Jul 30 13:22:57 PDT 2007 (build 4d201e17b1bc7db76d96570e328927c7)



Install and Configure Automatic Storage Management (ASMLib 2.0)


  Most of the installation and configuration procedures should be performed on both of the Oracle RAC nodes in the cluster! Creating the ASM disks, however, will only need to be performed on a single node within the cluster.


Introduction

In this section, we will configure Automatic Storage Management (ASM) to be used as the file system / volume manager for all Oracle physical database files (data, online redo logs, control files, archived redo logs) and a Flash Recovery Area.

ASM was introduced in Oracle10g Release 1 and is used to relieve the DBA of having to manage individual files and drives. ASM is built into the Oracle kernel and provides the DBA with a way to manage thousands of disk drives 24x7 for both single and clustered instances of Oracle. All of the files and directories to be used for Oracle will be contained in a disk group. ASM automatically performs load balancing in parallel across all available disk drives to prevent hot spots and maximize performance, even with rapidly changing data usage patterns.

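There are two different methods to configure ASM on Linux:

  ASM with ASMLib I/O - ASM performs its I/O through ASMLib calls against the block devices; RAW devices are not required with this method.
  ASM with Standard Linux I/O - ASM uses standard Linux I/O system calls; RAW devices must be created for all disk partitions used by ASM.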

In this article, I will be using the "ASM with ASMLib I/O" method. Oracle states (in Metalink Note 275315.1) that "ASMLib was provided to enable ASM I/O to Linux disks without the limitations of the standard UNIX I/O API". I plan on performing several tests in the future to identify the performance gains in using ASMLib. Those performance metrics and testing details are out of scope of this article and therefore will not be discussed.

We start this section by first downloading the ASMLib drivers (ASMLib Release 2.0) specific to our Linux kernel. We will then install and configure the ASMLib 2.0 drivers while finishing off the section with a demonstration of how to create the ASM disks.

If you would like to learn more about Oracle ASMLib 2.0, visit http://www.oracle.com/technology/tech/linux/asmlib/


Download the ASMLib 2.0 Packages

First, download the latest ASMLib 2.0 libraries and the driver from OTN. At the time of this writing, the latest release of the ASMLib driver was 2.0.3-1. As with the Oracle Cluster File System, we need to download the version that matches the Linux kernel and the number of processors on the machine. We are using kernel 2.6.9-55.EL, and both of my machines are single-processor machines:
# uname -a
Linux linux1 2.6.9-55.EL #1 Wed May 2 13:52:16 EDT 2007 i686 i686 i386 GNU/Linux

  If you do not currently have an account with Oracle OTN, you will need to create one. This is a FREE account!


  Oracle ASMLib Downloads for Red Hat Enterprise Linux 4 AS

  oracleasm-2.6.9-55.EL-2.0.3-1.i686.rpm - (for single processor)
  oracleasm-2.6.9-55.ELsmp-2.0.3-1.i686.rpm - (for multiple processors)
  oracleasm-2.6.9-55.ELhugemem-2.0.3-1.i686.rpm - (for hugemem)
You will also need to download the following ASMLib tools:
  oracleasmlib-2.0.2-1.i386.rpm - (Userspace library)
  oracleasm-support-2.0.3-1.i386.rpm - (Driver support files)
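Because the driver RPM name embeds the output of uname -r and uname -m, the file to download can be derived from the running kernel. A minimal sketch that follows the naming pattern of the driver files listed above; asm_driver_rpm_name is a hypothetical helper, and 2.0.3-1 is simply the driver release used in this article:

```shell
# asm_driver_rpm_name: hypothetical helper. Builds the expected oracleasm
# kernel-driver RPM name using the pattern of the files listed above:
#   oracleasm-<kernel release>-<driver version>.<arch>.rpm
# Arguments are optional; they default to the running kernel and 2.0.3-1.
asm_driver_rpm_name() {
    local krel="${1:-$(uname -r)}"
    local arch="${2:-$(uname -m)}"
    local drv_ver="${3:-2.0.3-1}"
    echo "oracleasm-${krel}-${drv_ver}.${arch}.rpm"
}
```

On linux1, running asm_driver_rpm_name with no arguments should name the single-processor package (oracleasm-2.6.9-55.EL-2.0.3-1.i686.rpm); on an SMP kernel the ELsmp suffix from uname -r selects the multi-processor package. The oracleasmlib and oracleasm-support RPMs carry no kernel version in their names.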


Install ASMLib 2.0 Packages

This installation needs to be performed on both nodes in the RAC cluster as the root user account:
$ su -
# rpm -Uvh oracleasm-2.6.9-55.EL-2.0.3-1.i686.rpm \
       oracleasmlib-2.0.2-1.i386.rpm \
       oracleasm-support-2.0.3-1.i386.rpm
Preparing...                ########################################### [100%]
   1:oracleasm-support      ########################################### [ 33%]
   2:oracleasm-2.6.9-55.EL  ########################################### [ 67%]
   3:oracleasmlib           ########################################### [100%]


Configure and Load the ASMLib 2.0 Packages

Now that we have downloaded and installed the ASMLib 2.0 packages for Linux, we need to configure and load the ASM kernel module. This task needs to be run on both nodes in the RAC cluster as the root user account:
$ su -
# /etc/init.d/oracleasm configure
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting <ENTER> without typing an
answer will keep that current value.  Ctrl-C will abort.

Default user to own the driver interface []: oracle
Default group to own the driver interface []: oinstall
Start Oracle ASM library driver on boot (y/n) [n]: y
Fix permissions of Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: [  OK  ]
Creating /dev/oracleasm mount point: [  OK  ]
Loading module "oracleasm": [  OK  ]
Mounting ASMlib driver filesystem: [  OK  ]
Scanning system for ASM disks: [  OK  ]
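The answers given to oracleasm configure can be re-checked later on each node. This sketch assumes that ASMLib 2.0 stores them as KEY=value lines in /etc/sysconfig/oracleasm (ORACLEASM_ENABLED, ORACLEASM_UID, ORACLEASM_GID, ORACLEASM_SCANBOOT); confirm the file and key names on your own system before relying on it. check_asmlib_config is a hypothetical helper name:

```shell
# check_asmlib_config: hypothetical helper. ASSUMES the ASMLib 2.0
# configuration is stored as KEY=value lines in /etc/sysconfig/oracleasm;
# verify the file and key names on your system first.
# Compares each key with the answers used in this article.
check_asmlib_config() {
    local conf="${1:-/etc/sysconfig/oracleasm}" status=0 kv key want have
    for kv in ORACLEASM_ENABLED=true ORACLEASM_UID=oracle \
              ORACLEASM_GID=oinstall ORACLEASM_SCANBOOT=true; do
        key="${kv%%=*}"
        want="${kv#*=}"
        # Last uncommented assignment of KEY, with any double quotes removed
        have=$(sed -n "s/^${key}=//p" "$conf" | tail -n 1 | tr -d '"')
        if [ "$have" != "$want" ]; then
            echo "$key is '$have', expected '$want'" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, running check_asmlib_config as root on each node should return silently if both nodes were configured with the answers shown above.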


Create ASM Disks for Oracle

  Creating the ASM disks only needs to be done on one node in the RAC cluster as the root user account. I will be running these commands on linux1. On the other Oracle RAC node, you will need to run scandisks to recognize the new volumes. When that is complete, run the oracleasm listdisks command on both Oracle RAC nodes to verify that all ASM disks were created and are available.

In the section "Create Partitions on iSCSI Volumes", we configured (partitioned) four iSCSI volumes to be used by ASM. ASM will be used for storing Oracle database files like online redo logs, database files, control files, archived redo log files, and the flash recovery area.

It is extremely important to note that if you rebooted linux1 after partitioning the iSCSI volumes to be used for ASM (e.g. /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd), the mapping of iSCSI target names discovered from Openfiler to local SCSI device names on linux1 may have changed. If you have rebooted, repeat the procedures documented in the section "Discovering iSCSI Targets" to determine whether any of the four ASM volumes were discovered under a different local SCSI device name on linux1.

For example, I rebooted both Oracle RAC nodes after configuring OCFS2 (previous section). My ASM iSCSI target name mappings for linux1 did change and are now:

iSCSI Target Name to local SCSI Device Name - (ASM)

iSCSI Target Name                      Host / SCSI ID   SCSI Device Name
iqn.2006-01.com.openfiler:rac1.asm4    0                /dev/sda
iqn.2006-01.com.openfiler:rac1.asm3    1                /dev/sdb
iqn.2006-01.com.openfiler:rac1.asm2    2                /dev/sdc
iqn.2006-01.com.openfiler:rac1.asm1    3                /dev/sdd

Use the above iSCSI target names to map which local SCSI device name to use when creating the ASM disks.
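Deriving the createdisk commands from this mapping by hand is where mistakes creep in, so the mapping can be turned into commands mechanically. A minimal sketch, assuming (as in this article) that target ".asmN" becomes ASM disk VOLN on the first partition of its device; asm_createdisk_cmds is a hypothetical helper that only prints the commands for review and runs nothing:

```shell
# asm_createdisk_cmds: hypothetical helper. Reads "target scsi-id device"
# lines on stdin and prints one oracleasm createdisk command per ASM
# target, ordered VOL1..VOLn. ASSUMES target ".asmN" maps to VOLN on
# partition 1 of the device. It prints commands only; nothing is executed.
asm_createdisk_cmds() {
    local target id dev n
    while read -r target id dev; do
        case "$target" in
            *.asm[0-9]*) n="${target##*.asm}" ;;
            *) continue ;;   # skip headers and non-ASM targets
        esac
        echo "$n /etc/init.d/oracleasm createdisk VOL$n ${dev}1"
    done | sort -n | cut -d' ' -f2-
}
```

Feeding it the four target rows from the table above prints the same four createdisk commands used below (VOL1 on /dev/sdd1 through VOL4 on /dev/sda1). Once the output matches your own mapping, run the commands as root on one node only.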

  If you are repeating this article using the same hardware (actually, the same shared logical drives), you may get a failure when attempting to create the ASM disks. If you do receive a failure, try listing all ASM disks that were used by the previous install using:
# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
VOL4
As you can see, the results show that I have four ASM volumes already defined. If you have the four volumes already defined from a previous run, go ahead and remove them using the following commands. After removing the previously created volumes, use the "oracleasm createdisk" commands (below) to create the new volumes.
# /etc/init.d/oracleasm deletedisk VOL1
Removing ASM disk "VOL1" [  OK  ]
# /etc/init.d/oracleasm deletedisk VOL2
Removing ASM disk "VOL2" [  OK  ]
# /etc/init.d/oracleasm deletedisk VOL3
Removing ASM disk "VOL3" [  OK  ]
# /etc/init.d/oracleasm deletedisk VOL4
Removing ASM disk "VOL4" [  OK  ]
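When a previous run left behind a different number of disks, the removal can be done in one loop over the listdisks output. A minimal sketch; delete_all_asm_disks is a hypothetical helper, and ORACLEASM is a variable only so that the loop can be tried against a stand-in command. Like the individual deletedisk commands above, this is destructive:

```shell
# delete_all_asm_disks: hypothetical helper. Deletes every ASM disk label
# reported by "listdisks". DESTRUCTIVE: only run it when you intend to
# re-create the ASM disks. ORACLEASM defaults to the init script used in
# this article and exists only so the loop can be exercised with a stub.
delete_all_asm_disks() {
    local asm="${ORACLEASM:-/etc/init.d/oracleasm}" disk status=0
    for disk in $("$asm" listdisks); do
        "$asm" deletedisk "$disk" || status=1
    done
    return "$status"
}
```

Run it as root on the node where the disks will be re-created, then continue with the createdisk commands below.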

To create the ASM disks using the iSCSI target names to local SCSI device name mappings (above), type the following:

$ su -
# /etc/init.d/oracleasm createdisk VOL1 /dev/sdd1
Marking disk "/dev/sdd1" as an ASM disk [  OK  ]

# /etc/init.d/oracleasm createdisk VOL2 /dev/sdc1
Marking disk "/dev/sdc1" as an ASM disk [  OK  ] 

# /etc/init.d/oracleasm createdisk VOL3 /dev/sdb1
Marking disk "/dev/sdb1" as an ASM disk [  OK  ]

# /etc/init.d/oracleasm createdisk VOL4 /dev/sda1
Marking disk "/dev/sda1" as an ASM disk [  OK  ]


On all other nodes in the RAC cluster, you must perform a scandisk to recognize the new volumes:

# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks [  OK  ]


We can now test that the ASM disks were successfully created by using the following command on both nodes in the RAC cluster as the root user account:

# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
VOL4
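On each node, the listdisks output can also be checked against the expected labels in a script. A minimal sketch; missing_asm_disks is a hypothetical helper that reads listdisks output on stdin, prints any expected label that is absent, and returns non-zero if anything is missing:

```shell
# missing_asm_disks: hypothetical helper. Reads "oracleasm listdisks"
# output on stdin; prints each expected label (given as arguments) that
# does not appear, and returns 1 if any are missing.
missing_asm_disks() {
    local listed want status=0
    listed=" $(tr '\n' ' ') "
    for want in "$@"; do
        case "$listed" in
            *" $want "*) ;;
            *) echo "$want"; status=1 ;;
        esac
    done
    return "$status"
}
```

For example, on both nodes: /etc/init.d/oracleasm listdisks | missing_asm_disks VOL1 VOL2 VOL3 VOL4 && echo "all ASM disks present".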



Download Oracle RAC 10g Software


  The following download procedures only need to be performed on one node in the cluster!


Overview

The next logical step is to install Oracle Clusterware 10g Release 2 (10.2.0.1.0), Oracle Database 10g Release 2 (10.2.0.1.0), and finally the Oracle Database 10g Companion CD Release 2 (10.2.0.1.0) for Linux x86 software. However, we must first download and extract the required Oracle software packages from the Oracle Technology Network (OTN).

  If you do not currently have an account with Oracle OTN, you will need to create one. This is a FREE account!

Oracle offers a development and testing license free of charge. No support, however, is provided and the license does not permit production use. A full description of the license agreement is available on OTN.

In this section, we will be downloading and extracting the required software from Oracle to only one of the Linux nodes in the RAC cluster - namely linux1. This is the machine where I will be performing all of the Oracle installs from. The Oracle installer will copy the required software packages to all other nodes in the RAC configuration using the remote access method we setup in the section "Configure RAC Nodes for Remote Access using SSH".

Login to the node that you will be performing all of the Oracle installations from (linux1) as the "oracle" user account. In this example, I will be downloading the required Oracle software to linux1 and saving them to "/home/oracle/orainstall".


Oracle Clusterware Release 2 (10.2.0.1.0) for Linux x86

First, download the Oracle Clusterware Release 2 for Linux x86.

  Oracle Clusterware Release 2 (10.2.0.1.0)


Oracle Database 10g Release 2 (10.2.0.1.0) for Linux x86

Next, we need to download the Oracle Database 10g Release 2 (10.2.0.1.0) Software for Linux x86. This can be downloaded from the same page used to download the Oracle Clusterware Release 2 software:

  Oracle Database 10g Release 2 (10.2.0.1.0)


Oracle Database 10g Companion CD Release 2 (10.2.0.1.0) for Linux x86

Finally, we should download the Oracle Database 10g Companion CD for Linux x86. This can be downloaded from the same page used to download the Oracle Clusterware Release 2 software:

  Oracle Database 10g Companion CD Release 2 (10.2.0.1.0)


As the "oracle" user account, extract the three packages you downloaded to a temporary directory. In this example, I will use "/home/oracle/orainstall".

Extract the Clusterware package as follows:

# su - oracle
$ mkdir -p /home/oracle/orainstall
$ cd /home/oracle/orainstall
$ unzip 10201_clusterware_linux32.zip

Then extract the Oracle Database 10g Software:

$ cd /home/oracle/orainstall
$ unzip 10201_database_linux32.zip

Finally, extract the Oracle Database 10g Companion CD Software:

$ cd /home/oracle/orainstall
$ unzip 10201_companion_linux32.zip
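A failed or partial download usually shows up only as an unzip error, so it can help to confirm that all three archives are present first. A minimal sketch using the file names above; missing_oracle_zips is a hypothetical helper name:

```shell
# missing_oracle_zips: hypothetical helper. Prints each expected archive
# that is absent or empty in the staging directory (default
# /home/oracle/orainstall) and returns 1 if any are missing.
missing_oracle_zips() {
    local dir="${1:-/home/oracle/orainstall}" f status=0
    for f in 10201_clusterware_linux32.zip 10201_database_linux32.zip \
             10201_companion_linux32.zip; do
        if [ ! -s "$dir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return "$status"
}
```

For example, as oracle: missing_oracle_zips && echo "all archives present". The integrity of an individual archive can be tested before extracting with unzip -tq <file>.zip.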



Pre-Installation Tasks for Oracle10g Release 2


  Perform the following checks on both Oracle RAC nodes in the cluster!


When installing the Linux O/S (CentOS or Red Hat Enterprise Linux 4), you should verify that all required RPMs for Oracle are installed. If you followed the instructions I used for installing Linux, you would have installed Everything, in which case you will have all of the required RPM packages. However, if you performed another installation type (e.g. "Advanced Server"), you may have some packages missing and will need to install them. All of the required RPMs are on the Linux CDs/ISOs.

The next pre-installation step is to run the Cluster Verification Utility (CVU). CVU is a command-line utility provided on the Oracle Clusterware installation media. It is responsible for performing various system checks to assist you with confirming the Oracle RAC nodes are properly configured for Oracle Clusterware and Oracle Real Application Clusters installation. The CVU only needs to be run from the node you will be performing the Oracle installations from (linux1 in this article). Note that the CVU is also run automatically at the end of the Oracle Clusterware installation as part of the Configuration Assistants process.


Check Required RPMs

The following packages must be installed on both nodes in the RAC cluster. Note that the version number for your Linux distribution may vary slightly.

binutils-2.15.92.0.2-21
compat-db-4.1.25-9
compat-gcc-32-3.2.3-47.3
compat-gcc-32-c++-3.2.3-47.3
compat-libstdc++-33-3.2.3-47.3
compat-libgcc-296-2.96-132.7.2
control-center-2.8.0-12.rhel4.5
cpp-3.4.6-3
gcc-3.4.6-3
gcc-c++-3.4.6-3
glibc-2.3.4-2.25
glibc-common-2.3.4-2.25
glibc-devel-2.3.4-2.25
glibc-headers-2.3.4-2.25
glibc-kernheaders-2.4-9.1.98.EL
gnome-libs-1.4.1.2.90-44.1
libaio-0.3.105-2
libstdc++-3.4.6-3
libstdc++-devel-3.4.6-3
make-3.80-6.EL4
openmotif-2.2.3-10.RHEL4.5
openmotif21-2.1.30-11.RHEL4.6 
pdksh-5.2.14-30.3
setarch-1.6-1
sysstat-5.0.5-11.rhel4
xscreensaver-4.18-5.rhel4.11

  Note that the openmotif RPM packages are only required to install Oracle demos. This article does not cover the installation of Oracle demos.

To query package information (gcc and glibc-devel for example), use the "rpm -q <PackageName> [<PackageName> ...]" command as follows:

# rpm -q gcc glibc-devel
gcc-3.4.6-3
glibc-devel-2.3.4-2.25
If you need to install any of the above packages (which you should not have to if you installed Everything), use the "rpm -Uvh <PackageName.rpm>" command. For example, to install the GCC gcc-3.4.6-3 package, use:
# rpm -Uvh gcc-3.4.6-3.i386.rpm
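All of the packages above can be queried in a single rpm -q call. For each name that is absent, rpm -q prints a line of the form "package NAME is not installed". A minimal sketch; the package names are the list above with the version numbers removed, and missing_rpms is a hypothetical helper name:

```shell
# Required package names (versions stripped from the list above).
REQUIRED_RPMS=(binutils compat-db compat-gcc-32 compat-gcc-32-c++
    compat-libstdc++-33 compat-libgcc-296 control-center cpp gcc gcc-c++
    glibc glibc-common glibc-devel glibc-headers glibc-kernheaders
    gnome-libs libaio libstdc++ libstdc++-devel make openmotif openmotif21
    pdksh setarch sysstat xscreensaver)

# missing_rpms: hypothetical helper. Reads "rpm -q" output on stdin and
# prints the name from every "package NAME is not installed" line.
missing_rpms() {
    awk '/^package .* is not installed$/ { print $2 }'
}
```

For example, on both nodes: rpm -q "${REQUIRED_RPMS[@]}" | missing_rpms. Empty output means every package is installed; the versions still need to be compared by eye.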
Prerequisites for Using Cluster Verification Utility

JDK 1.4.2

You must have JDK 1.4.2 installed on your system before you can run CVU. If you do not have JDK 1.4.2 installed on your system, and you attempt to run CVU, you will receive an error message similar to the following:
ERROR. Either CV_JDKHOME environment variable should be set
or /stagepath/cluvfy/jrepack.zip should exist.
If you do not have JDK 1.4.2 installed, then download it from the Sun Web site, and use the Sun instructions to install it. JDK 1.4.2 is available as a download from the following Web site: http://www.sun.com/java.

If you do have JDK 1.4.2 installed, then you must define the user environment variable CV_JDKHOME for the path to the JDK. For example, if JDK 1.4.2 is installed in /usr/local/j2sdk1.4.2_15, then log in as the user that you plan to use to run CVU, and enter the following commands:

CV_JDKHOME=/usr/local/j2sdk1.4.2_15
export CV_JDKHOME
Note that this can be defined in the .bash_profile login script for the oracle user account.
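Before running runcluvfy.sh, the setting can be sanity-checked. A minimal sketch; check_cv_jdkhome is a hypothetical helper, and it only verifies that CV_JDKHOME is set and that bin/java under it is executable (it does not check the JDK version):

```shell
# check_cv_jdkhome: hypothetical helper. Verifies that CV_JDKHOME (or the
# directory given as an argument) is non-empty and contains an
# executable bin/java. It does not check which JDK version is installed.
check_cv_jdkhome() {
    local home="${1:-$CV_JDKHOME}"
    if [ -z "$home" ]; then
        echo "CV_JDKHOME is not set" >&2
        return 1
    fi
    if [ ! -x "$home/bin/java" ]; then
        echo "no executable $home/bin/java" >&2
        return 1
    fi
}
```

For example, as oracle: check_cv_jdkhome && "$CV_JDKHOME/bin/java" -version, which should report a 1.4.2 JDK.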

Install cvuqdisk RPM (RHEL Users Only)

The second prerequisite for running the CVU is for Red Hat Linux users. If you are using Red Hat Linux, then you must download and install the operating system package cvuqdisk to both of the Oracle RAC nodes in the cluster. This means you will need to install the cvuqdisk RPM to both linux1 and linux2. Without cvuqdisk, CVU will be unable to discover shared disks, and you will receive the error message "Package cvuqdisk not installed" when you run CVU.

The cvuqdisk RPM can be found on the Oracle Clusterware installation media in the rpm directory. For the purpose of this article, the Oracle Clusterware media was extracted to the /home/oracle/orainstall/clusterware directory on linux1. Note that before installing the cvuqdisk RPM, we need to set an environment variable named CVUQDISK_GRP to point to the group that will own the cvuqdisk utility. The default group is oinstall, which is the primary group we are using for the oracle UNIX user account in this article. If you are using a different primary group (e.g. dba), you will need to set CVUQDISK_GRP=<YOUR_GROUP> before attempting to install the cvuqdisk RPM.

Locate and copy the cvuqdisk RPM from linux1 to linux2 then perform the following steps as the root user account on both Oracle RAC nodes to install:

# -- IF YOU ARE USING A PRIMARY GROUP OTHER THAN oinstall
# CVUQDISK_GRP=<YOUR_GROUP>; export CVUQDISK_GRP

# cd /home/oracle/orainstall/clusterware/rpm
# rpm -iv cvuqdisk-1.0.1-1.rpm
Preparing packages for installation...
cvuqdisk-1.0.1-1

# ls -l /usr/sbin/cvuqdisk
-rwsr-x---  1 root oinstall 4168 Jun  2  2005 /usr/sbin/cvuqdisk
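The important details in this listing are the setuid bit ("s" in the owner execute position), owner root, and the group chosen with CVUQDISK_GRP. A minimal sketch that checks an ls -l line for these; check_cvuqdisk_ls is a hypothetical helper name:

```shell
# check_cvuqdisk_ls: hypothetical helper. Takes one "ls -l" line for
# cvuqdisk and checks: setuid bit set, owner root, and group equal to the
# second argument (default: $CVUQDISK_GRP, else oinstall).
check_cvuqdisk_ls() {
    local line="$1" want_grp="${2:-${CVUQDISK_GRP:-oinstall}}"
    local mode links owner group
    read -r mode links owner group _ <<< "$line"
    # Character 4 of the mode string is the owner execute/setuid position
    [ "${mode:3:1}" = "s" ] || { echo "setuid bit not set: $mode" >&2; return 1; }
    [ "$owner" = "root" ] || { echo "owner is $owner, not root" >&2; return 1; }
    [ "$group" = "$want_grp" ] || { echo "group is $group, not $want_grp" >&2; return 1; }
}
```

For example, on both nodes: check_cvuqdisk_ls "$(ls -l /usr/sbin/cvuqdisk)" && echo "cvuqdisk OK".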

Verify Remote Access / User Equivalence

The CVU should be run from linux1 — the node we will be performing all of the Oracle installations from. Before running CVU, login as the oracle user account and verify remote access / user equivalence is configured to all nodes in the cluster. When using the secure shell method, user equivalence will need to be enabled for the terminal shell session before attempting to run the CVU. To enable user equivalence for the current terminal shell session, perform the following steps remembering to enter the pass phrase for each key that you generated when prompted:
# su - oracle
$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)
Checking Pre-Installation Tasks for CRS with CVU
Once all prerequisites for using CVU have been met, we can start by checking that all pre-installation tasks for Oracle Clusterware (CRS) are completed by executing the following command as the "oracle" UNIX user account (with user equivalence enabled) from linux1:
$ cd /home/oracle/orainstall/clusterware/cluvfy
$ ./runcluvfy.sh stage -pre crsinst -n linux1,linux2 -verbose
Review the CVU report. Note that there are several errors you may ignore in this report.
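Because several failures in this report are expected, it helps to save the report and pull out only the failed checks, then compare them against the ignorable failures described in this section. A minimal sketch; cvu_failed_results is a hypothetical helper that simply greps for the phrase "check failed", which appears in the failure lines quoted in this section:

```shell
# cvu_failed_results: hypothetical helper. Prints every line containing
# "check failed" from a saved CVU report (file argument, or stdin if none).
cvu_failed_results() {
    grep 'check failed' "${1:--}"
}
```

For example: ./runcluvfy.sh stage -pre crsinst -n linux1,linux2 -verbose > /tmp/cvu_pre_crsinst.txt 2>&1, then cvu_failed_results /tmp/cvu_pre_crsinst.txt. Any failure not explained in this section deserves a closer look in the full report.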

The first error concerns finding a suitable set of interfaces for VIPs, and it can be safely ignored. This is a bug documented in Metalink Note 338924.1:

Suitable interfaces for the private interconnect on subnet "192.168.2.0":
linux2 eth1:192.168.2.101
linux1 eth1:192.168.2.100

ERROR:
Could not find a suitable set of interfaces for VIPs.

Result: Node connectivity check failed.

As documented in the note, this error can be safely ignored.
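The public network in this article (192.168.1.0) is an RFC 1918 private range. The usual explanation for this 10.2.0.1 error is that the tools do not treat such non-routable addresses as public, so the check fails even though the configuration is valid for a test cluster. A minimal sketch for telling whether an address falls in one of those ranges; is_rfc1918 is a hypothetical helper name:

```shell
# is_rfc1918: hypothetical helper. Returns 0 if an IPv4 address is in one
# of the RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
is_rfc1918() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    case "$a" in
        10)  return 0 ;;
        172) [ "$b" -ge 16 ] && [ "$b" -le 31 ] ;;
        192) [ "$b" -eq 168 ] ;;
        *)   return 1 ;;
    esac
}
```

For example, is_rfc1918 192.168.1.200 succeeds for the linux1-vip address used in this article, which is why the same kind of complaint can reappear when vipca runs during the Oracle Clusterware installation.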

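The last set of errors that can be ignored deals with specific RPM package versions that do not exist in RHEL4 Update 5. The CVU checks for particular versions of several compat-* packages, and those exact versions are not shipped with RHEL4 Update 5.

While these specific packages are listed as missing in the CVU report, please ensure that the correct versions of the compat-* packages are installed on both of the Oracle RAC nodes in the cluster. For RHEL4 Update 5, these are the compat-* packages from the "Check Required RPMs" list:

compat-db-4.1.25-9
compat-gcc-32-3.2.3-47.3
compat-gcc-32-c++-3.2.3-47.3
compat-libstdc++-33-3.2.3-47.3
compat-libgcc-296-2.96-132.7.2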

Checking the Hardware and Operating System Setup with CVU
The next CVU check to run will verify the hardware and operating system setup. Again, run the following as the "oracle" UNIX user account from linux1:
$ cd /home/oracle/orainstall/clusterware/cluvfy
$ ./runcluvfy.sh stage -post hwos -n linux1,linux2 -verbose
Review the CVU report. As with the previous check (pre-installation tasks for CRS), the check for finding a suitable set of interfaces for VIPs will fail and can be safely ignored.

Also note that the check for shared storage accessibility will fail.

Checking shared storage accessibility...

WARNING:
Unable to determine the sharedness of /dev/sde on nodes:
        linux2,linux2,linux2,linux2,linux2,linux1,linux1,linux1,linux1,linux1


Shared storage check failed on nodes "linux2,linux1".
This too can be safely ignored. While we know the disks are visible and shared from both of our Oracle RAC nodes in the cluster, the check itself fails. Several reasons for this have been documented. The first, from Metalink, is that cluvfy currently does not work with devices other than SCSI devices. This would include devices like EMC PowerPath and volume groups like those from Openfiler. At the time of this writing, no workaround exists other than to use manual methods for detecting shared devices. Another reason for this error was documented by Bane Radulovic at Oracle Corporation. His research shows that CVU calls smartctl on Linux, and the problem is that smartctl does not return the serial number from our iSCSI devices. For example, a check against /dev/sde shows:
# /usr/sbin/smartctl -i /dev/sde
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: Openfile Virtual disk     Version: 0
Serial number:
Device type: disk
Local Time is: Mon Sep  3 02:02:53 2007 EDT
Device supports SMART and is Disabled
Temperature Warning Disabled or Not Supported
At the time of this writing, it is unknown if the Openfiler developers have plans to fix this.
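To see whether a given device returns a serial number (and therefore whether this explanation applies to it), the smartctl output can be checked directly. A minimal sketch; smartctl_serial is a hypothetical helper that prints the serial number if smartctl -i reports one and returns non-zero when the field is missing or empty:

```shell
# smartctl_serial: hypothetical helper. Reads "smartctl -i" output on
# stdin; prints the serial number if present, returns 1 if the
# "Serial number:" field is missing or empty.
smartctl_serial() {
    awk '
        /^Serial [Nn]umber:/ {
            s = substr($0, index($0, ":") + 1)
            sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s)
            if (s != "") { print s; ok = 1 }
        }
        END { exit ok ? 0 : 1 }
    '
}
```

For example: /usr/sbin/smartctl -i /dev/sde | smartctl_serial || echo "no serial number reported for /dev/sde". With the Openfiler volumes shown above, this reports no serial number.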



Install Oracle Clusterware 10g Software


  Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (linux1)! The Oracle Clusterware software will be installed to both Oracle RAC nodes in the cluster by the Oracle Universal Installer.


Overview

We are ready to install the cluster part of the environment - the Oracle Clusterware. In a previous section, we downloaded and extracted the install files for Oracle Clusterware to linux1 in the directory /home/oracle/orainstall/clusterware. This is the only node we need to perform the install from. During the installation of Oracle Clusterware, you will be asked for the nodes to configure in the RAC cluster. During the installation phase, the OUI will copy the required software to all nodes using the remote access we configured in the section "Configure RAC Nodes for Remote Access using SSH".

So, what exactly is the Oracle Clusterware responsible for? It contains all of the cluster and database configuration metadata along with several system management features for RAC. It allows the DBA to register and invite an Oracle instance (or instances) to the cluster. During normal operation, Oracle Clusterware will send messages (via a special ping operation) to all nodes configured in the cluster - often called the heartbeat. If the heartbeat fails for any of the nodes, it checks with the Oracle Clusterware configuration files (on the shared disk) to distinguish between a real node failure and a network failure.

After installing Oracle Clusterware, the Oracle Universal Installer (OUI) used to install the Oracle Database 10g software (next section) will automatically recognize these nodes. Like the Oracle Clusterware install we will be performing in this section, the Oracle Database 10g software installation only needs to be run from one node. The OUI will copy the software packages to all nodes configured in the RAC cluster.


Oracle Clusterware Shared Files

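The two shared files (actually file groups) used by Oracle Clusterware will be stored on the Oracle Cluster File System, Release 2 (OCFS2) we created earlier. The two shared Oracle Clusterware file groups are:

  Oracle Cluster Registry (OCR)
     File 1: /u02/oradata/orcl/OCRFile
     File 2: /u02/oradata/orcl/OCRFile_mirror
  CRS Voting Disk
     File 1: /u02/oradata/orcl/CSSFile
     File 2: /u02/oradata/orcl/CSSFile_mirror1
     File 3: /u02/oradata/orcl/CSSFile_mirror2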

  It is not possible to use Automatic Storage Management (ASM) for the two shared Oracle Clusterware files: Oracle Cluster Registry (OCR) or the CRS Voting Disk files. The problem is that these files need to be in place and accessible BEFORE any Oracle instances can be started. For ASM to be available, the ASM instance would need to be run first.

Also note that the two shared files could be stored on OCFS2, on shared RAW devices, or on another vendor's clustered file system.
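Mirroring the voting disk follows a majority rule: each node must be able to access more than half of the configured voting disks. That is why at least three voting disks are needed to survive the loss of one. The arithmetic as a small bash sketch; the function names are hypothetical:

```shell
# Hypothetical helpers for the voting-disk majority rule with N disks:
# a node needs floor(N/2)+1 accessible disks, so the configuration
# tolerates floor((N-1)/2) failed disks.
voting_disks_required()          { echo $(( $1 / 2 + 1 )); }
voting_disk_failures_tolerated() { echo $(( ($1 - 1) / 2 )); }
```

With the three voting disks configured in this article, a node needs two of them and can lose one. Two voting disks would still need both, which is why two copies add no protection.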


Verifying Terminal Shell Environment

Before starting the Oracle Universal Installer, you should first verify you are logged onto the server you will be running the installer from (i.e. linux1) then run the xhost command as root from the console to allow X Server connections. Next, login as the oracle user account. If you are using a remote client to connect to the node performing the installation (SSH / Telnet to linux1 from a workstation configured with an X Server), you will need to set the DISPLAY variable to point to your local workstation. Finally, verify remote access / user equivalence to all nodes in the cluster:

Verify Server and Enable X Server Access

# hostname
linux1

# xhost +
access control disabled, clients can connect from any host

Login as the oracle User Account and Set DISPLAY (if necessary)

# su - oracle

$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY

Verify Remote Access / User Equivalence

Verify you are able to run the Secure Shell commands (ssh or scp) on the Linux server you will be running the Oracle Universal Installer from against all other Linux servers in the cluster without being prompted for a password.

When using the secure shell method, user equivalence will need to be enabled on any new terminal shell session before attempting to run the OUI. To enable user equivalence for the current terminal shell session, perform the following steps remembering to enter the pass phrase for the RSA key you generated when prompted:

$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)

$ ssh linux1 "date;hostname"
Mon Sep  3 02:10:59 EST 2007
linux1

$ ssh linux2 "date;hostname"
Mon Sep  3 02:11:38 EST 2007
linux2
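The same check can be run against every node in one loop. With -o BatchMode=yes, ssh fails instead of prompting, so a missing key or unloaded agent shows up as an error rather than a hung installer. A minimal sketch; check_user_equivalence is a hypothetical helper, SSH_CMD exists only so the loop can be tried with a stand-in, and it compares the remote hostname output exactly, as in the short names shown above:

```shell
# check_user_equivalence: hypothetical helper. For each node given as an
# argument, runs "hostname" over ssh in BatchMode (no password prompts)
# and checks that the reply matches the node name exactly.
# SSH_CMD defaults to ssh; it is a variable only for testing with a stub.
check_user_equivalence() {
    local ssh="${SSH_CMD:-ssh}" node out status=0
    for node in "$@"; do
        if out=$("$ssh" -o BatchMode=yes "$node" hostname 2>/dev/null) \
           && [ "$out" = "$node" ]; then
            echo "$node: OK"
        else
            echo "$node: FAILED" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, as oracle in the same shell session where ssh-add was run: check_user_equivalence linux1 linux2. If your nodes report fully qualified names from hostname, pass those names instead.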


Installing Oracle Clusterware

The following tasks are used to install the Oracle Clusterware:
$ cd ~oracle
$ /home/oracle/orainstall/clusterware/runInstaller

Screen Name Response
Welcome Screen Click Next
Specify Inventory directory and credentials Accept the default values:
   Inventory directory: /u01/app/oracle/oraInventory
   Operating System group name: oinstall
Specify Home Details Set the Name and Path for the ORACLE_HOME (actually the $ORA_CRS_HOME that I will be using in this article) as follows:
   Name: OraCrs10g_home
   Path: /u01/app/crs
Product-Specific Prerequisite Checks The installer will run through a series of checks to determine if the node meets the minimum requirements for installing and configuring the Oracle Clusterware software. If any of the checks fail, you will need to manually verify the check that failed by clicking on the checkbox. For my installation, all checks passed with no problems.

Click Next to continue.

Specify Cluster Configuration Cluster Name: crs

   Public Node Name   Private Node Name   Virtual Node Name
   linux1             linux1-priv         linux1-vip
   linux2             linux2-priv         linux2-vip

Specify Network Interface Usage

   Interface Name   Subnet        Interface Type
   eth0             192.168.1.0   Public
   eth1             192.168.2.0   Private
Specify Oracle Cluster Registry (OCR) Location Starting with Oracle Database 10g Release 2 (10.2) with RAC, Oracle Clusterware provides for the creation of a mirrored Oracle Cluster Registry (OCR) file, enhancing cluster reliability. For the purpose of this example, I chose to mirror the OCR file by keeping the default option of "Normal Redundancy":

Specify OCR Location: /u02/oradata/orcl/OCRFile
Specify OCR Mirror Location: /u02/oradata/orcl/OCRFile_mirror

Specify Voting Disk Location Starting with Oracle Database 10g Release 2 (10.2) with RAC, CSS has been modified to allow you to configure CSS with multiple voting disks. In 10g Release 1 (10.1), you could configure only one voting disk. By enabling multiple voting disk configuration, the redundant voting disks allow you to configure a RAC database with multiple voting disks on independent shared physical disks. This option facilitates the use of the iSCSI network protocol and other Network Attached Storage (NAS) solutions. Note that to take advantage of the benefits of multiple voting disks, you must configure at least three voting disks. For the purpose of this example, I chose to mirror the voting disk by keeping the default option of "Normal Redundancy":

Voting Disk Location: /u02/oradata/orcl/CSSFile
Additional Voting Disk 1 Location: /u02/oradata/orcl/CSSFile_mirror1
Additional Voting Disk 2 Location: /u02/oradata/orcl/CSSFile_mirror2

Summary Click Install to start the installation!
Execute Configuration Scripts After the installation has completed, you will be prompted to run the orainstRoot.sh and root.sh scripts. Open a new console window on each node in the RAC cluster (starting with the node you are performing the install from) as the "root" user account.

Navigate to the /u01/app/oracle/oraInventory directory and run orainstRoot.sh ON BOTH NODES in the RAC cluster.

Note: After executing the orainstRoot.sh on both nodes, verify the permissions of the file "/etc/oraInst.loc" are 644 (-rw-r--r--) and owned by root. Problems can occur during the installation of Oracle if the oracle user account does not have read permissions to this file - (the location of the oraInventory directory cannot be determined). For example, during the Oracle Clusterware post-installation process (while running the Oracle Clusterware Verification Utility), the following error will occur: "CRS is not installed on any of the nodes." If the permissions to /etc/oraInst.loc are not set correctly, it is possible you didn't run orainstRoot.sh on both nodes before running root.sh. Also, the umask setting may be off - it should be 0022. Run the following on both nodes in the RAC cluster to correct this problem:

# chmod 644 /etc/oraInst.loc
# ls -l /etc/oraInst.loc
-rw-r--r-- 1 root root 63 Sep 3 11:06 /etc/oraInst.loc


Within the same console window on each node in the RAC cluster (starting with the node you are performing the install from), stay logged in as the "root" user account.

Navigate to the /u01/app/crs directory and run the root.sh script ON BOTH NODES in the RAC cluster ONE AT A TIME, starting with the node you are performing the install from.

If the Oracle Clusterware home directory is a subdirectory of the ORACLE_BASE directory (which it should never be!), you will receive several warnings regarding permissions while running the root.sh script on both nodes. These warnings can be safely ignored.

The root.sh script may take a while to run. When running root.sh on the last node, you will receive a critical error, and the output should look like this:

...
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
    linux1
    linux2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), "eth0" is not public. Public interfaces should be used to configure virtual IPs.

This issue is specific to Oracle 10.2.0.1 (noted in Metalink article 338924.1) and needs to be resolved before continuing. The easiest workaround is to re-run vipca (GUI) manually as root from the last node on which the error occurred. Keep in mind that vipca is a GUI, so you will need to set your DISPLAY variable to point to your X server:

# $ORA_CRS_HOME/bin/vipca

When the "VIP Configuration Assistant" appears, this is how I answered the screen prompts:

   Welcome: Click Next
   Network interfaces: Select only the public interface - eth0
   Virtual IPs for cluster nodes:
       Node Name: linux1
       IP Alias Name: linux1-vip
       IP Address: 192.168.1.200
       Subnet Mask: 255.255.255.0

       Node Name: linux2
       IP Alias Name: linux2-vip
       IP Address: 192.168.1.201
       Subnet Mask: 255.255.255.0

   Summary: Click Finish
   Configuration Assistant Progress Dialog: Click OK after configuration is complete.
   Configuration Results: Click Exit

Go back to the OUI and acknowledge the "Execute Configuration scripts" dialog window.

End of installation At the end of the installation, exit from the OUI.
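
The /etc/oraInst.loc permission check and the umask check described under Execute Configuration Scripts can be scripted so they are easy to repeat on both nodes. The following is a minimal sketch: check_file_perms and check_umask are hypothetical helper names, and the code assumes GNU coreutils stat (its -c format option), which RHEL 4 provides.

```shell
#!/bin/bash
# check_file_perms FILE MODE OWNER
# Hypothetical helper: succeeds only if FILE exists, has the octal
# permission MODE (for example 644) and is owned by OWNER (for example root).
# Relies on GNU stat: -c '%a' prints the octal mode, -c '%U' the owner name.
check_file_perms() {
    local file=$1 want_mode=$2 want_owner=$3 mode owner
    if [ ! -e "$file" ]; then
        echo "MISSING: $file"
        return 1
    fi
    mode=$(stat -c '%a' "$file")
    owner=$(stat -c '%U' "$file")
    if [ "$mode" != "$want_mode" ] || [ "$owner" != "$want_owner" ]; then
        echo "BAD: $file is $mode owned by $owner (want $want_mode owned by $want_owner)"
        return 1
    fi
    echo "OK: $file is $mode owned by $owner"
}

# check_umask: hypothetical helper; the article recommends a umask of 0022.
check_umask() {
    if [ "$(umask)" = "0022" ]; then
        echo "OK: umask is 0022"
    else
        echo "BAD: umask is $(umask), expected 0022"
        return 1
    fi
}

# Example, as root on each node:
#   check_file_perms /etc/oraInst.loc 644 root
```

If check_file_perms reports BAD, the chmod command shown earlier in this section is the fix.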


Verify Oracle Clusterware Installation

After the installation of Oracle Clusterware, we can run through several tests to verify the install was successful. Run the following commands on both nodes in the RAC cluster.

Check Cluster Nodes

$ $ORA_CRS_HOME/bin/olsnodes -n
linux1  1
linux2  2

Confirm Oracle Clusterware Function

$ $ORA_CRS_HOME/bin/crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora.linux1.gsd application    0/5    0/0    ONLINE    ONLINE    linux1
ora.linux1.ons application    0/3    0/0    ONLINE    ONLINE    linux1
ora.linux1.vip application    0/0    0/0    ONLINE    ONLINE    linux1
ora.linux2.gsd application    0/5    0/0    ONLINE    ONLINE    linux2
ora.linux2.ons application    0/3    0/0    ONLINE    ONLINE    linux2
ora.linux2.vip application    0/0    0/0    ONLINE    ONLINE    linux2
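
Rather than eyeballing every row, the crs_stat output can be checked mechanically. A minimal sketch follows: check_crs_online is a hypothetical helper, and it assumes the tabular layout shown above (a header, a dashed separator line, then one row per resource), treating a row as healthy only when it contains ONLINE for both Target and State.

```shell
#!/bin/bash
# check_crs_online: read `crs_stat -t` or `crs_stat -t -v` output on stdin.
# Rows after the dashed separator are resources; a row is healthy when it
# contains ONLINE twice (Target and State). Prints any unhealthy rows and a
# summary; exits 1 if a row is unhealthy or no resource rows were found.
check_crs_online() {
    awk '
        /^-+[ \t]*$/ { body = 1; next }
        body && NF > 0 {
            total++
            n = 0
            for (i = 2; i <= NF; i++)
                if ($i == "ONLINE") n++
            if (n < 2) { print "NOT ONLINE: " $0; bad++ }
        }
        END {
            printf "%d resource(s) checked, %d not ONLINE\n", total, bad
            exit (bad > 0 || total == 0)
        }'
}

# Example:
#   $ORA_CRS_HOME/bin/crs_stat -t -v | check_crs_online
```

Counting ONLINE tokens instead of reading fixed columns lets the same check work with or without -v, and with rows whose Host column is empty because the resource is offline.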

Check CRS Status

$ $ORA_CRS_HOME/bin/crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

Check Oracle Clusterware Auto-Start Scripts

$ ls -l /etc/init.d/init.*
-r-xr-xr-x  1 root root  1951 Sep  3 11:28 /etc/init.d/init.crs
-r-xr-xr-x  1 root root  4714 Sep  3 11:28 /etc/init.d/init.crsd
-r-xr-xr-x  1 root root 35394 Sep  3 11:28 /etc/init.d/init.cssd
-r-xr-xr-x  1 root root  3190 Sep  3 11:28 /etc/init.d/init.evmd
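
Looking back at the '"eth0" is not public' error that root.sh raised on the last node: that error is commonly attributed to the public interface using an address in one of the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The VIP addresses used in this article (192.168.1.200 and 192.168.1.201) are on the public network in 192.168.0.0/16. The sketch below uses a hypothetical helper, is_private_ipv4 (not part of Oracle's tooling), to classify an address so you can anticipate the error before running root.sh on another cluster. It assumes a well-formed dotted-quad IPv4 address.

```shell
#!/bin/bash
# is_private_ipv4 ADDR
# Hypothetical helper: returns 0 if ADDR is in an RFC 1918 private range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), 1 otherwise.
# Assumes ADDR is a well-formed dotted quad such as 192.168.1.200.
is_private_ipv4() {
    local IFS=.
    set -- $1            # split the address into its four octets
    local a=$1 b=$2
    if [ "$a" -eq 10 ]; then
        return 0
    elif [ "$a" -eq 172 ] && [ "$b" -ge 16 ] && [ "$b" -le 31 ]; then
        return 0
    elif [ "$a" -eq 192 ] && [ "$b" -eq 168 ]; then
        return 0
    fi
    return 1
}

# Classify the VIP addresses configured with vipca in this article.
for ip in 192.168.1.200 192.168.1.201; do
    if is_private_ipv4 "$ip"; then
        echo "$ip is in an RFC 1918 private range"
    else
        echo "$ip is not in a private range"
    fi
done
```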



Install Oracle Database 10g Software


  Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (linux1)! The Oracle Database software will be installed to both Oracle RAC nodes in the cluster by the Oracle Universal Installer.


Overview

After successfully installing the Oracle Clusterware software, the next step is to install the Oracle Database 10g Release 2 Software (10.2.0.1.0) with Real Application Clusters (RAC).

  For the purpose of this example, we will forgo the "Create Database" option when installing the Oracle Database 10g Release 2 software. We will, instead, create the database using the Database Configuration Assistant (DBCA) after the Oracle Database 10g Software install.

Like the Oracle Clusterware install (previous section), the Oracle Database 10g software installer only needs to be run from one node. The OUI will copy the software packages to all nodes configured in the RAC cluster.


Verifying Terminal Shell Environment

As discussed in the previous section (Install Oracle Clusterware 10g Software), the terminal shell environment needs to be configured for remote access and user equivalence to all nodes in the cluster before running the Oracle Universal Installer. Note that you can reuse the terminal shell session from the previous section; in that case, you do not have to perform any of the actions described below for setting up remote access and the DISPLAY variable:

Login as the oracle User Account and Set DISPLAY (if necessary)

# su - oracle

$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY

Verify Remote Access / User Equivalence

Verify you are able to run the Secure Shell commands (ssh or scp) on the Linux server you will be running the Oracle Universal Installer from against all other Linux servers in the cluster without being prompted for a password.

When using the secure shell method, user equivalence will need to be enabled on any new terminal shell session before attempting to run the OUI. To enable user equivalence for the current terminal shell session, perform the following steps, entering the passphrase for the RSA key you generated when prompted:

$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)

$ ssh linux1 "date;hostname"
Mon Sep  3 02:10:59 EST 2007
linux1

$ ssh linux2 "date;hostname"
Mon Sep  3 02:11:38 EST 2007
linux2
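
Since the installer needs to reach every node without being prompted, it can help to test all nodes non-interactively. A minimal sketch follows. It relies on ssh's BatchMode=yes option, which makes ssh fail instead of asking for a password or passphrase, so a node that still needs one is reported as FAIL. The function name verify_user_equivalence and the SSH override variable (used only so the function can be exercised without real nodes) are hypothetical.

```shell
#!/bin/bash
# verify_user_equivalence NODE...
# Hypothetical helper: runs "hostname" on each NODE with ssh in BatchMode,
# which makes ssh exit with an error instead of prompting for a password
# or passphrase. Set SSH to another command to override ssh (e.g. for testing).
verify_user_equivalence() {
    local node out failed=0
    for node in "$@"; do
        if out=$(${SSH:-ssh} -o BatchMode=yes "$node" hostname 2>/dev/null); then
            echo "OK   $node -> $out"
        else
            echo "FAIL $node (password/passphrase required or node unreachable)"
            failed=1
        fi
    done
    return "$failed"
}

# Example, in the same shell where ssh-add was run:
#   verify_user_equivalence linux1 linux2
```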


Run the Oracle Cluster Verification Utility

Before installing the Oracle Database Software, we should run the following database pre-installation check using the Cluster Verification Utility (CVU).

  Instructions for configuring CVU can be found in the section "Prerequisites for Using Cluster Verification Utility" discussed earlier in this article.

$ cd /home/oracle/orainstall/clusterware/cluvfy
$ ./runcluvfy.sh stage -pre dbinst -n linux1,linux2 -r 10gR2 -verbose

Review the CVU report. Note that this report will contain the same errors we received when checking pre-installation tasks for CRS: failure to find a suitable set of interfaces for VIPs, and failure to find specific RPM packages that do not exist in RHEL4 Update 5. These two errors can be safely ignored.
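
When the -verbose CVU report is long, it can help to save it and pull out only the failing checks so you can confirm that nothing beyond the two expected errors shows up. A minimal sketch, assuming failing checks contain the word "failed" (case-insensitive) somewhere on the line; cvu_failures and the /tmp/cvu_dbinst.log file name are hypothetical:

```shell
#!/bin/bash
# cvu_failures: read a saved cluvfy report on stdin and print every line
# mentioning a failure, prefixed with its line number, followed by a count.
# Assumes failing checks contain the word "failed" (case-insensitive).
# Exits 1 if any such line was found.
cvu_failures() {
    awk '
        tolower($0) ~ /failed/ { printf "%d: %s\n", NR, $0; n++ }
        END {
            printf "%d line(s) mention a failure\n", n
            exit (n > 0)
        }'
}

# Example:
#   ./runcluvfy.sh stage -pre dbinst -n linux1,linux2 -r 10gR2 -verbose | tee /tmp/cvu_dbinst.log
#   cvu_failures < /tmp/cvu_dbinst.log
```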


Install Oracle Database 10g Release 2 Software

Install the Oracle Database 10g Release 2 software as follows:
$ cd ~oracle
$ /home/oracle/orainstall/database/runInstaller

Screen Name Response
Welcome Screen Click Next
Select Installation Type I selected the Enterprise Edition option. If you need other components like Oracle Label Security or if you want to simply customize the environment, select Custom.
Specify Home Details Set the Name and Path for the ORACLE_HOME as follows:
   Name: OraDb10g_home1
   Location: /u01/app/oracle/product/10.2.0/db_1
Specify Hardware Cluster Installation Mode Select the Cluster Installation option then select all nodes available. Click Select All to select all servers: linux1 and linux2.

  If the installation stops here and the status of any of the RAC nodes is "Node not reachable", perform the following checks:

  • Ensure the Oracle Clusterware is running on the node in question.
  • Ensure you are able to reach the node in question from the node you are performing the installation from.
Product-Specific Prerequisite Checks The installer will run through a series of checks to determine if the node meets the minimum requirements for installing and configuring the Oracle database software. If any of the checks fail, you will need to manually verify the check that failed by clicking on the checkbox.

It is possible to receive an error about the available swap space not meeting its minimum requirements:

Checking available swap space requirements...
Expected result: 3036MB
Actual Result: 1983MB

In most cases, the available swap space (as shown above) is sufficient for a test configuration like this one, and this warning can be safely ignored. Simply click the check-box for "Checking available swap space requirements..." and click Next to continue.

Select Database Configuration Select the option to Install database Software only.

Remember that we will create the clustered database as a separate step using dbca.

Summary Click Install to start the installation!
Root Script Window - Run root.sh After the installation has completed, you will be prompted to run the root.sh script. It is important to keep in mind that the root.sh script will need to be run ON BOTH NODES in the RAC cluster ONE AT A TIME starting with the node you are running the database installation from.

First, open a new console window on the node you are installing the Oracle Database 10g software from as the root user account. For me, this was "linux1".

Navigate to the /u01/app/oracle/product/10.2.0/db_1 directory and run root.sh. When it completes, repeat the same step on the other node (linux2).

After running the root.sh script on both nodes in the cluster, go back to the OUI and acknowledge the "Execute Configuration scripts" dialog window.

End of installation At the end of the installation, exit from the OUI.
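
The figures in the swap space warning from the Product-Specific Prerequisite Checks follow from the amount of installed RAM. Assuming the sizing rule published in the Oracle Database 10g Release 2 installation guide for Linux (1.5 times RAM for up to 2048 MB of RAM, equal to RAM for 2049 to 8192 MB, and 0.75 times RAM above that), the expected 3036 MB corresponds to 1.5 times about 2024 MB of RAM. The sketch below reads MemTotal and SwapTotal from /proc/meminfo (both reported in kB) and compares them against that rule. The function names are hypothetical, RAM below 1024 MB is treated like the first band here (an assumption), and the OUI's own calculation may round differently.

```shell
#!/bin/bash
# required_swap_mb RAM_MB
# Swap requirement in MB, assuming the 10g R2 Linux sizing rule:
#   RAM up to 2048 MB  -> 1.5 x RAM
#   2049 - 8192 MB     -> 1 x RAM
#   above 8192 MB      -> 0.75 x RAM
required_swap_mb() {
    local ram=$1
    if [ "$ram" -le 2048 ]; then
        echo $(( ram * 3 / 2 ))
    elif [ "$ram" -le 8192 ]; then
        echo "$ram"
    else
        echo $(( ram * 3 / 4 ))
    fi
}

# check_swap [MEMINFO]: compare SwapTotal with the requirement for MemTotal.
# MEMINFO defaults to /proc/meminfo; values there are in kB.
check_swap() {
    local meminfo=${1:-/proc/meminfo} ram_mb swap_mb need_mb
    ram_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' "$meminfo")
    swap_mb=$(awk '/^SwapTotal:/ { print int($2 / 1024) }' "$meminfo")
    need_mb=$(required_swap_mb "$ram_mb")
    echo "RAM ${ram_mb}MB, swap ${swap_mb}MB, expected ${need_mb}MB"
    [ "$swap_mb" -ge "$need_mb" ]
}
```

For example, a meminfo file with MemTotal of 2072576 kB (2024 MB) and SwapTotal of 2030592 kB (1983 MB) makes check_swap report an expected value of 3036 MB and return a non-zero status, the same figures as the warning shown in the table above.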



Install Oracle Database 10g Companion CD Software


  Perform the following installation procedures from only one of the Oracle RAC nodes in the cluster (linux1)! The Oracle Database 10g Companion CD software will be installed to both Oracle RAC nodes in the cluster by the Oracle Universal Installer.


Overview

After successfully installing the Oracle Database software, the next step is to install the Oracle Database 10g Companion CD Release 2 software (10.2.0.1.0).

Please keep in mind that this is an optional step. For the purpose of this article, my testing database will often make use of the Java Virtual Machine (Java VM) and Oracle interMedia and therefore will require the installation of the Oracle Database 10g Companion CD. The type of installation to perform will be the Oracle Database 10g Products installation type.

This installation type includes the Natively Compiled Java Libraries (NCOMP) files to improve Java performance. If you do not install the NCOMP files, the "ORA-29558:JAccelerator (NCOMP) not installed" error occurs when a database that uses Java VM is upgraded to the patch release.

Like the Oracle Clusterware and Database installs (previous sections), the Oracle Database 10g Companion CD software installer only needs to be run from one node. The OUI will copy the software packages to all nodes configured in the RAC cluster.


Verifying Terminal Shell Environment

As discussed in the previous section (Install Oracle Database 10g Software), the terminal shell environment needs to be configured for remote access and user equivalence to all nodes in the cluster before running the Oracle Universal Installer. Note that you can reuse the terminal shell session from the previous section; in that case, you do not have to perform any of the actions described below for setting up remote access and the DISPLAY variable:

Login as the oracle User Account and Set DISPLAY (if necessary)

# su - oracle

$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY

Verify Remote Access / User Equivalence

Verify you are able to run the Secure Shell commands (ssh or scp) on the Linux server you will be running the Oracle Universal Installer from against all other Linux servers in the cluster without being prompted for a password.

When using the secure shell method, user equivalence will need to be enabled on any new terminal shell session before attempting to run the OUI. To enable user equivalence for the current terminal shell session, perform the following steps, entering the passphrase for the RSA key you generated when prompted:

$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)

$ ssh linux1 "date;hostname"
Mon Sep  3 02:10:59 EST 2007
linux1

$ ssh linux2 "date;hostname"
Mon Sep  3 02:11:38 EST 2007
linux2


Install Oracle Database 10g Companion CD Software

Install the Oracle Database 10g Companion CD Software as follows:
$ cd ~oracle
$ /home/oracle/orainstall/companion/runInstaller

Screen Name Response
Welcome Screen Click Next
Select a Product to Install Select the Oracle Database 10g Products 10.2.0.1.0 option.
Specify Home Details Set the destination for the ORACLE_HOME Name and Path to that of the previous Oracle Database 10g software install as follows:
   Name: OraDb10g_home1
   Path: /u01/app/oracle/product/10.2.0/db_1
Specify Hardware Cluster Installation Mode The Cluster Installation option will be selected along with all of the available nodes in the cluster by default. Stay with these default options and click Next to continue.

  If the installation stops here and the status of any of the RAC nodes is "Node not reachable", perform the following checks:

  • Ensure the Oracle Clusterware is running on the node in question.
  • Ensure you are able to reach the node in question from the node you are performing the installation from.
Product-Specific Prerequisite Checks The installer will run through a series of checks to determine if the node meets the minimum requirements for installing and configuring the Oracle Database 10g Companion CD Software. If any of the checks fail, you will need to manually verify the check that failed by clicking on the checkbox. For my installation, all checks passed with no problems.

Click Next to continue.

Summary On the Summary screen, click Install to start the installation!
End of installation At the end of the installation, exit from the OUI.



Create TNS Listener Process


  Perform the following configuration procedures from only one of the Oracle RAC nodes in the cluster (linux1)! The Network Configuration Assistant (NETCA) will set up the TNS listener in a clustered configuration on both Oracle RAC nodes in the cluster.


Overview

The Database Configuration Assistant (DBCA) requires the Oracle TNS Listener process to be configured and running on all nodes in the RAC cluster before it can create the clustered database.

The process of creating the TNS listener only needs to be performed from one of the nodes in the RAC cluster. All changes will be made and replicated to both Oracle RAC nodes in the cluster. On one of the nodes (I will be using linux1), bring up the Network Configuration Assistant (NETCA) and run through the process of creating a new TNS listener process and configuring the node for local access.


Verifying Terminal Shell Environment

As discussed in the previous section (Install Oracle Database 10g Companion CD Software), the terminal shell environment needs to be configured for remote access and user equivalence to all nodes in the cluster before running the Network Configuration Assistant (NETCA). Note that you can reuse the terminal shell session from the previous section; in that case, you do not have to perform any of the actions described below for setting up remote access and the DISPLAY variable:

Login as the oracle User Account and Set DISPLAY (if necessary)

# su - oracle

$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY

Verify Remote Access / User Equivalence

Verify you are able to run the Secure Shell commands (ssh or scp) on the Linux server you will be running the Network Configuration Assistant (NETCA) from against all other Linux servers in the cluster without being prompted for a password.

When using the secure shell method, user equivalence will need to be enabled on any new terminal shell session before attempting to run the NETCA. To enable user equivalence for the current terminal shell session, perform the following steps, entering the passphrase for the RSA key you generated when prompted:

$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)

$ ssh linux1 "date;hostname"
Mon Sep  3 02:10:59 EST 2007
linux1

$ ssh linux2 "date;hostname"
Mon Sep  3 02:11:38 EST 2007
linux2


Run the Network Configuration Assistant

To start the NETCA, run the following:
$ netca &

The following table walks you through the process of creating a new Oracle listener for our RAC environment.

Screen Name Response
Select the Type of Oracle Net Services Configuration Select Cluster configuration
Select the nodes to configure Select all of the nodes: linux1 and linux2.
Type of Configuration Select Listener configuration.
Listener Configuration - Next 6 Screens The following screens are now like any other normal listener configuration. You can simply accept the default parameters for the next six screens:
   What do you want to do: Add
   Listener name: LISTENER
   Selected protocols: TCP
   Port number: 1521
   Configure another listener: No
   Listener configuration complete! [ Next ]
You will be returned to this Welcome (Type of Configuration) Screen.
Type of Configuration Select Naming Methods configuration.
Naming Methods Configuration The following screens are:
   Selected Naming Methods: Local Naming
   Naming Methods configuration complete! [ Next ]
You will be returned to this Welcome (Type of Configuration) Screen.
Type of Configuration Click Finish to exit the NETCA.


Verify TNS Listener Configuration

The Oracle TNS listener process should now be running on both nodes in the RAC cluster:
$ hostname
linux1

$ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
LISTENER_LINUX1

$ $ORA_CRS_HOME/bin/crs_stat ora.linux1.LISTENER_LINUX1.lsnr
NAME=ora.linux1.LISTENER_LINUX1.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on linux1

=====================

$ hostname
linux2

$ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
LISTENER_LINUX2

$ $ORA_CRS_HOME/bin/crs_stat ora.linux2.LISTENER_LINUX2.lsnr
NAME=ora.linux2.LISTENER_LINUX2.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on linux2
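
The two crs_stat listings above can also be checked programmatically. A minimal sketch follows: resource_online_on is a hypothetical helper that reads the NAME=/TARGET=/STATE= lines printed by crs_stat for a single resource and succeeds only when the resource is ONLINE on the expected host. The loop in the example comment assumes the ora.<node>.LISTENER_<NODE>.lsnr naming shown above.

```shell
#!/bin/bash
# resource_online_on HOST: read `crs_stat RESOURCE` output (NAME=, TYPE=,
# TARGET=, STATE= lines) on stdin and succeed only if TARGET is ONLINE and
# STATE is exactly "ONLINE on HOST".
resource_online_on() {
    awk -F= -v host="$1" '
        $1 == "NAME"   { name = $2 }
        $1 == "TARGET" { target = $2 }
        $1 == "STATE"  { state = $2 }
        END {
            ok = (target == "ONLINE" && state == "ONLINE on " host)
            printf "%s: target=%s state=%s -> %s\n", name, target, state, (ok ? "OK" : "NOT OK")
            exit !ok
        }'
}

# Example, for the listeners configured by NETCA above:
#   for node in linux1 linux2; do
#       NODE=$(echo "$node" | tr 'a-z' 'A-Z')
#       $ORA_CRS_HOME/bin/crs_stat "ora.${node}.LISTENER_${NODE}.lsnr" | resource_online_on "$node"
#   done
```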



Create the Oracle Cluster Database


  The database creation process should only be performed from one of the Oracle RAC nodes in the cluster (linux1)!


Overview

We will be using the Oracle Database Configuration Assistant (DBCA) to create the clustered database.

Before executing the Database Configuration Assistant, make sure that $ORACLE_HOME and $PATH are set appropriately for the $ORACLE_BASE/product/10.2.0/db_1 environment.

You should also verify that all services we have installed up to this point (Oracle TNS listener, Oracle Clusterware processes, etc.) are running before attempting to start the clustered database creation process.
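
The $ORACLE_HOME and $PATH requirement above can be confirmed from the shell before starting dbca. A minimal sketch; check_oracle_env is a hypothetical helper that compares ORACLE_HOME with the expected database home and looks for $ORACLE_HOME/bin as a whole entry in PATH.

```shell
#!/bin/bash
# check_oracle_env EXPECTED_HOME
# Hypothetical helper: confirms ORACLE_HOME equals EXPECTED_HOME and that
# EXPECTED_HOME/bin is one of the colon-separated entries in PATH.
check_oracle_env() {
    local want=$1 rc=0
    if [ "$ORACLE_HOME" != "$want" ]; then
        echo "ORACLE_HOME is '${ORACLE_HOME:-<unset>}', expected '$want'"
        rc=1
    fi
    case ":$PATH:" in
        *":$want/bin:"*) ;;
        *) echo "PATH does not contain $want/bin"; rc=1 ;;
    esac
    if [ "$rc" -eq 0 ]; then
        echo "OK: ORACLE_HOME=$ORACLE_HOME and its bin directory is on PATH"
    fi
    return "$rc"
}

# Example, as the oracle user:
#   check_oracle_env "$ORACLE_BASE/product/10.2.0/db_1"
```

Wrapping PATH in colons before matching avoids false positives from a directory that merely starts with the same prefix.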


Verifying Terminal Shell Environment

As discussed in the previous section (Create TNS Listener Process), the terminal shell environment needs to be configured for remote access and user equivalence to all nodes in the cluster before running the Database Configuration Assistant (DBCA). Note that you can reuse the terminal shell session from the previous section; in that case, you do not have to perform any of the actions described below for setting up remote access and the DISPLAY variable:

Login as the oracle User Account and Set DISPLAY (if necessary)

# su - oracle

$ # IF YOU ARE USING A REMOTE CLIENT TO CONNECT TO THE
$ # NODE PERFORMING THE INSTALL
$ DISPLAY=<your local workstation>:0.0
$ export DISPLAY

Verify Remote Access / User Equivalence

Verify you are able to run the Secure Shell commands (ssh or scp) on the Linux server you will be running the DBCA from against all other Linux servers in the cluster without being prompted for a password.

When using the secure shell method, user equivalence will need to be enabled on any new terminal shell session before attempting to run the DBCA. To enable user equivalence for the current terminal shell session, perform the following steps, entering the passphrase for the RSA key you generated when prompted:

$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add
Enter passphrase for /home/oracle/.ssh/id_rsa: xxxxx
Identity added: /home/oracle/.ssh/id_rsa (/home/oracle/.ssh/id_rsa)

$ ssh linux1 "date;hostname"
Mon Sep  3 02:10:59 EST 2007
linux1

$ ssh linux2 "date;hostname"
Mon Sep  3 02:11:38 EST 2007
linux2


Run the Oracle Cluster Verification Utility

Before creating the Oracle clustered database, we should run the following database configuration check using the Cluster Verification Utility (CVU).

  Instructions for configuring CVU can be found in the section "Prerequisites for Using Cluster Verification Utility" discussed earlier in this article.

$ cd /home/oracle/orainstall/clusterware/cluvfy
$ ./runcluvfy.sh stage -pre dbcfg -n linux1,linux2 -d ${ORACLE_HOME} -verbose

Review the CVU report. Note that this report will contain the same error we received when checking pre-installation tasks for CRS: failure to find a suitable set of interfaces for VIPs. This error can be safely ignored.
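
DBCA can only offer the ASM volumes that ASMLib can see on the node. As an optional pre-check before starting dbca, the ASMLib init script's listdisks command (/etc/init.d/oracleasm listdisks, run as root) prints one volume label per line. The sketch below uses a hypothetical helper, missing_asm_volumes, to compare that list against the four volumes this article expects (VOL1 through VOL4).

```shell
#!/bin/bash
# missing_asm_volumes VOL...: read volume labels (one per line, as printed
# by "/etc/init.d/oracleasm listdisks") on stdin and report any expected
# VOL that is absent. Returns 0 only when every expected volume is present.
missing_asm_volumes() {
    local listed vol rc=0
    listed=$(cat)
    for vol in "$@"; do
        # -x: the whole line must match, so VOL1 does not match VOL10
        if ! printf '%s\n' "$listed" | grep -qx "$vol"; then
            echo "missing: $vol"
            rc=1
        fi
    done
    if [ "$rc" -eq 0 ]; then
        echo "all $# volume(s) present"
    fi
    return "$rc"
}

# Example, as root on each node:
#   /etc/init.d/oracleasm listdisks | missing_asm_volumes VOL1 VOL2 VOL3 VOL4
```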


Create the Clustered Database

To start the database creation process, run the following:

$ dbca &
Screen Name Response
Welcome Screen Select Oracle Real Application Clusters database.
Operations Select Create a Database.
Node Selection Click the Select All button to select all servers: linux1 and linux2.
Database Templates Select Custom Database
Database Identification Select:
   Global Database Name: orcl.idevelopment.info
   SID Prefix: orcl

  I used idevelopment.info for the database domain. You may use any domain. Keep in mind that this domain does not have to be a valid DNS domain.

Management Option Leave the default options here, which are to Configure the Database with Enterprise Manager / Use Database Control for Database Management.
Database Credentials I chose to Use the Same Password for All Accounts. Enter the password (twice) and make sure the password does not start with a digit.
Storage Options For this article, we will select to use Automatic Storage Management (ASM).
Create ASM Instance Supply the SYS password to use for the new ASM instance.

Also, starting with Oracle10g Release 2, the ASM instance server parameter file (SPFILE) needs to be on a shared disk. You will need to modify the default entry for "Create server parameter file (SPFILE)" to reside on the OCFS2 partition as follows: /u02/oradata/orcl/dbs/spfile+ASM.ora. All other options can stay at their defaults.

You will then be prompted with a dialog box asking if you want to create and start the ASM instance. Select the OK button to acknowledge this dialog.

The DBCA will now create and start the ASM instance on all nodes in the RAC cluster.

ASM Disk Groups To start, click the Create New button. This will bring up the "Create Disk Group" window with the four volumes we configured earlier using ASMLib.

If the volumes we created earlier in this article do not show up in the "Select Member Disks" window: (ORCL:VOL1, ORCL:VOL2, ORCL:VOL3, and ORCL:VOL4) then click on the "Change Disk Discovery Path" button and input "ORCL:VOL*".

For the first "Disk Group Name" I used the string ORCL_DATA1. Select the first two ASM volumes (ORCL:VOL1 and ORCL:VOL2) in the "Select Member Disks" window. Keep the "Redundancy" setting to Normal.

After verifying all values in this window are correct, click the OK button. This will present the "ASM Disk Group Creation" dialog. When the ASM Disk Group Creation process is finished, you will be returned to the "ASM Disk Groups" window.

Click the Create New button again. For the second "Disk Group Name", I used the string FLASH_RECOVERY_AREA. Select the last two ASM volumes (ORCL:VOL3 and ORCL:VOL4) in the "Select Member Disks" window. Keep the "Redundancy" setting to Normal.

After verifying all values in this window are correct, click the OK button. This will present the "ASM Disk Group Creation" dialog.

When the ASM Disk Group Creation process is finished, you will be returned to the "ASM Disk Groups" window with two disk groups created and selected. Select only one of the disk groups by using the checkbox next to the newly created Disk Group Name ORCL_DATA1 (ensure that the disk group for FLASH_RECOVERY_AREA is not selected) and click Next to continue.

Database File Locations I selected the default, which is Use Oracle-Managed Files:
   Database Area: +ORCL_DATA1
Recovery Configuration Check the option for Specify Flash Recovery Area.

For the Flash Recovery Area, click the [Browse] button and select the disk group name +FLASH_RECOVERY_AREA.

My disk group has a size of about 118 GB. When defining the Flash Recovery Area size, use the entire volume minus 10% (118 GB - 11.8 GB, or roughly 106 GB). I used a Flash Recovery Area Size of 106 GB (108544 MB).

Database Content I left all of the Database Components (and destination tablespaces) set to their default value, although it is perfectly OK to select the Sample Schemas. This option is available since we installed the Oracle Companion CD software.
Database Services For this test configuration, click Add, and enter the Service Name: orcl_taf. Leave both instances set to Preferred and for the "TAF Policy" select Basic.
Initialization Parameters Change any parameters for your environment. I left them all at their default settings.
Database Storage Change any parameters for your environment. I left them all at their default settings.
Creation Options Keep the default option Create Database selected and click Finish to start the database creation process.

Click OK on the "Summary" screen.

End of Database Creation At the end of the database creation, exit from the DBCA.

Note: When exiting the DBCA, you will not receive any feedback from the dialog window for around 30-60 seconds. After a while, another dialog will come up indicating that it is starting all Oracle instances and HA service "orcl_taf". This may take several minutes to complete. When finished, all windows and dialog boxes will disappear.

When the Oracle Database Configuration Assistant has completed, you will have a fully functional Oracle RAC cluster running!

$ $ORA_CRS_HOME/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    linux1
ora....X1.lsnr application    ONLINE    ONLINE    linux1
ora.linux1.gsd application    ONLINE    ONLINE    linux1
ora.linux1.ons application    ONLINE    ONLINE    linux1
ora.linux1.vip application    ONLINE    ONLINE    linux1
ora....SM2.asm application    ONLINE    ONLINE    linux2
ora....X2.lsnr application    ONLINE    ONLINE    linux2
ora.linux2.gsd application    ONLINE    ONLINE    linux2
ora.linux2.ons application    ONLINE    ONLINE    linux2
ora.linux2.vip application    ONLINE    ONLINE    linux2
ora.orcl.db    application    ONLINE    ONLINE    linux1
ora....l1.inst application    ONLINE    ONLINE    linux1
ora....l2.inst application    ONLINE    ONLINE    linux2
ora...._taf.cs application    ONLINE    ONLINE    linux2
ora....cl1.srv application    ONLINE    ONLINE    linux1
ora....cl2.srv application    ONLINE    ONLINE    linux2
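
The Flash Recovery Area size entered on the Recovery Configuration screen is simple arithmetic: the disk group size minus a 10% reserve, converted from GB to MB at 1024 MB per GB. As a sketch (fra_size_mb is a hypothetical helper; it truncates to whole gigabytes, the same way 106.2 GB was rounded down to 106 GB in this article):

```shell
#!/bin/bash
# fra_size_mb DISKGROUP_GB [RESERVE_PCT]
# Hypothetical helper: Flash Recovery Area size in MB, computed as the whole
# disk group minus a reserve (default 10%), truncated to whole GB, times 1024.
fra_size_mb() {
    local total_gb=$1 reserve=${2:-10}
    local usable_gb=$(( total_gb * (100 - reserve) / 100 ))
    echo $(( usable_gb * 1024 ))
}

fra_size_mb 118   # prints 108544 (118 GB - 10% = 106.2 GB -> 106 GB -> 108544 MB)
```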


Create the orcl_taf Service

During the creation of the Oracle clustered database, we added a service named "orcl_taf" that will be used to connect to the database with TAF enabled. During several of my installs, the service was added to the tnsnames.ora, but was never updated as a service for each Oracle instance.

Use the following to verify the orcl_taf service was successfully added:

SQL> show parameter service

NAME                 TYPE        VALUE
-------------------- ----------- --------------------------------
service_names        string      orcl.idevelopment.info, orcl_taf

If the only service defined was for orcl.idevelopment.info, then you will need to manually add the service to both instances:

SQL> show parameter service

NAME                 TYPE        VALUE
-------------------- ----------- --------------------------
service_names        string      orcl.idevelopment.info

SQL> alter system set service_names = 
  2  'orcl.idevelopment.info, orcl_taf.idevelopment.info' scope=both;



Post-Installation Tasks - (Optional)

This chapter describes several optional tasks that can be applied to your new Oracle 10g environment in order to enhance availability as well as database management.


Re-compile Invalid Objects

Run the utlrp.sql script to recompile all invalid PL/SQL packages now instead of when the packages are accessed for the first time. This step is optional but recommended.
$ sqlplus / as sysdba
SQL> @?/rdbms/admin/utlrp.sql


Enabling Archive Logs in a RAC Environment

Whether a single instance or clustered database, Oracle tracks and logs all changes to database blocks in online redolog files. In an Oracle RAC environment, each instance will have its own set of online redolog files known as a thread. Each Oracle instance will use its group of online redologs in a circular manner. Once an online redolog fills, Oracle moves to the next one. If the database is in "Archive Log Mode", Oracle will make a copy of the online redo log before it gets reused. A thread must contain at least two online redologs (or online redolog groups). The same holds true for a single instance configuration. The single instance must contain at least two online redologs (or online redolog groups).

The size of an online redolog file is completely independent of another instance's redolog size. Although in most configurations the size is the same, it may be different depending on the workload and backup / recovery considerations for each node. It is also worth mentioning that each instance has exclusive write access to its own online redolog files. In a correctly configured RAC environment, however, each instance can read another instance's current online redolog file to perform instance recovery if that instance was terminated abnormally. It is therefore a requirement that online redo logs be located on a shared storage device (just like the database files).

As already mentioned, Oracle writes to its online redolog files in a circular manner. When the current online redolog fills, Oracle will switch to the next one. To facilitate media recovery, Oracle allows the DBA to put the database into "Archive Log Mode" which makes a copy of the online redolog after it fills (and before it gets reused). This is a process known as archiving.

The Database Configuration Assistant (DBCA) allows users to configure a new database to be in archive log mode; however, most DBAs opt to bypass this option during initial database creation. In cases like this, where the database is in no archive log mode, it is a simple task to put the database into archive log mode. Note, however, that this will require a short database outage. From one of the nodes in the Oracle RAC configuration, use the following tasks to put a RAC-enabled database into archive log mode. For the purpose of this article, I will use the node linux1, which runs the orcl1 instance:

  1. Log in to one of the nodes (e.g., linux1) and disable the cluster instance parameter by setting cluster_database to FALSE from the current instance:
    $ sqlplus / as sysdba
    SQL> alter system set cluster_database=false scope=spfile sid='orcl1';

  2. Shut down all instances accessing the clustered database:
    $ srvctl stop database -d orcl

  3. Using the local instance, MOUNT the database:
    $ sqlplus / as sysdba
    SQL> startup mount

  4. Enable archiving:
    SQL> alter database archivelog;

  5. Re-enable support for clustering by modifying the instance parameter cluster_database to TRUE from the current instance:
    SQL> alter system set cluster_database=true scope=spfile sid='orcl1';

  6. Shut down the local instance:
    SQL> shutdown immediate

  7. Bring all instances back up using srvctl:
    $ srvctl start database -d orcl

  8. (Optional) Bring any services (e.g., TAF) back up using srvctl:
    $ srvctl start service -d orcl

  9. Log in to the local instance and verify that Archive Log Mode is enabled:
    $ sqlplus / as sysdba
    SQL> archive log list
    Database log mode              Archive Mode
    Automatic archival             Enabled
    Archive destination            USE_DB_RECOVERY_FILE_DEST
    Oldest online log sequence     83
    Next log sequence to archive   84
    Current log sequence           84

After enabling Archive Log Mode, each instance in the RAC configuration can automatically archive its redo logs!
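
The verification in step 9 can also be scripted. The following is a minimal bash sketch, assuming the output of the SQL*Plus "archive log list" command is piped into it in the layout shown above; the function name is_archivelog_enabled is hypothetical and not part of any Oracle tool.

```shell
# Hypothetical helper: read the output of the SQL*Plus "archive log list"
# command on stdin and succeed (exit status 0) only when the database is in
# Archive Mode with automatic archival enabled.
is_archivelog_enabled() {
  awk '
    { sub(/[[:space:]]+$/, "") }                    # drop trailing blanks
    /^Database log mode/ {
      sub(/^Database log mode[[:space:]]+/, "")     # keep only the value column
      mode = $0
    }
    /^Automatic archival/ {
      sub(/^Automatic archival[[:space:]]+/, "")
      auto = $0
    }
    END { exit !(mode == "Archive Mode" && auto == "Enabled") }
  '
}
```

For example, echo 'archive log list' | sqlplus -s / as sysdba | is_archivelog_enabled && echo enabled should print "enabled" on a node once the steps above have been completed.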


Download and Install Custom Oracle Database Scripts

DBAs rely on Oracle's data dictionary views and dynamic performance views to support and better manage their databases. Although these views provide a simple mechanism to query critical information about the database, it helps to have a collection of accurate and readily available SQL scripts to query them.

In this section you will download and install a collection of Oracle DBA scripts that can be used to manage many aspects of your database including space management, performance, backups, security, and session management. The Oracle DBA scripts archive can be downloaded using the following link http://www.idevelopment.info/data/Oracle/DBA_scripts/dba_scripts_archive_Oracle.zip. As the oracle user account, download the dba_scripts_archive_Oracle.zip archive to the $ORACLE_BASE directory of each node in the cluster. For the purpose of this example, the dba_scripts_archive_Oracle.zip archive will be copied to /u01/app/oracle. Next, unzip the archive file to the $ORACLE_BASE directory.

For example, perform the following on both nodes in the Oracle RAC cluster as the oracle user account:

$ mv dba_scripts_archive_Oracle.zip /u01/app/oracle
$ cd /u01/app/oracle
$ unzip dba_scripts_archive_Oracle.zip

The final step is to verify (or set) the appropriate environment variable for the current UNIX shell so that the Oracle SQL scripts can be run from SQL*Plus in any directory. For UNIX, verify that the following environment variable is set and included in your login shell script:
ORACLE_PATH=$ORACLE_BASE/dba_scripts/sql:.:$ORACLE_HOME/rdbms/admin
export ORACLE_PATH

  Note that the ORACLE_PATH environment variable should already be set in the .bash_profile login script that was created in the section Create Login Script for oracle User Account.
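
If you prefer to script this check, the following bash sketch appends the two lines above to a login script only when no ORACLE_PATH assignment is already present. The function name ensure_oracle_path is hypothetical, and the default target of $HOME/.bash_profile is an assumption based on the login script used in this article.

```shell
# Hypothetical helper: idempotently add the ORACLE_PATH setting to a login script.
ensure_oracle_path() {
  local profile="${1:-$HOME/.bash_profile}"

  # Already configured (e.g. by the login script created earlier)? Nothing to do.
  if grep -Eq '^[[:space:]]*(export[[:space:]]+)?ORACLE_PATH=' "$profile" 2>/dev/null; then
    return 0
  fi

  # Quoted heredoc delimiter: $ORACLE_BASE and $ORACLE_HOME are written
  # literally, so they are expanded at login time rather than now.
  cat >> "$profile" <<'EOF'
ORACLE_PATH=$ORACLE_BASE/dba_scripts/sql:.:$ORACLE_HOME/rdbms/admin
export ORACLE_PATH
EOF
}
```

Running it a second time leaves the file unchanged, so it is safe to include in a node setup script.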

Now that the Oracle DBA scripts have been unzipped and the UNIX environment variable ($ORACLE_PATH) has been set to the appropriate directory, you should be able to run any of the SQL scripts in the $ORACLE_BASE/dba_scripts/sql directory while logged into SQL*Plus. For example, to query tablespace information while logged into the Oracle database as a DBA user:

SQL> @dba_tablespaces

Status    Tablespace Name TS Type      Ext. Mgt.  Seg. Mgt.    Tablespace Size    Used (in bytes) Pct. Used
--------- --------------- ------------ ---------- --------- ------------------ ------------------ ---------
ONLINE    UNDOTBS1        UNDO         LOCAL      MANUAL         1,283,457,024          9,043,968         1
ONLINE    SYSAUX          PERMANENT    LOCAL      AUTO             524,288,000        378,732,544        72
ONLINE    USERS           PERMANENT    LOCAL      AUTO           2,147,483,648        321,257,472        15
ONLINE    SYSTEM          PERMANENT    LOCAL      MANUAL           838,860,800        505,544,704        60
ONLINE    INDX            PERMANENT    LOCAL      AUTO           1,073,741,824             65,536         0
ONLINE    UNDOTBS2        UNDO         LOCAL      MANUAL         1,283,457,024         22,282,240         2
ONLINE    TEMP            TEMPORARY    LOCAL      MANUAL         1,073,741,824         92,274,688         9
                                                            ------------------ ------------------ ---------
avg                                                                                                      23
sum                                                              8,225,030,144      1,329,201,152

7 rows selected.

To obtain a list of all available Oracle DBA scripts while logged into SQL*Plus, run the help.sql script:
SQL> @help.sql

========================================
Automatic Shared Memory Management
========================================
asmm_components.sql

========================================
Automatic Storage Management
========================================
asm_alias.sql
asm_clients.sql
asm_diskgroups.sql
asm_disks.sql
asm_disks_perf.sql
asm_drop_files.sql
asm_files.sql
asm_files2.sql
asm_templates.sql

< --- SNIP --- >

perf_top_sql_by_buffer_gets.sql
perf_top_sql_by_disk_reads.sql

========================================
Workspace Manager
========================================
wm_create_workspace.sql
wm_disable_versioning.sql
wm_enable_versioning.sql
wm_freeze_workspace.sql
wm_get_workspace.sql
wm_goto_workspace.sql
wm_merge_workspace.sql
wm_refresh_workspace.sql
wm_remove_workspace.sql
wm_unfreeze_workspace.sql
wm_workspaces.sql


Create Shared Oracle Password Files

In this section, I present the steps required to configure a shared Oracle password file between all instances in the Oracle clustered database. The password file for the database in UNIX is located at $ORACLE_HOME/dbs/orapw<ORACLE_SID> for each instance and contains a list of all database users that have SYSDBA privileges. When a database user is granted the SYSDBA privilege, the instance records this in the password file of the instance you are logged into. But what about the other instances in the cluster? The password files on the other instances are not updated and will not contain the user who was just granted SYSDBA. Therefore, a program (like RMAN) that tries to log in as this new user with SYSDBA privileges will fail if it connects to an instance whose password file does not contain that user.

To resolve this problem, a common solution is to place a single database password file on a shared / clustered file system and then create symbolic links from each of the instances to this single version of the database password file. Since the environment described in this article makes use of the Oracle Cluster File System (OCFS2), we will use it to store the single version of the database password file.

  In this section, we will also be including the Oracle password file for the ASM instance.

  1. Create the database password directory on the clustered file system mounted on /u02/oradata/orcl. Perform the following from only one node in the cluster as the oracle user account - (linux1):
    $ mkdir -p /u02/oradata/orcl/dbs
  2. From one node in the cluster (linux1), move the database password files to the database password directory on the clustered file system. Choose a node that contains a database password file with the most recent SYSDBA additions. In most cases, this will not matter, since any missing entries can easily be added by granting the SYSDBA privilege again (plus, this is a fresh install, so it is unlikely that you have created any SYSDBA users at this point!). Note that the database server does not need to be shut down while performing the following actions. From linux1 as the oracle user account:

    $ mv $ORACLE_HOME/dbs/orapw+ASM1 /u02/oradata/orcl/dbs/orapw+ASM
    $ mv $ORACLE_HOME/dbs/orapworcl1 /u02/oradata/orcl/dbs/orapworcl
    
    $ ln -s /u02/oradata/orcl/dbs/orapw+ASM $ORACLE_HOME/dbs/orapw+ASM1
    $ ln -s /u02/oradata/orcl/dbs/orapworcl $ORACLE_HOME/dbs/orapworcl1
  3. From the second node in the cluster (linux2):
    $ rm $ORACLE_HOME/dbs/orapw+ASM2
    $ rm $ORACLE_HOME/dbs/orapworcl2
    
    $ ln -s /u02/oradata/orcl/dbs/orapw+ASM $ORACLE_HOME/dbs/orapw+ASM2
    $ ln -s /u02/oradata/orcl/dbs/orapworcl $ORACLE_HOME/dbs/orapworcl2

Now, when a user is granted the SYSDBA privilege, all instances will have access to the same password file:

SQL> GRANT sysdba TO scott;
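
Steps 1 through 3 can be wrapped in a small bash function that is safe to rerun. This is a minimal sketch, assuming it is run as the oracle user on each node in turn (linux1 first) and that the shared directory lives on the OCFS2 mount; the function name share_pwfile is hypothetical.

```shell
# Hypothetical helper: replace a node-local password file with a symbolic link
# to a single copy on the shared OCFS2 file system.
#   $1 = local password file,  e.g. $ORACLE_HOME/dbs/orapworcl1
#   $2 = shared password file, e.g. /u02/oradata/orcl/dbs/orapworcl
share_pwfile() {
  local local_file="$1" shared_file="$2"

  # Already converted on this node: leave it alone.
  if [ -L "$local_file" ]; then
    return 0
  fi

  mkdir -p "$(dirname "$shared_file")"

  if [ ! -e "$shared_file" ]; then
    # First node (linux1): the local file becomes the shared copy.
    mv "$local_file" "$shared_file" || return 1
  else
    # Remaining nodes (linux2): discard the local copy.
    rm -f "$local_file"
  fi

  ln -s "$shared_file" "$local_file"
}
```

For example, run share_pwfile $ORACLE_HOME/dbs/orapworcl1 /u02/oradata/orcl/dbs/orapworcl on linux1, then the orapworcl2 equivalent on linux2; the +ASM files are handled the same way.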



Verify TNS Networking Files


  Ensure that the TNS networking files are configured on both Oracle RAC nodes in the cluster!


listener.ora

We already covered how to create a TNS listener configuration file (listener.ora) for a clustered environment in the section Create TNS Listener Process. The listener.ora file should be properly configured and no modifications should be needed.

For clarity, I included a copy of the listener.ora file from my node linux1:

listener.ora
# listener.ora.linux1 Network Configuration File:
# /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora.linux1
# Generated by Oracle configuration tools.

LISTENER_LINUX1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521)(IP = FIRST))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.100)(PORT = 1521)(IP = FIRST))
    )
  )

SID_LIST_LISTENER_LINUX1 =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
      (PROGRAM = extproc)
    )
  )


tnsnames.ora

Here is a copy of my tnsnames.ora file that was configured by Oracle and can be used for testing Transparent Application Failover (TAF). This file should already be configured on both Oracle RAC nodes in the cluster.

You can include any of these entries on other client machines that need access to the clustered database.

tnsnames.ora
# tnsnames.ora Network Configuration File:
# /u01/app/oracle/product/10.2.0/db_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

LISTENERS_ORCL =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
  )

ORCL2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl.idevelopment.info)
      (INSTANCE_NAME = orcl2)
    )
  )

ORCL1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl.idevelopment.info)
      (INSTANCE_NAME = orcl1)
    )
  )

ORCL_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl_taf.idevelopment.info)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl.idevelopment.info)
    )
  )

EXTPROC_CONNECTION_DATA =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
    (CONNECT_DATA =
      (SID = PLSExtProc)
      (PRESENTATION = RO)
    )
  )


Connecting to Clustered Database From an External Client

This is an optional step, but I like to perform it in order to verify that my TNS files are configured correctly. Use another machine (e.g., a Windows machine connected to the network) that has Oracle installed, and add the TNS entries that were created for the clustered database (from the tnsnames.ora file on either node in the cluster).

Important Note: Verify that the machine you are connecting from can resolve all host names exactly how they appear in the listener.ora and tnsnames.ora files. For the purpose of this document, the machine you are connecting from should be able to resolve the following host names in the local hosts file or through DNS:

192.168.1.100    linux1
192.168.1.101    linux2
192.168.1.200    linux1-vip
192.168.1.201    linux2-vip
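
On a UNIX client, the hosts-file part of this check can be sketched in bash as follows. It only reads a hosts-format file and does not consult DNS (on Linux, getent hosts linux1-vip performs a full resolver lookup instead); the function name missing_hosts is hypothetical.

```shell
# Hypothetical helper: print each host name (arguments 2..n) that has no entry
# in the hosts-format file given as argument 1. DNS is not consulted.
missing_hosts() {
  local hosts_file="$1"
  shift
  local name
  for name in "$@"; do
    # Field 1 is the IP address; every later field is a host name or alias.
    if ! awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
      ' "$hosts_file"; then
      echo "$name"
    fi
  done
}
```

For example, missing_hosts /etc/hosts linux1 linux2 linux1-vip linux2-vip prints nothing when all four names are present.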

Try to connect to the clustered database using all available service names defined in the tnsnames.ora file:

C:\> sqlplus system/manager@orcl2
C:\> sqlplus system/manager@orcl1
C:\> sqlplus system/manager@orcl_taf
C:\> sqlplus system/manager@orcl



Create / Alter Tablespaces

When creating the clustered database, we left all tablespaces set to their default size. If you are using a large drive for the shared storage, you may want to make a sizable testing database.

Below are several optional SQL commands for modifying and creating all tablespaces for the test database.

Note: Please keep in mind that the database file names (OMF files) being listed in these examples may differ from what the Oracle Database Configuration Assistant (DBCA) creates for your environment. When working through this section, substitute the data file names that were created in your environment where appropriate. The following query can be used to determine the file names for your environment:

SQL> select tablespace_name, file_name
  2  from dba_data_files
  3  union
  4  select tablespace_name, file_name
  5  from dba_temp_files;

TABLESPACE_NAME     FILE_NAME
--------------- --------------------------------------------------
EXAMPLE         +ORCL_DATA1/orcl/datafile/example.257.570913311
INDX            +ORCL_DATA1/orcl/datafile/indx.270.570920045
SYSAUX          +ORCL_DATA1/orcl/datafile/sysaux.260.570913287
SYSTEM          +ORCL_DATA1/orcl/datafile/system.262.570913215
TEMP            +ORCL_DATA1/orcl/tempfile/temp.258.570913303
UNDOTBS1        +ORCL_DATA1/orcl/datafile/undotbs1.261.570913263
UNDOTBS2        +ORCL_DATA1/orcl/datafile/undotbs2.265.570913331
USERS           +ORCL_DATA1/orcl/datafile/users.264.570913355
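
To avoid copying the OMF names by hand, the output of the query above can be filtered for the tablespace you want. The following bash sketch assumes the two-column output has been saved as plain text in the layout shown (for example, by spooling it from SQL*Plus); the function name omf_files_for is hypothetical.

```shell
# Hypothetical helper: print the data/temp file names listed for one tablespace
# in the two-column "TABLESPACE_NAME FILE_NAME" query output read on stdin.
omf_files_for() {
  local ts
  ts=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')   # dictionary names are upper case
  # Heading and ruler lines never match a real tablespace name, so they drop out.
  awk -v ts="$ts" 'NF == 2 && $1 == ts { print $2 }'
}
```

For example, omf_files_for users < files.lst | head -n 1 returns the USERS file name to substitute into the resize statement below.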

$ sqlplus "/ as sysdba"

SQL> create user scott identified by tiger default tablespace users;
SQL> grant dba, resource, connect to scott;

SQL> alter database datafile '+ORCL_DATA1/orcl/datafile/users.264.570913355' resize 1024m;
SQL> alter tablespace users add datafile '+ORCL_DATA1' size 1024m autoextend off;

SQL> create tablespace indx datafile '+ORCL_DATA1' size 1024m
  2  autoextend on next 50m maxsize unlimited
  3  extent management local autoallocate
  4  segment space management auto;

SQL> alter database datafile '+ORCL_DATA1/orcl/datafile/system.262.570913215' resize 800m;

SQL> alter database datafile '+ORCL_DATA1/orcl/datafile/sysaux.260.570913287' resize 500m;

SQL> alter tablespace undotbs1 add datafile '+ORCL_DATA1' size 1024m
  2  autoextend on next 50m maxsize 2048m;

SQL> alter tablespace undotbs2 add datafile '+ORCL_DATA1' size 1024m
  2  autoextend on next 50m maxsize 2048m;

SQL> alter database tempfile '+ORCL_DATA1/orcl/tempfile/temp.258.570913303' resize 1024m;

Here is a snapshot of the tablespaces I have defined for my test database environment:

Status    Tablespace Name TS Type      Ext. Mgt.  Seg. Mgt.    Tablespace Size    Used (in bytes) Pct. Used
--------- --------------- ------------ ---------- --------- ------------------ ------------------ ---------
ONLINE    UNDOTBS1        UNDO         LOCAL      MANUAL         1,283,457,024         85,065,728         7
ONLINE    SYSAUX          PERMANENT    LOCAL      AUTO             524,288,000        275,906,560        53
ONLINE    USERS           PERMANENT    LOCAL      AUTO           2,147,483,648            131,072         0
ONLINE    SYSTEM          PERMANENT    LOCAL      MANUAL           838,860,800        500,301,824        60
ONLINE    EXAMPLE         PERMANENT    LOCAL      AUTO             157,286,400         83,820,544        53
ONLINE    INDX            PERMANENT    LOCAL      AUTO           1,073,741,824             65,536         0
ONLINE    UNDOTBS2        UNDO         LOCAL      MANUAL         1,283,457,024          3,801,088         0
ONLINE    TEMP            TEMPORARY    LOCAL      MANUAL         1,073,741,824         27,262,976         3
                                                            ------------------ ------------------ ---------
avg                                                                                                      22
sum                                                              8,382,316,544        976,355,328

8 rows selected.



Verify the RAC Cluster & Database Configuration


  The following RAC verification checks should be performed on both Oracle RAC nodes in the cluster! For this article, however, I will only be performing checks from linux1.


Overview

This section provides several srvctl commands and SQL queries that can be used to validate your Oracle RAC configuration.

  There are five node-level tasks defined for SRVCTL:

  • Adding and deleting node level applications.
  • Setting and unsetting the environment for node-level applications.
  • Administering node applications.
  • Administering ASM instances.
  • Starting and stopping a group of programs that includes virtual IP addresses, listeners, Oracle Notification Services, and Oracle Enterprise Manager agents (for maintenance purposes).


Status of all instances and services

$ srvctl status database -d orcl
Instance orcl1 is running on node linux1
Instance orcl2 is running on node linux2
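
For scripted health checks, the output of srvctl status database can be reduced to the instances that are down. This bash sketch relies only on the "Instance ... is (not) running on node ..." wording shown in this article; the function name instances_down is hypothetical.

```shell
# Hypothetical helper: read "srvctl status database -d <db>" output on stdin,
# print the name of every instance reported as not running, and return 1 if
# there was at least one.
instances_down() {
  awk '
    /^Instance .* is not running on node / { print $2; down = 1 }
    END { exit down }
  '
}
```

For example, srvctl status database -d orcl | instances_down || echo "at least one instance is down".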


Status of a single instance

$ srvctl status instance -d orcl -i orcl2
Instance orcl2 is running on node linux2


Status of a named service globally across the database

$ srvctl status service -d orcl -s orcl_taf
Service orcl_taf is running on instance(s) orcl2, orcl1


Status of node applications on a particular node

$ srvctl status nodeapps -n linux1
VIP is running on node: linux1
GSD is running on node: linux1
Listener is running on node: linux1
ONS daemon is running on node: linux1


Status of an ASM instance

$ srvctl status asm -n linux1
ASM instance +ASM1 is running on node linux1.


List all configured databases

$ srvctl config database
orcl


Display configuration for our RAC database

$ srvctl config database -d orcl
linux1 orcl1 /u01/app/oracle/product/10.2.0/db_1
linux2 orcl2 /u01/app/oracle/product/10.2.0/db_1


Display all services for the specified cluster database

$ srvctl config service -d orcl
orcl_taf PREF: orcl2 orcl1 AVAIL:


Display the configuration for node applications - (VIP, GSD, ONS, Listener)

$ srvctl config nodeapps -n linux1 -a -g -s -l
VIP exists.: /linux1-vip/192.168.1.200/255.255.255.0/eth0:eth1
GSD exists.
ONS daemon exists.
Listener exists.


Display the configuration for the ASM instance(s)

$ srvctl config asm -n linux1
+ASM1 /u01/app/oracle/product/10.2.0/db_1


All running instances in the cluster

SELECT
    inst_id
  , instance_number inst_no
  , instance_name inst_name
  , parallel
  , status
  , database_status db_status
  , active_state state
  , host_name host
FROM gv$instance
ORDER BY inst_id;

 INST_ID  INST_NO INST_NAME  PAR STATUS  DB_STATUS    STATE     HOST
-------- -------- ---------- --- ------- ------------ --------- -------
       1        1 orcl1      YES OPEN    ACTIVE       NORMAL    linux1
       2        2 orcl2      YES OPEN    ACTIVE       NORMAL    linux2


All data files which are in the disk group

select name from v$datafile
union
select member from v$logfile
union
select name from v$controlfile
union
select name from v$tempfile;

NAME
-------------------------------------------
+FLASH_RECOVERY_AREA/orcl/controlfile/current.258.570913191
+FLASH_RECOVERY_AREA/orcl/onlinelog/group_1.257.570913201
+FLASH_RECOVERY_AREA/orcl/onlinelog/group_2.256.570913211
+FLASH_RECOVERY_AREA/orcl/onlinelog/group_3.259.570918285
+FLASH_RECOVERY_AREA/orcl/onlinelog/group_4.260.570918295
+ORCL_DATA1/orcl/controlfile/current.259.570913189
+ORCL_DATA1/orcl/datafile/example.257.570913311
+ORCL_DATA1/orcl/datafile/indx.270.570920045
+ORCL_DATA1/orcl/datafile/sysaux.260.570913287
+ORCL_DATA1/orcl/datafile/system.262.570913215
+ORCL_DATA1/orcl/datafile/undotbs1.261.570913263
+ORCL_DATA1/orcl/datafile/undotbs1.271.570920865
+ORCL_DATA1/orcl/datafile/undotbs2.265.570913331
+ORCL_DATA1/orcl/datafile/undotbs2.272.570921065
+ORCL_DATA1/orcl/datafile/users.264.570913355
+ORCL_DATA1/orcl/datafile/users.269.570919829
+ORCL_DATA1/orcl/onlinelog/group_1.256.570913195
+ORCL_DATA1/orcl/onlinelog/group_2.263.570913205
+ORCL_DATA1/orcl/onlinelog/group_3.266.570918279
+ORCL_DATA1/orcl/onlinelog/group_4.267.570918289
+ORCL_DATA1/orcl/tempfile/temp.258.570913303

21 rows selected.


All ASM disks that belong to the 'ORCL_DATA1' disk group

SELECT path
FROM   v$asm_disk
WHERE  group_number IN (select group_number
                        from v$asm_diskgroup
                        where name = 'ORCL_DATA1');

PATH
----------------------------------
ORCL:VOL1
ORCL:VOL2



Starting / Stopping the Cluster

At this point, everything has been installed and configured for Oracle RAC 10g. We have all of the required software installed and configured plus we have a fully functional clustered database.

With all of the work we have done up to this point, a popular question might be, "How do we start and stop services?". If you have followed the instructions in this article, all services should start automatically on each reboot of the Linux nodes. This would include Oracle Clusterware, all Oracle instances, Enterprise Manager Database Console, etc.

There are times, however, when you might want to shutdown a node and manually start it back up. Or you may find that Enterprise Manager is not running and need to start it. This section provides the commands responsible for starting and stopping the cluster environment.

Ensure that you are logged in as the "oracle" UNIX user. I will be running all of the commands in this section from linux1:

# su - oracle

$ hostname
linux1


Stopping the Oracle RAC 10g Environment

The first step is to stop the Enterprise Manager Database Console and the Oracle instance. Once the instance (and related services) is down, bring down the ASM instance. Finally, shut down the node applications (Virtual IP, GSD, TNS Listener, and ONS).
$ export ORACLE_SID=orcl1
$ emctl stop dbconsole
$ srvctl stop instance -d orcl -i orcl1
$ srvctl stop asm -n linux1
$ srvctl stop nodeapps -n linux1


Starting the Oracle RAC 10g Environment

The first step is to start the node applications (Virtual IP, GSD, TNS Listener, and ONS). Once the node applications are successfully started, then bring up the ASM instance. Finally, bring up the Oracle instance (and related services) and the Enterprise Manager Database console.
$ export ORACLE_SID=orcl1
$ srvctl start nodeapps -n linux1
$ srvctl start asm -n linux1
$ srvctl start instance -d orcl -i orcl1
$ emctl start dbconsole
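
The stop and start sequences above can be wrapped in two bash functions so that the order is always respected. This is a sketch, assuming it runs as the oracle user with srvctl and emctl on the PATH; the function names rac_node_stop and rac_node_start are hypothetical. The start function stops at the first failing layer, since each layer depends on the one started before it.

```shell
# Hypothetical wrappers around the documented command order.
#   $1 = database name (orcl), $2 = instance (orcl1), $3 = node (linux1)
rac_node_stop() {
  local db="$1" inst="$2" node="$3"
  export ORACLE_SID="$inst"
  # An emctl failure (e.g. console already stopped) does not block the shutdown.
  emctl stop dbconsole
  # Do not stop ASM or the node applications if the instance is still up.
  srvctl stop instance -d "$db" -i "$inst" || return 1
  srvctl stop asm -n "$node" || return 1
  srvctl stop nodeapps -n "$node"
}

rac_node_start() {
  local db="$1" inst="$2" node="$3"
  export ORACLE_SID="$inst"
  # VIP, GSD, listener and ONS first, then ASM, then the database instance.
  srvctl start nodeapps -n "$node" || return 1
  srvctl start asm -n "$node" || return 1
  srvctl start instance -d "$db" -i "$inst" || return 1
  emctl start dbconsole
}
```

For example, rac_node_stop orcl orcl1 linux1 on linux1, followed later by rac_node_start orcl orcl1 linux1.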


Start / Stop All Instances with SRVCTL

Start / stop all of the instances and their enabled services. I included this mostly for fun, as a quick way to bring all instances up or down!
$ srvctl start database -d orcl

$ srvctl stop database -d orcl



Transparent Application Failover - (TAF)


Overview

It is not uncommon for today's businesses to demand 99.99% or even 99.999% availability for their enterprise applications. Think about what it would take to limit downtime to roughly 53 minutes per year (99.99%) or about 5 minutes per year (99.999%). To answer many of these high availability requirements, businesses are investing in mechanisms that provide automatic failover when one participating system fails. When considering the availability of the Oracle database, Oracle RAC 10g provides a superior solution with its advanced failover mechanisms. Oracle RAC 10g includes the required components that work together within a clustered configuration to provide continuous availability: when one of the participating systems within the cluster fails, users are automatically migrated to the other available systems.

A major component of Oracle RAC 10g that is responsible for failover processing is the Transparent Application Failover (TAF) option. Database connections (and processes) configured for TAF that lose their connection are reconnected to another node within the cluster. For queries, the failover is largely transparent to the user; uncommitted transactions, however, are rolled back and must be resubmitted by the application.

This final section provides a short demonstration of how automatic failover works in Oracle RAC 10g. Please note that a complete discussion of failover in Oracle RAC 10g would be an article of its own. My intention here is to present a brief overview and example of how it works.

One important note before continuing is that TAF happens automatically within the OCI libraries. This means that your application (client) code does not need to change in order to take advantage of TAF. Certain configuration steps, however, will need to be performed in the Oracle TNS file tnsnames.ora.

  Keep in mind that at the time of this writing, applications using the Java thin client cannot participate in TAF, since TAF is implemented in the OCI libraries and the thin client never reads the tnsnames.ora file.


Setup tnsnames.ora File

Before demonstrating TAF, we need to verify that a valid entry exists in the tnsnames.ora file on a non-RAC client machine (if you have a Windows machine lying around). Ensure that you have Oracle RDBMS software installed. (Actually, you only need a client install of the Oracle software.)

During the creation of the clustered database in this article, I created a new service named ORCL_TAF that will be used for testing TAF. It provides all of the necessary configuration parameters for load balancing and failover. You can copy the contents of this entry to the %ORACLE_HOME%\network\admin\tnsnames.ora file on the client machine (my Windows laptop is being used in this example) in order to connect to the new Oracle clustered database:

tnsnames.ora File Entry for Clustered Database
...
ORCL_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl_taf.idevelopment.info)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )
...


SQL Query to Check the Session's Failover Information

The following SQL query can be used to check a session's failover type, failover method, and if a failover has occurred. We will be using this query throughout this example.
COLUMN instance_name    FORMAT a13
COLUMN host_name        FORMAT a9
COLUMN failover_method  FORMAT a15
COLUMN failed_over      FORMAT a11

SELECT
    instance_name
  , host_name
  , NULL AS failover_type
  , NULL AS failover_method
  , NULL AS failed_over
FROM v$instance
UNION
SELECT
    NULL
  , NULL
  , failover_type
  , failover_method
  , failed_over
FROM v$session
WHERE username = 'SYSTEM';

Transparent Application Failover Demonstration

From a Windows machine (or other non-RAC client machine), log in to the clustered database using the orcl_taf service as the SYSTEM user:
C:\> sqlplus system/manager@orcl_taf

COLUMN instance_name    FORMAT a13
COLUMN host_name        FORMAT a9
COLUMN failover_method  FORMAT a15
COLUMN failed_over      FORMAT a11

SELECT
    instance_name
  , host_name
  , NULL AS failover_type
  , NULL AS failover_method
  , NULL AS failed_over
FROM v$instance
UNION
SELECT
    NULL
  , NULL
  , failover_type
  , failover_method
  , failed_over
FROM v$session
WHERE username = 'SYSTEM';


INSTANCE_NAME HOST_NAME FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
------------- --------- ------------- --------------- -----------
orcl1         linux1
                        SELECT        BASIC           NO

DO NOT log out of the above SQL*Plus session! Now that we have run the query, shut down the instance orcl1 on linux1 using the abort option. To perform this operation, we can use the srvctl command-line utility as follows:
# su - oracle
$ srvctl status database -d orcl
Instance orcl1 is running on node linux1
Instance orcl2 is running on node linux2

$ srvctl stop instance -d orcl -i orcl1 -o abort

$ srvctl status database -d orcl
Instance orcl1 is not running on node linux1
Instance orcl2 is running on node linux2

Now let's go back to our SQL session and rerun the SQL statement in the buffer:
COLUMN instance_name    FORMAT a13
COLUMN host_name        FORMAT a9
COLUMN failover_method  FORMAT a15
COLUMN failed_over      FORMAT a11

SELECT
    instance_name
  , host_name
  , NULL AS failover_type
  , NULL AS failover_method
  , NULL AS failed_over
FROM v$instance
UNION
SELECT
    NULL
  , NULL
  , failover_type
  , failover_method
  , failed_over
FROM v$session
WHERE username = 'SYSTEM';


INSTANCE_NAME HOST_NAME FAILOVER_TYPE FAILOVER_METHOD FAILED_OVER
------------- --------- ------------- --------------- -----------
orcl2         linux2
                        SELECT        BASIC           YES

SQL> exit

From the above demonstration, we can see that the session has now failed over to instance orcl2 on linux2.
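
When repeating this test several times, it can help to reduce the query output to a single line. The following bash sketch assumes the output has been captured as plain text in the layout shown above (for example, by spooling it from SQL*Plus) and that only one SYSTEM session exists; the function name taf_status is hypothetical.

```shell
# Hypothetical helper: parse the failover query output from stdin and print
# "<instance> <FAILED_OVER>", e.g. "orcl2 YES".
taf_status() {
  awk '
    /^INSTANCE_NAME/ || /^-/ { next }   # skip the column headings and ruler
    NF == 2 { inst = $1 }               # instance_name, host_name row
    NF == 3 { failed = $3 }             # failover_type, failover_method, failed_over row
    END { print inst, failed }
  '
}
```

With the two result sets shown above, it prints "orcl1 NO" before the abort and "orcl2 YES" after it.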



Troubleshooting

Confirm the RAC Node Name is Not Listed in Loopback Address

Ensure that the node names (linux1 or linux2) are not included for the loopback address in the /etc/hosts file. If the machine name is listed in the loopback address entry as below:
    127.0.0.1        linux1 localhost.localdomain localhost
it will need to be removed as shown below:
    127.0.0.1        localhost.localdomain localhost

  If the RAC node name is listed for the loopback address, you will receive the following error during the RAC installation:
ORA-00603: ORACLE server session terminated by fatal error
or
ORA-29702: error occurred in Cluster Group Service operation


Confirm localhost is defined in the /etc/hosts file for the loopback address

Ensure that the entry for localhost.localdomain and localhost are included for the loopback address in the /etc/hosts file for each of the Oracle RAC nodes:
    127.0.0.1        localhost.localdomain localhost

  If an entry does not exist for localhost in the /etc/hosts file, Oracle Clusterware will be unable to start the application resources (notably the ONS process). The error will indicate "Failed to get IP for localhost" and will be written to the ONS log file. For example:
CRS-0215 could not start resource 'ora.linux1.ons'. Check log file
"/u01/app/crs/log/linux1/racg/ora.linux1.ons.log"
for more details.
The ONS log file will contain lines similar to the following:

Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2007-04-14 13:10:02.729: [ RACG][3086871296][13316][3086871296][ora.linux1.ons]: Failed to get IP for localhost (1)
Failed to get IP for localhost (1)
Failed to get IP for localhost (1)
onsctl: ons failed to start
...
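
A quick pre-installation check for this entry can be sketched in bash as follows; it only inspects the file and does not modify it. The function name has_loopback_names is hypothetical, and /etc/hosts is used when no file is given.

```shell
# Hypothetical helper: succeed only if a hosts-format file maps 127.0.0.1 to
# both localhost.localdomain and localhost.
has_loopback_names() {
  awk '
    $1 == "127.0.0.1" {
      for (i = 2; i <= NF; i++) {
        if ($i == "localhost") lh = 1
        if ($i == "localhost.localdomain") lhd = 1
      }
    }
    END { exit !(lh && lhd) }
  ' "${1:-/etc/hosts}"
}
```

For example, has_loopback_names || echo "fix the loopback entry in /etc/hosts" on each Oracle RAC node.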


Setting the Correct Date and Time on Both Oracle RAC Nodes

During the installation of Oracle Clusterware, the Database, and the Companion CD, the Oracle Universal Installer (OUI) first installs the software to the local node running the installer (i.e. linux1). The software is then copied remotely to all of the remaining nodes in the cluster (i.e. linux2). During the remote copy process, the OUI will execute the UNIX "tar" command on each of the remote nodes to extract the files that were archived and copied over. If the date and time on the node performing the install is greater than that of the node it is copying to, the OUI will throw an error from the "tar" command indicating it is attempting to extract files stamped with a time in the future:

Error while copying directory 
    /u01/app/crs with exclude file list 'null' to nodes 'linux2'.
[PRKC-1002 : All the submitted commands did not execute successfully]
---------------------------------------------
linux2:
   /bin/tar: ./bin/lsnodes: time stamp 2006-09-13 09:21:34 is 735 s in the future
   /bin/tar: ./bin/olsnodes: time stamp 2006-09-13 09:21:34 is 735 s in the future
   ...(more errors on this node)

Please note that although this would seem like a severe error from the OUI, it can safely be disregarded as a warning. The "tar" command DOES actually extract the files; however, when you perform a listing of the files (using ls -l) on the remote node, they will be missing the time field until the time on the server is greater than the timestamp of the file.

Before starting any of the above noted installations, ensure that each member node of the cluster is set as closely as possible to the same date and time. Oracle strongly recommends using the Network Time Protocol feature of most operating systems for this purpose, with all nodes using the same reference Network Time Protocol server.

Accessing a Network Time Protocol server, however, may not always be an option. In this case, when manually setting the date and time for the nodes in the cluster, ensure that the date and time of the node you are performing the software installations from (linux1) is earlier than that of all other nodes in the cluster (linux2). I generally use a 20-second difference, as shown in the following example:

Setting the date and time from linux1:

# date -s "9/2/2007 01:12:00"

Setting the date and time from linux2:

# date -s "9/2/2007 01:12:20"

The two-node RAC configuration described in this article does not make use of a Network Time Protocol server.
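The rule above (the installing node's clock must not be ahead of any remote node) can be checked before launching the OUI. This is a minimal sketch; the function name clock_ok_for_install is hypothetical, and the usage example assumes SSH user equivalence between the nodes is already configured. Because the remote timestamp is read after the SSH connection is set up, the comparison is slightly biased in the remote node's favor, so treat it as a rough check rather than a precise measurement.

```shell
#!/bin/bash
# Hypothetical helper: is the remote node's clock safe for the OUI remote
# copy, i.e. not behind the clock of the node running the installer?
#   $1 = epoch seconds on the installing node (e.g. linux1)
#   $2 = epoch seconds on the remote node     (e.g. linux2)
# Returns 0 when remote >= local, otherwise prints the skew and returns 1.
clock_ok_for_install() {
    local local_ts="$1" remote_ts="$2"
    if [ "$remote_ts" -ge "$local_ts" ]; then
        return 0
    fi
    echo "remote clock is $(( local_ts - remote_ts )) s behind the installing node" >&2
    return 1
}

# Example (run on linux1):
#   clock_ok_for_install "$(date +%s)" "$(ssh linux2 date +%s)"
```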


Openfiler - Logical Volumes Not Active on Boot

One issue that I have run into several times occurs when using a USB drive connected to the Openfiler server. When the Openfiler server is rebooted, the system recognizes the USB drive; however, it is not able to load the logical volumes and writes the following messages to /var/log/messages (also available through dmesg):
iSCSI Enterprise Target Software - version 0.4.14
iotype_init(91) register fileio
iotype_init(91) register blockio
iotype_init(91) register nullio
open_path(120) Can't open /dev/rac1/crs -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm1 -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm2 -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm3 -2
fileio_attach(268) -2
open_path(120) Can't open /dev/rac1/asm4 -2
fileio_attach(268) -2

Please note that I am not suggesting that this only occurs with USB drives connected to the Openfiler server. It may occur with other types of drives; however, I have only seen it with USB drives.

If you do receive this error, you should first check the status of all logical volumes using the lvscan command from the Openfiler server:

# lvscan
  inactive          '/dev/rac1/crs' [2.00 GB] inherit
  inactive          '/dev/rac1/asm1' [115.94 GB] inherit
  inactive          '/dev/rac1/asm2' [115.94 GB] inherit
  inactive          '/dev/rac1/asm3' [115.94 GB] inherit
  inactive          '/dev/rac1/asm4' [115.94 GB] inherit
Notice that the status for each of the logical volumes is set to inactive - (the status for each logical volume on a working system would be set to ACTIVE).

I currently know of two methods to get Openfiler to automatically load the logical volumes on reboot, both of which are described below.

Method 1

First, shut down both of the Oracle RAC nodes in the cluster (linux1 and linux2). Then, from the Openfiler server, manually set each of the logical volumes to ACTIVE for each consecutive reboot:
# lvchange -a y /dev/rac1/crs
# lvchange -a y /dev/rac1/asm1
# lvchange -a y /dev/rac1/asm2
# lvchange -a y /dev/rac1/asm3
# lvchange -a y /dev/rac1/asm4
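With more than a handful of volumes, the individual lvchange commands can instead be generated from the lvscan output. The sketch below is hypothetical (the names inactive_lvs and activate_inactive_lvs are my own); it assumes the lvscan output format shown above, where the volume path is enclosed in single quotes, and the wrapper must be run as root on the Openfiler server.

```shell
#!/bin/bash
# Print the path of every logical volume that lvscan reports as "inactive".
# Input lines look like:
#   inactive          '/dev/rac1/crs' [2.00 GB] inherit
# Splitting on the single quote makes the path the second field.
inactive_lvs() {
    awk -F"'" '/^[ \t]*inactive[ \t]/ { print $2 }'
}

# Hypothetical wrapper: activate only the volumes that are inactive.
activate_inactive_lvs() {
    local lv
    lvscan | inactive_lvs | while read -r lv; do
        echo "Activating $lv"
        lvchange -a y "$lv"
    done
}
```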

  Another method to set the status to active for all logical volumes is to use the Volume Group change command as follows:
# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "rac1" using metadata type lvm2

# vgchange -ay
  5 logical volume(s) in volume group "rac1" now active

After setting each of the logical volumes to active, use the lvscan command again to verify the status:

# lvscan
  ACTIVE            '/dev/rac1/crs' [2.00 GB] inherit
  ACTIVE            '/dev/rac1/asm1' [115.94 GB] inherit
  ACTIVE            '/dev/rac1/asm2' [115.94 GB] inherit
  ACTIVE            '/dev/rac1/asm3' [115.94 GB] inherit
  ACTIVE            '/dev/rac1/asm4' [115.94 GB] inherit
As a final test, reboot the Openfiler server to ensure each of the logical volumes will be set to ACTIVE after the boot process. After you have verified that each of the logical volumes will be active on boot, check that the iSCSI target service is running:
# service iscsi-target status
ietd (pid 2668) is running...
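Before restarting the RAC nodes, the lvscan check can be reduced to a pass/fail test. This is a hypothetical sketch (all_lvs_active is not an Openfiler or LVM command): it reads lvscan output on standard input, relies on the ACTIVE/inactive keywords shown above, and also fails if no volumes are listed at all.

```shell
#!/bin/bash
# Hypothetical pre-flight check: read lvscan output on stdin, print a short
# summary, and return non-zero if any volume is inactive or none are listed.
all_lvs_active() {
    awk '
        /^[ \t]*ACTIVE[ \t]/   { active++; next }
        /^[ \t]*inactive[ \t]/ { inactive++ }
        END {
            printf "%d active, %d inactive\n", active, inactive
            exit (inactive > 0 || active == 0)
        }'
}

# Example (as root on the Openfiler server):
#   lvscan | all_lvs_active && service iscsi-target status
```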
Finally, restart each of the Oracle RAC nodes in the cluster - (linux1 and linux2).

Method 2

This method was kindly provided by Martin Jones. His workaround amends the /etc/rc.sysinit script so that it waits for the USB disk (/dev/sda in my example) to be detected. After making the changes to the /etc/rc.sysinit script (described below), verify the external drives are powered on and then reboot the Openfiler server.

The following is a small portion of the /etc/rc.sysinit script on the Openfiler server with the changes proposed by Martin (the block between the "MJONES - Customisation" comment lines):

Make Modifications to /etc/rc.sysinit
# LVM2 initialization, take 2
        if [ -c /dev/mapper/control ]; then
                if [ -x /sbin/multipath.static ]; then
                        modprobe dm-multipath >/dev/null 2>&1
                        /sbin/multipath.static -v 0
                        if [ -x /sbin/kpartx ]; then
                                /sbin/dmsetup ls --target multipath --exec "/sbin/kpartx -a"
                        fi
                fi
 

                if [ -x /sbin/dmraid ]; then
                        modprobe dm-mirror > /dev/null 2>&1
                        /sbin/dmraid -i -a y
                fi

#-----
#-----  MJONES - Customisation Start
#-----

       # Check if /dev/sda is ready
         while [ ! -e /dev/sda ]
         do
             echo "Device /dev/sda for first USB Drive is not yet ready."
             echo "Waiting..."
             sleep 5
         done
         echo "INFO - Device /dev/sda for first USB Drive is ready."

#-----
#-----  MJONES - Customisation END
#-----
                if [ -x /sbin/lvm.static ]; then
                        if /sbin/lvm.static vgscan > /dev/null 2>&1 ; then
                                action $"Setting up Logical Volume Management:" \
                                        /sbin/lvm.static vgscan --mknodes --ignorelockingfailure && \
                                        /sbin/lvm.static vgchange -a y --ignorelockingfailure
                        fi
                fi
        fi
 

# Clean up SELinux labels
if [ -n "$SELINUX" ]; then
   for file in /etc/mtab /etc/ld.so.cache ; do
      [ -r $file ] && restorecon $file  >/dev/null 2>&1
   done
fi
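One drawback of the wait loop above is that it never gives up: if the USB drive is unplugged or powered off, the Openfiler boot hangs indefinitely. The following variant is my own hypothetical sketch, not part of Martin's change. It stops after a fixed number of checks and lets the boot continue, in which case the logical volumes simply remain inactive and can be activated manually as in Method 1.

```shell
#!/bin/bash
# Hypothetical bounded replacement for the "while [ ! -e /dev/sda ]" loop.
#   $1 = device node to wait for (e.g. /dev/sda)
#   $2 = maximum number of checks (e.g. 24)
#   $3 = seconds to sleep between checks (e.g. 5)
# Returns 0 once the device exists, 1 if it never appeared.
wait_for_device() {
    local dev="$1" tries="$2" interval="$3" n=0
    while [ ! -e "$dev" ]; do
        n=$((n + 1))
        if [ "$n" -ge "$tries" ]; then
            echo "WARNING - $dev not present after $tries checks; continuing boot." >&2
            return 1
        fi
        echo "Device $dev is not yet ready. Waiting..."
        sleep "$interval"
    done
    echo "INFO - Device $dev is ready."
    return 0
}
```

In /etc/rc.sysinit, the function could be defined ahead of the LVM2 initialization block and the original loop replaced by a call such as wait_for_device /dev/sda 24 5, which waits roughly two minutes at most.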

Finally, restart each of the Oracle RAC nodes in the cluster - (linux1 and linux2).


OCFS2 - Configure O2CB to Start on Boot

In OCFS2 releases prior to 1.2.1, a bug exists in which the driver is not loaded on each boot, even after the on-boot properties have been configured to do so.

Note that this section does not apply here since the version of OCFS2 used in this article is later than release 1.2.1.

After attempting to configure the on-boot properties to start on each boot according to the official OCFS2 documentation, you will still get the following error on each boot:

...
Mounting other filesystems:
     mount.ocfs2: Unable to access cluster service

Cannot initialize cluster mount.ocfs2:
    Unable to access cluster service Cannot initialize cluster [FAILED]
...
Red Hat changed the way the service is registered between chkconfig-1.3.11.2-1 and chkconfig-1.3.13.2-1. The O2CB script used to work with the former.

Before attempting to configure the on-boot properties:

After resolving the bug I listed above, we can now continue to set the on-boot properties as follows:
# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot.  The current values will be shown in brackets ('[]').  Hitting
<ENTER> without typing an answer will keep that current value.  Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [n]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting cluster ocfs2: OK


OCFS2 - o2cb_ctl: Unable to access cluster service while creating node

While configuring the nodes for OCFS2 using ocfs2console, it is possible to run into the error:

o2cb_ctl: Unable to access cluster service while creating node

This error does not show up when you start ocfs2console for the first time. It appears when there is a problem with the cluster configuration, or if you did not save the cluster configuration when initially setting it up with ocfs2console. This is a bug!

The work-around is to exit from the ocfs2console, unload the o2cb module and remove the ocfs2 cluster configuration file /etc/ocfs2/cluster.conf. I also like to remove the /config directory. After removing the ocfs2 cluster configuration file, restart the ocfs2console program.

For example:

# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK

# rm -f /etc/ocfs2/cluster.conf
# rm -rf /config

# ocfs2console &

This time, it will add the nodes!
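The workaround can be wrapped in a small script. This is a sketch only: the function name reset_ocfs2_config and its two parameters are hypothetical, and unlike the manual steps it keeps a timestamped backup of cluster.conf instead of deleting it outright. The root-prefix parameter exists only so the function can be exercised against a scratch directory; pass an empty string on a real node.

```shell
#!/bin/bash
# Hypothetical wrapper around the manual workaround above.
#   $1 = filesystem root prefix ("" on a real node, a scratch dir for testing)
#   $2 = o2cb init script (normally /etc/init.d/o2cb)
reset_ocfs2_config() {
    local root="$1" o2cb="$2"
    local conf="$root/etc/ocfs2/cluster.conf"

    # Take the cluster offline (result not checked: it may already be
    # offline), then unload the O2CB modules
    "$o2cb" offline ocfs2
    "$o2cb" unload || return 1

    # Keep a timestamped copy instead of deleting cluster.conf outright
    if [ -f "$conf" ]; then
        cp -p "$conf" "$conf.bak.$(date +%Y%m%d%H%M%S)" || return 1
        rm -f "$conf"
    fi

    # Remove the (now unmounted) configfs mount point, as in the manual steps
    rm -rf "$root/config"
}
```

On linux1, for example, run reset_ocfs2_config "" /etc/init.d/o2cb as root and then start ocfs2console again.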


OCFS2 - Adjusting the O2CB Heartbeat Threshold

With previous versions of this article (using FireWire as opposed to iSCSI for the shared storage), I was able to install and configure OCFS2, format the new volume, and finally install Oracle Clusterware (with its two required shared files, the voting disk and the OCR file) on the new OCFS2 volume. Although I was able to install Oracle Clusterware and see the shared drive using FireWire, I experienced frequent lock-ups and hangs after about 15 minutes whenever the Clusterware software was running on both nodes. Which node hung (linux1 or linux2 in my example) varied from one occurrence to the next, and the hangs occurred regardless of whether the I/O load was high or nonexistent.

After looking through the trace files for OCFS2, it was apparent that access to the voting disk was too slow (exceeding the O2CB heartbeat threshold) and causing the Oracle Clusterware software (and the node) to crash. On the console would be a message similar to the following:

...
Index 0: took 0 ms to do submit_bio for read
Index 1: took 3 ms to do waiting for read completion
Index 2: took 0 ms to do bio alloc write
Index 3: took 0 ms to do bio add page write
Index 4: took 0 ms to do submit_bio for write
Index 5: took 0 ms to do checking slots
Index 6: took 4 ms to do waiting for write completion
Index 7: took 1993 ms to do msleep
Index 8: took 0 ms to do allocating bios for read
Index 9: took 0 ms to do bio alloc read
Index 10: took 0 ms to do bio add page read
Index 11: took 0 ms to do submit_bio for read
Index 12: took 10006 ms to do waiting for read completion
(13,3):o2hb_stop_all_regions:1888 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing

The solution I used was to increase the O2CB heartbeat threshold from its default value of 7 to 61. Some setups may require an even higher setting. This is a configurable parameter that is used to compute the time it takes for a node to "fence" itself. During the installation and configuration of OCFS2, we adjusted this value in the section "Configure O2CB to Start on Boot and Adjust O2CB Heartbeat Threshold". If you encounter a kernel panic from OCFS2 and need to increase the heartbeat threshold, use the same procedures described in that section. If you are using an earlier version of the OCFS2 tools (prior to ocfs2-tools release 1.2.2-1), the following describes how to manually adjust the O2CB heartbeat threshold.

First, let's see how to determine what the O2CB heartbeat threshold is currently set to. This can be done by querying the /proc file system as follows:

# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
7
We see that the value is 7, but what does this value represent? Well, it is used in the formula below to determine the fence time (in seconds):
[fence time in seconds] = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
So, with an O2CB heartbeat threshold of 7, we would have a fence time of:
(7 - 1) * 2 = 12 seconds
If we want a larger threshold (say 120 seconds), we would need to adjust O2CB_HEARTBEAT_THRESHOLD to 61 as shown below:
(61 - 1) * 2 = 120 seconds
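The fence-time formula and its inverse can be expressed as two small shell functions using integer arithmetic. The names fence_time and threshold_for are hypothetical; threshold_for rounds up, so it returns the smallest threshold whose fence time is at least the requested number of seconds.

```shell
#!/bin/bash
# Fence time in seconds for a given O2CB heartbeat threshold:
#   fence = (threshold - 1) * 2
fence_time() {
    echo $(( ($1 - 1) * 2 ))
}

# Smallest threshold whose fence time is >= the requested seconds
# (the inverse of the formula above, rounded up).
threshold_for() {
    echo $(( ($1 + 1) / 2 + 1 ))
}

# Example: fence time for the value currently in effect
#   fence_time "$(cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold)"
```

For example, threshold_for 120 returns 61, and threshold_for 1200 returns 601.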

Let's see now how to manually increase the O2CB heartbeat threshold from 7 to 61. This task will need to be performed on all Oracle RAC nodes in the cluster. We first need to modify the file /etc/sysconfig/o2cb and set O2CB_HEARTBEAT_THRESHOLD to 61:

/etc/sysconfig/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver.  It is generated by running /etc/init.d/o2cb configure.
# Please use that method to modify this file
#

# O2CB_ENABELED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=61

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=

# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=

# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=
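If you prefer not to edit /etc/sysconfig/o2cb by hand, the change can be scripted with sed. This is a minimal sketch assuming GNU sed (for -i with a backup suffix); the helper name set_hb_threshold is hypothetical. The lower bound of 7 mirrors the ">=7" hint printed by the o2cb configure prompt, and the o2cb configure step that follows is still required for the new value to take effect.

```shell
#!/bin/bash
# Hypothetical helper: set O2CB_HEARTBEAT_THRESHOLD in an o2cb sysconfig
# file. The previous contents are kept as FILE.orig.
#   $1 = config file (normally /etc/sysconfig/o2cb)
#   $2 = new threshold (e.g. 61)
set_hb_threshold() {
    local file="$1" value="$2"
    case "$value" in
        ''|*[!0-9]*) echo "threshold must be a number" >&2; return 1 ;;
    esac
    # o2cb configure only accepts a dead threshold of 7 or more
    [ "$value" -ge 7 ] || { echo "threshold must be >= 7" >&2; return 1; }
    if grep -q '^O2CB_HEARTBEAT_THRESHOLD=' "$file"; then
        sed -i.orig "s/^O2CB_HEARTBEAT_THRESHOLD=.*/O2CB_HEARTBEAT_THRESHOLD=$value/" "$file"
    else
        cp -p "$file" "$file.orig" && echo "O2CB_HEARTBEAT_THRESHOLD=$value" >> "$file"
    fi
}
```

On each Oracle RAC node, for example: set_hb_threshold /etc/sysconfig/o2cb 61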

After modifying the file /etc/sysconfig/o2cb, we need to alter the o2cb configuration. Again, this should be performed on all Oracle RAC nodes in the cluster.

# umount /u02
# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot.  The current values will be shown in brackets ('[]').  Hitting
<ENTER> without typing an answer will keep that current value.  Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [n]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Specify heartbeat dead threshold (>=7) [7]: 61
Specify network idle timeout in ms (>=5000) [10000]: 10000
Specify network keepalive delay in ms (>=1000) [5000]: 5000
Specify network reconnect delay in ms (>=2000) [2000]: 2000
Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting O2CB cluster ocfs2: OK
We can now check again to make sure the settings took effect for the o2cb cluster stack:
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
61
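Because the threshold must match on every Oracle RAC node, it can help to compare the stored and running values on linux1 and linux2 in turn. The helper hb_threshold_matches below is a hypothetical sketch; both file paths are passed in as arguments so it can be tested against copies.

```shell
#!/bin/bash
# Hypothetical check: does the running O2CB heartbeat threshold match the
# value stored in the sysconfig file?
#   $1 = sysconfig file (normally /etc/sysconfig/o2cb)
#   $2 = proc file      (normally /proc/fs/ocfs2_nodemanager/hb_dead_threshold)
hb_threshold_matches() {
    local conf_file="$1" proc_file="$2" want have
    want=$(sed -n 's/^O2CB_HEARTBEAT_THRESHOLD=//p' "$conf_file")
    have=$(cat "$proc_file") || return 1
    if [ -n "$want" ] && [ "$want" = "$have" ]; then
        echo "OK: heartbeat threshold is $have"
        return 0
    fi
    echo "MISMATCH: config='$want' running='$have'" >&2
    return 1
}

# Example (on each node):
#   hb_threshold_matches /etc/sysconfig/o2cb /proc/fs/ocfs2_nodemanager/hb_dead_threshold
```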

  It is important to note that the value of 61 I used for the O2CB heartbeat threshold will not work for all configurations. In some cases, the O2CB heartbeat threshold value had to be increased to as high as 601 in order to prevent OCFS2 from panicking the kernel.



Conclusion

Oracle RAC 10g allows the DBA to configure a database solution with superior fault tolerance and load balancing. DBAs who want to become more familiar with the features and benefits of Oracle RAC 10g, however, will find that even a small RAC cluster costs in the range of US$15,000 to US$20,000.

This article has hopefully given you an economical solution to setting up and configuring an inexpensive Oracle RAC 10g Release 2 Cluster using CentOS 4 Linux (or Red Hat Enterprise Linux 4) and iSCSI technology. The RAC solution presented in this article can be put together for around US$2,200 and will provide the DBA with a fully functional Oracle RAC 10g Release 2 cluster. While the hardware used for this article should be stable enough for educational purposes, it should never be considered for a production environment.



Building an Oracle RAC Cluster Remotely

An iDevelopment.info reader is now offering 3 computers that will allow you to REMOTELY build an Oracle RAC Cluster from scratch using a VNC enabled KVM. The cost is $14 for 7 days full access. Please email him at "Bryan" AT "AHCCINC" DOT "COM" with the words "RAC LAB" as the subject. If the lab is available, you can start building your RAC now!



Acknowledgements

An article of this magnitude and complexity is generally not the work of one person alone. Although I was able to author and successfully demonstrate the validity of the components that make up this configuration, there are several other individuals that deserve credit in making this article a success.

First, I would like to thank Bane Radulovic from the Server BDE Team at Oracle Corporation. Bane not only introduced me to Openfiler, but shared with me his experience and knowledge of the product and how to best utilize it for Oracle RAC 10g. His research and hard work made the task of configuring Openfiler seamless. Bane was also involved with hardware recommendations and testing.

I would next like to thank Oracle ACE Werner Puschitz for his outstanding work on "Installing Oracle Database 10g with Real Application Cluster (RAC) on Red Hat Enterprise Linux Advanced Server 3". This article, along with several others of his, provided information on Oracle RAC 10g that could not be found in any other Oracle documentation. Without his hard work and research into issues like configuring and installing the hangcheck-timer kernel module, properly configuring UNIX shared memory, and configuring OCFS2 and ASMLib, this article may have never come to fruition. If you are interested in examining technical articles on Linux internals and in-depth Oracle configurations written by Werner Puschitz, please visit his website at www.puschitz.com.

Also, thanks to Tzvika Lemel for his comments and suggestions on using Oracle's Cluster Verification Utility (CVU).

Lastly, I would like to express my appreciation to the following vendors for generously supplying the hardware for this article: Stallard Technologies, Inc., Maxtor, Avocent Corporation, Intel, D-Link, SIIG, and LaCie.



About the Author

Jeffrey Hunter is an Oracle Certified Professional, Java Development Certified Professional, Author, and an Oracle ACE. Jeff currently works as a Senior Database Administrator for The DBA Zone, Inc. located in Pittsburgh, Pennsylvania. His work includes advanced performance tuning, Java and PL/SQL programming, developing high availability solutions, capacity planning, database security, and physical / logical database design in a UNIX / Linux server environment. Jeff's other interests include mathematical encryption theory, tutoring advanced mathematics, programming language processors (compilers and interpreters) in Java and C, LDAP, writing web-based database administration tools, and of course Linux. He has been a Sr. Database Administrator and Software Engineer for over 20 years and maintains his own website at: http://www.iDevelopment.info. Jeff graduated from Stanislaus State University in Turlock, California, with a Bachelor's degree in Computer Science and Mathematics.



Copyright (c) 1998-2014 Jeffrey M. Hunter. All rights reserved.

All articles, scripts and material located at the Internet address of http://www.idevelopment.info is the copyright of Jeffrey M. Hunter and is protected under copyright laws of the United States. This document may not be hosted on any other site without my express, prior, written permission. Application to host any of the material elsewhere can be made by contacting me at jhunter@idevelopment.info.

I have made every effort and taken great care in making sure that the material included on my web site is technically accurate, but I disclaim any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on it. I will in no case be liable for any monetary damages arising from such loss, damage or destruction.

Last modified on Monday, 14-Jul-2014 18:11:45 EDT