Oracle DBA Tips Corner



Building an Inexpensive Oracle RAC 9i on Linux - (Fedora Core 1 / FireWire)

by Jeff Hunter, Sr. Database Administrator


Contents

  1. Overview
  2. Oracle9i Real Application Cluster (RAC) Introduction
  3. What software is necessary for RAC?
  4. Shared-Storage Overview
  5. FireWire Technology
  6. Hardware & Costs
  7. A Brief Walk Through the Process
  8. Why Fedora Core 1 and not Fedora Core 2?
  9. Install Red Hat Linux - (Fedora Core 1)
  10. Network Configuration
  11. Obtaining and Installing a proper Linux Kernel
  12. Create "oracle" User and Directories
  13. Creating Partitions on the Shared FireWire Storage Device
  14. Create RAW Bindings
  15. Create Symbolic Links From RAW Volumes
  16. Configuring the Linux Servers
  17. Configuring the "hangcheck-timer" Kernel Module
  18. Configuring RAC Nodes for Remote Access
  19. All Startup Commands for Each RAC Node
  20. Update Red Hat Linux System - (Oracle Metalink Note: 252217.1)
  21. Downloading / Unpacking the Oracle9i Installation Files
  22. Installing Oracle9i Cluster Manager
  23. Installing Oracle9i RAC
  24. Creating TNS Networking Files
  25. Creating the Oracle Database
  26. Verifying the RAC Cluster / Database Configuration
  27. Altering Datafile Sizes
  28. Starting & Stopping the Cluster
  29. Transparent Application Failover - (TAF)
  30. Conclusion
  31. Acknowledgements
  32. About the Author



Overview

One of the most efficient ways to become familiar with Oracle9i Real Application Cluster (RAC) technology is to have access to an actual Oracle9i RAC cluster. In learning this new technology, you will soon start to realize the benefits Oracle9i RAC has to offer, such as fault tolerance, new levels of security, load balancing, and the ease of upgrading capacity. The problem, though, is the price of the hardware required for a typical production RAC configuration. A small two-node cluster, for example, could run anywhere from $10,000 to well over $20,000. This would not even include the heart of a production RAC environment, the shared storage. In most cases, this would be a Storage Area Network (SAN), which generally starts at $15,000.

For those who simply want to become familiar with Oracle9i RAC, this article provides a low cost alternative to configure an Oracle9i RAC system using commercial off the shelf components and downloadable software. The estimated cost for this configuration could be anywhere from $1000 to $1500. The system will consist of a dual node cluster, both running Linux (Red Hat Linux - Fedora Core 1 in this example) with a shared disk array based on IEEE1394 (FireWire) drive technology.

NOTE: This article is only designed to work as documented with absolutely no substitutions. If you are looking for an example that takes advantage of 10g, please see:
     Building an Inexpensive Oracle10g Release 1 RAC on Linux - (RHEL 3.0 / FireWire)
     Building an Inexpensive Oracle10g Release 2 RAC on Linux - (RHEL 4.2 / FireWire)
     Building an Inexpensive Oracle10g Release 2 RAC on Linux - (RHEL 4.4 / iSCSI)

Please note that this is not the only way to build a low cost Oracle9i RAC system. I have seen other solutions that utilize an implementation based on SCSI rather than FireWire for shared storage. In most cases, SCSI will cost more than our FireWire solution: a typical SCSI card is priced around $70, and an 80GB external SCSI drive will cost around $700-$1000. Keep in mind that some motherboards may already include built-in SCSI controllers.

It is important to note that this configuration should NEVER be considered to run in a production environment. In a production environment, Fibre Channel is the technology of choice, since it is the high-speed serial-transfer interface that can connect systems and storage devices in either point-to-point or switched topologies. FireWire is able to offer a low-cost alternative to Fibre Channel for testing and development, but it is not ready for production.



Oracle9i Real Application Cluster (RAC) Introduction

Oracle Real Application Cluster (RAC) is the successor to Oracle Parallel Server (OPS). RAC allows multiple instances to access the same database (storage) simultaneously. RAC provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out. At the same time, since all nodes access the same database, the failure of one instance will not cause the loss of access to the database.

At the heart of Oracle9i RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files and parameter files for all nodes in the cluster. The data disks must be globally available in order to allow all nodes to access the database. Each node has its own redo log file(s) and UNDO tablespace, but the other nodes must be able to access them (and the shared control file) in order to recover that node in the event of a system failure.

Not all clustering solutions use shared storage. Some vendors use an approach known as a federated cluster, in which data is spread across several machines rather than shared by all. With Oracle9i RAC, however, multiple nodes use the same set of disks for storing data. With Oracle9i RAC, the data, redo log, control, and archived log files reside on shared storage on raw-disk devices or on a clustered file system. Oracle's approach to clustering leverages the collective processing power of all the nodes in the cluster and at the same time provides failover security.

Although it is not absolutely necessary, Oracle recommends that you install the Oracle Cluster File System (OCFS). OCFS makes disk management much easier by creating the same file system on all the nodes. Without OCFS, you will have to create all partitions manually.

NOTE: This article does not go into the details of installing or utilizing OCFS, but rather uses all manual methods for creating partitions and binding raw devices to those partitions.

One of the main reasons why I do not use the Oracle Cluster File System for Red Hat Linux is that OCFS comes in the form of RPMs. All of the RPM modules and the precompiled modules are tied to the Red Hat Advanced Server ($1,200) kernel-naming standard and will not load in the supplied 2.4.20 linked kernel.

The biggest difference between Oracle9i RAC and OPS is the addition of Cache Fusion. With OPS, a request for data from one node to another required the data to be written to disk first; only then could the requesting node read that data. With Cache Fusion, data is passed along with locks.

Pre-configured Oracle9i RAC solutions are available from vendors such as Dell, IBM and HP for production environments. This article, however, focuses on putting together your own Oracle9i RAC environment for development and testing by using Linux servers and a low cost shared disk solution: FireWire.



What software is necessary for RAC? Does it have a separate installation CD to order?

Real Application Clusters is contained within the Oracle9i Enterprise Edition. If you install Oracle9i Enterprise Edition onto a cluster, and the Oracle Universal Installer (OUI) recognizes the cluster, you will be provided the option of installing RAC. Most UNIX platforms require an OSD installation for the necessary clusterware. For Intel platforms (Linux and Windows), Oracle provides the OSD software within the Oracle9i Enterprise Edition release.



Shared-Storage Overview

Today, fibre channel is one of the most popular solutions for shared storage. As mentioned earlier, fibre channel is a high-speed serial-transfer interface that is used to connect systems and storage devices in either point-to-point or switched topologies. Protocols supported by fibre channel include SCSI and IP. Fibre channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second. Fibre channel is, however, very expensive. The fibre channel switch alone can run as much as $1000. This does not even include the fibre channel storage array and high-end drives, which can reach prices of about $300 for a 36GB drive. A basic fibre channel setup, including fibre channel cards for the servers, costs roughly $5,000, which does not include the cost of the servers that make up the cluster.

A less expensive alternative to fibre channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget, at around $1,000 to $2,000 for a two-node cluster.

Another popular solution is the Sun NFS (Network File System). It can be used for shared storage but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS.



FireWire Technology

Developed by Apple Computer and Texas Instruments, FireWire is a cross-platform implementation of a high-speed serial data bus. With its high bandwidth, long distances (up to 100 meters in length) and high-powered bus, FireWire is being used in applications such as digital video (DV), professional audio, hard drives, high-end digital still cameras and home entertainment devices. Today, FireWire operates at transfer rates of up to 800 megabits per second, while next-generation FireWire calls for a theoretical bit rate of 1600 Mbps and then up to a staggering 3200 Mbps. That's 3.2 gigabits per second. This will make FireWire indispensable for transferring massive data files and for even the most demanding video applications, such as working with uncompressed high-definition (HD) video or multiple standard-definition (SD) video streams.

The following chart shows speed comparisons of the various types of disk interface. For each interface, I provide the maximum transfer rates in kilobits (kb), kilobytes (KB), megabits (Mb), megabytes (MB), gigabits (Gb), and gigabytes (GB) per second. As you can see, the capabilities of IEEE1394 compare very favorably with other disk interface and network technologies that are currently available today.

Disk Interface / Network / BUS Speed
Interface                                     Kb      KB      Mb      MB      Gb      GB
Serial                                        115     14.375  0.115   0.014   -       -
Parallel (standard)                           920     115     0.92    0.115   -       -
10Base-T Ethernet                             -       -       10      1.25    -       -
IEEE 802.11b wireless Wi-Fi (2.4 GHz band)    -       -       11      1.375   -       -
USB 1.1                                       -       -       12      1.5     -       -
Parallel (ECP/EPP)                            -       -       24      3       -       -
SCSI-1                                        -       -       40      5       -       -
IEEE 802.11g wireless WLAN (2.4 GHz band)     -       -       54      6.75    -       -
SCSI-2 (Fast SCSI / Fast Narrow SCSI)         -       -       80      10      -       -
100Base-T Ethernet (Fast Ethernet)            -       -       100     12.5    -       -
ATA/100 (parallel)                            -       -       100     12.5    -       -
IDE                                           -       -       133.6   16.7    -       -
Fast Wide SCSI (Wide SCSI)                    -       -       160     20      -       -
Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow)  -       -       160     20      -       -
Ultra IDE                                     -       -       264     33      -       -
Wide Ultra SCSI (Fast Wide 20)                -       -       320     40      -       -
Ultra2 SCSI                                   -       -       320     40      -       -
FireWire 400 - (IEEE1394a)                    -       -       400     50      -       -
USB 2.0                                       -       -       480     60      -       -
Wide Ultra2 SCSI                              -       -       640     80      -       -
Ultra3 SCSI                                   -       -       640     80      -       -
FireWire 800 - (IEEE1394b)                    -       -       800     100     -       -
Gigabit Ethernet                              -       -       1000    125     1       -
PCI - (33 MHz / 32-bit)                       -       -       1064    133     1.064   -
Serial ATA I - (SATA I)                       -       -       1200    150     1.2     -
Wide Ultra3 SCSI                              -       -       1280    160     1.28    -
Ultra160 SCSI                                 -       -       1280    160     1.28    -
PCI - (33 MHz / 64-bit)                       -       -       2128    266     2.128   -
PCI - (66 MHz / 32-bit)                       -       -       2128    266     2.128   -
AGP 1x - (66 MHz / 32-bit)                    -       -       2128    266     2.128   -
Serial ATA II - (SATA II)                     -       -       2400    300     2.4     -
Ultra320 SCSI                                 -       -       2560    320     2.56    -
FC-AL Fibre Channel                           -       -       3200    400     3.2     -
PCI-Express x1 - (bidirectional)              -       -       4000    500     4       -
PCI - (66 MHz / 64-bit)                       -       -       4256    532     4.256   -
AGP 2x - (133 MHz / 32-bit)                   -       -       4264    533     4.264   -
Serial ATA III - (SATA III)                   -       -       4800    600     4.8     -
PCI-X - (100 MHz / 64-bit)                    -       -       6400    800     6.4     -
PCI-X - (133 MHz / 64-bit)                    -       -       -       1064    8.512   1
AGP 4x - (266 MHz / 32-bit)                   -       -       -       1066    8.528   1
10G Ethernet - (IEEE 802.3ae)                 -       -       -       1250    10      1.25
PCI-Express x4 - (bidirectional)              -       -       -       2000    16      2
AGP 8x - (533 MHz / 32-bit)                   -       -       -       2133    17.064  2.1
PCI-Express x8 - (bidirectional)              -       -       -       4000    32      4
PCI-Express x16 - (bidirectional)             -       -       -       8000    64      8
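
The bit and byte columns in the chart are related by a factor of eight (8 bits per byte). As a quick sanity check, the conversion can be sketched in the shell. The helper name mbit_to_mbyte is hypothetical and exists only for this illustration; awk is used because plain shell arithmetic does not handle fractions.

```shell
#!/bin/sh
# Convert a transfer rate in megabits per second (Mb) to
# megabytes per second (MB) by dividing by 8 bits per byte.
# mbit_to_mbyte is a hypothetical helper name, not part of any tool.
mbit_to_mbyte() {
    awk -v mbit="$1" 'BEGIN { printf "%g\n", mbit / 8 }'
}

mbit_to_mbyte 400     # FireWire 400     -> prints 50
mbit_to_mbyte 800     # FireWire 800     -> prints 100
mbit_to_mbyte 1000    # Gigabit Ethernet -> prints 125
```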



Hardware & Costs

The hardware used to build our example Oracle9i RAC environment consists of two Linux servers and components that can be purchased at any local computer store or over the Internet.

Server 1 - (linux1)
Dimension 2400 Series
     - Intel Pentium 4 Processor at 2.80GHz
     - 1GB DDR SDRAM (at 333MHz)
     - 40GB 7200 RPM Internal Hard Drive
     - Integrated Intel 3D AGP Graphics
     - Integrated 10/100 Ethernet - (Broadcom BCM4401)
     - CDROM (48X Max Variable)
     - 3.5" Floppy
     - No monitor (Already had one)
     - USB Mouse and Keyboard
$620
1 - Ethernet LAN Cards

       Linksys 10/100 Mbps - (LNE100TX) - (Used for Interconnect to linux2)

  Each Linux server should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private interconnect.

$20
1 - FireWire Card

The following is a list of FireWire I/O cards that contain the correct chipset, allow for multiple logins, and should work with this article (no guarantees however):

       Belkin FireWire 3-Port 1394 PCI Card - (F5U501-APL)
       SIIG 3-Port 1394 I/O Card - (NN-300012)
       StarTech 4 Port IEEE-1394 PCI Firewire Card - (PCI1394_4)
       Adaptec FireConnect 4300 FireWire PCI Card - (1890600)

  FireWire I/O cards with chipsets made by VIA or TI are known to work.

$30
Server 2 - (linux2)
Dimension 2400 Series
     - Intel Pentium 4 Processor at 2.80GHz
     - 1GB DDR SDRAM (at 333MHz)
     - 40GB 7200 RPM Internal Hard Drive
     - Integrated Intel 3D AGP Graphics
     - Integrated 10/100 Ethernet - (Broadcom BCM4401)
     - CDROM (48X Max Variable)
     - 3.5" Floppy
     - No monitor (Already had one)
     - USB Mouse and Keyboard
$620
1 - Ethernet LAN Cards

       Linksys 10/100 Mbps - (LNE100TX) - (Used for Interconnect to linux1)

  Each Linux server should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private interconnect.

$20
1 - FireWire Card

The following is a list of FireWire I/O cards that contain the correct chipset, allow for multiple logins, and should work with this article (no guarantees however):

       Belkin FireWire 3-Port 1394 PCI Card - (F5U501-APL)
       SIIG 3-Port 1394 I/O Card - (NN-300012)
       StarTech 4 Port IEEE-1394 PCI Firewire Card - (PCI1394_4)
       Adaptec FireConnect 4300 FireWire PCI Card - (1890600)

  FireWire I/O cards with chipsets made by VIA or TI are known to work.

$30
Miscellaneous Components
FireWire Hard Drive

The following is a list of FireWire drives (and enclosures) that contain the correct chipset, allow for multiple logins, and should work with this article (no guarantees however):

       Maxtor OneTouch III - 750GB FireWire 400/USB 2.0 Drive - (T01G750)
       Maxtor OneTouch III - 500GB FireWire 400/USB 2.0 Drive - (T01G500)
       Maxtor OneTouch III - 300GB FireWire 400/USB 2.0 Drive - (T01G300)

       Maxtor OneTouch III - 500GB FireWire 400/USB 2.0 Drive - (F01G500)
       Maxtor OneTouch III - 300GB FireWire 400/USB 2.0 Drive - (F01G300)

       Maxtor OneTouch II 300GB USB 2.0 / IEEE 1394a External Hard Drive - (E01G300)
       Maxtor OneTouch II 250GB USB 2.0 / IEEE 1394a External Hard Drive - (E01G250)
       Maxtor OneTouch II 200GB USB 2.0 / IEEE 1394a External Hard Drive - (E01A200)

       LaCie Hard Drive, Design by F.A. Porsche 250GB, FireWire 400 - (300703U)
       LaCie Hard Drive, Design by F.A. Porsche 160GB, FireWire 400 - (300702U)
       LaCie Hard Drive, Design by F.A. Porsche 80GB, FireWire 400 - (300699U)

       Dual Link Drive Kit, FireWire Enclosure, ADS Technologies - (DLX185)
           Maxtor Ultra 200GB ATA-133 (Internal) Hard Drive - (L01P200)

       Maxtor OneTouch 250GB USB 2.0 / IEEE 1394a External Hard Drive - (A01A250)
       Maxtor OneTouch 200GB USB 2.0 / IEEE 1394a External Hard Drive - (A01A200)

       Ensure that the FireWire drive that you purchase supports multiple logins. If the drive has a chipset that does not allow for concurrent access for more than one server, the disk and its partitions can only be seen by one server at a time. Disks with the Oxford 911 chipset are known to work. Here are the details about the disk that I purchased for this test:
  Vendor: Maxtor
  Model: OneTouch II
  Mfg. Part No. or KIT No.: E01G300
  Capacity: 300 GB
  Cache Buffer: 16 MB
  Rotational Speed (rpm): 7200 RPM
  Interface Transfer Rate : 400 Mbits/s
  "Combo" Interface: IEEE 1394 / USB 2.0 and USB 1.1 compatible

$280
1 - Extra FireWire Cable

Each node in the RAC configuration will need to connect to the shared storage device (the FireWire hard drive). The FireWire hard drive will come supplied with one FireWire cable. You will need to purchase one additional FireWire cable to connect the second node to the shared storage. Select the appropriate FireWire cable that is compatible with the data transmission speed (FireWire 400 / FireWire 800) and the desired cable length.

       Belkin 6-pin to 6-pin 1394 Cable, 3 ft. - (F3N400-03-ICE)
       Belkin 6-pin to 6-pin 1394 Cable, 14 ft. - (F3N400-14-ICE)

$20
1 - Ethernet hub or switch

Used for the interconnect between int-linux1 and int-linux2. A question I often receive is about substituting the Ethernet switch (used for interconnect int-linux1 / int-linux2) with a crossover CAT5 cable. I would not recommend this. I have found that when using a crossover CAT5 cable for the interconnect, whenever I took one of the PCs down, the other PC would detect a "cable unplugged" error, and thus the Cache Fusion network would become unavailable.

       Linksys EtherFast 10/100 5-port Ethernet Switch - (EZXS55W)

$25
4 - Network Cables

       Category 5e patch cable - (Connect linux1 to public network)
       Category 5e patch cable - (Connect linux2 to public network)
       Category 5e patch cable - (Connect linux1 to interconnect ethernet switch)
       Category 5e patch cable - (Connect linux2 to interconnect ethernet switch)

$5
$5
$5
$5
Total     $1685  

NOTE: I have received several emails since posting this article asking if the Maxtor OneTouch external drive (and the other external hard drives I have listed) has two IEEE1394 (FireWire) ports. All of the drives that I have listed and tested do have two IEEE1394 ports located on the back of the drive.






A Brief Walk Through the Process

Before presenting the details of building our Oracle9i RAC system, I thought it would be beneficial to take a brief walk through the steps involved in building the environment.

Our implementation describes a dual node cluster (each node with a single processor), each server running Red Hat Linux - Fedora Core 1. Note that most of the tasks within this document will need to be performed on both servers. I will indicate at the beginning of each section whether the task(s) should be performed on both nodes.

        Install Red Hat Linux / Fedora Core 1 - (on both nodes)
For this example configuration, you will be installing Red Hat Linux (Fedora Core 1) on both nodes that make up the RAC cluster.
        Configure network settings - (on both nodes)
After installing the Red Hat Linux software on both nodes, you will then need to configure the network on both nodes. This includes configuring the public network as well as the interconnect for the cluster. You should also adjust the default and maximum send buffer size settings for the interconnect for better performance when using cache fusion buffer transfers between instances. These settings will be put in your /etc/sysctl.conf file.
        Obtaining and Installing a proper Linux Kernel - (on both nodes)
In this section, we will be downloading and installing a new Linux kernel - one that supports multiple logins to the FireWire storage device. The kernel can be downloaded from Oracle's Linux Projects development group - http://oss.oracle.com. Once the new kernel is installed, there are several configuration steps required in order to load the FireWire stack.
        Create UNIX oracle user account (dba group) - (on both nodes)
We will then create an Oracle UNIX user id on all nodes within the RAC cluster. This section also provides an example login script (.bash_profile) that can be used to set all required environment variables for the oracle user.
        Creating Partitions on the Shared FireWire Storage Device - (run once only from a single node)
This is where we create the physical and logical volumes using Logical Volume Manager (LVM). Instructions will be provided on how to remove all partitions from our FireWire drive and then how to use LVM to create all of our logical partitions.
        Create RAW Bindings - (on both nodes)
After creating our logical partitions, we need to configure raw devices on our FireWire shared storage to be used for all physical Oracle database files.
        Create Symbolic Links From RAW Volumes - (on both nodes)
It is helpful to create symbolic links from the RAW volumes to human readable names to make file recognition easier. Although this step is optional, it is highly recommended.
        Configuring the Linux Servers - (on both nodes)
This section will detail the steps involved to configure both Linux machines in order to prepare them for an Oracle9i RAC install.
        Configuring the "hangcheck-timer" Kernel Module - (on both nodes)
Oracle9i RAC uses a kernel module called the hangcheck-timer to monitor the health of the cluster and to restart a RAC node in case of a failure. This section explains the steps required to configure the hangcheck-timer kernel module. Although the hangcheck-timer module is not required for Oracle Cluster Manager operation, it is highly recommended by Oracle.
        Configuring RAC Nodes for Remote Access - (on both nodes)
When installing Oracle9i RAC, the Oracle Installer will use the rsh command to copy the Oracle software to all other nodes within the RAC cluster. Included in this section are the instructions for configuring all nodes within your RAC cluster to run r* commands like rsh, rcp, and rlogin on a RAC node against other RAC nodes without a password.
        Configuring a Machine Startup Script - (on both nodes)
Up to this point, we have talked in great detail about the parameters and resources that will need to be configured on both nodes for our Oracle9i RAC configuration. This section will take a breather and recap those parameters and commands (from previous sections of this document) that need to be run on each node when the machine is cycled. Although there are several ways to do this, I simply provide a listing of the commands that you can put into a startup script (i.e. /etc/rc.local) to set up all required resources (disks, memory, etc.) each time the machine is booted. Other startup scripts are included within this section so that you can check whether you have updated all of the scripts that run when each machine in the cluster is booted.
        Update Red Hat Linux System - (on both nodes)
There are several RPMs that will need to be applied to all nodes within the RAC cluster in preparation for the Oracle install. All of the RPMs are included on the CDs for Fedora Core 1, and I also provide links to the files from this article. After applying all of the RPMs, you will then need to apply Oracle / Linux Patch 3006854. There is a link to download this patch as well. After applying all required patches, you should reboot all nodes within the RAC cluster.
        Downloading / Unpacking the Oracle9i Installation Files - (only needs to be performed from a single node)
This section includes the steps to download and unpack the Oracle9i software distribution. The software can be downloaded from http://otn.oracle.com.
        Installing Oracle9i Cluster Manager - (only needs to be performed from a single node)
Installing Oracle9i RAC is a two-step process: (1) Install the Oracle9i Cluster Manager and (2) Install the Oracle9i RDBMS software. In this section, we will go through the steps to install, configure and start the Oracle Cluster Manager software.

Keep in mind that the installation of Oracle Cluster Manager only needs to be performed on one of the nodes (the installation process will rsh the files out to all other nodes contained within the cluster), but configuring and starting the Cluster Manager needs to be performed on both nodes.

        Installing Oracle9i RAC - (only needs to be performed from a single node)
After installing Oracle Cluster Manager, it is time to install the Oracle9i RDBMS (RAC) software. This section provides many of the tasks involved in installing the software as well as many post-installation tasks that should be performed before creating the Oracle cluster database.
        Creating TNS Networking Files - (on both nodes)
This section simply provides an example listing of my listener.ora and tnsnames.ora files. The Oracle TNS listener will need to be running on both nodes within the RAC cluster before starting the database creation below. The Oracle Installer may not install the listener.ora file. If this is the case, I provide an example listener.ora and tnsnames.ora file for the RAC networking configuration.
        Creating the Oracle Database - (only needs to be performed from a single node)
After all of the software has been installed, we will now use the Oracle Database Configuration Assistant (DBCA) to create our clustered database on the shared storage (FireWire) device.
        Verifying the RAC Cluster / Database Configuration - (on both nodes)
After the Oracle Database Configuration Assistant has finished creating the clustered database, you should have a fully functional Oracle RAC cluster running. This section provides several commands and SQL queries that can be used to validate your Oracle9i RAC configuration.
        Starting & Stopping the Cluster - (only needs to be performed from a single node)
Examples will be given in this section on how to start and stop the cluster. This includes how to fully bring up or down the entire cluster, along with examples of how to bring up and shutdown individual instances within the cluster.
        Transparent Application Failover (TAF) - (on one or both nodes)
Now that we have our cluster up and running, this section provides an example of how to test the Transparent Application Failover features of Oracle9i RAC. I will demonstrate how session failover works and how to set up your TNS configuration to take advantage of TAF.



Why Fedora Core 1 and not Fedora Core 2?

I made a significant effort to get this configuration to work with Red Hat Linux - Fedora Core 2 with no success. The primary reason this configuration did not work was due to incompatibilities with the modified 2.4 Linux kernel that needs to be downloaded and applied (provided by Oracle's Linux Projects development group) and the way kernel modules are handled in Fedora Core 2.

Fedora Core 2 includes the 2.6 kernel and uses the file /etc/modprobe.conf exclusively to configure kernel modules. This is unlike how kernel modules were handled in the 2.4 kernel, where configuration information was contained in both the /etc/modprobe.conf and /etc/modules.conf configuration files. Starting with the 2.6 kernel, the /etc/modules.conf file has been phased out and only /etc/modprobe.conf is used.

The format and syntax of these two files are similar but not identical. When the 2.4 Linux kernel boots, it will look for a file called /etc/modules.conf. I tried to copy the /etc/modprobe.conf file to /etc/modules.conf and take out everything but the bare essentials for the server to work (i.e. network settings). No matter how I tried to configure /etc/modules.conf for the Linux 2.4 kernel, I could not get it to accept the settings for networking and the ieee1394-controller. After several attempts at getting this configuration to work, I decided to use Fedora Core 1, since it uses the 2.4 kernel and makes applying the modified 2.4 kernel an easy task.



Install Red Hat Linux - (Fedora Core 1)

After procuring the required hardware, it is time to start the configuration process. The first step in the process is to install the Red Hat Linux - Fedora Core 1 software on both servers.

You can download the Red Hat Fedora Core 1 ISO files from the following location:

http://download.fedora.redhat.com/pub/fedora/linux/core/1/i386/iso/

NOTE: This article does not provide detailed instructions for installing Red Hat Linux - Fedora Core 1. For the purpose of this article, I chose to perform a Custom installation and then "Install Everything" when prompted for which products to install.

Documentation for installing Red Hat Linux can be found on their website at http://www.redhat.com/docs/manuals/.



Network Configuration


Configuring Public and Private Network

Let's start our Oracle RAC Linux configuration by ensuring the correct network configuration. In our two node example, we will need to configure the network on both nodes.

The easiest way to configure network settings in Red Hat Linux is with the Network Configuration program. This application can be started from the command line as the "root" user as follows:

# su -
# /usr/bin/redhat-config-network &

NOTE:   Do not use DHCP, as the interconnects need fixed (static) IP addresses!
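
One way to double-check that an interface is not configured for DHCP is to inspect its interface configuration file. The sketch below assumes the usual Red Hat layout (/etc/sysconfig/network-scripts/ifcfg-<device> with BOOTPROTO and IPADDR keys); the function name check_static_ip is hypothetical and used only for illustration.

```shell
#!/bin/sh
# check_static_ip FILE
# Returns 0 if FILE does not request DHCP/BOOTP and defines IPADDR,
# 1 if it looks dynamic or incomplete, 2 if it cannot be read.
# (Hypothetical helper; the ifcfg path below is an assumption.)
check_static_ip() {
    cfg="$1"
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 2
    fi
    # A BOOTPROTO of dhcp or bootp means the address is assigned dynamically
    if grep -Eiq '^[[:space:]]*BOOTPROTO="?(dhcp|bootp)' "$cfg"; then
        echo "$cfg: uses DHCP/BOOTP"
        return 1
    fi
    # A static configuration should carry an explicit IPADDR line
    if ! grep -Eq '^[[:space:]]*IPADDR=' "$cfg"; then
        echo "$cfg: no IPADDR defined"
        return 1
    fi
    echo "$cfg: static address configured"
    return 0
}
```

For example, running check_static_ip /etc/sysconfig/network-scripts/ifcfg-eth1 on each node would confirm the interconnect device before moving on.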

Using the Network Configuration application, you will need to configure both NIC devices as well as the /etc/hosts file. Both of these tasks can be completed using the Network Configuration GUI. Notice that the /etc/hosts settings are the same for both nodes.

Our example configuration will use the following settings:

Server 1 - (linux1)
Device  IP Address     Subnet         Gateway      Purpose
eth0    192.168.1.100  255.255.255.0  192.168.1.1  Connects linux1 to the public network
eth1    192.168.2.100  255.255.255.0               Connects linux1 (interconnect) to linux2 (int-linux2)
/etc/hosts
127.0.0.1        localhost      loopback
192.168.1.100    linux1
192.168.2.100    int-linux1
192.168.1.101    linux2
192.168.2.101    int-linux2

Server 2 - (linux2)
Device  IP Address     Subnet         Gateway      Purpose
eth0    192.168.1.101  255.255.255.0  192.168.1.1  Connects linux2 to the public network
eth1    192.168.2.101  255.255.255.0               Connects linux2 (interconnect) to linux1 (int-linux1)
/etc/hosts
127.0.0.1        localhost      loopback
192.168.1.100    linux1
192.168.2.100    int-linux1
192.168.1.101    linux2
192.168.2.101    int-linux2
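
Since the /etc/hosts entries must be identical on both nodes, a small check can help catch a missing or mistyped line. The following is a minimal sketch that looks for the four address / name pairs used in this article; the function name check_hosts is hypothetical, and the file to check is passed as an argument (defaulting to /etc/hosts).

```shell
#!/bin/sh
# check_hosts [FILE]
# Reports every expected "IP name" pair from this article that is
# not present in FILE. Returns 0 if all are found, 1 otherwise.
# (Hypothetical helper written for this example.)
check_hosts() {
    hosts_file="${1:-/etc/hosts}"
    missing=0
    while read -r ip name; do
        # A line matches when its first field is the IP address and
        # one of the remaining fields is exactly the host name
        if ! awk -v ip="$ip" -v name="$name" '
                $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
                END { exit !found }' "$hosts_file"; then
            echo "missing: $ip $name"
            missing=1
        fi
    done <<EOF
192.168.1.100 linux1
192.168.2.100 int-linux1
192.168.1.101 linux2
192.168.2.101 int-linux2
EOF
    return $missing
}
```

Running check_hosts on linux1 and again on linux2 should print nothing and return 0 on both nodes.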

In the screen shots below, only node 1 (linux1) is shown. Be sure to apply the proper network settings to both nodes.



Network Configuration Screen - Node 1 (linux1)



Ethernet Device Screen - eth0 (linux1)



Ethernet Device Screen - eth1 (linux1)



Network Configuration Screen - /etc/hosts (linux1)


Adjusting Network Settings
With Oracle 9.2.0.1 and onwards, Oracle makes use of UDP as the default protocol on Linux for interprocess communication (IPC), such as cache fusion buffer transfers between instances within the RAC cluster.

Oracle strongly suggests adjusting the default and maximum send buffer size (SO_SNDBUF socket option) to 256 KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256 KB.

The receive buffers are used by TCP and UDP to hold received data until it is read by the application. With TCP, the receive buffer cannot overflow because the peer is not allowed to send data beyond the buffer size window. UDP has no such flow control, so datagrams are discarded if they do not fit in the socket receive buffer, and a fast sender can overwhelm the receiver.

NOTE: The default and maximum window size can be changed in the /proc file system without reboot:
su - root

# Default setting in bytes of the socket receive buffer
sysctl -w net.core.rmem_default=262144

# Default setting in bytes of the socket send buffer
sysctl -w net.core.wmem_default=262144

# Maximum socket receive buffer size which may be set by using
# the SO_RCVBUF socket option
sysctl -w net.core.rmem_max=262144

# Maximum socket send buffer size which may be set by using 
# the SO_SNDBUF socket option
sysctl -w net.core.wmem_max=262144
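
To confirm that the four values took effect, the live settings can be read back from /proc. This is a small sketch under the assumption that the kernel exposes them under /proc/sys/net/core (the usual location for net.core sysctls); the function name check_udp_buffers is hypothetical.

```shell
#!/bin/bash
# check_udp_buffers [DIR]: compare the four live buffer settings found under
# DIR (default /proc/sys/net/core) with the 262144 bytes suggested above.
# NOTE: check_udp_buffers is an illustrative helper, not a standard tool.
check_udp_buffers() {
    local dir="${1:-/proc/sys/net/core}" want=262144 rc=0 key have
    for key in rmem_default wmem_default rmem_max wmem_max; do
        have=$(cat "$dir/$key" 2>/dev/null)
        if [ "$have" = "$want" ]; then
            echo "ok     net.core.$key = $have"
        else
            echo "differ net.core.$key = ${have:-unreadable} (wanted $want)"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling check_udp_buffers with no argument checks the running system; the return status is non-zero if any value differs.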

You should make the above changes permanent by adding the following lines to the /etc/sysctl.conf file for each node in your RAC cluster:

net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=262144
net.core.wmem_max=262144
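
If you prefer to script this step, the sketch below appends only the settings that are not already present, so it can be run more than once without creating duplicates. The function name persist_udp_buffers is hypothetical, and the sketch deliberately leaves any existing (possibly different) value untouched rather than rewriting it.

```shell
#!/bin/bash
# persist_udp_buffers [FILE]: append each of the four net.core settings to
# FILE (default /etc/sysctl.conf) unless FILE already has an uncommented
# entry for that key.
# NOTE: persist_udp_buffers is an illustrative helper, not a standard tool.
persist_udp_buffers() {
    local conf="${1:-/etc/sysctl.conf}" key
    for key in net.core.rmem_default net.core.wmem_default \
               net.core.rmem_max net.core.wmem_max; do
        # The dots in the key act as regex wildcards, which is harmless here.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf" 2>/dev/null; then
            echo "${key}=262144" >> "$conf"
        fi
    done
}
```

Run persist_udp_buffers as root on each node, then load the file with sysctl -p.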



Obtaining and Installing a proper Linux Kernel


Overview

The next step is to obtain and install a new Linux kernel that supports the use of IEEE1394 devices with multiple logins. In previous releases of this article, I included the steps to download a patched version of the Linux kernel (source code) and then compile it. Thanks to Oracle's Linux Projects development group, this is no longer a requirement. They provide a pre-compiled kernel for Red Hat Enterprise Linux 3.0 (which also works with Fedora!) that can simply be downloaded and installed. The instructions for downloading and installing the kernel are included in this section. Before going into the details of how to perform these actions, however, let's take a moment to discuss the changes that are required in the new kernel.

While FireWire drivers already exist for Linux, they often do not support shared storage. Normally, when you log on to an OS, the OS associates the driver with a specific drive for that machine alone. This implementation simply will not work for our RAC configuration. The shared storage (our FireWire hard drive) needs to be accessed by more than one node. We need to enable the FireWire driver to provide nonexclusive access to the drive so that multiple servers - the nodes that comprise the cluster - will be able to access the same storage. This is accomplished by removing, in the source code, the bit mask that identifies the machine during login. The result is nonexclusive access to the FireWire hard drive. All other nodes in the cluster log in to the same drive during their logon session using the same modified driver, so they also have nonexclusive access to the drive.

I'm probably getting ahead of myself, but I want to cover several topics before diving into the details of installing our new Linux kernel. Once we install our new Linux kernel (one that supports multiple logons to the FireWire drive), the system will detect and recognize the FireWire-attached drive as a SCSI device. You will be able to use standard OS tools to partition the disk, create a file system, and so on. For Oracle9i RAC, you must make partitions for all the files and bind raw devices to those partitions. This article will make use of Logical Volume Manager (LVM) to create all needed partitions (which LVM calls logical volumes) on the FireWire shared drive.

Our implementation describes a dual node cluster (each with a single processor), each server running Red Hat Linux - Fedora Core 1. Keep in mind that the process of installing the patched Linux kernel will need to be performed on both Linux nodes. Red Hat Linux - Fedora Core 1 includes kernel linux-2.4.22-1.2115.nptl. We will need to download the Oracle Technet Supplied 2.4.21-27.0.2.ELorafw1 Linux kernel from the following URL: http://oss.oracle.com/projects/firewire/files.

NOTE: In previous articles, I provided instructions for downloading and installing the Technet Supplied 2.4.21-9.0.1 Linux kernel and even an update to the article that used the 2.4.21-15.ELorafw1 Linux kernel. Both of these Technet Supplied kernels are no longer available. It is advisable to use the newer 2.4.21-27.0.2 version.


  Perform the following procedures on both nodes in the cluster!


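Download one of the following files:

  • kernel-2.4.21-27.0.2.ELorafw1.i686.rpm - (for single processor)
  • kernel-smp-2.4.21-27.0.2.ELorafw1.i686.rpm - (for multiple processors)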

Take a backup of your GRUB configuration file:

In most cases you will be using GRUB for your boot loader. Before installing the new kernel, be sure to back up your /etc/grub.conf file:
# cp /etc/grub.conf /etc/grub.conf.original


Install the new kernel as the root user:

# rpm -ivh --force kernel-2.4.21-27.0.2.ELorafw1.i686.rpm   - (for single processor)
  - OR -
# rpm -ivh --force kernel-smp-2.4.21-27.0.2.ELorafw1.i686.rpm   - (for multiple processors)

NOTE: Installing the new kernel using RPM will also update your grub or lilo configuration with the appropriate stanza. There is no need to add any new stanza to your boot loader configuration unless you want to have your old kernel image available.

The following is a listing of my /etc/grub.conf file before and after the kernel install. As you can see, the install added another stanza for the 2.4.21-27.0.2.ELorafw1 kernel. If you want, you can change the default entry in the new file so that the new kernel is the one booted by default. By default, the installer keeps your old kernel as the default by setting default=1.

Original /etc/grub.conf File for Fedora Core 1
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda3
#          initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Fedora Core (2.4.22-1.2115.nptl)
      root (hd0,0)
      kernel /vmlinuz-2.4.22-1.2115.nptl ro root=LABEL=/ rhgb
      initrd /initrd-2.4.22-1.2115.nptl.img

Newly Configured /etc/grub.conf File for Fedora Core 1 After Kernel Install
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda3
#          initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Fedora Core (2.4.21-27.0.2.ELorafw1)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-27.0.2.ELorafw1 ro root=LABEL=/ rhgb
        initrd /initrd-2.4.21-27.0.2.ELorafw1.img
title Fedora Core (2.4.22-1.2115.nptl)
        root (hd0,0)
        kernel /vmlinuz-2.4.22-1.2115.nptl ro root=LABEL=/ rhgb
        initrd /initrd-2.4.22-1.2115.nptl.img
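
Because default=N is a zero-based index into the title lines, it is easy to misread which kernel will actually boot. The sketch below prints the title that the default entry selects; the function name grub_default_title is hypothetical, and the parsing assumes the simple layout shown in the listings above (one default= line, stanzas introduced by title).

```shell
#!/bin/bash
# grub_default_title [FILE]: print the title of the stanza that GRUB boots by
# default. "default=N" counts title lines starting from 0.
# NOTE: grub_default_title is an illustrative helper, not part of GRUB.
grub_default_title() {
    awk '
        /^default=/ { def = substr($0, 9) + 0; have_def = 1 }
        /^[ \t]*title[ \t]/ {
            line = $0
            sub(/^[ \t]*title[ \t]+/, "", line)   # keep only the title text
            titles[n++] = line
        }
        END {
            if (have_def && (def in titles)) print titles[def]
            else exit 1
        }
    ' "${1:-/etc/grub.conf}"
}
```

With the newly configured listing above (default=0 and the ELorafw1 stanza first), grub_default_title /etc/grub.conf prints the ELorafw1 title; with default=1 it would print the original 2.4.22-1.2115.nptl title.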


Add module options:

Add the following lines to /etc/modules.conf:

options sbp2 sbp2_exclusive_login=0
post-install sbp2 insmod sd_mod
post-remove sbp2 rmmod sd_mod

It is vital that the parameter sbp2_exclusive_login of the Serial Bus Protocol module (sbp2) be set to zero to allow multiple hosts to log in to and access the FireWire disk concurrently. The second line ensures the SCSI disk driver module (sd_mod) is loaded as well, since sbp2 requires the SCSI layer; the third line removes sd_mod again when sbp2 is unloaded. The core SCSI support module (scsi_mod) will be loaded automatically if sd_mod is loaded - there is no need to make a separate entry for it.


Reboot machine:

Reboot your machine into the new kernel.
Ensure the FireWire (IEEE 1394) PCI cards are plugged into the machine!
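
After the reboot, it is worth confirming that the machine really came up on the patched kernel rather than the original one. The following is a minimal sketch; the function name is_orafw_kernel is hypothetical, and it assumes the release string reported by uname -r contains "ELorafw", as the package names above suggest.

```shell
#!/bin/bash
# is_orafw_kernel RELEASE: succeed if RELEASE looks like the Oracle-supplied
# FireWire kernel (single processor or smp build).
# NOTE: is_orafw_kernel is an illustrative helper, not a standard tool.
is_orafw_kernel() {
    case "$1" in
        *ELorafw*) return 0 ;;
        *)         return 1 ;;
    esac
}

if is_orafw_kernel "$(uname -r)"; then
    echo "Running patched kernel: $(uname -r)"
else
    echo "WARNING: running $(uname -r); check the default entry in /etc/grub.conf"
fi
```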


Loading the FireWire stack:

NOTE: With Fedora Core 1, the loading of the FireWire stack should already be configured!

In most cases, the loading of the FireWire stack will already be configured in the /etc/rc.sysinit file. The commands that are contained within this file that are responsible for loading the FireWire stack are:

# modprobe ohci1394
# modprobe sbp2

In older versions of Red Hat, this was not the case: these commands had to be run manually or placed in a startup file. With Fedora Core 1 and higher, they are already in the /etc/rc.sysinit file and run on each boot.


Rescan SCSI bus:

NOTE: With Fedora Core 1, you no longer need to rescan the SCSI bus in order to detect the disk! The disk should be detected automatically by the kernel.

In older versions of the kernel, I would need to run the rescan-scsi-bus.sh script in order to detect the FireWire drive. The purpose of this script was to create the SCSI entry for the node by using the following command:

echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi

With Fedora Core 1, the disk should be detected automatically.


Check for SCSI Device:

After you have rebooted the machine, the kernel should automatically detect the disk as a SCSI device (/dev/sdXX). This section will provide several commands that should be run on both nodes in the cluster to ensure the FireWire drive was successfully detected.

For this configuration, I performed the above procedures on both nodes at the same time. When complete, I shut down both machines, started linux1 first, and then linux2. The following commands and results are from my linux2 machine. Again, make sure that you run the following commands on both nodes to ensure both machines can log in to the shared drive.

Let's first check to see that the FireWire adapter was successfully detected:

# lspci
00:00.0 Host bridge: Intel Corp. 82845G/GL [Brookdale-G] Chipset Host Bridge (rev 01)
00:02.0 VGA compatible controller: Intel Corp. 82845G/GL [Brookdale-G] Chipset Integrated Graphics Device (rev 01)
00:1d.0 USB Controller: Intel Corp. 82801DB USB (Hub #1) (rev 01)
00:1d.1 USB Controller: Intel Corp. 82801DB USB (Hub #2) (rev 01)
00:1d.2 USB Controller: Intel Corp. 82801DB USB (Hub #3) (rev 01)
00:1d.7 USB Controller: Intel Corp. 82801DB USB2 (rev 01)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corp. 82801DB LPC Interface Controller (rev 01)
00:1f.1 IDE interface: Intel Corp. 82801DB Ultra ATA Storage Controller (rev 01)
00:1f.3 SMBus: Intel Corp. 82801DB/DBM SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corp. 82801DB AC'97 Audio Controller (rev 01)
01:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
01:05.0 Modem: Intel Corp.: Unknown device 1080 (rev 04)
01:06.0 Ethernet controller: Linksys Network Everywhere Fast Ethernet 10/100 model NC100 (rev 11)
01:09.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)

Second, let's check to see that the modules are loaded:
# lsmod |egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
sd_mod                 13424   0
sbp2                   19724   0
scsi_mod              104616   3  [sg sd_mod sbp2]
ohci1394               28008   0  (unused)
ieee1394               62884   0  [sbp2 ohci1394]

Third, let's make sure the disk was detected and an entry was made by the kernel:
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: Maxtor   Model: OneTouch         Rev: 0200
  Type:   Direct-Access                    ANSI SCSI revision: 06

Now let's ensure the FireWire drive is accessible for multiple logins and shows a valid login:
# dmesg | grep sbp2
ieee1394: sbp2: Query logins to SBP-2 device successful
ieee1394: sbp2: Maximum concurrent logins supported: 3
ieee1394: sbp2: Number of active logins: 1
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node[01:1023]: Max speed [S400] - Max payload [2048]

From the above output, you can see that the FireWire drive we have can support concurrent logins by up to 3 servers. It is vital that you have a drive where the chipset supports concurrent access for all nodes within the RAC cluster.
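
The same check can be scripted so that it compares the drive's login limit with the number of nodes in the cluster. This is a sketch only: the function name max_sbp2_logins and the NODES variable are hypothetical, and the pattern assumes the sbp2 driver logs the limit in exactly the wording shown in the dmesg output above.

```shell
#!/bin/bash
# max_sbp2_logins: read kernel log text on stdin and print the most recently
# reported "Maximum concurrent logins supported" value (nothing if absent).
# NOTE: max_sbp2_logins is an illustrative helper, not a standard tool.
max_sbp2_logins() {
    sed -n 's/.*sbp2: Maximum concurrent logins supported: \([0-9][0-9]*\).*/\1/p' | tail -n 1
}

NODES=2   # number of RAC nodes in this article's cluster

limit=$(dmesg 2>/dev/null | max_sbp2_logins)
if [ -z "$limit" ]; then
    echo "no sbp2 login information found in dmesg"
elif [ "$limit" -ge "$NODES" ]; then
    echo "drive supports $limit concurrent logins: enough for $NODES nodes"
else
    echo "drive supports only $limit concurrent logins; $NODES are needed"
fi
```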


Troubleshooting SCSI Device Detection:

If you have trouble detecting the SCSI device with any of the procedures above, try unloading and reloading the modules:
# modprobe -r sbp2
# modprobe -r sd_mod
# modprobe -r ohci1394
# modprobe ohci1394
# modprobe sd_mod
# modprobe sbp2



Create "oracle" User and Directories


  Perform the following procedures on both nodes in the cluster!


Let's continue our example by creating the UNIX dba group and oracle userid along with all appropriate directories.

# mkdir /u01
# mkdir /u01/app

# groupadd -g 115 dba

# useradd -u 175 -g 115 -d /u01/app/oracle -s /bin/bash -c "Oracle Software Owner" oracle
# passwd oracle

NOTE: When you are setting the Oracle environment variables for each RAC node, be sure to assign each RAC node a unique Oracle SID!

For this example, I used:

  • linux1 : ORACLE_SID=orcl1
  • linux2 : ORACLE_SID=orcl2
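
One way to avoid maintaining a different login script on each node is to derive the SID from the host name. The sketch below is an optional convenience, not something the original setup requires: the function name sid_for_host is hypothetical, and it assumes the linuxN / orclN naming convention used in this article.

```shell
#!/bin/bash
# sid_for_host HOST: map a host named linuxN to the SID orclN.
# Any domain suffix is ignored; other host names are rejected.
# NOTE: sid_for_host is an illustrative helper based on this article's naming.
sid_for_host() {
    local host="${1%%.*}"        # drop any domain suffix
    case "$host" in
        linux[0-9]*) echo "orcl${host#linux}" ;;
        *)           return 1 ;;
    esac
}
```

In a login script it could be used as: export ORACLE_SID=$(sid_for_host "$(hostname)")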

NOTE: The Oracle Universal Installer (OUI) requires up to 400 MB of free space in the /tmp directory.

You can check the available space in /tmp by running the following command:

# df -k /tmp
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda3             36384656   6224240  28312140  19% /

If, for any reason, you do not have enough space in /tmp, you can temporarily create space in another file system and point TEMP and TMPDIR to it for the duration of the install. Here are the steps to do this:
# su -
# mkdir /<AnotherFilesystem>/tmp
# chown root:root /<AnotherFilesystem>/tmp
# chmod 1777 /<AnotherFilesystem>/tmp
# export TEMP=/<AnotherFilesystem>/tmp     # used by Oracle
# export TMPDIR=/<AnotherFilesystem>/tmp   # used by Linux programs
                                           #   like the linker "ld"

When the installation of Oracle is complete, you can remove the temporary directory using the following:
# su -
# rmdir /<AnotherFilesystem>/tmp
# unset TEMP
# unset TMPDIR
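
The free-space check described above can also be scripted. This is a minimal sketch: the function name has_free_mb is hypothetical, and the 400 MB figure is simply the number quoted in the note above.

```shell
#!/bin/bash
# has_free_mb DIR MB: succeed if the file system holding DIR has at least
# MB megabytes available. Uses the portable "df -P" output format, where the
# fourth column of the second line is the available space in 1K blocks.
# NOTE: has_free_mb is an illustrative helper, not a standard tool.
has_free_mb() {
    local avail_kb
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}

if has_free_mb /tmp 400; then
    echo "/tmp has enough free space for the installer"
else
    echo "/tmp is short on space; point TEMP and TMPDIR at another file system"
fi
```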

After creating the "oracle" UNIX userid on both nodes, ensure that the environment is set up correctly by using the following .bash_profile:

.bash_profile for Oracle User
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
      . ~/.bashrc
fi

alias ls="ls -FA"

# User specific environment and startup programs
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/9.2.0

# Each RAC node must have a unique ORACLE_SID. (i.e. orcl1, orcl2,...)
export ORACLE_SID=orcl1

export PATH=.:${PATH}:$HOME/bin:$ORACLE_HOME/bin
export PATH=${PATH}:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
export ORACLE_TERM=xterm
export TNS_ADMIN=$ORACLE_HOME/network/admin
export ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export CLASSPATH=$ORACLE_HOME/JRE
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/rdbms/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
export THREADS_FLAG=native
export TEMP=/tmp
export TMPDIR=/tmp
export LD_ASSUME_KERNEL=2.4.1



Creating Partitions on the Shared FireWire Storage Device


  Only perform the following procedures for the shared storage on one RAC node!!



Overview

It is time to create the physical and logical volumes to be used by the Logical Volume Manager (LVM). For a more detailed view of managing the LVM, see my article [Managing Physical & Logical Volumes]. The following table lists the mappings of logical partition to tablespace that we will be accomplishing in this section of the document:

Logical Volume / RAW / File / Tablespace Mappings
Logical Volume   RAW Volume      Symbolic Link                                        Tablespace / File Name       Tablespace / File Size  Partition Size
/dev/pv1/lvol1   /dev/raw/raw1   /u01/app/oracle/oradata/orcl/CMQuorumFile            Cluster Manager Quorum File  -                       5 MB
/dev/pv1/lvol2   /dev/raw/raw2   /u01/app/oracle/oradata/orcl/SharedSrvctlConfigFile  Shared Configuration File    -                       100 MB
/dev/pv1/lvol3   /dev/raw/raw3   /u01/app/oracle/oradata/orcl/spfileorcl.ora          Server Parameter File        -                       10 MB
/dev/pv1/lvol4   /dev/raw/raw4   /u01/app/oracle/oradata/orcl/control01.ctl           Control File 1               -                       200 MB
/dev/pv1/lvol5   /dev/raw/raw5   /u01/app/oracle/oradata/orcl/control02.ctl           Control File 2               -                       200 MB
/dev/pv1/lvol6   /dev/raw/raw6   /u01/app/oracle/oradata/orcl/control03.ctl           Control File 3               -                       200 MB
/dev/pv1/lvol7   /dev/raw/raw7   /u01/app/oracle/oradata/orcl/cwmlite01.dbf           CWMLITE                      50 MB                   55 MB
/dev/pv1/lvol8   /dev/raw/raw8   /u01/app/oracle/oradata/orcl/drsys01.dbf             DRSYS                        20 MB                   25 MB
/dev/pv1/lvol9   /dev/raw/raw9   /u01/app/oracle/oradata/orcl/example01.dbf           EXAMPLE                      250 MB                  255 MB
/dev/pv1/lvol10  /dev/raw/raw10  /u01/app/oracle/oradata/orcl/indx01.dbf              INDX                         100 MB                  105 MB
/dev/pv1/lvol11  /dev/raw/raw11  /u01/app/oracle/oradata/orcl/odm01.dbf               ODM                          50 MB                   55 MB
/dev/pv1/lvol12  /dev/raw/raw12  /u01/app/oracle/oradata/orcl/system01.dbf            SYSTEM                       800 MB                  805 MB
/dev/pv1/lvol13  /dev/raw/raw13  /u01/app/oracle/oradata/orcl/temp01.dbf              TEMP                         250 MB                  255 MB
/dev/pv1/lvol14  /dev/raw/raw14  /u01/app/oracle/oradata/orcl/tools01.dbf             TOOLS                        100 MB                  105 MB
/dev/pv1/lvol15  /dev/raw/raw15  /u01/app/oracle/oradata/orcl/undotbs01.dbf           UNDOTBS1                     400 MB                  405 MB
/dev/pv1/lvol16  /dev/raw/raw16  /u01/app/oracle/oradata/orcl/undotbs02.dbf           UNDOTBS2                     400 MB                  405 MB
/dev/pv1/lvol17  /dev/raw/raw17  /u01/app/oracle/oradata/orcl/users01.dbf             USERS                        100 MB                  105 MB
/dev/pv1/lvol18  /dev/raw/raw18  /u01/app/oracle/oradata/orcl/xdb01.dbf               XDB                          150 MB                  155 MB
/dev/pv1/lvol19  /dev/raw/raw19  /u01/app/oracle/oradata/orcl/perfstat01.dbf          PERFSTAT                     100 MB                  105 MB
/dev/pv1/lvol20  /dev/raw/raw20  /u01/app/oracle/oradata/orcl/redo01.log              REDO G1 / M1                 100 MB                  105 MB
/dev/pv1/lvol21  /dev/raw/raw21  /u01/app/oracle/oradata/orcl/redo02.log              REDO G2 / M1                 100 MB                  105 MB
/dev/pv1/lvol22  /dev/raw/raw22  /u01/app/oracle/oradata/orcl/redo03.log              REDO G3 / M1                 100 MB                  105 MB
/dev/pv1/lvol23  /dev/raw/raw23  /u01/app/oracle/oradata/orcl/orcl_redo2_2.log        REDO G4 / M1                 100 MB                  105 MB


Remove All Partitions on FireWire Shared Storage

In this example, I will be using the entire FireWire disk (no partitions), so I will use /dev/sda to create the physical and logical volumes. This is not the only way to create our LVM environment. We could also create a Linux LVM partition (type 8e) on the disk. If, say, the LVM partition were the first partition created on the disk, we would then need to work with /dev/sda1. Again, in this example, I will be using the entire FireWire drive (with no partitions) and therefore accessing /dev/sda. Before creating our physical and logical volumes, it is important to remove any existing partitions on the FireWire drive (since we will be using the entire disk) by using the fdisk command:
# fdisk /dev/sda
Command (m for help): p

Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1     24791 199133676    c  Win95 FAT32 (LBA)

Command (m for help): d
Selected partition 1

Command (m for help): p

Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Create Logical Volumes

The following set of commands perform the steps required to create logical volumes:

  1. Run the vgscan command (on all RAC nodes within the cluster) in order to create the /etc/lvmtab file.
  2. Use pvcreate to create a physical volume for use by the Logical Volume Manager (LVM).
  3. Use vgcreate to create a volume group for the drive or for the partition you want to use for RAW devices. Here we use the entire single drive. In our example (below), the options allow up to 256 logical volumes (-l) and 256 physical volumes (-p) in the volume group, with a 128 KB physical extent size (-s).
  4. Use lvcreate to create the logical volumes inside the volume group.

NOTE: As mentioned above, I needed to run the vgscan command on all nodes so that it could create the /etc/lvmtab file. This should be performed before running the commands below.

Put the following commands in a shell script, make it executable, and then run it as the "root" UNIX userid:

vgscan
pvcreate -d /dev/sda
vgcreate -l 256 -p 256 -s 128k /dev/pv1 /dev/sda
lvcreate -L 5m   /dev/pv1       # CMQuorumFile
lvcreate -L 100m /dev/pv1       # SharedSrvctlConfigFile
lvcreate -L 10m  /dev/pv1       # spfileorcl.ora
lvcreate -L 200m /dev/pv1       # control01.ctl
lvcreate -L 200m /dev/pv1       # control02.ctl
lvcreate -L 200m /dev/pv1       # control03.ctl
lvcreate -L 55m  /dev/pv1       # cwmlite01.dbf     (50 MB)
lvcreate -L 25m  /dev/pv1       # drsys01.dbf       (20 MB)
lvcreate -L 255m /dev/pv1       # example01.dbf     (250 MB)
lvcreate -L 105m /dev/pv1       # indx01.dbf        (100 MB)
lvcreate -L 55m  /dev/pv1       # odm01.dbf         (50 MB)
lvcreate -L 805m /dev/pv1       # system01.dbf      (800 MB)
lvcreate -L 255m /dev/pv1       # temp01.dbf        (250 MB)
lvcreate -L 105m /dev/pv1       # tools01.dbf       (100 MB)
lvcreate -L 405m /dev/pv1       # undotbs01.dbf     (400 MB)
lvcreate -L 405m /dev/pv1       # undotbs02.dbf     (400 MB)
lvcreate -L 105m /dev/pv1       # users01.dbf       (100 MB)
lvcreate -L 155m /dev/pv1       # xdb01.dbf         (150 MB)
lvcreate -L 105m /dev/pv1       # perfstat01.dbf    (100 MB)
lvcreate -L 105m /dev/pv1       # redo01.log        (100 MB)
lvcreate -L 105m /dev/pv1       # redo02.log        (100 MB)
lvcreate -L 105m /dev/pv1       # redo03.log        (100 MB)
lvcreate -L 105m /dev/pv1       # orcl_redo2_2.log  (100 MB)

Using the script (above) will result in the creation of /dev/pv1/lvol1 - /dev/pv1/lvol23.

I typically use the lvscan command to check the status of my logical volumes:

[root@linux2 root]# lvscan
lvscan -- ACTIVE            "/dev/pv1/lvol1" [5 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol2" [100 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol3" [10 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol4" [200 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol5" [200 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol6" [200 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol7" [55 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol8" [25 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol9" [255 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol10" [105 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol11" [55 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol12" [805 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol13" [255 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol14" [105 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol15" [405 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol16" [405 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol17" [105 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol18" [155 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol19" [105 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol20" [105 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol21" [105 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol22" [105 MB]
lvscan -- ACTIVE            "/dev/pv1/lvol23" [105 MB]
lvscan -- 23 logical volumes with 3.88 GB total in 1 volume group
lvscan -- 23 active logical volumes
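
As a sanity check, the sizes requested in the script can be added up and compared with the total that lvscan reports. The 23 sizes above sum to 3970 MB, and 3970 / 1024 is roughly 3.88 GB, which matches the lvscan summary line. The sketch below does the same arithmetic from a script file; the function name total_lv_mb is hypothetical, and it assumes every size is given in megabytes with the "-L <n>m" form used above.

```shell
#!/bin/bash
# total_lv_mb: read lvcreate commands on stdin and print the sum of their
# "-L <n>m" sizes in megabytes. Lines that are not lvcreate commands are
# ignored.
# NOTE: total_lv_mb is an illustrative helper, not part of LVM.
total_lv_mb() {
    awk '$1 == "lvcreate" && $2 == "-L" {
             size = $3
             sub(/[mM]$/, "", size)   # strip the megabyte suffix
             total += size
         }
         END { print total + 0 }'
}
```

For example, if the commands were saved in a file (the name create_lvols.sh is only a placeholder), run: total_lv_mb < create_lvols.sh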

Reboot All Nodes in RAC Cluster

After you have finished creating the partitions, it is recommended that you reboot all RAC nodes to make sure that the new partitions are recognized by the kernel on every node:
# su -
# reboot

IMPORTANT: Keep in mind that you will need to put a call to vgscan, followed by vgchange -a y, in one of your startup scripts so that they are run at boot time on each machine in your RAC cluster. vgscan rebuilds the volume manager database, and vgchange -a y then activates all volume groups. This document will provide all settings that should go into your /etc/rc.local script in order to set up each node within your Oracle9i RAC cluster.



Create RAW Bindings


  Perform the following procedures on both nodes in the cluster!


NOTE: Several of the commands within this section will need to be performed on every node within the cluster every time that machine is booted. Details of these commands and instructions for placing them in a startup script are included in section "All Startup Commands for Each RAC Node".

In this section, I will provide the instructions for configuring raw devices on our FireWire shared storage to be used for all physical Oracle database files including the Cluster Manager Quorum File and the Shared Configuration File for srvctl.

At this point, we have already created the partitions required on our FireWire shared storage - we now need to bind each logical volume to a raw device by using the raw command:

/usr/bin/raw /dev/raw/raw1 /dev/pv1/lvol1
/usr/bin/raw /dev/raw/raw2 /dev/pv1/lvol2
/usr/bin/raw /dev/raw/raw3 /dev/pv1/lvol3
/usr/bin/raw /dev/raw/raw4 /dev/pv1/lvol4
/usr/bin/raw /dev/raw/raw5 /dev/pv1/lvol5
/usr/bin/raw /dev/raw/raw6 /dev/pv1/lvol6
/usr/bin/raw /dev/raw/raw7 /dev/pv1/lvol7
/usr/bin/raw /dev/raw/raw8 /dev/pv1/lvol8
/usr/bin/raw /dev/raw/raw9 /dev/pv1/lvol9
/usr/bin/raw /dev/raw/raw10 /dev/pv1/lvol10
/usr/bin/raw /dev/raw/raw11 /dev/pv1/lvol11
/usr/bin/raw /dev/raw/raw12 /dev/pv1/lvol12
/usr/bin/raw /dev/raw/raw13 /dev/pv1/lvol13
/usr/bin/raw /dev/raw/raw14 /dev/pv1/lvol14
/usr/bin/raw /dev/raw/raw15 /dev/pv1/lvol15
/usr/bin/raw /dev/raw/raw16 /dev/pv1/lvol16
/usr/bin/raw /dev/raw/raw17 /dev/pv1/lvol17
/usr/bin/raw /dev/raw/raw18 /dev/pv1/lvol18
/usr/bin/raw /dev/raw/raw19 /dev/pv1/lvol19
/usr/bin/raw /dev/raw/raw20 /dev/pv1/lvol20
/usr/bin/raw /dev/raw/raw21 /dev/pv1/lvol21
/usr/bin/raw /dev/raw/raw22 /dev/pv1/lvol22
/usr/bin/raw /dev/raw/raw23 /dev/pv1/lvol23
/bin/chmod 600 /dev/raw/raw1
/bin/chmod 600 /dev/raw/raw2
/bin/chmod 600 /dev/raw/raw3
/bin/chmod 600 /dev/raw/raw4
/bin/chmod 600 /dev/raw/raw5
/bin/chmod 600 /dev/raw/raw6
/bin/chmod 600 /dev/raw/raw7
/bin/chmod 600 /dev/raw/raw8
/bin/chmod 600 /dev/raw/raw9
/bin/chmod 600 /dev/raw/raw10
/bin/chmod 600 /dev/raw/raw11
/bin/chmod 600 /dev/raw/raw12
/bin/chmod 600 /dev/raw/raw13
/bin/chmod 600 /dev/raw/raw14
/bin/chmod 600 /dev/raw/raw15
/bin/chmod 600 /dev/raw/raw16
/bin/chmod 600 /dev/raw/raw17
/bin/chmod 600 /dev/raw/raw18
/bin/chmod 600 /dev/raw/raw19
/bin/chmod 600 /dev/raw/raw20
/bin/chmod 600 /dev/raw/raw21
/bin/chmod 600 /dev/raw/raw22
/bin/chmod 600 /dev/raw/raw23
/bin/chown oracle:dba /dev/raw/raw1
/bin/chown oracle:dba /dev/raw/raw2
/bin/chown oracle:dba /dev/raw/raw3
/bin/chown oracle:dba /dev/raw/raw4
/bin/chown oracle:dba /dev/raw/raw5
/bin/chown oracle:dba /dev/raw/raw6
/bin/chown oracle:dba /dev/raw/raw7
/bin/chown oracle:dba /dev/raw/raw8
/bin/chown oracle:dba /dev/raw/raw9
/bin/chown oracle:dba /dev/raw/raw10
/bin/chown oracle:dba /dev/raw/raw11
/bin/chown oracle:dba /dev/raw/raw12
/bin/chown oracle:dba /dev/raw/raw13
/bin/chown oracle:dba /dev/raw/raw14
/bin/chown oracle:dba /dev/raw/raw15
/bin/chown oracle:dba /dev/raw/raw16
/bin/chown oracle:dba /dev/raw/raw17
/bin/chown oracle:dba /dev/raw/raw18
/bin/chown oracle:dba /dev/raw/raw19
/bin/chown oracle:dba /dev/raw/raw20
/bin/chown oracle:dba /dev/raw/raw21
/bin/chown oracle:dba /dev/raw/raw22
/bin/chown oracle:dba /dev/raw/raw23

NOTE: Keep in mind that the above bind steps will need to be done on each node within the RAC cluster at every startup. They will be placed in a startup script like /etc/rc.local.
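
Since the 69 commands above follow one pattern, they can also be written as a loop. The following is a sketch under the assumption that raw device N always maps to logical volume N, as in the mapping table; the names COUNT, DRY_RUN, run, and bind_raw_devices are hypothetical. It issues the same three commands per device, grouped by device rather than by command.

```shell
#!/bin/bash
# Bind /dev/raw/rawN to /dev/pv1/lvolN for N = 1..COUNT, then set the mode
# and owner of each raw device. With DRY_RUN=1 the commands are only printed.
# NOTE: COUNT, DRY_RUN, run and bind_raw_devices are illustrative names.
COUNT=23

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"       # show the command instead of executing it
    else
        "$@"
    fi
}

bind_raw_devices() {
    local i=1
    while [ "$i" -le "$COUNT" ]; do
        run /usr/bin/raw "/dev/raw/raw$i" "/dev/pv1/lvol$i"
        run /bin/chmod 600 "/dev/raw/raw$i"
        run /bin/chown oracle:dba "/dev/raw/raw$i"
        i=$((i + 1))
    done
}
```

Preview the commands with DRY_RUN=1 bind_raw_devices, then call bind_raw_devices as root to perform the bindings.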

You can verify raw bindings by using the raw command:

# raw -qa
/dev/raw/raw1:  bound to major 58, minor 0
/dev/raw/raw2:  bound to major 58, minor 1
/dev/raw/raw3:  bound to major 58, minor 2
/dev/raw/raw4:  bound to major 58, minor 3
/dev/raw/raw5:  bound to major 58, minor 4
/dev/raw/raw6:  bound to major 58, minor 5
/dev/raw/raw7:  bound to major 58, minor 6
/dev/raw/raw8:  bound to major 58, minor 7
/dev/raw/raw9:  bound to major 58, minor 8
/dev/raw/raw10: bound to major 58, minor 9
/dev/raw/raw11: bound to major 58, minor 10
/dev/raw/raw12: bound to major 58, minor 11
/dev/raw/raw13: bound to major 58, minor 12
/dev/raw/raw14: bound to major 58, minor 13
/dev/raw/raw15: bound to major 58, minor 14
/dev/raw/raw16: bound to major 58, minor 15
/dev/raw/raw17: bound to major 58, minor 16
/dev/raw/raw18: bound to major 58, minor 17
/dev/raw/raw19: bound to major 58, minor 18
/dev/raw/raw20: bound to major 58, minor 19
/dev/raw/raw21: bound to major 58, minor 20
/dev/raw/raw22: bound to major 58, minor 21
/dev/raw/raw23: bound to major 58, minor 22



Create Symbolic Links From RAW Volumes


  Perform the following procedures on both nodes in the cluster!


NOTE: Several of the commands within this section will need to be performed on every node within the cluster every time that machine is booted. Details of these commands and instructions for placing them in a startup script are included in section "All Startup Commands for Each RAC Node".

I generally create symbolic links from the RAW volumes to human-readable names to make file recognition easier. If you decide NOT to use symbolic links, you will need to use the raw device designations (i.e. /dev/raw/raw21) for the Oracle files you define when creating or maintaining tablespaces. For some people, dealing with these cryptic designations is simply too much trouble - it is much easier to work with human-readable names. These commands will need to be issued once on each Linux server. I typically include them in the /etc/rc.local startup script. If you add tablespaces, a new logical volume, RAW binding, and link name should be added to the various files on all nodes.

mkdir /u01/app/oracle/oradata
mkdir /u01/app/oracle/oradata/orcl

ln -s /dev/raw/raw1  /u01/app/oracle/oradata/orcl/CMQuorumFile
ln -s /dev/raw/raw2  /u01/app/oracle/oradata/orcl/SharedSrvctlConfigFile
ln -s /dev/raw/raw3  /u01/app/oracle/oradata/orcl/spfileorcl.ora
ln -s /dev/raw/raw4  /u01/app/oracle/oradata/orcl/control01.ctl
ln -s /dev/raw/raw5  /u01/app/oracle/oradata/orcl/control02.ctl
ln -s /dev/raw/raw6  /u01/app/oracle/oradata/orcl/control03.ctl
ln -s /dev/raw/raw7  /u01/app/oracle/oradata/orcl/cwmlite01.dbf
ln -s /dev/raw/raw8  /u01/app/oracle/oradata/orcl/drsys01.dbf
ln -s /dev/raw/raw9  /u01/app/oracle/oradata/orcl/example01.dbf
ln -s /dev/raw/raw10 /u01/app/oracle/oradata/orcl/indx01.dbf
ln -s /dev/raw/raw11 /u01/app/oracle/oradata/orcl/odm01.dbf
ln -s /dev/raw/raw12 /u01/app/oracle/oradata/orcl/system01.dbf
ln -s /dev/raw/raw13 /u01/app/oracle/oradata/orcl/temp01.dbf
ln -s /dev/raw/raw14 /u01/app/oracle/oradata/orcl/tools01.dbf
ln -s /dev/raw/raw15 /u01/app/oracle/oradata/orcl/undotbs01.dbf
ln -s /dev/raw/raw16 /u01/app/oracle/oradata/orcl/undotbs02.dbf
ln -s /dev/raw/raw17 /u01/app/oracle/oradata/orcl/users01.dbf
ln -s /dev/raw/raw18 /u01/app/oracle/oradata/orcl/xdb01.dbf
ln -s /dev/raw/raw19 /u01/app/oracle/oradata/orcl/perfstat01.dbf
ln -s /dev/raw/raw20 /u01/app/oracle/oradata/orcl/redo01.log
ln -s /dev/raw/raw21 /u01/app/oracle/oradata/orcl/redo02.log
ln -s /dev/raw/raw22 /u01/app/oracle/oradata/orcl/redo03.log
ln -s /dev/raw/raw23 /u01/app/oracle/oradata/orcl/orcl_redo2_2.log

chown -R oracle:dba /u01/app/oracle/oradata
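
After creating the links, you may want to confirm that each one points at the intended raw device. The sketch below checks the first three links and can be extended to all 23 by adding lines to the list; the function name verify_link and the ORADATA variable are hypothetical.

```shell
#!/bin/bash
# verify_link LINK TARGET: succeed if LINK is a symbolic link whose target is
# exactly TARGET.
# NOTE: verify_link and ORADATA are illustrative names.
verify_link() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

ORADATA=${ORADATA:-/u01/app/oracle/oradata/orcl}

# Each line of the list is "<link name> <raw device name>".
while read -r name raw; do
    if verify_link "$ORADATA/$name" "/dev/raw/$raw"; then
        echo "ok       $name -> /dev/raw/$raw"
    else
        echo "MISSING  $name -> /dev/raw/$raw"
    fi
done <<EOF
CMQuorumFile raw1
SharedSrvctlConfigFile raw2
spfileorcl.ora raw3
EOF
```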



Configuring the Linux Servers


  Perform the following procedures on both nodes in the cluster!


NOTE: Several of the commands within this section will need to be performed on every node within the cluster every time that machine is booted. Details of these commands and instructions for placing them in a startup script are included in section "All Startup Commands for Each RAC Node".

This section of the document focuses on configuring both Linux servers - getting each one prepared for the Oracle9i RAC installation.

Swap Space Considerations


Setting Shared Memory

Shared memory allows processes to access common structures and data by placing them in a shared memory segment. This is the fastest form of Interprocess Communications (IPC) available - mainly due to the fact that no kernel involvement occurs when data is being passed between the processes. Data does not need to be copied between processes.

Oracle makes use of shared memory for its System Global Area (SGA), which is an area of memory that is shared by all Oracle background and foreground processes. Adequate sizing of the SGA is critical to Oracle performance since it is responsible for holding the database buffer cache, shared SQL, access paths, and so much more.

To determine all shared memory limits, use the following:

# ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 32768
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1