Building an Inexpensive Oracle RAC 10g Release 1 on Linux - (WBEL 3.0 / FireWire)

by Jeff Hunter, Sr. Database Administrator


Contents

  1. Overview
  2. Oracle10g Real Application Cluster (RAC) Introduction
  3. Shared-Storage Overview
  4. FireWire Technology
  5. Hardware & Costs
  6. Install White Box Enterprise Linux 3.0
  7. Network Configuration
  8. Obtaining and Installing a proper Linux Kernel
  9. Create "oracle" User and Directories
  10. Creating Partitions on the Shared FireWire Storage Device
  11. Configuring the Linux Servers for Oracle
  12. Configuring the "hangcheck-timer" Kernel Module
  13. Configuring RAC Nodes for Remote Access
  14. All Startup Commands for Each RAC Node
  15. Checking RPM Packages for Oracle10g R1
  16. Installing and Configuring Oracle Cluster File System (OCFS)
  17. Installing and Configuring Automatic Storage Management (ASM) and Disks
  18. Downloading Oracle10g R1 RAC Software
  19. Installing Oracle Cluster Ready Services (CRS) Software
  20. Installing Oracle10g R1 Database Software
  21. Creating TNS Listener Process
  22. Creating the Oracle Cluster Database
  23. Verifying TNS Networking Files
  24. Creating / Altering Tablespaces
  25. Verifying the RAC Cluster / Database Configuration
  26. Starting & Stopping the Cluster
  27. Transparent Application Failover - (TAF)
  28. Conclusion
  29. Acknowledgements
  30. About the Author



Overview

One of the most efficient ways to become familiar with Oracle10g Real Application Cluster (RAC) technology is to have access to an actual Oracle10g RAC cluster. In learning this new technology, you will soon start to realize the benefits Oracle10g RAC has to offer, such as fault tolerance, new levels of security, load balancing, and the ease of upgrading capacity. The problem, though, is the price of the hardware required for a typical production RAC configuration. A small two-node cluster, for example, could run anywhere from $10,000 to well over $20,000. This would not even include the heart of a production RAC environment, the shared storage. In most cases, this would be a Storage Area Network (SAN), which generally starts at $8,000.

For those who simply want to become familiar with Oracle10g RAC, this article provides a low-cost alternative: configuring an Oracle10g RAC system using commercial off-the-shelf components and downloadable software. The estimated cost for this configuration could be anywhere from $1200 to $1800. This system will consist of a dual-node cluster (each node with a single processor), both running Linux (White Box Enterprise Linux 3.0 Respin 1 or Red Hat Enterprise Linux 3), with shared disk storage based on IEEE1394 (FireWire) drive technology. (Of course, you could also consider building a virtual cluster on a VMware Virtual Machine, but the experience won't quite be the same!)

  If you are interested in configuring the same type of configuration for Oracle9i, please see my article entitled "Building an Inexpensive Oracle9i RAC Configuration on Linux".

  This article is only designed to work as documented with absolutely no substitutions. If you are looking for an example that takes advantage of 10g R2 with Red Hat 4, please see "Building an Inexpensive Oracle10g Release 2 RAC Configuration on Linux - (CentOS 4.2)".

Please note that this is not the only way to build a low-cost Oracle10g RAC system. I have seen other solutions that utilize an implementation based on SCSI rather than FireWire for shared storage. In most cases, SCSI will cost more than a FireWire solution: a typical SCSI card is priced around $70, and an 80GB external SCSI drive will cost around $700-$1000. Keep in mind that some motherboards may already include built-in SCSI controllers.

It is important to note that this configuration should never be run in a production environment and that it is not supported by Oracle or any other vendor. In a production environment, fibre channel (the high-speed serial-transfer interface that can connect systems and storage devices in either point-to-point or switched topologies) is the technology of choice. FireWire offers a low-cost alternative to fibre channel for testing and development, but it is not ready for production.

Although in past experience I have used raw partitions for storing files on shared storage, here we will make use of the Oracle Cluster File System (OCFS) and Oracle Automatic Storage Management (ASM). The two Linux servers will be configured as follows:

Oracle Database Files

  RAC Node Name   Instance Name   Database Name   $ORACLE_BASE      File System
  linux1          orcl1           orcl            /u01/app/oracle   Automatic Storage Management (ASM)
  linux2          orcl2           orcl            /u01/app/oracle   Automatic Storage Management (ASM)

Oracle CRS Shared Files

  File Type                       File Name                   Partition   Mount Point         File System
  Oracle Cluster Registry (OCR)   /u02/oradata/orcl/OCRFile   /dev/sda1   /u02/oradata/orcl   Oracle's Cluster File System (OCFS)
  CRS Voting Disk                 /u02/oradata/orcl/CSSFile   /dev/sda1   /u02/oradata/orcl   Oracle's Cluster File System (OCFS)

The Oracle Cluster Ready Services (CRS) software will be installed to /u01/app/oracle/product/10.1.0/crs on each of the nodes that make up the RAC cluster. However, the CRS software requires that two of its files, the "Oracle Cluster Registry (OCR)" file and the "CRS Voting Disk" file, be shared with all nodes in the cluster. These two files will be installed on the shared storage using Oracle's Cluster File System (OCFS). It is possible (but not recommended by Oracle) to use RAW devices for these files; it is not possible, however, to use ASM for these CRS files.

The Oracle10g R1 Database software will be installed into a separate Oracle Home, namely /u01/app/oracle/product/10.1.0/db_1. All of the Oracle physical database files (data, online redo logs, control files, archived redo logs) will be installed to different partitions of the shared drive being managed by Automatic Storage Management (ASM).

  The Oracle database files could just as well have been stored on the Oracle Cluster File System (OCFS). Using ASM, however, makes the article that much more interesting!
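
The layout described above can be summarized in a few lines of Python. This is only an illustrative sketch: the variable names are my own and are not used anywhere else in this article, while the paths are the ones listed in the tables above.

```python
import posixpath

ORACLE_BASE = "/u01/app/oracle"
RELEASE = "10.1.0"

# Per-node Oracle homes (installed locally on linux1 and linux2).
CRS_HOME = posixpath.join(ORACLE_BASE, "product", RELEASE, "crs")
DB_HOME = posixpath.join(ORACLE_BASE, "product", RELEASE, "db_1")

# CRS files that must live on shared storage (OCFS on /dev/sda1).
OCFS_MOUNT = "/u02/oradata/orcl"
SHARED_CRS_FILES = {
    "Oracle Cluster Registry (OCR)": posixpath.join(OCFS_MOUNT, "OCRFile"),
    "CRS Voting Disk": posixpath.join(OCFS_MOUNT, "CSSFile"),
}

print("CRS home:", CRS_HOME)
print("DB home: ", DB_HOME)
for label, path in SHARED_CRS_FILES.items():
    print(f"{label}: {path}")
```

Note that the two Oracle homes sit on each node's local disk, while the two CRS files are the only items here that live on the shared OCFS partition.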



Oracle10g Real Application Cluster (RAC) Introduction

Oracle Real Application Cluster (RAC) is the successor to Oracle Parallel Server (OPS) and was first introduced in Oracle9i. RAC allows multiple instances to access the same database (storage) simultaneously. RAC provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out. At the same time, since all nodes access the same database, the failure of one instance will not cause the loss of access to the database.

At the heart of Oracle10g RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files and parameter files for all nodes in the cluster. The data disks must be globally available in order to allow all nodes to access the database. Each node has its own redo log file(s) and UNDO tablespace, but the other nodes must be able to access them (and the shared control file) in order to recover that node in the event of a system failure.

Not all clustering solutions use shared storage. Some vendors use an approach known as a federated cluster, in which data is spread across several machines rather than shared by all. With Oracle10g RAC, however, multiple nodes use the same set of disks for storing data. With Oracle10g RAC, the data files, redo log files, control files, and archived log files reside on shared storage on raw-disk devices, a NAS, ASM, or on a clustered file system. Oracle's approach to clustering leverages the collective processing power of all the nodes in the cluster and at the same time provides failover security.

The biggest difference between Oracle RAC and OPS is the addition of Cache Fusion. With OPS, a request for data from one node to another required the data to be written to disk first; only then could the requesting node read that data. With Cache Fusion, data is passed along a high-speed interconnect using a sophisticated locking algorithm.

Pre-configured Oracle10g RAC solutions are available from vendors such as Dell, IBM and HP for production environments. This article, however, focuses on putting together your own Oracle10g RAC environment for development and testing by using Linux servers and a low-cost shared disk solution: FireWire.



Shared-Storage Overview

Today, fibre channel is one of the most popular solutions for shared storage. As mentioned earlier, fibre channel is a high-speed serial-transfer interface that is used to connect systems and storage devices in either point-to-point or switched topologies. Protocols supported by fibre channel include SCSI and IP. Fibre channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second. Fibre channel, however, is very expensive. The fibre channel switch alone can run as much as $1000. This does not even include the fibre channel storage array and high-end drives, which can reach prices of about $300 for a 36GB drive. A typical basic fibre channel setup, including fibre channel cards for the servers, costs roughly $5,000, which does not include the cost of the servers that make up the cluster.

A less expensive alternative to fibre channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget, at around $1,000 to $2,000 for a two-node cluster.

Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for shared storage but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol, and read/write block sizes of 32K.

The shared storage that will be used for this article is based on IEEE1394 (FireWire) drive technology. FireWire is able to offer a low-cost alternative to Fibre Channel for testing and development, but should never be used in a production environment.



FireWire Technology

Developed by Apple Computer and Texas Instruments, FireWire is a cross-platform implementation of a high-speed serial data bus. With its high bandwidth, long distances (up to 100 meters in length) and high-powered bus, FireWire is being used in applications such as digital video (DV), professional audio, hard drives, high-end digital still cameras and home entertainment devices. Today, FireWire operates at transfer rates of up to 800 megabits per second, while next-generation FireWire calls for a theoretical bit rate of 1600 Mbps and then up to a staggering 3200 Mbps. That's 3.2 gigabits per second. This will make FireWire indispensable for transferring massive data files and for even the most demanding video applications, such as working with uncompressed high-definition (HD) video or multiple standard-definition (SD) video streams.

The following chart shows speed comparisons of the various types of disk interface. For each interface, I provide the maximum transfer rates in kilobits (Kb), kilobytes (KB), megabits (Mb), megabytes (MB), gigabits (Gb), and gigabytes (GB) per second. As you can see, the capabilities of IEEE1394 compare very favorably with other disk interface and network technologies that are currently available.

Disk Interface / Network / BUS                Speed
                                              Kb       KB       Mb       MB       Gb       GB
Serial                                        115      14.375   0.115    0.014    -        -
Parallel (standard)                           920      115      0.92     0.115    -        -
10Base-T Ethernet                             -        -        10       1.25     -        -
IEEE 802.11b wireless Wi-Fi (2.4 GHz band)    -        -        11       1.375    -        -
USB 1.1                                       -        -        12       1.5      -        -
Parallel (ECP/EPP)                            -        -        24       3        -        -
SCSI-1                                        -        -        40       5        -        -
IEEE 802.11g wireless WLAN (2.4 GHz band)     -        -        54       6.75     -        -
SCSI-2 (Fast SCSI / Fast Narrow SCSI)         -        -        80       10       -        -
100Base-T Ethernet (Fast Ethernet)            -        -        100      12.5     -        -
ATA/100 (parallel)                            -        -        100      12.5     -        -
IDE                                           -        -        133.6    16.7     -        -
Fast Wide SCSI (Wide SCSI)                    -        -        160      20       -        -
Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow)  -        -        160      20       -        -
Ultra IDE                                     -        -        264      33       -        -
Wide Ultra SCSI (Fast Wide 20)                -        -        320      40       -        -
Ultra2 SCSI                                   -        -        320      40       -        -
FireWire 400 - (IEEE1394a)                    -        -        400      50       -        -
USB 2.0                                       -        -        480      60       -        -
Wide Ultra2 SCSI                              -        -        640      80       -        -
Ultra3 SCSI                                   -        -        640      80       -        -
FireWire 800 - (IEEE1394b)                    -        -        800      100      -        -
Gigabit Ethernet                              -        -        1000     125      1        -
PCI - (33 MHz / 32-bit)                       -        -        1064     133      1.064    -
Serial ATA I - (SATA I)                       -        -        1200     150      1.2      -
Wide Ultra3 SCSI                              -        -        1280     160      1.28     -
Ultra160 SCSI                                 -        -        1280     160      1.28     -
PCI - (33 MHz / 64-bit)                       -        -        2128     266      2.128    -
PCI - (66 MHz / 32-bit)                       -        -        2128     266      2.128    -
AGP 1x - (66 MHz / 32-bit)                    -        -        2128     266      2.128    -
Serial ATA II - (SATA II)                     -        -        2400     300      2.4      -
Ultra320 SCSI                                 -        -        2560     320      2.56     -
FC-AL Fibre Channel                           -        -        3200     400      3.2      -
PCI-Express x1 - (bidirectional)              -        -        4000     500      4        -
PCI - (66 MHz / 64-bit)                       -        -        4256     532      4.256    -
AGP 2x - (133 MHz / 32-bit)                   -        -        4264     533      4.264    -
Serial ATA III - (SATA III)                   -        -        4800     600      4.8      -
PCI-X - (100 MHz / 64-bit)                    -        -        6400     800      6.4      -
PCI-X - (133 MHz / 64-bit)                    -        -        -        1064     8.512    1
AGP 4x - (266 MHz / 32-bit)                   -        -        -        1066     8.528    1
10G Ethernet - (IEEE 802.3ae)                 -        -        -        1250     10       1.25
PCI-Express x4 - (bidirectional)              -        -        -        2000     16       2
AGP 8x - (533 MHz / 32-bit)                   -        -        -        2133     17.064   2.1
PCI-Express x8 - (bidirectional)              -        -        -        4000     32       4
PCI-Express x16 - (bidirectional)             -        -        -        8000     64       8
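
The columns in the chart are related by simple arithmetic: a megabyte figure is the megabit figure divided by eight (8 bits per byte), and a gigabit figure is the megabit figure divided by 1000. A minimal Python sketch of that conversion follows; the function names are illustrative and are not part of the original chart.

```python
def mbit_to_mbyte(megabits_per_sec):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return megabits_per_sec / 8


def mbit_to_gbit(megabits_per_sec):
    """Convert megabits per second to gigabits per second (decimal, 1000 Mb per Gb)."""
    return megabits_per_sec / 1000


# A few rows taken from the chart above.
rates_mbit = {
    "FireWire 400 - (IEEE1394a)": 400,
    "FireWire 800 - (IEEE1394b)": 800,
    "Gigabit Ethernet": 1000,
    "Ultra320 SCSI": 2560,
}

for name, mbit in rates_mbit.items():
    print(f"{name:30s} {mbit:6d} Mb/s = {mbit_to_mbyte(mbit):6.1f} MB/s")
```

These are maximum bus rates; the throughput actually seen on a shared FireWire disk will be lower.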



Hardware & Costs

The hardware used to build our example Oracle10g RAC environment consists of two Linux servers and components that can be purchased at any local computer store or over the Internet.

Server 1 - (linux1)
  Dimension 2400 Series
     - Intel Pentium 4 Processor at 2.80GHz
     - 1GB DDR SDRAM (at 333MHz)
     - 40GB 7200 RPM Internal Hard Drive
     - Integrated Intel 3D AGP Graphics
     - Integrated 10/100 Ethernet - (Broadcom BCM4401)
     - CDROM (48X Max Variable)
     - 3.5" Floppy
     - No monitor (Already had one)
     - USB Mouse and Keyboard
$620
  1 - Ethernet LAN Cards

Each Linux server should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private interconnect.

       Linksys 10/100 Mbps - (LNE100TX) - (Used for Interconnect to linux2)

$20
  1 - FireWire Card

The following is a list of FireWire I/O cards that contain the correct chipset, allow for multiple logins, and should work with this article (no guarantees however). FireWire I/O cards with chipsets made by VIA or TI are known to work.

       Belkin FireWire 3-Port 1394 PCI Card - (F5U501-APL)
       SIIG 3-Port 1394 I/O Card - (NN-300012)
       StarTech 4 Port IEEE-1394 PCI Firewire Card - (PCI1394_4)
       Adaptec FireConnect 4300 FireWire PCI Card - (1890600)

$30
Server 2 - (linux2)
  Dimension 2400 Series
     - Intel Pentium 4 Processor at 2.80GHz
     - 1GB DDR SDRAM (at 333MHz)
     - 40GB 7200 RPM Internal Hard Drive
     - Integrated Intel 3D AGP Graphics
     - Integrated 10/100 Ethernet - (Broadcom BCM4401)
     - CDROM (48X Max Variable)
     - 3.5" Floppy
     - No monitor (Already had one)
     - USB Mouse and Keyboard
$620
  1 - Ethernet LAN Cards

Each Linux server should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private interconnect.

       Linksys 10/100 Mbps - (LNE100TX) - (Used for Interconnect to linux1)

$20
  1 - FireWire Card

The following is a list of FireWire I/O cards that contain the correct chipset, allow for multiple logins, and should work with this article (no guarantees however). FireWire I/O cards with chipsets made by VIA or TI are known to work.

       Belkin FireWire 3-Port 1394 PCI Card - (F5U501-APL)
       SIIG 3-Port 1394 I/O Card - (NN-300012)
       StarTech 4 Port IEEE-1394 PCI Firewire Card - (PCI1394_4)
       Adaptec FireConnect 4300 FireWire PCI Card - (1890600)

$30
Miscellaneous Components
  FireWire Hard Drive

The following is a list of FireWire drives (and enclosures) that contain the correct chipset, allow for multiple logins, and should work with this article (no guarantees however):

       Maxtor OneTouch III - 750GB FireWire 400/USB 2.0 Drive - (T01G750)
       Maxtor OneTouch III - 500GB FireWire 400/USB 2.0 Drive - (T01G500)
       Maxtor OneTouch III - 300GB FireWire 400/USB 2.0 Drive - (T01G300)

       Maxtor OneTouch III - 500GB FireWire 400/USB 2.0 Drive - (F01G500)
       Maxtor OneTouch III - 300GB FireWire 400/USB 2.0 Drive - (F01G300)

       Maxtor OneTouch II 300GB USB 2.0 / IEEE 1394a External Hard Drive - (E01G300)
       Maxtor OneTouch II 250GB USB 2.0 / IEEE 1394a External Hard Drive - (E01G250)
       Maxtor OneTouch II 200GB USB 2.0 / IEEE 1394a External Hard Drive - (E01A200)

       LaCie Hard Drive, Design by F.A. Porsche 250GB, FireWire 400 - (300703U)
       LaCie Hard Drive, Design by F.A. Porsche 160GB, FireWire 400 - (300702U)
       LaCie Hard Drive, Design by F.A. Porsche 80GB, FireWire 400 - (300699U)

       Dual Link Drive Kit, FireWire Enclosure, ADS Technologies - (DLX185)
           Maxtor Ultra 200GB ATA-133 (Internal) Hard Drive - (L01P200)

       Maxtor OneTouch 250GB USB 2.0 / IEEE 1394a External Hard Drive - (A01A250)
       Maxtor OneTouch 200GB USB 2.0 / IEEE 1394a External Hard Drive - (A01A200)

       Ensure that the FireWire drive that you purchase supports multiple logins. If the drive has a chipset that does not allow for concurrent access for more than one server, the disk and its partitions can only be seen by one server at a time. Disks with the Oxford 911 chipset are known to work. Here are the details about the disk that I purchased for this test:
  Vendor: Maxtor
  Model: OneTouch II
  Mfg. Part No. or KIT No.: E01G300
  Capacity: 300 GB
  Cache Buffer: 16 MB
  Rotational Speed (rpm): 7200 RPM
  Interface Transfer Rate : 400 Mbits/s
  "Combo" Interface: IEEE 1394 / USB 2.0 and USB 1.1 compatible

$280
  1 - Extra FireWire Cable

Each node in the RAC configuration will need to connect to the shared storage device (the FireWire hard drive). The FireWire hard drive will come supplied with one FireWire cable. You will need to purchase one additional FireWire cable to connect the second node to the shared storage. Select the appropriate FireWire cable that is compatible with the data transmission speed (FireWire 400 / FireWire 800) and the desired cable length.

       Belkin 6-pin to 6-pin 1394 Cable, 3 ft. - (F3N400-03-ICE)
       Belkin 6-pin to 6-pin 1394 Cable, 14 ft. - (F3N400-14-ICE)

$20
  1 - Ethernet hub or switch

Used for the interconnect between int-linux1 and int-linux2. A question I often receive is about substituting the Ethernet switch (used for interconnect int-linux1 / int-linux2) with a crossover CAT5 cable. I would not recommend this. I have found that when using a crossover CAT5 cable for the interconnect, whenever I took one of the PCs down, the other PC would detect a "cable unplugged" error, and thus the Cache Fusion network would become unavailable.

       Linksys EtherFast 10/100 5-port Ethernet Switch - (EZXS55W)

$25
  4 - Network Cables

       Category 5e patch cable - (Connect linux1 to public network)
       Category 5e patch cable - (Connect linux2 to public network)
       Category 5e patch cable - (Connect linux1 to interconnect ethernet switch)
       Category 5e patch cable - (Connect linux2 to interconnect ethernet switch)

$5
$5
$5
$5

Total     $1685  
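
The total can be checked by adding up the line items above. A short Python sketch of that sum; the dictionary keys are just labels for the items listed in this section.

```python
# Prices as listed in this section (US dollars).
per_server = {
    "Dell Dimension 2400": 620,
    "Ethernet LAN card": 20,
    "FireWire card": 30,
}
shared = {
    "FireWire hard drive": 280,
    "Extra FireWire cable": 20,
    "Ethernet switch": 25,
    "Network cables (4 x $5)": 4 * 5,
}

NODES = 2
total = NODES * sum(per_server.values()) + sum(shared.values())
print(f"Per server: ${sum(per_server.values())}")
print(f"Total:      ${total}")
```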

  I have received several emails since posting this article asking if the Maxtor OneTouch external drive (and the other external hard drives I have listed) has two IEEE1394 (FireWire) ports. All of the drives that I have listed and tested do have two IEEE1394 ports located on the back of the drive.


We are about to start the installation process. Now that we have talked about the hardware that will be used in this example, let's take a conceptual look at what the environment will look like:

As we start to go into the details of the installation, it should be noted that most of the tasks within this document will need to be performed on both servers. I will indicate at the beginning of each section whether the task(s) should be performed on both nodes.



Install White Box Enterprise Linux 3.0


  Perform the following installation on all nodes in the cluster!

After procuring the required hardware, it is time to start the configuration process. The first task we need to perform is to install the Linux operating system. As already mentioned, this article will use White Box Enterprise Linux (WBEL) 3.0. Although I have used Red Hat Fedora in the past, I wanted to switch to a Linux environment that would guarantee all of the functionality contained with Oracle. This is where WBEL comes in. The WBEL Linux project takes the Red Hat Enterprise Linux 3 source RPMs, and compiles them into a free clone of the Enterprise Server 3.0 product. This provides a free and stable version of the Red Hat Enterprise Linux 3 (AS/ES) operating environment for testing different Oracle configurations. Over the last several months, I have been moving away from Fedora as I need a stable environment that is not only free, but as close to the actual Oracle supported operating system as possible. While WBEL is not the only project performing the same functionality, I tend to stick with it as it is stable and has been around the longest.


Downloading White Box Enterprise Linux

Use the links (below) to download White Box Enterprise Linux 3.0. After downloading WBEL, you will then want to burn each of the ISO images to CD.

  White Box Enterprise Linux

  If you are downloading the above ISO files to a MS Windows machine, there are many options for burning these images (ISO files) to a CD. You may already be familiar with and have the proper software to burn images to CD. If you are not familiar with this process and do not have the required software to burn images to CD, here are just two (of many) software packages that can be used:

  UltraISO
  Magic ISO Maker


Installing White Box Enterprise Linux

This section provides a summary of the screens used to install White Box Enterprise Linux. For more detailed installation instructions, it is possible to use the manuals from Red Hat Linux http://www.redhat.com/docs/manuals/. I would suggest, however, that the instructions I have provided below be used for this Oracle10g RAC configuration.

  Before installing the Linux operating system on both nodes, you should have the FireWire and two NIC interfaces (cards) installed.

Also, before starting the installation, ensure that the FireWire drive (our shared storage drive) is NOT connected to either of the two servers.

Although none of this is mandatory, it is how I will be performing the installation and configuration in this article.

After downloading and burning the WBEL images (ISO files) to CD, insert WBEL Disk #1 into the first server (linux1 in this example), power it on, and answer the installation screen prompts as noted below. After completing the Linux installation on the first node, perform the same Linux installation on the second node, substituting the node name linux2 for linux1 and the different IP addresses where appropriate.

Boot Screen

The first screen is the White Box Enterprise Linux boot screen. At the boot: prompt, hit [Enter] to start the installation process.
Media Test
When asked to test the CD media, tab over to [Skip] and hit [Enter]. If there were any errors, the media burning software would have warned us. After several seconds, the installer should then detect the video card, monitor, and mouse. The installer then goes into GUI mode.
Welcome to White Box Enterprise Linux
At the welcome screen, click [Next] to continue.
Language / Keyboard / Mouse Selection
The next three screens prompt you for the Language, Keyboard, and Mouse settings. Make the appropriate selections for your configuration.
Installation Type
Choose the [Custom] option and click [Next] to continue.
Disk Partitioning Setup
Select [Automatically partition] and click [Next] to continue.

If there were a previous installation of Linux on this machine, the next screen will ask if you want to "remove" or "keep" old partitions. Select the option to [Remove all partitions on this system]. Also, ensure that the [hda] drive is selected for this installation. I also keep the checkbox [Review (and modify if needed) the partitions created] selected. Click [Next] to continue.

You will then be prompted with a dialog window asking if you really want to remove all partitions. Click [Yes] to acknowledge this warning.

Partitioning
The installer will then allow you to view (and modify if needed) the disk partitions it automatically selected. In most cases, the installer will choose 100MB for /boot, double the amount of RAM for swap, and the rest going to the root (/) partition. I like to have a minimum of 1GB for swap. For the purpose of this install, I will accept all automatically preferred sizes. (Including 2GB for swap since I have 1GB of RAM installed.)
Boot Loader Configuration
The installer will use the GRUB boot loader by default. To use the GRUB boot loader, accept all default values and click [Next] to continue.
Network Configuration
I made sure to install both NIC interfaces (cards) in each of the Linux machines before starting the operating system installation. This screen should have successfully detected each of the network devices.

First, make sure that each of the network devices is checked [Activate on boot]. The installer may choose to not activate eth1.

Second, [Edit] both eth0 and eth1 as follows. You may choose to use different IP addresses for both eth0 and eth1 and that is OK. If possible, try to put eth1 (the interconnect) on a different subnet than eth0 (the public network):

eth0:
- Uncheck the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.1.100
- Netmask: 255.255.255.0

eth1:
- Uncheck the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.2.100
- Netmask: 255.255.255.0

Continue by setting your hostname manually. I used "linux1" for the first node and "linux2" for the second. Finish this dialog off by supplying your gateway and DNS servers.

Firewall
On this screen, make sure to check [No firewall] and click [Next] to continue.
Additional Language Support / Time Zone
The next two screens allow you to select additional language support and time zone information. In almost all cases, you can accept the defaults.
Set Root Password
Select a root password and click [Next] to continue.
Package Group Selection
Scroll down to the bottom of this screen and select [Everything] under the Miscellaneous section. Click [Next] to continue.
About to Install
This screen is basically a confirmation screen. Click [Next] to start the installation. During the installation process, you will be asked to switch disks to Disk #2 and then Disk #3.
Graphical Interface (X) Configuration
When the installation is complete, the installer will attempt to detect your video hardware. Ensure that the installer has detected and selected the correct video hardware (graphics card and monitor) to properly use the X Windows server. You will continue with the X configuration in the next three screens.
Congratulations
And that's it. You have successfully installed White Box Enterprise Linux on the first node (linux1). The installer will eject the CD from the CD-ROM drive. Take out the CD and click [Exit] to reboot the system.

When the system boots into Linux for the first time, it will prompt you with another Welcome screen. The following wizard allows you to configure the date and time, add any additional users, test the sound card, and install any additional CDs. The only screen I care about is the time and date. As for the others, simply run through them as there is nothing additional that needs to be installed (at this point anyways!). If everything was successful, you should now be presented with the login screen.

Perform the same installation on the second node
After completing the Linux installation on the first node, repeat the above steps for the second node (linux2). When configuring the machine name and networking, be sure to configure the proper values. For my installation, this is what I configured for linux2:

First, make sure that each of the network devices is checked [Activate on boot]. The installer will choose not to activate eth1.

Second, [Edit] both eth0 and eth1 as follows:

eth0:
- Uncheck the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.1.101
- Netmask: 255.255.255.0

eth1:
- Uncheck the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.2.101
- Netmask: 255.255.255.0

Continue by setting your hostname manually. I used "linux2" for the second node. Finish this dialog off by supplying your gateway and DNS servers.



Network Configuration


  Perform the following network configuration on all nodes in the cluster!

  Although we configured several of the network settings during the installation of White Box Enterprise Linux, it is important to not skip this section as it contains critical steps that are required for a successful RAC environment.


Introduction to Network Settings

During the Linux O/S install we already configured the IP address and host name for each of the nodes. We now need to configure the /etc/hosts file and adjust several of the network settings for the interconnect.

Each node should have one static IP address for the public network and one static IP address for the private cluster interconnect. The private interconnect should only be used by Oracle to transfer Cluster Manager and Cache Fusion related data. Although it is possible to use the public network for the interconnect, this is not recommended as it may degrade database performance (by reducing the amount of bandwidth available for Cache Fusion and Cluster Manager traffic). For a production RAC implementation, the interconnect should be at least Gigabit Ethernet and be used only by Oracle.
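
One quick way to confirm that the public and private addresses chosen during the installation really sit on different subnets is Python's standard ipaddress module. The following is a minimal sketch using the addresses from this article; the helper name same_subnet is hypothetical and the check is not part of the Oracle installation itself.

```python
import ipaddress


def same_subnet(addr_a, addr_b, netmask):
    """Return True if both IPv4 addresses fall in the same network for the given netmask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b


NETMASK = "255.255.255.0"
public_linux1 = "192.168.1.100"   # eth0 on linux1
private_linux1 = "192.168.2.100"  # eth1 on linux1
private_linux2 = "192.168.2.101"  # eth1 on linux2

# The public and private interfaces should be on different subnets ...
print("public/private share a subnet:", same_subnet(public_linux1, private_linux1, NETMASK))
# ... while both ends of the interconnect must share one.
print("interconnect ends share a subnet:", same_subnet(private_linux1, private_linux2, NETMASK))
```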


Configuring Public and Private Network

In our two node example, we need to configure the network on both nodes for access to the public network as well as their private interconnect.

The easiest way to configure network settings in Red Hat Enterprise Linux 3 is with the program Network Configuration. This application can be started from the command-line as the "root" user account as follows:

# su -
# /usr/bin/redhat-config-network &

  Do not use DHCP naming for the public IP address or the interconnects - we need static IP addresses!

Using the Network Configuration application, you need to configure both NIC devices as well as the /etc/hosts file. Both of these tasks can be completed using the Network Configuration GUI. Notice that the /etc/hosts entries are the same for both nodes.

Our example configuration will use the following settings:

Server 1 - (linux1)
Device   IP Address      Subnet          Gateway       Purpose
eth0     192.168.1.100   255.255.255.0   192.168.1.1   Connects linux1 to the public network
eth1     192.168.2.100   255.255.255.0   -             Connects linux1 (interconnect) to linux2 (int-linux2)
/etc/hosts
127.0.0.1        localhost      loopback

# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2

# Private Interconnect - (eth1)
192.168.2.100    int-linux1
192.168.2.101    int-linux2

# Public Virtual IP (VIP) addresses for - (eth0)
192.168.1.200    vip-linux1
192.168.1.201    vip-linux2

Server 2 - (linux2)
Device   IP Address      Subnet          Gateway       Purpose
eth0     192.168.1.101   255.255.255.0   192.168.1.1   Connects linux2 to the public network
eth1     192.168.2.101   255.255.255.0   -             Connects linux2 (interconnect) to linux1 (int-linux1)
/etc/hosts
127.0.0.1        localhost      loopback

# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2

# Private Interconnect - (eth1)
192.168.2.100    int-linux1
192.168.2.101    int-linux2

# Public Virtual IP (VIP) addresses for - (eth0)
192.168.1.200    vip-linux1
192.168.1.201    vip-linux2

  Note that the virtual IP addresses only need to be defined in the /etc/hosts file for both nodes. The public virtual IP addresses will be configured automatically by Oracle when you run the Oracle Universal Installer, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA). All virtual IP addresses will be activated when the srvctl start nodeapps -n <node_name> command is run. Although I am getting ahead of myself, this is the Host Name/IP Address that will be configured in the client(s) tnsnames.ora file for each Oracle Net Service Name. All of this will be explained much later in this article!
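
One way to confirm that a node's /etc/hosts file carries all six names used in this article is to look each one up in the file. The following is only a minimal sketch: the function names has_host and check_rac_hosts are hypothetical helpers, not part of any Oracle or Red Hat tool, and the list of names assumes the example configuration above.

```shell
#!/bin/sh
# has_host FILE NAME
# Succeeds if NAME appears as a host name or alias (any field after the
# IP address) on a line of FILE, ignoring anything after a "#".
has_host() {
    awk -v n="$2" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$1"
}

# check_rac_hosts FILE
# Prints "missing: NAME" for each name from this article's example
# that FILE does not define; returns non-zero if any name is missing.
check_rac_hosts() {
    rc=0
    for name in linux1 linux2 int-linux1 int-linux2 vip-linux1 vip-linux2; do
        if ! has_host "$1" "$name"; then
            echo "missing: $name"
            rc=1
        fi
    done
    return $rc
}
```

Running check_rac_hosts /etc/hosts on each node should print nothing when all entries are present.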


In the screen shots below, only node 1 (linux1) is shown. Be sure to apply the proper network settings to both nodes!



Network Configuration Screen - Node 1 (linux1)



Ethernet Device Screen - eth0 (linux1)



Ethernet Device Screen - eth1 (linux1)



Network Configuration Screen - /etc/hosts (linux1)


Once the network is configured, you can use the ifconfig command to verify everything is working. The following example is from linux1:

$ /sbin/ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:0C:41:F1:6E:9A
          inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:421591 errors:0 dropped:0 overruns:0 frame:0
          TX packets:403861 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:78398254 (74.7 Mb)  TX bytes:51064273 (48.6 Mb)
          Interrupt:9 Base address:0x400

eth1      Link encap:Ethernet  HWaddr 00:0D:56:FC:39:EC
          inet addr:192.168.2.100  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1715352 errors:0 dropped:1 overruns:0 frame:0
          TX packets:4257279 errors:0 dropped:0 overruns:0 carrier:4
          collisions:0 txqueuelen:1000
          RX bytes:802574993 (765.3 Mb)  TX bytes:1236087657 (1178.8 Mb)
          Interrupt:3

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1273787 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1273787 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:246580081 (235.1 Mb)  TX bytes:246580081 (235.1 Mb)


About Virtual IP

Why do we have a Virtual IP (VIP) in 10g? Why does it just return a dead connection when its primary node fails?

It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen.

  1. The new node re-arps the world indicating a new MAC address for the address. For directly connected clients, this usually causes them to see errors on their connections to the old address.

  2. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.

Without using VIPs, clients connected to a node that died will often wait a 10 minute TCP timeout period before getting an error. As a result, you don't really have a good HA solution without using VIPs.

Source - Metalink: "RAC Frequently Asked Questions" (Note:220970.1)


Make sure RAC node name is not listed in loopback address

Ensure that none of the node names (linux1 or linux2) are included for the loopback address in the /etc/hosts file. If the machine name is listed in the loopback address entry as below:
    127.0.0.1        linux1 localhost.localdomain localhost
it will need to be removed as shown below:
    127.0.0.1        localhost.localdomain localhost

  If the RAC node name is listed for the loopback address, you will receive one of the following errors during the RAC installation:
ORA-00603: ORACLE server session terminated by fatal error
or
ORA-29702: error occurred in Cluster Group Service operation
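
The loopback check described above can also be scripted. This is a minimal sketch under the assumption that the loopback entry uses the address 127.0.0.1; the function names loopback_names and node_on_loopback are hypothetical.

```shell
#!/bin/sh
# loopback_names FILE
# Prints every host name listed on the 127.0.0.1 line(s) of FILE,
# one per line, ignoring anything after a "#".
loopback_names() {
    awk '{ sub(/#.*/, "") } $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# node_on_loopback FILE NODE...
# Returns 0 (and prints a message) if any NODE appears on a loopback
# line of FILE; returns 1 if the loopback entry is clean.
node_on_loopback() {
    file=$1
    shift
    for node in "$@"; do
        if loopback_names "$file" | grep -qxF "$node"; then
            echo "$node is listed on the loopback address in $file"
            return 0
        fi
    done
    return 1
}
```

For example, node_on_loopback /etc/hosts linux1 linux2 should print nothing and return non-zero on a correctly configured node.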


Adjusting Network Settings

With Oracle 9.2.0.1 and onwards, Oracle now makes use of UDP as the default protocol on Linux for inter-process communication (IPC), such as Cache Fusion and Cluster Manager buffer transfers between instances within the RAC cluster.

Oracle strongly recommends adjusting the default and maximum send buffer size (SO_SNDBUF socket option) to 256 KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256 KB.

The receive buffers are used by TCP and UDP to hold received data until it is read by the application. With TCP, the receive buffer cannot overflow because the peer is not allowed to send data beyond the advertised window. UDP has no such flow control: datagrams that do not fit in the socket receive buffer are discarded, so a fast sender can overwhelm a slower receiver.

  The default and maximum window size can be changed in the /proc file system without reboot:
# su - root

# sysctl -w net.core.rmem_default=262144
net.core.rmem_default = 262144

# sysctl -w net.core.wmem_default=262144
net.core.wmem_default = 262144

# sysctl -w net.core.rmem_max=262144
net.core.rmem_max = 262144

# sysctl -w net.core.wmem_max=262144
net.core.wmem_max = 262144

The above commands made the changes to the already running O/S. You should now make the above changes permanent (for each reboot) by adding the following lines to the /etc/sysctl.conf file for each node in your RAC cluster:

# Default setting in bytes of the socket receive buffer
net.core.rmem_default=262144

# Default setting in bytes of the socket send buffer
net.core.wmem_default=262144

# Maximum socket receive buffer size which may be set by using
# the SO_RCVBUF socket option
net.core.rmem_max=262144

# Maximum socket send buffer size which may be set by using 
# the SO_SNDBUF socket option
net.core.wmem_max=262144
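
After a reboot it is worth confirming that the four values actually took effect. The sketch below reads them back from the /proc file system and compares them with the 262144-byte value used in this article. The function name check_sock_buffers is hypothetical, and the optional directory argument exists only so the check can be pointed at a test directory.

```shell
#!/bin/sh
# check_sock_buffers [DIR]
# Compares the four socket buffer settings under DIR (default:
# /proc/sys/net/core) with the 262144-byte value used in this article.
# Prints one line per setting; returns non-zero if any value differs.
check_sock_buffers() {
    dir=${1:-/proc/sys/net/core}
    want=262144
    rc=0
    for key in rmem_default wmem_default rmem_max wmem_max; do
        have=$(cat "$dir/$key" 2>/dev/null || true)
        if [ "$have" = "$want" ]; then
            echo "ok    net.core.$key = $have"
        else
            echo "check net.core.$key = ${have:-unreadable} (expected $want)"
            rc=1
        fi
    done
    return $rc
}
```

Calling check_sock_buffers with no argument on each RAC node inspects the running kernel.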


Check and turn off UDP ICMP rejections:

During the Linux installation process, I chose not to configure the firewall option. (By default, the installer selects the option to configure a firewall.) This has burned me several times, so I like to double-check that the firewall option is not configured and that UDP ICMP filtering is turned off.

If UDP ICMP is blocked or rejected by the firewall, the CRS software will crash after several minutes of running. When the CRS process fails, you will have something similar to the following in the <machine_name>_evmocr.log file:

08/29/2005 22:17:19
oac_init:2: Could not connect to server, clsc retcode = 9
08/29/2005 22:17:19
a_init:12!: Client init unsuccessful : [32]
ibctx:1:ERROR: INVALID FORMAT
proprinit:problem reading the bootblock or superbloc 22
When experiencing this type of error, the solution is to remove the UDP ICMP (iptables) rejection rule, or simply to turn the firewall option off. CRS will then operate normally and not crash. The following commands should be executed as the root user account:

  1. Check to ensure that the firewall option is turned off. If the firewall option is stopped (like it is in my example below) you do not have to proceed with the following steps.
    # /etc/rc.d/init.d/iptables status
    Firewall is stopped.

  2. If the firewall option is operating you will need to first manually disable UDP ICMP rejections:
    # /etc/rc.d/init.d/iptables stop
    
    Flushing firewall rules: [  OK  ]
    Setting chains to policy ACCEPT: filter [  OK  ]
    Unloading iptables modules: [  OK  ]

  3. Then, to keep UDP ICMP rejections turned off after the next server reboot (they should always be turned off):
    # chkconfig iptables off 



Obtaining and Installing a proper Linux Kernel


  Perform the following kernel upgrade on all nodes in the cluster!


Overview

The next step is to obtain and install a new Linux kernel that supports the use of IEEE1394 devices with multiple logins. In previous releases of this article, I included the steps to download a patched version of the Linux kernel (source code) and then compile it. Thanks to Oracle's Linux Projects development group, this is no longer a requirement. They provide a pre-compiled kernel for Red Hat Enterprise Linux 3.0 (which also works with White Box Enterprise Linux!) that can simply be downloaded and installed. The instructions for downloading and installing the kernel are included in this section. Before going into the details of how to perform these actions, however, let's take a moment to discuss the changes that are required in the new kernel.

  I am using the term "multiple logins" a bit loosely in this article. The concept of "multiple login" is strictly not allowed in the IEEE1394 specification, as it is only a point to point protocol. The term "multiple logins" is often confused with "concurrent sessions", which is supported in the IEEE1394 specification. It simply means that the device allows multiple outstanding requests simultaneously (similar to the SCSI-2 protocol). Therefore, multiple hosts (initiators) on a single bus are prohibited according to IEEE1394.

While FireWire drivers already exist for Linux, they often do not support shared storage. Normally, when you log on to an O/S, the O/S associates the driver with a specific drive for that machine alone. This implementation simply will not work for our RAC configuration. The shared storage (our FireWire hard drive) needs to be accessed by more than one node. We need to enable the FireWire driver to provide nonexclusive access to the drive so that multiple servers - the nodes that comprise the cluster - will be able to access the same storage. This is accomplished by removing the bit mask that identifies the machine during login in the source code. This results in allowing nonexclusive access to the FireWire hard drive. All other nodes in the cluster log in to the same drive during their logon session, using the same modified driver, so they also have nonexclusive access to the drive.

Our implementation describes a dual node cluster (each with a single processor), each server running White Box Enterprise Linux. Keep in mind that the process of installing the patched Linux kernel will need to be performed on both Linux nodes. White Box Enterprise Linux 3.0 (Respin 1) includes kernel 2.4.21-15.EL #1. We will need to download the Oracle Technet Supplied 2.4.21-27.0.2.ELorafw1 Linux kernel from the following URL: http://oss.oracle.com/projects/firewire/files.


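Download one of the following files:

  • kernel-2.4.21-27.0.2.ELorafw1.i686.rpm - (for single processor)
  • kernel-smp-2.4.21-27.0.2.ELorafw1.i686.rpm - (for multiple processors)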


Take a backup of your GRUB configuration file:

In most cases you will be using GRUB for the boot loader. Before actually installing the new kernel, back up a copy of your /etc/grub.conf file:
# cp /etc/grub.conf /etc/grub.conf.original


Install the new kernel, as root:

# rpm -ivh --force kernel-2.4.21-27.0.2.ELorafw1.i686.rpm   - (for single processor)
  - OR -
# rpm -ivh --force kernel-smp-2.4.21-27.0.2.ELorafw1.i686.rpm   - (for multiple processors)

  Installing the new kernel using RPM will also update your GRUB (or lilo) configuration with the appropriate stanza. There is no need to add any new stanza to your boot loader configuration unless you want to have your old kernel image available.

The following is a listing of my /etc/grub.conf file before and then after the kernel install. As you can see, the install added another stanza for the 2.4.21-27.0.2.ELorafw1 kernel. By default, the installer keeps your original kernel as the default by setting default=1. You should change this value to zero (default=0), as shown in the second listing, so that the new kernel boots by default.

Original /etc/grub.conf File
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda2
#          initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title White Box Enterprise Linux (2.4.21-15.EL)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-15.EL ro root=LABEL=/
        initrd /initrd-2.4.21-15.EL.img
Newly Configured /etc/grub.conf File After Kernel Install
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda2
#          initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title White Box Enterprise Linux (2.4.21-27.0.2.ELorafw1)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-27.0.2.ELorafw1 ro root=LABEL=/
        initrd /initrd-2.4.21-27.0.2.ELorafw1.img
title White Box Enterprise Linux (2.4.21-15.EL)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-15.EL ro root=LABEL=/
        initrd /initrd-2.4.21-15.EL.img
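
To see which kernel a given grub.conf will boot without reading the file by eye, the default= value can be matched against the title lines, which GRUB numbers from zero in the order they appear. The following is a minimal sketch; the function name grub_default_title is hypothetical, and it assumes the simple layout shown in the listings above (one default= line, one title line per stanza).

```shell
#!/bin/sh
# grub_default_title FILE
# Prints the title of the stanza that FILE's "default=N" line selects.
# Stanzas are counted from 0; a missing default= line is treated as 0.
grub_default_title() {
    awk '
        BEGIN { want = 0 }
        /^default=/   { split($0, kv, "="); want = kv[2] + 0 }
        /^title[ \t]/ { titles[n++] = substr($0, 7) }
        END { if (want in titles) print titles[want] }
    ' "$1"
}
```

With the second listing above, grub_default_title /etc/grub.conf should print the title containing 2.4.21-27.0.2.ELorafw1.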


Add module options:

Add the following lines to /etc/modules.conf:

alias ieee1394-controller ohci1394
options sbp2 sbp2_exclusive_login=0
post-install sbp2 insmod sd_mod
post-install sbp2 insmod ohci1394
post-remove sbp2 rmmod sd_mod

It is vital that the parameter sbp2_exclusive_login of the Serial Bus Protocol module (sbp2) be set to zero to allow multiple hosts to log in to and access the FireWire disk concurrently. The first post-install line ensures that the SCSI disk driver module (sd_mod) is loaded as well, since sbp2 requires the SCSI layer. The core SCSI support module (scsi_mod) will be loaded automatically if sd_mod is loaded - there is no need to make a separate entry for it.
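
Because a missing or mistyped options line leaves the driver in exclusive mode, a quick check of the file on each node can save a reboot cycle. This is a minimal sketch; the function name sbp2_shared_login is hypothetical, and it only inspects the file, not the running module.

```shell
#!/bin/sh
# sbp2_shared_login FILE
# Returns 0 if FILE contains an uncommented "options sbp2" line that
# sets sbp2_exclusive_login=0, and non-zero otherwise.
sbp2_shared_login() {
    grep -Eq '^[[:space:]]*options[[:space:]]+sbp2[[:space:]].*sbp2_exclusive_login=0([[:space:]]|$)' "$1"
}
```

For example: sbp2_shared_login /etc/modules.conf || echo "sbp2 is not configured for shared logins"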


Connect FireWire drive to each machine and boot into the new kernel:

After you have performed the above tasks on both nodes in the cluster, power down both of them:
===============================

# hostname
linux1

# init 0

===============================

# hostname
linux2

# init 0

===============================
After both machines are powered down, connect each of them to the back of the FireWire drive.

Power on the FireWire drive.

Finally, power on each Linux server and be sure to boot each machine into the new kernel.


Check and turn off UDP ICMP rejections:

After rebooting each machine (above) check to ensure that the firewall option is turned off (stopped):
# /etc/rc.d/init.d/iptables status
Firewall is stopped.


Loading the FireWire stack:

  Starting with Red Hat Enterprise Linux (and of course White Box Enterprise Linux!), the loading of the FireWire stack should already be configured!

In most cases, the loading of the FireWire stack will already be configured in the /etc/rc.sysinit file. The commands that are contained within this file that are responsible for loading the FireWire stack are:

# modprobe sbp2
# modprobe ohci1394
In older versions of Red Hat, this was not the case and these commands would have to be manually run or put within a startup file. With Red Hat Enterprise Linux and higher, these commands are already put within the /etc/rc.sysinit file and run on each boot.


Check for SCSI Device:

After each machine has rebooted, the kernel should automatically detect the shared disk as a SCSI device (/dev/sdXX). This section will provide several commands that should be run on all nodes in the cluster to verify the FireWire drive was successfully detected and being shared by all nodes in the cluster.

For this configuration, I was performing the above procedures on both nodes at the same time. When complete, I shut down both machines, started linux1 first, and then linux2. The following commands and results are from my linux2 machine. Again, make sure that you run the following commands on all nodes to ensure both machines can log in to the shared drive.

Let's first check to see that the FireWire adapter was successfully detected:

# lspci
00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
00:02.0 VGA compatible controller: Intel Corp. 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)
00:1d.0 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #3 (rev 01)
00:1d.7 USB Controller: Intel Corp. 82801DB (ICH4) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corp. 82801DB (ICH4) LPC Bridge (rev 01)
00:1f.1 IDE interface: Intel Corp. 82801DB (ICH4) Ultra ATA 100 Storage Controller (rev 01)
00:1f.3 SMBus: Intel Corp. 82801DB/DBM (ICH4) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corp. 82801DB (ICH4) AC'97 Audio Controller (rev 01)
01:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
01:05.0 Modem: Intel Corp.: Unknown device 1080 (rev 04)
01:06.0 Ethernet controller: Linksys NC100 Network Everywhere Fast Ethernet 10/100 (rev 11)
01:09.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)
Second, let's check to see that the modules are loaded:
# lsmod |egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
sd_mod                 13808   0
sbp2                   19724   0
scsi_mod              106664   3  [sg sd_mod sbp2]
ohci1394               28008   0  (unused)
ieee1394               62916   0  [sbp2 ohci1394]
Third, let's make sure the disk was detected and an entry was made by the kernel:
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: Maxtor   Model: OneTouch         Rev: 0200
  Type:   Direct-Access                    ANSI SCSI revision: 06
Now let's verify that the FireWire drive is accessible for multiple logins and shows a valid login:
# dmesg | grep sbp2
ieee1394: sbp2: Query logins to SBP-2 device successful
ieee1394: sbp2: Maximum concurrent logins supported: 3
ieee1394: sbp2: Number of active logins: 1
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node[01:1023]: Max speed [S400] - Max payload [2048]

From the above output, you can see that the FireWire drive I have can support concurrent logins by up to 3 servers. It is vital that you have a drive where the chipset supports concurrent access for all nodes within the RAC cluster.
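
The login limit can also be pulled out of the kernel log and compared with the number of nodes in the cluster. The sketch below assumes the sbp2 message format shown in the output above; the function names sbp2_max_logins and enough_logins are hypothetical.

```shell
#!/bin/sh
# sbp2_max_logins
# Reads kernel log text on standard input and prints the number from
# the last "Maximum concurrent logins supported" line, if there is one.
sbp2_max_logins() {
    sed -n 's/.*sbp2: Maximum concurrent logins supported: \([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# enough_logins NODES
# Reads kernel log text on standard input and succeeds when the drive
# reports support for at least NODES concurrent logins.
enough_logins() {
    max=$(sbp2_max_logins)
    [ -n "$max" ] && [ "$max" -ge "$1" ]
}
```

For the two node cluster in this article: dmesg | enough_logins 2 && echo "drive can be shared by both nodes"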

One other test I like to perform is to run a quick fdisk -l from each node in the cluster to verify that it is really being picked up by the O/S. Your drive may show that the device does not contain a valid partition table, but this is OK at this point of the RAC configuration.

# fdisk -l

Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1     24791 199133676    c  Win95 FAT32 (LBA)

Disk /dev/hda: 40.0 GB, 40000000000 bytes
255 heads, 63 sectors/track, 4863 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   83  Linux
/dev/hda2            14      4609  36917370   83  Linux
/dev/hda3          4610      4863   2040255   82  Linux swap


Rescan SCSI bus no longer required:

  With Red Hat Enterprise Linux 3 (and you guessed it, White Box Enterprise Linux), you no longer need to rescan the SCSI bus in order to detect the disk! The disk should be detected automatically by the kernel as seen from the tests you performed above.

In older versions of the kernel, I would need to run the rescan-scsi-bus.sh script in order to detect the FireWire drive. The purpose of this script was to create the SCSI entry for the node by using the following command:

echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi

With Red Hat Enterprise Linux 3, this is no longer required and the disk should be detected automatically.


Troubleshooting SCSI Device Detection:

If you are having trouble with any of the procedures (above) in detecting the SCSI device, you can try the following:
# modprobe -r sbp2
# modprobe -r sd_mod
# modprobe -r ohci1394
# modprobe ohci1394
# modprobe sd_mod
# modprobe sbp2

You may also want to unplug any USB devices connected to the server. The system may not be able to recognize your FireWire drive if you have a USB device attached!



Create "oracle" User and Directories


  Perform the following procedures on all nodes in the cluster!

  I will be using the Oracle Cluster File System (OCFS) to store the files required to be shared for the Oracle Cluster Ready Services (CRS). When using OCFS, the UID of the UNIX user "oracle" and GID of the UNIX group "dba" must be the same on all machines in the cluster. If either the UID or GID is different, the files on the OCFS file system will show up as "unowned" or may even be owned by a different user. For this article, I will use 175 for the "oracle" UID and 115 for the "dba" GID.


Create Group and User for Oracle

Let's continue this example by creating the UNIX dba group and oracle user account along with all appropriate directories.

# mkdir -p /u01/app
# groupadd -g 115 dba
# useradd -u 175 -g 115 -d /u01/app/oracle -s /bin/bash -c "Oracle Software Owner" oracle
# chown -R oracle:dba /u01
# passwd oracle
# su - oracle
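
Since the UID and GID must match on every node, it helps to compare the output of the id command against the values chosen for this article. The following is a minimal sketch; the function name same_ids is hypothetical, and it assumes the usual Linux id output format (uid=N(name) gid=N(name) ...).

```shell
#!/bin/sh
# same_ids EXPECTED_UID EXPECTED_GID
# Reads one line of `id` output on standard input and succeeds only if
# the numeric UID and GID both match the expected values.
same_ids() {
    line=$(cat)
    uid=$(printf '%s\n' "$line" | sed -n 's/^uid=\([0-9][0-9]*\).*/\1/p')
    gid=$(printf '%s\n' "$line" | sed -n 's/.* gid=\([0-9][0-9]*\).*/\1/p')
    [ "$uid" = "$1" ] && [ "$gid" = "$2" ]
}
```

Run on each node: id oracle | same_ids 175 115 || echo "oracle UID/GID differ from 175/115"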

  When you are setting the Oracle environment variables for each RAC node, be sure to assign each RAC node a unique Oracle SID!

For this example, I used:

  • linux1 : ORACLE_SID=orcl1
  • linux2 : ORACLE_SID=orcl2

  The Oracle Universal Installer (OUI) requires up to 400MB of free space in the /tmp directory.

You can check the available space in /tmp by running the following command:

# df -k /tmp
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda2             36337384   4691460  29800056  14% /
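
The same check can be reduced to a yes/no answer by comparing the "Available" column with the 400MB figure. This is a minimal sketch; the function name tmp_has_space is hypothetical, and it expects the single-line-per-filesystem output produced by df -Pk.

```shell
#!/bin/sh
# tmp_has_space [MIN_KB]
# Reads `df -Pk` output on standard input and succeeds when the
# "Available" column of the last line is at least MIN_KB
# (default: 409600 KB, i.e. 400 MB).
tmp_has_space() {
    min_kb=${1:-409600}
    avail=$(awk 'NR > 1 { a = $4 } END { print a }')
    [ -n "$avail" ] && [ "$avail" -ge "$min_kb" ]
}
```

For example: df -Pk /tmp | tmp_has_space || echo "less than 400MB free in /tmp"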
If, for any reason, you do not have enough space in /tmp, you can temporarily create space in another file system and point your TEMP and TMPDIR to it for the duration of the install. Here are the steps to do this:
# su -
# mkdir /<AnotherFilesystem>/tmp
# chown root:root /<AnotherFilesystem>/tmp
# chmod 1777 /<AnotherFilesystem>/tmp
# export TEMP=/<AnotherFilesystem>/tmp     # used by Oracle
# export TMPDIR=/<AnotherFilesystem>/tmp   # used by Linux programs
                                           #   like the linker "ld"
When the installation of Oracle is complete, you can remove the temporary directory using the following:
# su -
# rmdir /<AnotherFilesystem>/tmp
# unset TEMP
# unset TMPDIR


Create Login Script for oracle User Account

After creating the "oracle" UNIX user account on both nodes, make sure that you are logged in as the oracle user and verify that the environment is set up correctly by using the following .bash_profile:

.bash_profile for Oracle User
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
      . ~/.bashrc
fi

alias ls="ls -FA"

# User specific environment and startup programs
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/10.1.0/db_1
export ORA_CRS_HOME=$ORACLE_BASE/product/10.1.0/crs
export ORACLE_PATH=$ORACLE_BASE/common/oracle/sql:.:$ORACLE_HOME/rdbms/admin

# Each RAC node must have a unique ORACLE_SID. (i.e. orcl1, orcl2,...)
export ORACLE_SID=orcl1

export PATH=.:${PATH}:$HOME/bin:$ORACLE_HOME/bin
export PATH=${PATH}:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
export PATH=${PATH}:$ORACLE_BASE/common/oracle/bin
export ORACLE_TERM=xterm
export TNS_ADMIN=$ORACLE_HOME/network/admin
export ORA_NLS10=$ORACLE_HOME/nls/data
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export CLASSPATH=$ORACLE_HOME/JRE
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/rdbms/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
export THREADS_FLAG=native
export TEMP=/tmp
export TMPDIR=/tmp
export LD_ASSUME_KERNEL=2.4.1


Create Mount Point for OCFS / CRS

Finally, let's create the mount point for the Oracle Cluster File System (OCFS) that will be used to store files for the Oracle Cluster Ready Service (CRS). These commands will need to be run as the "root" user account:
$ su -
# mkdir -p /u02/oradata/orcl
# chown -R oracle:dba /u02



Creating Partitions on the Shared FireWire Storage Device


  Create the following partitions on only one node in the cluster!


Overview

The next step is to create the required partitions on the FireWire (shared) drive. As mentioned earlier in this article, I will be using Oracle's Cluster File System (OCFS) to store the two files to be shared for Oracle's Cluster Ready Service (CRS). I will then be using Automatic Storage Management (ASM) for all physical database files (data/index files, online redo log files, control files, SPFILE, and archived redo log files).

The following table lists the individual partitions that will be created on the FireWire (shared) drive and what files will be contained on them.

Oracle Shared Drive Configuration
File System Type  Partition  Size      Mount Point        File Types
OCFS              /dev/sda1  300 MB    /u02/oradata/orcl  Oracle Cluster Registry (OCR) File - (~100 MB)
                                                          CRS Voting Disk - (~20MB)
ASM               /dev/sda2  50 GB     ORCL:VOL1          Oracle Database Files
ASM               /dev/sda3  50 GB     ORCL:VOL2          Oracle Database Files
ASM               /dev/sda4  50 GB     ORCL:VOL3          Oracle Database Files
Total                        150.3 GB


Create All Partitions on FireWire Shared Storage

As shown in the table above, my FireWire drive shows up as the SCSI device /dev/sda. The fdisk command is used for creating (and removing) partitions. For this configuration, I will be creating four partitions - one for CRS and the other three for ASM (to store all Oracle database files). Before creating the new partitions, it is important to remove any existing partitions on the FireWire drive:
# fdisk /dev/sda
Command (m for help): p

Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1     24791 199133676    c  Win95 FAT32 (LBA)


Command (m for help): d
Selected partition 1

Command (m for help): p

Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System


Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-24792, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-24792, default 24792): +300M


Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (38-24792, default 38): 38
Using default value 38
Last cylinder or +size or +sizeM or +sizeK (38-24792, default 24792): +50G

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (6118-24792, default 6118): 6118
Using default value 6118
Last cylinder or +size or +sizeM or +sizeK (6118-24792, default 24792): +50G

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Selected partition 4
First cylinder (12198-24792, default 12198): 12198
Using default value 12198
Last cylinder or +size or +sizeM or +sizeK (12198-24792, default 24792): +50G

Command (m for help): p

Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1        37    297171   83  Linux
/dev/sda2            38      6117  48837600   83  Linux
/dev/sda3          6118     12197  48837600   83  Linux
/dev/sda4         12198     18277  48837600   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
After creating all required partitions, you should now inform the kernel of the partition changes using the following syntax as the "root" user account:
# partprobe

# fdisk -l /dev/sda
Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1        37    297171   83  Linux
/dev/sda2            38      6117  48837600   83  Linux
/dev/sda3          6118     12197  48837600   83  Linux
/dev/sda4         12198     18277  48837600   83  Linux

  The FireWire drive (and partitions created) will be exposed as a SCSI device.


Reboot All Nodes in RAC Cluster

After creating the partitions, it is recommended that you reboot all RAC nodes to make sure that the kernel on each node recognizes all of the new partitions:
# su -
# reboot

  It is not mandatory to reboot each node. However, I have seen issues when not recycling each machine.

After each machine is back up, run the "fdisk -l /dev/sda" command on each machine in the cluster to ensure that they both can see the partition table:

# fdisk -l /dev/sda

Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1        37    297171   83  Linux
/dev/sda2            38      6117  48837600   83  Linux
/dev/sda3          6118     12197  48837600   83  Linux
/dev/sda4         12198     18277  48837600   83  Linux



Configuring the Linux Servers for Oracle


  Perform the following configuration procedures on all nodes in the cluster!

  Several of the commands within this section will need to be performed on every node within the cluster every time the machine is booted. This section provides very detailed information about setting shared memory, semaphores, and file handle limits. Instructions for placing them in a startup script (/etc/rc.local) are included in section "All Startup Commands for Each RAC Node".


Overview

This section focuses on configuring both Linux servers - getting each one prepared for the Oracle10g RAC installation. This includes verifying that there is enough swap space, setting shared memory and semaphores, and finally setting the maximum number of file handles for the O/S.

Throughout this section you will notice that there are several different ways to configure (set) these parameters. For the purpose of this article, I will be making all changes permanent (through reboots) by placing all commands in the /etc/rc.local file. The method that I use will echo the values directly into the appropriate path of the /proc file system.


Swap Space Considerations


Setting Shared Memory

Shared memory allows processes to access common structures and data by placing them in a shared memory segment. This is the fastest form of Inter-Process Communications (IPC) available - mainly due to the fact that no kernel involvement occurs when data is being passed between the processes. Data does not need to be copied between processes.

Oracle makes use of shared memory for its System Global Area (SGA), which is an area of memory that is shared by all Oracle background and foreground processes. Adequate sizing of the SGA is critical to Oracle performance since it is responsible for holding the database buffer cache, shared SQL, access paths, and so much more.

To determine all shared memory limits, use the following:

# ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 32768
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1
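
The "max seg size" line above is the kernel's SHMMAX value expressed in kilobytes; the kernel itself stores it in bytes in /proc/sys/kernel/shmmax. The sketch below does the conversion so the two can be compared. The function name shmmax_kb is hypothetical, and the plain shell arithmetic assumes values in the range seen on the 32-bit kernels used in this article.

```shell
#!/bin/sh
# shmmax_kb [FILE]
# Prints the SHMMAX value stored in FILE (default:
# /proc/sys/kernel/shmmax) converted from bytes to kilobytes,
# the unit reported by `ipcs -lm`.
shmmax_kb() {
    bytes=$(cat "${1:-/proc/sys/kernel/shmmax}")
    echo $((bytes / 1024))
}
```

A SHMMAX of 33554432 bytes corresponds to the 32768 kbytes shown in the ipcs output above.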

Setting SHMMAX

The SHMMAX parameter defines the maximum size (in bytes) of a single shared memory segment. The Oracle SGA is comprised of shared memory, and an incorrectly set SHMMAX could limit the size of the SGA. When setting SHMMAX, keep in mind that the SGA should fit within one shared memory segment. An inadequate SHMMAX setting could result in the following: