DBA Tips Archive for Oracle

  


Add a Node to an Existing Oracle RAC 10g R1 Cluster on Linux - (FireWire)

by Jeff Hunter, Sr. Database Administrator


Contents

  1. Overview
  2. Sharing the FireWire Drive
  3. Hardware & Costs
  4. Install the Operating System
  5. Network Configuration
  6. Obtaining and Installing a proper Linux Kernel
  7. Create "oracle" User and Directories
  8. Configuring New RAC Node for Remote Access
  9. All Startup Commands for the New RAC Node
  10. Checking RPM Packages for Oracle10g
  11. Installing and Configuring Oracle Cluster File System (OCFS)
  12. Installing and Configuring Automatic Storage Management (ASM)
  13. Add the Node to the Cluster with CRS
  14. Install the Oracle RDBMS Software on the New Node
  15. Reconfigure Listeners for New Node
  16. Create a New Oracle Instance



Overview

As your organization grows, so too does your need for more application and database resources to support the company's IT systems. Oracle RAC 10g provides a scalable framework that allows DBAs to effortlessly extend the database tier to support this increased demand. As the number of users and transactions increases, additional Oracle instances can be added to the Oracle database cluster to distribute the extra load.

This document is an extension to my article "Building an Inexpensive Oracle10g RAC Configuration on Linux - (WBEL 3.0)". Contained in this document are the steps to add a single node (a third node, to be exact) to an already running and configured Oracle10g RAC environment.

This article assumes the following:

  1. Two-node Oracle10g Environment: As I noted previously, this article assumes that the reader has already built and configured a two-node Oracle10g RAC environment as per the article "Building an Inexpensive Oracle10g RAC Configuration on Linux - (WBEL 3.0)". This system would consist of a dual node cluster (each with a single processor), both running Linux (White Box Enterprise Linux 3.0 Respin 1 or Red Hat Enterprise Linux 3) with a shared disk storage based on IEEE1394 (FireWire) drive technology.

  2. FireWire Hub: The enclosure for the Maxtor One Touch 250GB USB 2.0 / Firewire External Hard Drive has only two IEEE1394 (FireWire) ports on the back. Both of these ports were being used by the two nodes in the current RAC configuration. To add a third node to the cluster, I needed to purchase a FireWire hub. The one I used for this article is a BELKIN F5U526-WHT White External 6-Port Firewire Hub with AC Adapter.



Sharing the FireWire Drive

This article provides the steps to add a new node to an already configured and running Oracle10g RAC environment. The article "Building an Inexpensive Oracle10g RAC Configuration on Linux - (WBEL 3.0)" can be used to set up a dual node cluster (each with a single processor), both running Linux (White Box Enterprise Linux 3.0 Respin 1 or Red Hat Enterprise Linux 3) with shared disk storage based on IEEE1394 (FireWire) drive technology. This environment, however, uses a FireWire disk that has IEEE1394 ports for only two nodes. Adding more than two nodes requires the use of a FireWire hub. I will be using a BELKIN F5U526-WHT White External 6-Port Firewire Hub with AC Adapter, US$65.

  A 3-node cluster requires a 4-port FireWire hub while a 4-node cluster requires a 5-port FireWire hub.

The drive firmware permits up to four nodes. The patched drivers (provided by Oracle's Linux Projects development group) can likewise support up to four nodes but, as already mentioned, to support more than two nodes you must add a four-, five-, or six-port FireWire hub to the environment in order to support the overall cable length of the bus.

According to Oracle, four is likely the maximum number of nodes that will successfully work with their patched drivers. FireWire itself is designed to support more than four - as many as sixty-four, in fact - but the drives they tried don't seem to allow for more than four nodes. The issue has been mentioned to Apple, and it's possible that this may change in future versions of the protocol.

To introduce the FireWire hub to the environment, you should power down the two nodes in the current RAC environment. Plug each of the Linux nodes into a port on the hub, and plug the FireWire drive into the hub as well.

  Without a FireWire hub, the configuration won't have enough power for the total cable length on the bus!



Hardware & Costs

The hardware used to build the third node for this article consists of a Linux workstation and components which can be purchased from any local computer store or over the Internet.

Server - (linux3)
Dimension 2400 Series
     - Intel Pentium 4 Processor at 2.80GHz
     - 1GB DDR SDRAM (at 333MHz)
     - 40GB 7200 RPM Internal Hard Drive
     - Integrated Intel 3D AGP Graphics
     - Integrated 10/100 Ethernet
     - CDROM (48X Max Variable)
     - 3.5" Floppy
     - No monitor (Already had one)
     - USB Mouse and Keyboard
$620
1 - Ethernet LAN Cards
     - Linksys 10/100 Mbps - (Used for Interconnect to linux1 and linux2)

  The Linux server should contain two NIC adapters. The Dell Dimension includes an integrated 10/100 Ethernet adapter that will be used to connect to the public network. The second NIC adapter will be used for the private interconnect.

$20
1 - FireWire Card
     - SIIG, Inc. 3-Port 1394 I/O Card

  Cards with chipsets made by VIA or TI are known to work. In addition to the SIIG, Inc. 3-Port 1394 I/O Card, I have also successfully used the Belkin FireWire 3-Port 1394 PCI Card - (F5U501-APL) and StarTech 4 Port IEEE-1394 PCI Firewire Card I/O cards.

$30
Miscellaneous Components
1 - FireWire Hub
     - BELKIN F5U526-WHT White External 6-Port Firewire Hub
$60
1 - FireWire Cable
     - Belkin 6-pin to 6-pin 1394 Cable
$20
2 - Network Cables
     - Category 5e patch cable - (Connect linux3 to public network)
     - Category 5e patch cable - (Connect linux3 to interconnect ethernet switch)
$5
$5
Total     $760  


We are about to start the installation process. Now that we have talked about the hardware that will be used in this example, let's take a conceptual look at what the environment will look like:

Oracle10g RAC Environment



Install the Operating System

After procuring the required hardware for the third node, it is time to install the Linux operating system. As already mentioned, this article will use White Box Enterprise Linux (WBEL) 3.0. Although I have used Red Hat Fedora in the past, I wanted to switch to a Linux environment that would guarantee all of the functionality required by Oracle. This is where WBEL comes in. The WBEL project takes the Red Hat Enterprise Linux 3 source RPMs and compiles them into a free clone of the Enterprise Server 3.0 product. This provides a free and stable version of the Red Hat Enterprise Linux 3 (AS/ES) operating environment for testing different Oracle configurations. Over the last several months, I have been moving away from Fedora as I need a stable environment that is not only free, but as close to the actual Oracle supported operating system as possible. While WBEL is not the only project providing this functionality, I tend to stick with it as it is stable and has been around the longest.


Downloading White Box Enterprise Linux

Use the links (below) to download White Box Enterprise Linux 3.0. After downloading WBEL, you will then want to burn each of the ISO images to CD.

  White Box Enterprise Linux

  If you are downloading the above ISO files to a MS Windows machine, there are many options for burning these images (ISO files) to a CD. You may already be familiar with and have the proper software to burn images to CD. If you are not familiar with this process and do not have the required software to burn images to CD, here are just two (of many) software packages that can be used:

  UltraISO
  Magic ISO Maker
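
If you download the ISO files to a Linux machine instead, it is worth verifying the images before burning them. This is only a sketch; the MD5SUM file and the exact ISO file names are assumptions and should be adjusted to match the files you actually downloaded:

$ cd /path/to/downloaded/isos
$ md5sum -c MD5SUM

Each image should report "OK" before you burn it to CD.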


Installing White Box Enterprise Linux

This section provides a summary of the screens used to install White Box Enterprise Linux. For more detailed installation instructions, it is possible to use the manuals from Red Hat Linux http://www.redhat.com/docs/manuals/. I would suggest, however, that the instructions I have provided below be used for this Oracle10g RAC configuration.

  Before installing the Linux operating system on the new node, you should have the FireWire and two NIC interfaces (cards) installed.

After downloading and burning the WBEL images (ISO files) to CD, insert WBEL Disk #1 into the new server (linux3 in this example), power it on, and answer the installation screen prompts as noted below.

Boot Screen

The first screen is the White Box Enterprise Linux boot screen. At the boot: prompt, hit [Enter] to start the installation process.
Media Test
When asked to test the CD media, tab over to [Skip] and hit [Enter]. If there were any errors, the media burning software would have warned us. After several seconds, the installer should then detect the video card, monitor, and mouse. The installer then goes into GUI mode.
Welcome to White Box Enterprise Linux
At the welcome screen, click [Next] to continue.
Language / Keyboard / Mouse Selection
The next three screens prompt you for the Language, Keyboard, and Mouse settings. Make the appropriate selections for your configuration.
Installation Type
Choose the [Custom] option and click [Next] to continue.
Disk Partitioning Setup
Select [Automatically partition] and click [Next] to continue.

If there were a previous installation of Linux on this machine, the next screen will ask if you want to "remove" or "keep" old partitions. Select the option to [Remove all partitions on this system]. Also, ensure that the [hda] drive is selected for this installation. I also keep the checkbox [Review (and modify if needed) the partitions created] selected. Click [Next] to continue.

You will then be prompted with a dialog window asking if you really want to remove all partitions. Click [Yes] to acknowledge this warning.

Partitioning
The installer will then allow you to view (and modify if needed) the disk partitions it automatically selected. In most cases, the installer will choose 100MB for /boot, double the amount of RAM for swap, and the rest going to the root (/) partition. I like to have a minimum of 1GB for swap. For the purpose of this install, I will accept all automatically preferred sizes. (Including 2GB for swap since I have 1GB of RAM installed.)
Boot Loader Configuration
The installer will use the GRUB boot loader by default. To use the GRUB boot loader, accept all default values and click [Next] to continue.
Network Configuration
I made sure to install the second NIC interface (card) into the Linux machine before starting the operating system installation. This screen should have successfully detected each of the network devices.

First, make sure that each of the network devices is checked to [Activate on boot]. The installer may choose to not activate eth1.

Second, [Edit] both eth0 and eth1 as follows. You may choose to use different IP addresses for both eth0 and eth1 and that is OK. Verify that eth1 (the interconnect) is on the same subnet as the interconnect interfaces of the other two RAC nodes. It should be on a different subnet than eth0 (the public network):

eth0:
- Uncheck the option to [Configure using DHCP]
- Leave the [Activate on boot] checked
- IP Address: 192.168.1.107
- Netmask: 255.255.255.0

eth1:
- Uncheck the option to [Configure using DHCP]
- Leave the [Activate on boot] checked
- IP Address: 192.168.2.107
- Netmask: 255.255.255.0

Continue by setting your hostname manually. I used "linux3" for the new node. Finish this dialog off by supplying your gateway and DNS servers.

Firewall
On this screen, make sure to check [No firewall] and click [Next] to continue.
Additional Language Support / Time Zone
The next two screens allow you to select additional language support and time zone information. In almost all cases, you can accept the defaults.
Set Root Password
Select a root password and click [Next] to continue.
Package Group Selection
Scroll down to the bottom of this screen and select [Everything] under the Miscellaneous section. Click [Next] to continue.
About to Install
This screen is basically a confirmation screen. Click [Next] to start the installation. During the installation process, you will be asked to switch disks to Disk #2 and then Disk #3.
Graphical Interface (X) Configuration
When the installation is complete, the installer will attempt to detect your video hardware. Ensure that the installer has detected and selected the correct video hardware (graphics card and monitor) to properly use the X Windows server. You will continue with the X configuration in the next three screens.
Congratulations
And that's it. You have successfully installed White Box Enterprise Linux on the new node (linux3). The installer will eject the CD from the CD-ROM drive. Take out the CD and click [Exit] to reboot the system.

When the system boots into Linux for the first time, it will prompt you with another Welcome screen. The following wizard allows you to configure the date and time, add any additional users, test the sound card, and install any additional CDs. The only screen I care about is the time and date. As for the others, simply run through them as there is nothing additional that needs to be installed (at this point anyway!). If everything was successful, you should now be presented with the login screen.



Network Configuration

  Although we configured several of the network settings during the installation of White Box Enterprise Linux, it is important to not skip this section as it contains critical steps that are required for the RAC environment.


Introduction to Network Settings

During the Linux O/S install we already configured the IP address and host name for the new node. We now need to configure the /etc/hosts file as well as adjusting several of the network settings for the interconnect.

All nodes in the RAC cluster should have one static IP address for the public network and one static IP address for the private cluster interconnect. The private interconnect should only be used by Oracle to transfer Cluster Manager and Cache Fusion related data. Note that Oracle does not support using the public network interface for the interconnect. You must have one network interface for the public network and another network interface for the private interconnect. For a production RAC implementation, the interconnect should be gigabit Ethernet or faster and used exclusively by Oracle.


Configuring Public and Private Network

With the new node, we need to configure the network for access to the public network as well as the private interconnect.

The easiest way to configure network settings in Red Hat Enterprise Linux 3 is with the program Network Configuration. This application can be started from the command-line as the "root" user account as follows:

# su -
# /usr/bin/redhat-config-network &

  Do not use DHCP naming for the public IP address or the interconnects - we need static IP addresses!

Using the Network Configuration application, you need to configure both NIC devices as well as the /etc/hosts file. Both of these tasks can be completed using the Network Configuration GUI. Keep in mind that the /etc/hosts entries should be the same for all nodes in the RAC configuration.

The new node should be configured with the following settings:

Server 3 - (linux3)
Device IP Address Subnet Purpose
eth0 192.168.1.107 255.255.255.0 Connects linux3 to the public network
eth1 192.168.2.107 255.255.255.0 Connects linux3 to the interconnect network
/etc/hosts
127.0.0.1        localhost      loopback

# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2
192.168.1.107    linux3

# Private Interconnect - (eth1)
192.168.2.100    int-linux1
192.168.2.101    int-linux2
192.168.2.107    int-linux3

# Public Virtual IP (VIP) addresses for - (eth0:1)
192.168.1.200    vip-linux1
192.168.1.201    vip-linux2
192.168.1.207    vip-linux3

  Note that the virtual IP addresses only need to be defined in the /etc/hosts file for the new node (and all other nodes in the cluster). The public virtual IP addresses will be configured automatically by Oracle when you run the Oracle Universal Installer, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA). All virtual IP addresses will be activated when the srvctl start nodeapps -n <node_name> command is run. Although I am getting ahead of myself, this is the Host Name/IP Address that will be configured in the clients' tnsnames.ora file. All of this will be explained much later in this article!


The screen shots below show the screens used to configure the network.



Network Configuration Screen - Node 3 (linux3)



Ethernet Device Screen - eth0 (linux3)



Ethernet Device Screen - eth1 (linux3)



Network Configuration Screen - /etc/hosts (linux3)


Once the network is configured, you can use the ifconfig command to verify everything is working. The following example is from linux3:

$ /sbin/ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:04:5A:42:84:78
          inet addr:192.168.1.107  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:40615 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51321 errors:3 dropped:0 overruns:0 carrier:6
          collisions:0 txqueuelen:1000
          RX bytes:23287447 (22.2 Mb)  TX bytes:36558628 (34.8 Mb)
          Interrupt:11 Base address:0xf000

eth1      Link encap:Ethernet  HWaddr 00:40:CA:35:6C:93
          inet addr:192.168.2.107  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:43 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4189 (4.0 Kb)  TX bytes:2364 (2.3 Kb)
          Interrupt:9 Base address:0xe500

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8670 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8670 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:912592 (891.2 Kb)  TX bytes:912592 (891.2 Kb)
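
Beyond ifconfig, it is worth confirming that the new node can actually reach the existing nodes over both the public network and the private interconnect, using the host names defined in /etc/hosts:

$ ping -c 2 linux1
$ ping -c 2 linux2
$ ping -c 2 int-linux1
$ ping -c 2 int-linux2

Each command should report replies with 0% packet loss. If the int-* addresses do not respond, re-check the eth1 settings and the cabling to the interconnect switch.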


Make sure RAC node name is not listed in loopback address

Ensure that the node name (linux3) is not included for the loopback address in the /etc/hosts file. If the machine name is listed in the loopback address entry as below:
    127.0.0.1        linux3 localhost.localdomain localhost
it will need to be removed as shown below:
    127.0.0.1        localhost.localdomain localhost
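
A quick way to confirm the fix is to display the loopback entry and make sure the node name no longer appears on it:

$ grep "^127.0.0.1" /etc/hosts
127.0.0.1        localhost.localdomain localhost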

  If the RAC node name is listed for the loopback address, you will receive the following error during the RAC installation:
ORA-00603: ORACLE server session terminated by fatal error
or
ORA-29702: error occurred in Cluster Group Service operation


Adjusting Network Settings

With Oracle 9.2.0.1 and later, Oracle uses UDP as the default protocol on Linux for inter-process communication (IPC), such as Cache Fusion and Cluster Manager buffer transfers between instances within the RAC cluster.

Oracle strongly suggests adjusting the default and maximum send buffer size (SO_SNDBUF socket option) and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256 KB.

The receive buffers are used by TCP and UDP to hold received data until it is read by the application. For TCP, the receive buffer cannot overflow because the peer is not allowed to send data beyond the advertised window size. For UDP, however, datagrams that do not fit in the socket receive buffer are discarded, which means a fast sender can overwhelm a slow receiver.

  The default and maximum window size can be changed in the /proc file system without reboot:
# su - root

# sysctl -w net.core.rmem_default=262144
net.core.rmem_default = 262144

# sysctl -w net.core.wmem_default=262144
net.core.wmem_default = 262144

# sysctl -w net.core.rmem_max=262144
net.core.rmem_max = 262144

# sysctl -w net.core.wmem_max=262144
net.core.wmem_max = 262144

The above commands made the changes to the already running O/S. You should now make the above changes permanent (for each reboot) by adding the following lines to the /etc/sysctl.conf file for each node in your RAC cluster:

# Default setting in bytes of the socket receive buffer
net.core.rmem_default=262144

# Default setting in bytes of the socket send buffer
net.core.wmem_default=262144

# Maximum socket receive buffer size which may be set by using
# the SO_RCVBUF socket option
net.core.rmem_max=262144

# Maximum socket send buffer size which may be set by using 
# the SO_SNDBUF socket option
net.core.wmem_max=262144
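
Rather than rebooting or re-running each of the sysctl -w commands, you can have the kernel re-read /etc/sysctl.conf and apply the new entries in one step:

# sysctl -p
...
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_max = 262144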



Obtaining and Installing a proper Linux Kernel


Overview

The next step is to obtain and install a new Linux kernel on the new node (linux3). The new Linux kernel provides support for multiple logins to IEEE1394 (FireWire) devices. Red Hat Enterprise Linux 3 (and White Box Enterprise Linux 3.0 (Respin 1)) includes kernel 2.4.21-15.EL #1. We will need to download the Oracle Technology Network supplied 2.4.21-27.0.2.ELorafw1 Linux kernel from the following URL: http://oss.oracle.com/projects/firewire/files.


Download one of the following files, depending on the number of processors in the new node:

  kernel-2.4.21-27.0.2.ELorafw1.i686.rpm - (for single processor)
  kernel-smp-2.4.21-27.0.2.ELorafw1.i686.rpm - (for multiple processors)
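
Before installing the downloaded package, a quick sanity check confirms that the RPM arrived intact and contains the kernel image you expect (shown here for the single-processor package; use the smp file name if that is what you downloaded):

# rpm -qpi kernel-2.4.21-27.0.2.ELorafw1.i686.rpm
# rpm -qpl kernel-2.4.21-27.0.2.ELorafw1.i686.rpm | grep vmlinuz
/boot/vmlinuz-2.4.21-27.0.2.ELorafw1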


Take a backup of your GRUB configuration file:

In most cases you will be using GRUB for the boot loader. Before actually installing the new kernel, backup a copy of your /etc/grub.conf file:
# cp /etc/grub.conf /etc/grub.conf.original


Install the new kernel, as root:

On the new node (linux3), type the following:

# rpm -ivh --force kernel-2.4.21-27.0.2.ELorafw1.i686.rpm   - (for single processor)
  - OR -
# rpm -ivh --force kernel-smp-2.4.21-27.0.2.ELorafw1.i686.rpm   - (for multiple processors)

  Installing the new kernel using RPM will also update your GRUB (or LILO) configuration with the appropriate stanza. There is no need to add any new stanza to your boot loader configuration unless you want to have your old kernel image available.

The following is a listing of my /etc/grub.conf file before and after the kernel install. As you can see, the install added another stanza for the 2.4.21-27.0.2.ELorafw1 kernel. By default, the installer keeps your original kernel as the default by setting default=1. You should change the default value to zero (default=0) so that the new kernel is booted by default. (The listing below already reflects this change.)

Original /etc/grub.conf File
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda2
#          initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title White Box Enterprise Linux (2.4.21-15.EL)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-15.EL ro root=LABEL=/
        initrd /initrd-2.4.21-15.EL.img
Newly Configured /etc/grub.conf File After Kernel Install
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda2
#          initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title White Box Enterprise Linux (2.4.21-27.0.2.ELorafw1)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-27.0.2.ELorafw1 ro root=LABEL=/
        initrd /initrd-2.4.21-27.0.2.ELorafw1.img
title White Box Enterprise Linux (2.4.21-15.EL)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-15.EL ro root=LABEL=/
        initrd /initrd-2.4.21-15.EL.img
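
Before rebooting into the new kernel, you can quickly confirm which stanza GRUB will boot by default (stanzas are counted from zero, top to bottom), and after the reboot verify the running kernel release:

# grep "^default" /etc/grub.conf
default=0

# uname -r
2.4.21-27.0.2.ELorafw1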


Add module options:

Add the following lines to /etc/modules.conf:

alias ieee1394-controller ohci1394
options sbp2 sbp2_exclusive_login=0
post-install sbp2 insmod sd_mod
post-install sbp2 insmod ohci1394
post-remove sbp2 rmmod sd_mod

It is vital that the parameter sbp2_exclusive_login of the Serial Bus Protocol module (sbp2) be set to zero to allow multiple hosts to log in to and access the FireWire disk concurrently. The post-install sbp2 insmod sd_mod entry ensures the SCSI disk driver module (sd_mod) is loaded as well, since sbp2 requires the SCSI layer. The core SCSI support module (scsi_mod) will be loaded automatically when sd_mod is loaded - there is no need to make a separate entry for it.


Connect FireWire drive to new machine and boot into the new kernel:

After you have performed the above tasks on the new node, power down all three nodes in the cluster:
===============================

# hostname
linux1

# init 0

===============================

# hostname
linux2

# init 0

===============================

# hostname
linux3

# init 0

===============================
Once all machines are powered down, connect the new node to one of the ports on the FireWire hub.

Power on all Linux servers in the cluster (including the new node) one at a time. Verify that the new node (and all other nodes) boot into the new kernel.

  I am still not sure why all of the machines had to be powered down before the new Linux server was able to see the shared drive. When I attempted to connect the third Linux server to the cluster and recognize the shared drive, it crashed the first two RAC instances. For now, the easiest way to perform this is to cycle all nodes in the cluster so the new server can recognize the shared drive.


Loading the FireWire stack:

  Starting with Red Hat Enterprise Linux (and of course White Box Enterprise Linux!), the loading of the FireWire stack should already be configured!

In most cases, the loading of the FireWire stack will already be configured in the /etc/rc.sysinit file. The commands that are contained within this file that are responsible for loading the FireWire stack are:

# modprobe sbp2
# modprobe ohci1394
In older versions of Red Hat, this was not the case and these commands would have to be manually run or put within a startup file. With Red Hat Enterprise Linux and higher, these commands are already put within the /etc/rc.sysinit file and run on each boot.


Check for SCSI Device:

After the new node has rebooted, the kernel should automatically detect the shared drive as a SCSI device (/dev/sdXX). This section will provide several commands that should be run on the new node in the cluster to verify the FireWire drive was successfully detected and being shared by the new node.

Let's first check to see that the FireWire adapter was successfully detected:

# lspci
00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
00:02.0 VGA compatible controller: Intel Corp. 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)
00:1d.0 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #3 (rev 01)
00:1d.7 USB Controller: Intel Corp. 82801DB (ICH4) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corp. 82801DB (ICH4) LPC Bridge (rev 01)
00:1f.1 IDE interface: Intel Corp. 82801DB (ICH4) Ultra ATA 100 Storage Controller (rev 01)
00:1f.3 SMBus: Intel Corp. 82801DB/DBM (ICH4) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corp. 82801DB (ICH4) AC'97 Audio Controller (rev 01)
01:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
01:05.0 Modem: Intel Corp.: Unknown device 1080 (rev 04)
01:06.0 Ethernet controller: Linksys NC100 Network Everywhere Fast Ethernet 10/100 (rev 11)
01:09.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)
Second, let's check to see that the modules are loaded:
# lsmod |egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
sbp2                   19724   0
ohci1394               28008   0  (unused)
ieee1394               62884   0  [sbp2 ohci1394]
sd_mod                 13712   0
scsi_mod              106664   4  [sg sbp2 sym53c8xx sd_mod]
Third, let's make sure the disk was detected and an entry was made by the kernel:
# cat /proc/scsi/scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: Maxtor   Model: OneTouch         Rev: 0200
  Type:   Direct-Access                    ANSI SCSI revision: 06
Now let's verify that the FireWire drive is accessible for multiple logins and shows a valid login:
# dmesg | grep sbp2
ieee1394: sbp2: Query logins to SBP-2 device successful
ieee1394: sbp2: Maximum concurrent logins supported: 3
ieee1394: sbp2: Number of active logins: 2
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node[01:1023]: Max speed [S400] - Max payload [2048]

From the above output, you can see that the FireWire drive I have can support concurrent logins by up to 3 servers. It is vital that you have a drive where the chipset supports concurrent access for all nodes within the RAC cluster.

One other test I like to perform is to run a quick fdisk -l from each node in the cluster to verify that it is really being picked up by the O/S. It will show that the device contains all partitions that were created when the cluster was first configured:

# fdisk -l

Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1        37    297171   83  Linux
/dev/sda2            38      6117  48837600   83  Linux
/dev/sda3          6118     12197  48837600   83  Linux
/dev/sda4         12198     18277  48837600   83  Linux

Disk /dev/hda: 40.0 GB, 40000000000 bytes
255 heads, 63 sectors/track, 4863 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   83  Linux
/dev/hda2            14      4609  36917370   83  Linux
/dev/hda3          4610      4863   2040255   82  Linux swap


Troubleshooting SCSI Device Detection:

If you are having troubles with any of the procedures (above) in detecting the SCSI device, you can try the following:
# modprobe -r sbp2
# modprobe -r sd_mod
# modprobe -r ohci1394
# modprobe ohci1394
# modprobe sd_mod
# modprobe sbp2

You may also want to unplug any USB devices connected to the server. The system may not be able to recognize your FireWire drive if you have a USB device attached!



Create "oracle" User and Directories


  The required shared files for Oracle Cluster Ready Services (CRS) are stored using the Oracle Cluster File System (OCFS) within the original RAC cluster. When using OCFS, the UID of the UNIX user "oracle" and GID of the UNIX group "dba" must be the same on all machines in the cluster. If either the UID or GID are different, the files on the OCFS file system will show up as "unowned" or may even be owned by a different user. For this article (and the previous article that created the original RAC configuration), I used 175 for the "oracle" UID and 115 for the "dba" GID.


Create Group and User for Oracle

Let's continue this example by creating the UNIX "dba" group and "oracle" user account, along with all appropriate directories, on the new node.

# mkdir -p /u01/app
# groupadd -g 115 dba
# useradd -u 175 -g 115 -d /u01/app/oracle -s /bin/bash -c "Oracle Software Owner" -p oracle oracle
# chown -R oracle:dba /u01
# passwd oracle
# su - oracle
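
Because OCFS relies on matching numeric IDs across the cluster, it is worth confirming that the account just created really does line up with the existing nodes. As the new oracle user on linux3, run id and compare the output against linux1 and linux2 (175 and 115 in this configuration):

$ id
uid=175(oracle) gid=115(dba) groups=115(dba)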

  When you are setting the Oracle environment variables for each RAC node, ensure to assign each RAC node a unique Oracle SID!

For this example, I used:

  • linux1 : ORACLE_SID=orcl1
  • linux2 : ORACLE_SID=orcl2
  • linux3 : ORACLE_SID=orcl3


Create Login Script for oracle User Account

After creating the "oracle" UNIX user account on the new node, make sure that you are logged in as the oracle user and verify that the environment is setup correctly by using the following .bash_profile:

.bash_profile for Oracle User
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
      . ~/.bashrc
fi

alias ls="ls -FA"
alias s="screen -DRRS iPad -t iPad"

# User specific environment and startup programs
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/10.1.0/db_1
export ORA_CRS_HOME=$ORACLE_BASE/product/10.1.0/crs

# Each RAC node must have a unique ORACLE_SID. (i.e. orcl1, orcl2, orcl3, ...)
export ORACLE_SID=orcl3

export PATH=.:${PATH}:$HOME/bin:$ORACLE_HOME/bin
export PATH=${PATH}:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
export ORACLE_TERM=xterm
export TNS_ADMIN=$ORACLE_HOME/network/admin
export ORA_NLS10=$ORACLE_HOME/nls/data
export NLS_DATE_FORMAT="DD-MON-YYYY HH24:MI:SS"
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export CLASSPATH=$ORACLE_HOME/JRE
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/rdbms/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
export THREADS_FLAG=native
export TEMP=/tmp
export TMPDIR=/tmp
export LD_ASSUME_KERNEL=2.4.1
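
After saving the .bash_profile, log out and back in as the oracle user (or source the file) and spot-check a few of the key environment variables:

$ source ~/.bash_profile

$ echo $ORACLE_SID
orcl3

$ echo $ORACLE_HOME
/u01/app/oracle/product/10.1.0/db_1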


Create Mount Point for OCFS / CRS

Finally, let's create the mount point for the Oracle Cluster File System (OCFS) which is currently being used to store files for the Oracle Cluster Ready Service (CRS). These commands will need to be run as the "root" user account:
$ su -
# mkdir -p /u02/oradata/orcl
# chown -R oracle:dba /u02



Configuring New RAC Node for Remote Access

When running the Oracle Universal Installer on a RAC node, it will use the rsh (or ssh) command to copy the Oracle software to all other nodes within the RAC cluster. The oracle UNIX account on the node running the Oracle Universal Installer (runInstaller) must be trusted by all other nodes in your RAC cluster (including the new node we are adding!). This means that you should be able to run r* commands like rsh, rcp, and rlogin on the Linux server you will be running the Oracle installer from, against all other Linux servers in the cluster, without a password. The rsh daemon validates users using the /etc/hosts.equiv file or the .rhosts file found in the user's (oracle's) home directory.

  The use of rcp and rsh is not required for normal RAC operation. However, rcp and rsh must be enabled for RAC and patchset installation, and for the task of adding nodes to the cluster.

  Oracle added support in 10g for using the Secure Shell (SSH) tool suite for setting up user equivalence. This article, however, uses the older method of rcp for copying the Oracle software to the other nodes in the cluster. When using the SSH tool suite, the scp (as opposed to the rcp) command would be used to copy the software in a very secure manner. In an effort to get this article out on time, I did not include instructions for setting up and using the SSH protocol. I will start using SSH in future articles.

First, let's make sure that we have the rsh RPMs installed on the new node:

# rpm -q rsh rsh-server
rsh-0.17-17
rsh-server-0.17-17
From the above, we can see that we have both the rsh and rsh-server packages installed.

  If rsh is not installed, run the following command from the CD where the RPM is located:
# su -
# rpm -ivh rsh-0.17-17.i386.rpm rsh-server-0.17-17.i386.rpm

To enable the "rsh" service, the "disable" attribute in the /etc/xinetd.d/rsh file must be set to "no" and xinetd must be reloaded. This can be done by running the following commands on the new node:

# su -
# chkconfig rsh on
# chkconfig rlogin on
# service xinetd reload
Reloading configuration: [  OK  ]
To allow the "oracle" UNIX user account to be trusted among the RAC nodes, create the /etc/hosts.equiv file on the new node:
# su -
# touch /etc/hosts.equiv
# chmod 600 /etc/hosts.equiv
# chown root.root /etc/hosts.equiv
Now add all RAC nodes to the /etc/hosts.equiv file similar to the following example for all nodes in the cluster:
# cat /etc/hosts.equiv
+linux1 oracle
+linux2 oracle
+linux3 oracle
+int-linux1 oracle
+int-linux2 oracle
+int-linux3 oracle

  In the above example, the second field permits only the oracle user account to run rsh commands on the specified nodes. For security reasons, the /etc/hosts.equiv file should be owned by root and the permissions should be set to 600. In fact, some systems will only honor the content of this file if the owner of this file is root and the permissions are set to 600.

  Before attempting to test your rsh command, ensure that you are using the correct version of rsh. By default, Red Hat Linux puts /usr/kerberos/sbin at the head of the $PATH variable. This will cause the Kerberos version of rsh to be executed.

I will typically rename the Kerberos version of rsh so that the normal rsh command is being used. Use the following:

# su -

# which rsh
/usr/kerberos/bin/rsh

# cd /usr/kerberos/bin
# mv rsh rsh.original

# which rsh
/usr/bin/rsh

You should now test your connections and run the rsh command from the node that will be performing the Oracle CRS and 10g RAC installation. I will be using the node linux1 to perform the install so this is where I will run the following commands from:

# su - oracle

$ hostname
linux1

$ rsh linux3 ls -l /etc/hosts.equiv
-rw-------    1 root     root           102 May 24 11:23 /etc/hosts.equiv

$ rsh int-linux3 ls -l /etc/hosts.equiv
-rw-------    1 root     root           102 May 24 11:23 /etc/hosts.equiv
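
Since the installer will use both the public and private names when copying software, it is convenient to exercise every entry in /etc/hosts.equiv from linux1 in one pass. A small sketch using the host names from this article:

$ for host in linux1 linux2 linux3 int-linux1 int-linux2 int-linux3; do
>   echo -n "$host: "; rsh $host hostname
> done
linux1: linux1
linux2: linux2
linux3: linux3
int-linux1: linux1
int-linux2: linux2
int-linux3: linux3

If any host prompts for a password or hangs, re-check the /etc/hosts.equiv file and the rsh/rlogin services on that node.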



All Startup Commands for the New RAC Node

Up to this point, we have talked about some (but not all) of the parameters and resources that need to be configured on the new node for the Oracle10g RAC configuration. This section takes a deep breath and recaps those parameters, commands, and entries (from previous sections of this document) that need to be in place on the new node when the machine is booted.

In this section, I provide all of the commands, parameters, and entries that have been discussed so far and that need to be included in the startup scripts for the new node being added to the RAC cluster. For each of the startup files below, the entries required to provide a successful RAC node are listed.

  Notice that the parameters used to configure the hangcheck-timer and O/S kernel parameters for Oracle have not been discussed in any detail in this article. I did, however, include them in the files below. All of these parameters will need to be configured for the new Linux node being added to the cluster. In particular, the hangcheck-timer parameters will need to be configured in /etc/modules.conf and /etc/rc.local while all O/S kernel parameters are included in /etc/rc.local.


/etc/modules.conf

All parameters and values to be used by kernel modules.

/etc/modules.conf
alias eth0 tulip
alias eth1 b44
alias sound-slot-0 i810_audio
post-install sound-slot-0 /bin/aumix-minimal -f /etc/.aumixrc -L >/dev/null 2>&1 || :
pre-remove sound-slot-0 /bin/aumix-minimal -f /etc/.aumixrc -S >/dev/null 2>&1 || :
alias usb-controller usb-uhci
alias usb-controller1 ehci-hcd
alias ieee1394-controller ohci1394
options sbp2 sbp2_exclusive_login=0
post-install sbp2 insmod sd_mod
post-install sbp2 insmod ohci1394
post-remove sbp2 rmmod sd_mod
options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180


/etc/sysctl.conf

We wanted to adjust the default and maximum send buffer size as well as the default and maximum receive buffer size for the interconnect.

/etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

# Default setting in bytes of the socket receive buffer
net.core.rmem_default=262144

# Default setting in bytes of the socket send buffer
net.core.wmem_default=262144

# Maximum socket receive buffer size which may be set by using
# the SO_RCVBUF socket option
net.core.rmem_max=262144

# Maximum socket send buffer size which may be set by using
# the SO_SNDBUF socket option
net.core.wmem_max=262144


/etc/hosts

All machine/IP entries for nodes in the RAC cluster.

/etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain   localhost
# Public Network - (eth0)
192.168.1.100    linux1
192.168.1.101    linux2
192.168.1.107    linux3
# Private Interconnect - (eth1)
192.168.2.100    int-linux1
192.168.2.101    int-linux2
192.168.2.107    int-linux3
# Public Virtual IP (VIP) addresses for - (eth0)
192.168.1.200    vip-linux1
192.168.1.201    vip-linux2
192.168.1.207    vip-linux3
192.168.1.106    melody
192.168.1.102    alex
192.168.1.105    bartman


/etc/hosts.equiv

Allow logins to each node as the oracle user account without the need for a password.

/etc/hosts.equiv
+linux1 oracle
+linux2 oracle
+linux3 oracle
+int-linux1 oracle
+int-linux2 oracle
+int-linux3 oracle


/etc/grub.conf

Determine which kernel to load by default when the node is booted.

/etc/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda2
#          initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title White Box Enterprise Linux (2.4.21-27.0.2.ELorafw1)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-27.0.2.ELorafw1 ro root=LABEL=/
        initrd /initrd-2.4.21-27.0.2.ELorafw1.img
title White Box Enterprise Linux (2.4.21-15.EL)
        root (hd0,0)
        kernel /vmlinuz-2.4.21-15.EL ro root=LABEL=/
        initrd /initrd-2.4.21-15.EL.img


/etc/rc.local

These commands are responsible for configuring shared memory, semaphores, and file handles for use by the Oracle instance.

/etc/rc.local
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local

# +---------------------------------------------------------+
# | SHARED MEMORY                                           |
# +---------------------------------------------------------+

echo "2147483648" > /proc/sys/kernel/shmmax
echo       "4096" > /proc/sys/kernel/shmmni


# +---------------------------------------------------------+
# | SEMAPHORES                                              |
# | ----------                                              |
# |                                                         |
# | SEMMSL_value  SEMMNS_value  SEMOPM_value  SEMMNI_value  |
# |                                                         |
# +---------------------------------------------------------+

echo "256 32000 100 128" > /proc/sys/kernel/sem


# +---------------------------------------------------------+
# | FILE HANDLES                                            |
# ----------------------------------------------------------+

echo "65536" > /proc/sys/fs/file-max


# +---------------------------------------------------------+
# | HANGCHECK TIMER                                         |
# | (I do not believe this is required, but doesn't hurt)   |
# ----------------------------------------------------------+

/sbin/modprobe hangcheck-timer
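
After the next reboot (or after running the script by hand), you can confirm that the kernel parameters set in /etc/rc.local actually took effect by reading them back from /proc, and that the hangcheck-timer module is loaded:

# cat /proc/sys/kernel/shmmax
2147483648

# cat /proc/sys/kernel/sem
256     32000   100     128

# cat /proc/sys/fs/file-max
65536

# lsmod | grep hangcheck-timer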



Checking RPM Packages for Oracle10g


Overview

When installing the Linux O/S (White Box Enterprise Linux or Red Hat Enterprise Linux 3.0), you should verify that all required RPMs are installed. If you followed the instructions I used for installing Linux, you will have installed Everything, in which case you will have all of the required RPM packages. However, if you performed another installation type (e.g. "Advanced Server"), you may have some packages missing and will need to install them. All of the required RPMs are on the Linux CDs/ISOs.


Check Required RPMs

The following packages (or higher versions) must be installed.

make-3.79.1
gcc-3.2.3-34
glibc-2.3.2-95.20
glibc-devel-2.3.2-95.20
glibc-headers-2.3.2-95.20
glibc-kernheaders-2.4-8.34
cpp-3.2.3-34
compat-db-4.0.14-5
compat-gcc-7.3-2.96.128
compat-gcc-c++-7.3-2.96.128
compat-libstdc++-7.3-2.96.128
compat-libstdc++-devel-7.3-2.96.128
openmotif-2.2.2-16
setarch-1.3-1
To query package information (gcc and glibc-devel for example), use the "rpm -q <PackageName> [, <PackageName>]" command as follows:
# rpm -q gcc glibc-devel
gcc-3.2.3-34
glibc-devel-2.3.2-95.20
If you need to install any of the above packages (which you should not have to if you installed Everything), use the "rpm -Uvh <PackageName.rpm>" command. For example, to install the gcc-3.2.3-34 package, use:
# rpm -Uvh gcc-3.2.3-34.i386.rpm



Installing and Configuring Oracle Cluster File System (OCFS)


Overview

It is now time to install the Oracle Cluster File System (OCFS). In the current cluster, the two files that are required to be shared by CRS are stored on an OCFS Release 1.0 file system.

See the following document for more information on Oracle Cluster File System Release 1.0 (including Installation Notes) for Red Hat Linux:

  Oracle Cluster File System - (Part No. B10499-01)


Downloading OCFS

Let's now download the OCFS files (driver, tools, support) from the Oracle Linux Projects Development Group web site. The main URL for the OCFS project files is:
  http://oss.oracle.com/projects/ocfs/files/RedHat/RHEL3/i386/
The page (above) will contain several releases of the OCFS files for different versions of the Linux kernel. First, download the key OCFS drivers for either a single processor or a multiple processor Linux server:
  ocfs-2.4.21-EL-1.0.14-1.i686.rpm - (for single processor)
- OR -
  ocfs-2.4.21-EL-smp-1.0.14-1.i686.rpm - (for multiple processors)
You will also need to download the following two support files:
  ocfs-support-1.0.10-1.i386.rpm - (1.0.10-1 support package)
  ocfs-tools-1.0.10-1.i386.rpm - (1.0.10-1 tools package)

  If you are unsure which OCFS driver release you need, use the OCFS release that matches your kernel version. To determine your kernel release:
$ uname -a
Linux linux1 2.4.21-27.0.2.ELorafw1 #1 Tue Dec 28 16:58:59 PST 2004 i686 i686 i386 GNU/Linux
Since the string "smp" does not appear after "ELorafw1", we are running on a single processor (uniprocessor) machine. If "smp" did appear, you would be running on a multi-processor machine.


Installing OCFS

The new node I will be installing the OCFS files onto is a single processor Linux machine. The installation process is simply a matter of running the following command as the root user account:
$ su -
# rpm -Uvh ocfs-2.4.21-EL-1.0.14-1.i686.rpm \
         ocfs-support-1.0.10-1.i386.rpm \
         ocfs-tools-1.0.10-1.i386.rpm
Preparing...                ########################################### [100%]
   1:ocfs-support           ########################################### [ 33%]
   2:ocfs-2.4.21-EL         ########################################### [ 67%]
Linking OCFS module into the module path [  OK  ]
   3:ocfs-tools             ########################################### [100%]


Configuring and Loading OCFS

The next step is to generate and configure the /etc/ocfs.conf file. The easiest way to accomplish this is to run the GUI tool ocfstool. This will need to be done as the root user account:
$ su -
# ocfstool &
This will bring up the GUI as shown below:


Using the ocfstool GUI tool, perform the following steps:

  1. Select [Task] - [Generate Config]
  2. In the "OCFS Generate Config" dialog, enter the interface and DNS Name for the private interconnect. In my example, this would be eth1 and int-linux3 for the new node.
  3. After verifying all values are correct, exit the application.


The following dialog shows the settings I used for the node linux3:


After exiting the ocfstool, you will have a /etc/ocfs.conf similar to the following:

/etc/ocfs.conf
#
# ocfs config
# Ensure this file exists in /etc
#

        node_name = int-linux3
        ip_address = 192.168.2.107
        ip_port = 7000
        comm_voting = 1
        guid = 2B051E7EF7F329918E210040CA356C93

  Notice the guid value. This is a unique identifier (GUID) that must be different for each node in the cluster. Keep in mind also that the /etc/ocfs.conf file could have been created manually, or by simply running the ocfs_uid_gen -c command, which will assign (or update) the GUID value in the file.

The next step is to load the ocfs.o kernel module. Like all steps in this section, run the following command as the root user account:

$ su -
# /sbin/load_ocfs
/sbin/insmod ocfs node_name=int-linux3 ip_address=192.168.2.107 cs=1803 guid=2B051E7EF7F329918E210040CA356C93 comm_voting=1 ip_port=7000
Using /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o
Warning: kernel-module version mismatch
        /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o was compiled for kernel version 2.4.21-27.EL
        while this kernel is version 2.4.21-27.0.2.ELorafw1
Warning: loading /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o will taint the kernel: forced load
  See http://www.tux.org/lkml/#export-tainted for information about tainted modules
Module ocfs loaded, with warnings


The two warnings (above) can safely be ignored! To verify that the kernel module was loaded, run the following:

# /sbin/lsmod |grep ocfs
ocfs                  299072   0  (unused)


  The ocfs module will stay loaded until the machine is cycled. I will provide instructions for how to load the module automatically in the section Configuring OCFS to Mount Automatically at Startup.


  Many types of errors can occur while attempting to load the ocfs module. For the purpose of this article, I did not run into any of these problems. I only include them here for documentation purposes!
One common error looks like this:
# /sbin/load_ocfs
/sbin/insmod ocfs node_name=int-linux3 \
                  ip_address=192.168.2.107 \
                  cs=1891 \
                  guid=2B051E7EF7F329918E210040CA356C93 \
                  comm_voting=1 ip_port=7000
Using /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o
/lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o: kernel-module version mismatch
        /lib/modules/2.4.21-EL-ABI/ocfs/ocfs.o was compiled for kernel version 2.4.21-4.EL
        while this kernel is version 2.4.21-15.ELorafw1.
This usually means you have the wrong version of the modutils RPM. Get the latest version of modutils and use the following command to update your system:
rpm -Uvh modutils-devel-2.4.25-12.EL.i386.rpm


Other problems can occur when using FireWire. If you are still having trouble loading and verifying the ocfs module, try the following on all nodes that are having the error, as the "root" user account:

$ su -
# mkdir -p /lib/modules/`uname -r`/kernel/drivers/addon/ocfs
# ln -s `rpm -qa | grep ocfs-2 | xargs rpm -ql | grep "/ocfs.o$"` \
      /lib/modules/`uname -r`/kernel/drivers/addon/ocfs/ocfs.o

Thanks again to Werner Puschitz for coming up with the above solutions!


Mounting the OCFS File System

Now that OCFS is installed and the module is loaded, we can mount the clustered file system that was created when the original two-node cluster was configured. Let's first do it using the command-line, then I'll show how to include it in the /etc/fstab file to have it mounted on each boot.

  Mounting the file system will need to be performed as the root user account.

First, here is how to manually mount the OCFS file system from the command-line. Remember that this needs to be performed as the root user account:

$ su -
# mount -t ocfs /dev/sda1 /u02/oradata/orcl
If the mount was successful, you will simply get your prompt back. We should, however, run the following checks to ensure the file system is mounted correctly with the right permissions:

First, let's use the mount command to ensure that the new file system is really mounted:

# mount
/dev/hda3 on / type ext3 (rw)
none on /proc type proc (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbdevfs on /proc/bus/usb type usbdevfs (rw)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/sda1 on /u02/oradata/orcl type ocfs (rw)

Next, use the ls command to check ownership. The permissions should be set to 0775 with owner "oracle" and group "dba". If this is not the case for all nodes in the cluster, then it is very possible that the "oracle" UID (175 in this example) and/or the "dba" GID (115 in this example) are not the same across all nodes.

# ls -ld /u02/oradata/orcl
drwxrwxr-x    1 oracle   dba        131072 May 24 12:52 /u02/oradata/orcl


Configuring OCFS to Mount Automatically at Startup

Let's take a look at what we have done so far. We downloaded and installed the Oracle Cluster File System, which is used to store the files needed by Oracle Cluster Ready Services (CRS). After going through the install, we loaded the OCFS module into the kernel and mounted the existing clustered file system. This section walks through the steps responsible for loading the OCFS module and ensuring the file system(s) are mounted each time the machine is booted.

We start by adding the following line to the /etc/fstab file on the new node:

/dev/sda1    /u02/oradata/orcl     ocfs    _netdev    0 0

  Notice the "_netdev" option for mounting this file system. This option prevents the OCFS file system from being mounted until all of the networking services are enabled.

Now, let's make sure that the ocfs.o kernel module is being loaded and that the file system will be mounted during the boot process.

If you have been following along with the examples in this article, the actions to load the kernel module and mount the OCFS file system should already be enabled. However, we should still check those options by running the following as the root user account:

$ su -
# chkconfig --list ocfs
ocfs            0:off   1:off   2:on    3:on    4:on    5:on    6:off
The flags for runlevels 2, 3, 4, and 5 (shown above) should be set to "on". If for some reason these options are set to "off", you can use the following command to enable them:
$ su -
# chkconfig ocfs on


  Note that loading the ocfs.o kernel module will also mount the OCFS file system(s) configured in /etc/fstab!



Installing and Configuring Automatic Storage Management (ASM)


Introduction

In this section, we will configure Automatic Storage Management (ASM) to be used as the file system / volume manager for all Oracle physical database files (data, online redo logs, control files, archived redo logs).

If you would like to learn more about the ASMLib, visit http://www.oracle.com/technology/tech/linux/asmlib/install.html


Downloading the ASMLib Packages

We start this section by downloading the ASMLib libraries and driver from OTN. Like the Oracle Cluster File System, we need to download the version that matches the Linux kernel and the number of processors on the machine. We are using kernel 2.4.21 and the machine I am using has only a single processor:
# uname -a
Linux linux1 2.4.21-27.0.2.ELorafw1 #1 Tue Dec 28 16:58:59 PST 2004 i686 i686 i386 GNU/Linux

  If you do not currently have an account with Oracle OTN, you will need to create one. This is a FREE account!


  Oracle ASMLib Downloads


Installing ASMLib Packages

This installation needs to be performed as the root user account:
$ su -
# rpm -Uvh oracleasm-2.4.21-EL-1.0.3-1.i686.rpm \
         oracleasmlib-1.0.0-1.i386.rpm \
         oracleasm-support-1.0.3-1.i386.rpm
Preparing...                ########################################### [100%]
   1:oracleasm-support      ########################################### [ 33%]
   2:oracleasm-2.4.21-EL    ########################################### [ 67%]
Linking module oracleasm.o into the module path [  OK  ]
   3:oracleasmlib           ########################################### [100%]


Configuring and Loading the ASMLib Packages

Now that we have downloaded and installed the ASMLib packages for Linux, we need to configure and load the ASM kernel module. This task needs to be run as the root user account:
$ su -
# /etc/init.d/oracleasm configure
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting <ENTER> without typing an
answer will keep that current value.  Ctrl-C will abort.

Default user to own the driver interface []: oracle
Default group to own the driver interface []: dba
Start Oracle ASM library driver on boot (y/n) [n]: y
Fix permissions of Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration [  OK  ]
Creating /dev/oracleasm mount point [  OK  ]
Loading module "oracleasm" [  OK  ]
Mounting ASMlib driver filesystem [  OK  ]
Scanning system for ASM disks [  OK  ]
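
If you would like to double-check that the configuration took effect, the following quick checks (a sketch, run as root) confirm that the oracleasm module is loaded and that its pseudo file system is mounted at /dev/oracleasm:

# grep oracleasm /proc/filesystems
# mount | grep oracleasm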


Scan for ASM Disks

From the new node, you can now perform a scan to recognize the current ASM volumes. Even though the configuration above automatically ran the scandisks utility, I still like to perform this step manually!
# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks [  OK  ]

We can now test that the ASM disks were successfully identified using the following command as the root user account:

# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
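
As one more check, the disk devices that ASMLib exposes on the new node should be owned by the oracle user and dba group supplied during the configure step. Assuming the default ASMLib device directory, a quick look is:

# ls -l /dev/oracleasm/disks

Each of the three volumes (VOL1, VOL2, and VOL3) should be listed and owned by oracle:dba.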



Add the Node to the Cluster with CRS

At this point, all of the preparation tasks described in the previous sections (operating system, networking, kernel, oracle user, OCFS, and ASM) have been completed on the new RAC node.

We finally get to the core of the article by adding the new node to the current RAC cluster! The new node (linux3) needs to be added to the cluster at the clusterware layer so that the other nodes in the RAC cluster consider it to be part of the cluster. Here are the steps:

  1. Set the DISPLAY Variable

    Adding a node to the cluster requires running a script that will launch the GUI Oracle Universal Installer (OUI). You will be performing this install from one of the existing nodes in the current RAC cluster (i.e. linux1). Set your DISPLAY variable on this server to the workstation or server you are using for an X server:

    $ DISPLAY=<machine_name_or_IP_address>:0.0; export DISPLAY
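
    If you are not sure whether the X display is reachable from linux1, a simple test (assuming a small X client such as xclock is installed) is:

    $ xclock &

    If the clock appears on your workstation, the OUI will be able to display as well.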

  2. Run addNode.sh Script from Existing Node

    From the same node (linux1 in my example), change to the directory $ORA_CRS_HOME/oui/bin and run the addNode.sh command as the oracle UNIX user account. Note that this will bring up the OUI and prompt for the new node as shown in the following:

    $ hostname
    linux1
    
    $ cd $ORA_CRS_HOME/oui/bin
    $ ./addNode.sh -ignoreSysPrereqs

  3. Welcome Screen

    At the Welcome screen, click [Next].

  4. Specify Hardware Cluster Installation Mode

    At the next screen, Specify Hardware Cluster Installation Mode, enter the information for the new node: linux3. This new node should already be in the /etc/hosts file and pingable from each of the cluster nodes already configured in the RAC cluster.
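
    As a quick check, verify from linux1 that the new node resolves and responds on both its public and private names (a sketch using the host names assumed throughout this article):

    $ grep linux3 /etc/hosts
    $ ping -c 1 linux3
    $ ping -c 1 int-linux3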

    After entering the cluster information for linux3, click [Next].

  5. Run orainstRoot.sh script as root

    The OUI will then ask you to perform tasks as the root user on the new node (linux3).

    Login to the new node as the root UNIX user account and run the orainstRoot.sh script as follows:

    $ hostname
    linux3
    
    $ cd $ORACLE_BASE/oraInventory
    
    $ su
    Password: xxxx
    
    # ./orainstRoot.sh
    Creating the Oracle inventory pointer file (/etc/oraInst.loc)
    Changing groupname of /u01/app/oracle/oraInventory to dba.
    
    # exit
    After running the orainstRoot.sh script as root, go back to the OUI and acknowledge the dialog.

  6. Summary Screen

    This will bring you to the summary screen. Click [Next] to copy the Oracle CRS software to the new node.

  7. Run rootaddnode.sh as root

    After the copy is complete, you will be prompted to run the rootaddnode.sh on the EXISTING node you ran the addNode.sh script from - linux1 in this example:

      Before running it, open rootaddnode.sh and verify that the CLSCFG information in the script is correct. It should contain the new node's public and private node names and node number, as follows:
    $CLSCFG -add -nn <node3>,3 -pn <node3-private>,3 -hn <node3>,3
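
    For example, a quick way to review that line without opening an editor (a sketch; on linux1 the script lives in the CRS home):

    $ grep "CLSCFG -add" $ORA_CRS_HOME/rootaddnode.sh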

    $ hostname
    linux1
    
    $ cd $ORA_CRS_HOME
    
    $ su
    Password: xxxx
    
    # ./rootaddnode.sh
    clscfg: EXISTING configuration version 2 detected.
    clscfg: version 2 is 10G Release 1.
    Attempting to add 1 new nodes to the configuration
    Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
    node <nodenumber>: <nodename> <private interconnect name> <hostname>
    node 3: linux3 int-linux3 linux3
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    
    # exit
    After running the rootaddnode.sh script as root, go back to the OUI and acknowledge the dialog.

  8. Run root.sh as root on new node

    The OUI will then bring up another dialog box asking you to run the root.sh script on all new nodes - (linux3 in this example).

      If you were installing more than one new node and the Oracle version is < 10.1.0.4, then:

    1. Locate the highest numbered NEW cluster node using $ORA_CRS_HOME/bin/olsnodes -n
    2. Run the root.sh script on this highest numbered NEW cluster node first.
    3. Run the root.sh script on the remaining NEW nodes in any order.

    For versions of Oracle 10.1.0.4 and higher, the root.sh scripts can be run on the NEW nodes in any order!

    In this article, I am only adding one node, so this note does not apply.

    $ hostname
    linux3
    
    $ su
    Password: xxxx
    
    # cd $ORA_CRS_HOME
    # ./root.sh
    Running Oracle10 root.sh script...
    \nThe following environment variables are set as:
        ORACLE_OWNER= oracle
        ORACLE_HOME=  /u01/app/oracle/product/10.1.0/crs
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    Checking to see if Oracle CRS stack is already up...
    /etc/oracle does not exist. Creating it now.
    Setting the permissions on OCR backup directory
    Oracle Cluster Registry configuration upgraded successfully
    WARNING: directory '/u01/app/oracle/product/10.1.0' is not owned by root
    WARNING: directory '/u01/app/oracle/product' is not owned by root
    WARNING: directory '/u01/app/oracle' is not owned by root
    WARNING: directory '/u01/app' is not owned by root
    WARNING: directory '/u01' is not owned by root
    clscfg: EXISTING configuration version 2 detected.
    clscfg: version 2 is 10G Release 1.
    assigning default hostname linux1 for node 1.
    assigning default hostname linux2 for node 2.
    Successfully accumulated necessary OCR keys.
    Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
    node <nodenumber>: <nodename> <private interconnect name> <hostname>
    node 1: linux1 int-linux1 linux1
    node 2: linux2 int-linux2 linux2
    clscfg: Arguments check out successfully.
    
    NO KEYS WERE WRITTEN. Supply -force parameter to override.
    -force is destructive and will destroy any previous cluster
    configuration.
    Oracle Cluster Registry for cluster has already been initialized
    Adding daemons to inittab
    Preparing Oracle Cluster Ready Services (CRS):
    Expecting the CRS daemons to be up within 600 seconds.
    CSS is active on these nodes.
            linux1
            linux2
            linux3
    CSS is active on all nodes.
    Waiting for the Oracle CRSD and EVMD to start
    Waiting for the Oracle CRSD and EVMD to start
    Waiting for the Oracle CRSD and EVMD to start
    Waiting for the Oracle CRSD and EVMD to start
    Oracle CRS stack installed and running under init(1M)

    If you encounter any problems running the root.sh script, refer to Note: 240001.1 on Metalink.

    After successfully running the root.sh script on the new node, go back to the OUI and acknowledge the dialog. The installation will continue by updating the new cluster information to the OCR disk.
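
    Before exiting the OUI, you may want to confirm that the clusterware now knows about all three nodes. A minimal check from any node is the olsnodes utility; given the node names and numbers shown above, the output should look something like:

    $ $ORA_CRS_HOME/bin/olsnodes -n
    linux1  1
    linux2  2
    linux3  3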

  9. Exit the OUI

    At the End of Installation screen, exit from the OUI.



Install the Oracle RDBMS Software on the New Node

After copying and configuring the CRS software on the new node, we now need to copy the Oracle RDBMS software from one of the existing nodes. As in the previous step, I will be using linux1.

  1. Set the DISPLAY Variable

    Copying the RDBMS software to the new node requires running a script that will launch the GUI Oracle Universal Installer (OUI). You will be performing this install from one of the existing nodes in the current RAC cluster (i.e. linux1). Set your DISPLAY variable on this server to the workstation or server you are using for an X server:

    $ DISPLAY=<machine_name_or_IP_address>:0.0; export DISPLAY

  2. Run addNode.sh Script from Existing Node

    From the same node (linux1 in my example), change to the directory $ORACLE_HOME/oui/bin and run the addNode.sh command as the oracle UNIX user account. Note that this will bring up the OUI and prompt for the new node as shown in the following:

    $ hostname
    linux1
    
    $ cd $ORACLE_HOME/oui/bin
    $ ./addNode.sh -ignoreSysPrereqs

  3. Welcome Screen

    At the Welcome screen, click [Next].

  4. Specify Hardware Cluster Installation Mode

    At the next screen, Specify Hardware Cluster Installation Mode, select the new node (linux3):

    After making this selection, click [Next].

  5. Summary Screen

    This will bring you to the summary screen. Click [Next] to copy the Oracle RDBMS software to the new node.

  6. Run root.sh as root on new node

    After the copy is complete, the OUI will bring up a dialog box asking to run the root.sh script on all new nodes - (linux3 in this example).

    $ hostname
    linux3
    
    $ su
    Password: xxxx
    
    # cd $ORACLE_HOME
    # ./root.sh
    Running Oracle10 root.sh script...
    \nThe following environment variables are set as:
        ORACLE_OWNER= oracle
        ORACLE_HOME=  /u01/app/oracle/product/10.1.0/db_1
    
    Enter the full pathname of the local bin directory: [/usr/local/bin]: /usr/local/bin
       Copying dbhome to /usr/local/bin ...
       Copying oraenv to /usr/local/bin ...
       Copying coraenv to /usr/local/bin ...
    
    \nCreating /etc/oratab file...
    Adding entry to /etc/oratab file...
    Entries will be added to the /etc/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    
    
    CRS resources are already configured

    After successfully running the root.sh script on the new node, go back to the OUI and acknowledge the dialog. The installation will continue by updating the new cluster information to the OCR disk.

  7. Exit the OUI

    At the End of Installation screen, exit from the OUI.

  8. Run the VIPCA Tool

    From the same existing node (linux1), run vipca as the root user so that the virtual IP resources for the new node can be configured. For example:

    $ hostname
    linux1
    
    $ su
    Password: xxxx
    
    # DISPLAY=<machine_name_or_IP_address>:0.0; export DISPLAY
    # cd $ORACLE_HOME/bin
    # ./vipca

  9. VIPCA Welcome Screen

    At the VIPCA Welcome Screen, click [Next].

  10. VIPCA Network Interfaces

    Verify that both network interfaces are selected (eth0 and eth1) and click [Next].

  11. Enter New Node's Virtual IP Information

    In the next screen, enter the new node's virtual IP address information and click [Next].

  12. VIPCA Summary Screen

    Click [Finish]. You will now see a progress bar creating and then starting the new CRS resources. Once this is complete, click [Ok], view the configuration results, and finally click on the [Exit] button.
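
    Once VIPCA completes, you can verify the node applications for the new node from any node in the cluster using srvctl. For example:

    $ srvctl status nodeapps -n linux3

    The VIP, GSD, and ONS resources should be reported as running on linux3. (The listener is not configured until the next section.)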



Reconfigure Listeners for New Node

Now, we need to run the Oracle Net Configuration Assistant (NETCA) from the new node to ensure that a listener is configured for it.

  1. Set the DISPLAY Variable

    The NETCA uses a GUI that requires an X Server. You will be performing this configuration from the new node: linux3. Set your DISPLAY variable on this server to the workstation or server you are using for an X server:

    $ DISPLAY=<machine_name_or_IP_address>:0.0; export DISPLAY

  2. Run NETCA

    From the new node and as the oracle user, run the NETCA:

    $ netca &
    Perform the following actions:

    • Choose "Cluster Configuration", click [Next].
    • Select all nodes, click [Next].
    • Choose "Listener Configuration", click [Next].
    • Choose "Reconfigure", click [Next].
    • Choose the listener you would like to reconfigure (LISTENER in our example), click [Next].
    • Choose the correct protocol (TCP in our example), click [Next].
    • Choose the correct standard port (1521 in our example), click [Next].
    • Choose whether or not to configure another listener (No in our example), click [Next].
    • You will then get errors for the listeners on all existing nodes in the cluster. Select [Not try again] (or whichever button is appropriate) to ignore the errors and continue. You should not get any errors for the new node / listener being added.
    • Exit from the NETCA.

  3. Verify Network Configuration

    From the new node, as the oracle UNIX user account, run the following tests to verify that the listener CRS resource was created:

    $ hostname
    linux3
    
    $ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
    LISTENER_LINUX3
    
    $ cd $ORA_CRS_HOME/bin
    $ ./crs_stat | grep LISTENER
    NAME=ora.linux1.LISTENER_LINUX1.lsnr
    NAME=ora.linux2.LISTENER_LINUX2.lsnr
    NAME=ora.linux3.LISTENER_LINUX3.lsnr
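
    Optionally, you can also query the new listener directly with lsnrctl (a sketch, using the listener name shown above):

    $ lsnrctl status listener_linux3

    The listener should report as started and listening on TCP port 1521, even though no instance is registered with it yet.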

  4. Verify All CRS Services are ONLINE

    One last test is to verify that all CRS services are ONLINE.

    $ cd $ORA_CRS_HOME/bin
    $ ./crs_stat | egrep 'NAME|TARGET'
    NAME=ora.linux1.ASM1.asm
    TARGET=ONLINE
    NAME=ora.linux1.LISTENER_LINUX1.lsnr
    TARGET=ONLINE
    NAME=ora.linux1.gsd
    TARGET=ONLINE
    NAME=ora.linux1.ons
    TARGET=ONLINE
    NAME=ora.linux1.vip
    TARGET=ONLINE
    NAME=ora.linux2.ASM2.asm
    TARGET=ONLINE
    NAME=ora.linux2.LISTENER_LINUX2.lsnr
    TARGET=ONLINE
    NAME=ora.linux2.gsd
    TARGET=ONLINE
    NAME=ora.linux2.ons
    TARGET=ONLINE
    NAME=ora.linux2.vip
    TARGET=ONLINE
    NAME=ora.linux3.LISTENER_LINUX3.lsnr
    TARGET=ONLINE
    NAME=ora.linux3.gsd
    TARGET=ONLINE
    NAME=ora.linux3.ons
    TARGET=ONLINE
    NAME=ora.linux3.vip
    TARGET=ONLINE
    NAME=ora.orcl.db
    TARGET=ONLINE
    NAME=ora.orcl.orcl1.inst
    TARGET=ONLINE
    NAME=ora.orcl.orcl2.inst
    TARGET=ONLINE
    NAME=ora.orcl.orcltest.cs
    TARGET=ONLINE
    NAME=ora.orcl.orcltest.orcl1.srv
    TARGET=ONLINE
    NAME=ora.orcl.orcltest.orcl2.srv
    TARGET=ONLINE

      If any of the CRS services are OFFLINE, you can simply bring all of the services ONLINE using srvctl as follows:
    srvctl start nodeapps -n linux3
    where linux3 is the new node being added to the cluster.



Create a New Oracle Instance

Finally, we need to create the Oracle10g instance on the new node using the Database Configuration Assistant (DBCA). The DBCA should be run from a pre-existing node - (again, I will be using linux1).

  1. Set the DISPLAY Variable

    The DBCA uses a GUI that requires an X Server. You will be performing this configuration from a pre-existing node: linux1. Set your DISPLAY variable on this server to the workstation or server you are using for an X server:

    $ DISPLAY=<machine_name_or_IP_address>:0.0; export DISPLAY

  2. Run DBCA

    From the pre-existing node and as the oracle user, run the DBCA:

    $ dbca &
    Perform the following actions:

    • On the Welcome screen, choose "Oracle Real Application Clusters", click [Next].
    • Choose "Instance Management", click [Next].
    • Choose "Add an Instance", click [Next].
    • Choose the database you would like to add an instance to and specify a user with SYSDBA privileges, click [Next].
    • The next screen displays the instances that are currently configured for the given database, click [Next].
    • Choose the correct instance name and node. The instance will be orcl3 and the node to create it on is linux3. Click [Next].
    • For the new instance, orcl3, choose "Preferred" for the orcltest service.
    • By default, the DBCA does a good job of determining the storage to be added for the new instance. It will add an UNDO tablespace, the database files for this tablespace, and two redo log groups. Verify the storage options and click [Finish].
    • Review the summary screen, click [Ok] and wait a few seconds for the progress bar to start.
    • Allow the progress bar to finish. When asked if you want to perform another operation, choose No to exit the DBCA.

  3. Verify Instance

    Login to one of the instances and query the gv$instance view:

    SQL> select inst_id, instance_name, status from gv$instance order by inst_id;
    
       INST_ID INSTANCE_NAME    STATUS
    ---------- ---------------- ------------
             1 orcl1            OPEN
             2 orcl2            OPEN
             3 orcl3            OPEN
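
    You can also confirm the new instance at the clusterware level with srvctl from any node. For example:

    $ srvctl status database -d orcl

    All three instances (orcl1, orcl2, and orcl3) should be reported as running.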

  4. Update TNSNAMES

    Login to all machines that will access the third instance and update their tnsnames.ora file(s).
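
    For example, a dedicated tnsnames.ora entry for the new instance might look like the following. This is only a sketch: the alias ORCL3, the virtual host name vip-linux3, and the service name orcl are assumptions that should be adjusted to match your environment:

    ORCL3 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = vip-linux3)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = orcl)
          (INSTANCE_NAME = orcl3)
        )
      )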

  5. Verify EM Database Control

    The DBCA should have updated and added the new node(s) to EM Database Control. Bring up a web browser and navigate to:

    http://linux3:5500/em



Copyright (c) 1998-2017 Jeffrey M. Hunter. All rights reserved.

All articles, scripts and material located at the Internet address of http://www.idevelopment.info is the copyright of Jeffrey M. Hunter and is protected under copyright laws of the United States. This document may not be hosted on any other site without my express, prior, written permission. Application to host any of the material elsewhere can be made by contacting me at jhunter@idevelopment.info.

I have made every effort and taken great care in making sure that the material included on my web site is technically accurate, but I disclaim any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on it. I will in no case be liable for any monetary damages arising from such loss, damage or destruction.
