Oracle DBA Tips Corner
Building an Inexpensive Oracle RAC 10g Release 2 on Linux - (CentOS 4.2 / FireWire)
by Jeff Hunter, Sr. Database Administrator
One of the most efficient ways to become familiar with Oracle10g Real Application Cluster (RAC) technology is to have access to an actual Oracle10g RAC cluster. In learning this new technology, you will soon start to realize the benefits Oracle10g RAC has to offer, such as fault tolerance, new levels of security, load balancing, and the ease of upgrading capacity. The problem, though, is the price of the hardware required for a typical production RAC configuration. A small two-node cluster, for example, could run anywhere from $10,000 to well over $20,000. This does not even include the heart of a production RAC environment, the shared storage. In most cases, this would be a Storage Area Network (SAN), which generally starts at $10,000.

For those who simply want to become familiar with Oracle10g RAC, this article provides a low-cost alternative: configuring an Oracle10g RAC system using commercial off-the-shelf components and downloadable software. The estimated cost for this configuration could be anywhere from $1200 to $1800. The system will consist of a dual-node cluster (each node with a single processor), both nodes running Linux (CentOS 4.2 or Red Hat Enterprise Linux 4 Update 2), Oracle10g Release 2, OCFS2, and ASMLib 2.0, with shared disk storage based on IEEE1394 (FireWire) drive technology. (Of course, you could also consider building a virtual cluster on a VMware Virtual Machine, but the experience won't quite be the same!)
This article marks the last in a series to make use of FireWire technology as the shared storage medium for building an inexpensive Oracle10g RAC system. Future releases of this article will adopt iSCSI; more specifically, building a network storage server using Openfiler. Powered by rPath Linux, Openfiler is a free browser-based network storage management utility that delivers file-based Network Attached Storage (NAS) and block-based Storage Area Networking (SAN) in a single framework. Openfiler supports CIFS, NFS, HTTP/DAV, and FTP; however, I will only be making use of its iSCSI capabilities to implement an inexpensive SAN for the shared storage component required by Oracle10g RAC.

Please note that this is not the only way to build a low-cost Oracle10g RAC system. I have worked on other solutions that use SCSI rather than FireWire for shared storage. In most cases, SCSI will cost more than a FireWire solution. An inexpensive SCSI configuration will consist of:
- SCSI Controller: Two SCSI controllers priced from $20 (Adaptec AHA-2940UW) to $220 (Adaptec 39320A-R) each.
- SCSI Enclosure: $70 - (Inclose 1 Bay 3.5" U320 SCSI Case)
- SCSI Hard Drive: $140 - (36GB 15K 68p U320 SCSI Hard Drive)
- SCSI Cables: Two SCSI cables priced at $20 each - (3ft External HD68 to HD68 U320 Cable)
Keep in mind that some motherboards may already include built-in SCSI controllers.
It is important to note that the FireWire configuration described in this article should never be run in a production environment and that it is not supported by Oracle or any other vendor. In a production environment, fibre channel, the high-speed serial-transfer interface that can connect systems and storage devices in either point-to-point or switched topologies, is the technology of choice. FireWire offers a low-cost alternative to fibre channel for testing and development, but it is not ready for production.
Although in past articles I used raw partitions for storing files on shared storage, here we will make use of the Oracle Cluster File System V2 (OCFS2) and Oracle Automatic Storage Management (ASM). The two Linux servers will be configured as follows:
Oracle Database Files

| RAC Node Name | Instance Name | Database Name | $ORACLE_BASE | File System / Volume Manager for DB Files |
|---|---|---|---|---|
| linux1 | orcl1 | orcl | /u01/app/oracle | Automatic Storage Management (ASM) |
| linux2 | orcl2 | orcl | /u01/app/oracle | Automatic Storage Management (ASM) |

Oracle Clusterware Shared Files

| File Type | File Name | Partition | Mount Point | File System |
|---|---|---|---|---|
| Oracle Cluster Registry (OCR) | /u02/oradata/orcl/OCRFile | /dev/sda1 | /u02/oradata/orcl | Oracle's Cluster File System, Release 2 (OCFS2) |
| Voting Disk | /u02/oradata/orcl/CSSFile | /dev/sda1 | /u02/oradata/orcl | Oracle's Cluster File System, Release 2 (OCFS2) |
With Oracle Database 10g Release 2 (10.2), Cluster Ready Services, or CRS, is now called Oracle Clusterware. The Oracle Clusterware software will be installed to /u01/app/oracle/product/crs on each of the nodes that make up the RAC cluster. However, the Clusterware software requires that two of its files, the "Oracle Cluster Registry (OCR)" file and the "Voting Disk" file, be shared with all nodes in the cluster. These two files will be installed on shared storage using Oracle's Cluster File System, Release 2 (OCFS2). It is possible (but not recommended by Oracle) to use RAW devices for these files; however, it is not possible to use ASM for these two Clusterware files.
Starting with Oracle Database 10g Release 2 (10.2), Oracle Clusterware should be installed in a separate Oracle Clusterware home directory which is non-release specific. This is a change to the Optimal Flexible Architecture (OFA) rules. You should not install Oracle Clusterware in a release-specific Oracle home mount point (/u01/app/oracle/product/10.2.0/... for example), as succeeding versions of Oracle Clusterware will overwrite the Oracle Clusterware installation in the same path. Also, if Oracle Clusterware 10g Release 2 (10.2) detects an existing Oracle Cluster Ready Services installation, then it overwrites the existing installation in the same path.

The Oracle10g Release 2 Database software will be installed into a separate Oracle Home, namely /u01/app/oracle/product/10.2.0/db_1, on each of the nodes that make up the RAC cluster. All of the Oracle physical database files (data, online redo logs, control files, archived redo logs) will be installed to different partitions of the shared drive being managed by Automatic Storage Management (ASM).
The Oracle database files could just as well have been stored on the Oracle Cluster File System (OCFS2). Using ASM, however, makes the article that much more interesting!
This article is only designed to work as documented with absolutely no substitutions! If you are interested in configuring the same type of configuration for Oracle10g Release 1, please see my article entitled "Building an Inexpensive Oracle RAC 10g Release 1 Configuration on Linux - (WBEL 3.0)".
If you are interested in configuring the same type of configuration for Oracle9i, please see my article entitled "Building an Inexpensive Oracle RAC 9i Configuration on Linux".
Oracle RAC, introduced with Oracle9i, is the successor to Oracle Parallel Server (OPS). RAC allows multiple instances to access the same database (storage) simultaneously. RAC provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out. At the same time, since all nodes access the same database, the failure of one instance will not cause the loss of access to the database.

At the heart of Oracle10g RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files and parameter files for all nodes in the cluster. The data disks must be globally available in order to allow all nodes to access the database. Each node has its own redo log file(s) and UNDO tablespace, but the other nodes must be able to access them (and the shared control file) in order to recover that node in the event of a system failure.

The biggest difference between Oracle RAC and OPS is the addition of Cache Fusion. With OPS, a request for data from one node to another required the data to be written to disk first; only then could the requesting node read that data. With Cache Fusion, data is passed along a high-speed interconnect using a sophisticated locking algorithm.
Not all clustering solutions use shared storage. Some vendors use an approach known as a Federated Cluster, in which data is spread across several machines rather than shared by all. With Oracle10g RAC, however, multiple nodes use the same set of disks for storing data. With Oracle10g RAC, the data files, redo log files, control files, and archived log files reside on shared storage on raw-disk devices, a NAS, ASM, or on a clustered file system. Oracle's approach to clustering leverages the collective processing power of all the nodes in the cluster and at the same time provides failover security.
Pre-configured Oracle10g RAC solutions are available from vendors such as Dell, IBM and HP for production environments. This article, however, focuses on putting together your own Oracle10g RAC environment for development and testing by using Linux servers and a low-cost shared disk solution: FireWire.
For more background about Oracle RAC, visit the Oracle RAC Product Center on OTN.
Today, fibre channel is one of the most popular solutions for shared storage. As mentioned earlier, fibre channel is a high-speed serial-transfer interface that is used to connect systems and storage devices in either point-to-point or switched topologies. Protocols supported by fibre channel include SCSI and IP. Fibre channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second. Fibre channel, however, is very expensive. The fibre channel switch alone can start at around $1000. This does not even include the fibre channel storage array and high-end drives, which can reach prices of about $300 for a 36GB drive. A typical fibre channel setup, including fibre channel cards for the servers, is roughly $10,000 for a basic configuration, which does not include the cost of the servers that make up the cluster.

A less expensive alternative to fibre channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget, at around $2,000 to $5,000 for a two-node cluster.
Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for shared storage but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol, and read/write block sizes of 32K.
The shared storage that will be used for this article is based on IEEE1394 (FireWire) drive technology. FireWire is able to offer a low-cost alternative to Fibre Channel for testing and development, but should never be used in a production environment.
Developed by Apple Computer and Texas Instruments, FireWire is a cross-platform implementation of a high-speed serial data bus. With its high bandwidth, long distances (up to 100 meters in length) and high-powered bus, FireWire is being used in applications such as digital video (DV), professional audio, hard drives, high-end digital still cameras and home entertainment devices. Today, FireWire operates at transfer rates of up to 800 megabits per second, while next-generation FireWire calls for a theoretical bit rate of 1600 Mbps and then up to a staggering 3200 Mbps. That's 3.2 gigabits per second. This will make FireWire indispensable for transferring massive data files and for even the most demanding video applications, such as working with uncompressed high-definition (HD) video or multiple standard-definition (SD) video streams.

The following chart shows speed comparisons of the various types of disk interfaces. For each interface, I provide the maximum transfer rates in kilobits (Kb), kilobytes (KB), megabits (Mb), megabytes (MB), gigabits (Gb), and gigabytes (GB) per second. As you can see, the capabilities of IEEE1394 compare very favorably with other disk interface and network technologies that are currently available.
| Disk Interface / Network / BUS Speed | Kb | KB | Mb | MB | Gb | GB |
|---|---|---|---|---|---|---|
| Serial | 115 | 14.375 | 0.115 | 0.014 | | |
| Parallel (standard) | 920 | 115 | 0.92 | 0.115 | | |
| 10Base-T Ethernet | | | 10 | 1.25 | | |
| IEEE 802.11b wireless Wi-Fi (2.4 GHz band) | | | 11 | 1.375 | | |
| USB 1.1 | | | 12 | 1.5 | | |
| Parallel (ECP/EPP) | | | 24 | 3 | | |
| SCSI-1 | | | 40 | 5 | | |
| IEEE 802.11g wireless WLAN (2.4 GHz band) | | | 54 | 6.75 | | |
| SCSI-2 (Fast SCSI / Fast Narrow SCSI) | | | 80 | 10 | | |
| 100Base-T Ethernet (Fast Ethernet) | | | 100 | 12.5 | | |
| ATA/100 (parallel) | | | 100 | 12.5 | | |
| IDE | | | 133.6 | 16.7 | | |
| Fast Wide SCSI (Wide SCSI) | | | 160 | 20 | | |
| Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow) | | | 160 | 20 | | |
| Ultra IDE | | | 264 | 33 | | |
| Wide Ultra SCSI (Fast Wide 20) | | | 320 | 40 | | |
| Ultra2 SCSI | | | 320 | 40 | | |
| FireWire 400 - (IEEE1394a) | | | 400 | 50 | | |
| USB 2.0 | | | 480 | 60 | | |
| Wide Ultra2 SCSI | | | 640 | 80 | | |
| Ultra3 SCSI | | | 640 | 80 | | |
| FireWire 800 - (IEEE1394b) | | | 800 | 100 | | |
| Gigabit Ethernet | | | 1000 | 125 | 1 | |
| PCI - (33 MHz / 32-bit) | | | 1064 | 133 | 1.064 | |
| Serial ATA I - (SATA I) | | | 1200 | 150 | 1.2 | |
| Wide Ultra3 SCSI | | | 1280 | 160 | 1.28 | |
| Ultra160 SCSI | | | 1280 | 160 | 1.28 | |
| PCI - (33 MHz / 64-bit) | | | 2128 | 266 | 2.128 | |
| PCI - (66 MHz / 32-bit) | | | 2128 | 266 | 2.128 | |
| AGP 1x - (66 MHz / 32-bit) | | | 2128 | 266 | 2.128 | |
| Serial ATA II - (SATA II) | | | 2400 | 300 | 2.4 | |
| Ultra320 SCSI | | | 2560 | 320 | 2.56 | |
| FC-AL Fibre Channel | | | 3200 | 400 | 3.2 | |
| PCI-Express x1 - (bidirectional) | | | 4000 | 500 | 4 | |
| PCI - (66 MHz / 64-bit) | | | 4256 | 532 | 4.256 | |
| AGP 2x - (133 MHz / 32-bit) | | | 4264 | 533 | 4.264 | |
| Serial ATA III - (SATA III) | | | 4800 | 600 | 4.8 | |
| PCI-X - (100 MHz / 64-bit) | | | 6400 | 800 | 6.4 | |
| PCI-X - (133 MHz / 64-bit) | | | | 1064 | 8.512 | 1 |
| AGP 4x - (266 MHz / 32-bit) | | | | 1066 | 8.528 | 1 |
| 10G Ethernet - (IEEE 802.3ae) | | | | 1250 | 10 | 1.25 |
| PCI-Express x4 - (bidirectional) | | | | 2000 | 16 | 2 |
| AGP 8x - (533 MHz / 32-bit) | | | | 2133 | 17.064 | 2.1 |
| PCI-Express x8 - (bidirectional) | | | | 4000 | 32 | 4 |
| PCI-Express x16 - (bidirectional) | | | | 8000 | 64 | 8 |
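The byte figures in the chart follow from the bit figures by dividing by eight (eight bits per byte). As a quick check of that arithmetic, here is a minimal shell sketch; the function name `mbits_to_mbytes` is made up for this illustration and is not part of any tool used in this article.

```shell
#!/bin/sh
# Convert a rate in megabits per second (Mb/s) to megabytes per second (MB/s).
# mbits_to_mbytes is a hypothetical helper name used only for this sketch.
mbits_to_mbytes() {
    # 8 bits per byte; %g prints the value without trailing zeros
    awk -v mb="$1" 'BEGIN { printf "%g\n", mb / 8 }'
}

mbits_to_mbytes 400     # FireWire 400 (IEEE1394a)
mbits_to_mbytes 800     # FireWire 800 (IEEE1394b)
mbits_to_mbytes 1000    # Gigabit Ethernet
```

These are theoretical maximums for the bus; the sustained throughput of a real drive on that bus will be lower.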
The hardware used to build our example Oracle10g RAC environment consists of two Linux servers and components that can be purchased at any local computer store or over the Internet.
I have received several emails since posting this article asking if the Maxtor OneTouch external drive (and the other external hard drives I have listed) has two IEEE1394 (FireWire) ports. All of the drives that I have listed and tested do have two IEEE1394 ports located on the back of the drive.
We are about to start the installation process. Now that we have talked about the hardware that will be used in this example, let's take a conceptual look at what the environment would look like:
Figure 1: Oracle10g Release 2 Testing RAC Configuration

As we start to go into the details of the installation, it should be noted that most of the tasks within this document will need to be performed on both servers. I will indicate at the beginning of each section whether or not the task(s) should be performed on both nodes.
Install the Linux Operating System
Perform the following installation on all nodes in the cluster!
After procuring the required hardware, it is time to start the configuration process. The first task we need to perform is to install the Linux operating system. As already mentioned, this article will use CentOS 4.2. Although I have used Red Hat Fedora in the past, I wanted to switch to a Linux environment that would guarantee all of the functionality contained with Oracle. This is where CentOS comes in. The CentOS Enterprise Linux project takes the Red Hat Enterprise Linux 4 source RPMs, and compiles them into a free clone of the Red Hat Enterprise Server 4 product. This provides a free and stable version of the Red Hat Enterprise Linux 4 (AS/ES) operating environment that I can now use for testing different Oracle configurations. Over the last several months, I have been moving away from Fedora as I need a stable environment that is not only free, but as close to the actual Oracle supported operating system as possible. While CentOS is not the only project performing the same functionality, I tend to stick with it as it is stable and reacts fast with regards to updates by Red Hat.
Downloading CentOS Enterprise Linux
Use the links (below) to download CentOS Enterprise Linux 4.2. After downloading CentOS, you will then want to burn each of the ISO images to CD.
- CentOS-4.2-i386-bin1of4.iso (618 MB)
- CentOS-4.2-i386-bin2of4.iso (635 MB)
- CentOS-4.2-i386-bin3of4.iso (639 MB)
- CentOS-4.2-i386-bin4of4.iso (217 MB)
If you are downloading the above ISO files to a MS Windows machine, there are many options for burning these images (ISO files) to a CD. You may already be familiar with and have the proper software to burn images to CD. If you are not familiar with this process and do not have the required software to burn images to CD, here are just two (of many) software packages that can be used:
Installing CentOS Enterprise Linux
This section provides a summary of the screens used to install CentOS Enterprise Linux. For more detailed installation instructions, it is possible to use the manuals from Red Hat Linux http://www.redhat.com/docs/manuals/. I would suggest, however, that the instructions I have provided below be used for this Oracle10g RAC configuration.
Before installing the Linux operating system on both nodes, you should have the FireWire and two NIC interfaces (cards) installed. Also, before starting the installation, ensure that the FireWire drive (our shared storage drive) is NOT connected to either of the two servers. You may also choose to connect both servers to the FireWire drive and simply turn the power off to the drive.
Although none of this is mandatory, it is how I will be performing the installation and configuration for this article.
After downloading and burning the CentOS images (ISO files) to CD, insert CentOS Disk #1 into the first server (linux1 in this example), power it on, and answer the installation screen prompts as noted below. After completing the Linux installation on the first node, perform the same Linux installation on the second node, substituting the node name linux2 for linux1 and the different IP addresses where appropriate.
Boot Screen
The first screen is the CentOS Enterprise Linux boot screen. At the boot: prompt, hit [Enter] to start the installation process.

Media Test
When asked to test the CD media, tab over to [Skip] and hit [Enter]. If there were any errors, the media burning software would have warned us. After several seconds, the installer should then detect the video card, monitor, and mouse. The installer then goes into GUI mode.

Welcome to CentOS Enterprise Linux
At the welcome screen, click [Next] to continue.

Language / Keyboard Selection
The next two screens prompt you for the Language and Keyboard settings. Make the appropriate selections for your configuration.

Installation Type
Choose the [Custom] option and click [Next] to continue.

Disk Partitioning Setup
Select [Automatically partition] and click [Next] to continue.

Partitioning
If there was a previous installation of Linux on this machine, the next screen will ask if you want to "remove" or "keep" old partitions. Select the option to [Remove all partitions on this system]. Also, ensure that the [hda] drive is selected for this installation. I also keep the checkbox [Review (and modify if needed) the partitions created] selected. Click [Next] to continue.

You will then be prompted with a dialog window asking if you really want to remove all partitions. Click [Yes] to acknowledge this warning.

The installer will then allow you to view (and modify if needed) the disk partitions it automatically selected. In almost all cases, the installer will choose 100MB for /boot, double the amount of RAM for swap, and the rest going to the root (/) partition. I like to have a minimum of 1GB for swap. For the purpose of this install, I will accept all automatically preferred sizes. (Including 2GB for swap since I have 1GB of RAM installed.)

Starting with RHEL 4, the installer will create the same disk configuration as just noted but will create it using the Logical Volume Manager (LVM). For example, it will partition the first hard drive (/dev/hda for my configuration) into two partitions: one for the /boot partition (/dev/hda1) and the remainder of the disk dedicated to an LVM volume group named VolGroup00 (/dev/hda2). The LVM Volume Group (VolGroup00) is then partitioned into two LVM partitions: one for the root file system (/) and another for swap. I basically check that it created at least 1GB of swap. Since I have 1GB of RAM installed, the installer created 2GB of swap. That said, I just accept the default disk layout.

Boot Loader Configuration
The installer will use the GRUB boot loader by default. To use the GRUB boot loader, accept all default values and click [Next] to continue.

Network Configuration
I made sure to install both NIC interfaces (cards) in each of the Linux machines before starting the operating system installation. This screen should have successfully detected each of the network devices.

First, make sure that each of the network devices is checked to [Active on boot]. The installer may choose to not activate eth1.

Second, [Edit] both eth0 and eth1 as follows. You may choose to use different IP addresses for both eth0 and eth1 and that is OK. If possible, try to put eth1 (the interconnect) on a different subnet than eth0 (the public network):

eth0:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.1.100
- Netmask: 255.255.255.0

eth1:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.2.100
- Netmask: 255.255.255.0

Continue by setting your hostname manually. I used "linux1" for the first node and "linux2" for the second. Finish this dialog off by supplying your gateway and DNS servers.

Firewall
On this screen, make sure to select [No firewall] and click [Next] to continue. You may be prompted with a warning dialog about not setting the firewall. If this occurs, simply hit [Proceed] to continue.

Additional Language Support / Time Zone
The next two screens allow you to select additional language support and time zone information. In almost all cases, you can accept the defaults.

Set Root Password
Select a root password and click [Next] to continue.

Package Group Selection
Scroll down to the bottom of this screen and select [Everything] under the "Miscellaneous" section. Click [Next] to continue.

Please note that the installation of Oracle does not require all Linux packages to be installed. My decision to install all packages was for the sake of brevity. Please see section "Check RPM Packages for Oracle10g Release 2" for a more detailed look at the critical packages required for a successful Oracle installation.

Also note that with some RHEL 4 distributions, you will not get the "Package Group Selection" screen by default. There, you are asked to simply "Install default software packages" or "Customize software packages to be installed". Select the option to "Customize software packages to be installed" and click [Next] to continue. This will then bring up the "Package Group Selection" screen. Now, scroll down to the bottom of this screen and select [Everything] under the "Miscellaneous" section. Click [Next] to continue.

About to Install
This screen is basically a confirmation screen. Click [Next] to start the installation. During the installation process, you will be asked to switch disks to Disk #2, Disk #3, and then Disk #4. Click [Continue] to start the installation process.

Note that with CentOS 4.2, the installer will ask to switch to Disk #2, Disk #3, Disk #4, Disk #1, and then back to Disk #4.

Graphical Interface (X) Configuration
With most RHEL 4 distributions (not the case with CentOS 4.2), when the installation is complete, the installer will attempt to detect your video hardware. Ensure that the installer has detected and selected the correct video hardware (graphics card and monitor) to properly use the X Windows server. You will continue with the X configuration in the next several screens.

Congratulations
And that's it. You have successfully installed CentOS Enterprise Linux on the first node (linux1). The installer will eject the CD from the CD-ROM drive. Take out the CD and click [Exit] to reboot the system.

When the system boots into Linux for the first time, it will prompt you with another Welcome screen. The following wizard allows you to configure the date and time, add any additional users, test the sound card, and install any additional CDs. The only screen I care about is the time and date (and if you are using CentOS 4.x, the monitor/display settings). As for the others, simply run through them as there is nothing additional that needs to be installed (at this point anyway!). If everything was successful, you should now be presented with the login screen.

Perform the same installation on the second node
After completing the Linux installation on the first node, repeat the above steps for the second node (linux2). When configuring the machine name and networking, be sure to configure the proper values. For my installation, this is what I configured for linux2:

First, make sure that each of the network devices is checked to [Active on boot]. The installer may choose to not activate eth1.

Second, [Edit] both eth0 and eth1 as follows:

eth0:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.1.101
- Netmask: 255.255.255.0

eth1:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.2.101
- Netmask: 255.255.255.0

Continue by setting your hostname manually. I used "linux2" for the second node. Finish this dialog off by supplying your gateway and DNS servers.
Perform the following network configuration on all nodes in the cluster!
Although we configured several of the network settings during the installation of CentOS Enterprise Linux, it is important to not skip this section as it contains critical steps that are required for a successful RAC environment.
Introduction to Network Settings
During the Linux O/S install we already configured the IP address and host name for each of the nodes. We now need to configure the /etc/hosts file as well as adjust several of the network settings for the interconnect.

Each node should have one static IP address for the public network and one static IP address for the private cluster interconnect. The private interconnect should only be used by Oracle to transfer Cluster Manager and Cache Fusion related data. Although it is possible to use the public network for the interconnect, this is not recommended as it may cause degraded database performance (reducing the amount of bandwidth for Cache Fusion and Cluster Manager traffic). For a production RAC implementation, the interconnect should be at least gigabit and only be used by Oracle.

Configuring Public and Private Network
In our two node example, we need to configure the network on both nodes for access to the public network as well as their private interconnect.

The easiest way to configure network settings in Red Hat Linux is with the program Network Configuration. This application can be started from the command-line as the "root" user account as follows:

    # su -
    # /usr/bin/system-config-network &
Do not use DHCP naming for the public IP address or the interconnects - we need static IP addresses!

Using the Network Configuration application, you need to configure both NIC devices as well as the /etc/hosts file. Both of these tasks can be completed using the Network Configuration GUI. Notice that the /etc/hosts entries are the same for both nodes.

Our example configuration will use the following settings:
Server 1 - (linux1)

| Device | IP Address | Subnet | Gateway | Purpose |
|---|---|---|---|---|
| eth0 | 192.168.1.100 | 255.255.255.0 | 192.168.1.1 | Connects linux1 to the public network |
| eth1 | 192.168.2.100 | 255.255.255.0 | | Connects linux1 (interconnect) to linux2 (linux2-priv) |

/etc/hosts

    127.0.0.1        localhost loopback

    # Public Network - (eth0)
    192.168.1.100    linux1
    192.168.1.101    linux2

    # Private Interconnect - (eth1)
    192.168.2.100    linux1-priv
    192.168.2.101    linux2-priv

    # Public Virtual IP (VIP) addresses for - (eth0)
    192.168.1.200    linux1-vip
    192.168.1.201    linux2-vip

Server 2 - (linux2)

| Device | IP Address | Subnet | Gateway | Purpose |
|---|---|---|---|---|
| eth0 | 192.168.1.101 | 255.255.255.0 | 192.168.1.1 | Connects linux2 to the public network |
| eth1 | 192.168.2.101 | 255.255.255.0 | | Connects linux2 (interconnect) to linux1 (linux1-priv) |

/etc/hosts

    127.0.0.1        localhost loopback

    # Public Network - (eth0)
    192.168.1.100    linux1
    192.168.1.101    linux2

    # Private Interconnect - (eth1)
    192.168.2.100    linux1-priv
    192.168.2.101    linux2-priv

    # Public Virtual IP (VIP) addresses for - (eth0)
    192.168.1.200    linux1-vip
    192.168.1.201    linux2-vip
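Since the /etc/hosts entries must be identical on both nodes, it can help to check them mechanically rather than by eye. The following is a minimal sketch, not part of the original procedure: `check_hosts` is a hypothetical helper name, and the addresses are simply the ones from this article's example configuration.

```shell
#!/bin/sh
# check_hosts FILE: report any expected RAC entry that is missing from FILE.
# check_hosts is a hypothetical helper; the IP/name pairs below are the
# example values used in this article and should be changed to match yours.
check_hosts() {
    hosts_file="$1"
    missing=0
    while read -r ip name; do
        # Succeeds only if some line starts with the IP and lists the name
        if ! awk -v ip="$ip" -v name="$name" '
                $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
                END { exit found ? 0 : 1 }' "$hosts_file"; then
            echo "missing: $ip $name"
            missing=1
        fi
    done <<EOF
192.168.1.100 linux1
192.168.1.101 linux2
192.168.2.100 linux1-priv
192.168.2.101 linux2-priv
192.168.1.200 linux1-vip
192.168.1.201 linux2-vip
EOF
    return $missing
}
```

Run it on each node as `check_hosts /etc/hosts && echo "all RAC entries present"`. A non-zero exit status means at least one entry was reported as missing.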
Note that the virtual IP addresses only need to be defined in the /etc/hosts file (or your DNS) for both nodes. The public virtual IP addresses will be configured automatically by Oracle when you run the Oracle Universal Installer, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA). All virtual IP addresses will be activated when the srvctl start nodeapps -n <node_name> command is run. Although I am getting ahead of myself, this is the Host Name/IP Address that will be configured in the client(s) tnsnames.ora file for each Oracle Net Service Name. All of this will be explained much later in this article!
In the screen shots below, only node 1 (linux1) is shown. Be sure to make all the proper network settings on both nodes!
Figure 2: Network Configuration Screen - Node 1 (linux1)
Figure 3: Ethernet Device Screen - eth0 (linux1)
Figure 4: Ethernet Device Screen - eth1 (linux1)
Figure 5: Network Configuration Screen - /etc/hosts (linux1)
Once the network is configured, you can use the ifconfig command to verify everything is working. The following example is from linux1:

    $ /sbin/ifconfig -a
    eth0      Link encap:Ethernet  HWaddr 00:0D:56:FC:39:EC
              inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::20d:56ff:fefc:39ec/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:835 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1983 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:705714 (689.1 KiB)  TX bytes:176892 (172.7 KiB)
              Interrupt:3

    eth1      Link encap:Ethernet  HWaddr 00:0C:41:E8:05:37
              inet addr:192.168.2.100  Bcast:192.168.2.255  Mask:255.255.255.0
              inet6 addr: fe80::20c:41ff:fee8:537/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:0 (0.0 b)  TX bytes:546 (546.0 b)
              Interrupt:11 Base address:0xe400

    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:5110 errors:0 dropped:0 overruns:0 frame:0
              TX packets:5110 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:8276758 (7.8 MiB)  TX bytes:8276758 (7.8 MiB)

    sit0      Link encap:IPv6-in-IPv4
              NOARP  MTU:1480  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
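To compare the ifconfig output against the planned addresses without reading every stanza, the interface names and IPv4 addresses can be pulled out with a short filter. This is only a sketch: `list_inet_addrs` is a hypothetical helper name, and it assumes the older net-tools output format shown in this article (an `inet addr:` field), which newer ifconfig versions no longer print.

```shell
#!/bin/sh
# list_inet_addrs: read "ifconfig -a" output on stdin and print one
# "interface address" pair per IPv4 address found.
# Hypothetical helper; assumes the "inet addr:x.x.x.x" output format.
list_inet_addrs() {
    awk '
        /^[^ \t]/ { iface = $1 }          # a non-indented line starts a new interface
        /inet addr:/ {
            split($2, parts, ":")         # $2 looks like "addr:192.168.1.100"
            print iface, parts[2]
        }'
}

# Example use on a node (not executed here):
#   /sbin/ifconfig -a | list_inet_addrs
```

On linux1 in this example, the lines for eth0 and eth1 should show 192.168.1.100 and 192.168.2.100 respectively.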
About Virtual IP
Why do we have a Virtual IP (VIP) in 10g? Why does it just return a dead connection when its primary node fails?

It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen.
- The new node re-arps the world indicating a new MAC address for the address. For directly connected clients, this usually causes them to see errors on their connections to the old address.
- Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.
This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.
Going one step further is making use of Transparent Application Failover (TAF). With TAF successfully configured, it is possible to avoid ORA-3113 errors altogether! TAF will be discussed in more detail in the section "Transparent Application Failover - (TAF)".
Without using VIPs, clients connected to a node that died will often wait a 10 minute TCP timeout period before getting an error. As a result, you don't really have a good HA solution without using VIPs.
Source - Metalink: "RAC Frequently Asked Questions" (Note:220970.1)
Make sure RAC node name is not listed in loopback address

Ensure that the node names (linux1 or linux2) are not included for the loopback address in the /etc/hosts file. If the machine name is listed in the loopback address entry as below:

    127.0.0.1        linux1 localhost.localdomain localhost

it will need to be removed as shown below:

    127.0.0.1        localhost.localdomain localhost
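The loopback check described above can be scripted so that it is easy to repeat on every node. The following is only a minimal sketch: the function name loopback_has_node is made up for this example (it is not part of any Oracle or Linux tooling), and it assumes the usual whitespace-separated /etc/hosts layout.

```shell
#!/bin/sh
# loopback_has_node FILE NODE
# Returns 0 (true) if NODE appears as a host name on a 127.0.0.1 line of FILE.
# NOTE: hypothetical helper written for this article, not an Oracle utility.
loopback_has_node() {
    awk -v node="$2" '
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # ignore a trailing comment
                if ($i == node) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example: warn if either RAC node name is bound to the loopback address.
for n in linux1 linux2; do
    if loopback_has_node /etc/hosts "$n"; then
        echo "WARNING: $n is listed on the loopback line of /etc/hosts"
    fi
done
```

If the script prints nothing, neither node name is attached to 127.0.0.1 and the entry is in the form the RAC installation expects.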
If the RAC node name is listed for the loopback address, you will receive the following error during the RAC installation:

    ORA-00603: ORACLE server session terminated by fatal error

or

    ORA-29702: error occurred in Cluster Group Service operation
Adjusting Network Settings

Starting with Oracle 9.2.0.1, Oracle uses UDP as the default protocol on Linux for inter-process communication (IPC), such as Cache Fusion and Cluster Manager buffer transfers between instances within the RAC cluster.

Oracle strongly suggests adjusting the default and maximum send buffer size (SO_SNDBUF socket option) to 256 KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256 KB.
The receive buffers are used by TCP and UDP to hold received data until it is read by the application. With TCP, the receive buffer cannot overflow because the peer is not allowed to send data beyond the buffer size window. UDP has no such flow control, so datagrams are discarded if they don't fit in the socket receive buffer. A fast sender can therefore overwhelm the receiver.
The default and maximum window size can be changed in the /proc file system without reboot:

    # su - root

    # sysctl -w net.core.rmem_default=262144
    net.core.rmem_default = 262144

    # sysctl -w net.core.wmem_default=262144
    net.core.wmem_default = 262144

    # sysctl -w net.core.rmem_max=262144
    net.core.rmem_max = 262144

    # sysctl -w net.core.wmem_max=262144
    net.core.wmem_max = 262144

The above commands made the changes to the already running O/S. You should now make the above changes permanent (for each reboot) by adding the following lines to the /etc/sysctl.conf file for each node in your RAC cluster:
    # Default setting in bytes of the socket receive buffer
    net.core.rmem_default=262144

    # Default setting in bytes of the socket send buffer
    net.core.wmem_default=262144

    # Maximum socket receive buffer size which may be set by using
    # the SO_RCVBUF socket option
    net.core.rmem_max=262144

    # Maximum socket send buffer size which may be set by using
    # the SO_SNDBUF socket option
    net.core.wmem_max=262144
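Since these four entries have to be added by hand on every node, a quick check of the file can catch a missed or mistyped line. The sketch below is an illustration only: check_udp_buffers is a made-up helper name, and it assumes the simple key=value layout shown above.

```shell
#!/bin/sh
# check_udp_buffers FILE
# Prints each of the four buffer parameters whose value in FILE is missing
# or is not 262144. Returns non-zero if anything was reported.
# NOTE: hypothetical helper written for this article.
check_udp_buffers() {
    rc=0
    for key in net.core.rmem_default net.core.wmem_default \
               net.core.rmem_max net.core.wmem_max; do
        # Use the last uncommented assignment found for this key.
        val=$(sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*\([0-9]*\).*/\1/p" "$1" | tail -n 1)
        if [ "$val" != "262144" ]; then
            echo "$key: expected 262144, found '${val:-unset}'"
            rc=1
        fi
    done
    return $rc
}

# Typical use on a node (run as any user that can read the file):
#   check_udp_buffers /etc/sysctl.conf && echo "UDP buffer settings OK"
```

To compare against the values the running kernel is actually using, each key can also be read back with sysctl -n, for example sysctl -n net.core.rmem_max.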
Check and turn off UDP ICMP rejections:

During the Linux installation process, I chose not to configure the firewall option. (By default the option to configure a firewall is selected by the installer.) This has burned me several times, so I like to double-check that the firewall option is not configured and that UDP ICMP filtering is turned off.

If UDP ICMP is blocked or rejected by the firewall, the Oracle Clusterware software will crash after several minutes of running. When the Oracle Clusterware process fails, you will have something similar to the following in the <machine_name>_evmocr.log file:
    08/29/2005 22:17:19
    oac_init:2: Could not connect to server, clsc retcode = 9
    08/29/2005 22:17:19
    a_init:12!: Client init unsuccessful : [32]
    ibctx:1:ERROR: INVALID FORMAT
    proprinit:problem reading the bootblock or superbloc 22

When experiencing this type of error, the solution was to remove the UDP ICMP (iptables) rejection rule - or to simply have the firewall option turned off. The Oracle Clusterware software will then start to operate normally and not crash. The following commands should be executed as the root user account:
- Check to ensure that the firewall option is turned off. If the firewall option is stopped (like it is in my example below) you do not have to proceed with the following steps.
    # /etc/rc.d/init.d/iptables status
    Firewall is stopped.
- If the firewall option is operating you will need to first manually disable UDP ICMP rejections:
    # /etc/rc.d/init.d/iptables stop
    Flushing firewall rules: [ OK ]
    Setting chains to policy ACCEPT: filter [ OK ]
    Unloading iptables modules: [ OK ]
- Then, to turn UDP ICMP rejections off for next server reboot (which should always be turned off):
# chkconfig iptables off
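After running chkconfig iptables off, it can be worth confirming that the service is really disabled in every runlevel. The sketch below assumes the usual one-line output format of chkconfig --list on Red Hat style systems (service name followed by 0:off through 6:off); the helper name iptables_enabled_levels is invented for this example.

```shell
#!/bin/sh
# iptables_enabled_levels
# Reads the output of `chkconfig --list iptables` on stdin and prints the
# runlevels (space separated) in which the service is still switched on.
# Prints nothing when the service is off in every runlevel.
# NOTE: hypothetical helper written for this article.
iptables_enabled_levels() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-6]:on$/) {
                printf "%s%s", sep, substr($i, 1, 1)
                sep = " "
            }
        }
    }
    END { if (sep != "") print "" }'
}

# Typical use, as root on each node:
#   chkconfig --list iptables | iptables_enabled_levels
```

An empty result means the firewall will not come back after the next reboot. Any runlevel numbers printed indicate that chkconfig iptables off still needs to be run on that node.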
Obtain & Install FireWire Modules
Perform the following FireWire module install and configuration on all nodes in the cluster!
Overview

The next step is to obtain and install the FireWire modules that support the use of IEEE1394 devices with multiple logins.
In previous articles, it was required to download and install both a new Linux kernel, (e.g. the Oracle Technet Supplied 2.6.9-11.0.0.10.3.EL #1 Linux kernel), and the supporting FireWire modules. As of November 2005, oss.oracle.com now provides pre-compiled FireWire modules for the 2.6.9-22.EL and 2.6.9-22.0.1.EL Linux kernels. Installing a new Linux kernel is no longer required. We will only need to install and configure the supporting FireWire modules!
I am using the term "multiple logins" a bit loosely in this article. Strictly speaking, the IEEE1394 specification does not allow "multiple logins", as it is only a point-to-point protocol. The term is often confused with "concurrent sessions", which the IEEE1394 specification does support and which simply means that the device allows multiple outstanding requests simultaneously (similar to the SCSI protocol). Multiple hosts (initiators) on a single bus are therefore prohibited according to IEEE1394.

In previous releases of this article, I included the steps to download a patched version of the Linux kernel (the C source code) and then compile it. Thanks to Oracle's Linux Projects development group, this is no longer a requirement. Oracle now provides a pre-compiled module that supports the sharing of FireWire drives. The instructions for downloading and installing the supporting FireWire module are included in this section. Before going into the details of how to perform these actions, however, let's take a moment to discuss the changes that are required to support sharing of the FireWire drive.
While FireWire drivers already exist for Linux, they often do not support shared storage. Normally, when you log on to an O/S, the O/S associates the driver with a specific drive for that machine alone. This implementation simply will not work for our RAC configuration. The shared storage (our FireWire hard drive) needs to be accessed by more than one node. We need to enable the FireWire driver to provide nonexclusive access to the drive so that multiple servers - the nodes that comprise the cluster - will be able to access the same storage. This is accomplished by removing the bit mask that identifies the machine during login in the source code, which allows nonexclusive access to the FireWire hard drive. All other nodes in the cluster log in to the same drive during their logon session, using the same modified driver, so they too have nonexclusive access to the drive.
Our implementation describes a dual node cluster (each with a single processor), each server running CentOS 4.2 Enterprise Linux. Keep in mind that the process of installing the supporting FireWire modules will need to be performed on both Linux nodes. CentOS Enterprise Linux 4.2 includes kernel 2.6.9-22.EL #1. Knowing this, we now need to download the matching FireWire module from:
Download one of the following files for the supporting FireWire Modules:
- oracle-firewire-modules-2.6.9-22.EL-1286-1.i686.rpm - (for single processor)
- OR -
- oracle-firewire-modules-2.6.9-22.ELsmp-1286-1.i686.rpm - (for multiple processors)
Install the supporting FireWire modules, as root:

Install the supporting FireWire modules package by running either of the following.

For a single processor:

    # rpm -ivh oracle-firewire-modules-2.6.9-22.EL-1286-1.i686.rpm

- OR -

For multiple processors:

    # rpm -ivh oracle-firewire-modules-2.6.9-22.ELsmp-1286-1.i686.rpm
Add module options:

Add the following line to /etc/modprobe.conf:

    options sbp2 exclusive_login=0

It is vital that the exclusive_login parameter of the Serial Bus Protocol module (sbp2) be set to zero to allow multiple hosts to log in to and access the FireWire disk concurrently.
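Because a missing or mistyped exclusive_login entry only shows up later as a failed login from the second node, it can help to verify the file on each node before rebooting. This is a minimal sketch; the function name sbp2_shared_login_ok is hypothetical and the check only looks at uncommented "options sbp2" lines.

```shell
#!/bin/sh
# sbp2_shared_login_ok FILE
# Returns 0 if FILE contains an uncommented "options sbp2" line that sets
# exclusive_login=0 (other sbp2 options may appear on the same line).
# NOTE: hypothetical helper written for this article.
sbp2_shared_login_ok() {
    grep -Eq '^[[:space:]]*options[[:space:]]+sbp2[[:space:]]+(.*[[:space:]])?exclusive_login=0([[:space:]]|$)' "$1"
}

# Typical use on each node:
#   if sbp2_shared_login_ok /etc/modprobe.conf; then
#       echo "sbp2 is configured for shared (nonexclusive) login"
#   else
#       echo "exclusive_login=0 is missing from /etc/modprobe.conf"
#   fi
```

The pattern deliberately accepts additional options on the same line, so an entry that sets both serialize_io and exclusive_login also passes.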
Perform the above tasks on the second Linux server:

With the supporting FireWire modules installed on the first Linux server, move on to the second Linux server and repeat the same tasks in this section on it.
Connect FireWire drive to each machine and boot with the new FireWire modules installed:

After you have performed the above tasks on both nodes in the cluster, power down both of them:

    ===============================
    # hostname
    linux1
    # init 0
    ===============================
    # hostname
    linux2
    # init 0
    ===============================

After both machines are powered down, connect each of them to the back of the FireWire drive.

Finally, power on each Linux server one at a time and be sure to watch for the "Probing for New Hardware" section during the boot process.
RHEL 4 users will be prompted during the boot process on both nodes at the "Probing for New Hardware" section for your FireWire hard drive. Simply select the option to "Configure" the device and continue the boot process. If you are not prompted during the "Probing for New Hardware" section for the new FireWire drive, you will need to run the following commands and reboot the machine. Do not put these commands in a script and attempt to run them - run them interactively at the command-line:
    # rpm -e oracle-firewire-modules-2.6.9-22.EL-1286-1
    # rpm -Uvh oracle-firewire-modules-2.6.9-22.EL-1286-1.i686.rpm
    # modprobe -r sbp2
    # modprobe -r sd_mod
    # modprobe -r ohci1394
    # modprobe ohci1394
    # modprobe sd_mod
    # modprobe sbp2
    # /usr/sbin/kudzu
    # init 6

After running /usr/sbin/kudzu (above), you should be prompted to "Configure" the new drive. There were times when this did not work on the first attempt. When that happened, I had to power down everything, power it back up, and perform the modprobe tasks (above) again.
Check and turn off UDP ICMP rejections:

After rebooting each machine (above), check to ensure that the firewall option is turned off (stopped):

    # /etc/rc.d/init.d/iptables status
    Firewall is stopped.
Loading the FireWire stack:
Starting with Red Hat Enterprise Linux 3 (and of course CentOS Enterprise Linux!), the loading of the FireWire stack should already be configured! In most cases, the loading of the FireWire stack will already be configured in the /etc/rc.sysinit file. The commands that are contained within this file that are responsible for loading the FireWire stack are:
    # modprobe sbp2
    # modprobe ohci1394

In older versions of Red Hat, this was not the case and these commands would have to be manually run or put within a startup file. With Red Hat Enterprise Linux 3 and higher, these commands are already put within the /etc/rc.sysinit file and run on each boot.
Check for SCSI Device:

After each machine has rebooted, the kernel should automatically detect the shared disk as a SCSI device (/dev/sdXX). This section provides several commands that should be run on all nodes in the cluster to verify that the FireWire drive was successfully detected and is being shared by all nodes in the cluster.

For this configuration, I was performing the above procedures on both nodes at the same time. The following commands and results are from my linux2 machine. Again, make sure that you run the following commands on all nodes to ensure both machines can log in to the shared drive.
Let's first check to see that the FireWire adapter was successfully detected:
    # lspci
    00:00.0 Host bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
    00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)
    00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
    00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
    00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
    00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
    00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 81)
    00:1f.0 ISA bridge: Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 01)
    00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 01)
    00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
    00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
    01:04.0 Ethernet controller: Linksys NC100 Network Everywhere Fast Ethernet 10/100 (rev 11)
    01:06.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
    01:09.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)

Second, let's check to see that the modules are loaded:

    # lsmod |egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
    sd_mod                 17217  0
    ohci1394               35784  0
    sbp2                   23948  0
    scsi_mod              121293  2 sd_mod,sbp2
    ieee1394              298228  2 ohci1394,sbp2

Third, let's make sure the disk was detected and an entry was made by the kernel:

    # cat /proc/scsi/scsi
    Attached devices:
    Host: scsi0 Channel: 00 Id: 01 Lun: 00
      Vendor: Maxtor   Model: OneTouch II      Rev: 023g
      Type:   Direct-Access                    ANSI SCSI revision: 06

Now let's verify that the FireWire drive is accessible for multiple logins and shows a valid login:

    # dmesg | grep sbp2
    sbp2: $Rev: 1265 $ Ben Collins <bcollins@debian.org>
    ieee1394: sbp2: Maximum concurrent logins supported: 2
    ieee1394: sbp2: Number of active logins: 1
    ieee1394: sbp2: Logged into SBP-2 device

From the above output, you can see that the FireWire drive I have can support concurrent logins by up to 2 servers. It is vital that you have a drive where the chipset supports concurrent access for all nodes within the RAC cluster.
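When repeating this check on each node, the sbp2 lines in the dmesg output can be reduced to a one-line summary. The following is a rough sketch that relies only on the message texts shown in this article ("Maximum concurrent logins supported", "Logged into SBP-2 device", "Error logging into SBP-2 device"); the helper name sbp2_login_summary is invented for illustration.

```shell
#!/bin/sh
# sbp2_login_summary
# Reads kernel messages (for example the output of `dmesg`) on stdin and
# prints one line:  max=<N|unknown> logged_in=<yes|no>
# The most recent login message wins if several attempts were logged.
# NOTE: hypothetical helper written for this article.
sbp2_login_summary() {
    awk '
        /sbp2: Maximum concurrent logins supported:/ { max = $NF }
        /sbp2: Logged into SBP-2 device/             { ok = "yes" }
        /sbp2: Error logging into SBP-2 device/      { ok = "no" }
        END {
            printf "max=%s logged_in=%s\n",
                   (max == "" ? "unknown" : max),
                   (ok == "" ? "no" : ok)
        }
    '
}

# Typical use on each node:
#   dmesg | sbp2_login_summary
```

For a two node cluster, each node should report a max value of at least 2 and logged_in=yes. A node that reports logged_in=no was not able to log in to the shared drive.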
One other test I like to perform is to run a quick fdisk -l from each node in the cluster to verify that it is really being picked up by the O/S. Your drive may show that the device does not contain a valid partition table, but this is OK at this point of the RAC configuration.
    # fdisk -l

    Disk /dev/hda: 40.0 GB, 40000000000 bytes
    255 heads, 63 sectors/track, 4863 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/hda1   *           1          13      104391   83  Linux
    /dev/hda2              14        4863    38957625   8e  Linux LVM

    Disk /dev/sda: 300.0 GB, 300090728448 bytes
    255 heads, 63 sectors/track, 36483 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1               1       36483   293049666    c  W95 FAT32 (LBA)
Rescan SCSI bus no longer required:
With Red Hat Enterprise Linux 3 and 4 (and you guessed it, CentOS Enterprise Linux), you no longer need to rescan the SCSI bus in order to detect the disk! The disk should be detected automatically by the kernel as seen from the tests you performed above. In older versions of the kernel, I would need to run the rescan-scsi-bus.sh script in order to detect the FireWire drive. The purpose of this script was to create the SCSI entry for the node by using the following command:
    echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi

Starting with Red Hat Enterprise Linux 3, this is no longer required and the disk should be detected automatically.
Troubleshooting SCSI Device Detection:

If you are having trouble with any of the procedures (above) in detecting the SCSI device, you can try the following:

    # rpm -e oracle-firewire-modules-2.6.9-22.EL-1286-1
    # rpm -Uvh oracle-firewire-modules-2.6.9-22.EL-1286-1.i686.rpm
    # modprobe -r sbp2
    # modprobe -r sd_mod
    # modprobe -r ohci1394
    # modprobe ohci1394
    # modprobe sd_mod
    # modprobe sbp2
    # /usr/sbin/kudzu

You may also want to unplug any USB devices connected to the server. The system may not be able to recognize your FireWire drive if you have a USB device attached!
Troubleshooting Concurrent Logins to the FireWire Drive:

One of the first things to verify is that you are using a FireWire drive that contains the correct chipset and allows for multiple logins. If the FireWire drive has a chipset that does not allow for concurrent access from more than one server, the disk and its partitions can only be seen by one server at a time. Disks with the Oxford 911 chipset (FireWire 400), Oxford 912 chipset (FireWire 800), or Oxford 922 chipset (FireWire 800) are known to work. Note that the Oxford 912 chipset is newer and faster than Oxford 922. For a full list of FireWire drives (and enclosures) I have tested, please see the section Verified and Tested FireWire Hard Drives.

Although I have only run into this situation once, there can be problems with the FireWire cards (the IEEE1394 controller cards). For example, in one of my tests (using FireWire 800) I was unable to obtain concurrent logins to the FireWire drive when using the LaCie FireWire 800 PCI Card - (107755) in both nodes. While the first node was able to log in to the FireWire drive, it was acquiring it for exclusive access and causing the second node to fail its login process to the drive. For example:
From linux1:
    # dmesg | grep sbp2
    sbp2: $Rev: 1265 $ Ben Collins <bcollins@debian.org>
    ieee1394: sbp2: Maximum concurrent logins supported: 2
    ieee1394: sbp2: Number of active logins: 0
    ieee1394: sbp2: Logged into SBP-2 device

From linux2:

    # dmesg | grep sbp2
    sbp2: $Rev: 1265 $ Ben Collins <bcollins@debian.org>
    ieee1394: sbp2: Maximum concurrent logins supported: 2
    ieee1394: sbp2: Number of active logins: 1
    ieee1394: sbp2: Error logging into SBP-2 device - login failed
    sbp2: probe of 00d04b690809290b-0 failed with error -16
    ieee1394: sbp2: Maximum concurrent logins supported: 2
    ieee1394: sbp2: Number of active logins: 1
    ieee1394: sbp2: Error logging into SBP-2 device - login failed
    sbp2: probe of 00d04b690809290b-0 failed with error -16

I have seen postings that indicate this can be resolved by using the sbp2 option "serialize_io=1" defined in the /etc/modprobe.conf file. For example, the entry in the /etc/modprobe.conf file would be:
    options sbp2 serialize_io=1 exclusive_login=0

Although this has been used to resolve some of the cases with failed concurrent logins, it did not resolve the problem I was having with installing the LaCie FireWire 800 PCI Card - (107755) in both nodes.

Another solution to resolve failed concurrent logins is to use different FireWire cards for each of the nodes. For example, one that uses the Texas Instruments (TI) chipset and another that uses the VIA Technologies (VIA) chipset. In my case, I was able to resolve this by simply using two FireWire cards from different vendors: the LaCie FireWire 800 PCI Card - (107755) in one node and the SIIG FireWire 800 PCI-32T host adapter - (NN-830112) in the second node. Although they both use the TI chipset, it was enough to resolve the problem I was having with failed concurrent logins. After this, both nodes were able to successfully log in to the FireWire drive.
Create "oracle" User and Directories