Oracle DBA Tips Corner
Building an Inexpensive Oracle RAC 10g Release 1 on Linux - (WBEL 3.0 / FireWire)
by Jeff Hunter, Sr. Database Administrator
Overview
For those who simply want to become familiar with Oracle10g RAC, this article
provides a low-cost way to configure an Oracle10g RAC system using
commercial off-the-shelf components and downloadable software. The estimated cost
for this configuration is anywhere from $1200 to $1800. The system
will consist of a dual-node cluster (each node with a single processor), both running Linux
(White Box Enterprise Linux 3.0 Respin 1 or Red Hat Enterprise Linux 3), with shared disk storage based on IEEE1394
(FireWire) drive technology.
(Of course, you could also consider building a
virtual cluster
on a VMware Virtual Machine, but the experience won't quite be the same!)
Please note that this is not the only way to build a low-cost Oracle10g RAC
system. I have seen other solutions that use SCSI
rather than FireWire for shared storage. In most cases, SCSI will cost more than
a FireWire solution: a typical SCSI card is priced around $70, and an 80GB external SCSI drive
will cost around $700-$1000. Keep in mind that some motherboards may already include
built-in SCSI controllers.
It is important to note that this configuration should never be
run in a production environment and that it is
not supported by Oracle or any other vendor. In a production
environment, fibre channel (the high-speed serial-transfer interface that can
connect systems and storage devices in either point-to-point or switched
topologies) is the technology of choice.
FireWire offers a low-cost alternative to fibre channel for testing
and development, but it is not ready for production.
Although I have used raw partitions in the past for storing
files on shared storage, here we will make use of the Oracle Cluster File System (OCFS) and
Oracle Automatic Storage Management (ASM). The two Linux servers will be configured as follows:
The Oracle Cluster Ready Services (CRS) software will be installed to
/u01/app/oracle/product/10.1.0/crs on each of the nodes
that make up the RAC cluster. However, the CRS software requires that two
of its files, the
"Oracle Cluster Registry (OCR)" file and the "CRS Voting Disk" file
be shared with all nodes in the cluster. These two files will
be installed on the shared storage using Oracle's Cluster File System (OCFS).
It is possible (but not recommended by Oracle) to use RAW devices for these files; however,
it is not possible to use ASM for these CRS files.
The Oracle10g R1 Database software will be installed into a separate
Oracle Home, namely /u01/app/oracle/product/10.1.0/db_1.
All of the Oracle physical database files (data, online redo logs, control files, archived redo logs)
will be installed to different partitions of the shared drive being
managed by Automatic Storage Management (ASM).
Oracle10g Real Application Cluster (RAC) Introduction
At the heart of Oracle10g RAC is a shared disk subsystem. All nodes in the
cluster must be able to access all of the data, redo log files, control
files and parameter files for all nodes in the cluster. The data disks must
be globally available in order to allow all nodes to access the database. Each
node has its own redo log file(s) and UNDO tablespace, but the other nodes must be able to
access them (and the shared control file) in order to recover that node in the event of a system failure.
Not all clustering solutions use shared storage. Some vendors use an approach
known as a federated cluster, in which data is spread across several machines
rather than shared by all. With Oracle10g RAC, however, multiple nodes use the same
set of disks for storing data: the data files, redo log files, control files,
and archived log files reside on shared storage on raw-disk devices, a NAS, ASM, or a
clustered file system. Oracle's approach to clustering leverages the collective
processing power of all the nodes in the cluster and at the same time provides
failover security.
The biggest difference between Oracle RAC and its predecessor, Oracle Parallel Server (OPS), is the addition of Cache
Fusion. With OPS, a request for data from one node to another required the
data to be written to disk first; only then could the requesting node read that
data. With Cache Fusion, data is passed along a high-speed interconnect
using a sophisticated locking algorithm.
Pre-configured Oracle10g RAC solutions are available from vendors such as
Dell, IBM and HP for production environments. This article, however,
focuses on putting together your own Oracle10g RAC environment for development
and testing by using Linux servers and a low-cost shared disk solution: FireWire.
Shared-Storage Overview
A less expensive alternative to fibre channel is SCSI. SCSI technology provides
acceptable performance for shared storage, but for administrators and developers who
are used to GPL-based Linux prices, even SCSI can come in over budget, at around
$1,000 to $2,000 for a two-node cluster.
Another popular solution is NFS (Network File System, originally developed by Sun) served from a NAS. It can be used for
shared storage, but only if you are using a network appliance or something similar.
Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol,
and read/write block sizes of 32K.
The shared storage that will be used for this article is based on
IEEE1394 (FireWire) drive technology. FireWire is able to offer a low-cost alternative
to Fibre Channel for testing and development, but should never be used in a production
environment.
FireWire Technology
The following chart shows speed comparisons of the various types of disk interface.
For each interface, I provide the maximum transfer rates in kilobits (Kb), kilobytes (KB),
megabits (Mb), megabytes (MB), gigabits (Gb), and gigabytes (GB) per second.
As you can see, the capabilities of IEEE1394 compare very favorably with
other disk interface and network technologies available today.
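The byte figures in the chart follow from the bit figures by dividing by 8. A minimal shell sketch of that conversion is shown here; the function name bits_to_bytes is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Convert a rate given in bits (Kb, Mb or Gb per second) to the
# corresponding byte unit (KB, MB or GB per second): 8 bits per byte.
# bits_to_bytes is an illustrative helper, not part of any tool.
bits_to_bytes() {
    awk -v bits="$1" 'BEGIN { printf "%g\n", bits / 8 }'
}

bits_to_bytes 400    # FireWire 400 (IEEE1394a), 400 Mb per second
bits_to_bytes 800    # FireWire 800 (IEEE1394b), 800 Mb per second
```

The two calls print 50 and 100, matching the MB column for FireWire 400 and FireWire 800.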
Hardware & Costs
Each Linux server should contain two NIC adapters.
The Dell Dimension includes an integrated 10/100 Ethernet adapter that
will be used to connect to the public network. The second NIC adapter
will be used for the private interconnect.
The following is a list of FireWire I/O cards that contain the correct chipset,
allow for multiple logins, and should work with this article (no guarantees however).
FireWire I/O cards with chipsets made by VIA or TI are known to work.
The following is a list of FireWire drives (and enclosures) that contain the correct chipset,
allow for multiple logins, and should work with this article (no guarantees however):
Each node in the RAC configuration will need to connect to the
shared storage device (the FireWire hard drive). The FireWire hard drive
will come supplied with one FireWire cable. You will need to purchase
one additional FireWire cable to connect the second node to the
shared storage. Select the appropriate FireWire cable that is compatible
with the data transmission speed (FireWire 400 / FireWire 800) and the
desired cable length.
The Ethernet switch is used for the interconnect between int-linux1 and int-linux2. A question I often
receive is whether the Ethernet switch can be replaced with a crossover CAT5 cable. I would not recommend this.
I have found that when using a crossover CAT5 cable for the
interconnect, whenever I took one of the PCs down, the
other PC would detect a "cable unplugged" error, and thus
the Cache Fusion network would become unavailable.
As we start to go into the details of the installation, it should be
noted that most of the tasks within this
document will need to be performed on both servers. I will indicate at the beginning
of each section whether the task(s) should be performed on both
nodes.
Install White Box Enterprise Linux 3.0
Also, before
starting the installation, ensure that the FireWire drive (our
shared storage drive) is NOT connected to either of the two servers.
Although none of this is mandatory, it is how I will be
performing the installation and configuration in this article.
After downloading and burning the WBEL images (ISO files) to CD,
insert WBEL Disk #1 into the first server (linux1 in this example), power it on,
and answer the installation screen prompts as noted below. After completing
the Linux installation on the first node, perform the same Linux installation
on the second node, substituting the node name linux2 for linux1
and using different IP addresses where appropriate.
Boot Screen
If there were
a previous installation of Linux on this machine, the next screen
will ask if you want to "remove" or "keep" old partitions. Select the option
to [Remove all partitions on this system]. Also, ensure that the
[hda] drive is selected for this installation. I also keep the
checkbox [Review (and modify if needed) the partitions created] selected.
Click [Next] to continue.
You will then be prompted with a dialog window asking if you really
want to remove all partitions. Click [Yes] to acknowledge this warning.
First, make sure that each of the network devices is checked
to [Active on boot]. The installer may choose
not to activate eth1.
Second, [Edit] both eth0 and eth1 as follows. You may choose
to use different IP addresses for both eth0 and eth1 and that
is OK. If possible, try to put eth1 (the interconnect) on
a different subnet than eth0 (the public network):
eth0:
eth1:
Continue by setting your hostname manually. I used
"linux1" for the first node and "linux2" for the second.
Finish this dialog off by supplying your gateway and
DNS servers.
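To confirm that the public and private interfaces really sit on different subnets, the network address of each can be compared. The following is a minimal sketch using the linux1 addresses from this installation (192.168.1.100 and 192.168.2.100, netmask 255.255.255.0); the helper name network_of is hypothetical.

```shell
#!/bin/sh
# network_of IP NETMASK: print the network address (bitwise AND of
# each octet). Illustrative helper; not part of any Linux tool.
network_of() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads into eight positional parameters.
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

pub=$(network_of 192.168.1.100 255.255.255.0)    # eth0, public network
priv=$(network_of 192.168.2.100 255.255.255.0)   # eth1, interconnect
if [ "$pub" = "$priv" ]; then
    echo "eth0 and eth1 share subnet $pub"
else
    echo "eth0 is on $pub, eth1 is on $priv"
fi
```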
When the system boots into Linux for the first time, it will prompt
you with another Welcome screen. The wizard that follows allows you to
configure the date and time, add any additional users, test the
sound card, and install any additional CDs. The only screen I care
about is the time and date. As for the others, simply run through
them, as there is nothing additional that needs to be installed (at this point
anyway!). If everything was successful, you should now be presented with the
login screen.
First, make sure that each of the network devices is checked
to [Active on boot]. The installer will choose not
to activate eth1.
Second, [Edit] both eth0 and eth1 as follows:
eth0:
eth1:
Continue by setting your hostname manually. I used
"linux2" for the second node.
Finish this dialog off by supplying your gateway and
DNS servers.
Network Configuration
Each node should have one static IP address for the public network
and one static IP address for the private cluster interconnect. The
private interconnect should only be used by Oracle to transfer
Cluster Manager and Cache Fusion related data. Although it is possible
to use the public network for the interconnect, this is not recommended as
it may degrade database performance (by reducing the amount of bandwidth
available for Cache Fusion and Cluster Manager traffic). For a production RAC implementation,
the interconnect should be at least gigabit and be used only by Oracle.
The easiest way to configure network settings in Red Hat Enterprise Linux 3 is with the program
Network Configuration. This application can be started from the command-line
as the "root" user account as follows:
Using the Network Configuration application, you need to configure
both NIC devices as well as the
Our example configuration will use the following settings:
It's all about availability of the application.
When a node fails, the VIP associated with it is supposed to be automatically
failed over to some other node. When this occurs, two things happen.
This means that when the client issues SQL to the node that is
now down, or traverses the address list while connecting, rather than
waiting on a very long TCP/IP time-out (~10 minutes), the client receives
a TCP reset. In the case of SQL, this is ORA-3113. In the case of
connect, the next address in tnsnames is used.
Without using VIPs, clients connected to a node that died will often wait
a 10 minute TCP timeout period before getting an error.
As a result, you don't really have a good HA solution without using VIPs.
Source - Metalink: "RAC Frequently Asked Questions" (Note:220970.1)
Oracle strongly suggests adjusting the default and maximum send buffer size
(SO_SNDBUF socket option) to 256 KB, and the default and maximum receive
buffer size (SO_RCVBUF socket option) to 256 KB.
The receive buffers are used by TCP and UDP to hold received data until it is read by
the application. With TCP, the receive buffer cannot overflow because the peer is not allowed to
send data beyond the advertised buffer size window. UDP has no such flow control, so datagrams are discarded if
they don't fit in the socket receive buffer, which can happen when the sender overwhelms
the receiver.
The above commands made the changes to the already running O/S.
You should now make the above changes permanent (for each reboot) by adding the following lines
to the /etc/sysctl.conf file for each node in your RAC cluster:
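The 256 KB recommendation corresponds to 262144 bytes. The following is a minimal sketch of matching settings, assuming the standard Linux net.core sysctl keys for default and maximum socket buffer sizes are the ones intended; the helper name print_buffer_settings is hypothetical.

```shell
#!/bin/sh
# Print sysctl.conf-style lines that set the default and maximum
# socket receive/send buffers to 256 KB (256 * 1024 = 262144 bytes).
# print_buffer_settings is an illustrative helper name.
print_buffer_settings() {
    size=$(( 256 * 1024 ))
    for key in rmem_default rmem_max wmem_default wmem_max; do
        echo "net.core.$key = $size"
    done
}

print_buffer_settings
# To apply a value on a running system (as root), each key could be
# set individually, for example:
#   sysctl -w net.core.rmem_default=262144
# To persist across reboots, the printed lines would be appended to
# /etc/sysctl.conf on each node.
```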
If UDP ICMP is blocked or rejected by the firewall, the CRS software will crash after several minutes
of running. When the CRS process fails, you will have something similar to the following
in the <machine_name>_evmocr.log file:
Obtaining and Installing a proper Linux Kernel
While FireWire drivers already exist for Linux, they often do not support shared storage.
Normally, when you log on to an O/S, the O/S associates the driver with a specific drive
for that machine alone. This implementation simply will not work for our RAC
configuration. The shared storage (our FireWire hard drive) needs to be accessed
by more than one node. We need to enable the FireWire driver to provide nonexclusive
access to the drive so that multiple servers - the nodes that comprise the cluster -
will be able to access the same storage. This is accomplished by removing, in the source code, the bit mask
that identifies the machine during login. The result is nonexclusive access
to the FireWire hard drive. All other nodes in the cluster log in to the same drive
during their logon session, using the same modified driver, so they too have
nonexclusive access to the drive.
Our implementation describes a dual node cluster (each with a single processor), each server
running White Box Enterprise Linux. Keep in mind that the process of installing the
patched Linux kernel will need to be performed on both Linux nodes.
White Box Enterprise Linux 3.0 (Respin 1) includes kernel 2.4.21-15.EL #1. We will need to download the Oracle
Technet Supplied 2.4.21-27.0.2.ELorafw1 Linux kernel from the following URL:
http://oss.oracle.com/projects/firewire/files.
The following is a listing of my /etc/grub.conf file before and then after
the kernel install. As you can see, the install added another
stanza for the 2.4.21-27.0.2.ELorafw1 kernel. If you want, you can change
the (default) entry in the new file so that the new kernel will be the default
one booted. By default, the installer keeps your original kernel as the default by setting
default=1. You should change the default value to zero (default=0)
in order to boot the new kernel by default.
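Changing the default entry can be done in any text editor. As a minimal sketch, it could also be scripted as follows; the helper name set_grub_default is hypothetical, and the function keeps a .bak copy of the file before rewriting it.

```shell
#!/bin/sh
# set_grub_default FILE INDEX: rewrite the "default=N" line of a GRUB
# configuration file. Illustrative helper; a backup is saved as
# FILE.bak before FILE is rewritten in place.
set_grub_default() {
    file=$1
    index=$2
    cp "$file" "$file.bak" || return 1
    sed "s/^default=[0-9][0-9]*/default=$index/" "$file.bak" > "$file"
}

# Example (as root): make the first stanza the default kernel.
#   set_grub_default /etc/grub.conf 0
```

Writing the result back through the original path, rather than replacing the file, is a deliberate choice here so that a symbolic link at /etc/grub.conf would be left in place.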
It is vital that the parameter sbp2_exclusive_login of the Serial Bus
Protocol module (sbp2) be set to zero to allow multiple hosts to
login to and access the FireWire disk concurrently. The second line ensures
the SCSI disk driver module (sd_mod) is loaded as well since
(sbp2) requires the SCSI layer. The core SCSI support module
(scsi_mod) will be loaded automatically if (sd_mod)
is loaded - there is no need to make a separate entry for it.
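For reference, the following is a sketch of module configuration entries matching the description above, assuming the 2.4-kernel /etc/modules.conf syntax (options and post-install directives). Only the sbp2_exclusive_login=0 setting and the sd_mod dependency are taken from the text; the exact form of the lines, the file name, and the helper names add_line_once and configure_sbp2 are assumptions.

```shell
#!/bin/sh
# add_line_once FILE LINE: append LINE to FILE unless it is already there.
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}

# configure_sbp2 FILE: add the two entries described in the text.
configure_sbp2() {
    # Allow several hosts to log in to the FireWire disk at once.
    add_line_once "$1" "options sbp2 sbp2_exclusive_login=0"
    # Load the SCSI disk driver whenever sbp2 is loaded (assumed syntax).
    add_line_once "$1" "post-install sbp2 insmod sd_mod"
}

# Example (as root, on each node):
#   configure_sbp2 /etc/modules.conf
```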
Power on the FireWire drive.
Finally, power on each Linux server and make sure each machine boots into the new kernel.
In most cases, the loading of the FireWire stack will already
be configured in the /etc/rc.sysinit file. The commands
that are contained within this file that are responsible for loading
the FireWire stack are:
For this configuration, I was performing the above procedures on
both nodes at the same time. When complete, I shut down both machines, started
linux1 first, and then linux2. The following commands and
results are from my linux2 machine. Again, make sure that you run
the following commands on all nodes to ensure both machines can log in to the
shared drive.
Let's first check to see that the FireWire adapter was successfully detected:
From the above output, you can see that the FireWire drive I have can
support concurrent logins by up to 3 servers. It is vital that you have a
drive where the chipset supports concurrent access for all nodes within the RAC cluster.
One other test I like to perform is to run a quick fdisk -l from
each node in the cluster to verify that it is really being picked up by
the O/S. Your drive may show that the device does not contain a valid partition table,
but this is OK at this point of the RAC configuration.
In older versions of the kernel, I would need to run the
rescan-scsi-bus.sh script
in order to detect the FireWire drive. The purpose of this script
was to create the SCSI entry for the node by using the following
command:
With Red Hat Enterprise Linux 3, this is no longer required and the disk should be detected automatically.
You may also want to unplug any USB devices connected to the server. The system
may not be able to recognize your FireWire drive if you have a USB device attached!
Create "oracle" User and Directories
For this example, I used:
You can check the available space in /tmp by running the following
command:
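One common way to check this is with df. The following is a minimal sketch; the helper name avail_kb is hypothetical.

```shell
#!/bin/sh
# avail_kb DIR: print the free space, in kilobytes, on the file system
# that holds DIR. The POSIX output format of df (-P) keeps each file
# system on one line, so "Available" is the fourth column of line two.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

echo "/tmp has $(avail_kb /tmp) KB available"
```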
Creating Partitions on the Shared FireWire Storage Device
The following table lists the individual partitions that will
be created on the FireWire (shared) drive and what files
will be contained on them.
Create All Partitions on FireWire Shared Storage
After each machine is back up, run the "fdisk -l /dev/sda" command on each
machine in the cluster to ensure that they both can see the partition table:
Configuring the Linux Servers for Oracle
Throughout this section you will notice that there are several different ways to
configure (set) these parameters. For the purpose of this article, I will
be making all changes permanent (through reboots) by placing all commands
in the /etc/rc.local file. The method that I use will echo
the values directly into the appropriate path of the /proc file system.
# free
- OR -
# cat /proc/swaps
- OR -
# cat /proc/meminfo | grep MemTotal
As root, make a file that will act as additional swap space, let's say about 300MB:
Now we should change the file permissions:
Finally we format the "partition" as swap and add it to the swap space:
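The three steps above can be sketched as follows. This is a minimal example under stated assumptions: the helper name make_swap_file and the file name /tempswap are hypothetical, and the mkswap and swapon steps are left as comments because they must be run as root.

```shell
#!/bin/sh
# make_swap_file PATH SIZE_KB: create a zero-filled file of SIZE_KB
# kilobytes and restrict access to its owner. Illustrative helper.
make_swap_file() {
    dd if=/dev/zero of="$1" bs=1024 count="$2" 2>/dev/null || return 1
    chmod 600 "$1"
}

# Example (as root), for roughly 300MB of additional swap:
#   make_swap_file /tempswap 300000
#   mkswap /tempswap      # format the file as swap
#   swapon /tempswap      # add it to the active swap space
```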
Oracle makes use of shared memory for its System Global Area (SGA), which is an area of
memory that is shared by all Oracle background and foreground processes. Adequate sizing of
the SGA is critical to Oracle performance since it is responsible for holding the database
buffer cache, shared SQL, access paths, and so much more.
To determine all shared memory limits, use the following:
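A minimal sketch of one way to read these limits is shown here. It reads the shmmax, shmmni, and shmall values that Linux exposes under /proc/sys/kernel; the helper name show_shm_limits and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# show_shm_limits [DIR]: print the kernel's shared memory limits as
# exposed under /proc/sys/kernel (shmmax, shmmni, shmall).
# DIR defaults to /proc/sys/kernel; the argument exists only so the
# function can be pointed at another directory. Illustrative helper.
show_shm_limits() {
    dir=${1:-/proc/sys/kernel}
    for name in shmmax shmmni shmall; do
        if [ -r "$dir/$name" ]; then
            echo "$name = $(cat "$dir/$name")"
        fi
    done
}

show_shm_limits
# The ipcs utility reports the shared memory limits in one step:
#   ipcs -lm
```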
Setting SHMMAX
One of the most efficient ways to become familiar with Oracle10g Real
Application Cluster (RAC) technology is to have access to an actual
Oracle10g RAC cluster. In learning this new technology, you will soon start
to realize the benefits Oracle10g RAC has to offer like fault tolerance,
new levels of security, load balancing, and the ease of upgrading capacity. The problem though is the
price of the hardware required for a typical production RAC configuration.
A small two node cluster, for example, could run anywhere from $10,000
to well over $20,000. This would not even include the heart of a production
RAC environment, the shared storage. In most cases, this would be a Storage
Area Network (SAN), which generally starts at $8,000.
If you are interested in configuring the same type of configuration
for Oracle9i, please see my article entitled
"Building an Inexpensive Oracle9i RAC Configuration on Linux".
This article is only designed to work as documented with absolutely no substitutions.
If you are looking for an example that takes advantage of 10g R2 with Red Hat 4,
please see
"Building an Inexpensive Oracle10g Release 2 RAC Configuration on Linux - (CentOS 4.2)".
Oracle Database Files

| RAC Node Name | Instance Name | Database Name | $ORACLE_BASE | File System |
|---|---|---|---|---|
| linux1 | orcl1 | orcl | /u01/app/oracle | Automatic Storage Management (ASM) |
| linux2 | orcl2 | orcl | /u01/app/oracle | Automatic Storage Management (ASM) |
Oracle CRS Shared Files

| File Type | File Name | Partition | Mount Point | File System |
|---|---|---|---|---|
| Oracle Cluster Registry (OCR) | /u02/oradata/orcl/OCRFile | /dev/sda1 | /u02/oradata/orcl | Oracle's Cluster File System (OCFS) |
| CRS Voting Disk | /u02/oradata/orcl/CSSFile | /dev/sda1 | /u02/oradata/orcl | Oracle's Cluster File System (OCFS) |
The Oracle database files could just as well have been stored on
the Oracle Cluster File System (OCFS). Using ASM, however, makes
the article that much more interesting!
Oracle Real Application Cluster (RAC) is the successor to Oracle
Parallel Server (OPS) and was first introduced in Oracle9i. RAC allows multiple instances to access the same
database (storage) simultaneously. RAC provides fault tolerance, load balancing, and
performance benefits by allowing the system to scale out, and at the same
time since all nodes access the same database, the failure of one instance
will not cause the loss of access to the database.
Today, fibre channel is one of the most popular solutions for shared storage.
As mentioned earlier, fibre channel is a high-speed serial-transfer interface
that is used to connect systems and storage devices in either point-to-point
or switched topologies. Protocols supported by Fibre Channel include SCSI
and IP. Fibre channel configurations can support as many as 127 nodes
and have a throughput of up to 2.12 gigabits per second. Fibre channel, however, is
very expensive. The fibre channel switch alone can run as much as $1000. This
does not even include the fibre channel storage array and high-end drives,
which can reach prices of about $300 for a 36GB drive. A typical basic fibre channel setup,
including fibre channel cards for the servers, costs roughly $5,000,
which does not include the cost of the servers that make up the cluster.
Developed by Apple Computer and Texas Instruments, FireWire is a cross-platform
implementation of a high-speed serial data bus. With its high bandwidth, long
distances (up to 100 meters in length) and high-powered bus, FireWire is being
used in applications such as
digital video (DV), professional audio, hard drives, high-end digital still cameras
and home entertainment devices. Today, FireWire operates at transfer rates of up to
800 megabits per second, while next-generation FireWire calls for a theoretical
bit rate of 1600 Mbps and then up to a staggering 3200 Mbps. That's 3.2 gigabits per
second. This will make FireWire indispensable for transferring massive data files
and for even the most demanding video applications, such as working
with uncompressed high-definition (HD) video or multiple standard-definition (SD)
video streams.
Speed (per second)

| Disk Interface / Network / BUS | Kb | KB | Mb | MB | Gb | GB |
|---|---|---|---|---|---|---|
| Serial | 115 | 14.375 | 0.115 | 0.014 | | |
| Parallel (standard) | 920 | 115 | 0.92 | 0.115 | | |
| 10Base-T Ethernet | | | 10 | 1.25 | | |
| IEEE 802.11b wireless Wi-Fi (2.4 GHz band) | | | 11 | 1.375 | | |
| USB 1.1 | | | 12 | 1.5 | | |
| Parallel (ECP/EPP) | | | 24 | 3 | | |
| SCSI-1 | | | 40 | 5 | | |
| IEEE 802.11g wireless WLAN (2.4 GHz band) | | | 54 | 6.75 | | |
| SCSI-2 (Fast SCSI / Fast Narrow SCSI) | | | 80 | 10 | | |
| 100Base-T Ethernet (Fast Ethernet) | | | 100 | 12.5 | | |
| ATA/100 (parallel) | | | 100 | 12.5 | | |
| IDE | | | 133.6 | 16.7 | | |
| Fast Wide SCSI (Wide SCSI) | | | 160 | 20 | | |
| Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow) | | | 160 | 20 | | |
| Ultra IDE | | | 264 | 33 | | |
| Wide Ultra SCSI (Fast Wide 20) | | | 320 | 40 | | |
| Ultra2 SCSI | | | 320 | 40 | | |
| FireWire 400 - (IEEE1394a) | | | 400 | 50 | | |
| USB 2.0 | | | 480 | 60 | | |
| Wide Ultra2 SCSI | | | 640 | 80 | | |
| Ultra3 SCSI | | | 640 | 80 | | |
| FireWire 800 - (IEEE1394b) | | | 800 | 100 | | |
| Gigabit Ethernet | | | 1000 | 125 | 1 | |
| PCI - (33 MHz / 32-bit) | | | 1064 | 133 | 1.064 | |
| Serial ATA I - (SATA I) | | | 1200 | 150 | 1.2 | |
| Wide Ultra3 SCSI | | | 1280 | 160 | 1.28 | |
| Ultra160 SCSI | | | 1280 | 160 | 1.28 | |
| PCI - (33 MHz / 64-bit) | | | 2128 | 266 | 2.128 | |
| PCI - (66 MHz / 32-bit) | | | 2128 | 266 | 2.128 | |
| AGP 1x - (66 MHz / 32-bit) | | | 2128 | 266 | 2.128 | |
| Serial ATA II - (SATA II) | | | 2400 | 300 | 2.4 | |
| Ultra320 SCSI | | | 2560 | 320 | 2.56 | |
| FC-AL Fibre Channel | | | 3200 | 400 | 3.2 | |
| PCI-Express x1 - (bidirectional) | | | 4000 | 500 | 4 | |
| PCI - (66 MHz / 64-bit) | | | 4256 | 532 | 4.256 | |
| AGP 2x - (133 MHz / 32-bit) | | | 4264 | 533 | 4.264 | |
| Serial ATA III - (SATA III) | | | 4800 | 600 | 4.8 | |
| PCI-X - (100 MHz / 64-bit) | | | 6400 | 800 | 6.4 | |
| PCI-X - (133 MHz / 64-bit) | | | | 1064 | 8.512 | 1 |
| AGP 4x - (266 MHz / 32-bit) | | | | 1066 | 8.528 | 1 |
| 10G Ethernet - (IEEE 802.3ae) | | | | 1250 | 10 | 1.25 |
| PCI-Express x4 - (bidirectional) | | | | 2000 | 16 | 2 |
| AGP 8x - (533 MHz / 32-bit) | | | | 2133 | 17.064 | 2.1 |
| PCI-Express x8 - (bidirectional) | | | | 4000 | 32 | 4 |
| PCI-Express x16 - (bidirectional) | | | | 8000 | 64 | 8 |
The hardware used to build our example Oracle10g RAC environment
consists of two Linux servers and components
that can be purchased at any local computer store or over the Internet.
Server 1 - (linux1)
Dimension 2400 Series - $620
- Intel Pentium 4 Processor at 2.80GHz
- 1GB DDR SDRAM (at 333MHz)
- 40GB 7200 RPM Internal Hard Drive
- Integrated Intel 3D AGP Graphics
- Integrated 10/100 Ethernet - (Broadcom BCM4401)
- CDROM (48X Max Variable)
- 3.5" Floppy
- No monitor (Already had one)
- USB Mouse and Keyboard
1 - Ethernet LAN Card - $20
- Linksys 10/100 Mbps - (LNE100TX) - (Used for Interconnect to linux2)
1 - FireWire Card - $30
- Belkin FireWire 3-Port 1394 PCI Card - (F5U501-APL)
- SIIG 3-Port 1394 I/O Card - (NN-300012)
- StarTech 4 Port IEEE-1394 PCI Firewire Card - (PCI1394_4)
- Adaptec FireConnect 4300 FireWire PCI Card - (1890600)
Server 2 - (linux2)
Dimension 2400 Series - $620
- Intel Pentium 4 Processor at 2.80GHz
- 1GB DDR SDRAM (at 333MHz)
- 40GB 7200 RPM Internal Hard Drive
- Integrated Intel 3D AGP Graphics
- Integrated 10/100 Ethernet - (Broadcom BCM4401)
- CDROM (48X Max Variable)
- 3.5" Floppy
- No monitor (Already had one)
- USB Mouse and Keyboard
1 - Ethernet LAN Card - $20
- Linksys 10/100 Mbps - (LNE100TX) - (Used for Interconnect to linux1)
1 - FireWire Card - $30
- Belkin FireWire 3-Port 1394 PCI Card - (F5U501-APL)
- SIIG 3-Port 1394 I/O Card - (NN-300012)
- StarTech 4 Port IEEE-1394 PCI Firewire Card - (PCI1394_4)
- Adaptec FireConnect 4300 FireWire PCI Card - (1890600)
Miscellaneous Components
FireWire Hard Drive - $280
- Maxtor OneTouch III - 750GB FireWire 400/USB 2.0 Drive - (T01G750)
- Maxtor OneTouch III - 500GB FireWire 400/USB 2.0 Drive - (T01G500)
- Maxtor OneTouch III - 300GB FireWire 400/USB 2.0 Drive - (T01G300)
- Maxtor OneTouch III - 500GB FireWire 400/USB 2.0 Drive - (F01G500)
- Maxtor OneTouch III - 300GB FireWire 400/USB 2.0 Drive - (F01G300)
- Maxtor OneTouch II 300GB USB 2.0 / IEEE 1394a External Hard Drive - (E01G300)
- Maxtor OneTouch II 250GB USB 2.0 / IEEE 1394a External Hard Drive - (E01G250)
- Maxtor OneTouch II 200GB USB 2.0 / IEEE 1394a External Hard Drive - (E01A200)
- LaCie Hard Drive, Design by F.A. Porsche 250GB, FireWire 400 - (300703U)
- LaCie Hard Drive, Design by F.A. Porsche 160GB, FireWire 400 - (300702U)
- LaCie Hard Drive, Design by F.A. Porsche 80GB, FireWire 400 - (300699U)
- Dual Link Drive Kit, FireWire Enclosure, ADS Technologies - (DLX185)
- Maxtor Ultra 200GB ATA-133 (Internal) Hard Drive - (L01P200)
- Maxtor OneTouch 250GB USB 2.0 / IEEE 1394a External Hard Drive - (A01A250)
- Maxtor OneTouch 200GB USB 2.0 / IEEE 1394a External Hard Drive - (A01A200)
Ensure that the FireWire drive that you purchase supports multiple logins. If the drive
has a chipset that does not allow for concurrent access for more than one server,
the disk and its partitions can only be seen by one server at a time. Disks with
the Oxford 911 chipset are known to work. Here are the
details about the disk that I purchased for this test:
- Vendor: Maxtor
- Model: OneTouch II
- Mfg. Part No. or KIT No.: E01G300
- Capacity: 300 GB
- Cache Buffer: 16 MB
- Rotational Speed (rpm): 7200 RPM
- Interface Transfer Rate: 400 Mbits/s
- "Combo" Interface: IEEE 1394 / USB 2.0 and USB 1.1 compatible
1 - Extra FireWire Cable - $20
- Belkin 6-pin to 6-pin 1394 Cable, 3 ft. - (F3N400-03-ICE)
- Belkin 6-pin to 6-pin 1394 Cable, 14 ft. - (F3N400-14-ICE)
1 - Ethernet hub or switch - $25
- Linksys EtherFast 10/100 5-port Ethernet Switch - (EZXS55W)
4 - Network Cables
- Category 5e patch cable - (Connect linux1 to public network) - $5
- Category 5e patch cable - (Connect linux2 to public network) - $5
- Category 5e patch cable - (Connect linux1 to interconnect ethernet switch) - $5
- Category 5e patch cable - (Connect linux2 to interconnect ethernet switch) - $5
Total - $1685
We are about to start the installation process. Now that we have
talked about the hardware that will be used in this example, let's
take a conceptual look at what the environment will look like:
Perform the following installation on all nodes in the cluster!
After procuring the required hardware, it is time to start the configuration
process. The first task we need to perform is to install the
Linux operating system. As already mentioned,
this article will use White Box Enterprise Linux (WBEL) 3.0.
Although I have used Red Hat Fedora in the past, I wanted to switch
to a Linux environment that would guarantee all of the functionality
Oracle requires. This is where WBEL comes in.
The WBEL Linux project takes the Red Hat Enterprise Linux 3 source RPMs and compiles
them into a free clone of the Enterprise Server 3.0 product. This provides
a free and stable version of the Red Hat Enterprise Linux 3 (AS/ES) operating environment for
testing different Oracle configurations. Over the last several months, I have been
moving away from Fedora as I need a stable environment that is not only
free, but as close to the actual Oracle supported operating system as possible.
While WBEL is not the only project providing this functionality, I
tend to stick with it as it is stable and has been around the longest.
Downloading White Box Enterprise Linux
Use the links (below) to download White Box Enterprise Linux 3.0. After
downloading WBEL, you will then want to burn each of the ISO images
to CD.
If you are downloading the above ISO files to a MS Windows machine,
there are many options for burning these images (ISO files) to a CD. You
may already be familiar with and have the proper software
to burn images to CD. If you are not familiar with this process
and do not have the required software to burn images to CD, here are just
two (of many) software packages that can be used:
Installing White Box Enterprise Linux
This section provides a summary of the screens used to install
White Box Enterprise Linux. For more detailed installation instructions, it
is possible to use the manuals from Red Hat Linux
http://www.redhat.com/docs/manuals/.
I would suggest, however, that the instructions I have provided
below be used for this Oracle10g RAC configuration.
Before installing the Linux operating system on both
nodes, you should have
the FireWire and two NIC interfaces (cards) installed.
The first screen is the White Box Enterprise Linux boot screen.
At the boot: prompt, hit [Enter] to start the installation process.
Media Test
When asked to test the CD media, tab over to [Skip] and hit
[Enter]. If there
were any errors, the media burning software would have warned us. After several
seconds, the installer should then detect the video card, monitor, and mouse.
The installer then goes into GUI mode.
Welcome to White Box Enterprise Linux
At the welcome screen, click [Next] to continue.
Language / Keyboard / Mouse Selection
The next three screens prompt you for the Language, Keyboard, and Mouse
settings. Make the appropriate selections for your configuration.
Installation Type
Choose the [Custom] option and click [Next] to continue.
Disk Partitioning Setup
Select [Automatically partition] and click [Next] to continue.
Partitioning
The installer will then allow you to view (and modify if needed)
the disk partitions it automatically selected. In most cases, the
installer will choose 100MB for /boot, double the amount of
RAM for swap, and the rest going to the root (/) partition. I like to
have a minimum of 1GB for swap. For the purpose of this install,
I will accept all automatically preferred sizes. (Including
2GB for swap since I have 1GB of RAM installed.)
Boot Loader Configuration
The installer will use the GRUB boot loader by default.
To use the GRUB boot loader, accept all default values and click [Next] to continue.
Network Configuration
I made sure to install both NIC interfaces (cards) in each of the
Linux machines before starting the operating system installation.
This screen should have successfully detected each of the network
devices.
eth0:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.1.100
- Netmask: 255.255.255.0
eth1:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.2.100
- Netmask: 255.255.255.0
Firewall
On this screen, make sure to check [No firewall]
and click [Next] to continue.
Additional Language Support / Time Zone
The next two screens allow you to select additional language support
and time zone information.
In almost all cases, you can accept the defaults.
Set Root Password
Select a root password and click [Next] to continue.
Package Group Selection
Scroll down to the bottom of this screen and select
[Everything] under the Miscellaneous section. Click
[Next] to continue.
About to Install
This screen is basically a confirmation screen. Click [Next]
to start the installation. During the installation process,
you will be asked to switch disks to Disk #2 and then Disk #3.
Graphical Interface (X) Configuration
When the installation is complete, the installer will attempt to detect
your video hardware. Ensure that the installer has detected
and selected the correct video hardware (graphics card and monitor) to
properly use the X Windows server. You will continue with the X
configuration in the next three screens.
Congratulations
And that's it. You have successfully installed White Box Enterprise Linux
on the first node (linux1). The installer will eject the CD
from the CD-ROM drive. Take out the CD and click [Exit] to reboot
the system.
Perform the same installation on the second node
After completing the Linux installation on the first node, repeat the above
steps for the second node (linux2). When configuring the machine name
and networking, make sure to configure the proper values. For my installation,
this is what I configured for linux2:
eth0:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.1.101
- Netmask: 255.255.255.0
eth1:
- Check OFF the option to [Configure using DHCP]
- Leave the [Activate on boot] checked ON
- IP Address: 192.168.2.101
- Netmask: 255.255.255.0
Perform the following network configuration on all nodes in the cluster!
Although we configured several of the network
settings during the installation of White Box Enterprise Linux, it is important
to not skip this section as it contains critical
steps that are required for a successful RAC environment.
Introduction to Network Settings
During the Linux O/S install we already configured the IP address and
host name for each of the nodes.
We now need to configure
the /etc/hosts file as well as adjust several of the
network settings for the interconnect.
Configuring Public and Private Network
In our two node example, we need to configure the network on both nodes
for access to the public network as well as their private interconnect.
# su -
# /usr/bin/redhat-config-network &
Do not use DHCP naming for the public IP address or the interconnects - we need static IP addresses!
You will need to configure both NIC devices as well as the /etc/hosts file. Both of these tasks can
be completed using the Network Configuration GUI. Notice that the /etc/hosts
entries are the same for both nodes.
Server 1 - (linux1)
Device  IP Address     Subnet         Gateway      Purpose
eth0    192.168.1.100  255.255.255.0  192.168.1.1  Connects linux1 to the public network
eth1    192.168.2.100  255.255.255.0               Connects linux1 (interconnect) to linux2 (int-linux2)
/etc/hosts
127.0.0.1 localhost loopback
# Public Network - (eth0)
192.168.1.100 linux1
192.168.1.101 linux2
# Private Interconnect - (eth1)
192.168.2.100 int-linux1
192.168.2.101 int-linux2
# Public Virtual IP (VIP) addresses for - (eth0)
192.168.1.200 vip-linux1
192.168.1.201 vip-linux2
Server 2 - (linux2)
Device  IP Address     Subnet         Gateway      Purpose
eth0    192.168.1.101  255.255.255.0  192.168.1.1  Connects linux2 to the public network
eth1    192.168.2.101  255.255.255.0               Connects linux2 (interconnect) to linux1 (int-linux1)
/etc/hosts
127.0.0.1 localhost loopback
# Public Network - (eth0)
192.168.1.100 linux1
192.168.1.101 linux2
# Private Interconnect - (eth1)
192.168.2.100 int-linux1
192.168.2.101 int-linux2
# Public Virtual IP (VIP) addresses for - (eth0)
192.168.1.200 vip-linux1
192.168.1.201 vip-linux2
Note that the virtual IP addresses only need to be defined in the /etc/hosts file for both
nodes. The public virtual IP addresses will be configured automatically by Oracle when you run the
Oracle Universal Installer, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA).
All virtual IP addresses will be activated when the srvctl start nodeapps -n <node_name> command
is run. Although I am getting ahead of myself, this is the Host Name/IP Address that will be
configured in the client(s) tnsnames.ora file for each Oracle Net Service Name.
All of this will be explained much later in this article!
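Because the same six entries must be present on both nodes, a scripted check can catch a typo before the Oracle installer does. The following is only a minimal sketch and not part of the original procedure: the function name check_rac_hosts is hypothetical, and the names and addresses are simply the ones used in this article.

```shell
# check_rac_hosts [FILE]
# Verify that each expected address/name pair appears in a hosts-format file.
# FILE defaults to /etc/hosts. Prints one line per missing pair and returns 1
# if any pair is missing; prints nothing and returns 0 otherwise.
check_rac_hosts() {
    hosts_file=${1:-/etc/hosts}
    rc=0
    for pair in \
        192.168.1.100:linux1 192.168.1.101:linux2 \
        192.168.2.100:int-linux1 192.168.2.101:int-linux2 \
        192.168.1.200:vip-linux1 192.168.1.201:vip-linux2
    do
        ip=${pair%%:*}
        name=${pair##*:}
        # Drop comments, then look for a line whose first field is the
        # address and which lists the name as a whole word.
        if ! awk -v ip="$ip" -v name="$name" '
            { sub(/#.*/, "") }
            $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit (found ? 0 : 1) }' "$hosts_file"
        then
            echo "missing or wrong: $ip $name"
            rc=1
        fi
    done
    return $rc
}
```

Calling check_rac_hosts with no arguments on each node tests that node's /etc/hosts file.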
In the screen shots below, only node 1 (linux1) is shown. Make sure to apply
all the proper network settings to both nodes!
Network Configuration Screen - Node 1 (linux1)
Ethernet Device Screen - eth0 (linux1)
Ethernet Device Screen - eth1 (linux1)
Network Configuration Screen - /etc/hosts (linux1)
Once the network is configured, you can use the ifconfig
command to verify everything is working. The following example
is from linux1:
$ /sbin/ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:41:F1:6E:9A
inet addr:192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:421591 errors:0 dropped:0 overruns:0 frame:0
TX packets:403861 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:78398254 (74.7 Mb) TX bytes:51064273 (48.6 Mb)
Interrupt:9 Base address:0x400
eth1 Link encap:Ethernet HWaddr 00:0D:56:FC:39:EC
inet addr:192.168.2.100 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1715352 errors:0 dropped:1 overruns:0 frame:0
TX packets:4257279 errors:0 dropped:0 overruns:0 carrier:4
collisions:0 txqueuelen:1000
RX bytes:802574993 (765.3 Mb) TX bytes:1236087657 (1178.8 Mb)
Interrupt:3
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1273787 errors:0 dropped:0 overruns:0 frame:0
TX packets:1273787 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:246580081 (235.1 Mb) TX bytes:246580081 (235.1 Mb)
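When comparing nodes, it can help to reduce the ifconfig output to just the interface names and addresses. The sketch below assumes the older ifconfig layout shown above, where the address appears as "inet addr:" on an indented line under the interface name; the function name list_inet_addrs is hypothetical.

```shell
# list_inet_addrs: read old-style `ifconfig -a` output on stdin and print
# "<interface> <IPv4 address>" for every interface that has an address.
list_inet_addrs() {
    awk '
        # An interface block starts with its name in the first column
        /^[^ \t]/ { iface = $1 }
        # The address line looks like: inet addr:192.168.1.100 Bcast:...
        /inet addr:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^addr:/) {
                    sub(/^addr:/, "", $i)
                    print iface, $i
                }
        }'
}
```

A typical call would be `/sbin/ifconfig -a | list_inet_addrs`, run on each node in turn.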
About Virtual IP
Why do we have a Virtual IP (VIP) in 10g?
Why does it just return a dead connection when its primary node fails?
Make sure RAC node name is not listed in loopback address
Ensure that none of the node names (linux1 or linux2) are
included for the loopback address in the /etc/hosts file.
If the machine name is listed in the loopback address entry as below:
127.0.0.1 linux1 localhost.localdomain localhost
it will need to be removed as shown below:
127.0.0.1 localhost.localdomain localhost
If the RAC node name is listed for the loopback address, you will
receive one of the following errors during the RAC installation:
ORA-00603: ORACLE server session terminated by fatal error
or
ORA-29702: error occurred in Cluster Group Service operation
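The loopback check is easy to overlook, so here is a minimal sketch of how it could be automated. The function name loopback_has_node is hypothetical, and the script only inspects the file; it does not modify it.

```shell
# loopback_has_node FILE NODE...
# Return 0 (and print a message) if any of the given node names appears on
# a 127.0.0.1 line of FILE; return 1 if the loopback entry is clean.
loopback_has_node() {
    hosts_file=$1
    shift
    for node in "$@"; do
        if awk -v node="$node" '
            { sub(/#.*/, "") }
            $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == node) bad = 1 }
            END { exit (bad ? 0 : 1) }' "$hosts_file"
        then
            echo "$node is listed on the loopback line of $hosts_file"
            return 0
        fi
    done
    return 1
}
```

For this article's cluster, the call would be `loopback_has_node /etc/hosts linux1 linux2`; any output means the entry needs to be corrected as described above.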
Adjusting Network Settings
Starting with Oracle 9.2.0.1, Oracle makes use of UDP as the default protocol
on Linux for inter-process communication (IPC), such as Cache Fusion
and Cluster Manager buffer transfers
between instances within the RAC cluster.
The default and maximum window size can be changed in the /proc file system
without a reboot:
# su - root
# sysctl -w net.core.rmem_default=262144
net.core.rmem_default = 262144
# sysctl -w net.core.wmem_default=262144
net.core.wmem_default = 262144
# sysctl -w net.core.rmem_max=262144
net.core.rmem_max = 262144
# sysctl -w net.core.wmem_max=262144
net.core.wmem_max = 262144
To make these settings persist across reboots, add the following lines to the /etc/sysctl.conf file on each node:
# Default setting in bytes of the socket receive buffer
net.core.rmem_default=262144
# Default setting in bytes of the socket send buffer
net.core.wmem_default=262144
# Maximum socket receive buffer size which may be set by using
# the SO_RCVBUF socket option
net.core.rmem_max=262144
# Maximum socket send buffer size which may be set by using
# the SO_SNDBUF socket option
net.core.wmem_max=262144
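Since these four values must match on every node, a small check against the configuration file can be useful. This is a sketch under the assumption that the settings live in a sysctl.conf-style file of key=value lines; the function name check_udp_buffers is hypothetical.

```shell
# check_udp_buffers [FILE]
# Confirm that the four socket buffer keys are set to 262144 in a
# sysctl.conf-style file (default: /etc/sysctl.conf).
check_udp_buffers() {
    conf=${1:-/etc/sysctl.conf}
    rc=0
    for key in net.core.rmem_default net.core.wmem_default \
               net.core.rmem_max net.core.wmem_max
    do
        # Use the last uncommented assignment of the key, allowing
        # optional spaces around the "=" sign.
        value=$(sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$conf" | tail -n 1)
        if [ "$value" != "262144" ]; then
            echo "$key is '$value', expected 262144"
            rc=1
        fi
    done
    return $rc
}
```

With no argument, the function reads /etc/sysctl.conf. It prints nothing when all four keys are set as expected.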
Check and turn off UDP ICMP rejections:
During the Linux installation process, I chose not to configure the
firewall option. (By default the option to configure a firewall is selected
by the installer.)
This has burned me several times, so I like to double-check that the firewall
option is not configured and that UDP ICMP filtering is turned off.
If UDP ICMP is rejected by the firewall, CRS can crash with errors similar to the following:
08/29/2005 22:17:19
oac_init:2: Could not connect to server, clsc retcode = 9
08/29/2005 22:17:19
a_init:12!: Client init unsuccessful : [32]
ibctx:1:ERROR: INVALID FORMAT
proprinit:problem reading the bootblock or superbloc 22
When experiencing this type of error, the solution was to remove the udp ICMP (iptables)
rejection rule - or to simply have the firewall option turned off.
The CRS will then start to operate normally and not crash. The following commands
should be executed as the root user account:
# /etc/rc.d/init.d/iptables status
Firewall is stopped.
# /etc/rc.d/init.d/iptables stop
Flushing firewall rules: [ OK ]
Setting chains to policy ACCEPT: filter [ OK ]
Unloading iptables modules: [ OK ]
# chkconfig iptables off
Perform the following kernel upgrade on all nodes in the cluster!
Overview
The next step is to obtain and install a new Linux kernel that supports the use
of IEEE1394 devices with multiple logins. In previous releases of this article,
I included the steps to download a patched version of the Linux kernel (source code) and then
compile it. Thanks to
Oracle's Linux Projects development group, this
is no longer a requirement. They provide a pre-compiled kernel for Red Hat Enterprise Linux 3.0
(which also works with White Box Enterprise Linux!), that can simply be
downloaded and installed. The instructions for downloading and installing the kernel
are included in this section. Before going into the details of how to perform these actions,
however, let's take a moment to discuss the changes that are required in the new kernel.
I am using the term "multiple logins" a bit loosely in this article. The concept of "multiple login"
is strictly not allowed in the IEEE1394 specification, as it is only a point to point protocol.
The term "multiple logins" is often confused with "concurrent sessions", which are supported in the
IEEE1394 specification. It simply means that the device allows multiple outstanding requests
simultaneously (similar to the SCSI-2 protocol). Therefore multiple hosts (initiators) on a single
bus are prohibited according to IEEE1394.
Download one of the following files:
- OR -
Take a backup of your GRUB configuration file:
In most cases you will be using GRUB for the boot loader. Before actually installing the new kernel,
make a backup copy of your /etc/grub.conf file:
# cp /etc/grub.conf /etc/grub.conf.original
Install the new kernel, as root:
For a single processor:
# rpm -ivh --force kernel-2.4.21-27.0.2.ELorafw1.i686.rpm
- OR -
For multiple processors:
# rpm -ivh --force kernel-smp-2.4.21-27.0.2.ELorafw1.i686.rpm
Installing the new kernel using RPM will also update your GRUB (or lilo)
configuration with the appropriate stanza. There is no need to add any new
stanza to your boot loader configuration unless you want to have
your old kernel image available.
Original /etc/grub.conf File
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/hda2
# initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title White Box Enterprise Linux (2.4.21-15.EL)
root (hd0,0)
kernel /vmlinuz-2.4.21-15.EL ro root=LABEL=/
initrd /initrd-2.4.21-15.EL.img
Newly Configured /etc/grub.conf File After Kernel Install
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/hda2
# initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title White Box Enterprise Linux (2.4.21-27.0.2.ELorafw1)
root (hd0,0)
kernel /vmlinuz-2.4.21-27.0.2.ELorafw1 ro root=LABEL=/
initrd /initrd-2.4.21-27.0.2.ELorafw1.img
title White Box Enterprise Linux (2.4.21-15.EL)
root (hd0,0)
kernel /vmlinuz-2.4.21-15.EL ro root=LABEL=/
initrd /initrd-2.4.21-15.EL.img
Add module options:
Add the following lines to /etc/modules.conf:
alias ieee1394-controller ohci1394
options sbp2 sbp2_exclusive_login=0
post-install sbp2 insmod sd_mod
post-install sbp2 insmod ohci1394
post-remove sbp2 rmmod sd_mod
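The sbp2_exclusive_login=0 option is the line that matters most for sharing the drive, so it is worth confirming it on each node before rebooting. A minimal sketch follows; the function name sbp2_shared_login_enabled is hypothetical.

```shell
# sbp2_shared_login_enabled [FILE]
# Return 0 if FILE (default: /etc/modules.conf) contains an uncommented
# "options sbp2" line that sets sbp2_exclusive_login=0.
sbp2_shared_login_enabled() {
    modules_conf=${1:-/etc/modules.conf}
    grep -Eq '^[[:space:]]*options[[:space:]]+sbp2[[:space:]]+(.*[[:space:]])?sbp2_exclusive_login=0([[:space:]]|$)' "$modules_conf"
}
```

For example: `sbp2_shared_login_enabled || echo "sbp2_exclusive_login=0 is not set"`.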
Connect FireWire drive to each machine and boot into the new kernel:
After you have performed the above tasks on both nodes in the cluster,
power down both of them:
===============================
# hostname
linux1
# init 0
===============================
# hostname
linux2
# init 0
===============================
After both machines are powered down, connect each of them to the back of the FireWire drive.
Check and turn off UDP ICMP rejections:
After rebooting each machine (above)
check to ensure that the firewall option is turned off (stopped):
# /etc/rc.d/init.d/iptables status
Firewall is stopped.
Loading the FireWire stack:
Starting with Red Hat Enterprise Linux (and of course White Box Enterprise Linux!),
the loading of the FireWire stack should already be configured!
# modprobe sbp2
# modprobe ohci1394
In older versions of Red Hat, this was not the case and these commands
would have to be manually run or put within a startup file. With Red Hat Enterprise Linux
and higher, these commands are already put within the /etc/rc.sysinit file
and run on each boot.
Check for SCSI Device:
After each machine has rebooted, the kernel should automatically detect
the shared disk as a SCSI device (/dev/sdXX). This section will provide
several commands that should be run on all nodes in the cluster to
verify that the FireWire drive was successfully detected and is being shared by
all nodes in the cluster.
First, let's check to see that the FireWire adapter was successfully detected:
# lspci
00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
00:02.0 VGA compatible controller: Intel Corp. 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)
00:1d.0 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corp. 82801DB (ICH4) USB UHCI #3 (rev 01)
00:1d.7 USB Controller: Intel Corp. 82801DB (ICH4) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corp. 82801DB (ICH4) LPC Bridge (rev 01)
00:1f.1 IDE interface: Intel Corp. 82801DB (ICH4) Ultra ATA 100 Storage Controller (rev 01)
00:1f.3 SMBus: Intel Corp. 82801DB/DBM (ICH4) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corp. 82801DB (ICH4) AC'97 Audio Controller (rev 01)
01:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
01:05.0 Modem: Intel Corp.: Unknown device 1080 (rev 04)
01:06.0 Ethernet controller: Linksys NC100 Network Everywhere Fast Ethernet 10/100 (rev 11)
01:09.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)
Second, let's check to see that the modules are loaded:
# lsmod |egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
sd_mod 13808 0
sbp2 19724 0
scsi_mod 106664 3 [sg sd_mod sbp2]
ohci1394 28008 0 (unused)
ieee1394 62916 0 [sbp2 ohci1394]
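The same check can be scripted so that it reports only what is missing. This is a sketch that assumes the five module names used in the egrep above; the function name missing_firewire_modules is hypothetical.

```shell
# missing_firewire_modules: read `lsmod` output on stdin and print the name
# of each required module that is not loaded. No output means all five
# modules are present.
missing_firewire_modules() {
    # The first column of lsmod output is the module name
    loaded=$(awk '{ print $1 }')
    for mod in ieee1394 ohci1394 sbp2 scsi_mod sd_mod; do
        if ! printf '%s\n' "$loaded" | grep -qx "$mod"; then
            echo "$mod"
        fi
    done
}
```

Run it as `lsmod | missing_firewire_modules` on each node.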
Third, let's make sure the disk was detected and an entry was made by the kernel:
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: Maxtor Model: OneTouch Rev: 0200
Type: Direct-Access ANSI SCSI revision: 06
Now let's verify that the FireWire drive is accessible for multiple logins and
shows a valid login:
# dmesg | grep sbp2
ieee1394: sbp2: Query logins to SBP-2 device successful
ieee1394: sbp2: Maximum concurrent logins supported: 3
ieee1394: sbp2: Number of active logins: 1
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node[01:1023]: Max speed [S400] - Max payload [2048]
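The "Maximum concurrent logins supported" line is the one to watch. As a sketch, and assuming each node in the cluster uses one login to the drive, the value can be pulled out of the dmesg output and compared with the number of nodes. Both function names are hypothetical.

```shell
# max_sbp2_logins: read `dmesg` output on stdin and print the number from
# the last "Maximum concurrent logins supported" line, if any.
max_sbp2_logins() {
    sed -n 's/.*sbp2: Maximum concurrent logins supported: \([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# can_share_drive NODES: read `dmesg` output on stdin and return 0 if the
# drive reports support for at least NODES concurrent logins.
can_share_drive() {
    nodes=$1
    max=$(max_sbp2_logins)
    [ -n "$max" ] && [ "$max" -ge "$nodes" ]
}
```

For the two-node cluster in this article: `dmesg | can_share_drive 2 && echo "drive supports both nodes"`.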
Finally, run fdisk -l on each node to confirm that the O/S sees the drive:
# fdisk -l
Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 24791 199133676 c Win95 FAT32 (LBA)
Disk /dev/hda: 40.0 GB, 40000000000 bytes
255 heads, 63 sectors/track, 4863 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/hda1 * 1 13 104391 83 Linux
/dev/hda2 14 4609 36917370 83 Linux
/dev/hda3 4610 4863 2040255 82 Linux swap
Rescan SCSI bus no longer required:
With Red Hat Enterprise Linux 3 (and you guessed it, White Box Enterprise Linux),
you no longer need to rescan the SCSI bus in order
to detect the disk! The disk should be detected automatically by
the kernel as seen from the tests you performed above.
Previously, the device had to be added manually with a command like the following:
echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi
Troubleshooting SCSI Device Detection:
If you are having trouble detecting the SCSI device with any of the
procedures above, you can try reloading the FireWire modules:
# modprobe -r sbp2
# modprobe -r sd_mod
# modprobe -r ohci1394
# modprobe ohci1394
# modprobe sd_mod
# modprobe sbp2
Perform the following procedures on all nodes in the cluster!
I will be using the Oracle Cluster File System (OCFS) to store the files required to be shared
for the Oracle Cluster Ready Services (CRS).
When using OCFS, the UID of the UNIX user "oracle" and GID of the UNIX group "dba" must be the same
on all machines in the cluster. If either the UID or GID is different, the files on the OCFS file system
will show up as "unowned" or may even be owned by a different user. For this article, I will use
175 for the "oracle" UID and 115 for the "dba" GID.
Create Group and User for Oracle
Let's continue this example by creating the UNIX
dba group
and oracle user account along with all appropriate directories.
# mkdir -p /u01/app
# groupadd -g 115 dba
# useradd -u 175 -g 115 -d /u01/app/oracle -s /bin/bash -c "Oracle Software Owner" -p oracle oracle
# chown -R oracle:dba /u01
# passwd oracle
# su - oracle
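Because the UID and GID must match on every node, it can help to print them in one line and compare the result across machines. The following is a sketch that reads the standard passwd and group file formats; the function name oracle_ids is hypothetical.

```shell
# oracle_ids [PASSWD_FILE] [GROUP_FILE]
# Print "uid=<n> gid=<n>" for the oracle user and the dba group, read from
# passwd- and group-format files (defaults: /etc/passwd and /etc/group).
oracle_ids() {
    passwd_file=${1:-/etc/passwd}
    group_file=${2:-/etc/group}
    # Field 3 is the numeric ID in both file formats
    uid=$(awk -F: '$1 == "oracle" { print $3 }' "$passwd_file")
    gid=$(awk -F: '$1 == "dba" { print $3 }' "$group_file")
    echo "uid=$uid gid=$gid"
}
```

With the values used in this article, every node should print `uid=175 gid=115`.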
When you are setting the Oracle environment variables for each RAC node, make sure to
assign each RAC node a unique Oracle SID!
The Oracle Universal Installer (OUI) can require up to 400MB of free space
in the /tmp directory. To check the space available in /tmp, run:
# df -k /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda2 36337384 4691460 29800056 14% /
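The "Available" column is the number to compare against the 400MB figure. As a sketch, the comparison can be done directly on the df output; the function name enough_tmp_space is hypothetical, and the default threshold of 409600 is simply 400MB expressed in 1K blocks.

```shell
# enough_tmp_space [MIN_KB]
# Read `df -Pk` output on stdin and return 0 if the Available column of the
# last line is at least MIN_KB (default: 409600, i.e. 400MB in 1K blocks).
enough_tmp_space() {
    min_kb=${1:-409600}
    # Skip the header line; column 4 is "Available"
    avail_kb=$(awk 'NR > 1 { avail = $4 } END { print avail }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}
```

For example: `df -Pk /tmp | enough_tmp_space || echo "not enough space in /tmp"`. The -P option keeps each file system on a single line, which the column lookup relies on.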
If, for any reason, you do not have enough space in /tmp, you
can temporarily create space in another file system and point
TEMP and TMPDIR to it for the duration of the install.
Here are the steps to do this:
# su -
# mkdir /<AnotherFilesystem>/tmp
# chown root:root /<AnotherFilesystem>/tmp
# chmod 1777 /<AnotherFilesystem>/tmp
# export TEMP=/<AnotherFilesystem>/tmp # used by Oracle
# export TMPDIR=/<AnotherFilesystem>/tmp # used by Linux programs
# like the linker "ld"
When the installation of Oracle is complete, you can remove the temporary directory using the following:
# su -
# rmdir /<AnotherFilesystem>/tmp
# unset TEMP
# unset TMPDIR
Create Login Script for oracle User Account
After creating the "oracle" UNIX user account on both nodes, make sure
that you are logged in as the oracle user and
verify that the environment is set up correctly by using the
following
.bash_profile:
.bash_profile for Oracle User
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
alias ls="ls -FA"
# User specific environment and startup programs
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/10.1.0/db_1
export ORA_CRS_HOME=$ORACLE_BASE/product/10.1.0/crs
export ORACLE_PATH=$ORACLE_BASE/common/oracle/sql:.:$ORACLE_HOME/rdbms/admin
# Each RAC node must have a unique ORACLE_SID. (i.e. orcl1, orcl2,...)
export ORACLE_SID=orcl1
export PATH=.:${PATH}:$HOME/bin:$ORACLE_HOME/bin
export PATH=${PATH}:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
export PATH=${PATH}:$ORACLE_BASE/common/oracle/bin
export ORACLE_TERM=xterm
export TNS_ADMIN=$ORACLE_HOME/network/admin
export ORA_NLS10=$ORACLE_HOME/nls/data
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export CLASSPATH=$ORACLE_HOME/JRE
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/rdbms/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
export THREADS_FLAG=native
export TEMP=/tmp
export TMPDIR=/tmp
export LD_ASSUME_KERNEL=2.4.1
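The profile above hard-codes ORACLE_SID=orcl1, so it has to be edited by hand on the second node. One possible alternative, shown here only as a sketch, is to derive the SID from the trailing number in the host name (linux1 gives orcl1, linux2 gives orcl2). The function name sid_for_host and this naming convention are assumptions for illustration, not something the installer requires.

```shell
# sid_for_host [HOSTNAME]
# Print "orcl<N>", where <N> is the trailing number of the short host name.
# Defaults to the current host. Returns 1 if the name has no trailing number.
sid_for_host() {
    host=${1:-$(hostname)}
    host=${host%%.*}    # drop any domain part
    suffix=$(printf '%s\n' "$host" | sed -n 's/^.*[^0-9]\([0-9][0-9]*\)$/\1/p')
    if [ -z "$suffix" ]; then
        echo "cannot derive a node number from '$host'" >&2
        return 1
    fi
    echo "orcl$suffix"
}
```

In a shared .bash_profile this could be used as `export ORACLE_SID=$(sid_for_host)`.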
Create the following partitions on only one node in the cluster!
Overview
The next step is to create the required partitions on the FireWire (shared) drive. As mentioned
earlier in this article, I will be using Oracle's Cluster File System (OCFS) to
store the two files to be shared for Oracle's Cluster Ready Service (CRS). I will then be
using Automatic Storage Management (ASM) for all physical database files (data/index files,
online redo log files, control files, SPFILE, and archived redo log files).
Oracle Shared Drive Configuration
File System Type  Partition  Size      Mount Point        File Types
OCFS              /dev/sda1  300 MB    /u02/oradata/orcl  Oracle Cluster Registry (OCR) File - (~100 MB)
                                                          CRS Voting Disk - (~20 MB)
ASM               /dev/sda2  50 GB     ORCL:VOL1          Oracle Database Files
ASM               /dev/sda3  50 GB     ORCL:VOL2          Oracle Database Files
ASM               /dev/sda4  50 GB     ORCL:VOL3          Oracle Database Files
Total                        150.3 GB
As shown in the table above, my FireWire drive shows up as the SCSI device
/dev/sda. The fdisk command is used for creating (and removing) partitions.
For this configuration, I will be creating four partitions - one for CRS and the other
three for ASM (to store all Oracle database files). Before creating the new partitions, it is
important to remove any existing partitions (if they exist) on the FireWire drive:
# fdisk /dev/sda
Command (m for help): p
Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 24791 199133676 c Win95 FAT32 (LBA)
Command (m for help): d
Selected partition 1
Command (m for help): p
Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-24792, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-24792, default 24792): +300M
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (38-24792, default 38): 38
Using default value 38
Last cylinder or +size or +sizeM or +sizeK (38-24792, default 24792): +50G
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (6118-24792, default 6118): 6118
Using default value 6118
Last cylinder or +size or +sizeM or +sizeK (6118-24792, default 24792): +50G
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Selected partition 4
First cylinder (12198-24792, default 12198): 12198
Using default value 12198
Last cylinder or +size or +sizeM or +sizeK (12198-24792, default 24792): +50G
Command (m for help): p
Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 37 297171 83 Linux
/dev/sda2 38 6117 48837600 83 Linux
/dev/sda3 6118 12197 48837600 83 Linux
/dev/sda4 12198 18277 48837600 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
After creating all required partitions, you should now
inform the kernel of the partition changes using
the following syntax as the "root" user account:
# partprobe
# fdisk -l /dev/sda
Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 37 297171 83 Linux
/dev/sda2 38 6117 48837600 83 Linux
/dev/sda3 6118 12197 48837600 83 Linux
/dev/sda4 12198 18277 48837600 83 Linux
The FireWire drive (and partitions created) will be exposed
as a SCSI device.
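The "Blocks" column in the fdisk listing follows directly from the cylinder range and the disk geometry (255 heads and 63 sectors per track, so 16065 sectors of 512 bytes per cylinder). The sketch below reproduces that arithmetic as a sanity check; the function name partition_blocks is hypothetical, and it assumes the geometry shown above, where a partition starting at cylinder 1 begins after the first track of 63 sectors.

```shell
# partition_blocks START_CYL END_CYL
# Print the size in 1K blocks of a partition spanning the given cylinders,
# assuming 16065 sectors (of 512 bytes) per cylinder.
partition_blocks() {
    start=$1
    end=$2
    sectors=$(( (end - start + 1) * 16065 ))
    # A partition that starts at cylinder 1 skips the first track
    # (63 sectors), which holds the partition table.
    if [ "$start" -eq 1 ]; then
        sectors=$(( sectors - 63 ))
    fi
    # Two 512-byte sectors make one 1K block
    echo $(( sectors / 2 ))
}
```

For example, `partition_blocks 1 37` prints 297171 and `partition_blocks 38 6117` prints 48837600, matching the values listed for /dev/sda1 and /dev/sda2.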
Reboot All Nodes in RAC Cluster
After creating the partitions, it is recommended that you
reboot all RAC nodes to make sure that the kernel on each node
recognizes the new partitions:
# su -
# reboot
It is not mandatory to reboot each node. However, I have seen issues
when not recycling each machine. After each machine is back up, verify
on every node that the partitions are visible:
# fdisk -l /dev/sda
Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 37 297171 83 Linux
/dev/sda2 38 6117 48837600 83 Linux
/dev/sda3 6118 12197 48837600 83 Linux
/dev/sda4 12198 18277 48837600 83 Linux
Perform the following configuration procedures on all nodes in the cluster!
Several of the commands within this section will need to be performed on every node within the cluster
every time the machine is booted. This section provides very detailed information about setting shared memory,
semaphores, and file handle limits. Instructions for placing them in
a startup script (/etc/rc.local) are included in section
"All Startup Commands for Each RAC Node".
Overview
This section focuses on configuring both Linux servers -
getting each one prepared for the Oracle10g RAC installation. This includes
verifying that there is enough swap space, setting shared memory and semaphores, and finally
setting the maximum number of file handles for the O/S.
Swap Space Considerations
An inadequate amount of swap during the installation
will cause the Oracle Universal Installer to either "hang" or "die".
If you are short on swap, you can add temporary swap space by creating a swap file:
# dd if=/dev/zero of=tempswap bs=1k count=300000
# chmod 600 tempswap
# mke2fs tempswap
# mkswap tempswap
# swapon tempswap
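Before and after adding a temporary swap file, it is useful to see how much memory and swap the system has in total. As a sketch, the two figures can be read from /proc/meminfo, which reports them in kB on the MemTotal and SwapTotal lines; the function name mem_plus_swap_mb is hypothetical.

```shell
# mem_plus_swap_mb [FILE]
# Print the sum of MemTotal and SwapTotal, in MB, from a meminfo-format
# file (default: /proc/meminfo, which reports both values in kB).
mem_plus_swap_mb() {
    meminfo=${1:-/proc/meminfo}
    awk '$1 == "MemTotal:" || $1 == "SwapTotal:" { kb += $2 }
         END { print int(kb / 1024) }' "$meminfo"
}
```

Running `mem_plus_swap_mb` again after the swapon command should show the total grow by roughly the size of the tempswap file.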
Setting Shared Memory
Shared memory allows processes to access common structures and data by placing
them in a shared memory segment. This is the fastest form of Inter-Process Communications
(IPC) available - mainly due to the fact that no kernel involvement occurs when data is
being passed between the processes. Data does not need to be copied between processes.
# ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 32768
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1
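Note that ipcs reports the maximum segment size in kilobytes, while SHMMAX is expressed in bytes. A small sketch of the conversion is shown below; the function name shmmax_bytes is hypothetical.

```shell
# shmmax_bytes: read `ipcs -lm` output on stdin and print the maximum
# shared memory segment size converted from kbytes to bytes.
shmmax_bytes() {
    awk -F= '/max seg size \(kbytes\)/ { printf "%.0f\n", $2 * 1024 }'
}
```

For the output above, `ipcs -lm | shmmax_bytes` prints 33554432, that is, 32768 kbytes expressed in bytes.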
The SHMMAX parameter defines the maximum size (in bytes) for
a shared memory segment. The Oracle SGA is comprised of shared memory and
it is possible that incorrectly setting SHMMAX could limit the
size of the SGA. When setting SHMMAX, keep in mind that the size of the
SGA should fit within one shared memory segment. An inadequate SHMMAX setting
could result in the following: