DBA Tips Archive for Oracle

  


Remove a Node from an Existing Oracle RAC 10g R1 Cluster on Linux - (FireWire)

by Jeff Hunter, Sr. Database Administrator


Contents

  1. Overview
  2. Remove the Instance
  3. Remove the Node from the Cluster



Overview

With any RAC configuration, it is common for the DBA to encounter a scenario where he or she needs to remove a node from the RAC environment. It may be that a server is being underutilized in the cluster and could be better used in another business unit. Another scenario is a node failure. In this case, a node can be removed from the cluster while the remaining nodes continue to service ongoing requests.

This document is an extension to two articles: "Building an Inexpensive Oracle10g RAC Configuration on Linux - (WBEL 3.0)" and "Adding a Node to an Oracle10g RAC Cluster - (WBEL 3.0)". Contained in this document are the steps to remove a single node (the third node I added in the second article) from an already running and configured Oracle10g RAC environment.

This article assumes the following:

  1. Three-node Oracle10g Environment: As I noted previously, this article assumes that the reader has already built and configured a three-node Oracle10g RAC environment. This system consists of a three-node cluster (each node with a single processor), all three running Linux (White Box Enterprise Linux 3.0 Respin 1 or Red Hat Enterprise Linux 3) with shared disk storage based on IEEE1394 (FireWire) drive technology.

  2. Node to be Removed is Available: The node to be removed in this example is available and running within the cluster. Of the three nodes in the current RAC configuration, I will be removing linux3.

  3. FireWire Hub: The enclosure for the Maxtor One Touch 250GB USB 2.0 / Firewire External Hard Drive has only two IEEE1394 (FireWire) ports on the back. To configure a three-node cluster, I needed to purchase a FireWire hub. The one I used for this article is a BELKIN F5U526-WHT White External 6-Port Firewire Hub with AC Adapter.

This document provides the steps for removing the node's metadata from the cluster registry. The node being removed can easily be added back to the cluster at a later time.

If a node needs to be removed from an Oracle10g RAC database, even if the node will no longer be available to the environment, there is a certain amount of cleanup that needs to be done. The remaining nodes need to be informed of the change of status of the departing node.

The three most important steps that need to be followed, and which will be discussed in this article, are:

  1. Remove the instance using DBCA (preferred) or command-line (using srvctl).
  2. Remove the node from the cluster.
  3. Reconfigure the OS and remaining hardware.

For the purpose of this example, I have a three-node Oracle10g cluster:

Oracle10g RAC Configuration

Node Name   IP Address      Instance Name   Using ASM   ASM Instance Name   Status
linux1      192.168.1.100   orcl1           Yes         +ASM1               Available
linux2      192.168.1.101   orcl2           Yes         +ASM2               Available
linux3      192.168.1.107   orcl3           Yes         +ASM3               To be removed

I will be removing node linux3, along with all metadata associated with it. Most of the operations to remove the node from the cluster will need to be performed from a pre-existing node that is available and will remain in the cluster. For this article, I will be performing all of these actions from linux1 to remove linux3.



Remove the Instance

When removing a node from an Oracle10g RAC cluster, the DBA will first need to remove the instance that is (or was) accessing the clustered database. This includes the ASM instance if the database is making use of Automatic Storage Management. Most of the actions to remove the instance need to be performed on a pre-existing node in the cluster that is available and will remain available after the removal.

For this section, I will be removing the instance(s) on linux3 and performing all of these operations from linux1.

This section provides two ways to remove the instance(s): using the DBCA or the command line (srvctl). When possible, always attempt to use the DBCA method.
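
Before starting, it does not hurt to confirm the current status of the clustered database and of the ASM instance on the departing node. For example, assuming the clustered database is named orcl and the node being removed is linux3:

$ srvctl status database -d orcl
$ srvctl status asm -n linux3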


Using DBCA

The following steps can be used to remove an Oracle10g instance from a clustered database using DBCA - even if the instance on the node is not available.

  1. First, verify that you have a good backup of the Oracle Configuration Repository (OCR) using ocrconfig:
    $ ocrconfig -showbackup
    
    int-linux1     2005/05/25 10:01:46     /u01/app/oracle/product/10.1.0/crs/cdata/crs
    
    int-linux1     2005/05/25 06:01:45     /u01/app/oracle/product/10.1.0/crs/cdata/crs
    
    int-linux1     2005/05/25 02:01:45     /u01/app/oracle/product/10.1.0/crs/cdata/crs
    
    int-linux1     2005/05/24 00:02:48     /u01/app/oracle/product/10.1.0/crs/cdata/crs
    
    int-linux1     2005/05/23 20:02:47     /u01/app/oracle/product/10.1.0/crs/cdata/crs
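
    The automatic backups listed above are adequate for this exercise. If you would like an extra safeguard before making changes, ocrconfig can also export the contents of the OCR to a file of your choosing (the file name below is arbitrary); run it as the root user:
    # ocrconfig -export /u01/app/oracle/ocr_before_node_removal.dmp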

  2. Next, run the DBCA from one of the nodes you are going to keep. The database should remain up, and the departing instance should also be left up and running (if it is available).
    $ dbca &
    Within the DBCA, perform the following steps:

    1. Choose "Oracle Real Application Clusters database" and click [Next].
    2. Choose "Instance Management" and click [Next].
    3. Choose "Delete an instance" and click [Next].
    4. On the next screen, select the cluster database from which you want to remove the instance. Supply a username and password with the SYSDBA privilege and click [Next].
    5. On the next screen, a list of cluster database instances will appear. Highlight the instance you would like to delete (orcl3 on linux3 in my example) and click [Next].
    6. If you have services configured, they will need to be reassigned. Modify each service so that it can run on one of the remaining instances, and set the instance that is to be deleted to "not used" for each service. Click [Finish].
    7. Acknowledge the dialog box by clicking [Ok] when asked to confirm you want to delete the selected instance.
    8. Acknowledge the second dialog by clicking [Ok] when asked to confirm that the DBCA will remove the Oracle instance and its associated OFA directory structure. All information about this instance will be deleted.

        If the database is in archive log mode, the DBA may receive the following errors:

      ORA-00350 or ORA-00312

      This may occur because the DBCA cannot drop the current log, as it still needs to be archived. This issue is fixed in the 10.1.0.3 patch set. If the DBA encounters this error, click the [Ignore] button and, when the DBCA completes, manually archive the logs for the deleted instance and drop the log group:

      SQL> alter system archive log all;
      SQL> alter database drop logfile group 3;

    9. After the DBCA has removed the instance, click [No] when prompted to perform another operation. The DBCA will exit.

  3. Verify that the redo thread for the dropped instance has been removed by querying v$log:
    SQL> select group#, thread#, status from v$log;
    
        GROUP#    THREAD# STATUS
    ---------- ---------- ----------------
             1          1 CURRENT
             2          1 INACTIVE
             3          2 CURRENT
             4          2 INACTIVE

    If for any reason the redo thread is not disabled, then disable it:

    SQL> alter database disable public thread 3;
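
    As an additional check, the status of each redo thread can be queried from v$thread; the thread used by the deleted instance (thread 3 in this example) should either no longer appear or show as disabled:
    SQL> select thread#, status, enabled from v$thread;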

  4. Verify that the instance was removed from the Oracle Configuration Repository (OCR) using the srvctl config database -d <db_name> command. The following example assumes the name of the clustered database is orcl:
    $ srvctl config database -d orcl
    linux1 orcl1 /u01/app/oracle/product/10.1.0/db_1
    linux2 orcl2 /u01/app/oracle/product/10.1.0/db_1
    You should also run the crs_stat command:
    $ $ORA_CRS_HOME/bin/crs_stat | grep ins
    NAME=ora.orcl.orcl1.inst
    NAME=ora.orcl.orcl2.inst

  5. If the node had an ASM instance and the node will no longer be a part of the cluster, the DBA should remove the ASM instance using the following, assuming the node being removed is linux3:
    $ srvctl stop asm -n linux3
    $ srvctl remove asm -n linux3

    Verify that the ASM instance was removed using the following:

    $ srvctl config asm -n linux3
    If the removal of the ASM instance was successful, you should simply get your prompt back with no output. If, however, you receive a record back (e.g. +ASM3 /u01/app/oracle/product/10.1.0/db_1), then the removal of the ASM instance failed.


Using SRVCTL

The following steps can be used to remove an Oracle10g instance from a clustered database using the command-line utility srvctl - even if the instance on the node is not available.
  1. First, verify that you have a good backup of the Oracle Configuration Repository (OCR) using ocrconfig:
    $ ocrconfig -showbackup
    
    int-linux1     2005/05/25 10:01:46     /u01/app/oracle/product/10.1.0/crs/cdata/crs
    
    int-linux1     2005/05/25 06:01:45     /u01/app/oracle/product/10.1.0/crs/cdata/crs
    
    int-linux1     2005/05/25 02:01:45     /u01/app/oracle/product/10.1.0/crs/cdata/crs
    
    int-linux1     2005/05/24 00:02:48     /u01/app/oracle/product/10.1.0/crs/cdata/crs
    
    int-linux1     2005/05/23 20:02:47     /u01/app/oracle/product/10.1.0/crs/cdata/crs

  2. Use the srvctl command-line utility from a pre-existing / available node in the cluster to remove the instance on the node being removed. This should be run as the oracle UNIX user account as follows:
    $ srvctl remove instance -d orcl -i orcl3
    Remove instance orcl3 for the database orcl? (y/[n]) y
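
    Note that if the departing instance (orcl3) was still up and running, it should be shut down before running the remove command above; for example:
    $ srvctl stop instance -d orcl -i orcl3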

  3. Verify that the redo thread for the dropped instance has been removed by querying v$log (as shown in the DBCA section above). If for any reason the redo thread is not disabled, then disable it:
    SQL> alter database disable public thread 3;

  4. Verify that the instance was removed from the Oracle Configuration Repository (OCR) using the srvctl config database -d <db_name> command:
    $ srvctl config database -d orcl
    linux1 orcl1 /u01/app/oracle/product/10.1.0/db_1
    linux2 orcl2 /u01/app/oracle/product/10.1.0/db_1
    You should also run the crs_stat command:
    $ $ORA_CRS_HOME/bin/crs_stat | grep ins
    NAME=ora.orcl.orcl1.inst
    NAME=ora.orcl.orcl2.inst

  5. If the node had an ASM instance and the node will no longer be a part of the cluster, the DBA should remove the ASM instance using the following, assuming the clustered database is named orcl and the node being removed is linux3:
    $ srvctl stop asm -n linux3
    $ srvctl remove asm -n linux3

    Verify that the ASM instance was removed using the following:

    $ srvctl config asm -n linux3
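
    As with the DBCA method, a successful removal should simply return the prompt with no output. You can also double-check that no ASM resource remains registered in CRS for linux3 (only the +ASM1 and +ASM2 resources should be listed):
    $ $ORA_CRS_HOME/bin/crs_stat | grep -i asm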



Remove the Node from the Cluster

Now that the instance has been removed (and the ASM instance, if applicable), we need to remove the node from the cluster. This is a manual process that uses scripts which need to be run on the node being deleted (if available) to remove the CRS install, as well as scripts that should be run from one of the existing nodes (i.e. linux1).

Before proceeding to the steps for removing the node, we need to determine the node name and the CRS-assigned node number for each node stored in the Oracle Cluster Registry. The olsnodes command below can be run from any of the existing nodes (linux1 for this example).

$ $ORA_CRS_HOME/bin/olsnodes -n
linux1  1
linux2  2
linux3  3

Now that we have the node name and node number, we can start the steps to remove the node from the cluster. Here are the steps that should be executed from a pre-existing (available) node in the cluster (i.e. linux1):

  1. Run the NETCA utility to remove the network configuration:
    $ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY
    $ netca &
    Perform the following steps within the NETCA:

    1. Choose "Cluster Configuration" and click [Next].
    2. Only select the node you are removing and click [Next].
    3. Choose "Listener Configuration" and click [Next].
    4. Choose "Delete" and delete any listeners configured on the node you are removing. Acknowledge the dialog box to delete the listener configuration.

      NOTE: For some reason, I needed to log in to linux3 and manually kill the listener process.
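
      If you run into the same issue, one way to handle it is to log in to linux3 as the oracle user, locate the leftover listener process, and kill it by hand (the process ID below is a placeholder):
      $ ps -ef | grep tnslsnr | grep -v grep
      $ kill <pid_of_listener_process>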

  2. Run the crs_stat command to verify that all database resources are running on nodes that are going to be kept:
    $ $ORA_CRS_HOME/bin/crs_stat
    For example, verify that the node to be removed is not running any database resources. Look for the record of type:
    NAME=ora.<db_name>.db
    TYPE=application
    TARGET=ONLINE
    STATE=ONLINE on <node>
    Assuming the name of the clustered database is orcl, this is the record that was returned from the crs_stat command on my system:
    NAME=ora.orcl.db
    TYPE=application
    TARGET=ONLINE
    STATE=ONLINE on linux1
    I am safe here since the resource is running on linux1 and not linux3 - the node I want to remove.

    If, however, the database resource was running on linux3, we would need to relocate it to a node that we are going to keep (i.e. linux1) using the following:

    $ $ORA_CRS_HOME/bin/crs_relocate ora.<db_name>.db

  3. From a pre-existing node (i.e. linux1), remove the nodeapps from the node you are removing as the root UNIX user account:
    $ su
    Password: xxxxx
    
    # srvctl stop nodeapps -n linux3
    CRS-0210: Could not find resource ora.linux3.LISTENER_LINUX3.lsnr.
    
    # srvctl remove nodeapps -n linux3
    Please confirm that you intend to remove the node-level applications on node linux3 (y/[n]) y
    #
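
    To confirm that no resources remain registered for the departing node, grep the CRS resource listing for linux3; no records should be returned:
    $ $ORA_CRS_HOME/bin/crs_stat | grep -i linux3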

  4. The next step is to update the node list, as the oracle user, using the OUI's updateNodeList option. This procedure removes the node to be deleted from the list of node locations maintained by the OUI by listing only the remaining nodes. The only file that I know of that gets modified is $ORACLE_BASE/oraInventory/ContentsXML/inventory.xml. Here is the command I used for removing linux3 from the list. Notice that the DISPLAY variable needs to be set even though the GUI does not run.
    $ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY
    
    $ $ORACLE_HOME/oui/bin/runInstaller -ignoreSysPrereqs -updateNodeList \
    ORACLE_HOME=/u01/app/oracle/product/10.1.0/db_1 \
    CLUSTER_NODES=linux1,linux2
    Note that the command above will produce the following error, which can safely be ignored:
    PRKC-1002 : All the submitted commands did not execute successfully 
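
    A quick way to confirm that the node list was updated is to check the inventory file mentioned above; linux3 should no longer appear in the node list for this Oracle home:
    $ grep -i "node name" $ORACLE_BASE/oraInventory/ContentsXML/inventory.xml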

  5. If the node to be removed is still available and running the CRS stack, the DBA will need to stop the CRS stack and remove the ocr.loc file. These tasks should be performed as the root user account and on the node that is to be removed from the cluster. The nosharedvar option assumes the ocr.loc file is not on a shared file system (which is the case in my example). If the file does exist on a shared file system, then specify sharedvar. From the node to be removed (i.e. linux3) and as the root user, run the following:
    $ su
    Password: xxxx
    
    # cd $ORA_CRS_HOME/install
    # ./rootdelete.sh remote nosharedvar
    Running Oracle10 root.sh script...
    \nThe following environment variables are set as:
        ORACLE_OWNER= oracle
        ORACLE_HOME=  /u01/app/oracle/product/10.1.0/crs
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    Shutting down Oracle Cluster Ready Services (CRS):
    /etc/init.d/init.crsd: line 188: 29017 Aborted                 $ORA_CRS_HOME/bin/crsd -2
    
    Shutting down CRS daemon.
    Shutting down EVM daemon.
    Shutting down CSS daemon.
    Shutdown request successfully issued.
    Checking to see if Oracle CRS stack is down...
    Oracle CRS stack is not running.
    Oracle CRS stack is down now.
    Removing script for Oracle Cluster Ready services
    Removing OCR location file '/etc/oracle/ocr.loc'
    Cleaning up SCR settings in '/etc/oracle/scls_scr/linux3'
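
    At this point, the CRS daemons should no longer be running on linux3. A quick check (still as root on linux3) should return no output:
    # ps -ef | grep -E 'crsd|evmd|ocssd' | grep -v grep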

  6. Next, using the node name and CRS-assigned node number for the node to be deleted, run the rootdeletenode.sh command as follows. Keep in mind that this command should be run from a pre-existing / available node (i.e. linux1) in the cluster as the root UNIX user account:
    $ su
    Password: xxxx
    
    # cd $ORA_CRS_HOME/install
    # ./rootdeletenode.sh linux3,3
    Running Oracle10 root.sh script...
    \nThe following environment variables are set as:
        ORACLE_OWNER= oracle
        ORACLE_HOME=  /u01/app/oracle/product/10.1.0/crs
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    clscfg: EXISTING configuration version 2 detected.
    clscfg: version 2 is 10G Release 1.
    Successfully deleted 13 values from OCR.
    Key SYSTEM.css.interfaces.nodelinux3 marked for deletion is not there. Ignoring.
    Successfully deleted 5 keys from OCR.
    Node deletion operation successful.
    'linux3,3' deleted successfully

    To verify that the node was successfully removed, use the following as either the oracle or root user:

    $ $ORA_CRS_HOME/bin/olsnodes -n
    linux1  1
    linux2  2

  7. Now, switch back to the oracle UNIX user account on the same pre-existing node (linux1) and run the runInstaller command to update the OUI node list again, this time for the CRS installation ($ORA_CRS_HOME). This procedure removes the node to be deleted from the list of node locations maintained by the OUI by listing only the remaining nodes. The only file that I know of that gets modified is $ORACLE_BASE/oraInventory/ContentsXML/inventory.xml. Here is the command I used for removing linux3 from the list. Notice that the DISPLAY variable needs to be set even though the GUI does not run.
    $ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY
    
    $ $ORA_CRS_HOME/oui/bin/runInstaller -ignoreSysPrereqs -updateNodeList \
    ORACLE_HOME=/u01/app/oracle/product/10.1.0/crs \
    CLUSTER_NODES=linux1,linux2
    Note that the command above will produce the following error, which can safely be ignored:
    PRKC-1002 : All the submitted commands did not execute successfully 

    The OUI now contains the valid nodes that are part of the cluster!

  8. Now that the node has been removed from the cluster, the DBA should manually remove all Oracle10g RAC installation files from the deleted node. Obviously, this applies only if the removed node is still accessible and only if the files are not on a shared file system that is still being accessed by other nodes in the cluster!

    From the deleted node (linux3) I performed the following tasks as the root UNIX user account:

    1. Remove ORACLE_HOME and ORA_CRS_HOME:
      # rm -rf /u01/app/oracle/product/10.1.0/db_1
      # rm -rf /u01/app/oracle/product/10.1.0/crs

    2. Remove all init scripts and soft links (for Linux). For a list of init scripts and soft links for other UNIX platforms, see Metalink Note 269320.1.
      # rm -f /etc/init.d/init.cssd
      # rm -f /etc/init.d/init.crs
      # rm -f /etc/init.d/init.crsd
      # rm -f /etc/init.d/init.evmd
      # rm -f /etc/rc2.d/K96init.crs
      # rm -f /etc/rc2.d/S96init.crs
      # rm -f /etc/rc3.d/K96init.crs
      # rm -f /etc/rc3.d/S96init.crs
      # rm -f /etc/rc5.d/K96init.crs
      # rm -f /etc/rc5.d/S96init.crs
      # rm -Rf /etc/oracle/scls_scr

    3. Remove all remaining files:
      # rm -rf /etc/oracle
      # rm -f /etc/oratab
      # rm -f /etc/oraInst.loc
      # rm -rf /etc/ORCLcluster
      # rm -rf /u01/app/oracle/oraInventory
      # rm -rf /u01/app/oracle/product
      # rm -rf /u01/app/oracle/admin
      # rm -f /usr/local/bin/coraenv
      # rm -f /usr/local/bin/dbhome
      # rm -f /usr/local/bin/oraenv

    4. Remove all CRS/EVM entries from the file /etc/inittab:
      h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
      h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
      h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
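
      One way to remove these entries (a sketch, assuming the GNU sed shipped with RHEL 3 / WBEL 3.0) is to back up the file, strip the three lines, and have init re-read its configuration:
      # cp /etc/inittab /etc/inittab.orig
      # sed -i '/init\.evmd/d;/init\.cssd/d;/init\.crsd/d' /etc/inittab
      # init q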


Copyright (c) 1998-2017 Jeffrey M. Hunter. All rights reserved.

All articles, scripts and material located at the Internet address of http://www.idevelopment.info is the copyright of Jeffrey M. Hunter and is protected under copyright laws of the United States. This document may not be hosted on any other site without my express, prior, written permission. Application to host any of the material elsewhere can be made by contacting me at jhunter@idevelopment.info.

I have made every effort and taken great care in making sure that the material included on my web site is technically accurate, but I disclaim any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on it. I will in no case be liable for any monetary damages arising from such loss, damage or destruction.

Last modified on
Saturday, 18-Sep-2010 17:44:09 EDT