Creating Metadevices - (Using DiskSuite 4.2.1 Commands)

by Jeff Hunter, Sr. Database Administrator

Contents

  1. Overview
  2. Examining the Disks In Our Example
  3. Partitioning the Disks
  4. Metadevice State Database - (State Database Replicas)
  5. Creating a Stripe - (RAID 0)
  6. Creating a Concatenation - (RAID 0)
  7. Creating Mirrors - (RAID 1)
  8. Creating a RAID 5 Volume - (RAID 5)
  9. Creating a Trans Metadevice
  10. Creating Hot Spare

Overview

This article provides a comprehensive overview for creating DiskSuite metadevices (Stripes, Concatenations, Mirrors, RAID5, and Hot Spares) using the DiskSuite command-line tools. Most of the information can also be found in the "Solstice DiskSuite 4.2.1 User's Guide" (Part Number 806-3205-10).

Examining the Disks In Our Example

This article is all about providing definitions and examples of DiskSuite's command line tools.

For all examples in this document, I will be using a Sun Blade 150 connected to a Sun StorEdge D1000 Disk Array containing twelve 9.1GB / 10000 RPM / UltraSCSI disk drives for a total disk array capacity of 108GB. The disk array is connected to the Sun Blade 150 using a Dual Differential Ultra/Wide SCSI (X6541A) host adapter. The system identifies the drives in the Sun StorEdge D1000 Disk Array as follows (the metadevice that each drive will belong to is shown in parentheses):

Controller 1              Controller 2
c1t0d0   -   (d0)         c2t0d0   -   (d0)
c1t1d0   -   (d0)         c2t1d0   -   (d1)
c1t2d0   -   (d1)         c2t2d0   -   (d1)
c1t3d0   -   (d20)        c2t3d0   -   (d20)
c1t4d0   -   (d3)         c2t4d0   -   (d3)
c1t5d0   -   (d3)         c2t5d0   -   (d4)

d0 : RAID 0 - Stripe
d1 : RAID 0 - Concatenation
d20 : RAID 1 - Mirror
d3 : RAID 5
d4 : Hot Spare

From the configuration above, you can see we have plenty of disk drives to utilize for our examples!

Partitioning the Disks

Metadevices in DiskSuite are built from slices (disk partitions). If the disks you plan on using as metadevices have not been partitioned, do so now. For the twelve 9.1GB disk drives within the D1000 Disk Array, I use the same partition sizes and layout. I will make slice 0 a 100MB partition, just in case I want to attach a journaling trans metadevice later. By convention, I will use slice 7 for the entire rest of the disk for storing the actual data. I will also use slice 7 to store the metadb replicas for each of the twelve disks. Also by convention, I will use slice 2 as the backup partition.

The following is the partition table from one of the twelve hard drives:

format> verify

Primary label contents:

Volume name = <        >
ascii name  = 
pcyl        = 4926
ncyl        = 4924
acyl        =    2
nhead       =   27
nsect       =  133
Part      Tag    Flag     Cylinders        Size            Blocks
0 unassigned    wm       0 -   57      101.70MB    (58/0/0)     208278
1 unassigned    wm       0               0         (0/0/0)           0
2     backup    wm       0 - 4923        8.43GB    (4924/0/0) 17682084
3 unassigned    wm       0               0         (0/0/0)           0
4 unassigned    wm       0               0         (0/0/0)           0
5 unassigned    wm       0               0         (0/0/0)           0
6 unassigned    wm       0               0         (0/0/0)           0
7 unassigned    wm      58 - 4923        8.33GB    (4866/0/0) 17473806

Use the format(1M) command to edit the partition table, label the disks, and set the volume name.

Metadevice State Database - (State Database Replicas)

The metadevice database is used by DiskSuite to hold configuration and state information. Before creating metadevices you will need to create state database replicas. The state database replicas ensure that the data in the metadevice state database is always valid. When the metadevice state database is updated, each state database replica is also updated.

DiskSuite requires a minimum of three state database replicas. The system continues to run as long as at least half of the state database replicas are available, and it panics when fewer than half are available. The system will not reboot into multi-user mode unless a majority (half plus one) of the state database replicas is available. Instead, it will go into single-user mode for administrative tasks.
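The replica rules above can be expressed as a small check. This is only a sketch of the rules as stated in this article, not DiskSuite code; the function name and the reading of "half plus one" as an integer majority are my own assumptions.

```python
def replica_quorum(total, available):
    """Apply the replica rules described in the text (illustrative only).

    Returns a tuple (stays_running, can_reboot):
      stays_running - at least half of the replicas are available
      can_reboot    - a majority (half plus one) of the replicas is available
    """
    if not 0 <= available <= total:
        raise ValueError("available must be between 0 and total")
    stays_running = 2 * available >= total
    can_reboot = available >= total // 2 + 1  # assumption: integer majority
    return stays_running, can_reboot


# With 13 replicas, losing 6 is survivable but losing 7 is not.
print(replica_quorum(13, 7))  # (True, True)
print(replica_quorum(13, 6))  # (False, False)
# With 4 replicas, exactly half keeps the system up but is not enough to reboot.
print(replica_quorum(4, 2))   # (True, False)
```

The 4-replica case shows why an even split across two controllers can be awkward: losing one controller leaves the system running, but it would not come back up into multi-user mode after a reboot.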

State database replicas are created on disk slices using the metadb command. Keep in mind that state database replicas can only be created on slices that are not in use (i.e., slices that have no file system and are not being used to store raw data). You cannot create state database replicas on slices that contain an existing file system, root (/), /usr, or swap. State database replicas can be created on slices that will be part of a metadevice, but they must be created BEFORE the slice is added to the metadevice.

In the following example, I will create one state database replica on each of the first 11 disk drives in the D1000 Disk Array using the metadb command. On the twelfth disk, I will give an example of how to create two state database replicas on the same slice. In total I will be creating 13 state database replicas on the twelve disks. The replicas will be created on slice 7 of each disk. (This is the slice that we created to be used as a metadevice for each disk in the disk array.) I will create the state database replicas on the twelve disks using the following methods:

  1. Create the initial four state database replicas on the first four disks in the disk array using the -a and -f command line options to the metadb command. (The -f option is required when no state database replicas exist yet.)
  2. Create seven more replicas using just the -a option to the metadb command.
  3. Use the -c option to the metadb command on the twelfth disk to give an example of how to create two replicas on a single slice.

Creating the (Initial) First Four State Database Replicas

# metadb -a -f c1t0d0s7 c1t1d0s7 c1t2d0s7 c1t3d0s7

Creating the Next Seven State Database Replicas

# metadb -a c1t4d0s7 c1t5d0s7 c2t0d0s7 c2t1d0s7 c2t2d0s7 c2t3d0s7 c2t4d0s7

Creating Two State Database Replicas On the Same Slice

# metadb -a -c2 c2t5d0s7

Query All State Database Replicas

# metadb

        flags           first blk       block count
     a        u         16              1034            /dev/dsk/c1t0d0s7
     a        u         16              1034            /dev/dsk/c1t1d0s7
     a        u         16              1034            /dev/dsk/c1t2d0s7
     a        u         16              1034            /dev/dsk/c1t3d0s7
     a        u         16              1034            /dev/dsk/c1t4d0s7
     a        u         16              1034            /dev/dsk/c1t5d0s7
     a        u         16              1034            /dev/dsk/c2t0d0s7
     a        u         16              1034            /dev/dsk/c2t1d0s7
     a        u         16              1034            /dev/dsk/c2t2d0s7
     a        u         16              1034            /dev/dsk/c2t3d0s7
     a        u         16              1034            /dev/dsk/c2t4d0s7
     a        u         16              1034            /dev/dsk/c2t5d0s7
     a        u         1050            1034            /dev/dsk/c2t5d0s7
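The two entries for c2t5d0s7 in the listing above show where the second replica on a slice lands: it starts right after the first one (16 + 1034 = 1050). A minimal sketch of that arithmetic follows. It assumes replicas on one slice are simply laid out back to back, which is what the output suggests; the function name is hypothetical.

```python
def replica_start_blocks(count, first_blk=16, block_count=1034):
    """Start block of each replica on one slice, assuming they are packed
    back to back starting at first_blk (values taken from the metadb output)."""
    return [first_blk + i * block_count for i in range(count)]


# Two replicas on one slice, as created with "metadb -a -c2 c2t5d0s7".
print(replica_start_blocks(2))  # [16, 1050]
```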

Deleting a State Database Replica

# metadb -d c2t4d0s7

OK, now let's put it back!

# metadb -a c2t4d0s7

Creating a Stripe - (RAID 0)

A DiskSuite Striped Metadevice (often called just a stripe) is one of three types of simple metadevices.

Simple metadevices are so called because they are made only from slices. Simple metadevices can be used directly or as the basic building blocks for mirrors and trans metadevices.

NOTE: Sometimes a striped metadevice is called a stripe. Other times, stripe refers to the component blocks of a striped concatenation. "To stripe" means to spread I/O requests across disks by chunking parts of the disks and mapping those chunks to a virtual device (a metadevice). Both striping and concatenation are classified as RAID Level 0.

The data in a striped metadevice is arranged across two or more slices. The striping alternates equally-sized segments of data across two or more slices to form one logical storage unit. These segments are interleaved round-robin, so that the combined space is made alternately from each slice. Sort of like a shuffled deck of cards.
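The round-robin interleaving described above can be sketched as a mapping from a logical block number to a slice and an offset within that slice. This is the generic RAID 0 layout, not DiskSuite's actual implementation; the function name and the default values (three slices, 64-block interlace) are illustrative assumptions.

```python
def stripe_locate(logical_block, num_slices=3, interlace_blocks=64):
    """Map a logical block of a striped device to (slice_index, block_in_slice).

    Generic round-robin striping: consecutive interlace-sized chunks are
    dealt out to the slices in turn, like dealing cards.
    """
    chunk, offset = divmod(logical_block, interlace_blocks)
    row, slice_index = divmod(chunk, num_slices)
    return slice_index, row * interlace_blocks + offset


print(stripe_locate(0))    # (0, 0)   first chunk goes to the first slice
print(stripe_locate(64))   # (1, 0)   second chunk goes to the second slice
print(stripe_locate(128))  # (2, 0)   third chunk goes to the third slice
print(stripe_locate(192))  # (0, 64)  fourth chunk wraps back to the first slice
```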

  1. The following example uses the metainit command to create a striped metadevice named d0 (/dev/md/rdsk/d0) from 3 slices. Of the twelve disks available in the D1000 Disk Array, I will be using slices c1t0d0s7, c2t0d0s7, c1t1d0s7 as follows:
    # metainit d0 1 3 c1t0d0s7 c2t0d0s7 c1t1d0s7 -i 32k
    d0: Concat/Stripe is setup

  2. Use the metastat command to query your new metadevice:
    # metastat d0
    d0: Concat/Stripe
        Size: 52407054 blocks
        Stripe 0: (interlace: 64 blocks)
            Device              Start Block  Dbase
            c1t0d0s7                3591     Yes
            c2t0d0s7                3591     Yes
            c1t1d0s7                3591     Yes
    Let's explain the details of the above example. First notice that the new striped metadevice, d0, consists of a single stripe (Stripe 0) made of three slices (c1t0d0s7, c2t0d0s7, c1t1d0s7). The -i option sets the interlace to 32KB. (The interlace cannot be less than 8KB, nor greater than 100MB.) If the interlace were not specified on the command line, the striped metadevice would use the default of 16KB. In the metastat output, all three slices belong to Stripe 0, which tells us that this is a striped metadevice. The output also confirms that the interlace is 32KB (512 * 64 blocks), as we defined it. The total size of the stripe is 26,832,411,648 bytes (512 * 52407054 blocks).

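  3. Now that we have created our simple metadevice (a stripe), we can treat the metadevice as one big partition (slice) on which we can do the usual file system things. Let's now create a UFS file system using the newfs command. I want to create a UFS file system with one inode for every 8KB of data space. (Note that the -i option to newfs sets the number of bytes per inode; the logical block size is a separate setting, chosen with the -b option.)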
    # newfs -i 8192 /dev/md/rdsk/d0
    newfs: construct a new file system /dev/md/rdsk/d0: (y/n)? y
    /dev/md/rdsk/d0:        52407054 sectors in 14594 cylinders of 27 tracks, 133 sectors
            25589.4MB in 913 cyl groups (16 c/g, 28.05MB/g, 3392 i/g)
    super-block backups (for fsck -F ufs -o b=#) at:
     32, 57632, 115232, 172832, 230432, 288032, 345632, 403232, 460832, 518432,
     576032, 633632, 691232, 748832, 806432, 864032, 921632, 979232, 1036832,
     1094432, 1152032, 1209632, 1267232, 1324832, 1382432, 1440032, 1497632,
     1555232, 1612832, 1670432, 1728032, 1785632, 1838624, 1896224, 1953824,
     2011424, 2069024, 2126624, 2184224, 2241824, 2299424, 2357024, 2414624,
    
                      <----------   SNIP   ---------->
    
     50909216, 50966816, 51024416, 51082016, 51139616, 51197216, 51254816,
     51312416, 51370016, 51427616, 51480608, 51538208, 51595808, 51653408,
     51711008, 51768608, 51826208, 51883808, 51941408, 51999008, 52056608,
     52114208, 52171808, 52229408, 52287008, 52344608, 52402208,

  4. Finally, we mount the file system on /db0 as follows:
    # mkdir /db0
    # mount -F ufs /dev/md/dsk/d0 /db0

  5. To ensure that this new file system is mounted each time the machine is started, insert the following line into your /etc/vfstab file (all on one line with tabs separating the fields):
    /dev/md/dsk/d0       /dev/md/rdsk/d0      /db0  ufs     2       yes     -
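The size figures quoted in step 2 above can be checked with a few lines of arithmetic. This is just a sanity check on the numbers printed by metastat and newfs, assuming 512-byte disk blocks as used throughout this article; the names are my own.

```python
BLOCK_BYTES = 512  # size of one disk block, as used in the metastat figures


def blocks_to_bytes(blocks):
    """Convert a block count reported by metastat into bytes."""
    return blocks * BLOCK_BYTES


interlace_bytes = blocks_to_bytes(64)      # "interlace: 64 blocks"
stripe_bytes = blocks_to_bytes(52407054)   # "Size: 52407054 blocks"

print(interlace_bytes)                     # 32768 (32KB)
print(stripe_bytes)                        # 26832411648
print(round(stripe_bytes / 2**20, 1))      # 25589.4 (MB, as reported by newfs)
```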

Creating a Concatenation - (RAID 0)

The method used for creating a Concatenated Metadevice is very similar to that used in creating a Striped Metadevice - both use the metainit command (with different options) and the same method for creating and mounting a UFS file system.

A DiskSuite Concatenated Metadevice (often called just a Concatenation) is one of three types of simple metadevices.

Simple metadevices are so called because they are made only from slices. Simple metadevices can be used directly or as the basic building blocks for mirrors and trans metadevices.

The data for a concatenated metadevice is organized serially and adjacently across disk slices, forming one logical storage unit. Many system administrators use a concatenated metadevice to get more storage capacity by logically combining the capacities of several slices. It is possible to add more slices to the concatenated metadevice as the demand for storage grows. A concatenated metadevice enables you to dynamically expand storage capacity and file system sizes online! With a concatenated metadevice you can add slices even if the other slices are currently active.

NOTE: You can also create a concatenated metadevice from a single slice. Later, when you need more storage, you can add more slices to the concatenated metadevice.
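To contrast with striping, the serial layout of a concatenation can be sketched as follows: logical blocks fill the first slice completely before moving on to the next. This is a generic illustration, not DiskSuite code; the function name and the slice sizes in the example are hypothetical.

```python
def concat_locate(logical_block, slice_sizes):
    """Map a logical block of a concatenation to (slice_index, block_in_slice).

    Slices are used one after another: the second slice is only touched
    once the first one is full, and so on.
    """
    if logical_block < 0:
        raise ValueError("block number must not be negative")
    for index, size in enumerate(slice_sizes):
        if logical_block < size:
            return index, logical_block
        logical_block -= size
    raise ValueError("block is beyond the end of the concatenation")


sizes = [100, 50, 200]            # hypothetical slice sizes in blocks
print(concat_locate(99, sizes))   # (0, 99)  last block of the first slice
print(concat_locate(100, sizes))  # (1, 0)   first block of the second slice
print(concat_locate(150, sizes))  # (2, 0)   first block of the third slice
```

Growing the concatenation corresponds to appending another entry to the list of slice sizes; existing blocks keep their locations, which is why slices can be added while the others are in use.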

  1. The following example uses the metainit command to create a concatenated metadevice named d1 (/dev/md/rdsk/d1) from 3 slices. Of the twelve disks available in the D1000 Disk Array, I will be using slices c2t1d0s7, c1t2d0s7, c2t2d0s7 as follows:
    # metainit d1 3 1 c2t1d0s7 1 c1t2d0s7 1 c2t2d0s7
    d1: Concat/Stripe is setup

  2. Use the metastat command to query your new (or in our example all) metadevices:
    # metastat
    d0: Concat/Stripe
        Size: 52407054 blocks
        Stripe 0: (interlace: 64 blocks)
            Device              Start Block  Dbase
            c1t0d0s7                3591     Yes
            c2t0d0s7                3591     Yes
            c1t1d0s7                3591     Yes
    
    d1: Concat/Stripe
        Size: 52410645 blocks
        Stripe 0:
            Device              Start Block  Dbase
            c2t1d0s7                3591     Yes
        Stripe 1:
            Device              Start Block  Dbase
            c1t2d0s7                3591     Yes
        Stripe 2:
            Device              Start Block  Dbase
            c2t2d0s7                3591     Yes
    Let's explain the details of the above example. First notice that the new concatenated metadevice, d1, consists of three stripes (Stripe 0, Stripe 1, Stripe 2), each made from a single slice (c2t1d0s7, c1t2d0s7, c2t2d0s7 respectively). In the metastat output, we can tell that this is a concatenation because it consists of multiple stripes. The total size of the concatenation is 26,834,250,240 bytes (512 * 52410645 blocks).

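  3. Now that we have created our simple metadevice (a concatenation), we can treat the metadevice as one big partition (slice) on which we can do the usual file system things. Let's now create a UFS file system using the newfs command. I want to create a UFS file system with one inode for every 8KB of data space. (Note that the -i option to newfs sets the number of bytes per inode; the logical block size is a separate setting, chosen with the -b option.)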
    # newfs -i 8192 /dev/md/rdsk/d1
    newfs: construct a new file system /dev/md/rdsk/d1: (y/n)? y
    Warning: 1 sector(s) in last cylinder unallocated
    /dev/md/rdsk/d1:        52410644 sectors in 14595 cylinders of 27 tracks, 133 sectors
            25591.1MB in 913 cyl groups (16 c/g, 28.05MB/g, 3392 i/g)
    super-block backups (for fsck -F ufs -o b=#) at:
     32, 57632, 115232, 172832, 230432, 288032, 345632, 403232, 460832, 518432,
     576032, 633632, 691232, 748832, 806432, 864032, 921632, 979232, 1036832,
     1094432, 1152032, 1209632, 1267232, 1324832, 1382432, 1440032, 1497632,
    
                      <----------   SNIP   ---------->
    
     51312416, 51370016, 51427616, 51480608, 51538208, 51595808, 51653408,
     51711008, 51768608, 51826208, 51883808, 51941408, 51999008, 52056608,
     52114208, 52171808, 52229408, 52287008, 52344608, 52402208,

  4. Finally, we mount the file system on /db1 as follows:
    # mkdir /db1
    # mount -F ufs /dev/md/dsk/d1 /db1

  5. To ensure that this new file system is mounted each time the machine is started, insert the following line into your /etc/vfstab file (all on one line with tabs separating the fields):
    /dev/md/dsk/d1       /dev/md/rdsk/d1      /db1  ufs     2       yes     -
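The size of the concatenation reported in step 2 above can be reproduced from the partition table shown earlier. The sketch below assumes that the "Start Block" of 3591 is the space skipped at the front of slice 7 (where the state database replica lives), which is my reading of the metastat output rather than something stated in the DiskSuite documentation.

```python
SLICE7_BLOCKS = 17473806   # slice 7 size from the format "verify" output
START_BLOCK = 3591         # "Start Block" column in the metastat output
BLOCK_BYTES = 512

usable_per_slice = SLICE7_BLOCKS - START_BLOCK
concat_blocks = 3 * usable_per_slice

print(usable_per_slice)             # 17470215
print(concat_blocks)                # 52410645, matches "Size: 52410645 blocks"
print(concat_blocks * BLOCK_BYTES)  # 26834250240 bytes
print(27 * 133)                     # 3591, one cylinder (nhead * nsect)
```

The same per-slice figure, 17470215 blocks, shows up again as the size of the single-slice submirrors in the mirror examples.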

Creating Mirrors - (RAID 1)

A mirror is a metadevice just like any other metadevice (stripe, concatenation) and is made of one or more submirrors. A submirror is made of one or more striped or concatenated metadevices. Mirroring data provides you with maximum data availability by maintaining multiple copies of your data (also called RAID 1).

Before creating a mirror, create the striped metadevices or concatenated metadevices that will make up the mirror.

Any file system including root (/), swap, and /usr, or any application such as a database, can use a mirror.

When creating a mirror, first create a one-way mirror, then attach a second submirror. This starts a resync operation and ensures that data is not corrupted.

To mirror an existing file system, use an additional slice of equal or greater size than the slice already used by the mirror. You can use a concatenated metadevice or striped metadevice of two or more slices that have adequate space to contain the mirror.

You can create a one-way mirror for a future two- or three-way mirror.

You can create up to a three-way mirror. However, two-way mirrors usually provide sufficient data redundancy for most applications, and are less expensive in terms of disk drive costs. A three-way mirror enables you to take a submirror offline and perform a backup while maintaining a two-way mirror for continued data redundancy.

Use the same size slices when creating submirrors. Using different size slices creates unused space in the mirror.
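The effect of mismatched submirror sizes can be put in numbers. In the sketch below the mirror is assumed to be as large as its smallest submirror, which agrees with the metastat output for the /usr example later in this article (a 16781040-block submirror paired with a 17470215-block one yields a 16781040-block mirror). The function name is hypothetical.

```python
def mirror_capacity(submirror_blocks):
    """Return (usable_blocks, unused_blocks_per_submirror) for a mirror,
    assuming the mirror is limited by its smallest submirror."""
    usable = min(submirror_blocks)
    unused = [size - usable for size in submirror_blocks]
    return usable, unused


# Sizes taken from the /usr mirror example: d21 and d22.
print(mirror_capacity([16781040, 17470215]))  # (16781040, [0, 689175])
```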

Avoid having slices of submirrors on the same disk. Also, when possible, use disks attached to different controllers to avoid single points-of-failure. For maximum protection and performance, place each submirror on a different physical disk and, when possible, on different disk controllers. For further data availability, use hot spares with mirrors.

Adding additional state database replicas before creating a mirror can increase the mirror's performance. As a general rule, add one additional replica for each mirror you add to the system.

If possible create mirrors from disks consisting of the same disk geometries. The historical reason is that UFS uses disk blocks based on disk geometries. Today, the issue is centered around performance: a mirror composed of disks with different geometries will only be as fast as its slowest disk.

This section will contain the following five examples for creating different types of two-way mirrors:

  1. Create a Mirror From Unused Slices
  2. Create a Mirror From a File System That Can Be Unmounted
  3. Create a Mirror From a File System That Cannot Be Unmounted
  4. Create a Mirror From swap
  5. Create a Mirror From root (/)

To perform the above mirror examples, I will be using the two disks: c1t3d0 and c2t3d0. After creating each two-way mirror example, I will be deleting the newly created mirror to get ready for the next example.

Create a Mirror From Unused Slices

  1. Use the metainit command to create two metadevices. Each new concatenated metadevice (d21 and d22) consists of a single stripe (Stripe 0) made of one slice (c1t3d0s7 and c2t3d0s7, respectively):
    # metainit d21 1 1 c1t3d0s7
    d21: Concat/Stripe is setup
    
    # metainit d22 1 1 c2t3d0s7
    d22: Concat/Stripe is setup

  2. Use the metainit -m command to create a one-way mirror (named d20) from one of the submirrors (d21).
    # metainit d20 -m d21
    d20: Mirror is setup

  3. Finally, use the metattach command to create the two-way mirror (named d20) from the second submirror (d22).
    # metattach d20 d22
    d20: submirror d22 is attached
    We now have a two-way mirror, d20. The metainit command was first used to create the two submirrors (d21 and d22), which are actually concatenations. The metainit -m command was then used to create a one-way mirror from the d21 concatenation. We then used the metattach command to attach d22, creating a two-way mirror and causing a mirror resync. (Any data on the attached submirror is overwritten by the other submirror during the resync.) The system verifies that the objects are set up.

  4. Now that we have created our metadevice (a mirror), and the mirror resync is complete, we can treat the metadevice as a regular partition (slice) on which we can do the usual file system things. Let's now create a UFS file system using the newfs command. I want to create a UFS file system with one inode for every 8KB of data space. (Note that the -i option to newfs sets the number of bytes per inode; the logical block size is a separate setting, chosen with the -b option.)
    # newfs -i 8192 /dev/md/rdsk/d20
    newfs: construct a new file system /dev/md/rdsk/d20: (y/n)? y
    Warning: 1 sector(s) in last cylinder unallocated
    /dev/md/rdsk/d20:       17470214 sectors in 4865 cylinders of 27 tracks, 133 sectors
            8530.4MB in 305 cyl groups (16 c/g, 28.05MB/g, 3392 i/g)
    super-block backups (for fsck -F ufs -o b=#) at:
     32, 57632, 115232, 172832, 230432, 288032, 345632, 403232, 460832, 518432,
     576032, 633632, 691232, 748832, 806432, 864032, 921632, 979232, 1036832,
     1094432, 1152032, 1209632, 1267232, 1324832, 1382432, 1440032, 1497632,
     1555232, 1612832, 1670432, 1728032, 1785632, 1838624, 1896224, 1953824,
    
                      <----------   SNIP   ---------->
    
     16321568, 16379168, 16436768, 16494368, 16547360, 16604960, 16662560,
     16720160, 16777760, 16835360, 16892960, 16950560, 17008160, 17065760,
     17123360, 17180960, 17238560, 17296160, 17353760, 17411360, 17468960,

  5. Finally, we mount the file system on /db20 as follows:
    # mkdir /db20
    # mount -F ufs /dev/md/dsk/d20 /db20

  6. To ensure that this new file system is mounted each time the machine is started, insert the following line into your /etc/vfstab file (all on one line with tabs separating the fields):
    /dev/md/dsk/d20       /dev/md/rdsk/d20      /db20  ufs     2       yes     -

  7. The volume, /db20, is now ready for use!
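Since the /etc/vfstab entries used in this article all follow the same seven-field pattern, a small helper can generate them with the tabs in the right places. This is only a convenience sketch; the function name and its defaults (fsck pass 2, mount at boot, no mount options) mirror the entries shown above and are otherwise my own choices.

```python
def vfstab_entry(metadevice, mount_point, fsck_pass=2,
                 mount_at_boot="yes", options="-"):
    """Build a tab-separated /etc/vfstab line for a UFS file system
    that lives on a DiskSuite metadevice such as d20."""
    fields = [
        f"/dev/md/dsk/{metadevice}",   # device to mount (block device)
        f"/dev/md/rdsk/{metadevice}",  # device to fsck (raw device)
        mount_point,                   # mount point
        "ufs",                         # file system type
        str(fsck_pass),                # fsck pass
        mount_at_boot,                 # mount at boot
        options,                       # mount options
    ]
    return "\t".join(fields)


print(vfstab_entry("d20", "/db20"))
```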


Create a Mirror From a File System That Can Be Unmounted

  1. The procedures documented in this section can be used to mirror a file system that can be unmounted during normal operation. While most file systems can be unmounted during normal operation, there are some that cannot, such as root (/), /usr, /opt, or swap. Procedures for mirroring file systems that cannot be unmounted during normal operation are documented in the next section.

  2. First, identify the slice that contains the file system to be mirrored. For this example, I will be using /dev/dsk/c1t3d0s7, which contains an existing file system that I want to have mirrored. This is a file system that can be unmounted.

    The slice /dev/dsk/c1t3d0s7 contains an 8K UFS file system and is mounted on /db20.

  3. Use the metainit -f command to put the mounted file system's slice in a single-slice (one-way) concat/stripe. (This will be submirror1.) The following command creates one stripe that contains one slice. The new metadevice will be named d21:
    # metainit -f d21 1 1 c1t3d0s7
    d21: Concat/Stripe is setup

  4. Create a second concat/stripe. (This will be submirror2)
    # metainit d22 1 1 c2t3d0s7
    d22: Concat/Stripe is setup

  5. Use the metainit -m command to create a one-way mirror with submirror1.
    # metainit d20 -m d21
    d20: Mirror is setup

  6. Unmount the file system
    # umount /db20

  7. Edit the /etc/vfstab file so that the existing file system entry now refers to the newly created mirror. In the following example snippet, I commented out the original entry for the c1t3d0s7 slice and added a new entry that refers to the newly created mirrored metadevice (d20) to be mounted to /db20:
    # /dev/dsk/c1t3d0s7     /dev/rdsk/c1t3d0s7    /db20   ufs     2       yes     -
    /dev/md/dsk/d20     /dev/md/rdsk/d20      /db20   ufs     2       yes     -

  8. Remount the file system:
    # mount /db20

  9. Use the metattach command to attach submirror2
    # metattach d20 d22
    d20: submirror d22 is attached

  10. After attaching d22 (submirror2), this triggers a mirror resync. Use the metastat command to view the progress of the mirror resync:
    # metastat d20
    d20: Mirror
        Submirror 0: d21
          State: Okay
        Submirror 1: d22
          State: Resyncing
        Resync in progress: 15 % done
        Pass: 1
        Read option: roundrobin (default)
        Write option: parallel (default)
        Size: 17470215 blocks
    
    d21: Submirror of d20
        State: Okay
        Size: 17470215 blocks
        Stripe 0:
            Device              Start Block  Dbase State        Hot Spare
            c1t3d0s7                3591     Yes   Okay
    
    
    d22: Submirror of d20
        State: Resyncing
        Size: 17470215 blocks
        Stripe 0:
            Device              Start Block  Dbase State        Hot Spare
            c2t3d0s7                3591     Yes   Okay

  11. In the above example, we didn't create a multi-way mirror right away. Rather, we created a one-way mirror with the metainit command and then attached the additional submirror with the metattach command. When the metattach command is not used, no resync operations occur and data could become corrupted. Also, do not create a two-way mirror for a file system without first unmounting the file system, editing the /etc/vfstab file to reference the mirrored metadevice, and then mounting the file system on the new mirrored metadevice before attaching the second submirror.
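If you want to wait for a resync to finish from a script, the progress line in the metastat output shown in step 10 is easy to pick out. The sketch below only parses text in the format shown above; the sample text is copied from that output, and the function name is hypothetical. How the text is obtained (for example, by running metastat d20) is left out.

```python
import re

# Sample text copied from the metastat output in step 10.
SAMPLE = """d20: Mirror
    Submirror 0: d21
      State: Okay
    Submirror 1: d22
      State: Resyncing
    Resync in progress: 15 % done
"""


def resync_percent(metastat_text):
    """Return the resync percentage as an int, or None if the output
    contains no 'Resync in progress' line."""
    match = re.search(r"Resync in progress:\s*(\d+)\s*% done", metastat_text)
    return int(match.group(1)) if match else None


print(resync_percent(SAMPLE))            # 15
print(resync_percent("    State: Okay"))  # None
```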


Create a Mirror From a File System That Cannot Be Unmounted

  1. The procedures in this section can be used to mirror file systems, such as /usr and /opt - those that cannot be unmounted during normal system usage.

  2. First, identify the slice that contains the file system to be mirrored. For this example, I will be mirroring the /usr file system, which is located on c0t0d0s6. This is a file system that cannot be unmounted.

    The slice /dev/dsk/c0t0d0s6 contains an 8K UFS file system and is mounted on /usr. This will be made into submirror1 (d21) using the metainit command. For submirror2 (to make our two-way mirror) I will be using /dev/dsk/c2t3d0s7.

  3. Use the metainit -f command to put the mounted file system's slice in a single-slice (one-way) concat/stripe. (This will be submirror1.) The following command creates one stripe that contains one slice. The new metadevice will be named d21:
    # metainit -f d21 1 1 c0t0d0s6
    d21: Concat/Stripe is setup

  4. Create a second concat/stripe. (This will be submirror2)
    # metainit d22 1 1 c2t3d0s7
    d22: Concat/Stripe is setup

  5. Use the metainit -m command to create a one-way mirror with submirror1.
    # metainit d20 -m d21
    d20: Mirror is setup

  6. Edit the /etc/vfstab file so that the file system (/usr) now refers to the newly created mirror. In the example snippet, I commented out the original entry for the c0t0d0s6 slice and added a new entry that refers to the newly created mirror to be mounted to /usr:
    # /dev/dsk/c0t0d0s6     /dev/rdsk/c0t0d0s6      /usr    ufs     1       no      -
    /dev/md/dsk/d20 /dev/md/rdsk/d20        /usr    ufs     1       no      -

  7. Reboot the system
    # reboot

  8. Use the metattach command to attach submirror2
    # metattach d20 d22
    d20: submirror d22 is attached

  9. After attaching d22 (submirror2), this triggers a mirror resync. Use the metastat command to view the progress of the mirror resync:
    # metastat d20
    d20: Mirror
        Submirror 0: d21
          State: Okay
        Submirror 1: d22
          State: Resyncing
        Resync in progress: 8 % done
        Pass: 1
        Read option: roundrobin (default)
        Write option: parallel (default)
        Size: 16781040 blocks
    
    d21: Submirror of d20
        State: Okay
        Size: 16781040 blocks
        Stripe 0:
            Device              Start Block  Dbase State        Hot Spare
            c0t0d0s6                   0     No    Okay
    
    
    d22: Submirror of d20
        State: Resyncing
        Size: 17470215 blocks
        Stripe 0:
            Device              Start Block  Dbase State        Hot Spare
            c2t3d0s7                3591     Yes   Okay

  10. In the above example, we didn't create a multi-way mirror right away for the /usr file system. Rather, we created a one-way mirror with the metainit command and then attached the additional submirror with the metattach command (after rebooting the server). When the metattach command is not used, no resync operations occur and data could become corrupted. Also, do not create a two-way mirror for a file system without first editing the /etc/vfstab file to reference the mirror metadevice and then rebooting the server before attaching the second submirror.


Create a Mirror From swap

  1. The procedures in this section of the documentation can be used to mirror the swap file system. The swap file system, like /usr and /opt, cannot be unmounted during normal system usage.

  2. First, identify the slice that contains the swap file system to be mirrored. For this example, the swap file system that I want to have mirrored is located on c0t0d0s3. This is a file system that cannot be unmounted.

    The slice /dev/dsk/c0t0d0s3 contains the swap file system. This will be made into submirror1 (d21) using the metainit command. For submirror2 (to make our two-way mirror) I will be using /dev/dsk/c2t3d0s7.

  3. Use the metainit -f command to put the swap slice in a single-slice (one-way) concat/stripe. (This will be submirror1.) The following command creates one stripe that contains one slice. The new metadevice will be named d21:
    # metainit -f d21 1 1 c0t0d0s3
    d21: Concat/Stripe is setup

  4. Create a second concat/stripe. (This will be submirror2)
    # metainit d22 1 1 c2t3d0s7
    d22: Concat/Stripe is setup

  5. Use the metainit -m command to create a one-way mirror with submirror1.
    # metainit d20 -m d21
    d20: Mirror is setup

  6. Edit the /etc/vfstab file so that the swap file system now refers to the newly created mirror. In the example snippet, I commented out the original swap entry for the c0t0d0s3 slice and added a new entry that refers to the newly created mirror:
    # /dev/dsk/c0t0d0s3     -       -       swap    -       no      -
    /dev/md/dsk/d20 -       -       swap    -       no      -

  7. Reboot the system
    # reboot

  8. Use the metattach command to attach submirror2
    # metattach d20 d22
    d20: submirror d22 is attached

  9. After attaching d22 (submirror2), this triggers a mirror resync. Use the metastat command to view the progress of the mirror resync:
    # metastat d20
    d20: Mirror
        Submirror 0: d21
          State: Okay
        Submirror 1: d22
          State: Resyncing
        Resync in progress: 32 % done
        Pass: 1
        Read option: roundrobin (default)
        Write option: parallel (default)
        Size: 2101200 blocks
    
    d21: Submirror of d20
        State: Okay
        Size: 2101200 blocks
        Stripe 0:
            Device              Start Block  Dbase State        Hot Spare
            c0t0d0s3                   0     No    Okay
    
    
    d22: Submirror of d20
        State: Resyncing
        Size: 17470215 blocks
        Stripe 0:
            Device              Start Block  Dbase State        Hot Spare
            c2t3d0s7                3591     Yes   Okay

  10. Verify that the swap file system is mounted on the d20 metadevice:
    # swap -l
    swapfile             dev  swaplo blocks   free
    /dev/md/dsk/d20     85,20     16 2101184 2101184

  11. In the above example, we didn't create a multi-way mirror right away for the swap file system. Rather, we created a one-way mirror with the metainit command and then attached the additional submirror with the metattach command (after rebooting the server). When the metattach command is not used, no resync operations occur and data could become corrupted. Also, do not create a two-way mirror for a file system without first editing the /etc/vfstab file to reference the mirror metadevice and then rebooting the server before attaching the second submirror.
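The swap -l line from step 10 can be cross-checked against the mirror size reported by metastat in step 9. The sketch below parses the line shown above and confirms that swaplo plus blocks adds up to the size of d20. That relationship is my own reading of the two outputs, and the function name is hypothetical.

```python
def parse_swap_l(line):
    """Split one data line of 'swap -l' output into named fields."""
    swapfile, dev, swaplo, blocks, free = line.split()
    return {
        "swapfile": swapfile,
        "dev": dev,
        "swaplo": int(swaplo),
        "blocks": int(blocks),
        "free": int(free),
    }


MIRROR_BLOCKS = 2101200  # "Size: 2101200 blocks" from metastat d20

entry = parse_swap_l("/dev/md/dsk/d20     85,20     16 2101184 2101184")
print(entry["swapfile"])                                   # /dev/md/dsk/d20
print(entry["swaplo"] + entry["blocks"] == MIRROR_BLOCKS)  # True
```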


Create a Mirror From root (/)

  1. Use the following procedures to mirror the root (/) file system on a SPARC system.

    NOTE: The task for using the command-line to mirror root (/) on an x86 system is different from the task used for a SPARC system.

    When mirroring root (/), it is essential that you record the secondary root slice name to reboot the system if the primary submirror fails. This information should be written down, not recorded on the system, which may not be available in the event of a disk failure.

  2. Use the metainit -f command to put the root (/) slice in a single-slice (one-way) concatenation. (This will be submirror1.)

    The following command creates one stripe that contains one slice. The new metadevice will be named d21:

    # metainit -f d21 1 1 c0t0d0s0
    d21: Concat/Stripe is setup

  3. Create a second concat/stripe. (This will be submirror2)
    # metainit d22 1 1 c0t2d0s0
    d22: Concat/Stripe is setup

  4. Use the metainit -m command to create a one-way mirror with submirror1.
    # metainit d20 -m d21
    d20: Mirror is setup

  5. Run the metaroot command. This updates both the /etc/vfstab and /etc/system files so that the system boots from the new root metadevice:
    # metaroot d20

  6. Run the lockfs command:
    # lockfs -fa

  7. Reboot the system
    # reboot

  8. Use the metattach command to attach submirror2
    # metattach d20 d22
    d20: submirror d22 is attached

  9. Record/document the alternate boot path in the case of failure.
    # ls -l /dev/rdsk/c0t2d0s0
    lrwxrwxrwx  1 root  root   42 Nov 12 09:35 /dev/rdsk/c0t2d0s0 -> ../../devices/pci@1f,0/ide@d/dad@2,0:a,raw

NOTE: The -f option forces the creation of the first concatenation, d21, which contains the mounted root (/) file system on /dev/dsk/c0t0d0s0. The second concatenation, d22, is created from /dev/dsk/c0t2d0s0. (This slice must be the same size as, or larger than, the slice in d21.) The metainit command with the -m option creates the one-way mirror d20 using the concatenation containing root (/). Next, the metaroot command edits the /etc/vfstab and /etc/system files so that the system may be booted with the root (/) file system on a metadevice. (It is a good idea to run lockfs -fa before rebooting.) After a reboot, the submirror d22 is attached to the mirror, causing a mirror resync. (The system verifies that the concatenations and the mirror are set up, and that submirror d22 is attached.) Finally, the ls -l command is run on the raw device of the second submirror's slice to determine the path to the alternate root device in case the system needs to be booted from it.
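
The alternate boot path to write down is the part of the ls -l output that follows /devices. As a rough sketch, it can be extracted like this; the function name alt_boot_path is invented for the example, and stripping the trailing ",raw" assumes the raw (rdsk) link was listed, as in step 9.

```shell
# Read one `ls -l` line for a /dev/dsk or /dev/rdsk link on stdin and print
# the physical device path that follows "/devices".
# (alt_boot_path is a hypothetical helper name.)
alt_boot_path() {
    sed -n 's|.*-> .*/devices\(/.*\)$|\1|p' | sed 's/,raw$//'
}

# The line shown in step 9:
line='lrwxrwxrwx  1 root  root   42 Nov 12 09:35 /dev/rdsk/c0t2d0s0 -> ../../devices/pci@1f,0/ide@d/dad@2,0:a,raw'
printf '%s\n' "$line" | alt_boot_path
```

For the example line this prints /pci@1f,0/ide@d/dad@2,0:a. Depending on the system, the driver name in the last component (dad here) may need to be changed to disk before the path can be used at the OpenBoot ok prompt; treat that as something to verify on your own hardware.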


Creating a RAID5 Volume - (RAID 5)

A RAID5 metadevice uses storage capacity equivalent to one slice in the metadevice to store redundant information about user data stored on the remainder of the RAID5 metadevice's slices. The redundant information is distributed across all slices in the metadevice. Like a mirror, a RAID5 metadevice increases data availability, but with a minimum of cost in terms of hardware.

The system must contain at least three state database replicas before you can create RAID5 metadevices.

A RAID5 metadevice can only handle a single slice failure.

Follow the 20-percent rule when creating a RAID5 metadevice: because of the complexity of parity calculations, metadevices with greater than about 20 percent writes should probably not be RAID5 metadevices. If data redundancy is needed, consider mirroring.

There are drawbacks to a slice-heavy RAID5 metadevice: the more slices a RAID5 metadevice contains, the longer read and write operations will take if a slice fails.

A RAID5 metadevice must consist of at least three slices.

A RAID5 metadevice can be grown by concatenating additional slices to the metadevice. The new slices do not store parity information; however, they are parity protected. The resulting RAID5 metadevice continues to handle a single slice failure.

The interlace value is key to RAID5 performance. It is configurable at the time the metadevice is created; thereafter, the value cannot be modified. The default interlace value is 16 Kbytes. This is reasonable for most applications.

Use the same size disk slices. Creating a RAID5 metadevice from different size slices results in unused disk space in the metadevice.

Do not create a RAID5 metadevice from a slice that contains an existing file system. Doing so will erase the data during the RAID5 initialization process.

RAID5 metadevices cannot be striped, concatenated, or mirrored.
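
Two of the guidelines above (one slice's worth of capacity goes to parity, and mixed slice sizes waste space) can be turned into a quick capacity estimate. This is a back-of-the-envelope sketch, not something DiskSuite provides: raid5_usable is a made-up name, and the estimate ignores any rounding or metadata overhead, so treat the result as an upper bound.

```shell
# Estimate the usable size of a RAID5 metadevice from its slice sizes.
# Usable space is (number of slices - 1) times the smallest slice; the
# result is in the same unit as the arguments (blocks, MB, ...).
# (raid5_usable is a hypothetical helper name.)
raid5_usable() {
    n=$#
    if [ "$n" -lt 3 ]; then
        echo "raid5_usable: a RAID5 metadevice needs at least three slices" >&2
        return 1
    fi
    min=$1
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then
            min=$size
        fi
    done
    echo $(( (n - 1) * min ))
}

# Three equal slices of 17470215 blocks each (an illustrative size):
raid5_usable 17470215 17470215 17470215
```

With slices of different sizes, for example raid5_usable 100 80 120, the estimate is 160: only 80 units of each slice are used, which is the unused space the guideline warns about.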

  1. The following example uses the metainit command to create a RAID5 metadevice named d3 (/dev/md/rdsk/d3) from three slices. Of the twelve disks available in the D1000 Disk Array, I will be using slices c1t4d0s7, c2t4d0s7, and c1t5d0s7 as follows:
    # metainit d3 -r c1t4d0s7 c2t4d0s7 c1t5d0s7
    d3: RAID is setup
    Let's explain the details of the above example. The RAID5 metadevice d3 is created with the -r option from three slices. Because no interlace is specified, d3 uses the default of 16 Kbytes. The system verifies that the RAID5 metadevice has been set up, and begins initializing the metadevice.

  2. Use the metastat command to query your new RAID5 metadevice. After the above command is run, the metadevice goes through an initialization phase, which may take several minutes to complete. The metastat output shows how much of the initialization has completed. You must wait for the initialization to finish before you can use the new RAID5 metadevice. The following output shows the RAID5 metadevice during its initialization phase:
    # metastat d3
    d3: RAID
        State: Initializing
        Initialization in progress: 22% done
        Interlace: 32 blocks
        Size: 34936839 blocks
    Original device:
        Size: 34939712 blocks
            Device              Start Block  Dbase State        Hot Spare
            c1t4d0s7                3921     Yes   Initializing
            c2t4d0s7                3921     Yes   Initializing
            c1t5d0s7                3921     Yes   Initializing
    When all of the slices in the RAID5 metadevice have finished initializing, the output looks like this:
    # metastat d3
    d3: RAID
        State: Okay
        Interlace: 32 blocks
        Size: 34936839 blocks
    Original device:
        Size: 34939712 blocks
            Device              Start Block  Dbase State        Hot Spare
            c1t4d0s7                3921     Yes   Okay
            c2t4d0s7                3921     Yes   Okay
            c1t5d0s7                3921     Yes   Okay

  3. Now that we have created our RAID5 metadevice, we can treat it as one big partition (slice) on which we can do the usual file system things. Let's create a UFS file system using the newfs command. I want one inode for every 8 KB of data space, so I pass -i 8192 (the number of bytes per inode):
    # newfs -i 8192 /dev/md/rdsk/d3
    newfs: construct a new file system /dev/md/rdsk/d3: (y/n)? y
    Warning: 1 sector(s) in last cylinder unallocated
    /dev/md/rdsk/d3:        34936838 sectors in 9729 cylinders of 27 tracks, 133 sectors
            17059.0MB in 609 cyl groups (16 c/g, 28.05MB/g, 3392 i/g)
    super-block backups (for fsck -F ufs -o b=#) at:
     32, 57632, 115232, 172832, 230432, 288032, 345632, 403232, 460832, 518432,
     576032, 633632, 691232, 748832, 806432, 864032, 921632, 979232, 1036832,
     1094432, 1152032, 1209632, 1267232, 1324832, 1382432, 1440032, 1497632,
    
                      <----------   SNIP   ---------->
    
     34016288, 34073888, 34131488, 34189088, 34246688, 34304288, 34361888,
     34419488, 34477088, 34534688, 34592288, 34649888, 34707488, 34765088,
     34822688, 34880288, 34933280,

  4. Finally, we mount the file system on /db3 as follows:
    # mkdir /db3
    # mount -F ufs /dev/md/dsk/d3 /db3

  5. To ensure that this new file system is mounted each time the machine is started, insert the following line into your /etc/vfstab file (all on one line with tabs separating the fields):
    /dev/md/dsk/d3       /dev/md/rdsk/d3      /db3  ufs     2       yes     -
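
One detail in the metastat output of step 2 may look inconsistent at first: step 1 says the default interlace is 16 Kbytes, while metastat reports "Interlace: 32 blocks". The two agree if a block is 512 bytes. The sketch below does the conversion; interlace_kb is a made-up helper name, and the 512-byte block size is an assumption stated in the code.

```shell
# Convert the interlace reported by `metastat` (read on stdin) from blocks
# to Kbytes, assuming 512-byte blocks. Handles both the RAID5 form
# ("Interlace: 32 blocks") and the stripe form ("(interlace: 64 blocks)").
# (interlace_kb is a hypothetical helper name.)
interlace_kb() {
    awk '/[Ii]nterlace:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) { print $i * 512 / 1024; exit }
    }'
}

# Lines copied from the metastat d3 output in step 2:
printf '%s\n' 'd3: RAID' '    State: Okay' '    Interlace: 32 blocks' | interlace_kb
```

For the d3 output this prints 16, matching the 16-Kbyte default mentioned in step 1.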

Creating a Trans Metadevice

A trans metadevice enables UFS logging, which is the process of recording UFS updates in a log before the updates are applied to the UNIX file system. A trans metadevice can increase overall file system availability after reboot, because it reduces the amount of time fsck(1M) has to run when the system reboots.

The trans metadevice normally has two devices: the master device and the logging device. The master contains the file system that is being logged. The logging device contains the log and can be shared by several file systems. It is a sequence of records, each of which describes a change to a file system. Both the master device and the logging device can be a slice or a metadevice.

Though logs can be shared between file systems, heavily-used file systems should have their own logging device.

Small file systems with mostly read operations probably do not need to be logged.

Any UFS file system, except root (/), can be logged.

Even if you don't have an available slice for a logging device, you can still set up a trans metadevice without a logging device. This is useful if you plan to enable logging on exported file systems, but do not have an available slice for the logging device at this time.

Before creating trans metadevices, identify the slices or metadevice to be used as the master devices and logging devices.

Avoid placing logs on heavily-used disks.

Do not use a RAID5 metadevice as a logging device. Instead, use a mirror for data redundancy.

Logs (the logging device) can be placed on a slice that already contains a state database replica.

Plan on using one megabyte of log space as a minimum, and an additional one megabyte of log space per 100 megabytes of file system data, up to a maximum log size of 64 Mbytes. Logs greater than 64 Mbytes waste space.

The master devices and logging devices of the same trans metadevice should be located on separate drives and possibly separate controllers.

CAUTION: Mirroring logging devices is strongly recommended. Losing the data in a logging device because of device errors can leave a file system in an inconsistent state which fsck(1M) may not be able to fix without user intervention. Using a mirror for the master device is a good idea to ensure data redundancy.
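
The log sizing guideline above can be written as a small calculation. This sketch reads the rule as one megabyte of log per 100 megabytes of file system data (rounded up), never less than 1 Mbyte and never more than 64 Mbytes; that reading, and the function name trans_log_mb, are assumptions made for the example.

```shell
# Suggest a log size in Mbytes for a file system of the given size in Mbytes:
# 1 MB of log per 100 MB of data, rounded up, clamped to the range 1..64.
# (trans_log_mb is a hypothetical helper name.)
trans_log_mb() {
    fs_mb=$1
    log_mb=$(( (fs_mb + 99) / 100 ))
    if [ "$log_mb" -lt 1 ]; then
        log_mb=1
    fi
    if [ "$log_mb" -gt 64 ]; then
        log_mb=64
    fi
    echo "$log_mb"
}

trans_log_mb 1000     # a 1 GB file system
trans_log_mb 17059    # the 17059 MB RAID5 file system created earlier
```

The two calls print 10 and 64. Under this reading, any file system larger than about 6.4 GB hits the 64-Mbyte cap.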

This section contains the following three examples for creating different types of trans metadevices:

  1. Creating a Trans Metadevice for a File System That Can Be Unmounted
  2. Creating a Trans Metadevice for a File System That Cannot Be Unmounted
  3. Creating a Trans Metadevice Using Mirrors

As described in the section entitled "Partitioning the Disks", I made slice 0 a 100MB partition on all twelve 9.1GB disk drives within the D1000 Disk Array, just in case I wanted to attach a journaling trans metadevice later. Well, that time is now, and I will be using slice 0 of the selected disks for the logging devices.

Creating a Trans Metadevice for a File System That Can Be Unmounted

The following example creates a trans metadevice for a file system that can be unmounted. The UFS file system (the master device) that will be logged resides on a striped metadevice (RAID 0) named d0 (/dev/md/rdsk/d0). The UFS file system bound to this metadevice is /db0.

(Keep in mind that the master device can also be a regular slice; it does not have to be a metadevice!)

The logging device will be created on a 100M slice /dev/dsk/c1t0d0s0.

In this example, the Trans Metadevice will be named d5.

  1. First, unmount the master device file system:
    # umount /db0

  2. Create the trans metadevice with the metainit command. In the following command, we create a trans metadevice named d5 using the -t option. The metadevice d0 is the master device, while the slice c1t0d0s0 will contain the logging device.
    # metainit d5 -t d0 c1t0d0s0
    d5: Trans is setup

  3. Next, edit the /etc/vfstab file so that the file system references the newly created trans metadevice (d5) each time the machine is started. Comment out the entry that mounts the UFS file system from the d0 metadevice and add an entry that refers to d5 (all on one line with tabs separating the fields):
    # /dev/md/dsk/d0        /dev/md/rdsk/d0 /db0    ufs     2       yes     -
    /dev/md/dsk/d5  /dev/md/rdsk/d5 /db0    ufs     2       yes     -

  4. Finally, mount the file system:
    # mount /db0

  5. Use the metastat command to query information on the new trans metadevice:
    # metastat d5
    d5: Trans
        State: Okay
        Size: 52407054 blocks
        Master Device: d0
        Logging Device: c1t0d0s0
    
    d0: Concat/Stripe
        Size: 52407054 blocks
        Stripe 0: (interlace: 64 blocks)
            Device              Start Block  Dbase State        Hot Spare
            c1t0d0s7                3591     Yes   Okay
            c2t0d0s7                3591     Yes   Okay
            c1t1d0s7                3591     Yes   Okay
    
    c1t0d0s0: Logging device for d5
        State: Okay
        Size: 202637 blocks
    
            Logging Device      Start Block  Dbase
            c1t0d0s0                5641     No

  6. Logging becomes effective for the file system when it is remounted.

    On subsequent reboots, instead of checking the file system, fsck displays a logging message for the trans metadevice:

    # reboot
    ...
    /dev/md/rdsk/d5: is logging.
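
The /etc/vfstab change in step 3 (comment out the old entry, add one that points at the trans metadevice) can also be sketched as a filter. Everything here is illustrative: vfstab_use_trans is an invented name, the function only reads stdin and writes stdout rather than touching /etc/vfstab, and it relies on awk features (-v and sub) that on Solaris are usually found in nawk rather than the legacy /usr/bin/awk.

```shell
# Read vfstab lines on stdin. For the entry whose first field is
# /dev/md/dsk/<old>, print it commented out, followed by a copy that
# refers to /dev/md/dsk/<new> and /dev/md/rdsk/<new>. Other lines pass
# through unchanged. (vfstab_use_trans is a hypothetical helper name.)
vfstab_use_trans() {
    awk -v old="$1" -v new="$2" '
        $1 == ("/dev/md/dsk/" old) {
            print "# " $0
            sub("/dev/md/dsk/" old, "/dev/md/dsk/" new)
            sub("/dev/md/rdsk/" old, "/dev/md/rdsk/" new)
        }
        { print }
    '
}

# The d0 entry from step 3, rewritten for trans metadevice d5:
printf '/dev/md/dsk/d0\t/dev/md/rdsk/d0\t/db0\tufs\t2\tyes\t-\n' | vfstab_use_trans d0 d5
```

Reviewing the filtered output before copying it over /etc/vfstab by hand keeps a typo from leaving the system unable to mount the file system at boot.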

Creating a Trans Metadevice for a File System That Cannot Be Unmounted

The following example creates a trans metadevice for a file system that cannot be unmounted during normal system operation. The UFS file system (the master device) that will be logged is /usr. The /usr file system is currently mounted on /dev/dsk/c0t0d0s6. The logging device will be created on slice /dev/dsk/c1t1d0s0.

  1. First, create the trans metadevice with the metainit command. Slice /dev/dsk/c0t0d0s6 contains the /usr file system. The slice to contain the logging device is /dev/dsk/c1t1d0s0. Because /usr cannot be unmounted, the metainit command is run with the -f option to force the creation of the trans device, d6.
    # metainit -f d6 -t /dev/dsk/c0t0d0s6 c1t1d0s0
    d6: Trans is setup

  2. Edit the /etc/vfstab file so that the /usr file system now refers to the newly created trans metadevice. In the example snippet, I commented out the original entry for the /usr file system (c0t0d0s6) and added a new entry that refers to the newly created trans metadevice to be mounted to /usr:
    # /dev/dsk/c0t0d0s6     /dev/rdsk/c0t0d0s6      /usr    ufs     1       no      -
    /dev/md/dsk/d6   /dev/md/rdsk/d6   /usr    ufs     1       no      -

  3. Reboot the system
    # reboot

  4. Logging becomes effective for the file system when the system is rebooted!

  5. Finally, let's check the status of the new trans metadevice:
    # metastat d6
    d6: Trans
        State: Okay
        Size: 16781040 blocks
        Master Device: c0t0d0s6
        Logging Device: c1t1d0s0
    
            Master Device       Start Block  Dbase
            c0t0d0s6                   0     No
    
    c1t1d0s0: Logging device for d6
        State: Okay
        Size: 202637 blocks
    
            Logging Device      Start Block  Dbase
            c1t1d0s0                5641     No

Creating a Trans Metadevice Using Mirrors

You can (and should!) increase the data availability of a trans metadevice by using mirrors for both the master and the logging devices. Failure to mirror the logging device could result in significant data loss if the logging slice experiences errors. If you are mirroring the logging device, it is a good idea for the master device to be a mirror as well.

  1. This is an example of how to create a trans metadevice where both the master device and the logging device are mirrors.

  2. To start off with, we already have a mirrored metadevice named d20, which is mounted on /db20. This is the file system we want to log. (Not that it matters for this example, but the mirrored metadevice is a two-way mirror that contains two submirrors: d21 and d22.)

  3. Next, make a mirror (let's name this one d30) to be used as the logging device. It will be a two-way mirror containing submirrors d31 and d32.

  4. First, unmount the file system you want to have logged:
    # umount /db20

  5. Next, use the metainit command with the -t option to create the Trans Metadevice, d40:
    # metainit d40 -t d20 d30
    d40: Trans is setup

  6. Edit the /etc/vfstab file so that the /db20 file system now refers to the newly created trans metadevice. In the example snippet, I commented out the original entry for the /db20 file system (/dev/md/dsk/d20) and added a new entry that refers to the newly created trans metadevice to be mounted on /db20:
    # /dev/md/dsk/d20   /dev/md/rdsk/d20   /db20    ufs     2       yes     -
    /dev/md/dsk/d40   /dev/md/rdsk/d40   /db20    ufs     2       yes      -

  7. Logging becomes effective for the file system when it is remounted (for example, with mount /db20).

  8. On subsequent file system remounts or system reboots, instead of checking the file system, fsck displays a logging message for the metadevice:
    # reboot
    ...
    /dev/md/rdsk/d40: is logging
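
Step 3 above asks for a logging mirror, d30, without showing the commands. The sketch below follows the same pattern used for mirrors earlier in this article (two single-slice concatenations, a one-way mirror, then metattach). The slice names c1t2d0s0 and c2t2d0s0 are hypothetical choices for this illustration, and the run wrapper is a made-up helper that only prints the commands unless DRY_RUN=0 is set, so the script can be read and tried without DiskSuite installed.

```shell
# Print each command instead of running it, unless DRY_RUN=0 is set.
# (run is a hypothetical helper for this sketch.)
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical 100MB slices on separate controllers for the two submirrors.
LOG_SLICE_1=c1t2d0s0
LOG_SLICE_2=c2t2d0s0

build_log_mirror() {
    run metainit d31 1 1 "$LOG_SLICE_1"   # submirror 1 of the logging mirror
    run metainit d32 1 1 "$LOG_SLICE_2"   # submirror 2 of the logging mirror
    run metainit d30 -m d31               # one-way mirror built on d31
    run metattach d30 d32                 # attach d32, which starts a resync
}

build_log_mirror
```

By default this only echoes the four commands, each prefixed with "+ ". Setting DRY_RUN=0 would execute them, which should only be done after checking that the chosen slices are really free.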

Creating a Hot Spare



Last modified on: Saturday, 18-Sep-2010 18:23:24 EDT