Monday, November 27, 2006

Comparisons of Disksuite vs Volume Manager

Characteristic

Solstice Disksuite

Veritas Volume Manager

Availability

Free with a server license, pay for workstation. Sun seems to have an on-again, off-again relationship about whether future development will continue with this product, but it is currently available for Solaris 8 and will continue to be supported in that capacity. The current word is that future development is on again. Salt liberally. (Just to further confuse things, SDS is free for Solaris 8 up to 8 CPUs, even for commercial use.)

Available from Sun or directly from Veritas (pricing may differ considerably). Also excellent educational pricing. Free with a storage array or A5000 (but striping cannot be used outside the array device).

Installation

relatively easy, but you must perform special steps, in exactly the right order, to mirror the root disk, swap, and other disks.

easy. Follow the onscreen prompts, let it do its reboots.

Upgrading

easy, remove patches, remove packages, add new packages

slightly more complex, but well documented. There are several ways to do it.

Replacing failed
root mirror

very easy. replace disk and resynchronize

very easy. replace disk and resynchronize

Replacing failed
primary root disk

relatively easy. boot off mirror, replace disk, resync, boot off primary.

easy to slightly complex depending on setup. Well documented. 11 steps or fewer.

Replacing failed
data disk in
redundant (mirrored
or RAID5) volume

trivial

trivial

extensibility / number
of volumes

Traditionally, relatively easy but EXTREMELY limited by use of the hard partition table on disk. The total number of volumes on a typical system is very limited because of this. If you have a lot of disks, you can still create a lot of metadevices. The default maximum is 256, but this can be increased by setting nmd=XXXX in /kernel/drv/md.conf and then rebooting. Schemes for managing metadevice naming for large numbers of devices are available, but clunky and occasionally contrived. NOTE: SDS 4.2.1+ (available in Solaris 7) removes the reliance upon the disk VTOC for making metadevices through 'soft partitions'.

trivial. No limitations will be encountered by most people. Number of volumes is potentially limitless.
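The nmd tuning mentioned above is a one-line change. A sketch, assuming the stock md.conf layout (the default line and values on your release may differ):

```
# /kernel/drv/md.conf -- illustrative values only
# raise the metadevice limit, then perform a reconfiguration reboot
name="md" parent="pseudo" nmd=1024 md_nsets=4;
```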

Moving a volume

difficult unless special planning and precautions have been taken with laying out the proper partition and disk labels beforehand. Somewhat hidden by GUI.

trivial. On redundant volumes this can be done on the fly.
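As a sketch of the VxVM side, assuming a hypothetical disk group datadg with volume vol01 living on disk01 (all names here are illustrative):

```shell
# evacuate vol01's subdisks from disk01 to disk05 while the volume stays online
# (quote or escape the ! under csh)
vxassist -g datadg move vol01 !disk01 disk05

# or move everything off a disk in one shot
vxevac -g datadg disk01 disk05
```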

Growing a volume

volume can be extended in two different ways. It can be concatenated with another region of space someplace else or, if there is contiguous space following ALL of the partitions of the original volume, the stripe can be extended. Using concatenation you could grow a 4 disk stripe by 2 additional disks. (e.g. 4 disk stripe concatenated with a 2 disk stripe).

volume can be extended in several ways. The columns of a stripe can be extended for RAID-0/5, simple single-disk volumes can be grown directly, and in VxVM > 3.0 a volume can be re-laid out (the number of columns in a RAID-5 stripe can be reconfigured on the fly!). Contiguous space is not required. In VxVM < 3.0, if you are increasing the size of a stripe, you must add space across the same number of disks as the original stripe: you can't 'grow' a 4 disk stripe by adding two more disks, but you could by adding four. Extremely flexible.

Shrinking a volume
(only possible with
VxFS filesystem!)

difficult. You must adjust all disk or soft partitions manually.

trivial. vxresize can shrink the filesystem and volume in one command.
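A sketch of vxresize usage, assuming a disk group datadg whose volume datavol carries a VxFS filesystem (names are illustrative):

```shell
# grow the volume and its filesystem to 10 GB in one step
vxresize -g datadg datavol 10g

# shrink to 5 GB -- only safe when the volume holds a VxFS filesystem
vxresize -g datadg datavol 5g
```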

Relayout volume
(change a 4 disk
RAID-5 volume to a 5 disk volume)

Requires dump/restore of data.

Available on the fly for VxVM > 3.0
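A hedged sketch of the on-the-fly relayout under VxVM >= 3.0, again using an illustrative disk group and volume name:

```shell
# change a 4-column RAID-5 volume to 5 columns while it remains in use
vxassist -g datadg relayout datavol ncol=5

# monitor the relayout task as it runs
vxtask list
```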

Logging

in SDS a meta-trans device may be used, which provides a log-based addition on top of a UFS filesystem. This transaction log, if used, should be mirrored! (Loss of the log results in a filesystem that may be corrupted even beyond fsck repair.) Using a UFS+ logging filesystem instead of a trans device is a better alternative. UFS+ logging is available in Solaris 7 and above.

VxVM has RAID-5 logs and mirror/DRL logs. Logging, if used, need not be mirrored, and the volume can continue operating if the log fails. Having one is highly recommended for crash recovery. Logs are infinitesimally small, typically one disk cylinder or so. The SDS logs are really more equivalent to a VxFS log at the filesystem level, but it is worth mentioning the additional capabilities of VxVM in this regard. UFS+ with logging can also be used on a VxVM volume. There are many kinds of purpose-specific logs for things like fast mirror resync, volume replication, database logging, etc.

Performance

Your mileage may vary. SDS seems to excel at simple RAID-0 striping, but is only marginally faster than VxVM, and VxVM seems to gain it back when using large interleaves. For best results, benchmark YOUR data with YOUR app and pay very close attention to your data size and your stripe unit/interleave size. RAID-5 on VxVM is almost always faster, by 20-30%.

Notifications (see also)

SNMP traps are used for notification. You must have something set up to receive them. Notifications are limited in scope.

VxVM uses email for notifying you when a volume is being moved because of bad blocks using hot relocation or sparing. The notification is very good.

Sparing

hot spare disks may be designated for a diskset, but must be done at the slice level.

hot spare disks may be designated for a diskgroup. Or, extra space on any disk can be used for dynamic hot relocation without the need for reserving a spare.

Terminology

SDS diskset = VxVM diskgroup, SDS metadevice = VxVM volume, SDS Trans device ~ VxVM log, VxVM has subdisks which are units of data (e.g. a column of a stripe) that have no SDS equivalent. VxVM plexes are groupings of subdisks (e.g. into a stripe) that have no real SDS equivalent. VxVM Volumes are groupings of plexes. (e.g. the data plex and a log plex, or 2 plexes for a 0+1 volume)

GUI

Most people prefer the VxVM GUI, though there are a few who prefer the (now 4 years old) SDS GUI. SDS has been renamed SVM in Solaris 9, and the GUI is supposedly much improved. VxVM has gone through 3-4 GUI incarnations. Disclaimer: I *never* use the GUI.

command line usage

metainit to create volumes, metareplace to replace components, metaclear to delete, metadb for state databases, etc.

vxassist is used for creation of all volume types, vxsd, vxvol, vxplex operate on appropriate VxVM objects (see terminology above). Generally, there are many more vx specific commands, but normal usage rarely requires 20% of these except for advanced configurations (special initializations, using alternate pathing, etc)

device database configuration copies

Kept in special, replicated partitions that you must set up on disk and configure via metadb. /etc/opt/SUNWmd and /etc/system contain the boot/metadb information and the description of the volumes. Lose these and you have big problems. NOTE: in Solaris 9 SVM, configuration copies are now kept on the metadisks themselves with the data, like VxVM.

Kept in the private region on each disk. Disks can move about and the machine can be reinstalled without having to worry about losing data in volumes.

Typical usage

Simple mirroring of root disk, simple striping of disks where situation is relatively stagnant (e.g. just a bunch of disks with RAID0 and no immediate scaling or mobility concerns). Scales well in size of small number of volumes, but poorly in large number of smaller volumes.

enterprise ready. Data mobility, scalability, and configuration are all extensively addressed. Replacing failed encapsulated rootdisk is more complicated than it needs to be. See Sun best practices paper for a better way. Other alternatives exist.

Best features

Simple/simplistic - root/swap mirroring and simple striping is no brainer, free or nearly so. Easier to fix by hand (without immediate support) when something goes fubar (vxvm is much more complex to understand under the hood).

extensible, error notifications are good, extremely configurable, relayout on the fly with VxVM > 3.0, nice integration with VxFS, best scalability. Excellent edu pricing.

Worst features

configuration syntax (meta*); configuration information stored on the host system (< Sol9). The metadb/slices scheme -- a remnant from SunOS4 days! -- needs to be completely redone; naming is inflexible and limited. The hard limit on the number of metadevices has kernel-tunable workarounds, but is still very limiting. Required mirroring of trans logs is inconvenient, but mitigated by using native UFS+ w/logging in Solaris 7 and above. Lack of drive-level hot sparing (see sparing) is extremely inconvenient.

expensive for enterprises and big servers, root mirroring and primary rootdisk failure for encapsulated rootdisk is too complex (but well documented) (should be fixed in VxVM 4.0), somewhat steep learning curve for advanced usage. Recovery from administrative SNAFUs (involving restore and single user mode) on a mirrored rootdisk can be troublesome.

Tips

keep backups of your configuration in case of corruption. Regular usage of metastat, metastat -p, and prtvtoc can help.

In VxVM regular usage of vxprint -ht is useful for disaster recovery. There are also several different disaster recovery scripts here
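One way to put both tips into practice is a small script run from cron; the output directory and filenames here are just suggestions:

```shell
# capture volume manager state for disaster recovery (run daily from cron)
DOC=/var/adm/doc
mkdir -p $DOC

metastat      > $DOC/metastat.out       # SDS: full status of all metadevices
metastat -p   > $DOC/metastat-p.out     # SDS: md.tab-style one-liners
metadb        > $DOC/metadb.out         # SDS: state replica locations
prtvtoc /dev/rdsk/c0t0d0s2 > $DOC/c0t0d0.vtoc   # disk label of the boot disk

vxprint -ht   > $DOC/vxprint-ht.out     # VxVM: complete object hierarchy
```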

Using VxVM for
data and SDS
for root mirroring

Many people do this. There are tradeoffs. On the one hand, you have added simplicity in the management of your root disks by not having to deal with VxVM encapsulation, which can ease recovery and upgrades. On the other hand, you now have the added complexity of maintaining a separate rootdg volume someplace else, or using a simple slice (which, by the way, neither Sun nor Veritas will support if there are problems). You also have the added complexity of managing two completely separate storage/volume management products and their associated nuances and patches. In the end it boils down to preference. There is no right or wrong answer here, though some will say otherwise. ;) Veritas VxVM 4.0 removes the requirement for rootdg.

Mirroring Disks with Solstice DiskSuite

Introduction

This paper will present a short introduction to mirroring two disks using Solstice DiskSuite. Although not as robust as Veritas Volume Manager (VxVM) (also distributed by Sun as the "Sun Enterprise Volume Manager" (SEVM)), DiskSuite is nonetheless still a popular choice for doing basic disk mirroring. This tutorial will be presented using an actual mirroring session with comments and explanations interspersed.

Note that the following procedure is for DiskSuite 4.2, which runs on Solaris 2.6 and Solaris 7. This procedure will also work with DiskSuite 4.2.1, distributed with Solaris 8. Solstice DiskSuite is now known as Solaris Volume Manager in Solaris 9 and later. There are a few fundamental changes between the two versions, particularly in the size of state database replicas which could cause data loss if this procedure is followed exactly using SVM. The terms DiskSuite and Solaris Volume Manager are both used in this document to refer to the software. Where necessary, specific versions will be pointed out.

Installation

The first step to setting up mirroring using DiskSuite is to install the DiskSuite packages and any necessary patches for systems prior to Solaris 9. SVM is part of the base system in Solaris 9. The latest recommended version of DiskSuite is 4.2 for systems running Solaris 2.6 and Solaris 7, and 4.2.1 for Solaris 8.
There are currently three packages and one patch necessary to install DiskSuite 4.2. They are:

SUNWmd (Required)

SUNWmdg (Optional GUI)

SUNWmdn (Optional SNMP log daemon)

106627-19 (obtain latest revision)

The packages should be installed in the same order as listed above. Note that a reboot is necessary after the install as new drivers will be added to the Solaris kernel.
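A sketch of the install sequence, assuming the packages sit in the current directory and the patch has been downloaded to /var/tmp (adjust paths to your media):

```shell
# install the DiskSuite 4.2 packages in the documented order, then the patch
pkgadd -d . SUNWmd SUNWmdg SUNWmdn
patchadd /var/tmp/106627-19

# reboot so the new md kernel drivers load
init 6
```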

For DiskSuite 4.2.1, install the following packages:

SUNWmdu (Commands)

SUNWmdr (Drivers)

SUNWmdx (64-Bit Drivers)

SUNWmdg (Optional GUI)

SUNWmdnr (Optional log daemon configs)

SUNWmdnu (Optional log daemon)

For Solaris 2.6 and 7, to make life easier, be sure to update your PATH and MANPATH variables to add DiskSuite's directories. Executables reside in /usr/opt/SUNWmd/sbin and man pages in /usr/opt/SUNWmd/man. In Solaris 8, DiskSuite files were moved to "normal" system locations (/usr/sbin) so path updates are not necessary.
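For sh/ksh users this amounts to two lines, typically added to ~/.profile:

```shell
# add DiskSuite's Solaris 2.6/7 locations to the command and man page paths
PATH=$PATH:/usr/opt/SUNWmd/sbin
MANPATH=${MANPATH:-/usr/share/man}:/usr/opt/SUNWmd/man
export PATH MANPATH
```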

The Environment

In this example we will be mirroring two disks, both on the same controller. The first disk will be the primary disk and the second will be the mirror. The disks are:

	Disk 1: c0t0d0         Disk 2: c0t1d0

The partitions on the disks are presented below. There are a few items of note here. Each disk is partitioned exactly the same; this is necessary to properly implement the mirrors. Slice 2, commonly referred to as the 'backup' slice, represents the entire disk and must not be mirrored. There are situations where slice 2 is used as a normal slice; however, this author would not recommend doing so.

The three unassigned partitions on each disk are configured to each be 10MB. These 10MB slices will hold the DiskSuite State Database Replicas, or metadbs. More information on the state database replicas will be presented below. In DiskSuite 4.2 and 4.2.1, a metadb only occupies 1034 blocks (517KB) of space. In SVM, they occupy 8192 blocks (4MB). This can lead to many problems during an upgrade if the slices used for the metadb replicas are not large enough to support the new larger databases.

	Disk 1:                         Disk 2:
	  c0t0d0s0:  /                    c0t1d0s0:  /
	  c0t0d0s1:  swap                 c0t1d0s1:  swap
	  c0t0d0s2:  backup               c0t1d0s2:  backup
	  c0t0d0s3:  unassigned           c0t1d0s3:  unassigned
	  c0t0d0s4:  /var                 c0t1d0s4:  /var
	  c0t0d0s5:  unassigned           c0t1d0s5:  unassigned
	  c0t0d0s6:  unassigned           c0t1d0s6:  unassigned
	  c0t0d0s7:  /export              c0t1d0s7:  /export

The Database State Replicas

The database state replicas serve a very important function in DiskSuite. They are the repositories of information on the state and configuration of each metadevice (A logical device created through DiskSuite is known as a metadevice). Having multiple replicas is critical to the proper operation of DiskSuite.

There must be a minimum of three replicas. DiskSuite requires at least half of the replicas to be present in order to continue to operate.

51% of the replicas must be present in order to reboot.

Replicas should be spread across disks and controllers where possible.

In a three drive configuration, at least one replica should be on each disk, thus allowing for a one disk failure.

In a two drive configuration, such as the one we present here, there must be at least two replicas per disk. If there were only three and the disk which held two of them failed, there would not be enough information for DiskSuite to function and the system would panic.

Here we will create our state replicas using the metadb command:

# metadb -a -f /dev/dsk/c0t0d0s3
# metadb -a /dev/dsk/c0t0d0s5
# metadb -a /dev/dsk/c0t0d0s6
# metadb -a /dev/dsk/c0t1d0s3
# metadb -a /dev/dsk/c0t1d0s5
# metadb -a /dev/dsk/c0t1d0s6

The -a and -f options used together create the initial replica. The -a option attaches a new database device and automatically edits the appropriate files.

Initializing Submirrors

Each mirrored metadevice contains two or more submirrors. The metadevice gets mounted by the operating system rather than the original logical device. Below we will walk through the steps involved in creating metadevices for our primary filesystems.

Here we create the two submirrors for the / (root) filesystem, as well as a one-way mirror between the metadevice and its first submirror.

# metainit -f d10 1 1 c0t0d0s0
# metainit -f d20 1 1 c0t1d0s0
# metainit d0 -m d10

The first two commands create the two submirrors. The -f option forces the creation of the submirror even though the specified slice is a mounted filesystem. The next two arguments, 1 1, specify the number of stripes on the metadevice and the number of slices that make up the stripe. In a mirroring situation, this should always be 1 1. Finally, we specify the logical device that we will be mirroring.

After mirroring the root partition, we need to run the metaroot command. This command will update the root entry in /etc/vfstab with the new metadevice as well as add the appropriate configuration information into /etc/system. Omitting this step is one of the most common mistakes made by those unfamiliar with DiskSuite. If you do not run the metaroot command before you reboot, you will not be able to boot the system!

# metaroot d0

Next, we continue to create the submirrors and initial one-way mirrors for the metadevices which will replace the swap, /var, and /export partitions.

# metainit -f d11 1 1 c0t0d0s1
# metainit -f d21 1 1 c0t1d0s1
# metainit d1 -m d11

# metainit -f d14 1 1 c0t0d0s4
# metainit -f d24 1 1 c0t1d0s4
# metainit d4 -m d14

# metainit -f d17 1 1 c0t0d0s7
# metainit -f d27 1 1 c0t1d0s7
# metainit d7 -m d17

Updating /etc/vfstab

The /etc/vfstab file must be updated at this point to reflect the changes made to the system. The / partition will have already been updated through the metaroot command run earlier, but the system needs to know about the new devices for swap and /var. The entries in the file will look something like the following:

/dev/md/dsk/d1  -                -        swap  -  no   -
/dev/md/dsk/d4  /dev/md/rdsk/d4  /var     ufs   1  yes  -
/dev/md/dsk/d7  /dev/md/rdsk/d7  /export  ufs   1  yes  -

Notice that the device paths for the disks have changed from the normal style /dev/dsk/c#t#d#s# and /dev/rdsk/c#t#d#s# to the new metadevice paths, /dev/md/dsk/d# and /dev/md/rdsk/d#.

The system can now be rebooted. When it comes back up it will be running off of the new metadevices. Use the df command to verify this. In the next step we will attach the second half of the mirrors and allow the two drives to synchronize.

Attaching the Mirrors

Now we must attach the second half of the mirrors. Once the mirrors are attached, an automatic synchronization process begins to ensure that both halves of the mirror are identical. The progress of the synchronization can be monitored using the metastat command. To attach the submirrors, issue the following commands:

# metattach d0 d20
# metattach d1 d21
# metattach d4 d24
# metattach d7 d27
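While the resynchronization runs, its progress can be checked with metastat; for example:

```shell
# summary view: look for submirrors still resyncing
metastat | grep -i resync

# or watch a single mirror; the output includes a "Resync in progress"
# percentage while the submirrors are being synchronized
metastat d0
```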

Final Thoughts

With an eye towards recovery from a future disaster, it may be a good idea to find the physical device path of the root partition on the second disk and create an Open Boot PROM (OBP) device alias, making it easier to boot the system if the primary disk fails. To find the physical device path, simply do the following:

# ls -l /dev/dsk/c0t1d0s0

This should return something similar to the following:

/sbus@3,0/SUNW,fas@3,8800000/sd@1,0:a

Using this information, create a device alias using an easy to remember name such as altboot. To create this alias, do the following in the Open Boot PROM:

ok nvalias altboot /sbus@3,0/SUNW,fas@3,8800000/sd@1,0:a

For more information on creating OBP device aliases, refer to the following document: "TechNote: Modifying the CD-ROM nvalias on an Ultra 10 (IDE based) System".

It is now possible to boot off of the secondary device in case of failure using boot altboot from the OBP.

 

Sunday, November 26, 2006

Storage area networking

Introduction

A storage area network (SAN) can address several challenges faced by system administrators. Unlike direct-attached storage (DAS), SANs allow the administrator to manage a central pool of storage and allocate it to individual hosts as needed. Furthermore, the optical nature of SANs provides flexibility not available with direct-attached storage, which typically uses electrical signaling. For example, one can unplug a fibre cable without having to worry about "blowing" an HBA.

However, their size and complexity can be daunting, particularly when faced with the terminology:

  • World-wide name (WWN): Similar to an Ethernet address in the traditional networking world, this is a unique name assigned by the manufacturer to the HBA. This name is then used to grant and restrict access to other components of the SAN.
  • zone: Within a fabric, switches can segment the SAN into logical "zones". Only those elements in the same zone can "see" each other.

Topology of a simple SAN

In this exercise, we're going to configure a simple SAN composed of three nodes and one storage array. The three nodes are Solaris servers with Emulex or JNI fibre host bus adapters (HBAs). As we'll see, the configuration process for these HBA models differs slightly, but the concepts are very similar. Our example uses Brocade switches and an EMC Symmetrix array, but the concepts would apply equally well to different vendors' products.

The following diagram illustrates several principles common to most SAN topologies:

  • There are two separate storage area networks ("san1" and "san2"). Though most switches support inter-switch links (ISLs), many vendors suggest isolating fabrics for simple SANs, as shown here. Each switch therefore comprises an independent "fabric".

  • The array provides multiple I/O paths to every disk device. In the diagram below, for example, the disk device "2ad" is accessible via both fa4b and fa13a. Depending on the array, the two paths may be simultaneously active (active/active) or only one path may be valid at a time (active/passive).

  • Each host has a connection into each fabric. Host-based software load-balances the traffic between the two links if the array supports active/active disk access. Furthermore, the host should adjust to a single link failure by rerouting all traffic along the surviving path.

Take a moment to examine the diagram below and consider the host "pear". Assuming that it requires highly-available access to disk 2ad, note that there are two separate paths to that device:

  1. pear(fcaw0) -- san1(port2) -- san1(port15) -- fa13a -- 2ad

  2. pear(fcaw1) -- san2(port2) -- san2(port14) -- fa4b -- 2ad

Figure 1: Topology of a simple SAN:

 

Our desired configuration requires the following:

  • The server pear requires access to two devices accessible via both fa4b and fa13a. We will assign EMC devices 2ad and 2b1 to pear.
  • The server apple requires access to four devices accessible via both fa4a and fa13b. We will assign EMC devices 15d, 161, 165, and 169 to apple.
  • The server banana requires access to three devices accessible via both fa4a and fa13b. We will assign EMC devices 16d, 171, and 175 to banana.
  • All servers require access to the VolumeLogix device 000 on each FA.

After the configuration is complete, the following devices will remain unassigned and available for future use: 2b5, 2b9, 2bd, 2c1, and 2c5.

Identify the WWN of devices

In the sequence below, we configure the host hba drivers to scan for the required luns, update the switch zoning, and finally configure the lun masking (VolumeLogix) in order to make the storage visible to the above three servers.

Determine world-wide names of all adapters in SAN configuration.

For host bus adapters, the world-wide name is typically displayed in the /var/adm/messages file after the fibre card and software driver have been installed. An example for an Emulex PCI fibre HBA follows. In this case, the relevant value is the WWPN, the World-Wide Port Name:

Aug 26 09:26:28 banana lpfc: [ID 242157 kern.info] NOTICE:
                 lpfc0:031:Link Up Event received Data: 1 1 0 0
Aug 26 09:26:31 banana lpfc: [ID 129691 kern.notice] NOTICE:
                 lpfc0: Firmware Rev 3.20 (D2D3.20X3)
Aug 26 09:26:31 banana lpfc: [ID 664688 kern.notice] NOTICE:
                 lpfc0: WWPN:10:00:00:00:c9:28:22:38
                 WWNN:20:00:00:00:c9:28:22:38 DID 0x210913
Aug 26 09:26:31 banana lpfc: [ID 494464 kern.info] NOTICE:
                 lpfc1:031:Link Up Event received Data: 1 1 0 0
Aug 26 09:26:34 banana lpfc: [ID 129691 kern.notice] NOTICE:
                 lpfc1: Firmware Rev 3.20 (D2D3.20X3)
Aug 26 09:26:34 banana lpfc: [ID 664688 kern.notice] NOTICE:
                 lpfc1: WWPN:10:00:00:00:c9:28:22:56
                 WWNN:20:00:00:00:c9:28:22:56 DID 0x210913

Note that the HBA vendor may also provide a tool that allows the system administrator to query the WWN of the HBA (e.g. Emulex supplies the lputil application). The world-wide names of the Symmetrix fibre adapters (FA) can be obtained from EMC. In the case of our sample SAN, the world-wide names are as follows:
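On the host side, a quick way to recover the WWPNs from the logs shown above:

```shell
# scan the messages file for the adapters' port names
grep -i WWPN /var/adm/messages

# Emulex's menu-driven lputil can also display the adapter WWN interactively
```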

Host name   Port    SCSI target   World Wide Name
apple       fcaw0                 200000e069415402
apple       fcaw1                 200000e0694157a0
pear        fcaw0                 200000e069415773
pear        fcaw1                 200000e069415036
banana      lpfc0                 10000000c9282238
banana      lpfc1                 10000000c9282256
symmetrix   fa4a    target 20     500604872363ee43
symmetrix   fa4b    target 21     500604872363ee53
symmetrix   fa13a   target 22     500604872363ee4c
symmetrix   fa13b   target 23     500604872363ee5c

Update the host configuration

There are several vendors of host bus adapters, and the vendor's documentation is the best reference for the configuration process. In this section, we'll highlight the process of configuring the server for two popular vendors, Emulex and JNI. Please refer to the vendor's documentation for your specific card.

Typically, there are two configuration files that need to be updated once the vendor's HBA software has been installed. The HBA driver's configuration file typically resides in the /kernel/drv directory, and must be updated to support persistent binding and any other configuration requirements specified by the array vendor. Secondly, the Solaris "sd" driver configuration file sd.conf must be updated to tell the operating system to scan for more than the default list of SCSI disk devices. The examples below describe the process for configuring Emulex and JNI cards to support an EMC Symmetrix array.

  • Configure the /kernel/drv/fcaw.conf on servers apple and pear:

    fca_nport = 1;
    failover = 60;
    def_hba_binding = "fas";
    fca_verbose = 1;

    Also available for examination are the complete fcaw.conf files for PowerPath and non-PowerPath.

  • Configure the /kernel/drv/lpfc.conf on server banana.
    fcp-bind-WWPN="500604872363ee43:lpfc0t20",
                  "500604872363ee5c:lpfc1t23";

    Also available for examination are the complete lpfc.conf files for PowerPath and non-PowerPath.

By default, the Solaris server will scan for a limited number of scsi devices. The administrator has to update the /kernel/drv/sd.conf file to tell the sd driver to scan for a broader range of scsi devices. In both cases, the target number associated with the WWN of the array adapter is arbitrary. In our case, we've assigned scsi targets 20, 21, 22, and 23 to the four array adapters. The following list describes the additions to the /kernel/drv/sd.conf file for each of the three hosts:

  • apple:
    # Entries added for host apple to "see" lun 0-4 on fa4a and fa13b
    # fa4a = 500604872363ee43
    name="sd" class="scsi" target=20 lun=0 hba="fcaw0" wwn="500604872363ee43";
    name="sd" class="scsi" target=20 lun=1 hba="fcaw0" wwn="500604872363ee43";
    name="sd" class="scsi" target=20 lun=2 hba="fcaw0" wwn="500604872363ee43";
    name="sd" class="scsi" target=20 lun=3 hba="fcaw0" wwn="500604872363ee43";
    name="sd" class="scsi" target=20 lun=4 hba="fcaw0" wwn="500604872363ee43";

    # fa13b = 500604872363ee5c
    name="sd" class="scsi" target=23 lun=0 hba="fcaw1" wwn="500604872363ee5c";
    name="sd" class="scsi" target=23 lun=1 hba="fcaw1" wwn="500604872363ee5c";
    name="sd" class="scsi" target=23 lun=2 hba="fcaw1" wwn="500604872363ee5c";
    name="sd" class="scsi" target=23 lun=3 hba="fcaw1" wwn="500604872363ee5c";
    name="sd" class="scsi" target=23 lun=4 hba="fcaw1" wwn="500604872363ee5c";
  • pear:
    # Entries added for host pear to "see" lun 0-2 on fa4b and fa13a
    # fa4b = 500604872363ee53
    name="sd" class="scsi" target=21 lun=0 hba="fcaw0" wwn="500604872363ee53";
    name="sd" class="scsi" target=21 lun=1 hba="fcaw0" wwn="500604872363ee53";
    name="sd" class="scsi" target=21 lun=2 hba="fcaw0" wwn="500604872363ee53";

    # fa13a = 500604872363ee4c
    name="sd" class="scsi" target=22 lun=0 hba="fcaw1" wwn="500604872363ee4c";
    name="sd" class="scsi" target=22 lun=1 hba="fcaw1" wwn="500604872363ee4c";
    name="sd" class="scsi" target=22 lun=2 hba="fcaw1" wwn="500604872363ee4c";
  • banana:
    # Entries added for host banana to "see" lun 5-7 on fa4a and fa13b
    # fa4a = 500604872363ee43
    name="sd" parent="lpfc" target=20 lun=0 hba="lpfc0";
    name="sd" parent="lpfc" target=20 lun=5 hba="lpfc0";
    name="sd" parent="lpfc" target=20 lun=6 hba="lpfc0";
    name="sd" parent="lpfc" target=20 lun=7 hba="lpfc0";

    # fa13b = 500604872363ee5c
    name="sd" parent="lpfc" target=23 lun=0 hba="lpfc1";
    name="sd" parent="lpfc" target=23 lun=5 hba="lpfc1";
    name="sd" parent="lpfc" target=23 lun=6 hba="lpfc1";
    name="sd" parent="lpfc" target=23 lun=7 hba="lpfc1";
Update the /etc/system file as per EMC's requirements for the Symmetrix.
set sd:sd_max_throttle=20
set scsi_options = 0x7F8

* Powerpath?   Enter
* No           sd:sd_io_time=0x78
* Yes          sd:sd_io_time=0x3C
set sd:sd_io_time=0x3C

Update the switch configuration

In the example session below, we configure the switch "san1" zoning so that one of the hbas in each host can "see" fa4a and fa13a of the Symmetrix. The switch "san2" is configured so that the other hba in each host can "see" fa4b and fa13b of the Symmetrix. We leave the configuration of san2 as an exercise for the reader.

This configuration illustrates a few principles that we have not discussed earlier. First, each host bus adapter is given only the access required to see the disks it needs. Secondly, each zone contains a single host bus adapter (i.e. a single initiator) and a single array adapter.

Fabric OS (tm)  Release v2.5.1b

login: admin
Password:
san1:admin> switchshow
switchName:     san1
switchType:     6.1
switchState:    Online
switchRole:     Principal
switchDomain:   2
switchId:       fffc02
switchWwn:      10:00:00:60:69:40:27:db
switchBeacon:   OFF
port  0: id  No_Light
port  1: id  Online        F-Port  20:00:00:e0:69:41:54:02
port  2: id  Online        F-Port  20:00:00:e0:69:41:57:73
port  3: id  Online        F-Port  10:00:00:00:c9:28:22:38
port  4: id  No_Light
port  5: id  No_Light
port  6: id  No_Light
port  7: id  No_Light
port  8: id  No_Light
port  9: id  No_Light
port 10: id  No_Light
port 11: id  No_Light
port 12: id  No_Light
port 13: id  No_Light
port 14: id  Online        F-Port  50:06:04:87:23:63:ee:43
port 15: id  Online        F-Port  50:06:04:87:23:63:ee:4c
san1:admin> aliCreate "APPLE_FCAW0", "20:00:00:e0:69:41:54:02";
san1:admin> aliCreate "PEAR_FCAW0", "20:00:00:e0:69:41:57:73";
san1:admin> aliCreate "BANANA_LPFC0", "10:00:00:00:c9:28:22:38";
san1:admin> aliCreate "SYMM_FA4A", "50:06:04:87:23:63:ee:43";
san1:admin> aliCreate "SYMM_FA13A", "50:06:04:87:23:63:ee:4c";
san1:admin> zoneCreate "Z_APPLE_FCAW0_SYMM_FA4A", "APPLE_FCAW0; SYMM_FA4A";
san1:admin> zoneCreate "Z_PEAR_FCAW0_SYMM_FA13A", "PEAR_FCAW0; SYMM_FA13A";
san1:admin> zoneCreate "Z_BANANA_LPFC0_SYMM_FA4A", "BANANA_LPFC0; SYMM_FA4A";
san1:admin> cfgAdd "FABRIC1", "Z_APPLE_FCAW0_SYMM_FA4A; Z_PEAR_FCAW0_SYMM_FA13A; Z_BANANA_LPFC0_SYMM_FA4A";
san1:admin> cfgSave
Updating flash ...
san1:admin> cfgEnable "FABRIC1";
zone config "FABRIC1" is in effect
san1:admin> zoneshow
Defined configuration:
 cfg:   FABRIC1
        Z_APPLE_FCAW0_SYMM_FA4A; Z_PEAR_FCAW0_SYMM_FA13A;
        Z_BANANA_LPFC0_SYMM_FA4A
 zone:  Z_APPLE_FCAW0_SYMM_FA4A
        APPLE_FCAW0; SYMM_FA4A
 zone:  Z_BANANA_LPFC0_SYMM_FA4A
        BANANA_LPFC0; SYMM_FA4A
 zone:  Z_PEAR_FCAW0_SYMM_FA13A
        PEAR_FCAW0; SYMM_FA13A
 alias: APPLE_FCAW0
        20:00:00:e0:69:41:54:02
 alias: BANANA_LPFC0
        10:00:00:00:c9:28:22:38
 alias: PEAR_FCAW0
        20:00:00:e0:69:41:57:73
 alias: SYMM_FA13A
        50:06:04:87:23:63:ee:4c
 alias: SYMM_FA4A
        50:06:04:87:23:63:ee:43

Effective configuration:
 cfg:   FABRIC1
 zone:  Z_APPLE_FCAW0_SYMM_FA4A
        20:00:00:e0:69:41:54:02
        50:06:04:87:23:63:ee:43
 zone:  Z_BANANA_LPFC0_SYMM_FA4A
        10:00:00:00:c9:28:22:38
        50:06:04:87:23:63:ee:43
 zone:  Z_PEAR_FCAW0_SYMM_FA13A
        20:00:00:e0:69:41:57:73
        50:06:04:87:23:63:ee:4c

san1:admin> logout

Update the array's lun masking configuration

The zoning configuration defined above allows the two virtual endpoints of the fibre link to "see" each other. In the zoning example shown above, however, apple and banana are both zoned to "see" fa4a (zones Z_APPLE_FCAW0_SYMM_FA4A and Z_BANANA_LPFC0_SYMM_FA4A). Often, the administrator wants to restrict the host's visibility of disks on a single fibre adapter. The process of restricting access to disks based on the host WWN is referred to as lun masking.

In the example below, the administrator uses the EMC fpath command (recently, EMC has deprecated the fpath command in favour of the symmask/symmaskdb commands) to explicitly grant disk access to host WWNs. Without this access, the host would not be able to see the disks even though the zoning and hba configuration allowed it to see the fibre adapter on the array.

# /usr/symcli/bin/fpath adddev  -d /dev/rdsk/c1t20d0s2 \
               -w 200000e069415402 -f 4a -r "15d 161 165 169"
# /usr/symcli/bin/fpath chgname -d /dev/rdsk/c1t20d0s2 \
               -w 200000e069415402 -n "apple/fcaw0"
# /usr/symcli/bin/fpath adddev  -d /dev/rdsk/c1t20d0s2 \
               -w 200000e0694157a0 -f 13b -r "15d 161 165 169"
# /usr/symcli/bin/fpath chgname -d /dev/rdsk/c1t20d0s2 \
               -w 200000e0694157a0 -n "apple/fcaw1"

# /usr/symcli/bin/fpath adddev  -d /dev/rdsk/c1t20d0s2 \
               -w 200000e069415773 -f 4b -r "2ad 2b1"
# /usr/symcli/bin/fpath chgname -d /dev/rdsk/c1t20d0s2 \
               -w 200000e069415773 -n "pear/fcaw0"
# /usr/symcli/bin/fpath adddev  -d /dev/rdsk/c1t20d0s2 \
               -w 200000e069415036 -f 13a -r "2ad 2b1"
# /usr/symcli/bin/fpath chgname -d /dev/rdsk/c1t20d0s2 \
               -w 200000e069415036 -n "pear/fcaw1"

# /usr/symcli/bin/fpath adddev  -d /dev/rdsk/c1t20d0s2 \
               -w 10000000c9282238 -f 4a -r "16d 171 175"
# /usr/symcli/bin/fpath chgname -d /dev/rdsk/c1t20d0s2 \
               -w 10000000c9282238 -n "banana/lpfc0"
# /usr/symcli/bin/fpath adddev  -d /dev/rdsk/c1t20d0s2 \
               -w 10000000c9282256 -f 13b -r "16d 171 175"
# /usr/symcli/bin/fpath chgname -d /dev/rdsk/c1t20d0s2 \
               -w 10000000c9282256 -n "banana/lpfc1"

Refresh the VolumeLogix database:

# /usr/symcli/bin/fpath refresh -d /dev/rdsk/c1t20d0s2

Query the VolumeLogix database:

# /usr/symcli/bin/fpath lsdb -d /dev/rdsk/c1t20d0s2 -s on

Make a backup of the VolumeLogix database:

# /usr/symcli/bin/fpath backupdb -d /dev/rdsk/c1t20d0s2 \
     -o /usr/emc/VolumeLogix/backup/vcmdb.`date '+%Y%m%d'`

Reboot servers

Perform a reconfiguration reboot (e.g. "reboot -- -r") on all three servers.

You should see the desired disks. Put a Sun label on them via the "format" command and the configuration is complete.

To summarize, four "layers" of configuration must be correct in order to grant a host access to SAN storage:

  1. physical: The physical cabling linking the hba, switch, and array must function.
  2. host configuration: The host bus adapter must be configured according to the manufacturer's documentation. Our example used Emulex and JNI cards, but the process for other vendors is usually similar. In addition, the Solaris sd.conf configuration file must be updated so that the host scans for the appropriate luns.
  3. san: The san fabric must be configured so that the host bus adapter and the array fibre adapter are in the same zone.
  4. lun masking: The array lun masking configuration must be updated to give the host permission to access the required disks.
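The sd.conf requirement in layer 2 is mechanical enough to script. The sketch below only prints candidate entries; the target number and lun range are illustrative, and you should diff the result against the existing /kernel/drv/sd.conf before installing anything:

```shell
# Print sd.conf entries for one SCSI target and a range of luns.
# Usage: gen_sd_conf <target> <first_lun> <last_lun>
gen_sd_conf() {
    target=$1; lun=$2; last=$3
    while [ "$lun" -le "$last" ]; do
        printf 'name="sd" class="scsi" target=%s lun=%s;\n' "$target" "$lun"
        lun=$((lun + 1))
    done
}

# Example: generate entries to scan luns 0-3 on target 1, then review
# before appending to /kernel/drv/sd.conf and doing a reconfiguration reboot.
# gen_sd_conf 1 0 3
```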


Replacing a failed bootdisk

In the following example, the host has a failed bootdisk (c0t0d0). Fortunately, the system is using DiskSuite, with a mirror at c0t1d0. The following sequence of steps can be used to restore the system to full redundancy.

System fails to boot

When the system attempts to boot, it fails to find a valid device as required by the boot-device path at device alias "disk". It then attempts to boot from the network:

screen not found.
Can't open input device.
Keyboard not present.  Using ttya for input and output.

Sun Ultra 30 UPA/PCI (UltraSPARC-II 296MHz), No Keyboard
OpenBoot 3.27, 512 MB memory installed, Serial #9377973.
Ethernet address 8:0:20:8f:18:b5, Host ID: 808f18b5.

Initializing Memory
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
...

Boot from mirror

At this point, the administrator realizes that the boot disk has failed, and queries the device aliases to find the one corresponding to the disksuite mirror:

ok devalias
sds-mirror               /pci@1f,4000/scsi@3/disk@1,0
sds-root                 /pci@1f,4000/scsi@3/disk@0,0
net                      /pci@1f,4000/network@1,1
disk                     /pci@1f,4000/scsi@3/disk@0,0
cdrom                    /pci@1f,4000/scsi@3/disk@6,0:f
...

The administrator then boots the system from the mirror device "sds-mirror":

ok boot sds-mirror 

The system starts booting off of sds-mirror. However, because only two of the original four state database replicas are available, a quorum is not achieved, and manual intervention is required to remove the two failed state database replicas:

Starting with DiskSuite 4.2.1, an optional /etc/system parameter exists which allows DiskSuite to boot with just 50% of the state database replicas online. For example, if one of the two boot disks were to fail, just two of the four state database replicas would be available. Without this /etc/system parameter (or with older versions of DiskSuite), the system would complain of "insufficient state database replicas", and manual intervention would be required on bootup. To enable the "50% boot" behaviour with DiskSuite 4.2.1, execute the following command:

# echo "set md:mirrored_root_flag=1" >> /etc/system

Boot device: /pci@1f,4000/scsi@3/disk@1,0  File and args:
SunOS Release 5.8 Version Generic_108528-07 64-bit
Copyright 1983-2001 Sun Microsystems, Inc.  All rights reserved.
WARNING: md: d10: /dev/dsk/c0t0d0s0 needs maintenance
WARNING: forceload of misc/md_trans failed
WARNING: forceload of misc/md_raid failed
WARNING: forceload of misc/md_hotspares failed
configuring IPv4 interfaces: hme0.
Hostname: pegasus
metainit: pegasus: stale databases

Insufficient metadevice database replicas located.

Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.

Type control-d to proceed with normal startup,
(or give root password for system maintenance): ******

single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

Oct 17 19:11:29 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.8       Generic February 2000

# metadb -i
        flags           first blk       block count
    M     p             unknown         unknown         /dev/dsk/c0t0d0s5
    M     p             unknown         unknown         /dev/dsk/c0t0d0s6
     a m  p  lu         16              1034            /dev/dsk/c0t1d0s5
     a    p  l          16              1034            /dev/dsk/c0t1d0s6
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

# metadb -d c0t0d0s5 c0t0d0s6
metadb: pegasus: /etc/lvm/mddb.cf.new: Read-only file system

# metadb -i
        flags           first blk       block count
     a m  p  lu         16              1034            /dev/dsk/c0t1d0s5
     a    p  l          16              1034            /dev/dsk/c0t1d0s6
 ...

# reboot -- sds-mirror

Check extent of failures

Once the reboot is complete, the administrator then logs into the system and checks the status of the DiskSuite metadevices. Not only have the state database replicas failed, but all of the DiskSuite metadevices previously located on device c0t0d0 need to be replaced. Clearly the disk has completely failed.

pegasus console login: root
Password: ******
Oct 17 19:14:03 pegasus login: ROOT LOGIN /dev/console
Last login: Thu Oct 17 19:02:42 from rambler.wakefie
Sun Microsystems Inc.   SunOS 5.8       Generic February 2000

# metastat
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 13423200 blocks

d10: Submirror of d0
    State: Needs maintenance
    Invoke: metareplace d0 c0t0d0s0 <new device>
    Size: 13423200 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t0d0s0                   0     No    Maintenance

d20: Submirror of d0
    State: Okay
    Size: 13423200 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t1d0s0                   0     No    Okay

d1: Mirror
    Submirror 0: d11
      State: Needs maintenance
    Submirror 1: d21
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 2100000 blocks

d11: Submirror of d1
    State: Needs maintenance
    Invoke: metareplace d1 c0t0d0s1 <new device>
    Size: 2100000 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t0d0s1                   0     No    Maintenance

d21: Submirror of d1
    State: Okay
    Size: 2100000 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t1d0s1                   0     No    Okay

d4: Mirror
    Submirror 0: d14
      State: Needs maintenance
    Submirror 1: d24
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 2100000 blocks

d14: Submirror of d4
    State: Needs maintenance
    Invoke: metareplace d4 c0t0d0s4 <new device>
    Size: 2100000 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t0d0s4                   0     No    Maintenance

d24: Submirror of d4
    State: Okay
    Size: 2100000 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t1d0s4                   0     No    Okay

The administrator replaces the failed disk with a new disk of the same geometry. Depending on the system model, the disk replacement may require that the system be powered down. The replacement disk is then partitioned identically to the mirror, and state database replicas are copied onto the replacement disk. Finally, the metareplace command copies that data from the mirror to the replacement disk, restoring redundancy to the system.

# prtvtoc /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2
fmthard:  New volume table of contents now in place.

# installboot /usr/platform/sun4u/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0

# metadb -f -a /dev/dsk/c0t0d0s5
# metadb -f -a /dev/dsk/c0t0d0s6
# metadb -i
        flags           first blk       block count
     a        u         16              1034            /dev/dsk/c0t0d0s5
     a        u         16              1034            /dev/dsk/c0t0d0s6
     a m  p  luo        16              1034            /dev/dsk/c0t1d0s5
     a    p  luo        16              1034            /dev/dsk/c0t1d0s6
 ...

# metareplace -e d0 c0t0d0s0
d0: device c0t0d0s0 is enabled
# metareplace -e d1 c0t0d0s1
d1: device c0t0d0s1 is enabled
# metareplace -e d4 c0t0d0s4
d4: device c0t0d0s4 is enabled

Once the resync process is complete, operating system redundancy has been restored.

Admin's Guide to Solstice Disk Suite

About Solstice DiskSuite:
Solstice DiskSuite 4.2.1 is a software product that manages data and disk drives.
Solstice DiskSuite 4.2.1 runs on all SPARC systems running Solaris 8, and on all x86 systems running Solaris 8.
DiskSuite's diskset feature is supported only on the SPARC platform edition of Solaris. This feature is not supported on x86 systems.
Table of Contents
1. Advantages of DiskSuite
2. DiskSuite terms
3. DiskSuite packages
4. Installing DiskSuite 4.2.1 in Solaris 8
5. Creating the state database
6. Creating metadevices
6.1 Concatenated metadevice
6.2 Striped metadevice
6.3 Mirrored metadevice
6.3.1 Simple mirror
6.3.2 Mirroring a partition that can be unmounted
6.3.3 root mirroring & /usr mirroring
6.3.4 Making the alternate root disk bootable
6.3.5 Setting the alternate boot path for the root mirror
6.4 RAID 5
6.5 Trans metadevice
6.5.1 Trans metadevice for a file system that can be unmounted
6.5.2 Trans metadevice for a file system that cannot be unmounted
6.5.3 Trans metadevice using mirrors
6.6 Hot spare pool
6.6.1 Adding a hot spare to a mirror
6.6.2 Adding a hot spare to RAID5
6.6.3 Adding a disk to a hot spare pool
6.7 Disksets
6.7.1 Creating two disksets
6.7.2 Adding disks to a diskset
6.7.3 Creating a mirror in a diskset
7. Troubleshooting
7.1 Recovering from stale state database replicas
7.2 Metadevice errors
8. Next steps










1. Advantages of Disksuite
Solstice DiskSuite provides three major capabilities:
1. Overcoming the disk size limitation by joining multiple disk slices to form a larger volume.
2. Fault tolerance, by mirroring data from one disk to another or by keeping parity information in RAID5.
3. Performance enhancement, by spreading data across multiple disks.
2. Disksuite terms
Metadevice: A virtual device composed of several physical devices (slices/disks). All operations are carried out using the metadevice name and are transparently applied to the underlying devices.
RAID: A group of disks used to create a virtual volume is called an array; depending on the disk/slice arrangement, these are called various levels of RAID (Redundant Array of Independent Disks):
RAID 0 Concatenation/Striping
RAID 1 Mirroring
RAID 5 Striped array with rotating parity
Concatenation: Joining two or more disk slices to add up their disk space. Concatenation is serial in nature, i.e. sequential data operations are performed on the first disk, then the second disk, and so on. Because of this serial layout, new slices can be added without having to back up the entire concatenated volume, add the slice, and restore the backup.
Striping: Spreading data over multiple disk drives, mainly to enhance performance, by distributing it in alternating chunks (16 KB interleave by default) across the stripe components. Sequential data operations are performed in parallel on all the components by reading/writing 16 KB blocks alternately from each.
Mirroring: Mirroring provides data redundancy by simultaneously writing data to two or more submirrors of a mirrored device. A submirror can be a stripe or concatenated volume, and a mirror can have up to three submirrors. The main cost is that a mirror needs as much space again as the volume being mirrored.
RAID 5: RAID 5 provides data redundancy with the advantages of striping, and uses less space than mirroring. A RAID 5 volume is made up of at least three disks, which are striped with parity information written alternately across all the disks. In case of a single disk failure, the data can be rebuilt using the parity information from the remaining disks.
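The space comparison between mirroring and RAID 5 is simple arithmetic: a two-way mirror of disks of size S yields S of usable space, while an N-disk RAID 5 yields (N-1)*S because one disk's worth of space goes to parity. A throwaway sketch (sizes in GB, purely illustrative):

```shell
# Usable capacity: a mirror keeps one copy's worth of space,
# RAID5 loses one disk's worth to parity.
mirror_capacity() { echo "$1"; }                    # $1 = size per disk
raid5_capacity()  { echo $(( ($1 - 1) * $2 )); }    # $1 = ndisks (>=3), $2 = size per disk

# Example with 9 GB disks: a two-way mirror gives 9 GB usable,
# while a three-disk RAID5 gives 18 GB usable from the same three spindles.
```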




3. Disksuite Packages :
Solstice DiskSuite is part of the server edition of the Solaris OS and is not included with the desktop edition. The software is in pkgadd format and can be found in the following locations on CD:
Solaris 2.6 - "Solaris Server Intranet Extensions 1.0" CD
Solaris 7 - "Solaris Easy Access Server 3.0"
Solaris 8 - "Solaris 8 Software 2 of 2"
For Solaris 2.6 & 7 the Solstice DiskSuite version is 4.2. The following packages are part of it, but only "SUNWmd" plus a patch is strictly required:
SUNWmd - Solstice DiskSuite
SUNWmdg - Solstice DiskSuite Tool
SUNWmdn - Solstice DiskSuite Log Daemon
Patch No. 106627-04 (obtain latest revision)
The Solaris 8 DiskSuite version is 4.2.1. The following are the minimum required packages:
SUNWmdr Solstice DiskSuite Drivers (root)
SUNWmdu Solstice DiskSuite Commands
SUNWmdx Solstice DiskSuite Drivers (64-bit)
4. Installing DiskSuite 4.2.1 in Solaris 8
# cd /cdrom/sol_8_401_sparc_2/Solaris_8/EA/products/DiskSuite_4.2.1/sparc/Packages
# pkgadd -d .
The following packages are available:
1 SUNWmdg Solstice DiskSuite Tool
(sparc) 4.2.1,REV=1999.11.04.18.29
2 SUNWmdja Solstice DiskSuite Japanese localization
(sparc) 4.2.1,REV=1999.12.09.15.37
3 SUNWmdnr Solstice DiskSuite Log Daemon Configuration Files
(sparc) 4.2.1,REV=1999.11.04.18.29
4 SUNWmdnu Solstice DiskSuite Log Daemon
(sparc) 4.2.1,REV=1999.11.04.18.29
5 SUNWmdr Solstice DiskSuite Drivers
(sparc) 4.2.1,REV=1999.12.03.10.00
6 SUNWmdu Solstice DiskSuite Commands
(sparc) 4.2.1,REV=1999.11.04.18.29
7 SUNWmdx Solstice DiskSuite Drivers(64-bit)
(sparc) 4.2.1,REV=1999.11.04.18.29
Select packages 1, 3, 4, 5, 6 and 7 (package 2, the Japanese localization, is optional).
Answer 'yes' to the questions asked during installation, and reboot the system afterwards.
Put /usr/opt/SUNWmd/bin in root's PATH, as the DiskSuite commands are located in this directory.
5. Creating State Database :
The state metadevice database, metadb, keeps information about the metadevices and is needed for DiskSuite operation. DiskSuite cannot function without the metadb, so replica copies of the database are placed on different disks to ensure that a copy survives a complete disk failure.
The metadb needs a dedicated disk slice, so create partitions of about 5 MB on the disks for it. If there is no space available, a slice can be taken from swap. Keeping replicas on only two disks can create problems: DiskSuite requires more than 50% of the total replicas to be available, and if one of the two disks crashes the replica count falls to exactly 50%. On the next reboot the system will come up in single-user mode, and additional replicas must be recreated to correct the metadb errors.
The following command creates three replicas of the metadb on each of three disk slices:
# metadb -a -f -c 3 /dev/dsk/c0t1d0s6 /dev/dsk/c0t2d0s6 /dev/dsk/c0t3d0s6
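The replica majority rule lends itself to a tiny helper: strictly more than half the replicas must survive for an unattended boot, exactly half is bootable only with the md:mirrored_root_flag tunable mentioned earlier in this post, and anything less drops the system to single-user mode for repair. A toy checker (the function name and output strings are our own, not DiskSuite's):

```shell
# Report whether a metadb replica quorum survives a failure.
# Usage: quorum_state <replicas_available> <replicas_total>
quorum_state() {
    avail=$1; total=$2
    if [ $((2 * avail)) -gt "$total" ]; then
        echo "majority"        # system boots unattended
    elif [ $((2 * avail)) -eq "$total" ]; then
        echo "half"            # boots only with md:mirrored_root_flag=1
    else
        echo "insufficient"    # manual metadb repair required at boot
    fi
}

# Example: losing one disk of the two-disk layout above leaves 2 of 4
# replicas, i.e. exactly half -- which is why three disks are preferred.
```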
6. Creating MetaDevices :
Metadevices can be created in two ways:
1. Directly from the command line.
2. Editing /etc/opt/SUNWmd/md.tab, following the examples given in that file, and then initializing the devices on the command line using metainit.
6.1 ) Creating a concatenated Metadevice :
# metainit d0 3 1 /dev/dsk/c0t0d0s4 1 /dev/dsk/c0t1d0s4 1 /dev/dsk/c0t2d0s4
d0 - metadevice name
3 - total number of slices
1 - number of slices to be added, followed by the slice name
6.2 ) Creating a stripe of 32k interleave
# metainit d10 1 2 c0t1d0s2 c0t2d0s2 -i 32k
d10 - metadevice name
1 - total number of stripes
2 - number of slices to be added to the stripe, followed by the slice names
-i 32k - interleave size; chunks of data are written alternately across the stripe components
6.3 ) Creating a Mirror :
A mirror is a metadevice composed of one or more submirrors. A submirror is made of one or more striped or concatenated metadevices. Mirroring data provides you with maximum data availability by maintaining multiple copies of your data. The system must contain at least three state database replicas before you can create mirrors. Any file system including root (/), swap, and /usr, or any application such as a database, can use a mirror.
6.3.1 ) Creating a simple mirror from new partitions
1. Create two stripes for the two submirrors, d21 & d22
# metainit d21 1 1 c0t0d0s2
d21: Concat/Stripe is setup
# metainit d22 1 1 c1t0d0s2
d22: Concat/Stripe is setup
2. Create a mirror device (d20) using one of the submirror (d21)
# metainit d20 -m d21
d20: Mirror is setup
3. Attach the second submirror (d22) to the main mirror device (d20)
# metattach d20 d22
d20: Submirror d22 is attached
4. Make a file system on the new metadevice
# newfs /dev/md/rdsk/d20
Edit /etc/vfstab to mount /dev/md/dsk/d20 on a mount point.
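The four steps above can be wrapped in a dry-run helper that only prints the commands for review; the numbering follows the d20/d21/d22 convention used in the example, and nothing here executes any metainit:

```shell
# Print (not run) the commands to mirror slice $2 onto slice $3,
# using d$1 as the mirror and d$1+1 / d$1+2 as the submirrors.
# Usage: plan_mirror <base_number> <primary_slice> <second_slice>
plan_mirror() {
    base=$1; s1=$2; s2=$3
    echo "metainit d$((base + 1)) 1 1 $s1"
    echo "metainit d$((base + 2)) 1 1 $s2"
    echo "metainit d$base -m d$((base + 1))"
    echo "metattach d$base d$((base + 2))"
}

# Example: plan_mirror 20 c0t0d0s2 c1t0d0s2
# Review the printed commands, then run them in order by hand.
```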
6.3.2 ) Mirroring a partition with data which can be unmounted
# metainit -f d1 1 1 c1t0d0s0
d1: Concat/Stripe is setup
# metainit d2 1 1 c2t0d0s0
d2: Concat/Stripe is setup
# metainit d0 -m d1
d0: Mirror is setup
# umount /local
(Edit the /etc/vfstab file so that the file system references the mirror)
#mount /local
#metattach d0 d2
d0: Submirror d2 is attached

6.3.3 ) Mirroring partitions with data which cannot be unmounted - root and /usr
· /usr mirroring
# metainit -f d12 1 1 c0t3d0s6
d12: Concat/Stripe is setup
# metainit d22 1 1 c1t0d0s6
d22: Concat/Stripe is setup
# metainit d2 -m d12
d2: Mirror is setup
(Edit the /etc/vfstab file so that /usr references the mirror)
# reboot
...
...
# metattach d2 d22
d2: Submirror d22 is attached
· root mirroring
# metainit -f d11 1 1 c0t3d0s0
d11: Concat/Stripe is setup
# metainit d12 1 1 c1t3d0s0
d12: Concat/Stripe is setup
# metainit d10 -m d11
d10: Mirror is setup
# metaroot d10
# lockfs -fa
# reboot


# metattach d10 d12
d10: Submirror d12 is attached
6.3.4 ) Making the mirrored disk bootable
a.) # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0
6.3.5 ) Creating an alternate name for the mirrored boot disk
a.) Find physical path name for the second boot disk
# ls -l /dev/rdsk/c1t3d0s0
lrwxrwxrwx 1 root root 55 Sep 12 11:19 /dev/rdsk/c1t3d0s0 -> ../../devices/sbus@1,f8000000/esp@1,200000/sd@3,0:a
b.) Create an alias for booting from disk2
ok> nvalias bootdisk2 /sbus@1,f8000000/esp@1,200000/sd@3,0:a
ok> boot bootdisk2
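The path lookup in step a.) can be scripted: the device path OBP expects is simply the symlink target with everything up to and including "/devices" removed. A small sketch of that transform (feed it the target shown by ls -l; the function name is our own):

```shell
# Convert a /dev/rdsk symlink target into the device path OBP expects
# for nvalias, by stripping the leading ../../devices prefix.
obp_path() {
    echo "$1" | sed 's|.*/devices||'
}

# Example, using the link target shown above:
#   obp_path "../../devices/sbus@1,f8000000/esp@1,200000/sd@3,0:a"
# yields /sbus@1,f8000000/esp@1,200000/sd@3,0:a
```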
6.4 ) Creating a RAID 5 volume :
The system must contain at least three state database replicas before you can create RAID5 metadevices.
A RAID5 metadevice can only handle a single slice failure. It can be grown by concatenating additional slices to the metadevice; the new slices do not store parity information, but they are parity protected, and the resulting RAID5 metadevice continues to handle a single slice failure. Creating a RAID5 metadevice from a slice that contains an existing file system will erase the data during the RAID5 initialization process. The interlace value is key to RAID5 performance. It is configurable when the metadevice is created; thereafter, the value cannot be modified. The default interlace value is 16 Kbytes, which is reasonable for most applications.
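Why the interlace matters can be seen from the mapping it induces: ignoring the rotation of the parity column, the column a byte offset lands on is (offset / interlace) mod ncols. A quick sketch of that arithmetic (sizes in bytes, illustrative only):

```shell
# Map a byte offset to its stripe column: (offset / interlace) % ncols.
# This ignores RAID5's rotating parity placement; it is the plain
# striping arithmetic that the interlace value controls.
# Usage: stripe_column <offset_bytes> <interlace_bytes> <ncols>
stripe_column() {
    echo $(( ($1 / $2) % $3 ))
}

# With the default 16 KB (16384-byte) interlace on three columns,
# offsets 0-16383 land on column 0, 16384-32767 on column 1, and so on.
```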

6.4.1 ) To set up RAID5 on three slices of different disks:
# metainit d45 -r c2t3d0s2 c3t0d0s2 c4t0d0s2
d45: RAID is setup

6.5.) Creating a Trans Meta Device :
Trans metadevices enable UFS logging. There is one logging device and one master device; all file system changes are written to the logging device first and then posted to the master device. This greatly reduces fsck time for very large file systems, since fsck has to check only the logging device, which is usually 64 MB maximum in size. The logging device should preferably be mirrored and located on a different drive and controller than the master device.
UFS logging cannot be done for the root partition.
6.5.1) Trans Metadevice for a File System That Can Be Unmounted
· /home2
1. Setup metadevice
# umount /home2
# metainit d63 -t c0t2d0s2 c2t2d0s1
d63: Trans is setup
Logging becomes effective for the file system when it is remounted.
2. Change vfstab entry & reboot
from
/dev/md/dsk/d2 /dev/md/rdsk/d2 /home2 ufs 2 yes -
to
/dev/md/dsk/d63 /dev/md/rdsk/d63 /home2 ufs 2 yes -
# mount /home2
The next reboot displays the following message for the logging device:
# reboot
...
/dev/md/rdsk/d63: is logging
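The vfstab change in step 2 can be made with sed instead of an editor. The helper below is a sketch: it reads vfstab text on stdin and swaps one metadevice for another, so it can be run against a copy of /etc/vfstab and diffed before the real file is touched:

```shell
# Swap the block and raw device columns of a vfstab line from one
# metadevice to another. Reads vfstab text on stdin, writes to stdout.
# The [^0-9] guard stops d2 from also matching d20; the consumed
# separator character is normalized to a single space.
# Usage: swap_vfstab_dev <old_md> <new_md>   e.g. swap_vfstab_dev d2 d63
swap_vfstab_dev() {
    sed "s|^/dev/md/dsk/$1[^0-9]|/dev/md/dsk/$2 |;s|/dev/md/rdsk/$1[^0-9]|/dev/md/rdsk/$2 |"
}

# Example workflow (review before installing):
#   swap_vfstab_dev d2 d63 < /etc/vfstab > /etc/vfstab.new
#   diff /etc/vfstab /etc/vfstab.new
```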
6.5.2 ) Trans Metadevice for a File System That Cannot Be Unmounted
· /usr
1.) Setup metadevice
# metainit -f d20 -t c0t3d0s6 c1t2d0s1
d20: Trans is setup
2.) Change vfstab entry & reboot:
from
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 /usr ufs 1 no -
to
/dev/md/dsk/d20 /dev/md/rdsk/d20 /usr ufs 1 no -
# reboot
6.5.3 ) TransMeta device using Mirrors
1.) Setup metadevice
#umount /home2
#metainit d64 -t d30 d12
d64: Trans is setup
2.) Change vfstab entry & reboot:
from
/dev/md/dsk/d30 /dev/md/rdsk/d30 /home2 ufs 2 yes -
to
/dev/md/dsk/d64 /dev/md/rdsk/d64 /home2 ufs 2 yes -
6.6 ) HotSpare Pool
A hot spare pool is a collection of slices reserved by DiskSuite to be automatically substituted in case of a slice failure in either a submirror or a RAID5 metadevice. A hot spare cannot be a metadevice, and it can be associated with multiple submirrors or RAID5 metadevices. However, a submirror or RAID5 metadevice can only be associated with one hot spare pool. Replacement is based on a first fit for the failed slice, and used spares need to be replaced with repaired or new slices afterwards. Hot spare pools may be allocated, deallocated, or reassigned at any time, unless a slice in the pool is being used to replace a damaged slice of its associated metadevice.
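The "first fit" rule can be illustrated in a few lines of shell: walk the pool in order and take the first spare at least as large as the failed slice. DiskSuite does this internally; the helper, slice names, and block counts below are purely illustrative:

```shell
# First-fit hot spare selection: print the name of the first spare
# whose size in blocks is >= the failed slice's size.
# Pool entries are "name:blocks" pairs, tried in pool order.
# Usage: pick_spare <needed_blocks> name:blocks [name:blocks ...]
pick_spare() {
    needed=$1; shift
    for spare in "$@"; do
        size=${spare#*:}
        if [ "$size" -ge "$needed" ]; then
            echo "${spare%%:*}"
            return 0
        fi
    done
    return 1   # no spare in the pool is large enough
}

# Example: pick_spare 2048 c3t0d0s2:1024 c3t1d0s2:4096
# skips the too-small first spare and selects c3t1d0s2.
```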
6.6.1) Associating a Hot Spare Pool with Submirrors
# metaparam -h hsp100 d10
# metaparam -h hsp100 d11
# metastat d0
d0: Mirror
Submirror 0: d10
State: Okay
Submirror 1: d11
State: Okay
...
d10: Submirror of d0
State: Okay
Hot spare pool: hsp100
...
d11: Submirror of d0
State: Okay
Hot spare pool: hsp100
6.6.2 ) Associating or changing a Hot Spare Pool with a RAID5 Metadevice
#metaparam -h hsp001 d10
#metastat d10
d10:RAID
State: Okay
Hot spare Pool: hsp001
6.6.3 ) Adding a Hot Spare Slice to All Hot Spare Pools
# metahs -a -all /dev/dsk/c3t0d0s2
hsp001: Hotspare is added
hsp002: Hotspare is added
hsp003: Hotspare is added

6.7 ) Disksets
A few important points about disksets:
A diskset is a set of shared disk drives containing DiskSuite objects that can be shared exclusively (but not concurrently) by one or two hosts. Disksets are used in high-availability failover situations, where ownership of the failed machine's diskset is transferred to the other machine. Disksets are connected to two hosts for sharing and must have the same attributes (controller/target/drive) on both machines, except for the ownership.
DiskSuite must be installed on each host that will be connected to the diskset. There is one metadevice state database per shared diskset, plus one for the "local" diskset. Each host must have its local metadevice state database set up before disksets can be created, and each host in a diskset must have a local diskset besides the shared diskset. A diskset can be created separately on one host and then added to the second host later.
A drive must not be in use by a file system, database, or any other application when it is added to a diskset.
When a drive is added to a diskset it is repartitioned so that the metadevice state database replica for the diskset can be placed on the drive. Drives are repartitioned when they are added to a diskset only if slice 7 is not set up correctly: a small portion of each drive is reserved in slice 7 for use by DiskSuite, and the remainder of the space is placed into slice 0. After adding a drive to a diskset, it may be repartitioned as necessary, provided that no changes are made to slice 7. If slice 7 starts at cylinder 0 and is large enough to contain a state database replica, the disk is not repartitioned.
When drives are added to a diskset, DiskSuite re-balances the state database replicas across the remaining drives. Later, if necessary, you can change the replica layout with the metadb(1M) command.
To create a diskset, root must be a member of group 14, or the /.rhosts file must contain an entry for each host.
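The slice 7 repartitioning rule above reduces to a simple predicate: a drive is left untouched only when slice 7 starts at cylinder 0 and is large enough to hold a state database replica. A toy encoding of that rule (the minimum-replica-size argument is a placeholder you supply, not DiskSuite's exact internal figure):

```shell
# Decide whether DiskSuite would repartition a drive added to a diskset.
# The drive is left alone only if slice 7 starts at cylinder 0 and can
# hold a replica; otherwise it is repartitioned (small s7, rest in s0).
# Usage: needs_repartition <s7_start_cyl> <s7_size_blocks> <min_replica_blocks>
needs_repartition() {
    if [ "$1" -eq 0 ] && [ "$2" -ge "$3" ]; then
        echo "no"    # slice 7 already suitable: drive left untouched
    else
        echo "yes"   # slice 7 unsuitable: drive will be repartitioned
    fi
}
```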

6.7.1 ) Creating Two Disksets
host1# metaset -s diskset0 -a -h host1 host2
host1# metaset -s diskset1 -a -h host1 host2
host1# metaset
Set name = diskset0, Set number = 1
Host Owner
host1
host2
Set name = diskset1, Set number = 2
Host Owner
host1
host2
6.7.2 ) Adding Drives to a Diskset
host1# metaset -s diskset0 -a c1t2d0 c1t3d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

host1# metaset
Set name = diskset0, Set number = 1
Host Owner
host1 Yes
host2

Drive Dbase
c1t2d0 Yes
c1t3d0 Yes
c2t2d0 Yes
c2t3d0 Yes
c2t4d0 Yes
c2t5d0 Yes

Set name = diskset1, Set number = 2
Host Owner
host1
host2
6.7.3 ) Creating a Mirror in a Diskset
# metainit -s diskset0 d51 1 1 /dev/dsk/c0t0d0s2
diskset0/d51: Concat/Stripe is setup

# metainit -s diskset0 d52 1 1 /dev/dsk/c1t0d0s2
diskset0/d52: Concat/Stripe is setup

# metainit -s diskset0 d50 -m d51
diskset0/d50: mirror is setup

# metattach -s diskset0 d50 d52
diskset0/d50: Submirror d52 is attached