Wednesday, January 10, 2007

Removing Invalid Disk Device Files (/dev/dsk and /dev/rdsk)

by Jeff Hunter, Sr. Database Administrator

Overview

Whenever you add a device to a Sun Solaris machine, whether a SCSI controller or just an additional IDE disk, the Solaris O/S will:
Create Disk Device Files under the Hardware Device Tree (/devices).
Create symbolic links in /dev/dsk, /dev/rdsk, and /dev/cfg that point to the devices in the /devices directory.
Make entries in the /etc/path_to_inst file.
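To see which instance numbers are currently recorded for a given driver, the entries in /etc/path_to_inst can be filtered by driver name. The following is a minimal sketch, assuming each entry has the three-field layout "physical path", instance number, "driver name" and that an awk supporting -v is available (nawk on older Solaris releases). The function name list_driver_instances is hypothetical.

```shell
#!/bin/sh
# Print "instance physical-path" for every entry of one driver.
# Assumption: entries look like   "/pci@1f,4000/scsi@3" 0 "glm"
# list_driver_instances is a hypothetical helper, not a Solaris command.
list_driver_instances() {
    file="$1"      # e.g. /etc/path_to_inst
    driver="$2"    # e.g. glm
    # Skip comment lines, then match the quoted driver name in field 3.
    awk -v want="\"$driver\"" \
        '$1 !~ /^#/ && $3 == want { print $2, $1 }' "$file"
}

# Example (read-only):
#   list_driver_instances /etc/path_to_inst glm
```

Because it only reads the file, this is safe to run before making any of the changes described later in this article.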
Things will generally work fine until you decide to remove or move a device in the system. I have had situations where I ran out of devices on a host because Solaris does a poor job of removing invalid (dangling) disk device files after a device has been removed. This is one area where Sun could really improve. It looks like they are trying new things with the boot -p option, but I've only ever seen it remove things once.
There are other times when I simply wanted to replace a certain type of SCSI controller and reuse the controller IDs from a previously removed card. For example, I have a host (an E450) which had 2 internal controllers (0 and 1) and a dual differential SCSI card installed (controllers 2 and 3). I removed the dual differential SCSI host adapter and replaced it with a Single-Ended SCSI host adapter, but Solaris would always assign the new card controller numbers 4 and 5. I wanted the system to reassign controller numbers 2 and 3 to the new host adapter, but links still existed for the original dual differential SCSI host adapter.
My intention in this article is to provide several solutions for either renumbering disk device files (SCSI controllers, SCSI disks, IDE controllers, IDE disks, etc.) or simply removing old ones from replaced or removed devices. Please keep in mind that this article has been put together from notes I found during many searches for answers on the Internet. If anyone reading this has other solutions, please email me and I would be happy to post them for others going through this procedure.

Using the devfsadm Command

The devfsadm command was introduced with Solaris 7 and can be found in /usr/sbin/devfsadm. This command is used to maintain the /dev and /devices namespaces. The devfsadm command replaces the previous suite of devfs administration tools including drvconfig(1M), disks(1M), tapes(1M), ports(1M), audlinks(1M), and devlinks(1M). To maintain backwards compatibility, all previous devfs commands are hard links to devfsadm.
In many cases, you only need to run the command:
  # devfsadm -C
The -C option invokes the cleanup routines, which are not run by default, to remove dangling logical links.
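Before running the cleanup, it can be useful to see which links are actually dangling. The following is a minimal sketch, assuming a POSIX shell (for example ksh or /usr/xpg4/bin/sh on Solaris); the function name list_dangling_links is hypothetical and it only lists links, it does not remove anything.

```shell
#!/bin/sh
# Print every symbolic link in a directory whose target no longer exists.
# list_dangling_links is a hypothetical helper, not part of devfsadm.
list_dangling_links() {
    dir="$1"
    for link in "$dir"/*; do
        # -L: the entry itself is a symbolic link
        # ! -e: following the link does not reach an existing file
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "$link"
        fi
    done
}

# Example (read-only):
#   list_dangling_links /dev/dsk
#   list_dangling_links /dev/rdsk
```

Comparing the output before and after devfsadm -C shows which links the cleanup removed.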

Manual Methods

The devfsadm command was introduced with Solaris 7. For those who are running an older version of Solaris (e.g. Solaris 2.6), or who simply want to perform all of the steps manually, this section describes the procedure.
    1. Make a backup of your /etc/path_to_inst file and then modify the file so that all that exists is the SCSI / IDE reference for the boot drive. Remove all of the "pcipsy" and "glm" entries except for the one that is used by the controller that has the boot drive. Take note of the physical path of the controller you want to renumber.
    2. Remove all /dev/dsk/cX* and /dev/rdsk/cX* files where X is the controller number(s) you want to remove, including controllers that no longer exist. (In the case of the example I provided on the E450, that would be 2, 3, 4, and 5.)
    3. Remove all /dev/cfg/cX symbolic links where X is the controller(s) you want to remove. Make sure not to remove the controller with the boot drive. (Again, in the case of the example I provided on the E450, that would be 2, 3, 4, and 5.) It turns out this was one of the crucial steps that had to be completed before Solaris would reuse controller numbers 2 and 3. The O/S was not able to reassign these controller numbers while the links (/dev/cfg/c2 and /dev/cfg/c3) still existed.
    4. Remove all files under /devices/* for the controller you want to remove or renumber as indicated in Step #1.
    5. Remove all files in /dev/sdXX* that symbolically link to controller(s) you do not want anymore. This may not be completely necessary, but it does clean things up.
    6. Reboot the server with the "-srv" options (-s for single-user mode, -r for a reconfiguration boot, -v for verbose output):
ok boot -srv
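Steps 2 and 3 above can be sketched as a small shell function. This is only an illustration, assuming a POSIX shell and the usual cNtNdNsN (SCSI) or cNdNsN (IDE) naming of the links; the function name remove_controller_links and the DO_REMOVE variable are hypothetical. It runs as a dry run unless DO_REMOVE is set to 1, and it takes the root directory as its first argument so it can be tried against a copy of /dev before touching the real one. Back up /etc/path_to_inst first, as described in Step 1.

```shell
#!/bin/sh
# Remove (or, by default, just list) the /dev/dsk, /dev/rdsk and /dev/cfg
# links that belong to the given controller numbers.
# Usage: remove_controller_links ROOT N [N ...]
# remove_controller_links and DO_REMOVE are hypothetical names.
remove_controller_links() {
    root="$1"; shift
    for n in "$@"; do
        # c"$n"[td]* matches c2t0d0s0 and c2d0s0 but not c20t0d0s0.
        for path in "$root"/dev/dsk/c"$n"[td]* \
                    "$root"/dev/rdsk/c"$n"[td]* \
                    "$root"/dev/cfg/c"$n"; do
            # Skip patterns that matched nothing (the glob stays literal).
            [ -L "$path" ] || [ -e "$path" ] || continue
            if [ "${DO_REMOVE:-0}" = "1" ]; then
                rm -f "$path"
            else
                echo "would remove: $path"
            fi
        done
    done
}

# Example for the E450 case (dry run against the live system):
#   remove_controller_links "" 2 3 4 5
```

Never pass the controller number of the boot drive. Reviewing the dry-run output before setting DO_REMOVE=1 is the main safeguard here.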
