Wednesday, March 7, 2007

Cool Unix Commands

Cool Commands

Peter Baer Galvin

There are so many commands in Solaris that it is difficult to separate the cool ones from the mundane. For example, there are commands to report how much time a program spends in each system call, and commands to dynamically show system activities, and most of these commands are included with Solaris 8 as well as Solaris 9. This month, I’m highlighting some of the commands that you might find particularly useful.

Systems administrators are tool users. Through experience, we have learned that the more tools we have, the better able we are to diagnose problems and implement solutions. The commands included in this column are gleaned from experience, friends, acquaintances, and from attendance at the SunNetwork 2002 conference in September. “The /procodile Hunter” talk by Solaris kernel developers Bryan Cantrill and Mike Shapiro was especially enlightening and frightening because Cantrill wrote code to illustrate a point faster than Shapiro could explain the point they were trying to illustrate!

Useful Solaris Commands

truss -c (Solaris >= 8): This astounding option to truss provides a profile summary of the command being trussed:

$ truss -c grep asdf work.doc
syscall              seconds   calls  errors
_exit                    .00       1
read                     .01      24
open                     .00       8      4
close                    .00       5
brk                      .00      15
stat                     .00       1
fstat                    .00       4
execve                   .00       1
mmap                     .00      10
munmap                   .01       3
memcntl                  .00       2
llseek                   .00       1
open64                   .00       1
                        ----     ---    ---
sys totals:              .02      76      4
usr time:                .00
elapsed:                 .05

It can also show profile data on a running process. In this case, the data shows what the process did between when truss was started and when truss execution was terminated with a control-c. It’s ideal for determining why a process is hung without having to wade through the pages of truss output.
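When a summary is longer than this one, it can help to rank the system calls by how often they were made. The following is a minimal sketch, assuming the summary has been saved as text; rank_syscalls is a hypothetical helper name, not part of truss, and the sample rows are copied from the grep run above.

```shell
#!/bin/sh
# Rank system calls in a saved `truss -c` summary by call count.
# rank_syscalls is a hypothetical helper, not a truss feature.
rank_syscalls() {
    # Keep rows whose second field is a time (.NN) and whose third field is
    # a count; this skips the header, the dashed rule, and the totals lines.
    awk '$2 ~ /^[0-9]*\.[0-9]+$/ && $3 ~ /^[0-9]+$/ { print $3, $1 }' |
        sort -rn | head -3
}

# Sample rows copied from the `truss -c grep asdf work.doc` summary.
summary='syscall              seconds   calls  errors
_exit                    .00       1
read                     .01      24
open                     .00       8      4
brk                      .00      15
mmap                     .00      10
                        ----     ---    ---
sys totals:              .02      76      4
usr time:                .00
elapsed:                 .05'

# Print the three most frequent calls, busiest first.
printf '%s\n' "$summary" | rank_syscalls
```

On a live system the same function could read from a file holding the summary that truss printed for a running process.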

truss -d and truss -D (Solaris >= 8): These truss options show the time associated with each system call being traced, and are excellent for finding performance problems in custom or commercial code. For example:

$ truss -d who
Base time stamp:  1035385727.3460  [ Wed Oct 23 11:08:47 EDT 2002 ]
 0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64)  argc = 1
 0.0032 stat("/usr/bin/who", 0xFFBEFA98)                = 0
 0.0037 open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
 0.0042 open("/usr/local/lib/libc.so.1", O_RDONLY)      Err#2 ENOENT
 0.0047 open("/usr/lib/libc.so.1", O_RDONLY)            = 3
 0.0051 fstat(3, 0xFFBEF42C)                            = 0
 . . .

truss -D is even more useful, showing the time delta between system calls:

Dilbert> truss -D who
 0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64)  argc = 1
 0.0028 stat("/usr/bin/who", 0xFFBEFA98)                = 0
 0.0005 open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
 0.0006 open("/usr/local/lib/libc.so.1", O_RDONLY)      Err#2 ENOENT
 0.0005 open("/usr/lib/libc.so.1", O_RDONLY)            = 3
 0.0004 fstat(3, 0xFFBEF42C)                            = 0

In this example, the stat system call took a lot longer than the others.
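The relationship between the two options can be shown by turning the cumulative time stamps from a truss -d trace into per-call deltas. This is only a sketch: stamps_to_deltas is a hypothetical helper, and the input lines are the ones from the truss -d who run above, so the deltas differ slightly from the separate truss -D run.

```shell
#!/bin/sh
# Convert cumulative `truss -d` time stamps into per-call deltas,
# approximating what `truss -D` prints directly.
# stamps_to_deltas is a hypothetical helper name.
stamps_to_deltas() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ {
        delta = (seen ? $1 - prev : 0)   # first call has no predecessor
        prev = $1; seen = 1
        sub(/\(.*/, "", $2)              # keep only the call name
        printf "%.4f %s\n", delta, $2
    }'
}

# Lines copied from the `truss -d who` example.
trace=' 0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64)  argc = 1
 0.0032 stat("/usr/bin/who", 0xFFBEFA98)                = 0
 0.0037 open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
 0.0042 open("/usr/local/lib/libc.so.1", O_RDONLY)      Err#2 ENOENT
 0.0047 open("/usr/lib/libc.so.1", O_RDONLY)            = 3
 0.0051 fstat(3, 0xFFBEF42C)                            = 0'

printf '%s\n' "$trace" | stamps_to_deltas
```

Piping the result through sort -rn would put the slowest gap first, which in this trace is again the stat call.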

truss -T: This is a great debugging help. It will stop a process at the execution of a specified system call. (“-U” does the same, but with user-level function calls.) A core could then be taken for further analysis, or any of the /proc tools could be used to determine many aspects of the status of the process.

truss -l (improved in Solaris 9): Shows the thread number of each call in a multi-threaded process. Solaris 9 truss -l finally makes it possible to watch the execution of a multi-threaded application.

Truss is truly a powerful tool. It can be used on core files to analyze what caused the problem, for example. It can also show details on user-level library calls (either system libraries or programmer libraries) via the “-u” option.

pkg-get: This is a nice tool (http://www.bolthole.com/solaris) for automatically getting freeware packages. It is configured via /etc/pkg-get.conf. Once it’s up and running, execute pkg-get -a to get a list of available packages, and pkg-get -i to get and install a given package.

plimit (Solaris >= 8): This command displays and sets the per-process limits on a running process. This is handy if a long-running process is running up against a limit (for example, number of open files). Rather than using limit and restarting the command, plimit can modify the running process.

coreadm (Solaris >= 8): In the “old” days (before coreadm), core dumps were placed in the process’s working directory. Core files would also overwrite each other. All this and more has been addressed by coreadm, a tool to manage core file creation. With it, you can specify whether to save cores, where cores should be stored, how many versions should be retained, and more. Settings can be retained between reboots by coreadm modifying /etc/coreadm.conf.

pgrep (Solaris >= 8): pgrep searches through /proc for processes matching the given criteria, and returns their process-ids. A great option is “-n”, which returns the newest process that matches.

preap (Solaris >= 9): Reaps zombie processes. Any processes stuck in the “z” state (as shown by ps), can be removed from the system with this command.

pargs (Solaris >= 9): Shows the arguments and environment variables of a process.

nohup -p (Solaris >= 9): The nohup command can be used to start a process, so that if the shell that started the process closes (i.e., the process gets a “SIGHUP” signal), the process will keep running. This is useful for backgrounding a task that should continue running no matter what happens around it. But what happens if you start a process and later want to HUP-proof it? With Solaris 9, nohup -p takes a process-id and causes SIGHUP to be ignored.

prstat (Solaris >= 8): prstat is top and a lot more. Both commands provide a screen’s worth of process and other information and update it frequently, for a nice window on system performance. prstat has much better accuracy than top. It also has some nice options. “-a” shows process and user information concurrently (sorted by CPU hog, by default). “-c” causes it to act like vmstat (new reports printed below old ones). “-C” shows processes in a processor set. “-j” shows processes in a “project”. “-L” shows per-thread information as well as per-process. “-m” and “-v” show quite a bit of per-process performance detail (including pages, traps, lock wait, and CPU wait). The output data can also be sorted by resident-set (real memory) size, virtual memory size, execute time, and so on. prstat is very useful on systems without top, and should probably be used instead of top because of its accuracy (and some sites care that it is a supported program).

trapstat (Solaris >= 9): trapstat joins lockstat and kstat as the most inscrutable commands on Solaris. Each shows gory details about the innards of the running operating system. Each is indispensable in solving strange happenings on a Solaris system. Best of all, their output is good to send along with bug reports, but further study can reveal useful information for general use as well.

vmstat -p (Solaris >= 8): Until this option became available, it was almost impossible (see the “se toolkit”) to determine what kind of memory demand was causing a system to page. vmstat -p is key because it not only shows whether your system is under memory stress (via the “sr” column), it also shows whether that stress is from application code, application data, or I/O. “-p” can really help pinpoint the cause of any mysterious memory issues on Solaris.

pmap -x (Solaris >= 8, bugs fixed in Solaris >= 9): If the process with memory problems is known, and more details on its memory use are needed, check out pmap -x. The target process-id has its memory map fully explained, as in:

# pmap -x 1779
1779:   -ksh
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000     192     192       -       - r-x--  ksh
00040000       8       8       8       - rwx--  ksh
00042000      32      32       8       - rwx--    [ heap ]
FF180000     680     664       -       - r-x--  libc.so.1
FF23A000      24      24       -       - rwx--  libc.so.1
FF240000       8       8       -       - rwx--  libc.so.1
FF280000     568     472       -       - r-x--  libnsl.so.1
FF31E000      32      32       -       - rwx--  libnsl.so.1
FF326000      32      24       -       - rwx--  libnsl.so.1
FF340000      16      16       -       - r-x--  libc_psr.so.1
FF350000      16      16       -       - r-x--  libmp.so.2
FF364000       8       8       -       - rwx--  libmp.so.2
FF380000      40      40       -       - r-x--  libsocket.so.1
FF39A000       8       8       -       - rwx--  libsocket.so.1
FF3A0000       8       8       -       - r-x--  libdl.so.1
FF3B0000       8       8       8       - rwx--    [ anon ]
FF3C0000     152     152       -       - r-x--  ld.so.1
FF3F6000       8       8       8       - rwx--  ld.so.1
FFBFE000       8       8       8       - rw---    [ stack ]
-------- ------- ------- ------- -------
total Kb    1848    1728      40       -

Here we see each chunk of memory, what it is being used for, how much space it is taking (virtual and real), and mode information.
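Because a library usually appears as several mappings (text, data, and so on), it can be handy to total the resident size per mapped file. A rough sketch, assuming the column layout shown in the pmap -x output above; rss_by_file is a hypothetical helper name, and the sample rows are the first six mappings of that ksh process.

```shell
#!/bin/sh
# Sum resident set size (RSS, in Kbytes) per mapped file from `pmap -x` output.
# rss_by_file is a hypothetical helper name.
rss_by_file() {
    # Data rows start with a hex address; field 3 is RSS and the mapped
    # file name starts at field 7 (it may contain spaces, as in "[ heap ]").
    awk '$1 ~ /^[0-9A-Fa-f]+$/ && NF >= 7 {
        name = $7
        for (i = 8; i <= NF; i++) name = name " " $i
        rss[name] += $3
    }
    END { for (n in rss) print rss[n], n }' | sort -rn
}

# Rows copied from the `pmap -x 1779` example.
map=' Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000     192     192       -       - r-x--  ksh
00040000       8       8       8       - rwx--  ksh
00042000      32      32       8       - rwx--    [ heap ]
FF180000     680     664       -       - r-x--  libc.so.1
FF23A000      24      24       -       - rwx--  libc.so.1
FF240000       8       8       -       - rwx--  libc.so.1'

printf '%s\n' "$map" | rss_by_file
```

For these rows the largest consumer is libc.so.1, whose three mappings add up to 696 Kbytes resident.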

df -h (Solaris >= 9): This command is popular on Linux, and just made its way into Solaris. df -h displays summary information about file systems in human-readable form:

$ df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0t0d0s0      4.8G   1.7G   3.0G    37%    /
/proc                    0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
fd                       0K     0K     0K     0%    /dev/fd
swap                   848M    40K   848M     1%    /var/run
swap                   849M   1.0M   848M     1%    /tmp
/dev/dsk/c0t0d0s7       13G    78K    13G     1%    /export/home

Conclusion

Each administrator has a set of tools used daily, and another set of tools to help in a pinch. This column included a wide variety of commands and options that are lesser known, but can be very useful. Do you have favorite tools that have saved you in a bind? If so, please send them to me so I can expand my tool set as well. Alternately, send along any tools that you hate or that you feel are dangerous, which could also turn into a useful column!

Tuesday, March 6, 2007

HOWTO: Mirrored root disk on Solaris

http://www.brandonhutchinson.com/Mirroring_disks_with_DiskSuite.html


0. Partition the first disk

# format c0t0d0

Use the partition tool (=> "p <enter>, p <enter>"!) to set up the slices. We assume the following slice setup afterwards:

#  Tag         Flag  Cylinders      Size      Blocks
-  ----------  ----  -------------  --------  --------------------
0  root        wm        0 -   812  400.15MB  (813/0/0)     819504
1  swap        wu      813 -  1333  256.43MB  (521/0/0)     525168
2  backup      wm        0 - 17659    8.49GB  (17660/0/0) 17801280
3  unassigned  wm     1334 -  1354   10.34MB  (21/0/0)       21168
4  var         wm     1355 -  8522    3.45GB  (7168/0/0)   7225344
5  usr         wm     8523 - 14764    3.00GB  (6242/0/0)   6291936
6  unassigned  wm    14765 - 16845    1.00GB  (2081/0/0)   2097648
7  home        wm    16846 - 17659  400.15MB  (813/0/0)     819504

1. Copy the partition table of the first disk to its future mirror disk

# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2

2. Create at least two state database replicas on each disk

# metadb -a -f -c 2 c0t0d0s3 c0t1d0s3

Check the state of all replicas with metadb:

# metadb

Notes:

A state database replica contains configuration and state information about the metadevices. Make sure that at least 50% of the replicas are always active!
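To make the 50% rule concrete, here is a small sketch of the arithmetic. It assumes the quorum behavior as I understand it from the Solaris Volume Manager documentation: the system keeps running with at least half of the replicas, panics with fewer than half, and needs more than half to reboot into multiuser mode. replica_status is a hypothetical helper, not a DiskSuite command.

```shell
#!/bin/sh
# replica_status TOTAL ACTIVE
# Classify a replica count against the quorum rules described above.
# Hypothetical helper; the rules are an assumption stated in the text.
replica_status() {
    total=$1
    active=$2
    if [ $((active * 2)) -gt "$total" ]; then
        echo "majority: ok"
    elif [ $((active * 2)) -eq "$total" ]; then
        echo "half: keeps running, will not reboot into multiuser"
    else
        echo "below half: system panics"
    fi
}

# Two replicas on each of two disks gives four in total.
replica_status 4 4   # both disks healthy
replica_status 4 2   # one disk lost
replica_status 4 1
```

With the two-disk layout used here, losing one disk leaves exactly half of the replicas, which is why many administrators place extra replicas on a third disk when one is available.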


3. Create the root slice mirror and its first submirror

# metainit -f d10 1 1 c0t0d0s0
# metainit -f d20 1 1 c0t1d0s0
# metainit d30 -m d10

Run metaroot to prepare /etc/vfstab and /etc/system (do this only for the root slice!):

# metaroot d30

4. Create the swap slice mirror and its first submirror

# metainit -f d11 1 1 c0t0d0s1
# metainit -f d21 1 1 c0t1d0s1
# metainit d31 -m d11

5. Create the var slice mirror and its first submirror

# metainit -f d14 1 1 c0t0d0s4
# metainit -f d24 1 1 c0t1d0s4
# metainit d34 -m d14

6. Create the usr slice mirror and its first submirror

# metainit -f d15 1 1 c0t0d0s5
# metainit -f d25 1 1 c0t1d0s5
# metainit d35 -m d15

7. Create the unassigned slice mirror and its first submirror

# metainit -f d16 1 1 c0t0d0s6
# metainit -f d26 1 1 c0t1d0s6
# metainit d36 -m d16

8. Create the home slice mirror and its first submirror

# metainit -f d17 1 1 c0t0d0s7
# metainit -f d27 1 1 c0t1d0s7
# metainit d37 -m d17
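Steps 3 through 8 repeat the same three commands per slice, with d1N and d2N as the submirrors on the first and second disk and d3N as the mirror for slice N. A dry-run sketch that only prints the commands, so they can be reviewed before anything is run; print_metainit is a hypothetical helper name.

```shell
#!/bin/sh
# print_metainit PRIMARY SECONDARY SLICE...
# Print (do not run) the metainit commands for each slice number, using
# the d1N / d2N / d3N naming from the steps above. Hypothetical helper.
print_metainit() {
    primary=$1
    secondary=$2
    shift 2
    for n in "$@"; do
        echo "metainit -f d1$n 1 1 ${primary}s$n"
        echo "metainit -f d2$n 1 1 ${secondary}s$n"
        echo "metainit d3$n -m d1$n"
    done
}

# Slices 0, 1, 4, 5, 6, and 7 of the two disks used in this HOWTO.
print_metainit c0t0d0 c0t1d0 0 1 4 5 6 7
```

The output could be piped to sh once it looks right. metaroot is deliberately left out, since it applies to the root mirror only.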

9. Edit /etc/vfstab to mount all mirrors after boot, including mirrored swap

/etc/vfstab before changes:

fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/dsk/c0t0d0s1 - - swap - no -
/dev/md/dsk/d30 /dev/md/rdsk/d30 / ufs 1 no logging
/dev/dsk/c0t0d0s5 /dev/rdsk/c0t0d0s5 /usr ufs 1 no ro,logging
/dev/dsk/c0t0d0s4 /dev/rdsk/c0t0d0s4 /var ufs 1 no nosuid,logging
/dev/dsk/c0t0d0s7 /dev/rdsk/c0t0d0s7 /home ufs 2 yes nosuid,logging
/dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /opt ufs 2 yes nosuid,logging
swap - /tmp tmpfs - yes -

/etc/vfstab after changes:

fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/md/dsk/d31 - - swap - no -
/dev/md/dsk/d30 /dev/md/rdsk/d30 / ufs 1 no logging
/dev/md/dsk/d35 /dev/md/rdsk/d35 /usr ufs 1 no ro,logging
/dev/md/dsk/d34 /dev/md/rdsk/d34 /var ufs 1 no nosuid,logging
/dev/md/dsk/d37 /dev/md/rdsk/d37 /home ufs 2 yes nosuid,logging
/dev/md/dsk/d36 /dev/md/rdsk/d36 /opt ufs 2 yes nosuid,logging
swap - /tmp tmpfs - yes -

Notes:

The entry for the root device (/) has already been altered by the metaroot command we executed before.
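The edit in this step is mechanical: every c0t0d0sN path becomes the matching d3N metadevice. A sketch of that substitution with sed, assuming the naming used in this HOWTO; to_metadevices is a hypothetical helper that reads vfstab text on standard input and leaves existing /dev/md entries alone.

```shell
#!/bin/sh
# Rewrite physical slice paths on the first disk to their mirror
# metadevices (c0t0d0sN -> d3N). Reads vfstab lines on stdin.
# to_metadevices is a hypothetical helper name.
to_metadevices() {
    sed -e 's#/dev/dsk/c0t0d0s\([0-9]\)#/dev/md/dsk/d3\1#g' \
        -e 's#/dev/rdsk/c0t0d0s\([0-9]\)#/dev/md/rdsk/d3\1#g'
}

# Three lines from the "before" vfstab shown above.
printf '%s\n' \
    '/dev/dsk/c0t0d0s1 - - swap - no -' \
    '/dev/md/dsk/d30 /dev/md/rdsk/d30 / ufs 1 no logging' \
    '/dev/dsk/c0t0d0s5 /dev/rdsk/c0t0d0s5 /usr ufs 1 no ro,logging' |
    to_metadevices
```

It is safer to write the result to a copy of /etc/vfstab and compare it with diff before replacing the real file.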


10. Reboot the system

# lockfs -fa && init 6

11. Attach the second submirrors to all mirrors

# metattach d30 d20
# metattach d31 d21
# metattach d34 d24
# metattach d35 d25
# metattach d36 d26
# metattach d37 d27

Notes:

This will finally cause the data from the boot disk to be synchronized with the mirror drive.

You can use metastat to track the mirroring progress.


12. Change the crash dump device to the swap metadevice

# dumpadm -d `swap -l | tail -1 | awk '{print $1}'`
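The command substitution in this step hands dumpadm the first field of the last line that swap -l prints. The sketch below shows just that extraction on sample text; last_swap_device is a hypothetical helper, and the dev, swaplo, blocks, and free values in the sample are made up for illustration.

```shell
#!/bin/sh
# Print the device name from the last line of `swap -l` style output read
# on stdin, using the same tail/awk pipeline as step 12.
# last_swap_device is a hypothetical helper name.
last_swap_device() {
    tail -1 | awk '{ print $1 }'
}

# Hypothetical `swap -l` output after the reboot in step 10; the numeric
# columns are invented for this example.
sample='swapfile             dev  swaplo blocks   free
/dev/md/dsk/d31     85,31     16 525152 525152'

printf '%s\n' "$sample" | last_swap_device
```

If more than one swap device is configured, only the last one listed is picked, so it is worth checking the swap -l output by eye first.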

13. Make the mirror disk bootable

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0

Notes:

This will install a boot block to the second disk.


14. Determine the physical device path of the mirror disk

# ls -l /dev/dsk/c0t1d0s0
... /dev/dsk/c0t1d0s0 -> ../../devices/pci@1f,4000/scsi@3/sd@1,0:a

15. Create a device alias for the mirror disk

# eeprom "nvramrc=devalias mirror /pci@1f,4000/scsi@3/disk@1,0"
# eeprom "use-nvramrc?=true"

Add the mirror device alias to the Open Boot parameter boot-device so that the system can fall back to the mirror if the primary boot device fails.

# eeprom "boot-device=disk mirror cdrom net"

You can also configure the device alias and boot-device list from the Open Boot Prompt (OBP a.k.a. ok prompt):

ok nvalias mirror /pci@1f,4000/scsi@3/disk@1,0
ok setenv use-nvramrc? true
ok setenv boot-device disk mirror cdrom net

Notes:

From the OBP, you can use boot mirror to boot from the mirror disk.

On my test system, I had to replace sd@1,0:a with disk@1,0. Use devalias on the OBP prompt to determine the correct device path.
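The translation from the ls -l symlink target to the alias path can be written down as a small filter. This is a sketch of the substitution I needed on my test system, not a general rule: to_obp_path is a hypothetical helper, and the sd@ to disk@ swap should still be checked against devalias on your own hardware.

```shell
#!/bin/sh
# Turn the symlink target shown by `ls -l /dev/dsk/...` into a candidate
# OBP device path: drop everything up to /devices, swap sd@ for disk@,
# and strip the trailing :slice suffix.
# to_obp_path is a hypothetical helper name.
to_obp_path() {
    sed -e 's#^.*/devices/#/#' -e 's#/sd@#/disk@#' -e 's#:[a-z]$##'
}

# Symlink target from step 14.
echo '../../devices/pci@1f,4000/scsi@3/sd@1,0:a' | to_obp_path
```

For the path from step 14 this prints /pci@1f,4000/scsi@3/disk@1,0, the value used for the mirror alias in step 15.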