Wednesday, March 7, 2007

Cool Unix Commands

Cool Commands

Peter Baer Galvin

There are so many commands in Solaris that it is difficult to separate the cool ones from the mundane. For example, there are commands to report how much time a program spends in each system call, and commands to dynamically show system activities, and most of these commands are included with Solaris 8 as well as Solaris 9. This month, I’m highlighting some of the commands that you might find particularly useful.

Systems administrators are tool users. Through experience, we have learned that the more tools we have, the better able we are to diagnose problems and implement solutions. The commands included in this column are gleaned from experience, friends, acquaintances, and from attendance at the SunNetwork 2002 conference in September. “The /procodile Hunter” talk by Solaris kernel developers Bryan Cantrill and Mike Shapiro was especially enlightening and frightening because Cantrill wrote code to illustrate a point faster than Shapiro could explain the point they were trying to illustrate!

Useful Solaris Commands

truss -c (Solaris >= 8): This astounding option to truss provides a profile summary of the command being trussed:

    $ truss -c grep asdf work.doc
    syscall              seconds   calls  errors
    _exit                    .00       1
    read                     .01      24
    open                     .00       8      4
    close                    .00       5
    brk                      .00      15
    stat                     .00       1
    fstat                    .00       4
    execve                   .00       1
    mmap                     .00      10
    munmap                   .01       3
    memcntl                  .00       2
    llseek                   .00       1
    open64                   .00       1
                            ----     ---    ---
    sys totals:              .02      76      4
    usr time:                .00
    elapsed:                 .05

It can also show profile data on a running process. In this case, the data shows what the process did between when truss was started and when truss execution was terminated with a control-c. It’s ideal for determining why a process is hung without having to wade through the pages of truss output.
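When a summary like the one above runs to dozens of system calls, it helps to rank them. The following is a minimal sketch, not part of truss itself: top_syscalls is a hypothetical helper name, and it assumes the report has been captured to a file or pipe in the column layout shown above.

```shell
# top_syscalls: read a "truss -c" summary on stdin and print the N
# syscalls with the highest call counts (hypothetical helper, not a
# Solaris command). N defaults to 3.
top_syscalls() {
    n=${1:-3}
    # Data rows look like "name seconds calls [errors]". The header,
    # separator, and totals lines fail the numeric tests and are skipped.
    awk '$2 ~ /^[0-9]*\.[0-9]+$/ && $3 ~ /^[0-9]+$/ { print $3, $1 }' |
        sort -k1,1nr |
        head -n "$n"
}
```

One way to feed it is to save the report with truss's -o option and then run top_syscalls 3 against that file. On the grep example above, the three busiest calls come out as read (24), brk (15), and mmap (10).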

truss -d and truss -D (Solaris >= 8): These truss options show the time associated with each system call being shown by truss, and are excellent for finding performance problems in custom or commercial code. For example:

    $ truss -d who
    Base time stamp:  1035385727.3460  [ Wed Oct 23 11:08:47 EDT 2002 ]
     0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64)  argc = 1
     0.0032 stat("/usr/bin/who", 0xFFBEFA98)                = 0
     0.0037 open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
     0.0042 open("/usr/local/lib/libc.so.1", O_RDONLY)      Err#2 ENOENT
     0.0047 open("/usr/lib/libc.so.1", O_RDONLY)            = 3
     0.0051 fstat(3, 0xFFBEF42C)                            = 0
    . . .

truss -D is even more useful, showing the time delta between system calls:

    Dilbert> truss -D who
     0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64)  argc = 1
     0.0028 stat("/usr/bin/who", 0xFFBEFA98)                = 0
     0.0005 open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
     0.0006 open("/usr/local/lib/libc.so.1", O_RDONLY)      Err#2 ENOENT
     0.0005 open("/usr/lib/libc.so.1", O_RDONLY)            = 3
     0.0004 fstat(3, 0xFFBEF42C)                            = 0

In this example, the stat system call took a lot longer than the others.
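On a long trace, picking out the large deltas by eye gets tedious. Here is a small sketch that filters truss -D output down to the lines whose delta exceeds a threshold. The name slow_calls and the default threshold of 0.001 seconds are assumptions chosen for illustration, not anything truss provides.

```shell
# slow_calls: read "truss -D" output on stdin and print only the lines
# whose time delta exceeds a threshold in seconds (hypothetical helper).
slow_calls() {
    limit=${1:-0.001}
    # A trace line starts with a seconds.fraction delta; anything else
    # (blank lines, ". . ." elisions) is ignored.
    awk -v limit="$limit" '$1 ~ /^[0-9]+\.[0-9]+$/ && $1 + 0 > limit + 0'
}
```

Run against the who trace above with a threshold of 0.001, only the stat line (0.0028) survives the filter.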

truss -T: This is a great debugging help. It will stop a process at the execution of a specified system call. (“-U” does the same, but with user-level function calls.) A core could then be taken for further analysis, or any of the /proc tools could be used to determine many aspects of the status of the process.

truss -l (improved in Solaris 9): Shows the thread number of each call in a multi-threaded process. Solaris 9 truss -l finally makes it possible to watch the execution of a multi-threaded application.

Truss is truly a powerful tool. It can be used on core files to analyze what caused the problem, for example. It can also show details on user-level library calls (either system libraries or programmer libraries) via the “-u” option.

pkg-get: This is a nice tool (http://www.bolthole.com/solaris) for automatically getting freeware packages. It is configured via /etc/pkg-get.conf. Once it’s up and running, execute pkg-get -a to get a list of available packages, and pkg-get -i to get and install a given package.

plimit (Solaris >= 8): This command displays and sets the per-process limits on a running process. This is handy if a long-running process is running up against a limit (for example, number of open files). Rather than using limit and restarting the command, plimit can modify the running process.

coreadm (Solaris >= 8): In the “old” days (before coreadm), core dumps were placed in the process’s working directory. Core files would also overwrite each other. All this and more has been addressed by coreadm, a tool to manage core file creation. With it, you can specify whether to save cores, where cores should be stored, how many versions should be retained, and more. Settings can be retained between reboots by coreadm modifying /etc/coreadm.conf.

pgrep (Solaris >= 8): pgrep searches through /proc for processes matching the given criteria, and returns their process-ids. A great option is “-n”, which returns the newest process that matches.

preap (Solaris >= 9): Reaps zombie processes. Any processes stuck in the “z” state (as shown by ps) can be removed from the system with this command.

pargs (Solaris >= 9): Shows the arguments and environment variables of a process.

nohup -p (Solaris >= 9): The nohup command can be used to start a process, so that if the shell that started the process closes (i.e., the process gets a “SIGHUP” signal), the process will keep running. This is useful for backgrounding a task that should continue running no matter what happens around it. But what happens if you start a process and later want to HUP-proof it? With Solaris 9, nohup -p takes a process-id and causes SIGHUP to be ignored.

prstat (Solaris >= 8): prstat is top and a lot more. Both commands provide a screen’s worth of process and other information and update it frequently, for a nice window on system performance. prstat has much better accuracy than top. It also has some nice options. “-a” shows process and user information concurrently (sorted by CPU hog, by default). “-c” causes it to act like vmstat (new reports printed below old ones). “-C” shows processes in a processor set. “-j” shows processes in a “project”. “-L” shows per-thread information as well as per-process. “-m” and “-v” show quite a bit of per-process performance detail (including pages, traps, lock wait, and CPU wait). The output data can also be sorted by resident-set (real memory) size, virtual memory size, execute time, and so on. prstat is very useful on systems without top, and should probably be used instead of top because of its accuracy (and some sites care that it is a supported program).

trapstat (Solaris >= 9): trapstat joins lockstat and kstat as the most inscrutable commands on Solaris. Each shows gory details about the innards of the running operating system. Each is indispensable in solving strange happenings on a Solaris system. Their output is good to send along with bug reports, and further study can reveal useful information for general use as well.

vmstat -p (Solaris >= 8): Until this option became available, it was almost impossible (see the “se toolkit”) to determine what kind of memory demand was causing a system to page. vmstat -p is key because it not only shows whether your system is under memory stress (via the “sr” column), it also shows whether that stress is from application code, application data, or I/O. “-p” can really help pinpoint the cause of any mysterious memory issues on Solaris.

pmap -x (Solaris >= 8, bugs fixed in Solaris >= 9): If the process with memory problems is known, and more details on its memory use are needed, check out pmap -x. The target process-id has its memory map fully explained, as in:

    # pmap -x 1779
    1779:   -ksh
     Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
    00010000     192     192       -       - r-x--  ksh
    00040000       8       8       8       - rwx--  ksh
    00042000      32      32       8       - rwx--    [ heap ]
    FF180000     680     664       -       - r-x--  libc.so.1
    FF23A000      24      24       -       - rwx--  libc.so.1
    FF240000       8       8       -       - rwx--  libc.so.1
    FF280000     568     472       -       - r-x--  libnsl.so.1
    FF31E000      32      32       -       - rwx--  libnsl.so.1
    FF326000      32      24       -       - rwx--  libnsl.so.1
    FF340000      16      16       -       - r-x--  libc_psr.so.1
    FF350000      16      16       -       - r-x--  libmp.so.2
    FF364000       8       8       -       - rwx--  libmp.so.2
    FF380000      40      40       -       - r-x--  libsocket.so.1
    FF39A000       8       8       -       - rwx--  libsocket.so.1
    FF3A0000       8       8       -       - r-x--  libdl.so.1
    FF3B0000       8       8       8       - rwx--    [ anon ]
    FF3C0000     152     152       -       - r-x--  ld.so.1
    FF3F6000       8       8       8       - rwx--  ld.so.1
    FFBFE000       8       8       8       - rw---    [ stack ]
    -------- ------- ------- ------- -------
    total Kb    1848    1728      40       -

Here we see each chunk of memory, what it is being used for, how much space it is taking (virtual and real), and mode information.
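Because a library usually appears as several mappings (text, data, and so on), it can be handy to total the resident memory per mapped file. The sketch below does that for pmap -x output. The name rss_by_file is hypothetical, and the column positions are assumed to match the listing above.

```shell
# rss_by_file: read "pmap -x" output on stdin and print resident-set
# kilobytes summed per mapped file, largest first (hypothetical helper).
rss_by_file() {
    # Mapping rows start with a hex address; column 3 is RSS and the
    # mapped-file name runs from column 7 to the end of the line.
    # Header, separator, and total lines do not match and are skipped.
    awk '$1 ~ /^[0-9A-Fa-f]+$/ && NF >= 7 {
        name = $7
        for (i = 8; i <= NF; i++) name = name " " $i
        rss[name] += $3
    }
    END { for (name in rss) print rss[name], name }' |
        sort -k1,1nr
}
```

Piping the ksh listing above through rss_by_file puts libc.so.1 (696 KB), libnsl.so.1 (528 KB), and ksh itself (200 KB) at the top.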

df -h (Solaris >= 9): This command is popular on Linux, and just made its way into Solaris. df -h displays summary information about file systems in human-readable form:

    $ df -h
    Filesystem             size   used  avail capacity  Mounted on
    /dev/dsk/c0t0d0s0      4.8G   1.7G   3.0G    37%    /
    /proc                    0K     0K     0K     0%    /proc
    mnttab                   0K     0K     0K     0%    /etc/mnttab
    fd                       0K     0K     0K     0%    /dev/fd
    swap                   848M    40K   848M     1%    /var/run
    swap                   849M   1.0M   848M     1%    /tmp
    /dev/dsk/c0t0d0s7       13G    78K    13G     1%    /export/home
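The same report is easy to scan mechanically. As a rough sketch, the helper below prints the mount point and capacity of any file system at or above a given percentage. The name full_filesystems and the default of 90 percent are assumptions for illustration, and the column order is taken from the output above.

```shell
# full_filesystems: read "df -h" output on stdin and print the mount
# point and capacity of each file system at or above a percentage
# threshold (hypothetical helper). The threshold defaults to 90.
full_filesystems() {
    limit=${1:-90}
    # Column 5 is capacity ("37%"); adding 0 makes awk read it as 37.
    # NR > 1 skips the header line.
    awk -v limit="$limit" 'NR > 1 && $5 + 0 >= limit + 0 { print $6, $5 }'
}
```

With the listing above and a threshold of 30, the only line printed is "/ 37%".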

Conclusion

Each administrator has a set of tools used daily, and another set of tools to help in a pinch. This column included a wide variety of commands and options that are lesser known, but can be very useful. Do you have favorite tools that have saved you in a bind? If so, please send them to me so I can expand my tool set as well. Alternatively, send along any tools that you hate or that you feel are dangerous, which could also turn into a useful column!
