Cool Commands
Peter Baer Galvin
There are so many commands in Solaris that it is difficult to separate the cool ones from the mundane. For example, there are commands to report how much time a program spends in each system call, and commands to dynamically show system activities, and most of these commands are included with Solaris 8 as well as Solaris 9. This month, I’m highlighting some of the commands that you might find particularly useful.
Systems administrators are tool users. Through experience, we have learned that the more tools we have, the better able we are to diagnose problems and implement solutions. The commands included in this column are gleaned from experience, friends, acquaintances, and from attendance at the SunNetwork 2002 conference in September. “The /procodile Hunter†talk by Solaris kernel developers Brian Cantrill and Mike Shapiro was especially enlightening and frightening because Cantrill wrote code to illustrate a point faster than Shapiro could explain the point they were trying to illustrate!
Useful Solaris Commands
truss -c (Solaris >= 8): This astounding option to truss provides a profile summary of the command being trussed:
$ truss -c grep asdf work.doc syscall seconds calls errors _exit .00 1 read .01 24 open .00 8 4 close .00 5 brk .00 15 stat .00 1 fstat .00 4 execve .00 1 mmap .00 10 munmap .01 3 memcntl .00 2 llseek .00 1 open64 .00 1 ---- --- --- sys totals: .02 76 4 usr time: .00 elapsed: .05
It can also show profile data on a running process. In this case, the data shows what the process did between when truss was started and when truss execution was terminated with a control-c. It’s ideal for determining why a process is hung without having to wade through the pages of truss output.
truss -d and truss -D (Solaris >= 8): These truss options show the time associated with each system call being shown by truss and is excellent for finding performance problems in custom or commercial code. For example:
$ truss -d who Base time stamp: 1035385727.3460 [ Wed Oct 23 11:08:47 EDT 2002 ] 0.0000 execve(“/usr/bin/whoâ€, 0xFFBEFD5C, 0xFFBEFD64) argc = 1 0.0032 stat(“/usr/bin/whoâ€, 0xFFBEFA98) = 0 0.0037 open(“/var/ld/ld.configâ€, O_RDONLY) Err#2 ENOENT 0.0042 open(“/usr/local/lib/libc.so.1â€, O_RDONLY) Err#2 ENOENT 0.0047 open(“/usr/lib/libc.so.1â€, O_RDONLY) = 3 0.0051 fstat(3, 0xFFBEF42C) = 0 . . .
truss -D is even more useful, showing the time delta between system calls:
Dilbert> truss -D who 0.0000 execve(“/usr/bin/whoâ€, 0xFFBEFD5C, 0xFFBEFD64) argc = 1 0.0028 stat(“/usr/bin/whoâ€, 0xFFBEFA98) = 0 0.0005 open(“/var/ld/ld.configâ€, O_RDONLY) Err#2 ENOENT 0.0006 open(“/usr/local/lib/libc.so.1â€, O_RDONLY) Err#2 ENOENT 0.0005 open(“/usr/lib/libc.so.1â€, O_RDONLY) = 3 0.0004 fstat(3, 0xFFBEF42C) = 0
In this example, the stat system call took a lot longer than the others.
truss -T: This is a great debugging help. It will stop a process at the execution of a specified system call. (“-U†does the same, but with user-level function calls.) A core could then be taken for further analysis, or any of the /proc tools could be used to determine many aspects of the status of the process.
truss -l (improved in Solaris 9): Shows the thread number of each call in a multi-threaded processes. Solaris 9 truss -l finally makes it possible to watch the execution of a multi-threaded application.
Truss is truly a powerful tool. It can be used on core files to analyze what caused the problem, for example. It can also show details on user-level library calls (either system libraries or programmer libraries) via the “-u†option.
pkg-get: This is a nice tool (http://www.bolthole.com/solaris) for automatically getting freeware packages. It is configured via /etc/pkg-get.conf. Once it’s up and running, execute pkg-get -a to get a list of available packages, and pkg-get -i to get and install a given package.
plimit (Solaris >= 8): This command displays and sets the per-process limits on a running process. This is handy if a long-running process is running up against a limit (for example, number of open files). Rather than using limit and restarting the command, plimit can modify the running process.
coreadm (Solaris >= 8): In the “old†days (before coreadm), core dumps were placed in the process’s working directory. Core files would also overwrite each other. All this and more has been addressed by coreadm, a tool to manage core file creation. With it, you can specify whether to save cores, where cores should be stored, how many versions should be retained, and more. Settings can be retained between reboots by coreadm modifying /etc/coreadm.conf.
pgrep (Solaris >= 8): pgrep searches through /proc for processes matching the given criteria, and returns their process-ids. A great option is “-nâ€, which returns the newest process that matches.
preap (Solaris >= 9): Reaps zombie processes. Any processes stuck in the “z†state (as shown by ps), can be removed from the system with this command.
pargs (Solaris >= 9): Shows the arguments and environment variables of a process.
nohup -p (Solaris >= 9): The nohup command can be used to start a process, so that if the shell that started the process closes (i.e., the process gets a “SIGHUP†signal), the process will keep running. This is useful for backgrounding a task that should continue running no matter what happens around it. But what happens if you start a process and later want to HUP-proof it? With Solaris 9, nohup -p takes a process-id and causes SIGHUP to be ignored.
prstat (Solaris >= 8): prstat is top and a lot more. Both commands provide a screen’s worth of process and other information and update it frequently, for a nice window on system performance. prstat has much better accuracy than top. It also has some nice options. “-a†shows process and user information concurrently (sorted by CPU hog, by default). “-c†causes it to act like vmstat (new reports printed below old ones). “-C†shows processes in a processor set. “-j†shows processes in a “projectâ€. “-L†shows per-thread information as well as per-process. “-m†and “-v†show quite a bit of per-process performance detail (including pages, traps, lock wait, and CPU wait). The output data can also be sorted by resident-set (real memory) size, virtual memory size, execute time, and so on. prstat is very useful on systems without top, and should probably be used instead of top because of its accuracy (and some sites care that it is a supported program).
trapstat (Solaris >= 9): trapstat joins lockstat and kstat as the most inscrutable commands on Solaris. Each shows gory details about the innards of the running operating system. Each is indispensable in solving strange happenings on a Solaris system. Best of all, their output is good to send along with bug reports, but further study can reveal useful information for general use as well.
vmstat -p (Solaris >= 8): Until this option became available, it was almost impossible (see the “se toolkitâ€) to determine what kind of memory demand was causing a system to page. vmstat -p is key because it not only shows whether your system is under memory stress (via the “sr†column), it also shows whether that stress is from application code, application data, or I/O. “-p†can really help pinpoint the cause of any mysterious memory issues on Solaris.
pmap -x (Solaris >= 8, bugs fixed in Solaris >= 9): If the process with memory problems is known, and more details on its memory use are needed, check out pmap -x. The target process-id has its memory map fully explained, as in:
# pmap -x 1779 1779: -ksh Address Kbytes RSS Anon Locked Mode Mapped File 00010000 192 192 - - r-x-- ksh 00040000 8 8 8 - rwx-- ksh 00042000 32 32 8 - rwx-- [ heap ] FF180000 680 664 - - r-x-- libc.so.1 FF23A000 24 24 - - rwx-- libc.so.1 FF240000 8 8 - - rwx-- libc.so.1 FF280000 568 472 - - r-x-- libnsl.so.1 FF31E000 32 32 - - rwx-- libnsl.so.1 FF326000 32 24 - - rwx-- libnsl.so.1 FF340000 16 16 - - r-x-- libc_psr.so.1 FF350000 16 16 - - r-x-- libmp.so.2 FF364000 8 8 - - rwx-- libmp.so.2 FF380000 40 40 - - r-x-- libsocket.so.1 FF39A000 8 8 - - rwx-- libsocket.so.1 FF3A0000 8 8 - - r-x-- libdl.so.1 FF3B0000 8 8 8 - rwx-- [ anon ] FF3C0000 152 152 - - r-x-- ld.so.1 FF3F6000 8 8 8 - rwx-- ld.so.1 FFBFE000 8 8 8 - rw--- [ stack ] -------- ------- ------- ------- ------- total Kb 1848 1728 40 -
Here we see each chunk of memory, what it is being used for, how much space it is taking (virtual and real), and mode information.
df -h (Solaris >= 9): This command is popular on Linux, and just made its way into Solaris. df -h displays summary information about file systems in human-readable form:
$ df -h Filesystem size used avail capacity Mounted on /dev/dsk/c0t0d0s0 4.8G 1.7G 3.0G 37% / /proc 0K 0K 0K 0% /proc mnttab 0K 0K 0K 0% /etc/mnttab fd 0K 0K 0K 0% /dev/fd swap 848M 40K 848M 1% /var/run swap 849M 1.0M 848M 1% /tmp /dev/dsk/c0t0d0s7 13G 78K 13G 1% /export/home
Conclusion
Each administrator has a set of tools used daily, and another set of tools to help in a pinch. This column included a wide variety of commands and options that are lesser known, but can be very useful. Do you have favorite tools that have saved you in a bind? If so, please send them to me so I can expand my tool set as well. Alternately, send along any tools that you hate or that you feel are dangerous, which could also turn into a useful column!
No comments:
Post a Comment