Tuesday, June 5, 2007

Small tutorial on SCCS

The Basics

SCCS is a software versioning and revision control system. It works by maintaining information about files in a main directory (called SCCS). Each person working on the code creates a symbolic link to this directory in his/her own workspace. There is an SCCS directory for each project.

Setup

Decide on a central repository to store the code. This can be something like /usr/src/projectName.
Create the SCCS directory there: mkdir /usr/src/projectName/SCCS. Put the original files into the SCCS system using
sccs create filename
where filename is the name of the file.
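For example, a complete setup for a hypothetical project might look like this (all paths and file names are illustrative):

mkdir /usr/src/projectName/SCCS
cd ~/work/projectName
ln -s /usr/src/projectName/SCCS SCCS
sccs create main.c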

Usage

  • Creating a file and putting it into SCCS:
    sccs create filename
  • Checking files out for editing:
    sccs edit filename
  • Checking files in after editing:
    sccs delget filename
  • Getting a particular version of a file:
    sccs get filename gets the latest version of the file
    sccs get -rxx filename gets version xx of the file
  • Getting the latest version of all files:
    sccs get SCCS/s.*
  • Getting information about which files are checked out:
    sccs info
  • Getting information about a particular file:
    sccs prt filename
  • Diffing the current file against the checked-in version:
    sccs diffs filename

To modify a file, check it out for editing (edit), make the modifications, and then check it back in (delget).
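A full cycle on a hypothetical file might look like the following; the comments? prompt comes from delget and is where you describe your change (exact output varies between SCCS implementations):

sccs edit main.c
(edit main.c with your editor of choice)
sccs delget main.c
comments? fix off-by-one error in the main loop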

Version info in the source code

SCCS provides the ability to mark your source files with SCCS information, such as the version and the date of the last modification. To enable this, include the string %W% %G% in your source code, generally inside a comment. When a read-only copy of the file is retrieved (as the get step of delget does), the keywords are expanded: %W% becomes an ID string containing the file name and version, and %G% becomes the date of the last modification.
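For instance, a shell script under SCCS might carry the keywords in its header comment. This is a sketch; the file name is hypothetical and the expanded form shown is approximate, as exact formatting varies between implementations:

# %W% %G%

After check-in, a read-only copy retrieved with sccs get would carry an expanded line similar to:

# @(#)build.sh	1.2	07/06/05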

Monday, June 4, 2007

Basic Source Control Using RCS

 
Applying RCS and SCCS: From Source Control to Project Control

Basic Source Control Using RCS

The Revision Control System, or RCS, was developed by Walter F. Tichy at Purdue University in the early 1980s. Implemented later than SCCS, and with full knowledge of it, RCS is a more user-friendly system, and in most ways a more powerful one. In this chapter we present the most basic capabilities of RCS, by showing how you can apply it to the source file modification cycle.

Background

Traditionally, RCS has been included in BSD UNIX distributions; currently, it is also distributed by the Free Software Foundation [Egg91]. RCS has not traditionally been included in AT&T-derived UNIX distributions. Despite the technical merits of RCS, its absence from System V and earlier AT&T systems can present practical and political obstacles to those who would like to use it.

If RCS is of interest to you, make sure your system provides it. If not, you'll be obliged to obtain it from another source, such as the FSF. (We provide instructions for doing so in Appendix H, References.)[1] The FSF, of course, distributes RCS in source form only. Though it's normally trivial to configure and build RCS for a UNIX-like ("POSIX-compliant") platform, this is still something you would have to do yourself if you obtained the system in this way.

[1] A successor to RCS called RCE (for Revision Control Engine) has recently been announced by a group working in Germany with Walter Tichy [XCC95]. RCE is built atop a difference generator that works between arbitrary files, and is implemented as a library, permitting source control operations to be integrated with existing applications. A "stand-alone" command-line interface that is compatible with RCS is provided, as well as a graphical interface. We have not evaluated RCE; if it's of interest to you, see Appendix H for information on finding out more about it.

In this book we describe RCS version 5.6.0.1, the most recent one available as this book was being written.[2] This version of RCS contains the commands listed in Table 3.1.[3]

[2] Note that RCS 5.6.0.1 differs from 5.6 only in that it provides partial support for a new form of conflict output in doing three-way merges. (See Chapter 5, Extending Source Control to Multiple Releases, for a discussion of such merges.) Since the new support is incomplete, it's disabled, making 5.6.0.1 effectively identical to 5.6. Version 5.7 of RCS was released just after this book went into production. Though it changes nothing fundamental, 5.7 does introduce a few new features. We flag the most important or visible of these in footnotes at the relevant points in our presentation. Appendix G, Changes in RCS Version 5.7, gives a more complete summary of what changed.

[3] We include in Table 3.1 the rcsclean(1) command, which in older RCS releases was a useful but limited shell script. In the current release, the command is implemented in C and uses much of the same internal code as the other RCS commands. Though the RCS sources still flag rcsclean as experimental, we think it's "grown up" enough to warrant inclusion as part of the standard system.

Table 3.1: The RCS Command Set
Command Description
ci Check in RCS revisions
co Check out RCS revisions
ident Identify files
merge Three-way file merge
rcs Change RCS file attributes
rcsclean Clean up working files
rcsdiff Compare RCS revisions
rcsmerge Merge RCS revisions
rlog Print log messages and other info on RCS files

Conventions

Before describing the basic RCS commands, let's define some terms and take a look at command-line conventions, especially how you specify files to the system.

Nomenclature

When RCS creates an archive file, the name of the archive file is the source file name with ,v appended to it. Thus if you created an archive for the file xform.c, xform.c,v would become the archive's name. The ",v" nominally refers to the multiple "versions" of the source file stored in the archive file. Naturally enough, RCS calls its archive files "RCS files."

All of the terms that we introduced in prior chapters to talk about source control in fact come from RCS. Thus RCS uses the term "revision" to refer to each stored version of the source file. It also uses the term "check-in" to refer to the addition of a new revision to an RCS file and "check-out" to refer to the retrieval of an existing revision from an RCS file. And a source file that's been retrieved from an RCS file is known as a "working file."

RCS Command Lines

Like most programs with a UNIX heritage, all RCS commands expect a command line that consists of a command name followed by one or more file names. The file names may be (but don't have to be) preceded by one or more options. If given, options change how the command works. So to summarize, a command line looks like

command-name [options] files

Each option begins with a hyphen, which is what distinguishes it from a filename. After the hyphen comes a single letter that identifies the option; then (for some options) comes a string that serves as a value for the option. Never insert whitespace between an option letter and its value--let them appear as one argument on the command line. (The first argument not starting with a hyphen is assumed to begin the filename arguments for the command.) Each file named on a command line can be either a source file or an RCS file, as we explain below.

Thus a typical command might be

% rcsdiff -r1.2 xform.c

This invocation of the rcsdiff(1) command specifies one option (-r, which has 1.2 as its value) and specifies one file, xform.c.

One final note is really not related to RCS, but to entering quoted strings on a shell command line. As we'll see, you sometimes have the choice of entering a description of an operation either at a prompt from the program you're running or directly on the program's command line. If you want to give the description on the command line, you'll need to enter it as a quoted string (because it will contain whitespace). And if you want to continue the description over more than one line, you'll have to use whatever convention your shell supports for continuing a command line.

For example, the -m option to the ci program specifies comments for a check-in operation. If you want to give a multiline comment and you're using csh(1) (or derivatives) as your shell, you need to precede each carriage return in the commentary with a backslash:

% ci -m"Fix CR 604: vary conditioning algorithm according to\
? range data supplied by caller." filter.sh

However, under the Bourne shell (or ksh(1) or bash(1)), as long as the value for -m is quoted, you don't need to do anything special to continue the comments onto a second line:

$ ci -m"Fix CR 604: vary conditioning algorithm according to
> range data supplied by caller." filter.sh

Naming Files

In running an RCS command, you can name either a working file or the corresponding RCS file on the command line; the command will automatically derive the other file name from the one you provide. This means, for instance, that these two command lines are equivalent:

% rcsdiff -r1.2 xform.c,v
% rcsdiff -r1.2 xform.c

Another feature provided by RCS is the automatic use of an RCS subdirectory for RCS files. If you create such a subdirectory beneath the directory where you're working, RCS will try to use it before trying to use the current directory. RCS will not, however, create a subdirectory if one doesn't already exist.

Let's examine naming in more detail. Say that your working file has the name workfile and that path1 and path2 are UNIX pathnames. Then the full set of rules for specifying names to an RCS command looks like this:

  • If you name only a working file (such as, say, path1/workfile), the command tries to use an RCS file with the name path1/RCS/workfile,v or path1/workfile,v (in that order). Naturally, this is also what happens in the simple case in which path1 is not present.

  • If you name only an RCS file without a pathname prefix (such as workfile,v), the command tries to use workfile,v first in any RCS subdirectory beneath the current directory, then in the current directory itself. If it can use an RCS file with one of those names, it tries to use a working file named workfile in the current directory.

  • If you name an RCS file with a pathname prefix (such as path2/workfile,v), the command expects to be able to use an RCS file with exactly that name. Then it tries to use a working file named workfile in the current directory.

  • If you name both a working file and an RCS file, then the command uses files with exactly those names during its execution. In this case the two files can be specified in either order, and can come from completely unrelated directories.

Suppose, for instance, that in your current directory you had a source file xform.c, as well as an RCS subdirectory. Then any of these command lines would create an archive file named RCS/xform.c,v from the xform.c source file. (The command here is the "check-in" command we describe below.)

% ci xform.c
% ci xform.c,v
% ci RCS/xform.c,v
% ci RCS/xform.c,v xform.c
% ci xform.c RCS/xform.c,v

When the source file and the RCS file are in the same directory (or when the RCS file is in an RCS subdirectory), there's no need to give both file names on the command line. This becomes useful only if the two files are in unrelated directories. For example, if xform.c were in the directory /home/cullen/current/src/mathlib, but you wanted to create the RCS file in the directory /project/archive/mathlib, either of these command lines would do the trick:

% ci /home/cullen/current/src/mathlib/xform.c /project/archive/mathlib/xform.c,v
% ci /project/archive/mathlib/xform.c,v /home/cullen/current/src/mathlib/xform.c

Command lines like this one become useful when you put files into separate trees according to their type. (We mentioned this possibility at the end of Chapter 2, The Basics of Source Control.) This is one of the key concepts behind project control, as we'll see time and again in later chapters. If you take this approach, though, you won't want to be typing horrendously long command lines all the time. It's far better to create some kind of "tree mapper" to manage the filenames for you. Such a mapper is fundamental to systems like TCCS.

Naturally, for any RCS command, you can specify more than one file, and the command will process each file in turn. For your own sake, if you frequently process more than one file at a time, you'll probably want to use an RCS subdirectory to hold RCS files. This helps you avoid naming RCS files by mistake when you use wildcards to name groups of working files.

Note that RCS will take a command line of intermixed working filenames and RCS filenames and match them up using the rules we outlined earlier in this chapter. Though this may work all right for simple cases, the potential for ambiguity or erroneous file inclusion is great enough that you should avoid the situation altogether and just segregate your RCS files in an RCS subdirectory.

This is desirable for more general administrative reasons as well. Working files and RCS files are innately different, and it only makes sense to keep them in distinct places to make it easy to administer them appropriately. In particular, by segregating your RCS files, you make it harder to access them accidentally in any way other than through the RCS command set. An rm -rf will still remove them, of course, but the added safety of an RCS subdirectory shouldn't be neglected.

Basic RCS Commands

Now we present one iteration of the source file modification cycle, using RCS commands to implement each operation. We also cover a few other basic commands that are not strictly part of the cycle. All of this involves only some of the RCS commands and few (if any) of their many options. Later chapters explore more of the potential of the full RCS command set.

Figure 3.1 depicts the basic source control operations. This is the same picture we presented as Figure 2-1, but with the "bubbles" annotated to show which RCS command actually implements each operation. So once again, the central part of the figure shows the modification cycle.

Figure 3.1: Basic source control operations in RCS


Let's cover each of the operations in more depth. Roughly speaking, we'll describe them in the order in which they appear in the figure, working from top to bottom.

Creating an RCS File

You create an RCS file with the command ci(1) (for "check-in"). The command line need not specify anything but the name of the source file you're checking in. So a simple example of creating an RCS file is

% ci xform.c

which will create the file xform.c,v. If you've already made an RCS subdirectory, then the file will be created there. Otherwise, it will be created in the current directory.

By default, when you create an RCS file, you're prompted for a short description of the file you're putting under source control. You may enter as much text as you like in response, ending your input with a line containing a period by itself. The interaction looks like this:

% ci xform.c
xform.c,v  <--  xform.c
initial revision: 1.1
enter description, terminated with ^D or '.':
NOTE: This is NOT the log message!
>> Matrix transform routines, for single-precision data.
>> .
done

Once the RCS file is created, your source file is immediately deleted. It is, of course, now safely stored in the RCS file and can be extracted as a working file whenever you want it.

As the warning "NOTE: This is NOT the log message!" implies, you really create two descriptions when you check in the initial revision of an archive file. The first description, which is what ci prompts for by default, is for the file itself--this message is meant to describe the role of the file in your project. In addition, ci also creates a log message (a term we'll come back to later), to describe the first revision of the archive file--you can use this description to trace the origins of the source file you're checking in.

By default, ci creates a log message with the value "Initial revision". If you want to use the message actually to capture some useful data, you can use the -m option on the ci command line to specify it, like this:

% ci -m"As ported from 9000 engineering group sources,\
? version 4.13." xform.c

Of course, this message has to be quoted, and the usual rules apply if it extends across multiple lines.

The ci command also lets you specify the archive file description on the command line, instead of being prompted for it, via the -t flag. In fact, you can use -t on any check-in, not just the first one, to change an archive's description. You can give a description either as the value to -t or in a file, which you name using -t.

If the value of the option starts with a hyphen, it's taken to be the literal text of the description; otherwise, it's taken to be the name of a file containing the description. So either of these command sequences would be equivalent to the original ci command we showed above:

% cat > xform.desc
Matrix transform routines, for single-precision data.
^D
% ci -txform.desc xform.c
% ci -t-"Matrix transform routines, for single-precision data." xform.c

Getting a Working File for Reading

You extract a working file from an existing RCS file with the command co (for "check-out"). The co(1) command is designed to be the mirror-image of ci(1). So, once again, in the simplest case you specify nothing but a filename when you run the command. A simple example of creating a working file is

% co xform.c
xform.c,v  -->  xform.c
revision 1.1
done

This command will look for an RCS file with the name RCS/xform.c,v or xform.c,v and create a working file from it named xform.c. Here, the output from the command confirms that revision 1.1 of xform.c was extracted from the RCS file xform.c,v.

Since the command line doesn't explicitly say that you want to modify xform.c, the file is created read-only. This is a reminder that you shouldn't change it unless you coordinate your change with the RCS file (by locking the revision of xform.c that you want to modify).

If a writable file already exists with the same name as the working file that co is trying to create, the command will warn you of the file's presence and ask you whether you want to overwrite it. If a writable copy of xform.c existed, for instance, the exchange would look like this:

% co xform.c
RCS/xform.c,v  -->  xform.c
revision 1.1
writable xform.c exists; remove it? [ny](n):

At this point, co expects a response from you that starts with y or n--responding with n, or with anything other than a word beginning with y, will cause co to abort the check-out. If you abort the check-out, co confirms that with the additional message

co error: checkout aborted

This message, of course, doesn't really indicate an "error"--just that the check-out was aborted as you requested.

CAUTION: As we said in Chapter 2, co will silently overwrite any read-only copy of a file that already exists with the same name as a working file it wants to check out, on the assumption that the existing file is the result of a previous co for reading. So if a file is under source control, do not try to maintain changed copies of it manually (i.e., outside source control). If you do, then sooner or later you're likely to delete a file you wanted to save.

You can change the way co checks out a file if for some reason the usual safeguards it provides against overwriting a working file aren't appropriate. First of all, you can use the -f option to "force" co to create a working file even if a writable file of the same name already exists. You might use -f if you had copied some outdated copy of the file into your work area but now wanted to overwrite it with a current copy from the archive file.

At the other extreme, you can check out a revision without affecting any file that already exists with the same name, by using the -p option. With -p, co will "print" the checked-out revision to its standard output, rather than putting it into a working file. You can then redirect standard output to capture the file in whatever way is appropriate. You might use -p if you wanted to have more than one revision of a file checked out simultaneously--you could check out all but one revision with -p into files with special names. (Of course, -p is purely a convenience. You can always avoid using it by doing regular check-outs and renaming the working files afterward.)
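For example, to keep an older revision around under a different name while your working file stays put, you might do something like this (the revision number is illustrative):

% co -p1.2 xform.c > xform.c.1.2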

Getting a Working File for Modification

If you want to change a source file for which you've created an RCS file, you need to get a writable working copy by adding the -l option to the co command line. To check out xform.c for modification, you use the command line:

% co -l xform.c
xform.c,v  -->  xform.c
revision 1.1 (locked)
done

Compare the output from this command to that from the last co we looked at. As you can see, the current output confirms that a lock has been set on the revision of xform.c you've checked out. Now that you have the lock, you have the exclusive right to change this revision (revision 1.1) of the file and eventually to check in your working file as the next revision of the RCS file.

If someone else already held the lock to revision 1.1, you would not be able to lock it yourself. However, even when you can't lock a given revision, you can still check it out for reading only (that is, without the -l option). The assumption here is that you won't modify the file when you obtain it for reading only. If, for example, you requested the lock for revision 1.1 of xform.c but couldn't get it, co would inform you with an error message like this one:

% co -l xform.c
RCS/xform.c,v  -->  xform.c
co error: revision 1.1 already locked by cullen

In this case you don't have the option of forcing the check-out to proceed, so co doesn't ask whether you want to. The check-out is aborted unconditionally. The error message points out which user owns the lock, which lets you contact him if you absolutely need to modify the file now. Perhaps he can check it back in. Even if he can't, waiting is better than circumventing RCS.

Occasionally, you may need to set a lock in an archive file without checking out a working file. Say, for instance, you're archiving sources distributed from outside your group, you've moved a new distribution into place, and now you want to check in the new version of each file. Checking out working files would overwrite the new sources with the older ones. To set a lock without creating a working file, use the command rcs -l.

Having a lock set in each archive file will enable you to check in the corresponding newly imported source file as a successor to the existing revision you locked.
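A lock-only operation might look like this; the output shown is approximate:

% rcs -l xform.c
RCS file: RCS/xform.c,v
1.2 locked
done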

Comparing a Working File to Its RCS File

To compare a working file against the RCS file it came from, use the rcsdiff(1) program. If, for instance, you want to compare the current contents of the working file against the original revision you checked out of the RCS file, just give the command with no options, as in

% rcsdiff xform.c

The rcsdiff command will output a line indicating what revision it is reading from the RCS file, then it will run the diff(1) program, comparing the revision it read against the current working file. The diff output will show the original revision as the "old" file being compared and the current working file as the "new" file being compared. Typical output might look like this:

% rcsdiff xform.c
===================================================================
RCS file: xform.c,v
retrieving revision 1.1
diff -r1.1 xform.c
4a5
> j_coord = i_coord - x;
11,12c12,13
< for (j = j_coord; j < j_max; ++j)
< if (a[j] < b[j]) {
---
> for (j = j_coord + 1; j <= j_max; ++j)
> if (a[j - 1] < b[j]) {
20d20
< j_coord = i_coord - x;

In other words, in the working file (the new file in the diff listing), an assignment to j_coord has moved from line 20 to line 4, and the first two lines of the for loop currently at line 12 have been changed.

You can also use rcsdiff to compare a working file against some revision other than the one it started from or to compare two different RCS file revisions to each other. To compare your working file against any revision of the RCS file, add a -r option to the rcsdiff command line, naming the revision you're interested in. For instance, to compare the current contents of xform.c against revision 1.3 of its RCS file, you use the command

% rcsdiff -r1.3 xform.c

To compare two different revisions already checked in to the RCS file, just give two -r options, as in

% rcsdiff -r1.1 -r1.2 xform.c

This command produces a diff listing with revision 1.1 as the "old" file and revision 1.2 as the "new" file. This form of rcsdiff can be particularly useful for debugging, since it lets you see recent changes to the file other than your own.

Adding a Working File to Its RCS File

When you're satisfied with the current state of your working file and want to save it for future reference, use the ci command to add it to the corresponding RCS file. This is, of course, the same command you used to create the RCS file in the first place; ordinarily, to check in a working file, you give the same simple command line as you did then. For instance,

% ci xform.c

This command would check in the current contents of xform.c as a new revision in the corresponding RCS file, then delete the working file. When you run ci, you'll be prompted for a description of your changes to the working file, in the same way as ci originally asked you to describe the file itself. A typical interaction might be

% ci xform.c
xform.c,v  <--  xform.c
new revision: 1.2; previous revision: 1.1
enter log message: (terminate with ^D or single '.')
>> In function ff1():  move declaration of j_coord; fix
>> off-by-one error in traversal loop.
>> .
done

Again, you can enter as much text as you like, and you terminate your entry with a period on a line by itself.

Sometimes, you may prefer to give revision commentary directly on the ci command line. This can be handy when you're checking in more than one file and want all of the files to have the same commentary. You do this using the -m option to ci (as we've mentioned a few times in other contexts). For instance, the last check-in that we showed could be phrased as

% ci -m"In function ff1():  move declaration of j_coord; fix\
? off-by-one error in traversal loop." xform.c

Notice that we gave the comments as a quoted string, as they contain white space. Since in this example we're using a csh(1)-style shell, we had to type a backslash to extend the comments onto a second line.

Ordinarily, ci expects a newly checked-in revision to be different from its ancestor and will not complete the check-in if the two are identical. You can use the -f option to "force" a check-in to complete anyway in this case.

By default, ci deletes your working file when the check-in is complete. Often, you'll still want a copy of the file to exist afterward. To make ci do an immediate check-out of the working file after checking it in, you can add either of two options to the command line. The -u option will check out your working file unlocked, suitable for read-only use. The -l option will set a new lock on the revision that you just checked in and check out the working file for modification. Both of these options are simply shorthand for doing a separate co following the check-in.
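For instance, either of these hypothetical check-ins leaves a copy of xform.c behind afterward, the first read-only and the second locked for further modification:

% ci -u -m"Vary conditioning constants." xform.c
% ci -l -m"Vary conditioning constants." xform.c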

Discarding a Working File

If you decide that you don't want to keep the changes that you've made, you can use the rcs(1) command to discard your changes by unlocking the RCS file revision you started with. Run rcs just by naming the file you want to discard, preceded by the option -u (for "unlock"):

% rcs -u xform.c
RCS file: xform.c,v
1.1 unlocked
done

This command will remove any lock you currently have set in the RCS file. However, it doesn't do anything to the working file you name and doesn't even require that the file exist. If you want to remove the working file, you have to do that yourself with rm(1).

If you've set more than one lock in a file under the same username, you need to tell rcs the revision you wish to unlock, by adding a revision ID to the -u option. Without it, the command can't tell which pending update to the archive file you want to cancel. This command, for instance, would unlock revision 1.1 of xform.c,v even if you had another revision locked:

% rcs -u1.1 xform.c
RCS file: xform.c,v
1.1 unlocked
done

If you want to discard a working file and replace it with the original revision it came from, it may be more convenient to use the command co -f -u. The -u option causes co to unlock the checked-out revision if it was locked by you, while -f forces co to overwrite your writable working file with the original revision from the archive file.
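So reverting a working file to the revision it came from can be a single command; this is a sketch, and the exact output may differ:

% co -f -u xform.c
xform.c,v  -->  xform.c
revision 1.1 (unlocked)
done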

Viewing the History of an RCS File

As we've seen, the ci command asks you for a description when you create an RCS file, as well as when you add a revision to one. Together, these descriptions form a history, or log, of all that's happened to the RCS file since its creation. The descriptions can be displayed by using the rlog(1) command.

As usual, you simply give on the command line the name of the file you want to examine. Here's an example:

% rlog xform.c

RCS file:        xform.c,v;   Working file:    xform.c
head:            1.2
branch:
locks:           ;  strict
access list:
symbolic names:
comment leader:  " * "
total revisions: 2;    selected revisions: 2
description:
Matrix transform routines, for single-precision data.
----------------------------
revision 1.2
date: 95/05/10 14:34:02;  author: rully;  state: Exp;  lines added/del: 3/3
In function ff1():  move declaration of j_coord; fix
off-by-one error in traversal loop.
----------------------------
revision 1.1
date: 95/04/23 14:32:31;  author: rully;  state: Exp;
As ported from 9000 engineering group sources, version 4.13.
=============================================================================

The output of rlog can be divided into three parts. First appears a summary of various characteristics of the RCS file, which is unrelated to what we've discussed in this chapter. Next, following the description: line, we find the text entered when the RCS file was first created. Last, a list of revision entries appears, one for each revision in the RCS file. These entries are output with the most recent first. Each one contains the description that was originally entered for that revision.

Cleaning Up an RCS Source Directory

To help you tidy up a source directory when you're done working there, RCS provides a program called rcsclean. This program compares working files in the current directory against their archive files and removes working files that were checked out but never modified. More specifically:

  • A working file that was checked out for reading is removed only if it still matches the head revision on the default branch of the archive file.[4]

    [4] See Chapter 6, Applying RCS to Multiple Releases, for a discussion of RCS archive file branches.

  • If -u is given, a working file that was checked out for modification is removed if it still matches the original revision (that is, the one checked out locked by the user).

  • If a working file does not match the revision noted in the last two cases, then rcsclean will never remove it.

When -u is given, if rcsclean removes a working file, it also removes any lock corresponding to it. Any commands rcsclean executes are echoed to its standard output so you can see what's going on.

If you invoke rcsclean with no arguments, it will process all of the working files in the current directory. If you provide arguments, then only the working files you name will be processed. Needless to say, rcsclean has no effect on files other than working files checked out from an RCS file.

If you want to see what commands rcsclean would execute, if given a certain command line, you can use the -n flag. Then rcsclean will echo the commands it normally would run but will not actually execute them. Note that the output from rcsclean -n looks exactly like the normal output, so be careful not to confuse a -n run with the real McCoy.
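A dry run over a hypothetical directory might look like this; the exact commands echoed vary by RCS version:

% rcsclean -n *.c
rm -f xform.c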

Summary

You can put and keep files under source control with RCS by using only two commands, ci and co. This simplicity is a strong point of the system. We've also introduced the rcs command to abort a pending modification to an RCS file and informational commands rcsdiff and rlog, which give you detailed information about the contents of an RCS file. Finally, we presented rcsclean to remove unmodified working files.

Table 3.2 summarizes our presentation so far, by relating each operation in the source file modification cycle (plus a few other basic ones) to the RCS command that implements it.

Table 3.2: Basic RCS Commands
Command Basic Operation
ci Creating an archive file
co Getting a working file for reading
co -l Getting a working file for modification
rcsdiff Comparing a working file to its RCS file
ci Adding a working file to an RCS file
rcs -u plus rm Discarding a working file
rlog Viewing the history of an RCS file
rcsclean Cleaning up a source directory

Remember, too, that all of these commands employ an intelligent command-line interface that fairly well balances simplicity and flexibility and can provide an advantage over SCCS. That said, let's see what SCCS has to offer.

Monday, April 2, 2007

Understanding Virtual Memory - Linux

Understanding Virtual Memory

Introduction

One of the most important aspects of an operating system is the Virtual Memory Management system. Virtual Memory (VM) allows an operating system to perform many of its advanced functions, such as process isolation, file caching, and swapping. As such, it is imperative that an administrator understand the functions and tunable parameters of an operating system's Virtual Memory Manager so that optimal performance for a given workload may be achieved.

After reading this article, the reader should have a rudimentary understanding of the data the Red Hat Enterprise Linux 3 (RHEL3) VM controls and the algorithms it uses. Further, the reader should have a fairly good understanding of general Linux VM tuning techniques.

It is important to note that Linux as an operating system has a proud legacy of overhaul. Items which no longer serve useful purposes or which have better implementations as technology advances are phased out. This implies that the tuning parameters described in this article may be out of date if you are using a newer or older kernel. Fear not, however! With a well-grounded understanding of the general mechanics of a VM, it is fairly easy to convert knowledge of VM tuning from one VM to another. The same general principles apply, and documentation for a given kernel (including its specific tunable parameters) can be found in the corresponding kernel source tree under the file Documentation/sysctl/vm.txt.

Definitions

To properly understand how a Virtual Memory Manager does its job, it helps to understand what components comprise a VM. While the low-level view of a VM is overwhelming for most, a high-level view is necessary to understand how a VM works and how it can be optimized for various workloads.

What Comprises a VM

Figure 1. High Level Overview of VM Subsystem

The inner workings of the Linux virtual memory subsystem are quite complex, but it can be defined at a high level with the following components:

MMU

The Memory Management Unit (MMU) is the hardware base that makes a VM system possible. The MMU allows software to reference physical memory by aliased addresses, quite often more than one. It accomplishes this through the use of pages and page tables. The MMU uses a section of memory to translate virtual addresses into physical addresses via a series of table lookups.

Zoned Buddy Allocator

The Zoned Buddy Allocator is responsible for the management of page allocations to the entire system. This code manages lists of physically contiguous pages and maps them into the MMU page tables, so as to provide other kernel subsystems with valid physical address ranges when the kernel requests them (Physical to Virtual Address mapping is handled by a higher layer of the VM). The name Buddy Allocator is derived from the algorithm this subsystem uses to maintain its free page lists. All physical pages in RAM are cataloged by the Buddy Allocator and grouped into lists. Each list represents clusters of 2^n pages, where n is incremented in each list. If no entries exist on the requested list, an entry from the next list up is broken into two separate clusters; one is returned to the caller while the other is added to the next list down. When an allocation is returned to the buddy allocator, the reverse process happens. Note that the Buddy Allocator also manages memory zones, which define pools of memory that have different purposes. Currently there are three memory pools for which the Buddy Allocator manages accesses:

  • DMA — This zone consists of the first 16 MB of RAM, from which legacy devices allocate to perform direct memory operations.

  • NORMAL — This zone encompasses memory addresses from 16 MB to 1 GB and is used by the kernel for internal data structures as well as other system and user space allocations.

  • HIGHMEM — This zone includes all memory above 1 GB and is used exclusively for system allocations (file system buffers, user space allocations, etc).
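On a running system you can see the low/high memory split these zones produce by inspecting /proc/meminfo; the sizes shown below are from a hypothetical machine with 1 GB of RAM:

egrep 'HighTotal|LowTotal' /proc/meminfo
HighTotal:      131072 kB
LowTotal:       917504 kB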

Slab Allocator

The Slab Allocator provides a more usable front end to the Buddy Allocator for those sections of the kernel which require memory in sizes that are more flexible than the standard 4 KB page. The Slab Allocator allows other kernel components to create caches of memory objects of a given size. The Slab Allocator is responsible for placing as many of the cache's objects on a page as possible and monitoring which objects are free and which are allocated. When allocations are requested and no more are available, the Slab Allocator requests more pages from the Buddy Allocator to satisfy the request. This allows kernel components to use memory in a much simpler way: components which make use of many small portions of memory are not required to implement their own memory management code, and pages are not wasted on fragmented allocations. The Slab Allocator may only allocate from the DMA and NORMAL zones.
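You can observe the Slab Allocator's caches through the proc file system; each line of /proc/slabinfo names a cache and reports its object counts and sizes (the exact columns vary by kernel version):

head -5 /proc/slabinfo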

Kernel Threads

The last component in the VM subsystem is the set of kernel threads: kscand, kswapd, kupdated, and bdflush. These tasks are responsible for the recovery and management of in-use memory. All pages of memory have an associated state (for more information on the memory state machine, refer to the section called “The Life of a Page”). In general, the active tasks in the kernel related to VM usage are responsible for attempting to move pages out of RAM. Periodically they examine RAM, trying to identify and free inactive memory so that it can be put to other uses in the system.

The Life of a Page

All of the memory managed by the VM is labeled by a state. These states let the VM know what to do with a given page under various circumstances. Depending on the current needs of the system, the VM may transfer pages from one state to the next, according to the state machine in Figure 2, “VM Page State Machine”. Using these states, the VM can determine what is being done with a page by the system at a given time and what actions the VM may take on the page. The states that have particular meanings are as follows:

  1. FREE — All pages available for allocation begin in this state. This indicates to the VM that the page is not being used for any purpose and is available for allocation.

  2. ACTIVE — Pages which have been allocated from the Buddy Allocator enter this state. It indicates to the VM that the page has been allocated and is actively in use by the kernel or a user process.

  3. INACTIVE DIRTY — This state indicates that the page has fallen into disuse by the entity which allocated it and thus is a candidate for removal from main memory. The kscand task periodically sweeps through all the pages in memory, taking note of the amount of time the page has been in memory since it was last accessed. If kscand finds that a page has been accessed since it last visited the page, it increments the page's age counter; otherwise, it decrements that counter. If kscand finds a page with its age counter at zero, it moves the page to the inactive dirty state. Pages in the inactive dirty state are kept in a list of pages to be laundered.

  4. INACTIVE LAUNDERED — This is an interim state in which those pages which have been selected for removal from main memory enter while their contents are being moved to disk. Only pages which were in the inactive dirty state can enter this state. When the disk I/O operation is complete, the page is moved to the inactive clean state, where it may be deallocated or overwritten for another purpose. If, during the disk operation, the page is accessed, the page is moved back into the active state.

  5. INACTIVE CLEAN — Pages in this state have been laundered. This means that the contents of the page are in sync with the backed up data on disk. Thus, they may be deallocated by the VM or overwritten for other purposes.

Figure 2. VM Page State Machine

Tuning the VM

Now that the picture of the VM mechanism is sufficiently illustrated, how is it adjusted to fit certain workloads? There are two methods for changing tunable parameters in the Linux VM. The first is the sysctl interface. The sysctl interface is a programming oriented interface, which allows software programs to modify various tunable parameters directly. It is exported to system administrators via the sysctl utility, which allows an administrator to specify a value for any of the tunable VM parameters via the command line. For example:

sysctl -w vm.max_map_count=65535

The sysctl utility also supports the use of a configuration file (/etc/sysctl.conf), in which all the desirable changes to a VM can be recorded for a system and restored after a restart of the operating system, making this access method suitable for long term changes to a system VM. The file is straightforward in its layout, using simple key-value pairs with comments for clarity. For example:

#Adjust the min and max read-ahead for files
vm.max-readahead=64
vm.min-readahead=32
#turn on memory over-commit
vm.overcommit_memory=2
#bump up the percentage of memory in use to activate bdflush
vm.bdflush="40 500 0 0 500 3000 60 20 0"

The second method of modifying VM tunable parameters is via the proc file system. This method exports every group of VM tunables as a virtual file, accessible via all the common Linux utilities used for modifying file contents. The VM tunables are available in the directory /proc/sys/vm/ and are most commonly read and modified using the cat and echo commands. For example, use the command cat /proc/sys/vm/kswapd to view the current value of the kswapd tunable. The output should be similar to:

 512 32 8  

Then, use the following command to modify the value of the tunable:

echo 511 31 7 > /proc/sys/vm/kswapd 

Use the cat /proc/sys/vm/kswapd command again to verify that the value was modified. The output should be:

 511 31 7  

The proc file system interface is a convenient method for making adjustments to the VM while attempting to isolate the peak performance of a system. For convenience, the following sections list the VM tunable parameters as the filenames they are exported to in the /proc/sys/vm/ directory. Unless otherwise noted, these tunables apply to the RHEL3 2.4.21-4 kernel.

bdflush

The bdflush file contains 9 parameters, of which 6 are tunable. These parameters affect the rate at which pages in the buffer cache (the subset of pagecache which stores files in memory) are freed and returned to disk. By adjusting the various values in this file, a system can be tuned to achieve better performance in environments where large amounts of file I/O are performed. Table 1. “bdflush Parameters” defines the parameters for bdflush in the order they appear in the file.

Parameter Description
nfract The percentage of dirty pages in the buffer cache required to activate the bdflush task
ndirty The maximum number of dirty pages in the buffer cache to write to disk in each bdflush execution
reserved1 Reserved for future use
reserved2 Reserved for future use
interval The number of jiffies (10ms periods) to delay between bdflush iterations
age_buffer The time for a normal buffer to age before it is considered for flushing back to disk
nfract_sync The percentage of dirty pages in the buffer cache required to cause the tasks which are writing pages of memory to begin writing those pages to disk instead
nfract_stop_bdflush The percentage of dirty pages in buffer cache required to allow bdflush to return to idle state
reserved3 Reserved for future use
Table 1. bdflush Parameters

Generally, systems that require more free memory for application allocation want to set the bdflush values higher (except for the age_buffer, which would be moved lower), so that file data is sent to disk more frequently and in greater volume, thus freeing up pages of RAM for application use. This, of course, comes at the expense of CPU cycles because the system processor spends more time moving data to disk and less time running applications. Conversely, systems which are required to perform large amounts of I/O would want to do the opposite to these values, allowing more RAM to be used to cache disk file so that file access is faster.
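As a sketch, a system that favors free memory for applications might activate bdflush earlier, flush more often, and age buffers faster with an entry like this in /etc/sysctl.conf; the numbers are illustrative, not recommendations:

#activate bdflush at 20% dirty pages, shorten the interval, lower age_buffer
vm.bdflush="20 500 0 0 250 1000 60 20 0"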

dcache_priority

This file controls the bias of the priority for caching directory contents. When the system is under stress, it selectively reduces the size of various file system caches in an effort to reclaim memory. By increasing this value, memory reclamation bias is shifted away from the dirent cache. By reducing this amount, the bias is shifted towards reclaiming dirent memory. This is not a particularly useful tuning parameter, but it can be helpful in maintaining the interactive response time on an otherwise heavily loaded system. If you experience intolerable delays in communicating with your system when it is busy performing other work, increasing this parameter may help.

hugetlb_pool

The hugetlb_pool file is responsible for recording the number of megabytes used for huge pages. Huge pages are just like regular pages in the VM, only they are an order of magnitude larger. Note also that huge pages are not swappable. Huge pages are both beneficial and detrimental to a system. They are helpful in that each huge page takes only one set of entries in the VM page tables, which allows for a higher degree of virtual address caching in the TLB (Translation Look-aside Buffer: A device which caches virtual address translations for faster lookups) and a requisite performance improvement. On the downside, they are very large and can be wasteful of memory resources for those applications which do not need large amounts of memory. Some applications, however, do require large amounts of memory and can make good use of huge pages if they are written to be aware of them. If a system is running applications which require large amounts of memory and is aware of this feature, then it is advantageous to increase this value to an amount satisfactory to that application or set of applications.
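For example, to set aside a hypothetical 256 MB of RAM for the huge page pool (size it to your application's actual needs):

echo 256 > /proc/sys/vm/hugetlb_pool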

inactive_clean_percent

This control specifies the minimum percentage of pages in each page zone that must be in the clean or laundered state. If any zone drops below this threshold, and the system is under pressure for more memory, then that zone will begin having its inactive dirty pages laundered. Note that this control is only available on the 2.4.21-5EL kernels forward. Raising the value for the corresponding zone which is memory starved causes pages to be paged out more quickly, eliminating memory starvation at the expense of CPU clock cycles. Lowering this number allows more data to remain in RAM, increasing the system performance but at the risk of memory starvation.

kswapd

While this set of parameters previously defined how frequently and in what volume a system moved non-buffer cache pages to disk, in Red Hat Enterprise Linux 3, these controls are unused.

max_map_count

The max_map_count file allows for the restriction of the number of VMAs (Virtual Memory Areas) that a particular process can own. A Virtual Memory Area is a contiguous area of virtual address space. These areas are created during the life of the process when the program attempts to memory map a file, links to a shared memory segment, or allocates heap space. Tuning this value limits the number of these VMAs that a process can own. Limiting the number of VMAs a process can own can lead to problematic application behavior, because the system will return out-of-memory errors when a process reaches its VMA limit, but it can also free up lowmem for other kernel uses. If your system is running low on memory in the NORMAL zone, then lowering this value will help free up memory for kernel use.

max-readahead

The max-readahead tunable affects how early the Linux VFS (Virtual File System) fetches the next block of a file from disk. File readahead values are determined on a per-file basis in the VFS and are adjusted based on the behavior of the application accessing the file. Anytime the current position being read in a file plus the current readahead value results in the file pointer pointing to the next block in the file, that block is fetched from disk. By raising this value, the Linux kernel allows the readahead value to grow larger, resulting in more blocks being prefetched from disks which predictably access files in uniform linear fashion. This can result in performance improvements but can also result in excess (and often unnecessary) memory usage. Lowering this value has the opposite effect. By forcing readaheads to be less aggressive, memory may be conserved at a potential performance impact.

min-readahead

Like max-readahead, min-readahead places a floor on the readahead value. Raising this number forces a file's readahead value to be unconditionally higher, which can bring about performance improvements provided that all file access in the system is predictably linear from the start to the end of a file. This, of course, results in higher memory usage from the pagecache. Conversely, lowering this value allows the kernel to conserve pagecache memory at a potential performance cost.
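As a sketch, a workload dominated by random file access, where aggressive readahead wastes pagecache, might use conservative values like these in /etc/sysctl.conf (illustrative values):

#keep readahead modest for randomly accessed files
vm.max-readahead=32
vm.min-readahead=8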

overcommit_memory

overcommit_memory is a value which sets the general kernel policy toward granting memory allocations. If the value is 0, then the kernel checks to determine if there is enough memory free to grant a memory request to a malloc call from an application. If there is enough memory, then the request is granted. Otherwise, it is denied and an error code is returned to the application. If the setting in this file is 1, the kernel allows all memory allocations, regardless of the current memory allocation state. If the value is set to 2, then the kernel permits allocations to exceed physical RAM and swap, up to the limit defined by the overcommit_ratio value. Enabling this feature can be somewhat helpful in environments which allocate large amounts of memory expecting worst case scenarios but do not use it all.

overcommit_ratio

The overcommit_ratio tunable defines the amount by which the kernel overextends its memory resources in the event that overcommit_memory is set to the value of 2. The value in this file represents a percentage added to the amount of actual RAM in a system when considering whether to grant a particular memory request. For instance, if this value is set to 50, then the kernel would treat a system with 1 GB of RAM and 1 GB of swap as a system with 2.5 GB of allocatable memory when considering whether to grant a malloc request from an application. The general formula for this tunable is:

allocatable memory = swap size + (RAM size * (1 + overcommit_ratio / 100))

Use these previous two parameters with caution. Enabling memory overcommit can create significant performance gains at little cost but only if your applications are suited to its use. If your applications use all of the memory they allocate, memory overcommit can lead to short performance gains followed by long latencies as your applications are swapped out to disk frequently when they must compete for oversubscribed RAM. Also, ensure that you have at least enough swap space to cover the overallocation of RAM (meaning that your swap space should be at least big enough to handle the percentage of overcommit in addition to the regular 50 percent of RAM that is normally recommended).

pagecache

The pagecache file adjusts the amount of RAM which can be used by the page cache. The page cache holds various pieces of data, such as open files from disk, memory mapped files, and pages of executable programs. Modifying the values in this file dictates how much of memory is used for this purpose. Table 2. “pagecache Parameters” defines the parameters for pagecache in the order they appear in the file.

Parameter Description
min The minimum amount of memory to reserve for pagecache use.
borrow The percentage of pagecache pages kswapd uses to balance the reclaiming of pagecache pages and process memory.
max If more memory than this percentage is used by pagecache, kswapd only evicts pages from the pagecache. Once the amount of memory in pagecache is below this threshold, kswapd begins moving process pages to swap again.
Table 2. pagecache Parameters

Increasing these values allows more programs and cached files to stay in memory longer, thereby allowing applications to execute more quickly. On memory starved systems, however, this may lead to application delays as processes must wait for memory to become available. Moving these values downward swaps processes and other disk-backed data out more quickly, allowing for other processes to obtain memory more easily and increasing execution speed. For most workloads the automatic tuning is sufficient. However, if your workload suffers from excessive swapping and a large cache, you may want to reduce the values until the swapping problem goes away.

page-cluster

The kernel attempts to read multiple pages from disk on a page fault to avoid excessive seeks on the hard drive. This parameter defines the number of pages the kernel tries to read into memory during each page fault. The value is interpreted as 2^page-cluster pages for each page fault. A page fault is encountered every time a virtual memory address is accessed for which there is not yet a corresponding physical page assigned or for which the corresponding physical page has been swapped to disk. If the memory address has been requested in a valid way (for example, the application contains the address in its virtual memory map), then the kernel associates a page of RAM with the address or retrieves the page from disk and places it back in RAM. Then the kernel restarts the application from where it left off. By increasing the page-cluster value, pages subsequent to the requested page are also retrieved, meaning that if the workload of a particular system accesses data in RAM in a linear fashion, increasing this parameter can provide significant performance gains (much like the file readahead parameters described earlier). Of course, if your workload accesses data discretely in many separate areas of memory, then this can just as easily cause performance degradation.
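For example, with 4 KB pages, setting page-cluster to 4 makes the kernel read 2^4 = 16 pages (64 KB) per fault; check the current value before changing it:

cat /proc/sys/vm/page-cluster
echo 4 > /proc/sys/vm/page-cluster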

Example Scenarios

Now that we have covered the details of kernel tuning, let us look at some example workloads and the various tuning parameters that may improve system performance.

File (IMAP, Web, etc.) Server

This workload is geared towards performing a large amount of I/O to and from the local disk, thus benefiting from an adjustment allowing more files to be maintained in RAM. This speeds up I/O by caching more files in RAM and eliminating the need to wait for disk I/O to complete. A simple change to sysctl.conf as follows usually benefits this workload:

#increase the amount of RAM pagecache is allowed to use
#before we start moving it back to disk
vm.pagecache="10 40 100"

General Compute Server With Many Active Users

This workload is a very general type of configuration. It involves many active users who likely run many processes, all of which may or may not be CPU intensive or I/O intensive or a combination thereof. As the default VM configuration attempts to find a balance between I/O and process memory usage, it may be best to leave most configuration settings alone in this case. However, this environment likely contains many small processes which, regardless of workload, consume memory resources, particularly lowmem. It may help, therefore, to tune the VM to conserve low memory resources when possible:

#lower the pagecache max to keep from eating all memory up with cache
vm.pagecache="10 25 50"
#lower max-readahead to reduce the amount of unneeded IO
vm.max-readahead=16

Non-interactive (Batch) Computing Server

A batch computing server is usually the exact opposite of a file server. Applications run without human interaction, and they commonly perform little I/O. The number of processes running on the system is tightly controlled. Consequently, this system should be tuned to allow maximum throughput:

#Reduce the amount of pagecache normally allowed
vm.pagecache="1 10 100"
#do not worry about conserving lowmem, not that many processes
vm.max_map_count=128000
#crank up overcommit, processes can sleep as they are not interactive
vm.overcommit=2
vm.overcommit_ratio=75

Further Reading

  1. Understanding the Linux Kernel by Daniel P. Bovet and Marco Cesati (O'Reilly & Associates)

  2. Virtual Memory Behavior in Red Hat Enterprise Linux AS 2.1 by Bob Matthews and Norm Murray

  3. Towards an O(1) VM by Rik van Riel

  4. The Linux Kernel Source Tree, versions 2.4.21-4EL & 2.4.21-5EL

Wednesday, March 7, 2007

Cool Unix Commands

Cool Commands

Peter Baer Galvin

There are so many commands in Solaris that it is difficult to separate the cool ones from the mundane. For example, there are commands to report how much time a program spends in each system call, and commands to dynamically show system activities. Most of these commands are included with Solaris 8 as well as Solaris 9. This month, I'm highlighting some of the commands that you might find particularly useful.

Systems administrators are tool users. Through experience, we have learned that the more tools we have, the better able we are to diagnose problems and implement solutions. The commands included in this column are gleaned from experience, friends, acquaintances, and from attendance at the SunNetwork 2002 conference in September. “The /procodile Hunter” talk by Solaris kernel developers Brian Cantrill and Mike Shapiro was especially enlightening and frightening because Cantrill wrote code to illustrate a point faster than Shapiro could explain the point they were trying to illustrate!

Useful Solaris Commands

truss -c (Solaris >= 8): This astounding option to truss provides a profile summary of the command being trussed:

$ truss -c grep asdf work.doc
syscall              seconds   calls  errors
_exit                    .00       1
read                     .01      24
open                     .00       8      4
close                    .00       5
brk                      .00      15
stat                     .00       1
fstat                    .00       4
execve                   .00       1
mmap                     .00      10
munmap                   .01       3
memcntl                  .00       2
llseek                   .00       1
open64                   .00       1
                        ----     ---    ---
sys totals:              .02      76      4
usr time:                .00
elapsed:                 .05

It can also show profile data on a running process. In this case, the data shows what the process did between when truss was started and when truss execution was terminated with a control-c. It’s ideal for determining why a process is hung without having to wade through the pages of truss output.
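
For example, to profile what an already-running process is doing (the pid below is made up), attach to it and press control-c when you have seen enough:

$ truss -c -p 1234     # attach to pid 1234; control-c prints the summary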

truss -d and truss -D (Solaris >= 8): These truss options show the time associated with each system call shown by truss and are excellent for finding performance problems in custom or commercial code. For example:

$ truss -d who
Base time stamp:  1035385727.3460  [ Wed Oct 23 11:08:47 EDT 2002 ]
 0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64)  argc = 1
 0.0032 stat("/usr/bin/who", 0xFFBEFA98)                = 0
 0.0037 open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
 0.0042 open("/usr/local/lib/libc.so.1", O_RDONLY)      Err#2 ENOENT
 0.0047 open("/usr/lib/libc.so.1", O_RDONLY)            = 3
 0.0051 fstat(3, 0xFFBEF42C)                            = 0
 . . .

truss -D is even more useful, showing the time delta between system calls:

Dilbert> truss -D who
 0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64)  argc = 1
 0.0028 stat("/usr/bin/who", 0xFFBEFA98)                = 0
 0.0005 open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
 0.0006 open("/usr/local/lib/libc.so.1", O_RDONLY)      Err#2 ENOENT
 0.0005 open("/usr/lib/libc.so.1", O_RDONLY)            = 3
 0.0004 fstat(3, 0xFFBEF42C)                            = 0

In this example, the stat system call took a lot longer than the others.

truss -T: This is a great debugging aid. It stops a process at the execution of a specified system call. ("-U" does the same, but with user-level function calls.) A core image could then be taken for further analysis, or any of the /proc tools could be used to determine many aspects of the status of the process.
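
For instance, to stop a process the next time it calls open and then save a core image with gcore for later analysis (the pid is made up):

$ truss -T open -p 1234    # the process stops when it next calls open
$ gcore 1234               # writes a core image to core.1234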

truss -l (improved in Solaris 9): Shows the thread number of each call in a multi-threaded process. Solaris 9 truss -l finally makes it possible to watch the execution of a multi-threaded application.

Truss is truly a powerful tool. It can be used on core files to analyze what caused the problem, for example. It can also show details on user-level library calls (either system libraries or programmer libraries) via the “-u” option.

pkg-get: This is a nice tool (http://www.bolthole.com/solaris) for automatically getting freeware packages. It is configured via /etc/pkg-get.conf. Once it’s up and running, execute pkg-get -a to get a list of available packages, and pkg-get -i to get and install a given package.
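
Once configured, a typical session might look like this (the package name is just an illustration):

$ pkg-get -a           # list the packages available from the configured site
$ pkg-get -i wget      # download and install the wget package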

plimit (Solaris >= 8): This command displays and sets the per-process limits on a running process. This is handy if a long-running process is running up against a limit (for example, number of open files). Rather than using limit and restarting the command, plimit can modify the running process.
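
As a sketch, raising the open-file limit of a running process (the pid and limit values are made up):

$ plimit 1234                 # display the current limits of pid 1234
$ plimit -n 1024,2048 1234    # set its soft,hard limits on open files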

coreadm (Solaris >= 8): In the "old" days (before coreadm), core dumps were placed in the process's working directory, and successive core files would overwrite each other. All this and more has been addressed by coreadm, a tool to manage core file creation. With it, you can specify whether to save cores, where cores should be stored, how many versions should be retained, and more. Settings are retained between reboots because coreadm records them in /etc/coreadm.conf.
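
As a sketch, to save uniquely named core files from all processes in one central directory (the directory path is arbitrary):

# coreadm -e global -g /var/cores/core.%f.%p

Here %f expands to the executable name and %p to the process-id, so cores no longer clobber one another.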

pgrep (Solaris >= 8): pgrep searches through /proc for processes matching the given criteria, and returns their process-ids. A great option is “-n”, which returns the newest process that matches.
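
For example (the process name is just an illustration):

$ pgrep -n sshd        # pid of the most recently started sshd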

preap (Solaris >= 9): Reaps zombie processes. Any process stuck in the "z" state (as shown by ps) can be removed from the system with this command.
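
For example (the pid is made up):

$ ps -ef | grep defunct    # zombies show up as <defunct>
$ preap 1234               # reap the zombie with pid 1234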

pargs (Solaris >= 9): Shows the arguments and environment variables of a process.
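
For example (pid made up):

$ pargs 1234           # show the argument vector of pid 1234
$ pargs -e 1234        # show its environment variables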

nohup -p (Solaris >= 9): The nohup command can be used to start a process, so that if the shell that started the process closes (i.e., the process gets a “SIGHUP” signal), the process will keep running. This is useful for backgrounding a task that should continue running no matter what happens around it. But what happens if you start a process and later want to HUP-proof it? With Solaris 9, nohup -p takes a process-id and causes SIGHUP to be ignored.

prstat (Solaris >= 8): prstat is top and a lot more. Both commands provide a screen’s worth of process and other information and update it frequently, for a nice window on system performance. prstat has much better accuracy than top. It also has some nice options. “-a” shows process and user information concurrently (sorted by CPU hog, by default). “-c” causes it to act like vmstat (new reports printed below old ones). “-C” shows processes in a processor set. “-j” shows processes in a “project”. “-L” shows per-thread information as well as per-process. “-m” and “-v” show quite a bit of per-process performance detail (including pages, traps, lock wait, and CPU wait). The output data can also be sorted by resident-set (real memory) size, virtual memory size, execute time, and so on. prstat is very useful on systems without top, and should probably be used instead of top because of its accuracy (and some sites care that it is a supported program).
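
For example, to watch per-thread microstate detail for a single process (the pid is assumed), updating every 5 seconds:

$ prstat -mL -p 1234 5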

trapstat (Solaris >= 9): trapstat joins lockstat and kstat as the most inscrutable commands on Solaris. Each shows gory details about the innards of the running operating system. Each is indispensable in solving strange happenings on a Solaris system. Best of all, their output is good to send along with bug reports, but further study can reveal useful information for general use as well.

vmstat -p (Solaris >= 8): Until this option became available, it was almost impossible (see the “se toolkit”) to determine what kind of memory demand was causing a system to page. vmstat -p is key because it not only shows whether your system is under memory stress (via the “sr” column), it also shows whether that stress is from application code, application data, or I/O. “-p” can really help pinpoint the cause of any mysterious memory issues on Solaris.
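
For example, to print a paging report every 5 seconds and watch the "sr" and page-type columns:

$ vmstat -p 5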

pmap -x (Solaris >= 8, bugs fixed in Solaris >= 9): If the process with memory problems is known, and more details on its memory use are needed, check out pmap -x. The target process-id has its memory map fully explained, as in:

# pmap -x 1779
1779:   -ksh
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000     192     192       -       - r-x--  ksh
00040000       8       8       8       - rwx--  ksh
00042000      32      32       8       - rwx--    [ heap ]
FF180000     680     664       -       - r-x--  libc.so.1
FF23A000      24      24       -       - rwx--  libc.so.1
FF240000       8       8       -       - rwx--  libc.so.1
FF280000     568     472       -       - r-x--  libnsl.so.1
FF31E000      32      32       -       - rwx--  libnsl.so.1
FF326000      32      24       -       - rwx--  libnsl.so.1
FF340000      16      16       -       - r-x--  libc_psr.so.1
FF350000      16      16       -       - r-x--  libmp.so.2
FF364000       8       8       -       - rwx--  libmp.so.2
FF380000      40      40       -       - r-x--  libsocket.so.1
FF39A000       8       8       -       - rwx--  libsocket.so.1
FF3A0000       8       8       -       - r-x--  libdl.so.1
FF3B0000       8       8       8       - rwx--    [ anon ]
FF3C0000     152     152       -       - r-x--  ld.so.1
FF3F6000       8       8       8       - rwx--  ld.so.1
FFBFE000       8       8       8       - rw---    [ stack ]
-------- ------- ------- ------- -------
total Kb    1848    1728      40       -

Here we see each chunk of memory, what it is being used for, how much space it is taking (virtual and real), and mode information.

df -h (Solaris >= 9): This command is popular on Linux, and just made its way into Solaris. df -h displays summary information about file systems in human-readable form:

$ df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0t0d0s0      4.8G   1.7G   3.0G    37%    /
/proc                    0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
fd                       0K     0K     0K     0%    /dev/fd
swap                   848M    40K   848M     1%    /var/run
swap                   849M   1.0M   848M     1%    /tmp
/dev/dsk/c0t0d0s7       13G    78K    13G     1%    /export/home

Conclusion

Each administrator has a set of tools used daily, and another set of tools to help in a pinch. This column included a wide variety of commands and options that are lesser known, but can be very useful. Do you have favorite tools that have saved you in a bind? If so, please send them to me so I can expand my tool set as well. Alternately, send along any tools that you hate or that you feel are dangerous, which could also turn into a useful column!

Tuesday, March 6, 2007

HOWTO: Mirrored root disk on Solaris

http://www.brandonhutchinson.com/Mirroring_disks_with_DiskSuite.html


0. Partition the first disk

# format c0t0d0

Use the partition tool (type p <enter> to enter the partition menu, then p <enter> again to print the current table) to set up the slices. We assume the following slice setup afterwards:

#  Tag         Flag  Cylinders      Size      Blocks
-  ----------  ----  -------------  --------  --------------------
0  root        wm        0 -   812  400.15MB  (813/0/0)     819504
1  swap        wu      813 -  1333  256.43MB  (521/0/0)     525168
2  backup      wm        0 - 17659    8.49GB  (17660/0/0) 17801280
3  unassigned  wm     1334 -  1354   10.34MB  (21/0/0)       21168
4  var         wm     1355 -  8522    3.45GB  (7168/0/0)   7225344
5  usr         wm     8523 - 14764    3.00GB  (6242/0/0)   6291936
6  unassigned  wm    14765 - 16845    1.00GB  (2081/0/0)   2097648
7  home        wm    16846 - 17659  400.15MB  (813/0/0)     819504

1. Copy the partition table of the first disk to its future mirror disk

# prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2

2. Create at least two state database replicas on each disk

# metadb -a -f -c 2 c0t0d0s3 c0t1d0s3

Check the state of all replicas with metadb:

# metadb

Notes:

A state database replica contains configuration and state information about the metadevices. Make sure that at least 50% of the replicas are always active!


3. Create the root slice mirror and its first submirror

# metainit -f d10 1 1 c0t0d0s0
# metainit -f d20 1 1 c0t1d0s0
# metainit d30 -m d10

Run metaroot to prepare /etc/vfstab and /etc/system (do this only for the root slice!):

# metaroot d30
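
For reference, metaroot rewrites the root entry in /etc/vfstab to use d30 and adds a rootdev line to /etc/system similar to the following (the exact minor number tracks the metadevice name):

* Begin MDD root info (do not edit)
rootdev:/pseudo/md@0:0,30,blk
* End MDD root info (do not edit)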

4. Create the swap slice mirror and its first submirror

# metainit -f d11 1 1 c0t0d0s1
# metainit -f d21 1 1 c0t1d0s1
# metainit d31 -m d11

5. Create the var slice mirror and its first submirror

# metainit -f d14 1 1 c0t0d0s4
# metainit -f d24 1 1 c0t1d0s4
# metainit d34 -m d14

6. Create the usr slice mirror and its first submirror

# metainit -f d15 1 1 c0t0d0s5
# metainit -f d25 1 1 c0t1d0s5
# metainit d35 -m d15

7. Create the unassigned slice mirror and its first submirror

# metainit -f d16 1 1 c0t0d0s6
# metainit -f d26 1 1 c0t1d0s6
# metainit d36 -m d16

8. Create the home slice mirror and its first submirror

# metainit -f d17 1 1 c0t0d0s7
# metainit -f d27 1 1 c0t1d0s7
# metainit d37 -m d17

9. Edit /etc/vfstab to mount all mirrors after boot, including mirrored swap

/etc/vfstab before changes:

fd                  -                    /dev/fd   fd     -  no   -
/proc               -                    /proc     proc   -  no   -
/dev/dsk/c0t0d0s1   -                    -         swap   -  no   -
/dev/md/dsk/d30     /dev/md/rdsk/d30     /         ufs    1  no   logging
/dev/dsk/c0t0d0s5   /dev/rdsk/c0t0d0s5   /usr      ufs    1  no   ro,logging
/dev/dsk/c0t0d0s4   /dev/rdsk/c0t0d0s4   /var      ufs    1  no   nosuid,logging
/dev/dsk/c0t0d0s7   /dev/rdsk/c0t0d0s7   /home     ufs    2  yes  nosuid,logging
/dev/dsk/c0t0d0s6   /dev/rdsk/c0t0d0s6   /opt      ufs    2  yes  nosuid,logging
swap                -                    /tmp      tmpfs  -  yes  -

/etc/vfstab after changes:

fd                  -                    /dev/fd   fd     -  no   -
/proc               -                    /proc     proc   -  no   -
/dev/md/dsk/d31     -                    -         swap   -  no   -
/dev/md/dsk/d30     /dev/md/rdsk/d30     /         ufs    1  no   logging
/dev/md/dsk/d35     /dev/md/rdsk/d35     /usr      ufs    1  no   ro,logging
/dev/md/dsk/d34     /dev/md/rdsk/d34     /var      ufs    1  no   nosuid,logging
/dev/md/dsk/d37     /dev/md/rdsk/d37     /home     ufs    2  yes  nosuid,logging
/dev/md/dsk/d36     /dev/md/rdsk/d36     /opt      ufs    2  yes  nosuid,logging
swap                -                    /tmp      tmpfs  -  yes  -

Notes:

The entry for the root device (/) has already been altered by the metaroot command we executed before.


10. Reboot the system

# lockfs -fa && init 6

11. Attach the second submirrors to all mirrors

# metattach d30 d20
# metattach d31 d21
# metattach d34 d24
# metattach d35 d25
# metattach d36 d26
# metattach d37 d27

Notes:

This will finally cause the data from the boot disk to be synchronized with the mirror drive.

You can use metastat to track the mirroring progress.
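
For example, a quick way to check on the resync (the exact output wording can vary between releases):

# metastat | grep -i progress    # shows lines like "Resync in progress: 12 % done"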


12. Change the crash dump device to the swap metadevice

# dumpadm -d `swap -l | tail -1 | awk '{print $1}'`

13. Make the mirror disk bootable

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0

Notes:

This will install a boot block to the second disk.


14. Determine the physical device path of the mirror disk

# ls -l /dev/dsk/c0t1d0s0
... /dev/dsk/c0t1d0s0 -> ../../devices/pci@1f,4000/scsi@3/sd@1,0:a

15. Create a device alias for the mirror disk

# eeprom "nvramrc=devalias mirror /pci@1f,4000/scsi@3/disk@1,0"
# eeprom "use-nvramrc?=true"

Add the mirror device alias to the Open Boot parameter boot-device to prepare for the case of a problem with the primary boot device.

# eeprom "boot-device=disk mirror cdrom net"

You can also configure the device alias and boot-device list from the Open Boot Prompt (OBP a.k.a. ok prompt):

ok nvalias mirror /pci@1f,4000/scsi@3/disk@1,0
ok setenv use-nvramrc? true
ok setenv boot-device disk mirror cdrom net

Notes:

From the OBP, you can use boot mirror to boot from the mirror disk.

On my test system, I had to replace sd@1,0:a with disk@1,0. Use devalias at the OBP prompt to determine the correct device path.