Thursday, December 28, 2006

I/O Bottleneck - Troubleshooting

 

What does 100 percent busy mean?

Unix Insider 8/1/99

Q: Some of my disks get really slow when they are nearly 100 percent busy; however, when I see a striped volume or hardware RAID unit at high utilization levels, it still seems to respond quickly. Why is this? Do the old rules about high utilization still apply?

A: This occurs because more complex systems don't obey the same rules as simple systems when it comes to response time, throughput, and utilization. Even the simple systems aren't so simple. I'll begin our examination of this phenomenon by looking at a single disk, and then move on to combinations.

Part of this answer is based on my September 1997 Performance Q&A column. The information from the column was updated and included in my book as of April 1998, and has been further updated for inclusion in Sun BluePrints for Resource Management. Written by several members of our group at Sun, this book will be published this summer (see Resources for more information on both the book and the column). I've added much more explanation and several examples here.

Measurements on a single disk
In an old-style, single-disk model, the device driver maintains a queue of waiting requests that are serviced one at a time by the disk. The terms utilization, service time, wait time, throughput, and wait queue length have well-defined meanings in this scenario; and, for this sort of basic system, the setup is so simple that a very basic queuing model fits it well.

[Figure 1. The simple disk model]

Over time, disk technology has moved on. Nowadays, a standard disk is SCSI-based and has an embedded controller. The disk drive contains a small microprocessor and about 1 MB of RAM. It can typically handle up to 64 outstanding requests via SCSI tagged-command queuing. The system uses an SCSI host bus adaptor to talk to the disk. In large systems, there is yet another level of intelligence and buffering in a hardware RAID controller. However, the iostat utility is still built around the simple disk model above, and its use of terminology still assumes a single disk that can only handle a single request at a time. In addition, iostat uses the same reporting mechanism for client-side NFS mount points and complex disk volumes set up using Solstice DiskSuite or Veritas Volume Manager.

In the old days, if the device driver sent a request to the disk, the disk would do nothing else until it completed the request. The time this process took was the service time, and the average service time was a physical property of the disk itself. Disks that spun and sought faster had lower (and thus better) service times. With today's systems, if the device driver issues a request, that request is queued internally by the RAID controller and the disk drive, and several more requests can be sent before a response to the first comes back. The service time, as measured by the device driver, varies according to the load level and queue length, and is not directly comparable to the old-style service time of a simple disk drive. The response time is defined as the total waiting time in the queue plus the service time. Unfortunately, as I've mentioned before, iostat reports response time but labels it svc_t. We'll see later how to calculate the actual service time for a disk.

As soon as a device has one request in its internal queue, it becomes busy, and the proportion of the time that it is busy is the utilization. If there is always a request waiting, then the device is 100 percent busy. Because a single disk can only complete one I/O request at a time, it saturates at 100 percent busy. If the device has a large number of requests, and it is intelligent enough to reorder them, it may reduce the average service time and increase the throughput as more load is applied, even though it is already at 100 percent utilization.

The diagram below shows how a busy disk can operate more efficiently than a lightly loaded disk. In practice, the main difference you would see would be a lower service time for the busy disk, albeit with a higher average response time. This is because all the requests are present in the queue at the start, so the response time for the last request includes the time spent waiting for every other request to complete. In the lightly loaded case, each request is serviced as it is made, so there is no waiting, and response time is the same as the service time. If you hear your disk rattling on a desktop system when you start an application, it's because the head is seeking back and forth, as shown in the first case. Unfortunately, starting an application tends to generate a single thread of page-in disk reads. Each such read is not issued until the previous one is completed, so you end up with a fairly busy disk with only one request in the queue -- and it can't be optimized. If the disk is on a busy server instead, there are numerous accesses coming in parallel from different transactions and different users, so you will get a full queue and more efficient disk usage overall.

[Figure 2. Disk head movements for a request sequence]

Solaris disk instrumentation
The instrumentation provided in the Solaris operating environment takes account of this change by taking a request's waiting period and breaking it up into two separately measured queues. One queue, called the wait queue, is in the device driver; the other, called the active queue, is in the device itself. A read or write command is issued to the device driver and sits in the wait queue until the SCSI bus and disk are both ready. When the command is sent to the disk device, it moves to the active queue until the disk sends its response. The problem with iostat is that it tries to report the new measurements using some of the original terminology. The wait service time is actually the time spent in the wait queue. This isn't the correct definition of service time, in any case, and the word wait is being used to mean two different things.

[Figure 3. Two-stage disk model used by Solaris 2]

Utilization (U) is defined as the busy time (B) as a percentage of the total time (T) as shown below:

  U = B / T   (expressed as a percentage)

Now, we get to something called service time (S), but this is not what iostat prints out and calls svc_t. This is the real thing! It can be calculated as the busy time (B) divided by the number of accesses that completed, or alternatively as the utilization (U) divided by the throughput (X):

  Srun = B / C = U / X   (where C is the number of completed accesses)

Srun is as close as you can get to the old-style disk service time; remember, however, that modern disks can queue more than one command at a time and can return them in a different order than the sequence in which they were issued, so it isn't an exact equivalent. To calculate Srun from iostat output, you need to divide the utilization by the total number of reads and writes, as we see here.

  % iostat -xn ...
                      extended device statistics
    r/s  w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   21.9 63.5 1159.1 2662.9  0.0  2.7    0.0   31.8   0  93 c3t15d0

In this case U = 93% = 0.93, and throughput X = r/s + w/s = 21.9 + 63.5 = 85.4; so, service time S = U/X = 0.011 = 11 milliseconds (ms), while the reported response time R = 31.8 ms. The queue length is reported as 2.7, so this makes sense, as each request has to wait in the queue for several other requests to be serviced.

The SE Toolkit includes a modified version of iostat, written in SE, that prints out both the response time and the service time data, using the format shown below.

  % se siostat.se 10
  03:42:50  ------throughput------ -----wait queue----- ----active queue----
  disk      r/s  w/s   Kr/s   Kw/s qlen res_t svc_t  %ut qlen res_t svc_t  %ut
  c0t2d0s0  0.0  0.2    0.0    1.2 0.00  0.02  0.02    0 0.00 22.87 22.87    0
  03:43:00  ------throughput------ -----wait queue----- ----active queue----
  disk      r/s  w/s   Kr/s   Kw/s qlen res_t svc_t  %ut qlen res_t svc_t  %ut
  c0t2d0s0  0.0  3.2    0.0   23.1 0.00  0.01  0.01    0 0.72 225.45 16.20    5

We can get the number that iostat calls service time. It's defined as the queue length (Q, shown by iostat with the headings wait and actv) divided by the throughput; but it's actually the residence or response time and includes all queuing effects:

  R = Q / X

Taking the values from our iostat example, R = Q / X = 2.7 / 85.4 = 0.0316 = 31.6 ms, which is close enough to what iostat reports. The difference between 31.6 and 31.8 is due to rounding errors in the reported values of 2.7 and 85.4. Using full precision, the result is identical to what iostat calculates as the response time.
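If you want to let the machine do this arithmetic for you, a short awk filter over iostat -xn output will print both numbers for each device. This is only a sketch: it assumes the column order shown above (r/s, w/s, kr/s, kw/s, wait, actv, wsvc_t, asvc_t, %w, %b, device) and Solaris-style cXtYdZ device names, so adjust the pattern if your disks are named differently.

  iostat -xn 10 | awk '
      $NF ~ /^c[0-9]/ {            # data lines end in the device name (c3t15d0 etc.)
          x = $1 + $2              # throughput X = r/s + w/s
          u = $10 / 100            # utilization U from the %b column
          q = $5 + $6              # queue length Q = wait + actv
          if (x > 0)
              printf "%-10s S = %5.1f ms  R = %5.1f ms\n", $NF, 1000*u/x, 1000*q/x
      }'

Run against the sample above, it prints roughly S = 11 ms and R = 32 ms for c3t15d0, matching the hand calculation.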

Another way to express response time is in terms of service time and utilization. This method uses a theoretical model of response time that assumes that, as you approach 100 percent utilization with a constant service time, the response time increases to infinity:

  R = S / (1 - U)

Taking our example again, R = S/(1-U) = 0.011 / (1-0.93) = 0.157 = 157 ms. This is a lot more than the measured response time of 31.8 ms, so the disk is operating better than the simple model predicts at high utilizations. There are several reasons for this: the disk is much more complex than the model; it is actively trying to optimize itself, so the service time isn't constant; and the incoming data isn't as random as the model's assumptions would have it. However, the model does provide the right characteristics, and can be used as a simple way to do a worst-case analysis.

Complex resource utilization characteristics
One important characteristic of complex I/O subsystems is that the utilization measure can be confusing. When a simple system reaches 100 percent busy, it has also reached its maximum throughput. This is because only one thing is being processed at a time in the I/O device. When the device being monitored is an NFS server, a hardware RAID disk subsystem, or a striped volume, the situation is clearly much more complex. All of these can process many requests in parallel.

[Figure 4. Complex I/O device queue model]

As long as a single I/O is being serviced at all times, the utilization is reported as 100 percent, which makes sense because it means that the pool of devices is always busy doing something. However, there is enough capacity for additional I/Os to be serviced in parallel. Compared to a simple device, the service time for each I/O is the same, but the queue is being drained more quickly; thus, the average queue length and response time are less, and the peak throughput is greater. In effect, the load on each disk is divided by the number of disks; therefore, the true utilization of the striped disk volume is actually above 100 percent. You can see how this arises from the alternative definition of utilization as the throughput multiplied by the service time.

With only one request being serviced at a time, the busy time is the time it takes to service one request multiplied by the number of requests. If several requests can be serviced at once, the calculated utilization goes above 100 percent, because more than one thing can be done at a time! A four-way stripe, with each individual disk 100 percent busy, will have the same service time as one disk, but four times the throughput, and thus should really report up to 400 percent utilization.
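To put rough numbers on that, take the single disk measured earlier: 85.4 I/Os per second at about 11 ms of service time gives U = X x S = 0.93, the 93 percent busy that iostat reported. A four-way stripe carrying four times that load would run at X = 4 x 85.4 = 341.6 I/Os per second with the same service time per I/O, so by the same definition U = 341.6 x 0.0109 = 3.7, or roughly 370 percent, even though iostat would still cap its report at 100 percent busy.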

The approximated model for response time in this case changes so that response time stays lower for a longer period of time; but it still heads for infinity when the underlying devices each reach 100 percent utilization.

  R = S / (1 - U^N)   (where N is the number of underlying devices and U is the per-device utilization)

Wrap up
So the real answer to our initial question is that the model of disk behavior and performance that is embodied by the iostat report is too simple to cope with the reality of a complex underlying disk subsystem. We stay with the old report to be consistent and to offer users familiar data, but in reality, a much more sophisticated approach is required. I'm working (slowly) on figuring out how to monitor and report on complex devices like this.

Resources

Linux Memory Management


 

 


Overview of memory management

Traditional Unix tools like 'top' often report a surprisingly small amount of free memory after a system has been running for a while. For instance, after about 3 hours of uptime, the machine I'm writing this on reports under 60 MB of free memory, even though I have 512 MB of RAM on the system. Where does it all go?

The biggest place it's being used is in the disk cache, which is currently over 290 MB. This is reported by top as "cached". Cached memory is essentially free, in that it can be replaced quickly if a running (or newly starting) program needs the memory.

The reason Linux uses so much memory for disk cache is because the RAM is wasted if it isn't used. Keeping the cache means that if something needs the same data again, there's a good chance it will still be in the cache in memory. Fetching the information from there is around 1,000 times quicker than getting it from the hard disk. If it's not found in the cache, the hard disk needs to be read anyway, but in that case nothing has been lost in time.

To see a better estimation of how much memory is really free for applications to use, run the command free -m:

Code: free -m
             total       used       free     shared    buffers     cached
Mem:           503        451         52          0         14        293
-/+ buffers/cache:        143        360
Swap:         1027          0       1027

The -/+ buffers/cache line shows how much memory is used and free from the perspective of the applications. Generally speaking, if little swap is being used, memory usage isn't impacting performance at all.

Notice that I have 512 MB of memory in my machine, but free lists only 503 MB as the total. The missing memory is mainly the kernel itself, which can't be swapped out, so the memory it occupies can never be freed. There may also be regions of memory reserved for/by the hardware for other purposes, depending on the system architecture. The memory that is actually free for applications is the 360 MB shown in the free column of the -/+ buffers/cache line.
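If you just want that application-visible number on its own, a one-liner like this pulls it out of free; it is only a sketch and assumes the procps output layout shown above, which differs slightly between versions:

  free -m | awk '/buffers\/cache/ { print $NF " MB free for applications" }'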

The mysterious 880 MB limit on x86

By default, the Linux kernel runs in and manages only low memory. This makes managing the page tables slightly easier, which in turn makes memory accesses slightly faster. The downside is that it can't use all of the memory once the amount of total RAM reaches the neighborhood of 880 MB. This has historically not been a problem, especially for desktop machines.

To be able to use all the RAM on a 1 GB machine or larger, the kernel needs to be recompiled. Go into 'make menuconfig' (or whichever config is preferred) and set the following option:

Linux Kernel Configuration: Large amounts of memory
Processor Type and Features --->
    High Memory Support --->
        (*) 4GB

This applies both to 2.4 and 2.6 kernels. Turning on high memory support theoretically slows down accesses slightly, but according to Joseph_sys and log, there is no practical difference.

Also, the ck-sources kernel has a patch for 1gb high memory support.
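Once the new kernel is running, two quick checks will confirm that high memory support is actually in effect. This is just a sketch: it assumes /usr/src/linux points at the configured source tree, and the HighTotal/HighFree lines only appear on 32-bit kernels.

  grep CONFIG_HIGHMEM /usr/src/linux/.config
  grep -i '^high' /proc/meminfo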

The difference among VIRT, RES, and SHR in top output

VIRT stands for the virtual size of a process, which is the sum of memory it is actually using, memory it has mapped into itself (for instance the video card's RAM for the X server), files on disk that have been mapped into it (most notably shared libraries), and memory shared with other processes. VIRT represents how much memory the program is able to access at the present moment.

RES stands for the resident size, which is an accurate representation of how much actual physical memory a process is consuming. (This also corresponds directly to the %MEM column.) This will virtually always be less than the VIRT size, since most programs depend on the C library.

SHR indicates how much of the VIRT size is actually sharable (memory or libraries). In the case of libraries, it does not necessarily mean that the entire library is resident. For example, if a program only uses a few functions in a library, the whole library is mapped and will be counted in VIRT and SHR, but only the parts of the library file containing the functions being used will actually be loaded in and be counted under RES.
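You can see the first two of these outside of top as well: procps ps will print the virtual and resident sizes (in KB) of a process, which is handy for comparing against the VIRT and RES columns; SHR itself is only reported by top. A sketch, using X as an example process name:

  ps -o pid,vsz,rss,comm -C X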

The difference between buffers and cache

Buffers are allocated by various processes to use as input queues, etc. Most of the time, buffers are some process's output, and they are file buffers. A simplistic explanation is that buffers allow processes to temporarily store input in memory until the process can deal with it.

Cache typically holds frequently requested disk data. If multiple processes are accessing the same files, much of those files will be cached to improve performance (RAM being so much faster than hard drives); this is the disk cache.
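Both figures come straight from the kernel's accounting in /proc/meminfo, which is where free and top read them, so you can watch them directly:

  grep -E '^(Buffers|Cached):' /proc/meminfo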

Swappiness (2.6 kernels)

Since 2.6, there has been a way to tune how much Linux favors swapping out to disk compared to shrinking the caches when memory gets full.

When an application needs memory and all the RAM is fully occupied, the kernel has two ways to free some memory at its disposal: it can either reduce the disk cache in the RAM by eliminating the oldest data or it may swap some less used portions (pages) of programs out to the swap partition on disk. It is not easy to predict which method would be more efficient. The kernel makes a choice by roughly guessing the effectiveness of the two methods at a given instant, based on the recent history of activity.

Before the 2.6 kernels, the user had no means of influencing these calculations, and situations could arise in which the kernel often made the wrong choice, leading to thrashing and slow performance. The addition of swappiness in 2.6 changes this. Thanks, ghoti!

Swappiness takes a value between 0 and 100 to change the balance between swapping applications and freeing cache. At 100, the kernel will always prefer to find inactive pages and swap them out; in other cases, whether a swapout occurs depends on how much application memory is in use and how poorly the cache is doing at finding and releasing inactive items.

The default swappiness is 60. A value of 0 gives something close to the old behavior where applications that wanted memory could shrink the cache to a tiny fraction of RAM. For laptops which would prefer to let their disk spin down, a value of 20 or less is recommended.
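You can read the current value at any time before deciding whether to change it:

  cat /proc/sys/vm/swappiness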

As a sysctl, the swappiness can be set at runtime with either of the following commands:

sysctl -w vm.swappiness=30

echo 30 > /proc/sys/vm/swappiness

The default when Gentoo boots can also be set in /etc/sysctl.conf:

File: /etc/sysctl.conf
# Control how much the kernel should favor swapping out applications (0-100)
vm.swappiness = 30

Some patchsets (e.g. Con Kolivas' ck-sources patchset) allow the kernel to auto-tune the swappiness level as it sees fit; they may not keep a user-set value.

Autoregulation

gentoo-sources (and probably other gentoo 2.6 kernels) prior to 2.6.7-gentoo contains the Con Kolivas autoregulated swappiness patch. This means that the kernel automatically adjusts the /proc/sys/vm/swappiness value as needed during runtime, so any changes you make will be clobbered next time it updates. A good explanation of this patch and how it works is on KernelTrap.

I repeat: With gentoo-sources (prior to 2.6.7-gentoo) it is neither necessary nor possible to permanently adjust the swappiness value. It's taken care of automatically, no need to worry.

gentoo-sources no longer contains this patch as of 2.6.7-gentoo. The maintainer of gentoo-sources, Greg, pulled the autoregulation patch from the ebuild. http://bugs.gentoo.org/show_bug.cgi?id=54560

INFO ON Promiscuous mode


Promiscuous mode is usually initiated by a network sniffer of some sort, like Ethereal or dsniff. You may want to check your running processes and verify that you're not running something like that, or a trojaned version of something normal. If I saw that in my logs, I would be concerned. You can check for promiscuous mode by running /sbin/ifconfig -a

normal:
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:7153852 errors:0 dropped:0 overruns:0 frame:0
  TX packets:6107958 errors:0 dropped:0 overruns:0 carrier:14

Promiscuous:
  UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
  RX packets:7153858 errors:0 dropped:0 overruns:0 frame:0
  TX packets:6107962 errors:0 dropped:0 overruns:0 carrier:14

In this case the interfaces are in their normal state and can be ignored:
 
eth0      Link encap:Ethernet  HWaddr 00:13:21:07:5A:2B
          inet addr:154.1.33.140  Bcast:154.1.33.255  Mask:255.255.255.128
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15985086 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21467022 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7258750788 (6922.4 Mb)  TX bytes:15187993324 (14484.3 Mb)
          Interrupt:25
 
eth1      Link encap:Ethernet  HWaddr 00:13:21:07:5A:2A
          inet addr:172.18.39.17  Bcast:172.18.39.127  Mask:255.255.255.128
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:39861538 errors:0 dropped:0 overruns:0 frame:0
          TX packets:78187478 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2552135645 (2433.9 Mb)  TX bytes:118484247360 (112995.3 Mb)
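Rather than reading the full output for every interface, you can grep for the flag directly. The second command is a reasonable extra check on most Linux systems, since the kernel logs a message when an interface enters promiscuous mode; both lines are just a sketch based on the output format shown above.

/sbin/ifconfig -a | grep -i promisc
dmesg | grep -i promisc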

Veritas volume manager documentation

 


Installation
Installing Veritas volume manager
Managing the rootdg diskgroup
Mirroring the operating system
 
Third-party reference websites
VxVM/VxFS mailing list
www.cuddletech.com/veritas
SSA helpful hints
Maintenance
Replacing a failed boot disk
Performing maintenance while booted from cdrom.
Removing the OS from VxVM control
Creating a bootable disk from the OS mirror
Volume manager reference
Installation guide [3.5] [4.0]
Administrator's guide [3.5] [4.0] [4.1]
Hardware notes [3.5] [4.0] [4.1]
Troubleshooting guide [3.5] [4.0] [4.1]
User's guide -- Veritas enterprise administrator [3.5] [4.0]
Release notes [3.5] [4.0] [4.1]
Veritas FlashSnap point-in-time copy solutions -- Administrators guide [1.1] [4.1]
 
Filesystem reference
Installation guide [3.5] [4.0]
Administrator's guide [3.5] [4.0] [4.1]
 
 

Monday, December 18, 2006

Changing TimeZone to IST


Solution:

1) Log in as root

2) # cd /usr/share/lib/zoneinfo

Create a file called asia.ind

3) # vi asia.ind

Rule    IST     min     max     -       Apr     1       0:00    0:00    D

Zone    Asia/IST        5:30    IST     IST

4) # zic asia.ind

5) #TZ=Asia/IST

6) #export TZ

To make it permanent, put the TZ entry in the user's profile and also modify the /etc/TIMEZONE file accordingly.

Output:

# date

Sat Dec 9 16:07:49 IST 2000

To see the GMT timing

# date -u

Sat Dec 9 10:38:06 GMT 2000
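If your release ships zdump alongside zic (most do), it gives a quick cross-check that the compiled zone carries the expected GMT offset; this is just a convenience check using the Asia/IST name created above:

# zdump Asia/IST

# zdump -v Asia/IST | tail -4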

SunService Tip Sheet: Network Security


INFODOC ID: 13335  SYNOPSIS: NETWORK SECURITY PSD/FAQ DETAIL DESCRIPTION:  Product Support Document (PSD) for Network Security  Revision 1.1 Date: April 18, 1996  1.0: About Network Security   1.1: Network Security Definitions 2.0: Network Security Debugging 3.0: Common How Tos   3.1: How to Prevent Remote Root Logins Under SunOS   3.2: How to Prevent Remote Root Logins Under Solaris   3.3: How to Turn Off Specific Network Services   3.4: How to Insure the Security of NFS Partitions   3.5: How to Insure the Security of NIS Maps   3.6: How to Insure the Security of NIS+ Maps   3.7: How to Keep Up to Date with the Latest Security Problems   3.8: How to Take Additional Steps to Secure Your Site 4.0: Frequently Asked Questions 5.0: Network Security Patches   5.1: Miscellaneous Networking Patches   5.2: DNS Patches   5.3: FTP Patches   5.4: Inetd Patches   5.5: NFS Patches   5.6: NIS Patches   5.7: nscd Patches   5.8: Sendmail Patches 6.0: Known Bugs & RFEs 7.0: References 8.0: Supportability 9.0: Additional Support  1.0 About Network Security   1.0: About Network Security  ===========================  As the internet gets continually bigger, the question of network security becomes an ever larger one. What follows are some tips and guidelines that you can use to get yourself started with network security. If network security is an extremely critical issue at your site, consider working with the Consulting services described in Section 9.0, because this document can really only touch the surface of a very important subject.  This PSD tries to impart the following information: first, what can be done on a brand-new Sun to setup basic network security  second, what public-domain or SunSoft programs can be used to improve security even further. It does not discuss security issues unrelated to the network (e.g. setuid programs, file permissions, restricted shells, etc), but you should consider these matters when you are working to secure your system.   1.1: Network Security Definitions  ---------------------------------  A lot of varied ground is covered in this PSD.  The following terms are important to some parts of it:  FIREWALL: A machine positioned in between your internal network and external networks (usually the internet). The most strict firewalls prevent any packets from being transmitted from the internal to external networks, depending on PROXY SERVERS for needed functionality. Less strict firewalls only prevent certain types of insecure packets (i.e., X11 packets) from being passed.  PROXY SERVER: A daemon run on a firewall, which acts as a sort of go-between, accepting packets from the internal network and connecting them up to services on the external network, as appropriate (or vice-versa). Proxy servers are useful because they allow connections to go beyond a firewall, but still hide all information on the internal network from external users.  FILTERING ROUTER: Really, a type of firewall. Filtering routers can be programmed to block certain types of packets (X11, sendmail, telnet, etc).   2.0: Network Security Debugging  ===============================  The best way to "debug" network security is to read this Tip Sheet and take a look at the programs noted in section 3.8. In particular: security scanners can point out security holes in your current setup, while firewalls and TCP/IP wrappers can be set up to provide a high level of logging, giving lots of information about security-related network activity.  
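Before working through the how-tos in section 3.0, it is often worth getting a quick inventory of what the machine is actually listening for on the network. The following is only a sketch (the -f inet syntax is the SunOS/Solaris form of netstat, and only TCP services show the LISTEN state):

  # netstat -a -f inet | grep LISTEN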
3.0 Common How Tos   3.1: How to Prevent Remote Root Logins Under SunOS  ---------------------------------------------------  Remote root login permissions are controlled by the /etc/ttytab file under SunOS. To change remote root login permissions, you must modify every single 'network' line in the /etc/ttytab files.  Root access over the network is denied if all of the network ttys are labelled unsecure:    ttyp0   none                            network         off unsecure  After making changes to the ttytab, you must HUP process 1:    # kill -HUP 1  Alternatively, you can reboot the machine.   3.2: How to Prevent Remote Root Logins Under Solaris  ----------------------------------------------------  In the file /etc/default/login, there is a CONSOLE line.  If there is no comment in front of the CONSOLE line, root can only login from the console:    CONSOLE=/dev/console  Changes to this file will take effect at once.   3.3: How to Turn Off Specific Network Services  ----------------------------------------------  Network programs can be started from a variety of places. You must know how they are started in order to turn them off.  The majority of network services that you will wish to disable are enabled in the /etc/inetd.conf file. To disable one of these services, simply comment out the appropriate line. For example, to disallow logins you would want to disable the following three services:    #telnet  stream  tcp     nowait  root    /usr/sbin/in.telnetd    in.telnetd   #shell   stream  tcp     nowait  root    /usr/sbin/in.rshd       in.rshd   #login   stream  tcp     nowait  root    /usr/sbin/in.rlogind    in.rlogind  ftpd, tftpd, fingerd and many other internet services can all be disabled in a similar manner. Afterwards, you must restart the inetd:    # kill -HUP inetd-pid  Most other network services (sendmail, rpc.nisd, etc) are initiated from the rc files. If a network service does not appear in the inetd, you should search through your rc files, find where it is started, and then comment out the daemon so that it does not start on bootup. You'll of course have to kill the currently running daemon to disable the service immediately.   3.4: How to Insure the Security of NFS Partitions  -------------------------------------------------  When you are exporting NFS partitions, if you don't have a firewall or filtering router that prevents NFS packets from leaving your domain, you should make sure to restrict access rights to that partition. This is done by restricting the rw or ro option to a specific machine, a list of machines or a netgroup. Examples of using the rw and ro options in this way follow:    rw=machine1   rw=machine1,machine2,machine3   rw=netgroup   ro=machine1   ro=machine1,machine2,machine3   ro=netgroup  These options should be incorporated into your export file in a way appropriate to your network. For example, under, SunOS, you might use the following:    %% cat /etc/exportfs   /export       -ro=machine1  Under Solaris, you might use the following:    %% cat /etc/dfs/dfstab   share -F nfs -o ro=netgroup /export  Note that it is most convenient to set up a netgroup that contains a listing of all of your machines and then export all of your NFS partitions to that netgroup. SunService has a NIS PSD that gives good information on how to set up a netgroup.   3.5: How to Insure the Security of NIS Maps  -------------------------------------------  By default, NIS is not particularly secure. 
Anyone can grab a copy of NIS maps by simply figuring out the name of your NIS domain. Thus, the first step in securing NIS to use a nonintuitive name for your defaultdomain. Something that is not a derivation of your domain name or machine name is best.  Newer versions of NIS allow you to further secure things via the securenets file. If you are using SunOS 4.1.3_U1 or lower or NSkit 1.0 or lower, you need to apply the appropriate patch before the securenets file can be used. SunOS 4.1.4 and newer versions of NSkit already have this available by default.  The contents of the /var/yp/securenets file should be a number of lines that each read:    netmask address  For example, if you only wanted the machines 150.101.16.28 and 150.101.16.29 to be able to retrieve your NIS maps, you could enter the two lines:    255.255.255.255 150.101.16.28   255.255.255.255 150.101.16.29  If you wanted everyone on the network 150.101.16.0 to be able to retrieve your maps, you could enter the line:    255.255.255.0 150.101.16.0   3.6: How to Insure the Security of NIS+ Maps  --------------------------------------------  NIS+ provides much better security than NIS and is highly suggested if you are worried about the security of your network. You can control who can access your maps with the NIS+ access rights. Type nisdefaults to examine the default values for your NIS+ domain:    %% nisdefaults -r   ----rmcdr---r---  Type niscat -o to determine the rights on an individual table:    %% niscat -o passwd   ...   Access Rights : ----rmcdr---r---  Remember that these access rights are laid out in the format: nobody, owner, group, world, and that each of these four user groups has four access rights: read, modify, create, destroy. In the above examples, owner has all rights, while group and world have only read rights. Nobody has no rights.  The above setup is secure under NIS+, since only people who are authenticated into your domain are able to look at your tables. You should only worry if you have extended rights to the nobody group. This might be required if you need to extend rights to NIS clients or to unauthenticated clients, but you should be aware that it reduces your security.  If you need to change your access rights, you should use the nischmod command. NIS+ is very powerful and you can give rights to entire objects or individual table entries. Consult the nischmod for information on how to do this.   3.7: How to Keep Up to Date with the Latest Security Problems  -------------------------------------------------------------  Though this document explains how to make your network services more secure, there are constantly new issues cropping up. If you want to make sure that your network remains secure in the future, you need to keep up with these new problems. Fortunately, there is an excellent third-party mailings list on the net that tells of network security problems as they become known.  The CERT (Computer Emergency Response Team) Coordination Center publishes CERT advisories that tell of the newest network security problems for all OSes. You can subscribe to it by mailing:          cert-advisory-request@cert.org   3.8: How to Take Additional Steps to Secure Your Site  -----------------------------------------------------  Sections 3.1 through 3.6 of this document describe how to make your system as secure as possible, using the tools that came with it. Following those basic guidelines should provide more than enough security for most sites.  
However, there are some cases where you might want to implement additional levels of security, even preventing certain services from arriving at your site, in the name of making your network security even more ironclad. Addressed here are certain public-domain or Sun products that are especially helpful. There are also many third-party security programs available from other vendors.  Firewalls ---------  Firewalls are the ultimate in security, because they can totally isolate you from most of the network. Sun sells a Firewall product named Firewall-1. Consult with your local Sun Sales office for info on how to purchase Firewall-1.  A separate PSD exists on the FW-1 Product.  Security Scanners -----------------  Certain Security Scanners have been written that check for a number of common security problems. In general, if you've secured everything as noted in 3.1-3.6 and applied the patches described in section 5.0, you won't have any problems. However, you mightcan still wish to try these out.  Cops checks for all kinds of common problems (including many non-network related ones) on a single machine:    ftp://ftp.cert.org/pub/tools/cops  ISS (the Internet Security Scanner) can probe for common security problems on an entire network of machines:    ftp://ftp.cert.org/pub/tools/iss  TCP/IP Wrappers ---------------  These public domain wrappers can be used to log the use of certain TCP/IP programs (i.e., telnetd) and also to prevent access from certain sites or networks. They are available from:    ftp://ftp.cert.org/pub/tools/tcp_wrappers  Other Programs --------------  A number of other public domain network security programs are available from:    ftp://ftp.cert.org/pub/tools  You can also be interested in joining the CERT Tools mailing list, that announces the release of new security tools. You can request membership on the CERT Tools mailing list by dropping a line to:    cert-tools-request@cert.org   4.0: Frequently Asked Questions  ===============================  Q: Why should I worry about security?  A: There are crackers out on the net and in time just about every single site gets hit by them. Fortunately, if your site has a minimum level of security (i.e., no gaping security holes), the crackers will move on. Although the advice in this document will not necessarily make your site impregnable, it will provide sufficient security to keep 99.9%% of potential attackers out.  Q1: What issues should I consider when setting up security at my site? Q2: How for should I go with implementing security policies on my network?  A: When considering security at your site, you need to balance level of security with ease of use. Although a lot of basic security can be accomplished with no inconvenience to your users, when you start working with firewalls, TCP wrappers and filtering routers, you will be taking some functionality away from your users. Based on the visibility of your site and the confidentiality of its data, you must determine how much is enough and how much is too much. This varies from site to site, but a Security Consultant might be able to give you a little more direction in making this decision.  Q: What services might I want to disable?  TFTP should absolutely be disabled if you don't have diskless clients taking advantage of it.  Many sites will also disable finger, so that external users can't figure out the user names of your internal users.  Everything else pretty much depends on the needs of your site. Do people need to login from outside your network? FTP? 
If services are not being used, disabling them can prevent later unauthorized use.  5.0 Network Security Patches   5.0: Network Security Patches  =============================  The followings patches specifically fix some manner of security problem in the listed network program. In no way is this a complete list of network patches, but simply a list of those that can have impact upon site security.   5.1: Miscellaneous Networking Patches  -------------------------------------  100567-04 SunOS 4.1,4.1.1, 4.1.2, 4.1.3: mfree and icmp redirect security 101587-01 SunOS 4.1.3_U1: security patch for mfree and icmp redirect    Makes a machine resistant to ICMP spoofing.   5.2: DNS Patches  ----------------  102167-03 SunOS 5.3: dns fix 102165-03 SunOS 5.4: nss_dns.so.1 fixes    Make DNS more resistant to spoofing. These do not upgrade in.named   to Bind 4.9.3 and thus some security holes remain. These should be   corrected in the near future.   5.3: FTP Patches  ----------------  101640-03 SunOS 4.1.3: in.ftpd logs password info when -d option is used.    Fixes an error that caused in.ftpd to log passwords when run with   the -d option.   5.4: Inetd Patches  ------------------  101786-02 SunOS 5.3: inetd fixes 102922-03 SunOS 5.4: inetd fixes 102923-03 SunOS 5.4_x86: inetd fixes    Mostly related to performance problems, but also fixes a minor   security hole.   5.5: NFS Patches  ----------------  102177-04 SunOS 4.1.3_U1: NFS Jumbo Patch 102394-02 SunOS 4.1.4: NFS Jumbo Patch    Fix various NFS security holes.   5.6: NIS Patches  ----------------  100482-07 SunOS 4.1 4.1.1 4.1.2 4.1.3: ypserv and ypxfrd Jumbo Patch 101435-02 SunOS 4.1.3_U1: ypserv and ypxfrd fix 101363-09 NSkit 1.0: Jumbo Patch    These patches are required to allow the usage of the securenets file   in these older releases of NIS.  102034-01 SunOS 5.3: portmapper security hole    Fixes for NIS-related portmapper security holes.  102707-02 SunOS 5.3: jumbo patch for NIS commands 102704-02 SunOS 5.4: jumbo patch for NIS commands 102705-02 SunOS 5.4_x86: jumbo patch for NIS commands    These patches fix ypxfr related NIS problems that became apparent   with NSKit 1.2.  103053-01 SunOS 5.4, 5.3: Jumbo patch for NSKIT v1.2    A patch specifically for NSKit 1.2, fixing a few more security   problems.   5.7: nscd Patches  -----------------  103279-02 SunOS 5.5: nscd breaks password shadowing with NIS+    Keeps nscd from violating NIS+ shadow passwd security.   5.8: Sendmail Patches  ---------------------  100377-22 SunOS 4.1.3: sendmail jumbo patch 101665-07 SunOS 4.1.3_U1: sendmail jumbo patch 102423-04 Sunos 4.1.4: Sendmail jumbo patch    Fix a variety of older sendmail problems. These patches do not bring   the SunOS sendmail up to version 8.6.10+ and so certain new   sendmail problems are not yet addressed. They will be in the near   future.  101739-12 SunOS 5.3: sendmail jumbo patch - security 102066-11 SunOS 5.4: sendmail jumbo patch - security 102064-10 SunOS 5.4_x86: sendmail jumbo patch - security    These patches bring Solaris sendmail up to version 8.6.10+, fixing   all known sendmail security holes.  6.0 Known Bugs And  RFEs   6.1: Bugs  ---------  1238679   DNS spoofing is possible per Cern ca-96.02    This documents the existing DNS security hole noted above, which is   present in releases of BIND prior to 4.9.3. An upgrade of the SunOS   and Solaris BINDs to 4.9.3 is currently in process and should be   accomplished within the next few months.  
1237810   4.1.x Unix sendmail vulnerability according to CIAC Bulletin G-    This documents the fact that 4.1.X sendmail has some security holes   because it has not been upgraded to 8.6.10+. An upgrade of the SunOS   sendmail to 8.6.10+ is currently in process and should be   accomplished within the next few months.  7.0 References   7.1: Important Man Pages  ------------------------  No specific man pages refer to internet security.   7.2: Sunsolve Documents  -----------------------  The following sunsolve documents provide some information not included here.  7.2.1: Infodocs ---------------  2105      convert existing NIS(yp) network to secure C2 domain 12793     What are the security ramifications of running TFTP?  7.2.2: SRDBs ------------  6010      C2 security and Solaris 2.x   7.3: Sun Educational Services  -----------------------------  There are no classes specifically on network security.   7.4: Solaris Documentation  --------------------------  There is no Solaris documentation specifically on network security.   7.5: Third Party Documentation  ------------------------------  The following books are not necessarily restricted to the network aspect of security, but do provide some information on it.  _Computer Security Basics_, published by O'Reilly & Associates  _Practical UNIX Security_, published by O'Reilly & Associates   7.6: RFCs  ---------  1244    the Site Security Handbook 1281    Guidelines for the Secure Operation of the Internet   8.0: Supportability  ===================  SunService is not responsible for helping design security policies at your site. It is hoped that this document will help you to maintain robust network security at your site on your.  We can help resolve problems where a Sun program has a security problem with it, but in such cases the contact must be a system administrator with a good understanding of the security issues.   9.0: Additional Support  =======================  For additional help determining security policies at your site, please contact your local SunService office for possible consulting offerings. Sun's Customer Relations organization can put you in touch with your local SunIntegration or Sales office. You can reach Customer Relations at 800-821-4643.   PRODUCT AREA: Gen. Network PRODUCT: Security SUNOS RELEASE: any HARDWARE: n/a  
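As a quick follow-up to sections 3.2 and 3.3 above, two one-line checks on a Solaris machine will show whether root is restricted to the console and which inetd services are still enabled. This is only a sketch using the paths given in those sections:

  # grep '^CONSOLE' /etc/default/login
  # grep -v '^#' /etc/inetd.conf | awk '{ print $1 }' | sort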

SunService Tip Sheet: Sun NFS


   INFODOC ID: 11987  SYNOPSIS: NFS PSD/FAQ DETAIL DESCRIPTION:  1.0 About NFS  SunService Tip Sheet for Sun NFS  Revision: 2.9 Date: June 25, 1996  Mail to: brian.hackley@east.sun.com Mail to: gwhite@east.sun.com  Table of Contents  1.0: About NFS 2.0: Debugging NFS   2.1: share and exportfs   2.2: showmount   2.3: nfsstat   2.4: rpcinfo   2.5: etherfind and snoop   2.6 Running a snoop of NFS requests:   2.7 Lockd debug hints 3.0: Common How Tos   3.1: Exporting Filesystems Under SunOS   3.2: Exporting Filesystems Under Solaris   3.3: Mounting Filesystems Under SunOS   3.4: Mounting Filesystems Under Solaris   3.5: Setting Up Secure NFS 4.0: Some Frequently Asked Questions   4.1: Miscellaneous NFS Questions   4.2: Problems Mounting Filesystems on a Client   4.3: Common NFS Client Errors Including NFS Server Not Responding   4.4: Problems Umounting Filesystems on a Client   4.5: Interoperability Problems With Non-Sun Systems   4.6: Common NFS Server Errors   4.7: Common nfsd Error Message on NFS Servers   4.8: Common rpc.mountd Error Messages on NFS Servers   4.9: Common rpc.lockd & rpc.statd Error Messages   4.10: NFS Related Shutdown Errors   4.11: NFS Performance Tuning 5.0: Patches   5.1: Core NFS Patches for SunOS   5.2: Patches Related to NFS for SunOS   5.3: Core NFS Patches for Solaris   5.4: Patches Related to NFS for Solaris 6.0: Known Bugs & RFEs 7.0: References   7.1: Important Man Pages   7.2 Sunsolve Documents   7.3 Sun Educational Services   7.4: Solaris Documentation   7.5: Third Party Documentation   7.6: RFCs 8.0: Supportability 9.0: Additional Support   1.0: About NFS  This Tip Sheet documents a wide variety of information concerning NFS, as implemented in the SunOS and Solaris operating systems. It is intended as both an introduction to NFS and as a guide to the most common problems. There are many more complete references to NFS, a few of which are noted in section 7.4 and 7.5.  The following terms are important to an understanding of NFS:  The NFS SERVER is the machine that makes file systems available to the network. It does so by either EXPORTING (SunOS term) or SHARING (Solaris term) them.  The NFS CLIENT is the machine that accesses file systems that have been made available. It does so by MOUNTING them.  A number of different daemons are involved with NFS:  RPC.MOUNTD only runs on NFS servers. It answers initial requests from clients for file systems.  NFSD runs on NFS servers. They are the daemons that deal with the majority of the client NFS requests.  On SunOS 4.1.X, BIODS (block I/O daemons) help clients with their NFS requests.  These do not exist on Solaris 2.X.  LOCKD and STATD are a set of daemons that keep track of locks on NFS files. There will typically be a set of daemons running on a client and server.  NFS partitions can be mounted in one of two ways, hard or soft.  HARD MOUNTS are permanent mounts designed to look just like any normal, local file system. If a partition that is hard mounted becomes unavailable, client programs will keep trying to access it forever. This will cause local processes to lock when a hard mounted disk goes away. Hard mounts are the default type of mount.  SOFT MOUNTS will fail after a few retries if a remote partition becomes unavailable. This is a problem if you are writing to the partition, because you can never be sure that a write will actually get processed  on the other hand, your local processes will not lock up if that partition does go away. 
In general, soft mounts should only be used if you are solely reading from a disk and even then it should be understood that the mount is an unreliable one. If you soft mount a partition that will be written to, you are nearly guaranteeing that you will have problems.  There are a number of files related to NFS:  /etc/exports (SunOS) or /etc/dfs/dfstab (Solaris) lists which files to export on a Server. These file are maintained by hand.  /etc/xtab (SunOS) or /etc/dfs/sharetab (Solaris) lists the filesystems that actually are currently exported. They are maintained by exportfs and share, respectively.  /etc/rmtab on a server lists filesystems that are remotely mounted by clients. This file is maintained by rpc.mountd.  /etc/fstab (SunOS) or /etc/vfstab (Solaris) lists which files to mount on a client. These files are maintained by hand.  /etc/mtab (SunOS) or /etc/mnttab (Solaris) on a client lists filesystems which are currently mounted onto that client. The mount and umount commands modify this file.  2.0 Debugging NFS   General NFS Debugging Note  When NFS is not working or working intermittently, it can be very difficult to track down what exactly is causing the problem. The following tools are the best ones available to figure out what exactly NFS is doing.   2.1: share and exportfs  share (on Solaris) and exportfs (on SunOS) are good tools to use to see exactly how a NFS server is exporting its filesystems. Simply log on to the NFS server and run the command that is appropriate for the OS.    SunOS:   # exportfs   /usr -root=koeller   /mnt   /tmp  The above shows that /mnt and /tmp are exported normally. Since we see neither rw or ro as options, this means that the default is being used, which is rw to the world. In addition /usr gives root permissions to the machine koeller.    Solaris:   # share   -               /var   rw=engineering   ""   -               /usr/sbin   rw=lab-manta.corp.sun.com   ""   -               /usr/local   rw   ""  The above shows that /usr/local is exported normally, /var is exported only to engineering (which happens to be a netgroup) and /usr/sbin is exported only to lab-manta.corp.sun.com (which is a machine).  Note: netgroups are only supported if you are running NIS or NIS+. Consult documentation on those products for how to set up netgroups on your machines   2.2: showmount  showmount, used with the -e option, can also show how a NFS server is exporting its file systems. Its benefit is that it works over the network, so you can see exactly what your NFS client is being offered. However, showmount does not show all of the mount options and thus you must sometimes use share or exportfs, as described in section 2.1. When you do a test with showmount, do it from the NFS client that is having problems:    # showmount -e psi   export list for psi:   /var       engineering   /usr/sbin  lab-manta.corp.sun.com   /usr/local (everyone)    # showmount -e crimson   export list for crimson:   /usr (everyone)   /mnt (everyone)   /tmp (everyone)  Note that showmount only displays: the partition and who can mount it. We will not see any other options displayed. In the example above, there are no restrictions on who can mount crimson's partitions and so showmount lists (everyone).   2.3: nfsstat  The nfsstat command gives diagnostics on what type of messages are being sent via NFS. It can be run with either the -c option, to show the stats of an NFS client, or the -s option, to show the stats of an NFS server. 
When we run 'nfsstat -c' we see the following:    # nfsstat -c    Client rpc:   calls      badcalls   retrans    badxids    timeouts   waits      newcreds   45176      1          45         3          45         0          0   badverfs   timers     toobig     nomem      cantsend   bufulocks   0          80         0          0          0          0    Client nfs:   calls      badcalls   clgets     cltoomany   44866      1          44866      0    Version 2: (44866 calls)   null       getattr    setattr    root       lookup     readlink   read   0 0%%       7453 16%%   692 1%%     0 0%%       15225 33%%  55 0%%      13880 30%%   wrcache    write      create     remove     rename     link       symlink   0 0%%       5162 11%%   623 1%%     914 2%%     6 0%%       306 0%%     0 0%%   mkdir      rmdir      readdir    statfs   15 0%%      0 0%%       467 1%%     68 0%%  The rpc stats at the top are probably the most useful. High 'retrans' and 'timeout' values can indicate performance or network issues. The client nfs section can show you what types of NFS calls are taking up the most time. This can be useful if you're trying to figure out what is hogging your NFS. For the most part, the nfsstat command is most useful when you are doing network and performance tuning. Section 7.4 and 7.5 list books that give some information on this  they are useful to make more sense of the nfsstat statistics.   2.4: rpcinfo  You can test that you have a good, solid NFS connection to your NFS server via the rpcinfo command. As explained in the man page, this program provides information on various rpc daemons, such as nfsd, rpc.mountd, rpc.statd and rpc.lockd. Its biggest use is to determine that one of these daemons is responding on the NFS server. The following examples all show the indicated daemons correctly responding. If instead you get complaints about a service 'not responding,' there might be a problem.  To see that nfsd is responding:    # rpcinfo -T udp crimson nfs   program 100003 version 2 ready and waiting  [crimson is the name of the remote machine that I am testing]  To see that mountd is responding:    # rpcinfo -T udp crimson mountd   program 100005 version 1 ready and waiting   program 100005 version 2 ready and waiting  To see that lockd is responding:    # rpcinfo -T udp crimson nlockmgr   program 100021 version 1 ready and waiting   rpcinfo: RPC: Procedure unavailable   program 100021 version 2 is not available   program 100021 version 3 ready and waiting    # rpcinfo -T udp crimson llockmgr   program 100020 version 2 ready and waiting  (the procedure unavailable error for the nlockmgr seems to be normal for most systems)  If you run rpcinfo and determine that certain rpc services are not responding, check those daemons on the master.  If none of the above works, you can verify that RPC services are working at all on the server by running:    # rpcinfo remote-machine-name  If this gives errors too, there is probably an issue with portmap (SunOS) or rpcbind (Solaris).  [Note: the above rpcinfo commands will vary slightly under Solaris 2.5 or higher, as those OSes will offer NFS version 3, running over TCP, rather than UDP.]   2.5: etherfind and snoop  If all else fails and NFS still doesn't seem to be working right, the last resort in debugging is to use a network sniffer program, such as etherfind (SunOS) or snoop (Solaris). This can give you some indication of whether remote machines are responding at all. 
Below is an example of a totally normal NFS interaction, shown by snoop on Solaris:    # snoop psi and rainbow-16   Using device /dev/le (promiscuous mode)            psi -> rainbow-16   NFS C GETATTR FH=4141     rainbow-16 -> psi          NFS R GETATTR OK            psi -> rainbow-16   NFS C READDIR FH=4141 Cookie=2600     rainbow-16 -> psi          NFS R READDIR OK 1 entries (No more)  These were the results when an 'ls' was run on 'psi' in a directory that was mounted from 'rainbow-16'. The lines labelled 'C' are NFS requests, while the lines labelled 'R' are NFS replies. Through snoop you can easily see: NFS not being responding to (you would get lots of 'C' lines without 'R' replies to them) and also certain errors (timeouts and retransmits particularly). The man page on snoop gives some indication of how to make more in-depth use of the tool. In general, it should only be used for very complex issues, where NFS is behaving very oddly and even then you must be very good with NFS to perceive unexpected behavior.  See the next section for more tips on snoop.   2.6  Running a snoop of NFS requests:  This is best done from a third, uninvolved machine.  If the machine that you are trying to debug is on "systemA", then trace the packets going to and from "systemA" as follows:  snoop -o /var/snoop.out systemA  Alternatively, snoop between systemA and clientB:  snoop -o /var/snoop.out systemA clientB  snoop will run in the window (do not put into background), with a counter of the packets in the corner of the screen.  It will dump all the packets into a raw snoop file.  When the "network event" occurs, wait a couple of seconds and then kill the snoop job.  If the network event includes error messages in /var/adm/messages, please send us the /var/adm/messages and the raw snoop file (/var/snoop.out).  You can read the snoop file with snoop -i /var/snoop.out more.  There are a variety of options in the snoop man page to increase verbosity, and/or to look at specific packets.  Please note that if disk space becomes an issue, you must take steps similar to those listed above.  A very large snoop file can be created in two or three minutes, so snooping is best reserved for easily reproduced events.   2.7  Lockd debug hints  Please see section 4.9: Common rpc.lockd & rpc.statd Error Messages for information regarding specific lockd and statd problems.  Generally you can pick out problem clients by snooping and/or putting lockd into debug mode.  Sections 2.5 and 2.6 cover snoop.  How to put the Solaris 2.3 and 2.4 lockd into debug mode: Edit the line in the /etc/init.d/nfs.client script that starts up lockd to start it with -d3 and redirect stdout to a filesystem with ALOT of disk space.     /usr/lib/nfs/lockd -d3 > /var/lockd.debug.out  Note that lockd always creates an empty file in the pwd, called logfile when it is running in debug mode.  Disregard this file.  If disk space becomes an issue from doing the lockd debug mode, you will have to stop lockd and restart it.  If you turn the above command from a shell, make sure it is a bourne or korn shell (sh or ksh).  How to put the Solaris 2.5 lockd into debug mode:  You will have to do this from a shell, preferably a command tool window.  You must capture the debug output that scrolls by into a script file:    script /var/lockd.out   /usr/lib/nfs/lockd -d3  After you are done debugging, CTRL/C the lockd job since it does not fork and exec to the background.  Then exit or CTRL/D the script job.  The debug output will be in /var/lockd.out.  
Please note that Solaris 2.5 will also log more detailed debug output into the /var/adm/messages file. We will need that also.

How to run a truss of lockd (you rarely need to do this): Just modify the start of lockd to be a truss of the lockd process. You will need even more disk space to do this!

For Solaris 2.3 and 2.4:

   truss -o /var/truss.out -vall -f /usr/lib/nfs/lockd -d3 > /var/lockd.debug.out

For Solaris 2.5:

   script /var/lockd.out
   truss -o /var/truss.out -vall -f /usr/lib/nfs/lockd -d3

CTRL/C the script job (and exit from the script shell if on 2.5) after you have reproduced the problem.

If disk space becomes an issue from doing the truss, use a cron job to:
1.  stop the running truss
2.  move the "current" truss file to an "old" truss file
3.  get the PID of lockd
4.  truss -o /var/truss.out.current -vall -f -p PID (PID from step 3)

3.0 Common How-Tos

3.1: Exporting Filesystems Under SunOS

In order to export a filesystem under SunOS, you must first edit the file /etc/exports on your NFS server, adding a line for the new filesystem. For example, the following /etc/exports file is for a server that makes available the filesystems /usr, /var/spool/mail and /home:

   # cat /etc/exports
   /usr
   /var/spool/mail
   /home

You can add normal export options to these lines, such as ro, rw and root. These options are fully described in the exports man page. The following example shows our /etc/exports file, but this time with the filesystems all being exported read only:

   # cat /etc/exports
   /usr  -ro
   /var/spool/mail       -ro
   /home -ro

If your machine is already exporting filesystems and you are adding a new one, simply run the exportfs command to make this new filesystem available:

   # exportfs -a

If you have never exported filesystems from this machine before, you should reboot it after editing the /etc/exports file. This will cause rpc.mountd and nfsd to get started and will also automatically export the filesystems.

3.2: Exporting Filesystems Under Solaris

You must edit the file /etc/dfs/dfstab in order to make filesystems automatically export on a Solaris system. The standard syntax of lines in that file is:

   share -F nfs partition

For example, the following /etc/dfs/dfstab file is for a server that makes available the filesystems /usr, /var/spool/mail and /home:

   share -F nfs /usr
   share -F nfs /var/spool/mail
   share -F nfs /home

You can add normal share options to these lines, such as ro, rw and root. This is done by preceding the options with a -o flag. These options are fully described in the share man page. The following example shows our /etc/dfs/dfstab file, but this time with the filesystems all being exported read only:

   share -F nfs -o ro /usr
   share -F nfs -o ro /var/spool/mail
   share -F nfs -o ro /home

If your machine is already exporting filesystems and you are adding a new one, simply run the shareall command to make this new filesystem available:

   # shareall

If you have never exported filesystems from this machine before, you must run the nfs.server script:

   # /etc/init.d/nfs.server start

(The NFS server will come up fine on the next boot, now that an /etc/dfs/dfstab file exists.)

3.3: Mounting Filesystems Under SunOS

You can always mount file systems with the mount command, with the following syntax:

   mount remotemachine:/remotepartition /localpartition

For example:

   mount bigserver:/usr/local /usr/local

You might also give the mount command any of the general mount options.
For example, to mount /usr/local read only, you would use the command:

   mount -o ro bigserver:/usr/local /usr/local

If you wish a filesystem to be mounted every time the machine is booted, you must edit the /etc/fstab file. The syntax is:

   remotemach:/remotepart        /localpart      nfs     [options]       0 0

The options field is optional and can be left out if none are needed. To make /usr/local mount automatically, you would add the following to your /etc/fstab:

   bigserver:/usr/local  /usr/local      nfs     0 0

To make it mount read only, you could use:

   bigserver:/usr/local  /usr/local      nfs     ro      0 0

3.4: Mounting Filesystems Under Solaris

Section 3.3, above, shows how to use the mount command to interactively mount filesystems. It works exactly the same under Solaris.

If you wish a filesystem to be mounted every time the machine is booted, you must edit the /etc/vfstab file. The syntax is:

   remotemach:/remotepart - /localpart nfs - yes [options]

For example, to mount the /usr/local partition, you would enter:

   bigserver:/usr/local - /usr/local nfs - yes -

To mount it read only, you would enter:

   bigserver:/usr/local - /usr/local nfs - yes ro

Consult the vfstab man page if you're interested in knowing what the fields that contain "-"s and "yes" are for. For the most part, they're only relevant for non-NFS mounts.

3.5: Setting Up Secure NFS

NFS has built-in DES-based authentication (secure RPC) that can provide added security. To use this functionality, a partition must be both exported with the "secure" option and mounted with the "secure" option. In addition, either NIS or NIS+ must be available. Secure NFS will not work without one of these naming services.

To add the -secure option to the /secret/top partition on a SunOS machine, the following exports entry would be needed on the server:

   /secret/top   -secure

In addition, the following fstab entry would be needed on the client:

   server:/secret/top    /secret/top     nfs     rw,secure       0 0

(Solaris machines would have to have the secure option similarly added.)

If you are running NIS+, you will not need to do anything further to access the partition, since NIS+ users and NIS+ hosts will already have credentials created for them. If you are running NIS, you must create credentials for all users and hosts that might want to access the secure partition.

Root can add credentials for users with the following command:

   # newkey -u username

Users can create their own credentials with the following command:

   $ chkey

The password supplied to these programs should be the same as the user's login password.

Root can add credentials for hosts with the following command:

   # newkey -h machinename

The password supplied to newkey in this case should be the same as the machine's root password.

It is important to note that rpc.yppasswd must be running on your NIS server for these commands to work. In addition, push out the publickey maps afterwards to make sure that the most up-to-date credential information is available.

Once this is all done, secure NFS should work on your NIS network, with two caveats. First, keyserv must be running on your client machines. If this is not the case, adjust your rc files so that it automatically starts up. Second, if a user does not supply a password when logging in (due to a .rhosts or /etc/hosts.equiv entry, for example) or if his secure key is different from his login password, then he will need to execute the command 'keylogin' before he can access the secure NFS partition.
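As a recap of the NIS case, the whole credential setup might look something like the sketch below. The hostnames and username are examples only, and the 'make publickey' step assumes the standard NIS Makefile in /var/yp; adapt it to your site.

   On the NIS master (example user 'jdoe', example client 'client1'):

      # newkey -u jdoe
      # newkey -h client1
      # cd /var/yp; make publickey

   On the client, check that keyserv is running and, if the user's login
   password did not decrypt his secure key automatically, run keylogin:

      # ps -e | grep keyserv
      $ keylogin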
4.0 Frequently Asked Questions

4.1: Miscellaneous NFS Questions

Q: What version of NFS does Sun implement?

A: All of the currently supported revisions of SunOS and Solaris support NFS version 2, over UDP. In addition, Solaris 2.5 supports NFS version 3, over TCP. Although NFS version 3 is the default for Solaris 2.5 and up, NFS will fall back to version 2 if other machines do not have version 3 capability.

Q: What do these NFS Error Codes mean (e.g. NFS write error 49)?

A: On SunOS, you can find a list of error codes in the intro(2) man page:

   # man 2 intro

On Solaris, you can consult the /usr/include/sys/errno.h file. SRDB #10946, available through SunSolve, also lists some of the NFS error codes.

Q: Why isn't my netgroup entry working?

A1: There are lots of factors related to netgroups. First, you must be using either NIS or NIS+ to propagate the netgroup. Second, a netgroup will only work as an ro or rw argument, and even then only when the ro or rw is not being used to override another ro or rw option. Netgroups can not be used as an argument to the root option.

A2: NFS requires that the "reverse lookup" capability work such that the hostname returned by looking up the IP address (gethostbyaddr) matches EXACTLY the text specified in the netgroup entry. Otherwise the NFS mount will fail with "access denied".

For example, suppose the NFS server has the following NIS netgroup entry:

   goodhosts   (clienta,,) (clientb,,) (blahblah,,)

clienta is at 192.1.1.1 and the server uses DNS for hostname lookups. The NFS request to do the mount arrives from IP address 192.1.1.1. The NFS server looks up the IP address 192.1.1.1 to get the hostname associated with that IP address. The gethostbyaddr MUST return "clienta". If it does not, the NFS request will fail with "access denied".

telnet from the NFS client to the NFS server and run "who am i". The hostname in parentheses is the name that should be in the netgroup:

   hackley    pts/13       Jan 24 09:21    (mercedes)

The most common cause of this failure is failure of a DNS administrator to properly manage the "reverse lookup maps", e.g. 192.1.1.IN-ADDR.ARPA.

Q:  What can you tell me about CacheFS?

A:  CacheFS is the "cache file system". It allows a Solaris 2.X NFS client to cache a remote file system to improve performance. For example, CacheFS allows you to be on "clienta" and cache your home directory, which is mounted via NFS from an NFS server. Because CacheFS is most often used in conjunction with the automounter, we have some basic information on CacheFS in our automounter tips sheet (Product Support Document). You can read more about CacheFS in the "NFS Administration Guide" and in the "mount_cachefs" man page.

Q:  Is there a showfh for Solaris to show the NFS File Handle?

A:  Yes, here it is:

#!/bin/sh
#
# fhfind: takes the expanded filehandle string from an
# NFS write error or stale filehandle message and maps
# it to a pathname on the server.
#
# The device id in the filehandle is used to locate the
# filesystem mountpoint.  This is then used as the starting
# point for a find for the file with the inode number
# extracted from the filehandle.
#
# If the filesystem is big - the find can take a long time.
# Since there's no way to terminate the find upon finding
# the file, you might need to kill fhfind after it prints
# the path.
#
if [ $# -ne 8 ]
then
        echo
        echo "Usage: fhfind <filehandle> e.g."
        echo
        echo "  fhfind 1540002 2 a0000 4df07 48df4455 a0000 2 25d1121d"
        exit 1
fi

# Filesystem ID

FSID1=$1
FSID2=$2

# FID for the file

FFID1=$3
FFID2=`echo $4 | tr [a-z] [A-Z]` # uppercase for bc
FFID3=$5

# FID for the export point (not used)

EFID1=$6
EFID2=$7
EFID3=$8

# Use the device id to find the /etc/mnttab
# entry and thus the mountpoint for the filesystem.

E=`grep $FSID1 /etc/mnttab`
if [ "$E" = "" ]
then
        echo
        echo "Cannot find filesystem for devid $FSID1"
        exit 0
fi

set - $E
MNTPNT=$2

INUM=`echo "ibase=16; $FFID2" | bc` # hex to decimal for find

echo
echo "Now searching $MNTPNT for inode number $INUM"
echo

find $MNTPNT -mount -inum $INUM -print 2>/dev/null

4.2: Problems Mounting Filesystems on a Client

Q: Why do I get "permission denied" or "access denied" when I try to mount a remote filesystem?

A1: Your remote NFS server is not exporting or sharing its file systems. You can verify this by running the showmount command as follows:

   # showmount -e servername

That will provide you with a list of all the file systems that are being sent out. If a file system is not being exported, you should consult section 3.1 or 3.2, as applicable.

A2: Your remote NFS server is exporting file systems, but only to a limited number of client machines, which does not include yours. To verify this, again use the showmount command:

   # showmount -e psi
   /var       engineering
   /usr/sbin  lab-manta.corp.sun.com
   /usr/local (everyone)

In this example, /usr/local is being exported to everyone, /var is being exported to the engineering group, and /usr/sbin is only being exported to the machine lab-manta.corp.sun.com. So, I might get the denial message if I tried to mount /var from a machine not in the engineering netgroup or if I tried to mount /usr/sbin from anything but lab-manta.corp.sun.com.

A3: Your machine is given explicit permission to mount the partition, but the server does not list your correct machine name. In the example above, psi is exporting to "lab-manta.corp.sun.com", but the machine might actually identify itself as "lab-manta" without the suffix. Or, alternatively, a machine might be exporting to "machine-le0" while the mount request actually comes from "machine-le1". You can test this by first running "showmount -e", then logging in to the server from the client that cannot mount, and then typing "who". This will show you if the two names do not match. For example, I am on lab-manta, trying to mount /usr/sbin from psi:

   lab-manta# mount psi:/usr/sbin /test
   mount: access denied for psi:/usr/sbin

I use showmount -e to verify that I am being exported to:

   lab-manta# showmount -e psi
   export list for psi:
   /usr/sbin  lab-manta.corp.sun.com

I then log in to psi, from lab-manta, and execute who:

   lab-manta% rsh psi
   ...
   psi# who
   root       pts/6        Sep  8 14:02    (lab-manta)

As can be seen, the names "lab-manta" and "lab-manta.corp.sun.com" do not match. The entry shown by who, lab-manta, is what should appear in my export file. When I change it and re-export, I can verify it with showmount and then see that mounts do work:

   lab-manta[23] showmount -e psi
   export list for psi:
   /usr/sbin  lab-manta
   lab-manta[24] mount psi:/usr/sbin /test
   lab-manta[25]

A4: Your client is a member of a netgroup, but it seems that the netgroup does not work. See Section 4.1 for notes on debugging netgroups.
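To make the name-matching checks in A2 and A3 (and the netgroup reverse-lookup check in Section 4.1) less error prone, something like the following sketch can be run on the NFS server on Solaris 2.5, where getent is available. 'clienta' and '/usr/sbin' are example values; the sketch simply prints the name the server resolves for the client's address next to the export list so you can compare them.

   #!/bin/sh
   # Sketch: compare the server's reverse lookup of a client with the
   # export list.  CLIENT and FS are example values only.
   CLIENT=clienta
   FS=/usr/sbin

   # Address of the client, then the name the server maps it back to:
   ADDR=`getent hosts $CLIENT | awk '{print $1}'`
   NAME=`getent hosts $ADDR | awk '{print $2}'`
   echo "$CLIENT -> $ADDR -> $NAME"

   # What the filesystem is actually exported to:
   showmount -e | grep "$FS"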
Q: Why do I get the following error when I try to mount a remote file system:

   nfs mount: remote-machine:: RPC: Program not registered
   nfs mount: retrying: /local-partition

A: rpc.mountd is not running on the server. You probably just exported the first filesystem from a machine that has never done NFS serving before. If the NFS server is running SunOS 4.X, reboot it. If the NFS server is running Solaris 2.X, run the following:

   /etc/init.d/nfs.server start

Note: Consult section 3.1 or 3.2 for information on how to create the exports file on a SunOS 4.X system or how to create the dfstab file on a Solaris 2.X system.

Q: Why doesn't the mountd respond? After I try the mount I get NFS SERVER NOT RESPONDING. When I try to talk to the mountd, rpcinfo gives an rpc timed out error. How can I debug or fix a hung mountd on the NFS server?

A: First, try killing the mountd process on the server and restarting it. This gets around many hung mountd issues.

Second, make sure the NFS server is "patched up". There is a mountd patch for Solaris 2.3, and we've seen cases where patch 101973 helps on 2.4.

Further troubleshooting tips to debug a hung mountd on Solaris 2.X:
1.  get the PID of the running mountd
2.  truss -f -vall -p PID
3.  start a snoop at the same time you start the truss
4.  if you have access to it, run "gcore" or "pstack" (unsupported utilities made available by SunService) to get the stack trace of the mountd PID.
    Update: pstack is supported at 2.5, /usr/proc/bin/pstack. It is one of the new "proc" tools.

Q:  Why do I get the message "Device Busy" in response to my mount command?

A:  You get this message because some process is using the underlying mount point. For example, if you had a shell whose pwd was /mnt and you tried to mount something into /mnt, e.g. mount server:/export/test /mnt, you would see this error.

To work around this, find the process using the directory and either kill it or move its pwd someplace else. The "fuser" command is extremely handy for this:

   mercedes[hackley]:cd /mnt
   mercedes[hackley]:fuser -u /mnt
   /mnt:     4368c(hackley)     368c(hackley)

In this case you can see that processes 368 and 4368 are using the /mnt mount point. PID 368 is the shell and PID 4368 was the fuser command itself.

You can forcibly kill any process using a mount point (you must be root) with fuser -k /mnt.

Please note that fuser is not infallible and cannot identify kernel threads using a mount point (as sometimes happens with the automounter).

4.3: Common NFS Client Errors Including NFS Server Not Responding

If a file system has been successfully mounted, you can encounter the following errors when accessing it.

Q: Why do I get the following error message:

   Stale NFS file handle

A1: This means that a file or directory that your client has open has been removed or replaced on the server. It happens most often when a dramatic change is made to the file system on the server, for example if it was moved to a new disk or totally erased and restored. The client should be rebooted to clear Stale NFS file handles.

A2: If you prefer not to reboot the machine, you can create a new mount point on the client for the mount point with the Stale NFS file handle.

Q: Why do I get the following error message:

   NFS Server <server> not responding
   NFS Server ok

Note: this error will occur when using HARD mounts. This troubleshooting section applies to HARD or SOFT mounts.
A1: If this problem is happening intermittently, while some NFS traffic is occurring, though slowly, you have run into the performance limitations of either your current network setup or your current NFS server. This issue is beyond the scope of what SunService can support. Consult sections 7.4 & 7.5 for some excellent references that can help you tune NFS performance. Section 9.0 can point you to where you can get additional support on this issue from Sun.

A2: If the problem lasts for an extended period of time, during which no NFS traffic at all is going through, it is possible that your NFS server is no longer available. You can verify that the server is still responding by running the commands:

   # ping server
and
   # ping -s server 8000 10

(this will send 10 8k ICMP Echo request packets to the server)

If the server is not reachable by ping, you will want to check the server machine's health, your network connections and your routing.

If the ping works, check to see that the NFS server's nfsd and mountd are responding with the "rpcinfo" command:

   # rpcinfo -u server nfs
   program 100003 version 2 ready and waiting

   # rpcinfo -u server mountd
   program 100005 version 1 ready and waiting
   program 100005 version 2 ready and waiting

If there is no response, go to the NFS server and find out why the nfsd and/or mountd are not working over the network. From the server, run the same commands. If they work OK from the server, the network is the culprit. If they do NOT work, check to see if they are running. If not, restart them and repeat this process. If either nfsd or mountd IS running but does not respond, then kill it, restart it and retest.

A3: Some older bugs might have caused this symptom. Make sure that you have the most up-to-date Core NFS patches on the NFS server. These are listed in Section 5.0 below. In addition, if you are running quad ethernet cards on Solaris, install the special quad ethernet patches listed in Section 5.4.

A4: Try cutting down the NFS read and write size with the NFS mount options rsize=1024,wsize=1024. This will eliminate problems with packet fragmentation across WANs, routers, hubs, and switches in a multivendor environment, until the root cause can be pin-pointed. THIS IS THE MOST COMMON RESOLUTION TO THIS PROBLEM.

A5: If the NFS server is Solaris 2.3 or 2.4, 'nfsreadmap' occasionally caused the "NFS server not responding" message on Sun and non-Sun NFS clients. You can resolve this by adding the following entry to the /etc/system file on the NFS server:

   set nfs:nfsreadmap=0

and rebooting the machine. The nfsreadmap function was removed in 2.5 because it really didn't work.

A6: If you are using FDDI on Solaris, you must enable fragmentation with the command:

   ndd -set /dev/ip ip_path_mtu_discovery 0

Add this to /etc/init.d/inetinit, after the other ndd command on line 18.

A7: Another possible cause: the NFS SERVER is Ultrix, old AIX, Stratus, or an older SGI, and you ONLY get this error on Solaris 2.4 and 2.5 clients, while the 2.3 and 4.X clients are OK. The NFS version 2 and 3 protocols allow the NFS READDIR request to be 1048 bytes in length. Some older implementations incorrectly assumed the request had a max length of 1024.
To work around this, either mount those problem servers with rsize=1024,wsize=1024 or add the following to the NFS client's /etc/system file and reboot:

   set nfs:nfs_shrinkreaddir=1

A8: Oftentimes NFS SERVER NOT RESPONDING is an indication of another problem on the NFS server, particularly on the disk subsystem. If you have a SPARCStorage Array, you must verify that you have the most recent firmware and patches due to the volatility of that product. Another general method that can be tried is to look at the output from iostat -xtc 5 and check the svc_t field. If this value goes over 50.0 (50 msec) for a disk that is being used to serve NFS requests, you might have found your bottleneck. Consult the references in Section 7 of this PSD for other possible NFS server tuning hints.

NOTE: NFS server performance tuning services are only available on a Time and Materials basis.

Q: Why can't I write to an NFS mounted file system as root?

A: Due to security concerns, the root user is given "nobody" permissions when it tries to read from or write to an NFS file system. This means that root has less access than any user: it will only be able to read from things with world read permissions, and will only be able to write to things with world write permissions.

If you would like your machine to have normal root permissions to a filesystem, the filesystem must be exported with the option "root=clientmachine". An alternative is to export the filesystem with the "anon=0" option. This will allow everyone to mount the partition with full root permissions. Sections 3.1 and 3.2 show how to include options when exporting filesystems.

Q1: Why do 'ls'es of NFS mounted directories sometimes get mangled on my SunOS machine?
Q2: Why do I get errors when looking at an NFS file on my SunOS machine?

A: By default, SunOS does not have UDP checksums enabled. This can cause problems if NFS is being done over an extended distance, especially if it is going across multiple routers. If you are seeing very strange errors on NFS or are getting corruption of directories when you view them, try turning UDP checksums on. You can do so by editing the kernel file /usr/sys/netinet/in_proto.c, changing the following:

   int  udp_cksum = 0            /* turn on to check & generate udp checksums */

to:

   int  udp_cksum = 1            /* turn on to check & generate udp checksums */

Afterwards, you must build a new kernel, install it and reboot. UDP checksums must be enabled on both the NFS client and NFS server to have any effect. This is only an issue on SunOS machines, as Solaris machines have UDP checksums enabled by default.

Q1: Why do I get intermittent errors writing to an NFS partition?
Q2: Why do I get intermittent errors reading from an NFS partition?
Q3: Why do I get the following error on my NFS partition?

   "nfs read error on <machine> rpc: timed out"

A: These symptoms can all be caused by failures of soft mounts. Soft mounts time out instead of logging an "NFS SERVER NOT RESPONDING" message. Because of this and other reasons, it is recommended that you only use soft mounts for non-critical, read-only filesystems (e.g. man pages). To resolve the problem, you must solve the underlying problem (see the section above on "NFS server not responding" for troubleshooting assistance). Alternatively, you can mount the NFS server with hard,intr instead of soft, but this will have the effect of causing applications to hang instead of timing out when the NFS servers are unavailable or unreachable.
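To tie the answers in this section together, here is a sketch of what the suggested client-side options could look like. 'bigserver' and the /export/data and /data paths are example names only, and rsize=1024,wsize=1024 should only be left in place while the underlying fragmentation problem is being tracked down.

   A one-off mount with reduced transfer sizes and a hard, interruptible mount:

      # mount -o rsize=1024,wsize=1024,hard,intr bigserver:/export/data /data

   The matching /etc/vfstab entry on a Solaris client:

      bigserver:/export/data - /data nfs - yes rsize=1024,wsize=1024,hard,intr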
4.4: Problems Umounting Filesystems on a Client

Q: When I try to umount a partition, why do I get the following error:

   /partition: Device busy

A: This means that someone is actively using the partition you are trying to unmount. They might be running a program from it or they might simply be sitting in a subdirectory of the partition. In Solaris, you can run the fuser command to determine what processes are using a partition:

   # fuser /test
   /test:     1997c    1985c

The above example shows that pids 1985 and 1997 are accessing the /test partition. Either kill the processes or run fuser -k /test to have fuser do this for you.

NOTES: This functionality is not available under SunOS. It does not always identify an automounted process on Solaris. In many cases, it is necessary to reboot a machine in order to clear out all of the processes that could be making a file system busy.

4.5: Interoperability Problems With Non-Sun Systems

The following problems are relevant to Suns that are doing mounts from non-Sun systems.

Q: Why do I get the following error when mounting from my HP or SunOS 3.5 machine or other machine running an older version of NFS:

   nfsmount server/filesystem server not responding RPC authentication error \
	why = invalid client credential.

A: Older versions of NFS only allowed users to be in eight groups or fewer. Reduce root's number of groups to eight or fewer and the problem will go away. Users planning to access this partition should also reduce their number of groups to eight.

Q: When I NFS mount filesystems to my Sun from my PC, why does the Sun never see changes I make to those filesystems?

A: Most PC NFS servers do not seem to correctly notify their NFS clients of changes made to their filesystems. It appears that this is due to the fact that file timestamps on PCs are very coarse. If you are having this problem, speak with the vendor of your PC NFS product.

Q: Why do mounts from my SGI fail with "not a directory"?

A: For some reason, certain versions of the SGI NFS server sometimes begin using port 860 rather than 2049 for NFS. When this occurs, mounts will fail. In order to get around this bug, always use the "port" option, with 2049 as a value, when doing mounts from an SGI, e.g.:

   mount -o port=2049 sgi:/partition /localpartition

If you are mounting from an SGI via autofs, be sure you have the newest version of the kernel patch (101318-74 or better for 5.3, 101945-32 or better for 5.4), as older versions of the kernel patch did not support the port option for autofs.

Q: Why can't I NFS mount from my old, old machine?

A: If you have a very old machine, it is probably running NFS version 1. Such machines often have problems talking to newer versions of NFS. If you have a very old machine, speak with the manufacturer to see if they've ported NFS version 2 or 3.

4.6: Common NFS Server Errors

Q: Why do I get the following error when I run exportfs/shareall?

   exportfs: /var/opt: parent-directory (/var) already exported
   share_nfs: /var/opt: parent-directory (/var) already shared

A: The NFS specs forbid you from exporting both a parent directory and a sub-directory of the same filesystem. If you try to export a sub-directory when the parent directory is already exported, you will get the above error. The above example shows an export of the subdirectory /var/opt being attempted after the directory /var was already available.
A very similar error will occur in the opposite case:

   exportfs: /var: sub-directory (/var/spool/mail) already exported

This shows the directory /var being exported after /var/spool/mail was already available.

If you want to have both a parent directory and its sub-directory exported, you must export just the parent directory. Among other things, this means that you can not have different options on parent and sub-directories, for example -ro on a parent directory and -rw on a specific subdirectory.

Q: Why is my NFS server getting totally overrun by quota errors?

A: Solaris 2.4 had a bug in the way that quotas and NFS interacted. Obtain 101945-34 or later if quota messages from NFS partitions are having a serious impact on your machine. If you are running into this problem where your client is Solaris and your server is SunOS, you will not have this option, and it is recommended that you simply upgrade your SunOS system.

Q: Why does the /etc/rmtab file get huge?

A: The rmtab contains the list of all the file systems currently being mounted by remote machines. When a filesystem is unmounted by a remote machine, the line in the rmtab is just commented out, not deleted. This can make the rmtab file get very large, maybe even filling the root partition. If this is a problem at your site, add the following lines to your rc, prior to the starting of the rpc.mountd:

   if [ -f /etc/rmtab ]
   then
     sed -e "/^#/d" /etc/rmtab > /tmp/rmtab 2>/dev/null
     mv /tmp/rmtab /etc/rmtab >/dev/null 2>&1
   fi

This will cause the rmtab file to be trimmed every time the system boots.

4.7: Common nfsd Error Messages on NFS Servers

Q: Why do I get the following error message when nfsd starts?

   /usr/lib/nfs/nfsd[247]: netdir_getbyname (transport udp,
       host/serv \1/nfs), Bad file number

A: This problem is usually the result of an nfsd line not being in your services map. Consult your naming service (files, nis, nis+) and insert the following entry, if it is missing:

   nfsd            2049/udp        nfs             # NFS server daemon

...and at 2.5, you must also have:

   nfsd            2049/tcp        nfs

Q: Why do I get the following error message when nfsd starts?

   /usr/lib/nfs/nfsd[2943]: t_bind to wrong address
   /usr/lib/nfs/nfsd[2943]: Cannot establish NFS service over /dev/udp: \
		transport setup problem.
   /usr/lib/nfs/nfsd[2943]: t_bind to wrong address
   /usr/lib/nfs/nfsd[2943]: Cannot establish NFS service over /dev/tcp: \
		transport setup problem.
   /usr/lib/nfs/nfsd[2943]: Could not start NFS service for any protocol. Exiting.

A: This problem is caused by trying to start a second nfsd when one is already running.

4.8: Common rpc.mountd Error Messages on NFS Servers

Q: Why do I constantly get the following error message on my NFS server:

   Aug 15 13:13:56 servername mountd[930]: couldn't register TCP MOUNTPROG
   Aug 15 13:13:58 servername inetd[141]: mountd/rpc/udp server failing

A: This problem occurs most often on SunOS machines. It typically means that you are starting rpc.mountd from the rc.local, but also have a line for it in your inetd.conf:

   mountd/1       dgram   rpc/udp wait root /usr/etc/rpc.mountd   rpc.mountd

You can resolve this problem by commenting out the mountd line in the /etc/inetd.conf file and then killing and restarting your inetd.
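A quick way to check for the two misconfigurations above, a missing nfsd services entry and a mountd that is started from both the rc scripts and inetd, is sketched below. It assumes the local files are in use; check your NIS/NIS+ maps as well if you use them, and adjust the ps flags on SunOS (ps aux).

   #!/bin/sh
   # Sketch: sanity-check the nfsd services entries and look for a
   # doubly-configured mountd.

   # nfsd should be listed for udp (and, at Solaris 2.5, tcp as well):
   grep nfsd /etc/services

   # mountd should be started from the rc scripts *or* inetd, not both
   # (the inetd entry is normally only an issue on SunOS):
   grep mountd /etc/inetd.conf | grep -v '^#'
   ps -e | grep mountd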
4.9: Common rpc.lockd & rpc.statd Error Messages

Q: What does it mean when I get the following error:

   lock manager: rpc error (#): RPC: Program/version mismatch

A: Some of your systems are running up-to-date versions of lockd, while others are outdated. Install the most up-to-date lockd patch on all of your systems. See section 5.0 below for a list of lockd patches.

Q: What does it mean when I get the following error:

   rpc.statd: cannot talk to statd on [machine]

A: Either [machine] is down or it is no longer doing NFS services. It's possible that the machine might still be around, but has changed its name or something similar. If these changes are going to be permanent, clear out the statmon directories on your machine. Do this by rebooting the machine into single user mode and running the following command:

   SunOS:
   rm /etc/sm/* /etc/sm.bak/*

   Solaris:
   rm /var/statmon/sm/* /var/statmon/sm.bak/*

Afterwards, execute reboot to bring your machine back up.

Alternatively, if you cannot put the system into single user mode:
- Kill the statd and lockd processes
- Clear out the "sm" and "sm.bak" directories
- Restart statd and lockd, in that order

Q: How can I fix these errors?

The SunOS 4.1.X lockd reports:

   lockd[136]: fcntl: error Stale NFS file handle
   lockd[136]: lockd: unable to do cnvt.

The lockd error message is different on Solaris 2.3 and 2.4:

   lockd: unable to do cnvt.
   _nfssys: error Stale NFS file handle

A: Generally, this is caused by an error from a client. The client has submitted a request for a lock on a stale file handle. Sometimes, older or unpatched lockd clients will continually resubmit these requests. See the "Lockd debug hints" section for help in identifying the client making the request. See section 5.0 for info on the NFS and lockd patches. If the client is a non-Sun system, contact the client system vendor for their latest lockd patch.

Q: How can I fix the following errors:

   nlm1_reply: RPC unknown host
   create_client: no name for inet address 0x90EE4A14.

We also see:

   nlm1_call: RPC: Program not registered
   create_client: no name for inet address 0x90EE4A14.

A: There are THREE items to check, in order.

1.  This first answer applies if the hexadecimal address 0x90EE4A14 corresponds to an IP address in use on your network and it is not in your hosts database (/etc/hosts, NIS, NIS+ or DNS as appropriate). In this case, it corresponds to 144.238.74.20. The customer does not have that host in his NIS+ hosts table. The customer can find out the host name for that IP address by using telnet to connect to the IP address, then getting the hostname. The customer then adds the entry to the NIS+ hosts table.

    Then verify that gethostbyaddr() is working with the new IP/hostname in NIS+ with:

      ping -s 144.238.74.20

    The responses will show the hostname for the IP address.

2.  If you do the above and the messages continue, kill and restart the lockd, as it appears lockd caches name service information.

3.  Patch levels:

    Solaris 2.4:
    101945-34 or better kernel jumbo patch
    101977-04 or better lockd jumbo patch
    102216-05 or better klm kernel locking patch (see note below)

    Note: Patch 102216-05 contains a fix for a bug that can cause this error message:
    1164679 KLM doesn't initialize rsys & rpid correctly

    Solaris 2.3:
    101318-75 or better kernel jumbo patch

Q:  Why do I get the following error message on Solaris?
   lockd[2269]: netdir_getbyname (transport udp, host/serv \1/lockd), Resource temporarily unavailable
   lockd[2269]: Cannot establish LM service over /dev/udp: bind problem. Exiting.

A:  This is caused by missing entries for lockd in /etc/services, the NIS services map, or the NIS+ services table. Verify this with:

   getent services lockd

If you don't get the lockd entries, add the following entries to the appropriate services database:

   lockd           4045/udp
   lockd           4045/tcp

Check your /etc/nsswitch.conf file's services entry to determine which services database you are using.

Q:  Why do I get the following error message on Solaris?

   lockd[2947]: t_bind to wrong address
   lockd[2947]: Cannot establish LM service over /dev/udp: bind problem. Exiting.

A:  This is caused by trying to start lockd when it is already running. If you see this message at bootup, you must inspect your startup scripts in /etc/rc2.d and /etc/rc3.d to determine the cause.

4.10: NFS Related Shutdown Errors

Q: Why do I get the following error when running 'shutdown' on my Solaris machine:

   "showmount: machine: RPC:program not registered"

A: This is due to a bug in the /usr/sbin/shutdown command. shutdown executes the showmount command as part of its scheme to warn other machines that it will not be available. If the machine you executed shutdown on is not an NFS server, shutdown will complain with the above message. This will cause no impact to your machine, but if it annoys you, you can run the older /usr/ucb/shutdown program:

   # /usr/ucb/shutdown

Q: Why do I get the following error when running 'shutdown' on my Solaris machine:

   "nfs mount:machine(vold(PID###):server not responding:RPC not registered"

A: This is due to a bug in vold, which causes it to be shut down too late. This will cause no impact to your machine, but if it annoys you, you can stop vold before executing shutdown:

   # /etc/init.d/volmgt stop
   # shutdown

4.11  NFS Performance Tuning

Q: How do I determine how many nfsds to run on a SunOS 4.1.X or on a Solaris 2.X system?

A:  It is difficult to provide NFS tuning in a short technical note, but here are some general guidelines. For more specific guidelines, consult the O'Reilly and Associates book "Managing NFS and NIS", the SunSoft Press book "Sun Performance and Tuning", or the "SMCC NFS Performance and Tuning Guide". Ordering info is in Section 7 of this PSD. If you need NFS performance consulting assistance from SunService, please refer to Sections 8 and 9 of this document on supportability and support providers.

In SunOS 4.1.X, the number of nfsds specifies the number of nfsd processes that run. In Solaris 2.X, the number of nfsds specifies the number of nfsd threads that run inside the single nfsd Unix process.

Here are some general guidelines for SunOS 4.1.X: to determine how many nfsds to run, use any of the formulas below to pick a starting value, then use the procedures below to adjust the number of nfsds until it is right for the particular environment.
   --------------------------------------------------------
    VARIATION     FORMULA
   --------------------------------------------------------
    Variation 1   #(disk spindles) + #(network interfaces)
   --------------------------------------------------------
    Variation 2   4 for a desktop system that is both
                  client and server,
                  8 for a small dedicated server,
                  16 for a large NFS and compute server,
                  24 for a large NFS-only server
   --------------------------------------------------------
    Variation 3   2 * max#(simultaneous disk operations)
   --------------------------------------------------------

On Solaris 2.X, this number will be different. The SunSoft Press book recommends taking the highest number obtained by applying the following three rules:

 * Two NFS threads per active client process
 * 32 NFS threads on a SPARCclassic server, 64 NFS threads per SuperSPARC processor
 * 16 NFS threads per ethernet, 160 per FDDI

The default for 2.X is 16 threads.

Q: What other guidelines and help are there on tuning NFS?

A:  Consult the O'Reilly and Associates book "Managing NFS and NIS", the SunSoft Press book "Sun Performance and Tuning", or the "SMCC NFS Performance and Tuning Guide". Ordering info is in Section 7 of this PSD.

5.0 Patches

General Information on Patches

The following is the list of all of the NFS related patches for 4.1.3, 4.1.3_u1, 4.1.4, 2.3, 2.4, and 2.5. If you are having NFS problems, installing the patches is a good place to start, especially if you recognize the general symptoms noted below. In order for a machine to be stable, all of the recommended patches should be installed as well. The list of recommended patches for your operating system is available from sunsolve1.sun.com.

5.1: Core NFS Patches for SunOS 4.1.X

100173-13 SunOS 4.1.3: NFS Jumbo Patch
102177-04 SunOS 4.1.3_U1: NFS Jumbo Patch
102394-02 SunOS 4.1.4: NFS Jumbo Patch

  Resolve a large number of NFS problems. Should be installed on any
  machine doing NFS.

100988-05 SunOS 4.1.3: UFS File system and NFS locking Jumbo Patch
101784-04 SunOS 4.1.3_U1: rpc.lockd/rpc.statd jumbo patch
102516-05 SunOS 4.1.4: UFS File system Jumbo Patch

  Fixes a wide variety of rpc.lockd and rpc.statd problems.

102264-02 SunOS 4.1.4: rpc.lockd patch for assertion failed panic

  Fixes an "Assertion failed" panic related to the lockd.

103275-01 SunOS 4.1.4: System with heavy NFS load may crash due to IP driver bug

5.2: Patches Related to NFS for SunOS

100361-04 SunOS 4.1.1 4.1.2 4.1.3: server not responding due to limits of

  Resolves an error that could cause "NFS server not responding"
  errors on a machine that had more than 500 machines in its arp cache.
  Only a problem at sites with very large local nets.

101849-01 SunOS 4.1.3: rpc.quotad is very slow on busy NFS servers

  Speeds up slow rpc.quotads on NFS servers.

5.3: Core NFS Patches for Solaris

SOLARIS 2.3:

101318-81 SunOS 5.3: Jumbo patch for kernel (includes libc, lockd)

  Resolves a large number of problems involving both nfs and the
  lockd, as well as the related autofs program. Should be installed
  on any 5.3 machine, but is an absolute necessity on a machine doing
  NFS.

102654-01 SunOS 5.3: rmtab grows without bounds

  This patch solves problems where the mountd hangs up, but the nfsd
  continues to process NFS requests from existing NFS mounts.
103059-01 SunOS 5.3: automountd /dev rdev not in mnttab

  This patch fixes a variety of issues where the automounter loses
  entries from mnttab, often seen with lofs (loopback) mounts.

101930-01 SunOS 5.3: some files may not show up under cachefs

  This patch is required with the "autoclient" product, which is needed
  to cache the / and /usr file systems with cachefs.

102932-02 SunOS 5.3: statd dies intermittently

SOLARIS 2.4 and 2.4x86:

101945-42 SunOS 5.4: jumbo patch for kernel
101946-35 SunOS 5.4_x86: jumbo patch for kernel

  Resolves a large number of problems involving nfs, as well as the
  related autofs program. Should be installed on any 5.4 machine, but
  is an absolute necessity on a machine doing NFS.

102685-01 SunOS 5.4: lofs - causes problems with 400+ PC-NFS users

  This patch resolves some mountd hangs seen after sharing a lofs mount point.

101977-04 SunOS 5.4: lockd fixes
101978-03 SunOS 5.4_x86: lockd fixes

  Resolve various lockd error messages, as well as a lockd memory
  leak.

102216-07 SunOS 5.4: klmmod and rpcmod fixes

  Resolves problems with NFS file locking. It is needed whenever
  patching lockd.

102769-03 SunOS 5.4: statd requires enhancements in support of HADF

  This patch is generally needed in high availability server applications.

102209-01 SunOS 5.4: No way to cache the root and /usr file systems with CacheFS
102210-01 SunOS 5.4_x86: No way to cache root & /usr file systems with CacheFS

  This patch is required with the "autoclient" product, which is needed
  to cache the / and /usr file systems with cachefs.

102217-07 SunOS 5.4_x86: NFS client starts using unreserved UDP port numbers

  Resolves a problem specific to the x86 port of 5.4, which caused NFS
  clients to begin using unreserved ports. [look up bug 1179403]

SOLARIS 2.5 and 2.5x86:

103226-07 SunOS 5.5: /kernel/sys/nfs and /kernel/fs/nfs fixes
103227-06 SunOS 5.5_x86: /kernel: sys/nfs, fs/nfs & misc/nfssrv fixes

  This patch is needed for any Solaris 2.5 system with NFS.

103325-02 SunOS 5.5: mount causes the system to panic with a Data fault

  This patch also fixes some file locking problems in klmmod.

103477-02 SunOS 5.5: RPC: Unable to send/receive
103478-01 SunOS 5.5_x86: RPC: Unable to send/receive

SOLARIS 2.5.1, 2.5.1_x86, and 2.5.1_ppc:

103609-02 SunOS 5.5.1: RPC: Unable to send/receive
103611-01 SunOS 5.5.1_ppc: RPC: Unable to send/receive
103610-01 SunOS 5.5.1_x86: RPC: Unable to send/receive

5.4: Patches Related to NFS for Solaris

We STRONGLY recommend you install these patches, especially if you have had any problems with "NFS SERVER NOT RESPONDING":

SOLARIS 2.3:

101546-01 SunOS 5.3: nfs: multiple quota -v may not return info or too slow
101581-02 SunOS 5.3: quotaon/quotaoff/quotacheck fixes

  Resolve a problem that caused rquotad to hang on some NFS systems and
  resolve other quota issues.

101306-11 SunOS 5.3: Jumbo Patch for le & qe drivers

  This is a "must install" patch for systems with Ethernet.

102272-02 SunOS 5.3: Ethernet and ledmainit fixes

  Resolves dma problems with the le interface, a possible cause of NFS server hangs.

101734-03 SunOS 5.3: iommu fixes for sun4m

  Resolves iommu problems, mainly on the Sparc 5, a possible cause of NFS server hangs.

SOLARIS 2.4:

101973-23 SunOS 5.4: jumbo patch for libnsl and ypbind

  This patch resolves a variety of name service issues that can cause
  a 2.4 NFS server to not respond to certain requests. This is a "must have"
  patch.
102001-11 SunOS 5.4: le, qe, be Ethernet driver Jumbo Patch

  This is a "must install" patch for systems with Ethernet.

102332-01 SunOS 5.4: ledma fix

  Resolves dma problems with the le interface, a possible cause of NFS server hangs.

102038-02 SunOS 5.4: iommunex_dma_mctl Sparc 5 only

  Resolves iommu problems on the Sparc 5, a possible cause of NFS server hangs.

SOLARIS 2.5:

102979-02 SunOS 5.5: /kernel/drv/be, /kernel/drv/hme and /kernel/drv/qe fixes
103244-03 SunOS 5.5: Fix for le driver

6.0 Bugs and RFEs

Bugs and RFEs (Requests for Enhancement)

This section should be considered under construction and fairly dynamic.

Bugs:

1149389 - Under heavy load, a 2.3 NFS server may see the following errors:

   Oct 18 08:57:12 cobra unix: xdrmblk_getmblk failed
   Oct 18 08:57:12 cobra unix: NOTICE: nfs_server: bad getargs

There is no fix for this bug in 2.3. One case of this bug was fixed in the 2.4 FCS.

Note: it is possible that this bug is caused by UDP checksum errors from the clients. This is most often seen with SunOS and PC clients. Enable UDP checksumming as a potential workaround. Another workaround is to set rsize=1024,wsize=1024 on all of the NFS clients so that there are no UDP packet reassembly problems. In any case, the root cause is corruption of a UDP packet, or incorrect or non-existent creation of UDP checksums for requests from an NFS client.

See the automount Tips Sheet "PSD" for some further information about automount bugs.

1174737 - 2.4 NFS clients hang when logging in to an NFS mounted home directory, even though the NFS clients and server are "patched up".

Workaround: Upgrade the 2.4 NFS client to 2.5. Alternatively, make the ksh shell history file local by adding the following to /etc/profile on the NFS clients:

   export HISTFILE=/tmp/$LOGNAME

1222181 - 2.4's mountd allows an automount "lofs" mount point created by a loopback mount to be shared or exported. This bug has been known to cause mountd hangs on 2.4!!

RFEs:

To be investigated and added.

7.0 Documentation

7.1: Important Man Pages

  dfmounts              (Solaris only)
  dfshares              (Solaris only)
  exportfs
  exports
  lockd
  mnttab
  mount
  mountd
  nfs
  nfsd
  rmtab
  share                 (Solaris only)
  share_nfs             (Solaris only)
  shareall              (Solaris only)
  sharetab              (Solaris only)
  showmount
  statd
  xtab
  unshare               (Solaris only)

7.2 Sunsolve Documents

There are a huge number of Sunsolve documents related to NFS. The ones noted below are primarily those that have expanded information not in this document.

7.2.1 Sun Infodocs

2016      How does NFS work?

7.2.2 Sun FAQs

1025      nfs mounting from non-Solaris system fails

7.2.3 Sun SRDBs

3874      Getting "Stale NFS Handles" errors
4456      What is the procedure for optimizing the number of nfsds?
4726      error running exportfs: "Too many levels of remote in path"
4727      exportfs doesn't recog. netgroup root access exported dir.
4769      How to use rpcinfo program to troubleshoot RPC daemons?
4840      Secure NFS failing with authentication errors
5594      nfs mount fails with "not owner"
5925      NFS request from unprivileged port.
6682      quota -v returns no information on an NFS client
7334      rpc.lockd error - fcntl: error Stale NFS file handle
10609     diskless client boot gave nfs mount error 13
10946     NFS errors
11058     ls of file system mounted from ULTRIX server hangs on Solaris 2.4

7.3 Sun Educational Services

Sun Education provides general SunOS and Solaris network administration classes. In the USA, contact Sun Education at 1-800-422-8020 for a current catalog and set of course descriptions.

7.4: Solaris Documentation

_NFS Administration Guide_, Part #801-6634-10

  Information on how to set up, maintain and debug NFS and autofs.

_SMCC NFS Server Performance and Tuning Guide_, Part #802-5010-10

  A very good resource for analyzing and improving NFS performance on
  a Solaris server. The part number shown is for the Solaris 2.5 version
  of this manual.

7.5: Third Party Documentation

_Managing NFS and NIS_, by Hal Stern, published by O'Reilly & Associates, Inc, ISBN #0-937175-75-7

  The definitive source for managing NFS in a SunOS environment. Has a
  section on performance tuning that is quite helpful. Gives some
  information on the automounter as well. The underlying concepts are
  still the same for Solaris, but some of the commands and file names
  have changed.

_TCP/IP Network Administration_, by Craig Hunt, published by O'Reilly & Associates, Inc, ISBN #0-937175-82-X

  A good overview of TCP/IP, with a limited introduction to SunOS NFS.

7.6: RFCs

RFCs are the internet-written documents that define the specifications of many common networking programs. RFCs can be retrieved from nic.ddn.mil, in the /rfc directory.

1094  NFS: Network File System Protocol specification

  The official spec on the NFS protocol.

8.0 Supportability

SunService is not responsible for the initial configuration of your NFS environment. In addition, SunService can not diagnose your NFS performance problems or suggest NFS tuning guidelines. Consulting services are available from Sun to provide these services on a flat fee or per hour consulting rate. Contact your local Sun office for further information on those services.

We can help resolve problems where NFS is not behaving correctly, but in such cases the contact must be a system administrator for the machines involved, and SunService cannot guarantee a solution to problems involving non-Sun hosts.

PATCH ID: n/a
PRODUCT AREA: Gen. Network
PRODUCT: NFS
SUNOS RELEASE: any
UNBUNDLED RELEASE: n/a
HARDWARE: n/a