Design and Implementation of the Sun Network Filesystem

Russel Sandberg
David Goldberg
Steve Kleiman
Dan Walsh
Bob Lyon

Sun Microsystems, Inc.
Mountain View, CA 94110
(415) 960-7293

Introduction

The Sun Network Filesystem (NFS) provides transparent, remote access to filesystems. Unlike many other remote filesystem implementations under UNIX†, the NFS is designed to be easily portable to other operating systems and machine architectures. It uses an External Data Representation (XDR) specification to describe protocols in a machine and system independent way. The NFS is implemented on top of a Remote Procedure Call package (RPC) to help simplify protocol definition, implementation, and maintenance.

In order to build the NFS into the UNIX 4.2 kernel in a user transparent way, we decided to add a new interface to the kernel which separates generic filesystem operations from specific filesystem implementations. The "filesystem interface" consists of two parts: the Virtual File System (VFS) interface defines the operations that can be done on a filesystem, while the vnode interface defines the operations that can be done on a file within that filesystem. This new interface allows us to implement and install new filesystems in much the same way as new device drivers are added to the kernel.

In this paper we discuss the design and implementation of the filesystem interface in the kernel and the NFS virtual filesystem. We describe some interesting design issues and how they were resolved, and point out some of the shortcomings of the current implementation. We conclude with some ideas for future enhancements.

Design Goals

The NFS was designed to make sharing of filesystem resources in a network of non-homogeneous machines easier. Our goal was to provide a UNIX-like way of making remote files available to local programs without having to modify, or even recompile, those programs. In addition, we wanted remote file access to be comparable in speed to local file access.

The overall design goals of the NFS were:

Machine and Operating System Independence
        The protocols used should be independent of UNIX so that an NFS server can supply files to many different types of clients. The protocols should also be simple enough that they can be implemented on low end machines like the PC.

Crash Recovery
        When clients can mount remote filesystems from many different servers it is very important that clients be able to recover easily from server crashes.

Transparent Access
        We want to provide a system which allows programs to access remote files in exactly the same way as local files. No pathname parsing, no special libraries, no recompiling. Programs should not be able to tell whether a file is remote or local.

UNIX Semantics Maintained on Client
        In order for transparent access to work on UNIX machines, UNIX filesystem semantics have to be maintained for remote files.

Reasonable Performance
        People will not want to use the NFS if it is no faster than the existing networking utilities, such as rcp, even if it is easier to use. Our design goal is to make NFS as fast as the Sun Network Disk protocol (ND‡), or about 80% as fast as a local disk.

† UNIX is a trademark of Bell Laboratories.
‡ ND, the Sun Network Disk Protocol, provides block-level access to remote, sub-partitioned disks.

Basic Design

The NFS design consists of three major pieces: the protocol, the server side and the client side.

NFS Protocol

The NFS protocol uses the Sun Remote Procedure Call (RPC) mechanism [1]. For the same reasons that procedure calls help simplify programs, RPC helps simplify the definition, organization, and implementation of remote services. The NFS protocol is defined in terms of a set of procedures, their arguments and results, and their effects. Remote procedure calls are synchronous, that is, the client blocks until the server has completed the call and returned the results. This makes RPC very easy to use since it behaves like a local procedure call.

The NFS uses a stateless protocol. The parameters to each procedure call contain all of the information necessary to complete the call, and the server does not keep track of any past requests. This makes crash recovery very easy; when a server crashes, the client resends NFS requests until a response is received, and the server does no crash recovery at all. When a client crashes no recovery is necessary for either the client or the server. When state is maintained on the server, on the other hand, recovery is much harder. Both client and server need to be able to reliably detect crashes. The server needs to detect client crashes so that it can discard any state it is holding for the client, and the client must detect server crashes so that it can rebuild the server's state.

Using a stateless protocol allows us to avoid complex crash recovery and simplifies the protocol. If a client just resends requests until a response is received, data will never be lost due to a server crash. In fact the client can not tell the difference between a server that has crashed and recovered, and a server that is slow.

Sun's remote procedure call package is designed to be transport independent. New transport protocols can be "plugged in" to the RPC implementation without affecting the higher level protocol code. The NFS uses the ARPA User Datagram Protocol (UDP) and Internet Protocol (IP) for its transport level. Since UDP is an unreliable datagram protocol, packets can get lost, but because the NFS protocol is stateless and the NFS requests are idempotent, the client can recover by retrying the call until the packet gets through.
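
As a rough sketch only (not the actual Sun implementation), the retry behavior of a client stub can be pictured as a loop that resends the same idempotent request over UDP until a reply arrives; the types, helper functions, and timeout value below are illustrative stand-ins:

    /* Sketch only: types and transport helpers are hypothetical stand-ins.       */
    struct nfs_request { int proc; char args[256]; };
    struct nfs_reply   { int status; char results[256]; };

    extern void send_udp(const struct nfs_request *req);          /* hypothetical */
    extern int  recv_udp_timeout(struct nfs_reply *rep, int ms);  /* hypothetical */

    struct nfs_reply
    call_nfs(struct nfs_request req)
    {
        struct nfs_reply reply;

        for (;;) {
            send_udp(&req);                     /* (re)send the same request      */
            if (recv_udp_timeout(&reply, 700))  /* wait a bounded time for reply  */
                return reply;
            /* No reply yet: the server may be slow, crashed, or the packet was   */
            /* lost.  Because NFS requests are idempotent, it is safe to simply   */
            /* send the same call again.                                          */
        }
    }

Because a retransmitted request that is executed twice on the server does no harm, the loop needs no duplicate detection or recovery logic of its own.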

The most common NFS procedure parameter is a structure called a file handle (fhandle or fh) which is provided by the server and used by the client to reference a file. The fhandle is opaque, that is, the client never looks at the contents of the fhandle, but uses it when operations are done on that file.

An outline of the NFS protocol procedures is given below. For the complete specification see the Sun Network Filesystem Protocol Specification [2].

null() returns ()
        Do nothing procedure to ping the server and measure round trip time.
lookup(dirfh, name) returns (fh, attr)
        Returns a new fhandle and attributes for the named file in a directory.
create(dirfh, name, attr) returns (newfh, attr)
        Creates a new file and returns its fhandle and attributes.
remove(dirfh, name) returns (status)
        Removes a file from a directory.
getattr(fh) returns (attr)
        Returns file attributes. This procedure is like a stat call.
setattr(fh, attr) returns (attr)
        Sets the mode, uid, gid, size, access time, and modify time of a file. Setting the size to zero truncates the file.
read(fh, offset, count) returns (attr, data)
        Returns up to count bytes of data from a file starting offset bytes into the file. read also returns the attributes of the file.
write(fh, offset, count, data) returns (attr)
        Writes count bytes of data to a file beginning offset bytes from the beginning of the file. Returns the attributes of the file after the write takes place.
rename(dirfh, name, tofh, toname) returns (status)
        Renames the file name in the directory dirfh, to toname in the directory tofh.
link(dirfh, name, tofh, toname) returns (status)
        Creates the file toname in the directory tofh, which is a link to the file name in the directory dirfh.
symlink(dirfh, name, string) returns (status)
        Creates a symbolic link name in the directory dirfh with value string. The server does not interpret the string argument in any way, just saves it and makes an association to the new symbolic link file.
readlink(fh) returns (string)
        Returns the string which is associated with the symbolic link file.
mkdir(dirfh, name, attr) returns (fh, newattr)
        Creates a new directory name in the directory dirfh and returns the new fhandle and attributes.
rmdir(dirfh, name) returns (status)
        Removes the empty directory name from the parent directory dirfh.
readdir(dirfh, cookie, count) returns (entries)
        Returns up to count bytes of directory entries from the directory dirfh. Each entry contains a file name, file id, and an opaque pointer to the next directory entry called a cookie. The cookie is used in subsequent readdir calls to start reading at a specific entry in the directory. A readdir call with the cookie of zero returns entries starting with the first entry in the directory.
statfs(fh) returns (fsstats)
        Returns filesystem information such as block size, number of free blocks, etc.

New fhandles are returned by the lookup, create, and mkdir procedures which also take an fhandle as an argument. The first remote fhandle, for the root of a filesystem, is obtained by the client using another RPC based protocol. The MOUNT protocol takes a directory pathname and returns an fhandle if the client has access permission to the filesystem which contains that directory. The reason for making this a separate protocol is that this makes it easier to plug in new filesystem access checking methods, and it separates out the operating system dependent aspects of the protocol. Note that the MOUNT protocol is the only place that UNIX pathnames are passed to the server. In other operating system implementations the MOUNT protocol can be replaced without having to change the NFS protocol.

The NFS protocol and RPC are built on top of an External Data Representation (XDR) specification [3]. XDR defines the size, byte order and alignment of basic data types such as string, integer, union, boolean and array. Complex structures can be built from the basic data types. Using XDR not only makes protocols machine and language independent, it also makes them easy to define. The arguments and results of RPC procedures are defined using an XDR data definition language that looks a lot like C declarations.
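
To give a feel for that style, here is a sketch of how the arguments and results of the read procedure might be written down; the field names and sizes are invented for this illustration and are not quoted from the protocol specification [2]:

    /* C-like data definition in the spirit of the XDR language (illustrative).   */
    typedef char fhandle[32];          /* opaque file handle, fixed size          */

    struct fattr {                     /* a few representative file attributes    */
        unsigned mode, uid, gid, size;
    };

    struct readargs {
        fhandle  file;                 /* which file to read                      */
        unsigned offset;               /* byte offset to start reading at         */
        unsigned count;                /* maximum number of bytes to return       */
    };

    struct readres {
        struct fattr attr;             /* attributes of the file after the read   */
        unsigned     len;              /* number of bytes actually returned       */
        char         data[8192];       /* the data (bounded for this sketch)      */
    };

Because the same definition is used to generate the encode and decode routines on every machine, both ends agree on size, byte order and alignment regardless of their native conventions.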

Server Side

Because the NFS server is stateless, as mentioned above, when servicing an NFS request it must commit any modified data to stable storage before returning results. The implication for UNIX based servers is that requests which modify the filesystem must flush all modified data to disk before returning from the call. This means that, for example on a write request, not only the data block, but also any modified indirect blocks and the block containing the inode must be flushed if they have been modified.

Another modification to UNIX necessary to make the server work is the addition of a generation number in the inode, and a filesystem id in the superblock. These extra numbers make it possible for the server to use the inode number, inode generation number, and filesystem id together as the fhandle for a file.
The inode generation number is necessary because the server may hand out an fhandle with an inode number of a file that is later removed and the inode reused. When the original fhandle comes back, the server must be able to tell that this inode number now refers to a different file. The generation number has to be incremented every time the inode is freed.
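
A minimal sketch of how such a handle could be composed on the server follows; the field names and widths are assumptions for illustration, not the actual Sun data structure:

    /* Illustrative composition of an opaque fhandle from on-disk identifiers.    */
    struct svc_fhandle {
        long fsid;        /* filesystem id stored in the superblock               */
        long inumber;     /* inode number within that filesystem                  */
        long generation;  /* incremented each time the inode is freed and reused  */
    };

When a request arrives, the server can locate the filesystem by fsid, fetch the inode by inumber, and reject the call as stale if the stored generation no longer matches the one in the handle.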

Client Side

The client side provides the transparent interface to the NFS. To make transparent access to remote files work we had to use a method of locating remote files that does not change the structure of path names. Some UNIX based remote file access schemes use host:path to name remote files. This does not allow real transparent access since existing programs that parse pathnames have to be modified.

Rather than doing a "late binding" of file address, we decided to do the hostname lookup and file address binding once per filesystem by allowing the client to attach a remote filesystem to a directory using the mount program. This method has the advantage that the client only has to deal with hostnames once, at mount time. It also allows the server to limit access to filesystems by checking client credentials. The disadvantage is that remote files are not available to the client until a mount is done.

Transparent access to different types of filesystems mounted on a single machine is provided by a new filesystem interface in the kernel. Each "filesystem type" supports two sets of operations: the Virtual Filesystem (VFS) interface defines the procedures that operate on the filesystem as a whole; and the Virtual Node (vnode) interface defines the procedures that operate on an individual file within that filesystem type. Figure 1 is a schematic diagram of the filesystem interface and how the NFS uses it.

[Figure 1. Schematic diagram of the filesystem interface: on both the client and the server, system calls enter a VNODE/VFS layer, beneath which sit the individual filesystem implementations (PC Filesystem, 4.2 Filesystem, NFS Filesystem).]

The Filesystem Interface

The VFS interface is implemented using a structure that contains the operations that can be done on a whole filesystem. Likewise, the vnode interface is a structure that contains the operations that can be done on a node (file or directory) within a filesystem.
There is one VFS structure per mounted filesystem in the kernel and one vnode structure for each active node. Using this abstract data type implementation allows the kernel to treat all filesystems and nodes in the same way without knowing which underlying filesystem implementation it is using.

Each vnode contains a pointer to its parent VFS and a pointer to a mounted-on VFS. This means that any node in a filesystem tree can be a mount point for another filesystem. A root operation is provided in the VFS to return the root vnode of a mounted filesystem. This is used by the pathname traversal routines in the kernel to bridge mount points. The root operation is used instead of just keeping a pointer so that the root vnode for each mounted filesystem can be released. The VFS of a mounted filesystem also contains a back pointer to the vnode on which it is mounted so that pathnames that include ".." can also be traversed across mount points.

In addition to the VFS and vnode operations, each filesystem type must provide mount and mount_root operations to mount normal and root filesystems. The operations defined for the filesystem interface are:

Filesystem Operations

        mount(varies)                   System call to mount filesystem
        mount_root()                    Mount filesystem as root

VFS Operations

        unmount(vfs)                    Unmount filesystem
        root(vfs) returns(vnode)        Return the vnode of the filesystem root
        statfs(vfs) returns(fsstatbuf)  Return filesystem statistics
        sync(vfs)                       Flush delayed write blocks

Vnode Operations

        open(vnode, flags)                      Mark file open
        close(vnode, flags)                     Mark file closed
        rdwr(vnode, uio, rwflag, flags)         Read or write a file
        ioctl(vnode, cmd, data, rwflag)         Do I/O control operation
        select(vnode, rwflag)                   Do select
        getattr(vnode) returns(attr)            Return file attributes
        setattr(vnode, attr)                    Set file attributes
        access(vnode, mode)                     Check access permission
        lookup(dvnode, name) returns(vnode)     Look up file name in a directory
        create(dvnode, name, attr, excl, mode) returns(vnode)  Create a file
        remove(dvnode, name)                    Remove a file name from a directory
        link(vnode, todvnode, toname)           Link to a file
        rename(dvnode, name, todvnode, toname)  Rename a file
        mkdir(dvnode, name, attr) returns(dvnode)  Create a directory
        rmdir(dvnode, name)                     Remove a directory
        readdir(dvnode) returns(entries)        Read directory entries
        symlink(dvnode, name, attr, to_name)    Create a symbolic link
        readlink(vp) returns(data)              Read the value of a symbolic link
        fsync(vnode)                            Flush dirty blocks of a file
        inactive(vnode)                         Mark vnode inactive and do clean up
        bmap(vnode, blk) returns(devnode, mappedblk)  Map block number
        strategy(bp)                            Read and write filesystem blocks
        bread(vnode, blockno) returns(buf)      Read a block
        brelse(vnode, buf)                      Release a block buffer

Notice that many of the vnode procedures map one-to-one with NFS protocol procedures, while other, UNIX dependent procedures such as open, close, and ioctl do not. The bmap, strategy, bread, and brelse procedures are used to do reading and writing using the buffer cache.
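
To make the "structure of operations" concrete, the sketch below shows one plausible shape for these tables of function pointers; the type and member names are illustrative, not the actual SunOS declarations:

    /* Illustrative sketch of per-filesystem-type operation vectors.              */
    struct vfs;

    struct vfsops {
        int (*vfs_mount)();       /* mount(varies)                                */
        int (*vfs_unmount)();     /* unmount(vfs)                                 */
        int (*vfs_root)();        /* root(vfs) returns(vnode)                     */
        int (*vfs_statfs)();      /* statfs(vfs) returns(fsstatbuf)               */
        int (*vfs_sync)();        /* sync(vfs)                                    */
    };

    struct vnodeops {
        int (*vn_open)();         /* open(vnode, flags)                           */
        int (*vn_close)();        /* close(vnode, flags)                          */
        int (*vn_rdwr)();         /* rdwr(vnode, uio, rwflag, flags)              */
        int (*vn_lookup)();       /* lookup(dvnode, name) returns(vnode)          */
        /* ... one member per vnode operation listed above ...                    */
    };

    struct vnode {
        struct vnodeops *v_op;             /* operations for this filesystem type */
        struct vfs      *v_vfsp;           /* parent VFS containing this node     */
        struct vfs      *v_vfsmountedhere; /* VFS mounted on this node, if any    */
        /* reference counts, node type, private data, etc. would follow here      */
    };

A 4.2 filesystem, an NFS filesystem, and a PC filesystem each supply their own filled-in vfsops and vnodeops tables, so the rest of the kernel can call, say, (*vp->v_op->vn_lookup)(...) without knowing which implementation is underneath.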

Pathname traversal is done in the kernel by breaking the path into directory components and doing a lookup call through the vnode for each component.
At first glance it seems like a waste of time to pass only one component with each call instead of passing the whole path and receiving back a target vnode. The main reason for this is that any component of the path could be a mount point for another filesystem, and the mount information is kept above the vnode implementation level. In the NFS filesystem, passing whole pathnames would force the server to keep track of all of the mount points of its clients in order to determine where to break the pathname, and this would violate server statelessness. The inefficiency of looking up one component at a time is alleviated with a cache of directory vnodes.
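
A minimal sketch of component-at-a-time traversal is given below; the struct layout and helper routines (next_component, vn_lookup, vfs_root) are invented stand-ins for the real kernel code:

    /* Illustrative: walk a path one component at a time through the vnode layer. */
    struct vfs;
    struct vnode {
        struct vfs *v_vfsmountedhere;   /* non-NULL if a filesystem is mounted here */
        /* ... operations vector, reference count, etc. ...                         */
    };

    extern int           next_component(char **pathp, char *namebuf); /* hypothetical */
    extern struct vnode *vn_lookup(struct vnode *dvp, char *name);    /* hypothetical */
    extern struct vnode *vfs_root(struct vfs *vfsp);                  /* hypothetical */

    struct vnode *
    walk_path(struct vnode *start, char *path)
    {
        struct vnode *vp = start;
        char component[256];

        while (next_component(&path, component)) {    /* peel off "usr", "src", ... */
            if (vp->v_vfsmountedhere != 0)            /* node is a mount point:     */
                vp = vfs_root(vp->v_vfsmountedhere);  /* cross into the mounted fs  */
            vp = vn_lookup(vp, component);            /* one lookup per component   */
            if (vp == 0)
                return 0;                             /* component does not exist   */
        }
        return vp;
    }

Because each lookup goes back through the vnode layer, mount point crossing stays in the client kernel, and an NFS server only ever sees single-component lookup requests.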

Implementation

Implementation of the NFS started in March 1984. The first step in the implementation was modification of the 4.2 kernel to include the filesystem interface. By June we had the first "vnode kernel" running. We did some benchmarks to test the amount of overhead added by the extra interface. It turned out that in most cases the difference was not measurable, and in the worst case the kernel had only slowed down by about 2%. Most of the work in adding the new interface was in finding and fixing all of the places in the kernel that used inodes directly, and code that contained implicit knowledge of inodes or disk layout.

Only a few of the filesystem routines in the kernel had to be completely rewritten to use vnodes. Namei, the routine that does pathname lookup, was changed to use the vnode lookup operation, and cleaned up so that it doesn't use global state. The direnter routine, which adds new directory entries (used by create, rename, etc.), also had to be fixed because it depended on the global state from namei. Direnter also had to be modified to do directory locking during directory rename operations because inode locking is no longer available at this level, and vnodes are never locked.

To avoid having a fixed upper limit on the number of active vnode and VFS structures we added a memory allocator to the kernel so that these and other structures can be allocated and freed dynamically.

A new system call, getdirentries, was added to read directory entries from different types of filesystems. The 4.2 readdir library routine was modified to use the new system call so programs would not have to be rewritten. This change does, however, mean that programs that use readdir have to be relinked.
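
A rough sketch of how a readdir library routine can sit on top of such a system call is shown below; the getdirentries signature and the simplified (non-reentrant) buffer management are assumptions for illustration, not the actual 4.2 library source:

    /* Illustrative user-level readdir built on a getdirentries-style system call. */
    #include <sys/types.h>
    #include <sys/dir.h>

    extern int getdirentries(int fd, char *buf, int nbytes, long *basep); /* assumed */

    static char dbuf[8192];    /* block of directory entries fetched from the kernel */
    static int  dlen, doff;    /* valid bytes in dbuf, current offset into it         */
    static long dbase;         /* position cookie maintained by the kernel            */

    struct direct *
    my_readdir(int fd)
    {
        struct direct *dp;

        if (doff >= dlen) {                     /* buffer exhausted: refill it        */
            dlen = getdirentries(fd, dbuf, sizeof dbuf, &dbase);
            doff = 0;
            if (dlen <= 0)
                return 0;                       /* end of directory, or an error      */
        }
        dp = (struct direct *)&dbuf[doff];
        doff += dp->d_reclen;                   /* advance to the next entry          */
        return dp;
    }

The point of funneling every filesystem type through one system call is that the same relinked readdir works whether the directory lives on a local 4.2 filesystem or on an NFS server.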

Beginning in March, the user level RPC and XDR libraries were ported to the kernel and we were able to make kernel to user and kernel to kernel RPC calls in June. We worked on RPC performance for about a month until the round trip time for a kernel to kernel null RPC call was 8.8 milliseconds. The performance tuning included several speed ups to the UDP and IP code in the kernel.

Once RPC and the vnode kernel were in place the implementation of NFS was simply a matter of writing the XDR routines to do the NFS protocol, implementing an RPC server for the NFS procedures in the kernel, and implementing a filesystem interface which translates vnode operations into NFS remote procedure calls. The first NFS kernel was up and running in mid August. At this point we had to make some modifications to the vnode interface to allow the NFS server to do synchronous write operations. This was necessary since unwritten blocks in the server's buffer cache are part of the "client's state".

Our first implementation of the MOUNT protocol was built into the NFS protocol. It wasn't until later that we broke the MOUNT protocol into a separate, user level RPC service. The MOUNT server is a user level daemon that is started automatically when a mount request comes in. It checks the file /etc/exports which contains a list of exported filesystems and the clients that can import them. If the client has import permission, the mount daemon does a getfh system call to convert a pathname into an fhandle which is returned to the client.
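
As an illustration of the kind of information /etc/exports carries, such a file might pair each exported filesystem with the clients allowed to import it; the hostnames and the exact line format here are made up for the example, not quoted from a manual page:

    # /etc/exports (illustrative format)
    /usr/src      panic fiat
    /usr/panic    panic
    /usr/galaxy   galaxy mercury opium

The mount daemon consults this list before handing out an fhandle, so the check of which clients may import a filesystem happens at mount time.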

On the client side, the mount command was modified to take additional arguments including a filesystem type and options string. The filesystem type allows one mount command to mount any type of filesystem. The options string is used to pass optional flags to the different filesystem mount system calls. For example, the NFS allows two flavors of mount, soft and hard. A hard mounted filesystem will retry NFS calls forever if the server goes down, while a soft mount gives up after a while and returns an error. The problem with soft mounts is that most UNIX programs are not very good about checking return status from system calls so you can get some strange behavior when servers go down. A hard mounted filesystem, on the other hand, will never fail due to a server crash; it may cause processes to hang for a while, but data will not be lost.
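
By way of illustration, the two flavors might be requested on the command line roughly as follows; the server name, paths, and exact option syntax are assumptions, not taken from the mount(8) manual of that release:

    # hard mount (illustrative syntax): retry NFS calls forever if the server is down
    mount -t nfs server:/usr/src /usr/src

    # soft mount (illustrative syntax): give up after a while and return an error
    mount -t nfs -o soft server:/usr/src /usr/src

Hard mounts suit filesystems holding data that must not be silently lost; soft mounts suit cases where an error return is preferable to a hung process.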

In addition to the MOUNT server, we have added NFS server daemons. These are user level processes that make an nfsd system call into the kernel, and never return. This provides a user context to the kernel NFS server which allows the server to sleep. Similarly, the block I/O daemon, on the client side, is a user level process that lives in the kernel and services asynchronous block I/O requests. Because the RPC requests are blocking, a user context is necessary to wait for read-ahead and write-behind requests to complete. These daemons provide a temporary solution to the problem of handling parallel, synchronous requests in the kernel. In the future we hope to use a light-weight process mechanism in the kernel to handle these requests [4].
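
The user level half of such a daemon is tiny, since all of the real work happens inside the kernel; a sketch follows, in which the nfsd() system call wrapper is an assumed name for illustration:

    /* Illustrative nfsd daemon: donate a user context to the kernel NFS server.  */
    #include <stdio.h>

    extern int nfsd(void);     /* assumed system call wrapper; does not return    */

    int
    main(void)
    {
        /* Several copies can be started so the kernel has several contexts that  */
        /* can block independently while servicing parallel requests.             */
        nfsd();                /* enter the kernel service loop, never returns    */

        fprintf(stderr, "nfsd: returned unexpectedly\n");
        return 1;
    }

Each running copy of the daemon is, in effect, one slot of concurrency for the kernel server until a light-weight process mechanism removes the need for the trick.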

The NFS group started using the NFS in September, and spent the next six months working on performance enhancements and administrative tools to make the NFS easier to install and use. One of the advantages of the NFS was immediately obvious; as the df output below shows, a diskless workstation can have access to more than a Gigabyte of disk!

    Filesystem              kbytes    used    avail  capacity  Mounted on
    /dev/nd0                  7445    5788      912       86%  /
    /dev/ndp0                 5691    2798     2323       55%  /pub
    panic:/usr               27487   21398     3340       86%  /usr
    fiat:/usr/src           345915  220122    91201       71%  /usr/src
    panic:/usr/panic        148371  116505    17028       87%  /usr/panic
    galaxy:/usr/galaxy        7429    5150     1536       77%  /usr/galaxy
    mercury:/usr/mercury    301719  215179    56368       79%  /usr/mercury
    opium:/usr/opium        327599   36392   258447       12%  /usr/opium

The Hard Issues

Several hard design issues were resolved during the development of the NFS. One of the toughest was deciding how we wanted to use the NFS. Lots of flexibility can lead to lots of confusion.

Root Filesystems

Our current NFS implementation does not allow shared NFS root filesystems. There are many hard problems associated with shared root filesystems that we just didn't have time to address. For example, many well-known, machine specific files are on the root filesystem, and too many programs use them. Also, sharing a root filesystem implies sharing /tmp and /dev. Sharing /tmp is a problem because programs create temporary files using their process id, which is not unique across machines. Sharing /dev requires a remote device access system. We considered allowing shared access to /dev by making operations on device nodes appear local. The problem with this simple solution is that many programs make special use of the ownership and permissions of device nodes.

Since every client has private storage (either real disk or ND) for the root filesystem, we were able to move machine specific files from shared filesystems into a new directory called /private, and replace those files with symbolic links. Things like /usr/lib/crontab and the whole directory /usr/adm have been moved. This allows clients to boot with only /etc and /bin executables local. The /usr and other filesystems are then remote mounted.
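
The pattern is simple enough to show in two commands; the destination layout under /private is illustrative rather than the exact layout used at Sun:

    # move a machine specific file to private storage and leave a link behind
    mv /usr/lib/crontab /private/usr/lib/crontab
    ln -s /private/usr/lib/crontab /usr/lib/crontab

Programs keep opening the old pathname; each client's symbolic link resolves to its own private copy, so the same shared /usr image can serve many clients.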

Filesystem Naming

Servers export whole filesystems, but clients can mount any sub-directory of a remote filesystem on top of a local filesystem, or on top of another remote filesystem. In fact, a remote filesystem can be mounted more than once, and can even be mounted on another copy of itself! This means that clients can have different "names" for filesystems by mounting them in different places.

To alleviate some of the confusion we use a set of basic mounted filesystems on each machine and then let users add other filesystems on top of that. Remember though that this is just policy, there is no mechanism in the NFS to enforce this. User home directories are mounted on /usr/servername. This may seem like a violation of our goals because hostnames are now part of pathnames but in fact the directories could have been called /usr/1, /usr/2, etc. Using server names is just a convenience. This scheme makes workstations look more like timesharing terminals because a user can log in to any workstation and her home directory will be there. It also makes tilde expansion (~username is expanded to the user's home directory) in the C shell work in a network with many workstations.

To avoid the problems of loop detection and dynamic filesystem access checking, servers do not cross mount points on remote lookup requests. This means that in order to see the same filesystem layout as a server, a client has to remote mount each of the server's exported filesystems.

Credentials, Authentication and Security

We wanted to use UNIX style permission checking on the server and client so that UNIX users would see very little difference between remote and local files. RPC allows different authentication parameters to be "plugged-in" to the packet header of each call so we were able to make the NFS use a UNIX flavor authenticator to pass uid, gid, and groups on each call. The server uses the authentication parameters to do permission checking as if the user making the call were doing the operation locally.

The problem with this authentication method is that the mapping from uid and gid to user must be the same on the server and client. This implies a flat uid, gid space over a whole local network. This is not acceptable in the long run and we are working on different authentication schemes. In the mean time, we have developed another RPC based service called the Yellow Pages (YP) to provide a simple, replicated database lookup service [5]. By letting YP handle /etc/passwd and /etc/group we make the flat uid space much easier to administrate.

Another issue related to client authentication is super-user access to remote files. It is not clear that the super-user on a workstation should have root access to files on a server machine through the NFS. To solve this problem the server maps user root (uid 0) to user nobody (uid -2) before checking access permission. This solves the problem but, unfortunately, causes some strange behavior for users logged in as root, since root may have fewer access rights to a file than a normal user.

Remote root access also affects programs which are set-uid root and need access to remote user files, for example lpr. To make these programs more likely to succeed we check on the client side for RPC calls that fail with EACCES and retry the call with the real-uid instead of the effective-uid. This is only done when the effective-uid is zero and the real-uid is something other than zero so normal users are not affected.
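
A minimal sketch of that client side check follows; the credential fields and the rpc_call() helper are illustrative names, not the kernel's actual ones:

    /* Illustrative retry of a failed call using the real uid of the caller.      */
    #include <errno.h>

    struct cred { int real_uid; int effective_uid; };

    extern int rpc_call(int proc, void *args, int as_uid);        /* hypothetical */

    int
    nfs_call_with_retry(int proc, void *args, struct cred *cr)
    {
        int err = rpc_call(proc, args, cr->effective_uid);

        /* Only set-uid-root programs get the second chance: effective uid 0,     */
        /* real uid non-zero, and the server refused the first attempt.           */
        if (err == EACCES && cr->effective_uid == 0 && cr->real_uid != 0)
            err = rpc_call(proc, args, cr->real_uid);
        return err;
    }

This keeps programs like lpr working against a server that maps root to nobody, without changing the behavior seen by ordinary users.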

While restricting super-user access helps to protect remote files, the super-user on a client machine can still gain access by using su to change her effective-uid to the uid of the owner of a remote file.

Concurrent Access and File Locking

The NFS does not support remote file locking. We purposely did not include this as part of the protocol because we could not find a set of locking facilities that everyone agrees is correct. Instead we plan to build separate, RPC based file locking facilities. In this way people can use the locking facility with the flavor of their choice with minimal effort.

Related to the problem of file locking is concurrent access to remote files by multiple clients. In the local filesystem, file modifications are locked at the inode level. This prevents two processes writing to the same file from intermixing data on a single write. Since the server maintains no locks between requests, and a write may span several RPC requests, two clients writing to the same remote file may get intermixed data on long writes.

UNIX Open File Semantics

We tried very hard to make the NFS client obey UNIX filesystem semantics without modifying the server or the protocol. In some cases this was hard to do. For example, UNIX allows removal of open files. A process can open a file, then remove the directory entry for the file so that it has no name anywhere in the filesystem, and still read and write the file. This is a disgusting bit of UNIX trivia and at first we were just not going to support it, but it turns out that all of the programs that we didn't want to have to fix (csh, sendmail, etc.) use this for temporary files.

What we did to make open file removal work on remote files was check in the client VFS remove operation if the file is open, and if so rename it instead of removing it. This makes it (sort of) invisible to the client and still allows reading and writing. The client kernel then removes the new name when the vnode becomes inactive. We call this the 3/4 solution because if the client crashes between the rename and remove a garbage file is left on the server. An entry to cron can be added to clean up on the server.
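
A sketch of the idea on the client side is given below; the hidden-name convention and the helper names are illustrative stand-ins, not the actual NFS vnode code:

    /* Illustrative client-side remove that preserves access to open files.       */
    struct vnode;

    extern int  vn_is_open(struct vnode *vp);                      /* hypothetical */
    extern void make_hidden_name(char *buf, int len);              /* hypothetical */
    extern int  nfs_rename(struct vnode *dvp, char *from, char *to);
    extern int  nfs_remove(struct vnode *dvp, char *name);

    int
    client_remove(struct vnode *dvp, char *name, struct vnode *vp)
    {
        char hidden[32];

        if (vn_is_open(vp)) {
            /* Invent a name unlikely to collide; remember it so the vnode's      */
            /* inactive routine can remove it when the last user closes the file. */
            make_hidden_name(hidden, sizeof hidden);
            return nfs_rename(dvp, name, hidden);
        }
        return nfs_remove(dvp, name);
    }

If the client crashes between the rename and the eventual remove, the hidden file survives on the server, which is exactly the garbage the cron entry mentioned above is there to sweep up.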

Another problem associated with remote, open files is that access permission on the file can change while the file is open. In the local case the access permission is only checked when the file is opened, but in the remote case permission is checked on every NFS call.

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket