Design and Implementation of the Sun Network Filesystem

Russel Sandberg
David Goldberg
Steve Kleiman
Dan Walsh
Bob Lyon

Sun Microsystems, Inc.
2550 Garcia Ave.
Mountain View, CA 94110
(415) 960-7293
Introduction

The Sun Network Filesystem (NFS) provides transparent, remote access to filesystems. Unlike many other remote filesystem implementations under UNIX†, the NFS is designed to be easily portable to other operating systems and machine architectures. It uses an External Data Representation (XDR) specification to describe protocols in a machine and system independent way. The NFS is implemented on top of a Remote Procedure Call package (RPC) to help simplify protocol definition, implementation, and maintenance.

In order to build the NFS into the UNIX 4.2 kernel in a user transparent way, we decided to add a new interface to the kernel which separates generic filesystem operations from specific filesystem implementations. The filesystem interface consists of two parts: the Virtual File System (VFS) interface defines the operations that can be done on a filesystem, while the vnode interface defines the operations that can be done on a file within that filesystem. This new interface allows us to implement and install new filesystems in much the same way as new device drivers are added to the kernel.

In this paper we discuss the design and implementation of the filesystem interface in the kernel and the NFS virtual filesystem. We describe some interesting design issues and how they were resolved, and point out some of the shortcomings of the current implementation. We conclude with some ideas for future enhancements.
Design Goals

The NFS was designed to make sharing of filesystem resources in a network of non-homogeneous machines easier. Our goal was to provide a UNIX-like way of making remote files available to local programs without having to modify or even recompile those programs. In addition, we wanted remote file access to be comparable in speed to local file access.

The overall design goals of the NFS were:

Machine and Operating System Independence
    The protocols used should be independent of UNIX so that an NFS server can supply files to many different types of clients. The protocols should also be simple enough that they can be implemented on low end machines like the PC.

Crash Recovery
    When clients can mount remote filesystems from many different servers it is very important that clients be able to recover easily from server crashes.

Transparent Access
    We want to provide a system which allows programs to access remote files in exactly the same way as local files. No pathname parsing, no special libraries, no recompiling. Programs should not be able to tell whether a file is remote or local.

† UNIX is a trademark of Bell Laboratories.
`
`
`
`
UNIX Semantics Maintained on Client
    In order for transparent access to work on UNIX machines, UNIX filesystem semantics have to be maintained for remote files.

Reasonable Performance
    People will not want to use the NFS if it is no faster than the existing networking utilities, such as rcp, even if it is easier to use. Our design goal is to make NFS as fast as the Sun Network Disk protocol (ND*), or about 80% as fast as a local disk.

* ND, the Sun Network Disk Protocol, provides block-level access to remote, sub-partitioned disks.
Basic Design

The NFS design consists of three major pieces: the protocol, the server side, and the client side.
NFS Protocol

The NFS protocol uses the Sun Remote Procedure Call (RPC) mechanism. For the same reasons that procedure calls help simplify programs, RPC helps simplify the definition, organization, and implementation of remote services. The NFS protocol is defined in terms of a set of procedures, their arguments and results, and their effects. Remote procedure calls are synchronous, that is, the client blocks until the server has completed the call and returned the results. This makes RPC very easy to use since it behaves like a local procedure call.
The NFS uses a stateless protocol. The parameters to each procedure call contain all of the information necessary to complete the call, and the server does not keep track of any past requests. This makes crash recovery very easy: when a server crashes, the client resends NFS requests until a response is received, and the server does no crash recovery at all. When a client crashes, no recovery is necessary for either the client or the server. When state is maintained on the server, on the other hand, recovery is much harder. Both client and server need to be able to reliably detect crashes. The server needs to detect client crashes so that it can discard any state it is holding for the client, and the client must detect server crashes so that it can rebuild the server's state.

Using a stateless protocol allows us to avoid complex crash recovery and simplifies the protocol. If a client just resends requests until a response is received, data will never be lost due to a server crash. In fact, the client can not tell the difference between a server that has crashed and recovered, and a server that is slow.
Sun's remote procedure call package is designed to be transport independent. New transport protocols can be plugged in to the RPC implementation without affecting the higher level protocol code. The NFS uses the ARPA User Datagram Protocol (UDP) and Internet Protocol (IP) for its transport level. Since UDP is an unreliable datagram protocol, packets can get lost, but because the NFS protocol is stateless and the NFS requests are idempotent, the client can recover by retrying the call until the packet gets through.
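
Concretely, the retry logic implied by stateless, idempotent requests can be sketched as below. This is an illustrative sketch, not Sun's RPC client code; nfs_send, nfs_recv, and TIMEOUT are assumed names.

    /* Illustrative at-least-once retry over an unreliable datagram
     * transport; nfs_send() and nfs_recv() are hypothetical helpers. */
    struct reply *
    nfs_call(struct request *req)
    {
        struct reply *rep;

        for (;;) {
            nfs_send(req);              /* (re)transmit the call */
            rep = nfs_recv(TIMEOUT);    /* wait for any reply    */
            if (rep != NULL)
                return (rep);           /* server answered       */
            /* Timed out: the packet was lost, or the server is slow
             * or crashed.  Because requests are idempotent it is
             * always safe to send the same call again. */
        }
    }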
The most common NFS procedure parameter is a structure called a file handle (fhandle or fh) which is provided by the server and used by the client to reference a file. The fhandle is opaque, that is, the client never looks at the contents of the fhandle, but uses it when operations are done on that file.

An outline of the NFS protocol procedures is given below. For the complete specification see the Sun Network Filesystem Protocol Specification.
`
null() returns ()
    Do nothing procedure to ping the server and measure round trip time.

lookup(dirfh, name) returns (fh, attr)
    Returns a new fhandle and attributes for the named file in a directory.

create(dirfh, name, attr) returns (newfh, attr)
    Creates a new file and returns its fhandle and attributes.

remove(dirfh, name) returns (status)
    Removes a file from a directory.

getattr(fh) returns (attr)
    Returns file attributes. This procedure is like a stat call.

setattr(fh, attr) returns (attr)
    Sets the mode, uid, gid, size, access time, and modify time of a file. Setting the size to zero truncates the file.

read(fh, offset, count) returns (attr, data)
    Returns up to count bytes of data from a file starting offset bytes into the file. read also returns the attributes of the file.

write(fh, offset, count, data) returns (attr)
    Writes count bytes of data to a file beginning offset bytes from the beginning of the file. Returns the attributes of the file after the write takes place.

rename(dirfh, name, tofh, toname) returns (status)
    Renames the file name in the directory dirfh to toname in the directory tofh.

link(dirfh, name, tofh, toname) returns (status)
    Creates the file toname in the directory tofh, which is a link to the file name in the directory dirfh.

symlink(dirfh, name, string) returns (status)
    Creates a symbolic link name in the directory dirfh with value string. The server does not interpret the string argument in any way, just saves it and makes an association to the new symbolic link file.

readlink(fh) returns (string)
    Returns the string which is associated with the symbolic link file.

mkdir(dirfh, name, attr) returns (fh, newattr)
    Creates a new directory name in the directory dirfh and returns the new fhandle and attributes.

rmdir(dirfh, name) returns (status)
    Removes the empty directory name from the parent directory dirfh.

readdir(dirfh, cookie, count) returns (entries)
    Returns up to count bytes of directory entries from the directory dirfh. Each entry contains a file name, file id, and an opaque pointer to the next directory entry called a cookie. The cookie is used in subsequent readdir calls to start reading at a specific entry in the directory. A readdir call with the cookie of zero returns entries starting with the first entry in the directory.

statfs(fh) returns (fsstats)
    Returns filesystem information such as block size, number of free blocks, etc.
New fhandles are returned by the lookup, create, and mkdir procedures which also take an fhandle as an argument. The first remote fhandle, for the root of a filesystem, is obtained by the client using another RPC based protocol. The MOUNT protocol takes a directory pathname and returns an fhandle if the client has access permission to the filesystem which contains that directory. The reason for making this a separate protocol is that it makes it easier to plug in new filesystem access checking methods, and it separates out the operating system dependent aspects of the protocol. Note that the MOUNT protocol is the only place that UNIX pathnames are passed to the server. In other operating system implementations the MOUNT protocol can be replaced without having to change the NFS protocol.
The NFS protocol and RPC are built on top of an External Data Representation (XDR) specification. XDR defines the size, byte order, and alignment of basic data types such as string, integer, union, boolean, and array. Complex structures can be built from the basic data types. Using XDR not only makes protocols machine and language independent, it also makes them easy to define. The arguments and results of RPC procedures are defined using an XDR data definition language that looks a lot like C declarations.
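
To give the flavor of the data definition language, the read procedure's arguments and results might be declared roughly as follows. This is a sketch; the type and field names are illustrative assumptions, not the declarations of the published protocol specification.

    struct readargs {
        fhandle  file;          /* fhandle of the file to read       */
        unsigned offset;        /* byte offset to start reading at   */
        unsigned count;         /* maximum number of bytes to return */
    };

    struct readres {
        fattr  attributes;      /* file attributes after the read    */
        opaque data<MAXDATA>;   /* at most count bytes of file data  */
    };

The angle bracket notation declares a variable length opaque array, which is how XDR carries raw file data.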
`
Server Side

Because the NFS server is stateless, as mentioned above, when servicing an NFS request it must commit any modified data to stable storage before returning results. The implication for UNIX based servers is that requests which modify the filesystem must flush all modified data to disk before returning from the call. This means that, for example, on a write request not only the data block, but also any modified indirect blocks and the block containing the inode must be flushed if they have been modified.

Another modification to UNIX necessary to make the server work is the addition of a generation number in the inode, and a filesystem id in the superblock. These extra numbers make it possible for the server to use the inode number, inode generation number, and filesystem id together as the fhandle for a file. The inode generation number is necessary because the server may hand out an fhandle with an inode number of a file that is later removed and the inode reused. When the original fhandle comes back, the server must be able to tell that this inode number now refers to a different file. The generation number has to be incremented every time the inode is freed.
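
A minimal sketch of what a UNIX based server might pack into the opaque fhandle follows; the structure and field names are assumptions for illustration, since clients never interpret the contents.

    /* Illustrative fhandle contents on a UNIX based server.
     * The client sees only opaque bytes. */
    struct svc_fhandle {
        long fsid;      /* filesystem id from the superblock       */
        long fno;       /* inode number within that filesystem     */
        long fgen;      /* inode generation number, incremented    */
                        /* each time the inode is freed, so a      */
                        /* handle to a reused inode is detectable  */
    };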
`
Client Side

The client side provides the transparent interface to the NFS. To make transparent access to remote files work, we had to use a method of locating remote files that does not change the structure of path names. Some UNIX based remote file access schemes use host:path to name remote files. This does not allow real transparent access since existing programs that parse pathnames have to be modified.

Rather than doing a "late binding" of file address, we decided to do the hostname lookup and file address binding once per filesystem by allowing the client to attach a remote filesystem to a directory using the mount program. This method has the advantage that the client only has to deal with hostnames once, at mount time. It also allows the server to limit access to filesystems by checking client credentials. The disadvantage is that remote files are not available to the client until a mount is done.

Transparent access to different types of filesystems mounted on a single machine is provided by a new filesystems interface in the kernel. Each "filesystem type" supports two sets of operations: the Virtual Filesystem (VFS) interface defines the procedures that operate on the filesystem as a whole, and the Virtual Node (vnode) interface defines the procedures that operate on an individual file within that filesystem type. Figure 1 is a schematic diagram of the filesystem interface and how the NFS uses it.
`
[Figure 1: schematic diagram of the filesystem interface, showing the client and server kernels connected through the network.]
`
The Filesystem Interface

The VFS interface is implemented using a structure that contains the operations that can be done on a whole filesystem. Likewise, the vnode interface is a structure that contains the operations that can be done on a node (file or directory) within a filesystem. There is one VFS structure per mounted filesystem in the kernel and one vnode structure for each active node. Using this abstract data type implementation allows the kernel to treat all filesystems and nodes in the same way without knowing which underlying filesystem implementation it is using.

Each vnode contains a pointer to its parent VFS and a pointer to a mounted-on VFS. This means that any node in a filesystem tree can be a mount point for another filesystem. A root operation is provided in the VFS to return the root vnode of a mounted filesystem. This is used by the pathname traversal routines in the kernel to bridge mount points. The root operation is used instead of just keeping a pointer so that the root vnode for each mounted filesystem can be released. The VFS of a mounted filesystem also contains a back pointer to the vnode on which it is mounted so that pathnames that include ".." can also be traversed across mount points.

In addition to the VFS and vnode operations, each filesystem type must provide mount and mount_root operations to mount normal and root filesystems. The operations defined for the filesystem interface are:
`
Filesystem Operations
    mount(varies)                    System call to mount filesystem
    mount_root()                     Mount filesystem as root

VFS Operations
    unmount(vfs)                     Unmount filesystem
    root(vfs) returns (vnode)        Return the vnode of the filesystem root
    statfs(vfs) returns (fsstatbuf)  Return filesystem statistics
    sync(vfs)                        Flush delayed write blocks

Vnode Operations
    open(vnode, flags)               Mark file open
    close(vnode, flags)              Mark file closed
    rdwr(vnode, uio, rwflag, flags)  Read or write a file
    ioctl(vnode, cmd, data, rwflag)  Do I/O control operation
    select(vnode, rwflag)            Do select
    getattr(vnode) returns (attr)    Return file attributes
    setattr(vnode, attr)             Set file attributes
    access(vnode, mode)              Check access permission
    lookup(dvnode, name) returns (vnode)
                                     Look up file name in a directory
    create(dvnode, name, attr, excl, mode) returns (vnode)
                                     Create a file
    remove(dvnode, name)             Remove a file name from a directory
    link(vnode, todvnode, toname)    Link to a file
    rename(dvnode, name, todvnode, toname)
                                     Rename a file
    mkdir(dvnode, name, attr) returns (dvnode)
                                     Create a directory
    rmdir(dvnode, name)              Remove a directory
    readdir(dvnode) returns (entries)
                                     Read directory entries
    symlink(dvnode, name, attr, to_name)
                                     Create a symbolic link
    readlink(vp) returns (data)      Read the value of a symbolic link
    fsync(vnode)                     Flush dirty blocks of a file
    inactive(vnode)                  Mark vnode inactive and do clean up
    bmap(vnode, blk) returns (devnode, mappedblk)
                                     Map block number
    strategy(bp)                     Read and write filesystem blocks
    bread(vnode, blockno) returns (buf)
                                     Read a block
    brelse(vnode, buf)               Release a block buffer

Notice that many of the vnode procedures map one-to-one with NFS protocol procedures, while other, UNIX dependent procedures such as open, close, and ioctl do not. The bmap, strategy, bread, and brelse procedures are used to do reading and writing using the buffer cache.
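
As a rough C sketch of this operations-structure style of dispatch (illustrative declarations, not the actual kernel headers):

    /* Each filesystem type supplies one table of operations; the
     * kernel calls through the pointers without knowing whether the
     * implementation is local (4.2 filesystem) or remote (NFS). */
    struct vnodeops {
        int (*vn_open)();       /* mark file open                */
        int (*vn_close)();      /* mark file closed              */
        int (*vn_rdwr)();       /* read or write a file          */
        int (*vn_lookup)();     /* look up a name in a directory */
        /* ... one entry for each vnode operation above ... */
    };

    struct vnode {
        struct vfs      *v_vfsp;      /* VFS this vnode lives in     */
        struct vfs      *v_mounted;   /* VFS mounted here, if any    */
        struct vnodeops *v_op;        /* operations for this fs type */
        char            *v_data;      /* private per-filesystem data */
    };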
`
Pathname traversal is done in the kernel by breaking the path into directory components and doing a lookup call through the vnode for each component. At first glance it seems like a waste of time to pass only one component with each call instead of passing the whole path and receiving back a target vnode. The main reason for this is that any component of the path could be a mount point for another filesystem, and the mount information is kept above the vnode implementation level. In the NFS filesystem, passing whole pathnames would force the server to keep track of all of the mount points of its clients in order to determine where to break the pathname, and this would violate server statelessness. The inefficiency of looking up one component at a time is alleviated with a cache of directory vnodes.
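
Stripped of the directory vnode cache, ".." handling, and all error checking, the traversal loop looks roughly like the sketch below; next_component() is a hypothetical helper and the VOP_/VFS_ macro spellings are illustrative.

    /* Illustrative component-at-a-time pathname traversal. */
    struct vnode *
    traverse(struct vnode *dir, char *path)
    {
        char component[MAXNAMLEN + 1];
        struct vnode *vp = dir;

        while (next_component(&path, component)) {
            vp = VOP_LOOKUP(vp, component);   /* one lookup per component */
            if (vp->v_mounted != NULL)        /* crossed a mount point?   */
                vp = VFS_ROOT(vp->v_mounted); /* bridge to the mounted fs */
        }
        return (vp);
    }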
Implementation

Implementation of the NFS started in March 1984. The first step in the implementation was modification of the 4.2 kernel to include the filesystem interface. By June we had the first vnode kernel running. We did some benchmarks to test the amount of overhead added by the extra interface. It turned out that in most cases the difference was not measurable, and in the worst case the kernel had only slowed down by about 2%. Most of the work in adding the new interface was in finding and fixing all of the places in the kernel that used inodes directly, and code that contained implicit knowledge of inodes or disk layout.

Only a few of the filesystem routines in the kernel had to be completely rewritten to use vnodes. Namei, the routine that does pathname lookup, was changed to use the vnode lookup operation, and cleaned up so that it doesn't use global state. The direnter routine, which adds new directory entries (used by create, rename, etc.), also had to be fixed because it depended on the global state from namei. Direnter also had to be modified to do directory locking during directory rename operations, because inode locking is no longer available at this level and vnodes are never locked.

To avoid having a fixed upper limit on the number of active vnode and VFS structures, we added a memory allocator to the kernel so that these and other structures can be allocated and freed dynamically.
A new system call, getdirentries, was added to read directory entries from different types of filesystems. The 4.2 readdir library routine was modified to use the new system call so programs would not have to be rewritten. This change does, however, mean that programs that use readdir have to be relinked.
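
For illustration, a program (or the modified readdir library routine) might drive the new call in a loop of roughly this shape; the argument types are sketched from the description here rather than from the actual manual page.

    char buf[BUFSIZ];
    long base = 0;              /* opaque position cookie            */
    int  fd, nbytes;

    fd = open(".", 0);          /* 0 is O_RDONLY in 4.2 era code     */
    while ((nbytes = getdirentries(fd, buf, sizeof buf, &base)) > 0) {
        /* buf holds filesystem independent directory records;
         * step through them using their record length fields. */
    }
    close(fd);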
Beginning in March, the user level RPC and XDR libraries were ported to the kernel, and we were able to make kernel to user and kernel to kernel RPC calls in June. We worked on RPC performance for about a month until the round trip time for a kernel to kernel null RPC call was 8.8 milliseconds. The performance tuning included several speed ups to the UDP and IP code in the kernel.
Once RPC and the vnode kernel were in place, the implementation of NFS was simply a matter of writing the XDR routines to do the NFS protocol, implementing an RPC server for the NFS procedures in the kernel, and implementing a filesystem interface which translates vnode operations into NFS remote procedure calls. The first NFS kernel was up and running in mid August. At this point we had to make some modifications to the vnode interface to allow the NFS server to do synchronous write operations. This was necessary since unwritten blocks in the server's buffer cache are part of the "client's state".
Our first implementation of the MOUNT protocol was built into the NFS protocol. It wasn't until later that we broke the MOUNT protocol into a separate, user level RPC service. The MOUNT server is a user level daemon that is started automatically when a mount request comes in. It checks the file /etc/exports, which contains a list of exported filesystems and the clients that can import them. If the client has import permission, the mount daemon does a getfh system call to convert a pathname into an fhandle which is returned to the client.
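
The paper does not give the /etc/exports syntax, so purely as a hypothetical illustration, such a file might list each exported filesystem followed by the clients allowed to import it:

    /usr         panic fiat galaxy
    /usr/src     panic mercury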
On the client side, the mount command was modified to take additional arguments including a filesystem type and an options string. The filesystem type allows one mount command to mount any type of filesystem. The options string is used to pass optional flags to the different filesystem mount system calls. For example, the NFS allows two flavors of mount, soft and hard. A hard mounted filesystem will retry NFS calls forever if the server goes down, while a soft mount gives up after a while and returns an error. The problem with soft mounts is that most UNIX programs are not very good about checking return status from system calls, so you can get some strange behavior when servers go down. A hard mounted filesystem, on the other hand, will never fail due to a server crash; it may cause processes to hang for a while, but data will not be lost.
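
A hypothetical invocation, showing the shape of the extended command line rather than the exact flags of any particular release:

    mount -t nfs -o soft panic:/usr/panic /usr/panic

Here the filesystem type selects the NFS mount system call and the options string selects the soft flavor.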
`
`
`
In addition to the MOUNT server, we have added NFS server daemons. These are user level processes that make an nfsd system call into the kernel, and never return. This provides a user context to the kernel NFS server which allows the server to sleep. Similarly, the block I/O daemon on the client side is a user level process that lives in the kernel and services asynchronous block I/O requests. Because the RPC requests are blocking, a user context is necessary to wait for read-ahead and write-behind requests to complete. These daemons provide a temporary solution to the problem of handling parallel, synchronous requests in the kernel. In the future we hope to use a light-weight process mechanism in the kernel to handle these requests.

The NFS group started using the NFS in September, and spent the next six months working on performance enhancements and administrative tools to make the NFS easier to install and use. One of the advantages of the NFS was immediately obvious: as the df output below shows, a diskless workstation can have access to more than a Gigabyte of disk!
`
Filesystem            kbytes    used    avail  capacity  Mounted on
/dev/nd0                7445    5788      912     86%    /
/dev/ndp0               5691    2798     2323     65%    /pub
panic:/usr             27487   21398     3340     86%    /usr
fiat:/usr/src         345915  220122    91201     71%    /usr/src
panic:/usr/panic      148371  116505    17028     87%    /usr/panic
galaxy:/usr/galaxy      7429    5150     1536     77%    /usr/galaxy
mercury:/usr/mercury  301719  215179    56368     79%    /usr/mercury
opium:/usr/opium      327599   36392   258447     12%    /usr/opium
`
The Hard Issues

Several hard design issues were resolved during the development of the NFS. One of the toughest was deciding how we wanted to use the NFS. Lots of flexibility can lead to lots of confusion.

Root Filesystems

Our current NFS implementation does not allow shared NFS root filesystems. There are many hard problems associated with shared root filesystems that we just didn't have time to address. For example, many well-known, machine specific files are on the root filesystem, and too many programs use them. Also, sharing a root filesystem implies sharing /tmp and /dev. Sharing /tmp is a problem because programs create temporary files using their process id, which is not unique across machines. Sharing /dev requires a remote device access system. We considered allowing shared access to /dev by making operations on device nodes appear local. The problem with this simple solution is that many programs make special use of the ownership and permissions of device nodes.

Since every client has private storage (either real disk or ND) for the root filesystem, we were able to move machine specific files from shared filesystems into a new directory called /private, and replace those files with symbolic links. Things like /usr/lib/crontab and the whole directory /usr/adm have been moved. This allows clients to boot with only /etc and /bin executables local. The /usr and other filesystems are then remote mounted.

Filesystem Naming
`
Servers export whole filesystems, but clients can mount any sub-directory of a remote filesystem on top of a local filesystem, or on top of another remote filesystem. In fact, a remote filesystem can be mounted more than once, and can even be mounted on another copy of itself! This means that clients can have different "names" for filesystems by mounting them in different places.

To alleviate some of the confusion, we use a set of basic mounted filesystems on each machine and then let users add other filesystems on top of that. Remember, though, that this is just policy; there is no mechanism in the NFS to enforce it. User home directories are mounted on /usr/servername. This may seem like a violation of our goals because hostnames are now part of pathnames, but in fact the directories could have been called /usr/1, /usr/2, etc. Using server names is just a convenience. This scheme makes workstations look more like timesharing terminals because a user can log in to any workstation and her home directory will be there. It also makes tilde expansion (~username is expanded to the user's home directory) in the shell work in a network with many workstations.

To avoid the problems of loop detection and dynamic filesystem access checking, servers do not cross mount points on remote lookup requests. This means that in order to see the same filesystem layout as a server, a client has to remote mount each of the server's exported filesystems.
Credentials, Authentication and Security

We wanted to use UNIX style permission checking on the server and client so that UNIX users would see very little difference between remote and local files. RPC allows different authentication parameters to be "plugged-in" to the packet header of each call, so we were able to make the NFS use a UNIX flavor authenticator to pass uid, gid, and groups on each call. The server uses the authentication parameters to do permission checking as if the user making the call were doing the operation locally.

The problem with this authentication method is that the mapping from uid and gid to user must be the same on the server and client. This implies a flat uid, gid space over a whole local network. This is not acceptable in the long run, and we are working on different authentication schemes. In the mean time, we have developed another RPC based service called the Yellow Pages (YP) to provide a simple, replicated database lookup service. By letting YP handle /etc/passwd and /etc/group, we make the flat uid space much easier to administrate.

Another issue related to client authentication is super-user access to remote files. It is not clear that the super-user on a workstation should have root access to files on a server machine through the NFS. To solve this problem the server maps user root to user nobody before checking access permission. This solves the problem but, unfortunately, causes some strange behavior for users logged in as root, since root may have fewer access rights to a file than a normal user.

Remote root access also affects programs which are set-uid root and need access to remote user files, for example lpr. To make these programs more likely to succeed, we check on the client side for RPC calls that fail with EACCES and retry the call with the real-uid instead of the effective-uid. This is only done when the effective-uid is zero and the real-uid is something other than zero, so normal users are not affected.
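
In outline, that client side check might look like the following sketch; the credential helpers shown are hypothetical, not the actual kernel code.

    /* Retry a failed call with the real-uid, so that set-uid root
     * programs such as lpr are more likely to succeed remotely. */
    error = nfs_rpc_call(req, u.u_cred);        /* effective-uid creds */
    if (error == EACCES && u.u_cred->cr_uid == 0 && u.u_ruid != 0) {
        cred = crdup(u.u_cred);                 /* hypothetical copy   */
        cred->cr_uid = u.u_ruid;                /* substitute real-uid */
        error = nfs_rpc_call(req, cred);        /* one retry           */
    }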
`
While restricting super-user access helps to protect remote files, the super-user on a client machine can still gain access by using su to change her effective-uid to the uid of the owner of a remote file.
Concurrent Access and File Locking

The NFS does not support remote file locking. We purposely did not include this as part of the protocol because we could not find a set of locking facilities that everyone agrees is correct. Instead, we plan to build separate, RPC based file locking facilities. In this way people can use the locking facility with the flavor of their choice with minimal effort.

Related to the problem of file locking is concurrent access to remote files by multiple clients. In the local filesystem, file modifications are locked at the inode level. This prevents two processes writing to the same file from intermixing data on a single write. Since the server maintains no locks between requests, and a write may span several RPC requests, two clients writing to the same remote file may get intermixed data on long writes.
UNIX Open File Semantics

We tried very hard to make the NFS client obey UNIX filesystem semantics without modifying the server or the protocol. In some cases this was hard to do. For example, UNIX allows removal of open files. A process can open a file, then remove the directory entry for the file so that it has no name anywhere in the filesystem, and still read and write the file. This is a disgusting bit of UNIX trivia and at first we were just not going to support it, but it turns out that all of the programs that we didn't want to have to fix (csh, sendmail, etc.) use this for temporary files.

What we did to make open file removal work on remote files was check in the client VFS remove operation if the file is open, and if so rename it instead of removing it. This makes it (sort of) invisible to the client and still allows reading and writing. The client kernel then removes the new name when the vnode becomes inactive. We call this the 3/4 solution because if the client crashes between the rename and remove, a garbage file is left on the server. An entry to cron can be added to clean up on the server.
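
In outline, the client VFS remove operation behaves like the following sketch; the helper names and the generated hidden name are assumptions for illustration, not the actual code.

    /* Illustrative remove of a possibly open remote file. */
    int
    nfs_remove(struct vnode *dvp, char *name)
    {
        struct vnode *vp = VOP_LOOKUP(dvp, name);

        if (vnode_is_open(vp)) {             /* hypothetical check    */
            char hidden[MAXNAMLEN + 1];
            make_hidden_name(hidden);        /* unique temporary name */
            /* Rename instead of remove; the new name is deleted
             * later, when the vnode becomes inactive. */
            return (nfs_rename(dvp, name, dvp, hidden));
        }
        return (nfs_do_remove(dvp, name));
    }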
Another problem associated with remote, open files is that access permission on the file can change while the file is open. In the local case the access permission is only checked when the file is opened, but in the remote case permission is checked on every NFS call. This means that if a client program opens a file, then changes the permission bits so that it no longer has read permission, a subsequent read request will fail. To get around this problem, we save the client credentials in the file table at open time, and use them in later file access requests.
Not all of the UNIX open file semantics have been preserved, because interactions between two clients using the same remote file can not be controlled on a single client. For example, if one client opens a file and another client removes that file, the first client's read request will fail even though the file is still open.
Time Skew

Time skew between two clients, or a client and a server, can cause the time associated with a file to be inconsistent. For example, ranlib saves the current time in a library entry, and ld checks the modify time of the library against the time saved in the library. When ranlib is run on a remote file, the modify time comes from the server while the current time that gets saved in the library comes from the client. If the server's time is far ahead of the client's, it looks to ld like the library is out of date. There were only three programs that we found that were affected by this (ranlib, ls, and emacs), so we fixed them.

This is a potential problem for any program that compares system time to file modification time. We plan to fix this by limiting the time skew between machines with a time synchronization protocol.
Performance

The final hard issue is the one everyone is most interested in: performance.

Much of the time since the NFS first came up has been spent in improving performance. Our goal was to make NFS faster than the ND in the 1.1 Sun release (about 80% of the speed of a local disk). The speed we are interested in is not raw throughput, but how long it takes to do normal work.

set of benchm