"Appeared at the USENIX Conference & Exhibition, Portland, Oregon, Summer 1985"

Design and Implementation of the Sun Network Filesystem

Russel Sandberg
David Goldberg
Steve Kleiman
Dan Walsh
Bob Lyon

Sun Microsystems, Inc.
2550 Garcia Ave.
Mountain View, CA 94110
(415) 960-7293

Introduction
The Sun Network Filesystem (NFS) provides transparent, remote access to filesystems. Unlike
many other remote filesystem implementations under UNIX†, the NFS is designed to be easily
portable to other operating systems and machine architectures. It uses an External Data
Representation (XDR) specification to describe protocols in a machine and system independent
way. The NFS is implemented on top of a Remote Procedure Call package (RPC) to help
simplify protocol definition, implementation, and maintenance.
In order to build the NFS into the UNIX 4.2 kernel in a user transparent way, we decided to add
a new interface to the kernel which separates generic filesystem operations from specific
filesystem implementations. The "filesystem interface" consists of two parts: the Virtual File
System (VFS) interface defines the operations that can be done on a filesystem, while the vnode
interface defines the operations that can be done on a file within that filesystem. This new
interface allows us to implement and install new filesystems in much the same way as new device
drivers are added to the kernel.
In this paper we discuss the design and implementation of the filesystem interface in the kernel
and the NFS virtual filesystem. We describe some interesting design issues and how they were
resolved, and point out some of the shortcomings of the current implementation. We conclude
with some ideas for future enhancements.
Design Goals
The NFS was designed to make sharing of filesystem resources in a network of non-homogeneous
machines easier. Our goal was to provide a UNIX-like way of making remote files available to
local programs without having to modify, or even recompile, those programs. In addition, we
wanted remote file access to be comparable in speed to local file access.
The overall design goals of the NFS were:
Machine and Operating System Independence
        The protocols used should be independent of UNIX so that an NFS server can
        supply files to many different types of clients. The protocols should also be
        simple enough that they can be implemented on low end machines like the PC.
Crash Recovery
        When clients can mount remote filesystems from many different servers it is
        very important that clients be able to recover easily from server crashes.
Transparent Access
        We want to provide a system which allows programs to access remote files in
        exactly the same way as local files. No pathname parsing, no special libraries,
        no recompiling. Programs should not be able to tell whether a file is remote or
        local.
--------
† UNIX is a trademark of Bell Laboratories.
UNIX Semantics Maintained on Client
        In order for transparent access to work on UNIX machines, UNIX filesystem
        semantics have to be maintained for remote files.
Reasonable Performance
        People will not want to use the NFS if it is no faster than the existing networking
        utilities, such as rcp, even if it is easier to use. Our design goal is to make NFS
        as fast as the Sun Network Disk protocol (ND¹), or about 80% as fast as a
        local disk.
Basic Design
The NFS design consists of three major pieces: the protocol, the server side and the client side.
NFS Protocol
The NFS protocol uses the Sun Remote Procedure Call (RPC) mechanism [1]. For the same
reasons that procedure calls help simplify programs, RPC helps simplify the definition,
organization, and implementation of remote services. The NFS protocol is defined in terms of a
set of procedures, their arguments and results, and their effects. Remote procedure calls are
synchronous, that is, the client blocks until the server has completed the call and returned the
results. This makes RPC very easy to use since it behaves like a local procedure call.
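
Since the transport details are hidden by the RPC library, a remote call reduces to a single
blocking function call on the client. As a concrete illustration (a sketch, not code from the
paper; the program and version numbers are invented placeholders), a user-level ping of a
server using the Sun RPC library's callrpc routine might look like:

    /*
     * A minimal sketch, assuming the Sun RPC library's callrpc()
     * interface.  EXAMPLE_PROG and EXAMPLE_VERS are hypothetical
     * placeholders, not NFS's actual program numbers.
     */
    #include <rpc/rpc.h>

    #define EXAMPLE_PROG    0x20000001      /* hypothetical program number */
    #define EXAMPLE_VERS    1               /* hypothetical version */

    int
    ping_server(host)
            char *host;
    {
            enum clnt_stat stat;

            /* blocks until the server replies or the call times out */
            stat = callrpc(host, EXAMPLE_PROG, EXAMPLE_VERS, NULLPROC,
                           xdr_void, (char *)0, xdr_void, (char *)0);
            if (stat != RPC_SUCCESS)
                    clnt_perrno(stat);      /* report the RPC error */
            return (stat == RPC_SUCCESS ? 0 : -1);
    }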
The NFS uses a stateless protocol. The parameters to each procedure call contain all of the
information necessary to complete the call, and the server does not keep track of any past
requests. This makes crash recovery very easy; when a server crashes, the client resends NFS
requests until a response is received, and the server does no crash recovery at all. When a client
crashes no recovery is necessary for either the client or the server. When state is maintained on
the server, on the other hand, recovery is much harder. Both client and server need to be able to
reliably detect crashes. The server needs to detect client crashes so that it can discard any state it
is holding for the client, and the client must detect server crashes so that it can rebuild the
server's state.
Using a stateless protocol allows us to avoid complex crash recovery and simplifies the protocol.
If a client just resends requests until a response is received, data will never be lost due to a server
crash. In fact the client can not tell the difference between a server that has crashed and
recovered, and a server that is slow.
Sun's remote procedure call package is designed to be transport independent. New transport
protocols can be "plugged in" to the RPC implementation without affecting the higher level
protocol code. The NFS uses the ARPA User Datagram Protocol (UDP) and Internet Protocol
(IP) for its transport level. Since UDP is an unreliable datagram protocol, packets can get lost,
but because the NFS protocol is stateless and the NFS requests are idempotent, the client can
recover by retrying the call until the packet gets through.
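
Crash recovery on the client therefore collapses to a retransmission loop. A minimal sketch,
assuming hypothetical helper routines (send_request and await_reply are invented names, not
part of the implementation):

    /*
     * Illustrative client retry loop (a sketch, not NFS source).
     * Because NFS requests are idempotent, re-executing one on the
     * server is harmless, so the client just resends until a reply
     * arrives; this is the entire crash recovery mechanism.
     */
    struct nfsmsg;                      /* opaque request/reply (assumed) */
    extern int send_request();          /* assumed helper */
    extern int await_reply();           /* assumed helper, with timeout */

    void
    do_call(sock, req, reply)
            int sock;
            struct nfsmsg *req, *reply;
    {
            for (;;) {
                    send_request(sock, req);        /* send the UDP datagram */
                    if (await_reply(sock, reply, 1) == 0)
                            return;                 /* got an answer */
                    /* timed out: slow server, crash, or lost packet; resend */
            }
    }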
The most common NFS procedure parameter is a structure called a file handle (fhandle or fh)
which is provided by the server and used by the client to reference a file. The fhandle is opaque,
that is, the client never looks at the contents of the fhandle, but uses it when operations are done
on that file.
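
Concretely, the client can treat the fhandle as nothing more than a fixed-size block of bytes.
A sketch of such an opaque declaration (the 32-byte size is our assumption for illustration):

    /*
     * Sketch of an opaque file handle as the client sees it.  The
     * client never interprets the contents; it only passes them
     * back to the server on later calls.
     */
    #define FHSIZE 32                   /* assumed handle size */

    struct fhandle {
            char fh_data[FHSIZE];       /* meaningful only to the server */
    };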
An outline of the NFS protocol procedures is given below. For the complete specification see the
Sun Network Filesystem Protocol Specification [2].
null() returns ()
        Do nothing procedure to ping the server and measure round trip time.
lookup(dirfh, name) returns (fh, attr)
        Returns a new fhandle and attributes for the named file in a directory.
create(dirfh, name, attr) returns (newfh, attr)
        Creates a new file and returns its fhandle and attributes.
remove(dirfh, name) returns (status)
        Removes a file from a directory.
getattr(fh) returns (attr)
        Returns file attributes. This procedure is like a stat call.
¹ ND, the Sun Network Disk Protocol, provides block-level access to remote, sub-partitioned disks.
setattr(fh, attr) returns (attr)
        Sets the mode, uid, gid, size, access time, and modify time of a file. Setting the size to
        zero truncates the file.
read(fh, offset, count) returns (attr, data)
        Returns up to count bytes of data from a file starting offset bytes into the file. read also
        returns the attributes of the file.
write(fh, offset, count, data) returns (attr)
        Writes count bytes of data to a file beginning offset bytes from the beginning of the file.
        Returns the attributes of the file after the write takes place.
rename(dirfh, name, tofh, toname) returns (status)
        Renames the file name in the directory dirfh, to toname in the directory tofh.
link(dirfh, name, tofh, toname) returns (status)
        Creates the file toname in the directory tofh, which is a link to the file name in the
        directory dirfh.
symlink(dirfh, name, string) returns (status)
        Creates a symbolic link name in the directory dirfh with value string. The server does not
        interpret the string argument in any way, just saves it and makes an association to the new
        symbolic link file.
readlink(fh) returns (string)
        Returns the string which is associated with the symbolic link file.
mkdir(dirfh, name, attr) returns (fh, newattr)
        Creates a new directory name in the directory dirfh and returns the new fhandle and
        attributes.
rmdir(dirfh, name) returns (status)
        Removes the empty directory name from the parent directory dirfh.
readdir(dirfh, cookie, count) returns (entries)
        Returns up to count bytes of directory entries from the directory dirfh. Each entry contains
        a file name, file id, and an opaque pointer to the next directory entry called a cookie. The
        cookie is used in subsequent readdir calls to start reading at a specific entry in the
        directory. A readdir call with the cookie of zero returns entries starting with the first
        entry in the directory.
statfs(fh) returns (fsstats)
        Returns filesystem information such as block size, number of free blocks, etc.
New fhandles are returned by the lookup, create, and mkdir procedures which also take an
fhandle as an argument. The first remote fhandle, for the root of a filesystem, is obtained by the
client using another RPC based protocol. The MOUNT protocol takes a directory pathname and
returns an fhandle if the client has access permission to the filesystem which contains that
directory. The reason for making this a separate protocol is that this makes it easier to plug in
new filesystem access checking methods, and it separates out the operating system dependent
aspects of the protocol. Note that the MOUNT protocol is the only place that UNIX pathnames
are passed to the server. In other operating system implementations the MOUNT protocol can
be replaced without having to change the NFS protocol.
The NFS protocol and RPC are built on top of an External Data Representation (XDR)
specification [3]. XDR defines the size, byte order and alignment of basic data types such as
string, integer, union, boolean and array. Complex structures can be built from the basic data
types. Using XDR not only makes protocols machine and language independent, it also makes
them easy to define. The arguments and results of RPC procedures are defined using an XDR
data definition language that looks a lot like C declarations.
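
For example, the arguments and results of the read procedure above might be written in such
a language roughly as follows (a sketch in the XDR style, not text quoted from the
specification; fattr stands for the file attribute structure defined elsewhere in the protocol):

    /* Sketch of XDR-style (C-like) declarations for the read call.
       Field and type names are illustrative, not quoted from the
       protocol specification. */
    typedef opaque fhandle[32];     /* opaque file handle */

    struct readargs {
            fhandle  file;          /* which file to read */
            unsigned offset;        /* byte offset into the file */
            unsigned count;         /* maximum bytes to return */
    };

    struct readres {
            fattr    attributes;    /* file attributes after the read */
            opaque   data<>;        /* up to count bytes of data */
    };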
Server Side
Because the NFS server is stateless, as mentioned above, when servicing an NFS request it must
commit any modified data to stable storage before returning results. The implication for UNIX
based servers is that requests which modify the filesystem must flush all modified data to disk
before returning from the call. This means that, for example on a write request, not only the
data block, but also any modified indirect blocks and the block containing the inode must be
flushed if they have been modified.
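
In 4.2BSD buffer cache terms, the server must use write-through rather than delayed writes.
A sketch of the idea (kernel context assumed; this is not the actual server code):

    /*
     * Sketch only: bwrite() does not return until the block is on
     * disk, while bdwrite() merely schedules a delayed write and
     * would leave the client's data in the volatile buffer cache,
     * violating statelessness.
     */
    nfs_write_block(dev, blkno, bsize, boff, data, len)
            dev_t dev;
            daddr_t blkno;
            int bsize, boff, len;
            char *data;
    {
            struct buf *bp;

            bp = bread(dev, blkno, bsize);            /* fetch the block */
            bcopy(data, bp->b_un.b_addr + boff, len); /* apply the write */
            bwrite(bp);                               /* wait for disk I/O */
    }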
Another modification to UNIX necessary to make the server work is the addition of a generation
number in the inode, and a filesystem id in the superblock. These extra numbers make it
possible for the server to use the inode number, inode generation number, and filesystem id
together as the fhandle for a file. The inode generation number is necessary because the server
may hand out an fhandle with an inode number of a file that is later removed and the inode
reused. When the original fhandle comes back, the server must be able to tell that this inode
number now refers to a different file. The generation number has to be incremented every time
the inode is freed.
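
Put together, the server-side view of the opaque fhandle bytes is roughly the following (field
names and layout are our assumptions for illustration, not the actual source):

    /*
     * Sketch of what the server packs into an fhandle; any unused
     * bytes of the fixed-size handle would be padding.
     */
    struct svr_fhandle {
            long    fh_fsid;    /* filesystem id from the superblock */
            ino_t   fh_ino;     /* inode number within that filesystem */
            long    fh_gen;     /* inode generation number: detects an
                                   fhandle that has outlived its file */
    };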
Client Side
The client side provides the transparent interface to the NFS. To make transparent access to
remote files work we had to use a method of locating remote files that does not change the
structure of path names. Some UNIX based remote file access schemes use host:path to name
remote files. This does not allow real transparent access since existing programs that parse
pathnames have to be modified.
Rather than doing a "late binding" of file address, we decided to do the hostname lookup and
file address binding once per filesystem by allowing the client to attach a remote filesystem to a
directory using the mount program. This method has the advantage that the client only has to
deal with hostnames once, at mount time. It also allows the server to limit access to filesystems
by checking client credentials. The disadvantage is that remote files are not available to the
client until a mount is done.
Transparent access to different types of filesystems mounted on a single machine is provided by a
new filesystems interface in the kernel. Each "filesystem type" supports two sets of operations:
the Virtual Filesystem (VFS) interface defines the procedures that operate on the filesystem as a
whole; and the Virtual Node (vnode) interface defines the procedures that operate on an
individual file within that filesystem type. Figure 1 is a schematic diagram of the filesystem
interface and how the NFS uses it.

[Figure 1: The Filesystem Interface. On the client, system calls enter the
VNODE/VFS layer, which dispatches to the PC Filesystem (floppy), the 4.2
Filesystem (disk), or the NFS Filesystem, which goes through RPC/XDR to the
network. On the server, requests arrive from the network through RPC/XDR to
the server routines, which call through the VNODE/VFS layer into the 4.2
Filesystem.]

The VFS interface is implemented using a structure that contains the operations that can be done
on a whole filesystem. Likewise, the vnode interface is a structure that contains the operations
that can be done on a node (file or directory) within a filesystem. There is one VFS structure per
mounted filesystem in the kernel and one vnode structure for each active node. Using this
abstract data type implementation allows the kernel to treat all filesystems and nodes in the same
way without knowing which underlying filesystem implementation it is using.
Each vnode contains a pointer to its parent VFS and a pointer to a mounted-on VFS. This
means that any node in a filesystem tree can be a mount point for another filesystem. A root
operation is provided in the VFS to return the root vnode of a mounted filesystem. This is used
by the pathname traversal routines in the kernel to bridge mount points. The root operation is
used instead of just keeping a pointer so that the root vnode for each mounted filesystem can be
released. The VFS of a mounted filesystem also contains a back pointer to the vnode on which it
is mounted so that pathnames that include ".." can also be traversed across mount points.
In addition to the VFS and vnode operations, each filesystem type must provide mount and
mount_root operations to mount normal and root filesystems. The operations defined for the
filesystem interface are:

Filesystem Operations
        mount(varies)                   System call to mount filesystem
        mount_root()                    Mount filesystem as root

VFS Operations
        unmount(vfs)                    Unmount filesystem
        root(vfs) returns(vnode)        Return the vnode of the filesystem root
        statfs(vfs) returns(fsstatbuf)  Return filesystem statistics
        sync(vfs)                       Flush delayed write blocks

Vnode Operations
        open(vnode, flags)              Mark file open
        close(vnode, flags)             Mark file closed
        rdwr(vnode, uio, rwflag, flags) Read or write a file
        ioctl(vnode, cmd, data, rwflag) Do I/O control operation
        select(vnode, rwflag)           Do select
        getattr(vnode) returns(attr)    Return file attributes
        setattr(vnode, attr)            Set file attributes
        access(vnode, mode)             Check access permission
        lookup(dvnode, name) returns(vnode)     Look up file name in a directory
        create(dvnode, name, attr, excl, mode) returns(vnode)  Create a file
        remove(dvnode, name)            Remove a file name from a directory
        link(vnode, todvnode, toname)   Link to a file
        rename(dvnode, name, todvnode, toname)  Rename a file
        mkdir(dvnode, name, attr) returns(dvnode)       Create a directory
        rmdir(dvnode, name)             Remove a directory
        readdir(dvnode) returns(entries)        Read directory entries
        symlink(dvnode, name, attr, to_name)    Create a symbolic link
        readlink(vp) returns(data)      Read the value of a symbolic link
        fsync(vnode)                    Flush dirty blocks of a file
        inactive(vnode)                 Mark vnode inactive and do clean up
        bmap(vnode, blk) returns(devnode, mappedblk)    Map block number
        strategy(bp)                    Read and write filesystem blocks
        bread(vnode, blockno) returns(buf)      Read a block
        brelse(vnode, buf)              Release a block buffer
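
In the kernel, these two interfaces are naturally represented as structures of function pointers,
with one instance per filesystem type. The following is an abbreviated sketch of the idea (types
and member names are illustrative, not the actual 4.2 kernel declarations):

    /*
     * Sketch of the interfaces as structures of function pointers.
     * Each filesystem type (4.2, NFS, PC) supplies one instance of
     * each operations structure, and the kernel calls through the
     * pointers without knowing which implementation is underneath.
     */
    struct vfsops {
            int     (*vfs_unmount)();   /* unmount filesystem */
            int     (*vfs_root)();      /* return root vnode */
            int     (*vfs_statfs)();    /* return filesystem statistics */
            int     (*vfs_sync)();      /* flush delayed write blocks */
    };

    struct vnodeops {
            int     (*vn_open)();       /* mark file open */
            int     (*vn_close)();      /* mark file closed */
            int     (*vn_rdwr)();       /* read or write a file */
            int     (*vn_lookup)();     /* look up name in a directory */
            /* ... one entry per vnode operation listed above ... */
    };

    struct vfs {
            struct vfsops   *vfs_op;            /* operations on this fs */
            struct vnode    *vfs_vnodecovered;  /* vnode we are mounted on */
            caddr_t         vfs_data;           /* private per-fs-type data */
    };

    struct vnode {
            struct vnodeops *v_op;              /* operations on this node */
            struct vfs      *v_vfsp;            /* parent VFS */
            struct vfs      *v_vfsmountedhere;  /* covering VFS, if any */
            caddr_t         v_data;             /* private per-fs-type data */
    };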
Notice that many of the vnode procedures map one-to-one with NFS protocol procedures, while
other, UNIX dependent procedures such as open, close, and ioctl do not. The bmap,
strategy, bread, and brelse procedures are used to do reading and writing using the buffer
cache.
Pathname traversal is done in the kernel by breaking the path into directory components and
doing a lookup call through the vnode for each component. At first glance it seems like a waste
of time to pass only one component with each call instead of passing the whole path and receiving
back a target vnode. The main reason for this is that any component of the path could be a
mount point for another filesystem, and the mount information is kept above the vnode
implementation level. In the NFS filesystem, passing whole pathnames would force the server to
keep track of all of the mount points of its clients in order to determine where to break the
pathname and this would violate server statelessness. The inefficiency of looking up one
component at a time is alleviated with a cache of directory vnodes.
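
A sketch of that traversal loop, in terms of the structures sketched earlier (illustrative only;
the kernel's actual namei and the next_component helper shown here are not the real routines):

    /*
     * Illustrative component-at-a-time lookup.  Crossing onto a
     * mounted filesystem is handled here, above the per-filesystem
     * lookup operation, via the VFS root operation.
     */
    extern char *next_component();      /* assumed helper */

    struct vnode *
    traverse(dvp, path)
            struct vnode *dvp;
            char *path;
    {
            char component[MAXNAMLEN + 1];
            struct vnode *vp;

            while (*path != '\0') {
                    /* peel off the next pathname component */
                    path = next_component(path, component);

                    /* one lookup call through the vnode per component */
                    if ((*dvp->v_op->vn_lookup)(dvp, component, &vp) != 0)
                            return ((struct vnode *)0);

                    /* bridge mount points: step onto the mounted root */
                    while (vp->v_vfsmountedhere != (struct vfs *)0)
                            (*vp->v_vfsmountedhere->vfs_op->vfs_root)
                                (vp->v_vfsmountedhere, &vp);

                    dvp = vp;
            }
            return (dvp);
    }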
Implementation
Implementation of the NFS started in March 1984. The first step in the implementation was
modification of the 4.2 kernel to include the filesystem interface. By June we had the first
"vnode kernel" running. We did some benchmarks to test the amount of overhead added by the
extra interface. It turned out that in most cases the difference was not measurable, and in the
worst case the kernel had only slowed down by about 2%. Most of the work in adding the new
interface was in finding and fixing all of the places in the kernel that used inodes directly, and
code that contained implicit knowledge of inodes or disk layout.
Only a few of the filesystem routines in the kernel had to be completely rewritten to use vnodes.
Namei, the routine that does pathname lookup, was changed to use the vnode lookup
operation, and cleaned up so that it doesn't use global state. The direnter routine, which adds
new directory entries (used by create, rename, etc.), also had to be fixed because it depended
on the global state from namei. Direnter also had to be modified to do directory locking during
directory rename operations because inode locking is no longer available at this level, and vnodes
are never locked.
To avoid having a fixed upper limit on the number of active vnode and VFS structures we added a
memory allocator to the kernel so that these and other structures can be allocated and freed
dynamically.
A new system call, getdirentries, was added to read directory entries from different types of
filesystems. The 4.2 readdir library routine was modified to use the new system call so programs
would not have to be rewritten. This change does, however, mean that programs that use
readdir have to be relinked.
Beginning in March, the user level RPC and XDR libraries were ported to the kernel and we were
able to make kernel to user and kernel to kernel RPC calls in June. We worked on RPC
performance for about a month until the round trip time for a kernel to kernel null RPC call was
8.8 milliseconds. The performance tuning included several speed ups to the UDP and IP code in
the kernel.
Once RPC and the vnode kernel were in place the implementation of NFS was simply a matter of
writing the XDR routines to do the NFS protocol, implementing an RPC server for the NFS
procedures in the kernel, and implementing a filesystem interface which translates vnode
operations into NFS remote procedure calls. The first NFS kernel was up and running in mid
August. At this point we had to make some modifications to the vnode interface to allow the
NFS server to do synchronous write operations. This was necessary since unwritten blocks in
the server's buffer cache are part of the "client's state".
Our first implementation of the MOUNT protocol was built into the NFS protocol. It wasn't
until later that we broke the MOUNT protocol into a separate, user level RPC service. The
MOUNT server is a user level daemon that is started automatically when a mount request comes
in. It checks the file /etc/exports which contains a list of exported filesystems and the clients
that can import them. If the client has import permission, the mount daemon does a getfh
system call to convert a pathname into an fhandle which is returned to the client.
On the client side, the mount command was modified to take additional arguments including a
filesystem type and options string. The filesystem type allows one mount command to mount any
type of filesystem. The options string is used to pass optional flags to the different filesystem
mount system calls. For example, the NFS allows two flavors of mount, soft and hard. A hard
mounted filesystem will retry NFS calls forever if the server goes down, while a soft mount gives
up after a while and returns an error. The problem with soft mounts is that most UNIX programs
are not very good about checking return status from system calls so you can get some strange
behavior when servers go down. A hard mounted filesystem, on the other hand, will never fail
due to a server crash; it may cause processes to hang for a while, but data will not be lost.
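
As an illustration of the two flavors (the exact command syntax here is our assumption, since
the paper does not show it; the server names come from the df listing below):

    mount -t nfs panic:/usr /usr                # hard mount: retries forever
    mount -t nfs -o soft fiat:/usr/src /usr/src # soft mount: eventually errors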
In addition to the MOUNT server, we have added NFS server daemons. These are user level
processes that make an nfsd system call into the kernel, and never return. This provides a user
context to the kernel NFS server which allows the server to sleep. Similarly, the block I/O
daemon, on the client side, is a user level process that lives in the kernel and services
asynchronous block I/O requests. Because the RPC requests are blocking, a user context is
necessary to wait for read-ahead and write-behind requests to complete. These daemons provide
a temporary solution to the problem of handling parallel, synchronous requests in the kernel. In
the future we hope to use a light-weight process mechanism in the kernel to handle these requests
[4].
The NFS group started using the NFS in September, and spent the next six months working on
performance enhancements and administrative tools to make the NFS easier to install and use.
One of the advantages of the NFS was immediately obvious; as the df output below shows, a
diskless workstation can have access to more than a Gigabyte of disk!

Filesystem            kbytes    used   avail capacity  Mounted on
/dev/nd0                7445    5788     912     86%   /
/dev/ndp0               5691    2798    2324     55%   /pub
panic:/usr             27487   21398    3340     86%   /usr
fiat:/usr/src         345915  220122   91201     71%   /usr/src
panic:/usr/panic      148371  116505   17028     87%   /usr/panic
galaxy:/usr/galaxy      7429    5150    1536     77%   /usr/galaxy
mercury:/usr/mercury  301719  215179   56368     79%   /usr/mercury
opium:/usr/opium      327599   36392  258447     12%   /usr/opium
The Hard Issues
Several hard design issues were resolved during the development of the NFS. One of the toughest
was deciding how we wanted to use the NFS. Lots of flexibility can lead to lots of confusion.
Root Filesystems
Our current NFS implementation does not allow shared NFS root filesystems. There are many
hard problems associated with shared root filesystems that we just didn't have time to address.
For example, many well-known, machine specific files are on the root filesystem, and too many
programs use them. Also, sharing a root filesystem implies sharing /tmp and /dev. Sharing
/tmp is a problem because programs create temporary files using their process id, which is not
unique across machines. Sharing /dev requires a remote device access system. We considered
allowing shared access to /dev by making operations on device nodes appear local. The
problem with this simple solution is that many programs make special use of the ownership and
permissions of device nodes.
Since every client has private storage (either real disk or ND) for the root filesystem, we were
able to move machine specific files from shared filesystems into a new directory called
/private, and replace those files with symbolic links. Things like /usr/lib/crontab and the
whole directory /usr/adm have been moved. This allows clients to boot with only /etc and
/bin executables local. The /usr and other filesystems are then remote mounted.
Filesystem Naming
Servers export whole filesystems, but clients can mount any sub-directory of a remote filesystem
on top of a local filesystem, or on top of another remote filesystem. In fact, a remote filesystem
can be mounted more than once, and can even be mounted on another copy of itself! This
means that clients can have different "names" for filesystems by mounting them in different
places.
To alleviate some of the confusion we use a set of basic mounted filesystems on each machine
and then let users add other filesystems on top of that. Remember though that this is just policy,
there is no mechanism in the NFS to enforce this. User home directories are mounted on
/usr/servername. This may seem like a violation of our goals because hostnames are now part
of pathnames but in fact the directories could have been called /usr/1, /usr/2, etc. Using
server names is just a convenience. This scheme makes workstations look more like timesharing
terminals because a user can log in to any workstation and her home directory will be there. It
also makes tilde expansion (~username is expanded to the user's home directory) in the C shell
work in a network with many workstations.
To avoid the problems of loop detection and dynamic filesystem access checking, servers do not
cross mount points on remote lookup requests. This means that in order to see the same
filesystem layout as a server, a client has to remote mount each of the server's exported
filesystems.
Credentials, Authentication and Security
We wanted to use UNIX style permission checking on the server and client so that UNIX users
would see very little difference between remote and local files. RPC allows different
authentication parameters to be "plugged-in" to the packet header of each call so we were able to
make the NFS use a UNIX flavor authenticator to pass uid, gid, and groups on each call. The
server uses the authentication parameters to do permission checking as if the user making the call
were doing the operation locally.
The problem with this authentication method is that the mapping from uid and gid to user must
be the same on the server and client. This implies a flat uid, gid space over a whole local
network. This is not acceptable in the long run and we are working on different authentication
schemes. In the mean time, we have developed another RPC based service called the Yellow
Pages (YP) to provide a simple, replicated database lookup service [5]. By letting YP handle
/etc/passwd and /etc/group we make the flat uid space much easier to administrate.
Another issue related to client authentication is super-user access to remote files. It is not clear
that the super-user on a workstation should have root access to files on a server machine through
the NFS. To solve this problem the server maps user root (uid 0) to user nobody (uid -2) before
checking access permission. This solves the problem but, unfortunately, causes some strange
behavior for users logged in as root, since root may have fewer access rights to a file than a
normal user.
Remote root access also affects programs which are set-uid root and need access to remote user
files, for example lpr. To make these programs more likely to succeed we check on the client
side for RPC calls that fail with EACCES and retry the call with the real-uid instead of the
effective-uid. This is only done when the effective-uid is zero and the real-uid is something other
than zero so normal users are not affected.
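
A sketch of that client-side fallback (nfs_call, crdup, crfree, and the cr_ruid field are assumed
names for illustration, not the actual kernel routines):

    /*
     * Sketch: retry a failed call with the real uid when a set-uid
     * root program gets EACCES from the server (which has mapped
     * uid 0 to nobody).  All names here are assumptions.
     */
    int
    nfs_access_call(vp, argsp, resp, cred)
            struct vnode *vp;
            caddr_t argsp, resp;
            struct ucred *cred;
    {
            int error;
            struct ucred *tmpcred;

            error = nfs_call(vp, argsp, resp, cred);    /* effective creds */
            if (error == EACCES && cred->cr_uid == 0 && cred->cr_ruid != 0) {
                    /* set-uid root program: retry as the real user */
                    tmpcred = crdup(cred);
                    tmpcred->cr_uid = cred->cr_ruid;
                    error = nfs_call(vp, argsp, resp, tmpcred);
                    crfree(tmpcred);
            }
            return (error);
    }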
While restricting super-user access helps to protect remote files, the super-user on a client
machine can still gain access by using su to change her effective-uid to the uid of the owner of a
remote file.
Concurrent Access and File Locking
The NFS does not support remote file locking. We purposely did not include this as part of the
protocol because we could not find a set of locking facilities that everyone agrees is correct.
Instead we plan to build separate, RPC based file locking facilities. In this way people can use
the locking facility with the flavor of their choice with minimal effort.
Related to the problem of file locking is concurrent access to remote files by multiple clients. In
the local filesystem, file modifications are locked at the inode level. This prevents two processes
writing to the same file from intermixing data on a single write. Since the server maintains no
locks between requests, and a write may span several RPC requests, two clients writing to the
same remote file may get intermixed data on long writes.
UNIX Open File Semantics
We tried very hard to make the NFS client obey UNIX filesystem semantics without modifying the
server or the protocol. In some cases this was hard to do. For example, UNIX allows removal of
open files. A process can open a file, then remove the directory entry for the file so that it has no
name anywhere in the filesystem, and still read and write the file. This is a disgusting bit of
UNIX trivia and at first we were just not going to support it, but it turns out that all of the
programs that we didn't want to have to fix (csh, sendmail, etc.) use this for temporary files.
What we did to make open file removal work on remote files was check in the client VFS
remove operation if the file is open, and if so rename it instead of removing it. This makes it
(sort of) invisible to the client and still allows reading and writing. The client kernel then
removes the new name when the vnode becomes inactive. We call this the 3/4 solution because
if the client crashes between the rename and remove a garbage file is left on the server. An
entry to cron can be added to clean up on the server.
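
A sketch of that remove operation (the helper names, the hidden-name convention, and the
VRENAMED flag are all our assumptions for illustration, not the paper's source):

    /*
     * Sketch of the "3/4 solution": if the file is still open,
     * rename it to a hidden temporary name instead of removing it;
     * the hidden name is removed later, when the vnode goes
     * inactive.  All names here are assumed.
     */
    int
    nfs_remove(dvp, name)
            struct vnode *dvp;
            char *name;
    {
            struct vnode *vp;
            char hidden[MAXNAMLEN + 1];
            int error;

            error = nfs_lookup(dvp, name, &vp);     /* assumed helper */
            if (error)
                    return (error);
            if (vp->v_count > 1) {                  /* file is still open */
                    make_hidden_name(hidden, vp);   /* assumed helper */
                    error = nfs_rename_rpc(dvp, name, dvp, hidden);
                    vp->v_flag |= VRENAMED;         /* remove when inactive */
            } else
                    error = nfs_remove_rpc(dvp, name);
            vnode_rele(vp);                         /* assumed release */
            return (error);
    }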
Another problem associated with remote, open files is that access permission on the file can
change while the file is open. In the local case the access permission is only checked when the
file is opened, but in the remote case permission is checked on every NFS call. This means that
if a client program opens a file, then changes the permission bits so that it no longer has read
permission, a subsequent read request will fail. To get around this problem we save the client
credentials in the file table at open time, and use them in later file access requests.
Not all of the UNIX open file semantics have been preserved because interactions between two
clients using the same remote file can not be controlled on a single client. For example, if one
client opens a file and another client removes that file, the first client's read request will fail
even though the file is still open.
Time Skew
Time skew between two clients or a client and a server can cause time associated with a file to be
inconsistent. For example, ranlib saves the current time in a library entry, and ld checks the
modify time of the library against the time saved in the library. When ranlib is run on a remote
file the modify time comes from the server while the current time that gets saved in the library
comes from the client. If the server's time is far ahead of the client's it looks to ld like the
library is out of date. There were only three programs that we found that were affected by this,
ranlib, ls and emacs, so we fixed them.
This is a potential problem for any program that compares system time to file modification time.
We plan to fix this by limiting the time skew between machines with a time synchronization
protocol.
Performance
The final hard issue is the one everyone is most interested in, performance.
Much of the time since the NFS first came up has been spent in improving performance. Our
goal was to make NFS faster than the ND in the 1.1 Sun release (about 80% of the speed of a
local disk). The speed we are interested in is not raw throughput, but how long it takes
