`
`
`
`Scalable, Secure, and
`Highly Available
`Distributed File Access
`
For the users of a distributed system to collaborate effectively, the ability to share data easily is vital. Over
`the last decade, distributed file systems
`based on the Unix model have been the
`subject of growing attention. They are now
`widely considered an effective means of
`sharing data in academic and research en-
`vironments. This article presents a sum-
`mary and historical perspective of work
done by my colleagues, students, and me in
`designing and implementing such systems
`at Carnegie Mellon University.
`This work began in 1983 in the context
`of Andrew, a joint project of CMU and
`IBM to develop a state-of-the-art comput-
`ing facility for education and research at
`CMU. The project envisioned a dramatic
`increase in computing power made pos-
`sible by the widespread deployment of
`powerful personal workstations. Our char-
`ter was to develop a mechanism that would
`enable the users of these workstations to
`collaborate and share data effectively. We
`decided to build a distributed file system
`for this purpose because it would provide
`the right balance between functionality
`and complexity for our usage environment.
May 1990

Mahadev Satyanarayanan
Carnegie Mellon University

Andrew and Coda are distributed Unix file systems that embody many of the recent advances in solving the problem of data sharing in large, physically dispersed workstation environments.

It was clear from the outset that our distributed file system had to possess two critical attributes: It had to scale well, so that the system could grow to its anticipated final size of over 5,000 workstations.
`It also had to be secure, so that users could
`be confident of the privacy of their data.
`Neither of these attributes is likely to be
`present in a design by accident, nor can it
`be added as an afterthought. Rather, each
`attribute must be treated as a fundamental
constraint and given careful attention during the design and implementation of a system.

0018-9162/90/0500-0009$01.00 © 1990 IEEE
`Our design has evolved over time, re-
`sulting in three distinct versions of the
Andrew file system, called AFS-1, AFS-2,
`and AFS-3. In this article “Andrew file
`system” or “Andrew” will be used as a
`collective term referring to all three ver-
`sions.
`As our user community became more
`dependent on Andrew, the availability of
`data in it became more important. Today, a
`single failure in Andrew can seriously
`inconvenience many users for significant
`periods. To address this problem, we be-
`gan the design of an experimental file
`system called Coda in 1987. Intended for
`the same computing environment as An-
`drew, Coda retains Andrew’s scalability
`and security characteristics while provid-
ing much higher availability.
`
The Andrew architecture
`
`The Andrew computing paradigm is a
`synthesis of the best features of personal
`computing and timesharing. It combines
`the flexible and visually rich user interface
`available in personal computing with the
`ease of information exchange typical of
`
`
`
`
`
`
`
`
Figure 1. A high-level view of the Andrew architecture. The structure labeled "Vice" is a collection of trusted file servers and untrusted networks. The nodes labeled "W" are private or public workstations, or timesharing systems. Software in each such node makes the shared files in Vice appear as an integral part of that node's file system.
`
`User computing cycles are provided by
`workstations running the Unix operating
`system.
Data sharing in Andrew is supported by a distributed file system that appears as a single large subtree of the local file system on each workstation. The only files outside the shared subtree are temporary files and files essential for workstation initialization. A process called Venus, running on each workstation, mediates shared file access. Venus finds files in Vice, caches them locally, and performs emulation of Unix file system semantics. Both Vice and Venus are invisible to workstation processes, which see only a Unix file system, one subtree of which is identical on all workstations. Processes on two different workstations can read and write files in this subtree just as if they were running on a single timesharing system. Figure 2 depicts the file system view seen by a workstation user.
`Our experience with the Andrew archi-
`tecture over the past six years has been
`positive. It is simple and easily understood
`by naive users, and it permits efficient
`implementation. It also offers a number of
`benefits that are particularly valuable on a
`large scale:
`
• Data sharing is simplified. A workstation with a small disk can potentially access any file in Andrew by name. Since the file system is location transparent, users do not have to remember the machines on which files are currently located or where files were created. System administrators can move files from one server to another without inconveniencing users, who are completely unaware of such a move.
• User mobility is supported. A user can walk to any workstation in the system and access any file in the shared name space. A user's workstation is personal only in the sense that he owns it.
• System administration is easier. Operations staff can focus on the relatively small number of servers, ignoring the more numerous and physically dispersed clients. Adding a new workstation involves merely connecting it to the network and assigning it an address.
• Better security is possible. The servers in Vice are physically secure and run trusted system software. No user programs are executed on servers. Encryption-based authentication and transmission are used to enforce the security of server-workstation communication. Although individuals may tamper with the hardware and software on their workstations, their malicious actions cannot affect users at other workstations.
• Client autonomy is improved. Workstations can be turned off or physically relocated at any time without inconveniencing other users. Backup is needed only on the servers, since workstation disks are used merely as caches.
`
`Scalability in Andrew
`
A scalable distributed system is one that can easily cope with the addition of users and sites, its growth involving minimal expense, performance degradation, and administrative complexity. We have achieved these goals in Andrew by reducing static bindings to a bare minimum and by maximizing the number of active clients that can be supported by a server. The following sections describe the evolution of our design strategies for scalability in Andrew.
`
`AFS-1. AFS-1 was a prototype with the
`primary functions of validating the An-
`drew file system architecture and provid-
`ing rapid feedback on key design deci-
`sions. Each server contained a local file
`system mirroring the structure of the
shared file system. Vice file status information, such as access lists, was stored in shadow directories. If a file was not on a server, the search for its name would end in a stub directory that identified the server containing that file. Since server processes could not share memory, their only means of sharing data structures was via the local file system.
Clients cached pathname prefix information and used it to direct file requests to appropriate servers. The Vice-Venus interface named files by their full pathnames. There was no notion of a low-level name, such as the inode in Unix.
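The prefix scheme just described can be sketched as follows; the class, table layout, and server names are illustrative inventions, not AFS-1's actual data structures.

```python
# Sketch of pathname-prefix routing (hypothetical classes, not AFS-1 code):
# a client remembers which server holds each subtree and sends the full
# pathname to the server of the longest matching cached prefix.
class PrefixTable:
    def __init__(self):
        self.prefixes = {}   # pathname prefix -> server name

    def learn(self, prefix, server):
        self.prefixes[prefix] = server

    def route(self, pathname):
        """Return the server for the longest cached prefix, if any."""
        best = None
        for prefix, server in self.prefixes.items():
            if pathname == prefix or pathname.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best[0]):
                    best = (prefix, server)
        return best[1] if best else None

table = PrefixTable()
table.learn("/vice/usr", "server-a")
table.learn("/vice/usr/alice", "server-b")
assert table.route("/vice/usr/alice/notes.txt") == "server-b"
assert table.route("/vice/usr/bob/mail") == "server-a"
```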
`Venus used a pessimistic approach to
`maintaining cache coherence. All cached
`copies of files were considered suspect.
`Before using a cached file, Venus would
contact Vice to verify that it had the latest version. Each open of a file thus resulted in at least one interaction with a server, even if the file was already in the cache and up to date.
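This pessimistic validate-on-every-open policy can be sketched like this; the classes are illustrative, not the AFS-1 sources.

```python
# Sketch of AFS-1's pessimistic validation (illustrative classes): every
# open contacts the server, even when the cached copy turns out to be valid.
class Server:
    def __init__(self):
        self.files = {}      # name -> (version, data)
        self.contacts = 0    # count of client-server interactions

    def current_version(self, name):
        self.contacts += 1
        return self.files[name][0]

    def fetch(self, name):
        self.contacts += 1
        return self.files[name]

class Venus:
    """Client cache manager that suspects every cached copy."""
    def __init__(self, server):
        self.server = server
        self.cache = {}      # name -> (version, data)

    def open(self, name):
        if name in self.cache:
            version, data = self.cache[name]
            if self.server.current_version(name) == version:
                return data              # a hit, but we still asked
        self.cache[name] = self.server.fetch(name)
        return self.cache[name][1]

server = Server()
server.files["plan"] = (1, "first draft")
venus = Venus(server)
venus.open("plan")               # miss: one fetch
venus.open("plan")               # hit: still costs a validation round trip
assert server.contacts == 2
```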
`For the mostpart, we were pleased with
`AFS-1. Almost every application was able
`to use Vice files without recompilation or
`relinking. There were minor areas of in-
`compatibility with standard Unix seman-
tics, but these were never serious enough to
`discourage users.
`
`COMPUTER
`
`
`
`
Figure 2. File system view at a workstation: how the shared files in Vice appear to a user. The subtree under the directory labeled "afs" is identical at all workstations. The other directories are local to each workstation. Symbolic links can be used to make local directories correspond to directories in Vice.
`
`timesharing. A conceptual view of this
`model is shown in Figure 1.
The large, amoeba-like structure in the middle, called Vice, is the information-sharing backbone of the system. Although represented as a single entity, it actually consists of a collection of dedicated file servers and a complex local area network.
`
`10
`
`
`
`
`
`Design principles from Andrew and Coda
`
`
The design choices of Andrew and Coda were guided by a few simple principles. They were not specified a priori, but emerged in the course of our work. We share these principles and examples of their application in the hope that they will be useful to designers of other large-scale distributed systems. The principles should not be applied dogmatically but should be used to help crystallize thinking during the design process.
`
• Workstations have the cycles to burn. Whenever there is a choice between performing an operation on a workstation and performing it on a central resource, it is preferable to pick the workstation. This enhances the scalability of the design because it lessens the need to increase central resources as workstations are added.
The only functions performed by servers in Andrew and Coda are those critical to security, integrity, or location of data. Further, there is very little interserver traffic. Pathname translation is done on clients rather than on servers in AFS-2, AFS-3, and Coda. The parallel update protocol in Coda depends on the client to directly update all AVSG members, rather than updating one of them and letting it relay the update.
`
• Cache whenever possible. Scalability, user mobility, and site autonomy motivate this principle. Caching reduces contention on centralized resources and transparently makes data available wherever it is being used.
AFS-1 cached files and location information. AFS-2 also cached directories, as do AFS-3 and Coda. Caching is the basis of disconnected operation in Coda.
`
• Exploit file usage properties. Knowledge of the nature of file accesses in real systems allows better design choices to be made. Files can often be grouped into a small number of easily identifiable classes that reflect their access and modification patterns. These class-specific properties provide an opportunity for independent optimization and, hence, improved performance.
Almost one-third of the file references in a typical Unix system are to temporary files. Since such files are seldom shared, Andrew and Coda make them part of the local name space. The executable files of system programs are often read but rarely written. AFS-2, AFS-3, and Coda therefore support read-only replication of these files to improve performance and availability. Coda's use of an optimistic replication strategy is based on the premise that sequential write sharing of user files is rare.
`
• Minimize systemwide knowledge and change. In a large distributed system, it is difficult to be aware at all times of the entire state of the system. It is also difficult to update distributed or replicated data structures consistently. The scalability of a design is enhanced if it rarely requires global information to be monitored or atomically updated.
Workstations in Andrew and Coda monitor only the status of servers from which they have cached data. They do not require any knowledge of the rest of the system. File location information on Andrew and Coda servers changes relatively rarely. Caching by Venus, rather than file location changes in Vice, is used to deal with movement of users. Coda integrates server replication (a relatively heavyweight mechanism) with caching to improve availability without losing scalability. Knowledge of a caching site is confined to servers with callbacks for the caching site. Coda does not depend on knowledge of systemwide topology, nor does it incorporate any algorithms requiring systemwide election or commitment.
Another instance of the application of this principle is the use of negative rights. Andrew provides rapid revocation by modifications of an access list at a single site rather than by systemwide change of a replicated protection database.
`
• Trust the fewest possible entities. A system whose security depends on the integrity of the fewest possible entities is more likely to remain secure as it grows.
Rather than trusting thousands of workstations, security in Andrew and Coda is predicated on the integrity of the much smaller number of Vice servers. The administrators of Vice need only ensure the physical security of these servers and the software they run. Responsibility for workstation integrity is delegated to the owner of each workstation. Andrew and Coda rely on end-to-end encryption rather than physical link security.
`
• Batch if possible. Grouping operations can improve throughput (and hence scalability), although often at the cost of latency.
The transfer of files in large chunks in AFS-3 and in their entirety in AFS-1, AFS-2, and Coda is an instance of the application of this principle. More efficient network protocols can be used when data is transferred en masse rather than as individual pages. In Coda the second phase of the update protocol is deferred and batched. Latency is not increased in this case because control can be returned to application programs before the completion of the second phase.
`
AFS-1 was in use for about a year, from late 1984 to late 1985. At its peak usage, there were about 100 workstations and six servers. Performance was usually acceptable to about 20 active users per server. But sometimes a few intense users caused performance to degrade intolerably. The system turned out to be difficult to operate and maintain, especially because it provided few tools to help system administrators. The embedding of file location information in stub directories made it hard to move user files between servers.
`
AFS-2. The design of AFS-2 was based on our experience with AFS-1 as well as on extensive performance analysis.¹ We retained the strategy of workstations caching entire files from a collection of dedicated autonomous servers. But we made many changes in the realization of this architecture, especially in cache management, name resolution, communication, and server process structure.
A fundamental change in AFS-2 was the manner in which cache coherence was maintained. Instead of checking with a
`
`
`
`
`
ity for frequently read but rarely updated files, such as system programs. The backup and restoration mechanism in AFS-2 also made use of volume primitives. To back up a volume, a read-only clone was first made. Then, an asynchronous mechanism transferred this frozen snapshot to a staging machine from which it was dumped to tape.
To handle the common case of accidental deletion by users, the cloned backup volume of each user's files was made available as a read-only subtree of that user's home directory. Thus, users themselves could restore files within 24 hours by means of normal file operations.
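The cloning step can be illustrated with a minimal sketch; the classes below are hypothetical stand-ins for real copy-on-write machinery.

```python
# Sketch of backup by cloning (hypothetical classes): a read-only clone
# freezes the volume's contents; later updates to the live volume leave
# the frozen snapshot untouched, so it can be dumped to tape at leisure.
class ReadOnlyClone:
    def __init__(self, files):
        self._files = files

    def read(self, name):
        return self._files[name]

class Volume:
    def __init__(self, files):
        self.files = dict(files)

    def clone(self):
        # A shallow copy suffices here because contents are never mutated
        # in place; a real system would share storage copy-on-write.
        return ReadOnlyClone(dict(self.files))

vol = Volume({"paper.tex": "v1"})
snap = vol.clone()                   # frozen snapshot
vol.files["paper.tex"] = "v2"        # the live volume moves on
assert snap.read("paper.tex") == "v1"
```

Exposing such a clone as a read-only subtree of the user's home directory is exactly what lets users restore yesterday's files with ordinary file operations.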
AFS-2 was in use at CMU from late 1985 until mid-1989. Our experience confirmed that it was indeed an efficient and convenient system to use at large scale. Controlled experiments established that it performed better under load than other contemporary file systems.¹,² Figure 3 presents the results of one such experiment.
`
AFS-3. In 1988, work began on a new version of the Andrew file system called AFS-3. (For ease of exposition, all changes made after the AFS-2 release described in Howard et al.¹ are described here as pertaining to AFS-3. In reality, the transition from AFS-2 to AFS-3 was gradual.) The revision was initiated at CMU and has been continued since mid-1989 at Transarc Corporation, a commercial venture involving many of the original implementers of AFS-3. The revision was motivated by the need to provide decentralized system administration, by the desire to operate over wide area networks, and by the goal of using industry standards in the implementation.
AFS-3 supports multiple administrative cells, each with its own servers, workstations, system administrators, and users. Each cell is a completely autonomous Andrew environment, but a federation of cells can cooperate in presenting users with a uniform, seamless filename space. The ability to decompose a distributed system into cells is important at large scale because it allows administrative responsibility to be delegated along lines that parallel institutional boundaries. This makes for smooth and efficient system operation.
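A minimal sketch of cell-based name resolution follows; the cell table and host names are invented for illustration.

```python
# Sketch of multi-cell name resolution (cell and host names are invented).
# Paths of the form /afs/<cell>/... route to the autonomous cell that
# administers that subtree, yielding one seamless federated name space.
CELLS = {
    "cs.cmu.edu": ["vice1.cs.cmu.edu", "vice2.cs.cmu.edu"],
    "umich.edu": ["afs1.umich.edu"],
}

def servers_for(pathname):
    """Pick the cell named by the second path component."""
    parts = pathname.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "afs":
        raise ValueError("not in the shared name space: " + pathname)
    return CELLS[parts[1]]

assert servers_for("/afs/umich.edu/users/alice") == ["afs1.umich.edu"]
```

Because each cell administers only its own table of servers, no global coordination is needed to add a cell to the federation.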
The RPC protocol used in AFS-3 provides good performance across local and wide area networks. In conjunction with the cell mechanism, this network capability has made possible shared access to a common, nationwide file system, distributed over nodes such as MIT, the University of Michigan, and Dartmouth, as well as CMU.
`Venus has been moved into the Unix
`
`
Figure 3. AFS-2 versus Sun NFS performance under load on identical client, server, and network hardware. A load unit consists of one client workstation running an instance of the Andrew benchmark. (Full details of the benchmark and experimental configuration can be found in Howard et al.,¹ from which this graph is adapted.) As the graph clearly indicates, the performance of AFS-2, even with a cold cache, degrades much more slowly than that of NFS.
`
server on each open, Venus now assumed that cache entries were valid unless otherwise notified. When a workstation cached a file or directory, the server promised to notify that workstation before allowing a modification by any other workstation. This promise, known as a callback, resulted in a considerable reduction in cache validation traffic.
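The callback mechanism can be sketched as follows; the classes are illustrative, not the AFS-2 implementation.

```python
# Sketch of callback-based coherence (illustrative classes, not AFS-2):
# the server records who caches a file and notifies them before a change.
class Server:
    def __init__(self):
        self.data = {}
        self.callbacks = {}   # file name -> set of clients holding a promise

    def fetch(self, name, client):
        self.callbacks.setdefault(name, set()).add(client)   # promise made
        return self.data[name]

    def store(self, name, value):
        for client in self.callbacks.pop(name, set()):
            client.break_callback(name)                      # promise kept
        self.data[name] = value

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, name):
        if name in self.cache:        # valid until a callback breaks it;
            return self.cache[name]   # no validation traffic on this hit
        self.cache[name] = self.server.fetch(name, self)
        return self.cache[name]

    def break_callback(self, name):
        self.cache.pop(name, None)

server = Server()
server.data["doc"] = "v1"
c1 = Client(server)
assert c1.open("doc") == "v1"
server.store("doc", "v2")             # breaks c1's callback first
assert c1.open("doc") == "v2"
```

Contrast this with the AFS-1 sketch: here a warm-cache open involves no server interaction at all.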
Callback made it feasible for clients to cache directories and to translate pathnames locally. Without callbacks, the lookup of every component of a pathname would have generated a cache validation request. For reasons of integrity, directory modifications were made directly on servers, as in AFS-1. Each Vice file or directory in AFS-2 was identified by a unique fixed-length file identifier. Location information was contained in a slowly changing volume location database replicated on each server.
`
AFS-2 used a single process to service all clients of a server, thus reducing the context switching and paging overheads observed in AFS-1. A nonpreemptive lightweight process mechanism supported concurrency and provided a convenient programming abstraction on servers and clients. The RPC (remote procedure call) mechanism in AFS-2, which was integrated with the lightweight process mechanism, supported a very large number of active clients and used an optimized bulk-transfer protocol for file transfer.
Besides the changes we made for performance, we also eliminated AFS-1's inflexible mapping of Vice files to server disk storage. This change was the basis of a number of mechanisms that improved system operability. Vice data in AFS-2 was organized in terms of a data-structuring primitive called a volume, a collection of files forming a partial subtree of the Vice name space. Volumes were glued together at mount points to form the complete name space. Venus transparently recognized and crossed mount points during name resolution.
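Mount-point crossing during name resolution can be sketched like this; the model is illustrative, not the Venus code.

```python
# Sketch of name resolution across volumes (an illustrative model, not the
# Venus code). A Volume value found inside a directory tree is a mount point.
class Volume:
    """A collection of files forming a partial subtree of the name space."""
    def __init__(self, name, tree):
        self.name = name
        self.tree = tree   # nested dicts for directories; str for file data

def resolve(volume, pathname):
    """Walk pathname, crossing mount points transparently.
    Returns (name of the volume holding the file, file data)."""
    node = volume.tree
    for part in pathname.strip("/").split("/"):
        entry = node[part]
        if isinstance(entry, Volume):      # a mount point: cross silently
            volume, node = entry, entry.tree
        elif isinstance(entry, dict):      # a directory within the volume
            node = entry
        else:                              # a file
            return volume.name, entry
    raise FileNotFoundError(pathname)

user = Volume("user.alice", {"notes": {"todo.txt": "buy milk"}})
root = Volume("root", {"afs": {"usr": {"alice": user}}})
assert resolve(root, "/afs/usr/alice/notes/todo.txt") == ("user.alice", "buy milk")
```

The caller never sees the volume boundary, which is what allows administrators to relocate volumes without disturbing users.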
Volumes were usually small enough to allow many volumes per server disk partition. Volumes formed the basis of disk quotas. Each system user was typically assigned a volume, and each volume was assigned a quota. Easily moved between servers by system administrators, a volume could be used (even for update) while it was being moved.
Read-only replication of volumes made it possible to provide increased availabil-
`
`
`
`
`
`Other contemporarydistributed file systems
`
A testimonial to the importance of distributed file systems is the large number of efforts to build such systems in industry and academia. The following are some systems currently in use:
`
Sun NFS has been widely viewed as a de facto standard since its introduction in 1985. Portability and heterogeneity are the dominant considerations in its design. Although originally developed on Unix, it is now available for other operating systems such as MS-DOS.
Apollo Domain is a distributed workstation environment whose development began in the early 1980s. Since the system was originally intended for a close-knit team of collaborating individuals, scale was not a dominant design consideration. But large Apollo installations now exist.
IBM AIX-DS is a collection of distributed system services for the AIX operating system, a derivative of System V Unix. A distributed file system is the primary component of AIX-DS. Its goals include strict emulation of Unix semantics, ability to efficiently support databases, and ease of administering a wide range of installation configurations.
AT&T RFS is a distributed file system developed for System V Unix. Its most distinctive feature is precise emulation of local Unix semantics for remote files.
Sprite is an operating system for networked uniprocessor and multiprocessor workstations, designed at the University of California at Berkeley. The goals of the Sprite file system include efficient use of large main memory caches, diskless operation, and strict Unix emulation.
Amoeba is a distributed operating system built by the Free University and CWI (Mathematics Center) in Amsterdam. The first version of the distributed file system used optimistic concurrency control. The current version provides simpler semantics and has high performance as its primary objective.
Echo is a distributed file system currently being implemented at the Systems Research Center of Digital Equipment Corporation. It uses a primary site replication scheme, with reelection in case the primary site fails.
`
`
`Further reading
`Surveys
`
Satyanarayanan, M., "A Survey of Distributed File Systems," in Annual Review of Computer Science, J.F. Traub et al., eds., Annual Reviews, Inc., Palo Alto, Calif., 1989.

Svobodova, L., "File Servers for Network-Based Distributed Systems," ACM Computing Surveys, Vol. 16, No. 4, Dec. 1984.
`
`Individual systems
Amoeba
van Renesse, R., H. van Staveren, and A.S. Tanenbaum, "The Performance of the Amoeba Distributed Operating System," Software Practice and Experience, Vol. 19, No. 3, Mar. 1989.

Apollo Domain
Levine, P., "The Apollo Domain Distributed File System," in Theory and Practice of Distributed Operating Systems, Y. Paker, J.-T. Banatre, and M. Bozyigit, eds., NATO ASI Series, Springer-Verlag, 1987.

AT&T RFS
Rifkin, A.P., et al., "RFS Architectural Overview," Proc. Summer Usenix Conf., Atlanta, 1986, pp. 248-259.

Echo
Hisgen, A., et al., "Availability and Consistency Trade-Offs in the Echo Distributed File System," Proc. Second IEEE Workshop on Workstation Operating Systems, CS Press, Los Alamitos, Calif., Order No. 2003, Sept. 1989.

IBM AIX-DS
Sauer, C.H., et al., "RT PC Distributed Services Overview," ACM Operating Systems Review, Vol. 21, No. 3, July 1987, pp. 18-29.

Sprite
Ousterhout, J.K., et al., "The Sprite Network Operating System," Computer, Vol. 21, No. 2, Feb. 1988, pp. 23-36.

Sun NFS
Sandberg, R., et al., "Design and Implementation of the Sun Network File System," Proc. Summer Usenix Conf., Portland, 1985, pp. 119-130.
`
kernel in order to use the vnode file intercept mechanism from Sun Microsystems, a de facto industry standard. The change also makes it possible for Venus to cache files in large chunks (currently 64 Kbytes) rather than in their entirety. This feature reduces file-open latency and allows a workstation to access files too large to fit on its local disk cache.
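Chunked caching of this kind can be sketched as follows; the ChunkCache class is a hypothetical illustration, not the AFS-3 cache manager.

```python
# Sketch of chunked caching (hypothetical ChunkCache, not the AFS-3 code):
# the client fetches fixed-size chunks on demand, so a file larger than
# the local cache can still be read piecemeal.
CHUNK = 64 * 1024   # the chunk size the text mentions

class ChunkCache:
    def __init__(self, fetch_chunk):
        self.fetch_chunk = fetch_chunk   # (name, chunk index) -> bytes
        self.chunks = {}                 # (name, index) -> cached bytes

    def read(self, name, offset, length):
        out = b""
        while length > 0:
            index, within = divmod(offset, CHUNK)
            key = (name, index)
            if key not in self.chunks:           # miss: fetch one chunk
                self.chunks[key] = self.fetch_chunk(name, index)
            piece = self.chunks[key][within:within + length]
            if not piece:                        # past the end of the file
                break
            out += piece
            offset += len(piece)
            length -= len(piece)
        return out

# A fake server serving chunks out of one in-memory "file":
blob = bytes(range(256)) * 1024          # 256 Kbytes, i.e. four chunks
cache = ChunkCache(lambda name, i: blob[i * CHUNK:(i + 1) * CHUNK])
assert cache.read("bigfile", 65000, 2000) == blob[65000:67000]
```

Only the chunks actually touched are fetched, which is also why opening a large file no longer waits for a whole-file transfer.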
`
`Security in Andrew
`
A consequence of large scale is that the casual attitude toward security typical of close-knit distributed environments is not acceptable. Andrew provides mechanisms to enforce security, but we have taken care to ensure that these mechanisms do not inhibit legitimate use of the system. Of course, mechanisms alone cannot guarantee security; an installation also must follow proper administrative and operational procedures.
A fundamental question is who enforces security. Rather than trusting thousands of workstations, Andrew predicates security on the integrity of the much smaller number of Vice servers. No user software is ever run on servers. Workstations may be owned privately or located in public areas. Andrew assumes that the hardware and software on workstations may be modified in arbitrary ways.
This section summarizes the main aspects of security in Andrew, pointing out the changes that occurred as the system evolved. These changes have been small compared to the changes for scalability. More details on security in Andrew can be found in an earlier work.³
`
Protection domain. The protection domain in Andrew is composed of users and groups. A user is an entity, usually a human, that can authenticate itself to Vice, be held responsible for its actions, and be charged for resource consumption. A
`
`
`
`
`
`
`
Figure 4. Major components and relationships involved in authentication in Andrew. Modifications such as password changes and additions of new users are made to the master authentication server, which distributes these changes to the slaves. When a user logs in, a client can obtain authentication tokens on the user's behalf from any slave authentication server. The client uses these tokens as needed to establish secure connections to file servers.
`
group is a set of other groups and users. Every group is associated with a unique user called its owner.
AFS-1 and AFS-2 supported group inheritance, with a user's privileges being the cumulative privileges of all the groups it belonged to, either directly or indirectly. Modifications of the protection domain were made off line by system administrators and typically were reflected in the system once a day. In AFS-3, modifications are made directly by users to a protection server that immediately reflects the changes in the system. To simplify the implementation of the protection server, the initial release of AFS-3 does not support group inheritance. This may change in the future because group inheritance conceptually simplifies management of the protection domain.
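Group inheritance amounts to a transitive closure over membership, as this sketch illustrates; the group and user names are invented.

```python
# Sketch of group inheritance (hypothetical groups and users): a principal's
# groups are everything reachable through membership, directly or indirectly.
GROUPS = {
    # group -> members (users or other groups)
    "system:administrators": {"satya"},
    "cs-faculty": {"system:administrators", "prof1"},
    "cmu-all": {"cs-faculty", "student1"},
}

def groups_of(principal):
    """All groups the principal belongs to, computed as a fixed point."""
    found = set()
    changed = True
    while changed:
        changed = False
        for group, members in GROUPS.items():
            if group not in found and (principal in members or found & members):
                found.add(group)
                changed = True
    return found

assert groups_of("student1") == {"cmu-all"}
```

A user's cumulative privileges are then the union of the privileges of every group returned by such a closure, which is exactly what makes the feature convenient for administrators and nontrivial to implement efficiently.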
One group is distinguished by the name System:Administrators. Membership in this group endows special administrative privileges, including unrestricted access to any file in the system. The use of a System:Administrators group rather than a pseudo-user (such as "root" in Unix systems) has the advantage that the actual identity of the user exercising special privileges is available for use in audit trails.
`
Authentication. The Andrew RPC mechanism provides support for secure, authenticated communication between mutually suspicious clients and servers, by using a variant of the Needham and Schroeder private key algorithm.⁴ When a user logs in on a workstation, his or her password is used to obtain tokens from an authentication server. These tokens are saved by Venus and used as needed to establish secure RPC connections to file servers on behalf of the user.
The level of indirection provided by tokens improves transparency and security. Venus can establish secure connections to file servers without users' having to supply a password each time a new server is contacted. Passwords do not have to be stored in the clear on workstations. Because tokens typically expire after 24 hours, the period during which lost tokens can cause damage is limited.
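The token flow can be sketched as follows; this plaintext model is purely illustrative, since the real exchange is encrypted with a Needham-Schroeder-style protocol, and every name below is invented.

```python
# Sketch of token-based authentication (illustrative only; the real system
# uses encrypted exchanges, never plaintext password checks like this).
import time

TOKEN_LIFETIME = 24 * 3600   # tokens typically expire after 24 hours

class AuthServer:
    def __init__(self, passwords):
        self.passwords = passwords   # user -> password (hypothetical store)

    def login(self, user, password):
        """Exchange a password once for a time-limited token."""
        if self.passwords.get(user) != password:
            raise PermissionError("bad password")
        return {"user": user, "expires": time.time() + TOKEN_LIFETIME}

class FileServer:
    def open(self, token, path):
        """Accept a token, not a password; expired tokens are refused."""
        if time.time() >= token["expires"]:
            raise PermissionError("token expired")
        return "contents of %s, served to %s" % (path, token["user"])

auth = AuthServer({"alice": "s3cret"})
token = auth.login("alice", "s3cret")    # password used once, then discarded
fs = FileServer()
fs.open(token, "/afs/usr/alice/plan")    # Venus presents the token, not the password
```

The expiry field is what bounds the damage a stolen token can do, mirroring the 24-hour lifetime described above.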
As shown in Figure 4, there are multiple instances of the authentication server, each running on a trusted Vice machine. One of the authentication servers, the master, responds to updates by users and system administrators and asynchronously propagates the updates to other servers. The latter are slaves and only respond to queries. This design provides robustness by allowing users to log in as long as any slave or the master is accessible.
For reasons of standardization, the AFS-3 developers plan to adopt the Kerberos authentication system.⁵ Kerberos provides the functionality of the Andrew authentication mechanism and closely resembles it in design.
`
File system protection. Andrew uses an access list mechanism for file protection. The total rights specified for a user are the union of the rights specified for the user and for the groups he or she belongs to. Access lists are associated with directories rather than individual files. The reduction
`
`
`
`
`
`
in state obtained by this design decision provides conceptual simplicity that is valuable at large scale. An access list can specify negative rights. An entry in a negative rights list indicates denial of the specified rights, with denial overriding possession in case of conflict. Negative rights decouple the problems of rapid revocation and propagation of group membership information and are particularly valuable in a large distributed system.
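Access-list evaluation with negative rights can be sketched like this; the rights and principal names are illustrative.

```python
# Sketch of access-list evaluation with negative rights (illustrative
# rights and principals, not Andrew's actual representation).
def effective_rights(acl, user, groups):
    """Possession is a union over the user and his or her groups;
    any matching negative entry overrides possession."""
    principals = {user} | set(groups)
    granted, denied = set(), set()
    for p in principals:
        granted |= acl.get("positive", {}).get(p, set())
        denied |= acl.get("negative", {}).get(p, set())
    return granted - denied      # denial overrides possession

acl = {
    "positive": {"cs-faculty": {"read", "write"}, "alice": {"admin"}},
    "negative": {"alice": {"write"}},    # rapid revocation for one user
}
assert effective_rights(acl, "alice", ["cs-faculty"]) == {"read", "admin"}
```

Note how revoking alice's write access required touching only one negative entry at one site, with no change to group membership anywhere.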
Although Vice actually enforces protection on the basis of access lists, Venus superimposes an emulation of Unix protection semantics. The owner component of the Unix mode bits on a file indicates readability, writability, or executability. These bits, which indicate what can be done to the file rather than who can do it, are set and examined by Venus but ignored by Vice. The combination of access lists on directories and mode bits on files has proved to be an excellent compromise between protection at fine granularity, conceptual simplicity, and Unix compatibility.
`
Resource usage. A security violation in a distributed system can manifest itself as an unauthorized release or modification of information or as a denial of resources to legitimate users. Andrew's authentication and protection mechanisms guard against unauthorized release and modification of information. Although Andrew controls server disk usage through a per-volume quota mechanism, it does not control resources such as network bandwidth and server CPU cycles. In our experience, the absence of such controls has not proved to be a problem. What has been an occasional problem is the inconvenience to the owner of a workstation caused by the remote use of CPU cycles on that workstation. The paper on security in Andrew³ elaborates on this issue.
`
High availability in Coda
`
The Coda file system, a descendant of AFS-2, is substantially more resilient to server and network failures. The ideal that Coda strives for is constant data availability, allowing a user to continue working regardless of failures elsewhere in the system. Coda provides users with the benefits of a shared data repository but allows them to rely entirely on local resources when that repository is partially or totally inaccessible.
`
`
`
`
different from using an AFS-2 workstation.
`
When network partitions occur, Coda allows data to be updated in each partition but detects and confines conflicting updates as soon as possible after their occurrence. It also provides mechanisms to help users recover from such conflicts.
`
A related goal of Coda is to gracefully integrate the use of portable computers. At present, users manually copy relevant files from Vice, use the machine while isolated from the network, and manually copy updated files back to Vice upon reconnection. These users are effectively performing manual caching of files with write-back on reconnection. If one views the disconnection from Vice as a deliberately induced failure, it is clear that a mechanism for supporting portable machines in isolation is also a mechanism for fault tolerance.
By providing the ability to move seamlessly between zones of normal and disconnected operation, Coda may simplify the use of cordless network technologies such as cellular telephone, packet radio, or infrared communication in distributed file systems. Although such technologies provide client mobility, they often have intrinsic limitations such as short range, inability to operate inside steel-framed buildings, or line-of-sight constraints. These shortcomings are reduced in significance if clients are capable of temporary autonomous operation.
`The design of Coda was presented in
`detail in a