`
`
`
`Scalable, Secure, and
`Highly Available
`Distributed File Access
`
`For the users of a distributed system
`to collaborate effectively, the abil-
`ity to share data easily is vital. Over
`the last decade, distributed file systems
`based on the Unix model have been the
`subject of growing attention. They are now
`widely considered an effective means of
`sharing data in academic and research en-
`vironments. This article presents a sum-
`mary and historical perspective of work
`done by my colleagues, my students, and me in
`designing and implementing such systems
`at Carnegie Mellon University.
`This work began in 1983 in the context
`of Andrew, a joint project of CMU and
`IBM to develop a state-of-the-art comput-
`ing facility for education and research at
`CMU. The project envisioned a dramatic
`increase in computing power made pos-
`sible by the widespread deployment of
`powerful personal workstations. Our char-
`ter was to develop a mechanism that would
`enable the users of these workstations to
`collaborate and share data effectively. We
`decided to build a distributed file system
`for this purpose because it would provide
`the right balance between functionality
`and complexity for our usage environment.
`It was clear from the outset that our
`distributed file system had to possess two
`critical attributes: It had to scale well, so
`that the system could grow to its antici-
`
`May 1990
`
`Mahadev Satyanarayanan
`
`Carnegie Mellon University
`
`
`
`Andrew and Coda are
`
`distributed Unix file
`
`systems that embody
`many of the recent
`advances in solving
`the problem of data
`sharing in large,
`physically dispersed
`workstation
`
`environments.
`
`pated final size of over 5,000 workstations.
`It also had to be secure, so that users could
`be confident of the privacy of their data.
`Neither of these attributes is likely to be
`present in a design by accident, nor can it
`be added as an afterthought. Rather, each
`attribute must be treated as a fundamental
`constraint and given careful attention during
`the design and implementation of a
`system.
`
`0018-9162/90/0500-0009$01.00 © 1990 IEEE
`
`Our design has evolved over time, re-
`sulting in three distinct versions of the
`Andrew file system, called AFS-1, AFS-2,
`and AFS-3. In this article “Andrew file
`system” or “Andrew” will be used as a
`collective term referring to all three ver-
`sions.
`As our user community became more
`dependent on Andrew, the availability of
`data in it became more important. Today, a
`single failure in Andrew can seriously
`inconvenience many users for significant
`periods. To address this problem, we be-
`gan the design of an experimental file
`system called Coda in 1987. Intended for
`the same computing environment as An-
`drew, Coda retains Andrew’s scalability
`and security characteristics while provid-
`ing much higher availability.
`
`The Andrew
`architecture
`
`The Andrew computing paradigm is a
`synthesis of the best features of personal
`computing and timesharing. It combines
`the flexible and visually rich user interface
`available in personal computing with the
`ease of information exchange typical of
`
`
`
`
`
`
`
`
`Figure 1. A high-level view of the An-
`drew architecture. The structure la-
`beled “Vice” is a collection of trusted
`file servers and untrusted networks.
`The nodes labeled “W” are private or
`public workstations, or timesharing
`systems. Software in each such node
`makes the shared files in Vice appear
`as an integral part of that node’s file
`system.
`
`User computing cycles are provided by
`workstations running the Unix operating
`system.
`Data sharing in Andrew is supported by
`a distributed file system that appears as a
`single large subtree of the local file system
`on each workstation. The only files outside
`the shared subtree are temporary files and
`files essential for workstation initializa-
`tion. A process called Venus, running on
`each workstation, mediates shared file
`access. Venus finds files in Vice, caches
`them locally, and performs emulation of
`Unix file system semantics. Both Vice and
`Venus are invisible to workstation pro-
`cesses, which only see a Unix file system,
`one subtree of which is identical on all
`workstations. Processes on two different
`workstations can read and write files in this
`subtree just as if they were running on a
`single timesharing system. Figure 2 de-
`picts the file system view seen by a work-
`station user.
`Our experience with the Andrew archi-
`tecture over the past six years has been
`positive. It is simple and easily understood
`by naive users, and it permits efficient
`implementation. It also offers a number of
`benefits that are particularly valuable on a
`large scale:
`
`- Data sharing is simplified. A worksta-
`tion with a small disk can potentially ac-
`cess any file in Andrew by name. Since the
`file system is location transparent, users do
`not have to remember the machines on
`which files are currently located or where
`files were created. System administrators
`can move files from one server to another
`without inconveniencing users, who are
`completely unaware of such a move.
`- User mobility is supported. A user can
`walk to any workstation in the system and
`access any file in the shared name space. A
`user’s workstation is personal only in the
`sense that he owns it.
`- System administration is easier. Op-
`erations staff can focus on the relatively
`small number of servers, ignoring the more
`numerous and physically dispersed clients.
`Adding a new workstation involves merely
`connecting it to the network and assigning
`it an address.
`- Better security is possible. The servers
`in Vice are physically secure and run
`trusted system software. No user programs
`are executed on servers. Encryption-based
`authentication and transmission are used
`to enforce the security of server-worksta-
`tion communication. Although individuals
`may tamper with the hardware and soft-
`ware on their workstations, their malicious
`
`actions cannot affect users at other work-
`stations.
`- Client autonomy is improved. Work-
`stations can be turned off or physically
`relocated at any time without inconve-
`niencing other users. Backup is needed
`only on the servers, since workstation disks
`are used merely as caches.
`
`Scalability in Andrew
`
`A scalable distributed system is one that
`can easily cope with the addition of users
`and sites, its growth involving minimal
`expense, performance degradation, and
`administrative complexity. We have
`achieved these goals in Andrew by reduc-
`ing static bindings to a bare minimum and
`by maximizing the number of active clients
`that can be supported by a server. The
`following sections describe the evolution
`of our design strategies for scalability in
`Andrew.
`
`AFS-1. AFS-1 was a prototype with the
`primary functions of validating the An-
`drew file system architecture and provid-
`ing rapid feedback on key design deci-
`sions. Each server contained a local file
`system mirroring the structure of the
`shared file system. Vice file status infor-
`mation, such as access lists, was stored in
`shadow directories. If a file was not on a
`server, the search for its name would end in
`a stub directory that identified the server
`containing that file. Since server processes
`could not share memory, their only means
`of sharing data structures was via the local
`file system.
`Clients cached pathname prefix infor-
`mation and used it to direct file requests to
`appropriate servers. The Vice-Venus inter-
`face named files by their full pathnames.
`There was no notion of a low-level name,
`such as the inode in Unix.
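The prefix cache described above can be pictured with a small sketch. It assumes that a client resolves a full pathname by picking the longest cached prefix that matches it; the article does not spell out the matching rule, and the table layout, the server names, and the function name `find_server` are all hypothetical.

```python
def find_server(prefix_table, pathname):
    """Return the server recorded for the longest cached prefix of pathname.

    prefix_table maps a pathname prefix (for example "/afs/usr") to a server
    name. Longest-prefix matching is an assumption made for illustration.
    """
    parts = pathname.strip("/").split("/")
    # Try the longest candidate prefix first, then progressively shorter ones.
    for end in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:end])
        if prefix in prefix_table:
            return prefix_table[prefix]
    return None  # no cached prefix: the client would have to ask a server


# Hypothetical cached prefix information on one workstation.
example_table = {"/afs/usr": "server-a", "/afs/usr/alice": "server-b"}
chosen = find_server(example_table, "/afs/usr/alice/notes.txt")
```

Because files are named by full pathname in this design, every request carries the whole string, and the lookup cost grows with path depth.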
`Venus used a pessimistic approach to
`maintaining cache coherence. All cached
`copies of files were considered suspect.
`Before using a cached file, Venus would
`contact Vice to verify that it had the latest
`version. Each open of a file thus resulted in
`at least one interaction with a server, even
`if the file was already in the cache and up
`to date.
`For the most part, we were pleased with
`AFS-1. Almost every application was able
`to use Vice files without recompilation or
`relinking. There were minor areas of in-
`compatibility with standard Unix seman-
`tics, but these were never serious enough to
`discourage users.
`
`COMPUTER
`
`
`
`[Figure 2 diagram: a root directory whose local entries include tmp, bin, lib, and vmunix, alongside the shared subtree afs.]
`
`Figure 2. File system view at a work-
`station: how the shared files in Vice
`appear to a user. The subtree under
`the directory labeled “afs” is identical
`at all workstations. The other directo-
`ries are local to each workstation.
`Symbolic links can be used to make lo-
`cal directories correspond to directo-
`ries in Vice.
`
`timesharing. A conceptual view of this
`model is shown in Figure 1.
`The large, amoeba-like structure in the
`middle, called Vice, is the information-
`sharing backbone of the system. Although
`represented as a single entity, it actually
`consists of a collection of dedicated file
`servers and a complex local area network.
`
`
`
`
`Design principles from Andrew and Coda
`
`
`
`The design choices of Andrew and
`Coda were guided by a few simple
`principles. They were not specified a
`priori, but emerged in the course of
`our work. We share these principles
`and examples of their application in
`the hope that they will be useful to de-
`signers of other large-scale distributed
`systems. The principles should not be
`applied dogmatically but should be
`used to help crystallize thinking during
`the design process.
`
`- Workstations have the cycles to
`burn. Whenever there is a choice be-
`tween performing an operation on a
`workstation and performing it on a
`central resource, it is preferable to
`pick the workstation. This enhances
`the scalability of the design because it
`lessens the need to increase central
`resources as workstations are added.
`The only functions performed by
`servers in Andrew and Coda are those
`critical to security, integrity, or location
`of data. Further, there is very little in-
`terserver traffic. Pathname translation
`is done on clients rather than on serv-
`ers in AFS-2, AFS-3, and Coda. The
`parallel update protocol in Coda de-
`pends on the client to directly update
`all AVSG members, rather than updat-
`ing one of them and letting it relay the
`update.
`
`- Cache whenever possible.
`Scalability, user mobility, and site au-
`tonomy motivate this principle. Cach-
`ing reduces contention on centralized
`resources and transparently makes
`data available wherever it is being
`used.
`AFS-1 cached files and location in-
`formation. AFS-2 also cached directo-
`ries, as do AFS-3 and Coda. Caching
`is the basis of disconnected operation
`in Coda.
`
`- Exploit file usage properties.
`Knowledge of the nature of file accesses
`in real systems allows better design
`choices to be made. Files can often be
`grouped into a small number of easily
`identifiable classes that reflect their ac-
`cess and modification patterns. These
`class-specific properties provide an op-
`portunity for independent optimization
`and, hence, improved performance.
`Almost one-third of the file references
`in a typical Unix system are to temporary
`files. Since such files are seldom
`shared, Andrew and Coda make them
`part of the local name space. The ex-
`ecutable files of system programs are of-
`ten read but rarely written. AFS-2, AFS-
`3, and Coda therefore support read-only
`replication of these files to improve per-
`formance and availability. Coda's use of
`an optimistic replication strategy is
`based on the premise that sequential
`write sharing of user files is rare.
`
`- Minimize systemwide knowledge
`and change. In a large distributed sys-
`tem, it is difficult to be aware at all times
`of the entire state of the system. It is
`also difficult to update distributed or rep-
`licated data structures consistently. The
`scalability of a design is enhanced if it
`rarely requires global information to be
`monitored or atomically updated.
`Workstations in Andrew and Coda
`monitor only the status of servers from
`which they have cached data. They do
`not require any knowledge of the rest of
`the system. File location information on
`Andrew and Coda servers changes rela-
`tively rarely. Caching by Venus, rather
`than file location changes in Vice, is
`used to deal with movement of users.
`Coda integrates server replication (a
`relatively heavyweight mechanism) with
`caching to improve availability without
`losing scalability. Knowledge of a cach-
`ing site is confined to servers with call-
`backs for the caching site. Coda does
`
`not depend on knowledge of sys-
`temwide topology, nor does it incorpo-
`rate any algorithms requiring sys-
`temwide election or commitment.
`Another instance of the application
`of this principle is the use of negative
`rights. Andrew provides rapid revoca-
`tion by modifications of an access list
`at a single site rather than by sys-
`temwide change of a replicated protec-
`tion database.
`
`- Trust the fewest possible enti-
`ties. A system whose security depends
`on the integrity of the fewest possible
`entities is more likely to remain secure
`as it grows.
`Rather than trusting thousands of
`workstations, security in Andrew and
`Coda is predicated on the integrity of
`the much smaller number of Vice serv-
`ers. The administrators of Vice need
`only ensure the physical security of
`these servers and the software they
`run. Responsibility for workstation in-
`tegrity is delegated to the owner of
`each workstation. Andrew and Coda
`rely on end-to-end encryption rather
`than physical link security.
`
`- Batch if possible. Grouping op-
`erations can improve throughput (and
`hence scalability), although often at the
`cost of latency.
`The transfer of files in large chunks
`in AFS-3 and in their entirety in AFS-1,
`AFS-2, and Coda is an instance of the
`application of this principle. More effi-
`cient network protocols can be used
`when data is transferred en masse
`rather than as individual pages. In
`Coda the second phase of the update
`protocol is deferred and batched. La-
`tency is not increased in this case be-
`cause control can be returned to appli-
`cation programs before the completion
`of the second phase.
`
`AFS-1 was in use for about a year, from
`late 1984 to late 1985. At its peak usage,
`there were about 100 workstations and six
`servers. Performance was usually accept-
`able to about 20 active users per server. But
`sometimes a few intense users caused per-
`formance to degrade intolerably. The sys-
`tem turned out to be difficult to operate and
`maintain, especially because it provided
`
`few tools to help system administrators.
`The embedding of file location informa-
`tion in stub directories made it hard to
`move user files between servers.
`
`AFS-2. The design of AFS-2 was based
`on our experience with AFS-1 as well as on
`extensive performance analysis.1 We re-
`tained the strategy of workstations caching
`
`entire files from a collection of dedicated
`autonomous servers. But we made many
`changes in the realization of this architec-
`ture, especially in cache management,
`name resolution, communication, and
`server process structure.
`A fundamental change in AFS-2 was the
`manner in which cache coherence was
`maintained. Instead of checking with a
`
`
`
`
`
`
`ity for frequently read but rarely updated
`files, such as system programs. The backup
`and restoration mechanism in AFS-2 also
`made use of volume primitives. To back up
`a volume, a read-only clone was first made.
`Then, an asynchronous mechanism trans-
`ferred this frozen snapshot to a staging
`machine from which it was dumped to tape.
`To handle the common case of accidental
`deletion by users, the cloned backup vol-
`ume of each user’s files was made available
`as a read-only subtree of that user’s home
`directory. Thus, users themselves could
`restore files within 24 hours by means of
`normal file operations.
`AFS-2 was in use at CMU from late 1985
`until mid-1989. Our experience confirmed
`that it was indeed an efficient and conve-
`nient system to use at large scale. Con-
`trolled experiments established that it per-
`formed better under load than other con-
`temporary file systems.1,2 Figure 3 presents
`the results of one such experiment.
`
`AFS-3. In 1988, work began on a new
`version of the Andrew file system called
`AFS-3. (For ease of exposition, all changes
`made after the AFS-2 release described in
`Howard et al.1 are described here as pertain-
`ing to AFS-3. In reality, the transition from
`AFS-2 to AFS-3 was gradual.) The revision
`was initiated at CMU and has been contin-
`ued since mid-1989 at Transarc Corpora-
`tion, a commercial venture involving many
`of the original implementers of AFS-3. The
`revision was motivated by the need to pro-
`vide decentralized system administration,
`by the desire to operate over wide area
`networks, and by the goal of using industry
`standards in the implementation.
`AFS-3 supports multiple administrative
`cells, each with its own servers, worksta-
`tions, system administrators, and users.
`Each cell is a completely autonomous
`Andrew environment, but a federation of
`cells can cooperate in presenting users with
`a uniform, seamless filename space. The
`ability to decompose a distributed system
`into cells is important at large scale because
`it allows administrative responsibility to be
`delegated along lines that parallel institu-
`tional boundaries. This makes for smooth
`and efficient system operation.
`The RPC protocol used in AFS-3 pro-
`vides good performance across local and
`wide area networks. In conjunction with the
`cell mechanism, this network capability has
`made possible shared access to a common,
`nationwide file system, distributed over
`nodes such as MIT, the University of Michi-
`gan, and Dartmouth, as well as CMU.
`Venus has been moved into the Unix
`
`
`[Figure 3 plot: benchmark results versus load units for AFS-2 cold cache, AFS-2 warm cache, and NFS.]
`
`Figure 3. AFS-2 versus Sun NFS performance under load on identical client,
`server, and network hardware. A load unit consists of one client workstation
`running an instance of the Andrew benchmark. (Full details of the benchmark
`and experimental configuration can be found in Howard et al.,1 from which this
`graph is adapted.) As the graph clearly indicates, the performance of AFS-2,
`even with a cold cache, degrades much more slowly than that of NFS.
`
`server on each open, Venus now assumed
`that cache entries were valid unless other-
`wise notified. When a workstation cached
`a file or directory, the server promised to
`notify that workstation before allowing a
`modification by any other workstation.
`This promise, known as a callback, re-
`sulted in a considerable reduction in cache
`validation traffic.
`Callback made it feasible for clients to
`cache directories and to translate path-
`names locally. Without callbacks, the
`lookup of every component of a pathname
`would have generated a cache validation
`request. For reasons of integrity, directory
`modifications were made directly on serv-
`ers, as in AFS-1. Each Vice file or direc-
`tory in AFS-2 was identified by a unique
`fixed-length file identifier. Location infor-
`mation was contained in a slowly changing
`volume location database replicated on
`each server.
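The callback idea can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions: the class names, method names, and in-memory dictionaries are hypothetical, there is no RPC or failure handling, and letting the writer keep its own callback after a store is a simplification chosen here, not something the article specifies.

```python
class Server:
    """Toy file server that remembers which clients hold callback promises."""

    def __init__(self):
        self.files = {}      # file identifier -> contents
        self.callbacks = {}  # file identifier -> set of clients to notify

    def fetch(self, client, fid):
        # Handing out a copy implies a promise to notify this client later.
        self.callbacks.setdefault(fid, set()).add(client)
        return self.files[fid]

    def store(self, writer, fid, data):
        # Notify every other cache holder before the modification is allowed.
        for client in self.callbacks.pop(fid, set()):
            if client is not writer:
                client.break_callback(fid)
        self.files[fid] = data
        self.callbacks[fid] = {writer}  # assumption: the writer stays registered


class Client:
    """Toy cache manager that trusts cached entries until notified."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.fetches = 0  # counts interactions with the server on open

    def open(self, fid):
        if fid not in self.cache:  # cached entries are assumed valid
            self.cache[fid] = self.server.fetch(self, fid)
            self.fetches += 1
        return self.cache[fid]

    def write(self, fid, data):
        self.server.store(self, fid, data)
        self.cache[fid] = data

    def break_callback(self, fid):
        self.cache.pop(fid, None)  # discard the now-stale copy
```

In this model, repeated opens of an unchanged file cost one fetch in total, whereas the AFS-1 approach would contact the server on every open.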
`
`AFS-2 used a single process to service
`all clients of a server, thus reducing the
`context switching and paging overheads
`observed in AFS-1. A nonpreemptive
`lightweight process mechanism supported
`concurrency and provided a convenient
`programming abstraction on servers and
`clients. The RPC (remote procedure call)
`mechanism in AFS-2, which was inte-
`grated with the lightweight process mecha-
`nism, supported a very large number of
`active clients and used an optimized bulk-
`transfer protocol for file transfer.
`Besides the changes we made for per-
`formance, we also eliminated AFS-1’s
`inflexible mapping of Vice files to server
`disk storage. This change was the basis of
`a number of mechanisms that improved
`system operability. Vice data in AFS-2
`was organized in terms of a data-structur-
`ing primitive called a volume, a collection
`of files forming a partial subtree of the
`Vice name space. Volumes were glued
`together at mount points to form the com-
`plete name space. Venus transparently
`recognized and crossed mount points dur-
`ing name resolution.
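A minimal sketch of crossing mount points during name resolution follows. It assumes a mount table keyed by (volume, path inside that volume); the table layout, the volume names, and the function name `resolve` are hypothetical and are not taken from the AFS implementation.

```python
def resolve(mounts, root_volume, pathname):
    """Map a pathname in the shared name space to (volume, path inside volume).

    mounts maps (volume, relative path) to the volume mounted at that point.
    """
    volume = root_volume
    inside = []  # path components consumed since the last mount point
    for name in pathname.strip("/").split("/"):
        inside.append(name)
        target = mounts.get((volume, "/".join(inside)))
        if target is not None:
            # A mount point: continue resolution inside the mounted volume.
            volume, inside = target, []
    return volume, "/".join(inside)


# Hypothetical mount table: a user volume glued into the root volume,
# with a read-only backup clone mounted inside the user's home directory.
example_mounts = {
    ("root", "usr/alice"): "user.alice",
    ("user.alice", "backup"): "user.alice.backup",
}
```

Because the pathname-to-volume step is separate from the volume-to-server step, an administrator can move a volume without changing any pathname that users see.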
`Volumes were usually small enough to
`allow many volumes per server disk parti-
`tion. Volumes formed the basis of disk
`quotas. Each system user was typically
`assigned a volume, and each volume was
`assigned a quota. Easily moved between
`servers by system administrators, a vol-
`ume could be used (even for update) while
`it was being moved.
`Read-only replication of volumes made
`it possible to provide increased availabil-
`
`
`
`
`
`Other contemporary distributed file systems
`
`A testimonial to the importance of
`distributed file systems is the large
`number of efforts to build such sys-
`tems in industry and academia. The
`following are some systems currently
`in use:
`
`Sun NFS has been widely viewed
`as a de facto standard since its intro-
`duction in 1985. Portability and
`heterogeneity are the dominant con-
`siderations in its design. Although
`originally developed on Unix, it is now
`available for other operating systems
`such as MS-DOS.
`Apollo Domain is a distributed
`workstation environment whose devel-
`opment began in the early 1980s.
`Since the system was originally in-
`tended for a close-knit team of col-
`
`laborating individuals, scale was not a
`dominant design consideration. But large
`Apollo installations now exist.
`IBM AIX-DS is a collection of distrib-
`uted system services for the AIX operat-
`ing system, a derivative of System V
`Unix. A distributed file system is the pri-
`mary component of AIX-DS. Its goals in-
`clude strict emulation of Unix semantics,
`ability to efficiently support databases,
`and ease of administering a wide range
`of installation configurations.
`AT&T RFS is a distributed file system
`developed for System V Unix. Its most
`distinctive feature is precise emulation of
`local Unix semantics for remote files.
`Sprite is an operating system for net-
`worked uniprocessor and multiprocessor
`workstations, designed at the University
`of California at Berkeley. The goals of the
`
`Sprite file system include efficient use
`of large main memory caches,
`diskless operation, and strict Unix
`emulation.
`Amoeba is a distributed operating
`system built by the Free University
`and CWI (Mathematics Center) in
`Amsterdam. The first version of the
`distributed file system used optimistic
`concurrency control. The current ver-
`sion provides simpler semantics and
`has high performance as its primary
`objective.
`Echo is a distributed file system
`currently being implemented at the
`System Research Center of Digital
`Equipment Corporation. It uses a pri-
`mary site replication scheme, with
`reelection in case the primary site
`fails.
`
`Further reading
`
`Surveys
`
`Satyanarayanan, M., “A Survey of Distrib-
`uted File Systems,” in Annual Review of
`Computer Science, J.F. Traub et al., eds.,
`Annual Reviews, Inc., Palo Alto, Calif.,
`1989.
`
`Svobodova, L., “File Servers for Network-
`Based Distributed Systems,” ACM Comput-
`ing Surveys, Vol. 16, No. 4, Dec. 1984.
`
`Individual systems
`
`Amoeba
`van Renesse, R., H. van Staveren, and A.S.
`Tanenbaum, “The Performance of the
`Amoeba Distributed Operating System,”
`Software Practice and Experience, Vol. 19, No.
`3, Mar. 1989.
`
`Apollo Domain
`Levine, P., “The Apollo Domain Distributed
`File System,” in Theory and Practice of Distrib-
`uted Operating Systems, Y. Paker, J.-T. Ba-
`natre, and M. Bozyigit, eds., NATO ASI Series,
`Springer-Verlag, 1987.
`
`AT&T RFS
`Rifkin, A.P., et al., “RFS Architectural Over-
`view,” Proc. Summer Usenix Conf., Atlanta,
`1986, pp. 248-259.
`
`Echo
`Hisgen, A., et al., “Availability and Consis-
`tency Trade-Offs in the Echo Distributed File
`System,” Proc. Second IEEE Workshop on
`Workstation Operating Systems, CS Press,
`Los Alamitos, Calif., Order No. 2003, Sept.
`1989.
`
`IBM AIX-DS
`Sauer, C.H., et al., “RT PC Distributed Ser-
`vices Overview,” ACM Operating Systems
`Review, Vol. 21, No. 3, July 1987, pp. 18-29.
`
`Sprite
`Ousterhout, J.K., et al., “The Sprite Network
`Operating System,” Computer, Vol. 21, No.
`2, Feb. 1988, pp. 23-36.
`
`Sun NFS
`Sandberg, R., et al., “Design and Implemen-
`tation of the Sun Network File System,”
`Proc. Summer Usenix Conf., Portland, 1985,
`pp. 119-130.
`
`kernel in order to use the vnode file inter-
`cept mechanism from Sun Microsystems,
`a de facto industry standard. The change
`also makes it possible for Venus to cache
`files in large chunks (currently 64 Kbytes)
`rather than in their entirety. This feature
`reduces file-open latency and allows a
`workstation to access files too large to fit
`on its local disk cache.
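To make the chunk arithmetic concrete, the sketch below computes which 64-Kbyte chunks a byte-range read would touch. The chunk size comes from the text; the function name and the idea of indexing chunks from zero are assumptions made for illustration, not a description of the actual Venus code.

```python
CHUNK_SIZE = 64 * 1024  # 64 Kbytes, the chunk size quoted in the text


def chunks_for_range(offset, length):
    """Return the indices of the chunks covering bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))
```

Under this scheme, opening a large file needs only the chunks actually read, which is why open latency drops and why a file larger than the local cache remains usable.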
`
`Security in Andrew
`
`A consequence of large scale is that the
`casual attitude toward security typical of
`close-knit distributed environments is not
`
`
`acceptable. Andrew provides mechanisms
`to enforce security, but we have taken care
`to ensure that these mechanisms do not
`inhibit legitimate use of the system. Of
`course, mechanisms alone cannot guaran-
`tee security; an installation also must fol-
`low proper administrative and operational
`procedures.
`A fundamental question is who enforces
`security. Rather than trusting thousands of
`workstations, Andrew predicates security
`on the integrity of the much smaller num-
`ber of Vice servers. No user software is
`ever run on servers. Workstations may be
`owned privately or located in public areas.
`Andrew assumes that the hardware and
`
`software on workstations may be modified
`in arbitrary ways.
`This section summarizes the main as-
`pects of security in Andrew, pointing out
`the changes that occurred as the system
`evolved. These changes have been small
`compared to the changes for scalability.
`More details on security in Andrew can be
`found in an earlier work.3
`
`Protection domain. The protection do-
`main in Andrew is composed of users and
`groups. A user is an entity, usually a hu-
`man, that can authenticate itself to Vice, be
`held responsible for its actions, and be
`charged for resource consumption. A
`
`[Figure 4 diagram: a master authentication server, slave authentication servers, and a client.]
`
`Figure 4. Major components and relationships involved in authentication in Andrew. Modifications such as password
`changes and additions of new users are made to the master authentication server, which distributes these changes to the
`slaves. When a user logs in, a client can obtain authentication tokens on the user’s behalf from any slave authentication
`server. The client uses these tokens as needed to establish secure connections to file servers.
`
`group is a set of other groups and users.
`Every group is associated with a unique
`user called its owner.
`AFS-1 and AFS-2 supported group in-
`heritance, with a user’s privileges being
`the cumulative privileges of all the groups
`it belonged to, either directly or indirectly.
`Modifications of the protection domain
`were made off line by system administra-
`tors and typically were reflected in the
`system once a day. In AFS-3, modifica-
`tions are made directly by users to a protec-
`tion server that immediately reflects the
`changes in the system. To simplify the
`implementation of the protection server,
`the initial release of AFS-3 does not sup-
`port group inheritance. This may change in
`the future because group inheritance con-
`ceptually simplifies management of the
`protection domain.
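Group inheritance as described for AFS-1 and AFS-2 amounts to a transitive closure over membership. The sketch below assumes a simple dictionary mapping each user or group to the groups that directly contain it; that representation, the group names, and the function name are hypothetical.

```python
def effective_groups(member_of, principal):
    """Return every group the principal belongs to, directly or indirectly.

    member_of maps a user or group name to the set of groups that directly
    contain it. Groups already visited are skipped, so cycles terminate.
    """
    seen = set()
    stack = [principal]
    while stack:
        current = stack.pop()
        for group in member_of.get(current, ()):
            if group not in seen:
                seen.add(group)
                stack.append(group)
    return seen


# Hypothetical protection domain with two levels of nesting.
example_membership = {
    "alice": {"cs-students"},
    "cs-students": {"students"},
    "students": {"campus"},
}
```

Without inheritance, as in the initial AFS-3 release, only the direct memberships would count, which is simpler to implement in a server that must reflect changes immediately.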
`One group is distinguished by the name
`System:Administrators. Membership in
`this group endows special administrative
`privileges, including unrestricted access to
`any file in the system. The use of a
`System:Administrators group rather than a
`pseudo-user (such as “root” in Unix sys-
`tems) has the advantage that the actual
`identity of the user exercising special privi-
`leges is available for use in audit trails.
`
`Authentication. The Andrew RPC
`mechanism provides support for secure,
`authenticated communication between
`mutually suspicious clients and servers, by
`using a variant of the Needham and Schroe-
`der private key algorithm.4 When a user
`logs in on a workstation, his or her pass-
`word is used to obtain tokens from an
`authentication server. These tokens are
`saved by Venus and used as needed to
`establish secure RPC connections to file
`servers on behalf of the user.
`The level of indirection provided by
`tokens improves transparency and secu-
`rity. Venus can establish secure connec-
`tions to file servers without users having
`to supply a password each time a new
`server is contacted. Passwords do not have
`to be stored in the clear on workstations.
`Because tokens typically expire after 24
`hours, the period during which lost tokens
`
`can cause damage is limited.
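The effect of a bounded token lifetime can be shown with a very small model. It captures only the expiry rule; real tokens also carry encrypted material, which is omitted here. The class name, field names, and the use of plain seconds for timestamps are assumptions made for illustration.

```python
from dataclasses import dataclass

TOKEN_LIFETIME_SECONDS = 24 * 60 * 60  # the typical 24-hour lifetime in the text


@dataclass(frozen=True)
class Token:
    """Toy stand-in for an authentication token (no cryptography)."""

    user: str
    issued_at: float  # seconds on an arbitrary clock

    def is_valid(self, now):
        # Valid from the moment of issue until the lifetime has elapsed.
        return 0 <= now - self.issued_at < TOKEN_LIFETIME_SECONDS
```

A stolen token in this model is useless once the lifetime has passed, whereas a stolen password would stay usable until the user changed it.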
`As shown in Figure 4, there are multiple
`instances of the authentication server, each
`running on a trusted Vice machine. One of
`the authentication servers, the master, re-
`sponds to updates by users and system
`administrators and asynchronously propa-
`gates the updates to other servers. The
`latter are slaves and only respond to que-
`ries. This design provides robustness by
`allowing users to log in as long as any slave
`or the master is accessible.
`For reasons of standardization, the AFS-
`3 developers plan to adopt the Kerberos
`authentication system.5 Kerberos provides
`the functionality of the Andrew authenti-
`cation mechanism and closely resembles it
`in design.
`
`File system protection. Andrew uses an
`access list mechanism for file protection.
`The total rights specified for a user are the
`union of the rights specified for the user
`and for the groups he or she belongs to.
`Access lists are associated with directories
`rather than individual files. The reduction
`
`
`
`
`
`
`in state obtained by this design decision
`provides conceptual simplicity that is valu-
`able at large scale. An access list can spec-
`ify negative rights. An entry in a negative
`rights list indicates denial of the specified
`rights, with denial overriding possession
`in case of conflict. Negative rights de-
`couple the problems of rapid revocation
`and propagation of group membership
`information and are particularly valuable
`in a large distributed system.
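The access list rule, union of granted rights with denial overriding possession, can be sketched as follows. The dictionary layout, the right names, and the function name are hypothetical; only the union and the override come from the text.

```python
def effective_rights(positive, negative, user, groups):
    """Compute the rights a user holds on a directory.

    positive and negative map a user or group name to a set of rights.
    The result is the union of all granted rights minus every denied right.
    """
    principals = {user} | set(groups)
    granted = set()
    denied = set()
    for principal in principals:
        granted |= positive.get(principal, set())
        denied |= negative.get(principal, set())
    return granted - denied  # denial overrides possession


# Hypothetical access list: staff may read and write, but alice has been
# individually denied write access without touching the group database.
example_positive = {"staff": {"read", "write"}, "alice": {"lookup"}}
example_negative = {"alice": {"write"}}
```

The negative entry takes effect at the single site holding the access list, so revocation does not have to wait for group membership changes to propagate.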
`Although Vice actually enforces protec-
`tion on the basis of access lists, Venus
`superimposes an emulation of Unix pro-
`tection semantics. The owner component
`of the Unix mode bits on a file indicates
`readability, writability, or executability.
`These bits, which indicate what can be
`done to the file rather than who can do it,
`are set and examined by Venus but ignored
`by Vice. The combination of access lists on
`directories and mode bits on files has
`proved to be an excellent compromise
`between protection at fine granularity,
`conceptual simplicity, and Unix compati-
`bility.
`
`Resource usage. A security violation in
`a distributed system can manifest itself as
`an unauthorized release or modification of
`information or as a denial of resources to
`legitimate users. Andrew’s authentication
`and protection mechanisms guard against
`unauthorized release and modification of
`information. Although Andrew controls
`server disk usage through a per-volume
`quota mechanism, it does not control re-
`sources such as network bandwidth and
`server CPU cycles. In our experience, the
`absence of such controls has not proved to
`be a problem. What has been an occasional
`problem is the inconvenience to the owner
`of a workstation caused by the remote use
`of CPU cycles on that workstation. The
`paper on security in Andrew3 elaborates on
`this issue.
`
`High availability in
`Coda
`
`The Coda file system, a descendant of
`AFS-2, is substantially more resilient to
`server and network failures. The ideal that
`Coda strives for is constant data availabil-
`ity, allowing a user to continue working
`regardless of failures elsewhere in the
`system. Coda provides users with the bene-
`fits of a shared data repository but allows
`them to rely entirely on local resources
`when that repository is partially or totally
`inaccessible.
`
`
`
`When network
`
`partitions occur,
`Coda allows data to be
`
`updated in each partition
`but detects and confines
`
`conflicting updates
`as soon as possible
`after their occurrence.
`
`It also provides
`mechanisms to help
`users recover from
`such conflicts.
`
`A related goal of Coda is to gracefully
`integrate the use of portable computers. At
`present, users manually copy relevant files
`from Vice, use the machine while isolated
`from the network, and manually copy
`updated files back to Vice upon reconnec-
`tion. These users are effectively perform-
`ing manual caching of files with write-
`back on reconnection. If one views the
`disconnection from Vice as a deliberately
`induced failure, it is clear that a mecha-
`nism for supporting portable machines in
`isolation is also a mechanism for fault
`tolerance.
`By providing the ability to move seam-
`lessly between zones of normal and dis-
`connected operation, Coda may simplify
`the use of cordless network technologies
`such as cellular telephone, packet radio, or
`infrared communication in distributed file
`systems. Although such technologies pro-
`vide client mobility, they often have intrin-
`sic limitations such as short range, inabil-
`ity to operate inside steel-framed build-
`ings, or line-of-sig