ACM Transactions on Computer Systems

February 1992, Volume 10, Number 1

A publication of the Association for Computing Machinery
Preface to the Special Issue on Operating Systems Principles
by Anita Jones

3  Disconnected Operation in the Coda File System
   by James J. Kistler and M. Satyanarayanan

26 The Design and Implementation of a Log-Structured File System
   by Mendel Rosenblum and John K. Ousterhout

53 Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism
   by Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy
Disconnected Operation in the Coda File System

JAMES J. KISTLER and M. SATYANARAYANAN
Carnegie Mellon University
Disconnected operation is a mode of operation that enables a client to continue accessing critical data during temporary failures of a shared data repository. An important, though not exclusive, application of disconnected operation is in supporting portable computers. In this paper, we show that disconnected operation is feasible, efficient and usable by describing its design and implementation in the Coda File System. The central idea behind our work is that caching of data, now widely used for performance, can also be exploited to improve availability.
Categories and Subject Descriptors: D.4.3 [Operating Systems]: File Systems Management—distributed file systems; D.4.5 [Operating Systems]: Reliability—fault tolerance; D.4.8 [Operating Systems]: Performance—measurements

General Terms: Design, Experimentation, Measurement, Performance, Reliability

Additional Key Words and Phrases: Disconnected operation, hoarding, optimistic replication, reintegration, second-class replication, server emulation
This work was supported by the Defense Advanced Research Projects Agency (Avionics Lab, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, Ohio, 45433-6543, under Contract F33615-90-C-1465, ARPA Order 7597), the National Science Foundation (PYI Award and Grant ECD-8907068), IBM Corporation (Faculty Development Award, Graduate Fellowship, and Research Initiation Grant), Digital Equipment Corporation (External Research Project Grant), and Bellcore (Information Networking Research Grant).

Authors' address: School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1992 ACM 0734-2071/92/0200-0003 $01.50

ACM Transactions on Computer Systems, Vol. 10, No. 1, February 1992, Pages 3-25.

1. INTRODUCTION
Every serious user of a distributed system has faced situations where critical work has been impeded by a remote failure. His frustration is particularly acute when his workstation is powerful enough to be used standalone, but has been configured to be dependent on remote resources. An important instance of such dependence is the use of data from a distributed file system.

Placing data in a distributed file system simplifies collaboration between users, and allows them to delegate the administration of that data. The growing popularity of distributed file systems such as NFS [16] and AFS [19]
attests to the compelling nature of these considerations. Unfortunately, the users of these systems have to accept the fact that a remote failure at a critical juncture may seriously inconvenience them.

How can we improve this state of affairs? Ideally, we would like to enjoy the benefits of a shared data repository, but be able to continue critical work when that repository is inaccessible. We call the latter mode of operation disconnected operation, because it represents a temporary deviation from normal operation as a client of a shared repository.

In this paper we show that disconnected operation in a file system is indeed feasible, efficient and usable. The central idea behind our work is that caching of data, now widely used to improve performance, can also be exploited to enhance availability. We have implemented disconnected operation in the Coda File System at Carnegie Mellon University.
Our initial experience with Coda confirms the viability of disconnected operation. We have successfully operated disconnected for periods lasting one to two days. For a disconnection of this duration, the process of reconnecting and propagating changes typically takes about a minute. A local disk of 100MB has been adequate for us during these periods of disconnection. Trace-driven simulations indicate that a disk of about half that size should be adequate for disconnections lasting a typical workday.

2. DESIGN OVERVIEW

Coda is designed for an environment consisting of a large collection of untrusted Unix¹ clients and a much smaller number of trusted Unix file servers. The design is optimized for the access and sharing patterns typical of academic and research environments. It is specifically not intended for applications that exhibit highly concurrent, fine granularity data access.

¹ Unix is a trademark of AT&T Bell Telephone Labs.

Each Coda client has a local disk and can communicate with the servers over a high bandwidth network. At certain times, a client may be temporarily unable to communicate with some or all of the servers. This may be due to a server or network failure, or due to the detachment of a portable client from the network.

Clients view Coda as a single, location-transparent shared Unix file system. The Coda namespace is mapped to individual file servers at the granularity of subtrees called volumes. At each client, a cache manager (Venus) dynamically obtains and caches volume mappings.
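
As an illustration of volume mappings, the sketch below shows, in C, one way such a client-side mapping cache could be organized. The names and layout are assumptions for exposition, not Coda's actual data structures.

    #include <stddef.h>
    #include <string.h>

    /*
     * Hypothetical volume-mapping cache. A Venus-like cache manager
     * resolves a pathname to the volume (subtree) containing it,
     * caching mappings so servers need not be consulted on every access.
     */
    struct volmap {
        char prefix[256];   /* root of the volume's subtree */
        int  volume_id;     /* identifies the volume */
        int  servers[8];    /* custodians of the volume */
        int  nservers;
    };

    /* Return the cached mapping whose prefix most specifically matches
     * 'path', or NULL if none is cached (forcing a server lookup). */
    struct volmap *lookup_volume(struct volmap *cache, int n, const char *path)
    {
        struct volmap *best = NULL;
        size_t best_len = 0;
        for (int i = 0; i < n; i++) {
            size_t len = strlen(cache[i].prefix);
            if (len > best_len && strncmp(path, cache[i].prefix, len) == 0) {
                best = &cache[i];
                best_len = len;
            }
        }
        return best;
    }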
Coda uses two distinct, but complementary, mechanisms to achieve high availability. The first mechanism, server replication, allows volumes to have read-write replicas at more than one server. The set of replication sites for a volume is its volume storage group (VSG). The subset of a VSG that is currently accessible is a client's accessible VSG (AVSG). The performance cost of server replication is kept low by caching on disks at clients and through the use of parallel access protocols. Venus uses a cache coherence protocol based on callbacks [9] to guarantee that an open file yields its latest copy in the AVSG. This guarantee is provided by servers notifying clients when their cached copies are no longer valid, each notification being referred to as a 'callback break'. Modifications in Coda are propagated in parallel to all AVSG sites, and eventually to missing VSG sites.
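
To make the VSG and AVSG definitions concrete, here is a short C sketch; server_is_up stands in for whatever reachability probe a client would use, and none of these names are Coda's own.

    #include <stdbool.h>

    #define MAX_REPLICAS 8

    struct vsg { int servers[MAX_REPLICAS]; int n; };

    extern bool server_is_up(int server_id);   /* assumed probe */

    /* The AVSG is simply the currently reachable subset of the VSG. */
    int compute_avsg(const struct vsg *vsg, int avsg[MAX_REPLICAS])
    {
        int n = 0;
        for (int i = 0; i < vsg->n; i++)
            if (server_is_up(vsg->servers[i]))
                avsg[n++] = vsg->servers[i];
        return n;   /* n == 0 means the client must operate disconnected */
    }

    struct cache_entry { int fid; bool valid; /* ... */ };

    /* A callback break tells the client its cached copy is stale;
     * the next open must refetch from the AVSG. */
    void on_callback_break(struct cache_entry *e)
    {
        e->valid = false;
    }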
Disconnected operation, the second high availability mechanism used by Coda, takes effect when the AVSG becomes empty. While disconnected, Venus services file system requests by relying solely on the contents of its cache. Since cache misses cannot be serviced or masked, they appear as failures to application programs and users. When disconnection ends, Venus propagates modifications and reverts to server replication. Figure 1 depicts a typical scenario involving transitions between server replication and disconnected operation.
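
The following fragment sketches this servicing discipline while the AVSG is empty, again with hypothetical names; the choice of errno is illustrative, since the paper does not specify the error code here.

    #include <errno.h>
    #include <stddef.h>

    struct cache_entry;                                /* cached file data */
    extern struct cache_entry *cache_lookup(int fid);  /* assumed */

    /*
     * Illustrative request servicing while disconnected: a cache hit
     * is served locally; a miss cannot be masked and surfaces to the
     * application as an error.
     */
    int disconnected_open(int fid, struct cache_entry **out)
    {
        struct cache_entry *e = cache_lookup(fid);
        if (e == NULL)
            return -ETIMEDOUT;   /* miss: appears as a failure */
        *out = e;
        return 0;                /* hit: serviced from the cache */
    }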
Earlier Coda papers [18, 19] have described server replication in depth. In contrast, this paper restricts its attention to disconnected operation. We discuss server replication only in those areas where its presence has significantly influenced our design for disconnected operation.

3. DESIGN RATIONALE

At a high level, two factors influenced our strategy for high availability. First, we wanted to use conventional, off-the-shelf hardware throughout our system. Second, we wished to preserve transparency by seamlessly integrating the high availability mechanisms of Coda into a normal Unix environment.

At a more detailed level, other considerations influenced our design. These include the need to scale gracefully, the advent of portable workstations, the very different resource, integrity, and security assumptions made about clients and servers, and the need to strike a balance between availability and consistency. We examine each of these issues in the following sections.

3.1 Scalability

Successful distributed systems tend to grow in size. Our experience with Coda's ancestor, AFS, had impressed upon us the need to prepare for growth a priori, rather than treating it as an afterthought [17]. We brought this experience to bear upon Coda in two ways. First, we adopted certain mechanisms that enhance scalability. Second, we drew upon a set of general principles to guide our design choices.

An example of a mechanism we adopted for scalability is callback-based cache coherence. Another such mechanism, whole-file caching, offers the added advantage of a much simpler failure model: a cache miss can only occur on an open, never on a read, write, seek, or close. This, in turn, substantially simplifies the implementation of disconnected operation. A partial-file caching scheme such as that of AFS-4 [22], Echo [8] or MFS [1] would have complicated our implementation and made disconnected operation less transparent.
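
A small sketch of the failure model this buys, under the same naming assumptions as before: with whole-file caching, the only operation that can need a remote fetch is open.

    #include <stdbool.h>

    extern bool file_cached(int fid);        /* assumed cache test */
    extern int  fetch_whole_file(int fid);   /* assumed: fetches entire file */

    /* With whole-file caching, open() is the only operation that can
     * miss; read, write, seek and close touch only the local copy. */
    int whole_file_open(int fid)
    {
        if (!file_cached(fid))
            return fetch_whole_file(fid);    /* sole point of remote failure */
        return 0;
    }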
A scalability principle that has had considerable influence on our design is the placing of functionality on clients rather than servers. Only if integrity or security would have been compromised have we violated this principle. Another scalability principle we have adopted is the avoidance of system-wide rapid change. Consequently, we have rejected strategies that require election or agreement by large numbers of nodes. For example, we have avoided algorithms such as that used in Locus [23] that depend on nodes achieving consensus on the current partition state of the network.
Fig. 1. How disconnected operation relates to server replication. Three servers (mahler, vivaldi, and ravel) have replicas of the volume containing file x. This file is potentially of interest to users at three clients (flute, viola, and harp). Flute is capable of wireless communication (indicated by a dotted line) as well as regular network communication. Proceeding clockwise, the steps of the figure show the value of x seen by each node as the connectivity of the system changes. Note that in step (d), flute is operating disconnected.

3.2 Portable Workstations

Powerful, lightweight and compact laptop computers are commonplace today. It is instructive to observe how a person with data in a shared file system uses such a machine. Typically, he identifies files of interest and downloads them from the shared file system into the local name space for use while isolated. When he returns, he copies modified files back into the shared file system. Such a user is effectively performing manual caching, with write-back upon reconnection!

Early in the design of Coda we realized that disconnected operation could substantially simplify the use of portable clients. Users would not have to use a different name space while isolated, nor would they have to manually propagate changes upon reconnection. Thus portable machines are a champion application for disconnected operation.

The use of portable machines also gave us another insight. The fact that people are able to operate for extended periods in isolation indicates that they are quite good at predicting their future file access needs. This, in turn, suggests that it is reasonable to seek user assistance in augmenting the cache management policy for disconnected operation.

Functionally, involuntary disconnections caused by failures are no different from voluntary disconnections caused by unplugging portable computers. Hence Coda provides a single mechanism to cope with all disconnections. Of course, there may be qualitative differences: user expectations as well as the extent of user cooperation are likely to be different in the two cases.
3.3 First- vs. Second-Class Replication

If disconnected operation is feasible, why is server replication needed at all? The answer to this question depends critically on the very different assumptions made about clients and servers in Coda.

Clients are like appliances: they can be turned off at will and may be unattended for long periods of time. They have limited disk storage capacity, their software and hardware may be tampered with, and their owners may not be diligent about backing up the local disks. Servers are like public utilities: they have much greater disk capacity, they are physically secure, and they are carefully monitored and administered by professional staff.

It is therefore appropriate to distinguish between first-class replicas on servers, and second-class replicas (i.e., cache copies) on clients. First-class replicas are of higher quality: they are more persistent, widely known, secure, available, complete and accurate. Second-class replicas, in contrast, are inferior along all these dimensions. Only by periodic revalidation with respect to a first-class replica can a second-class replica be useful.
The function of a cache coherence protocol is to combine the performance and scalability advantages of a second-class replica with the quality of a first-class replica. When disconnected, the quality of the second-class replica may be degraded because the first-class replica upon which it is contingent is inaccessible. The longer the duration of disconnection, the greater the potential for degradation. Whereas server replication preserves the quality of data in the face of failures, disconnected operation forsakes quality for availability. Hence server replication is important because it reduces the frequency and duration of disconnected operation, which is properly viewed as a measure of last resort.

Server replication is expensive because it requires additional hardware. Disconnected operation, in contrast, costs little. Whether to use server replication or not is thus a trade-off between quality and cost. Coda does permit a volume to have a sole server replica. Therefore, an installation can rely exclusively on disconnected operation if it so chooses.

3.4 Optimistic vs. Pessimistic Replica Control

By definition, a network partition exists between a disconnected second-class replica and all its first-class associates. The choice between two families of replica control strategies, pessimistic and optimistic [5], is therefore central to the design of disconnected operation. A pessimistic strategy avoids conflicting operations by disallowing all partitioned writes or by restricting reads and writes to a single partition. An optimistic strategy provides much higher availability by permitting reads and writes everywhere, and deals with the attendant danger of conflicts by detecting and resolving them after their occurrence.
A pessimistic approach towards disconnected operation would require a client to acquire shared or exclusive control of a cached object prior to disconnection, and to retain such control until reconnection. Possession of exclusive control by a disconnected client would preclude reading or writing at all other replicas. Possession of shared control would allow reading at other replicas, but writes would still be forbidden everywhere.

Acquiring control prior to voluntary disconnection is relatively simple. It is more difficult when disconnection is involuntary, because the system may have to arbitrate among multiple requestors. Unfortunately, the information needed to make a wise decision is not readily available. For example, the system cannot predict which requestors would actually use the object, when they would release control, or what the relative costs of denying them access would be.
Retaining control until reconnection is acceptable in the case of brief disconnections. But it is unacceptable in the case of extended disconnections. A disconnected client with shared control of an object would force the rest of the system to defer all updates until it reconnected. With exclusive control, it would even prevent other users from making a copy of the object. Coercing the client to reconnect may not be feasible, since its whereabouts may not be known. Thus, an entire user community could be at the mercy of a single errant client for an unbounded amount of time.
Placing a time bound on exclusive or shared control, as done in the case of leases [7], avoids this problem but introduces others. Once a lease expires, a disconnected client loses the ability to access a cached object, even if no one else in the system is interested in it. This, in turn, defeats the purpose of disconnected operation which is to provide high availability. Worse, updates already made while disconnected have to be discarded.
An optimistic approach has its own disadvantages. An update made at one disconnected client may conflict with an update at another disconnected or connected client. For optimistic replication to be viable, the system has to be more sophisticated. There needs to be machinery in the system for detecting conflicts, for automating resolution when possible, and for confining damage and preserving evidence for manual repair. Having to repair conflicts manually violates transparency, is an annoyance to users, and reduces the usability of the system.
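
As an illustration of the kind of detection machinery involved, the sketch below compares simple version stamps at reconnection. This is a generic technique shown with hypothetical names, not a description of Coda's actual mechanism.

    #include <stdbool.h>

    /*
     * Illustrative partitioned-update conflict detection. A client
     * records the server version it last saw; at reconnection, a
     * mismatch with the server's current version means someone else
     * updated the object in the interim.
     */
    struct replica_state {
        long base_version;   /* server version when the client cached it */
        bool dirty;          /* updated while disconnected? */
    };

    enum outcome { UNCHANGED, PROPAGATE, CONFLICT };

    enum outcome check_on_reconnect(const struct replica_state *c,
                                    long server_version)
    {
        if (!c->dirty)
            return UNCHANGED;                  /* nothing to write back */
        if (server_version == c->base_version)
            return PROPAGATE;                  /* safe to propagate */
        return CONFLICT;                       /* concurrent partitioned
                                                  updates: resolve, or
                                                  preserve for repair */
    }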
We chose optimistic replication because we felt that its strengths and weaknesses better matched our design goals. The dominant influence on our choice was the low degree of write-sharing typical of Unix. This implied that an optimistic strategy was likely to lead to relatively few conflicts. An optimistic strategy was also consistent with our overall goal of providing the highest possible availability of data.
In principle, we could have chosen a pessimistic strategy for server replication even after choosing an optimistic strategy for disconnected operation. But that would have reduced transparency, because a user would have faced the anomaly of being able to update data when disconnected, but being unable to do so when connected to a subset of the servers. Further, many of the previous arguments in favor of an optimistic strategy also apply to server replication.
Using an optimistic strategy throughout presents a uniform model of the system from the user's perspective. At any time, he is able to read the latest data in his accessible universe and his updates are immediately visible to everyone else in that universe. His accessible universe is usually the entire set of servers and clients. When failures occur, his accessible universe shrinks to the set of servers he can contact, and the set of clients that they, in turn, can contact. In the limit, when he is operating disconnected, his accessible universe consists of just his machine. Upon reconnection, his updates become visible throughout his now-enlarged accessible universe.

4. DETAILED DESIGN AND IMPLEMENTATION

In describing our implementation of disconnected operation, we focus on the client since this is where much of the complexity lies. Section 4.1 describes the physical structure of a client, Section 4.2 introduces the major states of Venus, and Sections 4.3 to 4.5 discuss these states in detail. A description of the server support needed for disconnected operation is contained in Section 4.5.
Fig. 2. Structure of a Coda client.

4.1 Client Structure

Because of the complexity of Venus, we made it a user-level process rather than part of the kernel. The latter approach may have yielded better performance, but would have been less portable and considerably more difficult to debug. Figure 2 illustrates the high-level structure of a Coda client.

Venus intercepts Unix file system calls via the widely used Sun Vnode interface [10]. Since this interface imposes a heavy performance overhead on user-level cache managers, we use a tiny in-kernel MiniCache to filter out many kernel-Venus interactions. The MiniCache contains no support for remote access, disconnected operation or server replication; these functions are handled entirely by Venus.

A system call on a Coda object is forwarded by the Vnode interface to the MiniCache. If possible, the call is serviced by the MiniCache and control is returned to the application. Otherwise, the MiniCache contacts Venus to service the call. This, in turn, may involve contacting Coda servers. Control returns from Venus via the MiniCache to the application program, updating MiniCache state as a side effect. MiniCache state changes may also be initiated by Venus on events such as callback breaks from Coda servers. Measurements from our implementation confirm that the MiniCache is critical for good performance [21].
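
The call path just described can be summarized in a few lines of C; the function and structure names below are placeholders for exposition, not the actual Vnode or MiniCache interfaces.

    /*
     * Illustrative MiniCache filtering discipline: try to satisfy the
     * call in the kernel; fall through to the user-level cache manager
     * (Venus) only when the MiniCache cannot handle it.
     */
    struct vnode_op { int opcode; int fid; /* ... */ };

    extern int  minicache_try(const struct vnode_op *op);   /* 0 if handled */
    extern int  venus_service(const struct vnode_op *op);   /* upcall to Venus */
    extern void minicache_update(const struct vnode_op *op, int result);

    int coda_vnode_dispatch(const struct vnode_op *op)
    {
        if (minicache_try(op) == 0)
            return 0;                      /* fast path: no Venus crossing */

        int result = venus_service(op);    /* may contact Coda servers */
        minicache_update(op, result);      /* cache the outcome as a side effect */
        return result;
    }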
4.2 Venus States

Logically, Venus operates in one of three states: hoarding, emulation, and reintegration. Figure 3 depicts these states and the transitions between them. Venus is normally in the hoarding state, relying on server replication but always on the alert for possible disconnection. Upon disconnection, it enters the emulation state and remains there for the duration of disconnection. Upon reconnection, Venus enters the reintegration state, resynchronizes its cache with its AVSG, and then reverts to the hoarding state. Since all volumes may not be replicated across the same set of servers, Venus can be in different states with respect to different volumes, depending on failure conditions in the system.

Fig. 3. Venus states and transitions. When disconnected, Venus is in the emulation state. It transits to reintegration upon successful reconnection to an AVSG member, and thence to hoarding, where it resumes connected operation.
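
A per-volume state machine capturing the transitions of Figure 3 might look like the following sketch; the event names are assumptions, since the paper defines only the three states.

    enum venus_state { HOARDING, EMULATION, REINTEGRATION };
    enum venus_event { DISCONNECTION, RECONNECTION, REINTEGRATION_DONE };

    /* Venus tracks one state per volume, since different volumes may
     * be served by different (and differently reachable) servers. */
    enum venus_state step(enum venus_state s, enum venus_event e)
    {
        switch (s) {
        case HOARDING:
            return (e == DISCONNECTION) ? EMULATION : HOARDING;
        case EMULATION:
            return (e == RECONNECTION) ? REINTEGRATION : EMULATION;
        case REINTEGRATION:
            /* after propagating changes, resume connected operation */
            return (e == REINTEGRATION_DONE) ? HOARDING : REINTEGRATION;
        }
        return s;
    }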
4.3 Hoarding

The hoarding state is so named because a key responsibility of Venus in this state is to hoard useful data in anticipation of disconnection. However, this is not its only responsibility. Rather, Venus must manage its cache in a manner that balances the needs of connected and disconnected operation. For instance, a user may have indicated that a certain set of files is critical but may currently be using other files. To provide good performance, Venus must cache the latter files. But to be prepared for disconnection, it must also cache the former set of files.

Many factors complicate the implementation of hoarding:

—File reference behavior, especially in the distant future, cannot be predicted with certainty.
—Disconnections and reconnections are often unpredictable.
—The true cost of a cache miss while disconnected is highly variable and hard to quantify.
—Activity at other clients must be accounted for, so that the latest version of an object is in the cache at disconnection.
—Since cache space is finite, the availability of less critical objects may have to be sacrificed in favor of more critical objects.
Personal files
a /coda/usr/jjk d+
a /coda/usr/jjk/papers 100:d+
a /coda/usr/jjk/papers/sosp 1000:c+

System files
a /usr/bin 100:d+
a /usr/etc 100:d+
a /usr/include 100:d+
a /usr/lib 100:d+
a /usr/local/gnu d+
a /usr/local/rcs d+
a /usr/ucb d+

(a)

X11 files (from X11 maintainer)
a /usr/X11/bin/X
a /usr/X11/bin/Xvga
a /usr/X11/bin/mwm
a /usr/X11/bin/startx
a /usr/X11/bin/xclock
a /usr/X11/bin/xinit
a /usr/X11/bin/xterm
a /usr/X11/include/X11/bitmaps c+
a /usr/X11/lib/app-defaults d+
a /usr/X11/lib/fonts/misc c+
a /usr/X11/lib/system.mwmrc

(b)

Venus source files (shared among Coda developers)
a /coda/project/coda/src/venus 100:c+
a /coda/project/coda/include 100:c+
a /coda/project/coda/lib c+

(c)

Fig. 4. Sample hoard profiles. These are typical hoard profiles provided by a Coda user, an application maintainer, and a group of project developers. Each profile is interpreted separately by the HDB front-end program. The 'a' at the beginning of a line indicates an add-entry command. Other commands are delete an entry, clear all entries, and list entries. The modifiers following some pathnames specify nondefault priorities (the default is 10) and/or meta-expansion for the entry. Note that the pathnames beginning with '/usr' are actually symbolic links into '/coda'.
To address these concerns, we manage the cache using a prioritized algorithm, and periodically reevaluate which objects merit retention in the cache via a process known as hoard walking.
4.3.1 Prioritized Cache Management. Venus combines implicit and explicit sources of information in its priority-based cache management algorithm. The implicit information consists of recent reference history, as in traditional caching algorithms. Explicit information takes the form of a per-workstation hoard database (HDB), whose entries are pathnames identifying objects of interest to the user at the workstation.
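
A minimal sketch of how a priority-based policy could combine these two sources of information follows; the weighting is an illustrative assumption, not the formula Venus actually uses.

    /*
     * Illustrative combination of explicit (hoard database) and
     * implicit (recent reference) information into a single cache
     * priority.
     */
    struct cached_object {
        int hoard_priority;   /* from an HDB entry, 0 if none */
        int recent_refs;      /* references in the recent window */
    };

    #define ALPHA 4   /* assumed relative weight of explicit vs. implicit */

    int cache_priority(const struct cached_object *o)
    {
        return ALPHA * o->hoard_priority + o->recent_refs;
    }

    /* On a hoard walk, Venus-like logic would re-sort objects by
     * cache_priority() and evict the lowest-priority ones first. */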

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket