Disconnected Operation in the Coda File System

James J. Kistler and M. Satyanarayanan

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Abstract

Disconnected operation is a mode of operation that enables a client to continue accessing critical data during temporary failures of a shared data repository. An important, though not exclusive, application of disconnected operation is in supporting portable computers. In this paper, we show that disconnected operation is feasible, efficient and usable by describing its design and implementation in the Coda File System. The central idea behind our work is that caching of data, now widely used for performance, can also be exploited to improve availability.

1. Introduction

Every serious user of a distributed system has faced situations where critical work has been impeded by a remote failure. His frustration is particularly acute when his workstation is powerful enough to be used standalone, but has been configured to be dependent on remote resources. An important instance of such dependence is the use of data from a distributed file system.

Placing data in a distributed file system simplifies collaboration between users, and allows them to delegate the administration of that data. The growing popularity of distributed file systems such as NFS [15] and AFS [18] attests to the compelling nature of these considerations. Unfortunately, the users of these systems have to accept the fact that a remote failure at a critical juncture may seriously inconvenience them.

This work was supported by the Defense Advanced Research Projects Agency (Avionics Lab, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, Ohio, 45433-6543 under Contract F33615-90-C-1465, ARPA Order No. 7597), National Science Foundation (PYI Award and Grant No. ECD 8907068), IBM Corporation (Faculty Development Award, Graduate Fellowship, and Research Initiation Grant), Digital Equipment Corporation (External Research Project Grant), and Bellcore (Information Networking Research Grant).

How can we improve this state of affairs? Ideally, we would like to enjoy the benefits of a shared data repository, but be able to continue critical work when that repository is inaccessible. We call the latter mode of operation disconnected operation, because it represents a temporary deviation from normal operation as a client of a shared repository.

In this paper we show that disconnected operation in a file system is indeed feasible, efficient and usable. The central idea behind our work is that caching of data, now widely used to improve performance, can also be exploited to enhance availability. We have implemented disconnected operation in the Coda File System at Carnegie Mellon University.

Our initial experience with Coda confirms the viability of disconnected operation. We have successfully operated disconnected for periods lasting four to five hours. For a disconnection of this duration, the process of reconnecting and propagating changes typically takes about a minute. A local disk of 100MB has been adequate for us during these periods of disconnection. Trace-driven simulations indicate that a disk of about half that size should be adequate for disconnections lasting a typical workday.

2. Design Overview

Coda is designed for an environment consisting of a large collection of untrusted Unix(1) clients and a much smaller number of trusted Unix file servers. The design is optimized for the access and sharing patterns typical of academic and research environments. It is specifically not intended for applications that exhibit highly concurrent, fine granularity data access.

Each Coda client has a local disk and can communicate with the servers over a high bandwidth network. At certain times, a client may be temporarily unable to communicate with some or all of the servers. This may be due to a server or network failure, or due to the detachment of a portable client from the network.

(1) Unix is a trademark of AT&T.

[Figure 1 consists of six panels, (a) through (f), showing servers mahler, vivaldi, and ravel and clients flute, viola, and harp, together with the value of x visible at each node as connectivity changes.]

Three servers (mahler, vivaldi, and ravel) have replicas of the volume containing file x. This file is potentially of interest to users at three clients (flute, viola, and harp). Flute is capable of wireless communication (indicated by a dotted line) as well as regular network communication. Proceeding clockwise, the steps above show the value of x seen by each node as the connectivity of the system changes. Note that in step (d), flute is operating disconnected.

Figure 1: How Disconnected Operation Relates to Server Replication

Clients view Coda as a single, location-transparent shared Unix file system. The Coda namespace is mapped to individual file servers at the granularity of subtrees called volumes. At each client, a cache manager (Venus) dynamically obtains and caches volume mappings.

Coda uses two distinct, but complementary, mechanisms to achieve high availability. The first mechanism, server replication, allows volumes to have read-write replicas at more than one server. The set of replication sites for a volume is its volume storage group (VSG). The subset of a VSG that is currently accessible is a client’s accessible VSG (AVSG). The performance cost of server replication is kept low by caching on disks at clients and through the use of parallel access protocols. Venus uses a cache coherence protocol based on callbacks [9] to guarantee that an open of a file yields its latest copy in the AVSG. This guarantee is provided by servers notifying clients when their cached copies are no longer valid, each notification being referred to as a "callback break." Modifications in Coda are propagated in parallel to all AVSG sites, and eventually to missing VSG sites.

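To make the VSG/AVSG distinction concrete, the sketch below models a volume's replication sites and the rule that disconnected operation begins when the accessible subset becomes empty. The structures and the reachability flags are illustrative assumptions, not Coda's actual data structures.

#include <stdbool.h>
#include <stddef.h>

#define MAX_VSG 8                    /* assumed bound on replication sites */

struct volume {
    const char *server[MAX_VSG];     /* volume storage group (VSG)           */
    bool        reachable[MAX_VSG];  /* filled in by some reachability probe */
    size_t      vsg_size;
};

/* The AVSG is simply the currently reachable subset of the VSG. */
static size_t avsg_size(const struct volume *v)
{
    size_t n = 0;
    for (size_t i = 0; i < v->vsg_size; i++)
        if (v->reachable[i])
            n++;
    return n;
}

/* Disconnected operation begins, for this volume, when its AVSG is empty. */
static bool is_disconnected(const struct volume *v)
{
    return avsg_size(v) == 0;
}
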
Disconnected operation, the second high availability mechanism used by Coda, takes effect when the AVSG becomes empty. While disconnected, Venus services file system requests by relying solely on the contents of its cache. Since cache misses cannot be serviced or masked, they appear as failures to application programs and users. When disconnection ends, Venus propagates modifications and reverts to server replication. Figure 1 depicts a typical scenario involving transitions between server replication and disconnected operation.

Earlier Coda papers [17, 18] have described server replication in depth. In contrast, this paper restricts its attention to disconnected operation. We discuss server replication only in those areas where its presence has significantly influenced our design for disconnected operation.

3. Design Rationale

At a high level, two factors influenced our strategy for high availability. First, we wanted to use conventional, off-the-shelf hardware throughout our system. Second, we wished to preserve transparency by seamlessly integrating the high availability mechanisms of Coda into a normal Unix environment.

At a more detailed level, other considerations influenced our design. These include the need to scale gracefully, the advent of portable workstations, the very different resource, integrity, and security assumptions made about clients and servers, and the need to strike a balance between availability and consistency. We examine each of these issues in the following sections.

3.1. Scalability

Successful distributed systems tend to grow in size. Our experience with Coda’s ancestor, AFS, had impressed upon us the need to prepare for growth a priori, rather than treating it as an afterthought [16]. We brought this experience to bear upon Coda in two ways. First, we adopted certain mechanisms that enhance scalability. Second, we drew upon a set of general principles to guide our design choices.

An example of a mechanism we adopted for scalability is callback-based cache coherence. Another such mechanism, whole-file caching, offers the added advantage of a much simpler failure model: a cache miss can only occur on an open, never on a read, write, seek, or close. This, in turn, substantially simplifies the implementation of disconnected operation. A partial-file caching scheme such as that of AFS-4 [21], Echo [8] or MFS [1] would have complicated our implementation and made disconnected operation less transparent.

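The simpler failure model of whole-file caching can be illustrated with a sketch: the only place a fetch, and therefore a possible miss, can occur is inside the open path; once the whole file sits in a local container file, read, write, seek, and close are purely local. The helper and the container-file scheme below are assumptions for illustration, not the Coda implementation.

#include <fcntl.h>

/* Illustrative stand-ins: whether the servers are currently reachable, and
 * a helper that copies an entire remote file into a local container file. */
static int servers_reachable = 1;

static int fetch_whole_file(const char *remote_path, const char *container_path)
{
    (void)remote_path; (void)container_path;
    return servers_reachable ? 0 : -1;      /* pretend the transfer happened */
}

/* With whole-file caching, the only place a cache miss can surface is open. */
int cached_open(const char *remote_path, const char *container_path, int flags)
{
    if (fetch_whole_file(remote_path, container_path) < 0)
        return -1;     /* a miss while disconnected appears as a failure here */

    /* Subsequent read/write/lseek/close operate on the local container file
     * and therefore can never miss.                                          */
    return open(container_path, flags);
}
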
A scalability principle that has had considerable influence on our design is the placing of functionality on clients rather than servers. Only if integrity or security would have been compromised have we violated this principle. Another scalability principle we have adopted is the avoidance of system-wide rapid change. Consequently, we have rejected strategies that require election or agreement by large numbers of nodes. For example, we have avoided algorithms such as that used in Locus [22] that depend on nodes achieving consensus on the current partition state of the network.

3.2. Portable Workstations

Powerful, lightweight and compact laptop computers are commonplace today. It is instructive to observe how a person with data in a shared file system uses such a machine. Typically, he identifies files of interest and downloads them from the shared file system into the local name space for use while isolated. When he returns, he copies modified files back into the shared file system. Such a user is effectively performing manual caching, with write-back upon reconnection!

Early in the design of Coda we realized that disconnected operation could substantially simplify the use of portable clients. Users would not have to use a different name space while isolated, nor would they have to manually propagate changes upon reconnection. Thus portable machines are a champion application for disconnected operation.

The use of portable machines also gave us another insight. The fact that people are able to operate for extended periods in isolation indicates that they are quite good at predicting their future file access needs. This, in turn, suggests that it is reasonable to seek user assistance in augmenting the cache management policy for disconnected operation.

Functionally, involuntary disconnections caused by failures are no different from voluntary disconnections caused by unplugging portable computers. Hence Coda provides a single mechanism to cope with all disconnections. Of course, there may be qualitative differences: user expectations as well as the extent of user cooperation are likely to be different in the two cases.

3.3. First vs Second Class Replication

If disconnected operation is feasible, why is server replication needed at all? The answer to this question depends critically on the very different assumptions made about clients and servers in Coda.

Clients are like appliances: they can be turned off at will and may be unattended for long periods of time. They have limited disk storage capacity, their software and hardware may be tampered with, and their owners may not be diligent about backing up the local disks. Servers are like public utilities: they have much greater disk capacity, they are physically secure, and they are carefully monitored and administered by professional staff.

It is therefore appropriate to distinguish between first class replicas on servers, and second class replicas (i.e., cache copies) on clients. First class replicas are of higher quality: they are more persistent, widely known, secure, available, complete and accurate. Second class replicas, in contrast, are inferior along all these dimensions. Only by periodic revalidation with respect to a first class replica can a second class replica be useful.

The function of a cache coherence protocol is to combine the performance and scalability advantages of a second class replica with the quality of a first class replica. When disconnected, the quality of the second class replica may be degraded because the first class replica upon which it is contingent is inaccessible. The longer the duration of disconnection, the greater the potential for degradation. Whereas server replication preserves the quality of data in the face of failures, disconnected operation forsakes quality for availability. Hence server replication is important because it reduces the frequency and duration of disconnected operation, which is properly viewed as a measure of last resort.

Server replication is expensive because it requires additional hardware. Disconnected operation, in contrast, costs little. Whether to use server replication or not is thus a tradeoff between quality and cost. Coda does permit a volume to have a sole server replica. Therefore, an installation can rely exclusively on disconnected operation if it so chooses.

3.4. Optimistic vs Pessimistic Replica Control

By definition, a network partition exists between a disconnected second class replica and all its first class associates. The choice between two families of replica control strategies, pessimistic and optimistic [5], is therefore central to the design of disconnected operation. A pessimistic strategy avoids conflicting operations by disallowing all partitioned writes or by restricting reads and writes to a single partition. An optimistic strategy provides much higher availability by permitting reads and writes everywhere, and deals with the attendant danger of conflicts by detecting and resolving them after their occurrence.

A pessimistic approach towards disconnected operation would require a client to acquire shared or exclusive control of a cached object prior to disconnection, and to retain such control until reconnection. Possession of exclusive control by a disconnected client would preclude reading or writing at all other replicas. Possession of shared control would allow reading at other replicas, but writes would still be forbidden everywhere.

Acquiring control prior to voluntary disconnection is relatively simple. It is more difficult when disconnection is involuntary, because the system may have to arbitrate among multiple requestors. Unfortunately, the information needed to make a wise decision is not readily available. For example, the system cannot predict which requestors will actually use the object, when they will release control, or what the relative costs of denying them access would be.

Retaining control until reconnection is acceptable in the case of brief disconnections. But it is unacceptable in the case of extended disconnections. A disconnected client with shared control of an object would force the rest of the system to defer all updates until it reconnected. With exclusive control, it would even prevent other users from making a copy of the object. Coercing the client to reconnect may not be feasible, since its whereabouts may not be known. Thus, an entire user community could be at the mercy of a single errant client for an unbounded amount of time.

Placing a time bound on exclusive or shared control, as done in the case of leases [7], avoids this problem but introduces others. Once a lease expires, a disconnected client loses the ability to access a cached object, even if no one else in the system is interested in it. This, in turn, defeats the purpose of disconnected operation, which is to provide high availability. Worse, updates already made while disconnected have to be discarded.

An optimistic approach has its own disadvantages. An update made at one disconnected client may conflict with an update at another disconnected or connected client. For optimistic replication to be viable, the system has to be more sophisticated. There needs to be machinery in the system for detecting conflicts, for automating resolution when possible, and for confining damage and preserving evidence for manual repair. Having to repair conflicts manually violates transparency, is an annoyance to users, and reduces the usability of the system.

We chose optimistic replication because we felt that its strengths and weaknesses better matched our design goals. The dominant influence on our choice was the low degree of write-sharing typical of Unix. This implied that an optimistic strategy was likely to lead to relatively few conflicts. An optimistic strategy was also consistent with our overall goal of providing the highest possible availability of data.

In principle, we could have chosen a pessimistic strategy for server replication even after choosing an optimistic strategy for disconnected operation. But that would have reduced transparency, because a user would have faced the anomaly of being able to update data when disconnected, but being unable to do so when connected to a subset of the servers. Further, many of the previous arguments in favor of an optimistic strategy also apply to server replication.

Using an optimistic strategy throughout presents a uniform model of the system from the user’s perspective. At any time, he is able to read the latest data in his accessible universe and his updates are immediately visible to everyone else in that universe. His accessible universe is usually the entire set of servers and clients. When failures occur, his accessible universe shrinks to the set of servers he can contact, and the set of clients that they, in turn, can contact. In the limit, when he is operating disconnected, his accessible universe consists of just his machine. Upon reconnection, his updates become visible throughout his now-enlarged accessible universe.

4. Detailed Design and Implementation

In describing our implementation of disconnected operation, we focus on the client since this is where much of the complexity lies. Section 4.1 describes the physical structure of a client, Section 4.2 introduces the major states of Venus, and Sections 4.3 to 4.5 discuss these states in detail. A description of the server support needed for disconnected operation is contained in Section 4.5.

4.1. Client Structure

Because of the complexity of Venus, we made it a user level process rather than part of the kernel. The latter approach may have yielded better performance, but would have been less portable and considerably more difficult to debug. Figure 2 illustrates the high-level structure of a Coda client.

[Figure 2 shows an application issuing calls through the System Call Interface; the Vnode Interface directs Coda calls to the in-kernel Coda MiniCache, which communicates with the user-level Venus process, which in turn communicates with the Coda servers.]

Figure 2: Structure of a Coda Client

Venus intercepts Unix file system calls via the widely-used Sun Vnode interface [10]. Since this interface imposes a heavy performance overhead on user-level cache managers, we use a tiny in-kernel MiniCache to filter out many kernel-Venus interactions. The MiniCache contains no support for remote access, disconnected operation or server replication; these functions are handled entirely by Venus.

A system call on a Coda object is forwarded by the Vnode interface to the MiniCache. If possible, the call is serviced by the MiniCache and control is returned to the application. Otherwise, the MiniCache contacts Venus to service the call. This, in turn, may involve contacting Coda servers. Control returns from Venus via the MiniCache to the application program, updating MiniCache state as a side effect. MiniCache state changes may also be initiated by Venus on events such as callback breaks from Coda servers. Measurements from our implementation confirm that the MiniCache is critical for good performance [20].

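The filtering role of the MiniCache can be sketched as follows: a small in-kernel table answers repeated lookups directly and falls back to an upcall to the user-level Venus process only on a miss, while callback breaks invalidate the kernel copy. The table layout and the upcall stub are hypothetical, not the actual MiniCache interface.

#include <string.h>

#define MINICACHE_SLOTS 64

struct mini_entry {
    char path[256];      /* name recently resolved through Venus */
    int  vnode_id;       /* kernel handle for the cached object  */
    int  valid;
};

static struct mini_entry minicache[MINICACHE_SLOTS];

/* Stand-in for the expensive upcall to the user-level Venus process. */
static int venus_upcall_lookup(const char *path) { (void)path; return 1; }

/* Fast path: answer from the in-kernel table. Slow path: ask Venus, then
 * remember the answer so the next lookup avoids the kernel-Venus crossing. */
int coda_lookup(const char *path)
{
    unsigned slot = 0;
    for (unsigned i = 0; i < MINICACHE_SLOTS; i++) {
        if (minicache[i].valid && strcmp(minicache[i].path, path) == 0)
            return minicache[i].vnode_id;       /* serviced by the MiniCache */
        if (!minicache[i].valid)
            slot = i;                           /* remember a free slot      */
    }

    int vnode = venus_upcall_lookup(path);      /* may contact Coda servers  */
    strncpy(minicache[slot].path, path, sizeof minicache[slot].path - 1);
    minicache[slot].path[sizeof minicache[slot].path - 1] = '\0';
    minicache[slot].vnode_id = vnode;
    minicache[slot].valid = 1;
    return vnode;
}

/* A callback break relayed by Venus invalidates the corresponding entry. */
void minicache_invalidate(const char *path)
{
    for (unsigned i = 0; i < MINICACHE_SLOTS; i++)
        if (minicache[i].valid && strcmp(minicache[i].path, path) == 0)
            minicache[i].valid = 0;
}
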
4.2. Venus States

Logically, Venus operates in one of three states: hoarding, emulation, and reintegration. Figure 3 depicts these states and the transitions between them. Venus is normally in the hoarding state, relying on server replication but always on the alert for possible disconnection. Upon disconnection, it enters the emulation state and remains there for the duration of disconnection. Upon reconnection, Venus enters the reintegration state, resynchronizes its cache with its AVSG, and then reverts to the hoarding state. Since all volumes may not be replicated across the same set of servers, Venus can be in different states with respect to different volumes, depending on failure conditions in the system.

[Figure 3 shows the three Venus states, Hoarding, Emulation, and Reintegration, with transitions labeled disconnection (hoarding to emulation), physical reconnection (emulation to reintegration), and logical reconnection (reintegration to hoarding).]

When disconnected, Venus is in the emulation state. It transits to reintegration upon successful reconnection to an AVSG member, and thence to hoarding, where it resumes connected operation.

Figure 3: Venus States and Transitions

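Since the transitions of Figure 3 are tracked per volume, the state machine can be sketched as a small transition function. The enum and event names below are illustrative rather than Venus's internal types.

/* Venus operates in one of three states, tracked per volume. */
enum venus_state { HOARDING, EMULATION, REINTEGRATION };

enum venus_event {
    DISCONNECTION,          /* the AVSG for the volume became empty           */
    PHYSICAL_RECONNECTION,  /* an AVSG member became reachable again          */
    LOGICAL_RECONNECTION    /* reintegration finished; cache resynchronized   */
};

/* Transition function corresponding to Figure 3. Events that do not apply
 * in the current state leave the state unchanged.                           */
enum venus_state venus_transition(enum venus_state s, enum venus_event e)
{
    switch (s) {
    case HOARDING:
        return (e == DISCONNECTION) ? EMULATION : HOARDING;
    case EMULATION:
        return (e == PHYSICAL_RECONNECTION) ? REINTEGRATION : EMULATION;
    case REINTEGRATION:
        return (e == LOGICAL_RECONNECTION) ? HOARDING : REINTEGRATION;
    }
    return s;
}

Applying the same function independently to each volume's state variable captures why a client can be hoarding for one volume while emulating for another.
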
4.3. Hoarding

The hoarding state is so named because a key responsibility of Venus in this state is to hoard useful data in anticipation of disconnection. However, this is not its only responsibility. Rather, Venus must manage its cache in a manner that balances the needs of connected and disconnected operation. For instance, a user may have indicated that a certain set of files is critical but may currently be using other files. To provide good performance, Venus must cache the latter files. But to be prepared for disconnection, it must also cache the former set of files.

Many factors complicate the implementation of hoarding:

• File reference behavior, especially in the distant future, cannot be predicted with certainty.
• Disconnections and reconnections are often unpredictable.
• The true cost of a cache miss while disconnected is highly variable and hard to quantify.
• Activity at other clients must be accounted for, so that the latest version of an object is in the cache at disconnection.
• Since cache space is finite, the availability of less critical objects may have to be sacrificed in favor of more critical objects.

To address these concerns, we manage the cache using a prioritized algorithm, and periodically reevaluate which objects merit retention in the cache via a process known as hoard walking.

# Personal files
a /coda/usr/jjk d+
a /coda/usr/jjk/papers 100:d+
a /coda/usr/jjk/papers/sosp 1000:d+

# System files
a /usr/bin 100:d+
a /usr/etc 100:d+
a /usr/include 100:d+
a /usr/lib 100:d+
a /usr/local/gnu d+
a /usr/local/rcs d+
a /usr/ucb d+

(a)

# X11 files
# (from X11 maintainer)
a /usr/X11/bin/X
a /usr/X11/bin/Xvga
a /usr/X11/bin/mwm
a /usr/X11/bin/startx
a /usr/X11/bin/xclock
a /usr/X11/bin/xinit
a /usr/X11/bin/xterm
a /usr/X11/include/X11/bitmaps c+
a /usr/X11/lib/app-defaults d+
a /usr/X11/lib/fonts/misc c+
a /usr/X11/lib/system.mwmrc

(b)

# Venus source files
# (shared among Coda developers)
a /coda/project/coda/src/venus 100:c+
a /coda/project/coda/include 100:c+
a /coda/project/coda/lib c+

(c)

These are typical hoard profiles provided by a Coda user, an application maintainer, and a group of project developers. Each profile is interpreted separately by the HDB front-end program. The ’a’ at the beginning of a line indicates an add-entry command. Other commands are delete an entry, clear all entries, and list entries. The modifiers following some pathnames specify non-default priorities (the default is 10) and/or meta-expansion for the entry. Note that the pathnames beginning with ’/usr’ are actually symbolic links into ’/coda’.

Figure 4: Sample Hoard Profiles

4.3.1. Prioritized Cache Management

Venus combines implicit and explicit sources of information in its priority-based cache management algorithm. The implicit information consists of recent reference history, as in traditional caching algorithms. Explicit information takes the form of a per-workstation hoard database (HDB), whose entries are pathnames identifying objects of interest to the user at that workstation.

A simple front-end program allows a user to update the HDB using command scripts called hoard profiles, such as those shown in Figure 4. Since hoard profiles are just files, it is simple for an application maintainer to provide a common profile for his users, or for users collaborating on a project to maintain a common profile. A user can customize his HDB by specifying different combinations of profiles or by executing front-end commands interactively. To facilitate construction of hoard profiles, Venus can record all file references observed between a pair of start and stop events indicated by a user.

To reduce the verbosity of hoard profiles and the effort needed to maintain them, Venus supports meta-expansion of HDB entries. As shown in Figure 4, if the letter ’c’ (or ’d’) follows a pathname, the command also applies to immediate children (or all descendants). A ’+’ following the ’c’ or ’d’ indicates that the command applies to all future as well as present children or descendants. A hoard entry may optionally indicate a hoard priority, with higher priorities indicating more critical objects.

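A minimal sketch of decoding one add-entry line, assuming only the format shown in Figure 4: an optional "priority:" prefix on the modifier, a 'c' or 'd' expansion letter, and an optional '+' for future children or descendants. The structure and parser below are illustrative, not the actual HDB front-end code.

#include <stdio.h>
#include <string.h>

enum expansion { NONE, CHILDREN, DESCENDANTS };

struct hdb_entry {
    char           path[256];
    int            priority;        /* default hoard priority is 10          */
    enum expansion expand;
    int            include_future;  /* '+': future children/descendants too  */
};

/* Decode one "a <pathname> [priority:][c|d][+]" line from a hoard profile. */
int parse_hdb_add(const char *line, struct hdb_entry *e)
{
    char mods[32] = "";

    e->priority = 10;
    e->expand = NONE;
    e->include_future = 0;

    /* e.g. "a /coda/usr/jjk/papers 100:d+" -> path plus optional modifier */
    if (sscanf(line, "a %255s %31s", e->path, mods) < 1)
        return -1;

    const char *m = mods;
    if (strchr(m, ':')) {                      /* leading "priority:" part */
        sscanf(m, "%d", &e->priority);
        m = strchr(m, ':') + 1;
    }
    if (*m == 'c') e->expand = CHILDREN;
    if (*m == 'd') e->expand = DESCENDANTS;
    if (*m && m[strlen(m) - 1] == '+') e->include_future = 1;

    return 0;
}
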
The current priority of a cached object is a function of its hoard priority as well as a metric representing recent usage. The latter is updated continuously in response to new references, and serves to age the priority of objects no longer in the working set. Objects of the lowest priority are chosen as victims when cache space has to be reclaimed.

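One way to realize such a policy is sketched below, assuming a simple weighted combination of the hoard priority and a recency metric that decays as other objects are referenced. The weights and the decay rule are assumptions chosen for illustration, not the formula Venus actually uses.

/* Illustrative priority computation; only the shape of the policy matters. */

#define HOARD_WEIGHT  1.0
#define RECENT_WEIGHT 1.0

struct cached_object {
    int      hoard_priority;   /* from the HDB; 0 if not mentioned there      */
    unsigned last_ref;         /* value of a global reference counter when
                                  this object was last used                   */
};

static unsigned ref_clock;     /* bumped on every reference */

static double recency(const struct cached_object *o)
{
    /* Decays toward 0 as other objects are referenced after this one. */
    unsigned age = ref_clock - o->last_ref;
    return 1.0 / (1.0 + (double)age);
}

/* Current priority combines explicit (hoard) and implicit (usage) information.
 * The lowest-priority objects are the first victims when space is reclaimed. */
double current_priority(const struct cached_object *o)
{
    return HOARD_WEIGHT * o->hoard_priority + RECENT_WEIGHT * recency(o);
}

void note_reference(struct cached_object *o)
{
    o->last_ref = ++ref_clock;
}
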
To resolve the pathname of a cached object while disconnected, it is imperative that all the ancestors of the object also be cached. Venus must therefore ensure that a cached directory is not purged before any of its descendants. This hierarchical cache management is not needed in traditional file caching schemes because cache misses during name translation can be serviced, albeit at a performance cost. Venus performs hierarchical cache management by assigning infinite priority to directories with cached children. This automatically forces replacement to occur bottom-up.

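The bottom-up replacement property can be obtained by treating any directory that still has cached children as unevictable, for example with a count of cached children as in the sketch below; this is one illustrative realization, not necessarily how Venus implements it.

#include <limits.h>

struct cache_entry {
    struct cache_entry *parent;     /* NULL for the root of the cached tree */
    int                 cached_children;
    int                 base_priority;
};

/* A directory with cached children is never a replacement victim. */
int effective_priority(const struct cache_entry *e)
{
    return e->cached_children > 0 ? INT_MAX : e->base_priority;
}

/* Maintain the counts as objects enter and leave the cache, so that a
 * subtree can only be evicted bottom-up: children first, then the parent. */
void on_insert(struct cache_entry *e) { if (e->parent) e->parent->cached_children++; }
void on_evict (struct cache_entry *e) { if (e->parent) e->parent->cached_children--; }
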
4.3.2. Hoard Walking

We say that a cache is in equilibrium, signifying that it meets user expectations about availability, when no uncached object has a higher priority than a cached object. Equilibrium may be disturbed as a result of normal activity. For example, suppose an object, A, is brought into the cache on demand, replacing an object, B. Further suppose that B is mentioned in the HDB, but A is not. Some time after activity on A ceases, its priority will decay below the hoard priority of B. The cache is no longer in equilibrium, since the cached object A has lower priority than the uncached object B.

Venus periodically restores equilibrium by performing an operation known as a hoard walk. A hoard walk occurs every 10 minutes in our current implementation, but one may be explicitly requested by a user prior to voluntary disconnection. The walk occurs in two phases. First, the name bindings of HDB entries are reevaluated to reflect update activity by other Coda clients. For example, new children may have been created in a directory whose pathname is specified with the ’+’ option in the HDB. Second, the priorities of all entries in the cache and HDB are reevaluated, and objects fetched or evicted as needed to restore equilibrium.

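The second phase of a hoard walk can be sketched as a loop that swaps the lowest-priority cached object for the highest-priority uncached one until equilibrium holds. The fixed arrays and priorities below are toy data for illustration; the first phase (re-expanding name bindings) is only indicated in a comment.

#include <stdio.h>

#define N 6            /* objects known to this client (cached + uncached) */

static int priority[N] = { 50, 10, 1000, 5, 100, 20 };
static int cached[N]   = {  1,  1,    0, 1,   0,  0 };

static int best_uncached(void)
{
    int best = -1;
    for (int i = 0; i < N; i++)
        if (!cached[i] && (best < 0 || priority[i] > priority[best])) best = i;
    return best;
}

static int worst_cached(void)
{
    int worst = -1;
    for (int i = 0; i < N; i++)
        if (cached[i] && (worst < 0 || priority[i] < priority[worst])) worst = i;
    return worst;
}

/* Phase 1 (not shown): re-expand HDB name bindings and recompute priorities.
 * Phase 2: restore equilibrium, i.e. no uncached object outranks a cached one. */
void hoard_walk_phase2(void)
{
    for (;;) {
        int in = best_uncached(), out = worst_cached();
        if (in < 0 || out < 0 || priority[in] <= priority[out]) break;
        cached[out] = 0;              /* evict the victim        */
        cached[in]  = 1;              /* fetch the favored object */
        printf("evict %d (prio %d), fetch %d (prio %d)\n",
               out, priority[out], in, priority[in]);
    }
}

int main(void) { hoard_walk_phase2(); return 0; }
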
Hoard walks also address a problem arising from callback breaks. In traditional callback-based caching, data is refetched only on demand after a callback break. But in Coda, such a strategy may result in a critical object being unavailable should a disconnection occur before the next reference to it. Refetching immediately upon callback break avoids this problem, but ignores a key characteristic of Unix environments: once an object is modified, it is likely to be modified many more times by the same user within a short interval [14, 6]. An immediate refetch policy would increase client-server traffic considerably, thereby reducing scalability.

Our strategy is a compromise that balances availability, consistency, and scalability. For files and symbolic links, Venus purges the object on callback break, and refetches it on demand or during the next hoard walk, whichever occurs earlier. If a disconnection were to occur before refetching, the object would be unavailable. For directories, Venus does not purge on callback break, but marks the cache entry suspicious. A stale cache entry is thus available should a disconnection occur before the next hoard walk or reference. The acceptability of stale directory data follows from its particular callback semantics. A callback break on a directory typically means that an entry has been added to or deleted from the directory. It is often the case that other directory entries and the objects they name are unchanged. Therefore, saving the stale copy and using it in the event of untimely disconnection causes consistency to suffer only a little, but increases availability considerably.

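The compromise can be stated compactly in code: purge files and symbolic links on a callback break, but merely mark directories suspicious. The type and field names below are illustrative, not the Venus data structures.

enum obj_type { FILE_OBJ, SYMLINK_OBJ, DIR_OBJ };

struct fsobj {
    enum obj_type type;
    int           present;     /* contents are in the cache              */
    int           suspicious;  /* directory whose callback was broken    */
};

/* On a callback break: purge files and symbolic links (they are refetched
 * on demand or at the next hoard walk); keep directories but mark them
 * suspicious, so a slightly stale copy is still usable if a disconnection
 * happens before the next walk or reference.                             */
void on_callback_break(struct fsobj *o)
{
    if (o->type == DIR_OBJ)
        o->suspicious = 1;
    else
        o->present = 0;
}
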
4.4. Emulation

In the emulation state, Venus performs many actions normally handled by servers. For example, Venus now assumes full responsibility for access and semantic checks. It is also responsible for generating temporary file identifiers (fids) for new objects, pending the assignment of permanent fids at reintegration. But although Venus is functioning as a pseudo-server, updates accepted by it have to be revalidated with respect to integrity and protection by real servers. This follows from the Coda policy of trusting only servers, not clients. To minimize unpleasant delayed surprises for a disconnected user, it behooves Venus to be as faithful as possible in its emulation.

Cache management during emulation is done with the same priority algorithm used during hoarding. Mutating operations directly update the cache entries of the objects involved. Cache entries of deleted objects are freed immediately, but those of other modified objects assume infinite priority so that they are not purged before reintegration. On a cache miss, the default behavior of Venus is to return an error code. A user may optionally request Venus to block his processes until cache misses can be serviced.

4.4.1. Logging

During emulation, Venus records sufficient information to replay update activity when it reintegrates. It maintains this information in a per-volume log of mutating operations called a replay log. Each log entry contains a copy of the corresponding system call arguments as well as the version state of all objects referenced by the call.

Venus uses a number of optimizations to reduce the length of the replay log, resulting in a log size that is typically a few percent of cache size. A small log conserves disk space, a critical resource during periods of disconnection. It also improves reintegration performance by reducing latency and server load.

One important optimization to reduce log length pertains to write operations on files. Since Coda uses whole-file caching, the close after an open of a file for modification installs a completely new copy of the file. Rather than logging the open, close, and intervening write operations individually, Venus logs a single store record during the handling of a close.

Another optimization consists of Venus discarding a previous store record for a file when a new one is appended to the log. This follows from the fact that a store renders all previous versions of a file superfluous. The store record does not contain a copy of the file’s contents, but merely points to the copy in the cache.

We are currently implementing two further optimizations to reduce the length of the replay log. The first generalizes the optimization described in the previous paragraph such that any operation which overwrites the effect of earlier operations may cancel the corresponding log records. An example would be the cancelling of a store by a subsequent unlink or truncate. The second optimization exploits knowledge of inverse operations to cancel both the inverting and inverted log records. For example, a rmdir may cancel its own log record as well as that of the corresponding mkdir.

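A sketch of the replay-log optimizations described above: a close of a modified file appends a single store record that points at the cache copy rather than containing data, and appending a new store cancels any earlier store for the same file. The record layout and fid representation are assumptions for illustration, not the actual Venus log format.

#include <stdlib.h>
#include <string.h>

/* Illustrative replay-log record; real Venus records also carry the version
 * state of every object referenced by the call.                             */
enum op { OP_STORE, OP_MKDIR, OP_RMDIR, OP_UNLINK };

struct replay_record {
    enum op               op;
    unsigned              fid;             /* (temporary) file identifier       */
    char                  cache_copy[256]; /* a store only points at the cache
                                              container file; no data is logged */
    struct replay_record *next;
};

struct replay_log { struct replay_record *head; };   /* one log per volume */

/* Logging a store (done once, at close time) cancels any earlier store for
 * the same file, since the new copy renders previous versions superfluous.  */
void log_store(struct replay_log *log, unsigned fid, const char *cache_copy)
{
    struct replay_record **p = &log->head;
    while (*p) {
        if ((*p)->op == OP_STORE && (*p)->fid == fid) {
            struct replay_record *old = *p;
            *p = old->next;                 /* drop the superseded record */
            free(old);
        } else {
            p = &(*p)->next;
        }
    }

    struct replay_record *r = malloc(sizeof *r);
    if (!r)
        return;
    r->op = OP_STORE;
    r->fid = fid;
    strncpy(r->cache_copy, cache_copy, sizeof r->cache_copy - 1);
    r->cache_copy[sizeof r->cache_copy - 1] = '\0';
    r->next = NULL;
    *p = r;                                 /* p already points at the tail */
}
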
4.4.2. Persistence

A disconnected user must be able to restart his machine after a shutdown and continue where he left off. In case of a crash, the amount of data lost should be no greater than if the same failure occurred during connected operation. To provide these guarantees, Venus must keep its cache and related data structures in non-volatile storage.

Meta-data, consisting of cached directory and symbolic link contents, status b

