`
`
`
`
`
`
`
`
`THE ADVANCED COMPUTING SYSTEMS ASSOCIATION
`
`The following paper was originally published in the
`Proceedings of the USENIX Annual Technical Conference
`
`Monterey, California, USA, June 6-11, 1999
`
`Operation-based Update Propagation
`in a Mobile File System
`
`Yui-Wah Lee, Kwong-Sak Leung
`The Chinese University of Hong Kong
`Mahadev Satyanarayanan
`Carnegie Mellon University
`
`© 1999 by The USENIX Association
`All Rights Reserved
`
`Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial
`reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper.
`USENIX acknowledges all trademarks herein.
`
`For more information about the USENIX Association:
`Phone: 1 510 528 8649
`FAX: 1 510 548 5738
`Email: office@usenix.org WWW: http://www.usenix.org
`
`
`
`
Operation-based Update Propagation
in a Mobile File System *

Yui-Wah Lee, Kwong-Sak Leung
The Chinese University of Hong Kong
{clement,ksleung}@cse.cuhk.edu.hk

Mahadev Satyanarayanan
Carnegie Mellon University
satya+@cs.cmu.edu
`
`Abstract
`
`In this paper we describe a technique called operation-
`based update propagation for efficiently transmitting up-
`dates to large files that have been modified on a weakly
`connected client of a distributed file system. In this tech-
`nique, modifications are captured above the file-system
`layer at the client, shipped to a surrogate client that is
`strongly connected to a server, re-executed at the surro-
`gate, and the resulting files transmitted from the surro-
`gate to the server. If re-execution fails to produce a file
`identical to the original, the system falls back to ship-
`ping the file from the client over the slow network. We
`have implemented a prototype of this mechanism in the
`Coda File System on Linux, and demonstrated perfor-
mance improvements ranging from 40 percent to nearly
`three orders of magnitude in reduced network traffic and
`elapsed time. We also found a novel use of forward error
`correction in this context.
`
`1 Introduction
`
`The use of a distributed file system on a mobile client is
`often hindered by poor network connectivity. Although
`disconnected operation [8] is feasible, a mobile client
`with an extensive amount of updates should not defer
`propagating them to a server for too long. Damage, theft
`or destruction of the client before update propagation
`will result in loss of those updates. Further, their timely
`propagation may be critical to successful collaboration
and in reducing the likelihood of update conflicts.

* This research was partially supported by the Defense Advanced Research Projects Agency (DARPA), Air Force Materiel Command, USAF under agreement number F19628-96-C-0061, the Intel Corporation, and the Novell Corporation. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA, Intel, Novell, or the U.S. Government.
`
`Propagation of updates from a mobile client is often
`impeded by weak connectivity in the form of wireless
`or wired networks that are low-bandwidth or intermit-
`tent. Aggressive update propagation under these condi-
tions increases the demand on scarce bandwidth. A major component of bandwidth demand is the shipping of
`large files in their entirety. Two obvious solutions to the
`problem are delta shipping and data compression. The
`former tries to ship only the incremental differences be-
`tween versions of files, and the latter “compresses out”
`the redundancies of files before shipping the files. Un-
`fortunately, as discussed later in this paper, both of these
`methods have shortcomings that limit their usefulness.
`
`In this paper, we focus on a radically different solution,
`which is called operation-based update propagation (or
`operation shipping for brevity). It is motivated by two
`observations. First, large files are often created or mod-
`ified by user operations that can be easily intercepted
`and compactly represented. Second, the cost of shipping
`and re-executing the user operations is often significantly
`smaller than that of shipping the large files over a weak
`network.
`
`To propagate a file, the client ships the operation to a
`surrogate client that is well connected to the server, as
`shown in Figure 1. The surrogate re-executes the opera-
`tion, validates that the re-generated file is identical to the
`original, and then propagates the file via its high-speed
`connection to the server. If the file re-generated by the
`surrogate does not match that from the original execu-
`tion, the system falls back to shipping the original from
`the client to the server over the slow connection. This
`validation and fall-back mechanism is essential to ensure
`the correctness of update propagation.
`
`
`
`
[Figure 1 sketch: a client holding updated large files is connected by a weak network to a surrogate, which in turn is connected by a strong network to the server. Step 1: logging of user operations at the client. Step 2: shipping of the compact operation log to the surrogate. Step 3: re-execution of the user operations at the surrogate, re-generating the large files. Step 4: validation of the re-execution and shipping of the re-generated files to the server.]
`Figure 1: An overview of operation shipping
`
`The use of a surrogate in our approach ensures that
`server scalability and security are preserved — servers
`are not required to execute potentially malicious and
`long-running code, nor are they required to instantiate
`the execution environments of the numerous clients they
`service.
`In our model, we assume each mobile client
`has pre-arranged for a suitable surrogate of an appropri-
`ate machine type, at an adequate level of security, pos-
`sessing suitable authentication tickets, and providing an
`appropriate execution environment. This assumption is
reasonable, since many users of mobile clients also own powerful and well-connected personal desktops in their offices, and they can arrange for these otherwise idle personal machines to serve as surrogates.
`
`Even though the idea of operation shipping is concep-
`tually simple, there are many details that have to be ad-
`dressed to make it work in practice. In the rest of this
`paper, we describe how we handle these details by imple-
menting a prototype based on the Coda File System. Our experiments confirm the substantial performance gains
`of this approach. Depending on the metric and the spe-
`cific application, we have observed improvements rang-
`ing from a factor of three to nearly three orders of mag-
`nitude.
`
`2 Coda background
`
`We have implemented our prototype by extending the
`Coda File System [3]. Although our experience is based
`on Coda, the general principles should also be applicable
`to other mobile file systems. We briefly describe Coda in
`this section; more information is available in the litera-
`ture [3, 8, 7, 14, 13].
`
The Coda model is that there are many clients and a few servers. On each client, a cache manager, called Venus, carefully manages and persistently stores cached
`file-system objects. To support mobile computing, Coda
`clients can be used in disconnected and weakly connected
`mode, where Venus emulates the servers and allows file-
`system operations on cached objects. Updates are ap-
`plied immediately to locally cached objects, and are also
`logged in a client-modify log (CML). The logging mech-
`anism allows propagation of updates to servers to be de-
`ferred until a convenient time, such as when network
`connectivity is restored. Venus propagates these updates
`with a mechanism called trickle reintegration [14, 13].
`When propagation is attempted, a prefix of the log is
`shipped to the server in temporal order, the size of the
`prefix being determined by available bandwidth.
`
`The effect of each mutating file-system operation is rep-
`resented as a record in the CML. For example, a chmod
`operation is logged as a CHMOD record, a mkdir opera-
`tion is logged as a MKDIR record. Furthermore, if there
`have been some intervening write operations made to
`an open’ed file, a subsequent close operation will be
`logged as a STORE record.
`
`Records of the type STORE are special, because the as-
`sociated data include the contents of the files. There-
`fore, these records are much bigger than records of other
`types. STORE records can be as large as several kilo-
`bytes or even megabytes, whereas other records are typ-
`ically smaller than a few hundred bytes. Although the
`contents of a file logically constitute a part of the CML,
`they are physically stored in a separate container file in
`the client’s local file system.
`
`Although trickle reintegration has been shown to be ef-
`fective in decoupling the foreground file-system activi-
`ties from the slow propagations of updates, it still suffers
`from an important limitation: updated files are propa-
`gated in their entirety. In other words, although the users
`perceive a fast response time from the file system, the ac-
`tual propagations of the updates are very slow, and they
`generate heavy network traffic. On a weak network, we
`need a more efficient scheme for shipping updates.
`
`3 Design and Implementation
`
`3.1 Overview
`
`
`We organize our discussion according to the four steps
`shown in Figure 1:
`
`
`
`
`1. Logging of user operations. When a file is updated
`on a client, the client keeps the new file value V (the
`contents of the file) and also logs the user operation
`O.
`
`
`2. Shipping of operation log. If the network band-
`width is low, the client does not ship V to the server.
`Instead, it ships O, the fingerprint of V , and other
`meta-data to a surrogate client.
`
3. Re-execution of operations. The surrogate re-executes O and produces a file value V′.
`
4. Validation of re-execution. The surrogate validates the re-execution by checking the fingerprints and meta-data. If it accepts the re-execution, then it will present V′ to the server; otherwise, it will report failure to the client, which will then fall back to value shipping and ship V directly to the server.
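To make these steps concrete, the following is a minimal sketch of the decision between operation shipping and the value-shipping fallback. It is an illustration only: the helper objects (surrogate, server) and their methods are hypothetical stand-ins rather than Coda interfaces, and the MD5 fingerprint anticipates Section 3.5.2.

    import hashlib

    def fingerprint(data):
        # MD5 fingerprint of a container file's contents (see Section 3.5.2).
        return hashlib.md5(data).hexdigest()

    def propagate_update(operation, original_value, surrogate, server):
        # Step 2: ship the compact operation log plus the fingerprint of V.
        original_fp = fingerprint(original_value)

        # Step 3: the surrogate re-executes the operation and produces V'.
        regenerated_value = surrogate.reexecute(operation)

        # Step 4: validate by comparing fingerprints; fall back on mismatch.
        if fingerprint(regenerated_value) == original_fp:
            surrogate.ship_to_server(regenerated_value, server)   # strong network
            return "operation shipped"
        server.receive_file(original_value)                       # weak-network fallback
        return "value shipped"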
`
`3.2 Logging of user operations
`
`3.2.1 Level of abstraction
`
`We need to find the right level of abstraction for log-
`ging operations, such that the operation logs are com-
`pact. Currently, Venus logs only the low-level file-system
`operations. File-system operation logs are not compact
`enough for efficient operation shipping, since they con-
`tain the contents of the files.
`
`On the other hand, computer users tend to think at a
`higher level. Typically, when they are working with a
`computer, they issue a series of commands to it. Since
`human beings type relatively slowly, the commands must
`be compact; hence, we focus on this level for operation
`shipping. When we speak of user operations, we mean
`the high level commands of computer users that can be
`intercepted, logged, and replayed later.
`
`There must be an entity that intercepts the user oper-
`ations and supplies the relevant information to the file
`system. This entity can be an interactive shell, or it can
`be the application itself. We refer to the two cases as
`application-transparent and application-aware logging.
`
`3.2.2 Application-transparent logging
`
Application-transparent logging is possible if the application is non-interactive. In this case, an interactive shell can intercept the information related to the user operation. For example, operation logging can be transparent to the following types of applications (examples are listed in parentheses): compilers and linkers (gcc, yacc, and ld), text processors (latex and troff), file-format converters (dvips and sgml2latex), software-project management tools (make), and file packagers (tar).
`
For application-transparent operation shipping, we need not modify the applications; we simply need to modify some interactive shells – the meta-applications – such as bash and tcsh. Fortunately, there are far fewer meta-applications than applications. We assume,
`of course, that the users are willing to use a modified
`shell to take advantage of operation shipping.
`
We extend the file-system API (application programming interface) with two new commands for the ioctl system call: VIOC_BEGIN_OP and VIOC_END_OP. They delineate the beginning and the end of a user operation. The user operation is tagged with the process group, so forked children in the same group are considered part of
`the same user operation. When the file system receives a
`file-system call, it can determine whether the call comes
`from a tagged user operation by examining the process-
`group information of the caller. The information nec-
`essary for re-execution, including the name of the user
`command, the command-line arguments, current work-
`ing directory, and so on, is passed to the file system with
the VIOC_BEGIN_OP command.
`
`Our file system provides the logging mechanism, but the
`logging entities can choose the appropriate logging poli-
`cies. For example, an interactive shell may allow users
`to specifically enable or disable operation logging based
`on certain criteria.
`
`We have experimented with the bash shell, a com-
`mon Unix shell. We added a few lines of code so
that the modified shell issues the VIOC_BEGIN_OP and VIOC_END_OP commands appropriately. Currently, we
`implement the most straightforward policy: the shell logs
`every user operation. In the future, we may implement a
`more flexible policy.
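As a rough sketch of how a modified shell could bracket a command with these calls, consider the following. The ioctl request numbers and the control-file path are hypothetical placeholders; in practice they would come from Coda's headers, and the actual modification was made to bash itself.

    import fcntl, json, os, subprocess

    VIOC_BEGIN_OP = 0xC0DA0001        # hypothetical request numbers
    VIOC_END_OP   = 0xC0DA0002
    CODA_CONTROL  = "/coda/.CONTROL"  # hypothetical control-file path

    def run_logged(argv):
        # Pass the information needed for re-execution with VIOC_BEGIN_OP,
        # run the command, and mark the end of the user operation.
        info = json.dumps({"command": argv, "cwd": os.getcwd()}).encode()
        with open(CODA_CONTROL, "rb") as ctrl:
            fcntl.ioctl(ctrl, VIOC_BEGIN_OP, info)
            try:
                return subprocess.run(argv).returncode
            finally:
                fcntl.ioctl(ctrl, VIOC_END_OP)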
`
`3.2.3 Application-aware logging
`
`Application-aware logging is needed when the applica-
`tion is interactive.
`In this case, the application is in-
`volved in capturing the user operations. The follow-
`ing types of applications fall into this category (ex-
`amples are listed in parentheses):
`text editors and
`word processors (emacs, vi, Applix Word and
`Microsoft Word), drawing tools (xfig and Corel
`Draw), presentation software (Applix Present and
Microsoft PowerPoint), computer-aided-design tools (AutoCAD and magic), and image manipulators (xv and GNU Image Manipulation Program).
`
`
`In this paper, we focus on application-transparent op-
`eration shipping. The mechanism that we have de-
`signed, and the evaluation that we have performed, is
`limited to non-interactive applications. We plan to study
`application-aware operation shipping next, and we will
`report our findings in the future.
`
`3.3 Shipping of operation log
`
`3.3.1 Shipping mechanism
`
`The reintegrator, which is a user-level thread within
`Venus, manages update propagation. It periodically se-
`lects several records from the head of the CML and ships
`them to the servers. For records with no user-operation
`information attached, the reintegrator uses value ship-
`ping and makes a ViceReintegrate remote proce-
`dure call (RPC) to the server. The server, when process-
`ing the RPC, back-fetches the related container files from
`the client. If the reply of the RPC indicates success, the
`reintegrator will locally commit the updates. Local com-
`mitment of updates is the final step of successful update
`propagation, and includes updating the states of relevant
`objects, such as version vectors and dirty bits, and trun-
`cating the CML.
`
`If user-operation information is available for a record,
`the reintegrator will attempt operation shipping first. All
the records associated with the same user operation will be operation shipped together. The reintegrator selects the records, packs the operation log, and makes a
`UserOpPropagate RPC to the surrogate. If the reply
`indicates success, the reintegrator will locally commit the
`updates. However, if the reply indicates failure, the rein-
`tegrator will set a flag in each of the records indicating
`that it has tried and failed propagation by operation ship-
`ping. These records will then be value shipped.
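The following sketch illustrates this selection and fallback logic. The record fields, RPC stubs, and local-commit helper are hypothetical simplifications of the Coda code paths described above.

    from itertools import groupby

    def reintegrate_prefix(cml_prefix, server, surrogate):
        # Records of the same user operation are adjacent in the temporally
        # ordered CML prefix, so they can be grouped and shipped together.
        for user_op, group in groupby(cml_prefix, key=lambda r: r.get("user_op")):
            records = list(group)
            if user_op is None:
                server.vice_reintegrate(records)                  # plain value shipping
            else:
                try:
                    surrogate.user_op_propagate(user_op, records)  # operation shipping
                except RuntimeError:
                    for rec in records:
                        rec["op_ship_failed"] = True               # remember the failed attempt
                    server.vice_reintegrate(records)               # fall back to value shipping
            commit_locally(records)

    def commit_locally(records):
        # Placeholder: update version vectors and dirty bits, truncate the CML.
        pass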
`
`3.3.2 Cost model
`
`The current version of our prototype attempts operation
`shipping for a record whenever there is user-operation
`information available. This static approach implicitly
`assumes that the connectivity between a mobile client
`and its servers is always weak.
`In real life, a mobile
`client may have strong connectivity occasionally. Dur-
`ing that time, as explained in the following paragraphs,
`value shipping is more efficient than operation shipping.
We plan to enhance our prototype so that mobile clients dynamically decide whether they should use operation shipping or value shipping.
`
`Our cost model compares the costs of value shipping
`with that of operation shipping. For each case, there are
`two different costs involved: network traffic and elapsed
`time.
`
For value shipping, assuming the overhead is negligible, the network traffic is the total length L of the updated files, and the elapsed time is Tv = L/Bc, where Bc is the bandwidth of the network connecting the client to the server.
`
For operation shipping, the network traffic is the length of the operation log, Lop, and the elapsed time is Top. The latter is composed of four components: (1) the time needed to ship the operation log (Lop/Bc), (2) the time needed for re-executing the operation (E), (3) the time needed for additional computational overhead (Hop), such as computing checksum information and encoding and decoding of forward-error-correction codes, and (4) the time needed to ship the updated files to the servers. There are two cases for the last component. If the re-execution passes the validation (is accepted), the updated files will be shipped from the surrogate (the time cost will be L/Bs, where Bs is the bandwidth of the network connecting the surrogate to the server); if the re-execution fails the validation, the updated file will be shipped from the client (the time cost will be L/Bc). The following equation summarizes the time costs involved:

    Top = Lop/Bc + E + Hop + L/Bs    if accepted
    Top = Lop/Bc + E + Hop + L/Bc    if rejected        (1)
`
Therefore, operation shipping is more favorable than value shipping only under certain conditions. Operation shipping saves network traffic if the operation log is more compact than the updated files (Lop < L). Also, it speeds up the update propagation (Top < Tv) if the following five conditions are true: (1) the re-execution is accepted, (2) the operation log is compact (Lop ≪ L), (3) the re-execution is fast (E ≪ L/Bc), (4) the time needed for additional computational overheads is small (Hop ≪ L/Bc), and (5) the surrogate has much better network connectivity than the client (Bs ≫ Bc).
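As a concrete reading of Equation (1), the sketch below evaluates both elapsed times for one set of made-up numbers; nothing here is part of the Coda prototype.

    def elapsed_value_shipping(L, Bc):
        # Tv = L / Bc
        return L / Bc

    def elapsed_operation_shipping(L, Lop, E, Hop, Bc, Bs, accepted=True):
        # Equation (1): ship the log, re-execute, pay the overheads, then ship
        # the files from the surrogate (accepted) or from the client (rejected).
        return Lop / Bc + E + Hop + (L / Bs if accepted else L / Bc)

    if __name__ == "__main__":
        L   = 10e6              # bytes of updated files (hypothetical)
        Lop = 200               # bytes of operation log
        E, Hop = 30.0, 5.0      # seconds of re-execution and overhead
        Bc  = 9.6e3 / 8         # client-server bandwidth (9.6 kbit/s modem)
        Bs  = 10e6 / 8          # surrogate-server bandwidth (10 Mbit/s Ethernet)
        print("value shipping:     %8.0f s" % elapsed_value_shipping(L, Bc))
        print("operation shipping: %8.0f s" % elapsed_operation_shipping(L, Lop, E, Hop, Bc, Bs))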
`
`3.4 Re-execution of user operations
`
`3.4.1 Re-execution mechanism
`
`Although we anticipate most re-executions will be suc-
`cessful (execute completely and pass the validation), we
`have to prepare for the possibility that they may fail (can-
`not execute completely or fail the validation). Therefore,
`we have to ensure that re-executions are abortable trans-
`
`4
`
`Dropbox Ex. 1012
`
`Page 4
`
`
`
`actions such that failed re-executions will have no lasting
`effect. We implement this as follows.
`
`
Upon receiving a UserOpPropagate RPC from the client, Venus on the surrogate will temporarily put the affected volume¹ into write-disconnected mode, and then re-execute the user operation via a spawned child called
`the re-executor. During the re-execution, since the vol-
`ume is write-disconnected, input files can be retrieved
`from the servers, but file-system updates are not writ-
`ten immediately to the server. These updates are tagged
`with the identity of the re-execution and are logged in the
`CML. If the re-execution later passes the validation, the
`surrogate will re-connect the volume with the servers and
`reintegrate the updates. At the end, and only when the
`reintegration succeeds, the surrogate will locally commit
`the updates, and indicate a success to the client in a RPC
`reply. On the other hand, failures may happen any time
`during the re-execution, the validation, or the reintegra-
`tion. If any such failures occur, the surrogate will discard
`all the updates and indicate a failure to the client in a RPC
`reply.
`
`It is possible that a reintegration completes successfully
`at the servers, but the RPC response fails to arrive at the
`client in spite of retransmission. This can happen when
`there is an untimely failure of the surrogate or the com-
`munication channels. We make use of the existing Coda
mechanism of ensuring atomicity of reintegrations [13] to handle this kind of failure. In short, the client presumes a reintegration has failed if it does not receive a positive response from the surrogate, and it will retry
`the reintegration. At the same time, the server retains the
`information necessary to detect whether a record has al-
`ready been reintegrated earlier. If a client attempts such
`an improper retry, the server will reply with the error
`code EALREADY (Operation already in progress), and
`the client will then know that the records have already
`been successfully reintegrated in a previous attempt, and
`it will simply locally commit those records.
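A minimal sketch of this retry behavior follows. The RPC stub and record representation are hypothetical; only the treatment of EALREADY as an already-committed reintegration is taken from the text.

    import errno

    def propagate_with_retry(surrogate, records, max_attempts=5):
        for _ in range(max_attempts):
            try:
                surrogate.user_op_propagate(records)
                return commit_locally(records)          # normal success
            except OSError as err:
                if err.errno == errno.EALREADY:
                    # An earlier attempt already reintegrated these records.
                    return commit_locally(records)
                # Otherwise presume the reintegration failed and retry.
        raise RuntimeError("update propagation failed after retries")

    def commit_locally(records):
        # Placeholder for local commitment (truncate CML, clear dirty bits, ...).
        return len(records)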
`
`3.4.2 Facilitating repeating re-executions
`
`A re-execution of a user operation is accepted when it
`produces a result that is identical to that of the original
`execution. We say the re-execution is repeating the orig-
`inal execution. Only these repeating re-executions are
`useful to operation shipping. We know that a determin-
`istic procedure will produce the same result in different
`runs provided that it has the same input and the same en-
vironment in each run. Our file system makes use of this principle to facilitate repeating re-executions.
`
¹ In Coda, a volume is a collection of files forming a partial subtree of the Coda name space.
`
First, the re-executor runs with the four Unix-process attributes identical to those of the original execution: (1) the
`working directory, (2) the environment-variable list, (3)
`the command-line arguments, and (4) the file-creation
`mask [23].
`
`Second, our file system expects that the surrogate ma-
`chine has software and hardware configurations similar
`to the client machine. Two machines are said to be iden-
`tically configured if they have the same CPU type, the
`same operating system, the same system-header files,
`the same system libraries, and any other system environ-
`ments that can affect the outcomes of computations on
`the two machines.
`
`Third, if a re-execution requires an input file stored in
`Coda, we can rely on the file system to ensure that the
`client and the surrogate will use an identical version of
`the input file. Coda can ensure this because a client
`ships updates to its servers in temporal order, and the
`surrogate will always retrieve the latest version of a file
`from the servers. For example, consider a user issuing
`three user operations successively on a client machine:
`(Op1) update a source file using an editor, (Op2) com-
`pile the source file into an object file using a compiler,
`and (Op3) update the source file again. When the surro-
`gate re-executes Op2, the client must have shipped Op1
`but not Op3, and the re-executor will see exactly the ver-
`sion updated by Op1.
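A minimal sketch of the first condition, re-running a logged command under the original four process attributes, might look like this (the field names of the logged operation are hypothetical; the paper does not specify a wire format):

    import os, subprocess

    def reexecute(op):
        # op is a dict with hypothetical keys: argv, cwd, environ, umask.
        def restore_umask():
            os.umask(op["umask"])              # (4) file-creation mask
        return subprocess.run(
            op["argv"],                        # (3) command-line arguments
            cwd=op["cwd"],                     # (1) working directory
            env=op["environ"],                 # (2) environment-variable list
            preexec_fn=restore_umask,          # applied in the child before exec
            check=False,
        ).returncode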
`
`We emphasize that our file system does not guarantee
`that all re-executions will be repeating their original ex-
`ecutions; it just increases the probability of that happen-
`ing. More importantly, it ensures that only repeating re-
`executions will be accepted. This is achieved by the pro-
`cedure of validation (Section 3.5).
`
`In the evaluation section, we will see that many appli-
`cations exhibit repeating re-executions. Although some
`other applications exhibit non-repeating side effects dur-
`ing re-executions, these side effects can be handled in
`simple ways. Therefore, we believe we can use opera-
`tion shipping with a large number of applications.
`
`3.4.3 Status information
`
`In addition to updating contents of files, user opera-
`tions also update some meta-data (status information).
`Some of the meta-data are internal to the file system and
`are invisible to the users (e.g., every STORE operation
has an identity number for concurrency control); some are external and are visible to the users (e.g., the last-
`modification time of a file).
`In both cases, to make a
`re-execution’s result identical to that of the original ex-
`ecution, the values of the meta-data of the re-execution
`should be reset to those of the original execution. There-
`fore, the client needs to pack these meta-data as part of
`the operation log, and the surrogate needs to adjust the
`meta-data of the re-execution to match the supplied val-
`ues.
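On the surrogate, the adjustment of externally visible meta-data such as the last-modification time could look like the following sketch; the field names of the shipped record are hypothetical.

    import os

    def adjust_metadata(path, record):
        # Reset the re-generated file's visible meta-data to the values
        # recorded by the client during the original execution.
        mtime = record["client_mtime"]                  # seconds since the epoch
        atime = record.get("client_atime", mtime)
        os.utime(path, (atime, mtime))
        os.chmod(path, record["client_mode"])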
`
`3.4.4 Non-repeating side effects
`
`We focus on applications that perform deterministic
`tasks, such as compiling binaries or formatting texts,
`and exclude applications such as games and probabilistic
`search that are randomized in nature. In an early stage of
`this project, we expected the re-executions of these ap-
`plications to repeat their original executions completely.
`However, we found that some common applications ex-
`hibited non-repeating side effects. So far we have found
`two types of such side effects: (1) time stamps in output
`files, and (2) temporary names of intermediate files. For-
`tunately, we are able to handle these side effects automat-
`ically, so we are still able to use operation shipping with
`these applications. We will discuss the handling methods
`in the two following subsections.
`
`We also anticipate a third possible kind of side effect: ex-
`ternal side effects. For example, if an application sends
`an email message as the last step of execution, then the
`user may be confused by the additional message sent by
`re-execution. To cope with this kind of side effect, we
`plan to allow users to disable the use of operation ship-
`ping for some applications.
`
`3.4.5 Side effects due to time stamps
`
`Some applications, such as rp2gen, ar, and latex,
put time stamps into the files that they produce. rp2gen generates stub routines for remote procedure calls, ar
`builds library files, and latex formats documents. They
`use time stamps for various reasons. Because of the
`time stamps, a file generated by a re-execution will dif-
`fer slightly from the version generated by the original
`execution. Observing that only a few bytes are differ-
`ent, we can treat the changed bytes as “errors” and use
`the technique of forward error correction (FEC) [6, 4]
`to “correct” them. (We are indebted to Matt Mathis for
`suggesting this idea to us.)
`
`Our file system, therefore, does the following. Venus on
`the remote client computes an error-correction code (we
`use the Reed-Solomon code) for each updated file that
`
`is to be shipped by operation, and it packs the code as a
`part of the operation log. Venus on the surrogate, after
`re-executing the operation, uses the code to correct the
`time stamps that may have occurred in the re-generated
`version of the file.
`
`Note that this is a novel use of the Reed-Solomon code.
`Usually, data blocks are sent together with parity blocks
`(the error-correction code); but our clients send only the
`parity blocks. The data blocks are instead re-generated
`by the surrogate. Whereas others use the code to correct
`communication errors, we use it to correct some minor
`re-execution discrepancies. If a discrepancy is so large
`that our error-correction procedure cannot correct it, our
`file system simply falls back to value shipping. This en-
`tails a loss of performance but preserves correctness.
`
The additional network traffic due to the error-correction code is quite small. We choose to use a (65535, 65503) Reed-Solomon block code over GF(2^16). In other words, the symbol size is 16 bits; each block has 65,503 data symbols (131,006 bytes) and 32 parity symbols (64 bytes). The system can correct up to 16 errors (32 bytes) in each data block.
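The parity-only usage can be sketched as follows. For simplicity this illustration uses the third-party reedsolo package with its default GF(2^8) code and a file short enough to fit in one block, rather than the (65535, 65503) code over GF(2^16) used in the prototype; the helper names are assumptions, not Coda interfaces.

    from reedsolo import RSCodec       # pip install reedsolo

    NPARITY = 32                       # parity bytes per block (corrects up to 16 byte errors)
    rsc = RSCodec(NPARITY)

    def client_side_parity(original):
        # Ship only the parity; the data itself will be re-generated remotely.
        return bytes(rsc.encode(original)[-NPARITY:])

    def surrogate_side_correct(regenerated, parity):
        # Append the client's parity to the re-generated data and decode,
        # correcting small discrepancies such as differing time stamps.
        decoded = rsc.decode(bytearray(regenerated) + bytearray(parity))
        return bytes(decoded[0])       # recent reedsolo versions return (data, data+ecc, errata)

    original    = b"library built at 09:17 ..."
    regenerated = b"library built at 16:29 ..."   # same length, a few bytes differ
    assert surrogate_side_correct(regenerated, client_side_parity(original)) == original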
`
`
`
`However, the additional CPU time due to encoding and
`decoding is not small. We discuss this in more detail
`in Section 4.4. Also, the Reed-Solomon code cannot
`correct discrepancies that change length (for example,
`the two timestamps “9:17” and “14:49” have differ-
`ent lengths). The rsync algorithm [24] can handle length
`change, but we favor the Reed-Solomon code because it
`has a smaller overhead on network traffic.
`
`3.4.6 Side effects due to temporary files
`
`The program ar is an example of an application that
`uses temporary files. Figure 2 shows the two CMLs on
`the client and the surrogate after the execution and re-
execution of the following user operation:
`
`ar rv libsth.a foo.o bar.o.
`
`The user operation builds a library file libsth.a from
`two object modules foo.o and bar.o.
`
`Note that ar used two temporary files sta09395 and
`sta16294 in the two executions. The names of the
`temporary files are generated based on the identity num-
`bers of processes executing the application, and hence
`they are time dependent. Our validation procedure (Sec-
`tion 3.5) might naively reject the re-execution “because
`the records are different.”
`
`
`
`
    Original Execution               Re-execution
    ----------------------------     ----------------------------
    Create sta09395                  Create sta16294
    Store  sta09395                  Store  sta16294
    Remove libsth.a                  Remove libsth.a
    Rename sta09395 to libsth.a      Rename sta16294 to libsth.a

Figure 2: CMLs of two executions of ar
`
`Temporary files appear only in the intermediate steps of
`the execution. They will either be deleted or renamed at
`the end, so their names do not affect the final file-system
`state. An application uses temporary files to provide ex-
`ecution atomicity. For example, ar writes intermediate
`computation results to a temporary file, and it will re-
`name the file to the target filename only if the execution
`succeeds. This measure is to ensure that the target file
`will not be destroyed accidentally by a futile execution.
`
`If a temporary file is created and subsequently deleted
`during the execution of a user operation, its modification
`records will be deleted by the existing identity cancella-
`tion procedure [7]. They will not appear in the two CMLs
`and will not cause naive rejections of re-execution.
`
`However, if a temporary file is created and subsequently
`renamed during the execution of a user operation, its
modification records will be present in the CMLs, and
`might cause our validation to reject the re-execution. Our
`file system uses a procedure of temporary-file renaming
`to compensate for the side effect. This procedure is done
`after the re-executor has finished the re-execution and be-
`fore the surrogate begins the validation.
`
`The idea of the temporary-file renaming is to scan the
`two CMLs and identify all the temporary files as well
as their final names. We identify temporary files by the fact that they are created and subsequently renamed in a user operation. For each temporary file used by the surrogate, our file system determines the temporary-file name N used by the client in the original execution. It then renames the surrogate's temporary file to N. In our ar exam-
`ple, the temporary file sta16294 will be renamed to
`sta09395.
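The renaming step can be sketched by treating each CML as a list of simplified records; the tuple representation below is a hypothetical simplification of the real record format.

    def temp_file_map(cml):
        # Map final name -> temporary name for files created and later renamed.
        created, mapping = set(), {}
        for rec in cml:
            if rec[0] == "create":
                created.add(rec[1])
            elif rec[0] == "rename" and rec[1] in created:
                mapping[rec[2]] = rec[1]
        return mapping

    def rename_plan(client_cml, surrogate_cml):
        # For each temporary file on the surrogate, the client name it should take.
        client_map = temp_file_map(client_cml)          # {'libsth.a': 'sta09395'}
        surrogate_map = temp_file_map(surrogate_cml)    # {'libsth.a': 'sta16294'}
        return [(surrogate_map[f], client_map[f])
                for f in surrogate_map if f in client_map]

    client_cml = [("create", "sta09395"), ("store", "sta09395"),
                  ("remove", "libsth.a"), ("rename", "sta09395", "libsth.a")]
    surrogate_cml = [("create", "sta16294"), ("store", "sta16294"),
                     ("remove", "libsth.a"), ("rename", "sta16294", "libsth.a")]
    print(rename_plan(client_cml, surrogate_cml))       # [('sta16294', 'sta09395')]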
`
`
3.5 Validation of re-executions

3.5.1 Validation mechanism

Validation is done after the handling of side effects and the adjustment of status information. By that time, if a
`re-execution is repeating its original execution, the set of
`mutations incurred on the surrogate should be the same
`as that on the client. Since mutations are captured on
`CMLs, our file system can validate a re-execution by
`comparing the relevant portion of the CML of the sur-
`rogate to that of the client.
`
To facilitate the comparison, the client packs every
`record in the relevant portion of the CML as part of the
`operation log. However, the container files, which are as-
`sociated with STORE records, are not packed; otherwise
`they would incur a heavy network traffic for shipping the
`operation log, amounting to the traffic needed for value
`shipping. Instead, the client packs the fingerprint of each
`container file. When comparing CMLs, the surrogate as-
`serts that two container files are equal if they have the
`same fingerprint.
`
`3.5.2 Fingerprint
`
A fingerprint function produces a fixed-length fingerprint f(M) for a given arbitrary-length message M. A good fingerprint function should have two properties: (1) computing f(M) from M is easy, and (2) the probability that another message M′ gives the same fingerprint is small. For our purpose, the messages for which we compute fingerprints are the contents of the container files.
`
Our file system employs MD5 (Message Digest 5) fingerprints [17, 21]. Each fingerprint has 128 bits, so the overhead is very small. Also, the probability that two different container files give the same fingerprint is very small; it is on the order of 1/2^128.
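Computing such a fingerprint takes only a few lines; the sketch below uses the standard hashlib implementation of MD5 named above.

    import hashlib

    def fingerprint(path, chunk_size=64 * 1024):
        # 128-bit MD5 fingerprint of a container file's contents.
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # The surrogate asserts that two container files are equal if their
    # fingerprints match, e.g. fingerprint(original) == fingerprint(regenerated).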
`
`The fact that the probability is non-zero, albeit very
`small, may worry some readers. However, even value
`shipping is vulnerable to a small but non-zero proba-
`bility of error. That is, there is a small probability that
`a communication error has occurred but is not detected
`by the error-correction subsystem of the communication
`channel. We believe people can tolerate the small prob-
`abilities of errors of both opera