Efficient Distributed Backup with Delta Compression

Randal C. Burns
Department of Computer Science
IBM Almaden Research Center
randal@almaden.ibm.com

Darrell D. E. Long†
Department of Computer Science
University of California, Santa Cruz
darrell@cs.ucsc.edu

Abstract

Inexpensive storage and more powerful processors have resulted in a proliferation of data that needs to be reliably backed up. Network resource limitations make it increasingly difficult to back up a distributed file system on a nightly or even weekly basis. By using delta compression algorithms, which minimally encode a version of a file using only the bytes that have changed, a backup system can compress the data sent to a server. With the delta backup technique, we can achieve significant savings in network transmission time over previous techniques. Our measurements indicate that file system data may, on average, be compressed to within 10% of its original size with this method and that approximately 45% of all changed files have also been backed up in the previous week. Based on our measurements, we conclude that a small file store on the client that contains copies of previously backed up files can be used to retain versions in order to generate delta files. To reduce the load on the backup server, we implement a modified version storage architecture, version jumping, that allows us to restore delta encoded file versions with at most two accesses to tertiary storage. This minimizes server workload and network transmission time on file restore.

†The work of this author was performed while a Visiting Scientist at the IBM Almaden Research Center.

Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee.
IOPADS 97 San Jose CA USA
Copyright 1997 ACM 0-89791-966-1/97/11

1 Introduction

Currently, file system backup takes too long and has a prohibitive storage cost. Resource constraints impede the reliable and frequent backup of the increasing amounts of data spinning on disks today. The time required to perform backup is in direct proportion to the amount of data transmitted over the network from backup clients to a backup server. By using delta compression, compactly encoding a file version as a set of changes from a previous version, our backup system architecture reduces the size of the data to be transmitted, reducing both the time to perform backup and the storage required on a backup server.
Backup and restore can be limited by both network bandwidth, often 10 Mb/s, and poor throughput to tertiary storage devices, as slow as 500 KB/s to tape storage. Since resource limitations frequently make backing up changed files infeasible over a single night or even a weekend, delta file compression has great potential to alleviate bandwidth problems by using available client processor cycles and disk storage to reduce the amount of data transferred. This technology enables us to perform file backup over lower bandwidth channels than were previously possible, for example a subscription based backup service over the Internet.

Early efforts to reduce the amount of data to be backed up produced a simple optimization, incremental backup, which backs up only those files that have been modified since the end of the last epoch, the period between two backups. While incremental backup only detects changes at the granularity of a file, delta backup refines this concept, transmitting only the altered bytes in the files to be incrementally backed up [12]. Consequently, if only a few bytes are modified in a large file, the backup system saves the expense of transmitting the large file in favor of transmitting only the changed bytes.
Recent advances in differencing algorithms [1, 4] allow nearly optimally compressed encodings of binary inputs in linear time. We use such an algorithm to generate delta encodings of versions.

A differencing algorithm finds and outputs the changes made between two versions of the same file by locating common strings to be copied and unique strings to be added explicitly. A delta file (Δ) is the encoding of the output of a differencing algorithm. An algorithm that creates a delta file takes as input two versions of a file, a reference file and a version file to be encoded, and outputs a delta file representing the modifications made between versions:

    δ(V_R, V_V) = Δ_(V_R,V_V).
`

Reconstruction, the inverse operation, requires the reference file and a delta file to rebuild a version:

    δ⁻¹(V_R, Δ_(V_R,V_V)) = V_V.
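
To make the copy/add structure of a delta file concrete, the following Python sketch (ours, not part of ADSM or of the algorithms cited in [1, 4]) encodes a version as a list of copy and add commands against a reference file and applies such a delta to rebuild the version. The command format and the naive quadratic matcher are illustrative assumptions only; the cited algorithms produce near-optimal deltas in linear time.

    # Toy delta format: a list of commands, each either
    #   ("copy", offset, length)  -- copy bytes from the reference file, or
    #   ("add", data)             -- insert literal bytes not found in the reference.
    # The greedy matcher below is quadratic and only for illustration.

    def make_delta(reference: bytes, version: bytes, min_match: int = 4):
        delta, i = [], 0
        while i < len(version):
            best_off, best_len = -1, 0
            # Naive search for the longest match of version[i:] inside the reference.
            for off in range(len(reference)):
                length = 0
                while (off + length < len(reference) and i + length < len(version)
                       and reference[off + length] == version[i + length]):
                    length += 1
                if length > best_len:
                    best_off, best_len = off, length
            if best_len >= min_match:
                delta.append(("copy", best_off, best_len))
                i += best_len
            else:
                # Extend the previous add command or start a new one.
                if delta and delta[-1][0] == "add":
                    delta[-1] = ("add", delta[-1][1] + version[i:i + 1])
                else:
                    delta.append(("add", version[i:i + 1]))
                i += 1
        return delta

    def apply_delta(reference: bytes, delta) -> bytes:
        out = bytearray()
        for cmd in delta:
            if cmd[0] == "copy":
                _, off, length = cmd
                out += reference[off:off + length]
            else:
                out += cmd[1]
        return bytes(out)

    ref = b"the quick brown fox jumps over the lazy dog"
    ver = b"the quick red fox jumps over the very lazy dog"
    assert apply_delta(ref, make_delta(ref, ver)) == ver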

A backup system can leverage delta files to generate minimal file representations (see Figure 1). We enhanced the client/server architecture of the AdStar Distributed Storage Manager (ADSM) backup system [5] to transmit delta files when a backup client has retained two versions of the same file. Furthermore, both uncompressed files and delta encoded files still realize benefits from the standard file compression methods that ADSM already utilizes [1]. We integrate delta backup into ADSM and have a backward compatible system with optimizations to transmit and store a reduced amount of data at the server.

The server storage and network transmission benefits are paid for in the client processor cycles to generate delta files and in additional disk storage at a backup client to retain second copies of files that are used to generate delta files. This architecture optimizes the network and storage bottleneck at the server in favor of the distributed resources of a server's multiple clients.

We will describe previous work in the use of delta compression for version storage in §2. Our modifications to the ADSM system architecture are presented in §3. In §4, we describe the delta storage problem in the presence of a large number of versions. In §5, we analyze the performance of the version jumping storage method and compare it to the optimally compact storage method of linear delta chains. In §6 potential future work is discussed and we present our conclusions in §7.

2 Origins of Delta Backup

Delta backup emerged from many applications, the first instance appearing in database technology. The database pages that are written between backup epochs are marked as "dirty" using a single bit [10, 17]. At the end of an epoch, only the dirty pages need to be backed up. This concept parallels delta backup but operates only at page granularity. For file systems, there are no guarantees that modifications have page alignment. While dirty bits are effective for databases, they may not apply well to file system backup.

To improve on the granularity of backup, logging methods have been used to record the changes to a database [9, 14] and a file system [16]. A logging system records every write during an epoch to a log file. This can be used to recover the version as it existed at the start of an epoch into its current state. While semantically similar to delta compression, logging does not provide the compressibility guarantees of differencing algorithms. In the database example, a log or journal of changes grows in proportion to the number of writes made. If the same bytes are modified more than once or if adjacent bytes are modified at different times, this will not be a minimal representation. For an extremely active page, the log will likely exceed the page size. Differential compression also has the guarantee that a log of modifications to a file will be no smaller than the corresponding delta [4].

Recently, several commercial systems have appeared on the marketplace that claim to perform delta backup [11, 6, 18]. While the exact methods these systems use have not been disclosed, the product literature implies that they either perform logging [11] or difference at the granularity of a file block [18, 6]. We perform delta backup at the granularity of a byte. By running differential compression algorithms on the changed versions at this granularity, we generate and back up minimal delta files.

2.1 Previous Work in Version Management

Despite file restoration being the infrequent operation for backup and restore, optimizing this process has great utility. Often, restore is performed when file system components have been lost or damaged. Unavailable data and non-functional systems cost businesses, universities and home users lost time and revenue. This contrasts with the backup operation, which is generally performed at night or in other low usage periods.

Restoring files that have been stored using delta backup generates additional concerns in delta file management. Traditional methods for storing deltas require the decompression agent to examine either all of the versions of a given file [13] or all versions in between the first version and the version being restored [15]. In either case, the time to reconstruct a given file grows at least linearly (see §4.1) with respect to the number of versions involved.

A backup system has the additional limitation that any given delta file may reside on a physically distinct media device and device access can be slow, generally several seconds to load a tape in a tape robot [8]. Consequently, having many distinct deltas participate in the restoration of a single file becomes costly in device latency. An important goal of our system is to minimize the number of deltas participating in any given restore operation.

Previous efforts in the efficient restoration of file versions have provided restoration execution times independent of the number of intermediate versions. These include methods based on AVL dags [7], linked data structures [19, 2], or string libraries [3]. However, these require all delta versions of a given file to be present at restore time and are consequently infeasible for a backup system. Such a backup system would require all prior versions of a file to be recalled from long term storage for that file to be reconstructed.

Figure 1: Client/server schematic for a delta backup system. (The schematic shows an ADSM delta client, containing a delta compression agent and a reference file store alongside the file system, exchanging reference and delta files with an ADSM delta server backed by a hierarchical storage manager.)

As previous methods in efficient restoration fail to apply to the backup and restore application, we describe a new technique called version jumping and an associated system architecture. Version jumping takes many delta versions off of the same reference version and consequently requires at most two files to perform the restore operation on a given version. The restoration time is also independent of the total number of stored versions.

3 System Architecture

We modified the architecture and design of the AdStar Distributed Storage Manager (ADSM) from IBM to add delta compression capabilities. The modified ADSM client delta compresses file data at the granularity of a byte and transmits delta files to the server. The modified ADSM server has enhanced file deletion capabilities to ensure that each delta file stored on the server also has the corresponding reference file.

ADSM is a client/server backup and restore system currently available for over 30 client platforms. The ADSM server is available for several operating systems including Windows NT and various UNIX and mainframe operating systems. The key features of ADSM are scheduled policy management for file backup, both client request and server polling, and hierarchical storage management of server media devices including tape, disk and optical drives.

3.1 Delta Compression at the Client

We have modified the standard ADSM client to perform delta backup and restore operations. The modifications include the differencing and reconstruction algorithms (see §1) and the addition of a reference file store.

In order to compute deltas, the current version of the file must be compared with a reference version. We have the choice of storing the reference version on the client or fetching it from the server. Obtaining the reference file from the server is not viable since it would increase both network traffic and server load, adversely affecting the time to perform the backup. By storing the reference file at the client, we incur a small local storage cost in exchange for a large benefit in decreased backup time.

We considered several options for maintaining the reference files, including copy-on-write, shadowing, and file system logging. Each of these options had to be rejected since they violated the design criterion that no file system modifications could be made. Since ADSM supports more than 30 client operating systems, any file system modification presents significant portability and maintenance concerns. Instead, we chose to keep copies of recently backed up files in a reference file store.

The reference file store is a reserved area on disk where reference versions of files are kept (see Figure 1). When sending an uncompressed file to its server, the backup client copies this file to its reference file store. At this point, the file system holds two copies of the same file: one active version in file system space and a static reference version in the reference file store. When the reference file store fills, the client selects a file to be ejected. We choose the victim file with a simple weighted least recently used (LRU) technique. In a backup system, many files are equally old, as they have been backed up at the same epoch. In order to discern among these multiple potential victims, our backup client uses an additional metric to weight files of equal LRU value. We select as a victim the reference file that achieved the worst relative compression on its previous usage, i.e. the file with the highest delta file size to uncompressed file size ratio at the last backup. This allows us to discern from many potential victims to increase the utility of our reference file store. At the same time, this weighting does not violate the tried and true LRU principle and consequently can be expected to realize all of the benefits of locality.
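
The eviction rule just described can be sketched as follows. The record fields and the in-memory list are hypothetical stand-ins for the client's bookkeeping, but the ordering is the one described above: plain LRU by backup epoch, with ties broken by the worst delta-to-file size ratio observed at the last backup.

    from dataclasses import dataclass

    @dataclass
    class ReferenceEntry:
        path: str
        last_backup_epoch: int   # epoch at which this copy entered the store
        last_delta_size: int     # size of the delta produced at the last backup
        file_size: int           # uncompressed size of the reference copy

    def choose_victim(entries):
        """Pick the file to eject from the reference file store.

        Primary key: oldest backup epoch (plain LRU).  Secondary key: among
        files of equal age, evict the one whose last delta compressed worst,
        i.e. the highest delta-size to file-size ratio.
        """
        def ratio(e):
            return e.last_delta_size / e.file_size if e.file_size else 1.0
        return min(entries, key=lambda e: (e.last_backup_epoch, -ratio(e)))

    store = [
        ReferenceEntry("/home/a.c", 10, 200, 10_000),
        ReferenceEntry("/home/b.c", 10, 9_000, 10_000),  # same epoch, compresses badly
        ReferenceEntry("/home/c.c", 12, 50, 10_000),
    ]
    assert choose_victim(store).path == "/home/b.c"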
Figure 2: Fraction of files modified today that were also modified 1, 4, and 7 days ago.
When the backup client sends a file that has a prior version in the reference file store, the client delta compresses this file and transmits the delta version. The old version of the file is retained in the reference file store as a common reference file for a version jumping system.

We do not expect the storage requirements of the reference file store to be excessive. In order to evaluate the effectiveness of the reference file store, we collected data for five months from the file servers at the School of Engineering at the University of California, Santa Cruz. Each night, we would search for all files that had either been modified or newly created during the past 24 hours. Our measurements indicate that of approximately 2,356,400 files, less than 0.55% are newly created or modified on an average day and less than 2% on any given day. These recently modified files contain approximately 1% of the file system data.

In Figure 2, we estimate the effectiveness of the reference file store using a sliding window. We considered windows of 1-7 days. The x-axis indicates the day in the trace, while the y-axis denotes the fraction of files that were modified on that day and that were also created or modified in that window. Files that are not in the window are either newly created or have not been modified recently. An average of 45% and a maximum of 87% of files that are modified on a given day had also been modified in the last 7 days. This locality of file system access verifies the usefulness of a reference file store and we expect that a small fraction, approximately 5%, of local file system space will be adequate to maintain copies of recently modified file versions.
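
The sliding-window estimate can be computed from a nightly trace of modified-file sets along the following lines; the trace representation is an assumption of this sketch, not the format of our collected data.

    def window_hit_fraction(daily_modified, w):
        """daily_modified[d] is the set of files modified on day d of the trace.
        Returns, for each day d >= w, the fraction of that day's modified files
        that were also modified in the preceding w days (and so could be served
        from the reference file store)."""
        fractions = []
        for d in range(w, len(daily_modified)):
            today = daily_modified[d]
            if not today:
                fractions.append(0.0)
                continue
            window = set().union(*daily_modified[d - w:d])
            fractions.append(len(today & window) / len(today))
        return fractions

    trace = [{"a", "b"}, {"b"}, {"c"}, {"a", "c"}, {"d"}, {"b", "c"}, {"a"}, {"a", "e"}]
    print(window_hit_fraction(trace, 7))  # day 7: "a" seen in window, "e" not -> [0.5]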

3.2 Delta Versions at the Server

In addition to our client modifications, delta version storage requires some enhanced file retention and file garbage collection policies at a backup server. For example, a naive server might discard the reference files for delta files that can be recalled. The files that these deltas encode would then be lost from the backup storage pool. The file retention policies we will describe require additional metadata to resolve the dependencies between a delta file and its corresponding reference file.

A backup server accepts and retains files from backup clients (see Figure 1). These files are available for restoration by the clients that backed them up for the duration of their residence on the server. Files that are available for restoration are active files. A typical file retention policy would be: "hold the four most recent versions of a file." While file retention policies may be far more complex, this turns out to have no bearing on our analysis and consequent requirements for backup servers. We only concern ourselves with the existence of deletion policies on an ADSM server and that files on the server are only active for a finite number of backup epochs.
To reconstruct a version file at a client from a backup server, when the server has a delta encoding of this file, the client must restore both the delta encoding and the corresponding reference file. Under this constraint, a modified retention policy dictates that a backup server must retain all reference files for its active files that are delta encoded. In other words, a file cannot be deleted from the backup server until all active files that use it as a reference file have also been deleted. This relationship easily translates to a dependency digraph with dependencies represented by directed edges and files by nodes. This digraph is used both to encode dependent files which need to be retained and to garbage collect the reference files when the referring delta versions are deleted.

For a version chain storage scheme, we may store a delta file, Δ_(A_2,A_1), which depends on file A_2. However, the backup server may not store A_2. Instead it stores a delta file representation. In this event, we have Δ_(A_2,A_1) depend upon the delta encoding of A_2, Δ_(A_3,A_2) (see Figure 3(a)).

The dependency digraphs for delta chains never branch and do not have cycles. Furthermore, each node in the digraph has at most one inbound edge and one outbound edge. In this example, to restore file A_1, the backup server needs to keep its delta encoding, Δ_(A_2,A_1), and it needs to be able to construct the file it depends upon, A_2. Since we only have the delta encoding of A_2, we must retain this encoding, Δ_(A_3,A_2), and all files it requires to reconstruct A_2. The obvious conclusion is: given a dependency digraph, to reconstruct the file at a node, the backup server must retain all files in nodes reachable from the node to be reconstructed.

For version jumping, as in version chains, the version digraphs do not branch. However, any node in the version digraph may have multiple incoming edges, i.e. a file may be a common reference file among many deltas (see Figure 3(b)).

(a) Simple delta chain. (b) Jumping version chain.

Figure 3: Dependency graphs for garbage collection and server deletion policies.

The dependency digraphs for both version jumping and delta chains are acyclic. This results directly from delta files being generated in a strict temporal order; the reference file for any delta file is always from a previous backup epoch. Since these digraphs are acyclic, we can implicitly solve the dependency problem with local information. We decide whether or not a node is available for deletion without traversing the dependency digraph. This method works for all acyclic digraphs.
At each node in the dependency digraph, i.e. for each file, we maintain a pointer to the reference file and a reference counter. The pointer to the reference file indicates a dependency; a delta encoded version points to its reference file and an uncompressed file has no value for this pointer. The reference counter stores the number of inbound edges to a node and is used for garbage collection. A node has no knowledge of what files depend on it, only that dependent files exist.

When backing up a delta file, we require a client to transmit its file identifier and the identifier of its reference file. When we store a new delta file at the ADSM server, we store the name of its reference file as its dependent reference and initialize its reference counter to zero. We must also update the metadata of the referenced file by incrementing its reference counter. With these two metadata fields, the modified backup server has enough information to retain dependent files and can guarantee that the reference files for its active files are available.
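
A minimal sketch of this server-side metadata follows; the field names and the in-memory dictionary are hypothetical stand-ins for the ADSM server's catalog.

    class ServerFileRecord:
        """Per-file metadata kept by the modified backup server (a sketch).

        reference: identifier of the reference file this delta was made
                   against, or None for an uncompressed file.
        ref_count: number of deltas that name this file as their reference
                   (inbound edges in the dependency digraph).
        active:    activity bit; set while the retention policy still allows
                   this version to be restored.
        """
        def __init__(self, file_id, reference=None):
            self.file_id = file_id
            self.reference = reference
            self.ref_count = 0
            self.active = True

    catalog = {}

    def store_file(file_id, reference_id=None):
        """Record a newly received file (whole or delta) from a client."""
        catalog[file_id] = ServerFileRecord(file_id, reference_id)
        if reference_id is not None:
            # The client sent its own identifier plus its reference's
            # identifier; bump the reference's inbound-edge count.
            catalog[reference_id].ref_count += 1

    store_file("report.doc#1")                   # whole file
    store_file("report.doc#2", "report.doc#1")   # delta against version 1
    assert catalog["report.doc#1"].ref_count == 1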

3.3 Two-Phase Deletion

When storing delta files, the backup server often needs to retain files that are no longer active. We modify deletion in this system to place files into three states: active files, inactive dependent files, and inactive independent files. Inactive dependent files are those files that can no longer be restored, as a policy criterion for deletion has been met, but need to be retained as active delta versions depend on them. Inactive independent files are those files suitable for removal from storage as they cannot be restored and no active delta files depend on them.
We add one more piece of metadata, an activity bit, to each file backed up on our server. When a file is received at the server, it is stored as usual and its activity bit is set. This file is now marked as active. The backup server implements deletion policies as usual. However, when a file retention policy indicates that a file can no longer be restored, we clear the file's activity bit instead of deleting it. Its state is now inactive. Clearing the activity bit is the first phase of deletion. If no other files depend on this newly inactive file, it may be removed from storage, the second phase.

Marking a file inactive may result in other files becoming independent as well. The newly independent reference file is garbage collected through the reference pointer of the delta file. The following rules govern file deletion:

• When a file has no referring deltas, i.e. its reference counter equals zero, delete the file.

• When deleting a delta file, decrement the reference counter of its reference file and garbage collect the reference file if appropriate.

Reference counting, reference file pointers, and activity bits correctly implement the reachability dependence relationship. The two-phase deletion technique operates locally, never traversing the implicit dependency digraph, and consequently incurs little execution time overhead. The backup server only traverses this digraph when restoring files. It follows dependency pointers to determine the set of files required to restore the delta version in question.
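
Continuing the catalog sketch from §3.2, the activity bit and the two deletion rules can be expressed as below; again the names are ours and the structure is only illustrative (it assumes the ServerFileRecord, catalog, and store_file definitions given earlier).

    def deactivate(file_id):
        """Phase one: the retention policy says this version may no longer
        be restored.  Clear the activity bit and try to reclaim storage."""
        rec = catalog[file_id]
        rec.active = False
        maybe_remove(file_id)

    def maybe_remove(file_id):
        """Phase two: physically remove an inactive file once no deltas refer
        to it, then garbage-collect its own reference if appropriate."""
        rec = catalog.get(file_id)
        if rec is None or rec.active or rec.ref_count > 0:
            return                       # still active, or still a reference file
        del catalog[file_id]
        if rec.reference is not None:
            parent = catalog[rec.reference]
            parent.ref_count -= 1        # one fewer inbound edge
            maybe_remove(rec.reference)  # may now be inactive and independent

    # Continuing the earlier example: expiring version 1 alone keeps it on the
    # server (version 2 still needs it); expiring version 2 then frees both.
    deactivate("report.doc#1")
    assert "report.doc#1" in catalog     # inactive but dependent
    deactivate("report.doc#2")
    assert not catalog                   # both removed

Note that garbage collection only follows the reference pointer of the deleted delta, so it never traverses the digraph in the forward direction, matching the local-information argument above.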

4 Delta Storage and Transmission

The time to transmit files to and from a server is directly proportional to the amount of data sent. For a delta backup and restore system, the amount of data is also related to the manner in which delta files are generated. We develop an analysis to compare the version jumping storage method with storing delta files as version to version incremental changes. We show that version jumping pays a small compression penalty for file system backup when compared to the optimal linear delta chains. In exchange for this lost compression, version jumping allows a delta file to be rebuilt with at most two accesses to tertiary storage.


The version storage problem for backup and restore differs due to special storage requirements and the distributed nature of the application. Since ADSM stores files on multiple and potentially slow media devices, not all versions of a file are readily available. This unavailability and other design considerations shape our methods for storing delta versions. If a backup system stores files on a tape robot then, when reconstructing a file, the server may have to load a separate tape for every delta it must access. Access to each tape may require several seconds. Consequently, multiple tape motions are intolerable. Our version jumping design guarantees that at most two accesses to the backing store are needed to rebuild a delta file.

Also, we minimize the server load by performing all of the differencing tasks on the client. Since a server processes requests from many clients, a system that attempted to run compression algorithms at this location would quickly become processor bound and achieve poor data throughput. For client side differencing, we generate delta files at the client and transmit them to the server where they are stored unaltered. Using client differencing, the server incurs no additional processor overhead.
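
A rough sketch of this client-side decision, combining the reference file store of §3.1 with client differencing: the store location, the transfer routines, and the differencing function are hypothetical stand-ins, not ADSM interfaces.

    import os
    import shutil

    REFERENCE_STORE = "/var/backup/refstore"   # hypothetical location

    def backup_file(path, send_whole, send_delta, make_delta):
        """Client-side backup step (sketch): send a delta when a reference
        copy exists, otherwise send the whole file and seed the reference
        file store.  send_whole/send_delta stand in for the client-server
        transfer calls and make_delta for the differencing algorithm."""
        ref_path = os.path.join(REFERENCE_STORE, path.lstrip("/"))
        if os.path.exists(ref_path):
            with open(ref_path, "rb") as r, open(path, "rb") as v:
                delta = make_delta(r.read(), v.read())
            # The delta names its reference so the server can record the
            # dependency.  The old reference copy stays put: under version
            # jumping many deltas share one common reference file.
            send_delta(path, reference=ref_path, delta=delta)
        else:
            send_whole(path)
            os.makedirs(os.path.dirname(ref_path), exist_ok=True)
            shutil.copyfile(path, ref_path)     # retain a static reference copy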

4.1 Linear Delta Chains

Serializability and lack of concurrency in file systems result in each file having a single preceding and a single following version. This linear sequence of versions forms a history of modifications to a file which we call a version chain. We now develop a notation for delta storage and analyze a linear version chain stored using traditional methods [15].

Linear delta chains are the most compact version storage scheme, as the inter-version modifications are smallest when differencing between consecutive versions. We will use the optimality of linear delta chains later as a basis to compare the compression of the version jumping scheme.

We denote the uncompressed ith version of a file by V_i. The difference between two versions V_i and V_j is indicated by Δ_(V_i,V_j). The file Δ_(V_i,V_j) should be considered the differentially compressed encoding of V_j with respect to V_i, such that V_j can be restored by the inverse differencing operation applied to V_i and Δ_(V_i,V_j). We indicate the differencing operation by

    δ(V_i, V_j) = Δ_(V_i,V_j)

and the inverse differencing or reconstruction operation by

    δ⁻¹(V_i, Δ_(V_i,V_j)) = V_j.

For our analysis, we consider a linear sequence of versions of the same file that continues indefinitely,

    V_1, V_2, ..., V_{i-1}, V_i, V_{i+1}, ... .

The traditional way to store this version chain as a series of deltas is, for two adjacent versions V_i and V_{i+1}, to store the difference between these two files, Δ_(V_i,V_{i+1}) [13]. This produces the following "delta chain":

    V_1, Δ_(V_1,V_2), Δ_(V_2,V_3), ..., Δ_(V_{i-1},V_i), ... .

Under this system, to reconstruct an arbitrary version V_i, the algorithm must apply the inverse differencing algorithm recursively for all intermediate versions 2 through i. This relation can be compactly expressed as a recurrence. V_i represents the contents of the ith version of a file and R_i is the recurrent file version. So when rebuilding V_i, V_i = R_i and

    R_i = δ⁻¹(R_{i-1}, Δ_(V_{i-1},V_i));    R_1 = V_1.

The time required to restore a version depends upon the time to restore all of the intermediate versions. In general, restoration time grows linearly in the number of intermediate versions. In a system that retains multiple versions, the cost of restoring the most remote version quickly becomes exorbitant.
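
For illustration, the recurrence corresponds to the following restore loop; apply_delta stands in for the inverse differencing operation δ⁻¹, and the chain layout is an assumption of this sketch.

    def restore_linear(v1, deltas, i, apply_delta):
        """Rebuild version i from a linear delta chain.

        v1:      the whole first version V_1
        deltas:  deltas[j] is Delta(V_{j+1}, V_{j+2}), i.e. the chain
                 Delta(V_1,V_2), Delta(V_2,V_3), ...
        Every intermediate version is materialized in turn, so the work, and
        the number of media accesses, grows linearly with i."""
        r = v1                      # R_1 = V_1
        for j in range(i - 1):
            r = apply_delta(r, deltas[j])   # R_{j+2} = delta_inverse(R_{j+1}, ...)
        return r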

4.2 Reverse Delta Chains

Some version control systems solve the problem of long delta chains with reverse delta chain storage [15]. A reverse delta chain keeps the most recent version of a file present and uncompressed. The version chain is then stored as a set of backward deltas. For most applications, the most recent versions are accessed far more frequently than older versions, and the cost of restoring an old version with many intermediate versions is offset by the low probability of that version being requested.

We have seen that linear delta chains fail to provide acceptable performance for delta storage (see §4.1). Our design constraints of client-side differencing and two tape accesses for delta restore also eliminate the use of reverse delta chains. We show this by examining the steps taken in a reverse delta system to transmit the next version to the backup server.

At some point, a server stores a reverse delta chain of the form

    Δ_(V_2,V_1), Δ_(V_3,V_2), ..., Δ_(V_n,V_{n-1}), V_n.

By convention, V_i is created by modification of V_{i-1}. For versions V_i and V_j in a linear version chain, these versions are adjacent if they obey the property j - i = 1, and an intermediate version V_k obeys the property i < k < j.

In order to back up its new version of the file, V_{n+1}, the client generates a difference file, Δ_(V_n,V_{n+1}), and transmits this difference to the server. However, this delta is not the file that the server needs to store. It needs to generate and store
Δ_(V_{n+1},V_n). Upon receiving Δ_(V_n,V_{n+1}), the server must apply the difference to V_n to create V_{n+1} and then run the differencing algorithm to create Δ_(V_{n+1},V_n). To update a single version in the reverse delta chain, the server must store two new files, recall one old file, and perform both the differencing and reconstruction operations. Reverse delta chains fail to meet our design criteria as they implement neither minimal server processor load nor reconstruction with the minimum number of participating files.
4.3 Version Jumping Delta Chains

Our solution to the version storage problem implements what we call jumping deltas. This design uses a minimum number of files for reconstruction and performs differencing on the backup client.

In a version jumping system, the server stores versions in a modified forward delta chain with an occasional whole file rather than a delta. Such a chain looks like:

    V_1, Δ_(V_1,V_2), Δ_(V_1,V_3), ..., Δ_(V_1,V_k), V_{k+1}, Δ_(V_{k+1},V_{k+2}), ... .

Storing this sequence of files allows any given version to be reconstructed by accessing at most two files from the version chain.
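
A sketch of restore under version jumping, reusing the per-file reference pointer from the catalog sketch of §3.2; fetch stands in for recalling a file from tertiary storage and apply_delta for δ⁻¹. At most two fetches ever occur, and none of the other stored versions participate.

    def restore_jumping(fetch, catalog, file_id, apply_delta):
        """Restore one version from a version-jumping store (sketch).

        catalog[file_id].reference is None for a whole file, otherwise the
        identifier of the common reference file for that run of deltas."""
        rec = catalog[file_id]
        if rec.reference is None:
            return fetch(file_id)              # one access: a whole file
        reference = fetch(rec.reference)       # access 1: the reference file
        delta = fetch(file_id)                 # access 2: the delta itself
        return apply_delta(reference, delta)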

When performing delta compression at the backup client, the files transmitted to the server may be stored directly without additional manipulation. This immediate storage at the server limits the processor overhead associated with each client session and optimizes the backup server.

An obvious concern with these methods is that one expects compression to degrade when taking the difference between two non-adjacent versions, i.e. for versions V_i and V_j, |Δ_(V_i,V_j)|¹ increases as j - i increases. Since the compression is likely to degrade as the version distance, j - i, increases, we require an occasional whole file to limit the maximum version distance. This raises the question: what is the optimal number of versions between whole files?

5 Performance Analysis

We analyze the storage and transmission cost of backing up files using a version jumping policy. We already know that version jumping far outperforms other version storage methods on restore, since it requires only two files to be accessed from tertiary storage to restore a delta file. Now, by showing that the compression loss with version jumping is small as compared to linear delta chains, the optimal storage method for delta compression, we conclude version jumping to be a superior policy.

The analysis of transmission time and server storage is identical, since our backup server immediately stores all files, including deltas, that it receives and transmission time is in direct proportion to the amount of data sent. We choose to examine storage for ease of understanding the conten

¹For a file V, we use |V| to denote the size of the file. Since files are one-dimensional streams of bytes, this is synonymous with the length of V.
