`[11] Patent Number:
`[19]
`United States Patent
`
`Cane et al.
`[45] Date of Patent:
`*Aug. 8, 2000
`
`USOO6101507A
`
`[54] FILE COMPARISON FOR DATA BACKUP
`AND FILE SYNCHRONIZATION
`
`[75]
`
`Inventors: David Cane, Sudbury; David
`Hirschman, Sharon; Philip Speare,
`Aflmgton; Lev va‘thht’ concord;
`Howard Mar50n> Needham all Of
`MaSS-
`
`[73] Assignee: Connected Corporation, Framingham,
`Mass.
`
`[*] Notice:
`
`This patent issued on a continued pros-
`ecution application filed under 37 CFR
`1.5301), and is subject to the twenty year
`patent
`term provisions of 35 U.S.C.
`154(a)(2)~
`
`[21] Appl. No.: 09/021,705
`.
`Flled:
`
`Feb. 10! 1998
`
`[22]
`
`5,628,012
`5,642,496
`5,647,017
`
`5/1997 Libkin ..................................... 707/102
`6/1997 Kanfi ................. 711/162
`
`7/1997 Smithies et al.
`.
`..... 382/119
`
`
`9/1997 Stolfo ...................................... 382/283
`5,668,897
`
`:jigg: 3621118. Ct 8.1.
`...........................3.‘9g/1641/(6)
`3,332,333
`..... 711/162
`4/1998 HTilditch
`5:737:763
`
`4/1998 Squibb .......... 707/203
`5,745,906
`
`..... 707/204
`8/1998 McClain
`5,794,254
`
`9/1998 Ziauddin et al.
`..... 707/7
`5,802,521
`
`10/1998 Smithies et al.
`.
`..... 382/115
`5,818,955
`
`5,848,422 12/1998 Sato et al.
`..... 707/203
`
`3/1999 Sigal et al. ............ 395/712
`5,881,292
`
`4/1999 McGrath et a1.
`..... 707/200
`5,893,113
`4/1999 Wang ................... 707/203
`5,893,117
`
`.. 707/200
`5,900,000
`5/1999 Korenshtein .
`5,909,677
`6/1999 Broder et al.
`............................... 707/3
`OTHER PUBLICATIONS
`
`A Tutorial on CRC Computations, Aug. 1988, pp 1—14.
`Article: Title: AMethod For Updating A Cyclic Redundancy
`Code, Jun. 1992, by Paul R. Lintz.
`
`[60]
`
`Related U-S- Application Data
`Provisional application N0. 60/037,597, Feb. 11, 1997.
`
`Primary Examiner—Thomas G. Black
`Assistant Examiner—Jean M Corrielus
`
`AHOWX 489% or Firm—Weingarteny SChurgin> Gagnebin
`& Hayes LLP
`[57]
`
`ABSTRACT
`
`File comparison employs a single function F to calculate a
`digital signature from data in a sliding Window. The digital
`signature is both incrementally computable and position
`sensitive. In particular, F is computable Without reprocess-
`ing each byte in the array When the Window is advanced and
`facmtates daemon 0f SUCh Changes as transposed bytes 0f
`data. The function F is defined by two qualities. First, for
`F(A+B), where A is an array, F(A+B)=F(A)+F(B). Second,
`given a concatenation operator “I” such that “OIA” indicates
`an array A Wlth 0 inserted before A, the function F has the
`property that there is a function G such that F(0!A)=G(F
`(A'0)). Both polynomials and cyclic redundancy checks
`«cm maybe
`as
`0mm
`
`Int. Cl.7 ...................................................... G06F 17/30
`[51]
`[52] us. Cl.
`.......................... 707/204; 707/203; 707/201;
`714/6; 711/162
`[58] Field of Search ................................ 707/203, 1, 204,
`707/201’ 207; 714/6; 711/162
`
`[56]
`
`References Cited
`
`U~S~ PATENT DOCUMENTS
`4,897,785
`1/1990 Ziiger ...................................... 364/200
`5,210,866
`5/1993 Milligan et a1.
`.
`5,214,696
`5/1993 Keiser et al.
`............................... 380/4
`
`5,263,154
`11/1993 Eastridge et a1.
`.
`714/6
`5,276,860
`1/1994 Fortier et al.
`............................... 714/6
`
`572767867
`“1994 Kenley 6t a1~ -
`
`593479652
`9/1994 EpStem et a1:
`2:82:28
`2133: 81:39:31..
`
`
`5,454,099
`....... 714/6
`9/1995 Myers et al.
`5,479,654 12/1995 Squibb .................................... 707/201
`
`707/1
`
`
`
`
`
`
`42
`IDENT AND SAVE
`
`REVISION ELEMENT
`
`
`44
`V ®
`N
`
`
`STORE INDICATOR
`
`AND POSITION
`
`
`52
`Y
`
`
`“‘1
`IN
`ADVANCE WINDOW
`ONE CHARACTER
`I
`54
`t
`48
`CALCULATE
`ADVANCE WINDOW
`
`SIGNATURE
`ONE BLOCK
`
`
`
`
`GET NEW FILE
`
`
`
`SAVE
`ENTIRE
`FILE
`
`
`
`
`
`
`17 Claims, 4 Drawing Sheets
`
`
`
`Clouding Exhibit 2002, pg. 1
`
`
`
`
`
`US. Patent
`
`Aug. 8, 2000
`
`Sheet 1 0f 4
`
`6,101,507
`
`
`
`2N_<<Em>mmomDOm
`
`22535%$83
`
`.39Hooppm:
`
`$2mZQ.oo
`
`mZOC
`
`200c.8E029:0ch
`82>32D4Q.3922.momZOE.
`mm3
`
`3923..ONEmzotNo009:2no00965
`
`09$53..noUOO>m<m~59.__NE<moUOD<Q<Q
`
`E;5%8085539be8km:
`
`
`
`N.UHW
`
`Clouding Exhibit 2002, pg. 2
`
`
`
`
`
`
`
`US. Patent
`
`Aug. 8,2000
`
`Sheet 2 0f4
`
`6,101,507
`
`
`
`GET NEW FILE
`
`ENTIRE
`
`
`
`
`
`INIT WINDOW
`
`STORE INDICATOR
`
`AND POSITION
`
`. FILE
`‘ S
`
`
` 42
`
`
`44 E
`
`
`
`
`
`
`IDENT AND SAVE
`
`REVISION ELEMENT
`
`46
`
`ADVANCE WINDOW
`
`ONE CHARACTER
`
`48
`
`CALCULATE
`
`SIGNATURE
`
`ADVANCE WINDOW
`
`ONE BLOCK
`
`FIG. 2
`
`Clouding Exhibit 2002, pg. 3
`
`
`
`US. Patent
`
`Aug. 8,2000
`
`Sheet 3 0f4
`
`6,101,507
`
`68 MATCH COUNT
`TABLE 66
`
`_
`
`72 =
`
`
`
`Y
`
`76
`
`N
`
`74
`
` 70
`
`ADVANCE
`
`WINDOW
`
`BEYOND N
`
`N ?
`
`78
`
`Y
`
`USE
`
`HIGHEST
`
`FIG. 3
`
`Clouding Exhibit 2002, pg. 4
`
`
`
`US. Patent
`
`Aug. 8, 2000
`
`Sheet 4 0f 4
`
`6,101,507
`
`owis52%
`
`._.m:02.4
`
`mm“2182$95
`
`mmE
`
`SW55%2E0
`
`mmE
`
`ow
`
`mm
`
`00
`
`No
`
`
`
`.5950mhfimu
`
`><~_~_<
`
`02$:5
`
`<20
`
`.533050m
`
`><mm<
`
`
`
`><~E<:Emz<~:.
`
`nGEN
`
`<>3
`
`m>D
`
`v03Ev0
`
`m0
`
`mu.8m0
`
`«.4
`
`mm
`
`NU
`
`No
`
`o>
`
`o<
`
`om
`
`00
`
`oo
`
`0080m
`
`V65%
`
`m.
`
`.QNE
`
`Clouding Exhibit 2002, pg. 5
`
`
`
`
`
`
`
`
`6,101,507
`
`1
`FILE COMPARISON FOR DATA BACKUP
`AND FILE SYNCHRONIZATION
`
`CROSS REFERENCE TO RELATED
`APPLICATIONS
`
`Priority is claimed to US. Provisional patent application
`Ser. No. 60/037,597 entitled FILE COMPARISON FOR
`DATA BACKUP AND FILE SYNCHRONIZATION, filed
`Feb. 11, 1997.
`
`STATEMENT REGARDING FEDERALLY
`SPONSORED RESEARCH OR DEVELOPMENT
`
`Not Applicable
`
`BACKGROUND OF THE INVENTION
`
`The present invention is generally related to data backup
`and file synchronization, and more particularly to compari-
`son of sets of digital data to determine differences therebe-
`tween to facilitate data backup and file synchronization.
`It is desirable to have the facility to copy files present on
`a first computer system onto a second computer system over
`a connection. Such a need arises when synchronizing the
`contents of an “A” disk system on the first computer system
`with another disk system on the second computer system.
`Such a need also arises when using a target system to
`maintain a second copy of a file from a source system for
`disaster recovery purposes. However, file copying can be
`cumbersome and time consuming due to the bandwidth
`limitations of typical connections.
`It is known to reduce file copying requirements, and hence
`more efficiently utilize bandwidth and reduce backup time,
`by sending only those portions of files which have changed
`since the last backup to the target system. However,
`to
`implement such a system it is necessary to perform data
`comparison to locate the changed portions.
`A file comparison technique is described in US. Pat. No.
`5,479,654 entitled APPARATUS AND METHOD FOR
`RECONSTRUCTING A FILE FROM A DIFFERENCE
`
`SIGNATURE AND AN ORIGINAL FILE, issued to Squibb.
`Squibb describes a method of comparing previously stored
`digital signatures with new digital signatures calculated for
`data within a sliding window in order to recognize differ-
`ences between files. In particular, Squibb teaches using two
`functions to calculate digital signatures. The first function
`has the attribute of being incremental. The incremental
`function allows computation of a new digital signature using
`only the new data entering the window, the old data leaving
`the window and the old signature, and hence can be calcu-
`lated relatively quickly. The second function has the
`attribute of being position sensitive and provides a more
`unique signature from the data in the window, being sensi-
`tive to such changes as transposed bytes.
`One combination of functions that Squibb teaches
`includes an Exclusive-OR (“XOR”) function as the incre-
`mental function and a Cyclic Redundancy Check (“CRC”)
`function as the position sensitive function. The XOR func-
`tion of Squibb is incremental, but is not sensitive to such
`changes as transposed bytes. The CRC function of Squibb is
`position sensitive, but must include all of the data in the
`window in the signature calculation. The CRC function of
`Squibb therefore requires relatively more calculations to
`compute than the XOR function. In operation the XOR
`function is employed first. If the comparison between the
`previously stored signature and the XOR signature indicates
`a changed data portion then that result is assumed to be
`
`2
`correct. If the XOR signature comparison does not indicate
`a changed data portion then the CRC function is employed
`to produce a signature that is compared with the previously
`stored signature. The result from the CRC signature com-
`parison is assumed to be correct. However, a more efficient
`technique would be desirable.
`
`BRIEF SUMMARY OF THE INVENTION
`
`In accordance with the present invention, a file compari-
`son routine employs a sliding window and a single function
`F to generate a digital signature that is both position sensi-
`tive and incrementally computable. The function F is defined
`by two qualities. First, for F(A+B), where A is in array,
`F(A+B)=F(A)+F(B). Second, given a concatenation opera-
`tor “!” such that “0!A” indicates an array Awith 0 inserted
`before A, the function F has the property that there is a
`function G such that F(0!A)=G(F(A!0)). Both polynomials
`and cyclic redundancy checks (“CRC”) may be employed as
`the function.
`
`The function F enables computation of a position sensi-
`tive digital signature in a single step using only the new data
`entering the window, the old data leaving the window and
`the digital signature computed from the previous window.
`Hence, a file comparison system in accordance with the
`present
`invention offers improved performance because
`relatively fewer calculations are required and a relatively
`more unique digital signature is provided.
`In further accordance with the present invention, previ-
`ously detected and stored files which have been modified
`and saved under different file names are detected, and only
`those portions of the later version of the file which have
`changed are backed up. To detect such a later version of the
`already backed up file an “approximate comparison” is made
`between the file under investigation and the previously
`stored digital signature lists of other files that have been
`created on the source system. In one method for approximate
`comparison a signature is computed for a block of characters
`in the file under investigation and the signature is compared
`to previously stored signatures of backed up blocks. The
`extent of matches is then determined, such as by the number
`of matching blocks, and the matches are ranked. The pre-
`viously backed up block corresponding to the highest ranked
`match is then employed as a baseline for storage of the new
`file, with only the differing portions of the new file being
`backed up along with an index to the baseline file.
`
`BRIEF DESCRIPTION OF THE DRAWING
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`The invention will be more fully understood in view of the
`Detailed Description of the Invention and the Drawing, of
`which:
`
`FIG. 1 is a block diagram of a data backup system that
`employs file comparison;
`FIG. 2 is a block diagram which illustrates a sliding
`window comparison method;
`FIG. 3 is a block diagram which illustrates approximate
`comparison of files;
`FIGS. 4—6 illustrate how compression and encryption
`affect backup and recovery; and
`FIG. 7 is a block diagram which illustrates the block filter
`routine.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`55
`
`60
`
`65
`
`US. Provisional patent application Ser. No. 60/037,597
`entitled FILE COMPARISON FOR DATA BACKUP AND
`
`Clouding Exhibit 2002, pg. 6
`
`
`
`6,101,507
`
`3
`FILE SYNCHRONIZATION, filed Feb. 11, 1997 is incor-
`porated herein by reference.
`Referring to FIGS. 1 and 2, a sliding window comparison
`technique is employed for data backup and file synchroni-
`zation. The use of sliding windows and digital signatures for
`file comparison is known in the art and is taught, inter alia,
`in US. Pat. No. 5,479,654 entitled APPARATUS AND
`METHOD FOR RECONSTRUCTING A FILE FROM A
`DIFFERENCE SIGNATURE AND AN ORIGINAL FILE,
`issued to Squibb, which is incorporated herein by reference.
`Afile is selected from a storage device 10 in a source system
`12 as indicated in step 14. The selected file is compared with
`files on a storage device 16 in a target system 18 by file name
`as indicated in step 20. As a performance enhancing varia-
`tion a list of target system files can be maintained on the
`source system. If no matching file name is located on the
`target system 18, such as when a new file name 22 is selected
`in the source system, the entire new file 22 is transmitted
`from the source system 12 to the target system 18 for backup
`as indicated in step 24. If a matching file name is located on
`the target system, the file modification dates associated with
`the respective matching files are compared as indicated in
`step 26. Upon locating a previously detected file with an
`unchanged modification date, no changes in the file are
`indicated and a new file is loaded as indicated in step 14.
`Upon locating a previously detected file with a changed
`modification date, a control program 28 operating in the
`source system 12 identifies at least one contiguous portion of
`the file which has changed (“revision element”). The iden-
`tified revision elements, which may vary in size, are trans-
`mitted from the source system to the target system, and may
`be written into the backup copy of the file or separately
`backed up.
`The control program employs a sliding window 30 to
`produce digital signatures 32 from block data portions of the
`file in the source system to identify the revision elements.
`Each digital signature is a representation of the data char-
`acters within the window 30 when the signature is generated.
`For the first signature generated from a file, the signature is
`computed by employing all of the data within the window as
`indicated in step 34. The computed signature 32 is compared
`with previously stored digital signatures 36 to produce a
`result 38 that indicates whether there is a match therebe-
`
`tween as indicated in step 40. If a match is not detected then
`the window position is recorded and the revision element is
`saved as indicated in step 42. If the end of the file has been
`reached, as determined in step 44, a new file is selected as
`indicated in step 14. If the end of the file has not been
`reached, the window 30 is then advanced by one character
`as indicated in step 46. A digital signature is then computed
`as indicated in step 48. The digital signature is compared
`with the previously stored digital signatures to determine if
`there is a match therebetween as indicated in step 40. If a
`match is detected in step 40, a match indicator and the match
`position are recorded as indicated in step 50. If the end of file
`has not been reached, as determined in step 52, the window
`is advanced by the number of characters within the window,
`e.g., one block, as indicated in step 54. When the entire file
`has been analyzed, a new file is loaded as indicated in step
`14.
`
`Calculation of the digital signature is facilitated by
`employing any one of a class of functions (“F”) that have the
`property of being both incrementally computable when the
`window 30 is advanced and providing a position sensitive
`digital signature 32. For this function, F(A+B)=F(A)+F(B).
`Further, given a concatenation operator “!” such that “0!A”
`indicates an array A with a 0 inserted before it, the function
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`F also has the property that there is a function G such that
`F(0!A)=G(F(A!0)). As such, given an array within the
`window that ends with 0, after shifting each of the bytes
`therein down by one position and inserting 0 at the begin-
`ning of the window, the signature 32 can be computed for
`this new array from the signature of the old array without
`reprocessing each of the bytes in the new array. Further, the
`digital signature produced by the function F is relatively
`more unique than known incremental functions, and allows
`detection of such differences as transposed bytes of data
`within the window. The digital signature is thus “position
`sensitive.”
`
`The use of the function F provides improved performance
`in the file comparison system by reducing the number of
`calculations required to detect differences between sets of
`data such as files. The function enables computation of a
`position sensitive digital signature in a single step using only
`the new data 56 entering the window, the old data 58 leaving
`the window and the digital signature computed from the
`previous window. Hence,
`the performance is improved
`because relatively fewer calculations are required and a
`relatively more unique digital signature is provided.
`In one embodiment the function F is the polynomial:
`F(An)=Sum (ai+n*3(w'1'0) modulo 264 for i=0 to w—1, for an
`array A starting at index position n with a window size of w,
`where al- is an element of A. If the window is advanced in A,
`then the new function is computed as follows:
`F(An+1)=3*F(An)—an*3w'1+an+w modulo 264. Computing the
`new function is generally faster than computing the original
`F(An). Hence, file comparison is facilitated by use of this
`function.
`
`In view of the present disclosure it will now be appreci-
`ated by those skilled in the art that functions other than the
`function illustrated in the above embodiment will provide
`digital signatures that are both incrementally computable
`and position sensitive. For example,
`the general class of
`functions known as Cyclic Redundancy Check (“CRC”)
`functions, which are polynomials over GF(2), are incremen-
`tal functions and will provide position sensitive digital
`signatures.
`Detection of Similar Files
`
`Referring now to FIGS. 1 and 3, in a first alternative
`embodiment the file comparison technique is employed to
`detect similar files having different filenames in order to
`further facilitate backup operations. When a previously
`detected file, such as a form letter 60,
`is modified and
`subsequently saved under a different file name,
`the new
`version 62 of the file is detected and only the revision
`elements of that new version are saved. To detect the new
`
`version of the file an “approximate comparison” is made
`between the new file and the previously stored digital
`signatures from the other files that have been created on the
`source system. The approximate comparison detects files
`that are similar, although not necessarily identical to, the
`new version of the file.
`
`In one approximate comparison technique the control
`program maintains a list 64 of signatures for all source
`system files that have previously been backed up to the
`target system. A Match Count Table 66 is created with one
`row 68 for each file in the list of signatures. Each row in the
`Match Count Table is initialized to zero. The window 30 for
`
`signature computation in the new file is then positioned at
`the beginning of the file being examined.
`In an initial step 68 a signature 32 for the characters in the
`window 30 is computed with the function F. The computed
`signature is then compared against all of the signatures for
`the first “N” blocks of previously copied files in the list 64,
`
`Clouding Exhibit 2002, pg. 7
`
`
`
`6,101,507
`
`5
`where N is a predetermined integer value. If a match against
`at least one existing block signature is detected as indicated
`in step 70, then the row pointer 72 in the Match Count Table
`is incremented and the window 30 in the new file is moved
`
`to begin at the end 73 of the block for which the signature
`was just computed. If no match is detected at step 70 then
`the window 30 is advanced by one character as indicated in
`step 74. If comparison has not advanced beyond the first N
`blocks for the new file as determined in step 76, flow returns
`to step 68. Otherwise flow continues to step 78. As indicated
`in step 78, the file with the highest count in the Match Count
`Table is selected and a comparison algorithm is used to
`determine the set of changes between this file and the new
`file. The set of changes is then backed up along with a
`pointer to the original backup of the file. In the event of
`equal counts resulting in “ties” in the Match Count Table one
`of the files is arbitrarily selected.
`Restore Optimization
`Referring to FIGS. 1 and 4—7, in a second alternative
`embodiment, restoration of revision elements is facilitated
`by a filter routine when compression and encryption are
`employed. The filter routine controls the encryption and
`compression engines on the source system. When the
`detected revision elements are compressed and encrypted for
`transmission to the target system,
`the compression and
`encryption engine is restarted with each revision element so
`that the target system can assemble a representation of the
`final file for transmission to the source system without
`decrypting the revision elements. Hence, it is not necessary
`for the target system to have knowledge of the encryption
`key or compression algorithm.
`Performance is further facilitated in the case where a
`
`plurality of relatively small revision elements are present by
`combining the small revision elements together and com-
`pressing the collection of revision elements in a single step.
`In particular, the compression engine is not restarted at each
`revision element, so the compression algorithm has a larger
`array to work on and hence is more efficient. The problems
`posed for this technique when encryption is employed are
`evident in FIG. 4, which shows version 0 (“V0”) of the file,
`consisting of 7 blocks, and the revision elements for V1
`through V4. FIG. 5 shows the complete contents of V4. If a
`request for the restoration of V4 is made, and the target
`system cannot take the revision elements apart because they
`are all encrypted together, then the best transmission that can
`be made is to send a total of V0 (7 blocks), UV2 (4 blocks),
`UV3 (4 blocks), UV4 (4 blocks) for a total of 19 blocks to
`restore a 7 block long file. UV1 is unnecessary because
`everything in it has been replaced by later revision elements.
`Referring now to FIG. 6, an original backup and revision
`elements generated from updates are grouped for compres-
`sion and encryption. Not all of the updates for a single
`version are packed together. Rather, the updates are grouped
`together into units of compression (“chunks”). The optimal
`set of chunks to be returned in the illustrated example is:
`A2B2, C3D3, D4E4, F4G4, whereby only eight blocks are
`sent for a seven block file, rather than nineteen blocks as in
`the previously described technique. However, in practice the
`updates will not always align perfectly and may have
`arbitrary overlaps.
`File restoration is further illustrated by Tables 1—11. A file
`to be restored is stored in the archive system as a series of
`updates and an archive. The archive contains all of the bytes
`of the file. The updates contain file change information that
`can be employed to update the previous version of the file to
`the current version. The update algorithm works by creating
`a list of updates that encompasses all of the bytes of the file.
`
`6
`The algorithm first processes the most recent update and
`then works sequentially backwards through the set of
`updates, so that if a given byte of the file is already covered
`by an update, the algorithm will not employ an older update
`that contains that same byte.
`
`Addr
`
`v0
`
`0
`1
`2
`3
`4
`5
`6
`7
`8
`9
`10
`
`A
`B
`C
`D
`E
`F
`G
`H
`I
`J
`
`TABLE 1
`
`v1
`
`A
`L
`Q
`D
`E
`M
`F
`G
`H
`I
`J
`
`v2
`
`L
`Q
`D
`E
`M
`F
`N
`H
`I
`J
`
`v3
`
`P
`Q
`D
`E
`F
`N
`H
`I
`J
`
`Table 1 illustrates four versions of an example file. File
`version V1 differs from version V0 in two places.
`In
`particular, version V1 replaced two bytes at address 1 and
`inserted a byte at address 5. The update for V1 is coded as
`follows:
`
`TABLE 2
`
`Operation
`
`Data Offset
`
`Length
`
`Target
`
`Contents
`
`Copy
`Data
`Copy
`Data
`Copy
`
`0
`
`3
`
`5
`
`1
`2
`2
`1
`5
`
`0
`1
`3
`5
`6
`
`LQ
`
`M
`
`The update expressed in Table 2 is interpreted as:
`1. Copy 1 byte from the previous file (A) at address 0 to
`address 0 of the new version (Target) of the file.
`2. Place 2 bytes of newly supplied data (LQ) into the new
`file starting at address 1.
`3. Copy 2 bytes from the previous file (DE) at address 3
`to address 3 of the new file.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`4. Place 1 byte of newly supplied data (M) into the new
`file at address 5.
`
`45
`
`5. Copy 5 bytes from the previous file (FGHIJ) to address
`6 of new file.
`
`Given the above described coding method, the other three
`versions of the updates shown in Table 1 are coded as shown
`in Tables 3—5 below.
`
`50
`
`TABLE 3
`
`Operation
`Data
`Data
`Data
`
`Data Offset
`
`Length
`3
`3
`4
`
`Target
`0
`3
`6
`
`Contents
`ABC
`DEF
`GHIJ
`
`TABLE 4
`
`Operation
`
`Data Offset
`
`Length
`
`Target
`
`Contents
`
`Copy
`Data
`Copy
`
`1
`
`8
`
`6
`1
`3
`
`0
`6
`7
`
`N
`
`55
`
`60
`
`65
`
`Clouding Exhibit 2002, pg. 8
`
`
`
`6,101,507
`
`8
`Update List entry 2, namely 1—3. Entry 1 of the GetData List
`is then processed for source locations 4—8. This match is
`found with Update List item 3, a copy operation. GetData
`entry 3 is added. This satisfies the entire range of Entry 1
`from the GetData List. Entry 2 in the GetData List shows a
`source range of 1—3 and a Version of 2. The Update List is
`searched for a Version 2 entry with an output range matching
`this. Such an entry is found at Update List (4). This is a copy
`operation so a new entry (4) is placed in the GetData List.
`Note that because a source range of 1—3 was sought, and the
`output range from the Update List entry was 0—5, the entry
`is made into the GetData List, the Data Offset range and
`Length are adjusted to capture only the piece of Update List
`entry 4 to reflect what was needed by the GetData List entry
`2. Entry 2 from GetData is then completely satisfied. Entry
`3 from GetData shows a source range of 5—9. The first
`location of this range is matched by Update List(4), which
`is a copy operation and results in a new entry to GetData(5).
`This leaves GetData(3) with a source range of 6—9 to be
`processed. The GetData(3) source range of 6 is matched by
`Update List(5) which is a Data operation. The data (N) is
`copied to the output file at location 5. This is because the
`GetData(3) entry has a source range of 5—9 which maps to
`an output range of 4—8. Hence the source item at location 6,
`gets placed in the output file at location 5. The GetData(3)
`source range 7—9 is next found to match Update(6), a copy
`operation. This yields GetData(6). The remaining items in
`the GetData list are processed in a similar manner until the
`list is empty. This will yield the complete output file, without
`ever having to process entries from the update list Data items
`that did not affect the final output file.
`
`TABLE 5
`
`Operation
`Data
`Copy
`Copy
`
`Data Offset
`
`1
`5
`
`Length
`1
`3
`5
`
`Target
`0
`1
`4
`
`Contents
`P
`
`A block filter algorithm may be employed to facilitate
`restoration of a desired version of a file. When multiple
`updates for a file have been archived, intermediate updates
`may be overwritten by later updates. The block filter algo-
`rithm is employed to recognize such situations and avoid
`needless computation. The Block Filter Algorithm deter-
`mines an optimal set of updates. For example, providing file
`version V3 via the block filter algorithm includes creating an
`Update List of all of the version archives, ordered by version
`with the most recent at the head of the list. An empty output
`file and an empty GetData list are then created. The empty
`GetData list is a list of records that describes the data that is
`
`needed from each archive to create the output file. Entries
`are added to the end of the list and removed from the head
`
`of the list. Each record has the following four fields:
`Version: the archive version to get the data from
`Source Offset: the location within the archive file to get
`the data from
`
`the length of the data to be copied in this
`Length:
`operation
`Target: the location in the output file that this data will go
`to
`
`The following entry is placed at the head of the GetData list:
`Version=most recent version of the file to be retrieved
`Source Offset=0
`
`Length=length of output file
`Target=0
`The GetData list is processed by initially removing an
`entry from the head of the list. The entries in the Update List
`for the specified version are then searched to locate a record
`in the Update List whose source range overlaps the target
`range of the GetData entry being processed. “Overlap”
`indicates that some portion of the range defined by Data
`Offset+Length defined in the GetData entry overlaps the
`range defined by the Target+Length in the UpdateList. If the
`record found in the Update List is a “Data” record, then the
`data is copied from that record to the output file. If the record
`found in the Update List is a “Copy” record, then a new
`record is added to the end of the GetData list with the version
`
`entry decremented. Processing of the GetData entry contin-
`ues by scanning the Update List entries until the entire target
`range of the GetData entry has been covered with either Data
`or Copy entries from the Update List. Flow then return to the
`initial step at which processing of the next entry from the
`GetData list begins.
`Exemplary block filter algorithm execution steps are
`shown in Tables 5—8. Entry 1 in the GetData List shows a
`source range of 0—8 and a Version of 3. The Update List is
`searched for a Version 3 entry with a matching output range.
`The first such entry is Entry 1. This specifies Data with a
`length of 1 (P) and it is placed in the output file and location
`0, the target location. Entry 1 in the GetData List has been
`satisfied for source location 0, leaving source locations 1—8
`to be satisfied. Entry 1 in the GetData List is processed for
`a source range of 1—8. Amatch is found in Update List entry.
`This is a copy operation, so an entry is added to the GetData
`List (entry 2) that specifies Version 2, the next lower version,
`and spans the range of overlap from GetData entry 1 and
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`1
`5
`1
`
`8
`0
`
`Version Operation Data
`3
`Data
`3
`Copy
`3
`Copy
`2
`Copy
`2
`Data
`2
`Copy
`1
`Copy
`1
`Data
`1
`Copy
`1
`Data
`1
`Copy
`0
`Data
`0
`Data
`0
`Data
`
`3
`
`5
`
`1
`2
`3
`4
`5
`6
`7
`8
`9
`10
`11
`12
`13
`14
`
`Length
`1
`3
`5
`6
`1
`3
`1
`2
`2
`1
`5
`3
`3
`4
`TABLE 6
`
`Target Contents
`0
`P
`1
`4
`0
`6
`7
`0
`1
`3
`5
`6
`0
`3
`6
`
`N
`
`LQ
`
`M
`
`ABC
`DEF
`GHIJ
`
`Output
`Range
`0
`1—3
`4—8
`0—5
`6
`7—9
`0
`1—2
`3—4
`5
`6—10
`0—2
`3—5
`6—9
`TABLE 6a
`
`TABLE 7
`
`GetData
`
`Data Offset
`0
`1
`5
`2
`6
`8
`3
`5
`8
`
`Length
`9
`3
`5
`3
`1
`3
`2
`1
`3
`
`Target
`0
`1
`4
`1
`4
`6
`2
`4
`6
`
`Version
`3
`2
`2
`1
`1
`1
`0
`0
`0
`
`1
`2
`3
`4
`5
`6
`7
`8
`9
`
`Clouding Exhibit 2002, pg. 9
`
`
`
`6,101,507
`
`9
`
`TABLE 7a
`
`Source
`0-8
`1-3
`5-9
`2-4
`6
`8-10
`3-4
`5
`7-9
`
`Ouput Range
`0—8
`1—3
`4—8
`1—3
`4
`6—8
`2—3
`4
`6—8
`
`1
`2
`3
`4
`5
`6
`7
`8
`9
`
`From GetData
`
`From Update
`
`1
`1
`2
`3
`3
`4
`5
`6
`
`2
`3
`4
`4
`6
`9
`11
`11
`
`TABLE 8
`
`Output File
`
`Address
`
`Contents
`
`From GetData
`List
`
`From Update
`List
`
`0
`1
`2
`3
`4
`5
`6
`7
`8
`
`P
`Q
`D
`E
`F
`N
`H
`I
`J
`
`1
`4
`7
`7
`8
`3
`9
`9
`9
`
`1
`8
`13
`13
`13
`5
`14
`14
`14
`
`Having described the preferred embodiments of the
`invention, other embodiments which incorporate the con-
`cepts of the invention will now become apparent to one of
`skill in the art. Therefore, the invention should not be viewed
`as limited to the disclosed embodiments but rather should
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`viewed as limited only by the spirit and scope of the
`appended claims.
`What is claimed is:
`
`35
`
`1. A system for comparing a first data set on a first storage
`device with a second data set on a second storage device,
`comprising:
`a transmission medium for transmitting data between the
`first storage device and the second storage device;
`a control program that generates a first digital signature
`from the first data set and a second digital signature
`from the second data set, the control program including
`a function F that
`incrementally calculates position
`sensitive first and second digital signatures, wherein
`said first data set
`is defined by a sliding indexed
`window and said incremental calculation comprises
`data leaving the window, data entering the window, and
`a previous digital signature; and
`a comparator for determining whether the first digital
`signature matches the second digital signature.
`2. The comparing system of claim 1 wherein polynomials
`are used to implement the function F.
`3. The comparing system of claim 2 wherein the function
`F is a polynomial: F(An)=Sum (ai+n*N(W'1'0) modulo 264
`for i=0 to w—1, for an array A starting at index position n
`with a window size of w, where al- is an element of A.
`4. The comparing system of claim 3 wherein,
`if the
`window is advanced in A, then a new function is computed
`as:
`
`F(An=1)=N*F(An)—an*NW'1+an+w modulo 264.
`5. The comparing system of claim 1 wherein a cyclic
`redundancy check is used to implement the function F.
`6. A method for calculating a position sensitive digital
`signature from data on a first storage medium for compari-
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`10
`son with a digital signature that represents data on a second
`storage medium, comprising the steps of:
`selecting the data in the first system with an indexed
`sliding window;
`applying a function F to the data within the window to
`generate the incremental position sensitive digital
`signature, wherein said function of F