throbber
000001
`
`Symantec 1017
`IPR of U.S. Pat. No. 7,757,298
`
`

`
`P3U
`
`
`
`w2aN_sEm>mSW58
`
`
`
`9<,m:m>m:83
`
`tnu
`
`4Mm
`
`7nu5,1nu1,6
`
`gE.5%uoo<%¢M82:2uo9m<m
`M£2M22mzo:M32HoE.
`
`3222sazo:
`
`32>32<£
`
`
`
`|mE.-..o.:.4Ju_o__
`
`mm
`
`we
`
`8
`
`3923..momzot
`
`82>._D_.Buoo>m<m_
`
`R9.__~_n_<mouon_<n_<._
`
`39_.UO85:
`
`\.m|:Eoc__
`
`000002
`
`N.U~,_~
`
`000002
`
`
`
`
`

`
`U.S. Patent
`
`Aug. 8,2000
`
`Sheet 2 of4
`
`6,101,507
`
`
`
`INIT WINDOW
`
`IDENT AND SAVE
`
`REVISION ELEMENT
`
` 42
`44
`
`STORE INDICATOR
`
`AND POSITION
`
`
`
`
`
`ADVANCE WINDOW
`
`ONE BLOCK
`
`FIG. 2
`
`000003
`
`
`
`46
`
`ADVANCE WINDOW
`
`ONE CHARACTER
`
`48
`
`CALCULATE
`
`SIGNATURE
`
`000003
`
`

`
`U.S. Patent
`
`Aug. 8,2000
`
`Sheet 3 of4
`
`6,101,507
`
`68 MATCH COUNT
`TABLE 66
`
`64
`
`\l f\3
`
`28
`
`Y
`
`76
`
`N
`
`74
`
` 70
`
`ADVANCE
`
`WINDOW
`
`BEYOND N
`
`N ?
`
`78
`
`Y
`
`USE
`
`HIGHEST
`
`FIG. 3
`
`000004
`
`000004
`
`

`
`U
`
`n
`
`U
`
`05
`
`M
`
`M
`
`A$5
`
`
`
`3m>_:u~_<m>_#E~_.58E22
`
`w88U8amm5omPN4.22S.«>3Q5«>35:o>
`85324mm3m_twas53%V.U~n.~
`
`H,RGEEH0EEEQa88E888.a§_:_a_,s__EEEEW.SE
`m.EE>354222
`
`mms;mmm2,«m2gmm>_Iu~_<2&0§
`
`4S3%oz_“_:<uE
`$3><~_~_<E.5
`5n_5om:s~_u8MURNm
`
`0
`
`000005
`
`
`

`
`6JIH,507
`
`1
`FILE COMPARISON FOR DATA BACKUP
`AND FILE SYNCHRONIZATION
`
`CROSS REFERENCE TO RELATED
`APPLICATIONS
`
`Priority is claimed to U.S. Provisional patent application
`Ser. No. 60/037,597 entitled FILE COMPARISON FOR
`DATA BACKUP AND FILE SYNCHRONIZATION, filed
`Feb. 11, 1997.
`
`STATEMENT REGARDING FEDERALLY
`SPONSORED RESEARCH OR DEVELOPMENT
`
`Not Applicable
`
`BACKGROUND OF THE INVENTION
`
`The present invention is generally related to data backup
`and file synchronization, and more particularly to compari-
`son of sets of digital data to determine differences therebe-
`tween to facilitate data backup and file synchronization.
`It is desirable to have the facility to copy files present on
`a first computer system onto a second computer system over
`a connection. Such a need arises when synchronizing the
`contents of an “A” disk system on the first computer system
`with another disk system on the second computer system.
`Such a need also arises when using a target system to
`maintain a second copy of a file from a source system for
`disaster recovery purposes. However, file copying can be
`cumbersome and time consuming due to the bandwidth
`limitations of typical connections.
`It is known to reduce file copying requirements, and hence
`more efficiently utilize bandwidth and reduce backup time,
`by sending only those portions of files which have changed
`since the last backup to the target system. However,
`to
`implement such a system it is necessary to perform data
`comparison to locate the changed portions.
`A file comparison technique is described in U.S. Pat. No.
`5,479,654 entitled APPARATUS AND METHOD FOR
`RECONSTRUCTING A FILE FROM A DIFFERENCE
`
`SIGNATURE AND AN ORIGINAL FILE, issued to Squibb.
`Squibb describes a method of comparing previously stored
`digital signatures with new digital signatures calculated for
`data within a sliding window in order to recognize differ-
`ences between files. In particular, Squibb teaches using two
`functions to calculate digital signatures. The first function
`has the attribute of being incremental. The incremental
`function allows computation of a new digital signature using
`only the new data entering the window, the old data leaving
`the window and the old signature, and hence can be calcu-
`lated relatively quickly. The second function has the
`attribute of being position sensitive and provides a more
`unique signature from the data in the window, being sensi-
`tive to such changes as transposed bytes.
`One combination of functions that Squibb teaches
`includes an Exclusive-OR (“XOR”) function as the incre-
`mental function and a Cyclic Redundancy Check (“CRC”)
`function as the position sensitive function. The XOR func-
`tion of Squibb is incremental, but is not sensitive to such
`changes as transposed bytes. The CRC function of Squibb is
`position sensitive, but must include all of the data in the
`window in the signature calculation. The CRC function of
`Squibb therefore requires relatively more calculations to
`compute than the XOR function. In operation the XOR
`function is employed first. If the comparison between the
`previously stored signature and the XOR signature indicates
`a changed data portion then that result is assumed to be
`
`2
`correct. If the XOR signature comparison does not indicate
`a changed data portion then the CRC function is employed
`to produce a signature that is compared with the previously
`stored signature. The result from the CRC signature com-
`parison is assumed to be correct. However, a more efficient
`technique would be desirable.
`
`BRIEF SUMMARY OF THE INVENTION
`
`In accordance with the present invention, a file compari-
`son routine employs a sliding window and a single function
`F to generate a digital signature that is both position sensi-
`tive and incrementally computable. The function F is defined
`by two qualities. First, for F(A+B), where A is in array,
`F(A+B)=F(A)+F(B). Second, given a concatenation opera-
`tor “!” such that “0!A” indicates an array Awith 0 inserted
`before A, the function F has the property that there is a
`function G such that F(0!A)=G(F(A!0)). Both polynomials
`and cyclic redundancy checks (“CRC”) may be employed as
`the function.
`
`The function F enables computation of a position sensi-
`tive digital signature in a single step using only the new data
`entering the window, the old data leaving the window and
`the digital signature computed from the previous window.
`Hence, a file comparison system in accordance with the
`present
`invention offers improved performance because
`relatively fewer calculations are required and a relatively
`more unique digital signature is provided.
`In further accordance with the present invention, previ-
`ously detected and stored files which have been modified
`and saved under different file names are detected, and only
`those portions of the later version of the file which have
`changed are backed up. To detect such a later version of the
`already backed up file an “approximate comparison” is made
`between the file under investigation and the previously
`stored digital signature lists of other files that have been
`created on the source system. In one method for approximate
`comparison a signature is computed for a block of characters
`in the file under investigation and the signature is compared
`to previously stored signatures of backed up blocks. The
`extent of matches is then determined, such as by the number
`of matching blocks, and the matches are ranked. The pre-
`viously backed up block corresponding to the highest ranked
`match is then employed as a baseline for storage of the new
`file, with only the differing portions of the new file being
`backed up along with an index to the baseline file.
`
`BRIEF DESCRIPTION OF THE DRAWING
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`The invention will be more fully understood in view of the
`Detailed Description of the Invention and the Drawing, of
`which:
`
`FIG. 1 is a block diagram of a data backup system that
`employs file comparison;
`FIG. 2 is a block diagram which illustrates a sliding
`window comparison method;
`FIG. 3 is a block diagram which illustrates approximate
`comparison of files;
`FIGS. 4-6 illustrate how compression and encryption
`affect backup and recovery; and
`FIG. 7 is a block diagram which illustrates the block filter
`routine.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`55
`
`60
`
`65
`
`U.S. Provisional patent application Ser. No. 60/037,597
`entitled FILE COMPARISON FOR DATA BACKUP AND
`
`000006
`
`000006
`
`

`
`6JIH,507
`
`3
`FILE SYNCHRONIZATION, filed Feb. 11, 1997 is incor-
`porated herein by reference.
`Referring to FIGS. 1 and 2, a sliding window comparison
`technique is employed for data backup and file synchroni-
`zation. The use of sliding windows and digital signatures for
`file comparison is known in the art and is taught, inter alia,
`in U.S. Pat. No. 5,479,654 entitled APPARATUS AND
`METHOD FOR RECONSTRUCTING A FILE FROM A
`DIFFERENCE SIGNATURE AND AN ORIGINAL FILE,
`issued to Squibb, which is incorporated herein by reference.
`Afile is selected from a storage device 10in a source system
`12 as indicated in step 14. The selected file is compared with
`files on a storage device 16 in a target system 18 by file name
`as indicated in step 20. As a performance enhancing varia-
`tion a list of target system files can be maintained on the
`source system. If no matching file name is located on the
`target system 18, such as when a new file name 22 is selected
`in the source system, the entire new file 22 is transmitted
`from the source system 12 to the target system 18 for backup
`as indicated in step 24. If a matching file name is located on
`the target system, the file modification dates associated with
`the respective matching files are compared as indicated in
`step 26. Upon locating a previously detected file with an
`unchanged modification date, no changes in the file are
`indicated and a new file is loaded as indicated in step 14.
`Upon locating a previously detected file with a changed
`modification date, a control program 28 operating in the
`source system 12 identifies at least one contiguous portion of
`the file which has changed (“revision element”). The iden-
`tified revision elements, which may vary in size, are trans-
`mitted from the source system to the target system, and may
`be written into the backup copy of the file or separately
`backed up.
`The control program employs a sliding window 30 to
`produce digital signatures 32 from block data portions of the
`file in the source system to identify the revision elements.
`Each digital signature is a representation of the data char-
`acters within the window 30 when the signature is generated.
`For the first signature generated from a file, the signature is
`computed by employing all of the data within the window as
`indicated in step 34. The computed signature 32 is compared
`with previously stored digital signatures 36 to produce a
`result 38 that indicates whether there is a match therebe-
`
`tween as indicated in step 40. If a match is not detected then
`the window position is recorded and the revision element is
`saved as indicated in step 42. If the end of the file has been
`reached, as determined in step 44, a new file is selected as
`indicated in step 14. If the end of the file has not been
`reached, the window 30 is then advanced by one character
`as indicated in step 46. A digital signature is then computed
`as indicated in step 48. The digital signature is compared
`with the previously stored digital signatures to determine if
`there is a match therebetween as indicated in step 40. If a
`match is detected in step 40, a match indicator and the match
`position are recorded as indicated in step 50. If the end of file
`has not been reached, as determined in step 52, the window
`is advanced by the number of characters within the window,
`e.g., one block, as indicated in step 54. When the entire file
`has been analyzed, a new file is loaded as indicated in step
`14.
`
`Calculation of the digital signature is facilitated by
`employing any one of a class of functions (“F”) that have the
`property of being both incrementally computable when the
`window 30 is advanced and providing a position sensitive
`digital signature 32. For this function, F(A+B)=F(A)+F(B).
`Further, given a concatenation operator “!” such that “0!A”
`indicates an array A with a 0 inserted before it, the function
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`F also has the property that there is a function G such that
`F(0!A)=G(F(A!0)). As such, given an array within the
`window that ends with 0, after shifting each of the bytes
`therein down by one position and inserting 0 at the begin-
`ning of the window, the signature 32 can be computed for
`this new array from the signature of the old array without
`reprocessing each of the bytes in the new array. Further, the
`digital signature produced by the function F is relatively
`more unique than known incremental functions, and allows
`detection of such differences as transposed bytes of data
`within the window. The digital signature is thus “position
`sensitive.”
`
`The use of the function F provides improved performance
`in the file comparison system by reducing the number of
`calculations required to detect differences between sets of
`data such as files. The function enables computation of a
`position sensitive digital signature in a single step using only
`the new data 56 entering the window, the old data 58 leaving
`the window and the digital signature computed from the
`previous window. Hence,
`the performance is improved
`because relatively fewer calculations are required and a
`relatively more unique digital signature is provided.
`In one embodiment the function F is the polynomial:
`F(An)=Sum (at-+n*3(W‘1“)) modulo 264 for i=0 to w—1, for an
`array A starting at index position n with a window size of w,
`where ai is an element of A. If the window is advanced in A,
`then the new function is computed as follows:
`F(An+1)=3*F(An)—an*3W‘1+an+W modulo 264. Computing the
`new function is generally faster than computing the original
`Hence, file comparison is facilitated by use of this
`function.
`
`In view of the present disclosure it will now be appreci-
`ated by those skilled in the art that functions other than the
`function illustrated in the above embodiment will provide
`digital signatures that are both incrementally computable
`and position sensitive. For example,
`the general class of
`functions known as Cyclic Redundancy Check (“CRC”)
`functions, which are polynomials over GF(2), are incremen-
`tal functions and will provide position sensitive digital
`signatures.
`Detection of Similar Files
`
`Referring now to FIGS. 1 and 3, in a first alternative
`embodiment the file comparison technique is employed to
`detect similar files having different filenames in order to
`further facilitate backup operations. When a previously
`detected file, such as a form letter 60,
`is modified and
`subsequently saved under a different file name,
`the new
`version 62 of the file is detected and only the revision
`elements of that new version are saved. To detect the new
`
`version of the file an “approximate comparison” is made
`between the new file and the previously stored digital
`signatures from the other files that have been created on the
`source system. The approximate comparison detects files
`that are similar, although not necessarily identical to, the
`new version of the file.
`
`In one approximate comparison technique the control
`program maintains a list 64 of signatures for all source
`system files that have previously been backed up to the
`target system. A Match Count Table 66 is created with one
`row 68 for each file in the list of signatures. Each row in the
`Match Count Table is initialized to zero. The window 30 for
`
`signature computation in the new file is then positioned at
`the beginning of the file being examined.
`In an initial step 68 a signature 32 for the characters in the
`window 30 is computed with the function F. The computed
`signature is then compared against all of the signatures for
`the first “N” blocks of previously copied files in the list 64,
`
`000007
`
`000007
`
`

`
`6JIH,507
`
`5
`where N is a predetermined integer value. If a match against
`at least one existing block signature is detected as indicated
`in step 70, then the row pointer 72 in the Match Count Table
`is incremented and the window 30 in the new file is moved
`
`to begin at the end 73 of the block for which the signature
`was just computed. If no match is detected at step 70 then
`the window 30 is advanced by one character as indicated in
`step 74. If comparison has not advanced beyond the first N
`blocks for the new file as determined in step 76, flow returns
`to step 68. Otherwise flow continues to step 78. As indicated
`in step 78, the file with the highest count in the Match Count
`Table is selected and a comparison algorithm is used to
`determine the set of changes between this file and the new
`file. The set of changes is then backed up along with a
`pointer to the original backup of the file. In the event of
`equal counts resulting in “ties” in the Match Count Table one
`of the files is arbitrarily selected.
`Restore Optimization
`Referring to FIGS. 1 and 4-7, in a second alternative
`embodiment, restoration of revision elements is facilitated
`by a filter routine when compression and encryption are
`employed. The filter routine controls the encryption and
`compression engines on the source system. When the
`detected revision elements are compressed and encrypted for
`transmission to the target system,
`the compression and
`encryption engine is restarted with each revision element so
`that the target system can assemble a representation of the
`final file for transmission to the source system without
`decrypting the revision elements. Hence, it is not necessary
`for the target system to have knowledge of the encryption
`key or compression algorithm.
`Performance is further facilitated in the case where a
`
`plurality of relatively small revision elements are present by
`combining the small revision elements together and com-
`pressing the collection of revision elements in a single step.
`In particular, the compression engine is not restarted at each
`revision element, so the compression algorithm has a larger
`array to work on and hence is more efficient. The problems
`posed for this technique when encryption is employed are
`evident in FIG. 4, which shows version 0 (“V0”) of the file,
`consisting of 7 blocks, and the revision elements for V1
`through V4. FIG. 5 shows the complete contents of V4. If a
`request for the restoration of V4 is made, and the target
`system cannot take the revision elements apart because they
`are all encrypted together, then the best transmission that can
`be made is to send a total of V0 (7 blocks), UV2 (4 blocks),
`UV3 (4 blocks), UV4 (4 blocks) for a total of 19 blocks to
`restore a 7 block long file. UV1 is unnecessary because
`everything in it has been replaced by later revision elements.
`Referring now to FIG. 6, an original backup and revision
`elements generated from updates are grouped for compres-
`sion and encryption. Not all of the updates for a single
`version are packed together. Rather, the updates are grouped
`together into units of compression (“chunks”). The optimal
`set of chunks to be returned in the illustrated example is:
`A2B2, C3D3, D4E4, F4G4, whereby only eight blocks are
`sent for a seven block file, rather than nineteen blocks as in
`the previously described technique. However, in practice the
`updates will not always align perfectly and may have
`arbitrary overlaps.
`File restoration is further illustrated by Tables 1-11. A file
`to be restored is stored in the archive system as a series of
`updates and an archive. The archive contains all of the bytes
`of the file. The updates contain file change information that
`can be employed to update the previous version of the file to
`the current version. The update algorithm works by creating
`a list of updates that encompasses all of the bytes of the file.
`
`6
`The algorithm first processes the most recent update and
`then works sequentially backwards through the set of
`updates, so that if a given byte of the file is already covered
`by an update, the algorithm will not employ an older update
`that contains that same byte.
`
`Addr
`
`V0
`
`0
`1
`2
`3
`4
`5
`6
`7
`8
`9
`10
`
`A
`B
`C
`D
`E
`F
`G
`H
`1
`J
`
`TABLE 1
`
`V1
`
`A
`L
`Q
`D
`E
`M
`F
`G
`H
`1
`J
`
`V2
`
`L
`Q
`D
`E
`M
`F
`N
`H
`1
`J
`
`V3
`
`P
`Q
`D
`E
`F
`N
`H
`1
`J
`
`Table 1 illustrates four versions of an example file. File
`version V1 differs from version V0 in two places.
`In
`particular, version V1 replaced two bytes at address 1 and
`inserted a byte at address 5. The update for V1 is coded as
`follows:
`
`TABLE 2
`
`Operation
`
`Data Offset
`
`Length
`
`Target
`
`Contents
`
`Copy
`Data
`Copy
`Data
`Copy
`
`0
`
`3
`
`5
`
`1
`2
`2
`1
`5
`
`0
`1
`3
`5
`6
`
`LQ
`
`M
`
`The update expressed in Table 2 is interpreted as:
`1. Copy 1 byte from the previous file (A) at address 0 to
`address 0 of the new version (Target) of the file.
`2. Place 2 bytes of newly supplied data (LQ) into the new
`file starting at address 1.
`3. Copy 2 bytes from the previous file (DE) at address 3
`to address 3 of the new file.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`4. Place 1 byte of newly supplied data (M) into the new
`file at address 5.
`
`45
`
`5. Copy 5 bytes from the previous file (FGHIJ) to address
`6 of new file.
`
`Given the above described coding method, the other three
`versions of the updates shown in Table 1 are coded as shown
`in Tables 3-5 below.
`
`50
`
`TABLE 3
`
`Operation
`Data
`Data
`Data
`
`Data Offset
`
`Length
`3
`3
`4
`
`Target
`0
`3
`6
`
`Contents
`ABC
`DEF
`GHU
`
`TABLE 4
`
`Operation
`
`Data Offset
`
`Length
`
`Target
`
`Contents
`
`Copy
`Data
`Copy
`
`1
`
`8
`
`6
`1
`3
`
`0
`6
`7
`
`N
`
`55
`
`60
`
`65
`
`000008
`
`000008
`
`

`
`TABLE 5
`
`Operation
`Data
`Copy
`Copy
`
`Data Offset
`
`1
`5
`
`Length
`1
`3
`5
`
`Target
`0
`1
`4
`
`Contents
`P
`
`A block filter algorithm may be employed to facilitate
`restoration of a desired version of a file. When multiple
`updates for a file have been archived, intermediate updates
`may be overwritten by later updates. The block filter algo-
`rithm is employed to recognize such situations and avoid
`needless computation. The Block Filter Algorithm deter-
`mines an optimal set of updates. For example, providing file
`version V3 via the block filter algorithm includes creating an
`Update List of all of the version archives, ordered by version
`with the most recent at the head of the list. An empty output
`file and an empty GetData list are then created. The empty
`GetData list is a list of records that describes the data that is
`
`needed from each archive to create the output file. Entries
`are added to the end of the list and removed from the head
`
`of the list. Each record has the following four fields:
`Version: the archive version to get the data from
`Source Offset: the location within the archive file to get
`the data from
`
`the length of the data to be copied in this
`Length:
`operation
`Target: the location in the output file that this data will go
`to
`
`The following entry is placed at the head of the GetData list:
`Version=most recent version of the file to be retrieved
`Source Offset=0
`
`Length=length of output file
`Target=0
`The GetData list is processed by initially removing an
`entry from the head of the list. The entries in the Update List
`for the specified version are then searched to locate a record
`in the Update List whose source range overlaps the target
`range of the GetData entry being processed. “Overlap”
`indicates that some portion of the range defined by Data
`Offset+Length defined in the GetData entry overlaps the
`range defined by the Target+Length in the UpdateList. If the
`record found in the Update List is a “Data” record, then the
`data is copied from that record to the output file. If the record
`found in the Update List is a “Copy” record, then a new
`record is added to the end of the GetData list with the version
`
`entry decremented. Processing of the GetData entry contin-
`ues by scanning the Update List entries until the entire target
`range of the GetData entry has been covered with either Data
`or Copy entries from the Update List. Flow then return to the
`initial step at which processing of the next entry from the
`GetData list begins.
`Exemplary block filter algorithm execution steps are
`shown in Tables 5-8. Entry 1 in the GetData List shows a
`source range of 0-8 and a Version of 3. The Update List is
`searched for a Version 3 entry with a matching output range.
`The first such entry is Entry 1. This specifies Data with a
`length of 1 (P) and it is placed in the output file and location
`0, the target location. Entry 1 in the GetData List has been
`satisfied for source location 0, leaving source locations 1-8
`to be satisfied. Entry 1 in the GetData List is processed for
`a source range of 1-8. Amatch is found in Update List entry.
`This is a copy operation, so an entry is added to the GetData
`List (entry 2) that specifies Version 2, the next lower version,
`and spans the range of overlap from GetData entry 1 and
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`1
`5
`1
`
`8
`0
`
`Version Operation Data
`3
`Data
`3
`Copy
`3
`Copy
`2
`Copy
`2
`Data
`2
`Copy
`1
`Copy
`1
`Data
`1
`Copy
`1
`Data
`1
`Copy
`0
`Data
`0
`Data
`0
`Data
`
`3
`
`5
`
`1
`2
`3
`4
`5
`6
`7
`8
`9
`10
`11
`12
`13
`14
`
`Length
`1
`3
`5
`6
`1
`3
`1
`2
`2
`1
`5
`3
`3
`4
`TABLE 6
`
`Target Contents
`0
`P
`1
`4
`0
`6
`7
`0
`1
`3
`5
`6
`0
`3
`6
`
`N
`
`LQ
`
`M
`
`ABC
`DEF
`GHIJ
`
`Output
`Range
`0
`1-3
`4-8
`0-5
`6
`7-9
`0
`1-2
`3-4
`5
`6-10
`0-2
`3-5
`6-9
`TABLE 6a
`
`6JIH,507
`
`8
`Update List entry 2, namely 1-3. Entry 1 of the GetData List
`is then processed for source locations 4-8. This match is
`found with Update List item 3, a copy operation. GetData
`entry 3 is added. This satisfies the entire range of Entry 1
`from the GetData List. Entry 2 in the GetData List shows a
`source range of 1-3 and a Version of 2. The Update List is
`searched for a Version 2 entry with an output range matching
`this. Such an entry is found at Update List This is a copy
`operation so a new entry (4) is placed in the GetData List.
`Note that because a source range of 1-3 was sought, and the
`output range from the Update List entry was 0-5, the entry
`is made into the GetData List, the Data Offset range and
`Length are adjusted to capture only the piece of Update List
`entry 4 to refiect what was needed by the GetData List entry
`2. Entry 2 from GetData is then completely satisfied. Entry
`3 from GetData shows a source range of 5-9. The first
`location of this range is matched by Update List(4), which
`is a copy operation and results in a new entry to GetData(5).
`This leaves GetData(3) with a source range of 6-9 to be
`processed. The GetData(3) source range of 6 is matched by
`Update List(5) which is a Data operation. The data (N) is
`copied to the output file at location 5. This is because the
`GetData(3) entry has a source range of 5-9 which maps to
`an output range of 4-8. Hence the source item at location 6,
`gets placed in the output file at location 5. The GetData(3)
`source range 7-9 is next found to match Update(6), a copy
`operation. This yields GetData(6). The remaining items in
`the GetData list are processed in a similar manner until the
`list is empty. This will yield the complete output file, without
`ever having to process entries from the update list Data items
`that did not affect the final output file.
`
`TABLE 7
`
`GetData
`
`Data Offset
`0
`1
`5
`2
`6
`8
`3
`5
`8
`
`Length
`9
`3
`5
`3
`1
`3
`2
`1
`3
`
`Target
`0
`1
`4
`1
`4
`6
`2
`4
`6
`
`Version
`3
`2
`2
`1
`1
`1
`0
`0
`0
`
`1
`2
`3
`4
`5
`6
`7
`8
`9
`
`000009
`
`000009
`
`

`
`6JIH,507
`
`9
`
`TABLE 7a
`
`Source
`0-8
`1-3
`5-9
`2-4
`6
`8-10
`3-4
`5
`7-9
`
`Ouput Range
`0-8
`1-3
`4-8
`1-3
`4
`6-8
`2-3
`4
`6-8
`
`1
`2
`3
`4
`5
`6
`7
`8
`9
`
`From GetData
`
`From Update
`
`1
`1
`2
`3
`3
`4
`5
`6
`
`2
`3
`4
`4
`6
`9
`11
`11
`
`TABLE 8
`
`Output File
`
`Address
`
`Contents
`
`From GetData
`List
`
`From Update
`List
`
`0
`1
`2
`3
`4
`5
`6
`7
`8
`
`P
`Q
`D
`E
`F
`N
`H
`I
`J
`
`1
`4
`7
`7
`8
`3
`9
`9
`9
`
`1
`8
`13
`13
`13
`5
`14
`14
`14
`
`Having described the preferred embodiments of the
`invention, other embodiments which incorporate the con-
`cepts of the invention will now become apparent to one of
`skill in the art. Therefore, the invention should not be viewed
`as limited to the disclosed embodiments but rather should
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`viewed as limited only by the spirit and scope of the
`appended claims.
`What is claimed is:
`
`35
`
`1. A system for comparing a first data set on a first storage
`device with a second data set on a second storage device,
`comprising:
`a transmission medium for transmitting data between the
`first storage device and the second storage device;
`a control program that generates a first digital signature
`from the first data set and a second digital signature
`from the second data set, the control program including
`a function F that
`incrementally calculates position
`sensitive first and second digital signatures, wherein
`said first data set
`is defined by a sliding indexed
`window and said incremental calculation comprises
`data leaving the window, data entering the window, and
`a previous digital signature; and
`a comparator for determining whether the first digital
`signature matches the second digital signature.
`2. The comparing system of claim 1 wherein polynomials
`are used to implement the function F.
`3. The comparing system of claim 2 wherein the function
`F is a polynomial: F(An)=Sum (ai+n*N(W‘1"6) modulo 264
`for i=0 to w—1, for an array A starting at index position n
`with a window size of w, where ai is an element of A.
`4. The comparing system of claim 3 wherein,
`if the
`window is advanced in A, then a new function is computed
`as:
`
`F(An=1)=N*F(An)—an*NW‘1+an+W modulo 264.
`5. The comparing system of claim 1 wherein a cyclic
`redundancy check is used to implement the function F.
`6. A method for calculating a position sensitive digital
`signature from data on a first storage medium for compari-
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`10
`son with a digital signature that represents data on a second
`storage medium, comprising the steps of:
`selecting the data in the first system with an indexed
`sliding window;
`applying a function F to the data within the window to
`generate the incremental position sensitive digital
`signature, wherein said function of F comprises input
`from data leaving said sliding indexed window, data
`entering said sliding indexed window, and a previously
`generated digital signature; and
`comparing the position sensitive digital signature with at
`least one digital signature representing data in the
`second system.
`7. The method of claim 6 including the further step of
`employing a polynomials to implement the function F.
`8. The method of claim 7 including the further step of
`1+)’;
`employing the function: F(An)=Sum (a.
`*N(W‘1“7) modulo
`264 for i=0 to w—1, for an array Astarting at index position
`n with a window size of w, where at. is an element of A, to
`implement the function F.
`9. The method of claim 8 including the further step of, if
`the window is advanced in A, computing a new position
`sensitive digital signature as: F(A =1)=N*F(An)—an*NW‘1+
`amw module 264.
`10. The method of claim 6 including the further step of
`employing a cyclic redundancy check to implement the
`function F.
`
`11. A method for providing an incremental backup of a
`first memory in a second memory wherein a set of different
`files have previously been stored in the second memory,
`comprising the steps of:
`selecting a file from the first memory for examination;
`generating a signature from a portion of the file defined by
`a sliding indexed window;
`comparing the generated signature with signatures gen-
`erated from the set of previously stored files having
`different filenames;
`determining the closest matching previously stored file,
`relative to the file under examination, where the stored
`file and the file under examination have different
`
`filenames, by identifying at least some portions of the
`stored file and the file under examination which are
`
`different, wherein said determining comprises applying
`a function F to the data within said sliding indexed
`window to generate an incremental position sensitive
`digital signature, wherein said function of F comprises
`input from data leaving said sliding indexed window,
`data entering said sliding indexed window, and a pre-
`viously generated digital signature; and
`storing the portions of the file under examination identi-
`fied as being different from the closest matching file in
`the second memory.
`12. The method of claim 11 including the further step of
`creating a match count table with a row corresponding to
`each respective previously stored file.
`13. The method of claim 12 including the further step of
`comparing the generated signature with signatures generated
`for N blocks of the previously stored files, where N is a
`predetermined integer.
`14. A method for determining a minimum set of data
`compression units for restoring a file from a base copy and
`a plurality of revision elements, comprising the steps of:
`selecting the base copy and revision elements required to
`build the file;
`sorting the selected base copy and revision elements into
`a list with the most recently generated revision element
`at the list head and the base copy at the list tail;
`
`000010
`
`000010
`
`

`
`6JIH,507
`
`11
`reading information for each selected revision element
`into an array with five columns: chunk, operation, data
`offset, data length, and target offset;
`creating an output array indicative of the reconstructed
`file with five columns: revision element pointer, chunk,
`data offset, data length, and target offset;
`calling a recursive function for the most recent revision,
`requesting data offset 0 and the final file length, and
`passing in target offset 0, the recursive function iterat-
`ing through the array columns and comparing the
`requested data offset and length with the target of offset
`and data length of each item, and in the case of a match,
`writing an entry into the output array if the array item
`is a data operation;
`optimizing, via a block filter, an optimal set of array
`elements wherein elements not affecting said output
`array are ignored; and
`sorting the output array by revision element and chunk;
`and
`
`12
`transmitt

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket