throbber
(12) United States Patent
`Balcha et al.
`
`111111111111111111111111111111111111111111111111111111111111111111111111111
`US006233589Bl
`US 6,233,589 Bl
`May 15,2001
`
`(10) Patent No.:
`(45) Date of Patent:
`
`(54) METHOD AND SYSTEM FOR REFLECTING
`DIFFERENCES BETWEEN TWO FILES
`
`(75)
`
`Inventors: Muralidhar Balcha, Shrewsbury, MA
`(US); Chandan Kudige, Karnataka
`(IN)
`
`(73) Assignee: Novell, Inc., Provo, UT (US)
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 09/127,523
`
`(22) Filed:
`
`Jul. 31, 1998
`
`(51)
`Int. Cl? ...................................................... G06F 17/30
`(52) U.S. Cl. .............................................................. 707/203
`(58) Field of Search .............................. 707/203-10, 201,
`707/204, 217, 2, 6, 8, 101, 202, 205
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,479,654
`5,574,906
`5,634,052
`5,778,395 *
`5,919,247 *
`6,052,531 *
`
`12/1995
`11/1996
`5/1997
`7/1998
`7/1999
`4/2000
`
`Squibb .. ... ... .... ... ... ... ... ... .... .. 707/201
`Morris ...................................... 707/1
`Morris ...................................... 707/1
`Whiting et a!. ... ... ... .... ... ... ... 707/204
`Van Hoff et a!. .................... 709/217
`Waldin, Jr. eta!. ................... 707/10
`
`OTHER PUBLICATIONS
`
`ICS 161: Design and Analysis of Algorithms Lecture Notes
`for Feb. 19, 1996, "Longest Common Sequences", World
`Wide Web publication.
`"Longest Common Subsequence Problem", Pruhs, World
`Wide Web publication, Oct. 6, 1997.
`"Longest Common Subsequence", World Wide Web publi(cid:173)
`cation, 2 pp., author and publication date unknown.
`"Executive Summary", World Wide Web publication, 5 pp.,
`author and publication date unknown.
`* cited by examiner
`Primary Examiner---Hosain T. Alam
`Assistant Examiner---Sanjiv Shah
`(74) Attorney, Agent, or Firm---Dinsmore & Shohl LLP
`
`(57)
`
`ABSTRACT
`
`A method and system for reflecting differences between two
`files. The method includes generating a base signature file
`having a plurality of base bit patterns, each base bit pattern
`being generated as a function of a portion of data in a first
`file. A second file containing a plurality of revised bit
`patterns is generated from a second file. Each revised bit
`pattern is compared to and matches at least one of the base
`bit patterns. A delta file reflecting the differences between a
`first file and the second file based on the base signature file,
`the delta signature file, and the second file is created.
`
`20 Claims, 4 Drawing Sheets
`
`72
`
`92
`
`SAP Exhibit 1003, Page 1 of 13
`
`

`

`U.S. Patent
`
`May 15,2001
`
`Sheet 1 of 4
`
`US 6,233,589 Bl
`
`22
`
`LAN
`SERVER
`
`24
`
`LAN
`SERVER
`
`20
`
`21
`
`28
`
`27
`
`FIG. 1
`
`32
`
`32
`
`32
`
`32
`
`32
`
`34
`
`FIG. 2
`
`SAP Exhibit 1003, Page 2 of 13
`
`

`

`U.S. Patent
`
`May 15,2001
`
`Sheet 2 of 4
`
`US 6,233,589 Bl
`
`FIG. 3
`
`50
`
`52
`
`OPEN BASE FILE.
`INITIALIZE BASE
`SIGNAT~E FILE.
`
`56
`
`GET THE BLOCK SIZE
`FROM BACKUP SERVER
`
`58
`
`60
`
`READ NEXT BLOCK
`IN THE FILE
`
`ADD CRC TO THE
`BASE SIGNATURE
`
`62
`
`FIG. 4
`
`NO
`
`CALCULATE CRC
`OF THE BLOCK
`
`64
`
`68
`
`SAP Exhibit 1003, Page 3 of 13
`
`

`

`U.S. Patent
`
`May 15,2001
`
`Sheet 3 of 4
`
`US 6,233,589 Bl
`
`72
`
`OPEN REVISED FILE
`GET THE BLOCK SIZE FROM THE BASE
`SIGNATURE FILE. INITIALIZE REVISED SIGN FILE.
`
`94
`
`ADD CRC TO REVISED
`SIGNA T~E FILE
`
`78
`
`YES
`
`86
`
`REMOVE MSB FROM THE CRC
`COMPUTATION
`
`92
`
`ADD DISPLACEMENT OF THE BLOCK
`TO REVISED SIGNAT~E FILE
`
`ADD NEXT BYTE TO THE CRC
`COMPUTATION
`DISPLACEMENT ++
`
`88
`
`YES
`
`NO
`
`FIG. 5
`
`SAP Exhibit 1003, Page 4 of 13
`
`

`

`U.S. Patent
`
`May 15,2001
`
`Sheet 4 of 4
`
`US 6,233,589 Bl
`
`L1 : LIST BASE SIGNATURE CRCS
`~ : LIST OF REVISED SIGNATURE
`CRCS
`E : FIRST ELEMENT IN L 1•
`E2; FIRST ELEMENT IN L2,
`I=J=O; E l1 =E21 =NJLL
`
`100
`
`FIG. 6
`
`E 1 = NEXT(E 1 )
`Er NEXT(E2J
`1++, Jt+; E 11 = E 11 E 21=E 2
`
`108
`
`INS( S, BASE(E 1 ), S[BASE(NEXT(E 2pl,
`DISPLACEMENT(E2)]) TO DELTA FILE
`
`112
`
`DEL(S, BASE(NEXT(E 11ll, BASE (E 1)
`-BASE(NEXT(E 11))) TO DELTA FILE
`
`116
`
`DEL(S, BASE(NEXT(E 11)), BASE(E 1J
`-BASE(NEXT(E 11 ))) TO DELTA FILE.
`INS(S, BASE(NEXT(E nll,
`S(BASE(NEXT(E 21l,
`BASE(E2 )-BASE(NEXT(E21)]) TO DELTA FILE
`
`I++;
`E 1 =NEXT(E 1 )
`
`120
`
`124
`
`Jt+;
`E2=NEXT(E2l
`
`SAP Exhibit 1003, Page 5 of 13
`
`

`

`US 6,233,589 Bl
`
`1
`METHOD AND SYSTEM FOR REFLECTING
`DIFFERENCES BETWEEN TWO FILES
`
`FIELD OF THE INVENTION
`
`The present invention relates generally to backup and
`synchronization of files, and in particular relates to a method
`and system for reflecting differences between two files.
`
`BACKGROUND OF THE INVENTION
`Copies of files are frequently transmitted over a network 10
`from one computer to another computer. One reason to copy
`a file is for backup purposes. If a file created on one
`computer has been backed up on another computer, it can be
`easily recovered in the event the hard drive of the first
`computer fails. Because the loss of a file could mean the loss
`of extremely important data, and/or result in significant
`effort to recreate the file, file backup processes are very
`common. However, file backup has at least two problems
`associated with it: first, it can require significant network
`bandwidth to transfer file data from the first computer to the
`backup computer, and second, it can require significant
`storage space to maintain copies of files. Both of these
`problems can be alleviated to some extent through the use of
`an incremental backup. An incremental backup copies only
`those files that have been changed since the previous
`backup. Incremental backups can significantly reduce the
`number of files that are backed up on a periodic basis.
`Typically, when a file is modified, only a small portion of
`the file is actually changed from the previous version. While 30
`an incremental backup can reduce network bandwidth and
`save storage space compared to a complete backup, it is still
`inefficient in that a complete file is backed up even though
`it is possible that only a small portion of the file was actually
`modified. In an attempt to improve upon incremental
`backups, backup processes exist that identify the differences
`between two versions of a file, and attempt to backup only
`those differences. This is referred to as a differencing pro(cid:173)
`cess. Differencing processes can reduce network bandwidth
`and storage requirements because only portions of the file 40
`are backed up.
`Copies of files are also frequently made for purposes of
`synchronization or replication. A synchronized file exists in
`two different locations, such as on two different servers, and
`changes made to one file must be reflected in the other file. 45
`Synchronization usually occurs by periodically copying the
`file from one location to the other location.
`U.S. Pat. No. 5,634,052 discloses a system for reducing
`storage requirements in a backup subsystem. The system
`includes creating a delta file reflecting the differences 50
`between a base file and a modified version of the base file,
`and transmitting the delta file to a server for backup pur(cid:173)
`poses. One problem associated with this system is that the
`base file is necessary to create the delta file that reflects the
`differences between the base file and the revised file. Thus,
`if the delta file is to be created on another computer, such as
`the server, the base file must first be transmitted to the server
`where the differencing operation is carried out. Moreover,
`the '052 patent does not disclose optimal mechanisms for
`creating the delta file.
`In a differencing backup system, the differencing mecha(cid:173)
`nism used to create the delta file can be quite important. It
`is not uncommon for files to be many megabytes in size. A
`differencing mechanism that processes a file multiple times,
`or processes a file in an inefficient manner can result in 65
`excessive backup times. Moreover, an inefficient differenc(cid:173)
`ing mechanism can result in more data being backed up than
`
`2
`necessary. In other words, two differencing mechanisms can
`vary in their ability to efficiently recognize and reflect
`differences between two files. Also, it would be preferable
`for a differencing mechanism to be able to determine dif(cid:173)
`ferences between a base file and a modified version of the
`base file without actually having to repeatedly process the
`base file, so that the differencing operation can be performed
`on a remote computer, without the need to process the entire
`base file.
`U.S. Pat. No. 5,574,906 discloses a system and method
`for reducing storage requirements in backup subsystems.
`The '906 patent discloses a system similar to that disclosed
`in the '052 patent, with the enhancement that the base file
`from which the differencing operation is derived can be
`15 compressed. In certain files, a compressed base file will
`utilize less bandwidth and less storage space on a computer
`than would an uncompressed based file. One problem with
`this approach is that the compressibility of files differs
`greatly. While compression can significantly reduce the size
`20 of some files, compression algorithms do not obtain signifi(cid:173)
`cant reduction with other type of files. Additionally, the
`differencing mechanism of the '906 patent works by first
`compressing the revised version of the file, and upon deter(cid:173)
`mining that compressed portions of the base file and the
`25 revised file differ, both the base file and the revised file are
`uncompressed at those locations so that the differences
`between the two files can be determined. The overhead
`involved in such compression/decompression algorithms
`can be significant.
`U.S. Pat. No. 5,479,654 discloses an apparatus and
`method for generating a set of representations of the changes
`made in a computer file during a period of time. The process
`disclosed in the '654 patent makes multiple passes through
`portions of the most recent version of the file to determine
`35 the differences between it and the previous version of the
`file.
`Thus, it is apparent that a differencing system that reduces
`network traffic, efficiently determines and reflects differ(cid:173)
`ences between two files quickly, and reduces storage
`requirements would be highly desirable.
`
`SUMMARY OF 1HE INVENTION
`It is one object of the present invention to provide a
`method and system for determining the differences between
`a base file and a modified file without the need for a copy of
`the base file.
`It is another object of the present invention to provide a
`method and system for reflecting the differences between
`two files that is highly efficient and reduces network traffic.
`It is yet another object of the present invention to provide
`a method and system that can determine the differences
`between a base file and a modified file either on a client
`computer or a server computer without a need for a copy of
`55 the base file.
`It is still another object of the present invention to provide
`a method and system for creating a signature of a base file
`that can be used to determine the differences between a base
`file and a modified file.
`Additional objects, advantages and other novel features of
`the invention will be set forth in part in the description that
`follows and, in part, will become apparent to those skilled in
`the art upon examination of the invention. To achieve the
`foregoing and other objects, and in accordance with the
`purposes of the present invention as described above, a
`method and system for reflecting differences between two
`versions of a file is provided. The method includes
`
`60
`
`SAP Exhibit 1003, Page 6 of 13
`
`

`

`US 6,233,589 Bl
`
`4
`The present invention reduces the amount of data neces(cid:173)
`sary to reflect differences between two files, reducing both
`network utilization and storage requirements. The present
`invention is useful not only in backup systems, but also in
`replication and synchronization systems. Moreover, once a
`base signature file has been generated, a delta file reflecting
`differences between the base file and a subsequent version of
`the base file can be generated without the base file. Still other
`objects of the present invention will become apparent to
`10 those skilled in this art from the following description
`wherein there is shown and described preferred embodi(cid:173)
`ments of this invention. As will be realized, the invention is
`capable of other different obvious aspects all without depart(cid:173)
`ing from the invention. Accordingly, the drawings and
`15 description will be regarded as illustrative in nature and not
`as restrictive.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`25
`
`3
`generating, from a base file, a base signature file that
`includes a plurality of base bit patterns. Each bit pattern is
`generated as a function of a portion of data in the base file.
`A revised version of the base file is created. A revised
`signature file, including a plurality of revised bit patterns, is
`generated from the revised file. Each revised bit pattern
`matches at least one of the base bit patterns. Based on
`differences between the base signature file and the revised
`signature file, the revised file is accessed and a delta file
`reflecting the differences between the base file and the
`revised file is generated.
`Once the base signature file is generated from the base
`file, the base file need not be accessed to generate the delta
`file reflecting the differences between the base file and the
`revised file. Thus, according to the present invention, the
`base signature file, which is a small fraction in size of the
`original base file, can be transmitted over a network to
`another computer, such as a server, and the differencing
`operation to generate the delta file can be carried out on the
`server, without the need to transmit the original base file to
`the server.
`According to one embodiment of the invention, the base
`signature file is generated by reading fame size blocks of
`data from the base file, and for each block of data, calcu(cid:173)
`lating and generating a bit pattern, such as a cyclic redun(cid:173)
`dancy check (CRC) value that is stored in the base signature
`file. After the base file is processed, a base signature file
`exists containing a plurality of CRC values derived from the
`data contained in the base file. A modified, or revised version
`of the base file is made, and a revised signature file is created
`by reading frame size blocks of data from the revised file and 30
`generating a revised CRC value for each data block. The
`revised CRC value is compared to the CRC values in the
`base file. If the revised CRC value matches a CRC value in
`the base signature file, the revised CRC value is stored in the
`revised signature file. A match is an indication that the block 35
`of data matches a block of data in the base file. If no CRC
`values in the base signature file match the revised CRC bit
`pattern, a new block of data offset one byte from the
`previous block of data is read. This process is repeated until
`a revised CRC bit pattern is generated that matches a bit
`pattern from the base signature file. The revised bit pattern
`is then saved to the revised signature file, along with the
`offset of the block of data from the beginning of the revised
`file, and the next block of data is read from the revised file,
`this block of data beginning at the end of the previous block 45
`of data. At the end of the process, a revised signature file
`exists with CRC values that match CRC values in the base
`signature file, and offsets indicating the location in the
`revised file of the matching blocks of data.
`The base signature file and revised signature file are then 50
`processed, and based on the differences between the two
`files, and the offsets in the revised signature file, the revised
`file is accessed and a delta file is created reflecting the
`differences between the base file and the revised file. The
`delta file preferably contains primitives, such as insert,
`modify, and delete primitives, which are commands that can
`be applied to a previous version of the file to generate the
`revised file.
`According to one embodiment of the present invention,
`the revised signature file and the base signature file are
`processed in the order of the longest common subsequence
`of revised bit patterns with respect to the base bit patterns in
`the base signature file. Determining differences between the
`base file and the revised file based on the longest common
`subsequence of bit patterns ensures a minimal number of
`primitives are used in the delta file to reflect the differences
`between the base file and the revised file.
`
`The accompanying drawings incorporated in and forming
`20 a part of the specification, illustrate several aspects of the
`present invention, and together with the description serve to
`explain the principals of the invention. In the drawings:
`FIG. 1 is a block diagram illustrating an environment in
`which the present invention has application;
`FIG. 2 is a block diagram illustrating another environment
`in which the present invention has application;
`FIG. 3 is a schematic diagram illustrating the relationship
`between various files and a delta file according to one
`embodiment of the present invention;
`FIG. 4 is a flow diagram illustrating a process for gener(cid:173)
`ating a base signature file according to one embodiment of
`the present invention;
`FIG. 5 is a flow diagram illustrating a process for gener(cid:173)
`ating a delta signature file according to one embodiment of
`the present invention; and
`FIG. 6 is a flow diagram illustrating a process for gener(cid:173)
`ating a delta file according to one embodiment of the present
`40 invention.
`Reference will now be made in detail to present preferred
`embodiments of the invention, examples of which are illus(cid:173)
`trated in the accompanying drawings, wherein like numerals
`indicate the same elements throughout the views.
`
`DETAILED DESCRIPTION OF PREFERRED
`EMBODIMENTS
`
`The present invention is directed to a differencing mecha(cid:173)
`nism that quickly and efficiently determines the differences
`between two files, and generates a delta file reflecting those
`differences. Referring to FIG. 1, an environment using file
`synchronization or file replication is shown. Each of servers
`22 and 24 maintain a copy of a base file 21, 27, respectively,
`and are interconnected via a network 26. Base files 21, 27
`55 should be identical to one another and thus changes made to
`one copy should eventually be reflected in the other copy. A
`base signature file, as described in more detail herein, is
`generated on one of the servers 22, 24 and copied to the
`other server. The base signature file is maintained on server
`60 22 as file 20, and maintained on server 24 as file 28. Either
`base file 21 or 27 can be modified at either server. Upon
`detection of a modification to the file, the detecting server,
`for example server 22, uses the respective base signature
`file, for example, base signature file 20, to generate a new
`65 delta file, and communicates the delta file over network 26
`to server 24. Server 24 then uses the delta file to update the
`base file 27, and recalculates the base signature file 28. In
`
`SAP Exhibit 1003, Page 7 of 13
`
`

`

`US 6,233,589 Bl
`
`5
`this manner, base files 21, 27 on servers 22 and 24 can stay
`in synchronization with minimal transfer of data over net(cid:173)
`work 26.
`FIG. 2 is a block diagram illustrating another environment
`in which the present invention has application. A plurality of
`client computers 32 are connected over a network 26 to a
`backup server 30. Backup server 30 is in communication
`with a storage medium 34 for storage of files backed up from
`computers 32. Each new file created on computers 32 is
`preferably communicated to backup server 30 and backed up 10
`on storage medium 34. For each new file generated, a base
`signature file is created. The base signature file can be
`generated either on computer 32 or backup server 30. A user
`of computer 32 can modify the base file and generate a
`revised file. The differencing mechanism of the current 15
`invention then processes the base signature file and the
`revised file to generate a revised signature file. The revised
`signature file and base signature file in conjunction with the
`revised file are used to create a delta file, which can then be
`communicated over network 26 to backup server 30, and 20
`stored on storage medium 34.
`Multiple versions of delta files can be maintained so that
`any particular version of a file can be restored. To accom(cid:173)
`plish this, a new base signature file is generated from the
`revised file. In essence, the revised file becomes the base file 25
`for the next version of the revised file. If the base signature
`files are being maintained on backup server 30, and only the
`delta file is communicated from computers 32 to backup
`server 30, a new base signature file can be created on backup
`server 30 by first applying the delta file received from 30
`computer 32 to the previous base file to create the revised
`file. (This revised file becomes the base file for use with the
`next received delta file.) The base signature file can then be
`~:~:~a:~i~~~::~e0r:v;~:p:~:~ I~~~et~:s:e~g~:~~r:i~~:t~~: 35
`file can be generated from the actual revised file that was
`used in creating the delta file. In this manner, multiple delta
`files can be maintained to allow the recreation of any version
`of the base file.
`If network bandwidth consumption is not a problem, and
`client processing power is limited, the revised file can be
`communicated to backup server 30, and the delta file can be
`created on backup server 30. Alternatively, the base signa(cid:173)
`ture file can be stored by backup server 30 on storage
`medium 34, and then communicated to computer 32 upon its
`request. Computer 32 can then use the base signature file to
`create the delta file, and transmit the delta file to backup
`server 30.
`Certain terminology will be used to describe the invention 50
`herein. The term 'stream' will be used to refer to contiguous
`bytes of data stored in a file. The phrase 'delta' will be used
`to refer to differences between two files.
`Deltas
`A versioned sequence of a file can be represented as 55
`follows:
`
`45
`
`40
`
`6
`Any version of the file can be derived from the base
`version of the file F and the given deltas between the base
`version and the desired version.
`R1 generates the required version of the file from a given
`base file and a forward delta.
`Rb generates the required version of the file from a given
`base file and a backward delta.
`IfF; represents the i'h version of file F and d; represents the
`forward delta then
`
`F,~R/.F, d,)
`
`If F1 represents the j'h version of the file F and d1 represents
`the backward delta then
`
`Streams
`Assume S is a stream of bytes. If the function len(S)
`returns the length of stream S, then the following conven(cid:173)
`tions hold true:
`i. S[i] where 1~i~len(S) represents a single byte at
`position i within the stream S.
`ii. S[ a, b] where 1 ~a, b ~ len(S) represents a substream of
`S starting at position a and the next b-1 bytes
`iii. S[1, len(S)]=S
`iv. S[ n, #]=S[ n, len(S)]
`v. if S1[i]=S2[i] Vi and len(S 1)=1en(S2 ) then S1 and S2 are
`same.
`vi. S1 +S 2 concatenates S2 to S1 .
`Blocks
`Each stream is divided into blocks of fixed length such
`that following conventions hold true:
`i. BS is a fixed integer, used to represent the block size in
`the streamS
`ii. Given a streamS, N(S) represents the number of blocks
`in the stream S such that
`
`N(SHlen(S)/BSl
`
`iii. Bn(S) represents n'h block in the stream S such that,
`
`Bn(S)~S[BS*n+1, BS*(n+1)], O~n~N(S)-1
`
`Signature of Stream
`A signature is calculated with each stream of data. The
`signature of the stream enables the identification of changed
`blocks in the stream. The signature of the stream is described
`as follows:
`i. CRC(Bn(S)) represents the CRC of the block Bn of the
`streamS.
`ii. CRC(A)=CRC(B) iff A=B, where A and B are two
`blocks in the stream S (CRC(A)=CRC of block A).
`iii. Sig(S) represents the signature of the stream S such
`that
`
`Sig(S)~{CRC(B 1 (S)), CRC(B 2(S)), ... , CRC(BN(s/S))}
`
`where F is the base version of the file and d1 , d2 , . . . , dm+n
`are deltas of file F. Deltas d1 , d2 , . . . , dm represent reverse 60
`deltas and deltas dm+l' dm+ 2 , . . . , dm+n represent forward
`deltas with respect to file F.
`Given two versions of file F, F; and Fp a diff function is
`defined such that
`
`d~diff(F,, F)
`
`where d is the delta changes between F; and F1.
`
`If C; is the CRC of a block i in a stream S then
`i. Base( C;) is the starting position of the block within the
`streamS
`ii. Next(C;) returns the CRC of the next block in the
`signature Sig(S)
`The present invention generates a delta file reflecting the
`65 differences between a first and second file. The delta file
`preferably comprises primitives, such as insert, modify and
`delete commands, which reflect the operations to apply to
`
`SAP Exhibit 1003, Page 8 of 13
`
`

`

`US 6,233,589 Bl
`
`7
`the first file to derive the second file. According to one
`embodiment of this invention, three primitives are used.
`Each primitive is defined as a function that returns the
`resultant strem(S*) after applying the operation to the base
`streamS.
`Insertion primitive:
`S*=ins(S, i, S')=S[l, i]+S'+S[i, #]where S' represents the
`stream that was inserted within the stream S.
`Deletion primitive:
`S*=del(S, i, len)=S[l, i]+S[i+len, #]
`Modification primitive:
`S*=mod(S, i, S')=S[l, i]+S'+S[i+len(S'), #]where S' rep(cid:173)
`resents modification to the substream S[i, len(S')]
`A sequence of primitive operations on a stream can be
`represented as P 1 ,2 , . . . ,n(S) such that
`
`P 1,2, ... ,n(S)~Pn(Pn_ 1 ( . . . (P1 (S)))) where P,E[ins, del, mod]
`1~i~n
`
`Inverse Primitive:
`Give S, S* , P such that:
`
`we define inverse primitive P- such that
`
`S~P-(S*)
`
`If P~ins(S,i,S') then P-~del(S*, i, len(S'))
`
`If P~del(S,i,len) then P-~ins(S*, i, S[i+1, len])
`
`If P~mod(S, i,S') then P-~mod(S*, i, S')
`
`If P~Pn(Pn_ 1 ( . . . (P 1 (S)))) then
`
`P-~P-,(P-2( ... (P-n(S))))
`
`If P represents the forward delta for the base stream to
`generate the current version of the stream, then P- repre(cid:173)
`sents the backward delta for the current version of the stream
`to generate the base version of the stream.
`In general, one aspect of the present invention relates to
`solving the following problem: given two streams s1 and s2
`find P 1 2
`n such that S2 =P 1 2
`n (S 1) using Sig(S 1) and
`S2.
`, ... ,
`, ... ,
`
`With this notational background, a high-level schematic
`of the present invention can be described with reference to
`FIG. 3. A generator 40 processes a base file 38 and generates
`a base signature file 42. Base signature file 42 comprises a
`plurality of bit patterns, each of which is derived as a
`function of a block of data from base file 38. Base file 44 is
`then modified, or revised, by a user and a revised file 44 is
`created. Generator 46 processes the revised file 44, gener(cid:173)
`ating bit patterns as a function of blocks of data from revised
`file 44. To distinguish these bit patterns from those contained
`in base signature file 42, the bit patterns generated as a
`function of data from revised file 44 will be referred to as
`revised bit patterns, and those contained in base signature
`file 42 will be referred to as base bit patterns. Each revised
`bit pattern is compared to the base bit patterns in base
`signature file 42. For each revised bit pattern that matches a
`base bit pattern in base signature file 42, it is stored in
`revised signature file 48, along with an offset indicating the
`location in revised file 44 of the beginning of the block of
`data represented by the revised bit pattern. In this manner,
`generator 46 creates a revised signature file 48 containing a
`plurality of revised bit patterns. To generate a delta file 52
`reflecting the differences between base file 38 and revised
`file 44, a generator 50 processes the base signature file 42
`
`8
`and revised signature file 48, preferably in the longest
`common subsequence order of revised bit patterns. Based on
`comparisons between base signature file 42 and revised
`signature file 48, and using the offsets contained in revised
`signature file 48 and the data in revised file 44, the delta file
`52 is generated. Revised file 44 can then be treated as a new
`base file, and a new base signature file can be generated from
`revised file 44. Alternately, the next revision of revised file
`44 can be processed against the original base signature file
`10 42, and a delta file can be generated that reflects the
`differences between original base file 38 and the newest
`version of the revised file. The method and system of the
`present invention eliminate the need to retain the base file for
`purposes of generating a delta file after generation of the
`15 base signature file. Because the base signature file comprises
`bit patterns which are significantly smaller in size than the
`blocks of data which they represent, the base signature file
`42 is a small fraction of the size of the original base file 38.
`Preferably, the bit patterns used in the present invention
`20 are cyclic redundancy code ("CRC") patterns. A CRC is
`derived from a block of data and the calculation and use of
`CRCs (for the purpose of detecting errors in data
`transmission) is well known to those skilled in the art, and
`will not be discussed herein in detail. The block size chosen
`25 can be arbitrary, but must be the same for the base file and
`for subsequent revisions of the base file. The block size
`should not be so small that the process loses efficiency, and
`not so large that large blocks of data are backed up from
`minor modifications. For example, for a large file on the
`30 order of about one gigabyte, a suitable block size would be
`about 64 Kb. Assuming that the CRC polynomial is 32 bytes
`long, the size of the signature file would be approximately
`128 Kb. Thus, the signature file is significantly smaller than
`the data file on which it is based. Thus, a trade off exists
`35 between the size of the signature file and the block size so
`that effective performance is achieved.
`A CRC algorithm treats a message, or block of data, as a
`single binary number and divides it by another fixed binary
`number. The remainder of this division is known as the
`40 CRC. The fixed binary number is normally called the
`polynomial of the CRC. While a normal CRC algorithm
`could be applied to the present invention, performance may
`not be satisfactory. Preferably, a table-driven CRC algorithm
`is used so that a mere shift, OR, XOR and a table lookup per
`45 byte can be used to derive the CRC.
`The use of a CRC has several advantages. First, a CRC
`can serve as a unique identity that represents the contents of
`a logical block of data. When the contents are changed, the
`CRC value should change. Ideally, a CRC algorithm has
`50 polynomial time complexity and very little space complex(cid:173)
`ity. In other words, if the size of the file is N, the time
`required by the CRC algorithm to perform the calculation is
`K*N (where K is a constant), and has relatively small
`memory requirements. The CRC value itself should con-
`55 sume very little space relative to the amount of data that it
`represents. Given a stream of data where the logical blocks
`of data represented by a set of function outputs is embedded,
`it should be possible to determine the boundaries of the
`logical blocks represented by these outputs in polynomial
`60 time in a single pass of the file. A table-driven CRC is
`extremely fast and does not depend on the size of the
`polynomial. The purpose of one aspect of the present
`invention is to back up portions of files that have changed
`with respect to a previous version of the file. When data is
`65 inserted or deleted in a file, the position of blocks of data in
`the revised file differs from the same blocks of data in the
`previous version of the file. The differencing mechanism of
`
`SAP Exhibit 1003, Page 9 of 13
`
`

`

`US 6,233,589 Bl
`
`9
`the present invention determines the locations of identical
`blocks of data in two versions of the file. The following
`illustration will be useful in discussing the differencing
`mechanism according to one embodiment of the invention.
`
`~
`
`s
`
`I AlB I
`
`Unmodified streamS
`
`I A [~J I
`
`Modified stream S'
`
`~
`
`10
`the data have been changed. This pr

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket