throbber

`UNIFIED PATENTS
`
`EXHIBIT 1003
`
`Unfied Patents Exhibit 1003
`Pg. 1
`
`

`

`(12) United States Patent
`(10) Patent N0.:
`US 6,233,589 B1
`
`Balcha et al.
`(45) Date of Patent:
`May 15, 2001
`
`USOO6233589B1
`
`(54) METHOD AND SYSTEM FOR REFLECTING
`7
`DIFFERENCES BETWEEN T“ O FILES
`Inventors: Muralidhar Balcha, Shrewsbury, MA
`(US); Chandan Kudige, Karnalaka
`(IN)
`
`(75)
`
`(73) Assignee: Novell, Inc., Provo, UT (us)
`
`( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U’S‘C’ 154(1)) by 0 days.
`
`(21) Appl. No.: 09/127,523
`.
`Jul. 31’ 1998
`Flled3
`(22)
`
`Int. Cl? ...................................................... G06F 17/30
`(1)
`(52) U.S.C1.
`..................
`707/203
`
`(
`) Field of Search
`707/203—10, 201,
`707/204, 217, 2, 6, 8, 101, 202, 205
`
`(56)
`‘
`’
`
`References Cited
`U.S. PATENT DOCUMENTS
`12/1995 Squibb ................................ 707/201
`11/1996 Morris
`
`. 707/1
`5/1997 Morris ..............................
`707/1
`
`707/204
`7/1998 Whiting etal.
`
`709/217
`.
`7/1999 Van Hoff et a1.
`4/2000 Waldin, Jr. et a1.
`...... 707/10
`
`55479554
`5,574.906
`5,634,052
`5,778,395 *
`5,919,247 *
`6,052,531
`
`OTHER PUBLICATIONS
`ICS 161: Design and Analysis of Algorithms Lecture Notes
`for Feb. 19, 1996, “Longest Common Sequences”, World
`Wide Web PUthfl‘im
`.,
`Longest Common Subsequence Problem , Pruhs, World
`Wide Web publication, Oct. 6, 1997,
`“Longest Common Subsequence”, World Wide Web publi—
`cation, 2 pp., author and publlcatlon date unknown,
`“EXCCUHVC Summary”, WOFId Wide WGb publication, 5 1113-,
`author and publication date UnkIIOWH.
`* Cited bV examiner
`
`Primary Examiner—Hosain T. Alam
`Assistant ExamineriSanjiv Shah
`(74) Attorney, Agent, or Firm—Dinsmore & Shohl LLP
`(57)
`ABSTRACT
`
`Amethod and system for reflecting differences between two
`files. The method includes generating a base signature file
`having a plurality of base bit patterns, each base bit pattern
`
`being generated as a function of a portion of data in a first
`file. A second file contaimng a plurallty of reV1sed bit
`patterns is generated from a second file. Each revised bit
`pattern IS compared to and matches at least one of the base
`bit patterns. A delta file reflecting the differences between a
`first file and the second file based on the base signature file,
`the delta signature file, and the second file is created.
`
`20 Claims, 4 Drawing Sheets
`
`72 ’
`
`94
`ADD CRC T0 REVISED
`
`STGNATJQE FTLE
`
`
`OPEN REVISED TILE
`{{T THE BLOCK SZE FROM THE EASE
`
`_
`SWTURE FIL; iNiTIALIZE REVTSED Sim FILE
`
`
`READ PEXT BLOCK
`
`
`
`
`
`CAI (llLATE CRC 0E TtE BLOCK
`
`
`YES
`
`ANV CFC TN BAE
`STCMTURE FTLE?
`
`
`
`
`
` DOES (RC MATCH
`
`
`
`
`92
`
`_
`56 J
`
`RED/[M v53 mom THE are
`COMPUTATION
`
`1
`ADD NEX' BYTE T0 at 010
`commnow
`mspncmrm H
`
`r55
`
`
`
`
`
`
`ADD DTSPLACFMENT OF THE BLOCK
`TO REVISED STGNATLRE FTLE
`
`
`
`DOES CRO MATCH
`
`
`13M CRC IN BASE
`SIGNAllRE FILE?
`
`
`
`Unfied Patents Exhibit 1003
`
`Pg. 2
`
`Unfied Patents Exhibit 1003
`Pg. 2
`
`

`

`US. Patent
`
`May 15, 2001
`
`Sheet 1 0f 4
`
`US 6,233,589 B1
`
`22
`
`24-
`
`LAN
`
`SERVER
`
`LAN
`
`SERVER
`
`FIG.1
`
`
`
`FIG. 2
`
`Unfied Patents Exhibit 1003
`
`Pg. 3
`
`Unfied Patents Exhibit 1003
`Pg. 3
`
`

`

`US. Patent
`
`May 15, 2001
`
`Sheet 2 0f 4
`
`US 6,233,589 B1
`
`58
`
`42
`
`FILE
`
`
`BASE
` SIGNATURE
`GENERATOR
`
`
` REVISED
`SIGNATLRE
`FILE
`
`
`
`OPEN BASE FILE.
`
`
`
`INITIALIZE BASE
`SIGNATLRE FILE.
`
`56
`
`
`
`
`
`
`
`60
`
`FIG. 4
`
`ADD CRC TO THE
`BASE SIGNATURE
`
`62
`
`
`
`CALCULATE CRC
`OF THE BLOCK
`
`Unfied Patents Exhibit 1003
`
`Pg. 4
`
`Unfied Patents Exhibit 1003
`Pg. 4
`
`

`

`US. Patent
`
`May 15, 2001
`
`Sheet 3 0f 4
`
`US 6,233,589 B1
`
`72
`
`OPEN REVISED FILE
`GET THE BLOCK SIZE FROM THE BASE
`
`SIGNATURE FILE. INITIALIZE REVISED SIGN FILE
`
`READ NEXT BLOCK
`
`74
`
`76
`
`® 61D 78
`
`N0
`
`CALCULATE CRC OF THE BLOCK
`
`50
`
`_
`
`YES
`
`82
`DOES CRC MATCH
`
`ANY CRC |N BASE
`SIGNATURE FILE?
`
`NO
`
`DISPLACEMENT = 0
`
`54
`
`94
`
`ADD CRC To REVISED
`
`SIGNATURE FILE
`
`
`
`56
`
`92
`
`REMOVE MSB FROM THE CRC
`COMPUTATION
`
`
`
`ADD NEXT BYTE TO THE CRC
`COMPUTATION
`DISPLACEMENT ++
`
`<93
`
`ADD DISPLACEMENT OF THE BLOCK
`TO REVISED SIGNATLRE FILE
`
`YES
`
`90
`
`NO
`
`SIGNATURE FILE?
`
`
` ANY CRC IN BASE
`DOES CRC MATCH
`
`
`FIG. 5
`
`Unfied Patents Exhibit 1003
`
`Pg. 5
`
`Unfied Patents Exhibit 1003
`Pg. 5
`
`

`

`US. Patent
`
`May 15, 2001
`
`Sheet 4 0f 4
`
`US 6,233,589 B1
`
`
`
`
`
`L1 : LIST BASE SIGNATURE CRCS
`L2: LIST OF REVISED SIGNATURE
`CRCS
`E : FIRST ELEMENT IN L1,
`E2; FIRST ELEMENT IN L2,
`|=J=0;E =E
`=NJLL
`l]
`21
`
`
`102
`
`
`
`
`
`100
`
`F l
`
`6' 6
`
`@ I
`
`
`
`
`
`
`
`
`
`
`E1=NEXT(E1)
`
`E2= NEXT(E2)
`
`l++, J++; E 11: E 11E21=E 2
`
`
`
`
`
`
`
`
`
`104
`YES
`
`
`DOES ElAND E2MACTH?
`
`106
`YES
`
`
`
`
`110
`
`
`PREVIE I=E
`YES
`11
`838‘
`INSIS, BASE(E1), SIBASEINEXTIE 21)),
`
`
`PREWE2“: E 21
`DISPLACEMENT(E2)]) TO DELTA FILE
`
`
`
`N0
`
`
`112
`
`
`
`PREV1E2) = E21
`
`&&
`
`
`
`
`
`
`
`YES
`
`DELIS, BASE(NEXT(E11]). BASE (E 1)
`-BASE(NEXT(E11)]) TO DELTA FILE
`
`116
`
`114
`
`&&
`
`PREV(E2) I = E 21
`
`YES
`
`
`
`DELIS, BASE(NEXT(E11]), BASE(E1)
`
`
`
`-BASE(NEXT(E11))) TO DELTA TILE.
`
`
`INSIS, BASE(NEXT(E 11)).
`
`SIBASEINEXTIE 21),
`BASE(E2 I-BASEINEXTIE21m TO DELTA FILE
`
`
`
`
`115
`
`N0
`
`NO
`
`YES
`
`NO
`
`122
`
`LCSLEN[I+1, J]<
`LCSLENII, J+1]
`
`l++;
`E1 =NEXT(E1)
`
`120
`
`1 24
`
`J++;
`E2=NEXT(E2)
`
`Unfied Patents Exhibit 1003
`
`Pg. 6
`
`Unfied Patents Exhibit 1003
`Pg. 6
`
`

`

`US 6,233,589 B1
`
`1
`METHOD AND SYSTEM FOR REFLECTING
`DIFFERENCES BETWEEN TWO FILES
`
`FIELD OF THE INVENTION
`
`The present invention relates generally to backup and
`synchronization of files, and in particular relates to a method
`and system for reflecting differences between two files.
`BACKGROUND OF THE INVENTION
`
`Copies of files are frequently transmitted over a network
`from one computer to another computer. One reason to copy
`a file is for backup purposes. If a file created on one
`computer has been backed up on another computer, it can be
`easily recovered in the event the hard drive of the first
`computer fails. Because the loss of a file could mean the loss
`of extremely important data, and/or result
`in significant
`effort to recreate the file, file backup processes are very
`common. However, file backup has at least two problems
`associated with it: first, it can require significant network
`bandwidth to transfer file data from the first computer to the
`backup computer, and second,
`it can require significant
`storage space to maintain copies of files. Both of these
`problems can be alleviated to some extent through the use of
`an incremental backup. An incremental backup copies only
`those files that have been changed since the previous ’
`backup. Incremental backups can significantly reduce the
`number of files that are backed up on a periodic basis.
`Typically, when a file is modified, only a small portion of
`the file is actually changed from the previous version. While
`an incremental backup can reduce network bandwidth and
`save storage space compared to a complete backup, it is still
`inefficient in that a complete file is backed 11p even though
`it is possible that only a small portion of the file was actually
`modified.
`In an attempt
`to improve upon incremental
`backups, backup processes exist that identify the dillerences
`between two versions of a file, and attempt to backup only
`those dilIerences. This is referred to as a dillerencing pro—
`cess. Differencing processes can reduce network bandwidth
`and storage requirements because only portions of the file
`are backed up.
`Copies of files are also frequently made for purposes of
`synchronization or replication. A synchronized file exists in
`two dilferent locations, such as on two dilferent servers, and
`changes made to one file must be reflected in the other file.
`Synchronization usually occurs by periodically copying the
`file from one location to the other location.
`
`10
`
`o;O
`
`35
`
`40
`
`45
`
`U.S. Pat. No. 5,634,052 discloses a system for reducing
`
`storage requirements in a backup subsystem. The system
`‘erences _
`includes creating a delta file reflecting the di
`between a base file and a modified version of the base file,
`and transmitting the delta file to a server [or backup pur—
`poses. One problem associated with this system is that the
`base file is necessary to create the delta file that re ects the
`differences between the base file and the revised file. Thus,
`if the delta file is to be created on another computer, such as
`the server, the base file must first be transmitted to tie server
`where the differencing operation is carried out. Moreover,
`the ’052 patent does not disc ose optimal mechanisms for
`creating the delta file.
`In a differencing backup system, the differencing mecha—
`nism used to create the delta ile can be quite important. It
`is not uncommon for files to be many megabytes in size. A
`differencing mechanism that processes a file multiple times,
`
`
`or processes a file in an ine icient manner can result in
`excessive backup times. Moreover, an inellicient dillerenc-
`ing mechanism can result in more data being backed up than
`
`
`
`
`
`60
`
`
`
`2
`necessary. In other words, two dilferencing mechanisms can
`vary in their ability to efficiently recognize and reflect
`dillerences between two files. Also, it would be preferable
`for a differencing mechanism to be able to determine dif-
`ferences between a base file and a modified version of the
`base file without actually having to repeatedly process the
`base file, so that the differencing operation can be performed
`on a remote computer, without the need to process the entire
`base file.
`
`US, Pat. No. 5,574,906 discloses a system and method
`for reducing storage requirements in backup subsystems.
`The ’906 patent discloses a system similar to that disclosed
`in the ‘052 patent, with the enhancement that the base file
`from which the differencing operation is derived can be
`compressed. In certain files, a compressed base file will
`utilize less bandwidth and less storage space on a computer
`than would an uncompressed based file. One problem with
`this approach is that
`the compressibility of files differs
`greatly. While compression can significantly reduce the size
`of some files, compression algorithms do not obtain signifi-
`
`cant reduction with other type of files. Additionally,
`the
`
`
`
`di ‘erencing mechanism of the ’906 patent works by first
`compressing the revised version of the file, and upon deter-
`mining that compressed portions of the base file and the
`revised file differ, both the base file and the revised file are
`uncompressed at
`those locations so that
`the differences
`between the two files can be determined. The overhead
`involved in such compression/decompression algorithms
`can be significant.
`US. Pat. No. 5,479,654 discloses an apparatus and
`method for generating a set of representations of the changes
`made in a computer file during a period of time. The process
`disclosed in the ’654 patent makes multiple passes through
`portions of the most recent version of the file to determine
`the differences between it and the previous version of the
`file.
`Thus, it is apparent that a differencing system that reduces
`network trallic, elliciently determines and rellects diller-
`ences between two files quickly, and reduces storage
`requirements would be highly desirable.
`SUMMARY OF TIIE INVENTION
`
`It is one object of the present invention to provide a
`method and system for determining the differences between
`a base file and a modified file without the need for a copy of
`the base file.
`
`It is another object of the present invention to provide a
`method and system for reflecting the differences between
`two files that is highly efficient and reduces network traffic.
`It is yet another object of the present invention to provide
`a method and system that can determine the differences
`between a base file and a modified file either on a client
`computer or a server computer without a need for a copy of
`the base file.
`It is still another object of the present invention to provide
`a method and system for creating a signature of a base file
`that can be used to determine the differences between a base
`file and a modified file.
`Additional objects, advantages and other novel features of
`the invention will be set forth in part in the description that
`follows and, in part, will become apparent to those skilled in
`the art upon examination of the invention. To achieve the
`foregoing and other objects, and in accordance with the
`purposes of the present invention as described above, a
`method and system for reflecting dilIerences between two
`versions of a file is provided. The method includes
`
`Unfied Patents Exhibit 1003
`
`Pg. 7
`
`Unfied Patents Exhibit 1003
`Pg. 7
`
`

`

`3
`from a base file, a base signature file that
`generating,
`includes a plurality of base bit patterns. Each bit pattern is
`generated as a function of a portion of data in the base file.
`A revised version of the base file is created. A revised
`signature file, including a plurality of revised bit patterns, is
`generated from the revised file. Each revised bit pattern
`matches at least one of the base bit patterns. Based on
`differences between the base signature file and the revised
`signature file, the revised file is accessed and a delta file
`reflecting the differences between the base file and the
`revised file is generated.
`Once the base signature file is generated from the base
`file, the base file need not be accessed to generate the delta
`file reflecting the differences between the base file and the
`revised file. Thus, according to the present invention, the
`base signature file, which is a small fraction in size of the
`original base file, can be transmitted over a network to
`another computer, such as a server, and the differencing
`operation to generate the delta file can be carried out on the
`server, without the need to transmit the original base file to
`the server.
`According to one embodiment of the invention, the base
`signature file is generated by reading fame size blocks of
`data from the base file, and for each block of data, calcu—
`lating and generating a bit pattern, such as a cyclic redun-
`dancy check (CRC) value that is stored in the base signature .
`file. After the base file is processed, a base signature file
`exists containing a plurality of CRC values derived from the
`data contained in the base file. Amodified, or revised version
`of the base file is made, and a revised signature file is created
`by reading frame size blocks of data from the revised file and
`generating a revised CRC value for each data block. The
`revised CRC value is compared to the CRC values in the
`base file. If the revised CRC value matches a CRC value in
`the base signature file, the revised CRC value is stored in the
`revised signature file. Amatch is an indication that the block
`of data matches a block of data in the base file. If no CRC
`values in the base signature file match the revised CRC bit
`pattern, a new block of data olIset one byte from the
`previous block of data is read. This process is repeated until
`a revised CRC bit pattern is generated that matches a bit
`pattern from the base signature file. The revised bit pattern
`is then saved to the revised signature file, along with the
`offset of the block of data from the beginning of the revised
`file, and the next block of data is read from the revised file,
`this block of data beginning at the end of the previous block
`of data. At the end of the process, a revised signature file
`exists with CRC values that match CRC values in the base
`signature file, and offsets indicating the location in the
`revised file of the matching bloc (s of data.
`
`The base signature file and revised signature file are then _
`
`processed, and based on the di erences between the two
`files, and the olIsets in the revised signature file, the revised
`file is accessed and a delta file is created reflecting the
`differences between the base file and the revised file. The
`delta file preferably contains primitives, such as insert,
`modify, and delete primitives, which are commands that can
`be applied to a previous version of the file to generate the
`revised file.
`According to one embodiment of the present invention,
`the revised signature file and the base signature file are
`processed in the order of the longest common subsequence
`of revised bit patterns with respect to the base bit patterns in
`the base signature file. Determining differences between the
`base file and the revised file based on the longest common
`subsequence of bit patterns ensures a minimal number of
`primitives are used in the delta file to reflect the dilIerences
`between the base file and the revised file.
`
`
`
`40
`
`45
`
`60
`
`US 6,233,589 B1
`
`10
`
`(,4O
`
`35
`
`
`
`
`4
`The present invention reduces the amount of data neces—
`sary to reflect differences between two files, reducing both
`network utilization and storage requirements. The present
`invention is useful not only in backup systems, but also in
`re alication and synchronization systems. Moreover, once a
`base signature file has been generated, a delta file reflecting
`di erences between the base file and a subsequent version of
`the base file can be generated without the base file. Still other
`ob'ects of the present invention will become apparent to
`those skilled in this art from the following description
`wherein there is saown and described preferred embodi-
`ments of this invention. As will be realized, the invention is
`
`
`capable of other di erent obvious aspects all without depart-
`ing from the invention. Accordingly,
`the drawings and
`description will be regarded as illustrative in nature and not
`as restrictive.
`
`
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The accompanying drawings incorporated in and forming
`a part of the specification, illustrate several aspects of the
`present invention, and together with the description serve to
`explain the principals of the invention. In the drawings:
`FIG. 1 is a block diagram illustrating an environment in
`which the present invention has application;
`FIG. 2 is a block diagram illustrating another environment
`in which the present invention has application;
`FIG. 3 is a schematic diagram illustrating the relationship
`between various files and a delta file according to one
`embodiment of the present invention;
`FIG. 4 is a flow diagram illustrating a process for gener—
`ating a base signature file according to one embodiment of
`the present invention;
`FIG. 5 is a flow diagram illustrating a process for gener-
`ating a delta signature file according to one embodiment of
`the present invention; and
`FIG. 6 is a flow diagram illustrating a process for gener—
`ating a delta file according to one embodiment of the present
`invention.
`
`Reference will now be made in detail to present preferred
`embodiments of the invention, examples of which are illus—
`trated in the accompanying drawings, wherein like numerals
`indicate the same elements throughout the views.
`DETAILED DESCRIPTION OF PREFERRED
`EMBODIMENTS
`
`
`The present invention is directed to a differencing mecha-
`
`
`nism that quickly and e ieiently determines the differences
`
`between two files, and generates a delta file reflecting those
`
`
`
`di erences. Referring to FIG. 1, an environment using file
`synchronization or file replication is shown. Each of servers
`22 and 24 maintain a copy of a base file 21, 27, respectively,
`and are interconnected via a network 26. Base files 21, 27
`should be identical to one another and thus changes made to
`one copy should eventually be reflected in the other copy. A
`base signature file, as described in more detail herein, is
`generated on one of the servers 22, 24 and copied to the
`other server. The base signature file is maintained on server
`22 as file 20, and maintained on server 24 as file 28. Either
`base file 21 or 27 can be modified at either server. Upon
`detection of a modification to the file, the detecting server,
`for example server 22, uses the respective base signature
`file, for example, base signature file 20, to generate a new
`delta file, and communicates the delta file over network 26
`to server 24. Server 24 then uses the delta file to update the
`base file 27, and recalculates the base signature file 28. In
`
`Unfied Patents Exhibit 1003
`
`Pg. 8
`
`Unfied Patents Exhibit 1003
`Pg. 8
`
`

`

`5
`this manner, base files 21, 27 on servers 22 and 24 can stay
`in synchronization with minimal transfer of data over net-
`work 26.
`
`6
`Any version of the file can be derived from the base
`version of the file F and the given deltas between the base
`version and the desired version.
`
`US 6,233,589 B1
`
`5
`
`10
`
`FIG. 2 is a block diagram illustrating another environment
`in which the present invention has application. Aplurality of
`client computers 32 are connected over a network 26 to a
`backup server 30. Backup server 30 is in communication
`with a storage medium 34 for storage of files backed up from
`computers 32. Each new file created on computers 32 is
`preferably communicated to backup server 30 and backed up
`on storage medium 34. For each new file generated, a base
`signature file is created. The base signature file can be
`generated either on computer 32 or backup server 30. Auser
`of computer 32 can modify the base file and generate a
`revised file The differencing mechanism of the current
`invention then processes the base signature file and the
`revised file to generate a revised signature file. The revised
`signature lile and base signature file in conjunction with the
`revised file are used to create a delta file, which can then be
`communicated over network 26 to backup server 30, and
`stored on storage medium 34.
`Multiple versions of delta files can be maintained so that
`any particular version of a file can be restored. To accom-
`plish this, a new base signature file is generated from the
`revised file. In essence, the revised file becomes the base file ’
`for the next version of the revised file. If the base signature
`files are being maintained on backup server 30, and only the
`delta file is communicated from computers 32 to backup
`server 30, a new base signature file can be created on backup
`server 30 by first applying the delta file received from
`computer 32 to the previous base file to create the revised
`file. (This revised file becomes the base file for use with the
`next received delta file.) The base signature file can then be
`generated from the revised file. If the base signature files are
`being maintained on computer 32, the new base signature
`file can be generated from the actual revised file that was
`used in creating the delta file. In this manner, multiple delta
`files can be maintained to allow the recreation of any version
`of the base file.
`
`o;O
`
`35
`
`40
`
`Rf generates the required version of the file from a given
`base file and a forward delta.
`Rb generates the required version of the file from a given
`base file and a backward delta.
`If F,- represents the im version of file F and dz. represents the
`forward delta then
`
`Fi:Rf(E di)
`
`If F} represents the j”‘ version of the file F and di represents
`the backward delta then
`
`F,=R..(F, 41-)
`Streams
`Assume S is a stream of bytes. If the function len(S)
`returns the length of stream S, then the following conven—
`tions hold true:
`
`i. S[i] Where léiéleii(S) represents a single byte at
`position i within the stream S.
`ii. S[a, b] where léa, bélen(S) represents a substream of
`S starting at position a and the next b—l bytes
`iii. S[1, len(S)]=S
`iv. S[n, #]=S[n, len(S)]
`v. if Sl[i]=S;[i] Vi and len(Sl)=len(Sz) then S1 and S2 are
`same.
`
`vi. S1+S2 concatenates S2 to S1.
`Blocks
`Each stream is divided into blocks of fixed length such
`that following conventions hold true:
`i. BS is a fixed integer, used to represent the block size in
`the stream S
`
`ii. Given a stream S, N (S) represents the number of blocks
`in the stream S such that
`
`N(S)=[Ien(S:)/BS]
`
`If network bandwidth consumption is not a problem, and
`client processing power is limited, the revised file can be
`communicated to backup server 30, and the delta file can be
`created on backup server 30. Alternatively, the base signa—
`ture file can be stored by backup server 30 on storage
`medium 34, and then communicated to computer 32 upon its
`request. Computer 32 can then use the base signature file to
`create the delta file, and transmit the delta file to backup
`server 30.
`
`45
`
`Certain terminology will be used to describe the invention
`herein. The term ‘stream’ will be used to refer to contiguous
`bytes of data stored in a file. The phrase ‘delta‘ will be used
`to refer to differences between two files.
`Deltas
`A versioned sequence of a file can be represented as
`follows:
`
`iii. B"(S) represents n”1 block in the stream S such that,
`Enl:.5')=3[BS*n+1, BS"(11+1)], 0§11§N(S)—1
`
`Signature of Stream
`A signature is calculated with each stream of data. The
`signature of the stream enables the identification of changed
`blocks in the stream. The signature of the stream is described
`as follows:
`
`i. CRC(B,L(S)) represents the CRC of the block B” of the
`stream S.
`
`ii. CRC(A)=CRC(R) iff A=B, where A and B are two
`blocks in the stream S (CRC(A)=CRC of block A).
`iii. Sig(S) represents the signature of the stream S such
`that
`
`{‘11) d2; d3; -
`
`-
`
`-
`
`, dm: Edm+1> dm+2.'
`
`-
`
`' dm+n}>
`
`Sig(5)={CRC(3i(S>>: CRCfB2(SD, -
`
`-
`
`-
`
`s CRC(BN(5)(S>>}
`
`, dmm
`.
`.
`where F is the base version of the file and d1, ell, .
`are deltas of file F. Deltas d1, d2, .
`.
`.
`, d,” represent reverse
`deltas and deltas dmfl, dmu, .
`.
`.
`, dmm
`represent forward
`deltas with respect to file F.
`Given two versions of file F, Fl. and Fr, a diff function is
`defined such that
`
`d=dUflFp m
`
`where d is the delta changes between 17,- and Ff.
`
`60
`
`If C,- is the CRC of a block i in a stream S then
`i. Base(C,7) is the starting position of the block within the
`stream S
`
`ii. Next(Ci) returns the CRC of the next block in the
`signature Sig(S)
`
`The present invention generates a delta file reflecting the
`
`
`
`di erences between a first and second file. The delta file
`preferably comprises primitives, such as insert, modify and
`delete commands, which reflect the operations to apply to
`
`Unfied Patents Exhibit 1003
`
`Pg. 9
`
`Unfied Patents Exhibit 1003
`Pg. 9
`
`

`

`US 6,233,589 B1
`
`7
`the llrst file to derive the second file. According to one
`embodiment of this invention,
`three primitives are used.
`Each primitive is dellned as a function that returns the
`resultant strem(S*) after applying the operation to the base
`stream S.
`Insertion primitive:
`S*=ins(S, i, S')=S[1, i]+S'+S[i, #] where 5' represents the
`stream that was inserted within the stream S.
`Deletion primitive:
`S*=del(S, i, len)=S[l, i]+S[i+len, #]
`Modification primitive:
`S*=m0d(S, i, S‘)=S[l, i]+S'+S[i+len(S'), #] where 8' rep-
`resents modification to the substream S[i, len(S‘)]
`A sequence of primitive operations on a stream can be
`represented as 1312) _
`_
`_ ”(S) such that
`W _
`_
`_ 1,,(5‘)=P,,(P,,,1( .
`.
`. (P1(S)))) where PiE[ins, del, mod]
`
`Inverse Primitive:
`Give S, S* , P such that:
`
`S*=P(S)
`
`we define inverse primitive P— such that
`
`S=P7(S*)
`
`If P=ins(S,i,S') then P—=del(S*, i, 1511(9))
`
`If P=del(S,i,len) then P7=ins(Sx, i, S[i+l, len])
`
`If P=mod(S, i,S') then P—=mod(S*, i, S')
`
`If P=P,,(P,,,1( .
`
`.
`
`. (P1(S)))j then
`
`Pmpjmzp .
`
`. (133(5))
`
`10
`
`o;O
`
`35
`
`8
`and revised signature file 48, preferably in the longest
`common subsequence order of revised bit patterns. Based on
`comparisons between base signature file 42 and revised
`signature file 48, and using the offsets contained in revised
`signature file 48 and the data in revised file 44, the delta file
`52 is generated. Revised file 44 can then be treated as a new
`base file, and a new base signature file can be generated from
`revised file 44. Alternately, the next revision of revised file
`44 can be processed against the original base signature file
`
`42, and a delta file can be generated that reflects the
`
`
`
`di erences between original base file 38 and the newest
`version of the revised file. The method and system of the
`present invention eliminate the need to retain the base file for
`purposes of generating a delta file after generation of the
`base signature file. Because the base signature file comprises
`bit patterns which are significantly smaller in size than the
`blocks of data which they represent, the base signature file
`42 is a small fraction of the size of the original base file 38.
`Preferably, the bit patterns used in the present invention
`are cyclic redundancy code (“CRC”) patterns. A CRC is
`derived from a block of data and the calculation and use of
`CRCs (for the purpose of detecting errors in data
`transmission) is well known to those skilled in the art, and
`will not be discussed herein in detail. The block size chosen
`earl be arbitrary, but must be the same for the base file and
`for subsequent revisions of the base file. The block size
`should not be so small that the process loses elliciency, and
`not so large that large blocks of data are backed up from
`minor modifications. For example, for a large file on the
`order of about one gigabyte, a suitable block size would be
`about 64 Kb. Assuming that the CRC polynomial is 32 bytes
`long, the size of the signature file would be approximately
`128 Kb. Thus, the signature file is significantly smaller than
`the data file on which it is based. Thus, a trade off exists
`between the size of the signature file and the block size so
`that effective performance is achieved.
`A CRC algorithm treats a message, or block of data, as a
`single binary number and divides it by another fixed binary
`number. The remainder of this division is known as the
`CRC. The fixed binary number is normally called the
`polynomial of the CRC. While a normal CRC algorithm
`could be applied to the present invention, performance may
`not be satisfactory. Preferably, a table-driven CRC algorithm
`is used so that a mere shift, OR, XOR and a table lookup per
`byte can be used to derive the CRC.
`The use of a CRC has several advantages. First, a CRC
`can serve as a unique identity that represents the contents of
`a logical block of data. When the contents are changed, the
`CRC value should change. Ideally, a CRC algorithm has
`polynomial time complexity and very little space complex-
`ity. In other words, if the size of the file is N, the time
`required by the CRC algorithm to perform the calculation is
`K*N (where K is a constant), and has relatively small
`memory requirements. The CRC value itself should con-
`sume very little space relative to the amount of data that it
`represents. Given a stream of data where the logical blocks
`of data represented by a set of function outputs is embedded,
`it should be possible to determine the boundaries of the
`logical blocks represented by these outputs in polynomial
`time in a single pass of the file. A table-driven CRC is
`extremely fast and does not depend on the size of the
`polynomial. The purpose of one aspect of the present
`invention is to back up portions of files that have changed
`with respect to a previous version of the file. When data is
`inserted or deleted in a file, the position of blocks of data in
`the revised file dilfers from the same blocks of data in the
`previous version of the file. The differencing mechanism of
`
`Unfied Patents Exhibit 1003
`
`Pg. 10
`
`If P represents the forward delta for the base stream to
`generate the current version of the stream, then P— repre-
`sents the backward delta for the current version of the stream
`to generate the base version of the stream.
`In general, one aspect of the present invention relates to
`solving the following problem: given two streams S1 and $2
`llnd P1,2 .
`_
`_ ’ n such that Sz=P1,2 _
`_
`. .n (Si) using Sig(Sl) and
`Sm
`With this notational background, a high—level schematic
`of the present invention can be described with reference to
`FIG. 3. A generator 40 processes a base file 38 and generates
`a base signature file 42. Base signature file 42 comprises a
`plurality of bit patterns, each of which is derived as a
`function of a block of data from base file 38. Base file 44 is
`then modified, or revised, by a riser and a revised file 44 is .
`created. Generator 46 processes the revised file 44, gener-
`ating bit patterns as a function of blocks of data from revised
`file 44. To distinguish these bit patterns from those contained
`in base signature file 42, the bit patterns generated as a
`function of data from revised file 44 will be referred to as
`revised bit patterns, and those contained in base signature
`file 42 will be referred to as base bit patterns. Each revised
`bit pattern is compared to the base bit patterns in base
`signature file 42. For each revised bit pattern that matches a
`base bit pattern in base signature file 42,
`it is stored in
`revised signature file 48, along with an offset indicating the
`location in revised file 44 of the beginning of the block of
`data represented by the revised bit pattern. In this manner,
`generator 46 creates a revised signature file 48 containing a
`plurality of revised bit patterns. To generate a delta file 52
`rellecting the differences between base file 38 and revised
`file 44, a generator 50 processes the base signature f

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket