`Carson
`
`US005978805A
`[11] Patent Number:
`[45] Date of Patent:
`
`5,978,805
`*Nov. 2, 1999
`
`[54] METHOD AND APPARATUS FOR
`SYNCHRONIZING FILES
`
`[75] Inventor: Dwayne A_ Carson, Brockton, Mass_
`
`5,155,847 10/1992 Kirouac et al. ....................... .. 395/600
`5,163,137 11/1992 Yamamoto et al.
`395/286
`5,210,865
`5/1993 Davis et al.
`395/575
`5,222,224
`6/1993 Flynn et al. .......................... .. 711/144
`5,274,807 12/1993 Hoshen et al. ........................ .. 707/205
`
`-
`.
`-
`[73] Assignee. 1l:/[/[1cr0c0m Systems, Inc., NorWood,
`ass‘
`
`5,283,646
`574467888
`5,506,958
`
`2/1994 Bruder ...... ..
`8/1995 Pyne ~ ~ ~ ~ ~
`4/1996 Myran .... ..
`
`.. 348/415
`~ ~ ~ " 395/600
`395/182.16
`
`*
`
`.
`_
`] Nonce-
`
`[
`
`.
`.
`.
`Th1s_ Pate“t {SSW/d on a Con?rmed Pros‘
`eclltlon aPPhCaIIOH ?led under 37 CFR
`1-53(d), and is subject to the twenty year
`patent term provisions of 35 U.S.C.
`154(a)(2).
`
`[21] App1_ No; 08/856,111
`_
`[22] Flled:
`
`May 14’ 1997
`
`Related US. Application Data
`[60] Provisional application No. 60/017,750, May 15, 1996.
`
`Int. Cl-6 .................................................... ..
`[52] US. Cl. .......................... .. 707/10; 707/202; 707/204;
`707/205
`[58] Field of Search ............................ .. 707/10, 205, 202,
`707/204; 211/144; 834/4; 395/185_01, 184,01,
`200.76, 286, 182.16
`
`[56]
`
`References Cited
`
`Us‘ PATENT DOCUMENTS
`3/1967 Landell .............................. .. 340/1725
`3,311,893
`3,612,660 10/1971 Miller .......... ..
`. 340/172.5
`4,232,375 11/1980 Paugstat et a1
`364/900
`474127306 10/1983 M011 ~~~~~~~~~~~~ ~~
`364/900
`438913785
`1/1990 Donohoo """"" "
`" 364/900
`4,914,583
`4/1990 Weisshaar et al. ................... .. 364/200
`5,086,402
`2/1992 Sterling, II ............................ .. 364/514
`
`5,086,434
`
`2/1992 Abe et al. . . . . . . . . .
`
`. . . . . .. 375/7
`
`5,101,348
`
`3/1992 Arrowood et al. ................... .. 395/200
`
`395/184.01
`5,539,879
`7/1996 Pearce et al.
`395/185.01
`5,583,988 12/1996 Crank et al. .
`....... .. 834/4
`5,659,614
`8/1997 Bailey, III
`707/10
`5,721,907
`2/1998 Pyne ....... ..
`5,752,251
`5/1998 Cripps ................................... .. 707/202
`
`Primary Examiner—Wayne Amsbury
`Assistant Examiner—Thuy Pardo
`Attorney, Agent, or Firm—MintZ, Levin, Cohn, Ferris,
`Glovsky and Popeo, P.C.
`
`[57]
`
`ABSTRACT
`_
`_
`_
`_
`A method and apparatus is disclosed for synchronizing ?les
`Stored in memory of [WO remotely located Systems_ The
`system can generate a copy of a source ?le at the destination
`location Without transferring all the data of the source ?le.
`The destination location includes a reference ?le that can
`contain similar data as that contained in the source ?le. The
`synchronization is accomplished by sending keys
`(representing reference blocks of data from the reference
`?le) to the source system. At the source system, a portion of
`each keys called a Feature is compared With portions of
`source ?le. If the Features match, a checksum corresponding
`to the reference block is compared with a check sum
`corresponding to the source block. If the checksums match,
`an short message identifying the key and reference block
`that matched is transmitted to the destination system in order
`that the reference block can be copied from the reference ?le
`-
`-
`-
`“1 Order to bulld the Synchromzed ?le‘
`
`21 Claims, 14 Drawing Sheets
`
`11o
`
`140
`
`358
`
`35
`Source Fby
`
`25a
`25
`
`2 25b
`
`‘K
`Reference File
`’F’édrh'ré’E\
`Length U Q 1'
`1
`219895 I
`__
`e:
`H E
`12' 2'
`L2
`1
`C, I;
`“El, ,
`
`_ 7 7 7 7 7 7 "E:
`
`130
`
`35b/
`
`3
`
`3:
`
`_ ______ ,,Q.'
`fir
`L4 .'
`a,’
`
`4
`
`150
`
`— iiiiii "E1 m ,
`
`110
`
`FreeCRCust 1
`
`5
`
`L5]
`-—1
`
`3
`
`_
`
`4'
`
`—
`
`5|
`
`CRC-block 3
`
`AP-1008.001
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 1 0f 14
`
`5,978,805
`
`25
`
`Reference File
`
`/1O
`
`35
`
`Copy of Source File
`
`Source
`File
`
`Destination
`System
`
`22
`
`V
`
`Destination
`Process
`
`Source
`System
`
`‘
`
`f 32
`
`Source
`Process
`
`Fig. 1
`
`AP-1008.002
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 2 0f 14
`
`5,978,805
`
`110
`
`)2
`
`/114
`
`/116
`
`/120
`
`120
`
`120
`
`CRC
`B|ock “3
`
`Block Offset
`
`CRC Item
`Count
`
`CRC Item0 CRC Hem] . . . CRC ItemM
`
`/
`
`/ /
`
`/
`/
`/
`/
`
`/ /
`
`/
`/
`/
`
`’
`/ /
`
`/
`/
`
`/ /
`
`\
`
`\
`
`\
`\
`\
`\
`
`\
`
`\
`\
`\
`\
`\
`
`\
`\
`
`\
`
`Feature
`
`Length of
`data block
`
`Checksum
`
`122
`
`124
`
`126
`
`Fig. 2
`
`AP-1008.003
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 3 0f 14
`
`5,978,805
`
`COPYCRCBLOCK
`
`ID corresponding to 0
`received CRC-Block from
`the Destination System
`i
`
`\132
`
`130
`
`7/
`
`ID of the CRC-ltem in the
`CRC-Block received from
`the Destination System
`i
`
`\134
`
`WRITEDATABLOCK
`
`140
`
`/
`
`Length of
`Data Block
`
`‘
`
`\142
`
`Block of DQTQ
`
`i
`
`\144
`
`150
`
`FREECRCLIST /
`
`ID corresponding to the
`CRC-Block for which
`processing is complete
`
`Fig. 3
`
`AP-1008.004
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 4 of 14
`
`5,978,805
`
`25a
`
`35
`
`35a
`
`
`
`Source File
`
`AP-1008.005
`
`AP-1008.005
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 5 0f 14
`
`5,978,805
`
`Synchronize
`Main Destination Process
`Begin
`
`200 /
`
`ll
`Create Empty Temporary
`File [Reconstruction File)
`
`210 /
`
`l
`Open Reference File for
`Reading
`
`212 /
`
`/
`
`Initialize Pointers
`and Counters
`
`Create a cache for CRC
`Blocks
`
`V
`
`Call GenerateCRCBlocks
`(Begin generating and
`sending CRC Blocks]
`
`V
`
`Process messages from
`Source Process, Build
`Reconstruction File, and
`Generate CRC-blocks as
`needed until the
`Reconstruction ?le is
`Complete or Abort
`message received.
`
`Done
`
`Fig. 5
`
`220
`
`270 /
`
`218
`
`AP-1008.006
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 6 0f 14
`
`5,978,805
`
`Generate CRC Blocks
`Subprocess Begin
`
`220
`/
`
`CRC Done
`Flag = True
`
`222
`
`YES
`
`224
`
`Any free Cache
`Blocks?
`
`NO
`
`226
`
`228
`/_/
`
`Allocate Cache Block and
`Assign CRC-block ID
`
`Call Build CRC List and
`attach to CRC Block.
`
`230
`//
`Did Build CRC List
`report an empty list
`'.>
`
`232
`
`Send CRCBLOCK
`— message with CRC Block
`information.
`"
`
`234
`v /
`SQF‘dBICREEND
`d 50? CRC
`on e
`Done ?ag = True
`
`336
`
`591 CRC Done
`FlGg = False
`
`238 ?g
`
`)
`
`" a
`
`Done
`
`Fig. 6
`
`AP-1008.007
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 7 0f 14
`
`5,978,805
`
`Build CRC List
`Subprocess Begin
`
`.
`
`.
`
`v
`Store the Reference
`File po|nter as the
`Block Offset for the /
`CRC-block
`
`242
`
`240
`/
`
`244
`Initialize the CRC-Item /
`Count: Set I = 0
`
`v
`
`246
`
`'5 I < M?
`lM=MOX Count)
`
`248
`
`Attempt to read n Data
`Units from the Reference
`File into a cache. Let /
`Length = The number
`of data units successfully
`read.
`
`252
`/
`For the Len th of
`cached data,gselect
`the first k data
`units to be
`the Feature.
`
`260
`Store I as the CRC /
`"em Count for The
`CRC-block
`
`250
`
`ls Length > 0?
`
`I5 CRC "em Count
`= 0?
`
`262
`
`NO
`
`264
`
`266
`
`V
`-
`Return NOT Empty List
`
`-
`Return Empty List
`
`254
`
`‘V
`
`256
`
`For the cached data,
`calculate a Checksum.
`
`Store the
`Feature, Checksum,
`and Length in the area
`reserved for
`CRCHemm‘
`
`'
`258
`\ l = l + 1
`
`Fig. 7
`
`AP-1008.008
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 8 0f 14
`
`5,978,805
`
`Process Sync Block
`Subprocess Begin
`
`’/ 270
`
`CopyCRCBlock
`message?
`
`Determine the
`CRC-block from
`CopyCRCBlock
`Message
`
`it
`Determine the
`Length of the
`Reference Black
`from the CRC Item
`identified in
`CopyCRCBlock
`Message
`
`ll
`Determine
`Reference ?le
`Offset for
`Reference Black to
`be copied to the
`Reconstruction
`File.
`Co y Reference
`B ock to write
`location in
`Reconstruction File
`and undate write
`location
`
`280
`
`WrieDataBlock
`Message?
`
`Write the Source
`File Data contained
`in this
`WriteDataBlock
`Message to the
`Reconstruction File
`at the Current
`Reconstruction
`File's Write
`Location.
`(Write Location is
`updated as a result
`of the write.)
`
`276
`
`Free the Cache
`Block used to hold
`the
`CRC-block
`speci?ed by the
`received
`FreeCRCList
`Message
`(Cache Block is
`now available for
`use again.)
`
`290
`, /
`Ignore
`unknown Sync.
`Block type.
`
`278
`
`K/
`
`220
`
`Call
`Generate CRC
`Blocks, This will
`cause a re-use of
`the newly freed
`Cache Block for
`additional sections
`of the Reference
`Fi e.
`
`292
`3 r
`>, Return
`
`l
`
`Fig. 8
`
`AP-1008.009
`
`
`
`5,978,805
`300 /
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 9 0f 14
`
`Synchronize
`Main Source Process
`Begin
`
`l
`
`Initialize the Source File Cache
`Segments.
`
`/310
`
`i
`
`Initialize pointers
`
`Y
`
`Process CRC-blocks and wait
`for a CRCEND Message
`
`/314
`
`Fig. 9
`
`AP-1008.010
`
`
`
`I Process CRC Block
`Subprocess Begin
`
`y
`CSet ll: 0. This the
`RC tem num er to
`start with
`
`h
`‘.5 I < T e .
`clic'nem Clo“? "1
`i e CRCAB 92C 5
`Heo er-
`
`324
`
`,
`
`| = | +1 ~
`
`336
`
`340
`/
`Call Locate
`Key to locate
`CRQ-|1em (n in
`the Source Fi|e
`Cache
`Segments.
`
`326
`
`I
`AnyI Source File
`Cac e Segments
`left
`
`328
`
`Send a FREECRCLIST
`command to the
`Destination System.
`
`330
`—>©/
`
`Done
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 10 0f 14
`
`5,978,805
`
`320 /
`
`322
`
`f 3705
`
`Call Sync-Cache to
`synchronize the cache to a
`New Offset. New Offset = The
`minimum of (the current
`Synchronization Offset +
`update block (1150 bytes) or
`the Source File size.
`
`332
`
`CRC-ttem it] found
`in the Source File Cache
`Segments?
`
`370A
`/
`
`Call Sync-Cache to
`synchronize the cache with a
`New Offset, New Offset =
`Offset of Cache Segment
`where match was found +
`Match Offset determined by the
`Locate Key Subprocess.
`
`v
`338% Send ACOPYCRCBLOCK
`message to the Destination
`System. ID And index in this
`COPYCRCBLOCK message
`correspond to the CRCBLOCK
`and CRC Item which matched.
`
`Fig. 10
`
`AP-1008.011
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 11 0f 14
`
`5,978,805
`
`Locate KeylCRCBlock, Index]
`Subprocess Begin
`
`340
`/
`
`342
`
`Does Synchronization
`Offset Pointer exceed the
`Offset of the second
`cache segment
`
`344
`/
`Call DropFirstCacheSegmentihis will
`change the second cache segment to be
`the first cache segment and potentially add
`a new Mth cache segment if more not yet
`processed data exists in the New File.
`
`/346
`
`Let J = 0
`[Starts with the ?rst cache
`segments.)
`
`_
`
`Is
`J < Number of Cache
`Se ments
`9 ?
`
`348
`
`356
`
`_]= j +1
`
`Let Feature = CRCBlock's
`Keyllndex]
`Let CRC = CRCBlock's
`CRCllndex]
`
`|5
`Feature |n Qqche
`SegmentUl
`?
`
`CRC match a _
`calculated C-RC starting
`at the location where
`Feature was
`found?
`
`358
`
`Return FOUND
`Match Segment : J
`Match Offset = Location
`Found - Start Location of
`Cache Segment in which Key
`was found.
`
`Fig. 11
`
`AP-1008.012
`
`
`
`U.S. Patent
`
`NOV.2, 1999
`
`Sheet 12 0f 14
`
`5,978,805
`
`Sync-Cache iOffset)
`Subprocess Begin
`
`/ 370
`
`394
`
`NO
`
`374
`
`/
`LefK g P31
`(The offset to start from]
`
`6f :
`
`v
`
`YES
`
`/39e
`Call
`'
`DropFirstCacheSegment
`If no wclhe segrgems I9“,
`
`_
`
`I:
`
`376
`
`GfK: ,
`
`l5 |( > 0? (Any bytes
`left to consider?)
`
`Let E : Offset of the First
`Cache Segment + Length Of
`the First Cache Segment.
`378
`Thus, E is an offset 0 the end
`\A of the current cache segment.
`
`382
`
`E = E - Max. CRC item
`Length.
`f
`
`386
`
`cache segment equal
`to the maximum
`cache segment
`size
`?
`
`Let Offset : S - Offset of
`the First Cache Segment.
`Let Length L = E - 5.
`
`IS 5 >= Offset 0f
`the c?che Segment
`AND IS 5 < E?
`
`NO
`
`388
`
`is K > Length L?
`
`Y
`
`Let Length L = K.
`
`~
`
`\
`390
`
`Send a WRITEDATABLOCK
`messa e with File Data starti
`9
`"9
`at 5 having Length L data units.
`K = K - Length Land
`S: S + Length L.
`Update Synchronization Offset
`i a
`392
`
`Fig. 12
`
`AP-1008.013
`
`
`
`U.S. Patent
`
`NOV.2, 1999
`
`Sheet 13 0f 14
`
`5,978,805
`
`400 /
`
`Build CRC List
`Subprocess Begin
`
`/
`
`Store the current
`Reference File position in
`the header data area of the
`CRC item List
`
`.. .
`mmul'ze the CBC
`"em Cour"
`Set I = 0
`
`F
`
`ROM H5 14
`
`.
`
`418
`
`R f Attemrétltacfiilthe
`h
`eerence ie ac ewit
`data, t[\JpdgteCand the
`ea pointer
`
`?
`REFC C E
`c<
`A H SIZE
`
`E5
`
`_
`iNt-Max. Count]
`
`NO
`CRC ltemlll's
`Key[0..C] =
`Remaining bytes in
`the Ref. Cache,
`CRCltemlll's
`Length = C,
`CRC temlll's
`CRC= The
`checksum of the
`remaining data in
`the Ref. Cache.
`
`Has the
`Active Feature been
`assigned?
`
`428
`
`C = 0
`
`Starting at CRCBLOCKLEN from
`the head of the Reference Cache ‘
`Search fora Feature which does
`‘
`not match the Active Feature
`
`Define the Active Feature
`as the first k data units at
`the head of the Reference
`Cache.
`
`436
`
`To Fig. 14
`
`Fig. 13
`
`Return NOT Empty
`List
`
`AP-1008.014
`
`
`
`U.S. Patent
`
`NOV.2, 1999
`
`Sheet 14 0f 14
`
`5,978,805
`
`From Fig. 13
`
`442
`
`Was a different
`Feature found?
`
`YES
`
`NO
`
`Let Length = Difference of
`Location found and the
`444
`Head Pointer (Circular
`buffer to be considered in /
`Length calculation].
`Remember a diifferent
`Feature was found and
`where the Feature is in the
`Reference Cache.
`
`448
`
`l
`
`Len ,h z
`CRCBL CKEN +
`CRCBLOCKEN/Q
`
`446
`
`is C - Length > k?
`
`452
`/
`Length = C
`
`.
`
`Store the Active Feature as
`450
`\ the Feature for CRCltemll]
`_
`Calculate a checksum for
`the dOfQ UltffS from the
`head to the Ref. Cache up
`through Len th data units.
`Store this C C as the CRC
`L for (éRCltehml?. Stoke f
`engt as t e engt o
`CRCItemlll.
`
`/454
`
`C = C - Length. Adiust the
`Ref. Cache's tail pointer as
`well.
`
`458
`
`Was a different
`Feature found and is
`C > 0?
`
`Save the Feature which
`was found as the new
`Active Feature.
`
`To H ' 13
`
`7
`
`460
`~ I = t +1 /
`
`Fig. 14
`
`AP-1008.015
`
`
`
`1
`METHOD AND APPARATUS FOR
`SYNCHRONIZING FILES
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`This application claims bene?t of Provisional Application
`No. 60/017,750, ?led May 15, 1996.
`
`COPYRIGHT NOTICE
`Copyright, 1995, 1996, Microcom Systems, Inc. Aportion
`of the disclosure of this patent document contains material
`Which is subject to copyright protection. The copyright
`oWner has no objection to reproduction by anyone of the
`patent document or the patent disclosure, as it appears in the
`US. Patent and Trademark Of?ce patent ?le or records, but
`otherWise reserves all copyright rights Whatsoever.
`
`10
`
`BACKGROUND OF THE INVENTION
`The present invention relates generally to a system for
`transferring data betWeen computer storage devices, and
`more particularly to a system for efficiently synchroniZing
`the contents of speci?c ?les stored on tWo remotely located
`computers.
`There is currently a marked groWth and dependence upon
`mobile and remote computing systems. These systems alloW
`users to access data ?les (stored at the host site) from remote
`locations. In many circumstances it is desirable to have a
`copy of the data ?le at the remote location for access by a
`remote user. This presents difficulty Where the ?le is being
`updated by local users. Alternatively, Where the remote user
`modi?es the data ?le, it is desirable to be able to update the
`?le at the host site.
`Where the siZe of the ?le is very large, and the commu
`nication link, typically a computer modem, is relatively sloW
`(288K bits per second), the time and cost to transfer the ?le
`can be eXcessive. This is especially true Where there are very
`feW differences betWeen the tWo ?les. Therefore, it is
`desirable to be able to update the ?le Without having to
`transmit the entire ?le.
`
`OBJECTS OF THE INVENTION
`Accordingly, it is an object of the invention to provide an
`improved system for duplicating the contents of a ?le stored
`on source system at a destination system.
`It is another object of the invention to provide an
`improved system for duplicating the contents of a ?le stored
`on source a system at a destination system Without trans
`mitting the entire ?le from one system to another.
`It is a further object of the invention to provide an
`improved system for duplicating the contents of a ?le stored
`on source system at a destination system Without transmit
`ting the entire ?le from one system to another in as fast a
`manner as possible.
`Other objects and advantages of the present invention Will
`become readily apparent to those skilled in this art from the
`folloWing detailed description Wherein a preferred embodi
`ment is shoWn and described, simply by Way of illustration
`of the best mode of the invention. As Will be realiZed, the
`invention is capable of other and different embodiments, and
`its several details are capable of modi?cations in various
`respects, all Without departing from the invention.
`Accordingly, the draWings and description are to be regarded
`as illustrative in nature, and not restrictive.
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`55
`
`60
`
`SUMMARY OF THE INVENTION
`The invention is directed to an improved system for
`transferring blocks of data or ?les betWeen computer storage
`
`65
`
`5,978,805
`
`2
`devices and for synchroniZing or updating the contents of
`duplicate ?les stored at both storage devices. In accordance
`With the present invention, a system is disclosed for syn
`chroniZing the contents of one or more ?les stored in
`separate storage devices or storage locations. In accordance
`With one of the preferred embodiments of the invention, a
`system is disclosed Which enables tWo remotely located
`computers to exchange data over a communications link in
`order to duplicate the contents of one or more ?les stored on
`one of the computers on the other computer. Where the
`destination computer contains a ?le similar to the ?le to be
`duplicated, this can be accomplished by transmitting less
`than the entire ?le over the communications link to the
`destination computer, thus enabling the ?le to be duplicated
`faster and providing for ef?cient use of the communications
`link.
`In one embodiment, the system involves transferring a
`source ?le stored on a source computer to be transferred to
`a destination computer containing an older ?le or reference
`?le. The reference ?le is updated, also termed synchroniZed,
`With the source ?le in accordance With one method of the
`invention.
`The process includes dividing the reference ?le into
`reference blocks having a preselected length. For each
`reference block, a reference CRC-item is determined using
`the data from that reference block. In one preferred
`embodiment, the reference CRC-item includes a subset of
`data units from the block, herein referred to as a Feature, the
`length L of the reference block and a checksum value for the
`block of L data units. Preferably, the checksum uses a cyclic
`redundancy check (CRC) checksum algorithm. A CRC-item
`is created for each subsequent block.
`In one embodiment, several CRC-items are created
`together and the Features are selected such that they are
`unique With respect to one or more adjacent CRC-items. If
`the Feature for a given block is similar or identical to the
`Feature of one or more previous CRC-items, the block is
`searched (from the beginning to a search limit) for a subset
`of bytes that are not substantially identical to the prior
`Features. Once an acceptable Feature is selected, the bytes
`prior to the Feature bytes in the block are added to the
`previous block and the length L and the CRC value in the
`CRC-item corresponding to the previous block are adjusted
`accordingly.
`Each CRC-item is transmitted to the source computer,
`Whereby at least a portion of the source ?le is searched for
`a group of data units Which match the Feature. In the
`preferred embodiment, a portion of the source ?le is stored
`in a cache memory that permits high speed comparisons. If
`a match is found, a source checksum value is determined for
`the L data units in the source ?le that include the matching
`portion. If the source checksum value matches the reference
`checksum value in the CRC-item, that block in the source
`?le is presumed to match the corresponding block in the
`reference ?le and thus does not need to be transmitted over
`the communications link. At this point any bytes located in
`the source ?le prior to the bytes matching the Feature bytes
`are transmitted to the destination computer in order to
`duplicate the source ?le at the destination computer and a
`match indicator specifying the matching block in the refer
`ence ?le is sent to the destination computer.
`If the CRC value does not match the source CRC value,
`the Feature is compared to at least the remainder of the
`portion of the source ?le. If no match is encountered, it is
`assumed that the reference block corresponding to that
`Feature and CRC-item has been deleted from the source ?le.
`
`AP-1008.016
`
`
`
`3
`The Feature from the next CRC-item is then compared again
`to the same portion of the source ?le as stated above. This
`process is repeated for all CRC-items or until the end of the
`source ?le is reached. If no match is encountered, that
`remaining portion of the source ?le is transmitted to the
`destination computer in order to duplicate that portion of the
`source ?le at the destination computer.
`The process is then repeated by comparing subsequent
`reference CRC-items With subsequent portions of the source
`?le until the source ?le is completely duplicated at the
`destination computer. In an alternative embodiment, Where
`the source ?le is relatively small the entire source ?le can be
`searched.
`In the preferred embodiment, the source and destination
`computers communicate With each other in accordance With
`a prede?ned protocol. The protocol alloWs multiple CRC
`items to be transferred in the form of a CRC-block
`containing, for example, as many as 50 CRC-items. The
`protocol also provides that When a match betWeen a source
`block in the source ?le matches a reference block in the
`reference ?le, the source computer transmits a protocol
`message to the destination computer identifying the speci?c
`block in the reference ?le that can be used by the destination
`computer in duplicating the source ?le. Typically, this is
`accomplished by identifying (a) the offset into the reference
`?le Where the reference block begins and (b) the length of
`the reference block.
`
`DESCRIPTION OF THE DRAWINGS
`
`The foregoing and other objects of this invention, the
`various features thereof, as Well as the invention itself, may
`be more fully understood from the folloWing description,
`When read together With the accompanying draWings in
`Which:
`FIG. 1 is a diagrammatic vieW of a system in accordance
`With the invention;
`FIG. 2 is a diagram shoWing the structure of CRC-block
`in accordance With the invention;
`FIG. 3 is a diagram shoWing the structure of a Copy
`CRCBlock message, a WriteDataBlock message and a Free
`CRCList message in accordance With the invention;
`FIG. 4 is a diagram shoWing hoW messages are transferred
`betWeen the destination system and the source system in
`accordance With the invention;
`FIGS. 5—8 are ?oWcharts shoWing the destination process
`in accordance With the invention;
`FIGS. 9—12 are ?oWcharts shoWing the source process in
`accordance With the invention; and
`FIGS. 13 and 14 are ?oWcharts shoWing the build CRC
`list subprocess in accordance With an alternative embodi
`ment of the invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`55
`
`The invention is directed to a method and apparatus for
`transferring a ?le, herein referred to as the source ?le, from
`a source location to a destination location. In accordance
`With the invention, the destination location can include a ?le,
`herein referred to as the reference ?le, that contains similar
`data. Typically, the source ?le is a modi?ed version of the
`reference ?le. In accordance With the invention, instead of
`transmitting the entire source ?le to the destination location,
`the reference ?le is divided into reference blocks and each
`reference block is compared to a portion of the source ?le to
`determine Whether the reference block can be used to
`
`65
`
`5,978,805
`
`10
`
`15
`
`25
`
`35
`
`45
`
`4
`construct the source ?le at the destination location. Instead
`of sending the entire source block that matched, a relatively
`small message identifying the matching reference block is
`sent to the destination location Whereby the matching ref
`erence block is used to construct an accurate copy of source
`?le at the destination location. In this Way, only copies of the
`source ?le that do not match blocks from the reference ?le
`are transmitted, thus reducing the amount of data transferred
`from the source location to the destination location.
`In one embodiment, the user merely desires to update or
`synchroniZe a ?le at a ?rst location With respect to a ?le
`located at a second location. In this situation, information
`about the ?les at each location is compared to determine
`Which is to be designated the source ?le and Which is to be
`designated the reference ?le. In the preferred embodiment,
`prior to the beginning the of the synchroniZation process, the
`time and date stamps of the ?les, the lengths of the ?les and
`checksums calculated for the entire length of both ?les are
`compared respectively in order to determine Whether the
`?les are identical. If the ?les are not identical, the ?le having
`the most recent time and date stamp is designated as the
`source ?le and the ?le having the oldest time and date stamp
`is designated as the reference ?le. The length of the source
`?le can also be used by the destination process to determine
`When the reconstruction of the source ?le at the destination
`location is completed.
`In accordance With one embodiment of the invention, the
`method of the invention is performed utiliZing tWo separate
`processes, herein referred to as the destination process and
`the source process. The destination process operates on the
`reference ?le and construct a copy of the source ?le at the
`destination location using information received from the
`source process. The source process operates on the source
`?le using information received from the destination process
`to determine Which portions of the source ?le need to be
`duplicated and transferred to the destination location and
`Which portions can be copied or used from the reference ?le.
`FIG. 1 shoWs a system 10 in accordance With the pre
`ferred embodiment of the invention. The system 10 includes
`a destination system 20 and source system 30. For purposes
`of illustration, the destination process can be performed by
`the destination system 20 and the source process can be
`performed by the source system 30. The destination system
`20 includes a reference ?le 25 and a destination location 40
`for storing a copy of the source ?le, herein referred to as the
`reconstruction ?le. The source system 30 includes a source
`?le 35. In the preferred embodiment, the destination process
`is embodied in softWare 22, stored in memory of the
`destination system 20 and executed by the central processing
`unit (CPU—not shoWn) of the destination system 20 and the
`source process is embodied in softWare 32, stored in
`memory of the source system 30 and executed by the CPU
`(not shoWn) of the source system 30. Appendix A includes
`source code in the C programming language Which imple
`ments the ?le synchroniZation process in accordance With
`the invention.
`A communications link 50 interconnects destination sys
`tem 20 and source system 30 to enable data to be transferred
`bidirectionally betWeen the destination system 20 and the
`source system 30. The communications link 50 can be any
`means by Which tWo systems can transfer data such as a
`Wired connection, e.g., a serial interface cable, a parallel
`interface cable, or a Wired netWork connection; or a Wireless
`connection, e.g., infrared, radio frequency or other type of
`signal communication techniques or a combination of both.
`In the preferred embodiment, the destination system 20 and
`the source system 30 include modems (not shoWn) and are
`
`AP-1008.017
`
`
`
`5,978,805
`
`5
`interconnected via a public switched telephone network
`(PSTN). In addition, the communications link 50 is consid
`ered to provide an error correcting link betWeen the desti
`nation system 20 and source system 30 and thus the source
`and destination processes can assume that data transmitted is
`received Without errors.
`The destination process produces CRC-blocks and trans
`fers the CRC-blocks to the source process over the commu
`nications link 50. A CRC-block includes one or more
`CRC-items. The CRC-items represent a block of data in the
`reference ?le (a reference block) and are used by the source
`process to compare the contents of the reference block With
`the same siZe block of data from the source ?le (a source
`block). FIG. 2 shoWs the structure of a CRC-item 120 and
`a CRC-Block 110 in accordance With the preferred embodi
`ment of the invention.
`Each CRC-item 120 is composed of a Feature 122, a
`length value L 124 corresponding to the length of the
`reference block and a checksum value 126. The Feature 122
`is a set of data units from the source block. The source
`process Will search for the Feature in the source in order to
`determine Whether the reference block matches a source
`block of length L 124 Which includes the data units that
`matched the Feature 122. In the preferred embodiment, the
`Feature 122 contains the ?rst 16 bytes from the reference
`block, the length L 124 of the reference block is 5120 bytes
`and the checksum value 126 is a 32 bit Cyclic Redundancy
`Check value although these numbers can clearly vary.
`Each CRC-block 110 is composed of a CRC-block ID
`112, a block offset value 114, a CRC-item count value 116
`(indicating the number of CRC-items included in the CRC
`block 110) and one or more CRC-items 120. The CRC-block
`ID 112 is a unique identi?er used to identify each CRC
`block 110 transferred to the source process. The block offset
`value 114 is an indicator of the location of the reference
`block in the reference ?le corresponding to the ?rst CRC
`item in the CRC-block 110. Preferably, the block offset value
`114 is expressed in terms of the number of bytes offset from
`the beginning of reference ?le. The destination process also
`generates an ENDCRC message (not shoWn) Which is sent
`to source process to indicate that the entire reference ?le has
`been processed and no further CRC-Blocks Will be trans
`ferred. Alternatively, the ENDCRC message can be a CRC
`block 110 Which indicates the number of CRC-items as Zero.
`In the preferred embodiment, the number of CRC-items
`120 in the ?rst CRC-block 110 is selected to be a smaller
`number than subsequent CRC-blocks. This speeds up the
`overall process by alloWing the source process to begin
`processing CRC-items 120 to determine Whether any of the
`corresponding reference blocks match the reference ?le
`While the destination process is generating further CRC
`blocks 110. Preferably, the ?rst CRC-block 110 includes 10
`CRC-items 120 and all subsequent CRC-blocks 110 include
`50 CRC-items 120 although thus number can clearly vary.
`The source process compares reference blocks to source
`blocks and sends either an indication that the blocks
`matched or a message containing a block of data that did not
`match any of the reference blocks. FIG. 3 shoWs the mes
`sages transferred from the source process to the destination
`process. These messages include the CopyCRCBlock mes
`sage 130, the WriteDataBlock message 140 and the Free
`CRCList message 150.
`The CopyCRCBlock message 130 is transferred from the
`source process to the destination process to indicate that a
`reference block matched a source block. The Copy
`CRCBlock message 130 includes a CRC-block identi?er
`
`15
`
`25
`
`35
`
`45
`
`55
`
`65
`
`6
`132 and a CRC-item identi?er 134 Which identify the
`CRC-block 110 and CRC-item 120, respectively, that cor
`respond to the matching reference block. The destination
`process Will use this information to determine Which refer
`ence block to copy into the ?le being built at the destination
`location.
`The WriteDataBlock message 140 is transferred from the
`source process to the destination process to indicate that a
`portion of the source ?le Was not found in the reference ?le.
`The WriteDataBlock message 140 includes a length value W
`142 and a block of W data units 144. The destination process
`Will use the block of data to build the reconstruction ?le at
`the destination location.
`The FreeCRCList message 150 is transferred from the
`source process to the destination process to indicate that it
`has ?nished processing a particular CRC-block 110 and that
`the block of memory Where the CRC-block 110 is stored at
`the destination location can be reused. The destination
`process maintains a copy in memory of each CRC-block 110
`transferred to the source process. The destination process
`uses the copy of a particular CRC-block 110 along With the
`information provided by a CopyCRCBlock message 130 to
`determine Which reference block is to be used to build the
`copy of the source ?le at the destination location.
`FIG. 4