throbber

`UNIFIED PATENTS
`
`EXHIBIT 1004
`
`Unfied Patents Exhibit 1004
`Pg. 1
`
`

`

`U8005832520A
`
`Umted States Patent
`[19]
`[11] Patent Number:
`5,832,520
`
`Miller
`[45] Date of Patent:
`Nov. 3, 1998
`
`[54] AUTOMATIC FILE DIFFERENCING AND
`UPDATING SYSTEM
`
`“Comparing and Merging Files”, from the World Wide Web
`(Jun 8, 1996).
`
`[75]
`
`Inventor: William A. Miller, Gualala, Calif.
`
`[73] Assignee: Miller, Cal], Planck and Miller,
`P0rt0]a Valley, Calif.
`
`An b
`' —W
`Ex
`P '
`Am’m’y‘ AW’W” FiyucT 15 “yd
`Home},
`gent, or
`trm— ownsen
`Crew LLP
`
`d T
`ownsen
`
`d
`
`an
`
`a
`
`an
`
`[21] Appl. N0.: 754,486
`[22]
`Filed:
`Nov. 22, 1996
`
`............................
`3
`n .
`.
`[”1]
`I t C] 6
`G06F 17/30
`[52] US. Cl.
`.. 707/203
`
`[58] Field of Search
`......................... 707/203
`
`[56]
`
`References Cited
`
`7
`9,278,979
`3,418,945
`5,604,853
`5,623,656
`5,644,709
`
`US. PATENT DOCUMENTS
`
`11/1994 1105“” at al‘
`707/293
`5/1995 Latter et al.
`7U//8
`
`
`707/540
`[”97 Nagashima
`
`...... 707/10
`/1997 Lyons
`---------------------- 39318701
`7/1997 Austin
`OTHER PUBLICATIONS
`
`Corrnen, et al., Introduction to Algorithms, The MIT Press
`(1990).
`Pocket Soft, Inc., White Paper re .RTPatch Professional
`Binary Update System, from the World Wide Web (http://
`www.pocketsoft.com) (Nov, 14, 1996).
`
`ABSTRACT
`
`[57]
`
`.
`A method and file structure for generating an efficient
`
`:1 erence igllles frornband old filedand a new fille so that a
`
`
`1 erence
`e can. e transmitte
`to a secon
`computer
`system where 1e dlfference file and a duphcate of the old
`
`file can quickly be used 0 create a copy of the new file is
`
`
`disclosed. Adi erencing process compares an old file and-a
`
`
`new file to generate a di erence file in wnch the 01d file 1s
`
`
`
`used as a database of by e strings. The di ‘ereneing process
`.
`reads strings of data from the new file, searches for the
`existence of those strings in the old file, and notes the
`
`locations in the old file in which strings in the new file are
`
`
`found and stores in a di
`erence file an indication of the
`
`location where a matching
`of the length. A specific fi
`is disclosed,
`
`
`
`string is found and an indication
`e structure for the difference file
`
`28 Claims, 13 Drawing Sheets
`
`START BUILD OF THE
`RAW DIFI= FILE
`
`
`
`INITIALIZE P_OLD AND P_NEW T2
`TO THE BEGINNING OF EACH
`
`
`
`SEARCH THE OLD I=II.E FOR
`THE STRING AT THE CURRENT
`POSITION IN THE NEW FILE
`
`(SEE FIG. 53).
`
`
`
`
`DETERMINE WHETHER
`A COPY. INSERT 0R INSERT/COPY
`OPERATION WILL BE PERFORMED
`
`(AND THE COUNT
`
`
`AND POSITION
`
`
`INFORMATION).
`
`
`
`
`INSERT—
`
`CREATE COPY
`COMMAND AND INSERT
`INTO RAW DIFF FILE
`
`CREATE INSERT CDIVMAND
`AND INSERT INTO RAW DIFF
`
` T10
`
`
`
` UPDATE STATISTICS
`NO N0 = SEARCH HAS ALREADY
`FOR COUNT AND
`RETURNED THE NEXT COPY
`POSITION BITS
`AFTER AN INSERT
`
`
`REQUIRED FOR THIS
`COPY
`
`FILE
`
`
`
`RAW DIFF FILE IS
`COMPLETE
`
`
`
`YES
`
`Unfied Patents Exhibit 1004
`
`Pg. 2
`
`Unfied Patents Exhibit 1004
`Pg. 2
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 1 0f 13
`
`5,832,520
`
`\‘V 9V
`
`DUPLICATE OF
`OLD FILE
`
`15
`
`REVISION
`PROCESS
`
`200
`
`FIG. 1
`
`Unfied Patents Exhibit 1004
`
`Pg. 3
`
`lIl|I||l| :
`
`1
`:
`
`|llllIIIlI
`
`Unfied Patents Exhibit 1004
`Pg. 3
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 2 0f 13
`
`5,832,520
`
`- 105
`V eV v
`
`10
`
`BUILDER
`
`TSI
`
`106
`
`04
`
`SEARCH
`ENGINE
`
`9/
`
`
`RAW
`
`
`DIFFERENCE
`
`
`FILE
`
`
`w
`
`20
`
`own
`
`BUlDER
`
`\\13
`
`1o
`
`10
`
`z’/
`
`10
`
`K//
`
`3'
`
`COMMAND
`
`ENCODE
`
`COUNT
`ENCODE
`
`
`
`FILE
`
`FINAL
`DIFFERENCE
`
`FIG. 2
`
`Unfied Patents Exhibit 1004
`
`Pg. 4
`
`Unfied Patents Exhibit 1004
`Pg. 4
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 3 0f 13
`
`5,832,520
`
`START THE DIFFIT
`PROCESS
`
`82
`
`INPUT OLD FILE AND NEW
`FILE
`
`S4
`
`FORM CHECKSUMS FOR {\35
`OLD FILE AND NEW FILE
`
`BUILD THE TSI (TEXT STRING
`INDEX) STRUCTURE FOR THE
`
`OLD FILE
`
`810
`
`BUILD THE RAW DIFF (DIFFERENCE) FILE
`INCLUDING A SEQUENCE OF COPY
`COMMANDS, INSERT COMMANDS AND
`
`INSERTION STRINGS
`
`BUILD THE OISD (OPTIMIZED INSERTION
`STRING DATABASE). THE OISD IS A
`DATABASE 0F INSERTION STRINGS WITH
`CERTAIN REDUNDANT STRINGS REMOVED.
`
`ENCODE THE COMMAND SEQUENCE FROM
`THE RAW DIFFERENCE FILE AND PLACE IT
`IN THE DIFFERENCE FILE.
`
`APPEND THE (HUFFMAN) COMMAND
`DECODE TREES AND THE INITIAL COMMAND
`STATE TO THE DIFFERENCE FILE.
`
`APPEND THE OPTIMIZED INSERTION TEXT
`
`STRINGS (FROM THE OISD) TO THE DIFFERENCE FILE
`
`S12
`
`S14
`
`S16
`
`S17
`
`$18
`
`ADD A HEADER TO THE DIFF FILE WHICH
`
`
`CONTAINS CHECKSUMS AND OTHER
`INFORMATION ABOUT THE OLD. NEW AND
`DIFF FILES.
`
`
`FORM THE CHECKSUM FOR THE DIFF FILE
`AND ADD TO THE HEADER. THE DIFF FILE IS
`NOW COMPLETE. OPTIONALLY COMPRESS
`THE DIFFERENCE FILE.
`
`
`
`$20
`
`$22
`
`DIFFIT PROCESS
`COMPLETED
`
`824'
`
`FIG. 3
`
`Unfied Patents Exhibit 1004
`
`Pg. 5
`
`Unfied Patents Exhibit 1004
`Pg. 5
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 4 0f 13
`
`5,832,520
`
`BYTE OFFSET INTO THE OLD
`FOUR BYTES OF THE OLD
`FILE (FIELD IS 2,3 0R 4 BYTES
`FILE DATA STARTING AT THE
`LONG DEPENDWG UPON OLD
`OFFSET-
`FILE SIZE).
`
`
`
`
`
`1122334410.
`
`oooooo m
`
`}106
`
`
`
`
`
`AN EXAMPLE
`i
`I
`OF AN
`N-4
`OADF3452
`UNSORTED
`
`
`TSI —_
`
`
`
`
`LAST FOUR BYTES IN THE
`FILE.
`
`(N IS THE OLD FILE
`LENGTH)
`
`00000000
`
`. 00000000
`
`00000001
`
`
`
`
`
`
`
`
`

`
`
`
`
`
`OF3538
`
`003780
`
`13B073D
`
`$
`003827
`
`AN EXAMPLE
`OF A
`SORTED TSI
`
`FIG. 4A
`
`Unfied Patents Exhibit 1004
`
`Pg. 6
`
`Unfied Patents Exhibit 1004
`Pg. 6
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 5 0f 13
`
`5,832,520
`
`
`
`
`
`1 1 5
`
`r‘
`HASH
`FUNCTION
`
`OLD FILE STRING
`STARTING AT
`POSITION SHOWN
`
`m (HASH CHAIN TABLE)
`
`‘I_Q (OLD FILE)
`
`
`PRODUCES
`
`POINTER INTO
`THE HASH HEAD
`
`
`CORRESPONDS T0
`CORRESPONDS TO
`
`NU LL POI NTE R
`
`
`
`CORRESPONDS T0
`
`
`
`
`IIIIIIIIII
`
`
`
`
`
`
`
`CORRESPONDS TO
`
`
`
`
`CORRESPONDS TO
`
`NULL
`
`X
`
`IIIIIII
`
`
`
`FIG. 4B
`
`Unfied Patents Exhibit 1004
`
`Pg. 7
`
`Unfied Patents Exhibit 1004
`Pg. 7
`
`

`

`
`
`
`NO
`
`CREATE COPY
`COMMAND AND INSERT
`
`T10
`
`CREATE INSERT COMMAND
`
`
`AND INSERT INTO RAW DIFF
`
`FILE
`
`
`US. Patent
`
`Nov. 3, 1998
`
`Sheet 6 0f 13
`
`5,832,520
`
`START BUILD OF THE
`RAW DIFF FILE
`
`
`
`INITIALIZE P__OLD AND P__NE
`
`
`TO THE BEGINNING OF EACH
`FILE
`
`
`SEARCH THE OLD FILE FOR
`
`THE STRING AT THE CURRENT
`
`
`POSITION IN THE NEW FILE
`
`(SEE FIG. EB).
`
`DETERMINE WHETHER
`
`
`A COPY. INSERT 0R INSERT/COPY
`
`
`OPERATION WILL BE PERFORMED
` INSERT
`
`(AND THE COUNT
`
`AND POSITION
`
`INFORMATION).
`
`
`INTO RAW DIFF FILE
`
` YES
`
`UPDATE STATISTI
`
`
`CS
`FOR COUNT AND
`
`POSITION BITS
`REQUIRED FOR THIS
`
`COPY
`
`
`
`
`
`
`N0 N0 = SEARCH HAS ALREADY
`RETURNED THE NEXT COPY
`AFTER AN INS
`
`ERT
`
`
`
`RAW DIFF FILE IS
`COMPLETE
`
`FIG. 5A
`
`Unfied Patents Exhibit 1004
`
`Pg. 8
`
`Unfied Patents Exhibit 1004
`Pg. 8
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 7 0f 13
`
`5,832,520
`
`START SEARCH
`
`
`INITIALIZE INSERT
`
`
`COUNTER TO ZERO
`
`
`T18
`
`
`
`N BYTE
`T25
`
`
`FIND LENGTH
`STRING AT P_OLD
`
`
`AND P_NEW
`
`MATCH?
`
`USE THE TSI OR
`
`
`SEQUENTIALLY
`
`SEARCH THE OLD FILE
`
`FOR ALL MATCHING
`
`STRINGS LONGER
`
`THAN X BYTES
`
`
`RETURN LENGTH OF
`
`STRING WITH "COPY
`
`IMMEDIATE" STATUS
`
`MATCHING
`
`
`STRINGS
`
`RETURN LENGTH OF
`FOUND?
`
`INSERT AND COPY. WITH
`"INSERT AND COPY
`
`
`KEEP THE LENGTH
`(IMMEDIATE OR NEW
`
`
`
`
`POSITION)" STATUS
`AND LOCATION OF
`INCREMENT THE
`THE LONGEST STRING
`INSERT LENGTH
`
`
`
`FOUND
`COUNTER AND OLD
`
`
`FILE AND NEW FILE
` POINTERS
` CALCULATE THE COPY
`
`COST AND INSERT COST
`
`
`FUNCTION
`
` RETURN FROM SEARCH
`T90
` ATTEMPT
`
`RESYNCHRONIZATION
`INCREMENT POINTERS. ADD
`8 T0 INSERT COST,
`
`INSERT COST
`
`< COPY COST?
`
`
`
`IGNORE
`
`RESYNCHRONIZATION
`
`
`AND USE ORIGINAL
`
`"INSERT AND COPY FROM
`
`DID Z BY'TES
`NEW POSITION"
`
`
`
`MATCH?
`PARAMETERS
`
`
`
`
`
`IGNORE THE ORIGINAL
`
`"INSERT AND COPY
`
`T80
`YES
`
`FROM NEW POSITION",
`FIND THE LENGTH T75
`AND USE THE "INSERT
`
`OF THE MATCH
`AND COPY IMMEDIATE"
`PARAMETERS
`
`
`
`FIG. 58
`
`Unfied Patents Exhibit 1004
`
`Pg. 9
`
`Unfied Patents Exhibit 1004
`Pg. 9
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 8 0f 13
`
`5,832,520
`
`
`
`COMMAND
`
`
`W10
`
` CREATE LONG COUNT (SET THE FIRST
`
`
`LENGTH
`BYTE TO ZERO. APPEND THE LENGTH.
`OF COPY >127
`YES
`USING LONGFIELDWIDTH NUMBER OF
`
`
`BYTES?
`BYTES)
`
`
`
`W4
`
`NO
`
`COPY COMMAND BYTE
`EQUALS THE LENGTH
`
`APPEND THE POSITION
`BYTES
`
`
`W6
`
`
`
`
`
`
`APPEND THE COMMAND TO THE RAW DIFF
`FILE (SEE FIG. GO FOR COPY COMMAND
`FORMAT)
`
`W14
`
`END OF CREATE COPY
`COMMAND
`
`FIG. 6A
`
`START CREATE INSERT
`COMMAND
`
`W44
`
`
`
`LENGTH
`OF INSERT > 127
`BYTES?
`
`W50
`
`CREATE LONG COUNT (SET
`THE FIRST BYTE TO 0X80,
`APPEND THE LENGTH, USING
`LONGFIELDWIDTH NUMBER OF
`BYTES)
`
`
`
`
`
`
`
`
` YES
`
`
`
`
`NO
`
`W46
`
`
`
`
`
`INSERT COMMAND BYTE
`EQUALS THE LENGTH +
`0X80
`
`W48
`
`APPEND THE TEXT TO
`BE INSERTED
`
`APPEND THE COMMAND TO THE RAW DIFF
`FILE (SEE FIG. 60 FOR INSERT COMMAND
`FORMAT)
`
`
`
`
`
`END OF CREATE INSERT
`COMMAND
`
`W12
`
`W52
`
`FIG. 63
`
`Unfied Patents Exhibit 1004
`
`Pg. 10
`
`Unfied Patents Exhibit 1004
`Pg. 10
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 9 0f 13
`
`5,832,520
`
`COPYCOMMANDFORMATS
`IST BYTE
`# 0F BYTES = LONGFIELDWIDTH
`
`SHORT COUNT COPY
`OLD FILE POSITION
`(NORMALIZED LENGTH<=127)
`(XXXXXXX = NORMALIZED LENGTH OF THE COPY)
`
`# 0F BYTES =
`# OF BYTES =
`1ST BYTE
`LONGFIELDWIDTH
`LONGFIELDWIDTH
`LONGCOUNTCOPY ——
`(NORMALIZED LENGTH>127)
`00000000
`OLD FILE POSITION
`COPY LENGTH
`
`FIG. 6C
`
`INSERT COMMAND FORMATS
`
`SHORTCOUNT'NSERT
`(NORMALIZED LENGTH<=127)
`(XXXXXXX = NORMALIZED LENGTH OF THE INSERT)
`
`INSERT TEXT
`
`IST BYTE
`
`# OF BYTES = LENGTH OF INSERTION TEXT
`
`# OF BYTES =
`# 0F BYTES = LENGTH OF
`‘57 BYTE
`LONGFIELDWIDTH
`INSERTION TEXT
`LONGCOUNT'NSERT ——
`(NORMALIZED LENGTH>127)
`10000000
`INSERT LENGTH
`INSERT TEXT
`
`FIG. 60
`
`
`
`FIG. 10
`
`Unfied Patents Exhibit 1004
`
`Pg. 11
`
`Unfied Patents Exhibit 1004
`Pg. 11
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 10 0f 13
`
`5,832,520
`
`‘ '
`B I
`I I
`I
`I
`
`
`(OPTIMIZED INSERTION STRING
`
`
`U2
`
`DATABASE
`
`SEARCH THE RAW DIFF FILE FOR INSERT
`
`
`U4
`COMMANDS AND COPY STRING LENGTH
`
`
`AND A POINTER TO INSERTION STRING
`
`
`INDEX (ISI)
`
`USING THE COMPLETED ISI,
`
`CHECK THE LENGTH OF EACH
`STRING OF INSERTION TEXT
`
`
`(START AT THE FIRST STRING)
`
`U10
`
`
`
`END
`(OISD COMPLETED)
`
`U30
`
`FOR ANY INSERT COMMAND
`CODE INDICATING POINTER
`NOT RESOLVED, REPLACE THE
`POINTER WITH AN OISD
`OFFSET
`
`
`
`YES
`
`MORE ISI ENTRIES?
`
`
`
`
`
`
`
`
` INSERT
`
`
`
`OISD? NO
`U32
`
`GET THE STRING FROM THE RAW [J14
`DIFF FILE AND SEQUENTIALLY
`
`SEARCH THE EXISTING OISD FOR
`
`
`THE STRING
`
`
`STRING
`
`
`FOUND IN THE
`
`
`CHANGE THE ISI POINTER
`TO AN OFFSET INTO THE
`OISD WHERE THE STRING
`IS LOCATED AND SET
`COMMAND CODE TO IMP
`
`USE THE ISI TO SEARCH THE
`INSERT STRINGS IN THE RAw
`DIFF FILE AFI'ER THE CURRENT
`STRING
`
`U18
`
`.
`
`U20
`
`STRING
`
`
`FOUND AS PART 0
`
`
`ANOTHER INSERT STRING
`
`
`IN THE RAW
`
`
`DIFF FILE?
`
`
`
` "
`
`U22
`
`CHANGE THE ISI POINTER TO
`POINT TO THE BEGINNING OF
`OTHER ISI ENTRY WITH AN
`OFFSET INDICATING THE
`START OF THE INCLUDED
`STRING AND SET COMMAND
`CODE TO IMP (POINTER NOT
`RESOLVED)
`
`ADD THE STRING TO THE OISD
`AND SET COMMAND CODE TO ICP
`(INSERT AT THE CURRENT
`
`POSITION)
`
`FIG. 7A
`
`Unfied Patents Exhibit 1004
`
`Pg. 12
`
`U12
`
`
`
`LENGTH <=
`
`
`OSITION LENGTH?
`MINIMUN INSERT
`
`YES
`
`
`
`NO
`
`U26
`
`UPDATE STATISTICS
`FOR COUNT BITS
`REQUIRED FOR THIS
`
`REPLACE THE FIRST BYTE
`
`OF THE INSERTION TEXT IN
`THE RAW DIFF FILE WITH
`THE COMMAND CODE FOR
`THAT INSERT COMMAND
`
`Unfied Patents Exhibit 1004
`Pg. 12
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 11 of 13
`
`5,832,520
`
`ISI (INSERTION STRING INDEX) PRIOR
`TO BUILDING OISD
`
`I:
`Li
`E
`D
`D
`E
`3 E
`3 E
`a g
`5 I2
`EH
`E E
`z
`z
`0
`:9 I
`==
`t 9
`E
`*5
`HIE
`'5
`{3
`LIJE
`ISIGROWS
`0
`{LE
`1'—
`E w
`E
`DOWN
`E E
`E
`E E
`E
`| ‘5 a I ‘ I
`o z
`-'
`8 z
`4
`
`MEMORY
`. . . 0 .
`
`MOST RECENT
`INSERT
`COMMAND
`
`SECOND INSERT
`COMMAND
`
`FIRST INSERT
`COMMAND
`
`ISI (INSERTION STRING INDEX)
`DURING BUILDING OF OISD
`
`o I-
`9 ><
`O E
`9 5
`I— -
`3,4 IE
`1L}... 3
`
`03
`
`a
`a
`3
`'5
`5
`
`O'
`
`9
`-
`E
`5
`E a E
`l‘c's z I-
`c, E "’
`5 5 5
`E E E
`o
`'2 E E z
`5 w :I: 8
`“ZS"-
`
`o I-
`<2 35
`8 "
`E
`t 5
`I:
`g
`“(,1 m
`Lu
`E $
`00.. OZ "
`
`0
`
`\
`THE OISD GROWS UP
`'
`
`F47
`
`0000
`
`OISD
`
`/
`
`
`
`
`
`
`
`DOWN
`INSERT
`SECOND
`
`COMMAND N
`INSERT
`COMMAND
`
`THIRD INSERT
`
`COMMAND
`FIRST
`
`'
`FIG 7C
`
`COMMAND
`INSERT
`
`REPLACE FIRST BYTE OF
`THE INSERTION TEXT
`STRING WITH THE
`COMMAND CODE.
`
`cc
`
`'61
`
`2
`
`INSERTION
`TEXT
`STRING
`
`3
`'c 2
`CC
`CC = COPY COMMANDS
`
`INSERTION
`TEXT
`4
`'c 3
`CC
`STRING
`IC = INSERT COMMANDS
`
`INSERTION
`TEXT
`STRING
`
`' ° ' .
`ENDS HERE
`
`RAW DIFF FILE DURING CONSTRUCTION OF 0150
`
`FIG. 70
`
`Unfied Patents Exhibit 1004
`
`Pg. 13
`
`Unfied Patents Exhibit 1004
`Pg. 13
`
`

`

`US. Patent
`
`Nov. 3, 1998
`
`Sheet 12 0f 13
`
`5,832,520
`
`HUFFMAN
`DECODE
`HEADER TREES
`
`INITIAL
`COMMAND
`STATE
`
`ENCODED COMMAND.
`
`SEQUENCE
`
`OISD (INSERTION
`TEXT)
`
`CCP ICP CCP CMP IMP CCP CMP CMP I I O 3674QBC3DF67 I I O I
`
`I
`
`COMMANDS
`MINIMIZED
`
`T
`
`THE OPTIMIZED INSERTION
`STRING DATABASE
`
`FIG. 8A
`
`NEXT STATE
`
`mm-
`
`CURRENT
`STATE
`
`
`(FOLLOWED BY
`
`ICP)
`
` CCP
`
`
`1 o
`
`o
`
`1 1 Q) = PRESENT STATE
`
`(PRESENT COMMAND INTERPRETATION)
`
`NOTES:
`
`/X\ = COMMAND CODE THAT TRANSITIONS
`TO THE NEXT STATE
`
`FIG. 88
`
`Unfied Patents Exhibit 1004
`
`Pg. 14
`
`Unfied Patents Exhibit 1004
`Pg. 14
`
`

`

`US. Patent
`
`Nov.
`
`3, 1998
`
`Sheet 13 0f 13
`
`5,832,520
`
`405
`
`408
`
`START REVIT PROCESS
`
`DECOMPRESS DIFFERENCE FILE
`
`410
`
`VALIDATE CHECKSUMS OF OLD AND DIFF
`FILES
`
`415
`
`425
`
`430
`
`UNPACK HUFFMAN DECODE TREES
`
`INITIALIZE POINTERS
`
`UNPACK FIRST COMMAND AND INITIALIZE
`STATE MACHINE
`
`710
`
`UNPACK NEXT
`COMMAND
`
`435
`
`DECODE THE COMMAND TYPE
`
` 500
`
`_
`
`660
`UNPACK NEw INSERT
`POSITION OFFSET
`
`DECODE LENGTH
`COUNT
`
`550
`DECODE LENGTH
`COUNT
`
`650
`600
`
`DECODE LENGTH
`DECODE LENGTH
`
`
`COUNT
`
`560
`
`UNPACK NEW COPY
`POSITION OFFSET
`
`570
`
`670
`
`NO 510
`
`COPY "LENGTH
`COPY "LENGTH
`COPY "LENGTH
`COUNT" BYTES FROM
`
`
`COUNT" BYTES FROM
`COUNT" BYTES FROM
`
`NEw COPY POSITION
`
`
`
`P_0LD FILE TO P_NEW
`
`
`NEW INSERT
`COUNT' BYTES FROM
`IN OLD FILE TO P NEw
`AND ADJUST
`
`
`AND ADJUS?
`P_IT TO P_NEW AND
`POSITION TO P_NEW
`
`
`
`POINTERS
`ADJUST POINTERS
`AND ADJUST
`
`POINTERS
`
`
`POINTERS
`
`
`COPY "LENGTH
`
`
`
`
`
`'VE ALL TH
`WAS LAST
`
`
`COMMANDS IN THE
`
`
`COMMAND CCP
`DIFF FILE BEEN
`
`FOLLOWED BY |CP7
`EXECUTED?
`
`
`
`YES
`
`800
`
`VALIDATE CHECKSUM
`FOR THE COMPLETED
`NEW FILE.
`
`
`
`NEW FILE
`850
`
`RECONSTRUCTION IS
`COMPLETED.
`
`
`
`FIG. 9
`
`Unfied Patents Exhibit 1004
`
`Pg. 15
`
`Unfied Patents Exhibit 1004
`Pg. 15
`
`

`

`This application claims the benefit of a provisional appli—
`cation Ser. No. 60/021,457, filed Jul. 3, 1996 now pending
`filed with a source code appendix consisting of 22 pages.
`The appendix contains program source code in the C++
`programming language for software modules that embody
`aspects of the current invention.
`The present invention relates to the field of data files used
`by computers. More specifically,
`the present
`invention
`relates to a system for creating, updating or revising large
`computer files by using only a small file containing indica-
`tions of the differences between the large computer files and
`a preexisting computer file.
`The present invention is motivated in part by changes that
`have been occurring in the personal computer industry over
`the last several years.
`Increases in performance and
`decreases in cost have led to a proliferation of computer
`equipment in homes and ollices. This computer equipment
`has in turn spawned a burgeoning market for software
`modules that cause the equipment to operate in a desired
`manner. In recent years, the software modules have become
`larger and larger as the price of computer memory and the
`storage space needed to hold these software modules have
`become cheaper and cheaper. This has allowed for the
`development and sale of far more complex executable
`program codes to accomplish various functions such as word
`processing, spreadsheets, multimedia or any other use for a
`computer. In addition to executable files, more and more
`complex text and multimedia files, as well as database files,
`are commonly being used and distributed or archived in
`home and office computer systems.
`These large files are distributed from software manufac-
`turers to users via a number of different means, including
`being preloaded on a computer’s hard drive before the
`computer is purchased, being shipped on a fixed medium
`such as a floppy disk or CD ROM, or being distributed
`through a transmission medium such as a dial-up telephone
`service, a BBS, or the Internet.
`It is the nature of computer software and other large files
`that it is often desirable to update or revise files in order to
`correct errors or add features. Sometimes these revisions
`may be relatively minor, involving changes in only a small
`percentage of the data that makes up the file.
`One obstacle to the frequent revision of large computer
`files by a manufacturer is the cost of delivering the updated
`file to the user. If an entire new revised file must be
`delivered, the amount of data can be substantial. Large files _
`typically are as large as ten million characters (10
`Megabytes) or larger. Distribution of such liles on floppy
`disk can require a relatively large amount of disk space.
`Distribution of such large files over a medium such as the
`Internet can take an undesirably long time from the point of
`view of the customer and can consume a large amount of
`server resources from the point of view of the file provider.
`One solution to the problem of distributing large com-
`puter files is use of compression. A number of standard
`compression algorithms are in existence and are commonly
`used today. These algorithms typically achieve compression
`of a large executable file down to between 40% to 60% of
`its original file size and can compress some types of text files
`even further, thereby reducing the transaction costs of ship-
`ping the file. However, for very large computer files or
`collections of files, even a compressed file reduced to 40%
`still represents a substantial transmission cost.
`
`10
`
`o;O
`
`35
`
`40
`
`45
`
`60
`
`1
`AUTOMATIC FILE DIFFERENCING AND
`UPDATING SYSTEM
`BACKGROUND OF THE INVENTION
`
`5,832,520
`
`2
`Another method useful for transmitting updated liles is
`using a technique known as a differencing program or
`comparator program to compare an old file to a new revised
`file in order to determine how the files differ. One such file
`system is distributed as part of the GNU UNIX—like oper—
`ating system through tools referred to as diff and patch and
`
`described in standard GNU documentation. The described
`
`
`
`system discusses a way to use a di erencing program to
`generate a patch file, and then using that patch file in
`combination with the old file to generate a newly revised
`file. While the GNU revision system has some applications
`Within the UN IX-like operating system within which it was
`developed, it has not been generalizable in the new envi-
`ronment of personal computer systems. The most commonly
`available versions of the system are limited to text files, and
`achieve only limited compression. These programs cannot
`effectively handle files where a number of fairly complex
`
`
`changes have occurred, such as a number of block moves or
`
`random shu ling of text strings. These programs also do not
`produce the smallest patch file possible.
`
`What is needed is a method and system for generating a
`
`
`di erence fi e from an old file and a new file, where that
`
`di ference ile indicates,
`in minimal number of bytes,
`changes between the old file and the new file. The needed
`system would allow users to then transmit the difference file
`to a seconc computer system or to a backup or archive
`storage system (system 2), and to use that difference file and
`
`
`the old file along with a decoding process to generate a
`
`
`newly revised file. The di erence file could also be stored
`locally, allowing a number of versions of the same file to be
`saved without duplicating redundant information. Ideally,
`the difference file would be the smallest possible difference
`file, achieving compression density of perhaps 10% or less
`of the original file, even with a moderate number of changes
`between the two files.
`SUMMARY OF THE INVENTION
`
`
`
`
`
`The present invention comprises a software system with
`
`several components, a method, and a file structure for
`
`
`
`generating very e icient difference files (sometimes abbre-
`viated DIFF file) from an old file and a new file so that a
`difference file can be transmitted to a second computer
`system where the difference file and a duplicate of the old
`file can quickly be used to create a copy of the new file,
`duplicating the new file as it existed on the first computer
`system. The difference file could also be stored locally to the
`first computer system, allowing the new file to be duplicated
`from the old file without storing the new file.
`According to the oresent invention, a differencing process
`
`on a first computer system compares an old file and a new
`
`
`
`
`file to generate a di erence file. In this process, the old file
`
`
`is used essentially as a database of byte strings. The di er-
`encing process reaes strings of data from the new file and
`
`searches for the existence of those strings in the old file. The
`di erencing process notes the locations in the old file in
`
`which strings in the new file are found and stores in a
`
`di erence file an inc ication of the location where a matching
`string is found and an indication of the length of the
`matching string found in the old file. This information is
`stored in the difference file in a copy command. When the
`di erencing process encounters strings of characters in the
`new file that are not found in the old file, the differencing
`process adds those strings to an insert database and adds an
`inc ication in the difference file of the location in the insert
`database where the strings can be found and the length of the
`string. This location and length are stored in an insert
`command.
`
`
`
`
`
`
`Unfied Patents Exhibit 1004
`
`Pg. 16
`
`Unfied Patents Exhibit 1004
`Pg. 16
`
`

`

`5,832,520
`
`
`
`3
`According to a specilic embodiment of the invention, the
`differencing process, upon opening the old file, creates an
`index (or hash table) of all of the character strings of
`predetermined length found in the old file, along with the
`locations at which those character strings were found, in
`order to facilitate searching for character strings from the
`new file. According to a further embodiment, the index (or
`hash table) is created only if the differencing process detects
`that there is sufficient memory on the first computer to hold
`the index.
`
`
`The present invention also comprises a file structure for
`
`
`the difference file that allows the di erence file to hold
`information from which to construct a copy of the new file
`using a duplicate of the old file while occupying the least
`number of bytes. According to specific embodiments, this
`difference file is built in a multistep process to minimize the
`size of the difference ile.
`
`The method according to the invention attempts to mini-
`
`mize the size of the di ‘erence file by a variety of techniques,
`any group of which may be incorporated into specific
`embodiments. According to a specific embodiment,
`the
`entire old file, rather han a limited portion of the data, is
`used as a database. This helps produce a smaller dilIerence
`file by potentially finding more data to copy from the old file.
`In general, as much data as possible is copied from the old
`file, unless it takes fewer bits to insert the data. Copying data
`from the old file requires only a command code, whereas
`inserting data requires a command, plus the actual insertion
`data. The invention makes decisions about whether to copy
`or insert data by using a search algorithm that can do the
`following: use a “current positions” pointer into the old file
`for copying data, which eliminates an explicit position field
`if data can be copied from the current position; favor
`copying from the “current position” in the old file; search the
`entire old file for data not found at the “current position” and
`if the data is found elsewhere in the old file, copy from that
`position (unless it “costs more” than inserting enough data
`to allow the next copy from the “current position”); and
`insert data if a copy was not chosen.
`According to an embodiment, the length of the minimum
`data string searched for in the old file is selected as “N" bytes
`and is dependent on where the data is copied from. The copy
`from current position command uses a shorter minimum
`data string than copy from another position because it has
`been encoded to require fewer bits.
`According to further embodiments, commands and count
`fields of the difference file are encoded. Commands may be
`encoded using a “state machine”, where certain commands
`are implied by the sequence of previous commands, and
`count fields may be encoded with a “cascaded” count field
`method. Hullman encoding of smaller count values may be
`employed to further reduce the difference file size, and
`counts larger than the HuJIman—encoded counts use three
`progressively larger count fields. The length of these larger
`count fields is variable for each command type, and is
`modified based on statistics gathered during the construction
`of each individual difference file.
`
`According to further embodiments, all command, count
`and position fields are “bit-packed” to eliminate unused
`
`“filler” bits, and redundant data strings are removed from the
`
`
`insertion data before being appended to the di erence file.
`The entire minimized difference file (minimized by the
`techniques mentioned above) may be finally compressed
`(using a “well-known” compression algorithm- ike “zip”-or
`proprietary compression technique) to reduce the file size.
`According to an embodiment of the invention, execution
`time of the differencing step is important only in that the
`
`
`
`10
`
`o;O
`
`35
`
`40
`
`45
`
`LIIo
`
`60
`
`4
`dillerence method should not take an unreasonable time to
`execute (overnight may be OK in many cases). An index or
`hash table may be used to speed searching, but is not
`necessary for the differencing process (and will not be built
`if sufficient memory is unavailable).
`invention is
`In this patent application,
`the present
`described with reference to specific embodiments. It will be
`understood by anyone skilled in the programming art that
`many variations on the basic system and method of the
`present invention are possible within a computer environ-
`ment. The invention therefore should not be construed as
`limited except as provided in the attached claims.
`For example, conventional computer systems today
`encode data as a collection of two-state biliary units known
`at bits. Most current computers group these bits into 8—bit
`groups known as bytes, also referred to as characters. A
`sequence of bytes or characters is commonly referred to as
`a string. These terms are used in accordance with their
`accepted meaning in the art in this application, but it should
`
`be understood that the techniques of the invention could be
`
`
`
`used in different types of computing systems having di er-
`ent means for encoding and organizing data.
`Also, within the art, the terms “text” and “string” are
`sometimes used in a particular way to describe computer
`encoded alphanumeric data, and at other times these terms
`are used very broadly to denote a sequence of data values
`that could represent anything: text, a number, a piece of an
`image, sound, etc. In the present description, these terms and
`other terms used in the art are intended to be given their
`broadest meaning.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a block diagram showing the general differenc-
`ing and revision process according to the current invention.
`FIG. 2 is a block diagram of a differencing process and its
`components according to an embodiment of the invention.
`
`
`FIG. 3 is a flow chart illustrating a method used by the
`
`di
`erencing processor according to the present invention.
`FIG. 4A is a diagram of a Text String Index (TSI)
`according to one embodiment of the present invention.
`FIG. 4B is a diagram of a hash table TSI according to a
`preferred embodiment of the present invention.
`
`FIG. 5A is a flow chart illustrating a method used by the
`i erencing processor according to the present invention to
`build the raw dilIerence file.
`
`FIG. 5B is a flow chart illustrating a method used by the
`' erencing processor to search for strings in the old file
`according to the present invention.
`FIGS. 6A—6D show flow charts and command structures
`i lustrating copy and insert commands inserted into the raw
`' erence file according to the present invention.
`FIGS. 7A77D show a flow chart and index and file
`structures illustrating creating the optimized insert string
`catabase for including into the final difference file according
`to the present invention.
`FIGS. SA—SB illustrate the file structure for the final
`
`
`
`ci erence file and a state machine and command format for
`command encoding in the final difference file according to
`an embodiment of the present invention.
`FIG. 9 is a flow chart showing the method of the revision
`processor according to the current invention.
`FIG. 10 shows a computer system incorporating the
`invention.
`
`
`
`
`
`Unfied Patents Exhibit 1004
`
`Pg. 17
`
`Unfied Patents Exhibit 1004
`Pg. 17
`
`

`

`5,832,520
`
`10
`
`5
`DETAILED DESCRIPTION OF THE DRAWINGS
`Overview
`An overview of the process according to the invention is
`illustrated in FIG. 1. FIG. 1 illustrates a first computer
`system 1 and a second computer system 2 which commu—
`nicate via a communication path 5. Both computer systems
`1 and 2 can be any collection of computing devices oper-
`ating together whatsoever as is known in the art. Computer
`systems 1 and 2 can also refer to the same computer system,
`or components within the same computer system. Commu-
`nication path 5 may be any means by which a file can be
`communicated between computer system 1 and computer
`system 2, including a removable fixed medium such as a
`Iloppy disk or CD ROM disk, or a communication medium
`such as an interoffice network, a telephone line, a bus, or the
`Internet. Path 5 also might encompass an electronic mail
`message.
`As shown in FIG. 1, computer system 1 includes an old
`file 10, a new file 20, and a differencing processor 100 which
`generates a difference file 30. New file 20 is generally
`somewhat similar to old file 10, containing some changes in
`data which reflect a revision of new file 20. Old file 10 and
`new file 20 could be any collection of computer data
`whatsoever, including an executable application program, an
`operating system program, a text file, a spreadsheet or other
`data file, an image file, or any other type of computer-
`
`readable data.
`
`
`
`Di erence processor (Difflt) 100 reads new file 20 and
`compares it to old file 10 by a process described below.
`DiJIerence process 100 then stores indications of the data to
`be copied from old file 10 or inserted from new file 20 into
`difference file 30. According to the invention, when new file
`20 is a revised version of old file 10, dill'erenc/e file 30 will
`be substantially smaller than either new file 20 or old file 10,
`in some cases, only ten percent or less than the size of new
`file 20.
`According to the invention, difference file 30 may then be
`transmitted over path 5 to computer system 2 where a
`revision process (RevIt) 200 reads a duplicate 15 of old file
`10 and difference file 30 and creates a copy 25 of new file
`20 on computer system 2, According to the invention, copy
`25 (designated the New* file) is identical to new file 20
`created on computer system 1.
`DiJIerencing Process Overview
`FIG. 2 is a block diagram illustrating the process of
`differencing process (DiffIt process) 100. DiftIt process 100
`uses old file 10 and new file 20 to generate difference file 30.
`An optional index builder 105 may be used to build a text
`string index (TSI) 106 to speed building the difference file.
`Search engine 104 reads strings of data from new file 20
`and attempts to locate string matches in old file 10 for each
`string found in new file 20, If an index 106 is present, it is .
`used by search engine 104 to increase searching speed.
`When search engine 104 finds a match in old file 10 for
`a string from new file 20, it indicates this by placing a copy
`command into a raw difference file 31 (RawDiff), The copy
`command includes an indication of where the string is found
`in 10 and the length of the string. When search engine 104
`does not find a match in old file 10 for a string from new file
`20, it indicates this by placing an insert command into a raw
`difference file 31 (RawDiff). The insert command includes
`the text for the string that was not found in old file 10.
`According to an embodiment of the invention, an opti—
`mized insertion string database (OISD) engine 101 examines
`the raw DIFF file 31 to create an OISD 110, using an
`insertion string index 113. A command encoder 102 and a
`count encoder 103 encode command codes and count fields
`into the final diIIerence file 30 which also includes OISD
`110.
`
`
`
`6
`The details of the operation of the elements shown in FIG.
`2 according to specific embodiments will now be described.
`Dilerencing Process Flowchart
`FIG. 3 is a detailed flow chart of the operatio

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket