throbber
United States Patent [19J
`Miller
`
`111111111111111111111111111111111111111111111111111111111111111111111111111
`US005832520A
`[11] Patent Number:
`[45] Date of Patent:
`
`5,832,520
`Nov. 3, 1998
`
`[54] AUTOMATIC FILE DIFFERENCING AND
`UPDATING SYSTEM
`
`"Comparing and Merging Files", from the World Wide Web
`(Jun. 8, 1996).
`
`[75]
`
`Inventor: William A. Miller, Gualala, Calif.
`
`[73] Assignee: Miller, Call, Plauck and Miller,
`Portola Valley, Calif.
`
`[21] Appl. No.: 754,486
`
`[22] Filed:
`
`Nov. 22, 1996
`
`Int. Cl.6
`...................................................... G06F 17/30
`[51]
`[52] U.S. Cl. .............................................................. 707/203
`[58] Field of Search ............................................... 707/203
`
`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,278,979
`5,418,945
`5,604,853
`5,623,656
`5,644,709
`
`1!1994 Foster eta!. ............................ 707/203
`5/1995 Carter et a!. ... ... ... ... ... ... .... ... ... ... . 707/8
`2/1997 Nagashima .............................. 707/540
`4/1997 Lyons . ... ... ... ... .... ... ... ... ... ... .... ... . 707/10
`7/1997 Austin ................................ 395/187.01
`
`01HER PUBLICATIONS
`
`Carmen, et al., Introduction to Algorithms, The MIT Press
`(1990).
`Pocket Soft, Inc., White Paper re .RTPatch Professional
`Binary Update System, from the World Wide Web (http://
`www.pocketsoft.com) (Nov. 14, 1996).
`
`Primary Examiner-Wayne Amsbury
`Attorney, Agent, or Firm-Townsend and Townsend and
`Crew LLP
`
`[57]
`
`ABSTRACT
`
`A method and file structure for generating an efficient
`difference files from and old file and a new file so that a
`difference file can be transmitted to a second computer
`system where the difference file and a duplicate of the old
`file can quickly be used to create a copy of the new file is
`disclosed. A differencing process compares an old file and a
`new file to generate a difference file in which the old file is
`used as a database of byte strings. The differencing process
`reads strings of data from the new file, searches for the
`existence of those strings in the old file, and notes the
`locations in the old file in which strings in the new file are
`found and stores in a difference file an indication of the
`location where a matching string is found and an indication
`of the length. A specific file structure for the difference file
`is disclosed.
`
`28 Claims, 13 Drawing Sheets
`
`COPY
`
`T6
`DETERMINE WHETHER
`A COPY, INSERT OR INSERT/COPY
`OPERATION WILL BE PERFORMED
`(AND THE COUNT
`AND POSITION
`INFORMATION)
`
`INSERT
`
`NO
`
`,---_._ ___ _,T10
`CREATE INSERT COMMAND
`AND INSERT INTO RAW DIFF
`FILE
`
`UPDATE STATISTICS
`FOR COUNT AND
`POSITION BITS
`REQUIRED FOR THIS
`COPY
`
`T12
`
`NO NO= SEARCH HAS ALREADY
`RETURNED THE NEXT COPY
`AFTER AN INSERT
`
`~
`L l -----+(• (
`.
`
`RAW DIFF FILE IS )c..·------''
`
`~
`
`COMPLETE
`
`_ 1~
`
`SAP Exhibit 1004, Page 1 of 25
`
`

`

`5,832,520
`U.S. Patent
`Sheet 1 of 13
`Nov. 3, 1998
`,-----------------------------------
`
`1
`I
`I
`
`1
`
`~
`
`OLD FILE
`
`NEW FILE
`
`10
`
`100
`
`20
`
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`30
`'------------------
`COMMUNICATION
`PATH
`
`2
`
`~----------------------
`
`~
`
`30
`
`15
`
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`5
`NEW* FILE
`:
`1
`I
`I
`I ___________________________________ J
`
`FIG. 1
`
`SAP Exhibit 1004, Page 2 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 2 of 13
`
`5,832,520
`
`OLD FILE
`
`105
`
`INDEX
`BUILDER
`
`106
`
`OISD
`
`~
`113
`
`~
`110
`
`10
`
`COMMAND
`1 oz---.___--r--_ __.
`ENCODE
`
`COUNT
`ENCODE
`10~L---....,...-----'
`
`3
`
`FIG. 2
`
`SAP Exhibit 1004, Page 3 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 3 of 13
`
`5,832,520
`
`82
`
`84
`
`( START THE DIFFIT
`PROCESS •
`I INPUT OLD FILE AND NEW
`FILE
`~
`I FORM CHECKSUMS FOR '("""'56
`
`OLD FILE AND NEW FILE
`~
`BUILD THE TSI (TEXT STRING r-810
`INDEX) STRUCTURE FOR THE
`OLD FILE
`
`~
`
`,--.· 812
`
`814
`v-·
`
`BUILD THE RAW DIFF (DIFFERENCE) FILE
`INCLUDING A SEQUENCE OF COPY
`COMMANDS, INSERT COMMANDS AND
`INSERTION STRINGS
`~
`BUILD THE OISD (OPTIMIZED INSERTION
`STRING DATABASE). THE OISD IS A
`DATABASE OF INSERTION STRINGS WITH
`CERTAIN REDUNDANT STRINGS REMOVED.
`~
`ENCODE THE COMMAND SEQUENCE FROM
`THE RAW DIFFERENCE FILE AND PLACE IT
`IN THE DIFFERENCE FILE.
`~
`APPEND THE (HUFFMAN) COMMAND
`DECODE TREES AND THE INITIAL COMMAND
`STATE TO THE DIFFERENCE FILE.
`~
`818
`APPEND THE OPTIMIZED INSERTION TEXT lr-'
`STRINGS (FROM THE OISD) TO THE
`DIFFERENCE FILE
`
`816
`
`817
`r--'
`
`+
`
`ADD A HEADER TO THE DIFF FILE WHICH
`CONTAINS CHECKSUMS AND OTHER
`INFORMATION ABOUT THE OLD, NEW AND
`DIFF FILES.
`
`20
`f----8
`
`+
`FORM THE CHECKSUM FOR THE DIFF FILE ,--.8
`22
`AND ADD TO THE HEADER. THE DIFF FILE IS
`NOW COMPLETE. OPTIONALLY COMPRESS
`THE DIFFERENCE FILE.
`
`+
`
`DIFFIT PROCESS
`COMPLETED
`
`824
`
`FIG. 3
`
`SAP Exhibit 1004, Page 4 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 4 of 13
`
`5,832,520
`
`FOUR BYTES OF THE OLD
`FILE DATA STARTING AT THE
`OFFSET.
`
`BYTE OFFSET INTO THE OLD
`FILE (FIELD IS 2,3 OR 4 BYTES
`LONG DEPENDING UPON OLD
`FILE SIZE).
`
`AN EXAMPLE
`OFAN
`UNSORTED
`TSI
`
`AN EXAMPLE
`OFA
`SORTED TSI
`
`22334455
`33445566
`
`000001
`000002
`
`OADF3452
`_....-.DF345278
`
`N-4
`N-3
`
`LAST FOUR BYTES IN THE
`FILE.
`
`(N IS THE OLD FILE
`LENGTH)
`
`00000000
`00000000
`00000001
`
`OF3538
`003780
`13BC73D
`
`FFFFFFFE
`FFFFFFFF
`
`003827
`183F56
`
`FIG. 4A
`
`SAP Exhibit 1004, Page 5 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 5 of 13
`
`5,832,520
`
`107 (HASH HEAD TABLE)
`
`OLD FILE STRING
`STARTING AT
`POSITION SHOWN
`~-----~
`
`HASH
`FUNCTION
`
`.1..Q.8. (HASH CHAIN TABLE)
`CORRESPONDS TO
`
`1Q (OLD FILE)
`
`NULL POINTER
`
`CORRESPONDS TO
`
`CORRESPONDS T
`
`CORRESPONDS TO
`
`CORRESPONDS TO
`
`X
`
`X
`
`NULL
`
`FIG. 48
`
`SAP Exhibit 1004, Page 6 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 6 of 13
`
`5,832,520
`
`INITIALIZE P OLD AND P NEW
`TO THE BEGJNNING OF EACH
`FILE
`
`SEARCH THE OLD FILE FOR
`THE STRING AT THE CURRENT
`POSITION IN THE NEW FILE
`(SEE FIG. 58).
`
`T2
`
`:r4
`
`COPY
`
`DETERMINE WHETHER
`A COPY, INSERT OR INSERT/COPY
`OPERATION WILL BE PERFORMED
`(AND THE COUNT
`AND POSITION
`INFORMATION).
`
`INSERT
`
`NO
`
`TB
`CREATE COPY
`COMMAND AND INSERT 14 - - - - - - - - ,
`INTO RAW DIFF FILE
`
`~----~L-----~T10
`CREATE INSERT COMMAND
`AND INSERT INTO RAW DIFF
`-
`FILE
`
`UPDATE STATISTICS
`FOR COUNT AND
`POSITION BITS
`REQUIRED FOR THIS
`COPY
`
`T12
`
`NO NO = SEARCH HAS ALREADY
`RETURNED THE NEXT COPY
`AFTER AN INSERT
`
`~~----------~-c ..... __ RA_W __ D_IF-F-F-1-LE--IS __ )~
`
`_
`
`YES
`I
`
`YES
`
`_
`
`COMPLETE
`
`FIG. 5A
`
`SAP Exhibit 1004, Page 7 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 7 of 13
`
`5,832,520
`
`T18
`
`NO
`
`T40
`
`USE THE TSI OR
`SEQUENTIALLY
`SEARCH THE OLD FILE
`FOR ALL MATCHING
`STRINGS LONGER
`THAN X BYTES
`
`YES
`
`KEEP THE LENGTH
`AND LOCATION OF T6Q
`THE LONGEST STRING
`FOUND
`
`T62
`
`T58
`
`INCREMENT THE
`INSERT LENGTH
`COUNTER AND OLD
`FILE AND NEW FILE
`POINTERS
`
`T35
`
`NO
`
`RETURN LENGTH OF
`STRING WITH "COPY
`IMMEDIATE" STATUS
`
`T85
`RETURN LENGTH OF
`INSERT AND COPY, WITH
`"INSERT AND COPY
`(IMMEDIATE OR NEW
`POSITION)" STATUS
`
`RETURN FROM SEARCH
`FUNCTION
`T90
`
`ATTEMPT
`RESYNCHRONIZA TION
`INCREMENT POINTERS, ADD
`8 TO INSERT COST.
`
`IGNORE
`RESYNCHRONIZATION T68
`AND USE ORIGINAL
`"INSERT AND COPY FROM
`NEW POSITION"
`PARAMETERS
`
`T80
`
`IGNORE THE ORIGINAL
`"INSERT AND COPY
`FROM NEW POSITION",
`AND USE THE "INSERT
`AND COPY IMMEDIATE"
`PARAMETERS
`
`FIG. 58
`
`YES
`
`T75
`
`SAP Exhibit 1004, Page 8 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 8 of 13
`
`5,832,520
`
`W10
`CREATE LONG COUNT (SET THE FIRST
`BYTE TO ZERO, APPEND THE LENGTH,
`USING LONGFIELDWIDTH NUMBER OF
`BYTES)
`
`YES
`
`W4
`
`W6
`
`wa
`
`NO
`
`W50
`CREATE LONG COUNT (SET
`THE FIRST BYTE TO OXBO,
`APPEND THE LENGTH, USING
`LONGFIELDWIDTH NUMBER OF
`BYTES)
`
`W12
`
`APPEND THE COMMAND TO THE RAW DIFF
`FILE (SEE FIG. 6C FOR COPY COMMAND
`FORMAT)
`
`W14
`
`FIG. 6A
`
`W44
`
`YES
`
`NO
`
`W46
`
`INSERT COMMAND BYTE
`EQUALS THE LENGTH +
`oxao
`
`W48
`
`W52
`
`APPEND THE COMMAND TO THE RAW DIFF
`FILE (SEE FIG. 6D FOR INSERT COMMAND
`FORMAT)
`
`W54
`
`FIG. 68
`
`SAP Exhibit 1004, Page 9 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 9 of 13
`
`5,832,520
`
`COPY COMMAND FORMATS
`I Oxxxxxxx I OLD FILE POSITION
`
`1ST BYTE
`
`# OF BYTES = LONGFIELDWIDTH
`
`SHORT COUNT COPY
`(NORMALIZED LENGTH<=127)
`
`(XXXXXXX = NORMALIZED LENGTH OF THE COPY)
`
`LONG COUNT COPY
`(NORMALIZED LENGTH>127)
`
`#OF BYTES=
`LONGFIELDWIDTH
`
`1ST BYTE
`
`I 00000000 I OLD FILE POSITION
`FIG. 6C
`
`#OF BYTES=
`LONGFIELDWIDTH
`
`COPY LENGTH
`
`INSERT COMMAND FORMATS
`
`SHORT COUNT INSERT
`(NORMALIZED LENGTH<=127)
`
`#OF BYTES= LENGTH OF INSERTION TEXT
`
`1ST BYTE
`
`l1xxxxxxx I
`
`INSERT TEXT
`
`(XXXXXXX = NORMALIZED LENGTH OF THE INSERT)
`
`LONG COUNT INSERT
`(NORMALIZED LENGTH>127)
`
`#OF BYTES=
`LONGFIELDWIDTH
`
`1ST BYTE
`
`110000000 I INSERT LENGTH
`FIG. 60
`
`# OF BYTES = LENGTH OF
`INSERTION TEXT
`
`INSERT TEXT
`
`/'100
`
`r;;;\910
`
`\J
`
`0
`
`0
`
`0 0
`
`FIG. 10
`
`SAP Exhibit 1004, Page 10 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 10 of 13
`
`5,832,520
`
`U2
`
`SEARCH THE RAW DIFF FILE FOR INSERT U4
`COMMANDS AND COPY STRING LENGTH
`AND A POINTER TO INSERTION STRING
`INDEX(ISI)
`
`FOR ANY INSERT COMMAND U30
`CODE INDICATING POINTER
`NOT RESOLVED, REPLACE THE
`POINTER WITH AN OISD
`OFFSET
`
`USING THE COMPLETED lSI,
`CHECK THE LENGTH OF EACH
`STRING OF INSERTION TEXT 1+---YES;----<
`(START AT THE FIRST STRING)
`
`U10
`
`U12
`
`U26
`
`UPDATE STATISTICS
`FOR COUNT BITS
`REQUIRED FOR THIS
`INSERT
`
`NO
`
`GET THE STRING FROM THE RAW U14
`DIFF FILE AND SEQUENTIALLY
`SEARCH THE EXISTING OISD FOR
`THE STRING
`
`REPLACE THE FIRST BYTE
`OF THE INSERTION TEXT IN
`THE RAW DIFF FILE WITH
`THE COMMAND CODE FOR
`THAT INSERT COMMAND
`
`U2
`
`u 7
`
`YES
`
`CHANGE THE lSI POINTER
`TO AN OFFSET INTO THE
`OISD WHERE THE STRING
`IS LOCATED AND SET
`COMMAND CODE TO IMP
`
`NO
`
`USE THE lSI TO SEARCH THE U 18
`INSERT STRINGS IN THE RAW
`DIFF FILE AFTER THE CURRENT
`STRING
`
`U32
`
`CHANGE THE lSI POINTER TO
`POINT TO THE BEGINNING OF
`OTHER lSI ENTRY WITH AN
`OFFSET INDICATING THE
`START OF THE INCLUDED
`STRING AND SET COMMAND
`CODE TO IMP (POINTER NOT
`RESOLVED)
`
`ADD THE STRING TO THE OISD
`ANDSETCOMMANDCODETOICP~------------------------~
`(INSERT AT THE CURRENT
`POSITION)
`
`FIG. 7A
`
`SAP Exhibit 1004, Page 11 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 11 of 13
`
`5,832,520
`
`lSI (INSERTION STRING INDEX) PRIOR
`TO BUILDING OISD
`I
`
`u.
`u.
`B
`~~--
`~~
`Oz
`1-Q
`Iii~
`cnw
`tf:cn
`I oz I
`
`::c:
`1-
`(!)
`z
`w
`....1
`
`••••• \
`
`lSI GROWS
`DOWN
`
`Oil
`
`MOSTRE~
`INSER~ENT J
`
`COMMAND
`
`SECOND INSERT
`COMMAND
`
`I
`
`\
`
`u.
`u.
`B
`~§
`~~
`lui=
`cnC::
`u.W
`u.cn
`IOZ
`I
`1\
`'---~--'/
`[FIRST INSERT
`COMMAND
`
`::c:
`G z
`w
`....1
`
`MEMORY
`
`OISD
`I
`
`/
`
`FIG. 78
`
`I
`
`lSI (INSERTION STRING INDEX)
`DURING BUILDING OF OISD
`I
`
`\
`
`••••
`
`lSI GROWS
`DOWN
`
`FIG. 7C
`
`SECOND
`INSERT
`COMMAND
`
`THIRD INSERT
`COMMAND
`
`FIRST
`INSERT
`COMMAND
`
`REPLACE FIRST BYTE OF
`THE INSERTION TEXT
`STRING WITH THE
`COMMAND CODE.
`
`[ . .
`RTION
`TEXT
`STRING
`
`2
`
`INSERTION
`TEXT
`STRING
`
`3
`
`IC = INSERT COMMANDS
`CC = COPY COMMANDS
`RAW DIFF FILE DURING CONSTRUCTION OF OISD
`
`ENDS HERE
`
`FIG. 70
`
`SAP Exhibit 1004, Page 12 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 12 of 13
`
`5,832,520
`
`HUFFMAN
`DECODE
`HEADERI TREES
`
`I
`
`INITIAL
`COMMAND
`STATE
`
`I CCP ICP CCP CMP IMP CCP CMP CMP • • •136749BC3DF67 • • • •I
`
`ENCODED COMMAND
`SEQUENCE
`
`OISD (INSERTION
`TEXT)
`
`MINIMIZED
`COMMANDS
`
`THE OPTIMIZED INSERTION
`STRING DATABASE
`
`FIG. BA
`
`NEXT STATE
`
`CCP
`
`CMP
`
`ICP
`
`CCP
`CMP
`ICP
`
`IMP
`
`0/10*
`0/10*
`
`1
`11
`11
`11
`
`0
`
`IMP
`0
`10
`
`CURRENT
`STATE
`
`NOTES:
`
`=PRESENT STATE
`(PRESENT COMMAND INTERPRETATION)
`
`/X~ =COMMAND CODE THAT TRANSITIONS
`TO THE NEXT STATE
`
`FIG. 88
`
`SAP Exhibit 1004, Page 13 of 25
`
`

`

`U.S. Patent
`
`Nov. 3, 1998
`
`Sheet 13 of 13
`
`5,832,520
`
`405
`
`408
`
`START REVIT PROCESS
`
`DECOMPRESS DIFFERENCE FILE
`
`410 VALIDATE CHECKSUMS OF OLD AND DIFF
`FILES
`
`415
`
`425
`
`430
`
`UNPACK HUFFMAN DECODE TREES
`
`UNPACK FIRST COMMAND AND INITIALIZE
`STATE MACHINE
`
`710
`
`UNPACK NEXT
`COMMAND
`
`435
`
`500
`
`NO 510
`
`COPY "LENGTH
`COUNT' BYTES FROM
`P _OLD FILE TO P _NEW
`AND ADJUST
`POINTERS
`
`COPY "LENGTH
`COUNT' BYTES FROM
`NEW COPY POSITION
`IN OLD FILE TO P _NEW
`AND ADJUST
`POINTERS
`
`610
`
`COPY "LENGTH
`COUNT' BYTES FROM
`P ITTOP NEWAND
`ADJUST POINTERS
`
`COPY "LENGTH
`COUNT' BYTES FROM
`NEW INSERT
`POSITION TO P NEW
`AND ADJUST
`POINTERS
`
`YES
`
`FIG. 9
`
`SAP Exhibit 1004, Page 14 of 25
`
`

`

`5,832,520
`
`10
`
`2
`Another method useful for transmitting updated files is
`using a technique known as a differencing program or
`comparator program to compare an old file to a new revised
`file in order to determine how the files differ. One such file
`system is distributed as part of the GNU UNIX-like oper(cid:173)
`ating system through tools referred to as diff and patch and
`described in standard GNU documentation. The described
`system discusses a way to use a differencing program to
`generate a patch file, and then using that patch file in
`combination with the old file to generate a newly revised
`file. While the GNU revision system has some applications
`within the UNIX-like operating system within which it was
`developed, it has not been generalizable in the new envi(cid:173)
`ronment of personal computer systems. The most commonly
`available versions of the system are limited to text files, and
`achieve only limited compression. These programs cannot
`effectively handle files where a number of fairly complex
`changes have occurred, such as a number of block moves or
`random shuffling of text strings. These programs also do not
`produce the smallest patch file possible.
`What is needed is a method and system for generating a
`difference file from an old file and a new file, where that
`difference file indicates, in minimal number of bytes,
`changes between the old file and the new file. The needed
`system would allow users to then transmit the difference file
`to a second computer system or to a backup or archive
`storage system (system 2), and to use that difference file and
`the old file along with a decoding process to generate a
`newly revised file. The difference file could also be stored
`locally, allowing a number of versions of the same file to be
`saved without duplicating redundant information. Ideally,
`the difference file would be the smallest possible difference
`file, achieving compression density of perhaps 10% or less
`of the original file, even with a moderate number of changes
`between the two files.
`
`20
`
`25
`
`15
`
`1
`AUTOMATIC FILE DIFFERENCING AND
`UPDATING SYSTEM
`BACKGROUND OF THE INVENTION
`This application claims the benefit of a provisional appli(cid:173)
`cation Ser. No. 60/021,457, filed Jul. 3, 1996 now pending
`filed with a source code appendix consisting of 22 pages.
`The appendix contains program source code in the C++
`programming language for software modules that embody
`aspects of the current invention.
`The present invention relates to the field of data files used
`by computers. More specifically, the present invention
`relates to a system for creating, updating or revising large
`computer files by using only a small file containing indica(cid:173)
`tions of the differences between the large computer files and
`a preexisting computer file.
`The present invention is motivated in part by changes that
`have been occurring in the personal computer industry over
`the last several years. Increases in performance and
`decreases in cost have led to a proliferation of computer
`equipment in homes and offices. This computer equipment
`has in turn spawned a burgeoning market for software
`modules that cause the equipment to operate in a desired
`manner. In recent years, the software modules have become
`larger and larger as the price of computer memory and the
`storage space needed to hold these software modules have
`become cheaper and cheaper. This has allowed for the
`development and sale of far more complex executable
`program codes to accomplish various functions such as word
`processing, spreadsheets, multimedia or any other use for a 30
`computer. In addition to executable files, more and more
`complex text and multimedia files, as well as database files,
`are commonly being used and distributed or archived in
`home and office computer systems.
`These large files are distributed from software manufac- 35
`turers to users via a number of different means, including
`being preloaded on a computer's hard drive before the
`computer is purchased, being shipped on a fixed medium
`such as a floppy disk or CD ROM, or being distributed
`through a transmission medium such as a dial-up telephone 40
`service, a BBS, or the Internet.
`It is the nature of computer software and other large files
`that it is often desirable to update or revise files in order to
`correct errors or add features. Sometimes these revisions
`may be relatively minor, involving changes in only a small 45
`percentage of the data that makes up the file.
`One obstacle to the frequent revision of large computer
`files by a manufacturer is the cost of delivering the updated
`file to the user. If an entire new revised file must be
`delivered, the amount of data can be substantial. Large files
`typically are as large as ten million characters (10
`Megabytes) or larger. Distribution of such files on floppy
`disk can require a relatively large amount of disk space.
`Distribution of such large files over a medium such as the
`Internet can take an undesirably long time from the point of
`view of the customer and can consume a large amount of
`server resources from the point of view of the file provider.
`One solution to the problem of distributing large com(cid:173)
`puter files is use of compression. A number of standard
`compression algorithms are in existence and are commonly
`used today. These algorithms typically achieve compression
`of a large executable file down to between 40% to 60% of
`its original file size and can compress some types of text files
`even further, thereby reducing the transaction costs of ship(cid:173)
`ping the file. However, for very large computer files or
`collections of files, even a compressed file reduced to 40%
`still represents a substantial transmission cost.
`
`SUMMARY OF THE INVENTION
`The present invention comprises a software system with
`several components, a method, and a file structure for
`generating very efficient difference files (sometimes abbre(cid:173)
`viated DIFF file) from an old file and a new file so that a
`difference file can be transmitted to a second computer
`system where the difference file and a duplicate of the old
`file can quickly be used to create a copy of the new file,
`duplicating the new file as it existed on the first computer
`system. The difference file could also be stored locally to the
`first computer system, allowing the new file to be duplicated
`from the old file without storing the new file.
`According to the present invention, a differencing process
`50 on a first computer system compares an old file and a new
`file to generate a difference file. In this process, the old file
`is used essentially as a database of byte strings. The differ(cid:173)
`encing process reads strings of data from the new file and
`searches for the existence of those strings in the old file. The
`55 differencing process notes the locations in the old file in
`which strings in the new file are found and stores in a
`difference file an indication of the location where a matching
`string is found and an indication of the length of the
`matching string found in the old file. This information is
`60 stored in the difference file in a copy command. When the
`differencing process encounters strings of characters in the
`new file that are not found in the old file, the differencing
`process adds those strings to an insert database and adds an
`indication in the difference file of the location in the insert
`65 database where the strings can be found and the length of the
`string. This location and length are stored in an insert
`command.
`
`SAP Exhibit 1004, Page 15 of 25
`
`

`

`5,832,520
`
`10
`
`4
`difference method should not take an unreasonable time to
`execute (overnight may be OK in many cases). An index or
`hash table may be used to speed searching, but is not
`necessary for the differencing process (and will not be built
`if sufficient memory is unavailable).
`In this patent application, the present invention is
`described with reference to specific embodiments. It will be
`understood by anyone skilled in the programming art that
`many variations on the basic system and method of the
`present invention are possible within a computer environ(cid:173)
`ment. The invention therefore should not be construed as
`limited except as provided in the attached claims.
`For example, conventional computer systems today
`encode data as a collection of two-state binary units known
`at bits. Most current computers group these bits into 8-bit
`groups known as bytes, also referred to as characters. A
`sequence of bytes or characters is commonly referred to as
`a string. These terms are used in accordance with their
`accepted meaning in the art in this application, but it should
`20 be understood that the techniques of the invention could be
`used in different types of computing systems having differ(cid:173)
`ent means for encoding and organizing data.
`Also, within the art, the terms "text" and "string" are
`sometimes used in a particular way to describe computer
`encoded alphanumeric data, and at other times these terms
`are used very broadly to denote a sequence of data values
`that could represent anything: text, a number, a piece of an
`image, sound, etc. In the present description, these terms and
`other terms used in the art are intended to be given their
`broadest meaning.
`
`15
`
`3
`According to a specific embodiment of the invention, the
`differencing process, upon opening the old file, creates an
`index (or hash table) of all of the character strings of
`predetermined length found in the old file, along with the
`locations at which those character strings were found, in
`order to facilitate searching for character strings from the
`new file. According to a further embodiment, the index (or
`hash table) is created only if the differencing process detects
`that there is sufficient memory on the first computer to hold
`the index.
`The present invention also comprises a file structure for
`the difference file that allows the difference file to hold
`information from which to construct a copy of the new file
`using a duplicate of the old file while occupying the least
`number of bytes. According to specific embodiments, this
`difference file is built in a multistep process to minimize the
`size of the difference file.
`The method according to the invention attempts to mini(cid:173)
`mize the size of the difference file by a variety of techniques,
`any group of which may be incorporated into specific
`embodiments. According to a specific embodiment, the
`entire old file, rather than a limited portion of the data, is
`used as a database. This helps produce a smaller difference
`file by potentially finding more data to copy from the old file.
`In general, as much data as possible is copied from the old
`file, unless it takes fewer bits to insert the data. Copying data
`from the old file requires only a command code, whereas
`inserting data requires a command, plus the actual insertion
`data. The invention makes decisions about whether to copy
`or insert data by using a search algorithm that can do the 30
`following: use a "current positions" pointer into the old file
`for copying data, which eliminates an explicit position field
`if data can be copied from the current position; favor
`copying from the "current position" in the old file; search the
`entire old file for data not found at the "current position" and
`if the data is found elsewhere in the old file, copy from that
`position (unless it "costs more" than inserting enough data
`to allow the next copy from the "current position"); and
`insert data if a copy was not chosen.
`According to an embodiment, the length of the minimum 40
`data string searched for in the old file is selected as "N" bytes
`and is dependent on where the data is copied from. The copy
`from current position command uses a shorter minimum
`data string than copy from another position because it has
`been encoded to require fewer bits.
`According to further embodiments, commands and count
`fields of the difference file are encoded. Commands may be
`encoded using a "state machine", where certain commands
`are implied by the sequence of previous commands, and
`count fields may be encoded with a "cascaded" count field 50
`method. Huffman encoding of smaller count values may be
`employed to further reduce the difference file size, and
`counts larger than the Huffman-encoded counts use three
`progressively larger count fields. The length of these larger
`count fields is variable for each command type, and is 55
`modified based on statistics gathered during the construction
`of each individual difference file.
`According to further embodiments, all command, count
`and position fields are "bit-packed" to eliminate unused
`"filler" bits, and redundant data strings are removed from the
`insertion data before being appended to the difference file.
`The entire minimized difference file (minimized by the
`techniques mentioned above) may be finally compressed
`(using a "well-known" compression algorithm-like "zip" -or
`proprietary compression technique) to reduce the file size. 65
`According to an embodiment of the invention, execution
`time of the differencing step is important only in that the
`
`25
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`35
`
`45
`
`FIG. 1 is a block diagram showing the general differenc(cid:173)
`ing and revision process according to the current invention.
`FIG. 2 is a block diagram of a differencing process and its
`components according to an embodiment of the invention.
`FIG. 3 is a flow chart illustrating a method used by the
`differencing processor according to the present invention.
`FIG. 4A is a diagram of a Text String Index (TSI)
`according to one embodiment of the present invention.
`FIG. 4B is a diagram of a hash table TSI according to a
`preferred embodiment of the present invention.
`FIG. SA is a flow chart illustrating a method used by the
`differencing processor according to the present invention to
`build the raw difference file.
`FIG. 5B is a flow chart illustrating a method used by the
`differencing processor to search for strings in the old file
`according to the present invention.
`FIGS. 6A-6D show flow charts and command structures
`illustrating copy and insert commands inserted into the raw
`difference file according to the present invention.
`FIGS. 7A-7D show a flow chart and index and file
`structures illustrating creating the optimized insert string
`database for including into the final difference file according
`to the present invention.
`FIGS. 8A-8B illustrate the file structure for the final
`60 difference file and a state machine and command format for
`command encoding in the final difference file according to
`an embodiment of the present invention.
`FIG. 9 is a flow chart showing the method of the revision
`processor according to the current invention.
`FIG. 10 shows a computer system incorporating the
`invention.
`
`SAP Exhibit 1004, Page 16 of 25
`
`

`

`5,832,520
`
`10
`
`30
`
`5
`DETAILED DESCRIPTION OF THE DRAWINGS
`Overview
`An overview of the process according to the invention is
`illustrated in FIG. 1. FIG. 1 illustrates a first computer
`system 1 and a second computer system 2 which commu(cid:173)
`nicate via a communication path 5. Both computer systems
`1 and 2 can be any collection of computing devices oper(cid:173)
`ating together whatsoever as is known in the art. Computer
`systems 1 and 2 can also refer to the same computer system,
`or components within the same computer system. Commu-
`nication path 5 may be any means by which a file can be
`communicated between computer system 1 and computer
`system 2, including a removable fixed medium such as a
`floppy disk or CD ROM disk, or a communication medium
`such as an interoffice network, a telephone line, a bus, or the
`Internet. Path 5 also might encompass an electronic mail 15
`message.
`As shown in FIG. 1, computer system 1 includes an old
`file 10, a new file 20, and a differencing processor 100 which
`generates a difference file 30. New file 20 is generally
`somewhat similar to old file 10, containing some changes in
`data which reflect a revision of new file 20. Old file 10 and
`new file 20 could be any collection of computer data
`whatsoever, including an executable application program, an
`operating system program, a text file, a spreadsheet or other
`data file, an image file, or any other type of computer- 25
`readable data.
`Difference processor (Difflt) 100 reads new file 20 and
`compares it to old file 10 by a process described below.
`Difference process 100 then stores indications of the data to
`be copied from old file 10 or inserted from new file 20 into
`difference file 30. According to the invention, when new file
`20 is a revised version of old file 10, difference file 30 will
`be substantially smaller than either new file 20 or old file 10,
`in some cases, only ten percent or less than the size of new
`file 20.
`According to the invention, difference file 30 may then be
`transmitted over path 5 to computer system 2 where a
`revision process (Revlt) 200 reads a duplicate 15 of old file
`10 and difference file 30 and creates a copy 25 of new file
`20 on computer system 2. According to the invention, copy
`25 (designated the New* file) is identical to new file 20
`created on computer system 1.
`Differencing Process Overview
`FIG. 2 is a block diagram illustrating the process of
`differencing process (Difflt process) 100. Difflt process 100
`uses old file 10 and new file 20 to generate difference file 30. 45
`An optional index builder 105 may be used to build a text
`string index (TSI) 106 to speed building the difference file.
`Search engine 104 reads strings of data from new file 20
`and attempts to locate string matches in old file 10 for each
`string found in new file 20. If an index 106 is present, it is
`used by search engine 104 to increase searching speed.
`When search engine 104 finds a match in old file 10 for
`a string from new file 20, it indicates this by placing a copy
`command into a raw difference file 31 (RawDiff). The copy
`command includes an indication of where the string is found
`in 10 and the length of the string. When search engine 104
`does not find a match in old file 10 for a string from new file
`20, it indicates this by placing an insert command into a raw
`difference file 31 (RawDifl). The insert command includes
`the text for the string that was not found in old file 10.
`According to an embodiment of the invention, an opti(cid:173)
`mized insertion string database (OISD) engine 101 examines
`the raw DIFF file 31 to create an OISD 110, using an
`insertion string index 113. A command encoder 102 and a
`count encoder 103 encode command codes and count fields
`into the final difference file 30 which also includes OISD
`110.
`
`6
`The details of the operation of the elements shown in FIG.
`2 according to specific embodiments will now be described.
`Differencing Process Flowchart
`FIG. 3 is a detailed flow chart of the operation of
`differencing processor 100. The process begins (S2) and old
`file 10 and new file 20 are opened by the processor (S4).
`Checksums are formed for both the old file and the new file
`and are stored (S6). The processor then checks to see if there
`is sufficient memory to create a text string index and if so a
`text string index (TSI) is built (S10). A raw difference file is
`created by searching for strings from the new file in the old
`file, using the TSI if one was created (S12).
`After the raw difference file is created, an Optimized
`Insertion String Database (OISD) is generated from the
`insertion text and commands in the raw difference file as
`illustrated in FIGS. 7A, 7B, 7C and 7D (S14). The com-
`mands in the raw difference file are encoded to minimize
`their size (S16) by various possible encoding techniques
`including a state-machine for command encoding and Huff-
`20 man encoding for string length values. The encoded com(cid:173)
`mands are then placed in the final difference file (S16). The
`Huffman decode tables and initial state of the command
`decode state machine are then appended to the difference file
`(S17) followed by

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket