`Williams
`
`[54]
`
`METHOD FOR PARTITIONING A BLOCK
`OF DATA INTO SUBBLOCKS AND FOR
`STORING AND COMMUNCATING SUCH
`SUBBLOCKS
`
`[76]
`
`Inventor: Ross Neil Williams, 3/305 N. Terrace,
`Adelaide SASOOO, Australia
`
`[21]
`
`Appl. No.:
`
`08/894,091
`
`[22]
`
`PCT Filed:
`
`Feb. 15, 1996
`
`[86]
`
`PCTNo.:
`
`PCT/AU96/00081
`
`§ 371 Date:
`
`Aug. 15, 1997
`
`§ 102(e) Date: Aug. 15, 1997
`
`[87] PCT Pub. No.: W096/25801
`
`PCT Pub. Date: Aug. 22, 1996
`
`[30]
`
`Foreign Application Priority Data
`
`Feb. 17, 1995
`Apr. 12, 1995
`
`[AU] Australia ................................. PN1232
`[AU] Australia ................................. PN2392
`
`Int. Cl.6
`...................................................... H03M 7/00
`[51]
`[52] U.S. Cl. ................................................. 341/51; 341/67
`[58] Field of Search .................................. 341!51, 50, 67;
`375/241; 704/203
`
`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`4,698,628 10/1987 Herkert et a!. ..................... 340/825.02
`8/1993 Sugiyama eta!. ........................ 341!67
`5,235,623
`5,479,654 12/1995 Squibb .................................... 395/600
`
`OTHER PUBLICATIONS
`
`Williams, Ross, "An algorithm for matching text (possibly
`original)", Newsgroup posting, comp.compression, Jan. 27,
`1992.
`Williams, Ross, "Parallel data compression", Newsgroup
`posting, comp.conmpression.research, Jun. 30, 1992.
`
`111111111111111111111111111111111111111111111111111111111111111111111111111
`US005990810A
`[11] Patent Number:
`[45] Date of Patent:
`
`5,990,810
`Nov. 23, 1999
`
`Knuth, Donald E., "The Art of Computer Programming, vol.
`3: Sorting and Searching", pp. 508-513, Addison-Wesley
`Publishing Company, 1973.
`Williams, Ross N., "An Introduction to Digest Algorithms"
`Proceedings of the Digital Equipment Computer Users Soci(cid:173)
`ety, pp. 9-18, Aug. 1994.
`Williams, Ross N., "An Extremely Fast ZIV-Lempel Data
`Compression Algorithm", Proceedings of Data Compression
`Conference, pp. 362-371, Apr. 1991.
`Knuth, Donald E., The Art of Computer Programming, vol.
`1: Fundamental Algorithms, pp. 435-451, Addison Wesley
`Publishing Company, 1973.
`
`Primary Examiner-Brian Young
`Attorney, Agent, or Firm--Greenberg Traurig; Robert P.
`Bell
`
`[57]
`
`ABSTRACT
`
`This invention provides a method and apparatus for detect(cid:173)
`ing common spans within one or more data blocks by
`partitioning the blocks (FIG. 4) into subblocks and searching
`the group of subblocks (FIG. 12) (or their corresponding
`hashes (FIG. 13)) for duplicates. Blocks can be partitioned
`into subblocks using a variety of methods, including meth(cid:173)
`ods that place subblock boundaries at fixed positions (FIG.
`3), methods that place subblock boundaries at data(cid:173)
`dependent positions (FIG. 3), and methods that yield mul(cid:173)
`tiple overlapping subblocks (FIG. 6). By comparing the
`hashes of subblocks, common spans of one or more blocks
`can be identified without ever having to compare the blocks
`or subblocks themselves (FIG. 13). This leads to several
`applications including an incremental backup system that
`backs up changes rather than changed files (FIG. 25), a
`utility that determines the similarities and differences
`between two files (FIG. 13), a file system that stores each
`unique subblock at most once (FIG. 26), and a communi(cid:173)
`cations system that eliminates the need to transmit sub blocks
`already possessed by the receiver (FIG. 19).
`
`30 Claims, 26 Drawing Sheets
`
`Hash of subbblock
`I
`The subbblock itself
`I
`Hash of subbblock
`I
`Pointer to subbblock
`
`A Projection Of b
`
`SAP Exhibit 1006, Page 1 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 1 of 26
`
`5,990,810
`
`MADD 0003
`Sheet 1 of 26
`
`Demonstr ates con tent mis alignmen t.
`XDemonst rates co ntent mi salignme nt.
`
`Figure 1
`
`SAP Exhibit 1006, Page 2 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 2 of 26
`
`5,990,810
`
`Fixed an I d variab lie width I partiti I oning.
`Fixe I d an 1 d variable wid I th part I itioning.
`
`Figure 2
`
`SAP Exhibit 1006, Page 3 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 3 of 26
`
`5,990,810
`
`Data-indep endent partitioning.
`XData-inde pendent pa rtitioning . I
`
`Data-dep I edent 1 partiti I oning.l
`XData-dep I endent partiti I oning.l
`
`Figure 3
`
`SAP Exhibit 1006, Page 4 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 4 of 26
`
`5,990,810
`
`A
`
`B
`
`A
`
`B
`
`b~~~i~l~+f~~~~-~~~~~~~i~l-~rf-~1--l~~~,~~~~~~i~l-~rr~~-~~-~~~
`
`A
`
`B
`
`Function result space
`
`Figure 4
`
`SAP Exhibit 1006, Page 5 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 5 of 26
`
`5,990,810
`
`p
`
`•
`Increase p
`
`F( bp-A+1···bp,bp+1···bp+B) ----•~1 (subclass) I
`
`Function result space
`
`Figure 5
`
`SAP Exhibit 1006, Page 6 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 6 of 26
`
`5,990,810
`
`r-----,-----~---------------------,------------------~-----T------------•------1
`I
`1
`I
`I
`I
`I
`I
`1
`I
`I
`I
`I
`I
`I
`
`I
`
`F3 1
`F2 ~----- L--;-- ~-------------- -;------ L----- ~----------- -'----- -'------ ~----- ~----- ~
`F1 I
`bi I I I I I j I j I I I I I I j I I I I I I j I I i
`
`I
`I
`I
`I
`I
`I
`~--------~--------,------~---L-----------~-----,------------------~--~--------~
`I
`I
`I
`I
`:
`
`Figure 6
`
`SAP Exhibit 1006, Page 7 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 7 of 26
`
`5,990,810
`
`·-----,------------------~
`I
`I
`I
`1
`I
`I
`- L - -
`'- -
`- -•
`
`-
`
`- -
`
`- -
`
`- -
`
`- - -
`
`- - -
`
`-
`
`- - -
`
`~--------------------~---------·
`I
`I
`I
`I
`I
`I
`•---------------------~--------~
`
`,---------------~-----------------,------------------------~------------------·
`1
`I
`I
`I
`I
`1
`I
`I
`I
`I
`'---------------~------------------L------------------------L-----------------J
`
`r-----•--------~--~------~--------r-----T·----~--~---------~--------~--------~
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`F·
`bi I j I I j
`
`j I j I I \ I j I j
`
`j I I j I I j I I j
`
`Figure 7
`
`SAP Exhibit 1006, Page 8 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 8 of 26
`
`5,990,810
`
`1
`I
`I
`.-
`I
`L--------~------·-----------~-----L-----~--------------- 1---L---~-------------~
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`I
`
`F3 ;---------------------------------~------------------------~------------------:
`F2 l
`F 1 ~ -----~ --1 -
`bi I j I I j j I i I I i I I i i I i I I I i j i I i
`
`-
`
`-
`
`-
`
`- ~ -
`
`- ~- -
`
`-
`
`-
`
`- -;- -
`
`- L -
`
`-
`
`-
`
`-
`
`- ~ -
`
`-
`
`-
`
`-
`
`- J - - ~ - - ~- - - - - -;- - - '- - - - - - - '- - ~ - - ~ - - ~ - - - - - ~
`
`Figure 8
`
`SAP Exhibit 1006, Page 9 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 9 of 26
`
`5,990,810
`
`.----------------------------------------------------------------------------·
`
`I
`
`I
`I
`I
`~----------------------------------------------------------------------------~
`I
`I
`I
`I
`~
`I
`I
`I
`I
`~--------------1---------------------r--------------~---------------r--------~
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`r-----r--------~--~------.------------~--------,------1---------r-----·-----,--,
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`F·
`bi I j I I i j I j I I I j I I j I i I I j I i I j
`
`i
`
`Figure 9
`
`SAP Exhibit 1006, Page 10 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 10 of 26
`
`5,990,810
`
`·-- ----~
`
`A B
`
`k
`
`·-- ----~
`
`A B
`
`k
`
`·-- ----~
`
`A B
`
`k
`
`Hashes of
`subblocks
`
`Figure 10
`
`SAP Exhibit 1006, Page 11 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 11 of 26
`
`5,990,810
`
`·-- ----~
`
`A B
`
`k
`
`b
`
`\
`\
`\
`
`\ . \
`
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`
`Hash of subbblock
`I
`The subbblock itself
`I
`Hash of subbblock
`I
`Pointer to subbblock
`
`·-- ----~
`
`A B
`
`k
`
`·-- ----~
`
`A B
`
`k
`
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`\
`
`A Projection Of b
`
`Figure 11
`
`SAP Exhibit 1006, Page 12 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 12 of 26
`
`5,990,810
`
`A B
`·-~r---·
`b1l I I I . I I I
`"' -
`',
`.,""
`"'
`....
`
`..........
`
`....
`
`~
`
`--
`
`Figure 12
`
`SAP Exhibit 1006, Page 13 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 13 of 26
`
`5,990,810
`
`·-- ----·
`
`A
`
`k
`
`8
`
`·-- ----·
`
`A
`
`k
`
`8
`
`·-- ----·
`
`k
`A
`
`8
`
`·-- ----·
`
`k
`A
`
`8
`
`Figure 13
`
`SAP Exhibit 1006, Page 14 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 14 of 26
`
`5,990,810
`
`Entry lists
`
`Subblock storage
`
`Figure 14
`
`SAP Exhibit 1006, Page 15 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 15 of 26
`
`5,990,810
`
`r--------o
`
`Q)
`c: ~
`()
`><
`.
`::J
`. ~
`,....
`1.-
`(/)
`><
`c:
`0
`><
`()
`
`Q) a: .._
`
`------
`
`E
`>-. .
`,....
`>-
`)..:
`
`C\J w
`
`c:
`"'00
`Q)~
`.
`::::::
`ctS
`·- ......
`Et:><
`C/)Q)'+-
`c: ~ 0
`ctS
`1.... c.
`1- Q)
`
`1.-
`
`1.-
`
`c:
`><
`,....
`><
`><
`
`E >-. .
`,....
`>-
`)..:
`
`,....
`w
`
`SAP Exhibit 1006, Page 16 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 16 of 26
`
`5,990,810
`
`X: X1 .. Xn
`
`Y: Y1 .. Ym
`
`Figure 16
`
`SAP Exhibit 1006, Page 17 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 17 of 26
`
`5,990,810
`
`~:·:·:·:·:·:·:·:·:·:·=r·:-:-:-:F:ii:·:·:·:·:·:·~
`.·.•.•.• .. •.•.•.•.•.•. ·.·~ ·.·.·.
`. ·.·,·············· ·.· .. ·.
`. .... .
`. .......... .
`X: X1 .. Xn
`
`1 - - - - - - 1 Projection of Y
`
`I
`I
`I
`I
`I
`I
`I
`
`I
`
`' I
`
`I
`I
`I
`I
`I
`I
`I
`
`Figure 17
`
`Y: Y1 .. Ym
`
`SAP Exhibit 1006, Page 18 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 18 of 26
`
`5,990,810
`
`1----------------------- I
`I ~'r··········'l; ..... w· · · ·· -~~ I
`~,~;:;:;:;:;:;:;:;:;:;:~:::::::::M::::::::::::~~ I
`I
`~ I
`X: X1 .. Xn
`:
`. ---------------------- _,
`
`I
`I
`
`Y: Y1 .. Ym
`
`Figure 18
`
`SAP Exhibit 1006, Page 19 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 19 of 26
`
`5,990,810
`
`E
`>-
`.,....
`>-
`>-
`
`C\1
`w
`
`.,....
`w
`
`E
`>-
`.,....
`>-
`>-
`
`c
`X
`.,....
`X
`X
`
`C\1 w
`
`"''cn
`Q)~
`--u
`~cno
`~Q)-
`··· ~~
`a: en
`Q)::J
`
`Q)
`c
`
`Q)
`E
`i-
`
`.,....
`w
`
`SAP Exhibit 1006, Page 20 of 51
`
`
`
`U.S. Patent
`US. Patent
`
`Nov. 23, 1999
`Nov. 23, 1999
`
`Sheet 20 0f 26
`Sheet 20 of 26
`
`5,990,810
`5,990,810
`
`E
`>-
`T""" >-
`:>.:
`
`c ><
`T""" ><
`><
`
`C\J w
`
`(J)
`c
`
`r/A
`
`C\J w
`
`T""" w
`
`c ><
`T""" ><
`><
`
`9.:9::
`
`(J)
`E
`~
`
`omBswa
`
`c ><
`T""" ><
`><
`
`
`
`VAREEEmmmmmmmm
`
`-T""" w
`
`SAP Exhibit 1006, Page 21 of 51
`
`SAP Exhibit 1006, Page 21 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 21 of 26
`
`5,990,810
`
`E
`>-. .
`. .
`>-
`
`C\1 w
`
`'+-
`1-Q
`0
`cn 00 E
`Q)~>-
`:t= c
`.
`+-'Q),...:
`a5Ci5>-
`_Q)
`
`" ' ' '+-
`
`........... ..,....... ......
`
`1....
`
`c
`><
`~ w
`><
`><
`
`~
`
`SAP Exhibit 1006, Page 22 of 51
`
`
`
`U.S. Patent
`US. Patent
`
`Nov. 23, 1999
`Nov. 23, 1999
`
`Sheet 22 0f 26
`Sheet 22 of 26
`
`5,990,810
`5,990,810
`
`.
`X
`........
`0
`
`-~-~ j~f
`
`C\1 w
`
`~
`_H_
`
`~
`_H_
`
`X
`........
`0
`~
`.~
`
`._x633,529:
`
`+-' c
`Q)
`"'0
`·--
`Q)
`..c
`I-
`
`........
`0
`Q) ~ •
`..c ·- ·-
`._x
`t-EX
`Q)
`"'0
`
`ho3:52
`
`9i
`
`._xB3399t
`
`"'0
`Q)
`..c
`I-
`
`X
`
`(\J
`w
`
`Q)
`c
`
`8:me:
`
`Q)
`E
`~
`
`N
`N
`QJ
`
`mm2:me
`~ =
`."""' ~
`
`b()
`
`c
`X
`
`T'""
`
`X
`X
`
`T'"" w
`
`c
`X
`
`T'""
`
`X
`X
`
`T'"" w
`
`SAP Exhibit 1006, Page 23 of 51
`
`SAP Exhibit 1006, Page 23 of 51
`
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 23 of 26
`
`5,990,810
`
`..0
`
`,._ LL
`0
`0>
`,.,.__
`c
`(/) ·-c(\i
`ctS
`:::J
`Q ) -
`~ ~
`Q)
`
`(/)
`:::J
`C\i
`,._
`ctS
`a.
`a.
`<(
`
`.........
`(/) co .-.
`,.,.__c......._co
`oo'~"
`·- ~ c.o
`+-'+-'""'
`Q) · - \.",/
`.........
`. ~
`en t
`ctS 0> ('I)
`Q.a.),.-
`..._,
`
`~
`N
`QJ
`
`J-c = b()
`
`·~
`~
`
`SAP Exhibit 1006, Page 24 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 24 of 26
`
`5,990,810
`
`·-- ----..
`
`A B
`
`k
`
`·-- ----..
`
`A B
`
`k
`
`·-- ____ ..
`
`A B
`
`k
`
`Hashes of
`subblocks
`
`Table of hashes
`
`Figure 24
`
`SAP Exhibit 1006, Page 25 of 51
`
`
`
`d .
`\Jl .
`
`,- ---- -- -- - -- -- -- -,
`: ~::::;:::j:l:::;:l:l:ftJI~::::::::::::~:
`•
`X: X1 .. Xn
`•
`·----------------~
`X can be
`reconstructed
`from Y and D.
`~ ~:
`Y: Y1 .. Ym
`(Previously
`backed up
`version).
`E2
`
`~::::::::;:;:;::::::::I::::::::::E:3::::::::::::~
`X: X1 .. Xn: Current
`version (to be
`backed up).
`
`~
`
`H(Y1 ) ... H(Ym)
`(Hashes of subblocks
`in previously backed
`up version (Y)).
`
`E1
`
`D: Incremental
`backup
`consists of
`subblocks in X
`and references
`to subblocks in
`Y (and D).
`
`Figure 25
`
`SAP Exhibit 1006, Page 26 of 51
`
`
`
`U.S. Patent
`
`Nov. 23, 1999
`
`Sheet 26 of 26
`
`5,990,810
`
`Filename
`
`Blocks
`
`aardvark. txt
`
`walrus. txt
`
`sloth.dat
`
`File Table
`
`421
`
`0 1 3
`
`544
`
`D
`
`D
`
`c=J
`
`Hash
`830..
`0
`9F8..
`1
`2 AA6..
`3
`092..
`4 3EB..
`
`Refs
`1
`2
`1
`1
`3
`
`Hash Table
`
`Figure 26
`
`D
`
`Subblock storage system
`(Note: Subblocks are all
`different).
`
`SAP Exhibit 1006, Page 27 of 51
`
`
`
`5,990,810
`
`1
`METHOD FOR PARTITIONING A BLOCK
`OF DATA INTO SUBBLOCKS AND FOR
`STORING AND COMMUNCATING SUCH
`SUBBLOCKS
`
`INTRODUCTION
`
`The present invention provides a method and apparatus
`for partitioning one or more blocks of data into subblocks for
`the purpose of communicating and storing such subblocks in
`an efficient manner.
`
`BACKGROUND
`
`15
`
`Much of the voluminous amount of information stored,
`communicated, and manipulated by modern computer sys(cid:173)
`tems is duplicated within the same or a related computer
`system. It is commonplace, for example, for computers to
`store many slightly differing versions of the same document.
`It is also commonplace for data transmitted during a backup
`operation to be almost identical to the data transmitted 20
`during the previous backup operation. Computer networks
`also must repeatedly carry the same or similar data in
`accordance the requirements of their users.
`Despite the obvious benefits that would flow from a
`reduction in the redundancy of communicated and stored 25
`data, few computer systems perform any such optimization.
`Some instances can be found at the application level, one
`example being the class of incremental backup utilities that
`save only those files that have changed since the most recent
`backup. However, even these utilities do not attempt to 30
`exploit the significant similarities between old and new
`versions of files, and between files sharing other close
`semantic ties. This kind of redundancy can be approached
`only by analysing the contents of the files.
`The present invention addresses the potential for reducing 35
`redundancy by providing an efficient method for identifying
`identical portions of data within a group of blocks of data,
`and for using this identification to increase the efficiency of
`systems that store and communicate data.
`
`2
`subblocks whereas the data-dependent partitioning gener(cid:173)
`ates just four, allowing much of the similarity between the
`two blocks to be detected.
`The fact that a partitioning is data dependent does not
`imply that it must incorporate any knowledge of the syntax
`or semantics of the data. So long as the boundaries are
`positioned in a manner dependent on the local data content,
`identical subblocks are likely to be formed from identical
`portions of data, even if the two portions are not identically
`10 aligned relative to the start of their enclosing blocks (FIG.
`3).
`Once the group of blocks has blocks has been partitioned
`into subblocks, the resulting group of subblocks can be
`manipulated in a manner that exploits the occurrence of
`duplicate subblocks. This leads to a variety of applications,
`some of which are described below. However, the applica(cid:173)
`tion of a further aspect of the invention leads to even greater
`benefits.
`In a further aspect of the invention, the hash of one or
`more subblocks is calculated. The hash function can be an
`ordinary hash function or one providing cryptographic
`strength. The hash function maps each sub block into a small
`tractable value (e.g. 128 bits) that provides an identity of the
`subblock. These hashes can usually be manipulated more
`efficiently than their corresponding subblocks.
`Some applications of aspects of this invention are:
`Fine-grained incremental backups: Conventional incre(cid:173)
`mental backup technology uses the file as the unit of backup.
`However, in practice many large files change only slightly,
`resulting in a wasteful re-transmission of changed files. By
`storing the hashes of subblocks of the previous versions of
`files, the transmission of unchanged sub blocks can be elimi(cid:173)
`nated.
`Communications: By providing a framework for commu(cid:173)
`nicating the hashes of sub blocks, the invention can eliminate
`the transmission of subblocks already possessed by the
`receiver.
`Differences: The invention could be used as the basis of
`40 a program that determines the areas of similarity and dif(cid:173)
`ference between two blocks.
`Low-redundancy file system: Data stored in a file system
`can be partitioned into subblocks whose hashes can be
`compared so as to eliminate the redundant storage of iden-
`45 tical subblocks.
`Virtual memory: Virtual memory could be organized by
`sub block using a table of hashes to determine if a sub block
`is somewhere in memory.
`
`SUMMARY OF THE INVENTION
`
`To identify identical portions of data within a group of
`blocks of data, the blocks must be analysed. One simple
`approach is to divide the blocks into fixed-length (e.g.
`512-byte) subblocks and compare these with each other so
`as to identify all identical sub blocks. This knowledge can the
`be used to manage the blocks in more efficient ways.
`Unfortunately, the partitioning of blocks into fixed-length
`sub blocks does not always provide a suitable framework for 50
`the recognition of duplicated portions of data, as identical
`portions of data can occur in different sizes and places within
`a group of blocks of data. FIG. 1 shows how division into
`fixed-size sub blocks of two blocks (whose only difference is
`the insertion of a single byte ('X')) fails to generate identical 55
`subblocks. A comparison of the two groups of subblocks
`would reveal no identical pairs of subblocks even thought
`the two original blocks differ by just one character.
`A better approach is to partition each block using the data
`in the block itself to determine the position of the partitions. 60
`In an aspect of the invention, the blocks are partitioned at
`boundaries determined by the content of the data itself. For
`example, a block could be partitioned at each point at which
`the preceding three bytes has to a particular constant value.
`FIG. 2 shows how such a data dependent partitioning could 65
`turn out, and contrasts it with a fixed-length partitioning. In
`FIG. 3 data independent partitioning generates seven distinct
`
`Clarification of Terms
`
`The term block and subblock both refer, without
`limitation, to finite blocks or infinite blocks (sometimes
`called streams) of zero or more bits or bytes of digital data.
`Although the two different terms ("blocks" and "subblock")
`essentially describe the same substance (digital data), the
`two different terms have been employed in this specification
`to indicate the role that a particular piece of data is playing.
`The term "block" is usually used to refer to raw data to be
`manipulated by aspects of the invention. The term "sub(cid:173)
`block" is usually used to refer to a part of a block. "Blocks"
`are "partitioned" into "sub blocks".
`The term partition has its usual meaning of exhaustively
`dividing an entity into mutually exclusive parts. However,
`within this specification, the term also includes cases where:
`Not all of the block is subdivided.
`Multiple overlapping subblocks are formed.
`
`SAP Exhibit 1006, Page 28 of 51
`
`
`
`5,990,810
`
`10
`
`4
`3
`A natural number is a non-negative integer (0, 1, 2, 3, 4,
`a. Partitioning one or more of said blocks into one or more
`5, ... ).
`subblocks in accordance with one of aspects 1 to 6.
`b. Forming a projection of each said block, being an
`Where the phrase zero or more is used, this phrase is
`ordered or unordered collection of elements, wherein each
`intended to encompass the degenerate case where the objects
`element consists of a subblock, an identity of a subblock, or
`being enumerated are not considered at all, as well as the
`a reference of a subblock.
`case where zero or more objects are used.
`c. Comparing the elements of said projections of said
`BRIEF DESCRIPTION
`blocks.
`The following aspects of this invention are numbered for
`10. In a further aspect of the invention, the invention
`reference purposes. The terms "block" and "subblock" refer
`provides a method for representing one or more blocks,
`to blocks and subblocks of digital data.
`comprising:
`1. In an aspect of the invention, the invention provides a
`(i) A collection of subblocks;
`method for organizing a block b of digital data for the
`(ii) Block representatives (e.g. filenames) which are
`purpose of storage, communication, or comparison, by par(cid:173)
`mapped to lists of entries that identify subblocks;
`titioning said block into subblocks at one or more positions
`whereby the modification of one of said blocks involves the
`15 following steps:
`klk+1 within said block for which b[k-A+1 ... k+B]
`a. Partitioning some or all of said modified block into
`satisfies a predetermine constraint, where A and B are
`subblocks in accordance with one of aspects 1 to 6;
`natural numbers.
`Note: The specification of this aspect encompasses the
`b. Adding to said collection of subblocks zero or more
`degenerate case in which either A or B is zero. The speci(cid:173)
`subblocks that are not already in said collection, and updat-
`20 ing said subblock list associated with said modified block.
`fication also includes the case where the constraint does not
`pay attention to some parts of b[k-A+1 ... k+B]. For
`11. In a further aspect of the invention, the invention
`example, a constraint that pays attention only to (say) b[k-3]
`provides a method according to aspect 10, in which step b
`and b[k+2] would fall under the classes of constraint cor(cid:173)
`is replaced by:
`responding to A~4 and B~2.
`b. Removing from said collection of subblocks zero or
`2. In a further aspect of the invention, the invention
`25 more subblocks, and updating said subblock list associated
`provides a method according to aspect 1 in which the
`with said modified block.
`constraint comprises the hash of some or all of b[k-A+ 1 ...
`12. In a further aspect of the invention, the invention
`k+B].
`provides a method according to aspect 10, in which step b
`3. In a further aspect of the invention, the invention
`is replaced by:
`~~~~~~~ss:b~~~~o~0a~~~~~~n~ntoa a:fct~c~i' af~ro!ft~~~n~PtZ~ 30
`b. Adding to said collection of subblocks zero or more
`subblocks that are not already in the collection, removing
`within a said block, comprising the step of:
`from said collection of subblocks zero or more subblocks,
`a. Evaluating whether said predetermined constraint is
`and updating said subblock list associated with said modi(cid:173)
`satisfied at each position klk+1, for increasing (or
`fied block.
`decreasing) k, where k starts with the value p.
`13. In a further aspect of the invention, the invention
`4. In a further aspect of the invention, the invention 35
`provides a method for an entity El to communicate a block
`provides a method according to aspect 1, wherein one or
`X to El where El possesses the knowledge that E2 pos(cid:173)
`more bounds are imposed on the size of one or more
`sesses a group Y of sub blocks Y 1 . . . Y m' comprising the
`sub blocks.
`following steps:
`5. In a further aspect of the invention, the invention
`a. Partitioning X into subblocks X1 . . . Xn in accordance
`provides a method according to aspect 1, wherein additional 40
`with one of aspects 1 to 6;
`sub blocks are formed from one or more groups of subblocks.
`b. Transmitting from El to E2 the contents of zero or more
`6. In a further aspect of the invention, the invention
`provides a method according to aspect 1, wherein an addi(cid:173)
`subblocks in X, and the remaining subblocks as references
`tional hierarchy of subblocks is formed from one or more
`. Y m and to sub blocks already
`to sub blocks in Y 1
`.
`.
`groups of contiguous subblocks.
`45 transmitted.
`Note: In most implementations of this aspect, the subblocks
`7. In a further aspect of the invention, the invention
`provides a method according to one of aspects 1 to 6,
`whose contents are transmitted will be those in X that are not
`in Y, and for which no identical subblock has previously
`comprising the further step of:
`b. Calculating the hash of each of one or more of said
`been transmitted.
`so Note: To posses knowledge that E2 possesses Y 1 . . . Y m' El
`sub blocks.
`. Y m itself. El need only
`Note: The resulting collection of hashes is particularly
`need not actually posses Y 1
`.
`.
`useful if H is a strong one-way hash function.
`posses the identities of Y 1 . . . Ym (e.g. the hashes of each
`sub blockY 1 . . . Y m)· This specification is intended to admit
`8. In a further aspect of the invention, the invention
`any other representation in which El may have the knowl-
`provides a method according to one of aspects 1 to 6,
`ss edge that E2 possesses (or has access to) Y 1 . . . Y m· In
`comprising the further step of:
`b. Forming a projection of said block, being an ordered or
`particular, the knowledge may take the form of a projection
`unordered collection of elements, wherein each element
`ofY.
`Note: It is implicit in this aspect the El will be able to use
`consists of a subblock, an identity of a subblock, or a
`reference of a subblock.
`comparison (or other methods) to use its knowledge of E2's
`Note: The specification of this aspect is intended to admit 60
`possession of Y to determine the set of subblocks that are
`common to both X andY. For example, if El possessed the
`collections that contain a mixture of various kinds of iden(cid:173)
`hashes of the sub blocks of Y, it could compare them to the
`tities and references.
`Note: In most applications, the output of this aspect will be
`hashes of the subblocks of X to determine the subblocks
`an ordered list of hashes of the subblocks of the block.
`common to both X and Y. Subblocks that are not common
`9. In a further aspect of the invention, the invention 65 can be transmitted explicitly. Subblocks that are common to
`both X andY can be transmitted by transmitting a reference
`provides a method for comparing one or more blocks,
`comprising the steps of:
`to the subblock.
`
`SAP Exhibit 1006, Page 29 of 51
`
`
`
`5,990,810
`
`10
`
`30
`
`5
`14. In a further aspect of the invention, the invention
`provides a method for an entity E1 to communicate one or
`more subblocks of a group X of subblocks X1 . . . Xn to E2
`where E1 possesses the knowledge that E2 possesses the
`blocks Y, comprising the following steps:
`a. Partitioning Y into sub blocks Y 1 . . . Y m in accordance
`with one or aspects 1 to 6;
`b. Transmitting from E1 to E2 the contents of zero or more
`subblocks in X, and the remaining subblocks as references
`to subblocks in Y and to subblocks already transmitted.
`15. In a further aspect of the invention, the invention
`provides a method for an entity E1 to communicate a block
`X to E2 where E1 possesses the knowledge that E2 pos(cid:173)
`sesses block Y, comprising the following steps:
`a. Partitioning in accordance with one of aspects 1 to 6, X 15
`into subblocks x1 ... xn and y into subblocks y 1 ... y m;
`b. Transmitting from E1 to E2 the contents of subblocks
`in X, and the remaining sub blocks as references to sub blocks
`in Y and to subblocks already transmitted.
`16. In a further aspect of the invention, the invention
`provides a method for constructing a block D from a block
`X and a group Y of subblocks Y 1 . . . Y m such that X can be
`constructed from Y and D, comprising the following steps:
`a. Partitioning X into subblocks X1 . . . Xn in accordance
`with one of aspects 1 to 6;
`b. Constructing D from one or more of the following: the
`contents of zero or more subblocks in X, references to zero
`or more subblocks in Y, and references to zero or more
`subblocks in D.
`Note: Step b above is intended to encompass the case where
`a mixture of the elements it describes is used.
`17. In a further aspect of the invention, the invention
`provides a method for constructing a block D from a group
`X of subblocks X1 . . . Xn and a blockY such that X can be
`constructed from Y and D, comprising the following steps:
`a. Partitioning Y into sub blocks Y 1 . . . Y m in accordance
`with one of aspects 1 to 6;
`b. Constructing D from one or more of the following: the
`contents of zero or more subblocks in X, references to zero
`or more subblocks in Y, and references to zero or more
`subblocks in D.
`18. In a further aspect of the invention, the invention
`provides a method for constructing a block D from a block
`X and a blockY such that X can be constructed from Y and
`D, comprising the following steps:
`a. Partitioning in accordance with one of aspects 1 to 6, X
`into subblocks x1 ... xn and y into subblocks y 1 ... y m;
`b. Constructing D from one or more of the following: the
`contents of zero or more subblocks in X, references to zero
`or more subblocks in Y, and references to zero or more
`subblocks in D.
`19. In a further aspect of the invention, the invention
`provides a method for constructing a block D from a block
`X and a projection of Y, said projection comprising an
`ordered or unordered collection of elements wherein each 55
`element consists of a subblock in Y, an identity of a sub block
`in Y, or a reference of a subblock in Y, such that X can be
`constructed from Y and D, comprising the following steps:
`a. Partitioning X into subblocks X1 . . . Xn in accordance
`with one of aspects 1 to 6;
`b. Constructing D from one or more of the following: the
`contents of zero or more subblocks in X, references to zero
`or more subblocks in Y, and references to zero or more
`subblocks in D.
`20. In a further aspect of the invention, the invention 65
`provides a method for constructing a block X from a block
`Y and a block D, comprising the following steps:
`
`6
`a. Partitioning Y into sub blocks Y 1 . . . Y m in accordance
`with one or aspects 1 to 6;
`b. Constructing X from D and Y by constructing the
`subblocks of X based on one or more of:
`(i) subblocks contained within D;
`(iii) references in D to subblocks in Y;
`(iii) references in D to subblocks in D;
`21. In a further aspect of the invention, the invention
`provides a method for constructing a group X of subblocks
`X1 . . . Xn from a blockY and a block D, comprising the
`following steps:
`a. Partitioning Y into sub blocks Y 1 . . . Y m in accordance
`with one of aspects 1 to 6;
`b. Constructing X1 . . . Xn from D andY based on one or
`more of:
`(i) subblocks contained within D;
`(iii) references in D to subblocks in Y;
`(iii) references in D to subblocks in D;
`22. In a further aspect of the invention, the invention
`provides a method for communicating a data block X from
`20 one entity E1 to another entity E2 comprising the following
`steps:
`a. Partitioning X into subblocks X1 . . . Xn in accordance
`with one of aspects 1 to 6;
`b. Transmitting from E1 to E2 an identity of one or more
`25 subblocks;
`c. Transmitting from E2 to E1 information communicat(cid:173)
`ing the presence or absence of subblocks at E2;
`d. Transmitting from E1 to E2 at least the subblocks
`identified in step (c) as not being present at E2.
`Note: The information communicated in step (c) could take
`the form of a bitmap (or a compressed bitmap) correspond(cid:173)
`ing to the sub blocks referred to in step (a). It could also take
`many other forms.
`Note: If a group of sub blocks are to be transmitted, the above
`steps could be performed completely for each subblock
`35 before moving onto the next subblock. The steps could be
`applied to any subgroup of subblocks.
`23. In a further aspect of the invention, the invention
`provides a method for communicating a block X from one
`entity E1 to another entity E2, comprising the following
`40 steps:
`a. Partitioning X into subblocks X1 . . . Xn in accordance
`with one of aspects 1 to 6;
`b. Transmitting from E2 to E1 information communicat(cid:173)
`ing the presence or absence at E2 of members of a group Y
`45 or sub blocks Y 1 . . . Y m;
`c. Transmitting from E1 to E2 the contents of zero or more
`subblocks in X, and the remaining subblocks as references
`to subblocks in Y 1 . . . Y m and to subblocks transmitted.
`24. In a further aspect of the invention, the invention
`50 provides a method for an entity E2 to communicate to an
`entity E1 the fact that E2 possesses a blockY, comprising the
`following steps:
`a. Partitioning Y into sub blocks Y 1 . . . Y m in accordance
`with one or aspects 1 to 6;
`b. Transmitting from E2 to E1 references of the subblocks
`Y1 · · · Ym.
`25. In a further aspect of the invention, the invention
`provides a method for an entity E1 to communicate a
`sub block X; to an entity E2, comprising the following steps:
`a. Partitioning X into subblocks X1 . . . Xn