`US005990810A
`Patent Number:
`Date of Patent:
`
`[11]
`
`[45]
`
`5,990,810
`Nov. 23, 1999
`
`United States Patent
`Williams
`
`[19]
`
`[54]
`
`[76]
`
`[21]
`
`[22]
`
`[86]
`
`METHOD FOR PARTITIONING A BLOCK
`OF DATA INTO SUBBLOCKS AND FOR
`STORING AND COMMUNCATING SUCH
`SUBBLOCKS
`
`Inventor: Ross Neil Williams, 3/305 N. Terrace,
`Adelaide SASOOO, Australia
`
`Appl. No.:
`
`08/894,091
`
`PCT Filed:
`
`Feb. 15, 1996
`
`PCTNo.:
`
`PCT/AU96/00081
`
`§ 371 Date:
`
`Aug. 15, 1997
`
`§ 102(e) Date: Aug. 15, 1997
`
`[87]
`
`PCT Pub. No.: W096/25801
`
`PCT Pub. Date: Aug. 22, 1996
`
`[30]
`
`Foreign Application Priority Data
`
`Feb. 17, 1995
`Apr. 12, 1995
`
`[AU]
`[AU]
`
`Australia
`Australia
`
`Int. C1.6
`[51]
`[52] U.S. Cl.
`Field of Search
`[58]
`
`PN1232
`PN2392
`
`H03M 7/00
`341/51; 341/67
`341/51,50,67;
`375/241; 704/203
`
`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`4,698,628 10/1987 Herkert et al.
`8/1993 Sugiyama et al.
`5,235,623
`5,479,654 12/1995 Squibb
`
`340/825.02
`341/67
`395/600
`
`OlliER PUBLICATIONS
`
`Williams, Ross, "An algorithm for matching text (possibly
`original)", Newsgroup posting, comp.compression, Jan. 27,
`1992.
`Williams, Ross, "Parallel data compression", Newsgroup
`posting, comp.conmpression.research, Jun. 30, 1992.
`
`Knuth, Donald E., "The Art of Computer Programming, vol.
`3: Sorting and Searching", pp. 508-513, Addison-Wesley
`Publishing Company, 1973.
`Williams, Ross N., "An Introduction to Digest Algorithms"
`Proceedings of the Digital Equipment Computer Users Soci(cid:173)
`ety, pp. 9-18, Aug. 1994.
`Williams, Ross N., "An Extremely Fast ZIV-Lempel Data
`Compression Algorithm", Proceedings of Data Compression
`Conference, pp. 362-371, Apr. 1991.
`Knuth, Donald E., The Art of Computer Programming, vol.
`1: Fundamental Algorithms, pp. 435-451, Addison Wesley
`Publishing Company, 1973.
`
`Primary Examiner-Brian Young
`Attorney, Agent, or Firm---Greenberg Traurig; Robert P.
`Bell
`
`[57]
`
`ABSTRACT
`
`This invention provides a method and apparatus for detect(cid:173)
`ing common spans within one or more data blocks by
`partitioning the blocks (FIG. 4) into subblocks and searching
`the group of subblocks (FIG. 12) (or their corresponding
`hashes (FIG. 13)) for duplicates. Blocks can be partitioned
`into subblocks using a variety of methods, including meth(cid:173)
`ods that place subblock boundaries at fixed positions (FIG.
`3), methods that place subblock boundaries at data(cid:173)
`dependent positions (FIG. 3), and methods that yield mul(cid:173)
`tiple overlapping subblocks (FIG. 6). By comparing the
`hashes of subblocks, common spans of one or more blocks
`can be identified without ever having to compare the blocks
`or subblocks themselves (FIG. 13). This leads to several
`applications including an incremental backup system that
`backs up changes rather than changed files (FIG. 25), a
`utility that determines the similarities and differences
`between two files (FIG. 13), a file system that stores each
`unique subblock at most once (FIG. 26), and a communi(cid:173)
`cations system that eliminates the need to transmit subblocks
`already possessed by the receiver (FIG. 19).
`
`30 Claims, 26 Drawing Sheets
`
`Hash of subbblock
`I
`The subbblock itself
`I
`Hash of subbblock
`I
`
`Pointer to subbblock
`
`A Projection Of b
`
`Oracle Exhibit 1006, pg 1
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 1 of 26
`
`5,990,810
`
`MADD 0003
`Sheet 1 of 26
`
`Demonstr ates con tent mis alignmen t.
`XDemonst rates co ntent mi salignme nt.
`
`Figure 1
`
`Oracle Exhibit 1006, pg 2
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 2 of 26
`
`5,990,810
`
`Fixed an Id variab lie width Ipartiti loning.
`Fixe I d an Id variable wid Ith part Iitioning.
`
`Figure 2
`
`Oracle Exhibit 1006, pg 3
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 3 of 26
`
`5,990,810
`
`partitioning.
`Data-indep endent
`XData-inde pendent pa rtitioning . I
`
`Data-dep I edent I partiti Ioning.1
`XData-dep Iendent partiti I oning·1
`
`Figure 3
`
`Oracle Exhibit 1006, pg 4
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 4 of 26
`
`5,990,810
`
`..--- ----.-
`A
`B
`
`..--- ----.-
`A
`B
`
`..--- ----.-
`A
`B
`
`b
`
`k
`
`k
`
`k
`
`Function result space
`
`Figure 4
`
`Oracle Exhibit 1006, pg 5
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 5 of 26
`
`5,990,810
`
`A B
`..--- ----.-
`
`b
`
`p
`
`..
`Increase p
`
`F( bp-A+1···bp,b p+1···bp+B) -----:·~I (SUbclass)I
`
`Function result space
`
`Figure 5
`
`Oracle Exhibit 1006, pg 6
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 6 of 26
`
`5,990,810
`
`I
`
`I
`I
`I
`I
`I
`I
`~----_---~--------,------I---L-----------~-----,------------------~--~--------~
`I
`I
`I
`I :
`I
`I
`I
`I
`f
`
`f
`
`Figure 6
`
`r-----'-----~---------------------,------------------~-----T------------I------.
`I
`I
`I
`1
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`- ~ -
`- ~ -
`- ~
`
`-
`
`;
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`- -;- -
`
`-
`
`-
`
`-
`
`- L -
`
`-
`
`-
`
`- - ; -
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`- -'- -
`
`-
`
`-
`
`- _L. -
`
`-
`
`-
`
`- - ; -
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`F3 1
`F2 ~ -----L -
`F1 f
`
`Oracle Exhibit 1006, pg 7
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 7 of 26
`
`5,990,810
`
`I-----'------------------~
`I
`I
`I
`I
`L
`.1
`
`I
`'-
`
`~--------------------~---------I
`,
`I
`I
`,
`I
`---------------~--------~
`I
`
`r---------------~-----------------~------------------------~------------------I
`I
`I
`I
`I
`.L.
`'-
`L
`I
`I
`I
`I
`~
`
`J
`
`I
`
`F,
`
`1
`
`r-----·--------~--~------~--------r-----T-----~--~---------~--------~--------~
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`Figure 7
`
`Oracle Exhibit 1006, pg 8
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 8 of 26
`
`5,990,810
`
`~
`
`I
`I
`
`- - ~ - - ~ - -
`
`-
`
`-
`
`- - -;- -
`
`I
`
`I
`
`I
`I
`- L -
`
`-
`
`-
`
`-
`
`- ~ -
`
`-
`
`-
`
`-
`
`I
`
`I
`I
`- ~
`:
`
`- ~ -
`
`I
`
`I
`
`-
`
`-
`
`-
`
`- ~ -
`I
`
`F3 ;---------------------------------~------------------------~------------------:
`L
`~------------------~-----L-----~------------- __ I L ~-------------~
`F2:
`I
`I
`I
`I
`I
`I
`F1f - - -- - ~ - - 1 - -
`- J - - ~ - - ~ - - - - - -;- - - '- - - - - - - L - ~ -
`
`I
`
`I
`
`I
`
`I
`
`I
`
`I
`
`I
`
`I
`
`I
`
`I
`
`I
`
`Figure 8
`
`Oracle Exhibit 1006, pg 9
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 9 of 26
`
`5,990,810
`
`.----------------------------------------------------------------------------.
`
`I
`I
`I
`I
`~----------------------------------------------------------------------------~
`I
`I
`I
`I
`~
`1
`I
`I
`I
`
`~--------------,---------------------i--------------~---------------T--------~
`
`I
`I
`I
`I
`I
`I
`I
`I
`I
`f
`r-----r--------~--~------I------------~--------,------I--------- r -----.-
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I :
`I
`I
`I
`I
`I
`I
`I
`
`F,
`
`I
`
`I
`I
`
`, __ ,
`I
`I
`
`I
`I
`
`bL..-.-1.-.l...-.l-.....l..-....L--.L...-...l-----l...----L----l...-...J.-..I...--JL.....J..----l....--L.---l.-..l..--L-J.----L.----L..-.....l..-...L-J
`
`Figure 9
`
`Oracle Exhibit 1006, pg 10
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 10 of 26
`
`5,990,810
`
`.-- ----~
`
`A B
`
`k
`
`.-- ----~
`
`A B
`
`k
`
`.-- ----~
`
`A B
`
`k
`
`b
`
`Hashes of
`subblocks
`
`Figure 10
`
`Oracle Exhibit 1006, pg 11
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 11 of 26
`
`5,990,810
`
`.-- ----~
`
`A B
`
`k
`
`.-- ----~
`
`A B
`
`k
`
`.-- ----~
`
`A B
`
`k
`
`b
`
`Hash of subbblock
`I
`The subbblock itself
`I
`Hash of subbblock
`I
`
`Pointer to subbblock
`
`A Projection Of b
`
`Figure 11
`
`Oracle Exhibit 1006, pg 12
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 12 of 26
`
`5,990,810
`
`B
`
`A
`
`k
`
`..--- ----.-
`--
`
`....
`
`....
`
`....
`
`....
`
`,
`
`,
`
`,
`
`,
`
`B
`
`..--- ----.-
`
`A
`
`k
`
`B
`
`..--- ----.-
`
`A
`
`k
`
`b1
`
`....
`
`....
`
`--
`
`--
`
`--
`
`--
`
`---
`
`Compare
`
`..--- ----.(cid:173)
`k
`A B
`
`..--- ----.(cid:173)
`k
`A B
`
`Figure 12
`
`Oracle Exhibit 1006, pg 13
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 13 of 26
`
`5,990,810
`
`A B
`.--- ----.(cid:173)
`
`k
`
`A B
`.--- ----.(cid:173)
`
`k
`
`.--- ----.-
`k
`A
`B
`
`.--- ----.(cid:173)
`k
`A B
`
`Figure 13
`
`Oracle Exhibit 1006, pg 14
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 14 of 26
`
`5,990,810
`
`aardvark.txt -----..C3---f----------=~.. L...---'
`
`walrus.txt -------;~~)-
`
`sloth.dat ---~~~
`
`Entry lists
`
`Subblock storage
`
`Figure 14
`
`Oracle Exhibit 1006, pg 15
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 15 of 26
`
`5,990,810
`
`E>-··
`,....
`>-..
`>-
`
`C\J
`W
`
`-"'0
`
`~(
`
`~C
`
`Q)
`)
`::J
`10(cid:173)
`/)e
`
`o(
`
`)
`
`Q)a:--
`
`r------
`
`eX
`
`.
`,....
`:::::::: X
`........
`=:::=:
`..
`:.:.:. X
`~I~
`
`~
`
`~
`
`e
`"'0 0
`Q) .-
`~ .
`~ ctS
`E ex
`.....
`Q)
`C/)
`C/) 0
`e
`ctS Q)
`10-
`10- a.
`I- Q)
`
`10-
`
`It')
`~
`Q)
`~
`
`::s
`bJj
`."""
`
`~
`
`e
`X
`,....
`X
`..
`X
`
`E
`>-
`··
`,....
`>-
`>=
`
`,....
`W
`
`Oracle Exhibit 1006, pg 16
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 16 of 26
`
`5,990,810
`
`~...•.............,...~::::::::::::::::::=:: :::::::::: ::::::::::::
`• t. ..
`......
`
`X: X1 ..Xn
`
`••• « • t
`
`•
`
`t
`
`Y: Y1 .. Ym
`
`Figure 16
`
`Oracle Exhibit 1006, pg 17
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 17 of 26
`
`5,990,810
`
`-.. .. .. .. ..... ....
`~ ~...... .. .....
`.
`.
`......................
`.
`.
`.
`.
`•••••••.•.•.
`X: X1 .. Xn
`
`0.
`
`. . . . .t
`
`,
`
`1 - - - - - - -1 Projection of Y
`
`Figure 17
`
`Y: Y1 .. Ym
`
`I
`I
`I
`I
`
`I,
`
`I
`
`I
`I
`I
`
`I I
`
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`Oracle Exhibit 1006, pg 18
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 18 of 26
`
`5,990,810
`
`.- ----------------------.
`
`•
`•
`•
`•
`•
`•
`•
`•
`:
`X: X1 .. Xn
`:
`·- -- ------------------ - -'
`
`Y: Y1 ..Ym
`
`Figure 18
`
`Oracle Exhibit 1006, pg 19
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 19 of 26
`
`5,990,810
`
`Q)
`
`C Q
`
`)E
`
`i-
`
`E
`>-
`.,...
`>-
`>-
`
`-o
`
`en c
`Q)X
`:
`:~
`+-'
`.,...
`a5 X"'0
`
`~""""""L-II
`
`N
`W
`
`.,...
`W
`
`E
`>-
`.,...
`>-
`>-
`
`c
`X
`.
`.,...
`~~{~ X
`.
`~jjj~j
`X
`~:~:~:
`
`N
`W
`
`.,...
`w
`
`Oracle Exhibit 1006, pg 20
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 20 of 26
`
`5,990,810
`
`(J)
`
`c (
`
`J)
`
`E~
`
`-T"""
`W
`
`E>
`
`-
`T""">-
`>-
`
`C\J
`W
`
`T"""
`W
`
`c>
`
`<
`or-><
`><
`
`II
`
`Oracle Exhibit 1006, pg 21
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 21 of 26
`
`5,990,810
`
`C\I
`W
`
`E>
`
`-..
`~>-. .
`>-
`
`'+-1-0o
`cncnE
`Q)~>-
`.-t= C
`.
`+-'Q)""':
`a5Q5>-
`-0 '+-
`-Q )
`1-
`
`.............-....
`
`~
`
`C><
`><. .
`><
`
`~
`
`w
`
`Oracle Exhibit 1006, pg 22
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 22 of 26
`
`5,990,810
`
`C'I
`C'I
`QJ
`~
`=
`on
`."""
`~
`
`N
`W
`
`Q)
`C
`
`Q)
`E
`~
`
`.,...
`w
`
`.
`'-
`X
`......
`0
`~I
`~ j!j~j! X
`"'0
`Q)
`.c
`I-
`
`cX.
`
`,...
`X. .
`X
`
`N
`W
`
`.,...
`w
`
`.
`.-
`X
`......
`0
`~
`+-'
`+-'
`
`c
`Q)
`.-.
`"'0
`Q)
`.c
`l-
`
`......
`0
`Q) ~ •
`.c -- --
`I-EX
`Q)
`"'0
`
`cX.
`
`,...
`X..
`X
`
`Oracle Exhibit 1006, pg 23
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 23 of 26
`
`5,990,810
`
`..0
`
`l-LL
`00>
`"t- C
`(J)~
`c etS
`etS
`::J
`0 ) -
`~ ~
`0)
`
`(J)
`::J
`C\il-
`etSa.
`a.«
`
`".........
`
`(J) CO .-.
`"t-C ......... CO
`00'r-.....
`.- ~ CD
`+-,+-,"""
`0) . - \.""
`.........
`. ~
`en L
`etS 0> ('I)
`a.a.>,.-
`
`~
`
`Oracle Exhibit 1006, pg 24
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 24 of 26
`
`5,990,810
`
`-.-- ----..
`
`A B
`
`k
`
`-.-- ----..
`
`A
`
`k
`
`B
`
`-.-- ----..
`
`A B
`
`k
`
`b
`
`Hashes of
`subblocks
`
`Table of hashes
`
`Figure 24
`
`Oracle Exhibit 1006, pg 25
`
`
`
`d•
`'JJ.
`
`• ~~.
`
`....
`~=.....
`
`zo
`
`~ N~
`
`~
`
`'""'"'0
`'0
`'0
`
`'JJ.=(cid:173)~
`~....
`
`NU
`
`l
`o....,
`'1
`
`N0
`
`Ul
`....
`\C
`
`00
`
`\C=....
`~=
`
`X: X1 ..Xn: Current
`version (to be
`backed up).
`
`H(Y1 )... H(Ym)
`(Hashes of subblocks
`in previously backed
`up version (Y)).
`
`E1
`
`..
`.~~
`1
`.......:~I
`.........
`:.:.>:.:
`
`- 6
`-
`
`2
`
`I
`I
`I
`
`I
`I
`
`..
`
`D: Incremental
`backup
`consists of
`subblocks in X
`and references
`to subblocks in
`Y (and D).
`
`Figure 25
`
`-----------------
`
`I
`I
`I
`
`X: X1 ..Xn
`
`I
`----------~
`
`'AM«' ft • • L'
`
`X can be
`reconstructed
`from Yand D.
`~~~""ih~~ ~ J m
`:;::::::::::::::::::: ~j
`Y: Y1 ..Ym
`(Previously
`backed up
`version).
`E2
`
`Oracle Exhibit 1006, pg 26
`
`
`
`u.s. Patent
`
`Nov. 23, 1999
`
`Sheet 26 of 26
`
`5,990,810
`
`Filename
`
`Blocks
`
`aardvark.txt
`
`walrus.txt
`
`sloth.dat
`
`File Table
`
`I
`I
`I
`
`Hash
`830..
`9F8..
`AA6..
`092..
`3EB..
`
`Refs
`1
`2
`1
`1
`3
`
`421
`
`013
`
`I
`
`Ptr /
`
`- - /
`~
`
`I I
`
`I
`I
`
`o1234
`
`I
`I
`I
`
`544 VCJ D
`/
`--
`" ~
`-"""-
`..
`...
`
`I
`
`/1
`
`/-Pi
`
`..,l
`
`",l
`
`-
`
`~
`
`I
`
`I
`
`I
`
`0
`
`c=J
`
`I
`
`I
`
`0
`
`I
`I
`I
`
`I
`I
`I
`
`I
`I
`I
`
`Hash Table
`
`Figure 26
`
`Subblock storage system
`(Note: Subblocks are all
`different).
`
`Oracle Exhibit 1006, pg 27
`
`
`
`5,990,810
`
`1
`METHOD FOR PARTITIONING A BLOCK
`OF DATA INTO SUBBLOCKS AND FOR
`STORING AND COMMUNCATING SUCH
`SUBBLOCKS
`
`INTRODUCTION
`
`The present invention provides a method and apparatus
`for partitioning one or more blocks of data into subblocks for
`the purpose of communicating and storing such subblocks in
`an efficient manner.
`
`BACKGROUND
`
`5
`
`20
`
`Much of the voluminous amount of information stored,
`communicated, and manipulated by modern computer sys(cid:173)
`tems is duplicated within the same or a related computer
`system. It is commonplace, for example, for computers to
`store many slightly differing versions of the same document.
`It is also commonplace for data transmitted during a backup
`operation to be almost
`identical
`to the data transmitted
`during the previous backup operation. Computer networks
`also must repeatedly carry the same or similar data in
`accordance the requirements of their users.
`Despite the obvious benefits that would flow from a
`reduction in the redundancy of communicated and stored
`data, few computer systems perform any such optimization.
`Some instances can be found at the application level, one
`example being the class of incremental backup utilities that
`save only those files that have changed since the most recent
`backup. However, even these utilities do not attempt to
`exploit
`the significant similarities between old and new
`versions of files, and between files sharing other close
`semantic ties. This kind of redundancy can be approached
`only by analysing the contents of the files.
`The present invention addresses the potential for reducing 35
`redundancy by providing an efficient method for identifying
`identical portions of data within a group of blocks of data,
`and for using this identification to increase the efficiency of
`systems that store and communicate data.
`
`2
`subblocks whereas the data-dependent partitioning gener(cid:173)
`ates just four, allowing much of the similarity between the
`two blocks to be detected.
`The fact that a partitioning is data dependent does not
`imply that it must incorporate any knowledge of the syntax
`or semantics of the data. So long as the boundaries are
`positioned in a manner dependent on the local data content,
`identical subblocks are likely to be formed from identical
`portions of data, even if the two portions are not identically
`10 aligned relative to the start of their enclosing blocks (FIG.
`3).
`Once the group of blocks has blocks has been partitioned
`the resulting group of subblocks can be
`into subblocks,
`manipulated in a manner that exploits the occurrence of
`15 duplicate subblocks. This leads to a variety of applications,
`some of which are described below. However, the applica(cid:173)
`tion of a further aspect of the invention leads to even greater
`benefits.
`In a further aspect of the invention, the hash of one or
`more subblocks is calculated. The hash function can be an
`ordinary hash function or one providing cryptographic
`strength. The hash function maps each subblock into a small
`tractable value (e.g. 128 bits) that provides an identity of the
`25 subblock. These hashes can usually be manipulated more
`efficiently than their corresponding subblocks.
`Some applications of aspects of this invention are:
`Fine-grained incremental backups: Conventional incre(cid:173)
`mental backup technology uses the file as the unit of backup.
`30 However, in practice many large files change only slightly,
`resulting in a wasteful re-transmission of changed files. By
`storing the hashes of subblocks of the previous versions of
`files, the transmission of unchanged subblocks can be elimi(cid:173)
`nated.
`Communications: By providing a framework for commu(cid:173)
`nicating the hashes of subblocks, the invention can eliminate
`the ~ransmission of subblocks already possessed by the
`receIver.
`Differences: The invention could be used as the basis of
`40 a program that determines the areas of similarity and dif(cid:173)
`ference between two blocks.
`Low-redundancy file system: Data stored in a file system
`can be partitioned into subblocks whose hashes can be
`compared so as to eliminate the redundant storage of iden(cid:173)
`45 tical subblocks.
`Virtual memory: Virtual memory could be organized by
`subblock using a table of hashes to determine if a subblock
`is somewhere in memory.
`
`SUMMARY OF THE INVENTION
`
`To identify identical portions of data within a group of
`blocks of data, the blocks must be analysed. One simple
`approach is to divide the blocks into fixed-length (e.g.
`512-byte) subblocks and compare these with each other so
`as to identify all identical subblocks. This knowledge can the
`be used to manage the blocks in more efficient ways.
`Unfortunately, the partitioning of blocks into fixed-length
`subblocks does not always provide a suitable framework for 50
`the recognition of duplicated portions of data, as identical
`portions of data can occur in different sizes and places within
`a group of blocks of data. FIG. 1 shows how division into
`fixed-size subblocks of two blocks (whose only difference is
`the insertion of a single byte eX')) fails to generate identical 55
`subblocks. A comparison of the two groups of subblocks
`would reveal no identical pairs of subblocks even thought
`the two original blocks differ by just one character.
`A better approach is to partition each block using the data
`in the block itself to determine the position of the partitions. 60
`In an aspect of the invention, the blocks are partitioned at
`boundaries determined by the content of the data itself. For
`example, a block could be partitioned at each point at which
`the preceding three bytes has to a particular constant value.
`FIG. 2 shows how such a data dependent partitioning could 65
`turn out, and contrasts it with a fixed-length partitioning. In
`FIG. 3 data independent partitioning generates seven distinct
`
`Clarification of Terms
`
`The term block and subblock both refer, without
`limitation,
`to finite blocks or infinite blocks (sometimes
`called streams) of zero or more bits or bytes of digital data.
`Although the two different terms ("blocks" and "subblock")
`essentially describe the same substance (digital data), the
`two different terms have been employed in this specification
`to indicate the role that a particular piece of data is playing.
`The term "block" is usually used to refer to raw data to be
`manipulated by aspects of the invention. The term "sub(cid:173)
`block" is usually used to refer to a part of a block. "Blocks"
`are "partitioned" into "subblocks".
`The term partition has its usual meaning of exhaustively
`dividing an entity into mutually exclusive parts. However,
`within this specification, the term also includes cases where:
`Not all of the block is subdivided.
`Multiple overlapping subblocks are formed.
`
`Oracle Exhibit 1006, pg 28
`
`
`
`5,990,810
`
`5
`
`3
`A natural number is a non-negative integer (0, 1, 2, 3, 4,
`5, ... ).
`Where the phrase zero or more is used, this phrase is
`intended to encompass the degenerate case where the objects
`being enumerated are not considered at all, as well as the
`case where zero or more objects are used.
`BRIEF DESCRIPTION
`The following aspects of this invention are numbered for
`reference purposes. The terms "block" and "subblock" refer
`to blocks and subblocks of digital data.
`1. In an aspect of the invention, the invention provides a
`method for organizing a block b of digital data for the
`purpose of storage, communication, or comparison, by par(cid:173)
`titioning said block into subblocks at one or more positions
`klk+l within said block for which b[k-A+l ... k+B]
`satisfies a predetermine constraint, where A and Bare
`natural numbers.
`Note: The specification of this aspect encompasses the
`degenerate case in which either A or B is zero. The speci(cid:173)
`fication also includes the case where the constraint does not
`pay attention to some parts of b[k-A+l ... k+B]. For
`example, a constraint that pays attention only to (say) b[k-3]
`and b[k+2] would fall under the classes of constraint cor(cid:173)
`responding to A~4 and B~2.
`the invention
`2. In a further aspect of the invention,
`provides a method according to aspect 1 in which the
`constraint comprises the hash of some or all of b[k-A+1 ...
`k+B].
`3. In a further aspect of the invention,
`the invention
`~~~~~~~ss:b~~~~o~oa~~~~~~n~ntoaa:fct~c~i'af~ro~ft~~~n~ptz~ 30
`within a said block, comprising the step of:
`a. Evaluating whether said predetermined constraint is
`satisfied at each position klk+l,
`for
`increasing (or
`decreasing) k, where k starts with the value p.
`4. In a further aspect of the invention,
`the invention 35
`provides a method according to aspect 1, wherein one or
`more bounds are imposed on the size of one or more
`subblocks.
`the invention
`5. In a further aspect of the invention,
`provides a method according to aspect 1, wherein additional 40
`subblocks are formed from one or more groups of subblocks.
`6. In a further aspect of the invention,
`the invention
`provides a method according to aspect 1, wherein an addi(cid:173)
`tional hierarchy of subblocks is formed from one or more
`groups of contiguous subblocks.
`the invention
`7. In a further aspect of the invention,
`provides a method according to one of aspects 1 to 6,
`comprising the further step of:
`b. Calculating the hash of each of one or more of said
`subblocks.
`Note: The resulting collection of hashes is particularly
`useful if H is a strong one-way hash function.
`8. In a further aspect of the invention,
`the invention
`provides a method according to one of aspects 1 to 6,
`comprising the further step of:
`b. Forming a projection of said block, being an ordered or
`unordered collection of elements, wherein each element
`consists of a subblock, an identity of a subblock, or a
`reference of a subblock.
`Note: The specification of this aspect is intended to admit
`collections that contain a mixture of various kinds of iden(cid:173)
`tities and references.
`Note: In most applications, the output of this aspect will be
`an ordered list of hashes of the subblocks of the block.
`9. In a further aspect of the invention,
`the invention
`provides a method for comparing one or more blocks,
`comprising the steps of:
`
`4
`a. Partitioning one or more of said blocks into one or more
`subblocks in accordance with one of aspects 1 to 6.
`b. Forming a projection of each said block, being an
`ordered or unordered collection of elements, wherein each
`element consists of a subblock, an identity of a subblock, or
`a reference of a subblock.
`c. Comparing the elements of said projections of said
`blocks.
`10. In a further aspect of the invention, the invention
`10 provides a method for representing one or more blocks,
`comprising:
`(i) A collection of subblocks;
`filenames) which are
`(ii) Block representatives (e.g.
`mapped to lists of entries that identify subblocks;
`whereby the modification of one of said blocks involves the
`15 following steps:
`a. Partitioning some or all of said modified block into
`subblocks in accordance with one of aspects 1 to 6;
`b. Adding to said collection of subblocks zero or more
`subblocks that are not already in said collection, and updat(cid:173)
`20 ing said subblock list associated with said modified block.
`11. In a further aspect of the invention, the invention
`provides a method according to aspect 10, in which step b
`is replaced by:
`b. Removing from said collection of subblocks zero or
`25 more subblocks, and updating said subblock list associated
`with said modified block.
`12. In a further aspect of the invention, the invention
`provides a method according to aspect 10, in which step b
`is replaced by:
`b. Adding to said collection of subblocks zero or more
`subblocks that are not already in the collection, removing
`from said collection of subblocks zero or more subblocks,
`and updating said subblock list associated with said modi(cid:173)
`fied block.
`13. In a further aspect of the invention, the invention
`provides a method for an entity El to communicate a block
`X to El where El possesses the knowledge that E2 pos(cid:173)
`sesses a group Y of subblocks Y 1 . . . Y m' comprising the
`following steps:
`a. Partitioning X into subblocks Xl ... X n in accordance
`with one of aspects 1 to 6;
`b. Transmitting from El to E2 the contents of zero or more
`subblocks in X, and the remaining subblocks as references
`to subblocks in Y 1
`. Y m and to subblocks already
`.
`.
`45 transmitted.
`Note: In most implementations of this aspect, the subblocks
`whose contents are transmitted will be those in X that are not
`in Y, and for which no identical subblock has previously
`been transmitted.
`50 Note: To posses knowledge that E2 possesses Y 1 . . . Y m' El
`. Y m itself. El need only
`need not actually posses Y 1 .
`.
`posses the identities of Y l . . . Ym (e.g. the hashes of each
`subblock Y 1 . . . Y m)' This specification is intended to admit
`any other representation in which El may have the knowl-
`55 edge that E2 possesses (or has access to) Y 1 . . . Y m' In
`particular, the knowledge may take the form of a projection
`ofY.
`Note: It is implicit in this aspect the El will be able to use
`comparison (or other methods) to use its knowledge of E2's
`60 possession of Y to determine the set of subblocks that are
`common to both X and Y. For example, if El possessed the
`hashes of the subblocks of Y, it could compare them to the
`hashes of the subblocks of X to determine the subblocks
`common to both X and Y. Subblocks that are not common
`65 can be transmitted explicitly. Subblocks that are common to
`both X and Y can be transmitted by transmitting a reference
`to the subblock.
`
`Oracle Exhibit 1006, pg 29
`
`
`
`5,990,810
`
`5
`
`5
`14. In a further aspect of the invention, the invention
`provides a method for an entity E1 to communicate one or
`more subblocks of a group X of subblocks Xl ... X n to E2
`where E1 possesses the knowledge that E2 possesses the
`blocks Y, comprising the following steps:
`a. Partitioning Y into subblocks Y 1 . . . Ym in accordance
`with one or aspects 1 to 6;
`b. Transmitting from E1 to E2 the contents of zero or more
`subblocks in X, and the remaining subblocks as references
`to subblocks in Y and to subblocks already transmitted.
`15. In a further aspect of the invention, the invention
`provides a method for an entity E1 to communicate a block
`X to E2 where E1 possesses the knowledge that E2 pos(cid:173)
`sesses block Y, comprising the following steps:
`a. Partitioning in accordance with one of aspects 1 to 6, X 15
`into subblocks Xl ... X n and Y into subblocks Y 1 . . . Ym;
`b. Transmitting from E1 to E2 the contents of subblocks
`in X, and the remaining subblocks as references to subblocks
`in Y and to subblocks already transmitted.
`16. In a further aspect of the invention, the invention
`provides a method for constructing a block D from a block
`X and a group Y of subblocks Y 1 . . . Ym such that X can be
`constructed from Y and D, comprising the following steps:
`a. Partitioning X into subblocks Xl ... X n in accordance
`with one of aspects 1 to 6;
`b. Constructing D from one or more of the following: the
`contents of zero or more subblocks in X, references to zero
`or more subblocks in Y, and references to zero or more
`subblocks in D.
`Note: Step b above is intended to encompass the case where
`a mixture of the elements it describes is used.
`17. In a further aspect of the invention, the invention
`provides a method for constructing a block D from a group
`X of subblocks Xl ... X n and a block Y such that X can be
`constructed from Y and D, comprising the following steps:
`a. Partitioning Y into subblocks Y 1 . . . Ym in accordance
`with one of aspects 1 to 6;
`b. Constructing D from one or more of the following: the
`contents of zero or more subblocks in X, references to zero
`or more subblocks in Y, and references to zero or more
`subblocks in D.
`18. In a further aspect of the invention, the invention
`provides a method for constructing a block D from a block
`X and a block Y such that X can be constructed from Y and
`D, comprising the following steps:
`a. Partitioning in accordance with one of aspects 1 to 6, X
`into subblocks Xl ... X n and Y into subblocks Y 1 . . . Ym;
`b. Constructing D from one or more of the following: the
`contents of zero or more subblocks in X, references to zero
`or more subblocks in Y, and references to zero or more
`subblocks in D.
`19. In a further aspect of the invention, the invention
`provides a method for constructing a block D from a block
`X and a projection of Y, said projection comprising an
`ordered or unordered collection of elements wherein each 55
`element consists of a subblock in Y, an identity of a subblock
`in Y, or a reference of a subblock in Y, such that X can be
`constructed from Y and D, comprising the following steps:
`a. Partitioning X into subblocks Xl ... X n in accordance
`with one of aspects 1 to 6;
`b. Constructing D from one or more of the following: the
`contents of zero or more subblocks in X, references to zero
`or more subblocks in Y, and references to zero or more
`subblocks in D.
`20. In a further aspect of the invention, the invention
`provides a method for constructing a block X from a block
`Y and a block D, comprising the following steps:
`
`6
`a. Partitioning Y into subblocks Y 1 . . . Ym in accordance
`with one or aspects 1 to 6;
`b. Constructing X from D and Y by constructing the
`subblocks of X based on one or more of:
`(i) subblocks contained within D;
`(iii) references in D to subblocks in Y;
`(iii) references in D to subblocks in D;
`21. In a further aspect of the invention, the invention
`provides a method for constructing a group X of subblocks
`10 Xl ... X n from a block Y and a block D, comprising the
`following steps:
`a. Partitioning Y into subblocks Y 1 . . . Ym in accordance
`with one of aspects 1 to 6;
`b. Constructing Xl ... X n from D and Y based on one or
`more of:
`(i) subblocks contained within D;
`(iii) references in D to subblocks in Y;
`(iii) references in D to subblocks in D;
`22. In a further aspect of the invention, the invention
`provides a method for communicating a data block X from
`20 one entity E1 to another entity E2 comprising the following
`steps:
`a. Partitioning X into subblocks Xl ... X n in accordance
`with one of aspects 1 to 6;
`b. Transmitting from E1 to E2 an identity of one or more
`25 subblocks;
`c. Transmitting from E2 to E1 information communicat(cid:173)
`ing the presence or absence of subblocks at E2;
`d. Transmitting from E1 to E2 at least the subblocks
`identified in step (c) as not being present at E2.
`30 Note: The information communicated in step (c) could take
`the form of a bitmap (or a compressed bitmap) correspond(cid:173)
`ing to the subblocks referred to in step (a). It could also take
`many other forms.
`Note: If a group of subblocks are to be transmitted, the above
`steps could be performed completely for each subblock
`35 before moving onto the next subblock. The steps could be
`applied to any subgroup of subblocks.
`23. In a further aspect of the invention, the invention
`provides a method for communicating a block X from one
`entity E1 to another entity E2, comprising the following
`40 steps:
`a. Partitioning X into subblocks Xl ... X n in accordance
`with one of aspects 1 to 6;
`b. Transmitting from E2 to E1 information communicat(cid:173)
`ing the presence or absence at E2 of members of a group Y
`45 or subblocks Y 1 . . . Ym;
`c. Transmitting from E1 to E2 the contents of zero or more
`subblocks in X, and the remaining subblocks as references
`to subblocks in Y 1 . . . Ym and to subblocks transmitted.
`24. In a further aspect of the invention, the invention
`50 provides a method for an entity E2 to communicate to an
`entity E1 the fact that E2 possesses a block Y, comprising the
`following steps:
`a. Partitioning Y into subblocks Y 1 . . . Ym in accordance
`with one or aspects 1 to 6;
`b. Transmitting from E2 to E1 references of the subblocks
`Y l · · · Y m ·
`25. In a further aspect of the invention, the invention
`provides a method for an entity E1 to communicate a
`subblock Xi to an entity E2, comprising the following steps:
`a. Partitioning X into subblocks Xl ... X n in accordance
`with one of aspects 1 to 6;
`b. Transmitting from E2 to E1 an identity of Xi;
`c. Transmitting Xi from E1 to E2.
`Note: This aspect applies (among other applications) to the
`65 case of a network server E1 that serves subblocks to clients
`such as E2, given the identities (e.g. hashes) of the requested
`subblocks.
`
`60
`
`Oracle Exhibit 1006, pg 30
`
`
`
`5,990,810
`
`5
`
`7
`26. In a further aspect of the invention, the invention
`provides a method according to one of aspects 1 to 6,
`wherein said subblocks are compared by comparing the
`hashes of said subblocks.
`27. In a further aspect of the invention, the invention
`provides a method according to one of aspects 1 to 6,
`wherein subsets of identical subblocks within a group of one
`or more subblocks are found, by inserting each subblock, an
`identity of each subblock, a reference of each subblock, or
`a hash of each subblock, into a data structure.
`28.
`In further aspect of the invention,
`the invention
`provides an apparatus for organizing a block b of digital data
`for the purpose of storage, communication, or comparison,
`by partitioning said block into subblocks at one or more
`positions klk+1 within said block for which b[k-A+1 ...
`k+B] satisfies a predetermined constraint, where A and Bare
`natural numbers.
`Note: The specification of this aspect encompasses the
`degenerate case in which either A or B is zero. The speci(cid:173)
`fication also includes the case where the constraint does not
`pay attention