throbber
Ulllted States Patent
`
`[19]
`
`[11] Patent Number:
`
`5,978,791
`
`Farber et al.
`
`[45] Date of Patent:
`
`Nov. 2, 1999
`
`US005978791A
`
`.................... .. 364/200
`. . . .. 364/900
`395/600
`
`
`
`5/1990 Holloway et al.
`4,922,414
`11/1990 Burke . . . . . . .. . . . . . . .
`4,972,367
`4/1991 Bendert et al.
`5,007,658
`6/1991 C110 .................................. ..
`5,025,421
`9/1991 Marca ................................... .. 364/200
`5,050,074
`9/1991 Dyson ........ ..
`380/25
`5,050,212
`5,057,837 10 1991 Colwell et al.
`..
`341 55
`5,129,081
`741992 Kobayashi et al.
`. 395/6/00
`5,129,082
`7/1992 Tirfing etal.
`395/600
`5,144,667
`9/1992 Po
`e, Jr. et al.
`380/45
`5,179,680
`1/1993 Colgxlirell et al.
`......
`. 395/425
`5,202,982
`4/1993 Gramlich et al.
`. 395/600
`
`[54] DATA PROCESSING SYSTEM USING
`SUBSTANTIALLY UNIQUE IDENTIFIERS T0
`IDENTIFY DATA ITEMS, WHEREBY
`
`IDENTIFIERS
`
`[75]
`
`.
`.
`.
`.
`Inventors: David A. Farber, O]a1, Cal1f.; Ronald
`D- Lachmana Nofthbfooka 111-
`[73] Assigneei Kinetech, 1110-, Northbrook, 111.
`
`[21] App]. NO‘: 08/960,079
`
`[22]
`
`Filed:
`
`Oct. 24, 1997
`
`Related U_S_ Application Data
`
`[63] Continuation of application No. 08/425,160, Apr. 11, 1995,
`abandoned.
`
`5,208,858
`5,276,901
`5,301,286
`5,301,316
`5,343,527
`5,357,623
`57384565
`5,404,508
`
`5/1993 Vollert et al.
`1/1994 Howell et al.
`4/1994 Rajani
`. . . . . .. . . . . . . .
`4/1994 Hamilton et al.
`8/1994 M0016 . . . . . .. . . . . . . . . . .
`10/1994 Megory—Cohen
`1/1995 Cannon """ "
`4/1995 Konrad et al.
`
`380/43
`.. 395/800
`. . . .. 395/400
`. 395/600
`. . . .. 380/4
`395/425
`340/82544
`........................ .. 395/600
`
`
`
`OTHER PUBLICATIONS
`Witold Litwin et al, Linear Hashing for Distributed Files,
`ACM SIGMOD’ May, 1993 pp. 327_336.
`Ming—Ling Lo, et al, On Optimal Processor Allocation to
`Support Pipelined Hash Joins, ACM SIGMOD, pp. 69-78,
`1}/[lay 1992.13
`DE
`. 1C
`1
`. M 01232
`‘h
`omas
`.
`erson
`1 erentia
`r
`tana s1s
`o
`wit
`Applications to M135, 1313- 69-31911992 y
`
`(List continued on next page.)
`
`Primary Examiner—Paul V. Kulik
`Assistant Examiner—Jean
`Hgmere
`
`Attorney, Agent, or Firm—Pillsbury Madison & Sutro LLP
`
`[57]
`
`ABSTRACT
`
`In a data processing system, a mechanism identifies data
`items by substantially unique identifiers which depend on all
`of the data in the data items and only on the data in the data
`items. The system also determines whether a particular data
`item is present in the database by examining the identifiers
`of the plurality of data items
`'
`
`48 Claims, 31 Drawing Sheets
`
`[51]
`Int. Cl.6 .................................................... .. G06F 17/30
`
`[52] U_'S' Cl‘ ‘
`" 707/2; 707/1; 707/200
`[58] Fleld Of Search ..................................... 707/2, 1, 200
`,
`Referenees Clted
`U.S. PATENT DOCUMENTS
`6/1972 Evangelistietal.
`................. 340/172.5
`7/1980 Mitchell et al.
`..
`364/200
`9/1981 Cichelli et al.
`364/200
`3/1983 Rivest
`......... ..
`364/900
`9/1983 Rivest et al.
`..
`.. 178/22.1
`10/1983 Neches et al.
`364/200
`
`[56]
`
`3,668,647
`4,215,402
`4,290,105
`4,376,299
`4,405,829
`4,412,285
`
`364/200
`11/1983 Summer: J“ et a1~
`4>414a624
`364/200
`4/1984 Fletcher et al.
`..... ..
`4,441,155
`364/200
`8/1984 Benhase et al.
`..
`4,464,713
`364/200
`4,490,782 12/1984 Dixon et al.
`.... ..
`364/900
`4,571,700
`2/1986 Emry, Jr, et a1,
`,
`365/189
`4,577,293
`3/1986 Matick et al.
`364/900
`4,642,793
`2/1987 Meaden ~~~~~~~ ~-
`364/200
`4575310
`6/1987 Gfuner et a1-
`365/185
`436919299
`9/1987 Rwest et al‘ ““ “
`364/200
`4,725,945
`2/1988 Kronstadtetal.
`364/900
`4,773,039
`9/1988 Zamora ........... ..
`364/900
`4,887,235
`12/1989 Holloway et al.
`4,888,681
`12/1989 Barnes et al.
`......................... .. 364/200
`
`
`
`104
`sromes
`DEVICE
`
`104
`STORAGE
`DEVICE
`
`102
`PROCESSOR
`
`.
`
`.
`
`.
`
`102
`PROCESSOR
`
`ms
`
`102
`PROCESSOR
`
`102
`PROCESSOR
`
`102
`PROCESSOR
`
`PROCESSOR
`
`100
`CPU
`
`3
`3
`
`STORAGE
`nsvucz
`
`114
`
`MEMORY
`132
`A;
`134
`AL
`156
`SM];
`152
`an
`154
`5,“,
`
`126
`
`128
`
`1:10
`
`136
`
`153
`
`GOOG-1001-Page 1 of 56
`
`GOOG-1001-Page 1 of 56
`
`

`
`5,978,791
`Page 2
`
`OTHER PUBLICATIONS
`
`William Perrizo, et al., Distributed Join Processing Perfor-
`mance Evaluation, 1994. Twenty-Seventh Hawaii Interna-
`tional Conference on System Sciences, vol. II, pp. 236-244.
`A concurrency Control Mechanism based on Extendible
`Hashing for Main Memory Database Systems, Vij ay Kumar,
`pp. 109-113, ACM, vol. 3, 1989.
`Birgit Pfitzmann, Sorting Out Signature Schemes, Nov.
`1993, 1st Conf. Computer & Comm. Security ’93 pp. 74-85.
`Bert dem Boer, et al., Collisions for the compression func-
`tion of MDS pp. 292-304, 1994.
`Sakti Pramanik, et al., Multi-Directory Hashing, 1993, Info.
`Sys., vol. 18, No. 1, pp. 63-74.
`Murlidhar Koushik, Dynamic Hashing With Distributed
`Overflow Space: A File Organization With Good Insertion
`Performance, 1993, Info. Sys., vol. 18, No. 5, pp. 299-317.
`Witold Litwin, et al., LH*-Linear Hashing for Distributed
`Files, HP Labs Tech. Report No. HPL-93-21 Jun. 1993 pp.
`1-22.
`
`Yuliang Zheng, et al., HAVAL — A One-Way Hashing
`Algorithm with Variable Length of Output
`(Extended
`Abstract), pp. 83-105, Advances in Cryptology, AUSCRIPT
`’92, 1992.
`
`Chris Charnes and Josef Pieprzky, Linear Nonequivalence
`versus Nonlinearity, Pieprzky, pp. 156-164, 1993.
`Zhiyu Tian, et al., A New Hashing Function: Statistical
`Behaviour and Algorithm, pp. 3-13, SIGIR Forum, 1993.
`G. L. Friedman, Digital Camera With Apparatus For
`Authentication of Images Produced From an Image File,
`NASA Case No. NPO—19108—1—CU, Serial No. 08/159,980,
`Nov. 24, 1993.
`H. Goodman, Feb. 9, 1994 Ada, Object-Oriented Tech-
`niques, and Concurrency in Teaching Data Sructures and
`File Management Report Documentation P. AD-A275
`385 — 94-04277.
`
`Advances in Cryptology-EUROCRYPT ’93, Workshop on
`the Theory and Application of Cryptographic Techniques
`Lofthus, Norway, May 23-27, 1993 Proceedings.
`Proceedings of the 1993 ACM SIGMOD International Con-
`ference on Management of Data, vol. 22, Issue 2, Jun. 1993.
`Advances in Cryptology-AUSCRYPT ’92 — Workshop on
`the Theory and Application of Cryptographic Techniques
`Gold Coast, Queensland, Australia Dec. 13-16, 1992 Pro-
`ceedings.
`Search Report dated Jun. 24, 1996.
`
`GOOG-1001-Page 2 of 56
`
`GOOG-1001-Page 2 of 56
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 1 of 31
`
`5,978,791
`
`2:
`
`
`
`
`
`m_OmwmOOm_n_mowmmoommmowwmoomm
`
`N2N9N3
`
`
`
`...mo_>mn_...m_o_>ma
`
`
`
`
`momwmoommmomwmoommmw<m2.mmwéohw
`
`N2N2a3G:.o_..._
`
`o_..
`
`GOOG-1001-Page 3 of 56
`
`GOOG-1001-Page 3 of 56
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 2 of 31
`
`5,978,791
`
`>mos_ms_
`
`N9.
`
`n_<
`
`V2
`
`.2
`
`m2
`
`ezm
`
`Nm_.
`
`.75
`
`V9.
`
`o<m
`
`¢N_.
`
`mm:
`
`mm?
`
`mu:
`
`wN_.
`
`._.m
`
`5o?
`
`09
`
`._..._
`
`mfl
`
`40
`
`3:.o_.._
`
`GOOG-1001-Page 4 of 56
`
`GOOG-1001-Page 4 of 56
`
`
`
`

`
`U.S. Patent
`
`MN
`
`m
`
`MS
`
`13
`
`5,978,791
`
`
`
`9>mo_.ommE...>mo5mm_o>moBmm_o
`
`2.,2;2.m:
`
`Mm.__u_...m.__n_m.__u_Mourour8.
`
`NN_.
`
`N5.N2
`
`kzmswmw...
`
`
`
`wzmswmmhzmsomm
`
`GOOG-1001-Page 5 of 56
`
`w_._..m.=..._
`
`sm:.m>mN
`
`GOOG-1001-Page 5 of 56
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 4 of 31
`
`5,978,791
`
`
`
`FIG. 3
`
`Pathname
`
`.38
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Time of last access
`
`Time of last modification
`
`
`
`
`
`
` ‘
`
`
`
`
`
`
`:40
`
`True Name
`
`Com-ressed File ID
`
`
`
`
`
`
`
`Time of last access
`
`
`Groomino delete count
`
`I42
`
`
`
`
`
`
`
`
`
`GOOG-1001-Page 6 of 56
`
`
`
`
`
`
`F|G.5
`
`GOOG-1001-Page 6 of 56
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 5 of 31
`
`5,978,791
`
`.EmummfiHB OEMZ
`
`
`
`HMCH.w.no
`
`N.o_n_
`
`¢¢_
`
`o:
`
`
`
`u«awnwHwm>mmouaom
`
`
`
`cowumooamousom
`
`m.sumuusom
`m.o_.._
`
`m¢_
`
`on.
`
`0502OUHH.
`
`OEMCSUMW
`
`uucouomumu
`
`
`
`050.20.D..H.H.
`
`.~
`
`mg“.
`
`mOE
`
`GOOG-1001-Page 7 of 56
`
`GOOG-1001-Page 7 of 56
`
`
`
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 6 of 31
`
`5,978,791
`
`FIG. IO(a)
`
`SIMPKE
`
`DATA ITEM
`
`COMPUTE MD FUNCTION ON
`
`DATA ITEM
`
`8214
`
`APPEND LENGTH MODULO 32 OF
`
`DATA ITEM
`
`TRUE NAME
`
`1
`
`GOOG-1001-Page 8 of 56
`
`GOOG-1001-Page 8 of 56
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 7 of 31
`
`5,978,791
`
`
`
`
`
`S220
`PARTITION DATA ITEM INTO
`SEGMENTS
`
`FIG. IO(b)
`
`S222
`
`_ _ _ _ ..._
`
`ASSIMILATE EACH SEGMENT
`
`(COMPUTING rrs TRUE NAME)
`
`S224
`
`CREATE INDIRECT BLOCK OF
`SEGMENT TRUE NAMES
`
`,r‘“''s:21‘e'”" \
`COMPUTE TRUE :
`; NAME OF SIMPLE 1
`1
`DATA ITEM
`:
`\ . _ _ _ _ _
`
`S226
`
`ASSIMILATE INDIRECT BLOCK
`
`(COMPUTING rrs TRUE NAME)
`
`ITEM
`
`S228
`
`REPLACE FINAL 32 BITS OF TRUE
`NAME WITH LENGHT MOD 32 OF DATA
`
`GOOG-1001-Page 9 of 56
`
`GOOG-1001-Page 9 of 56
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 8 of 31
`
`5,978,791
`
`>m._.zmmmoo
`
`~.n=m.=n_m><:
`
`mm:
`
`8%
`
`
`
`n:m.=.._mmohm.
`
`wmmm
`
`a.m.__u_mfido
`
`mm:
`
`ms_<zM55mmoo
`
`m.=..._mnmpZ.Bum.
`
`~.>Em_omm
`
`8%
`
`mz_s_mm:.mo
`
`ms<zmam»
`
`__.o_.._
`
`ommm
`
`
`
`>mFzmzmzm:.<mmo..
`
`
`
`_.O...._..zDO0mm:Em..
`
`
`
`
`
`
`
`mo._m_u_mmE.oEm..D.m.=u_mmopw..
`
`GOOG-1001-Page 10 of 56
`
`GOOG-1001-Page 10 of 56
`
`
`
`
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 9 of 31
`
`5,978,791
`
`FIG. I2
`
`
` S238
`S240
`UPDATE
`
`
`DEPENDENCY
`
`
`LOCKED?
`LIST
`
`
`
`FILE
`
`S242
`
`SEND MESSAGE TO
`
`
`
`
`
`CACHE SERVER TO
` S244
`
`UPDATE CACHE
`
`(IF DESIRED)
`
`COMPRESS
`
`
`
`MIRROR
`
`
`(IF DESIRED)
`
`
`
`S246
`
`GOOG-1001-Page 11 of 56
`
`GOOG-1001-Page 11 of 56
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 10 of 31
`
`5,978,791
`
`FIG. I3
`
`S250
`
`SEARCH FOR
`THE
`
`PATHNAME
`
`‘ .
`
`.
`
`‘ '
`
`FAIL
`
`
`
`
`LDE INCLUDES
`
`TRUE NAME?
`
`S258
`
`
`
`
`
`ASSIMILATE
`FILE ID
`
`
`
`
`
`LDE IDENTIFIES
`
`DIRECTORY?
`
`YS
`
`S256
`
`FREEZE
`
`DIRECTORY
`
`GOOG-1001-Page 12 of 56
`
`GOOG-1001-Page 12 of 56
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 11 of 31
`
`5,978,791
`
`S260
`
`CONFIRM THAT
`
`TRUE NAME
`
`EXISTS LOCALLY
`
`FIG. I4
`
`
`
`
` S262
`SEARCH FOR
`
`PATHNAME IN
`
`LDE TABLE
`
`S264
`
`
`
`
`
`
`
`CONFIRM THAT
`
`DIRECTORY
`
`EXISTS
`
`S266
`
`
`
`S268
`
`NAMED FILE
`
`EXISTS?
`
`DELETE
`
`TRUE FILE
`
`S270
`
`
`
`CREATE
`
`GOOG-1001-Page 13 of 56
`
`ENTRY IN LDE
`
`
`
`
`
`& UPDATE
`
`GOOG-1001-Page 13 of 56
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 12 of 31
`
`5,978,791
`
`mmzommmm
`
`m>:._mo..
`
`Sum
`
`"Fmozmm
`
`._._<>>wmw<mmms_
`mo".
`
`mwzommmm
`
`2%
`
`
`
`m.__u_m:E.mmwzm
`
`
`
`O._.z_omzmamm
`
`mu:
`
`mmzomwmm
`
`m>:.<wmz
`
`mm;
`
`Eommmoom.<29200..2
`
`Nwmm
`
`
`
`mam»E_mm>
`
`"5m.__"_
`
`ammamo
`
`8mm
`
`m.__"_02.".
`
`m_.o_.._
`
`GOOG-1001-Page 14 of 56
`
`GOOG-1001-Page 14 of 56
`
`
`
`
`
`

`
`U.S. Patent
`
`mN
`
`S
`
`5,978,791
`
`1,>z<2.Sum
`om:.om._mW ommomwmoomm
`
`mm:
`
`Em.or.as
`
`
`1.._.zm_._o
`1moBmmzommmmM.m»m<oo<ommms:.mm:
`
`.5ms:
`
`MMZOQMMQ
`
`i-li+|iI-
`
`m\<R.01
`
`«mum
`
`..zm_._o
`
`..&.om.m_m
`
`.wEommmoom._
`
`GOOG-1001-Page 15 of 56
`
`GOOG-1001-Page 15 of 56
`
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 14 of 31
`
`5,978,791
`
`max»".0momaow
`
`
`
`_20mEmmm"_u__oms_<z
`
`~zo:<z:.wmo
`
`Sum
`
`mmopm
`
`n:mommmoomm
`
`Emmw
`
`mz_s_mm:.mo
`
`
`
`m_.<n_zo:<m_n_xm
`
`.5...O._.oo<oz<
`
`ca
`‘£
`
`4m_momaom
`
`oz_:m_._m:n_
`
`oommm
`
`>.~.s_m»m>m
`
`mommm
`
`mo".mE._.n5X004
`
`on?wms_<zmam»
`
`
`
`n:zO_._.<OO._momsow
`
`
`
`MO".mo.momzowO._.
`
`ms_<zm:E.
`
`
`
`
`
`O._.mo<mwm_zozmw
`
`momnom20
`
`mommmoomm
`
`
`
`m.__.._mamhm>mmwmx
`
`Baum
`
`
`
`35..o_..._
`
`GOOG-1001-Page 16 of 56
`
`GOOG-1001-Page 16 of 56
`
`
`
`
`
`
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 15 of 31
`
`5,978,791
`
` mm:
`
`MO".0.m.__n_
`
`
`
`~.>E.zmm=._._.
`
`O2
`
`
`
`mmmmm..m>
`
`8%
`
`mwmmmsoomo
`
`omwmmmmsoo
`
`S.m.__"_
`
`mm;
`
`.=<..._
`
`oz
`
`>Ezmm.__n_mam
`
`mzmkmo".E»z_
`
`~.ms_<z
`
`A32.2“.
`
`GOOG-1001-Page 17 of 56
`
`GOOG-1001-Page 17 of 56
`
`
`
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 16 of 31
`
`5,978,791
`
`i
`
`88
`
`n:mmokm
`
`38
`
`5m._mw
`
`wemomaom
`
`9momaow
`
`NEOEO2
`
`88
`
`m:.<oo._
`
`
`
`m.=...._m.Ss_mm
`
`EVE.o_..._
`
`mm:
`
`
`
`mSE.m~_._<mm
`
`
`
`s_om”_m.__"_
`
`Emomsom
`
`GOOG-1001-Page 18 of 56
`
`GOOG-1001-Page 18 of 56
`
`
`
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 17 of 31
`
`5,978,791
`
`.3w_
`or.
`
`mo".0.m.=n_
`
`~.m.__"_may:
`
`«N8
`
`mama.M252
`
`._<oo._w.=n_
`
`
`
`0.50:0:o._.<mo
`
`max»"5>n_Oomm
`
`._.m.__n_
`
`m:.m._momSm
`
`m.__n_mam»
`
`mm_"=._.zmn__mo._
`
`mamwwz:m_xm
`
`3...".
`
`ommw
`
`zmzmbqmmo
`
`
`
`m.=..._zobaom
`
`GOOG-1001-Page 19 of 56
`
`GOOG-1001-Page 19 of 56
`
`
`
`
`
`
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 18 of 31
`
`5,978,791
`
`wmmm
`
`m>o_2mmwn:m.Em><m
`m..:.
`
`>Ezm
`
`mm:
`
`ll-
`
`88
`
`zmzO._.w.E>n_Oo
`
`
`
`35..o_u_
`
`
`
`
`
`.m._m<..mo._Z_9..u._u=..._mmohw.m.=.._
`
`pzaoo
`
`
`
`mm:pzmsmmomo
`
`GOOG-1001-Page 20 of 56
`
`GOOG-1001-Page 20 of 56
`
`
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 19 of 31
`
`5,978,791
`
`mmmm
`
`m.E.__s__mm<
`
`om:.<.=s__ww<z:
`
`m.=..._
`
`vmmm
`
`u:mNmmE
`
`>mEomma
`
`
`
`Em..o_.._
`
`$8
`
`_.zmsm_moz_
`
`
`
`X004mnmmmm
`
`_._o<mmo”.
`
`m._.<z_omom:m
`
`QZ<m.__n_
`
`mm...2.>m9.omm_n
`
`
`
`>moSmm_n_zm>_w
`
`GOOG-1001-Page 21 of 56
`
`GOOG-1001-Page 21 of 56
`
`
`
`
`
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 20 of 31
`
`5,978,791
`
`ovmm
`
`nmmamo
`
`
`..<ZO_._._Qo<<._.<Dzmzn_z<m.__n_
`
`DmoommOh>m._.zmoQ<m:.<z_nmom:m
`
`wmmm:o<mmo".
`
`zoP<s_mou_z_
`
`
`
`35..o_.._
`
`_2m._._MI...z_>mo5mm_o
`
`
`
`>mo5mm_azm>_c
`
`$8
`
`
`
`m:._._.m:.<.__s=mm<
`
`
`
`
`
`S_m_._._<._.<D>>mZ
`
`Emw
`
`pzmsmmumo
`
`mwmmmumm»
`
`K004
`
`GOOG-1001-Page 22 of 56
`
`GOOG-1001-Page 22 of 56
`
`
`
`
`

`
`U.S. Patent
`
`mN
`
`2.,
`
`m
`
`5,978,791
`
`w9mm:05mo".mmosozEmmw3%$8
`
`ovmm
`
`mam...mxsz
`
`
`
`._<UO._m.__n_
`
`ON.0_n_
`
`
`
`
`
`>mo.6mm_o>moBmm_omm_Ezmmzoo
`
`
`
` Mm_2<zE.<n_M._.S"_mbmmom88
`
`$8
`
`
`
`O._.I._.<n_X23
`
`m5<zm:mk
`
`>mpzm
`
`GOOG-1001-Page 23 of 56
`
`GOOG-1001-Page 23 of 56
`
`
`
`
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 22 of 31
`
`5,978,791
`
`
`
` S354
`WAIT FOR
`
`
`
`FREEZE LOCK
`
`
`
`TO TURN OFF
`
`FlG.2|
`
`
`
`S356
`
`FIND TFR
`
`ENTRY
`
`
`
` S358
`DECREMENT
`
`REFERENCE
`
`COUNT
`
`REFERENCE COUNT IS
`
`
`
`
`S362
`
`DELETE
`
`TRUE FILE
`
`
`
`ZERO & NO DEPENDENT
`
`SYSTEMS IN TFR?
`
`S364
`
`
`REMOVE FILE ID
`AND COMPRESSED
`
`FILE ID
`
`
`
`GOOG-1001-Page 24 of 56
`
`GOOG-1001-Page 24 of 56
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 23 of 31
`
`5,978,791
`
`S365
`
`GET
`
`OPERATION
`
`3365
`CREATE oR
`MODIFY?
`
`cow OR DELETE
`COMPOUND?
`
`
`
`
`YES
`
`YES
`
`S368
`
`ASSIMILATE
`
`F"-E
`
`3369
`
`NEW TRUE
`
`S378
`
`3370
`
`Mompy USE
`COUNT OF EACH
`COMPONENT
`
`RECORD TRUE
`NAME IN AUDIT
`
`FILE
`
`S379
`
`FOR EACH PARENT
`
`DIRECTORY OR FILE,
`
`UPDATE USE COUNT,
`LAST ACCESS AND
`
`MODIFY TIMES
`
`GOOG-1001-Page 25 of 56
`
`GOOG-1001-Page 25 of 56
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 24 of 31
`
`5,978,791
`
`FIG. 23
`
`
`
` S382
`VERIFY
`
`GROOMING
`
`LOCK OFF
`
`
`
`GROOMING
`
`LOCK
`
`S386
`
`SET GROOM
`
`COUNTS
`
`GOOG-1001-Page 26 of 56
`
`
`
`S384
`
`SET
`
`GOOG-1001-Page 26 of 56
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 25 of 31
`
`5,978,791
`
`
`
`
`FIND LDE
`
`S388
`
`RECORD
`
`
`
`S390
`
`FIND TFR
`
`RECORD
`
`S392
`
`FIG. 24
`
`GOOG-1001-Page 27 of 56
`
`
`
`
`
`DELETE COUNT
`
`
`S394
`ADJUST FILE
`
`
`
`SIZES
`
`INCREMENT
`
`GROOMING
`
`GOOG-1001-Page 27 of 56
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 26 of 31
`
`5,978,791
`
`FIG. 25
`
`S396
`
`DELETE
`
`FEE
`
`S398
`
`UNLOCK
`
`GROOMING
`
`LOCK
`
`GOOG-1001-Page 28 of 56
`
`GOOG-1001-Page 28 of 56
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 27 of 31
`
`5,978,791
`
`GEN.o_.._
`
`-...m
`
`:m=._omn_
`
`>._zo.o<mm
`
`>m0.5mm..
`
`ovm
`
`oz_mm
`
`.Em.Emmo
`
`GOOG-1001-Page 29 of 56
`
`GOOG-1001-Page 29 of 56
`
`
`
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 28 of 31
`
`5,978,791
`
`omvm
`
`._<oo._mxsz
`
`
`
`n:m._:zmammwzo_wmm>
`
`mu:E01".
`
`52n:X004
`
`9200..
`
`:5
`
`m:.<mmo
`
`zobaow
`
`>mO0
`
`§oN.o_n_
`
`m.__"_
`
`._.om:o<o
`
`25
`
`mzmm
`
`>._m._.m.Es_oo
`
`
`
`zmEEsm..
`
`§.m
`
`zmBum
`
`Q.
`
`
`
`m.__u_zobqmom
`
`m._Emm<mm
`
`85
`
`mbmmo
`
`
`
`m.=n__._0._.<m_ow
`
`GOOG-1001-Page 30 of 56
`
`GOOG-1001-Page 30 of 56
`
`
`
`
`
`
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 29 of 31
`
`5,978,791
`
`..._m_:oE
`
`zo_E._mo
`
`Nmvm
`
`
`
` >Ezm._.mama...mz_sEmEo
`
`
`
`mo".momoomm
`
`m.__“_
`
`.Oomoommmm:0.
`
`Z_moomxoo._m.=u_
`
`>._zo-o<mm
`
`tEo5mm_o
`
`vmvm
`
`
`
`mam...>n.=._.zmn=
`
`GKN
`or.
`
`ms_<z
`
`
`
`
`
`m:E.20¢".m.=..._
`
`GOOG-1001-Page 31 of 56
`
`GOOG-1001-Page 31 of 56
`
`
`
`
`

`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 30 of 31
`
`5,978,791
`
`.\.N¢w
`
`mfido
`
`
`
`>n_OO:o.p<mom
`
`m.__.._u_O
`
`mmvm
`
`
`
`
`
`O...>m._.zm00.0.
`
`
`
`m.__u_.:a:<
`
`
`
`ozozm<:m.__u_mm;
`
`$s_<zmam»
`
`omvm
`
`m:m._mo
`
`m.__n_mam»
`
`BIN.o_..._
`
`w.m.__u_m:E.
`
`
`
`m.._.ZDOUmm:
`
`MZO
`
`55
`
`
`
`mm:moaomm
`
`M20E#2300
`
`GOOG-1001-Page 32 of 56
`
`GOOG-1001-Page 32 of 56
`
`
`
`
`

`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 31 of 31
`
`5,978,791
`
`
`
`mm0...Smaomm
`
`Smnm<2Eou_
`
`mmvm
`
`m>:.<omz
`
`mwzommmm
`
`Evm
`
`E229.
`
`«Sm
`
`n_m<2Eou.
`
`pmmaamm
`
`mmor.
`
`Nmvm
`
`n=§oo._
`
`m_z<zmamp
`
`02
`
`mm;
`
`ommmmmmzooMOQ.m..=u_
`wmo:._oz
`
`
`
`«D.m.__u_
`
`Slum
`
`m>Ewo._
`
`mmzoawmm
`
`GOOG-1001-Page 33 of 56
`
`GOOG-1001-Page 33 of 56
`
`
`
`
`
`
`
`
`

`
`5,978,791
`
`1
`DATA PROCESSING SYSTEM USING
`SUBSTANTIALLY UNIQUE IDENTIFIERS TO
`IDENTIFY DATA ITEMS, WHEREBY
`IDENTICAL DATA ITEMS HAVE THE SAME
`IDENTIFIERS
`
`This is a continuation of application Ser. No. 08/425,160,
`filed on Apr. 11, 1995, which was abandoned upon the filing
`hereof.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`This invention relates to data processing systems and,
`more particularly, to data processing systems wherein data
`items are identified by substantially unique identifiers which
`depend on all of the data in the data items and only on the
`data in the data items.
`
`2. Background of the Invention
`Data processing (DP) systems, computers, networks of
`computers, or the like, typically offer users and programs
`various ways to identify the data in the systems.
`Users typically identify data in the data processing system
`by giving the data some form of name. For example, a
`typical operating system (OS) on a computer provides a file
`system in which data items are named by alphanumeric
`identifiers. Programs typically identify data in the data
`processing system using a location or address. For example,
`a program may identify a record in a file or database by using
`a record number which serves to locate that record.
`
`In all but the most primitive operating systems, users and
`programs are able to create and use collections of named
`data items, these collections themselves being named by
`identifiers. These named collections can then, themselves,
`be made part of other named collections. For example, an
`OS may provide mechanisms to group files (data items) into
`directories (collections). These directories can then, them-
`selves be made part of other directories. A data item may
`thus be identified relative to these nested directories using a
`sequence of names, or a so-called pathname, which defines
`a path through the directories to a particular data item (file
`or directory).
`As another example, a database management system may
`group data records (data items) into tables and then group
`these tables into database files (collections). The complete
`address of any data record can then be specified using the
`database file name, the table name, and the record number of
`that data record.
`
`Other examples of identifying data items include: identi-
`fying files in a network file system, identifying objects in an
`object-oriented database,
`identifying images in an image
`database, and identifying articles in a text database.
`In general, the terms “data” and “data item” as used herein
`refer to sequences of bits. Thus a data item may be the
`contents of a file, a portion of a file, a page in memory, an
`object in an object-oriented program, a digital message, a
`digital scanned image, a part of a video or audio signal, or
`any other entity which can be represented by a sequence of
`bits. The term “data processing” herein refers to the pro-
`cessing of data items, and is sometimes dependent on the
`type of data item being processed. For example, a data
`processor for a digital image may differ from a data pro-
`cessor for an audio signal.
`In all of the prior data processing systems the names or
`identifiers provided to identify data items (the data items
`being files, directories, records in the database, objects in
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`object-oriented programming, locations in memory or on a
`physical device, or the like) are always defined relative to a
`specific context. For instance, the file identified by a par-
`ticular file name can only be determined when the directory
`containing the file (the context) is known. The file identified
`by a pathname can be determined only when the file system
`(context) is known. Similarly, the addresses in a process
`address space, the keys in a database table, or domain names
`on a global computer network such as the Internet are
`meaningful only because they are specified relative to a
`context.
`
`In prior art systems for identifying data items there is no
`direct relationship between the data names and the data item.
`The same data name in two different contexts may refer to
`different data items, and two different data names in the
`same context may refer to the same data item.
`In addition, because there is no correlation between a data
`name and the data it refers to, there is no a priori way to
`confirm that a given data item is in fact the one named by a
`data name. For instance, in a DP system, if one processor
`requests that another processor deliver a data item with a
`given data name,
`the requesting processor cannot,
`in
`general, verify that the data delivered is the correct data
`(given only the name). Therefore it may require further
`processing, typically on the part of the requestor, to verify
`that the data item it has obtained is,
`in fact,
`the item it
`requested.
`A common operation in a DP system is adding a new data
`item to the system. When a new data item is added to the
`system, a name can be assigned to it only by updating the
`context in which names are defined. Thus such systems
`require a centralized mechanism for the management of
`names. Such a mechanism is required even in a multi-
`processing system when data items are created and identified
`at separate processors in distinct locations, and in which
`there is no other need for communication when data items
`are added.
`
`In many data processing systems or environments, data
`items are transferred between different
`locations in the
`
`system. These locations may be processors in the data
`processing system, storage devices, memory, or the like. For
`example, one processor may obtain a data item from another
`processor or from an external storage device, such as a
`floppy disk, and may incorporate that data item into its
`system (using the name provided with that data item).
`However, when a processor (or some location) obtains a
`data item from another location in the DP system,
`it is
`possible that this obtained data item is already present in the
`system (either at the location of the processor or at some
`other location accessible by the processor) and therefore a
`duplicate of the data item is created. This situation is
`common in a network data processing environment where
`proprietary software products are installed from floppy disks
`onto several processors sharing a common file server. In
`these systems, it is often the case that the same product will
`be installed on several systems, so that several copies of
`each file will reside on the common file server.
`
`In some data processing systems in which several pro-
`cessors are connected in a network, one system is designated
`as a cache server to maintain master copies of data items,
`and other systems are designated as cache clients to copy
`local copies of the master data items into a local cache on an
`as-needed basis. Before using a cached item, a cache client
`must either reload the cached item, be informed of changes
`to the cached item, or confirm that the master item corre-
`sponding to the cached item has not changed. In other words,
`
`GOOG-1001-Page 34 of 56
`
`GOOG-1001-Page 34 of 56
`
`

`
`5,978,791
`
`3
`a cache client must synchronize its data items with those on
`the cache server. This synchronization may involve reload-
`ing data items onto the cache client. The need to keep the
`cache synchronized or reload it adds significant overhead to
`existing caching mechanisms.
`In view of the above and other problems with prior art
`systems, it is therefore desirable to have a mechanism which
`allows each processor in a multiprocessor system to deter-
`mine a common and substantially unique identifier for a data
`item, using only the data in the data item and not relying on
`any sort of context.
`It is further desirable to have a mechanism for reducing
`multiple copies of data items in a data processing system and
`to have a mechanism which enables the identification of
`
`identical data items so as to reduce multiple copies. It is
`further desirable to determine whether two instances of a
`
`data item are in fact the same data item, and to perform
`various other systems’ functions and applications on data
`items without relying on any context information or prop-
`erties of the data item.
`
`It is also desirable to provide such a mechanism in such
`a way as to make it
`transparent
`to users of the data
`processing system, and it is desirable that a single mecha-
`nism be used to address each of the problems described
`above.
`
`SUMMARY OF THE INVENTION
`
`This invention provides, in a data processing system, a
`method and apparatus for identifying a data item in the
`system, where the identity of the data item depends on all of
`the data in the data item and only on the data in the data item.
`Thus the identity of a data item is independent of its name,
`origin, location, address, or other information not derivable
`directly from the data, and depends only on the data itself.
`This invention further provides an apparatus and a method
`for determining whether a particular data item is present in
`the system or at a location in the system, by examining only
`the data identities of a plurality of data items.
`Using the method or apparatus of the present invention,
`the efficiency and integrity of a data processing system can
`be improved. The present invention improves the design and
`operation of a data storage system, file system, relational
`database, object-oriented database, or the like that stores a
`plurality of data items, by making possible or improving the
`design and operation of at least some or all of the following
`features:
`
`the system stores at most one copy of any data item at a
`given location, even when multiple data names in the system
`refer to the same contents;
`the system avoids copying data from source to destination
`locations when the destination locations already have the
`data;
`the system provides transparent access to any data item by
`reference only to its identity and independent of its present
`location, whether it be local, remote, or offline;
`the system caches data items from a server, so that only
`the most recently accessed data items need be retained;
`when the system is being used to cache data items,
`problems of maintaining cache consistency are avoided;
`the system maintains a desired level of redundancy of data
`items in a network of servers, to protect against failure by
`ensuring that multiple copies of the data items are present at
`different locations in the system;
`the system automatically archives data items as they are
`created or modified;
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`the system provides the size, age, and location of groups
`of data items in order to decide whether they can be safely
`removed from a local file system;
`the system can efficiently record and preserve any col-
`lection of data items;
`the system can efficiently make a copy of any collection
`of data items, to support a version control mechanism for
`groups of the data items;
`the system can publish data items, allowing other, possi-
`bly anonymous, systems in a network to gain access to the
`data items and to rely on the availability of the data items;
`the system can maintain a local inventory of all the data
`items located on a given removable medium, such as a
`diskette or CD-ROM, the inventory is independent of other
`properties of the data items such as their name, location, and
`date of creation;
`the system allows closely related sets of data items, such
`as matching or corresponding directories on disconnected
`computers,
`to be periodically resynchronized with one
`another;
`the system can verify that data retrieved from another
`location is the desired or requested data, using only the data
`identifier used to retrieve the data;
`the system can prove possession of specific data items by
`content without disclosing the content of the data items, for
`purposes of later legal verification and to provide anonym-
`ity;
`the system tracks possession of specific data items accord-
`ing to content by owner, independent of the name, date, or
`other properties of the data item, and tracks the uses of
`specific data items and files by content for accounting
`purposes.
`
`Other objects, features, and characteristics of the present
`invention as well as the methods of operation and functions
`of the related elements of structure, and the combination of
`parts and economies of manufacture, will become more
`apparent upon consideration of the following description
`and the appended claims with reference to the accompany-
`ing drawings, all of which form a part of this specification.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIGS. 1(a) and 1(b) depicts a typical data processing
`system in which a preferred embodiment of the present
`invention operates;
`FIG. 2 depicts a hierarchy of data items stored at any
`location in such a data processing system;
`FIGS. 3-9 depict data structures used to implement an
`embodiment of the present invention; and
`FIGS. 10(a)—28 are flow charts depicting operation of
`various aspects of the present invention.
`DETAILED DESCRIPTION OF THE
`PRESENTLY PREFERRED EXEMPLARY
`EMBODIMENTS
`
`An embodiment of the present invention is now described
`with reference to a typical data processing system 100,
`which, with reference to FIGS. 1(a) and 1(b), includes one
`or more processors (or computers) 102 and various storage
`devices 104 connected in some way, for example by a bus
`106.
`
`Each processor 102 includes a CPU 108, a memory 110
`and one or more local storage devices 112. The CPU 108,
`memory 110, and local storage device 112 may be internally
`connected, for example by a bus 114. Each processor 102
`
`GOOG-1001-Page 35 of 56
`
`GOOG-1001-Page 35 of 56
`
`

`
`5,978,791
`
`5
`
`may also include other devices (not shown), such as a
`keyboard, a display, a printer, and the like.
`In a data processing system 100, wherein more than one
`processor 102 is used, that is, in a multiprocessor system, the
`processors may be in one of various relationships. For
`example,
`two processors 102 may be in a client/server,
`client/client, or a server/server relationship. These inter-
`processor relationships may be dynamic, changing depend-
`ing on particular situations and functions. Thus, a particular
`processor 102 may change its relationship to other proces-
`sors as needed, essentially setting up a peer-to-peer relation-
`ship with other processors. In a peer-to-peer relationship,
`sometimes a particular processor 102 acts as a client
`processor, whereas at other times the same processor acts as
`a server processor. In other words, there is no hierarchy
`imposed on or required of processors 102.
`In a multiprocessor system, the processors 102 may be
`homogeneous or heterogeneous. Further, in a multiprocessor
`data processing system 100, some or all of the processors
`102 may be disconnected from the network of processors for
`periods of time. Such disconnection may be part of the
`normal operation of the system 100 or it may be because a
`particular processor 102 is in need of repair.
`Within a data processing system 100, the data may be
`organized to form a hierarchy of data storage elements,
`wherein lower level data storage elements are combined to
`form higher level elements. This hierarchy can consist of, for
`example, processors, file systems, regions, directories, data
`files, segments, and the like. For example, with reference to
`FIG. 2, the data items on a particular processor 102 may be
`organized or structured as a file system 116 which comprises
`regions 117, each of which comprises directories 118, each
`of which can contain other directories 118 or files 120. Each
`
`file 120 being made up of one or more data segments 122.
`In a typical data processing system, some or all of these
`elements can be named by users given certain implementa-
`tion specific naming conventions, the name (or pathname) of
`an element being relative to a context. In the context of a
`data processing system 100, a pathname is fully specified by
`a processor name, a filesystem name, a sequence of zero or
`more directory names identifying nested directories, and a
`final file name. (Usually the lowest level elements, in this
`case segments 122, cannot be named by users.)
`In other words, a file system 116 is a collection of
`directories 118. Adirectory 118 is a collection of named files
`120—both data files 120 and other directory files 118. Afile
`120 is a named data item which is either a data file (which
`may be simple or compound) or a directory file 118. A
`simple file 120 consists of a single data segment 122. A
`compound file 120 consists of a sequence of data segments
`122. A data segment 122 is a fixed sequence of bytes. An
`important property of any data segment
`is its size,
`the
`number of bytes in the sequence.
`A single processor 102 may access one or more file
`systems 116, and a single storage device 104 may contain
`one or more file systems 116, or portions of a file system 116.
`For instance, a file system 116 may span several storage
`devices 104.
`
`In order to implement controls in a file system, file system
`116 may be divided into distinct regions, where each region
`is a unit of management and control. A region consists of a
`given directory 118 and is ident

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket