`
`US()05978791A
`
`5,978,791
`[11] Patent Number:
`[19]
`United States Patent
`
`Farber et al.
`[45] Date of Patent:
`Nov. 2, 1999
`
`[54] DATA pRocnsstG SYSTEM USING
`SUBSTANTIALLY UNIQUE IDENTIFIERS To
`IDENTIFY DATA ITEMS, WHEREBY
`4
`igggggggfigflAITm/[S HAVE THE SAME
`.
`.
`.
`.
`.
`Inventors: Bail/1d 111. Farlfir, 813m, $211111? Ronald
`-
`5101mm],
`0“
`r00 7
`~
`‘
`’73] Ass1gnee: Kinetech, Inc., Northbrook, 111.
`
`[75]
`7
`
`
`
`21] Appl. No.: 03/960,079
`
`:22]
`
`Filed:
`
`Oct. 24, 1997
`
`Related U'Sl Application Data
`
`'63} Continuation of application No, 08/425,160, Apr, 11, 1995,
`'
`abandoned.
`
`’51]
`Int. Cl.6 ...................................................... G06F 17/30
`,521 U:S- Cl- -------------
`- 707/2; 707/1; 707/200
`
`58} Fleld Of Search
`.......... 707/2, 1, 200
`
`[56]
`
`Refelences We“
`05. PATENT DOCUMENTS
`
`...................... 364/200
`5/1990 Holloway et al.
`4,922,414
`
`364/900
`4972367 11/1990 Burke .......
`
`
`395/600
`4/1991 Bendel't et a .
`5,007,658
`365/23005
`5,025,421
`6/1991 C110 .........
`
`9.0991 Mama ..
`364/200
`5,050,074
`
`9/1991 Dyson ......
`.. 380/25
`5,050,212
`10/1991 Colwell et a1.
`.
`.'.. 341/55
`5,057,837
`
`7/1992 Kohayashi e, at.
`.
`_ 395/800
`5,129,081
`
`. 395/000
`5,129,082
`7/1992 Tll‘hng etal.
`
`5,144,667
`9/1992 Pogue,Jr. et at.
`380/45
`_____
`_ 395/425
`5,179,680
`1/1993 Come“ eta].
`
`. 395/600
`5,202,982
`4/1993 Gramlich at al.
`
`... 380/43
`5,208,858
`5/1993 Vollert et al.
`. 395/800
`5,276,901
`1/1994 Howell et a1.
`
`5,301,286
`4/1994 Rajani
`. 395/400
`
`. 395/600
`5,301,316
`4/1994 Hamilton et a1.
`8/1994 Moore .................. 380/4
`5,343,527
`
`5,357,623
`10/1994 Megot’y-Cohen
`395/425
`
`141995 Cannon """"""
`340/825'44
`55845565
`5,404,508
`4/1995 Konrad et al.
`395/600
`
`..
`
`OTHER PUBLICATIONS
`Witold Litwin et 01, Linear Hashing for Distributed Files,
`ACM SIGMOD, May, 1993 P134 327—336.
`Ming—Ling, Lo, et all, On Optimal Processor Allocation to
`Support 141110111100 Hash Joins, ACM SIGMOD, pp, 69—78,
`May 1993-
`Thomas A. Berson, Differential Cryptanalysis Mod 232 with
`Applications to MDS, pp. 69—81, 1992.
`
`(List continued on next page)
`
`Primary Examiner—4321111 V. Kulik
`Assistant Examinermlean R. Homere ,
`Attorney, Agent, or Firm—Pillsbury Madison & Sutro LLP
`[57]
`ABSTRACT
`
`in a data processing system, a mechanism identifies data
`items by substantially unique identifiers which depend on all
`of the data in the data items and only on the data in the data
`items. The system also determines Whether a particular data
`item is present in the database by examining the identifiers
`,
`.
`,
`.
`0f the mummy or data “‘3‘“
`
`................. 340/1725
`6/1972 Evangelisti 61 211.
`3,668,647
`364/200
`7/1980 Mitchell et a1.
`4,215,402
`
`.. 364/200
`9/1981 Clchelli et a1.
`4,290,105
`364/900
`3/1983 Rivest
`4,376,299
`9/1983 Rivest ct a .
`4,405,829
`.. 178/221
`.....
`10/1983 Neches et a),
`4,412,285
`.. 364/200
`11/1983 Summer, Jr, et 01
`4,414,624
`
`._ 322/388
`4/1984 Fletcher et 01.
`4,441,155
`8/1984 Benhase et 211.
`4,464,713
`
`2.. 364/200
`.. 364/200
`4,490,782 12/1984 Dixon et a1.
`2/1986 Entry, Jr. et al.
`.1 364/900
`4,571,700
`
`.
`4,577,293
`3/1986 Matick at al.
`365/189
`2/1987 Meaden .........
`4,642,793
`
`364/900
`.
`4,675,810
`6/1987 Gmner et a1.
`364/200
`4,691,299
`9/1987 Rivest et al.
`
`" 365/185
`364/200
`4,725,945
`2/1988 Kronstadtet al.
`364/900
`4,773,039
`9/1988 Zamora
`
`4,887,235 12/1989 Holloway et al.
`364/900
`........................... 364/200
`48 Claims, 31 Drawing Sheets
`4,888,681
`12/1989 Barnes et at.
`
`101
`m
`102
`$92
`
`
`
`
`711200255012
`1
`.
`.
`PROCESSOR
`53,3126:
`.
`.
`- Eg‘mifi
`.‘fl
`‘_
`W}
`1|!E
`
`
`
`
`l
`l
`102
`$92
`1
`102
`PROCEGGOR
`PROCESSOR
`PROCESSOR
`
`
`
`11m
`
`PROCESSOR
`
`{7
`
`1
`l
`
`cw
`‘
`3 E:
`E
`'
`112
`STORAGE
`DEVICE
`
`1
`3
`i
`
`11a
`
`
`
`lOZ
`
`
`W7"'42.?” M R
`LDE
`
`13.
`128
`AF
`
`, 7— ‘
`125
`M.
`
`TFR T
`:
`150
`" 130
`5m:
`15/
`L1-
`100 ‘
`9L
`
`,
`
`F757"
`'
`saw
`—«—J
`
`1
`
`;
`
`:
`
`
`
`
`
`
`
`
`
`5,978,791
`Page 2
`
`OTHER PUBLICATIONS
`
`William Perrizo, et a1,, Distributed Join Processing Perfor-
`mance Evaluation, 1994. Twenty—Seventh Hawaii Interna-
`tional Conference on System Sciences, vol. II, pp, 236—244.
`A concurrency Control Mechanism based on Extendible
`Hashing for Main Memory Database Systems, Vijay Kumar,
`pp. 109—113, ACM, vol. 3, 1989,
`Birgit Pfitzmann, Sorting Out Signature Schemes, Nov.
`1993, lst Conf, Computer & Comm. Security ’93 pp. 74—85.
`Bert dem Boer, et al., Collisions for the compression func«
`tion of MD, pp. 292—304, 1994.
`Sakti Pramanik, et 211., Multi—Directory Hashing, 1993, Info.
`Sys, vol. 18, No. 1, pp. 63—74.
`Murlidhar Koushik, Dynamic Hashing With Distributed
`Overflow Space: A File Organization With Good Insertion
`Performance, 1993, Info. Sys., vol. 18, No. 5, pp. 299—317.
`Witold Litwin, et al., LID—Linear Hashing for Distributed
`Files, IIP Labs Tech. Report No. HPL—93—21 Jun. 1993 pp.
`1—22.
`I-IAVAL —— A One—Way Hashing
`Yuliang Zheng, et a1.,
`Algorithm with Variable Length of Output
`(Extended
`Abstract), pp. 83—105, Advances in Cryptology, AUS CRIPT
`’ 92, 19 92.
`
`Chris Charnes and Josef Pieprzky, Linear Nonequivalence
`versus Nonlinearity, Pieprzky, pp. 156—164, 1993.
`Zhiyu Tian, et al., A New Hashing Function: Statistical
`Behaviour and Algorithm, pp. 3—13, SIGIR Forum, 1.993.
`G. L. Friedman, Digital Camera With Apparatus For
`Authentication of Images Produced From an Image File,
`NASA Case No. NPO—19108—1—CU, Serial No. 08/159,980,
`Nov. 24, 1993.
`H. Goodman, Feb. 9, 1994 Ada, Object—Oriented Tech—
`niques, and Concurrency in 'I‘eaching Data Sructures and
`File Management Report
`l.)ocnmentation 1’. AD—A275
`385 —— 94—04277.
`
`Advances in Cryptology—EUROCRYPT ’93, Workshop on
`the Theory and Application of Cryptographic Techniques
`Lofthus, Norway, May 23—27, 1993 Proceedings.
`Proceedings of the 1993 ACM SIGMOD International Con—
`ference on Management of Data, vol. 22, Issue 2, Jun. 1993.
`Advances in Cryptology—AUSCRYPT ’92 —— Workshop on
`the Theory and Application of Cryptographic Techniques
`Gold Coast, Queensland, Australia Dec. 13—16, 1992 Pro-
`ceedings.
`Search Report dated Jun. 24, 1996.
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 1 0f 31
`
`5,978,791
`
`mar
`
`N
`ow
`
`Now
`
`
` mowmmoomm
`
`
`NOVNev
`
`MOmmmoomm
`
`
`
`mowwwoommmOmmwoomm
`
`
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 2 0f 31
`
`5,978,791
`
`
`
`m0<mOHw
`
`m0_>ma
`
`mOwwmooma
`
`3:.0E
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 3 0f 31
`
`5,978,791
`
`
`
`>mOHoww=a
`
`5‘r
`
`NNr
`
`Pzwfiwww
`
`
`
`3;.
`
`MAE
`
`EmFW>$
`
`
`
`w:
`
`
`
`
`US. Patent
`
`N0v.2,1999
`
`Sheet 4 0f 31
`
`5,978,791
`
`I38
`7
`FIG.
`
`Raolon ID
`
`
`
`‘0
`
`
`
`Tlme of last access
`
`Time of last modification
`
`
`
`
`
`
`
`
`
`
`
`42
`
`_ 1
`
`Com-ressed File IE
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`FIG.5
`
`
`
`
`
`US. Patent
`
`w.
`
`1
`
`W
`
`‘h
`
`M
`
`5,978,791
`
`Nww
`
`
`
`2,6:82.35.230
`
`9cowucum.o
`
`Mmvw
`
`5madzmsuawmamanuwm
`
`
`
`Q”.Emummfifia
`
`Om.
`
`QHMommmoouw
`
`m0‘
`
`h.oE
`
`¢¢_
`
`
`
`
`
`
`SOflUmUOH@Uhgow
`
`
`
`US. Patent
`
`N0v.2,1999
`
`Sheet 6 of 31
`
`5,978,791 -
`
`FIG. IO(CI)
`
`SIMPLE
`
`DATA ITEM
`
`COMPUTE MD FUNCTION ON
`
`DATA lTEM
`
`DATA ITEM
`
`8214
`
`APPEND LENGTH MODULO 32 OF
`
`TRUE NAME
`
`1
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 7 01'31
`
`5,978,791
`
`
`
`
`
`
`8216
`
`SIMPLE?
`
`FIG. IO(b)
`
`
`8220
`
`
`PARTITION DATA STEM TNTO
`SEGMENTS
`
`
`
`
`
`
`
`
`
`
`ASSIMILATE EACH SEGMENT
`
`8222
`
`(COMPUTiNG ITS TRUE NAME)
`
`
`
`8224
`
`llllIII\
`
`\
`
`, """""3'21'8"""" \
`COMPUTE TRUE E
`NAME OF S!MPLE :
`DATA ITEM
`:
`I
`
`_______ T_””""'
`
`CREATE [NDIREQT BLOCK OF
`SEGMENT TRUE NAMES
`
`
`
`
`
`
`ASSIMILATE lNDlRECT BLOCK
`
`8226
`
`(COMPUTING ITS TRUE NAME)
`
`
` 5228
`
`REPLACE FINAL 32 BITS 0F TRUE
`NAME WITH LENGHT MOD 32 OF DATA
`
`ITEM
`
`
`
`
`
`US. Patent
`
`NOV. 2, 1999
`
`Sheet 8 0f 31
`
`5,978,791
`
`>mFZmmmoo
`
`WQmin—max:
`
`mm>
`
`mmmw
`
`n:wJEmekw
`
`wmmw
`
`
`
`n:we“.wkmjwo
`
`mm;
`
`NmNm
`
`w5<2may:wmoo
`
`mum—wmax...2.hw—xm
`
`w>wfibamm
`
`ommm
`
`>mh2m252wh<wmo«
`
`n:NJEmMOkw«rOF#2300mm:Hum«
`
`
`
`wnflmimmzkohum..
`
`lfl.
`
`ommw
`
`wzgmmhmo
`
`wE<zmam...
`
`:.07..
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 9 of 31
`
`5,978,791
`
`FIG. l2
`
`
` 8240
`' 8238 '
`UPDATE
`
`FILE
`
`
`
`
`
`DEPENDENCY '
`
`LIST
`
`8242
`
`LOCKED?
`
`
`
`
`
`, SEND MESSAGE TO .
`
`3244
`
`: COMPRESS
`
`(IF DESIRED)
`
`CACHE SERVER TO
`
`UPDATE CACHE
`
`8246
`
`MlRROR
`
`(IF DESlRED)
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 10 0f 31
`
`5,978,791
`
`FIG. l3
`
` 8250
`
`SEARCH FOR
`THE
`
`
`PATHNAME
`
`A.
`
`O AD
`
`FAIL
`
`
`
`
`
`LDE INCLUDES
`
`TRUE NAME?
`
`
`
`Y S
`
`3253
`ASSIMILATE
`FILE it)
`
`
`
`NO
`
`8254
`LDE IDENTIFIES
`7 DIRECTORY?
`
`Y S
`
`8256
`
`FREEZE
`
`DIRECTORY
`
`
`
`US. Patent
`
`N0v.2, 1999
`
`Sheet 11 of 31
`
`5,978,791
`
`TRUE NAME
`
`EXISTS LOCALLY
`
`
`
`8260
`
`
`CONFIRM THAT
`
`
`
`
` 8262
`
`SEARCH FOR
`
`
`
`
`
`FIG. I4
`
`PATHNAME IN
`
`LDE TABLE
`
`
`
`
`
`8264
`
`
`I CONFIRM THAT 7
`DIRECTORY
`
`EXISTS
`
`
`
`8266
`
`
`
`8288
`
`
`NAMED FILE
`DELETE
`
`EXISTS?
`TRUE FILE
`
`
`
`
`
`8270
`
`
`
`CREATE
`
`ENTRY IN LDE
`
`
`& UPDATE
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 12 0f 31
`
`5,978,791
`
`meOmmmm
`
`wamOm
`
`vnww
`
`“Fm02mm
`
`MOm.345:wm0<mwm5
`
`mmZOmmmm
`
`mmmw
`
`
`
`MAEmam...awkzw
`
`
`
`OE]:nwzmafim
`
`NEH
`
`42w
`
`Nwmm
`
`may:Cams
`
`“a5m"
`
`awmamm
`
`ome
`
`
`
`m..=u_DZ."—
`
`9.0E
`
`mm;
`
`mwzomwwm
`
`m2h<0mz
`
`«mOmmmoom.<29.550.—9Nam"»
`
`Oz
`
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 13 of 31
`
`1W
`
`:0,
`
`9.,m2Boa
`
`om)‘fl7,mxaaommmm
`
`
`
`85..0:
`
`:53023...!
`
`mmOwwMOOMm
`
`>z<
`
`mwNw
`
`vmwm
`
`kzmjo
`
`wkowAmm
`
`aEOmwmooE
`
`Wm;
`
`Dwkome
`
`mmmw
`
`Emso
`
`kzmjo
`
`mgh¢s>
`
`whm<oo<ommm>P.sz
`
`wmmmh:MSE
`
`mmZOQmWK
`
`m0
`
`
`
`
`
`
`
`
`
`US. Patent
`
`NOV. 2, 1999
`
`Sheet 14 0f 31
`
`5,978,791
`
`cemwm
`
`MZEMMHMQ
`
`
`
`whdd20F<mixm
`
`kw:0kand.D72
`
`Wm:
`
`
`
`vw.MUM—20m
`
`62.19.5an
`
`Oommw
`
`>swEmkw>w
`
`momwm
`
`won”an:n5x004
`
`DD<,wwE<zmam...
`
`
`
`n:ZOF<OOAmomDOm
`
`
`
`MOmma.momDOwO...
`
`wE<zmam:
`
`nv
`
`
`
`mug—mMDmE.m>mwwmm
`
`momDOm20
`
`mowwwoomm
`
`
`
`
`
`0....m0<mww§02mm
`
`“#me
`
`Eva—.0:
`
`may;”.0momDOw
`
`20mmmam—"EBm2<2
`
`sz_._.<szwQ
`
`ommw
`
`mmOhw
`
`n:mowmmoomm
`
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`NOV. 2, 1999
`
`Sheet 15 0f 31
`
`5,978,791
`
`mmNm
`
`wwmeEOomQ
`
`ome
`
`«n:we".
`
`
`
`QmwwmmmEOQ,
`
`w>mkzw9:...
`
`
`
`m0»n:win..
`
`>mhzwMAEmam,
`
`mam.»mOwmm...7:
`
`«ms—<2
`
`ABE.oE
`
`
`
`
`
`
`
`US. Patent
`
`NOV.2,1999
`
`Sheet16 0f31
`
`5,978,791
`
`
`
`3:....9“.
`
`QMNder
`
`nomw,
`
`mdw,mm»
`
`i
`
`comm
`
`955.5
`
`Nomw
`
`>Eka
`
`mum:
`
`komgmm
`
`,
`
`3mm
`
`wm_m0m30w
`
`I‘.H~
`
`momw
`
`
`
`may:muémm
`
`:055E
`
`Emomnow
`
`numombow
`
`NEG:02
`
`womw
`
`mk<004
`
`wdumHOEmm
`
`
`
`
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 17 0f 31
`
`5,978,791
`
`mg»
`
`
`
`flqsomw:96qu
`
`may:m0>m00mm
`
`No...“Q.mJE
`
`«MAEmam...
`
`NNmm.
`
`mam...mxg
`
`4.400.“MAE
`
`\/Ami—n.
`
`mwmm
`
`mkmgmo
`
`MAEmamk
`
`Mm:
`
`mm_n=._.2wo_ma...
`
`
`
`MDME.GZCMUAN
`
`«MAE
`
`ommw
`
`2&2wkfimo
`
`
`
`MAE$Oh<m0w
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 18 0f 31
`
`5,978,791
`
`7mumm,
`
`m>OEmmw9BEw><m
`mu:
`
`>mkzw
`
`Mm;
`
`72:00mm:
`
`hr"
`
`
`
`
`
`.wgmz...man.2—?Nd".map—um“MAE
`
`kZDOo
`
`
`
`mm:kzms—mmomo
`
`ommw
`
`252Q...MAE>n~00
`
`3v?.OE
`
`
`
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 19 0f 31
`
`5,978,791
`
`783
`
`7:o<mm0u
`
`824.2522:az<mi.
`
`53553aWWW“.mEzEmomnm
`
`
`
`m..=u_>mOPOmmE
`
`m1...2—>m0k0mm5
`
`
`
`>m0k0mm5ZmZO
`
`5mm
`
`
`
`3m:mkdmmo
`
`
`
`Em:<._.<n_
`
`
`
`30..0E
`
`$8
`
`kzmfimmoz
`
`
`
`$00..MNwmmn.
`
`
`
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 20 0f 31
`
`5,978,791
`
`03%
`
`Jammwmwy<55.52oz<wan.
`
`
`omoomm9bfizm.genwizamomam
`
`8%,:05mo»
`smEm5.2_>m05mma
`
`ZO~P<EmOwZ”
`
`
`
`35..9“.
`
`
`
`
`
`,EN:<._.<O>>mz
`
`mzhmhfi§$m<
`
`Nvmw
`
`
`
`>mOkommaZm>5
`
`31mm
`
`thEmmomo
`
`mNmmmuwzk
`
`v804
`
`
`
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 21 of 31
`
`5,978,791
`
`ommw
`
`
`
`4.5“.mkdflmo
`
`
`
`mszzIk<m
`
`Nmmm
`
`
`
`O...zk<m$23
`
`mE<Zwsmk
`
`>~Fzm
`
`3%38
`
`o<wmmoim0"—mmOE02vme
`
`.2004WAG
`
`mvmm
`
`mam»wigON.0_..n_
`
`
`
`
`
`>m0kowm=o>m0h0wm5mekzmHmzoa
`
`
`
`
`
`
`
`US. Patent
`
`N0V.2, 1999
`
`Sheet 22 0f 31
`
`5,978,791
`
` 8354
`WAIT FOR
`
`FREEZE LOCK
`
`TO TURN OFF
`
`
`
`
`
`8356
`
`
`
`FIND TFR
`
`FIG.2I
`
`ENTRY
`
`
`
` 8358
`
`DECREMENT
`
`
`REFERENCE
`Ch! I‘ITUUI‘ l
`
`
`
`
`
`
`
`8362
`
`REFERENCE COUNT IS
`
`DELETE
`
`
`ZERO 8: NO DEPENDENT
`
`TRUE FILE
`
`
`
`SYSTEMS IN TFR?
`
`
`
`
`S364
`
`
`REMOVE FILE ID
`
`AND COMPRESSED
`FILE ID
`
`
`
`
`
`US. Patent
`
`N0V.2, 1999
`
`Sheet 23 0f 31
`
`5,978,791
`
`
`
` 8365
`GET
`
`
`
`OPERATION
`
`
`
`CREATE OR 7
`MODIFY?
`
`8368
`
`FIG. 22
`
`YES
`
`
`
`ASSIMILATE
`
`
`S358
`
`
`
`COPY OR DELETE
`
`COMPOUND?
`
`YES
`
`
`
`
`
`
`
`
`
`
`
`8378
`
`8370
`RECORD TRUE
`MODIFY USE‘
`
`
`
`
`
`
`COUNT OF EACH
`COMPONENT
`
`NAME IN AUDIT
`FILE
`
`
`8379
`
`FOR EACH PARENT
`
`
`
`DIRECTORY OR FILE,
`
`UPDATE USE COUNT,
`LAST ACCESS AND
`
`MODIFY TIMES
`
`
`
`
`
`US. Patent
`
`N0v.2, 1999
`
`Sheet 24 of 31
`
`5,978,791
`
`
`
` S382
`
`VWHUFY
`
`
`
`FIG. 23
`
`
`
`GROGMlNG
`
`LOCK OFF
`
`
`
`8384
`
`
`
`SET
`
`GROOMING
`
`LOCK
`
`8386
`
`
`
`
`SET GROOM
`
`COUNTS
`
`
`
`
`
`
`US. Patent
`
`N0v.2, 1999
`
`Sheet 25 0f31
`
`5,978,791
`
`
`
`
`FIND LDE
`
`S388
`
`RECORD
`
`F1324
`
`
`
`
` S390
`
`FIND, TFR
`
`RECORD
`
`
`S392
`
`
`INCREMENT
`
`GROOMING
`
`DELETE COUNT
`
`
`
`8394
`
`
`
`ADJUST FILE
`
`S!ZES
`
`
`
`
`
`US. Patent
`
`.
`
`NOV. 2, 1999
`
`Sheet 26 0f 31
`
`5,978,791
`
`FIG. 25
`
` 8396
`DELETE
`
`
`
`
`
`FILE
`
`
`
`
` 8398
`
`
`UNLOCK
`
`GROOMING
`
`LOCK
`
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 27 0f 31
`
`5,978,791
`
`BEN.oE
`
`oovw
`
`”Fm—VANMAE
`
`9.3.200...
`
`02
`
`vovw
`
`hmzxomnw
`
`zmmo
`
`Novm
`
`02mm
`
`fiomb‘wmo
`
`vam
`
`tmiomm
`
`zmmo
`
`>mChomm“
`
`>420.D<mm
`
`
`
`
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 28» 0f 31
`
`5,978,791
`
`omvw
`
`.2004mx<5
`
`m:minzmbkmmw29me>
`
`
`
`ME.EONE
`
`H02u:x00;
`
`meOOJ
`
`wdw
`
`w0m10<0
`
`mwvw
`
`Gzfim
`
`>Amkmfin=200
`
`
`
`2m_t._m>>w.-
`
`wva
`
`wo¢w mum”.
`
`mwgw
`
`wkdmmo
`
`35mg...
`
`vmvm
`
`2E3mm
`
`n:
`
`wczwrehgow
`
`
`
`MAEIOPéOw
`
`
`
`
`
`
`
`
`
`US. Patent
`
`m.N
`
`%2,
`
`M0m92m
`
`5,978,791
`
`9oamoowmwe."0
`
`%“Echomma
`
`
`
`zormjmnEzodfim
`
`tmiommwm>2.mo9953BE
`
`ASE0E
`
`
`
`met.Efizmm.
`
`may?EOEwe“.
`
`Nva
`
`
`
` EmbzmPMwmm.“macaw—MED
`
`
`
`mOumomOOMm
`
`win
`
`
`
`
`
`
`US. Patent
`
`NOV. 2, 1999
`
`Sheet 30 0f 31
`
`5,978,791
`
`thw
`
`mmjmm
`
`
`
`E00onmom
`
`we“.no
`
`wmvw
`
`.Ok>mkzwand.
`
`MAEb03<
`
`omvm
`
`mhmjmo
`
`MAEmay:
`
`SEN.9“.
`
`rmvm
`
`
`
`mm:woaamm
`
`wzo>m#2300
`
`wm>.
`
`wMJEwnmk
`
`w_kZDOOmmD
`
`mzo
`
`02
`
`02w<2wuzu.mm;
`
`ww2<zmeP
`
`
`
`
`
`
`
`
`
`US. Patent
`
`Nov. 2, 1999
`
`Sheet 31 0f 31
`
`5,978,791
`
`mm.0_L$53
`
`vaw
`
`OZ
`
`MES,”mam...
`
`vmvm
`
`82:0»
`
`wm>
`
`wowfim<>>m0m
`
`,
`
`
`
`mmOh.kmmndmm
`
`,wmvm
`
`w>§<0m2
`
`meOmmmm
`
`kwwadmm,
`
`,mm>DmigmCL0vam
`mewNMQEOOm09w..=mmMODJUZ
`
`wemnzw
`
`33m
`
`matwOm
`
`mmZOmwwm
`
`
`
`
`
`
`5,978,791
`
`1
`DATA PROCESSING SYSTEM USING
`SUBSTANTIALLY UNIQUE IDENTIFIERS TO
`IDENTIFY DATA ITEMS, WHEREBY
`IDENTICAL DATA ITEMS HAVE THE SAME
`IDENTIFIERS
`
`This is a continuation of application Ser. No. 08/425,160,
`filed on Apr. 11, 1995, which was abandoned upon the filing
`hereof.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`This invention relates to data processing systems and,
`more particularly, to data processing systems wherein data
`items are identified by substantially unique identifiers which
`depend on all of the data in the data items and only on the
`data in the data items,
`
`2. Background of the Invention
`Data processing (DP) systems, computers, networks of
`computers, or the like, typically offer users and programs
`various ways to identify the data in the systems.
`Users typically identify data in the data processing system
`by giving the data some form of name. For example, a
`typical operating system (OS) on a computer provides a file
`system in which data items are named by alphanumeric
`identifiers. Programs typically identify data in the data
`processing system using a location or address. For example,
`a program may identify a record in a file or database by using
`a record number which serves to locate that record.
`
`In all but the most primitive operating systems, users and
`programs are able to create and use collections of named
`data items,
`these collections themselves being named by
`identifiers. These named collections can then, themselves,
`be made part of other named collections. For example, an
`OS may provide mechanisms to group files (data items) into
`directories (collections). These directories can then, them~
`selves be made part of other directories. A data item may
`thus be identified relative to these nested directories using a
`sequence of names, or a so-called pathname, which defines
`a path through the directories to a particular data item (file
`or directory).
`As another example, a database management system may
`group data records (data items) into tables and then group
`these tables into database files (collections). The complete
`address of any data record can then be specified using the
`database file name, the table name, and the record number of
`that data record
`
`Other examples of identifying data items include: identi-
`fying files in a network file system, identifying objects in an
`object-oriented database,
`identifying images in an image
`database, and identifying articles in a text database.
`In general, the terms “data” and “data item” as used herein
`refer to sequences of bits. Thus a data item may be the
`contents of a file, a portion of a file, a page in memory, an
`object in an object—oriented program, a digital message, a
`digital scanned image, a part of a Video or audio signal, or
`any other entity which can be represented by a sequence of
`bits. The term “data processing” herein refers to the pro-
`cessing of data items, and is sometimes dependent on the
`type of data item being processed. For example, a data
`processor for a digital image may diiIer from a data pro-
`cessor for an audio signal.
`In all of the prior data processing systems the names or
`identifiers provided to identify data items (the data items
`being files, directories, records in the database, objects in
`
`10
`
`IS
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`65
`
`2
`object—oriented programming, locations in memory or on a
`physical device, or the like) are always defined relative to a
`specific context. For instance, the file identified by a par—
`ticular file name can only be determined when the directory
`containing the file (the context) is known. The file identified
`by a pathname can be determined only when the file system
`(context) is known. Similarly, the addresses in a process
`address space, the keys in a database table, or domain names
`on a global computer network such as the Internet are
`meaningful only because they are specified relative to a
`context.
`
`In prior art systems for identifying data items there is no
`direct relationship between the data names and the data item.
`The same data name in two dilferent contexts may refer to
`difi’erent data items, and two difi‘ferent data names in the
`same context may refer to the same data item.
`In addition, because there is no correlation between a data
`name and the data it refers to, there is no a priori way to
`confirm that a given data item is in fact the one named by a
`data name. For instance, in a DP system, if one processor
`requests that another processor deliver a data item with a
`given data name,
`the requesting processor cannot,
`in
`general, verify that the data delivered is the correct data
`(given only the name). Therefore it may require further
`processing, typically on the part of the requester, to verify
`that the data item it has obtained is,
`in fact,
`the item it
`requested.
`Acornmon operation in a DP system is adding a new data
`item to the system. When a new data item is added to the
`system, a name can be assigned to it only by updating the
`context in which names are defined. Thus such systems
`require a centralized mechanism for the management of
`names. Such a mechanism is required even in a multi~
`processing system when data items are created and identified
`at separate processors in distinct locations, and in which
`there is no other need for communication when data items
`are added.
`
`In many data processing systems or environments, data
`items are transferred between different
`locations in the
`system. These locations may be processors in the data
`processing system, storage devices, memory, or the like. For
`example, one processor may obtain a data item from another
`processor or from an external storage device, such as a
`floppy disk, and may incorporate that data item into its
`system (using the name provided with that data item).
`However, when a processor (or some location) obtains a
`data item from another location in the DP system,
`it
`is
`possible that this obtained data item is already present in the
`system (either at the location of the processor or at some
`other location accessible by the processor) and therefore a
`duplicate of the data item is created, This situation is
`common in a network data processing environment where
`proprietary software products are installed from floppy disks
`onto several processors sharing a common file server. In
`these systems, it is often the case that the same product will
`be installed on several systems, so that several copies of
`each file will reside on the common file server.
`
`In some data processing systems in which several pro-
`cessors are connected in a network, one system is designated
`as a cache server to maintain master copies of data items,
`and other systems are designated as cache clients to copy
`local copies of the master data items into a local cache on an
`sis-needed basis. Before using a cached item, a cache client
`must either reload the cached item, be informed of changes
`to the cached item, or confirm that the master item corre—
`sponding to the cached item has not changed. In other words,
`
`
`
`5,978,791
`
`3
`a cache client must synchronize its data items with those on
`the cache server. This synchronization may involve reload-
`ing data items onto the cache client, The need to keep the
`cache synchronized or reload it adds significant overhead to
`existing caching mechanisms.
`In view of the above and other problems with prior art
`systems, it is therefore desirable to have a mechanism which
`allows each processor in a multiprocessor system to deter—
`mine a common and substantially unique identifier for a data
`item, using only the data in the data item and not relying on
`any sort of context.
`It is further desirable to have a mechanism for reducing
`multiple copies ()Idata items in a data processing system and
`to have a mechanism which enables the identification of
`identical data items so as to reduce multiple copies. It is
`further desirable to determine whether two instances of a
`data item are in fact the same data item, and to perform
`various other systems’ functions and applications on data
`items without relying on any context information or prop—
`erties of the data item.
`
`It is also desirable to provide such a mechanism in such
`a way as
`to make it
`transparent
`to users of the data
`processing system, and it is desirable that a single mecha-
`nism be used to address each of the problems described
`above.
`
`SUMMARY OF THE INVENTION
`
`This invention provides, in a data processing system, a
`method and apparatus for identifying a data item in the
`system, where the identity of the data item depends on all of
`the data in the data item and only on the data in the data item.
`Thus the identity of a data item is independent of its name,
`origin, location, address, or other information not derivable
`directly from the data, and depends only on the data itself.
`This invention further provides an apparatus and a method
`for determining whether a particular data item is present in
`the system or at a location in the system, by examining only
`the data identities of a plurality of data items.
`Using the method or apparatus of the present invention,
`the efficiency and integrity of a data processing system can
`be improved. The present invention improves the design and
`operation of a data storage system, file system, relational
`database, object-oriented database, or the like that stores a
`plurality of data items, by making possible or improving the
`design and operation of at least some or all of the following
`features:
`
`the system stores at most one copy of any data item at a
`given location, even when multiple data names in the system
`refer to the same contents;
`the system avoids copying data from source to destination
`locations when the destination locations already have the
`data;
`the system provides transparent access to any data item by
`reference only to its identity and independent of its present
`location, whether it be local, remote, or offline;
`the system caches data items from a server, so that only
`the most recently accessed data items need be retained;
`when the system is being used to cache data items,
`problems of maintaining cache consistency are avoided;
`the system maintains a desired level of redundancy of data
`items in a network of servers, to protect against failure by
`ensuring that multiple copies of the data items are present at
`different locations in the system;
`the system automatically archives data items as they are
`created or modified;
`
`4
`the system provides the size, age, and location of groups
`of data items in order to decide whether they can be safely
`removed from a local file system;
`the system can efficiently record and preserve any col-
`lection of data items;
`the system can efficiently make a copy of any collection
`of data items, to support a version control mechanism for
`groups of the data items;
`the system can publish data items, allowing other, possi-
`bly anonymous, systems in a network to gain access to the
`data items and to rely on the availability of the data items;
`the system can maintain a local inventory of all the data
`items located on a given removable medium, such as a
`diskette or CD—ROM, the inventory is independent of other
`properties of the data items such as their name, location, and
`date of creation;
`the system allows closely related sets of data items, such
`as matching or corresponding directories on disconnected
`computers,
`to be periodically resynchronized with one
`another;
`the system can verify that data retrieved from another
`location is the desired or requested data, using only the data
`identifier used to retrieve the data;
`the system can prove possession of specific data items by
`content without disclosing the content of the data items, for
`purposes of later legal verification and to provide anonym—
`ily;
`the system tracks possession of specific data items accord—
`ing to content by owner, independent of the name, date, or
`other properties of the data item, and tracks the uses of
`specific data items and files by content for accounting
`purposes.
`
`Other objects, features, and characteristics of the present
`invention as well as the methods of operation and functions
`of the related elements of structure, and the combination of
`parts and economies of manufacture, will become more
`apparent upon consideration of the following description
`and the appended claims with reference to the accompany-
`ing drawings, all of which form a part of this specification.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIGS. 1(a) and 1(b) depicts a typical data processing
`system in which a preferred embodiment of the present
`invention operates;
`FIG. 2 depicts a hierarchy of data items stored at any
`location in such a data processing system;
`FIGS. 3M9 depict data structures used to implement an
`embodiment of the present invention; and
`FIGS. 10(a)-28 arc flow charts depicting operation of
`various aspects of the present invention.
`DETAILED DESCRIPTION OF THE
`PRESENTLY PREFERRED EXEMPLARY
`EMBODIMENTS
`
`An embodiment of the present invention is now described
`:with reference to a typical data processing system 100,
`which, with reference to FIGS, 1(a) and 1(b), includes one
`or more processors (or computers) 102 and various storage
`devices 104 connected in some way, for example by a bus
`106.
`
`Each processor 102 includes a CPU 108, a memory 110
`and one or more local storage devices 112. The CPU 108,
`memory 110, and local storage device 112 may be internally
`connected, for example by a bus 114. Each processor 102
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`
`
`5,978,791
`
`5
`may also include other devices (not shown), such as a
`keyboard, a display, a printer, and the like.
`In a data processing system 100, wherein more than one
`processor 102 is used, that is, in a multiprocessor system, the
`processors may be in one of various relationships. For
`example,
`two processors 102 may be in a client/server,
`client/client, or a server/server relationship. These inter-
`processor relationships may be dynamic, changing depend-
`ing on particular situations and functions. Thus, a particular
`processor 102 may change its relationship to other proces-
`sors as needed, essentially setting up a peer-to-peer relation-
`ship with other processors. In a peer—to-peer relationship,
`sometimes a particular processor 102 acts as a client
`processor, whereas at other times the same processor acts as
`a server processor. In other words,
`there is no hierarchy
`imposed on or required of processors 102.
`In a multiprocessor system, the processors 102 may be
`homogeneous or heterogeneous. Further, in a multiprocessor
`data processing system 100, some or all of the processors
`102 may be disconnected from the network of processors for
`periods of time. Such disconnection may be part of the
`normal operation of the system 100 or it may be because a
`particular processor 102 is in need of repair.
`Within a data processing system 100, the data may be
`organized to form a hierarchy of data storage elements,
`wherein lower level data storage elements are combined to
`form higher level elements. This hierarchy can consist of, for
`example, processors, file systems, regions, directories, data
`files, segments, and the like. For example, with reference to
`FIG. 2, the data items on a particular processor 102 may be
`organized or structured as a file system 116 which comprises
`regions 117, each of which comprises directories 118, each
`of which can contain other directories 118 or files 120. Each
`file 120 being made up of one or more data segments 122.
`In a typical data processing system, some or all of these
`elements can be named by users given certain implementa-
`tion specific naming conventions, the name (or pathnarne) of
`an element being relative to a context. In the context of a
`data processing system 100, a pathname is fully specified by
`a processor name, a filesystem name, a sequence of zero or
`more directory names identifying nested directories, and a
`final file name. (Usually the lowest level elements, in this
`case segments 122, cannot be named by users.)
`In other words, a file system 116 is a collection of
`directories 118. Adirectory 118 is a collection of named files
`120—~both data files 120 and other directory files 118. A file
`120 is a named data item which is either a data file (which
`may be simple or compound) or a directory file 118. A
`simple file 120 consists of a single data segment 122. A
`compound file 120 consists of a sequence of data segments
`122. A data segment 122 is a fixed sequence of bytes. An
`important property of any data segment
`is its size,
`the
`number of bytes in the sequence.
`A single processor 102 may access one or more file
`systems 116, and a single storage device 104 may contain
`one or more file systems 116, or portions of a file system 116.
`For instance, a file system 116 may span several storage
`devices 104.
`
`In order to implement controls in a file system, file system
`116 may be divided into distinct regions, where each region
`is a unit of management and control. A region consists of a
`given directory 118 and is identified by the pathname (user
`defined) of the directory.
`In the following, the term “location”, with respect to a
`data processing system 100, refers to any of a particular
`processor 102 in the system, a memory of a particular
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`processor, a storage device, a removable storage medium
`(such as a floppy disk or compact disk), or any other physical
`location in the system. The term “local” with respect to a
`particular processor 102 refers to the memory and storage
`devices of that particular processor.
`In the following, the terms “True Name”, “data identity”
`and “data identifier” refer to the substantially unique data
`identifier for a particular data item. The term “True File”
`refers to the actual file, segment, or data item identified by
`a True Name.
`
`A file system for a data processing system 100 is now
`described which is intended to work with an existing oper-
`ating system by augmenting some of the operating system’s
`file management system codes. The embodiment provided
`relies on the standard file management primitives for actu—
`ally storing to and retrieving data items from disk, but uses
`the mechanisms of the present invention to reference and
`access those data items.
`
`The processes and mechanisms (services) provided in this
`embodiment are grouped into the following categories:
`primitive mechanisms, operating system mechanisms,
`remote mechanisms, background mechanisms, and extended
`mechanisms.
`
`\opoqouxasw
`
`Primitive mechanisms provide fundamental capabilities
`used to support other mechanisms. The following primitive
`mechanisms are described:
`1. Calculate True Name;
`2, Assimilate Data Item;
`. New True File;
`. Get True Name from Path;
`. Link path to True Name;
`. Realize True File from Location;
`. Locate Remote File;
`Make True File Local;
`. Create Scratch File;
`10. Freeze Directory;
`11. Expand Frozen Directory;
`12. Delete True File;
`13. Process Audit File Entry;
`14. Begin Grooming;
`15. Select For Remov