throbber
llllllllllllllIllllllllllll||l||lllll||||l||l|ll||l|l|||lllllllllllllllllll
`
`US()05978791A
`
`5,978,791
`[11] Patent Number:
`[19]
`United States Patent
`
`Farber et al.
`[45] Date of Patent:
`Nov. 2, 1999
`
`[54] DATA pRocnsstG SYSTEM USING
`SUBSTANTIALLY UNIQUE IDENTIFIERS To
`IDENTIFY DATA ITEMS, WHEREBY
`4
`igggggggfigflAITm/[S HAVE THE SAME
`.
`.
`.
`.
`.
`Inventors: Bail/1d 111. Farlfir, 813m, $211111? Ronald
`-
`5101mm],
`0“
`r00 7
`~
`‘
`’73] Ass1gnee: Kinetech, Inc., Northbrook, 111.
`
`[75]
`7
`
`
`
`21] Appl. No.: 03/960,079
`
`:22]
`
`Filed:
`
`Oct. 24, 1997
`
`Related U'Sl Application Data
`
`'63} Continuation of application No, 08/425,160, Apr, 11, 1995,
`'
`abandoned.
`
`’51]
`Int. Cl.6 ...................................................... G06F 17/30
`,521 U:S- Cl- -------------
`- 707/2; 707/1; 707/200
`
`58} Fleld Of Search
`.......... 707/2, 1, 200
`
`[56]
`
`Refelences We“
`05. PATENT DOCUMENTS
`
`...................... 364/200
`5/1990 Holloway et al.
`4,922,414
`
`364/900
`4972367 11/1990 Burke .......
`
`
`395/600
`4/1991 Bendel't et a .
`5,007,658
`365/23005
`5,025,421
`6/1991 C110 .........
`
`9.0991 Mama ..
`364/200
`5,050,074
`
`9/1991 Dyson ......
`.. 380/25
`5,050,212
`10/1991 Colwell et a1.
`.
`.'.. 341/55
`5,057,837
`
`7/1992 Kohayashi e, at.
`.
`_ 395/800
`5,129,081
`
`. 395/000
`5,129,082
`7/1992 Tll‘hng etal.
`
`5,144,667
`9/1992 Pogue,Jr. et at.
`380/45
`_____
`_ 395/425
`5,179,680
`1/1993 Come“ eta].
`
`. 395/600
`5,202,982
`4/1993 Gramlich at al.
`
`... 380/43
`5,208,858
`5/1993 Vollert et al.
`. 395/800
`5,276,901
`1/1994 Howell et a1.
`
`5,301,286
`4/1994 Rajani
`. 395/400
`
`. 395/600
`5,301,316
`4/1994 Hamilton et a1.
`8/1994 Moore .................. 380/4
`5,343,527
`
`5,357,623
`10/1994 Megot’y-Cohen
`395/425
`
`141995 Cannon """"""
`340/825'44
`55845565
`5,404,508
`4/1995 Konrad et al.
`395/600
`
`..
`
`OTHER PUBLICATIONS
`Witold Litwin et 01, Linear Hashing for Distributed Files,
`ACM SIGMOD, May, 1993 P134 327—336.
`Ming—Ling, Lo, et all, On Optimal Processor Allocation to
`Support 141110111100 Hash Joins, ACM SIGMOD, pp, 69—78,
`May 1993-
`Thomas A. Berson, Differential Cryptanalysis Mod 232 with
`Applications to MDS, pp. 69—81, 1992.
`
`(List continued on next page)
`
`Primary Examiner—4321111 V. Kulik
`Assistant Examinermlean R. Homere ,
`Attorney, Agent, or Firm—Pillsbury Madison & Sutro LLP
`[57]
`ABSTRACT
`
`in a data processing system, a mechanism identifies data
`items by substantially unique identifiers which depend on all
`of the data in the data items and only on the data in the data
`items. The system also determines Whether a particular data
`item is present in the database by examining the identifiers
`,
`.
`,
`.
`0f the mummy or data “‘3‘“
`
`................. 340/1725
`6/1972 Evangelisti 61 211.
`3,668,647
`364/200
`7/1980 Mitchell et a1.
`4,215,402
`
`.. 364/200
`9/1981 Clchelli et a1.
`4,290,105
`364/900
`3/1983 Rivest
`4,376,299
`9/1983 Rivest ct a .
`4,405,829
`.. 178/221
`.....
`10/1983 Neches et a),
`4,412,285
`.. 364/200
`11/1983 Summer, Jr, et 01
`4,414,624
`
`._ 322/388
`4/1984 Fletcher et 01.
`4,441,155
`8/1984 Benhase et 211.
`4,464,713
`
`2.. 364/200
`.. 364/200
`4,490,782 12/1984 Dixon et a1.
`2/1986 Entry, Jr. et al.
`.1 364/900
`4,571,700
`
`.
`4,577,293
`3/1986 Matick at al.
`365/189
`2/1987 Meaden .........
`4,642,793
`
`364/900
`.
`4,675,810
`6/1987 Gmner et a1.
`364/200
`4,691,299
`9/1987 Rivest et al.
`
`" 365/185
`364/200
`4,725,945
`2/1988 Kronstadtet al.
`364/900
`4,773,039
`9/1988 Zamora
`
`4,887,235 12/1989 Holloway et al.
`364/900
`........................... 364/200
`48 Claims, 31 Drawing Sheets
`4,888,681
`12/1989 Barnes et at.
`
`101
`m
`102
`$92
`
`
`
`
`711200255012
`1
`.
`.
`PROCESSOR
`53,3126:
`.
`.
`- Eg‘mifi
`.‘fl
`‘_
`W}
`1|!E
`
`
`
`
`l
`l
`102
`$92
`1
`102
`PROCEGGOR
`PROCESSOR
`PROCESSOR
`
`
`
`11m
`
`PROCESSOR
`
`{7
`
`1
`l
`
`cw
`‘
`3 E:
`E
`'
`112
`STORAGE
`DEVICE
`
`1
`3
`i
`
`11a
`
`
`
`lOZ
`
`
`W7"'42.?” M R
`LDE
`
`13.
`128
`AF
`
`, 7— ‘
`125
`M.
`
`TFR T
`:
`150
`" 130
`5m:
`15/
`L1-
`100 ‘
`9L
`
`,
`
`F757"
`'
`saw
`—«—J
`
`1
`
`;
`
`:
`
`
`
`
`
`
`
`

`

`5,978,791
`Page 2
`
`OTHER PUBLICATIONS
`
`William Perrizo, et a1,, Distributed Join Processing Perfor-
`mance Evaluation, 1994. Twenty—Seventh Hawaii Interna-
`tional Conference on System Sciences, vol. II, pp, 236—244.
`A concurrency Control Mechanism based on Extendible
`Hashing for Main Memory Database Systems, Vijay Kumar,
`pp. 109—113, ACM, vol. 3, 1989,
`Birgit Pfitzmann, Sorting Out Signature Schemes, Nov.
`1993, lst Conf, Computer & Comm. Security ’93 pp. 74—85.
`Bert dem Boer, et al., Collisions for the compression func«
`tion of MD, pp. 292—304, 1994.
`Sakti Pramanik, et 211., Multi—Directory Hashing, 1993, Info.
`Sys, vol. 18, No. 1, pp. 63—74.
`Murlidhar Koushik, Dynamic Hashing With Distributed
`Overflow Space: A File Organization With Good Insertion
`Performance, 1993, Info. Sys., vol. 18, No. 5, pp. 299—317.
`Witold Litwin, et al., LID—Linear Hashing for Distributed
`Files, IIP Labs Tech. Report No. HPL—93—21 Jun. 1993 pp.
`1—22.
`I-IAVAL —— A One—Way Hashing
`Yuliang Zheng, et a1.,
`Algorithm with Variable Length of Output
`(Extended
`Abstract), pp. 83—105, Advances in Cryptology, AUS CRIPT
`’ 92, 19 92.
`
`Chris Charnes and Josef Pieprzky, Linear Nonequivalence
`versus Nonlinearity, Pieprzky, pp. 156—164, 1993.
`Zhiyu Tian, et al., A New Hashing Function: Statistical
`Behaviour and Algorithm, pp. 3—13, SIGIR Forum, 1.993.
`G. L. Friedman, Digital Camera With Apparatus For
`Authentication of Images Produced From an Image File,
`NASA Case No. NPO—19108—1—CU, Serial No. 08/159,980,
`Nov. 24, 1993.
`H. Goodman, Feb. 9, 1994 Ada, Object—Oriented Tech—
`niques, and Concurrency in 'I‘eaching Data Sructures and
`File Management Report
`l.)ocnmentation 1’. AD—A275
`385 —— 94—04277.
`
`Advances in Cryptology—EUROCRYPT ’93, Workshop on
`the Theory and Application of Cryptographic Techniques
`Lofthus, Norway, May 23—27, 1993 Proceedings.
`Proceedings of the 1993 ACM SIGMOD International Con—
`ference on Management of Data, vol. 22, Issue 2, Jun. 1993.
`Advances in Cryptology—AUSCRYPT ’92 —— Workshop on
`the Theory and Application of Cryptographic Techniques
`Gold Coast, Queensland, Australia Dec. 13—16, 1992 Pro-
`ceedings.
`Search Report dated Jun. 24, 1996.
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 1 0f 31
`
`5,978,791
`
`mar
`
`N
`ow
`
`Now
`
`
` mowmmoomm
`
`
`NOVNev
`
`MOmmmoomm
`
`
`
`mowwwoommmOmmwoomm
`
`
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 2 0f 31
`
`5,978,791
`
`
`
`m0<mOHw
`
`m0_>ma
`
`mOwwmooma
`
`3:.0E
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 3 0f 31
`
`5,978,791
`
`
`
`>mOHoww=a
`
`5‘r
`
`NNr
`
`Pzwfiwww
`
`
`
`3;.
`
`MAE
`
`EmFW>$
`
`
`
`w:
`
`
`

`

`US. Patent
`
`N0v.2,1999
`
`Sheet 4 0f 31
`
`5,978,791
`
`I38
`7
`FIG.
`
`Raolon ID
`
`
`
`‘0
`
`
`
`Tlme of last access
`
`Time of last modification
`
`
`
`
`
`
`
`
`
`
`
`42
`
`_ 1
`
`Com-ressed File IE
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`FIG.5
`
`
`
`

`

`US. Patent
`
`w.
`
`1
`
`W
`
`‘h
`
`M
`
`5,978,791
`
`Nww
`
`
`
`2,6:82.35.230
`
`9cowucum.o
`
`Mmvw
`
`5madzmsuawmamanuwm
`
`
`
`Q”.Emummfifia
`
`Om.
`
`QHMommmoouw
`
`m0‘
`
`h.oE
`
`¢¢_
`
`
`
`
`
`
`SOflUmUOH@Uhgow
`
`

`

`US. Patent
`
`N0v.2,1999
`
`Sheet 6 of 31
`
`5,978,791 -
`
`FIG. IO(CI)
`
`SIMPLE
`
`DATA ITEM
`
`COMPUTE MD FUNCTION ON
`
`DATA lTEM
`
`DATA ITEM
`
`8214
`
`APPEND LENGTH MODULO 32 OF
`
`TRUE NAME
`
`1
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 7 01'31
`
`5,978,791
`
`
`
`
`
`
`8216
`
`SIMPLE?
`
`FIG. IO(b)
`
`
`8220
`
`
`PARTITION DATA STEM TNTO
`SEGMENTS
`
`
`
`
`
`
`
`
`
`
`ASSIMILATE EACH SEGMENT
`
`8222
`
`(COMPUTiNG ITS TRUE NAME)
`
`
`
`8224
`
`llllIII\
`
`\
`
`, """""3'21'8"""" \
`COMPUTE TRUE E
`NAME OF S!MPLE :
`DATA ITEM
`:
`I
`
`_______ T_””""'
`
`CREATE [NDIREQT BLOCK OF
`SEGMENT TRUE NAMES
`
`
`
`
`
`
`ASSIMILATE lNDlRECT BLOCK
`
`8226
`
`(COMPUTING ITS TRUE NAME)
`
`
` 5228
`
`REPLACE FINAL 32 BITS 0F TRUE
`NAME WITH LENGHT MOD 32 OF DATA
`
`ITEM
`
`
`
`

`

`US. Patent
`
`NOV. 2, 1999
`
`Sheet 8 0f 31
`
`5,978,791
`
`>mFZmmmoo
`
`WQmin—max:
`
`mm>
`
`mmmw
`
`n:wJEmekw
`
`wmmw
`
`
`
`n:we“.wkmjwo
`
`mm;
`
`NmNm
`
`w5<2may:wmoo
`
`mum—wmax...2.hw—xm
`
`w>wfibamm
`
`ommm
`
`>mh2m252wh<wmo«
`
`n:NJEmMOkw«rOF#2300mm:Hum«
`
`
`
`wnflmimmzkohum..
`
`lfl.
`
`ommw
`
`wzgmmhmo
`
`wE<zmam...
`
`:.07..
`
`
`
`
`
`
`
`

`

`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 9 of 31
`
`5,978,791
`
`FIG. l2
`
`
` 8240
`' 8238 '
`UPDATE
`
`FILE
`
`
`
`
`
`DEPENDENCY '
`
`LIST
`
`8242
`
`LOCKED?
`
`
`
`
`
`, SEND MESSAGE TO .
`
`3244
`
`: COMPRESS
`
`(IF DESIRED)
`
`CACHE SERVER TO
`
`UPDATE CACHE
`
`8246
`
`MlRROR
`
`(IF DESlRED)
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 10 0f 31
`
`5,978,791
`
`FIG. l3
`
` 8250
`
`SEARCH FOR
`THE
`
`
`PATHNAME
`
`A.
`
`O AD
`
`FAIL
`
`
`
`
`
`LDE INCLUDES
`
`TRUE NAME?
`
`
`
`Y S
`
`3253
`ASSIMILATE
`FILE it)
`
`
`
`NO
`
`8254
`LDE IDENTIFIES
`7 DIRECTORY?
`
`Y S
`
`8256
`
`FREEZE
`
`DIRECTORY
`
`

`

`US. Patent
`
`N0v.2, 1999
`
`Sheet 11 of 31
`
`5,978,791
`
`TRUE NAME
`
`EXISTS LOCALLY
`
`
`
`8260
`
`
`CONFIRM THAT
`
`
`
`
` 8262
`
`SEARCH FOR
`
`
`
`
`
`FIG. I4
`
`PATHNAME IN
`
`LDE TABLE
`
`
`
`
`
`8264
`
`
`I CONFIRM THAT 7
`DIRECTORY
`
`EXISTS
`
`
`
`8266
`
`
`
`8288
`
`
`NAMED FILE
`DELETE
`
`EXISTS?
`TRUE FILE
`
`
`
`
`
`8270
`
`
`
`CREATE
`
`ENTRY IN LDE
`
`
`& UPDATE
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 12 0f 31
`
`5,978,791
`
`meOmmmm
`
`wamOm
`
`vnww
`
`“Fm02mm
`
`MOm.345:wm0<mwm5
`
`mmZOmmmm
`
`mmmw
`
`
`
`MAEmam...awkzw
`
`
`
`OE]:nwzmafim
`
`NEH
`
`42w
`
`Nwmm
`
`may:Cams
`
`“a5m"
`
`awmamm
`
`ome
`
`
`
`m..=u_DZ."—
`
`9.0E
`
`mm;
`
`mwzomwwm
`
`m2h<0mz
`
`«mOmmmoom.<29.550.—9Nam"»
`
`Oz
`
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 13 of 31
`
`1W
`
`:0,
`
`9.,m2Boa
`
`om)‘fl7,mxaaommmm
`
`
`
`85..0:
`
`:53023...!
`
`mmOwwMOOMm
`
`>z<
`
`mwNw
`
`vmwm
`
`kzmjo
`
`wkowAmm
`
`aEOmwmooE
`
`Wm;
`
`Dwkome
`
`mmmw
`
`Emso
`
`kzmjo
`
`mgh¢s>
`
`whm<oo<ommm>P.sz
`
`wmmmh:MSE
`
`mmZOQmWK
`
`m0
`
`
`
`
`
`
`
`

`

`US. Patent
`
`NOV. 2, 1999
`
`Sheet 14 0f 31
`
`5,978,791
`
`cemwm
`
`MZEMMHMQ
`
`
`
`whdd20F<mixm
`
`kw:0kand.D72
`
`Wm:
`
`
`
`vw.MUM—20m
`
`62.19.5an
`
`Oommw
`
`>swEmkw>w
`
`momwm
`
`won”an:n5x004
`
`DD<,wwE<zmam...
`
`
`
`n:ZOF<OOAmomDOm
`
`
`
`MOmma.momDOwO...
`
`wE<zmam:
`
`nv
`
`
`
`mug—mMDmE.m>mwwmm
`
`momDOm20
`
`mowwwoomm
`
`
`
`
`
`0....m0<mww§02mm
`
`“#me
`
`Eva—.0:
`
`may;”.0momDOw
`
`20mmmam—"EBm2<2
`
`sz_._.<szwQ
`
`ommw
`
`mmOhw
`
`n:mowmmoomm
`
`
`
`
`
`
`
`
`
`

`

`US. Patent
`
`NOV. 2, 1999
`
`Sheet 15 0f 31
`
`5,978,791
`
`mmNm
`
`wwmeEOomQ
`
`ome
`
`«n:we".
`
`
`
`QmwwmmmEOQ,
`
`w>mkzw9:...
`
`
`
`m0»n:win..
`
`>mhzwMAEmam,
`
`mam.»mOwmm...7:
`
`«ms—<2
`
`ABE.oE
`
`
`
`
`
`

`

`US. Patent
`
`NOV.2,1999
`
`Sheet16 0f31
`
`5,978,791
`
`
`
`3:....9“.
`
`QMNder
`
`nomw,
`
`mdw,mm»
`
`i
`
`comm
`
`955.5
`
`Nomw
`
`>Eka
`
`mum:
`
`komgmm
`
`,
`
`3mm
`
`wm_m0m30w
`
`I‘.H~
`
`momw
`
`
`
`may:muémm
`
`:055E
`
`Emomnow
`
`numombow
`
`NEG:02
`
`womw
`
`mk<004
`
`wdumHOEmm
`
`
`
`
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 17 0f 31
`
`5,978,791
`
`mg»
`
`
`
`flqsomw:96qu
`
`may:m0>m00mm
`
`No...“Q.mJE
`
`«MAEmam...
`
`NNmm.
`
`mam...mxg
`
`4.400.“MAE
`
`\/Ami—n.
`
`mwmm
`
`mkmgmo
`
`MAEmamk
`
`Mm:
`
`mm_n=._.2wo_ma...
`
`
`
`MDME.GZCMUAN
`
`«MAE
`
`ommw
`
`2&2wkfimo
`
`
`
`MAE$Oh<m0w
`
`
`
`
`
`
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 18 0f 31
`
`5,978,791
`
`7mumm,
`
`m>OEmmw9BEw><m
`mu:
`
`>mkzw
`
`Mm;
`
`72:00mm:
`
`hr"
`
`
`
`
`
`.wgmz...man.2—?Nd".map—um“MAE
`
`kZDOo
`
`
`
`mm:kzms—mmomo
`
`ommw
`
`252Q...MAE>n~00
`
`3v?.OE
`
`
`
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 19 0f 31
`
`5,978,791
`
`783
`
`7:o<mm0u
`
`824.2522:az<mi.
`
`53553aWWW“.mEzEmomnm
`
`
`
`m..=u_>mOPOmmE
`
`m1...2—>m0k0mm5
`
`
`
`>m0k0mm5ZmZO
`
`5mm
`
`
`
`3m:mkdmmo
`
`
`
`Em:<._.<n_
`
`
`
`30..0E
`
`$8
`
`kzmfimmoz
`
`
`
`$00..MNwmmn.
`
`
`
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 20 0f 31
`
`5,978,791
`
`03%
`
`Jammwmwy<55.52oz<wan.
`
`
`omoomm9bfizm.genwizamomam
`
`8%,:05mo»
`smEm5.2_>m05mma
`
`ZO~P<EmOwZ”
`
`
`
`35..9“.
`
`
`
`
`
`,EN:<._.<O>>mz
`
`mzhmhfi§$m<
`
`Nvmw
`
`
`
`>mOkommaZm>5
`
`31mm
`
`thEmmomo
`
`mNmmmuwzk
`
`v804
`
`
`
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 21 of 31
`
`5,978,791
`
`ommw
`
`
`
`4.5“.mkdflmo
`
`
`
`mszzIk<m
`
`Nmmm
`
`
`
`O...zk<m$23
`
`mE<Zwsmk
`
`>~Fzm
`
`3%38
`
`o<wmmoim0"—mmOE02vme
`
`.2004WAG
`
`mvmm
`
`mam»wigON.0_..n_
`
`
`
`
`
`>m0kowm=o>m0h0wm5mekzmHmzoa
`
`
`
`
`
`

`

`US. Patent
`
`N0V.2, 1999
`
`Sheet 22 0f 31
`
`5,978,791
`
` 8354
`WAIT FOR
`
`FREEZE LOCK
`
`TO TURN OFF
`
`
`
`
`
`8356
`
`
`
`FIND TFR
`
`FIG.2I
`
`ENTRY
`
`
`
` 8358
`
`DECREMENT
`
`
`REFERENCE
`Ch! I‘ITUUI‘ l
`
`
`
`
`
`
`
`8362
`
`REFERENCE COUNT IS
`
`DELETE
`
`
`ZERO 8: NO DEPENDENT
`
`TRUE FILE
`
`
`
`SYSTEMS IN TFR?
`
`
`
`
`S364
`
`
`REMOVE FILE ID
`
`AND COMPRESSED
`FILE ID
`
`
`
`

`

`US. Patent
`
`N0V.2, 1999
`
`Sheet 23 0f 31
`
`5,978,791
`
`
`
` 8365
`GET
`
`
`
`OPERATION
`
`
`
`CREATE OR 7
`MODIFY?
`
`8368
`
`FIG. 22
`
`YES
`
`
`
`ASSIMILATE
`
`
`S358
`
`
`
`COPY OR DELETE
`
`COMPOUND?
`
`YES
`
`
`
`
`
`
`
`
`
`
`
`8378
`
`8370
`RECORD TRUE
`MODIFY USE‘
`
`
`
`
`
`
`COUNT OF EACH
`COMPONENT
`
`NAME IN AUDIT
`FILE
`
`
`8379
`
`FOR EACH PARENT
`
`
`
`DIRECTORY OR FILE,
`
`UPDATE USE COUNT,
`LAST ACCESS AND
`
`MODIFY TIMES
`
`
`
`

`

`US. Patent
`
`N0v.2, 1999
`
`Sheet 24 of 31
`
`5,978,791
`
`
`
` S382
`
`VWHUFY
`
`
`
`FIG. 23
`
`
`
`GROGMlNG
`
`LOCK OFF
`
`
`
`8384
`
`
`
`SET
`
`GROOMING
`
`LOCK
`
`8386
`
`
`
`
`SET GROOM
`
`COUNTS
`
`
`
`
`

`

`US. Patent
`
`N0v.2, 1999
`
`Sheet 25 0f31
`
`5,978,791
`
`
`
`
`FIND LDE
`
`S388
`
`RECORD
`
`F1324
`
`
`
`
` S390
`
`FIND, TFR
`
`RECORD
`
`
`S392
`
`
`INCREMENT
`
`GROOMING
`
`DELETE COUNT
`
`
`
`8394
`
`
`
`ADJUST FILE
`
`S!ZES
`
`
`
`

`

`US. Patent
`
`.
`
`NOV. 2, 1999
`
`Sheet 26 0f 31
`
`5,978,791
`
`FIG. 25
`
` 8396
`DELETE
`
`
`
`
`
`FILE
`
`
`
`
` 8398
`
`
`UNLOCK
`
`GROOMING
`
`LOCK
`
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 27 0f 31
`
`5,978,791
`
`BEN.oE
`
`oovw
`
`”Fm—VANMAE
`
`9.3.200...
`
`02
`
`vovw
`
`hmzxomnw
`
`zmmo
`
`Novm
`
`02mm
`
`fiomb‘wmo
`
`vam
`
`tmiomm
`
`zmmo
`
`>mChomm“
`
`>420.D<mm
`
`
`
`
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 28» 0f 31
`
`5,978,791
`
`omvw
`
`.2004mx<5
`
`m:minzmbkmmw29me>
`
`
`
`ME.EONE
`
`H02u:x00;
`
`meOOJ
`
`wdw
`
`w0m10<0
`
`mwvw
`
`Gzfim
`
`>Amkmfin=200
`
`
`
`2m_t._m>>w.-
`
`wva
`
`wo¢w mum”.
`
`mwgw
`
`wkdmmo
`
`35mg...
`
`vmvm
`
`2E3mm
`
`n:
`
`wczwrehgow
`
`
`
`MAEIOPéOw
`
`
`
`
`
`
`
`

`

`US. Patent
`
`m.N
`
`%2,
`
`M0m92m
`
`5,978,791
`
`9oamoowmwe."0
`
`%“Echomma
`
`
`
`zormjmnEzodfim
`
`tmiommwm>2.mo9953BE
`
`ASE0E
`
`
`
`met.Efizmm.
`
`may?EOEwe“.
`
`Nva
`
`
`
` EmbzmPMwmm.“macaw—MED
`
`
`
`mOumomOOMm
`
`win
`
`
`
`
`

`

`US. Patent
`
`NOV. 2, 1999
`
`Sheet 30 0f 31
`
`5,978,791
`
`thw
`
`mmjmm
`
`
`
`E00onmom
`
`we“.no
`
`wmvw
`
`.Ok>mkzwand.
`
`MAEb03<
`
`omvm
`
`mhmjmo
`
`MAEmay:
`
`SEN.9“.
`
`rmvm
`
`
`
`mm:woaamm
`
`wzo>m#2300
`
`wm>.
`
`wMJEwnmk
`
`w_kZDOOmmD
`
`mzo
`
`02
`
`02w<2wuzu.mm;
`
`ww2<zmeP
`
`
`
`
`
`
`
`

`

`US. Patent
`
`Nov. 2, 1999
`
`Sheet 31 0f 31
`
`5,978,791
`
`mm.0_L$53
`
`vaw
`
`OZ
`
`MES,”mam...
`
`vmvm
`
`82:0»
`
`wm>
`
`wowfim<>>m0m
`
`,
`
`
`
`mmOh.kmmndmm
`
`,wmvm
`
`w>§<0m2
`
`meOmmmm
`
`kwwadmm,
`
`,mm>DmigmCL0vam
`mewNMQEOOm09w..=mmMODJUZ
`
`wemnzw
`
`33m
`
`matwOm
`
`mmZOmwwm
`
`
`
`
`

`

`5,978,791
`
`1
`DATA PROCESSING SYSTEM USING
`SUBSTANTIALLY UNIQUE IDENTIFIERS TO
`IDENTIFY DATA ITEMS, WHEREBY
`IDENTICAL DATA ITEMS HAVE THE SAME
`IDENTIFIERS
`
`This is a continuation of application Ser. No. 08/425,160,
`filed on Apr. 11, 1995, which was abandoned upon the filing
`hereof.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`This invention relates to data processing systems and,
`more particularly, to data processing systems wherein data
`items are identified by substantially unique identifiers which
`depend on all of the data in the data items and only on the
`data in the data items,
`
`2. Background of the Invention
`Data processing (DP) systems, computers, networks of
`computers, or the like, typically offer users and programs
`various ways to identify the data in the systems.
`Users typically identify data in the data processing system
`by giving the data some form of name. For example, a
`typical operating system (OS) on a computer provides a file
`system in which data items are named by alphanumeric
`identifiers. Programs typically identify data in the data
`processing system using a location or address. For example,
`a program may identify a record in a file or database by using
`a record number which serves to locate that record.
`
`In all but the most primitive operating systems, users and
`programs are able to create and use collections of named
`data items,
`these collections themselves being named by
`identifiers. These named collections can then, themselves,
`be made part of other named collections. For example, an
`OS may provide mechanisms to group files (data items) into
`directories (collections). These directories can then, them~
`selves be made part of other directories. A data item may
`thus be identified relative to these nested directories using a
`sequence of names, or a so-called pathname, which defines
`a path through the directories to a particular data item (file
`or directory).
`As another example, a database management system may
`group data records (data items) into tables and then group
`these tables into database files (collections). The complete
`address of any data record can then be specified using the
`database file name, the table name, and the record number of
`that data record
`
`Other examples of identifying data items include: identi-
`fying files in a network file system, identifying objects in an
`object-oriented database,
`identifying images in an image
`database, and identifying articles in a text database.
`In general, the terms “data” and “data item” as used herein
`refer to sequences of bits. Thus a data item may be the
`contents of a file, a portion of a file, a page in memory, an
`object in an object—oriented program, a digital message, a
`digital scanned image, a part of a Video or audio signal, or
`any other entity which can be represented by a sequence of
`bits. The term “data processing” herein refers to the pro-
`cessing of data items, and is sometimes dependent on the
`type of data item being processed. For example, a data
`processor for a digital image may diiIer from a data pro-
`cessor for an audio signal.
`In all of the prior data processing systems the names or
`identifiers provided to identify data items (the data items
`being files, directories, records in the database, objects in
`
`10
`
`IS
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`65
`
`2
`object—oriented programming, locations in memory or on a
`physical device, or the like) are always defined relative to a
`specific context. For instance, the file identified by a par—
`ticular file name can only be determined when the directory
`containing the file (the context) is known. The file identified
`by a pathname can be determined only when the file system
`(context) is known. Similarly, the addresses in a process
`address space, the keys in a database table, or domain names
`on a global computer network such as the Internet are
`meaningful only because they are specified relative to a
`context.
`
`In prior art systems for identifying data items there is no
`direct relationship between the data names and the data item.
`The same data name in two dilferent contexts may refer to
`difi’erent data items, and two difi‘ferent data names in the
`same context may refer to the same data item.
`In addition, because there is no correlation between a data
`name and the data it refers to, there is no a priori way to
`confirm that a given data item is in fact the one named by a
`data name. For instance, in a DP system, if one processor
`requests that another processor deliver a data item with a
`given data name,
`the requesting processor cannot,
`in
`general, verify that the data delivered is the correct data
`(given only the name). Therefore it may require further
`processing, typically on the part of the requester, to verify
`that the data item it has obtained is,
`in fact,
`the item it
`requested.
`Acornmon operation in a DP system is adding a new data
`item to the system. When a new data item is added to the
`system, a name can be assigned to it only by updating the
`context in which names are defined. Thus such systems
`require a centralized mechanism for the management of
`names. Such a mechanism is required even in a multi~
`processing system when data items are created and identified
`at separate processors in distinct locations, and in which
`there is no other need for communication when data items
`are added.
`
`In many data processing systems or environments, data
`items are transferred between different
`locations in the
`system. These locations may be processors in the data
`processing system, storage devices, memory, or the like. For
`example, one processor may obtain a data item from another
`processor or from an external storage device, such as a
`floppy disk, and may incorporate that data item into its
`system (using the name provided with that data item).
`However, when a processor (or some location) obtains a
`data item from another location in the DP system,
`it
`is
`possible that this obtained data item is already present in the
`system (either at the location of the processor or at some
`other location accessible by the processor) and therefore a
`duplicate of the data item is created, This situation is
`common in a network data processing environment where
`proprietary software products are installed from floppy disks
`onto several processors sharing a common file server. In
`these systems, it is often the case that the same product will
`be installed on several systems, so that several copies of
`each file will reside on the common file server.
`
`In some data processing systems in which several pro-
`cessors are connected in a network, one system is designated
`as a cache server to maintain master copies of data items,
`and other systems are designated as cache clients to copy
`local copies of the master data items into a local cache on an
`sis-needed basis. Before using a cached item, a cache client
`must either reload the cached item, be informed of changes
`to the cached item, or confirm that the master item corre—
`sponding to the cached item has not changed. In other words,
`
`

`

`5,978,791
`
`3
`a cache client must synchronize its data items with those on
`the cache server. This synchronization may involve reload-
`ing data items onto the cache client, The need to keep the
`cache synchronized or reload it adds significant overhead to
`existing caching mechanisms.
`In view of the above and other problems with prior art
`systems, it is therefore desirable to have a mechanism which
`allows each processor in a multiprocessor system to deter—
`mine a common and substantially unique identifier for a data
`item, using only the data in the data item and not relying on
`any sort of context.
`It is further desirable to have a mechanism for reducing
`multiple copies ()Idata items in a data processing system and
`to have a mechanism which enables the identification of
`identical data items so as to reduce multiple copies. It is
`further desirable to determine whether two instances of a
`data item are in fact the same data item, and to perform
`various other systems’ functions and applications on data
`items without relying on any context information or prop—
`erties of the data item.
`
`It is also desirable to provide such a mechanism in such
`a way as
`to make it
`transparent
`to users of the data
`processing system, and it is desirable that a single mecha-
`nism be used to address each of the problems described
`above.
`
`SUMMARY OF THE INVENTION
`
`This invention provides, in a data processing system, a
`method and apparatus for identifying a data item in the
`system, where the identity of the data item depends on all of
`the data in the data item and only on the data in the data item.
`Thus the identity of a data item is independent of its name,
`origin, location, address, or other information not derivable
`directly from the data, and depends only on the data itself.
`This invention further provides an apparatus and a method
`for determining whether a particular data item is present in
`the system or at a location in the system, by examining only
`the data identities of a plurality of data items.
`Using the method or apparatus of the present invention,
`the efficiency and integrity of a data processing system can
`be improved. The present invention improves the design and
`operation of a data storage system, file system, relational
`database, object-oriented database, or the like that stores a
`plurality of data items, by making possible or improving the
`design and operation of at least some or all of the following
`features:
`
`the system stores at most one copy of any data item at a
`given location, even when multiple data names in the system
`refer to the same contents;
`the system avoids copying data from source to destination
`locations when the destination locations already have the
`data;
`the system provides transparent access to any data item by
`reference only to its identity and independent of its present
`location, whether it be local, remote, or offline;
`the system caches data items from a server, so that only
`the most recently accessed data items need be retained;
`when the system is being used to cache data items,
`problems of maintaining cache consistency are avoided;
`the system maintains a desired level of redundancy of data
`items in a network of servers, to protect against failure by
`ensuring that multiple copies of the data items are present at
`different locations in the system;
`the system automatically archives data items as they are
`created or modified;
`
`4
`the system provides the size, age, and location of groups
`of data items in order to decide whether they can be safely
`removed from a local file system;
`the system can efficiently record and preserve any col-
`lection of data items;
`the system can efficiently make a copy of any collection
`of data items, to support a version control mechanism for
`groups of the data items;
`the system can publish data items, allowing other, possi-
`bly anonymous, systems in a network to gain access to the
`data items and to rely on the availability of the data items;
`the system can maintain a local inventory of all the data
`items located on a given removable medium, such as a
`diskette or CD—ROM, the inventory is independent of other
`properties of the data items such as their name, location, and
`date of creation;
`the system allows closely related sets of data items, such
`as matching or corresponding directories on disconnected
`computers,
`to be periodically resynchronized with one
`another;
`the system can verify that data retrieved from another
`location is the desired or requested data, using only the data
`identifier used to retrieve the data;
`the system can prove possession of specific data items by
`content without disclosing the content of the data items, for
`purposes of later legal verification and to provide anonym—
`ily;
`the system tracks possession of specific data items accord—
`ing to content by owner, independent of the name, date, or
`other properties of the data item, and tracks the uses of
`specific data items and files by content for accounting
`purposes.
`
`Other objects, features, and characteristics of the present
`invention as well as the methods of operation and functions
`of the related elements of structure, and the combination of
`parts and economies of manufacture, will become more
`apparent upon consideration of the following description
`and the appended claims with reference to the accompany-
`ing drawings, all of which form a part of this specification.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIGS. 1(a) and 1(b) depicts a typical data processing
`system in which a preferred embodiment of the present
`invention operates;
`FIG. 2 depicts a hierarchy of data items stored at any
`location in such a data processing system;
`FIGS. 3M9 depict data structures used to implement an
`embodiment of the present invention; and
`FIGS. 10(a)-28 arc flow charts depicting operation of
`various aspects of the present invention.
`DETAILED DESCRIPTION OF THE
`PRESENTLY PREFERRED EXEMPLARY
`EMBODIMENTS
`
`An embodiment of the present invention is now described
`:with reference to a typical data processing system 100,
`which, with reference to FIGS, 1(a) and 1(b), includes one
`or more processors (or computers) 102 and various storage
`devices 104 connected in some way, for example by a bus
`106.
`
`Each processor 102 includes a CPU 108, a memory 110
`and one or more local storage devices 112. The CPU 108,
`memory 110, and local storage device 112 may be internally
`connected, for example by a bus 114. Each processor 102
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`

`

`5,978,791
`
`5
`may also include other devices (not shown), such as a
`keyboard, a display, a printer, and the like.
`In a data processing system 100, wherein more than one
`processor 102 is used, that is, in a multiprocessor system, the
`processors may be in one of various relationships. For
`example,
`two processors 102 may be in a client/server,
`client/client, or a server/server relationship. These inter-
`processor relationships may be dynamic, changing depend-
`ing on particular situations and functions. Thus, a particular
`processor 102 may change its relationship to other proces-
`sors as needed, essentially setting up a peer-to-peer relation-
`ship with other processors. In a peer—to-peer relationship,
`sometimes a particular processor 102 acts as a client
`processor, whereas at other times the same processor acts as
`a server processor. In other words,
`there is no hierarchy
`imposed on or required of processors 102.
`In a multiprocessor system, the processors 102 may be
`homogeneous or heterogeneous. Further, in a multiprocessor
`data processing system 100, some or all of the processors
`102 may be disconnected from the network of processors for
`periods of time. Such disconnection may be part of the
`normal operation of the system 100 or it may be because a
`particular processor 102 is in need of repair.
`Within a data processing system 100, the data may be
`organized to form a hierarchy of data storage elements,
`wherein lower level data storage elements are combined to
`form higher level elements. This hierarchy can consist of, for
`example, processors, file systems, regions, directories, data
`files, segments, and the like. For example, with reference to
`FIG. 2, the data items on a particular processor 102 may be
`organized or structured as a file system 116 which comprises
`regions 117, each of which comprises directories 118, each
`of which can contain other directories 118 or files 120. Each
`file 120 being made up of one or more data segments 122.
`In a typical data processing system, some or all of these
`elements can be named by users given certain implementa-
`tion specific naming conventions, the name (or pathnarne) of
`an element being relative to a context. In the context of a
`data processing system 100, a pathname is fully specified by
`a processor name, a filesystem name, a sequence of zero or
`more directory names identifying nested directories, and a
`final file name. (Usually the lowest level elements, in this
`case segments 122, cannot be named by users.)
`In other words, a file system 116 is a collection of
`directories 118. Adirectory 118 is a collection of named files
`120—~both data files 120 and other directory files 118. A file
`120 is a named data item which is either a data file (which
`may be simple or compound) or a directory file 118. A
`simple file 120 consists of a single data segment 122. A
`compound file 120 consists of a sequence of data segments
`122. A data segment 122 is a fixed sequence of bytes. An
`important property of any data segment
`is its size,
`the
`number of bytes in the sequence.
`A single processor 102 may access one or more file
`systems 116, and a single storage device 104 may contain
`one or more file systems 116, or portions of a file system 116.
`For instance, a file system 116 may span several storage
`devices 104.
`
`In order to implement controls in a file system, file system
`116 may be divided into distinct regions, where each region
`is a unit of management and control. A region consists of a
`given directory 118 and is identified by the pathname (user
`defined) of the directory.
`In the following, the term “location”, with respect to a
`data processing system 100, refers to any of a particular
`processor 102 in the system, a memory of a particular
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`processor, a storage device, a removable storage medium
`(such as a floppy disk or compact disk), or any other physical
`location in the system. The term “local” with respect to a
`particular processor 102 refers to the memory and storage
`devices of that particular processor.
`In the following, the terms “True Name”, “data identity”
`and “data identifier” refer to the substantially unique data
`identifier for a particular data item. The term “True File”
`refers to the actual file, segment, or data item identified by
`a True Name.
`
`A file system for a data processing system 100 is now
`described which is intended to work with an existing oper-
`ating system by augmenting some of the operating system’s
`file management system codes. The embodiment provided
`relies on the standard file management primitives for actu—
`ally storing to and retrieving data items from disk, but uses
`the mechanisms of the present invention to reference and
`access those data items.
`
`The processes and mechanisms (services) provided in this
`embodiment are grouped into the following categories:
`primitive mechanisms, operating system mechanisms,
`remote mechanisms, background mechanisms, and extended
`mechanisms.
`
`\opoqouxasw
`
`Primitive mechanisms provide fundamental capabilities
`used to support other mechanisms. The following primitive
`mechanisms are described:
`1. Calculate True Name;
`2, Assimilate Data Item;
`. New True File;
`. Get True Name from Path;
`. Link path to True Name;
`. Realize True File from Location;
`. Locate Remote File;
`Make True File Local;
`. Create Scratch File;
`10. Freeze Directory;
`11. Expand Frozen Directory;
`12. Delete True File;
`13. Process Audit File Entry;
`14. Begin Grooming;
`15. Select For Remov

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket