throbber
I|||||||||||l|||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
`
`USlJt]6-'-l-1528{lB1
`
`(12) Ulllted States Patent
`Farber et al.
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 6,415,230 B1
`*Jul. 2, 2002
`
`rtasrszt
`-iI"U9.u’3fl3
`.. T-"|'l9f2['_I2
`"Amt
`395,t20t).49
`rttorssc
`
`
`
`I.
`
`6t199‘t Stefik ct al.
`5.538.443 A
`I),-'1':J‘.’7 Hamilton El al.
`5,f}4ll,5{)4 A “
`W1998 Baliek et al.
`$802,291 A
`.~,sm_.494 A * wtuaa Nguyen
`.‘,9(l7,?(t4 A
`.7199!)
`(' d
`d
`ii,nttts_,ttta A *
`tartuttv liiirnliiiinctsgii
`6.13-4_.tan3 A *
`t{|l2tJtJf.I Jonesetal.
`OTHER PUBLICATIONS
`
`Gwertzntan,James,e1al.“The Case for Geographical Push-
`(Taching.” Technical Report I-IU TR 34-94 (excerpt), I-larv
`vard University, DAS, Cambridge, MA (12138, 1994, 2 pgs.
`Grigni, Michelangelo, et al. “'l‘igl1t Bounds on Minimum
`Broadcasts Networks.” SIAM Journal of Discrete Math-
`cmalics, vol. 4, N0. 2, May 199], pp. 2(}7'—222.
`
`(54)
`
`II)l«:N'l‘IFYIN(:ANl) RI£QUl£S'l‘IN(} DATA IN
`
`ARE BASED ON CONTENTS OF DATA
`..
`.
`~

`-
`~
`_
`I"“""‘"”' 3a‘i‘ildAi)hIirl";r’ OJ""]’~J('AhE)US)l’{ H
`{l}’é‘)"
`‘
`‘ac '“““'
`"”
`"“‘ '
`
`'
`
`(75)
`
`U3) Assignees: Kinetcch, IrIe.. Northbmok. IL (US);
`Digital Island, Inc-., San Francisco, CA
`{Us)
`
`(‘l
`
`N0li<-‘til
`
`31lb_i'5C110iifl}’ fiiti-'1<'ii|‘I1<=f.1l1t$l<=l'|‘I‘l Of lhi-5
`Pflllifll
`i5 Gxlcflilfld 0|‘ iidjufilt-‘Bil
`I-mil’-31‘ 35
`U-S-C 154(b) by U d«'i)/En
`
`This patent is st.Il)jet.:t to a terminal dis-
`elaimer.
`
`(I-is! continued on next page)
`
`(21) Appl. No.: 09,r2s3,160
`
`(22
`
`Filed:
`
`Apr. 1,1999
`
`Pr."mr1r_t' E'xrmtiner—Jean R. Homere
`(74) Attorney) .4genr, or Fit-rrt—Pi||s|aury Winthrop LLP
`Intellectual Property
`
`(57)
`
`ABSTRACT
`
`Related U-S- APP"‘33“"-in Data
`_
`_
`_
`_
`_
`_
`(62) D“”‘°"°n “I aPF'l"°a"°" N0‘ U8"U5”-'m9' hlcd 0“ 0'31‘ 24-‘
`1997, now Pat. No. S_.978,T91. which is a continuation of
`appficmion Na 03,-42360. med an AP,’ 11? 1;,-,g5_ now
`abandoned.
`
`(51)
`
`Int. C1?
`
`G06F 17130
`
`In asystem in which aset olidata items are distributed across
`a network of servers, at least some of the data items being
`cached versions ofdata items from a source server a content
`1 .
`.
`_
`_
`i
`.
`.
`I
`I
`.
`‘.
`delivery mcthodtneludcs detern‘1tnrng‘a tata tdenttficr fer a
`])3l'ltCLll£ll' data Item.
`the data tclenttfter being determined
`us'n a !'v -n fun 't'0n ofth ' dar
`‘om ris'nv th* art'cul' r
`da‘1a’5iwr*;fa"nd msgnsiw to: WJHZSI [Er tinfparjifulai dafa
`
`flrmuli 709303; Fioglmgi 709929
`(58) Field of Search
`70713, 10, IUI,
`707593 709903» 319a 239
`
`(56)
`
`References Cited
`U_S_ pA—n_:NT DOCUMENTS
`
`7'77”
`TUW2
`_m_H2
`?[}9,-"Z26
`iI"[)'?,."2[l5
`'r’{!'trt0
`
`
`
`.
`
`"'»°33.-417 A
`5,2fl2_.982 A *
`5,28_M9g A
`5,341,477‘ A
`5__452_,44? A *
`5,342.08‘.-‘ A
`
`551990 Chum 9' 3"
`4)'l993 Gramlieh et all
`M994 News
`8r'l‘)‘J4 Pitkin ct al.
`9r'l995 Nelson et al.
`‘H1996 Ncimal et at.
`
`108
`
`192
`pfigceafiofl
`
`
`
`STORAGE
`nEw::E
`
`'
`
`oz
`rnccessen
`
`1IEt
`
`STORAGE
`DEVICE
`
`192
`Pnocssson
`
`
`
`particular data item providing the particular data item from
`3 given mm of [he servers of the nmwm-k of Sen.“-S‘ The
`request for the particular data item may be resolved based on
`a measure of availability of at least one of the servers, where
`the measure 01' availability may be it measurement of band-
`width tn the server; a measurement of a cost ofa connection
`to the server, artdfor a measurement 01' a reliability of a
`connection to the server. The function used to determine the
`‘
`" ,
`t
`‘
`may be a message digest
`funettnn or
`:1 haslt
`
`‘
`
`55 Claims, 31 Drawing Sheets
`
`PROCESSOR
`
`.._____....‘
`
`
`
`EMCVMW 1001
`EMCVMW 1 00 1
`
`

`
`US 6,415,280 B1
`Page 2
`
`OTHER PUBLICATIONS
`
`Devine, Robert. “Design and Implementation of DDH: A
`Distributed Dynamic Hashing Algorithm." In Proceedings
`of 4th International Conference on Foundations of Data
`Organizations and Algorithms, 1993, pp. 101-114.
`Deering, Stephen, et al. “Multicast Routing in Datagram
`lnternetworks and Extended LAD-ls.” ACM Transactions on
`Computer Systems, vol. 8, No. 2, May 1990, pp. 85-110.
`Cormen. Thomas H., et al. Introduction to At'gorit!tttts, The
`MIT Press, Cambridge, Massachusetts, 1994, pp. 219-243,
`991-993.
`Naor, Moni, et al. "The Load, Capacity and Availability of
`Quorum Systems." In Proceedings of the 35th IEEE Sym-
`posium on Foundations of Computer Science, Nov. 1994,
`pp. 214-225.
`for Space-
`"Psuedorandom Generators
`Nisan, Noam.
`Bounded Computation." In Proceedings of the Twenty-
`Second Annual ACM Symposium on Theory of Computing,
`May I990, pp. 204-212.
`Palmer, Mark ct al. “Fido: ACache that Learns to Fetch.” In
`Proceedings of the 171}: International Conference on Very
`Large Data Bases, Sep. 1991, pp. 255-264.
`Peleg, David, et al. "The Availability of Quorum Systems."
`Information and Computation 123, 1995, 210-223.
`Rabin, Michael. “Elficicttt Dispersal of Information for
`Security, Load Balancing, and Fault Tolerance.” Journal of
`the ACM, vol. 36, No. 2, Apr. I989, pp. 335-348.
`Ravi, R., “Rapid Rumor Ramification: Approximating the
`Minimum Broadcast Time.” In Proceedings of the 35th
`IEEE Symposium on Foundation of Computer Science, Nov.
`1994, pp. 203-213.
`Schmidt, Jeanette, et at. “Chcrnot1‘—Hoetfding Bounds for
`Applications with Limited Independence." In Proceedings
`ot‘ the 4th ACS—SIAM Symposium on Discrete Algorithms,
`1993. pp. 331-340.
`Tarjart, Robert Endre, et al. “Storing a Sparse Table.”
`Communications of the ACM, vol. 22, No. 1], Nov. 1979,
`pp. 606-611.
`
`Wegman, Mark, et al. "New Hash Functions and Their Use
`in Authentication and Set Equality.” Journal of Computer
`and System Sciences vol. 22, Jun. 1981, pp. 265-279.
`
`Vitter. Jelfrcy Scott, et at. “Optimal Prefetching via Data
`Compression." In Proceedings of 32nd IEEE Symposium on
`Foundations of Computer Science, Nov. 1991, pp. 121-130.
`
`Fredman, Michael, et al. "Storing a Sparse Table with 0(1)
`Worst Case Access Time.” Journal of the Association for
`Computing Machinery, vol. 3!, No. 3, Jul. 1984, pp.
`538-544.
`
`Yao, Andrew Chi—Chih. “Should Tables be Sorted?” Journal
`of the Association for Computing Machinery, vol. 28, No. 3,
`Jul. 1931, pp. 615-628.
`
`Floyd, Sally, et al. “A reliable Multicast Framework for
`I-ight—Weight Sessions and Application level Framing.” In
`Proceedings of ACM SIGCOMM '95, pp. 342-356.
`
`Feeley, Michael, et :1]. "Implementing Global Memory Man-
`agement in a Workstation Cluster.” In Proceedings of the
`15th ACM Symposium on Operating Systems Principles,
`1995, pp. 201-212.
`Carter, J. Lawrence, et al. “Universal Classes of Hash
`Functions." Journal of Computer and System Scienoes, vol.
`18, No. 2, Apr. 1979, pp. 143-154.
`
`Patent Abstracts of Japan, “Electronic Mail Multiplexing
`System and Communication Control Method in The Sys-
`tern.” Jun. 30, 1993, JP 051625293.
`
`Kim et al., “Experiencess with Tripwire: Usinglntergrity
`Checkers For Intruosion Detection”, Coast Labs. Dept. of
`Computer Sciences Purdue University. Feb. 22, 1995, pp.
`1-12.
`
`Kim ct al., “The Design and Implementation of Tripwire: A
`file System Integrity Checker”, Coast Labs. Dept. of Com-
`puter Sciences Purdue University, Nov. 19, 1993, pp. 1-21.
`
`* cited by examiner
`
`

`
`3U
`
`m.
`umA0:0E
`
`m
`
`m2:
`
`
`M_...mo_>momo_>mn
`
`momwmoommmommmooma...
`
`moéopmmoqmopw
`
`N2N2A?9
`
`N9NE.NE.
`
`
`
`
`
`mmommmoommmommmoomn.mommmooma
`
`n
`
`6m
`
`m0002|)H
`
`
`
`4.,'-I'll}-|-
`
`j
`
`

`
`Jul. 2, 2002
`
`Sheet 2 of 31
`
`US 6,415,280 B1
`
`3:_o_n_
`
` U.S.Patent
`
`

`
`FIG. 2
`
`FILE
`svsrem
`
`"15
`
`118
`
`1 1 8
`
`118
`
`120
`
`120
`
`FILE
`
`122
`
`122
`
`.
`
`‘
`
`I
`
`. . .
`
`.
`
`I -
`
`120
`
`FILE
`
`122
`
`SEGMENT
`
`
`
`
`
`[fl03z‘§I17‘9Sf]15.‘J0‘Ewalls200?‘Z‘In!’Illfilfld'S'fl
`
`
`
`
`
`

`
`U.S. Patent
`
`Jul. 2, 2002
`
`Sheet 4 of 31
`
`US 6,415,280 B1
`
`FIG. 3
`
`.38,
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`I40
`
`I42
`
`
`
`
`
`
`
`FIG. 4
`
`FIG. 5
`
`
`
`
`
`2
`
`
`
`
`
`
`
`
`
`
`
`
`
`

`
`FIG. 6
`
`source ID
`
`source t;-e
`source r1-hts
`
`source availabilit
`source location
`
`Dri-inal Name
`
`O-eration
`
`Processor ID
`
`Timestamo
`
`Pathname
`True Name
`
`date of entr
`
`t;oe of entr
`True Name
`
`FIG. 7
`
`F|G.8
`
`FIG. 9
`
`
`
`
`
`[CJ0SmusZ002‘Z'|“l'Ill313(]'S'[]
`
`150
`
`I5108Z‘SI17‘9Sf]
`
`

`
`U.S. Patent
`
`Jul. 2, 2002
`
`Sheet 6 of 31
`
`US 6,415,280 B1
`
`FIG. IO(cI)
`
`S!MPJ_E
`
`DA TA ITEM
`
`‘If
`
`I-———-nu-.————————.-———-——I-I"
`
`‘_,._¢-u.-.a-u.au.ou-—..-.¢-.4-nuns-a-..—u.—a-—u.-u.—.a-cu.-...,
`
`COMPUTE MD FUNCTION ON
`
`DATA ITEM
`
`8214
`
`APPEND LENGTH MODULO 32 OF
`
`DATA ITEM
`
`TRUE NAME
`
`l
`
`

`
`U.S. Patent
`
`Jul. 2, 2002
`
`Sheet 7 of 31
`
`US 6,415,280 B1
`
`
` FIG.
` S216
`YES
`DATA ITEM
`SIMPLE?
`
`$220
`
`|O(b)
`
`PARTITION DATA ITEM INTO
`SEGMENTS
`
`
`
`
`3222
`
`ASSIMILATE EACH sesmem
`
`(COMPUTING rrs TRUE NAME)
`
`; ‘""§21‘3""'‘~,
`COMPUTE TRUE ;
`; NAME or: SIMPLE :
`-_
`DATA ITEM
`:
`
`S224
`
`CREATE INDIRECT BLOCK OF
`
`SEGMENT TRUE NAMES
`
`S226
`ASSIMILATE INDIRECT BLOCK
`
`(COMPUTING ITS TRUE NAME)
`
`S228
`REPLACE FINAL 32 BITS OF TRUE
`NAME WITH LEN GHT MOD 32 OF DATA
`ITEM
`
`

`
`FIG. ll
`
`
`S230
`
`
`DETERMINE
`TRUE NAME
`
`
`
`DOES TRUE NAME
`
`EXIST IN TRUE FILE
`
`REGISTRY?
`
`
`
`
`
`
`
`
`S236
`
`' CREATE NEW ENTRY
`" SET USE COUNT T0 1
`" STORE FILE ID
`“ SET OTHER FIELDS
`
`
`
`1“31‘3d'S'fl
`
`B:
`
`5 L
`
`NIN)
`3IN)
`
`‘:7’
`E
`E
`3
`
`
`
`IE108Z‘SI17‘9Sf]
`
`DOES ENTRY
`HAVE FILE ID?
`
`S239
`
`STORE FILE ID
`
`DELETE FILE ID
`
`YES
`
`3238
`
`

`
`U.S. Patent
`
`Jul. 2, 2002
`
`Sheet 9 of 31
`
`US 6,415,280 B1
`
`FIG. I2
`
`
`
`
`
`£240
`
`u DATE
`
`
`
`
`
`DEPENDENCY
`
`
`
`LIST
`
`S242
`
`
`
`3238
`
`FILE
`
`LOCKED?
`
`
`
`
`
`SEND MESSAGE TO
`
`
`
`CACHE SERVER TO
`
` S244
`
`UPDATE CACHE
`
`COMPRESS
`
`(II= DESIRED)
`
` S246
`
`MIRROR
`
`(IF DESIRED)
`
`

`
`U.S. Patent
`
`Jul. 2, 2002
`
`Sheet 10 of 31
`
`US 6,415,280 B1
`
`3250
`SEARCH FOR
`THE
`PATHNAME
`
`‘ .
`
`.
`
`. ’
`
`FAIL
`
`
`
`LDE INCLUDES
`
`TRUE NAME?
`
`
`
`S258
`
`
`
`
`ASSIMIIATE
`
`
`FILE ID
`DIRECTORY?
`
`
`
`
`LDE IDENTIFIES
`
`Y8
`
`S256
`
`FREEZE
`
`DIRECTORY
`
`

`
`
`
`Sheet 11 of 31
`
`US 6,415,280 B1
`
`FIG. I4
`
`
`
`
`
`S268
`
`DELETE
`
`U.S. Patent
`
`.lul. 2, 2002
`
`CONFIRM THAT
`
` S260
`
`
`
`EXISTS LOCALLY
`
`
`
`TRUE NAME
`
`
`
`
`
`SEARCH FOR
`
`
`
`PATH NAME IN
`
`LDE TABLE
`
` S254
`
`
`CONFIRM THAT
`
`8262
`
`
`
`
`
`DIRECTORY
`
`EXISTS
`
`S268
`
`NAMED FILE
`
`EXISTS?
`
`
`
`
`
`TRUE FILE
`
`
` S270
`
`CREATE
`
`
`
`ENTRY IN LDE
`
`8: UPDATE
`
`

`
`
`
`
`
`Iao3z‘sw‘9sn19J"z:musznuz‘zWyuamd'S'[]
`
`
`
`
`NEGATIVE
`S274
`
`RESPONSE
`SEND RTF
`
`MESSAGE 8.
`WAIT FOR
`RESPONSE
`
`
`
`
`S278
`
`REQUEST
`MOUNT
`
`
`
`
`S276
`
`S280
`ENTER TRUE FILE
`FIND FILE
`RITU RNED INTO
`
`S282
`TFR
`VERIFY TRUE
`
`FILE (IF
`DESIRED}
`
`POSITIVE
`RESPONSE
`
`

`
`
`
`
`
`Ifl08Z‘SI17‘9Sr]mo91musz00z‘z‘InfIllfilfid'S'fl
`
`
`
`
`
`
`
` S284
`CLIENT
`SELECT5
`
`
`
`
`PROCES SO R{S}
`
`
`
`S265
`
`CLIENT
`
`BROADCASTS
`
`NEG - TIVE
`RESPONSE
`OR
`TIME UT
`
`S285
`
`ANY
`PROCES$ORS
`ELECTED
`
`YES
`
`0 F
`
`
`
`
`IG. |6(a)
`
`POSI IVE
`RESPONSE
`__._+_._-_.
`
`

`
`
`
`
`
`Ifl08z‘sw‘9Snm0Mmusznuz‘z‘InfJllalfid'S'{1
`
`
`
`
`
`
`
`
`
`
`SOURCE OF TRUE
`STORE
`NAME DIFFERS FROM
`PROCESSOR ID
`DESTINATION?
`
`
`S290
`
`S2908
`
`LOOK UP TFR FOR
`TRUE NAME 8; ADD
`SOURCE LOCATION ID
`TO SOURCE IDS FOR
`TRUE NAME
`
`
`
`FIG. l6(b)
`
`S291d
`
`DETERMINE
`YES
`EXPIRATION DATE
`AND ADD TO LIST
`
`
`
`
`
`S2910
`
`
`SEND MESSAG E TO
`SOURCEIS ‘
`RESERVE TRUE FILE
`PUBLl$HlNG >-
`ON SOURCE
`svsrem? '
`PROCESSOR
` \/
`
`S2900
`
`O
`
`
`
`

`
`m3PQMU
`
`2
`
`m
`
`MMHW...
`
`14,6SU
`
`1.B00%.
`
`5
`
`mm:
`
`
`
`2.,mm?
`
`5.,oz
`
`
`
`wmmmm..m>
`
`8%
`
`mmmmmsoomo
`
`omwmmmmsoo
`
`...n_m._E
`
`...ms_<zU.AEEM02>Ezmm.__n_mam
`
`
`
`
`

`
`QMU
`
`...I-nBta
`
`7.M
`
`,..0m%
`
`1B0002|)514.,6
`
`mBEor.
`
`emmam:mN_._<mm
`
` Emomaomm205.m.__n_
`
`
`
`m»<oo._2,88
`
`P--
`
`
`mposmmm._.om._mm
`
`
`
`
`
`ma.momaomm.=u_
`
`
`

`
`
`
`
`
`Ifl08z‘sw‘9Snm0L!muszauz‘z‘Infmama'S'f1
`
`
`
`
`
`
`
` CRATCH SHOULD
`
`
`
`BE COPY OF TRUE
`FILE?
`
`FIG. |8(cl)
`
`LDE IDENTIFIES
`
`EXISTING TRUE
`FILE?
`
`YES
`
`FILE ID FOR
`
`TRUE FILE?
`
`
`
`
`
`
`
`S322
`
`
`
`MAKE TRUE
`FILE LOCAL
`
`
`
`S320
`
`CREXTE NEW
`SCRATCH FILE
`
`

`
`U.S. Patent
`
`Jul. 2, 2002
`
`Sheet 18 of 31
`
`US 6,415,280 B1
`
`
`
`.m._mEmm:z_w9m.__u_m><mo.m.__n_.uEo._.m.m.__n_
`
`
`N_n__._.m>os_mE
`
`
`
`
`
`
`>Ezm_mm:pzmsmmomo
`
`pzzoo
`
`
`
`38Emz9m_.__”_.500
`
`88
`
`mm:0
`
`
`
`35..o_n_
`
`
`
`
`

`
`
`
`
`
`IE108Z‘SI17‘9SnI91"61WIS3003‘?'|“1'l“919d'S'fl
`
`
`
`
`
`
`
`
`
`S332
`
`
`INCREMENT
`FREEZE LOCK
`
`FIG. |9(o)
`
` S337
`CREATE NEW
`DATA ITEM
`
`
`
`
`
`
`
`
`FOR EACH
` S334 S336
`
`
`
`ASSIMILATE
`susonnmms
`gliifiggy
`UNASSIMILATED
`FILE AND
`
`
`
`
`
`DIRECTORY IN THE
`FILE
`GIVEN nmecronv
`
`
`
`

`
`
`
`
`
`
`
`I90gz‘§[17‘9Sf]19J"ozmusznuz‘z1"!‘Illfilfid'S'fl
`
`
`
`334°
`S338
`ADD ENTRY T0
`A§§l‘13.%:%L
`NEW DATA
`DESIRED
`mam
`INFORMATION
`
`
`
`S342
`
`ASSIMIUXTE THE
`NEW DATA ITEM
`
`
`FIG. I9(b)
`
` S344
`DECREMENT
`
`
`THE FREEZE
`LOCK
`
`
`
`
`
`FOR EACH
`SUBORDINATE
`FILE AND
`DIRECTORY IN THE
`GIVEN DIRECTORY
`
`
`
`
`

`
`
`
`
`
`
`
`
`
`Ifl08Z‘SI17‘9Snm0Itmusznuz‘z‘InfIllalfid'S'{1
`
`
`
`
` S346
`MAKE TRUE
`FILE LOCAL
`
`
`
`8353
`FOR EACH
`DIRECTORY
`ENTRY
`
`FIG. 20
`
`S354
`
`DONE
`
`NO MORE
`ENTRIES
`
`3348
`
`
`
`
`
`
`READ
`DIRECTORY
`
`
`
`
`S350
`
`CREATE FULL
`PATHNAME
`
`S352
`
`
`
`LINK PATH TO
`TRUE NAME
`
`

`
`U.S. Patent
`
`Jul. 2, 2002
`
`Sheet 22 of 31
`
`US 6,415,280 B1
`
`
`
` S354
`WAIT FOR
`
`
`
`FREEE LOCK
`TO TURN OFF
`
`
`
`
`
`FIG.2|
`
`S358
`
`
`
`DECREMENT
`
`REFERENCE
`COUNT
`
`
`
`
`S362
`DELEFE
`
`TRUE FILE
`
`
`
`
`
`REFERENCE COUNT IS
`
`ZERO 8: NO DEPENDENT
`
`SYSTEMS IN TFR?
`
`
`
`
`
`AND COMPRESSED
`FILE ID
`
`
`
`REMOVE FILE ID .
`
`S364
`
`

`
`U S Patent
`
`lul. 2 2002
`
`Sheet 23 of 31
`
`US 6.415.280 31
`
`FIG. 22
`
`YES
`
`S368
`
`
`ASSIMILATE
`
`YES
`
`
`
`3378
`
`S370
`
`MODIFY USE
`
`RECORD TRUE
`
`COUNT OF EACH
`COMPONENT
`
`NAME IN AUDIT
`
`FILE
`
`
`
`
`COPY OR DELETE
`COMPOUND?
`
`
`
`
`8379
`
`
`
`
`FOR EACH PARENT
`
`DIRECTORY OR FILE,
`UPDATE USE COUNT.
`LAST ACCESS AND
`MODIFY TIMES
`
`
`
`

`
`
`
`Sheet 24 of 31
`
`US 6,415,280 B1
`
`
`
` S382
`VERIFY
`
`
`
`
`
`GROOMING
`
`LOCK OFF
`
`
`
`
`
`
` 8384
`SET
`
`GROOMING
`
`LOCK
`
`
`
` 3386
`
`SET GROOM
`
`COUNTS
`
`U.S. Patent
`
`Jul. 2, 2002
`
`FIG. 23
`
`
`
`

`
`U.S. Patent
`
`Jul. 2, 2002
`
`Sheet 25 of 31
`
`US 6,415,280 B1
`
`FIG. 24
`
`S392
`
`INCREMENT
`
`GROOMING
`
`DELETE COUNT
`
`S394
`
`ADJUST FILE
`
`SIZES
`
`

`
`
`
`U.S. PatentU.S. Patent
`
`
`
`Jul. 2, 2002Jul. 2, 2002
`
`
`
`Sheet 26 of 31Sheet 26 of 31
`
`
`
`US 6,415,280 B1US 6,415,280 B1
`
`
`
`FIG. 25FIG. 25
`
`

`
`
`
`1“31‘3d'STl
`
`:
`-‘
`E5
`E
`
`{D='
`EI9‘-J
`E.
`5:’
`
`
`
`IE108Z‘SI17‘9Sf]
`
`
`
`FIG. 26(0)
`FILE EXISTS
`
`
`S402
`
`
`
`35"“?
`0
`CREATED?
`
`
`
`
`LOCALLY?
`
`
`
`
`
`
`S422
`
`S419
`PROHIBIT
`
`OPEN
`SCRATCH
`
`FILE?
`
`
`s4o4
`PROHIBIT
`opEN
`
`

`
`'S'fl 3|1913d
`
`
`
`LOCK IF NOT
`LOCKED
`
`BEING
`
`
`
`
`COMPLETELY
`-
`RITTEN
`
`3417
`
`CREATE
`ERASE FILE
`
`scans:-1
`
`com’
`
`
`
`54°“
`CREATE
`SCRATCH FILE
`
`
`
`E
`‘N
`‘L.
`
`GS
`
`_j _
`
`an
`
`E5
`
`3
`33
`S,
`53
`
`
`
`Ifl08Z‘SI17‘9Sf]
`
`5420
`MAKE LOCAL
`VERSION 3.
`
`RETURN FILE ID
`FROM TFR
`S424
`
`RETURN
`SCRATCH FILE
`
`ID
`
`
`
`
`
`
`
`
`F|G.26(b)
`
`

`
`
`
`
`
`Ifl0sz‘sw‘9so191"62milsZ002‘:'I"l‘Ilifilfid'S'fl
`
`
`
`
`
`
`
` S422
`DETERMINE LDE &
`RT ENTRY
`RECORDS FOR
`FILE
`
`
`
`
`
`
`
`PROHIBIT
`
`DELETION
`
`0 LDE RECORD 0 "
`
`
`FILE LOCKED OR IN
`
`
`READ-ONLY
`
`DIRECTORY?
`
`
`S424
`
`IDENTIFY TRUE
`FILE FROM TRUE
`
`NAME
`
`FIG. 27(0)
`
`

`
`
`
`
`
`Ifl0sz‘sw‘9so191"09milsH102‘:‘Infmama'S'fl
`
`
`
`
`
`
`
`
`YES
`
`
`FILE HAS NO
`TRUE NAME?
`
`
`
`S427
`DELETE
`
`
`
`
`
`SCRATCH COPY
`USE. COUNT IS
`ONE
`OF FILE
`
`
`TRUE FILE'S
`
`
`
`
`
`
`
`
`S430
`DELETE
`TRU E FILE
`
`S431
`
`REDUCE USE
`
`COUNT BY one
`
`
`
`
`
`S428
`
`ADD ENTRY T0
`AUDIT FILE
`
` FIG. 27( b)
`
`

`
`
`
`
`
`Iao3z‘sw‘9sn19J"19musznuz‘zWyuamd'S'[]
`
`S432
`
`Loom
`TRUE NAME
`
`FIG. 28
`
`YES
`
`S434
`
`FOUND?
`
`
`No
`
`
`
`NCLUDES FILE [D
`OR COMPRESSED
`
`
`
` S442
`FORWARD
`REQUEST TO BE
`
`REQUEST
`FORWARDED?
`
`
`
`
`
`
`S438
`
`NEGATIVE
`RESPONSE
`
`
`
`
`
`
`S444
`
`POSITIVE
`
`RESPONSE
`
`
`
`

`
`US 6,415,280 B1
`
`1
`IDENTIFYING ANI) REQUES'I‘ING DATA IN
`NETWORK USING ll)I€N'I‘IFll£RS WHICH
`ARE BASEI) ON CON’l‘ICN'I‘S 01"‘ IJATA
`
`This is a division of application Ser. No. (t8i96{),079,
`filed Oct. 24, 199?, now US. Pat. No. 5,9?8,791 filed Oct.
`24, 2001 which is a continuation of Scr. No. (l8i425,160,
`filed Apr. 11, 1995, now abandoned.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`This invention relates to data processing systems and,
`more particularly, to data processing systems wherein data
`items are identified by substantially unique identifiers which
`depend on all of the data in the data items and only on the
`data in the data items.
`
`ill
`
`15
`
`2. Background of the Invention
`Data processing (DP) systems, computers, networks of
`computers, or the like, typically offer users and programs
`various ways to identify the data in the systems.
`Users typically identify data in the data processing system
`by giving the data some form of name. For example, a
`typical operating system (OS) on a computer provides a file
`system in which data items are named by alphanumeric -
`identifiers. Programs typically identify data in the data
`processing system using a location or address. For example,
`a program may identify a record in a file or database by using
`a record number which serves to locate that record.
`
`30
`
`In all but the most primitive operating systems, users and
`programs are able to create and use collections of named
`data items, these collections themselves being named by
`identifiers. These named collections can then, themselves,
`be made part of other named collections. For example, an
`OS may provide mechanisms to group files {data items) into
`directories (collections). These directories can then, them-
`selves be made part of other directories. A data item may
`thus be identified relative to these nested directories using a
`sequence of names, or a so-called pathname, which defines
`a path through the directories to a particular data item (file
`or directory).
`As another example, a database management system may
`group data records (data items) into tables and then group
`these tables into database files (collections). The complete
`address of any data record can then be specified using the
`database file name, the table name, and the record numberof
`that data record.
`
`Other examples of identifying data items include: identi-
`fying files in a network file system, identifying objects in art
`object-oriented database,
`identifying images in an image
`database, and identifying articles in a text database.
`In general, the terms "data” and "data item” as used herein
`refer to sequences of bits. Thus a data item may be the
`contents of a tile, a portion of a liie, a page in memory, an
`object in an object-oriented program, a digital message, a
`digital Scanned image, a part of a video or audio signal, or
`any other entity which can be represented by a sequence of
`bits. The term "data processing” herein refers to the pro-
`cessing of data items, and is sometimes dependent on the
`type of data item being processed. For example, a data
`processor for a digital image may differ from a data pro-
`cessor for an audio signal.
`In all of the prior data processing systems the names or
`identifiers provided to identify data items (the data items
`being tiles, directories, records in the database, objects in
`object-oriented programming, locations in memory or on a
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`physical device, or the like) are always defined relative to a
`specific context. For instance, the [lie identified by a par-
`ticular file name can only be determined when the directory
`containing the file (the context) is known. The file identified
`by a pathnamc can be determined only when the file system
`(context) is known. Similarly,
`the addresses in a process
`address space, the keys in a database table. or domain names
`on a global computer network such as the Internet are
`meaningful only becaum they are specified relative to a
`CCIIIIEXI.
`
`In prior art systems for identifying data items there is no
`direct relationship between the data names and the data item.
`The same data name in two different contexts may refer to
`different data items, and two different data names in the
`same context may refer to the same data item.
`In addition, because there is no correlation between a data
`name and the data it refers to, there is no a priori way to
`confirm that a given data item is in fact the one named by a
`data name. For instance, in a DP system, if one processor
`requests that another processor deliver a data item with a
`given data name,
`the requesting processor cannot,
`in
`general, verify that
`the data delivered is the correct data
`(given only the name}. Therefore it may require further
`processing, typically on the part of the requestor, to verify
`that
`the data item it has obtained is,
`in fact,
`the item it
`requested.
`A common operation in a DP system is adding a new data
`item to the system. When a new data item is added to the
`system, a name can be assigned to it only by updating the
`context
`in which names are defined. Thus such systems
`require a centralized mechanism for the management of
`names. Such a mechanism is required even in a multi-
`processing system when data items are created and identified
`at separate processors in distinct locations, and in which
`there is no other need for communication when data items
`are added.
`
`In many data processing systems or environments, data
`items are transferred between ditIerent
`locations in the
`system. These locations may be processors in the data
`processing system, storage devices, memory, or the like. For
`example, one processor may obtain a data item from another
`processor or from an external storage device, such as a
`tloppy disk, and may incorporate that data item into its
`system (using the name provided with that data item).
`However, when a processor (or some location) obtains a
`data item from another location in the DP system,
`it
`is
`possible that this obtained data item is already present in the
`system (either at the location of the processor or at some
`other location accessible by the processor) and therefore a
`duplicate of the data item is created. This situation is
`common in a network data processing environment where
`proprietary software products are installed from floppy disks
`onto several processors sharing a common file server. In
`these systems, it is often the case that the same product will
`be installed on several systems, so that several copies of
`each file will reside on the common file server.
`
`In some data processing systems in which several pro-
`cessors are connected in a network, one system is designated
`as a cache server to maintain master copies of data items,
`and other systems are designated as cache clients to copy
`local copies of the master data items into a local cache on an
`as—nceded basis. Before using a cached item, a cache client
`must either reload the cached item, be informed of changes
`to the cached item, or confirm that the master item corre-
`sponding to the cached item has not changed. In other words,
`a cache client must synchronize its data items with those on
`
`

`
`US 6,415,280 B1
`
`3
`the cache server. This synchronization may involve reload-
`ing data items onto the cache client. The need to keep the
`cache synchronired or reload it adds significant overhead to
`existing caching mechanisms.
`In view of the above and other problems with prior art
`systems, it is therefore desirable to have a mechanism which
`allows each processor in a multiprocessor system to deter-
`mine a common and substantially unique identifier for a data
`item, using only the data in the data item and not relying on
`any sort of context.
`It is further desirable to have a mechanism for reducing
`multiple copies ofdata items in a data processing system and
`to have a mechanism which enables the identification of
`identical data items so as to reduce multiple copies. It is
`further desirable to detennine whether two instances of a
`data item are in fact the same data item, and to perform
`various other systems’ functions and applications on data
`items without relying on any context information or prop-
`erties of the data item.
`
`It is also desirable to provide such a mechanism in such
`a way as to make it
`transparent
`to users of the data
`processing system, and it is desirable that a single mecha-
`nism be used to address each of the problems described
`above.
`
`SUMMARY OF THE INVENTION
`
`This invention provides, in a data processing system, a
`method and apparatus for identifying a data item in the
`system. where the identity of the data item depends on all of
`the data in the data item and only on the data in the data item.
`Thus the identity of a data item is independent of its name,
`origin, location, address, or other information not derivable
`directly from the data, and depends only on the data itself.
`This invention further provides an apparatus and a method
`for determining whether a particular data item is present in
`the system or at a location in the system, by examining only
`the data identities of a plurality of data items.
`Using the method or apparatus of the present invention,
`the efliciency and integrity of a data processing system can
`be improved. The present invention improves the design and
`operation of a data storage system, file system, relational
`database, object-oriented database, or the like that stores a
`plurality ofdata items, by making possible or improving the
`design and operation of at least some or all of the following
`features:
`
`the system stores at most one copy of any data item at a
`given location, even when multiple data names in the
`system refer to the same contents;
`the system avoids copying data from source to destination
`locations when the destination locations already have
`the data;
`the system provides transparent access to any data item by
`reference only to its identity and independent of its
`present location, whether it be local, remote, or olfline;
`the system caches data items from a server, so that only
`the most recently accessed data items need be retained;
`when the system is being used to cache data items,
`problems of maintaining cache consistency are
`avoided;
`the system maintains a desired level of redundancy ofdata
`items in a network of servers, to protect against failure
`by ensuring that multiple copies of the data items are
`present at dilIerent locations in the system;
`the system automatically archives data items as they are
`created or modified;
`
`ill
`
`15
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`the system provides the size, age, and location of groups
`of data items in order to decide whether they can be
`safely removed from a local file system;
`the system can efficiently record and preserve any col-
`lection of data items;
`the system can efficiently make a copy of any collection
`of data items, to support a version control mechanism
`for groups of the data items;
`the system can publish data items, allowing other, possi-
`bly anonymous, systems in a network to gain access to
`the data items and to rely on the availability ofthe data
`items;
`the system can maintain a local inventory of all the data
`items located on a given removable medium, such as a
`diskette or CD-ROM, the inventory is independent of
`other properties of the data items such as their name,
`location, and date of creation;
`the system allows closely related sets of data items, such
`as matching or corresponding directories on discon-
`nected computers,
`to be periodically resynchronized
`with one another;
`the system can verify that data retrieved from another
`location is the desired or requested data, using only the
`data identifier used to retrieve the data;
`the system can prove possession of specific data items by
`content without disclosing the content ofthe data items,
`for purposes of later legal verification and to provide
`anonymity;
`the system tracks possession of specific data items accord-
`ing to content by owner, independent ofthc name, date,
`or other properties of the data item, and tracks the uses
`of specific data items and files by content for account-
`ing purposes.
`Other objects, features, and characteristics of the present
`invention as well as the methods of operation and functions
`of the related elements of structure, and the combination of
`parts and economies of manufacture, will become more
`apparent upon consideration of the following description
`and the appended claims with reference to the accompany-
`ing drawings, all of which form a part of this specification.
`
`BRIEF DESCI{IP'l‘lON OF THE DRAWINGS
`
`FIGS. 1(a) and l(b) depict a typical data processing
`system in which a preferred embodiment of the present
`invention operates;
`FIG. 2 depicts a hierarchy of data items stored at any
`location in such a data processing system;
`FIGS. 3-9 depict data structures used to implement an
`embodiment of the present invention; and
`FIGS.
`l0(n)-28 are flow charts depicting operation of
`various aspects of the present invention.
`Dl:l'I‘AILED DESCRl.P’I'lON OF THE
`PRIESENTLY PREFERRED EXEMPLARY
`EMBODIMENTS
`
`An embodiment of the present invention is now described
`with reference to a typical data processing system 100,
`which, with reference to FIGS. 1(a) and l(b). includes one
`or more processors (or computers) 102 and various storage
`devices 104 connected in some way, for example by a bus
`106.
`
`Each processor 102 includes a CPU 108, a memory 110
`and one or more local storage devices 112. The CPU 108,
`memory 110, and local storage device 112 may be internally
`
`

`
`US 6,415,280 B1
`
`ill
`
`15
`
`5
`connected, for example by a bus 114. Each processor 102
`may also include other devices (not shown), such as a
`keyboard. a display, a printer. and the like.
`In a data processing system 100, wherein more than one
`processor 102 is used, that is, in a multiprocessor system, the
`processors may he in one of various relationships. For
`example,
`two processors 102 may be in a clientiserver,
`clienticlient, or a serverfserver relationship. These inter-
`processor relationships may be dynamic, changing depend-
`ing on particular situations and functions. Thus, a particular
`processor 102 may change its relationship to other proces-
`sors as needed, essentially setting up a peer-to-peer relation-
`ship with other processors. In a peer—to-peer relationship,
`sometimes a particular processor 102 acts as a client
`processor, whereas at other times the same processor acts as
`a server processor.
`In other words, there is no hierarchy
`imposed on or required of processors 102.
`In a multiprocessor system, the processors 10% may be
`homogeneous or heterogeneous. Further, in a multiprocessor
`data processing system 100, some or all ol the processors
`102 may be disconnected from the network of processors for
`periods of time. Such disconnection may be part of the
`normal operation of the system 100 or it may be because a
`particular processor 102 is in need of repair.
`Within a data processing system 100, the data may be *
`organized to form a hierarchy of data storage elements,
`wherein lower level data storage elements are combined to
`form higher level elements. This hierarchy can consist of, for
`exa mple, processors, file systems, regions, directories, data
`files, segments, and the like. For example, with reference to
`FIG. 2, the data items on a particular processor 102 may be
`organized or structured as a file system 116 which comprises
`regions 117, each of which comprises directories 118, each
`of wh ich can contain other directories 118 or files 120. Each
`
`30
`
`35
`
`file 120 being made up of one or more data segments 122.
`In a typical data processing system, some or all of these
`elements can be named by users given certain implementa-
`tion specific naming conventions, the name (or pathname) of
`an element being relative to a context. In the context of a
`data processing system 100, a pathname is fully specified by
`a processor name, a filesystem name, a sequence of zero or
`more directory names identifying nested directories, and a
`final file name. (Usually the lowest level elements, in this
`case segments 122, cannot be named by users.)
`In other words, a
`file system 116 is a collection of
`directories 118. Adirectory 118 is a collection of named liles
`l20—both data files 120 and other directory files 118. Afile
`120 is a named data item which is either a data file (which
`may be simple or compound) or a directory file 118. A
`simple file 120 consists of a single data segment 122. A
`compound file 120 consists of a sequence of data segments
`122. A data segment 12 is a fixed sequence of bytes. An
`important property of any data segment
`is its size,
`the
`number of bytes in the sequence.
`A single processor 102 may access one or more file
`systems 116, and a single storage device 104 may contain
`one or more lile systems 116, or portions ofa file system 116.
`For instance, a file system 116 may span several storage
`devices 104.
`
`In order to implement controls in a file system, file system
`116 may be divided into distinct regions, where each region
`is a unit of management and control. A region consists of a
`given directory 118 and is identified by the pathrtame (user
`defined) of the directory.
`In the following, the term “location", with respect to a
`data processing system 100. refers to any of a particular
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`processor 102 in the system, a memory of a particular
`processor, a storage device, a removable storage medium
`(such as a floppy disk or compact disk), or any other physical
`location in the system. The tenn "local” with respect to a
`particular processor 102 refers to the memory and storage
`devices of that particular processor.
`In the following, the terms “T

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket