`
`Symantec 1009
`IPR of U.S. Pat. No. 7,757,298
`
`
`
`5,978,791
`Page 2
`
`OTHER PUBLICATIONS
`
`William Perrizo, et al., Distributed Join Processing Perfor-
`mance Evaluation, 1994. Twenty-Seventh Hawaii Interna-
`tional Conference on System Sciences, vol. II, pp. 236-244.
`A concurrency Control Mechanism based on Extendible
`Hashing for Main Memory Database Systems, Vij ay Kumar,
`pp. 109-113, ACM, vol. 3, 1989.
`Birgit Pfitzmann, Sorting Out Signature Schemes, Nov.
`1993, 1st Conf. Computer & Comm. Security ’93 pp. 74-85.
`Bert dem Boer, et al., Collisions for the compression func-
`tion of MDS pp. 292-304, 1994.
`Sakti Pramanik, et al., Multi-Directory Hashing, 1993, Info.
`Sys., vol. 18, No. 1, pp. 63-74.
`Murlidhar Koushik, Dynamic Hashing With Distributed
`Overflow Space: A File Organization With Good Insertion
`Performance, 1993, Info. Sys., vol. 18, No. 5, pp. 299-317.
`Witold Litwin, et al., LH*-Linear Hashing for Distributed
`Files, HP Labs Tech. Report No. HPL-93-21 Jun. 1993 pp.
`1-22.
`
`Yuliang Zheng, et al., HAVAL — A One-Way Hashing
`Algorithm with Variable Length of Output
`(Extended
`Abstract), pp. 83-105, Advances in Cryptology, AUSCRIPT
`’92, 1992.
`
`Chris Charnes and Josef Pieprzky, Linear Nonequivalence
`versus Nonlinearity, Pieprzky, pp. 156-164, 1993.
`Zhiyu Tian, et al., A New Hashing Function: Statistical
`Behaviour and Algorithm, pp. 3-13, SIGIR Forum, 1993.
`G. L. Friedman, Digital Camera With Apparatus For
`Authentication of Images Produced From an Image File,
`NASA Case No. NPO-19108-1-CU, Serial No. 08/159,980,
`Nov. 24, 1993.
`H. Goodman, Feb. 9, 1994 Ada, Object-Oriented Tech-
`niques, and Concurrency in Teaching Data Sructures and
`File Management Report Documentation P. AD-A275
`385 — 94-04277.
`
`Advances in Cryptology-EUROCRYPT ’93, Workshop on
`the Theory and Application of Cryptographic Techniques
`Lofthus, Norway, May 23-27, 1993 Proceedings.
`Proceedings of the 1993 ACM SIGMOD International Con-
`ference on Management of Data, vol. 22, Issue 2, Jun. 1993.
`Advances in Cryptology-AUSCRYPT ’92 — Workshop on
`the Theory and Application of Cryptographic Techniques
`Gold Coast, Queensland, Australia Dec. 13-16, 1992 Pro-
`ceedings.
`Search Report dated Jun. 24, 1996.
`
`000002
`
`000002
`
`
`
`M
`
`0079,5
`
`197,
`
`tHCtaP.3U
`
`2.
`
`NSN2aCu:or.
`
`
`
`vmo_>mn_Mmomwmoomm...momwmoommmwéohm
`
`mw<mo»w
`
`MOSND
`
`W
`
`1,2:
`
`
`
`mmommmoommmommmoomamNSNE
`
`Now
`
`momwmoomm
`
`000003
`
`000003
`
`
`
`U
`
`19
`
`
`
`
`3..mommmoomm_P__.::::::::::::::::::::::::::::::::::::::::::::|1L.S-ll-|._.I
`1_m.mu:HNu_<08MNW«NV0 __m>mosm_s___wWWa_mo.
`
`7.,_8,_7_029.5H.6U
`
`mo_>ma_MSmo<mo5V2
`pm.75onea.."2N3H.m5_%25HMQ2”mEm
`
`
`
`000004
`
`
`
`U.S. Patent
`
`mN
`
`1
`
`ChS
`
`M
`
`197,87
`
`
`
`Fakzmswmw...hzmswmwEmsomm
`
`NE.N5N2
`
`Mm.__u_...m.__n_m.__u_Mourour8.
`
`2.,2;2.m:
`
`
`
`9>m2.omma...>mo5mm_o>moBmm_ow%0
`
`500
`
`u, E
`:2’ E
`
`U)
`
`N <
`
`2
`
`LL.
`
`000005
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 4 of 31
`
`5,978,791
`
`FIG. 3
`
`.38
`
`Pathname
`
`Time of last access
`
`Time of last modification
`
`
`
`
`
`
`
`
`EEEgEHEE:HEEEHEHNRREEEENg:EEEgEEEHRHE::HH:ii::‘IHIIH:HIi:::::
`
`
`
`
`
`:40
`
`‘
`
`True Name
`
`Com-ressed File ID
`
`DeQendent -rocessors
`
`
`
`
`
`
`
`Time of last access
`
`
`Groomino delete count
`
`I42
`
`Mirror du-lication count
`
`F|G.5
`
`
`
`
`
`000006
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`000006
`
`
`
`
`
`¢¢_
`
`U.S.Patent
`
`m.sumunzom
`m.o:
`
`Nov. 2, 1999
`
`Sheets 0f31
`
`5,978,791
`
`m¢_
`
`on.
`
`m.2...
`
`ow.
`
`
`
`u«awnwHwa>mmoufiom
`
`
`
`GOH#MUOHmounow
`
` ho_.._
`
`:owumnw.o
`
`ooooo7
`
`.EmummfiHB
`
`OEMCSUMW
`
`mamamama
`
`000007
`
`
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 6 of 31
`
`5,978,791
`
`FIG. IO(a)
`
`SIMPKE
`
`DATA ITEM
`
`COMPUTE MD FUNCTION ON
`
`DATA ITEM
`
`8214
`
`APPEND LENGTH MODULO 32 OF
`
`DATA ITEM
`
`TRUE NAME
`
`1
`
`000008
`
`000008
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 7 of 31
`
`5,978,791
`
`
`
`
`
`S220
`PARTITION DATA ITEM INTO
`SEGMENTS
`
`FIG. IO(b)
`
`S222
`
`;’"""sI21'8'"" \‘
`COMPUTE TRUE :
`; NAME OF SIMPLE 1
`"
`DATA ITEM
`I
`\ . _ _ _ _ _
`_ _ _ _ ._
`
`ASSIMILATE EACH SEGMENT
`
`(COMPUTING rrs TRUE NAME)
`
`SEGMENT TRUE NAMES
`
`3224
`
`CREATE INDIRECT BLOCK OF
`
`S226
`
`ASSIMILATE INDIRECT BLOCK
`
`(COMPUTING ITS TRUE NAME)
`
`ITEM
`
`S228
`
`REPLACE FINAL 32 BITS OF TRUE
`NAME WITH LENGHT MOD 32 OF DATA
`
`000009
`
`000009
`
`
`
`3U
`
`LI.
`
`m
`
`%
`
`197,8.o/,
`
`P8%
`mms<zM55
`mz_s_Ema.m.m__o_..._
`
`5,9m.Emmohm
`
`omww
`
`8%
`
`a.m.__u_mfido
`
`
`
`Mwo._m_u_mmE.oEm..MD.m.=u_mmo._.w.
`00S_m.=n_m><:_.op._.z:oomm:Em.m>E.zmmmoomm:>Ezmzmzm:.<mmo.
`
`1,m.__u_mnmpz_Bum.
`
`.,v..mm:ms_<zM55mmooo
`9m%~.>Em_ommo
`
`omwm
`
`0m
`
`000010
`
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 9 of 31
`
`5,978,791
`
`FIG. I2
`
`
`“E5225
`
`DEPENDENCY
`
`
`LOCKED?
`LIST
`
`
`
`
`FILE
`
`S242
`
`SEND MESSAGE TO
`
`
`
`
`
`CACHE SERVER TO
` S244
`UPDATE CACHE
`
`(IF DESIRED)
`
`COMPRESS
`
`S246
`
`
`
`MIRROR
`
`
`(IF DESIRED)
`
`
`
`00001 1
`
`000011
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 10 of 31
`
`5,978,791
`
`FIG. I3
`
`S250
`
`SEARCH FOR
`THE
`
`PATHNAME
`
`‘ .
`
`.
`
`‘ '
`
`FAIL
`
`LDE INCLUDES
`
`TRUE NAME?
`
`S258
`
`
`
`
`
`DIRECTORY?
`
`
`
`ASSIMILATE
`
`LDE IDENTIFIES
`
`
`
`FILE ID
`
`YS
`
`S256
`
`FREEZE
`
`DIRECTORY
`
`000012
`
`000012
`
`
`
`U.S. Patent
`
`Nov. 2, 1999
`
`Sheet 11 of 31
`
`5,978,791
`
`S260
`
`CONFIRM THAT
`
`TRUE NAME
`
`EXISTS LOCALLY
`
`
`
`
` S262
`SEARCH FOR
`
`FIG. I4
`
`
`
`PATHNAME IN
`
`LDE TABLE
`
`S264
`
`CONFIRM THAT
`
`DIRECTORY
`
`
`
`
`
`EXISTS
`
`S266
`
`
`
`S268
`
`NAMED FILE
`
`EXISTS?
`
`DELETE
`
`TRUE FILE
`
`
`
`CREATE
`
`000013
`
`ENTRY IN LDE
`
`
`
`
`
`& UPDATE
`
`S270
`
`000013
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 12 of 31
`
`5,978,791
`
`mmzommmm
`
`m>:._mo..
`
`Emm
`
`
`
`"Fm.azmm
`
`.:<>>wmw<mmms_
`mo".
`
`mwzommmm
`
`2%
`
`
`
`m.__u_m:E.mmwzm
`
`
`
`O._.z_omzmamm
`
`mu:
`
`mmzomwmm
`
`m>:.<wmz
`
`mm;
`
`Eommmoom.<29200..2
`
`Smm
`
`
`
`mam»E_mm>
`
`"5m.__"_
`
`ammamo
`
`8mm
`
`m.__"_02.".
`
`m_.o_.._
`
`000014
`
`000014
`
`
`
`
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 13 of 31
`
`5,978,791
`
`
`
`A30..9”.
`
`::<n_O
`
`«mum
`
`..zm_._o
`
`..&.om.m_m
`
`.wEommmoom._
`
`mm:
`
`8%
`
`>2...
`
`om._.om._m
`
`mmomwmoomm
`
`ommw
`
`._.zm_._o
`
`MMZOQMMQ
`
`i-li+|iI-
`
`m\<R.01
`
`m.5<oa<ommms:.0N2
`
`MWZOQMME
`
`hbNEE
`
`mo
`
`000015
`
`000015
`
`
`
`
`
`
`SU
`
`Cta
`
`MN
`
`m2.,
`
`M0
`
`19798
`
`PII|II
`
`tM55".0momaown8%
`
`A|..|.os_oEmmmuiams_<zmmopm
`
`~zo:<z:.mmo
`
`o_mowmmoomm
`
`.0/,>momwmoomm
`
`
`5,%..o_%a__h_vMm.%.._,_,__w_%%_n_5
`
`
`mz_s_mm:.momm:m.momnomom.__“_mam:m>mmwmm
`39$05%
`
`
`-oommmopmw<mmms_ozmw
`
`Mms_<zmay:
`
`
`
`amo".memomaowopw9zoF<oo._momaowV@_O_n_
`
`mo".«E.n_Dxoo._w9mommam
`22wms_<zms”:6
`
`000016
`
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 15 of 31
`
`5,978,791
`
` mm:
`
`MO".0.m.__n_
`
`
`
`~.>E.zmm=._._.
`
`O2
`
`
`
`mmmwmm>ommw
`
`mwmmmsoomo
`
`omwmmmmsoo
`
`S.m.__"_
`
`mm;
`
`.=<..._
`
`oz
`
`>Ezmm.__n_mam
`
`mzmkmo".E»z_
`
`~.ms_<z
`
`A32.2“.
`
`000017
`
`000017
`
`
`
`
`
`
`3U
`
`tnet
`
`MS
`
`1979007
`
`comm
`
`9mmopw
`
`2,3888
`
`a--P|IIIIIII
`
`
`
`1Emomnomms_oEm.__"_mMSE.m~_._<mm
`
`9,mm:5BE.0_..._
`
`8m
`
`
`
`m.Ss_mmm9momaomm5m._mwmmos.o2m:.<oo._wemomaom.m.__u_
`
`
`
`
`
`000018
`
`
`
`
`taD13U
`
`v.0N
`
`2,
`
`S
`
`7
`
`1979879:5
`
`ta.5o:m:o..<momA3w_0_n_
`
`3...“.
`
`max»noEcomm
`
`mm.__"_mam»mmE._mo
`
`
`
`1way...925.am:mbmmom$88%
`
`
`
`._<oo._m.__n_m.E:o..<mom
`
`
`wz:m_xmmmm:mo".9m_mm_"_Fzmo_mn._
`5...".m9~.m.__u_mamwmay:
`
`289
`
`000019
`
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 18 of 31
`
`5,978,791
`
`wmmm
`
`atm>o_2mmwn:m.Em><m
`
`>Ezm
`
`mm:
`
`
`
`
`
`.m._m<..mo._Z_9..u._.__..._mmobw.m.=.._
`
`pzaoo
`
`
`
`mm:pzmsmmomo
`
`ll-
`
`88
`
`zmzO._.w.E>n_Oo
`
`
`
`35..o_u_
`
`000020
`
`000020
`
`
`
`
`
`P&U
`
`WN
`
`w2.,
`
`mM
`
`1979007995
`
`A39OExoo._mNmmEm_.zmsm_moz_m$8
`
`somF<.=s__ww<z:oz<m.__"_
`
`m>moBmm_ozm>_u
`mum.EHt.2.>mo5mmE
`m:.<z_omom:mmmmmm:o<mmo”.m%»0
`E.<.__s__mm<
`
`000021
`
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 20 of 31
`
`5,978,791
`
`ovmm
`
`
`
`..<2O_._._oo<<._.<n_zmzn_z<m.__n_
`
`DmO0m.mOh>m._.zmoQ<m:.<z_nmom:m
`Dmm_mmo_2m._._mzpZ_>mo..omm_o
`
`wmmm:o<mmo".
`
`zoP<s_mou_z_
`
`
`
`35..o_.._
`
`Sam
`
`kzmsmmomo
`
`mwmmmumm»
`
`K004
`
`
`
`
`
`S_m_._._<._.<D>>mZ
`
`
`
`>mo5mm_azm>_c
`
`$8
`
`
`
`m:._._.m:.<.__s=mm<
`
`000022
`
`000022
`
`
`
`
`
`
`
`
` Mm_2<zE.<._M._.S"_mbmmom88
`
`197,
`
`O0
`
`.0/,m_z<zmax»
`
`5,oh:._.<n_xz_._
`
`Nmmm
`
`nCtaP3U
`
`mN
`
`2.,
`
`W
`
`teaomo:
`
`M55.mxsz.
`
`
`
`.28..m.__u_
`
`
`mm_Ezmmzoomwofim:05mo".mmosozEmmmw3%mmmm
`>E.zm2>mo.6mm_o>moBmm_o
`
`000023
`
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 22 of 31
`
`5,978,791
`
`
`
` S354
`WAIT FOR
`
`
`
`FREEZE LOCK
`
`
`
`TO TURN OFF
`
`
`
`S356
`
`FIND TFR
`
`ENTRY
`
`
`
` S358
`DECREMENT
`
`REFERENCE
`
`COUNT
`
`REFERENCE COUNT IS
`
`FlG.2|
`
`S362
`
`DELETE
`
`
`
`
`
`TRUE FILE
`
`
`
`
`ZERO & NO DEPENDENT
`
`SYSTEMS IN TFR?
`
`S364
`
`
`REMOVE FILE ID
`AND COMPRESSED
`
`FILE ID
`
`
`
`000024
`
`000024
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 23 of 31
`
`5,978,791
`
`S365
`
`GET
`
`OPERATION
`
`3365
`CREATE oR
`MODIFY?
`
`cow OR DELETE
`COMPOUND?
`
`
`
`
`YES
`
`YES
`
`S368
`
`ASSIMILATE
`
`F"-E
`
`3369
`
`NEW TRUE
`
`S378
`
`3370
`
`Mompy USE
`COUNT OF EACH
`COMPONENT
`
`RECORD TRUE
`NAME IN AUDIT
`
`FILE
`
`S379
`
`FOR EACH PARENT
`
`DIRECTORY OR FILE,
`
`UPDATE USE COUNT,
`LAST ACCESS AND
`
`MODIFY TIMES
`
`000025
`
`000025
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 24 of 31
`
`5,978,791
`
`
`
` S382
`VERIFY
`
`GROOMING
`
`LOCK OFF
`
`
`
`8384
`
`SET
`
`
`
`GROOMING
`
`LOCK
`
`S386
`
`SET GROOM
`
`COUNTS
`
`FIG. 23
`
`000026
`
`000026
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 25 of 31
`
`5,978,791
`
`
`
`
`FIND LDE
`
`S388
`
`RECORD
`
`FIG. 24
`
`
`
`S390
`
`FIND TFR
`
`RECORD
`
`S392
`
`
`
`
`
`DELETE COUNT
`
`
`S394
`ADJUST FILE
`
`
`
`SlZES
`
`000027
`
`INCREMENT
`
`GROOMING
`
`000027
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 26 of 31
`
`5,978,791
`
`FIG. 25
`
` S396
`
`
`DELETE
`
`FEE
`
` S398
`
`UNLOCK
`
`
`
`
`
`GROOMING
`
`LOCK
`
`000028
`
`000028
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 27 of 31
`
`5,978,791
`
`GEN.o_.._
`
`-...m
`
`:m=._omn_
`
`>._zo.o<mm
`
`>m0.5mm..
`
`ovm
`
`oz_mm
`
`.Em.Emmo
`
`000029
`
`000029
`
`
`
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 28 of 31
`
`5,978,791
`
`omvm
`
`._<oo._mxsz
`
`
`
`n:m._:zmammwzo_wmm>
`
`mu:E01".
`
`52n:X004
`
`9200..
`
`:5
`
`m:.<mmo
`
`zobaow
`
`>mO0
`
`§oN.o_n_
`
`m.__"_
`
`._.om:o<o
`
`28
`
`mzmm
`
`>._m._.m.Es_oo
`
`
`
`zmEEsm..
`
`vmvm
`
`zmBum
`
`Q.
`
`
`
`m.__u_zobqmom
`
`m._Emm<mm
`
`85
`
`mbmmo
`
`
`
`m.=n__._0._.<m_ow
`
`000030
`
`000030
`
`
`
`
`
`
`
`
`
`
`
`U.S.Patent
`
`Nmvm
`
`
`
` >Ezm._.mama...mz_sEmEo
`
`Nov. 2, 1999
`
`Sheet 29 of 31
`
`5,978,791
`
`
`
`QKN.o_n_M522
`
`
`
`
`
`max»20¢".m.=..._
`
`..._m_:oE
`
`zo_E._mo
`
`m.__“_
`
`
`
`mo".momoomm
`
`.Oomoommmm:0.
`
`Z_moomxoo._m.=u_
`
`>._zo-o<mm
`
`tEo5mm_o
`
`vmvm
`
`
`
`mam...>n=._.zmn__
`
`oooo31
`
`000031
`
`
`
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 30 of 31
`
`5,978,791
`
`.\.N¢w
`
`mfido
`
`
`
`>n_OOzobqmom
`
`m.__.._u_O
`
`mmvm
`
`
`
`O._.>Ezmoo<
`
`
`
`m.__u_._._Q_.._<
`
`omvm
`
`m:m._mo
`
`m.__n_mam»
`
`BIN.o_..._
`
`w.m.__"_mam»
`
`
`
`m.._.ZDO0mm:
`
`mzo
`
`55
`
`
`
`mm:moaomm
`
`M20E#2300
`
`
`
`ozozm<:m.__u_mm;
`
`$s_<zmam»
`
`000032
`
`000032
`
`
`
`
`
`
`U.S. Patent
`
`N0v.2, 1999
`
`Sheet 31 of 31
`
`5,978,791
`
`mm.o_..._
`
`Nmvm
`
`n=§oo._
`
`m_z<zmamp
`
`02mm;
`
`
`
`mm0...Smaomm
`
`Smnm<2Eou_
`
`mmvm
`
`m>:.<omz
`
`mwzommmm
`
`Evm
`
`E229.
`
`«Sm
`
`n_m<2Eou.
`
`pmmaamm
`
`ommmmmmzooMOQ.m..=u_
`wmo:._oz
`
`
`
`«D.m.__u_
`
`Slum
`
`m>Ewo._
`
`mmzoawmm
`
`000033
`
`000033
`
`
`
`
`
`
`
`
`5,978,791
`
`1
`DATA PROCESSING SYSTEM USING
`SUBSTANTIALLY UNIQUE IDENTIFIERS TO
`IDENTIFY DATA ITEMS, WHEREBY
`IDENTICAL DATA ITEMS HAVE THE SAME
`IDENTIFIERS
`
`This is a continuation of application Ser. No. 08/425,160,
`filed on Apr. 11, 1995, which was abandoned upon the filing
`hereof.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`This invention relates to data processing systems and,
`more particularly, to data processing systems wherein data
`items are identified by substantially unique identifiers which
`depend on all of the data in the data items and only on the
`data in the data items.
`
`2. Background of the Invention
`Data processing (DP) systems, computers, networks of
`computers, or the like, typically offer users and programs
`various ways to identify the data in the systems.
`Users typically identify data in the data processing system
`by giving the data some form of name. For example, a
`typical operating system (OS) on a computer provides a file
`system in which data items are named by alphanumeric
`identifiers. Programs typically identify data in the data
`processing system using a location or address. For example,
`a program may identify a record in a file or database by using
`a record number which serves to locate that record.
`
`In all but the most primitive operating systems, users and
`programs are able to create and use collections of named
`data items, these collections themselves being named by
`identifiers. These named collections can then, themselves,
`be made part of other named collections. For example, an
`OS may provide mechanisms to group files (data items) into
`directories (collections). These directories can then, them-
`selves be made part of other directories. A data item may
`thus be identified relative to these nested directories using a
`sequence of names, or a so-called pathname, which defines
`a path through the directories to a particular data item (file
`or directory).
`As another example, a database management system may
`group data records (data items) into tables and then group
`these tables into database files (collections). The complete
`address of any data record can then be specified using the
`database file name, the table name, and the record number of
`that data record.
`
`Other examples of identifying data items include: identi-
`fying files in a network file system, identifying objects in an
`object-oriented database,
`identifying images in an image
`database, and identifying articles in a text database.
`In general, the terms “data” and “data item” as used herein
`refer to sequences of bits. Thus a data item may be the
`contents of a file, a portion of a file, a page in memory, an
`object in an object-oriented program, a digital message, a
`digital scanned image, a part of a video or audio signal, or
`any other entity which can be represented by a sequence of
`bits. The term “data processing” herein refers to the pro-
`cessing of data items, and is sometimes dependent on the
`type of data item being processed. For example, a data
`processor for a digital image may differ from a data pro-
`cessor for an audio signal.
`In all of the prior data processing systems the names or
`identifiers provided to identify data items (the data items
`being files, directories, records in the database, objects in
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`object-oriented programming, locations in memory or on a
`physical device, or the like) are always defined relative to a
`specific context. For instance, the file identified by a par-
`ticular file name can only be determined when the directory
`containing the file (the context) is known. The file identified
`by a pathname can be determined only when the file system
`(context) is known. Similarly, the addresses in a process
`address space, the keys in a database table, or domain names
`on a global computer network such as the Internet are
`meaningful only because they are specified relative to a
`context.
`
`In prior art systems for identifying data items there is no
`direct relationship between the data names and the data item.
`The same data name in two different contexts may refer to
`different data items, and two different data names in the
`same context may refer to the same data item.
`In addition, because there is no correlation between a data
`name and the data it refers to, there is no a priori way to
`confirm that a given data item is in fact the one named by a
`data name. For instance, in a DP system, if one processor
`requests that another processor deliver a data item with a
`given data name,
`the requesting processor cannot,
`in
`general, verify that the data delivered is the correct data
`(given only the name). Therefore it may require further
`processing, typically on the part of the requestor, to verify
`that the data item it has obtained is,
`in fact,
`the item it
`requested.
`A common operation in a DP system is adding a new data
`item to the system. When a new data item is added to the
`system, a name can be assigned to it only by updating the
`context in which names are defined. Thus such systems
`require a centralized mechanism for the management of
`names. Such a mechanism is required even in a multi-
`processing system when data items are created and identified
`at separate processors in distinct locations, and in which
`there is no other need for communication when data items
`are added.
`
`In many data processing systems or environments, data
`items are transferred between different
`locations in the
`
`system. These locations may be processors in the data
`processing system, storage devices, memory, or the like. For
`example, one processor may obtain a data item from another
`processor or from an external storage device, such as a
`floppy disk, and may incorporate that data item into its
`system (using the name provided with that data item).
`However, when a processor (or some location) obtains a
`data item from another location in the DP system,
`it is
`possible that this obtained data item is already present in the
`system (either at the location of the processor or at some
`other location accessible by the processor) and therefore a
`duplicate of the data item is created. This situation is
`common in a network data processing environment where
`proprietary software products are installed from floppy disks
`onto several processors sharing a common file server. In
`these systems, it is often the case that the same product will
`be installed on several systems, so that several copies of
`each file will reside on the common file server.
`
`In some data processing systems in which several pro-
`cessors are connected in a network, one system is designated
`as a cache server to maintain master copies of data items,
`and other systems are designated as cache clients to copy
`local copies of the master data items into a local cache on an
`as-needed basis. Before using a cached item, a cache client
`must either reload the cached item, be informed of changes
`to the cached item, or confirm that the master item corre-
`sponding to the cached item has not changed. In other words,
`
`000034
`
`000034
`
`
`
`5,978,791
`
`3
`a cache client must synchronize its data items with those on
`the cache server. This synchronization may involve reload-
`ing data items onto the cache client. The need to keep the
`cache synchronized or reload it adds significant overhead to
`existing caching mechanisms.
`In view of the above and other problems with prior art
`systems, it is therefore desirable to have a mechanism which
`allows each processor in a multiprocessor system to deter-
`mine a common and substantially unique identifier for a data
`item, using only the data in the data item and not relying on
`any sort of context.
`It is further desirable to have a mechanism for reducing
`multiple copies of data items in a data processing system and
`to have a mechanism which enables the identification of
`
`identical data items so as to reduce multiple copies. It is
`further desirable to determine whether two instances of a
`
`data item are in fact the same data item, and to perform
`various other systems’ functions and applications on data
`items without relying on any context information or prop-
`erties of the data item.
`
`It is also desirable to provide such a mechanism in such
`a way as to make it
`transparent
`to users of the data
`processing system, and it is desirable that a single mecha-
`nism be used to address each of the problems described
`above.
`
`SUMMARY OF THE INVENTION
`
`This invention provides, in a data processing system, a
`method and apparatus for identifying a data item in the
`system, where the identity of the data item depends on all of
`the data in the data item and only on the data in the data item.
`Thus the identity of a data item is independent of its name,
`origin, location, address, or other information not derivable
`directly from the data, and depends only on the data itself.
`This invention further provides an apparatus and a method
`for determining whether a particular data item is present in
`the system or at a location in the system, by examining only
`the data identities of a plurality of data items.
`Using the method or apparatus of the present invention,
`the efficiency and integrity of a data processing system can
`be improved. The present invention improves the design and
`operation of a data storage system, file system, relational
`database, object-oriented database, or the like that stores a
`plurality of data items, by making possible or improving the
`design and operation of at least some or all of the following
`features:
`
`the system stores at most one copy of any data item at a
`given location, even when multiple data names in the system
`refer to the same contents;
`the system avoids copying data from source to destination
`locations when the destination locations already have the
`data;
`the system provides transparent access to any data item by
`reference only to its identity and independent of its present
`location, whether it be local, remote, or offline;
`the system caches data items from a server, so that only
`the most recently accessed data items need be retained;
`when the system is being used to cache data items,
`problems of maintaining cache consistency are avoided;
`the system maintains a desired level of redundancy of data
`items in a network of servers, to protect against failure by
`ensuring that multiple copies of the data items are present at
`different locations in the system;
`the system automatically archives data items as they are
`created or modified;
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`the system provides the size, age, and location of groups
`of data items in order to decide whether they can be safely
`removed from a local file system;
`the system can efficiently record and preserve any col-
`lection of data items;
`the system can efficiently make a copy of any collection
`of data items, to support a version control mechanism for
`groups of the data items;
`the system can publish data items, allowing other, possi-
`bly anonymous, systems in a network to gain access to the
`data items and to rely on the availability of the data items;
`the system can maintain a local inventory of all the data
`items located on a given removable medium, such as a
`diskette or CD-ROM, the inventory is independent of other
`properties of the data items such as their name, location, and
`date of creation;
`the system allows closely related sets of data items, such
`as matching or corresponding directories on disconnected
`computers,
`to be periodically resynchronized with one
`another;
`the system can verify that data retrieved from another
`location is the desired or requested data, using only the data
`identifier used to retrieve the data;
`the system can prove possession of specific data items by
`content without disclosing the content of the data items, for
`purposes of later legal verification and to provide anonym-
`ity;
`the system tracks possession of specific data items accord-
`ing to content by owner, independent of the name, date, or
`other properties of the data item, and tracks the uses of
`specific data items and files by content for accounting
`purposes.
`
`Other objects, features, and characteristics of the present
`invention as well as the methods of operation and functions
`of the related elements of structure, and the combination of
`parts and economies of manufacture, will become more
`apparent upon consideration of the following description
`and the appended claims with reference to the accompany-
`ing drawings, all of which form a part of this specification.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIGS. 1(a) and 1(b) depicts a typical data processing
`system in which a preferred embodiment of the present
`invention operates;
`FIG. 2 depicts a hierarchy of data items stored at any
`location in such a data processing system;
`FIGS. 3-9 depict data structures used to implement an
`embodiment of the present invention; and
`FIGS. 10(a)—28 are flow charts depicting operation of
`various aspects of the present invention.
`DETAILED DESCRIPTION OF THE
`PRESENTLY PREFERRED EXEMPLARY
`EMBODIMENTS
`
`An embodiment of the present invention is now described
`with reference to a typical data processing system 100,
`which, with reference to FIGS. 1(a) and 1(b), includes one
`or more processors (or computers) 102 and various storage
`devices 104 connected in some way, for example by a bus
`106.
`
`Each processor 102 includes a CPU 108, a memory 110
`and one or more local storage devices 112. The CPU 108,
`memory 110, and local storage device 112 may be internally
`connected, for example by a bus 114. Each processor 102
`
`000035
`
`000035
`
`
`
`5,978,791
`
`5
`
`may also include other devices (not shown), such as a
`keyboard, a display, a printer, and the like.
`In a data processing system 100, wherein more than one
`processor 102 is used, that is, in a multiprocessor system, the
`processors may be in one of various relationships. For
`example,
`two processors 102 may be in a client/server,
`client/client, or a server/server relationship. These inter-
`processor relationships may be dynamic, changing depend-
`ing on particular situations and functions. Thus, a particular
`processor 102 may change its relationship to other proces-
`sors as needed, essentially setting up a peer-to-peer relation-
`ship with other processors. In a peer-to-peer relationship,
`sometimes a particular processor 102 acts as a client
`processor, whereas at other times the same processor acts as
`a server processor. In other words, there is no hierarchy
`imposed on or required of processors 102.
`In a multiprocessor system, the processors 102 may be
`homogeneous or heterogeneous. Further, in a multiprocessor
`data processing system 100, some or all of the processors
`102 may be disconnected from the network of processors for
`periods of time. Such disconnection may be part of the
`normal operation of the system 100 or it may be because a
`particular processor 102 is in need of repair.
`Within a data processing system 100, the data may be
`organized to form a hierarchy of data storage elements,
`wherein lower level data storage elements are combined to
`form higher level elements. This hierarchy can consist of, for
`example, processors, file systems, regions, directories, data
`files, segments, and the like. For example, with reference to
`FIG. 2, the data items on a particular processor 102 may be
`organized or structured as a file system 116 which comprises
`regions 117, each of which comprises directories 118, each
`of which can contain other directories 118 or files 120. Each
`
`file 120 being made up of one or more data segments 122.
`In a typical data processing system, some or all of these
`elements can be named by users given certain implementa-
`tion specific naming conventions, the name (or pathname) of
`an element being relative to a context. In the context of a
`data processing system 100, a pathname is fully specified by
`a processor name, a filesystem name, a sequence of zero or
`more directory names identifying nested directories, and a
`final file name. (Usually the lowest level elements, in this
`case segments 122, cannot be named by users.)
`In other words, a file system 116 is a collection of
`directories 118. Adirectory 118 is a collection of named files
`120—both data files 120 and other directory files 118. Afile
`120 is a named data item which is either a data file (which
`may be simple or compound) or a directory file 118. A
`simple file 120 consists of a single data segment 122. A
`compound file 120 consists of a sequence of data segments
`122. A data segment 122 is a fixed sequence of bytes. An
`important property of any data segment
`is its size,
`the
`number of bytes in the sequence.
`A single processor 102 may access one or more file
`systems 116, and a single storage device 104 may contain
`one or more file systems 116, or portions of a file system 116.
`For instance, a file system 116 may span several storage
`devices 104.
`
`In order to implement controls in a file system, file system
`116 may be divided into distinct regions, where each region
`is a unit of management and control. A region consists of a
`given directory 118 and is identified by the pathname (user
`defined) of the directory.
`In the following, the term “location”, with respect to a
`data processing system 100, refers to any of a particular
`processor 102 in the system, a memory of a particular
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`processor, a storage device, a removable storage medium
`(such as a floppy disk or compact disk), or any other physical
`location in the system. The term “local” with respect to a
`particular processor 102 refers to the memory and storage
`devices of that particular processor.
`In the following, the terms “True Name”, “data identity”
`and “data identifier” refer to the substantially unique data
`identifier for a particular data item. The term “True File”
`refers to the actual file, segment, or data item identified by
`a True Name.
`
`A file system for a data processing system 100 is now
`described which is intended to work with an existing oper-
`ating system by augmenting some of the operating system’s
`file management system codes. The embodiment provided
`relies on the standard file management primitives for actu-
`ally storing to and retrieving data items from disk, but uses
`the mechanisms of the present invention to reference and
`access those data items.
`
`The processes and mechanisms (services) provided in this
`embodiment are grouped into the following categories:
`primitive mechanisms, operating system mechanisms,
`remote mechanisms, background mechanisms, and extended
`mechanisms.
`
`Primitive mechanisms provide fundamental capabilities
`used to support other mechanisms. The following primitive
`mechanisms are described:
`
`xooo\1.c:x.u/1..:>uzt\)
`
`1. Calculate True Name;
`. Assimilate Data Item;
`. New True File;
`Get True Name from Path;
`Link path to True Name;
`Realize True File from Location;
`. Locate Remote File;
`. Make True File Local;
`. Create Scratch File;
`10. Freeze Directory;
`11. Expand Frozen Directory;
`12. Delete True File;
`13. Process Audit File Entry;
`14. Begin Grooming;
`15. Select For Removal; and
`16. End Grooming.
`Operating system mechanisms provide typical familiar
`file system mechanisms, while maintaining the data struc-
`tures required to offer the mechanisms of the present inven-
`tion. Operating system mechanisms are designed to augment
`existing operating systems, and in this way to make the
`present invention compatible with, and generally transparent
`to, existing applications. The following operating system
`mechanisms are described:
`
`1. Open File;
`2. Close File;
`3. Read File;
`4. Write File;
`5. Delete File or Directory;
`6. Copy File or Directory;
`7. Move File or Directory;
`8. Get File Status; and
`9. Get Files in Directory.
`R mote mechanisms are used by the operating system in
`responding to requests from other processors. These mecha-
`nisms enable the capabilities of the present invention in a
`
`000036
`
`000036
`
`
`
`5,978,791
`
`7
`peer-to-peer network mode of operation. The following
`remote mechanisms are described:
`
`1. Locate True File;
`2. Reserve True File;
`3. Request True File;
`4. Retire True File;
`5. Cancel Reservation;
`6. Acquire True File;
`7. Lock Cache;
`8. Update Cache; and
`9. Check Expiration Date.
`Background mechanisms are intended to run occasionally
`and at a low priority. These provide automated management
`capabilities with respect to the present invention. The fol-
`lowing background mechanisms are described:
`1. Mirror True File;
`2. Groom Region;
`3. Check for Expired Links; and
`4. Verify Region; and
`5. Groom Source List.
`
`Extended mechanisms run within application programs
`over the operating system. These mechanisms provide solu-
`tions to specific problems and applications. The following
`extended mechanisms are described:
`
`ooo.\I9\sn.4>.w.~
`
`1. Inventory Existing Directory;
`Inventory Removable, Read-only Files;
`Synchronize directories;
`Publish Region;
`Retire Directory;
`Realize Directory at location;
`Verify True File;
`. Track for accounting purposes; and
`. Track for licensing purposes.
`The file system herein described maintains sufficient
`information to provide a variety of mechanisms not ordi-
`narily offered by an operating system, some of which are
`listed and described here. Various processing performed by
`this embodiment of the present
`invention will now be
`described in greater detail.
`In some embodiments, some files 120 in a data processing
`system 100 do not have True Names because they have been
`recently received or created or modified, and thus their True
`Names have not yet been computed. A file that does not yet
`have a True Name is called a scratch file. The process of
`assigning a True Name to a file is referred to as assimilation,
`and is described later. Note that a scratch file may have a
`user provided name.
`Some of the processing performed by the present inven-
`tion can take place in a background mode or on a delayed or
`as-needed basis. This background processing is used to
`determine information that is not immediately required by
`the system or which may never be required. As an example,
`in some cases a scratch file is being changed at a rate greater
`than the rate at which it is useful to determine its True Name.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`8
`100. However, they can also be shared by placing them on
`a remote, shared file server (for instance, in a local area
`network of mac