`
`ISSUECLASSIFICATION
`:19555U-S-
` I37 I L. I |*iiEi
`
`-I II :''''*n''' 4:
`PATENT
`
`NUMBER
`
`
`
`
`
`GROUP AFIT |:I_N|‘l']
`
`. —\ 1,.’
`(
`J
`
`I
`ft
`
`
`
`
`
`I
`I] ‘J 'j—j;'
`SE RIAL NUMBEfi
`
`
`
`
`***$*.5MfiLL ENTITY
`
`
`[3 yas
`.5‘ no
` Fumlgn priority cdah-nod
`A1'FORNEY‘S
`35U8G119nondfl:'unamet
`E!
`'
`you Ens
`(fir?
`nsrsInmaIs«'
`_:-_-e.
`‘E’.
`
`-'
`
`
`
`
`
` .
`1'
`
`.
`
`3,
`
`
`
`._
`
`L-‘fr
`
`’,,.LI
`
`__.
`
`{
`
`_,__‘.
`
`._
`
`,-_
`
`_I
`
`_':
`
`-I
`
`r
`
`.
`
`_I,-_-
`
`,._
`
`{I
`
`:.‘;A___‘"
`
`.
`
`-
`-
`_;I,n':'; »;
`
`'
`
`:-E
`
`-'.
`
`,
`/I”-. :
`
`
`
` PARTS OF APPLICATION
`
`FILED SEPARATELY
`
` Sheets Drwg.
`
`F‘gs.. Drwg.
`
`£2;
`
` 5, Sections 122, 181 and 358 Possession cutside the US.
`Patent& Trademark Office is resificrnc to zu.rLho.*izec empicye
`
`
`
`hpphNRQRVNq R TTV
`
`
`
`”i°AT—|'E|'\'l1". APPLIcATI6nM“
`=wmnsimmwamWaiuxv
`r
`U596 U379
`
`//II///IIW!/filmW/IIWIIW)//II
`
`flB9£UU?9
`
` _
`
`2s.
`
`25.
`
`_ 27.
`
`28.
`
`29.
`
`30.
`
`31.
`
`—
`
`.
`'
`32.
`_
`5-P
`2of335”
`___/—— OOG101 399
`hpphNRQRVNq S TTV
`
`
`
`e A r c. H. L";
`
`0.9.6“/1/:r%'z’§
`
`IS. Tiff‘. r’.‘-‘.r"f."?3‘
`
`.5.)-5 2’-L
`
`4?;
`
`of 33 .
`(RIGHT OUTSIDE) GOOG_1015_Page
`hpphNRQRVNq T TTV
`1
`
`
`
`
`
`SPEC. HAND
` FILE MAINT.
`
`
`
`INDEX OF CLAIMS
`
`99
`
`Has’
`
`
`
`[LEFT INSIDE}
`
`hpphNRQRVNq U TTV
`GOOG-1015-Page 4 of 33
`
` DFIAFTING
`
`_ POSITION CLASSIFIER
` EXAMINER
`TYPIST
`VEFIIFIEH
`COFIPS COFIFI.
`
`
`
`
`
`SERIAL NUMBER
`L 5.1::
`" _
`'73-_
`‘3.
`v‘:§-
`
`.
`
`.
`
`FILING ngws
`
`c_Laga
`
`I.-
`'4."
`
`in
`
`Fe.
`
`[cam low 33
`
`Fnreign priority oiaimed
`35 use 119 aondiflons mei
`
`'
`
`'
`
`_ Vsrfflad and Mimowladgad
`
`stars on
`oourmw 1:-mraa. owns
`
`E W
`
`I
`
`_.
`
`_
`
`_
`
`r 1-.-5 U, _t'/CG «Eu
`;.
`F3555 To ,::\zDer0T3:1=-y Cbm-n
`
`.-
`
`'.'FFlE_F_'AFlED=FO_F‘l‘.
`
`en‘:
`.' w§h__u_iNa§:': The jn+onna";1oi{a'racros'" ézi
`.
`-
`'
`"
`- by-tlw‘Uniia'd States Cb'c.'e'Tlfl
`
`gr
`3S,___
`
`hpphNRQRVNq V TTV
`Og_C_3__;1_Q1_5:Pa e 5 of 335
`
`
`
`
`
`€11
`
`_
`
`'
`n
`J-$5160
`
`'
`
`I
`
`'
`
`CONTENTS
`
`.4PF¥Rov.ED FEJR LICENSE D
`INITIQWLII
`51-"
`"
`--
`-
`—
`
`'3
`
`Date
`
`.
`
`""°§l"°“'
`Mailed
`
`\ 0"”/
`
`I/'
`
`
`
`.
`.
`1. zflpplfcarion
`
`H
`
`\
`
`papa,»-5_
`
`
`
`
`
`E; 2 r
`
`Tim:
`'
`
`I
`
`'
`
`+ i:%::><4‘ "5 #3
`'
`:'l
`
`-/I
`
`_
`
`
`
`28.
`
`29.
`
`30.
`
`00..L
`
`{FRONT}
`
`hpphNRQRVNq W TTV
`GOOG-1015-Page 6 of 335
`
`-.
`
`
`
`FILE MAINT.
`
`.r_xmI=s P—QHI=g_
`SPEC. HAND
`
`INDEX OF CLAIMS
`
`n k..
`
`nal
`
`C-
`
`v "J
`
` .<.._I._R.-Rjgj.
`
`I—._.
`
`2
`
`_.?2|-
`
`.
`
`.
`
`I?!‘ _
`75
`
`En
`
`.
`
`.
`
`.
`
`‘
`
`J80.
`
`I
`
`i.—. 3(D
`
`.1.
`
`_
`
`r
`
`'
`_
`. 4L
`— — —— —.__ 4
`
`.
`-
`_99+
`5-
`J
`'
`'
`I
`
`I_ 100
`* —-GooG—1ro35—P_aJge"7 6f 5 g
`hpphNRQRVNq X TTV
`
`ID
`0
`
`I -I
`
`=<.I:I_-Ix:Z
`
`I2.
`
`
`
`A G
`
`.._.-—r-—-w
`
`T
`
`
`
`|—
`
`SEARCHED
`
`Class
`
`Sub.
`
`Date
`
`Exmr.
`
`a
`
`I
`
`.
`
`6?:
`
`lf/1:1/‘fig
`
`4
`
`_
`
`5;
`“(H
`J
`I’
`
`6 I L
`
`Ir
`
`,.I
`
`F SEARCH NOTES
`U5?/r71/PU75
`Date
`
`/+‘T:fl~'I{:a'?1€.:5
`.552
`S(,=?‘«lr_r‘C-\
`
`ff
`
`INTERFERENCE SEARCHED
`SW
`
`
`
`hpphNRQRVNq Y TTV
`
`
`
`
`
`U.S. DEPARTMENT OF COMMERCE
`PATENT AND TRADEMARK OFFICE
`__..___,____;_
`FEE RECORD SHEET
`
`PTO—155fi
`(5/8?)
`
`hpphNRQRVNq Z TTV
`Gooe-1cn5—Pa9e 9 ¢3&5
`
`
`
`“
`
`.
`
`_
`_
`_
`'.--"-“'5 RPPLTCEATIONEER. -RUI-E'
`
`.*
`
`. -L:
`
`IIFTICE
`
`FATENI‘
`
`(bl Lid}. The Commissioner of Pat
`
`_"_
`Atty. Dist. 21.3287
`and Trademarks
`/
`1-9:
`‘L..'ashin,gton, n.c. 20231
`April 11. 1995
`Date‘
`Sir:
`] Design
`[at] Utility) entitled:
`.1. This is a Request for filing a. new Q§ (-[
`2. (glare Tit1e): mm IN A DATA PR0_G‘._E-Esnmza SYSTEM
`
`‘
`
`-
`
`._
`
`-
`
`client Ref.
`
`it :1 filing fee or D.et-.h/Decla.ration but for which is enclosed the following:
`3.
`[1] Abstract L pege(s).
`:1. 3; Pages of Specifirsation (only spec. and claims) 5.
`6. fl_ Numbered c1ai.In(s): and
`[x] formal of size:
`Ir‘.
`[x] Dra.w:Lngs:fisheet(s) per set:[x] 1 set informal; 8.
`9.
`I
`] Priority is claimed 35 USC 119/365 from foreign epplic.ation(sJ filed in
`
`[
`
`] Specification in non~Engli.sh Lzm,gu.age
`
`[
`
`] A4 [] 13" [
`
`]
`
`I4"
`
`(W-min)
`
`A licat
`
`No.
`
`Fili_ng Date
`'
`
`Filirlggte
`Agplicatiun. No.
`(3)
`(1)j _j__
`(4)
`(2)
`(No.) Certified copy/copies of priority a'pplication(s) attached.
`10.
`] SUBSTI‘I'LT‘I‘EAppl_1:1 (FJIPEP
`11. This iaa[]ReiseL:.eofUSP:_jj[]C-ONT[]DIV[]GIP [
`, wherein
`12.
`201.05) of
`No. 0
`Q’
`fi1ed_
`] previously filed.
`[
`1.3.
`extension to date:
`] concurrently filed [
`] not needed
`[
`[X]
`I.-’+.
`!,ttached:
`Information Disclosure Statement, Form P'I‘0—l1+49 Form and references.
`-
`,/*6’
`__--
`'
`
`153/
`
`a$ itfation is made by the following nmned imrento'r(s) (lbuhle check :l.1:Is1:ructions for ec|::uracy.):
`
`(1) Inventor
`
`in
`Family Rum
`its‘;
`(State/Foreign Gotmtgfl
`Residence__(c1cy) oil;
`Post Office Address220N Gar§_lo an Ojgi. ea 93n23
`(include Zip Code)
`
`A.
`
`Kiddie Initial
`
`U.S.A.
`FARBER
`Country of Citlzazuhip
`
`on
`
`(2) Inventor
`
`Ronald
`Family bfm--.
`First
`(state/Foggn cm~¢_r_,g
`ResidenceJggg) Nortlmrgzk
`F0512 Officfl mfie5S @_L 11- 50.0.52
`(include Zip Code)
`
`_D,_
`Middle I:n.I.t.Lal
`
`U,S.A.
`LACHMAN
`Country of Cllilflmhip
`
`Ii.
`
`(3) Imrentor
`
`Firm:
`Residence (Gity)_
`Post Office Address
`
`(include Zip Code)
`
`(4) Inventor
`
`'
`
`mm
`F-wit-‘1m¢_tflfl-'1!
`'. Post Office Address
`
`(include Zip Code.)
`
`Middle In.i|‘_Lnl
`
`lrm-.n:_1y Name
`('_3_f_ate/Fo1_'eign con;-$3
`
`{kgu-ntry .;.£ cgggzcnship
`
`Kiddie mam
`
`Emily Name
`eI;c.a_=Japru:1m_0<nmsa:L
`
`Country of Cltizezuhjp
`
`FOR ADDITIONAL , v
`15. NO'I'E:
`1200 New York Avenue
`5.11.
`cnemmw -5.
`..___.__._—_:___e
`Ninth Floor East Tower
`By .
`-
`-
`—
` -
`Te :
`(202) aa1—3ooo
`Atty/Sec:lEL::pgd
`Sig:
`
`'
`
`‘
`
`:
`
`
`
`-
`
`d attach _s‘h
`with same inform.ation..
`'
`Reg. No. 28 312
`
`'
`
`2
`
`,.
`
`-
`5
`
`(202) 322-0942.
`Fax:
`(202) 861-3527
`Tel.:
`) and 8 .
`
`__
`
`('l'I'.‘v-II
`
`IHI
`
`(1XJ—l
`
`GOOG-1015-Page 10 of 335 I
`hpphNRQRVNq RQ TTV
`
`
`
`
`
`UNDER UNITED STATES PATENT LAMS
`
`Invention: David A. Farber and Ronald D. Lachman
`
`Inventor(s):
`
`IDENTIFYING DATA IN A DATA PROCESSING SYSTEM
`
`Cushman Darby E Cushman, LL».
`1100 New York Avenue, N.W.r
`Ninth Floor, East Tower
`'
`Washington, D.C. 20005-3918
`Attorneys
`Telephone:
`(202) 861-3000
`
`This is a:
`
`[
`
`] Provisional Application
`
`[ X] Regular Utility Application
`
`[
`
`[
`
`[
`
`[
`
`[
`
`] Continuing Application
`
`] PCT National Phase Application
`
`] Design Application
`
`] Reissue Application
`
`] Plant Application
`
`SPECIFICATION
`
`I’_"JC-IEO
`
`1-I'll)
`
`hpphNRQRVNq RR TTV
`GOOG-1015-Page 11 of 335
`
`
`
`
`
`1
`
`3
`
`L
`
`F
`
`4 1”/“ gm,
`' J 68‘
`70‘|8!2'|398?
`
`§§QKGRQDND OF IEE INVENTION
`
`1.
`
`Field o: the invention
`
`This invention relates to data processing
`systems and, more particularly,
`to data processing
`systems wherein data items are identified by
`substantially unique identifiers which depend on all of
`the data in the data items and only on the data in the
`data items.
`
`2.
`
`Background of the Invention
`
`Data processing (DP) systems, computers,
`networks of computers, or the like, typically offer users
`and programs various ways to identify the data in the
`systems.
`
`Users typically identify data in the data
`
`processing system by giving the data some form of name.
`For example, a typical operating system (05) on a
`computer provides a file system in which data items are
`named by alphanumeric identifiers.
`Programs typically
`identify data in the data processing system using a
`location or address.
`For example, a program may identify
`a record in a file or database by using a record number
`which serves to locate that record.
`
`In all but the most primitive operating
`systems, users and programs are able to create and use
`
`these collections
`collections of named data items,
`themselves being named by identifiers. These named
`collections can then,
`themselves, be made part of other
`named collections.
`For example, an OS may provide
`mechanisms to group files [data items} into directories
`(collections). These directories can then,
`themselves be
`made part of other directories.
`A data item may thus be
`identified relative to these nested directories using a
`
`hpphNRQRVNq RS TTV
`GOOG-1015-Page 12 of 335
`
`
`
`sequence of names, or a so—called pathname, which defines
`a path through the directories to a particular data item
`(file or directory}.
`
`As another example, a database management
`system may group data records (data items)
`into tables
`and then group these tables into database files
`
`The complete address of any data record
`(collections).
`can then be specified using the database file name,
`the
`table name, and the record number of that data record.
`
`other examples of identifying data items
`identifying files in a network file system,
`include:
`identifying objects in an object—oriented database,
`identifying images in an image database, and identifying
`articles in a text database.
`I
`
`In general,
`
`the terms "data" and "data item" as
`
`used herein refer to sequences of bits.
`
`Thus a data item
`
`may be the contents of a file, a portion of a file, a
`page in memory, an object in an object~oriented program,
`a digital message, a digital scanned image, a part of a
`video or audio signal, or any other entity which can be
`The term "data
`represented by a sequence of bits.
`
`processing" herein refers to the processing of data
`items, and is sometimes dependent on the type of data
`item being processed.
`For example, a data processor for
`a digital image may differ from a data processor for an
`audio signal.
`
`In all of the prior data processing systems the
`names or identifiers provided to identify data items (the
`data items being files, directories, records in the
`
`database, objects in object—oriented programming,
`locations in memory or on a physical device, or the like}
`are always defined relative to a specific context.
`For
`instance,
`the file identified by a particular file name
`
`can only be determined when the directory containing the
`file (the context)
`is known.
`The file identified by a
`pathname can be determined only when the file system
`(context)
`is known. Similarly,
`the addresses in a
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`hpphNRQRVNq RT TTV
`GOOG-1015-Page 13 of 335
`
`
`
`process address space,
`
`the keys in a database table, or
`
`domain names on a global computer network such as the
`
`Internet are meaningful only because they are specified
`relative to a context.
`
`5
`
`In prior art systems for identifying data items
`there is no direct relationship between the data names
`and the data item.
`The same data name in two different
`
`contexts may refer to different data items, and two
`
`10
`
`different data names in the same context may refer to the
`same data item.
`
`In addition, because there is no correlation
`
`between a data name and the data it refers to,
`
`there is
`
`no a priori way to confirm that a given data item is in
`
`fact the one named by a data name.
`
`For instance,
`
`in a DP
`
`system, if one processor requests that another processor
`deliver a data item with a given data name,
`the
`
`requesting processor cannot,
`
`in general, verify that the
`
`data delivered is the correct data (given only the name).
`
`Therefore it may require further processing, typically on
`the part of the requestor,
`to verify that the data item
`
`it has obtained is,
`
`in fact,
`
`the item it requested.
`
`A common operation in a DP system is adding a
`new data item to the system. When a new data item is
`
`15
`
`20
`
`added to the system, a name can be assigned to it only by
`updating the context in which names are defined.
`Thus
`
`25
`
`such systems require a centralized mechanism for the
`
`Such a mechanism is required even
`management of names.
`in a multi~processing system when data items are created
`
`30
`
`35
`
`and identified at separate processors in distinct
`locations, and in which there is no other need for
`communication when data items are added.
`
`In many data processing systems or
`environments, data items are transferred between
`
`different locations in the system. These locations may
`be processors in the data processing system, storage
`
`For example, one processor
`devices, memory, or the like.
`may obtain a data item from another processor or from an
`
`hpphNRQRVNq RU TTV
`GOOG-1015-Page 14 of 335
`
`
`
`external storage device, such as a floppy disk, and may
`incorporate that data item into its system (using the
`name provided with that data item).
`However, when a processor (or some location]
`obtains a data item from another location in the DP
`system, it is possible that this obtained data item is
`already present in the system [either at the location of
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`In these
`processors sharing a common file server.
`systems, it is often the case that the same product will
`be installed on several systems,
`so that several copies
`of each file will reside on the common file server.
`In some data processing systems in which
`several processors are connected in a network, one system
`is designated as a cache server to maintain master copies
`of data items, and other systems are designated as cache
`clients to copy local copies of the master data items
`into a local cache on an as—needed basis. Before using a
`cached item, a cache client must either reload the cached
`item, be informed of changes to the cached item, or
`confirm that the master item corresponding to the cached
`item has not changed.
`In other words, a cache client
`must synchronize its data items with those on the cache
`server. This synchronization may involve reloading data
`items onto the cache client.
`The need to keep the cache
`synchronized or reload it adds significant overhead to
`existing caching mechanisms.
`
`In view of the above and other problems with
`prior art systems, it is therefore desirable to have a
`mechanism which allows each processor in a multiprocessor
`system to determine a common and substantially unique
`identifier for a data item, using only the data in the
`data item and not relying on any sort of context.
`
`hpphNRQRVNq RV TTV
`GOOG-1015-Page 15 of 335
`
`
`
`It is further desirable to have a mechanism for
`reducing multiple copies of data items in a data
`
`processing system and to have a mechanism which enables
`the identification of identical data items so as to
`
`It is further desirable to
`reduce multiple copies.
`determine whether two instances of a data item are in
`fact the same data item, and to perform various other
`
`systems’ functions and applications on data items without
`relying on any context information or properties of the
`data item.
`
`It is also desirable to provide such a
`mechanism in such a way as to make it transparent to
`users of the data processing system, and it is desirable
`that a single mechanism be used to address each of the
`problems described above.
`
`SUMAEY OF THE INVENTION
`
`in a data processing
`This invention provides,
`system, a method and apparatus for identifying a data
`item in the system, where the identity of the data item
`depends on all of the data in the data item and only on
`the data in the data item. Thus the identity of a data
`item is independent of its name, origin,
`location,
`address, or other information not derivable directly from
`the data, and depends only on the data itself.
`
`This invention further provides an apparatus
`and a method for determining whether a particular data
`item is present in the system or at a location in the
`system, by examining only the data identities of a
`plurality of data items.
`
`invention,
`
`Using the method or apparatus of the present
`the efficiency and integrity of a data
`
`processing system can be improved.
`The present invention
`improves the design and operation of a data storage
`system, file system, relational database, object—oriented
`
`database, or the like that stores a plurality of data
`items, by making possible or improving the design and
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`hpphNRQRVNq RW TTV
`GOOG-1015-Page 16 of 335
`
`
`
`operation of at least some or all of the following
`features:
`
`the system stores at most one copy of any data
`
`item at a given location, even when multiple data names
`in the system refer to the same contents;
`
`the system avoids copying data from source to
`destination locations when the destination locations
`
`already have the data;
`
`the system provides transparent access to any
`
`data item by reference only to its identity and
`
`independent of its present location, whether it be local,
`remote, or offline;
`
`the system caches data items from a server, so
`
`that only the most recently accessed data items need be
`retained;
`'
`
`when the system is being used to cache data
`
`items, problems of maintaining cache consistency are
`avoided;
`
`the system maintains a desired level of
`
`redundancy of data items in a network of servers,
`
`to
`
`protect against failure by ensuring that multiple copies
`
`of the data items are present at different locations in
`the system;
`
`the system automatically archives data items as
`
`they are created or modified;
`
`the system provides the size, age, and location
`
`of groups of data items in order to decide whether they
`
`can be safely removed from a local file system;
`the system can efficiently record and preserve
`
`any collection of data items;
`
`the system can efficiently make a copy of any
`
`collection of data items,
`
`to support a version control
`
`mechanism for groups of the data items;
`
`the system can publish data items, allowing
`
`other, possibly anonymous, systems in a network to gain
`access to the data items and to rely on the availability
`
`of the data items;
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`hpphNRQRVNq RX TTV
`GOOG-1015-Page 17 of 335
`
`
`
`the system can maintain a local inventory of
`
`all the data items located on a given removable medium,
`such as a diskette or CD—ROH,
`the inventory is
`independent of other properties of the data items such as
`
`5
`
`location, and date of creation;
`their name,
`the system allows closely related sets of data
`
`10
`
`15
`
`items, such as matching or corresponding directories on
`disconnected computers,
`to be periodically resynchronized
`with one another;
`
`the system can verify that data retrieved from
`
`another location is the desired or requested data, using
`only the data identifier used to retrieve the data;
`the system can prove possession of specific
`
`data items by content without disclosing the content of
`the data items, for purposes of later legal verification
`and to provide anonymity;
`
`the system tracks possession of specific data
`items according to content by owner,
`independent of the
`
`name, date, or other properties of the data item, and
`
`20
`
`tracks the uses of specific data items and files by
`content for accounting purposes.
`
`Other objects, features, and characteristics of
`
`the present invention as well as the methods of operation
`
`and functions of the related elements of structure, and
`
`25
`
`the combination of parts and economies of manufacture,
`will become more apparent upon consideration of the
`
`following description and the appended claims with
`
`reference to the accompanying drawings, all of which form
`a part of this specification.
`
`30
`'_(,Y')
`
`3:3 fiiE¥ION OF THE DRAWINGS
`BRIE
`gfiifl
`
`FIGUR
`
`e ic s a typical data processing
`
`.
`
`system in which a preferred embodiment of the present
`invention operates;
`
`FIGURE 2 depicts a hierarchy of data items
`
`35
`
`stored at any location in such a data processing system;
`
`hpphNRQRVNq RY TTV
`GOOG-1015-Page 18 of 335
`
`
`
`FIGURES 3-9 depict data structures used to
`implement an embodiment of the present invention; and
`
`FIGURES 10(a)—28 are flow charts depicting
`operation of various aspects of the present invention.
`
`HETAILED QESCRIPTION OF THE_PRESENTLY PREFERRED
`EXEMPLARX EMBODIMENTS
`
`An embodiment of the present invention is now
`described with reference to a typical data pfififfifisfig
`system 100, which, with reference to FIGURES ,
`incl des
`one or more processors (or computers) 102 and various
`
`storage devices 104 connected in some way, for example by
`a bus 106.
`
`Each processor 102 includes a CPU 108, a memory
`The CPU
`110 and one or more local storage devices 112.
`
`108, memory 110, and local storage device 112 may be
`internally connected, for example by a bus 114.
`Each
`
`processor 102 may also include other devices {not shown),
`such as a keyboard,
`a display, a printer, and the like.
`
`In a data processing system 100, wherein more
`
`than one processor 102 is used,
`
`that is,
`
`in a
`
`the processors may be in one of
`multiprocessor system,
`For example,
`two processors 102
`various relationships.
`may be in a clientfserver, client/client, or a
`
`server/server relationship. These inter~processor
`relationships may be dynamic, changing depending on
`particular situations and functions. Thus, a particular
`processor 102 may change its relationship to other
`
`processors as needed, essentially setting up a peer—to-
`
`In a peer—to-
`peer relationship with other processors.
`peer relationship, sometimes a particular processor 102
`acts as a client processor, whereas at other times the
`same processor acts as a server processor.
`In other
`
`there is no hierarchy imposed on or required of
`words,
`processors 102.
`
`In a multiprocessor system,
`
`the processors 102
`
`may be homogeneous or heterogeneous. Further,
`
`in a
`
`8
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`hpphNRQRVNq RZ TTV
`GOOG-1015-Page 19 of 335
`
`
`
`some or all of
`multiprocessor data processing system 100,
`the processors 102 may be disconnected from the network
`
`of processors for periods of time.
`
`Such disconnection
`
`may be part of the normal operation of the system 100 or
`it may be because a particular processor 102 is in need
`of repair.
`
`Within a data processing system 100,
`the data
`may be organized to form a hierarchy of data storage
`elements, wherein lower level data storage elements are
`combined to form higher level elements. This hierarchy
`can consist of, for example, processors, file systems,
`regions, directories, data files, segments, and the like.
`For example, with reference to FIGURE 2,
`the data items
`
`on a particular processor 102 may be organized or
`structured as a file system 116 which comprises regions
`11?, each of which comprises directories 118, each of
`which can contain other directories 118 or files 120.
`
`Each file 120 being made up of one or more data segments
`122.
`
`In a typical data processing system,
`
`some or
`
`all of these elements can be named by users given certain
`implementation specific naming conventions,
`the name {or
`pathnamej of an element being relative to a context.
`In
`
`the context of a data processing system 100, a pathname
`is fully specified by a processor name, a filesystem
`name, a sequence of zero or more directory names
`
`identifying nested directories, and a final file name.
`
`(Usually the lowest level elements,
`122, cannot be named by users.)
`
`in this case segments
`
`In other words,
`
`a file system 116 is a
`
`A directory 118 is a
`collection of directories 118.
`collection of named files 120 —— both data files 120 and
`other directory files 118.
`A file 120 is a named data
`
`item which is either a data file {which may be simple or
`compound) or a directory file 115.
`A simple file 120
`A compound file
`consists of a single data segment 122.
`A data
`
`120 consists of a sequence of data segments 122.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`hpphNRQRVNq SQ TTV
`GOOG-1015-Page 20 of 335
`
`
`
`segment 122 is a fixed sequence of bytes.
`property of any data segment is its size,
`bytes in the sequence.
`
`An important
`the number of
`
`A single processor 102 may access one or more
`file systems 116, and a single storage device 104 may
`contain one or more file systems 116, or portions of a
`file system 116.
`For instance, a file system 116 may
`span several storage devices 104.
`
`In order to implement controls in a file
`system, file system 116 may be divided into distinct
`regions, where each region is a-unit of management and
`control.
`A region consists of a given directory 118 and
`is identified by the pathname {user defined) of the
`directory.
`
`the term "location", with
`In the following,
`respect to a data processing system 100, refers to any of
`a particular processor 102 in the system, a memory of a
`particular processor, a storage device, a removable
`storage medium (such as a floppy disk or compact disk),
`or any other physical
`location in the system.
`The term
`"local" with respect to a particular processor 102 refers
`to the memory and storage devices of that particular
`processor.
`
`the terms "True Name", "data
`In the following,
`identity" and "data identifier" refer to the
`substantially unique data identifier for a particular
`data item.
`The term "True File" refers to the actual
`file, segment, or data item identified by a True Name.
`A file system for a data processing system 100
`is now described which is intended to work with an
`existing operating system by augmenting some of the
`operating system's file management system codes.
`The
`embodiment provided relies on the standard file
`
`management primitives for actually storing to and
`retrieving data items from disk, but uses the mechanisms
`of the present invention to reference and access those
`data items.
`
`10
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`hpphNRQRVNq SR TTV
`GOOG-1015-Page 21 of 335
`
`
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`The processes and mechanisms {services}
`provided in this embodiment are grouped into the
`following categories: primitive mechanisms, operating
`system mechanisms,
`remote mechanisms, background
`mechanisms, and extended mechanisms.
`
`Primitive mechanisms provide fundamental
`
`capabilities used to support other mechanisms.
`following primitive mechanisms are described:
`
`The
`
`1.
`
`2.
`3.
`
`4.
`
`5.
`
`6.
`7.
`8.
`
`Calculate True Name;
`
`Assimilate Data Item;
`New True File;
`
`Get True Name from Path;
`
`Link path to True Name;
`
`Realize True File from Location;
`Locate Remote File;
`Make True File Local;
`
`9.
`10.
`
`Create Scratch File;
`Freeze Directory;
`
`Expand Frozen Directory;
`11.
`12. Delete True File;
`
`13. Process Audit File Entry;
`14. Begin Grooming;
`
`15. Select For Removal; and
`
`16.
`
`End Grooming.
`
`Operating system mechanisms provide typical
`familiar file system mechanisms, while maintaining the
`data structures required to offer the mechanisms of the
`
`present invention. Operating system mechanisms are
`designed to augment existing operating systems, and in
`this way to make the present invention compatible with,
`and generally transparent to, existing applications.
`The
`following operating system mechanisms are described:
`1.
`Open File;
`
`2.
`
`3.
`
`Close File;
`
`Read File;
`
`4. Write File;
`
`5.
`
`Delete File or Directory;
`
`11
`
`hpphNRQRVNq SS TTV
`GOOG-1015-Page 22 of 335
`
`
`
`6.
`
`7.
`8.
`
`9.
`
`Copy File or Directory;
`
`Move File or Directory;
`Get File Status; and
`
`Get Files in Directory.
`
`5
`
`Remote mechanisms are used by the operating
`system in responding to requests from other processors.
`
`These mechanisms enable the capabilities of the present
`
`invention in a peer-to-peer network mode of operation.
`The following remote mechanisms are described:
`
`10
`
`15
`
`1.
`2.
`
`3.
`
`4.
`
`S.
`
`5.
`7.
`
`8.
`
`9.
`
`Locate True File;
`Reserve True File;
`
`Request True File;
`
`Retire True File;
`
`cancel Reservation;
`
`Acquire True File;
`Lock Cache;
`
`Update Cache; and
`
`Check Expiration Date.
`
`Background mechanisms are intended to run
`
`20
`
`occasionally and at a low priority. These provide
`automated management capabilities with respect to the
`
`present invention.
`are described:
`
`The following background mechanisms
`
`25
`
`30
`
`35
`
`1. Mirror True File;
`
`2.
`
`3.
`4.
`5.
`
`Groom Region;
`
`Check for Expired Links; and
`Verify Region; and
`Groom Source List.
`
`Extended mechanisms run within application
`programs over the operating system.
`These mechanisms
`
`provide solutions to specific problems and applications.
`The following extended mechanisms are described:
`
`1.
`
`2.
`
`3
`
`4.
`
`5
`
`Inventory Existing Directory;
`
`Inventory Removable, Read-only Files;
`
`Synchronize directories;
`
`Publish Region;
`
`Retire Directory;
`
`12
`
`hpphNRQRVNq ST TTV
`GOOG-1015-Page 23 of 335
`
`
`
`6.
`7.
`
`8.
`9.
`
`Realize Directory at location;
`Verify True File:
`
`Track for accounting purposes; and
`Track for licensing purposes.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`The file system herein described maintains
`
`sufficient information to provide a variety of mechanisms
`not ordinarily offered by an operating system,
`some of
`which are listed and described here. Various processing
`performed by this embodiment of the present invention
`will now be described in greater detail.
`
`In some embodiments,
`some files 120 in a data
`processing system 100 do not have True Names because they
`have been recently received or created or modified,;and
`A file
`thus their True Names have not yet been computed.
`that does not yet have a True Name is called a scratch
`file.
`The process of assigning a True Name to a file is
`
`referred to as assimilation, and is described later.
`
`Note that a scratch file may have a user provided name.
`
`some of the processing performed by the present
`invention can take place in a background mode or on a
`delayed or as—needed basis. This background processing
`is used to determine information that is not immediately
`required by the system or which may never be required.
`As an example,
`in some cases a scratch file is being
`changed at a rate greater than the rate at which it is
`
`In these cases,
`useful to determine its True Name.
`determining the True Name of the file can be postponed or
`performed in the background.
`
`Data Structures
`
`The following data structures, stored in memory
`110 of one of more processors 102 are used to implement
`the mechanisms described herein.
`The data structures can
`
`be local to each processor 102 of the system 100, or they
`can reside on only some of the processors 102.
`
`13
`
`hpphNRQRVNq SU TTV
`GooG—1o15-Page 24 of 335
`
`
`
`The data structures described are assumed to
`reside on individual peer processors 102 in the data
`processing system 100. However,
`they can also be shared
`by placing them on a remote, shared file server (for
`instance,
`in a local area network of machines}.
`In order
`to accommodate sharing data structures, it is necessary
`that the processors accessing the shared database use the
`appropriate locking techniques to ensure that changes to
`the shared database do not interfere with one another but
`are appropriately serialized. These locking techniques
`are well understood by ordinarily skilled programmers of
`distributed applications.
`
`It is sometimes desirable to allow some regions
`to be local to a particular processor 102 and other;
`regions to be shared among processors 102.
`(Recall that
`a region is a unit of file system management
`and control
`consisting of a given directory identified by the
`pathname of the directory.)
`In the case of local and
`shared regions,
`there would be both local and shared
`versions of each data structure.
`Simple changes to the
`processes described below must be made to ensure that
`appropriate data structures are selected for a given
`operation.
`
`table 124
`The local directory extensions (LDEJ
`is a data structure which provides information about
`files 120 and directories 118 in the data processing
`system 100.
`The local directory extensions table 124 is
`indexed by a pathname or contextual name {that is, a user
`provided name) of a file and includes the True Name for
`most files.
`The information in local directory extension
`table 124 is in addition to that provided by the native
`file system of the operating system.
`
`The True File registry (TFR) 126 is a data
`store for listing actual data items which have True
`Names, both files 120 and segments 122. When such data
`items occur in the True File registry 126 they are known
`as True Files.
`True Files are identified in True File
`
`14
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`hpphNRQRVNq SV TTV
`GOOG-1015-Page 25 of 335
`
`
`
`The
`registry 126 by their True Names or identities.
`table True File registry 126 also stores location,
`dependency, and migration information about True Files.
`The region table (RT) 123 defines areas in the
`
`10
`
`15
`
`20
`
`The source table (ST) 130 is a list of the
`sources of True Files other than the current True File
`registry 125.
`The source table 130 includes removable
`volumes and remote processors.
`
`The audit file {AF} 132 is a list of records
`indicating changes to be made in local or remote files,
`these changes to be processed in background.
`The accounting log (AL) 134 is a log of file
`transactions used to create accounting information in a
`manner which preserves the identity of files being
`tracked independent of their name or location.
`The license table (LT) 136 is a table
`identifying files, which may only be used by licensed
`users,
`in a manner independent of their name or location,
`and the users licensed to use them.
`
`25
`
`Detailed Descriptions of the Data Structures
`The following table summarizes the fields of an
`local directory extensions table entry, as illustrated by
`record 138 in FIGURE 3.
`
`
`
`Description
`
`
`
`identifies the region in which this file is
`contained.
`
`
`
`lEEEHifi!HH‘N
`
`30
`
`Pathname
`
`
`
`
`
`
`
`True Name
`
`the user provided name or contextual name
`of the file or directory, relative to the
`region in which it occurs.
`
`
`
`the computed True Name or identity of the
`file or directory. This True Name is not
`always up to date, and it is set to a
`special value when a file is modified and
`is later recom-uted in the backoround.
`
`
`l5
`
`hpphNRQRVNq SW TTV
`GOOG-1015-Page 26 of 335
`
`
`
`
`
`the physical location of the file in the
`file system, when no True