`New US Application
`|nventor(s): David A. FARBER and Ronald D. LACHMAN
`Title: ACCESSING DATA IN A DATA PROCESSING SYSTEM
`Date: March 15, 2007
`
`)
`
`Docket No.: 2618-0015
`
`.
`'
`
`U-S- PTO
`11/724232
`
`03/15/2007
`
`THE FOLLOWING FILING FEE IS BASED ON CLAIMS AS FILED LESS ANY ABOVE CANCELLED
`
`.
`
`Large ISmall
`
`
`
`
`$ 300
`
`$ 500
`
`$ 200
`
`$350
`
`
`
`
`
`
`
`18a. Basic Filing Fee
`-
`
`18b. Search Fee
`
`
`18c. Examination Fee
`
`
`—__ Extra cm —
`——— 7
`
`
`
`resent. add:
`le de endent claim i noreim ro er is
`21. Ifan
`ro ermulti
`
`P
`P
`P
`_ILP_P_
`(9
`P
`P
`)
`$360/$180
`$360
`
`Leave this line blank if this is a reissue application)
`
`
`
`
`
`Application Size Fee (lfthe specification and drawings exceed ‘100 sheets of paper (excluding electronically filed
`sequence or computer listings under 37 CFR l.52(e)), the application size fee due is $250 ($125 for small entity) for each
`
`
`additional 50 sheets or fraction thereof. See 35 U.S.C. 4l(a)(l)(G) and 37 CFR 1.16(s).)
`
`Number of each additional
`
`
`50 or fraction thereof
`
`
`
`Extra
`
`(round up to a whole
`x $250l$150
`Sheets
`
`number)
`
`
`
`
`23. Total Filing Fee Enclosed:
`
`
`24.
`If “non-English" box 2 is X'd, add Rule 17(k) processing fee
`
`25.
`If “assignment” box 9 is X’d, add recording fees ($40 per assignment)
`
`26. [I Attached is a Petition/Fee under Rule No.
`27. Total Fee:
`
`Total
`Sheets
`
`Minus 100
`sheets
`
`
`
`
`
`
`
`28. XI Please charge the total fee to our deposit account below under the stated order no.: 2618-0015
`Our Deposit Account No.: 501860.
`CHARGE STATEMENT: The Commissioner is hereby authorized to charge any fee specifically authorized hereafter, or any
`missing or insufficient fee(s) filed, or asserted to be filed. or which should have been filed herewith or concerning any paper filed
`hereafter, and which may be required under Rules 16-18 (missing or insufficient fee only) now or hereafter relative to this
`application and the resulting Official document under Rule 20, or credit any overpayment. to our Account/Order Nos. shown above
`for which purpose a duplicate copy of this sheet is attached.
`
`This Charge Statement does not authorize charge of the issue fee until/unless an issue fee transmittal form is filed.
`
`29. Correspondence Address: Use the address associated with customer number 42624.
`
`CUSTOMER NUMBER
`
`42624
`
`Davidson Berquist Jackson & Gowdey, LLP
`703.894.6400
`
`703.894.6430 (Facsimile)
`
`
`
`GOOG-1024-Page 1 of1’l4
`
`GOOG-1024-Page 1 of 114
`
`
`
`
`
`IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
`
`REQUEST FOR FILING NATIONAL PATENT APPLICATION
`
`Under 35 USC 1111a) and Rule 53(b)
`
`Hon. Commissioner of Patents
`PO. Box 1450
`Alexandria, VA 22313-1450
`
`.
`
`1
`’
`
`-
`
`‘
`
`Atty. Dkt. No.: 2618-0015
`
`Date: March 15, 2007
`
`NON-PROVISIONAL - NON REISSUE - NON PCT NAT PHASE
`
`‘ Sir:
`
`Herewith is the PATENT APPLICATION of:
`
`Title:
`
`ACCESSING DATA IN ADATA PROCESSING SYSTEM
`
`Including:
`
`1. Specification: E total pages (only spec. and claims)
`
`2.
`
`[:1 Specification in non-English language
`
`3. E] Application Data Sheet (3 Pages)
`
`4. E Return Receipt Postcard
`
`5.
`
`IX Oath or Declaration 1 total pages.
`
`5.a. I] Newly executed (El Original
`
`[:1 Facsimile/Copy)
`
`, or 5.b X Copy from prior application.
`
`’ 6.
`
`Abstract 1 page(s); 7. 21 claims.
`
`7.
`
`8.
`
`9.
`
`IX} Drawings: fl total sheet(s) of drawings
`
`I] Attached are assignment papers and cover sheet. Please return the recorded assignment to the undersigned.
`
`[Z Prior application is assigned to Level 3 CommunicationsI LLC by Assignment recorded on February 2, 2007;
`
`Reel 018847/Frame0077and to KINETECHI Inc. by Assignment recorded on November 15 2001; Reel
`012313/Frame 0446.
`
`10. DOMESTIC/INTERNATIONAL priority is claimed under 35 USC 119(e)/120/365(c) based on the following provisional,
`non-provisional and/or PCT international application(s):
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`11. Small Entity Status: E is NOT claimed [:1 is claimed.
`
`12. 1:] NONPUBLICATION REQUEST under Rule 213(a) attached.
`
`13. El Preliminary Amendment.
`
`14. E] This application is being filed under Rule 53(b)(2) since an inventor is named in the enclosed Declaration who
`was not named in the prior application.
`
`15.
`
`Attached: information Disclosure Statement; Form PTO-1449.
`
`
`
`GOOG-1024-Page 2 of 1+4
`
`GOOG-1024-Page 2 of 114
`
`
`
`New US. Application
`lnventor(s): David A. FARBER and Ronald D. LACHMAN
`Title: ACCESSING DATA IN A DATA PROCESSING SYSTEM
`Date: March 15, 2007
`
`Docket No.: 2618-0015
`
`16. El Power of Attorney
`
`If a CONTINUING APPLICATION, check appropriate box and supply the requisite information below and in the first sentence of the
`17.
`specification following the title. and in the Application Data Sheet under 37 CFR 1.76. This application is a
`
`of prior application no.: : 11/017,650 filed December 22,2004,
`C] Continuation-in-part (CIP)
`[XI Continuation El Divisional
`which is a continuation of and claims priority to application no. 09/987,723, filed November 15, 2001, now US. Patent No. 6,928,442,
`which is a continuation of application No. 09/283,160, filed April 1, 1999, now US. Patent No. 6,415,280, which is a division of
`application Ser. No. 08/960,079, filed Oct. 24, 1997, now US. Pat. No. 5,978,791 filed Oct. 24, 2001 which is a continuation of Ser. No.
`08l425,160, filed Apr. 11, 1995, now abandoned, IX the entire contents of which each of these applications are incorporated herein by
`reference.
`
`This application is also a .
`
`of prior application no.: : 10/742,972, filed December 23. 2003,
`E] Continuation-in-part (CIP)
`IE Continuation E] Divisional
`which is a division of and claims priority to application no. 09/987,723, filed November 15, 2001, now US. Patent No. 6,928,442, which
`is a continuation of application No. 09/283,160, filed April 1, 1999, now US. Patent No. 6,415,280, which is a division of application Ser.
`No. 08/960,079, filed Oct. 24, 1997, now US Pat. No. 5,978,791 filed Oct. 24, 2001 which is a continuation of Ser. No. 08l425,160,
`filed Apr. 11, 1995, now abandoned, [Z the entire contents of which each of these applications are incorporated herein by reference.
`
`
`
`GOOG-TBZ4-Page 3 of 114
`
`GOOG-1024-Page 3 of 114
`
`
`
`ACCESSING DATA IN A DATA PROCESSING SYSTEM
`
`RELATED APPLICATIONS
`
`[0001]
`
`_ This is a continuation of and claims priority to co-pending
`
`application no. 11/017,650, filed December 22, 2004, which is a continuation of
`
`application No. 09/987,723, filed November 15, 2001, now US. Patent No.
`
`6,928,442, which is a continuation of application No. 09/283,160, filed April 1,
`
`1999, now US. Patent No. 6,415,280, which is a division of application Ser. No.
`
`08/960,079, filed Oct. 24, 1997, now US. Pat. No. 5,978,791, which is a
`
`continuation of Ser. No. 08/425,160, filed Apr. 11, 1995, now abandoned, the
`contents of which each of these applications are hereby incorporated herein by
`
`reference. This is also a continuation of and claims priority to co-pending
`
`application no. 10/742,972, filed December 23, 2003, which is a division of
`
`application No. 09/987,723, filed November 15, 2001, now US. Patent No.
`
`6,928,442, which is a continuation of application No. 09/283,160, filed April 1,
`
`1999, now US. Patent No. 6,415,280, which is a division of application Ser. No.
`
`08/960,079, filed Oct. 24, 1997, now US. Pat. No. 5,978,791, which is a
`
`continuation of Ser. No. 08/425,160, filed Apr. 11, 1995, now abandoned, the
`
`contents of which each of these applications are hereby incorporated herein by
`
`reference.
`
`BACKGROUND OF THE INVENTION
`
`1. FIELD OF THE INVENTION
`
`[0002]
`
`This invention relates to data processing systems and, more
`
`particularly, to data processing systems whereindata items are identified by
`
`substantially unique identifiers which depend on all of the data in the data items
`
`and only on the data in the data items.
`
`2618-0015
`
`GOOG-1024-Page 4 of 114
`
`GOOG-1024-Page 4 of 114
`
`
`
`2. BACKGROUND OF THE. INVENTION
`
`[0003]
`
`Data processing (DP) systems, computers, networks ofcomputers, or
`
`the like, typically offer users and programs various ways to identify the data in the
`
`systems.
`
`[0004]
`
`Users typically identify data in the data processing system by giving
`
`the data some form of name. For example, a typical operating system (OS) on a-
`computer provides a file system in which data items are named by alphanumeric
`
`identifiers. Programs typically identify data in the data processing system using a
`
`location or address. For example, a program may identify a record in a file or
`
`database by using a record number which serves to locate that record.
`
`In all but the most primitive operating systems, users and programs
`[0005]
`are able to create and use collections of named data items, these collections
`‘
`
`themselves being named by identifiers. These named collections can then,
`
`themselves, be made part of other named collections. For example, an OS may
`provide mechanisms to group files (data items) into directories (collections).
`These directories can then, themselves be made part of other directories. A data
`item may thus be identified relative to these nested directories using a sequence of
`
`names, or a so-called pathname, which defines a path through the directories to a
`
`‘
`particular data item (file or directory).
`[0006] i
`As another example, a database management system may group data
`
`records (data items) into tables and then group these tables into database files
`
`(collections). The complete address of any data record can then be specified using
`the database file name, the table name,'an'd the record number of that data record.
`
`Other examples of identifying data itemsinclude: identifying files in
`[0007]
`a network file system, identifying objects in an object-oriented database,
`
`identifying images in an image database, and identifying articles in a text database.
`[0008]
`i
`In general, the terrns "data" and Y'data item" as used herein refer to
`sequences of bits. Thus a data item may be the contents ofa file, a portion of a
`file, a page in memory, an object in an object-oriented program, a digital message,
`
`2618-0015
`
`_ 2 _
`
`
`
`GOOG-TBZA-Page 5 of 1+4
`
`GOOG-1024-Page 5 of 114
`
`
`
`a digital scanned image, a part of a video or audio signal, or any other entity which
`
`can be represented by a sequence of bits. The term "data processing" herein refers
`to the processing of data items, and is sometimes dependent on the type of data
`item being processed. For example, a data processor for a'digital image may differ
`
`from a data processor for an audio signal.
`
`[0009]
`In all of the prior data processing systems the names or identifiers
`provided to identify data items (the data items being files, directories, records in
`
`the database, objects in object-oriented programming, locations in memory or on a
`
`physical device, or the like) are always defined relative to a specific context. For
`
`instance, the file identified by a particular file name can only be determined when
`
`the directory containing the file (the context) is known. The file identified by a
`pathname can be determined only when the file system (context) is known.
`Similarly, the addresses in a process address space, the keys in a database table, or
`domain names on a global computer network such as the Internet are meaningful
`
`only because they are specified relative to a context.
`
`[0010]
`
`In prior art systems for identifying data items there is no direct
`
`relationship between the data names and the data item. The same data name in two
`
`different contexts may refer to different data items, and two different data names
`
`in the same context may refer to-the same data item.
`
`[0011]
`
`In addition, because there is no correlation between a data name and
`
`the data it refers to, there is no a priori way to confirm that a given data item is in
`
`fact the one named by a data name. For instance, in a DP system, if one processor
`
`requests that another processor deliver a data item with a given data name, the
`
`requesting processor cannot, in general, verify that the data delivered is the correct
`data. (given'only the name). Therefore it may require further processing, typically
`on the part ofthe requestor, to verify that the data item it has obtained is, in fact,
`the item it requested.
`I
`l
`[0012]
`A common operation in a DP system is adding a new data item to
`the system. When a new data item is added to the system, a name can be assigned
`
`2618-0015
`
`_ 3 _
`
`GOOG-1024-Page 6 of 114
`
`GOOG-1024-Page 6 of 114
`
`
`
`to it only by updating the context in which names are defined. Thus such systems
`
`require a centralized mechanism for the management of names. Such a mechanism
`is required even in a multi-processing system when data items are created and
`
`identified at separate processors in distinct locations, and in which there is no
`
`other need for communication when data items are added.
`
`[0013]
`
`In many data processing systems or environments, data items are
`
`transferred between different locations in the system. These locations may be
`processors in the data processing system, storage devices, memory, or the like. For
`
`example, one processor may obtain a data item from another processor or from an
`external storage device, such as a floppy disk, and may incorporate that data item
`into its system (using the name provided with that data item).
`[0014]
`However, when a processor (or some location) obtains a data item
`
`from another. location in the DP system, it is possible that this obtained data item is
`
`already present in the system (either at the location of the processor or at some
`
`other location accessible by the processor) and therefore a duplicate of the data
`
`item is created. This situation is common in a network data processing
`environment where proprietary software products are installed from floppy disks
`
`onto several processors sharing a common file server. In these systems, it is often
`the case that the same product will be installed on several systems, so that several
`
`copies of each file will reside on the common file Server.
`
`[0015]
`
`In some data processing systems in which several processors are
`
`connected in a network, one system is designated as a cache server to maintain
`master copies of data items, and other systems are designated as cache clients to
`copy local copies ofthe master data items into a local cache on an as-n-eeded basis.
`Before using a cached item, a cache client must either reload the cached item, be
`informed of changes to the cached item, or confirm that the master item 0
`corresponding to the cached item has not changed. In other words, a cache client
`
`must synchronize its data items with those on the cache server. This
`synchronization may involve reloading data items onto the cache client. The need
`
`2618-0015
`
`_ 4 _
`
`
`
`GOOG-TBZA-Page 7 of 1+4
`
`GOOG-1024-Page 7 of 114
`
`
`
`to keep the cache synchronized or reload it adds significant overhead to existing
`caching mechanisms.
`I
`
`[0016]
`In View of the above and Other problems with prior art systems, it is
`therefore desirable to have a mechanism'which allows each processor in a
`multiprocessor system to determine a common and substantially unique identifier
`for-a data item, using only the data in the data item and not relying on any sort of
`
`context.
`
`[0017]
`
`It is further desirable to have a mechanism for reducing multiple
`
`copies of data items in a data processing system and to have a mechanism which
`enables the identification of identical data items so as to reduce multiple copies. It
`
`is further desirable to determine whether two instances of a data item are in fact
`
`the same data item, and to perform various other systems' functions and
`applications on data items without relying on any context information or
`
`properties of the data item.
`
`V
`
`[0018]
`
`It is also desirable to provide such a mechanism in such a way as to
`
`make it transparent to users of the data processing system, and it is desirable that a
`
`single mechanism be used to address'each of the problems described above.
`
`SUMMARY OF THE INVENTION
`
`[0019]
`
`This invention provides, in a data processing system, a method and
`
`apparatus for identifying a data item in the system, where the identity of the data
`item depends on all of the data in the data item and only on the data in the data
`item. Thus the identity of a data item is. independent of its name, origin, location,
`address, or other information not derivable directly from the data, and depends
`
`only on the data itself.
`
`-
`
`[0020]
`
`This invention further provides an' apparatus and a method for
`
`p determining whether a particular data item is present in the system or at a location
`
`in the system, by examining only the data identities of a plurality of data items.
`
`2618-0015
`
`
`
`GOOG-tBZA-Page s of 1+4
`
`GOOG-1024-Page 8 of 114
`
`
`
`[0021]
`
`Using the method or apparatus of the present invention,'the
`
`efficiency and integrity of a data processing system can be improved. The present
`
`invention improves the design and operation of a data storage system, file system,
`
`relational database, object-oriented database, or the like that stores a plurality of
`
`data items, by making possible or improving the design and operation of at least
`
`some or all of the following features:
`[0022]
`the system. stores at mest one Tcopy of any data item at a'given
`location, even when multiple data names in the system refer to the same contents;
`
`[0023]
`the system avoids copying data from source to destination locations
`when the destination locations already have the data;
`[0024]
`the system provides transparent access to any data item by reference
`
`only to its identity and independent of its preSent location, whether it be local,
`
`remote, or offline;
`[0025]
`the system caches data items from a server, so that only the most ‘
`
`recently accessed data items need be retained;
`[0026]
`when the system is being used to cache data items, problems of
`
`maintaining cache consistency are avoided;
`
`[0027]
`
`the system maintains a desired level of redundancy of data items in a
`
`network of servers, to protect against failure by ensuring that multiple copies of
`
`the data items are present at different locations in the system;
`
`[0028]
`
`the system automatically archives data items as they are created or
`
`modified;
`
`-
`
`[0029]
`
`the system provides the size, age, and location of groups of data
`
`items. in order to decide whether they can be safely removed-from a local file
`
`system;
`[0030]
`
`items;
`
`[0031]
`
`I
`
`the system can efficiently record and preserve any collection of data
`
`‘ the system can‘efficiently make acopy of any collection of data
`
`items, to support a version control mechanism for groups of the data items;
`
`.
`2618-0015
`
`- 6 —
`
`
`
`GOOG-TBZA-Page 9 of1’l4r
`
`GOOG-1024-Page 9 of 114
`
`
`
`[0032]
`
`the system can publish data items, allowing other, possibly
`
`anonymous, systems in a network to gain access to the data items and to rely on
`
`the availability of the data items;
`
`[0033]
`
`the system can maintain a local inventory of all the data items
`
`located on a given removable medium, such as a diskette or CD-ROM, the
`inventory is independent of other properties of the data items such as their name,
`. location, and date of creation;
`
`the system allows closely related sets of data items, such as
`[0034]
`matching or corresponding directories on disconnected computers, to be
`
`periodically resynchronized with one another;
`
`[0035]
`
`the system can verify that data retrieved from another location is the
`
`desired or requested data, using only the data identifier used to retrieve the data;
`[0036]
`the system can prove possession of specific data items by content ‘
`without disclosing the content of the data items, for purposes of later legal
`
`verification and to provide anonymity;
`
`[0037]
`
`the system tracks possession of specific data items according to
`
`content by owner, independent of the name, date, or other properties of the data
`
`item, and tracks the uses of specific data items and files by content for accounting
`
`purposes. ,
`
`[0038]
`
`Other objects, features, and characteristics of the present invention
`
`as well as the methods of operation and functions of the related elements of
`
`structure, and the combination of parts» and economies of manufacture, will
`
`become more apparent upon consideration of the following description and the
`
`appended claims with reference to the accompanying drawings, all of which form
`
`a part of this specification.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0039]
`
`Figures 1(a) and 1(b) depict a typical data processing system in
`
`which a preferred embodiment of the present invention operates;
`
`2618-0015
`
`_ 7 _
`
`
`
`600%1624-Page T60f1t4
`
`GOOG-1024-Page 10 of 114
`
`
`
`[0040]
`
`Figure 2 depicts a hierarchy of data items stored at any location in
`
`such a data processing system;
`
`[0041]
`
`Figures 3-9 depict data structures used to implement an embodiment
`
`of the present invention; and
`
`[0042]
`
`Figures 10(a)-28 are flow charts depicting operation of various
`
`aspects of the present invention.
`
`DETAILED DESCRIPTION OF THE PRESENTLY PREFERRED
`EXEMPLARY EMBODIMENTS
`I An embodiment of the present invention is now described with
`
`[0043]
`
`reference to a typical data processing system 100, which, with reference to FIGS.
`
`1(a) and 1(b), includes one or more processors (or computers) 102 and various
`
`’ storage devices 104 connected in some way, for example by a bus 106.
`[0044]
`Each processor 102 includes a CPU 108, a memory 110 and one or
`
`more local storage devices 112. The CPU 108, memory 110, and local storage
`
`device 112 may be internally connected, for example by a bus 114. Each processOr
`
`102 may also include other devices (not shown), such as a keyboard, a display, a
`printer, and the like.
`I
`
`[0045]
`
`In a data processing system 100, wherein more than one processor
`
`102 is used, that is, in a multiprocessor system, the processors may be in one of
`
`various relationships. For example, two processors 102 may be in a client/server,
`
`client/client, or a server/server relationship. These inter-processor relationships
`
`may be dynamic, changing depending on particular situations and fianctions. Thus,
`
`a particular processor 102 may change its relationship to other processors as
`needed, essentially setting up a peer-to-peer relationship With other processors. In
`a peer—to—peer relationship, sometimes a particular processor 102 acts as a client
`
`processor, whereas at other times the same processor acts as a server processor. In
`
`other words, there is no'hierarchy imposed on or required of processors 102.
`
`2618-0015
`
`
`
`600%1624-Page110f1’l4
`
`GOOG-1024-Page 11 of 114
`
`
`
`.
`
`[0046]
`
`In a multiprocessor system, the processors 102* may be homogeneous
`
`or heterogeneous. Further, in a multiprocessor data processing system 100,'some
`
`or all of the processors 102 may be disconnected from the network of processors
`for periods of time. Such disconnection may be part of the normal operation of the
`
`system 100 or it maybe because a particular processor 102 is in need of repair.
`
`[0047]
`
`Withinva data processing system 100, the data may be organized to
`
`form a hierarchy of data storage elements, wherein lower level data storage
`elements are combined to form higher level elements. This hierarchy can consist
`
`of, for example,processors, file systems, regions, directories, data files, segments,
`
`and the like. For example, with reference to FIG. 2, the data items on a particular
`
`processor 102 may be organized or structured as a file system 116 which
`comprises regions 117, each of which comprises directories 118, each of which
`
`can contain other directories 1.18 or files 120. Each file 120 being made up of one
`
`or more data segments 122.
`
`.
`
`[0048]
`
`In a typical data processing system, some or all of these elements
`
`can be named by users given certain implementation specific naming conventions,
`the name (or pathname) of an element being relative to a context. In the context of
`
`a data processing system 100, a pathname is fully specified by a processor name, a
`
`filesystem name, a sequence of zero or more directory names identifying nested
`
`directories, and a final file name. (Usually the lowest level elements, in this case
`segments 122, cannot be named by users.)
`V
`
`[0049]
`
`In other words, a file system 116 is a collection of directories 118. A
`
`directory 118 is a collection of named files 120--both data files 120 and'other
`directory files 118.“A file 120 is a named data item which is either a data file
`
`.
`
`(which may be simple or compound) or a directory file 118. A simple file 120
`consists of a single data segment 122. A compound file 120 consists of a sequence
`
`of data segments 122. A data segment 122 is a fixed sequence of bytes. An
`
`important property of any data segment is its size, the number of bytes in the
`
`sequence.
`
`2618-0015
`
`
`
`600%1624-Page, 12 of1t4
`
`GOOG-1024-Page 12 of 114
`
`
`
`[0050]
`A single processor 102 may access one or more file systems 116,
`and a single storage device 104 may contain one or more file systems 116, or"
`
`portions of a file system 116. For instance, a file system 116 may span several
`
`storage devices 104.
`
`[0051]
`In order to implement controls in a file system, file system 116 may
`be divided into distinct regions, where each region is a unit of management and
`
`control. A region consists of a given directory 118 and is identified by the
`
`pathname (user defined) of the directory.
`
`[0052]
`
`In the following, the term "location", with respect to a data '
`
`processing system 100, refers to any of a particular processor 102 in the system, a
`
`memory of a particular processor, a storage device, a removable storage medium
`
`(such as a floppy disk or compact disk), or any other physical location in the
`
`system. The term "local" with respect to a particular processor 102 refers to the
`
`memory and storage devices of that particular processor.
`
`[0053]
`
`In the following, the terms "True Name", "data identity" and "data
`
`identifier" refer to the substantially unique data identifier for a particular data item.
`
`The term "True File" refers to the actUal file, segment, or data item identified by a
`
`True Name.
`
`A file system for a data processing system 100 is now described
`[0054]
`which is intended to work with an existing operating system by augmenting some
`i ofthe operating system's file management system codes. The embodiment
`
`provided relies on the standard file management primitives for actually storing to
`
`and retrieving data items from disk, but uses the mechanisms of the present
`
`invention to reference and access those data items.
`
`[0055]
`The processes and mechanisms (services) provided in this
`embodiment are grouped into the following categories: primitive} mechanisms,
`operating system mechanisms, remote mechanisms, background mechanisms, and
`
`extended mechanisms.
`
`2618-0015
`
`_ 10 _
`
`
`
`GOO&-’IfiZ4-Page130f1t4
`
`GOOG-1024-Page 13 of 114
`
`
`
`[0056]
`
`Primitive mechanisms provide fundamental capabilities used to
`
`support other mechanisms. The following primitive mechanisms are described:
`
`1. Calculate True Name;
`
`2. Assimilate Data Item;
`
`3. True File;
`
`4. Get True Name from Path;
`
`5. Link path to True Name;
`
`6. Realize True File from Location;
`
`7. Locate Remote File;
`
`8. Make True File Local;
`
`9. Create Scratch File; .
`
`10. Freeze Directory;
`
`11. Expand Frozen Directory;
`' 12. Delete True File;
`
`‘ 13. Process Audit File Entry;
`
`14. Begin Grooming;
`
`15. Select For Removal; and
`
`16. End'Grooming.
`
`[0057].
`
`Operating system mechanisms provide typical familiar file system
`
`mechanisms, while maintaining the data structures required to offer the
`
`mechanisms of the, present invention. Operating system mechanisms are designed
`
`to augment existing operating systems, and in this way to make the present
`
`invention compatible with, and generally transparent to, existing applications. The
`
`following operating system mechanisms are described:
`
`1. Open File;
`
`2. Close File;
`
`3. Read File;
`
`4. Write File;
`
`5. Delete File or Directory;
`
`2618-0015
`
`-11;
`
`
`
`GCKX34624$%©e14cfi1+4
`
`GOOG-1024-Page 14 of 114
`
`
`
`6. Copy File or Directory;
`
`7. Move File or Directory;
`
`8. Get File Status; and
`
`9. Get Files in Directory.
`
`.
`
`[0058]
`
`Remote mechanisms are used by the operating system in responding
`
`to requests from other processors. These mechanisms enable the capabilities of the
`
`present invention in a peer-to-peer network mode of operation. The following
`
`remote mechanisms are described:
`
`1. Locate True File;
`
`\OOOQONUIJAUJN
`
`. Reserve True File;
`
`. Request True File;
`. Retire True File;
`
`. Cancel Reservation;
`
`. Acquire True File;
`
`. Lock Cache;
`
`. Update Cache; and
`
`. Check Expiration Date.
`
`[0059]
`
`Background mechanisms are intended to run occasionally and at a
`
`low priority. These provide automated management capabilities with respect to the
`
`present invention. The following background'mechanisms are described:
`
`1. Mirror True File;
`
`2. Groom Region;
`
`3. Check for Expired Links; and
`
`4. Verify Region; and
`
`5. Groom Source List.
`
`[0060]
`
`Extended mechanisms run within application programs over the
`
`operating system. These mechanisms provide solutions to specific problems and
`
`applications. The following extended mechanisms are described: -
`
`1. Inventory Existing Directory;
`
`2618-0015
`
`_ 12 _
`
`
`
`600%1624-Page +50=r1+4
`
`GOOG-1024-Page 15 of 114
`
`
`
`2. Inventory Removable, Read-only Files;
`
`3. Synchronize directories;
`
`4. Publish Region;
`
`5. Retire Directory;
`6. Realize Directory at location;
`
`7. Verify True File;
`
`8. Track for accounting purposes; and
`
`9. Track for licensing purposes.
`
`[0061]
`
`The file system herein described maintains sufficient information to
`
`provide a variety of mechanisms not ordinarily offered by an operating system, ‘
`
`some of which are listed and described here. Various processing performed by this
`embodiment of the present invention will now be described in greater detail.
`
`In some embodiments, some files 120 in a data processing system
`[0062]
`100 do not have True Names because they have been recently received or created
`
`or modified, and thus their True Names have not yet been computed. A file that
`
`does not yet have a True Name is called a scratch file. The process of assigning a
`True Name to a file is referred to as assimilation, and is described later. Note that a
`
`scratch file may have a user provided name.
`
`[0063]
`
`Some of the processing performed by the present invention can take
`
`place in a background mode or on a delayed or as-needed basis. This background
`
`_ processing is used to determine information that is not immediately required by
`
`the system or which may never be required. As an example, in some cases a
`
`scratch file is being changed at a rate greater than the rate at which it is useful to
`
`determine its True Name.»In these cases, determining the True Name of the file
`can be postponed or performed in the background.
`
`DATA STRUCTURES
`
`[0064]
`
`The following data structures, stored in memory 110 of one of more
`
`processors 102 are used to implement the mechanisms described herein. The data
`
`2618-0015
`
`_ 13 _
`
`
`
`600%1624-Page t60f1‘r4
`
`GOOG-1024-Page 16 of 114
`
`
`
`structures can be local to each processor 102 of the system 100, or they can reside
`
`on only some of the processors 102.
`
`[0065]
`
`The data structures described are assumed to reside on individual
`
`peer processors 102 in the data processing system 100. However, they can also be
`
`shared by placing them on a remote, shared file server (for instance, in a 10cal area
`
`network of machines). In order to accommodate sharing data structures, it is
`
`necessary that the processors accessing the shared database use the appropriate
`
`locking techniques to ensure that changes to the shared database do not interfere
`with one another but are appropriately serialized. These locking techniques are
`
`well understood by ordinarily skilled programmers of distributed applications.
`
`[0066]
`It is sometimes desirable to allow some regions to be local to a
`particular processor 102 and other regions to be shared among processors 102.
`
`(Recall that a region is a unit of file system management and control consisting of
`
`a given directory identified by the pathname of the directory.) In the Case of local
`
`and shared regions, there would be both local and shared versions of each data
`structure. Simple changes to the prOcesses described below must be made to
`
`ensure that appropriate data structures are selected for a given operation.
`
`[0067]
`
`The local directory extensions (LDE) table 124 is a data structure
`
`which provides information about files 120 and directories 118 in the data
`
`processing system 100. The local directory extensions table 124 is indexed by a
`
`pathname or contextual name (that is, a user provided name) of a file and includes
`
`the True Name for most files. The information in local directory extension table
`
`124 is in addition to that provided by the native file system of the operating
`
`system.
`[0068]
`
`The True File registry (TFR) 126 is