`lntemational Bureau
`
`i 1. v 1 7;‘.
`~'.*i~".iJ<5
`
`INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`
`(51) International Patent Classification 5 =
`G061? 17/30’ 1
`
`(11) International Publication Number:
`(43) International Publication Date:
`
`W0 9652685
`17 October 1996 (l7.l0.96)
`
`(21) International Application Number:
`
`(22) International Filing Date:
`
`9 April 1996 (09.04.96)
`
`(30) Priority Data:
`08/425,160
`
`11 April 1995 (1104.95)
`
`US
`
`(71) Applicant: KlNE'l'ECH, INC. [US/US]; 3140 Whisperwoods
`Court, Northbrook, IL 60062 (US).
`
`AU, AZ, BB, BG, BR, BY,
`EE, ES, Fl, GB, GE, HU, IS,
`. LS. LT. LU, LV, MD,
`Z, PL, PT. RO, RU, SD.
`SE, SG, SI. SK, TJ, TM, TR, TI‘. UA, UG, UZ, VN. ARIPO
`patent (KE, LS, MW, SD.
`UG), Eurasian patent (AM,
`RU,
`I, TM), European patent (AT,
`BE, CH, DE, DK, ES, Fl, FR, GB, GR, IE, IT, LU, MC,
`OAPI patent (BF, BJ, CF, CG, CI, CM. GA,
`GN. ML, MR, NE, SN, TD, TG).
`
`(72) Inventors: FARBER, David, A.; 202E North Carollo Road,
`Ojal, CA 93023 (US).
`LACI-IMAN, Ronald, D.', 3140
`Whisperwoods Court, Northbrook, IL 60062 (US).
`
`With international search report.
`With amended claims.
`
`(74) Agents: LAZAR, Dale, S. et al.', Cushman Darby & Cushman
`L.L.P., 1100 New York Avenue, N.W., Washington, DC
`20005 (US).
`
`(54) Title:
`
`IDENTIFYING DATA IN A DATA PROCESSING SYSTEM
`
`(57) Abstract
`
`In a data processing system (100), a mechanism identifies
`data items by substantially unique identifiers (138, 140, 142,
`144, 146, 148, 150) which depend on all of the data in the
`data items and only on the data in the data items. Existence
`means determine whether a particular data item is present in
`the system, by examining the identifiers of the plurality of data
`items.
`
`GOOG-1005-Page 1 of 116
`
`GOOG-1005-Page 1 of 116
`
`
`
`FOR THE PURPOSES OF INFORMATION ONLY
`
`Codes used to identify States party to the PCT on the front pages of pamphlets publishing international
`applications under the PCT.
`
`AM
`AT
`AU
`BB
`BE
`
`‘£§3P3'£?=E‘§89§9'£8Q9E§ES%
`
`Armmia
`Auau-in
`Auatnlia
`Barbados
`Belgium
`Burkina Faao
`Bulgaria
`Benin
`Brazil
`Belarus
`Canada
`Central African Republic
`C0030
`Switzerland
`Cdte d‘lvoiI'e
`Cameroon
`China
`Czechoslovakia
`Czech Republic
`Germany
`Denmark
`Estonia
`533-111
`Finland
`Fruloe
`Gabon
`
`MW
`MX
`NE
`NL
`NO
`NZ
`PI.
`PI‘
`RO
`RU
`SD
`SE
`SG
`SI
`SK
`SN
`SZ
`‘I'D
`TG
`T1
`11‘
`UA
`UG
`US
`UZ
`VN
`
`Malawi
`Mexico
`Niger
`Netherlands
`Norway
`New Zealand
`Poland
`Portugal
`Romania
`Russian Fedemion
`Sudan
`Sweden
`Singapore
`Slovenia
`Slovakia
`Senegal
`Swaziland
`Chad
`T080
`Tajilriatan
`Trinidad and Tobago
`Ukraine
`Uganda
`United Sale: of America
`Uflaekiaun
`Viet Nam
`
`GOOG-1005-Page 2 of 116
`
`Democratic People's Republic
`of Korea
`Republic of Korea
`Kazakhstan
`Liechtenstein
`Sn’ Lanka
`Liberia
`Lithuania
`Luxembourg
`Latvia
`Monaco
`Republic of Moldova
`Mniagaacar
`Mali
`Mongolia
`Mauritania
`
`GOOG-1005-Page 2 of 116
`
`
`
`WO 96/32685
`
`PCT/US96/04733
`
`IDEHIIEXIEQ_DAIA_IH_A_DAIA_EEQ§E§§IE§_§1§IEH
`
`BAQK§BQHED_QE_IHE_IHXEHIIQN
`
`1.
`
` Linmmm
`
`This invention relates to data processing
`
`systems and, more particularly,
`
`to data processing
`
`systems wherein data items are identified by
`
`substantially unique identifiers which depend on all of
`
`the data in the data items and only on the data in the
`
`data items.
`
`2.
`
`Background of the Invention
`
`Data processing (DP) systems, computers,
`
`networks of computers, or the like, typically offer users
`
`and programs various ways to identify the data in the
`
`systems.
`
`Users typically identify data in the data
`
`processing system by giving the data some form of name.
`
`For example, a typical operating system (os) on a
`
`computer provides a file system in which data items are
`named by alphanumeric identifiers.
`Programs typically
`identify data in the data processing system using a
`
`For example, a program may identify
`location or address.
`a record in a file or database by using a record number
`
`which serves to locate that record.
`
`In all but the most primitive operating
`
`systems, users and programs are able to create and use
`
`collections of named data items,
`
`these collections
`
`themselves being named by identifiers. These named
`
`collections can then,
`
`themselves, be made part of other
`
`named collections.
`
`For example, an OS may provide
`
`mechanisms to group files (data items)
`
`into directories
`
`(collections). These directories can then,
`
`themselves be
`
`made part of other directories.
`
`A data item may thus be
`
`identified relative to these nested directories using a
`
`GOOG-1005-Page 3 of 116
`
`GOOG-1005-Page 3 of 116
`
`
`
`W0 96/232685
`
`PCT/US96l04733
`
`sequence of names, or a so-called pathname, which defines
`
`a path through the directories to a particular data item
`
`(file or directory).
`
`As another example, a database management
`
`system may group data records (data items)
`
`into tables
`
`and then group these tables into database files
`
`(collections). The complete address of any data record
`
`can then be specified using the database file name,
`
`the
`
`table name, and the record number of that data record.
`
`other examples of identifying data items
`
`include:
`
`identifying files in a network file system,
`
`identifying objects in an object-oriented database,
`
`identifying images in an image database, and identifying
`
`articles in a text database.
`
`In general,
`
`the terms "data" and "data item" as
`
`used herein refer to sequences of bits.
`
`Thus a data item
`
`may be the contents of a file, a portion of a file, a
`
`page in memory, an object in an object-oriented program,
`
`a digital message, a digital scanned image, a part of a
`
`video or audio signal, or any other entity which can be
`
`represented by a sequence of bits.
`
`The term "data
`
`processing" herein refers to the processing of data
`
`items, and is sometimes dependent on the type of data
`
`item being processed.
`
`For example, a data processor for
`
`a digital image may differ from a data processor for an
`
`audio signal.
`
`In all of the prior data processing systems the
`
`names or identifiers provided to identify data items (the
`
`data items being files, directories, records in the
`
`database, objects in object-oriented programming,
`
`locations in memory or on a physical device, or the like)
`
`are always defined relative to a specific context.
`
`For
`
`instance,
`
`the file identified by a particular file name
`
`can only be determined when the directory containing the
`
`file (the context)
`
`is known.
`
`The file identified by a
`
`pathname can be determined only when the file system
`
`(context)
`
`is known. Similarly,
`
`the addresses in a
`
`GOOG-1005-Page 4 of 116
`
`GOOG-1005-Page 4 of 116
`
`
`
`WO 96132685
`
`PC'l‘lUS96/04733
`
`the keys in a database table, or
`process address space,
`domain names on a global computer network such as the
`
`Internet are meaningful only because they are specified
`relative to a context.
`
`In prior art systems for identifying data items
`there is no direct relationship between the data names
`
`and the data item.
`
`The same data name in two different
`
`contexts may refer to different data items, and two
`different data names in the same context may refer to the
`same data item.
`
`In addition, because there is no correlation
`
`there is
`between a data name and the data it refers to,
`no a priori way to confirm that a given data item is in
`
`in a DP
`For instance,
`fact the one named by a data name.
`system, if one processor requests that another processor
`deliver a data item with a given data name,
`the‘
`requesting processor cannot,
`in general, verify that the
`data delivered is the correct data (given only the name).
`Therefore it may require further processing, typically on
`the part of the requestor,
`to verify that the data item
`it has obtained is,
`in fact,
`the item it requested.
`A common operation in a DP system is adding a
`new data item to the system. when a new data item is
`added to the system, a name can be assigned to it only by
`updating the context in which names are defined.
`Thus
`
`such systems require a centralized mechanism for the
`
`Such a mechanism is required even
`management of names.
`in a multi-processing system when data items are created
`and identified at separate processors in distinct
`locations, and in which there is no other need for
`communication when data items are added.
`
`In many data processing systems or
`environments, data items are transferred between
`different locations in the system. These locations may
`be processors in the data processing system, storage
`devices, memory, or the like.
`For example, one processor
`may obtain a data item from another processor or from an
`
`GOOG-1005-Page 5 of 116
`
`GOOG-1005-Page 5 of 116
`
`
`
`wo 96/32685
`
`PCI‘/US96/04733
`
`external storage device, such as a floppy disk, and may
`
`incorporate that data item into its system (using the
`
`name provided with that data item).
`
`_
`
`However, when a processor (or some location)
`
`obtains a data item from another location in the DP
`
`system, it is possible that this obtained data item is
`
`already present in the system (either at the location of
`
`the processor or at some other location accessible by the
`
`processor) and therefore a duplicate of the data item is
`created. This situation is common in a network data
`
`processing environment where proprietary software
`
`products are installed from floppy disks onto several
`
`processors sharing a common file server.
`
`In these
`
`systems, it is often the case that the same product will
`
`be installed on several systems, so that several copies
`
`of each file will reside on the common file server.
`
`In some data processing systems in which
`several processors are connected in a network, one system
`
`is designated as a cache server to maintain master copies
`
`of data items, and other systems are designated as cache
`
`clients to copy local copies of the master data items
`
`into a local cache on an as-needed basis. Before using a
`
`cached item, a cache client must either reload the cached
`
`item, be informed of changes to the cached item, or
`
`confirm that the master item corresponding to the cached
`
`item has not changed.
`
`In other words, a cache client
`
`must synchronize its data items with those on the cache
`
`server. This synchronization may involve reloading data
`
`items onto the cache client.
`
`.The need to keep the cache
`
`synchronized or reload it adds significant overhead to
`
`existing caching mechanisms.
`
`In view of the above and other problems with
`
`prior art systems, it is therefore desirable to have a
`
`mechanism which allows each processor in a multiprocessor
`
`system to determine a common and substantially unique
`
`identifier for a data item, using only the data in the
`
`data item and not relying on any sort of context.
`
`GOOG-1005-Page 6 of 116
`
`GOOG-1005-Page 6 of 116
`
`
`
`wo 96/32685
`
`PCI‘/US96/04733
`
`It is further desirable to have a mechanism for
`
`reducing multiple copies of data items in a data
`
`processing system and to have a mechanism which enables
`
`the identification of identical data items so as to
`
`reduce multiple copies.
`
`It is further desirable to
`
`determine whether two instances of a data item are in
`fact the same data item, and to perform various other
`
`systems’ functions and applications on data items without
`
`relying on any context information or properties of the
`data item.
`
`It is also desirable to provide such a
`
`mechanism in such a way as to make it transparent to
`
`users of the data processing system, and it is desirable
`
`that a single mechanism be used to address each of the
`problems described above.
`
`§HMX_QE_IHE_lEEEHIlQ!
`
`in a data processing
`This invention provides,
`system, a method and apparatus for identifying a data
`item in the system, where the identity of the data item
`
`depends on all of the data in the data item and only on
`the data in the data item.
`Thus the identity of a data
`item is independent of its name, origin,
`location,
`address, or other information not derivable directly from
`the data, and depends only on the data itself.
`
`This invention further provides an apparatus
`and a method for determining whether a particular data
`item is present in the system or at a location in the
`
`system, by examining only the data identities of a
`
`plurality of data items.
`
`invention,
`
`Using the method or apparatus of the present
`the efficiency and integrity of a data
`
`processing system can be improved.
`The present invention
`improves the design and operation of a data storage
`system, file system, relational database, object-oriented
`
`database, or the like that stores a plurality of data
`items, by making possible or improving the design and
`
`GOOG-1005-Page 7 of 116
`
`GOOG-1005-Page 7 of 116
`
`
`
`wo 96132685
`
`PCT/U596/04733
`
`operation of at least some or all of the following
`features:
`
`the system stores at most one copy of any data
`
`item at a given location, even when multiple data names
`
`in the system refer to the same contents;
`
`the system avoids copying data from source to
`
`destination locations when the destination locations
`
`already have the data;
`
`the system provides transparent access to any
`
`data item by reference only to its identity and
`
`independent of its present location, whether it be local,
`
`remote, or offline;
`
`the system caches data items from a server, so
`that only the most recently accessed data items need be
`
`retained;
`
`when the system is being used to cache data
`
`items, problems of maintaining cache consistency are
`
`avoided;
`
`the system maintains a desired level of
`
`redundancy of data items in a network of servers,
`
`to
`
`protect against failure by ensuring that multiple copies
`
`of the data items are present at different locations in
`
`the system;
`
`the system automatically archives data items as
`
`they are created or modified;
`
`the system provides the size, age, and location
`
`of groups of data items in order to decide whether they
`
`can be safely removed from a local file system;
`
`the system can efficiently record and preserve
`
`any collection of data items;
`
`the system can efficiently make a copy of any
`
`collection of data items,
`
`to support a version control
`
`mechanism for groups of the data items;
`
`the system can publish data items, allowing
`
`other, possibly anonymous, systems in a network to gain
`
`access to the data items and to rely on the availability
`
`of the data items;
`
`GOOG-1005-Page 8 of 116
`
`GOOG-1005-Page 8 of 116
`
`
`
`WO 96/32685
`
`PCTIUS96l04733
`
`the system can maintain a local inventory of
`
`all the data items located on a given removable medium,
`
`such as a diskette or CD-ROM,
`
`the inventory is
`
`independent of other properties of the data items such as
`
`their name,
`
`location, and date of creation;
`
`the system allows closely related sets of data
`
`items, such as matching or corresponding directories on
`
`disconnected computers,
`
`to be periodically resynchronized
`
`with one another;
`
`the system can verify that data retrieved from
`
`another location is the desired or requested data, using
`
`only the data identifier used to retrieve the data;
`
`the system can prove possession of specific
`
`data items by content without disclosing the content of
`
`the data items, for purposes of later legal verification
`
`and to provide anonymity;
`
`the system tracks possession of specific data
`
`items according to content by owner,
`
`independent of the
`
`name, date, or other properties of the data item, and
`
`tracks the uses of specific data items and files by
`
`content for accounting purposes.
`
`other objects, features, and characteristics of
`
`the present invention as well as the methods of operation
`
`and functions of the related elements of structure, and
`
`the combination of parts and economies of manufacture,
`
`will become more apparent upon consideration of the
`
`following description and the appended claims with
`
`reference to the accompanying drawings, all of which form
`
`a part of this specification.
`
`C
`
`ON
`
`W
`
`FIGURE 1 depicts a typical data processing
`
`system in which a preferred embodiment of the present
`
`invention operates;
`
`FIGURE 2 depicts a hierarchy of data items
`
`stored at any location in such a data processing system;
`
`GOOG-1005-Page 9 of 116
`
`GOOG-1005-Page 9 of 116
`
`
`
`wo 96/32685
`
`PCT/US96/04733
`
`FIGURES 3-9 depict data structures used to
`
`implement an embodiment of the present invention; and
`
`FIGURES 10(a)-28 are flow charts depicting
`
`operation of various aspects_of the present invention.
`
`DEIAILED_DE5QBI2IIQE_QE_IHE_£BE§EHIL1_EBEEEBBED
`EKEM2LAEX_EMEQDIMEHI§
`
`An embodiment of the present invention is now
`
`described with reference to a typical data processing
`
`system 100, which, with reference to FIGURE 1,
`
`includes
`
`one or more processors (or computers) 102 and various
`
`storage devices 104 connected in some way, for example by
`a bus 106.
`
`Each processor 102 includes a CPU 108, a memory
`
`110 and one or more local storage devices 112.
`
`The CPU
`
`108, memory 110, and local storage device 112 may be
`
`internally connected, for example by a bus 114. Each
`
`processor 102 may also include other devices (not shown),
`
`such as a keyboard, a display, a printer, and the like.
`
`In a data processing system 100, wherein more
`
`than one processor 102 is used, that is,
`
`in a
`
`multiprocessor system,
`
`the processors may be in one of
`
`various relationships.
`
`For example,
`
`two processors 102
`
`may be in a client/server, client/client, or a
`
`server/server relationship. These inter-processor
`
`relationships may be dynamic, changing depending on
`
`particular situations and functions. Thus, a particular
`
`processor 102 may change its relationship to other
`
`processors as needed, essentially setting up a peer-to-
`
`peer relationship with other processors.
`
`In a peer-to-
`
`peer relationship, sometimes a particular processor 102
`
`acts as a client processor, whereas at other times the
`
`same processor acts as a server processor.
`
`In other
`
`words,
`
`there is no hierarchy imposed on or required of
`
`processors 102.
`
`In a multiprocessor system,
`
`the processors 102
`
`may be homogeneous or heterogeneous. Further,
`
`in a
`
`8
`
`GOOG-‘I005-Page 10 of 116
`
`GOOG-1005-Page 10 of 116
`
`
`
`wo 96/32685
`
`PCI‘/US96/04733
`
`multiprocessor data processing system 100, some or all of
`
`the processors 102 may be disconnected from the network
`
`of processors for periods of time.
`
`Such disconnection
`
`may be part of the normal operation of the system 100 or
`
`it may be because a particular processor 102 is in need
`
`of repair.
`
`Within a data processing system 100,
`
`the data
`
`may be organized to form a hierarchy of data storage
`
`elements, wherein lower level data storage elements are
`
`combined to form higher level elements. This hierarchy
`can consist of, for example, processors, file systems,
`regions, directories, data files, segments, and the like.
`
`For example, with reference to FIGURE 2,
`
`the data items
`
`on a particular processor 102 may be organized or
`
`structured as a file system 116 which comprises regions
`
`117, each of which comprises directories 118, each of
`
`which can contain other directories 118 or files 120.
`
`Each file 120 being made up of one or more data segments
`122.
`
`In a typical data processing system,
`
`some or
`
`all of these elements can be named by users given certain
`implementation specific naming conventions,
`the name (or
`pathname) of an element being relative to a context.
`In
`
`the context of a data processing system 100, a pathname
`is fully specified by a processor name, a filesystem
`
`name, a sequence of zero or more directory names
`
`identifying nested directories, and a final file name.
`
`(Usually the lowest level elements,
`
`in this case segments
`
`122, cannot be named by users.)
`
`In other words, a file system 116 is a
`
`collection of directories 118.
`
`A directory 118 is a
`
`collection of named files 120 -- both data files 120 and
`
`other directory files 118.
`
`A file 120 is a named data
`
`item which is either a data file (which may be simple or
`
`compound) or a directory file 118.
`
`A simple.file 120
`
`consists of a single data segment 122.
`
`A compound file
`
`120 consists of a sequence of data segments 122.
`
`A data
`
`GOOG-‘I005-Page 11 of 116
`
`GOOG-1005-Page 11 of 116
`
`
`
`W0 96,32,535
`
`PCT/US96/04733
`
`segment 122 is a fixed sequence of bytes. An important
`
`property of any data segment is its size,
`
`the number of
`
`bytes in the sequence.
`
`A single processor 102 may access one or more
`
`file systems 116, and a single storage device 104 may
`
`contain one or more file systems 116, or portions of a
`
`file system 116.
`
`For instance, a file system 116 may
`
`span several storage devices 104.
`
`In order to implement controls in a file
`
`system, file system 116 may be divided into distinct
`
`regions, where each region is a unit of management and
`
`control.
`
`A region consists of a given directory 118 and
`
`is identified by the pathname (user defined) of the
`
`directory.
`
`In the following,
`
`the term "location", with
`
`respect to a data processing system 100, refers to any of
`
`a particular processor 102 in the system, a memory of a
`
`particular processor, a storage device, a removable
`
`storage medium (such as a floppy disk or compact disk),
`
`or any other physical location in the system.
`
`The term
`
`"local" with respect to a particular processor 102 refers
`
`to the memory and storage devices of that particular
`
`processor.
`
`In the following,
`
`the terms "True Name", "data
`
`identity" and "data identifier" refer to the
`
`substantially unique data identifier for a particular
`
`data item.
`
`The term "True File" refers to the actual
`
`file, segment, or data item identified by a True Name.
`
`A file system for a data processing system 100
`
`is now described which is intended to work with an
`
`existing operating system by augmenting some of the
`
`operating system's file management system codes.
`
`The
`
`embodiment provided relies on the standard file
`
`management primitives for actually storing to and
`
`retrieving data items from disk, but uses the mechanisms
`
`of the present invention to reference and access those
`
`data items.
`
`GOOG-‘I005-Page 12 of 116
`
`GOOG-1005-Page 12 of 116
`
`
`
`WO 96/32685
`
`PCT/US96/04733
`
`The processes and mechanisms (services)
`provided in this embodiment are grouped into the
`following categories: primitive mechanisms, operating
`system mechanisms, remote mechanisms, background
`mechanisms, and extended mechanisms.
`
`Primitive mechanisms provide fundamental
`capabilities used to support other mechanisms.
`The
`
`following primitive mechanisms are described:
`
`1.
`
`Calculate True Name;
`
`Assimilate Data Item;
`
`New True File;
`
`Get True Name from Path;
`
`Link path to True Name;
`Realize True File from Docation;
`Locate Remote File;
`
`Make True File Local;
`
`Create scratch File;
`
`Freeze Directory;
`
`Expand Frozen Directory;
`
`12. Delete True File;
`
`13. Process Audit File Entry;
`
`14. Begin Grooming;
`
`15. Select For Removal; and
`
`16.
`
`End Grooming.
`
`operating system mechanisms provide typical
`familiar file system mechanisms, while maintaining the
`data structures required to offer the mechanisms of the
`
`present invention.
`operating system mechanisms are
`designed to augment existing operating systems, and in
`this way to make the present invention compatible with,
`and generally transparent to, existing applications.
`The
`following operating system mechanisms are described:
`
`Open File;
`
`Close File;
`
`Read File;
`
`write File;
`
`Delete File or Directory;
`
`11
`
`GOOG-‘I005-Page 13 of 116
`
`GOOG-1005-Page 13 of 116
`
`
`
`WO 96132685
`
`PC!‘/US96/04733
`
`6.
`
`7.
`
`8.
`
`9.
`
`Copy File or Directory;
`
`Move File or Directory;
`
`Get File status; and
`
`Get Files in Directory.
`
`Remote mechanisms are used by the operating
`
`system in responding to requests from other processors.
`
`These mechanisms enable the capabilities of the present
`
`invention in a peer-to-peer network mode of operation.
`
`The following remote mechanisms are described:
`
`1.
`
`2.
`
`3.
`
`4.
`
`5.
`
`6.
`
`7.
`
`8.
`
`9.
`
`Locate True File;
`
`Reserve True File;
`
`Request True File;
`
`Retire True File;
`
`Cancel Reservation;
`
`Acquire True File;
`
`Lock Cache;
`
`Update Cache; and
`
`Check Expiration Date.
`
`Background mechanisms are intended to run
`
`occasionally and at a low priority. These provide
`
`automated management capabilities with respect to the
`
`present invention.
`are described:
`
`The following background mechanisms
`
`1. Mirror True File;
`
`2.
`
`3.
`
`4.
`5.
`
`Groom Region;
`
`Check for Expired Links; and
`
`Verify Region; and
`Groom Source List.
`
`Extended mechanisms run within application
`
`programs over the operating system. These mechanisms
`
`provide solutions to specific problems and applications.
`
`The following extended mechanisms are described:
`
`1.
`
`2.
`
`3.
`
`4.
`
`5.
`
`Inventory Existing Directory;
`
`Inventory Removable, Read-only Files;
`
`Synchronize directories;
`
`Publish Region;
`
`Retire Directory;
`
`12
`
`GOOG-1005-Page 14 of 116
`
`GOOG-1005-Page 14 of 116
`
`
`
`WO 96/32685
`
`PCI‘/US96/04733
`
`6.
`
`Realize Directory at location;
`
`7. Verify True File;
`
`8.
`
`9.
`
`Track for accounting purposes;
`
`Track for licensing purposes.
`
`The file system herein described maintains
`
`sufficient information to provide a variety of mechanisms
`not ordinarily offered by an operating system,
`some of
`which are listed and described here. Various processing
`performed by this embodiment of the present invention
`will now be described in greater detail.
`
`In some embodiments, some files 120 in a data
`processing system 100 do not have True Names because they
`have been recently received or created or modified, and
`thus their True Names have not yet been computed.
`A file
`that does not yet have a True Name is called a scratch
`file.
`The process of assigning a True Name to a file is
`referred to as assimilation, and is described later.
`
`Note that a scratch file may have a user provided name.
`
`Some of the processing performed by the present
`invention can take place in a background mode or on a
`delayed or as-needed basis. This background processing
`is used to determine information that is not immediately
`required by the system or which may never be required.
`As an example,
`in some cases a scratch file is being
`changed at a rate greater than the rate at which it is
`useful to determine its True Name.
`In these cases,
`determining the True Name of the file can be postponed or
`performed in the background.
`
`D§£§_§§IEE£2£§§
`
`The following data structures, stored in memory
`110 of one of more processors 102 are used to implement
`the mechanisms described herein.
`The data structures can
`be local to each processor 102 of the system 100, or they
`can reside on only some of the processors 102.
`
`GOOG-1005-Page 15 of 116
`
`GOOG-1005-Page 15 of 116
`
`
`
`wo 96l32685
`
`PCT/US96l04733
`
`The data structures described are assumed to
`
`reside on individual peer processors 102 in the data
`processing system 100. However,
`they can also be shared
`
`by placing them on a remote, shared file server (for
`
`instance,
`
`in a local area network of machines).
`
`In order
`
`to accommodate sharing data structures, it is necessary
`
`that the processors accessing the shared database use the
`
`appropriate locking techniques to ensure that changes to
`the shared database do not interfere with one another but
`
`are appropriately serialized. These locking techniques
`
`are well understood by ordinarily skilled programmers of
`
`distributed applications.
`
`It is sometimes desirable to allow some regions
`
`to be local to a particular processor 102 and other
`
`regions to be shared among processors 102.
`
`(Recall that
`
`a region is a unit of file system management and control
`
`consisting of a given directory identified by the
`
`pathname of the directory.)
`
`In the case of local and
`
`shared regions, there would be both local and shared
`
`versions of each data structure.
`
`Simple changes to the
`
`processes described below must be made to ensure that
`
`appropriate data structures are selected for a given
`
`operation.
`
`The local directory extensions (LDE)
`table 124
`is a data structure which provides information about
`
`files 120 and directories 118 in the data processing
`
`system 100.
`
`The local directory extensions table 124 is
`
`indexed by a pathname or contextual name (that is, a user
`
`provided name) of a file and includes the True Name for
`
`most files.
`
`The information in local directory extension
`
`table 124 is in addition to that provided by the native
`
`file system of the operating system.
`
`The True File registry (TFR) 126 is a data
`
`store for listing actual data items which have True
`
`Names, both files 120 and segments 122. When such data
`
`items occur in the True File registry 126 they are known
`
`as True Files. True Files are identified in True File
`
`GOOG-1005-Page 16 of 116
`
`GOOG-1005-Page 16 of 116
`
`
`
`WO 96/32685
`
`PCT/US96l04733
`
`registry 126 by their True Names or identities.
`
`The
`
`table True File registry 126 also stores location,
`
`dependency, and migration information about True Files.
`
`The region table (RT) 128 defines areas in the
`
`network storage which are to be managed separately.
`Region table 128 defines the rules for access to and
`
`migration of files 120 among various regions with the
`local file system 116 and remote peer file systems.
`
`The source table (ST) 130 is a list of the
`
`sources of True Files other than the current True File
`registry 126.
`The source table 130 includes removable
`
`volumes and remote processors.
`
`The audit file (AP) 132 is a list of records
`indicating changes to be made in local or remote files,
`these changes to be processed in background.
`The accounting log (AL) 134 is a log of file
`transactions used to create accounting information in a
`manner which preserves the identity of files being
`tracked independent of their name or location.
`
`The license table (LT) 136 is a table
`identifying files, which may only be used by licensed
`users,
`in a manner independent of their name or location,
`and the users licensed to use them.
`
`t ' e Des
`
`' tio s
`
`t
`
`t St uctu es
`
`The following table summarizes the fields of an
`local directory extensions table entry, as illustrated by
`record 138 in FIGURE 3.
`
`J nesenuon
`
`Region ID
`'
`
`identifies the region in which this file is
`contained.
`
`Pathname
`
`True Name
`
`the user provided name or contextual name
`of the file or directory, relative to the
`re ion in which it occurs.
`
`the computed True Name or identity of the
`file or directory. This True Name is not
`always up to date, and it is set to a
`special value when a file is modified and
`is later recom-uted in the back-round.
`
`GOOG-1005-Page 17 of 116
`
`GOOG-1005-Page 17 of 116
`
`
`
`WO 96/32685
`
`PCTIUS96I04733
`
`i T
`
`indicates whether the file is a data file
`or a directo .
`
`ype
`
`Scratch
`File ID
`
`Time of
`last
`access
`
`Time of
`last modi-
`fication
`
`Safe flag
`
`the physical location of the file in the
`file system, when no True Name has been
`calculated for the file. As noted above,
`such a file is called a scratch file.
`
`If this
`the last access time to this file.
`file is»a directory, this is the last
`access time to an
`file in the director .
`
`If
`the time of last change of this file.
`this file is a directory, this is the last
`modification time of any file in the
`director .
`
`indicates that this file (and, if this file
`is a directory, all of its subordinate
`files) have been backed up on some other
`system, and it is therefore safe to remove
`them.
`
`Lock flag .
`
`indicates
`
`is, it is
`cessor or
`-rocessor
`
`whether a file is locked, that
`being modified by the local pro-
`a remote processor.
`only one
`modif
`a file at a time.
`
`the full size of this directory (including
`all subordinate files), if all files in it
`were fully expanded and duplicated.
`For a
`file that is not a directory this is the
`size of the actual True File.
`
`the identity of the user who owns this
`file, for accounting and license tracking
`-ur-oses.
`
`Each record of the True File registry 126 has
`
`the fields shown in the True File registry record 140 in
`FIGURE 4.
`
`The True File registry 126 consists of the
`
`database described in the table below as well as the
`
`actual True Files identified by the True File IDs below.
`
`T T
`
`computed True Name or identity of
`the file.
`
`rue Name
`
`GOOG-1005-Page 18 of 116
`
`GOOG-1005-Page 18 of 116
`
`
`
`W0 96/32685
`
`PCT/US96/04733
`
`J C
`
`ompressed
`File ID
`
`compressed version of the True File
`may be stored instead of, or in
`addition to, an uncompressed
`version. This field provides the
`identity of the actual
`representation of the compressed
`version of the file.
`
`Grooming
`delete count
`
`tentative count of how many
`references have been selected for
`deletion during a grooming
`o-eration.
`
`Time of last most recent date and time the
`access
`content of this file was accessed.
`
`Expiration
`1
`
`date and time after which this file
`ma
`be deleted b
`this server.
`
`Dependent
`processors
`
`processor IDs of other processors
`which contain references to this
`True File.
`
`True File ID
`
`source ID(s) of zero or more
`sources from which this file or
`data item ma
`be retrieved.
`
`identity or disk location of the
`actual physical representation of
`the file or file segment.
`It is
`sufficient to use a filename in the
`registration directory of the
`underlying operating system.
`True File ID is absent if the
`actual file is not currently
`-resent at the current location.
`
`The
`
`Use count
`
`number of other records on this
`pr