throbber
wonun IN'1"ELLECl'UAL_ PROPERTY ORGANIZATION
`lntemational Bureau
`
`i 1. v 1 7;‘.
`~'.*i~".iJ<5
`
`INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`
`(51) International Patent Classification 5 =
`G061? 17/30’ 1
`
`(11) International Publication Number:
`(43) International Publication Date:
`
`W0 9652685
`17 October 1996 (l7.l0.96)
`
`(21) International Application Number:
`
`(22) International Filing Date:
`
`9 April 1996 (09.04.96)
`
`(30) Priority Data:
`08/425,160
`
`11 April 1995 (1104.95)
`
`US
`
`(71) Applicant: KlNE'l'ECH, INC. [US/US]; 3140 Whisperwoods
`Court, Northbrook, IL 60062 (US).
`
`AU, AZ, BB, BG, BR, BY,
`EE, ES, Fl, GB, GE, HU, IS,
`. LS. LT. LU, LV, MD,
`Z, PL, PT. RO, RU, SD.
`SE, SG, SI. SK, TJ, TM, TR, TI‘. UA, UG, UZ, VN. ARIPO
`patent (KE, LS, MW, SD.
`UG), Eurasian patent (AM,
`RU,
`I, TM), European patent (AT,
`BE, CH, DE, DK, ES, Fl, FR, GB, GR, IE, IT, LU, MC,
`OAPI patent (BF, BJ, CF, CG, CI, CM. GA,
`GN. ML, MR, NE, SN, TD, TG).
`
`(72) Inventors: FARBER, David, A.; 202E North Carollo Road,
`Ojal, CA 93023 (US).
`LACI-IMAN, Ronald, D.', 3140
`Whisperwoods Court, Northbrook, IL 60062 (US).
`
`With international search report.
`With amended claims.
`
`(74) Agents: LAZAR, Dale, S. et al.', Cushman Darby & Cushman
`L.L.P., 1100 New York Avenue, N.W., Washington, DC
`20005 (US).
`
`(54) Title:
`
`IDENTIFYING DATA IN A DATA PROCESSING SYSTEM
`
`(57) Abstract
`
`In a data processing system (100), a mechanism identifies
`data items by substantially unique identifiers (138, 140, 142,
`144, 146, 148, 150) which depend on all of the data in the
`data items and only on the data in the data items. Existence
`means determine whether a particular data item is present in
`the system, by examining the identifiers of the plurality of data
`items.
`
`GOOG-1005-Page 1 of 116
`
`GOOG-1005-Page 1 of 116
`
`

`

`FOR THE PURPOSES OF INFORMATION ONLY
`
`Codes used to identify States party to the PCT on the front pages of pamphlets publishing international
`applications under the PCT.
`
`AM
`AT
`AU
`BB
`BE
`
`‘£§3P3'£?=E‘§89§9'£8Q9E§ES%
`
`Armmia
`Auau-in
`Auatnlia
`Barbados
`Belgium
`Burkina Faao
`Bulgaria
`Benin
`Brazil
`Belarus
`Canada
`Central African Republic
`C0030
`Switzerland
`Cdte d‘lvoiI'e
`Cameroon
`China
`Czechoslovakia
`Czech Republic
`Germany
`Denmark
`Estonia
`533-111
`Finland
`Fruloe
`Gabon
`
`MW
`MX
`NE
`NL
`NO
`NZ
`PI.
`PI‘
`RO
`RU
`SD
`SE
`SG
`SI
`SK
`SN
`SZ
`‘I'D
`TG
`T1
`11‘
`UA
`UG
`US
`UZ
`VN
`
`Malawi
`Mexico
`Niger
`Netherlands
`Norway
`New Zealand
`Poland
`Portugal
`Romania
`Russian Fedemion
`Sudan
`Sweden
`Singapore
`Slovenia
`Slovakia
`Senegal
`Swaziland
`Chad
`T080
`Tajilriatan
`Trinidad and Tobago
`Ukraine
`Uganda
`United Sale: of America
`Uflaekiaun
`Viet Nam
`
`GOOG-1005-Page 2 of 116
`
`Democratic People's Republic
`of Korea
`Republic of Korea
`Kazakhstan
`Liechtenstein
`Sn’ Lanka
`Liberia
`Lithuania
`Luxembourg
`Latvia
`Monaco
`Republic of Moldova
`Mniagaacar
`Mali
`Mongolia
`Mauritania
`
`GOOG-1005-Page 2 of 116
`
`

`

`WO 96/32685
`
`PCT/US96/04733
`
`IDEHIIEXIEQ_DAIA_IH_A_DAIA_EEQ§E§§IE§_§1§IEH
`
`BAQK§BQHED_QE_IHE_IHXEHIIQN
`
`1.
`
` Linmmm
`
`This invention relates to data processing
`
`systems and, more particularly,
`
`to data processing
`
`systems wherein data items are identified by
`
`substantially unique identifiers which depend on all of
`
`the data in the data items and only on the data in the
`
`data items.
`
`2.
`
`Background of the Invention
`
`Data processing (DP) systems, computers,
`
`networks of computers, or the like, typically offer users
`
`and programs various ways to identify the data in the
`
`systems.
`
`Users typically identify data in the data
`
`processing system by giving the data some form of name.
`
`For example, a typical operating system (os) on a
`
`computer provides a file system in which data items are
`named by alphanumeric identifiers.
`Programs typically
`identify data in the data processing system using a
`
`For example, a program may identify
`location or address.
`a record in a file or database by using a record number
`
`which serves to locate that record.
`
`In all but the most primitive operating
`
`systems, users and programs are able to create and use
`
`collections of named data items,
`
`these collections
`
`themselves being named by identifiers. These named
`
`collections can then,
`
`themselves, be made part of other
`
`named collections.
`
`For example, an OS may provide
`
`mechanisms to group files (data items)
`
`into directories
`
`(collections). These directories can then,
`
`themselves be
`
`made part of other directories.
`
`A data item may thus be
`
`identified relative to these nested directories using a
`
`GOOG-1005-Page 3 of 116
`
`GOOG-1005-Page 3 of 116
`
`

`

`W0 96/232685
`
`PCT/US96l04733
`
`sequence of names, or a so-called pathname, which defines
`
`a path through the directories to a particular data item
`
`(file or directory).
`
`As another example, a database management
`
`system may group data records (data items)
`
`into tables
`
`and then group these tables into database files
`
`(collections). The complete address of any data record
`
`can then be specified using the database file name,
`
`the
`
`table name, and the record number of that data record.
`
`other examples of identifying data items
`
`include:
`
`identifying files in a network file system,
`
`identifying objects in an object-oriented database,
`
`identifying images in an image database, and identifying
`
`articles in a text database.
`
`In general,
`
`the terms "data" and "data item" as
`
`used herein refer to sequences of bits.
`
`Thus a data item
`
`may be the contents of a file, a portion of a file, a
`
`page in memory, an object in an object-oriented program,
`
`a digital message, a digital scanned image, a part of a
`
`video or audio signal, or any other entity which can be
`
`represented by a sequence of bits.
`
`The term "data
`
`processing" herein refers to the processing of data
`
`items, and is sometimes dependent on the type of data
`
`item being processed.
`
`For example, a data processor for
`
`a digital image may differ from a data processor for an
`
`audio signal.
`
`In all of the prior data processing systems the
`
`names or identifiers provided to identify data items (the
`
`data items being files, directories, records in the
`
`database, objects in object-oriented programming,
`
`locations in memory or on a physical device, or the like)
`
`are always defined relative to a specific context.
`
`For
`
`instance,
`
`the file identified by a particular file name
`
`can only be determined when the directory containing the
`
`file (the context)
`
`is known.
`
`The file identified by a
`
`pathname can be determined only when the file system
`
`(context)
`
`is known. Similarly,
`
`the addresses in a
`
`GOOG-1005-Page 4 of 116
`
`GOOG-1005-Page 4 of 116
`
`

`

`WO 96132685
`
`PC'l‘lUS96/04733
`
`the keys in a database table, or
`process address space,
`domain names on a global computer network such as the
`
`Internet are meaningful only because they are specified
`relative to a context.
`
`In prior art systems for identifying data items
`there is no direct relationship between the data names
`
`and the data item.
`
`The same data name in two different
`
`contexts may refer to different data items, and two
`different data names in the same context may refer to the
`same data item.
`
`In addition, because there is no correlation
`
`there is
`between a data name and the data it refers to,
`no a priori way to confirm that a given data item is in
`
`in a DP
`For instance,
`fact the one named by a data name.
`system, if one processor requests that another processor
`deliver a data item with a given data name,
`the‘
`requesting processor cannot,
`in general, verify that the
`data delivered is the correct data (given only the name).
`Therefore it may require further processing, typically on
`the part of the requestor,
`to verify that the data item
`it has obtained is,
`in fact,
`the item it requested.
`A common operation in a DP system is adding a
`new data item to the system. when a new data item is
`added to the system, a name can be assigned to it only by
`updating the context in which names are defined.
`Thus
`
`such systems require a centralized mechanism for the
`
`Such a mechanism is required even
`management of names.
`in a multi-processing system when data items are created
`and identified at separate processors in distinct
`locations, and in which there is no other need for
`communication when data items are added.
`
`In many data processing systems or
`environments, data items are transferred between
`different locations in the system. These locations may
`be processors in the data processing system, storage
`devices, memory, or the like.
`For example, one processor
`may obtain a data item from another processor or from an
`
`GOOG-1005-Page 5 of 116
`
`GOOG-1005-Page 5 of 116
`
`

`

`wo 96/32685
`
`PCI‘/US96/04733
`
`external storage device, such as a floppy disk, and may
`
`incorporate that data item into its system (using the
`
`name provided with that data item).
`
`_
`
`However, when a processor (or some location)
`
`obtains a data item from another location in the DP
`
`system, it is possible that this obtained data item is
`
`already present in the system (either at the location of
`
`the processor or at some other location accessible by the
`
`processor) and therefore a duplicate of the data item is
`created. This situation is common in a network data
`
`processing environment where proprietary software
`
`products are installed from floppy disks onto several
`
`processors sharing a common file server.
`
`In these
`
`systems, it is often the case that the same product will
`
`be installed on several systems, so that several copies
`
`of each file will reside on the common file server.
`
`In some data processing systems in which
`several processors are connected in a network, one system
`
`is designated as a cache server to maintain master copies
`
`of data items, and other systems are designated as cache
`
`clients to copy local copies of the master data items
`
`into a local cache on an as-needed basis. Before using a
`
`cached item, a cache client must either reload the cached
`
`item, be informed of changes to the cached item, or
`
`confirm that the master item corresponding to the cached
`
`item has not changed.
`
`In other words, a cache client
`
`must synchronize its data items with those on the cache
`
`server. This synchronization may involve reloading data
`
`items onto the cache client.
`
`.The need to keep the cache
`
`synchronized or reload it adds significant overhead to
`
`existing caching mechanisms.
`
`In view of the above and other problems with
`
`prior art systems, it is therefore desirable to have a
`
`mechanism which allows each processor in a multiprocessor
`
`system to determine a common and substantially unique
`
`identifier for a data item, using only the data in the
`
`data item and not relying on any sort of context.
`
`GOOG-1005-Page 6 of 116
`
`GOOG-1005-Page 6 of 116
`
`

`

`wo 96/32685
`
`PCI‘/US96/04733
`
`It is further desirable to have a mechanism for
`
`reducing multiple copies of data items in a data
`
`processing system and to have a mechanism which enables
`
`the identification of identical data items so as to
`
`reduce multiple copies.
`
`It is further desirable to
`
`determine whether two instances of a data item are in
`fact the same data item, and to perform various other
`
`systems’ functions and applications on data items without
`
`relying on any context information or properties of the
`data item.
`
`It is also desirable to provide such a
`
`mechanism in such a way as to make it transparent to
`
`users of the data processing system, and it is desirable
`
`that a single mechanism be used to address each of the
`problems described above.
`
`§HMX_QE_IHE_lEEEHIlQ!
`
`in a data processing
`This invention provides,
`system, a method and apparatus for identifying a data
`item in the system, where the identity of the data item
`
`depends on all of the data in the data item and only on
`the data in the data item.
`Thus the identity of a data
`item is independent of its name, origin,
`location,
`address, or other information not derivable directly from
`the data, and depends only on the data itself.
`
`This invention further provides an apparatus
`and a method for determining whether a particular data
`item is present in the system or at a location in the
`
`system, by examining only the data identities of a
`
`plurality of data items.
`
`invention,
`
`Using the method or apparatus of the present
`the efficiency and integrity of a data
`
`processing system can be improved.
`The present invention
`improves the design and operation of a data storage
`system, file system, relational database, object-oriented
`
`database, or the like that stores a plurality of data
`items, by making possible or improving the design and
`
`GOOG-1005-Page 7 of 116
`
`GOOG-1005-Page 7 of 116
`
`

`

`wo 96132685
`
`PCT/U596/04733
`
`operation of at least some or all of the following
`features:
`
`the system stores at most one copy of any data
`
`item at a given location, even when multiple data names
`
`in the system refer to the same contents;
`
`the system avoids copying data from source to
`
`destination locations when the destination locations
`
`already have the data;
`
`the system provides transparent access to any
`
`data item by reference only to its identity and
`
`independent of its present location, whether it be local,
`
`remote, or offline;
`
`the system caches data items from a server, so
`that only the most recently accessed data items need be
`
`retained;
`
`when the system is being used to cache data
`
`items, problems of maintaining cache consistency are
`
`avoided;
`
`the system maintains a desired level of
`
`redundancy of data items in a network of servers,
`
`to
`
`protect against failure by ensuring that multiple copies
`
`of the data items are present at different locations in
`
`the system;
`
`the system automatically archives data items as
`
`they are created or modified;
`
`the system provides the size, age, and location
`
`of groups of data items in order to decide whether they
`
`can be safely removed from a local file system;
`
`the system can efficiently record and preserve
`
`any collection of data items;
`
`the system can efficiently make a copy of any
`
`collection of data items,
`
`to support a version control
`
`mechanism for groups of the data items;
`
`the system can publish data items, allowing
`
`other, possibly anonymous, systems in a network to gain
`
`access to the data items and to rely on the availability
`
`of the data items;
`
`GOOG-1005-Page 8 of 116
`
`GOOG-1005-Page 8 of 116
`
`

`

`WO 96/32685
`
`PCTIUS96l04733
`
`the system can maintain a local inventory of
`
`all the data items located on a given removable medium,
`
`such as a diskette or CD-ROM,
`
`the inventory is
`
`independent of other properties of the data items such as
`
`their name,
`
`location, and date of creation;
`
`the system allows closely related sets of data
`
`items, such as matching or corresponding directories on
`
`disconnected computers,
`
`to be periodically resynchronized
`
`with one another;
`
`the system can verify that data retrieved from
`
`another location is the desired or requested data, using
`
`only the data identifier used to retrieve the data;
`
`the system can prove possession of specific
`
`data items by content without disclosing the content of
`
`the data items, for purposes of later legal verification
`
`and to provide anonymity;
`
`the system tracks possession of specific data
`
`items according to content by owner,
`
`independent of the
`
`name, date, or other properties of the data item, and
`
`tracks the uses of specific data items and files by
`
`content for accounting purposes.
`
`other objects, features, and characteristics of
`
`the present invention as well as the methods of operation
`
`and functions of the related elements of structure, and
`
`the combination of parts and economies of manufacture,
`
`will become more apparent upon consideration of the
`
`following description and the appended claims with
`
`reference to the accompanying drawings, all of which form
`
`a part of this specification.
`
`C
`
`ON
`
`W
`
`FIGURE 1 depicts a typical data processing
`
`system in which a preferred embodiment of the present
`
`invention operates;
`
`FIGURE 2 depicts a hierarchy of data items
`
`stored at any location in such a data processing system;
`
`GOOG-1005-Page 9 of 116
`
`GOOG-1005-Page 9 of 116
`
`

`

`wo 96/32685
`
`PCT/US96/04733
`
`FIGURES 3-9 depict data structures used to
`
`implement an embodiment of the present invention; and
`
`FIGURES 10(a)-28 are flow charts depicting
`
`operation of various aspects_of the present invention.
`
`DEIAILED_DE5QBI2IIQE_QE_IHE_£BE§EHIL1_EBEEEBBED
`EKEM2LAEX_EMEQDIMEHI§
`
`An embodiment of the present invention is now
`
`described with reference to a typical data processing
`
`system 100, which, with reference to FIGURE 1,
`
`includes
`
`one or more processors (or computers) 102 and various
`
`storage devices 104 connected in some way, for example by
`a bus 106.
`
`Each processor 102 includes a CPU 108, a memory
`
`110 and one or more local storage devices 112.
`
`The CPU
`
`108, memory 110, and local storage device 112 may be
`
`internally connected, for example by a bus 114. Each
`
`processor 102 may also include other devices (not shown),
`
`such as a keyboard, a display, a printer, and the like.
`
`In a data processing system 100, wherein more
`
`than one processor 102 is used, that is,
`
`in a
`
`multiprocessor system,
`
`the processors may be in one of
`
`various relationships.
`
`For example,
`
`two processors 102
`
`may be in a client/server, client/client, or a
`
`server/server relationship. These inter-processor
`
`relationships may be dynamic, changing depending on
`
`particular situations and functions. Thus, a particular
`
`processor 102 may change its relationship to other
`
`processors as needed, essentially setting up a peer-to-
`
`peer relationship with other processors.
`
`In a peer-to-
`
`peer relationship, sometimes a particular processor 102
`
`acts as a client processor, whereas at other times the
`
`same processor acts as a server processor.
`
`In other
`
`words,
`
`there is no hierarchy imposed on or required of
`
`processors 102.
`
`In a multiprocessor system,
`
`the processors 102
`
`may be homogeneous or heterogeneous. Further,
`
`in a
`
`8
`
`GOOG-‘I005-Page 10 of 116
`
`GOOG-1005-Page 10 of 116
`
`

`

`wo 96/32685
`
`PCI‘/US96/04733
`
`multiprocessor data processing system 100, some or all of
`
`the processors 102 may be disconnected from the network
`
`of processors for periods of time.
`
`Such disconnection
`
`may be part of the normal operation of the system 100 or
`
`it may be because a particular processor 102 is in need
`
`of repair.
`
`Within a data processing system 100,
`
`the data
`
`may be organized to form a hierarchy of data storage
`
`elements, wherein lower level data storage elements are
`
`combined to form higher level elements. This hierarchy
`can consist of, for example, processors, file systems,
`regions, directories, data files, segments, and the like.
`
`For example, with reference to FIGURE 2,
`
`the data items
`
`on a particular processor 102 may be organized or
`
`structured as a file system 116 which comprises regions
`
`117, each of which comprises directories 118, each of
`
`which can contain other directories 118 or files 120.
`
`Each file 120 being made up of one or more data segments
`122.
`
`In a typical data processing system,
`
`some or
`
`all of these elements can be named by users given certain
`implementation specific naming conventions,
`the name (or
`pathname) of an element being relative to a context.
`In
`
`the context of a data processing system 100, a pathname
`is fully specified by a processor name, a filesystem
`
`name, a sequence of zero or more directory names
`
`identifying nested directories, and a final file name.
`
`(Usually the lowest level elements,
`
`in this case segments
`
`122, cannot be named by users.)
`
`In other words, a file system 116 is a
`
`collection of directories 118.
`
`A directory 118 is a
`
`collection of named files 120 -- both data files 120 and
`
`other directory files 118.
`
`A file 120 is a named data
`
`item which is either a data file (which may be simple or
`
`compound) or a directory file 118.
`
`A simple.file 120
`
`consists of a single data segment 122.
`
`A compound file
`
`120 consists of a sequence of data segments 122.
`
`A data
`
`GOOG-‘I005-Page 11 of 116
`
`GOOG-1005-Page 11 of 116
`
`

`

`W0 96,32,535
`
`PCT/US96/04733
`
`segment 122 is a fixed sequence of bytes. An important
`
`property of any data segment is its size,
`
`the number of
`
`bytes in the sequence.
`
`A single processor 102 may access one or more
`
`file systems 116, and a single storage device 104 may
`
`contain one or more file systems 116, or portions of a
`
`file system 116.
`
`For instance, a file system 116 may
`
`span several storage devices 104.
`
`In order to implement controls in a file
`
`system, file system 116 may be divided into distinct
`
`regions, where each region is a unit of management and
`
`control.
`
`A region consists of a given directory 118 and
`
`is identified by the pathname (user defined) of the
`
`directory.
`
`In the following,
`
`the term "location", with
`
`respect to a data processing system 100, refers to any of
`
`a particular processor 102 in the system, a memory of a
`
`particular processor, a storage device, a removable
`
`storage medium (such as a floppy disk or compact disk),
`
`or any other physical location in the system.
`
`The term
`
`"local" with respect to a particular processor 102 refers
`
`to the memory and storage devices of that particular
`
`processor.
`
`In the following,
`
`the terms "True Name", "data
`
`identity" and "data identifier" refer to the
`
`substantially unique data identifier for a particular
`
`data item.
`
`The term "True File" refers to the actual
`
`file, segment, or data item identified by a True Name.
`
`A file system for a data processing system 100
`
`is now described which is intended to work with an
`
`existing operating system by augmenting some of the
`
`operating system's file management system codes.
`
`The
`
`embodiment provided relies on the standard file
`
`management primitives for actually storing to and
`
`retrieving data items from disk, but uses the mechanisms
`
`of the present invention to reference and access those
`
`data items.
`
`GOOG-‘I005-Page 12 of 116
`
`GOOG-1005-Page 12 of 116
`
`

`

`WO 96/32685
`
`PCT/US96/04733
`
`The processes and mechanisms (services)
`provided in this embodiment are grouped into the
`following categories: primitive mechanisms, operating
`system mechanisms, remote mechanisms, background
`mechanisms, and extended mechanisms.
`
`Primitive mechanisms provide fundamental
`capabilities used to support other mechanisms.
`The
`
`following primitive mechanisms are described:
`
`1.
`
`Calculate True Name;
`
`Assimilate Data Item;
`
`New True File;
`
`Get True Name from Path;
`
`Link path to True Name;
`Realize True File from Docation;
`Locate Remote File;
`
`Make True File Local;
`
`Create scratch File;
`
`Freeze Directory;
`
`Expand Frozen Directory;
`
`12. Delete True File;
`
`13. Process Audit File Entry;
`
`14. Begin Grooming;
`
`15. Select For Removal; and
`
`16.
`
`End Grooming.
`
`operating system mechanisms provide typical
`familiar file system mechanisms, while maintaining the
`data structures required to offer the mechanisms of the
`
`present invention.
`operating system mechanisms are
`designed to augment existing operating systems, and in
`this way to make the present invention compatible with,
`and generally transparent to, existing applications.
`The
`following operating system mechanisms are described:
`
`Open File;
`
`Close File;
`
`Read File;
`
`write File;
`
`Delete File or Directory;
`
`11
`
`GOOG-‘I005-Page 13 of 116
`
`GOOG-1005-Page 13 of 116
`
`

`

`WO 96132685
`
`PC!‘/US96/04733
`
`6.
`
`7.
`
`8.
`
`9.
`
`Copy File or Directory;
`
`Move File or Directory;
`
`Get File status; and
`
`Get Files in Directory.
`
`Remote mechanisms are used by the operating
`
`system in responding to requests from other processors.
`
`These mechanisms enable the capabilities of the present
`
`invention in a peer-to-peer network mode of operation.
`
`The following remote mechanisms are described:
`
`1.
`
`2.
`
`3.
`
`4.
`
`5.
`
`6.
`
`7.
`
`8.
`
`9.
`
`Locate True File;
`
`Reserve True File;
`
`Request True File;
`
`Retire True File;
`
`Cancel Reservation;
`
`Acquire True File;
`
`Lock Cache;
`
`Update Cache; and
`
`Check Expiration Date.
`
`Background mechanisms are intended to run
`
`occasionally and at a low priority. These provide
`
`automated management capabilities with respect to the
`
`present invention.
`are described:
`
`The following background mechanisms
`
`1. Mirror True File;
`
`2.
`
`3.
`
`4.
`5.
`
`Groom Region;
`
`Check for Expired Links; and
`
`Verify Region; and
`Groom Source List.
`
`Extended mechanisms run within application
`
`programs over the operating system. These mechanisms
`
`provide solutions to specific problems and applications.
`
`The following extended mechanisms are described:
`
`1.
`
`2.
`
`3.
`
`4.
`
`5.
`
`Inventory Existing Directory;
`
`Inventory Removable, Read-only Files;
`
`Synchronize directories;
`
`Publish Region;
`
`Retire Directory;
`
`12
`
`GOOG-1005-Page 14 of 116
`
`GOOG-1005-Page 14 of 116
`
`

`

`WO 96/32685
`
`PCI‘/US96/04733
`
`6.
`
`Realize Directory at location;
`
`7. Verify True File;
`
`8.
`
`9.
`
`Track for accounting purposes;
`
`Track for licensing purposes.
`
`The file system herein described maintains
`
`sufficient information to provide a variety of mechanisms
`not ordinarily offered by an operating system,
`some of
`which are listed and described here. Various processing
`performed by this embodiment of the present invention
`will now be described in greater detail.
`
`In some embodiments, some files 120 in a data
`processing system 100 do not have True Names because they
`have been recently received or created or modified, and
`thus their True Names have not yet been computed.
`A file
`that does not yet have a True Name is called a scratch
`file.
`The process of assigning a True Name to a file is
`referred to as assimilation, and is described later.
`
`Note that a scratch file may have a user provided name.
`
`Some of the processing performed by the present
`invention can take place in a background mode or on a
`delayed or as-needed basis. This background processing
`is used to determine information that is not immediately
`required by the system or which may never be required.
`As an example,
`in some cases a scratch file is being
`changed at a rate greater than the rate at which it is
`useful to determine its True Name.
`In these cases,
`determining the True Name of the file can be postponed or
`performed in the background.
`
`D§£§_§§IEE£2£§§
`
`The following data structures, stored in memory
`110 of one of more processors 102 are used to implement
`the mechanisms described herein.
`The data structures can
`be local to each processor 102 of the system 100, or they
`can reside on only some of the processors 102.
`
`GOOG-1005-Page 15 of 116
`
`GOOG-1005-Page 15 of 116
`
`

`

`wo 96l32685
`
`PCT/US96l04733
`
`The data structures described are assumed to
`
`reside on individual peer processors 102 in the data
`processing system 100. However,
`they can also be shared
`
`by placing them on a remote, shared file server (for
`
`instance,
`
`in a local area network of machines).
`
`In order
`
`to accommodate sharing data structures, it is necessary
`
`that the processors accessing the shared database use the
`
`appropriate locking techniques to ensure that changes to
`the shared database do not interfere with one another but
`
`are appropriately serialized. These locking techniques
`
`are well understood by ordinarily skilled programmers of
`
`distributed applications.
`
`It is sometimes desirable to allow some regions
`
`to be local to a particular processor 102 and other
`
`regions to be shared among processors 102.
`
`(Recall that
`
`a region is a unit of file system management and control
`
`consisting of a given directory identified by the
`
`pathname of the directory.)
`
`In the case of local and
`
`shared regions, there would be both local and shared
`
`versions of each data structure.
`
`Simple changes to the
`
`processes described below must be made to ensure that
`
`appropriate data structures are selected for a given
`
`operation.
`
`The local directory extensions (LDE)
`table 124
`is a data structure which provides information about
`
`files 120 and directories 118 in the data processing
`
`system 100.
`
`The local directory extensions table 124 is
`
`indexed by a pathname or contextual name (that is, a user
`
`provided name) of a file and includes the True Name for
`
`most files.
`
`The information in local directory extension
`
`table 124 is in addition to that provided by the native
`
`file system of the operating system.
`
`The True File registry (TFR) 126 is a data
`
`store for listing actual data items which have True
`
`Names, both files 120 and segments 122. When such data
`
`items occur in the True File registry 126 they are known
`
`as True Files. True Files are identified in True File
`
`GOOG-1005-Page 16 of 116
`
`GOOG-1005-Page 16 of 116
`
`

`

`WO 96/32685
`
`PCT/US96l04733
`
`registry 126 by their True Names or identities.
`
`The
`
`table True File registry 126 also stores location,
`
`dependency, and migration information about True Files.
`
`The region table (RT) 128 defines areas in the
`
`network storage which are to be managed separately.
`Region table 128 defines the rules for access to and
`
`migration of files 120 among various regions with the
`local file system 116 and remote peer file systems.
`
`The source table (ST) 130 is a list of the
`
`sources of True Files other than the current True File
`registry 126.
`The source table 130 includes removable
`
`volumes and remote processors.
`
`The audit file (AP) 132 is a list of records
`indicating changes to be made in local or remote files,
`these changes to be processed in background.
`The accounting log (AL) 134 is a log of file
`transactions used to create accounting information in a
`manner which preserves the identity of files being
`tracked independent of their name or location.
`
`The license table (LT) 136 is a table
`identifying files, which may only be used by licensed
`users,
`in a manner independent of their name or location,
`and the users licensed to use them.
`
`t ' e Des
`
`' tio s
`
`t
`
`t St uctu es
`
`The following table summarizes the fields of an
`local directory extensions table entry, as illustrated by
`record 138 in FIGURE 3.
`
`J nesenuon
`
`Region ID
`'
`
`identifies the region in which this file is
`contained.
`
`Pathname
`
`True Name
`
`the user provided name or contextual name
`of the file or directory, relative to the
`re ion in which it occurs.
`
`the computed True Name or identity of the
`file or directory. This True Name is not
`always up to date, and it is set to a
`special value when a file is modified and
`is later recom-uted in the back-round.
`
`GOOG-1005-Page 17 of 116
`
`GOOG-1005-Page 17 of 116
`
`

`

`WO 96/32685
`
`PCTIUS96I04733
`
`i T
`
`indicates whether the file is a data file
`or a directo .
`
`ype
`
`Scratch
`File ID
`
`Time of
`last
`access
`
`Time of
`last modi-
`fication
`
`Safe flag
`
`the physical location of the file in the
`file system, when no True Name has been
`calculated for the file. As noted above,
`such a file is called a scratch file.
`
`If this
`the last access time to this file.
`file is»a directory, this is the last
`access time to an
`file in the director .
`
`If
`the time of last change of this file.
`this file is a directory, this is the last
`modification time of any file in the
`director .
`
`indicates that this file (and, if this file
`is a directory, all of its subordinate
`files) have been backed up on some other
`system, and it is therefore safe to remove
`them.
`
`Lock flag .
`
`indicates
`
`is, it is
`cessor or
`-rocessor
`
`whether a file is locked, that
`being modified by the local pro-
`a remote processor.
`only one
`modif
`a file at a time.
`
`the full size of this directory (including
`all subordinate files), if all files in it
`were fully expanded and duplicated.
`For a
`file that is not a directory this is the
`size of the actual True File.
`
`the identity of the user who owns this
`file, for accounting and license tracking
`-ur-oses.
`
`Each record of the True File registry 126 has
`
`the fields shown in the True File registry record 140 in
`FIGURE 4.
`
`The True File registry 126 consists of the
`
`database described in the table below as well as the
`
`actual True Files identified by the True File IDs below.
`
`T T
`
`computed True Name or identity of
`the file.
`
`rue Name
`
`GOOG-1005-Page 18 of 116
`
`GOOG-1005-Page 18 of 116
`
`

`

`W0 96/32685
`
`PCT/US96/04733
`
`J C
`
`ompressed
`File ID
`
`compressed version of the True File
`may be stored instead of, or in
`addition to, an uncompressed
`version. This field provides the
`identity of the actual
`representation of the compressed
`version of the file.
`
`Grooming
`delete count
`
`tentative count of how many
`references have been selected for
`deletion during a grooming
`o-eration.
`
`Time of last most recent date and time the
`access
`content of this file was accessed.
`
`Expiration
`1
`
`date and time after which this file
`ma
`be deleted b
`this server.
`
`Dependent
`processors
`
`processor IDs of other processors
`which contain references to this
`True File.
`
`True File ID
`
`source ID(s) of zero or more
`sources from which this file or
`data item ma
`be retrieved.
`
`identity or disk location of the
`actual physical representation of
`the file or file segment.
`It is
`sufficient to use a filename in the
`registration directory of the
`underlying operating system.
`True File ID is absent if the
`actual file is not currently
`-resent at the current location.
`
`The
`
`Use count
`
`number of other records on this
`pr

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket