`Distributed Software Repositories*
`
Shirley Browne†, Jack Dongarra, Stan Green, Keith Moore,
`Theresa Pepin, Tom Rowan, and Reed Wade
`University of Tennessee
`Eric Grosse
`AT&T Bell Laboratories
`
`Abstract
`
`A location-independent naming system for network re-
`sources has been designed to facilitate organization and de-
`scription of software components accessible through a vir-
`tual distributed repository. This naming system enables
`easy and efficient searching and retrieval, and it addresses
`many of the consistency, authenticity, and integrity issues
`involved with distributed software repositories by providing
`mechanisms for grouping resources and for authenticity and
`integrity checking. This paper details the design of the nam-
`ing system, describes a prototype implementation of some of
`the capabilities, and describes how the system fits into the
development of the National HPCC Software Exchange, a
`virtual software repository that has the goal of providing ac-
`cess to reusable software components for high-performance
`computing.
`
1 Introduction
`
Well-maintained software repositories are central to software reuse because they make high-quality software widely available and easily accessible. One such repository is Netlib¹, a collection of high-quality publicly available mathematical software [6, 4]. Netlib, in operation since 1985, currently processes over 300,000 requests a day. Netlib is serving as a prototype for development of the National HPCC Software Exchange (NHSE)², which has the goal of encompassing all High Performance Computing and Communications (HPCC) software repositories and of promoting reuse of software components developed by Grand Challenge and other scientific computing researchers [5]. Other network-accessible software repositories include ASSET³, CARDS⁴, DSRS⁵, ELSA⁶, the GAMS Virtual Software Repository⁷, and STARS⁸. ASSET, CARDS, DSRS, and ELSA are participating in an interoperability experiment that allows a user of any one of these repositories to access software exported from the other repositories.

* The work described in this paper is sponsored by NASA under Grant No. NAG 5-2736, by the National Science Foundation under Grant No. ASC-9109853, and by AT&T Bell Laboratories.
† Author to whom correspondence should be directed. Ayres Hall, Computer Science Department, University of Tennessee, Knoxville, TN 37996-1301, (615) 974-5886, browne@cs.utk.edu
¹ Accessible from a World Wide Web browser at http://www.netlib.org/
² Accessible at http://www.netlib.org/nse/

`The software reuse marketplace is expanding in at least
`two dimensions. One dimension is the expansion from intra-
`organizational reuse to inter-organizational reuse. For ex-
`ample, various federal agencies have established their own
`internal software reuse programs. Several efforts are now
`underway to promote reuse of software across agencies.
`Similarly, companies are becoming interested in accessing
`software produced by academic and government research
groups. Another dimension of expansion is from reuse within
a particular application domain to interdisciplinary reuse.
Reuse of software from other disciplines is being fostered,
`for example, by efforts to solve interdisciplinary Grand Chal-
`lenge problems. Solution of such problems will require col-
`laboration by scientists from different disciplines, as well as
`sharing of software produced by application and computer
`scientists.
`Another recent development that affects the software
`reuse marketplace is the growth of the World Wide Web
`(WWW), together with the ease with which individuals may
`make resources available on a WWW server. A contributor
need only make the files composing a resource available
`on a file server and make available a descriptive HTML file
`containing pointers to the resource files.
Growth in the popularity of the Internet and the World
`Wide Web, as well as the wide availability of WWW
`client and server software, has accelerated the shift from
`centrally maintained software repositories to virtual, dis-
`tributed repositories. For example, the GAMS Repository,
`once a central repository, is now a virtual repository that
`catalogs software maintained by other repositories [2]. Sim-
`ilarly, the NHSE will provide a uniform interface to a virtual
HPCC software repository that will be built on top of a distributed
set of discipline-oriented repositories [5], as shown
`in Figure 1.
`The main advantage of distributing a repository is to
`
³ Accessible at http://source.asset.com/
⁴ Accessible at http://dealer.cards.com/
⁵ Accessible at http://ssed1.ims.disa.mil/srp/dsrspage.html
⁶ Accessible at http://rbse.mountain.net/ELSA/elsalob.html
⁷ Accessible at http://gams.nist.gov/
⁸ Accessible at http://www.stars.ballston.paramax.com/index.html
`
`EMCVMW 1002
allow the software to be maintained by those in the best position to keep it up-to-date. Also, copies of popular software packages may be mirrored by a number of sites to increase availability (e.g., if one site is unreachable, the software may be retrieved from a different site) and to prevent bottlenecks.

Despite the benefits, distributed maintenance and mirroring of software pose the following challenges.

• [...] information and presenting a uniform searching and browsing interface become much more difficult.

• The WWW mechanism of specifying a file by its Uniform Resource Locator (URL) is inadequate for ensuring the consistency and currency of mirrored copies, as a URL for an independently mirrored copy of a software package may point to an out-of-date copy and give no indication that it is not up-to-date. Furthermore, mirror copies of a file cannot be located from a URL reference, since each copy has a different URL.

• Consistency between a set of files that are meant to be used together must be maintained. For example, the Netlib Software Repository provides dependency checking that allows the user to retrieve a top-level routine plus all routines in its dependency tree (i.e., those routines that are called directly or indirectly by the top-level routine). Another example is a graphical parallel programming environment that relies on an underlying parallel communications support package. The problem becomes more complex when different pieces might be retrieved from different physical repositories. Ideally, the user should be able to have a consistent set retrieved automatically without having to scan documentation to verify that compatible pieces have been retrieved.

• As the number of reuse libraries grows, users cannot be expected to access each of them separately using a different interface. Thus, scalable interoperability between separately managed repositories is needed.

• In the environment of accessing a few well-established repositories that the user knows and trusts, a user is assured of the integrity and authenticity of a retrieved file because these properties are provided by the administrative procedures of that repository. With a large number of less familiar repositories, however, it becomes necessary to establish interoperable trust mechanisms and to reduce the number of parties with whom the user must establish trust.

• The more decentralized and smaller the individual repositories become, the less practical it becomes for each individual repository to provide the full range of search and authentication services.

Figure 1: Virtual Repository Architecture

Most of the above problems can be alleviated by implementing a location-independent naming system that includes mechanisms for authenticity and integrity checking. We have designed a naming system that provides for two levels of naming. The binding between a lower-level name (called a LIFN) and file contents is unchangeable and verifiable. A lower-level name may be resolved to multiple, mirrored copies. In the case where it represents a set of files, the name may be resolved to a list of other names. A higher-level name (called a URN) is associated with a cataloging record that includes the lower-level name as well as other descriptive information. This record may be cryptographically signed by the publisher so that users may verify the authenticity of a retrieved resource. At any given time, a higher-level name is associated with exactly one lower-level name, but this binding may change over time. Higher-level names allow for long-lived human-readable references, while lower-level names permit reliable caching and mirroring as well as permitting precise references when needed. Location-independent names will be the basis of transparent mirroring. They will also provide a unique key to which third parties may attach value-added information such as additional cataloging information and quality assessments.

This paper describes the design of our naming system. We also describe our implementation of a prototype name-to-location service and of a modified WWW client that does name resolution. A glossary of acronyms and terms used in this paper is included as an appendix.

2 Related Work

The use of a public-key encryption technique for authenticating the source of a software component and for ensuring that the component has not been altered subsequent to its publication is proposed in [9]. Cryptographic information, in the form of a digital signature created by signing the hashed digest of the contents of a component, is included within the component's unique identifier. The proposed method is intended to prevent not only changes by unauthorized parties, but also changes by the original author, i.e., the author is not permitted to modify a component without assigning a new unique identifier. The method assumes that each author has been assigned a globally unique AuthorID, has chosen an asymmetric public/private key pair, and has publicized the public key to the community of potential reusers. A newly chosen symmetric encryption key is used to encrypt the component itself. Then the symmetric key, the hashed digest of the component, and the Author ID are concatenated and encrypted using the asymmetric private key, and the result is concatenated to the cleartext version of the Author ID to create the unique identifier for the component. The method does not address name-to-location resolution, other than to say that the encrypted component is made available along with the unique identifier and any other cleartext information. The proposed unique identifier is similar to our LIFN, and encryption of the hash digest and AuthorID is similar to our method of having the author cryptographically sign a catalogue record that includes the author name and the file's MD5 signature. Our method allows a choice of encryption algorithms, however, and allows the digital signature used for authentication to be generated
independently and at a different time from the component's identifier.

Functional requirements for Uniform Resource Names (URNs) are proposed in [12] by the IETF Uniform Resource Identification (URI) Working Group. According to [12], the function of a URN is to provide a globally unique, persistent identifier used for recognition of and for access to characteristics of a resource or to the resource itself. URN assignment is delegated to naming authorities, the names of which are persistent and globally unique, and who may assign names directly or delegate their authority to sub-authorities. Global uniqueness of URNs is guaranteed by requiring each naming authority to guarantee uniqueness within its portion of the URN namespace. It is left up to each naming authority to determine the conditions under which it will issue a URN (for example, whether or not to issue a new URN when the contents of a file change). Some test implementations of URNs are underway by members of the URI Working Group at Georgia Tech and Bunyip Corporation⁹. The Georgia Tech testbed uses the whois++ protocol for URN to URC resolution. A URC, or Uniform Resource Characteristic, is a catalog record which includes locations, or URLs, at which the resource may be accessed. The URC server supports searching by other attributes, in addition to URN lookup, via the whois++ protocol. A modified version of Mosaic that does URN to URC resolution is available. A proxy server based on CERN httpd that does caching by URNs is also running at Georgia Tech.

As part of the Computer Science Technical Report (CSTR) project [8], which is developing an architecture for distributed digital document libraries, the Corporation for National Research Initiatives (CNRI) is implementing a name-to-location resolution service called the Handle Management System (HMS)¹⁰. CNRI's handle is a name for a digital object and is analogous to IETF's URN. The HMS includes a Handle Generator that a naming authority may run and use to create globally unique handles, Handle Servers that process update requests from naming authorities and query requests from clients to resolve handles, and a Handle Server Directory that maps a handle to the appropriate Handle Server. The distribution of handles to Handle Servers is based on a hashing algorithm. An electronic mail interface is used by handle administrators to add, delete, and modify handle entries in the Handle Server database. Clients use a UDP datagram interface to request location data associated with a handle. A modified version of Mosaic that does handle resolution is available from CNRI. The types of location information stored by Handle Servers include URL, repository name, email address, and X.500 Distinguished Name. Use of a repository name by a client requires another round of name-to-location resolution. CNRI's properties record that describes the properties of a digital object is analogous to IETF's URC. The properties record is not stored by the HMS, but rather by an Information and Reference (IR) Server that is to be maintained by each repository. Each naming authority may also maintain an IR server containing a properties record for each digital object within its authority.

⁹ More information is available at http://www.gatech.edu/iiir/
¹⁰ More information is available at http://www.cnri.reston.va.us/

3 Publishing and Name Assignment

Internet-accessible resources are currently referenced using Uniform Resource Locators (URLs). Because URLs are locations rather than names, their use as references presents at least two problems. One problem is that files get moved, changing their URLs. Then pointers that contain the old URLs become stale. One can leave a forwarding address at the old URL, but forwarding addresses are an awkward and inelegant solution. Another problem with using URLs as references is that mirrored copies of files cannot be located from a URL reference, since each copy has a different URL.

It has been widely recognized that a solution to the above problems is to assign location-independent names to files and to provide a name-to-location service that, given a name, returns a list of locations for that name. A resource provider who moves some files need only delete the old name-to-location bindings and register the new bindings with the name-to-location service. Likewise, a site that mirrors a copy of a file need only register its location with the name-to-location service. Then a user attempting to retrieve the file corresponding to a location-independent name may query the name-to-location service for a list of alternative locations to be tried.

Our work is similar to the IETF's Uniform Resource Identifier Working Group's work on Uniform Resource Names (URNs) [12] and to CNRI's work on unique document identifiers for digital libraries [8]. However, neither of these groups has addressed the reliability and consistency issues addressed by our two-level naming system. Our system includes a lower-level name called a Location Independent File Name (LIFN) and a higher-level name called a Uniform Resource Name (URN).

An important question is whether the byte contents of the file referred to by a location-independent name should be fixed or be allowed to change. If the byte contents are allowed to change, then a further question arises as to what should be the consistency requirements for alternative locations for the same name. Valid arguments for both cases can be made for different situations. For example, for software resources it is desirable to have an unambiguous reference to the fixed byte contents for the purpose of attaching a review or reporting experimental or performance results. Fixed contents also make it possible to compute a file digest that may be cryptographically signed by the author of the resource, allowing verification of the integrity of a retrieved file. On the other hand, it is desirable to have a reference to a software package that need not be changed every time a bug fix or minor revision takes place, especially if the cataloging information (e.g., title, author, abstract) does not change. The cataloging information for a software package might contain a reference to a Web page describing and/or documenting the package. The author of the Web page would like to be able to update the page without having to change all the references to it. A non-software example where it would be desirable to allow contents to change is a name that refers to a file containing the "current weather map".

Because both types of name are needed, we have implemented both. The type of name that refers to fixed byte contents is called a Location Independent File Name, or LIFN. Once a LIFN has been assigned to a particular sequence of bytes, that binding may not be changed. The type of name for which the contents to which it refers may change is called a Uniform Resource Name, or URN.

We divide the file access system into two levels. The upper level is where publishing, cataloging, and searching activities take place. These upper-level activities are concerned with the semantic, or intellectual, contents of files. The lower level is where distribution, mirroring, and caching activities occur. These lower-level activities are not concerned with the semantic contents of files, only with ensuring that files may be accessed efficiently and that the byte contents of files are not corrupted.
`The above arguments about the need for two types of
`name pertain to the upper level. At the lower level, there is
`a need for LIFNs, but not for URNs. Mirror sites use LIFNs
`and their associated file digests to ensure that their copies of
`files have not been corrupted. A cache site needs to be able
`to tell a user or client program whether it holds a copy of
`a requested file, and for this purpose it can answer whether
or not it holds a copy of a particular LIFN.
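
As an illustration of the lower level, the following Python sketch shows how a mirror or cache site might index its files by LIFN, answer whether it holds a given LIFN, and re-check a copy against its recorded digest before serving it. The class `LifnCache`, its method names, and the example LIFN string are hypothetical and are not taken from our implementation; MD5 is used only because it is the digest named in this paper.

```python
import hashlib


class LifnCache:
    """Hypothetical in-memory cache keyed by LIFN (illustration only)."""

    def __init__(self):
        # Maps a LIFN to (hex MD5 digest recorded at insertion, file bytes).
        self._store = {}

    def put(self, lifn, data):
        """Store a copy of the file and remember its digest."""
        self._store[lifn] = (hashlib.md5(data).hexdigest(), data)

    def holds(self, lifn):
        """Answer whether this site holds a copy of the given LIFN."""
        return lifn in self._store

    def get_verified(self, lifn):
        """Return the bytes only if they still match the recorded digest."""
        digest, data = self._store[lifn]
        if hashlib.md5(data).hexdigest() != digest:
            raise ValueError("cached copy of %s is corrupted" % lifn)
        return data
```

Because a LIFN is bound to fixed byte contents, a positive answer from `holds` never needs to be qualified by a version or timestamp.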
`The above considerations led us to implement LIFNs at
the lower level of the file access system and URNs at the
`upper level, but to make LIFNs visible at the upper level
`as well. A publisher will be responsible for assigning both
a URN and a LIFN to any resource for which cataloging
`information is provided. For other files, only LIFNs need be
provided. At any given time, a URN that refers to a file or a
set of files is associated with exactly one LIFN. A URN may
be associated with a set of different LIFNs over the URN's
`lifetime, but we require that the set be in the form of a linear
`sequence, with the sequence order given by increasing time.
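
The binding rule above can be sketched in Python as an append-only history. The class `UrnRecord` and its methods are hypothetical names chosen for illustration, and the numeric timestamps are an assumption about how "increasing time" might be represented; the sketch only demonstrates that a URN has exactly one current LIFN and a linear, time-ordered sequence of past ones.

```python
import time


class UrnRecord:
    """Hypothetical record of the LIFNs bound to one URN over its lifetime."""

    def __init__(self, urn):
        self.urn = urn
        self._history = []  # (timestamp, lifn) pairs, oldest first

    def bind(self, lifn, timestamp=None):
        """Bind the URN to a new LIFN; bindings must move forward in time."""
        ts = time.time() if timestamp is None else timestamp
        if self._history and ts <= self._history[-1][0]:
            raise ValueError("bindings must be appended in increasing time order")
        self._history.append((ts, lifn))

    def current_lifn(self):
        """Return the single LIFN associated with the URN right now."""
        if not self._history:
            raise LookupError("no LIFN has been bound to %s" % self.urn)
        return self._history[-1][1]

    def history(self):
        """Return every LIFN ever bound, in sequence order."""
        return [lifn for _, lifn in self._history]
```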
The LIFN and URN name spaces are subdivided among several publishers, also called naming authorities, who are responsible for ensuring the uniqueness of names assigned within their portions of the name spaces. A name is formed by concatenating the registered naming authority identifier with a unique string assigned by the naming authority. The LIFN and URN are formatted as

LIFN:<publisher id>:string
URN:<publisher id>:string

The publisher id portion of the name is used to locate appropriate URN and LIFN servers for that publisher. Given a URN, a URN server returns a Uniform Resource Citation (URC) for that URN that includes its currently associated LIFN, as well as other cataloging information. Given a LIFN, a LIFN server returns a list of locations for that LIFN. More information about accessing URCs and files from their URNs and LIFNs may be found in Section 4.

The publisher provides cataloging information for each URN it assigns. The catalog record includes information such as title, author, abstract, etc. A recommended set of attributes for software assets is given by the Reuse Library Interoperability Group (RIG) Basic Interoperability Data Model [1]. In addition, the catalogue record for a URN includes its currently associated LIFN, as well as an MD5 or similar fingerprint for that LIFN. This fingerprint is a 128-bit quantity resulting from applying the MD5 function to the contents of the file. The function is designed to make it computationally infeasible to find a different sequence of bytes that produces the same fingerprint [10]. To enable authentication, the entire description may be cryptographically signed, as discussed in Section 5. Portions of the catalog record may be exported to resource discovery servers, such as a Harvest Broker [3], which provide search services based on resource descriptions. The URN exported to the search service provides a unique long-lived key, so that descriptions may be unambiguously associated with a resource, and so that a resource turns up at most once in a list of search hits.

For a name to be useful, there must be some means of resolving a name to a location from which the resource can be retrieved or accessed. Thus, the publisher, as well as any other parties that mirror the resource, must register such locations with the appropriate name-to-location lookup services. Such name-to-location services are discussed in Section 4.

Thus, publishing a resource involves the following steps, shown in Figure 2:

1. creating the resource's catalog record in the form of a URC,
2. signing the catalog record with the publisher's private key,
3. making the resource files available on one or more file servers,
4. registering the file locations with the LIFN server,
5. registering the URC with the URN server,
6. informing mirror sites of the new or updated file,
7. exporting relevant portions of the URC to search services.

Figure 2: Publishing steps

Steps 1 and 7 have been discussed above. Step 2 is discussed in Section 5, and Steps 3, 4, and 5 are discussed in Section 4.

4 Name Resolution and File Mirroring

Resources available from the virtual repository will be named by URNs and/or LIFNs, rather than by URLs. Thus, WWW clients will need a means of resolving a URN or LIFN to one or more locations, expressed in the form of a URL, to be able to access the resource. Access to files is provided by conventional file servers, using protocols such as HTTP, Gopher, and FTP.

For a non-file resource, such as a database service, a list of locations is associated directly with the URN for that resource. For a file resource, such as a file containing a piece of software, the relationship between the URN and the locations is indirect, via a LIFN: the URN is associated with a LIFN, and the LIFN is associated with a list of URLs.

The LIFN-to-location mapping service is provided by a network of LIFN servers, collectively called the LIFN database. These servers process queries for locations of
LIFNs. They also accept updates from file servers containing new locations for LIFNs, as well as requests to delete old LIFN-to-location mappings. A naming authority may run its own LIFN servers, or it may find another organization willing to provide the service on its behalf.

The URN service is similar to the LIFN service, except that it maps a name either to a list of locations or to a URC that includes a LIFN. For fault tolerance and availability, the URN service is also provided by a network of servers.

Mappings from naming authority identifiers to URN and LIFN servers are stored in the Domain Name System (DNS) name space, so that a client program can determine which URN (LIFN) server to query for a particular URN (LIFN). Our current client uses an ordinary DNS lookup for IP address records. The publisher identifier is prepended to the string .LIFN.NETLIB.ORG (for a LIFN) or .URN.NETLIB.ORG (for a URN). The resulting string is treated as if it were the name of an Internet host, and DNS is queried to find the IP addresses of that host. For example, to find a LIFN server for the naming authority foo, the client would look up the IP addresses for foo.LIFN.NETLIB.ORG. Several IP addresses may be listed for any one naming authority. Our client attempts to query each IP address until it finds one that can satisfy the LIFN or URN lookup request.

Thus, the steps involved in resolving a URN so as to access a copy of the file it names are as follows, as shown in Figure 3:

1. Use DNS to locate an appropriate URN server.
2. Query the URN server to retrieve the URC which contains the currently associated LIFN.
3. Authenticate the URC if desired.
4. Use DNS to locate an appropriate LIFN server.
5. Query the LIFN server to retrieve a list of locations.
6. Choose a location from which to retrieve the file.

Figure 3: File access steps

In practice, Steps 4 through 6 will often be replaced by using the LIFN to access a local cache server. Because the binding between a LIFN and the byte contents it points to is fixed, the cached copy is sure to be correct.

A file server can mirror a file by acquiring a copy of it and posting an update to a LIFN server for the file's naming authority. If a file server moves or deletes a file, then it would post that information as well. It is not necessary to keep all LIFN servers for a particular naming authority perfectly synchronized. Such synchronization would entail too much overhead. Instead, location updates are posted to any LIFN server and propagated to other peer servers using a batch update protocol.

Updates to the URN server are posted by the publisher and by others authorized by the publisher to update the catalog record for a given URN. In order to ensure a consistent linear history of updates to the catalog record for a URN (e.g., the sequence of LIFNs associated with that URN), replicated URN servers use a master-slave update protocol.

One of the most important aspects of our use of LIFNs is that it assures the user of retrieving the most up-to-date copy of a file referenced by a URN, without the overhead of a replica control protocol between file servers mirroring that file, which in general will not all be under the control of the URN's naming authority. This assurance is modulo the time required for the master-slave update protocol for the replicated URN servers, but a user who insists on contacting the master URN server is assured of getting the most up-to-date copy.

5 Authenticity, Integrity, and Consistency of Resources

Authentication of a resource verifies that the resource was published by its purported publisher. Verifying the integrity of a file ensures that the file has not been modified. Provisions for authenticity and integrity checking are necessary for a software repository because there have been instances of software packages stored on a public repository that were modified by intruders to introduce security holes which were then spread to other systems¹¹. Our authentication and integrity mechanisms are similar to those described in [11] and [9]. Recall from Section 3 that a publisher cryptographically signs the catalog record for a resource. In the case of a file resource, this record includes the file's LIFN and MD5 fingerprint. Any client in possession of the publisher's public key can verify the authenticity of the resource description. Publishers are expected to widely advertise their public keys to make it difficult for an attacker to substitute rogue keys. In addition, publishers may have their keys certified by trusted third parties to further establish their authenticity, as in [11].

¹¹ For an example, see the CERT advisory at ftp://ftp.cert.org/pub/cert_advisories/CA-94:07.wuarchive.ftpd.trojan.horse

Provided that the association between a LIFN and a file signature (e.g., the MD5 fingerprint) is known to be correct (either because the signature is part of the LIFN or because of the description authentication described in the preceding paragraph), a client may perform an integrity check on a retrieved file by computing the signature for the file and comparing it with the one known to be associated with the file's purported LIFN. Recall from Section 4 that a LIFN server returns a list of locations for a given LIFN but does not guarantee the correctness of those locations. A location may be incorrect if it no longer exists or if the contents of that location are wrong. In the former case, no file will be returned from that location. The latter condition may be detected by the client performing an integrity check.

To ensure consistency within a group of related files, we allow a URN to refer to a set of files. There are at least two
cases where this might occur. One case is where a resource consists of a number of related files, for example the files making up a software package. Of course, such a set of files could be made available instead as a single tar file. If a file can be used in more than one package, however, or if some files are also of use individually, it might be preferable to make the files available separately. Another case is when there are alternative versions of a file, for example multiple precisions of a Fortran routine, or multiple formats of an image.

The first case is handled by ordering the files making up the resource and considering the ordered list of LIFNs for these files to be the contents of another file which we call the composite-parts-list for the resource. The composite-parts-list file itself has a LIFN, and it is this LIFN that is associated with the URN for the resource. The second case is handled in a similar manner, but the file containing the ordered list of LIFNs is called the alternative-parts-list for the resource. The parts-list may contain additional information, such as how the alternative parts vary. After retrieving a parts-list, the client program will invoke a special module for handling it, similar to how current browsers invoke viewers for image or sound files. This module will assist the user in retrieving the component files and saving or displaying them locally.

server a query containing a LIFN causes a list of URLs to be returned, possibly along with other information. Sending a BFD LIFN server an update containing a LIFN/URL pair (and possibly additional location-specific descriptive information) causes that pair to be added to the database.

The URN database and protocols have been implemented in an analogous manner. The current URN server stores only the LIFN attribute in the URC for a URN.

To test the system, LIFNs were assigned to the software components making up the LAPACK directory in Netlib (around 2500 files total). Each of these LIFNs was of the form

lifn:netlib:<signature>

where <signature> is the ascii form of the MD5 signature of the file. The URLs listed for the LIFNs were of the form

<protocol>://<hostname>/<path>/<lifn>

When a client program requests such a URL from the file server, the file server either returns a file that is correct for the given LIFN, or it returns an error indicating that the file corresponding to that LIFN was not found. The overhead for assigning these LIFNs involved running a script that computed the MD5 signatures and generated the LIFNs, created a directory that aliased the ascii form of the MD5 signatures to the actual file locations, and registered the LIFN-to-URL mapping with the LIFN server. For the 2482 files in the test described above, this script took 2 minutes 35 seconds CPU time.
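
The LIFN and URL forms used in this test can be sketched in Python as follows. This is not the script used for the test; the function names `make_lifn`, `make_url`, and `verify` are hypothetical, and the sketch assumes that the ascii form of the signature is the lowercase hexadecimal MD5 digest.

```python
import hashlib


def make_lifn(data, publisher="netlib"):
    """Build lifn:<publisher>:<signature> from the file's bytes."""
    # Assumption: the ascii signature is the hexadecimal MD5 digest.
    return "lifn:%s:%s" % (publisher, hashlib.md5(data).hexdigest())


def make_url(protocol, hostname, path, lifn):
    """Build <protocol>://<hostname>/<path>/<lifn> for one file server."""
    return "%s://%s/%s/%s" % (protocol, hostname, path.strip("/"), lifn)


def verify(data, lifn):
    """Integrity check: recompute the LIFN and compare with the expected one."""
    publisher = lifn.split(":")[1]
    return make_lifn(data, publisher) == lifn
```

Because the signature is part of the name in this scheme, a client can run a check like `verify` on any retrieved copy without consulting the catalog record.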
`
`6 Prototype Implementation
`
`The naming system is being implemented as part of the
`Bulk File Distribution (BFD) package. BFD is part of the
`implementation of the National HPCC Software Exchange
(NHSE), which is being developed by the Center for Research
in Parallel Computing (CRPC), a consortium of universities
and national laboratories formed to make high performance
and parallel computing accessible to engineers and
scientists. BFD URN and LIFN servers will run at all the
CRPC participating sites, as well as at other major NHSE
`sites, such as Oak Ridg