Distributed Software Repositories*

Shirley Browne†, Jack Dongarra, Stan Green, Keith Moore,
Theresa Pepin, Tom Rowan, and Reed Wade
University of Tennessee
Eric Grosse
AT&T Bell Laboratories

Abstract

A location-independent naming system for network resources has
been designed to facilitate organization and description of
software components accessible through a virtual distributed
repository. This naming system enables easy and efficient searching
and retrieval, and it addresses many of the consistency,
authenticity, and integrity issues involved with distributed
software repositories by providing mechanisms for grouping
resources and for authenticity and integrity checking. This paper
details the design of the naming system, describes a prototype
implementation of some of the capabilities, and describes how the
system fits into the development of the National HPCC Software
Exchange, a virtual software repository that has the goal of
providing access to reusable software components for
high-performance computing.

1 Introduction

Well-maintained software repositories are central to software
reuse because they make high-quality software widely available and
easily accessible. One such repository is Netlib¹, a collection of
high-quality publicly available mathematical software [6, 4].
Netlib, in operation since 1985, currently processes over 303,000
requests a day. Netlib is serving as a prototype for development of
the National HPCC Software Exchange (NHSE)², which has the goal of
encompassing all High Performance Computing and Communications
(HPCC) software repositories and of promoting reuse of software
components developed by Grand Challenge and other scientific
computing researchers. Other network-accessible software
repositories include ASSET³, CARDS⁴, DSRS⁵, ELSA⁶, the GAMS Virtual
Software Repository⁷, and STARS⁸. ASSET, CARDS, DSRS, and ELSA are
participating in an interoperability experiment that allows a user
of any one of these repositories to access software exported from
the other repositories.

* The work described in this paper is sponsored by NASA under
  Grant No. NAG 5-2736, by the National Science Foundation under
  Grant No. ASC-9103853, and by AT&T Bell Laboratories.
† Author to whom correspondence should be directed.
  Ayres Hall, Computer Science Department, University of Tennessee,
  Knoxville, TN 37996-1301, (615) 374-5855, browne@cs.utk.edu
¹ Accessible from a World Wide Web browser at
  http://www.netlib.org/
² Accessible at http://www.netlib.org/nse/

The software reuse marketplace is expanding in at least two
dimensions. One dimension is the expansion from
intra-organizational reuse to inter-organizational reuse. For
example, various federal agencies have established their own
internal software reuse programs. Several efforts are now underway
to promote reuse of software across agencies. Similarly, companies
are becoming interested in accessing software produced by academic
and government research groups. Another dimension of expansion is
from reuse within a particular application domain to
interdisciplinary reuse. Reuse of software from other disciplines
is being fostered, for example, by efforts to solve
interdisciplinary Grand Challenge problems. Solution of such
problems will require collaboration by scientists from different
disciplines, as well as sharing of software produced by application
and computer scientists.

Another recent development that affects the software reuse
marketplace is the growth of the World Wide Web (WWW), together
with the ease with which individuals may make resources available
on a WWW server. A contributor need only make the files composing a
resource available on a file server and make available a
descriptive HTML file containing pointers to the resource files.

Growth in the popularity of the Internet and the World Wide Web, as
well as the wide availability of WWW client and server software,
has accelerated the shift from centrally maintained software
repositories to virtual, distributed repositories. For example, the
GAMS Repository, once a central repository, is now a virtual
repository that catalogs software maintained by other repositories.
Similarly, the NHSE will provide a uniform interface to a virtual
HPCC software repository that will be built on top of a distributed
set of discipline-oriented repositories [5], as shown in Figure 1.

³ Accessible at http://source.asset.com/
⁴ Accessible at http://dealer.cards.com/
⁵ Accessible at http://ssed1.ims.disa.mil/srp/dsrspage.html
⁶ Accessible at http://rbse.mountain.net/ELSA/elsa_lob.html
⁷ Accessible at http://gams.nist.gov/
⁸ Accessible at http://www.stars.ballston.paramax.com/index.html

EMCVMW 1002

Figure 1: Virtual Repository Architecture

The main advantage of distributing a repository is to allow the
software to be maintained by those in the best position to keep it
up-to-date. Also, copies of popular software packages may be
mirrored by a number of sites to increase availability (e.g., if
one site is unreachable, the software may be retrieved from a
different site) and to prevent bottlenecks.

Despite the benefits, distributed maintenance and mirroring of
software poses the following challenges.

• Maintaining the quality of software and of indexing information
  and presenting a uniform searching and browsing interface become
  much more difficult.

• The WWW mechanism of specifying a file by its Uniform Resource
  Locator (URL) is inadequate for ensuring the consistency and
  currency of mirrored copies, as a URL for an independently
  mirrored copy of a software package may point to an out-of-date
  copy and give no indication that it is not up-to-date.
  Furthermore, mirror copies of a file cannot be located from a URL
  reference, since each copy has a different URL.

• Consistency between a set of files that are meant to be used
  together must be maintained. For example, the Netlib Software
  Repository provides dependency checking that allows the user to
  retrieve a top-level routine plus all routines in its dependency
  tree (i.e., those routines that are called directly or indirectly
  by the top-level routine). Another example is a graphical
  parallel programming environment that relies on an underlying
  parallel communications support package. The problem becomes
  more complex when different pieces might be retrieved from
  different physical repositories. Ideally, the user should be able
  to have a consistent set retrieved automatically without having
  to scan documentation to verify that compatible pieces have been
  retrieved.

• As the number of reuse libraries grows, users cannot be expected
  to access each of them separately using a different interface.
  Thus, scalable interoperability between separately managed
  repositories is needed.

• In the environment of accessing a few well-established
  repositories that the user knows and trusts, a user is assured of
  the integrity and authenticity of a retrieved file because these
  properties are provided by the administrative procedures of that
  repository. With a large number of less familiar repositories,
  however, it becomes necessary to establish interoperable trust
  mechanisms and to reduce the number of parties with whom the user
  must establish trust.

• The more decentralized and smaller the individual repositories
  become, the less practical it becomes for each individual
  repository to provide the full range of search and authentication
  services.

Most of the above problems can be alleviated by implementing a
location-independent naming system that includes mechanisms for
authenticity and integrity checking. We have designed a naming
system that provides for two levels of naming. The binding between
a lower-level name (called a LIFN) and file contents is
unchangeable and verifiable. A lower-level name may be resolved to
multiple, mirrored copies. In the case where it represents a set of
files, the name may be resolved to a list of other names. A
higher-level name (called a URN) is associated with a cataloging
record that includes the lower-level name as well as other
descriptive information. This record may be cryptographically
signed by the publisher so that users may verify the authenticity
of a retrieved resource. At any given time, a higher-level name is
associated with exactly one lower-level name, but this binding may
change over time. Higher-level names allow for long-lived
human-readable references, while lower-level names permit reliable
caching and mirroring as well as permitting precise references when
needed. Location-independent names will be the basis of transparent
mirroring. They will also provide a unique key to which third
parties may attach value-added information such as additional
cataloging information and quality assessments. This paper
describes the design of our naming system. We also describe our
implementation of a prototype name-to-location service and of a
modified WWW client that does name resolution. A glossary of
acronyms and terms used in this paper is included as an appendix.

2 Related Work

A public-key encryption technique has been proposed for
authenticating the source of a software component and for ensuring
that the component has not been altered subsequent to its
publication. Cryptographic information, in the form of a digital
signature created by signing the hashed digest of the contents of a
component, is included within the component's unique identifier.
The proposed method is intended to prevent not only changes by
unauthorized parties, but also changes by the original author;
i.e., the author is not permitted to modify a component without
assigning a new unique identifier. The method assumes that each
author has been assigned a globally unique Author ID, has chosen an
asymmetric public/private key pair, and has publicized the public
key to the community of potential reusers. A newly chosen symmetric
encryption key is used to encrypt the component itself. Then the
symmetric key, the hashed digest of the component, and the Author
ID are concatenated and encrypted using the asymmetric private key,
and the result is concatenated to the clear-text version of the
Author ID to create the unique identifier for the component. The
method does not address name-to-location resolution, other than to
say that the encrypted component is made available along with the
unique identifier and any other cleartext information. The proposed
unique identifier is similar to our LIFN, and encryption of the
hash digest and Author ID is similar to our method of having the
author cryptographically sign a catalog record that includes the
author name and the file's MD5 signature. Our method allows a
choice of encryption algorithms, however, and allows the digital
signature used for authentication to be generated independently and
at a different time from the component's identifier.

Functional requirements for Uniform Resource Names (URNs) are
proposed in [12] by the IETF Uniform Resource Identification (URI)
Working Group. According to [12], the function of a URN is to
provide a globally unique, persistent identifier used for
recognition of and for access to characteristics of a resource or
to the resource itself. URN assignment is delegated to naming
authorities, the names of which are persistent and globally unique,
and who may assign names directly or delegate their authority to
subauthorities. Global uniqueness of URNs is guaranteed by
requiring each naming authority to guarantee uniqueness within its
portion of the URN namespace. It is left up to each naming
authority to determine the conditions under which it will issue a
URN (for example, whether or not to issue a new URN when the
contents of a file change). Some test implementations of URNs are
underway by members of the URI Working Group at Georgia Tech and
Bunyip Corporation⁹. The Georgia Tech testbed uses the whois++
protocol for URN to URC resolution. A URC, or Uniform Resource
Characteristic, is a catalog record which includes locations, or
URLs, at which the resource may be accessed. The URC server
supports searching by other attributes, in addition to URN lookup,
via the whois++ protocol. A modified version of Mosaic that does
URN to URC resolution is available. A proxy server based on CERN
httpd that does caching by URNs is also running at Georgia Tech.

As part of the Computer Science Technical Report (CSTR) project
[8], which is developing an architecture for distributed digital
document libraries, the Corporation for National Research
Initiatives (CNRI) is implementing a name-to-location resolution
service called the Handle Management System (HMS)¹⁰. CNRI's handle
is a name for a digital object and is analogous to IETF's URN. The
HMS includes a Handle Generator that a naming authority may run and
use to create globally unique handles, Handle Servers that process
update requests from naming authorities and query requests from
clients to resolve handles, and a Handle Server Directory that maps
a handle to the appropriate Handle Server. The distribution of
handles to Handle Servers is based on a hashing algorithm. An
electronic mail interface is used by handle administrators to add,
delete, and modify handle entries in the Handle Server database.
Clients use a UDP datagram interface to request location data
associated with a handle. A modified version of Mosaic that does
handle resolution is available from CNRI. The types of location
information stored by Handle Servers include URL, repository name,
email address, and X.500 Distinguished Name. Use of a repository
name by a client requires another round of name-to-location
resolution. CNRI's properties record that describes the properties
of a digital object is analogous to IETF's URC. The properties
record is not stored by the HMS, but rather by an Information and
Reference (IR) Server that is to be maintained by each repository.
Each naming authority may also maintain an IR server containing a
properties record for each digital object within its authority.

⁹ More information is available at http://www.gatech.edu/iiir/
¹⁰ More information is available at http://www.cnri.reston.va.us/

3 Publishing and Name Assignment

Internet-accessible resources are currently referenced using
Uniform Resource Locators (URLs). Because URLs are locations rather
than names, their use as references presents at least two problems.
One problem is that files get moved, changing their URLs. Then
pointers that contain the old URLs become stale. One can leave a
forwarding address at the old URL, but forwarding addresses are an
awkward and inelegant solution. Another problem with using URLs as
references is that mirrored copies of files cannot be located from
a URL reference, since each copy has a different URL.

It has been widely recognized that a solution to the above problems
is to assign location-independent names to files and to provide a
name-to-location service that, given a name, returns a list of
locations for that name. A resource provider who moves some files
need only delete the old name-to-location bindings and register the
new bindings with the name-to-location service. Likewise, a site
that mirrors a copy of a file need only register its location with
the name-to-location service. Then a user attempting to retrieve
the file corresponding to a location-independent name may query the
name-to-location service for a list of alternative locations to be
tried.
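
The register, delete, and query operations of such a
name-to-location service can be illustrated with a minimal
in-memory sketch. The class name, method names, name string, and
example.org URLs below are hypothetical and chosen only for
illustration; the paper's actual service is a network of servers,
described in Section 4.

```python
from collections import defaultdict


class NameToLocationService:
    """Toy in-memory name-to-location service (illustrative only)."""

    def __init__(self):
        # name -> list of locations (URLs), kept in registration order
        self._locations = defaultdict(list)

    def register(self, name, url):
        """Record that a copy of `name` is available at `url`."""
        if url not in self._locations[name]:
            self._locations[name].append(url)

    def delete(self, name, url):
        """Remove a stale binding, e.g. after a file has been moved."""
        if url in self._locations.get(name, []):
            self._locations[name].remove(url)

    def lookup(self, name):
        """Return every known location for `name` (possibly empty)."""
        return list(self._locations.get(name, []))


service = NameToLocationService()
service.register("pkg-1.0", "http://a.example.org/pkg-1.0.tar")
# A mirror site only has to register the location of its own copy.
service.register("pkg-1.0", "ftp://b.example.org/mirror/pkg-1.0.tar")
# The provider moves the file: delete the old binding, add the new one.
service.delete("pkg-1.0", "http://a.example.org/pkg-1.0.tar")
service.register("pkg-1.0", "http://a.example.org/new/pkg-1.0.tar")
locations = service.lookup("pkg-1.0")
```

References that use the name "pkg-1.0" stay valid throughout; only
the bindings held by the service change.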

Our work is similar to the IETF's Uniform Resource Identifier
Working Group's work on Uniform Resource Names (URNs) [12] and to
CNRI's work on unique document identifiers for digital libraries.
However, neither of these groups has addressed the reliability and
consistency issues addressed by our two-level naming system. Our
system includes a lower-level name called a Location Independent
File Name (LIFN) and a higher-level name called a Uniform Resource
Name (URN).

An important question is whether the byte contents of the file
referred to by a location-independent name should be fixed, or be
allowed to change. If the byte contents are allowed to change, then
a further question arises as to what should be the consistency
requirements for alternative locations for the same name. Valid
arguments for both cases can be made for different situations. For
example, for software resources it is desirable to have an
unambiguous reference to the fixed byte contents for the purpose of
attaching a review or reporting experimental or performance
results. Fixed contents also make it possible to compute a file
digest that may be cryptographically signed by the author of the
resource, allowing verification of the integrity of a retrieved
file. On the other hand, it is desirable to have a reference to a
software package that need not be changed every time a bug fix or
minor revision takes place, especially if the cataloging
information (e.g., title, author, abstract) does not change. The
cataloging information for a software package might contain a
reference to a Web page describing and/or documenting the package.
The author of the Web page would like to be able to update the page
without having to change all the references to it. A non-software
example where it would be desirable to allow contents to change is
a name that refers to a file containing the "current weather map".

Because both types of name are needed, we have implemented both.
The type of name that refers to fixed byte contents is called a
Location Independent File Name, or LIFN. Once a LIFN has been
assigned to a particular sequence of bytes, that binding may not be
changed. The type of name for which the contents to which it refers
may change is called a Uniform Resource Name, or URN.

We divide the file access system into two levels. The upper level
is where publishing, cataloging, and searching activities take
place. These upper-level activities are concerned with the
semantic, or intellectual, contents of files. The lower level is
where distribution, mirroring, and caching activities occur. These
lower-level activities are not concerned with the semantic contents
of files, only with ensuring that files may be accessed efficiently
and that the byte contents of files are not corrupted.

The above arguments about the need for two types of name pertain to
the upper level. At the lower level, there is a need for LIFNs, but
not for URNs. Mirror sites use LIFNs and their associated file
digests to ensure that their copies of files have not been
corrupted. A cache site needs to be able to tell a user or client
program whether it holds a copy of a requested file, and for this
purpose it can answer whether or not it holds a copy of a
particular LIFN.

The above considerations led us to implement LIFNs at the lower
level of the file access system and URNs at the upper level, but to
make LIFNs visible at the upper level as well. A publisher will be
responsible for assigning both a URN and a LIFN to any resource for
which cataloging information is provided. For other files, only
LIFNs need be provided. At any given time, a URN that refers to a
file or a set of files is associated with exactly one LIFN. A URN
may be associated with a set of different LIFNs over the URN's
lifetime, but we require that the set be in the form of a linear
sequence, with the sequence order given by increasing time.
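
The two binding rules (a LIFN is bound to fixed bytes forever; a
URN is bound to exactly one LIFN at a time, with its past bindings
forming a time-ordered sequence) can be made concrete with a small
sketch. This is an illustration under our own assumptions, not the
paper's implementation: the class names are hypothetical, the name
strings are placeholders, and an MD5 digest stands in for the
stored byte contents.

```python
import hashlib


class LifnTable:
    """Bindings from LIFNs to fixed byte contents (stored as digests)."""

    def __init__(self):
        self._digest = {}

    def assign(self, lifn, content):
        digest = hashlib.md5(content).hexdigest()
        if lifn in self._digest and self._digest[lifn] != digest:
            # Once assigned, a LIFN may never refer to different bytes.
            raise ValueError(f"{lifn} is already bound to other contents")
        self._digest[lifn] = digest


class UrnHistory:
    """Time-ordered sequence of LIFNs that one URN has referred to."""

    def __init__(self, urn):
        self.urn = urn
        self._lifns = []

    def rebind(self, lifn):
        # Appending keeps the history a linear sequence ordered by time.
        self._lifns.append(lifn)

    @property
    def current(self):
        """The single LIFN associated with the URN right now."""
        return self._lifns[-1]

    @property
    def history(self):
        return tuple(self._lifns)


table = LifnTable()
table.assign("lifn-a", b"first release")
table.assign("lifn-b", b"bug-fix release")

urn = UrnHistory("urn-pkg")
urn.rebind("lifn-a")   # initial publication
urn.rebind("lifn-b")   # the same URN now names the revised file
```

A reference to "urn-pkg" follows the package across revisions, while
a reference to "lifn-a" always denotes the bytes of the first
release.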

The LIFN and URN name spaces are subdivided among several
publishers, also called naming authorities, who are responsible for
ensuring the uniqueness of names assigned within their portions of
the name spaces. A name is formed by concatenating the registered
naming authority identifier with a unique string assigned by the
naming authority. The LIFN and URN are formatted as

LIFN:<publisher id>:string
URN:<publisher id>:string

The publisher id portion of the name is used to locate appropriate
URN and LIFN servers for that publisher. Given a URN, a URN server
returns a Uniform Resource Citation (URC) for that URN that
includes its currently associated LIFN, as well as other cataloging
information. Given a LIFN, a LIFN server returns a list of
locations for that LIFN. More information about accessing URCs and
files from their URNs and LIFNs may be found in Section 4.

The publisher provides cataloging information for each URN it
assigns. The catalog record includes information such as title,
author, abstract, etc. A recommended set of attributes for software
assets is given by the Reuse Library Interoperability Group (RIG)
Basic Interoperability Data Model. In addition, the catalog record
for a URN includes its currently associated LIFN, as well as an MD5
or similar fingerprint for that LIFN. This fingerprint is a 128-bit
quantity resulting from applying the MD5 function to the contents
of the file. The function is designed to make it computationally
infeasible to find a different sequence of bytes that produces the
same fingerprint [10]. To enable authentication, the entire
description may be cryptographically signed, as discussed in
Section 5. Portions of the catalog record may be exported to
resource discovery servers, such as a Harvest Broker [3], which
provide search services based on resource descriptions. The URN
exported to the search service provides a unique long-lived key, so
that descriptions may be unambiguously associated with a resource,
and so that a resource turns up at most once in a list of search
hits.

For a name to be useful, there must be some means of resolving a
name to a location from which the resource can be retrieved or
accessed. Thus, the publisher, as well as any other parties that
mirror the resource, must register such locations with the
appropriate name-to-location lookup services. Such name-to-location
services are discussed in Section 4.

Thus, publishing a resource involves the following steps, shown in
Figure 2:

1. creating the resource's catalog record in the form of a URC,

2. signing the catalog record with the publisher's private key,

3. making the resource files available on one or more file
   servers,

4. registering the file locations with the LIFN server,

5. registering the URC with the URN server,

6. informing mirror sites of the new or updated file,

7. exporting relevant portions of the URC to search services.

Figure 2: Publishing steps

Steps 1 and 5 have been discussed above. Step 2 is discussed in
Section 5, and Steps 3, 4, and 5 are discussed in Section 4.
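
The publishing steps can be sketched as follows. This is a
simplified illustration, not the BFD implementation: the function
names, the dictionary-based "servers", the example names and URL,
and the record layout are all assumptions, and the signing step is
left to a caller-supplied function because the paper does not fix a
signature algorithm. Placing files on servers and notifying mirror
sites happen outside the sketch.

```python
import hashlib


def make_catalog_record(urn, lifn, content, **attributes):
    """Build a URC-like record holding the LIFN and its MD5 fingerprint."""
    record = {"urn": urn, "lifn": lifn,
              "md5": hashlib.md5(content).hexdigest()}
    record.update(attributes)
    return record


def publish(record, urls, sign, lifn_server, urn_server, search_index):
    """Sign the record, register locations and URC, export search fields."""
    # Sign the catalog record (hypothetical `sign` callable).
    signed = dict(record, signature=sign(record))
    # Register the file locations with the LIFN server.
    lifn_server.setdefault(record["lifn"], []).extend(urls)
    # Register the signed URC with the URN server.
    urn_server[record["urn"]] = signed
    # Export only the descriptive portion to the search service.
    search_index[record["urn"]] = {
        key: record[key] for key in ("title", "author") if key in record}
    return signed


content = b"      SUBROUTINE DEMO\n      END\n"
record = make_catalog_record("URN:foo:demo", "LIFN:foo:demo-1", content,
                             title="Demo routine", author="A. Author")
lifn_server, urn_server, search_index = {}, {}, {}
signed = publish(record, ["http://files.example.org/demo.f"],
                 sign=lambda rec: "placeholder-signature",  # not a real signature
                 lifn_server=lifn_server, urn_server=urn_server,
                 search_index=search_index)
```

The MD5 digest is rendered as 32 hexadecimal digits, which is the
128-bit fingerprint described in the text.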

4 Name Resolution and File Mirroring

Resources available from the virtual repository will be named by
URNs and/or LIFNs, rather than by URLs. Thus, WWW clients will need
a means of resolving a URN or LIFN to one or more locations,
expressed in the form of a URL, to be able to access the resource.
Access to files is provided by conventional file servers, using
protocols such as HTTP, Gopher, and FTP.

For a non-file resource, such as a database service, a list of
locations is associated directly with the URN for that resource.
For a file resource, such as a file containing a piece of software,
the relationship between the URN and the locations is indirect, via
a LIFN: the URN is associated with a LIFN, and the LIFN is
associated with a list of URLs.

The LIFN-to-location mapping service is provided by a network of
LIFN servers, collectively called the LIFN database. These servers
process queries for locations of LIFNs. They also accept updates
from file servers containing new locations for LIFNs, as well as
requests to delete old LIFN-to-location mappings. A naming
authority may run its own LIFN servers, or it may find another
organization willing to provide the service on its behalf.

Figure 3: File access steps

The URN service is similar to the LIFN service, except that it maps
a name either to a list of locations or to a URC that includes a
LIFN. For fault tolerance and availability, the URN service is also
provided by a network of servers.

Mappings from naming authority identifiers to URN and LIFN servers
are stored in the Domain Name System (DNS) name space, so that a
client program can determine which URN (LIFN) server to query for a
particular URN (LIFN). Our current client uses an ordinary DNS
lookup for IP address records. The publisher identifier is
prepended to the string .LIFN.NETLIB.ORG (for a LIFN) or
.URN.NETLIB.ORG (for a URN). The resulting string is treated as if
it were the name of an Internet host, and DNS is queried to find
the IP addresses of that host. For example, to find a LIFN server
for the naming authority foo, the client would look up the IP
addresses for foo.LIFN.NETLIB.ORG. Several IP addresses may be
listed for any one naming authority. Our client attempts to query
each IP address until it finds one that can satisfy the LIFN or URN
lookup request.
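
The DNS convention described above amounts to a small string
transformation followed by an ordinary host lookup. The sketch
below shows one way a client might do this; the function names are
hypothetical, the name syntax is the LIFN:<publisher id>:string
form from Section 3, and the final lookup only succeeds if the
naming authority is actually registered in DNS.

```python
import socket

# Root domains under which naming authorities are registered (from the text).
ROOTS = {"LIFN": "LIFN.NETLIB.ORG", "URN": "URN.NETLIB.ORG"}


def parse_name(name):
    """Split 'LIFN:<publisher id>:string' into (kind, publisher id, string)."""
    kind, publisher, rest = name.split(":", 2)
    kind = kind.upper()
    if kind not in ROOTS or not publisher or not rest:
        raise ValueError(f"not a LIFN or URN: {name!r}")
    return kind, publisher, rest


def server_hostname(name):
    """Prepend the publisher id to the root domain for this kind of name."""
    kind, publisher, _ = parse_name(name)
    return f"{publisher}.{ROOTS[kind]}"


def server_addresses(name):
    """Ordinary DNS lookup for the IP addresses of the name's servers."""
    _, _, addresses = socket.gethostbyname_ex(server_hostname(name))
    return addresses


lifn_host = server_hostname("LIFN:foo:some-file")
urn_host = server_hostname("URN:foo:some-package")
```

A client would then try each address returned by
`server_addresses` in turn until one answers the lookup request.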

Thus, the steps involved in resolving a URN so as to access a copy
of the file it names are as follows, as shown in Figure 3:

1. Use DNS to locate an appropriate URN server.

2. Query the URN server to retrieve the URC which contains the
   currently associated LIFN.

3. Authenticate the URC if desired.

4. Use DNS to locate an appropriate LIFN server.

5. Query the LIFN server to retrieve a list of locations.

6. Choose a location from which to retrieve the file.

In practice, Steps 4 through 6 will often be replaced by using the
LIFN to access a local cache server. Because the binding between a
LIFN and the byte contents it points to is fixed, the cached copy
is sure to be correct.
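
The resolution chain, including the cache shortcut, can be sketched
as a single function. Everything here is an illustrative
assumption: servers are modeled as dictionaries, the server-location
and fetch operations are passed in as callables, and the names and
URLs are invented for the example.

```python
def resolve_urn(urn, find_urn_server, find_lifn_server, fetch,
                cache=None, authenticate=None):
    """Follow the URN -> URC -> LIFN -> location chain; return file bytes."""
    urn_server = find_urn_server(urn)      # locate a URN server
    urc = urn_server[urn]                  # retrieve the URC
    if authenticate is not None and not authenticate(urc):
        raise ValueError("URC failed authentication")
    lifn = urc["lifn"]
    if cache is not None and lifn in cache:
        # The LIFN-to-bytes binding is fixed, so a cached copy is usable.
        return cache[lifn]
    lifn_server = find_lifn_server(lifn)   # locate a LIFN server
    for url in lifn_server.get(lifn, []):  # try each candidate location
        data = fetch(url)
        if data is not None:
            if cache is not None:
                cache[lifn] = data
            return data
    raise LookupError(f"no reachable location for {lifn}")


urn_servers = {"URN:foo:pkg": {"lifn": "LIFN:foo:pkg-1.1"}}
lifn_servers = {"LIFN:foo:pkg-1.1": ["http://down.example.org/pkg",
                                     "http://up.example.org/pkg"]}
files = {"http://up.example.org/pkg": b"package bytes"}
cache = {}
data = resolve_urn("URN:foo:pkg",
                   find_urn_server=lambda urn: urn_servers,
                   find_lifn_server=lambda lifn: lifn_servers,
                   fetch=files.get, cache=cache)
```

The first location is unreachable in this example, so the file is
taken from the second; a later call for the same URN is answered
from the cache without contacting a LIFN server.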

A file server can mirror a file by acquiring a copy of it and
posting an update to a LIFN server for the file's naming authority.
If a file server moves or deletes a file, then it would post that
information as well. It is not necessary to keep all LIFN servers
for a particular naming authority perfectly synchronized. Such
synchronization would entail too much overhead. Instead, location
updates are posted to any LIFN server and propagated to other peer
servers using a batch update protocol.

Updates to the URN server are posted by the publisher and by others
authorized by the publisher to update the catalog record for a
given URN. In order to ensure a consistent linear history of
updates to the catalog record for a URN (e.g., the sequence of
LIFNs associated with that URN), replicated URN servers use a
master-slave update protocol.

One of the most important aspects of our use of LIFNs is that it
assures the user of retrieving the most up-to-date copy of a file
referenced by a URN, without the overhead of a replica control
protocol between file servers mirroring that file, which in general
will not all be under the control of the URN's naming authority.
This assurance is modulo the time required for the master-slave
update protocol for the replicated URN servers, but if the user
insists on contacting the master URN server, he is ensured of
getting the most up-to-date copy.

5 Authenticity, Integrity, and Consistency of Resources

Authentication of a resource verifies that the resource was
published by its purported publisher. Verifying the integrity of a
file ensures that the file has not been modified. Provisions for
authenticity and integrity checking are necessary for a software
repository because there have been instances of software packages
stored on a public repository that were modified by intruders to
introduce security holes which were then spread to other
systems¹¹. Our authentication and integrity mechanisms are similar
to those described in [11] and [ ].

Recall from Section 3 that a publisher cryptographically signs the
catalog record for a resource. In the case of a file resource, this
record includes the file's LIFN and MD5 fingerprint. Any client in
possession of the publisher's public key can verify the
authenticity of the resource description. Publishers are expected
to widely advertise their public keys to make it difficult for an
attacker to substitute rogue keys. In addition, publishers may have
their keys certified by trusted third parties to further establish
their authenticity, as in [11].

Assuming that the association between a LIFN and a file signature
(e.g., the MD5 fingerprint) is known to be correct (either because
the signature is part of the LIFN or because of the description
authentication described in the preceding paragraph), a client may
perform an integrity check on a retrieved file by computing the
signature for the file and comparing it with the one known to be
associated with the file's purported LIFN. Recall from Section 4
that a LIFN server returns a list of locations for a given LIFN but
does not guarantee the correctness of those locations. A location
may be incorrect if it no longer exists or if the contents of that
location are wrong. In the former case, no file will be returned
from that location. The latter condition may be detected by the
client performing an integrity check.
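
A client-side integrity check of this kind can be sketched in a few
lines. The function name, the `fetch` callable (which returns None
when a location no longer exists), and the example.org URLs are
assumptions made for illustration; MD5 is used because it is the
fingerprint named in the text.

```python
import hashlib


def fetch_verified(locations, expected_md5, fetch):
    """Return the first retrieved copy whose MD5 fingerprint matches."""
    for url in locations:
        data = fetch(url)
        if data is None:
            continue  # location no longer exists: nothing was returned
        if hashlib.md5(data).hexdigest() == expected_md5:
            return url, data
        # Contents at this location are wrong: move on to the next one.
    raise LookupError("no location returned a file with the expected fingerprint")


good = b"trusted contents"
expected = hashlib.md5(good).hexdigest()  # normally taken from the signed URC
store = {"http://bad.example.org/f": b"tampered contents",
         "http://good.example.org/f": good}
locations = ["http://gone.example.org/f",
             "http://bad.example.org/f",
             "http://good.example.org/f"]
url, data = fetch_verified(locations, expected, store.get)
```

The missing location and the tampered copy are both skipped, and the
third location supplies the file.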

To ensure consistency within a group of related files, we allow a
URN to refer to a set of files. There are at least two cases where
this might occur. One case is where a resource consists of a number
of related files, for example the files making up a software
package. Of course, such a set of files could be made available
instead as a single tar file. If a file can be used in more than
one package, however, or if some files are also of use
individually, it might be preferable to make the files available
separately. Another case is when there are alternative versions of
a file, for example, multiple precisions of a Fortran routine, or
multiple formats of an image.

¹¹ For an example, see the CERT advisory at
   ftp://ftp.cert.org/pub/cert_advisories/CA-94:07.wuarchive.ftpd.trojan.horse

The first case is handled by ordering the files making up the
resource and considering the ordered list of LIFNs for these files
to be the contents of another file which we call the
composite-parts-list for the resource. The composite-parts-list
file itself has a LIFN, and it is this LIFN that is associated with
the URN for the resource. The second case is handled in a similar
manner, but the file containing the ordered list of LIFNs is called
the alternative-parts-list for the resource. The parts-list may
contain additional information, such as how the alternative parts
vary. After retrieving a parts-list, the client program will invoke
a special module for handling it, similar to how current browsers
invoke viewers for image or sound files. This module will assist
the user in retrieving the component files and saving or displaying
them locally.
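
The handling of the two kinds of parts-list can be sketched as
follows. The paper does not specify the file format of a
parts-list, so this sketch assumes the simplest possible one (one
LIFN per line, in order); the function names, the `get_file`
callable that maps a LIFN to its bytes, and all example names are
hypothetical.

```python
def parse_parts_list(text):
    """Return the ordered LIFNs listed in a parts-list file, one per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def retrieve_composite(parts_list_lifn, get_file):
    """Fetch every part of a composite resource, preserving listed order."""
    listing = get_file(parts_list_lifn).decode("ascii")
    return [(lifn, get_file(lifn)) for lifn in parse_parts_list(listing)]


def retrieve_alternative(parts_list_lifn, get_file, choose):
    """Fetch only the alternative selected by the caller-supplied `choose`."""
    listing = get_file(parts_list_lifn).decode("ascii")
    lifn = choose(parse_parts_list(listing))
    return lifn, get_file(lifn)


files = {
    "LIFN:foo:solver-parts": b"LIFN:foo:solver.f\nLIFN:foo:blas.f\n",
    "LIFN:foo:solver.f": b"solver source",
    "LIFN:foo:blas.f": b"blas source",
    "LIFN:foo:precisions": b"LIFN:foo:ssolve.f\nLIFN:foo:dsolve.f\n",
    "LIFN:foo:ssolve.f": b"single precision",
    "LIFN:foo:dsolve.f": b"double precision",
}
parts = retrieve_composite("LIFN:foo:solver-parts", files.__getitem__)
picked = retrieve_alternative("LIFN:foo:precisions", files.__getitem__,
                              choose=lambda lifns: lifns[-1])
```

Because the parts-list is itself named by a LIFN, its contents are
fixed, so a given parts-list always denotes the same set of files.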

6 Prototype Implementation

The naming system is being implemented as part of the Bulk File
Distribution (BFD) package. BFD is part of the implementation of
the National HPCC Software Exchange (NHSE), which is being
developed by the Center for Research in Parallel Computing (CRPC),
a consortium of universities and national laboratories formed to
make high performance and parallel computing accessible to
engineers and scientists. BFD URN and LIFN servers will run at all
the CRPC participating sites, as well as at other major NHSE sites,
such as Oak Ridge National Laboratory.

A BFD client is a WWW browser that, in addition to having the
capability to retrieve a file given its URL, also has the
capability to retrieve a file given its URN or LIFN. A version of
NCSA Mosaic 2.4 for X Windows that has been modified to support BFD
is available at http://www.netlib.org/nse/bfd/. A BFD client
library that can be incorporated into other Web browsers will be
available soon.

The prototype implementation of BFD uses query and update protocols
based on Sun's Remote Procedure Call (RPC) mechanism over UDP. RPC
was chosen because it is very lightweight (one packet for request,
and one for reply), widely supported on UNIX platforms, and easy to
implement on other platforms (at least for the portions of RPC
needed by BFD). The BFD RPC requests are sent to a server at a
fixed port number, rather than using the RPC portmapper, to avoid
the overhead of an extra RPC call.

To locate a LIFN server, BFD uses an ordinary DNS lookup for IP
address records. LIFN.NETLIB.ORG is the implicit root of the LIFN
name tree. For example, to find a LIFN server for the naming
authority foo, a client looks up the IP addresses for
foo.LIFN.NETLIB.ORG. IP address records were used instead of new
DNS records