`
`http://www.dlib.org/dlib/july99/caplan/07caplan.html
`
`D-Lib Magazine
`July/August 1999
`
`Volume 5 Number 7/8
`
`ISSN 1082-9873
`
`Reference Linking for Journal Articles
`
`Priscilla Caplan
`University of Chicago
`p-caplan@uchicago.edu
`
`William Y. Arms
`Cornell University
`wya@cs.cornell.edu
`
`Abstract
`
`During the past year, great progress has been made in the field of reference
`linking, particularly in the important area of links to journal articles. This
`paper summarizes the current state of the art, describes a general model for
`static linking, compares several current implementations against the model,
`and discusses some of the required future work. Particular emphasis is given
`to the minimal set of metadata needed for reference linking and to selective
`resolution of identifiers, methods by which a client can specify which of
`several copies of an item is accessed.
`
`Introduction
`
`Reference linking is the general term for links from one information object to
`another. The links may appear in a wide variety of contexts, including
`published citations to scientific works, references from a catalog or
`bibliography, and informal references transmitted by email or verbally. In
`recent years, extensive development has been carried out on reference linking
`between journal articles, and recently work has gone beyond journals. One of
`the first projects to examine reference linking systematically was the Open
`Journals Project [Hitchcock 1998].
`
`Recently, several systems have been developed for reference links from online
`journal articles to other journal articles. The most complete, within its limited
`domain, is provided by the NASA Astrophysics Data System [ADS]. Another
`leading example is the National Library of Medicine's PubMed/PubRef
`[PubMed] system, which is used by HighWire Press and others. An excellent
`commercial example is ISI's Web of Science [Atkins 1999]. The International
`DOI Foundation (IDF) is leading another effort, using Digital Object
`Identifiers (DOI), a form of Uniform Resource Name [Paskin 1999].
`
`In February 1999, the National Information Standards Organization (NISO),
`the Digital Library Federation (DLF), the National Federation of Abstracting
`and Information Services (NFAIS), and the Society for Scholarly Publishing
`(SSP) sponsored a one-day invitational workshop to discuss issues
`surrounding reference linking, specifically linking from citations to electronic
`journal literature. The report of the February linking workshop is available at
`[Needleman 1999]. The participants identified three major components for
`constructing systems to support reference linking: identifiers for the works; a
`mechanism for discovering the identifier of a work from a citation; and a
`mechanism for taking the reader from an identifier to a particular item. A
`small working group was assembled to review, refine, and elaborate on the
`work of the first workshop. Their report [Caplan 1999a] was the basis of a
`follow-up workshop in June [Caplan 1999b]. This paper is an elaboration of
`that report. It places the results of the workshops within a broader discussion
`of the current state of reference linking.
`
`The generic statement of the reference linking problem is, "Given the
`information in a standard citation, how does one get to the thing to which the
`citation refers?" The major focus of the workshops, however, was citations to
`journal articles. Thus, the problem statement for the meeting of the working
`group was, "Given the information in a citation to a journal article, how does a
`user get from the citation to an appropriate copy of the article?" The working
`group was explicitly asked to consider the situation where there are several
`copies of an item and the user may have a preference for which item copy is
`supplied. The group coined the term "selective resolution" for this situation.
`
`The hyperlinks of the web, using URLs, often serve as surrogates for
`reference links. Hyperlinks can be used to represent citations, to structure
`information, or for a myriad of related purposes, but they suffer from several
`disadvantages when used as reference links. A URL identifies a single
`instance of a work, not the work itself. Since a URL references a specific
`location, it is vulnerable to changes in, or poor management of, the system at
`that location. Hence, research on reference linking is allied to the development
`of systems of persistent identifiers.
`
`Throughout the study, the emphasis has been pragmatic. What is needed to get
`started? Are there simplifications that can be made in the short term, knowing
`that they will need to be addressed later? However, reference linking goes
`much further than citations to journal articles, and the simplifications that are
`being used to get started must always be considered in the long-term context.
`(See the discussion of dynamic linking below.)
`
`Creations
`
`The first stage in reference linking is to understand to what a reference refers.
`The framework from the IFLA report, "Functional Requirements for
`Bibliographic Records", provides a vocabulary for distinguishing between
`related aspects of an intellectual entity [IFLA 1998]. In the IFLA model, a
`"work" is an abstract conception of some creator. Works are realized through
`"expressions", which are fixed spatial/temporal representations of works, such
`as a performance of a play or a symphony. Expressions in turn are embodied
`in "manifestations", physical representations such as printed books or recorded
`CDs, which may or may not be mass-produced. A single, specific exemplar of a
`manifestation is an "item", also called a "copy".
`
`The European INDECS project has done a careful analysis of these
`distinctions and proposes a categorization that, while somewhat different from
`the IFLA model, is mainly compatible with it [INDECS]. Supplementing the
`IFLA and INDECS terminology, the International DOI Foundation (IDF) has
`contributed "creation" as a useful generic term encompassing the work and all
`of its expressions, manifestations and items.
`
`The distinction between expression and manifestation is useful for works that
`are performed but usually can be ignored for works that have a single
`expression, like most journal articles. Journal articles represent three types of
`creations: the work, or creative output of the author(s); the manifestations, or
`instantiations of the work in print and/or electronic form; and the items, or
`specific copies of a manifestation. An article, for example, could have been
`published in a print and an electronic version. These would be separate
`manifestations, each of which might have multiple items (perhaps several
`hundred copies for the print run, and mirrored online and archival copies of
`the electronic version).
`
`Citations and creations
`
`The author of a citation sometimes refers to a work, sometimes to a specific
`expression or manifestation, and sometimes to an individual copy. Often a
`citation will refer to a specific manifestation only because the citer, working
`from his own copy of the article, is unaware of other manifestations that
`would do as well.
`
`In some cases, however, an author will cite a particular manifestation
`deliberately. The British Medical Journal provides an example of a
`publication where manifestation is significant. Articles are published in three
`manifestations: print, PDF, and HTML. For some articles, the print and PDF
`are abridged versions of the full HTML article, which may be longer, and may
`contain additional figures and references. However, the official citation given
`by the publisher refers to the print/PDF manifestations, including the
`pagination, which is not relevant to the HTML.
`
`Consideration of the British Medical Journal raises the question of when
`different versions should be considered different works, since the
`intellectual content varies. The distinctions between work,
`expression, and manifestation are a matter for judgment. The IFLA model is
`analytic while publishers are declarative, in essence defining different
`manifestations as distinct or equivalent by declaring that they consider them
`so. This example illustrates that the IFLA model must be seen as a general
`framework rather than a precise definition or specification.
`
`In the absence of a clear indication of the author's intentions, it can usually be
`assumed that a citation refers to the work, as both the citer and the reader can
`be expected to be primarily interested in the intellectual content. (This is true
`even though when a citation uses a URL, the author is usually constrained to
`refer to the location of a specific copy.) Most current implementation projects
`focus on citations to works, and hence on the association of identifiers with
`works, while recognizing that occasionally there will be a need to distinguish
`different manifestations. This is the approach taken by the Astrophysics Data
`System, ISI, and PubMed. One of the central aims of INDECS is to be explicit
`in distinguishing between the underlying work, its various expressions, and its
`manifestations. The IDF is a member of INDECS and is bravely attempting to
`be explicit about the distinctions, but has accepted that its initial services can
`refer generally to "articles". Currently, this cautious pragmatism seems an
`acceptable simplification.
`
`A general model for reference linking of journal articles
`
`Although they differ greatly in details, most current systems fit within the
`framework shown in Figure 1. (A notable exception is SFX, which is
`mentioned briefly below.)
`
`Figure 1. Reference linking
`
`Each work has a unique identifier and one or more copies, each with its own
`URL. The provider of the information, who is usually the publisher, supplies
`metadata about each work. This is stored in databases as shown in the middle
`row of Figure 1. Clients access the databases through the interactions shown
`in the bottom part of the figure. The figure shows two databases: a reference
`database and a location database.
`
`Reference database
`
`For each work, the reference database contains metadata that, at a
`minimum, corresponds to the information in a conventional citation. A
`client that wishes to find the content associated with a reference sends a
`query to the reference database. This database returns a list of identifiers
`for works that match the query.
`
`Location database
`
`Typically each cited work will be stored at several locations. A client
`sends an identifier to the location database, which returns one or more
`URLs. The client selects the URL to retrieve the object. This is known
`as "resolution" of the identifier.
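
The two lookups of the model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory dictionaries stand in for the reference and location databases, and the identifier string, URLs, and field names are invented for the example rather than taken from any of the systems discussed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical minimal citation used as a query to the reference database."""
    journal: str
    volume: str
    start_page: str


# Reference database: citation metadata -> identifiers of matching works.
REFERENCE_DB = {
    Citation("Example Journal", "5", "1"): ["id:example-001"],
}

# Location database: identifier -> URLs of the known copies.
LOCATION_DB = {
    "id:example-001": [
        "http://publisher.example.org/articles/001",
        "http://mirror.example.net/articles/001",
    ],
}


def lookup(citation):
    """Return identifiers of works matching the citation (possibly none)."""
    return REFERENCE_DB.get(citation, [])


def resolve(identifier):
    """Return the URLs of all known copies of the identified work."""
    return LOCATION_DB.get(identifier, [])


def link(citation):
    """Follow a citation to candidate URLs: lookup first, then resolution."""
    urls = []
    for identifier in lookup(citation):
        urls.extend(resolve(identifier))
    return urls
```

Keeping `lookup` and `resolve` as separate functions mirrors the separation of the two databases in Figure 1; the choice among the returned URLs is left to the client.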
`
`This process has many complications. There will be considerable variation in
`citations; some will be formally published as references within scholarly
`journal articles; some will be formulated as part of more casual
`communications such as course reading lists and informal bibliographies. In
`some cases a citation may contain the identifier of the article explicitly, in
`which case the reference database lookup is not needed; in other cases an
`identifier will have to be obtained by using the bibliographic data elements
`given in the citation. There may be several works in the reference database
`that match the query; the client must select a work either by human
`intervention or by algorithm. When there are several URLs to different copies
`of the work, the system is faced with selective resolution: the client may wish
`to select a specific version based on variations of content, different licensing
`arrangements, or network performance.
`
`Current implementations present several variations on this model. The
`Astrophysics Data System derives identifiers algorithmically, bypassing the
`reference database lookup. PubMed and the Web of Science combine the
`reference and location databases. Currently, all location databases return a single
`URL, though this is changing. PubMed's LinkOut experiment permits users to
`provide URLs in addition to those provided by publishers. The Handle
`System, which resolves DOIs, has an unused service that is capable of
`returning several URLs or other resolutions of a DOI.
`
`Identifiers
`
`An important question is whether effective reference linking needs identifiers
`other than URLs. The need for persistent identifiers has been widely
`advocated in a broader context than the reference linking problem. (See, for
`example, [Sollins 1994].) Yet, it can be argued that the deployment of general
`purpose Uniform Resource Names (URNs) has been slow and that wonderful
`systems have been built on the web using nothing more than URLs.
`
`While it might be possible to build a reference linking model that does not
`presume the existence of identifiers, this seems unwise. Use of identifiers
`improves the reference linking model in a number of ways. Identifiers
`associated with works provide the primary means of clustering multiple copies
`of those works. The existence of the identifier allows citation lookup and
`resolution steps to be performed by different software systems, and facilitates
`distributed resolution. It provides management benefits for those running
`reference lookup and resolution services. Above all, the identifier gives
`permanence to a reference beyond the life span of any particular computer
`system. Given the overwhelming practical benefits of the identifier, it seems
`best to treat identifiers as a necessary part of the general model, while
`acknowledging there may be special cases in which they can be omitted.
`
`Perhaps the most compelling argument that identifiers are needed for reference
`linking is that all current systems find them necessary. For ISI the identifier is
`a private key. The Astrophysics Data System has its own BibCode, and
`PubMed uses a PubMed ID. Digital Object Identifiers (DOIs) are an
`implementation of a Uniform Resource Name; they are public identifiers
`intended to be used wherever the item needs to be identified. DOIs are
`managed and resolved through the CNRI Handle System [Handle]. BibCodes
`and PubMed IDs were not explicitly intended to be Uniform Resource
`Names, but can be considered as such. They satisfy the commonly accepted
`criteria of persistence and global uniqueness, and are supported by openly
`accessible resolution systems.
`
`Identifiers for reference linking must meet three functional requirements. The
`first two are generic; the third is specific to reference linking.
`
`Persistence
`
`An identifier must be persistent, or at least, have enough organizational
`and technical structure around it to ensure some degree of reliability.
`This excludes informal and unmanaged identifier systems, but does not
`preclude well-managed local and proprietary identification schemes
`(such as the PubMed ID).
`
`Uniqueness
`
`An identifier must be unique within its own namespace. The model
`assumes multiple systems of identifiers, and there is no way to
`guarantee an identifier will be universally unique, that is, that a
`particular identifier string will not resolve to different items within
`different resolution systems. However, identifiers must be unique within
`a single system of resolution. It is also reasonable to expect that
`uniqueness will be preserved within the larger universe if the namespace
`assignations are well-managed.
`
`Multiple resolution
`
`A system of identifiers must be capable of supporting resolution to
`multiple items. In the model, it is assumed that multiple copies of a
`creation may exist, and that it must be possible to get from an identifier
`to all copies or to the subset of copies most appropriate for the user. (A
`URL, which by definition resolves to a single location, cannot satisfy
`this requirement, though it is possible for a URL to point to a web page
`containing a list of URLs for various copies of the article. This does not,
`however, easily support automatic resolution to the most appropriate
`copy.)
`
`DOIs, PubMed IDs, and astrophysics BibCodes all satisfy these requirements.
`
`It has been suggested that actual identifiers are unnecessary, as citation
`information can be used to calculate a key to the article on the fly. However,
`this key must be either a URL or a string that resolves to one or more URLs. If
`the calculated key is a URL, it does not support the reference linking model
`because of the requirement to support resolution to multiple copies of an item.
`If the key is a string that can be resolved to one or more URLs, then that key is
`in fact an identifier which, if persistent and unique within its namespace, fits
`within this model.
`
`Obtaining an identifier from a citation
`
`In a recent paper, Van de Sompel and Hochstenbach [Van de Sompel 1999a]
`provide a categorization of the techniques used to obtain an identifier from a
`citation. In particular, they list the three following options.
`
`Calculation of identifiers
`
`In well-determined bodies of information, it may be possible to use an
`algorithm to calculate the identifier from the citation.
`
`Static reference databases
`
`Figure 1 shows the construction and use of a static database of
`references. With static linking, all reference links within a work are
`pre-computed, ready for clients to invoke. This is effective within a
`well-defined body of literature, such as scientific journals, where the
`publishers enter metadata about each digital object into a database on
`publication and use that database for establishing subsequent references.
`
`Dynamic linking
`
`In general, not all references can be or need to be precomputed. The
`term "dynamic linking" covers a variety of techniques for computing
`references only when required by a user. The approach of the Open
`Journal project is to compute links when a user downloads a page. The
`SFX system has just-in-time resolution [Van de Sompel 1999b]. When a
`client attempts to link to a reference, SFX attempts to resolve it. A
`major advantage of dynamic linking is flexibility: it allows links to
`materials only recently brought online and it permits forward references.
`Another advantage is that, unlike static linking, dynamic linking can be
`utilized in situations where not all of the resources in question are under
`the control of the linking service, a concept exploited in the SFX
`system. The major disadvantage is that dynamic linking is probabilistic:
`there is no guarantee a link will actually resolve to a valid item.
`
`Following this analysis, for static reference linking, identifiers to journal
`articles may be obtained in three ways:
`- a citation can contain an identifier;
`- the bibliographic information within a citation can be used to
`  calculate an identifier;
`- the bibliographic information within a citation can be used to look up
`  the identifier in a reference database.
`
`If an identifier is embedded in a citation, the step of querying a reference
`database for the identifier is obviously unnecessary. Hopefully, the practice of
`including an identifier explicitly with a citation will increase, but it can never
`be depended upon.
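
The three routes can be read as a fallback chain, sketched below in Python. The `id:` pattern for an embedded identifier and the `calculate` and `lookup` callables are hypothetical placeholders for whatever a real system would supply; the sketch only shows the order in which the routes would be tried.

```python
import re

# Hypothetical pattern for an identifier printed in a citation, e.g. "id:abc-123".
EMBEDDED_ID = re.compile(r"\bid:[A-Za-z0-9.-]+")


def obtain_identifier(citation_text, calculate=None, lookup=None):
    """Try an embedded identifier, then calculation, then database lookup.

    Returns a pair (identifier, route); identifier is None if every route fails.
    """
    match = EMBEDDED_ID.search(citation_text)
    if match:
        return match.group(0), "embedded"
    if calculate is not None:
        identifier = calculate(citation_text)
        if identifier is not None:
            return identifier, "calculated"
    if lookup is not None:
        identifier = lookup(citation_text)
        if identifier is not None:
            return identifier, "looked up"
    return None, "unresolved"
```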
`
`Calculation of identifiers
`
`In well-determined bodies of information, it is possible to use an algorithm to
`calculate the identifier from the citation. As a successful example, the
`astrophysics BibCode can be calculated from standard bibliographic
`information, such as the name of the publication, volume, and pagination. It
`takes advantage of the standardization possible within a tight community with
`a small number of prominent journals. The success of the BibCode shows that,
`in small domains, it is possible to extract metadata fields automatically, which
`can be assembled into a key, with high accuracy.
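
As an illustration of how such a key can be assembled, the Python sketch below builds a fixed-width string from year, journal abbreviation, volume, page, and first author initial. It loosely follows the published 19-character BibCode layout, but it is a simplification: the padding rules shown and the fixed "." qualifier are assumptions of this sketch, and the Astrophysics Data System applies further conventions that are not modeled here.

```python
def bibcode_style_key(year, journal_abbrev, volume, page, first_author):
    """Assemble a fixed-width, 19-character key from citation fields."""
    journal = journal_abbrev[:5].ljust(5, ".")   # left-justified, dot-padded
    vol = str(volume)[-4:].rjust(4, ".")         # right-justified, dot-padded
    pg = str(page)[-4:].rjust(4, ".")            # right-justified, dot-padded
    initial = first_author.strip()[:1].upper() or "."
    qualifier = "."                              # no section qualifier in this sketch
    return f"{int(year):04d}{journal}{vol}{qualifier}{pg}{initial}"
```

With the invented values `bibcode_style_key(1999, "ApJ", 512, 100, "Example")`, the function returns `1999ApJ...512..100E`. Because every field has a fixed width and position, two parties holding the same citation data arrive at the same key without consulting a database.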
`
`The Serial Item and Contribution Identifier (SICI) standard provides a set of
`rules for calculating identifiers for journal articles [SICI 1996]. It combines
`the ISSN with data about the volume and issue, data identifying the location of
`the article, and a constructed title code for the article. When all basic
`bibliographic data are available for constructing the SICI, the identifier is
`consistent and highly likely to be unique. However, in the real world, citations
`are not always complete or fully uniform. The SICI standard allows the
`identifier to be constructed from the best available information, meaning that
`SICIs for the same article created from different citation sources could vary.
`
`This illustrates the general flaw of calculable identifiers. So long as the data
`from which the identifiers are calculated can be closely controlled, calculable
`identifiers can work reliably. However, the more variation there is in sources
`of citations, the higher the likelihood this data will vary. Thus, as the number
`of journals, publishers, abstracting and indexing databases, and end-user
`citation formats increases within any system of reference linking, the
`reliability of calculable identifiers correspondingly decreases. As a result, the
`working group was skeptical about the possibilities of building large-scale
`systems of reference linking that depend on automatic computation of
`identifiers from citations.
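
The difficulty can be made concrete with a small Python sketch. The key formats and the two renderings of one citation are invented for illustration; the point is only that normalization repairs some variations (case, punctuation, leading zeros) but not others (an abbreviated journal title).

```python
def naive_key(journal, volume, page):
    """Build a key directly from citation fields exactly as they were typed."""
    return f"{journal}:{volume}:{page}"


def normalized_key(journal, volume, page):
    """Build the key after light normalization of case, punctuation, and zeros."""
    cleaned = "".join(ch for ch in journal.lower() if ch.isalnum())
    return f"{cleaned}:{int(volume)}:{int(page)}"


# Two hypothetical renderings of the same article from different sources.
full = ("Example Journal of Science", "5", "1")
abbreviated = ("Ex. J. Sci.", "05", "001")

# Normalization reconciles the numbers but not the abbreviated title.
same_after_normalizing = normalized_key(*full) == normalized_key(*abbreviated)
```

Here `same_after_normalizing` is `False`: reconciling the two titles would require a table of journal abbreviations, which is in effect a reference database.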
`
`In larger but well-structured domains, such as scientific journal articles, it is
`possible to extract metadata fields automatically, which can be assembled into
`a key, with good but not perfect accuracy. While the tools may not be precise
`enough to generate calculable identifiers, they are invaluable for preliminary
`analysis augmented by human editors. Both the Scholarly
`Link Specification Framework (SLinkS) [Hellman 1999] and the method
`developed by ISI define a set of templates that correspond to the citation
`formats used by various publishers. A related project is the work of Lawrence
`and colleagues at NEC [CiteSeer]. Their ScienceIndex project (formerly
`known as CiteSeer) has developed a number of tools for extracting citation
`data automatically from documents, particularly those in PostScript. The Open
`Journal project has also built tools for extracting citations. All these tools are
`available to other researchers.
`
`Reliable templates depend upon the consistency with which publishers format
`citations. ISI, which has probably the most expertise in this area, finds that
`templates are extremely useful, but a substantial number of citations need
`manual processing to extract the correct metadata. Experience with
`multidisciplinary collections indicates that, outside the hard sciences, the
`ability to match citations accurately on the first try drops substantially and
`additional processing is required.
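
A template-based extractor of this general kind might look like the following Python sketch. The two citation styles and their regular expressions are hypothetical and are not the templates used by SLinkS or ISI; a citation that matches no template is returned as `None`, standing for the cases that need manual processing.

```python
import re

# Hypothetical templates, one per citation style.
TEMPLATES = [
    # Style A: "Smith, J. (1999). Journal Name, 5(7), 12-20."
    re.compile(
        r"^(?P<author>[^(]+?)\s*\((?P<year>\d{4})\)\.\s*"
        r"(?P<journal>[^,]+),\s*(?P<volume>\d+)\((?P<issue>\d+)\),\s*"
        r"(?P<page>\d+)"
    ),
    # Style B: "J. Smith, Journal Name 5, 12 (1999)"
    re.compile(
        r"^(?P<author>[^,]+),\s*(?P<journal>.+?)\s+(?P<volume>\d+),\s*"
        r"(?P<page>\d+)\s*\((?P<year>\d{4})\)"
    ),
]


def parse_citation(text):
    """Return extracted fields, or None to flag the citation for manual review."""
    for template in TEMPLATES:
        match = template.match(text.strip())
        if match:
            return match.groupdict()
    return None
```

Each new publisher format requires another template, which is why consistency of formatting matters so much to the approach.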
`
`Reference lookup
`
`If the identifier is neither embedded nor calculable, lookup in a reference
`database is required. The reference database contains metadata linked to
`identifiers for works (and possibly also for manifestations of works). The
`database system receives a query derived from a citation and returns the
`identifier associated with that citation.
`
`The act of reference lookup does not necessarily have to be implemented as a
`separate step, with a separate database, from the resolution of the identifier, as
`shown in the model. However, lookup and resolution are conceptually distinct
`steps, and they are likely to be implemented as separate systems. Different
`agencies may want to provide the different services. Also, citation lookup may
`require more processing power than resolution, arguing for technical
`separation. Further, it cannot be expected that every lookup of citation
`information will yield unique, unambiguous results. Lookups resulting in
`more than one hit may require some negotiation with the party initiating the
`lookup, or may return multiple identifiers, leaving it up to the user to select
`which to resolve. Functionally, this complexity is best dealt with by separating
`resolution from lookup.
`
`Several reference lookup services are likely to exist, and it can be expected
`that the databases will not necessarily have unique content, so the same
`citation could be successfully queried in more than one reference database.
`For example, both PubMed and the IDF system could have information about
`a single journal article. Different lookup services could provide different types
`of identifiers (e.g., PubMed IDs, DOIs); more than one service may also
`provide the same type of identifier. In the simplest case, the user would choose
`the lookup system and enter the query through a standard interface. Possibly,
`there could be a registry of lookup services, which a searcher could use to find
`the most appropriate. If there were only a small number of lookup sites, front
`end software could be written to search them all simultaneously. However, for
`these front ends to return intelligible results to the user, it may be necessary to
`standardize the response formats from the various lookup sites.
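
A front end of this kind is easy to sketch once the response format is uniform. In the Python below, the `Hit` tuple is an assumed common response format, and the two services, their contents, and the identifiers they return are invented stand-ins rather than real PubMed or IDF interfaces.

```python
from typing import Callable, Dict, List, NamedTuple


class Hit(NamedTuple):
    service: str      # which lookup service answered
    scheme: str       # identifier type, e.g. "doi" or "pmid"
    identifier: str


# Two stand-in lookup services with made-up contents and identifiers.
def service_a(query: str) -> List[Hit]:
    table = {"example article": "10.9999/example.1"}
    return [Hit("A", "doi", table[query])] if query in table else []


def service_b(query: str) -> List[Hit]:
    table = {"example article": "00000001"}
    return [Hit("B", "pmid", table[query])] if query in table else []


# Registry of lookup services, listed by enumeration.
REGISTRY: Dict[str, Callable[[str], List[Hit]]] = {"A": service_a, "B": service_b}


def search_all(query: str) -> List[Hit]:
    """Query every registered service and merge the uniform responses."""
    hits: List[Hit] = []
    for name in sorted(REGISTRY):
        hits.extend(REGISTRY[name](query))
    return hits
```

Because every service returns the same `Hit` structure, the front end can present results from different identifier schemes side by side and leave the choice to the user.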
`
`Metadata for reference lookup
`
`A key issue for the lookup service is what metadata is needed to support
`reference lookup. It is useful to define a minimum set of data elements
`sufficient to support most queries, to be implemented by all providers of
`lookup services. This minimum element set becomes the definition of a
`minimal citation guaranteed to support successful lookup, assuming an
`appropriate reference database is selected for the query. Several publishers
`were insistent that the list of elements be kept short. They do not want the
`reference database to become an inferior indexing service that competes with
`their higher quality products.
`
`During the recent series of meetings, publishers and librarians reached
`considerable agreement about the necessary metadata fields for journal
`articles. Appendix 1 is an informal comparison of the metadata elements
`included in several different working systems or proposals, including PubRef,
`the in-house systems used by Wiley and D-Lib Magazine, and proposals
`drafted by NFAIS and by Norman Paskin for the IDF. (Note that this was
`informally compiled and is not intended to be a definitive summary of any of
`the included schemes.) Based on this comparison, the following recommended
`minimum data element set was drafted for further discussion.
`
`1. Title: Title of the journal article.
`
`2. Creator(s): Author(s) of the journal article. The first author at a
`minimum should be included; subsequent authors may be included at
`the discretion of the metadata provider.
`
`3. Journal Title: Title or title equivalent of the journal in which the
`article is published. An unambiguous key number, such as ISSN or
`CODEN, could function as a title equivalent.
`
`4. Date: Publication date of the article or the official chronology of the
`journal issue containing the article. Chronology is the published
`designation or "issue date" (e.g., May/June 1999).
`
`5. Enumeration: The numbering designation of the journal issue
`containing the article. Enumeration generally includes volume and issue
`number, and may include other designations such as Part, Series, etc.
`This can be omitted only if the journal itself has no official enumeration,
`as is currently the case with a small number of electronic-only journals.
`
`6. Location: Starting page number of the article, or, if there is no
`pagination, assigned article number.
`
`7. Type: Type of material, in this case probably "journal article". It is
`assumed that the provider of the reference database will wish to provide
`a code for the type of entity being described, in order to distinguish
`between related materials. For example, in the Wiley database, "Type"
`can have the value "Article", "Abstract", "Issue" or "Journal", since
`each of these entities has its own record in the database. It is not
`assumed that this element will be explicitly included in citations.
`However, the query interface to the reference database might be able to
`provide this value by inference, default, or even, in some cases, asking
`the user. (Another useful value for this element might be "Database
`Record" -- e.g., to indicate that the entity found is an ISI record or a
`PubMed record, or a library holding record, as opposed to the actual
`article itself.)
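
As a rough illustration, the seven elements could be held in a record such as the Python dataclass below. The field names and the validation rule are assumptions of this sketch, not part of any of the schemes compared in Appendix 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical record holding the recommended minimum element set."""
    title: str
    creators: List[str]                 # at least the first author
    journal_title: str                  # or a title equivalent such as an ISSN
    date: str                           # publication date or issue chronology
    location: str                       # starting page or assigned article number
    enumeration: Optional[str] = None   # absent only for unnumbered journals
    material_type: str = "journal article"

    def __post_init__(self):
        # The element set requires the first author at a minimum.
        if not self.creators:
            raise ValueError("at least the first author is required")
```

The default for `material_type` reflects the expectation that the value will usually be supplied by inference rather than taken from the citation itself.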
`
`This metadata set is compatible with the metadata currently collected by
`PubRef, and with the metadata set proposed for reference linking by Norman
`Paskin for the IDF. The working group attempted to relate these elements to the
`Dublin Core, but found difficulty in representing the relationship between a
`journal article and the journal in which it is published. This problem may be
`solved in the near future, as the Dublin Core Working Group on Bibliographic
`Citations is in the process of drafting guidelines for a standard way of
`representing citation information in both simple and qualified Dublin Core. It
`is hoped that these guidelines will accommodate all data elements in the
`recommended minimum set.
`
`It is recognized that several of these elements must actually refer to a selected
`manifestation of a work. It is also recognized that the descriptions are
`imprecise in their specification of the data which would be supplied to
`populate these elements in an actual database; within each element there may
`be differing definitions as well as multiple definitions (e.g., "Date" may
`include publication date and/or issue date). But these issues can be handled
`successfully in a real-world implementation as long as precise element
`definitions are specified on the input side, while looser formulations are
`permitted to be successful on the query side. For example, in the case of the
`"Date" element, a database might be structured hierarchically such that the
`"Date" branch included both "Publication Date" and "Issue Date" elements.
`While database population would have to follow very precise rules regarding
`which information would be permitted in each field, on the query side the
`rules could afford to be much looser: e.g., if a query were "smart" enough to
`seek "Issue Date" specifically, it could do so, but if it only knew enough to
`seek a "Date," then the query processor could easily consider the values of all
`"Date" related fields, or else the one deemed most likely to be useful as a
`default answer.
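
The "Date" example can be sketched as follows in Python. The record layout, the field names, and the sample dates are invented for illustration; the sketch shows a strict structure on the input side and a query that may either name a specific field or fall back to any "Date" field.

```python
# A stored record keeps the two kinds of date in separate, precisely defined fields.
record = {
    "date": {
        "publication_date": "1999-07-15",   # invented value
        "issue_date": "July/August 1999",
    }
}


def match_date(record, value, field=None):
    """Match a specific date field if one is named, else any field under "date"."""
    dates = record["date"]
    if field is not None:
        return dates.get(field) == value
    return value in dates.values()
```

A query that knows it holds an issue date can call `match_date(record, "July/August 1999", field="issue_date")`, while a less informed query omits `field` and still succeeds.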
`
`Resolution of the identifier
`
`To resolve an identifier, a client sends it to a location database, which returns a
`list of locations where copies of the creation are stored. Extra information may be
`associated with each location to help the client select a specific location. For
`efficiency, it is desirable to have multiple resolvers for each type of identifier
`so that the processing load can be shared and resolution can be routed to the
`geographically closest server. The design of the Handle System supports
`high-performance distributed resolution of DOIs. The other types of identifiers
`use database lookup with mirroring.
`
`Since there are several types of identifier, a client must know what location
`databases support resolution of which types of identifiers. Under the simplest
`model, the identifier itself determines the resolver; a DOI is submitted to the
`DOI resolver, a PubMed ID is submitted to the PubMed/PubRef resolver, etc.
`Although not implemented, various automatic mechanisms have been
`proposed for registration of identifiers and for finding servers supporting
`resolution of the various namespaces. For the near future, the number of
`services is likely to be small enough that they can be listed by enumeration.
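
Under the simplest model, an enumerated list of resolvers is all a client needs. The Python sketch below assumes identifiers are written as "scheme:value" and uses placeholder base URLs; neither the notation nor the addresses correspond to the actual DOI, PubMed, or ADS services.

```python
# Enumerated resolvers, keyed by identifier scheme. Base URLs are placeholders.
RESOLVERS = {
    "doi": "http://doi-resolver.example.org/",
    "pmid": "http://pubmed-resolver.example.org/",
    "bibcode": "http://ads-resolver.example.org/",
}


def resolver_url(identifier):
    """Map "scheme:value" to a URL at the resolver responsible for that scheme."""
    scheme, sep, value = identifier.partition(":")
    if not sep or scheme.lower() not in RESOLVERS:
        raise LookupError(f"no known resolver for {identifier!r}")
    return RESOLVERS[scheme.lower()] + value
```

If the number of namespaces grew beyond what can be listed by hand, the dictionary would have to be replaced by one of the automatic registration mechanisms mentioned above.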
`
`Selective resolution
`
`While some implementation of identifier-based resolution of namespaces as
`described above is necessary, it is not in itself sufficient, because it does not
`accommodate the need for selective resolution. This
`requirement, which has come to be known as "the Harvard problem", was
`described in the report of the first workshop as follows:
`
`In many cases there will be multiple copies of the same article available.
`For example, an Elsevier journal may be available in Science Direct, in
`Michigan's PEAK database, through OhioLink, etc. Many legitimate
`reasons for multiple copies exist, including performance (caching),
`different service m