Information Retrieval*
`
`Donald B. Crouch, Carolyn J. Crouch and Glenn Andreas
`
`Department of Computer Science
`University of Minnesota - Duluth
`320 Heller Hall
`Duluth, Minnesota 55812
`
`ABSTRACT
`
`The graph-traversal approach to hypertext information retrieval is a conceptualization of
`hypertext in which the structural aspects of the nodes are emphasized. A user navigates
`through such hypertext systems by evaluating the semantics associated with links
between nodes as well as the information contained in nodes. [Fris88] In this paper we
describe a hierarchical structure which effectively supports the graphical traversal of a
document collection in a hypertext system. We provide an overview of an interactive
browser based on cluster hierarchies. Initial results obtained from the use of the browser
in an experimental hypertext retrieval system are presented.
`
`INTRODUCTION
`
`Information retrieval is concerned with the representation, storage and retrieval of
documents or document surrogates. Information retrieval activities are routinely conducted
on-line under the control of search intermediaries or end users who have been trained to
use somewhat complex user-system interfaces. However, poor query formulations and
inadequate user-system interaction still occur even with skilled users. For example,
Cleverdon has noted that “if two search intermediaries search the same question on the
same database on the same host, only 40 percent of the output may be common to both
`searches.” [Clev84]
`
`What is being done to aid users of information retrieval systems? The most common
`approaches are generally directed either toward the development of aids based on
sophisticated user interfaces or toward the development of expert system techniques for the
more complex operations of text retrieval systems. [Crou89] Research involving
sophisticated user interfaces is primarily concerned with system functioning and
convenience as it relates to the user; its goal is to facilitate the use of the system by
providing computerized aids previously available only to the search intermediary in non-
computerized forms. Among the facilities normally included in systems of this type are
vocabulary displays, thesaurus expansion of vocabulary items designed to add related
terms to already existing search words, the construction and storage of search protocols,
operations with previously formulated queries, etc. While this type of research is
warranted and its results encouraging, it has not necessarily produced more effective
retrieval but instead has generated tools for effortless learning and use of an information
retrieval system.
`
The other major area of research in the development of user aids for information retrieval
`is concerned with the design of expert systems that facilitate access to the stored
`
`*This work was supported by the National Science Foundation under grant IRI 87-02735.
`
`Hypertext '89 Proceedings
`
`November 1989
`
`
`collections. The goal of such research is to capture the expertise of search intermediaries
`in formulating Boolean queries and in dealing with other types of retrieval services. The
`expert approach is based on the use of domain-specific knowledge that covers the topic
`areas represented by the collection, a language analyzer that can understand natural
`language queries and translate them into appropriate internal forms, and rules for search
`formulation and search strategy designed to choose search methods based on user criteria.
`It may eventually become feasible to generate search formulation criteria in the form of
`rules that do in fact reflect the expert knowledge of trained search intermediaries. However,
for the time being, one has reason to be careful in accepting many of the currently
unevaluated design proposals for expert system approaches as effective solutions to the
retrieval problem.
`
We submit that a viable alternative to using either very sophisticated user interfaces or
`expert systems as a solution to the retrieval problem consists of using only simple user-
`system interactions which enhance the effectiveness of retrieval operations through the
`addition of properly designed user friendly features. These features allow the user to
`function in an active role, replacing the full natural language comprehension which is
`desirable yet currently unavailable in an automatic search expert.
`
`This approach to interface design is inherent in the concept of hypertext information
`retrieval. Hypertext supports a user's exploration of informational data items by
`representing data as a network of nodes containing text, graphics and other forms of
information. [Smit88] A user may navigate through the hypertext system by following
the links between nodes. The path a user follows is determined by his/her analysis of the
information contained within the nodes and the semantics associated with links between
the nodes. [Fris88]
`
`In hypertext information retrieval, each node is generally assumed to be a single
document. Links exist which connect each document to other documents having keywords
in common with it; the semantics of the links between nodes are keywords (document
index terms) or some descriptive information representing the connected documents. In
this paper we introduce a hierarchical structure which provides additional semantic
information within and between nodes. This structure seems particularly well suited to the
user's exploration of a document collection in a visual context. The user may browse
among the data items by analyzing a graphical display of the structure itself as well as the
`semantic links between nodes.
`
`In the next two sections, we briefly describe the retrieval model and the characteristics of
`the hierarchy on which our structure is based. We then describe a prototype of a hypertext
`retrieval system utilizing the cluster hierarchy and present the initial results of an
`experiment comparing retrieval performance of the hypertext system with that of an
`automatic retrieval system.
`
`INFORMATION RETRIEVAL MODELS
`
The most common information retrieval models are the Boolean retrieval model and the
vector space model. These two models are briefly described and their use in conventional
`information retrieval systems examined.
`
`Boolean Retrieval Model
`
Most retrieval systems are based on the Boolean model. Queries are expressed as a set of
terms connected by the Boolean operators and, or and not. Such systems retrieve
information by performing the Boolean operations on the corresponding sets of
documents containing the query terms. Although the Boolean model can be used
effectively in automatic text retrieval (in fact, a query can be formulated to retrieve any
particular subset of items), imprecise or broad requests utilizing the or relation can result
in the retrieval of large numbers of irrelevant texts while narrow or overly precise queries
utilizing the and relation can exclude many relevant items. In practice a compromise is
often obtained by the use of a query formulation that is neither too broad nor too narrow.
[Salt86]
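
The set operations behind Boolean retrieval can be illustrated with a minimal sketch. The inverted index, the three toy documents and the function names below are assumptions made for illustration; they are not taken from any system discussed in this paper.

```python
def build_index(docs):
    """Map each term to the set of ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every term (narrow queries)."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(index, *terms):
    """Documents containing at least one term (broad queries)."""
    return set().union(*(index.get(term, set()) for term in terms))


def boolean_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term, set())


# Hypothetical three-document collection.
docs = {
    1: "renal lesions in lupus",
    2: "lupus therapy",
    3: "renal transplant",
}
index = build_index(docs)

narrow = boolean_and(index, "lupus", "renal")   # only document 1
broad = boolean_or(index, "lupus", "renal")     # all three documents
```

Even in this tiny collection the or query returns every document while the and query returns a single one, which is the breadth trade-off described above.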
`
Although the Boolean model has been widely accepted, it does have its problems:

*  Boolean queries are difficult to construct; intermediaries are generally required to
   add terms not originally included, provide synonyms or alternate spellings, drop
   high-frequency terms, etc. [Fox86],

*  Boolean systems generally do not provide for the assignment of term weights,

*  the size of the subset of documents to be returned is difficult to control, and

*  the retrieved documents are usually presented in a random order (no ranking based
   on an estimate of the query-document relevance is provided).
`
The difficulties associated with the construction of Boolean queries are well known. One
author recently commented that “research and development in information retrieval since
`the 1950's has concentrated on methods which can provide better retrieval without the
`need for Boolean queries.” [Colv86]
`
`Vector Space Model
`
`The vector space model is conceptually the simplest retrieval model and is better suited
`for use in hypertext retrieval systems than the Boolean model. In the vector space model,
the content of each document or query is represented by a set of possibly weighted content
terms (i.e., some form of content identifier, such as a word extracted from the document
text, a word phrase, or concept class chosen from a thesaurus). A term's weight reflects
its importance in relation to the meaning of the document or query. Each informational
item (document) may then be considered a term vector, and the complete document
collection becomes a vector space whose dimension is equal to the number of distinct
terms used to identify the documents in the collection. [Rijs79, Sali83]
`
In the vector space model, it is assumed that similar or related documents or similar
documents and queries are represented by similar multidimensional term vectors.
`Similarity is then generally defined as a function of the magnitudes of the matching terms
`in the respective vectors.
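
As a minimal sketch of these ideas, documents and queries can be held as sparse term vectors and compared through their matching terms. Raw term counts are used as weights and the inner product as the similarity function; both are simplifying assumptions, as are the toy documents and the function names.

```python
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of term -> weight (raw count here)."""
    return Counter(text.lower().split())


def inner_product(u, v):
    """Sum the products of the weights of the terms the two vectors share."""
    return sum(weight * v[term] for term, weight in u.items() if term in v)


# Hypothetical three-document collection.
docs = {
    "d1": "renal lesions lupus",
    "d2": "lupus therapy",
    "d3": "renal transplant",
}
vectors = {doc_id: term_vector(text) for doc_id, text in docs.items()}

# The dimension of the vector space is the number of distinct terms.
dimension = len(set().union(*vectors.values()))

query = term_vector("lupus renal")
scores = {doc_id: inner_product(query, vec) for doc_id, vec in vectors.items()}
```

Here d1 shares two terms with the query and therefore scores higher than d2 or d3, which share one term each.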
`
A vector representation of documents and queries facilitates certain retrieval operations,
namely:

*  The construction of a clustered document file (consisting of classes of documents
   such that documents within a given class are substantially similar to each other).
   In clustered collections, an automatic search can be limited to the documents
   within those clusters whose class vector representations are similar to the query
   vector.

*  The ranking of retrieved documents in decreasing order of their similarity with
   the query.

*  The automatic reformulation of the query based on relevance assessments
   supplied by the user for previously retrieved documents. The intent of relevance
   feedback is to produce a modified query whose similarity to the relevant
   documents is greater than that of the original query while its similarity to the
   nonrelevant items is smaller.
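
The query reformulation step in the last item can be sketched as follows. The paper does not say which feedback formula is intended; the sketch assumes a Rocchio-style update, in which the mean relevant vector is added to the query and the mean nonrelevant vector is subtracted from it. The parameters alpha, beta and gamma and their default values are illustrative assumptions.

```python
def reformulate(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update (an assumption, not the paper's stated method).

    query, and every element of relevant / nonrelevant, is a dict of
    term -> weight. Terms whose weight ends up non-positive are dropped.
    """
    new_query = {term: alpha * weight for term, weight in query.items()}
    for docs, factor in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                new_query[term] = new_query.get(term, 0.0) + factor * weight / len(docs)
    return {term: weight for term, weight in new_query.items() if weight > 0}


# Hypothetical vectors: one document judged relevant, one judged nonrelevant.
query = {"a": 1.0}
relevant = [{"a": 1.0, "b": 1.0}]
nonrelevant = [{"a": 1.0, "c": 1.0}]

new_query = reformulate(query, relevant, nonrelevant)
```

The modified query gains term b from the relevant document and never acquires term c, so it moves toward the relevant item and away from the nonrelevant one.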
`
The vector processing model also exhibits certain disadvantages, namely:

*  Some model parameters, such as the query-document similarity function, are not
   derivable within the system but instead are chosen a priori by the system
   designer.

*  Terms are assumed to be independent of one another.

*  Term relationships are not expressible within the model.
`
A recent characterization of the vector space model is contained in [Wong84].
`
`CLUSTERED DOCUMENT ENVIRONMENTS
`
`A principal advantage of the vector space model for use in hypertext information retrieval
is that algorithms exist for structuring a document collection in such a manner that
similar documents are grouped together. A cluster hierarchy is represented by a tree
structure in which terminal nodes correspond to single documents and interior nodes to
groups of documents. In a hypertext system based on a clustered environment, the user
can readily focus his/her search on those groups (clusters) that are likely to contain
documents which are highly similar to the query. Additionally, the cluster hierarchy is
beneficial as a browsing tool in that it makes it easy to locate neighboring
items with related subject descriptions.
`
`Agglomerative Cluster Hierarchy
`
Voorhees [Voor85] has shown that retrieval effectiveness may be enhanced in automatic
retrieval systems when a type of clustering, known as agglomerative hierarchic clustering,
is used to generate a cluster structure. In such a clustering method, each document in the
`collection is considered initially to be a singleton cluster. The two closest clusters are
`successively merged until only one cluster remains. The definition of closest depends on
`the actual clustering method being used.
`
`Fig. 1 contains an example of a hierarchy for the single link agglomerative clustering
`method. In the single link method the similarity between two clusters is the maximum of
the similarities between all pairs of documents such that one document of the pair is in
one cluster and the other document is in the other cluster. It may be noted that in the
`hierarchy documents may appear at any level and that clusters overlap only in the sense
`that smaller clusters are nested within larger clusters.
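
The single link procedure can be sketched in a few lines. This is a naive illustration, not the clustering code used in the experiments: it recomputes cluster similarities at every step, and the pairwise similarities are hypothetical values chosen so that B, C and D join at 0.9 and A joins them at 0.7.

```python
def single_link(sim, items):
    """Naive single link agglomerative clustering.

    sim maps frozenset({a, b}) to the similarity of documents a and b.
    Returns the merges in order as (level, merged_cluster) pairs.
    """
    clusters = [frozenset([item]) for item in items]
    merges = []

    def link(c1, c2):
        # Single link: the maximum similarity over all cross-cluster pairs.
        return max(sim[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the two closest clusters and merge them.
        level, i, j = max(
            (link(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        merged = clusters[i] | clusters[j]
        merges.append((level, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical pairwise similarities for four documents.
pairs = {
    ("B", "C"): 0.9, ("C", "D"): 0.9, ("B", "D"): 0.5,
    ("A", "B"): 0.7, ("A", "C"): 0.3, ("A", "D"): 0.2,
}
sim = {frozenset(pair): value for pair, value in pairs.items()}
merges = single_link(sim, "ABCD")
```

Note that B and D end up in the same cluster at level 0.9 although their own similarity is only 0.5; a single strong link to C is enough, which is characteristic of the single link method.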
`
Each cluster in Fig. 1 is labelled with the level of association between the items under it.
The clustering level determines the association strength of the corresponding items. Thus
the similarity between items B, C and D in Fig. 1 is 0.9. On the other hand, the
similarity between item A and the cluster containing items B, C and D is only 0.7. The
level of association is a useful link semantic in a hypertext system.
`
`Searching a Clustered Environment
`
`To retrieve documents automatically in a clustered environment, comparisons are
generally made between the query vector and document vectors using one of the standard
measures of similarity. A cluster search simplifies the search process by limiting the
search to subsets of documents. For example, with an agglomeratively clustered tree such
as that shown in Fig. 1, a straightforward, narrow, depth-first search starts at the top of
the tree and calculates the similarity between the query and each of its children. The child
most similar to the query is selected, and the similarity between the query and each of the
non-document children of that node is calculated. The process is repeated until either all
the similarities between the query and the non-document children of some node are less
than that between the query and the node itself, or all the children of that node are
document nodes. The documents comprising the cluster represented by that node are
returned. The search may be broadened by considering more than one path at each level.
`The broadest search considers all paths and abandons them as they fail certain criteria.
`
`
Fig. 1 A sample single link hierarchy
`
`A bottom-up search may also be performed on such a tree. The cluster at the lowest level
of the tree whose centroid is most similar to the query is chosen as the node at which the
search will start. The search continues up the tree until the similarity between the query
and the parent of the current node is smaller than the similarity between the query and the
current node. The documents contained in the cluster corresponding to the current node are
returned. The bottom-up search is often more effective due to the uncertainty involved at
`high levels of the hierarchy. [Crof80]
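
The narrow top-down search and the bottom-up search described in this section can be sketched as follows. The Node class, the function names, the tree and the query-cluster similarities are all hypothetical; in particular, "the lowest level of the tree" is taken here to mean clusters whose children are all documents, which is an assumption. The similarity function is passed in as a parameter so that any of the standard measures could be used.

```python
class Node:
    """A node of a cluster tree; a node without children is a document."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def is_document(self):
        return not self.children

    def documents(self):
        if self.is_document():
            return [self.name]
        return [d for child in self.children for d in child.documents()]


def top_down_search(root, sim):
    """Narrow search: follow the single most similar path down from the root."""
    if root.is_document():
        return root.documents()
    node = max(root.children, key=sim)
    while True:
        clusters = [c for c in node.children if not c.is_document()]
        if not clusters:
            break  # all children are documents
        best = max(clusters, key=sim)
        if sim(best) < sim(node):
            break  # every cluster child is less similar than the node itself
        node = best
    return node.documents()


def bottom_up_search(root, sim):
    """Start at the most similar lowest-level cluster and climb while the parent is no less similar."""
    parent, lowest, stack = {}, [], [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            parent[child] = node
            stack.append(child)
        if node.children and all(c.is_document() for c in node.children):
            lowest.append(node)
    if not lowest:
        return root.documents()
    node = max(lowest, key=sim)
    while node in parent and sim(parent[node]) >= sim(node):
        node = parent[node]
    return node.documents()


# Hypothetical tree: clusters are named after their clustering level.
bcd = Node("c0.9", [Node("B"), Node("C"), Node("D")])
abcd = Node("c0.7", [Node("A"), bcd])
fg = Node("c0.8", [Node("F"), Node("G")])
root = Node("root", [abcd, fg])

# Hypothetical similarities between the query and each cluster centroid.
scores = {"root": 0.1, "c0.7": 0.4, "c0.8": 0.3, "c0.9": 0.6}


def sim(node):
    return scores.get(node.name, 0.0)


top_down_result = top_down_search(root, sim)
bottom_up_result = bottom_up_search(root, sim)
```

With these scores both searches stop at the cluster containing B, C and D. Neither search looks inside the cluster containing F and G once another branch has been chosen.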
`
`Cluster hierarchies have been used effectively in automatic searches. Such hierarchies are
`also useful in performing searches based on browsing operations. These types of
operations, we believe, can produce significant improvement in retrieval performance.
Automatic cluster searches are highly structured; the next link in the search path is
determined solely on the basis of the similarity between the query vector and the vector
representation of the node being evaluated. By displaying suitable portions of the
hierarchy during the course of the search operations and letting the user choose appropriate
search paths at each point, the output obtained should be superior to that obtained by
automatic cluster searching. For example, in a hypertext system with an interactive
browser, following evaluation of items B, C, and D in the sample tree of Fig. 1, the user
has the choice of exploring either a tightly clustered structure containing items F and G
(which are very similar to each other with a similarity value of 0.8) or of staying in the
same cluster and evaluating item A (at a lower similarity level of 0.7). In contrast, the
control mechanism of the automatic search procedure may terminate the search at the node
labelled 0.7 and never evaluate the cluster containing items F and G. The effectiveness of
this type of user-directed, interactive browsing is determined by comparing the results of
such interactive searches to those obtained by automatic cluster searches.
`
`THE INTERACTIVE BROWSER
`
`A browser incorporating the cluster hierarchy as its primary network structure was
implemented on a Macintosh IIx computer using HyperCard. The Macintosh is connected
via a local area network to a SUN System on which the SMART information retrieval
`system [Salt71] resides. The SMART system provides packages for textual analysis,
`clustering, performance evaluation, etc.
`
To conduct a search using the browser, a user initially specifies a natural language query
which is subsequently transformed into a term vector representation via the SMART
retrieval system. The hypertext system then displays a window containing the original
`query and its corresponding term vector (Fig. 2). As suggested by the annotated display of
Fig. 3, a user may obtain the word stem associated with each concept in the term vector
as well as the document frequency of that term by clicking on a concept number in the
vector. At any point during the search process, the user may add or delete concepts from
the query vector representation or completely re-specify the query itself. The query
window also contains a list of identifiers representing the documents which the user has
`determined to be relevant to the query. Initially this list is empty; however, as the user
`conducts the search process, he/she enters documents into the list.
`
`To begin (or continue) a browse in the clustered environment, the user clicks on the
`Use Query button. The interface presents a display of the clustered document space
`represented as a complete link hierarchy. A user may begin an exploration of the cluster
tree at any point, for example, at the root node for a top-down approach or at a leaf node
(document) for a bottom-up approach. The user may prefer to initiate a search at an
interior node (a cluster) which contains one or more documents known to be similar to
the query.
`
In general, a tree representation of a clustered collection is too large to be displayed in its
entirety. Therefore, a user is presented with two views of the cluster tree simultaneously:
a local view containing the subtree within which the user is currently browsing (see
Fig. 4) and a global view, a more comprehensive view of the tree containing a
significantly larger number of nodes than the local view (see Fig. 5). A user-directed
traversal among the nodes is simultaneously reflected in both displays. The global view
permits the user to observe where the search is being conducted in relation to the entire
tree while the local view provides the user with more detailed information about a specific
`subtree.
`
As may be noted, many links and informational items are provided by the interface
system to aid the user during the browsing process. As suggested by Fig. 4, the local
view of the tree:

*  Uses different iconic representations to distinguish interior nodes (clusters) from
   leaf nodes (documents).

*  Displays for each interior node the level at which the documents cluster. The
   clustering level represents the degree of association between the items under it.

*  Lists the number of documents contained within the subtree defined by each node
   as well as the number of children of that node. This information can also be
   obtained by counting the nodes in the global view of the tree.

*  Lists the value of the correlation measure of the query vector with either the
   centroid vector or the document vector associated with each node in the subtree.
   During the search process the user may change the correlation measure being
   calculated by means of the Correlation Measure pop-up menu. At present, the
   system provides a choice of several measures including vector product, inner
   product, Tanimoto, cosine and overlap.

*  Provides a listing of the concepts contained within the query vector (see also
   Fig. 6). This information is also displayed in the query window; however, in the
   tree display, the concepts in the query are displayed in ascending order of
   document frequency. The user may alter the query by adding or deleting concepts
   from the query vector during the search process without returning to the query
   window.

*  Uses different iconic representations to distinguish relevant documents from the
   other documents in the tree. A list of the documents which the user has chosen
   as relevant to the query is maintained in the display. The user may freely insert
   document identifiers into and delete items from this list. The icons of the
   documents in this list are then highlighted in the tree representation.

*  Lists document identifiers represented by the leaf nodes of the tree.
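
Several of the correlation measures named in the list above can be sketched as interchangeable functions over sparse term vectors. The formulas below are common textbook definitions and are an assumption here, since the exact formulas used by the browser are not given; the "vector product" measure is omitted because its definition is not stated. Weights are assumed to be non-negative.

```python
import math


def inner(u, v):
    """Inner product over the terms the two vectors share."""
    return sum(weight * v[term] for term, weight in u.items() if term in v)


def cosine(u, v):
    """Inner product normalized by the lengths of both vectors."""
    norm = math.sqrt(inner(u, u)) * math.sqrt(inner(v, v))
    return inner(u, v) / norm if norm else 0.0


def tanimoto(u, v):
    """Inner product divided by the combined squared lengths minus the shared part."""
    shared = inner(u, v)
    denominator = inner(u, u) + inner(v, v) - shared
    return shared / denominator if denominator else 0.0


def overlap(u, v):
    """Shared weight divided by the total weight of the smaller vector."""
    shared = sum(min(weight, v[term]) for term, weight in u.items() if term in v)
    denominator = min(sum(u.values()), sum(v.values()))
    return shared / denominator if denominator else 0.0


# A pop-up menu could select one of these by name (hypothetical mapping).
MEASURES = {
    "inner product": inner,
    "cosine": cosine,
    "Tanimoto": tanimoto,
    "overlap": overlap,
}

# Hypothetical query and document vectors.
query = {"lupus": 1.0, "renal": 1.0}
document = {"lupus": 1.0, "renal": 1.0, "lesions": 1.0}
values = {name: measure(query, document) for name, measure in MEASURES.items()}
```

The same pair of vectors scores 2 under the inner product, about 0.82 under cosine, about 0.67 under Tanimoto and 1.0 under overlap, which shows why the ability to switch measures during a search changes what the user sees at each node.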
`
[Figure: query window, shown twice, for Query 12, "Effect of azathioprine on systemic lupus erythematosus, particularly in regard to renal lesions", with panels labelled Concepts, Weight and Relevant Docs. Annotation on the second copy: "Clicking on a concept number reveals the word forming that concept and the document frequency of the concept. It also outlines the first occurrence of that word in the query."]
`
`Fig. 3. Clicking on a concept number
`
[Figure: annotated tree display. Annotations: "Interior node with one concept number in common", "A document with no concept numbers in common", "A document with one concept number in common", "Correlation Measure", "List of concept numbers used", "List of relevant documents".]

Fig. 5 The overview of the tree
`
`
`One can obtain additional information from the local tree display by clicking on various
`items contained within the display. For example, by option clicking on a node which has
`terms in common with a query, the selected node's icon is replaced with an informational
`window listing the terms in common and the weights associated with these terms
(Fig. 7). Choosing a concept number in the query vector results in a display of the
textual description of the chosen term and its document frequency (Fig. 6). Clicking on a
terminal node (a document icon) results in the display of additional information associated
with the document. An example of a document window is contained in Fig. 8. One can
also review the query or enter a new query by selecting the To Query button. It should be
noted that each of the display windows has informational features associated with it
which support the visual search process.
`
The global view of the tree (Fig. 5) has few items associated with a node, since the
purpose of this display is to assist the user in locating his/her position within the tree
during the navigation of the cluster space. The three types of icons used in the global
view distinguish the following types of nodes:

*  A document classified as relevant by the user. As previously noted, identifiers of
   the relevant documents are highlighted in the local view.

*  A document which has not been classified as relevant.

*  The node corresponding to the central node in the local view of the tree.

These nodes are color coded to facilitate a quick review of the tree at large.
`
`
`
Fig. 6 Finding the term for a concept number in the browser
`
[Figure: local tree display with the annotation "Option-clicking on a document or node will list the common concept number (and its weight)"; the example entry reads "3340 @ 0.05781".]

Fig. 7 Getting common concept number information
`
[Figure: document card for document 1635, annotated with its parts: Id of Document, Concept, Concept Weights and Text of the Abstract. The abstract shown is "septicemia due to mycoplasma hominis type 1" by viola m. young, ph.d., and sheldon m. wolff, m.d.]

Fig. 8 Document card and its parts
`
`
The user may view (move to) other portions of the tree not currently in the local or
global windows in one of several ways:

*  By scrolling up or down or left or right in the global display window. The
   window moves across the tree in the global view in the direction represented by
   the scrolling action of the mouse.

*  By clicking on a node in the local view. In this case, the chosen node becomes
   the central node in the local view. Such an action effectively moves the local
   view up or down one level of the tree.

*  By clicking on the To Root button in the local view. This action causes the
   interface system to redraw the local view of the tree with the central node
   becoming the root node.

*  By clicking on the To Doc button in the local view. The To Doc button
   allows the user to view the document window associated with any document in
   the collection. To obtain a document window, the user must specify the
   document's identifier. Clicking on the To Tree button in the document
   window returns the user to a local view of the tree with its central node
   corresponding to the parent node of the document contained in the document
   window.
`
`The tree displays described above and the informational windows associated with items
contained in the displays are effective for representing a local cluster arrangement for a
`small collection or a local area of a larger collection.
`
`EVALUATION OF THE BROWSER
`
The SMART information retrieval system provides a general framework for conducting
retrieval experiments. SMART has fully automatic iterative search methods with
automatic relevance document classification. The means exist within the system for
evaluating the effectiveness of the retrieval process; the effectiveness of any interactive
system can be established by comparison with automatic search procedures contained in
the SMART system. SMART also provides collections of documents and corresponding
sets of queries which may be used for experimentation. Relevance assessments have been
produced by persons knowledgeable about the subject matter in the collections.
`
`In order to develop a general search strategy which a user may employ in the hypertext
retrieval system, we focused on the MEDLARS collection, a somewhat homogeneous
collection generated by the National Library of Medicine. A user-controlled cluster search
technique which performs well in a homogeneous collection will perform as well or
generally significantly better in a heterogeneous collection. MEDLARS consists of 1,033
documents in the medical field and a corresponding set of 30 queries. The document
vectors were generated from an analysis of the abstracts of the documents. The
MEDLARS collection was then clustered using a complete link clustering algorithm,
resulting in a very wide tree. The cluster hierarchy contains 76 subtrees at the root node,
and the maximum depth of the tree is ten.
`
`To assist in the development of a method for conducting a visual interactive search of a
clustered collection, we divided the MEDLARS query collection into two subsets. One set
of queries (the base set) was used to aid in the development of the methodology; the other
set was used to estimate the performance of the interactive search process.
`
`We performed an interactive search in the hypertext system for each query in the base set
`to determine the optimal search a user would follow in order to retrieve the known
`relevant documents of the query. By conducting this process for each of the 15 queries in
`the base set, we were able to observe and define the common threads linking relevant
`documents to the cluster tree. Our observations and the resulting search method that
`evolved are reported in [Andr89].
`
An important point about both phases of the experiment is that the actual text of the
document (as shown in the document cards) was never examined during the interactive
search process to determine document relevance. Doing so would of course have
substantially improved retrieval performance in an actual user-controlled search process.
However, one of our objectives when conducting this experiment was to apply some of
the insight gained from the interactive visual search to an automated system. An
`automated search system does not consult actual text during the search process; it uses
`only a vector representation of the document. We are now performing extensive testing
`with larger collections which does not place such a severe constraint on a user.
`
Once we had developed some search guidelines, the fifteen remaining queries in the
MEDLARS collection were processed using the developed search procedures. The list of
relevant documents was of course empty at the beginning of each search process.
`The companion global tree viewing program was not needed in a search, since a frequent
`user of the browser system has little problem with navigation of the search tree; novice
`users would certainly want to use the companion program, however. For each query, the
`query text was initially inspected prior to the navigation, and the resulting query vector
`edited as needed. Depending on the intermediate results obtained and the general feedback
`gained from the browsing process, the query vector was often modified to produce
`additional relevant documents.
`
On average, we were able to retrieve 55% of the relevant documents for the queries in
the test set. The automatic retrieval system had a recall value of only 32%. Thus, even
`without taking full advantage of the information linked to a node (namely, the document
`abstract itself), use of the interactive browser yielded a significant improvement over
`automatic cluster searches. Additionally, use of the hypertext system resulted in the return
`of slightly fewer irrelevant documents; with the browser, 25% of the documents found
`were irrelevant compared to 28% in the automatic system.
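
The two figures reported above correspond to recall (the fraction of the relevant documents that were retrieved) and to the fraction of the retrieved documents that were irrelevant. A minimal sketch of both computations follows; the function names and the document identifiers are made up for illustration and are not the experimental data.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def irrelevant_fraction(retrieved, relevant):
    """Fraction of the retrieved documents that are not relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved - set(relevant)) / len(retrieved)


# Hypothetical result for a single query.
retrieved = [1, 2, 3, 4]
relevant = [1, 2, 3, 5, 6, 7]

query_recall = recall(retrieved, relevant)
query_irrelevant = irrelevant_fraction(retrieved, relevant)
```

In this made-up case three of the six relevant documents are found (recall 0.5) and one of the four retrieved documents is irrelevant (0.25). Averaging such per-query values over a test set gives figures of the kind reported above.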
`
`CONCLUSION
`
`Our immediate objective in this work was to produce a retrieval system that allows easy
and accurate searching and browsing of a document collection. The representation of a
collection as a cluster hierarchy was shown to provide a solid basis on which to build a
hypertext retrieval system. The interactive browser is believed to be sufficiently
comprehensive and flexible to support a variety of experiments designed to
evaluate the effects of user control, user intervention, and the visual analysis of graphical
data representation on retrieval performance during the cluster search process. The dynamic
nature of the HyperCard environment on which the browser is based is well suited to meet
`the need of flexibility required for these tasks.
`
`REFERENCES
`
[Fris88]  M. E. Frisse, Searching for Information in a Hypertext Medical Handbook.
          Communications of the ACM, 31:7 (1988), pp. 880-886.

[Clev84]  C. W. Cleverdon, Optimizing Convenient On-Line Access to Bibliographic
          Databases. Information Service Use, 4 (1984), pp. 37-47.

[Crou89]  D. B. Crouch and R. Korfhage, The Use of Visual Representations in
          Information Retrieval Applications. In Visual Languages and Applications,
          R. Korfhage (ed.), Pergamon Press, New York (1989).

[Smit88]  J. B. Smith and S. F. Weiss, Hypertext. Communications of the ACM,
          31:7 (1988), pp. 816-819.

[Salt86]  G. Salton, Another Look at Automatic Text-Retrieval Systems.
          Communications of the ACM, 29:7 (1986), pp. 648-656.
`
[Fox86]   E. A. Fox, Information Retrieval: Research into New Capabilities. In CD
          ROM, S. Lambert and S. Ropiequet (eds.), Microsoft Press, Redmond,
          Washington (1986), pp. 143-174.

[Colv86]  G. Colvin, The Current State of Text Retrieval. In CD ROM, S. Lambert
          and S. Ropiequet (eds.), Microsoft Press, Redmond, Washington (1986), pp.
          131-136.
`
[Rijs79]  C. J. Van Rijsbergen, Information Retrieval, Second Edition. Butterworths,
          London (1979).

[Sali83]  G. Salton and M. J. McGill, Introduction to Modern Information Retrieval.
          McGraw-Hill Book Company, New York (1983).

[Voor85]

[Crof80]

[Salt71]

[Andr89]

[Wong84]  S. K. M. Wong and V. V. Raghavan, Vector Space Model of Information
          Retrieval;