Information Retrieval*
`
`Donald B. Crouch, Carolyn J. Crouch and Glenn Andreas
`
`Department of Computer Science
`University of Minnesota - Duluth
`320 Heller Hall
`Duluth, Minnesota 55812
`
`ABSTRACT
`
The graph-traversal approach to hypertext information retrieval is a conceptualization of
hypertext in which the structural aspects of the nodes are emphasized. A user navigates
through such hypertext systems by evaluating the semantics associated with links
between nodes as well as the information contained in nodes. [Fris88] In this paper we
describe a hierarchical structure which effectively supports the graphical traversal of a
document collection in a hypertext system. We provide an overview of an interactive
browser based on cluster hierarchies. Initial results obtained from the use of the browser
in an experimental hypertext retrieval system are presented.
`
`INTRODUCTION
`
`Information retrieval is concerned with the representation, storage and retrieval of
documents or document surrogates. Information retrieval activities are routinely conducted
on-line under the control of search intermediaries or end users who have been trained to
use somewhat complex user-system interfaces. However, poor query formulations and
inadequate user-system interaction still occur even with skilled users. For example,
Cleverdon has noted that “if two search intermediaries search the same question on the
same database on the same host, only 40 percent of the output may be common to both
`searches.” [Clev84]
`
`What is being done to aid users of information retrieval systems? The most common
`approaches are generally directed either toward the development of aids based on
sophisticated user interfaces or toward the development of expert system techniques for the
more complex operations of text retrieval systems. [Crou89] Research involving
sophisticated user interfaces is primarily concerned with system functioning and
convenience as it relates to the user; its goal is to facilitate the use of the system by
providing computerized aids previously available only to the search intermediary in
non-computerized forms. Among the facilities normally included in systems of this type are
`vocabulary displays, thesaurus expansion of vocabulary items designed to add related
`terms to already existing search words, the construction and storage of search protocols,
`operations with previously formulated queries, etc. While this type of research is
`warranted and its results encouraging, it has not necessarily produced more effective
`retrieval but instead has generated tools for effortless learning and use of an information
`retrieval system.
`
The other major area of research in the development of user aids for information retrieval
is concerned with the design of expert systems that facilitate access to the stored
collections. The goal of such research is to capture the expertise of search intermediaries
in formulating Boolean queries and in dealing with other types of retrieval services. The
expert approach is based on the use of domain-specific knowledge that covers the topic
areas represented by the collection, a language analyzer that can understand natural
language queries and translate them into appropriate internal forms, and rules for search
formulation and search strategy designed to choose search methods based on user criteria.
It may eventually become feasible to generate search formulation criteria in the form of
rules that do in fact reflect the expert knowledge of trained search intermediaries. However,
for the time being, one has reason to be careful in accepting many of the currently
unevaluated design proposals for expert system approaches as effective solutions to the
retrieval problem.

*This work was supported by the National Science Foundation under grant IRI 87-02735.

Hypertext '89 Proceedings

November 1989
`
We submit that a viable alternative to using either very sophisticated user interfaces or
expert systems as a solution to the retrieval problem consists of using only simple
user-system interactions which enhance the effectiveness of retrieval operations through the
addition of properly designed user-friendly features. These features allow the user to
function in an active role, replacing the full natural language comprehension which is
desirable yet currently unavailable in an automatic search expert.
`
`This approach to interface design is inherent in the concept of hypertext information
`retrieval. Hypertext supports a user's exploration of informational data items by
`representing data as a network of nodes containing text, graphics and other forms of
`information. [Smit88] A user may navigate through the hypertext system by following
`the links between nodes. The path a user follows is determined by his/her analysis of the
`information contained within the nodes and the semantics associated with links between
`the nodes. [Fris88]
`
`In hypertext information retrieval, each node is generally assumed to be a single
document. Links exist which connect each document to other documents having keywords
in common with it; the semantics of the links between nodes are keywords (document
index terms) or some descriptive information representing the connected documents. In
this paper we introduce a hierarchical structure which provides additional semantic
information within and between nodes. This structure seems particularly well suited to the
user's exploration of a document collection in a visual context. The user may browse
`among the data items by analyzing a graphical display of the structure itself as well as the
`semantic links between nodes.
`
`In the next two sections, we briefly describe the retrieval model and the characteristics of
`the hierarchy on which our structure is based. We then describe a prototype of a hypertext
`retrieval system utilizing the cluster hierarchy and present the initial results of an
`experiment comparing retrieval performance of the hypertext system with that of an
`automatic retrieval system.
`
`INFORMATION RETRIEVAL MODELS
`
The most common information retrieval models are the Boolean retrieval model and the
vector space model. These two models are briefly described below, and their use in
conventional information retrieval systems is examined.
`
`Boolean Retrieval Model
`
Most retrieval systems are based on the Boolean model. Queries are expressed as a set of
terms connected by the Boolean operators and, or and not. Such systems retrieve
`information by performing the Boolean operations on the corresponding sets of
`documents containing the query terms. Although the Boolean model can be used
`effectively in automatic text retrieval (in fact, a query can be formulated to retrieve any
`particular subset of items), imprecise or broad requests utilizing the or relation can result
`in the retrieval of large numbers of irrelevant texts while narrow or overly precise queries
utilizing the and relation can exclude many relevant items. In practice a compromise is
`often obtained by the use of a query formulation that is neither too broad nor too narrow.
`[Salt86]
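To make the set semantics concrete, the sketch below evaluates Boolean queries by intersecting, uniting and complementing the sets of documents that contain each term. It is a minimal illustration, not the implementation of any system cited here; the toy index `INDEX`, the document numbers and the nested-tuple query syntax are all hypothetical.

```python
# Toy inverted index (hypothetical data): term -> ids of documents containing it.
INDEX = {
    "azathioprine": {1, 4, 7},
    "lupus": {1, 2, 4, 9},
    "renal": {2, 4, 5},
}
ALL_DOCS = set(range(1, 11))  # hypothetical collection of documents 1..10


def evaluate(query, index=INDEX, all_docs=ALL_DOCS):
    """Evaluate a Boolean query written as a term string or as a nested tuple
    ("and", q1, q2, ...), ("or", q1, q2, ...) or ("not", q)."""
    if isinstance(query, str):
        return set(index.get(query, set()))  # copy so the index is never mutated
    op, *args = query
    sets = [evaluate(q, index, all_docs) for q in args]
    if op == "and":
        return set.intersection(*sets)
    if op == "or":
        return set.union(*sets)
    if op == "not":
        return all_docs - sets[0]
    raise ValueError(f"unknown operator: {op!r}")


broad = evaluate(("or", "lupus", "renal"))                    # {1, 2, 4, 5, 9}
narrow = evaluate(("and", "azathioprine", "lupus", "renal"))  # {4}
```

The broad or-query returns five documents and the narrow and-query only one, which is the trade-off described above. Either way the answer is an unordered set, so nothing in the model ranks the documents.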
`
`Although the Boolean model has been widely accepted, it does have its problems:
`
• Boolean queries are difficult to construct; intermediaries are generally required to
  add terms not originally included, provide synonyms or alternate spellings, drop
  high-frequency terms, etc. [Fox86],

• Boolean systems generally do not provide for the assignment of term weights,

• the size of the subset of documents to be returned is difficult to control, and

• the retrieved documents are usually presented in a random order (no ranking based
  on an estimate of the query-document relevance is provided).
`
`The difficulties associated with the construction of Boolean queries are well known. One
author recently commented that “research and development in information retrieval since
`the 1950’s has concentrated on methods which can provide better retrieval without the
`need for Boolean queries.” [Colv86]
`
`Vector Space Model
`
`The vector space model is conceptually the simplest retrieval model and is better suited
for use in hypertext retrieval systems than the Boolean model. In the vector space model,
the content of each document or query is represented by a set of possibly weighted content
terms (i.e., some form of content identifier, such as a word extracted from the document
text, a word phrase, or concept class chosen from a thesaurus). A term’s weight reflects
its importance in relation to the meaning of the document or query. Each informational
item (document) may then be considered a term vector, and the complete document
collection becomes a vector space whose dimension is equal to the number of distinct
terms used to identify the documents in the collection. [Rijs79, Salt83]
`
In the vector space model, it is assumed that similar (or related) documents, and
documents similar to a query, are represented by similar multidimensional term vectors.
Similarity is then generally defined as a function of the magnitudes of the matching terms
in the respective vectors.
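A minimal sketch of this representation, assuming raw term frequencies as weights, a tiny hypothetical stop list and no stemming (SMART's actual text analysis is more elaborate): each text becomes a sparse vector mapping terms to weights, and the inner product sums the products of the weights of the matching terms.

```python
from collections import Counter

# Hypothetical stop list; a real system would use a much larger one plus stemming.
STOPWORDS = frozenset({"a", "and", "in", "of", "on", "the", "to"})


def term_vector(text):
    """Sparse term vector {term: weight}, using raw term frequency as the weight."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return Counter(w for w in words if w and w not in STOPWORDS)


def inner_product(x, y):
    """Similarity as the sum of weight products over the terms the vectors share."""
    if len(x) > len(y):  # iterate over the shorter vector
        x, y = y, x
    return sum(w * y[t] for t, w in x.items() if t in y)


query = term_vector("Effect of azathioprine on systemic lupus erythematosus")
doc = term_vector("Azathioprine therapy in lupus nephritis; lupus activity")
print(inner_product(query, doc))  # 3 = 1*1 (azathioprine) + 1*2 (lupus)
```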
`
A vector representation of documents and queries facilitates certain retrieval operations,
namely:

• The construction of a clustered document file (consisting of classes of documents
  such that documents within a given class are substantially similar to each other).
  In clustered collections, an automatic search can be limited to the documents
  within those clusters whose class vector representations are similar to the query
  vector.

• The ranking of retrieved documents in decreasing order of their similarity with
  the query.

• The automatic reformulation of the query based on relevance assessments
  supplied by the user for previously retrieved documents. The intent of relevance
  feedback is to produce a modified query whose similarity to the relevant
  documents is greater than that of the original query while its similarity to the
  nonrelevant items is smaller.
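The last two operations can be sketched together. Ranking simply sorts documents by decreasing similarity. For the reformulation step the paper names no formula, so the sketch below uses Rocchio's method, one standard choice: the query is moved toward the mean of the relevant vectors and away from the mean of the nonrelevant ones. The coefficients `alpha`, `beta` and `gamma` are conventional illustrative values, not parameters taken from SMART or from this study, and the documents are hypothetical.

```python
def dot(x, y):
    """Inner product of two sparse vectors {term: weight}."""
    return sum(w * y.get(t, 0.0) for t, w in x.items())


def rank(query, docs):
    """Document ids in decreasing order of similarity to the query."""
    return sorted(docs, key=lambda d: dot(query, docs[d]), reverse=True)


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style relevance feedback (illustrative coefficients)."""
    new = {t: alpha * w for t, w in query.items()}
    for vectors, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for v in vectors:
            for t, w in v.items():
                new[t] = new.get(t, 0.0) + coef * w / len(vectors)
    return {t: w for t, w in new.items() if w > 0}  # drop non-positive weights


docs = {
    "d1": {"azathioprine": 1.0, "lupus": 1.0},
    "d2": {"lupus": 1.0, "renal": 1.0},
    "d3": {"mycoplasma": 1.0, "abortion": 1.0},
}
query = {"lupus": 1.0, "renal": 0.5}
print(rank(query, docs))  # ['d2', 'd1', 'd3']
# The user judges d1 relevant and d3 nonrelevant:
new_query = rocchio(query, [docs["d1"]], [docs["d3"]])
print(rank(new_query, docs))  # ['d1', 'd2', 'd3']
```

After one round of feedback the document judged relevant (d1) moves to the top of the ranking, because the reformulated query has picked up its term "azathioprine".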
`
The vector space model also exhibits certain disadvantages, namely:

• Some model parameters, such as the query-document similarity function, are not
  derivable within the system but instead are chosen a priori by the system
  designer.
`
• Terms are assumed to be independent of one another.

• Term relationships are not expressible within the model.
`
`A recent characterization of the vector space model is contained in [Wong84].
`
`CLUSTERED DOCUMENT ENVIRONMENTS
`
A principal advantage of the vector space model for use in hypertext information retrieval
is that algorithms exist for structuring a document collection in such a manner that
similar documents are grouped together. A cluster hierarchy is represented by a tree
structure in which terminal nodes correspond to single documents and interior nodes to
groups of documents. In a hypertext system based on a clustered environment, the user
can readily focus his/her search on those groups (clusters) that are likely to contain
documents which are highly similar to the query. Additionally, the cluster hierarchy is
beneficial as a browsing tool in that it makes it easy to locate neighboring
items with related subject descriptions.
`
`Agglomerative Cluster Hierarchy
`
Voorhees [Voor85] has shown that retrieval effectiveness may be enhanced in automatic
retrieval systems when a type of clustering, known as agglomerative hierarchic clustering,
is used to generate a cluster structure. In such a clustering method, each document in the
`collection is considered initially to be a singleton cluster. The two closest clusters are
`successively merged until only one cluster remains. The definition of closest depends on
`the actual clustering method being used.
`
`Fig. 1 contains an example of a hierarchy for the single link agglomerative clustering
method. In the single link method the similarity between two clusters is the maximum of
the similarities between all pairs of documents such that one document of the pair is in
one cluster and the other document is in the other cluster. It may be noted that in the
`hierarchy documents may appear at any level and that clusters overlap only in the sense
`that smaller clusters are nested within larger clusters.
`
Each cluster in Fig. 1 is labelled with the level of association between the items under it.
The clustering level determines the association strength of the corresponding items. Thus
the similarity between items B, C and D in Fig. 1 is 0.9. On the other hand, the
`similarity between item A and the cluster containing items B, C and D is only 0.7. The
`level of association is a useful link semantic in a hypertext system.
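The single link merging process can be sketched naively as follows. This is an illustrative O(n³) version, not SMART's clustering code, and the pairwise similarities in `SIM` are hypothetical values chosen so that the merge levels match those quoted for Fig. 1: B, C and D join at 0.9 and A joins them at 0.7.

```python
from itertools import combinations

# Hypothetical document-document similarities.
SIM = {
    frozenset(pair): s
    for pair, s in [
        (("A", "B"), 0.7), (("A", "C"), 0.6), (("A", "D"), 0.5),
        (("B", "C"), 0.9), (("B", "D"), 0.85), (("C", "D"), 0.9),
    ]
}


def sim(a, b):
    return SIM[frozenset((a, b))]


def link(c1, c2):
    """Single link: the maximum similarity over cross-cluster document pairs."""
    return max(sim(a, b) for a in c1 for b in c2)


def single_link(items):
    """Agglomerative clustering; returns merges as (cluster1, cluster2, level)."""
    clusters = [frozenset([x]) for x in items]  # every document starts as a singleton
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters (ties go to the first pair found).
        c1, c2 = max(combinations(clusters, 2), key=lambda p: link(*p))
        merges.append((c1, c2, link(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


for c1, c2, level in single_link("ABCD"):
    print(sorted(c1), sorted(c2), level)
# ['B'] ['C'] 0.9
# ['D'] ['B', 'C'] 0.9
# ['A'] ['B', 'C', 'D'] 0.7
```

Replacing `max` with `min` inside `link` gives the complete link method, in which the similarity of two clusters is that of their least similar pair; the MEDLARS hierarchy used in the evaluation was built with a complete link algorithm.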
`
`Searching a Clustered Environment
`
`To retrieve documents automatically in a clustered environment, comparisons are
generally made between the query vector and document vectors using one of the standard
measures of similarity. A cluster search simplifies the search process by limiting the
search to subsets of documents. For example, with an agglomeratively clustered tree such
as that shown in Fig. 1, a straightforward, narrow, depth-first search starts at the top of
the tree and calculates the similarity between the query and each of its children. The child
most similar to the query is selected, and the similarity between the query and each of the
non-document children of that node is calculated. The process is repeated until either all
the similarities between the query and the non-document children of some node are less
than that between the query and the node itself, or all the children of that node are
document nodes. The documents comprising the cluster represented by that node are
returned. The search may be broadened by considering more than one path at each level.
The broadest search considers all paths and abandons them as they fail certain criteria.
`
`Fig. 1 A sample single link hierarchy
`
`A bottom-up search may also be performed on such a tree. The cluster at the lowest level
`of the tree whose centroid is most similar to the query is chosen as the node at which the
`search will start. The search continues up the tree until the similarity between the query
`and the parent of the current node is smaller than the similarity between the query and the
current node. The documents contained in the cluster corresponding to the current node are
`returned. The bottom-up search is often more effective due to the uncertainty involved at
`high levels of the hierarchy. [Crof80]
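Both automatic strategies can be sketched over a small tree. The `Node` class, the inner-product similarity and the five-document tree below are illustrative assumptions, not the SMART data structures, and only the narrow (single-path) top-down variant is shown. In `bottom_up`, the "lowest level" clusters are taken to be those whose children are all documents, which is one reading of the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps nodes hashable by identity
class Node:
    vector: dict  # centroid for a cluster, term vector for a document
    children: list = field(default_factory=list)
    doc_id: Optional[str] = None  # set only on leaf (document) nodes

    def is_doc(self):
        return self.doc_id is not None


def dot(x, y):
    """Inner-product similarity of two sparse vectors {term: weight}."""
    return sum(w * y.get(t, 0.0) for t, w in x.items())


def documents(node):
    """Ids of all documents in the cluster rooted at node."""
    if node.is_doc():
        return [node.doc_id]
    return [d for c in node.children for d in documents(c)]


def top_down(root, query, sim=dot):
    """Narrow depth-first search: always follow the most similar cluster child."""
    node = max(root.children, key=lambda c: sim(query, c.vector))
    while not node.is_doc():
        clusters = [c for c in node.children if not c.is_doc()]
        if not clusters:  # all children are documents
            break
        best = max(clusters, key=lambda c: sim(query, c.vector))
        if sim(query, best.vector) < sim(query, node.vector):
            break  # every cluster child is less similar than the node itself
        node = best
    return documents(node)


def bottom_up(root, query, sim=dot):
    """Start at the most similar bottom-level cluster; climb while the parent
    is at least as similar to the query as the current node."""
    parent, bottom, stack = {}, [], [root]
    while stack:
        n = stack.pop()
        for c in n.children:
            parent[c] = n
        if n.children and all(c.is_doc() for c in n.children):
            bottom.append(n)
        stack.extend(c for c in n.children if not c.is_doc())
    node = max(bottom, key=lambda c: sim(query, c.vector))
    while node in parent and sim(query, parent[node].vector) >= sim(query, node.vector):
        node = parent[node]
    return documents(node)


def leaf(doc_id, vector):
    return Node(vector, doc_id=doc_id)


# Hypothetical tree: X = {d1, d2}, Y = {X, d3}, Z = {d4, d5}, root = {Y, Z};
# each cluster vector is the centroid of the documents beneath it.
d1, d2, d3 = leaf("d1", {"a": 1}), leaf("d2", {"a": 1, "b": 1}), leaf("d3", {"b": 1})
d4, d5 = leaf("d4", {"c": 1}), leaf("d5", {"c": 1, "d": 1})
X = Node({"a": 1.0, "b": 0.5}, [d1, d2])
Y = Node({"a": 2 / 3, "b": 2 / 3}, [X, d3])
Z = Node({"c": 1.0, "d": 0.5}, [d4, d5])
root = Node({"a": 0.4, "b": 0.4, "c": 0.4, "d": 0.2}, [Y, Z])

print(top_down(root, {"a": 1}))   # ['d1', 'd2']
print(bottom_up(root, {"b": 1}))  # ['d1', 'd2', 'd3']
```

The stopping tests are exactly the points at which an interactive browser can differ: the automatic procedures halt as soon as a comparison fails, whereas a user can still move into a neighboring subtree.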
`
`Cluster hierarchies have been used effectively in automatic searches. Such hierarchies are
`also useful in performing searches based on browsing operations. These types of
`operations, we believe, can produce significant improvement in retrieval performance.
`Automatic cluster searches are highly structured; the next link in the search path is
`determined solely on the basis of the similarity between the query vector and the vector
`representation of the node being evaluated. By displaying suitable portions of the
hierarchy during the course of the search operations and letting the user choose appropriate
search paths at each point, the output obtained should be superior to that obtained by
automatic cluster searching. For example, in a hypertext system with an interactive
browser, following evaluation of items B, C, and D in the sample tree of Fig. 1, the user
has the choice of exploring either a tightly clustered structure containing items F and G
(which are very similar to each other with a similarity value of 0.8) or of staying in the
same cluster and evaluating item A (at a lower similarity level of 0.7). In contrast, the
control mechanism of the automatic search procedure may terminate the search at the node
labelled 0.7 and never evaluate the cluster containing items F and G. The effectiveness of
this type of user-directed, interactive browsing can be determined by comparing the results
of such interactive searches to those obtained by automatic cluster searches.
`
`THE INTERACTIVE BROWSER
`
`A browser incorporating the cluster hierarchy as its primary network structure was
`implemented on a Macintosh IIx computer using HyperCard. The Macintosh is connected
via a local area network to a SUN system on which the SMART information retrieval
system [Salt71] resides. The SMART system provides packages for textual analysis,
clustering, performance evaluation, etc.
`
`To conduct a search using the browser, a user initially specifies a natural language query
`which is subsequently transformed into a term vector representation via the SMART
`retrieval system. The hypertext system then displays a window containing the original
`query and its corresponding term vector (Fig. 2). As suggested by the annotated display of
Fig. 3, a user may obtain the word stem associated with each concept in the term vector
as well as the document frequency of that term by clicking on a concept number in the
vector. At any point during the search process, the user may add or delete concepts from
the query vector representation or completely re-specify the query itself. The query
window also contains a list of identifiers representing the documents which the user has
determined to be relevant to the query. Initially this list is empty; however, as the user
conducts the search process, he/she enters documents into the list.
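The document frequency reported for each concept, and the ascending-frequency ordering of query concepts used in the tree display, can be computed as sketched below. The concept numbers and vectors are hypothetical, this is not SMART's own bookkeeping, and a concept's document frequency is assumed to be the number of document vectors in which it has a positive weight.

```python
from collections import Counter


def document_frequencies(doc_vectors):
    """Map each concept number to the number of documents containing it."""
    df = Counter()
    for vector in doc_vectors:
        df.update(c for c, w in vector.items() if w > 0)
    return df


def concepts_by_df(query_vector, df):
    """Query concepts in ascending order of document frequency (rarest first)."""
    return sorted(query_vector, key=lambda c: df[c])  # Counter gives 0 for unseen concepts


# Hypothetical document vectors keyed by concept number.
docs = [{3340: 1.0, 12: 0.5}, {12: 1.0}, {12: 0.2, 77: 1.0}]
df = document_frequencies(docs)
print(concepts_by_df({12: 0.3, 3340: 0.6, 500: 0.1}, df))  # [500, 3340, 12]
```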
`
`To begin (or continue) a browse in the clustered environment, the user clicks on the
`Use Query button. The interface presents a display of the clustered document space
`represented as a complete link hierarchy. A user may begin an exploration of the cluster
tree at any point, for example, at the root node for a top-down approach or at a leaf node
(document) for a bottom-up approach. The user may prefer to initiate a search at an
interior node (a cluster) which contains one or more documents known to be similar to
`the query.
`
In general, a tree representation of a clustered collection is too large to be displayed in its
entirety. Therefore, a user is presented with two views of the cluster tree simultaneously:
a local view containing the subtree within which the user is currently browsing (see
Fig. 4) and a global view, a more comprehensive view of the tree containing a
significantly larger number of nodes than the local view (see Fig. 5). A user-directed
traversal among the nodes is simultaneously reflected in both displays. The global view
permits the user to observe where the search is being conducted in relation to the entire
tree while the local view provides the user with more detailed information about a specific
`subtree.
`
As may be noted, many links and informational items are provided by the interface
system to aid the user during the browsing process. As suggested by Fig. 4, the local
view of the tree:

• Uses different iconic representations to distinguish interior nodes (clusters) from
  leaf nodes (documents).

• Displays for each interior node the level at which the documents cluster. The
  clustering level represents the degree of association between the items under it.

• Lists the number of documents contained within the subtree defined by each node
  as well as the number of children of that node. This information can also be
  obtained by counting the nodes in the global view of the tree.

• Lists the value of the correlation measure of the query vector with either the
  centroid vector or the document vector associated with each node in the subtree.
  During the search process the user may change the correlation measure being
  calculated by means of the Correlation Measure pop-up menu. At present, the
  system provides a choice of several measures including vector product, inner
  product, Tanimoto, cosine and overlap.

• Provides a listing of the concepts contained within the query vector (see also
  Fig. 6). This information is also displayed in the query window; however, in the
  tree display, the concepts in the query are displayed in ascending order of
  document frequency. The user may alter the query by adding or deleting concepts
  from the query vector during the search process without returning to the query
  window.

• Uses different iconic representations to distinguish relevant documents from the
  other documents in the tree. A list of the documents which the user has chosen
  as relevant to the query is maintained in the display. The user may freely insert
  document identifiers into and delete items from this list. The icons of the
  documents in this list are then highlighted in the tree representation.

• Lists document identifiers represented by the leaf nodes of the tree.
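Several of the listed correlation measures can be written down for sparse weighted vectors as below. These are common textbook forms (for binary vectors the Tanimoto and overlap versions reduce to |X∩Y|/|X∪Y| and |X∩Y|/min(|X|,|Y|)); SMART's exact definitions may differ, and the "vector product" option is not reproduced because the text does not define it. The query and centroid are hypothetical.

```python
import math


def inner(x, y):
    """Inner product: sum of weight products over shared terms."""
    return sum(w * y.get(t, 0.0) for t, w in x.items())


def cosine(x, y):
    denom = math.sqrt(inner(x, x) * inner(y, y))
    return inner(x, y) / denom if denom else 0.0


def tanimoto(x, y):
    xy = inner(x, y)
    denom = inner(x, x) + inner(y, y) - xy
    return xy / denom if denom else 0.0


def overlap(x, y):
    shared = sum(min(w, y[t]) for t, w in x.items() if t in y)
    denom = min(sum(x.values()), sum(y.values()))
    return shared / denom if denom else 0.0


MEASURES = {"inner product": inner, "cosine": cosine, "Tanimoto": tanimoto, "overlap": overlap}

query = {"lupus": 1, "renal": 1}
centroid = {"renal": 1, "nephritis": 1, "biopsy": 1}
for name, measure in MEASURES.items():
    print(f"{name}: {measure(query, centroid):.3f}")
# inner product: 1.000
# cosine: 0.408
# Tanimoto: 0.250
# overlap: 0.500
```

Because the measures normalize differently, switching between them through the pop-up menu can reorder the nodes of the local view even though the query is unchanged.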
`
[Figure: query window for the query “Effect of azathioprine on systemic lupus
erythematosus, particularly in regard to renal lesions”, showing the query's concepts and
weights and the list of documents relevant to the query. Callout: clicking on a concept
number reveals the word forming that concept and the document frequency of the concept;
it also outlines the first occurrence of that word.]

Fig. 3 Clicking on a concept number
`
[Figure: local view of the cluster tree, with callouts for the list of concept numbers used,
the list of relevant documents, the clustering level, the correlation measure, the To Doc...
button, the number of children / total number of documents under a node, an interior node
with one concept number in common with the query, and a document with no concept
numbers in common.]

Fig. 4 The browser and its parts
`
[Figure: global view of the cluster tree.]

Fig. 5 The overview of the tree
`
`One can obtain additional information from the local tree display by clicking on various
items contained within the display. For example, by Option-clicking on a node which has
terms in common with a query, the selected node's icon is replaced with an informational
window listing the terms in common and the weights associated with these terms
(Fig. 7). Choosing a concept number in the query vector results in a display of the
textual description of the chosen term and its document frequency (Fig. 6). Clicking on a
terminal node (a document icon) results in the display of additional information associated
with the document. An example of a document window is contained in Fig. 8. One can
also review the query or enter a new query by selecting the To Query button. It should be
noted that each of the display windows has informational features associated with it
which support the visual search process.
`
`The global view of the tree (Fig. 5) has few items associated with a node, since the
`purpose of this display is to assist the user in locating his/her position within the tree
during the navigation of the cluster space. The three types of icons used in the global
view distinguish the following types of nodes:

• A document classified as relevant by the user. As previously noted, identifiers of
  the relevant documents are highlighted in the local view.

• A document which has not been classified as relevant.

• The node corresponding to the central node in the local view of the tree.

These nodes are color coded to facilitate a quick review of the tree at large.
`
Fig. 6 Finding the term for a concept number in the browser
`
[Figure: Option-clicking on a document or node lists the concept numbers it has in common
with the query, together with their weights.]

Fig. 7 Getting common concept number information
`
[Figure: document card for “Septicemia due to Mycoplasma hominis type 1” by Viola M.
Young, Ph.D., and Sheldon M. Wolff, M.D., with callouts for the document identifier
(Item_ID) and the text of the abstract.]

Fig. 8 Document card and its parts
`
The user may view (move to) other portions of the tree not currently in the local or
global windows in one of several ways:

• By scrolling up or down or left or right in the global display window. The
  window moves across the tree in the global view in the direction represented by
  the scrolling action of the mouse.

• By clicking on a node in the local view. In this case, the chosen node becomes
  the central node in the local view. Such an action effectively moves the local
  view up or down one level of the tree.

• By clicking on the To Root button in the local view. This action causes the
  interface system to redraw the local view of the tree with the root node as its
  central node.

• By clicking on the To Doc button in the local view. The To Doc button
  allows the user to view the document window associated with any document in
  the collection. To obtain a document window, the user must specify the
  document’s identifier. Clicking on the To Tree button in the document
  window returns the user to a local view of the tree with its central node
  corresponding to the parent node of the document contained in the document
  window.
`
The tree displays described above and the informational windows associated with items
`contained in the displays are effective for representing a local cluster arrangement for a
`small collection or a local area of a larger collection.
`
`EVALUATION OF THE BROWSER
`
The SMART information retrieval system provides a general framework for conducting
retrieval experiments. SMART has fully automatic iterative search methods with
automatic relevance classification of documents. The means exist within the system for
evaluating the effectiveness of the retrieval process; the effectiveness of any interactive
system can be established by comparison with automatic search procedures contained in
the SMART system. SMART also provides collections of documents and corresponding
sets of queries which may be used for experimentation. Relevance assessments have been
produced by persons knowledgeable about the subject matter in the collections.
`
`In order to develop a general search strategy which a user may employ in the hypertext
retrieval system, we focused on the MEDLARS collection, a somewhat homogeneous
collection generated by the National Library of Medicine. A user-controlled cluster search
technique which performs well in a homogeneous collection is expected to perform as well
or, in general, significantly better in a heterogeneous collection. MEDLARS consists of 1,033
documents in the medical field and a corresponding set of 30 queries. The document
vectors were generated from an analysis of the abstracts of the documents. The
MEDLARS collection was then clustered using a complete link clustering algorithm,
resulting in a very wide tree. The cluster hierarchy contains 76 subtrees at the root node,
and the maximum depth of the tree is ten.
`
`To assist in the development of a method for conducting a visual interactive search of a
clustered collection, we divided the MEDLARS query collection into two subsets. One set
of queries (the base set) was used to aid in the development of the methodology; the other
set was used to estimate the performance of the interactive search process.
`
We performed an interactive search in the hypertext system for each query in the base set
to determine the optimal search a user would follow in order to retrieve the known
relevant documents of the query. By conducting this process for each of the 15 queries in
the base set, we were able to observe and define the common threads linking relevant
documents to the cluster tree. Our observations and the resulting search method that
evolved are reported in [Andr89].
`
An important point about both phases of the experiment is that the actual text of the
document (as shown in the document cards) was never examined during the interactive
search process to determine document relevance. Doing so would of course have
substantially improved retrieval performance in an actual user-controlled search process.
However, one of our objectives when conducting this experiment was to apply some of
the insight gained from the interactive visual search to an automated system. An
automated search system does not consult actual text during the search process; it uses
only a vector representation of the document. We are now performing more extensive tests
with larger collections, in which this severe constraint is not placed on the user.
`
Once we had developed some search guidelines, the fifteen remaining queries in the
MEDLARS collection were processed using the developed search procedures. The list of
relevant documents was of course initially empty at the beginning of each search process.
The companion global tree viewing program was not needed in a search, since a frequent
`user of the browser system has little problem with navigation of the search tree; novice
`users would certainly want to use the companion program, however. For each query, the
`query text wasinitially inspected prior to the navigation, and the resulting query vector
`edited as needed. Depending on the intermediate results obtained and the general feedback
`gained from the browsing process, the query vector was often modified to produce
`additional relevant documents.
`
On average, we were able to retrieve 55% of the relevant documents for the queries in
`the test set. The automatic retrieval system had a recall value of only 32%. Thus, even
`without taking full advantage of the information linked to a node (namely, the document
`abstract itself), use of the interactive browser yielded a significant improvement over
`automatic cluster searches. Additionally, use of the hypertext system resulted in the return
`of slightly fewer irrelevant documents; with the browser, 25% of the documents found
`were irrelevant compared to 28% in the automatic system.
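For reference, the two figures reported above can be computed per query as below and then averaged. The paper does not say how its averages were taken; the sketch assumes a simple mean over queries, and the two query runs are invented for illustration.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


def irrelevant_fraction(retrieved, relevant):
    """Fraction of the retrieved documents that are not relevant (1 - precision)."""
    return len(retrieved - relevant) / len(retrieved) if retrieved else 0.0


def mean_over_queries(metric, runs):
    """Average a per-query metric over (retrieved, relevant) pairs."""
    return sum(metric(ret, rel) for ret, rel in runs) / len(runs)


runs = [
    ({1, 2, 3, 4}, {1, 2, 3, 5}),  # 3 of 4 relevant found, 1 of 4 retrieved irrelevant
    ({7, 8}, {7, 9, 10, 11}),      # 1 of 4 relevant found, 1 of 2 retrieved irrelevant
]
print(mean_over_queries(recall, runs))               # 0.5
print(mean_over_queries(irrelevant_fraction, runs))  # 0.375
```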
`
`CONCLUSION
`
`Our immediate objective in this work was to produce a retrieval system that allows easy
and accurate searching and browsing of a document collection. The representation of a
collection as a cluster hierarchy was shown to provide a solid basis on which to build a
hypertext retrieval system. The interactive browser is believed to be sufficiently
comprehensive and flexible to support a variety of experiments designed to
evaluate the effects of user control, user intervention, and the visual analysis of graphical
data representation on retrieval performance during the cluster search process. The dynamic
nature of the HyperCard environment on which the browser is based is well suited to meet
the need for flexibility required for these tasks.
`
`REFERENCES
`
[Fris88]
`
`M.E. Frisse, Searching for Information in a Hypertext Medical Handbook.
`Communications of the ACM, 31:7 (1988), pp. 880-886.
`
`[Clev84]
`
C. W. Cleverdon, Optimizing Convenient On-Line Access to Bibliographic
Databases. Information Services & Use, 4 (1984), pp. 37-47.
`
`[Crou89]
`
`D. B. Crouch and R. Korfhage, The Use of Visual Representations in
`Information Retrieval Applications. In Visual Languages and Applications,
`R. Korfhage (ed.), Pergamon Press, New York (1989).
`
`[Smit88]
`
`J. B. Smith and S. F. Weiss, Hypertext. Communications of the ACM,
`31:7 (1988), pp. 816-819.
`
`G. Salton, Another Look at Automatic Text-Retrieval Systems.
`Communications of the ACM, 29:7 (1986), pp. 648-656.
`
`
`[Salt86]
`
`[Fox86]
`
`[Colv86]
`
`[Rijs79]
`
[Salt83]
`
`[Wong84]
`
`[Voor85]
`
`[Crof80]
`
`[Salt71]
`
`[Andr89]
`
`E. A. Fox, Information Retrieval: Research into New Capabilities. In CD
ROM, S. Lambert and S. Ropiequet (eds.), Microsoft Press, Redmond,
`Washington (1986), pp. 143-174.
`
`G. Colvin, The Current State of Text Retrieval. In CD ROM, S. Lambert
`and S. Ropiequet (eds.), Microsoft Press, Redmond, Washington (1986), pp.
`131-136.
`
C. J. Van Rijsbergen, Information Retrieval, Second Edition. Butterworths,
`London (1979).
`
G. Salton and M. J. McGill, Introduction to Modern Information Retrieval.
`McGraw-Hill Book Company, New York (1983).
`
`S. K. M. Wong and V. V. Raghavan, Vector Space Model of Information
`Retrieval: A Reevaluation. In Research and Development in Information
`Retrieval, C. J. van Rijsbergen (ed.), Cambridge University Press, London
`(1984), pp. 167-186.
`
`E. M. Voorhees, The Effectivenes