`Information Retrieval*
`
`in Hypertext
`
`Donald B. Crouch, Carolyn J. Crouch and Glenn Andreas
`
`Department of Computer Science
`University of Minnesota - Duluth
`320 Heller Hall
`Duluth, Minnesota 55812
`
`ABSTRACT
`
The graph-traversal approach to hypertext information retrieval is a conceptualization of
hypertext in which the structural aspects of the nodes are emphasized. A user navigates
through such hypertext systems by evaluating the semantics associated with links between
nodes as well as the information contained in nodes. [Fris88] In this paper we describe a
hierarchical structure which effectively supports the graphical traversal of a document
collection in a hypertext system. We provide an overview of an interactive browser based
on cluster hierarchies. Initial results obtained from the use of the browser in an
experimental hypertext retrieval system are presented.
`
`INTRODUCTION
`
Information retrieval is concerned with the representation, storage and retrieval of
documents or document surrogates. Information retrieval activities are routinely conducted
on-line under the control of search intermediaries or end users who have been trained to
use somewhat complex user-system interfaces. However, poor query formulations and
inadequate user-system interaction still occur even with skilled users. For example,
Cleverdon has noted that “if two search intermediaries search the same question on the
same database on the same host, only 40 percent of the output may be common to both
searches.” [Clev84]
`
`What is being done to aid users of information retrieval systems? The most common
`approaches are generally directed either toward the development of aids based on
`sophisticated user interfaces or toward the development of expert system techniques for the
`more complex operations of text retrieval systems. [Crou89] Research involving
`sophisticated user interfaces is primarily concerned with system functioning and
`convenience as it relates to the user; its goal is to facilitate the use of the system by
`providing computerized aids previously available only to the search intermediary in non-
`computerized forms. Among the facilities normally included in systems of this type are
`vocabulary displays, thesaurus expansion of vocabulary items designed to add related
`terms to already existing search words, the construction and storage of search protocols,
operations with previously formulated queries, etc. While this type of research is
`warranted and its results encouraging, it has not necessarily produced more effective
`retrieval but instead has generated tools for effortless learning and use of an information
`retrieval system.
`
`The other major area of research in the development of user aids for information retrieval
`is concerned with the design of expert systems that facilitate access to the stored
`
`*This work was supported by the National Science Foundation under grant IRI 87-02735.
`
`Hypertext
`
`‘89 Proceedings
`
`225
`
`November 1989
`
`IPR2017-01039
`Unified EX1011 Page 1
`
`
`
`collections. The goal of such research is to capture the expertise of search intermediaries
`in formulating Boolean queries and in dealing with other types of retrieval services. The
`expert approach is based on the use of domain-specific knowledge that covers the topic
`areas represented by the collection, a language analyzer that can understand natural
`language queries and translate them into appropriate internal forms, and rules for search
`formulation and search strategy designed to choose search methods based on user criteria.
`It may eventually become feasible to generate search formulation criteria in the form of
`rules that do in fact reflect the expert knowledge of trained search intermediaries. However,
`for the time being, one has reason to be careful in accepting many of the currently
`unevaluated design proposals for expert system approaches as effective solutions to the
`retrieval problem.
`
`We submit that a viable alternative to using either very sophisticated user interfaces or
`expert systems as a solution to the retrieval problem consists of using only simple user-
`system interactions which enhance the effectiveness of retrieval operations through the
`addition of properly designed user friendly features. These features allow the user to
`function in an active role, replacing the full natural language comprehension which is
`desirable yet currently unavailable in an automatic search expert.
`
This approach to interface design is inherent in the concept of hypertext information
retrieval. Hypertext supports a user’s exploration of informational data items by
representing data as a network of nodes containing text, graphics and other forms of
information. [Smit88] A user may navigate through the hypertext system by following
the links between nodes. The path a user follows is determined by his/her analysis of the
information contained within the nodes and the semantics associated with links between
the nodes. [Fris88]
`
In hypertext information retrieval, each node is generally assumed to be a single
document. Links exist which connect each document to other documents having keywords
in common with it; the semantics of the links between nodes are keywords (document
index terms) or some descriptive information representing the connected documents. In
this paper we introduce a hierarchical structure which provides additional semantic
information within and between nodes. This structure seems particularly well suited to the
user’s exploration of a document collection in a visual context. The user may browse
among the data items by analyzing a graphical display of the structure itself as well as the
semantic links between nodes.
`
In the next two sections, we briefly describe the retrieval model and the characteristics of
the hierarchy on which our structure is based. We then describe a prototype of a hypertext
retrieval system utilizing the cluster hierarchy and present the initial results of an
experiment comparing retrieval performance of the hypertext system with that of an
automatic retrieval system.
`
`INFORMATION RETRIEVAL MODELS
`
`The most common information retrieval models are the Boolean retrieval model and the
`vector space model. These two models are briefly described and their use in conventional
`information retrieval systems examined.
`
`Boolean Retrieval Model
`
Most retrieval systems are based on the Boolean model. Queries are expressed as a set of
terms connected by the Boolean operators and, or and not. Such systems retrieve
information by performing the Boolean operations on the corresponding sets of
documents containing the query terms. Although the Boolean model can be used
effectively in automatic text retrieval (in fact, a query can be formulated to retrieve any
particular subset of items), imprecise or broad requests utilizing the or relation can result
in the retrieval of large numbers of irrelevant texts while narrow or overly precise queries
utilizing the and relation can exclude many relevant items. In practice a compromise is
often obtained by the use of a query formulation that is neither too broad nor too narrow.
[Salt86]
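
The set operations behind Boolean retrieval can be sketched in a few lines. The following is a minimal illustration only: the inverted index, the terms and the document numbers are hypothetical and are not taken from any collection discussed in this paper.

```python
# Toy inverted index (hypothetical data): term -> set of document ids.
index = {
    "lupus": {1, 2, 5},
    "renal": {2, 3, 5},
    "azathioprine": {2, 4},
}
all_docs = {1, 2, 3, 4, 5}


def postings(term):
    """Return the set of documents containing the term (empty if unknown)."""
    return index.get(term, set())


# Broad request: lupus OR renal OR azathioprine (set union).
broad = postings("lupus") | postings("renal") | postings("azathioprine")

# Narrow request: lupus AND renal AND azathioprine (set intersection).
narrow = postings("lupus") & postings("renal") & postings("azathioprine")

# Negation: lupus AND NOT azathioprine (intersection with the complement).
negated = postings("lupus") & (all_docs - postings("azathioprine"))

print(sorted(broad), sorted(narrow), sorted(negated))
```

In this toy index the or query returns every document while the and query returns a single one, which mirrors the broad versus narrow trade-off described above.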
`
Although the Boolean model has been widely accepted, it does have its problems:
. Boolean queries are difficult to construct; intermediaries are generally required to
add terms not originally included, provide synonyms or alternate spellings, drop
high-frequency terms, etc. [Fox86],
. Boolean systems generally do not provide for the assignment of term weights,
. the size of the subset of documents to be returned is difficult to control, and
. the retrieved documents are usually presented in a random order (no ranking based
on an estimate of the query-document relevance is provided).

The difficulties associated with the construction of Boolean queries are well known. One
author recently commented that “research and development in information retrieval since
the 1950’s has concentrated on methods which can provide better retrieval without the
need for Boolean queries.” [Colv86]
`
`Vector Space Model
`
The vector space model is conceptually the simplest retrieval model and is better suited
for use in hypertext retrieval systems than the Boolean model. In the vector space model,
the content of each document or query is represented by a set of possibly weighted content
terms (i.e., some form of content identifier, such as a word extracted from the document
text, a word phrase, or concept class chosen from a thesaurus). A term’s weight reflects
its importance in relation to the meaning of the document or query. Each informational
item (document) may then be considered a term vector, and the complete document
collection becomes a vector space whose dimension is equal to the number of distinct
terms used to identify the documents in the collection. [Rijs79, Salt83]
`
In the vector space model, it is assumed that similar or related documents or similar
documents and queries are represented by similar multidimensional term vectors.
Similarity is then generally defined as a function of the magnitudes of the matching terms
in the respective vectors.
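
As an illustration of a similarity function over term vectors, the sketch below uses the cosine measure on sparse vectors stored as dictionaries. The choice of cosine, the terms and the weights are assumptions made for the example; the text above does not prescribe a particular function.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dict: term -> weight)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)   # matching terms only
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical query and document vectors.
query = {"lupus": 1.0, "renal": 1.0}
docs = {
    "d1": {"lupus": 2.0, "renal": 1.0},
    "d2": {"renal": 1.0, "dialysis": 3.0},
    "d3": {"asthma": 1.0},
}

# Rank documents in decreasing order of similarity with the query.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)
```

Only terms present in both vectors contribute to the numerator, so a document sharing no terms with the query receives a similarity of zero.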
`
A vector representation of documents and queries facilitates certain retrieval operations,
namely:
. The construction of a clustered document file (consisting of classes of documents
such that documents within a given class are substantially similar to each other).
In clustered collections, an automatic search can be limited to the documents
within those clusters whose class vector representations are similar to the query
vector.
. The ranking of retrieved documents in decreasing order of their similarity with
the query.
. The automatic reformulation of the query based on relevance assessments
supplied by the user for previously retrieved documents. The intent of relevance
feedback is to produce a modified query whose similarity to the relevant
documents is greater than that of the original query while its similarity to the
nonrelevant items is smaller.
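
One well-known way to realize the relevance feedback operation described above is a Rocchio-style update, sketched here. The formula, the coefficients alpha, beta and gamma, and the sample vectors are assumptions chosen for illustration; the text does not state which reformulation method is used.

```python
import math


def cosine(u, v):
    """Cosine similarity between sparse term vectors (dict: term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward relevant documents and away from nonrelevant ones."""
    new_q = {t: alpha * w for t, w in query.items()}
    for group, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in group:
            for t, w in doc.items():
                # Add (or subtract) the group's average weight for this term.
                new_q[t] = new_q.get(t, 0.0) + coef * w / len(group)
    # Terms whose weight is no longer positive are dropped from the query.
    return {t: w for t, w in new_q.items() if w > 0}


# Hypothetical vectors.
q = {"lupus": 1.0, "renal": 1.0}
rel = [{"lupus": 1.0, "azathioprine": 1.0}]
nonrel = [{"renal": 1.0, "dialysis": 1.0}]

new_q = rocchio(q, rel, nonrel)
print(new_q)
print(cosine(q, rel[0]), cosine(new_q, rel[0]))
print(cosine(q, nonrel[0]), cosine(new_q, nonrel[0]))
```

For these vectors the modified query is closer to the relevant document and farther from the nonrelevant one than the original query, which is the stated intent of relevance feedback.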
`
The vector processing model also exhibits certain disadvantages, namely:
. Some model parameters, such as the query-document similarity function, are not
derivable within the system but instead are chosen a priori by the system
designer.
. Terms are assumed to be independent of one another.
. Term relationships are not expressible within the model.

A recent characterization of the vector space model is contained in [Wong84].
`
`CLUSTERED DOCUMENT ENVIRONMENTS
`
A principal advantage of the vector space model for use in hypertext information retrieval
is that algorithms exist for structuring a document collection in such a manner that
similar documents are grouped together. A cluster hierarchy is represented by a tree
structure in which terminal nodes correspond to single documents and interior nodes to
groups of documents. In a hypertext system based on a clustered environment, the user
can readily focus his/her search on those groups (clusters) that are likely to contain
documents which are highly similar to the query. Additionally, the cluster hierarchy is
beneficial as a browsing tool in that it makes it easy to locate neighboring items with
related subject descriptions.
`
`Agglomerative Cluster Hierarchy
`
Voorhees [Voor85] has shown that retrieval effectiveness may be enhanced in automatic
retrieval systems when a type of clustering, known as agglomerative hierarchic clustering,
is used to generate a cluster structure. In such a clustering method, each document in the
collection is considered initially to be a singleton cluster. The two closest clusters are
successively merged until only one cluster remains. The definition of closest depends on
the actual clustering method being used.
`
Fig. 1 contains an example of a hierarchy for the single link agglomerative clustering
method. In the single link method the similarity between two clusters is the maximum of
the similarities between all pairs of documents such that one document of the pair is in
one cluster and the other document is in the other cluster. It may be noted that in the
hierarchy documents may appear at any level and that clusters overlap only in the sense
that smaller clusters are nested within larger clusters.
`
Each cluster in Fig. 1 is labelled with the level of association between the items under it.
The clustering level determines the association strength of the corresponding items. Thus
the similarity between items B, C and D in Fig. 1 is 0.9. On the other hand, the
similarity between item A and the cluster containing items B, C and D is only 0.7. The
level of association is a useful link semantic in a hypertext system.
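
The single link procedure can be sketched as a naive merge loop. The pairwise similarities below are hypothetical values chosen to loosely echo the levels mentioned for Fig. 1; they are not the data behind that figure, and pairs not listed are assumed to have similarity 0.

```python
from itertools import combinations

# Hypothetical pairwise document similarities (unlisted pairs count as 0.0).
pair_sims = {
    ("B", "C"): 0.9, ("C", "D"): 0.9, ("F", "G"): 0.8,
    ("A", "B"): 0.7, ("D", "F"): 0.5,
}
sim = {frozenset(pair): s for pair, s in pair_sims.items()}


def link(a, b):
    """Single link: the maximum similarity over all cross-cluster document pairs."""
    return max(sim.get(frozenset((x, y)), 0.0) for x in a for y in b)


def single_link(items):
    """Merge the two closest clusters until one remains; return (level, cluster) steps."""
    clusters = [frozenset([i]) for i in items]   # every document starts as a singleton
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda p: link(*p))
        level = link(a, b)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((level, a | b))
    return merges


merges = single_link(["A", "B", "C", "D", "F", "G"])
for level, members in merges:
    print(level, sorted(members))
```

Each recorded level is the association strength at which the merged items cluster, which is the value a browser could attach to the corresponding interior node.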
`
`Searching a Clustered Environment
`
To retrieve documents automatically in a clustered environment, comparisons are
generally made between the query vector and document vectors using one of the standard
measures of similarity. A cluster search simplifies the search process by limiting the
search to subsets of documents. For example, with an agglomeratively clustered tree such
as that shown in Fig. 1, a straightforward, narrow, depth-first search starts at the top of
the tree and calculates the similarity between the query and each of its children. The child
most similar to the query is selected, and the similarity between the query and each of the
non-document children of that node is calculated. The process is repeated until either all
the similarities between the query and the non-document children of some node are less
than that between the query and the node itself, or all the children of that node are
document nodes. The documents comprising the cluster represented by that node are
returned. The search may be broadened by considering more than one path at each level.
The broadest search considers all paths and abandons them as they fail certain criteria.
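
A simplified sketch of the narrow top-down search follows. It assumes that each cluster node stores a centroid vector, that similarity is the inner product, and that only cluster (non-document) children are compared at every step, including at the root; the Node class and the sample tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Vector = Dict[str, float]


@dataclass
class Node:
    vector: Vector                          # centroid (cluster) or document vector
    children: List["Node"] = field(default_factory=list)
    doc_id: Optional[str] = None            # set only for document (leaf) nodes


def inner(u: Vector, v: Vector) -> float:
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def documents(node: Node) -> List[str]:
    """All document ids in the cluster represented by a node."""
    if node.doc_id is not None:
        return [node.doc_id]
    return [d for child in node.children for d in documents(child)]


def top_down(root: Node, query: Vector, sim=inner) -> List[str]:
    node = root
    while True:
        clusters = [c for c in node.children if c.doc_id is None]
        if not clusters:                    # all children are documents: stop
            break
        best = max(clusters, key=lambda c: sim(query, c.vector))
        # Below the root, stop if no child cluster matches as well as the node itself.
        if node is not root and sim(query, best.vector) < sim(query, node.vector):
            break
        node = best
    return documents(node)


# Hypothetical two-cluster tree.
c1 = Node({"lupus": 1.0, "renal": 0.5},
          [Node({"lupus": 1.0, "renal": 1.0}, doc_id="B"),
           Node({"lupus": 1.0}, doc_id="C")])
c2 = Node({"asthma": 1.0, "steroid": 0.5},
          [Node({"asthma": 1.0}, doc_id="F"),
           Node({"asthma": 1.0, "steroid": 1.0}, doc_id="G")])
root = Node({"lupus": 0.5, "renal": 0.25, "asthma": 0.5, "steroid": 0.25}, [c1, c2])

print(top_down(root, {"lupus": 1.0, "renal": 1.0}))
```

Following only the single best child at each level gives the narrow search; a broader variant would keep several candidate paths instead of one.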
`
Fig. 1 A sample single link hierarchy
`
A bottom-up search may also be performed on such a tree. The cluster at the lowest level
of the tree whose centroid is most similar to the query is chosen as the node at which the
search will start. The search continues up the tree until the similarity between the query
and the parent of the current node is smaller than the similarity between the query and the
current node. The documents contained in the cluster corresponding to the current node are
returned. The bottom-up search is often more effective due to the uncertainty involved at
high levels of the hierarchy. [Crof80]
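
The bottom-up search can be sketched in the same spirit. The Cluster class, the use of the cosine measure, and the sample centroids are assumptions for illustration; the caller is assumed to supply the list of lowest-level clusters.

```python
import math


class Cluster:
    """Hypothetical cluster node with a centroid, its documents and a parent link."""

    def __init__(self, centroid, docs, parent=None):
        self.centroid = centroid
        self.docs = docs
        self.parent = parent


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def bottom_up(lowest_clusters, query, sim=cosine):
    # Start at the lowest-level cluster whose centroid best matches the query.
    node = max(lowest_clusters, key=lambda c: sim(query, c.centroid))
    # Climb while the parent is at least as similar to the query as the current node.
    while node.parent is not None and \
            sim(query, node.parent.centroid) >= sim(query, node.centroid):
        node = node.parent
    return node.docs


# Hypothetical hierarchy: a root with two lowest-level clusters.
root = Cluster({"lupus": 0.5, "renal": 0.25, "asthma": 0.5, "steroid": 0.25},
               ["B", "C", "F", "G"])
c1 = Cluster({"lupus": 1.0, "renal": 0.5}, ["B", "C"], parent=root)
c2 = Cluster({"asthma": 1.0, "steroid": 0.5}, ["F", "G"], parent=root)

print(bottom_up([c1, c2], {"lupus": 1.0, "renal": 1.0}))
```

Here the search stops at the starting cluster because the root centroid, which mixes both topics, is less similar to the query than the cluster itself.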
`
`Cluster hierarchies have been used effectively in automatic searches. Such hierarchies are
`also useful in performing searches based on browsing operations. These types of
`operations, we believe, can produce significant improvement in retrieval performance.
`Automatic cluster searches are highly structured; the next link in the search path is
`determined solely on the basis of the similarity between the query vector and the vector
`representation of the node being evaluated. By displaying suitable portions of the
`hierarchy during the course of the search operations and letting the user choose appropriate
`search paths at each point, the output obtained should be superior to that obtained by
`automatic cluster searching. For example, in a hypertext system with an interactive
`browser, following evaluation of items B, C, and D in the sample tree of Fig. 1, the user
`has the choice of exploring either a tightly clustered structure containing items F and G
`(which are very similar to each other with a similarity value of 0.8) or of staying in the
same cluster and evaluating item A (at a lower similarity level of 0.7). In contrast, the
control mechanism of the automatic search procedure may terminate the search at the node
labelled 0.7 and never evaluate the cluster containing items F and G. The effectiveness of
`this type of user-directed, interactive browsing is determined by comparing the results of
`such interactive searches to those obtained by automatic cluster searches.
`
`THE INTERACTIVE BROWSER
`
A browser incorporating the cluster hierarchy as its primary network structure was
implemented on a Macintosh IIx computer using HyperCard. The Macintosh is connected
via a local area network to a SUN System on which the SMART information retrieval
system [Salt71] resides. The SMART system provides packages for textual analysis,
clustering, performance evaluation, etc.
`
`To conduct a search using the browser, a user initially specifies a natural language query
`which is subsequently transformed into a term vector representation via the SMART
`retrieval system. The hypertext system then displays a window containing the original
`query and its corresponding term vector (Fig. 2). As suggested by the annotated display of
Fig. 3, a user may obtain the word stem associated with each concept in the term vector
as well as the document frequency of that term by clicking on a concept number in the
vector. At any point during the search process, the user may add or delete concepts from
the query vector representation or completely re-specify the query itself. The query
window also contains a list of identifiers representing the documents which the user has
determined to be relevant to the query. Initially this list is empty; however, as the user
conducts the search process, he/she enters documents into the list.
`
`To begin (or continue) a browse in the clustered environment, the user clicks on the
`Use Query button. The interface presents a display of the clustered document space
`represented as a complete link hierarchy. A user may begin an exploration of the cluster
`tree at any point, for example, at the root node for a top-down approach or at a leaf node
`(document) for a bottom-up approach. The user may prefer to initiate a search at an
`interior node (a cluster) which contains one or more documents known to be similar to
`the query.
`
In general, a tree representation of a clustered collection is too large to be displayed in its
entirety. Therefore, a user is presented with two views of the cluster tree simultaneously:
a local view containing the subtree within which the user is currently browsing (see
Fig. 4) and a global view, a more comprehensive view of the tree containing a
significantly larger number of nodes than the local view (see Fig. 5). A user-directed
traversal among the nodes is simultaneously reflected in both displays. The global view
permits the user to observe where the search is being conducted in relation to the entire
tree while the local view provides the user with more detailed information about a specific
subtree.
`
As may be noted, many links and informational items are provided by the interface
system to aid the user during the browsing process. As suggested by Fig. 4, the local
view of the tree:
. Uses different iconic representations to distinguish interior nodes (clusters) from
leaf nodes (documents).
. Displays for each interior node the level at which the documents cluster. The
clustering level represents the degree of association between the items under it.
. Lists the number of documents contained within the subtree defined by each node
as well as the number of children of that node. This information can also be
obtained by counting the nodes in the global view of the tree.
. Lists the value of the correlation measure of the query vector with either the
centroid vector or the document vector associated with each node in the subtree.
During the search process the user may change the correlation measure being
calculated by means of the Correlation Measure pop-up menu. At present, the
system provides a choice of several measures including vector product, inner
product, Tanimoto, cosine and overlap.
. Provides a listing of the concepts contained within the query vector (see also
Fig. 6). This information is also displayed in the query window; however, in the
tree display, the concepts in the query are displayed in ascending order of
document frequency. The user may alter the query by adding or deleting concepts
from the query vector during the search process without returning to the query
window.
. Uses different iconic representations to distinguish relevant documents from the
other documents in the tree. A list of the documents which the user has chosen
as relevant to the query is maintained in the display. The user may freely insert
document identifiers into and delete items from this list. The icons of the
documents in this list are then highlighted in the tree representation.
. Lists document identifiers represented by the leaf nodes of the tree.
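
Several of the correlation measures named in the list above can be written compactly for sparse vectors. The definitions below are commonly used textbook forms and are assumptions here; the browser's own formulas are not given in the text and may differ in detail.

```python
import math


def inner(u, v):
    """Inner product of two sparse vectors (dict: concept -> weight)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cosine(u, v):
    denom = math.sqrt(inner(u, u)) * math.sqrt(inner(v, v))
    return inner(u, v) / denom if denom else 0.0


def tanimoto(u, v):
    """Extended Jaccard form: u.v / (|u|^2 + |v|^2 - u.v)."""
    dot = inner(u, v)
    denom = inner(u, u) + inner(v, v) - dot
    return dot / denom if denom else 0.0


def overlap(u, v):
    """Sum of element-wise minima divided by the smaller total weight."""
    shared = sum(min(w, v[t]) for t, w in u.items() if t in v)
    denom = min(sum(u.values()), sum(v.values()))
    return shared / denom if denom else 0.0


# A table like this could back a pop-up menu of measures.
MEASURES = {"inner product": inner, "cosine": cosine,
            "Tanimoto": tanimoto, "overlap": overlap}

# Hypothetical query and centroid vectors keyed by concept number.
q = {1087: 1.0, 1715: 1.0}
centroid = {1715: 1.0, 1799: 1.0}
for name, measure in MEASURES.items():
    print(name, measure(q, centroid))
```

Because the measures normalize differently, switching between them can change the ordering of nodes in a subtree even though the underlying vectors are unchanged.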
`
[Figure: query card for Query 12, "Effect of azathioprine on systemic lupus erythematosus,
particularly in regard to renal lesions", showing the query text, the Concepts and Weight
columns of the term vector, and the Relevant Docs list.]

Fig. 2 A sample query card and its parts
`
[Figure: the query card for Query 12 with a concept number selected. The annotation
refers to the word stem forming that concept and the document frequency of the concept,
and notes that the display also outlines the first occurrence of that word.]

Fig. 3 Clicking on a concept number
`
[Figure: the local view of the cluster tree. Annotations mark the "You are Here" node
and a document with terms in common with the query.]

Fig. 4 The browser and its parts
`
[Figure: the global view of the cluster tree.]

Fig. 5 The overview of the tree
`
One can obtain additional information from the local tree display by clicking on various
items contained within the display. For example, by option clicking on a node which has
terms in common with a query, the selected node’s icon is replaced with an informational
window listing the terms in common and the weights associated with these terms
(Fig. 7). Choosing a concept number in the query vector results in a display of the
textual description of the chosen term and its document frequency (Fig. 6). Clicking on a
terminal node (a document icon) results in the display of additional information associated
with the document. An example of a document window is contained in Fig. 8. One can
also review the query or enter a new query by selecting the To Query button. It should be
noted that each of the display windows has informational features associated with it
which support the visual search process.
`
The global view of the tree (Fig. 5) has few items associated with a node, since the
purpose of this display is to assist the user in locating his/her position within the tree
during the navigation of the cluster space. The three types of icons used in the global
view distinguish the following types of nodes:
. A document classified as relevant by the user. As previously noted, identifiers of
the relevant documents are highlighted in the local view.
. A document which has not been classified as relevant.
. The node corresponding to the central node in the local view of the tree.

These nodes are color coded to facilitate a quick review of the tree at large.
`
[Figure: the local tree view with a concept number selected in the query vector.]

Fig. 6 Finding the term for a concept number in the browser
`
[Figure: the local tree view with the To Query button. The annotation notes that the
selected node will list the common concepts.]

Fig. 7 Getting common concept number information
`
[Figure: a document card showing the title, authors and abstract of a document on
septicemia due to Mycoplasma hominis type 1, together with the list of concept numbers
in its document vector.]

Fig. 8 Document card and its parts
`
The user may view (move to) other portions of the tree not currently in the local or
global windows in one of several ways:
. By scrolling up or down or left or right in the global display window. The
window moves across the tree in the global view in the direction represented by
the scrolling action of the mouse.
. By clicking on a node in the local view. In this case, the chosen node becomes
the central node in the local view. Such an action effectively moves the local
view up or down one level of the tree.
. By clicking on the To Root button in the local view. This action causes the
interface system to redraw the local view of the tree with the central node
becoming the root node.
. By clicking on the To Doc button in the local view. The To Doc button
allows the user to view the document window associated with any document in
the collection. To obtain a document window, the user must specify the
document’s identifier. Clicking on the To Tree button in the document
window returns the user to a local view of the tree with its central node
corresponding to the parent node of the document contained in the document
window.
`
`The tree displays described above and the informational windows associated with items
`contained in the displays are effective for representing a local cluster arrangement for a
`small collection or a local area of a larger collection.
`
`EVALUATION OF THE BROWSER
`
The SMART information retrieval system provides a general framework for conducting
retrieval experiments. SMART has fully automatic iterative search methods with
automatic relevance document classification. The means exist within the system for
evaluating the effectiveness of the retrieval process; the effectiveness of any interactive
system can be established by comparison with automatic search procedures contained in
the SMART system. SMART also provides collections of documents and corresponding
sets of queries which may be used for experimentation. Relevance assessments have been
produced by persons knowledgeable about the subject matter in the collections.
`
In order to develop a general search strategy which a user may employ in the hypertext
retrieval system, we focused on the MEDLARS Collection, a somewhat homogeneous
collection generated by the National Library of Medicine. A user-controlled cluster search
technique which performs well in a homogeneous collection will perform as well or
generally significantly better in a heterogeneous collection. MEDLARS consists of 1,033
documents in the medical field and a corresponding set of 30 queries. The document
vectors were generated from an analysis of the abstracts of the documents. The
MEDLARS collection was then clustered using a complete link clustering algorithm,
resulting in a very wide tree. The cluster hierarchy contains 76 subtrees at the root node,
and the maximum depth of the tree is ten.
`
`To assist in the development of a method for conducting a visual interactive search of a
`clustered collection, we divided the MEDLARS query collection into two subsets. One set
`of queries (the base set) was used to aid in the development of the methodology; the other
`set was used to estimate the performance of the interactive search process.
`
We performed an interactive search in the hypertext system for each query in the base set
to determine the optimal search a user would follow in order to retrieve the known
relevant documents of the query. By conducting this process for each of the 15 queries in
the base set, we were able to observe and define the common threads linking relevant
documents to the cluster tree. Our observations and the resulting search method that
evolved are reported in [Andr89].
`
An important point about both phases of the experiment is that the actual text of the
document (as shown in the document cards) was never examined during the interactive
search process to determine document relevance. Doing so would of course have
substantially improved retrieval performance in an actual user-controlled search process.
However, one of our objectives when conducting this experiment was to apply some of
the insight gained from the interactive visual search to an automated system. An
automated search system does not consult actual text during the search process; it uses
only a vector representation of the document. We are now performing extensive testing
with larger collections that does not place such a severe constraint on a user.
`
Once we had developed some search guidelines, the fifteen remaining queries in the
MEDLARS collection were processed using the developed search procedures. The list of
relevant documents was of course empty at the beginning of each search process.
The companion global tree viewing program was not needed in a search, since a frequent
user of the browser system has little problem with navigation of the search tree; novice
users would certainly want to use the companion program, however. For each query, the
query text was inspected prior to the navigation, and the resulting query vector
edited as needed. Depending on the intermediate results obtained and the general feedback
gained from the browsing process, the query vector was often modified to produce
additional relevant documents.
`
On average, we were able to retrieve 55% of the relevant documents for the queries in
the test set. The automatic retrieval system had a recall value of only 32%. Thus, even
without taking full advantage of the information linked to a node (namely, the document
abstract itself), use of the interactive browser yielded a significant improvement over
automatic cluster searches. Additionally, use of the hypertext system resulted in the return
of slightly fewer irrelevant documents; with the browser, 25% of the documents found
were irrelevant compared to 28% in the automatic system.
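
For reference, recall and the proportion of irrelevant documents can be computed per query as sketched below. The document identifiers and set sizes are hypothetical and do not reproduce the experimental data; only the definitions are illustrated.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)


def irrelevant_fraction(retrieved, relevant):
    """Fraction of the retrieved documents that are not relevant (1 - precision)."""
    return 1.0 - precision(retrieved, relevant)


# Hypothetical query: 20 relevant documents, of which 11 are found, plus 4 misses.
relevant = set(range(20))
retrieved = set(range(11)) | {100, 101, 102, 103}

print(recall(retrieved, relevant))
print(irrelevant_fraction(retrieved, relevant))


def mean(values):
    """Average of per-query values across a query set."""
    values = list(values)
    return sum(values) / len(values)
```

Averaging the per-query values over all queries in a test set gives the kind of summary figure quoted above.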
`
`CONCLUSION
`
Our immediate objective in this work was to produce a retrieval system that allows easy
and accurate searching and browsing of a document collection. The representation of a
collection as a cluster hierarchy was shown to provide a solid basis on which to build a
hypertext retrieval system. The interactive browser is believed to be sufficiently
comprehensive and flexible to support a variety of experiments designed to
evaluate the effects of user control, user intervention, and the visual analysis of graphical
data representation on retrieval performance during the cluster search process. The dynamic
nature of the HyperCard environment on which the browser is based is well suited to meet
the need for flexibility required for these tasks.
`
`REFERENCES
`
[Fris88]

M. E. Frisse, Searching for Information in a Hypertext Medical Handbook.
Communications of the ACM, 31:7 (1988), pp. 880-886.

[Clev84]

C. W. Cleverdon, Optimizing Convenient On-Line Access to Bibliographic
Databases. Information Service Use, 4 (1984), pp. 31-47.

[Crou89]

D. B. Crouch and R. Korfhage, The Use of Visual Representations in
Information Retrieval Applications. In Visual Languages and Applications,
R. Korfhage (ed.), Pergamon Press, New York (1989).
`
`[S