The Use of Cluster Hierarchies in Hypertext
Information Retrieval*

Donald B. Crouch, Carolyn J. Crouch and Glenn Andreas

Department of Computer Science
University of Minnesota - Duluth
320 Heller Hall
Duluth, Minnesota 55812

ABSTRACT

The graph-traversal approach to hypertext information retrieval is a conceptualization of
hypertext in which the structural aspects of the nodes are emphasized. A user navigates
through such hypertext systems by evaluating the semantics associated with links
between nodes as well as the information contained in nodes. [Fris88] In this paper we
describe an hierarchical structure which effectively supports the graphical traversal of a
document collection in a hypertext system. We provide an overview of an interactive
browser based on cluster hierarchies. Initial results obtained from the use of the browser
in an experimental hypertext retrieval system are presented.

INTRODUCTION

Information retrieval is concerned with the representation, storage and retrieval of
documents or document surrogates. Information retrieval activities are routinely conducted
on-line under the control of search intermediaries or end users who have been trained to
use somewhat complex user-system interfaces. However, poor query formulations and
inadequate user-system interaction still occur even with skilled users. For example,
Cleverdon has noted that “if two search intermediaries search the same question on the
same database on the same host, only 40 percent of the output may be common to both
searches.” [Clev84]

What is being done to aid users of information retrieval systems? The most common
approaches are generally directed either toward the development of aids based on
sophisticated user interfaces or toward the development of expert system techniques for the
more complex operations of text retrieval systems. [Crou89] Research involving
sophisticated user interfaces is primarily concerned with system functioning and
convenience as it relates to the user; its goal is to facilitate the use of the system by
providing computerized aids previously available only to the search intermediary in
non-computerized forms. Among the facilities normally included in systems of this type are
vocabulary displays, thesaurus expansion of vocabulary items designed to add related
terms to already existing search words, the construction and storage of search protocols,
operations with previously formulated queries, etc. While this type of research is
warranted and its results encouraging, it has not necessarily produced more effective
retrieval but instead has generated tools for effortless learning and use of an information
retrieval system.

The other major area of research in the development of user aids for information retrieval
is concerned with the design of expert systems that facilitate access to the stored
collections. The goal of such research is to capture the expertise of search intermediaries
in formulating Boolean queries and in dealing with other types of retrieval services. The
expert approach is based on the use of domain-specific knowledge that covers the topic
areas represented by the collection, a language analyzer that can understand natural
language queries and translate them into appropriate internal forms, and rules for search
formulation and search strategy designed to choose search methods based on user criteria.
It may eventually become feasible to generate search formulation criteria in the form of
rules that do in fact reflect the expert knowledge of trained search intermediaries. However,
for the time being, one has reason to be careful in accepting many of the currently
unevaluated design proposals for expert system approaches as effective solutions to the
retrieval problem.

*This work was supported by the National Science Foundation under grant IRI 87-02735.

Hypertext '89 Proceedings

225

November 1989

We submit that a viable alternative to using either very sophisticated user interfaces or
expert systems as a solution to the retrieval problem consists of using only simple
user-system interactions which enhance the effectiveness of retrieval operations through the
addition of properly designed user friendly features. These features allow the user to
function in an active role, replacing the full natural language comprehension which is
desirable yet currently unavailable in an automatic search expert.

This approach to interface design is inherent in the concept of hypertext information
retrieval. Hypertext supports a user's exploration of informational data items by
representing data as a network of nodes containing text, graphics and other forms of
information. [Smit88] A user may navigate through the hypertext system by following
the links between nodes. The path a user follows is determined by his/her analysis of the
information contained within the nodes and the semantics associated with links between
the nodes. [Fris88]

In hypertext information retrieval, each node is generally assumed to be a single
document. Links exist which connect each document to other documents having keywords
in common with it; the semantics of the links between nodes are keywords (document
index terms) or some descriptive information representing the connected documents. In
this paper we introduce an hierarchical structure which provides additional semantic
information within and between nodes. This structure seems particularly well suited to the
user's exploration of a document collection in a visual context. The user may browse
among the data items by analyzing a graphical display of the structure itself as well as the
semantic links between nodes.

In the next two sections, we briefly describe the retrieval model and the characteristics of
the hierarchy on which our structure is based. We then describe a prototype of a hypertext
retrieval system utilizing the cluster hierarchy and present the initial results of an
experiment comparing retrieval performance of the hypertext system with that of an
automatic retrieval system.

INFORMATION RETRIEVAL MODELS

The most common information retrieval models are the Boolean retrieval model and the
vector space model. These two models are briefly described and their use in conventional
information retrieval systems examined.

Boolean Retrieval Model

Most retrieval systems are based on the Boolean model. Queries are expressed as a set of
terms connected by the Boolean operators and, or and not. Such systems retrieve
information by performing the Boolean operations on the corresponding sets of
documents containing the query terms. Although the Boolean model can be used
effectively in automatic text retrieval (in fact, a query can be formulated to retrieve any
particular subset of items), imprecise or broad requests utilizing the or relation can result
in the retrieval of large numbers of irrelevant texts while narrow or overly precise queries
utilizing the and relation can exclude many relevant items. In practice a compromise is
often obtained by the use of a query formulation that is neither too broad nor too narrow.
[Salt86]
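
The set operations described above can be illustrated with a minimal Python sketch. The toy collection and the names `DOCS`, `build_index` and `docs_with` are hypothetical and are not taken from any system discussed in this paper; the sketch only shows how or, and and not map onto union, intersection and complement of document sets.

```python
# Hypothetical toy collection: document id -> set of index terms.
DOCS = {
    1: {"hypertext", "retrieval", "cluster"},
    2: {"hypertext", "browser"},
    3: {"retrieval", "boolean"},
    4: {"cluster", "hierarchy", "retrieval"},
}


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)


def docs_with(term):
    """Return the set of documents indexed by the given term."""
    return INDEX.get(term, set())


# "or" is set union, "and" is intersection, "not" is complement.
broad = docs_with("hypertext") | docs_with("retrieval")
narrow = docs_with("hypertext") & docs_with("retrieval")
excluded = docs_with("retrieval") & (ALL_IDS - docs_with("cluster"))

print(sorted(broad), sorted(narrow), sorted(excluded))
```

In this toy collection the or query returns every document while the and query returns only one, which is the broad versus narrow trade-off noted above.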

Although the Boolean model has been widely accepted, it does have its problems:

* Boolean queries are difficult to construct; intermediaries are generally required to
  add terms not originally included, provide synonyms or alternate spellings, drop
  high-frequency terms, etc. [Fox86],

* Boolean systems generally do not provide for the assignment of term weights,

* the size of the subset of documents to be returned is difficult to control, and

* the retrieved documents are usually presented in a random order (no ranking based
  on an estimate of the query-document relevance is provided).

The difficulties associated with the construction of Boolean queries are well known. One
author recently commented that “research and development in information retrieval since
the 1950’s has concentrated on methods which can provide better retrieval without the
need for Boolean queries.” [Colv86]

Vector Space Model

The vector space model is conceptually the simplest retrieval model and is better suited
for use in hypertext retrieval systems than the Boolean model. In the vector space model,
the content of each document or query is represented by a set of possibly weighted content
terms (i.e., some form of content identifier, such as a word extracted from the document
text, a word phrase, or concept class chosen from a thesaurus). A term’s weight reflects
its importance in relation to the meaning of the document or query. Each informational
item (document) may then be considered a term vector, and the complete document
collection becomes a vector space whose dimension is equal to the number of distinct
terms used to identify the documents in the collection. [Rijs79, Salt83]

In the vector space model, it is assumed that similar or related documents or similar
documents and queries are represented by similar multidimensional term vectors.
Similarity is then generally defined as a function of the magnitudes of the matching terms
in the respective vectors.
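
As a minimal sketch of such a similarity function, the following Python code represents each term vector as a dictionary from term to weight and computes the inner product and the cosine measure over the matching terms. The sample vectors and the function names are assumptions made for illustration; the model itself does not prescribe a particular function.

```python
import math


def inner_product(x, y):
    """Sum of weight products over the terms the two vectors share."""
    return sum(w * y[t] for t, w in x.items() if t in y)


def cosine(x, y):
    """Inner product normalized by the lengths of both vectors."""
    norm = math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    return inner_product(x, y) / norm if norm else 0.0


# Hypothetical weighted term vectors.
query = {"lupus": 1.0, "renal": 1.0}
doc_a = {"lupus": 2.0, "renal": 1.0, "therapy": 1.0}
doc_b = {"therapy": 3.0}

print(cosine(query, doc_a))  # shares two terms with the query
print(cosine(query, doc_b))  # shares no term with the query
```

Only terms present in both vectors contribute to the numerator, so a document with no matching terms has a similarity of zero regardless of its other weights.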

A vector representation of documents and queries facilitates certain retrieval operations,
namely:

* The construction of a clustered document file (consisting of classes of documents
  such that documents within a given class are substantially similar to each other).
  In clustered collections, an automatic search can be limited to the documents
  within those clusters whose class vector representations are similar to the query
  vector.

* The ranking of retrieved documents in decreasing order of their similarity with
  the query.

* The automatic reformulation of the query based on relevance assessments
  supplied by the user for previously retrieved documents. The intent of relevance
  feedback is to produce a modified query whose similarity to the relevant
  documents is greater than that of the original query while its similarity to the
  nonrelevant items is smaller.
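
The intent of relevance feedback can be sketched as follows. The paper does not give a reformulation formula, so this example swaps in a Rocchio-style update (add a fraction of the mean relevant vector, subtract a fraction of the mean nonrelevant vector, drop non-positive weights); the coefficients `alpha`, `beta` and `gamma` and all sample vectors are assumptions chosen for illustration.

```python
import math


def inner_product(x, y):
    return sum(w * y[t] for t, w in x.items() if t in y)


def cosine(x, y):
    norm = math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    return inner_product(x, y) / norm if norm else 0.0


def reformulate(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update (an assumed formula, not the paper's own)."""
    new = {t: alpha * w for t, w in query.items()}
    for docs, sign, coef in ((relevant, 1, beta), (nonrelevant, -1, gamma)):
        for doc in docs:
            for t, w in doc.items():
                new[t] = new.get(t, 0.0) + sign * coef * w / len(docs)
    # Keep only terms whose weight is still positive.
    return {t: w for t, w in new.items() if w > 0}


# Hypothetical query and user judgments.
query = {"lupus": 1.0, "renal": 1.0}
rel_doc = {"lupus": 1.0, "azathioprine": 1.0}
nonrel_doc = {"renal": 1.0, "transplant": 1.0}

new_query = reformulate(query, [rel_doc], [nonrel_doc])
print(cosine(query, rel_doc), cosine(new_query, rel_doc))
print(cosine(query, nonrel_doc), cosine(new_query, nonrel_doc))
```

With these sample vectors the modified query moves toward the relevant document and away from the nonrelevant one, which is the behavior the third item above asks for.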

The vector processing model also exhibits certain disadvantages, namely:

* Some model parameters, such as the query-document similarity function, are not
  derivable within the system but instead are chosen a priori by the system
  designer.

* Terms are assumed to be independent of one another.

* Term relationships are not expressible within the model.

A recent characterization of the vector space model is contained in [Wong84].

CLUSTERED DOCUMENT ENVIRONMENTS

A principal advantage of the vector space model for use in hypertext information retrieval
is that algorithms exist for structuring a document collection in such a manner that
similar documents are grouped together. A cluster hierarchy is represented by a tree
structure in which terminal nodes correspond to single documents and interior nodes to
groups of documents. In a hypertext system based on a clustered environment, the user
can readily focus his/her search on those groups (clusters) that are likely to contain
documents which are highly similar to the query. Additionally, the cluster hierarchy is
beneficial as a browsing tool in that it makes it easy to locate neighboring
items with related subject descriptions.

Agglomerative Cluster Hierarchy

Voorhees [Voor85] has shown that retrieval effectiveness may be enhanced in automatic
retrieval systems when a type of clustering, known as agglomerative hierarchic clustering,
is used to generate a cluster structure. In such a clustering method, each document in the
collection is considered initially to be a singleton cluster. The two closest clusters are
successively merged until only one cluster remains. The definition of closest depends on
the actual clustering method being used.

Fig. 1 contains an example of a hierarchy for the single link agglomerative clustering
method. In the single link method the similarity between two clusters is the maximum of
the similarities between all pairs of documents such that one document of the pair is in
one cluster and the other document is in the other cluster. It may be noted that in the
hierarchy documents may appear at any level and that clusters overlap only in the sense
that smaller clusters are nested within larger clusters.
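
The single link method can be sketched in a few lines of Python. This is a naive illustration rather than the clustering code used in our experiments: the pairwise similarity values in `SIM` are invented, the function `single_link` is a hypothetical name, and each step merges exactly two clusters, so a group that would appear as one multi-way node in a drawn hierarchy shows up here as consecutive merges at the same level.

```python
from itertools import combinations

# Invented pairwise document similarities (symmetric, keyed by unordered pair).
SIM = {
    frozenset(pair): value
    for pair, value in {
        ("B", "C"): 0.9, ("C", "D"): 0.9, ("B", "D"): 0.6,
        ("A", "B"): 0.7, ("A", "C"): 0.5, ("A", "D"): 0.4,
    }.items()
}


def single_link(sim, items):
    """Return the merge history as (level, cluster_a, cluster_b) tuples."""
    clusters = [frozenset([item]) for item in items]

    def link(a, b):
        # Single link: the maximum similarity over all cross-cluster pairs.
        return max(sim[frozenset((x, y))] for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        # Merge the two closest (most similar) clusters.
        a, b = max(combinations(clusters, 2), key=lambda p: link(*p))
        merges.append((link(a, b), a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


merges = single_link(SIM, ["A", "B", "C", "D"])
for level, a, b in merges:
    print(level, sorted(a), sorted(b))
```

With these invented values D joins the cluster {B, C} at level 0.9 through its similarity to C alone, even though its similarity to B is only 0.6; this chaining is characteristic of the single link criterion.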

Each cluster in Fig. 1 is labelled with the level of association between the items under it.
The clustering level determines the association strength of the corresponding items. Thus
the similarity between items B, C and D in Fig. 1 is 0.9. On the other hand, the
similarity between item A and the cluster containing items B, C and D is only 0.7. The
level of association is a useful link semantic in a hypertext system.

Searching a Clustered Environment

To retrieve documents automatically in a clustered environment, comparisons are
generally made between the query vector and document vectors using one of the standard
measures of similarity. A cluster search simplifies the search process by limiting the
search to subsets of documents. For example, with an agglomeratively clustered tree such
as that shown in Fig. 1, a straightforward, narrow, depth-first search starts at the top of
the tree and calculates the similarity between the query and each of its children. The child
most similar to the query is selected, and the similarity between the query and each of the
non-document children of that node is calculated. The process is repeated until either all
the similarities between the query and the non-document children of some node are less
than that between the query and the node itself, or all the children of that node are
document nodes. The documents comprising the cluster represented by that node are
returned. The search may be broadened by considering more than one path at each level.
The broadest search considers all paths and abandons them as they fail certain criteria.
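
A minimal Python sketch of the narrow depth-first search follows. The `Node` class, the centroid values and the use of a plain inner product as the similarity measure are all assumptions made for illustration. As a simplification, only cluster (non-document) children are compared at every step, including at the root.

```python
class Node:
    """A tree node: a document (doc_id set) or a cluster with a centroid."""

    def __init__(self, vector, children=(), doc_id=None):
        self.vector = vector
        self.children = list(children)
        self.doc_id = doc_id

    def is_document(self):
        return self.doc_id is not None

    def documents(self):
        if self.is_document():
            return [self.doc_id]
        return [d for child in self.children for d in child.documents()]


def inner_product(x, y):
    return sum(w * y[t] for t, w in x.items() if t in y)


def top_down_search(root, query, sim=inner_product):
    """Follow the most similar cluster child until no child beats its parent."""
    node = root
    while True:
        clusters = [c for c in node.children if not c.is_document()]
        if not clusters:  # all children are documents
            return node.documents()
        best = max(clusters, key=lambda c: sim(query, c.vector))
        if node is not root and sim(query, best.vector) < sim(query, node.vector):
            return node.documents()
        node = best


# Hypothetical tree with invented centroids.
d1 = Node({"lupus": 1.0}, doc_id="d1")
d2 = Node({"renal": 1.0}, doc_id="d2")
d3 = Node({"heart": 1.0}, doc_id="d3")
d4 = Node({"heart": 1.0}, doc_id="d4")
d5 = Node({"skin": 1.0}, doc_id="d5")
c1a = Node({"lupus": 0.2, "skin": 0.8}, [d1, d5])
c1 = Node({"lupus": 0.5, "renal": 0.5}, [c1a, d2])
c2 = Node({"heart": 1.0}, [d3, d4])
root = Node({"lupus": 0.25, "renal": 0.25, "heart": 0.5}, [c1, c2])

result = top_down_search(root, {"lupus": 1.0, "renal": 1.0})
print(result)
```

In this example the search descends from the root into the cluster `c1` and stops there, because the only cluster child of `c1` is less similar to the query than `c1` itself.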

Fig. 1 A sample single link hierarchy

A bottom-up search may also be performed on such a tree. The cluster at the lowest level
of the tree whose centroid is most similar to the query is chosen as the node at which the
search will start. The search continues up the tree until the similarity between the query
and the parent of the current node is smaller than the similarity between the query and the
current node. The documents contained in the cluster corresponding to the current node are
returned. The bottom-up search is often more effective due to the uncertainty involved at
high levels of the hierarchy. [Crof80]
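
The bottom-up search can be sketched in the same style. Again the `Node` class, the centroids and the inner product measure are illustrative assumptions. One further assumption is needed: "lowest level" is read here as the clusters all of whose children are documents.

```python
class Node:
    """A tree node with a parent pointer; documents carry a doc_id."""

    def __init__(self, vector, children=(), doc_id=None):
        self.vector = vector
        self.doc_id = doc_id
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def documents(self):
        if self.doc_id is not None:
            return [self.doc_id]
        return [d for child in self.children for d in child.documents()]


def inner_product(x, y):
    return sum(w * y[t] for t, w in x.items() if t in y)


def lowest_clusters(node):
    """Clusters whose children are all documents (our reading of 'lowest')."""
    if node.doc_id is not None:
        return []
    if all(c.doc_id is not None for c in node.children):
        return [node]
    return [n for c in node.children for n in lowest_clusters(c)]


def bottom_up_search(root, query, sim=inner_product):
    """Start at the best low-level cluster; climb while the parent is no worse."""
    node = max(lowest_clusters(root), key=lambda n: sim(query, n.vector))
    while node.parent is not None and (
        sim(query, node.parent.vector) >= sim(query, node.vector)
    ):
        node = node.parent
    return node.documents()


# Hypothetical tree with invented centroids.
d1 = Node({"lupus": 1.0}, doc_id="d1")
d2 = Node({"renal": 1.0}, doc_id="d2")
d3 = Node({"heart": 1.0}, doc_id="d3")
d4 = Node({"heart": 1.0}, doc_id="d4")
d5 = Node({"skin": 1.0}, doc_id="d5")
c1a = Node({"lupus": 0.2, "skin": 0.8}, [d1, d5])
c1 = Node({"lupus": 0.5, "renal": 0.5}, [c1a, d2])
c2 = Node({"heart": 1.0}, [d3, d4])
root = Node({"lupus": 0.25, "renal": 0.25, "heart": 0.5}, [c1, c2])

result = bottom_up_search(root, {"lupus": 1.0, "renal": 1.0})
print(result)
```

Here the search starts at `c1a`, climbs to `c1` because the parent is more similar to the query, and stops there because the root is less similar than `c1`.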

Cluster hierarchies have been used effectively in automatic searches. Such hierarchies are
also useful in performing searches based on browsing operations. These types of
operations, we believe, can produce significant improvement in retrieval performance.
Automatic cluster searches are highly structured; the next link in the search path is
determined solely on the basis of the similarity between the query vector and the vector
representation of the node being evaluated. By displaying suitable portions of the
hierarchy during the course of the search operations and letting the user choose appropriate
search paths at each point, the output obtained should be superior to that obtained by
automatic cluster searching. For example, in a hypertext system with an interactive
browser, following evaluation of items B, C, and D in the sample tree of Fig. 1, the user
has the choice of exploring either a tightly clustered structure containing items F and G
(which are very similar to each other with a similarity value of 0.8) or of staying in the
same cluster and evaluating item A (at a lower similarity level of 0.7). In contrast, the
control mechanism of the automatic search procedure may terminate the search at the node
labelled 0.7 and never evaluate the cluster containing items F and G. The effectiveness of
this type of user-directed, interactive browsing is determined by comparing the results of
such interactive searches to those obtained by automatic cluster searches.

THE INTERACTIVE BROWSER

A browser incorporating the cluster hierarchy as its primary network structure was
implemented on a Macintosh IIx computer using HyperCard. The Macintosh is connected
via a local area network to a SUN System on which the SMART information retrieval
system [Salt71] resides. The SMART system provides packages for textual analysis,
clustering, performance evaluation, etc.

To conduct a search using the browser, a user initially specifies a natural language query
which is subsequently transformed into a term vector representation via the SMART
retrieval system. The hypertext system then displays a window containing the original
query and its corresponding term vector (Fig. 2). As suggested by the annotated display of
Fig. 3, a user may obtain the word stem associated with each concept in the term vector
as well as the document frequency of that term by clicking on a concept number in the
vector. At any point during the search process, the user may add or delete concepts from
the query vector representation or completely re-specify the query itself. The query
window also contains a list of identifiers representing the documents which the user has
determined to be relevant to the query. Initially this list is empty; however, as the user
conducts the search process, he/she enters documents into the list.

To begin (or continue) a browse in the clustered environment, the user clicks on the
Use Query button. The interface presents a display of the clustered document space
represented as a complete link hierarchy. A user may begin an exploration of the cluster
tree at any point, for example, at the root node for a top-down approach or at a leaf node
(document) for a bottom-up approach. The user may prefer to initiate a search at an
interior node (a cluster) which contains one or more documents known to be similar to
the query.

In general, a tree representation of a clustered collection is too large to be displayed in its
entirety. Therefore, a user is presented with two views of the cluster tree simultaneously:
a local view containing the subtree within which the user is currently browsing (see
Fig. 4) and a global view, a more comprehensive view of the tree containing a
significantly larger number of nodes than the local view (see Fig. 5). A user-directed
traversal among the nodes is simultaneously reflected in both displays. The global view
permits the user to observe where the search is being conducted in relation to the entire
tree while the local view provides the user with more detailed information about a specific
subtree.

As may be noted, many links and informational items are provided by the interface
system to aid the user during the browsing process. As suggested by Fig. 4, the local
view of the tree:

* Uses different iconic representations to distinguish interior nodes (clusters) from
  leaf nodes (documents).

* Displays for each interior node the level at which the documents cluster. The
  clustering level represents the degree of association between the items under it.

* Lists the number of documents contained within the subtree defined by each node
  as well as the number of children of that node. This information can also be
  obtained by counting the nodes in the global view of the tree.

* Lists the value of the correlation measure of the query vector with either the
  centroid vector or the document vector associated with each node in the subtree.
  During the search process the user may change the correlation measure being
  calculated by means of the Correlation Measure pop-up menu. At present, the
  system provides a choice of several measures including vector product, inner
  product, Tanimoto, cosine and overlap.

* Provides a listing of the concepts contained within the query vector (see also
  Fig. 6). This information is also displayed in the query window; however, in the
  tree display, the concepts in the query are displayed in ascending order of
  document frequency. The user may alter the query by adding or deleting concepts
  from the query vector during the search process without returning to the query
  window.

* Uses different iconic representations to distinguish relevant documents from the
  other documents in the tree. A list of the documents which the user has chosen
  as relevant to the query is maintained in the display. The user may freely insert
  document identifiers into and delete items from this list. The icons of the
  documents in this list are then highlighted in the tree representation.

* Lists document identifiers represented by the leaf nodes of the tree.
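
Several of the correlation measures named in the list above can be sketched in Python as follows. The formulas are common textbook definitions for weighted term vectors; whether the browser computes them in exactly this form is an assumption, and the measure listed as "vector product" is omitted because its definition is not given here. The dictionary `MEASURES` is a hypothetical stand-in for the pop-up menu.

```python
import math


def inner(x, y):
    """Inner product over the terms shared by both vectors."""
    return sum(w * y[t] for t, w in x.items() if t in y)


def cosine(x, y):
    norm = math.sqrt(inner(x, x)) * math.sqrt(inner(y, y))
    return inner(x, y) / norm if norm else 0.0


def tanimoto(x, y):
    xy = inner(x, y)
    denom = inner(x, x) + inner(y, y) - xy
    return xy / denom if denom else 0.0


def overlap(x, y):
    shared = sum(min(w, y[t]) for t, w in x.items() if t in y)
    denom = min(sum(x.values()), sum(y.values()))
    return shared / denom if denom else 0.0


# Hypothetical stand-in for the Correlation Measure pop-up menu.
MEASURES = {
    "inner product": inner,
    "cosine": cosine,
    "Tanimoto": tanimoto,
    "overlap": overlap,
}

# Invented query vector and node (centroid or document) vector.
query = {"a": 1.0, "b": 1.0}
node = {"a": 1.0, "c": 1.0}
scores = {name: f(query, node) for name, f in MEASURES.items()}
print(scores)
```

Switching the measure changes the value shown at every node of the local view, so the same subtree can look more or less promising depending on the choice.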

[Figure: query window showing the query text ("Effect of azathioprine on systemic lupus erythematosus, particularly in regard to renal lesions"), the concepts in the query with their weights, and the list of documents relevant to the query. Annotation: "Clicking on a concept number reveals the word forming that concept and the document frequency of the concept. It also outlines the first occurrence of that word".]

Fig. 3 Clicking on a concept number

[Figure: local view of the cluster tree, with callouts for an interior node with one concept number in common, a document with no concept numbers in common, the Correlation Measure menu, the list of concept numbers used, the list of relevant documents, the clustering level, the "# children/# total documents" count, and the To Doc... button.]

Fig. 4 The browser and its parts

Fig. 5 The overview of the tree

One can obtain additional information from the local tree display by clicking on various
items contained within the display. For example, by option clicking on a node which has
terms in common with a query, the selected node's icon is replaced with an informational
window listing the terms in common and the weights associated with these terms
(Fig. 7). Choosing a concept number in the query vector results in a display of the
textual description of the chosen term and its document frequency (Fig. 6). Clicking on a
terminal node (a document icon) results in the display of additional information associated
with the document. An example of a document window is contained in Fig. 8. One can
also review the query or enter a new query by selecting the To Query button. It should be
noted that each of the display windows has informational features associated with it
which support the visual search process.

The global view of the tree (Fig. 5) has few items associated with a node, since the
purpose of this display is to assist the user in locating his/her position within the tree
during the navigation of the cluster space. The three types of icons used in the global
view distinguish the following types of nodes:

* A document classified as relevant by the user. As previously noted, identifiers of
  the relevant documents are highlighted in the local view.

* A document which has not been classified as relevant.

* The node corresponding to the central node in the local view of the tree.

These nodes are color coded to facilitate a quick review of the tree at large.

`0.086000
`
`Fig. 6 Finding the term for a concept numberin the browser
`
[Figure: annotation "Option-clicking on a document or node will list the common concept number (and its weight)", with the example entry "3340 @ 0.05781".]

Fig. 7 Getting common concept number information

[Figure: document card showing the item ID, the text of the abstract (here, "Septicemia due to Mycoplasma hominis type 1" by Viola M. Young, Ph.D., and Sheldon M. Wolff, M.D.), and a column of weights.]

Fig. 8 Document card and its parts

The user may view (move to) other portions of the tree not currently in the local or
global windows in one of several ways:

* By scrolling up or down or left or right in the global display window. The
  window moves across the tree in the global view in the direction represented by
  the scrolling action of the mouse.

* By clicking on a node in the local view. In this case, the chosen node becomes
  the central node in the local view. Such an action effectively moves the local
  view up or down one level of the tree.

* By clicking on the To Root button in the local view. This action causes the
  interface system to redraw the local view of the tree with the central node
  becoming the root node.

* By clicking on the To Doc button in the local view. The To Doc button
  allows the user to view the document window associated with any document in
  the collection. To obtain a document window, the user must specify the
  document’s identifier. Clicking on the To Tree button in the document
  window returns the user to a local view of the tree with its central node
  corresponding to the parent node of the document contained in the document
  window.

The tree displays described above and the informational windows associated with items
contained in the displays are effective for representing a local cluster arrangement for a
small collection or a local area of a larger collection.

EVALUATION OF THE BROWSER

The SMART information retrieval system provides a general framework for conducting
retrieval experiments. SMART has fully automatic iterative search methods with
automatic relevance document classification. The means exist within the system for
evaluating the effectiveness of the retrieval process; the effectiveness of any interactive
system can be established by comparison with automatic search procedures contained in
the SMART system. SMART also provides collections of documents and corresponding
sets of queries which may be used for experimentation. Relevance assessments have been
produced by persons knowledgeable of the subject matter in the collections.

In order to develop a general search strategy which a user may employ in the hypertext
retrieval system, we focused on the MEDLARS collection, a somewhat homogeneous
collection generated by the National Library of Medicine. A user-controlled cluster search
technique which performs well in a homogeneous collection will perform as well or
generally significantly better in a heterogeneous collection. MEDLARS consists of 1,033
documents in the medical field and a corresponding set of 30 queries. The document
vectors were generated from an analysis of the abstracts of the documents. The
MEDLARS collection was then clustered using a complete link clustering algorithm,
resulting in a very wide tree. The cluster hierarchy contains 76 subtrees at the root node,
and the maximum depth of the tree is ten.

To assist in the development of a method for conducting a visual interactive search of a
clustered collection, we divided the MEDLARS query collection into two subsets. One set
of queries (the base set) was used to aid in the development of the methodology; the other
set was used to estimate the performance of the interactive search process.

We performed an interactive search in the hypertext system for each query in the base set
to determine the optimal search a user would follow in order to retrieve the known
relevant documents of the query. By conducting this process for each of the 15 queries in
the base set, we were able to observe and define the common threads linking relevant
documents to the cluster tree. Our observations and the resulting search method that
evolved are reported in [Andr89].

An important point about both phases of the experiment is that the actual text of the
document (as shown in the document cards) was never examined during the interactive
search process to determine document relevance. Doing so would of course have
substantially improved retrieval performance in an actual user-controlled search process.
However, one of our objectives when conducting this experiment was to apply some of
the insight gained from the interactive visual search to an automated system. An
automated search system does not consult actual text during the search process; it uses
only a vector representation of the document. We are now performing extensive testing
with larger collections which does not place such a severe constraint on a user.

Once we had developed some search guidelines, the fifteen remaining queries in the
MEDLARS collection were processed using the developed search procedures. The list of
relevant documents was of course empty at the beginning of each search process.
The companion global tree viewing program was not needed in a search, since a frequent
user of the browser system has little problem with navigation of the search tree; novice
users would certainly want to use the companion program, however. For each query, the
query text was inspected prior to the navigation, and the resulting query vector
edited as needed. Depending on the intermediate results obtained and the general feedback
gained from the browsing process, the query vector was often modified to produce
additional relevant documents.

On average, we were able to retrieve 55% of the relevant documents for the queries in
the test set. The automatic retrieval system had a recall value of only 32%. Thus, even
without taking full advantage of the information linked to a node (namely, the document
abstract itself), use of the interactive browser yielded a significant improvement over
automatic cluster searches. Additionally, use of the hypertext system resulted in the return
of slightly fewer irrelevant documents; with the browser, 25% of the documents found
were irrelevant compared to 28% in the automatic system.
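
For readers who wish to relate these figures to the usual evaluation measures, the following Python sketch computes recall (the fraction of relevant documents retrieved) and precision (the fraction of retrieved documents that are relevant). The document identifiers and set sizes are invented so that a single hypothetical query reproduces the browser's averages; they are not data from the experiment, which reports averages over the test queries.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)


# Invented example: 60 relevant documents, 44 retrieved, 33 of them relevant.
relevant = set(range(1, 61))
retrieved = set(range(1, 34)) | set(range(101, 112))

r = recall(retrieved, relevant)
p = precision(retrieved, relevant)
print(f"recall={r:.2f} precision={p:.2f} irrelevant={1 - p:.2f}")
```

The share of irrelevant documents among those found is one minus precision, so the reported 25% and 28% correspond to precision values of 75% for the browser and 72% for the automatic system.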

CONCLUSION

Our immediate objective in this work was to produce a retrieval system that allows easy
and accurate searching and browsing of a document collection. The representation of a
collection as a cluster hierarchy was shown to provide a solid basis on which to build a
hypertext retrieval system. The interactive browser is believed to be sufficiently
comprehensive and flexible enough to support a variety of experiments designed to
evaluate the effects of user control, user intervention, and the visual analysis of graphical
data representation on retrieval performance during the cluster search process. The dynamic
nature of the HyperCard environment on which the browser is based is well suited to meet
the need of flexibility required for these tasks.

REFERENCES

[Fris88]  M. E. Frisse, Searching for Information in a Hypertext Medical Handbook.
          Communications of the ACM, 31:7 (1988), pp. 880-886.

[Clev84]  C. W. Cleverdon, Optimizing Convenient On-Line Access to Bibliographic
          Databases. Information Services and Use, 4 (1984), pp. 37-47.

[Crou89]  D. B. Crouch and R. Korfhage, The Use of Visual Representations in
          Information Retrieval Applications. In Visual Languages and Applications,
          R. Korfhage (ed.), Pergamon Press, New York (1989).

[Smit88]  J. B. Smith and S. F. Weiss, Hypertext. Communications of the ACM,
          31:7 (1988), pp. 816-819.

[Salt86]  G. Salton, Another Look at Automatic Text-Retrieval Systems.
          Communications of the ACM, 29:7 (1986), pp. 648-656.

[Fox86]   E. A. Fox, Information Retrieval: Research into New Capabilities. In CD
          ROM, S. Lambert and S. Ropiequet (eds.), Microsoft Press, Redmond,
          Washington (1986), pp. 143-174.

[Colv86]  G. Colvin, The Current State of Text Retrieval. In CD ROM, S. Lambert
          and S. Ropiequet (eds.), Microsoft Press, Redmond, Washington (1986), pp.
          131-136.

[Rijs79]  C. J. Van Rijsbergen, Information Retrieval, Second Edition. Butterworths,
          London (1979).

[Salt83]  G. Salton and M. J. McGill, Introduction to Modern Information Retrieval.
          McGraw-Hill Book Company, New York (1983).

[Wong84]  S. K. M. Wong and V. V. Raghavan, Vector Space Model of Information
          Retrieval: A Reevaluation. In Research and Development in Information
          Retrieval, C. J. van Rijsbergen (ed.), Cambridge University Press, London
          (1

[Voor85]

[Crof80]

[Salt71]

[Andr89]
