The Use of Cluster Hierarchies in Hypertext
Information Retrieval

Donald B. Crouch, Carolyn J. Crouch and Glenn Andreas

Department of Computer Science
University of Minnesota - Duluth
320 Heller Hall
Duluth, Minnesota 55812
ABSTRACT

The graph-traversal approach to hypertext information retrieval is a conceptualization of
hypertext in which the structural aspects of the nodes are emphasized. A user navigates
through such hypertext systems by evaluating the semantics associated with links
between nodes as well as the information contained in nodes. [Fris88] In this paper we
describe a hierarchical structure which effectively supports the graphical traversal of a
document collection in a hypertext system. We provide an overview of an interactive
browser based on cluster hierarchies. Initial results obtained from the use of the browser
in an experimental hypertext retrieval system are presented.

INTRODUCTION

Information retrieval is concerned with the representation, storage and retrieval of
documents or document surrogates. Information retrieval activities are routinely conducted
on-line under the control of search intermediaries or end users who have been trained to
use somewhat complex user-system interfaces. However, poor query formulations and
inadequate user-system interaction still occur even with skilled users. For example,
Cleverdon has noted that "if two search intermediaries search the same question on the
same database on the same host, only 40 percent of the output may be common to both
searches." [Clev84]

What is being done to aid users of information retrieval systems? The most common
approaches are generally directed either toward the development of aids based on
sophisticated user interfaces or toward the development of expert system techniques for the
more complex operations of text retrieval systems. [Crou89] Research involving
sophisticated user interfaces is primarily concerned with system functioning and
convenience as it relates to the user; its goal is to facilitate the use of the system by
providing computerized aids previously available only to the search intermediary in
non-computerized forms. Among the facilities normally included in systems of this type are
vocabulary displays, thesaurus expansion of vocabulary items designed to add related
terms to already existing search words, the construction and storage of search protocols,
operations with previously formulated queries, etc. While this type of research is
warranted and its results encouraging, it has not necessarily produced more effective
retrieval but instead has generated tools for effortless learning and use of an information
retrieval system.
`
The other major area of research in the development of user aids for information retrieval
is concerned with the design of expert systems that facilitate access to the stored
collections. The goal of such research is to capture the expertise of search intermediaries
in formulating Boolean queries and in dealing with other types of retrieval services. The
expert approach is based on the use of domain-specific knowledge that covers the topic
areas represented by the collection, a language analyzer that can understand natural
language queries and translate them into appropriate internal forms, and rules for search
formulation and search strategy designed to choose search methods based on user criteria.
It may eventually become feasible to generate search formulation criteria in the form of
rules that do in fact reflect the expert knowledge of trained search intermediaries. However,
for the time being, one has reason to be careful in accepting many of the currently
unevaluated design proposals for expert system approaches as effective solutions to the
retrieval problem.

*This work was supported by the National Science Foundation under grant IRI 87-02735.

Hypertext '89 Proceedings
225
November 1989

PETITIONERS - EXHIBIT 1011
IPR2022-00217
`
We submit that a viable alternative to using either very sophisticated user interfaces or
expert systems as a solution to the retrieval problem consists of using only simple
user-system interactions which enhance the effectiveness of retrieval operations through the
addition of properly designed user-friendly features. These features allow the user to
function in an active role, replacing the full natural language comprehension which is
desirable yet currently unavailable in an automatic search expert.

This approach to interface design is inherent in the concept of hypertext information
retrieval. Hypertext supports a user's exploration of informational data items by
representing data as a network of nodes containing text, graphics and other forms of
information. [Smit88] A user may navigate through the hypertext system by following
the links between nodes. The path a user follows is determined by his/her analysis of the
information contained within the nodes and the semantics associated with links between
the nodes. [Fris88]

In hypertext information retrieval, each node is generally assumed to be a single
document. Links exist which connect each document to other documents having keywords
in common with it; the semantics of the links between nodes are keywords (document
index terms) or some descriptive information representing the connected documents. In
this paper we introduce a hierarchical structure which provides additional semantic
information within and between nodes. This structure seems particularly well suited to the
user's exploration of a document collection in a visual context. The user may browse
among the data items by analyzing a graphical display of the structure itself as well as the
semantic links between nodes.
`
`In the next two sections, we briefly describe the retrieval model and the characteristics of
`the hierarchy on which our structure is based. We then describe a prototype of a hypertext
`retrieval system utilizing the cluster hierarchy and present the initial results of an
`experiment comparing retrieval performance of the hypertext system with that of an
`automatic retrieval system.
`
INFORMATION RETRIEVAL MODELS

The most common information retrieval models are the Boolean retrieval model and the
vector space model. These two models are briefly described and their use in conventional
information retrieval systems examined.
`
Boolean Retrieval Model

Most retrieval systems are based on the Boolean model. Queries are expressed as a set of
terms connected by the Boolean operators and, or and not. Such systems retrieve
information by performing the Boolean operations on the corresponding sets of
documents containing the query terms. Although the Boolean model can be used
effectively in automatic text retrieval (in fact, a query can be formulated to retrieve any
particular subset of items), imprecise or broad requests utilizing the or relation can result
in the retrieval of large numbers of irrelevant texts, while narrow or overly precise queries
utilizing the and relation can exclude many relevant items. In practice a compromise is
often obtained by the use of a query formulation that is neither too broad nor too narrow.
[Salt86]
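The set operations described above can be sketched in a few lines. This is only an illustration: the inverted index, its terms and the document numbers are invented for the example and do not come from any collection discussed in this paper.

```python
# Hypothetical inverted index: term -> set of ids of documents containing it.
index = {
    "lupus": {1, 2, 5},
    "renal": {2, 3, 5},
    "azathioprine": {5, 7},
}


def docs(term):
    """Return the set of documents containing the term (empty if unseen)."""
    return index.get(term, set())


# A broad request: the "or" relation is a set union.
broad = docs("lupus") | docs("renal") | docs("azathioprine")
# A narrow request: the "and" relation is a set intersection.
narrow = docs("lupus") & docs("renal") & docs("azathioprine")
# "and not" is a set difference.
excluded = docs("lupus") - docs("renal")

print(sorted(broad))     # [1, 2, 3, 5, 7]
print(sorted(narrow))    # [5]
print(sorted(excluded))  # [1]
```

The broad query returns five of the seven indexed documents while the narrow one returns a single document, which is the trade-off the paragraph above describes. Note that the result is an unordered set, so no ranking is available.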
`
Although the Boolean model has been widely accepted, it does have its problems:

* Boolean queries are difficult to construct; intermediaries are generally required to
  add terms not originally included, provide synonyms or alternate spellings, drop
  high-frequency terms, etc. [Fox86],

* Boolean systems generally do not provide for the assignment of term weights,

* the size of the subset of documents to be returned is difficult to control, and

* the retrieved documents are usually presented in a random order (no ranking based
  on an estimate of the query-document relevance is provided).

The difficulties associated with the construction of Boolean queries are well known. One
author recently commented that "research and development in information retrieval since
the 1950's has concentrated on methods which can provide better retrieval without the
need for Boolean queries." [Colv86]
`
Vector Space Model

The vector space model is conceptually the simplest retrieval model and is better suited
for use in hypertext retrieval systems than the Boolean model. In the vector space model,
the content of each document or query is represented by a set of possibly weighted content
terms (i.e., some form of content identifier, such as a word extracted from the document
text, a word phrase, or concept class chosen from a thesaurus). A term's weight reflects
its importance in relation to the meaning of the document or query. Each informational
item (document) may then be considered a term vector, and the complete document
collection becomes a vector space whose dimension is equal to the number of distinct
terms used to identify the documents in the collection. [Rijs79, Salt83]
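As a minimal sketch of this representation, the following code turns weighted term sets into vectors over a shared vocabulary. The two documents, their terms and their weights are invented for illustration, and the helper names `build_vocabulary` and `to_vector` are hypothetical.

```python
def build_vocabulary(weighted_docs):
    """Sorted list of distinct terms; its length is the dimension of the space."""
    return sorted({term for weights in weighted_docs.values() for term in weights})


def to_vector(weights, vocabulary):
    """Dense term vector: one component per vocabulary term, 0.0 if absent."""
    return [weights.get(term, 0.0) for term in vocabulary]


# Hypothetical documents, each described by weighted content terms.
documents = {
    "d1": {"lupus": 2.0, "renal": 1.0},
    "d2": {"renal": 1.0, "lesion": 3.0},
}

vocabulary = build_vocabulary(documents)
vectors = {doc_id: to_vector(w, vocabulary) for doc_id, w in documents.items()}

print(vocabulary)     # ['lesion', 'lupus', 'renal']
print(vectors["d1"])  # [0.0, 2.0, 1.0]
print(vectors["d2"])  # [3.0, 0.0, 1.0]
```

With three distinct terms the space is three-dimensional. A query would be mapped into the same space with the same function.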
`
In the vector space model, it is assumed that similar or related documents or similar
documents and queries are represented by similar multidimensional term vectors.
Similarity is then generally defined as a function of the magnitudes of the matching terms
in the respective vectors.
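One simple function of this kind is the inner product, which sums the products of the weights of the terms two vectors share. The sketch below uses sparse vectors stored as dictionaries; the query, documents and weights are invented for illustration.

```python
def inner_product(a, b):
    """Sum of weight products over the terms present in both sparse vectors."""
    shared = a.keys() & b.keys()
    return sum(a[term] * b[term] for term in shared)


# Hypothetical sparse vectors: term -> weight.
query = {"lupus": 1.0, "renal": 1.0}
d1 = {"lupus": 2.0, "renal": 1.0}
d2 = {"renal": 1.0, "lesion": 3.0}

print(inner_product(query, d1))  # 3.0
print(inner_product(query, d2))  # 1.0
```

Only matching terms contribute, so `d2`, which shares a single term with the query, scores lower than `d1`, which shares two.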
`
A vector representation of documents and queries facilitates certain retrieval operations,
namely:

* The construction of a clustered document file (consisting of classes of documents
  such that documents within a given class are substantially similar to each other).
  In clustered collections, an automatic search can be limited to the documents
  within those clusters whose class vector representations are similar to the query
  vector.

* The ranking of retrieved documents in decreasing order of their similarity with
  the query.

* The automatic reformulation of the query based on relevance assessments
  supplied by the user for previously retrieved documents. The intent of relevance
  feedback is to produce a modified query whose similarity to the relevant
  documents is greater than that of the original query while its similarity to the
  nonrelevant items is smaller.
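The ranking and reformulation operations above can be sketched as follows. The update rule is a Rocchio-style weighted combination, chosen here as one well-known way to realize relevance feedback; the text does not say which rule is used, and the coefficients `alpha`, `beta`, `gamma` and all vectors are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * \
        math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    terms = {t for v in vectors for t in v}
    return {t: sum(v.get(t, 0.0) for v in vectors) / len(vectors) for t in terms}


def reformulate(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward relevant documents and away from nonrelevant ones."""
    rel = centroid(relevant) if relevant else {}
    non = centroid(nonrelevant) if nonrelevant else {}
    terms = set(query) | set(rel) | set(non)
    new = {t: alpha * query.get(t, 0.0) + beta * rel.get(t, 0.0)
           - gamma * non.get(t, 0.0) for t in terms}
    return {t: w for t, w in new.items() if w > 0}  # drop non-positive weights


def rank(query, docs):
    """Document ids in decreasing order of similarity with the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)


# Hypothetical data: one document judged relevant, one judged nonrelevant.
docs = {
    "rel": {"lupus": 1.0, "renal": 1.0},
    "non": {"lupus": 1.0, "septicemia": 1.0},
}
query = {"lupus": 1.0}
new_query = reformulate(query, [docs["rel"]], [docs["non"]])

print(new_query)
print(rank(new_query, docs))
```

In this example the original query is equally similar to both documents, while the reformulated query is closer to the relevant document and farther from the nonrelevant one, which is the stated intent of relevance feedback.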
`
The vector processing model also exhibits certain disadvantages, namely:

* Some model parameters, such as the query-document similarity function, are not
  derivable within the system but instead are chosen a priori by the system
  designer.

* Terms are assumed to be independent of one another.

* Term relationships are not expressible within the model.

A recent characterization of the vector space model is contained in [Wong84].
`
CLUSTERED DOCUMENT ENVIRONMENTS

A principal advantage of the vector space model for use in hypertext information retrieval
is that algorithms exist for structuring a document collection in such a manner that
similar documents are grouped together. A cluster hierarchy is represented by a tree
structure in which terminal nodes correspond to single documents and interior nodes to
groups of documents. In a hypertext system based on a clustered environment, the user
can readily focus his/her search on those groups (clusters) that are likely to contain
documents which are highly similar to the query. Additionally, the cluster hierarchy is
beneficial as a browsing tool in that it makes it easy to locate neighboring
items with related subject descriptions.

Agglomerative Cluster Hierarchy

Voorhees [Voor85] has shown that retrieval effectiveness may be enhanced in automatic
retrieval systems when a type of clustering, known as agglomerative hierarchic clustering,
is used to generate a cluster structure. In such a clustering method, each document in the
collection is considered initially to be a singleton cluster. The two closest clusters are
successively merged until only one cluster remains. The definition of closest depends on
the actual clustering method being used.
`
Fig. 1 contains an example of a hierarchy for the single link agglomerative clustering
method. In the single link method the similarity between two clusters is the maximum of
the similarities between all pairs of documents such that one document of the pair is in
one cluster and the other document is in the other cluster. It may be noted that in the
hierarchy documents may appear at any level and that clusters overlap only in the sense
that smaller clusters are nested within larger clusters.
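The merging procedure and the single link definition can be sketched together. The code below is a naive illustration rather than an efficient clustering algorithm; the three items and their pairwise similarities are invented, and the function names are hypothetical.

```python
from itertools import combinations


def single_link(c1, c2, sim):
    """Single link: maximum pairwise similarity across the two clusters."""
    return max(sim[frozenset((a, b))] for a in c1 for b in c2)


def agglomerate(items, sim, linkage=single_link):
    """Merge the two closest clusters until one remains; return the merges."""
    clusters = [frozenset([item]) for item in items]
    merges = []
    while len(clusters) > 1:
        # The closest pair is the pair with the highest linkage similarity.
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: linkage(pair[0], pair[1], sim))
        level = linkage(c1, c2, sim)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2), level))
    return merges


# Hypothetical pairwise document similarities.
sim = {
    frozenset(("A", "B")): 0.7,
    frozenset(("A", "C")): 0.4,
    frozenset(("B", "C")): 0.9,
}

for merge in agglomerate(["A", "B", "C"], sim):
    print(merge)
# (['B'], ['C'], 0.9)
# (['A'], ['B', 'C'], 0.7)
```

Passing a different `linkage` function (for example one that takes the minimum instead of the maximum) changes the definition of closest without changing the merge loop. This sketch always merges exactly two clusters per step, so a node with more than two children would appear as a sequence of merges at the same level.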
`
Each cluster in Fig. 1 is labelled with the level of association between the items under it.
The clustering level determines the association strength of the corresponding items. Thus
the similarity between items B, C and D in Fig. 1 is 0.9. On the other hand, the
similarity between item A and the cluster containing items B, C and D is only 0.7. The
level of association is a useful link semantic in a hypertext system.
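A small sketch shows how the level of association between two items can be read off such a tree: it is the level of the smallest cluster containing both. The `Node` class and `association` function are hypothetical names, and the tree encodes only the fragment described in the text (A joined at 0.7 to a cluster of B, C and D at 0.9), not the whole of Fig. 1.

```python
class Node:
    """A tree node: a leaf holds a document label, an interior node a level."""

    def __init__(self, label=None, level=None, children=()):
        self.label = label            # document identifier (leaves only)
        self.level = level            # clustering level (interior nodes only)
        self.children = list(children)

    def leaves(self):
        """Set of document labels under this node."""
        if not self.children:
            return {self.label}
        return set().union(*(child.leaves() for child in self.children))


def association(node, x, y):
    """Level of the smallest cluster under `node` containing both x and y.

    Assumes x and y are distinct documents that both lie under `node`.
    """
    for child in node.children:
        members = child.leaves()
        if child.children and x in members and y in members:
            return association(child, x, y)
    return node.level


# The fragment of the hierarchy described in the text.
bcd = Node(level=0.9, children=[Node("B"), Node("C"), Node("D")])
tree = Node(level=0.7, children=[Node("A"), bcd])

print(association(tree, "B", "C"))  # 0.9
print(association(tree, "A", "D"))  # 0.7
```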
`
Searching a Clustered Environment

To retrieve documents automatically in a clustered environment, comparisons are
generally made between the query vector and document vectors using one of the standard
measures of similarity. A cluster search simplifies the search process by limiting the
search to subsets of documents. For example, with an agglomeratively clustered tree such
as that shown in Fig. 1, a straightforward, narrow, depth-first search starts at the top of
the tree and calculates the similarity between the query and each of its children. The child
most similar to the query is selected, and the similarity between the query and each of the
non-document children of that node is calculated. The process is repeated until either all
the similarities between the query and the non-document children of some node are less
than that between the query and the node itself, or all the children of that node are
document nodes. The documents comprising the cluster represented by that node are
returned. The search may be broadened by considering more than one path at each level.
The broadest search considers all paths and abandons them as they fail certain criteria.
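The narrow top-down search can be sketched as follows. Several details are assumptions made for the example and are not specified in the text: clusters are represented by the mean (centroid) of their document vectors, similarity is the cosine measure, and the root is treated like any other node. The tree and vectors are invented.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * \
        math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


class Cluster:
    """A document (leaf with a vector) or a cluster (node with children)."""

    def __init__(self, children=(), doc_id=None, vector=None):
        self.children = list(children)
        self.doc_id = doc_id
        self.vector = vector

    def is_document(self):
        return not self.children

    def documents(self):
        if self.is_document():
            return [self]
        return [d for child in self.children for d in child.documents()]

    def centroid(self):
        """Mean of the document vectors under this node."""
        if self.is_document():
            return self.vector
        docs = self.documents()
        terms = {t for d in docs for t in d.vector}
        return {t: sum(d.vector.get(t, 0.0) for d in docs) / len(docs)
                for t in terms}


def top_down_search(root, query, sim=cosine):
    """Follow the most similar non-document child until no child improves."""
    node = root
    while True:
        clusters = [c for c in node.children if not c.is_document()]
        if not clusters:                       # all children are documents
            break
        best = max(clusters, key=lambda c: sim(query, c.centroid()))
        if sim(query, best.centroid()) < sim(query, node.centroid()):
            break                              # every child cluster is worse
        node = best
    return sorted(d.doc_id for d in node.documents())


def doc(doc_id, vector):
    return Cluster(doc_id=doc_id, vector=vector)


# Hypothetical tree: root -> [c1 -> [A, c2 -> [B, C]], c3 -> [F, G]].
c2 = Cluster([doc("B", {"x": 1.0}), doc("C", {"x": 1.0, "y": 0.2})])
c1 = Cluster([doc("A", {"y": 1.0}), c2])
c3 = Cluster([doc("F", {"z": 1.0}), doc("G", {"z": 1.0, "y": 0.1})])
root = Cluster([c1, c3])

print(top_down_search(root, {"x": 1.0}))  # ['B', 'C']
print(top_down_search(root, {"z": 1.0}))  # ['F', 'G']
```

A broader search would keep several candidate nodes at each level instead of a single `best`.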
`
`
`
Fig. 1 A sample single link hierarchy
`
A bottom-up search may also be performed on such a tree. The cluster at the lowest level
of the tree whose centroid is most similar to the query is chosen as the node at which the
search will start. The search continues up the tree until the similarity between the query
and the parent of the current node is smaller than the similarity between the query and the
current node. The documents contained in the cluster corresponding to the current node are
returned. The bottom-up search is often more effective due to the uncertainty involved at
high levels of the hierarchy. [Crof80]
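The bottom-up search can be sketched in the same spirit. The assumptions here are that the lowest-level clusters are those whose children are all documents, that centroids are means of document vectors, and that similarity is the cosine measure; the hierarchy, vectors and function names are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * \
        math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical hierarchy: cluster name -> children (clusters or documents).
children = {
    "root": ["c1", "c3"],
    "c1": ["A", "c2"],
    "c2": ["B", "C"],
    "c3": ["F", "G"],
}
doc_vectors = {
    "A": {"y": 1.0},
    "B": {"x": 1.0},
    "C": {"x": 1.0, "y": 0.2},
    "F": {"z": 1.0},
    "G": {"z": 1.0, "y": 0.1},
}
parent = {child: p for p, kids in children.items() for child in kids}


def documents(node):
    """Document ids under a node (a document is its own only member)."""
    if node in doc_vectors:
        return [node]
    return [d for kid in children[node] for d in documents(kid)]


def cluster_centroid(node):
    """Mean of the document vectors under a node."""
    vectors = [doc_vectors[d] for d in documents(node)]
    terms = {t for v in vectors for t in v}
    return {t: sum(v.get(t, 0.0) for v in vectors) / len(vectors) for t in terms}


def bottom_up_search(query):
    """Start at the best lowest-level cluster and climb while the parent is no worse."""
    lowest = [c for c, kids in children.items()
              if all(kid in doc_vectors for kid in kids)]
    node = max(lowest, key=lambda c: cosine(query, cluster_centroid(c)))
    while node in parent:
        up = parent[node]
        if cosine(query, cluster_centroid(up)) < cosine(query, cluster_centroid(node)):
            break
        node = up
    return sorted(documents(node))


print(bottom_up_search({"x": 1.0}))  # ['B', 'C']
print(bottom_up_search({"y": 1.0}))  # ['A', 'B', 'C']
```

In the second query the search starts in the cluster containing B and C, then climbs one level because the parent cluster, which also contains A, matches the query better.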
`
Cluster hierarchies have been used effectively in automatic searches. Such hierarchies are
also useful in performing searches based on browsing operations. These types of
operations, we believe, can produce significant improvement in retrieval performance.
Automatic cluster searches are highly structured; the next link in the search path is
determined solely on the basis of the similarity between the query vector and the vector
representation of the node being evaluated. By displaying suitable portions of the
hierarchy during the course of the search operations and letting the user choose appropriate
search paths at each point, the output obtained should be superior to that obtained by
automatic cluster searching. For example, in a hypertext system with an interactive
browser, following evaluation of items B, C, and D in the sample tree of Fig. 1, the user
has the choice of exploring either a tightly clustered structure containing items F and G
(which are very similar to each other with a similarity value of 0.8) or of staying in the
same cluster and evaluating item A (at a lower similarity level of 0.7). In contrast, the
control mechanism of the automatic search procedure may terminate the search at the node
labelled 0.7 and never evaluate the cluster containing items F and G. The effectiveness of
this type of user-directed, interactive browsing is determined by comparing the results of
such interactive searches to those obtained by automatic cluster searches.
`
THE INTERACTIVE BROWSER

A browser incorporating the cluster hierarchy as its primary network structure was
implemented on a Macintosh IIx computer using HyperCard. The Macintosh is connected
via a local area network to a SUN System on which the SMART information retrieval
system [Salt71] resides. The SMART system provides packages for textual analysis,
clustering, performance evaluation, etc.
`
To conduct a search using the browser, a user initially specifies a natural language query
which is subsequently transformed into a term vector representation via the SMART
retrieval system. The hypertext system then displays a window containing the original
query and its corresponding term vector (Fig. 2). As suggested by the annotated display of
Fig. 3, a user may obtain the word stem associated with each concept in the term vector
as well as the document frequency of that term by clicking on a concept number in the
vector. At any point during the search process, the user may add or delete concepts from
the query vector representation or completely re-specify the query itself. The query
window also contains a list of identifiers representing the documents which the user has
determined to be relevant to the query. Initially this list is empty; however, as the user
conducts the search process, he/she enters documents into the list.
`
To begin (or continue) a browse in the clustered environment, the user clicks on the
Use Query button. The interface presents a display of the clustered document space
represented as a complete link hierarchy. A user may begin an exploration of the cluster
tree at any point, for example, at the root node for a top-down approach or at a leaf node
(document) for a bottom-up approach. The user may prefer to initiate a search at an
interior node (a cluster) which contains one or more documents known to be similar to
the query.

In general, a tree representation of a clustered collection is too large to be displayed in its
entirety. Therefore, a user is presented with two views of the cluster tree simultaneously:
a local view containing the subtree within which the user is currently browsing (see
Fig. 4) and a global view, a more comprehensive view of the tree containing a
significantly larger number of nodes than the local view (see Fig. 5). A user-directed
traversal among the nodes is simultaneously reflected in both displays. The global view
permits the user to observe where the search is being conducted in relation to the entire
tree while the local view provides the user with more detailed information about a specific
subtree.
`
As may be noted, many links and informational items are provided by the interface
system to aid the user during the browsing process. As suggested by Fig. 4, the local
view of the tree:

* Uses different iconic representations to distinguish interior nodes (clusters) from
  leaf nodes (documents).

* Displays for each interior node the level at which the documents cluster. The
  clustering level represents the degree of association between the items under it.

* Lists the number of documents contained within the subtree defined by each node
  as well as the number of children of that node. This information can also be
  obtained by counting the nodes in the global view of the tree.

* Lists the value of the correlation measure of the query vector with either the
  centroid vector or the document vector associated with each node in the subtree.
  During the search process the user may change the correlation measure being
  calculated by means of the Correlation Measure pop-up menu. At present, the
  system provides a choice of several measures including vector product, inner
  product, Tanimoto, cosine and overlap.

* Provides a listing of the concepts contained within the query vector (see also
  Fig. 6). This information is also displayed in the query window; however, in the
  tree display, the concepts in the query are displayed in ascending order of
  document frequency. The user may alter the query by adding or deleting concepts
  from the query vector during the search process without returning to the query
  window.

* Uses different iconic representations to distinguish relevant documents from the
  other documents in the tree. A list of the documents which the user has chosen
  as relevant to the query is maintained in the display. The user may freely insert
  document identifiers into and delete items from this list. The icons of the
  documents in this list are then highlighted in the tree representation.

* Lists document identifiers represented by the leaf nodes of the tree.
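Several of the correlation measures named in the list above have common textbook forms, sketched below for dense vectors of non-negative weights. These formulas are assumptions: the text does not give the definitions used by the browser or by SMART, and the `MEASURES` table is only a hypothetical stand-in for the pop-up menu.

```python
import math


def inner(a, b):
    """Inner product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Inner product normalized by the two vector lengths."""
    denom = math.sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom else 0.0


def tanimoto(a, b):
    """Inner product divided by |a|^2 + |b|^2 minus the inner product."""
    denom = inner(a, a) + inner(b, b) - inner(a, b)
    return inner(a, b) / denom if denom else 0.0


def overlap(a, b):
    """Sum of component-wise minima divided by the smaller weight total."""
    denom = min(sum(a), sum(b))
    return sum(min(x, y) for x, y in zip(a, b)) / denom if denom else 0.0


# Hypothetical stand-in for the pop-up menu: measure name -> function.
MEASURES = {
    "inner product": inner,
    "cosine": cosine,
    "Tanimoto": tanimoto,
    "overlap": overlap,
}

query = [1.0, 1.0, 0.0]
node = [1.0, 0.0, 1.0]
for name, measure in MEASURES.items():
    print(name, round(measure(query, node), 4))
# inner product 1.0
# cosine 0.5
# Tanimoto 0.3333
# overlap 0.5
```

Switching the measure changes the values shown at each node but not the tree itself, since the hierarchy was built before the query was posed.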
`
`
[Fig. 2 (query window). Query 12: "Effect of azathioprine on systemic lupus erythematosus,
particularly in regard to renal lesions". Panel labels: Concepts, Weight, Relevant Docs.]

[Fig. 3 (annotated query window, same query). Annotation: Clicking on a concept number
reveals the word forming that concept and the document frequency of the concept. It also
outlines the first occurrence of that word in the query.]

Fig. 3 Clicking on a concept number
`
[Fig. 4 and Fig. 5 (tree displays). Recoverable labels: Interior node with one concept
number in common; Correlation Measure; List of concept numbers used; List of relevant
documents; A document with no concept numbers in common; A document with one concept
number in common.]

Fig. 5 The overview of the tree
`
`
One can obtain additional information from the local tree display by clicking on various
items contained within the display. For example, by option-clicking on a node which has
terms in common with a query, the selected node's icon is replaced with an informational
window listing the terms in common and the weights associated with these terms
(Fig. 7). Choosing a concept number in the query vector results in a display of the
textual description of the chosen term and its document frequency (Fig. 6). Clicking on a
terminal node (a document icon) results in the display of additional information associated
with the document. An example of a document window is contained in Fig. 8. One can
also review the query or enter a new query by selecting the To Query button. It should be
noted that each of the display windows has informational features associated with it
which support the visual search process.

The global view of the tree (Fig. 5) has few items associated with a node, since the
purpose of this display is to assist the user in locating his/her position within the tree
during the navigation of the cluster space. The three types of icons used in the global
view distinguish the following types of nodes:

* A document classified as relevant by the user. As previously noted, identifiers of
  the relevant documents are highlighted in the local view.

* A document which has not been classified as relevant.

* The node corresponding to the central node in the local view of the tree.

These nodes are color coded to facilitate a quick review of the tree at large.
`
`
`
Fig. 6 Finding the term for a concept number in the browser

[Fig. 7 annotation: Option-clicking on a document or node will list the common concept
number (and its weight).]

Fig. 7 Getting common concept number information

[Fig. 8 labels: Id of Document; Concept; Concept Weights; Text of the Abstract.]

Fig. 8 Document card and its parts
`
The user may view (move to) other portions of the tree not currently in the local or
global windows in one of several ways:

* By scrolling up or down or left or right in the global display window. The
  window moves across the tree in the global view in the direction represented by
  the scrolling action of the mouse.

* By clicking on a node in the local view. In this case, the chosen node becomes
  the central node in the local view. Such an action effectively moves the local
  view up or down one level of the tree.

* By clicking on the To Root button in the local view. This action causes the
  interface system to redraw the local view of the tree with the central node
  becoming the root node.

* By clicking on the To Doc button in the local view. The To Doc button
  allows the user to view the document window associated with any document in
  the collection. To obtain a document window, the user must specify the
  document's identifier. Clicking on the To Tree button in the document
  window returns the user to a local view of the tree with its central node
  corresponding to the parent node of the document contained in the document
  window.
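The navigation actions listed above amount to changing which node is central in the local view. The following sketch models that state only; it is not the HyperCard implementation. The `Browser` class, its method names, the assumption that the local view shows the central node with its parent and children, and the sample hierarchy are all hypothetical.

```python
class Browser:
    """Tracks the central node of the local view over a cluster tree."""

    def __init__(self, children, root):
        self.children = children      # node -> list of child nodes
        self.root = root
        self.parent = {kid: p for p, kids in children.items() for kid in kids}
        self.central = root

    def local_view(self):
        """The central node with its parent and children (assumed layout)."""
        node = self.central
        return {
            "parent": self.parent.get(node),
            "central": node,
            "children": self.children.get(node, []),
        }

    def click(self, node):
        """Clicking a visible node makes it the central node."""
        view = self.local_view()
        if node != view["parent"] and node not in view["children"]:
            raise ValueError(f"{node!r} is not shown in the local view")
        self.central = node

    def to_root(self):
        """The To Root action: recentre the local view on the root."""
        self.central = self.root

    def to_tree(self, doc_id):
        """The To Tree action: recentre on the parent of a document."""
        self.central = self.parent[doc_id]


# Hypothetical hierarchy: clusters c1..c3 and documents A, B, C, F, G.
children = {
    "root": ["c1", "c3"],
    "c1": ["A", "c2"],
    "c2": ["B", "C"],
    "c3": ["F", "G"],
}

browser = Browser(children, "root")
browser.click("c1")          # move down one level
print(browser.local_view())
browser.to_tree("G")         # jump to the cluster holding document G
print(browser.central)       # c3
browser.to_root()
print(browser.central)       # root
```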
`
The tree displays described above and the informational windows associated with items
contained in the displays are effective for representing a local cluster arrangement for a
small collection or a local area of a larger collection.
`
EVALUATION OF THE BROWSER

The SMART information retrieval system provides a general framework for conducting
retrieval experiments. SMART has fully automatic iterative search methods with
automatic relevance document classification. The means exist within the system for
evaluating the effectiveness of the retrieval process; the effectiveness of any interactive
system can be established by comparison with automatic search procedures contained in
the SMART system. SMART also provides collections of documents and corresponding
sets of queries which may be used for experimentation. Relevance assessments have been
produced by persons knowledgeable of the subject matter in the collections.

In order to develop a general search strategy which a user may employ in the hypertext
retrieval system, we focused on the MEDLARS collection, a somewhat homogeneous
collection generated by the National Library of Medicine. A user-controlled cluster search
technique which performs well in a homogeneous collection will perform as well or
generally significantly better in a heterogeneous collection. MEDLARS consists of 1,033
documents in the medical field and a corresponding set of 30 queries. The document
vectors were generated from an analysis of the abstracts of the documents. The
MEDLARS collection was then clustered using a complete link clustering algorithm,
resulting in a very wide tree. The cluster hierarchy contains 76 subtrees at the root node,
and the maximum depth of the tree is ten.

To assist in the development of a method for conducting a visual interactive search of a
clustered collection, we divided the MEDLARS query collection into two subsets. One set
of queries (the base set) was used to aid in the development of the methodology; the other
set was used to estimate the performance of the interactive search process.

We performed an interactive search in the hypertext system for each query in the base set
to determine the optimal search a user would follow in order to retrieve the known
relevant documents of the query. By conducting this process for each of the 15 queries in
the base set, we were able to observe and define the common threads linking relevant
documents to the cluster tree. Our observations and the resulting search method that
evolved are reported in [Andr89].
`
`
An important point about both phases of the experiment is that the actual text of the
document (as shown in the document cards) was never examined during the interactive
search process to determine document relevance. Doing so would of course have
substantially improved retrieval performance in an actual user-controlled search process.
However, one of our objectives when conducting this experiment was to apply some of
the insight gained from the interactive visual search to an automated system. An
automated search system does not consult actual text during the search process; it uses
only a vector representation of the document. We are now performing extensive testing
with larger collections which does not place such a severe constraint on a user.

Once we had developed some search guidelines, the fifteen remaining queries in the
MEDLARS collection were processed using the developed search procedures. The list of
relevant documents was of course empty at the beginning of each search process.
The companion global tree viewing program was not needed in a search, since a frequent
user of the browser system has little problem with navigation of the search tree; novice
users would certainly want to use the companion program, however. For each query, the
query text was initially inspected prior to the navigation, and the resulting query vector
edited as needed. Depending on the intermediate results obtained and the general feedback
gained from the browsing process, the query vector was often modified to produce
additional relevant documents.

On average, we were able to retrieve 55% of the relevant documents for the queries in
the test set. The automatic retrieval system had a recall value of only 32%. Thus, even
without taking full advantage of the information linked to a node (namely, the document
abstract itself), use of the interactive browser yielded a significant improvement over
automatic cluster searches. Additionally, use of the hypertext system resulted in the return
of slightly fewer irrelevant documents; with the browser, 25% of the documents found
were irrelevant compared to 28% in the automatic system.
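The two figures reported above correspond to the standard recall and precision measures: recall is the fraction of the relevant documents that were retrieved, and the fraction of retrieved documents that were irrelevant is one minus precision. The sketch below computes both for a single query. The document identifiers are invented and are not the experimental data.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


# Hypothetical judgments for one query.
relevant = {1, 2, 3, 4}
retrieved = {1, 2, 9}

print(recall(retrieved, relevant))                     # 0.5
print(round(precision(retrieved, relevant), 4))        # 0.6667
print(round(1 - precision(retrieved, relevant), 4))    # 0.3333, share irrelevant
```

Averaging these per-query values over a query set gives collection-level figures of the kind quoted in the text.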
`
CONCLUSION

Our immediate objective in this work was to produce a retrieval system that allows easy
and accurate searching and browsing of a document collection. The representation of a
collection as a cluster hierarchy was shown to provide a solid basis on which to build a
hypertext retrieval system. The interactive browser is believed to be sufficiently
comprehensive and flexible to support a variety of experiments designed to
evaluate the effects of user control, user intervention, and the visual analysis of graphical
data representation on retrieval performance during the cluster search process. The dynamic
nature of the HyperCard environment on which the browser is based is well suited to meet
the need for flexibility required for these tasks.
`
REFERENCES

[Fris88] M. E. Frisse, Searching for Information in a Hypertext Medical Handbook.
    Communications of the ACM, 31:7 (1988), pp. 880-886.

[Clev84] C. W. Cleverdon, Optimizing Convenient On-Line Access to Bibliographic
    Databases. Information Service Use, 4 (1984), pp. 37-47.

[Crou89] D. B. Crouch and R. Korfhage, The Use of Visual Representations in
    Information Retrieval Applications. In Visual Languages and Applications,
    R. Korfhage (ed.), Pergamon Press, New York (1989).

[Smit88] J. B. Smith and S. F. Weiss, Hypertext. Communications of the ACM,
    31:7 (1988), pp. 816-819.

[Salt86] G. Salton, Another Look at Automatic Text-Retrieval Systems.
    Communications of the ACM, 29:7 (1986), pp. 648-656.

[Fox86] E. A. Fox, Information Retrieval: Research into New Capabilities. In CD
    ROM, S. Lambert and S. Ropiequet (eds.), Microsoft Press, Redmond,
    Washington (1986), pp. 143-174.

[Colv86] G. Colvin, The Current State of Text Retrieval. In CD ROM, S. Lambert
    and S. Ropiequet (eds.), Microsoft Press, Redmond, Washington (1986), pp.
    131-136.

[Rijs79] C. J. Van Rijsbergen, Information Retrieval, Second Edition. Butterworths,
    London (1979).

[Salt83] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval.
    McGraw-Hill Book Company, New York (1983).

[Wong84] S. K. M. Wong and V. V. Raghavan, Vector Space Model of Information
    Retrieval;

[Voor85]

[Crof80]

[Salt71]

[Andr89]
