The Use of Cluster Hierarchies in Hypertext Information Retrieval*
`
`Donald B. Crouch, Carolyn J. Crouch and Glenn Andreas
`
`Department of Computer Science
`University of Minnesota - Duluth
`320 Heller Hall
`Duluth, Minnesota 55812
`
`ABSTRACT
`
The graph-traversal approach to hypertext information retrieval is a conceptualization of
hypertext in which the structural aspects of the nodes are emphasized. A user navigates
through such hypertext systems by evaluating the semantics associated with links
between nodes as well as the information contained in nodes. [Fris88] In this paper we
`describe an hierarchical structure which effectively supports the graphical traversal of a
`document collection in a hypertext system. We provide an overview of an interactive
`browser based on cluster hierarchies. Initial results obtained from the use of the browser
`in an experimental hypertext retrieval system are presented.
`
`INTRODUCTION
`
Information retrieval is concerned with the representation, storage and retrieval of
`documents or document surrogates. Information retrieval activities are routinely conducted
`on-line under the control of search intermediaries or end users who have been trained to
`use somewhat complex user-system interfaces. However, poor query formulations and
`inadequate user-system interaction still occur even with skilled users. For example,
`Cleverdon has noted that “if two search intermediaries search the same question on the
`same database on the same host, only 40 percent of the output may be common to both
`searches.” [Clev84]
`
`What is being done to aid users of information retrieval systems? The most common
`approaches are generally directed either toward the development of aids based on
`sophisticated user interfaces or toward the development of expert system techniques for the
`more complex operations of text retrieval systems. [Crou89] Research involving
`sophisticated user interfaces is primarily concerned with system functioning and
`convenience as it relates to the user; its goal is to facilitate the use of the system by
`providing computerized aids previously available only to the search intermediary in non-
`computerized forms. Among the facilities normally included in systems of this type are
`vocabulary displays, thesaurus expansion of vocabulary items designed to add related
`terms to already existing search words, the construction and storage of search protocols,
operations with previously formulated queries, etc. While this type of research is
`warranted and its results encouraging, it has not necessarily produced more effective
`retrieval but instead has generated tools for effortless learning and use of an information
`retrieval system.
`
`The other major area of research in the development of user aids for information retrieval
`is concerned with the design of expert systems that facilitate access to the stored
`
`*This work was supported by the National Science Foundation under grant IRI 87-02735.
`
`Hypertext
`
`‘89 Proceedings
`
`225
`
`November 1989
`
`IPR2019-01304
`BloomReach, Inc. EX1011 Page 1
`
`

`

`collections. The goal of such research is to capture the expertise of search intermediaries
`in formulating Boolean queries and in dealing with other types of retrieval services. The
`expert approach is based on the use of domain-specific knowledge that covers the topic
`areas represented by the collection, a language analyzer that can understand natural
`language queries and translate them into appropriate internal forms, and rules for search
`formulation and search strategy designed to choose search methods based on user criteria.
`It may eventually become feasible to generate search formulation criteria in the form of
`rules that do in fact reflect the expert knowledge of trained search intermediaries. However,
`for the time being, one has reason to be careful in accepting many of the currently
`unevaluated design proposals for expert system approaches as effective solutions to the
`retrieval problem.
`
`We submit that a viable alternative to using either very sophisticated user interfaces or
`expert systems as a solution to the retrieval problem consists of using only simple user-
`system interactions which enhance the effectiveness of retrieval operations through the
`addition of properly designed user friendly features. These features allow the user to
`function in an active role, replacing the full natural language comprehension which is
`desirable yet currently unavailable in an automatic search expert.
`
`This approach to interface design is inherent in the concept of hypertext information
`retrieval. Hypertext supports a user’s exploration of informational data items by
`representing data as a network of nodes containing text, graphics and other forms of
`information.
`[Smit88] A user may navigate through the hypertext system by following
`the links between nodes. The path a user follows is determined by his/her analysis of the
`information contained within the nodes and the semantics associated with links between
the nodes. [Fris88]
`
In hypertext information retrieval, each node is generally assumed to be a single
`document. Links exist which connect each document to other documents having keywords
`in common with it; the semantics of the links between nodes are keywords (document
`index terms) or some descriptive information representing the connected documents. In
`this paper we introduce an hierarchical structure which provides additional semantic
`information within and between nodes. This structure seems particularly well suited to the
`user’s exploration of a document collection in a visual context. The user may browse
`among the data items by analyzing a graphical display of the structure itself as well as the
`semantic links between nodes.
`
`In the next two sections, we briefly describe the retrieval model and the characteristics of
`the hierarchy on which our structure is based. We then describe a prototype of a hypertext
retrieval system utilizing the cluster hierarchy and present the initial results of an
`experiment comparing retrieval performance of the hypertext system with that of an
`automatic retrieval system.
`
`INFORMATION RETRIEVAL MODELS
`
`The most common information retrieval models are the Boolean retrieval model and the
`vector space model. These two models are briefly described and their use in conventional
`information retrieval systems examined.
`
`Boolean Retrieval Model
`
`Most retrieval systems are based on the Boolean model. Queries are expressed as a set of
`terms connected by the Boolean operators and, or and not. Such systems retrieve
information by performing the Boolean operations on the corresponding sets of
documents containing the query terms. Although the Boolean model can be used
effectively in automatic text retrieval (in fact, a query can be formulated to retrieve any
particular subset of items), imprecise or broad requests utilizing the or relation can result
`in the retrieval of large numbers of irrelevant texts while narrow or overly precise queries
utilizing the and relation can exclude many relevant items. In practice a compromise is
often obtained by the use of a query formulation that is neither too broad nor too narrow.
[Salt86]
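
As an illustration only, the set-based evaluation of Boolean queries described above may be sketched as follows. The toy collection, the nested-tuple query form and all function names are assumptions introduced here; they are not taken from any system discussed in this paper.

```python
# Hypothetical toy collection: document id -> set of index terms.
DOCS = {
    1: {"lupus", "renal", "azathioprine"},
    2: {"lupus", "skin"},
    3: {"renal", "transplant"},
    4: {"azathioprine", "transplant"},
}

def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

def evaluate(query, index, all_ids):
    """Evaluate a Boolean query by set operations on the inverted index.

    A query is a term (str) or a tuple of the assumed form
    ("and", q1, q2), ("or", q1, q2) or ("not", q).
    """
    if isinstance(query, str):
        return set(index.get(query, set()))
    op = query[0]
    if op == "and":
        return evaluate(query[1], index, all_ids) & evaluate(query[2], index, all_ids)
    if op == "or":
        return evaluate(query[1], index, all_ids) | evaluate(query[2], index, all_ids)
    if op == "not":
        return all_ids - evaluate(query[1], index, all_ids)
    raise ValueError(f"unknown operator: {op!r}")

INDEX = build_inverted_index(DOCS)
ALL_IDS = set(DOCS)

# A broad request built with "or" and a narrow one built with "and".
broad = evaluate(("or", "lupus", "renal"), INDEX, ALL_IDS)
narrow = evaluate(("and", "lupus", ("and", "renal", "azathioprine")), INDEX, ALL_IDS)
print(sorted(broad))   # [1, 2, 3]
print(sorted(narrow))  # [1]
```

Even in this small example the or request returns three of the four documents while the and request returns one, and neither result carries any ranking.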
`
Although the Boolean model has been widely accepted, it does have its problems:
• Boolean queries are difficult to construct; intermediaries are generally required to
  add terms not originally included, provide synonyms or alternate spellings, drop
  high-frequency terms, etc. [Fox86],
• Boolean systems generally do not provide for the assignment of term weights,
• the size of the subset of documents to be returned is difficult to control, and
• the retrieved documents are usually presented in a random order (no ranking based
  on an estimate of the query-document relevance is provided).

The difficulties associated with the construction of Boolean queries are well known. One
author recently commented that “research and development in information retrieval since
the 1950’s has concentrated on methods which can provide better retrieval without the
need for Boolean queries.” [Colv86]
`
`Vector Space Model
`
The vector space model is conceptually the simplest retrieval model and is better suited
`for use in hypertext retrieval systems than the Boolean model. In the vector space model,
`the content of each document or query is represented by a set of possibly weighted content
`terms (i.e., some form of content identifier, such as a word extracted from the document
`text, a word phrase, or concept class chosen from a thesaurus). A term’s weight reflects
`its importance in relation to the meaning of the document or query. Each informational
`item (document) may then be considered a term vector, and the complete document
`collection becomes a vector space whose dimension is equal to the number of distinct
terms used to identify the documents in the collection. [Rijs79, Salt83]
`
`In the vector space model, it is assumed that similar or related documents or similar
documents and queries are represented by similar multidimensional term vectors.
`Similarity is then generally defined as a function of the magnitudes of the matching terms
`in the respective vectors.
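
A minimal sketch of this representation is given below, assuming sparse term vectors stored as term-to-weight mappings and the cosine as the similarity function. The text above does not prescribe a particular function, and the vectors and names are hypothetical.

```python
import math

def inner_product(x, y):
    """Sum of weight products over the terms present in both vectors."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the shorter vector
    return sum(w * y[t] for t, w in x.items() if t in y)

def cosine(x, y):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    norm = math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    return inner_product(x, y) / norm if norm else 0.0

# Hypothetical weighted term vectors (term -> weight).
query = {"lupus": 1.0, "renal": 1.0}
documents = {
    "d1": {"lupus": 2.0, "renal": 1.0},
    "d2": {"lupus": 1.0, "skin": 1.0},
    "d3": {"transplant": 3.0},
}

# Rank the documents in decreasing order of their similarity with the query.
ranked = sorted(documents, key=lambda d: cosine(query, documents[d]), reverse=True)
print(ranked)  # ['d1', 'd2', 'd3']
```

Only the matching terms contribute to the numerator, so a document sharing no term with the query (d3 here) receives a similarity of zero.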
`
A vector representation of documents and queries facilitates certain retrieval operations,
namely:
• The construction of a clustered document file (consisting of classes of documents
  such that documents within a given class are substantially similar to each other).
  In clustered collections, an automatic search can be limited to the documents
  within those clusters whose class vector representations are similar to the query
  vector.
• The ranking of retrieved documents in decreasing order of their similarity with
  the query.
• The automatic reformulation of the query based on relevance assessments
  supplied by the user for previously retrieved documents. The intent of relevance
  feedback is to produce a modified query whose similarity to the relevant
  documents is greater than that of the original query while its similarity to the
  nonrelevant items is smaller.
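
The query reformulation named in the last item can be illustrated with a Rocchio-style update. The paper does not say which feedback formula is intended, so the update rule, the coefficients alpha, beta and gamma, and the sample vectors below are all assumptions made for this sketch.

```python
import math

def cosine(x, y):
    """Cosine similarity of two sparse term vectors (term -> weight)."""
    dot = sum(w * y.get(t, 0.0) for t, w in x.items())
    norm = math.sqrt(sum(w * w for w in x.values())) * math.sqrt(sum(w * w for w in y.values()))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Average of a non-empty list of sparse term vectors."""
    result = {}
    for vector in vectors:
        for term, weight in vector.items():
            result[term] = result.get(term, 0.0) + weight / len(vectors)
    return result

def reformulate(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward the relevant documents and away from the nonrelevant ones.

    The coefficient values are illustrative defaults, not values from the paper.
    """
    new_query = {term: alpha * weight for term, weight in query.items()}
    for vectors, scale in ((relevant, beta), (nonrelevant, -gamma)):
        if vectors:
            for term, weight in centroid(vectors).items():
                new_query[term] = new_query.get(term, 0.0) + scale * weight
    # Terms whose weight is no longer positive are dropped.
    return {t: w for t, w in new_query.items() if w > 0.0}

# Hypothetical query and user judgements.
query = {"lupus": 1.0, "therapy": 1.0}
relevant = [{"lupus": 1.0, "azathioprine": 1.0, "renal": 1.0}]
nonrelevant = [{"therapy": 1.0, "radiation": 1.0}]

modified = reformulate(query, relevant, nonrelevant)
print(cosine(query, relevant[0]) < cosine(modified, relevant[0]))        # True
print(cosine(query, nonrelevant[0]) > cosine(modified, nonrelevant[0]))  # True
```

For these sample vectors the modified query is closer to the relevant document and farther from the nonrelevant one than the original query, which is the stated intent of relevance feedback.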
`
The vector processing model also exhibits certain disadvantages, namely:
• Some model parameters, such as the query-document similarity function, are not
  derivable within the system but instead are chosen a priori by the system
  designer.
• Terms are assumed to be independent of one another.
• Term relationships are not expressible within the model.

A recent characterization of the vector space model is contained in [Wong84].
`
`CLUSTERED DOCUMENT ENVIRONMENTS
`
A principal advantage of the vector space model for use in hypertext information retrieval
is that algorithms exist for structuring a document collection in such a manner that
similar documents are grouped together. A cluster hierarchy is represented by a tree
structure in which terminal nodes correspond to single documents and interior nodes to
groups of documents. In a hypertext system based on a clustered environment, the user
can readily focus his/her search on those groups (clusters) that are likely to contain
documents which are highly similar to the query. Additionally, the cluster hierarchy is
beneficial as a browsing tool in that it makes it easy to locate neighboring
items with related subject descriptions.
`
`Agglomerative Cluster Hierarchy
`
Voorhees [Voor85] has shown that retrieval effectiveness may be enhanced in automatic
retrieval systems when a type of clustering, known as agglomerative hierarchic clustering,
is used to generate a cluster structure. In such a clustering method, each document in the
collection is considered initially to be a singleton cluster. The two closest clusters are
successively merged until only one cluster remains. The definition of closest depends on
the actual clustering method being used.
`
Fig. 1 contains an example of a hierarchy for the single link agglomerative clustering
method. In the single link method the similarity between two clusters is the maximum of
the similarities between all pairs of documents such that one document of the pair is in
one cluster and the other document is in the other cluster. It may be noted that in the
hierarchy documents may appear at any level and that clusters overlap only in the sense
that smaller clusters are nested within larger clusters.
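
The merging procedure and the single link criterion can be sketched as follows. This is a naive illustration that recomputes every cluster pair at each step; the pairwise similarities are invented for the example and are not the data behind Fig. 1, and the dictionary layout of a cluster is an assumption.

```python
from itertools import combinations

def single_link(sim, a, b):
    """Cluster similarity: the maximum similarity over all cross-cluster document pairs."""
    return max(sim[frozenset((x, y))] for x in a["docs"] for y in b["docs"])

def agglomerate(doc_ids, sim):
    """Merge the two closest clusters repeatedly until one cluster remains."""
    # Every document starts as a singleton cluster.
    clusters = [{"docs": (d,), "level": None, "children": []} for d in doc_ids]
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: single_link(sim, *pair))
        merged = {
            "docs": a["docs"] + b["docs"],
            "level": single_link(sim, a, b),  # level of association of the merge
            "children": [a, b],
        }
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(merged)
    return clusters[0]

# Hypothetical pairwise document similarities.
SIM = {
    frozenset(("A", "B")): 0.7, frozenset(("A", "C")): 0.5, frozenset(("A", "D")): 0.4,
    frozenset(("B", "C")): 0.9, frozenset(("B", "D")): 0.6, frozenset(("C", "D")): 0.85,
}

root = agglomerate(["A", "B", "C", "D"], SIM)
print(root["level"])                 # 0.7
print(root["children"][1]["level"])  # 0.85
```

Here B and C merge first at 0.9, D joins them at 0.85 through its link to C, and A joins last at 0.7, so the nesting of smaller clusters within larger ones falls out of the merge order.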
`
`Each cluster in Fig. 1 is labelled with the level of association between the items under it.
`The clustering level determines the association strength of the corresponding items. Thus
the similarity between items B, C and D in Fig. 1 is 0.9. On the other hand, the
`similarity between item A and the cluster containing items B, C and D is only 0.7. The
`level of association is a useful link semantic in a hypertext system.
`
`Searching a Clustered Environment
`
To retrieve documents automatically in a clustered environment, comparisons are
generally made between the query vector and document vectors using one of the standard
measures of similarity. A cluster search simplifies the search process by limiting the
search to subsets of documents. For example, with an agglomeratively clustered tree such
`as that shown in Fig. 1, a straightforward, narrow, depth-first search starts at the top of
`the tree and calculates the similarity between the query and each of its children. The child
`most similar to the query is selected, and the similarity between the query and each of the
`non-document children of that node is calculated. The process is repeated until either all
`the similarities between the query and the non-document children of some node are less
`than that between the query and the node itself, or all the children of that node are
`document nodes. The documents comprising the cluster represented by that node are
`returned. The search may be broadened by considering more than one path at each level.
`The broadest search considers all paths and abandons them as they fail certain criteria.
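
A minimal sketch of the narrow, depth-first search described above follows. It assumes that each cluster is represented by the centroid (mean) of its document vectors and that the cosine is the similarity measure; the node layout, the sample vectors and the function names are hypothetical.

```python
import math
from dataclasses import dataclass, field

def cosine(x, y):
    """Cosine similarity of two sparse term vectors (term -> weight)."""
    dot = sum(w * y.get(t, 0.0) for t, w in x.items())
    norm = math.sqrt(sum(w * w for w in x.values()) * sum(w * w for w in y.values()))
    return dot / norm if norm else 0.0

@dataclass
class Node:
    vector: dict       # document vector (leaf) or centroid (cluster)
    doc_id: str = ""   # set only for document (leaf) nodes
    children: list = field(default_factory=list)

def documents(node):
    """Ids of all documents in the subtree rooted at node, left to right."""
    if not node.children:
        return [node.doc_id]
    return [d for child in node.children for d in documents(child)]

def make_cluster(children, vectors):
    """Interior node whose vector is the mean of the document vectors below it."""
    node = Node(vector={}, children=list(children))
    ids = documents(node)
    for doc_id in ids:
        for term, weight in vectors[doc_id].items():
            node.vector[term] = node.vector.get(term, 0.0) + weight / len(ids)
    return node

def top_down_search(root, query):
    """Follow the single most similar path down the tree and return one cluster."""
    if not root.children:
        return [root.doc_id]
    node = max(root.children, key=lambda c: cosine(query, c.vector))
    while True:
        subclusters = [c for c in node.children if c.children]
        if not subclusters:
            break  # all children are documents (or node is itself a document)
        best = max(subclusters, key=lambda c: cosine(query, c.vector))
        if cosine(query, best.vector) < cosine(query, node.vector):
            break  # every non-document child is less similar than the node itself
        node = best
    return documents(node)

# Hypothetical document vectors and cluster tree.
VECTORS = {
    "A": {"lupus": 1.0, "renal": 1.0},
    "B": {"lupus": 1.0, "azathioprine": 1.0},
    "C": {"transplant": 1.0, "renal": 1.0},
    "D": {"transplant": 1.0, "rejection": 1.0},
    "E": {"lupus": 1.0, "skin": 1.0},
}
leaf = {d: Node(v, d) for d, v in VECTORS.items()}
ab = make_cluster([leaf["A"], leaf["B"]], VECTORS)
abe = make_cluster([ab, leaf["E"]], VECTORS)
cd = make_cluster([leaf["C"], leaf["D"]], VECTORS)
root = make_cluster([abe, cd], VECTORS)

print(top_down_search(root, {"lupus": 1.0, "azathioprine": 1.0}))  # ['A', 'B']
print(top_down_search(root, {"skin": 1.0}))                        # ['A', 'B', 'E']
```

The second query stops one level higher because the only subcluster below the selected node is less similar to the query than the node itself.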
`
Fig. 1 A sample single link hierarchy
`
`A bottom-up search may also be performed on such a tree. The cluster at the lowest level
`of the tree whose centroid is most similar to the query is chosen as the node at which the
`search will start. The search continues up the tree until the similarity between the query
`and the parent of the current node is smaller than the similarity between the query and the
`current node. The documents contained in the cluster corresponding to the current node are
`returned. The bottom-up search is often more effective due to the uncertainty involved at
high levels of the hierarchy. [Crof80]
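
The bottom-up search can be sketched in the same spirit. The sketch assumes that "the lowest level of the tree" means the clusters whose members are all documents, that centroids are means of document vectors, and that the cosine is the similarity measure; the class and the sample data are hypothetical.

```python
import math

def cosine(x, y):
    """Cosine similarity of two sparse term vectors (term -> weight)."""
    dot = sum(w * y.get(t, 0.0) for t, w in x.items())
    norm = math.sqrt(sum(w * w for w in x.values()) * sum(w * w for w in y.values()))
    return dot / norm if norm else 0.0

class Cluster:
    """Interior node; each member is either a Cluster or a document id."""

    def __init__(self, members, doc_vectors):
        self.members = members
        self.parent = None
        for member in members:
            if isinstance(member, Cluster):
                member.parent = self
        ids = self.doc_ids()
        self.centroid = {}
        for doc_id in ids:
            for term, weight in doc_vectors[doc_id].items():
                self.centroid[term] = self.centroid.get(term, 0.0) + weight / len(ids)

    def doc_ids(self):
        ids = []
        for member in self.members:
            ids.extend(member.doc_ids() if isinstance(member, Cluster) else [member])
        return ids

    def is_lowest_level(self):
        return not any(isinstance(m, Cluster) for m in self.members)

def all_clusters(root):
    """Yield every cluster in the tree, starting with the root."""
    yield root
    for member in root.members:
        if isinstance(member, Cluster):
            yield from all_clusters(member)

def bottom_up_search(root, query):
    """Start at the best lowest-level cluster and climb while the parent is no worse."""
    lowest = [c for c in all_clusters(root) if c.is_lowest_level()]
    node = max(lowest, key=lambda c: cosine(query, c.centroid))
    while node.parent is not None and (
        cosine(query, node.parent.centroid) >= cosine(query, node.centroid)
    ):
        node = node.parent
    return node.doc_ids()

# Hypothetical document vectors and cluster tree.
VECTORS = {
    "A": {"lupus": 1.0, "renal": 1.0},
    "B": {"lupus": 1.0, "azathioprine": 1.0},
    "C": {"transplant": 1.0, "renal": 1.0},
    "D": {"transplant": 1.0, "rejection": 1.0},
    "E": {"lupus": 1.0, "skin": 1.0},
}
ab = Cluster(["A", "B"], VECTORS)
abe = Cluster([ab, "E"], VECTORS)
cd = Cluster(["C", "D"], VECTORS)
root = Cluster([abe, cd], VECTORS)

print(bottom_up_search(root, {"lupus": 1.0, "azathioprine": 1.0}))  # ['A', 'B']
print(bottom_up_search(root, {"lupus": 1.0}))                       # ['A', 'B', 'E']
```

For the second query the parent cluster is more similar to the query than the starting cluster, so the search climbs one level before stopping below the root.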
`
`Cluster hierarchies have been used effectively in automatic searches. Such hierarchies are
`also useful in performing searches based on browsing operations. These types of
`operations, we believe, can produce significant improvement in retrieval performance.
`Automatic cluster searches are highly structured; the next link in the search path is
`determined solely on the basis of the similarity between the query vector and the vector
`representation of the node being evaluated. By displaying suitable portions of the
`hierarchy during the course of the search operations and letting the user choose appropriate
`search paths at each point, the output obtained should be superior to that obtained by
`automatic cluster searching. For example, in a hypertext system with an interactive
`browser, following evaluation of items B, C, and D in the sample tree of Fig. 1, the user
`has the choice of exploring either a tightly clustered structure containing items F and G
`(which are very similar to each other with a similarity value of 0.8) or of staying in the
same cluster and evaluating item A (at a lower similarity level of 0.7). In contrast, the
control mechanism of the automatic search procedure may terminate the search at the node
labelled 0.7 and never evaluate the cluster containing items F and G. The effectiveness of
`this type of user-directed, interactive browsing is determined by comparing the results of
`such interactive searches to those obtained by automatic cluster searches.
`
`THE INTERACTIVE BROWSER
`
A browser incorporating the cluster hierarchy as its primary network structure was
implemented on a Macintosh IIx computer using HyperCard. The Macintosh is connected
via a local area network to a SUN System on which the SMART information retrieval
system [Salt71] resides. The SMART system provides packages for textual analysis,
clustering, performance evaluation, etc.
`
`To conduct a search using the browser, a user initially specifies a natural language query
`which is subsequently transformed into a term vector representation via the SMART
`retrieval system. The hypertext system then displays a window containing the original
`query and its corresponding term vector (Fig. 2). As suggested by the annotated display of
`Fig. 3, a user may obtain the word stem associated with each concept in the term vector
`as well as the document frequency of that term by clicking on a concept number in the
`vector. At any point during the search process, the user may add or delete concepts from
`the query vector representation or completely re-specify the query itself. The query
`window also contains a list of identifiers representing the documents which the user has
determined to be relevant to the query. Initially this list is empty; however, as the user
`conducts the search process, he/she enters documents into the list.
`
`To begin (or continue) a browse in the clustered environment, the user clicks on the
`Use Query button. The interface presents a display of the clustered document space
`represented as a complete link hierarchy. A user may begin an exploration of the cluster
`tree at any point, for example, at the root node for a top-down approach or at a leaf node
`(document) for a bottom-up approach. The user may prefer to initiate a search at an
`interior node (a cluster) which contains one or more documents known to be similar to
`the query.
`
In general, a tree representation of a clustered collection is too large to be displayed in its
entirety. Therefore, a user is presented with two views of the cluster tree simultaneously:
a local view containing the subtree within which the user is currently browsing (see
Fig. 4) and a global view, a more comprehensive view of the tree containing a
significantly larger number of nodes than the local view (see Fig. 5). A user-directed
traversal among the nodes is simultaneously reflected in both displays. The global view
permits the user to observe where the search is being conducted in relation to the entire
tree while the local view provides the user with more detailed information about a specific
subtree.
`
As may be noted, many links and informational items are provided by the interface
system to aid the user during the browsing process. As suggested by Fig. 4, the local
view of the tree:
• Uses different iconic representations to distinguish interior nodes (clusters) from
  leaf nodes (documents).
• Displays for each interior node the level at which the documents cluster. The
  clustering level represents the degree of association between the items under it.
• Lists the number of documents contained within the subtree defined by each node
  as well as the number of children of that node. This information can also be
  obtained by counting the nodes in the global view of the tree.
• Lists the value of the correlation measure of the query vector with either the
  centroid vector or the document vector associated with each node in the subtree.
  During the search process the user may change the correlation measure being
  calculated by means of the Correlation Measure pop-up menu. At present, the
  system provides a choice of several measures including vector product, inner
  product, Tanimoto, cosine and overlap.
• Provides a listing of the concepts contained within the query vector (see also
  Fig. 6). This information is also displayed in the query window; however, in the
  tree display, the concepts in the query are displayed in ascending order of
  document frequency. The user may alter the query by adding or deleting concepts
  from the query vector during the search process without returning to the query
  window.
• Uses different iconic representations to distinguish relevant documents from the
  other documents in the tree. A list of the documents which the user has chosen
  as relevant to the query is maintained in the display. The user may freely insert
  document identifiers into and delete items from this list. The icons of the
  documents in this list are then highlighted in the tree representation.
• Lists document identifiers represented by the leaf nodes of the tree.
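
Several of the correlation measures named in the list above have common textbook forms, sketched here for sparse vectors with non-negative weights. These definitions are assumptions and may differ from the ones computed by SMART or by the browser; the "vector product" measure is omitted because the text does not define it. The MEASURES table and the correlate function are hypothetical stand-ins for the pop-up menu selection.

```python
import math

def inner_product(x, y):
    """Sum of weight products over the shared terms."""
    return sum(w * y.get(t, 0.0) for t, w in x.items())

def cosine(x, y):
    """Inner product normalized by the lengths of both vectors."""
    norm = math.sqrt(inner_product(x, x) * inner_product(y, y))
    return inner_product(x, y) / norm if norm else 0.0

def tanimoto(x, y):
    """Inner product divided by the sum of squared lengths minus the inner product."""
    dot = inner_product(x, y)
    denom = inner_product(x, x) + inner_product(y, y) - dot
    return dot / denom if denom else 0.0

def overlap(x, y):
    """Sum of the smaller shared weights divided by the smaller total weight."""
    shared = sum(min(w, y[t]) for t, w in x.items() if t in y)
    denom = min(sum(x.values()), sum(y.values()))
    return shared / denom if denom else 0.0

# Hypothetical lookup table playing the role of the measure menu.
MEASURES = {
    "inner product": inner_product,
    "Tanimoto": tanimoto,
    "cosine": cosine,
    "overlap": overlap,
}

def correlate(measure_name, query, node_vector):
    """Correlate the query with a node's document or centroid vector."""
    return MEASURES[measure_name](query, node_vector)

query = {"lupus": 1.0, "renal": 1.0}
node_vector = {"lupus": 1.0, "renal": 1.0, "azathioprine": 1.0}
for name in MEASURES:
    print(name, round(correlate(name, query, node_vector), 4))
```

The same pair of vectors yields quite different values under the different measures, which is one reason a user might wish to switch measures during a search.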
`
Fig. 2 A sample query card and its parts
[Figure: the card for Query 12 ("Effect of azathioprine on systemic lupus erythematosus, particularly in regard to renal lesions"), showing the concepts of the query with their weights and the list of documents relevant to the query.]
`
Fig. 3 Clicking on a concept number
[Figure: the card for Query 12 after a concept number is clicked. The annotation states that the card shows the word stem forming that concept and the document frequency of the concept, and that it also outlines the first occurrence of that word.]
`
Fig. 4 The browser and its parts
[Figure: the local view of the tree; the legible annotations include the "You are Here" node.]
`
Fig. 5 The overview of the tree
[Figure: the global view of the tree.]
`
One can obtain additional information from the local tree display by clicking on various
items contained within the display. For example, when the user option-clicks on a node which
has terms in common with a query, the selected node’s icon is replaced with an informational
window listing the terms in common and the weights associated with these terms
(Fig. 7). Choosing a concept number in the query vector results in a display of the
textual description of the chosen term and its document frequency (Fig. 6). Clicking on a
terminal node (a document icon) results in the display of additional information associated
with the document. An example of a document window is contained in Fig. 8. One can
also review the query or enter a new query by selecting the To Query button. It should be
noted that each of the display windows has informational features associated with it
which support the visual search process.
`
`The global view of the tree (Fig. 5) has few items associated with a node, since the
`purpose of this display is to assist the user in locating his/her position within the tree
during the navigation of the cluster space. The three types of icons used in the global
view distinguish the following types of nodes:
• A document classified as relevant by the user. As previously noted, identifiers of
  the relevant documents are highlighted in the local view.
• A document which has not been classified as relevant.
• The node corresponding to the central node in the local view of the tree.

These nodes are color coded to facilitate a quick review of the tree at large.
`
Fig. 6 Finding the term for a concept number in the browser
[Figure: the local view of the tree with the term for a selected concept number displayed.]
`
Fig. 7 Getting common concept number information
[Figure: the local view of the tree; the annotation states that the selected node will list the common concepts. The To Query button is also shown.]
`
Fig. 8 Document card and its parts
[Figure: a document card showing the title ("septicemia due to mycoplasma hominis type 1"), the authors (viola m. young, ph.d., and sheldon m. wolff, m.d.), the text of the abstract and the concept numbers of the document.]
`
The user may view (move to) other portions of the tree not currently in the local or
global windows in one of several ways:
• By scrolling up or down or left or right in the global display window. The
  window moves across the tree in the global view in the direction represented by
  the scrolling action of the mouse.
• By clicking on a node in the local view. In this case, the chosen node becomes
  the central node in the local view. Such an action effectively moves the local
  view up or down one level of the tree.
• By clicking on the To Root button in the local view. This action causes the
  interface system to redraw the local view of the tree with the central node
  becoming the root node.
• By clicking on the To Doc button in the local view. The To Doc button
  allows the user to view the document window associated with any document in
  the collection. To obtain a document window, the user must specify the
  document’s identifier. Clicking on the To Tree button in the document
  window returns the user to a local view of the tree with its central node
  corresponding to the parent node of the document contained in the document
  window.
`
`The tree displays described above and the informational windows associated with items
`contained in the displays are effective for representing a local cluster arrangement for a
`small collection or a local area of a larger collection.
`
`EVALUATION OF THE BROWSER
`
The SMART information retrieval system provides a general framework for conducting
retrieval experiments. SMART has fully automatic iterative search methods with
automatic relevance document classification. The means exist within the system for
evaluating the effectiveness of the retrieval process; the effectiveness of any interactive
system can be established by comparison with automatic search procedures contained in
the SMART system. SMART also provides collections of documents and corresponding
sets of queries which may be used for experimentation. Relevance assessments have been
produced by persons knowledgeable about the subject matter in the collections.
`
`In order to develop a general search strategy which a user may employ in the hypertext
`retrieval system, we focused on the MEDLARS Collection, a somewhat homogeneous
collection generated by the National Library of Medicine. A user-controlled cluster search
`technique which performs well in a homogeneous collection will perform as well or
`generally significantly better in a heterogeneous collection. MEDLARS consists of 1,033
`documents in the medical field and a corresponding set of 30 queries. The document
`vectors were generated from an analysis of the abstracts of the documents. The
`MEDLARS collection was then clustered using a complete link clustering algorithm,
`resulting in a very wide tree. The cluster hierarchy contains 76 subtrees at the root node,
`and the maximum depth of the tree is ten.
`
`To assist in the development of a method for conducting a visual interactive search of a
`clustered collection, we divided the MEDLARS query collection into two subsets. One set
`of queries (the base set) was used to aid in the development of the methodology; the other
`set was used to estimate the performance of the interactive search process.
`
`We performed an interactive search in the hypertext system for each query in the base set
to determine the optimal search a user would follow in order to retrieve the known
`relevant documents of the query. By conducting this process for each of the 15 queries in
`the base set, we were able to observe and define the common threads linking relevant
`documents to the cluster tree. Our observations and the resulting search method that
`evolved are reported in [Andr89].
`
`An important point about both phases of the experiment is that the actual text of the
`document (as shown in the document cards) was never examined during the interactive
`search process to determine document relevance. Doing so would of course have
`substantially improved retrieval performance in an actual user-controlled search process.
`However, one of our objectives when conducting this experiment was to apply some of
`the insight gained from the interactive visual search to an automated system. An
`automated search system does not consult actual text during the search process; it uses
`only a vector representation of the document. We are now performing extensive testing
`with larger collections which does not place such a severe constraint on a user.
`
Once we had developed some search guidelines, the fifteen remaining queries in the
MEDLARS collection were processed using the developed search procedures. The list of
relevant documents was of course empty at the beginning of each search process.
The companion global tree viewing program was not needed in a search, since a frequent
user of the browser system has little problem with navigation of the search tree; novice
users would certainly want to use the companion program, however. For each query, the
query text was inspected prior to the navigation, and the resulting query vector
edited as needed. Depending on the intermediate results obtained and the general feedback
gained from the browsing process, the query vector was often modified to produce
additional relevant documents.
`
On average, we were able to retrieve 55% of the relevant documents for the queries in
the test set. The automatic retrieval system had a recall value of only 32%. Thus, even
without taking full advantage of the information linked to a node (namely, the document
abstract itself), use of the interactive browser yielded a significant improvement over
automatic cluster searches. Additionally, use of the hypertext system resulted in the return
of slightly fewer irrelevant documents; with the browser, 25% of the documents found
were irrelevant compared to 28% in the automatic system.
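
The two figures reported above correspond to recall and to the proportion of retrieved documents that are irrelevant. A small sketch of how such values may be computed and averaged over queries is given below; the per-query averaging is an assumption about the procedure, and the sample data are invented and unrelated to the MEDLARS results.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def fraction_irrelevant(retrieved, relevant):
    """Fraction of the retrieved documents that are not relevant."""
    return len(retrieved - relevant) / len(retrieved) if retrieved else 0.0

def mean(values):
    """Arithmetic mean of an iterable (0.0 if it is empty)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Hypothetical outcomes for two queries: (retrieved ids, relevant ids).
RUNS = [
    ({1, 2, 3, 9}, {1, 2, 3, 4, 5}),  # 3 of 5 relevant found; 1 of 4 retrieved is irrelevant
    ({6, 7}, {6, 7, 8, 10}),          # 2 of 4 relevant found; nothing irrelevant retrieved
]

average_recall = mean(recall(ret, rel) for ret, rel in RUNS)
average_irrelevant = mean(fraction_irrelevant(ret, rel) for ret, rel in RUNS)
print(f"{average_recall:.3f} {average_irrelevant:.3f}")
```

Under this reading, a search method is better when its average recall is higher and its average proportion of irrelevant documents is lower.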
`
`CONCLUSION
`
Our immediate objective in this work was to produce a retrieval system that allows easy
and accurate searching and browsing of a document collection. The representation of a
collection as a cluster hierarchy was shown to provide a solid basis on which to build a
hypertext retrieval system. The interactive browser is believed to be sufficiently
comprehensive and flexible to support a variety of experiments designed to
evaluate the effects of user control, user intervention, and the visual analysis of graphical
data representation on retrieval performance during the cluster search process. The dynamic
nature of the HyperCard environment on which the browser is based is well suited to meet
the need for flexibility required for these tasks.
`
`REFERENCES
`
[Fris88]

M. E. Frisse, Searching for Information in a Hypertext Medical Handbook.
Communications of the ACM, 31:7 (1988), pp. 880-886.

[Clev84]

C. W. Cleverdon, Optimizing Convenient On-Line Access to Bibliographic
Databases. Information Service Use, 4 (1984), pp. 31-47.

[Crou89]
`
`D. B. Crouch and R. Korfhage, The Use of Visual Representations in
`Information Retrieval Applicatio
