Query Expansion Using Local and Global Document Analysis
`
`Jinxi Xu and W. Bruce Croft
`
`Center for Intelligent Information Retrieval
`Computer Science Department
`University of Massachusetts, Amherst
`Amherst, MA 01003-4610, USA
`xu@cs.umass.edu croft@cs.umass.edu
`
`Abstract
`
Automatic query expansion has long been suggested as a
technique for dealing with the fundamental issue of word
mismatch in information retrieval. A number of approaches
to expansion have been studied and, more recently, attention
has focused on techniques that analyze the corpus to discover
word relationships (global techniques) and those that analyze
documents retrieved by the initial query (local feedback). In
this paper, we compare the effectiveness of these approaches
and show that, although global analysis has some advantages,
local analysis is generally more effective. We also show that
using global analysis techniques, such as word context and
phrase structure, on the local set of documents produces
results that are both more effective and more predictable than
simple local feedback.
`
`1
`
`Introduction
`
`The problem of word mismatch is fundamental to informa-
`tion retrieval. Simply stated, it means that people often use
`different words to describe concepts in their queries than au-
`thors use to describe the same concepts in their documents.
`The severity of the problem tends to decrease as queries
`get longer, since there is more chance of some important
words co-occurring in the query and relevant documents.
`In many applications, however, the queries are very short.
`For example, applications that provide searching across the
`World-Wide Web typically record average query lengths of
`two words [Croft et al., 1995]. Although this may be one ex-
`treme in terms of IR applications, it does indicate that most
`IR queries are not long and that techniques for dealing with
`word mismatch are needed.
`An obvious approach to solving this problem is query
`expansion. The query is expanded using words or phrases
`with similar meaning to those in the query and the chances
`of matching words in relevant documents are therefore in-
`creased. This is the basic idea behind the use of a thesaurus
`
Permission to make digital/hard copy of all or part of this work for personal
or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage, the copyright
notice, the title of the publication and its date appear, and notice
is given that copying is by permission of ACM, Inc. To copy otherwise,
to republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or fee.
SIGIR '96, Zurich, Switzerland. © 1996 ACM 0-89791-792-7/96/08 $3.50
`
`in query formulation. There is, however, little evidence that
`a general thesaurus is of any use in improving the effec-
`tiveness of the search, even if words are selected by the
`searchers [Voorhees, 1994].
`Instead, it has been proposed
`that by automatically analyzing the text of the corpus be-
`ing searched, a more effective thesaurus or query expansion
`technique could be produced.
`One of the earliest studies of this type was carried out
by Sparck Jones [Sparck Jones, 1971] who clustered words
based on co-occurrence in documents and used those clusters
to expand the queries. A number of similar studies
followed but it was not until recently that consistently positive
results have been obtained. The techniques that have
been used recently can be described as being based on either
`global or local analysis of the documents in the corpus being
`searched. The global techniques examine word occurrences
`and relationships in the corpus as a whole, and use this in-
`formation to expand any particular query. Given their focus
`on analyzing the corpus, these techniques are extensions of
`Sparck Jones' original approach.
`Local analysis, on the other hand, involves only the top
`ranked documents retrieved by the original query. We have
`called it local because the techniques are variations of the
original work on local feedback [Attar & Fraenkel, 1977,
Croft & Harper, 1979]. This work treated local feedback as
`a special case of relevance feedback where the top ranked
`documents were assumed to be relevant. Queries were both
`reweighted and expanded based on this information.
Both global and local analysis have the advantage of expanding
the query based on all the words in the query. This
`is in contrast to a thesaurus-based approach where individ-
`ual words and phrases in the query are expanded and word
`ambiguity is a problem. Global analysis is inherently more
`expensive than local analysis. On the other hand, global
`analysis provides a thesaurus-like resource that can be used
`for browsing without searching, and retrieval results with
`local feedback on small test collections were not promising.
`More recent results with the TREC collection, however,
`indicate that local feedback approaches can be effective and,
`in some cases, outperform global analysis techniques. In this
`paper, we compare these approaches using different query
`sets and corpora.
`In addition, we propose and evaluate a
`new technique which borrows ideas from global analysis,
`such as the use of context and phrase structure, but applies
`them to the local document set. We call the new technique
`local context analysis to distinguish it from local feedback.
`In the next section, we describe the global analysis pro-
`cedure used in these experiments, which is the Phrasefinder
`component of the INQUERY retrieval system [Jing & Croft,
`
`1
`
`EX1021
`EX1021
`
`
`
`1994]. Section 3 covers the local analysis procedures. The
`local feedback technique is based on the most successful ap-
proaches from the recent TREC conference [Harman, 1996].
`Local context analysis is described in detail.
`The experiments and results are presented in section 4.
Both the TREC [Harman, 1995] and WEST [Turtle, 1994]
`collections are used in order to compare results in differ-
`ent domains. A number of experiments with local context
`analysis are reported to show the effect of parameter varia-
`tions on this new technique. The other techniques are run
`using established parameter settings. In the comparison of
`global and local techniques, both recall/precision averages
`and query-by-query results are used. The latter evaluation
`is particularly useful to determine the robustness of the tech-
`niques, in terms of how many queries perform substantially
`worse after expansion.
`In the final section, we summarize
`the results and suggest future work.
`
`2 Global Analysis
`
`The global analysis technique we describe here has been used
`in the INQUERY system in TREC evaluations and other
applications [Jing & Croft, 1994, Callan et al., 1995], and
was one of the first techniques to produce consistent effectiveness
improvements through automatic expansion. Other
researchers have developed similar approaches [Qiu & Frei,
1993, Schütze & Pedersen, 1994] and have also reported good
`results.
The basic idea in global analysis is that the global context
of a concept can be used to determine similarities between
concepts. Context can be defined in a number of ways,
as can concepts. The simplest definitions are that all words
are concepts (except perhaps stop words) and that the context
for a word is all the words that co-occur in documents
`with that word. This is the approach used by [Qiu & Frei,
1993], and the analysis produced is related to the representations
generated by other dimensionality-reduction techniques
[Deerwester et al., 1990, Caid et al., 1993]. The
`essential difference is that global analysis is only used for
`query expansion and does not replace the original word-
`based document representations. Reducing dimensions in
`the document representation leads to problems with preci-
`sion. Another related approach uses clustering to determine
the context for document analysis [Crouch & Yang, 1992].
`In the Phrasefinder technique used with INQUERY, the
`basic definition for a concept is a noun group, and the con-
`text is defined as the collection of fixed length windows sur-
`rounding the concepts. A noun group (phrase) is either a
single noun, two adjacent nouns or three adjacent nouns.
Typical effective window sizes are from 1 to 3 sentences.
`One way of visualizing the technique, although not the most
`efficient way of implementing it, is to consider every concept
`(noun group) to be associated with a pseudo-document. The
`contents of the pseudo-document for a concept are the words
`that occur in every window for that concept in the corpus.
For example, the concept airline pilot might have the words
pay, strike, safety, air, traffic and FAA occurring frequently
`in the corresponding pseudo-document, depending on the
`corpus being analyzed. An INQUERY database is built from
`these pseudo-documents, creating a concept database. A fil-
`tering step is used to remove words that are too frequent or
`too rare, in order to control the size of the database.
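The pseudo-document construction described above can be sketched as follows. This is an illustration only, not the INQUERY implementation: it assumes sentence-segmented documents and a caller-supplied list of noun-group concepts, and it omits the frequency-filtering step.

```python
from collections import defaultdict

def build_pseudo_documents(documents, concepts, window=2):
    """For each concept, gather the words occurring in a fixed-size window
    of sentences around every sentence that mentions the concept.

    documents: list of documents, each a list of sentences,
               each sentence a list of tokens.
    window:    number of sentences on each side of the mention to include
               (an assumed parameterization of the paper's 1-3 sentence
               windows).
    """
    pseudo = defaultdict(list)
    for doc in documents:
        for i, sentence in enumerate(doc):
            for concept in concepts:
                if concept in sentence:
                    lo, hi = max(0, i - window), i + window + 1
                    # every word in the window contributes to the
                    # concept's pseudo-document
                    for s in doc[lo:hi]:
                        pseudo[concept].extend(w for w in s if w != concept)
    return pseudo
```

Indexing these pseudo-documents with a standard retrieval engine then yields the concept database: running a query against it ranks concepts instead of documents.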
`To expand a query, it is run against the concept database
`using INQUERY, which will generate a ranked list of phrasal
`concepts as output, instead of the usual list of document
names. Document and collection-based weighting of matching
words are used to determine the concept ranking, in a
`similar way to document ranking. Some of the top-ranking
`phrases from the list are then added to the query and
`weighted appropriately.
`In the Phrasefinder queries used
`in this paper, 30 phrases are added into each query and are
`downweighted in proportion to their rank position. Phrases
`containing only terms in the original query are weighted
`more heavily than those containing terms not in the origi-
`nal query.
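A minimal sketch of this selection-and-weighting step, under assumptions the paper does not spell out: the rank-proportional downweighting is modeled as 1/rank, and the extra weight for phrases built only from original query terms as a fixed `boost` factor. Both are illustrative choices, not the actual Phrasefinder parameters.

```python
def phrasefinder_weights(ranked_phrases, query_terms, k=30, boost=2.0):
    """Take the top-k phrases returned by the concept database, downweight
    each in proportion to its rank, and weight phrases containing only
    original query terms more heavily than the rest."""
    weighted = []
    for rank, phrase in enumerate(ranked_phrases[:k], start=1):
        w = 1.0 / rank  # assumed rank-proportional downweighting
        if all(word in query_terms for word in phrase.split()):
            w *= boost  # assumed boost for all-query-term phrases
        weighted.append((phrase, w))
    return weighted
```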
Figure 1 shows the top 30 concepts retrieved by
`Phrasefinder for the TREC4 query 214 “What are the differ-
`ent techniques used to create self induced hypnosis”. While
`some of the concepts are reasonable, others are difficult to
`understand. This is due to a number of spurious matches
`with noncontent words in the query.
The main advantages of a global analysis approach like
the one used in INQUERY are that it is relatively robust, in
that the average performance of queries tends to improve using
this type of expansion, and that it provides a thesaurus-like
resource that can be used for browsing or other types of
concept search. The disadvantages of this approach are that
it can be expensive in terms of disk space and computer
time to do the global context analysis and build the searchable
database, and that individual queries can be significantly
degraded by expansion.
`
`3 Local Analysis
3.1 Local Feedback
`
`The general concept of local feedback dates back at least
to a 1977 paper by Attar and Fraenkel [Attar & Fraenkel,
`1977]. In this paper, the top ranked documents for a query
`were proposed as a source of information for building an
automatic thesaurus. Terms in these documents were clustered
and treated as quasi-synonyms. In [Croft & Harper,
1979], information from the top ranked documents is used to
re-estimate the probabilities of term occurrence in the relevant
set for a query. In other words, the weights of query
terms would be modified but new terms were not added.
`This experiment produced effectiveness improvements, but
`was only carried out on a small test collection.
`Experiments carried out with other standard small col-
`lections did not give promising results. Since the simple
`version of this technique consists of adding common words
`from the top-ranked documents to the original query, the
`effectiveness of the technique is obviously highly influenced
`by the proportion of relevant documents in the high ranks.
`Queries that perform poorly and retrieve few relevant doc-
`uments would seem likely to perform even worse after local
`feedback, since most words added to the query would come
from non-relevant documents.
In recent TREC conferences, however, simple local feedback
techniques appear to have performed quite well. In this
`paper, we expand using a procedure similar to that used by
`the Cornell group in TREC 4 & 3 [Buckley et al., 1996].
The most frequent 50 terms and 10 phrases (pairs of adjacent
non-stop words) from the top ranked documents are
added to the query. The terms in the query are reweighted
using the Rocchio formula with α : β : γ = 1 : 1 : 0.
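A minimal sketch of this reweighting step, assuming queries and documents are represented as sparse term-to-weight dictionaries; with γ = 0 the non-relevant-document term of the Rocchio formula drops out entirely.

```python
def rocchio(query, top_docs, alpha=1.0, beta=1.0, gamma=0.0):
    """Rocchio reweighting with the paper's alpha:beta:gamma = 1:1:0.

    query:    dict of term -> weight for the original query.
    top_docs: term-weight dicts for the top-ranked documents, which
              local feedback assumes to be relevant.
    """
    n = len(top_docs)
    terms = set(query) | {t for d in top_docs for t in d}
    new_query = {}
    for t in terms:
        # centroid of the assumed-relevant documents for this term
        centroid = sum(d.get(t, 0.0) for d in top_docs) / n
        # gamma term omitted: local feedback has no non-relevant set
        new_query[t] = alpha * query.get(t, 0.0) + beta * centroid
    return new_query
```

Terms absent from the original query but frequent in the top documents (here, any term in `top_docs`) acquire nonzero weight, which is how expansion terms enter the query.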
`Figure 2 shows terms and phrases added by local feed-
`back to the same query used in the previous section. In this
`case, the terms in the query are stemmed.
`One advantage of local feedback is that it can be rela-
`tively efficient to do expansion based on high ranking doc-
`uments.
`It may be slightly slower at run-time than, for
`
`2
`
`
`
hypnosis, meditation, practitioners, dentists, antibodies, disorders, psychiatry,
immunodeficiency-virus, anesthesia, susceptibility, therapists, dearth, atoms,
van-dyke, self, confession, stare, proteins, katie, johns-hopkins-university,
growing-acceptance, reflexes, voltage, ad-hoc, correlation, conde-nast, dynamics,
ike, illnesses, hoffman

Figure 1: Phrasefinder concepts for TREC4 query 214
`
`
`
`
Terms: hypnot, hypnotiz, 19960500, psychiatr, immun, psychosomat, suscept,
mesmer, franz, austrian, dyck, psychiatrist, shesaid, tranc, professor, hallucin,
18th, centur, hilgard, 11th, unaccept, 19820902, syndrom, exper, physician, told,
patient, hemophiliac, strang, cortic, ol, defic, muncie, spiegel, diseas, imagin,
suggest, dyke, feburar, immunoglobulin, reseach, fresco, person, numb, katie,
psorias, treatment, medicin, 17150000, ms
Phrases: franz-mesmer, austrian-physician, psychosomat-medicin, intern-congress,
hypnot-state, fight-immun, hypnotiz-peopl, late-18th, diseas-fight, ms-ol

Figure 2: Local feedback terms and phrases for TREC4 query 214
`
`example, Phrasefinder, but needs no thesaurus construction
`phase. Local feedback requires an extra search and access
to document information. If document information is stored
only for this purpose, then this should be counted as a space
overhead for the technique, but it is likely to be significantly
less than a concept database. A disadvantage currently is
`that it is not clear how well this technique will work with
`queries that retrieve few relevant documents.
`
3.2 Local Context Analysis

Local context analysis is a new technique which combines
global analysis and local feedback. Like Phrasefinder, noun
groups are used as concepts and concepts are selected based
on co-occurrence with query terms. Concepts are chosen
from the top ranked documents, similar to local feedback,
but the best passages are used instead of whole documents.
The standard INQUERY ranking is not used in this technique.
Below are the steps to use local context analysis to expand
a query Q on a collection.

1. Use a standard IR system (INQUERY) to retrieve the
top n ranked passages. A passage is a text window
of fixed size (300 words in these experiments [Callan,
1994]).
There are two reasons that we use passages rather than
documents. Since documents can be very long and
about multiple topics, a co-occurrence of a concept at
the beginning and a term at the end of a long document
may mean nothing. It is also more efficient to
use passages because we can eliminate the cost of processing
the unnecessary parts of the documents.

2. Concepts (noun phrases) in the top n passages are
ranked according to the formula

   bel(Q, c) = \prod_{t_i \in Q} (\delta + \log(af(c, t_i)) \, idf_c / \log(n))^{idf_i}

where

   af(c, t_i) = \sum_{j=1}^{n} ft_{ij} \, fc_j
   idf_i = \max(1.0, \log_{10}(N/N_i)/5.0)
   idf_c = \max(1.0, \log_{10}(N/N_c)/5.0)

   c is a concept
   ft_{ij} is the number of occurrences of t_i in p_j
   fc_j is the number of occurrences of c in p_j
   N is the number of passages in the collection
   N_i is the number of passages containing t_i
   N_c is the number of passages containing c
   \delta is 0.1 in this paper to avoid zero bel values

The above formula is a variant of the tf.idf measure
used by most IR systems. In the formula, the af part
`3
`
`
`
rewards concepts co-occurring frequently with query
terms, the idf_c part penalizes concepts occurring frequently
in the collection, and the idf_i part emphasizes
infrequent query terms. Multiplication is used to emphasize
co-occurrence with all query terms.
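The concept-ranking formula of step 2 can be computed directly from the top-ranked passages. The sketch below makes two assumptions for illustration: passages are represented as token lists with a document-frequency table `df` for the whole collection, and af(c, t) = 0 contributes only δ to that factor, matching δ's stated role of avoiding zero bel values.

```python
import math

def lca_bel(concept, query_terms, passages, N, df, delta=0.1):
    """Belief score bel(Q, c) for one candidate expansion concept.

    passages: the top-n retrieved passages, each a list of tokens.
    N:        number of passages in the whole collection.
    df[x]:    number of collection passages containing x.
    """
    n = len(passages)
    idf_c = max(1.0, math.log10(N / df[concept]) / 5.0)
    bel = 1.0
    for t in query_terms:
        # af(c, t) = sum_j f(t, p_j) * f(c, p_j) over the top-n passages
        af = sum(p.count(t) * p.count(concept) for p in passages)
        idf_t = max(1.0, math.log10(N / df[t]) / 5.0)
        co = delta + (math.log(af) * idf_c / math.log(n) if af > 0 else 0.0)
        bel *= co ** idf_t
    return bel
```

A concept that co-occurs with every query term multiplies up a large score, while a concept missing even one query term is capped at δ for that factor; this is how the product emphasizes co-occurrence with all query terms.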
`
3. Add m top ranked concepts to Q using the following
formula:

   Q_new = #WSUM(1.0  1.0 Q  w Q')
   Q' = #WSUM(1.0  w_1 c_1  w_2 c_2  ...  w_m c_m)

In our experiments, m is set to 70 and w_i is set to
1.0 - 0.9 * i/70. Unless specified otherwise, w is set to
2.0. We call Q' the auxiliary query. #WSUM is an
INQUERY query operator which computes a weighted
average of its components.
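The weighting scheme of step 3 can be written out directly: with the linear downweighting, the top-ranked concept gets a weight just under 1.0 and the 70th gets 0.1.

```python
def auxiliary_query_weights(m=70):
    """Weights w_i = 1.0 - 0.9 * i / m for the i-th ranked expansion
    concept (i = 1..m), used to build the auxiliary query Q'."""
    return [1.0 - 0.9 * i / m for i in range(1, m + 1)]
```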
`
`Figure 3 shows the top 30 concepts added by local con-
`text analysis to TREC4 query 214.
`Local context analysis has several advantages. It is com-
`putationally practical. For each collection, we only need a
`single pass to collect the collection frequencies for the terms
`and noun phrases. This pass takes about 3 hours on an
`Alpha workstation for the TREC4 collection. The major
`overhead to expand a query is an extra search to retrieve
`the top ranked passages. On a modern computer system,
`this overhead is reasonably small. Once the top ranked
`passages are available, query expansion is fast: when 100
`passages are used, our current implementation requires only
`several seconds of CPU time to expand a TREC4 query.
`So local context analysis is practical even for interactive
`applications. For queries containing proximity constraints
`(e.g. phrases), Phrasefinder may add concepts which co-
`occur with all query terms but do not satisfy proximity con-
`straints. Local context analysis does not have such a prob-
`lem because the top ranked passages are retrieved using the
`original query. Because it does not filter out frequent con-
`cepts, local context analysis also has the advantage of using
`frequent but potentially good expansion concepts. A disad-
`vantage of local context analysis is that it may require more
`time to expand a query than Phrasefinder.
`
`4 Experiments
`
`4.1 Collections and Query Sets
`
Experiments are carried out on 3 collections: TREC3, which
comprises Tipster 1 and 2 datasets with 50 queries (topics
151-200), TREC4, which comprises Tipster 2 and 3 datasets
with 49 queries (topics 202-250), and WEST with 34 queries.
TREC3 and TREC4 (about 2 GB each) are much larger
and more heterogeneous than WEST. The average document
length of the TREC documents is only 1/7 of that of
the WEST documents. The average number of relevant doc-
`uments per query with the TREC collections is much larger
`than that of WEST. Table 1 lists some statistics about the
`collections and the query sets. Stop words are not included.
`
`4.2 Local Context Analysis
`Table 2 shows the performance of local context analysis on
`the three collections. 70 concepts are added into each query
`using the expansion formula in section 3.2.
Local context analysis performs very well on TREC3 and
TREC4. All runs produce significant improvements over
`the baseline on the TREC collections. The best run on
`
`TREC4 (100 passages) is 23.5% better than the baseline.
`The best run on TREC3 (200 passages) is 24.4% better than
`the baseline. On WEST, the improvements over the baseline
`are not as good as on TREC3 and TREC4. With too many
`passages, the performance is even worse than the baseline.
`The high baseline of the WEST collection (53.8% average
`precision) suggests that the original queries are of very good
`quality and we should give them more emphasis.
`So we
`downweight the expansion concepts by 50% by reducing the
weight of the auxiliary query Q' from 2.0 to 1.0. Table 3 shows
`that downweighting the expansion concepts does improve
`performance.
`It is interesting to see how the number of passages used
`affects retrieval performance. To see it more clearly, we
`plot the performance curve on TREC4 in figure 4. Initially,
increasing the number of passages quickly improves performance.
The performance peaks at a certain point. After
`staying relatively flat for a period, the performance curves
`drop slowly when more passages are used. For TREC3 and
`TREC4,
`the optimal number of passages is around 100,
`while on WEST, the optimal number of passages is around
20. This is not surprising because the first two collections
are an order of magnitude larger than WEST. Currently we
`do not know how to automatically determine the optimal
`number of passages to use. Fortunately, local context anal-
`ysis is relatively insensitive to the number of the passages
`used, especially for large collections like the TREC collec-
`tions. On the TREC collections, between 30 and 300 pas-
`sages produces very good retrieval performance.
`
5 Local Context Analysis vs Global Analysis
`
In this section we compare Phrasefinder and local context
analysis in terms of retrieval performance. Tables 4-5 compare
the retrieval performance of the two techniques on
the TREC collections. On both collections, local context
analysis is much better than Phrasefinder. On TREC3,
Phrasefinder is 7.8% better than the baseline while local
context analysis using the top ranked 100 passages is 23.3%
better than the baseline. On TREC4, Phrasefinder is only
`3.4% better than the baseline while local context analysis
using the top ranked 100 passages is 23.5% better than the
baseline. In fact, all local context analysis runs in table 2 are
better than Phrasefinder on TREC3 and TREC4. On both
`collections, Phrasefinder hurts the high-precision end while
`local context analysis helps improve precision. The results
`show that local context analysis is a better query expansion
`technique than Phrasefinder.
We examine two TREC4 queries to show why
`Phrasefinder is not as good as local context analysis. For
`one example, “China” and “Iraq” are very good concepts
for TREC4 query “Status of nuclear proliferation treaties -
`violations and monitoring". They are added into the query
`by local context analysis but not by Phrasefinder.
`It ap-
`pears that they are filtered out by Phrasefinder because they
`are frequent concepts. For the other example, Phrasefinder
`added the concept “oil spill” to TREC4 query “As a result
`of DNA testing, are more defendants being absolved or con-
victed of crimes”. This seems strange. It appears that
Phrasefinder did this because “oil spill” co-occurs with many
`of the terms in the query, e.g., “result”, “test”, “defendant”,
`“absolve” and “crime". But “oil spill” does not co-occur
`with “DNA", which is a key element of the query. While
`it is very hard to automatically determine which terms are
`key elements of a query, the product function used by local
`context analysis for selecting expansion concepts should be
`
`4
`
`
`
ms.-burns, brain-wave, hypnosis, technique, pulse, reed, ms.-olness, brain,
trance, hallucination, process, circuit, van-dyck, behavior, suggestion, case,
spiegel, finding, hypnotizables, subject, van-dyke, patient, memory, application,
katie, muncie, approach, study, point

Figure 3: Local Context Analysis concepts for query 214
collection                          WEST        TREC3        TREC4
Number of queries                   34          50           49
Raw text size in gigabytes          0.26        2.2
Number of documents                 11,953      741,856
Mean words per document             1,970       260
Mean relevant documents per query   29          196          133
Number of words in a collection     23,516,042  192,684,738  169,682,351

Table 1: Statistics on text corpora
`
Table 2: Performance of local context analysis using 11 point average precision
`
Table 3: Downweighting expansion concepts of local context analysis on WEST.
The weight of the auxiliary query is reduced to 1.0
`
`better than the sum function used by Phrasefinder because
`with the product function it is harder for some query terms
`to dominate other query terms.
`
6 Local Context Analysis vs Local Feedback
`
`In this section we compare the retrieval performances of lo-
`cal feedback and local context analysis. Table 7 shows the
`retrieval performance of local feedback.
`Table 8 shows the result of downweighting the expansion
`concepts by 50% on WEST. The reason for this is to make
`a fair comparison with local context analysis. Remember
`that we also downweighted the expansion concepts of local
`context analysis by 50% on WEST.
`Local feedback does very well on TREC3. The best run
`produces a 20.5% improvement over the baseline, close to
`the 24.4% of the best run of local context analysis. It is also
`relatively insensitive to the number of documents used for
`feedback on TREC3.
`Increasing the number of documents
`from 10 to 50 does not affect performance much.
`It also does well on TREC4. The best run produces a
`14.0% improvement over the baseline, very significant, but
`lower than the 23.5% of the best run of local context analy-
`
`sis. It is very sensitive to the number of documents used for
`feedback on TREC4.
`Increasing the number of documents
`from 5 to 20 results in a big performance loss. In contrast,
`local context analysis is relatively insensitive to the number
`of passages on all three collections.
`On WEST, local feedback does not work at all. With-
`out downweighting the expansion concepts, it results in a
`significant performance loss over all runs. Downweighting
`the expansion concepts only reduces the amount of loss. It
`is also sensitive to the number of documents used for feed-
`back. Increasing the number of feedback documents results
`in significantly more performance loss.
`It seems that the performance of local feedback and its
`sensitivity to the number of documents used for feedback
`depend on the number of relevant documents in the col-
`lection for the query. From table 1 we know that average
`number of relevant documents per query on TREC3 is 196,
`larger than 133 of TREC4, which is in turn larger than 29
`of WEST. This corresponds to the relative performance of
`local feedback on the collections.
`Tables 4-6 show a side by side comparison between local
feedback and local context analysis at different recall levels
`on the three collections. Top 10 documents are used for local
`
`5
`
`
`
Figure 4: Performance curve of local context analysis on TREC4
`
Table 4: A comparison of baseline, Phrasefinder, local feedback and local context
analysis on TREC4. 10 documents for local feedback (lf-10doc). 100 passages for
local context analysis (lca-100p)
`
`feedback and top 100 passages are used for local context
`analysis in these tables. In table 6 for WEST, the expansion
`concepts are downweighted by 50% for both local feedback
`and local context analysis.
`We also made a query-by-query comparison of the best
`run of local feedback and the best run of local context anal-
`ysis on TREC4. Of 49 queries, local feedback hurts 21 and
`improves 28, while local context analysis hurts 11 and im-
proves 38. Of the queries hurt by local feedback, 5 queries
have a more than 5% loss in average precision. The
worst case is query 232, whose average precision is reduced
from 24.8% to 4.3%. Of those hurt by local context analysis,
only one has a more than 5% loss in average precision.
`Local feedback also tends to hurt queries with poor perfor-
`mance. Of 9 queries with baseline average precision less than
`5%, local feedback hurts 8 and improves 1. In contrast, lo-
`cal context analysis hurts 4 and improves 5. Its tendency to
`hurt “bad” queries and queries with few relevant documents
`(such as the WEST queries) suggests that local feedback is
`very sensitive to the number of relevant documents in the
`top ranked documents. In comparison, local context analy-
`sis is not so sensitive.
`It is interesting to note that although both local context
`analysis and local feedback find concepts from top ranked
`passages/documents, the overlap of the concepts chosen by
`them is very small. On TREC4,
`the average number of
`unique terms in the expansion concepts per query is 58 by
local feedback and 78 by local context analysis. The average
overlap per query is only 17.6 terms. This means local
`
`context analysis and local feedback are two quite different
query expansion techniques. Some queries expanded quite
differently are improved by both methods. For example, the
`expansion overlap for query 214 of TREC4 (”What are the
`different techniques used to create self-induced hypnosis”) is
`19 terms, yet both methods improve the query significantly.
`
`7 Conclusion and Future Work
`
`This paper compares the retrieval effectiveness of three au-
`tomatic query expansion techniques: global analysis, local
`feedback and local context analysis. Experimental results
`on three collections show that local document analysis (local
`feedback and local context analysis) is more effective than
`global document analysis. The results also show that local
context analysis, which uses some global analysis techniques
on the local document set, outperforms simple local feedback
`in terms of retrieval effectiveness and predictability.
We will continue our work in the following areas:
`
`1. local context analysis: automatically determine how
`many passages to use, how many concepts to add to
`the query and how to assign the weights to them on a
`query by query basis. Currently the parameter values
`are decided experimentally and fixed for all queries.
`
2. Phrasefinder: a new metric for selecting concepts.
Currently Phrasefinder uses INQUERY's belief function,
which is not designed to select concepts. We
`
`6
`
`
`
Table 5: A comparison of baseline, Phrasefinder, local feedback and local context
analysis on TREC3. 10 documents for local feedback (lf-10doc). 100 passages for
local context analysis (lca-100p)
`
Table 6: A comparison of baseline, local feedback and local context analysis on
WEST. 10 documents for local feedback with weights for expansion units
downweighted by 50% (lf-10doc-dw0.5). 100 passages for local context analysis
with weight for the auxiliary query set to 1.0 (lca-100p-w1.0).
`
`hope a better metric will improve the performance of
`Phrasefinder.
`
`8 Acknowledgements
`
`We thank Dan Nachbar and James Allan for their help dur-
`ing this research. This research is supported in part by the
NSF Center for Intelligent Information Retrieval at University
of Massachusetts, Amherst.
This material is based on work supported in part by
NRaD Contract Number N66001-94-D-6054. Any opinions,
findings and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily
reflect those of the sponsor.
`
`References
`
[Attar & Fraenkel, 1977] Attar, R., & Fraenkel, A. S.
(1977). Local Feedback in Full-Text Retrieval Systems.
Journal of the Association for Computing Machinery,
24(3), 397-417.

[Buckley et al., 1996] Buckley, C., Singhal, A., Mitra, M.,
& Salton, G. (1996). New Retrieval Approaches Using
SMART: TREC 4. In Harman, D., editor, Proceedings of
the TREC 4 Conference. National Institute of Standards
and Technology Special Publication. To appear.

[Caid et al., 1993] Caid, B., Gallant, S., Carleton, J., &
Sudbeck, D. (1993). HNC Tipster Phase I Final Report.
In Proceedings of Tipster Text Program (Phase I), pp. 69-92.

[Callan et al., 1995] Callan, J., Croft, W. B., & Broglio,
J. (1995). TREC and TIPSTER experiments with INQUERY.
Information Processing and Management, pp. 327-343.

[Callan, 1994] Callan, J. P. (1994). Passage-level evidence
in document retrieval. In Proceedings of ACM SIGIR International
Conference on Research and Development in
Information Retrieval, pp. 302-310.

[Croft et al., 1995] Croft, W. B., Cook, B., & Wilder, D.
(1995). Providing Government Information on The Internet:
Experiences with THOMAS. In Digital Libraries
Conference DL'95, pp. 19-24.

[Croft & Harper, 1979] Croft, W. B., & Harper, D. J.
(1979). Using probabilistic models of document retrieval
without relevance information. Journal of Documentation,
35, 285-295.

[Crouch & Yang, 1992] Crouch, C. J., & Yang, B. (1992).
Experiments in automatic statistical thesaurus construction.
In Proceedings of ACM SIGIR International Conference
on Research and Development in Information Retrieval,
pp. 77-88.
`
`10
`
`7
`
`
`
                     number of documents used
collection      5      10     20     30     50     100
TREC4          28.7   27.9   26.9   27.2   26.7   26.1
              +14.0  +11.0   +6.8   +8.2   +6.2   +3.5
TREC3          36.6   38.0   37.6   37.7   37.7   36.6
              +16.0  +20.5  +19.1  +19.4  +19.3  +15.8
WEST           49.6   49.8   46.2   44.1   40.0   35.1
               -7.8   -7.5  -14.2  -18.0  -25.6  -34.7

Table 7: Performance of local feedback using 11 point average precision.
`
                     number of documents used
collection      5      10     20     30     50     100
WEST           52.6   52.0   48.7   47.5   44.5   40.0
               -2.2   -3.3   -9.5  -11.6  -17.2  -25.7

Table 8: Performance of local feedback on WEST with the expansion concepts
downweighted by 50%.