Query Expansion Using Local and Global Document Analysis
`
`Jinxi Xu and W. Bruce Croft
`Center for Intelligent Information Retrieval
`Computer Science Department
`University of Massachusetts, Amherst
`Amherst, MA 01003-4610, USA
`xu@cs.umass.edu croft@cs.umass.edu
`
`Abstract
`
`Automatic query expansion has long been suggested as a
`technique for dealing with the fundamental issue of word
`mismatch in information retrieval. A number of approaches
`to expansion have been studied and, more recently, attention
`has focused on techniques that analyze the corpus to discover
`word relationships (global techniques) and those that analyze
documents retrieved by the initial query (local feedback). In
`this paper, we compare the effectiveness of these approaches
`and show that, although global analysis has some advantages,
`local analysis is generally more effective. We also show that
`using global analysis techniques, such as word context and
`phrase structure, on the local set of documents produces re-
`sults that are both more effective and more predictable than
`simple local feedback.
`
`1
`
`Introduction
`
`The problem of word mismatch is fundamental to informa-
`tion retrieval. Simply stated, it means that people often use
`different words to describe concepts in their queries than au-
`thors use to describe the same concepts in their documents.
`The severity of the problem tends to decrease as queries
`get longer, since there is more chance of some important
`words co-occurring in the query and relevant documents.
`In many applications, however, the queries are very short.
`For example, applications that provide searching across the
`World-Wide Web typically record average query lengths of
`two words [Croft et al., 1995]. Although this may be one ex-
`treme in terms of IR applications, it does indicate that most
`IR queries are not long and that techniques for dealing with
`word mismatch are needed.
`An obvious approach to solving this problem is query
`expansion. The query is expanded using words or phrases
`with similar meaning to those in the query and the chances
`of matching words in relevant documents are therefore in-
`creased. This is the basic idea behind the use of a thesaurus
`
Permission to make digital/hard copy of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage, the copyright
notice, the title of the publication and its date appear, and notice
is given that copying is by permission of ACM, Inc. To copy otherwise,
to republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or fee.
SIGIR'96, Zurich, Switzerland © 1996 ACM 0-89791-792-8/96/08 $3.50
`
in query formulation. There is, however, little evidence that
a general thesaurus is of any use in improving the effectiveness
of the search, even if words are selected by the
searchers [Voorhees, 1994]. Instead, it has been proposed
that by automatically analyzing the text of the corpus being
searched, a more effective thesaurus or query expansion
technique could be produced.
One of the earliest studies of this type was carried out
by Sparck Jones [Sparck Jones, 1971] who clustered words
based on co-occurrence in documents and used those clusters
to expand the queries. A number of similar studies
followed but it was not until recently that consistently
positive results have been obtained. The techniques that have
`been used recently can be described as being based on either
`global or local analysis of the documents in the corpus being
`searched. The global techniques examine word occurrences
`and relationships in the corpus as a whole, and use this in-
`formation to expand any particular query. Given their focus
`on analyzing the corpus, these techniques are extensions of
`Sparck Jones’ original approach.
`Local analysis, on the other hand, involves only the top
`ranked documents retrieved by the original query. We have
`called it local because the techniques are variations of the
`original work on local feedback [Attar & Fraenkel, 1977,
`Croft & Harper, 1979]. This work treated local feedback as
`a special case of relevance feedback where the top ranked
`documents were assumed to be relevant. Queries were both
`reweighted and expanded based on this information.
`Both global and local analysis have the advantage of ex-
`panding the query based on all the words in the query. This
is in contrast to a thesaurus-based approach where individual
words and phrases in the query are expanded and word
`ambiguity is a problem. Global analysis is inherently more
`expensive than local analysis. On the other hand, global
`analysis provides a thesaurus-like resource that can be used
`for browsing without searching, and retrieval results with
`local feedback on small test collections were not promising.
More recent results with the TREC collection, however,
indicate that local feedback approaches can be effective and,
in some cases, outperform global analysis techniques. In this
paper, we compare these approaches using different query
sets and corpora. In addition, we propose and evaluate a
`new technique which borrows ideas from global analysis,
`such as the use of context and phrase structure, but applies
`them to the local document set. We call the new technique
`local context analysis to distinguish it from local feedback.
In the next section, we describe the global analysis procedure
used in these experiments, which is the Phrasefinder
component of the INQUERY retrieval system [Jing & Croft,
1994]. Section 3 covers the local analysis procedures. The
`local feedback technique is based on the most successful ap-
`proaches from the recent TREC conference [Harman, 1996].
`Local context analysis is described in detail.
`The experiments and results are presented in section 4.
`Both the TREC [Harman, 1995] and WEST [Turtle, 1994]
collections are used in order to compare results in different
domains. A number of experiments with local context
`analysis are reported to show the effect of parameter varia-
`tions on this new technique. The other techniques are run
`using established parameter settings. In the comparison of
`global and local techniques, both recall/precision averages
`and query-by-query results are used. The latter evaluation
`is particularly useful to determine the robustness of the tech-
`niques, in terms of how many queries perform substantially
worse after expansion. In the final section, we summarize
`the results and suggest future work.
`
`2 Global Analysis
`
`The global analysis technique we describe here has been used
`in the INQUERY system in TREC evaluations and other
`applications [Jing & Croft, 1994, Callan et al., 1995], and
`was one of the first techniques to produce consistent effec-
`tiveness improvements through automatic expansion. Other
`researchers have developed similar approaches [Qiu & Frei,
`1993, Schiitze & Pedersen, 1994] and have also reported good
`results.
The basic idea in global analysis is that the global context
of a concept can be used to determine similarities between
concepts. Context can be defined in a number of ways,
as can concepts. The simplest definitions are that all words
are concepts (except perhaps stop words) and that the context
for a word is all the words that co-occur in documents
with that word. This is the approach used by [Qiu & Frei,
1993], and the analysis produced is related to the representations
generated by other dimensionality-reduction
techniques [Deerwester et al., 1990, Caid et al., 1993]. The
`essential difference is that global analysis is only used for
`query expansion and does not replace the original word-
`based document representations. Reducing dimensions in
`the document representation leads to problems with preci-
`sion. Another related approach uses clustering to determine
`the context for document analysis [Crouch & Yang, 1992].
In the Phrasefinder technique used with INQUERY, the
basic definition for a concept is a noun group, and the context
is defined as the collection of fixed length windows surrounding
the concepts. A noun group (phrase) is either a
single noun, two adjacent nouns or three adjacent nouns.
Typical effective window sizes are from 1 to 3 sentences.
One way of visualizing the technique, although not the most
efficient way of implementing it, is to consider every concept
(noun group) to be associated with a pseudo-document. The
contents of the pseudo-document for a concept are the words
that occur in every window for that concept in the corpus.
For example, the concept airline pilot might have the words
pay, strike, safety, air, traffic and FAA occurring frequently
in the corresponding pseudo-document, depending on the
corpus being analyzed. An INQUERY database is built from
these pseudo-documents, creating a concept database. A filtering
step is used to remove words that are too frequent or
too rare, in order to control the size of the database.
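One way to make the pseudo-document idea concrete is the sketch below. It is an illustrative reconstruction, not Phrasefinder's implementation: concepts here are single lowercase terms rather than noun groups, windows are counted in tokens rather than sentences, and tokenization is plain whitespace splitting.

```python
from collections import defaultdict

def build_pseudo_documents(corpus, concepts, window=7):
    """For each concept (here: a single term), collect the words that
    co-occur with it inside a fixed-size token window, across the whole
    corpus. The resulting bag of words is the concept's pseudo-document."""
    pseudo_docs = defaultdict(list)
    for doc in corpus:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok in concepts:
                lo, hi = max(0, i - window), i + window + 1
                # context = window around the concept, excluding the concept itself
                context = tokens[lo:i] + tokens[i + 1:hi]
                pseudo_docs[tok].extend(context)
    return pseudo_docs

corpus = [
    "airline pilot strike over pay and safety",
    "the pilot praised air traffic control and the faa",
]
pdocs = build_pseudo_documents(corpus, {"pilot"}, window=7)
```

An index of such bags, one per concept, is what the filtering step above would then prune by frequency.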
To expand a query, it is run against the concept database
using INQUERY, which will generate a ranked list of phrasal
concepts as output, instead of the usual list of document
names. Document and collection-based weighting of matching
words is used to determine the concept ranking, in a
similar way to document ranking. Some of the top-ranking
phrases from the list are then added to the query and
weighted appropriately. In the Phrasefinder queries used
in this paper, 30 phrases are added into each query and are
downweighted in proportion to their rank position. Phrases
containing only terms in the original query are weighted
more heavily than those containing terms not in the original
query.
Figure 1 shows the top 30 concepts retrieved by
Phrasefinder for the TREC4 query 214 “What are the different
techniques used to create self induced hypnosis”. While
some of the concepts are reasonable, others are difficult to
understand. This is due to a number of spurious matches
with noncontent words in the query.
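The expansion step can then be viewed as ordinary ranked retrieval over the concept database. The sketch below is a toy tf-idf ranker over concept pseudo-documents under the same simplifying assumptions as before; INQUERY's actual belief-based weighting is more involved.

```python
import math
from collections import Counter

def rank_concepts(query, pseudo_docs, top_k=30):
    """Rank concept pseudo-documents against the query with a simple
    tf-idf score; the top concepts become expansion phrases."""
    q_terms = query.lower().split()
    n_docs = len(pseudo_docs)
    # document frequency of each term across the pseudo-documents
    df = Counter()
    for words in pseudo_docs.values():
        for term in set(words):
            df[term] += 1
    scores = {}
    for concept, words in pseudo_docs.items():
        tf = Counter(words)
        scores[concept] = sum(
            tf[t] * math.log(1 + n_docs / df[t]) for t in q_terms if t in tf
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

pdocs = {
    "airline pilot": ["pay", "strike", "safety", "faa", "traffic"],
    "oil spill": ["tanker", "coast", "cleanup"],
}
top = rank_concepts("pilot strike safety", pdocs, top_k=1)
```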
The main advantages of a global analysis approach like
the one used in INQUERY are that it is relatively robust, in
that the average performance of queries tends to improve using
this type of expansion, and that it provides a thesaurus-like
resource that can be used for browsing or other types of
concept search. The disadvantages of this approach are that
it can be expensive in terms of disk space and computer
time to do the global context analysis and build the searchable
database, and that individual queries can be significantly
degraded by expansion.
`
`3 Local Analysis
`3.1 Local Feedback
`
The general concept of local feedback dates back at least
to a 1977 paper by Attar and Fraenkel [Attar & Fraenkel,
1977]. In this paper, the top ranked documents for a query
were proposed as a source of information for building an
automatic thesaurus. Terms in these documents were clustered
and treated as quasi-synonyms. In [Croft & Harper,
1979], information from the top ranked documents is used to
re-estimate the probabilities of term occurrence in the relevant
set for a query. In other words, the weights of query
terms would be modified but new terms were not added.
This experiment produced effectiveness improvements, but
was only carried out on a small test collection.
`Experiments carried out with other standard small col-
`lections did not give promising results. Since the simple
`version of this technique consists of adding common words
`from the top-ranked documents to the original query, the
`effectiveness of the technique is obviously highly influenced
`by the proportion of relevant documents in the high ranks.
`Queries that perform poorly and retrieve few relevant doc-
`uments would seem likely to perform even worse after local
`feedback, since most words added to the query would come
`from non-relevant documents.
In recent TREC conferences, however, simple local feedback
techniques appear to have performed quite well. In this
paper, we expand using a procedure similar to that used by
the Cornell group in TREC 4 & 3 [Buckley et al., 1996].
The most frequent 50 terms and 10 phrases (pairs of adjacent
non-stop words) from the top ranked documents are
added to the query. The terms in the query are reweighted
using the Rocchio formula with α : β : γ = 1 : 1 : 0.
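A minimal sketch of this local feedback loop, assuming term-weight vectors stored as Counters; with γ = 0, as here, the non-relevant component drops out and the top-ranked documents simply pull the query toward their centroid. This is an illustration of the Rocchio formula, not the Cornell group's code.

```python
from collections import Counter

def rocchio(query_vec, top_docs, alpha=1.0, beta=1.0, gamma=0.0):
    """Rocchio reweighting with the top-ranked documents treated as
    relevant (local feedback). query_vec and each doc are Counters of
    term weights; gamma=0 ignores non-relevant documents entirely."""
    centroid = Counter()
    for doc in top_docs:
        for term, w in doc.items():
            centroid[term] += w / len(top_docs)
    new_q = Counter()
    for term in set(query_vec) | set(centroid):
        new_q[term] = alpha * query_vec[term] + beta * centroid[term]
    return new_q

q = Counter({"hypnosis": 1.0})
docs = [Counter({"hypnosis": 2, "trance": 1}), Counter({"trance": 1, "mesmer": 1})]
expanded = rocchio(q, docs)
```

In the procedure above, only the 50 most frequent terms and 10 most frequent phrases of the feedback documents would actually be kept as expansion units.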
Figure 2 shows terms and phrases added by local feedback
to the same query used in the previous section. In this
case, the terms in the query are stemmed.
One advantage of local feedback is that it can be relatively
efficient to do expansion based on high ranking documents.
It may be slightly slower at run-time than, for
`
`2
`
`
`
`
`
`hypnosis
`practitioners
`meditation
`dentists
`disorders
`antibodies
`
`
`psychiatry
`anesthesia
`immunodeficiency-virus
`susceptibility
`dearth
`therapists
`
`
`atoms
`self
`van-dyke
`confession
`proteins
`stare
`
`
`katie
`growing-acceptance
`jobns-hopkins-university
`
`
`reflexes
`ad-hoc
`voltage
`
`
`correlation
`dynamics
`conde-nast
`
`
`
`ike
`illnesses
`hoffman
`
`
`
`Figure 1: Phrasefinder concepts for TREC4 query 214
`
`
`
`19960500
`hypnotiz
`hypnot
`
`immun
`psychiatr
`psychosomat
`
`franz
`mesmer
`suscept
`dyck
`austrian
`psychiatrist
`
`tranc
`shesaid
`professor
`
`centur
`18th
`hallucin
`unaccept
`
`1ith
`hilgard
`exper
`syndrom
`19820902
`told
`physician
`patient
`
`cortic
`strang
`hemophiliac
`defic
`ol
`muncie
`diseas
`spiegel
`imagin
`
`feburar
`dyke
`suggest
`fresco
`reseach
`immunoglobulin
`
`katie
`numb
`person
`medicin
`treatment
`psorias
`
`ms
`17150000
`franz-mesmer
`
`
`psychosomat-medicin
`austrian-physician
`intern-congress
`fight-immun
`hypnot-state
`
`
`
`hypnotiz-peopl
`diseas-fight
`late-18th
`
`
`ms-ol
`
`
`
`
`
`
`Figure 2: Local feedback terms and phrases for TREC4 query 214
`
example, Phrasefinder, but needs no thesaurus construction
phase. Local feedback requires an extra search and access
to document information. If document information is stored
only for this purpose, then this should be counted as a space
overhead for the technique, but it is likely to be significantly
less than a concept database. A disadvantage currently is
that it is not clear how well this technique will work with
queries that retrieve few relevant documents.
`
3.2 Local Context Analysis

Local context analysis is a new technique which combines
global analysis and local feedback. Like Phrasefinder, noun
groups are used as concepts and concepts are selected based
on co-occurrence with query terms. Concepts are chosen
from the top ranked documents, similar to local feedback,
but the best passages are used instead of whole documents.
The standard INQUERY ranking is not used in this technique.

Below are the steps to use local context analysis to
expand a query Q on a collection.

1. Use a standard IR system (INQUERY) to retrieve the
top n ranked passages. A passage is a text window
of fixed size (300 words in these experiments [Callan,
1994]). There are two reasons that we use passages rather
than documents. Since documents can be very long and
about multiple topics, a co-occurrence of a concept at
the beginning and a term at the end of a long document
may mean nothing. It is also more efficient to use
passages because we can eliminate the cost of processing
the unnecessary parts of the documents.

2. Concepts (noun phrases) in the top n passages are
ranked according to the formula

   bel(Q, c) = \prod_{t_i \in Q} (\delta + \log(af(c, t_i)) \cdot idf_c / \log(n))^{idf_{t_i}}

where

   c is a concept;
   af(c, t_i) = \sum_{j=1}^{n} f_{t_i,j} \cdot f_{c,j};
   f_{t_i,j} is the number of occurrences of t_i in passage p_j;
   f_{c,j} is the number of occurrences of c in passage p_j;
   idf_{t_i} = max(1.0, \log_{10}(N/N_{t_i})/5.0);
   idf_c = max(1.0, \log_{10}(N/N_c)/5.0);
   N is the number of passages in the collection;
   N_{t_i} is the number of passages containing t_i;
   N_c is the number of passages containing c;
   \delta is 0.1 in this paper to avoid a zero bel value.

The above formula is a variant of the tf.idf measure
used by most IR systems. In the formula, the af part
rewards concepts co-occurring frequently with query
terms, the idf_c part penalizes concepts occurring
frequently in the collection, and the idf_{t_i} part
emphasizes infrequent query terms. Multiplication is
used to emphasize co-occurrence with all query terms.

3. Add the m top ranked concepts to Q using the following
formula:

   Q_new = #WSUM(1.0 1.0 Q w Q1)
   Q1 = #WSUM(1.0 w_1 c_1 w_2 c_2 ... w_m c_m)

In our experiments, m is set to 70 and w_i is set to
1.0 - 0.9 * i/70. Unless specified otherwise, w is set to
2.0. We call Q1 the auxiliary query. #WSUM is an
INQUERY query operator which computes a weighted
average of its components.
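The concept-ranking formula of step 2 can be sketched in code. This is a simplified illustration, not the INQUERY implementation: passages are plain token lists, concepts are single terms, and log(af + 1) is used to guard the af = 0 case (the paper's delta serves the related purpose of avoiding a zero bel value). The collection statistics N, n_t and n_c are assumed to be precomputed.

```python
import math

def bel(query_terms, concept, passages, N, n_t, n_c, delta=0.1):
    """Score a candidate expansion concept against every query term.
    passages: top-n retrieved passages as token lists; N: number of
    passages in the whole collection; n_t / n_c: passage frequencies."""
    n = len(passages)
    idf_c = max(1.0, math.log10(N / n_c) / 5.0)
    score = 1.0
    for t in query_terms:
        # co-occurrence of t and the concept, summed over the passages
        af = sum(p.count(t) * p.count(concept) for p in passages)
        idf_t = max(1.0, math.log10(N / n_t[t]) / 5.0)
        # log(af + 1) guards af = 0; delta keeps the factor non-zero
        score *= (delta + math.log(af + 1) * idf_c / math.log(n)) ** idf_t
    return score

passages = [["nuclear", "treaty", "china"], ["nuclear", "iraq"]]
n_t = {"nuclear": 50, "treaty": 20}
good = bel(["nuclear", "treaty"], "china", passages, N=1000, n_t=n_t, n_c=100)
rare = bel(["nuclear", "treaty"], "astronomy", passages, N=1000, n_t=n_t, n_c=5)
```

Because the per-term factors are multiplied, a concept that fails to co-occur with even one query term (like "astronomy" here) collapses toward delta-sized factors and falls down the ranking.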
`
`Figure 3 shows the top 30 concepts added by local con-
`text analysis to TREC4 query 214.
`Local context analysis has several advantages. It is com-
`putationally practical. For each collection, we only need a
`single pass to collect the collection frequencies for the terms
`and noun phrases. This pass takes about 3 hours on an
`Alpha workstation for the TREC4 collection. The major
`overhead to expand a query is an extra search to retrieve
`the top ranked passages. On a modern computer system,
`this overhead is reasonably small. Once the top ranked
`passages are available, query expansion is fast: when 100
`passages are used, our current implementation requires only
`several seconds of CPU time to expand a TREC4 query.
`So local context analysis is practical even for interactive
`applications. For queries containing proximity constraints
`(e.g. phrases), Phrasefinder may add concepts which co-
`occur with all query terms but do not satisfy proximity con-
`straints. Local context analysis does not have such a prob-
`lem because the top ranked passages are retrieved using the
`original query. Because it does not filter out frequent con-
`cepts, local context analysis also has the advantage of using
`frequent but potentially good expansion concepts. A disad-
`vantage of local context analysis is that it may require more
`time to expand a query than Phrasefinder.
`
`4 Experiments
`
`4.1 Collections and Query Sets
`
Experiments are carried out on 3 collections: TREC3, which
comprises Tipster 1 and 2 datasets with 50 queries (topics
151-200); TREC4, which comprises Tipster 2 and 3 datasets
with 49 queries (topics 202-250); and WEST with 34 queries.
TREC3 and TREC4 (about 2 GBs each) are much larger
and more heterogeneous than WEST. The average document
length of the TREC documents is only 1/7 of that of
the WEST documents. The average number of relevant documents
per query with the TREC collections is much larger
than that of WEST. Table 1 lists some statistics about the
collections and the query sets. Stop words are not included.
`
`4.2 Local Context Analysis
`Table 2 shows the performance of local context analysis on
`the three collections. 70 concepts are added into each query
`using the expansion formula in section 3.2.
Local context analysis performs very well on TREC3 and
TREC4. All runs produce significant improvements over
the baseline on the TREC collections. The best run on
`
`TREC4 (100 passages) is 23.5% better than the baseline.
`The best run on TREC3 (200 passages) is 24.4% better than
the baseline. On WEST, the improvements over the baseline
are not as good as on TREC3 and TREC4. With too many
passages, the performance is even worse than the baseline.
The high baseline of the WEST collection (53.8% average
precision) suggests that the original queries are of very good
quality and we should give them more emphasis. So we
downweight the expansion concepts by 50% by reducing the
weight of the auxiliary query Q1 from 2.0 to 1.0. Table 3 shows
that downweighting the expansion concepts does improve
performance.
It is interesting to see how the number of passages used
affects retrieval performance. To see it more clearly, we
plot the performance curve on TREC4 in figure 4. Initially,
increasing the number of passages quickly improves performance.
The performance peaks at a certain point. After
staying relatively flat for a period, the performance curves
drop slowly when more passages are used. For TREC3 and
TREC4, the optimal number of passages is around 100,
while on WEST, the optimal number of passages is around
20. This is not surprising because the first two collections
are an order of magnitude larger than WEST. Currently we
do not know how to automatically determine the optimal
number of passages to use. Fortunately, local context analysis
is relatively insensitive to the number of passages
used, especially for large collections like the TREC collections.
On the TREC collections, between 30 and 300 passages
produces very good retrieval performance.
`
5 Local Context Analysis vs Global Analysis
`
In this section we compare Phrasefinder and local context
analysis in terms of retrieval performance. Tables 4-5 compare
the retrieval performance of the two techniques on
the TREC collections. On both collections, local context
analysis is much better than Phrasefinder. On TREC3,
Phrasefinder is 7.8% better than the baseline while local
context analysis using the top ranked 100 passages is 23.3%
better than the baseline. On TREC4, Phrasefinder is only
3.4% better than the baseline while local context analysis
using the top ranked 100 passages is 23.5% better than the
baseline. In fact, all local context analysis runs in table 2 are
better than Phrasefinder on TREC3 and TREC4. On both
collections, Phrasefinder hurts the high-precision end while
local context analysis helps improve precision. The results
show that local context analysis is a better query expansion
technique than Phrasefinder.
We examine two TREC4 queries to show why
Phrasefinder is not as good as local context analysis. For
one example, “China” and “Iraq” are very good concepts
for TREC4 query “Status of nuclear proliferation treaties —
violations and monitoring”. They are added into the query
by local context analysis but not by Phrasefinder. It appears
that they are filtered out by Phrasefinder because they
are frequent concepts. For the other example, Phrasefinder
added the concept “oil spill” to TREC4 query “As a result
of DNA testing, are more defendants being absolved or convicted
of crimes”. This seems to be strange. It appears that
Phrasefinder did this because “oil spill” co-occurs with many
of the terms in the query, e.g., “result”, “test”, “defendant”,
“absolve” and “crime”. But “oil spill” does not co-occur
with “DNA”, which is a key element of the query. While
it is very hard to automatically determine which terms are
key elements of a query, the product function used by local
context analysis for selecting expansion concepts should be
`
`4
`
`
`
`
`
`
`
Figure 3: Local Context Analysis concepts for query 214 (hypnosis, brain-wave, ms.-burns, technique, pulse, reed, ms.-olness, brain, trance, hallucination, process, circuit, van-dyck, behavior, suggestion, case, spiegel, finding, hypnotizables, subject, van-dyke, patient, memory, application, katie, muncie, approach, study, point, contrast)
`
Table 1: Statistics of the collections and query sets.

collection                              TREC3      TREC4     WEST
Number of queries                          50         49       34
Raw text size in gigabytes                2.2        2.0     0.26
Number of documents                   741,856    567,529   11,953
Mean relevant documents per query         196        133       29
`
`
Table 2: Performance of local context analysis using 11 point average precision
`
Table 3: Downweighting expansion concepts of local context analysis on WEST. The weight of the auxiliary query is reduced to 1.0
`
`better than the sum function used by Phrasefinder because
`with the product function it is harder for some query terms
`to dominate other query terms.
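A toy example (with made-up co-occurrence scores) illustrates the difference between combining per-term evidence by sum and by product: a concept that misses one query term can still win under the sum, but is heavily penalized by the product.

```python
def product_score(cooc):
    """Combine per-query-term co-occurrence evidence by product
    (the local context analysis style)."""
    result = 1.0
    for v in cooc:
        result *= v
    return result

def sum_score(cooc):
    """Combine per-query-term co-occurrence evidence by sum
    (the Phrasefinder style)."""
    return sum(cooc)

# hypothetical per-query-term scores for two candidate concepts
china = [2.0, 2.0, 2.0]      # moderate co-occurrence with every term
oil_spill = [5.0, 5.0, 0.1]  # strong with two terms, almost none with "DNA"
```

The sum ranks oil_spill higher (10.1 vs 6.0) while the product prefers china (8.0 vs 2.5), mirroring the behavior discussed above.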
`
6 Local Context Analysis vs Local Feedback
`
In this section we compare the retrieval performances of local
feedback and local context analysis. Table 7 shows the
retrieval performance of local feedback.
Table 8 shows the result of downweighting the expansion
concepts by 50% on WEST. The reason for this is to make
a fair comparison with local context analysis. Remember
that we also downweighted the expansion concepts of local
context analysis by 50% on WEST.
Local feedback does very well on TREC3. The best run
produces a 20.5% improvement over the baseline, close to
the 24.4% of the best run of local context analysis. It is also
relatively insensitive to the number of documents used for
feedback on TREC3. Increasing the number of documents
from 10 to 50 does not affect performance much.
It also does well on TREC4. The best run produces a
14.0% improvement over the baseline, very significant, but
lower than the 23.5% of the best run of local context analysis.
It is very sensitive to the number of documents used for
feedback on TREC4. Increasing the number of documents
from 5 to 20 results in a big performance loss. In contrast,
local context analysis is relatively insensitive to the number
of passages on all three collections.
On WEST, local feedback does not work at all. Without
downweighting the expansion concepts, it results in a
significant performance loss over all runs. Downweighting
the expansion concepts only reduces the amount of loss. It
is also sensitive to the number of documents used for feedback.
Increasing the number of feedback documents results
in significantly more performance loss.
It seems that the performance of local feedback and its
sensitivity to the number of documents used for feedback
depend on the number of relevant documents in the collection
for the query. From table 1 we know that the average
number of relevant documents per query on TREC3 is 196,
larger than the 133 of TREC4, which is in turn larger than the 29
of WEST. This corresponds to the relative performance of
local feedback on the collections.
`Tables 4-6 show a side by side comparison between local
`feedback and local context analysis at different recall levels
`on the three collections. Top 10 documents are used for local
`
`5
`
`
`
`2 8
`
`2
`
`oa
`
`30
`
`20
`
`3a
`
`e5
`&
`
`28 °
`
`60
`
`100160
`
`260300
`200
`number of passages
`
`390
`
`400
`
`450
`
`500
`
`Figure 4: Performance curve of local context analysis on TREC4
`
Table 4: A comparison of baseline, Phrasefinder, local feedback and local context analysis on TREC4. 10 documents for local feedback (lf-10doc); 100 passages for local context analysis (lca-100p). Average precision for lca-100p: 31.1 (+23.5% over baseline)
`
`feedback and top 100 passages are used for local context
`analysis in these tables. In table 6 for WEST, the expansion
`concepts are downweighted by 50% for both local feedback
`and local context analysis.
We also made a query-by-query comparison of the best
run of local feedback and the best run of local context analysis
on TREC4. Of 49 queries, local feedback hurts 21 and
improves 28, while local context analysis hurts 11 and improves
38. Of the queries hurt by local feedback, 5 queries
have a more than 5% loss in average precision. The
worst case is query 232, whose average precision is reduced
from 24.8% to 4.3%. Of those hurt by local context analysis,
only one has a more than 5% loss in average precision.
Local feedback also tends to hurt queries with poor performance.
Of 9 queries with baseline average precision less than
5%, local feedback hurts 8 and improves 1. In contrast, local
context analysis hurts 4 and improves 5. Its tendency to
hurt “bad” queries and queries with few relevant documents
(such as the WEST queries) suggests that local feedback is
very sensitive to the number of relevant documents in the
top ranked documents. In comparison, local context analysis
is not so sensitive.
It is interesting to note that although both local context
analysis and local feedback find concepts from top ranked
passages/documents, the overlap of the concepts chosen by
them is very small. On TREC4, the average number of
unique terms in the expansion concepts per query is 58 by
local feedback and 78 by local context analysis. The average
overlap per query is only 17.6 terms. This means local
context analysis and local feedback are two quite different
query expansion techniques. Some queries expanded quite
differently are improved by both methods. For example, the
expansion overlap for query 214 of TREC4 (“What are the
different techniques used to create self-induced hypnosis”) is
19 terms, yet both methods improve the query significantly.
`
`7 Conclusion and Future Work
`
This paper compares the retrieval effectiveness of three automatic
query expansion techniques: global analysis, local
feedback and local context analysis. Experimental results
on three collections show that local document analysis (local
feedback and local context analysis) is more effective than
global document analysis. The results also show that local
context analysis, which uses some global analysis techniques
on the local document set, outperforms simple local feedback
in terms of retrieval effectiveness and predictability.
We will continue our work in these aspects:

1. Local context analysis: automatically determine how
many passages to use, how many concepts to add to
the query and how to assign the weights to them on a
query by query basis. Currently the parameter values
are decided experimentally and fixed for all queries.
`
2. Phrasefinder: a new metric for selecting concepts.
Currently Phrasefinder uses INQUERY's belief function,
which is not designed to select concepts. We
`
`6
`
`
`
Table 5: A comparison of baseline, Phrasefinder, local feedback and local context analysis on TREC3. 10 documents for local feedback (lf-10doc); 100 passages for local context analysis (lca-100p)
`
Table 6: A comparison of baseline, local feedback and local context analysis on WEST. 10 documents for local feedback with weights for expansion units downweighted by 50% (lf-10doc-dw0.5); 100 passages for local context analysis with weight for auxiliary query set to 1.0 (lca-100p-w1.0). Averages: 53.8 (baseline), 50.1 (-7.0%) for lf-10doc-dw0.5, 55.6 (+3.3%) for lca-100p-w1.0
`
`hope a better metric will improve the performance of
`Phrasefinder.
`
`8 Acknowledgements
`
We thank Dan Nachbar and James Allan for their help during
this research. This research is supported in part by the
NSF Center for Intelligent Information Retrieval at the University
of Massachusetts, Amherst.
This material is based on work supported in part by
NRaD Contract Number N66001-94-D-6054. Any opinions,
findings and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily
reflect those of the sponsor.
`
`References
`
[Attar & Fraenkel, 1977] Attar, R., & Fraenkel, A. S.
(1977). Local Feedback in Full-Text Retrieval Systems.
Journal of the Association for Computing Machinery,
24(3), 397-417.
`
[Buckley et al., 1996] Buckley, C., Singhal, A., Mitra, M.,
& Salton, G. (1996). New Retrieval Approaches Using
SMART: TREC 4. In Harman, D., editor, Proceedings of
the TREC 4 Conference. National Institute of Standards
and Technology Special Publication. To appear.
`
`[Caid et al., 1993] Caid, B., Gallant, S., Carleton, J., &
`Sudbeck, D. (1993). HNC Tipster Phase I Final Report.
`In Proceedings of Tipster Text Program (Phase I), pp. 69-
`92.
`
[Callan et al., 1995] Callan, J., Croft, W. B., & Broglio,
J. (1995). TREC and TIPSTER experiments with INQUERY.
Information Processing and Management, pp. 327-343.
`
[Callan, 1994] Callan, J. P. (1994). Passage-level evidence
in document retrieval. In Proceedings of ACM SIGIR
International Conference on Research and Development in
Information Retrieval, pp. 302-310.
`
[Croft et al., 1995] Croft, W. B., Cook, R., & Wilder, D.
(1995). Providing Government Information on the Internet:
Experiences with THOMAS. In Digital Libraries
Conference DL'95, pp. 19-24.
`
[Croft & Harper, 1979] Croft, W. B., & Harper, D. J.
(1979). Using probabilistic models of document retrieval
without relevance information. Journal of Documentation,
35, 285-295.
`
[Crouch & Yang, 1992] Crouch, C. J., & Yang, B. (1992).
Experiments in automatic statistical thesaurus construction.
In Proceedings of ACM SIGIR International Conference
on Research and Development in Information Retrieval,
pp. 77-88.
`
`10
`
`7
`
`
`
`‘
`
Table 7: Performance of local feedback using 11 point average precision, varying the number of documents used for feedback

Table 8: Downweighting expansion concepts of local feedback by 50% on WEST
`
[Deerwester et al., 1990] Deerwester, S., Dumais, S., Furnas,
G., Landauer, T., & Harshman, R. (1990). Indexing
by latent semantic analysis. Journal of the American
Society for Information Science, 41(6), 391-407.