Query Expansion Using Local and Global Document Analysis
`
`Jinxi Xu and W. Bruce Croft
`Center for Intelligent Information Retrieval
`Computer Science Department
`University of Massachusetts, Amherst
`Amherst, MA 01003-4610, USA
`xu@cs.umass.edu croft@cs.umass.edu
`
`Abstract
`
Automatic query expansion has long been suggested as a
technique for dealing with the fundamental issue of word
mismatch in information retrieval. A number of approaches
to expansion have been studied and, more recently, attention
has focused on techniques that analyze the corpus to discover
word relationships (global techniques) and those that analyze
documents retrieved by the initial query (local feedback). In
this paper, we compare the effectiveness of these approaches
and show that, although global analysis has some advantages,
local analysis is generally more effective. We also show that
using global analysis techniques, such as word context and
phrase structure, on the local set of documents produces re-
sults that are both more effective and more predictable than
simple local feedback.
`
1 Introduction
`
`The problem of word mismatch is fundamental to informa-
`tion retrieval. Simply stated, it means that people often use
`different words to describe concepts in their queries than au-
`thors use to describe the same concepts in their documents.
`The severity of the problem tends to decrease as queries
`get longer, since there is more chance of some important
`words co-occurring in the query and relevant documents.
`In many applications, however, the queries are very short.
`For example, applications that provide searching across the
`World-Wide Web typically record average query lengths of
`two words [Croft et al., 1995]. Although this may be one ex-
`treme in terms of IR applications, it does indicate that most
IR queries are not long and that techniques for dealing with
`word mismatch are needed.
An obvious approach to solving this problem is query
expansion. The query is expanded using words or phrases
with similar meaning to those in the query, and the chances
of matching words in relevant documents are therefore in-
creased. This is the basic idea behind the use of a thesaurus
`
Permission to make digital/hard copy of all or part of this work for per-
sonal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage, the copy-
right notice, the title of the publication and its date appear, and notice
is given that copying is by permission of ACM, Inc. To copy otherwise,
to republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or fee.
SIGIR '96, Zurich, Switzerland © 1996 ACM 0-89791-792-8/96/08 $3.50
`
in query formulation. There is, however, little evidence that
`a general thesaurus is of any use in improving the effec-
`tiveness of the search, even if words are selected by the
`searchers [Voorhees, 1994].
`Instead, it has been proposed
`that by automatically analyzing the text of the corpus be-
`ing searched, a more effective thesaurus or query expansion
`technique could be produced.
`One of the earliest studies of this type was carried out
`by Sparck Jones [Sparck Jones, 1971] who clustered words
`based on co-occurrence in documents and used those clus-
ters to expand the queries. A number of similar studies
`followed but it was not until recently that consistently pos-
`itive results have been obtained. The techniques that have
`been used recently can be described as being based on either
`global or local analysis of the documents in the corpus being
searched. The global techniques examine word occurrences
`and relationships in the corpus as a whole, and use this in-
`formation to expand any particular query. Given their focus
`on analyzing the corpus, these techniques are extensions of
Sparck Jones' original approach.
`Local analysis, on the other hand, involves only the top
`ranked documents retrieved by the original query. We have
`called it local because the techniques are variations of the
`original work on local feedback [Attar & Fraenkel, 1977,
`Croft & Harper, 1979]. This work treated local feedback as
`a special case of relevance feedback where the top ranked
`documents were assumed to be relevant. Queries were both
`reweighted and expanded based on this information.
Both global and local analysis have the advantage of ex-
panding the query based on all the words in the query. This
is in contrast to a thesaurus-based approach where individ-
ual words and phrases in the query are expanded and word
ambiguity is a problem. Global analysis is inherently more
`expensive than local analysis. On the other hand, global
`analysis provides a thesaurus-like resource that can be used
`for browsing without searching, and retrieval results with
`local feedback on small test collections were not promising.
`More recent results with the TREC collection, however,
`indicate that local feedback approaches can be effective and,
`in some cases, outperform global analysis techniques. In this
`paper, we compare these approaches using different query
`sets and corpora.
`In addition, we propose and evaluate a
`new technique which borrows ideas from global analysis,
`such as the use of context and phrase structure, but applies
them to the local document set. We call the new technique
`local context analysis to distinguish it from local feedback.
`In the next section, we describe the global analysis pro-
cedure used in these experiments, which is the Phrasefinder
component of the INQUERY retrieval system [Jing & Croft,
1994]. Section 3 covers the local analysis procedures. The
`local feedback technique is based on the most successful ap-
proaches from the recent TREC conference [Harman, 1996].
`Local context analysis is described in detail.
`The experiments and results are presented in section 4.
`Both the TREC [Harman, 1995] and WEST [Turtle, 1994]
`collections are used in order to compare results in differ-
ent domains. A number of experiments with local context
`analysis are reported to show the effect of parameter varia-
`tions on this new technique. The other techniques are run
`using established parameter settings. In the comparison of
`global and local techniques, both recall/precision averages
`and query-by-query results are used. The latter evaluation
`is particularly useful to determine the robustness of the tech-
`niques, in terms of how many queries perform substantially
`worse after expansion.
`In the final section, we summarize
`the results and suggest future work.
`
`2 Global Analysis
`
`The global analysis technique we describe here has been used
`in the INQUERY system in TREC evaluations and other
`applications [Jing & Croft, 1994, Callan et al., 1995], and
`was one of the first techniques to produce consistent effec-
`tiveness improvements through automatic expansion. Other
`researchers have developed similar approaches [Qiu & Frei,
1993, Schütze & Pedersen, 1994] and have also reported good
`results.
`The basic idea in global analysis is that the global con-
`text of a concept can be used to determine similarities be-
`tween concepts. Context can be defined in a number of ways,
`as can concepts. The simplest definitions are that all words
`are concepts (except perhaps stop words) and that the con-
text for a word is all the words that co-occur in documents
`with that word. This is the approach used by [Qiu & Frei,
`1993], and the analysis produced is related to the represen-
`tations generated by other dimensionality-reduction tech-
niques [Deerwester et al., 1990, Caid et al., 1993]. The
essential difference is that global analysis is only used for
query expansion and does not replace the original word-
`based document representations. Reducing dimensions in
`the document representation leads to problems with preci-
`sion. Another related approach uses clustering to determine
`the context for document analysis [Crouch & Yang, 1992].
In the Phrasefinder technique used with INQUERY, the
basic definition for a concept is a noun group, and the con-
text is defined as the collection of fixed-length windows sur-
rounding the concepts. A noun group (phrase) is either a
single noun, two adjacent nouns or three adjacent nouns.
Typical effective window sizes are from 1 to 3 sentences.
One way of visualizing the technique, although not the most
efficient way of implementing it, is to consider every concept
(noun group) to be associated with a pseudo-document. The
contents of the pseudo-document for a concept are the words
that occur in every window for that concept in the corpus.
For example, the concept airline pilot might have the words
pay, strike, safety, air, traffic and FAA occurring frequently
in the corresponding pseudo-document, depending on the
corpus being analyzed. An INQUERY database is built from
these pseudo-documents, creating a concept database. A fil-
tering step is used to remove words that are too frequent or
too rare, in order to control the size of the database.
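As an illustration, the pseudo-document construction described above can be sketched as follows. This is a minimal sketch under stated assumptions: concepts are single words and the window is a fixed token span, rather than INQUERY's noun-group extraction and sentence-based windows:

```python
from collections import Counter, defaultdict

def build_pseudo_documents(docs, concepts, window=3):
    # For each concept, accumulate the words that co-occur with it
    # inside a fixed-size window (here +/- `window` tokens, standing
    # in for the 1-3 sentence windows used by Phrasefinder).
    pseudo = defaultdict(Counter)
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok in concepts:
                lo, hi = max(0, i - window), i + window + 1
                pseudo[tok].update(tokens[lo:i] + tokens[i + 1:hi])
    return pseudo

docs = [["airline", "pilot", "strike", "over", "pay"],
        ["faa", "reviews", "airline", "pilot", "safety"]]
pd = build_pseudo_documents(docs, {"pilot"})
# pd["pilot"] now counts context words such as "strike", "pay", "safety"
```

Indexing these pseudo-documents with a standard retrieval engine then yields the concept database; the filtering of too-frequent and too-rare words would be applied to the counters before indexing.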
To expand a query, it is run against the concept database
using INQUERY, which will generate a ranked list of phrasal
concepts as output, instead of the usual list of document
names. Document and collection-based weighting of match-
ing words is used to determine the concept ranking, in a
similar way to document ranking. Some of the top-ranking
phrases from the list are then added to the query and
weighted appropriately. In the Phrasefinder queries used
in this paper, 30 phrases are added into each query and are
downweighted in proportion to their rank position. Phrases
containing only terms in the original query are weighted
more heavily than those containing terms not in the origi-
nal query.
Figure 1 shows the top 30 concepts retrieved by
Phrasefinder for the TREC4 query 214 “What are the differ-
ent techniques used to create self-induced hypnosis”. While
some of the concepts are reasonable, others are difficult to
understand. This is due to a number of spurious matches
with noncontent words in the query.
The main advantages of a global analysis approach like
the one used in INQUERY are that it is relatively robust, in
that the average performance of queries tends to improve us-
ing this type of expansion, and it provides a thesaurus-like
resource that can be used for browsing or other types of
concept search. The disadvantages of this approach are that
it can be expensive in terms of disk space and computer
time to do the global context analysis and build the search-
able database, and individual queries can be significantly
degraded by expansion.
`
3 Local Analysis
`
3.1 Local Feedback
`
The general concept of local feedback dates back at least
to a 1977 paper by Attar and Fraenkel [Attar & Fraenkel,
1977]. In this paper, the top ranked documents for a query
were proposed as a source of information for building an
automatic thesaurus. Terms in these documents were clus-
tered and treated as quasi-synonyms.
`In [Croft & Harper,
`1979], information from the top ranked documents is used to
`re-estimate the probabilities of term occurrence in the rel-
`evant set for a query. In other words, the weights of query
`terms would be modified but new terms were not added.
`This experiment produced effectiveness improvements, but
`was only carried out on a small test collection.
Experiments carried out with other standard small col-
lections did not give promising results. Since the simple
`version of this technique consists of adding common words
`from the top-ranked documents to the original query, the
`effectiveness of the technique is obviously highly influenced
`by the proportion of relevant documents in the high ranks.
`Queries that perform poorly and retrieve few relevant doc-
`uments would seem likely to perform even worse after local
`feedback, since most words added to the query would come
`from non-relevant documents.
`In recent TREC conferences, however, simple local feed-
`back techniques appear to have performed quite well. In this
`paper, we expand using a procedure similar to that used by
the Cornell group in TREC 3 & 4 [Buckley et al., 1996].
`The most frequent 50 terms and 10 phrases (pairs of adja-
`cent non stop words) from the top ranked documents are
added to the query. The terms in the query are reweighted
using the Rocchio formula with α : β : γ = 1 : 1 : 0.
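A minimal sketch of this reweighting step, under stated assumptions: documents are term-frequency vectors, the top ranked documents are treated as the relevant set, and γ = 0 means no non-relevant set is used (the SMART term/phrase selection details are omitted):

```python
from collections import Counter

def rocchio(query_weights, top_docs, alpha=1.0, beta=1.0, gamma=0.0):
    # Rocchio reweighting with alpha:beta:gamma = 1:1:0 -- the original
    # query weights plus the centroid of the assumed-relevant documents.
    new_q = Counter()
    for term, w in query_weights.items():
        new_q[term] += alpha * w
    for doc in top_docs:
        for term, tf in doc.items():
            new_q[term] += beta * tf / len(top_docs)
    # gamma == 0: the non-relevant-document term is dropped entirely.
    return new_q

q = {"hypnosis": 1.0}
top = [Counter({"hypnosis": 2, "trance": 1}), Counter({"trance": 3})]
new_q = rocchio(q, top)
# "trance" enters the expanded query; "hypnosis" is reweighted upward
```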
`Figure 2 shows terms and phrases added by local feed-
`back to the same query used in the previous section. In this
`case, the terms in the query are stemmed.
One advantage of local feedback is that it can be rela-
tively efficient to do expansion based on high ranking doc-
uments. It may be slightly slower at run-time than, for
`
[Figure 2 term list: stemmed terms such as psychosomat, mesmer,
spiegel, hallucin, anesthesia; phrases such as hypnot-state,
franz-mesmer, psychosomat-medicin, hypnotiz-peopl]

Figure 2: Local feedback terms and phrases for TREC4 query 214
`
`example, Phrasefinder, but needs no thesaurus construction
`phase. Local feedback requires an extra search and access
to document information. If document information is stored
only for this purpose, then this should be counted as a space
overhead for the technique, but it is likely to be significantly
less than a concept database. A disadvantage currently is
that it is not clear how well this technique will work with
queries that retrieve few relevant documents.
`
`3.2 Local Context Analysis
Local context analysis is a new technique which combines
global analysis and local feedback. Like Phrasefinder, noun
groups are used as concepts and concepts are selected based
on co-occurrence with query terms. Concepts are chosen
from the top ranked documents, similar to local feedback,
but the best passages are used instead of whole documents.
The standard INQUERY ranking is not used in this tech-
nique.
Below are the steps to use local context analysis to ex-
pand a query Q on a collection.
`
1. Use a standard IR system (INQUERY) to retrieve the
top n ranked passages. A passage is a text window
of a fixed size (300 words in these experiments [Callan,
1994]).
There are two reasons that we use passages rather than
`documents. Since documents can be very long and
`
`about multiple topics, a co-occurrence of a concept at
`the beginning and a term at the end of a long docu-
`ment may mean nothing. It is also more efficient to
`use passages because we can eliminate the cost of pro-
`cessing the unnecessary parts of the documents.
`
2. Concepts (noun phrases) in the top n passages are
ranked according to the formula
`
bel(Q, c) = ∏_{t_i ∈ Q} (δ + log(af(c, t_i)) · idf_c / log(n))^{idf_i}

where

af(c, t_i) = Σ_{j=1..n} tf_{i,j} · f_{c,j}
idf_i = max(1.0, log10(N / N_i) / 5.0)
idf_c = max(1.0, log10(N / N_c) / 5.0)

c is a concept;
tf_{i,j} is the number of occurrences of t_i in passage p_j;
f_{c,j} is the number of occurrences of c in passage p_j;
N is the number of passages in the collection;
N_i is the number of passages containing t_i;
N_c is the number of passages containing c;
δ is 0.1 in this paper, to avoid zero bel values.

The above formula is a variant of the tf.idf measure
used by most IR systems. In the formula, the af part
`
rewards concepts co-occurring frequently with query
terms, the idf_c part penalizes concepts occurring fre-
quently in the collection, and the idf_i part emphasizes in-
frequent query terms. Multiplication is used to em-
phasize co-occurrence with all query terms.
`
3. Add the m top ranked concepts to Q using the following
formula:

Q_new = #WSUM(1.0  1.0 Q  w Q_1)
Q_1 = #WSUM(1.0  w_1 c_1  w_2 c_2 ... w_m c_m)

In our experiments, m is set to 70 and w_i is set to
1.0 − 0.9 · i/70. Unless specified otherwise, w is set to
2.0. We call Q_1 the auxiliary query. #WSUM is an
INQUERY query operator which computes a weighted
average of its components.
`
Figure 3 shows the top 30 concepts added by local con-
`text analysis to TREC4 query 214.
`Local context analysis has several advantages. It is com-
`putationally practical. For each collection, we only need a
`single pass to collect the collection frequencies for the terms
`and noun phrases. This pass takes about 3 hours on an
`Alpha workstation for the TREC4 collection. The major
overhead to expand a query is an extra search to retrieve
`the top ranked passages. On a modern computer system,
`this overhead is reasonably small. Once the top ranked
`passages are available, query expansion is fast; when 100
`passages are used, our current implementation requires only
`several seconds of CPU time to expand a TREC4 query.
`So local context analysis is practical even for interactive
`applications. For queries containing proximity constraints
`(e.g. phrases), Phrasefinder may add concepts which co-
`occur with all query terms but do not satisfy proximity con-
`straints. Local context analysis does not have such a prob-
`lem because the top ranked passages are retrieved using the
`original query. Because it does not filter out frequent con-
`cepts, local context analysis also has the advantage of using
frequent but potentially good expansion concepts. A disad-
`vantage of local context analysis is that it may require more
`time to expand a query than Phrasefinder.
`
`4 Experiments
`
`4.1 Collections and Query Sets
`
Experiments are carried out on 3 collections: TREC3, which
comprises Tipster 1 and 2 datasets with 50 queries (topics
151-200), TREC4, which comprises Tipster 2 and 3 datasets
with 49 queries (topics 202-250), and WEST with 34 queries.
TREC3 and TREC4 (about 2 GB each) are much larger
and more heterogeneous than WEST. The average docu-
ment length of the TREC documents is only 1/7 of that of
the WEST documents. The average number of relevant doc-
uments per query with the TREC collections is much larger
than that of WEST. Table 1 lists some statistics about the
collections and the query sets. Stop words are not included.
`
4.2 Local Context Analysis
Table 2 shows the performance of local context analysis on
the three collections. 70 concepts are added into each query
using the expansion formula in section 3.2.
Local context analysis performs very well on TREC3 and
TREC4. All runs produce significant improvements over
the baseline on the TREC collections. The best run on
`
TREC4 (100 passages) is 23.5% better than the baseline.
The best run on TREC3 (200 passages) is 24.4% better than
the baseline. On WEST, the improvements over the baseline
are not as good as on TREC3 and TREC4. With too many
passages, the performance is even worse than the baseline.
The high baseline of the WEST collection (53.8% average
precision) suggests that the original queries are of very good
quality and we should give them more emphasis. So we
downweight the expansion concepts by 50% by reducing the
weight of the auxiliary query Q_1 from 2.0 to 1.0. Table 3 shows
that downweighting the expansion concepts does improve
performance.
It is interesting to see how the number of passages used
affects retrieval performance. To see it more clearly, we
plot the performance curve on TREC4 in figure 4. Initially,
increasing the number of passages quickly improves perfor-
mance. The performance peaks at a certain point. After
staying relatively flat for a period, the performance curves
drop slowly when more passages are used. For TREC3 and
TREC4, the optimal number of passages is around 100,
while on WEST, the optimal number of passages is around
20. This is not surprising because the first two collections
are an order of magnitude larger than WEST. Currently we
do not know how to automatically determine the optimal
number of passages to use. Fortunately, local context anal-
ysis is relatively insensitive to the number of passages
used, especially for large collections like the TREC collec-
tions. On the TREC collections, between 30 and 300 pas-
sages produces very good retrieval performance.
`
5 Local Context Analysis vs Global Analysis
`
In this section we compare Phrasefinder and local context
analysis in terms of retrieval performance. Tables 4-5 com-
pare the retrieval performance of the two techniques on
the TREC collections. On both collections, local context
analysis is much better than Phrasefinder. On TREC3,
Phrasefinder is 7.8% better than the baseline while local
context analysis using the top ranked 100 passages is 23.3%
better than the baseline. On TREC4, Phrasefinder is only
3.4% better than the baseline while local context analysis
using the top ranked 100 passages is 23.5% better than the
baseline. In fact, all local context analysis runs in table 2 are
better than Phrasefinder on TREC3 and TREC4. On both
collections, Phrasefinder hurts the high-precision end while
local context analysis helps improve precision. The results
show that local context analysis is a better query expansion
technique than Phrasefinder.
We examine two TREC4 queries to show why
Phrasefinder is not as good as local context analysis. For
one example, “China” and “Iraq” are very good concepts
for TREC4 query “Status of nuclear proliferation treaties —
violations and monitoring”. They are added into the query
by local context analysis but not by Phrasefinder. It ap-
pears that they are filtered out by Phrasefinder because they
are frequent concepts. For the other example, Phrasefinder
added the concept “oil spill” to TREC4 query “As a result
of DNA testing, are more defendants being absolved or con-
victed of crimes”. This seems to be strange. It appears that
Phrasefinder did this because “oil spill” co-occurs with many
of the terms in the query, e.g., “result”, “test”, “defendant”,
“absolve” and “crime”. But “oil spill” does not co-occur
with “DNA”, which is a key element of the query. While
it is very hard to automatically determine which terms are
key elements of a query, the product function used by local
context analysis for selecting expansion concepts should be
`
[Figure 3 concept list: hypnosis, technique, brain, hallucination,
trance, suggestion, spiegel, van-dyke, among others]

Figure 3: Local Context Analysis concepts for query 214
[Table 1: statistics of the WEST, TREC3 and TREC4 collections
and query sets]
`
[Table 2: performance of local context analysis for varying numbers
of passages on TREC3, TREC4 and WEST]
`
`
`
Table 3: Downweighting the expansion concepts of local context analysis on WEST. The weight of the auxiliary query is
reduced to 1.0.
`
better than the sum function used by Phrasefinder, because
with the product function it is harder for some query terms
to dominate other query terms.
`
6 Local Context Analysis vs Local Feedback
`
`In this section we compare the retrieval performances of lo-
`cal feedback and local context analysis. Table 7 shows the
`retrieval performance of local feedback.
`Table 8 shows the result of downweighting the expansion
`concepts by 50% on WEST. The reason for this is to make
`a fair comparison with local context analysis. Remember
`that we also downweighted the expansion concepts of local
`context analysis by 50% on WEST.
Local feedback does very well on TREC3. The best run
produces a 20.5% improvement over the baseline, close to
the 24.4% of the best run of local context analysis. It is also
relatively insensitive to the number of documents used for
feedback on TREC3. Increasing the number of documents
from 10 to 50 does not affect performance much.
It also does well on TREC4. The best run produces a
14.0% improvement over the baseline, very significant, but
lower than the 23.5% of the best run of local context analy-
sis. It is very sensitive to the number of documents used for
feedback on TREC4. Increasing the number of documents
from 5 to 20 results in a big performance loss. In contrast,
local context analysis is relatively insensitive to the number
of passages on all three collections.
On WEST, local feedback does not work at all. With-
out downweighting the expansion concepts, it results in a
significant performance loss over all runs. Downweighting
the expansion concepts only reduces the amount of loss. It
is also sensitive to the number of documents used for feed-
back. Increasing the number of feedback documents results
in significantly more performance loss.
It seems that the performance of local feedback and its
sensitivity to the number of documents used for feedback
depend on the number of relevant documents in the col-
lection for the query. From table 1 we know that the average
number of relevant documents per query on TREC3 is 196,
larger than the 133 of TREC4, which is in turn larger than
the 29 of WEST. This corresponds to the relative performance
of local feedback on the collections.
Tables 4-6 show a side by side comparison between local
feedback and local context analysis at different recall levels
on the three collections. Top 10 documents are used for local
`
Figure 4: Performance curve of local context analysis on TREC4
`
Table 4: A comparison of baseline, Phrasefinder, local feedback and local context analysis on TREC4. 10 documents for local
feedback (lf-10doc), 100 passages for local context analysis (lca-100p).
`
`feedback and top 100 passages are used for local context
`analysis in these tables. In table 6 for WEST, the expansion
`concepts are downweighted by 50% for both local feedback
`and local context analysis.
We also made a query-by-query comparison of the best
run of local feedback and the best run of local context anal-
ysis on TREC4. Of 49 queries, local feedback hurts 21 and
improves 28, while local context analysis hurts 11 and im-
proves 38. Of the queries hurt by local feedback, 5 queries
have a more than 5% loss in average precision. The
worst case is query 232, whose average precision is reduced
from 24.8% to 4.3%. Of those hurt by local context analysis,
only one has a more than 5% loss in average precision.
Local feedback also tends to hurt queries with poor perfor-
mance. Of 9 queries with baseline average precision less than
5%, local feedback hurts 8 and improves 1. In contrast, lo-
cal context analysis hurts 4 and improves 5. Its tendency to
hurt “bad” queries and queries with few relevant documents
(such as the WEST queries) suggests that local feedback is
very sensitive to the number of relevant documents in the
top ranked documents. In comparison, local context analy-
sis is not so sensitive.
It is interesting to note that although both local context
analysis and local feedback find concepts from top ranked
passages/documents, the overlap of the concepts chosen by
them is very small. On TREC4, the average number of
unique terms in the expansion concepts per query is 58 by
local feedback and 78 by local context analysis. The aver-
age overlap per query is only 17.6 terms. This means local
context analysis and local feedback are two quite different
query expansion techniques. Some queries expanded quite
differently are improved by both methods. For example, the
expansion overlap for query 214 of TREC4 (“What are the
different techniques used to create self-induced hypnosis”) is
19 terms, yet both methods improve the query significantly.
`
`7 Conclusion and Future Work
`
This paper compares the retrieval effectiveness of three au-
tomatic query expansion techniques: global analysis, local
feedback and local context analysis. Experimental results
on three collections show that local document analysis (local
feedback and local context analysis) is more effective than
global document analysis. The results also show that local
context analysis, which uses some global analysis techniques
on the local document set, outperforms simple local feedback
in terms of retrieval effectiveness and predictability.
`We will continue our work in these aspects:
`
`1. local context analysis: automatically determine how
`many passages to use, how many concepts to add to
`the query and how to assign the weights to them on a
`query by query basis. Currently the parameter values
`are decided experimentally and fixed for all queries.
`
2. Phrasefinder: a new metric for selecting concepts.
Currently Phrasefinder uses INQUERY's belief func-
tion, which is not designed to select concepts. We
`(cid:25)
`
`
`
Table 5: A comparison of baseline, Phrasefinder, local feedback and local context analysis on TREC3. 10 documents for local
feedback (lf-10doc), 100 passages for local context analysis (lca-100p).
`
Table 6: A comparison of baseline, local feedback and local context analysis on WEST. 10 documents for local feedback with
weights for expansion units downweighted by 50% (lf-10doc-dw0.5), 100 passages for local context analysis with weight for
auxiliary query set to 1.0 (lca-100p-w1.0).
`
`hope a better metric will improve the performance of
`Phrasefinder.
`
`8 Acknowledgements
`
We thank Dan Nachbar and James Allan for their help dur-
ing this research. This research is supported in part by the
NSF Center for Intelligent Information Retrieval at Univer-
sity of Massachusetts, Amherst.
This material is based on work supported in part by
NRaD Contract Number N66001-94-D-6054. Any opinions,
findings and conclusions or recommendations expressed in
this material are the author(s)' and do not necessarily re-
flect those of the sponsor.
`
`References
`
[Attar & Fraenkel, 1977] Attar, R., & Fraenkel, A. S.
(1977). Local Feedback in Full-Text Retrieval Systems.
Journal of the Association for Computing Machinery,
24(3), 397-417.
`
[Buckley et al., 1996] Buckley, C., Singhal, A., Mitra, M.,
& Salton, G. (1996). New Retrieval Approaches Using
SMART: TREC 4. In Harman, D., editor, Proceedings of
the TREC 4 Conference. National Institute of Standards
and Technology Special Publication. To appear.
`
`[Caid et al., 1993] Caid, B., Gallant, S., Carleton, J., &
`Sudbeck, D. (1993). HNC Tipster Phase I Final Report.
In Proceedings of Tipster Text Program (Phase I), pp. 69-
92.
`
[Callan et al., 1995] Callan, J., Croft, W. B., & Broglio,
J. (1995). TREC and TIPSTER experiments with IN-
QUERY. Information Processing and Management, pp.
327-343.
`
`[Callan, 1994] Callan, J. P. (1994). Passage-level evidence
`in document retrieval. In Proceedings of ACM SIGIR In-
`ternational Conference on Research and Development in
`Information Retrieval, pp. 302-310.
`
[Croft et al., 1995] Croft, W. B., Cook, R., & Wilder, D.
(1995). Providing Government Information on the In-
ternet: Experiences with THOMAS. In Digital Libraries
Conference DL '95, pp. 19-24.
`
[Croft & Harper, 1979] Croft, W. B., & Harper, D. J.
(1979). Using probabilistic models of document retrieval
without relevance information. Journal of Documenta-
tion, 35, 285-295.
`[Crouch & Yang, 1992] Crouch, C. J., & Yang, B. (1992).
`Experiments in automatic statistical thesaurus construc-
`tion.
`In Proceedings of ACM SIGIR International Con-
`ference on Research and Development in Information Re-
`trieval, pp. 77-88.
`
`
`
`
`
Table 7: Performance of local feedback using 11-point average precision.
`
Table 8: Downweighting the expansion concepts of local feedback by 50% on WEST.
`
[Deerwester et al., 1990] Deerwester, S., Dumais, S., Fur-
nas, G., Landauer, T., & Harshman, R. (1990). Index-
ing by latent semantic analysis. Journal of the American
Society for Information Science, 41, 391-407.
`
[Harman, 1995] Harman, D. (1995). Overview of the Third
Text REtrieval Conference (TREC-3). In Harman, D., ed-
itor, Proceedings of the Third Text REtrieval Conference
(TREC-3), pp. 1-20. NIST Special Publication 500-225.
`
[Harman, 1996] Harman, D., editor (1996). Proceedings of
the TREC 4 Conference. National Institute of Standards
and Technology Special Publication. To appear.
`
[Jing & Croft, 1994] Jing, Y., & Croft, W. B. (1994). An
`association thesaurus fo