
I, Julio Gonzalo, state and declare as follows:

1. I am a professor at Universidad Nacional de Educación a Distancia (UNED) and a member of the UNED group in Natural Language Processing and Information Retrieval.

2. I have personal knowledge of the facts set forth herein.

3. The Association for Computational Linguistics (ACL) is the premier international scientific and professional society for people working on computational linguistics or natural language processing (NLP). ACL holds an annual meeting, which is one of the leading conferences of the field.
`
4. In 1998, the 36th Annual Meeting of the Association for Computational Linguistics (ACL'98) and the 17th International Conference on Computational Linguistics (COLING'98) were held as a single, major joint conference (COLING-ACL'98) on the campus of Université de Montréal in Montreal, Quebec, Canada.

5. The COLING-ACL'98 conference was a landmark event because it was the second combined conference held by COLING and ACL.

6. I attended the conference as one of the presenters at the "Usage of WordNet in Natural Language Processing Systems" workshop on August 16, 1998. The focus of this workshop was to bring together researchers who use WordNet in natural language processing applications. My presentation was titled "Indexing with WordNet Synsets can Improve Text Retrieval."
`
7. Attached as Exhibit A is a true and correct copy of excerpts of the proceedings of the COLING-ACL'98 workshop on "Usage of WordNet in Natural Language Processing Systems," obtained from the Association for Computational Linguistics (ACL) (https://www.aclweb.org/anthology/W98-0700.pdf). Page 1 of Exhibit A shows the workshop date of August 16, 1998, page 2 includes a copyright date of 1998, and pages 4-5 identify the program schedule and the workshop papers. A true and correct copy of my paper titled "Indexing with WordNet synsets can improve text retrieval" is included in Exhibit A at pages 8-14, obtained from ACL (https://www.aclweb.org/anthology/W98-0705.pdf).
`
8. Paper copies of the workshop papers were made available to attendees during the conference, and additional copies of the proceedings could be ordered directly from ACL. In addition, all of the conference papers were made available on the conference Web site.

9. I also submitted a copy of my paper to the public archive arxiv.org, hosted by Cornell University, on August 5, 1998.
`
10. Attached as Exhibit B is a true and correct copy of Cornell University's bibliographic data for my paper titled "Indexing with WordNet synsets can improve text retrieval," printed from Cornell University (https://arxiv.org/abs/cmp-lg/9808002v1). Page 1 of Exhibit B shows that I submitted my paper on Wednesday, August 5, 1998, at 14:13:08 UTC. A true and correct copy of my paper is included in Exhibit B at pages 3-9, obtained from Cornell University (https://arxiv.org/pdf/cmp-lg/9808002v1.pdf). Page 3 of Exhibit B identifies the submission date on the first page of my paper.
`
I declare that the foregoing is true and correct.

Executed on March 27, 2020.
`
Exhibit A
Usage of WordNet in Natural Language Processing Systems

Proceedings of the Workshop

Edited by Sanda Harabagiu

16th August 1998
Université de Montréal
Montreal, Quebec, Canada

COLING-ACL '98
[Exhibit A, page 1 of 14]
© 1998 Université de Montréal. Additional copies of these proceedings may be ordered while stocks last from:

Association for Computational Linguistics (ACL)
75 Paterson Street, Suite 9
New Brunswick, NJ 08901 USA
Tel: +1-732-342-9100
Fax: +1-732-873-0014
rasmusse@cs.rutgers.edu
[Exhibit A, page 2 of 14]
Program and Organizing Committees

PROGRAM COMMITTEE:

Alan Biermann (Duke University)
Joyce Chai (Duke University)
Martin Chodorow (New York University)
Christiane Fellbaum (Princeton University)
Fernando Gomez (University of Central Florida)
Ken Haase (MIT)
Sanda Harabagiu (SRI International)
Marti Hearst (University of California, Berkeley)
Graeme Hirst (University of Toronto)
Claudia Leacock (Educational Testing Service)
Mitch Marcus (University of Pennsylvania)
George A. Miller (Princeton University)
Dan Moldovan (Southern Methodist University)
Hwee Tou Ng (DSO National Laboratories, Singapore)
Philip Resnik (University of Maryland)
Yorick Wilks (University of Sheffield)

ORGANIZING COMMITTEE:

Sanda Harabagiu (SRI International)
Joyce Yue Chai (Duke University)
[Exhibit A, page 3 of 14]
Program

9:00-9:05   Opening
9:05-9:20   Introductory Talk by Dr. George Miller

Session 1: Semantic Disambiguation using WordNet

9:20-9:40   Jiri Stetina, Sadao Kurohashi and Makoto Nagao (Kyoto University, Japan): General Word Sense Disambiguation Method based on a Full Sentential Context
9:40-10:00  Eric Siegel (Columbia University, USA): Disambiguating Verbs with the WordNet Category of the Direct Object
10:00-10:20 Coffee Break
10:20-10:40 Rada Mihalcea and Dan I. Moldovan (Southern Methodist University, USA): Word Sense Disambiguation based on Semantic Density
10:40-11:00 Janyce Wiebe, Tom O'Hara and Rebecca Bruce (New Mexico State University and University of North Carolina, USA): Constructing Bayesian Networks from WordNet for Word-Sense Disambiguation: Representational and Processing Issues

Session 2: Usage of WordNet for Information Retrieval and Text Classification

11:00-11:20 Rila Mandala, Tokunaga Takenobu and Tanaka Hozumi (Tokyo Institute of Technology, Japan): The Use of WordNet in Information Retrieval
11:20-11:40 Julio Gonzalo, Felisa Verdejo, Irina Chugur and Juan Cigarrán (UNED, Spain): Indexing with WordNet Synsets can Improve Text Retrieval
11:40-12:00 Sam Scott and Stan Matwin (University of Ottawa, Canada): Text Classification using WordNet Hypernyms
12:00-13:00 Lunch Break

Session 3: WordNet Augmentations and Construction

13:00-13:20 Christiane Fellbaum (Rider University and Princeton University, USA): Towards a Representation of Idioms in WordNet
13:20-13:40 Fernando Gomez (University of Central Florida, USA): Linking WordNet Verb Classes to Semantic Interpretation
13:40-14:00 Xavier Farreres, German Rigau and Horacio Rodriguez (Universitat Politècnica de Catalunya, Spain): Using WordNet for Building WordNets
14:00-14:20 Oi Yee Kwong (University of Cambridge, UK): Aligning WordNet with Additional Lexical Resources
14:20-14:40 Roberto Basili, Alessandro Cucchiarelli, Carlo Consoli, Maria Teresa Pazienza and Paola Velardi (Università di Roma Tor Vergata, Università di Ancona and Università di Roma La Sapienza, Italy): Automatic Adaptation of WordNet to Sublanguages and Computational Tasks
14:40-15:00 Simonetta Montemagni and Vito Pirelli (CNR, Italy): Augmenting WordNet-like Lexical Resources with Distributional Evidence. An Application-Oriented Perspective
15:00-15:20 Coffee Break

Session 4: Ontologies based on WordNet

15:20-15:40 Tom O'Hara, Kavi Mahesh and Sergei Nirenburg (CRL, New Mexico State University, USA): Lexical Acquisition with WordNet and the Mikrokosmos Ontology
15:40-16:00 Alistair E. Campbell and Stuart C. Shapiro (State University of New York at Buffalo, USA): Algorithms for Ontological Mediation
16:00-16:20 Noriko Tomuro (DePaul University): Semi-automatic Induction of Systematic Polysemy from WordNet
[Exhibit A, page 4 of 14]
16:20-16:40 Michael McHale (Air Force Research Laboratory, USA): A Comparison of WordNet and Roget's Taxonomy for Measuring Semantic Similarity

Session 5: Other Applications of WordNet

16:40-17:00 Yuval Krymolowski and Dan Roth (Bar-Ilan University, Israel and University of Illinois, USA): Incorporating Knowledge in Natural Language Learning: A Case Study
17:00-17:20 Hongyan Jing (Columbia University, USA): Usage of WordNet in Natural Language Generation
17:20-17:40 Doug Beeferman (Carnegie Mellon University, USA): Lexical Discovery with an Enriched Semantic Network
17:40-18:00 Sanda M. Harabagiu (SRI International, USA): Deriving Metonymic Coercions from WordNet
[Exhibit A, page 5 of 14]
Table of Contents

Program and Organizing Committees ... i
Program ... ii
Table of Contents ... iv
Author Index ... vi

Workshop Papers

Jiri Stetina, Sadao Kurohashi and Makoto Nagao
  General Word Sense Disambiguation Method Based on a Full Sentential Context ... 1
Eric V. Siegel
  Disambiguating Verbs with the WordNet Category of the Direct Object ... 9
Rada Mihalcea and Dan I. Moldovan
  Word Sense Disambiguation based on Semantic Density ... 16
Janyce Wiebe, Tom O'Hara and Rebecca Bruce
  Constructing Bayesian Networks from WordNet for Word-Sense Disambiguation: Representational and Processing Issues ... 23
Rila Mandala, Takenobu Tokunaga and Hozumi Tanaka
  The Use of WordNet in Information Retrieval ... 31
Julio Gonzalo, Felisa Verdejo, Irina Chugur and Juan Cigarrán
  Indexing with WordNet Synsets can Improve Text Retrieval ... 38
Sam Scott and Stan Matwin
  Text Classification using WordNet Hypernyms ... 45
Christiane Fellbaum
  Towards a Representation of Idioms in WordNet ... 52
Fernando Gomez
  Linking WordNet Verb Classes to Semantic Interpretation ... 58
Xavier Farreres, German Rigau and Horacio Rodriguez
  Using WordNet for Building WordNets ... 65
Oi Yee Kwong
  Aligning WordNet with Additional Lexical Resources ... 73
Roberto Basili, Alessandro Cucchiarelli, Carlo Consoli, Maria Teresa Pazienza and Paola Velardi
  Automatic Adaptation of WordNet to Sublanguages and to Computational Tasks ... 80
Simonetta Montemagni and Vito Pirelli
  Augmenting WordNet-like Lexical Resources with Distributional Evidence. An Application-Oriented Perspective ... 87
Tom O'Hara, Kavi Mahesh and Sergei Nirenburg
  Lexical Acquisition with WordNet and the Mikrokosmos Ontology ... 94
Alistair E. Campbell and Stuart C. Shapiro
  Algorithms for Ontological Mediation ... 102
Noriko Tomuro
  Semi-automatic Induction of Systematic Polysemy from WordNet ... 108
Michael L. McHale
  A Comparison of WordNet and Roget's Taxonomy for Measuring Semantic Similarity ... 115
Yuval Krymolowski and Dan Roth
  Incorporating Knowledge in Natural Language Learning: A Case Study ... 121
[Exhibit A, page 6 of 14]
Author Index

Basili, R. ... 80
Beeferman, D. ... 135
Bruce, R. ... 23
Campbell, A.E. ... 102
Chugur, I. ... 38
Cigarrán, J. ... 38
Consoli, C. ... 80
Cucchiarelli, A. ... 80
Farreres, X. ... 65
Fellbaum, C. ... 52
Gomez, F. ... 58
Gonzalo, J. ... 38
Harabagiu, S.M. ... 142
Jing, H. ... 128
Krymolowski, Y. ... 121
Kurohashi, S. ... 1
Kwong, O.Y. ... 73
Mahesh, K. ... 94
Mandala, R. ... 31
Matwin, S. ... 45
McHale, M.L. ... 115
Mihalcea, R. ... 16
Moldovan, D.I. ... 16
Montemagni, S. ... 87
Nagao, M. ... 1
Nirenburg, S. ... 94
O'Hara, T. ... 23, 94
Pazienza, M.T. ... 80
Pirelli, V. ... 87
Rigau, G. ... 65
Rodriguez, H. ... 65
Roth, D. ... 121
Scott, S. ... 45
Shapiro, S.C. ... 102
Siegel, E.V. ... 9
Stetina, J. ... 1
Tanaka, H. ... 31
Tokunaga, T. ... 31
Tomuro, N. ... 108
Velardi, P. ... 80
Verdejo, F. ... 38
Wiebe, J. ... 23
[Exhibit A, page 7 of 14]
Indexing with WordNet synsets can improve text retrieval

Julio Gonzalo, Felisa Verdejo, Irina Chugur and Juan Cigarrán
UNED, Ciudad Universitaria, s.n., 28040 Madrid, Spain
{julio, felisa, irina, juanci}@ieec.uned.es

Abstract

The classical, vector space model for text retrieval is shown to give better results (up to 29% better in our experiments) if WordNet synsets are chosen as the indexing space, instead of word forms. This result is obtained for a manually disambiguated test collection (of queries and documents) derived from the SEMCOR semantic concordance. The sensitivity of retrieval performance to (automatic) disambiguation errors when indexing documents is also measured. Finally, it is observed that if queries are not disambiguated, indexing by synsets performs (at best) only as well as standard word indexing.

1 Introduction

Text retrieval deals with the problem of finding all the relevant documents in a text collection for a given user's query. A large-scale semantic database such as WordNet (Miller, 1990) seems to have great potential for this task. There are at least two obvious reasons:

- It offers the possibility to discriminate word senses in documents and queries. This would prevent matching spring in its "metal device" sense with documents mentioning spring in the sense of springtime, and retrieval accuracy could thus be improved.
- WordNet provides the chance of matching semantically related words. For instance, spring, fountain, outflow and outpouring, in the appropriate senses, can be identified as occurrences of the same concept, 'natural flow of ground water'. And beyond synonymy, WordNet can be used to measure semantic distance between occurring terms to get more sophisticated ways of comparing documents and queries.

However, the general feeling within the information retrieval community is that dealing explicitly with semantic information does not significantly improve the performance of text retrieval systems. This impression is founded on the results of some experiments measuring the role of Word Sense Disambiguation (WSD) for text retrieval, on one hand, and some attempts to exploit the features of WordNet and other lexical databases, on the other hand.

In (Sanderson, 1994), word sense ambiguity is shown to produce only minor effects on retrieval accuracy, apparently confirming that query/document matching strategies already perform an implicit disambiguation. Sanderson also estimates that if explicit WSD is performed with less than 90% accuracy, the results are worse than not disambiguating at all. In his experimental setup, ambiguity is introduced artificially in the documents, substituting randomly chosen pairs of words (for instance, banana and kalashnikov) with artificially ambiguous terms (banana/kalashnikov). While his results are very interesting, it remains unclear, in our opinion, whether they would be corroborated with real occurrences of ambiguous words. There is also another minor weakness in Sanderson's experiments: when he "disambiguates" a term such as spring/bank to get, for instance, bank, he has done only a partial disambiguation, as bank can be used in more than one sense in the text collection.

Besides disambiguation, many attempts have been made to exploit WordNet for text retrieval purposes. Mainly two aspects have been addressed: the enrichment of queries with semantically related terms, on one hand, and the comparison of queries and documents via conceptual distance measures, on the other.

Query expansion with WordNet has been shown to be potentially relevant to enhance recall, as it permits matching relevant documents that do not contain any of the query terms (Smeaton et al., 1995). However, it has produced few successful experiments. For instance, (Voorhees, 1994) manually expanded 50 queries over a TREC-1 collection (Harman, 1993) using synonymy and other semantic relations from WordNet 1.3. Voorhees found that the expansion was useful with short, incomplete queries, and rather useless for complete topic statements, where other expansion techniques worked better. For short queries, there remained the problem of selecting the expansions automatically; doing it badly could degrade retrieval performance rather than enhance it.
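The expansion technique discussed above can be made concrete with a short sketch against NLTK's modern WordNet interface. This is an assumption of the example, not code from any cited system (the original experiments used WordNet 1.3 and their own tooling), and the function name expand_query is invented for the illustration.

```python
# A minimal sketch of synonym-based query expansion with WordNet, in the
# spirit of the Voorhees (1994) experiments described above. Assumes NLTK
# with the WordNet corpus installed (nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def expand_query(terms):
    """Add the synonyms of every sense of every query term.

    Expansion operates on ambiguous word forms, so it also pulls in
    synonyms of unintended senses, the weakness noted in the text.
    """
    expanded = set(terms)
    for term in terms:
        for synset in wn.synsets(term):      # every sense of the word form
            for lemma in synset.lemmas():    # the synonyms within that sense
                expanded.add(lemma.name().replace("_", " ").lower())
    return expanded

print(sorted(expand_query(["spring"])))
# Mixes season words ("springtime") with water words ("fountain", "outflow")
# and motion words ("leap", "bound"): the senses cannot be told apart.
```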
[Exhibit A, page 8 of 14]
In (Richardson and Smeaton, 1995), a combination of rather sophisticated techniques based on WordNet, including automatic disambiguation and measures of semantic relatedness between query/document concepts, resulted in a drop of effectiveness. Unfortunately, the effects of WSD errors could not be discerned from the accuracy of the retrieval strategy. However, in (Smeaton and Quigley, 1996), retrieval on a small collection of image captions, that is, on very short documents, is reasonably improved using measures of conceptual distance between words based on WordNet 1.4. Previously, captions and queries had been manually disambiguated against WordNet. The reason for such success is that with very short documents (e.g. boys playing in the sand) the chances of finding the original terms of the query (e.g. children running on a beach) are much lower than for average-size documents (which typically include many phrasings for the same concepts). These results are in agreement with (Voorhees, 1994), but the question remains whether conceptual distance matching would scale up to longer documents and queries. In addition, the experiments in (Smeaton and Quigley, 1996) only consider nouns, while WordNet offers the chance to use all open-class words (nouns, verbs, adjectives and adverbs).

Our essential retrieval strategy in the experiments reported here is to adapt a classical vector model based system, using WordNet synsets as the indexing space instead of word forms. This approach combines two benefits for retrieval: one, that terms are fully disambiguated (this should improve precision); and two, that equivalent terms can be identified (this should improve recall). Note that query expansion does not satisfy the first condition, as the terms used to expand are words and, therefore, are in turn ambiguous. On the other hand, plain word sense disambiguation does not satisfy the second condition, as equivalent senses of two different words are not matched. Thus, indexing by synsets gets maximum matching and minimum spurious matching, which seems a good starting point to study text retrieval with WordNet.

Given this approach, our goal is to test two main issues which are not clearly answered, to our knowledge, by the experiments mentioned above:

- Abstracting from the problem of sense disambiguation, what potential does WordNet offer for text retrieval? In particular, we would like to extend experiments with manually disambiguated queries and documents to average-size texts.
- Once the potential of WordNet is known for a manually disambiguated collection, we want to test the sensitivity of retrieval performance to disambiguation errors introduced by automatic WSD.

This paper reports on our first results answering these questions. The next section describes the test collection that we have produced. The experiments are described in Section 3, and the last section discusses the results obtained.
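The synset-as-index-term idea can be sketched in a few lines, assuming NLTK's WordNet interface. Synset offsets differ across WordNet versions, so the printed identifiers will not match the WordNet 1.5 values quoted in Section 3, and the helper name synset_id is our own.

```python
# A minimal sketch of synset indexing: each (manually disambiguated) term
# is replaced by an identifier of its WordNet synset, so that synonymous
# senses of different words collapse to a single index term. Assumes
# NLTK's WordNet corpus (nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def synset_id(synset):
    # Identifier in the paper's style: part-of-speech letter + 8-digit
    # offset (the exact value depends on the WordNet version).
    return f"{synset.pos()}{synset.offset():08d}"

# Senses shared by "debate" and "argument" receive the same identifier,
# which is exactly what lets synset indexing match equivalent terms while
# keeping the unrelated senses of each word apart.
shared = set(wn.synsets("debate", pos=wn.NOUN)) & set(
    wn.synsets("argument", pos=wn.NOUN))
for synset in shared:
    print(synset_id(synset), synset.definition())
```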
2 The test collection

The best-known publicly available corpus hand-tagged with WordNet senses is SEMCOR (Miller et al., 1993), a subset of the Brown Corpus of about 100 documents that occupies about 11 Mb (including tags). The collection is rather heterogeneous, covering politics, sports, music, cinema, philosophy, excerpts from fiction novels, scientific texts... A new, bigger version has been made available recently (Landes et al., 1998), but we have not yet adapted it for our collection.

We have adapted SEMCOR in order to build a test collection, which we call IR-SEMCOR, in four manual steps:

- We have split the documents to get coherent chunks of text for retrieval. We have obtained 171 fragments that constitute our text collection, with an average length of 1331 words per fragment.
- We have extended the original TOPIC tags of the Brown Corpus with a hierarchy of subtags, assigning a set of tags to each text in our collection. This is not used in the experiments reported here.
- We have written a summary for each of the fragments, with lengths varying between 4 and 50 words and an average of 22 words per summary. Each summary is a human explanation of the text contents, not a mere bag of related keywords. These summaries serve as queries on the text collection, and there is then exactly one relevant document per query.
- Finally, we have hand-tagged each of the summaries with WordNet 1.5 senses. When a word or term was not present in the database, it was left unchanged. In general, such terms correspond to groups (e.g. Fulton_County_Grand_Jury), persons (Cervantes) or locations (Fulton).

We also generated a list of "stop-senses" and a list of "stop-synsets", automatically translating a standard list of stop words for English (a sketch of this step appears below).

Such a test collection offers the chance to measure the adequacy of WordNet-based approaches to IR independently of the disambiguator being used, but also offers the chance to measure the role of automatic disambiguation by introducing different rates of "disambiguation errors" in the collection.
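The stop-sense/stop-synset construction mentioned above can be sketched as follows, assuming NLTK's stopwords and WordNet corpora. Since IR-SEMCOR was built over WordNet 1.5, the actual lists would differ from what this prints.

```python
# A rough sketch of translating a standard English stop-word list into
# "stop-senses" and "stop-synsets". Assumes NLTK with the stopwords and
# wordnet corpora installed (nltk.download("stopwords"), "wordnet").
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn

stop_words = stopwords.words("english")

# Every sense of a stop word is a stop-sense; its synset is a stop-synset.
# Most stop words are closed-class and absent from WordNet, so both lists
# stay short.
stop_senses = {
    lemma.key()
    for word in stop_words
    for synset in wn.synsets(word)
    for lemma in synset.lemmas()
    if lemma.name().lower() == word
}
stop_synsets = {synset for word in stop_words for synset in wn.synsets(word)}
print(len(stop_senses), "stop-senses,", len(stop_synsets), "stop-synsets")
```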
[Exhibit A, page 9 of 14]
The only disadvantage is the small size of the collection, which does not allow fine-grained distinctions in the results. However, it has proved large enough to give meaningful statistics for the experiments reported here. Although designed for our concrete text retrieval testing purposes, the resulting database could also be useful for many other tasks. For instance, it could be used to evaluate automatic summarization systems (measuring the semantic relation between the manually written and hand-tagged summaries of IR-SEMCOR and the output of text summarization systems) and other related tasks.

Table 1: Percentage of correct documents retrieved in first place

  Experiment                                               % correct document
                                                           retrieved in first place
  Indexing by synsets                                      62.0
  Indexing by word senses                                  53.2
  Indexing by words (basic SMART)                          48.0
  Indexing by synsets with a 5% error ratio                62.0
  Id. with 10% error ratio                                 60.8
  Id. with 20% error ratio                                 56.1
  Id. with 30% error ratio                                 54.4
  Indexing with all possible synsets (no disambiguation)   52.6
  Id. with 60% error ratio                                 49.1
  Synset indexing with non-disambiguated queries           48.5
  Word-sense indexing with non-disambiguated queries       40.9

3 The experiments

We have performed a number of experiments using a standard vector-model based text retrieval system, SMART (Salton, 1971), and three different indexing spaces: the original terms in the documents (for standard SMART runs), the word senses corresponding to the document terms (in other words, a manually disambiguated version of the documents), and the WordNet synsets corresponding to the document terms (roughly equivalent to the concepts occurring in the documents). These are all the experiments considered here:

1. The original texts as documents and the summaries as queries. This is a classic SMART run, with the peculiarity that there is only one relevant document per query.

2. Both documents (texts) and queries (summaries) are indexed in terms of word senses. That means that we disambiguate all terms manually. For instance, "debate" might be substituted with "debate%1:10:01::". The three numbers denote the part of speech, the WordNet lexicographer's file and the sense number within the file. In this case, it is a noun belonging to the noun.communication file. With this collection we can see if plain disambiguation is helpful for retrieval, because word senses are distinguished but synonymous word senses are not identified.

3. In the previous collection, we substitute each word sense with a unique identifier of its associated synset. For instance, "debate%1:10:01::" is substituted with "n04616654", which is an identifier for "{argument, debate}" (a discussion in which reasons are advanced for and against some proposition or proposal; "the argument over foreign aid goes on and on"). This collection represents conceptual indexing, as equivalent word senses are represented with a unique identifier.

4. We produced different versions of the synset-indexed collection, introducing fixed percentages of erroneous synsets. Thus we simulated a word-sense disambiguation process with 5%, 10%, 20%, 30% and 60% error rates. The errors were introduced randomly among the ambiguous words of each document. With this set of experiments we can measure the sensitivity of the retrieval process to disambiguation errors (a sketch of this error-injection step follows this list).

5. To complement the previous experiment, we also prepared collections indexed with all possible meanings (in their word sense and synset versions) for each term. This represents a lower bound for automatic disambiguation: we should not disambiguate if performance is worse than considering all possible senses for every word form.

6. We also produced a non-disambiguated version of the queries (again, in both its word sense and synset variants). This set of queries was run against the manually disambiguated collection.
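The error-injection step of experiment 4 can be simulated along the following lines. The document representation and the synset identifiers below are illustrative placeholders of our own, not the paper's data or code.

```python
# A sketch of error injection: for a fixed fraction of the ambiguous
# words, replace the correct synset with a randomly chosen wrong sense of
# the same word form. Identifiers here are placeholders, not real offsets.
import random

def inject_errors(tagged_doc, error_rate, seed=0):
    """tagged_doc: list of (word, correct_id, candidate_ids) tuples."""
    rng = random.Random(seed)
    corrupted = []
    for word, correct, candidates in tagged_doc:
        wrong = [c for c in candidates if c != correct]
        # Only ambiguous words (with at least one wrong candidate) can err.
        if wrong and rng.random() < error_rate:
            corrupted.append(rng.choice(wrong))
        else:
            corrupted.append(correct)
    return corrupted

doc = [
    ("spring", "n00000001", ["n00000001", "n00000002", "n00000003"]),
    ("bank",   "n00000004", ["n00000004", "n00000005"]),
    ("the",    "the",       ["the"]),  # untagged tokens pass through
]
print(inject_errors(doc, error_rate=0.30))
```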
[Exhibit A, page 10 of 14]
In all cases, we compared the atc and nnn standard weighting schemes, and they produced very similar results. Thus we only report here the results for the nnn weighting scheme.

4 Discussion of results

4.1 Indexing approach

[Figure 1: Precision/recall curves for the different indexing approaches: (1) indexing by synsets, (2) indexing by word senses, (3) indexing by words (SMART); recall on the x-axis, from 0.3 to 0.9.]

In Figure 1 we compare different indexing approaches: indexing by synsets, indexing by words (basic SMART) and indexing by word senses (experiments 1, 2 and 3). The leftmost point in each curve represents the percentage of documents that were successfully ranked as the most relevant for their summary/query. The next point represents the documents retrieved as the first or second most relevant to their summary/query, and so on. Note that, as there is only one relevant document per query, the leftmost point is the most representative of each curve. Therefore, we have included these results separately in Table 1.

The results are encouraging:

- Indexing by WordNet synsets produces a remarkable improvement on our test collection: 62% of the documents are retrieved in first place by their summary, against 48% for the basic SMART run. This represents 14% more documents, a 29% improvement with respect to SMART. This is an excellent result, although we should keep in mind that it is obtained with manually disambiguated queries and documents. Nevertheless, it shows that WordNet can greatly enhance text retrieval: the problem resides in achieving accurate automatic Word Sense Disambiguation.

- Indexing by word senses improves performance when considering up to four documents retrieved for each query/summary, although it is worse than indexing by synsets. This confirms our intuition that synset indexing has advantages over plain word sense disambiguation, because it permits matching semantically similar terms. Taking only the first document retrieved for each summary, the disambiguated collection gives a 53.2% success rate against 48% for the plain SMART query, which represents an 11% improvement. For recall levels higher than 0.85, however, the disambiguated collection performs slightly worse. This may seem surprising, as word sense disambiguation should only increase our knowledge about queries and documents. But we should bear in mind that WordNet 1.5 is not the perfect database for text retrieval, and indexing by word senses prevents some matchings that can be useful for retrieval. For instance, design is used as a noun repeatedly in one of the documents, while its summary uses design as a verb. WordNet 1.5 does not include cross-part-of-speech semantic relations, so this relation cannot be used with word senses, while term indexing simply (and successfully!) does not distinguish them. Other problems of WordNet for text retrieval include too fine-grained sense distinctions and a lack of domain information; see (Gonzalo et al., in press) for a more detailed discussion of the adequacy of the WordNet structure for text retrieval.
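For concreteness, the nnn scheme reported above amounts to ranking by the inner product of raw term-frequency vectors, with no idf component and no length normalization. The sketch below is our illustration of that scheme, not the SMART system itself, and works identically whether the index terms are words, word senses, or synset identifiers (the identifiers shown are placeholders).

```python
# A minimal sketch of vector-model ranking under the nnn weighting scheme:
# natural (raw) term frequency, no idf, no normalization, for both
# documents and queries.
from collections import Counter

def nnn_score(query_terms, doc_terms):
    q, d = Counter(query_terms), Counter(doc_terms)
    # Inner product of the raw term-frequency vectors.
    return sum(tf * d[term] for term, tf in q.items())

def rank(query_terms, docs):
    scores = ((nnn_score(query_terms, doc), i) for i, doc in enumerate(docs))
    return sorted(scores, reverse=True)

docs = [["n00000001", "n00000002"], ["n00000002", "n00000002"]]
query = ["n00000001"]
print(rank(query, docs))  # the document sharing the query's synset wins
```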
[Exhibit A, page 11 of 14]
4.2 Sensitivity to disambiguation errors

[Figure 2: Precision/recall curves showing sensitivity to disambiguation errors: (1) manual disambiguation, (2) 5% error, (3) 10% error, (4) 20% error, (5) 30% error, (6) all possible synsets per word (without disambiguation), (7) 60% error, (8) SMART.]

Figure 2 shows the sensitivity of the synset indexing system to degradation of disambiguation accuracy (corresponding to experiments 4 and 5 described above). From the plot, it can be seen that:

- Less than 10% disambiguation errors does not substantially affect performance. This is roughly in agreement with (Sanderson, 1994).
- For [the remainder of this page is not included in the available copy]
