United States Patent [19]
Church et al.

US005541836A

[11] Patent Number: 5,541,836
[45] Date of Patent: Jul. 30, 1996

[54] WORD DISAMBIGUATION APPARATUS AND METHODS

[75] Inventors: Kenneth W. Church, Chatham; William A. Gale, Maplewood; David E. Yarowsky, Summit, all of N.J.

[73] Assignee: AT&T Corp., Murray Hill, N.J.

[21] Appl. No.: 814,850

[22] Filed: Dec. 30, 1991

[51] Int. Cl.6: G06F 19/00
[52] U.S. Cl.: 364/419.07
[58] Field of Search: 364/419, 419.07, 364/419.08, 419.12, 419.13
[56] References Cited

U.S. PATENT DOCUMENTS

4,661,924   4/1987   Okamoto et al.     364/DIG. 2
4,868,750   9/1989   Kucera et al.      364/419
4,914,590   4/1990   Loatman et al.     364/419
4,930,077   5/1990   Fan                364/419
4,942,526   7/1990   Okojima et al.     364/419
5,056,021  10/1991   Ausborn            364/419
5,088,038   2/1992   Tanaka et al.      364/419.08
5,109,509   4/1992   Katayama et al.    395/600
5,128,865   7/1992   Sadler             364/419
5,146,405   9/1992   Church             364/419
5,170,349  12/1992   Yagisawa et al.    364/419
5,237,503   8/1993   Bedecarrax et al.  364/419.08
5,243,520   9/1993   Jacobs et al.      364/419.08
5,317,510   3/1994   Yoshimura et al.   364/419.08
OTHER PUBLICATIONS

"Method for Inferring Lexical Associations from Textual Co-Occurrences", IBM Technical Disclosure Bulletin, vol. 33, no. 1B, Jun. 1990, Armonk, NY, USA, pp. 54-55.

E. Black, "An Experiment in Computational Discrimination of English Word Senses", IBM Journal of Research and Development, vol. 32, no. 2, Mar. 1988, Armonk, NY, USA.

"Word Sense Disambiguation Using an Untagged Corpus", IBM Technical Disclosure Bulletin, vol. 35, no. 6, Nov. 1992.

J. D. Benson and B. Brainerd, "Chesterton's Parodies of Swinburne and Yeats: A Lexical Approach", Literary and Linguistic Computing, vol. 3, no. 4, 1988.

R. Krovetz and W. Bruce Croft, "Word Sense Disambiguation Using Machine-Readable Dictionaries", Proceedings of the 12th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Jun. 1989, Cambridge.
Primary Examiner: Donald E. McElheny, Jr.
Attorney, Agent, or Firm: Gordon E. Nelson; Jeffrey M. Weinick

[57] ABSTRACT

Apparatus and methods for determining whether a word/sense pair is proper for a context. Wide contexts (100 words) are employed for both training and testing, and testing is done by adding the weights of vocabulary words from the context. The weights are determined by Bayesian techniques which interpolate between the probability of occurrence of a vocabulary word in a conditional sample of the training text and the probability of its occurrence in the entire training text. A further improvement in testing takes advantage of the fact that a word is generally used in only a single sense in a single discourse. Also disclosed are automated training techniques including training on bilingual bodies of text and training using categories from Roget's Thesaurus.

37 Claims, 2 Drawing Sheets
[Front-page drawing: FIG. 1, a block diagram of word/sense-pair disambiguation apparatus 101, showing text 111, current word CW 113, context C 115, position P 116, word/sense pair probability table WSPPT 107 with subtables 108(a)-108(n) and entries WSPPTE 121 pairing text words TW with weights W 119, sense likelihood calculator SLC 103, and log word/sense pair likelihood WSPL 117. See Sheet 1 of the Drawing.]
[Drawing Sheet 1 of 2:

FIG. 1: block diagram of apparatus 101 for determining the probability that a word/sense pair is proper for a context. Text 111 contains context C 115 surrounding position P 116; each current word CW 113 is read by sense likelihood calculator SLC 103, which consults word/sense pair probability table WSPPT 107. Table 107 contains a subtable 108(a)-108(n) for each word/sense pair WSP 105; each subtable is made up of entries WSPPTE 121 pairing a text word TW with a weight W 119. The output of SLC 103 is log word/sense pair likelihood WSPL 117.

FIG. 2 (Table 6: Selected Portions of Two Models): data from which table 107 may be constructed, listing freq, word, weight*freq, and weight for each indicative word for two senses of the polysemous word duty. The model for the tax sense of duty (201) includes words such as countervailing, duties, U.S., trade, taxes, duty, softwood, united, rescinds, lumber, shingles, shakes, against, and canadian; the model for the obligation sense of duty (203) includes words such as petitions, to, petition, pursuant, mr, honour, order, present, proceedings, prescription, house, reject, boundaries, and electoral.]
[Drawing Sheet 2 of 2:

FIG. 3: an example of part of a conditional sample 301, consisting of truncated concordance lines such as:
  "in marshy areas .SB The crowned crane, however, occasionally nests i" (305)
  "s center of rotation .PP A tower crane is an assembly of fabricated ste"
  "the traditional ABC method and drill were unchanged, and dissatisfac"
  "itement heightens the colors .SB Drills live in the forests of equatorial" (303)
  "steel-penetrating carbide-tipped drills forced manufacturers to find sim"
  "to 8000 BC, flint-edged wooden sickles were used to gather wild grain"
  "ic equipment, valves for nuclear generators, oil-refinery turbines, and"
  "ommon .SB Resembling a power shovel mounted on a floating hull, th"
  "le equipment such as a hydraulic shovel capable of lifting 26 cubic me"
  "000 CARVING .SB The gutter adz has a concave blade for forming"

FIG. 4: an example of weights 401 computed using Roget's categories:
  TOOLS/MACHINERY (Category 348) (403): saw (5.1), lever (4.1), blade (3.8), knife (3.8), pump (3.5), gear (3.5), piston (3.6), shaft (3.3), tool (3.1), wheel (2.8), machine (2.7), engine (2.6), cut (2.6), tooth (2.5), device (2.2), wood (2.0), ...
  ANIMAL/INSECT (Category 414) (405): tail (2.7), bird (2.6), wild (2.6), coat (2.5), nest (2.5), fish (2.4), species (2.3), egg (2.2), inhabit (2.2), breed (2.2), cm (2.2), eat (2.2), female (2.0), animal (1.7), family (1.7), common (1.3), ...]
WORD DISAMBIGUATION APPARATUS AND METHODS

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to computerized text analysis generally and more specifically to the problem of determining whether a given word-sense pair is proper for a given context.

2. Description of the Prior Art

Machine translation of natural language texts has long been a goal of researchers in computer science and linguistics. A major barrier to high-quality machine translation has been the difficulty of disambiguating words. Word disambiguation is necessary because many words in any natural language have more than one sense. For example, the English noun sentence has two senses in common usage: one relating to grammar, where a sentence is a part of a text or speech, and one relating to punishment, where a sentence is a punishment imposed for a crime. Human beings use the context in which the word appears and their general knowledge of the world to determine which sense is meant, and consequently do not even have trouble with texts such as:

The teacher gave the student the sentence of writing the sentence "I will not throw spit wads" 100 times.

Computers, however, have no general knowledge of the world, and consequently have had a great deal of trouble translating sentences such as the above into languages such as French, where the word used to translate sentence when it is employed in the grammatical sense is phrase and the word used to translate sentence when it is employed in the sense of punishment is peine.

The ability to determine a probable sense of a word from the context in which the word is used is important in other areas of text analysis as well. For example, optical character recognition systems and speech recognition systems often can only resolve a printed or spoken word into a small set of possibilities; one way of making a choice among the words in the small set is to determine which word has a sense which best fits the context. Other examples in this area are determining whether characters such as accents or umlauts should be present on a word or whether the word should be capitalized. Additionally, there are text editing tools such as spelling checkers or interactive thesauri which present the user with a set of suggested alternatives for a word. These tools, too, are improved if the set of alternatives is limited to words whose senses fit the context.

Another area of text analysis that will benefit from good techniques for determining the probable sense of a word from its context is data base searching. Word searches in data bases work by simply matching a search term with an occurrence of the term in the data base, without regard to the sense in which the term is used in the data base. The only way to restrict a search to a given sense of a term is to provide other search terms which the searcher expects to find in conjunction with the first search term. Such a search strategy will, however, miss occurrences of the first term where the first term has the proper sense but is not found in conjunction with the other search terms. Given a useful way of determining what sense of a word best fits a context, it will be possible to search by specifying not only the search term, but also the sense in which it is being used.

Past researchers have used three different general approaches to the word disambiguation problem sketched above:

1. Qualitative Methods, e.g., Hirst (1987)
2. Dictionary-based Methods, e.g., Lesk (1986)
3. Corpus-based Methods, e.g., Kelly and Stone (1975)

In each case, the work has been limited by a knowledge acquisition bottleneck. For example, there has been a tradition in parts of the AI community of building large experts by hand, e.g., Granger (1977), Rieger (1977), Small and Rieger (1982), Hirst (1987). Unfortunately, this approach is not very easy to scale up, as many researchers have observed:

"The expert for THROW is currently six pages long, ... but it should be 10 times that size." (Small and Rieger, 1982)

Since this approach is so difficult to scale up, much of the work has had to focus on "toy" domains (e.g., Winograd's Blocks World) or sublanguages (e.g., Isabelle (1984), Hirschman (1986)). Currently, it is not possible to find a semantic network with the kind of broad coverage that would be required for unrestricted text.

Others such as Lesk (1986), Walker (1987), Ide (1990, Waterloo Meeting) have turned to machine-readable dictionaries (MRDs) such as Oxford's Advanced Learner's Dictionary of Current English (OALDCE) in the hope that MRDs might provide a way out of the knowledge acquisition bottleneck. These researchers seek to develop a program that could read an arbitrary text and tag each word in the text with a pointer to a particular sense number in a particular dictionary. Thus, for example, if Lesk's program was given the phrase pine cone, it ought to tag pine with a pointer to the first sense under pine in OALDCE (a kind of evergreen tree), and it ought to tag cone with a pointer to the third sense under cone in OALDCE (fruit of certain evergreen trees). Lesk's program accomplishes this task by looking for overlaps between the words in the definition and words in the text "near" the ambiguous word.

Unfortunately, the approach doesn't seem to work as well as one might hope. Lesk (1986) reports accuracies of 50-70% on short samples of Pride and Prejudice. Part of the problem may be that dictionary definitions are too short to mention all of the collocations (words that are often found in the context of a particular sense of a polysemous word). In addition, dictionaries have much less coverage than one might have expected. Walker (1987) reports that perhaps half of the words occurring in a new text cannot be related to a dictionary entry.

Thus, like the AI approach, the dictionary-based approach is also limited by the knowledge acquisition bottleneck; dictionaries simply don't record enough of the relevant information, and much of the information that is stored in the dictionary is not in a format that computers can easily digest, at least at present.

A third line of research makes use of hand-annotated corpora. Most of these studies are limited by the availability of hand-annotated text. Since it is unlikely that such text will be available in large quantities for most of the polysemous words in the vocabulary, there are serious questions about how such an approach could be scaled up to handle unrestricted text. Kelly and Stone (1975) built 1815 disambiguation models by hand, selecting words with a frequency of at least 20 in a half million word corpus. They started from key word in context (KWIC) concordances for each word, and used these to establish the senses they perceived as useful for content analysis. The models consisted of an ordered set of rules, each giving a sufficient condition for deciding on one classification, or for jumping to another rule in the same model, or for jumping to a rule in the model for
another word. The conditions of a given rule could refer to the context within four words of the target word. They could test the morphology of the target word, an exact context word, or the part of speech or semantic class of any of the context words. The sixteen semantic classes were assigned by hand.

Most subsequent work has sought automatic methods because it is quite labor intensive to construct these rules by hand. Weiss (1973) first built rule sets by hand for five words, then developed automatic procedures for building similar rule sets, which he applied to an additional three words. Unfortunately, the system was tested on the training set, so it is difficult to know how well it actually worked. Black (1987, 1988) studied five 4-way polysemous words using about 2000 hand tagged concordance lines for each word. Using 1500 training examples for each word, his program constructed decision trees based on the presence or absence of 81 "contextual categories" within the context of the ambiguous word. He used three different types of contextual categories: (1) subject categories from LDOCE, the Longman Dictionary of Contemporary English (Longman, 1978), (2) the 41 vocabulary items occurring most frequently within two words of the ambiguous word, and (3) the 40 vocabulary items excluding function words occurring most frequently in the concordance line. Black found that the dictionary categories produced the weakest performance (47 percent correct), while the other two were quite close at 72 and 75 percent correct, respectively.

There has recently been a flurry of interest in approaches based on hand-annotated corpora. Hearst (1991) is a very recent example of an approach somewhat like Black (1987, 1988), Weiss (1973) and Kelly and Stone (1975), in this respect, though she makes use of considerably more syntactic information than the others did. Her performance also seems to be somewhat better than the others', though it is difficult to compare performance across systems.

As may be seen from the foregoing, the lack of suitable techniques for determining which word-sense pair best fits a given context has been a serious hindrance in many areas of text analysis. It is an object of the apparatus and methods disclosed herein to provide such techniques.

SUMMARY OF THE INVENTION

In one aspect, the invention is a method of automatically determining that a word/sense pair has a sense which suits a given position in a text. The method includes the steps of:

determining a sequence of words in the text which includes the given position and is substantially longer than a single line of the text; and

determining whether the word/sense pair has the suitable sense by automatically analyzing the sequence.

In another aspect, the invention is a method of automatically determining a probability that a word/sense pair has a sense which suits a given position in a text. The method includes the steps of:

determining a sequence of words in the text which includes the given position; and

automatically employing a Bayesian discrimination technique involving the words in the sequence and the sense of the word-sense pair to determine the probability that the word/sense pair has a sense which suits the given position.

In still another aspect, the invention is a method of automatically determining whether a given occurrence of a word in a text has a given sense. The method includes the steps of:

making a first determination of the sense of the given occurrence of the word; and

making a final determination of the sense of the given occurrence of the word by comparing the first determination with a determination of the sense of a neighboring occurrence of the word.

The foregoing and other objects, aspects, and advantages of the invention will be apparent to one of ordinary skill in the art who peruses the following Drawing and Detailed Description, wherein:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of apparatus for determining the probability that a word/sense pair is proper for a context;

FIG. 2 is a table of data from which table 107 of FIG. 1 may be constructed;

FIG. 3 is an example of part of a conditional sample; and

FIG. 4 is an example of weights computed using Roget's categories.

The reference numbers employed in the Drawing and the Detailed Description have three or more digits. The two least significant digits are a number within a figure; the remaining digits are the figure number. Thus, the element with the reference number "305" is first shown in FIG. 3.

DETAILED DESCRIPTION

The following Detailed Description will first provide an overview of the theoretical approach to the disambiguation problem in a preferred embodiment, will then describe apparatus for solving the disambiguation problem, and will finally discuss how the apparatus for solving the disambiguation problem is trained.

BAYESIAN DISAMBIGUATION TECHNIQUES

The word-sense disambiguation problem is a discrimination problem, not very different from problems such as author identification and information retrieval. In author identification and information retrieval, it is customary to split the problem up into a testing phase and a training phase. During the training phase, we are given two (or more) sets of documents and are asked to construct a discriminator which can distinguish between the two (or more) classes of documents. These discriminators are then applied to new documents during the testing phase. In the author identification task, for example, the training set consists of several documents written by each of the two (or more) authors. The resulting discriminator is then tested on documents whose authorship is disputed. In the information retrieval application, the training set consists of a set of one or more relevant documents and a set of zero or more irrelevant documents. The resulting discriminator is then applied to all documents in the library in order to separate the more relevant ones from the less relevant ones. In the sense disambiguation case, the 100-word context surrounding instances of a polysemous word (e.g., duty) are treated very much like a document.

It is natural to take a Bayesian approach to these discrimination problems. Mosteller and Wallace (1964, section 3.1) used the following formula to combine new evidence (e.g., the term by document matrix) with prior evidence (e.g., the historical record) in their classic authorship study of the Federalist Papers.
P(class 1) / P(class 2) = [p(class 1) / p(class 2)] x [L(class 1) / L(class 2)]

where P represents a final probability, p represents an initial probability, and L represents a likelihood. Similar equations can be found in textbooks on information retrieval (e.g., Salton (1989), equation 10.17).

The initial odds depend on the problem. In the author identification problem, for example, the initial odds are used to model what we know about the documents from the various conflicting historical records. In the information retrieval application, the user may have a guess about the fraction of the library that he or she would expect to be relevant; such a guess could be used as the prior. It is often the case that the prior probability will not have very much influence on the outcome, which is fortunate, since the prior can sometimes be difficult to estimate.

It is common practice to decompose the likelihoods into a product of likelihoods over tokens (occurrences of words) in the document (under appropriate independence assumptions):

L(class 1) / L(class 2) = PRODUCT over tok in doc of [Pr(tok | class 1) / Pr(tok | class 2)]

The crucial ingredients for this calculation are the probabilities of each term in the vocabulary conditional on the document being from a given class. These conditional probabilities have been computed in a number of different ways depending on the application and the study.

For two senses, the Bayesian equation mentioned above becomes:

P(sense 1) / P(sense 2) = [p(sense 1) / p(sense 2)] x [L(sense 1) / L(sense 2)]

where p, P and L are the initial probability, the final probability and likelihood, as before. The initial probabilities are determined from the overall probabilities of the two senses in the corpus. As in other large dimension discrimination problems, the likelihoods are decomposed into a product over tokens:

L(sense 1) / L(sense 2) = PRODUCT over tok in context of [Pr(tok | sense 1) / Pr(tok | sense 2)]
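The decomposition above can be made concrete in a few lines of code. The following is a minimal sketch, not the patent's implementation: the probability tables pr_sense1 and pr_sense2, the priors, and the smoothing floor are hypothetical stand-ins for values that would come from training.

```python
import math

def log_odds(context_tokens, pr_sense1, pr_sense2,
             prior1, prior2, floor=1e-6):
    """Log of P(sense 1)/P(sense 2) for a context, assuming token
    independence: the log prior odds plus, for each token, the log of
    Pr(tok | sense 1)/Pr(tok | sense 2)."""
    score = math.log(prior1 / prior2)
    for tok in context_tokens:
        # Tokens unseen in training fall back to a small floor value;
        # the patent instead derives weights by Bayesian interpolation.
        p1 = pr_sense1.get(tok, floor)
        p2 = pr_sense2.get(tok, floor)
        score += math.log(p1 / p2)
    return score  # > 0 favors sense 1, < 0 favors sense 2
```

Working in log space turns the product over tokens into a sum, which is exactly the weight accumulation performed by the apparatus described next.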
Apparatus for Use in Word-sense Disambiguation: FIG. 1

FIG. 1 shows apparatus 101 which determines the likelihood that a word/sense pair has the sense required by a given context. In a preferred embodiment, the output of apparatus 101 is the logarithm of the likelihood. Log word/sense pair likelihood (WSPL) 117 is computed by sense likelihood calculator 103. Inputs to sense likelihood calculator 103 come from text 111, word/sense pair probability table 107, and word/sense pair 105. Word/sense pair probability table 107 is a table which contains a subtable 108 for each word/sense pair which is of interest. Each subtable 108 contains entries (WSPPTE) 123 for at least all of the words in text 111 which give useful indications whether the word/sense pair 105 to which subtable 108 corresponds has a sense suitable for position 116. Each entry for a text word 121 contains an indication of the weight 119 of that word for determining whether word/sense pair 105 has a suitable sense.

When sense likelihood calculator 103 is calculating the likelihood that a given word/sense pair 105 is suitable for a position P 116 in text 111, sense likelihood calculator 103 begins reading words of text 111 from 50 words ahead of position 116 and continues reading 50 words following position 116. The 100 words which contain position 116 are position 116's context 115. An important aspect of the preferred embodiment is that there are substantially more words in context 115 than there are in a line of text 111, which is presumed to contain about 10 words. For each current word (CW) 113 read from context 115, calculator 103 determines whether the word has an entry 123 in table 107, and if it does, it adds the weight specified in the entry to the weights accumulated from the words read so far. The value of the accumulated weights 119 is then the likelihood 117 that word/sense pair 105 is suitable for position 116.

Of course, in most applications of apparatus 101, the issue will be which of two or more word/sense pairs best suits context 115. To find that out, one simply uses apparatus 101 as described above for each word/sense pair in turn. The one with the highest accumulated weights is the word/sense pair best suited to position 116. For example, if apparatus 101 were being used in the word disambiguation application, there would be a separate word/sense pair for each sense of the word to be disambiguated, and the sense used to translate the word would be the sense of the word/sense pair with the highest accumulated weights.
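The accumulation just described is straightforward to express in code. Below is a minimal sketch under the assumption that table 107 is represented as a dictionary mapping each sense to a dictionary of per-word weights; the names weight_table and best_sense are illustrative, not taken from the patent.

```python
def context_window(words, position, half_width=50):
    """The 100-word context 115: up to 50 words on either side of
    position 116 (element numbers refer to FIG. 1)."""
    lo = max(0, position - half_width)
    return words[lo:position] + words[position + 1:position + 1 + half_width]

def sense_likelihood(words, position, weights, default=0.0):
    """Sum the weights of the context words that have entries in the
    subtable; words left out of the table contribute a default weight."""
    return sum(weights.get(w, default) for w in context_window(words, position))

def best_sense(words, position, weight_table):
    """Score each word/sense pair in turn; the highest accumulated
    weight identifies the pair best suited to the position."""
    return max(weight_table,
               key=lambda sense: sense_likelihood(words, position,
                                                  weight_table[sense]))
```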
When words are being disambiguated, it will occasionally occur that the difference between the highest and the next highest accumulated weights is not enough to permit clear disambiguation. In that case, SLC 103 may take other approaches. One such approach is analyzing the discourse to which text 111 belongs. For purposes of the present discussion, a discourse is one or more texts which concern a single subject or a set of related subjects. Within a given discourse, ambiguous words tend to be used in a single sense. For instance, if the discourse concerns grammar, sentence used in the punishment sense will rarely occur; if the discourse concerns criminal trial practice, sentence used in the grammatical sense will rarely occur.

A simple way of analyzing a discourse using apparatus 101 is as follows: the text belonging to the discourse is marked as such and calculator 103 stores the most suitable sense and weight for every position where there was clear disambiguation for the word in question. Generally, there will be a heavy preponderance of one of the possible senses, and the preponderant sense can be used in those situations where the analysis of context 115 alone did not clearly disambiguate. An even simpler, albeit less precise, way takes advantage of the fact that adjacent uses of a word tend to belong to the same discourse. In this technique, if the analysis of context 115 alone does not clearly disambiguate, apparatus 101 will apply the result of the examination of a neighboring context 115 which contains the word in question and determine the sense of the word in question from its sense in the neighboring context.

In a preferred embodiment, apparatus 101 is implemented in a digital computer system. Text 111, table 107, and word/sense pair 105 are stored in the computer system's data storage system and sense likelihood calculator 103 is implemented by means of a program executed by the digital computer system's processor. In some embodiments, table 107 may be in read-only memory and sense likelihood calculator may be a special-purpose processor which has been adapted for rapid referencing of table 107. Such an embodiment might, for example, be useful in a pocket translating device or in an electronic typewriter.
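The discourse heuristic can be sketched as a post-pass over per-occurrence decisions. In the sketch below, the tuple layout, the margin threshold, and the function name are assumptions; the patent does not prescribe a particular data layout.

```python
from collections import Counter

def resolve_with_discourse(decisions, margin=1.0):
    """decisions: list of (best_sense, score_gap) tuples for successive
    occurrences of one word within a single discourse, where score_gap
    is the difference between the highest and next highest accumulated
    weights. Occurrences whose gap clears the margin count as clearly
    disambiguated; the remainder receive the sense that predominates
    among the clear ones (one sense per discourse)."""
    clear = [sense for sense, gap in decisions if gap >= margin]
    if not clear:
        # Nothing was clear; keep the first determinations unchanged.
        return [sense for sense, _ in decisions]
    preponderant = Counter(clear).most_common(1)[0][0]
    return [sense if gap >= margin else preponderant
            for sense, gap in decisions]
```

The simpler neighbor-based variant would instead copy the sense assigned to the nearest clearly disambiguated occurrence of the word.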
Computation of Word/Sense Pair Probability Table 107

Conceptually, word/sense pair probability table 107 is a whole set of subtables 108. There is a subtable 108 for at least every word/sense pair which is of interest for the task we are interested in, and the subtable 108 for that word/sense pair contains an entry for at least every word in text 111. Moreover, what is really wanted is a word/sense pair probability table 107 which will work with virtually any text written in a given language. In the disambiguation context, such a table would conceptually include a subtable 108 for each word/sense pair for each ambiguous word in the given language and each subtable 108 would include entries for every word in the given language. Of course, optimizations are possible. For example, most words will contribute little or nothing to the disambiguation, and such words may simply be left out of the table and given a default weight.

It is apparent that large word/sense pair probability tables 107 can only be computed by machine. In the terms employed in the art, the training of apparatus 101 must be automated. Training is in many respects the reverse of the testing just described. In the disambiguation context, a subtable 108 is created in table 107 for a given ambiguous word by examining contexts 115 which contain known senses of the ambiguous word to find other words in the context which can be used in testing to estimate the probable sense of the ambiguous word. For example, if the subtable 108 is for sentence's punishment sense, the contexts will certainly show more occurrences of the word "judge" or the word "trial" than is normal in English language texts, and these words can be weighted accordingly in table 107.

The largest problem in automating training is of course determining which sense the word being trained for has in a given context 115. For large tables 107, it is clearly impractical for human readers to flag the word being trained for as having one sense or another in the contexts 115 being used for training. During development of apparatus 101, two techniques have been discovered for automatically determining the sense of the word being trained for in a given context 115. One of the techniques employs bilingual bodies of text; the other employs subject matter categories such as those provided by a thesaurus.
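The abstract states that the weights interpolate between a word's probability of occurrence in the conditional sample and its probability in the entire training text. The following sketch shows one plausible shape for that computation; the interpolation constant alpha and the function name are assumptions, since the patent's detailed estimation procedure falls outside this excerpt.

```python
import math
from collections import Counter

def train_subtable(conditional_contexts, corpus_tokens, alpha=0.5):
    """Build a subtable 108 for one word/sense pair.
    conditional_contexts: 100-word contexts whose center word is known
    to carry the sense being trained for.
    corpus_tokens: every token of the training text."""
    cond = Counter(tok for ctx in conditional_contexts for tok in ctx)
    n_cond = sum(cond.values())
    glob = Counter(corpus_tokens)
    n_glob = len(corpus_tokens)
    weights = {}
    for tok, n in glob.items():
        p_global = n / n_glob
        # Pull the conditional estimate toward the global one (assumed
        # linear form; the patent uses a Bayesian interpolation).
        p_cond = (alpha * (cond[tok] / n_cond if n_cond else 0.0)
                  + (1 - alpha) * p_global)
        weights[tok] = math.log(p_cond / p_global)
    return weights
```

Words like "judge" and "trial", overrepresented in the conditional sample for the punishment sense of sentence, would receive positive weights under this scheme; words no more common there than in the corpus at large would receive weights near zero.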
Training on Bilingual Bodies of Text

Training on bilingual bodies of text takes advantage of two facts: first, a translation of a word which is ambiguous in a first language into a word in a second language often indicates the sense of the ambiguous word. Thus, if the English word sentence is translated peine, we know that the English word is being used in the punishment sense. If it is translated phrase, we know that the grammatical sense is meant. Second, there are presently available large amounts of machine-readable text for which versions in two languages are available. One example of such a bilingual body of text is the Canadian Hansards, which is the journal in both English and French of the debates in the Canadian Houses of Parliament. In the following, the English version is termed the English Hansards and the French version the French Hansards.

The preferred embodiment is trained on the Canadian Hansards for one sense of a given ambiguous English word as follows: first, statistics are gathered for the entire body of the English Hansards. The statistics include the number of tokens (words and phrases treated as words) in the English Canadian Hansards and then the number of occurrences of each word in the English Hansards. From these statistics the probability that a given word will occur in a 100-word context of the English Hansards is computed.

Then a conditional sample of the English Hansards is made for the desired sense of the given ambiguous word. This is done by finding each occurrence of the given ambiguous word in the English Hansards. Then the French word (or phrase) which corresponds to the occurrence is located in the French Canadian Hansards. The French word which corresponds to the occurrence of the ambiguous English word is located by aligning sentences of the English text with corresponding ones of the French text, as described in Gale, W., and K. Church, "A Program for Aligning Sentences in Bilingual Corpora," Proceedings: 29th Annual Meeting of the Association for Computational Linguistics, pp. 177-184, 1991. When located, the French word or phrase determines whether that occurrence of the given word has the desired sense. If it does, the 50 words on either side of the occurrence are output to the conditional sample. It should be pointed out here that the use of a 100-word context in training is as important as it is in the operation of apparatus 101.

Once the conditional sample is made, it is analyzed using Bayesian techniques which will be described in detail below to determine the weight of each word in the conditional sample with regard to the probability that the given ambiguous word has the sense used to make the conditional sample.
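A sketch of the conditional-sampling step follows, assuming sentence alignment has already been performed (e.g., by the Gale and Church method cited above) and is supplied as paired English/French token lists. Checking for a sense-indicating French word anywhere in the aligned sentence is a simplification of the word-level correspondence the patent describes, and the parameter names are illustrative.

```python
def conditional_sample(aligned_pairs, ambiguous_word, sense_translations,
                       half_width=50):
    """Collect 100-word contexts in which ambiguous_word is known, from
    its French rendering, to carry the desired sense.
    aligned_pairs: list of (english_tokens, french_tokens) sentence pairs.
    sense_translations: French words signaling the desired sense, e.g.
    {"peine"} for the punishment sense of "sentence"."""
    stream, hits = [], []
    for en_tokens, fr_tokens in aligned_pairs:
        # The French side of the aligned sentence indicates the sense.
        desired = bool(sense_translations & set(fr_tokens))
        for offset, tok in enumerate(en_tokens):
            if desired and tok == ambiguous_word:
                hits.append(len(stream) + offset)
        stream.extend(en_tokens)
    # Output the 50 words on either side of each qualifying occurrence.
    return [stream[max(0, i - half_width): i + half_width + 1]
            for i in hits]
```

The contexts returned here are exactly the conditional_contexts consumed by the training sketch above.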
