throbber
(h
`
`TII-ITY
`SERIAL
`NUMBER
`
`PATEI
`
`I
`
`;td*s -[
`
`FILING DATE
`i:i ::ji Ij r::t';;! /'*'.;l
`
`cl^ss
`
`I NUMBER
`t
`
`Olf i-ii."il"i ..i,,
`Fr, 1 ,,r:i''''i'
`$..1 il,: i',; i',; i
`&
`
`""
`i'.i
`
`'::
`
`'.''
`
`irli;:i.i:ii:;i:ii::ii....:rj,, i{rt.l. i..ilr,.ti::i 1 i:,i:li[:: .l irji.., !:.:ii.,'ii.
`i.:,,, !::t l: t-::l-lF:tfil.iLil-ii'.i ., i::il:i jlri(ii.Lll]
`" i*i;l:l
`i:i[::.1.ri..-i::::\iL.ii:::. " ir..'|i..'l "
`
`,, i,:,1 i:.:i I !.:ii::.1..:l i::il.:i [:. l:::. ,, i- i l:]l
`i.d.i.1....;,,.. .i. r';ii.i i].j " i:ri.tL, i.:i1.,!.,
`
`6078051
`
`607605t
`
`GROUP ART UNIT
`-;l;ji-i-
`2"7:tV
`i..liI: ., i.,.i ri i:
`.i i..,t.-t i::i f.l,, i:'.i l,::: 1... il. i;:::
`i::ii:::.i.:'lrlt_.tirii.i,, i4!{i I l.::.i:.t[il:.iil
`
`EGMINER
`
`".J
`
`ri::+:i.:::i-Ji!.|"i .i. i\ii-t l. idi:,i
`vfil i:i :i l:::''i' !::: j.:r
`
`?t$)i"f;:i:t': :l: *: :ji: :i: :i: :i: i+: :l
`
`ii: :+: :l: :+; :ai :{: :i: :j; ji: ii: :i.: :+;
`
`I
`,ir ;i: F.f'lflf: 'i ijiilr.j {i-\i::,$:ii... :{ t::::r+[ ], t...ri.il;,+' :+, :ii: :i: :t: :+: :1r: :.1: :.t: :i: ii: :i:
`!fi::i::l j: Fr I F:i:r
`/
`
`ntstrnfrn
`APf{ e r 200t
`rflnffffi0t
`
`N7>N
`
`t_
`
`hI
`
`ilaro €
`
`i:::'i iii:ii:l:"i::;ilii i:r'i i.- l'iil{:,i t-.i t:-:i:::i.1iiriF: r::,ii:;j,_}i,.i i"i:iil
`
`-
`
`pdodry chimed
`35 USC ll9condiuons met O
`Verified and
`
`yes
`
`ni,
`no
`
`tr
`
`FILING FEE
`RECEIVED
`
`ii 1
`ir .i. .: .
`
`ttti
`._-. '.,.i ,. ,.,, :,.,
`
`ATTORNEY'S
`oocrer no.
`{, fr, :i r..t r..t iri i:; :i. .,
`
`l::.
`rl
`
`!.'i
`
`'!
`
`iili::: i"ii::iirli.:i-i .!.;."ii.j :::li:'";"i-t j. i::.tjr.:1i.... l..j"l- j" i
`
`irjlr::.i'.ii:il'rl | :1. t:..: i:ii::i
`
`t1i--i1J
`!"1 !_ l-t !
`
`til-ii'l
`l !r
`
`I r'"1 i
`
`U.S. OEPT. OF COMM./ PAT & TM-PTO-436L (Rev.12-9
`
`at
`ah
`ut
`
`Goo
`
`5
`
`.
`
`NOTICE OF ALLOWANCE MAILED
`
`:t?'f.?lfi^1,'l?iHl "' [s -"" ]tri,,,
`f -J a '77r,"
`
`|SSUE FEE €jrl
`Datg Pald "
`Amount Due
`/*/o ,()o
`l? f t.lTT
`
`Label
`Area
`
`rfd
`
`r 1.a^ 1s-qn 71t ;:#.''l"\
`23
`
`Aoplications Examiner
`CLAIMS ALLOWED
`Total Claims
`Print Claim
`
`I
`
`DRAWING
`
`A0{friNEH
`27.i'1--
`
`PREPARED FOR ISSUE
`
`Examiner
`
`Sheets Dnruo
`
`Figs. Drwg.n
`t8
`ISSUEBATCH yl V \
`NUMBER 11 3
`
`/
`
`Print Fig.
`
`ll
`
`WARNING: The information disclosed herein may be restlcte-d. Unauthorized disclosure may be prohi:ited
`by the United States Code Tltle 35, Se$idis 122, 181 and 368. Possession outside the U.S.
`Patent & Trademark Otfice is restrictgdte authorized employees and contractors only.
`
`Form PTO-436A
`(Rev. 8/92)
`
`4r
`4:i;;
`'litt
`
`,'ti;J::"*+ i',: f,fJ.f ",
`'rhlsq
`, tit'
`
`**f,]::"
`
`f o'rrnat nmilri$NR$ L*$hb) $st*
`
`*:" t
`
`(FACE)
`
`Page 1 of 364
`
`GOOGLE EXHIBIT 1026
`
`

`

`I
`
`.j.
`
`.ll
`
`L-]-
`
`i\:
`\,
`ITENT APFLICATION
`PATENT APFLICNTIbN
`' lllllllllillllllllllllffiiffiffillillillllt'!i'
`08885814
`
`CONTENTS
`
`i
`
`:
`
`APPROVED FOI]I LICEN$E
`
`*i,,1$$l
`
`Da.te
`Received ,
`.or
`Mailed
`
`r/v
`
`,rt');!s:,,n;
`
`,Q {)*-"
`papers.ld ffb lTp:
`"t't,t:tt-t/{ l;
`
`tt - t;'-.qr-l
`AuJtl
`t-,
`,lU
`,Y
`
`i-s*ttryt; *a x-qq
`
`/-r\D-'9-,-tL)
`ii--t;
`\\
`ll,
`il
`- r\ l\
`lt)
`ilj
`til!
`,'
`
`ffi
`
`,.
`
`5. r.d0
`
`4.
`
`5.
`
`6.
`
`7.
`
`8.
`
`9.
`
`10.
`
`11.
`
`12.
`
`13.
`
`14.
`
`15.
`
`16.
`
`17.
`
`18.
`
`19.
`
`20.
`
`22.
`
`23.
`
`24.
`
`25.
`
`26.
`
`27.
`
`28.
`
`29.
`
`30.
`
`2.191.
`
`\
`
`(FRONT)
`
`Page 2 of 364
`
`

`

`pArENr ApprrcArroN sERrAr
`
`llffllu|lqlulfll|!ryilllfllfil
`
`U.S. DEPARTI{ENT OF COMMERCE
`PATEM A}TD TRADEMANT OFFICE
`FEE RECORD SEEET
`
`qM
`
`08/1t/1997 TI}ADET 00000041 00096814
`01 FC:101
`770.00 0p
`0? FCrl0P
`480.00 0p
`03 FC:103
`740.00 0F
`
`Pt0-1556
`$187)
`
`qrq
`
`Page 3 of 364
`
`

`

`Snnn and Brnny lr,p
`6300 Columbia Center
`le, Washingt on 98104-7 A92
`Phone (206) 622-4900
`Fax (206) 682-6031
`
`Express Mail Certifioate No.:
`DocketNo.:
`Date:
`
`nM{fi229266US
`66100s.s12
`March 7,1997
`
`MESD
`MAR - 7 tggT
`
`BOXPATENT
`ASSISTA]YT CO
`PATENTS
`20 1I JEFFERSON DAVIS HIGI{WAY
`WASHINGTONDC 20231
`
`Sir:
`
`William B. Dolan,
`
`Transmitted herewith for filing is the patent application of:
`John J. Messerly, George E. Heidorn, Stephen D. Richardson,
`and Karen Jensen
`INFORMATION RETRIEVA L VTil,IzJNG SEMANTIC
`REPRESENTATION OF TEXT
`Enclosed are:
`!! sheets of informal drawings (Figs. 1-18).
`An assignment of the inventionto: Microsoft Corporation, a corporation of the State of Washington.
`A Declaration and Power of Attorney.
`A verified statement to establish small entity status under 37 C.F.R. 1.9 and 37 C.F.R. I.27 .
`A certified copy of Application No. , filed , from which priority is claimed, .
`The filing fee has been caloulated as shown below.
`Filed without fee or formal papers.
`
`Inventors:
`
`For:
`
`txl
`txl
`txl
`fl
`tl
`Xt
`tl
`
`X'or:
`
`Utilltv Fee
`
`Total Claims
`
`Indenendent Claims
`( ) MuftipleDependent
`Claim Presented
`
`ASSIGI\MEIIT
`
`No. Fileil
`
`No. Extra
`
`Small Entitv
`Rate
`
`Fee 'to
`
`$ 385
`
`54
`
`9
`
`34
`
`6
`
`x1l
`
`x40
`
`+ 130
`
`+40
`
`$
`
`$
`
`$
`
`$
`
`TOTAL $
`
`OtherThanA
`SmallEntitv
`Rate
`
`Fee
`
`$ 770
`
`$ 748
`
`$ 480
`
`$ -0-
`
`$40
`
`x22
`
`x80
`
`+ 260
`
`+40
`
`TOTAL $ 2.038
`
`or
`or
`
`or
`
`or
`
`or
`
`or
`
`or
`
`.or
`
`tl
`rxl
`txl
`
`Please charge my Deposit Account No. 1 9- 1090 in the amount of $-. A duplioate copy of this sheet is enclosed.
`A check in the amount of $ 2.038 is enclosed.
`The Assistant Commissioner is hereby authorized to charge
`communication or credit any overpayment to Deposit Acoount No.
`Xl,qrry additional filing fees required under 3? C.F.R. 1. 1 6.
`[X] Any patent applioation processing fees under 37 C.F.R. 1.
`
`payment of the following fees associated with this
`1P-1090. A duplicate copy ofthis sheet is enclosed.
`
`Stsved D.
`Registration No.
`
`l{rl
`
`:'rE
`
`l's
`
`:. 1
`
`a
`
`Page 4 of 364
`
`

`

`UNITED STATES PATENT AND TRADEMAITK OFFICE
`
`PATENT
`
`John J. Messerly, George E. Heidorn, Stephen D. Richardson,
`
`William B. Dolan, and Karen Jensen
`
`March 7, 1997
`
`INFORMATION RETRIEVAL UTILIZING SEMANTIC
`REPRESENTATION OF TEXT
`
`Docket No.
`
`661005.512
`
`Date
`
`March 7, 1997
`
`Box Patent Application
`Assistant Commissioner for Patents
`2011 Jefferson Davis Highway
`Washington, DC 20231
`
`", :'
`
`dfi6
`
`ll't,!
`
`LI
`
`CERTIFICATE OF MAILING BY "E)GRESS MAIL''
`
`Sir:
`
`I hereby certi$ that the enclosures listed below are being deposited with the
`United States Postal Service "EXPRESS MAIL Post Office to Addressee" service under 37
`C.F.R. $ 1.10, Mailing Label Certificate No. EM41722}2661J5, on March 7, lgg7, addressed
`to Box Patent Application, Assistant Commissioner for Patents, 2OIl Jefferson Davis
`Highway, Washington, DC 2023L
`
`Resp ectfu lly submitted,
`
`SEED and BERRY t t P
`
`SDL:ljj
`
`Enclosures:
`Postcard
`Check No. 45395 for $2,038
`Form PTO-I082 (+ copy)
`Specification, Claims, Abstract (37)
`i8 Sheets of Informal Drawings (Figs. 1-18)
`Declaration and Power of Attorney
`Form PTO-1595
`Assignment
`Information Disclosure Statement,
`Form PTO- 1449, References Cited (3)
`
`c:\sdl\1952
`
`Page 5 of 364
`
`

`

`t, 11 K,00 *lol-A
`
`Express Mail No. 8M417229266U5
`
`INFORMATION RETRIEVAL UTILIZING SEMANTIC
`
`REPRESENTATION OF TEXT
`
`TECHNICAL FIELD
`The present invention relates to the field of information retrieval, and,
`
`5
`
`more specifically, to the field of information retrieval tokenization.
`
`BACKGROLIND OF THE INVENTION
`Information retrieval refers to the process of identiSing occlilrences in a
`target document of words in a query or query document. Information retrieval can be
`
`gainfirlly applied in several situations, including processing explicit user search queries,
`identifying documents relating to a particular document, judging the similarities of two
`documents, extacting the features of a document, and summarizing a document.
`Information refiieval typically involves a two-stage process: (1) In an
`indexing stage, a document is initially indexed by (a) converting each word in the
`document into a series of characters intelligible to and differentiable by an information
`retrieval engine, called a "token" (known as "tokenizing" the document) and
`(b) creating an index mapping from each tokw to the location in the document where
`the token occurs. (2) In a query phase, a query (or query document) is similarly
`tokenized and compared to the index to identify locations in the document at which
`
`tokens in the tokenized query occur.
`Figure I is an overview data flow diagram depicting the information
`retrieval process. In the indexing stage, a tnget document 111 is submitted to a
`tokenizer 112. The target document is comprised of a number of strings, such as
`sentences, each occu:ring at a particular location in the target document. The strings in
`the target document and their word locations are passed to a tokenizet 120, which
`converts the words in each string into a series of tokens that are intelligible to and
`distinguishable by an information retrieval engine 130. An index construction portion
`131 of the information retrieval engine 130 adds the tokens and their locations to an
`index 140. The index maps each unique token to the locations at which it occurs in the
`target document. This process may be repeated to add a number of different target
`
`10
`
`15
`
`20.
`
`25
`
`30
`
`F-n
`
`tl
`
`in'
`
`-4#,d
`
`Page 6 of 364
`
`

`

`2
`
`documents to the index, if desired. If the index 140 thus represents the text in a number
`of target documents, the location information preferably includes an indication of, for
`
`each location, the document to which the location coresponds.
`
`In the query phase, a textual query 112 is submiued to the tokenizer 120.
`5 The query may be a single string, or sentence, or may be an entire document comprised
`of a number of strings. The tokenizer 120 converts the words in the text of the query
`112 into tokens in the same manner that it converted the words in the target document
`into tokens. The tokenizer 120 passes these tokens to an index retrieval portion 132 of
`the information retrieval engine 130. The index retrieval portion of the information
`10 retrieval engine searches the index 140 for occurrences of the tokens in the target
`document. For each of the tokens, the index retrieval portion of the information
`retrieval engine identifies the locations at which the token occurs in the target
`document. This list of locations is returned as the query result 113.
`Conventional tokenizers typically involve superfi cial transformations of
`15 the input text, such as changing each upper-case character to lower-case, identiffing the
`individual words in the input text, and removing suffixes from the words. For example,
`
`'<:5#r''''
`
`20
`
`.^"-i:,.-: ,. i r..
`
`: ..,' .'
`
`'
`
`25
`
`a conventional tokenizer might convert the input text string
`
`The father is holding the baby.
`
`into the following tokens:
`
`,h"
`
`father
`
`is
`
`hold
`
`the
`
`baby
`
`f5!
`1?!
`\r{
`ili
`
`{r,
`i*i
`
`Ii
`
`il"'?
`
`i*J
`t-I
`
`'tS n
`
`Page 7 of 364
`
`

`

`3
`
`This approach to tokenization tends to make searches based on it overinclusive of
`occunences in which senses of words are different than the intended sense in the query
`text. For example, the sample input text string uses the verb "hold" in the sense that
`means "to support or grasp." However, the token "hold" could match uses of the word
`"hold" that mean "the cargo area of a ship." This approach to tokenization also tends to
`be overinclusive of occurrences in which the words relate to each other differently than "
`the words in the query text. For example, the sample input text string above, in which
`o'father" is the subject of the word "held" and o'baby" is the object, might match the
`sentence "The father and the baby held the toy," in which "bfuy" is a subject, not an
`object. This approach is further underinclusive of occurrences that use a different, but
`semantically related word in place of a word of the query text. For example, the input
`text string above would not match the text string "The parent is holding the baby."
`Given these disadvantages of conventional toke rization, a tokenizer that .E*"M*
`IL
`semantic relationships implicit in the tokenized text would have significant utility.
`
`SUMMARY OF THE INVENTION
`The invention is directed to performing information retrieval using an
`improved tokenizer that parses input text to identifi' logical forms, then expands the
`logical forms using hypemyms. The invention, when used in conjunction with
`conventional information retrieval index construction and querying, reduces the number
`of identified occurrences for which different senses were intended and in which words
`bear different relationships to each other, and increases the number of identified
`occuffences in which different but semantically related terms are used.
`The invention overcomes the problems associated with conventional
`tokenization by parsing both indexed and query text to perform lexical, syntactic, and
`
`semantic analysis of this input text. This parsing process produces one or more logical
`forms, which identiS words that perform primary roles in the query text and their
`intended senses, and that further identifu the relationship between those words. The
`
`parser preferably produces logical forms that relate the deep subject, verb, and deep
`object of the input text. For example, for the input text "The father is holding the
`baby," the parser might produce the following logical form:
`
`10
`
`15
`
`20
`
`25
`
`30
`
`#a
`
`ii.
`Alat-
`
`i-,a
`
`'c
`
`i.-Jt
`
`ilw
`irrri
`e--ti
`ari
`'4
`
`--1"
`
`I
`
`; r
`
`
`
`Page 8 of 364
`
`

`

`_\,.
`
`*-*.d.d
`
`deep subject
`
`father
`
`verb
`hold
`
`deep object
`
`baby
`
`.j*q
`
`i"-r,
`
`iii|
`
`'1 i
`Xrn
`
`f""i
`
`,fii
`
`10
`
`15
`
`The parser fi.rther ascribes to these words the particular senses in which they are used in
`
`the input text.
`
`Using a digital dictionary or thesaurus (also known as a "linguistic
`knowledge base") that identifies, for a particular sense of a word, senses of other words
`
`that are generic terms for the sense of the word ("hypernyms"), the invention changes
`
`the words within the logical forms produced by the parser to their hypernyms to create
`
`additional logical forms having an overall meaning that is hypernymous to the meaning
`
`of these original logical forms. For example, based on indications from the dictionary
`that a sense of o'parent" is a hypernym of the ascribed sense of "father," a sense of
`"touch" is a hypernym of the ascribed sense of "hold," and a sense of o'child" and sense
`of "person" are hypernyms of the ascribed sense of "baby," the invention might create
`
`additional logical forms as foliows:
`
`deep subiect
`
`parent
`
`father
`
`parent
`
`father
`
`parent
`
`father
`
`parent
`
`father
`
`parent
`
`father
`
`paxent
`
`verb
`
`hold
`
`touch
`
`touch
`
`hold
`
`hold
`
`touch
`
`touch
`
`hold
`
`hold
`
`touch
`
`touch
`
`deep object
`
`baby
`
`baby
`
`baby
`
`I child
`child
`
`child
`
`child
`
`person
`
`person
`
`person
`
`person
`
`Page 9 of 364
`
`

`

`.
`
`5
`
`The invention then transforms all of the generated logical forms into
`tokens intelligible by the information retrieval system that compares the tokenized
`
`query to the index, and submits them to the information retrieval system.
`
`5 BRIEF DESCRIPTION OF THE DRAWINGS
`Figure I is an overview data flow diagram depicting the information.
`retrieval process.
`Figure 2 is a highJevel block diagram of the general-purpose computer
`
`10
`
`system upon which the facility preferably operates.
`Figure 3 is an overview flow diagram showing the steps preferably
`performed by the facility in order to construct and access an index semantically
`
`representing the target documents.
`Figure 4 is a flow diagram showing the tokenize routine used by the
`facility to generate tokens for an input sentence.
`Figure 5 is a iogical form diagram showing a sample logical form.
`Figure 6 is an input text diagram showing an input text fragment for
`which the facility would construct the logical form shown in Figure 5.
`Figure 7A is a linguistic knowledge base diagram showing sample
`hypernym relationships identified by a linguistic knowledge base.
`Figure 78 is a linguistic knowledge base diagram showing the seiection
`
`15
`
`20
`
`of hypernyms of the deep subject of the primary logical form, man (sense 2).
`Figure 8 is a linguistic knowledge base diagram showing the selection of
`
`hypemyms of the verb of the primary logical form, kiss (sense 1).
`Figures 9 and 10 are linguistic knowledge base diagrams showing the
`25 selection of hypernyms of the deep object of the primary logical form, pig (sense 2).
`Figure 11 is a logical form diagram showing the expanded logical form.
`Figure L2 is achart diagram showing the derivative logical forms created
`
`by permuting the expanded primary logical form.
`Figure 13 is an index diagram showing sample contents of the index.
`Figure 14 is a logical form diagram showing the logical form preferably
`
`30
`
`constructed by the facility for the query "man kissing horse."
`
`fi
`fli]
`,i"Sr/&
`,hirdffi
`i*i,
`;sq
`uJ
`il#
`.|;-$l
`u,
`
`#f
`
`EN
`
`.r'
`
`,;
`
`1 !
`
`
`
`Page 10 of 364
`
`

`

`6
`
`Figure 15 shbws the expansion of the primary logical form using
`
`hypernyms.
`
`Figure 16 is a linguistic knowledge base diagram showing the selection
`
`of hypernyms of the deep object of the query logical form, horse (sense 1).
`
`Figure 17 is apartial logical form diagram showing apafiral logical form
`
`corresponding to a partial query containing only a deep subject and a verb.
`Figure 18 is a partial logical form diagram showin g apartral logical form
`
`corresponding to a partial query containing only a verb and a deep object.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`DETAILED DESCRIPTION OF THE INVENTION
`The present invention is directed to performing information retrieval
`utilizing semantic representation of text. When used in conjunction with conventional
`
`information retrieval index construction and querying, the invention reduces the number
`of identified occurrences for which different senses were intended and in which words
`bear different relationships to each other, and increases the number of identified
`occurrences in which different but semantically related terms are used.
`
`In a preferred embodiment, the conventional tokenizer shown in Figure 1
`is replaced with an improved information retrieval tokenization facility ("the facility")
`that parses input text to identify logical forms, then expands the logical forms using
`hypernyms. The invention overcomos the problems associated with conventional
`tokenization by parsing both indexed and query text to perform lexical, syntactic, and
`semantic analysis of this input text. This parsing process produces one or more logical
`forms, which identiff words that perform primary roles in the query text and their
`intended senses, and that further identifu the relationship between those words. The
`parser preferably produces logical forms that relate the deep subject, verb, and deep
`object of the input text. For example, for the input text o'The father is holding the
`baby," the parser might produce logical form indicating the deep subject is "father," the
`verb is oohold," and the deep object is "baby." Because transforming input text into a
`logical form distills the input text to its fundamental meaning by eliminating modifiers
`and ignoring differences in tense and voice, transforming input text segments into the
`logical forms tends to unifr the many different ways that may be used in a natural
`
`rii
`
`ri-1t*-\
`
`Page 11 of 364
`
`

`

`7
`
`language to express the same idea. The parser further identifies the particular senses of
`
`these words in which they are used in the input text.
`Using a digital dictionary or thesaurus (also known as a "linguistic
`knowledge base") that identifies, for a particular sense of a word, senses of other words
`
`that are generic terms for the sense of the word ("hypernyms"), the invention changes
`the words within the logical forms produced by the parser to their hypernyms to create
`
`additional logical forms having an overall meaning that is hypemymous to the meaning
`of these original logical forms. The invention then transforms all of the generated
`logical forms into tokens intelligible by the information retrieval system that compares
`
`10
`
`the tokenized query to the index, and submits them to the information retrieval system.
`
`Figure 2 is a high-level block diagram of the general-purpose computer
`
`system upon which the facility preferably operates. The computer system 200 contains
`
`a central processing unit (CPU) 210, input/output devices 220, anda computer memory
`
`(memory) 230. Among the input/output devices is a storage device 221, such as a hard
`disk drive. The input/output devices also include a computer-readable media drle 222,
`r,vhich can be used to install software products, including the facility which are provided
`
`on a computer-readable medium, such as a CD-ROM. The input/output devices further
`include an Internet connection 223 enabling the computer system 200 to communicate
`with other computer systems via the Internet. The computer programs that preferably
`comprise the facility 240 reside in the memory 230 and execute on the CPU 210. The
`facility 240 includes a rule-based parser 241 for parsing input text segments to be
`tokenized in order to produce logical forms. The facility 240 further includes a
`linguistic knowledge base 242 used by the parser to ascribe sense numbers to words in
`the logical form. The facility further uses the linguistic knowledge base to identiff
`hypemyms of the words in the generated logical forms. The memory 230 preferably
`also contains an index 250 for mapping from tokens generated from the target
`documents to locations in the target documents. The memory 230 also contains an
`information retrieval engine ("IR engine") 260 for storing tokens generated from the
`target documents in the index 250, and for identiffing in the index tokens that match
`tokens generated from queries. While the facility is preferably implemented on a
`
`15
`
`20
`
`25
`
`30
`
`rTt
`
`?nd
`;"'!j
`
`Page 12 of 364
`
`

`

`8
`
`computer system configured as described above, those skilled in the art wili recognize
`that it may also be implemented on computer systems having different configurations.
`Figwe 3 is an overview flow diagram showing the steps preferably
`performed by the facility in order to construct and access an index semantically
`representing the target documents. Briefly, the facility first semantically indexes the
`target documents by converting each sentence or sentence fragment of the target
`document into a number of tokens representing an expanded logical form portraying the
`relationship between the important words in the sentence, including hypernyms having
`similar meanings. The facility stores these "semantic tokens" in the index, along with
`the location in the target documents where the sentence occurs. After all of the target
`documents have been indexed, the facility is able to process information retrieval
`queries against the index. For each such query received, the facility tokenizes the text
`of the query in the silme way it tokenized. sentences from the target documents -- by
`converting the sentence into semantic tokens together representing an expanded logical
`form for the query text. The facility then compares these semantic tokens to the
`semantic tokens stored in the index to identi$r locations in the target documents for
`which these semantic tokens have been stored, and ranks the target documents
`containing these semantic tokens in the order of their relevance to the query. The
`facility may preferably update the index to include semantic tokens for new tarset
`documents at any time.
`
`Refening to Figure 3, in steps 301-304, the facility loops through each
`sentence in the target documents. In step 302, the facility invokes a routine to tokenize
`the sentence as shown in Figure 4.
`Figure 4 is a flow diagram showing the tokenize routine used by the
`facility to generate tokens for an input sentence or other input text segment. In step
`401, the facility constructs a primary logical form from the input text segment. As
`discussed above, a logical form represents the fundamental meaning of a sentenoe or
`sentence fragment. The logical forms are produced by applying the parser 241
`(Figure 2) to subject the input text segment to a syntactic and semantic parsing process.
`For a detailed discussion of the construction of logical forms representing an input text
`
`10
`
`15
`
`2A
`
`25
`
`30
`
`d.--
`ls
`'sh
`
`ll
`
`;*
`
`;,{ v
`
`,-\
`
`Page 13 of 364
`
`

`

`9
`
`string, refer to U.S. Patent Application No. 08/674,610, which is hereby incorporated by
`
`reference.
`
`The logical form used by the facility preferably isolates the principal
`verb of the sentence, the noun that is the real subject of the verb ("deep subject") and
`
`the noun that is the real object ofthe verb ("deep object"). Figure 5 is a logical form
`diagram showing a sample primary logical form. The logical form has three elements:
`a deep subject element 510, a verb elemett 520, and a deep object element 530. It can
`be seen that the deep subject of the logical form is sense 2 of the word "man." The
`sense number indicates, for words having more than one sense, the particular sense
`
`10
`
`15
`
`ascribed to the word by the parser as defined by the linguistic knowledge base used by
`the parser. For example, the word "man" could have a first sense meaning to supply
`with people and a second sense meaning adult male person. The verb of the logical
`form is a first sense of the word "kiss." Finally, the deep object is a second sense of the
`word "pig." An abbreviated version of this logical form is an ordered triple 550 having
`as its first element the deep subject, as its second element the verb, and as its third
`
`element the deep object:
`
`(man, kiss, pig)
`
`20
`
`The logical form shown in Figure 5 characterizes a number of different
`. sentences and sentence fragments. For example, Figure 6 is an input text diagram
`
`i1"*
`'*w
`tli t
`
`gt
`
`\rri'
`
`.W
`
`showing an input text segment for which the facility would consffuct the logical form
`shown in Figure 5. Figure 6 shows the input text sentence fragment "man kissing a
`pig." It can be seen that this phrase occurs at word number 150 of document 5,
`occupying word positions 150, 151,152, and 153. When the facility is tokenizing this
`input text fragment, it generates the logical form shown in Figure 5. The facility would
`
`also generate the logical form shown in Figure 5 for the following input text segments:
`
`25
`
`tI
`
`I
`
`Page 14 of 364
`
`

`

`10
`
`The pig was kissed by an unusual man.
`
`The man will kiss the largest pig.
`
`Many pigs have been kissed by that man.
`
`i'---
`fn
`
`rai
`ixtl
`
`\l
`. frt
`ttl?!i6
`. aii
`
`As discussed above, because transforming input text into a logical form distills the input
`text to its fundamental meaning by eliminating modifiers and ignoring differences in "'
`tense and voice, transforming input text segments into the logical forms tends to uniry
`the many different ways that may be used in a natural language to express the same
`
`idea.
`
`10
`
`l5
`
`20
`
`25
`
`Returning to Figure 4, after the facility has constructed the primary
`logical form from the input text, such as the logical form shown in Figure 5, the facility
`continues in step 402 to expand this primary logical form using hypernyms. After step
`
`402, the tokenized routine returns.
`As mentioned above, a hypernym is a genus term that has an "is a"
`relationship with a particular word. For instance, the word "vehicle" is a hypernym of
`the word o'automobile." The facility preferably uses a linguistic knowledge base to
`identify hypernyms of the words in the primary logical form. Such a linguistic
`knowledge base typically contains semantic links identiffing hypernyms of a word.
`Figure 7A is a linguistic knowledge base diagram showing sample
`hypernym relationships identified by a linguistic knowledge base. It should be noted
`that Figure 7A, like the linguistic knowledge base diagrams that follow, has been
`simplified to facilitate this discussion, and omits information commonly found in
`linguistic knowledge bases that is not directly relevant to the present discussion. Each
`
`ascending affow in Figure 7A connects a word to its hypernym. For example, there is
`an uilrow connecting the word man (sense 2) 7tl to the word person (sense l) 7I4,
`indieating that person (sense 1) is a hypemym of man (sense 2). Conversely, man (sense
`
`2) is said to be a "hyponym" of person (sense 1).
`
`In identiffing hypernyms with which to expand the primary logical form,
`the facility selects one or more hypemyms for each word of the primary logical form
`based upon the oocoherency" of the hypemyms' hyponyms. By selecting hypernyms in
`
`30
`
`Page 15 of 364
`
`

`

`11
`
`this manner, the facility generalizes the meaning of the logical form beyond the
`meaning of the input text segment, but by a controlled amount. For a particular word of
`a primary logical form, the facility first selects the immediate hypernym of the word of
`the primary logical form. For example, with reference to Figure 7A, starting with man
`(sense 2) 7I1 which occurs in the primary logical form, the facility selects its
`hypemym, person (sense l) 714. The facility next bases its determination of whether to
`also select the hypemym of person (sense l) 714, animal (sense 3) 715, on whether
`person (sense 1) 714 has a coherent hyponym set with respect to the starting word man
`(sense 2) 7ll. Person (sense I) 7L4 has a coherent hyponym set with respect to man
`(sense 2) 7ll if a large number of hyponyms of all senses of the word person other than
`the starting word (sens e 2) 7l1 bear at least a threshold level of similarity to the starting
`word man (sense 2) 7ll.
`In order to determine the level of similarity between the hyponyms of the
`different senses of the hypernym, the facility preferably consults the linguistic
`knowledge base to obtain similarity weights indicating the degree of similarity between
`these word sentences. Figure 78 is a linguistic knowledge base diagram showing
`similarity weights between man (sense 2) and other hyponyms of person (sense 1) and
`person (sense 5). The diagram shows that the similarity weight between man (sense 2)
`and woman (sense 1) is ".0075"; between man (sense 2) and child (sense 1) is ".0029";
`between man (sense 2) andviilain (sense 1) is ".0003"; and between man (sense 2) and
`lead (sense 7) is ".0002". These similarity weights are preferably calculated by the
`linguistic knowledge base based on a network of semantic relations maintained by the
`linguistic knowledge base between the word sense pairs. For a detailed discussion of
`calculating similarity weights between word sense pairs using a linguistic knowledge
`base, refer to U.S. Patent Application No.08ho4,?
`eAmO*SZ*i, entitled "DETERMINING SIMILARITY BETWEEN WORDS," which
`
`flsi
`ied
`
`rff.
`
`{.iJ
`$E*
`ji i8
`
`Lri"--
`f6{ss
`E
`
`tgig
`
`r$EB
`
`10
`
`15
`
`20
`
`4;s,'
`
`irb\
`
`is hereby incorporated by reference.
`In order to determine whether the set of hyponyms is coherent based on
`these similarity weights, the facility determines whether a threshold number of the
`30 similarity weights exceed a threshold similarity weight. While the preferred threshold
`
`\
`{'**-
`
`II
`
`1
`
`Page 16 of 364
`
`

`

`iJ-
`
`[*"i:
`
`I2
`
`percentage is 90Yo, the threshold percentage may preferably be adjusted in order to
`optimize the performance of the facility. The similarity weight threshold may also be
`configured to optimize the performance of the facility. The threshold similarity weight
`is preferably coordinated with the overall distribution of similarity weights provided by
`the linguistic knowledge base. Here, the use of a tlueshold of ".0015" is shown. The
`
`facility therefore determines whether at least 90o/o of the similarity weights between the
`starting word and the other hyponyms of ali of the senses of the hypemym are at or
`above the ".0015" threshold similarity weight. It can be seen from Figure 78 that this
`condition is not satisfied by the hyponyms of person with respect to man (sense 1):
`
`while the similarity weights between man (sense 1) and woman (sense 1) and between
`man (sense 1) and child (sense 1) are greater than ".0015", the similarity weights
`between man (sense 1) and villain (sense 1) and between man (sense 1) and lead (sense
`7) are less than ".0015". The facility therefore does not select the further hypernym
`animal (sense 3) 715, or any hypemyms of animal (sense 3). As a result, only the
`hypernym person (sense l) 714 is selected to expand the primary logical form.
`To expand a primary logical form, the facility also selects hypernyms of
`the verb and deep object of the primary logical form. Figure 8 is a linguistic knowiedge
`base diagram showing the selection of hypernyms of the verb of the primary logicai
`form, kiss (sense 1). It can be seen from the diagram that touch (sense 2) is the
`hypernym of kiss (sense 1). The diagram also shows the similarity weights between
`kiss (sense 1) andthe otherhyponyms of all of the senses of touch. The facility first
`selects the immediate hyperrrym of the verb of the primary logical form kiss (sense l),
`touch (sense 2). To deterlnine whether to select the hypernym of touch (sense 2),
`interact (sense 9), the facility determines how many similarity weights between kiss
`(sense 1) and the other hyponyms of all of the senses of touch are at least as large as the
`threshold sirnilarity weight. Because only two of these four similarity weights ate at
`least as largd as the ".0015" threshold similarity weight, the facility do

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket