throbber
United States Patent [191
`Turtle
`
`[111
`[45]
`
`llllll|||||l|ll||lllllllllllllllllllllllllllllllllllllllllllllllllllllllll
`
`US005265065A
`Patent Number:
`Date of Patent:
`
`5,265,065
`Nov. 23, 1993
`
`5,123,103 6/1992 Ohtaki et al. ..................... ., 395/600
`
`OTHER PUBLICATIONS
`
`'
`
`Turtle et al., “Evaluation of an lnterence Networ
`k-Based Retrieval Model", Transactions on Information
`Systems, Association for Computer Machinery, vol. 9,
`No. 3, pp. 187-223 (Jul. 1991).
`Croft et a1, "Interactive Retrieval of Complex Docu
`ments“, Information Processing and Management, vol.
`26, No. 5, pp. 593—613 (1990).
`
`(List continued on next page.)
`
`Primary Examiner-Thomas C. Lee
`Assistant Examiner—-Wayne Amsbury
`Attorney, Agent, or Firm-—-Kinney & Lange
`[57]
`ABSTRACT
`A computer implemented process for creating a search
`query for an information retrieval system in which a
`database is provided containing a plurality of‘ stopwords
`and phrases. A natural language input query de?nes the
`composition of the test of documents to be identi?ed.
`Each word of the natural language input query is com
`pared to the database in order to remove stopwords
`from the query. The remaining words of the input query
`are stemmed to their basic roots, and the sequence of
`stemmed words in the list is compared to phrases in the
`database to identify phrases in the search query. The
`phrases are substituted for the sequence of stemmed
`words from the list so that the remaining elements,
`namely the substituted phrases and unsubstituted
`stemmed words, form the search query. The completed
`search query elements are query nodes of a query net
`work used to match representation nodes of a document
`network of an inference network. The database includes
`as options a topic and key database for ?nding numeri
`cal keys, and a synonym database for ?nding synonyms,
`both of which are employed in the query as query
`nodes.
`
`46 Claims, 8 Drawing Sheets
`EXAMPLE
`
`1541
`
`175]
`[73]
`
`[21]
`[22]
`[51]
`[52]
`
`[53]
`
`[56]
`
`METHOD AND APPARATUS FOR
`INFORMATION RETRIEVAL FROM A
`DATABASE BY REPLACING DOMAIN
`SPECIFIC STEMMED PHASES IN A
`NATURAL LANGUAGE TO CREATE A
`SEARCH QUERY
`Inventor: Howard R. Turtle, Woodbury, Minn,
`Assignee: West Publishing Company, Eagan,
`Minn.
`Appl. No.: 773,101
`Filed:
`Oct. 8, 1991
`
`Int. Cl.5 ....................... .. G06F 15/40; G06F 7/10
`US. Cl. ........................... .. 395/600; 364/D1G. 1;
`364/2S2.1; 364/283.3; 1164/2228]; 364/225.3;
`364/226.4; 364/225.4; 364/226.6; 364/260.4
`Field of Search ............. .. 364/200, 300, 419, 513;
`395/600
`
`References Cited
`U.S, PATENT DOCUMENTS
`4,241,402 12/1980 Mayper, Jr. et a1.
`4,270,182 5/1981 Asija .................. ..
`4,358,824 11/1982 Glickman et a1.
`4,384,329 5/1983 Rosenbaum et a1.
`
`364/200
`364/900
`.. 364/200
`364/419
`
`r . . ., 364/900
`4,471,459 9/1984 Dickinson et a1, . . . r . .
`364/900
`4,499,553 2/1985 Dickinson et a], ,.
`4,554,631 11/1985 Reddington ...................... ., 364/300
`4,580,218 4/1986 Raye .................................. .. 364/300
`4,650,848 6/1987 Schramm ....... .,
`364/513
`4,688,195 8/1987 Thompson et al
`364/300
`4,706,212 11/1987 Toma ............. ..
`.. 364/900
`4,787,035 11/1988 Bourne .............................. ., 364/300
`4,823,306 4/1989 Barbic et al ...................... .. 364/900
`4,839,853 6/1989 Deerwester et a1.
`364/900
`
`4,862,408 8/1989 Zamora . . . . . . . . r . , . . r .
`
`. . . 1. 364/900
`
`4,868,750 9/1989 Kucera et a1.
`4,914,590 4/1990 Loatman et a1. .
`4,918,588 4/1990 Barrett et a1,
`4,931,935 6/1990 Ohira et a1. ,,,, ,.
`
`364/419
`.. 364/419
`364/200
`364/419
`
`4,972,349 11/1990 Kleittberger . . , . . . . . .
`
`. . . .. 364/900
`
`4,974,191 11/1990 Amirghodsi et a1.
`
`364/900
`
`. , . ,. 364/900
`4,991,087 2/1991 Burkowski et a1, . , . r r
`364/419
`5,099,425 3/1992 Kanno: Yuji et a1. ..
`395/600
`5,109,509 4/1992 Katayama et al. N
`5,117,349 5/1992 Tir?ing et a1. .................... .. 395/600
`
`50
`
`STEPS
`
`Enter Natural Language Query l
`
`=
`
`Purse words unto L151
`
`Stern Words
`
`-
`
`What is the liobrtit at the United
`States under the Fe rat ‘(on
`CtotmsAct 101 10101025 untamed
`to
`an ind
`t
`es of
`I
`en
`du contract
`actor worhin
`U V
`d 5
`ct:
`2 rule
`tote
`an
`nc n
`iv
`\f i
`Mummers ’
`I
`5°
`Qntroctar
`Contract
`Go:
`em
`
`i
`.
`,
`Ctutm
`Lrohrt
`w 4
`State
`Injur
`éwtu
`Siam“
`o e
`lnoepeynd
`
`or
`
`42
`
`“
`
`56
`
`\I
`
`Sutmuute Phrases
`
`l
`
`-
`
`LiQDtI
`Phrase itgnlt State}
`Phrase eon Tun Ctoirn Act)
`
`m “'1
`1
`1
`It
`run I
`We“ Met» den Con rector!
`Contract
`Phrase (Aqenc Umt Stave!
`vll‘l'l
`
`Facebook, Inc. - EXHIBIT 1237
`
`001
`
`

`

`5,265,065
`Page 2
`
`OTHER PUBLICATIONS
`
`Haynes, "Designing a System for the Specialized User:
`A Case Study", Proceedings-I985 National OnIine
`MeetingLearned Information lnc., pp. 205-213, (Apr.
`30, 1985).
`Croft et al, "A Retrieval Model Incorporating Hyper
`text Links", Hyperlex ‘89 Proceedings, Association for
`Computer Machinery, pp. 213-224 (Nov. 1989).
`Turtle et a1, “Inference Networks for Document Re
`trieval", Coins Technical Report 90-07, University of
`Massachusetts (Mar. 1990).
`
`Turtle et al, “Inference Network for Document Re
`trieval“, Sigir 90, Association for Computing Machin
`ery, pp. l~24 (Sep. 1990).
`Turtle, “Inference Network for Document Retrieval",
`Ph.D. Dissertation, Coins Technical Report 90-92, Uni
`versity of Massachusetts (Oct. 1990)v
`Turtle et a], “Efficient Probabilistic Inference for Text
`Retrieval”, Riao ’91 Conference Proceedings, Recherche
`d‘lnformation Assistée par Ordinateur, Universitat
`Automa de Barcelona, Spain, pp. 644-661 (Apr. 1991).
`Porter, “An Algorithm for Suf?x Skipping“, Program,
`vol, 14, pp. 130-137 (1980).
`
`Facebook, Inc. - EXHIBIT 1237
`
`002
`
`

`

`US. Patent
`
`Nov. 23, 1993
`
`Sheet 1 of 8
`
`5,265,065
`
`d1
`
`d|-1
`
`d|
`
`GD
`
`U2
`
`r?
`
`r 3
`
`,
`
`DOCUMENT '
`
`NETWORK
`10/
`
`QUERY
`
`NETWORK
`
`12/
`
`01
`
`d1
`
`r1
`
`M
`
`in
`
`‘
`
`, d
`
`___ ____—_—_
`
`r
`c3
`
`cm
`
`q2
`
`d|1
`
`q2
`
`dl
`
`rn
`
`'3
`
`c2
`
`q1
`
`d2
`
`r2
`
`Q1
`
`Facebook, Inc. - EXHIBIT 1237
`
`003
`
`

`

`US. Patent
`
`Nov. 23, 1993
`
`Sheet 2 of 8
`
`5,265,065
`
`52 ~65: m .mvm
`
`No}
`
`39:5: >9. 9.. 9.00..
`
`
`3323 >3. .m 052 5
`
`boxcobnc a2 30:5 >
`SEE? >9.
`
`.312...
`
`mono: >33. 2 23:5:
`
`>9. v9.5.7: 92 23
`
`\
`
`3 \
`9”
`
`
`
`£052 550.28
`
`a,
`
`$5005.
`.zwogz
`mwsmiOu
`
`20m
`
`O:
`
`Facebook, Inc. - EXHIBIT 1237
`
`004
`
`

`

`US. Patent
`
`Nov. 23, 1993
`
`Sheet 3 of 8
`
`5,265,065
`
`O¢
`
`NV
`
`Eu
`
`
`
`
`
`1299.50E3532:$951
`
`«noEEM
`
`{25
`
`
`
`3.2m:59.50539.5.;
`
`Em>oo
`
`80:50
`
`539.28E220
`
`€038<
`
`9.3qu
`
`w.._&5_<xw
`
`3353:5...2:o.35oco5:3
`3:52:B2:52.2:r.2;
`E35535sowe$0335»33:853.313.:.8.o<mE_o_u
`
`
`68:80Etc::28;38228
`
`
`
`30:505?.22m
`
`:8..083...9:82:.$35
`Co<E20:8..63...32:1
`
`
`5260£295to._.
`
`
`ocwo<£23mBum“.
`385:5335.”.
`.8202SEN mam—Fm
`
`>52032.9.3
`7.32.:383organ.
`
`
`
`m.coEEm>ow
`
`.32..
`
`5035
`5.8.
`
`
`
`33335u>oEum
`
`
`
`3.8.5.2::.85
`
`005
`
`Facebook, Inc. - EXHIBIT 1237
`
`Facebook, Inc. - EXHIBIT 1237
`
`005
`
`
`
`
`
`
`
`

`

`US. Patent
`
`Nov. 23, 1993
`
`Sheet 4 of 8
`
`5,265,065
`
`dm
`
`gig
`
`2
`
`‘1
`
`dm
`
`Q
`
`gig.
`
`dm
`
`0
`
`dm
`
`r‘
`
`Q
`
`r2
`
`'3 '
`
`Facebook, Inc. - EXHIBIT 1237
`
`006
`
`

`

`US. Patent
`
`Nov. 23, 1993
`
`Sheet 5 of 8
`
`5,265,065
`
`R. .mrw
`
`32.3 :2 2022M
`
`5.62.3 62000
`
`
`.3 8:: 22.5.85 mo ESP
`
`3:3 :33“. of 05:29.6
`
`3: on 3.2 $5238 :0 2
`32% :3 2: 52:3
`
`29.6 >5 E 022 00 3:3 E=E§0E 2t 3:3 9: mo .53
`
`
`8... 3% BE .5833 500 .5...
`
`.36 $05“. :3 2: 52:8
`
`
`umoEa :ztoa of .8
`
`mo»
`
`OF
`
`Facebook, Inc. - EXHIBIT 1237
`
`007
`
`

`

`US. Patent
`
`Nov. 23, 1993
`
`Sheet 6 of 8
`
`5,265,065
`
`INPUT 'NQUERY
`NATURAL LANGUAGE
`
`{82
`STOPWORD
`DATABASE
`
`PARSE
`'
`COMPARE
`STOPWORDS
`
`so
`‘1
`SYNONYM
`DATABASE
`
`YES
`
`[00
`
`COMPARE
`PHRASES
`
`|02
`
`I08
`
`TOPIC
`a
`KEY
`SUBROUTINE
`
`B4
`
`86
`
`as
`
`92
`
`94
`
`96 /
`PHRASE
`DATABASE
`
`Em To
`DOCUMENT
`NETWORK
`
`DETERMINE
`2 FIRS'I
`
`“LONG
`
`QUERY NETWORK
`
`Facebook, Inc. - EXHIBIT 1237
`
`008
`
`

`

`US. Patent
`
`Nov.
`23, 1993
`
`Sheet 7 of 8
`
`5,265,065
`
`FROM
`PARSER
`
`MS
`COUNT TER
`SET 2
`
`SET '
`
`ADD
`
`us /
`COMPARE TERM TO
`DATABASE, FIND
`
`I20 /
`
`CALCULATE
`O.4+O.6 idfi ' tfij
`/ I22
`ACCUMULATE
`
`I24
`
`NO
`
`/l26
`/I28
`/ I30
`
`YES
`DIVIDE BY
`2
`6
`STORE
`i
`REPEAT FOR
`PICS
`OTHER TO
`/ I32
`
`U
`
`RAN K
`Pn.
`SELECT TO
`
`I34
`
`THRESHOLD
`L
`
`H6
`5
`TOPIC AND KEY
`DATABASE
`
`I
`
`TO FIG 8
`
`I08
`
`TOPIC a KEY
`SUBROUTINE
`gig. 9
`
`Facebook, Inc. - EXHIBIT 1237
`
`009
`
`

`

`US. Patent
`
`Nov. 23, 1993
`
`Sheet 8 of 8
`
`5,265,065
`
`I46
`7
`DOCUMENT
`DATABASE
`
`TEXT
`NODES j
`
`DOCUMENT
`NETWORK
`gig. 10
`
`FROM QUERY
`NETWORK
`
`COUNT TERMS
`SET 2
`
`SET i=0
`
`ADD
`
`‘
`
`COM PAR
`DATABAS
`idfi ,
`if;
`
`TO
`
`150 /
`
`CALCULATE
`o.4+o.e - mi . tfn
`'1
`T
`ACCUMU LATE
`
`NO
`
`YES
`
`mvmE BY
`x+ z
`l
`STORE
`1
`REPEAT FOR
`OTHER DOCUMENTS
`
`I60
`
`RANK
`
`PRINgOUT
`
`DISPLAY
`
`Facebook, Inc. - EXHIBIT 1237
`
`010
`
`

`

`1
`
`METHOD AND APPARATUS FOR INFORMATION
`RETRIEVAL FROM A DATABASE BY REPLACING
`DOMAIN SPECIFIC STEMMED PHASES IN A
`NATURAL LANGUAGE TO CREATE A SEARCH
`QUERY
`
`20
`
`BACKGROUND OF THE INVENTION
`This invention relates to information retrieval, and
`particularly to document retrieval from a computer
`database. More particularly, the invention concerns a
`method and apparatus for creating a search query in
`natural language for use in an inference network for
`document identi?cation and retrieval purposes.
`Presently, document retrieval is most commonly per‘
`formed through use of Boolean search queries to search
`the texts of documents in the database. These retrieval
`systems specify strategies for evaluating documents
`with respect to a given query by logically comparing
`search queries to document texts. One of the problems
`associated with text searching is that for a single natural
`language description of an information need, different
`Boolean researchers will formulate different Boolean
`queries to represent that need. Because the queries are
`different, document ranking will be different for each
`search, thereby resulting in different documents being
`retrieved.
`More recently, hypertext databases have been devel
`oped which emphasize ?exible organizations of multi
`media "nodes" through connections made with user
`speci?ed links and interfaces which facilitate browsing
`in the network. Early networks employed query-based
`retrieval strategies to form a ranked list of candidate
`"starting points“ for hypertext browsing. Some systems
`employed feedback during browsing to modify the ini
`tial query and to locate additional starting points. Net
`work structures employing hypertext databases have
`used automatically and manually generated links be
`tween documents and the concepts or terms that are
`used to represent their content. For example, “docu
`ment clustering" employs links between documents that
`are automatically generated by comparing similarities
`of content. Another technique is “citations" wherein
`documents are linked by comparing similar citations in
`them. “Term clustering" and “manually-generated the
`sauri" provide links between terms, but these have not
`been altogether suitable for document searching on a
`reliable basis.
`Deductive databases have been developed employing
`facts about the nodes, and current links between the
`nodes. A simple query in a deductive database, where N
`is the only free variable in formula W, is of the form
`{N1W(N)}, which is read as “Retrieve all nodes N such
`that W(N) can be shown to be true in the current data
`base." However, deductive databases have not been
`successful in information retrieval. Particularly, uncer
`tainty associated with natural language affects the de
`ductive database, including the facts, the rules, and the
`query. For example, a speci?c concept may not be an
`accurate description of a particular node; some rules
`may be more certain than others; and some parts of a
`query may be more important than others. For a more
`complete description of deductive databases, see Croft
`et al. “A Retrieval Model for Incorporating Hypertext
`Links", Hypertext ‘89 Proceedings. pp 213-224, No
`vember l989 (Association for Computing Machinery),
`incorporated herein by reference.
`
`5,265,065
`2
`A Bayesian network is one which employs nodes to
`represent the document and the query. If a proposition
`represented by a parent node directly implies the propo
`sition represented by a child node, a implication line is
`drawn between the two nodes. If-then rules of Bayesian
`networks are interpreted as conditional probabilities.
`Thus, a rule A-+B is interpreted as a probability
`P(B|A), and the line connecting A with B is logically
`labeled with a matrix that specifies P(B l A) for all possi
`ble combinations of values of the two nodes. The set of
`matrices pointing to a node characterizes the depen
`dence relationship between that node and the nodes
`representing propositions naming it as a consequence.
`For a given set of prior probabilities for roots of the
`network, the compiled network is used to compute the
`probability or degree of belief associated with the re
`maining nodes.
`An inference network is one which is based on a
`plausible or non-deductive inference. One such network
`employs a Bayesian network, described by Turtle et al.
`in "Inference Networks for Document Retrieval",
`SIGIR 90, pp 1-24 September 1990 (Association for
`Computing Machinery), incorporated herein by refer
`ence. The Bayesian inference network described in the
`Turtle et al. article comprises a document network and
`a query network. The document network represents the
`document collection and employs document nodes, text
`representation nodes and content representation nodes.
`A document node corresponds to abstract documents
`rather than their speci?c representations, whereas a text
`representation node corresponds to a specific text repre
`sentation of the document. A set of content representa’
`tion nodes corresponds to a single representation tech
`nique which has been applied to the documents of the
`database.
`The query network of the Bayesian inference net
`work described in the Turtle et al. article employs an
`information node identifying the information need, and
`a plurality of concept nodes corresponding to the con‘
`cepts that express that information need. A plurality of
`intermediate query nodes may also be employed where
`multiple queries are used to express the information
`requirement.
`The Bayesian inference network described in the
`Turtle et al. article has been quite successful for general
`purpose databases. However, it has been difficult to
`formulate the query network to develop nodes which
`conform to the document network nodes. More particu
`larly, the inference network described in the Turtle et
`al. article did not use domain~speci?c knowledge bases
`to recognize phrases, such as specialized, professional
`terms, like jargon traditionally associated with specific
`professions, such as law or medicine.
`For a more general discussion concerning inference
`networks, reference may be made to Probabilistic Rea
`soning in Intelligent Systems: Network: of Plausible 1n fer
`ence by J. Pearl, published by Morgan Kaufmann Pub
`lishers, Inc., San Mateo, Calif, 1988, and to Probabilistic
`Reasoning in Expert Systems by R. E. Neapolitan, John
`Wiley 8L Sons, New York, N.Y., 1990.
`Prior techniques for recognizing phrases in an input
`query employed syntactic and statistical analysis and
`manual selection techniques. The present invention
`employs an automated domain-specific knowledge
`based system to recognize phrases.
`
`50
`
`35
`
`45
`
`65
`
`Facebook, Inc. - EXHIBIT 1237
`
`011
`
`

`

`3
`SUMMARY OF THE INVENTION
`The present invention provides a computer imple
`mented process performing a search query in which a
`database is provided containing domain—knowledge
`speci?c phrases. A natural language input query is in
`putted to the computer system, the input query de?ning
`the composition of the text of documents sought to be
`identi?ed. The natural language query is parsed and the
`stopwords are removed. The remaining words of the
`input query are stemmed to their basic roots, and the
`sequence of stemmed words in the list is compared to
`domain-speci?c phrases in the database to identify
`phrases in the search query. The phrases from the data
`base are substituted for the sequence of stemmed words
`from the list so that the remaining elements, namely, the
`substituted phrases and unsubstituted stemmed words,
`form the search query. More particularly, the individual
`terms of the completed search query form the query
`nodes of the query network.
`An optional and desirable feature of the present in
`vention resides in the provision of a technique for han
`dling citations as a syntactic phrase, the citations being
`employed for a “weighting’ of the statistical probability
`algorithms of the inference network.
`Another optional and desirable feature of the present
`invention resides in the provision of determining a key
`number based on a topical database, the key number
`being added to the search query as a query node and
`affecting the statistical probability algorithms of the
`inference network.
`
`0
`
`5
`
`20
`
`25
`
`5,265,065
`4
`in the propositions which potentially caused it. This is
`distinct from a diagnostic probability scheme in which
`the children provide support for their parents, that is
`belief in the potential causes of a proposition increases
`with belief in the proposition. In either case, the propa
`gation of probabilities through the network is done
`using information passed between adjacent nodes.
`FIG. 1 illustrates a Bayesian inference network as
`described in the aforementioned Turtle et al. article.
`The Bayesian network shown in FIG. 1 is a directed,
`acyclic dependency graph in which nodes represent
`propositional variables or constraints and the arcs rep
`resent dependence relations between propositions. An
`arc between nodes represents that the parent node
`"causes” or implies the proposition represented by the
`child node. The child node contains a link matrix or
`tensor which speci?es the probability that the child
`node is caused by any combination of the parent nodes.
`Where a node has multiple parents, the link matrix spec
`i?es the dependence of that child node on the set of
`parents and characterizes the dependence relationship
`between the node and all nodes representing its poten‘
`tial causes. Thus, for all nodes there exists an estimate of
`the probability that the node takes on a value given any
`set of values for its parent nodes. Ifa node a has a set of
`parents 1r,,={p1, .
`.
`. p,,}, the estimated probabilities
`P(a|p1, .
`.
`. p,,) are determined.
`The inference network is graphically illustrated in
`FIG. 1 and consists of two component networks: a
`document network 10 and a query network 12. The
`document network consists of document nodes d1, d1. .
`. . d,'.;, d,-, interior text representation nodes 11, t2. .
`.
`. tj.1,
`ti, and leaf nodes r1, r1, r3, .
`.
`. rk. The document nodes
`d correspond to abstract documents rather than their
`physical representations. The interior nodes t are text
`representation nodes which correspond to speci?c text
`representations within a document. The present inven
`tion will be described in connection with the text con
`tent of documents, but it is understood that the network
`can support document nodes with multiple children
`representing additional component types, such as audio,
`video, etc. Similarly, while a single text may be shared
`by more than one document, such as journal articles
`that appear in both serial issue and reprint collections,
`and parent/divisional patent speci?cations, the present
`invention shall be described in connection with a single
`text for each document. Therefore, for simplicity, the
`present invention shall assume a one-to-one correspon
`dence between documents and texts.
`The leaf nodes r are content representation nodes.
`There are several subsets of content representation
`nodes r1, r2, r3, .
`.
`. rk, each corresponding to a single
`representation technique which has been applied to the
`document texts. If a document collection has been in
`dexed employing automatic phrase extraction and man
`ually assigned index terms, then the set of representa
`tion nodes will consist of distinct subsets or content
`representation types with disjoint domains. For exam
`ple, if the phrase “independent contractor“ has been
`extracted and “independent contractor" has been manu
`ally assigned as an index term, then two content repre
`sentation nodes with distinct meanings will be created,
`one corresponding to the event that “independent con
`tractor" has been automatically extracted from the sub
`set of the collection, and the other corresponding to the
`event that “independent contractor" has been manually
`assigned to a subset of the collection. As will become
`
`55
`
`BRIEF DESCRIPTION OF THE DRA\VINGS
`FIG. 1 is a block diagram representation of a Bayes
`ian inference network with which the present invention
`is used.
`FIG. 2 is a block diagram representation of a simpli
`?ed Bayesian inference network as in FIG. 1.
`FIG. 3 is a block diagram of a computer system for
`carrying out the invention.
`FIG. 4 is a flowchart and example illustrating the
`steps of creating a search query in accordance with the
`preferred embodiment of the present invention.
`FIG. 5 is a ?owchart and example of the steps for
`determining a key number for inclusion in the search
`query described in connection with FIG. 4.
`FIGS. 6A-6D are block diagram representations ofa
`simpli?ed Bayesian inference network illustrating dif
`ferent techniques for handling phrases.
`FIG. 7 is a ?owchart illustrating the manner by
`which partial phrases are handled in a document re~
`trieval system.
`FIG. 8 is a detailed ?owchart of the query network in
`accordance with the presently preferred embodiment of
`the present invention.
`FIG. 9 is a detailed flowchart of a topic and key
`subroutine used in the query network illustrated in FIG.
`8.
`
`FIG. 10 is a detailed ?owchart of a document net
`work used with the query network shown in FIG. 8.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`The Inference Network
`Inference networks employ a predictive probability
`scheme in which parent nodes provide support for their
`children. Thus. the degree to which belief exists in a
`proposition depends on the degree to which belief exists
`
`65
`
`Facebook, Inc. - EXHIBIT 1237
`
`012
`
`

`

`5,265,065
`
`6
`
`20
`
`25
`
`30
`
`35
`
`5
`clear hereinafter, some concept representation nodes
`may be created based on the content of the query net
`work.
`Each document node has a prior probability associ
`ated with it that describes the probability of Observing
`that document. The document node probability will be
`equal to l/(collection size) and will be small for most
`document collections. Each text node contains a speci?
`cation of its dependence upon its parent. By assumption,
`this dependence is complete (t; is true) when its parent
`document is observed (d,- is true). Each representation
`node contains a speci?cation of the conditional proba
`bility associated with the node given its set of parent
`text nodes. The representation node incorporates the
`effect of any indexing weights (for example, term fre
`quency in each parent text) or term weights (inverse
`document frequency) associated with the concept.
`The query network 12 is an “inverted“ directed acy
`clic graph with a single node I which corresponds to an
`information need. The root nodes c1, c2, c3, .
`.
`. cm are
`the primitive concepts nodes used to express the infor
`mation requirement. A query concept node, c, contains
`the speci?cation of the probabilistic dependence of the
`query concept on its set of parent representation con
`tent nodes, r. The query concept nodes cl. .
`. cm define
`the mapping between the concepts used to represent the
`document collection and the concepts that make up the
`queries. A single concept node may have more than one
`parent representation node. For example, concept node
`c1 may represent the query concept "independent con
`tractor" and have as its parents representation nodes r;
`and U which correspond to "independent contractor“
`as a phrase and as a manually assigned term.
`Nodes q]. q; are query nodes representing distinct
`query representations corresponding to the event that
`the individual query representation is satis?ed. Each
`query node contains a speci?cation of the query on the
`query concept it contains. The intermediate query
`nodes are used in those cases where multiple query
`representations express the information need I.
`As shown in FIG. 1, there is a one-to-one correspon
`dence between document nodes, d, and text nodes, t.
`Consequently, the network representation of FIG. 1
`may be diagrammatically reduced so that the document
`nodes d1, d2, .
`.
`. d,~.1, d,-are parents to the representation
`nodes r1, r2, r3, . . . r;,. In practice, it is possible to further
`reduce the network of FIG. 1 due to an assumed one-to
`one correspondence between the representation nodes
`r1, r2, r3, .
`. . rk, and the concept nodes c1, c2, c3, . .
`. cm.
`The simpli?ed inference network is illustrated in FIG. 2
`and is more particularly described in the article by Tur
`tle et al., “Efficient Probabilistic Inference for Text
`Retrieval,“ RIAO 91 Conference Proceedings, pp.
`644-661, April, 1991 (Recherche d‘lnformaion Assistée
`par Ordinateur. Universitat Autonoma de Barcelona,
`Spain), which article is herein incorporated by refer
`ence.
`As described above, each child node carries a proba
`bility that the child node is caused by the parent node.
`The estimates of the dependence of a child node Q on its
`set of parents. P|, P2, .
`.
`. P,,, are encoded using the
`following expressions:
`
`where P(P1=true)=pg, P(P2=true)=p2, .
`.
`.
`P(Pn=true)=P,,, W1, W1, . . . w” are the term weights for
`each term P1, P1, .
`.
`. Pm and wg is the maximum proba
`bility that the child node can achieve, 0_-<_-w8-_<- I.
`As described above, all child nodes carry a probabil
`ity that the child was caused by the identi?ed parent
`nodes. Document network 10 is created once and re
`mains constant for each document in the network. The
`document network structure is not changed, except to
`add documents to the database. The document nodes d
`and text nodes t do not change for any given document
`once the document representation has been entered into
`document network 10. Most representation nodes are
`created with the database and are dependant on the
`document content. Some representation nodes (repre
`senting phrases and the like) are created for the particu
`lar search being conducted and are dependant of the
`search query.
`Query network 12, on the other hand. changes for
`each input query de?ning a document request. There
`fore, the concept nodes c of the search network are
`created with each search query and provide support to
`the query nodes q and the information need, node 1
`(FIG. 1).
`Document searching can be accomplished by a docu
`ment-based scan or a concept-based scan. A document
`based scan is one wherein the text of each document is
`scanned to determine the likelihood that the document
`meets the information need, I. More particularly, the
`representation nodes r1, r1, r3, .
`.
`. r], of a single docu
`ment are evaluated with respect to the several query
`nodes q], q; to determine a probability that the docu
`ment meets the information need. The top n-ranked
`documents are then selected as potential information
`need documents. The scan process reaches a point, for
`example after assigning a probability for more than n
`documents of a large document collection. that docu
`ments can be eliminated from the evaluation process
`after evaluating subsets of the representation nodes.
`More particularly, if a given document scores so low of
`a probability after only evaluating one or two represen
`tation nodes, determination can be made that even if the
`evaluation continued the document still would not
`score in the top n-ranked documents. Hence, most doc
`uments of a large collection are discarded from consid
`eration without having all their representation nodes
`evaluated.
`A concept-based scan is one wherein all documents
`containing a given representation node are evaluated.
`As the process continues through several representation
`nodes, a scorecard is maintained of the probabilities that
`each document meets the information need, I. More
`particularly, a single representation node r1 is evaluated
`for each document in the collection to assign an initial
`probability that the document meets the concept. The
`process continues through the several representation
`nodes with the probabilities being updated with each
`iteration. The top n-ranked documents are then selected
`as potential information need documents. If at some
`point in the process it can be determined that evaluation
`of additional representation nodes will not alter the
`ranking of the top n-ranked documents, the scan process
`can be terminated.
`
`40
`
`45
`
`50
`
`55
`
`Facebook, Inc. - EXHIBIT 1237
`
`013
`
`

`

`20
`
`25
`
`35
`
`7
`it can be appreciated that the representation nodes r1,
`r2, r3, .
`.
`. n, are nodes dependent on the content of the
`texts of the documents in the collection. Most represen
`tation nodes are created in the document database.
`Other representation nodes, namely those associated
`with phrases, synonyms and citations, are not manifest
`in any static physical embodiment and are created based
`on each search query. For example, a query manifesting
`the concept "employee” may be represented by one or
`more of “actor", "agent“, “attendant", “craftsman",
`"doer", "laborer", “maid”, “servant", "smith", “techni
`cian” and “worker”, to name a few. These various rep
`resentation nodes are created from the query node at
`the time of the search, such as through the use of the
`sauri and other tools to be described. A query node q1,
`qz, etc. can be manifest in one or more representations.
`The Search Query
`The present invention concerns development of the
`concept nodes c for use in the inference network illus
`trated in FIG. 1. The invention will be described in
`connection with a speci?c search query as follows:
`“What is the liability of the United States under the
`Federal Tort Claims Act for injuries sustained by
`employees of an independent contractor working
`under contract with an agency of the United States
`government?"
`Thus the present invention will be described in con
`nection with a database for searching legal documents,
`but it is to be understood the concepts of the invention
`may be applied to other professional databases, such as
`medical, theological, ?nancial and other types where
`specialized vocabularies, citations and digests are em
`ployed.
`The present invention is carried out through use of a
`computer system, such as illustrated in FIG. 3 compris
`ing a computer 20 connected to an input/output termi
`nal 22 and a read only memory (ROM) 24. ROM 24
`may be any form of read only memory, such as a CD
`ROM, write protected magnetic disc or tape, or a
`ROM, PROM or EPROM chip encoded for the pur
`poses described. Computer 20 may be a personal com
`puter (PC) and may be optionally connected through
`modem 26, telephone communication network 28 and
`modem 30 to a central computer 32 having a memory
`34. In one form of the invention, the document network
`10 and the document database containing the texts of
`documents represented by the document network are
`contained in the central computer 32 and its associated
`memory 34. Alternatively, the entire network and data
`base may be resident in the memory of personal com
`puter 20 and ROM 24. In a legal database and document
`information retrieval network the documents may com
`prise, for example, decisions and orders of courts and
`government agencies, rules, statutes and other docu
`ments re?ecting legal precedent. By maintaining the
`document database and document network at a central
`location, legal researchers may input documents into
`the document database in a uniform manner. Thus,
`there may be a plurality of computers 20, each having
`individual ROMs 24 and input/output devices 22, the
`computers 20 being linked to central computer 32 in a
`time-sharing mode. The search query is developed by
`each individual user or researcher by input via the re
`spective input/output terminal 22. For example, input
`/output terminal 22 may comprise the input keyboard
`and display unit of PC computer 20 and may include a
`printer for printing the display and/or document texts.
`
`5,265,065
`8
`ROM 24 contains a database containing phrases
`unique

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket