throbber
United States Patent [19J
`Holt et al.
`
`[54] ASSOCIATIVE TEXT SEARCH AND
`RETRIEVAL SYSTEM HAVING A TABLE
`INDICATING WORD POSITION IN PHRASES
`
`[75]
`
`Inventors: John Holt, Centerville; David James
`Miller, Spring Valley; X. Allan Lu,
`Springboro; Ray Daley; Minh Doan,
`both of Dayton; Richard G. Graham,
`Beavercreek; Catherine Leininger,
`Dayton; Darin W. McBeath,
`Miamisburg; Thomas Pease, Mason;
`Stephen M. Sever, Kettering; Dale
`Waddell; Franz Weckesser, both of
`Dayton, all of Ohio
`
`[73] Assignee: Reed Elsevier, Inc., Newton, Mass.
`
`[21] Appl. No.: 473,824
`
`[22] Filed:
`
`Jun. 7, 1995
`
`Related U.S. Application Data
`
`[63] Continuation of Ser. No. 155,304, Nov. 22, 1993.
`Int. Cl.6
`............................. G06F 17/30; G06F 17/21
`[51]
`[52] U.S. Cl. .......................... 395/605; 395/792; 395/793;
`395/759; 395/760; 395/606; 395/603
`[58] Field of Search ..................................... 395/605, 792,
`395/793, 603, 606, 795, 760; 364/419.19
`
`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`................... 364/200
`4,241,402 12/1980 Mayper, Jr. et al.
`4,270,182
`5/1981 Asija ....................................... 364/900
`4,358,824 11/1982 Glickman et al. ...................... 364/200
`4,384,329
`5/1983 Rosenbaum et al. ................... 364/300
`4,471,459
`9/1984 Dickinson et al.
`..................... 364/900
`4,499,553
`2/1985 Dickinson et al.
`..................... 364/900
`4,554,631 11/1985 Reddington ............................. 364/300
`4,580,218
`4/1986 Raye ....................................... 364/300
`4,688,195
`8/1987 Thompson et al. ..................... 364/300
`4,706,212 11/1987 Toma ...................................... 364/900
`4,760,528
`7/1988 Levin.
`4,787,035 11/1988 Bourne .................................... 364/300
`
`I 1111111111111111 11111 lllll 111111111111111 11111 1111111111 111111111111111111
`US005771378A
`[11] Patent Number:
`[45] Date of Patent:
`
`5,771,378
`Jun. 23, 1998
`
`4,819,156
`4,823,306
`4,839,853
`4,852,003
`4,862,408
`
`4/1989 De Lorme et al. ................ 395/182.13
`4/1989 Barbie et al. .. ... ... .... ... ... ... ... ... 364/900
`6/1989 Deerwester et al. .................... 364/900
`7/1989 Zamora.
`8/1989 Zamora ................................... 364/900
`
`(List continued on next page.)
`
`FOREIGN PATENT DOCUMENTS
`
`WO 92/04681
`
`3/1992 WIPO .
`
`OTHER PUBLICATIONS
`
`Wong, Wai Yee Peter, et al., "Implementations of Partial
`Document Ranking Using Inverted Files," Information Pro(cid:173)
`cessing & Management, (Great Britain), vol. 29, No. 5, pp.
`647-669, 1991.
`Robert F. Jack, "Essential Guide to the Library IBM PC, vol.
`4, Data Communications: Going Online", Merkler Corp.
`©1987, pp. 62-67.
`
`(List continued on next page.)
`
`Primary Examiner-Gail 0. Hayes
`Assistant Examiner-Joseph Thomas
`Attorney, Agent, or Firm-Reid & Priest LLP
`
`[57]
`
`ABSTRACT
`
`An associative text search and retrieval system uses one or
`more front end processors to interacting with a network
`having one or more user terminals connected thereto to
`allow a user to provide information to the system and receive
`information from the system. The system also includes
`storage for a plurality of text documents, and at least one
`processor, coupled to the front end processors and the
`document storage. The processor(s) search the text docu(cid:173)
`ments according to a search request provided by the user and
`provide to the front end processor a predetermined number
`of retrieved documents containing at least one term of the
`search request. The retrieved documents have higher ranks
`than documents not provided to the front end processor. The
`ranks are calculated using a formula that varies according to
`the square of the frequency in each of the text documents of
`each of the search terms.
`
`15 Claims, 12 Drawing Sheets
`
`74,
`
`\
`
`LOC A
`LOC B
`
`LOC C
`
`LOC D
`LOC E
`
`LOC f-
`
`LOC G
`
`\ ~O~ H
`
`LOC I
`
`Page 1 of 23
`
`GOOGLE EXHIBIT 1038
`
`

`

`5,771,378
`Page 2
`
`U.S. PATENT DOCUMENTS
`
`4,868,750
`4,914,590
`4,918,588
`4,931,935
`4,965,763
`4,972,349
`4,974,191
`4,991,087
`4,991,094
`5,005,127
`5,099,425
`5,109,509
`5,117,349
`5,123,103
`5,265,065
`5,321,833
`5,323,316
`5,544,352
`
`9/1989 Kucera et al. .......................... 364/419
`4/1990 Loatman et al. ........................ 364/419
`4/1990 Barrett et al. .. ... ... ... ... .... ... ... ... 364/200
`6/1990 Ohira et al. ............................. 364/419
`10/1990 Zamora .
`11/1990 Kleinberger ... .... ... ... ... ... ... .... .. 364/900
`11/1990 Amirghodsi et al. ................... 364/900
`2/1991 Burkowski et al. .................... 364/200
`2/1991 Fagan et al. .
`4/1991 Kugimiya et al. .
`3/1992 Yuji et al. ............................... 364/419
`4/1992 Katayama et al. ...................... 395/600
`5/1992 Tirfling et al. .......................... 395/600
`6/1992 Ohtaki et al. ........................... 395/600
`11/1993 Turtle . ... ... ... ... .... ... ... ... ... ... .... .. 395 /600
`6/1994 Chang et al.
`........................... 395/605
`6/1994 Kadashevich et al. ................. 364/419
`8/1996 Egger . ... ... ... ... .... ... ... ... ... ... .... .. 395 /605
`
`OIBER PUBLICATIONS
`
`Vu/Text®User Guide, A Knight Ridder Company, Aug.
`1989 Chapter 7, p. 7.11.
`Save™Retrieval Guide,. Jun. 1, 1990, pp. 12-17 of Chapter
`12.
`Dialog OnDisc User's Guide, Dialog Information Services,
`Inc.,©1987, pp. E-4 & E-5.
`Dialog File 410, Acc.# 00035496: "New Full-Text Features
`in McGraw-Hill ... " published Nov. 1968.
`Dialog File 410, Ace, # 00044457: "KWIC and Hilight:
`How and When," published May 1988.
`Croft, et al., "A Retrieval Model Incorporating Hypertext
`Links," Hypertext '89 Proceedings, Association for Com(cid:173)
`puter Machinery, pp. 213-224 (Nov., 1989).
`
`Turtle, et al., "Inference Networks for Document Retrieval,"
`Coins Technical Report 90---07, University of Massachusetts,
`pp. 1-16, (Mar., 1990).
`Turtle, et al., "Inference Networks for Document Retrieval,"
`Sigir 90, Association for Computer Machinery, pp. 1-24
`(Sep., 1990).
`Turtle, "Inference Network for Document Retrieval," Ph.D.
`Dissertation, Coins Technical Report 90-92, University of
`Massachusetts (Feb., 1991).
`Turtle, et al., "Efficient Probabilistic Inference for Text
`Retrieval," RIAO '91 Conference Proceedings, Recherche
`d'Information Assistee' par Ordinateur, Universitat Auton(cid:173)
`sma de Burcelona, Spain, pp. 644-661 (Apr., 1991).
`Porter, "An Algorithm for Suffix Stripping," Program, vol.
`14, pp. 130-137 (1980).
`Turtle, et al., "Evaluation of an Inference Network-Based
`Retrieval Model," Transactions on Information Systems,
`Association for Computer Machinery, vol. 9, No. 3, pp.
`187-222 (Jul., 1991).
`Croft, et al., "Interactive Retrieval for Complex Docu(cid:173)
`ments," Information Proceeding and Management, vol. 26,
`No. 5, pp. 593-613 (1990).
`Haynes, "Describing a System for the Specialized User: A
`Case Study," Proceedings-1985 National Online Meeting,
`Learned Information, Inc., pp. 205-213 (Apr. 30, 1985).
`"Text Search and Retrieval Reference Manual for the Auto(cid:173)
`mated Patent System (APS)," U.S. Department of Com(cid:173)
`merce, pp. 1-5, 3-2 to 3-3, 3-8 to 3-9, 3-157-1 to 7-7,
`7-35 to 7-37 (Oct. 21, 1992).
`"Text Search and Retrieval Training Manual for the Auto(cid:173)
`mated Patent System (APS)," U.S. Department of Com(cid:173)
`merce, pp. 28, 29, 39-43 (Oct. 21, 1992).
`
`Page 2 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 1 of 12
`
`5,771,378
`
`~I
`
`:.-:,:-::
`et::
`0
`5
`f-
`w
`z
`
`~I
`
`t13I
`
`~I
`
`~,
`
`<;f(cid:173)
`l[)
`
`il
`
`~I
`
`~I
`
`ff51 ~,
`
`~I
`
`~I
`
`~l
`
`~I ~,
`~,
`
`~I
`
`~I
`
`-..........
`
`(j
`~
`
`\ 0
`
`rD
`
`Page 3 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 2 of 12
`
`5,771,378
`
`r70
`
`I
`
`/
`
`/
`
`/
`
`/
`
`/
`
`/
`
`/
`
`/
`
`/
`
`/
`
`/
`
`/
`
`P1 ~2-
`TERM1
`TERM2 p3' P4
`.
`
`. .
`.
`TERMN Pr Ps
`
`LOC A
`LOC B
`.
`
`LOC C
`
`LOC D
`LOC E
`
`.
`
`'
`
`LOC F
`
`'
`
`\
`\
`\
`\
`\
`\
`\
`
`LOC G
`
`LOCH
`.
`.
`.
`LOCI
`
`FIG 3
`
`DOCUMENT
`TEXT
`
`72
`-
`
`INDEX
`74
`-
`
`ANCILLARY
`DATA
`76
`-
`
`FIG 2
`
`Page 4 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 3 of 12
`
`5,771,378
`
`ussc
`1700 to
`1900
`
`82
`
`ussc
`1901 to
`1960
`
`83
`
`ussc
`1961 to
`1992
`
`84
`
`ussc
`1993
`
`85
`
`80
`
`FIG 4
`
`80
`
`83
`
`ussc
`1700 to
`1900
`
`82
`
`ussc
`1901 to
`1960
`
`ussc
`1961 to
`1992
`
`84
`
`ussc
`1993
`
`85
`
`92
`
`94
`
`95
`
`FIG 5
`
`Page 5 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 4 of 12
`
`5,771,378
`
`~104
`
`100
`
`102
`Enter your FREESTYLE (TM) set:'h Description.
`Enter phrases in quotation marks.
`Example : "lender liability", "28 U.S.C. @ 1332 a"
`
`Type .bool to enter a Boolean search
`
`F ex further explanation, press the H key (f<X HELP) and then the ENTER key.
`
`FIG 6
`
`SEARCH OPTIONS
`
`Press ENTER to send search.
`To use a Search Option, enter an equal sign followed by the number.
`
`Search Description:
`WHAT DIFFERENT METHODS OF "USABILITY TEllNG" AND TOM ARE USED
`BY SOFTWARE DEVELOPMENT COMPANIES?
`112a
`
`('114
`('I 15
`Enter/Edit Mandatory Terms
`Enter/Edit Restrictions (e.g., dafe)
`Thesaurus/"! 16
`117
`(I 18
`Edit Search Oescriptiorl
`Change number of documents Current setting: 25
`
`<=1>
`<=2>
`<"'3>
`<=4>
`<=5>
`
`F ex further explanation, press the H key (for HELP) and then the ENTER key.
`
`FIG 7
`
`Page 6 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 5 of 12
`
`5,771,378
`
`SEARCH OPTIONS
`
`Press ENTER to send search.
`To use a Search Option, enter an equal sign followed by the number.
`
`Search Description:
`WHAT DIFFERENT METHODS OF "USABILITY TESTING" AND TOM ARE USED
`I 12o
`BY SOFTWARE DEVELOPMENT COMPANIES?
`
`Mandatory: "USABILITY TESTING", TOM
`
`122
`
`('114
`rl 15
`Enter/Edit Mandatory Terrrb
`Enter/Edit Restrictions (e.g., date)
`Thesaurus/"1I6
`117
`(118
`Edit Search Descriptior1
`Change number of documents Current setting: 25
`
`<:::1 >
`<=2>
`<m3>
`<=4>
`<=5>
`
`110
`
`For further explanation, press the H key (for HELP) and then the ENTER key.
`
`FIG 8
`
`=1
`
`SEARCH RESTRICTIONS
`
`To use restrictions, enter an equal sign followed by the number.
`Press ENTER to return to Search Options.
`
`1 132
`<•1 > DAtE:
`1133
`<=2> COURT:
`.
`134
`<"'3> JuoC
`1135
`<=4> COUNSEL:
`,r"--136
`<=5> NAME:
`
`Example: after 6/6/1993
`
`Example: Sixth Circuit
`
`130
`
`Example: Rehnquist
`
`Example: Gerry
`
`Example: Brown and Topeka Board of Education
`
`For further explanation, press the H key (for HELP) and then the ENTER key.
`
`FIG 9
`
`Page 7 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 6 of 12
`
`5,771,378
`
`SEARCH OPTIONS
`
`Press ENTER to send search.
`To use a Search Option, enter an equal sign followed by the number.
`
`Search Description:
`WHAT DIFFERENT METHODS OF "USABILITY TEl!NG" AND TOM ARE USED
`BY SOFTWARE DEVELOPMENT COMPANIES?
`
`1120
`Restrictions: DAT~ (AFT 10/1/92). COURT (SIXTH CIRCUIT)
`142)
`
`('114
`Enter/Edit Mandatory Terms
`('I 15
`Enter/Edit Restrictions (e.g., date)
`Thesaurus/'l 16
`117
`(I 18
`Edit Search Descriptiori
`Change number of documents Current setting: 25
`
`<a1>
`<=2>
`<-3>
`<=<4>
`<=5>
`
`110
`
`For further explanation, press the H key (for HELP) and then the ENTER key.
`
`FIG 10
`
`SEARCH OPTIONS
`
`Press ENTER to send search.
`To use a Search Option, enter an equal sign followed by the number.
`
`Search Description:
`WHAT DIFFERENT METHODS OF "USABILITY TE~TING" AND TOM ARE USED
`li
`BY SOFTWARE DEVELOPMENT COMPANIES?
`120
`Mandatory:
`"USABILITY TESTING", TOMl22
`
`Restrictions: DATE (AFT 10/1/92). COURT (Sl~H CIRCUIT)
`~ 42
`('114
`<=1 > Enter/Edit Mandatory Terms
`('I 15
`<=2> Enter/Edit Restrictions (e.g., date)
`<-3> Thesaurusr'I 16
`117
`(118
`<""4> Edit Search Descriptiori
`<=5> Change number of documents Current setting: 25
`
`110
`
`For further explanation, press the H key (for HELP) and then the ENTER key.
`
`FIG II
`
`Page 8 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 7 of 12
`
`5,771,378
`
`1, 2, 3, 6-10
`SEARCH TERMS FOUND IN THESAURUS
`
`('154
`Enter term numbers to view synonyms. Example: 1, 2, 3, 6-1 O
`
`150
`
`<=1 > Return to Search Options
`
`1234567890123456789012345678901234567890123456789012345678901234567890123
`2 Usable
`1 5 Piaget
`28 Andretti
`3 TOM
`16 Ncrman
`29 Winog-ad
`4 Models
`17 Carroll
`30 Newell
`5 Algorithms
`18 Cognitive
`31 Shank
`6 Graphics
`19 Metaphysical
`32 Feigenbaum
`20 Metamorphosis
`33 Hypoturbocharged
`7 Chomskian
`8 Transformational
`21 Carburator
`35 Knowledge
`9 Grammar
`Injection
`36 Wcrds
`22
`10 Averylongsearchss...
`23 12345678901234567...
`11 Anotheryerylongss...
`24 words
`25 words
`12 12345678901234567890
`26 words
`1 3 words
`
`37 wwororddss l,
`
`~
`
`38
`39 words
`
`52
`
`For fur1her explanation, ixess the H key (for HELP) and then the ENTER key.
`
`FIG 12
`
`1, 3
`
`('162
`Synonyms for: INCOHERENCE
`Enter synonym numbers to include in search and ixess ENTER (e.g., 1, 2,-3)
`
`<,.1 > Return to Search Options
`
`<a2> Return to Term Selection
`
`--- -- - - ---f-.1§-=!- __ --Term Variations-- - --- - - - - - - - - - - - -
`
`1 incoherent
`
`2 chaos
`5 disjunction
`8
`incomprehensibility
`11
`randomness
`14 unclearness
`17 wandering
`
`3 disconnection
`6 discrder
`9 lack of clarity
`12 ranting
`1 5 unevenness
`
`4 discontinuity
`7 illegibility
`1 O meaninglessness
`13 raving
`16 unintelligibility
`
`160
`
`For further explanation, press the H key (fcr HELP) and then the ENTER key.
`
`FIG 13
`
`Page 9 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 8 of 12
`
`5,771,378
`
`Press ENTER to send search
`To use a Search Option, enter an equal sign followed by the number.
`
`SEARCH OPTIONS
`
`Search Description:
`1112b
`WHAT DIFFERENT METHODS OF "USABILll'Y TESTING" AND TOM
`("CONTINUOUS IMPROVEMENT", "QUALITY FUNCTION DEPLOYMENT")
`ARE USED BY SOFTWARE DEVELOPMENT COMPANIES?
`,122
`"USABILITY TESTING", TOM
`
`Mandatory:
`
`Restrictions: SATE (AFT 10/1/92). COU~ {SIXTH CIRCUIT)
`('114
`142
`rl 15
`<=1 > Enter/Edit Mandatory Terms
`<=2> Enter/Edit Restrictions (e.g., date)
`I 17
`<~3> Thesaurus/""! 16
`{118
`<"'4> Edit Search Descriptiorl
`<=5> Change number of documents Current setting: 25
`
`110
`
`For further explanation, press the H key {for HELP) and then the ENTER key.
`
`FIG 14
`
`Your FREESTYLE search has retrieved the top 44 documents based on
`statistical ranking. Search terms are listed in order of importa·nce.
`Terms after the • occurred too frequently to enhance your search.
`
`"PUBLIC FIGURE" DEFINED PURPO~ES DEFAMATION• FIRST CASE
`C172 ~
`173
`
`Mandatory Terms: DEFAMATION
`
`170
`
`Press ENTER to view documents in KWIC or use Full, Cite, or Segment keys.
`,174
`<=1 > Browse documents in SuperKWIC (. SK)
`r-175
`<=2> Location of search terms documents (. whereJ
`176
`<=3> Number of documents with search terms (. whyf
`< .. 4> Change document order (. sortf'l77
`
`For further explanation, press the H key (for HELP) and then the TRANSMIT key.
`
`FIG 15
`
`Page 10 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 9 of 12
`
`5,771,378
`
`NUMBER OF DOCUMENTS WITH SEARCH TERMS (. why)
`
`184~
`Documents
`Retrieved
`
`186\
`Documents
`Matched
`
`188\
`Term Importance
`(0-100)
`
`DEFAMATION
`PUBLIC FIGURE
`DEFINED
`PURPOSES
`DEFAMATION
`A
`
`44
`11
`22
`29
`44
`
`Total Retrived:
`
`45
`
`<-1> Return to browse
`<=2> Location of terms (. where)
`
`80
`11
`26
`51
`80
`
`(M) Mandatcxy
`12
`8
`5
`4
`
`180
`
`Fcx further explanation, press the H key (fcx HELP) and then the TRANSMIT key.
`
`FIG /6
`
`LOCATION OF SEARCH TERMS IN DOCUMENTS (. where)
`
`Document numbers are listed accross the top of the chart.
`Terms are listed down the side in o..-der of importance.
`Asterisks (*) indicate the existence of terms in documents.
`To view a document, enter the document number.
`1
`1 2 3 4 5 6 7 8 9 0
`
`1941
`2
`1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
`
`)()(
`
`) ( ) ( ) (
`
`****
`)( * * * * *
`)(
`* * )(
`*
`********* *
`)(.)()()(-)()(
`*
`**
`)(-)(
`* *****
`)()(
`) ( - ) ( j \ - ) ( j \ ) ( - ) ( ) ( ) ( ) ( l ( - ) ( ) ( ) ( **********
`* * * *"* ** * * *"* * * ** * * * ** ** ***
`
`l ( l ( ) (
`
`FINISHED PRODUCT
`COMPONENT PART
`NON RESIDENT
`MANUFACTURER
`BRINGING
`FORUM
`INCORPORATED
`PROPER
`ACTION
`TENTH TERM
`<=1 > Return to browse
`<=2> Number of documents with terms (. why)
`Press the H key (fcx Help) and then the ENTER key.
`
`190
`
`FIG 17
`
`Page 11 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 10 of 12
`
`5,771,378
`
`START
`
`202
`
`203
`
`ENTER SEARCH
`DESCRIPTION
`
`DETERMINE
`PHRASES
`
`204
`YES
`
`207
`
`, - - - - - - - - ,
`PROCESS
`OPTION
`
`NO
`
`NO
`
`YES
`
`208
`
`YES
`
`DESC.
`ORNOT
`1 ST MAND.
`
`NO
`
`PERFORM
`SEARCH
`
`DISPLAY
`RESULTS
`
`210
`
`212
`
`DONE
`
`FIG /8
`
`Page 12 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 11 of 12
`
`5,771,378
`
`WORD 1
`WORD 2
`
`BITMAP 1
`BITMAP 2
`
`ID 1
`ID 2
`
`.
`
`WORD N
`
`)
`222
`
`ID N
`
`(224
`
`BITMAP N
`
`22i
`FIG 19
`
`!230
`
`FIG 20
`
`WORD A WORD B WORD C WORD D
`
`250
`FIG 21
`
`LEVEL 1
`
`LEVEL 2
`
`LEVEL 3
`
`LEVEL 8
`
`Page 13 of 23
`
`

`

`U.S. Patent
`
`Jun. 23, 1998
`
`Sheet 12 of 12
`
`5,771,378
`
`START
`
`START
`
`262
`
`PROVIDE TERMS
`ANO FILTER
`INSTRUCTIONS TO
`SR'S
`
`RECEIVE
`DOC COUNTS
`
`264
`
`CALCULATE
`MAXDFI AND DFI
`FOR EACH TERM
`
`265
`
`ELIMINATE SR'S
`WITH NO MATCH
`
`266
`
`PROVIDE DFI AND
`MAXDFI TO SR'S
`
`267
`
`- - - - - - - -
`
`MERGE RANKS
`AND REQUEST
`DOCS
`
`269
`
`- - -
`
`-·-- -
`
`271
`
`272
`
`CALCULATE
`TERM
`IMPORTANCE
`
`DISPLAY
`DOCS
`
`DONE
`
`263
`
`PERFORM
`SEARCH
`
`CALCULATE DOC
`RANKS AND
`RETURN TOP N
`RANKS
`
`RETURN
`REQUESTED
`DOCS
`
`268
`
`270
`
`DONE
`
`FIG 22
`
`Page 14 of 23
`
`

`

`1
`ASSOCIATIVE TEXT SEARCH AND
`RETRIEVAL SYSTEM HAVING A TABLE
`INDICATING WORD POSITION IN PHRASES
`
`5,771,378
`
`This is a continuation of U.S. patent application Ser. No. 5
`08/155,304, filed Nov. 22, 1993.
`
`FIELD OF THE INVENTION
`
`This invention relates to the field of searching and retriev(cid:173)
`ing text documents and more particularly to the field of using
`one or more computers to search a plurality of text docu(cid:173)
`ments in order to retrieve documents having particular terms
`and phrases.
`
`BACKGROUND OF THE INVENTION
`
`2
`one of the supplied search terms and then ranks each
`document using a ranking formula that varies according to
`the square of the term frequency of each of the search terms
`in the document. The ranking formula can also vary accord(cid:173)
`ing to the inverse document frequency of each search term.
`The formula can also use a maximum term frequency to
`estimate the size of a document and the maximum document
`frequency to estimate the number of documents in a collec(cid:173)
`tion of documents, thus reducing the amount of processing
`10 needed to determine document size and the number of
`documents in a collection. The user can provide mandatory
`terms which cause the search to only return documents that
`contain those terms.
`The system can employ a thesaurus to provide both
`15 synonyms and morphological variations of words. Phrases
`in the search description are detected using a table with a
`bitmap indicating possible positions of a word in a phrase
`and by using a tree having nodes corresponding to ID's
`associated with words in a phrase, the nodes being con-
`20 nected according to the order that the words can appear in a
`phrase. The system optimizes the search by distinguishing
`between noise words, which are not provided in an index for
`the documents, and frequently used terms, which are pro(cid:173)
`vided in the index but which are not used in the search.
`The system can provide display options for the documents
`that are retrieved by the search, including displaying a
`window of text that contains the greatest number and
`diversity of search terms and mandatory terms. The system
`can also display a screen indicating which search terms are
`in which retrieved documents and can display a screen that
`indicates the importance of each term, which varies accord(cid:173)
`ing to the inverse of the document frequency of each term.
`The documents can be sorted according to rank or according
`to a predetermined default method, such as reverse chrono(cid:173)
`logical order.
`The system can include a plurality of interconnected
`processors and appropriate data therefore wherein some of
`the processors perform searches and others of the processors
`merge the search data and interact with the user.
`
`40
`
`It is known that a large collection of text documents can
`be searched for particular keywords or phrases. A user can
`provide a single word or phrase or multiple words or phrases
`connected by Boolean connectors such as "AND" or "OR".
`However, in many cases, a user must be fairly sophisticated
`in order to perform searches of sufficient complexity in order
`to retrieve the exact class of documents that the user desires
`without having to perform an excessive number of searches.
`Associative retrieval, a technique for information retrieval 25
`developed in the 1960s by Gerard Salton, addresses some of
`the shortcomings of Boolean searching. Automatic Text
`Processing, (published by Addison Wesley, New York, N.Y.
`1988, and written by Gerard Salton) provides a description
`of associative retrieval searching. The basic formula used in 30
`associative retrieval involves calculating a term weight for
`each term within a search request, and scoring documents in
`a collection based on the sum of the weights for the search
`request terms that occur within the document. The two basic
`weighting factors are known as the term frequency-tf-and 35
`the inverse document frequency-idf.
`The term frequency is defined as the number of times the
`term occurs within a given document. Hence, the term
`frequency must be calculated for each document within the
`collection.
`The inverse document frequency is defined as the inverse
`of the number of documents in the entire collection which
`contain the term. Therefore, if df documents within a col(cid:173)
`lection of N documents contain a given term, the idf would
`be 1/df. The idf can be normalized with respect to the
`number of documents by setting it to log(N/df). The idf is
`calculated for each search request term, but is constant for
`the collection and does not vary by document. The score for
`a given document is calculated by summing the product of
`the tf and the idfs for each search request term contained in
`the document.
`However, there are many aspects of Associative Retrieval
`as described by Salton which render it impractical or
`unwieldy for large scale commercial use for searching and 55
`retrieving documents in large databases. Furthermore, most
`of the work done in the area of Associative Retrieval has
`failed to adequately address aspects relating to human
`interaction and feedback. It is desirable, therefore, to provide
`an associative text search and retrieval system that over(cid:173)
`comes the deficiencies of known systems.
`
`45
`
`50
`
`BRIEF DESCRIPTION OF DRAWINGS
`FIG. 1 is a schematic view of a document searching
`system according to the invention.
`FIG. 2 illustrates data stored in a physical document
`collection.
`FIG. 3 illustrates data stored in an index for a physical
`document collection.
`FIG. 4 illustrates a logical document collection comprised
`of a plurality of physical document collections.
`FIG. 5 illustrates a logical document collection comprised
`of a plurality of subsets of physical document collections.
`FIG. 6 is a screen illustrating entry of a search description.
`FIG. 7 is a screen illustrating entry of search options.
`FIG. 8 is a screen illustrating entry of mandatory terms.
`FIG. 9 is a screen illustrating entry of restrictions.
`FIG. 10 is a screen illustrating displaying of restrictions.
`FIG. 11 is a screen illustrating entry of both mandatory
`terms and restrictions.
`FIG. 12 is a screen illustrating a thesaurus function.
`FIG. 13 is a screen illustrating selection of synonyms
`and/or morphological variations of a term using the thesau-
`65 rus function.
`FIG. 14 is a screen illustrating mandatory terms,
`restrictions, and thesaurus entries.
`
`60
`
`SUMMARY OF THE INVENTION
`
`According to the present invention, a user provides a
`search description containing one or more search terms to an
`associative text search and retrieval system that searches a
`document database to retrieve documents containing at least
`
`Page 15 of 23
`
`

`

`5,771,378
`
`3
`FIG. 15 is a screen illustrating options for viewing docu-
`ments retrieved after a search.
`FIG. 16 is a screen illustrating a "why" function.
`FIG. 17 is a screen illustrating a "where" function.
`FIG. 18 is a flowchart illustrating overall operation of the
`system according to the invention.
`FIG. 19 shows a table used to detect phrases.
`FIG. 20 shows a tree data structure used to detect phrases.
`FIG. 21 shows a plurality of contiguous words from a
`search description.
`FIG. 22 is a flowchart illustrating operation of a search
`algorithm.
`
`DETAILED DESCRIPTION OF DRAWINGS
`Referring to FIG. 1, a document search and retrieval
`system 30 allows a user to search a subset of a plurality of
`documents for particular key words or phrases and retrieves,
`for the user to view, documents that correspond to the search
`request. The system 30 comprises a plurality of Search and
`Retrieval (SR) computers 32-35 connected via a high speed
`interconnection 38 to a plurality of Session Administrator
`(SA) computers 42-44. Each of the SR's 32-35 is connected
`to one or more document collections 46-49, each containing
`text for a plurality of documents, indexes therefor, and other
`ancillary data. More than one SR can be provided access to
`a single document collection. Also, a single SR can be
`provided access to more than one document collection. The
`SR's 32-35 can be implemented using a variety of com(cid:173)
`mercially available computers known to one of ordinary
`skill in the art, such as Model EXlO0 manufactured by
`Hitachi Data Systems of Santa Clara Calif.
`Each of the SA's 42-44 is provided access to data
`representing phrase and thesaurus dictionaries 52-54. The
`SA's 42-44 can also be implemented using a variety of
`commercially available computers, such as Models 5990
`and 5995 manufactured by Amdahl Corporation of Sunny(cid:173)
`vale Calif. The interconnection 38 between the SR's and the
`SA's can be any one of a number of two-way high-speed 40
`computer data interconnections known to one of ordinary
`skill in the art, such as the Model 7200-DX manufactured by
`Network Systems Corporation of Minneapolis Minn.
`Each of the SA's 42-44 is connected to one of a plurality
`of front end processors 56-58 . The front end processors 45
`56-58 provide a connection of the system 30 one or more
`commonly available networks 62 for accessing digital data,
`such as an X.25 network, long distance telephone lines, and
`SprintNet. Connected to the network 62 is a plurality of
`terminals 64--66 which provide user access to the system 30. 50
`The terminals 64--66 can be dumb terminals that simply
`process and display data inputs and outputs, or can be one
`of a variety of readily available stand-alone computers, such
`as an IBM or IBM-compatible Personal Computer. The front
`end processors 56-58 can be implemented by a variety of 55
`commercially available devices, such as Models 4745 and
`4705 manufactured by the Amdahl Corporation of Sunny(cid:173)
`vale Calif. Note that the number of components shown in
`FIG. 1 are for illustrative purposes only and that the system
`30 described herein can have any number of SA's, SR's, 60
`front end processors, etc. Also, the distribution of processing
`described herein may be modified and may in fact be
`performed on a single computer without departing from the
`spirit and scope of the invention.
`A user wishing to access the system 30 via one of the 65
`terminals 64-66 will use the network 62 to establish a
`connection, by means known to one of ordinary skill in the
`
`4
`art, to one of the front end processors 56-58. The front end
`processors 56-58 handle communication with the user ter(cid:173)
`minals 64--66 by providing output data for display by the
`terminals 64-66 and by processing terminal keyboard inputs
`5 entered by the user. The data output by the front end
`processors 56-58 includes text and screen commands. The
`front end processors 56-58 support screen control
`commands, such as the commonly known VTl00
`commands, which provide screen functionality to the termi-
`10 nals 64-66 such as clearing the screen and moving the cursor
`insertion point. The front end processors 56-58 can handle
`other known types of terminals and/or stand-alone comput(cid:173)
`ers by providing appropriate commands.
`Each of the front end processors 56-58 communicates
`15 bidirectionally, by means known to one of ordinary skill in
`the art, with the particular one of the SA's 42-44 connected
`thereto. It is also possible to configure the system, in a
`manner known to one of ordinary skill in the art, such that
`one or more of the front end processors can communicate
`20 with more than one of the SA's 42-44. The front end
`processors 56-58 can be configured to "load balance" the
`SA's 42-44 in response to data flow patterns. The concept
`of load balancing is known to one of ordinary skill in the art.
`Each of the SA's 42-44 contains an application program,
`25 described in more detail hereinafter, that processes search
`requests input by a user at one of the terminals 64--66 and
`passes the search request information on to one or more of
`the SR's 32-35 which perform the search and returns the
`results, including the text of the documents, to the SA's
`30 42-44. The SA's 42-44 provide the user with text docu(cid:173)
`ments corresponding to the search results via the terminals
`64--66. For a particular user session (i.e. a single user
`accessing the system via one of the terminals 64-66), a
`single one of the SA's 42-44 will interact with a user
`35 through an appropriate one of the front end processors
`56-58.
`Referring to FIG. 2, data 70 stored in each of the physical
`document collections 46-49 consists of document text 72,
`an index 74, and ancillary document information 76. The
`data 70 can be located in one or more files of a computer
`hard disk storage device. The document text 72 portion of
`the data 70 is comprised of character data representing text
`(such as ASCII or EBCDIC character data) for a plurality of
`documents. Each of the documents that are part of the
`document text 72 can be accessed individually. The index 74
`contains a list of terms (words and phrases) that are present
`in all of the documents of the document text 72 along with
`the locations in the documents of those terms. The ancillary
`document information 76, described in more detail
`hereinafter, contains other information about the documents,
`such as the dates associated with the documents, the source
`of the documents, etc.
`Referring to FIG. 3, the index 74 for a document collec(cid:173)
`tion comprises a plurality of entries that relate particular
`terms (terml-termn) to a plurality of locations (loc A-loc
`I). The table shown on the left-hand portion of FIG. 3 relates
`each term to a pair of pointers such that terml is related to
`pointers Pl and P2, term2 is related to pointers P3 and P4,
`and termn is related to pointers Pr and Ps. The right-hand
`portion of FIG. 3 represents a list of all of the locations for
`all of the terms in the physical document collection. The
`pointers associated with each term point to the first and last
`locations in the list in order to correlate the terms in the text
`of the documents of the physical collection with the loca(cid:173)
`tions of the terms. For example, FIG. 3 shows terml being
`located at locations loc A (pointed to by Pl) through loc C
`(pointed to by P2) in the list. All of the locations in the list
`
`Page 16 of 23
`
`

`

`5,771,378
`
`5
`between the entry for loc A and the entry for loc C indicate
`separate locations for terml in the document collection.
`Words and phrases which are so common as to be of little
`value in document searching, such as the word "of', are
`deemed "noise words" and are not included in the index. A 5
`list of noise words for each physical document collection is
`stored with the ancillary document information 76.
`The SR's 32-34 search documents in a physical collection
`for particular terms by accessing the index 74. Terms in the
`search request are matched with terms in the index 74 in 10
`order to find specific documents in the document text 72
`which contain the terms in the search request. Plural terms
`are depluralized and stored in their singular form. Terms that
`are submitted for a search are also depluralized. Deplural(cid:173)
`ization is known in the art and is described in Program, Vol. 15
`1, no. 3, pp. 130-137, Jul. 1980, which is incorporated by
`reference herein.
`A user does not typically search all of the documents of
`the system 30, but rather, chooses a subset of the

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket