Schultz
[11] Patent Number: 5,721,902
[45] Date of Patent: Feb. 24, 1998
`
[54] RESTRICTED EXPANSION OF QUERY TERMS USING PART OF SPEECH TAGGING

[75] Inventor: John Michael Schultz, Bala Cynwyd, Pa.

[73] Assignee: Infonautics Corporation, Wayne, Pa.

[21] Appl. No.: 528,740

[22] Filed: Sep. 15, 1995

[51] Int. Cl.6 .......... G06F 17/30
[52] U.S. Cl. .......... 395/604; 364/419; 364/419.08; 364/222.01; 364/225; 364/225.3; 364/225.4
[58] Field of Search .......... 364/19.08, 419, 364/419.08, 222.01, 225.3, 225.4, 225; 395/600, 605, 604
`
`
Primary Examiner: Thomas G. Black
Assistant Examiner: Jean M. Corrielus
Attorney, Agent, or Firm: Reed Smith Shaw & McClay
`
[57] ABSTRACT

A method for searching a database of an information retrieval system in response to a query having at least one query word with a part of speech, for applying the query word to the database and selecting information from the database according to the query word. A semantic network is provided for determining expansion words to expand the search of the database in response to the query word. The part of speech of the selected query word is determined. The selected query word is applied to the semantic network to provide a query expansion word in response to the selected query word. The part of speech of the query expansion word is determined. The query expansion word is applied to the database in accordance with the part of speech of the selected query word and the part of speech of the query expansion word.
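The abstract does not spell out how the two parts of speech are combined. A minimal sketch, assuming the simplest reading (an expansion word is applied only when its part of speech matches that of the query word it was derived from) and a hypothetical dictionary standing in for the semantic network, might look like this:

```python
from typing import Dict, List, Tuple

# Hypothetical semantic network: word -> list of (related word, part of speech).
SemanticNetwork = Dict[str, List[Tuple[str, str]]]


def expand_query(tagged_query: List[Tuple[str, str]],
                 network: SemanticNetwork) -> List[Tuple[str, str]]:
    """Return the tagged query words followed by restricted expansion words.

    An expansion word is kept only when its part of speech equals the part
    of speech of the query word that produced it (assumed matching rule).
    """
    expanded = list(tagged_query)
    for word, pos in tagged_query:
        for candidate, candidate_pos in network.get(word.lower(), []):
            if candidate_pos == pos and (candidate, candidate_pos) not in expanded:
                expanded.append((candidate, candidate_pos))
    return expanded


network: SemanticNetwork = {
    "fly": [("soar", "VERB"), ("insect", "NOUN"), ("glide", "VERB")],
}
print(expand_query([("birds", "NOUN"), ("fly", "VERB")], network))
# [('birds', 'NOUN'), ('fly', 'VERB'), ('soar', 'VERB'), ('glide', 'VERB')]
```

With unrestricted expansion, the noun sense "insect" would also have been added to a query in which "fly" is used as a verb; the part of speech check is what keeps it out.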
`
`4 Claims, 22 Drawing Sheets
`
`
`Page 1 of 42
`
`GOOGLE EXHIBIT 1035
`
`
`
`U.S. Patent
`
`Feb.24, 1998
`
`Sheet 1 of 22
`
`5,721,902
`
`
`
`
FIG. 2: flowchart 200 (steps 202-210): the user tells what information they want in natural language; the system searches the database index and returns a list of items that match; the user chooses one of the matching items from the list; the system retrieves the chosen item from the document database; the user keeps or discards the parts they want from the chosen item.
FIG. 3: block diagram of system 100, with blocks for the user and user software 106 connected through channel 108 to a session server, a query server, a document index database, an image/text database, an accounting database, authentication/enrollment, accounting and document preparation functions, and incoming publisher documents 112.
`
`
`
FIG. 4: flowchart 300 (steps 305-360) of a user session: the user accesses the session manager through channel 108; user authentication; delivery of software 106 to the user and user registration; wait for a new query; preprocess and spell-check the query; the query is sent to the session manager and then to the query search engine; the session manager gets the results, performs an intersection if it is a recursive query, and transmits them to the user; the user does a new query or a recursive query, or selects a document; the session manager gets and transmits the document; the document is shown in an open window, and documents in open window(s) are used to form a composite document.
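FIG. 4 states that the session manager performs an intersection when the query is recursive. As a hedged sketch, assuming results arrive as a mapping from document identification number to relevance score and that the newer query's scores are the ones kept (the figure does not say which scores survive), the intersection could be written as:

```python
from typing import Dict, Optional


def merge_results(new_results: Dict[int, float],
                  previous_results: Optional[Dict[int, float]]) -> Dict[int, float]:
    """Combine search results as described for the session manager in FIG. 4.

    Both arguments map document identification numbers to relevance scores.
    For a new query (previous_results is None) the results pass through
    unchanged; for a recursive query only documents present in both result
    sets are kept, carrying the new query's scores (an assumption).
    """
    if previous_results is None:
        return dict(new_results)
    return {doc_id: score for doc_id, score in new_results.items()
            if doc_id in previous_results}
```

Each recursive query therefore narrows the previous result list rather than replacing it.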
`
`
`
FIG. 4A: screenshot of the Homework Helper window for the query "Who was James Doohan?", showing subject check boxes (Religion, Politics, Sports, Arts), source selections (Newspapers, Magazines, TV/Radio), "39 Items Found", and a result list with score, title, author, publication, date, size and grade columns; entries include Magill's Survey of Cinema reviews of STAR TREK III -- THE SEARCH FOR SPOCK, STAR TREK IV -- THE VOYAGE HOME and STAR TREK VI -- THE UNDISCOVERED COUNTRY, an Archive Photos picture of James Doohan, a USA TODAY article ("Trekkers' coordinates set for new film") and a Los Angeles Times article by Robert W. Welkos ("'Star Trek': Even Space Has Its Road Bumps").
`
`
`
FIG. 4B: screenshot of the same query in which the Magill's Survey of Cinema entry for STAR TREK III -- THE SEARCH FOR SPOCK is displayed in an open document window 348 beside the result list.
`
`
`
FIG. 4C: screenshot of a composite document window 348a assembled from parts of the retrieved items (the STAR TREK III review from Magill's Survey of Cinema and the Archive Photos picture of James Doohan).
`
`
`
FIG. 5: record layouts.
Document information record (records 400): document identification number 401; publisher identification number 402; title 403; author last name 404; author first name 405; copyright date 406; copyright holder 407; copyright message 408; source type 409; readability level 410; subject information 411; document type 412; last retrieval date 413; original install date 414; last install date 415; install count 416; filename 417; offset 418; document size 419; Dewey decimal call number 420; Library of Congress call number 421; publisher name 422; publication date 423; publication name 424; edition 425.
Image record (records 430): document identification number; image filename 432; image offset 433; document size 434.
Publisher record (records 440): publisher identification number 441; name 442; address 1 443; address 2 444; city 445; state 446; zip 447; country 448; contact name 449; contact address 450; contact telephone number 451.
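The document and image records locate their content through filename and offset fields (417, 418 and 432, 433). A minimal sketch of a subset of these records as Python dataclasses, assuming that many documents are concatenated in one file and that the document size is a byte count (neither detail is stated in this section), shows how the pointers could be followed:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocumentRecord:
    """Subset of the FIG. 5 document information record (numerals in comments)."""
    document_id: int          # 401
    publisher_id: int         # 402
    title: str                # 403
    readability_level: int    # 410
    subject_info: str         # 411
    filename: str             # 417: file holding the document text
    offset: int               # 418: byte offset of the document in that file
    document_size: int        # 419: assumed to be a length in bytes
    last_retrieval_date: Optional[date] = None  # 413


@dataclass
class ImageRecord:
    """FIG. 5 image record: locates image data for a document."""
    document_id: int
    image_filename: str       # 432
    image_offset: int         # 433
    document_size: int        # 434


def read_document_bytes(record: DocumentRecord) -> bytes:
    """Follow the filename/offset pointers (417, 418) to fetch one document."""
    with open(record.filename, "rb") as fh:
        fh.seek(record.offset)
        return fh.read(record.document_size)
```

Storing offsets rather than one file per document lets a single large file hold many documents while each record still addresses its own slice.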
`
`
`
FIG. 5A: index structure (items 460-463): an alphabetical list of search terms (AARDVARK, ABACUS, ABANDON, ABATE, ABYSS, ..., ZYGOTE), each associated with location data consisting of entries that pair an address (ADDR) with a document identification number (DOC.#).
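FIG. 5A pairs each search term with (address, document number) entries, which is the shape of an inverted index. A minimal in-memory sketch, assuming "ADDR" can be read as a word position inside the document (the patent may instead mean a storage address), and counting query-term hits per document:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Location(NamedTuple):
    doc_id: int    # "DOC.#" in FIG. 5A
    address: int   # "ADDR" in FIG. 5A, interpreted here as a word position


def build_index(docs: Dict[int, str]) -> Dict[str, List[Location]]:
    """Map each lower-cased search term to every location where it occurs."""
    index: Dict[str, List[Location]] = defaultdict(list)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append(Location(doc_id, position))
    return dict(index)


def lookup(index: Dict[str, List[Location]], query: str) -> Dict[int, int]:
    """Return {doc_id: number of query-term occurrences} for the query words."""
    hits: Dict[int, int] = defaultdict(int)
    for word in query.lower().split():
        for loc in index.get(word, []):
            hits[loc.doc_id] += 1
    return dict(hits)


index = build_index({1: "the aardvark ate", 2: "an abacus and an aardvark"})
print(lookup(index, "aardvark abacus"))  # {1: 1, 2: 2}
```

Because the terms are the keys, one lookup per query word finds every matching document without scanning the document store.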
`
`
`
FIG. 5B: organization of image/text database 118 (portions 118a, 118b): a document information directory table (records 400) whose filename and offset pointers 417, 418 locate text-type information, and an image table (records 430) whose filename and offset pointers 432, 433 locate image-type information (items 460, 470, 480).
`
`
`
FIG. 6: block diagram of the query and retrieval components, including a search engine API 140, a scheduler, search engines with their indexes 117, active session managers, a query session manager, an accounting manager with accounting information and user profiles, a document retrieval system, and the text and image documents (118a, 118b).
`
`
`
FIG. 6A: state diagram 130 of a user session (states 130a-130d): login and initial profile, a wait-for-user-action state that increments a query counter (# QUERIES++), a recursive query state and a document retrieval state, with transitions labeled "# QUERIES > 1".
`
`
`
FIG. 6B: flowchart of query processing 142 (steps 142a-142l): receive query fields from search engine API 140; identify the part of speech of each word in the natural language query; for each query word, increase its relevance, decrease its relevance, or consult the semantic network and add expansion words; perform the search on indices database 117; normalize the output relevance scores; transmit document identification numbers and the corresponding relevance scores to search engine API 140.
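FIG. 6B raises or lowers relevance depending on the tagged query word, and the objects of the invention mention adjusting the weightings of proper nouns and "slow words" before the database is searched. A minimal sketch of that weighting step, assuming hypothetical tag names, hypothetical multipliers of 2.0 and 0.5, and a caller-supplied set standing in for the slow words (this section does not define them):

```python
from typing import Dict, List, Set, Tuple

# Hypothetical multipliers; the text in this section gives no values.
PROPER_NOUN_BOOST = 2.0
SLOW_WORD_PENALTY = 0.5


def weight_query_words(tagged_query: List[Tuple[str, str]],
                       slow_words: Set[str]) -> Dict[str, float]:
    """Assign a weight to each query word before the index search.

    Words tagged "PROPER_NOUN" have their relevance increased, words found
    in slow_words have it decreased, and all other words keep weight 1.0.
    """
    weights: Dict[str, float] = {}
    for word, pos in tagged_query:
        weight = 1.0
        if pos == "PROPER_NOUN":
            weight *= PROPER_NOUN_BOOST
        elif word.lower() in slow_words:
            weight *= SLOW_WORD_PENALTY
        weights[word] = weight
    return weights
```

For the FIG. 4A query "Who was James Doohan?", this would emphasize "James" and "Doohan" and de-emphasize "Who" and "was" if those appear in the slow-word set.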
`
`
`
FIG. 7A: tables 700 for part of speech tagging: an initial matrix 710 giving one probability per part of speech (entries Pa, Pb, Pc, ... for determiner, noun, verb, ...) and a subsequent matrix 720 giving a probability for each pair of parts of speech (entries Pd through Pl over determiner, noun and verb).
FIG. 7B: a variant 750 of the initial matrix 710 (entries Px) and the subsequent matrix 720 (entries Pd through Pl).
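FIG. 7A and FIG. 7B show only the tables, not how they are used. One standard use for an initial-probability table and a tag-to-tag table is as the start and transition probabilities of a hidden Markov model tagger decoded with the Viterbi algorithm. The sketch below assumes that interpretation, adds a hypothetical word-given-tag (emission) table that the figures do not show, and uses made-up probabilities:

```python
from typing import Dict, List, Tuple

TAGS = ["DET", "NOUN", "VERB"]


def viterbi(words: List[str],
            initial: Dict[str, float],
            transition: Dict[str, Dict[str, float]],
            emission: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most probable tag sequence for words.

    initial[t]        P(first tag = t)           (cf. initial matrix 710)
    transition[a][b]  P(next tag = b | tag = a)  (cf. subsequent matrix 720)
    emission[t][w]    P(word w | tag t); assumed, not shown in FIG. 7
    """
    if not words:
        return []
    # best[t] holds (probability, tag path) of the most likely path ending in t.
    best: Dict[str, Tuple[float, List[str]]] = {
        t: (initial[t] * emission[t].get(words[0], 0.0), [t]) for t in TAGS
    }
    for word in words[1:]:
        step: Dict[str, Tuple[float, List[str]]] = {}
        for t in TAGS:
            # Best predecessor for t: path probability times transition probability.
            prob, path = max(
                (best[prev][0] * transition[prev][t], best[prev][1]) for prev in TAGS
            )
            step[t] = (prob * emission[t].get(word, 0.0), path + [t])
        best = step
    return max(best.values())[1]


# Made-up probabilities for illustration only.
initial = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
transition = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emission = {
    "DET": {"the": 0.7},
    "NOUN": {"dog": 0.4, "barks": 0.1},
    "VERB": {"barks": 0.5},
}
print(viterbi(["the", "dog", "barks"], initial, transition, emission))
# ['DET', 'NOUN', 'VERB']
```

Here "barks" could be a noun or a verb; the noun-to-verb transition probability is what makes the tagger choose the verb reading.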
FIG. 8A: table 800 of normalized relevance scores, with rows for the number of query words (1 to 10) and columns WS 0 through WS 10; the entries range from 60 to 100.
`
`
`
FIG. 8B: chart 850 relating the number of query words (1 to 10) to a scale from 10 to 100, with regions labeled 804a, 804b and 804c.
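FIG. 6B normalizes output relevance scores, and the objects require relevance values that are independent of the number of terms in the query. The entries of table 800 cannot be recovered here, so the sketch below substitutes a simpler, plainly different rule: divide each document's raw score by the best raw score the weighted query could achieve and scale to 0-100. It assumes raw scores are sums of the weights of matched query words.

```python
from typing import Dict


def normalize_scores(raw_scores: Dict[int, float],
                     word_weights: Dict[str, float]) -> Dict[int, int]:
    """Scale raw document scores to 0-100 so they do not grow with query length.

    raw_scores maps document identification numbers to the summed weights of
    the query words each document matched; the largest possible raw score is
    the sum of all word weights, which serves as the denominator.
    """
    max_possible = sum(word_weights.values())
    if max_possible <= 0:
        return {doc_id: 0 for doc_id in raw_scores}
    return {doc_id: round(100 * min(score, max_possible) / max_possible)
            for doc_id, score in raw_scores.items()}
```

A document matching every word scores 100 whether the query has two words or ten, which is the length independence the objects call for.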
`
`
`
FIG. 9: document preparation system 900: publisher format conversion units 1 through N, an image reformatting unit 904, a text reformatting unit 908, an analysis unit, update image, update text and update index units, and a readability unit (reference numerals 912-932).
`
`
`
FIG. 9A: data flow through the document preparation component 900: unconverted text-type and image-type information from publisher 112 passes through reformatting (908a, 908b) and is stored as converted text-type and image-type information (460, 470, 480) in database 118b.
`
`
`
`
`
`
FIG. 11: flowchart 1100 (steps 1104-1132) for building a subject lexicon: select a miniature corpus; determine miniature corpus statistics; determine a preliminary lexicon; determine a rough corpus; eliminate incorrect documents; test whether the subject lexicon is complete.
`
`
`
FIG. 12: data flow 1200 for subject lexicon construction, showing a document pool 1004, a query 1204, a subject pool (SUBJ_POOL), a mini corpus, a stats pool, combined statistics (COMB_STATS), subject lexicon files 1238 (e.g., ANIMALS.LEX, ARTS.LEX), a chooser, a rough corpus, a refined corpus and subject documents (SUBJ_DOC).
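FIG. 12 shows per-subject lexicon files such as ANIMALS.LEX and ARTS.LEX, and the background calls for assigning each incoming document to one or more subjects without a human reader. A minimal sketch of how such lexicons could be applied, assuming each lexicon is a plain set of words and a hypothetical threshold on the share of a document's words found in the lexicon (the figures do not state the actual rule):

```python
from typing import Dict, List, Set


def assign_subjects(text: str, lexicons: Dict[str, Set[str]],
                    threshold: float = 0.1) -> List[str]:
    """Return every subject whose lexicon covers at least `threshold` of the words."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return []
    subjects = []
    for subject, lexicon in sorted(lexicons.items()):
        share = sum(1 for w in words if w in lexicon) / len(words)
        if share >= threshold:
            subjects.append(subject)
    return subjects


lexicons = {
    "ANIMALS": {"aardvark", "zebra", "habitat"},
    "ARTS": {"painting", "sculpture", "gallery"},
}
print(assign_subjects("The zebra painting in the gallery", lexicons))
```

Because every subject is tested independently, one document can land in several subjects, which covers the multi-subject case the background describes.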
`
`
`
FIG. 13: accounting database 119, comprising a subscriber profile database 119a, an accounting records database 119b of accounting records 119c, a publisher information database 119d of publisher records 440, a retrieval account table database of retrieval entries 119h, and a query log table database 119e of query log entries.
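The background asks for tracking how often each document is retrieved, the demographics of the retrieving users, and the queries used; FIG. 13 shows retrieval entries and query log entries for this purpose. A minimal in-memory sketch, assuming a hypothetical age-group field drawn from the subscriber profile and plain dictionaries in place of database tables:

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AccountingLog:
    """In-memory stand-in for the retrieval account and query log tables."""
    retrievals: List[Dict[str, object]] = field(default_factory=list)  # like entries 119h
    queries: List[Dict[str, object]] = field(default_factory=list)     # query log entries

    def log_query(self, user_id: int, query: str) -> None:
        self.queries.append({"user_id": user_id, "query": query})

    def log_retrieval(self, user_id: int, doc_id: int, age_group: str) -> None:
        # age_group is a hypothetical demographic field from the user's profile.
        self.retrievals.append({"user_id": user_id, "doc_id": doc_id, "age_group": age_group})

    def retrieval_counts(self) -> Counter:
        """How often each document was retrieved (e.g., for royalty reports)."""
        return Counter(entry["doc_id"] for entry in self.retrievals)

    def demographics_for(self, doc_id: int) -> Counter:
        """Breakdown of one document's retrievals by demographic group."""
        return Counter(e["age_group"] for e in self.retrievals if e["doc_id"] == doc_id)
```

Keeping raw entries rather than running totals lets the same log answer both the per-document count and the per-group breakdown a publisher might ask for.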
`
`
`
`5,721,902
`
`RESTRICTED EXPANSION OF QUERY
`TERMS USING PART OF SPEECH TAGGING
`
`FIELD OF THE INVENTION
`
The present invention is directed to systems for identifying documents corresponding to a search topic or query. More particularly, the present invention is directed to an automated multi-user system for identifying and retrieving text and multi-media files related to a search topic from a database library composed of information from many various publisher sources.
`
`BACKGROUND OF THE INVENTION
`
Information retrieval systems are designed to store and retrieve information provided by publishers covering different subjects. Both static information, such as works of literature and reference books, and dynamic information, such as newspapers and periodicals, are stored in these systems. Information retrieval engines are provided within prior art information retrieval systems in order to receive search queries from users and perform searches through the stored information. It is an object of most information retrieval systems to provide the user with all stored information relevant to the query. However, many existing searching/retrieval systems are not adapted to identify the best or most relevant information yielded by the query search. Such systems typically return query results to the user in such a way that the user must retrieve and view every document returned by the query in order to determine which document(s) is/are most relevant. It is therefore desirable to have a document searching system which not only returns a list of relevant information to the user based on a query search, but also returns the list to the user in such a form that the user can readily identify which information returned from the search is most relevant to the query topic.
Existing systems for searching and retrieving files from databases based on user queries are directed primarily to the searching and retrieval of textual documents. However, there is a growing volume of multi-media information being published which is not textual. Such multi-media information corresponds, for example, to still images, motion video sequences and digital audio sequences, which may be stored and retrieved by digital computers. It would be desirable from the point of view of an individual using an information searching/retrieval system to be able to query a library or database and identify not only text documents, but also multi-media files that are relevant to the user's query. Moreover, it would be desirable if the searching system could return to the user a single list having both text and multi-media information relevant to the query search, presented so that the user can readily identify which of the text and multi-media files are most relevant to the query topic.
`Each different publisher providing documents that may be
`retrieved by information retrieval systems typically uses its
`own information format to store and transmit its information
`files. Thus, an information searching/retrieval system which
`has a library database based upon information from many
`various publishers must be compatible with many different
`publisher formats. This compatibility requirement can serve
`to slow the performance of an information searching/
`retrieval system.
It is well known in the prior art of information retrieval systems to permit a user to specify a single subject of a number of subjects for searching. For example, a user may wish to search only sports literature, medical literature or art literature. This avoids unnecessary searching through database documents that are not relevant to the subject of interest to the user. In order to provide this capability, information retrieval systems must categorize documents received from publishers according to their subject prior to adding them to the database. Subjecting of incoming documents often requires an individual to read each incoming document and make a determination regarding its subject. This process is very time-consuming and expensive, as there is often a large number of incoming documents to be processed. The subjecting process may be further complicated if certain documents should properly be categorized in more than one subject. It would be desirable to have an automated system for processing incoming documents which categorized each incoming document into one or more subjects, and which did not require an individual to read each incoming document and make a separate judgment categorizing the subject of such document.
When a user of an information searching/retrieval system enters a search query into the system, the query must be parsed. Based on the parsed query, a listing of stored documents relevant to the query is provided to the user for review. In the prior art, it is known to use semantic networks when parsing a query. Semantic networks make it possible to identify words not appearing in the query, but which correspond to or are associated with the words used in the query. The number of words used to search the database is then expanded by including the corresponding words or associated words identified by the semantic network in the search instructions. This procedure is used to increase the number of relevant documents located by the information searching/retrieval system. Although semantic networks may be useful for finding additional relevant documents responsive to a query, it is believed that use of such networks also tends to increase the number of irrelevant documents located by the search. In fact, it is generally believed that the number of additional relevant documents identified through the use of semantic networks is roughly equal to the number of irrelevant documents which are also brought into the search results list as a result of the semantic network. It would be desirable to have a system for implementing a semantic network which maximized the number of relevant documents identified during the search, without substantially increasing the number of irrelevant documents found by the search.
Many publishers that provide documents to information retrieval systems require record-keeping in order to ensure accurate royalty payments. Record-keeping permits the publishers to determine the interest level in various documents produced by the publisher, and the demographics of users retrieving such documents. Thus, it would be desirable to have a searching/retrieval system that tracked not only how often each document stored in the system database was retrieved by users, but also the demographics of the users retrieving the documents and the query searches used to identify and retrieve such documents.
It is therefore an object of the present invention to provide a searching/retrieval system which can query a library or database and identify not only text documents, but also multi-media files stored on the library or database that are relevant to the query.
It is a further object of the present invention to provide a searching/retrieval system that accepts a query and returns a single search results list having both text and multi-media information, which list is presented in a format that enables the user to readily identify which of the text and multi-media files are most relevant to the query topic.
`
It is a still further object of the present invention to provide a scalable computer architecture for implementing a searching/retrieval system which can query a database and identify text documents and multi-media files stored on the database that are relevant to the query.
It is a still further object of the present invention to provide an information searching/retrieval system which has a library database based upon information from many various publishers, and which is compatible with many different publisher formats.
It is a still further object of the present invention to provide an information searching/retrieval system which has a library database based upon information from many various publishers, and wherein such information is stored in a central database in one or more common information formats.
It is a still further object of the present invention to provide an automated system for processing incoming documents to be stored on a library or database, which system categorizes each incoming document into one or more subjects, and which does not require an individual to read each incoming document and make a separate judgment categorizing the subject of such document.
It is a still further object of the present invention to provide a system for implementing a semantic network which maximizes the number of relevant documents identified during the query search, without substantially increasing the number of irrelevant documents found by the search.
It is a still further object of the present invention to provide a system for using a semantic network which maximizes the number of relevant documents identified during a query search by semantically expanding the search in response to the part of speech associated with each query term in the search.
It is a still further object of the present invention to provide a searching system that queries a database to determine text documents and multi-media files relevant to the query, wherein weightings associated with proper nouns and slow words are adjusted prior to searching the database.
It is a further object of the present invention to provide a searching/retrieval system that accepts a query and returns a single search results list including document relevance values, wherein the document relevance values are independent of the number of terms in the query.
It is yet a still further object of the present invention to provide a searching/retrieval system that tracks not only how
of textual documents. The multi-media records have multi-media information fields for representing only digital video (i.e., still images or motion video image sequences), digital audio or graphics information, and associated text fields, each of the associated text fields representing text associated with one of the multi-media information fields. A single search query corresponding to the search topic is received. The single search query is preferably in a natural language format. An index database is searched in accordance with the single search query to simultaneously identify document records and multi-media records related to the single search query. The index database has a plurality of search terms corresponding to terms represented by the text information fields and the associated text fields. The index database also includes a table for associating each of the document and multi-media records with one or more of the search terms. A search result list having entries representative of both textual documents and multi-media files related to the single search query is generated in accordance with the document records and the multi-media records identified by the index database search. Text corresponding to the search topic is retrieved by selecting entries from the search result list representing document records to be retrieved, and then retrieving text represented by the text information fields associated with the selected document records. Digital video, audio or graphics information corresponding to the search topic is retrieved by selecting entries from the search result list representing selected multi-media records to be retrieved, and then retrieving digital video, audio or graphics information represented by multi-media information fields associated with the selected multi-media records.
In accordance with a further aspect, the present invention is directed to a computer-implemented method and apparatus for composing a composite document on a selected topic from a plurality of information sources by searching the plurality of information sources and identifying, displaying and copying files corresponding to the selected topic. A plurality of records, each of which is representative of at least one information file, are stored in a database. A single search query corresponding to the search topic is received. The database is searched in accordance with the single search query to identify records related to the single search query. A search result list is then generated having entries representative of information files identified during the database search, and the search result list is displayed in a first display window open on a user display. Signals representative of at least first and second selected entries from the search result list are