United States Patent [19]
`Doner et al.
`
[54] APPARATUS AND METHOD FOR
RETRIEVING AND GROUPING IMAGES
REPRESENTING TEXT FILES BASED ON
THE RELEVANCE OF KEY WORDS
EXTRACTED FROM A SELECTED FILE TO
THE TEXT FILES
`
[75] Inventors: Christopher G. Doner, San Francisco;
Lawrence G. Miller, Saratoga; Ian D.
Emmons, Richmond; Michael R.
Barnes, Berkeley, all of Calif.

[73] Assignee: Caere Corporation, Los Gatos, Calif.

[21] Appl. No.: 948,669

[22] Filed: Sep. 22, 1992

[51] Int. Cl.6 ............ G06F 17/30; G06F 17/21
[52] U.S. Cl. ............. 395/605; 364/419.19; 395/348; 395/759
[58] Field of Search ...... 395/600, 159; 364/419.08, 419.19
`
[56] References Cited

U.S. PATENT DOCUMENTS

4,359,824   11/1982   Glickman et al. ........ 364/419.19
4,839,853    6/1989   Deerwester et al. ...... 395/600
4,868,733    9/1989   Fujisawa et al. ........ 395/600
5,020,019    5/1991   Ogawa .................. 395/600
5,060,135   10/1991   Levine et al. .......... 364/200
5,062,074    9/1991   Kleinberger ............ 395/600
5,211,563    5/1993   Haga et al. ............ 434/322
5,263,159   11/1993   Mitsui ................. 395/600
5,276,616    1/1994   Kuga et al. ............ 364/419.8
5,297,042    3/1994   Morita ................. 364/419.19
`
`OTHER PUBLICATIONS
`
Salton et al., "Parallel Text Search Methods", Communications
of the ACM vol. v31 Issue N2 p. 202(14), Feb. 1988.
Kimoto et al "A Dynamic Thesaurus and Its Application to
Associated Information Retrieval" Jul. 1991
IJCNN-91-Seattle IEEE Press pp. 19-29 vol. 1.
`
US005598557A
[11] Patent Number: 5,598,557
[45] Date of Patent: Jan. 28, 1997
`
`Churbuck, "Haystack Searching", Forbes, v. 149, n. 4 Feb.
`17, 1992, pp. 130 (2).
`
`Donna Harman and Gerald Candela, "Retrieving Records
`from a Gigabyte of Text on a Minicomputer Using Statistical
`Ranking", Dec. 1990, pp. 581-589.
`
`Kimoto et al., "Automatic Indexing System for Japanese
`Text" 1989, Review of the Electrical Communications
`Laboratories, V. 37, No. 1, pp. 51-56.
`
Al-Hawamdeh, S. et al., "Compound Document Processing
System", Proc. of the Fifteenth Annual International Computer
Software and Applications Conf., pp. 640-644 Sep.
1991.

Salton, G. et al., "The SMART Automatic Document
Retrieval System-An Example", Communications of the
ACM, vol. 8 No. 6, pp. 391-398 Jun. 1965.
`
`Primary Examiner-Thomas G. Black
`Assistant Examiner-Jack M. Choules
Attorney, Agent, or Firm-Blakely, Sokoloff, Taylor & Zafman
`
`[57]
`
`ABSTRACT
`
An apparatus for searching and retrieving files in a database
without a user being required to provide keywords or query
terms. A user first selects and opens a reference file. A
natural language recognition algorithm is used to determine
the subject words of the selected file. Next, a statistical
comparison between the subject words and the contents of
files in a database is performed. Based on the statistical
comparison, files are assigned weighted relevancies. Relevant
files are prioritized and displayed to the user in groups.
The groups are formed based on the retrieved files' relevance
to specific subject words of the selected file. The groups of
retrieved files are displayed in association with the subject
word they are relevant to.
`
`30 Claims, 8 Drawing Sheets
`
[Representative drawing: excerpt of FIG. 7, steps 707-710.]
`
`Page 1 of 15
`
`GOOGLE EXHIBIT 1036
`
`

`

`U.S. Patent
`
`Jan.28, 1997
`
`Sheet 1 of 8
`
`5,598,557
`
[FIG. 1: block diagram of computer system 100. Labeled elements:
bus 101, processor 102, main memory 104, static memory 106,
mass storage device 107, OCR 108, display 121, keyboard 122,
cursor control 123, hard copy device 124, sound recording and
playback device 125, scanner 126.]
`
`

`

[FIG. 2 (Sheet 2 of 8): flowchart of the steps for creating a new
database. Labeled steps: Import Files 201, Manual Input 202, Scan,
Specify Zones 204, Recognize 205, Edit, Index 207.]
`
`

`

[FIG. 3 (Sheet 3 of 8): user interface window with search menu 301
listing Weighted Word Search, Weighted Boolean Search, a Document
Agent Search entry 304, and Options. Folders shown: Wonder Company
(Sales) and Wonder Products (Thingamagigs, Gidgets, Widgets,
Whatsits).]
`
`

`

[FIG. 4 (Sheet 4 of 8): window displaying a search dialog box. The
drawing text is not recoverable from the extraction.]
`
`

`

[FIG. 5 (Sheet 5 of 8): window 500 showing the results of a document
agent search. Recoverable labels: "Current Agent Document: Widgets";
Widgets 501 in folder Wonder Products 502, alongside Thingamagigs,
Gidgets, Whatsits, and folder Wonder Company (Sales); result list
503-508; section 509 with chart 510, reference thumbnail 511, and
subject words "gadget" 515, "machines" 516, "product" 517; menus
File, Edit, Search, Results, Admin, Options, Window, Help, Debug.]
`
`

`

[FIG. 6 (Sheet 6 of 8): flowchart of the steps for determining and
ranking the relevance of files in a database.]

601  INPUT A LIST OF KEYWORDS
602  SEARCH FOR DOCUMENTS MEETING THE KEYWORD REQUIREMENTS
603  COMPUTE THE IDF FOR A DOCUMENT
604  COMPUTE THE RELEVANCE OF THE DOCUMENT TO A KEYWORD
     [decision box with a NO branch; label not recoverable]
606  SUM THE RELEVANCES OF DOCUMENT TO EACH OF THE KEYWORDS
     [decision box with a NO branch; label not recoverable]
608  RANK EACH DOCUMENT ACCORDING TO ITS ASSIGNED WEIGHT
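The flow of FIG. 6 can be illustrated with a short sketch. The exact formulas used by the patent do not appear in this portion of the text, so the sketch assumes that IDF denotes the conventional inverse document frequency and that a document's relevance to a keyword is a length-normalized term frequency multiplied by that IDF. The function names, the smoothing term, and the sample documents are all hypothetical.

```python
import math
import re


def tokenize(text):
    """Lowercase a text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(documents, keywords):
    """Return (name, weight) pairs sorted from most to least relevant.

    Assumed scoring, not the patent's own formula:
    weight = sum over keywords of (tf / document length) * idf.
    """
    tokens = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokens)
    weights = {}
    for name, words in tokens.items():
        total = 0.0
        for keyword in (k.lower() for k in keywords):
            # Number of documents in the database containing the keyword.
            df = sum(1 for other in tokens.values() if keyword in other)
            if df == 0 or not words:
                continue
            idf = math.log(n_docs / df) + 1.0  # assumed smoothing term
            tf = words.count(keyword) / len(words)  # normalized to length
            total += tf * idf  # sum relevances over keywords
        if total > 0:
            weights[name] = total
    # Rank each document according to its assigned weight.
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


documents = {
    "Data Sheet": "widget widget widget specifications",
    "Brochure": "wonder products include the widget and the gadget",
    "Magazine": "a competing product review that mentions the widget "
                "once among many other words",
    "Memo": "lunch schedule",
}
ranked = rank_documents(documents, ["widget"])
for name, weight in ranked:
    print(f"{name}: {weight:.3f}")
```

Documents that contain none of the keywords receive no weight and are left out of the ranked list, which mirrors the search step that precedes the weighting in the flowchart.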
`
`

`

[FIG. 7 (Sheet 7 of 8): flowchart of the steps involved in a document
agent search.]

701  SELECT AND OPEN A REFERENCE DOCUMENT
702  PARSE THE REFERENCE DOCUMENT INTO SENTENCES
703  DISREGARD STOP WORDS
704  DETERMINE PARTS OF SPEECH FOR EACH WORD IN THE SENTENCE
705  DETERMINE THE SUBJECT WORD OF THE SENTENCE
     [decision box with a NO branch; label not recoverable]
707  BASED ON THE SUBJECT WORDS OF THE REFERENCE DOCUMENT, DETERMINE
     WEIGHTED RELEVANCE OF DOCUMENTS IN THE DATABASE
708  RANK AND DISPLAY THE RELEVANT DOCUMENTS ACCORDING TO THEIR WEIGHTS
709  DETERMINE THE THREE MOST COMMON SUBJECT WORDS IN THE REFERENCE
     DOCUMENT
710  FOR EACH OF THE THREE MOST COMMON SUBJECT WORDS, RETRIEVE AND
     PRIORITIZE DOCUMENTS RELEVANT TO THOSE SUBJECT WORDS
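Steps 701 through 709 of FIG. 7 can be sketched roughly as follows. The patent determines each sentence's subject word from parts of speech and heuristic grammar rules that are not reproduced in this portion of the text; the sketch below substitutes a much cruder stand-in, taking the first non-stop word of each sentence as its subject word. The stop-word list, the function names, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the patent's own list is not given here.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and"}


def split_sentences(text):
    """Parse the reference document into sentences (step 702)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def subject_word(sentence):
    """Stand-in for steps 703-705: first word that is not a stop word.

    This is NOT the patent's part-of-speech heuristic, only a placeholder.
    """
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word not in STOP_WORDS:
            return word
    return None


def top_subject_words(text, n=3):
    """Return the n most common subject words in the document (step 709)."""
    subjects = (subject_word(s) for s in split_sentences(text))
    counts = Counter(word for word in subjects if word)
    return [word for word, _ in counts.most_common(n)]


reference = (
    "Gadgets are useful. The gadgets sell well. Machines build the gadgets. "
    "Machines need power. A product is shipped. Gadgets are small."
)
top_three = top_subject_words(reference)
print(top_three)
```

A real implementation would replace `subject_word` with a tagger and grammar rules; the surrounding counting logic would stay the same.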
`
`

`

[FIG. 8 (Sheet 8 of 8): search results window 800. Recoverable labels:
"Current Agent Document: Widgets"; reference numerals 801 and 802;
subject words "gadget", "machines", "product"; document names
Advertisement, Dissertation, Newspaper, Magazine; menus File, Edit,
Search, Results, Admin, Options, Window, Help, Debug.]
`
`

`

`APPARATUS AND METHOD FOR
`RETRIEVING AND GROUPING IMAGES
`REPRESENTING TEXT FILES BASED ON
`THE RELEVANCE OF KEY WORDS
`EXTRACTED FROM A SELECTED FILE TO
`THE TEXT FILES
`
`FIELD OF THE INVENTION
`
The present invention pertains to the field of computerized
information search and retrieval systems and methods.
More particularly, the present invention relates to an apparatus
and method for searching and retrieving texts found in
a database as a function of their relevancy to a desired
subject matter.
`
`BACKGROUND OF THE INVENTION
`
Due to rapid advances made in electronic storage technology,
it is becoming ever more convenient and economically
attractive to store information electronically as a series
of digital bits of data. As such, "texts" from magazines,
newspapers, journals, encyclopedias, books, and other
printed materials are increasingly being classified and
grouped together into various databases. These texts can be
comprised of miscellaneous strings of characters, sentences,
or documents having indeterminate or varied lengths and
can be of a wide variety of data classes, such as words,
numbers, graphics, etc. Computers are then utilized to
access these databases in order to store additional new text
and to retrieve old, stored texts. One added advantage of
electronically storing information is that computers can be
programmed to search and retrieve specific texts in a database
which are of special interest to the user. In essence, a
computer can perform indexing functions, such as a card
catalog. A user can retrieve a particular text by inputting the
title, author, date of publication, or some other description
specific to that text. In response, the computer can automatically
search, retrieve, and display the desired text.

However, if the user does not know of a specific text or
wishes to conduct research on a general subject matter, the
computer can be programmed to select certain text which
might be of significance to the user. Prior art search and
retrieval systems have typically accomplished this by focusing
on "keywords" or query terms. A user who wishes to find
texts of a particular nature first specifies one or more
keywords which might be contained in the desired texts.
Typically, each text in the database is assigned a unique
reference number. All words in the text, except for trivial
words such as "a" and "the," are tagged with the unique
reference number and are placed in an alphabetical index.
Hence, all texts in the database containing a given keyword
are located by searching for that keyword in the alphabetical
index and returning a set of reference numbers. Thereby,
texts corresponding to the reference numbers are known to
contain the keyword and are accessed via the computer.

In order to provide the user with greater flexibility, many
prior art search and retrieval systems provide for "Boolean"
searches. A Boolean search involves searching for documents
containing more than one keyword. This is typically
accomplished by joining the keywords with conjunctions
such as the exclusive "AND" function and/or the inclusive
"OR" function. If two or more keywords are joined by an
AND, only those texts which contain all those joined keywords
are retrieved. If two or more keywords are joined by
the inclusive "OR" function, all texts which contain at least
one of the joined keywords are retrieved. For example, given
that a user specifies a search for (keyword 1 AND keyword
2) OR keyword 3, the computer retrieves all texts containing
keyword 3 plus those texts containing both keyword 1 and
keyword 2. Two examples of this type of text retrieval
system are the LEXIS(TM) and Dialog(TM) systems.

Even though computerized search and retrieval systems
greatly facilitate a user in locating relevant texts, there yet
remain many disadvantages with these systems. One disadvantage
of this type of prior art search and retrieval
method is that the user is required to anticipate one or more
keywords used to identify and distinguish relevant texts. In
other words, the user must guess the words used by the
author of a desired text. This problem arises because a user
typically does not have advance knowledge of how the texts
of interest are worded. If a user fails to guess appropriate
keywords, highly relevant text might be missed.

Another disadvantage with typical prior art search and
retrieval systems is that picking significant keywords is a
tricky and delicate operation. If a keyword is too common
and/or if a user utilizes an inclusive OR function to join
multiple keywords, a search request can potentially result in
the retrieval of hundreds of texts satisfying the broadly
defined search criteria. Often, only a small handful of texts
among the hundreds of retrieved texts is of actual interest to
a user. The user must then expend much time and energy to
tediously scan each text and winnow out the truly relevant
texts from the vast pool of retrieved texts. Conversely, if the
keyword is too specific or if the exclusive AND function is
used to join multiple keywords, the search might be too
restrictive. Highly relevant text which did not meet the
specific keyword criteria will not be retrieved. Hence, a user
frequently chooses different keywords and conjunctions in a
costly and time-consuming iterative process to tailor the
search request. Consequently, operating typical prior art
search and retrieval systems requires skill, training, and
expertise.

Therefore, what is needed is an apparatus and method for
determining and ranking the significance of each retrieved
document so that a user can broaden the scope of a search
to catch any relevant text without being unduly burdened by
having to wade through inconsequential texts. It would be
highly preferable for the same apparatus and method to also
provide a mechanism to easily and naturally navigate
between texts dealing with related subject matter.
`SUMMARY OF THE INVENTION
`
In view of the problems associated with information
search and retrieval systems, one object of the present
invention is to provide an apparatus and method for ranking
retrieved documents according to their relevance.

Another object of the present invention is to provide an
information search and retrieval system which does not
require a user to specify keywords or query terms.

Another object of the present invention is to provide a
mechanism so that a user can easily and naturally navigate
between groups of files dealing with related subject matter.

These and other objects of the present invention are
implemented in an information search and retrieval computer
system. A user initiates a search by selecting and
opening a file containing subject matter of particular interest.
The computer system performs a natural language recognition
algorithm to determine the subject words of the document
corresponding to the selected file. This is accomplished by
parsing the document into sentences, determining the parts
of speech for each word in the sentence, and picking out the
subject word of the sentence based on heuristic syntactical
grammar rules.

Once all the subject words in the reference document have
been found, they are used in a statistical comparison algorithm
to determine the relevancy of each file in a database.
A file's relevancy is a function of both the frequency of
subject words occurring in that file and the distribution of the
subject words within the database. The file's relevancy is
also normalized to its length. Relevant files are then
retrieved and displayed in a list. The most relevant documents
are displayed at the top of the list, while those which
are not as relevant are displayed in descending order. Hence,
a user is not required to guess at keywords or query terms
prior to conducting a search. The user need only select a
document which is of interest, and the present invention
retrieves and prioritizes relevant documents residing in the
database.

The present invention also provides a user with a means
for navigating between files of related topics. A thumbnail
image comprising a scaled down bit-mapped representation
of the cover sheet of the reference document is displayed.
The three most commonly occurring subject words in the
reference document are displayed next to this thumbnail
image. Files in the database which have relevance to each of
the three subject words are retrieved and are prioritized
according to their degree of relevance to that particular
subject word. The thumbnail image of the most relevant file
to the first subject word is displayed adjacent to that subject
word. It is followed by the thumbnail image of the next most
relevant file to the first subject word, etc. Similar thumbnail
images of files corresponding to the second and third subject
words are also displayed.

By placing a moveable cursor over any of the thumbnail
images and clicking on it, the user can designate that file to
be the new reference file. This initiates a new search based
on the subject words of the new reference file. The search
produces a new list of files ranked according to the degree
of relevance to the new reference file. It also produces the
three most common subject words of the new reference
document and new thumbnail images of files prioritized to
those subject words. Thus, the present invention allows a
user to conduct research on a topic by successfully selecting
new reference documents based on prior search results.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example,
and not by way of limitation, in the Figures of the accompanying
drawings and in which like reference numerals refer
to similar elements and in which:

FIG. 1 illustrates a computer system as may be utilized by
the preferred embodiment of the present invention.
FIG. 2 is a flowchart illustrating the steps for creating a
new database.
FIG. 3 illustrates a typical window displayed on a CRT
which can be used as a user interface for the present
invention.
FIG. 4 illustrates a window displaying a search dialog
box.
FIG. 5 is a window illustrating the results of a document
agent search.
FIG. 6 is a flowchart illustrating the steps for determining
and ranking the relevance of files in a database.
FIG. 7 is a flowchart illustrating the steps involved in a
document agent search.
FIG. 8 illustrates a search results window.

DETAILED DESCRIPTION

An apparatus and method for searching and retrieving
significant text from a database is described. In the following
description, for the purposes of explanation, numerous specific
details such as mathematical formulas, flowcharts,
menus, etc., are set forth in order to provide a thorough
understanding of the present invention. It will be apparent,
however, to one skilled in the art that the present invention
may be practiced without these specific details. In other
instances, well-known structures and devices are shown in
block diagram form in order to avoid unnecessarily obscuring
the present invention.

Referring to FIG. 1, the computer system upon which the
preferred embodiment of the present invention can be implemented
is shown as 100. Computer system 100 comprises a
bus or other communication means 101 for communicating
information, and a processing means 102 coupled with bus
101 for processing information. System 100 further comprises
a random access memory (RAM) or other dynamic
storage device 104 (referred to as main memory), coupled to
bus 101 for storing information and instructions to be
executed by processor 102. Main memory 104 also may be
used for storing temporary variables or other intermediate
information during execution of instructions by processor
102. Computer system 100 also comprises a read only
memory (ROM) and/or other static storage device 106
coupled to bus 101 for storing static information and instructions
for processor 102. Data storage device 107 is coupled
to bus 101 for storing information and instructions.

Furthermore, a data storage device 107 such as a magnetic
disk or optical disk and its corresponding disk drive can be
coupled to computer system 100. Computer system 100 can
also be coupled via bus 101 to a display device 121, such as
a cathode ray tube (CRT), for displaying information to a
computer user. An alphanumeric input device 122, including
alphanumeric and other keys, is typically coupled to bus 101
for communicating information and command selections to
processor 102. Another type of user input device is cursor
control 123, such as a mouse, a trackball, or cursor direction
keys for communicating direction information and command
selections to processor 102 and for controlling cursor
movement on display 121. This input device typically has
two degrees of freedom in two axes, a first axis (e.g., x) and
a second axis (e.g., y), which allows the device to specify
positions in a plane.

Moreover, data can be input by scanner 126. The scanner
126 serves to read out the contents of an original document
or photograph as digitized image information. An OCR
(Optical Character Reader) 108 can be utilized to recognize
textual portions of a scanned document. Another device
which may be coupled to bus 101 is hard copy device 124
which may be used for printing instructions, data, or other
information on a medium such as paper, film, or similar
types of media. Additionally, computer system 100 can be
coupled to a device for sound recording and/or playback 125
such as an audio digitizer coupled to a microphone for
recording information. Further, the device may include a
speaker which is coupled to a digital to analog (D/A)
converter for playing back the digitized sounds. Finally,
computer system 100 can be a terminal in a computer
network (e.g., a LAN).

The currently preferred embodiment of the present invention
can be part of an overall document management software
package. To conduct a search, a user first specifies a
particular database. Databases are usually organized so that
files stored on a particular database share a common
attribute. For example, an attorney might utilize a database
containing cases from a particular jurisdiction; a doctor
might consult a database containing files of patient histories;
a marketing manager might access a database containing
product reviews for spotting market trends; etc. The database
can be an already existing database or a newly created
database. FIG. 2 is a flowchart illustrating the steps for
creating a new database. Computer files containing useful
information can be imported by copying them over to the
database, step 201. Moreover, data in the form of documents,
reports, magazine and newspaper articles, can be
entered either manually by means of a keyboard, step 202,
or they can be entered by using an optical scanner, step 203.
Moreover, the data can already exist on the computer
system. The user can specify zones of a scanned image or
file which are of particular significance for further processing,
step 204. Textual portions of a scanned bit-map image or file
can be recognized and converted into ASCII code data, step
205. The ASCII code data can then be edited, step 206.
Finally, the processed information is indexed and saved to
the database, step 207.
Once a database has been selected, the user can select a
weighted keyword search, a weighted Boolean search, or a
document agent search. FIG. 3 illustrates a typical window
300 which can be displayed on a CRT. Window 300 is
provided as a user interface for the present invention. Window
300 is comprised of a number of pull-down menus which
can be accessed by a cursor positioning device, such as a
mouse. The search menu 301 is accessed by the user to select
the desired type of search (i.e., keyword 302, Boolean 303,
or document search 304). The selected type of search is
highlighted. For example, FIG. 3 illustrates the user having
selected a Document Agent Search 304.

If the user selects the weighted word search 302, a search
dialog box 401 is displayed, as illustrated in FIG. 4. The user
then types in one or more keywords and clicks on the OK
box 402 to initiate the search based on the inputted keyword(s).
When the search is completed, a Search Results
window 403 is displayed. FIG. 4 illustrates a Search Results
window 403 displaying a list of retrieved documents
405-407. The list displays those retrieved documents as a
function of their relevance. Documents having the most
significance are displayed at the top of the list, whereas
retrieved documents having less relevance are displayed
near the bottom of the list. In addition to displaying each
retrieved document according to its relevancy, a box bearing
a bar is superimposed over each document's file name. The
extension of the bar indicates that document's degree of
relevance to the keyword(s). For example, a search based on
the keyword WonderWidget 404 might result in the retrieval
of three documents 405-407. (It is noted that WonderWidget
and Widgets are fictitious names.) A data sheet 405 describing
the product, which is highly relevant, is displayed at the
top of the list and has a relatively long bar. A brochure 406
describing all Wonder products, including WonderWidget,
having some relevance, is displayed in the middle. It has a
medium-sized bar. A magazine article 407 about a competing
product that mentions WonderWidget has low relevance and
is ranked last in the list. Correspondingly, it has a small bar.
In the currently preferred embodiment, the bars are color
coded red, green, and blue, to respectively indicate the
documents having much, some, and less relevance. The
determination of the document's relevancy is described in
detail below.
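The relevance bars can be illustrated with a small sketch. The text states only that bar length tracks relevance and that red, green, and blue mark much, some, and less relevance; the scaling to the top score, the thresholds at two thirds and one third, the bar width, and the sample weights below are all assumptions.

```python
def relevance_bars(ranked, width=20):
    """Turn (name, weight) pairs into (name, bar, color) rows.

    Bar length is proportional to the weight relative to the top weight.
    The color thresholds (2/3 and 1/3 of the top weight) are assumed,
    not taken from the patent.
    """
    if not ranked:
        return []
    top = max(weight for _, weight in ranked)
    rows = []
    for name, weight in ranked:
        fraction = weight / top if top > 0 else 0.0
        if fraction >= 2 / 3:
            color = "red"      # much relevance
        elif fraction >= 1 / 3:
            color = "green"    # some relevance
        else:
            color = "blue"     # less relevance
        bar = "#" * max(1, round(fraction * width))
        rows.append((name, bar, color))
    return rows


# Hypothetical weights for the three documents of the WonderWidget example.
sample = [("Data Sheet", 0.9), ("Brochure", 0.45), ("Magazine Article", 0.1)]
rows = relevance_bars(sample)
for name, bar, color in rows:
    print(f"{name:18s} {bar:20s} {color}")
```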
For greater flexibility, a user can specify a Weighted
Boolean Search, wherein keywords are joined by conjunctions
(e.g., AND, OR, etc.). Again, any retrieved documents
are weighted and ranked according to their relevance to the
Boolean search request. Typically, a Boolean search results
in the retrieval of a few highly relevant documents, a
medium sized grouping of documents having modest relevancy,
and a large grouping of documents having little
relevancy. Note that in the present invention, a user is not
unduly penalized for using inclusive OR conjunctions.
Although more documents are likely to be retrieved, the user
can quickly scan through the most significant documents
(i.e., documents at the top of the list). The effect of adding
keywords in an inclusive OR search contributes to the
determination of a document's relevancy and influences
which documents "float" to the top of the list.

Alternatively, a user can opt for a Document Agent
Search, which allows the user to initiate a search for
documents which are similar to a reference document
selected by the user. First, the user selects and opens a
reference document. Next, the user selects the Document
Agent Search option from the Search pull-down menu.
Thereupon, the present invention retrieves documents from
the database which are related to the reference document.
The relevancy of each retrieved document to the reference
document is determined, and each document is ranked and
displayed according to its relevancy.

FIG. 5 shows a window 500, as may be displayed on a
CRT, illustrating the results of a Document Agent Search. A
user first selects a particular file, such as Widgets 501, from
a folder Wonder Products 502. The Widgets 501 document
is designated the reference document against which other
documents in the database are compared in determining
relevancy. Note that with this type of search, the user is not
required to supply keywords. The present invention retrieves
those documents that are considered to be relevant, ranks
each retrieved document, and lists the retrieved documents
in descending order based on their degrees of relevancy. For
example, if six documents 503-508 were retrieved, the top
document entitled Data Sheet 503 is considered to have the
most relevance to the reference document Widgets 501.
Likewise, the bottom documents, such as Dissertation 507
and Advertisement 508, are considered to be the least
relevant.

A section 509 of window 500 is used to display an
organized chart 510 of relevant documents. Initially, chart
510 displays a "thumbnail" image 511 of the cover sheet of
the reference document. A thumbnail image is a bit-mapped
shrunken, miniaturized representation of a page of a document
(usually the title page). Multiple rows of thumbnail
images 512-514 are displayed to the right of the thumbnail
image of the reference document. Each row comprises
retrieved files of relevant documents. The first row corresponds
to retrieved files having relevance with respect to the
most relevant subject word in the reference document;
similarly, the second row corresponds to retrieved files
having relevance with respect to the second most relevant
word in the reference document; etc. For example, if the
three most relevant subject words in the reference document
Widgets 511 are "gadget" 515, "machines" 516, and "product"
517, those documents having relevance to the word
"gadget" are categorized into the top row. The second and
third rows comprise documents having relevance to the
subject words "machines" and "product." The documents in
a row are arranged so that the most relevant document is
placed at the left with successively decreasing relevant
documents placed to the right. Hence, document 512 has
more relevance to the subject word "gadget" 515 than
document 518.
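The row layout of chart 510 can be sketched as a grouping step: one row per subject word, with documents ordered from most to least relevant within each row. The relevance measure used here (a length-normalized term frequency), the function names, and the sample documents are assumptions for illustration only, not the patent's own computation.

```python
import re


def relevance(text, word):
    """Assumed measure: occurrences of the word divided by document length."""
    words = re.findall(r"[a-z]+", text.lower())
    return words.count(word) / len(words) if words else 0.0


def build_chart(subject_words, documents):
    """Return {subject word: [document names, most relevant first]}."""
    chart = {}
    for word in subject_words:
        scored = [(name, relevance(text, word)) for name, text in documents.items()]
        # Keep only documents with some relevance to this subject word.
        scored = [(name, score) for name, score in scored if score > 0]
        # Leftmost position in the row goes to the most relevant document.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        chart[word] = [name for name, _ in scored]
    return chart


documents = {
    "Data Sheet": "gadget gadget product",
    "Brochure": "gadget machines machines product",
    "Magazine": "machines machines review",
}
chart = build_chart(["gadget", "machines", "product"], documents)
for word, row in chart.items():
    print(f"{word}: {row}")
```

A document may appear in more than one row, since it can be relevant to several subject words of the reference document.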
Chart 510 provides a user with a means for navigating
between related documents. By glancing at the thumbnail
images, the subject words, and the titles, a user can get a
general indication of those documents which are of interest.
The user can also open a document to examine its contents.
The user can then select a particularly interesting document
by positioning a cu
