`
`6366908
`
`6366908
`
`UTILITY Patent Application
`
`
`PATENT DATE
`
`SCANNED
`
APR 02 2002
`
`
`CLASS
`
`SUBCLASS
`
ART UNIT

EXAMINER
`
`TITLE OF
`
`APPLICANT(S):
`
`1
`1
`
` ORIGINAL
`
`1
`
` ISSUING CLASSIFICATION
`CROSS REFERENCE(S)
` SUBCLASS (ONE SUBCLASS PER BLOCK)
`jT
`e w/ ZOi
`lO
`7
`/77
`
`1
`
`|
`|
`|
`
`SUBCLASS
`
`CLASS
`CLASS
`3
`7C7
`707
`1 INTERNATIONAL CLASSIFICATION 70 If
`/7/30
`■3SZ
`G 0.i£
`1 r f ^
`
`'
`
`/
`
`/
`
`□Continued on Issue Slip Inside File Jacket
`
`|
`
`Sheets Drwg.
`
`DRAWINGS
`Figs. Drwg.
`6
`
`Print Fig.
`
`CLAIMS ALLOWED
`
`Totai Claims
`
`Print Giaim for O.G.
`/
`NOTICE OF ALLOWANCE MAILED
`
(Assistant Examiner)
THOMAS BLACK
SUPERVISORY PATENT EXAMINER
TECHNOLOGY CENTER [number not legible in the scan]
(Date)

(Primary Examiner)

(Legal Instruments Examiner)
(Date)
`
`ISSUE FEE
`
`Amount Due
`
`Date Paid
`
ISSUE BATCH NUMBER
`
`(FACE)
`
[TERMINAL DISCLAIMER box: form text and entries not legible in the scan]

The information disclosed herein may be restricted. Unauthorized disclosure may be prohibited by the United States Code Title 35, Sections 122, 181 and 368.
Possession outside the U.S. Patent & Trademark Office is restricted to authorized employees and contractors only.
FILED WITH: □ DISK (CRF) □ FICHE □ CD-ROM
(Attached in pocket on right inside flap)
`
`Form PTO-436A
`(Rev. 6/B9)
`
`Page 1 of 178
`
`GOOGLE EXHIBIT 1002
`
`
`
`09475743
`
`1
`
`INITIALS.
`
`i'l
`
`1. Application SJTXuji- papers.
`Pt.>/pyai—
`
`OiinXl^
`
`CONTENTS
`Date Received
`(inci. 0. of M.)
`or
`Date Mailed
`
`U-30-'5'?
`
`(X-?>0
`
`42..
`
`43..
`
`44.
`
`45.
`46.
`9y^5^o]
`-HiHs 47.
`
`Date Received
`(Incl. 0. of M.)
`or
`Date Mailed
`
`7.
`
`e..
`
`9..
`
`10..
`
`11..
`
`12..
`
`13..
`
`14..
`
`15..
`
`16..
`
`17..
`
`18.
`
`19.
`
`20.
`
`21.
`
`22.
`
`23. _
`
`24. _
`
`25. _
`
`26. _
`
`27. _
`
`28. _
`
`29..
`
`30..
`
`31..
`
`32..
`
`33..
`
`34.:
`
`35..
`
`36..
`
`37..
`
`38..
`
`39..
`
`40.
`
`41.
`
`48.
`
`49.
`
`50.
`
`51.
`
`52.
`
`53. _
`
`54. _
`
`55. _
`
`56. _
`
`57. _
`
`58. _
`
`59. _
`
`60. _
`
`61. _
`
`62. _
`
`63. _
`
`64. _
`
`65. _
`
`66..
`
`67..
`
`68..
`
`69..
`
`70..
`
`71..
`
`72..
`
`73..
`
`74..
`
`75..
`
`76..
`
`77..
`
`78..
`
`79..
`
`80..
`
`81.
`
`82.
`
`(LEFT OUTSIDE)
`
`Page 2 of 178
`
`
`
ISSUE SLIP STAPLE AREA (for additional cross references)

POSITION / INITIALS / ID NO. / DATE
FEE DETERMINATION
O.I.P.E. CLASSIFIER
FORMALITY REVIEW
RESPONSE FORMALITY REVIEW

INDEX OF CLAIMS
Legend: Rejected; Allowed; (Through numeral) Canceled; Restricted; N Non-elected; I Interference; A Appeal; O Objected
[Claim grid with columns Final, Original, and Date for claims 1 through 150; handwritten entries are not legible in the scan.]

If more than 150 claims or 10 actions staple additional sheet here

(LEFT INSIDE)
`
`
`
`!
`
`SEARCHED
`
`SEARCH NOTES
`(INCLUDING SEARCH STRATEGY)
`
`Date
`
`Exmr.
`nz"
`
`I
`
`<i/i^/el
`J'
`
`II
`
`'ijnhi
`j
`
`>/
`
`Sea
`
`To discuss
`fri<^r ^rr
`
`See.
`
`7^¥/^
`
`S'e^f^rcA, S^e.
`
`Sub.
`
`Date
`
`/ 2
`
`iZ
`
`Class
`107
`
`7 /
`
`O
`Z7^./
`
`/ oZ.
`
`tcf.t
`
`lof
`
`I
`
`INTERFERENCE SEARCHED
`
`Class
`
`Sub.
`
`Date
`
`Exmr.
`
`1
`
`1
`
`(RIGHT OUTSIDE)
`
`3 5 7 1
`
`0
`
`IV I
`l.vi
`
`ni
`
`i 1
`
`7t?7
`•i-
`
`Page 4 of 178
`
`
`
SERIAL NUMBER: 09/475,743
FILING DATE: 12/30/99
CLASS: 707
GROUP ART UNIT: 2771
ATTORNEY DOCKET NO.: [not legible in the scan]

KYUNG TAEK CHONG, TAEJON, REPUBLIC OF KOREA; MYONG-GIL JANG, TAEJON, REPUBLIC OF KOREA; MISEON JUN, TAEJON, REPUBLIC OF KOREA; SE YOUNG PARK, TAEJON, REPUBLIC OF KOREA.

**CONTINUING DOMESTIC DATA*********************
VERIFIED

**371 (NAT'L STAGE) DATA*********************
VERIFIED

**FOREIGN APPLICATIONS**********
VERIFIED
REPUBLIC OF KOREA 1999-25035 06/28/99

IF REQUIRED, FOREIGN FILING LICENSE GRANTED 02/09/00 ** SMALL ENTITY **
35 USC 119 (a-d) conditions met: □ no □ Met after Allowance
Verified and Acknowledged

TOTAL CLAIMS: 12
INDEPENDENT CLAIMS: 2
STATE OR COUNTRY: KRX
SHEETS DRAWING: 5
FILING FEE RECEIVED: $345

SEED AND BERRY LLP
6300 COLUMBIA CENTER
SEATTLE WA 98104-7092

KEYFACT-BASED TEXT RETRIEVAL SYSTEM, KEYFACT-BASED TEXT INDEX METHOD AND RETRIEVAL METHOD

FEES: Authority has been given in Paper No. ____ to charge/credit DEPOSIT ACCOUNT NO. ____ for the following:

All Fees
1.16 Fees (Filing)
1.17 Fees (Processing Ext. of time)
1.18 Fees (Issue)
Other
Credit
`
`
`
PATENT APPLICATION SERIAL NO. ____

U.S. DEPARTMENT OF COMMERCE
PATENT AND TRADEMARK OFFICE
FEE RECORD SHEET

01/18/2000 DBUTLER 00000004 09475743
01 FC:201    345.00 OP

Repln. Ref: 01/18/2000 DBUTLER [reference number not legible in the scan]
DA#:191090 Name/Number:09475743
FC: 704    $35.00 CR

PTO-1556
(5/87)

*U.S. GPO:1999-459-082/19144
`
`
`
Please type a plus sign (+) inside this box [+]

PTO/SB/05 (2/98)

UTILITY PATENT APPLICATION TRANSMITTAL
(Only for new nonprovisional applications under 37 CFR § 1.53(b))

Attorney Docket No.: 30005[remainder not legible in the scan]
First Inventor or Application Identifier: Kyung Taek Chong
Title: KEYFACT-BASED TEXT RETRIEVAL SYSTEM, KEYFACT-BASED TEXT INDEX METHOD, AND RETRIEVAL METHOD
Express Mail Label No.: EL427974151US

ADDRESS TO:
Box Patent Application
Assistant Commissioner for Patents
Washington, D.C. 20231

APPLICATION ELEMENTS
See MPEP chapter 600 concerning utility patent application contents.

1. General Authorization Form & Fee Transmittal
   (Submit an original and a duplicate for fee processing)
2. Specification [Total Pages: 20]
   (preferred arrangement set forth below)
   - Descriptive Title of the Invention
   - Cross References to Related Applications
   - Statement Regarding Fed sponsored R&D
   - Reference to Microfiche Appendix
   - Background of the Invention
   - Brief Summary of the Invention
   - Brief Description of the Drawings (if filed)
   - Detailed Description
   - Claim(s)
   - Abstract of the Disclosure
3. [X] Drawing(s) (35 USC 113) [Total Sheets]
4. Oath or Declaration [Total Pages]
   a. [X] Newly executed (original or copy)
   b. [ ] Copy from a prior application (37 CFR 1.63(d))
      (for continuation/divisional with Box 17 completed)
      [ ] DELETION OF INVENTOR(S)
          Signed statement attached deleting inventor(s) named in the prior application, see 37 CFR 1.63(d)(2) and 1.33(b)
5. [ ] Incorporation By Reference (useable if Box 4b is checked)
   The entire disclosure of the prior application, from which a copy of the oath or declaration is supplied under Box 4b, is considered to be part of the disclosure of the accompanying application and is hereby incorporated by reference therein.
6. Microfiche Computer Program (Appendix)
7. Nucleotide and Amino Acid Sequence Submission (if applicable, all necessary)
   a. Computer-Readable Copy
   b. Paper Copy (identical to computer copy)
   c. Statement verifying identity of above copies

ACCOMPANYING APPLICATION PARTS
8. [X] Assignment Papers (cover sheet & document(s))
9. [ ] 37 CFR 3.73(b) Statement (when there is an assignee)   [X] Power of Attorney
10. English Translation Document (if applicable)
11. Information Disclosure Statement (IDS)/PTO-1449   [X] Copies of IDS Citations
12. Preliminary Amendment
13. [X] Return Receipt Postcard
14. Small Entity Statement(s)   [ ] Statement filed in prior application. Status still proper and desired
15. Certified Copy of Priority Document(s) (if foreign priority is claimed)
16. Other: Certificate of Express Mail; Check

17. If a CONTINUING APPLICATION, check appropriate box and supply the requisite information below and in a preliminary amendment:
    [ ] Continuation   [ ] Divisional   [ ] Continuation-in-Part (CIP) of prior Application No.: ____
    Prior application information: Examiner ____   Group/Art Unit ____

[X] Claims the benefit of Korean Application No. 1999-25035 filed June 28, 1999

CORRESPONDENCE ADDRESS
E. Russell Tarleton
Seed and Berry LLP
701 Fifth Avenue, Suite 6300
Seattle, Washington 98104-7092
(206) 622-4900 phone; (206) 682-6031 fax

Respectfully submitted,

TYPED or PRINTED NAME: E. Russell Tarleton
REGISTRATION NO.: 31,800
SIGNATURE: [not legible in the scan]
Date: [not legible in the scan]

U:\float\jabl\300055.418ptosb05
`
`
`
EXPRESS MAIL NO. EL427974151US

KEYFACT-BASED TEXT RETRIEVAL SYSTEM, KEYFACT-BASED TEXT INDEX METHOD, AND RETRIEVAL METHOD

TECHNICAL FIELD

The present invention relates to a keyfact-based text retrieval method and a keyfact-based text index method. In particular, the methods describe the formalized concept of a document as a pair comprising an object that is the head and a property that is the modifier, and use the information described by the pair as index information for efficient document retrieval.
`
BACKGROUND OF THE INVENTION

A keyfact means an important fact contained in the sentences which constitute a document. The keyfact is represented by an object and property information through syntactic analysis of the sentence.

The keyword-based text retrieval method was the mainstream among conventional text retrieval methods. However, the precision of the keyword-based text retrieval method was not good, for the following reasons. First, the meaning of the document is not precisely represented and the representativeness of document expression is low because the document is represented by keywords, which are nouns. This is a fundamental reason for poor retrieval precision. Second, when a query includes a natural language phrase or a natural language sentence or keywords, the intention of the user's query is not reflected precisely in a keyword-based text retrieval method because the query is expressed by keywords. Therefore, the keyword-based text retrieval method has a fundamental limitation in retrieval precision because it performs document retrieval by keywords. As a result, because the keyword-based text retrieval system provides such a low level of retrieval precision, it causes a number of unnecessary retrievals and therefore precious resources, such as time and effort, are wasted.

Recently, a number of studies have been performed in the area of phrase-based text retrieval methods in order to compensate for such defects of the
`
`
`
`28
`
`29
`
`30
`
`31
`
`32
`
`33
`
`34
`
`35
`
`36
`
`37
`
`38
`
`39
`
`40
`
`41
`
`42
`
`43
`
`44
`
`45
`
`46
`
`47
`
`48
`
`49
`
`50
`
`51
`
`52
`
`53
`
`54
`
`isSt#
`1 ^
`
`AC
`
`iri*S? SI
`
`r. s
`
`issxi
`
`! sS
`
`■3
`
`I'S;
`i y
`! s
`!.y
`!;3
`
`'si-i
`Lil
`
`SKT
`
`A new approach to keyfact-based text retrieval methods has been proposed in
`
`In the keyfact-based retrieval method, it is desirable that phrases or words
`
`Since the keyword-based retrieval method doesn't recognize the conceptual
`
`SUMMARY OF THE INVENTION
`A keyfact-based retrieval method, which extracts the precise keyfact pattern
`
`
`
`
`In addition, a keyfact-based retrieval method, which extracts precise keyfact
`
`55
`
`56
`
`57
`
`58
`
`59
`
`60
`
In addition, a keyfact-based retrieval method, which retrieves and indexes documents with the unit of keyfact, is provided.

A keyfact-based text retrieval system of the present invention includes

keyfact index structure. The keyfact retrieving means for receiving the keyfact of the user
`
`The keyfact extracting means includes morphology analyzing means,
`
`keyfact terms.
`
`The keyfact indexing means includes frequency calculating means, table
`generating means, and keyfact indexing means. The frequency calculating means calculates
`
`3 (
`
`-1
`
`.ttss.
`
`•
`
`rs^
`
`■4
`in
`
`i il
`
`'tv
`
`ry
`ly
`
`rsaj,
`
`;.f1
`
`.
`
`62
`
`63
`
`64
`
`65
`
`66
`
`67
`
`68
`
`69
`
`70
`
`71
`
`72
`
`73
`
`74
`
`75
`
`76
`
`77
`
`78
`
`79
`
`80
`
`81
`
`Page 10 of 178
`
`
`
a frequency of various keyfacts and a document frequency of the keyfacts. The various keyfacts are included in the document collection, and the document frequency is the number of documents that contain the various keyfacts. The table generating means generates a document index table, a document table, and a keyfact index table of the document collection. The keyfact indexing means forms a keyfact index structure. The keyfact index structure has information regarding document frequency, document identifier, and keyfact frequency in each corresponding document.

The keyfact retrieving means includes the following means. A means forms a document and a user query vector with an index file and the keyfact of the user query. The index file is generated by the keyfact indexing means. The keyfact of the user query is generated by the keyfact extracting means. A means determines keyfact weight constants in accordance with the keyfact pattern. A means calculates keyfact weights for the document and the user query by applying the keyfact weight constants to the document and the user query vector. The retrieval results displaying means displays the retrieval result by applying the keyfact weights to the keyfact retrieval model. The retrieval result indicates documents with a keyfact similar to the keyfact of the user query.
`
A keyfact-based text retrieving method of the present invention includes a keyfact extracting step, a keyfact indexing step, and a keyfact retrieving step. The keyfact extracting step analyzes a document collection and a user query, extracts keywords without part-of-speech ambiguity from the document collection and the user query, and respectively extracts keyfacts of the document collection and the user query from the keywords. The keyfact indexing step calculates the frequency of the keyfacts of the document collection and generates a keyfact list of the document collection for a keyfact index structure. The keyfact retrieving step receives the keyfact of the user query and the keyfacts of the document collection, defines a keyfact retrieval model in consideration of weight factors according to the keyfact pattern, and generates the retrieval result.

The step of keyfact extracting includes the following steps. The first step is to analyze the morphology of an input sentence and obtain tag sequences of part-of-speech by attaching part-of-speech tags. The second step is to select a tag sequence of part-of-speech out of the tag sequences of part-of-speech. The third step is to extract a keyfact pattern by applying the tag sequence of part-of-speech to a keyfact pattern rule. The fourth step is to apply the keyfact pattern to a keyfact pattern generation rule and generate a keyfact list.
The step of analyzing morphology includes the following steps. The first step is to divide the input sentence into words. The second step is to perform morphological analysis on the words using part-of-speech dictionaries. The third step is to perform morphological variation and recover prototypes. The fourth step is to obtain the tag sequence of part-of-speech by tagging part-of-speech tags in accordance with the result of the morphological analysis.

The part-of-speech dictionaries include a noun dictionary, a verb dictionary, an adjective dictionary, an adverb dictionary, a preposition dictionary, a conjunction dictionary and a stop-word lexicon.

The step of keyfact indexing includes the following steps. The first step is to calculate a frequency of various keyfacts and a document frequency of the keyfact. The second step is to generate a document index table, a document table and a keyfact index table of the document collection. The third step is to form a keyfact index structure including document frequency, document identifier and keyfact frequency.

The step of keyfact retrieving includes the following steps. The first step is to form a document and a user query vector with an index file and a keyfact of the user query. The second step is to determine keyfact weight constants in accordance with the keyfact pattern. The third step is to calculate keyfact weights for the document and the user query by applying the keyfact weight constants to the document and the user query vector. The fourth step is to display the retrieval result by applying the keyfact weights to the keyfact retrieval model. The retrieval result indicates documents with a keyfact similar to the keyfact of the user query.
`
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram illustrating a keyfact-based text retrieval system of the present invention;

Figure 2 is a block diagram illustrating a hardware structure of a keyfact-based text retrieval system in accordance with an embodiment of the present invention;

Figure 3 is a block diagram illustrating a keyfact extraction device of a keyfact-based text retrieval system in accordance with an embodiment of the present invention;

Figure 4 is a block diagram illustrating a keyfact index device of a keyfact-based text retrieval system in accordance with an embodiment of the present invention;

Figure 5 is a block diagram illustrating a keyfact retrieval device of a keyfact-based text retrieval system in accordance with an embodiment of the present invention; and

Figure 6 is a screen image illustrating a document retrieval result in response to a query.
`
DETAILED DESCRIPTION OF THE INVENTION

Figure 1 is a block diagram illustrating a keyfact-based text retrieval system of the present invention. The keyfact-based text retrieval system comprises a keyfact extraction device 11, a keyfact index device 12, and a keyfact retrieval device 13. Figure 2 is a block diagram illustrating a hardware structure of a keyfact-based text retrieval system in accordance with an embodiment of the present invention.

As shown in Figure 2, the main memory device 21 includes a keyfact extraction device 11, a keyfact index device 12, a keyfact retrieval device 13, and an index structure 16. The central processing device 23 supervises the keyfact-based text retrieval. A hard disk 24 stores a document collection 25, dictionaries for keyfact retrieval 26, and an index file 27 that is the result of the keyfact index. The index file 27 is loaded onto the main memory as an index structure 16 and the keyfact retrieval device 13 uses the index file. The input and output device 22 receives a query from a user and presents retrieval results to the user.
`
`
`
Now, the keyfact-based text retrieval system in accordance with the present invention is explained with reference to Figure 1. Once a document collection 14 or a query 15 is given, the keyfact extraction device 11 extracts words without ambiguity by performing morphological analysis and tagging. The keyfact generation rule is applied to the words and then the keyfacts are extracted.

The keyfact index device 12 indexes the document collection 14 or the query with the unit of keyfact and calculates the frequencies of the keyfacts. The frequencies of the keyfacts are stored into the index structure 16 with the document ID information. The keyfact retrieval device 13 orders documents using the similarity calculation method and shows retrieval results. The similarity calculation method considers the document collection and keyfact weights with the help of a keyfact-based text retrieval model. In keyfact-based text retrieval, when a document collection 14 or a query is given, the keyfact extraction device 11 expresses it in the unit of keyfacts. All keyfacts express a semantic relation between words in the form of [object, property]. Keyfacts can be categorized by configurations of an object and a property. Parts of text that express the same conceptual meaning in the document collection or the query are categorized into the same keyfact type. The keyfact extraction device will be reviewed in detail below with Figure 3.

The keyfact index device 12 indexes the extracted keyfacts with frequency information. In other words, the keyfact index device 12 calculates frequencies of the various forms of keyfacts included in the documents and generates a keyfact list of the document collection. Therefore, an index structure 16 that reflects keyfacts is created and the index file is stored. The keyfact index device 12 will be reviewed in detail below with Figure 4.

When the keyfact retrieval device 13 receives a query, it retrieves appropriate documents on the basis of the keyfact-based retrieval method. The keyfact retrieval model is defined by considering weights of keyfact patterns. The similarity between the query and the documents is calculated and appropriate documents for the query are shown as a result in the order of the similarity. The keyfact retrieval device 13 will be reviewed in detail below with Figure 5.
`
`
`
As shown in Figure 3, the keyfact extraction device 11 analyzes a document and generates keyfacts through the processes of morphological analysis, part-of-speech tagging, keyfact pattern extraction, and keyfact generation.

A document is supplied at stage 31 and morphological analysis is performed at stage 32. A sentence in the document is divided into words and the morphological analysis is performed with dictionaries 36 at stage 32. The morphological variation is considered in order to recover prototypes. The dictionaries 36 include a noun dictionary, a verb dictionary, an adjective dictionary, an adverb dictionary, a preposition dictionary, a conjunction dictionary, and a stop-word lexicon. In some cases, the part-of-speech of a word is determined by rules without dictionaries.

The part-of-speech tags in the dictionaries 36 include noun (N), verb (V), adjective (A), preposition (P), and stop-word (S). The noun is further divided into proper noun (NQ), name noun (NN), vocative noun (NV), unit noun (NU), predicate noun (NP), non-predicate noun (NX), etc. The reason for such division is that the class of noun determines the object or the property of the keyfacts.
`
For example, in a sequence of words having two or three nouns in a row, it is likely that name noun (NN), proper noun (NQ), and non-predicate noun (NX) are objects and vocative noun (NV), unit noun (NU), and predicate noun (NP) are properties. Additionally, in a phrase having proper noun (NQ), name noun (NN), and non-predicate noun (NX), the order of priority of nouns in the object is name noun (NN) > proper noun (NQ) > non-predicate noun (NX).

The preposition is divided into the possessive preposition (PO), which is used as "of", the positional preposition (PP), etc. The adjective or the inflected verb that modifies the noun is tagged as a modifier (MP), which is a separate keyfact tag. For example, in analyzing "the fast retrieval of the distributed information" with morphological analysis, the resulting tag sequence would be "S (stop-word) A (adjective) NV (vocative noun) PO (possessive preposition) S (stop-word) V-ed (verb) NV (vocative noun)". The V-ed (verb) is a modified form of a verb that modifies the noun. Like the A (adjective), the V-ed (verb) is converted into a keyfact tag MP and the sequence of nouns is converted into a keyfact tag KEY. The final result would become "MP KEY PO MP KEY".
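
The tag conversion described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the applicants' implementation: the function name `to_keyfact_tags` is hypothetical, and the assumption that stop-words are simply dropped is inferred from the example sequence.

```python
def to_keyfact_tags(pos_tags):
    """Collapse a part-of-speech tag sequence into keyfact tags.

    Assumptions (inferred from the worked example, not stated as rules):
    stop-words (S) are dropped, adjectives (A) and modified verbs (V-ed)
    become MP, and each unbroken run of noun tags (N*) becomes one KEY.
    """
    out = []
    prev_was_noun = False
    for tag in pos_tags:
        is_noun = tag.startswith("N")
        if is_noun:
            if not prev_was_noun:      # start of a new run of nouns
                out.append("KEY")
        elif tag in ("A", "V-ed"):     # words that modify a noun
            out.append("MP")
        elif tag != "S":               # other tags (PO, PP, ...) pass through
            out.append(tag)
        prev_was_noun = is_noun
    return out


# "the fast retrieval of the distributed information"
tags = ["S", "A", "NV", "PO", "S", "V-ed", "NV"]
print(" ".join(to_keyfact_tags(tags)))  # MP KEY PO MP KEY
```

A real tagger would also have to choose among competing morphological analyses before this step; the sketch starts from a single, already disambiguated sequence.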
`
Once the stage 32 of morphological analysis is performed, various results are obtained.

At stage 33, in which part-of-speech tagging is performed, a precise sequence of tags is chosen among the various results of the morphological analysis. In other words, the part-of-speech tags obtained from the morphological analysis are used at the stage of part-of-speech tagging. The modified form of a verb that modifies a noun, or an adjective, is converted into a modifier (MP) and the sequence of nouns is converted into a KEY tag. The example sentence "the fast retrieval of the distributed information" shows the final sequence of tags "MP KEY PO MP KEY".

Once the final sequence of tags in response to the input sentence is obtained, the stage of keyfact pattern extraction 34 searches the keyfact pattern rule 37 and extracts meaningful keyfact patterns necessary for keyfact generation. The keyfact pattern rule 37, which is used for keyfact pattern extraction, describes keyfact patterns for sequences of input tags. A part of the keyfact pattern rule is illustrated in the following Table 1.
`
[Table 1]

Keyfact pattern: KEY1 PO KEY2 (the retrieval of information)
Keyfact term list: [KEY2, KEY1], [KEY1, NIL], [KEY2, NIL], [KEY2 KEY1, NIL]
([information, retrieval], [information, NIL], [retrieval, NIL], [information retrieval, NIL])

Keyfact pattern: KEY1 PO MP KEY2 (the retrieval of the distributed information)
Keyfact term list: [KEY2, KEY1], [KEY1, NIL], [KEY2, NIL], [KEY2 KEY1, NIL], [KEY2, MP]

Keyfact pattern: MP KEY1 PO KEY2 (the fast retrieval of information)
Keyfact term list: [KEY2, KEY1], [KEY1, NIL], [KEY2, NIL], [KEY2 KEY1, NIL], [KEY1, MP]

Keyfact pattern: MP1 KEY1 PO MP2 KEY2 (the fast retrieval of the distributed information)
Keyfact term list: [KEY2, KEY1], [KEY1, NIL], [KEY2, NIL], [KEY2 KEY1, NIL], [KEY1, MP1], [KEY2, MP2]

(Note: The text in parentheses gives examples.)
`
The final sequence of tags "MP KEY PO MP KEY" obtained from "the fast retrieval of the distributed information" is applied to the keyfact pattern rule and the keyfact pattern "MP1 KEY1 PO MP2 KEY2" is the result.

Keyfact terms that have the form [object, property] are generated for the input keyfact pattern at the stage of keyfact generation 35 by searching the keyfact generation rule 38. The object is a noun or a compound noun represented by a keyword, and the property is a verbal word or a noun that modifies another noun, or a prototype of a verbal word.

The keyfact generation rule includes the possible keyfact lists that can be generated for each keyfact pattern. In the example stated above, if the keyfact pattern "MP1 KEY1 PO MP2 KEY2" is applied to the keyfact generation stage, "[KEY2, KEY1], [KEY1, NIL], [KEY2, NIL], [KEY2 KEY1, NIL], [KEY1, MP1], [KEY2, MP2]" is the outcome. That is, a keyfact list 39 "[information, retrieval], [retrieval, NIL], [information, NIL], [information retrieval, NIL], [retrieval, fast], [information, distributed]" is obtained from the phrase "the fast retrieval of the distributed information".
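
The keyfact generation stage can be illustrated with a small rule table. The following Python sketch is an assumption-laden illustration: the `RULES` encoding, the slot-naming scheme, and the function `generate_keyfacts` are hypothetical and cover only two of the patterns discussed in the text.

```python
NIL = "NIL"

# Hypothetical encoding of two keyfact generation rules. Each template lists
# the slots that form the object and the single slot (or None) for the property.
RULES = {
    ("KEY", "PO", "KEY"): [
        (["KEY2"], "KEY1"), (["KEY1"], None),
        (["KEY2"], None), (["KEY2", "KEY1"], None),
    ],
    ("MP", "KEY", "PO", "MP", "KEY"): [
        (["KEY2"], "KEY1"), (["KEY1"], None),
        (["KEY2"], None), (["KEY2", "KEY1"], None),
        (["KEY1"], "MP1"), (["KEY2"], "MP2"),
    ],
}


def generate_keyfacts(tags, words):
    """Return [object, property] pairs for a tagged phrase.

    tags  -- keyfact tags such as ["MP", "KEY", "PO", "MP", "KEY"]
    words -- the word bound to each tag, in the same order
    """
    counts, slots = {}, {}
    for tag, word in zip(tags, words):
        counts[tag] = counts.get(tag, 0) + 1
        slots[f"{tag}{counts[tag]}"] = word   # e.g. KEY1, KEY2, MP1, MP2
    keyfacts = []
    for obj_slots, prop_slot in RULES.get(tuple(tags), []):
        obj = " ".join(slots[s] for s in obj_slots)
        prop = slots[prop_slot] if prop_slot else NIL
        keyfacts.append((obj, prop))
    return keyfacts


tags = ["MP", "KEY", "PO", "MP", "KEY"]
words = ["fast", "retrieval", "of", "distributed", "information"]
for keyfact in generate_keyfacts(tags, words):
    print(keyfact)
```

For the example phrase this yields the six pairs listed in the text, from (information, retrieval) through (information, distributed). A tag sequence with no matching rule produces an empty list.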
`
The keyfact index device is now reviewed in detail with Figure 4.

The keyfact index device calculates statistical frequencies of keyfacts in a document obtained from the keyfact extraction device 11 and forms the index structure. Therefore, index information is efficiently maintained and processed by the keyfact index device. Each index term of the keyfact index device is an extracted keyfact term representing each document.

For each document, the keyfact frequency (tf) and the document frequency of the keyfact (df) are calculated in order to obtain the frequency information of the keyfacts.

Next, supplementary tables such as a document index table, a document table, and a keyfact index table are generated to form an efficient index structure 44. The document index table contains the keyfacts of the document and their frequency information. The document table includes the real document text. The keyfact index table is the main table; it includes the document frequency (df) of each keyfact and a list of pairs of the document identifier of each keyfact and the frequency information within that document (tf).

Next, an index structure is formed in the unit of the keyfact and an index file is stored. Efficient storage structures like the B+ tree can be used for the index structure. The inverted file structure of the keyfact index table is used as the posting information file structure.
`
A part of the result of the keyfact index is shown in the following Table 2.

[Table 2]

Keyfact index                        Document id: frequency
[thorn, sharp]                       (162:1) (197:1)
[thorn, dull]                        (102:2) (185:3) (193:1)
[reed, NIL]                          (6:2) (29:1)
[reed field, NIL]                    (6:1)
[branch, NIL]                        (21:1) (33:2) (88:1) (90:3)
[Dahurian buckthorn family, NIL]     (102:1)

In Table 2, in the case of [branch, NIL], "branch" appears in 4 documents and therefore the document frequency (df) for the keyfact index [branch, NIL] is four. In addition, "branch" appears once in document 21, twice in document 33, once in document 88, and three times in document 90.
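
The relationship between the posting list, tf, and df can be sketched with an in-memory inverted index. This is a minimal illustration: the function name `build_keyfact_index` and the dictionary layout are assumptions, and a plain Python dict stands in for the B+ tree mentioned in the text.

```python
from collections import Counter, defaultdict


def build_keyfact_index(doc_keyfacts):
    """Build a keyfact index table from per-document keyfact lists.

    doc_keyfacts -- {doc_id: [keyfact, keyfact, ...]}
    Returns {keyfact: {"df": int, "postings": [(doc_id, tf), ...]}}.
    """
    postings = defaultdict(list)
    for doc_id in sorted(doc_keyfacts):
        # Counter gives the within-document frequency (tf) of each keyfact.
        for keyfact, tf in Counter(doc_keyfacts[doc_id]).items():
            postings[keyfact].append((doc_id, tf))
    # df is the number of documents in which the keyfact occurs.
    return {kf: {"df": len(plist), "postings": plist}
            for kf, plist in postings.items()}


branch = ("branch", "NIL")
docs = {21: [branch], 33: [branch, branch], 88: [branch], 90: [branch] * 3}
index = build_keyfact_index(docs)
print(index[branch])
# {'df': 4, 'postings': [(21, 1), (33, 2), (88, 1), (90, 3)]}
```

The printed entry reproduces the [branch, NIL] example: four documents, with frequencies 1, 2, 1, and 3.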
`
The keyfact retrieval device 13 is now reviewed in detail with Figure 5. The keyfact retrieval device forms the document vector and the query vector with the keyfact, which is supplied from the keyfact extraction device 53, and the index file 52 generated by the keyfact index device 51.

The keyfact weight constants (CKfType#), which are fit for the attributes of a document collection, are determined 55 before calculating the keyfact weights from the document and query vectors. Table 3 shows that keyfact weight constants are assigned to various patterns of keyfacts.
`
[Table 3]

Type      Keyfact pattern                          Weight constant
Type 1    [KEY, NIL]                               CKfTypeI
Type 2    [KEY, MP] or [KEY, VH/VB]                CKfTypeII
Type 3    [KEY1, KEY2]                             CKfTypeIII
Type 4    [KEY1 KEY2, NIL] or [KEY2 KEY1, NIL]     CKfTypeIV
Type 5    [KEY1 KEY2 KEY3]                         CKfTypeV
...       ...                                      ...
`
The keyfact weight constants are assigned in a sequence such as CKfTypeI < CKfTypeII < CKfTypeIII < CKfTypeIV < CKfTypeV < ... and play an important role in the precision of keyfact-based text retrieval. Therefore, the weight constants are determined experimentally on the basis of the distribution of keyfact patterns in the document collection.
`
The keyfact weight constant is applied to the following Equation 1, and the result of Equation 1, a keyfact weight (w_xk), is used in the keyfact-based text retrieval model.

[Equation 1]

w_{xk} = tf_{xk} \cdot \log\frac{N + 1}{df_k} \cdot C_{KfType\#}

w_xk: a keyfact weight
tf_xk: frequency of a keyfact
N: size of a document collection
df_k: document frequency of a keyfact
C_KfType#: a keyfact weight constant
`
`15
`
`Conventionally, only the frequency of keywords (tfkeyword)) the document
`
`frequency of kejwords (dfkcyword), and the number of the documents in a document collection
`
`are considered in calculating tlie keyword weight in the keyword-based text retrieval system.
`
`However, the keyfact weight constant (CkiType#) of the keyfact pattern is also reflected in
`
`calculating keyfact weights in the keyfact-based retrieval system, so as to make it possible to
`
`20 index and retrieve in the imit of a keyfact.
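
The difference between a keyword weight and a keyfact weight can be made concrete with a short Python sketch. Note the assumptions: Equation 1 is not fully legible in this copy, so the sketch assumes a tf-idf style form, w = tf * log((N + 1) / df) * C, built from the listed symbols; the numeric constants are invented placeholders that only respect the stated ordering of the constants.

```python
import math

# Hypothetical values; the text only requires C_I < C_II < C_III < C_IV < C_V
# and says the real constants are determined experimentally.
WEIGHT_CONSTANTS = {"I": 1.0, "II": 1.5, "III": 2.0, "IV": 2.5, "V": 3.0}


def keyfact_weight(tf, df, n_docs, keyfact_type):
    """Keyfact weight under the assumed form tf * log((N + 1) / df) * C.

    tf           -- frequency of the keyfact in the document
    df           -- number of documents containing the keyfact
    n_docs       -- number of documents in the collection (N)
    keyfact_type -- key into WEIGHT_CONSTANTS ("I" .. "V")
    """
    return tf * math.log((n_docs + 1) / df) * WEIGHT_CONSTANTS[keyfact_type]


# Same tf and df, different keyfact types: the richer pattern weighs more.
print(keyfact_weight(tf=1, df=4, n_docs=99, keyfact_type="I"))
print(keyfact_weight(tf=1, df=4, n_docs=99, keyfact_type="III"))
```

With the constant fixed at 1 the expression reduces to an ordinary keyword-style tf-idf weight, which is the contrast the paragraph above draws.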
`
`13
`
`Ul
`
`Page 20 of 178
`
`
`
Next, the similarity of the document appropriate for the query is calculated by employing the keyfact retrieval model based upon the vector space model. The result of the similarity calculation determines the order of appropriate documents 57.
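
As an illustration of ranking in a vector space model, the sketch below scores documents against a query with cosine similarity over sparse keyfact-weight vectors. The text does not name the similarity measure, so cosine similarity is an assumption here, and the names `cosine` and `rank` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {keyfact: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_vec, doc_vecs):
    """Return (doc_id, similarity) pairs, most similar document first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ir = ("information", "retrieval")
reed = ("reed", "NIL")
query = {ir: 1.0}
docs = {1: {ir: 2.0}, 2: {reed: 1.0}, 3: {ir: 1.0, reed: 1.0}}
for doc_id, score in rank(query, docs):
    print(doc_id, round(score, 3))
```

Document 1 shares only the query keyfact and ranks first, document 3 dilutes it with an unrelated keyfact, and document 2 has no keyfact in common with the query.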
`
Figure 6 shows a screen image illustrating a document retrieval result in response to a query. A user makes a query in the query section 61 in natural language. The keyfact is extracted by the keyfact-based text retrieval system and the documents close to the query are found. The result of the retrieval is displayed on the document retrieval result screen 62 in the order of similarity. The document title and weight are also displayed in the order of similarity. In addition, if a displayed document is selected, the document text screen 63 shows the text of the document.
`
According to the present invention, texts of the document collection and user queries are expressed, indexed and retrieved by concept-based keyfacts. Therefore, more precise retrieval results are achievable. Additionally, since indexing and retrieval with high precision are possible, time and effort can be minimized, and the keyfact-based retrieval method in accordance with the present invention can be used in various applications. In particular, digital libraries, text- and annotation-based multimedia information retrieval for broadcasting stations, internet applications, information retrieval for electronic commerce, and education, medical, and military application areas can take advantage of the present invention.

Although representative embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that