throbber
(12) United States Patent
`Chong et al.
`
`USOO6366908B1
`(10) Patent No.:
`US 6,366,908 B1
`(45) Date of Patent:
`Apr. 2, 2002
`
`(*) Notice:
`
`(73) ASSignee:
`
`(54) KEYFACT-BASED TEXT RETRIEVAL
`SYSTEM, KEYFACT-BASED TEXT INDEX
`METHOD, AND RETRIEVAL METHOD
`(75) Inventors: Kyung Taek Chong; Myung-Gil Jang;
`MiSeon Jun; Se Young Park, all of
`Taejon (KR)
`Electronics and Telecommunications
`Research Institute, Taejon (KR)
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`(21) Appl. No.:
`09/475,743
`(22) Filed:
`Dec. 30, 1999
`Foreign Application Priority Data
`(30)
`Jun. 28, 1999
`(KR) ............................................ 99-25035
`(51) Int. Cl. ................................................ G06F 17/30
`(52) U.S. Cl. ................... 707/3; 707/5; 707/6; 707/101;
`707/201; 704/7; 704/9; 704/10, 382/177;
`382/306
`(58) Field of Search ................................ 707/1, 2, 3, 4,
`707/5, 6, 102, 103, 104.1, 706/12, 45, 47;
`704/7, 9, 10, 270.1
`
`(56)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`4,839,853. A 6/1989 Deerwester et al. ........ 364/900
`5,289,375 A * 2/1994 Fukumochi et al. ... 364/419.02
`5.535,382 A
`7/1996 Ogawa ....................... 395/600
`5,541,836 A * 7/1996 Church et al. ......... 364/419.07
`5,598,557 A * 1/1997 Doner et al................. 395/605
`5,708,829 A
`1/1998 Kadashevich et al. ...... 395/793
`5,721,902 A
`2/1998 Schultz ....................... 395/604
`5,771,378 A * 6/1998 Holt et al. .................. 395/605
`OTHER PUBLICATIONS
`Jang, Ho Wook and Se Young Park, “Key fact Concept For
`An Information Retrieval System.” Proceedings of Natural
`Language Proc. Pacific Rim Symposium, pp. 510–513,
`1995.
`
`Jun, MiSeon and Se Young Park, “Key fact-Based Informa
`tion Retrieval System,” International Symposium on Digital
`Library, pp. 521-524, 1997.
`Arampatzis et al., “Phrase-Based Information Retrieval.”
`Journal of Information Proc. & Management 34(6):1-19,
`1998.
`Robert Krovetz and W. Bruce Croft:“Lexical ambiguity and
`information retrieval”, Apr. 1992, ACM, vol. 10, pp.
`115-141.*
`Michael Sussna: “Word sense disambiguation for free-text
`indexing using a massive semantic network”, 1993, ACM,
`pp. 67-74.*
`
`* cited by examiner
`
`Primary Examiner Thomas Black
`Assistant Examiner Jacques Veillard
`(74) Attorney, Agent, or Firm-E. Russell Tarleton; SEED
`IP Law Group PLLC
`(57)
`ABSTRACT
`A key fact-based text retrieval method and a key fact-based
`text index method that describes the formalized concept of
`a document by a pair comprising an object that is the head
`and a property that is the modifier and uses the information
`described by the pairs as index information for efficient
`document retrieval. A key fact-based text retrieval System
`includes key fact extracting, key fact indexing, and key fact
`retrieving. The key fact extracting analyzes a document
`collection and a query and extracts keywords and key facts.
`The keywords do not have part-of-Speech ambiguity and the
`key facts are extracted from the keywords. The key fact
`indexing calculates the frequency of the key facts and gen
`erates a key fact list of the document collection for a key fact
`indeX Structure. The key fact retrieving receive a key fact of
`the query and key facts of the document collection and
`defines a key fact-based retrieval model in consideration of a
`weight factor of the key fact pattern and generates a retrieval
`result. The retrieval result is a document Similar to the query.
`
`12 Claims, 5 Drawing Sheets
`
`21-
`
`MAN MEMORY DEVICE
`
`11
`
`KEYFACT EXTRACTION DEVICE
`
`12--KEYFACT INDEX DEW CE
`
`INPUT AND OUTPUT DEVICE - -22
`QUERY
`RETREWAL
`RESULT
`PROCESSING
`- E.
`
`-23
`
`to--GD
`
`25
`
`C C C C
`
`DOCUMENT
`COLLECTION
`
`26
`
`(ge, l-H2
`
`Page 1 of 11
`
`GOOGLE EXHIBIT 1001
`
`

`

`U.S. Patent
`
`Apr. 2, 2002
`
`Sheet 1 of 5
`
`US 6,366,908 B1
`
`FIG. 1
`
`DOCUMENT
`COLLECTO
`
`KEYFACT INDEX DEVICE
`
`12
`
`
`
`KEYFACT
`EXTRACTION
`DEVICE
`
`
`
`|NOEX
`STRUCTURE
`
`16
`
`
`
`
`
`
`
`
`
`
`
`KEYFACT RETREVAL DEVICE
`
`13
`
`RETREVAL
`RESULT
`
`17
`
`FIG 2
`
`
`
`
`
`
`
`
`
`
`
`
`
`MAN MEMORY DEVICE
`
`
`
`KEYFACT EXTRACTION DEVICE
`
`KFYFACT INDEX DEVICE
`
`
`
`KEYFACT RETREVAL DEVICE
`
`NDEX
`STRUCTURE
`
`
`
`
`
`
`
`INPUT AND OUTPUT DEVICE
`RETREVAL
`RESULT
`QUERY
`CENTRAL PROCESSING
`DEVICE
`
`HARD DISK
`
`
`
`DOCUMENT
`COLLECTION
`
`
`
`DCTIONARES
`
`22
`
`23
`
`24
`
`26
`
`27
`
`Page 2 of 11
`
`

`

`U.S. Patent
`
`Apr. 2, 2002
`
`Sheet 2 of 5
`
`US 6,366,908 B1
`
`FIG 3
`
`
`
`31
`
`INPUT DOCUMENT
`
`MORPHOLOGICAL ANALYSS S-C
`bictionales-36
`
`
`
`
`
`PART-OF-SPEECH TAGG|NG
`
`32
`
`33
`
`34-KEYFACT PAERN EXTRACTIONHKEYFACT
`PATTERN RULE
`KEYFACT GENERATION O d
`KEYFACT
`GENERATION
`RULE
`
`35
`
`
`
`
`
`39
`
`
`
`
`
`37
`
`38
`
`Page 3 of 11
`
`

`

`U.S. Patent
`
`Apr. 2, 2002
`
`Sheet 3 of 5
`
`US 6,366,908 B1
`
`FIG. 4
`
`DOCUMENT
`COLLECTION
`
`KEYFACT EXTRACTION DEVICE
`
`START
`
`41
`
`*KEYFACD- 42
`
`KEYFACT FREQUENCTY,
`OOCUMENT FREQUENCY
`CALCULATION
`
`43
`
`44
`
`
`
`TABLES
`GENERATING
`DOCUMENT INDEX TABLE
`DOCUMENT TABLE
`
`KEYFACT INDEX
`TABLE
`
`FORMING INDEX
`STRUCTURE
`
`45
`
`CINDEX FCD-46
`
`END
`
`Page 4 of 11
`
`

`

`U.S. Patent
`
`Apr. 2, 2002
`
`Sheet 4 of 5
`
`US 6,366,908 B1
`
`FIG. 5
`
`51
`
`KEFACT INDEX DEVICF
`
`STAR
`
`se-Gel)
`
`KEYFACT EXTRACTION DEVICE
`
`55
`
`FORMING DOCUMENT,
`QUERY VECTOR
`
`53
`
`54
`
`DETERMINING
`KEYFACT WEIGHT
`CONSTANT Cire
`
`CALCULATING DOCUMENT
`QUERY WEIGHT
`
`56
`
`57 - RANKED RETRIEVAL RESULT
`RETREVAL RESULT
`
`END
`
`KEYFACT
`RETREVAL
`MODEL
`
`
`
`58
`
`Page 5 of 11
`
`

`

`U.S. Patent
`
`Apr. 2, 2002
`
`Sheet 5 of 5
`
`US 6,366,908 B1
`
`FIG. 6
`
`
`
`QUERY IN
`NATURAL LANGUAGE
`
`DOCUMENT TEXT:
`
`RESULT
`
`Page 6 of 11
`
`

`

`1
`KEYEACT-BASED TEXT RETRIEVAL
`SYSTEM, KEYFACT-BASED TEXT INDEX
`METHOD, AND RETRIEVAL METHOD
`TECHNICAL FIELD
`The present invention relates to a key fact-based text
`retrieval method and a key fact-based text index method. In
`particular, the methods describe the formalized concept of a
`document as a pair comprising an object that is the head and
`a property that is the modifier, and uses the information
`described by the pair as index information for efficient
`document retrieval.
`
`BACKGROUND OF THE INVENTION
`A key fact means an important fact contained in Sentences
`which constitute a document. The key fact is represented by
`an object and property information through Syntactic analy
`sis of the Sentence.
`The keyword-based text retrieval method was the main
`stream in conventional text retrieval methods. However, the
`precision of the keyword-based text retrieval method was
`not good due to the following reasons. First, the meaning of
`the document is not precisely represented and the represen
`tativeness of document expression is low because the docu
`ment is represented by keywords, which are nouns. This is
`a fundamental reason for poor retrieval precision. Second,
`when a query includes a natural language phrase or a natural
`language Sentence or keywords, the intention of the user's
`query is not reflected precisely in a keyword-based text
`retrieval method because the query is expressed by key
`words. Therefore, the keyword-based text retrieval method
`has a fundamental limitation in retrieval precision because it
`performs document retrieval by keywords. As a result,
`because the keyword-based text retrieval System provides
`Such low level of retrieval precision, it causes a number of
`unnecessary retrievals and therefore precious resources,
`Such as time and effort, are wasted.
`Recently, a number of studies have been performed in the
`area of phrase-based text retrieval methods in order to
`compromise Such defects of the keyword-based retrieval
`method. The phrase-based text retrieval methods extract a
`precise phrase pattern through a morphological-Syntactic
`normalization proceSS and perform indexing and retrieval by
`extracted phrase. Therefore, the phrase-based retrieval
`method performs more precise text retrieval than the
`keyword-based text retrieval method but performs less pre
`cise text retrieval than a concept-based text retrieval method,
`which expresses text by concept units.
`A new approach to key fact-based text retrieval methods
`has been proposed in order to overcome the shortcomings of
`the keyword-based text retrieval method and generalize
`phrase-based text retrieval method. In the key fact-base text
`retrieval method, a part of text that represent the same
`meaning is described as a key fact. Since the key fact-based
`retrieval method is a Sort of concept-based retrieval method,
`and therefore indexing and retrieval of the key fact-based
`retrieval method are performed with the unit of the key fact,
`precision of the retrieval is greatly improved.
`In the key fact-based retrieval method, it is desirable that
`phrases or words having the same meaning are indexed as
`the same indexing terms. For example, noun phrases includ
`ing “the retrieval of information” as a subset of “the efficient
`retrieval of information”, “the retrieval of the distributed
`information', and “the fast retrieval of the distributed infor
`mation' must have common indices which can be possibly
`generated from “the retrieval of information” as Subsets and
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`US 6,366,908 B1
`
`2
`recognize also them as different meaning with Subtle con
`ceptual different indexes at the same time.
`Since the keyword-based retrieval method doesn't recog
`nize the conceptual difference between “the retrieval of the
`information” and “the efficient retrieval of the information',
`users are not able to retrieve the exact document that is
`desired.
`
`SUMMARY OF THE INVENTION
`A key fact-based retrieval method, which extracts the
`precise keyfact pattern using the natural language processing
`techniques and indexes documents with the unit of the
`key fact, is provided.
`In addition, a key fact-based retrieval method, which
`extracts precise key fact patterns included in a natural query
`of a user using the natural language processing techniques
`and retrieves documents similar to the query in the key fact
`based indeX file, is provided.
`In addition, a key fact-based retrieval method, which
`retrieves and indexes documents with the unit of key fact, is
`provided.
`A key fact-based text retrieval System of the present inven
`tion includes key fact extracting means, key fact indexing
`means, and key fact retrieving means. The key fact extracting
`means analyze a document collection and a user query, and
`extracting keywords not having part-of-Speech ambiguity
`from the document collection and the user query, and
`respectively extracting key facts of the document collection
`and the user query from the keywords. The key fact indexing
`means for calculating the frequency of the key facts of the
`document collection and generating a key fact list of the
`document collection for a key fact indeX Structure. The
`key fact retrieving means for receiving the key fact of the user
`query and the key facts of the document collection and
`defining a key fact retrieval model in consideration of weight
`factors according to a key fact pattern and generating a
`retrieval result.
`The key fact extracting means includes morphology ana
`lyzing means, part-of-Speech tagging means, keyfact pattern
`extracting means, and key fact generating means. The mor
`phology analyzing means analyze morphology of an input
`Sentence and obtaining tag Sequences of part-of-Speech by
`attaching part-of-Speech tags. The part-of-Speech tagging
`means Selects a tag Sequence of part-of-Speech out of the tag
`Sequences of part-of-speech. The tag Sequence of part-of
`Speech is precise. The key fact pattern extracting means
`extracts a key fact pattern by applying the tag Sequences of
`part-of-Speech to a key fact pattern rule. The key fact gener
`ating means applies the key fact pattern to a key fact pattern
`generation rule and generating a key fact list, which is a Set
`of key fact terms.
`The key fact indexing means includes frequency calculat
`ing means, table generating means, and key fact indexing
`means. The frequency calculating means calculates a fre
`quency of various key facts and a document frequency of the
`key facts. The various key facts are included in the document
`collection, and the document frequency is the number of
`documents contained the various key facts. The table gener
`ating means generates a document index table, a document
`table, and a key fact indeX table of the document collection.
`The key fact indexing means forms a keyfact indeX Structure.
`The key fact indeX Structure has information regarding docu
`ment frequency, document identifier, and key fact frequency
`in each corresponded documents.
`The key fact retrieving means includes following means. A
`means forms a document and a user query vector with an
`
`Page 7 of 11
`
`

`

`US 6,366,908 B1
`
`15
`
`25
`
`35
`
`40
`
`3
`index file and the key fact of the user query. The index file
`generated by the key fact indexing means. The key fact of the
`user query generated by the key fact extracting means. A
`means determines key fact weight constants in accordance
`with the key fact pattern. A means calculates key fact weights
`for the document and the user query by applying the key fact
`weight constants to the document and the user query vector.
`The retrieval results displaying means displays the retrieval
`result by applying the key fact weights to key fact retrieval
`model. The retrieval result indicates documents with a
`key fact Similar to the key fact of the user query.
`A key fact-based text retrieving method of the present
`invention includes key fact extracting Step, key fact indexing
`Step, and key fact retrieving Step. The key fact extracting Step
`is to analyze a document collection and a user query, and
`extracts keywords without part-of-Speech ambiguity from
`the document collection and the user query, and respectively
`extracts key facts of the document collection and the user
`query from the keywords. The key fact indexing Step is to
`calculates the frequency of the key facts of the document
`collection and generates a key fact list of the document
`collection for a key fact indeX Structure. The key fact retriev
`ing Step is to receives the key fact of the user query and the
`key facts of the document collection and defines a key fact
`retrieval model in consideration of weigh factors according
`to the key fact pattern and generates the retrieval result.
`The Step of key fact extracting includes the following
`Steps. The first Step is to analyze morphology of an input
`Sentence and obtaining tag Sequences of part-of-Speech by
`attaching part-of-Speech tags. The Second Step is to Select a
`tag Sequence of part-of-speech out of the tag Sequences of
`part-of-Speech. The third Step is to extract a key fact pattern
`by applying the tag Sequence of part-of-Speech to a key fact
`pattern rule. The fourth Step is to apply the key fact pattern
`to a key fact pattern generation rule and generating a key fact
`list.
`The Step of analyzing morphology includes the following
`Steps. The first Step is to divide the input Sentence into
`words. The Second Step is to perform morphological analysis
`on the words using part-of-Speech dictionaries. The third
`Step is to perform morphological variation and recover
`prototypes. The fourth Step is to obtain the tag Sequence of
`part-of-Speech by tagging part-of-Speech tags in accordance
`with the result of the morphological analysis.
`The part-of-Speech dictionaries include a noun dictionary,
`a verb dictionary, an adjective dictionary, an adverb
`dictionary, a preposition dictionary, a conjunction dictionary
`and a stop-word lexicon.
`The Step of key fact indexing includes the following StepS.
`The first Step is to calculate a frequency of various key facts
`and a document frequency of the key fact. The Second Step is
`to generate a document indeX table, a document table and a
`key fact index table of the document collection. The third
`Step is to form a key fact indeX Structure including document
`frequency, document identifier and key fact frequency.
`The Step of key fact retrieving includes the following
`Steps. The first Step is to form a document and a user query
`vector with an indeX file and a key fact of the user query. The
`Second Step is to determine key fact weight constants in
`accordance with the key fact pattern. The third Step is to
`calculate key fact weights for the document and the user
`query by applying the key fact weight constants to the
`document and the user query vector. The fourth Step is to
`display the retrieval result by applying the key fact weights
`to the key fact retrieval model. The retrieval result indicates
`documents with a key fact Similar to the key fact of the user
`query.
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`BRIEF DESCRIPTION OF THE DRAWINGS
`FIG. 1 is a block diagram illustrating a key fact-based text
`retrieval System of the present invention;
`FIG. 2 is a block diagram illustrating a hardware structure
`of a key fact-based text retrieval System in accordance with
`an embodiment of the present invention;
`FIG. 3 is a block diagram illustrating a key fact extraction
`device of a key fact-based text retrieval System in accordance
`with an embodiment of the present invention;
`FIG. 4 is a block diagram illustrating a key fact indeX
`device of a key fact-based text retrieval System in accordance
`with an embodiment of the present invention;
`FIG. 5 is a block diagram illustrating a key fact retrieval
`device of a key fact-based text retrieval System in accordance
`with an embodiment of the present invention; and
`FIG. 6 is a Screen image illustrating a document retrieval
`result in response to a query.
`DETAILED DESCRIPTION OF THE
`INVENTION
`FIG. 1 is a block diagram illustrating a key fact-based text
`retrieval System of the present invention. The key fact-based
`text retrieval System comprises a key fact extraction device
`11, a key fact indeX device 12, and a key fact retrieval device
`13. FIG. 2 is a block diagram illustrating a hardware
`Structure of a key fact-based text retrieval System in accor
`dance with an embodiment of the present invention.
`As shown in FIG. 2, the main memory device 21 includes
`a key fact extraction device, a key fact indeX device 12, a
`key fact retrieval device 13, and an index structure 16. The
`central processing device 23 Supervises the key fact-based
`text retrieval. A hard disk 24 stores document collection 25,
`dictionaries for key fact retrieval 26, and an indeX file that is
`the result of the key fact index. The index file 27 is loaded
`onto the main memory as an indeX Structure 16 and the
`key fact retrieval device 13 uses the index file. The input and
`output device 22 receives a query from a user and generates
`retrieval results to the user.
`Now, the key fact-based text retrieval System in accor
`dance with the present invention is explained with reference
`to FIG. 1. Once a document collection 14 or a query 15 is
`given, the key fact extraction device 11 extracts words with
`out ambiguity by performing morphological analysis and
`tagging. The key fact generation rule is applied to the words
`and then the key facts are extracted.
`The key fact index device 12 indexes the document col
`lection 14 or the query with the unit of key fact and calculates
`the frequencies of the key facts. The frequencies of the
`key facts are stored into the index structure 16 with the
`document ID information. The keyfact retrieval device 13
`orders documents using the Similarity calculation method
`and ShowS retrieval results. The Similarity calculation
`method considers document collection and key fact weights
`with the help of a key fact-based text retrieval model. In a
`key fact-based text retrieval, when a document collection 14
`or a query is given, the key fact extraction device 11
`expresses it in the unit of key facts. All key facts express
`Semantic relation between words in the form of object,
`property. Keyfacts can be categorized by configurations of
`an object and a property. Parts of text that express the same
`conceptual meaning in the document collection or the query
`are categorized into the same key fact type. The key fact
`extraction device will be reviewed in detail below with FIG.
`3.
`The keyfact index device 12 indexes the extracted key
`facts with frequency information. In other words, the key fact
`
`Page 8 of 11
`
`

`

`S
`indeX device 12 calculates frequencies of the various forms
`of key facts included in the documents and generates a
`key fact list of the document collection. Therefore, an index
`structure 16 that reflects key facts is created and the index file
`is stored. The key fact index device 12 will be reviewed in
`detail below with FIG. 4.
`When the key fact retrieval device 13 receives a query, it
`retrieves appropriate documents on the basis of the key fact
`based retrieval method. The keyfact retrieval model is
`defined by considering weights of key fact patterns. The
`Similarity between the query and the documents is calculated
`and appropriate documents for the query are shown as a
`result in the order of the similarity. The key fact retrieval
`device 13 will be reviewed in detail below with FIG. 5.
`As shown in FIG. 3, the key fact extraction device 11
`analyzes a document and generates keyfacts through the
`processes of morphological analysis, part-of-Speech tagging,
`key fact pattern extraction, and key fact generation.
`A document is Supplied at Stage 31 and morphological
`analysis is performed at Stage 32. A Sentence in the docu
`ment is divided into words and the morphological analysis
`is performed with dictionaries 36 at stage 32. The morpho
`logical variation is considered in order to recover proto
`types. The dictionaries 36 include a noun dictionary, a verb
`dictionary, an adjective dictionary, an adverb dictionary, a
`preposition dictionary, a conjunction dictionary, and a stop
`word lexicon. In Some cases, a part-of-Speech of a word is
`determined by rules without dictionaries.
`The part-of-Speech tag in dictionaries 36 includes noun
`(N), verb (V), adjective (A), preposition (P), and stop-word
`(S). The noun is further divided into proper noun (NO),
`name noun (NN), vocative noun (NV), unit nouns (NJ),
`predicate noun (NU), non-predicate noun (NX), etc. The
`reason for Such division is that the class of noun determines
`the object or the property of the key facts.
`For example, in a Sequence of words having two or three
`nouns in a row, it is likely that name noun (NN), proper noun
`(NO), and non-predicate noun (NX) are objects and vocative
`noun (NV), unit noun Id), and predicate noun (NU) are
`properties. Additionally, in a phrase having proper noun
`(NO), name noun (NN), and non-predicate noun (NX), the
`order of priority of nouns in the object is name noun
`(NN)>proper noun (NQ)>non-predicate noun (NX).
`The preposition is divided into the possessive preposition
`(PO) which is used as “of” and the positional preposition
`(PP) and etc. The adjective or the variated verb which makes
`up the noun is tagged as a pronoun (MP), which is a separate
`key fact tag. For example, in analyzing “the fast retrieval of
`the distributed information' with morphological analysis, a
`50
`result of the sequence of the tag would be “S (stop-word) A
`(adjective) NV (vocative noun) PO (possessive preposition)
`S (stop-word) V-ed (verb) NV (vocative noun). The V-ed
`(verb) is a modified form of verb and makes up the noun.
`Like the A (adjective), the V-ed (verb) is converted into a
`key fact tag MP and the Sequence of nouns is converted into
`a keyfact tag KEY. The final result would become “NMP
`KEY PO MP KEY.
`Once the Stage 32 of morphological analysis is performed,
`various results are obtained.
`At Stage of 33 in which part-of-Speech tagging is
`performed, a precise Sequence of tags is chosen among the
`various results of the morphological analysis. In other
`words, the part-of-Speech tags obtained from the morpho
`logical analysis are used at the Stage of part-of-speech
`tagging. The modified form of Verb that makes up a noun or
`an adjective is converted into a modifier (MP) and the
`
`6
`Sequence of nouns is converted into KEY tag. The exem
`plary sentence “the fast retrieval of the distributed informa
`tion” shows the final sequence of tags “MP KEY PO MP
`KEY.
`Once the final Sequence of tags in response to the input
`Sentence is obtained, the Stage of key fact pattern extraction
`34 searches the key fact pattern rule 37 and extracts mean
`ingful key fact patterns necessary for key fact generation. The
`key fact pattern rule 37 which is used for key fact pattern
`extraction describes key fact patterns as to the Sequence of
`the input tags. A part of the key fact pattern rule is illustrated
`at following table 1.
`
`Keyfact pattern
`
`KEY1. POKEY2
`(the retrieval of information)
`
`KEY1PO MPKEY2
`(the retrieval of the
`distributed information)
`MPKEY1. POKEY2
`(the fast retrieval of
`information)
`MP1 KEY1. POMP2 KEY2
`(the fast retrieval of the
`distributed information)
`
`TABLE 1.
`
`Key fact term list
`KEY2, KEY1), KEY1, NIL),
`KEY2, NIL),
`KEY2 KEY1, NIL
`information, retrieval
`information, NIL,
`retrieval, NIL,
`information retrieval, NIL
`KEY2, KEY1), KEY1, NIL),
`KEY2, NIL),
`KEY2 KEY1, NIL, KEY2, MP
`KEY2, KEY1), KEY1, NIL),
`KEY2, NIL),
`KEY2 KEY1, NIL, KEY1, MP
`KEY2, KEY1), KEY1, NIL),
`KEY2, NIL),
`KEY2 KEY1, NIL, KEY1, MP1,
`KEY2, MP2
`
`(Note:
`The italic is the examples.)
`The final sequence of tags “MP KEY PO MP KEY”
`obtained from “the fast retrieval of the distributed informa
`tion' is applied to the key fact pattern rule and the key fact
`pattern “MP1 KEY1PO MP2 KEY2" is the result.
`Key fact terms that have forms of object, property are
`generated as to the input key fact pattern at the Stage of the
`key fact generation 35 by Searching the key fact generation
`rule 38. The object is a noun or a compound noun repre
`Sented by a keyword and the property is a verbal word or a
`noun that makes up another noun, or a prototype of a Verbal
`word.
`The key fact generation rule includes possible key fact
`lists, each of which can be generated in each key fact pattern.
`In the example stated above, if the key fact pattern “MP1
`KEY1 JY MP2 KEY2" is applied to the key fact generation
`stage, “KEY2, KEY1), KEY1, NIL), KEY2, NIL),
`KEY2 KEY1, NIL, KEY1, MP1), KEY2, MP2 is
`going to be the outcome. That is, a key fact list 39
`“information, retrieval, retrieval, NIL), information,
`NIL), information retrieval, NIL), retrieval, fast,
`information, distributed” is obtained from the key fact
`pattern “the fast retrieval of the distributed information”.
`The key fact index device is now reviewed in detail with
`FIG. 4.
`The key fact indeX device calculates Statistical frequencies
`of key facts in a document obtained from the key fact extrac
`tion device 11 and forms the index structure. Therefore,
`indeX information is efficiently maintained and processed by
`the key fact index device. Each index term of the key fact
`indeX device is an extracted key fact term representing each
`document.
`For each document, the key fact frequency (tf) and docu
`ment frequency of the key fact (df) are calculated in order to
`obtain the frequency information of the key facts.
`
`US 6,366,908 B1
`
`15
`
`25
`
`35
`
`40
`
`45
`
`55
`
`60
`
`65
`
`Page 9 of 11
`
`

`

`US 6,366,908 B1
`
`8
`
`7
`Next, Supplementary tables Such as a document indeX
`table, a document table, and a key fact indeX table are
`generated to form an efficient index structure 44. The
`document index table contains key facts of the document, the
`frequency information. The document table includes a real
`document text. The key fact index table is the main table that
`includes the document frequency (df) of each key fact, and
`pair list of the document identifier of each key fact and the
`frequency information within a document (tf).
`Next, an index structure is formed in the unit of the
`key fact and an indeX file is Stored. Efficient Storage Struc
`tures like the B+ tree can be used for the index structure. The
`inverted file structure of the key fact index table is used as
`posting information file Structure.
`A part of the result of the key fact index is shown in the
`following table 2.
`
`TABLE 2
`
`Df
`
`Document id:
`frequency
`
`2
`3
`
`2
`1.
`4
`1.
`
`(162:1)(197:1)
`(102:2)(188:3)(193:1)
`
`(6:2)(29:1)
`(6:1)
`(21:1)(33:2)(88:1)(90:3)
`(102:1)
`
`Key fact index
`
`thorn, sharp
`thorn, dull
`
`reed, NIL
`reed field, NIL
`branch, NIL
`Dahurian buckhorn
`family, NIL
`
`At table 2, in case of branch, NIL), “branch” appears at
`4 documents and therefore the document frequency (df) for
`key fact index branch, NIL) is four. In addition, “branch”
`appears once in document 21, twice in document 33, once in
`document 88, and three times in document 90.
`The key fact retrieval device 13 is now reviewed in detail
`with FIG. 5. The key fact retrieval device forms the docu
`ment vector and query vector with the keyfact, which is
`supplied from the keyfact extraction device 53, and the
`index file 52 generated by the key fact index device 51.
`The keyfact weight constants (Cl), which are fit for
`the attribute of a document collection, are determined 55
`before calculating the key fact weights from document and
`query vector. Table 3 shows that key fact weight constants
`are assigned to various patterns of key facts.
`
`TABLE 3
`
`Types
`Type 1
`Type 2
`
`Type 3
`Type 4
`Type 5
`
`Keyfact pattern
`KEY, NIL
`KEY, MP or
`KEY, VHIVB
`KEY1, KEY2
`KEY1 KEY2, NIL) or
`KEY2 KEY 1, NIL
`KEY1 KEY2 KEY3
`
`Weight constants
`CKrType
`CKrType II
`
`CKFTypeIII
`CKrTyperv
`CKrTypev
`
`The key fact weight constants are assigned with the
`Sequence
`like
`CKftyperCKftyperCKrypetIrCKftspervsCKftypews ... and do
`important role for the precision of key fact-based text
`retrieval. Therefore, weight constants are determined experi
`mentally on the basis of distribution of key fact pattern of
`document collection.
`The key fact weight constant is applied to the following
`equation 1 and the result of equation 1, a key fact weight
`(W), is used in the key fact-based text retrieval model.
`
`Wyk = if
`
`lost
`
`N + 1
`df.
`
`) Corra
`
`Equation 1
`
`5
`
`15
`
`25
`
`35
`
`45
`
`50
`
`55
`
`60
`
`65
`
`W: a key fact weight
`tf: frequency of a key fact
`N: size of a document
`df document frequency of a key fact
`Crisper a key fact weight constant
`Conventionally, only the frequency of keywords
`(tfe), the document frequency of keywords (dfe),
`and the number of the documents in a document collection
`are considered in calculating the keyword weight in the
`keyword-based text retrieval system. However, the key fact
`Weight constant (C) of the keyfact pattern is also
`reflected in calculating key fact weights in the key fact-based
`retrieval System, So as to make it possible to indeX and
`retrieve in the unit of a key fact.
`Next, the Similarity of the document appropriate for the
`query is calculated by employing the key fact retrieval model
`based upon the vector Space model. The result of the
`Similarity calculation determines the order of appropriate
`documents 57.
`FIG. 6 shows a Screen image for illustrating a document
`retrieval result in response to a query. A user makes a query
`in query Section 61 with natural language. The key fact is
`extracted by the key fact-based text retrieval System and the
`documents close to the query are found. The result of the
`retrieval of the query is displayed at the document retrieval
`result screen 62 in the order of similarity. Document title and
`weight are also displayed with the order of Similarity. In
`addition, if the document displayed is selected, document
`text screen 63 shows the contents of text of the document.
`According to the present invention, texts of document
`collection and user queries are expressed, indexed and
`retrieved by concept-based key facts. Therefore, more pre
`cise retrieval results are achievable. Additionally, Since
`indexing and retrieval with high precision are possible, time
`and efforts can be minimized, the keyfact-based retrieval
`method in accordance with the present invention can be used
`in various applications. Especially, digital library, text and
`annotation based multimedia information retrieval of broad
`casting Station, internet application, information retrieval of
`electronicS commercial trading, and education/medical/
`military application areas can take advantage of the present
`invention.
`Although representative embodiments of the present
`invention have been disclosed for illustrative purposes,
`those skilled in the art will appreciate that various
`modifications, additions and Subs

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket