`(12) Patent Application Publication (10) Pub. No.: US 2001/0014852 A1
`Tsourikov et al.
(43) Pub. Date: Aug. 16, 2001
`
`US 20010014852A1
`
`(54) DOCUMENT SEMANTIC
`ANALYSIS/SELECTION WITH
`KNOWLEDGE CREATIVITY CAPABILITY
`
(76) Inventors: Valery M. Tsourikov, Boston, MA (US); Leonid S. Batchilo, Belmont, MA (US); Igor V. Sovpel, Minsk (BY)
`Correspondence Address:
`Edward Dreyfus, Esq.
`Stanger & Dreyfus
`608 Sherwood Parkway
`Mountainside, NJ 07092 (US)
`
(21) Appl. No.: 09/745,261
(22) Filed: Feb. 7, 2001
`
`Related U.S. Application Data
(63) Continuation of application No. 09/321,804, filed on May 27, 1999, now Pat. No. 6,167,370, which is a non-provisional of provisional application No. 60/099,641, filed on Sep. 9, 1998.
`Publication Classification
`
`(51) Int. Cl. ............................ G06F 17/27; G06F 17/30
`
`(52) U.S. Cl. ........................................ 704/9; 704/1; 707/3
`
(57) ABSTRACT

A computer based software system and method for semantically processing a user entered natural language request to identify and store linguistic subject-action-object (SAO) structures, using such structures as key words/phrases to search local and web-based databases for downloading candidate natural language documents, semantically processing candidate document texts into candidate document SAO structures, and selecting and storing only relevant documents whose SAO structures include a match with a stored request SAO structure. Further features include analyzing relationships among relevant document SAO structures and creating new SAO structures based on such relationships that may yield new knowledge concepts and ideas for display to the user and generating and displaying natural language summaries based on the relevant document SAO structures.
`
[Representative drawing; legible block labels: USER REQUEST; DB OF ORIGINAL DOCUMENTS (WEB PAGES, LOCAL DB); SAO ANALYZER OF TEXT (TAGGING, SAO EXTRACTION, SAO NORMALIZER); DB OF SAO-STRUCTURES 18; SAO PROCESSOR (COMPARISON, RE-ORGANIZATION, FILTERING) 20; SAO SYNTHESIZER OF NATURAL LANGUAGE TEXT; SAO SYNTHESIZER OF KEY WORDS/PHRASES REPRESENTATION; DB OF SUMMARIES OF ORIGINAL DOCUMENTS (NATURAL LANGUAGE TEXTS) 26; DB OF NEW CONCEPTS (NATURAL LANGUAGE TEXTS) 28; DB OF ACCURATE KEY WORDS/PHRASES REPRESENTATIONS OF ORIGINAL TEXTS 30; TO WEB; TO LOCAL DB.]
`
`Page 1 of 18
`
`GOOGLE EXHIBIT 1015
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 1 of 12
`
`US 2001/0014852 A1
`
[Pictorial drawing; legible labels: PRINTER, NETWORK, KEYBOARD; legible reference numerals: 12, 18.]

FIG. 1
`
`
`
US 2001/0014852 A1

[FIG. 2 (Sheet 2 of 12): drawing text not recoverable from the scan.]
`
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 3 of 12
`
`US 2001/0014852 A1
`
[Flowchart; box labels listed in approximate order of flow, with reference numerals in parentheses where legible.]

USER REQUEST
SAO ANALYSIS OF TEXT OF USER REQUEST (16)
STORING OF SAO STRUCTURES OF USER REQUEST (18)
SYNTHESIS OF KEYWORD/PHRASES REPRESENTATION OF USER REQUEST (24)
STORING DB OF KEYWORD/PHRASES REPRESENTATION OF USER REQUEST (30)
SENDING KEYWORD/PHRASES REPRESENTATIONS TO WEB (ALTA-VISTA, LOCAL DBs, ETC.)
FROM WEB, LOCAL DBs: STORING A FULL TEXT OF CURRENT CANDIDATE DOCUMENT
SAO ANALYSIS OF TEXT OF CURRENT CANDIDATE DOCUMENT
STORING DB OF SAO STRUCTURES OF CANDIDATE DOCUMENT
COMPARATIVE ANALYSIS OF SAO-STRUCTURES OF USER REQUEST AND SAO STRUCTURES OF CANDIDATE DOCUMENT
IF RELEVANT
DELETING THE CURRENT CANDIDATE DOCUMENT AND ITS SAO STRUCTURES
MARKING THE CURRENT CANDIDATE DOCUMENT AND ITS SAO STRUCTURES AS RELEVANT
IF MARKED RELEVANT
FILTERING MAJOR SAO-STRUCTURES (20)
SYNTHESIZING A TEXT OF A SHORT SUMMARY OF RELEVANT DOCUMENT (22)
STORING DB OF SUMMARIES OF RELEVANT DOCUMENT (26)
PROCESSING THE SAO-STRUCTURES OF RELEVANT DOCUMENTS, REORGANIZING EXISTED SAO-STRUCTURES AND SYNTHESIZING A NEW SAO-STRUCTURES (20)
STORING DB OF NEW SAO-STRUCTURES (18)
SYNTHESIZING A TEXT OF NEW CONCEPTS (22) (28)
DISPLAYING TO USER

FIG. 3
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 4 of 12
`
`US 2001/0014852 A1
`
FROM DB OF DOCUMENTS (12)

DOCUMENT PRE-FORMATTER 32
TEXT CODER (TAGGING)
RECOGNIZER OF VERB/NOUN GROUPS 36
SENTENCE PARSER
S-A-O EXTRACTOR
S-A-O NORMALIZER

44 (SEPARATE DATABASES)
LIST OF CODES
DICTIONARY WORD-CODES
DICTIONARY [illegible]-CODES
DICTIONARY WORD-CODE-FREQUENCY
STATISTICAL MATRIX CODE-CODE
PARSING RULES ([illegible] OF CODES)
VERB/NOUN GROUP PATTERNS

TO DB OF S-A-O STRUCTURES (18)

FIG. 4
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 5 of 12
`
`US 2001/0014852 A1
`
FROM DB OF SAO STRUCTURES

20
COMPARATIVE UNIT
REORGANIZING UNIT FOR SYNTHESIS OF NEW SAO
FILTERING UNIT

52 (SEPARATE DATABASES)
- SYNONYM DB (DICTIONARY INCLUDING THE FREQUENCIES OF WORDS AND THEIR SYNONYMS)
- RULES OF LOGICAL INFERENCE FOR SAO STRUCTURES
- STOP-WORDS/PHRASES FREQUENCY DB
- RULES OF COMPARING
- SEMANTIC MARKERS (PATTERNS AS CODE STRINGS)

TO DB OF S-A-O STRUCTURES

FIG. 5
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 6 of 12
`
`US 2001/0014852 A1
`
[FIG. 6: drawing text not recoverable from the scan.]
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 7 of 12
`
`US 2001/0014852 A1
`
SOURCE SENTENCE

The present invention shields a noise of an external magnetic field with the slider and improves a recording performance because the slider is isolated magnetically.

FIG. 7
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 8 of 12
`
`US 2001/0014852 A1
`
TAGGED SENTENCE

The ATI present JJ invention NN shields VBZ a AT noise NN of IN an AT external JJ magnetic JJ field NN with IN the ATI slider NN and CC improves VBZ a AT recording NN performance NN because CS the ATI slider NN is BEZ isolated VBN magnetically.

FIG. 8
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 9 of 12
`
`US 2001/0014852 A1
`
VERB GROUPS ALLOCATION

The ATI present JJ invention NN shields VBZ a AT noise NN of IN an AT external JJ magnetic JJ field NN with IN the ATI slider NN and CC improves VBZ a AT recording NN performance NN because CS the ATI slider NN is BEZ isolated VBN magnetically.

[The VERB GROUP callouts of the drawing did not survive extraction.]

FIG. 9
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 10 of 12
`
`US 2001/0014852 A1
`
NOUN GROUPS ALLOCATION

The ATI present JJ invention NN shields VBZ a AT noise NN of IN an AT external JJ magnetic JJ field NN with IN the ATI slider NN and CC improves VBZ a AT recording NN performance NN because CS the ATI slider NN is BEZ isolated VBN magnetically.

[Noun groups are numbered in the drawing; the numerals 1, 3 and 4 are legible, but the group boundaries did not survive extraction.]

FIG. 10
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 11 of 12
`
`US 2001/0014852 A1
`
`
`
[FIG. 11: drawing text not recoverable from the scan.]
`
`
`
`Patent Application Publication Aug. 16, 2001 Sheet 12 of 12
`
`US 2001/0014852 A1
`
SAO EXTRACTION

SUBJECT                  ACTION        OBJECT
THE PRESENT INVENTION    SHIELDS       A NOISE OF EXTERNAL MAGNETIC FIELD
THE PRESENT INVENTION    IMPROVES      A RECORDING PERFORMANCE
(none)                   IS ISOLATED   THE SLIDER

FIG. 12

SAO EXTRACTION (NORMALIZED)

SUBJECT                  ACTION        OBJECT
PRESENT INVENTION        SHIELD        NOISE OF EXTERNAL MAGNETIC FIELD
PRESENT INVENTION        IMPROVE       RECORDING PERFORMANCE
(none)                   ISOLATE       SLIDER

FIG. 13
`
`
`
`
`DOCUMENT SEMANTIC ANALYSIS/SELECTION
`WITH KNOWLEDGE CREATIVITY CAPABILITY
`
`REFERENCE TO PRIORITY APPLICATION
[0001] This application claims the benefit of U.S. Provisional Application No. 60/099,641, filed Sep. 9, 1998.
`
`BACKGROUND
[0002] The present invention relates to computer based natural language processing systems and more particularly to computer based systems and methods of processing natural language text to identify Subject, Action, Object triplets and relationships between such triplets, storing this data and processing this data to semantically analyze, select, summarize, store, and display candidate documents containing specific content or subject matter.
[0003] Computer based document search processors are known to perform key word searches for publications on the Internet and World Wide Web. Today, information owners and service providers are adapting their databases to individual tastes and requirements. For example, Boston based Agents, Inc. offers over the Web personalized newsletters for music fans such that classical music lovers are blocked from receiving rap music advertisements and vice-versa. KD, Inc. of Hong Kong has developed a system that takes into consideration words similar by sense while searching the Web. Today, the user can download 10,000 papers from the Web by typing the word “screen”. The search system designed by KD, Inc. asks the user whether he/she is seeking papers related to computer screen, TV screen or window screen. In this case, the number of unrelated papers will be drastically reduced.
[0004] Software based search processors are able to remember requests of a single user and to conduct personalized non-stop searches on the Web. So, when a user wakes up in the morning, he/she finds references and abstracts of several new Web papers related to his/her area of interest. In 1997, practically all fundamental technical publications, journals, magazines, as well as patents of all industrial countries became available on the Web, i.e., available in electronic format.
[0005] Although key word searching the Web affords the user great value, it also has created and will continue to create substantial problems adversely affecting this value. Specifically, because of the enormous amount of information available on the Web, key word search processors produce too much downloaded information, the vast majority of which is irrelevant or immaterial to the information the user wants. Many users simply give up in frustration when presented with several hundred articles in response to what the user considered a request for only those few articles related to a specific request.
[0006] This problem is also experienced in the technical fields of science and engineering, particularly since there is a growing number of libraries, government patent offices, universities, government research centers, and others adding vast amounts of technical and scientific information for Web access. Engineers, scientists, and doctors are overwhelmed with too many articles, papers, patents and general information on the topic of interest to them. In addition, the user presently has only two choices when examining a downloaded article to determine its relevance to the user's project. He/she can either read the author's abstract and/or scan various sections of the full article to determine whether or not to save or print out that specific document. Since the author's abstract is not comprehensive, it often omits the reference to the specific subject matter of interest to the user or treats this subject matter in an incomprehensive manner. Thus, scanning the abstract and scanning the full article may have little value and require an inordinate amount of user time.
[0007] Various attempts purport to increase the recall and precision of the selection, such as U.S. Pat. Nos. 5,774,833 and 5,794,050, incorporated herein by reference; however, these methods simply rely on key word or phrase searching with various techniques of selection based on variations of the key words, or purported understanding of textual phrases. These prior methods may improve recall but tend to require too much physical and mental effort and time to determine why the document was selected and what is the pertinent part. This results from the entire document or abstract being presented without summary or concept generation.
`
`SUMMARY OF EXEMPLARY EMBODIMENT
`OF PRESENT INVENTION
[0008] A computer based software system and method according to the principles of the present invention solves the foregoing problems and has the ability to perform a non-stop search of all databases on the Web or other network for key words and to semantically process candidate documents for specific knowledge concepts, such as technological functions or specific physical effects, so that only the very few prioritized or a single document meeting the search criteria is presented or identified to the user.
[0009] Further, the computer based software system in accordance with the principles of the present invention captures these highly relevant documents and creates a compressed, short summary of the precise technical physical aspects designated by the search criteria.
[0010] Another aspect of the present invention includes using the semantic analysis results of the selected documents to create new ideas of knowledge concepts. The system does this by analyzing the subject, action, and object triplets mentioned in the documents, identifying cause and effect triplet relationships, and re-organizing these triplet representations into new and/or different profiles of such elements. As further described below, some of these reorganized sets of relationships among these elements may comprise new concepts never before thought of by anyone.
[0011] According to an aspect of the present invention, the method and apparatus begins with the user entering natural language text related to the task, concept, or subject matter for which the user desires to acquire publications or documents. The system analyzes this request text and automatically tags each word with a code that indicates the type of word it is. Once all words in the request are tagged, the system performs a semantic analysis that, in one example, includes determining and storing the verb groups within the first sentence of the request, then determining and storing the noun groups within that sentence of the request. This process is repeated for all sentences in the request.
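The grouping step described above can be illustrated with a minimal Python sketch. It assumes the words have already been tagged, and it treats a verb or noun group as a maximal run of adjacent words whose tags belong to a given set. The tag sets, the function name `find_groups`, and the `RB` tag on "magnetically" are illustrative assumptions; the publication does not disclose the actual group patterns.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, tag)

# Hypothetical tag sets, loosely modeled on the codes visible in FIG. 8.
NOUN_GROUP_TAGS: Set[str] = {"AT", "ATI", "JJ", "NN"}
VERB_GROUP_TAGS: Set[str] = {"VBZ", "BEZ", "VBN"}


def find_groups(tokens: List[Token], group_tags: Set[str]) -> List[List[str]]:
    """Collect maximal runs of consecutive words whose tag is in group_tags."""
    groups: List[List[str]] = []
    current: List[str] = []
    for word, tag in tokens:
        if tag in group_tags:
            current.append(word)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


# Last clause of the FIG. 7 sentence; the RB tag is an assumption.
tagged = [("the", "ATI"), ("slider", "NN"), ("is", "BEZ"),
          ("isolated", "VBN"), ("magnetically", "RB")]

verb_groups = find_groups(tagged, VERB_GROUP_TAGS)
noun_groups = find_groups(tagged, NOUN_GROUP_TAGS)
```

A run-based grouping is only a stand-in for the pattern databases the system is said to consult; it is meant to show the order of operations (tag first, then find verb groups, then noun groups), not the real recognition rules.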
`
`
[0012] Next, the system parses each request sentence with a hierarchical algorithm into a coded framework (tree) which is substantially indicative of the sense of the sentence. The system includes databases of various types to aid in generating the coded framework, such as grammar rules, parsing rules, dictionary synonyms, and the like. Once parsed sentence codes are stored, the system identifies Subject-Action-Object (SAO) extractions within each sentence and stores them. A sentence can have one, two, or a plurality of SAO extractions as seen in the detailed description below. Each extraction is normalized into a SAO structure by processing extractions according to certain rules described below. Accordingly, the result of the semantic analysis routine performed on the request text is a series of SAO structures (triplets) indicative of the content of the request. These request SAO structures are applied to (1) a comparative module for comparing the SAO structures of candidate documents as described below and (2) a search request and key word generator that identifies key words and key combinations of words, and synonyms thereof, for searching the Web, internet, intranet, and/or local databases for candidate documents. Any suitable search engine, e.g. Alta Vista™, can be used to identify, select, and download candidate documents based on the generated key words.
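As a rough illustration of the key word generator, the following Python sketch collects the subject, action, and object phrases of each request SAO structure and adds synonyms from a dictionary. The `SAO` type, the `key_phrases` function, and the contents of `SYNONYMS` are hypothetical; the publication says only that key words, key combinations, and their synonyms are produced.

```python
from typing import Dict, List, NamedTuple


class SAO(NamedTuple):
    subject: str
    action: str
    obj: str


# Hypothetical synonym dictionary; a real system would load a much larger one.
SYNONYMS: Dict[str, List[str]] = {"shield": ["screen", "protect"]}


def key_phrases(saos: List[SAO], synonyms: Dict[str, List[str]]) -> List[str]:
    """Return each non-empty SAO element plus its synonyms, without duplicates."""
    phrases: List[str] = []
    for sao in saos:
        for element in sao:
            if not element:
                continue  # an SAO structure may lack a subject
            for phrase in [element] + synonyms.get(element, []):
                if phrase not in phrases:
                    phrases.append(phrase)
    return phrases


request = [
    SAO("present invention", "shield", "noise of external magnetic field"),
    SAO("", "isolate", "slider"),
]
queries = key_phrases(request, SYNONYMS)
```

The resulting list would then be handed to whatever search engine is in use; how the phrases are combined into an engine-specific query is left open here because the source does not specify it.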
[0013] It should be understood that, as mentioned above, key word searching produces an over-abundance of candidate documents. However, according to the principles of the present invention, the system performs substantially the same semantic analysis on each candidate document as performed on the user input search request. That is, the system generates an SAO structure(s) for each sentence of each candidate document and forwards them to the comparative Unit where the request SAO structures are compared to the candidate document SAO structures. Those few candidate documents having SAO structures that substantially match the request SAO structure profile are placed into a retrieved document Unit where they are ranked in order of relevance. The system then summarizes the essence of each retrieved document by synthesizing those SAO structures of the document that match the request SAO structures and stores this summary for user display or printout. Users can later read the summary and decide to display or print out or delete the entire retrieved document and its SAO's.
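The comparison and ranking step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: SAO structures are plain normalized string triples, a match requires the same action and object, a request structure with an empty subject matches any subject (consistent with the later remark that an isolate-slider pair is matched regardless of subject), and documents are ranked by the number of request structures they match. The names `sao_matches` and `rank_candidates` and the sample documents are hypothetical.

```python
from typing import Dict, List, Tuple

SAO = Tuple[str, str, str]  # (subject, action, object), already normalized


def sao_matches(request: SAO, candidate: SAO) -> bool:
    """Action and object must agree; the subject is checked only if the request names one."""
    r_subj, r_act, r_obj = request
    c_subj, c_act, c_obj = candidate
    if (r_act, r_obj) != (c_act, c_obj):
        return False
    return r_subj == "" or r_subj == c_subj


def rank_candidates(request: List[SAO],
                    candidates: Dict[str, List[SAO]]) -> List[Tuple[str, int]]:
    """Drop documents with no match and sort the rest by number of matched request SAOs."""
    ranked: List[Tuple[str, int]] = []
    for doc_id, doc_saos in candidates.items():
        hits = sum(1 for r in request if any(sao_matches(r, c) for c in doc_saos))
        if hits > 0:
            ranked.append((doc_id, hits))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


request = [("", "isolate", "slider"),
           ("present invention", "improve", "recording performance")]
candidates = {
    "doc_a": [("housing", "isolate", "slider"),
              ("present invention", "improve", "recording performance")],
    "doc_b": [("motor", "rotate", "disk")],
    "doc_c": [("coating", "isolate", "slider")],
}
ranked = rank_candidates(request, candidates)
```

Exact string equality stands in for the "substantially match" test of the text; a fuller version would presumably consult the synonym database and comparison rules listed for Unit 20.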
`
[0014] As stated above, the SAO structures for each sentence for each retrieved document are stored in the system according to the present invention. According to the knowledge creativity aspect of the present invention, the system analyzes all these stored structures, identifies where common or equivalent subjects and objects exist and reorganizes, generates, synthesizes new SAO structures or new strings (relationships) of SAO structures for user's consideration. Some of these new structures or strings may be unique and comprise new solutions to problems related to the user's requested subject matter. For example, if two structures S1-A1-O1 and S2-A2-O2 are stored, and the present system recognizes that S2 is equivalent to or the synonym for or has some other stored relation to O1, then it will generate and store for the user's access a summary of S1-A1-S2-A2-O2. Or if the system stores an association between A1 and A2 it can generate S1-A1/A2-O1 to suggest improvement of O1 toward desired results.
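The first reorganization rule (S2 related to O1 yields S1-A1-S2-A2-O2) lends itself to a short Python sketch. The relation test here is limited to identity or membership in a shared synonym set, which is an assumption; the text also allows "some other stored relation". The function names and the sample structures are hypothetical.

```python
from typing import FrozenSet, List, Tuple

SAO = Tuple[str, str, str]
Chain = Tuple[str, str, str, str, str]  # (S1, A1, S2, A2, O2)


def related(a: str, b: str, synonym_sets: List[FrozenSet[str]]) -> bool:
    """True if the phrases are identical or listed together in a synonym set."""
    return a == b or any(a in group and b in group for group in synonym_sets)


def synthesize_chains(saos: List[SAO],
                      synonym_sets: List[FrozenSet[str]]) -> List[Chain]:
    """For every ordered pair of distinct structures, chain them when O1 relates to S2."""
    chains: List[Chain] = []
    for i, (s1, a1, o1) in enumerate(saos):
        for j, (s2, a2, o2) in enumerate(saos):
            if i != j and related(o1, s2, synonym_sets):
                chains.append((s1, a1, s2, a2, o2))
    return chains


# Illustrative data only; not taken from the publication.
stored = [("shield", "reduce", "noise"),
          ("interference", "degrade", "recording performance")]
synonyms = [frozenset({"noise", "interference"})]
chains = synthesize_chains(stored, synonyms)
```

The pairwise loop is quadratic in the number of stored structures, which is acceptable for a sketch; indexing structures by subject would be the obvious refinement for large collections.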
`
[0015] Other and further advantages and benefits shall become apparent with the following detailed description when taken in view of the appended drawings, in which:

DRAWING DESCRIPTION
[0016] FIG. 1 is a pictorial representation of one exemplary embodiment of the system according to the principles of the present invention.
[0017] FIG. 2 is a schematic representation of the main architectural elements of the system according to the present invention.
[0018] FIG. 3 is a schematic representation of the method according to the principles of the present invention.
[0019] FIG. 4 is a schematic representation of Unit 16 of FIG. 2.
[0020] FIG. 5 is a schematic representation of Unit 20 of FIG. 2.
[0021] FIG. 6 is a schematic representation of Unit 22 of FIG. 2.
[0022] FIG. 7 is a typical example of the user request text entered by the user.
[0023] FIG. 8 is a tagged and coded representation version of the text of FIG. 7.
[0024] FIG. 9 is an identification of verb groups of the text of FIG. 8.
[0025] FIG. 10 is an identification of noun groups of the coded text of FIG. 8.
[0026] FIG. 11 is a representation of parsed hierarchy coded text of FIG. 8.
[0027] FIG. 12 is a representation of SAO extraction of the text of FIG. 7.
[0028] FIG. 13 is a representation of SAO structures of the extraction of FIG. 12.
`
`DETAILED DESCRIPTION OF EXEMPLARY
`EMBODIMENTS
[0029] One exemplary embodiment of a semantic processing system according to the principles of the present invention includes:
[0030] A CPU 12 that could comprise a general purpose personal computer or networked server or minicomputer with standard user input and output driver such as keyboard 14, mouse 16, scanner 19, CD reader 17, and printer 18. System 10 also includes standard communication ports 21 to LANs, WANs, and/or public or private switched networks to the Web.
[0031] With reference to FIGS. 1-6, the semantic processing system 10 includes a temporary storage or database 12 for receiving and storing documents downloaded from the Web or local area network or generated as a user request text with use of keyboard 14 or one of the other input devices. User can type the request, examples disclosed below, or enter full documents into DB 12 and designate the document as user's request. System 10 further includes semantic processor 14 for receiving the entire text of each document and includes a Subject-Action-Object (SAO) analyzer Unit 16 that tags each word of each sentence with a code type (such as Markov chain theory code). Unit 16 then identifies each verb group and noun group (described below) within each sentence and parses and normalizes each sentence into SAO structures that represent the sense of the sentence. Unit 16 applies its output to DB of SAO structures 18. SAO processor Unit 20 stores the request SAO structures and receives the SAO structures of each sentence of each document stored in Unit 18. Unit 20 compares the document SAO's to the request SAO's and deletes those documents with no matches. The SAO structures of matched documents are stored back in Unit 18 or some other storage facility. In addition, Unit 20 analyzes SAO structures within a single document or with those of one or more other relevant documents, searches for relationships among S-A-O's and generates new SAO structures for user consideration. These new structures are stored in Unit 18 or some other storage facility in the system.
[0032] Unit 14 further includes natural language Unit 22 that receives SAO structures in table form and synthesizes structures into natural language form, i.e. sentences.
[0033] Unit 14 also includes keyword Unit 24 that receives SAO structures, extracts key words and phrases from them, and acquires their synonyms for use as additional key words/phrases.
[0034] Database Units 26, 28, and 30 receive the outputs from Unit 14, generally as shown, for storing the natural language summaries of selected SAO structures as described below and the key words/phrases that form the user request sent to search engines through port 21.
[0035] Unit 16 includes document pre-formatter 32 that receives full text of documents from Unit 12 and converts the text and other contents to a standard plain text format. Text coder 34 analyzes each word of each sentence of text and tags a code to every word, which code designates the word type, see FIG. 8. Various databases designated 44 in FIG. 4 are available to aid the Units of Unit 16. Following tagging, recognizer Unit 36 identifies the verb groups (FIG. 9) and the noun groups of each sentence (FIG. 10). Sentence parser 38 then parses each sentence into a hierarchical coded form that represents the sense of the sentence (FIG. 11). S-A-O extractor 40 organizes the SAO's of each sentence into extracted table format (FIG. 12). Then normalizer 42 normalizes the extractions into SAO structures as described above (FIG. 13).
[0036] SAO processor 20 includes three main Units. Comparative Unit 46 receives SAO structures from database 18. One set of these structures originates from the user request text described above and other sets originate from the candidate documents. Unit 46 then compares these two sets looking for matches between SAO structures of these two sets. If no match results then the candidate document and associated SAO's are deleted. If a match is identified then the document is marked relevant and ranked and stored in Unit 12 and its SAO structures stored in Unit 18. Unit 46 then compares all candidate documents in sequence and in the same way as described.
[0037] Unit 20 also includes the SAO structure reorganizing Unit 48, which synthesizes new SAO structures from different documents on the same matter by combining them into the new structure, as described above, and applies them to Unit 18.
[0038] Filtering Unit 50 analyzes every SAO structure of each document and blocks or deletes those not relevant to the SAO structures of the request.
[0039] Reference 52 designates some of the databases available to aid sub-units of Unit 20.
[0040] SAO synthesizer Unit 22 (FIG. 6) includes a subject detector 54 for detecting the content of the subject for each received SAO structure. If S is detected then the SAO is fed to Unit 56 in which the tree structure of the verb group(s) is restored to natural language using grammar, semantic, speech patterns, and synonyms rules database 66. Synthesizer 58 does the same for subject noun groups and synthesizer 60 does the same for object noun groups. Combiner 68 then organizes and combines these groups into a natural language sentence.
[0041] If S was not detected by Unit 54, the SAO structures are processed by synthesizer 62 to restore the verb group in passive form. Synthesizer 64 processes the object noun group for a passive sentence and combiner 70 organizes and combines the groups into a natural language sentence.
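The active/passive branching of the synthesizer can be sketched in Python. This is a deliberately naive illustration: it appends "s" for the active verb and looks up a past participle for the passive form, whereas the system described restores verb and noun groups from their tree structures using rule databases. The function `synthesize` and the `PAST_PARTICIPLES` table are hypothetical.

```python
from typing import Dict

# Hypothetical participle table standing in for the grammar rules database.
PAST_PARTICIPLES: Dict[str, str] = {"isolate": "isolated", "shield": "shielded"}


def synthesize(subject: str, action: str, obj: str,
               participles: Dict[str, str]) -> str:
    """Build an active sentence when a subject exists, otherwise a passive one."""
    if subject:
        # Active branch: subject, verb, object (naive third-person singular).
        sentence = f"{subject} {action}s {obj}"
    else:
        # Passive branch: the object becomes the grammatical subject.
        participle = participles.get(action, action + "ed")
        sentence = f"{obj} is {participle}"
    return sentence[0].upper() + sentence[1:] + "."


active = synthesize("present invention", "shield",
                    "noise of external magnetic field", PAST_PARTICIPLES)
passive = synthesize("", "isolate", "slider", PAST_PARTICIPLES)
```

Articles and other attributes that normalization removed are not restored here, so the output reads as telegraphic English; restoring them would require the stored attributes the description says accompany each SAO structure.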
[0042] If SAO structures received by Unit 54 bear new structure markings, then combiners 68 and 70 apply their output to Unit 28, and if they were marked existing SAO structure, then Units 68, 70 apply output to Unit 26. See FIG. 3.
[0043] The salient steps of the method according to the principles of the present invention are shown in FIG. 3, where the numbers in parentheses refer to the Units of FIG. 2 where the process steps take place. A session begins with the user inputting a natural language request, which could be customized with the use of the keyboard or could be a natural language document entered via one of the input devices shown in FIG. 1. A typical user-generated customized request is shown in FIG. 7. System 10, Unit 14, then processes the request by first tagging each word with a type code (see FIG. 8), then identifying the verb groups of each sentence (FIG. 9) and noun groups of each sentence (FIG. 10), then processing each sentence into a hierarchical tree (FIG. 11), and then extracting the SAO extractions, where all extracted words are the originals of the request (FIG. 12).
[0044] Then the method normalizes (modifies) these words, with each action changed to its infinitive form. Thus, “is isolated” (FIG. 12) is changed to “ISOLATE”, the word “to” being understood (FIG. 13). It should be understood that not all attributes of the subject, action and objects appearing in FIG. 11 are shown in FIGS. 12 and 13, but the system knows the full attributes associated with the SAO elements and these attributes are part of the SAO structure. Also, note in FIG. 13, no subject is listed for the last action because none is indicated pursuant to the planning rules. This absence does not affect the reliability of the overall method because all sentences of the candidate documents that include an A-O of isolate-slider will be considered a match regardless of the subject. The normalized SAO's are called herein SAO structures. These user request SAO structures are stored and applied in the two following steps: (i) synthesis of key word/phrases of user request; (ii) a comparative analysis of SAO structures of each sentence of each candidate document as described below.
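The normalization from FIG. 12 to FIG. 13 can be sketched in Python: articles are dropped from the subject and object, auxiliaries are dropped from the action, and the remaining verb is mapped to its infinitive. The word lists and the `INFINITIVES` lookup table are assumptions made for the sketch; the publication does not describe how infinitives are derived.

```python
from typing import Dict, Set, Tuple

SAO = Tuple[str, str, str]

ARTICLES: Set[str] = {"a", "an", "the"}
AUXILIARIES: Set[str] = {"is", "are", "was", "were", "be", "been"}
# Hypothetical lemma table covering only the FIG. 12 example.
INFINITIVES: Dict[str, str] = {
    "shields": "shield", "improves": "improve", "isolated": "isolate",
}


def normalize_phrase(phrase: str) -> str:
    """Lower-case a noun phrase and remove articles."""
    return " ".join(w for w in phrase.lower().split() if w not in ARTICLES)


def normalize_action(action: str) -> str:
    """Remove auxiliaries and map each remaining verb form to its infinitive."""
    words = [w for w in action.lower().split() if w not in AUXILIARIES]
    return " ".join(INFINITIVES.get(w, w) for w in words)


def normalize(extraction: SAO) -> SAO:
    subject, action, obj = extraction
    return (normalize_phrase(subject), normalize_action(action), normalize_phrase(obj))


first = normalize(("The present invention", "shields",
                   "a noise of an external magnetic field"))
last = normalize(("", "is isolated", "the slider"))
```

Because both request and candidate extractions pass through the same function, two sentences that differ only in articles, tense, or voice end up with identical triples, which is what makes the later comparison step workable.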
[0045] The request SAO structure key words/phrases are stored and sent to a standard search engine to search for candidate documents in local databases, LANs and/or the Web. Alta Vista™, Yahoo™, or other typical search engines could be used. The engine, using the request SAO structure key words/phrases, identifies candidate documents and stores them (full text) for system 10 analysis. Next, the SAO analysis as described above for the search request is repeated for each sentence of each candidate document so that SAO structures are generated and stored as indicated in FIG. 3. In addition, the SAO structures of each document are used in the comparative steps where the request SAO structures are compared with the candidate document SAO structures. If no match is found then the document and related SAO structures are deleted from the system. If one or more matches are found then the document and related structures are marked relevant and its relevancy marked, for example, on a scale of 1.0 to 100. The full relevant document text is permanently stored (although it can later be deleted by user if desired) for display or print-out as user desires. Relevant SAO structures are also marked relevant and permanently stored.
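One way to picture the keep-or-delete decision and the relevancy scale is the Python sketch below. The linear formula is purely an assumption chosen to map "one or more matches" onto the 1.0 to 100 range mentioned in the text; the publication does not say how relevancy is computed. The function name `relevance` is hypothetical.

```python
from typing import Optional


def relevance(matched: int, total: int) -> Optional[float]:
    """Return None when the candidate should be deleted, else a score in (1.0, 100.0].

    matched: number of request SAO structures found in the candidate document
    total:   number of SAO structures in the request
    """
    if total <= 0:
        raise ValueError("the request must contain at least one SAO structure")
    if matched <= 0:
        return None  # no match: document and its SAO structures are deleted
    fraction = min(matched, total) / total
    # Assumed linear mapping of the matched fraction onto the 1.0 to 100 scale.
    return 1.0 + 99.0 * fraction


no_match = relevance(0, 4)
half = relevance(2, 4)
full = relevance(4, 4)
```

Any monotonic scoring would serve the same purpose; the point of the sketch is only that deletion and scoring are two outcomes of the same comparison.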
[0046] Next, system 10 filters out the least relevant SAO structures and uses the matched SAO structures of each relevant document to synthesize into natural language summary sentence(s) the matched SAO structures and the page number where the complete sentence associated with the matched SAO structure appears. This summary is stored and available for user's display or print-out as desired.
[0047] Filtered relevant SAO structures of relevant document(s) are analyzed to identify relationships among the subjects, actions, and objects among all relevant structures. Then SAO structures are processed to reorganize them into new SAO structures for storage and synthesis into new natural language sentence(s). Some of the new sentences may, and probably will, express or summarize new ideas, concepts and thoughts for users to consider. The new sentences are stored for user display or print-out.
[0048] For example, if
[0049] S1-A1-O1,
[0050] S2-A2-O2,
[0051] S3-A3-O3
[0052] and S2 is the same as or a synonym of O1, then S1-A1-S2-A2-O2 is synthesized into a new sentence and stored.
[0053] Accordingly, the method and apparatus according to the present invention provides the user automatically with a set of new ideas directly relating to user's requested area of interest, some of which ideas are probably new and suggest possible new solutions to user's problems under consideration, and/or the specific documents and summaries of pertinent parts of specific documents related directly to user's request.
[0054] Although mention has been made herein of application of the present system and method to the engineering, scientific and medical fields, the application thereof is not limited thereto. The present invention has utility for historians, philosophers, theology, poetry, the arts or any field where written language is used.
[0055] It will be understood that various enhancements and changes can be made to the example embodiments herein disclosed without departing from the spirit and scope of the present invention.
`
`We claim:
1. A natural language document analysis and selection system comprising,
a general purpose computer having a monitor, a central processing unit (CPU), a user input device for generating request data representing a natural language request, and a communications device for communication with local and remote natural language document databases,
said CPU comprising
(i) first storage means for storing the request data,
(ii) a semantic processor for generating request subject-action-object (SAO) extractions in response to receiving request data, and
(iii) SAO storage means for storing representations of the request SAO extractions.
2. A system as set forth in claim 1, wherein said communication device conveys candidate document data to said CPU for storage in said first storage means, the candidate document data representing natural language document text,
said semantic processor generating candidate document SAO extractions in response to receiving candidate document data, and
said SAO storage means also storing representations of candidate document SAO extractions.
3. A system as set forth in claim 2, wherein said semantic processor identifies matches between said representations of said request SAO extractions and said candidate document SAO extractions.
4. A system as set forth in claim 3, wherein said semantic processor comprises means for marking as relevant candidate document data that includes at least one representation of candidate document SAO extraction that matches at least one representation of request SAO extraction.
5. A system as set forth in claim 4, wherein said semantic processor comprises means for deleting stored candidate document data and stored representations of cand