(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2001/0014852 A1
Tsourikov et al.
(43) Pub. Date: Aug. 16, 2001

US 20010014852A1

(54) DOCUMENT SEMANTIC ANALYSIS/SELECTION WITH KNOWLEDGE CREATIVITY CAPABILITY

(76) Inventors: Valery M. Tsourikov, Boston, MA (US); Leonid S. Batchilo, Belmont, MA (US); Igor V. Sovpel, Minsk (BY)

Correspondence Address:
Edward Dreyfus, Esq.
Stanger & Dreyfus
608 Sherwood Parkway
Mountainside, NJ 07092 (US)

(21) Appl. No.: 09/745,261
(22) Filed: Feb. 7, 2001

Related U.S. Application Data
(63) Continuation of application No. 09/321,804, filed on May 27, 1999, now Pat. No. 6,167,370, which is a non-provisional of provisional application No. 60/099,641, filed on Sep. 9, 1998.

Publication Classification
(51) Int. Cl. G06F 17/27; G06F 17/30
(52) U.S. Cl. 704/9; 704/1; 707/3

(57) ABSTRACT

A computer based software system and method for semantically processing a user entered natural language request to identify and store linguistic subject-action-object (SAO) structures, using such structures as key words/phrases to search local and web-based databases for downloading candidate natural language documents, semantically processing candidate document texts into candidate document SAO structures, and selecting and storing only relevant documents whose SAO structures include a match with a stored request SAO structure. Further features include analyzing relationships among relevant document SAO structures and creating new SAO structures based on such relationships that may yield new knowledge concepts and ideas for display to the user and generating and displaying natural language summaries based on the relevant document SAO structures.
`
[Front-page drawing: block diagram of the system. Legible labels: USER REQUEST; DB OF ORIGINAL DOCUMENTS; TO WEB; TO LOCAL DB; SEMANTIC PROCESSOR; SAO ANALYZER OF TEXT (tagging, parsing, SAO extraction, SAO normalizer); SAO PROCESSOR (comparison, re-organization, filtering); DB OF SAO-STRUCTURES; SAO SYNTHESIZER OF NATURAL LANGUAGE TEXT; SAO SYNTHESIZER OF KEY WORDS/PHRASES REPRESENTATION; DB OF SUMMARIES OF ORIGINAL DOCUMENTS (NATURAL LANGUAGE TEXTS); DB OF NEW CONCEPTS (NATURAL LANGUAGE TEXTS); DB OF ACCURATE KEY WORDS/PHRASES REPRESENTATIONS OF ORIGINAL TEXTS.]

`Page 1 of 18
`
`GOOGLE EXHIBIT 1015
`
`

`

`Patent Application Publication Aug. 16, 2001 Sheet 1 of 12
`
`US 2001/0014852 A1
`
[FIG. 1: pictorial view of the system; drawing not reproduced. Legible labels: PRINTER, NETWORK, KEYBOARD.]

FIG. 1
`
`

`

[Sheet 2 of 12: rotated block diagram, not legible in this copy. By sheet order and by the fragments that can be read, it corresponds to FIG. 2, the main architectural elements of the system: semantic processor, SAO analyzer of text, SAO processor, SAO synthesizers, and the databases of original documents, summaries, new concepts, key words/phrases, and SAO structures.]
`
`
`
`
`

`

[Sheet 3 of 12, FIG. 3: flowchart of the method. The box labels that can be recovered are listed below; the unit numbers printed in parentheses beside the boxes are not reliably legible in this copy and are omitted.]

- User request
- SAO analysis of text of user request
- Storing of SAO structures of user request
- Synthesis of key word/phrases representation of user request
- Storing DB of key word/phrases representation of user request
- Sending key word/phrases representations to Web and local DBs (Alta-Vista, local DBs, etc.)
- Storing a full text of current candidate document
- SAO analysis of text of current candidate document
- Storing DB of SAO structures of candidate document
- Comparative analysis of SAO structures of user request and SAO structures of candidate document
- If irrelevant: deleting the current candidate document and its SAO structures
- If relevant: marking the current candidate document and its SAO structures as relevant
- Filtering major SAO structures
- Synthesizing a text of a short summary of relevant document
- Storing DB of summaries of relevant documents
- Processing the SAO structures of relevant documents, reorganizing existing SAO structures and synthesizing new SAO structures
- Storing DB of new SAO structures
- Synthesizing a text of new concepts
- Displaying to user

FIG. 3
`
`

`

[Sheet 4 of 12, FIG. 4: block diagram of the SAO analyzer. Recovered labels, in processing order:]

FROM DB OF DOCUMENTS (12)
DOCUMENT PRE-FORMATTER 32
TEXT CODER (TAGGING)
RECOGNIZER OF VERB/NOUN GROUPS 36
SENTENCE PARSER
S-A-O EXTRACTOR
S-A-O NORMALIZER
TO DB OF S-A-O STRUCTURES (18)

44 (SEPARATE DATABASES)
- LIST OF CODES
- DICTIONARY WORD-CODES
- DICTIONARY IDIOM-CODES (label partly illegible)
- DICTIONARY WORD-CODE-FREQUENCY
- STATISTICAL MATRIX CODE-CODE
- PARSING RULES (STRINGS OF CODES)
- VERB/NOUN GROUP PATTERNS

FIG. 4
`
`

`

[Sheet 5 of 12, FIG. 5: block diagram of SAO processor 20. Recovered labels:]

FROM DB OF SAO STRUCTURES
COMPARATIVE UNIT
REORGANIZING UNIT FOR SYNTHESIS OF NEW SAO
FILTERING UNIT
TO DB OF S-A-O STRUCTURES

52 (SEPARATE DATABASES)
- SYNONYM DB (DICTIONARY INCLUDING THE FREQUENCIES OF WORDS AND THEIR SYNONYMS)
- RULES OF LOGICAL INFERENCE FOR SAO STRUCTURES
- STOP-WORDS/PHRASES FREQUENCY DB
- RULES OF COMPARING
- SEMANTIC MARKERS (PATTERNS AS CODE STRINGS)

FIG. 5
`
`

`

[Sheet 6 of 12, FIG. 6: drawing not legible in this copy.]
`
`

`

[Sheet 7 of 12]

SOURCE SENTENCE

The present invention shields a noise of an external magnetic field with the slider and improves a recording performance because the slider is isolated magnetically.

FIG. 7
`
`

`

[Sheet 8 of 12]

TAGGED SENTENCE

The ATI present JJ invention NN shields VBZ a AT noise NN of IN an AT external JJ magnetic JJ field NN with IN the ATI slider NN and CC improves VBZ a AT recording NN performance NN because CS the ATI slider NN is BEZ isolated VBN magnetically.

FIG. 8
`
`

`

[Sheet 9 of 12]

VERB GROUPS ALLOCATION

The ATI present JJ invention NN shields VBZ a AT noise NN of IN an AT external JJ magnetic JJ field NN with IN the ATI slider NN and CC improves VBZ a AT recording NN performance NN because CS the ATI slider NN is BEZ isolated VBN magnetically.

[Callouts labeled VERB GROUP mark the verb groups in the drawing; their exact positions are not recoverable from this copy.]

FIG. 9
`
`

`

[Sheet 10 of 12]

NOUN GROUPS ALLOCATION

The ATI present JJ invention NN shields VBZ a AT noise NN of IN an AT external JJ magnetic JJ field NN with IN the ATI slider NN and CC improves VBZ a AT recording NN performance NN because CS the ATI slider NN is BEZ isolated VBN magnetically.

[The noun groups are numbered in the drawing; the numerals 1, 3, and 4 are legible in this copy.]

FIG. 10
`
`

`

[Sheet 11 of 12, FIG. 11: rotated drawing of the parsed sentence hierarchy; not legible in this copy.]
`
`

`

[Sheet 12 of 12]

SAO EXTRACTION

SUBJECT                 ACTION        OBJECT
THE PRESENT INVENTION   SHIELDS       A NOISE OF EXTERNAL MAGNETIC FIELD
THE PRESENT INVENTION   IMPROVES      A RECORDING PERFORMANCE
                        IS ISOLATED   THE SLIDER

FIG. 12

SAO EXTRACTION (NORMALIZED)

SUBJECT                 ACTION        OBJECT
PRESENT INVENTION       SHIELD        NOISE OF EXTERNAL MAGNETIC FIELD
PRESENT INVENTION       IMPROVE       RECORDING PERFORMANCE
                        ISOLATE       SLIDER

FIG. 13
`
`DOCUMENT SEMANTIC ANALYSIS/SELECTION
`WITH KNOWLEDGE CREATIVITY CAPABILITY
`
`REFERENCE TO PRIORITY APPLICATION
[0001] This application claims the benefit of U.S. Provisional Application No. 60/099,641, filed Sep. 9, 1998.
`
`BACKGROUND
[0002] The present invention relates to computer based natural language processing systems and more particularly to computer based systems and methods of processing natural language text to identify Subject, Action, Object triplets and relationships between such triplets, storing this data and processing this data to semantically analyze, select, summarize, store, and display candidate documents containing specific content or subject matter.

[0003] Computer based document search processors are known to perform key word searches for publications on the Internet and World Wide Web. Today, information owners and service providers are adapting their databases to individual tastes and requirements. For example, Boston based Agents, Inc. offers over the Web personalized newsletters for music fans such that classical music lovers are blocked from receiving Rap music advertisements and vice-versa. KD, Inc. of Hong Kong has developed a system that takes into consideration words similar by sense while searching the Web. Today, the user can download 10,000 papers from the Web by typing the word "Screen". The search system designed by KD, Inc. asks the user whether he/she is seeking papers related to Computer Screen, TV Screen or Window Screen. In this case, the number of unrelated papers will be drastically reduced.

[0004] Software based search processors are able to remember requests of a single user and to conduct personalized non-stop searches on the Web. So, when a user wakes up in the morning, he/she finds references and abstracts of several new Web papers related to his/her area of interest. In 1997, practically all fundamental technical publications, journals, magazines, as well as patents of all industrial countries became available on the Web, i.e., available in electronic format.

[0005] Although key word searching the Web affords the user great value, it also has created and will continue to create substantial problems adversely affecting this value. Specifically, because of the enormous amount of information available on the Web, key word search processors produce too much downloaded information, the vast majority of which is irrelevant or immaterial to the information the user wants. Many users simply give up in frustration when presented with several hundred articles in response to what the user considered a request for only those few articles related to a specific request.

[0006] This problem is also experienced in the technical fields of science and engineering, particularly since there is a growing number of libraries, government patent offices, universities, government research centers, and others adding vast amounts of technical and scientific information for Web access. Engineers, scientists, and doctors are overwhelmed with too many articles, papers, patents and general information on the topic of interest to them. In addition, the user presently has only two choices when examining a downloaded article to determine its relevance to the user's project. He/she can either read the author's abstract and/or scan various sections of the full article to determine whether or not to save or print out that specific document. Since the author's abstract is not comprehensive, it often omits the reference to the specific subject matter of interest to the user or treats this subject matter in an incomprehensive manner. Thus, scanning the abstract and scanning the full article may have little value and require an inordinate amount of user time.

[0007] Various attempts purport to increase the recall and precision of the selection, such as U.S. Pat. Nos. 5,774,833 and 5,794,050, incorporated herein by reference; however, these methods simply rely on key word or phrase searching with various techniques of selection based on variations of the key words, or purported understanding of textual phrases. These prior methods may improve recall but tend to require too much physical and mental effort and time to determine why the document was selected and what is the pertinent part. This results from the entire document or abstract being presented without summary or concept generation.
`
`SUMMARY OF EXEMPLARY EMBODIMENT
`OF PRESENT INVENTION
[0008] A computer based software system and method according to the principles of the present invention solves the foregoing problems and has the ability to perform a non-stop search of all databases on the Web or other network for key words and to semantically process candidate documents for specific knowledge concepts, such as technological functions or specific physical effects, so that only the very few prioritized or a single document meeting the search criteria is presented or identified to the user.

[0009] Further, the computer based software system in accordance with the principles of the present invention captures these highly relevant documents and creates a compressed, short summary of the precise technical physical aspects designated by the search criteria.

[0010] Another aspect of the present invention includes using the semantic analysis results of the selected documents to create new ideas of knowledge concepts. The system does this by analyzing the subject, action, and object triplets mentioned in the documents, identifying cause and effect triplet relationships, and re-organizing these triplet representations into new and/or different profiles of such elements. As further described below, some of these reorganized sets of relationships among these elements may comprise new concepts never before thought of by anyone.

[0011] According to an aspect of the present invention, the method and apparatus begins with the user entering natural language text related to the task, concept, or subject matter for which the user desires to acquire publications or documents. The system analyzes this request text and automatically tags each word with a code that indicates the type of word it is. Once all words in the request are tagged, the system performs a semantic analysis that, in one example, includes determining and storing the verb groups within the first sentence of the request, then determining and storing the noun groups within that sentence of the request. This process is repeated for all sentences in the request.
`
[0012] Next, the system parses each request sentence with a hierarchical algorithm into a coded framework (tree) which is substantially indicative of the sense of the sentence. The system includes databases of various types to aid in generating the coded framework, such as grammar rules, parsing rules, dictionary synonyms, and the like. Once the parsed sentence codes are stored, the system identifies Subject-Action-Object (SAO) extractions within each sentence and stores them. A sentence can have one, two, or a plurality of SAO extractions as seen in the detailed description below. Each extraction is normalized into an SAO structure by processing extractions according to certain rules described below. Accordingly, the result of the semantic analysis routine performed on the request text is a series of SAO structures (triplets) indicative of the content of the request. These request SAO structures are applied to (1) a comparative module for comparing the SAO structures of candidate documents as described below and (2) a search request and key word generator that identifies key words and key combinations of words, and synonyms thereof, for searching the Web, internet, intranet, and/or local databases for candidate documents. Any suitable search engine, e.g. Alta Vista™, can be used to identify, select, and download candidate documents based on the generated key words.
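
The key word generator described in this paragraph can be pictured with a minimal Python sketch. It is an illustration only, not the disclosed implementation: the `SAO` class, the `key_phrases` function, and the contents of the `SYNONYMS` table are hypothetical names and data chosen for the example, and each SAO element is treated as a single phrase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SAO:
    """One normalized subject-action-object structure (hypothetical container)."""
    subject: Optional[str]  # None when no subject was extracted
    action: str
    obj: str


# Hypothetical stand-in for the synonym dictionary; the entries are invented.
SYNONYMS = {"shield": ["screen"], "slider": ["head slider"]}


def key_phrases(structures, synonyms=SYNONYMS):
    """Collect each SAO element plus its synonyms, in order, without duplicates."""
    phrases = []
    for sao in structures:
        for part in (sao.subject, sao.action, sao.obj):
            if part is None:
                continue
            for phrase in [part] + synonyms.get(part, []):
                if phrase not in phrases:
                    phrases.append(phrase)
    return phrases


request = [
    SAO("present invention", "shield", "noise of external magnetic field"),
    SAO(None, "isolate", "slider"),
]
print(key_phrases(request))
```

The resulting list is what would be handed to a search engine as key words/phrases; how phrases are combined into an actual query is left open here.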
[0013] It should be understood that, as mentioned above, key word searching produces an over-abundance of candidate documents. However, according to the principles of the present invention, the system performs substantially the same semantic analysis on each candidate document as performed on the user input search request. That is, the system generates an SAO structure(s) for each sentence of each candidate document and forwards them to the comparative Unit where the request SAO structures are compared to the candidate document SAO structures. Those few candidate documents having SAO structures that substantially match the request SAO structure profile are placed into a retrieved document Unit where they are ranked in order of relevance. The system then summarizes the essence of each retrieved document by synthesizing those SAO structures of the document that match the request SAO structures and stores this summary for user display or printout. Users can later read the summary and decide to display or print out or delete the entire retrieved document and its SAO's.
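
The comparison and ranking step can be sketched as follows. This is a simplified illustration under stated assumptions: "substantially match" is reduced to exact equality of normalized elements, a request structure with no subject is treated as matching any subject, and documents are ranked by how many request structures they match. The function names and the sample data are hypothetical.

```python
def matches(request_sao, doc_sao):
    """Compare two (subject, action, object) tuples.

    Assumption: a request subject of None acts as a wildcard.
    """
    r_subject, r_action, r_object = request_sao
    d_subject, d_action, d_object = doc_sao
    subject_ok = r_subject is None or r_subject == d_subject
    return subject_ok and r_action == d_action and r_object == d_object


def rank_documents(request, candidates):
    """Return (name, hit count) pairs for documents with at least one match,
    ordered from most to fewest matched request structures."""
    scored = []
    for name, doc_saos in candidates.items():
        hits = sum(1 for r in request if any(matches(r, d) for d in doc_saos))
        if hits:
            scored.append((name, hits))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


request = [("present invention", "shield", "noise"), (None, "isolate", "slider")]
candidates = {
    "doc_a": [("magnet", "isolate", "slider"), ("present invention", "shield", "noise")],
    "doc_b": [("housing", "isolate", "slider")],
    "doc_c": [("pump", "move", "fluid")],
}
print(rank_documents(request, candidates))
```

A fuller version would consult the synonym database when comparing elements; that is omitted to keep the sketch short.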
`
[0014] As stated above, the SAO structures for each sentence for each retrieved document are stored in the system according to the present invention. According to the knowledge creativity aspect of the present invention, the system analyzes all these stored structures, identifies where common or equivalent subjects and objects exist and reorganizes, generates, synthesizes, new SAO structures or new strings (relationships) of SAO structures for user's consideration. Some of these new structures or strings may be unique and comprise new solutions to problems related to the user's requested subject matter. For example, if two structures S1-A1-O1 and S2-A2-O2 are stored, and the present system recognizes that S2 is equivalent to or the synonym for or has some other stored relation to O1, then it will generate and store for the user's access a summary of S1-A1-S2-A2-O2. Or if the system stores an association between S1 and A2 it can generate S1-A1/A2-O1 to suggest improvement of O1 toward desired results.
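
The S1-A1-S2-A2-O2 rule lends itself to a short sketch. The code below is illustrative only: `synthesize_chains` and the `equivalent` set (standing in for stored synonym or equivalence relations) are hypothetical, and the two sample structures are invented for the example.

```python
def synthesize_chains(structures, equivalent):
    """Link S1-A1-O1 to S2-A2-O2 whenever O1 equals S2 or the pair is listed
    as equivalent, yielding the five-element chain S1-A1-S2-A2-O2."""
    chains = []
    for first in structures:
        for second in structures:
            if first == second:
                continue
            s1, a1, o1 = first
            s2, a2, o2 = second
            related = o1 == s2 or (o1, s2) in equivalent or (s2, o1) in equivalent
            if related:
                chains.append((s1, a1, s2, a2, o2))
    return chains


stored = [("heater", "warm", "fluid"), ("liquid", "dissolve", "salt")]
equivalent = {("fluid", "liquid")}  # invented equivalence for the example
print(synthesize_chains(stored, equivalent))
```

The pairwise loop is quadratic in the number of stored structures, which is acceptable for a sketch; indexing structures by subject would be the obvious refinement.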
`
[0015] Other and further advantages and benefits shall become apparent with the following detailed description when taken in view of the appended drawings, in which:

DRAWING DESCRIPTION
[0016] FIG. 1 is a pictorial representation of one exemplary embodiment of the system according to the principles of the present invention.
[0017] FIG. 2 is a schematic representation of the main architectural elements of the system according to the present invention.
[0018] FIG. 3 is a schematic representation of the method according to the principles of the present invention.
[0019] FIG. 4 is a schematic representation of Unit 16 of FIG. 2.
[0020] FIG. 5 is a schematic representation of Unit 20 of FIG. 2.
[0021] FIG. 6 is a schematic representation of Unit 22 of FIG. 2.
[0022] FIG. 7 is a typical example of the user request text entered by user.
[0023] FIG. 8 is a tagged and coded representation version of text of FIG. 7.
[0024] FIG. 9 is an identification of verb groups of the text of FIG. 8.
[0025] FIG. 10 is an identification of noun groups of the coded text of FIG. 8.
[0026] FIG. 11 is a representation of parsed hierarchy coded text of FIG. 8.
[0027] FIG. 12 is a representation of SAO extraction of the text of FIG. 7.
[0028] FIG. 13 is a representation of SAO structures of the extraction of FIG. 12.
`
`DETAILED DESCRIPTION OF EXEMPLARY
`EMBODIMENTS
[0029] One exemplary embodiment of a semantic processing system according to the principles of the present invention includes:

[0030] A CPU 12 that could comprise a general purpose personal computer or networked server or minicomputer with standard user input and output drivers such as keyboard 14, mouse 16, scanner 19, CD reader 17, and printer 18. System 10 also includes standard communication ports 21 to LANs, WANs, and/or public or private switched networks to the Web.
[0031] With reference to FIGS. 1-6, the semantic processing system 10 includes a temporary storage or database 12 for receiving and storing documents downloaded from the Web or local area network, or generated as a user request text with use of keyboard 14 or one of the other input devices. User can type the request, examples disclosed below, or enter full documents into DB 12 and designate the document as user's request. System 10 further includes semantic processor 14 for receiving the entire text of each document and includes a Subject-Action-Object (SAO) analyzer Unit 16 that tags each word of each sentence with a code type (such as Markov chain theory code). Unit 16 then identifies each verb group and noun group (described below) within each sentence and parses and normalizes each sentence into SAO structures that represent the sense of the sentence. Unit 16 applies its output to DB of SAO structures 18. SAO processor Unit 20 stores the request SAO structures and receives the SAO structures of each sentence of each document stored in Unit 18. Unit 20 compares the document SAO's to the request SAO's and deletes those documents with no matches. The SAO structures of matched documents are stored back in Unit 18 or some other storage facility. In addition, Unit 20 analyzes SAO structures within a single document or with those of one or more other relevant documents, searches for relationships among S-A-Os and generates new SAO structures for user consideration. These new structures are stored in Unit 18 or some other storage facility in the system.
[0032] Unit 14 further includes natural language Unit 22 that receives SAO structures in table form and synthesizes structures into natural language form, i.e. sentences.

[0033] Unit 14 also includes keyword Unit 24 that receives SAO structures, extracts key words and phrases from them, and acquires their synonyms for use as additional key words/phrases.
[0034] Database Units 26, 28, and 30 receive the outputs from Unit 14, generally as shown, for storing the natural language summaries of selected SAO structures as described below and the key words/phrases that form user request sent to search engines through port 21.
[0035] Unit 16 includes document pre-formatter 32 that receives full text of documents from Unit 12 and converts the text and other contents to a standard plain text format. Text coder 34 analyzes each word of each sentence of text and tags a code to every word, which code designates the word type (see FIG. 8). Various databases designated 44 in FIG. 4 are available to aid the Units of Unit 16. Following tagging, recognizer Unit 36 identifies the verb groups (FIG. 9) and the noun groups of each sentence (FIG. 10). Sentence parser 38 then parses each sentence into a hierarchical coded form that represents the sense of the sentence (FIG. 11). S-A-O extractor 40 organizes the SAO's of each sentence into extracted table format (FIG. 12). Then normalizer 42 normalizes the extractions into SAO structures as described above (FIG. 13).
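
The group recognition stage that follows tagging can be sketched in Python using the tagged sentence of FIG. 8. The sketch rests on assumptions that are not in the text: tags beginning with VB or BE are taken to form verb groups, runs of AT, ATI, JJ, and NN tags are taken to form noun groups, and any other tag ends the current group. The real recognizer relies on the verb/noun group patterns held in databases 44, so the noun groups produced here need not coincide with those numbered in FIG. 10. The final word carries an empty tag because no tag for it is legible in the figure.

```python
# Tagged sentence transcribed from FIG. 8 as (word, tag) pairs.
TAGGED = [
    ("The", "ATI"), ("present", "JJ"), ("invention", "NN"), ("shields", "VBZ"),
    ("a", "AT"), ("noise", "NN"), ("of", "IN"), ("an", "AT"), ("external", "JJ"),
    ("magnetic", "JJ"), ("field", "NN"), ("with", "IN"), ("the", "ATI"),
    ("slider", "NN"), ("and", "CC"), ("improves", "VBZ"), ("a", "AT"),
    ("recording", "NN"), ("performance", "NN"), ("because", "CS"),
    ("the", "ATI"), ("slider", "NN"), ("is", "BEZ"), ("isolated", "VBN"),
    ("magnetically", ""),
]


def kind(tag):
    """Map a tag to 'verb', 'noun', or None (assumed, simplified classes)."""
    if tag.startswith(("VB", "BE")):
        return "verb"
    if tag in {"AT", "ATI", "JJ", "NN"}:
        return "noun"
    return None


def find_groups(tagged):
    """Split the tagged words into consecutive verb groups and noun groups."""
    groups = []
    current_kind, words = None, []
    for word, tag in tagged:
        k = kind(tag)
        if k != current_kind and words:
            groups.append((current_kind, " ".join(words)))
            words = []
        current_kind = k
        if k is not None:
            words.append(word)
    if words:
        groups.append((current_kind, " ".join(words)))
    return groups


groups = find_groups(TAGGED)
verb_groups = [text for k, text in groups if k == "verb"]
print(verb_groups)
```

With these assumptions the verb groups come out as the three actions listed in FIG. 12.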
[0036] SAO processor 20 includes three main Units. Comparative Unit 46 receives SAO structures from database 18. One set of these structures originates from the user request text described above and other sets originate from the candidate documents. Unit 46 then compares these two sets looking for matches between SAO structures of these two sets. If no match results then the candidate document and associated SAO's are deleted. If a match is identified then the document is marked relevant and ranked and stored in Unit 12 and its SAO structures stored in Unit 18. Unit 46 then compares all candidate documents in sequence and in the same way as described.
[0037] Unit 20 also includes the SAO structure reorganizing Unit 48, which synthesizes new SAO structures from different documents on the same matter, combines them into the new structure, as described above, and applies them to Unit 18.

[0038] Filtering Unit 50 analyzes every SAO structure of each document and blocks or deletes those not relevant to the SAO structures of the request.

[0039] Reference 52 designates some of the databases available to aid sub-units of Unit 20.
[0040] SAO synthesizer Unit 22 (FIG. 6) includes a subject detector 54 for detecting the content of the subject for each received SAO structure. If S is detected then the SAO is fed to Unit 56 in which the tree structure of the verb group(s) is restored to natural language using grammar, semantic, speech patterns, and synonyms rules database 66. Synthesizer 58 does the same for subject noun groups and synthesizer 60 does the same for object noun groups. Combiner 68 then organizes and combines these groups into a natural language sentence.

[0041] If S was not detected by Unit 54, the SAO structures are processed by synthesizer 62 to restore the verb group in passive form. Synthesizer 64 processes the object noun group for a passive sentence, and combiner 70 organizes and combines the groups into a natural language sentence.
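
The active/passive branching of the synthesizer can be illustrated with a toy Python sketch. It does not restore verb-group trees or consult the rules database 66; the `synthesize` function, the naive third-person rule (appending "s"), and the `PAST_PARTICIPLE` table are all hypothetical simplifications.

```python
# Hypothetical lookup for passive forms; a real system would use grammar rules.
PAST_PARTICIPLE = {"isolate": "isolated", "shield": "shielded", "improve": "improved"}


def synthesize(sao, past_participle=PAST_PARTICIPLE):
    """Turn one (subject, action, object) tuple into a sentence.

    With a subject: active sentence 'subject action+s object'.
    Without a subject: passive sentence 'object is <participle>'.
    """
    subject, action, obj = sao
    if subject is not None:
        sentence = f"{subject} {action}s {obj}"  # naive third-person singular
    else:
        sentence = f"{obj} is {past_participle[action]}"
    return sentence[0].upper() + sentence[1:] + "."


print(synthesize(("the present invention", "shield", "a noise of an external magnetic field")))
print(synthesize((None, "isolate", "the slider")))
```

The two branches mirror the split between combiner 68 (subject present) and combiner 70 (subject absent).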
[0042] If SAO structures received by Unit 54 bear new structure markings, then combiners 68 and 70 apply their output to Unit 28 and if they were marked existing SAO structure, then units 68, 70 apply output to Unit 26. See FIG. 3.
[0043] The salient steps of the method according to the principles of the present invention are shown in FIG. 3, where the numbers in parentheses refer to the Units of FIG. 2 where the process steps take place. A session begins with the user inputting a natural language request, which could be customized with the use of the keyboard or could be a natural language document entered via one of the input devices shown in FIG. 1. A typical user-generated customized request is shown in FIG. 7. System 10, Unit 14, then processes it by first tagging each word with a type code (see FIG. 8), then identifying the verb groups of each sentence (FIG. 9) and noun groups of each sentence (FIG. 10), then processing each sentence into a hierarchical tree (FIG. 11), and then extracting the SAO extractions, where all extracted words are the originals of the request (FIG. 12).
[0044] Then the method normalizes (modifies) these words, as each action is changed to its infinitive form. Thus, "is isolated" (FIG. 12) is changed to "ISOLATE", the word "to" being understood (FIG. 13). It should be understood that not all attributes of the subject, action and objects appearing in FIG. 11 are shown in FIGS. 12 and 13, but the system knows the full attributes associated with the SAO elements and these attributes are part of the SAO structure. Also, note in FIG. 13, no subject is listed for the last action because none is indicated pursuant to the parsing rules. This absence does not affect the reliability of the overall method because all sentences of the candidate documents that include an A-O of Isolate-slider will be considered a match regardless of the subject. The normalized SAO's are called herein SAO structures. These user request SAO structures are stored and applied in the two following steps: (i) synthesis of key word/phrases of user request; (ii) a comparative analysis of the SAO structures of each sentence of each candidate document as described below.
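
The step from FIG. 12 to FIG. 13 can be sketched in Python. The text states only that each action is changed to its infinitive form; dropping articles is inferred from comparing the two figures, and the `INFINITIVE` lookup table is a hypothetical stand-in for whatever morphological rules the system uses.

```python
ARTICLES = {"a", "an", "the"}

# Hypothetical lookup covering the three actions of FIG. 12.
INFINITIVE = {"shields": "shield", "improves": "improve", "is isolated": "isolate"}


def strip_articles(phrase):
    """Lower-case a phrase and remove articles; pass None through unchanged."""
    if phrase is None:
        return None
    return " ".join(w for w in phrase.lower().split() if w not in ARTICLES)


def normalize(extraction):
    """Map a raw (subject, action, object) extraction to a normalized SAO structure."""
    subject, action, obj = extraction
    action = action.lower()
    return (strip_articles(subject), INFINITIVE.get(action, action), strip_articles(obj))


extractions = [
    ("The present invention", "shields", "a noise of external magnetic field"),
    ("The present invention", "improves", "a recording performance"),
    (None, "is isolated", "the slider"),
]
for row in extractions:
    print(normalize(row))
```

Actions missing from the lookup are passed through unchanged, so the sketch degrades gracefully on verbs it does not know.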
[0045] The request SAO structure key words/phrases are stored and sent to a standard search engine to search for candidate documents in local databases, LANs and/or the Web. Alta Vista™, Yahoo™, or other typical search engines could be used. The engine, using the request SAO structure key words/phrases, identifies candidate documents and stores them (full text) for system 10 analysis. Next the SAO analysis as described above for the search request is repeated for each sentence of each candidate document so that SAO structures are generated and stored as indicated in FIG. 3. In addition, the SAO structures of each document are used in the comparative steps where the request SAO structures are compared with the candidate document SAO structures. If no match is found then the documents and related SAO structures are deleted from the system. If one or more matches are found then the document and related structures are marked relevant and its relevancy marked, for example, on a scale of 1.0 to 100. The full relevant document text is permanently stored (although it can later be deleted by user if desired) for display or print-out as user desires. Relevant SAO structures are also marked relevant and permanently stored.
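
The delete-or-mark decision can be sketched as a small screening routine. The text does not say how the relevancy value is computed, so the formula below (percentage of request structures found in the document, with a floor of 1) is purely an assumption made for illustration, as are the function name and the sample data.

```python
def screen_candidates(request, documents):
    """Delete documents with no matching SAO structure and score the rest.

    `documents` maps a document name to its list of SAO tuples and is
    modified in place. Returns {name: relevance on an assumed 1-100 scale}.
    """
    relevance = {}
    for name in list(documents):
        doc_set = set(documents[name])
        matched = sum(1 for sao in request if sao in doc_set)
        if matched == 0:
            del documents[name]  # no match: drop document and its structures
        else:
            relevance[name] = max(1, round(100 * matched / len(request)))
    return relevance


request = [("invention", "shield", "noise"), ("invention", "improve", "performance")]
documents = {
    "a": [("invention", "shield", "noise"), ("invention", "improve", "performance")],
    "b": [("invention", "shield", "noise"), ("pump", "move", "fluid")],
    "c": [("pump", "move", "fluid")],
}
relevance = screen_candidates(request, documents)
print(relevance, sorted(documents))
```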
[0046] Next System 10 filters out the least relevant SAO structures and uses the matched SAO structures of each relevant document to synthesize into natural language summary sentence(s) the matched SAO structures and the page number where the complete sentence associated with the matched SAO structure appears. This summary is stored and available for user's display or print-out as desired.
[0047] Filtered relevant SAO structures of relevant document(s) are analyzed to identify relationships among the subjects, actions, and objects among all relevant structures. Then SAO structures are processed to reorganize them into new SAO structures for storage and synthesis into new natural language sentence(s). Some of the new sentences may, and probably will, express or summarize new ideas, concepts and thoughts for users to consider. The new sentences are stored for user display or print-out.
[0048] For example, if
[0049] S1-A1-O1,
[0050] S2-A2-O2,
[0051] S3-A3-O3
[0052] and S2 is the same as or a synonym of O1, then S1-A1-S2-A2-O2 is synthesized into a new sentence and stored.
[0053] Accordingly, the method and apparatus according to the present invention provides the user automatically with a set of new ideas directly relating to user's requested area of interest, some of which ideas are probably new and suggest possible new solutions to user's problems under consideration, and/or the specific documents and summaries of pertinent parts of specific documents related directly to user's request.
[0054] Although mention has been made herein of application of the present system and method to the engineering, scientific and medical fields, the application thereof is not limited thereto. The present invention has utility for historians, philosophers, theology, poetry, the arts or any field where written language is used.

[0055] It will be understood that various enhancements and changes can be made to the example embodiments herein disclosed without departing from the spirit and scope of the present invention.
`
`We claim:
1. A natural language document analysis and selection system comprising,
a general purpose computer having a monitor, a central processing unit (CPU), a user input device for generating request data representing a natural language request, and a communications device for communication with local and remote natural language document databases,
said CPU comprising
(i) first storage means for storing the request data,
(ii) a semantic processor for generating request subject-action-object (SAO) extractions in response to receiving request data, and
(iii) SAO storage means for storing representations of the request SAO extractions.
2. A system as set forth in claim 1, wherein said communication device conveys candidate document data to said CPU for storage in said first storage means, the candidate document data representing natural language document text,
said semantic processor generating candidate document SAO extractions in response to receiving candidate document data, and
said SAO storage means also storing representations of candidate document SAO extractions.
3. A system as set forth in claim 2, wherein said semantic processor identifies matches between said representations of said request SAO extractions and said candidate document SAO extractions.
4. A system as set forth in claim 3, wherein said semantic processor comprises means for marking as relevant candidate document data that includes at least one representation of candidate do
