`
`United States Patent
`Manson
`
`COMPUTER SYSTEM VVITH NATURAL
`LANGUAGE TO MACHINE LANGUAGE
`TRANSLATOR
`
`Inventor: Keith S. l\/Ianson. Albany, CA (US)
`
`Assignee: Ravenflow, Inc., Emeryville, CA (US)
`
`Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 1095 days.
`
`09/883,693
`
`Jun. 18, 2001
`
`(10) Patent N0.:
`(45) Date of Patent:
`
`US 7,085,708 B2
`Aug. 1,2006
`
`2/2000 Suda et al.
`6,029,123 A
`Richardson et 211
`5/2000
`6,070,134 A
`Richardson et a1
`8/2000
`6,108,620 A
`6,311,150 B1* 10/2001
`Raniaswainy et al.
`()'1'lr1ER PUBL1CA1'1()N S
`
`....... .. 704/1
`
`Pereira, Categorial Semantics and Scoping (1990) Compu-
`tations Linguistics V1, p. 1-10.*
`Uchinami et al., Linguistic model based on the generative
`topological infonnation space, Osaka University (1980), p.
`93-100*
`Warren et al., Using Semantics in Non—Context—Free Parsing
`of Montague Grammar, 1982, Ameican Journal of Compu-
`tational Linguistics, V.8, No. 3-4, Jul.-Dec. 1982, p. 123-
`138.*
`
`Prior Publication Data
`
`* cited by examiner
`
`US 2004/0181390 A1
`
`Sep. 16, 2004
`
`Related U.S. Application Data
`
`Provisional application No. 60/235,165, filed on Sep.
`23, 2000.
`
`Int. Cl.
`(2006.01)
`G06F 17/27
`U.S. Cl.
`............................................. .. 704/9; 704/1
`Field of Classification Search ................... .. 704/9
`See application file for complete search history.
`References Cited
`
`Primary Examiner—Richen1ond Dorvil
`Assistant Examiner—Lamont Spooner
`(74) Attorney, /lgeizz, or Firm John Carpenter; Reed Smith,
`LLP
`
`(57)
`
`ABSTRACT
`
`Presented is a system and method for converting or trans-
`lating expressions in a natural language such as English into
`machine executable expressions in a formal language. This
`translation enables a transforrnation from the syntactic struc-
`tures of a natural language into effective algebraic forms for
`further exact processing. The invention utilizes algorithms
`employing a reductio11 of sequences ofter111s defined over an
`extensible lexicon into formal syntactic and semantic struc-
`tures. This term reduction incorporates both syntactic type
`and semantic context to achieve a11 effective formal repre-
`sentation and interpretation of the meaning conveyed by any
`natural language expression.
`
`9 Claims, 7 Drawing Sheets
`
`U.S. PATENT DOCUMENTS
`6/1994
`Namba ct al.
`............... ..
`9/1996
`Namba et al.
`............... ..
`10/1997
`Conrad et al.
`............... ..
`3/1999
`Bralich et al.
`3/1999
`Ho .............................. .. 707/3
`10/1999
`Heidorn et al.
`
`704/9
`704/9
`704/9
`
`'1
`
`AAAAAA
`
`5,321,608
`5,555,169
`5,682,539
`5,878,385
`5,884,302
`5,966,686
`
`axrarnalsynnm ,
`
`General System Process and Data Flaw
`
`GOOGLE EXHIBIT 1009
`
`Page 1 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 1 of 7
`
`US 7,085,708 B2
`
`,. server] CPU
`
`keyboard
`
`1 .1 .1
`
`coiiaterai devices
`I
`I
`
`1 .4
`
`Figure 1: Computer System Architecture
`
`Page 2 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 2 of 7
`
`US 7,085,708 B2
`
`input: natural language text
`
`output: swrlacfic data
`[sequence of
`syntactic complexes}
`
`output: fomwal data
`lsequence ofi
`format expre$ions)
`
`external
`processing
`
`output: external dam
`!sequence of
`executable expressions}
`
`operafional
`environment
`
`3_2_2
`
`external system
`
`Figure 2: General System Process and Data Flow
`
`Page 3 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 3 of 7
`
`US 7,085,708 B2
`
`?A:AIItypesassigned?
`(ocnnli ——-yes)
`
`operational
`envirunrnerrt
`
`Figure 3: Detailed System Process and Data Flow
`
`Page 4 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 4 of 7
`
`US 7,085,708 B2
`
`{send,act) = (send.Zexcyp(0,send));
`<Bob,pnm) = {B¢b,1ext:yp{c,Bob}),-
`(an,adj} = !an,leJ:typ£0.an)):
`(email,xao) = {emai1,Jext:};p(D,emailH;
`(askin-g,ing) = (asking,l_axt:yp(u,asking)),~
`(him,ppm) = (him,laxcy_p(0,him))7
`(if,xdc) = (if,lextyp(0.:'Lf)) .'
`(he,ppm} = (he,1eXtyp(B,he));
`(is,5ob) = (:'.s,1extyp(0.is));
`(going, ing) 2 (going.1ex::_yp(D,going)) ,-
`(to,xpi) = (t:o.1ex!:yp(0.t'.o)):
`(go.act) = (go.1e.x:yp(u.goH.-
`(toncpi) = (t:o,1excyp(0, COM;
`
`action
`proper name/male
`adjectiye
`ambiguous action/object
`ing = ambiguous participle/gerund
`personal pronoun/male
`ambiguous delimiter/conditional
`personal pronoun/male
`s!:ate—of—heing verb
`ambiguous participle/gerund
`ambiguous preposition/infinitive
`= action
`Xpi = ambiguous preposition/infinitive
`
`‘
`
`'
`
`'
`
`pm a personal possessive/male
`(his,psm) = (hi5,1e.xr:5-;a(0,his)) ;
`(appointmenlbkom) = (appointment, lexqpto, appointment)) ;
`xom = ambiguous object/modifier
`
`(byqprp) = (by,lexcyp(o,by)};
`(himse1f,prm) = (himself,1ax{:yp{0,himsr.-1f)) ;
`(.,trm) = (.,1ext:yp(O, .));
`
`PIP == preposition
`= personal reflexive/male
`trm a termination
`
`Figure 4a: Virtual Type Assignment
`
`(send,act) = !send,1ext;yp(0,sendH;
`
`action
`
`adj
`obj
`pt:
`
`dlp
`
`proper name/male
`(Bob,pnm) = (Bob,1e.xtyp(0,Bob)):
`adjective
`(an.ad3') = (an,iexcyp(o,an));
`object
`(emai1.obj) - 1emai1,1.excyp{2..ama11)).-
`participle
`(asking, pcc)
`tasking, lextyp (1, asking) ) 7
`personal pronoun/male
`(him,ppm} 2 {him,1extypID,him));
`phrase delimiter
`(if,d1p) = (if,1extyp{0,ifH;
`personal yxonoun/male
`(he,ppm) = (he,lex2:yp(0,he§) ;
`state—-of-being verb
`sub
`(is.sob) = (is.1extyp(0,is));
`participle
`ptc
`(going,pi:c) = {going,1extyp(1,go;‘u1g));
`infinitive
`inf
`(cminf) = (to,lextyp(2.to)):
`act = action
`lgcnact} = (gD,l£-JCtX¢'3(D,go));
`prp = preposition
`(to,prp) = (to,lextyp(1,toH;
`psm = personal possessive/male
`(his,psm) = €his,1extyp(0,h.is));
`(appoin|:ment,obj) = (appointment,lextyp(1,appoinl:ment));
`obj — object
`grp = preposition
`= personal reflexive/male
`a termination
`
`(by,prp) = (by.lextyp£0.by));
`(himself,prm) = (him:se1f,1extyp(D.himselfH;
`4. . crm) 2 4. , lextypm, . H:
`
`Figure 4b: Actual Type Assignment
`
`Page 5 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 5 of 7
`
`US 7,085,708 B2
`
`to his appointment by himself
`is going to go
`he
`send Bob an email asking him if
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`act obj adj obj
`ptc obj dlp obj act ptc inf act: prp adj
`obj
`prp
`ubj
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`inf act: prp
`prp
`obj
`I
`I
`I
`I
`I
`-I
`I
`I
`.infacI:prp
`
`»
`.
`
`I
`
`I
`I
`
`.
`I
`trm
`
`.
`.
`
`II
`
`inf
`
`II II
`
`inf
`I
`
`.
`
`.
`.
`
`I
`I
`I
`I
`~
`. dip obj act ptc
`I
`I
`I
`I
`I
`I
`I
`~
`. dip obj act
`I
`I___._I
`I
`-
`. dlp
`I
`I
`v
`. dlp
`I
`
`I
`I
`I
`I
`act obj adj obj
`I
`I
`I__I
`I
`I
`I
`act obj
`obj
`I
`I
`I
`I
`I
`I
`act obj
`obj
`I
`I
`I
`I
`I
`I
`act obj
`obj
`I
`I
`‘
`act: obj
`I___I
`Iact
`I
`
`Figure 5: Term Reduction Sequence
`
`Page 6 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 6 of 7
`
`US 7,085,708 B2
`
`(send, act} -> (Bob, obi}
`
`(send, act) -1- (email, obd)
`
`—>
`
`(an, adj)
`
`(send,act) -—>
`
`(ema.i.1,obd)
`
`(send,act:)
`
`——> (email,obd)
`
`-> (asking,ptc) —> (him,obd)
`——>
`
`(aski.ng,ptc)
`
`-> (if,d1p) -—) (is,act) -—> (he,obs)
`
`(ema:‘.l,obdZI —-> {asking,pi:c) -) (if,d1p) —> (is,act) -—> (going,pt:c) —->
`(send,act) -—>
`(to,in£) —) (go,act) --> (to,prp) —-) (appointment,obp) —-> (his,ac1j)
`
`(emai1,obd) -¥ (asI:ing,ptc) —-> (iE,dlp) ~> (is,act) ~> (going,ptc) —>
`(send,act) —->
`(to.inf) -9 (go,act:) -r (DYIPIPI ~> (h.i.mse1f.obp)
`
`Figure 6a: Syntactic Dependency Chains
`
`+- (act. send)
`
`-I--— (obi,BobI
`
`I+
`
`- (obd, email)
`
`I
`
`IIIIIIIIIIII II
`
`IIIIIIIIIIIII4
`
`,- (. ,trm)
`
`Figure 613: Syntactic Tree
`
`I+
`
`- (adj . an)
`
`I+
`
`— (ptc, asking)
`
`I+
`
`— (0236, him)
`
`I+
`
`: (d1p.ifI
`
`I+
`
`- (acl:,is)
`
`I+
`
`— (0335, he)
`
`I+
`
`— (ptc,go:‘mg)
`I
`+— (inf,to)
`
`I+
`
`+~ (obp, appointment)
`
`I+
`
`- (adj . his}
`
`+- (ohp, himself)
`
`~ !.act:,go)
`
`Page 7 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 7 of 7
`
`US 7,085,708 B2
`
`synrep
`Exp(NL3 -———————;> 8yn£NL)
`
`instep
`~—-—-———+ Tns‘{NL)
`
`fmiint
`-——---—-y Exp{XL}
`
`exmep
`tam:
`Sam(NL} ————'3'i+Mod1xu —————.——+ Envmu 9—-——-—-——>
`
`E
`
`Figure 7a: External Representation 1 Interpretation Schema
`
`Exp(NL)
`
`synrep
`
`fmlint
`tnsrep
`» .S‘yn(NL) -—-—-> Tns1'NL} §-————-> Exp(XL)
`
`P="0t0C0/
`
`um
`pdcod
`Moo‘(XL)xExplXL) ————r BKPIPU —‘—)£——+§Exp£EL)
`
`Figure 7b: Protocol Representation I Interpretation Schema
`
`Page 8 of 22
`
`
`
`US 7,085,708 B2
`
`1
`COMPUTER SYSTEM VVITH NATURAL
`LANGUAGE TO MACHINE LANGUAGE
`TRANSLATOR
`
`This utility application claims the priority date of provi-
`sional application Ser. No. 60/235,165 filed on Sep. 23, 2000
`with the same title and by the same applicant, Keith Manson.
`
`BACKGROUND OF THE INVENTION
`
`invention is directed to a system which
`The present
`translates natural (human) language into an abstract fomial
`la11guage. This formal
`language is explicitly designed to
`serve as a universal template for further translations into a
`comprehensive variety of machine languages which are
`executable in specific operational environments. Extensive
`e orts have been made, many articles have been published,
`and many patents have been issued, all directed toward the
`goal of providing computers with the capacity to understand
`natural (human) language sufficiently well to respond reli-
`ably and accurately to directives issued from human users.
`Many companies and research groups, such as AT&T, IBM,
`and Microsoft, and an assortment of academic institutions,
`are presently working on natural
`language processing V
`(NLP).
`To date, many different approaches l1ave been tried to
`provide a system which effectively converts natural lan-
`guage to a formal language for computer applications. One
`such approach is disclosed in an article published by
`Microsoft Corporation titled “Microsoft Research: Natural
`Language Processing Hits High Gear” dated May 3, 2000.
`The article discloses that Microsoft is heavily focused on a
`database of logical forms, called MindNet
`(TM), and the
`creation of a machine translation application. It is stated that
`Mir1dNet
`is an initiative in an area of research called
`“example-based processing”, whereby a computer processes
`input based on something it has encountered before. The
`MindNet database is created by storing and weighting the
`semantic graphs produced during the analysis of a document
`or collection of documents. The system uses this database to
`find links in meaning between words within a single lan-
`guage or across languages. These stored relationships
`among words give the system a basis for “understanding”,
`thereby allowing the system to respond to natural language
`input. MindNet apparently contains the contents of several
`dictionaries and an encyclopedia to increase its level of
`understanding. Another approach is disclosed in Microsoft
`U.S. Pat. No. 5,966,686. This approach provides a rule-
`based computer system for semantically analyzing natural
`language sentences. The system first transforms an input
`sentence into a syntactic parse tree. Semantic analysis then
`applies three sets of semantic rules to create an initial logical
`forr11 graph from this tree. Additional rules provide seman-
`tically meaningful labels to create additional logical form
`graph models and to unify redundant elements. The final
`logical fonn graph represents the semantic analysis of the
`input sentence.
`Yet another, and apparently more common, approach is
`provided by U.S. Pat. No. 5,895,466, wherein a database
`stores a plurality of answers which are indexed to natural
`language keys. The natural
`language device receives a
`natural language question over the network from a remote
`device and the question is analyzed using a natural language
`understanding system. Based on this analysis, the database
`is then queried and an answer is provided to the remote
`device.
`
`7
`
`2
`Applicant is aware that various other approaches toward
`providing a conversion from natural
`language to some
`machine language have been tried. However, the prior art
`has not provided a truly effective conversion system of this
`sort.
`
`SUMMARY OF TIIE INVENTION
`
`Presented is a system and method for converting or
`translating expressions in a natural language such as English
`into machine executable expressions in a formal language.
`This translation enables a transformation from the syntactic
`structures of a natural
`language into e"ective algebraic
`forms for further exact processing. The invention utilizes
`algorithms employing a reduction of sequences of terms
`defined over an extensible lexicon into formal syntactic and
`semantic structures. This term reduction incorporates both
`syntactic type and semantic context to achieve an effective
`formal representation and interpretation of the meaning
`~ conveyed by any natural language expression.
`The foregoing features and advantages of the present
`invention will be apparent from the following more particu-
`lar description of the invention. The accompanying draw-
`ings, listed herein below, are useful in explaining the inven-
`tion.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 shows the hardware architecture of a computer
`system comprising the natural language converter of the
`present invention;
`FIG. 2 shows the general process and data flow of the
`inventive system;
`FIG. 3 shows a more detailed flow diagram for the
`inventive system;
`FIG. 4a shows the results of virtual type assignment
`applied to a sample text;
`FIG. 4b shows the results of actual type assignment for
`the same text;
`FIG. 5 shows the term reduction sequence for a sample
`text;
`FIG. 6:1 shows the sequence of dependency chains for a
`sample text;
`FIG. 6b shows the associated syntactic tree for the same
`text; and
`FIG. 7a shows the schema of structures and maps
`involved i11 the external interpretation of a text;
`FIG. 7b shows this external
`interpretation schema as
`controlled by a metasemantic protocol.
`
`BRIEF DESCRIPTION OF THE INVENTION
`
`Refer to FIG. 1 for an overview ofthe system architecture.
`As mentioned above, the inventive system, called METASCRIPT
`(TM), provides a method for translating expressions in a
`natural language such as English into machine executable
`expressions. In the embodiment of the system and method to
`be described,
`the user inputs text in a natural
`language
`through some input device to a known computer system
`which may comprise a standalone computer system, a local
`network of computing devices, or a global network such as
`the Internet, using wired land lines, wireless communica-
`tion, or some combination thereof, etc. This computer sys-
`tem includes memory for storing data, and a data processor.
`The text may be entered into the client device or local VDM
`(Video Display Monitor) (1.1) by any suitable means, such
`as direct input via keyboard (1.1.1), voice input via speech
`
`Page 9 of 22
`
`
`
`US 7,085,708 B2
`
`3
`recognition means (an SR system) (l.1.2), or indirect input
`via optical scanning (an OCR system) (1.1.3). The natural
`language text input to the system is passed along the network
`or local bus (1.3) to a server or local CPU (Central Process-
`ing Unit) (1.2) where it is processed in accordance with the
`inventive method and system. This processed output of the
`system is then provided to the system for distribution to the
`original input device (1.1), or to other collateral devices
`(1.4) which may be one or more digital computers, mobile
`devices, etc. The inventive system thus comprises a natural
`language interface to any sufficiently capable digital envi-
`romnent.
`
`Refer now to FIG. 2 for an overview of the process and
`data flow of the inventive system. The invention will be
`subsequently discussed in more detail herein below. Natural
`language text input is entered by the user (2.0) into the
`internal system (2.1) by means ofa text processing module
`(211) which parses the text. The output of the text pro-
`cessing module comprises a parsed sequence of preexpres—
`sions which is entered into the syntactic processing module
`(2.1.2) which provides syntactic type information, estab-
`lishes proper syntactic dependencies between terms in
`expressions, and represents these expressions as complexes
`in a syntactic algebra. The output of the syntactic processing
`module, comprising a sequence of these syntactic com-
`plexes,
`is entered into the semantic processing module
`(21.3) in order to achieve a semantic interpretation of the
`input text. The output of the semantic processing module,
`comprising a formal
`interpretation of the input
`text,
`is
`entered into an external system by means of the external
`processing module
`(2.2.1), which finally provides
`a
`sequence of executable expressions derived from the input
`text for use in a specific operational environment (2.2.2).
`As noted above. the means for providing text input to the
`system, such as through a keyboard, scamier, or speech
`recognition system, are well known in the art and are
`commercially available. Another standard component of the
`present system is a text parser, construed here in an
`extremely narrow sense as a limited process restricted to
`partitioning text strings into syntactic subcomponents such
`as paragraphs, sentences, and words. As such, the text parser
`discussed herein does not provide farther linguistic infor-
`mation such as grammatical types, syntactic dependencies,
`semantic ir11port, etc. Such limited text parsers are standard
`components of any natural language processing system, and
`exceedingly well known in the art. Yet another component in
`the present system which plays a relatively standard role is
`the lexicon or “electronic dictionary”. In general, lexicons
`are also well known in the art and are discussed in many
`patents including U.S. Pat. Nos. 5,37l,807; 5,724,594;
`5,794,050; and 5,966,686. However, the notion and function
`of “virtual” types. which play a significant syntactic catego-
`rization role ir1 the passive specification of lexical terms, and
`hence strongly contribute to the definition of the particular
`lexicon used in the inventive system, are not standard, and
`so require careful description. On the other hand, since text
`input devices, text parsers, and their operation are so well
`known, they will not be further described in detail herein.
`Refer now to FIG. 3, which shows more details of the
`inventive system. The components, modules, and sub1nod—
`ules of the inventive system are enumerated for convenient
`reference so that the operation and application of the system
`and method may be described in detail.
`As mentioned above, natural language text is entered by
`the user (3.0) into the text input submodule (3.1 .1) of the text
`processing module (3.1) via any suitable means including a
`keyboard or a speech recognition system. For the purposes
`
`4
`is simply some
`the user input signal
`of this discussion,
`linguistic data stream which is digitized into a string of
`ASCII characters. This ASCII string is the input text.
`In order to clarify the following discussion, it is helpful to
`note that any natural language text is typically organized into
`a sequence of paragraphs, each of which is a sequence of
`sentences, each of which is a sequence of words, each of
`which is a sequence of characters (alphanumeric symbols).
`All of this nested syntactic structure must be taken into
`account if an effective interpretive analysis is to be achieved.
`The role of the text parser is to detemiine and then present
`these nested sequential structures to the system for further
`processing. Thus in general, the adequate output of the text
`parser is a sequence of sequences of sequences of sequences
`ofASCII characters. This level of generality, however, tends
`to obscure the basic poi11ts of any useful description of the
`inventive system, so a teclmical cor11pro111ise is adopted
`herein, whereby any text
`is considered to comprise a
`sequence of sentences, or more properly, of expressions,
`‘ each of which comprises a sequence of words. Until recog-
`nized by the system as a meaningful unit of linguistic
`analysis, however, any such word in a text is simply treated
`as a partitioned substring of the input text string. Thus the
`proper output of the text parser is considered here to be a
`sequence of sequences of “pretokens”, where a pretoken is
`a text fragment which is a candidate for a word, i.e. anASCII
`(sub)string presented for recognition as a system “token”.
`The system lexicon is a lexicographically ordered list of
`such tokens (with associated type and reference data), and
`recognition by the system of a pretoken as an actual token
`is simply a matter of exact string comparison.
`Accordingly,
`the output of the text parser (3.1.2) is a
`sequence of sequences of pretokens, or sequence of “pre-
`expressions”, which is then passed to the type assignment
`sub111odule (3.2.1.1) of the type association submodule
`(32.1), where syntactic processing is initiated. Each preto-
`ken is checked against the system lexicon (3.2.0) for its
`status as a recognized lexical token. If a pretoken is recog-
`nized, i.e. if the string comprising that pretoken is included
`in the lexicon as an actual token (with associated syntactic
`and semantic data), then it is assigned a lexically associated
`syntactic type. The system determines at decision node
`(3.2.1.2) whether all the pretokens from the entered text
`l1ave been recognized as system tokens. If the determination
`is negative, then as indicated by the "no” connection to the
`lexical insertion submodule (3.2.1.3), the user is given the
`option to add the unrecognized pretokens to the system as
`tokens with associated type a11d reference data, i.e. to insert
`new terms into the lexicon, for further processing. On the
`other hand, if the determination is affirmative,
`then the
`resulting sequence of sequences of lexically typed tokens, or
`sequence of “virtual" expressions, is passed along the “yes"
`connection to the
`type
`contextualization submodule
`(321.4). This submodule initiates a second order type
`assignment which uses the initial (or virtual) lexical type
`assignments as data for a contextual process which may
`reassign these initial types depending on the relative syn-
`tactic roles of the tokens in the virtual expressions being
`processed. Upon complete (re)assign1nent of appropriate
`types to tokens, each virtual expression is promoted to an
`“actual” expression, and each token/type pair becomes a
`fully functional lexical term with associated semantic data.
`Thus the output of the type association submodule (3.2.1)
`of the syntactic processing module (3.2) comprises a
`sequence of (actual) expressions, and is passed to the tenn
`correlation submodule (32.2.1) of the term resolution mod-
`ule (3.2.2). The output of this sub111odule is a sequence of
`
`.
`
`Page 10 of 22
`
`
`
`US 7,085,708 B2
`
`5
`sequences of fully correlated lexical terms, which is then
`entered into the term reduction submodule (3.2.2.2), wherein
`proper syntactic dependencies between terms in an expres-
`sion are established by means of a type reduction matrix.
`Thc output of this submodulc is a sequence of scqucnccs of 5
`reduction links, which is entered into the term inversion
`submodule (3.2.2.3), wherein these reduction links are used
`to construct syntactic trees, each tree representing a pro-
`cessed expression. The resulting sequence of syntactic trees
`is passed to the syntactic representation submodule (3.2.3),
`wherein each expression is then represented as a syntactic
`complex,
`i.e. a (usually composite) term ir1 the syntactic
`algebra.
`Semantic processing (3.3) is initiated hi the semantic
`representation submodule
`(3.31), wherein the
`input
`scqucncc of syntactic complexes from the syntactic process-
`ing module (3.2) is represented as a full semantic complex,
`i.e. a structure of internal objects in the semantic algebra.
`This semantic complex is then passed to the formal repre-
`sentation submodule (3.32), wherein the input semantic
`complex is represented as a formal structure adhering to a
`frmdamental
`transaction paradigm. This formal semantic
`model
`is then corrrbined with the sequence of syntactic
`complexes output from the syntactic processing module to
`form the input
`to thc formal
`intcrprctation submodulc '
`(3.3.3), wherein a sequence of formal expressions is con-
`structed as a11 interpretation of the presented syntactic and
`semantic data.
`
`In addition, tl1e output of the formal representation sub-
`module is passed to the external representation submodule
`(3.4.l) of thc cxtcmal processing module (3.4), wherein a
`specific external representation appropriate for the fomial
`semantic data presented is identified. This external repre-
`sentation is cor11bi11ed with the sequence of formal expres-
`sions output from the formal interpretation submodule to
`form the input to the extemal
`interpretation submodule
`(3.4.2), wherein a sequence of executable expressions is
`constructed accordingly for ultimate processing in the
`appropriate operational environment (3.5).
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`Introduction:
`
`6
`leams as it goes. In addition, any desired level of syntactic
`disambiguation is attainable by increasing the local dimen-
`sionality of the underlying reduction matrix, though this
`feature is part of the underlying algorithm, and therefore
`independent of user modulation.
`It should be noted that 1\/JITASCRIPT is not a speech recog-
`nition system. Instead, it is a fully capable natural language
`interpreter. Specifically, MJETASCRIPT translates natural lan-
`guage expressions into expressions in a formal
`language
`associated with an abstract network protocol. A more
`detailed account of this process follows.
`Notation:
`Standard mathematical notation is used to clarify the
`presentation of certain technical features of the system. In
`particular,
`the following set—theoretical notation appears
`throughout this discussion:
`a) a set is a collection of things, called elements. For
`cxamplc, N:{0,l,2,3,
`.
`.
`.
`}
`is the sct of natural
`numbers.
`Note: In general, a set A is most conveniently deter-
`mined by son1e property P of its elements, indicated
`by use of so-called “set-builder notation” as A:{x|P
`(x)}fihe set of things x satisfying property P.
`b) the expression ‘xeA’ indicates that the thing x is an
`element of the set A
`c) the expression ‘AQB’ indicates that the set A is a
`subset of the set E, ie. that every element of A is an
`element of B as well
`d) a map (or function) is a relation between two sets A,B
`such that each element
`ir1 A is assigned a unique
`element in B. The expression ‘f: A—>B’ indicates that f
`is a map from the set A to the set B, i.e. f assigns a
`unique element Ff(x)eB to each element xeA. The
`composition of maps f: A—>B and g: B—>C on sets
`A,B,C is the map h:gof: A+C. defined sucl1 that
`h(x):g(f(x))eC for any XEA.
`Note: A program is a function which maps input onto
`output in an effective manner,
`i.e. by means of a finite,
`discrete, deterministic procedure;
`ir1 fact, any process or
`procedure is effective precisely to the extent
`that
`it
`is
`cxccutablc as a program of this sort.
`e) for any sets A,B, the Cartesian product A><B consists of
`all pairs (x,y) such that xeA a11d yeB, i.e. A><B:{(x,y)
`|xeA, yeB}
`1‘) for any sets A,B the union AUB is the set consisting of
`all elements x such that xeA or xeB, i.e. AUB:{x|xeA
`or xeB}: for any collection C:{A|0 éj én} of sets
`for
`sonre neN, the overall union UC is the set consisting of
`unions over all
`sets A],
`i.e. UC:LJ{A].|0§j§n}:
`AOU .
`.
`. UA"
`g) for any algebras AB and representation f: AeB, the
`correlated tensor product AJB is the distinguished sub-
`set of A><B which consists of all pairs (x,f(x)) for xeA,
`i.e. AfB:{(x,f(x))|xeA}%he graph of f; for an implicit
`representation, the map subscript may be omitted, i.e.
`AB:A/B for some implicit f AQB
`l1) for any set A, Seq(A) is the set of finite sequences from
`A, i.e. Seq(A):{(aO, .
`.
`.
`,an)|a].eA, neN}
`60 Definitions:
`language: a structure over the following components:
`a) alphabet: a set of basic symbols
`b) punctuation symbols: a set of basic symbols disjoint
`from the alphabet
`c) words: admissible sequences of basic symbols
`(1) punctuations: admissible sequences of punctuation
`symbols
`
`MJETASCRIPT is a translation fron1 a natural language into ar1
`executable formal language. This translation is essentially a
`transfomiation from thc syntactic structures of natural lan-
`guage into efifective algebraic forms suitable for further
`processing. The formal semantics which finally determines 7
`the ensuing interpretations and executions of these formal
`expressions in external operational environments is object-
`oriented.
`
`The fundamental algorithm upon which MJETASCRIFT is
`based cmploys a reduction to formal syntactic structurcs
`over temis defined in an extensible lexicon. This term
`reduction incorporates both syntactic type and semantic
`context to achieve ar1 effective formal representation and
`interpretation of the meaning conveyed by any natural
`language expression. Extensibility of the lexicon under
`specific user direction provides the capacity for the system
`to expand its knowledge of vocabulary and usage, and
`consequently, oifers an effective mechanism under user
`control for cstablishing dcfinitc incrcmcntal cnhanccmcnts
`to the system’s linguistic capabilities, hence substantially
`increasing the system’s familiarity with (and competence in)
`particular operational environments. Put simply, tl1e system
`
`Page 11 of 22
`
`
`
`US 7,085,708 B2
`
`7
`e) expressions: admissible sequences of words and/or
`punctuations
`f) sentences: complete expressions
`g) syntax: a specification of which
`sequences of basic symbols are admissible as words
`sequences of punctuation symbols are admissible as
`punctuations
`sequences of words and/or punctuations are admissible
`as expressions
`expressions are admissible as sentences
`g) semantics: a scheme of interpretation over words
`whereby expressions acquire meaning with respect to
`certain external structures
`A number of languages enter into this discussion:
`1) natural
`language: any of the human languages in
`current use, e.g. English, each characterized by an
`informal, and hence notoriously ambiguous, syntax and
`semantics
`
`2) formal language: a highly structured language, usually
`mathematical
`in origin and use, characterized by a
`uniquely readable, recursive syntax and an extensional,
`usually first-order semantics; i11 short, a language for
`which the syntax and semantics is effectively unam-
`biguous
`3) object language: a formal language which is interpret-
`able relative to a class of extensional structures, i.e. a
`formal language with an object-oriented semantics
`4) protocol language: a formal language which mediates
`transactions between addressable nodes on a network
`5) executable language: a formal, programrna ble language
`which encodes instructions directly implementable by a
`suitably capable machine such as a computer
`system: the integrated process which manifests MJETASCRITT,
`and which may be implemented as software running on
`any programmable device
`string: a sequence of ASCll characters
`text: a string presented to the system as the fundamental unit
`of initial input
`parser: a process which partitions texts into sequences of
`sequences of substrings
`preexpression: a sequence of substrings of some text, dis-
`tinguished as a unit of syntactic processing by the text
`parser
`lexicographically
`indexed,
`system specific,
`a
`lexicon:
`ordered list of designated strings, called tokens, each of
`which is associated with certain syntactic and semantic
`information; in particular, each token is associated with a
`lexical type, which may be virtual (syntactically ambigu-
`ous) or actual syntactically unambiguous); furthermore,
`each token which is associated with an actual type is also
`associated with a lexical reference, which provides basic
`semantic information
`Note: A single string may serve as a token with multiple
`entries, associated with a number (including 1) of virtual
`types and a number of actual types, reflccting that token ’s
`multiple syntactic roles, e.g. as a verb and an object, or an
`object and an adjective, etc. Although there is considerable
`variability in such syntactic multiplicities among lexical
`entries, it is still the case that every token is associated with
`at least one actual type.
`token: a string recognized by the system in the sense that it
`is included in the system lexicon
`type: a syntactic category used to organize semantically
`similar tokens; there are three sorts:
`1) virtual: a lexical type which is ambiguous
`2) actual: a lexical type which is not ambiguous
`
`8
`3) reduced: a syntactic type which has specific semantic
`functionality upon tenn reduction
`terr11: there are six sorts:
`1) lexical: a token/type pair in the lexicon with associated
`reference data
`2) reduced: a token/type pair for which the type is reduced
`3) syntactic: an element of the syntactic algebra associ-
`ated witl1 a language
`4) semantic: an element of the semantic object algebra
`associated with a language
`5) tensored: an element of the semantic tensor algebra
`associated with a language
`6) formal: an interpretable element of a formal language
`reference: there are two sorts:
`1) internal: an object with which a term is associated,
`either in the lexicon or in the semantic object algebra
`2) external: an object with which an ir1ten1al semantic
`object is associated in some operational environment
`expression: a sequence of tokens
`‘ sentence: a syntactically correct, semantically complete
`expression
`chain: a linearly ordered set of nodes (usually compri