(12) United States Patent
Weber

(10) Patent No.: US 6,434,524 B1
(45) Date of Patent: Aug. 13, 2002

(54) OBJECT INTERACTIVE USER INTERFACE USING SPEECH RECOGNITION AND NATURAL LANGUAGE PROCESSING

(75) Inventor: Dean Weber, San Diego, CA (US)

(73) Assignee: One Voice Technologies, Inc., San Diego, CA (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/412,929

(22) Filed: Oct. 5, 1999

Related U.S. Application Data

(63) Continuation-in-part of application No. 09/166,199, filed on Oct. 5, 1998, which is a continuation-in-part of application No. 09/150,459, filed on Sep. 9, 1998.

(51) Int. Cl.7 ........ G10L 15/18
(52) U.S. Cl. ........ 704/257; 704/275; 704/10; 704/9; 707/3
(58) Field of Search ........ 704/9, 10, 251, 252, 253, 257, 235, 270, 275; 707/4, 3

(56) References Cited

U.S. PATENT DOCUMENTS

4,783,803 A     11/1988  Baker et al. ........ 381/42
4,887,212 A     12/1989  Zamora et al. ........ 364/419
5,311,429 A  *   5/1994  Tominaga
5,799,279 A  *   8/1998  Gould et al. ........ 704/231
5,991,721 A  *  11/1999  Asano et al. ........ 704/239
6,112,174 A  *   8/2000  Wakisaka et al. ........ 701/117
6,144,938 A  *  11/2000  Surace et al. ........ 704/257
6,188,977 B1 *   2/2001  Hirota ........ 704/9

FOREIGN PATENT DOCUMENTS

EP  0 837 962 A2 *  4/1998 ........ G10L/5/06

OTHER PUBLICATIONS

Wyard et al., "Spoken Language Systems - Beyond Prompt and Response," BT Technology Journal, vol. 14, No. 1, Jan. 1996, pp. 187-207.*
"Approximate Word-Spotting Method for Constrained Grammars," IBM Technical Disclosure Bulletin, vol. 37, No. 10, Oct. 1994, p. 385.*

* cited by examiner

Primary Examiner - William Korzuch
Assistant Examiner - Abul K. Azad
(74) Attorney, Agent, or Firm - James Y. C. Sze; Pillsbury Winthrop LLP

(57) ABSTRACT

A system and method for interacting with objects, via a computer using utterances, speech processing and natural language processing. A Data Definition File relates networked objects and a speech processor. The Data Definition File encompasses a memory structure relating the objects, including grammar files, a context-specific dictation model, and a natural language processor. The speech processor searches a first grammar file for a matching phrase for the utterance, and searches a second grammar file for the matching phrase if the matching phrase is not found in the first grammar file. The system also includes a natural language processor for searching a database for a matching entry for the matching phrase; and an application interface for performing an action associated with the matching entry if the matching entry is found in the database. The system utilizes context-specific grammars and dictation models, thereby enhancing speech recognition and natural language processing efficiency. Additionally, for each user the system adaptively and interactively "learns" words and phrases, and their associated meanings, storing the adaptive updates into user voice profiles. Because the user voice profiles can be stored locally or remotely, users can access the adaptively learned words and phrases at various locations.

76 Claims, 12 Drawing Sheets
[FIG. 1 (Sheet 1 of 12): functional block diagram of computer system 100. CPU 102 connects to display 104, manual input 106, storage medium 108, microphone 110, speaker 112, data input port 114, and network interface 116.]

[FIG. 5 (Sheet 1 of 12): Data Definition File (DDF) 500, containing network object table 510, context-specific grammar 212, context-specific dictation model 217, and NLP database 218.]
[FIG. 2 (Sheet 2 of 12): expanded functional block diagram of CPU 102 and storage medium 108. CPU 102 contains speech recognition processor 200, data processor 201, application interface 220, and natural language processor 202, which comprises variable replacer 204, string formatter 206, word weighter 208, boolean tester 210, pronoun replacer 211, and search engine 215. Storage medium 108 holds context-specific grammar 212, general grammar 214, dictation grammar 216, context-specific dictation model 217, and NLP database 218.]
[FIG. 3A (Sheet 3 of 12): flowchart. Provide utterance to speech processor 300; capture manually entered words 301; search context-specific grammar 304; if no match, enable general grammar 308 and search general grammar 310; if still no match, enable dictation grammar 314 and search dictation grammar 316; enable context-specific dictation model 317; on failure, issue error message 320; on a match, prepend context to matching phrase, access NLP database, and direct application interface to take associated action 322.]
[FIG. 3B (Sheet 4 of 12): flowchart, continued. Format phrase for NLP analysis 328; replace word-variables with associated wildcard function 330; pronoun substitution 332; weight individual words 334; search NLP database 336; generate confidence value for possible matches 338; discard "noise" words from phrase 344; issue error message 342 on failure.]
[FIG. 3C (Sheet 5 of 12): flowchart, continued. Retrieve non-noise word requirement from NLP database for highest-confidence entry 346; if the requirement is not met, prompt user whether highest-confidence entry was meant 354; take action(s) associated with highest-confidence entry 350; enable context-specific grammar for context associated with highest-confidence entry 352.]

[FIG. 3D (Sheet 5 of 12): flowchart, continued. Retrieve context for highest-confidence entry 358; prompt user for information using context-based interactive dialog 360; update entries in NLP database, general grammar, and context-specific grammar 362.]
[FIG. 4 (Sheet 6 of 12): selected columns of the NLP database 218. Column reference numerals: phrase 400, required words 402, context/subcontext 404, action 1 408, action 2 410; rows 412A-n.]

Phrase                                   Required words  Context/Subcontext  Action 1                     Action 2
What movies are playing at $time         3               movies              Access movie web site        N/A
What is the price of IBM stock on $date  4               stocks              Access stock price web site  N/A
Sell IBM stock at $dollars               4               stocks              Access stock price web site  N/A
What is the weather at $location         3               weather             Access weather web site      N/A
What time is it                          2               time                N/A                          Text-to-Speech of Time
Show me the news                         2               news                Access news web site         Display Images
How do I format this paragraph           2               Word Processor      Locate Word Processor Help   Format Paragraph Help
How do I insert a table                  2               Spreadsheet         Locate Spreadsheet Help      Insert Table Help

FIG. 4
[FIG. 6 (Sheet 7 of 12): selected columns of an exemplary object table 510, with columns 520-534 (object, TTS Flag, Text Speech, Use Grammar, Append Grammar, Is Yes/No?, Do Yes, Do No) and rows 540A-n. Example objects include http://www.conversit.com, http://www.conversit.com/news, http://www.conversit.com/products, http://www.conversit.com/ViaVoice, http://www.conversit.com/search, and help topics such as "WPHelp: Format: Paragraph" and "SpreadsheetHelp: Insert: Table"; each row carries a TTS flag and text speech such as "Hello, welcome to ...", "Would you like to learn ...", "All natural language ...", "Via Voice is ...", "To format paragraphs ...", or "Inserting Tables is easy ...".]
[FIG. 7A (Sheet 8 of 12): flowchart 600. Provide object location to program 602; retrieve object 606; display object 610; on failure, display an error message.]
[FIG. 7B (Sheet 9 of 12): flowchart, continued. Determine whether the DDF file/information is encoded in the object information; resolve DDF file location 616; retrieve DDF file 626; replace any prior DDF file with the newly obtained DDF file 630; extract object table, any context-specific grammar files, NLP database, and context-specific dictation models 632; deactivate existing (if any) object table, NLP database, context-specific grammar, and context-specific dictation model 622; otherwise, treat the object as a non-voice-activated object, using only standard grammar files 624.]
[FIG. 7C (Sheet 10 of 12): flowchart, continued. Read object table 634; play spoken statement 640; enable context-specific dictation model 650; enable context-specific grammar 644; enable NLP database 646.]
[FIG. 8 (Sheet 11 of 12): computer system 100 connecting via the Internet 700 to web-sites 710, other computers 720, and storage media 108.]

[FIG. 9 (Sheet 11 of 12): exemplary user voice profile 800, containing general grammar additions 214a, context-specific grammar additions 212a, and NLP database additions 218a.]
[FIG. 10 (Sheet 12 of 12): flowchart. Query user for login ID and password 900; load local user voice profile 915, or load travelling user voice profile 925; enable user voice profile 940; if no user voice profile is available 945, use standard (non-custom user) processing.]
OBJECT INTERACTIVE USER INTERFACE USING SPEECH RECOGNITION AND NATURAL LANGUAGE PROCESSING

The aspects of the present invention relate to speech recognition for an object-based computer user interface. More specifically, the embodiments of the present invention relate to a novel method and system for user interaction with a computer using speech recognition and natural language processing. This application is a continuation-in-part of U.S. patent application Ser. No. 09/166,198, entitled "Network Interactive User Interface Using Speech Recognition and Natural Language Processing," filed Oct. 5, 1998, a continuation-in-part of U.S. patent application Ser. No. 09/150,459, entitled "Interactive User Interface Using Speech Recognition and Natural Language Processing," filed Sep. 9, 1998. This application is additionally related to PCT/US99/20445 and PCT/US99/20447, both filed Sep. 9, 1999, corresponding to U.S. patent application Ser. Nos. 09/150,459 and 09/166,198, respectively.

BACKGROUND

Description of the Related Art

As computers have become more prevalent, it has become clear that many people have great difficulty understanding and communicating with computers. A user must often learn archaic commands and non-intuitive procedures in order to operate the computer. For example, most personal computers use windows-based operating systems that are largely menu-driven. This requires that the user learn what menu commands or sequence of commands produce the desired results.

Furthermore, traditional interaction with a computer is often slowed by manual input devices such as keyboards or mice. Many computer users are not fast typists. As a result, much time is spent communicating commands and words to the computer through these manual input devices. It is becoming clear that an easier, faster and more intuitive method of communicating with computers and networked objects, such as web-sites, is needed.

One proposed method of computer interaction is speech recognition. Speech recognition involves software and hardware that act together to audibly detect human speech and translate the detected speech into a string of words. As is known in the art, speech recognition works by breaking down sounds the hardware detects into smaller non-divisible sounds called phonemes. Phonemes are distinct units of sound. For example, the word "those" is made up of three phonemes: the first is the "th" sound, the second is the "o" sound, and the third is the "s" sound. The speech recognition software attempts to match the detected phonemes with known words from a stored dictionary. An example of a speech recognition system is given in U.S. Pat. No. 4,783,803, entitled "SPEECH RECOGNITION APPARATUS AND METHOD", issued Nov. 8, 1988, assigned to Dragon Systems, Incorporated. Presently, there are many commercially available speech recognition software packages available from such companies as Dragon Systems, Inc. and International Business Machines Corporation.

One limitation of these speech recognition software packages or systems is that they typically only perform command and control or dictation functions. Thus, the user is still required to learn a vocabulary of commands in order to operate the computer.

A proposed enhancement to these speech recognition systems is to process the detected words using a natural language processing system. Natural language processing generally involves determining a conceptual "meaning" (e.g., what meaning the speaker intended to convey) of the detected words by analyzing their grammatical relationship and relative context. For example, U.S. Pat. No. 4,887,212, entitled "PARSER FOR NATURAL LANGUAGE TEXT", issued Dec. 12, 1989, assigned to International Business Machines Corporation, teaches a method of parsing an input stream of words by using word isolation, morphological analysis, dictionary look-up and grammar analysis.

Natural language processing used in concert with speech recognition provides a powerful tool for operating a computer using spoken words rather than manual input such as a keyboard or mouse. However, one drawback of a conventional natural language processing system is that it may fail to determine the correct "meaning" of the words detected by the speech recognition system. In such a case, the user is typically required to recompose or restate the phrase, with the hope that the natural language processing system will determine the correct "meaning" on subsequent attempts. Clearly, this may lead to substantial delays as the user is required to restate the entire sentence or command. Another drawback of conventional systems is that the processing time required for the speech recognition can be prohibitively long. This is primarily due to the finite speed of the processing resources as compared with the large amount of information to be processed. For example, in many conventional speech recognition programs, the time required to recognize the utterance is long due to the size of the dictionary file being searched.

An additional drawback of conventional speech recognition and natural language processing systems is that they are not interactive, and thus are unable to cope with new situations. When a computer system encounters unknown or new networked objects, new relationships between the computer and the objects are formed. Conventional speech recognition and natural language processing systems are unable to cope with the situations that result from the new relationships posed by previously unknown networked objects. As a result, a conversational-style interaction with the computer is not possible. The user is required to communicate complete concepts to the computer. The user is not able to speak in sentence fragments because the meaning of these sentence fragments (which is dependent on the meaning of previous utterances) will be lost.

Another drawback of conventional speech recognition and natural language processing systems is that once a user successfully "trains" a computer system to recognize the user's speech and voice commands, the user cannot easily move to another computer without having to undergo the process of training the new computer. As a result, changing a user's computer workstation or location results in wasted time, as the user must re-train the new computer to the user's speech habits and voice commands.

SUMMARY

The embodiments of the present invention include a novel and improved system and method for interacting with a computer using utterances, speech processing and natural language processing. Generally, the system comprises a speech processor for searching a first grammar file for a matching phrase for the utterance, and for searching a second grammar file for the matching phrase if the matching phrase is not found in the first grammar file. The system also includes a natural language processor for searching a database for a matching entry for the matching phrase; and an
application interface for performing an action associated with the matching entry if the matching entry is found in the database.

In one embodiment, the natural language processor updates at least one of the database, the first grammar file and the second grammar file with the matching phrase if the matching entry is not found in the database.
The first grammar file is a context-specific grammar file. A context-specific grammar file is one that contains words and phrases that are highly relevant to a specific subject. The second grammar file is a general grammar file. A general grammar file is one that contains words and phrases which do not need to be interpreted in light of a context. That is to say, the words and phrases in the general grammar file do not belong to any parent context. By searching the context-specific grammar file before searching the general grammar file, the present invention allows the user to communicate with the computer using a more conversational style, wherein the words spoken, if found in the context-specific grammar file, are interpreted in light of the subject matter most recently discussed.

In a further aspect of the present invention, the speech processor searches a dictation grammar for the matching phrase if the matching phrase is not found in the general grammar file. The dictation grammar is a large vocabulary of general words and phrases. By searching the context-specific and general grammars first, it is expected that the speech recognition time will be greatly reduced due to the context-specific and general grammars being physically smaller files than the dictation grammar.
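The tiered search order described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the grammars here are plain phrase lists and the match test is simple string equality, standing in for the BNF grammar machinery described later in the specification.

```python
# Minimal sketch of the tiered grammar search: the context-specific grammar
# is searched first, then the general grammar, then the dictation grammar.
# Real grammars would be BNF files, not flat phrase lists.

def search_grammar(grammar, utterance):
    """Return the first phrase in the grammar matching the utterance, or None."""
    for phrase in grammar:
        if phrase == utterance.lower():
            return phrase
    return None

def recognize(utterance, context_grammar, general_grammar, dictation_grammar):
    # Smaller, more relevant grammars are searched first, which is what
    # keeps recognition time low in the common case.
    for grammar in (context_grammar, general_grammar, dictation_grammar):
        match = search_grammar(grammar, utterance)
        if match is not None:
            return match
    return None  # fall through to the context-specific dictation model

context = ["what is the weather"]
general = ["what time is it"]
dictation = ["show me the news"]
print(recognize("What time is it", context, general, dictation))
```

Because the context-specific grammar is consulted first, a phrase that appears in both grammars is always interpreted in light of the current context.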
In another aspect of the present invention, the speech processor searches a context-specific dictation model for the matching phrase if the matching phrase is not found within the dictation grammar. A context-specific dictation model is a model that indicates the relationship between words in a vocabulary. The speech processor uses this to help decode the meaning of related words in an utterance.
In another aspect of the present invention, the natural language processor replaces at least one word in the matching phrase prior to searching the database. This may be accomplished by a variable replacer in the natural language processor for substituting a wildcard for the at least one word in the matching phrase. By substituting wildcards for certain words (called "word-variables") in the phrase, the number of entries in the database can be significantly reduced. Additionally, a pronoun substituter in the natural language processor may substitute a proper name for pronouns in the matching phrase, allowing user-specific facts to be stored in the database.
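A rough sketch of the variable replacer and pronoun substituter follows. The word-variable markers ($time, $location) borrow the notation of FIG. 4; the regular expressions and the pronoun-to-name mapping are illustrative assumptions, not details from the patent.

```python
import re

# Sketch of word-variable and pronoun substitution. One database entry
# such as "what is the weather at $location" then covers every concrete
# place name, shrinking the database considerably.

WORD_VARIABLES = {
    r"\b\d{1,2}:\d{2}\b": "$time",           # e.g. "7:30"
    r"\b(?:boston|san diego)\b": "$location", # hypothetical place list
}

def replace_word_variables(phrase):
    """Substitute a wildcard for each recognized word-variable."""
    out = phrase.lower()
    for pattern, wildcard in WORD_VARIABLES.items():
        out = re.sub(pattern, wildcard, out)
    return out

def replace_pronouns(phrase, user_name):
    """Substitute the user's proper name for a first-person pronoun, so
    user-specific facts can be stored under that name."""
    return re.sub(r"\bmy\b", f"{user_name}'s", phrase, flags=re.IGNORECASE)

print(replace_word_variables("What is the weather at Boston"))
print(replace_pronouns("Where is my meeting", "Dean"))
```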
In another aspect, a string formatter text formats the matching phrase prior to searching the database. Also, a word weighter weights individual words in the matching phrase according to a relative significance of the individual words prior to searching the database. These acts allow for faster, more accurate searching of the database.
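The word weighter can be sketched as below. The stop-word list and the particular weight values are invented for illustration; the patent specifies only that weights reflect relative significance.

```python
# Sketch of the word weighter: each word in the formatted phrase is scored
# by relative significance before the database search, so common "noise"
# words do not dominate the match.

STOP_WORDS = {"the", "is", "a", "at", "what", "of"}

def weight_words(phrase):
    """Return (word, weight) pairs; noise words get a low weight."""
    weights = []
    for word in phrase.lower().split():
        weight = 0.1 if word in STOP_WORDS else 1.0
        weights.append((word, weight))
    return weights

print(weight_words("What is the weather at $location"))
```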
A search engine in the natural language processor generates a confidence value for the matching entry. The natural language processor compares the confidence value with a threshold value. A boolean tester determines whether a required number of words from the matching phrase are present in the matching entry. This boolean testing serves as a verification of the results returned by the search engine.

In order to clear up ambiguities, the natural language processor prompts the user whether the matching entry is a correct interpretation of the utterance if the required number of words from the matching phrase are not present in the matching entry. The natural language processor also prompts the user for additional information if the matching entry is not a correct interpretation of the utterance. At least one of the database, the first grammar file and the second grammar file are updated with the additional information. In this way, the present invention adaptively "learns" the meaning of additional utterances, thereby enhancing the efficiency of the user interface.
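The confidence check and the boolean "required words" test might look like the following. The overlap-based scoring formula is an illustrative assumption; the patent does not prescribe one.

```python
# Sketch of the search engine's confidence value and the boolean tester
# that verifies a candidate NLP database entry against the phrase.

def confidence(entry_words, phrase_words):
    """Fraction of the entry's words present in the phrase (assumed metric)."""
    if not entry_words:
        return 0.0
    return len(entry_words & phrase_words) / len(entry_words)

def required_words_present(phrase_words, entry_words, required):
    """Boolean test: at least `required` words from the phrase must
    appear in the matching entry, verifying the search result."""
    return len(phrase_words & entry_words) >= required

entry = {"weather", "$location"}
phrase = {"weather", "$location", "tomorrow"}
print(confidence(entry, phrase))                          # 1.0
print(required_words_present(phrase, entry, required=2))  # True
```

When the boolean test fails, the system would fall back to prompting the user, as described above.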
The speech processor will enable and search a context-specific grammar associated with the matching entry for a subsequent matching phrase for a subsequent utterance. This ensures that the most relevant words and phrases will be searched first, thereby decreasing speech recognition times.
Generally, the embodiments include a method to update a computer for voice interaction with an object, such as a help file or web page. Initially, an object table, which associates the object with the voice interaction system, is transferred to the computer over a network. The location of the object table can be embedded within the object, at a specific internet web-site, or at a consolidated location that stores object tables for multiple objects. The object table is searched for an entry matching the object. The matching entry may result in an action being performed, such as text speech being voiced through a speaker, a context-specific grammar file being used, or a natural language processor database being used. The object table may be part of a dialog definition file. Dialog definition files may also include a context-specific grammar, entries for a natural language processor database, a context-specific dictation model, or any combination thereof.
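The object-table lookup described above can be sketched as a simple mapping from objects to actions. The URLs, grammar names, and text strings here are hypothetical stand-ins for the kinds of entries shown in FIG. 6.

```python
# Sketch of the object-table lookup: each row ties a networked object
# (here, a URL) to actions such as speaking a text string or enabling a
# context-specific grammar file.

OBJECT_TABLE = {
    "http://example.com/products": {
        "text_speech": "All natural language ...",
        "context_grammar": "products.bnf",
    },
    "http://example.com/news": {
        "text_speech": "Would you like to learn ...",
        "context_grammar": None,
    },
}

def handle_object(url, speak, enable_grammar):
    """Search the object table for the entry matching the object and
    perform the actions it specifies."""
    entry = OBJECT_TABLE.get(url)
    if entry is None:
        return False  # not voice-activated; use only standard grammars
    if entry["text_speech"]:
        speak(entry["text_speech"])
    if entry["context_grammar"]:
        enable_grammar(entry["context_grammar"])
    return True

spoken = []
handle_object("http://example.com/news", spoken.append, lambda g: None)
print(spoken)
```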
In another aspect of the present invention, a network interface transfers a dialog definition file from over the network. The dialog definition file contains an object table. A data processor searches the object table for a table entry that matches the object. Once this matching table entry is found, an application interface performs an action specified by the matching entry.

In another aspect of the present invention, the dialog definition file associated with a network is located, and then read. The dialog definition file could be read from a variety of locations, such as a web-site, storage media, or a location that stores dialog definition files for multiple objects. An object table, contained within the dialog definition file, is searched to find a table entry matching the object. The matching entry defines an action associated with the object, and the action is then performed by the system. In addition to an object table, the dialog definition file may contain a context-specific grammar, entries for a natural language processor database, a context-specific dictation model, or any combination thereof.
BRIEF DESCRIPTION OF THE DRAWINGS

The embodiments of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify correspondingly throughout and wherein:

FIG. 1 is a functional block diagram of an exemplary computer system embodiment;

FIG. 2 is an expanded functional block diagram of the CPU 102 and storage medium 108 of the computer system embodiment of FIG. 1;

FIGS. 3A-3D are a flowchart of a method embodiment of providing interactive speech recognition and natural language processing to a computer;

FIG. 4 is a diagram of selected columns of an exemplary natural language processing (NLP) database embodiment of the present invention;
FIG. 5 is a diagram of an exemplary Database Definition File (DDF) according to an embodiment of the present invention;

FIG. 6 is a diagram of selected columns of an exemplary object table;

FIGS. 7A-7C are a flowchart of a method embodiment of the present invention, illustrating the linking of interactive speech recognition and natural language processing to a networked object, such as a web-page;

FIG. 8 is a diagram depicting a computer system connecting to other computers, storage media, and web-sites via the Internet;

FIG. 9 is a diagram of an exemplary user voice profile according to an embodiment of the present invention; and

FIG. 10 is a flowchart of a method embodiment of the present invention, illustrating the retrieval and enabling of an individual's user voice profile during login at a computer workstation.

DETAILED DESCRIPTION

The embodiments of the present invention will now be disclosed with reference to a functional block diagram of an exemplary computer system 100 of FIG. 1, constructed and operative in accordance with an embodiment of the present invention. In FIG. 1, computer system 100 includes a central processing unit (CPU) 102. The CPU 102 may be any general purpose microprocessor or microcontroller as is known in the art, appropriately programmed to perform the method embodiment described herein with reference to FIGS. 3A-3D. For example, CPU 102 may be a conventional microprocessor such as the Pentium II processor manufactured by Intel Corporation or the like.

CPU 102 communicates with a plurality of peripheral equipment, including a display 104, manual input 106, storage medium 108, microphone 110, speaker 112, data input port 114 and network interface 116. Display 104 may be a visual display such as a CRT, LCD screen, touch-sensitive screen, or other monitors as are known in the art for visually displaying images and text to a user. Manual input 106 may be a conventional keyboard, keypad, mouse, trackball, or other input device as is known in the art for the manual input of data. Storage medium 108 may be a conventional read/write memory such as a magnetic disk drive, floppy disk drive, CD-ROM drive, silicon memory or other memory device as is known in the art for storing and retrieving data. Significantly, storage medium 108 may be remotely located from CPU 102, and be connected to CPU 102 via a network such as a local area network (LAN), a wide area network (WAN), or the Internet. Microphone 110 may be any suitable microphone as is known in the art for providing audio signals to CPU 102. Speaker 112 may be any suitable speaker as is known in the art for reproducing audio signals from CPU 102. It is understood that microphone 110 and speaker 112 may include appropriate digital-to-analog and analog-to-digital conversion circuitry as appropriate. Data input port 114 may be any data port as is known in the art for interfacing with an external accessory using a data protocol such as RS-232, Universal Serial Bus, or the like. Network interface 116 may be any interface as known in the art for communicating or transferring files across a computer network; examples of such networks include TCP/IP, ethernet, or token ring networks. In addition, on some systems, a network interface 116 may consist of a modem connected to the data input port 114.

Thus, FIG. 1 illustrates the functional elements of a computer system 100. Each of the elements of computer system 100 may be suitable off-the-shelf components as described above. The embodiments of the present invention provide a method and system for human interaction with the computer system 100 using speech.

As shown in FIG. 8, constructed and operative in accordance with an embodiment of the present invention, the computer system 100 may be connected to the Internet 700, a collection of computer networks. To connect to the Internet 700, computer system 100 may use a network interface 116, a modem connected to the data input port 114, or any other method known in the art. Web-sites 710, other computers 720, and storage media 108 may also be connected to the Internet through such methods known in the art.

Turning now to FIG. 2, FIG. 2 illustrates an expanded functional block diagram of CPU 102 and storage medium 108 constructed and operative in accordance with an embodiment of the present invention. CPU 102 includes speech recognition processor 200, natural language processor 202 and application interface 220. Natural language processor 202 further includes variable replacer 204, string formatter 206, word weighter 208, boolean tester 210, pronoun replacer 211, and search engine 215. Storage medium 108 includes a plurality of context-specific grammar files 212, general grammar file 214, dictation grammar 216, and natural language processor (NLP) database 218. In one embodiment, the grammar files 212, 214, and 216 are Backus-Naur Form (BNF) files, which describe the structure of the language spoken by the user. BNF files are well-known in the art for describing the structure of language, and details of BNF files will therefore not be discussed herein. One advantage of BNF files is that hierarchical tree-like structures may be used to describe phrases or word sequences, without the need to explicitly recite all combinations of these word sequences. Thus, the use of BNF files in the embodiment minimizes the physical sizes of the files 212, 214, and 216 in the storage medium 108, increasing the speed at which these files can be enabled and searched as described below. However, in alternate embodiments, other file structures are used.

The context-specific dictation model 217 is an optional file that contains specific models to improve dictation accuracy. These models enable users to specify word orders and word models. The models accomplish this by describing words and their relationship to other words, thus determining word meaning by contextual interpretation in a specific field or topic. Take, for example, the phrase "650 megahertz microprocessor computer." A context-specific dictation model 217 for computers may indicate the likelihood of the word "microprocessor" appearing with "computer," and that a number, such as "650," is likely to be found near the word "megahertz." By interpreting the context of the words, via a context-specific dictation model 217, a speech recognition processor would analyze the phrase, interpret a single object, i.e. the computer, and realize that "650 megahertz microprocessor" are adjectives or traits describing the type of computer.

Topics for context-specific dictation models 217 vary widely, and may include any topic area of interest to a user, both broad and narrow. Broad topics may include: history, law, medicine, science, technology, or computers. Specialized topics, such as a particular field of literature encountered at a book retailer's web-site, are also possible. Such a context-specific dictation model 217 may contain text for author and title information, for example.

Finally, the context-specific dictation model 217 format relies upon the underlying speech recognition processor 200, and is specific to each type of speech recognition processor 200.
The operation and interaction of these functional elements of FIG. 2 will be described with reference to the flowchart of FIGS. 3A-3D, constructed and operative in accordance with an embodiment of the present invention. In FIG. 3A, the flow begins at
