Weber

US006434524B1

(10) Patent No.: US 6,434,524 B1
(45) Date of Patent: Aug. 13, 2002

(54) OBJECT INTERACTIVE USER INTERFACE USING SPEECH RECOGNITION AND NATURAL LANGUAGE PROCESSING
(75) Inventor: Dean Weber, San Diego, CA (US)

(73) Assignee: One Voice Technologies, Inc., San Diego, CA (US)

Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/412,929

(22) Filed: Oct. 5, 1999

Related U.S. Application Data

(63) Continuation-in-part of application No. 09/166,198, filed on Oct. 5, 1998, which is a continuation-in-part of application No. 09/150,459, filed on Sep. 9, 1998.

(51) Int. Cl. .................................................. G10L 15/18
(52) U.S. Cl. .................... 704/257; 704/275; 704/10; 704/9; 707/3
(58) Field of Search .................... 704/9, 10, 251, 252, 253, 257, 235, 270, 275; 707/3, 4

(56) References Cited

U.S. PATENT DOCUMENTS
4,783,803 A    11/1988  Baker et al. .................. 381/42
4,887,212 A    12/1989  Zamora et al. ............... 364/419
5,311,429 A *   5/1994  Tominaga
5,799,279 A *   8/1998  Gould et al. ................ 704/231
5,991,721 A *  11/1999  Asano et al. ................ 704/239
6,112,174 A *   8/2000  Wakisaka et al. ............. 701/117
6,144,938 A *  11/2000  Surace et al. ............... 704/257
6,188,977 B1 *  2/2001  Hirota ........................ 704/9

FOREIGN PATENT DOCUMENTS

EP    0 837 962 A2 *   4/1998  ............ G10L 5/06

OTHER PUBLICATIONS

Wyard et al., "Spoken Language Systems-beyond prompt and response," Jan. 1996, BT Technology Journal, vol. 14, No. 1, pp. 187-207.*

"Approximate Word-Spotting Method for Constrained Grammars," Oct. 1994, IBM Technical Disclosure Bulletin, vol. 37, No. 10, p. 385.*

* cited by examiner

Primary Examiner: William Korzuch
Assistant Examiner: Abul K. Azad
(74) Attorney, Agent, or Firm: James Y. C. Sze; Pillsbury Winthrop LLP

(57) ABSTRACT
A system and method for interacting with objects, via a computer, using utterances, speech processing and natural language processing. A Data Definition File relates networked objects and a speech processor. The Data Definition File encompasses a memory structure relating the objects, including grammar files, a context-specific dictation model, and a natural language processor. The speech processor searches a first grammar file for a phrase matching the utterance, and searches a second grammar file for the matching phrase if the matching phrase is not found in the first grammar file. The system also includes a natural language processor for searching a database for a matching entry for the matching phrase, and an application interface for performing an action associated with the matching entry if the matching entry is found in the database. The system utilizes context-specific grammars and dictation models, thereby enhancing speech recognition and natural language processing efficiency. Additionally, for each user the system adaptively and interactively "learns" words and phrases, and their associated meanings, storing the adaptive updates into user voice profiles. Because the user voice profiles can be stored locally or remotely, users can access the adaptively learned words and phrases at various locations.

76 Claims, 12 Drawing Sheets
[Representative front-page drawing: flowchart of FIG. 3A, showing the utterance being matched against the context-specific grammar, general grammar, dictation grammar, and context-specific dictation model]
[Sheet 1 of 12: FIG. 1, functional block diagram of exemplary computer system 100, including display 104, data input port 114, and network interface 116; FIG. 5, diagram of a Data Definition File, including a network object table and context-specific dictation model 217]
[Sheet 2 of 12: FIG. 2, expanded functional block diagram of CPU 102 (speech recognition processor, natural language processor with variable replacer, string formatter, word weighter, boolean tester, and pronoun replacer, and application interface) and storage medium 108 (general and context-specific grammars, dictation grammar, and NLP database)]
[Sheet 3 of 12: FIG. 3A, flowchart: an utterance is provided to the speech processor (300); the context-specific grammar, general grammar, dictation grammar, and context-specific dictation model are enabled and searched in turn; on a match, the context is prepended to the matching phrase, the NLP database is accessed, and the application interface is directed to take the associated action (322); if no match is found, an error message is given (320)]
[Sheet 4 of 12: FIG. 3B, flowchart: the phrase is formatted for NLP analysis (328); word variables are replaced with the associated wildcard function, pronouns are substituted, individual words are weighted, and "noise" words are discarded from the phrase (344); the NLP database is searched and confidence values are generated for possible matches (338), with an error message if no match has sufficient confidence]
[Sheet 5 of 12: FIGS. 3C and 3D, flowchart: the non-noise word requirement for the highest-confidence entry is retrieved from the NLP database and checked; if met, the associated action is taken and the context-specific grammar for the associated context is enabled; otherwise the user is prompted whether the highest-confidence entry was meant; in FIG. 3D, the context for the highest-confidence entry is retrieved (358), the user is prompted for information using context-based interactive dialog (360), and the NLP database, general grammar, and context-specific grammar are updated (362)]
[Sheet 6 of 12: FIG. 4, selected columns of an exemplary NLP database, associating contexts and subcontexts with phrases such as "what time is it," "what movies are playing at $time," and word-processor help requests such as "how do I insert a table"]
[Sheet 7 of 12: FIG. 6, selected columns of an exemplary object table]
[Sheet 8 of 12: FIG. 7A, flowchart: the object location is provided to the program (602); if the location cannot be resolved or the object is not retrieved successfully, an error message is displayed (610); otherwise the system checks whether a DDF file for the object is already present]
[Sheet 9 of 12: FIG. 7B, flowchart: the DDF file/information is sought encoded in the object information, at the web-site, or at a centralized location; on successful retrieval, any prior DDF file is replaced with the newly obtained DDF file (630) and the object table, context-specific grammar files, NLP database, and context-specific dictation models are extracted (632), deactivating any existing counterparts; otherwise the object is treated as a non-voice-activated object using only standard grammar files (624)]
[Sheet 10 of 12: FIG. 7C, flowchart: the object table is read (634); if the object is present in the table, any spoken statement associated with the object is played, and any associated context-specific grammar, NLP database, and context-specific dictation model are enabled]
[Sheet 11 of 12: FIG. 9, exemplary user voice profile comprising general grammar additions, context-specific grammar additions, and NLP database additions]
[Sheet 12 of 12: FIG. 10, flowchart: the user is queried for login ID and password; a locally stored user voice profile is loaded if present and, failing that, a travelling user voice profile; on successful retrieval the user voice profile is enabled, otherwise standard (non-custom user) processing is used]
OBJECT INTERACTIVE USER INTERFACE USING SPEECH RECOGNITION AND NATURAL LANGUAGE PROCESSING
The aspects of the present invention relate to speech recognition for an object-based computer user interface. More specifically, the embodiments of the present invention relate to a novel method and system for user interaction with a computer using speech recognition and natural language processing. This application is a continuation-in-part of U.S. patent application Ser. No. 09/166,198, entitled "Network Interactive User Interface Using Speech Recognition and Natural Language Processing," filed Oct. 5, 1998, which is a continuation-in-part of U.S. patent application Ser. No. 09/150,459, entitled "Interactive User Interface Using Speech Recognition and Natural Language Processing," filed Sep. 9, 1998. This application is additionally related to PCT/US99/20445 and PCT/US99/20447, both filed Sep. 9, 1999, corresponding to U.S. patent application Ser. Nos. 09/150,459 and 09/166,198, respectively.
BACKGROUND

Description of the Related Art
As computers have become more prevalent, it has become clear that many people have great difficulty understanding and communicating with computers. A user must often learn archaic commands and non-intuitive procedures in order to operate the computer. For example, most personal computers use windows-based operating systems that are largely menu-driven. This requires that the user learn what menu commands or sequence of commands produce the desired results.

Furthermore, traditional interaction with a computer is often slowed by manual input devices such as keyboards or mice. Many computer users are not fast typists. As a result, much time is spent communicating commands and words to the computer through these manual input devices. It is becoming clear that an easier, faster and more intuitive method of communicating with computers and networked objects, such as web-sites, is needed.
One proposed method of computer interaction is speech recognition. Speech recognition involves software and hardware that act together to audibly detect human speech and translate the detected speech into a string of words. As is known in the art, speech recognition works by breaking down sounds the hardware detects into smaller non-divisible sounds called phonemes. Phonemes are distinct units of sound. For example, the word "those" is made up of three phonemes; the first is the "th" sound, the second is the "o" sound, and the third is the "se" sound. The speech recognition software attempts to match the detected phonemes with known words from a stored dictionary. An example of a speech recognition system is given in U.S. Pat. No. 4,783,803, entitled "SPEECH RECOGNITION APPARATUS AND METHOD," issued Nov. 8, 1988, assigned to Dragon Systems, Incorporated. Presently, there are many commercially available speech recognition software packages available from such companies as Dragon Systems, Inc. and International Business Machines Corporation.
One limitation of these speech recognition software packages or systems is that they typically only perform command and control or dictation functions. Thus, the user is still required to learn a vocabulary of commands in order to operate the computer.

A proposed enhancement to these speech recognition systems is to process the detected words using a natural language processing system.
Natural language processing generally involves determining a conceptual "meaning" (e.g., what meaning the speaker intended to convey) of the detected words by analyzing their grammatical relationship and relative context. For example, U.S. Pat. No. 4,887,212, entitled "PARSER FOR NATURAL LANGUAGE TEXT," issued Dec. 12, 1989, assigned to International Business Machines Corporation, teaches a method of parsing an input stream of words by using word isolation, morphological analysis, dictionary look-up and grammar analysis.

Natural language processing used in concert with speech recognition provides a powerful tool for operating a computer using spoken words rather than manual input such as a keyboard or mouse. However, one drawback of a conventional natural language processing system is that it may fail to determine the correct "meaning" of the words detected by the speech recognition system. In such a case, the user is typically required to recompose or restate the phrase, with the hope that the natural language processing system will determine the correct "meaning" on subsequent attempts. Clearly, this may lead to substantial delays as the user is required to restate the entire sentence or command. Another drawback of conventional systems is that the processing time required for the speech recognition can be prohibitively long. This is primarily due to the finite speed of the processing resources as compared with the large amount of information to be processed. For example, in many conventional speech recognition programs, the time required to recognize the utterance is long due to the size of the dictionary file being searched.
An additional drawback of conventional speech recognition and natural language processing systems is that they are not interactive, and thus are unable to cope with new situations. When a computer system encounters unknown or new networked objects, new relationships between the computer and the objects are formed. Conventional speech recognition and natural language processing systems are unable to cope with the situations that result from the new relationships posed by previously unknown networked objects. As a result, a conversational-style interaction with the computer is not possible. The user is required to communicate complete concepts to the computer. The user is not able to speak in sentence fragments because the meaning of these sentence fragments (which is dependent on the meaning of previous utterances) will be lost.
Another drawback of conventional speech recognition and natural language processing systems is that once a user successfully "trains" a computer system to recognize the user's speech and voice commands, the user cannot easily move to another computer without having to undergo the process of training the new computer. As a result, changing a user's computer workstation or location results in wasted time, because the user must re-train the new computer to the user's speech habits and voice commands.
SUMMARY

The embodiments of the present invention include a novel and improved system and method for interacting with a computer using utterances, speech processing and natural language processing. Generally, the system comprises a speech processor for searching a first grammar file for a matching phrase for the utterance, and for searching a second grammar file for the matching phrase if the matching phrase is not found in the first grammar file. The system also includes a natural language processor for searching a database for a matching entry for the matching phrase, and an application interface for performing an action associated with the matching entry if the matching entry is found in the database.
In one embodiment, the natural language processor updates at least one of the database, the first grammar file and the second grammar file with the matching phrase if the matching entry is not found in the database.
The first grammar file is a context-specific grammar file. A context-specific grammar file is one that contains words and phrases that are highly relevant to a specific subject. The second grammar file is a general grammar file. A general grammar file is one that contains words and phrases which do not need to be interpreted in light of a context. That is to say, the words and phrases in the general grammar file do not belong to any parent context. By searching the context-specific grammar file before searching the general grammar file, the present invention allows the user to communicate with the computer using a more conversational style, wherein the words spoken, if found in the context-specific grammar file, are interpreted in light of the subject matter most recently discussed.
In a further aspect of the present invention, the speech processor searches a dictation grammar for the matching phrase if the matching phrase is not found in the general grammar file. The dictation grammar is a large vocabulary of general words and phrases. By searching the context-specific and general grammars first, it is expected that the speech recognition time will be greatly reduced, because the context-specific and general grammars are physically smaller files than the dictation grammar.

In another aspect of the present invention, the speech processor searches a context-specific dictation model for the matching phrase if the matching phrase is not found within the dictation grammar. A context-specific dictation model is a model that indicates the relationship between words in a vocabulary. The speech processor uses this model to help decode the meaning of related words in an utterance.
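A minimal sketch of this cascaded search, written in Python purely for illustration, may clarify the order of operations; the grammar objects and their match_phrase and decode helpers are assumptions, not the disclosed implementation:

    # Illustrative sketch only: search progressively larger grammars until
    # a phrase matches. Object and method names are assumptions.
    def recognize(utterance, context_grammar, general_grammar,
                  dictation_grammar, context_dictation_model):
        # 1. Smallest, most relevant file first: the context-specific grammar.
        phrase = context_grammar.match_phrase(utterance)
        if phrase is not None:
            return phrase, "context-specific"
        # 2. Fall back to the general grammar (no parent context).
        phrase = general_grammar.match_phrase(utterance)
        if phrase is not None:
            return phrase, "general"
        # 3. Fall back to the large-vocabulary dictation grammar.
        phrase = dictation_grammar.match_phrase(utterance)
        if phrase is not None:
            return phrase, "dictation"
        # 4. Finally, use word relationships in the context-specific
        #    dictation model to help decode the utterance.
        return context_dictation_model.decode(utterance), "dictation model"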
`In another aspect of the present invention, the natural
`language processor replaces at least one word in the match
`ing phrase prior to Searching the database. This may be
`accomplished by a variable replacer in the natural language
`processor for Substituting a wildcard for the at least one
`word in the matching phrase. By Substituting wildcards for
`certain words (called “word-variables”) in the phrase, the
`number of entries in the database can be significantly
`reduced. Additionally, a pronoun Substituter in the natural
`language processor may Substitute a proper name for pro
`nouns the matching phrase, allowing user-specific facts to be
`Stored in the database.
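The effect of the variable replacer and pronoun substituter can be sketched as simple word rewriting; the word-variable set, the "*" wildcard token, and the example phrase below are invented for illustration:

    # Illustrative sketch only: substitute a wildcard for word-variables and
    # a proper name for pronouns before searching the NLP database.
    WORD_VARIABLES = {"noon", "monday", "january"}   # assumed examples
    PRONOUNS = {"i", "me", "my"}

    def prepare_phrase(matching_phrase, user_name):
        out = []
        for word in matching_phrase.lower().split():
            if word in WORD_VARIABLES:
                out.append("*")         # one wildcard entry covers many phrases
            elif word in PRONOUNS:
                out.append(user_name)   # lets user-specific facts be stored
            else:
                out.append(word)
        return " ".join(out)

    print(prepare_phrase("remind me at noon", "Dean"))  # "remind Dean at *"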
In another aspect, a string formatter text-formats the matching phrase prior to searching the database. Also, a word weighter weights individual words in the matching phrase according to a relative significance of the individual words prior to searching the database. These acts allow for faster, more accurate searching of the database.
A search engine in the natural language processor generates a confidence value for the matching entry. The natural language processor compares the confidence value with a threshold value. A boolean tester determines whether a required number of words from the matching phrase are present in the matching entry. This boolean testing serves as a verification of the results returned by the search engine.
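One way to picture the confidence value and the boolean verification together is the sketch below; the weighted word-overlap scoring formula is an assumption, since no particular formula is fixed here:

    # Illustrative sketch only: score candidate entries by weighted word
    # overlap, apply the confidence threshold, then boolean-verify that a
    # required number of matching-phrase words appear in the entry.
    def best_entry(phrase_words, weights, entries, threshold):
        if not entries:
            return None
        def confidence(entry):
            return sum(weights.get(w, 1.0) for w in phrase_words & entry["words"])
        entry = max(entries, key=confidence)
        if confidence(entry) < threshold:
            return None    # no confident match: ask the user instead
        if len(phrase_words & entry["words"]) < entry["required_words"]:
            return None    # boolean test failed: ask the user instead
        return entry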
In order to clear up ambiguities, the natural language processor prompts the user whether the matching entry is a correct interpretation of the utterance if the required number of words from the matching phrase are not present in the matching entry. The natural language processor also prompts the user for additional information if the matching entry is not a correct interpretation of the utterance. At least one of the database, the first grammar file and the second grammar file is updated with the additional information. In this way, the present invention adaptively "learns" the meaning of additional utterances, thereby enhancing the efficiency of the user interface.
The speech processor will enable and search a context-specific grammar associated with the matching entry for a subsequent matching phrase for a subsequent utterance. This ensures that the most relevant words and phrases will be searched first, thereby decreasing speech recognition times.
Generically, the embodiments include a method to update a computer for voice interaction with an object, such as a help file or web page. Initially, an object table, which associates the object with the voice interaction system, is transferred to the computer over a network. The location of the object table can be embedded within the object, at a specific Internet web-site, or at a consolidated location that stores object tables for multiple objects. The object table is searched for an entry matching the object. The entry matching the object may result in an action being performed, such as text speech being voiced through a speaker, a context-specific grammar file being used, or a natural language processor database being used. The object table may be part of a dialog definition file. Dialog definition files may also include a context-specific grammar, entries for a natural language processor database, a context-specific dictation model, or any combination thereof.
In another aspect of the present invention, a network interface transfers a dialog definition file from over the network. The dialog definition file contains an object table. A data processor searches the object table for a table entry that matches the object. Once this matching table entry is found, an application interface performs an action specified by the matching entry.
In another aspect of the present invention, the dialog definition file associated with a networked object is located, and then read. The dialog definition file could be read from a variety of locations, such as a web-site, storage media, or a location that stores dialog definition files for multiple objects. An object table, contained within the dialog definition file, is searched to find a table entry matching the object. The matching entry defines an action associated with the object, and the action is then performed by the system. In addition to an object table, the dialog definition file may contain a context-specific grammar, entries for a natural language processor database, a context-specific dictation model, or any combination thereof.
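The lookup sequence just described can be sketched as follows; the dialog definition file layout (a dictionary holding an object table and optional grammar, database, and dictation-model components) and the helper names are assumptions for illustration only:

    # Illustrative sketch only: locate a dialog definition file (DDF), then
    # act on the object-table entry for an object.
    def locate_ddf(obj, fetch, website_url, central_url):
        ddf = obj.get("embedded_ddf")      # 1. encoded in the object itself
        if ddf is None:
            ddf = fetch(website_url)       # 2. at the object's web-site
        if ddf is None:
            ddf = fetch(central_url)       # 3. at a centralized location
        return ddf                         # None: non-voice-activated object

    def handle_object(obj_name, ddf, play, enable):
        entry = ddf["object_table"].get(obj_name)
        if entry is None:
            return
        if entry.get("spoken_statement"):
            play(entry["spoken_statement"])          # voice text via speaker
        enable(ddf.get("context_grammar"))           # context-specific grammar
        enable(ddf.get("nlp_entries"))               # NLP database entries
        enable(ddf.get("context_dictation_model"))   # dictation model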
BRIEF DESCRIPTION OF THE DRAWINGS

The embodiments of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout, and wherein:

FIG. 1 is a functional block diagram of an exemplary computer system embodiment;

FIG. 2 is an expanded functional block diagram of the CPU 102 and storage medium 108 of the computer system embodiment of FIG. 1;

FIGS. 3A-3D are a flowchart of a method embodiment of providing interactive speech recognition and natural language processing to a computer;

FIG. 4 is a diagram of selected columns of an exemplary natural language processing (NLP) database embodiment of the present invention;

FIG. 5 is a diagram of an exemplary Database Definition File (DDF) according to an embodiment of the present invention;

FIG. 6 is a diagram of selected columns of an exemplary object table;

FIGS. 7A-7C are a flowchart of a method embodiment of the present invention, illustrating the linking of interactive speech recognition and natural language processing to a networked object, such as a web-page;

FIG. 8 is a diagram depicting a computer system connecting to other computers, storage media, and web-sites via the Internet;

FIG. 9 is a diagram of an exemplary user voice profile according to an embodiment of the present invention; and

FIG. 10 is a flowchart of a method embodiment of the present invention, illustrating the retrieval and enabling of an individual's user voice profile during login at a computer workstation.
DETAILED DESCRIPTION

The embodiments of the present invention will now be disclosed with reference to a functional block diagram of an exemplary computer system 100 of FIG. 1, constructed and operative in accordance with an embodiment of the present invention. In FIG. 1, computer system 100 includes a central processing unit (CPU) 102. The CPU 102 may be any general-purpose microprocessor or microcontroller as is known in the art, appropriately programmed to perform the method embodiment described herein with reference to FIGS. 3A-3D. For example, CPU 102 may be a conventional microprocessor such as the Pentium II processor manufactured by Intel Corporation or the like.

CPU 102 communicates with a plurality of peripheral equipment, including a display 104, manual input 106, storage medium 108, microphone 110, speaker 112, data input port 114 and network interface 116. Display 104 may be a visual display such as a CRT, LCD screen, touch-sensitive screen, or other monitor as is known in the art for visually displaying images and text to a user. Manual input 106 may be a conventional keyboard, keypad, mouse, trackball, or other input device as is known in the art for the manual input of data. Storage medium 108 may be a conventional read/write memory such as a magnetic disk drive, floppy disk drive, CD-ROM drive, silicon memory or other memory device as is known in the art for storing and retrieving data. Significantly, storage medium 108 may be remotely located from CPU 102 and connected to CPU 102 via a network such as a local area network (LAN), a wide area network (WAN), or the Internet. Microphone 110 may be any suitable microphone as is known in the art for providing audio signals to CPU 102. Speaker 112 may be any suitable speaker as is known in the art for reproducing audio signals from CPU 102. It is understood that microphone 110 and speaker 112 may include appropriate digital-to-analog and analog-to-digital conversion circuitry as appropriate. Data input port 114 may be any data port as is known in the art for interfacing with an external accessory using a data protocol such as RS-232, Universal Serial Bus, or the like. Network interface 116 may be any interface as known in the art for communicating or transferring files across a computer network; examples of such networks include TCP/IP, ethernet, or token ring networks. In addition, on some systems, a network interface 116 may consist of a modem connected to the data input port 114.

Thus, FIG. 1 illustrates the functional elements of a computer system 100. Each of the elements of computer system 100 may be suitable off-the-shelf components as described above. The embodiments of the present invention provide a method and system for human interaction with the computer system 100 using speech.
As shown in FIG. 8, constructed and operative in accordance with an embodiment of the present invention, the computer system 100 may be connected to the Internet 700, a collection of computer networks. To connect to the Internet 700, computer system 100 may use a network interface 116, a modem connected to the data input port 114, or any other method known in the art. Web-sites 710, other computers 720, and storage media 108 may also be connected to the Internet through such methods known in the art.
Turning now to FIG. 2, FIG. 2 illustrates an expanded functional block diagram of CPU 102 and storage medium 108, constructed and operative in accordance with an embodiment of the present invention. CPU 102 includes speech recognition processor 200, natural language processor 202 and application interface 220. Natural language processor 202 further includes variable replacer 204, string formatter 206, word weighter 208, boolean tester 210, pronoun replacer 211, and search engine 213. Storage medium 108 includes a plurality of context-specific grammar files 212, general grammar file 214, dictation grammar 216, and natural language processor (NLP) database 218. In one embodiment, the grammar files 212, 214, and 216 are Backus-Naur Form (BNF) files, which describe the structure of the language spoken by the user. BNF files are well known in the art for describing the structure of language, and details of BNF files will therefore not be discussed herein. One advantage of BNF files is that hierarchical tree-like structures may be used to describe phrases or word sequences, without the need to explicitly recite all combinations of these word sequences. Thus, the use of BNF files in the embodiment minimizes the physical sizes of the files 212, 214, and 216 in the storage medium 108, increasing the speed at which these files can be enabled and searched as described below. However, in alternate embodiments, other file structures are used.
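As a rough illustration of why BNF-style grammar files stay small, the fragment below (rule names and phrases invented for this sketch) describes nine word sequences with two three-way rules rather than nine explicit phrases:

    # Illustrative only: a BNF-style fragment and the phrases it expands to.
    #
    #   <command> ::= <verb> "the" <object>
    #   <verb>    ::= "open" | "close" | "find"
    #   <object>  ::= "file" | "window" | "web-site"
    verbs = ["open", "close", "find"]
    objects = ["file", "window", "web-site"]
    phrases = [f"{v} the {o}" for v in verbs for o in objects]
    print(len(phrases))  # 9 phrases from two small hierarchical rules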
The context-specific dictation model 217 is an optional file that contains specific models to improve dictation accuracy. These models enable users to specify word orders and word models. The models accomplish this by describing words and their relationship to other words, thus determining word meaning by contextual interpretation in a specific field or topic. Take, for example, the phrase "650 megahertz microprocessor computer." A context-specific dictation model 217 for computers may indicate the likelihood of the word "microprocessor" occurring with "computer," and that a number such as "650" is likely to be found near the word "megahertz." By interpreting the context of the words via a context-specific dictation model 217, a speech recognition processor would analyze the phrase, interpret a single object, i.e., the computer, and realize that "650 megahertz microprocessor" are adjectives or traits describing the type of computer.
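A context-specific dictation model of this kind can be pictured as word-association scores; the toy adjacent-pair table below is invented purely to illustrate how "650," "megahertz," and "microprocessor" reinforce one another in a computers context:

    # Illustrative sketch only: toy word-association scores for a
    # "computers" context; real models depend on the recognizer used.
    ASSOCIATION = {
        ("<number>", "megahertz"): 0.95,       # a number precedes "megahertz"
        ("megahertz", "microprocessor"): 0.9,
        ("microprocessor", "computer"): 0.8,
    }

    def context_score(words):
        # Sum association scores over adjacent word pairs.
        return sum(ASSOCIATION.get(p, 0.0) for p in zip(words, words[1:]))

    print(context_score(["<number>", "megahertz", "microprocessor", "computer"]))
    # about 2.65: the phrase coheres strongly in this context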
Topics for context-specific dictation models 217 vary widely, and may include any topic area of interest to a user, both broad and narrow. Broad topics may include history, law, medicine, science, technology, or computers. Specialized topics, such as a particular field of literature encountered at a book retailer's web-site, are also possible. Such a context-specific dictation model 217 may contain text for author and title information, for example.

Finally, the context-specific dictation model 217 format relies upon the underlying speech recognition processor 200, and is specific to each type of speech recognition processor 200.
The operation and interaction of these functional elements of FIG. 2 will be described with reference to the flowchart of FIGS. 3A-3D, constructed and operative in accordance with an embodiment of the present invention. In FIG. 3A, the flow begins at block 300 with the providing of an utterance to speech processor 200. An utterance is a series of sounds havi