White et al.

US006408272B1

(10) Patent No.: US 6,408,272 B1
(45) Date of Patent: Jun. 18, 2002

(54) DISTRIBUTED VOICE USER INTERFACE

(75) Inventors: George M. White, Santa Cruz; James J. Buteau, Mountain View;
     Glen E. Shires, Sunnyvale; [remaining inventor names illegible in this copy],
     Los Gatos, all of CA (US)

(73) Assignee: General Magic, Inc., Sunnyvale, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or
     adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/290,508

(22) Filed: Apr. 12, 1999

(51) Int. Cl. ................................ G10L 15/22
(52) U.S. Cl. ................................ 704/270.1; 704/275
(58) Field of Search ......................... 704/251, 252, 270, 270.1, 275,
                                              200, 201; 379/88.01, 88.04, 88.16

(56) References Cited

     U.S. PATENT DOCUMENTS

     4,525,793 A *  7/1985  Stackhouse ............... 704/270
     5,819,220 A * 10/1998  Sarukkai et al. .......... 704/270
     5,926,789 A *  7/1999  Barbara et al. ........... 704/275
     5,946,050 A *  8/1999  Wolff .................... 704/270
     5,953,392 A *  9/1999  Rhie et al. .............. 379/88.13
     5,953,700 A *  9/1999  Kanevsky et al. .......... 704/246
     5,956,683 A *  9/1999  Jacobs et al. ............ 704/275
     5,960,399 A *  9/1999  Barclay et al. ........... 704/270
     5,963,618 A * 10/1999  Porter ................... 379/88.17
     6,078,886 A *  6/2000  Dragosh et al. ........... 704/270
     6,098,043 A    8/2000  Forest et al. ............ 704/270
     6,101,473 A    8/2000  Scott et al. ............. 704/270

     * cited by examiner

Primary Examiner-Talivaldis Ivars Smits
Assistant Examiner-Martin Lerner
(74) Attorney, Agent, or Firm-Skjerven Morrill LLP

(57) ABSTRACT

A distributed voice user interface system includes a local device which
receives speech input issued from a user. Such speech input may specify a
command or a request by the user. The local device performs preliminary
processing of the speech input and determines whether it is able to respond
to the command or request by itself. If not, the local device initiates
communication with a remote system for further processing of the speech input.

6 Claims, 5 Drawing Sheets
`
`14d
`
`14e
`
`14f
`
`Local
`Device
`
`Local
`Device
`
`Local
`Device
`
`u 12
`
`Remote System
`
`
`
`16
`
`.-
`
`Local
`Device
`
`- - - --------------
`
`14a
`
`Telecommunications
`Network
`
`Local
`Device
`
`14
`
`-M
`
`Local
`
`Device
`
`14C
`
`10
`Distributed Voice User Interface System
`
`Internet
`
`
`
`Local
`Device
`7
`
`14g
`
`Exhibit 1012
`Page 01 of 17
`
`
`
[Sheet 1 of 5: FIG. 1, distributed voice user interface system.]
`
`
`
[Sheet 2 of 5: FIG. 2, details of a local device 14, including a display.]
`
`
`
[Sheet 3 of 5: FIG. 3, details of a remote system 12 and its connections to local devices.]
`
`
`
[Sheet 4 of 5: FIG. 4, flow diagram of operation for a local device (steps 102-120): wait for an activation event; perform processing at the local device; if local processing is not sufficient, establish a connection between the local device and the remote system, transmit data/speech from the local device to the remote system, and wait for a response from the remote system; terminate the connection between the local device and the remote system; take action based on the processing; repeat until the session ends.]
`
`
`
[Sheet 5 of 5: FIG. 5, flow diagram of operation for a remote system (steps 200-226): receive input from the user; compare the input to grammars; if not recognized, request more input and compare again; after exhausting the allowed attempts, have the user select from listed commands or requests and compare the selection to grammars; respond to the recognized command or request; repeat until the session ends.]
`
`
`
DISTRIBUTED VOICE USER INTERFACE

CROSS-REFERENCE TO RELATED APPLICATIONS

This application relates to the subject matter disclosed in the following co-pending United States applications: U.S. application Ser. No. 08/609,699, filed Mar. 1, 1996, entitled "Method and Apparatus For Telephonically Accessing and Navigating the Internet," now U.S. Pat. No. 5,953,392, issued Sep. 14, 1999; and U.S. application Ser. No. 09/071,717, filed May 1, 1998, entitled "Voice User Interface With Personality," now U.S. Pat. No. 6,144,938, issued Nov. 7, 2000. These applications were co-pending at the time of filing of this application, are assigned to the present assignee, and are incorporated herein by reference.
`
`
TECHNICAL FIELD OF THE INVENTION

The present invention relates generally to user interfaces and, more particularly, to a distributed voice user interface.
`
CROSS-REFERENCE TO MICROFICHE APPENDICES

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
`
`
BACKGROUND OF THE INVENTION

A voice user interface (VUI) allows a human user to interact with an intelligent, electronic device (e.g., a computer) by merely "talking" to the device. The electronic device is thus able to receive, and respond to, directions, commands, instructions, or requests issued verbally by the human user. As such, a VUI facilitates the use of the device.

A typical VUI is implemented using various techniques which enable an electronic device to "understand" particular words or phrases spoken by the human user, and to output or "speak" the same or different words/phrases for prompting, or responding to, the user. The words or phrases understood and/or spoken by a device constitute its "vocabulary." In general, the number of words/phrases within a device's vocabulary is directly related to the computing power which supports its VUI. Thus, a device with more computing power can understand more words or phrases than a device with less computing power.

Many modern electronic devices, such as personal digital assistants (PDAs), radios, stereo systems, television sets, remote controls, household security systems, cable and satellite receivers, video game stations, automotive dashboard electronics, household appliances, and the like, have some computing power, but typically not enough to support a sophisticated VUI with a large vocabulary, i.e., a VUI capable of understanding and/or speaking many words and phrases. Accordingly, it is generally pointless to attempt to implement a VUI on such devices, as the speech recognition and speech output capabilities would be far too limited for practical use.
`
SUMMARY

The present invention provides a system and method for a distributed voice user interface (VUI) in which a remote system cooperates with one or more local devices to deliver a sophisticated voice user interface at the local devices.
The remote system and the local devices may communicate via a suitable network, such as, for example, a telecommunications network or a local area network (LAN). In one embodiment, the distributed VUI is achieved by the local devices performing preliminary signal processing (e.g., speech parameter extraction and/or elementary speech recognition) and accessing more sophisticated speech recognition and/or speech output functionality implemented at the remote system only if and when necessary.

According to an embodiment of the present invention, a local device includes an input device which can receive speech input issued from a user. A processing component, coupled to the input device, extracts feature parameters (which can be frequency domain parameters and/or time domain parameters) from the speech input for processing at the local device or, alternatively, at a remote system.

According to another embodiment of the present invention, a distributed voice user interface system includes a local device which continuously monitors for speech input issued by a user, scans the speech input for one or more keywords, and initiates communication with a remote system when a keyword is detected. The remote system receives the speech input from the local device and can then recognize words therein.

According to yet another embodiment of the present invention, a local device includes an input device for receiving speech input issued from a user. Such speech input may specify a command or a request by the user. A processing component, coupled to the input device, is operable to perform preliminary processing of the speech input. The processing component determines whether the local device is by itself able to respond to the command or request specified in the speech input. If not, the processing component initiates communication with a remote system for further processing of the speech input.

According to still another embodiment of the present invention, a remote system includes a transceiver which receives speech input, such speech input previously issued by a user and preliminarily processed and forwarded by a local device. A processing component, coupled to the transceiver at the remote system, recognizes words in the speech input.

According to still yet another embodiment of the present invention, a method includes the following steps: continuously monitoring at a local device for speech input issued by a user; scanning the speech input at the local device for one or more keywords; initiating a connection between the local device and a remote system when a keyword is detected; and passing the speech input, or appropriate feature parameters extracted from the speech input, from the local device to the remote system for interpretation.
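By way of illustration only, the following Python sketch arranges the steps of this last embodiment as a simple control loop. Everything in it is an assumption made for the example and is not taken from the disclosure: the keyword list, the text-based keyword scan, the extract_feature_parameters() placeholder, and the RemoteSystemStub standing in for a transient network connection to the remote system.

# Hypothetical sketch of the claimed method steps: monitor, spot keywords,
# then connect to the remote system and pass along feature parameters.

KEYWORDS = {"computer", "magic"}   # assumed wake words; not specified in the disclosure


def local_keyword_spotter(utterance_text: str) -> bool:
    """Elementary local scan of (already transcribed) input for keywords."""
    words = (w.strip(",.?!") for w in utterance_text.lower().split())
    return any(w in KEYWORDS for w in words)


def extract_feature_parameters(utterance_text: str) -> dict:
    """Stand-in for the preliminary processing the local device would perform."""
    return {"payload": utterance_text}


class RemoteSystemStub:
    """Stands in for a transient connection to the remote system's recognizer."""

    def interpret(self, features: dict) -> str:
        return "remote system handled: " + repr(features["payload"])


def monitor(utterances, remote=RemoteSystemStub()):
    """Continuously monitor input, spot keywords, defer to the remote system only when needed."""
    for utterance in utterances:                      # step 1: continuous monitoring
        if local_keyword_spotter(utterance):          # step 2: scan for keywords
            features = extract_feature_parameters(utterance)
            print(remote.interpret(features))         # steps 3-4: connect and pass features


if __name__ == "__main__":
    monitor(["play some music", "computer turn up the heat to 82 degrees"])

In an actual device the scan would operate on audio rather than transcribed text; the text form is used here only to keep the sketch self-contained.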
A technical advantage of the present invention includes providing functional control over various local devices (e.g., PDAs, radios, stereo systems, television sets, remote controls, household security systems, cable and satellite receivers, video game stations, automotive dashboard electronics, household appliances, etc.) using sophisticated speech recognition capability enabled primarily at a remote site. The speech recognition capability is delivered to each local device in the form of a distributed VUI. Thus, functional control of the local devices via speech recognition can be provided in a cost-effective manner.

Another technical advantage of the present invention includes providing the vast bulk of hardware and/or software for implementing a sophisticated voice user interface at a single remote system, while only requiring minor hardware/software implementations at each of a number of local devices.
This substantially reduces the cost of deploying a sophisticated voice user interface at the various local devices, because the incremental cost for each local device is small. Furthermore, the sophisticated voice user interface is delivered to each local device without substantially increasing its size. In addition, the power required to operate each local device is minimal, since most of the capability for the voice user interface resides in the remote system; this can be crucial for applications in which a local device is battery powered. Furthermore, the single remote system can be more easily maintained and upgraded with new features or hardware than can the individual local devices.

Yet another technical advantage of the present invention includes providing a transient, on-demand connection between each local device and the remote system; i.e., communication between a local device and the remote system is enabled only if the local device requires the assistance of the remote system. Accordingly, communication costs, such as, for example, long distance charges, are minimized. Furthermore, the remote system is capable of supporting a larger number of local devices if each such device is only connected on a transient basis.

Still another technical advantage of the present invention includes providing the capability for data to be downloaded from the remote system to each of the local devices, either automatically or in response to a user's request. Thus, the data already present in each local device can be updated, replaced, or supplemented as desired, for example, to modify the voice user interface capability (e.g., speech recognition/output) supported at the local device. In addition, data from news sources or databases can be downloaded (e.g., from the Internet) and made available to the local devices for output to users.

Other aspects and advantages of the present invention will become apparent from the following descriptions and accompanying drawings.
`
BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and for further features and advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates a distributed voice user interface system, according to an embodiment of the present invention;

FIG. 2 illustrates details for a local device, according to an embodiment of the present invention;

FIG. 3 illustrates details for a remote system, according to an embodiment of the present invention;

FIG. 4 is a flow diagram of an exemplary method of operation for a local device, according to an embodiment of the present invention; and

FIG. 5 is a flow diagram of an exemplary method of operation for a remote system, according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The preferred embodiments of the present invention and their advantages are best understood by referring to FIGS. 1 through 5 of the drawings. Like numerals are used for like and corresponding parts of the various drawings.

Turning first to the nomenclature of the specification, the detailed description which follows is represented largely in terms of processes and symbolic representations of operations performed by conventional computer components, such as a central processing unit (CPU) or processor associated with a general purpose computer system, memory storage devices for the processor, and connected pixel-oriented display devices.
These operations include the manipulation of data bits by the processor and the maintenance of these bits within data structures resident in one or more of the memory storage devices. Such data structures impose a physical organization upon the collection of data bits stored within computer memory and represent specific electrical or magnetic elements. These symbolic representations are the means used by those skilled in the art of computer programming and computer construction to most effectively convey teachings and discoveries to others skilled in the art.
For purposes of this discussion, a process, method, routine, or sub-routine is generally considered to be a sequence of computer-executed steps leading to a desired result. These steps generally require manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, or otherwise manipulated. It is conventional for those skilled in the art to refer to these signals as bits, values, elements, symbols, characters, text, terms, numbers, records, files, or the like. It should be kept in mind, however, that these and some other terms should be associated with appropriate physical quantities for computer operations, and that these terms are merely conventional labels applied to physical quantities that exist within and during operation of the computer.
It should also be understood that manipulations within the computer are often referred to in terms such as adding, comparing, moving, or the like, which are often associated with manual operations performed by a human operator. It must be understood that no involvement of the human operator may be necessary, or even desirable, in the present invention. The operations described herein are machine operations performed in conjunction with the human operator or user that interacts with the computer or computers.
In addition, it should be understood that the programs, processes, methods, and the like, described herein are but an exemplary implementation of the present invention and are not related, or limited, to any particular computer, apparatus, or computer language. Rather, various types of general purpose computing machines or devices may be used with programs constructed in accordance with the teachings described herein. Similarly, it may prove advantageous to construct a specialized apparatus to perform the method steps described herein by way of dedicated computer systems with hard-wired logic or programs stored in non-volatile memory, such as read-only memory (ROM).
Network System Overview

Referring now to the drawings, FIG. 1 illustrates a distributed voice user interface (VUI) system 10, according to an embodiment of the present invention. In general, distributed VUI system 10 allows one or more users to interact, via speech or verbal communication, with one or more electronic devices or systems into which distributed VUI system 10 is incorporated, or alternatively, to which distributed VUI system 10 is connected. As used herein, the terms "connected," "coupled," or any variant thereof, means any connection or coupling, either direct or indirect, between two or more elements; the coupling or connection can be physical or logical.

More particularly, distributed VUI system 10 includes a remote system 12 which may communicate with a number of local devices 14 (separately designated with reference numerals 14a, 14b, 14c, 14d, 14e, 14f, 14g, 14h, and 14i) to implement one or more distributed VUIs.
In one embodiment, a "distributed VUI" comprises a voice user interface that may control the functioning of a respective local device 14 through the services and capabilities of remote system 12. That is, remote system 12 cooperates with each local device 14 to deliver a separate, sophisticated VUI capable of responding to a user and controlling that local device 14. In this way, the sophisticated VUIs provided at local devices 14 by distributed VUI system 10 facilitate the use of the local devices 14. In another embodiment, the distributed VUI enables control of another apparatus or system (e.g., a database or a website), in which case the local device 14 serves as a "medium."

Each such VUI of system 10 may be "distributed" in the sense that speech recognition and speech output software and/or hardware can be implemented in remote system 12 and the corresponding functionality distributed to the respective local device 14. Some speech recognition/output software or hardware can be implemented in each of local devices 14 as well.

When implementing distributed VUI system 10 described herein, a number of factors may be considered in dividing the speech recognition/output functionality between local devices 14 and remote system 12. These factors may include, for example: the amount of processing and memory capability available at each of local devices 14 and remote system 12; the bandwidth of the link between each local device 14 and remote system 12; the kinds of commands, instructions, directions, or requests expected from a user, and the respective, expected frequency of each; the expected amount of use of a local device 14 by a given user; the desired cost for implementing each local device 14; etc. In one embodiment, each local device 14 may be customized to address the specific needs of a particular user, thus providing a technical advantage.
Local Devices

Each local device 14 can be an electronic device with a processor having a limited amount of processing or computing power. For example, a local device 14 can be a relatively small, portable, inexpensive, and/or low power-consuming "smart device," such as a personal digital assistant (PDA), a wireless remote control (e.g., for a television set or stereo system), a smart telephone (such as a cellular phone or a stationary phone with a screen), or smart jewelry (e.g., an electronic watch). A local device 14 may also comprise or be incorporated into a larger device or system, such as a television set, a television set top box (e.g., a cable receiver, a satellite receiver, or a video game station), a video cassette recorder, a video disc player, a radio, a stereo system, an automobile dashboard component, a microwave oven, a refrigerator, a household security system, a climate control system (for heating and cooling), or the like.

In one embodiment, a local device 14 uses elementary techniques (e.g., the push of a button) to detect the onset of speech. Local device 14 then performs preliminary processing on the speech waveform. For example, local device 14 may transform speech into a series of feature vectors or frequency domain parameters (which differ from the digitized or compressed speech used in vocoders or cellular phones). Specifically, from the speech waveform, the local device 14 may extract various feature parameters, such as, for example, cepstral coefficients, Fourier coefficients, linear predictive coding (LPC) coefficients, or other spectral parameters in the time or frequency domain.
These spectral parameters (also referred to as features in automatic speech recognition systems), which would normally be extracted in the first stage of a speech recognition system, are transmitted to remote system 12 for processing therein. Speech recognition and/or speech output hardware/software at remote system 12 (in communication with the local device 14) then provides a sophisticated VUI through which a user can input commands, instructions, or directions into, and/or retrieve information or obtain responses from, the local device 14.
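As an illustration of one such feature parameter (this is not code from the disclosure), the sketch below computes simple real cepstral coefficients for a single speech frame with NumPy; the frame length, sample rate, and number of coefficients are arbitrary assumptions made for the example.

# Hypothetical sketch: cepstral coefficients of one speech frame, the kind of
# feature parameters a local device 14 might extract and forward.
import numpy as np

def cepstral_coefficients(frame: np.ndarray, num_coeffs: int = 13) -> np.ndarray:
    """Return the first num_coeffs real cepstral coefficients of one frame."""
    windowed = frame * np.hamming(len(frame))            # taper the frame edges
    spectrum = np.fft.rfft(windowed)                      # frequency-domain view of the frame
    log_magnitude = np.log(np.abs(spectrum) + 1e-10)      # log spectrum (small offset avoids log(0))
    cepstrum = np.fft.irfft(log_magnitude)                # inverse transform of the log spectrum
    return cepstrum[:num_coeffs]

# Example: one 30 ms frame of a synthetic 200 Hz tone sampled at 8 kHz.
t = np.arange(240) / 8000.0
frame = np.sin(2 * np.pi * 200.0 * t)
features = cepstral_coefficients(frame)
print(features.shape)   # (13,) -- parameters of this kind would be sent to remote system 12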
In another embodiment, in addition to performing preliminary signal processing (including feature parameter extraction), at least a portion of local devices 14 may each be provided with its own resident VUI. This resident VUI allows the respective local device 14 to understand and speak to a user, at least on an elementary level, without remote system 12. To accomplish this, each such resident VUI may include, or be coupled to, suitable input/output devices (e.g., microphone and speaker) for receiving and outputting audible speech. Furthermore, each resident VUI may include hardware and/or software for implementing speech recognition (e.g., automatic speech recognition (ASR) software) and speech output (e.g., recorded or generated speech output software). An exemplary embodiment for a resident VUI of a local device 14 is described below in more detail.
A local device 14 with a resident VUI may be, for example, a remote control for a television set. A user may issue a command to the local device 14 by stating "Channel four" or "Volume up," to which the local device 14 responds by changing the channel on the television set to channel four or by turning up the volume on the set.
Because each local device 14, by definition, has a processor with limited computing power, the respective resident VUI for a local device 14, taken alone, generally does not provide extensive speech recognition and/or speech output capability. For example, rather than implement a more complex and sophisticated natural language (NL) technique for speech recognition, each resident VUI may perform "word spotting" by scanning speech input for the occurrence of one or more "keywords." Furthermore, each local device 14 will have a relatively limited vocabulary (e.g., less than one hundred words) for its resident VUI. As such, a local device 14, by itself, is only capable of responding to relatively simple commands, instructions, directions, or requests from a user.
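Purely as a hypothetical example of such word spotting against a small resident vocabulary (the phrases, the action codes, and the send_ir_code() stub are invented here and are not taken from the disclosure):

# Hypothetical sketch: a resident VUI with a tiny vocabulary spotting known
# phrases in recognized text and mapping each to a remote-control action.

RESIDENT_VOCABULARY = {
    "channel four": "TUNE_CH_4",
    "volume up": "VOLUME_UP",
    "volume down": "VOLUME_DOWN",
}

def send_ir_code(code: str) -> None:
    """Stub for whatever hardware action the local device would take."""
    print(f"(stub) emitting IR code {code}")

def resident_vui(recognized_text: str) -> bool:
    """Spot a known phrase anywhere in the input; return False if none is found."""
    text = recognized_text.lower()
    for phrase, code in RESIDENT_VOCABULARY.items():
        if phrase in text:                 # word spotting, not full natural-language parsing
            send_ir_code(code)
            return True
    return False                           # caller would then defer to the remote system

resident_vui("please switch to channel four")     # handled locally
resident_vui("record the game tonight at eight")  # outside the vocabulary -> remote system

A resident VUI of this kind handles the simple commands itself and returns False for anything outside its vocabulary, at which point the device would defer to remote system 12.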
In instances where the speech recognition and/or speech output capability provided by a resident VUI of a local device 14 is not adequate to address the needs of a user, the resident VUI can be supplemented with the more extensive capability provided by remote system 12. Thus, the local device 14 can be controlled by spoken commands and otherwise actively participate in verbal exchanges with the user by utilizing more complex speech recognition/output hardware and/or software implemented at remote system 12 (as further described herein).

Each local device 14 may further comprise a manual input device, such as a button, a toggle switch, a keypad, or the like, by which a user can interact with the local device 14 (and also remote system 12 via a suitable communication network) to input commands, instructions, requests, or directions without using either the resident or distributed VUI. For example, each local device 14 may include hardware and/or software supporting the interpretation and issuance of dual tone multiple frequency (DTMF) commands. In one embodiment, such manual input device can be used by the user to activate or turn on the respective local device 14 and/or initiate communication with remote system 12.
`
`
Remote System

In general, remote system 12 supports a relatively sophisticated VUI which can be utilized when the capabilities of any given local device 14 alone are insufficient to address or respond to instructions, commands, directions, or requests issued by a user at the local device 14. The VUI at remote system 12 can be implemented with speech recognition/output hardware and/or software suitable for performing the functionality described herein.

The VUI of remote system 12 interprets the vocalized expressions of a user, communicated from a local device 14, so that remote system 12 may itself respond, or alternatively, direct the local device 14 to respond, to the commands, directions, instructions, requests, and other input spoken by the user. As such, remote system 12 completes the task of recognizing words and phrases.
The VUI at remote system 12 can be implemented with a different type of automatic speech recognition (ASR) hardware/software than local devices 14. For example, in one embodiment, rather than performing "word spotting," as may occur at local devices 14, remote system 12 may use a larger vocabulary recognizer, implemented with word and optional sentence recognition grammars. A recognition grammar specifies a set of directions, commands, instructions, or requests that, when spoken by a user, can be understood by a VUI. In other words, a recognition grammar specifies what sentences and phrases are to be recognized by the VUI. For example, if a local device 14 comprises a microwave oven, a distributed VUI for the same can include a recognition grammar that allows a user to set a cooking time by saying, "Oven high for half a minute," or "Cook on high for thirty seconds," or, alternatively, "Please cook for thirty seconds at high." Commercially available speech recognition systems with recognition grammars are provided by ASR technology vendors such as, for example, the following: Nuance Corporation of Menlo Park, Calif.; Dragon Systems of Newton, Mass.; IBM of Austin, Tex.; Kurzweil Applied Intelligence of Waltham, Mass.; Lernout & Hauspie Speech Products of Burlington, Mass.; and PureSpeech, Inc. of Cambridge, Mass.
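For illustration only, a recognition grammar covering the microwave-oven phrasings quoted above might be written as a handful of patterns. The regular-expression form and the word-to-seconds table below are assumptions made for this sketch and do not reflect any particular vendor's grammar format.

# Hypothetical sketch: a tiny recognition grammar for the microwave example,
# accepting the quoted phrasings and extracting power level and duration.
import re

NUMBER_WORDS = {"half a minute": 30, "thirty seconds": 30, "one minute": 60}

GRAMMAR = [
    re.compile(r"oven (?P<power>high|medium|low) for (?P<time>.+)"),
    re.compile(r"cook on (?P<power>high|medium|low) for (?P<time>.+)"),
    re.compile(r"please cook for (?P<time>.+?) at (?P<power>high|medium|low)"),
]

def parse_command(utterance: str):
    """Return (power, seconds) if the utterance matches the grammar, else None."""
    text = utterance.lower().strip(" .")
    for rule in GRAMMAR:
        match = rule.fullmatch(text)
        if match and match.group("time") in NUMBER_WORDS:
            return match.group("power"), NUMBER_WORDS[match.group("time")]
    return None

print(parse_command("Oven high for half a minute"))             # ('high', 30)
print(parse_command("Please cook for thirty seconds at high"))  # ('high', 30)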
Remote system 12 may process the directions, commands, instructions, or requests that it has recognized or understood from the utterances of a user. During processing, remote system 12 can, among other things, generate control signals and reply messages, which are returned to a local device 14. Control signals are used to direct or control the local device 14 in response to user input. For example, in response to a user command of "Turn up the heat to 82 degrees," control signals may direct a local device 14 incorporating a thermostat to adjust the temperature of a climate control system. Reply messages are intended for the immediate consumption of a user at the local device and may take the form of video or audio, or text to be displayed at the local device. As a reply message, the VUI at remote system 12 may issue audible output in the form of speech that is understandable by a user.
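The disclosure does not specify a wire format, but purely for illustration, the control signal and reply message returned for the thermostat example above could be packaged as in the sketch below; the field names and the JSON encoding are assumptions of this example only.

# Hypothetical sketch: one possible packaging of a control signal plus a reply
# message returned by the remote system to a local device.
import json

response = {
    "control_signal": {            # directs the local device / thermostat
        "device": "thermostat",
        "command": "set_temperature",
        "value_f": 82,
    },
    "reply_message": {             # for immediate consumption by the user
        "type": "audio/text",
        "text": "Setting the temperature to 82 degrees.",
    },
}

payload = json.dumps(response)     # what might travel back to the local device
print(payload)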
For issuing reply messages, the VUI of remote system 12 may include capability for speech generation (synthesized speech) and/or play-back (previously recorded speech). Speech generation capability can be implemented with text-to-speech (TTS) hardware/software, which converts textual information into synthesized, audible speech. Speech play-back capability may be implemented with an analog-to-digital (A/D) converter driven by CD ROM (or other digital memory device), a tape player, a laser disc player, a specialized integrated circuit (IC) device, or the like, which plays back previously recorded human speech.
`
In speech play-back, a person (preferably a voice model) recites various statements which may desirably be issued during an interactive session with a user at a local device 14 of distributed VUI system 10. The person's voice is recorded as the recitations are made. The recordings are separated into discrete messages, each message comprising one or more statements that would desirably be issued in a particular context (e.g., greeting, farewell, requesting instructions, receiving instructions, etc.). Afterwards, when a user interacts with distributed VUI system 10, the recorded messages are played back to the user when the proper context arises.

The reply messages generated by the VUI at remote system 12 can be made to be consistent with any messages provided by the resident VUI of a local device 14. For example, if speech play-back capability is used for generating speech, the same person's voice may be recorded for messages output by the resident VUI of the local device 14 and the VUI of remote system 12. If synthesized (computer-generated) speech capability is used, a similar sounding artificial voice may be provided for the VUIs of both local devices 14 and remote system 12. In this way, the distributed VUI of system 10 provides to a user an interactive interface which is "seamless" in the sense that the user cannot distinguish between the simpler, resident VUI of the local device 14 and the more sophisticated VUI of remote system 12.
In one embodiment, the speech recognition and speech play-back capabilities described herein can be used to implement a voice user interface with personality, as taught by U.S. patent application Ser. No. 09/071,717, entitled "Voice User Interface With Personality," the text of which is incorporated herein by reference.
Remote system 12 may also comprise hardware and/or software supporting the interpretation and issuance of commands, such as dual tone multiple frequency (DTMF) commands, so that a user may alternatively interact with remote system 12 using an alternative input device, such as a telephone key pad.
Remote system 12 may be in communication with the "Internet," thus providing access thereto for users at local devices 14. The Internet is an interconnection of computer "clients" and "servers" located throughout the world and exchanging information according to Transmission Control Protocol/Internet Protocol (TCP/IP), Internetwork Packet Exchange/Sequenced Packet Exchange (IPX/SPX), AppleTalk, or other suitable protocol. The Internet supports the distributed application known as the "World Wide Web." Web servers may exchange information with one another using a protocol known as hypertext transport protocol (HTTP). Information may be communicated from one server to any other computer using HTTP and is maintained in the form of web pages, each