`(12) Patent Application Publication (10) Pub. No.: US 2002/0072918 A1
`White et al.
(43) Pub. Date: Jun. 13, 2002
`
`(54) DISTRIBUTED VOICE USER INTERFACE
`
(76) Inventors: George M. White, Delray Beach, FL (US); James J. Buteau, Mountain View, CA (US); Glen E. Shires, Danville, CA (US); Kevin J. Surace, Sunnyvale, CA (US); Steven Markman, Los Gatos, CA (US)

Correspondence Address:
SKJERVEN MORRILL MACPHERSON LLP
THREE EMBARCADERO CENTER
28TH FLOOR
SAN FRANCISCO, CA 94111 (US)

Appl. No.: 10/057,523

Filed: Jan. 22, 2002

Related U.S. Application Data

Division of application No. 09/290,508, filed on Apr. 12, 1999.

Publication Classification

Int. Cl.7: G10L 21/00; G10L 11/00
U.S. Cl.: 704/270.1
`
`(57)
`
`ABSTRACT
`
A distributed voice user interface system includes a local
device which receives speech input issued from a user. Such
speech input may specify a command or a request by the
user. The local device performs preliminary processing of
the speech input and determines whether it is able to respond
to the command or request by itself. If not, the local device
initiates communication with a remote system for further
processing of the speech input.
`
[Front-page drawing, FIG. 1: Distributed Voice User Interface System, showing a remote system 12 connected through a telecommunications network and a LAN to a number of local devices 14.]
`
`
`
`
[Sheet 1 of 5: FIG. 1, the distributed voice user interface system, showing remote system 12, a telecommunications network, and local devices 14.]
`
`
`
[Sheet 2 of 5: FIG. 2, details for a local device.]
`
`
`
[Sheet 3 of 5: FIG. 3, details for a remote system and its connections to local devices.]
`
`
`
[Sheet 4 of 5: FIG. 4, flow diagram of an exemplary method of operation for a local device: receive speech input; process it at the local device; if local processing is not sufficient, establish a connection between the local device and the remote system, transmit data/speech from the local device to the remote system, wait for a response from the remote system, and then terminate the connection; take action based on the processing; repeat until the session ends.]
`
`
`
[Sheet 5 of 5: FIG. 5, flow diagram of an exemplary method of operation for a remote system: compare the received speech input to recognition grammars; if a match is found, respond to the command or request; otherwise request more input or have the user select from listed commands or requests, comparing again to the grammars for a limited number of attempts; continue until the session ends.]
`
`
`
`
`DISTRIBUTED VOICE USER INTERFACE
`
CROSS-REFERENCE TO RELATED APPLICATIONS
`
`[0001] This Application relates to the subject matter dis-
`closed in the following co-pending U.S. Applications: U.S.
`application Ser. No. 08/609,699, filed Mar. 1, 1996, entitled
`“Method and Apparatus For Telephonically Accessing and
Navigating the Internet," and U.S. application Ser. No.
`09/071,717, filed May 1, 1998, entitled “Voice User Inter-
`face With Personality.” These co-pending applications are
`assigned to the present Assignee and are incorporated herein
`by reference.
`
`BACKGROUND OF THE INVENTION
`
[0002] A voice user interface (VUI) allows a human user
`to interact with an intelligent, electronic device (e.g., a
`computer) by merely “talking” to the device. The electronic
`device is thus able to receive, and respond to, directions,
`commands, instructions, or requests issued verbally by the
`human user. As such, a VUI facilitates the use of the device.
`
`[0003] A typical VUI is implemented using various tech-
`niques which enable an electronic device to “understand”
`particular words or phrases spoken by the human user, and
`to output or “speak” the same or different words/phrases for
`prompting, or responding to, the user. The words or phrases
understood and/or spoken by a device constitute its "vocabu-
`lary.” In general,
`the number of words/phrases within a
`device’s vocabulary is directly related to the computing
`power which supports its VUI. Thus, a device with more
`computing power can understand more words or phrases
`than a device with less computing power.
`
`[0004] Many modern electronic devices, such as personal
`digital assistants (PDAs), radios, stereo systems, television
`sets, remote controls, household security systems, cable and
`satellite receivers, video game stations, automotive dash-
`board electronics, household appliances, and the like, have
`some computing power, but typically not enough to support
a sophisticated VUI with a large vocabulary—i.e., a VUI
`capable of understanding and/or speaking many words and
`phrases. Accordingly, it is generally pointless to attempt to
`implement a VUI on such devices as the speech recognition
`and speech output capabilities would be far too limited for
`practical use.
`
`SUMMARY
`
[0005] The present invention provides a system and
`method for a distributed voice user interface (VUI) in which
`a remote system cooperates with one or more local devices
`to deliver a sophisticated Voice user interface at the local
`devices. The remote system and the local devices may
`communicate via a suitable network, such as, for example,
a telecommunications network or a local area network
(LAN). In one embodiment, the distributed VUI is achieved
`by the local devices performing preliminary signal process-
`ing (e.g., speech parameter extraction and/or elementary
`speech recognition)
`and accessing more sophisticated
`speech recognition and/or
`speech output
`functionality
`implemented at the remote system only if and when neces-
sary.
`
`[0006] According to an embodiment of the present inven-
`tion, a local device includes an input device which can
`
receive speech input issued from a user. A processing
`component, coupled to the input device, extracts feature
`parameters (which can be frequency domain parameters
`and/or time domain parameters) from the speech input for
`processing at the local device or, alternatively, at a remote
`system.
`
`[0007] According to another embodiment of the present
`invention, a distributed voice user interface system includes
`a local device which continuously monitors for speech input
`issued by a user, scans the speech input for one or more
`keywords, and initiates communication with a remote sys-
`tem when a keyword is detected. The remote system
`receives the speech input from the local device and can then
`recognize words therein.
`
`[0008] According to yet another embodiment of the
`present invention, a local device includes an input device for
`receiving speech input issued from a user. Such speech input
`may specify a command or a request by the user. A pro-
`cessing component, coupled to the input device, is operable
`to perform preliminary processing of the speech input. The
`processing component determines whether the local device
`is by itself able to respond to the command or request
`specified in the speech input. If not, the processing compo-
`nent
`initiates communication with a remote system for
`further processing of the speech input.
`
`[0009] According to still another embodiment of the
`present invention, a remote system includes a transceiver
`which receives speech input, such speech input previously
`issued by a user and preliminarily processed and forwarded
`by a local device. A processing component, coupled to the
`transceiver at the remote system, recognizes words in the
`speech input.
`
`[0010] According to still yet another embodiment of the
`present invention, a method includes the following steps:
`continuously monitoring at a local device for speech input
`issued by a user; scanning the speech input at the local
`device for one or more keywords; initiating a connection
`between the local device and a remote system when a
`keyword is detected, and passing the speech input, or
`appropriate feature parameters extracted from the speech
`input, from the local device to the remote system for
`interpretation.
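
As a rough illustration only, and not part of the disclosure, the following Python sketch shows one way the steps recited above could be organized at a local device: monitor continuously, scan for a keyword, and forward feature parameters over a transient connection when a keyword is found. Every function name and the example keyword are assumptions of the sketch.

# Hypothetical sketch of the sequence in paragraph [0010].  All names here are
# illustrative placeholders, not terms used by the patent.
from typing import Callable, Iterable, Optional

def run_local_device(
    next_frame: Callable[[], Optional[str]],      # yields a chunk of speech, or None to stop
    spot_keyword: Callable[[str], bool],          # elementary, local keyword scan
    extract_features: Callable[[str], list],      # e.g. cepstral or LPC parameters
    send_to_remote: Callable[[list], str],        # transient connection to the remote system
) -> Iterable[str]:
    """Yield the remote system's replies for frames that contain a keyword."""
    while True:
        frame = next_frame()
        if frame is None:                         # no more input; end the session
            return
        if not spot_keyword(frame):               # nothing of interest; keep listening
            continue
        features = extract_features(frame)        # preliminary processing at the local device
        yield send_to_remote(features)            # remote system interprets and responds

if __name__ == "__main__":
    # Toy demonstration with canned "speech" frames and trivial stand-ins.
    frames = iter(["background noise", "galaxy play some jazz", None])
    replies = run_local_device(
        next_frame=lambda: next(frames),
        spot_keyword=lambda f: "galaxy" in f,     # wake-word style keyword scan
        extract_features=lambda f: f.split(),     # stand-in for spectral feature extraction
        send_to_remote=lambda feats: f"remote recognized: {feats}",
    )
    for reply in replies:
        print(reply)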
`
[0011] A technical advantage of the present invention
includes providing functional control over various local
`devices (e.g., PDAs, radios, stereo systems, television sets,
`remote controls, household security systems, cable and
`satellite receivers, video game stations, automotive dash-
`board electronics, household appliances, etc.) using sophis-
`ticated speech recognition capability enabled primarily at a
`remote site. The speech recognition capability is delivered to
`each local device in the form of a distributed VUI. Thus,
`functional control of the local devices via speech recognition
can be provided in a cost-effective manner.
`
`[0012] Another technical advantage of the present inven-
`tion includes providing the vast bulk of hardware and/or
`software for implementing a sophisticated voice user inter-
`face at a single remote system, while only requiring minor
`hardware/software implementations at each of a number of
`local devices. This substantially reduces the cost of deploy-
ing a sophisticated voice user interface at the various local
`devices, because the incremental cost for each local device
`
`
`
`
`
`is small. Furthermore, the sophisticated voice user interface
`is delivered to each local device without substantially
`increasing its size. In addition, the power required to operate
`each local device is minimal since most of the capability for
`the voice user interface resides in the remote system; this can
`be crucial for applications in which a local device is battery-
`powered. Furthermore,
`the single remote system can be
`more easily maintained and upgraded with new features or
`hardware, than can the individual local devices.
`
`[0013] Yet another technical advantage of the present
`invention includes providing a transient, on-demand con-
`nection between each local device and the remote system—
`i.e., communication between a local device and the remote
`system is enabled only if the local device requires the
`assistance of the remote system. Accordingly, communica-
`tion costs, such as, for example, long distance charges, are
`minimized. Furthermore, the remote system is capable of
`supporting a larger number of local devices if each such
device is only connected on a transient basis.
`
`[0014] Still another technical advantage of the present
`invention includes providing the capability for data to be
`downloaded from the remote system to each of the local
`devices, either automatically or in response to a user’s
`request. Thus, the data already present in each local device
`can be updated, replaced, or supplemented as desired, for
example, to modify the voice user interface capability (e.g.,
`speech recognition/output) supported at the local device. In
`addition, data from news sources or databases can be down-
`loaded (e.g., from the Internet) and made available to the
`local devices for output to users.
`
`[0015] Other aspects and advantages of the present inven-
`tion will become apparent from the following descriptions
`and accompanying drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0016] For a more complete understanding of the present
`invention and for further features and advantages, reference
`is now made to the following description taken in conjunc-
`tion with the accompanying drawings, in which:
`
`[0017] FIG. 1 illustrates a distributed voice user interface
`system, according to an embodiment of the present inven-
`tion;
`
`[0018] FIG. 2 illustrates details for a local device, accord-
`ing to an embodiment of the present invention;
`
`[0019] FIG. 3 illustrates details for a remote system,
`according to an embodiment of the present invention;
`
`[0020] FIG. 4 is a flow diagram of an exemplary method
`of operation for a local device, according to an embodiment
`of the present invention; and
`
[0021] FIG. 5 is a flow diagram of an exemplary method
`of operation for a remote system, according to an embodi-
`ment of the present invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`[0022] The preferred embodiments of the present inven-
`tion and their advantages are best understood by referring to
`FIGS. 1 through 5 of the drawings. Like numerals are used
`for like and corresponding parts of the various drawings.
`
`[0023] Turning first to the nomenclature of the specifica-
`tion, the detailed description which follows is represented
`largely in terms of processes and symbolic representations
`of operations performed by conventional computer compo-
`nents, such as a central processing unit (CPU) or processor
`associated with a general purpose computer system, memory
`storage devices for the processor, and connected pixel-
`oriented display devices. These operations include the
`manipulation of data bits by the processor and the mainte-
`nance of these bits within data structures resident in one or
`more of the memory storage devices. Such data structures
`impose a physical organization upon the collection of data
`bits stored within computer memory and represent specific
`electrical or magnetic elements. These symbolic represen-
`tations are the means used by those skilled in the art of
`computer programming and computer construction to most
effectively convey teachings and discoveries to others
`skilled in the art.
`
`[0024] For purposes of this discussion, a process, method,
`routine, or sub-routine is generally considered to be a
`sequence of computer-executed steps leading to a desired
`result. These steps generally require manipulations of physi-
`cal quantities. Usually, although not necessarily, these quan-
`tities take the form of electrical, magnetic, or optical signals
`capable of being stored, transferred, combined, compared, or
`otherwise manipulated. It is conventional for those skilled in
`the art to refer to these signals as bits, values, elements,
`symbols, characters, text, terms, numbers, records, files, or
`the like. It should be kept in mind, however, that these and
`some other terms should be associated with appropriate
`physical quantities for computer operations, and that these
`terms are merely conventional labels applied to physical
`quantities that exist within and during operation of the
`computer.
`
[0025] It should also be understood that manipulations
within the computer are often referred to in terms such as
`adding, comparing, moving, or the like, which are often
`associated with manual operations performed by a human
`operator. It must be understood that no involvement of the
`human operator may be necessary, or even desirable, in the
`present
`invention. The operations described herein are
`machine operations performed in conjunction with the
`human operator or user that interacts with the computer or
`computers.
`
[0026] In addition, it should be understood that the pro-
grams, processes, methods, and the like, described herein are
`but an exemplary implementation of the present invention
`and are not related, or limited, to any particular computer,
`apparatus, or computer language. Rather, various types of
`general purpose computing machines or devices may be
`used with programs constructed in accordance with the
`teachings described herein. Similarly, it may prove advan-
`tageous to construct a specialized apparatus to perform the
`method steps described herein by way of dedicated com-
puter systems with hard-wired logic or programs stored in
non-volatile memory, such as read-only memory (ROM).
`
`[0027] Network System Overview
`
`[0028] Referring now to the drawings, FIG. 1 illustrates a
`distributed voice user interface (VUI) system 10, according
`to an embodiment of the present
`invention.
`In general,
`distributed VUI system 10 allows one or more users to
interact, via speech or verbal communication, with one or
`
`
`
`
`
`more electronic devices or systems into which distributed
`VUI system 10 is incorporated, or alternatively, to which
`distributed VUI system 10 is connected. As used herein, the
`terms "connected,”“coupled,” or any variant thereof, means
`any connection or coupling, either direct or
`indirect,
`between two or more elements; the coupling or connection
`can be physical or logical.
[0029] More particularly, distributed VUI system 10
includes a remote system 12 which may communicate with
`a number of local devices 14 (separately designated with
`reference numerals 14a, 14b, 14c, 14d, 14e, 14f, 14g, 14h,
and 14i) to implement one or more distributed VUIs. In one
`embodiment, a “distributed VUI” comprises a voice user
`interface that may control the functioning of a respective
`local device 14 through the services and capabilities of
`remote system 12. That is, remote system 12 cooperates with
`each local device 14 to deliver a separate, sophisticated VUI
`capable of responding to a user and controlling that local
`device 14. In this way, the sophisticated VUIs provided at
`local devices 14 by distributed VUI system 10 facilitate the
`use of the local devices 14. In another embodiment,
`the
`distributed VUI enables control of another apparatus or
`system (e.g., a database or a website), in which case, the
`local device 14 serves as a “medium.”
`
`[0030] Each such VUI of system 10 may be “distributed”
`in the sense that speech recognition and speech output
`software and/or hardware can be implemented in remote
`system 12 and the corresponding functionality distributed to
the respective local device 14. Some speech recognition/
`output software or hardware can be implemented in each of
`local devices 14 as well.
`
`[0031] When implementing distributed VUI system 10
`described herein, a number of factors may be considered in
`dividing
`the
`speech
`recognition/output
`functionality
`between local devices 14 and remote system 12. These
`factors may include, for example, the amount of processing
`and memory capability available at each of local devices 14
and remote system 12; the bandwidth of the link between
`each local device 14 and remote system 12; the kinds of
`commands,
`instructions, directions, or requests expected
`from a user, and the respective, expected frequency of each;
`the expected amount of use of a local device 14 by a given
`user; the desired cost for implementing each local device 14;
`etc.
`In one embodiment, each local device 14 may be
`customized to address the specific needs of a particular user,
`thus providing a technical advantage.
`[0032] Local Devices
`[0033] Each local device 14 can be an electronic device
`with a processor having a limited amount of processing or
`computing power. For example, a local device 14 can be a
`relatively small, portable, inexpensive, and/or low power-
`consuming “smart device,” such as a personal digital assis-
`tant (PDA), a wireless remote control (e.g., for a television
`set or stereo system), a smart telephone (such as a cellular
`phone or a stationary phone with a screen), or smart jewelry
`(e.g., an electronic watch). A local device 14 may also
`comprise or be incorporated into a larger device or system,
`such as a television set, a television set top box (e.g., a cable
`receiver, a satellite receiver, or a video game station), a video
`cassette recorder, a video disc player, a radio, a stereo
`system, an automobile dashboard component, a microwave
`oven, a refrigerator, a household security system, a climate
`control system (for heating and cooling), or the like.
`
[0034] In one embodiment, a local device 14 uses elemen-
tary techniques (e.g., the push of a button) to detect the onset
`of speech. Local device 14 then performs preliminary pro-
`cessing on the speech waveform. For example, local device
`14 may transform speech into a series of feature vectors or
`frequency domain parameters (which differ from the digi-
`tized or compressed speech used in vocoders or cellular
`phones). Specifically, from the speech waveform, the local
`device 14 may extract various feature parameters, such as,
for example, cepstral coefficients, Fourier coefficients, linear
predictive coding (LPC) coefficients, or other spectral
`parameters in the time or frequency domain. These spectral
`parameters (also referred to as features in automatic speech
`recognition systems), which would normally be extracted in
`the first stage of a speech recognition system, are transmitted
`to remote system 12 for processing therein. Speech recog-
`nition and/or speech output hardware/software at remote
`system 12 (in communication with the local device 14) then
`provides a sophisticated VUI through which a user can input
`commands, instructions, or directions into, and/or retrieve
`information or obtain responses from, the local device 14.
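
As a minimal sketch of this kind of preliminary processing, the following Python fragment computes a few real cepstral coefficients for one frame of audio using NumPy. The frame length, sampling rate, and coefficient count are arbitrary assumptions; a production front end would typically add pre-emphasis, overlapping windows, mel filtering, and similar refinements.

# Illustrative feature extraction: one frame of speech is reduced to a short
# vector of cepstral coefficients, which is what would be sent upstream
# instead of raw audio.
import numpy as np

def cepstral_coefficients(frame: np.ndarray, num_coeffs: int = 12) -> np.ndarray:
    """Return the first num_coeffs real cepstral coefficients of one frame."""
    windowed = frame * np.hamming(len(frame))     # taper the frame edges
    spectrum = np.abs(np.fft.rfft(windowed))      # magnitude spectrum
    log_spectrum = np.log(spectrum + 1e-10)       # avoid log(0)
    cepstrum = np.fft.irfft(log_spectrum)         # real cepstrum
    return cepstrum[:num_coeffs]

if __name__ == "__main__":
    # 30 ms frame of a synthetic 8 kHz "voiced" signal, purely for demonstration.
    sample_rate = 8000
    t = np.arange(int(0.03 * sample_rate)) / sample_rate
    frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
    print(cepstral_coefficients(frame))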
`
[0035] In another embodiment, in addition to performing
preliminary signal processing (including feature parameter
`extraction), at least a portion of local devices 14 may each
`be provided with its own resident VUI. This resident VUI
`allows the respective local device 14 to understand and
`speak to a user, at least on an elementary level, without
`remote system 12. To accomplish this, each such resident
`VUI may include, or be coupled to, suitable input/output
`devices (e.g., microphone and speaker) for receiving and
`outputting audible speech. Furthermore, each resident VUI
`may include hardware and/or software for implementing
`speech recognition (e.g., automatic speech recognition
`(ASR) software) and speech output (e.g., recorded or gen-
`erated speech output software). An exemplary embodiment
`for a resident VUI of a local device 14 is described below in
`more detail.
`
`[0036] A local device 14 with a resident VUI may be, for
example, a remote control for a television set. A user may
`issue a command to the local device 14 by stating “Channel
`four” or “Volume up,” to which the local device 14 responds
`by changing the channel on the television set to channel four
`or by turning up the volume on the set.
`
`[0037] Because each local device 14, by definition, has a
processor with limited computing power,
`the respective
`resident VUI for a local device 14, taken alone, generally
`does not provide extensive speech recognition and/or speech
`output capability. For example, rather than implement a
`more complex and sophisticated natural
`language
`technique for speech recognition, each resident VUI may
`perform “word spotting” by scanning speech input for the
`occurrence of one or more “keywords.” Furthermore, each
`local device 14 will have a relatively limited vocabulary
`(e.g., less than one hundred words) for its resident VUI. As
`such, a local device 14, by itself, is only capable of respond-
`ing to relatively simple commands, instructions, directions,
`or requests from a user.
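
The following Python sketch illustrates, with invented keyword and action names, how a resident VUI with a vocabulary of well under one hundred words might perform such word spotting on the remote-control commands mentioned above and defer to the remote system when no keyword is found.

# Hypothetical word spotting for a small resident vocabulary.  The keyword
# table, qualifier words, and action names are assumptions of this sketch.
KEYWORD_ACTIONS = {
    "channel": "SET_CHANNEL",     # e.g. "channel four"
    "volume": "ADJUST_VOLUME",    # e.g. "volume up"
    "power": "TOGGLE_POWER",
}
QUALIFIERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
              "up": +1, "down": -1}

def spot(utterance: str):
    """Return (action, qualifier) if a keyword is found, else None."""
    words = utterance.lower().split()
    for i, word in enumerate(words):
        if word in KEYWORD_ACTIONS:
            # Look at the following word for a qualifier such as a number.
            arg = next((QUALIFIERS[w] for w in words[i + 1:i + 2] if w in QUALIFIERS), None)
            return KEYWORD_ACTIONS[word], arg
    return None                   # no keyword: defer to the remote system

print(spot("please set the channel four"))    # ('SET_CHANNEL', 4)
print(spot("volume up a little"))             # ('ADJUST_VOLUME', 1)
print(spot("what is the weather tomorrow"))   # None -> escalate to the remote system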
`
[0038] In instances where the speech recognition and/or
speech output capability provided by a resident VUI of a
`local device 14 is not adequate to address the needs of a user,
`the resident VUI can be supplemented with the more exten-
`sive capability provided by remote system 12. Thus, the
`
`
`
`
`
`local device 14 can be controlled by spoken commands and
`otherwise actively participate in verbal exchanges with the
`user by utilizing more complex speech recognition/output
`hardware and/or software implemented at remote system 12
(as further described herein).
`
`[0039] Each local device 14 may further comprise a
`manual input device—such as a button, a toggle switch, a
`keypad, or the like—by which a user can interact with the
`local device 14 (and also remote system 12 via a suitable
`communication network) to input commands, instructions,
`requests, or directions without using either the resident or
`distributed VUI. For example, each local device 14 may
`include hardware and/or software supporting the interpreta-
`tion and issuance of dual tone multiple frequency (DTMF)
commands. In one embodiment, such manual input device
`can be used by the user to activate or turn on the respective
`local device 14 and/or initiate communication with remote
`system 12.
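
For illustration, one common way to interpret DTMF input is the Goertzel algorithm, which measures signal power at the eight standard DTMF tone frequencies and picks the strongest row and column tones. The Python sketch below is not taken from the disclosure; the block length and sampling rate are arbitrary assumptions.

# Illustrative DTMF digit detection with the Goertzel algorithm.
import math

ROW_FREQS = [697, 770, 852, 941]         # Hz, standard DTMF row tones
COL_FREQS = [1209, 1336, 1477, 1633]     # Hz, standard DTMF column tones
KEYPAD = [["1", "2", "3", "A"],
          ["4", "5", "6", "B"],
          ["7", "8", "9", "C"],
          ["*", "0", "#", "D"]]

def goertzel_power(samples, sample_rate, freq):
    """Signal power near freq using the Goertzel recurrence."""
    n = len(samples)
    k = round(n * freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detect_digit(samples, sample_rate=8000):
    """Return the key whose row and column tones are strongest in the block."""
    row = max(range(4), key=lambda i: goertzel_power(samples, sample_rate, ROW_FREQS[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, sample_rate, COL_FREQS[i]))
    return KEYPAD[row][col]

# Synthesize the key "5" (770 Hz + 1336 Hz) as a quick self-check.
rate, duration = 8000, 0.05
tone = [math.sin(2 * math.pi * 770 * n / rate) + math.sin(2 * math.pi * 1336 * n / rate)
        for n in range(int(rate * duration))]
print(detect_digit(tone, rate))          # expected output: 5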
`
`[0040] Remote System
`
[0041] In general, remote system 12 supports a relatively
sophisticated VUI which can be utilized when the capabili-
`ties of any given local device 14 alone are insufficient to
`address or respond to instructions, commands, directions, or
`requests issued by a user at the local device 14. The VUI at
`remote system 12 can be implemented with speech recog-
`nition/output hardware and/or software suitable for perform-
`ing the functionality described herein.
`
`[0042] The VUI of remote system 12 interprets the vocal-
`ized expressions of a user—communicated from a local
`device 14—so that remote system 12 may itself respond, or
`alternatively, direct the local device 14 to respond, to the
`commands, directions, instructions, requests, and other input
`spoken by the user. As such, remote system 12 completes the
`task of recognizing words and phrases.
`
`[0043] The VUI at remote system 12 can be implemented
`with a different type of automatic speech recognition (ASR)
`hardware/software than local devices 14. For example, in
`one embodiment, rather than performing “word spotting,” as
`may occur at local devices 14, remote system 12 may use a
`larger vocabulary recognizer, implemented with word and
`optional sentence recognition grammars. A recognition
`grammar specifies a set of directions, commands, instruc-
`tions, or requests that, when spoken by a user, can be
`understood by a VUI. In other words, a recognition grammar
`specifies what sentences and phrases are to be recognized by
`the VUI. For example, if a local device 14 comprises a
`microwave oven, a distributed VUI for the same can include
`a recognition grammar that allows a user to set a cooking
`time by saying, “Oven high for half a minute,” or “Cook on
`high for thirty seconds,” or, alternatively, “Please cook for
`thirty seconds at high.” Commercially available speech
`recognition systems with recognition grammars are pro-
`vided by ASR technology vendors such as, for example, the
`following: Nuance Corporation of Menlo Park, Calif.;
Dragon Systems of Newton, Mass.; IBM of Austin, Tex.;
Kurzweil Applied Intelligence of Waltham, Mass.; Lernout &
Hauspie Speech Products of Burlington, Mass.; and Pure-
`Speech, Inc. of Cambridge, Mass.
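
As a toy illustration of such a recognition grammar, the following Python sketch accepts the microwave phrasings quoted above and extracts a power level and cooking time. The patterns and slot names are assumptions of this sketch and do not represent the grammar format of any vendor listed above.

# A miniature "recognition grammar" for the microwave-oven example in [0043].
import re

DURATIONS = {"half a minute": 30, "thirty seconds": 30, "one minute": 60}

# Each pattern captures a power level and a duration, in either order.
GRAMMAR = [
    re.compile(r"(?:please )?(?:cook|oven)(?: on)? (?P<power>high|medium|low) for (?P<duration>.+)"),
    re.compile(r"(?:please )?cook for (?P<duration>.+?) at (?P<power>high|medium|low)"),
]

def parse_command(utterance: str):
    """Return {'power': ..., 'seconds': ...} if the utterance fits the grammar."""
    text = utterance.lower().strip().rstrip(".")
    for pattern in GRAMMAR:
        match = pattern.fullmatch(text)
        if match and match.group("duration") in DURATIONS:
            return {"power": match.group("power"),
                    "seconds": DURATIONS[match.group("duration")]}
    return None   # outside the grammar: the VUI would re-prompt or escalate

for phrase in ["Oven high for half a minute",
               "Cook on high for thirty seconds",
               "Please cook for thirty seconds at high"]:
    print(phrase, "->", parse_command(phrase))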
`
`[0044] Remote system 12 may process the directions,
`commands, instructions, or requests that it has recognized or
`understood from the utterances of a user. During processing,
`
`remote system 12 can, among other things, generate control
`signals and reply messages, which are returned to a local
`device 14. Control signals are used to direct or control the
`local device 14 in response to user input. For example, in
`response to a user command of “Turn up the heat to 82
`degrees,” control signals may direct a local device 14
`incorporating a thermostat to adjust the temperature of a
`climate control system. Reply messages are intended for the
`immediate consumption of a user at the local device and may
`take the form of video or audio, or text to be displayed at the
`local device. As a reply message, the VUI at remote system
`12 may issue audible output in the form of speech that is
`understandable by a user.
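
A hypothetical sketch of this processing step is shown below: the remote system maps a recognized utterance onto a control signal for the local device and a reply message for the user, using the thermostat example above. The message fields and wording are invented for illustration.

# Illustrative mapping from recognized text to a control signal plus a reply
# message; field names and formats are assumptions of this sketch.
import json
import re

def process_recognized_text(text: str):
    """Map a recognized utterance to (control_signal, reply_message)."""
    match = re.search(r"(?:turn up|set) the (?:heat|temperature) to (\d+) degrees", text.lower())
    if match:
        target = int(match.group(1))
        control_signal = {"device": "thermostat", "command": "SET_TEMPERATURE", "value": target}
        reply_message = f"Okay, setting the temperature to {target} degrees."
        return control_signal, reply_message
    return None, "Sorry, I did not understand that request."

signal, reply = process_recognized_text("Turn up the heat to 82 degrees")
print(json.dumps(signal))   # returned to the local device as a control signal
print(reply)                # spoken or displayed to the user as a reply message
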
[0045] For issuing reply messages, the VUI of remote
system 12 may include capability for speech generation
(synthesized speech) and/or play-back (previously recorded
`speech). Speech generation capability can be implemented
`with text-to-speech (TTS) hardware/software, which con-
`verts textual information into synthesized, audible speech.
`Speech play—back capability may be implemented with an
`analog-to-digital (A/D) converter driven by CD ROM (or
`other digital memory device), a tape player, a laser disc
`player, a specialized integrated circuit (IC) device, or the
`like, which plays back previously recorded human speech.
`[0046]
`In speech play—back, a person (preferably a voice
`model) recites various statements which may desirably be
`issued during an interactive session with a user at a local
`device 14 of distributed VUI system 10. The person’s voice
`is recorded as the recitations are made. The recordings are
`separated into discrete messages, each message comprising
`one or more statements that would desirably be issued in a
`particular context
`(e.g., greeting,
`farewell,
`requesting
`instructions, receiving instructions, etc.). Afterwards, when
`a user interacts with distributed VUI system 10, the recorded
`messages are played back to the user when the proper
`context arises.
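
The play-back scheme can be pictured as a simple lookup from conversational context to a pre-recorded message, as in the sketch below; the contexts, file names, and the audio stand-in are all assumptions of the sketch.

# Illustrative context-based play-back of pre-recorded messages.
RECORDED_MESSAGES = {
    "greeting": "recordings/greeting.wav",               # e.g. "Hello, how can I help you?"
    "request_instructions": "recordings/what_next.wav",  # e.g. "What would you like me to do?"
    "farewell": "recordings/goodbye.wav",                 # e.g. "Goodbye."
}

def play_audio(path: str) -> None:
    # Stand-in for an audio device; a real system would stream the file to a speaker.
    print(f"[playing {path}]")

def respond(context: str) -> None:
    """Play the recorded message that fits the current conversational context."""
    path = RECORDED_MESSAGES.get(context, RECORDED_MESSAGES["request_instructions"])
    play_audio(path)

respond("greeting")
respond("farewell")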
`
`[0047] The reply messages generated by the VUI at
`remote system 12 can be made to be consistent with any
`messages provided by the resident VUI of a local device 14.
`For example,
if speech play-back capability is used for
`generating speech, the same person’s voice may be recorded
`for messages output by the resident VUI of the local device
`14 and the VUI of remote system 12. If synthesized (com-
`puter-generated) speech capability is used, a similar sound-
ing artificial voice may be provided for the VUIs of both
`local devices 14 and remote system 12. In this way, the
`distributed VUI of system 10 provides to a user an interac-
`tive interface which is “seamless” in the sense that the user
`cannot distinguish between the simpler, resident VUI of the
`local device 14 and the more sophisticated VUI of remote
`system 12.
`[0048]
`In one embodiment, the speech recognition and
`speech play—back capabilities described herein can be used
`to implement a voice user interface with personality, as
`taught by U.S. patent application Ser. No. 09/071,717,
`entitled “Voice User Interface With Personality,” the text of
`which is incorporated herein by reference.
`[0049] Remote system 12 may also comprise hardware
`and/or software supporting the interpretation and issuance of
`commands, such as dual tone multiple frequency (DTMF)
`commands, so that a user may alternatively interact with
`remote system 12 using an alternative input device, such as
`a telephone key pad.
`
`
`
`
`
`[0050] Remote system 12 may be in communication with
`the “Internet,” thus providing access thereto for users at
`local devices 14. The Internet
`is an interconnection of
`computer “clients” and “servers” located throughout
`the
`world and exchanging information according to Transmis-
`sion Control Protocol/Internet Protocol (TCP/IP), Internet-
work Packet eXchange/Sequenced Packet eXchange (IPX/
`SPX), AppleTalk, or other suitable protocol. The Internet
`supports the distributed application known as the “World
`Wide Web.” Web servers may exchange information with
`one another using a protocol known as hypertext transport
protocol (HTTP). Information may be communicated from
`one server
`to any other computer using HTTP and is
`maintained in the form of web pages, each of which can be
`identified by a respective uniform resource locator (URL).
`Remote system 12 may function as a client to interconnect
`with Web servers. The interconnection may use any of a
`variety of communication links, such as, for example, a local
`telephone communication line or a dedicated communica-
`tion line. Remote system 12 may comprise and locally
`execute a “web browser” or “web proxy” program. A web
`browser is a computer program that allows remote system
`12, acting as a client,
`to exchange information with the
`World Wide Web. Any of a variety of web browsers are
`available,
`such
`as NETSCAPE NAVIGATOR from
Netscape Communications Corp. of Mountain View, Calif.,
`INTERNET EXPLORER from Microsoft Corporation of
Redmond, Wash., and others that allow users to conve-
`niently access and navigate the Internet. A web proxy is a
`computer program which (via the Internet) can, for example,
`electronically integrate the systems of a company and its
`vendors and/or customers, support business transacted elec-
tronically over the network (i.e., "e-commerce"), and pro-
`vide automated access to Web-enabled resources. Any num-
`ber
`of web proxies
`are
`available,
`such
`as B2B
`INTEGRATION SERVER from webMethods of Fairfax,
`Va., and MICROSOFT PROXY SERVER from Microsoft
`Corporation of Redmond, Wash. The hardware, software,
`and protocols—as well as the underlying concepts and
`techniques—supporting the Internet are generally under-
`stood by those in the art.
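
As a minimal sketch of the remote system acting as a web client, the following Python fragment retrieves a page identified by a URL over HTTP so that its content could be relayed to a local device. The URL is a placeholder, and a deployed system might instead work through a web proxy as described above.

# Illustrative HTTP retrieval of a web page by URL.
from urllib.request import urlopen

def fetch_page(url: str) -> str:
    """Retrieve a web page by its URL and return the body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

if __name__ == "__main__":
    html = fetch_page("http://example.com/")   # placeholder URL
    print(html[:200])   # e.g. the beginning of the page, to be summarized for speech output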
`
`[0051] Communication Network
`
`[0052] One or more suitable communication networks
`enable local devices 14 to communicate with remote system
`12. For example, as shown, local devices 14