(19) United States
`(12) Patent Application Publication (10) Pub. No.: US 2002/0072918 A1
`White et al.
`(43) Pub. Date: Jun. 13, 2002
`
`US 20020072918A1
`
`(54) DISTRIBUTED VOICE USER INTERFACE
`
`(76) Inventors: George M. White, Delray Beach, FL
`(US); James J. Buteau, Mountain
`View, CA (US); Glen E. Shires,
`Danville, CA (US); Kevin J. Surace,
`Sunnyvale, CA (US); Steven
`Markman, Los Gatos, CA (US)
`
`Correspondence Address:
`SKJERVEN MORRILL MACPHERSON LLP
`THREE EMBARCADERO CENTER
`28TH FLOOR
`SAN FRANCISCO, CA 94111 (US)
`
`Appl. No.: 10/057,523
`
`Filed: Jan. 22, 2002
`
`Related U.S. Application Data
`
`Division of application No. 09/290,508, filed on Apr. 12, 1999.
`
`Publication Classification
`
`Int. Cl.7: G10L 21/00; G10L 11/00
`U.S. Cl.: 704/270.1
`
`(57)
`
`ABSTRACT
`
`A distributed voice user interface system includes a local
`device which receives speech input issued from a user. Such
`speech input may specify a command or a request by the
`user. The local device performs preliminary processing of
`the speech input and determines whether it is able to respond
`to the command or request by itself. If not, the local device
`initiates communication with a remote system for further
`processing of the speech input.
`
`[Representative drawing: local devices 14, a LAN, a telecommunications network, and remote system 12]
`
`Distributed Voice User Interface System
`
`GOOGLE EXHIBIT 1008
`
`Page 1 of 18
`
`

`
`[Sheet 1 of 5: FIG. 1, distributed voice user interface system; drawing text garbled in extraction]
`
`

`
`[Sheet 2 of 5: FIG. 2, details for a local device; drawing text garbled in extraction]
`
`

`
`[Sheet 3 of 5: FIG. 3, details for a remote system; drawing text garbled in extraction]
`
`

`
`[Sheet 4 of 5: FIG. 4, flow diagram (reference numerals 100, 104 visible). Recoverable box text, in order:]
`Receive speech
`Processing at local device
`Local processing sufficient?
`Establish connection between local device and remote system
`Transmit data/speech from local device to remote system
`Response received from remote system?
`Terminate connection between local device and remote system
`Take action based on processing
`End session?
`
`

`
`[Sheet 5 of 5: FIG. 5, flow diagram (reference numerals 200, 206, 208, 210, 212, 214, 220 visible). Recoverable box text; the order of the branches is not recoverable:]
`Have user select from listed commands or requests
`Compare to grammars
`Match?
`Respond to command or request
`End session?
`Request more input
`More attempts?
`
`

`
`DISTRIBUTED VOICE USER INTERFACE
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`[0001] This Application relates to the subject matter dis-
`closed in the following co-pending U.S. Applications: U.S.
`application Ser. No. 08/609,699, filed Mar. 1, 1996, entitled
`“Method and Apparatus For Telephonically Accessing and
`Navigating the Internet,” and U.S. application Ser. No.
`09/071,717, filed May 1, 1998, entitled “Voice User Inter-
`face With Personality.” These co-pending applications are
`assigned to the present Assignee and are incorporated herein
`by reference.
`
`BACKGROUND OF THE INVENTION
`
`[0002] A voice user interface (VUI) allows a human user
`to interact with an intelligent, electronic device (e.g., a
`computer) by merely “talking” to the device. The electronic
`device is thus able to receive, and respond to, directions,
`commands, instructions, or requests issued verbally by the
`human user. As such, a VUI facilitates the use of the device.
`
`[0003] A typical VUI is implemented using various tech-
`niques which enable an electronic device to “understand”
`particular words or phrases spoken by the human user, and
`to output or “speak” the same or different words/phrases for
`prompting, or responding to, the user. The words or phrases
`understood and/or spoken by a device constitute its “vocabulary.”
`In general, the number of words/phrases within a
`device’s vocabulary is directly related to the computing
`power which supports its VUI. Thus, a device with more
`computing power can understand more words or phrases
`than a device with less computing power.
`
`[0004] Many modern electronic devices, such as personal
`digital assistants (PDAs), radios, stereo systems, television
`sets, remote controls, household security systems, cable and
`satellite receivers, video game stations, automotive dash-
`board electronics, household appliances, and the like, have
`some computing power, but typically not enough to support
`a sophisticated VUI with a large vocabulary, i.e., a VUI
`capable of understanding and/or speaking many words and
`phrases. Accordingly, it is generally pointless to attempt to
`implement a VUI on such devices as the speech recognition
`and speech output capabilities would be far too limited for
`practical use.
`
`SUMMARY
`
`[0005] The present invention provides a system and
`method for a distributed voice user interface (VUI) in which
`a remote system cooperates with one or more local devices
`to deliver a sophisticated voice user interface at the local
`devices. The remote system and the local devices may
`communicate via a suitable network, such as, for example,
`a telecommunications network or a local area network
`(LAN). In one embodiment, the distributed VUI is achieved
`by the local devices performing preliminary signal processing
`(e.g., speech parameter extraction and/or elementary
`speech recognition) and accessing more sophisticated
`speech recognition and/or speech output functionality
`implemented at the remote system only if and when necessary.
`
`[0006] According to an embodiment of the present invention,
`a local device includes an input device which can
`receive speech input issued from a user. A processing
`component, coupled to the input device, extracts feature
`parameters (which can be frequency domain parameters
`and/or time domain parameters) from the speech input for
`processing at the local device or, alternatively, at a remote
`system.
`
`[0007] According to another embodiment of the present
`invention, a distributed voice user interface system includes
`a local device which continuously monitors for speech input
`issued by a user, scans the speech input for one or more
`keywords, and initiates communication with a remote sys-
`tem when a keyword is detected. The remote system
`receives the speech input from the local device and can then
`recognize words therein.
`
`[0008] According to yet another embodiment of the
`present invention, a local device includes an input device for
`receiving speech input issued from a user. Such speech input
`may specify a command or a request by the user. A pro-
`cessing component, coupled to the input device, is operable
`to perform preliminary processing of the speech input. The
`processing component determines whether the local device
`is by itself able to respond to the command or request
`specified in the speech input. If not, the processing compo-
`nent
`initiates communication with a remote system for
`further processing of the speech input.
`
`[0009] According to still another embodiment of the
`present invention, a remote system includes a transceiver
`which receives speech input, such speech input previously
`issued by a user and preliminarily processed and forwarded
`by a local device. A processing component, coupled to the
`transceiver at the remote system, recognizes words in the
`speech input.
`
`[0010] According to still yet another embodiment of the
`present invention, a method includes the following steps:
`continuously monitoring at a local device for speech input
`issued by a user; scanning the speech input at the local
`device for one or more keywords; initiating a connection
`between the local device and a remote system when a
`keyword is detected, and passing the speech input, or
`appropriate feature parameters extracted from the speech
`input, from the local device to the remote system for
`interpretation.
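
The steps of this method can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation described in the application: text strings stand in for audio, and the names `RemoteSystem`, `spot_keyword`, `monitor`, and the `KEYWORDS` set are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

KEYWORDS = {"computer", "help"}  # hypothetical keyword set, not from the source


@dataclass
class RemoteSystem:
    """Hypothetical stand-in for the remote system; records what it receives."""
    received: List[str] = field(default_factory=list)
    connected: bool = False

    def connect(self) -> None:
        self.connected = True

    def disconnect(self) -> None:
        self.connected = False

    def interpret(self, speech: str) -> str:
        if not self.connected:
            raise RuntimeError("no connection to remote system")
        self.received.append(speech)
        return f"interpreted: {speech}"


def spot_keyword(speech: str, keywords=KEYWORDS) -> Optional[str]:
    """Return the first keyword found in the utterance, or None."""
    for word in speech.lower().split():
        token = word.strip(".,!?")
        if token in keywords:
            return token
    return None


def monitor(utterances: Iterable[str], remote: RemoteSystem) -> List[str]:
    """Scan each utterance locally; contact the remote system only on a keyword hit."""
    results = []
    for speech in utterances:              # continuous monitoring at the local device
        if spot_keyword(speech) is None:
            continue                       # no keyword detected, nothing is sent
        remote.connect()                   # initiate the connection
        try:
            results.append(remote.interpret(speech))  # pass the speech input along
        finally:
            remote.disconnect()            # keep the connection transient
    return results
```

In this sketch the connection is opened only after a keyword is detected and closed as soon as the remote system has responded, which mirrors the on-demand communication the summary describes.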
`
`[0011] A technical advantage of the present invention
`includes providing functional control over various local
`devices (e.g., PDAs, radios, stereo systems, television sets,
`remote controls, household security systems, cable and
`satellite receivers, video game stations, automotive dash-
`board electronics, household appliances, etc.) using sophis-
`ticated speech recognition capability enabled primarily at a
`remote site. The speech recognition capability is delivered to
`each local device in the form of a distributed VUI. Thus,
`functional control of the local devices via speech recognition
`can be provided in a cost-effective manner.
`
`[0012] Another technical advantage of the present inven-
`tion includes providing the vast bulk of hardware and/or
`software for implementing a sophisticated voice user inter-
`face at a single remote system, while only requiring minor
`hardware/software implementations at each of a number of
`local devices. This substantially reduces the cost of deploying
`a sophisticated voice user interface at the various local
`devices, because the incremental cost for each local device
`is small. Furthermore, the sophisticated voice user interface
`is delivered to each local device without substantially
`increasing its size. In addition, the power required to operate
`each local device is minimal since most of the capability for
`the voice user interface resides in the remote system; this can
`be crucial for applications in which a local device is battery-
`powered. Furthermore,
`the single remote system can be
`more easily maintained and upgraded with new features or
`hardware, than can the individual local devices.
`
`[0013] Yet another technical advantage of the present
`invention includes providing a transient, on-demand con-
`nection between each local device and the remote system—
`i.e., communication between a local device and the remote
`system is enabled only if the local device requires the
`assistance of the remote system. Accordingly, communica-
`tion costs, such as, for example, long distance charges, are
`minimized. Furthermore, the remote system is capable of
`supporting a larger number of local devices if each such
`device is only connected on a transient basis.
`
`[0014] Still another technical advantage of the present
`invention includes providing the capability for data to be
`downloaded from the remote system to each of the local
`devices, either automatically or in response to a user’s
`request. Thus, the data already present in each local device
`can be updated, replaced, or supplemented as desired, for
`example, to modify the voice user interface capability (e.g.,
`speech recognition/output) supported at the local device. In
`addition, data from news sources or databases can be down-
`loaded (e.g., from the Internet) and made available to the
`local devices for output to users.
`
`[0015] Other aspects and advantages of the present inven-
`tion will become apparent from the following descriptions
`and accompanying drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0016] For a more complete understanding of the present
`invention and for further features and advantages, reference
`is now made to the following description taken in conjunc-
`tion with the accompanying drawings, in which:
`
`[0017] FIG. 1 illustrates a distributed voice user interface
`system, according to an embodiment of the present inven-
`tion;
`
`[0018] FIG. 2 illustrates details for a local device, accord-
`ing to an embodiment of the present invention;
`
`[0019] FIG. 3 illustrates details for a remote system,
`according to an embodiment of the present invention;
`
`[0020] FIG. 4 is a flow diagram of an exemplary method
`of operation for a local device, according to an embodiment
`of the present invention; and
`
`[0021] FIG. 5 is a flow diagram of an exemplary method
`of operation for a remote system, according to an embodi-
`ment of the present invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`[0022] The preferred embodiments of the present inven-
`tion and their advantages are best understood by referring to
`FIGS. 1 through 5 of the drawings. Like numerals are used
`for like and corresponding parts of the various drawings.
`
`[0023] Turning first to the nomenclature of the specifica-
`tion, the detailed description which follows is represented
`largely in terms of processes and symbolic representations
`of operations performed by conventional computer compo-
`nents, such as a central processing unit (CPU) or processor
`associated with a general purpose computer system, memory
`storage devices for the processor, and connected pixel-
`oriented display devices. These operations include the
`manipulation of data bits by the processor and the mainte-
`nance of these bits within data structures resident in one or
`more of the memory storage devices. Such data structures
`impose a physical organization upon the collection of data
`bits stored within computer memory and represent specific
`electrical or magnetic elements. These symbolic represen-
`tations are the means used by those skilled in the art of
`computer programming and computer construction to most
`effectively convey teachings and discoveries to others
`skilled in the art.
`
`[0024] For purposes of this discussion, a process, method,
`routine, or sub-routine is generally considered to be a
`sequence of computer-executed steps leading to a desired
`result. These steps generally require manipulations of physi-
`cal quantities. Usually, although not necessarily, these quan-
`tities take the form of electrical, magnetic, or optical signals
`capable of being stored, transferred, combined, compared, or
`otherwise manipulated. It is conventional for those skilled in
`the art to refer to these signals as bits, values, elements,
`symbols, characters, text, terms, numbers, records, files, or
`the like. It should be kept in mind, however, that these and
`some other terms should be associated with appropriate
`physical quantities for computer operations, and that these
`terms are merely conventional labels applied to physical
`quantities that exist within and during operation of the
`computer.
`
`[0025] It should also be understood that manipulations
`within the computer are often referred to in terms such as
`adding, comparing, moving, or the like, which are often
`associated with manual operations performed by a human
`operator. It must be understood that no involvement of the
`human operator may be necessary, or even desirable, in the
`present
`invention. The operations described herein are
`machine operations performed in conjunction with the
`human operator or user that interacts with the computer or
`computers.
`
`[0026] In addition, it should be understood that the programs,
`processes, methods, and the like, described herein are
`but an exemplary implementation of the present invention
`and are not related, or limited, to any particular computer,
`apparatus, or computer language. Rather, various types of
`general purpose computing machines or devices may be
`used with programs constructed in accordance with the
`teachings described herein. Similarly, it may prove advan-
`tageous to construct a specialized apparatus to perform the
`method steps described herein by way of dedicated computer
`systems with hard-wired logic or programs stored in
`non-volatile memory, such as read-only memory (ROM).
`
`[0027] Network System Overview
`
`[0028] Referring now to the drawings, FIG. 1 illustrates a
`distributed voice user interface (VUI) system 10, according
`to an embodiment of the present invention. In general,
`distributed VUI system 10 allows one or more users to
`interact, via speech or verbal communication, with one or
`more electronic devices or systems into which distributed
`VUI system 10 is incorporated, or alternatively, to which
`distributed VUI system 10 is connected. As used herein, the
`terms “connected,” “coupled,” or any variant thereof, mean
`any connection or coupling, either direct or indirect,
`between two or more elements; the coupling or connection
`can be physical or logical.
`
`[0029] More particularly, distributed VUI system 10
`includes a remote system 12 which may communicate with
`a number of local devices 14 (separately designated with
`reference numerals 14a, 14b, 14c, 14d, 14e, 14f, 14g, 14h,
`and 14i) to implement one or more distributed VUIs. In one
`embodiment, a “distributed VUI” comprises a voice user
`interface that may control the functioning of a respective
`local device 14 through the services and capabilities of
`remote system 12. That is, remote system 12 cooperates with
`each local device 14 to deliver a separate, sophisticated VUI
`capable of responding to a user and controlling that local
`device 14. In this way, the sophisticated VUIs provided at
`local devices 14 by distributed VUI system 10 facilitate the
`use of the local devices 14. In another embodiment,
`the
`distributed VUI enables control of another apparatus or
`system (e.g., a database or a website), in which case, the
`local device 14 serves as a “medium.”
`
`[0030] Each such VUI of system 10 may be “distributed”
`in the sense that speech recognition and speech output
`software and/or hardware can be implemented in remote
`system 12 and the corresponding functionality distributed to
`the respective local device 14. Some speech recognition/
`output software or hardware can be implemented in each of
`local devices 14 as well.
`
`[0031] When implementing distributed VUI system 10
`described herein, a number of factors may be considered in
`dividing the speech recognition/output functionality
`between local devices 14 and remote system 12. These
`factors may include, for example, the amount of processing
`and memory capability available at each of local devices 14
`and remote system 12; the bandwidth of the link between
`each local device 14 and remote system 12; the kinds of
`commands, instructions, directions, or requests expected
`from a user, and the respective, expected frequency of each;
`the expected amount of use of a local device 14 by a given
`user; the desired cost for implementing each local device 14;
`etc. In one embodiment, each local device 14 may be
`customized to address the specific needs of a particular user,
`thus providing a technical advantage.
`
`[0032] Local Devices
`[0033] Each local device 14 can be an electronic device
`with a processor having a limited amount of processing or
`computing power. For example, a local device 14 can be a
`relatively small, portable, inexpensive, and/or low power-
`consuming “smart device,” such as a personal digital assis-
`tant (PDA), a wireless remote control (e.g., for a television
`set or stereo system), a smart telephone (such as a cellular
`phone or a stationary phone with a screen), or smart jewelry
`(e.g., an electronic watch). A local device 14 may also
`comprise or be incorporated into a larger device or system,
`such as a television set, a television set top box (e.g., a cable
`receiver, a satellite receiver, or a video game station), a video
`cassette recorder, a video disc player, a radio, a stereo
`system, an automobile dashboard component, a microwave
`oven, a refrigerator, a household security system, a climate
`control system (for heating and cooling), or the like.
`
`[0034] In one embodiment, a local device 14 uses elementary
`techniques (e.g., the push of a button) to detect the onset
`of speech. Local device 14 then performs preliminary processing
`on the speech waveform. For example, local device
`14 may transform speech into a series of feature vectors or
`frequency domain parameters (which differ from the digitized
`or compressed speech used in vocoders or cellular
`phones). Specifically, from the speech waveform, the local
`device 14 may extract various feature parameters, such as,
`for example, cepstral coefficients, Fourier coefficients, linear
`predictive coding (LPC) coefficients, or other spectral
`parameters in the time or frequency domain. These spectral
`parameters (also referred to as features in automatic speech
`recognition systems), which would normally be extracted in
`the first stage of a speech recognition system, are transmitted
`to remote system 12 for processing therein. Speech recog-
`nition and/or speech output hardware/software at remote
`system 12 (in communication with the local device 14) then
`provides a sophisticated VUI through which a user can input
`commands, instructions, or directions into, and/or retrieve
`information or obtain responses from, the local device 14.
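
As a rough illustration of the kind of preliminary processing described above, the following Python sketch splits a waveform into frames and computes Fourier coefficient magnitudes and real cepstral coefficients for each frame. It is a minimal sketch under stated assumptions, not the extraction method used in the application: the frame size, step, and coefficient count are arbitrary, no windowing or LPC analysis is performed, and the function names are hypothetical. A naive DFT is used so that the example depends only on the standard library.

```python
import cmath
import math
from typing import List


def frames(samples: List[float], size: int, step: int) -> List[List[float]]:
    """Split a waveform into fixed-size, possibly overlapping frames."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


def dft(frame: List[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def dft_magnitudes(frame: List[float]) -> List[float]:
    """Magnitudes of the Fourier coefficients for bins 0..N/2."""
    return [abs(c) for c in dft(frame)[: len(frame) // 2 + 1]]


def real_cepstrum(frame: List[float], count: int) -> List[float]:
    """First `count` real-cepstrum coefficients: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    # A small constant avoids log(0) for empty bins (an assumption of this sketch).
    log_mag = [math.log(abs(c) + 1e-12) for c in dft(frame)]
    return [
        sum(v * cmath.exp(2j * math.pi * k * q / n) for k, v in enumerate(log_mag)).real / n
        for q in range(count)
    ]


def extract_features(samples: List[float], size: int = 64, step: int = 32,
                     count: int = 8) -> List[List[float]]:
    """One feature vector (cepstral coefficients) per frame."""
    return [real_cepstrum(f, count) for f in frames(samples, size, step)]
```

A local device following this pattern would transmit the compact feature vectors rather than the raw or compressed waveform, which is the distinction the paragraph above draws with vocoders and cellular phones.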
`
`[0035] In another embodiment, in addition to performing
`preliminary signal processing (including feature parameter
`extraction), at least a portion of local devices 14 may each
`be provided with its own resident VUI. This resident VUI
`allows the respective local device 14 to understand and
`speak to a user, at least on an elementary level, without
`remote system 12. To accomplish this, each such resident
`VUI may include, or be coupled to, suitable input/output
`devices (e.g., microphone and speaker) for receiving and
`outputting audible speech. Furthermore, each resident VUI
`may include hardware and/or software for implementing
`speech recognition (e.g., automatic speech recognition
`(ASR) software) and speech output (e.g., recorded or gen-
`erated speech output software). An exemplary embodiment
`for a resident VUI of a local device 14 is described below in
`more detail.
`
`[0036] A local device 14 with a resident VUI may be, for
`example, a remote control for a television set. A user may
`issue a command to the local device 14 by stating “Channel
`four” or “Volume up,” to which the local device 14 responds
`by changing the channel on the television set to channel four
`or by turning up the volume on the set.
`
`[0037] Because each local device 14, by definition, has a
`processor with limited computing power, the respective
`resident VUI for a local device 14, taken alone, generally
`does not provide extensive speech recognition and/or speech
`output capability. For example, rather than implement a
`more complex and sophisticated natural language
`technique for speech recognition, each resident VUI may
`perform “word spotting” by scanning speech input for the
`occurrence of one or more “keywords.” Furthermore, each
`local device 14 will have a relatively limited vocabulary
`(e.g., less than one hundred words) for its resident VUI. As
`such, a local device 14, by itself, is only capable of respond-
`ing to relatively simple commands, instructions, directions,
`or requests from a user.
`
`[0038] In instances where the speech recognition and/or
`speech output capability provided by a resident VUI of a
`local device 14 is not adequate to address the needs of a user,
`the resident VUI can be supplemented with the more exten-
`sive capability provided by remote system 12. Thus, the
`local device 14 can be controlled by spoken commands and
`otherwise actively participate in verbal exchanges with the
`user by utilizing more complex speech recognition/output
`hardware and/or software implemented at remote system 12
`(as further described herein).
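
The division of labor between a resident VUI and the remote system can be sketched as a local-first dispatch. The example below is illustrative only: the vocabulary, the action strings, and the names `resident_vui` and `handle` are hypothetical, text stands in for speech, and the remote system is represented by any callable supplied by the caller.

```python
from typing import Callable, Dict, Optional

# Hypothetical resident vocabulary for a television remote control.
LOCAL_COMMANDS: Dict[str, str] = {
    "channel four": "tv.set_channel(4)",
    "volume up": "tv.volume_up()",
}


def resident_vui(utterance: str) -> Optional[str]:
    """Word spotting: look for a known key phrase anywhere in the utterance."""
    text = utterance.lower()
    for phrase, action in LOCAL_COMMANDS.items():
        if phrase in text:
            return action
    return None  # outside the limited local vocabulary


def handle(utterance: str, remote: Callable[[str], str]) -> str:
    """Answer locally when possible; otherwise defer to the remote system."""
    action = resident_vui(utterance)
    if action is not None:
        return action
    return remote(utterance)
```

With this arrangement, a request such as "Channel four" never leaves the device, while anything the small vocabulary cannot match is forwarded for more extensive recognition.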
`
`[0039] Each local device 14 may further comprise a
`manual input device—such as a button, a toggle switch, a
`keypad, or the like—by which a user can interact with the
`local device 14 (and also remote system 12 via a suitable
`communication network) to input commands, instructions,
`requests, or directions without using either the resident or
`distributed VUI. For example, each local device 14 may
`include hardware and/or software supporting the interpreta-
`tion and issuance of dual tone multiple frequency (DTMF)
`commands. In one embodiment, such manual input device
`can be used by the user to activate or turn on the respective
`local device 14 and/or initiate communication with remote
`system 12.
`
`[0040] Remote System
`
`[0041] In general, remote system 12 supports a relatively
`sophisticated VUI which can be utilized when the capabili-
`ties of any given local device 14 alone are insufficient to
`address or respond to instructions, commands, directions, or
`requests issued by a user at the local device 14. The VUI at
`remote system 12 can be implemented with speech recog-
`nition/output hardware and/or software suitable for perform-
`ing the functionality described herein.
`
`[0042] The VUI of remote system 12 interprets the vocal-
`ized expressions of a user—communicated from a local
`device 14—so that remote system 12 may itself respond, or
`alternatively, direct the local device 14 to respond, to the
`commands, directions, instructions, requests, and other input
`spoken by the user. As such, remote system 12 completes the
`task of recognizing words and phrases.
`
`[0043] The VUI at remote system 12 can be implemented
`with a different type of automatic speech recognition (ASR)
`hardware/software than local devices 14. For example, in
`one embodiment, rather than performing “word spotting,” as
`may occur at local devices 14, remote system 12 may use a
`larger vocabulary recognizer, implemented with word and
`optional sentence recognition grammars. A recognition
`grammar specifies a set of directions, commands, instruc-
`tions, or requests that, when spoken by a user, can be
`understood by a VUI. In other words, a recognition grammar
`specifies what sentences and phrases are to be recognized by
`the VUI. For example, if a local device 14 comprises a
`microwave oven, a distributed VUI for the same can include
`a recognition grammar that allows a user to set a cooking
`time by saying, “Oven high for half a minute,” or “Cook on
`high for thirty seconds,” or, alternatively, “Please cook for
`thirty seconds at high.” Commercially available speech
`recognition systems with recognition grammars are pro-
`vided by ASR technology vendors such as, for example, the
`following: Nuance Corporation of Menlo Park, Calif.;
`Dragon Systems of Newton, Mass.; IBM of Austin, Tex.;
`Kurzweil Applied Intelligence of Waltham, Mass.; Lernout &
`Hauspie Speech Products of Burlington, Mass.; and PureSpeech,
`Inc. of Cambridge, Mass.
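
A recognition grammar of the kind described for the microwave oven example can be approximated with a handful of sentence patterns. The sketch below is an assumption-laden illustration rather than the grammar format of any of the vendors named above: it uses regular expressions over already-transcribed text, and the `DURATIONS` table, the `low` power level, and the `parse` function are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical duration phrases; a real grammar would cover far more.
DURATIONS = {"half a minute": 30, "thirty seconds": 30, "one minute": 60}
DURATION = "(?P<duration>" + "|".join(re.escape(d) for d in DURATIONS) + ")"

# Each pattern is one sentence form that the grammar accepts.
GRAMMAR = [
    re.compile(rf"^oven (?P<power>high|low) for {DURATION}$"),
    re.compile(rf"^cook on (?P<power>high|low) for {DURATION}$"),
    re.compile(rf"^please cook for {DURATION} at (?P<power>high|low)$"),
]


def parse(utterance: str) -> Optional[Tuple[str, int]]:
    """Return (power level, seconds) if the utterance is in the grammar, else None."""
    # Normalize case and drop punctuation before matching.
    text = re.sub(r"[^a-z ]", "", utterance.lower()).strip()
    for pattern in GRAMMAR:
        match = pattern.match(text)
        if match:
            return match.group("power"), DURATIONS[match.group("duration")]
    return None
```

All three phrasings quoted in the paragraph above resolve to the same command in this sketch, which is the point of a grammar: it enumerates the sentences to be recognized and maps each to one meaning.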
`
`[0044] Remote system 12 may process the directions,
`commands, instructions, or requests that it has recognized or
`understood from the utterances of a user. During processing,
`remote system 12 can, among other things, generate control
`signals and reply messages, which are returned to a local
`device 14. Control signals are used to direct or control the
`local device 14 in response to user input. For example, in
`response to a user command of “Turn up the heat to 82
`degrees,” control signals may direct a local device 14
`incorporating a thermostat to adjust the temperature of a
`climate control system. Reply messages are intended for the
`immediate consumption of a user at the local device and may
`take the form of video or audio, or text to be displayed at the
`local device. As a reply message, the VUI at remote system
`12 may issue audible output in the form of speech that is
`understandable by a user.
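
The two kinds of output described above, control signals and reply messages, can be modeled as separate record types. The following sketch is hypothetical throughout: the `ControlSignal` and `ReplyMessage` classes, their fields, and the single pattern handled by `process` are assumptions made for illustration, and text stands in for recognized speech.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ControlSignal:
    """Directs the local device (hypothetical fields)."""
    device: str
    action: str
    value: int


@dataclass(frozen=True)
class ReplyMessage:
    """Intended for the user; kind is "audio", "video", or "text"."""
    kind: str
    content: str


def process(utterance: str) -> Tuple[Optional[ControlSignal], ReplyMessage]:
    """Turn a recognized command into a control signal plus a reply message."""
    match = re.search(r"heat to (\d+) degrees", utterance.lower())
    if match:
        target = int(match.group(1))
        return (
            ControlSignal("thermostat", "set_temperature", target),
            ReplyMessage("audio", f"Setting the temperature to {target} degrees."),
        )
    # Nothing recognized: no control signal, only a reply to the user.
    return None, ReplyMessage("audio", "Sorry, I did not understand that.")
```

Keeping the two outputs separate reflects the distinction in the text: a control signal acts on the device, while a reply message is consumed by the user.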
`
`[0045] For issuing reply messages, the VUI of remote
`system 12 may include capability for speech generation
`(synthesized speech) and/or play-back (previously recorded
`speech). Speech generation capability can be implemented
`with text-to-speech (TTS) hardware/software, which con-
`verts textual information into synthesized, audible speech.
`Speech play-back capability may be implemented with an
`analog-to-digital (A/D) converter driven by CD ROM (or
`other digital memory device), a tape player, a laser disc
`player, a specialized integrated circuit (IC) device, or the
`like, which plays back previously recorded human speech.
`
`[0046] In speech play-back, a person (preferably a voice
`model) recites various statements which may desirably be
`issued during an interactive session with a user at a local
`device 14 of distributed VUI system 10. The person’s voice
`is recorded as the recitations are made. The recordings are
`separated into discrete messages, each message comprising
`one or more statements that would desirably be issued in a
`particular context (e.g., greeting, farewell, requesting
`instructions, receiving instructions, etc.). Afterwards, when
`a user interacts with distributed VUI system 10, the recorded
`messages are played back to the user when the proper
`context arises.
`
`[0047] The reply messages generated by the VUI at
`remote system 12 can be made to be consistent with any
`messages provided by the resident VUI of a local device 14.
`For example, if speech play-back capability is used for
`generating speech, the same person’s voice may be recorded
`for messages output by the resident VUI of the local device
`14 and the VUI of remote system 12. If synthesized (computer-generated)
`speech capability is used, a similar sounding
`artificial voice may be provided for the VUIs of both
`local devices 14 and remote system 12. In this way, the
`distributed VUI of system 10 provides to a user an interac-
`tive interface which is “seamless” in the sense that the user
`cannot distinguish between the simpler, resident VUI of the
`local device 14 and the more sophisticated VUI of remote
`system 12.
`
`[0048] In one embodiment, the speech recognition and
`speech play-back capabilities described herein can be used
`to implement a voice user interface with personality, as
`taught by U.S. patent application Ser. No. 09/071,717,
`entitled “Voice User Interface With Personality,” the text of
`which is incorporated herein by reference.
`[0049] Remote system 12 may also comprise hardware
`and/or software supporting the interpretation and issuance of
`commands, such as dual tone multiple frequency (DTMF)
`commands, so that a user may alternatively interact with
`remote system 12 using an alternative input device, such as
`a telephone key pad.
`
`[0050] Remote system 12 may be in communication with
`the “Internet,” thus providing access thereto for users at
`local devices 14. The Internet is an interconnection of
`computer “clients” and “servers” located throughout the
`world and exchanging information according to Transmission
`Control Protocol/Internet Protocol (TCP/IP), Internetwork
`Packet eXchange/Sequenced Packet eXchange (IPX/SPX),
`AppleTalk, or other suitable protocol. The Internet
`supports the distributed application known as the “World
`Wide Web.” Web servers may exchange information with
`one another using a protocol known as hypertext transfer
`protocol (HTTP). Information may be communicated from
`one server to any other computer using HTTP and is
`maintained in the form of web pages, each of which can be
`identified by a respective uniform resource locator (URL).
`Remote system 12 may function as a client to interconnect
`with Web servers. The interconnection may use any of a
`variety of communication links, such as, for example, a local
`telephone communication line or a dedicated communica-
`tion line. Remote system 12 may comprise and locally
`execute a “web browser” or “web proxy” program. A web
`browser is a computer program that allows remote system
`12, acting as a client, to exchange information with the
`World Wide Web. Any of a variety of web browsers are
`available, such as NETSCAPE NAVIGATOR from Netscape
`Communications Corp. of Mountain View, Calif., INTERNET
`EXPLORER from Microsoft Corporation of Redmond, Wash.,
`and others that allow users to conveniently access and
`navigate the Internet. A web proxy is a
`computer program which (via the Internet) can, for example,
`electronically integrate the systems of a company and its
`vendors and/or customers, support business transacted
`electronically over the network (i.e., “e-commerce”), and
`provide automated access to Web-enabled resources. Any
`number of web proxies are available, such as B2B
`INTEGRATION SERVER from webMethods of Fairfax,
`Va., and MICROSOFT PROXY SERVER from Microsoft
`Corporation of Redmond, Wash. The hardware, software,
`and protocols—as well as the underlying concepts and
`techniques—supporting the Internet are generally under-
`stood by those in the art.
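To make the client role described above concrete, the following is a minimal sketch of how a client such as remote system 12 might turn a URL into an HTTP request for a web page. It is an illustration under stated assumptions, not the method of this application: the function name `build_get_request` and the example URL are hypothetical, only plain `http://` URLs are handled, and no network connection is opened.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Split an http:// URL into (host, port, request_text).

    request_text is a minimal HTTP/1.0 GET request that a client could
    write to a TCP connection opened to (host, port).
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("expected an absolute http:// URL")

    # An empty path means the root page of the server.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    # Port 80 is the default for HTTP when the URL names none.
    port = parts.port or 80
    host_header = parts.hostname if port == 80 else f"{parts.hostname}:{port}"

    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host_header}\r\n"
        "\r\n"
    )
    return parts.hostname, port, request
```

For a hypothetical URL such as `http://www.example.com/weather?zip=94111`, the sketch yields the host `www.example.com`, port 80, and a request whose first line is `GET /weather?zip=94111 HTTP/1.0`. The web server's reply would carry the web page identified by that URL.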
`
`[0051] Communication Network
`
`[0052] One or more suitable communication networks
`enable local devices 14 to communicate with remote system
`12. For example, as shown, local devices 14
