`(12) Patent Application Publication (10) Pub. No.: US 2002/0193989 A1
`GEILHUFE et al.
`(43) Pub. Date: Dec. 19, 2002
`
`US 20020193989A1
`
`(54) METHOD AND APPARATUS FOR
`IDENTIFYING VOICE CONTROLLED
`DEVICES
`
`(76) Inventors: MICHAEL GEILHUFE, PALO ALTO,
`CA (US); DAVID MACMILLAN,
`WOODSIDE, CA (US); AVRAHAM
`BAREL, DOAR NA SHIMSHON (IL);
`AMOS BROWN, GIVAT SHMUEL
`(IL); KARIN LISSETTE BOOTSMA,
`SAN JOSE, CA (US); LAWRENCE
`KENT GADDY, SAN JOSE, CA (US);
`PHILLIP PAUL PYO, SAN JOSE, CA
`(US)
`Correspondence Address:
`BLAKELY SOKOLOFF TAYLOR & ZAFMAN
`LLP
`12400 WILSHIRE BOULEVARD, SEVENTH
`FLOOR
`LOS ANGELES, CA 90025-1026
`(*) Notice: This is a publication of a continued
`prosecution application (CPA) filed under 37
`CFR 1.53(d).
`(21) Appl. No.: 09/316,334
`(22) Filed: May 21, 1999
`
`Publication Classification
`
`
`
`(51) Int. Cl.7 ................................ G10L 11/06
`(52) U.S. Cl. ................................ 704/208
`(57) ABSTRACT
`The present invention includes a method, apparatus and
`system for STANDARD VOICE USER INTERFACE AND
`VOICE CONTROLLED DEVICES as described in the
`claims. Briefly, a standard voice user interface is provided to
`control various devices by using standard speech commands.
`The standard VUI provides a set of standard VUI
`commands and syntax for the interface between a user and
`the voice controlled device. The standard VUI commands
`include an identification phrase to determine if voice
`controlled devices are available in an environment. Other
`standard VUI commands provide for determining the names of
`the voice controlled devices and altering them.
`Voice controlled devices are disclosed. A voice controlled
`device is defined herein as any device that is controlled by
`speech, which is either audible or non-audible. A voice
`controlled device may also be referred to herein as an
`appliance, a machine, a voice controlled appliance, a voice
`controlled electronic device, a name activated electronic
`device, a speech controlled device, a voice activated
`electronic appliance, a voice activated appliance, a voice
`controlled electronic device, or a self-identifying voice
`controlled electronic device.
`
`
`Petitioner’s Ex. 1025, Page 1
`
`
`
`Patent Application Publication Dec. 19, 2002 Sheet 1 of 21
`
`US 2002/0193989 A1
`
`
`
`[Sheet 7 of 21: flow chart, mostly illegible in the scan.
`Legible labels include <silence>, <name> and
`"Out-of-vocabulary word".]
`
`[Sheet 9 of 21, FIG. 6C: flow chart of the device
`identification sequence. Legible content follows.]
`Start: <silence>, then "What is out there?"
`Stage 1: Wait a random delay of up to 2 seconds, then emit a
`beep, all the while listening for other devices' beeps. If no
`beep is heard from any other device, say "You can call me
`<name>" and return to start. If a beep was heard from any
`other device, go to stage 2.
`Stage 2: Same as stage 1, with a random delay of up to 4
`seconds. If a beep was heard, go to stage 3.
`Stage 3: Same as stage 1, with a random delay of up to 8
`seconds. If a beep was heard, go to stage 4.
`Stage 4: Same as stage 1, with a random delay of up to 16
`seconds. The destination of the "beep was heard" branch at
`this stage is not legible.
`
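The staged back-off legible on this sheet can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the method as claimed: the function name `simulate_identification` and the rule that the device whose delay expires first announces itself while every other device retries in the next, longer window are simplifications introduced here, and the chart does not legibly show what a device does if it still hears a beep in the 16 second stage (the sketch simply leaves it unannounced).

```python
import random

# Upper bounds, in seconds, of the random delay at each stage of the chart.
BACKOFF_WINDOWS_S = (2, 4, 8, 16)


def simulate_identification(names, rng=None):
    """Simulate devices answering "What is out there?".

    Returns (announced, waiting): `announced` is a list of
    (window_seconds, name) pairs in announcement order, and `waiting`
    lists devices that never got a clear turn.

    Assumption (not from the source): in each stage the device whose
    random delay expires first beeps without hearing another beep and
    says "You can call me <name>"; all others hear that beep and move
    to the next stage.
    """
    if rng is None:
        rng = random.Random()
    waiting = list(names)
    announced = []
    for window in BACKOFF_WINDOWS_S:
        if not waiting:
            break
        # Each remaining device draws its own delay within the window.
        delays = {name: rng.uniform(0, window) for name in waiting}
        first = min(delays, key=delays.get)
        announced.append((window, first))
        waiting.remove(first)
    return announced, waiting


if __name__ == "__main__":
    order, left_over = simulate_identification(["phone", "stereo", "lamp"])
    for window, name in order:
        print(f"stage up to {window} s: You can call me {name}")
    print("still waiting:", left_over)
```

Doubling the window at each stage makes it progressively less likely that two devices pick nearly the same delay, which is the same idea used by exponential back-off schemes in shared communication channels.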
`
`
`[Sheet 12 of 21, FIG. 7: flow chart of the "Store Name"
`command. Legible content follows.]
`Steps: Store name; Compare to existing entries; Compare
`entries; Store number; Return.
`User responses: <voicetag>; <digits>; Yes.
`Prompts: "Please say the name you want to store" (optional
`prompt, if timeout occurs after "Store"); "Sorry, I didn't
`understand. Please say the name again"; "Please repeat the
`name"; "The names did not match. Please start over"; a prompt
`about an existing entry "that sounds like <name>. Please say
`the name differently"; "Please say the number for
`<voicetag>"; "Please repeat"; "The number for <voicetag> is
`<digits>. Is this correct?" (on second try, if still no,
`start over); "The number for <voicetag> has been stored".
`
`
`
`[Sheet 13 of 21, FIG. 8: flow chart of the "Delete Name"
`command. Legible content follows.]
`Steps: Start; Compare to stored list; On list; Delete
`voicetag and number; Return to start.
`Prompts: "Please say the name you want to delete" (optional
`prompt, if timeout occurs after "Delete"); "Sorry, I didn't
`understand. Please say the name again"; "That name is not in
`the phone book. Please start over"; "Are you sure you want
`to delete <voicetag>?"; "<voicetag> deleted".
`
`
`
`[Sheets 14 and 15 of 21: flow charts, mostly illegible in the
`scan. Legible labels on sheet 14 include "(3 seconds max)",
`"Yes", "No", "Garbage", "Time's Up", "Failure" and "Success".
`The only legible label on sheet 15 is the prompt "Was that a
`Yes?".]
`
`
`
`[Sheets 16 through 18 of 21: flow charts, mostly illegible in
`the scan. Legible labels: "Stop Recognition", "Play
`Prompt/Tone", "Start C. Win Timer" and "C Win Timeout" (sheet
`16); "speech onset" (sheet 17); "Save Pre Sil", "Start Speech
`Timer" and "speech timer decay" (sheet 18).]
`
`
`
`[Sheet 19 of 21: flow chart, mostly illegible in the scan.
`Legible labels: "rejection", "failure", "success", "Interpret
`First-Try & Change Prompt", "SUCCESS" and "FAILURE".]
`
`
`
`[Sheet 20 of 21: flow chart, mostly illegible in the scan.
`Legible labels: "Query and Selection Logic", "NEXT Menu
`item", "Analyze Response", "spurious response", "timeout (no
`speech)" and "Menu".]
`
`
`
`
`METHOD AND APPARATUS FOR IDENTIFYING
`VOICE CONTROLLED DEVICES
`CROSS REFERENCE TO RELATED
`APPLICATION
`0001 This application is also related to U.S. patent
`application Ser. No. ______, filed by inventors GEILHUFE
`et al., Attorney Docket No. 042236.P050, entitled
`“METHOD AND APPARATUS FOR STANDARD VOICE
`USER INTERFACE AND VOICE CONTROLLED
`DEVICES” and to be assigned to Information Storage
`Devices, Inc., the disclosure of which is hereby incorporated
`by reference, verbatim and with the same effect as though it
`were fully and completely set forth herein.
`0002 This application is also related to U.S. patent
`application Ser. No. ______, filed by inventors GEILHUFE
`et al., Attorney Docket No. 042236.P051, entitled
`“METHOD AND APPARATUS FOR CONTROLLING
`VOICE CONTROLLED DEVICES” and to be assigned to
`Information Storage Devices, Inc., the disclosure of which is
`hereby incorporated by reference, verbatim and with the
`same effect as though it were fully and completely set forth
`herein.
`0003 This application is also related to U.S. patent
`application Ser. No. ______, filed by inventors GEILHUFE
`et al., Attorney Docket No. 042236.P053, entitled
`“METHOD AND APPARATUS FOR IDENTIFYING
`VOICE CONTROLLED DEVICES” and to be assigned to
`Information Storage Devices, Inc., the disclosure of which is
`hereby incorporated by reference, verbatim and with the
`same effect as though it were fully and completely set forth
`herein.
`0004 This application is also related to U.S. patent
`application Ser. No. ______, filed by inventors GEILHUFE
`et al., Attorney Docket No. 042236.P054, entitled
`“METHOD AND APPARATUS FOR VOICE
`CONTROLLED DEVICES WITH IMPROVED PHRASE
`STORAGE, USE, CONVERSION, TRANSFER, AND
`RECOGNITION” and to be assigned to Information Storage
`Devices, Inc., the disclosure of which is hereby incorporated
`by reference, verbatim and with the same effect as though it
`were fully and completely set forth herein.
`0005 This application is also related to U.S. patent
`application Ser. No. ______, filed by inventors GEILHUFE
`et al., Attorney Docket No. 042236.P055, entitled
`“METHOD AND APPARATUS FOR MACHINE TO
`MACHINE COMMUNICATION USING SPEECH” and to
`be assigned to Information Storage Devices, Inc., the
`disclosure of which is hereby incorporated by reference, verbatim
`and with the same effect as though it were fully and
`completely set forth herein.
`MICROFICHE APPENDIX
`0006 This application contains a microfiche appendix
`which is not printed herewith entitled “ISD-SR 300,
`Embedded Speech Recognition Processor” by Information Storage
`Devices, Inc. which is hereby incorporated by reference,
`verbatim and with the same effect as though it were fully and
`completely set forth herein.
`FIELD OF THE INVENTION
`0007 This invention relates generally to machine
`interfaces. More particularly, the invention relates to voice user
`interfaces for devices.
`
`BACKGROUND OF THE INVENTION
`0008 Graphical user interfaces (GUIs) for computers are
`well known. GUIs provide an intuitive and consistent manner
`for human interaction with computers. Generally, once a
`person learns how to use a particular GUI, they can operate
`any computer or device which operates using the same or
`similar GUI. Examples of popular GUIs are MAC OS by
`Apple, and MS Windows by Microsoft. GUIs are now being
`ported to other devices. For example, the MS Windows GUI
`has been ported from computers to palm tops, personal
`organizers, and other devices so that there is a common GUI
`amongst a number of differing devices. However, as the
`name implies, GUIs require at least some sort of visual or
`graphical display and an input device such as a keyboard,
`mouse, touch pad or touch screen. The displays and the input
`devices tend to utilize space in a device, require additional
`components and increase the costs of a device. Thus, it is
`desirable to eliminate the display and input devices from
`devices to save costs.
`0009 Recently, voice user interfaces (VUIs) have been
`introduced that utilize speech recognition methods to control
`a device. However, these prior art VUIs have a number of
`shortcomings that prohibit them from being universally
`utilized in all devices. Prior art VUIs are usually difficult to
`use. Prior art VUIs usually require some sort of display
`device such as an LCD, or require a manual input device
`such as keypads or buttons, or require both a display and a
`manual input device. Additionally, prior art VUIs usually are
`proprietary and restricted in use to a single make or model
`of hardware device, or a single type of software application.
`They usually are not widely available, unlike computer
`operating systems, and accordingly software programmers
`cannot write applications that operate with the VUI in a
`variety of device types. Commands associated with prior art
`VUIs are usually customized for that single type of device
`or software application. Prior art VUIs usually have
`additional limitations in supporting multiple users, such as how
`to handle personalization and security. Furthermore, prior art
`VUIs require that a user know of the existence of the device
`in advance. Prior art VUIs have not provided ways of
`determining the presence of devices. Additionally, prior art
`VUIs usually require a user to read instruction manuals or
`screen displayed commands to become trained in their use.
`Prior art VUIs usually do not include audible methods for a
`user to learn commands. Furthermore, a user may be
`required to learn how to use multiple prior art VUIs when
`utilizing multiple voice controlled devices due to a lack of
`standardization.
`0010 Generally, devices controlled by VUIs continue to
`require some sort of manual control of functions. With some
`manual control required, a manual input device such as a
`button, keypad or a set of buttons or keypads is provided. To
`assure proper manual entry, a display device such as an
`LCD, LED, or other graphics display device may be
`provided. For example, many voice activated telephones require
`that telephone numbers be stored manually. In this case a
`numeric keypad is usually provided for manual entry. An
`LCD is usually included to assure proper manual entry and
`to display the status of the device. A speech synthesis or
`voice feedback system may be absent from these devices.
`The addition of buttons and display devices increases the
`manufacturing cost of devices. It is desirable to be able to
`eliminate all manual input and display from devices in order
`to decrease costs. Furthermore, it is more convenient to
`remotely control devices without requiring specific buttons
`or displays.
`0011 Previously, voice controlled devices were used by few.
`Additionally, they used near field microphones to listen
`locally for voices. Many prior devices were fixed in some
`manner or not readily portable or were server based systems.
`It is desirable to provide voice control capability for
`portable devices. It is desirable to provide either near field
`or far field microphone technology in voice controlled
`devices. It is desirable to provide low cost voice control
`capability such that it is included in more devices. However,
`these desires raise a problem when multiple users of multiple
`voice controlled devices are in the same area. With multiple
`users and multiple voice controlled devices within audible
`range of each other, it is difficult for voice controlled devices
`to discern which user to accept commands from and respond
`to. For example, consider the case of voice controlled cell
`phones where one user in an environment of multiple users
`wants to call home. The user issues a voice activated call
`home command. If more than one voice controlled cell
`phone hears the call home command, multiple voice
`controlled cell phones may respond and start dialing a home
`telephone number. Previously this was not as significant a
`problem because there were few voice controlled devices.
`0012 Some voice controlled devices are speaker
`dependent. Speaker dependency refers to a voice controlled device
`that requires training by a specific user before it may be used
`with that user. A speaker dependent voice controlled device
`listens for tonal qualities in how phrases are spoken. Speaker
`dependent voice controlled devices do not lend themselves
`to applications where multiple users or speakers are required
`to use the voice controlled device. This is because they fail
`to efficiently recognize speech from users that they have not
`been trained by. It is desirable to provide speaker
`independent voice controlled devices with a VUI requiring little or
`no training in order to recognize speech from any user.
`0013 In order to achieve high accuracy speech
`recognition it is important that a voice controlled device avoid
`responding to speech that isn't directed to it. That is, voice
`controlled devices should not respond to background
`conversation, to noises, or to commands to other voice
`controlled devices. However, filtering out background sounds
`must not be so effective that it also prevents recognition of
`speech directed to the voice controlled device. Finding the
`right mix of rejection of background sounds and recognition
`of speech directed to a voice controlled device is particularly
`challenging in speaker-independent systems. In
`speaker-independent systems, the voice controlled device must be
`able to respond to a wide range of voices, and therefore
`cannot use a highly restrictive filter for background sounds. In
`contrast, a speaker-dependent system need only listen for a
`particular person's voice, and thus can employ a more
`stringent filter for background sounds. Despite this
`advantage in speaker-dependent systems, filtering out background
`sounds is still a significant challenge.
`0014 In some prior art systems, background
`conversation has been filtered out by having a user physically press
`a button in order to activate speech recognition. The
`disadvantage of this approach is that it requires the user to interact
`with the voice controlled device physically, rather than
`strictly by voice or speech. One of the potential advantages
`of voice controlled devices is that they offer the promise of
`true hands-free operation. Elimination of the need to press a
`button to activate speech recognition would go a long way
`to making this hands-free objective achievable.
`0015 Additionally, in locations with a number of people
`talking, a voice controlled device should disregard all
`speech unless it is directed to it. For example, if a person
`says to another person “I’ll call John”, the cellphone in his
`pocket should not interpret the “call John” as a command. If
`there are multiple voice controlled devices in one location,
`there should be a way to uniquely identify which voice
`controlled device a user wishes to control. For example,
`consider a room that may have multiple voice controlled
`telephones, perhaps a couple of desktop phones and multiple
`cellphones, one for each person. If someone were to
`say “Call 555-1212”, each phone may try to place the call
`unless there was a means for them to disregard certain
`commands. In the case where a voice controlled device is to
`be controlled by multiple users, it is desirable for the voice
`controlled device to know which user is commanding it. For
`example, a voice controlled desktop phone in a house may
`be used by a husband, wife and child. Each could
`have their own phonebook of frequently called numbers.
`When the voice controlled device is told “Call Mother”, it
`needs to know which user is issuing the command so that it
`can call the right person (i.e. should it call the husband's
`mother, the wife's mother, or the child's mother at her work
`number?). Additionally, a voice controlled device with
`multiple users may need a method to enforce security to protect
`it from unauthorized use or to protect a user's personalized
`settings from unintentional or malicious interactions by
`others (including snooping, changing, deleting, or adding to
`the settings). Furthermore, in a location where there are
`multiple voice controlled devices, there should be a way to
`identify the presence of voice controlled devices. For
`example, consider a traveler arriving at a new hotel room.
`Upon entering the hotel room, the traveler would like to
`know what voice controlled devices may be present and how
`to control them. It is desirable that the identification process
`be standardized so that all voice controlled devices may be
`identified in the same way.
`0016 In voice controlled devices, it is desirable to store
`phrases under voice control. A phrase is defined as a single
`word, or a group of words treated as a unit. This storing
`might be to set options or create personalized settings. For
`example, in a voice-controlled telephone, it is desirable to
`store people's names and phone numbers under voice
`control into a personalized phone book. At a later time, this
`phone book can be used to call people by speaking their
`name (e.g. “Cellphone call John Smith”, or “Cellphone call
`Mother”).
`0017 Prior art approaches to storing the phrase (“John
`Smith”) operate by storing the phrase in a compressed,
`uncompressed, or transformed manner that attempts to
`preserve the actual sound. Detection of the phrase in a
`command (i.e. detecting that John is to be called in the example
`above) then relies on a sound-based comparison between the
`original stored speech sound and the spoken command.
`Sometimes the stored waveform is transformed into the
`frequency domain and/or is time adjusted to facilitate the
`match, but in any case the fundamental operation being
`performed is one that compares the actual sounds. The
`stored sound representation and comparison for detection
`suffers from a number of disadvantages. If a speaker's voice
`changes, perhaps due to a cold, stress, fatigue, noisy or
`distorting connection by telephone, or other factors, the
`comparison typically is not successful and stored phrases are
`not recognized. Because the phrase is stored as a sound
`representation, there is no way to extract a text-based
`representation of the phrase. Additionally, storing a sound
`representation results in a speaker dependent system. It is
`unlikely that another person could speak the same phrase
`using the same sounds in a command and have it be correctly
`recognized. It would not be reliable, for example, for a
`secretary to store phonebook entries and a manager to make
`calls using those entries. It is desirable to provide a speaker
`independent storage means. Additionally, if the phrases are
`stored as sound representations, the stored phrases cannot
`be used in another voice controlled device unless the same
`waveform processing algorithms are used by both voice
`controlled devices. It is desirable to recognize spoken
`phrases and store them in a representation such that, once
`stored, the phrases can be used for speaker independent
`recognition and can be used by multiple voice controlled
`devices.
`0018 Presently computers and other devices communicate
`commands and data to other computers or devices using
`modem, infrared or wireless radio frequency transmission.
`The transmitted command and/or data are usually of a digital
`form that only the computer or device may understand. In
`order for a human user to understand the command or data
`it must be decoded by a computer and then displayed in
`some sort of format such as a number or ASCII text on a
`display. When the command and/or data are transmitted they
`are usually encoded in some digital format understood by
`the computer or devices or transmitting equipment. As voice
`controlled devices become more prevalent, it will be
`desirable for voice controlled devices to communicate with each
`other using human-like speech in order to avoid providing
`additional circuitry for communication between voice
`controlled devices. It is further desirable to allow multiple voice
`controlled devices to exchange information
`machine-to-machine without human user intervention.
`BRIEF SUMMARY OF THE INVENTION
`0019 The present invention includes a method, apparatus
`and system for STANDARD VOICE USER INTERFACE
`AND VOICE CONTROLLED DEVICES as described in
`the claims. Briefly, a standard voice user interface is
`provided to control various devices by using standard speech
`commands. The standard VUI provides a set of standard
`VUI commands and syntax for the interface between a user
`and the voice controlled device. The standard VUI
`commands include an identification phrase to determine if voice
`controlled devices are available in an environment. Other
`standard VUI commands provide for determining the names
`of the voice controlled devices and altering them.
`0020 Voice controlled devices are disclosed. A voice
`controlled device is defined herein as any device that is
`controlled by speech, which is either audible or non-audible.
`A voice controlled device may also be referred to herein as
`an appliance, a machine, a voice controlled appliance, a
`voice controlled electronic device, a name activated
`electronic device, a speech controlled device, a voice activated
`electronic appliance, a voice activated appliance, a voice
`controlled electronic device, or a self-identifying voice
`controlled electronic device.
`
`0021 In order to gain access to the functionality of voice
`controlled devices, a user communicates to the voice
`controlled device one of its associated appliance names after a
`period of relative silence. The appliance name may be a
`default name or a user-assignable name. The voice
`controlled device may have a plurality of user-assignable names
`associated with it for providing personalized functionality to
`each user.
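The name-after-silence rule described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the disclosed implementation: the class `VoiceControlledDevice`, its `parse` method, the 0.5 second silence threshold, and the use of already-recognized text in place of audio are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class VoiceControlledDevice:
    default_name: str
    # Hypothetical mapping: user-assigned appliance name -> owning user.
    user_names: Dict[str, str] = field(default_factory=dict)
    # Hypothetical threshold; the text only says "a period of relative silence".
    min_silence_s: float = 0.5

    def parse(self, silence_before_s: float,
              phrase: str) -> Optional[Tuple[Optional[str], str]]:
        """Return (user, command) if the phrase addresses this device.

        `user` is None when the default name was used. Returns None when
        the phrase did not follow enough silence or does not start with
        one of the device's appliance names.
        """
        if silence_before_s < self.min_silence_s:
            return None  # speech ran on from a conversation; ignore it
        text = " ".join(phrase.lower().split())
        names: Dict[str, Optional[str]] = {self.default_name.lower(): None}
        names.update({n.lower(): u for n, u in self.user_names.items()})
        # Try longer names first so "kitchen phone" wins over "kitchen".
        for name in sorted(names, key=len, reverse=True):
            if text == name or text.startswith(name + " "):
                return names[name], text[len(name):].strip()
        return None  # no appliance name at the start of the phrase


if __name__ == "__main__":
    phone = VoiceControlledDevice("cellphone", {"aardvark": "wife"})
    print(phone.parse(1.0, "Cellphone call John Smith"))
    print(phone.parse(1.0, "Aardvark call Mother"))
    print(phone.parse(1.0, "I'll call John"))
```

Because each user-assignable name maps to one user, a command such as "call Mother" can be resolved against that user's own phone book, while speech that lacks a leading appliance name is ignored.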
`0022. Other aspects of the present invention are described
`in the detailed description.
`
`BRIEF DESCRIPTIONS OF THE DRAWINGS
`0023 FIG. 1A is an illustration of an environment
`containing voice controlled devices of the present invention.
`0024 FIG. 1B is an illustration of remote
`communications with the voice controlled devices in the environment
`illustrated in FIG. 1A.
`0025 FIG. 2 is an illustration of exemplary voice
`controlled devices.
`0026 FIG. 3 is a detailed block diagram of the voice
`controlled device of the present invention.
`0027 FIG. 4 is a detailed block diagram of a voice
`communication chip.
`0028 FIG. 5 is a block diagram of the standard voice
`user interface of the present invention.
`0029 FIGS. 6A-6C are flow charts of the core command
`structure for the standard voice user interface of the present
`invention.
`0030 FIGS. 6D-6E are flow charts of the telephone
`command structure for the standard voice user interface of
`the present invention.
`0031 FIG. 7 is a flow chart of the “Store Name”
`telephone command structure for the standard voice user
`interface of the present invention.
`0032 FIG. 8 is a flow chart of the “Delete Name”
`telephone command structure for the standard voice user
`interface of the present invention.
`0033 FIGS. 9A-9B are flow charts of the “GETYESNO”
`function for the standard voice user interface of the present
`invention.
`0034 FIGS. 10A-10C are flow charts of the
`“GETRESPONSE” function for the standard voice user interface of
`the present invention.
`0035 FIG. 11 is a flow chart of the
`“GETRESPONSEPLUS” function for the standard voice user interface of the
`present invention.
`0036 FIG. 12 is a flow chart of the “LISTANDSELECT”
`function for the standard voice user interface of the present
`invention.
`0037 FIG. 13 is a block diagram of a pair of voice
`controlled devices communicating using the standard voice
`user interface of the present invention.
`0038 Like reference numbers and designations in the
`drawings indicate like elements providing similar
`functionality.
`
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`0039 In the following detailed description of the present
`invention, numerous specific details are set forth in order to
`provide a thorough understanding of the present invention.
`However, it will be obvious to one skilled in the art that the
`present invention may be practiced without these specific
`details. In other instances well known methods, procedures,
`components, and circuits have not been described in detail
`so as not to unnecessarily obscure aspects of the present
`invention.
`0040 The present invention includes a method, apparatus
`and system for standard voice user interface and voice
`controlled devices. Briefly, a standard voice user interface is
`provided to control various devices by using standard speech
`commands. The standard VUI provides a set of core VUI
`commands and syntax for the interface between a user and
`the voice controlled device. The core VUI commands
`include an identification phrase to determine if voice
`controlled devices are available in an environment. Other core
`VUI commands provide for determining the names of the
`voice controlled devices and altering them.
`0041 Voice controlled devices are disclosed. A voice
`controlled device is defined herein as any device that is
`controlled by speech, which is either audible or non-audible.
`Audible and non-audible are defined herein later. A voice
`controlled device may also be referred to herein as an
`appliance, a machine, a voice controlled appliance, a voice
`controlled electronic device, a name activated electronic
`device, a speech controlled device, a voice activated
`electronic appliance, a voice activated appliance, a voice
`controlled electronic device, or a self-identifying voice
`controlled electronic device.
`0042 The present invention is controlled by and
`communicates using audible and non-audible speech. Speech as
`defined herein for the present invention encompasses a) a
`signal or information, such that if the signal or information
`were passed through a suitable device to convert it to
`variations in air pressure, the signal or information could be
`heard by a human being and would be considered language,
`and b) a signal or information comprising actual variations
`in air pressure, such that if a human being were to hear the
`signal, the human would consider it language. Audible
`speech refers to speech that a human can hear unassisted.
`Non-audible speech refers to any encodings or
`representations of speech that are not included under the definition of
`audible speech, including that which may be communicated
`outside the hearing range of humans and over transmission
`media other than air. The definition of speech includes speech
`that is emitted from a human and speech that is emitted from
`a machine (including machine speech synthesis, playback of
`previously recorded human speech such as prompts, or other
`forms).
`0043 Prompts which are communicated by a voice
`controlled device and phrases which are communicated by a
`user may be in languages or dialects other than English or a
`combination of multiple languages. A phrase is defined
`herein as a single word, or a group of words treated as a unit.
`A user, as defined herein, is a human or a device, including
`a voice activated device. Hence “a user's spoken phrase”, “a
`user issuing a command”, and all other actions by a user
`include actions by a device and by a human.
`0044 Voice controlled devices include some type of
`speech recognition in order to be controlled by speech.
`Speech recognition and voice recognition are used
`synonymously herein and have the same meaning. Preferably,
`speaker independent speech recognition systems are used to
`provide the speech recognition capability of the voice
`controlled devices. Speaker independent speech recognition
`systems are responsive