(19) European Patent Office
Office européen des brevets

(11) EP 0 911 808 A1

(12) EUROPEAN PATENT APPLICATION
`
(43) Date of publication: 28.04.1999 Bulletin 1999/17

(21) Application number: 97118470.0

(22) Date of filing: 23.10.1997

(51) Int. Cl.6: G10L 5/06, H04L 12/28

(84) Designated Contracting States: AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
Designated Extension States: AL LT LV RO SI
`
(71) Applicant:
Sony International (Europe) GmbH
50829 Köln (DE)
`
`(72) Inventors:
`• Buchner, Peter,
`c/o Sony International GmbH
`70736 Fellbach (DE)
`
`• Goronzy, Silke,
`c/o Sony International GmbH
`70736 Fellbach (DE)
`• Kompe, Ralf,
`c/o Sony International GmbH
`70736 Fellbach (DE)
`• Rapp, Stefan,
`c/o Sony International GmbH
`70736 Fellbach (DE)
`
(74) Representative:
Müller, Frithjof E., Dipl.-Ing.
Patentanwälte MÜLLER & HOFFMANN,
Innere Wiener Strasse 17
81667 München (DE)
`
(54) Speech interface in a home network environment
`
(57) Home networks, i.e. low-cost digital interfaces, are introduced that integrate entertainment, communication and computing electronics into consumer multimedia. Normally, these are low-cost, easy-to-use systems, since they allow the user to remove or add any kind of network devices with the bus being active. To improve the user interface, a speech unit (2) is proposed that enables all devices (11) connected to the bus system (31) to be controlled by a single speech recognition device. The properties of this device, e.g. the vocabulary, can be dynamically and actively extended by the consumer devices (11) connected to the bus system (31). The proposed technology is independent from a specific bus standard, e.g. the IEEE 1394 standard, and is well-suited for all kinds of wired or wireless home networks.

[Figure 3: a wired network (31) connecting a speech unit with microphone and several network devices]
`
`
`
Description

[0001] This invention relates to a speech interface in a home network environment. In particular, it is concerned with a speech recognition device, a remotely controllable device and a method of self-initialization of a speech recognition device.

[0002] Generally, speech recognizers are known for controlling different consumer devices, i.e. television, radio, car navigation, mobile telephone, camcorder, PC, printer, heating of buildings or rooms. Each of these speech recognizers is built into a specific device to control it. The properties of such a recognizer, such as the vocabulary, the grammar and the corresponding commands, are designed for this particular task.

[0003] On the other hand, technology is now available to connect several of the above-listed consumer devices via a home network with dedicated bus systems, e.g. an IEEE 1394 bus. Devices adapted for such systems communicate by sending commands and data to each other. Usually such devices identify themselves when they are connected to the network and get a unique address assigned by a network controller. Thereafter, these addresses can be used by all devices to communicate with each other. All other devices already connected to such a network are informed about the address and type of a newly connected device. Such a network will be included in private homes as well as in cars.

[0004] Speech recognition devices enhance comfort and, if used in a car, may improve security, as the operation of consumer devices becomes more and more complicated, e.g. the controlling of a car stereo. Also in a home network environment, e.g. the programming of a video recorder or the selection of television channels can be simplified when using a speech recognizer. On the other hand, speech recognition devices have a rather complicated structure and need a quite expensive technology when a reliable and flexible operation should be secured; therefore, a speech recognizer will not be affordable for most of the devices listed above.

[0005] Therefore, it is the object of the present invention to provide a generic speech recognizer facilitating the control of several devices. Further, it is the object of the present invention to provide a remotely controllable device that simplifies its network-controllability via speech.

[0006] A further object is to provide a method of self-initialization of the task-dependent parts of such a speech recognition device to control such remotely controllable devices.

[0007] These objects are respectively achieved as defined in the independent claims 1, 4, 14, 15 and 18.

[0008] Further preferred embodiments of the invention are defined in the respective subclaims.

[0009] The present invention will become apparent and its numerous modifications and advantages will be better understood from the following detailed description of an embodiment of the invention taken in conjunction with the accompanying drawings, wherein

Fig. 1 shows a block diagram of an example of a speech unit according to an embodiment of the invention;

Fig. 2 shows a block diagram of an example of a network device according to an embodiment of the invention;

Fig. 3 shows an example of a wired 1394 network having a speech unit and several 1394 devices;

Fig. 4 shows an example of a wired 1394 network having a speech unit incorporated in a 1394 device and several normal 1394 devices;

Fig. 5 shows three examples of different types of networks;

Fig. 6 shows an example of a home network in a house having three clusters;

Fig. 7 shows two examples of controlling a network device remotely via a speech recognizer;

Fig. 8 shows an example of a part of a grammar for a user dialogue during a VCR programming;

Fig. 9 shows an example of a protocol of the interaction between a user, a speech recognizer and a network device;

Fig. 10 shows an example of a learning procedure of a connected device, where the name of the device is determined automatically;

Fig. 11 shows an example of a protocol of a notification procedure of a device being newly connected, where the user is asked for the name of the device;

Fig. 12 shows an example of a protocol of the interaction of multiple devices for vocabulary extensions concerning media contents; and

Fig. 13 shows another example of a protocol of the interaction of multiple devices for vocabulary extensions concerning media contents.

[0010] Fig. 1 shows a block diagram of an example of the structure of a speech unit 2 according to the invention. Said speech unit 2 is connected to a microphone 1 and a loudspeaker, which could also be built into said speech unit 2.
`
`
`
`
`
The speech unit 2 comprises a speech synthesizer, a dialogue module, a speech recognizer and a speech interpreter and is connected to an IEEE 1394 bus system 10. It is also possible that the microphone 1 and/or the loudspeaker are connected to the speech unit 2 via said bus system 10. Of course, it is then necessary that the microphone 1 and/or the loudspeaker are respectively equipped with a circuitry to communicate with said speech unit 2 via said network, such as A/D and D/A converters and/or command interpreters, so that the microphone 1 can transmit the electric signals corresponding to received spoken utterances to the speech unit 2 and the loudspeaker can output received electric signals from the speech unit 2 as sound.
[0011] IEEE 1394 is an international standard, low-cost digital interface that will integrate entertainment, communication and computing electronics into consumer multimedia. It is a low-cost, easy-to-use bus system, since it allows the user to remove or add any kind of 1394 devices with the bus being active. Although the present invention is described in connection with such an IEEE 1394 bus system and IEEE 1394 network devices, the proposed technology is independent from the specific IEEE 1394 standard and is well-suited for all kinds of wired or wireless home networks or other networks.
[0012] As will be shown in detail later, a speech unit 2, as shown in Fig. 1, is connected to the home network 10. This is a general-purpose speech recognizer and synthesizer having a generic vocabulary. The same speech unit 2 is used for controlling all of the devices 11 connected to the network 10. The speech unit 2 picks up a spoken-command from a user via the microphone 1, recognizes it and converts it into a corresponding home network control code, henceforth called user-network-command, e.g. specified by the IEEE 1394 standard. This control code is then sent to the appropriate device that performs the action associated with the user-network-command.
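As an illustration of this recognize-and-dispatch flow, the following minimal Python sketch maps a recognized spoken-command to a user-network-command and sends it on the bus. All names here (SpeechUnit, Bus, the command table entries) are assumptions made for illustration; they are not defined by the IEEE 1394 standard or by the embodiment itself.

    class Bus:
        """Stand-in for the home network bus (e.g. an IEEE 1394 bus 10)."""
        def send(self, address, command):
            print(f"bus -> device {address}: {command}")

    class SpeechUnit:
        """Sketch of the dispatch step: spoken-command -> user-network-command."""
        def __init__(self, bus):
            self.bus = bus
            # Generic vocabulary: spoken-command -> (device address, user-network-command)
            self.command_table = {
                "play": (0x01, "PLAY"),
                "switch off": (0x01, "POWER_OFF"),
            }

        def recognize(self, audio):
            """Placeholder for the recognizer; would decode audio to text."""
            return "switch off"  # canned result, for illustration only

        def handle_utterance(self, audio):
            spoken = self.recognize(audio)
            if spoken in self.command_table:
                address, network_command = self.command_table[spoken]
                self.bus.send(address, network_command)

    SpeechUnit(Bus()).handle_utterance(audio=None)  # prints: bus -> device 1: POWER_OFF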
[0013] To be capable of enabling all connected network devices to be controlled by speech, the speech unit has to "know" the commands that are needed to provide operability of all individual devices 11. Initially, the speech unit "knows" a basic set of commands, e.g. commands that are the same for various devices. There can be a many-to-one mapping between spoken-commands from a user and user-network-commands generated therefrom. Such spoken-commands can e.g. be play, search for radio station YXZ or (sequences of) numbers such as phone numbers. These commands can be spoken in isolation or they can be explicitly or implicitly embedded within full sentences. Full sentences will henceforth as well be called spoken-commands.
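The many-to-one mapping can be pictured as several alternative phrasings, including full sentences, all resolving to one user-network-command; every entry in this small table is an invented example:

    # Several spoken phrasings map onto one user-network-command
    # (all phrasings and command names here are invented examples).
    SPOKEN_TO_NETWORK = {
        "play": "PLAY",
        "please play the tape": "PLAY",
        "search for radio station YXZ": "TUNE_STATION_YXZ",
        "louder": "VOLUME_UP",
        "turn the volume up": "VOLUME_UP",
    }

    assert SPOKEN_TO_NETWORK["please play the tape"] == SPOKEN_TO_NETWORK["play"]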
[0014] In general, speech recognizers and technologies for speech recognition, interpretation, and dialogues are well-known and will not be explained in detail in connection with this invention. Basically, a speech recognizer comprises a set of vocabulary and a set of knowledge-bases (henceforth grammars) according to which a spoken-command from a user is converted into a user-network-command that can be carried out by a device. The speech recognizer may also use a set of alternative pronunciations associated with each vocabulary word. The dialogue with the user will be conducted according to some dialogue model.
[0015] The speech unit 2 according to an embodiment of the invention comprises a digital signal processor 3 connected to the microphone 1. The digital signal processor 3 receives the electric signals corresponding to the spoken-command from the microphone 1 and performs a first processing to convert these electric signals into digital words recognizable by a central processing unit 4. To be able to perform this first processing, the digital signal processor 3 is bidirectionally coupled to a memory 8 holding information about the process to be carried out by the digital signal processor 3 and a speech recognition section 3a included therein. Further, the digital signal processor 3 is connected to a feature extraction section 7e of a memory 7 wherein information is stored of how to convert electric signals corresponding to spoken-commands into digital words corresponding thereto. In other words, the digital signal processor 3 converts the spoken-command from a user input via the microphone 1 into a computer-recognizable form, e.g. text code.
[0016] The digital signal processor 3 sends the generated digital words to the central processing unit 4. The central processing unit 4 converts these digital words into user-network-commands sent to the home network system 10. Therefore, the digital signal processor 3 and the central processing unit 4 can be seen as speech recognizer, dialogue module and speech interpreter.
[0017] It is also possible that the digital signal processor 3 only performs a spectrum analysis of the spoken-command from a user input via the microphone 1 and the word recognition itself is conducted in the central processing unit 4 together with the conversion into user-network-commands. Depending on the capacity of the central processing unit 4, it can also perform the spectrum analysis and the digital signal processor 3 can be omitted.
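As a rough illustration of this division of labour, the spectrum analysis that the digital signal processor 3 might hand over to the central processing unit 4 could look like the following sketch; the frame length, hop size and the use of numpy are assumptions for illustration only.

    import numpy as np

    def short_time_spectrum(samples, frame_len=256, hop=128):
        """Frame the signal, window each frame and return magnitude spectra.

        A stand-in for the spectrum analysis performed before word
        recognition; all parameters are illustrative assumptions.
        """
        window = np.hanning(frame_len)
        frames = []
        for start in range(0, len(samples) - frame_len + 1, hop):
            frame = samples[start:start + frame_len] * window
            frames.append(np.abs(np.fft.rfft(frame)))
        return np.array(frames)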
[0018] Further, the central processing unit 4 provides a learning function for the speech unit 2 so that the speech unit 2 can learn new vocabulary, grammar and user-network-commands to be sent to a network device 11 corresponding thereto. To be able to perform these tasks, the central processing unit 4 is bidirectionally coupled to the memory 8 that is also holding information about the processes to be performed by the central processing unit 4. Further, the central processing unit 4 is bidirectionally coupled to an initial vocabulary section 7a, an extended vocabulary section 7b, an initial grammar section 7c, an extended grammar section 7d and a software section 7f that comprises a recognition section and a grapheme-phoneme conversion section of the memory 7.
`
`
`
`
`
Further, the central processing unit 4 is bidirectionally coupled to the home network system 10 and can also send messages to a digital signal processor 9 included in the speech unit 2 comprising a speech generation section 9a that serves to synthesize messages into speech and outputs this speech to a loudspeaker.

[0019] The central processing unit 4 is bidirectionally coupled to the home network 10 via a link layer control unit 5 and an I/F physical layer unit 6. These units serve to filter out network-commands from the bus 10 directed to the speech unit 2 and to address network-commands to selected devices connected to the network 10.

[0020] Therefore, it is also possible that new user-network-commands together with corresponding vocabulary and grammars can be learned by the speech unit 2 directly from other network devices. To perform such a learning, the speech unit 2 can send control commands stored in the memory 8 to control the network devices, henceforth called control-network-commands, to request their user-network-commands and corresponding vocabulary and grammars according to which they can be controlled by a user. The memory 7 comprises an extended vocabulary section 7b and an extended grammar section 7d to store newly input vocabulary or grammars. These sections are respectively designed like the initial vocabulary section 7a and the initial grammar section 7c, but newly input user-network-commands together with information needed to identify these user-network-commands can be stored in the extended vocabulary section 7b and the extended grammar section 7d by the central processing unit 4. In this way, the speech unit 2 can learn user-network-commands and corresponding vocabulary and grammars built into an arbitrary network device. New network devices have then no need to have a built-in speech recognition device, but only the user-network-commands and corresponding vocabulary and grammars that should be controllable via a speech recognition system. Further, there has to be a facility to transfer these data to the speech unit 2. The speech unit 2 according to the invention learns said user-network-commands and corresponding vocabulary and grammar, and the respective device can be voice-controlled via the speech unit 2.
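A hedged sketch of this learning handshake is given below; the message name GET_COMMAND_SET and the CommandEntry fields are invented stand-ins for the control-network-commands and data formats, which the text leaves unspecified.

    from dataclasses import dataclass

    @dataclass
    class CommandEntry:
        """One user-network-command plus the data needed to recognize it.

        Field names are invented; the embodiment only requires that commands,
        vocabulary and grammars can be transferred to the speech unit.
        """
        network_command: str   # e.g. "PLAY"
        vocabulary: list       # words the recognizer must know
        grammar: str           # rule describing valid phrasings

    def learn_device(speech_unit, device):
        # Control-network-command requesting the device's command set
        # ("GET_COMMAND_SET" is an assumed message name).
        for entry in device.handle("GET_COMMAND_SET"):
            # Store in the extended vocabulary/grammar sections (7b, 7d).
            speech_unit.extended_vocabulary.extend(entry.vocabulary)
            speech_unit.extended_grammar[entry.network_command] = entry.grammar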
[0021] The initial vocabulary section 7a and the initial grammar section 7c store a basic set of user-network-commands that can be used for various devices, like user-network-commands corresponding to the spoken-commands switch on, switch off, pause, louder, etc. These user-network-commands are stored in connection with vocabulary and grammars needed by the central processing unit 4 to identify them out of the digital words produced by the speech recognition section via the digital signal processor 3. Further, questions or messages are stored in a memory. These can be output from the speech unit 2 to a user. Such questions or messages may be used in a dialogue in-between the speech unit 2 and the user to complete commands spoken by the user into proper user-network-commands; examples are please repeat, which device, do you really want to switch off?, etc. All such messages or questions are stored together with speech data needed by the central processing unit 4 to generate digital words to be output to the speech generation and synthesis section 9a of the digital signal processor 9 to generate spoken utterances output to the user via the loudspeaker. Through the microphone 1, the digital signal processors 3 and 9 and the loudspeaker, a "bidirectional coupling" of the central processing unit 4 with a user is possible. Therefore, it is possible that the speech unit 2 can communicate with a user and learn from him or her. Like in the case of the communication with a network device 11, the speech unit 2 can access a set of control-network-commands stored in the memory 8 to instruct the user to give certain information to the speech unit 2.
`
[0022] As stated above, also user-network-commands and the corresponding vocabulary and grammars can be input by a user via the microphone 1 and the digital signal processor 3 to the central processing unit 4 on demand of control-network-commands output as messages by the speech unit 2 to the user. After the user has uttered a spoken-command to set the speech unit 2 into a learning state with him, the central processing unit 4 performs a dialogue with the user on the basis of control-network-commands stored in the memory 8 to generate new user-network-commands and corresponding vocabulary to be stored in the respective sections of the memory 7.

[0023] It is also possible that the process of learning new user-network-commands is done half-automatically by the communication in-between the speech unit 2 and an arbitrary network device, and half dialogue-controlled between the speech unit 2 and a user. In this way, user-dependent user-network-commands for selected network devices can be generated.

[0024] As stated above, the speech unit 2 processes three kinds of commands, i.e. spoken-commands uttered by a user, user-network-commands, i.e. digital signals corresponding to the spoken-commands, and control-network-commands to perform a communication with other devices or with a user to learn new user-network-commands from other devices 11 and to assign certain functionalities thereto, so that a user can input new spoken-commands or assign a new functionality to user-network-commands already included.
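The three command categories could be summarized as a simple enumeration; the type names below are illustrative only:

    from enum import Enum, auto

    class CommandKind(Enum):
        """The three kinds of commands handled by the speech unit (names invented)."""
        SPOKEN_COMMAND = auto()           # utterance from the user
        USER_NETWORK_COMMAND = auto()     # digital command derived from an utterance
        CONTROL_NETWORK_COMMAND = auto()  # traffic used for learning and housekeeping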
[0025] Output of the speech unit directed to the user is either synthesized speech or pre-recorded utterances. A mixture of both might be useful, e.g. pre-recorded utterances for the most frequent messages and synthesized speech for other messages. Any network device can send messages to the speech unit. These messages are either directly in orthographic form or they encode or identify in some way an orthographic message. Then these orthographic messages are output via a loudspeaker, e.g. included in the speech unit 2. Messages can contain all kinds of information usually presented on a display of a consumer device.
`
`
`
`
`
Furthermore, there can be questions put forward to the user in the course of a dialogue. As stated above, such a dialogue can also be produced by the speech unit 2 itself to verify or confirm spoken-commands, or it can be generated by the speech unit 2 according to control-network-commands to learn new user-network-commands and corresponding vocabulary and grammars.

[0026] The speech input and/or output facility, i.e. the microphone 1 and the loudspeaker, can also be one or more separate device(s). In this case messages can be communicated in orthographic form in-between the speech unit and the respective speech input and/or output facility.

[0027] Spoken messages sent from the speech unit 2 itself to the user, like which device should be switched on?, could also be asked back to the speech unit 2, e.g. which network device do you know?, and first this question could be answered by the speech unit 2 via speech, before the user answers the initial spoken message sent from the speech unit.
`sent from the speech unit.
[0028] Fig. 2 shows a block diagram of an example of the structure of remotely controllable devices according to an embodiment of this invention, here a network device 11. This block diagram shows only those function blocks necessary for the speech controllability. A central processing unit 12 of such a network device 11 is connected via a link layer control unit 17 and an I/F physical layer unit 16 to the home network bus 10. Like in the speech unit 2, the connection in-between the central processing unit 12 and the home network bus 10 is bidirectional, so that the central processing unit 12 can receive user-network-commands, control-network-commands and other information data from the bus 10 and send control-network-commands, messages and other information data to other network devices or a speech unit 2 via the bus 10. Depending on the device, it might also be possible that it will also send user-network-commands. The central processing unit 12 is bidirectionally coupled to a memory 14 where all information necessary for the processing of the central processing unit 12, including a list of control-network-commands needed to communicate with other network devices, is stored. Further, the central processing unit 12 is bidirectionally coupled to a device control unit 15 controlling the overall processing of the network device 11. A memory 13 holding all user-network-commands to control the network device 11 and the corresponding vocabulary and grammars is also bidirectionally coupled to the central processing unit 12. These user-network-commands and corresponding vocabularies and grammars stored in the memory 13 can be down-loaded into the extended vocabulary section 7b and the extended grammar section 7d of the memory 7 included in the speech unit 2, in connection with a device name for a respective network device 11, via the central processing unit 12 of the network device 11, the link layer control unit 17 and the I/F physical layer unit 16 of the network device 11, the home network bus system 10, the I/F physical layer unit 6 and the link layer control unit 5 of the speech unit 2 and the central processing unit 4 of the speech unit 2. In this way all user-network-commands necessary to control a network device 11 and the corresponding vocabulary and grammars are learned by the speech unit 2 according to the present invention and therefore, a network device according to the present invention needs no built-in device-dependent speech recognizer to be controllable via speech, but just a memory holding all device-dependent user-network-commands with associated vocabulary and grammars to be down-loaded into the speech unit 2. It is to be understood that a basic control of a network device by the speech unit 2 is also given without vocabulary update information, i.e. the basic control of a network device without its device-dependent user-network-commands with associated vocabulary and grammars is possible. Basic control means here to have the possibility to give commands generally defined in some standard, like switch-on, switch-off, louder, switch channel, play, stop, etc.
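On the device side, the memory 13 essentially amounts to a table of user-network-commands with their recognition data, served on request. The following standalone sketch illustrates this; all message and field names are invented for illustration.

    class NetworkDevice:
        """Sketch of the speech-controllable side of a network device 11.

        The device holds no recognizer of its own, only the command table
        (memory 13) that the speech unit can download.
        """
        def __init__(self, name, command_set):
            self.name = name
            # network-command -> {"vocabulary": [...], "grammar": "..."}
            self.command_set = command_set

        def handle(self, message):
            if message == "GET_COMMAND_SET":   # assumed control-network-command
                return self.command_set        # vocabulary/grammar download
            if message in self.command_set:    # a user-network-command
                return self.execute(message)
            return None                        # standard basic commands omitted here

        def execute(self, network_command):
            return f"{self.name} executed {network_command}"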
[0029] Fig. 3 shows an example of a network architecture having an IEEE 1394 bus and, connected thereto, one speech unit 2 with microphone 1 and loudspeaker and four network devices 11.
[0030] Fig. 4 shows another example of a network architecture having four network devices 11 connected to an IEEE 1394 bus. Further, a network device 41 having a built-in speech unit with microphone 1 and loudspeaker is connected to the bus 31. Such a network device 41 with a built-in speech unit has the same functionality as a network device 11 and a speech unit 2. Here, the speech unit controls the network devices 11 and the network device 41 in which it is built.
[0031] Fig. 5 shows three further examples of network architectures. Network A is a network similar to that shown in Fig. 3, but six network devices 11 are connected to the bus 31. In regard to the speech unit 2 that is also connected to the bus 31, there is no limitation on the number of network devices 11 controllable via said speech unit 2. Every device connected to the bus 31 that is controllable via said bus 31 can also be controlled via the speech unit 2.
[0032] Network B shows a different type of network. Here, five network devices 11 and one speech unit 2 are connected to a bus system 51. The bus system 51 is organized so that a connection is only necessary in-between two devices. Network devices not directly connected to each other can communicate via other, third network devices. Regarding the functionality, network B has no restrictions in comparison to network A.
[0033] The third network shown in Fig. 5 is a wireless network. Here, all devices can directly communicate with each other via a transmitter and a receiver built into each device. This example also shows that several speech units 2 can be connected to one network. Those speech units 2 can have either the same functionality or different functionalities, as desired.
`
`
`
`
`
In this way, it is also easily possible to build personalized speech units 2 that can be carried by the respective users and that can control different network devices 11, as desired by the user. Of course, personalized speech units can also be used in a wired network. In comparison to a wireless speech input and/or output facility, a personalized speech unit has the advantage that it can automatically log into another network and all personalized features are available.
[0034] Such a personalized network device can be constructed to translate only those spoken-commands of a selected user into user-network-commands using speaker-adaptation or speaker-verification. This enables a very secure access policy, in that an access is only allowed if the correct speaker uses the correct speech unit. Of course, all kinds of accesses can be controlled in this way, e.g. access to the network itself or access to devices connected to the network, like access to rooms, to a VCR, to televisions and the like.
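A speaker-verification gate of the kind described could be sketched as follows; the scoring function and the threshold are placeholders for whatever speaker model the unit actually uses.

    def speaker_score(audio, enrolled_profile):
        """Placeholder similarity between an utterance and an enrolled speaker."""
        raise NotImplementedError  # a speaker-adaptation/verification model in practice

    def translate_if_verified(speech_unit, audio, enrolled_profile, threshold=0.8):
        """Translate spoken-commands only for the enrolled speaker (illustrative)."""
        if speaker_score(audio, enrolled_profile) < threshold:
            return None  # unknown speaker: no user-network-command is issued
        return speech_unit.handle_utterance(audio)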
[0035] Further, electronic phone books may be stored within the speech unit. Calling functions by name, e.g. office, is strongly user-dependent and therefore such features will preferably be realized in personalized speech units. Also spoken-commands such as switch on my TV can easily be assigned to the correct user-network-commands controlling the correct device, as it may be the case that different users assign different logical names therefor and the speech unit 2 has to generate the same user-network-command when interpreting different spoken-commands. On the other hand, it is possible that the network e.g. comprises more than one device of the same type, e.g. two TVs, and the speech unit 2 has to generate different user-network-commands when interpreting the same spoken-command uttered by different users, e.g. switch on my TV.
[0036] One speech unit can contain personalized information of one user or of different users. In most cases the personalized speech unit corresponding to only one user will be portable and wireless, so that the user can take it with him/her and has the same speech unit at home, in the car or even in other networks, like in his/her office.
[0037] The personalized speech unit can be used for speaker-verification purposes. It verifies the words of a speaker and allows the control of selected devices. This can also be used for controlling access to rooms, cars or devices, such as phones.

[0038] A personalized speech unit can contain a speech recognizer adapted to one person, which strongly enhances the recognition performance.
[0039] Fig. 6 shows an example of a home network consisting of three clusters. One of the clusters is built by an IEEE 1394 bus 61 installed in the kitchen of the house. Connected to this bus are a broadcast receiver 65, a digital television 64, a printer 63, a phone 62 and a long-distance repeater 66. This cluster also has connections to a broadcast gateway 60 to the outside of the house and, via the repeater 66 and an IEEE 1394 bridge 74, to the cluster "sitting room", in which an IEEE 1394 bus 67 is also present. Apart from the bridge 74, a speech unit 70, a personal computer 69, a phone 68, a VCR 71, a camcorder 72 and a digital television 73a are connected to the bus 67. The bridge 74 is also connected to the third cluster "study", which comprises an IEEE 1394 bus 78 connected to the bridge 74 via a long-distance repeater 75. Further, a PC 76, a phone 77, a hard disc 79, a printer 80, a digital television 81 and a telephone NIU 82 are connected to said bus 78. A telephone gateway 83 is connected to the telephone NIU 82.
[0040] The above-described network is constructed so that every device can communicate with the other devices via the IEEE 1394 system, the bridge 74 and the repeaters 66 and 75. The speech unit 70 located in the sitting room can communicate with all devices and therewith has the possibility to control them. This speech unit 70 is built like the speech unit 2 described above. Since in the example shown in Fig. 6 several devices of the same type are present, e.g. the digital television 73a in the sitting room and the digital television 81 in the study, it is possible to define user-defined device names. When the network is set up, or when a device is connected to the network having already a device of this type connected thereto, the speech unit 70 will ask the user for names for these devices, e.g. television in the sitting room and television in the study, to be assigned to the individual devices. To be able to recognize these names, one of the following procedures has to be done (a sketch of these alternatives follows the list).
`
1. The user has to enter the orthographic form (sequence of letters) of the device name by typing or spelling. The speech unit 70 maps the orthographic form into a phoneme or model sequence;

2. In the case of a personalized speech unit, the user utterance corresponding to the device name can be stored as a feature vector sequence that is directly used during recognition as a reference pattern in a pattern matching approach;

3. The phoneme sequence corresponding to the name can be learned automatically using a phoneme recognizer.
`
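A hedged sketch of the three registration procedures follows; grapheme_to_phoneme, extract_features and phoneme_recognizer are hypothetical helpers standing in for components the text leaves unspecified.

    def grapheme_to_phoneme(text):
        raise NotImplementedError      # placeholder G2P mapping (assumption)

    def extract_features(audio):
        raise NotImplementedError      # placeholder acoustic front-end (assumption)

    def phoneme_recognizer(audio):
        raise NotImplementedError      # placeholder phoneme recognizer (assumption)

    def register_device_name(speech_unit, method, **kw):
        """Register a user-chosen device name by one of the three procedures."""
        if method == "spelled":
            # 1. Orthographic form typed/spelled by the user, mapped to phonemes.
            speech_unit.add_name_model(kw["name"], grapheme_to_phoneme(kw["name"]))
        elif method == "reference_pattern":
            # 2. Personalized unit: store the utterance as a feature-vector
            #    sequence used directly as a pattern-matching reference.
            speech_unit.add_reference_pattern(kw["name"], extract_features(kw["audio"]))
        elif method == "phoneme_sequence":
            # 3. Learn the phoneme sequence automatically from the utterance.
            speech_unit.add_name_model(kw["name"], phoneme_recognizer(kw["audio"]))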
[0041] The user has then only to address these devices by name, e.g. television in the sitting room. The speech unit 70 maps the name to the appropriate network address. By default, the name corresponds to the functionality of the device. All commands uttered by a user are sent to the device named last. Of course, it is also possible that these names are changed later on.
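Name resolution together with the "device named last" rule from the preceding paragraph might be kept in two small pieces of state, as in this illustrative fragment:

    class NameResolver:
        """Maps user-defined device names to network addresses (illustrative)."""
        def __init__(self):
            self.names = {}        # e.g. "television in the sitting room" -> address
            self.last_named = None

        def resolve(self, spoken_name):
            self.last_named = self.names[spoken_name]
            return self.last_named

        def target_for_command(self):
            # Commands without an explicit device name go to the device named last.
            return self.last_named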
[0042] In many situations a person might wish to access his device at home over the phone, e.g. to retrieve faxes or to control the heating remotely. Two alternative architectures to realize such a remote access are illustrated in Fig. 7.
`
`
`
`
`