Europäisches Patentamt
European Patent Office
Office européen des brevets

(19)

(11) EP 0 911 808 A1

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication:
28.04.1999 Bulletin 1999/17

(21) Application number: 97118470.0

(22) Date of filing: 23.10.1997

(51) Int. Cl.6: G10L 5/06, H04L 12/28

(84) Designated Contracting States:
AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
Designated Extension States:
AL LT LV RO SI

(71) Applicant: Sony International (Europe) GmbH, 50829 Köln (DE)

(72) Inventors:
• Buchner, Peter, c/o Sony International GmbH, 70736 Fellbach (DE)
• Goronzy, Silke, c/o Sony International GmbH, 70736 Fellbach (DE)
• Kompe, Ralf, c/o Sony International GmbH, 70736 Fellbach (DE)
• Rapp, Stefan, c/o Sony International GmbH, 70736 Fellbach (DE)

(74) Representative: Müller, Frithjof E., Dipl.-Ing., Patentanwälte MÜLLER & HOFFMANN, Innere Wiener Strasse 17, 81667 München (DE)

(54) Speech interface in a home network environment

(57) In home networks, low-cost digital interfaces are introduced that integrate entertainment, communication and computing electronics into consumer multimedia. Normally, these are low-cost, easy-to-use systems, since they allow the user to remove or add any kind of network device with the bus being active. To improve the user interface, a speech unit (2) is proposed that enables all devices (11) connected to the bus system (31) to be controlled by a single speech recognition device. The properties of this device, e.g. the vocabulary, can be dynamically and actively extended by the consumer devices (11) connected to the bus system (31). The proposed technology is independent of a specific bus standard, e.g. the IEEE 1394 standard, and is well suited for all kinds of wired or wireless home networks.

(Front-page drawing: Fig. 3, showing a speech unit with microphone connected via the network bus 31 to several network devices.)

Description

[0001] This invention relates to a speech interface in a home network environment. In particular, it is concerned with a speech recognition device, a remotely controllable device and a method of self-initialization of a speech recognition device.

[0002] Generally, speech recognizers are known for controlling different consumer devices, e.g. television, radio, car navigation, mobile telephone, camcorder, PC, printer, heating of buildings or rooms. Each of these speech recognizers is built into a specific device to control it. The properties of such a recognizer, such as the vocabulary, the grammar and the corresponding commands, are designed for this particular task.

[0003] On the other hand, technology is now available to connect several of the above-listed consumer devices via a home network with dedicated bus systems, e.g. an IEEE 1394 bus. Devices adapted for such systems communicate by sending commands and data to each other. Usually such devices identify themselves when they are connected to the network and get a unique address assigned by a network controller. Thereafter, these addresses can be used by all devices to communicate with each other. All other devices already connected to such a network are informed about the address and type of a newly connected device. Such a network will be included in private homes as well as in cars.

[0004] Speech recognition devices enhance comfort and, if used in a car, may improve security, as the operation of consumer devices becomes more and more complicated, e.g. the controlling of a car stereo. Also in a home network environment, e.g. the programming of a video recorder or the selection of television channels can be simplified by using a speech recognizer. On the other hand, speech recognition devices have a rather complicated structure and need quite expensive technology if reliable and flexible operation is to be secured; therefore, a speech recognizer will not be affordable for most of the devices listed above.

[0005] Therefore, it is the object of the present invention to provide a generic speech recognizer facilitating the control of several devices. Further, it is the object of the present invention to provide a remotely controllable device that simplifies its network controllability via speech.

[0006] A further object is to provide a method of self-initialization of the task-dependent parts of such a speech recognition device to control such remotely controllable devices.

[0007] These objects are respectively achieved as defined in the independent claims 1, 4, 14, 15 and 18.

[0008] Further preferred embodiments of the invention are defined in the respective subclaims.

[0009] The present invention will become apparent and its numerous modifications and advantages will be better understood from the following detailed description of an embodiment of the invention taken in conjunction with the accompanying drawings, wherein

Fig. 1 shows a block diagram of an example of a speech unit according to an embodiment of the invention;

Fig. 2 shows a block diagram of an example of a network device according to an embodiment of the invention;

Fig. 3 shows an example of a wired 1394 network having a speech unit and several 1394 devices;

Fig. 4 shows an example of a wired 1394 network having a speech unit incorporated in a 1394 device and several normal 1394 devices;

Fig. 5 shows three examples of different types of networks;

Fig. 6 shows an example of a home network in a house having three clusters;

Fig. 7 shows two examples of controlling a network device remotely via a speech recognizer;

Fig. 8 shows an example of a part of a grammar for a user dialogue during a VCR programming;

Fig. 9 shows an example of a protocol of the interaction between a user, a speech recognizer and a network device;

Fig. 10 shows an example of a learning procedure of a connected device, where the name of the device is determined automatically;

Fig. 11 shows an example of a protocol of a notification procedure of a device being newly connected, where the user is asked for the name of the device;

Fig. 12 shows an example of a protocol of the interaction of multiple devices for vocabulary extensions concerning media contents; and

Fig. 13 shows another example of a protocol of the interaction of multiple devices for vocabulary extensions concerning media contents.

[0010] Fig. 1 shows a block diagram of an example of the structure of a speech unit 2 according to the invention. Said speech unit 2 is connected to a microphone 1 and a loudspeaker, which could also be built into said speech unit 2. The speech unit 2 comprises a speech synthesizer, a dialogue module, a speech recognizer and a speech interpreter and is connected to an IEEE 1394 bus system 10. It is also possible that the microphone 1 and/or the loudspeaker are connected to the speech unit 2 via said bus system 10. Of course, it is then necessary that the microphone 1 and/or the loudspeaker are respectively equipped with circuitry to communicate with said speech unit 2 via said network, such as A/D and D/A converters and/or command interpreters, so that the microphone 1 can transmit the electric signals corresponding to received spoken utterances to the speech unit 2 and the loudspeaker can output received electric signals from the speech unit 2 as sound.
[0011] IEEE 1394 is an international-standard, low-cost digital interface that will integrate entertainment, communication and computing electronics into consumer multimedia. It is a low-cost, easy-to-use bus system, since it allows the user to remove or add any kind of 1394 devices with the bus being active. Although the present invention is described in connection with such an IEEE 1394 bus system and IEEE 1394 network devices, the proposed technology is independent of the specific IEEE 1394 standard and is well suited for all kinds of wired or wireless home networks or other networks.
[0012] As will be shown in detail later, a speech unit 2 as shown in Fig. 1 is connected to the home network 10. This is a general-purpose speech recognizer and synthesizer having a generic vocabulary. The same speech unit 2 is used for controlling all of the devices 11 connected to the network 10. The speech unit 2 picks up a spoken-command from a user via the microphone 1, recognizes it and converts it into a corresponding home network control code, henceforth called user-network-command, e.g. as specified by the IEEE 1394 standard. This control code is then sent to the appropriate device, which performs the action associated with the user-network-command.
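A minimal sketch of this dispatch step follows. The command codes, device names and addresses are invented for illustration; the application only states that the control code is, e.g., one specified by the bus standard.

```python
# Dispatch step of [0012]: recognized text -> user-network-command -> device.
# All command codes and device addresses here are hypothetical examples.

# Many-to-one mapping: several spoken forms yield one user-network-command.
SPOKEN_TO_COMMAND = {
    "play": "CMD_PLAY",
    "start playback": "CMD_PLAY",
    "switch on": "CMD_POWER_ON",
    "switch off": "CMD_POWER_OFF",
}

# Unique addresses assigned by the network controller when devices connect.
DEVICE_ADDRESSES = {"vcr": 0x22, "tv": 0x10}


def dispatch(recognized_text, device_name):
    """Convert a recognized spoken-command into a user-network-command
    addressed to the named device; returns what would go onto the bus."""
    command = SPOKEN_TO_COMMAND[recognized_text]
    address = DEVICE_ADDRESSES[device_name]
    return address, command


print(dispatch("start playback", "vcr"))  # (34, 'CMD_PLAY')
```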
[0013] To be capable of enabling all connected network devices to be controlled by speech, the speech unit has to "know" the commands that are needed to provide operability of all individual devices 11. Initially, the speech unit "knows" a basic set of commands, e.g. commands that are the same for various devices. There can be a many-to-one mapping between spoken-commands from a user and the user-network-commands generated therefrom. Such spoken-commands can e.g. be play, search for radio station YXZ or (sequences of) numbers such as phone numbers. These commands can be spoken in isolation or they can be explicitly or implicitly embedded within full sentences. Full sentences will henceforth also be called spoken-commands.

[0014] In general, speech recognizers and technologies for speech recognition, interpretation and dialogues are well known and will not be explained in detail in connection with this invention. Basically, a speech recognizer comprises a set of vocabulary and a set of knowledge bases (henceforth grammars) according to which a spoken-command from a user is converted into a user-network-command that can be carried out by a device. The speech recognizer may also use a set of alternative pronunciations associated with each vocabulary word. The dialogue with the user will be conducted according to some dialogue model.
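As a toy illustration of such a vocabulary-plus-grammar mapping, the sketch below uses regular expressions as a stand-in for real recognition grammars; the patterns and command names are invented, and a real system would be far richer (alternative pronunciations, dialogue model, etc.).

```python
# Toy version of [0014]: a small "grammar" that maps full spoken sentences
# onto user-network-commands, with slots for embedded parameters.
import re

GRAMMAR = [
    # (pattern over the recognized word sequence, user-network-command)
    (re.compile(r"^(please )?switch on the (?P<device>\w+)$"), "CMD_POWER_ON"),
    (re.compile(r"^search for radio station (?P<station>.+)$"), "CMD_TUNE"),
    (re.compile(r"^(play|start playback)$"), "CMD_PLAY"),
]


def interpret(utterance):
    """Return (command, slots) for the first matching grammar rule,
    or None if the utterance is out of grammar."""
    text = utterance.lower().strip()
    for pattern, command in GRAMMAR:
        match = pattern.match(text)
        if match:
            return command, match.groupdict()
    return None


print(interpret("please switch on the TV"))       # ('CMD_POWER_ON', {'device': 'tv'})
print(interpret("search for radio station YXZ"))  # ('CMD_TUNE', {'station': 'yxz'})
```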
[0015] The speech unit 2 according to an embodiment of the invention comprises a digital signal processor 3 connected to the microphone 1. The digital signal processor 3 receives the electric signals corresponding to the spoken-command from the microphone 1 and performs a first processing to convert these electric signals into digital words recognizable by a central processing unit 4. To be able to perform this first processing, the digital signal processor 3 is bidirectionally coupled to a memory 8 holding information about the process to be carried out by the digital signal processor 3 and a speech recognition section 3a included therein. Further, the digital signal processor 3 is connected to a feature extraction section 7e of a memory 7, wherein information is stored on how to convert electric signals corresponding to spoken-commands into digital words corresponding thereto. In other words, the digital signal processor 3 converts the spoken-command from a user, input via the microphone 1, into a computer-recognizable form, e.g. text code.

[0016] The digital signal processor 3 sends the generated digital words to the central processing unit 4. The central processing unit 4 converts these digital words into user-network-commands sent to the home network system 10. Therefore, the digital signal processor 3 and the central processing unit 4 can be seen as speech recognizer, dialogue module and speech interpreter.

[0017] It is also possible that the digital signal processor 3 only performs a spectrum analysis of the spoken-command from a user, input via the microphone 1, and the word recognition itself is conducted in the central processing unit 4 together with the conversion into user-network-commands. Depending on the capacity of the central processing unit 4, it can also perform the spectrum analysis, and the digital signal processor 3 can be omitted.
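The application does not specify the spectrum analysis; the sketch below assumes common short-time Fourier parameters to show the kind of feature stream the DSP side would hand to the word-recognition side.

```python
# Sketch of the front-end split of [0015]-[0017]: the DSP's "first processing"
# as a spectrum analysis producing one feature vector per frame. Frame length,
# hop size and log-magnitude features are typical assumed values, not values
# taken from the application.
import numpy as np


def spectrum_features(samples, frame_len=256, hop=128):
    """Return one log-magnitude spectrum per frame (the DSP side).
    Word recognition on these features would run on the CPU side."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        frames.append(np.log(spectrum + 1e-10))
    return np.array(frames)


# One second of a synthetic 440 Hz tone at 8 kHz stands in for microphone input.
t = np.arange(8000) / 8000.0
features = spectrum_features(np.sin(2 * np.pi * 440 * t))
print(features.shape)  # (frames, frequency bins)
```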
[0018] Further, the central processing unit 4 provides a learning function for the speech unit 2 so that the speech unit 2 can learn new vocabulary, grammar and the user-network-commands to be sent to a network device 11 corresponding thereto. To be able to perform these tasks, the central processing unit 4 is bidirectionally coupled to the memory 8, which also holds information about the processes to be performed by the central processing unit 4. Further, the central processing unit 4 is bidirectionally coupled to an initial vocabulary section 7a, an extended vocabulary section 7b, an initial grammar section 7c, an extended grammar section 7d, and a software section 7f that comprises a recognition section and a grapheme-phoneme conversion section, of the memory 7. Further, the central processing unit 4 is bidirectionally coupled to the home network system 10 and can also send messages to a digital signal processor 9 included in the speech unit 2, comprising a speech generation section 9a that serves to synthesize messages into speech and outputs this speech to a loudspeaker.

[0019] The central processing unit 4 is bidirectionally coupled to the home network 10 via a link layer control unit 5 and an I/F physical layer unit 6. These units serve to filter out network-commands from the bus 10 directed to the speech unit 2 and to address network-commands to selected devices connected to the network 10.

[0020] Therefore, it is also possible that new user-network-commands together with corresponding vocabulary and grammars can be learned by the speech unit 2 directly from other network devices. To perform such learning, the speech unit 2 can send control commands stored in the memory 8 to control the network devices, henceforth called control-network-commands, to request their user-network-commands and the corresponding vocabulary and grammars according to which they can be controlled by a user. The memory 7 comprises an extended vocabulary section 7b and an extended grammar section 7d to store newly input vocabulary or grammars. These sections are respectively designed like the initial vocabulary section 7a and the initial grammar section 7c, but newly input user-network-commands, together with the information needed to identify these user-network-commands, can be stored in the extended vocabulary section 7b and the extended grammar section 7d by the central processing unit 4. In this way, the speech unit 2 can learn the user-network-commands and corresponding vocabulary and grammars built into an arbitrary network device. New network devices then have no need for a built-in speech recognition device, but only for the user-network-commands and corresponding vocabulary and grammars that should be controllable via a speech recognition system. Further, there has to be a facility to transfer these data to the speech unit 2. The speech unit 2 according to the invention learns said user-network-commands and corresponding vocabulary and grammar, and the respective device can then be voice-controlled via the speech unit 2.

[0021] The initial vocabulary section 7a and the initial grammar section 7c store a basic set of user-network-commands that can be used for various devices, like user-network-commands corresponding to the spoken-commands switch on, switch off, pause, louder, etc. These user-network-commands are stored in connection with the vocabulary and grammars needed by the central processing unit 4 to identify them out of the digital words produced by the speech recognition section via the digital signal processor 3. Further, questions or messages are stored in a memory. These can be output from the speech unit 2 to a user. Such questions or messages may be used in a dialogue between the speech unit 2 and the user to complete commands spoken by the user into proper user-network-commands; examples are please repeat, which device, do you really want to switch off?, etc. All such messages or questions are stored together with the speech data needed by the central processing unit 4 to generate digital words to be output to the speech generation and synthesis section 9a of the digital signal processor 9 to generate spoken utterances output to the user via the loudspeaker. Through the microphone 1, the digital signal processors 3 and 9 and the loudspeaker, a "bidirectional coupling" of the central processing unit 4 with a user is possible. Therefore, it is possible that the speech unit 2 can communicate with a user and learn from him or her. As in the case of the communication with a network device 11, the speech unit 2 can access a set of control-network-commands stored in the memory 8 to instruct the user to give certain information to the speech unit 2.

[0022] As stated above, user-network-commands and the corresponding vocabulary and grammars can also be input by a user via the microphone 1 and the digital signal processor 3 to the central processing unit 4, on demand of control-network-commands output as messages by the speech unit 2 to the user. After the user has uttered a spoken-command to set the speech unit 2 into a learning state with him, the central processing unit 4 performs a dialogue with the user on the basis of control-network-commands stored in the memory 8 to generate new user-network-commands and the corresponding vocabulary to be stored in the respective sections of the memory 7.

[0023] It is also possible that the process of learning new user-network-commands is done half automatically by the communication between the speech unit 2 and an arbitrary network device, and half dialogue-controlled between the speech unit 2 and a user. In this way, user-dependent user-network-commands for selected network devices can be generated.

[0024] As stated above, the speech unit 2 processes three kinds of commands, i.e. spoken-commands uttered by a user, user-network-commands, i.e. digital signals corresponding to the spoken-commands, and control-network-commands to perform a communication with other devices or with a user, to learn new user-network-commands from other devices 11 and to assign certain functionalities thereto so that a user can input new spoken-commands, or to assign a new functionality to user-network-commands already included.

[0025] Output of the speech unit directed to the user is either synthesized speech or pre-recorded utterances. A mixture of both might be useful, e.g. pre-recorded utterances for the most frequent messages and synthesized speech for other messages. Any network device can send messages to the speech unit. These messages are either directly in orthographic form or they encode or identify in some way an orthographic message. These orthographic messages are then output via a loudspeaker, e.g. one included in the speech unit 2. Messages can contain all kinds of information usually presented on a display of a consumer device. Furthermore, there can be questions put forward to the user in the course of a dialogue. As stated above, such a dialogue can also be produced by the speech unit 2 itself to verify or confirm spoken-commands, or it can be generated by the speech unit 2 according to control-network-commands to learn new user-network-commands and corresponding vocabulary and grammars.
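A minimal sketch of this output policy follows, assuming a small store of pre-recorded clips for the most frequent messages and a synthesizer fallback; the clip paths and the print-based play/synthesize stand-ins are invented, where a real unit would drive section 9a of the digital signal processor 9 and the loudspeaker.

```python
# Output policy of [0025]: pre-recorded utterances for frequent messages,
# synthesized speech for everything else. Network devices deliver their
# messages in orthographic (text) form.

PRERECORDED = {  # hypothetical clip store
    "please repeat": "rom/please_repeat.pcm",
    "which device?": "rom/which_device.pcm",
}


def play(pcm_path):
    print(f"[loudspeaker] playing {pcm_path}")


def synthesize(text):
    print(f"[loudspeaker] synthesizing: {text}")


def output_message(orthographic):
    """Route a message from a network device (or from the dialogue itself)."""
    clip = PRERECORDED.get(orthographic.lower())
    if clip is not None:
        play(clip)               # frequent message: canned audio
    else:
        synthesize(orthographic)  # everything else: text-to-speech


output_message("please repeat")
output_message("The VCR has finished rewinding")
```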
[0026] The speech input and/or output facility, i.e. the microphone 1 and the loudspeaker, can also be one or more separate device(s). In this case messages can be communicated in orthographic form between the speech unit and the respective speech input and/or output facility.

[0027] Spoken messages sent from the speech unit 2 itself to the user, like which device should be switched on?, could also be asked back to the speech unit 2, e.g. which network devices do you know?, and this question could first be answered by the speech unit 2 via speech before the user answers the initial spoken message sent from the speech unit.
[0028] Fig. 2 shows a block diagram of an example of the structure of remotely controllable devices according to an embodiment of this invention, here a network device 11. This block diagram shows only those function blocks necessary for the speech controllability. A central processing unit 12 of such a network device 11 is connected via a link layer control unit 17 and an I/F physical layer unit 16 to the home network bus 10. As in the speech unit 2, the connection between the central processing unit 12 and the home network bus 10 is bidirectional, so that the central processing unit 12 can receive user-network-commands, control-network-commands and other information data from the bus 10, and send control-network-commands, messages and other information data to other network devices or a speech unit 2 via the bus 10. Depending on the device, it might also be possible that it will also send user-network-commands. The central processing unit 12 is bidirectionally coupled to a memory 14 where all information necessary for the processing of the central processing unit 12, including a list of the control-network-commands needed to communicate with other network devices, is stored. Further, the central processing unit 12 is bidirectionally coupled to a device control unit 15 controlling the overall processing of the network device 11. A memory 13 holding all user-network-commands to control the network device 11 and the corresponding vocabulary and grammars is also bidirectionally coupled to the central processing unit 12. These user-network-commands and corresponding vocabularies and grammars stored in the memory 13 can be downloaded into the extended vocabulary section 7b and the extended grammar section 7d of the memory 7 included in the speech unit 2, in connection with a device name for the respective network device 11, via the central processing unit 12 of the network device 11, the link layer control unit 17 and the I/F physical layer unit 16 of the network device 11, the home network bus system 10, the I/F physical layer unit 6 and the link layer control unit 5 of the speech unit 2, and the central processing unit 4 of the speech unit 2. In this way, all user-network-commands necessary to control a network device 11 and the corresponding vocabulary and grammars are learned by the speech unit 2 according to the present invention; therefore, a network device according to the present invention needs no built-in device-dependent speech recognizer to be controllable via speech, but just a memory holding all device-dependent user-network-commands with associated vocabulary and grammars to be downloaded into the speech unit 2. It is to be understood that a basic control of a network device by the speech unit 2 is also given without vocabulary update information, i.e. the basic control of a network device without its device-dependent user-network-commands with associated vocabulary and grammars is possible. Basic control here means having the possibility to give commands generally defined in some standard, like switch on, switch off, louder, switch channel, play, stop, etc.
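A hedged sketch of this download follows. The REQUEST_VOCABULARY control-network-command and the dict-based stand-ins for memory 13 and sections 7b/7d are invented; the application fixes no wire format for the transfer.

```python
# Download path of [0020]/[0028]: the speech unit asks a device for its
# command set and stores the reply in its extended sections.

class NetworkDevice:
    """Device side of Fig. 2: no recognizer on board, only memory 13 with
    downloadable user-network-commands, vocabulary and grammar."""
    def __init__(self, name, vocabulary, grammar):
        self.name, self.vocabulary, self.grammar = name, vocabulary, grammar

    def handle(self, control_network_command):
        if control_network_command == "REQUEST_VOCABULARY":
            return {"name": self.name,
                    "vocabulary": dict(self.vocabulary),
                    "grammar": list(self.grammar)}
        raise ValueError(f"unknown command {control_network_command!r}")


class SpeechUnit:
    """Speech unit side: extended sections 7b and 7d as plain containers."""
    def __init__(self):
        self.extended_vocabulary = {}  # (device name, word) -> command
        self.extended_grammar = []     # grammar rules

    def learn_device(self, device):
        reply = device.handle("REQUEST_VOCABULARY")
        for word, command in reply["vocabulary"].items():
            # Keying by device name keeps identical words on different
            # devices distinguishable.
            self.extended_vocabulary[(reply["name"], word)] = command
        self.extended_grammar.extend(reply["grammar"])


vcr = NetworkDevice("vcr",
                    {"record": "CMD_RECORD", "rewind": "CMD_REWIND"},
                    ["<vcr-command> := record | rewind"])
unit = SpeechUnit()
unit.learn_device(vcr)
print(unit.extended_vocabulary)
# {('vcr', 'record'): 'CMD_RECORD', ('vcr', 'rewind'): 'CMD_REWIND'}
```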
[0029] Fig. 3 shows an example of a network architecture having an IEEE 1394 bus 31 and, connected thereto, one speech unit 2 with microphone 1 and loudspeaker and four network devices 11.

[0030] Fig. 4 shows another example of a network architecture having four network devices 11 connected to an IEEE 1394 bus 31. Further, a network device 41 having a built-in speech unit with microphone 1 and loudspeaker is connected to the bus 31. Such a network device 41 with a built-in speech unit has the same functionality as a network device 11 and a speech unit 2. Here, the speech unit controls the network devices 11 and the network device 41 into which it is built.

[0031] Fig. 5 shows three further examples of network architectures. Network A is a network similar to that shown in Fig. 3, but six network devices 11 are connected to the bus 31. With regard to the speech unit 2 that is also connected to the bus 31, there is no limitation on the number of network devices 11 controllable via said speech unit 2. Every device connected to the bus 31 that is controllable via said bus 31 can also be controlled via the speech unit 2.

[0032] Network B shows a different type of network. Here, five network devices 11 and one speech unit 2 are connected to a bus system 51. The bus system 51 is organized so that a connection is only necessary between two devices. Network devices not directly connected to each other can communicate via other, third network devices. Regarding the functionality, network B has no restrictions in comparison to network A.

[0033] The third network shown in Fig. 5 is a wireless network. Here, all devices can directly communicate with each other via a transmitter and a receiver built into each device. This example also shows that several speech units 2 can be connected to one network. These speech units 2 can have the same functionality or different functionalities, as desired. In this way, it is also easily possible to build personalized speech units 2 that can be carried by the respective users and that can control different network devices 11, as desired by the user. Of course, personalized speech units can also be used in a wired network. In comparison to a wireless speech input and/or output facility, a personalized speech unit has the advantage that it can automatically log into another network and all personalized features are available.
[0034] Such a personalized network device can be constructed to translate only those spoken-commands of a selected user into user-network-commands, using speaker adaption or speaker verification. This enables a very secure access policy in that an access is only allowed if the correct speaker uses the correct speech unit. Of course, all kinds of accesses can be controlled in this way, e.g. access to the network itself or access to devices connected to the network, like access to rooms, to a VCR, to televisions and the like.

[0035] Further, electronic phone books may be stored within the speech unit. Calling functions by name, e.g. office, is strongly user-dependent and therefore such features will preferably be realized in personalized speech units. Also, spoken-commands such as switch on my TV can easily be assigned to the correct user-network-commands controlling the correct device, as it may be the case that different users assign different logical names to it and the speech unit 2 has to generate the same user-network-command when interpreting different spoken-commands. On the other hand, it is possible that the network e.g. comprises more than one device of the same type, e.g. two TVs, and the speech unit 2 has to generate different user-network-commands when interpreting the same spoken-command uttered by different users, e.g. switch on my TV.

[0036] One speech unit can contain personalized information of one user or of different users. In most cases the personalized speech unit corresponding to only one user will be portable and wireless, so that the user can take it with him/her and has the same speech unit at home, in the car or even in other networks, like in his/her office.

[0037] The personalized speech unit can be used for speaker verification purposes. It verifies the words of a speaker and allows the control of selected devices. This can also be used for controlling access to rooms, cars or devices, such as phones.

[0038] A personalized speech unit can contain a speech recognizer adapted to one person, which strongly enhances the recognition performance.
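The sketch below illustrates this personalization under stated assumptions: verify_speaker() is a placeholder returning a fixed identity (real verification would score voice features against the enrolled speaker's model), and the logical-name table with two users is invented.

```python
# Personalized unit of [0034]/[0035]: verify the speaker first, then resolve
# user-specific logical names ("my TV") before emitting the command.

ENROLLED_USER = "alice"

# Different users may bind the same logical name to different devices.
LOGICAL_NAMES = {
    ("alice", "my tv"): "tv_sitting_room",
    ("bob", "my tv"): "tv_study",
}


def verify_speaker(voice_features):
    return ENROLLED_USER  # placeholder: accept and identify the enrolled user


def personalized_command(utterance, voice_features=None):
    user = verify_speaker(voice_features)
    if user is None:
        return None  # wrong speaker: no user-network-command is generated
    text = utterance.lower()
    if text.startswith("switch on "):
        device = LOGICAL_NAMES[(user, text[len("switch on "):])]
        return device, "CMD_POWER_ON"
    return None


print(personalized_command("switch on my TV"))
# ('tv_sitting_room', 'CMD_POWER_ON'); Bob's unit would resolve differently
```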
[0039] Fig. 6 shows an example of a home network consisting of three clusters. One of the clusters is built by an IEEE 1394 bus 61 installed in the kitchen of the house. Connected to this bus are a broadcast receiver 65, a digital television 64, a printer 63, a phone 62 and a long-distance repeater 66. This cluster also has connections to a broadcast gateway 60 to the outside of the house and, via the repeater 66 and an IEEE 1394 bridge 74, to the cluster "sitting room", in which an IEEE 1394 bus 67 is also present. Apart from the bridge 74, a speech unit 70, a personal computer 69, a phone 68, a VCR 71, a camcorder 72 and a digital television 73a are connected to the bus 67. The bridge 74 is also connected to the third cluster "study", which comprises an IEEE 1394 bus 78 connected to the bridge 74 via a long-distance repeater 75. Further, a PC 76, a phone 77, a hard disc 79, a printer 80, a digital television 81 and a telephone NIU 82 are connected to said bus 78. A telephone gateway 83 is connected to the telephone NIU 82.

[0040] The above-described network is constructed so that every device can communicate with the other devices via the IEEE 1394 system, the bridge 74 and the repeaters 66 and 75. The speech unit 70 located in the sitting room can communicate with all devices and therewith has the possibility to control them. This speech unit 70 is built like the speech unit 2 described above. Since in the example shown in Fig. 6 several devices of the same type are present, e.g. the digital television 73a in the sitting room and the digital television 81 in the study, it is possible to define user-defined device names. When the network is set up, or when a device is connected to the network that already has a device of this type connected thereto, the speech unit 70 will ask the user for names for these devices, e.g. television in the sitting room and television in the study, to be assigned to the individual devices. To be able to recognize these names, one of the following procedures has to be done; a sketch of the first procedure follows the list.

1. The user has to enter the orthographic form (sequence of letters) of the device name by typing or spelling. The speech unit 70 maps the orthographic form into a phoneme or model sequence;
2. In the case of a personalized speech unit, the user utterance corresponding to the device name can be stored as a feature vector sequence that is directly used during recognition as a reference pattern in a pattern matching approach;
3. The phoneme sequence corresponding to the name can be learned automatically using a phoneme recognizer.
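A toy version of the first procedure, combined with the name-to-address binding of [0041], is sketched below. The letter-to-phoneme table is deliberately crude and invented; real grapheme-phoneme conversion would use the conversion rules of the software section 7f.

```python
# Naming procedure 1 of [0040] plus the name->address mapping of [0041]:
# map the typed device name to a phoneme sequence for recognition, and bind
# the name to the device's network address for later command routing.

LETTER_TO_PHONEME = {  # grossly simplified, illustration only
    "t": "t", "e": "eh", "l": "l", "v": "v", "i": "ih", "s": "s",
    "o": "ow", "n": "n",
}

NAME_TO_ADDRESS = {}   # spoken device name -> network address
NAME_TO_PHONEMES = {}  # spoken device name -> reference pronunciation


def register_device(orthographic_name, network_address):
    phonemes = [LETTER_TO_PHONEME[c]
                for c in orthographic_name.lower() if c != " "]
    NAME_TO_PHONEMES[orthographic_name] = phonemes
    NAME_TO_ADDRESS[orthographic_name] = network_address


register_device("television", 0x10)
print(NAME_TO_PHONEMES["television"])
# ['t', 'eh', 'l', 'eh', 'v', 'ih', 's', 'ih', 'ow', 'n']
print(hex(NAME_TO_ADDRESS["television"]))  # 0x10
```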
[0041] The user then only has to address these devices by name, e.g. television in the sitting room. The speech unit 70 maps the name to the appropriate network address. By default, the name corresponds to the functionality of the device. All commands uttered by a user are sent to the device named last. Of course, it is also possible that these names are changed later on.

[0042] In many situations a person might wish to access his devices at home over the phone, e.g. to retrieve faxes or to control the heating remotely. Two alternative architectures to realize such a remote access are illustrated in Fig. 7.