(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2005/0102142 A1
     Soufflet et al.                   (43) Pub. Date: May 12, 2005

(54) METHOD, MODULE, DEVICE AND SERVER FOR VOICE RECOGNITION

(76) Inventors: Frederic Soufflet, Chateaugiron (FR); Nour-Eddine Tazine, Noyal sur Vilaine (FR)

     Correspondence Address:
     Joseph S. Tripoli
     Thomson Multimedia Licensing Inc.
     Patent Operations CN 5312
     Princeton, NJ 08543-0028 (US)

(21) Appl. No.: 10/467,586
(22) PCT Filed: Feb. 12, 2002
(86) PCT No.: PCT/FR02/00518
(30) Foreign Application Priority Data
     Feb. 13, 2001 (FR) ........................... 01/01910

Publication Classification

(51) Int. Cl.7 ..................................................... G10L 15/00
(52) U.S. Cl. .............................................................. 704/246

(57) ABSTRACT
The invention concerns a voice recognition method implemented in at least a terminal, the voice recognition method using a language model, comprising the following steps: detecting at least a non-recognised expression in one of the terminals; recording in the terminal data representing the non-recognised expression; transmission by the terminal of the recorded data to a remote server; analysis, at the remote server, of said data and generation of data for correcting said language model taking into account at least part of the non-recognised expressions; and transmitting from the server to at least a terminal correction data, so as to enable subsequent recognition of at least some of the non-recognised expressions. The invention also concerns corresponding modules, devices and remote server.
[FIG. 1 (Sheet 1 of 6): general schematic of the system, showing the voice source, the voice recognition box, the control box, the corrections path 106, the unrecognized expressions data storage, the appliance, the interface with the server, the user systems 114, 117 and 118 ("user system n"), the remote server 116 and the operator 122.]
[FIG. 2 (Sheet 2 of 6): schematic of the voice recognition box 102, showing the source wave input 101, the recognition engine, the rejection module, element 202 and the loading of language models.]
[FIG. 3 (Sheet 3 of 6): electronic diagram of the voice recognition box, showing elements 301 to 303 and a RAM holding the registers "Exp not Rec" and "Nb Exp not Rec" (309) and the language model.]
`Patent Application Publication May 12, 2005 Sheet 4 of 6
`
`US 2005/0102142 A1
`
`
`
`receiver
`
`40
`
`analyser
`
`Correction
`C. Construction
`
`Coperator Cy 122
`Fig. 4
`
`
`
`initialization
`
`
`
`
`
`wait for expression
`
`expression
`recognized
`
`
`
`
`
`503
`
`Exhibit 1013
`Page 05 of 16
`
`

`

[FIG. 6 (Sheet 5 of 6): flow chart of the sending process: initialization; disconnect remote server.]

[FIG. 7: flow chart of correction receipt: initialization; wait for receipt of correction data; update language model.]
[FIG. 8 (Sheet 6 of 6): flow chart of the server process: initialization (800); receive unrecognized sentences data (801); process unrecognized sentences data (802); test "LM update necessary?" (803/804); construct language model correction (805); send language model correction (806).]
METHOD, MODULE, DEVICE AND SERVER FOR VOICE RECOGNITION

[0001] The present invention concerns the field of voice interfaces.

[0002] More precisely, the invention relates to the optimization of language models and/or of phonetic units in terminals using voice recognition.

[0003] Information or control systems are making ever increasing use of a voice interface to make interaction with the user faster and/or more intuitive. Since these systems are becoming ever more complex, the requirements in terms of voice recognition are ever more considerable, both as regards the extent of recognition (very large vocabulary) and the speed of recognition (real time).

[0004] Voice recognition processes based on the use of language models (the probability that a given word of the vocabulary of the application follows another word or group of words in the chronological order of the sentence) and of phonetic units are known in the state of the art. These techniques are in particular described in the work by Frederick Jelinek, "Statistical Methods for Speech Recognition", published by MIT Press in 1997.
[0005] These techniques rely on language models and phonetic units which are produced from representative voice samples (emanating for example from a population of users of a terminal who are made to utter commands).

[0006] In practice, the language models must take account of the speaking style ordinarily employed by a user of the system, and in particular of his "defects": hesitations, false starts, changes of mind, etc.

[0007] The quality of the language model used greatly influences the reliability of the voice recognition. This quality is most often measured by an index referred to as the perplexity of the language model, which schematically represents the number of choices which the system must make for each decoded word. The lower this perplexity, the better the quality.
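For reference (the definition below is the standard one from the speech recognition literature, not text from the patent), the perplexity of a language model over a test word sequence w_1, ..., w_N is the geometric-mean inverse probability per word:

```latex
% Standard definition of language-model perplexity (cf. Jelinek 1997).
% A model that must weigh fewer, more sharply ranked candidate words per
% decoding step obtains a lower perplexity.
\[
  PP = P(w_1, w_2, \ldots, w_N)^{-\frac{1}{N}}
     = 2^{-\frac{1}{N}\,\log_2 P(w_1, \ldots, w_N)}
\]
```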
[0008] The language model is necessary to translate the voice signal into a textual string of words, a step often used by dialogue systems. It is then necessary to construct a comprehension logic which makes it possible to comprehend the query so as to reply to it.

[0009] There are two standard methods for producing large vocabulary language models:

[0010] The so-called N-gram statistical method, most often employing a bigram or trigram, consists in assuming that the probability of occurrence of a word in the sentence depends solely on the N words which precede it, independently of the rest of its context in the sentence.

[0011] If one takes the example of the trigram for a vocabulary of 1000 words, it would be necessary to define 1000³ probabilities to define the language model, this being impossible. The words are therefore grouped into sets which are either defined explicitly by the model designer, or deduced by self-organizing methods.

[0012] This language model is therefore constructed automatically from a text corpus.
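Purely as an illustrative sketch (the patent gives no code), the following shows how bigram probabilities of the kind described in [0010] are typically estimated from a text corpus by relative frequency; the toy corpus and the absence of smoothing are simplifying assumptions:

```python
from collections import Counter

def bigram_model(corpus_sentences):
    """Estimate P(w2 | w1) by relative frequency over a toy corpus.

    Each sentence is padded with <s>/</s> markers; real systems add
    smoothing and word-class grouping, as paragraph [0011] notes.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(words[:-1])              # count bigram left contexts
        bigrams.update(zip(words[:-1], words[1:]))
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

# Toy usage: probability that "channel" follows "the".
model = bigram_model(["switch to the channel", "record the channel"])
print(model[("the", "channel")])  # -> 1.0 in this tiny corpus
```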
[0013] This type of language model is used mainly for voice dictation systems whose ultimate functionality is to translate the voice signal into a text, without any comprehension phase being necessary.

[0014] The second method consists in describing the syntax by means of a probabilistic grammar, typically a context-free grammar defined by virtue of a set of rules described in the so-called Backus-Naur Form or BNF, or an extension of this form to contextual grammars. The rules describing grammars are most often handwritten. This type of language model is suitable for command and control applications, in which the recognition phase is followed by a phase of controlling an appliance or of searching for information in a database.
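Again as a hedged illustration, here is a tiny BNF-style command grammar of the sort [0014] describes, together with a Python expansion of the expressions it accepts; the grammar itself is invented for this example and is not taken from the patent:

```python
from itertools import product

# Hypothetical BNF-style rules for a TV remote-control application:
#   <command> ::= <verb> <object>
#   <verb>    ::= "switch to" | "record"
#   <object>  ::= "channel one" | "channel two" | "the film"
GRAMMAR = {
    "<command>": [["<verb>", "<object>"]],
    "<verb>": [["switch to"], ["record"]],
    "<object>": [["channel one"], ["channel two"], ["the film"]],
}

def expand(symbol):
    """Enumerate every terminal expression derivable from a symbol."""
    if symbol not in GRAMMAR:          # terminal token
        return [symbol]
    expansions = []
    for rule in GRAMMAR[symbol]:
        for combo in product(*(expand(s) for s in rule)):
            expansions.append(" ".join(combo))
    return expansions

print(expand("<command>"))  # the 6 expressions this language model covers
```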
[0015] The language model of an application describes the set of expressions (for example sentences) that the application will be required to recognize. A drawback of the prior art is that, if the language model is of poor quality, the recognition system, even if it performs extremely well at the acoustico-phonetic decoding level, will have mediocre performance for certain expressions.

[0016] The stochastic type language models do not have, properly speaking, a clear definition of the expressions which are in the language model, and of those which are outside. Certain expressions simply have a higher a priori probability of occurrence than others.

[0017] The language models of probabilistic grammar type show a clear difference between expressions belonging to the language model, and expressions external to the language model. In these models, expressions therefore exist which will never be able to be recognized, regardless of the quality of the phonetic models used. These are generally expressions having no meaning or that carry a meaning outside of the field of application of the system developed.

[0018] It turns out that the language models of probabilistic type and their derivatives are more effective for command and control applications. These grammars are often written by hand, and one of the main difficulties of the development of dialogue systems is to offer a language model of good quality.

[0019] In particular, as far as the models of grammar type are concerned, it is not possible to exhaustively define a language, in particular if the latter is liable to be used by a large population (the case, for example, of a remote control for mass-market appliances). It is not possible to take account of all the possible expressions and turns of phrase (from formal language to slang), and/or of errors of grammar, etc.
[0020] The invention relates to a voice recognition process and system making it possible to modify and improve a language model remotely, on the basis of the recordings of expressions unrecognized by the system.

[0021] More precisely, the subject of the invention is a voice recognition process implemented in at least one terminal, the said voice recognition process using a language model, characterized in that it comprises the following steps (a sketch of the exchange follows the list):

[0022] detection of at least one unrecognized expression in one of the terminals;

[0023] recording in the terminal of data representative of the unrecognized expression;

[0024] transmission by the terminal of the recorded data to a remote server, via a first transmission channel;

[0025] analysis, at the level of the remote server, of the data and generation of information for correcting the language model taking account of at least one part of the unrecognized expressions; and

[0026] transmission via a second transmission channel from the server to at least one terminal of the correcting information, so as to allow future recognition of at least certain of the unrecognized expressions.
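By way of illustration only, the following sketch maps steps [0022] to [0026] onto terminal-side Python; every name here (the server address, the message format, the recognizer and model objects) is an assumption invented for this example, not something specified by the patent:

```python
import json
import socket

SERVER_ADDRESS = ("corrections.example.net", 9000)   # hypothetical server

def on_utterance(acoustic_params, recognizer, pending):
    """Steps [0022]-[0023]: detect an unrecognized expression and record it."""
    if recognizer.decode(acoustic_params) is None:   # rejection => unrecognized
        pending.append(list(acoustic_params))        # keep descriptive parameters

def upload_pending(pending):
    """Step [0024]: transmit recorded data to the remote server (first channel)."""
    with socket.create_connection(SERVER_ADDRESS) as sock:
        sock.sendall(json.dumps({"unrecognized": pending}).encode("utf-8"))
    pending.clear()   # the terminal may erase data once transmitted (cf. [0110])

def on_correction_received(correcting_info, language_model):
    """Step [0026]: update the local language model with correcting information."""
    language_model.update(correcting_info)   # exact merge depends on model type
```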
[0027] Thus, the invention relies on an entirely novel and inventive approach to voice recognition, which makes it possible to update the various elements allowing voice recognition as a function of locally unrecognized expressions, a remote server being furnished with considerable resources (for example human and/or computational capabilities) generating correcting information.

[0028] It is noted that here the language models comprise:

[0029] language models in the strict sense (this being the case, for example, when the data, the subject of the recognition, are of purely textual type);

[0030] models formed of one or more language models in the strict sense and of one or more sets of phonetic units (this corresponding in particular to the general case of voice recognition applied to voice samples).
[0031] The invention goes well beyond the straightforward updating of a vocabulary. Specifically, it is possible that, although all the words of an expression feature in the vocabulary used by the language model of the terminal, this expression may not be recognized. Only the updating of the language model itself then makes it possible to have this expression recognized subsequently. The updating of the vocabulary, which is one of the items of information from which the language model is derived, is not sufficient.

[0032] Here, the expressions are to be taken in the wide sense and relate to any vocal expression allowing interaction between a terminal and its user. Expressions (or utterances) comprise in particular sentences, phrases, isolated or non-isolated words, code words specific to the terminal, instructions, commands, etc.

[0033] The correcting information may comprise in particular information allowing partial or complete modification of the language model and/or phonetic units present in each terminal by deleting, replacing or adding elements therein.
[0034] The server can receive data from each terminal allowing it to improve the language model and/or the phonetic units present in the data-sending terminal and also in all other terminals, each of the terminals benefiting from the shared experience acquired by the server from all the terminals.

[0035] Thus, the invention makes it possible to take into account language styles or turns of phrase specific to certain users (for example, the expression "8 pm in the evening" (a pleonasm which is hard to imagine a priori) instead of "8 pm" or "8 o'clock in the evening") and for which provision had not been made in the course of the construction of the language model implemented.

[0036] Furthermore, the invention takes into account the evolution of living languages (new turns of phrase or expressions, etc.).

[0037] It is noted that the invention applies equally well to language models of stochastic type and to language models of probabilistic grammar type. When the invention is applied to language models of stochastic type, there are generally very many correcting data for influencing recognition, whereas the correcting data for a model of probabilistic grammar type may be scanty and have an appreciable influence on the effectiveness and reliability of recognition.
[0038] According to a particular characteristic, the process is noteworthy in that the data representative of the unrecognized expressions comprise a compressed voice recording representative of parameters descriptive of the acoustic signal.

[0039] Thus, the invention advantageously makes it possible to take into account the voice data sent to the source for fine analysis at the server level, while limiting the volume of data transmitted to the remote server.
[0040] According to a particular characteristic, the process is noteworthy in that during the step of transmission by the terminal, the latter furthermore transmits to the server at least one of the items of information forming part of the group comprising:

[0041] information on the context of use of the voice recognition process when an expression has not been recognized; and

[0042] information relating to the speaker who has uttered an unrecognized expression.

[0043] Thus, voice recognition of the expressions unrecognized by the terminal, which may be performed remotely, is facilitated.

[0044] Furthermore, a check of the validity of the content of the unrecognized expressions may be performed as a function of the context. (For example, a command "record the transmission" has a meaning and is therefore valid when the terminal to which it is addressed is a video recorder, and has no meaning in respect of a mobile telephone.)
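As an illustration of the kind of context and speaker information that [0040] to [0042] contemplate, here is a hypothetical upload record; every field name is invented for this sketch and is not specified by the patent:

```python
# Hypothetical payload accompanying one unrecognized expression.
unrecognized_report = {
    "acoustic_parameters": [0.12, -0.43, 0.88],   # compressed signal features ([0038])
    "context": {
        "timestamp": "2005-05-12T20:00:00",
        "terminal_type": "video_recorder",        # lets the server check validity ([0044])
        "manual_command_after_failure": "record", # what the user did next (cf. [0091])
    },
    "speaker": {"id": "user-7", "locale": "fr-FR"},  # speaker information ([0042])
}
```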
[0045] According to a particular characteristic, the process is noteworthy in that it implements an encryption and/or a scrambling of the recorded data and/or of the correcting information.

[0046] Thus, the data are made secure effectively and remain confidential.
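The patent does not name an algorithm for the encryption of [0045], so the sketch below uses symmetric Fernet encryption from the third-party "cryptography" package purely as one possible realization:

```python
# One possible realization of the encryption in [0045]; the scheme and key
# handling here are assumptions for illustration only.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, provisioned to terminal and server
cipher = Fernet(key)

recorded_data = b"compressed acoustic parameters of an unrecognized expression"
encrypted = cipher.encrypt(recorded_data)           # sent over the uplink
assert cipher.decrypt(encrypted) == recorded_data   # recovered at the server
```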
[0047] The invention also relates to a voice recognition module using a language model, characterized in that it comprises:

[0048] an analyser detecting unrecognized expressions;

[0049] a recorder of data representative of at least one unrecognized expression;

[0050] a transmitter transmitting the recorded data to a remote server; and

[0051] a receiver of correcting information allowing the correcting of the language model transmitted to the module allowing future recognition of at least certain of the unrecognized expressions by the module, the correcting information having been transmitted by the remote server after analysis at the level of the remote server of the data, and after generation of information for correcting the language model taking account of at least one part of the unrecognized expressions.
[0052] The invention also relates to a voice recognition device using a language model, characterized in that it comprises:

[0053] an analyser detecting unrecognized expressions;

[0054] a recorder of data representative of at least one unrecognized expression;

[0055] a transmitter transmitting the recorded data to a remote server; and

[0056] a receiver of correcting information allowing the correcting of the language model transmitted to the device allowing future recognition of at least certain of the unrecognized expressions by the device, the correcting information having been transmitted by the remote server after analysis at the level of the remote server of the data, and after generation of information for correcting the language model taking account of at least one part of the unrecognized expressions.
[0057] The invention also relates to a voice recognition server, the recognition being implemented in a set of at least one remote terminal, using a language model, characterized in that it comprises the following means:

[0058] a receiver of data representative of at least one expression unrecognized by at least one terminal forming part of the set of at least one remote terminal and having detected the unrecognized expression during a voice recognition operation; and

[0059] a sender sending to the set of at least one remote terminal correcting information obtained on the basis of an analysis of the data received at the level of the server, the correcting information allowing the correcting, by each of the terminals of the set, of the language model allowing future recognition of at least one part of the unrecognized expressions.

[0060] The particular characteristics and the advantages of the module, of the device and of the server for voice recognition being similar to those of the voice recognition process, they are not recalled here.
[0061] Other characteristics and advantages of the invention will become more clearly apparent on reading the following description of a preferred embodiment, given by way of straightforward non-limiting illustrative example, and of the appended drawings, among which:

[0062] FIG. 1 depicts a general schematic of a system comprising a voice-controlled box, in which the technique of the invention may be implemented;

[0063] FIG. 2 depicts a schematic of the voice recognition box of the system of FIG. 1;

[0064] FIG. 3 describes an electronic diagram of a voice recognition box implementing the schematic of FIG. 2;

[0065] FIG. 4 depicts a schematic of the server of the system of FIG. 1;

[0066] FIG. 5 represents a flow chart of the process for testing an expression and for recording data relating to unrecognized expressions, as implemented by the recognition engine of FIG. 2;

[0067] FIG. 6 represents a flow chart of the process for sending data relating to unrecognized expressions, as implemented by the rejection module of FIG. 2;

[0068] FIG. 7 represents a flow chart of the process for receiving correcting data, as implemented by the module for loading the language models of FIG. 2; and

[0069] FIG. 8 represents a flow chart of the process for receiving and for processing correcting data, as implemented within the remote server of FIG. 4.
[0070] The general principle of the invention therefore relies on voice recognition implemented in terminals, the voice recognition process using a language model and/or a set of phonetic units that may be updated by a remote server when, in particular, the latter deems it necessary.

[0071] In a general manner, each terminal can recognize expressions (for example a sentence or command) formulated by a speaker and execute a corresponding action.

[0072] Nevertheless, it is often found that certain expressions that are entirely comprehensible to a human being are not recognized by the device or the module implementing the voice recognition.

[0073] The failure of recognition may have multiple reasons:

[0074] vocabulary used by the speaker not forming part of the language model;

[0075] particular pronunciation (with an accent, for example);

[0076] particular turn of phrase not provided for by the voice recognition device or module;

[0077] etc.
[0078] Specifically, the language models and the sets of phonetic units are often constructed on the basis of statistical data which take into account samples of expressions customarily used by a typical population, certain words of vocabulary, pronunciations, and/or turns of phrase then not being (and unable to be) taken into account.

[0079] The invention relies firstly on detecting expressions unrecognized by the voice recognition device or module.

[0080] When an expression has not been recognized, the terminal records data representative of the signal corresponding to the unrecognized expressions (such as, for example, a digital voice recording of the expression), with a view to sending them to a remote server.
[0081] At the level of the remote server centralizing the unrecognized expressions from a set of terminals, a human operator can then analyse the unrecognized expressions.

[0082] Certain of them may prove to be incomprehensible and/or unutilizable and will be discarded.

[0083] On the other hand, others will be entirely comprehensible to the operator, who will be able (if he deems it useful), through a man/machine link-up, to "translate" these expressions hitherto unrecognized by the terminals into a code comprehensible to the server.

[0084] The server can then take these expressions into account together with their translation so as to generate information for correcting the language model and/or the set of phonetic units.

[0085] It will be noted that correction is understood here as:

[0086] modification of the model; and/or

[0087] supplementing the model.

[0088] The server then sends the correcting information to each of the terminals, which can update its language model and/or set of phonetic units, which are enriched with numerous expressions unrecognized by itself or by other terminals.

[0089] Thus, the voice recognition of each of the terminals is improved by benefitting from the experience shared by all the terminals.
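A minimal sketch of what "correction" in the sense of [0085] to [0087] could look like for a grammar-type model held as a set of rules; the patch schema is an assumption invented here, since the patent leaves the encoding of the correcting information open:

```python
def apply_correction(language_model, patch):
    """Apply server-issued correcting information to a rule-based model.

    `language_model` maps non-terminals to lists of alternative rules;
    `patch` lists elements to delete, replace or add, mirroring [0033].
    The patch format is hypothetical, for illustration only.
    """
    for symbol in patch.get("delete", []):
        language_model.pop(symbol, None)            # remove obsolete rules
    for symbol, rules in patch.get("replace", {}).items():
        language_model[symbol] = rules              # modify the model ([0086])
    for symbol, rules in patch.get("add", {}).items():
        language_model.setdefault(symbol, []).extend(rules)  # supplement ([0087])
    return language_model

# Example: teach the model the turn of phrase "8 pm in the evening" ([0035]).
model = {"<time>": [["8 pm"], ["8 o'clock in the evening"]]}
apply_correction(model, {"add": {"<time>": [["8 pm in the evening"]]}})
```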
[0090] According to a particular mode of the invention, the analysis is not performed by an operator but by the server itself, which may have much more considerable resources at its disposal than a straightforward terminal.

[0091] According to particular embodiments, the terminals send the server context data (for example the time, the date, a control performed manually or vocally after the failure of a voice command, the location, the type of terminal, etc.) together with the data representative of the signal corresponding to the unrecognized expressions.

[0092] This may facilitate the analysis work of the operator and/or of the server.
[0093] A general schematic of a system comprising a voice-controlled box, in which the technique of the invention may be implemented, is depicted in conjunction with FIG. 1.

[0094] This system comprises in particular:

[0095] a remote server 116 controlled by a human operator 122; and

[0096] a plurality of user systems 114, 117 and 118.

[0097] The remote server 116 is linked to each of the user systems 114, 117 and 118 via communication downlinks 115, 119 and 120 respectively. These links may be permanent or temporary and be of any type well known to the person skilled in the art. They may in particular be of broadcasting type and be based on RF, satellite or wire channels used by television, or of any other type such as, for example, an Internet type link.

[0098] FIG. 1 describes in particular the user system 114, which is linked via a communication uplink 121 to the server 116. This link can likewise be of any type well known to the person skilled in the art (in particular telephone, Internet, etc.).
[0099] The user system 114 comprises in particular:

[0100] a voice source 100, which may in particular consist of a microphone intended to pick up a voice signal produced by a speaker;

[0101] a voice recognition box 102;

[0102] a control box 105 intended to drive an appliance 107;

[0103] a controlled appliance 107, for example of television, video recorder or mobile communication terminal type;

[0104] a unit 109 for storing the expressions detected as unrecognized; and

[0105] an interface 112 allowing upward and downward communications to the server 116.

[0106] The source 100 is linked to the voice recognition box 102 via a link 101, which allows it to transmit an analogue source wave representative of a voice signal to the box 102.

[0107] The box 102 can retrieve context information (such as, for example, the type of appliance 107 which can be controlled by the control box 105, or the list of control codes) via a link 104, and send commands to the control box 105 via a link 103.
[0108] The control box 105 sends commands via a link 106, for example infrared, to the appliance 107, as a function of the information which it recognizes according to its language model and its dictionary.

[0109] The control box 105 detects the expressions which it does not recognize and, instead of simply rejecting them by sending a non-recognition signal, it performs a recording of these expressions to the storage unit 109 via a link 108.

[0110] The unit 109 for storing the unrecognized expressions sends representative data to the interface 112 via a link 111, which relays them to the server 116 via the link 121. After correct transmission, the interface 112 can send a signal 110 to the storage unit 109, which can then erase the transmitted data.

[0111] The control box 105 receives, moreover, correcting data from the interface 112 via a link 113, which the interface 112 has itself received from the remote server via the link 115. These correcting data are taken into account by the control box 105 for the updating of language models and/or of sets of phonetic units.
[0112] According to the embodiment considered, the source 100, the voice recognition box 102, the control box 105, the storage unit 109 and the interface 112 form part of one and the same device, and thus the links 101, 103, 104, 108, 111, 110 and 113 are links internal to the device. The link 106 is typically a wireless link.

[0113] According to a first variant embodiment of the invention described in FIG. 1, the elements 100, 102, 105, 109 and 112 are partly or completely separate and do not form part of one and the same device. In this case, the links 101, 103, 104, 108, 111, 110 and 113 are external wire or other links.

[0114] According to a second variant, the source 100, the boxes 102 and 105, the storage unit 109 and the interface 112, as well as the appliance 107, form part of one and the same device and are interlinked by internal buses (links 101, 103, 104, 108, 111, 110, 113 and 106). This variant is particularly beneficial when the device is, for example, a mobile telephone or a portable communication terminal.
[0115] FIG. 2 depicts a schematic of a voice-controlled box such as the box 102 illustrated with regard to FIG. 1.

[0116] It is noted that the box 102 receives from outside the analogue source wave 101, which is processed by an Acoustico-Phonetic Decoder 200 or APD (also called the "front end"). The APD 200 samples the source wave 101 at regular intervals (typically every 10 ms) so as to produce real vectors or vectors belonging to code books, typically representing oral resonances, that are sent via a link 201 to a recognition engine 203. The APD is for example based on PLP (standing for "Perceptual Linear Prediction"), described in particular in the article "Perceptual Linear Prediction (PLP) analysis of speech" written by Hynek Hermansky and published in the Journal of the Acoustical Society of America, Vol. 87, No. 4, 1990, pages 1738-1752.
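To illustrate the front-end behaviour described in [0116] (one feature vector every 10 ms), here is a schematic framing sketch; the 25 ms window length and the raw-frame output are simplifying assumptions, since the patent only names PLP generically:

```python
import numpy as np

def frame_signal(samples, sample_rate=16000, frame_ms=25, step_ms=10):
    """Cut a speech signal into overlapping analysis frames.

    One frame starts every `step_ms` (10 ms, as in [0116]); each frame is
    `frame_ms` long. A real APD would then derive PLP or similar vectors
    per frame; here we just return the raw frames.
    """
    step = int(sample_rate * step_ms / 1000)    # 160 samples at 16 kHz
    size = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
    count = 1 + max(0, (len(samples) - size) // step)
    return np.stack([samples[i * step: i * step + size] for i in range(count)])

# One second of (placeholder) audio yields roughly 100 vectors per second.
frames = frame_signal(np.zeros(16000))
print(frames.shape)  # (98, 400)
```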
[0117] With the aid of a dictionary 202, the recognition engine 203 analyses the real vectors that it receives, using in particular hidden Markov models or HMMs and language models (which represent the probability of one word following another word). Recognition engines are in particular described in detail in the book "Statistical Methods for Speech Recognition" written by Frederick Jelinek and published by MIT Press in 1997.

[0118] The language model allows the recognition engine 203 (which may use in particular hidden Markov networks) to determine which words may follow a given word of any expression usable by the speaker in a given application, and to give the associated probability. The words in question belong to the vocabulary of the application, which may be, independently of the language model, of small size (typically from 10 to 300 words) or of large size (for example of size greater than 300,000 words).
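Since [0117] and [0122] both turn on Viterbi decoding over hidden Markov models, here is a compact textbook Viterbi sketch in the log domain; the interface is generic and the model parameters would come from the HMMs and language model, which this sketch does not attempt to reproduce:

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit, observations):
    """Textbook Viterbi decoding in the log domain.

    log_init[s]     : log P(state s at t=0)
    log_trans[s, t] : log P(state t | state s)
    log_emit[s, o]  : log P(observation o | state s)
    Returns the most probable state sequence and its log score.
    """
    score = log_init + log_emit[:, observations[0]]
    back = []
    for obs in observations[1:]:
        cand = score[:, None] + log_trans        # scores via every predecessor
        back.append(np.argmax(cand, axis=0))     # best predecessor per state
        score = np.max(cand, axis=0) + log_emit[:, obs]
    path = [int(np.argmax(score))]               # trace the best path backwards
    for pointers in reversed(back):
        path.append(int(pointers[path[-1]]))
    return path[::-1], float(np.max(score))
```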
[0119] Patent application PCT/FR00/03329 dated 29 Nov. 1999, filed in the name of Thomson Multimedia, describes a language model comprising a plurality of syntactic blocks. The use of the invention which is the subject of the present patent application is particularly advantageous in conjunction with this type of modular language model, since the modules may be updated separately, thereby avoiding the downloading of files of overly large volume.

[0120] The language models are transmitted by a language model loading module 207. The module 207 itself receives language models, updates or corrections of language models and/or of phonetic units transmitted from the server via the link 113.

[0121] It is noted that the dictionary 202 belongs to the language model, which makes reference to words from the dictionary. Thus, the dictionary 202 itself can be updated and/or corrected via a language model loaded by the module 207.
[0122] After implementation of a recognition operation based on the use of a Viterbi algorithm, the recognition engine 203 supplies the rejection module 211 with an ordered list of sequences of words in accordance with the language model, which exhibits the best score for the expression uttered.

[0123] The rejection module 211 works downstream of the recognition engine 203 and operates according to one or more of the following principles (combined in the sketch after the list):

[0124] Sometimes, for reasons specific to the Viterbi algorithm, the latter may not produce a consistent list, because the scores are so low that the limit of acceptable accuracies of the machine in terms of arithmetic computation is overstepped. There is therefore no consistent complete proposal. Thus, when the rejection module 211 detects one or more scores below a predetermined acceptable limit, the expression is rejected.

[0125] Each element of the list calculated by the Viterbi algorithm has been retained because the associated score was among the highest relative scores of all the possible expressions, according to the language model. Additionally, the Markov network associated with each of these expressions makes it possible to evaluate the intrinsic probability of the network in question producing the expression associated with the score observed. The rejection module 211 analyses this probability and, if it is less than a predetermined threshold of acceptability of probability, the expression is rejected.

[0126] According to another method, for the best proposals obtained via the Viterbi algorithm, the rejection module 211 performs a complementary processing of the expressions, using criteria which had not been taken into account in the course of the Viterbi development. For example, it checks that those parts of the signal that have to be voiced, because they are associated with vowels, actually are. If the expressions proposed do not fulfil these conditions, they are rejected.
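As a hedged sketch combining the three rejection principles of [0124] to [0126], the function below shows one way such checks could be chained; the threshold values, the hypothesis schema and the `voicing_check` callback are all assumptions invented for this example:

```python
def should_reject(hypotheses, score_floor=-1e8, prob_threshold=1e-6,
                  voicing_check=None):
    """Combine the rejection principles of [0124]-[0126] on a result list.

    `hypotheses` is the ordered list produced by Viterbi decoding, each a
    dict with hypothetical keys: score, intrinsic_probability, segments.
    """
    if not hypotheses:
        return True                                 # no consistent proposal
    best = hypotheses[0]
    if best["score"] < score_floor:                 # [0124] score underflow
        return True
    if best["intrinsic_probability"] < prob_threshold:   # [0125] HMM check
        return True
    if voicing_check is not None and not voicing_check(best["segments"]):
        return True                                 # [0126] voicing mismatch
    return False
```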
[0127] When the rejection module 211 rejects an expression, as illustrated previously, the expression is said to be unrecognized, and a signal indicating the rejected expression is sent to the recognition engine 203. In parallel, the rejection module transmits a recording of th
