(12) Patent Application Publication    (10) Pub. No.: US 2005/0102142 A1
     Soufflet et al.                   (43) Pub. Date: May 12, 2005

(54) METHOD, MODULE, DEVICE AND SERVER FOR VOICE RECOGNITION

(76) Inventors: Frederic Soufflet, Chateaugiron (FR);
                Nour-Eddine Tazine, Noyal sur Vilaine (FR)

Correspondence Address:
Joseph S. Tripoli
Thomson Multimedia Licensing Inc.
Patent Operations CN 5312
Princeton, NJ 08543-0028 (US)

(21) Appl. No.: 10/467,586
(22) PCT Filed: Feb. 12, 2002
(86) PCT No.: PCT/FR02/00518
(30) Foreign Application Priority Data:
     Feb. 13, 2001 (FR) ........................ 01/01910

Publication Classification

(51) Int. Cl.7 ..................................................... G10L 15/00
(52) U.S. Cl. .............................................................. 704/246
`
(57) ABSTRACT

The invention concerns a voice recognition method implemented in at least a terminal, the voice recognition method using a language model, comprising the following steps: detecting at least a non-recognised expression in one of the terminals; recording in the terminal data representing the non-recognised expression; transmission by the terminal of the recorded data to a remote server; analysis, at the remote server, of said data and generation of data for correcting said language model taking into account at least part of the non-recognised expressions; and transmitting from the server to at least a terminal correction data, so as to enable subsequent recognition of at least some of the non-recognised expressions. The invention also concerns corresponding modules, devices and remote server.
`
`
`
`
[Drawing sheets 1 to 6 of 6 (FIGS. 1 to 8); only the recoverable labels are kept here:]

FIG. 1: General schematic of the system: voice source (100), voice recognition box (102), control box (105), corrections (106), appliance (107), unrecognized-expressions data storage (109), interface with server (112), user systems (114, 117, 118), remote server (116), operator (122).

FIG. 2: Schematic of the voice recognition box (102): recognition engine, rejection module, loading of language models.

FIG. 3: Electronic diagram of the voice recognition box: RAM holding the unrecognized expressions ("Exp_not_Rec"), their count ("Nb_Exp_not_Rec", 309) and the language model.

FIG. 4: Schematic of the remote server: receiver, analyser, correction construction, operator (122).

FIG. 5: Flow chart for testing an expression: initialization, wait for expression, expression recognized? (503).

FIG. 6: Flow chart for sending unrecognized-expression data: initialization, ..., disconnect remote server.

FIG. 7: Flow chart for receiving correcting data: initialization, wait for receipt of correction data, update language model.

FIG. 8: Flow chart of the remote server: initialization (800), receive unrecognized sentences data (801), process unrecognized sentences data (802), LM update necessary? (803), construct language model correction (805), send language model correction (806).
`
`
`
`
METHOD, MODULE, DEVICE AND SERVER FOR VOICE RECOGNITION
[0001] The present invention concerns the field of voice interfaces.

[0002] More precisely, the invention relates to the optimization of language models and/or of phonetic units in terminals using voice recognition.
[0003] Information or control systems are making ever increasing use of a voice interface to make interaction with the user faster and/or more intuitive. Since these systems are becoming ever more complex, the requirements in terms of voice recognition are ever more considerable, both as regards the extent of recognition (very large vocabulary) and the speed of recognition (real time).
[0004] Voice recognition processes based on the use of language models (the probability that a given word of the vocabulary of the application follows another word or group of words in the chronological order of the sentence) and of phonetic units are known in the state of the art. These techniques are in particular described in the work by Frederick Jelinek, "Statistical Methods for Speech Recognition", published by MIT Press in 1997.
[0005] These techniques rely on language models and phonetic units which are produced from representative voice samples (emanating, for example, from a population of users of a terminal who are made to utter commands).
[0006] In practice, the language models must take account of the speaking style ordinarily employed by a user of the system, and in particular of his "defects": hesitations, false starts, changes of mind, etc.
[0007] The quality of the language model used greatly influences the reliability of the voice recognition. This quality is most often measured by an index referred to as the perplexity of the language model, which schematically represents the number of choices which the system must make for each decoded word. The lower this perplexity, the better the quality.
[0008] The language model is necessary to translate the voice signal into a textual string of words, a step often used by dialogue systems. It is then necessary to construct a comprehension logic which makes it possible to comprehend the query so as to reply to it.
[0009] There are two standard methods for producing large vocabulary language models:
[0010] The so-called N-gram statistical method, most often employing a bigram or trigram, consists in assuming that the probability of occurrence of a word in the sentence depends solely on the N words which precede it, independently of the rest of its context in the sentence.
[0011] If one takes the example of the trigram for a vocabulary of 1000 words, it would be necessary to define 1000³ (one billion) probabilities to define the language model, this being impossible. The words are therefore grouped into sets which are either defined explicitly by the model designer, or deduced by self-organizing methods.
[0012] This type of language model is therefore constructed automatically from a text corpus.
`
[0013] This type of language model is used mainly for voice dictation systems whose ultimate functionality is to translate the voice signal into a text, without any comprehension phase being necessary.
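To make the grouping of words into sets ([0011]) concrete, here is a minimal class-based bigram sketch; it is an illustration only, and the word classes, toy corpus and smoothing constant are invented for the example (Python).

    from collections import defaultdict

    # Illustration only: the class-bigram component of a language model.
    # Grouping words into classes reduces the probabilities to estimate
    # from |V|^2 to |C|^2 (cf. paragraph [0011]).
    WORD_TO_CLASS = {  # hand-defined word classes (assumed)
        "<s>": "<s>", "record": "VERB", "play": "VERB",
        "the": "DET", "a": "DET", "film": "NOUN", "news": "NOUN",
    }

    def train(corpus):
        """Count class-to-class bigrams over a tokenized corpus."""
        counts = defaultdict(lambda: defaultdict(int))
        for sentence in corpus:
            tokens = ["<s>"] + sentence
            for prev, cur in zip(tokens, tokens[1:]):
                counts[WORD_TO_CLASS[prev]][WORD_TO_CLASS[cur]] += 1
        return counts

    def class_transition(counts, prev_word, word, alpha=0.1):
        """P(class(word) | class(prev_word)) with add-alpha smoothing."""
        row = counts[WORD_TO_CLASS[prev_word]]
        n_classes = len(set(WORD_TO_CLASS.values()))
        return (row[WORD_TO_CLASS[word]] + alpha) / (sum(row.values()) + alpha * n_classes)

    model = train([["record", "the", "film"], ["play", "the", "news"]])
    print(class_transition(model, "record", "the"))  # DET after VERB: high

A full model would multiply this class transition by the probability of the word within its class; the sketch keeps only the transition component.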
[0014] The second method consists in describing the syntax by means of a probabilistic grammar, typically a context-free grammar defined by virtue of a set of rules described in the so-called Backus-Naur Form, or BNF, or an extension of this form to contextual grammars. The rules describing grammars are most often handwritten. This type of language model is suitable for command and control applications, in which the recognition phase is followed by a phase of controlling an appliance or of searching for information in a database.
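Purely as an illustration (the rules and weights below are invented, not taken from the specification), a handwritten probabilistic grammar for a command-and-control application can be written as weighted BNF-style productions; the sketch enumerates every expression the grammar accepts, with its a priori probability (Python).

    import itertools

    # Each non-terminal maps to a list of (probability, expansion) pairs.
    GRAMMAR = {
        "<command>": [(0.6, ["<action>", "the", "<object>"]),
                      (0.4, ["<action>", "<channel>"])],
        "<action>":  [(0.5, ["record"]), (0.5, ["switch", "to"])],
        "<object>":  [(0.7, ["film"]), (0.3, ["news"])],
        "<channel>": [(1.0, ["channel", "<digit>"])],
        "<digit>":   [(0.5, ["one"]), (0.5, ["two"])],
    }

    def expansions(symbol="<command>"):
        """Yield (probability, words) for every sentence the grammar accepts."""
        if symbol not in GRAMMAR:   # terminal word
            yield 1.0, [symbol]
            return
        for prob, rhs in GRAMMAR[symbol]:
            for parts in itertools.product(*(list(expansions(s)) for s in rhs)):
                p, words = prob, []
                for sub_p, sub_words in parts:
                    p *= sub_p
                    words.extend(sub_words)
                yield p, words

    for p, words in expansions():
        print(f"{p:.3f}  {' '.join(words)}")

An expression outside these productions can never be recognized, which is the clear in/out distinction noted below for grammar-type models.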
[0015] The language model of an application describes the set of expressions (for example, sentences) that the application will be required to recognize. A drawback of the prior art is that, if the language model is of poor quality, the recognition system, even if it performs extremely well at the acoustico-phonetic decoding level, will have mediocre performance for certain expressions.
[0016] Language models of the stochastic type do not have, properly speaking, a clear definition of the expressions which are in the language model and of those which are outside it. Certain expressions simply have a higher a priori probability of occurrence than others.
[0017] Language models of the probabilistic grammar type show a clear difference between expressions belonging to the language model and expressions external to it. In these models, there therefore exist expressions which will never be able to be recognized, regardless of the quality of the phonetic models used. These are generally expressions having no meaning or carrying a meaning outside the field of application of the system developed.
[0018] It turns out that the language models of probabilistic type and their derivatives are more effective for command and control applications. These grammars are often written by hand, and one of the main difficulties in the development of dialogue systems is to offer a language model of good quality.
[0019] In particular, as far as models of the grammar type are concerned, it is not possible to define a language exhaustively, in particular if the latter is liable to be used by a large population (the case, for example, of a remote control for mass-market appliances). It is not possible to take account of all the possible expressions and turns of phrase (from formal language to slang), and/or of errors of grammar, etc.
[0020] The invention relates to a voice recognition process and system making it possible to modify and improve a language model remotely, on the basis of the recordings of expressions unrecognized by the system.
[0021] More precisely, the subject of the invention is a voice recognition process implemented in at least one terminal, the said voice recognition process using a language model, characterized in that it comprises the following steps (an illustrative sketch follows the list):
[0022] detection of at least one unrecognized expression in one of the terminals;

[0023] recording in the terminal of data representative of the unrecognized expression;

[0024] transmission by the terminal of the recorded data to a remote server, via a first transmission channel;

[0025] analysis, at the level of the remote server, of the data and generation of information for correcting the language model taking account of at least one part of the unrecognized expressions; and

[0026] transmission via a second transmission channel from the server to at least one terminal of the correcting information, so as to allow future recognition of at least certain of the unrecognized expressions.
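A minimal terminal-side sketch of these steps follows. It is an illustration only, not the claimed implementation: the endpoint URL, the payload format and the recognizer interface are invented for the example (Python).

    import json, urllib.request

    # Hypothetical sketch of steps [0022] to [0026] on the terminal side.
    SERVER_URL = "https://example.com/lm-corrections"  # assumed endpoint

    pending = []  # data representative of unrecognized expressions ([0023])

    def on_expression(audio_features, recognizer):
        """[0022]: detect an unrecognized expression and record its data."""
        if recognizer.recognize(audio_features) is None:  # rejected
            pending.append({"features": audio_features})

    def upload_pending():
        """[0024]: transmit the recorded data over a first channel."""
        req = urllib.request.Request(
            SERVER_URL, data=json.dumps(pending).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)
        pending.clear()

    def apply_correction(recognizer, correction):
        """[0026]: correcting information received over a second channel
        updates the local language model for future recognition."""
        recognizer.language_model.update(correction)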
[0027] Thus, the invention relies on an entirely novel and inventive approach to voice recognition, which makes it possible to update the various elements allowing voice recognition as a function of locally unrecognized expressions, a remote server furnished with considerable resources (for example, human and/or computational capabilities) generating the correcting information.

[0028] It is noted that here the language models comprise:
[0029] language models in the strict sense (this being the case, for example, when the data which are the subject of the recognition are of purely textual type);

[0030] models formed of one or more language models in the strict sense and of one or more sets of phonetic units (this corresponding in particular to the general case of voice recognition applied to voice samples).
[0031] The invention goes well beyond the straightforward updating of a vocabulary. Specifically, it is possible that, although all the words of an expression feature in the vocabulary used by the language model of the terminal, this expression may not be recognized. Only the updating of the language model itself then makes it possible to have this expression recognized subsequently. The updating of the vocabulary, which is one of the items of information from which the language model is derived, is not sufficient.
[0032] Here, the expressions are to be taken in the wide sense and relate to any vocal expression allowing interaction between a terminal and its user. Expressions (or utterances) comprise in particular sentences, phrases, isolated or non-isolated words, code words specific to the terminal, instructions, commands, etc.
[0033] The correcting information may comprise in particular information allowing partial or complete modification of the language model and/or phonetic units present in each terminal, by deleting, replacing or adding elements therein.
[0034] The server can receive data from each terminal allowing it to improve the language model and/or the phonetic units present in the data-sending terminal and also in all other terminals, each of the terminals benefiting from the shared experience acquired by the server from all the terminals.
[0035] Thus, the invention makes it possible to take into account language styles or turns of phrase specific to certain users (for example, the expression "8 pm in the evening" (a pleonasm which is hard to imagine a priori) instead of "8 pm" or "8 o'clock in the evening"), for which provision had not been made in the course of the construction of the language model implemented.
[0036] Furthermore, the invention takes into account the evolution of living languages (new turns of phrase or expressions, etc.).
[0037] It is noted that the invention applies equally well to language models of stochastic type and to language models of probabilistic grammar type. When the invention is applied to language models of stochastic type, very many correcting data are generally needed to influence recognition, whereas the correcting data for a model of probabilistic grammar type may be scanty and yet have an appreciable influence on the effectiveness and reliability of recognition.
[0038] According to a particular characteristic, the process is noteworthy in that the data representative of the unrecognized expressions comprise a compressed voice recording representative of parameters descriptive of the acoustic signal.
[0039] Thus, the invention advantageously makes it possible to take into account the voice data produced at the source for fine analysis at the server level, while limiting the volume of data transmitted to the remote server.
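A rough sketch of such a compressed recording of descriptive parameters is given below; it is illustrative only, and the frame size, the choice of log filter-bank energies and the use of zlib are assumptions, not the patent's method (Python).

    import struct, zlib
    import numpy as np

    def acoustic_parameters(samples, rate=16000, frame_ms=10, n_bands=12):
        """Crude log filter-bank energies, one vector per 10 ms frame."""
        frame = int(rate * frame_ms / 1000)
        vectors = []
        for start in range(0, len(samples) - frame, frame):
            spectrum = np.abs(np.fft.rfft(samples[start:start + frame]))
            bands = np.array_split(spectrum, n_bands)
            vectors.append([float(np.log(b.sum() + 1e-9)) for b in bands])
        return vectors

    def compressed_recording(vectors):
        """Pack the descriptive parameters and compress them for upload."""
        raw = b"".join(struct.pack(f"{len(v)}f", *v) for v in vectors)
        return zlib.compress(raw)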
[0040] According to a particular characteristic, the process is noteworthy in that, during the step of transmission by the terminal, the latter furthermore transmits to the server at least one of the items of information forming part of the group comprising:

[0041] information on the context of use of the voice recognition process when an expression has not been recognized; and

[0042] information relating to the speaker who has uttered an unrecognized expression.
[0043] Thus, voice recognition of the expressions unrecognized by the terminal, which may be performed remotely, is facilitated.

[0044] Furthermore, a check of the validity of the content of the unrecognized expressions may be performed as a function of the context. (For example, a command "record the transmission" has a meaning, and is therefore valid, when the terminal to which it is addressed is a video recorder, and has no meaning in respect of a mobile telephone.)
[0045] According to a particular characteristic, the process is noteworthy in that it implements an encryption and/or a scrambling of the recorded data and/or of the correcting information.
[0046] Thus, the data are made secure effectively and remain confidential.
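The sketch below illustrates how such a payload might combine the context and speaker information of [0040] to [0042] with the encryption of [0045]; the field names are invented, and the choice of the Fernet scheme from the Python "cryptography" package is an assumption, not the patent's method.

    import json, time
    from cryptography.fernet import Fernet  # assumed library choice

    KEY = Fernet.generate_key()  # in practice, provisioned with the terminal

    def build_payload(compressed_features, speaker_id, terminal_type):
        """Bundle recorded data with context-of-use and speaker information,
        then encrypt the whole payload before transmission ([0045])."""
        payload = {
            "features": compressed_features.hex(),
            "context": {
                "timestamp": time.time(),        # time and date of the failure
                "terminal_type": terminal_type,  # e.g. "video recorder"
            },
            "speaker": speaker_id,               # information on the speaker
        }
        return Fernet(KEY).encrypt(json.dumps(payload).encode())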
[0047] The invention also relates to a voice recognition module using a language model, characterized in that it comprises:

[0048] an analyser detecting unrecognized expressions;

[0049] a recorder of data representative of at least one unrecognized expression;

[0050] a transmitter transmitting the recorded data to a remote server; and
`
`
[0051] a receiver of correcting information allowing the correcting of the language model transmitted to the module, allowing future recognition by the module of at least certain of the unrecognized expressions, the correcting information having been transmitted by the remote server after analysis, at the level of the remote server, of the data and after generation of information for correcting the language model taking account of at least one part of the unrecognized expressions.
[0052] The invention also relates to a voice recognition device using a language model, characterized in that it comprises:

[0053] an analyser detecting unrecognized expressions;

[0054] a recorder of data representative of at least one unrecognized expression;

[0055] a transmitter transmitting the recorded data to a remote server; and

[0056] a receiver of correcting information allowing the correcting of the language model transmitted to the device, allowing future recognition by the device of at least certain of the unrecognized expressions, the correcting information having been transmitted by the remote server after analysis, at the level of the remote server, of the data and after generation of information for correcting the language model taking account of at least one part of the unrecognized expressions.
[0057] The invention also relates to a voice recognition server, the recognition being implemented in a set of at least one remote terminal using a language model, characterized in that it comprises the following means (an illustrative sketch follows the list):

[0058] a receiver of data representative of at least one expression unrecognized by at least one terminal forming part of the set of at least one remote terminal and having detected the unrecognized expression during a voice recognition operation; and

[0059] a sender sending to the set of at least one remote terminal correcting information obtained on the basis of an analysis of the data received at the level of the server, the correcting information allowing the correcting, by each of the terminals of the set, of the language model, allowing future recognition of at least one part of the unrecognized expressions.
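By way of illustration only, the server side might articulate these means as below (cf. the flow chart of FIG. 8). The storage, the analysis step and the update criterion are assumptions, not the specification's method (Python).

    # Hypothetical sketch of the server of [0057] to [0059].
    unprocessed = []   # data received from the terminals
    terminals = []     # assumed registry of the set of remote terminals

    def receive(data):
        """[0058]: receiver of unrecognized-expression data."""
        unprocessed.append(data)

    def process_batch(analyse, threshold=10):
        """FIG. 8: process pending data; if a language-model update is
        deemed necessary, construct and send the correcting information."""
        translations = [analyse(d) for d in unprocessed]  # e.g. operator review
        usable = [t for t in translations if t is not None]
        unprocessed.clear()
        if len(usable) < threshold:                # "LM update necessary?" (803)
            return
        correction = {"add_expressions": usable}   # correction construction (805)
        for terminal in terminals:                 # [0059]: sender, to the set
            terminal.send(correction)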
[0060] The particular characteristics and the advantages of the module, of the device and of the server for voice recognition being similar to those of the voice recognition process, they are not recalled here.
[0061] Other characteristics and advantages of the invention will become more clearly apparent on reading the following description of a preferred embodiment, given by way of straightforward non-limiting illustrative example, and of the appended drawings, among which:
[0062] FIG. 1 depicts a general schematic of a system comprising a voice-controlled box, in which the technique of the invention may be implemented;

[0063] FIG. 2 depicts a schematic of the voice recognition box of the system of FIG. 1;

[0064] FIG. 3 describes an electronic diagram of a voice recognition box implementing the schematic of FIG. 2;

[0065] FIG. 4 depicts a schematic of the server of the system of FIG. 1;

[0066] FIG. 5 represents a flow chart of the process for testing an expression and for recording data relating to unrecognized expressions, as implemented by the recognition engine of FIG. 2;

[0067] FIG. 6 represents a flow chart of the process for sending data relating to unrecognized expressions, as implemented by the rejection module of FIG. 2;

[0068] FIG. 7 represents a flow chart of the process for receiving correcting data, as implemented by the module for loading the language models of FIG. 2; and

[0069] FIG. 8 represents a flow chart of the process for receiving and for processing correcting data, as implemented within the remote server of FIG. 4.
[0070] The general principle of the invention therefore relies on voice recognition implemented in terminals, the voice recognition process using a language model and/or a set of phonetic units that may be updated by a remote server when, in particular, the latter deems it necessary.
[0071] In a general manner, each terminal can recognize expressions (for example, a sentence or a command) formulated by a speaker and execute a corresponding action.
[0072] Nevertheless, it is often found that certain expressions that are entirely comprehensible to a human being are not recognized by the device or the module implementing the voice recognition.
[0073] The failure of recognition may have multiple reasons:

[0074] vocabulary used by the speaker not forming part of the language model;

[0075] particular pronunciation (with an accent, for example);

[0076] particular turn of phrase not provided for by the voice recognition device or module;

[0077] etc.
[0078] Specifically, the language models and the sets of phonetic units are often constructed on the basis of statistical data which take into account samples of expressions customarily used by a typical population, certain words of vocabulary, pronunciations and/or turns of phrase then not being (and being unable to be) taken into account.
[0079] The invention relies firstly on detecting expressions unrecognized by the voice recognition device or module.
[0080] When an expression has not been recognized, the terminal records data representative of the signal corresponding to the unrecognized expression (such as, for example, a digital voice recording of the expression), with a view to sending them to a remote server.
`
`
[0081] At the level of the remote server centralizing the unrecognized expressions from a set of terminals, a human operator can then analyse the unrecognized expressions.
[0082] Certain of them may prove to be incomprehensible and/or unutilizable and will be discarded.
[0083] On the other hand, others will be entirely comprehensible to the operator, who will be able (if he deems it useful), through a man/machine link-up, to "translate" these expressions hitherto unrecognized by the terminals into a code comprehensible to the server.
[0084] The server can then take these expressions into account, together with their translation, so as to generate information for correcting the language model and/or the set of phonetic units.
[0085] It will be noted that correction is understood here as:

[0086] modification of the model; and/or

[0087] supplementing of the model.
[0088] The server then sends the correcting information to each of the terminals, which can update its language model and/or set of phonetic units, which are thus enriched with numerous expressions unrecognized by itself or by other terminals.
[0089] Thus, the voice recognition of each of the terminals is improved by benefiting from the experience shared by all the terminals.
[0090] According to a particular mode of the invention, the analysis is not performed by an operator but by the server itself, which may have much more considerable resources at its disposal than a straightforward terminal.
[0091] According to particular embodiments, the terminals send the server context data (for example, the time, the date, a control performed manually or vocally after the failure of a voice command, the location, the type of terminal, etc.) together with the data representative of the signal corresponding to the unrecognized expressions.
[0092] This may facilitate the analysis work of the operator and/or of the server.
[0093] A general schematic of a system comprising a voice-controlled box, in which the technique of the invention may be implemented, is depicted in conjunction with FIG. 1.
[0094] This system comprises in particular:

[0095] a remote server 116 controlled by a human operator 122; and

[0096] a plurality of user systems 114, 117 and 118.
[0097] The remote server 116 is linked to each of the user systems 114, 117 and 118 via communication downlinks 115, 119 and 120 respectively. These links may be permanent or temporary and may be of any type well known to the person skilled in the art. They may in particular be of broadcasting type and be based on RF, satellite or wire channels used by television, or of any other type such as, for example, an Internet-type link.
[0098] FIG. 1 describes in particular the user system 114, which is linked via a communication uplink 121 to the server 116. This link can likewise be of any type well known to the person skilled in the art (in particular telephone, Internet, etc.).
[0099] The user system 114 comprises in particular:

[0100] a voice source 100, which may in particular consist of a microphone intended to pick up a voice signal produced by a speaker;

[0101] a voice recognition box 102;

[0102] a control box 105 intended to drive an appliance 107;

[0103] a controlled appliance 107, for example of television, video recorder or mobile communication terminal type;

[0104] a unit 109 for storing the expressions detected as unrecognized; and

[0105] an interface 112 allowing upward and downward communications with the server 116.
[0106] The source 100 is linked to the voice recognition box 102 via a link 101, which allows it to transmit an analogue source wave representative of a voice signal to the box 102.
[0107] The box 102 can retrieve context information (such as, for example, the type of appliance 107 which can be controlled by the control box 105, or the list of control codes) via a link 104, and send commands to the control box 105 via a link 103.
[0108] The control box 105 sends commands via a link 106, for example infrared, to the appliance 107, as a function of the information which it recognizes according to its language model and its dictionary.
[0109] The control box 105 detects the expressions which it does not recognize and, instead of simply rejecting them by sending a non-recognition signal, it performs a recording of these expressions to the storage unit 109 via a link 108.
[0110] The unit 109 for storing the unrecognized expressions sends representative data to the interface 112 via a link 111, which relays them to the server 116 via the link 121. After correct transmission, the interface 112 can send a signal 110 to the storage unit 109, which can then erase the transmitted data.
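A minimal sketch of this store-and-forward behaviour follows; the class and method names are invented for the illustration (Python).

    class StorageUnit:
        """Hypothetical model of unit 109: keep unrecognized-expression data
        until the interface confirms correct transmission, then erase it."""

        def __init__(self):
            self._records = []

        def store(self, record):
            """Link 108: record data for an unrecognized expression."""
            self._records.append(record)

        def flush(self, interface):
            """Link 111: hand the records to the interface; erase them only
            once the acknowledgement (signal 110) has been received."""
            if not self._records:
                return
            if interface.relay(list(self._records)):  # relays via link 121
                self._records.clear()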
[0111] The control box 105 moreover receives correcting data from the interface 112 via a link 113, which the interface 112 has itself received from the remote server via the link 115. These correcting data are taken into account by the control box 105 for the updating of language models and/or of sets of phonetic units.
[0112] According to the embodiment considered, the source 100, the voice recognition box 102, the control box 105, the storage unit 109 and the interface 112 form part of one and the same device, and thus the links 101, 103, 104, 108, 111, 110 and 113 are links internal to the device. The link 106 is typically a wireless link.
[0113] According to a first variant embodiment of the invention described in FIG. 1, the elements 100, 102, 105, 109 and 112 are partly or completely separate and do not form part of one and the same device. In this case, the links 101, 103, 104, 108, 111, 110 and 113 are external wire or other links.
[0114] According to a second variant, the source 100, the boxes 102 and 105, the storage unit 109 and the interface 112, as well as the appliance 107, form part of one and the same device and are interlinked by internal buses (links 101, 103, 104, 108, 111, 110, 113 and 106). This variant is particularly beneficial when the device is, for example, a mobile telephone or a portable communication terminal.
[0115] FIG. 2 depicts a schematic of a voice-controlled box such as the box 102 illustrated with regard to FIG. 1.
[0116] It is noted that the box 102 receives from outside the analogue source wave 101, which is processed by an Acoustico-Phonetic Decoder 200, or APD (also called the "front end"). The APD 200 samples the source wave 101 at regular intervals (typically every 10 ms) so as to produce real vectors, or vectors belonging to code books, typically representing oral resonances, which are sent via a link 201 to a recognition engine 203. The APD is, for example, based on PLP analysis (standing for "Perceptual Linear Prediction"), described in particular in the article "Perceptual linear predictive (PLP) analysis of speech", written by Hynek Hermansky and published in the Journal of the Acoustical Society of America, Vol. 87, No. 4, 1990, pages 1738-1752.
[0117] With the aid of a dictionary 202, the recognition engine 203 analyses the real vectors that it receives, using in particular hidden Markov models (HMMs) and language models (which represent the probability of one word following another word). Recognition engines are in particular described in detail in the book "Statistical Methods for Speech Recognition", written by Frederick Jelinek and published by MIT Press in 1997.
[0118] The language model allows the recognition engine 203 (which may use in particular hidden Markov networks) to determine which words may follow a given word of any expression usable by the speaker in a given application, and to give the associated probability. The words in question belong to the vocabulary of the application, which may be, independently of the language model, of small size (typically from 10 to 300 words) or of large size (for example, of size greater than 300 000 words).
[0119] Patent application PCT/FR00/03329 dated 29 Nov. 1999, filed in the name of Thomson Multimedia, describes a language model comprising a plurality of syntactic blocks. The use of the invention which is the subject of the present patent application is particularly advantageous in conjunction with this type of modular language model, since the modules may be updated separately, thereby avoiding the downloading of files of overly large volume.
[0120] The language models are transmitted by a language model loading module 207. The module 207 itself receives language models, updates or corrections of language models and/or of phonetic units, transmitted from the server via the link 113.
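The following sketch illustrates the benefit of such modularity; the block names and the correction format are invented, and this is not the format used by the module 207 (Python).

    # Hypothetical modular language model made of syntactic blocks ([0119]).
    language_model = {
        "channel_commands": {},   # one syntactic block (contents elided)
        "time_expressions": {},   # another syntactic block (contents elided)
    }

    def apply_correction(correction):
        """Replace only the syntactic blocks named in the correcting
        information, avoiding the download of files of overly large volume."""
        for name, new_block in correction.get("modules", {}).items():
            language_model[name] = new_block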
[0121] It is noted that the dictionary 202 belongs to the language model, which makes reference to words from the dictionary. Thus, the dictionary 202 can itself be updated and/or corrected via a language model loaded by the module 207.
`
[0122] After implementation of a recognition operation based on the use of a Viterbi algorithm, the recognition engine 203 supplies the rejection module 211 with an ordered list of sequences of words, in accordance with the language model, exhibiting the best scores for the expression uttered.
[0123] The rejection module 211 works downstream of the recognition engine 203 and operates according to one or more of the following principles (an illustrative sketch follows the list):
[0124] Sometimes, for reasons specific to the Viterbi algorithm, the latter may not produce a consistent list, because the scores are so low that the limit of acceptable accuracies of the machine in terms of arithmetic computation is overstepped. There is therefore no consistent complete proposal. Thus, when the rejection module 211 detects one or more scores below a predetermined acceptable limit, the expression is rejected.
[0125] Each element of the list calculated by the Viterbi algorithm has been retained because the associated score was among the highest relative scores of all the possible expressions, according to the language model. Additionally, the Markov network associated with each of these expressions makes it possible to evaluate the intrinsic probability of the network in question producing the expression associated with the score observed. The rejection module 211 analyses this probability and, if it is less than a predetermined threshold of acceptability, the expression is rejected.
[0126] According to another method, for the best proposals obtained via the Viterbi algorithm, the rejection module 211 performs a complementary processing of the expressions, using criteria which had not been taken into account in the course of the Viterbi development. For example, it checks that those parts of the signal that have to be voiced, because they are associated with vowels, actually are so. If the expressions proposed do not fulfil these conditions, they are rejected.
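The sketch below condenses these three principles into one rejection test; the thresholds, attribute names and voicing check are invented for the illustration (Python).

    SCORE_FLOOR = 1e-30           # assumed limit of acceptable arithmetic accuracy
    PROBABILITY_THRESHOLD = 0.01  # assumed acceptability threshold ([0125])

    def reject(hypotheses, voiced_segments):
        """Return True if the expression must be rejected as unrecognized.
        `hypotheses` is the ordered list supplied by the recognition engine."""
        if not hypotheses:
            return True
        best = hypotheses[0]
        if best.score < SCORE_FLOOR:                  # principle of [0124]
            return True
        if best.intrinsic_probability < PROBABILITY_THRESHOLD:  # [0125]
            return True
        for span in best.vowel_spans:                 # complementary check [0126]
            if not voiced_segments.covers(span):      # vowels must be voiced
                return True
        return False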
[0127] When the rejection module 211 rejects an expression, as illustrated previously, the expression is said to be unrecognized and a signal indicating the rejected expression is sent to the recognition engine 203. In parallel, the rejection module transmits a recording of th