(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2001/0053974 A1
     Lucke et al.                      (43) Pub. Date: Dec. 20, 2001

(54) SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION METHOD, AND RECORDING MEDIUM

(76) Inventors: Helmut Lucke, Tokyo (JP); Katsuki Minamino, Tokyo (JP); Yasuharu Asano, Kanagawa (JP); Hiroaki Ogawa, Chiba (JP)

    Correspondence Address:
    William S. Frommer, Esq.
    FROMMER LAWRENCE & HAUG LLP
    745 Fifth Avenue
    New York, NY 10151 (US)

(21) Appl. No.: 09/804,354

(22) Filed: Mar. 12, 2001

(30) Foreign Application Priority Data

    Mar. 14, 2000 (JP) ........ 2000-069698

Publication Classification

(51) Int. Cl.7 ........ G10L 15/12; G10L 15/08; G10L 15/00
(52) U.S. Cl. ........ 704/240

(57) ABSTRACT

In order to prevent degradation of speech recognition accuracy due to an unknown word, a dictionary database has stored therein a word dictionary in which are stored, in addition to words that are the objects of speech recognition, suffixes, which are sound elements and sound element sequences forming the unknown word, for classifying the unknown word by the part of speech thereof. Based on such a word dictionary, a matching section connects the acoustic models of a sound model database, and calculates the score using the series of features output by a feature extraction section on the basis of the connected acoustic models. Then, the matching section selects a series of the words, which represents the speech recognition result, on the basis of the score.

[Representative drawing: block diagram of the speech recognition apparatus — speech input, microphone, A/D conversion section, feature extraction section, and matching section, with sound model, dictionary, and grammar databases feeding the matching section, which outputs the speech recognition result.]

[FIG. 1 (Sheet 1 of 7): block diagram of a conventional speech recognition apparatus — speech input, microphone 1, A/D conversion section 2, feature extraction section 3, matching section 4, sound model database 5, dictionary database 6, grammar database 7, speech recognition result.]

[FIG. 2 (Sheet 2 of 7): example structure of the word dictionary (word network) stored in the dictionary database 6 of FIG. 1; the branches carry phonological information.]

[FIG. 3 (Sheet 3 of 7): block diagram of an embodiment of a speech recognition apparatus to which the present invention is applied — speech input, microphone 1, A/D conversion section 2, feature extraction section 3, matching section 4, sound model database 5, dictionary database 6, grammar database 7, speech recognition result.]

[FIG. 4 (Sheet 4 of 7): example structure of the word dictionary stored in the dictionary database 6 of FIG. 3.]

[FIG. 5 (Sheet 5 of 7): flowchart of the speech recognition processing — S1: extract features; S2: calculate score; S3: select word sequence having greatest final score.]

[FIG. 6 (Sheet 6 of 7): another example of the structure of a word dictionary stored in the dictionary database 6 of FIG. 3, showing a root node and phonological information on the branches.]

[FIG. 7 (Sheet 7 of 7): block diagram of an embodiment of a computer to which the present invention is applied, including a removable recording medium (reference numerals 110 and 111 appear in the drawing).]

SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION METHOD, AND RECORDING MEDIUM

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to a speech recognition apparatus, a speech recognition method, and a recording medium. More particularly, the present invention relates to a speech recognition apparatus and a speech recognition method which are capable of reducing degradation of speech recognition accuracy, for example, in a case where an unknown word is contained in an utterance, and to a recording medium therefor.

[0003] 2. Description of the Related Art

[0004] FIG. 1 shows the construction of an example of a conventional speech recognition apparatus for performing continuous speech recognition.

[0005] Speech produced by a user is input to a microphone 1. In the microphone 1, the input speech is converted into an audio signal, which is an electrical signal. This audio signal is supplied to an AD (analog-to-digital) conversion section 2. In the AD conversion section 2, the audio signal, which is an analog signal, from the microphone 1 is sampled and quantized, and is converted into audio data, which is a digital signal. This audio data is supplied to a feature extraction section 3.

[0006] The feature extraction section 3 performs, for each appropriate frame, acoustic processing, such as Fourier transforming and filtering, on the audio data from the AD conversion section 2, thereby extracting features such as, for example, MFCCs (Mel Frequency Cepstrum Coefficients), and supplies the features to a matching section 4. It is also possible for the feature extraction section 3 to extract features such as a spectrum, linear prediction coefficients, cepstrum coefficients, and line spectrum pairs.
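
As a rough illustration of this frame-wise feature extraction (not the apparatus's actual implementation), the following sketch uses the librosa library to compute MFCC feature vectors from a recorded waveform. The sampling rate, the number of coefficients, and the function name `extract_features` are illustrative assumptions, not values taken from the patent.

```python
# Hedged sketch: frame-wise MFCC extraction, roughly analogous to the
# feature extraction section 3. Sampling rate and coefficient count are
# illustrative assumptions.
import librosa

def extract_features(wav_path: str, sr: int = 16000, n_mfcc: int = 13):
    # Load the digitized audio (conceptually, the output of the AD conversion section 2).
    samples, sr = librosa.load(wav_path, sr=sr)
    # Frame-wise analysis (windowing, Fourier transform, mel filtering) yields
    # one MFCC vector per frame; the result has shape (n_mfcc, n_frames).
    mfcc = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=n_mfcc)
    # Return the time series of feature vectors, one row per frame.
    return mfcc.T
```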

[0007] The matching section 4 performs speech recognition of the speech input to it (the input speech) based on, for example, the continuous-distribution HMM method, while referring to a sound model database 5, a dictionary database 6, and a grammar database 7 as necessary, by using the features from the feature extraction section 3.

[0008] More specifically, the sound model database 5 stores therein sound models representing the acoustic features of the individual sound elements (phonemes) and syllables of the spoken language for which speech recognition is performed. Here, since speech recognition is performed based on the continuous-distribution HMM method, an HMM (Hidden Markov Model) is used as the sound model. The dictionary database 6 stores therein word dictionaries in which information on the pronunciation (phonological information) of each word (vocabulary item) which is the object of speech recognition is described. The grammar database 7 stores therein grammar rules (language models) describing how the words entered in the word dictionary of the dictionary database 6 are connected (chained). Here, as the grammar rule, for example, a rule based on a context-free grammar (CFG), a statistical word-sequencing probability (N-gram), etc., can be used.
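
To make the roles of these three knowledge sources concrete, here is a minimal sketch of how they might be represented in code. The word pronunciations follow the FIG. 2 example discussed later in the text; the HMM topology and the bigram probabilities are invented placeholders for illustration only.

```python
# Hedged sketch of the three knowledge sources consulted by the matching section.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class PhonemeHMM:
    """Placeholder acoustic (sound) model for one phoneme (sound model database 5)."""
    name: str
    n_states: int = 3   # a typical left-to-right topology; the value is an assumption

# Dictionary database 6: phonological information (pronunciation) for each word.
# These pronunciations follow the FIG. 2 example in the text.
word_dictionary: Dict[str, List[str]] = {
    "I":   ["A", "I"],
    "ice": ["A", "I", "S"],
    "icy": ["A", "I", "S", "I"],
    "up":  ["A", "P"],
}

# Grammar database 7: a bigram language model, P(word | previous word).
# The probabilities are invented placeholders.
bigram: Dict[Tuple[str, str], float] = {
    ("<s>", "I"): 0.20,
    ("I", "up"):  0.01,
}
```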

[0009] The matching section 4 connects sound models stored in the sound model database 5 by referring to the word dictionary of the dictionary database 6, thereby forming a sound model of a word (a word model). Furthermore, the matching section 4 connects several word models by referring to the grammar rules stored in the grammar database 7, and uses the word models connected in that manner to recognize, based on the features, the speech input to the microphone 1 by the continuous-distribution HMM method. That is, the matching section 4 detects the series of word models for which the score (likelihood) at which the time series of features output by the feature extraction section 3 is observed is greatest, and outputs the word sequence corresponding to that series of word models as the speech recognition result.

[0010] More specifically, the matching section 4 accumulates the appearance probability of each feature for the word sequence corresponding to the connected word models, takes the accumulated value as a score, and outputs the word sequence which maximizes the score as the speech recognition result.

[0011] The score calculation is generally performed by jointly evaluating an acoustic score (hereinafter referred to as an "acoustic score" where appropriate) given by the sound models stored in the sound model database 5 and a linguistic score (hereinafter referred to as a "linguistic score" where appropriate) given by the grammar rules stored in the grammar database 7.

[0012] More specifically, for example, in the case of the HMM method, the acoustic score is calculated, for each word, from the acoustic models which form the word model, based on the probability (appearance probability) with which the sequence of features output by the feature extraction section 3 is observed. Also, for example, in the case of a bigram, the linguistic score is determined based on the probability that a particular word and the word immediately before it are connected (chained). Then, the speech recognition result is determined based on a final score (hereinafter referred to as a "final score" where appropriate) obtained by jointly evaluating the acoustic score and the linguistic score for each word.

[0013] Specifically, when the k-th word in a word sequence composed of N words is denoted as w_k, the acoustic score of the word w_k is denoted as A(w_k), and its linguistic score is denoted as L(w_k), the final score S of that word sequence is calculated, for example, based on the following equation:

    S = Σ (A(w_k) + C_k · L(w_k))    (1)

[0014] where Σ represents summation over k from 1 to N, and C_k represents a weight applied to the linguistic score L(w_k) of the word w_k.

[0015] The matching section 4 performs a matching process for determining the N that maximizes the final score given by equation (1) and the word sequence w_1, w_2, ..., w_N, and outputs that word sequence w_1, w_2, ..., w_N as the speech recognition result.
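
The following is a minimal sketch of the final-score computation of equation (1) and of the selection of the best candidate. It uses a single shared weight in place of the per-word weights C_k, and the function names and candidate representation are assumptions made for illustration.

```python
# Hedged sketch of equation (1): the final score of a word sequence is the
# sum over its words of the acoustic score plus a weighted linguistic score.
from typing import Callable, List, Sequence

def final_score(words: Sequence[str],
                acoustic_score: Callable[[str], float],
                linguistic_score: Callable[[str], float],
                weight: float = 1.0) -> float:
    # S = sum over k of ( A(w_k) + C_k * L(w_k) ); a single shared weight
    # stands in for the per-word weights C_k here.
    return sum(acoustic_score(w) + weight * linguistic_score(w) for w in words)

def pick_result(candidates: List[Sequence[str]],
                acoustic_score: Callable[[str], float],
                linguistic_score: Callable[[str], float],
                weight: float = 1.0) -> Sequence[str]:
    # The matching process outputs the candidate word sequence whose final score is greatest.
    return max(candidates,
               key=lambda ws: final_score(ws, acoustic_score, linguistic_score, weight))
```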

[0016] As a result of processing such as that described above being performed, in the speech recognition apparatus of FIG. 1, when a user utters, for example, the Japanese sentence meaning "I want to go to New York", an acoustic score and a linguistic score are given to each word of the utterance, such as "New York", "ni (to)", "ikitai (want to go)", and "desu". When the final score obtained by jointly evaluating those scores is greatest, that word sequence, "New York", "ni", "ikitai", "desu", is output as the speech recognition result.

[0017] If the calculation of the acoustic score were performed independently for every word entered in the word dictionary of the dictionary database 6, the amount of calculation would be large, so a method of sharing (making common) portions of the acoustic score calculation among a plurality of words may be used. That is, there is a method in which, among the words of the word dictionary, for words whose starting phonemes are the same, a common acoustic model is used from the start phoneme up to the last phoneme that the words share, and individual acoustic models are used for the phonemes thereafter, thereby forming one tree-structured network as a whole, and the acoustic score is determined by using this network.

[0018] In this case, for example, as shown in FIG. 2, the word dictionary is formed as a network of words with a tree structure (a word network), which is obtained by sequentially connecting branches corresponding to the phonemes of each word which is the object of speech recognition, starting from a root node which is the starting point.

[0019] When the word network is formed, for words whose starting phonemes are the same, branches corresponding to the shared phonemes, from the start phoneme up to the last phoneme they have in common, are used jointly in the manner described above. In FIG. 2, an alphabetic character surrounded by slashes (/) attached to each branch indicates a phoneme, and a portion enclosed by a rectangle indicates a word. For example, for the words "I", "ice", "icy", and "up", the phoneme /A/ at the start thereof is the same and, therefore, a common branch corresponding to the phoneme /A/ is made. Also, for the words "I", "ice", and "icy", since the second phoneme /I/ thereof is also the same, a common branch corresponding to the second phoneme /I/, in addition to the start phoneme /A/, is also made. Furthermore, for the words "ice" and "icy", since the third phoneme /S/ thereof is the same, a common branch corresponding to the third phoneme /S/, in addition to the start phoneme /A/ and the second phoneme /I/, is also made.

[0020] Furthermore, for the words "be" and "beat", since the first phoneme /B/ thereof and the second phoneme /I/ thereof are the same, common branches corresponding to the start phoneme /B/ and the second phoneme /I/ are made.

[0021] In a case where the word dictionary which forms the word network of FIG. 2 is used, the matching section 4 reads, from the sound model database 5, the acoustic models of the phonemes corresponding to a series of branches extending from the root node of the word network, connects them, and calculates, based on the connected acoustic models, an acoustic score by using the series of features from the feature extraction section 3.

[0022] Consequently, the acoustic scores of the words "I", "ice", "icy", and "up" are calculated in a common manner for the first phoneme /A/ thereof. Also, the acoustic scores of the words "I", "ice", and "icy" are calculated in a common manner for the first and second phonemes /A/ and /I/. In addition, the acoustic scores of the words "ice" and "icy" are calculated in a common manner for the first to third phonemes /A/, /I/, and /S/. For the remaining phoneme (the second phoneme) /P/ of the word "up" and the remaining phoneme (the fourth phoneme) /I/ of the word "icy", the acoustic score is calculated separately.

[0023] The acoustic scores of the words "be" and "beat" are calculated in a common manner for the first and second phonemes /B/ and /I/ thereof. Then, for the remaining phoneme (the third phoneme) /T/ of the word "beat", the acoustic score is calculated separately.

[0024] Consequently, by using a word dictionary which forms a word network, the amount of calculation of acoustic scores can be greatly reduced.
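
The word network described above is essentially a prefix tree (trie) keyed by phonemes. The following sketch builds such a structure for the FIG. 2 example words, so that words sharing leading phonemes share the corresponding branches from the root node; the class and function names are assumptions for illustration.

```python
# Hedged sketch of the word network (FIG. 2): words that share leading
# phonemes share branches from the root, so the acoustic score for a shared
# prefix needs to be computed only once.
from typing import Dict, Optional

class Node:
    def __init__(self):
        self.children: Dict[str, "Node"] = {}   # phoneme -> child node (branch)
        self.word: Optional[str] = None         # set on the node that ends a word

def build_word_network(pronunciations: Dict[str, list]) -> Node:
    root = Node()
    for word, phonemes in pronunciations.items():
        node = root
        for ph in phonemes:
            # Reuse the existing branch for a shared prefix, otherwise add one.
            node = node.children.setdefault(ph, Node())
        node.word = word
    return root

# Pronunciations taken from the FIG. 2 example in the text.
network = build_word_network({
    "I":    ["A", "I"],
    "ice":  ["A", "I", "S"],
    "icy":  ["A", "I", "S", "I"],
    "up":   ["A", "P"],
    "be":   ["B", "I"],
    "beat": ["B", "I", "T"],
})
```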

[0025] In the matching section 4, in the manner described above, when acoustic scores are calculated using the series of features on the basis of acoustic models connected along a series of branches (hereinafter referred to as a "path" where appropriate) extending from the root node of the word network, eventually an end node of the word network (in FIG. 2, the end of the final branch when moving from the root node to the right along the branches) is reached. That is, for example, in a case where HMMs are used as the acoustic models, when acoustic scores are calculated using the series of features on the basis of the HMMs connected along the series of branches which form the path, there is a time at which the acoustic score in the final state of the connected HMMs becomes large to a certain degree (hereinafter referred to as the "local maximum time" where appropriate).

[0026] In this case, the matching section 4 assumes that the region from the time of the first feature used for the calculation of the acoustic score to the local maximum time is the speech region in which the word corresponding to the path is spoken, and that word is taken as a candidate for a word which is a constituent of the word sequence serving as the speech recognition result. Then, based on the acoustic models connected along a series of branches (a path) extending from the root node of the word network, the calculation of the acoustic score of a candidate word connected after that candidate word is performed again, using the series of features after the local maximum time.

[0027] In the matching section 4, as a result of the above processing being repeated, a large number of word sequences are obtained as candidates for the speech recognition result. The matching section 4 discards words with a low acoustic score from among this large number of candidate word sequences, that is, it performs acoustic pruning, thereby selecting (keeping) only word sequences whose acoustic score is equal to or greater than a predetermined threshold value, that is, only word sequences which have a certain degree of certainty, from an acoustic point of view, of being the speech recognition result, and the processing continues.

[0028] In addition, in the process in which candidate word sequences for the speech recognition result are created while calculating the acoustic score in the manner described above, the matching section 4 calculates the linguistic score of each word which is a constituent of the candidate word sequences, on the basis of the grammar rule, such as an N-gram, entered in the grammar database 7. Then, the matching section 4 discards words having a low linguistic score, that is, it performs linguistic pruning, thereby selecting (keeping) only word sequences whose linguistic score is equal to or greater than a predetermined threshold value, that is, only word sequences which have a certain degree of certainty, from a linguistic point of view, of being the speech recognition result, and the processing continues.

[0029] As described above, the matching section 4 calculates the acoustic score and the linguistic score of each word, and performs acoustic and linguistic pruning on the basis of those scores, thereby selecting one or more word sequences which seem likely as the speech recognition result. Then, by repeating the calculation of the acoustic score and the linguistic score of the word connected after each such word sequence, one or more word sequences which have a certain degree of certainty are eventually obtained as candidates for the speech recognition result. Then, the matching section 4 determines, from among those word sequences, the word sequence having the greatest final score, for example as given by equation (1), as the speech recognition result.
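
A compact sketch of this prune-then-select loop is given below. The candidate representation, the threshold values, and the function names are illustrative assumptions; the real matching section interleaves pruning with the incremental score calculation described above rather than applying it to finished sequences.

```python
# Hedged sketch of acoustic/linguistic pruning and final selection.
from typing import List, Tuple

# (words so far, accumulated acoustic score, accumulated linguistic score)
Candidate = Tuple[List[str], float, float]

def prune(candidates: List[Candidate],
          acoustic_threshold: float,
          linguistic_threshold: float) -> List[Candidate]:
    # Acoustic pruning keeps only candidates whose acoustic score is at or
    # above its threshold; linguistic pruning does the same for the
    # linguistic score.
    return [c for c in candidates
            if c[1] >= acoustic_threshold and c[2] >= linguistic_threshold]

def select_result(candidates: List[Candidate], weight: float = 1.0) -> List[str]:
    # Final selection: the greatest jointly evaluated (final) score, as in eq. (1).
    best = max(candidates, key=lambda c: c[1] + weight * c[2])
    return best[0]
```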

[0030] In a speech recognition apparatus, the number of words, as objects of speech recognition, that can be entered in the word dictionary of the dictionary database 6 is limited, for example, by the computation speed of the apparatus, its memory capacity, etc.

[0031] When the number of words that are objects of speech recognition is limited, various problems occur if a user speaks a word which is not an object of speech recognition (hereinafter referred to as an "unknown word" where appropriate).

[0032] More specifically, in the matching section 4, even when an unknown word is spoken, the acoustic score of each word entered in the word dictionary is calculated using the features of the speech of the unknown word, and a word whose acoustic score is large to a certain degree is erroneously selected as a candidate for the speech recognition result of the unknown word.

[0033] As described above, when an unknown word is spoken, an error occurs at the portion of that unknown word, and furthermore, this error may cause errors at other portions.

[0034] More specifically, for example, in the manner described above, in a case where the user speaks the Japanese sentence meaning "I want to go to New York", when "New York" is an unknown word, an erroneous word is selected for the "New York" portion, and it is therefore difficult to precisely determine the boundary between "New York", which is the unknown word, and the following word "ni (to)". As a result, an error occurs at the boundary between the words, and this error affects the calculation of the acoustic scores of the other portions.

[0035] Specifically, in the manner described above, after an erroneous word, which is not "New York", is selected, the acoustic score of the next word is calculated using the series of features whose starting point is the end point of the series of features used for the calculation of the acoustic score of that erroneous word. Consequently, the calculation of the acoustic score is performed, for example, using features belonging to the end portion of the speech "New York", or is performed without using features belonging to the initial portion of the next speech "ni (to)". As a result, there are cases in which the acoustic score of the correct word "ni (to)" for the speech recognition result becomes smaller than that of other words.

[0036] In addition, in this case, even if the acoustic score of the word which was wrongly selected as the speech recognition result does not become very large, its linguistic score may become large. As a result, there are cases in which the score obtained when its acoustic score and linguistic score are jointly evaluated (hereinafter referred to as a "word score" where appropriate) becomes greater than the word score of the correct word "ni (to)" for the speech recognition result.

[0037] As described above, as a result of a mistake in the speech recognition of the unknown word, the speech recognition of words at positions close to the unknown word is also performed mistakenly.

[0038] As the words which are objects of speech recognition in a speech recognition apparatus, generally, words with a high frequency of appearance in newspapers, novels, etc., are often selected, but there is no guarantee that a word with a low frequency of appearance will not be spoken by a user. Therefore, it is necessary to cope with unknown words somehow.

[0039] One example of a method for coping with an unknown word is one in which an unknown word, that is, a word which is not an object of speech recognition, is divided into segments, such as the sound elements which form the word or sound element sequences composed of several sound elements, and each such segment is treated as a word in a pseudo manner (what is commonly called a "sub-word"), so that the segment is made an object of speech recognition.

[0040] Since there are not very many types of sound elements and sound element sequences which form words, even if such sound elements and sound element sequences are made objects of speech recognition as pseudo words, this does not exert a very large influence on the amount of calculation or the memory capacity. In this case, the unknown word is recognized as a series of such pseudo words (hereinafter referred to as "pseudo-words" where appropriate), and as a result, the number of unknown words apparently becomes zero.

[0041] In this case, not only an unknown word but also a word entered in the word dictionary can be recognized as a series of pseudo-words. Whether a spoken word will be recognized as a word entered in the word dictionary or as an unknown word, that is, as a series of pseudo-words, is determined based on the score calculated for each word.

[0042] However, in a case where pseudo-words are used, since the unknown word is recognized merely as a series of sound elements or sound element sequences which are pseudo-words, the unknown word cannot be processed by using an attribute thereof. That is, since, for example, the part of speech of the unknown word, as an attribute thereof, cannot be known, the grammar rule cannot be applied, causing the speech recognition accuracy to be degraded.

[0043] Also, there are some types of speech recognition apparatus in which a word dictionary for each of a plurality of languages is prestored in the dictionary database 6, and the word dictionary is switched, for example, according to an operation by the user, so that speech recognition in a plurality of languages is made possible. In this case, the words of the languages other than the language of the word dictionary currently in use become unknown words; however, if the language of the unknown word is known as its attribute, it is possible to switch automatically to the word dictionary of that language, and in that case the word which was an unknown word can then be recognized correctly.

[0044] Specifically, for example, in a case where English and French word dictionaries are stored in the dictionary database 6, when the English word dictionary is in use, if it is known that an unknown word is a French word, it can be considered that the speaker has changed to a French-speaking person, and the word dictionary may be switched from the English dictionary to the French dictionary, so that speech recognition with a higher accuracy is made possible.

SUMMARY OF THE INVENTION

[0045] The present invention has been achieved in view of such circumstances. An object of the present invention is to improve speech recognition accuracy by allowing the attribute of an unknown word to be obtained.

[0046] To achieve the above-mentioned object, according to one aspect of the present invention, there is provided a speech recognition apparatus comprising: extraction means for extracting features of the speech from the speech; calculation means for calculating a score using the features on the basis of a dictionary in which are entered unknown-word-forming elements, which are elements forming an unknown word, for classifying an unknown word by an attribute thereof, and words which are objects of speech recognition; and selection means for selecting a series of the words, which represents a speech recognition result, on the basis of the score.

[0047] In the dictionary, unknown-word-forming elements for classifying an unknown word by a part of speech thereof may be entered.

[0048] In the dictionary, suffixes may be entered as the unknown-word-forming elements.

[0049] In the dictionary, the phonemes which form an unknown word may be entered together with the suffixes.

[0050] In the dictionary, unknown-word-forming elements for classifying an unknown word by a language thereof may be entered.

[0051] The speech recognition apparatus of the present invention may further comprise the dictionary.
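
As a concrete reading of paragraphs [0046] to [0049], the sketch below shows one possible shape for a dictionary that holds both ordinary words and unknown-word-forming elements (here, suffix pronunciations tagged with the part of speech they indicate). The entry class and the specific suffixes are invented placeholders, not entries taken from the patent.

```python
# Hedged sketch of a dictionary mixing ordinary words with
# unknown-word-forming elements (suffixes tagged by attribute).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DictionaryEntry:
    phonemes: List[str]              # phonological information
    word: Optional[str] = None       # set for ordinary (known) words
    attribute: Optional[str] = None  # e.g. the part of speech of the unknown word

entries = [
    # Ordinary word, as in the FIG. 2 example.
    DictionaryEntry(phonemes=["A", "I", "S"], word="ice"),
    # Unknown-word-forming elements: suffix pronunciations that classify the
    # unknown word by part of speech (placeholder suffixes, not from the patent).
    DictionaryEntry(phonemes=["SH", "O", "N"], attribute="noun"),   # e.g. "-tion"
    DictionaryEntry(phonemes=["I", "N", "G"], attribute="verb"),    # e.g. "-ing"
]
```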

[0052] According to another aspect of the present invention, there is provided a speech recognition method comprising the steps of: extracting features of the speech from the speech; calculating a score using the features on the basis of a dictionary in which are entered unknown-word-forming elements, which are elements forming an unknown word, for classifying an unknown word by an attribute thereof, and words which are objects of speech recognition; and selecting a series of the words, which represents a speech recognition result, on the basis of the score.

[0053] According to yet another aspect of the present invention, there is provided a recording medium having recorded therein a program, the program comprising the steps of: extracting features of the speech from the speech; calculating a score using the features on the basis of a dictionary in which are entered unknown-word-forming elements, which are elements forming an unknown word, for classifying an unknown word by an attribute thereof, and words which are objects of speech recognition; and selecting a series of the words, which represents a speech recognition result, on the basis of the score.

[0054] In the speech recognition apparatus, the speech recognition method, and the recording medium therefor of the present invention, a score is calculated using features on the basis of a dictionary in which are entered unknown-word-forming elements, which are elements forming an unknown word, for classifying an unknown word by an attribute thereof, and words which are objects of speech recognition, and a series of words, which represents a speech recognition result, is selected on the basis of the score.

[0055] The above and further objects, aspects and novel features of the invention will become more fully apparent from the following detailed description when read in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0056] FIG. 1 is a block diagram showing the construction of an example of a conventional speech recognition apparatus.

[0057] FIG. 2 is a diagram showing an example of the structure of a word dictionary stored in the dictionary database 6 of FIG. 1.

[0058] FIG. 3 is a block diagram showing an example of the construction of an embodiment of a speech recognition apparatus to which the present invention is applied.

[0059] FIG. 4 is a diagram showing an example of the structure of a word dictionary stored in the dictionary database 6 of FIG. 3.

[0060] FIG. 5 is a flowchart illustrating the processing of the speech recognition apparatus of FIG. 3.

[0061] FIG. 6 is a diagram showing another example of the structure of a word dictionary stored in the dictionary database 6 of FIG. 3.

[0062] FIG. 7 is a block diagram showing an example of the construction of an embodiment of a computer to which the present invention is applied.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0063] FIG. 3 shows an example of the construction of an embodiment of a speech recognition apparatus to which the present invention is applied. Components in FIG. 3 corresponding to those in FIG. 1 are given the same reference numerals and, accordingly, descriptions thereof are omitted in the following. That is, the speech recognition apparatus of FIG. 3 is constructed basically similarly to the speech recognition apparatus of FIG. 1.

[0064] However, in addition to a word dictionary in which words that are objects of speech recognition are entered (hereinafter referred to as a "standard dictionary" where appropriate), as stored in the dictionary database 6 of FIG. 1, the dictionary database 6 of the speech recognition apparatus of FIG. 3 also stores an unknown-word dictionary in which are entered unknown-word-forming elements, which are elements forming an unknown word, for classifying an unknown word by an attribute thereof. That is, in the embodiment of FIG. 3, the word dictionary stored in the dictionary database 6 is composed of the standard dictionary and the unknown-word dictionary.

[0065] In the word dictionary of the dictionary database 6 of FIG. 3, a word network is also formed, similarly to the word dictionary of the dictionary database 6 of FIG. 1.

[0066] More specifically, in the word dictionary of the dictionary database 6 of FIG. 3, for example as shown in FIG. 4, a word dictionary similar to the one in FIG. 2 described above is formed, and this is taken to be the standard dictionary. Furthermore, in the word dictionary of the dictionary database 6 of FIG. 3, a general-purpose branch, which is one or more branches corresponding to the phonemes of the pseudo-words (a sound element or a sound element sequence) which form an unknown word, is connected to the root node, and furthermore, an attribute branch, which is one or more branches corresponding to a phoneme (or phoneme sequence) for classifying the unknown word by an attribute thereof, is connected to the general-purpose branch, thereby forming a word network for coping with the unknown word, and this is taken to be the unknown-word dictionary.

[0067] More specifically, in the embodiment of FIG. 4, the unknown-word dictionary is formed in such a way that a general-purpose branch and an attribute branch are connected in sequence to the root node. Furthermore, a branch which acts as a loop (hereinafter referred to as a "loop branch" where appropriate) is connected to the general-purpose branch. Since the general-purpose branch is formed of one or more branches corresponding to the phonemes of the pseudo-words, that is, the various sound elements or sound element sequences, by passing through the general-purpose branch repeatedly, returning through the loop branch and passing through the general-purpose branch again, all words (including both the words entered in the standard dictionary and the unknown words) can be recognized as a series of pseudo-words.
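
The following sketch gives one possible data structure for the unknown-word portion of such a word network: general-purpose branches from the root accept any phoneme, a loop branch lets them repeat, and attribute branches (suffixes) terminate the unknown-word path with a part-of-speech label. The network is deliberately nondeterministic (a phoneme may either loop or start a suffix), and the phoneme inventory and suffix entries are illustrative assumptions, not values from the patent.

```python
# Hedged sketch of the unknown-word dictionary of FIG. 4 as a nondeterministic
# phoneme network: general-purpose branches, a loop branch, and attribute branches.
from collections import defaultdict
from typing import Dict, List, Optional

class Node:
    def __init__(self, label: Optional[str] = None):
        # phoneme -> possible next nodes; a list because the network is
        # nondeterministic (a phoneme may both loop and begin a suffix).
        self.children: Dict[str, List["Node"]] = defaultdict(list)
        self.label = label   # a word, or "<UNK:attribute>" for an unknown word

def build_unknown_word_dictionary(phoneme_inventory: List[str],
                                  suffixes: Dict[str, List[str]]) -> Node:
    root = Node()
    general = Node()
    for ph in phoneme_inventory:
        root.children[ph].append(general)     # general-purpose branches from the root
        general.children[ph].append(general)  # loop branch back to the general-purpose node
    # Attribute branches: a suffix (phoneme sequence) leaving the general-purpose
    # node classifies the unknown word by its attribute (e.g. part of speech).
    for attribute, suffix_phonemes in suffixes.items():
        node = general
        for ph in suffix_phonemes:
            nxt = Node()
            node.children[ph].append(nxt)
            node = nxt
        node.label = f"<UNK:{attribute}>"
    return root

# Example use; the phoneme inventory and suffix entries are invented placeholders.
unk_root = build_unknown_word_dictionary(
    phoneme_inventory=["A", "I", "S", "P", "B", "T", "N", "G", "SH", "O"],
    suffixes={"noun": ["SH", "O", "N"], "verb": ["I", "N", "G"]},
)
```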

[0068] However, whether a spoken word will be recognized as a word entered in the standard dictionary or as an unknown word, that is, as a series of pseudo-words, is determined based on the score calculated
