Stubley et al.

US006092045A

(11) Patent Number: 6,092,045
(45) Date of Patent: Jul. 18, 2000
`
(54) METHOD AND APPARATUS FOR SPEECH RECOGNITION

(75) Inventors: Peter R. Stubley, Lachine; Andre Gillet, St-Laurent; Vishwa N. Gupta, Brossard, all of Canada; Christopher K. Toulson, Palo Alto; David B. Peters, San Carlos, both of Calif.

(73) Assignee: Nortel Networks Corporation, Montreal, Canada

(21) Appl. No.: 09/119,621

(22) Filed: Jul. 21, 1998

(30) Foreign Application Priority Data

Sep. 19, 1997 [CA] Canada ... 2216224

(51) Int. Cl.7 ... G10L 15/10; G10L 15/14
(52) U.S. Cl. ... 704/254; 704/256
(58) Field of Search ... 704/256, 251, 254, 255, 200, 231, 240, 242, 238, 236, 243, 244, 252

(56) References Cited

U.S. PATENT DOCUMENTS

4,164,025  8/1979   Dubnowski et al. ... 364/900
4,751,737  6/1988   Gerson et al. ... 381/43
4,797,910  1/1989   Daudelin ... 379/67
4,805,219  2/1989   Baker ... 381/43
4,959,855  9/1990   Daudelin ... 379/213
4,979,206  12/1990  Padden et al. ... 379/67
5,050,215  9/1991   Nishimura ... 381/41
5,052,038  9/1991   Shepard ... 379/88
5,086,479  2/1992   Takenaga et al. ... 382/14
5,091,947  2/1992   Ariyoshi et al. ... 381/42
5,097,509  3/1992   Lennig ... 381/43
5,127,055  6/1992   Larkey ... 381/43
5,271,088  12/1993  Bahler ... 704/200
5,274,695  12/1993  Green ... 379/88
5,307,444  4/1994   Tsuboka ... 395/2.2
5,390,278  2/1995   Gupta et al. ... 395/2.52
5,459,798  10/1995  Bailey et al. ... 382/218
5,488,652  1/1996   Bielby et al. ... 379/88
5,515,475  5/1996   Gupta et al. ... 395/2.51
5,621,859  4/1997   Schwartz et al. ... 395/2.65
5,930,753  7/1999   Potamianos et al. ... 704/243
5,933,806  8/1999   Beyerlein et al. ... 704/256

FOREIGN PATENT DOCUMENTS

0 703 568  3/1996   European Pat. Off. ... G10L 5/06
0 810 583  12/1997  European Pat. Off. ... G10L 5/06

OTHER PUBLICATIONS

Kuhn, R., et al., "A Cache-Based Natural Language Model for Speech Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 6, Jun. 1990, pp. 570-583 (XP000136978).
Bakis, R. and Cole, A.G., "Dynamic Modification of the Vocabulary of a Speech Recognition Machine," IBM Technical Disclosure Bulletin, vol. 27, no. 7A, (1984).
Gao, et al., "Dynamic Adaptation of Hidden Markov Model for Robust Speech Recognition," 1989 IEEE International Symposium on Circuits and Systems, vol. 2 of 3, pp. 1336-1339, (1989).
(List continued on next page.)

Primary Examiner: Richemond Dorvil
Attorney, Agent, or Firm: Foley & Lardner

(57) ABSTRACT

Comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, is carried out in an order which makes better use of memory. First, the observations in one of the blocks are compared (31) to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models. This step is repeated (33) for models other than those in the subset; and the whole process is repeated (34) for each block.
`
`21 Claims, 12 Drawing Sheets
`
WORD RECOGNITION 14

FIND FIRST OBSERVATION IN FIRST BLOCK, AND FIND FIRST MODEL 30
COMPARE OBSERVATION TO MODEL 31
NEXT OBSERVATION, UNTIL END OF BLOCK 32
NEXT MODEL 33
NEXT BLOCK OF OBSERVATIONS 34
`
`Amazon / Zentian Limited
`Exhibit 1020
`Page 1
`
`
`
`6,092,045
`Page 2
`
OTHER PUBLICATIONS

Gorin, A.L., et al., "Adaptive Acquisition of Language," Computer Speech and Language, vol. 5, no. 2, pp. 101-132, (1991).
Lennig, M., et al., "Automated Bilingual Directory Assistance Trial in Bell Canada," IEEE Workshop on Interactive Voice Technology for Telecom Applications, (1992).
Lennig, M. and Sharp, D., "Unleashing The Potential Of Human-To-Machine Communication," Telesis, no. 97, pp. 23-37, (1993).
Lennig, M., "Putting Speech Recognition to Work in the Telephone Network," IEEE Computer Society, vol. 23, no. 8 (1990).
Lennig, M., et al., "Flexible Vocabulary Recognition of Speech Over the Telephone," IEEE Workshop on Interactive Voice Technology for Telecom Applications, (1992).
Nagata, S., et al., "Mobile Robot Control by a Structured Hierarchical Neural Network," pp. 69-76, (1989).
Rabiner, L.R. and Juang, B.H., "An Introduction to Hidden Markov Models," IEEE ASSP Magazine, vol. 3, no. 1 (1986).
Young, S., "Large Vocabulary Continuous Speech Recognition: a Review," IEEE Automatic Speech Recognition Workshop, (1995).
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 1 of 12    6,092,045

SPEECH INPUT
CHANNEL ADAPTATION 11
FEATURE EXTRACTION 12
WORD ENDPOINT DETECTION 13
WORD RECOGNITION BY FINDING CLOSEST MATCHES TO STORED VOCABULARY 14
ACCEPT/REJECT DECISION 15
OUTPUT THE BEST MATCH, OR REJECTION

FIG. 1 PRIOR ART
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 2 of 12    6,092,045

TO LAN/WAN        TO PSTN

MASTER SERVICE PROC. 101
REAL TIME SERVICE PROC. 102
DIGITAL TRUNK LINK 103
TIME SLOT INTERCHANGE 104
VMEbus        VOICE BUS
SPEECH RECOGNITION PROCESSOR 105
MULTI-CHANNEL TELECOMS LINK 106
X.25 INTERFACE 107

TO DATA NETWORK

FIG. 2
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 3 of 12    6,092,045

SPEECH RECOGNITION PROCESSOR CARD 105
VME BUS INTERFACE 121
LOCAL CACHE MEMORY 122
LOCAL CACHE CONTROL 124
GLOBAL MEMORY 125
LOCAL CACHE MEMORY 122
LOCAL CACHE CONTROL 124

FIG. 3

DESKTOP COMPUTER 132
PENTIUM PROCESSOR 133
ON-CHIP CACHE 134
GLOBAL MEMORY ON RAM 135

FIG. 18
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 4 of 12    6,092,045

WORD RECOGNITION (BY VITERBI ALGORITHM) 14

INPUT WORD AS A SERIES OF OBSERVATIONS IN FORM OF CEPSTRAL VECTORS 20

COMPARISON STEP: FOR CURRENT OBSERVATION IN SERIES, COMPUTE PROBABILITY OF IT BEING THE SAME AS EACH PATH IN THE STORED HMM TRELLIS 21

REPEAT FOR NEXT HMM UNTIL END 22

REPEAT FOR NEXT OBSERVATION UNTIL END 23

OUTPUT SCORES FOR BEST MATCHING WORDS

FIG. 4 PRIOR ART
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 5 of 12    6,092,045

WORD RECOGNITION 14

FIND FIRST OBSERVATION IN FIRST BLOCK, AND FIND FIRST MODEL 30
COMPARE OBSERVATION TO MODEL 31
NEXT OBSERVATION, UNTIL END OF BLOCK 32
NEXT MODEL 33
NEXT BLOCK OF OBSERVATIONS 34

FIG. 5
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 6 of 12    6,092,045

SERIES OF OBSERVATIONS MAKING UP WORD BEING RECOGNISED
A B A C U S

WORDS IN DICTIONARY OF MODELS:
"ABRAHAM"
"BEACH"
"CARROT"
"DEPTH"
LAST TO BE CALCULATED

FIG. 6

SERIES OF OBSERVATIONS MAKING UP WORD BEING RECOGNISED
{BLOCK 1} {BLOCK 2}
A B A C U S

WORDS IN DICTIONARY OF MODELS (score per observation):
"ABRAHAM"  5%  10%  11%  10%  7%  5%
"BEACH"    7%  4%   5%   4%   4%  3%
"CARROT"   4%  4%   5%   5%   5%  4%
"DEPTH"    1%  1%   2%   2%   3%  2%

FIG. 7
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 7 of 12    6,092,045

STATE
FRAME T-2, FRAME T-1, FRAME T
STATE 2
STATE 3
STATE 4
TO END STATE OR NEXT HMM

FIG. 9
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 8 of 12    6,092,045

u(0)  u(1)  u(2)  u(3)  u(4)  u(5)  u(6)  u(7)
v(1)  v(2)  v(3)  v(4)  v(5)  v(6)  v(7)

FIG. 10
`
`
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 9 of 12    6,092,045

S1    S2

FIG. 13
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 10 of 12    6,092,045

1.  Copy in to Buffer[0].
2.  Mark all buffers free. Mark Buffer[0] in use.
3.
4.  for each branch b {
5.      Define inp as the index of the input buffer to b.
6.      if (b is a terminal branch) {
7.          do terminal processing(leaf(b), Buffer[inp]).
8.      }
9.      else {
10.         Define outp as the index of output buffer for b.
11.         if (outp is in use) {
12.             Mark b as an update type.
13.         }
14.         else {
15.             Mark outp in use.
16.         }
17.         process branch by type(b, inp, outp)
18.     }
19.
20.     Mark inp free.
…
24. }

FIG. 14
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 11 of 12    6,092,045

DETERMINE TYPE OF EACH BRANCH IN LEXICAL GRAPH 205
COMPARE OBSERVATION 206
INPUT SCORE FROM PREVIOUS BRANCH 207
PROCESS BRANCH ACCORDING TO BRANCH TYPE 208
NEXT BRANCH 209
NEXT OBSERVATION UNTIL END OF BLOCK 210
NEXT MODEL 211

FIG. 15
`
`
`
`
U.S. Patent    Jul. 18, 2000    Sheet 12 of 12    6,092,045

STATES 1, 3, 5
u(t)
t = 0  t = 1  t = 2  t = 3  t = 4  t = 5

FIG. 16

Restore state scores from global memory.
s1 = max(s1, u(0)).
v(0) = u(0).
for t = 0, 1, ..., B-1 {
    v(t+1) = max(u(t+1), s2 + b2(t)).
    s1 = max(s1 + b1(t), s2 + b2(t)).
    s2 = max(s1 + b1(t), s2 + b2(t)).
    s1 = max(u(t+1), s1 + b1(t)).
}
Save the state scores in global memory.

FIG. 17
`
`
`
`
`1
`METHOD AND APPARATUS FOR SPEECH
`RECOGNITION
`
`FIELD OF THE INVENTION
`The invention relates to methods of comparing a Series of
`observations representing unknown Speech, to Stored models
`representing known speech, to methods of recognising pat
`terns in a Series of observations, by comparing the obser
`Vations to Stored models, to apparatus for Such methods, and
`to Software for Such methods.
`
`5
`
BACKGROUND TO THE INVENTION

Pattern recognition generally, and recognition of patterns in continuous signals such as speech signals, has been a rapidly developing field. A limitation in many applications has been the cost of providing sufficient processing power for the complex calculations often required. This is particularly the case in speech recognition, all the more so when real time response is required, for example to enable automated directory enquiry assistance, or for control operations based on speech input. To simulate the speed of response of a human operator, and thus avoid a perception of "unnatural" delays, which can be disconcerting, the spoken input needs to be recognised within about half a second of the end of the spoken input.

The computational load varies directly with the number of words or other elements of speech which are modelled and held in a dictionary, for comparison to the spoken input. This is also known as the size of vocabulary of the system. The computational load also varies according to the complexity of the models in the dictionary, and how the speech input is processed into a representation ready for the comparison to the models. Finally, the actual algorithm for carrying out the comparison is clearly a key factor. Numerous attempts have been made over many years to improve the trade-off between computational load, accuracy of recognition, and speed of recognition. For usable systems, having a tolerable recognition accuracy, the computational demands are high. Despite continuous refinements to models, speech input representations, and recognition algorithms, and advances in processing hardware, there remains great demand to improve the above mentioned trade-off.

There are five main steps: audio channel adaptation, feature extraction, word end-point detection, speech recognition, and accept/reject decision logic. The speech recognition step, the fourth stage, is the most computationally intensive step, and thus a limiting factor as far as the above mentioned trade-off is concerned. Depending on the size of vocabularies used, and the size of each model, both the memory requirements and the number of calculations required for each recognition decision may limit the speed/accuracy/cost trade-off. Examples of such systems are described in U.S. Pat. No. 5,390,278 (Gupta et al.), and in U.S. Pat. No. 5,515,475 (Gupta).

SUMMARY OF THE INVENTION

It is an object of the invention to provide an improved method and apparatus.

According to the invention, there is provided a method of comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, the method comprising the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.
Amongst the advantages of such a method is that the speed of processing of step a), the critical innermost loop, can be improved. The innermost loop is the part which is repeated most often, and by carrying out the operations in the order specified, cache misses can be reduced, and pipelining efficiency improved. More specifically, slow main memory accesses may be required only once per block per model, instead of once per observation per model.
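The loop ordering of steps a) to c) can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the `compare` stand-in and all names are hypothetical, and a real system would evaluate an HMM rather than a scalar difference. The point is the nesting order, which keeps each model resident in cache for a whole block of observations.

```python
# Hypothetical sketch of the block/model-major loop order of steps a)-c).

def compare(observation, model):
    # Stand-in similarity score; a real recogniser would score an HMM here.
    return -abs(observation - model)

def block_scores(observations, models, block_size):
    """Score every (model, observation) pair, one block of observations
    at a time, so each model is fetched once per block rather than once
    per observation."""
    scores = {}
    blocks = [observations[i:i + block_size]
              for i in range(0, len(observations), block_size)]
    for b, block in enumerate(blocks):           # step c): for each block
        for m, model in enumerate(models):       # step b): for each model
            for t, obs in enumerate(block):      # step a): observations in block
                scores[(m, b * block_size + t)] = compare(obs, model)
    return scores
```

With this nesting, the model data referenced in the innermost loop does not change until the whole block has been processed, which is the cache behaviour the text describes.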
Preferably, the observations are represented as multidimensional vectors, for the comparison at step a). In such cases, processing requirements may be particularly onerous, and the advantages of the method become more valuable.

Advantageously, the comparison at step a) uses a Viterbi algorithm. The Viterbi algorithm is a dynamic programming algorithm which is preferred for comparing observations to particular types of models. If such models are time series of states joined by transitions, the Viterbi algorithm determines the probability of each observation matching one or more of those states. If the models comprise strings of states, cumulative probability scores for each string can be determined, to find the best path. In such cases, processing requirements may be particularly onerous, and the advantages of the method become more valuable.
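A minimal form of the Viterbi recursion described above can be sketched as follows. This is a generic textbook version in log-probabilities, not the patent's block variant; the starting-state convention and both argument names are assumptions for illustration.

```python
import math

def viterbi(obs_loglikes, trans_loglikes):
    """Best cumulative log-score through a string of states.

    obs_loglikes[t][s]   : log-likelihood of observation t in state s.
    trans_loglikes[p][s] : log-probability of moving from state p to s.
    Assumes the path must begin in state 0.
    """
    n_states = len(obs_loglikes[0])
    score = [-math.inf] * n_states
    score[0] = obs_loglikes[0][0]
    for t in range(1, len(obs_loglikes)):
        new = [-math.inf] * n_states
        for s in range(n_states):
            for p in range(n_states):
                # Keep only the best predecessor: the dynamic-programming step.
                cand = score[p] + trans_loglikes[p][s] + obs_loglikes[t][s]
                if cand > new[s]:
                    new[s] = cand
        score = new
    return max(score)
```

The cumulative score per string of states is exactly the quantity the text says is maintained to find the best path.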
Advantageously, the models are represented as finite state machines with probability distribution functions attached. Continuously varying signals can be modelled as a series of stationary states, with a probability function at each state, indicating whether the signal moves to the next state or stays at the given state, after a given time interval. An example is the Hidden Markov Model (HMM). This enables the models to contain a representation of patterns with varying rates of change, yet a common underlying pattern. However, again, processing requirements may be particularly onerous, and the advantages of the method become more valuable.
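The stay-or-advance behaviour of such a finite state machine can be sketched as a toy generator. This is only an illustration of the idea in the paragraph above (a left-to-right chain with a per-state self-loop probability), not the patent's HMM; the function name and the uniform-random draw are assumptions.

```python
import random

def sample_path(stay_probs, rng):
    """Generate a state sequence from a left-to-right chain: at each time
    interval, stay in the current state with its stay probability, or
    advance to the next state; stop after leaving the last state."""
    path, state = [], 0
    while state < len(stay_probs):
        path.append(state)
        if rng.random() >= stay_probs[state]:
            state += 1
    return path
```

A high stay probability yields a long run of the same state (a slowly changing sound), a low one a quick transition, which is how the same underlying pattern can represent varying rates of change.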
Advantageously, the models comprise groups of representations of phonemes. This enables the models to be stored and used more efficiently. Recognising patterns in speech signals is particularly processor intensive, because of the wide range of differences in sounds carrying the same meaning. Also, in many speech recognition applications, it is necessary to carry out recognition in real time, that is, in half a second or so after the speaker has finished. Accordingly, again, processing requirements may be particularly onerous, and the advantages of the method become more valuable.

Advantageously, the models comprise representations of elements of speech, and step a) comprises the step of: comparing the block of observations to a predetermined sequence of the models in the subset. The amount of processing can be reduced if the models are stored as sequences of elements of speech. Many utterances share elements with other utterances. To avoid comparing the same elements multiple times, the individual models of the elements can be merged into sequences. The common parts can be represented once, with branches where the elements are no longer common.
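The merging of shared leading elements can be sketched as a prefix tree (trie). This is a simplified stand-in for the lexical graph discussed later, under the assumption that each word is given as a list of phoneme labels; the sentinel key and names are illustrative.

```python
# Hypothetical sketch: merge per-word phoneme strings into a shared-prefix
# graph, so common leading elements are represented (and compared) once.

def build_lexical_graph(words):
    """words: mapping word -> list of phoneme labels.
    Returns a nested-dict trie; the key None marks a complete word."""
    root = {}
    for word, phonemes in words.items():
        node = root
        for p in phonemes:
            node = node.setdefault(p, {})  # reuse the node if it exists
        node[None] = word
    return root
```

For example, "cat" (k-ae-t) and "cab" (k-ae-b) share the nodes for "k" and "ae", and branch only at the final phoneme, mirroring the branching described above.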
Advantageously, step a) comprises the steps of:
comparing the block of observations to a predetermined sequence of the models in the subset;
determining, for each of the models in the sequence, a score which represents the likelihood of a match with the observations compared so far;
`
`
`
`
`3
`Storing the Score in a Score buffer for use in determining
`Scores of Subsequent models in the Sequence; and
`determining when the Score is no longer needed, then
`re-using the Score buffer to Store a Subsequent Score.
`This can bring an advantage in reducing memory
`requirements, which can relieve a significant constraint. This
`may occur in particular where the models comprise large
`numbers of paths, each of which has to have a cumulative
`probability Score maintained, or where there are a large
`number of models.
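The buffer re-use idea can be sketched as a small pool that recycles a slot once its score is no longer needed. This is a hypothetical illustration of the principle, not the patent's buffer scheme (FIG. 14 manages buffers per branch); the class and method names are invented for the sketch.

```python
# Hypothetical sketch of score-buffer re-use: a slot is recycled as soon
# as the score it holds is no longer needed by any later model.

class ScoreBufferPool:
    def __init__(self):
        self.buffers = []   # buffer index -> stored score
        self.free = []      # indices available for re-use

    def store(self, score):
        """Place a score in a free buffer, allocating only if none is free."""
        if self.free:
            idx = self.free.pop()
            self.buffers[idx] = score
        else:
            idx = len(self.buffers)
            self.buffers.append(score)
        return idx

    def release(self, idx):
        """Mark a buffer free once its score is no longer needed."""
        self.free.append(idx)
```

Because released slots are re-used before new ones are allocated, the pool's size is bounded by the number of scores that are simultaneously live, not by the total number of models.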
Advantageously, step a) comprises the step of:
comparing the block of observations to a lexical graph comprising a predetermined sequence of the models in the subset, wherein the sequence comprises different types of models, and the comparison is dependent on the type; and the method comprises the step of:
determining the types of the models before the block is compared.
Where different algorithms are used, tuned to suit different types of model, or different branches of a sequence of models, processing can be speeded up if it is unnecessary to test for branch type during processing. In particular, it becomes easier for a compiler to pipeline the algorithm. If a branch is encountered, which route to take must be resolved before continuing, so the pipelining by preprocessing of instructions ahead of the current instruction may be held up.
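The pre-determination step can be sketched as resolving each branch's handler once, before the block is processed, so the hot loop dispatches through a table instead of testing types. The two handlers here are placeholders, not the patent's branch processing; all names are hypothetical.

```python
# Hypothetical sketch: classify branches once, outside the hot loop, so
# the per-observation loop contains no type tests.

def process_normal(score):
    return score + 1.0          # placeholder branch processing

def process_update(score):
    return max(score, 0.0)      # placeholder branch processing

HANDLERS = {"normal": process_normal, "update": process_update}

def precompute_types(branches):
    """Resolve each branch's handler once, before the block is compared."""
    return [HANDLERS[b["type"]] for b in branches]

def run_block(handlers, score):
    for h in handlers:          # type-test-free dispatch in the inner loop
        score = h(score)
    return score
```

Moving the `if branch.type == ...` decisions out of the inner loop is one concrete way the loop body becomes easier to pipeline.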
Preferably, the models comprise finite state machines, having multiple state sequences, wherein step a) comprises the steps of:
determining state scores for the matches between each respective observation and state sequences of the respective model;
making an approximation of the state scores, for the observation, for storing to use in matching subsequent observations, the approximation comprising fewer state scores than were determined for the respective observation.
This enables the amount of storage required for state scores to be reduced. This is particularly advantageous where there are a large number of models, each with multiple states and therefore a large number of state scores to be stored for use in the next block. In such cases, a two-pass approach may be used, in which a rough, or fast match, algorithm is used on the first pass, to identify a small number of similar models, which are then compared in more detail in the second pass. The approximation is particularly advantageous for the first pass, to reduce the memory used, and to speed up the process since fewer memory accesses are required, and fewer calculations. This is particularly advantageous if the calculations are time-consuming floating point calculations.
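The approximation can be sketched in its crudest form: collapse the per-state scores carried between blocks down to a single summary value. Keeping only the maximum is an assumed, illustrative choice, not necessarily the approximation used by the described fast-match models.

```python
# Hypothetical sketch: store fewer state scores between blocks by
# collapsing them to a single summary (here, the best score).

def approximate_state_scores(state_scores):
    """Reduce per-state scores to fewer values for inter-block storage."""
    return [max(state_scores)]

def restore_state_scores(summary, n_states):
    """Expand the summary back to one value per state for the next block."""
    return [summary[0]] * n_states
```

Storing one value instead of one per state cuts both the memory held between blocks and the number of memory accesses, at the cost of an optimistic (upper-bound-like) score, which is acceptable for a rough first pass.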
According to another aspect of the invention, there is provided a method of recognising patterns in a series of observations, by comparing the observations to stored models, using a processing means having a main memory for storing the models and a cache memory, the cache memory being too small to contain all the models and observations, the series of observations being divided into blocks of at least two observations, the method comprising the steps of:
a) using the processor to compare a subset of the models to the observations in one of the blocks of observations, to recognise the patterns, the subset of the models being small enough to fit in the cache memory;
b) repeating step a) for a different subset of the models; and,
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6,092,045
`
`4
`c) repeating Steps a) and b) for a different one of the
`blocks.
According to another aspect of the invention, there is provided a method of recognising patterns in a series of observations by comparing the observations to stored models, the series of observations being divided into at least two blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the method comprising the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.

According to another aspect of the invention, there is provided a method of comparing a series of observations representing unknown speech, to stored models representing known speech, by comparing the observations to stored models, the series of observations being grouped into one or more blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the method comprising, for each of the one or more blocks, the steps of:
a) comparing two or more of the observations in the respective block, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match; and
b) repeating step a) for models other than those in the subset.
In some applications, the series of observations may be short enough that only a single block is necessary. The advantages set out above still apply.
According to another aspect of the invention, there is provided software stored on a computer readable medium for comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, the software being arranged for carrying out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.

According to another aspect of the invention, there is provided software stored on a computer readable medium for recognising patterns in a series of observations by comparing the observations to stored models, the series of observations being divided into at least two blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the software being arranged to carry out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.
According to another aspect of the invention, there is provided software stored on a computer readable medium for comparing a series of observations representing unknown speech, to stored models representing known speech, by comparing the observations to stored models, the series of observations being grouped into one or more blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the software being arranged to carry out, for each of the one or more blocks, the steps of:
a) comparing two or more of the observations in the respective block, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match; and
b) repeating step a) for models other than those in the subset.

According to another aspect of the invention, there is provided a speech recognition processor for comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, the processor being arranged to carry out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.

According to another aspect of the invention, there is provided a speech recognition processor for recognising patterns in a series of observations by comparing the observations to stored models, the series of observations being divided into at least two blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the processor being arranged to carry out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.
According to another aspect of the invention, there is provided a speech recognition processor for comparing a series of observations representing unknown speech, to stored models representing known speech, by comparing the observations to stored models, the series of observations being grouped into one or more blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the processor being arranged to carry out, for each of the one or more blocks, the steps of:
a) comparing two or more of the observations in the respective block, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match; and
b) repeating step a) for models other than those in the subset.

Preferred features may be combined as would be apparent to a skilled person, and may be combined with any aspect of the invention.

To show, by way of example, how to put the invention into practice, embodiments will now be described in more detail, with reference to the accompanying drawings.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows principal steps in a known speech recognition process;
FIG. 2 shows known service controller apparatus for carrying out the method of FIG. 1;
FIG. 3 shows the speech recognition processor of FIG. 2 in more detail;
FIG. 4 shows the recognition step of FIG. 1 in more detail;
FIG. 5 shows the recognition step according to an embodiment of the invention;
FIG. 6 shows a table of scores for a series of observations matched against models, indicating the known order of calculating the scores;
FIG. 7 shows the table of scores as shown in FIG. 6, indicating an order of calculating the scores according to an embodiment of the invention;
FIG. 8 shows a basic HMM topology;
FIG. 9 shows a trellis of an HMM;
FIG. 10 shows a trellis for a block Viterbi search;
FIG. 11 shows how full models are collapsed to become fast-match models;
FIG. 12 shows a block Viterbi lexical graph, showing branch types and score buffer assignment;
FIG. 13 shows a fast match model;
FIG. 14 shows pseudo code for the block Viterbi algorithm;
FIG. 15 shows the recognition step of FIG. 1 adapted for branch type optimization;
FIG. 16 shows an optional HMM and trellis;
FIG. 17 shows pseudo code for a block Viterbi algorithm for optional branches; and
FIG. 18 shows an alternative recognition processor arrangement for carrying out some of the steps of FIG. 1.
DETAILED DESCRIPTION

FIG. 1: Voice Recognition

FIG. 1 shows an overall view of an example of a known pattern recognition process, for voice recognition. There are five main steps: channel adaptation (1), feature extraction (2), word end-point detection (3), speech recognition (4), and accept/reject decision logic (5).

In the first step, channel adaptation, the system monitors the characteristics of the telephone line that carries the speech, and creates an initial mathematical model of channel parameters, such as line noise level. This model, which is
`
`
`
`7
`updated continually throughout the recognition process, is
`used to adapt the Vector of coefficients generated during
`feature extraction So that the coefficients respond to fore
`ground Speech and block out line channel variations.
`During the Second step-feature extraction-the System
`divides the unknown word or phrase into short Segments,
`called frames. Each frame lasts 12.75 milliseconds. Thus, a
`word Such as Mississippi, which takes about a Second to Say,
`would be broken down into about 100 frames.
`Next, the System conducts a spectral analysis of each
`frame and creates a mathematical vector representation of
`the Spectrum. Each feature vector is a String of 15 coeffi
`cients. The first Seven coefficients measure the energy of the
`frame at different frequency bands to distinguish, for
`example, between a high-frequency /S/Sound, and a lower
`frequency vowel. The remaining coefficients measure the
`rate of change of these spectral energies.
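The framing figures above can be checked with simple arithmetic. Note the text does not say whether frames overlap: at 12.75 ms with no overlap, one second gives about 78 frames, so the quoted "about 100 frames" suggests either overlapping frames or an utterance a little over a second long. The sketch below assumes non-overlapping frames purely for illustration.

```python
# Back-of-envelope check of the framing figures in the text, assuming
# (hypothetically) non-overlapping 12.75 ms frames.

FRAME_MS = 12.75        # frame duration stated in the text
COEFFS_PER_FRAME = 15   # feature-vector length stated in the text

def frame_count(duration_ms):
    """Whole frames fitting in an utterance of the given length."""
    return int(duration_ms / FRAME_MS)

def feature_values(duration_ms):
    """Total coefficients produced for an utterance of this length."""
    return frame_count(duration_ms) * COEFFS_PER_FRAME
```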
The third stage, word endpoint detection, is performed by an algorithm that uses the energy and short-time spectrum of the speech signal. This algorithm removes silence before, after, and in the middle of the speech signal, and filters out unwanted background noise in order to expedite the speech recognition stage.

Speech recognition is the fourth stage of the process. At this stage, the vector representations generated for each frame are compared with the FVR system's stored models for the recognition vocabulary. Each word in the vocabulary is represented by a string of hidden Markov models (HMMs), one for each phoneme in the word, as shown in U.S. Pat. No. 5,390,278 (Gupta et al.). The stored string of phoneme models that produces the best match is determined using the multiple-pass approach, at which point the spoken word is considered recognized, as shown in U.S. Pat. No. 5,515,475 (Gupta). The matching process is aided by statistical models and efficient search algorithms embedded in the hidden Markov models.

The final stage of the process is rejection decision scoring, which determines whether the best match f