United States Patent  (19)                                        US006092045A
Stubley et al.                           (11) Patent Number:          6,092,045
                                         (45) Date of Patent:     Jul. 18, 2000

(54) METHOD AND APPARATUS FOR SPEECH RECOGNITION

(75) Inventors: Peter R. Stubley, Lachine; Andre Gillet, St-Laurent;
     Vishwa N. Gupta, Brossard, all of Canada; Christopher K. Toulson,
     Palo Alto; David B. Peters, San Carlos, both of Calif.

(73) Assignee: Nortel Networks Corporation, Montreal, Canada

(21) Appl. No.: 09/119,621

(22) Filed: Jul. 21, 1998

(30) Foreign Application Priority Data
     Sep. 19, 1997  (CA)  Canada ................................... 2216224

(51) Int. Cl. ................................ G10L 15/10; G10L 15/14
(52) U.S. Cl. ....................................... 704/254; 704/256
(58) Field of Search ................. 704/256, 251, 254, 255, 200, 231, 240,
                                      242, 238, 236, 243, 244, 252

(56) References Cited

     U.S. PATENT DOCUMENTS

     4,164,025   8/1979  Dubnowski et al. .................... 364/900
     4,751,737   6/1988  Gerson et al. ........................ 381/43
     4,797,910   1/1989  Daudelin ............................. 379/67
     4,805,219   2/1989  Baker ................................ 381/43
     4,959,855   9/1990  Daudelin ............................ 379/213
     4,979,206  12/1990  Padden et al. ........................ 379/67
     5,050,215   9/1991  Nishimura ............................ 381/41
     5,052,038   9/1991  Shepard .............................. 379/88
     5,086,479   2/1992  Takenaga et al. ...................... 382/14
     5,091,947   2/1992  Ariyoshi et al. ...................... 381/42
     5,097,509   3/1992  Lennig ............................... 381/43
     5,127,055   6/1992  Larkey ............................... 381/43
     5,271,088  12/1993  Bahler .............................. 704/200
     5,274,695  12/1993  Green ................................ 379/88
     5,307,444   4/1994  Tsuboka .............................. 395/22
     5,390,278   2/1995  Gupta et al. ........................ 395/2.52
     5,459,798  10/1995  Bailey et al. ....................... 382/218
     5,488,652   1/1996  Bielby et al. ........................ 379/88
     5,515,475   5/1996  Gupta et al. ........................ 395/2.51
     5,621,859   4/1997  Schwartz et al. ..................... 395/2.65
     5,930,753   7/1999  Potamianos et al. ................... 704/243
     5,933,806   8/1999  Beyerlein et al. .................... 704/256

     FOREIGN PATENT DOCUMENTS

     0 703 568   3/1996  European Pat. Off. .............. G10L 5/06
     0 810 583  12/1997  European Pat. Off. .............. G10L 5/06

     OTHER PUBLICATIONS

Kuhn, R., et al., "A Cache-Based Natural Language Model for Speech
Recognition," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 12, no. 6, Jun. 1990, pp. 570-583 (XP000136978).
Bakis, R. and Cole, A.G., "Dynamic Modification of the Vocabulary of a
Speech Recognition Machine," IBM Technical Disclosure Bulletin, vol. 27,
no. 7A (1984).
Gao, et al., "Dynamic Adaptation of Hidden Markov Model for Robust Speech
Recognition," 1989 IEEE International Symposium on Circuits and Systems,
vol. 2 of 3, pp. 1336-1339 (1989).

(List continued on next page.)

Primary Examiner: Richemond Dorvil
Attorney, Agent, or Firm: Foley & Lardner
(57)                              ABSTRACT

Comparing a series of observations representing unknown speech, to stored
models representing known speech, the series of observations being divided
into at least two blocks each comprising two or more of the observations, is
carried out in an order which makes better use of memory. First, the
observations in one of the blocks are compared (31) to a subset comprising
one or more of the models, to determine a likelihood of a match to each of
the one or more models. This step is repeated (33) for models other than
those in the subset; and the whole process is repeated (34) for each block.

                        21 Claims, 12 Drawing Sheets

[Front-page figure: FIG. 5 flow chart - WORD RECOGNITION 14: FIND FIRST
OBSERVATION IN FIRST BLOCK, AND FIND FIRST MODEL 30; COMPARE OBSERVATION TO
MODEL 31; NEXT OBSERVATION, UNTIL END OF BLOCK 32; NEXT MODEL 33; NEXT BLOCK
OF OBSERVATIONS 34.]

OTHER PUBLICATIONS

Gorin, A.L., et al., "Adaptive Acquisition of Language," Computer Speech and
Language, vol. 5, no. 2, pp. 101-132 (1991).
Lennig, M., et al., "Automated Bilingual Directory Assistance Trial in Bell
Canada," IEEE Workshop on Interactive Voice Technology for Telecom
Applications (1992).
Lennig, M. and Sharp, D., "Unleashing The Potential Of Human-To-Machine
Communication," Telesis, no. 97, pp. 23-37 (1993).
Lennig, M., "Putting Speech Recognition to Work in the Telephone Network,"
IEEE Computer Society, vol. 23, no. 8 (1990).
Lennig, M., et al., "Flexible Vocabulary Recognition of Speech Over the
Telephone," IEEE Workshop on Interactive Voice Technology for Telecom
Applications (1992).
Nagata, S., et al., "Mobile Robot Control by a Structured Hierarchical Neural
Network," pp. 69-76 (1989).
Rabiner, L.R. and Juang, B.H., "An Introduction to Hidden Markov Models,"
IEEE ASSP Magazine, vol. 3, no. 1 (1986).
Young, S., "Large Vocabulary Continuous Speech Recognition: a Review," IEEE
Automatic Speech Recognition Workshop (1995).

[Sheet 1 of 12 - FIG. 1 (PRIOR ART): flow chart of a known speech recognition
process. SPEECH INPUT feeds CHANNEL ADAPTATION 11, FEATURE EXTRACTION 12,
WORD ENDPOINT DETECTION 13, WORD RECOGNITION BY FINDING CLOSEST MATCHES TO
STORED VOCABULARY 14, and ACCEPT/REJECT DECISION 15, which outputs the best
match, or rejection.]

[Sheet 2 of 12 - FIG. 2: block diagram of known service controller apparatus,
with connections to the PSTN, a LAN/WAN and a data network. Blocks: MASTER
SERVICE PROC. 101, REAL TIME SERVICE PROC. 102, DIGITAL TRUNK LINK 103, TIME
SLOT INTERCHANGE 104, SPEECH RECOGNITION PROCESSOR 105, X.25 INTERFACE 106
and MULTI-CHANNEL TELECOMS LINK 107, joined by a VMEbus and a voice bus.]

[Sheet 3 of 12 - FIG. 3: the speech recognition processor card 105,
comprising a VME bus interface 121, global memory 125, and two sets of local
cache memory 122 with local cache control 124. FIG. 18: an alternative
arrangement on a desktop computer 132, with a Pentium processor 133, on-chip
cache 134, and global memory on RAM 135.]

[Sheet 4 of 12 - FIG. 4 (PRIOR ART): word recognition by Viterbi algorithm
14. Input word as a series of observations in the form of cepstral vectors
20; comparison step: for the current observation in the series, compute the
probability of it being the same as each path in the stored HMM trellis 21;
repeat for next HMM until end 22; repeat for next observation until end 23;
output scores for best matching words.]

[Sheet 5 of 12 - FIG. 5: word recognition 14 according to an embodiment.
Find first observation in first block, and find first model 30; compare
observation to model 31; next observation, until end of block 32; next model
33; next block of observations 34.]

[Sheet 6 of 12 - FIG. 6: table of scores for a series of observations making
up the word being recognised ("A B A C U S") matched against words in the
dictionary of models ("ABRAHAM", "BEACH", "CARROT", "DEPTH"), indicating
which score is the last to be calculated in the known order. FIG. 7: the same
table with the series of observations divided into BLOCK 1 and BLOCK 2, with
example percentage scores for each word.]

[Sheet 7 of 12 - FIG. 9: trellis of an HMM, with states 1 to 4 plotted
against successive frames, and transitions leading to the end state or the
next HMM.]

[Sheet 8 of 12 - FIG. 10: trellis for a block Viterbi search, labelled with
state scores u(0) to u(7) and v(1) to v(7).]

[Sheet 9 of 12 - FIG. 13: a fast match model, with states S1 and S2.]

[Sheet 10 of 12 - FIG. 14: pseudo code for the block Viterbi algorithm:]

    Copy in to Buffer[0].
    Mark all buffers free. Mark Buffer[0] in use.
    for each branch b {
        Define inp as the index of the input buffer to b.
        if (b is a terminal branch) {
            do terminal processing(leaf(b), Buffer[inp]).
        } else {
            Define outp as the index of the output buffer for b.
            if (outp is in use) {
                Mark b as an update type.
            } else {
                Mark outp in use.
            }
            process branch by type(b, inp, outp).
        }
        Mark inp free.
    }

[Sheet 11 of 12 - FIG. 15: the recognition step adapted for branch type
optimization. Determine type of each branch in lexical graph 205; compare
observation 206; input score from previous branch 207; process branch
according to branch type 208; next branch 209; next observation until end of
block 210; next model 211.]

[Sheet 12 of 12 - FIG. 16: an optional HMM and its trellis, with states 1, 3
and 5 shown over frames t = 0 to t = 5. FIG. 17: pseudo code for the block
Viterbi algorithm for optional branches: restore the state scores from global
memory; initialise v(0) = u(0) and update the first state score from u(0);
for t = 0, 1, ..., B-1, compute v(t+1) and update the state scores as maxima
over u(t+1) and the previous state scores plus the corresponding transition
scores b(t); then save the state scores in global memory.]

METHOD AND APPARATUS FOR SPEECH RECOGNITION

FIELD OF THE INVENTION

The invention relates to methods of comparing a series of observations representing unknown speech, to stored models representing known speech, to methods of recognising patterns in a series of observations, by comparing the observations to stored models, to apparatus for such methods, and to software for such methods.

BACKGROUND TO THE INVENTION

Pattern recognition generally, and recognition of patterns in continuous signals such as speech signals, has been a rapidly developing field. A limitation in many applications has been the cost of providing sufficient processing power for the complex calculations often required. This is particularly the case in speech recognition, all the more so when real time response is required, for example to enable automated directory enquiry assistance, or for control operations based on speech input. To simulate the speed of response of a human operator, and thus avoid a perception of "unnatural" delays, which can be disconcerting, the spoken input needs to be recognised within about half a second of the end of the spoken input.

The computational load varies directly with the number of words or other elements of speech which are modelled and held in a dictionary for comparison to the spoken input. This is also known as the size of vocabulary of the system. The computational load also varies according to the complexity of the models in the dictionary, and how the speech input is processed into a representation ready for the comparison to the models. Finally, the actual algorithm for carrying out the comparison is clearly a key factor. Numerous attempts have been made over many years to improve the trade-off between computational load, accuracy of recognition, and speed of recognition. For useable systems, having a tolerable recognition accuracy, the computational demands are high. Despite continuous refinements to models, speech input representations, and recognition algorithms, and advances in processing hardware, there remains great demand to improve the above mentioned trade-off.

There are five main steps: audio channel adaptation, feature extraction, word end-point detection, speech recognition, and accept/reject decision logic. The speech recognition step, the fourth stage, is the most computationally intensive step, and thus a limiting factor as far as the above mentioned trade-off is concerned. Depending on the size of vocabularies used, and the size of each model, both the memory requirements and the number of calculations required for each recognition decision may limit the speed/accuracy/cost trade-off. Examples of such systems are described in U.S. Pat. No. 5,390,278 (Gupta et al.), and in U.S. Pat. No. 5,515,475 (Gupta).

SUMMARY OF THE INVENTION

It is an object of the invention to provide improved method and apparatus.

According to the invention, there is provided a method of comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, the method comprising the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.

Amongst the advantages of such a method is that the speed of processing of step a), the critical innermost loop, can be improved. The innermost loop is the part which is repeated most often, and by carrying out the operations in the order specified, cache misses can be reduced, and pipelining efficiency improved. More specifically, slow main memory accesses may be required only once per block per model, instead of once per observation per model.

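By way of illustration only (this is not code from the patent; the model dictionaries, observation lists and the score() function below are invented placeholders), the following Python sketch contrasts the conventional observation-major loop with the block-and-model-major ordering described above, in which each model is revisited once per block rather than once per observation.

    # Hypothetical sketch of the two loop orderings discussed above.
    # score() stands in for one per-observation/per-model comparison.

    def score(model, observation, state):
        # Placeholder comparison: returns an updated per-model score.
        return state + model["weight"] * sum(observation)

    def observation_major(models, observations):
        # Conventional order: every model is touched for every observation,
        # so each model's parameters are re-fetched once per observation.
        states = {name: 0.0 for name in models}
        for obs in observations:
            for name, model in models.items():
                states[name] = score(model, obs, states[name])
        return states

    def block_major(models, observations, block_size):
        # Order of step a): for each block, process one model at a time over
        # the whole block, so the model is fetched once per block.
        states = {name: 0.0 for name in models}
        for start in range(0, len(observations), block_size):
            block = observations[start:start + block_size]
            for name, model in models.items():   # outer loop over models
                state = states[name]
                for obs in block:                # inner loop over the block
                    state = score(model, obs, state)
                states[name] = state
        return states

    if __name__ == "__main__":
        models = {"abraham": {"weight": 0.3}, "beach": {"weight": 0.7}}
        observations = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.2], [0.4, 0.0]]
        assert observation_major(models, observations) == \
               block_major(models, observations, 2)

Both orderings compute the same scores; only the memory access pattern differs.
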
Preferably, the observations are represented as multidimensional vectors, for the comparison at step a). In such cases, processing requirements may be particularly onerous, and the advantages of the method become more valuable.

Advantageously, the comparison at step a) uses a Viterbi algorithm. The Viterbi algorithm is a dynamic programming algorithm which is preferred for comparing observations to particular types of models. If such models are time series of states joined by transitions, the Viterbi algorithm determines the probability of each observation matching one or more of those states. If the models comprise strings of states, cumulative probability scores for each string can be determined, to find the best path. In such cases, processing requirements may be particularly onerous, and the advantages of the method become more valuable.

Advantageously, the models are represented as finite state machines with probability distribution functions attached. Continuously varying signals can be modelled as a series of stationary states, with a probability function at each state, indicating whether the signal moves to the next state or stays at the given state, after a given time interval. An example is the Hidden Markov Model (HMM). This enables the models to contain a representation of patterns with varying rates of change, yet a common underlying pattern. However, again, processing requirements may be particularly onerous, and the advantages of the method become more valuable.

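To make the finite-state-machine picture concrete, the following Python sketch runs a Viterbi update over a small left-to-right HMM, keeping a cumulative best-path score per state. It is an illustration only: the two-state topology, the transition log-probabilities and the toy emission score are all invented for the example, not taken from the patent.

    import math

    # A toy left-to-right HMM: each state may stay where it is or move to
    # the next state, with log-probabilities attached to each transition.
    STAY = [math.log(0.6), math.log(0.7)]   # log P(stay in state i)
    MOVE = [math.log(0.4)]                  # log P(move from state 0 to 1)
    MEANS = [0.2, 0.8]                      # per-state emission means (toy)

    def emission_logprob(state, observation):
        # Toy emission score; a real system would evaluate the probability
        # distribution function attached to the state.
        return -(observation - MEANS[state]) ** 2

    def viterbi_scores(observations):
        """Cumulative best-path log-score per state after each observation."""
        scores = [0.0, float("-inf")]       # start in state 0
        for obs in observations:
            stay0 = scores[0] + STAY[0]
            move01 = scores[0] + MOVE[0]
            stay1 = scores[1] + STAY[1]
            scores = [
                stay0 + emission_logprob(0, obs),
                max(move01, stay1) + emission_logprob(1, obs),
            ]
        return scores

    if __name__ == "__main__":
        print(viterbi_scores([0.1, 0.3, 0.7, 0.9]))  # final score per state
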
Advantageously, the models comprise groups of representations of phonemes. This enables the models to be stored and used more efficiently. Recognising patterns in speech signals is particularly processor intensive, because of the wide range of differences in sounds carrying the same meaning. Also, in many speech recognition applications, it is necessary to carry out recognition in real time, that is, in half a second or so after the speaker has finished. Accordingly, again, processing requirements may be particularly onerous, and the advantages of the method become more valuable.

Advantageously, the models comprise representations of elements of speech, and step a) comprises the step of: comparing the block of observations to a predetermined sequence of the models in the subset. The amount of processing can be reduced if the models are stored as sequences of elements of speech. Many utterances share elements with other utterances. To avoid comparing the same elements multiple times, the individual models of the elements can be merged into sequences. The common parts can be represented once, with branches where the elements are no longer common.

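A minimal sketch of this kind of merging follows; the dictionary words and phoneme labels are invented for the example, and a real lexical graph would attach HMMs rather than plain labels. Shared leading phonemes are stored once, and the sequences branch where they stop being common.

    # Build a simple lexical prefix tree (trie) so that phoneme models shared
    # by several words are represented, and therefore compared, only once.

    def build_lexical_tree(dictionary):
        root = {}
        for word, phonemes in dictionary.items():
            node = root
            for phoneme in phonemes:
                node = node.setdefault(phoneme, {})  # shared prefix reused
            node["<word>"] = word                    # leaf marks a full word
        return root

    def count_branches(node):
        # Number of phoneme arcs actually stored; compare with the total
        # number of phonemes over all words to see the saving from sharing.
        return sum(1 + count_branches(child)
                   for phoneme, child in node.items() if phoneme != "<word>")

    if __name__ == "__main__":
        dictionary = {                      # invented example entries
            "beach":  ["b", "iy", "ch"],
            "beacon": ["b", "iy", "k", "ax", "n"],
            "carrot": ["k", "ae", "r", "ax", "t"],
        }
        tree = build_lexical_tree(dictionary)
        total = sum(len(p) for p in dictionary.values())
        print("phonemes listed:", total, "arcs stored:", count_branches(tree))
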
Advantageously, step a) comprises the steps of:
comparing the block of observations to a predetermined sequence of the models in the subset;
determining, for each of the models in the sequence, a score which represents the likelihood of a match with the observations compared so far;
storing the score in a score buffer for use in determining scores of subsequent models in the sequence; and
determining when the score is no longer needed, then re-using the score buffer to store a subsequent score.

This can bring an advantage in reducing memory requirements, which can relieve a significant constraint. This may occur in particular where the models comprise large numbers of paths, each of which has to have a cumulative probability score maintained, or where there are a large number of models.

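As a rough illustration of this buffer re-use (a sketch only; the branch list, reader counts and buffer pool below are assumptions, not the patent's data structures), score buffers can be handed back to a free pool as soon as no later model in the sequence still needs them:

    # Re-use score buffers while walking a predetermined sequence of models.
    # Each entry names a model, the buffer it reads from, and how many later
    # models still need its output; buffers return to a free pool at zero.

    def process_sequence(sequence, block):
        free_pool = []              # indices of buffers available for re-use
        buffers = {}                # buffer index -> list of scores
        readers_left = {}           # buffer index -> pending readers
        next_index = 0
        for model, input_buf, n_readers in sequence:
            out = free_pool.pop() if free_pool else next_index
            if out == next_index:
                next_index += 1                      # had to allocate anew
            source = buffers.get(input_buf, [0.0] * len(block))
            # Placeholder scoring: cumulative score per observation in block.
            buffers[out] = [s + hash((model, o)) % 7
                            for s, o in zip(source, block)]
            readers_left[out] = n_readers
            if input_buf is not None:
                readers_left[input_buf] -= 1
                if readers_left[input_buf] == 0:
                    free_pool.append(input_buf)      # score no longer needed
        return next_index                            # buffers actually used

    if __name__ == "__main__":
        # (model name, index of the buffer it reads, number of later readers)
        sequence = [("b", None, 1), ("iy", 0, 2), ("ch", 1, 0), ("k", 1, 0)]
        print("buffers used:", process_sequence(sequence, ["o1", "o2", "o3"]))

In this toy run, four models are scored with only three buffers, because the first buffer is released and re-used once its only reader has consumed it.
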
Advantageously, step a) comprises the step of:
comparing the block of observations to a lexical graph comprising a predetermined sequence of the models in the subset, wherein the sequence comprises different types of models, and the comparison is dependent on the type; and the method comprises the step of:
determining the types of the models before the block is compared.

Where different algorithms are used, tuned to suit different types of model, or different branches of a sequence of models, processing can be speeded up if it is unnecessary to test for branch type during processing. In particular, it becomes easier for a compiler to pipeline the algorithm. If a branch is encountered, which route to take must be resolved before continuing, so the pipelining achieved by preprocessing instructions ahead of the current instruction may be held up.

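The following sketch shows the general idea of hoisting the type decisions out of the inner loop; it is illustrative only, the two branch types and their update rules are invented, and a real lexical graph would carry HMM parameters rather than plain labels.

    # Classify branches of a (toy) lexical graph once, before a block is
    # processed, so the per-observation inner loop contains no type tests.

    def normal_update(score, obs):
        return score + obs          # placeholder update for a "normal" branch

    def terminal_update(score, obs):
        return max(score, obs)      # placeholder update for a terminal branch

    UPDATES = {"normal": normal_update, "terminal": terminal_update}

    def classify_branches(graph):
        # Done once per block: pair every branch with its update function.
        return [(branch,
                 UPDATES["terminal" if not graph[branch] else "normal"])
                for branch in graph]

    def process_block(graph, block):
        plan = classify_branches(graph)        # type decisions hoisted out
        scores = {branch: 0.0 for branch in graph}
        for obs in block:                      # inner loop: no type branching
            for branch, update in plan:
                scores[branch] = update(scores[branch], obs)
        return scores

    if __name__ == "__main__":
        # branch -> list of successor branches; an empty list means terminal
        graph = {"b-iy": ["iy-ch"], "iy-ch": [], "k-ae": []}
        print(process_block(graph, [0.2, 0.5, 0.1]))
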
Preferably, the models comprise finite state machines, having multiple state sequences, wherein step a) comprises the steps of:
determining state scores for the matches between each respective observation and state sequences of the respective model;
making an approximation of the state scores, for the observation, for storing to use in matching subsequent observations, the approximation comprising fewer state scores than were determined for the respective observation.

This enables the amount of storage required for state scores to be reduced. This is particularly advantageous where there are a large number of models, each with multiple states and therefore a large number of state scores to be stored for use in the next block. In such cases, a two-pass approach may be used, in which a rough, or fast match, algorithm is used on the first pass, to identify a small number of similar models, which are then compared in more detail in the second pass. The approximation is particularly advantageous for the first pass, to reduce the memory used, and to speed up the process since fewer memory accesses are required, and fewer calculations. This is particularly advantageous if the calculations are time consuming floating point calculations.

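One simple way to picture such an approximation is sketched below. It is not the patent's fast-match scheme (which collapses full models as indicated in FIGS. 11 to 13); here, purely for illustration, a single summary score per model is carried across block boundaries instead of the full vector of state scores.

    # Between blocks, store an approximation of a model's state scores:
    # here, only the best (maximum) state score is kept, instead of all.

    def viterbi_block(state_scores, block, n_states):
        # Placeholder block update: each state inherits the better of itself
        # and its left neighbour, plus a toy emission term per observation.
        for obs in block:
            prev = list(state_scores)
            for s in range(n_states):
                best_in = prev[s] if s == 0 else max(prev[s], prev[s - 1])
                state_scores[s] = best_in + (obs - s * 0.1)
        return state_scores

    def exact_carry(blocks, n_states):
        scores = [0.0] * n_states
        for block in blocks:
            scores = viterbi_block(scores, block, n_states)  # all scores kept
        return max(scores)

    def approximate_carry(blocks, n_states):
        summary = 0.0                                  # one score per model
        for block in blocks:
            scores = viterbi_block([summary] * n_states, block, n_states)
            summary = max(scores)                      # fewer scores stored
        return summary

    if __name__ == "__main__":
        blocks = [[0.4, 0.2, 0.5], [0.3, 0.6, 0.1]]
        print("exact:", exact_carry(blocks, 4),
              "approx:", approximate_carry(blocks, 4))
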
According to another aspect of the invention, there is provided a method of recognising patterns in a series of observations, by comparing the observations to stored models, using a processing means having a main memory for storing the models and a cache memory, the cache memory being too small to contain all the models and observations, the series of observations being divided into blocks of at least two observations, the method comprising the steps of:
a) using the processor to compare a subset of the models to the observations in one of the blocks of observations, to recognise the patterns, the subset of the models being small enough to fit in the cache memory;
b) repeating step a) for a different subset of the models; and
c) repeating steps a) and b) for a different one of the blocks.

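As a rough sketch of that cache-driven partitioning (the byte sizes, the per-model size estimate and the scoring stub are all assumptions made for illustration), the models can be split into subsets whose combined size stays under a cache budget before the block loop runs:

    # Partition the model set into subsets small enough to fit a cache
    # budget, then process each block one subset at a time.

    def partition_for_cache(models, model_bytes, cache_bytes):
        subsets, current, used = [], [], 0
        for name in models:
            if used + model_bytes > cache_bytes and current:
                subsets.append(current)
                current, used = [], 0
            current.append(name)
            used += model_bytes
        if current:
            subsets.append(current)
        return subsets

    def compare(model, obs, score):
        return score + len(model) * obs        # placeholder comparison

    def recognise(models, blocks, model_bytes, cache_bytes):
        scores = {name: 0.0 for name in models}
        subsets = partition_for_cache(models, model_bytes, cache_bytes)
        for block in blocks:                   # step c): next block
            for subset in subsets:             # step b): next subset of models
                for name in subset:            # step a): subset kept cache-hot
                    for obs in block:
                        scores[name] = compare(name, obs, scores[name])
        return scores

    if __name__ == "__main__":
        models = ["abraham", "beach", "carrot", "depth"]
        blocks = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4]]
        print(recognise(models, blocks, model_bytes=4096, cache_bytes=8192))
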
According to another aspect of the invention, there is provided a method of recognising patterns in a series of observations by comparing the observations to stored models, the series of observations being divided into at least two blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the method comprising the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.

According to another aspect of the invention, there is provided a method of comparing a series of observations representing unknown speech, to stored models representing known speech, by comparing the observations to stored models, the series of observations being grouped into one or more blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the method comprising, for each of the one or more blocks, the steps of:
a) comparing two or more of the observations in the respective block, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match; and
b) repeating step a) for models other than those in the subset.

In some applications, the series of observations may be short enough that only a single block is necessary. The advantages set out above still apply.

According to another aspect of the invention, there is provided software stored on a computer readable medium for comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, the software being arranged for carrying out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.

According to another aspect of the invention, there is provided software stored on a computer readable medium for recognising patterns in a series of observations by comparing the observations to stored models, the series of observations being divided into at least two blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the software being arranged to carry out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.

According to another aspect of the invention, there is provided software stored on a computer readable medium for comparing a series of observations representing unknown speech, to stored models representing known speech, by comparing the observations to stored models, the series of observations being grouped into one or more blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the software being arranged to carry out, for each of the one or more blocks, the steps of:
a) comparing two or more of the observations in the respective block, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match; and
b) repeating step a) for models other than those in the subset.

According to another aspect of the invention, there is provided a speech recognition processor for comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, the processor being arranged to carry out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.

According to another aspect of the invention, there is provided a speech recognition processor for recognising patterns in a series of observations by comparing the observations to stored models, the series of observations being divided into at least two blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the processor being arranged to carry out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.

According to another aspect of the invention, there is provided a speech recognition processor for comparing a series of observations representing unknown speech, to stored models representing known speech, by comparing the observations to stored models, the series of observations being grouped into one or more blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the processor being arranged to carry out, for each of the one or more blocks, the steps of:
a) comparing two or more of the observations in the respective block, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match; and
b) repeating step a) for models other than those in the subset.

Preferred features may be combined as would be apparent to a skilled person, and may be combined with any aspect of the invention.

To show, by way of example, how to put the invention into practice, embodiments will now be described in more detail, with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows principal steps in a known speech recognition process;
FIG. 2 shows known service controller apparatus for carrying out the method of FIG. 1;
FIG. 3 shows the speech recognition processor of FIG. 2 in more detail;
FIG. 4 shows the recognition step of FIG. 1 in more detail;
FIG. 5 shows the recognition step according to an embodiment of the invention;
FIG. 6 shows a table of scores for a series of observations matched against models, indicating the known order of calculating the scores;
FIG. 7 shows the table of scores as shown in FIG. 6, indicating an order of calculating the scores according to an embodiment of the invention;
FIG. 8 shows a basic HMM topology;
FIG. 9 shows a trellis of an HMM;
FIG. 10 shows a trellis for a block Viterbi search;
FIG. 11 shows how full models are collapsed to become fast-match models;
FIG. 12 shows a block Viterbi lexical graph, showing branch types and score buffer assignment;
FIG. 13 shows a fast match model;
FIG. 14 shows pseudo code for the block Viterbi algorithm;
FIG. 15 shows the recognition step of FIG. 1 adapted for branch type optimization;
FIG. 16 shows an optional HMM and trellis;
FIG. 17 shows pseudo code for a block Viterbi algorithm for optional branches; and
FIG. 18 shows an alternative recognition processor arrangement for carrying out some of the steps of FIG. 1.

DETAILED DESCRIPTION

FIG. 1: Voice Recognition

FIG. 1 shows an overall view of an example of a known pattern recognition process, for voice recognition. There are five main steps: channel adaptation (1), feature extraction (2), word end-point detection (3), speech recognition (4), and accept/reject decision logic (5).

In the first step, channel adaptation, the system monitors the characteristics of the telephone line that carries the speech, and creates an initial mathematical model of channel parameters, such as line noise level. This model, which is updated continually throughout the recognition process, is used to adapt the vector of coefficients generated during feature extraction so that the coefficients respond to foreground speech and block out line channel variations.

During the second step, feature extraction, the system divides the unknown word or phrase into short segments, called frames. Each frame lasts 12.75 milliseconds. Thus, a word such as Mississippi, which takes about a second to say, would be broken down into about 100 frames.

Next, the system conducts a spectral analysis of each frame and creates a mathematical vector representation of the spectrum. Each feature vector is a string of 15 coefficients. The first seven coefficients measure the energy of the frame at different frequency bands to distinguish, for example, between a high-frequency /s/ sound and a lower-frequency vowel. The remaining coefficients measure the rate of change of these spectral energies.

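To make the framing arithmetic concrete, the sketch below cuts a signal into 12.75 ms frames and builds a 15-coefficient vector. It is illustrative only: the 8 kHz sample rate, the 10 ms frame advance, the crude band splitting and the exact make-up of the eight rate-of-change coefficients are assumptions, not the actual front end described here.

    import math

    SAMPLE_RATE = 8000                # assumed telephone-band sample rate
    FRAME_SECONDS = 0.01275           # 12.75 ms frames, as described above
    HOP_SECONDS = 0.010               # assumed advance, giving ~100 frames/s
    FRAME_LENGTH = int(SAMPLE_RATE * FRAME_SECONDS)   # 102 samples per frame
    HOP_LENGTH = int(SAMPLE_RATE * HOP_SECONDS)

    def frames(signal):
        """Split a list of samples into overlapping 12.75 ms frames."""
        return [signal[i:i + FRAME_LENGTH]
                for i in range(0, len(signal) - FRAME_LENGTH + 1, HOP_LENGTH)]

    def band_energies(frame, n_bands=7):
        # Crude stand-in for a spectral analysis: split the frame into
        # n_bands chunks and take the log energy of each (a real front end
        # would use an FFT or a filter bank).
        size = max(1, len(frame) // n_bands)
        return [math.log(sum(x * x for x in frame[b * size:(b + 1) * size])
                         + 1e-9)
                for b in range(n_bands)]

    def feature_vector(prev_frame, frame):
        """Assumed 15-coefficient layout: 7 band energies, their 7 deltas,
        plus the change in overall frame energy."""
        e_now, e_prev = band_energies(frame), band_energies(prev_frame)
        deltas = [a - b for a, b in zip(e_now, e_prev)]
        overall_delta = sum(e_now) - sum(e_prev)
        return e_now + deltas + [overall_delta]      # 7 + 7 + 1 = 15

    if __name__ == "__main__":
        one_second = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
                      for t in range(SAMPLE_RATE)]
        frame_list = frames(one_second)
        print(len(frame_list), "frames,",
              len(feature_vector(frame_list[0], frame_list[1])), "coefficients")
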
The third stage, word endpoint detection, is performed by an algorithm that uses the energy and short-time spectrum of the speech signal. This algorithm removes silence before, after, and in the middle of the speech signal, and filters out unwanted background noise in order to expedite the speech recognition stage.

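A minimal energy-threshold sketch of that idea follows; it is purely illustrative, since the actual algorithm also uses the short-time spectrum, and the threshold and hangover values are invented.

    # Toy endpoint detection: keep only frames whose energy exceeds a
    # threshold, with a short "hangover" so brief dips inside a word are
    # not cut out.

    def frame_energy(frame):
        return sum(x * x for x in frame)

    def detect_speech_frames(frame_list, threshold=0.5, hangover=2):
        keep, quiet_run = [], hangover + 1
        for frame in frame_list:
            if frame_energy(frame) >= threshold:
                quiet_run = 0
            else:
                quiet_run += 1
            keep.append(quiet_run <= hangover)    # True while in/near speech
        return keep

    if __name__ == "__main__":
        silence = [[0.0] * 4] * 3
        loud = [[0.9] * 4] * 4
        flags = detect_speech_frames(silence + loud + silence)
        print(flags)   # leading/trailing silence False, speech frames True
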
Speech recognition is the fourth stage of the process. At this stage, the vector representations generated for each frame are compared with the FVR system's stored models for the recognition vocabulary. Each word in the vocabulary is represented by a string of hidden Markov models (HMMs), one for each phoneme in the word, as shown in U.S. Pat. No. 5,390,278 (Gupta et al.). The stored string of phoneme models that produces the best match is determined using the multiple-pass approach, at which point the spoken word is considered recognized, as shown in U.S. Pat. No. 5,515,475 (Gupta). The matching process is aided by statistical models and efficient search algorithms embedded in the hidden Markov models.

The final stage of the process is rejection decision scoring, which determines whether the best match f
