US006092045A

United States Patent [19]
Stubley et al.
[11] Patent Number: 6,092,045
[45] Date of Patent: Jul. 18, 2000

[54] METHOD AND APPARATUS FOR SPEECH RECOGNITION

[75] Inventors: Peter R. Stubley, Lachine; Andre Gillet, St-Laurent; Vishwa N. Gupta, Brossard, all of Canada; Christopher K. Toulson, Palo Alto; David B. Peters, San Carlos, both of Calif.

[73] Assignee: Nortel Networks Corporation

[21] Appl. No.: 09/119,621

[22] Filed: Jul. 21, 1998

[30] Foreign Application Priority Data
Sep. 19, 1997  [CA]  Canada ............................... 2216224

[51] Int. Cl.7 .............................. G10L 15/10; G10L 15/14
[52] U.S. Cl. ........................................ 704/254; 704/256
[58] Field of Search ............... 704/256, 251, 254, 255, 200, 231, 240, 242, 238, 236, 243, 244, 252

[56] References Cited

U.S. PATENT DOCUMENTS
4,164,025   8/1979  Dubnowski et al. ............ 364/900
4,751,737   6/1988  Gerson et al. ............... 381/43
4,797,910   1/1989  Daudelin .................... 379/67
4,805,219   2/1989  Baker ....................... 381/43
4,959,855   9/1990  Daudelin .................... 379/213
4,979,206  12/1990  Padden et al. ............... 379/67
5,050,215   9/1991  Nishimura ................... 381/41
5,052,038   9/1991  Shepard ..................... 379/88
5,086,479   2/1992  Takenaga et al. ............. 382/14
5,091,947   2/1992  Ariyoshi et al. ............. 381/42
5,097,509   3/1992  Lennig ...................... 381/43
5,127,055   6/1992  Larkey ...................... 381/43
5,181,237   1/1993  Dowden et al. ............... 379/88
5,204,894   4/1993  Darden ...................... 379/88
5,271,088  12/1993  Bahler ...................... 704/200
5,274,695  12/1993  Green ....................... 379/88
5,307,444   4/1994  Tsuboka ..................... 395/22
5,390,278   2/1995  Gupta et al. ................ 395/2.52
5,459,798  10/1995  Bailey et al. ............... 382/218
5,488,652   1/1996  Bielby et al. ............... 379/88
5,515,475   5/1996  Gupta et al. ................ 395/2.51
5,621,859   4/1997  Schwartz et al. ............. 395/2.65
5,857,169   1/1999  Seide ....................... 704/256
5,892,960   4/1999  Seide ....................... 395/800
5,930,753   7/1999  Potamianos et al. ........... 704/243
5,933,806   8/1999  Beyerlein et al. ............ 704/256

FOREIGN PATENT DOCUMENTS

0 703 568   3/1996  European Pat. Off. .......... G10L 5/06
0 810 583  12/1997  European Pat. Off. .......... G10L 5/06

OTHER PUBLICATIONS

Kuhn, R., et al., "A Cache-Based Natural Language Model for Speech Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, No. 6, Jun. 1990, pp. 570-583 (XP000136978).
Bakis, R. and Cole, A.G., "Dynamic Modification of the Vocabulary of a Speech Recognition Machine," IBM Technical Disclosure Bulletin, vol. 27, No. 7A, (1984).
Gao, et al., "Dynamic Adaptation of Hidden Markov Model for Robust Speech Recognition," 1989 IEEE International Symposium on Circuits and Systems, vol. 2 of 3, pp. 1336-1339, (1989).

(List continued on next page.)

Primary Examiner: Richemond Dorvil
Attorney, Agent, or Firm: Foley & Lardner

[57] ABSTRACT

Comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, is carried out in an order which makes better use of memory. First, the observations in one of the blocks are compared (31) to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models. This step is repeated (33) for models other than those in the subset; and the whole process is repeated (34) for each block.

21 Claims, 12 Drawing Sheets
`
Representative drawing, FIG. 5, word recognition 14: find first observation in first block, and find first model 30; compare observation to model 31; next observation, until end of block 32; next model 33; next block of observations 34.
OTHER PUBLICATIONS

Gorin, A.L., et al., "Adaptive Acquisition of Language," Computer Speech and Language, vol. 5, No. 2, pp. 101-132, (1991).
Lennig, M., et al., "Automated Bilingual Directory Assistance Trial in Bell Canada," IEEE Workshop on Interactive Voice Technology for Telecom Applications, (1992).
Lennig, M. and Sharp, D., "Unleashing The Potential Of Human-To-Machine Communication," Telesis, No. 97, pp. 23-37, (1993).
Lennig, M., "Putting Speech Recognition to Work in the Telephone Network," IEEE Computer Society, vol. 23, No. 8, (1990).
Lennig, M., et al., "Flexible Vocabulary Recognition of Speech Over the Telephone," IEEE Workshop on Interactive Voice Technology for Telecom Applications, (1992).
Nagata, S., et al., "Mobile Robot Control by a Structured Hierarchical Neural Network," pp. 69-76, (1989).
Rabiner, L.R. and Juang, B.H., "An Introduction to Hidden Markov Models," IEEE ASSP Magazine, vol. 3, No. 1, (1986).
Young, S., "Large Vocabulary Continuous Speech Recognition: a Review," IEEE Automatic Speech Recognition Workshop, (1995).
Sheet 1 of 12, FIG. 1 (PRIOR ART): speech input feeds channel adaptation 11, feature extraction 12, word endpoint detection 13, word recognition by finding closest matches to stored vocabulary 14, and accept/reject decision 15, which outputs the best match, or a rejection.
Sheet 2 of 12, FIG. 2: known service controller apparatus, with service processors 101 and 102, speech recognition processor 105, X.25 interface 106 and link 107 on a voice bus, and connections to the PSTN, to a LAN/WAN, to a multi-channel telecoms link, and to a data network.
Sheet 3 of 12, FIGS. 3 and 18: FIG. 3 shows speech recognition processor card 105 with VME bus interface 121 and global memory; FIG. 18 shows desktop computer 132 with Pentium processor 133, on-chip cache 134 and global memory.
Sheet 4 of 12, FIG. 4 (PRIOR ART), word recognition by the Viterbi algorithm 14: input the word as a series of observations in the form of cepstral vectors 20; comparison step 21, in which the probability of the current observation being the same as each path in the stored HMM trellis is computed; repeat for the next HMM until the end 22; repeat for the next observation until the end 23; output scores for the best matching words.
Sheet 5 of 12, FIG. 5, word recognition according to the embodiment: find first observation in first block, and find first model 30; compare observation to model 31; next observation, until end of block 32; next model 33; next block of observations 34.
Sheet 6 of 12, FIGS. 6 and 7: tables of scores for the series of observations making up the word being recognised, matched against dictionary models for the words "ABRAHAM", "BEACH", "CARROT" and "DEPTH". FIG. 6 indicates the known order of calculating the scores, with the last to be calculated marked; FIG. 7 shows the same table with the observations grouped into BLOCK 1 and BLOCK 2 and percentage scores filled in block by block, indicating the order of calculation according to an embodiment of the invention.
Sheet 7 of 12, FIG. 9: trellis of an HMM, with states 1 to 4 plotted against successive frames, leading to the end state or to the next HMM.
Sheet 8 of 12, FIG. 10: trellis for a block Viterbi search.
Sheet 9 of 12, FIGS. 12 and 13: FIG. 12 shows a block Viterbi lexical graph, with branch types and score buffer assignment; FIG. 13 shows a fast match model.
Sheet 10 of 12, FIG. 14, pseudo code for the block Viterbi algorithm:

Copy in to Buffer[0].
Mark all buffers free. Mark Buffer[0] in use.
foreach branch b {
    Define inp as the index of the input buffer to b.
    if (b is a terminal branch) {
        do_terminal_processing(leaf(b), Buffer[inp]).
    }
    else {
        Define outp as the index of the output buffer for b.
        if (outp is in use) {
            Mark b as an update type.
        }
        else {
            Mark outp in use.
        }
        process_branch_by_type(b, inp, outp).
    }
    Mark inp free.
}
Sheet 11 of 12, FIG. 15, the recognition step adapted for branch type optimization: determine the type of each branch in the lexical graph 205; compare observation 206; input the score from the previous branch 207; process the branch according to branch type 208; next branch 209; next observation until end of block 210; next model 211.
Sheet 12 of 12, FIG. 16: an optional HMM and its trellis, with states plotted against observation times t = 0 to t = 5.

FIG. 17, pseudo code for a block Viterbi algorithm for optional branches:

1.  Restore state scores from global memory.
2.  s1 = max[s1, u(0)].
3.  v(0) = u(0).
4.  for t = 0, 1, ..., B-1 {
5.      v(t+1) = max[u(t+1), s3 + b3(t)].
6.      s3 = max[s2 + b2(t), s3 + b3(t)].
7.      s2 = max[s1 + b1(t), s2 + b2(t)].
8.      s1 = max[u(t+1), s1 + b1(t)].
9.  }
10. Save the state scores in global memory.
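Read back into C, the FIG. 17 update for a single optional branch over a block of B observations might look roughly as follows; u(t) are the scores entering the branch, v(t) the scores leaving it (either through the three-state model or bypassing it), and b1(t)..b3(t) the per-state observation scores. The data layout and the stub observation scores are assumptions made for this sketch, not details taken from the patent.

#include <stdio.h>

#define B 5                      /* observations per block */

static double max2(double a, double b) { return a > b ? a : b; }

/* Placeholder log observation scores for states 1..3 at time t. */
static double b1(int t) { return -0.1 * t; }
static double b2(int t) { return -0.2 * t; }
static double b3(int t) { return -0.3 * t; }

/* One block-Viterbi update of an optional branch (FIG. 17).
 * s[0..2] hold the three state scores, restored from and saved back
 * to "global memory" (here simply the caller's array) once per block. */
static void optional_branch_block(double s[3], const double u[B + 1], double v[B + 1])
{
    s[0] = max2(s[0], u[0]);
    v[0] = u[0];                                    /* bypass path at t = 0 */
    for (int t = 0; t < B; t++) {
        v[t + 1] = max2(u[t + 1], s[2] + b3(t));    /* leave the model, or skip it */
        s[2]     = max2(s[1] + b2(t), s[2] + b3(t));
        s[1]     = max2(s[0] + b1(t), s[1] + b2(t));
        s[0]     = max2(u[t + 1], s[0] + b1(t));
    }
}

int main(void)
{
    double s[3] = { -1e30, -1e30, -1e30 };          /* restored state scores */
    double u[B + 1] = { 0, -1, -2, -3, -4, -5 };    /* dummy input scores    */
    double v[B + 1];

    optional_branch_block(s, u, v);
    for (int t = 0; t <= B; t++) printf("v(%d) = %g\n", t, v[t]);
    return 0;
}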

METHOD AND APPARATUS FOR SPEECH RECOGNITION

FIELD OF THE INVENTION

The invention relates to methods of comparing a series of observations representing unknown speech, to stored models representing known speech, to methods of recognising patterns in a series of observations, by comparing the observations to stored models, to apparatus for such methods, and to software for such methods.

BACKGROUND TO THE INVENTION

Pattern recognition generally, and recognition of patterns in continuous signals such as speech signals, has been a rapidly developing field. A limitation in many applications has been the cost of providing sufficient processing power for the complex calculations often required. This is particularly the case in speech recognition, all the more so when real time response is required, for example to enable automated directory enquiry assistance, or for control operations based on speech input. To simulate the speed of response of a human operator, and thus avoid a perception of "unnatural" delays, which can be disconcerting, the spoken input needs to be recognised within about half a second of the end of the spoken input.

The computational load varies directly with the number of words or other elements of speech, which are modelled and held in a dictionary, for comparison to the spoken input. This is also known as the size of vocabulary of the system. The computational load also varies according to the complexity of the models in the dictionary, and how the speech input is processed into a representation ready for the comparison to the models. Finally, the actual algorithm for carrying out the comparison is clearly a key factor. Numerous attempts have been made over many years to improve the trade off between computational load, accuracy of recognition, and speed of recognition. For useable systems, having a tolerable recognition accuracy, the computational demands are high. Despite continuous refinements to models, speech input representations, and recognition algorithms, and advances in processing hardware, there remains great demand to improve the above mentioned trade off.

There are five main steps: audio channel adaptation, feature extraction, word end-point detection, speech recognition, and accept/reject decision logic. The speech recognition step, the fourth stage, is the most computationally intensive step, and thus a limiting factor as far as the above mentioned trade off is concerned. Depending on the size of vocabularies used, and the size of each model, both the memory requirements and the number of calculations required for each recognition decision may limit the speed/accuracy/cost trade off. Examples of such systems are described in U.S. Pat. No. 5,390,278 (Gupta et al.), and in U.S. Pat. No. 5,515,475 (Gupta).

SUMMARY OF THE INVENTION

It is an object of the invention to provide improved method and apparatus.

According to the invention, there is provided a method of comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, the method comprising the steps of:

a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models;

b) repeating step a) for models other than those in the subset; and

c) repeating steps a) and b) for a different one of the blocks.

Amongst the advantages of such a method is that the speed of processing of step a), the critical innermost loop, can be improved. The innermost loop is the part which is repeated most often, and by carrying out the operations in the order specified, cache misses can be reduced, and pipelining efficiency improved. More specifically, slow main memory accesses may be required only once per block per model, instead of once per observation per model.
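Purely as an illustration of the reordering just described (the function names, sizes and dummy scoring below are invented, not the patent's code), the following sketch contrasts the conventional observation-major loop with the block order of steps a) to c): the inner loop touches one model's data for a whole block of observations before moving on, so that model can stay resident in the cache.

#include <stdio.h>

#define NUM_MODELS 200   /* assumed vocabulary size        */
#define NUM_OBS    100   /* observations in the utterance  */
#define BLOCK_SIZE  10   /* observations per block         */

static double score[NUM_MODELS];

/* Stand-in for scoring one observation against one stored model. */
static void score_model(int m, int t)
{
    score[m] += (double)((m + t) % 7);    /* dummy arithmetic only */
}

/* Conventional order: every observation visits every model, so each
 * model's data is re-fetched from slow memory NUM_OBS times.       */
static void recognise_conventional(void)
{
    for (int t = 0; t < NUM_OBS; t++)
        for (int m = 0; m < NUM_MODELS; m++)
            score_model(m, t);
}

/* Order of steps a) to c): within a block, one model is scored against
 * all of the block's observations before the next model is touched, so
 * each model needs to be brought into the cache roughly once per block. */
static void recognise_blocked(void)
{
    for (int start = 0; start < NUM_OBS; start += BLOCK_SIZE) {      /* step c) */
        int end = start + BLOCK_SIZE < NUM_OBS ? start + BLOCK_SIZE : NUM_OBS;
        for (int m = 0; m < NUM_MODELS; m++)                         /* steps a), b) */
            for (int t = start; t < end; t++)                        /* inner loop of a) */
                score_model(m, t);
    }
}

int main(void)
{
    recognise_conventional();
    recognise_blocked();
    printf("score[0] after both passes: %g\n", score[0]);
    return 0;
}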
Preferably, the observations are represented as multidimensional vectors, for the comparison at step a). In such cases, processing requirements may be particularly onerous, and the advantages of the method become more valuable.
Advantageously, the comparison at step a) uses a Viterbi algorithm. The Viterbi algorithm is a dynamic programming algorithm which is preferred for comparing observations to particular types of models. If such models are time series of states joined by transitions, the Viterbi algorithm determines the probability of each observation matching one or more of those states. If the models comprise strings of states, cumulative probability scores for each string can be determined, to find the best path. In such cases, processing requirements may be particularly onerous, and the advantages of the method become more valuable.
Advantageously, the models are represented as finite state machines with probability distribution functions attached. Continuously varying signals can be modeled as a series of stationary states, with a probability function at each state, indicating whether the signal moves to the next state or stays at the given state, after a given time interval. An example is the Hidden Markov Model (HMM). This enables the models to contain a representation of patterns with varying rates of change, yet a common underlying pattern. However, again, processing requirements may be particularly onerous, and the advantages of the method become more valuable.
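For concreteness, a minimal Viterbi update for a small left-to-right HMM of the kind just described is sketched below, using additive log-probability scores and a max over the self-loop and the forward transition. The three-state topology, the emission function and all numbers are placeholders chosen for the sketch, not the patent's models.

#include <math.h>
#include <stdio.h>

#define NSTATES 3
#define NOBS    6
#define NEG_INF (-1.0e30)

/* Placeholder log-emission score of state s for observation o. */
static double emit(int s, double o) { return -(o - s) * (o - s); }

int main(void)
{
    /* Placeholder log transition scores: self-loop and advance. */
    const double self_loop = log(0.6), advance = log(0.4);
    const double obs[NOBS] = {0.1, 0.3, 0.9, 1.4, 2.1, 2.2};

    double score[NSTATES];
    for (int s = 0; s < NSTATES; s++) score[s] = NEG_INF;
    score[0] = emit(0, obs[0]);                 /* path must start in state 0 */

    for (int t = 1; t < NOBS; t++) {
        /* Update from the last state backwards so each state still sees
         * the previous frame's scores (in-place Viterbi update). */
        for (int s = NSTATES - 1; s >= 0; s--) {
            double stay = score[s] + self_loop;
            double move = (s > 0) ? score[s - 1] + advance : NEG_INF;
            double best = (stay > move) ? stay : move;
            score[s] = best + emit(s, obs[t]);
        }
    }
    /* The score of the final state is this model's score for the word. */
    printf("best path log-score: %f\n", score[NSTATES - 1]);
    return 0;
}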
Advantageously, the models comprise groups of representations of phonemes. This enables the models to be stored and used more efficiently. Recognising patterns in speech signals is particularly processor intensive, because of the wide range of differences in sounds carrying the same meaning. Also, in many speech recognition applications, it is necessary to carry out recognition in real time, that is, in half a second or so after the speaker has finished. Accordingly, again, processing requirements may be particularly onerous, and the advantages of the method become more valuable.
Advantageously, the models comprise representations of elements of speech, and step a) comprises the step of: comparing the block of observations to a predetermined sequence of the models in the subset. The amount of processing can be reduced if the models are stored as sequences of elements of speech. Many utterances share elements with other utterances. To avoid comparing the same elements multiple times, the individual models of the elements can be merged into sequences. The common parts can be represented once, with branches where the elements are no longer common.
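The sharing of common parts can be pictured with a toy prefix tree, sketched below under the simplifying assumption that each word is just a string of letter-sized "phonemes"; words with a common prefix share the front of the tree, so the shared models would be evaluated once, with branches only where the words diverge. The trie layout is illustrative and is not the patent's lexical graph format.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Each node stands for one shared element model; here an element is
 * a single letter so the example stays short. */
struct node {
    char phoneme;
    struct node *child[26];
};

static struct node *new_node(char p)
{
    struct node *n = calloc(1, sizeof *n);
    n->phoneme = p;
    return n;
}

/* Insert a word; nodes are created only where the word stops sharing
 * a prefix with earlier words. */
static void insert(struct node *root, const char *word)
{
    struct node *cur = root;
    for (; *word; word++) {
        int i = *word - 'a';
        if (!cur->child[i])
            cur->child[i] = new_node(*word);
        cur = cur->child[i];
    }
}

static int count_nodes(const struct node *n)
{
    int total = 1;
    for (int i = 0; i < 26; i++)
        if (n->child[i]) total += count_nodes(n->child[i]);
    return total;
}

int main(void)
{
    const char *words[] = {"abraham", "abrupt", "beach", "bead"};
    struct node *root = new_node(0);
    int letters = 0;

    for (int w = 0; w < 4; w++) {
        insert(root, words[w]);
        letters += (int)strlen(words[w]);
    }
    /* Shared prefixes ("abr", "bea") mean fewer element models to
     * evaluate than the total number of letters across all words. */
    printf("models in graph: %d, letters in words: %d\n",
           count_nodes(root) - 1, letters);
    return 0;
}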
Advantageously, step a) comprises the steps of:
comparing the block of observations to a predetermined sequence of the models in the subset;
determining for each of the models in the sequence, a score which represents the likelihood of a match with the observations compared so far;
`
`
`
`
storing the score in a score buffer for use in determining scores of subsequent models in the sequence; and
determining when the score is no longer needed, then re-using the score buffer to store a subsequent score.
This can bring an advantage in reducing memory requirements, which can relieve a significant constraint. This may occur in particular where the models comprise large numbers of paths, each of which has to have a cumulative probability score maintained, or where there are a large number of models.
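A minimal sketch of the score-buffer re-use idea follows, with invented bookkeeping: each branch of the sequence writes its cumulative scores into a small pool of buffers, and a buffer returns to the free list as soon as the last branch that reads it has been processed. The pool size, the branch fields and the placeholder "scoring" are assumptions made for the sketch.

#include <assert.h>
#include <stdio.h>

#define NBUF   4      /* assumed size of the score-buffer pool   */
#define BUFLEN 8      /* scores kept per buffer (one per state)  */

static double pool[NBUF][BUFLEN];
static int    in_use[NBUF];

static int alloc_buffer(void)
{
    for (int i = 0; i < NBUF; i++)
        if (!in_use[i]) { in_use[i] = 1; return i; }
    assert(!"score-buffer pool exhausted");
    return -1;
}

static void free_buffer(int i) { in_use[i] = 0; }

/* A branch reads the scores of its predecessor and writes its own. */
struct branch {
    int inp;              /* buffer index holding the input scores  */
    int outp;             /* buffer index receiving the output      */
    int last_reader;      /* nonzero if no later branch reads inp   */
};

static void process_branch(struct branch *b)
{
    /* Placeholder "scoring": copy input to output with an offset.  */
    for (int s = 0; s < BUFLEN; s++)
        pool[b->outp][s] = pool[b->inp][s] - 1.0;
    if (b->last_reader)
        free_buffer(b->inp);   /* score no longer needed: recycle it */
}

int main(void)
{
    int b0 = alloc_buffer();                 /* scores entering the sequence */
    struct branch first  = { b0, alloc_buffer(), 1 };
    process_branch(&first);
    /* b0 is free again, so a long sequence of models can be scored
     * with a pool far smaller than the number of branches. */
    struct branch second = { first.outp, alloc_buffer(), 1 };
    process_branch(&second);
    printf("buffers in use after two branches: %d%d%d%d\n",
           in_use[0], in_use[1], in_use[2], in_use[3]);
    return 0;
}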
Advantageously, step a) comprises the step of:
comparing the block of observations to a lexical graph comprising a predetermined sequence of the models in the subset, wherein the sequence comprises different types of models, and the comparison is dependent on the type; and the method comprises the step of:
determining the types of the models before the block is compared.
Where different algorithms are used, tuned to suit different types of model, or different branches of a sequence of models, processing can be speeded up if it is unnecessary to test for branch type during processing. In particular, it becomes easier for a compiler to pipeline the algorithm. If a branch is encountered, which route to take must be resolved before continuing, so the pipelining by preprocessing of instructions ahead of the current instruction may be held up.
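The flavour of this optimisation is sketched below with invented types: branches are classified once, before the block is processed, so the inner loop reduces to a switch on a precomputed tag rather than a chain of structural tests that would stall pipelining.

#include <stdio.h>

enum branch_type { BR_NORMAL, BR_OPTIONAL, BR_TERMINAL };

struct branch {
    int              has_leaf;    /* placeholder structural flags    */
    int              can_skip;
    enum branch_type type;        /* filled in before the block loop */
};

/* Pass 1 (once per graph or per block): classify every branch. */
static void classify(struct branch *br, int n)
{
    for (int i = 0; i < n; i++)
        br[i].type = br[i].has_leaf ? BR_TERMINAL
                   : br[i].can_skip ? BR_OPTIONAL
                                    : BR_NORMAL;
}

/* Pass 2 (innermost loop): no structural tests, only a dispatch that
 * a compiler can turn into a jump table and schedule freely.        */
static void process_block(const struct branch *br, int n)
{
    for (int i = 0; i < n; i++) {
        switch (br[i].type) {
        case BR_NORMAL:   /* ordinary Viterbi update  */ break;
        case BR_OPTIONAL: /* update with bypass path  */ break;
        case BR_TERMINAL: /* write out a word score   */ break;
        }
    }
}

int main(void)
{
    struct branch graph[3] = { {0, 0, 0}, {0, 1, 0}, {1, 0, 0} };
    classify(graph, 3);
    process_block(graph, 3);
    printf("branch types: %d %d %d\n", graph[0].type, graph[1].type, graph[2].type);
    return 0;
}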
Preferably, the models comprise finite state machines, having multiple state sequences, wherein step a) comprises the steps of:
determining state scores for the matches between each respective observation and state sequences of the respective model,
making an approximation of the state scores, for the observation, for storing to use in matching subsequent observations, the approximation comprising fewer state scores than were determined for the respective observation.
This enables the amount of storage required for state scores to be reduced. This is particularly advantageous where there are a large number of models, each with multiple states and therefore a large number of state scores to be stored for use in the next block. In such cases, a two pass approach may be used, in which a rough, or fast match algorithm is used on the first pass, to identify a small number of similar models, which are then compared in more detail in the second pass. The approximation is particularly advantageous for the first pass, to reduce the memory used, and to speed up the process since fewer memory accesses are required, and fewer calculations. This is particularly advantageous if the calculations are time consuming floating point calculations.
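As a hedged illustration only (the patent's own fast-match reduction may differ), the sketch below keeps a single summary score per model between blocks instead of one score per state, cutting both the storage and the memory traffic at block boundaries.

#include <stdio.h>

#define NSTATES 8
#define NMODELS 3

/* Full per-state scores, needed only while a model is being scored
 * against the current block. */
static double state_score[NSTATES];

/* Approximation carried across blocks: one value per model instead
 * of NSTATES values (an 8x reduction here). */
static double carry_over[NMODELS];

static void end_of_block(int model)
{
    double best = state_score[0];
    for (int s = 1; s < NSTATES; s++)
        if (state_score[s] > best) best = state_score[s];
    carry_over[model] = best;           /* store fewer scores ...     */
}

static void start_of_block(int model)
{
    for (int s = 0; s < NSTATES; s++)   /* ... and re-seed from them  */
        state_score[s] = carry_over[model];
}

int main(void)
{
    for (int s = 0; s < NSTATES; s++) state_score[s] = -s;   /* dummy scores */
    end_of_block(0);
    start_of_block(0);
    printf("carried score for model 0: %f\n", carry_over[0]);
    return 0;
}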
According to another aspect of the invention, there is provided a method of recognising patterns in a series of observations, by comparing the observations to stored models, using a processing means having a main memory for storing the models and a cache memory, the cache memory being too small to contain all the models and observations, the series of observations being divided into blocks of at least two observations, the method comprising the steps of:
a) using the processor to compare a subset of the models to the observations in one of the blocks of observations, to recognise the patterns, the subset of the models being small enough to fit in the cache memory;
b) repeating step a) for a different subset of the models; and
c) repeating steps a) and b) for a different one of the blocks.
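A back-of-the-envelope sketch of the subset sizing implied by this aspect: given assumed sizes for the cache, for one stored model, and for a block of observation vectors, it computes how many models can be scored per pass while everything the inner loop touches stays cache-resident. All of the sizes below are placeholders, not figures from the patent.

#include <stdio.h>

int main(void)
{
    /* Assumed sizes, purely illustrative. */
    const long cache_bytes = 256 * 1024;                 /* on-chip cache          */
    const long obs_bytes   = 10 * 15 * sizeof(float);    /* 10 vectors x 15 coeffs */
    const long model_bytes = 6 * 1024;                   /* one stored model       */
    const long score_bytes = 2 * 1024;                   /* working score buffers  */

    long usable     = cache_bytes - obs_bytes - score_bytes;
    long per_subset = usable / model_bytes;

    printf("models per cache-resident subset: %ld\n", per_subset);
    printf("passes over a 1000-word vocabulary per block: %ld\n",
           (1000 + per_subset - 1) / per_subset);
    return 0;
}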
According to another aspect of the invention, there is provided a method of recognising patterns in a series of observations by comparing the observations to stored models, the series of observations being divided into at least two blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the method comprising the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.
According to another aspect of the invention, there is provided a method of comparing a series of observations representing unknown speech, to stored models representing known speech, by comparing the observations to stored models, the series of observations being grouped into one or more blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the method comprising, for each of the one or more blocks, the steps of:
a) comparing two or more of the observations in the respective block, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match; and
b) repeating step a) for models other than those in the subset.
In some applications, the series of observations may be short enough that only a single block is necessary. The advantages set out above still apply.
According to another aspect of the invention, there is provided software stored on a computer readable medium for comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, the software being arranged for carrying out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.
According to another aspect of the invention, there is provided software stored on a computer readable medium for recognising patterns in a series of observations by comparing the observations to stored models, the series of observations being divided into at least two blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the software being arranged to carry out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.
According to another aspect of the invention, there is provided software stored on a computer readable medium for comparing a series of observations representing unknown speech, to stored models representing known speech, by comparing the observations to stored models, the series of observations being grouped into one or more blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the software being arranged to carry out, for each of the one or more blocks, the steps of:
a) comparing two or more of the observations in the respective block, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match; and
b) repeating step a) for models other than those in the subset.
According to another aspect of the invention, there is provided a speech recognition processor for comparing a series of observations representing unknown speech, to stored models representing known speech, the series of observations being divided into at least two blocks each comprising two or more of the observations, the processor being arranged to carry out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.
`
According to another aspect of the invention, there is provided a speech recognition processor for recognising patterns in a series of observations by comparing the observations to stored models, the series of observations being divided into at least two blocks each comprising two or more of the observations, the models comprising finite state machines, having multiple state sequences, the processor being arranged to carry out the steps of:
a) comparing two or more of the observations in one of the blocks of observations, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match;
b) repeating step a) for models other than those in the subset; and
c) repeating steps a) and b) for a different one of the blocks.
According to another aspect of the invention, there is provided a speech recognition processor for comparing a series of observations representing unknown speech, to stored models representing known speech, by comparing the observations to stored models, the series of observations being grouped into one or more blocks each comprising two or more of the observations, the models comprising finite
state machines, having multiple state sequences, the processor being arranged to carry out, for each of the one or more blocks, the steps of:
a) comparing two or more of the observations in the respective block, to a subset comprising one or more of the models, to determine a likelihood of a match to each of the one or more models, by determining which of the state sequences of the respective model is the closest match, and how close is the match; and
b) repeating step a) for models other than those in the subset.
Preferred features may be combined as would be apparent to a skilled person, and may be combined with any aspect of the invention.
To show, by way of example, how to put the invention into practice, embodiments will now be described in more detail, with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows principal steps in a known speech recognition process;
FIG. 2 shows known service controller apparatus for carrying out the method of FIG. 1;
FIG. 3 shows the speech recognition processor of FIG. 2 in more detail;
FIG. 4 shows the recognition step of FIG. 1 in more detail;
FIG. 5 shows the recognition step according to an embodiment of the invention;
FIG. 6 shows a table of scores for a series of observations matched against models, indicating the known order of calculating the scores;
FIG. 7 shows the table of scores as shown in FIG. 6, indicating an order of calculating the scores according to an embodiment of the invention;
FIG. 8 shows a basic HMM topology;
FIG. 9 shows a trellis of an HMM;
FIG. 10 shows a trellis for a block Viterbi search;
FIG. 11 shows how full models are collapsed to become fast-match models;
FIG. 12 shows a block Viterbi lexical graph, showing branch types and score buffer assignment;
FIG. 13 shows a fast match model;
FIG. 14 shows pseudo code for the block Viterbi algorithm;
FIG. 15 shows the recognition step of FIG. 1 adapted for branch type optimization;
FIG. 16 shows an optional HMM and trellis;
FIG. 17 shows pseudo code for a block Viterbi algorithm for optional branches; and
FIG. 18 shows an alternative recognition processor arrangement for carrying out some of the steps of FIG. 1.

DETAILED DESCRIPTION
`
FIG. 1: Voice Recognition
FIG. 1 shows an overall view of an example of a known pattern recognition process, for voice recognition. There are five main steps: channel adaptation (1), feature extraction (2), word end-point detection (3), speech recognition (4), and accept/reject decision logic (5).
In the first step, channel adaptation, the system monitors the characteristics of the telephone line that carries the speech, and creates an initial mathematical model of channel parameters, such as line noise level. This model, which is
updated continually throughout the recognition process, is used to adapt the vector of coefficients generated during feature extraction so that the coefficients respond to foreground speech and block out line channel variations.
During the second step, feature extraction, the system divides the unknown word or phrase into short segments, called frames. Each frame lasts 12.75 milliseconds. Thus, a word such as Mississippi, which takes about a second to say, would be broken down into about 100 frames.
Next, the system conducts a spectral analysis of each frame and creates a mathematical vector representation of the spectrum. Each feature vector is a string of 15 coefficients. The first seven coefficients measure the energy of the frame at different frequency bands to distinguish, for example, between a high-frequency /s/ sound, and a lower-frequency vowel. The remaining coefficients measure the rate of change of these spectral energies.
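A toy version of this feature computation is sketched below, with assumed band edges, an assumed 8 kHz sampling rate and a reduced coefficient count: a handful of band energies per frame plus their frame-to-frame differences standing in for the rate-of-change terms. The real system's filter shapes and cepstral details are not given in the text, so treat this purely as an illustration.

#include <math.h>
#include <stdio.h>

#define NBANDS   7      /* energy bands per frame, as in the text        */
#define NFRAMES  4      /* toy number of frames                          */
#define FRAMELEN 102    /* samples in a 12.75 ms frame at an assumed 8 kHz */

/* Toy spectral band energy: a sum of squares over an equal slice of
 * the frame stands in for a real filter-bank output. */
static double band_energy(const double *frame, int band)
{
    int lo = band * FRAMELEN / NBANDS, hi = (band + 1) * FRAMELEN / NBANDS;
    double e = 0.0;
    for (int i = lo; i < hi; i++) e += frame[i] * frame[i];
    return log(e + 1e-10);
}

int main(void)
{
    static double speech[NFRAMES][FRAMELEN];        /* would hold real samples */
    double energy[NFRAMES][NBANDS];
    double delta[NFRAMES][NBANDS];

    for (int f = 0; f < NFRAMES; f++)
        for (int i = 0; i < FRAMELEN; i++)
            speech[f][i] = sin(0.01 * (f * FRAMELEN + i));   /* dummy signal */

    for (int f = 0; f < NFRAMES; f++) {
        for (int b = 0; b < NBANDS; b++) {
            energy[f][b] = band_energy(speech[f], b);
            /* "rate of change" term: difference from the previous frame */
            delta[f][b] = (f > 0) ? energy[f][b] - energy[f - 1][b] : 0.0;
        }
    }
    printf("frame 1, band 0: energy %.3f, delta %.3f\n",
           energy[1][0], delta[1][0]);
    return 0;
}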
The third stage, word endpoint detection, is performed by an algorithm that uses the energy and short time spectrum of the speech signal. This algorithm removes silence before, after, and in the middle of the speech signal, and filters out unwanted background noise in order to expedite the speech-recognition stage.
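The endpoint detector is described only in outline, so the following is a guess at its simplest possible form: frames whose energy stays below a threshold for long enough are treated as silence and dropped, while short dips inside a word are kept. The threshold and hangover values are arbitrary.

#include <stdio.h>

#define NFRAMES   12
#define THRESHOLD 0.5      /* assumed energy floor for "speech"       */
#define HANGOVER  2        /* quiet frames tolerated inside a word    */

int main(void)
{
    /* Toy per-frame energies: silence, a word, a pause, a word, silence. */
    const double energy[NFRAMES] =
        {0.1, 0.1, 2.0, 3.0, 0.2, 0.1, 0.1, 2.5, 2.0, 1.0, 0.1, 0.1};

    int kept = 0, quiet = 0, in_speech = 0;
    for (int f = 0; f < NFRAMES; f++) {
        if (energy[f] >= THRESHOLD) {
            in_speech = 1;
            quiet = 0;
        } else if (in_speech && ++quiet > HANGOVER) {
            in_speech = 0;             /* long silence: end of a word     */
        }
        if (in_speech) {
            kept++;                    /* frame passed on to recognition  */
        }
    }
    printf("frames kept for recognition: %d of %d\n", kept, NFRAMES);
    return 0;
}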
Speech recognition is the fourth stage of the process. At this stage, the vector representations generated for each frame are compared with the FVR system's stored models for the recognition vocabulary. Each word in the vocabulary is represented by a string of hidden Markov models (HMMs), one for each phoneme in the word, as shown in U.S. Pat. No. 5,390,278 (Gupta et al.). The stored string of phoneme models that produces the best match is determined using the multiple-pass approach, at which point the spoken word is considered recognized, as shown in U.S. Pat. No. 5,515,475 (Gupta). The matching process is aided by statistical models and efficient search algorithms embedded in the hidden Markov models.
The final stage of the process is rejection decision scoring, which determines whether the best match found during the speech-recognition stage should be accepted or rejected. To perform this evaluation, the recognizer employs a mathematical distribution matrix. The closer the match, the more likely it is to be accepted. This feature provides the system with the ability to reject imposter input, or words not in its vocabulary.
FIG. 2, known service controller apparatus:
FIG. 2 shows some of the principal hardware elements for a voice processing platform known as the Network Applications Vehicle (NAV). The NAV platform is designed for enhanced network services, and enables network providers to support an array of revenue-generating and cost-saving interactive and automated speech-recognition s
