United States Patent [19]
Nadas et al.

[11] Patent Number: 4,926,488
[45] Date of Patent: May 15, 1990

[54] NORMALIZATION OF SPEECH BY ADAPTIVE LABELLING

[75] Inventors: Arthur J. Nadas, Rock Tavern; David Nahamoo, White Plains, both of N.Y.

[73] Assignee: International Business Machines Corporation, Armonk, N.Y.

[21] Appl. No.: 71,687

[22] Filed: Jul. 9, 1987

[51] Int. Cl. .......................... G10L 5/04; G10L 9/16
[52] U.S. Cl. .......................... 381/41; 381/46
[58] Field of Search .................. 364/513.5; 381/41-50

[56] References Cited

U.S. PATENT DOCUMENTS

2,938,079  5/1960  Flanagan ................... 381/50
3,673,331  6/1972  Hair et al. ................ 381/42
3,770,891 11/1973  Kalfaian ................... 381/42
3,969,698  7/1976  Bollinger et al. ........... 381/43
4,227,046 10/1980  Nakajima et al. ............ 381/47
4,256,924  3/1981  Sakoe ...................... 381/43
4,282,403  8/1981  Sakoe ...................... 364/513.5
4,292,471  9/1981  Kuhn et al. ................ 381/42
4,394,538  7/1983  Warren et al. .............. 381/43
4,519,094  5/1985  Brown et al. ............... 381/43
4,559,604 12/1985  Ichikawa et al. ............ 364/513.5
4,597,098  6/1986  Noso et al. ................ 381/46
4,601,054  7/1986  Watari et al. .............. 381/43
4,658,426  4/1987  Chabries et al. ............ 381/47
4,718,094  1/1988  Bahl et al. ................ 381/43
4,720,802  1/1988  Damoulakis et al. .......... 364/513.5
4,752,957  6/1988  Maeda ...................... 381/42
4,802,224  1/1989  Shiraki et al. ............. 381/41
4,803,729  2/1989  Baker ...................... 381/43

OTHER PUBLICATIONS

Paul, "An 800 BPS Adaptive Vector Quantization Vocoder Using a Perceptual Distance Measure", ICASSP '83, Boston, pp. 73-76.
Burton et al., "Isolated-Word Recognition Using Multisection Vector Quantization Codebooks", IEEE Trans. on ASSP, vol. 33, No. 4, Aug. 1985, pp. 837-849.
Sugawara, K., "Method for Making Confusion Matrix by DP Matching", IBM Technical Disclosure Bulletin, vol. 28, No. 11, Apr. 1986, pp. 5401-5402.
Shikano, K., et al., "Speaker Adaptation Through Vector Quantization", ICASSP '86, Tokyo, pp. 2643-2646.
Tappert, C. C., et al., "Fast Training Method for Speech Recognition Systems", IBM Tech. Discl. Bull., vol. 21, No. 8, Jan. 1979, pp. 3413-3414.

Primary Examiner: Gary V. Harkcom
Assistant Examiner: David D. Knepper
Attorney, Agent, or Firm: Marc A. Block; Marc D. Schechter
[57] ABSTRACT

In a speech processor system in which prototype vectors of speech are generated by an acoustic processor under reference noise and known ambient conditions, and in which feature vectors of speech are generated during varying noise and other ambient and recording conditions, normalized vectors are generated to reflect the form the feature vectors would have if generated under the reference conditions. The normalized vectors are generated by: (a) applying an operator function Ai to a set of feature vectors x occurring at or before time interval i to yield a normalized vector yi = Ai(x); (b) determining a distance error vector Ei by which the normalized vector is projectively moved toward the prototype vector closest to the normalized vector yi; (c) up-dating the operator function for the next time interval to correspond to the most recently determined distance error vector; and (d) incrementing i to the next time interval and repeating steps (a) through (d), wherein the feature vector corresponding to the incremented i value has the most recently up-dated operator function applied thereto. With successive time intervals, successive normalized vectors are generated based on a successively up-dated operator function. For each normalized vector, the closest prototype thereto is associated therewith. The string of normalized vectors, the string of associated prototypes (or respective label identifiers thereof), or both provide output from the acoustic processor.
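The four steps (a) through (d) of the abstract can be sketched in code. This is an illustrative sketch, not the patent's implementation: the operator function is reduced here to a single running bias vector, squared Euclidean distance stands in for the unspecified closeness measure, and all names and the step size are ours.

```python
def adaptive_label(features, prototypes, step=0.1):
    """Sketch of the abstract's loop: (a) normalize, (b) measure the error
    toward the closest prototype, (c) up-date the operator, (d) advance i."""
    n = len(prototypes[0])
    bias = [0.0] * n                       # simplest possible operator A_i
    normalized, labels = [], []
    for x in features:                                         # step (d)
        y = [xv + bv for xv, bv in zip(x, bias)]               # step (a): y_i = A_i(x_i)
        dists = [sum((yv - pv) ** 2 for yv, pv in zip(y, p)) for p in prototypes]
        j = dists.index(min(dists))                            # closest prototype to y_i
        error = [pv - yv for pv, yv in zip(prototypes[j], y)]  # step (b): error vector
        bias = [bv + step * ev for bv, ev in zip(bias, error)] # step (c): up-date A
        normalized.append(y)
        labels.append(j)
    return normalized, labels
```

With each interval the bias absorbs part of the remaining gap, so later feature vectors are pulled toward the prototype space, which is the qualitative behavior the abstract describes.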
8 Claims, 8 Drawing Sheets

[Front-page drawing: block diagram of the acoustic processor with adaptive labeller; legible labels include "adapted fenemes" and "processor".]
IPR2023-00037
Apple EX1017 Page 1
[Sheet 1 of 8: FIG. 1 — speech input into acoustic processor 102 of speech processing system 100; output to back end 104. FIG. 2 — acoustic processor 102 feeding a speech coder, a speech synthesizer, and a speech recognizer.]
[Sheet 2 of 8: FIG. 3 — acoustic space partitioned into prototype regions. Prototype space = {P1, P2, ..., P200}; input feature vectors = {x1, x2, x3, x4, x5, ...}; output feature vectors = {x1, x2, x3, x4, x5, ...}; feneme string = P11, P11, P11, P11, P11, ...]
[Sheet 3 of 8: FIG. 4 — acoustic space with feature vectors transformed into normalized vectors. Prototype space = {P1, P2, ..., P200}; input feature vectors = {x1, x2, x3, x4, x5, ...}; output feature vectors = {y1, y2, y3, y4, y5, ...}; feneme string = P11, P11, P3, P3, P56, ...]
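The prior-art behavior illustrated in FIG. 3 assigns every feature vector to its closest fixed prototype, ignoring the deviation distance. A minimal sketch of that static labelling, assuming squared Euclidean distance as the closeness measure (the specification leaves the measure open, and the function name is ours):

```python
def feneme_string(features, prototypes):
    """FIG. 3 style labelling: each feature vector is assigned to the
    region of its closest prototype; the output is the string of
    prototype indices (fenemes). No adaptation takes place."""
    labels = []
    for x in features:
        dists = [sum((xv - pv) ** 2 for xv, pv in zip(x, p))
                 for p in prototypes]
        labels.append(dists.index(min(dists)))
    return labels
```

FIG. 4 differs only in that each vector is first shifted by an accumulated error vector before this same closest-prototype search is applied.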
[Sheet 4 of 8: FIG. 5 — acoustic processor 200: microphone 202, pre-amp 204, filter 206, amplifier 208, A/D convertor 210, FFT and filter bank processor 212, adaptive labeller processor 218 (outputs: normalized output vectors, adapted fenemes), clustering operator processor 214, and prototype memory 216.]
[Sheet 5 of 8: FIG. 6 — adaptive labeller 300: counter 302 receiving the input vector features x, initial parameter memory 304, switch 306, parameter memory 308, FIR filter 310 producing the normalized output vector y, distance calculator 312 fed from the prototype memory, minimum selector 314 producing the prototype output, derivative calculator 316, and first-order FIR filter 318.]
[Sheet 6 of 8: FIG. 7 — distance calculator: adder 400, squarer 402, accumulator 404. FIG. 8 — minimum selector built around a comparator 410. FIG. 9 — derivative calculator: adder 420 and multiplier 422.]
[Sheet 7 of 8: FIG. 10 — flow diagram 500: speech input; initialize parameters (502); prepare feature vectors (504); perform normalization (506), yielding the normalized output vectors y; find closest prototype (508), yielding the adapted fenemes; calculate the closest-distance derivative with respect to the normalization parameters (510); update the normalization parameters (512); repeat.]
[Sheet 8 of 8: FIG. 11 — detailed flowchart of adaptive labelling. As best the garbled print can be read: initialize a(k,l) = 1 for k = 0 and 0 otherwise, and B(l) = 0 for all l; increment i (602); for all l = 1, ..., n set a_i(k,l) = A(k,l) and b_i(l) = B(l), and compute y_i(l) = sum_k a_i(k,l) x_{i-k}(l) + b_i(l) (603); find the closest prototype P_j; compute the gradients grad a_i(k,l) = 2 x_{i-k}(l) (y_i(l) - P_j(l)) and grad b_i(l) = 2 (y_i(l) - P_j(l)); and up-date A(k,l) = a_i(k,l) - c_a grad a_i(k,l) and B(l) = b_i(l) - c_b grad b_i(l).]
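Read this way, the FIG. 11 loop amounts to one FIR-filter evaluation plus a gradient-descent up-date per time interval. The sketch below follows that reading; since the printed equations are partly illegible, treat the tap structure, the step sizes c_a and c_b, and the naming as our assumptions rather than the patent's exact procedure.

```python
def normalize_step(history, a, b, prototypes, c_a=0.01, c_b=0.01):
    """One time interval of the FIG. 11 loop as reconstructed above.
    history[k] is the feature vector x_{i-k}; a[k][l] and b[l] are the
    per-tap, per-band operator parameters, up-dated in place."""
    n = len(b)
    # y_i(l) = sum_k a_i(k,l) * x_{i-k}(l) + b_i(l)   (FIR-filter form)
    y = [sum(a[k][l] * history[k][l] for k in range(len(history))) + b[l]
         for l in range(n)]
    # closest prototype under squared Euclidean distance
    dists = [sum((y[l] - p[l]) ** 2 for l in range(n)) for p in prototypes]
    j = dists.index(min(dists))
    # gradient of the squared distance, then a gradient-descent up-date
    for l in range(n):
        diff = y[l] - prototypes[j][l]
        for k in range(len(history)):
            a[k][l] -= c_a * 2.0 * history[k][l] * diff   # grad wrt a(k,l)
        b[l] -= c_b * 2.0 * diff                          # grad wrt b(l)
    return y, j
```

Each call moves the parameters so that the same input would map slightly closer to the selected prototype on the next interval, which is the adaptation the flowchart describes.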

NORMALIZATION OF SPEECH BY ADAPTIVE LABELLING

BACKGROUND OF THE INVENTION

I. Field of the Invention

In general, the present invention relates to speech processing (such as speech recognition). In particular, the invention relates to apparatus and method for characterizing speech as a string of spectral vectors and/or labels representing predefined prototype vectors of speech.

II. Description of the Problem

In speech processing, speech is generally represented by an n-dimensional space in which each dimension corresponds to some prescribed acoustic feature. For example, each component may represent an amplitude of energy in a respective frequency band. For a given time interval of speech, each component will have a respective amplitude. Taken together, the n amplitudes for the given time interval represent an n-component vector in the n-dimensional space.

Based on a known sample text uttered during a training period, the n-dimensional space is divided into a fixed number of regions by some clustering algorithm. Each region represents sounds of a common prescribed type: sounds having component values which are within regional bounds. For each region, a prototype vector is defined to represent the region.

The prototype vectors are defined and stored for later processing. When an unknown speech input is uttered, for each time interval, a value is measured or computed for each of the n components, where each component is referred to as a feature. The values of all of the features are consolidated to form an n-component feature vector for a time interval.

In some instances, the feature vectors are used in subsequent processing. In other instances, each feature vector is associated with one of the predefined prototype vectors and the associated prototype vectors are used in subsequent processing.

In associating prototype vectors with feature vectors, the feature vector for each time interval is typically compared to each prototype vector. Based on a predefined closeness measure, the distance between the feature vector and each prototype vector is determined and the closest prototype vector is selected.

A speech type of event, such as a word or a phoneme, is characterized by a sequence of feature vectors in the time period over which the speech event was produced. Some prior art accounts for temporal variations in the generation of feature vector sequences. These variations may result from differences in speech between speakers or for a single speaker speaking at different times. The temporal variations are addressed by a process referred to as time warping in which time periods are stretched or shrunk so that the time period of a feature vector sequence conforms to the time period of a reference prototype vector sequence, called a template. Oftentimes, the resultant feature vector sequence is styled as a "time normalized" feature vector sequence.

Because feature vectors or prototype vectors (or representations thereof) associated with the feature vectors or both are used in subsequent speech processing, the proper characterization of the feature vectors and proper selection of the closest prototype vector for each feature vector is critical.

The relationship between a feature vector and the prototype vectors has normally, in the past, been static; there has been a fixed set of prototype vectors and a feature vector based on the values of set features. However, due to ambient noise, signal drift, changes in the speech production of the talker, differences between talkers, or a combination of these, signal traits may vary over time. That is, the acoustic traits of the training data from which the prototype vectors are derived may differ from the acoustic traits of the data from which the test or new feature vectors are derived. The fit of the prototype vectors to the new data traits is normally not as good as to the original training data. This affects the relationship between the prototype vectors and later-generated feature vectors, which results in a degradation of performance in the speech processor.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide apparatus and method for adapting feature vectors in order to account for noise and other ambient conditions, as well as intra- and inter-speaker variations, which cause the speech data traits from which feature vectors are derived to vary from the training data traits from which the prototypes are derived.

In particular, each feature vector xi generated at a time interval i is transformed into a normalized vector yi according to the expression:

yi = Ai(x)

where x is a set of one or more feature vectors at or before time interval i and where Ai is an operator function which includes a number of parameters. According to the invention, the values of the parameters in the operator function are up-dated so that the vector y (at a time interval i) is more informative than the feature vector x (at a time interval i) with respect to the representation of the acoustic space characterized by an existing set of prototypes. That is, the transformed vectors yi more closely correlate to the training data upon which the prototype vectors are based than do the feature vectors xi.

Generally, the invention includes transforming a feature vector xi to a normalized vector yi according to an operator function; determining the closest prototype vector for yi; altering the operator function in a manner which would move yi closer to the closest prototype thereto; and applying the altered operator function to the next feature vector in the transforming thereof to a normalized vector. Stated more specifically, the present invention provides that parameters of the operator function be first initialized. The operator function A0 at the first time interval i=0 is defined with the initialized parameters and is applied to a first vector x0 to produce a transformed vector y0. For y0, the closest prototype vector is selected based on an objective closeness function D. The objective function D is in terms of the parameters used in the operator function. Optimizing the function D with respect to the various parameters (e.g., determining, with a "hill-climbing" approach, a value for each parameter at which the closeness function is maximum), up-dated values for the parameters are determined and incorporated into the operator function for the next time interval i=1. The adapted operator function A1 is applied to the next feature vector x1 to produce a normalized vector y1. For the normalized vector y1, the closest prototype vector is selected. The objective function D is again optimized with respect to the various parameters to determine up-dated values for the parameters. The operator function A2 is then defined in terms of the up-dated parameter values. With each successive feature vector, the operator function parameters are up-dated from the previous values thereof.

In accordance with the invention, the following improved outputs are generated. One output corresponds to "normalized" vectors yi. Another output corresponds to respective prototype vectors (or label representations thereof) associated with the normalized vectors.

When a speech processor receives continuously normalized vectors yi as input rather than the raw feature vectors xi, the degradation of performance is reduced. Similarly, for those speech processors which receive successive prototype vectors from a fixed set of prototype vectors and/or label representations as input, performance is improved when the input prototype vectors are selected based on the transformed vectors rather than raw feature vectors.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a general block diagram of a speech processing system.

FIG. 2 is a general block diagram of a speech processing system with designated back ends.

FIG. 3 is a drawing illustrating acoustic space partitioned into regions, where each region has a representative prototype included therein. Feature vectors are also shown, each being associated with a "closest" prototype vector.

FIG. 4 is a drawing illustrating acoustic space partitioned into regions, where each region has a representative prototype included therein. Feature vectors are shown transformed according to the present invention into normalized vectors which are each associated with a "closest" prototype vector.

FIG. 5 is a block diagram showing an acoustic processor which embodies the adaptive labeller of the present invention.

FIG. 6 is a block diagram showing a specific embodiment of an adaptive labeller according to the present invention.

FIG. 7 is a diagram of a distance calculator element of FIG. 6.

FIG. 8 is a diagram of a minimum selector element of FIG. 6.

FIG. 9 is a diagram of a derivative calculator element of FIG. 6.

FIG. 10 is a flowchart generally illustrating the steps of adaptive labelling according to the present invention.

FIG. 11 is a specific flowchart illustrating the steps of adaptive labelling according to the present invention.

DESCRIPTION OF THE INVENTION

In FIG. 1, the general diagram for a speech processing system 100 is shown. An acoustic processor 102 receives as input an acoustic speech waveform and converts it into data which a back-end 104 processes for a prescribed purpose. Such purposes are suggested in FIG. 2.

In FIG. 2, the acoustic processor 102 is shown generating output to three different elements. The first element is a speech coder 110. The speech coder 110 alters the form of the data exiting the acoustic processor 102 to provide a coded representation of speech data. The coded data can be transferred more rapidly and can be contained in less storage than the original uncoded data.

The second element receiving input from the acoustic processor 102 is a speech synthesizer 112. In some environments, it is desired to enhance a spoken input by reducing noise which accompanies the speech signal. In such environments, a speech waveform is passed through an acoustic processor 102 and the data therefrom enters a speech synthesizer 112 which provides a speech output with less noise.

The third element corresponds to a speech recognizer 114 which converts the output of the acoustic processor 102 into text format. That is, the output from the acoustic processor 102 is formed into a sequence of words which may be displayed on a screen, processed by a text editor, used in providing commands to machinery, stored for later use in a textual context, or used in some other text-related manner.

Various examples of the three elements are found in the prior technology. In that the present invention is mainly involved with generating input to these various elements, further details are not provided. It is noted, however, that a preferred use of the invention is in conjunction with a "Speech Recognition System" invented by L. Bahl, S. V. DeGennaro, and R. L. Mercer for which a patent application was filed on Mar. 27, 1986 (Ser. No. 06/845,155), now Pat. No. 4,718,094. The earlier filed application is assigned to the IBM Corporation, the assignee of the present application, and is incorporated herein by reference to the extent necessary to provide background disclosure of a speech recognizer which may be employed with the present invention.

At this point, it is noted that the present invention may be used with any speech processing element which receives as input either feature vectors or prototype vectors (or labels representative thereof) associated with feature vectors. By way of explanation, reference is made to FIG. 3. In FIG. 3, speech is represented by an acoustic space. The acoustic space has n dimensions and is partitioned into a plurality of regions (or clusters) by any of various known techniques referred to as "clustering". In the present embodiment, acoustic space is divided into 200 non-overlapping clusters which are preferably Voronoi regions. FIG. 3 is a two-dimensional representation of part of the acoustic space.

For each region in the acoustic space, there is defined a respective, representative n-component prototype vector. In FIG. 3, four of the 200 prototype vectors, P5, P11, P3, and P56, are illustrated. Each prototype represents a region which, in turn, may be viewed as a "sound type." Each region, it is noted, contains vector points for which the n components, when taken together, are somewhat similar.

In a first embodiment, the n components correspond to energy amplitudes in n distinct frequency bands. The points in a region represent sounds in which the n frequency band amplitudes are collectively within regional bounds.

Alternatively, in another earlier filed patent application commonly assigned to the IBM Corporation, which is incorporated herein by reference, the n components are based on a model of the human ear. That is, a neural firing rate in the ear is determined for each of n frequency bands; the n neural firing rates serving as
the n components which define the acoustic space, the prototype vectors, and feature vectors used in speech recognition. The sound types in this case are defined based on the n neural firing rates, the points in a given region having somewhat similar neural firing rates in the n frequency bands. The prior application, entitled "Nonlinear Signal Processing in a Speech Recognition System", U.S. Ser. No. 06/665,401, was filed on Oct. 26, 1984 and was invented by J. Cohen and R. Bakis.

Referring still to FIG. 3, five feature vectors at respective successive time intervals i=1, i=2, i=3, i=4, and i=5 are shown as x1, x2, x3, x4, and x5, respectively. According to standard prior art methodology, each of the five identified feature vectors would be assigned to the Voronoi region corresponding to the prototype vector P11.

The two selectable outputs for a prior art acoustic processor would be (1) the feature vectors x1, x2, x3, x4, and x5 themselves and (2) the prototypes associated therewith, namely P11, P11, P11, P11, P11, respectively. It is noted that each feature vector x1, x2, x3, x4, and x5 is displaced from the prototype vector P11 by some considerable deviation distance; however, the prior technology ignores the deviation distance.

In FIG. 4, the effect underlying the present invention is illustrated. With each feature vector, at least part of the deviation distance is considered in generating more informative vector outputs for subsequent speech coding, speech synthesis, or speech recognition processing. Looking first at feature vector x1, a transformation is formed based on an operator function A1 to produce a transformed normalized vector y1. The operator function is defined in terms of parameters which, at time interval i=1, are initialized so that y1=x1 in the FIG. 4 embodiment; x1 and y1 are directed to the same point.

It is observed that initialization may be set to occur at time interval i=0 or i=1 or at other time intervals depending on convention. In this regard, in FIG. 4 initialization occurs at time interval i=1; in other parts of the description herein initialization occurs at time interval i=0.

Based on a predefined objective function, an error vector E1 is determined. In FIG. 4, E1 is the difference vector of projected movement of y1 in the direction of the closest prototype thereto. (The meaning of "closeness" is discussed hereinbelow.) E1 may be viewed as a determined error vector for the normalized vector y1 at time interval i=1.

Turning next to feature vector x2, it is noted that y2 is determined by simply vectorally adding the E1 error vector to feature vector x2. A projected distance vector of movement of y2 toward the prototype associated therewith (in this case prototype P11) is then computed according to a predefined objective function. The result of adding (1) the computed projected distance vector from y2 onto (2) the error vector E1 (extending from the feature vector x2) is an error vector E2 for time interval i=2. The error vector E2 is shown in FIG. 4 by a dashed line arrow.

Turning next to feature vector x3, the accumulated error vector E2 is shown being added to vector x3 in order to derive the normalized vector y3. Significantly, it is observed that y3 is in the region represented by the prototype P3. A projected move of y3 toward the prototype associated therewith is computed based on an objective function. The result of adding (1) the computed projected distance vector from y3 onto (2) the error vector E2 (extending from the feature vector x3) is a next error vector E3 for time interval i=3. The error vector E3 in effect builds from the projected errors of previous feature vectors.

Referring still to FIG. 4, it is observed that error vector E3 is added to feature vector x4 to provide a transformed normalized vector y4, which is projected a distance toward the prototype associated therewith. y4 is in the region corresponding to prototype P3; the projected move is thus toward prototype vector P3 by a distance computed according to an objective function. Error vector E4 is generated and is applied to feature vector x5 to yield y5. y5 is in the region corresponding to prototype vector P56; the projected move of y5 is thus toward that prototype vector.

In FIG. 4, each feature vector xi is transformed into a normalized vector yi. It is the normalized vectors which serve as one output of the acoustic processor 102, namely y1, y2, y3, y4, y5. Each normalized vector, in turn, has an associated prototype vector. A second output of the acoustic processor 102 is the associated prototype vector for each normalized vector. In the FIG. 4 example, this second type of output would include the prototype vector string P11, P11, P3, P3, P56. Alternatively, assigning each prototype a label (or "feneme") which identifies each prototype vector by a respective number, the second output may be represented by a string such as 11, 11, 3, 3, 56 rather than the vectors themselves.

In FIG. 5, an acoustic processor 200 which embodies the present invention is illustrated. A speech input enters a microphone 202, such as a Crown PZM microphone. The output from the microphone 202 passes through a pre-amplifier 204, such as a Studio Consultants Inc. pre-amplifier, en route to a filter 206 which operates in the 200 Hz to 8 kHz range. (Precision Filters markets a filter and amplifier which may be used for elements 206 and 208.) The filtered output is amplified in amplifier 208 before being digitized in an A/D convertor 210. The convertor 210 is a 12-bit, 100 kHz analog-to-digital convertor. The digitized output passes through a Fast Fourier Transform FFT/Filter Bank Stage 212 (which is preferably an IBM 3081 Processor). The FFT/Filter Bank Stage 212 separates the digitized output of the A/D convertor 210 according to frequency bands. That is, for a given time interval, a value is measured or computed for each frequency band based on a predefined characteristic (e.g., the neural firing rate mentioned hereinabove). The value for each of the frequency bands represents one component of a point in the acoustic space. For 20 frequency bands, the acoustic space has n=20 dimensions and each point has 20 components.

During a training period in which known sounds are uttered, the characteristic(s) for each frequency band is measured or computed at successive time intervals. Based on the points generated during the training period, in response to known speech inputs, acoustic space is divided into regions. Each region is represented by a prototype vector. In the present discussion, a prototype vector is preferably defined as a fully specified probability distribution over the n-dimensional space of possible acoustic vectors.

A clustering operator 214 (e.g., an IBM 3081 processor) determines how the regions are to be defined, based on the training data. The prototype vectors which represent the regions, or clusters, are stored in a memory 216. The memory 216 stores the components of each prototype vector and, preferably, stores a label (or feneme) which uniquely identifies the prototype vector.
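The FIG. 4 walk-through (error vectors E1, E2, ... accumulated and added to each incoming feature vector) can be sketched as follows. The "projected move" toward the closest prototype is modeled here as a fixed fraction of the remaining gap; the specification computes it from an objective function, so that fraction, the squared-distance measure, and the names are our assumptions.

```python
def normalize_with_error(features, prototypes, step=0.5):
    """Sketch of FIG. 4: y_i = x_i + E_{i-1}; the new error E_i adds the
    projected move of y_i toward its closest prototype, so the error
    builds from the projected errors of previous feature vectors."""
    n = len(prototypes[0])
    error = [0.0] * n            # E_0 = 0, so y_1 = x_1 as in FIG. 4
    out = []
    for x in features:
        y = [xv + ev for xv, ev in zip(x, error)]
        dists = [sum((yv - pv) ** 2 for yv, pv in zip(y, p)) for p in prototypes]
        j = dists.index(min(dists))
        move = [step * (pv - yv) for pv, yv in zip(prototypes[j], y)]
        error = [ev + mv for ev, mv in zip(error, move)]   # E_i = E_{i-1} + move
        out.append((y, j))
    return out
```

Note how the first output equals the raw feature vector (E0 = 0) while every later output is shifted by the accumulated error, matching the y1 = x1 initialization convention of FIG. 4.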
Preferably, the clustering operator 214 divides the acoustic space into 200 clusters, so that there are 200 prototype vectors which are defined based on the training data. Clustering and storing respective prototypes for the clusters are discussed in prior technology.

During the training period, the FFT/Filter Bank Stage 212 provides data used in clustering and forming prototypes. After the training period, the FFT/Filter Bank Stage 212 provides its output to an adaptive labeller 218 (which preferably comprises an IBM 3081 processor). After the training period, and after the prototypes are defined and stored, unknown speech inputs (i.e., an unknown acoustic waveform) are uttered into the microphone 202 for processing. The FFT/Filter Bank Stage 212 produces an output for each successive time interval (i=1, 2, 3, ...), the output having a value for each of the n=20 frequency bands. The 20 values, taken together, represent a feature vector. The feature vectors enter the adaptive labeller 218 as a string of input feature vectors.

The other input to the adaptive labeller 218 is from the prototype memory 216. The adaptive labeller 218, in response to an input feature vector, provides as output: (1) a normalized output vector and (2) a label corresponding to the prototype vector associated with a normalized output vector. At each successive time interval, a respective normalized output vector and a corresponding label (or feneme) is output from the adaptive labeller 218.

FIG. 6 is a diagram illustrating a specific embodiment of an adaptive labeller 300 (see labeller 218 of FIG. 5). The input feature vectors xi are shown entering a counter 302. The counter 302 increments with each time interval starting with i=0. At i=0, initial parameters are provided by memory 304 through switch 306 to a parameter storage memory 308. The input feature vector x0 enters an FIR filter 310 together with the stored parameter values. The FIR filter 310 applies the operator function A0 to the input feature vector x0 as discussed hereinabove. (A preferred operator function is outlined in the description hereinbelow.) The normalized output vector y0 from the FIR filter 310 serves as an output of the adaptive labeller 300 and also as an input to distance calculator 312 of the labeller 300. The distance calculator 312 is also connected to the proto- [text missing in this copy] 308 are incorporated into the operator function implemented by the FIR filter 310 to generate a normalized output vector y1. y1 exits the labeller 300 as the output vector following y0 and also enters the distance calculator 312. An associated prototype is selected by the minimum selector 314; the label therefor is provided as the next prototype output from the labeller 300. The parameters are again up-dated by means of the derivative calculator 316 and the filter 318.

Referring to FIG. 7, a specific embodiment of the distance calculator 312 is shown to include an adder 400 for subtracting the value of one frequency band of a given prototype vector from the normalized value of the same band of the output vector. In similar fashion, a difference value is determined for each band. Each resulting difference is supplied to a squarer element 402. The output of the squarer element 402 enters an accumulator 404. The accumulator 404 sums the difference values for all bands. The output from the accumulator 404 enters the minimum selector 314.

FIG. 8 shows a specific minimum selector formed of a comparator 410 which compares the current minimum distance dj against the current computed distance dk for a prototype vector Pk. If dk < dj, then j is set to k; otherwise j retains its value. After all distance computations are processed by the comparator 410, the last value for j represents the (label) prototype output.

FIG. 9 shows a specific embodiment for the derivative calculator which includes an adder 420 followed by a multiplier 422. The adder 420 subtracts the associated prototype from the normalized output vector; the difference is multiplied in the multiplier 422 by another value (described in further detail with regard to FIG. 11).

FIG. 10 is a general flow diagram of a process 500 performed by the adaptive labeller 300. Normalization parameters are initialized in step 502. Input speech is converted into input feature vectors in step 504. The input feature vectors xi are transformed in step 506 into normalized vectors yi which replace the input feature vectors in subsequent speech processing. The normalized vectors provide one output of the process 500. The closest prototype for each normalized vector is found in step 508 and th...
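The distance calculator of FIG. 7 (adder, squarer, accumulator) and the minimum selector of FIG. 8 (comparator) reduce to a few lines of code. A sketch, with the comparison read as selecting the smaller distance and with function names of our choosing:

```python
def band_distance(y, prototype):
    """FIG. 7 in code: per-band difference (adder 400), square
    (squarer 402), and a running sum over all bands (accumulator 404)."""
    acc = 0.0
    for l in range(len(y)):
        acc += (y[l] - prototype[l]) ** 2
    return acc

def minimum_selector(y, prototypes):
    """FIG. 8 in code: a comparator that keeps the index j of the
    smallest distance seen so far; the final j is the prototype label."""
    j, d_j = 0, band_distance(y, prototypes[0])
    for k in range(1, len(prototypes)):
        d_k = band_distance(y, prototypes[k])
        if d_k < d_j:
            j, d_j = k, d_k
    return j
```

Streaming the comparison this way mirrors the hardware: only the current minimum distance and its index need to be held while the 200 prototype distances are computed one at a time.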

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket