IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-27, NO. 3, JUNE 1979
Abstract-Predictive coding methods attempt to minimize the rms error in the coded signal. However, the human ear does not perceive signal distortion on the basis of rms error, regardless of its spectral shape relative to the signal spectrum. In designing a coder for speech signals, it is necessary to consider the spectrum of the quantization noise and its relation to the speech spectrum. The theory of auditory masking suggests that noise in the formant regions would be partially or totally masked by the speech signal. Thus, a large part of the perceived noise in a coder comes from frequency regions where the signal level is low. In this paper, methods for reducing the subjective distortion in predictive coders for speech signals are described and evaluated. Improved speech quality is obtained: 1) by efficient removal of formant and pitch-related redundant structure of speech before quantizing, and 2) by effective masking of the quantizer noise by the speech signal.
`
Manuscript received July 18, 1978; revised November 28, 1978.
B. S. Atal is with Bell Laboratories, Murray Hill, NJ 07974.
M. R. Schroeder is with the Drittes Physikalisches Institut, University of Göttingen, Göttingen, Germany, and Bell Laboratories, Murray Hill, NJ 07974.
0096-3518/79/0600-0247$00.75 © 1979 IEEE
Ex. 1027 / Page 1 of 8, Apple v. Saint Lawrence

I. INTRODUCTION
For autocorrelated signals, such as speech, predictive coding [1]-[4] is an efficient method of encoding the signal into digital form. The coding efficiency is achieved by quantizing and transmitting only the signal which cannot be predicted from the already coded signal. In predictive coders, the power of the quantizer noise is proportional to the power of the prediction error. Thus, efficient prediction is important for minimizing the quantizer error. Small quantization error, however, does not ensure that the distortion in the speech signal is perceptually small; it is necessary to consider the spectrum of the quantization noise and its relation to the speech spectrum. The theory of auditory masking suggests that noise in the formant regions would be partially or totally masked by the speech signal. Thus, a large part of the perceived noise in a coder comes from the frequency regions where the signal level is low. Moreover, we can tolerate more distortion in the transitional segments in speech (where rapidly changing formants produce wider formant regions) in comparison to the steady segments.
In this paper, we discuss methods for modifying the spectrum of the quantization noise in a predictive coding system for speech to reduce the perceptible distortion introduced by such coders. The proper spectral shaping is realized by controlling the frequency response of the feedback network in the predictive coder independently of the predictor. The methods permit adaptive adjustment of the noise spectrum dependent
on the time-varying speech spectrum. Improved speech quality is obtained both by exploiting auditory masking and by efficient prediction of formant and pitch-related redundancies in speech before quantizing.

II. SPECTRUM OF QUANTIZING NOISE IN PREDICTIVE CODERS
Fig. 1 shows a schematic diagram of a predictive coder. Its operation can be summarized as follows. The input speech signal is sampled to produce a sequence of sample values $s_0, s_1, \ldots, s_n, \ldots$. The predictor P forms a linear estimate of each sample value based on the previously decoded output sample values (reconstructed speech samples). This estimate $\xi_n$ is subtracted from the actual sample value $s_n$ and the resulting difference $q_n$ is quantized and transmitted to the receiver. The quantized difference $\hat{q}_n$ is added back to the predicted value $\xi_n$ both at the transmitter and at the receiver to form the next output sample $\hat{s}_n$. It is readily seen in Fig. 1 that, in the absence of channel errors, the coder output $\hat{s}_n$ is given by

$$\hat{s}_n = \xi_n + \hat{q}_n = \xi_n + s_n - \xi_n + \delta_n = s_n + \delta_n \tag{1}$$

where $\delta_n$ is the error introduced by the quantizer (difference between the output and the input of the quantizer) at the nth sample. Thus, the difference between the output and the input speech sample values is identical to the error introduced by the quantizer. Assuming that the spectrum of the quantizer error is white (a reasonable assumption, particularly if the prediction error is white), the noise at the output of the coder is also white.

Fig. 1. Block diagram of a predictive coder.

III. GENERALIZED PREDICTIVE CODER
The quantizer input $q_n$ in Fig. 1 can be written as

$$q_n = s_n - \sum_{k=1}^{m} \hat{s}_{n-k} a_k = s_n - \sum_{k=1}^{m} (s_{n-k} + \delta_{n-k}) a_k \tag{2}$$

where the predictor P is represented by a transversal filter with m delays and m gains $a_1, a_2, \ldots, a_m$. The quantizer input thus consists of two parts: 1) the prediction error based on the prediction of the input $s_n$ from its own past, and 2) the filtered error signal obtained by filtering of the quantizer error through the filter P. An easy way of modifying the spectrum of the quantizing noise is to use a filter F different from P for filtering the quantizer error [5], [6]. Fig. 2 shows such a generalized predictive coder. The quantizer input $q_n$ is now given by

$$q_n = s_n - \sum_{k=1}^{m} s_{n-k} a_k - \sum_{k=1}^{m'} \delta_{n-k} b_k \tag{3}$$

where $b_1, b_2, \ldots, b_{m'}$ are $m'$ gain coefficients of the transversal filter F. We will assume that both 1 - F and 1 - P have their roots inside the unit circle. The coder output is now given as

$$\hat{s}_n = s_n + \delta_n - \sum_{k=1}^{m'} \delta_{n-k} b_k + \sum_{k=1}^{m} (\hat{s}_{n-k} - s_{n-k}) a_k. \tag{4}$$

Representing Fourier transforms by upper case letters, (4) can be written in frequency-domain notations as

$$\hat{S} - S = \Delta \, \frac{1 - F}{1 - P}. \tag{5}$$

For F = P, the output noise is the same as the quantizer noise, and the two coders shown in Figs. 1 and 2 are identical. However, with F ≠ P, the coder of Fig. 2 allows greater flexibility in controlling the spectrum of output noise based on the
`ATAL AND SCHROEDER: PREDICTIVE CODING OF SPEECH SIGNALS
Fig. 2. Block diagram of a generalized predictive coder with adjustable noise spectrum.
Fig. 3. Another configuration for the generalized predictive coder with adjustable noise spectrum.

choice of the feedback filter F.¹ Under the assumption that the quantizer noise is white, the spectrum of the coder output noise is determined only by the factor (1 - F)/(1 - P) as shown in (5). Let the squared magnitude of this factor at a frequency f be $\Gamma(f)$. Then

$$\Gamma(f) = \left| \frac{1 - F(e^{2\pi j f T})}{1 - P(e^{2\pi j f T})} \right|^2 \tag{6}$$

where T is the sampling interval. Equation (6) implies an important constraint on the average value of $\log \Gamma(f)$, that is,

$$\frac{1}{f_s} \int_0^{f_s} \log \Gamma(f)\, df = 0 \tag{7}$$

where $f_s$ is the sampling frequency. Expressed on a decibel scale, the average value of $\log \Gamma(f)$ is 0 dB. The proof of (7) is relatively straightforward [7] and is outlined below.
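As a quick numerical illustration of (7), separate from the proof, the average of $\log \Gamma(f)$ can be evaluated on a dense frequency grid. The sketch below is not from the paper: the root locations, the helper names `pair` and `mean_log_gamma`, and the grid size are all assumptions chosen for illustration. It also shows that the average is no longer zero when a zero of 1 - F lies outside the unit circle.

```python
import numpy as np

def pair(radius, angle):
    """A complex-conjugate pair of roots (hypothetical test values)."""
    return [radius * np.exp(1j * angle), radius * np.exp(-1j * angle)]

def mean_log_gamma(f_roots, p_roots, n_freq=4096):
    """Average of log Gamma(f) over one period, Gamma = |1 - F|^2 / |1 - P|^2.

    f_roots and p_roots are the zeros of 1 - F(z) and 1 - P(z).
    """
    # np.poly returns the coefficients of prod_k (1 - z_k z^-1) in powers of z^-1.
    one_minus_f = np.poly(f_roots)
    one_minus_p = np.poly(p_roots)
    # A zero-padded FFT samples each polynomial at n_freq points on the unit circle.
    num = np.abs(np.fft.fft(one_minus_f, n_freq)) ** 2
    den = np.abs(np.fft.fft(one_minus_p, n_freq)) ** 2
    return float(np.mean(np.log(num / den)))

p_roots = pair(0.95, 0.3) + pair(0.90, 1.2)      # zeros of 1 - P, inside the unit circle
f_roots = pair(0.70, 0.3) + pair(0.60, 1.2)      # zeros of 1 - F, inside the unit circle
f_roots_bad = pair(0.70, 0.3) + [1.25]           # one zero outside the unit circle

inside = mean_log_gamma(f_roots, p_roots)
outside = mean_log_gamma(f_roots_bad, p_roots)
print("all zeros inside :", inside)
print("one zero outside :", outside, "vs 2*log(1.25) =", 2 * np.log(1.25))
```

With every zero inside the unit circle the average is zero to within rounding error, so shaping can only move noise between frequencies. The second case illustrates why the minimum-phase assumption matters: each zero at radius r > 1 adds 2 log r to the average.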
¹A somewhat different configuration of a predictive coder for controlling the spectrum of the output noise is shown in Fig. 3. It is easily verified that the Fourier transform of the output noise for this coder is given by $\Delta(1 - A)/(1 - B)$. In principle, the coder of Fig. 2 can be made equivalent to the coder of Fig. 3 by appropriate choice of the filter F. However, as a practical matter, one or the other coder may be simpler to implement depending on the choice for the spectrum of the output noise.

Consider the function 1 - F which is expressed in the z-transform notation as

$$1 - F(z) = \prod_{k=1}^{m'} (1 - z_k z^{-1}) \tag{8}$$

where $z_k$ is the kth root of 1 - F(z). The function log [1 - F(z)] is given by

$$\log [1 - F(z)] = \sum_{k=1}^{m'} \log (1 - z_k z^{-1}). \tag{9}$$

Since $|z_k| < 1$ (all the zeros of 1 - F(z) are inside the unit circle), the right side of (9) can be expressed as a polynomial function of $z^{-1}$. Therefore,

$$\log [1 - F(z)] = -\sum_{n=1}^{\infty} c_n z^{-n} \tag{10}$$

where $c_n = \frac{1}{n} \sum_{k=1}^{m'} z_k^n$. The integral of log [1 - F(z)] over the frequency range from 0 to $f_s$ is then given by

$$\int_0^{f_s} \log [1 - F(e^{2\pi j f T})]\, df = -\sum_{n=1}^{\infty} c_n \int_0^{f_s} e^{-2\pi j n f T}\, df = 0. \tag{11}$$

Similarly, it can be shown that
$$\int_0^{f_s} \log [1 - P(e^{2\pi j f T})]\, df = 0. \tag{12}$$

Equation (7) follows directly from (11) and (12).
Assuming that the power of the quantizer noise $\delta_n$ is not changed significantly by the feedback loop (a desirable condition for satisfactory operation of the coder), the average value of log power spectrum of output noise is then determined solely by the quantizer and is not altered by the choice of the filter F or the predictor P. The filter F, however, redistributes the noise power from one frequency to another. Thus, reduction in quantizer noise at one frequency can be obtained only at the expense of increasing the quantizer noise at another frequency. Since a large part of perceived noise in a coder comes from the frequency regions where the signal level is low, the filter F can be used to reduce the noise in such regions while increasing the noise in the formant regions where the noise could be effectively masked by the speech signal. Some examples of the possible shapes for the spectrum of the quantizing noise together with the speech spectrum are illustrated in Fig. 4. In each case, the logarithmic spectrum of the quantizing noise has equal area above and below the average level shown by the dashed line.

Fig. 4. Two possible shapes for the spectrum of output noise (solid curve) in the coder shown in Fig. 2. The average level of the logarithmic spectrum (shown as a dashed line) is the same in both cases. The speech spectrum is shown by the dotted curve. Horizontal axis: frequency (kHz).

IV. APPLICATION TO SPEECH SIGNALS
A. Selection of Predictor
Linear prediction is a well-known method of removing the redundancy in a signal. For speech, the prediction is done most conveniently in two separate stages [4], [8]: a first prediction based on the short-time spectral envelope of speech, and a second prediction based on the periodic nature of the spectral fine structure. The short-time spectral envelope of speech is determined by the frequency response of the vocal tract and for voiced speech also by the spectrum of the glottal pulse. The spectral fine structure arising from the quasi-periodic nature of voiced speech is determined mainly by the pitch period. The fine structure for unvoiced speech is random and cannot be used for prediction.

Prediction Based on Spectral Envelope
Prediction based on the spectral envelope involves relatively short delays. The predictor can be characterized in the z-transform notation as

$$P_s(z) = \sum_{k=1}^{p} a_k z^{-k} \tag{13}$$

where $z^{-1}$ represents a delay of one sample interval and $a_1, a_2, \ldots, a_p$ are p predictor coefficients. The value of p typically is 10 for speech sampled at 8 kHz. A higher value may often be desirable.
The input to the quantizer in Fig. 2 consists of two parts: 1) the prediction error $d_n$ based on the prediction of the input $s_n$ from its own past, and 2) the filtered error signal $f_n$ obtained by filtering of the quantizer noise $\delta_n$ through the filter F. Under the assumption that the quantizer noise is uncorrelated with the prediction error, the total power $\epsilon_q$ at the input to the quantizer is the sum of the powers in the prediction error $d_n$ and the filtered noise $f_n$. That is,

$$\epsilon_q = \epsilon_p + \epsilon_f, \tag{14}$$

where $\epsilon_p$ is the power in the prediction error and $\epsilon_f$ is the power in the filtered noise. It is our experience that, for satisfactory operation of the coder, $\epsilon_f$ should be less than $\epsilon_p$. The power in the filtered noise is determined both by the power in the quantizer error $\delta_n$ and the power gain G of the filter F. The power gain G equals the sum of the squares of the filter coefficients. For $F = P_s$, the power gain is usually large and can often exceed 200. Such a high power gain causes excessive feedback of the noise power to the quantizer input, particularly for coarse quantizers, resulting in poor performance of the coder. Excessive feedback can be prevented by requiring that the power gain of the filter $P_s(z)$ is not large. The reason for the high power gain is as follows.
The spectrum of the filter $1 - P_s(z)$ is approximately the reciprocal of the speech spectrum (within a scaling constant). The low-pass filter used in the analog-to-digital conversion of the speech signal forces the reciprocal spectrum (and thus $|1 - P_s(z)|$) to assume a high value in the vicinity of the cutoff frequency of the filter. The power gain, which is equal to the integral of the power spectrum $|1 - P_s(e^{2\pi j f T})|^2$ with respect to the frequency variable f, thus also becomes large.
These artificially high power gains would not arise if the low-pass filter used in the sampling process were an ideal low-pass filter with a cutoff frequency exactly equal to half the sampling frequency. The amplitude-versus-frequency response of a practical low-pass filter falls off gradually. The computed covariance matrix used in LPC analysis therefore has missing components corresponding to the speech signal rejected by the low-pass filter. The missing high-frequency components produce artificially low eigenvalues of the covariance matrix corresponding to eigenvectors related to such components. The high power gain of $P_s$ is precisely caused by the small eigenvalues. The covariance matrix of the low-pass filtered speech is nearly singular, thereby resulting in a nonunique solution of the predictor coefficients. Thus, a variety of different predictor coefficients can approximate the speech spectrum equally well in the passband of the low-pass filter. We wish to avoid solutions which lead to high power gains of the predictor $P_s$.
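The two descriptions of the power gain used above, a sum of squared coefficients and a frequency average of the power spectrum, agree by Parseval's relation. The following is a minimal sketch of that equivalence. The predictor `a_sharp` is a made-up example (six zeros of 1 - P placed at z = 0.98), not a predictor from the paper; it merely mimics a steeply low-pass signal whose inverse filter is large at high frequencies.

```python
import numpy as np

def power_gain(coeffs):
    """Sum of squared filter coefficients (the power gain as defined in the text)."""
    c = np.asarray(coeffs, dtype=float)
    return float(np.sum(c ** 2))

def mean_power_spectrum(coeffs, n_freq=1024):
    """Average of |H(e^{jw})|^2 over one period for H(z) = sum_k c_k z^-k."""
    c = np.asarray(coeffs, dtype=float)
    spectrum = np.abs(np.fft.fft(c, n_freq)) ** 2
    return float(np.mean(spectrum))

# Hypothetical predictors: 1 - P(z) = prod_k (1 - z_k z^-1), so a_k = -coefficient_k.
a_mild = -np.poly([0.5, -0.3])[1:]
a_sharp = -np.poly([0.98] * 6)[1:]

for name, a in (("mild", a_mild), ("sharp", a_sharp)):
    print(name, "sum of squares:", power_gain(a),
          "mean power spectrum:", mean_power_spectrum(a))
```

The equality holds for any FIR filter as long as the FFT length is at least the number of coefficients. The example only illustrates how coefficient magnitudes grow when 1 - P has to be large near half the sampling frequency.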
The ill-conditioning of the covariance matrix can be avoided by adding to the covariance matrix another matrix proportional to the covariance matrix of high-pass filtered white noise. We define a new covariance matrix $\hat{\Phi}$ (with its (ij)th term represented by $\hat{\phi}_{ij}$) and a new correlation vector $\hat{c}$ (with its ith term represented by $\hat{c}_i$) by the equations

$$\hat{\phi}_{ij} = \phi_{ij} + \lambda \epsilon_{\min} \mu_{i-j} \tag{15}$$

and

$$\hat{c}_i = c_i + \lambda \epsilon_{\min} \mu_i \tag{16}$$

where

$$\phi_{ij} = \langle s_{n-i} s_{n-j} \rangle, \qquad c_i = \langle s_n s_{n-i} \rangle,$$

$\lambda$ is a small constant (suitable values are in the range 0.01-0.10), $\epsilon_{\min}$ is the minimum value of the mean-squared prediction error, $\mu_i$ is the autocorrelation of the high-pass filtered white noise at a delay of i samples, and $\langle\ \rangle$ indicates averaging over the speech samples contained in the analysis segment. Ideally, the high-pass filter should be the filter complementary to the low-pass filter used in the sampling process. We have obtained reasonably satisfactory results with the high-pass filter $[\frac{1}{2}(1 - z^{-1})]^2$. For this filter, the autocorrelations are $\mu_0 = 3/8$, $\mu_1 = -1/4$, $\mu_2 = 1/16$, and $\mu_k = 0$ for $k > 2$. By making the scale factor on the noise covariance matrix in (15) and (16) proportional to the mean-squared prediction error, we find that it is possible to use a fixed value of $\lambda$. The results are not very sensitive to small variations in the value of $\lambda$. The minimum value of the mean-squared prediction error is determined by the Cholesky decomposition [8] of the original covariance matrix $[\phi_{ij}]$. A modified form [9] of the covariance method is used to determine the predictor coefficients from the new covariance matrix $\hat{\Phi}$. The first two steps in this modified procedure are identical to the usual covariance method [8]. That is, the matrix $\hat{\Phi}$ is expressed as a product of a lower triangular matrix L and its transpose $L^t$ by Cholesky decomposition and a set of linear equations $Lq = \hat{c}$ is solved. The partial correlation at a delay m is obtained from

$$r_m = \frac{q_m}{\left[ \langle s_n^2 \rangle - \sum_{i=1}^{m-1} q_i^2 \right]^{1/2}} \tag{17}$$

where $q_m$ is the mth component of q. The partial correlations are transformed to predictor coefficients using the well-known relation between the partial correlations and the predictor coefficients for all-pole filters [7, p. 110]. The modified procedure ensures that all of the zeros of the polynomial $1 - P_s(z)$ are inside the unit circle. Using the above procedure, reasonable power gains are realized without introducing significant bias in the spectrum at lower frequencies.² Examples of spectral envelopes of speech computed for $\lambda = 0$ (uncorrected) and for $\lambda = 0.05$ with the high-pass filter $[\frac{1}{2}(1 - z^{-1})]^2$ are illustrated in Fig. 5. The power gain of $P_s$ decreased from 204.6 for $\lambda = 0$ to 12.6 for $\lambda = 0.05$. The speech signal was sampled at a rate of 10 kHz. The anti-aliasing filter had attenuation of 3 dB at 4.2 kHz and more than 40 dB at 5 kHz.

Fig. 5. Spectral envelopes of speech based on LPC analysis with high-frequency correction for $\lambda = 0$ (solid curve) and for $\lambda = 0.05$ (dotted curve). The power gains in the two cases are 204.6 and 12.6, respectively. Horizontal axis: frequency (kHz).

Prediction Based on Spectral Fine Structure
Adjacent pitch periods in voiced speech show considerable similarity. The quasi-periodic nature of the signal is present, although to a lesser extent, in the difference signal obtained after prediction on the basis of spectral envelope. The periodicity of the difference signal can be removed by further prediction. The predictor for the difference signal can be characterized in the z-transform notation by

$$P_d(z) = \beta_1 z^{-M+1} + \beta_2 z^{-M} + \beta_3 z^{-M-1} \tag{18}$$

where M represents a relatively long delay in the range 2 to 20 ms. In most cases, this delay would correspond to a pitch period (or possibly, an integral number of pitch periods). The degree of periodicity in the difference signal varies with frequency. The three amplitude coefficients $\beta_1$, $\beta_2$, and $\beta_3$ provide a frequency-dependent gain factor in the pitch-prediction process. We found it necessary to use at least a third-order predictor for pitch prediction. The difference signal after prediction based on spectral envelope has a nearly flat spectrum up to half the sampling frequency. Due to a fixed sampling frequency unrelated to pitch period, the individual samples of the difference signal do not show a high period-to-period correlation. The third-order pitch predictor provides an interpolated value with a much higher correlation than the individual samples. Higher order pitch predictors provide even better improvement in the prediction gain.
Let the nth sample of the difference signal after the first ("formant") prediction be given by

$$d_n = s_n - \sum_{k=1}^{p} s_{n-k} a_k. \tag{19}$$

²Another possible solution, namely, undersampling of the speech signal, for avoiding excessive power gains was suggested by one of the reviewers of this paper. We did not try this solution and are unable to comment on its effectiveness.
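The high-frequency correction of (15) and (16) can be illustrated with a short numerical sketch. Several things here are assumptions rather than the paper's procedure: the test signal is synthetic low-pass filtered noise, $\epsilon_{\min}$ is taken from an ordinary least-squares fit instead of a Cholesky decomposition, and the corrected equations are solved directly instead of through the partial correlations of (17). Function names such as `covariance_terms` are hypothetical.

```python
import numpy as np

def covariance_terms(s, p):
    """phi[i-1, j-1] = <s[n-i] s[n-j]> and c[i-1] = <s[n] s[n-i]> for i, j = 1..p."""
    s = np.asarray(s, dtype=float)
    n = np.arange(p, len(s))
    past = np.stack([s[n - i] for i in range(1, p + 1)], axis=1)
    target = s[n]
    phi = past.T @ past / len(n)
    c = past.T @ target / len(n)
    return phi, c, past, target

def highpass_autocorrelation(p):
    """mu_0..mu_p for white noise filtered by [0.5 (1 - z^-1)]^2 (requires p >= 2)."""
    h = np.array([0.25, -0.5, 0.25])             # impulse response of the filter
    full = np.correlate(h, h, mode="full")       # autocorrelation at lags -2..2
    mu = np.zeros(p + 1)
    mu[:3] = full[2:]                            # keep lags 0, 1, 2
    return mu

def corrected_terms(s, p, lam=0.05):
    """Apply (15) and (16); also return the uncorrected predictor and eps_min."""
    phi, c, past, target = covariance_terms(s, p)
    a_raw, *_ = np.linalg.lstsq(past, target, rcond=None)
    eps_min = float(np.mean((target - past @ a_raw) ** 2))
    mu = highpass_autocorrelation(p)
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    phi_hat = phi + lam * eps_min * mu[lags]     # (15)
    c_hat = c + lam * eps_min * mu[1:p + 1]      # (16)
    return phi_hat, c_hat, a_raw, eps_min

# Synthetic stand-in for low-pass filtered speech (an assumption, not real data).
rng = np.random.default_rng(0)
s = np.convolve(rng.standard_normal(4000), np.ones(6) / 6.0, mode="same")

p = 10
phi_hat, c_hat, a_raw, eps_min = corrected_terms(s, p, lam=0.05)
a_hat = np.linalg.solve(phi_hat, c_hat)          # plain solve, not the modified method
print("power gain, uncorrected:", float(np.sum(a_raw ** 2)))
print("power gain, corrected  :", float(np.sum(a_hat ** 2)))
```

Because the plain solve skips the partial-correlation step, this sketch does not by itself guarantee that the zeros of $1 - P_s(z)$ fall inside the unit circle. It only shows how the corrected covariance terms are assembled.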
The delay M of the predictor $P_d(z)$ is defined as the delay for which the normalized correlation coefficient between $d_n$ and $d_{n-M}$ is highest. The coefficients $\beta_1$, $\beta_2$, and $\beta_3$ are determined by minimizing the mean-squared prediction error between $d_n$ and its predicted value. The minimization procedure leads to a set of simultaneous linear equations in the three unknowns $\beta_1$, $\beta_2$, and $\beta_3$.
Examples of prediction error signals after each stage of prediction together with the original speech signal are illustrated in Fig. 6. The prediction error after the first prediction based on the spectral envelope is amplified by 10 dB in the display. The prediction error after the second prediction based on pitch periodicity is amplified by an additional 10 dB. The prediction error after pitch prediction is quite noise-like in nature. Its spectrum, including both envelope and fine structure, is nearly white.

Fig. 6. (a) Speech waveform. (b) Difference signal after prediction based on spectral envelope (amplified 10 dB relative to the speech waveform). (c) Difference signal after prediction based on pitch periodicity (amplified 20 dB relative to the speech waveform). Horizontal axis: time (ms).

B. Selection of the Feedback Filter
The filter F providing feedback of the quantizer error in Fig. 2 can be chosen to minimize an error measure in which the noise is weighted according to some subjectively meaningful criterion. For example, an effective noise-to-signal ratio can be defined by weighting the noise power at each frequency f by a function W(f). For a fixed quantizer, the spectrum of the output noise is proportional to $|(1 - F)/(1 - P)|^2$. The signal spectrum is proportional to $|1/(1 - P)|^2$. The ratio of noise power to signal power at any frequency f is proportional to

$$G(f) = |1 - F(e^{2\pi j f T})|^2. \tag{20}$$

One could choose F to minimize

$$\frac{1}{f_s} \int_0^{f_s} G(f) W(f)\, df \tag{21}$$

under the constraint

$$\frac{1}{f_s} \int_0^{f_s} \log G(f)\, df = 0. \tag{22}$$

The solution is given by

$$\log G(f) = -\log W(f) + \frac{1}{f_s} \int_0^{f_s} \log W(f)\, df. \tag{23}$$

The function 1 - F is the minimum-phase transfer function with spectrum G(f) and can be obtained by direct Fourier transformation or spectral factorization. A particularly simple solution to this problem is obtained by transforming 1/G(f) to an autocorrelation function by Fourier transformation. By using a procedure similar to LPC analysis, the autocorrelation function can be used to determine a set of predictor coefficients. The predictor coefficients determined in this manner are indeed the desired filter coefficients $b_k$ for the filter F. The solution is considerably simplified also if the noise-weighting function W(f) is expressed in terms of a filter transfer function whose poles and zeros lie inside the unit circle.

C. Generalized Predictive Coder for Speech
The block diagram of a generalized predictive coder for speech signals is shown in Fig. 7. The high frequencies in the speech signal are preemphasized with a filter $1 - 0.4z^{-1}$ prior to encoding. At the output, the high frequencies are deemphasized back with an inverse filter $(1 - 0.4z^{-1})^{-1}$. The speech samples are filtered by the filter $1 - P_s$ to produce the difference signal after the first prediction based on the short-time spectral envelope of the speech signal. The second predictor $P_d$ based on pitch periodicity forms an estimate $d'_n$ of the difference signal at the nth sample based on the past quantizer output samples. The quantizer error is filtered and peak-limited to produce the sample $f_n$. The composite signal $q_n = d_n - d'_n - f_n$ is quantized by an adaptive three-level quantizer. The parameters of the quantizer are selected to be optimum [10] for a Gaussian input with standard deviation equal to the rms value of the prediction error (after prediction by both $P_s$ and $P_d$). For a uniformly spaced three-level quantizer, the optimum spacing between the quantizer output levels is 1.22 times the rms value of the prediction error. Both predictors and the quantizer parameters are reset once every 10 ms. An interval of 20 ms is used to minimize the prediction error in determining the predictor parameters. For a total of 40 s of speech spoken by two male and two female speakers, the average prediction gains for the two predictors, based on spectral envelope and pitch periodicity, were found to be 11.0 and 5.6 dB, respectively. The average signal-to-noise ratio (SNR) of the three-level quantizer was found to be 6.1 dB. The speech was relatively noise-free with an average SNR of approximately 40 dB.

Fig. 7. Block diagram of a generalized predictive coder for speech.
Fig. 8. Spectral envelopes of output quantizing noise (dotted curve) and the corresponding speech spectrum (solid curve) for F = 0. Horizontal axis: frequency (kHz).
Fig. 9. Spectral envelope of the speech signal and flat quantization noise ($F = P_s$). Horizontal axis: frequency (kHz).

It can be shown that the stability of the feedback loop is ensured only if the function 1 - F is a minimum-phase transfer function [11]. Several methods of ensuring the stability of the feedback loop were considered for situations where the minimum-phase requirement could not be satisfied. A simple and effective solution was found to be to limit the peak amplitude of the samples $f_n$. The peak limiter in the feedback loop limits the samples $f_n$ to a maximum value of twice the rms value of the prediction error. For one of the choices for the filter F, several instances of instability in the feedback loop were encountered without the peak limiter. However, our results based on extensive testing with the speech data revealed no significant increase of the quantization noise after inclusion of the peak limiter in the feedback loop.
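The pitch predictor $P_d$ used in this coder follows the two-step recipe given in Section IV-A: pick the lag M with the highest normalized correlation, then solve a 3 x 3 set of linear equations for the taps of (18). The sketch below is one way to write that down. The lag search range, the synthetic test signal, and the function name `pitch_predictor` are assumptions made for illustration only.

```python
import numpy as np

def pitch_predictor(d, min_lag, max_lag):
    """Return (M, beta) for P_d(z) = b1 z^-(M-1) + b2 z^-M + b3 z^-(M+1).

    Requires min_lag >= 2 so that all three taps refer to past samples.
    """
    d = np.asarray(d, dtype=float)
    n = np.arange(max_lag + 1, len(d))           # samples with a full history
    # Step 1: lag with the highest normalized correlation between d[n] and d[n-M].
    best_m, best_rho = min_lag, -np.inf
    for m in range(min_lag, max_lag + 1):
        x, y = d[n], d[n - m]
        rho = np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y))
        if rho > best_rho:
            best_m, best_rho = m, rho
    # Step 2: least-squares fit of the three taps at lags M-1, M, M+1.
    taps = np.stack([d[n - best_m + 1], d[n - best_m], d[n - best_m - 1]], axis=1)
    beta = np.linalg.solve(taps.T @ taps, taps.T @ d[n])
    return best_m, beta

# Synthetic quasi-periodic difference signal with a 40-sample period (an assumption).
rng = np.random.default_rng(1)
d = rng.standard_normal(4000)
for i in range(40, len(d)):
    d[i] += 0.8 * d[i - 40]

M, beta = pitch_predictor(d, min_lag=20, max_lag=200)
n = np.arange(201, len(d))
residual = d[n] - (beta[0] * d[n - M + 1] + beta[1] * d[n - M] + beta[2] * d[n - M - 1])
gain = float(np.mean(d[n] ** 2) / np.mean(residual ** 2))
print("lag:", M, "taps:", beta, "prediction gain (dB):", 10 * np.log10(gain))
```

On real speech the pitch period rarely falls on an integer number of samples, which is where the outer taps $\beta_1$ and $\beta_3$ earn their keep. In this synthetic example the period is an exact integer, so they come out close to zero.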
Several choices for the feedback filter F in Fig. 7 were investigated. Some of the more interesting ones are discussed below.
1) Assume that the noise-weighting function W(f) in (23) is constant for all frequencies. This would be a good choice if our ears were equally sensitive to quantizing distortion at all frequencies. The solution is F = 0. In this case, the coder output noise has the same spectral envelope as the original speech (but at a lower level) as shown in Fig. 8. This particular choice leads to a signal-to-noise ratio of 13 dB with a three-level quantizer. Subjectively, the reconstructed speech is quite noisy. But the distortion is heard mostly at low frequencies, implying that W(f) = constant is a good choice at the higher frequencies.
2) Assume that the distortion is subjectively more important in the formant regions and less important in the regions between formants. As an extreme case, let $W = |1 - P_s|^{-2}$ which leads to $F = P_s$, the same system as the original predictive coder shown in Fig. 1. For a given quantizer, such a choice results in minimum unweighted noise power in the reconstructed speech. For the three-level quantizer, an average SNR of 23 dB is obtained. Subjectively, the reconstructed speech from the coder is much less noisy than for the case F = 0. But, the quantization noises in the two cases sound very different. For $F = P_s$, the spectrum of the output noise (see Fig. 9) is white, producing a very high SNR at the formants but a poor one in between the formants.
3) Select F somewhere in between the two extreme choices discussed above. As an example, let

$$F = P_s(\alpha z^{-1}) \tag{24}$$
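Substituting $\alpha z^{-1}$ for $z^{-1}$ in (13) gives feedback coefficients $b_k = a_k \alpha^k$. The following sketch simulates the generalized coder of Fig. 2 with such an F and checks relation (5) in the time domain. It is an illustration under stated assumptions: the definition of $\alpha$ is not part of the text above, so a value between 0 and 1 is assumed (0.8 here is arbitrary). The predictor, the synthetic input, and all function names are hypothetical, and the pitch predictor, preemphasis, and peak limiter of Fig. 7 are left out.

```python
import numpy as np

def three_level_quantizer(x, spacing):
    """Uniform three-level quantizer with output levels -spacing, 0, +spacing."""
    return spacing * np.clip(np.round(x / spacing), -1, 1)

def shaped_feedback_coeffs(a, alpha):
    """Coefficients of F(z) = P(alpha z^-1): b_k = a_k * alpha**k for k = 1..m."""
    a = np.asarray(a, dtype=float)
    return a * alpha ** np.arange(1, len(a) + 1)

def generalized_coder(s, a, b, spacing):
    """Simulate the coder of Fig. 2 (with m' = m); returns (s_hat, delta)."""
    m = len(a)
    s_pad = np.concatenate([np.zeros(m), s])     # zero initial conditions
    out = np.zeros(len(s_pad))                   # reconstructed samples
    delta = np.zeros(len(s_pad))                 # quantizer error
    for n in range(m, len(s_pad)):
        past_s = s_pad[n - m:n][::-1]            # s[n-1], ..., s[n-m]
        past_delta = delta[n - m:n][::-1]
        past_out = out[n - m:n][::-1]
        q = s_pad[n] - a @ past_s - b @ past_delta   # quantizer input, as in (3)
        q_hat = three_level_quantizer(q, spacing)
        delta[n] = q_hat - q
        out[n] = q_hat + a @ past_out            # receiver adds its own prediction
    return out[m:], delta[m:]

def shaped_noise(delta, a, b):
    """Filter delta through (1 - F)/(1 - P), as in (5), with zero initial state."""
    m = len(a)
    d = np.concatenate([np.zeros(m), delta])
    e = np.zeros(len(d))
    for n in range(m, len(d)):
        e[n] = d[n] - b @ d[n - m:n][::-1] + a @ e[n - m:n][::-1]
    return e[m:]

# Hypothetical 4th-order predictor: zeros of 1 - P at radius 0.9 and 0.8.
roots = [0.9 * np.exp(0.5j), 0.9 * np.exp(-0.5j), 0.8 * np.exp(1.8j), 0.8 * np.exp(-1.8j)]
a = -np.poly(roots)[1:].real
rng = np.random.default_rng(2)
white = rng.standard_normal(2000)                # unit-power prediction error
s = shaped_noise(white, a, np.zeros(len(a)))     # synthetic input: white noise through 1/(1 - P)

spacing = 1.22 * 1.0                             # 1.22 times the rms prediction error
b = shaped_feedback_coeffs(a, alpha=0.8)
s_hat, delta = generalized_coder(s, a, b, spacing)
s_hat_p, delta_p = generalized_coder(s, a, a, spacing)   # F = P, the coder of Fig. 1
print("max |(s_hat - s) - shaped(delta)|:", np.max(np.abs(s_hat - s - shaped_noise(delta, a, b))))
```

Setting `b = a` reproduces the conventional coder, where the output error equals the quantizer error sample by sample. Setting `b` to zeros corresponds to choice 1), where the output noise is the quantizer error filtered by 1/(1 - P).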
V. CONCLUDING REMARKS
Predictive coding is a promising approach for speech coding. It provides a practical method of realizing substantial savings in transmission requirements of digitized speech without compromising the speech quality while, at the same time, providing robust performance across different speakers and spoken material. This paper focuses on a hitherto neglected aspect of predictive coding systems, namely, the subjectively weighted spectrum of the noise introduced by the coder. A large part of the perceived coder noise comes from the frequency regions where the noise is not masked by the speech signal. A simple modification of the conventional predictive coder permits adjustment of the noise spectrum. It is shown that considerable improvement in speech quality can be achieved by proper shaping of the noise spectrum dependent on the short-time spectral envelope of the speech sign
