Abstract: Predictive coding methods attempt to minimize the rms error in the coded signal. However, the human ear does not perceive signal distortion on the basis of rms error, regardless of its spectral shape relative to the signal spectrum. In designing a coder for speech signals, it is necessary to consider the spectrum of the quantization noise and its relation to the speech spectrum. The theory of auditory masking suggests that noise in the formant regions would be partially or totally masked by the speech signal. Thus, a large part of the perceived noise in a coder comes from frequency regions where the signal level is low. In this paper, methods for reducing the subjective distortion in predictive coders for speech signals are described and evaluated. Improved speech quality is obtained: 1) by efficient removal of formant and pitch-related redundant structure of speech before quantizing, and 2) by effective masking of the quantizer noise by the speech signal.
`
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-27, NO. 3, JUNE 1979

Manuscript received July 18, 1978; revised November 28, 1978.
B. S. Atal is with Bell Laboratories, Murray Hill, NJ 07974.
M. R. Schroeder is with the Drittes Physikalisches Institut, University of Göttingen, Göttingen, Germany, and Bell Laboratories, Murray Hill, NJ 07974.
0096-3518/79/0600-0247$00.75 © 1979 IEEE

I. INTRODUCTION
For autocorrelated signals, such as speech, predictive coding [1]-[4] is an efficient method of encoding the signal into digital form. The coding efficiency is achieved by quantizing and transmitting only the signal which cannot be predicted from the already coded signal. In predictive coders, the power of the quantizer noise is proportional to the power of the prediction error. Thus, efficient prediction is important for minimizing the quantizer error. Small quantization error, however, does not ensure that the distortion in the speech signal is perceptually small; it is necessary to consider the spectrum of the quantization noise and its relation to the speech spectrum. The theory of auditory masking suggests that noise in the formant regions would be partially or totally masked by the speech signal. Thus, a large part of the perceived noise in a coder comes from the frequency regions where the signal level is low. Moreover, we can tolerate more distortion in the transitional segments in speech (where rapidly changing formants produce wider formant regions) in comparison to the steady segments.

In this paper, we discuss methods for modifying the spectrum of the quantization noise in a predictive coding system for speech to reduce the perceptible distortion introduced by such coders. The proper spectral shaping is realized by controlling the frequency response of the feedback network in the predictive coder independently of the predictor. The methods permit adaptive adjustment of the noise spectrum dependent
on the time-varying speech spectrum. Improved speech quality is obtained both by exploiting auditory masking and by efficient prediction of formant and pitch-related redundancies in speech before quantizing.

II. SPECTRUM OF QUANTIZING NOISE IN PREDICTIVE CODERS
Fig. 1 shows a schematic diagram of a predictive coder. Its operation can be summarized as follows. The input speech signal is sampled to produce a sequence of sample values $s_0, s_1, \ldots, s_n, \ldots$. The predictor P forms a linear estimate of each sample value based on the previously decoded output sample values (reconstructed speech samples). This estimate $\xi_n$ is subtracted from the actual sample value $s_n$ and the resulting difference $q_n$ is quantized and transmitted to the receiver. The quantized difference $\hat{q}_n$ is added back to the predicted value $\xi_n$ both at the transmitter and at the receiver to form the next output sample $\hat{s}_n$. It is readily seen in Fig. 1 that, in the absence of channel errors, the coder output $\hat{s}_n$ is given by

$$\hat{s}_n = \xi_n + \hat{q}_n = \xi_n + s_n - \xi_n + \delta_n = s_n + \delta_n \tag{1}$$

where $\delta_n$ is the error introduced by the quantizer (difference between the output and the input of the quantizer) at the nth sample. Thus, the difference between the output and the input speech sample values is identical to the error introduced by the quantizer. Assuming that the spectrum of the quantizer error is white (a reasonable assumption, particularly if the prediction error is white), the noise at the output of the coder is also white.

Fig. 1. Block diagram of a predictive coder.

III. GENERALIZED PREDICTIVE CODER
The quantizer input $q_n$ in Fig. 1 can be written as

$$q_n = s_n - \sum_{k=1}^{m} (s_{n-k} + \delta_{n-k})\, a_k \tag{2}$$

where the predictor P is represented by a transversal filter with $m$ delays and $m$ gains $a_1, a_2, \ldots, a_m$. The quantizer input thus consists of two parts: 1) the prediction error based on the prediction of the input $s_n$ from its own past, and 2) the filtered error signal obtained by filtering of the quantizer error through the filter P. An easy way of modifying the spectrum of the quantizing noise is to use a filter F different from P for filtering the quantizer error [5], [6]. Fig. 2 shows such a generalized predictive coder. The quantizer input $q_n$ is now given by

$$q_n = s_n - \sum_{k=1}^{m} s_{n-k}\, a_k - \sum_{k=1}^{m'} \delta_{n-k}\, b_k \tag{3}$$

where $b_1, b_2, \ldots, b_{m'}$ are $m'$ gain coefficients of the transversal filter F. We will assume that both $1 - F$ and $1 - P$ have their roots inside the unit circle. The coder output is now given as

$$\hat{s}_n = s_n + \delta_n - \sum_{k=1}^{m'} \delta_{n-k}\, b_k + \sum_{k=1}^{m} (\hat{s}_{n-k} - s_{n-k})\, a_k. \tag{4}$$

Representing Fourier transforms by upper case letters, (4) can be written in frequency-domain notations as

$$\hat{S} - S = \Delta\, \frac{1 - F}{1 - P}. \tag{5}$$
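As a rough numerical illustration of (3)-(5), the following sketch runs the loop of Fig. 2 sample by sample and compares the coder output error with the quantizer error passed through $(1 - F)/(1 - P)$. The uniform rounding quantizer, the coefficient values, and the white Gaussian test input are assumptions chosen only for this example; the function names are hypothetical.

```python
import numpy as np

def generalized_coder(s, a, b, step):
    """Run the coder of Fig. 2 sample by sample.

    s    : input samples s_n
    a    : gains a_1..a_m of the predictor P
    b    : gains b_1..b_m' of the feedback filter F
    step : step size of a uniform rounding quantizer (an assumption made
           for this sketch; the quantizer is not specified at this point)
    Returns the coder output s_hat and the quantizer error delta.
    """
    s = np.asarray(s, dtype=float)
    s_hat = np.zeros(len(s))
    delta = np.zeros(len(s))
    for n in range(len(s)):
        # Quantizer input, eq. (3): prediction of s_n from its own past,
        # minus the quantizer error filtered through F.
        q = s[n]
        q -= sum(ak * s[n - k] for k, ak in enumerate(a, 1) if n - k >= 0)
        q -= sum(bk * delta[n - k] for k, bk in enumerate(b, 1) if n - k >= 0)
        q_hat = step * np.round(q / step)
        delta[n] = q_hat - q
        # Receiver: add the prediction formed from past output samples.
        s_hat[n] = q_hat + sum(
            ak * s_hat[n - k] for k, ak in enumerate(a, 1) if n - k >= 0)
    return s_hat, delta

def shaped_noise(delta, a, b):
    """Filter delta through (1 - F)/(1 - P) using the recursion in (4)."""
    e = np.zeros(len(delta))
    for n in range(len(delta)):
        e[n] = delta[n]
        e[n] -= sum(bk * delta[n - k] for k, bk in enumerate(b, 1) if n - k >= 0)
        e[n] += sum(ak * e[n - k] for k, ak in enumerate(a, 1) if n - k >= 0)
    return e

rng = np.random.default_rng(0)
s = rng.standard_normal(500)      # illustrative test input
a = [0.9]                         # predictor P (illustrative value)
b = [0.5]                         # feedback filter F, different from P
s_hat, delta = generalized_coder(s, a, b, step=0.25)
print(np.allclose(s_hat - s, shaped_noise(delta, a, b)))
```

Setting `b` equal to `a` reduces the sketch to the coder of Fig. 1, in which case the output error should coincide with `delta` itself.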
For $F = P$, the output noise is the same as the quantizer noise, and the two coders shown in Figs. 1 and 2 are identical. However, with $F \neq P$, the coder of Fig. 2 allows greater flexibility in controlling the spectrum of output noise based on the
`
`
`
`ATAL AND SCHROEDER: PREDICTIVE CODING OF SPEECH SIGNALS
`
choice of the feedback filter F.¹ Under the assumption that the quantizer noise is white, the spectrum of the coder output noise is determined only by the factor $(1 - F)/(1 - P)$ as shown in (5). Let the squared magnitude of this factor at a frequency $f$ be $r(f)$. Then

$$r(f) = \left| \frac{1 - F(e^{2\pi j f T})}{1 - P(e^{2\pi j f T})} \right|^2 \tag{6}$$

where $T$ is the sampling interval. Equation (6) implies an important constraint on the average value of $\log r(f)$, that is,

$$\frac{1}{f_s} \int_0^{f_s} \log r(f)\, df = 0 \tag{7}$$

where $f_s$ is the sampling frequency. Expressed on a decibel scale, the average value of $\log r(f)$ is 0 dB. The proof of (7) is relatively straightforward [7] and is outlined below.

¹A somewhat different configuration of a predictive coder for controlling the spectrum of the output noise is shown in Fig. 3. It is easily verified that the Fourier transform of the output noise for this coder is given by $\Delta (1 - A)/(1 - B)$. In principle, the coder of Fig. 2 can be made equivalent to the coder of Fig. 3 by appropriate choice of the filter F. However, as a practical matter, one or the other coder may be simpler to implement depending on the choice for the spectrum of the output noise.

Fig. 2. Block diagram of a generalized predictive coder with adjustable noise spectrum.

Fig. 3. Another configuration for the generalized predictive coder with adjustable noise spectrum.

Consider the function $1 - F$ which is expressed in the z-transform notation as

$$1 - F(z) = \prod_{k=1}^{m'} (1 - z_k z^{-1}) \tag{8}$$

where $z_k$ is the kth root of $1 - F(z)$. The function $\log[1 - F(z)]$ is given by

$$\log[1 - F(z)] = \sum_{k=1}^{m'} \log(1 - z_k z^{-1}). \tag{9}$$

Since $|z_k| < 1$ (all the zeros of $1 - F(z)$ are inside the unit circle), the right side of (9) can be expressed as a power series in $z^{-1}$. Therefore,

$$\log[1 - F(z)] = -\sum_{n=1}^{\infty} c_n z^{-n} \tag{10}$$

where $c_n = \frac{1}{n} \sum_{k=1}^{m'} z_k^n$. The integral of $\log[1 - F(z)]$ over the frequency range from 0 to $f_s$ is then given by

$$\int_0^{f_s} \log[1 - F(e^{2\pi j f T})]\, df = -\sum_{n=1}^{\infty} c_n \int_0^{f_s} e^{-2\pi j n f T}\, df = 0. \tag{11}$$
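The constraint (7) can be checked numerically. The sketch below, with hypothetical names and illustrative root locations, builds $1 - F$ and $1 - P$ from their roots, samples $r(f)$ of (6) on a uniform grid over one period, and averages $\log r(f)$; the grid mean stands in for the integral in (7).

```python
import numpy as np

def mean_log_ratio(f_roots, p_roots, n_freq=4096):
    """Average of log r(f) over 0 <= f < fs for r(f) defined as in (6).

    f_roots, p_roots : roots z_k of 1 - F(z) and of 1 - P(z)
    n_freq           : number of grid points (an arbitrary choice)
    """
    omega = 2 * np.pi * np.arange(n_freq) / n_freq   # 2*pi*f*T on the grid
    z_inv = np.exp(-1j * omega)
    one_minus_f = np.prod([1 - zk * z_inv for zk in f_roots], axis=0)
    one_minus_p = np.prod([1 - zk * z_inv for zk in p_roots], axis=0)
    r = np.abs(one_minus_f / one_minus_p) ** 2
    return np.mean(np.log(r))

# Illustrative roots, all inside the unit circle.
f_roots = [0.9 * np.exp(0.3j * np.pi), 0.9 * np.exp(-0.3j * np.pi)]
p_roots = [0.95 * np.exp(0.2j * np.pi), 0.95 * np.exp(-0.2j * np.pi), 0.5]
print(mean_log_ratio(f_roots, p_roots))
```

If a root of $1 - F$ is moved outside the unit circle, the power-series argument no longer applies and the average is no longer zero, which is one way to see why the minimum-phase assumption matters here.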
`
Similarly, it can be shown that

$$\int_0^{f_s} \log[1 - P(e^{2\pi j f T})]\, df = 0. \tag{12}$$

Equation (7) follows directly from (11) and (12).

Assuming that the power of the quantizer noise $\delta_n$ is not changed significantly by the feedback loop (a desirable condition for satisfactory operation of the coder), the average value of log power spectrum of output noise is then determined solely by the quantizer and is not altered by the choice of the filter F or the predictor P. The filter F, however, redistributes the noise power from one frequency to another. Thus, reduction in quantizer noise at one frequency can be obtained only at the expense of increasing the quantizer noise at another frequency. Since a large part of perceived noise in a coder comes from the frequency regions where the signal level is low, the filter F can be used to reduce the noise in such regions while increasing the noise in the formant regions where the noise could be effectively masked by the speech signal. Some examples of the possible shapes for the spectrum of the quantizing noise together with the speech spectrum are illustrated in Fig. 4. In each case, the logarithmic spectrum of the quantizing noise has equal area above and below the average level shown by the dashed line.

Fig. 4. Two possible shapes for the spectrum of output noise (solid curve) in the coder shown in Fig. 2. The average level of the logarithmic spectrum (shown as a dashed line) is the same in both cases. The speech spectrum is shown by the dotted curve.

IV. APPLICATION TO SPEECH SIGNALS
A. Selection of Predictor
Linear prediction is a well-known method of removing the redundancy in a signal. For speech, the prediction is done most conveniently in two separate stages [4], [8]: a first prediction based on the short-time spectral envelope of speech, and a second prediction based on the periodic nature of the spectral fine structure. The short-time spectral envelope of speech is determined by the frequency response of the vocal tract and for voiced speech also by the spectrum of the glottal pulse. The spectral fine structure arising from the quasi-periodic nature of voiced speech is determined mainly by the pitch period. The fine structure for unvoiced speech is random and cannot be used for prediction.

Prediction Based on Spectral Envelope
Prediction based on the spectral envelope involves relatively short delays. The predictor can be characterized in the z-transform notation as

$$P_s(z) = \sum_{k=1}^{p} a_k z^{-k} \tag{13}$$

where $z^{-1}$ represents a delay of one sample interval and $a_1, a_2, \ldots, a_p$ are $p$ predictor coefficients. The value of $p$ typically is 10 for speech sampled at 8 kHz. A higher value may often be desirable.

The input to the quantizer in Fig. 2 consists of two parts: 1) the prediction error $d_n$ based on the prediction of the input $s_n$ from its own past, and 2) the filtered error signal $f_n$ obtained by filtering of the quantizer noise $\delta_n$ through the filter F. Under the assumption that the quantizer noise is uncorrelated with the prediction error, the total power $\varepsilon_q$ at the input to the quantizer is the sum of the powers in the prediction error $d_n$ and the filtered noise $f_n$. That is,

$$\varepsilon_q = \varepsilon_p + \varepsilon_f, \tag{14}$$

where $\varepsilon_p$ is the power in the prediction error and $\varepsilon_f$ is the power in the filtered noise. It is our experience that, for satisfactory operation of the coder, $\varepsilon_f$ should be less than $\varepsilon_p$. The power in the filtered noise is determined both by the power in the quantizer error $\delta_n$ and the power gain $G$ of the filter F. The power gain $G$ equals the sum of the squares of the filter coefficients. For $F = P_s$, the power gain is usually large and can often exceed 200. Such a high power gain causes excessive feedback of the noise power to the quantizer input, particularly for coarse quantizers, resulting in poor performance of the coder. Excessive feedback can be prevented by requiring that the power gain of the filter $P_s(z)$ is not large. The reason for the high power gain is as follows.

The spectrum of the filter $1 - P_s(z)$ is approximately the reciprocal of the speech spectrum (within a scaling constant). The low-pass filter used in the analog-to-digital conversion of the speech signal forces the reciprocal spectrum (and thus $|1 - P_s(z)|$) to assume a high value in the vicinity of the cutoff frequency of the filter. The power gain, which is equal to the integral of the power spectrum $|1 - P_s(e^{2\pi j f T})|^2$ with respect to the frequency variable $f$, thus also becomes large.

These artificially high power gains would not arise if the low-pass filter used in the sampling process were an ideal low-pass filter with a cutoff frequency exactly equal to half the sampling frequency. The amplitude-versus-frequency response of a practical low-pass filter falls off gradually. The computed covariance matrix used in LPC analysis therefore has missing components corresponding to the speech signal rejected by the low-pass filter. The missing high-frequency components produce artificially low eigenvalues of the covariance matrix corresponding to eigenvectors related to such components. The high power gain of $P_s$ is precisely caused by the small eigenvalues. The covariance matrix of the low-pass filtered speech is nearly singular, thereby resulting in a nonunique solution of the predictor coefficients. Thus, a variety of different predictor coefficients can approximate the speech
`
`
`
spectrum equally well in the passband of the low-pass filter. We wish to avoid solutions which lead to high power gains of the predictor $P_s$.

The ill-conditioning of the covariance matrix can be avoided by adding to the covariance matrix another matrix proportional to the covariance matrix of high-pass filtered white noise. We define a new covariance matrix $\hat{\Phi}$ (with its $(ij)$th term represented by $\hat{\phi}_{ij}$) and a new correlation vector $\hat{c}$ (with its $i$th term represented by $\hat{c}_i$) by the equations

$$\hat{\phi}_{ij} = \phi_{ij} + \lambda \varepsilon_{\min} \mu_{i-j} \tag{15}$$

and

$$\hat{c}_i = c_i + \lambda \varepsilon_{\min} \mu_i \tag{16}$$

where

$$\phi_{ij} = \langle s_{n-i}\, s_{n-j} \rangle, \qquad c_i = \langle s_n\, s_{n-i} \rangle,$$

$\lambda$ is a small constant (suitable values are in the range 0.01-0.10), $\varepsilon_{\min}$ is the minimum value of the mean-squared prediction error, $\mu_i$ is the autocorrelation of the high-pass filtered white noise at a delay of $i$ samples, and $\langle\ \rangle$ indicates averaging over the speech samples contained in the analysis segment. Ideally, the high-pass filter should be the filter complementary to the low-pass filter used in the sampling process. We have obtained reasonably satisfactory results with the high-pass filter $[\frac{1}{2}(1 - z^{-1})]^2$. For this filter, the autocorrelations are $\mu_0 = \frac{3}{8}$, $\mu_1 = -\frac{1}{4}$, $\mu_2 = \frac{1}{16}$, and $\mu_k = 0$ for $k > 2$. By making the scale factor on the noise covariance matrix in (15) and (16) proportional to the mean-squared prediction error, we find that it is possible to use a fixed value of $\lambda$. The results are not very sensitive to small variations in the value of $\lambda$. The minimum value of the mean-squared prediction error is determined by the Cholesky decomposition [8] of the original covariance matrix $[\phi_{ij}]$. A modified form [9] of the covariance method is used to determine the predictor coefficients from the new covariance matrix $\hat{\Phi}$. The first two steps in this modified procedure are identical to the usual covariance method [8]. That is, the matrix $\hat{\Phi}$ is expressed as a product of a lower triangular matrix $L$ and its transpose $L^t$ by Cholesky decomposition and a set of linear equations $Lq = \hat{c}$ is solved. The partial correlation at a delay $m$ is obtained from

$$r_m = \frac{q_m}{\left[ \langle s_n^2 \rangle - \sum_{i=1}^{m-1} q_i^2 \right]^{1/2}} \tag{17}$$

where $q_m$ is the $m$th component of $q$. The partial correlations are transformed to predictor coefficients using the well-known relation between the partial correlations and the predictor coefficients for all-pole filters [7, p. 110]. The modified procedure ensures that all of the zeros of the polynomial $1 - P_s(z)$ are inside the unit circle. Using the above procedure, reasonable power gains are realized without introducing significant bias in the spectrum at lower frequencies.² Examples of spectral envelopes of speech computed for $\lambda = 0$ (uncorrected) and for $\lambda = 0.05$ with the high-pass filter $[\frac{1}{2}(1 - z^{-1})]^2$ are illustrated in Fig. 5. The power gain of $P_s$ decreased from 204.6 for $\lambda = 0$ to 12.6 for $\lambda = 0.05$. The speech signal was sampled at a rate of 10 kHz. The anti-aliasing filter had attenuation of 3 dB at 4.2 kHz and more than 40 dB at 5 kHz.

²Another possible solution, namely, undersampling of the speech signal, for avoiding excessive power gains was suggested by one of the reviewers of this paper. We did not try this solution and are unable to comment on its effectiveness.

Fig. 5. Spectral envelopes of speech based on LPC analysis with high-frequency correction for $\lambda = 0$ (solid curve) and for $\lambda = 0.05$ (dotted curve). The power gains in the two cases are 204.6 and 12.6, respectively.

Prediction Based on Spectral Fine Structure
Adjacent pitch periods in voiced speech show considerable similarity. The quasi-periodic nature of the signal is present, although to a lesser extent, in the difference signal obtained after prediction on the basis of spectral envelope. The periodicity of the difference signal can be removed by further prediction. The predictor for the difference signal can be characterized in the z-transform notation by

$$P_d(z) = \beta_1 z^{-M+1} + \beta_2 z^{-M} + \beta_3 z^{-M-1} \tag{18}$$

where $M$ represents a relatively long delay in the range 2 to 20 ms. In most cases, this delay would correspond to a pitch period (or possibly, an integral number of pitch periods). The degree of periodicity in the difference signal varies with frequency. The three amplitude coefficients $\beta_1$, $\beta_2$, and $\beta_3$ provide a frequency-dependent gain factor in the pitch-prediction process. We found it necessary to use at least a third-order predictor for pitch prediction. The difference signal after prediction based on spectral envelope has a nearly flat spectrum up to half the sampling frequency. Due to a fixed sampling frequency unrelated to pitch period, the individual samples of the difference signal do not show a high period-to-period correlation. The third-order pitch predictor provides an interpolated value with a much higher correlation than the individual samples. Higher order pitch predictors provide even better improvement in the prediction gain.

Let the nth sample of the difference signal after the first ("formant") prediction be given by

$$d_n = s_n - \sum_{k=1}^{p} s_{n-k}\, a_k. \tag{19}$$
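A minimal sketch of how a three-tap pitch predictor of the form (18) might be fitted to the difference signal of (19) is given below. It assumes that the delay $M$ is taken as the lag with the highest normalized correlation and that the three gains are found by ordinary least squares over the segment; the function names, the search range, and the zero initial conditions are assumptions made for this illustration.

```python
import numpy as np

def formant_residual(s, a):
    """Difference signal d_n = s_n - sum_k a_k s_{n-k}, as in (19).

    Samples before the start of the segment are taken as zero
    (an assumption of this sketch).
    """
    s = np.asarray(s, dtype=float)
    d = s.copy()
    for k, ak in enumerate(a, start=1):
        d[k:] -= ak * s[:-k]
    return d

def fit_pitch_predictor(d, m_min, m_max):
    """Return (M, beta, residual) for a three-tap predictor as in (18)."""
    d = np.asarray(d, dtype=float)
    start = m_max + 1                 # first n for which d[n - M - 1] exists
    n_end = len(d)
    target = d[start:]

    def lagged(lag):
        # d[n - lag] for n = start .. n_end - 1
        return d[start - lag:n_end - lag]

    # Delay M: the lag with the highest normalized correlation coefficient.
    best_m, best_c = m_min, -np.inf
    for m in range(m_min, m_max + 1):
        x = lagged(m)
        c = np.dot(target, x) / np.sqrt(np.dot(target, target) * np.dot(x, x))
        if c > best_c:
            best_m, best_c = m, c

    # Gains beta_1..beta_3 multiply d[n-M+1], d[n-M], d[n-M-1] as in (18).
    taps = np.column_stack(
        [lagged(best_m - 1), lagged(best_m), lagged(best_m + 1)])
    beta, *_ = np.linalg.lstsq(taps, target, rcond=None)
    return best_m, beta, target - taps @ beta

# Synthetic, nearly periodic "difference signal" with a period of 40 samples.
rng = np.random.default_rng(0)
d = np.tile(rng.standard_normal(40), 20) + 0.01 * rng.standard_normal(800)
m, beta, residual = fit_pitch_predictor(d, m_min=20, m_max=60)
print(m, beta, np.mean(residual ** 2) / np.mean(d ** 2))
```

The least-squares step solves the same kind of three-unknown linear system that a direct minimization of the mean-squared error would produce; `np.linalg.lstsq` is used here only for brevity.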
`
`
`
The delay $M$ of the predictor $P_d(z)$ is defined as the delay for which the normalized correlation coefficient between $d_n$ and $d_{n-M}$ is highest. The coefficients $\beta_1$, $\beta_2$, and $\beta_3$ are determined by minimizing the mean-squared prediction error between $d_n$ and its predicted value. The minimization procedure leads to a set of simultaneous linear equations in the three unknowns $\beta_1$, $\beta_2$, and $\beta_3$.

Examples of prediction error signals after each stage of prediction together with the original speech signal are illustrated in Fig. 6. The prediction error after the first prediction based on the spectral envelope is amplified by 10 dB in the display. The prediction error after the second prediction based on pitch periodicity is amplified by an additional 10 dB. The prediction error after pitch prediction is quite noise-like in nature. Its spectrum, including both envelope and fine structure, is nearly white.

Fig. 6. (a) Speech waveform. (b) Difference signal after prediction based on spectral envelope (amplified 10 dB relative to the speech waveform). (c) Difference signal after prediction based on pitch periodicity (amplified 20 dB relative to the speech waveform). Horizontal axis: time (ms).

B. Selection of the Feedback Filter
The filter F providing feedback of the quantizer error in Fig. 2 can be chosen to minimize an error measure in which the noise is weighted according to some subjectively meaningful criterion. For example, an effective noise-to-signal ratio can be defined by weighting the noise power at each frequency $f$ by a function $W(f)$. For a fixed quantizer, the spectrum of the output noise is proportional to $|(1 - F)/(1 - P)|^2$. The signal spectrum is proportional to $|1/(1 - P)|^2$. The ratio of noise power to signal power at any frequency $f$ is proportional to

$$G(f) = |1 - F(e^{2\pi j f T})|^2. \tag{20}$$

One could choose F to minimize

$$\int_0^{f_s} G(f)\, W(f)\, df \tag{21}$$

under the constraint

$$\int_0^{f_s} \log G(f)\, df = 0. \tag{22}$$

The minimization problem has the following solution:

$$\log G(f) = -\log W(f) + \frac{1}{f_s} \int_0^{f_s} \log W(f)\, df. \tag{23}$$

The function $1 - F$ is the minimum-phase transfer function with spectrum $G(f)$ and can be obtained by direct Fourier transformation or spectral factorization. A particularly simple solution to this problem is obtained by transforming $1/G(f)$ to an autocorrelation function by Fourier transformation. By using a procedure similar to LPC analysis, the autocorrelation function can be used to determine a set of predictor coefficients. The predictor coefficients determined in this manner are indeed the desired filter coefficients $b_k$ for the filter F. The solution is considerably simplified also if the noise-weighting function $W(f)$ is expressed in terms of a filter transfer function whose poles and zeros lie inside the unit circle.

C. Generalized Predictive Coder for Speech
The block diagram of a generalized predictive coder for speech signals is shown in Fig. 7. The high frequencies in the speech signal are preemphasized with a filter $1 - 0.4z^{-1}$ prior to encoding. At the output, the high frequencies are deemphasized back with an inverse filter $(1 - 0.4z^{-1})^{-1}$. The speech samples are filtered by the filter $1 - P_s$ to produce the difference signal after the first prediction based on the short-time spectral envelope of the speech signal. The second predictor $P_d$ based on pitch periodicity forms an estimate $d'_n$ of the difference signal at the nth sample based on the past quantizer output samples. The quantizer error is filtered and peak-limited to produce the sample $f_n$. The composite signal $q_n = d_n - d'_n - f_n$ is quantized by an adaptive three-level quantizer. The parameters of the quantizer are selected to be optimum [10] for a Gaussian input with standard deviation equal to the rms value of the prediction error (after prediction by both $P_s$ and $P_d$). For a uniformly spaced three-level quantizer, the optimum spacing between the quantizer output levels is 1.22 times the rms value of the prediction error. Both predictors and the quantizer parameters are reset once every 10 ms. An interval of 20 ms is used to minimize the prediction error in determining the predictor parameters. For a total of 40 s of speech spoken by two male and two female speakers, the average prediction gains for the two predictors (based on spectral envelope and pitch periodicity) were found to be 11.0 and 5.6 dB, respectively. The average signal-to-noise ratio (SNR) of the three-level quantizer was found to be 6.1 dB. The speech was relatively noise-free with an average SNR of approximately 40 dB.

It can be shown that the stability of the feedback loop is ensured only if the function $1 - F$ is a minimum-phase transfer function [11]. Several methods of ensuring the stability of the feedback loop were considered for situations where the minimum-phase requirement could not be satisfied. A simple and effective solution was found to be to limit the peak
`
`
`
amplitude of the samples $f_n$. The peak limiter in the feedback loop limits the samples $f_n$ to a maximum value of twice the rms value of the prediction error. For one of the choices for the filter F, several instances of instability in the feedback loop were encountered without the peak limiter. However, our results based on extensive testing with the speech data revealed no significant increase of the quantization noise after inclusion of the peak limiter in the feedback loop.

Fig. 7. Block diagram of a generalized predictive coder for speech.

Fig. 8. Spectral envelopes of output quantizing noise (dotted curve) and the corresponding speech spectrum (solid curve) for $F = 0$.

Fig. 9. Spectral envelope of the speech signal and flat quantization noise ($F = P_s$).
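The feedback-filter design outlined in Section IV-B can be sketched numerically as follows. Given samples of a noise-weighting function $W(f)$ on a uniform frequency grid, the sketch forms $G(f)$ from (23), converts $1/G(f)$ to an autocorrelation sequence with an inverse FFT, and runs a Levinson-Durbin recursion to obtain the gains $b_k$. The grid mean is used in place of the integral in (23), and the function names and the choice of the Levinson-Durbin recursion as the "procedure similar to LPC analysis" are assumptions of this example.

```python
import numpy as np

def levinson(r, order):
    """Levinson-Durbin recursion.

    r     : autocorrelation values r[0..order]
    order : number of predictor gains to compute
    Returns predictor gains b_1..b_order, so that the error filter is
    1 - sum_k b_k z^-k.
    """
    a = np.zeros(order + 1)       # error-filter coefficients [1, a_1, ...]
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err            # reflection coefficient at step i
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return -a[1:]

def feedback_filter_from_weighting(w, order):
    """Gains b_k of F from samples of W(f) > 0 on a grid over 0 <= f < fs.

    The samples are assumed symmetric about fs/2, as for a real filter.
    """
    log_w = np.log(np.asarray(w, dtype=float))
    log_g = -log_w + np.mean(log_w)       # eq. (23) on the grid
    inv_g = np.exp(-log_g)                # 1/G(f)
    r = np.fft.ifft(inv_g).real           # autocorrelation of 1/G(f)
    return levinson(r, order)

n = 1024
z_inv = np.exp(-2j * np.pi * np.arange(n) / n)
a = np.array([0.9, -0.4])                 # illustrative predictor gains
w = 1.0 / np.abs(1 - a[0] * z_inv - a[1] * z_inv ** 2) ** 2
print(feedback_filter_from_weighting(np.ones(n), 2))   # flat weighting
print(feedback_filter_from_weighting(w, 2))            # W = |1 - P|^-2
```

The two printed cases use a flat weighting and a weighting equal to $|1 - P|^{-2}$ for an illustrative second-order predictor, so the resulting gains can be compared directly with zero and with the predictor gains.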
Several choices for the feedback filter F in Fig. 7 were investigated. Some of the more interesting ones are discussed below.
1) Assume that the noise-weighting function $W(f)$ in (23) is constant for all frequencies. This would be a good choice if our ears were equally sensitive to quantizing distortion at all frequencies. The solution is $F = 0$. In this case, the coder output noise has the same spectral envelope as the original speech (but at a lower level) as shown in Fig. 8. This particular choice leads to a signal-to-noise ratio of 13 dB with a three-level quantizer. Subjectively, the reconstructed speech is quite noisy. But the distortion is heard mostly at low frequencies, implying that $W(f) = \text{constant}$ is a good choice at the higher frequencies.
2) Assume that the distortion is subjectively more important in the formant regions and less important in the regions between formants. As an extreme case, let $W = |1 - P_s|^{-2}$, which leads to $F = P_s$, the same system as the original predictive coder shown in Fig. 1. For a given quantizer, such a choice results in minimum unweighted noise power in the reconstructed speech. For the three-level quantizer, an average SNR of 23 dB is obtained. Subjectively, the reconstructed speech from the coder is much less noisy than for the case $F = 0$. But the quantization noises in the two cases sound very different. For $F = P_s$, the spectrum of the output noise (see Fig. 9) is white, producing a very high SNR at the formants but a poor one in between the formants.
3) Select F somewhere in between the two extreme choices discussed above. As an example, let

$$F = P_s(\alpha z^{-1}). \tag{24}$$
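Substituting $\alpha z^{-1}$ for $z^{-1}$ in (13) gives $F(z) = \sum_k a_k \alpha^k z^{-k}$, so the gains of F in (24) are $b_k = \alpha^k a_k$. The short sketch below, with hypothetical function names and illustrative coefficients, computes these gains and the zeros of $1 - F$, which are the zeros of $1 - P_s$ scaled by $\alpha$. With $\alpha = 1$ it reproduces $F = P_s$, and with $\alpha = 0$ it gives $F = 0$.

```python
import numpy as np

def scale_predictor(a, alpha):
    """Gains of F = P_s(alpha z^-1): b_k = alpha**k * a_k for k = 1..p."""
    a = np.asarray(a, dtype=float)
    k = np.arange(1, len(a) + 1)
    return a * alpha ** k

def error_filter_zeros(coeffs):
    """Zeros of 1 - sum_k c_k z^-k, i.e. roots of z^p - c_1 z^(p-1) - ... - c_p."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.roots(np.concatenate(([1.0], -coeffs)))

a = [1.2, -0.5]                     # illustrative predictor gains
b = scale_predictor(a, 0.8)         # alpha = 0.8 is an arbitrary example value
print(b)
print(np.abs(error_filter_zeros(a)), np.abs(error_filter_zeros(b)))
```

Because every zero is pulled toward the origin by the factor $\alpha$, a minimum-phase $1 - P_s$ yields a minimum-phase $1 - F$ for any $0 \le \alpha \le 1$.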
`
`
`
V. CONCLUDING REMARKS
Predictive coding is a promising approach for speech coding. It provides a practical method of realizing substantial savings in transmission requirements of digitized speech without compromising the speech quality while, at the same time, providing robust performance across different speakers and spoken material. This paper focuses on a hitherto neglected aspect of predictive coding systems, namely, the subjectively weighted spectrum of the noise introduced by the coder. A large part of the perceived coder noise comes from the frequency regions where the noise is not masked by the speech signal. A simple modification of the conventional predictive coder permits adjustment of the noise spectrum. It is shown that considerable improvement in speech quality can be achieved by proper shaping of the noise spectrum dependent on the short-time
`spectral envelope of the speech sign