IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 37, NO. 4, APRIL 1989

Pitch Prediction Filters in Speech Coding

RAVI P. RAMACHANDRAN AND PETER KABAL

Abstract: Prediction error filters which combine short-time prediction (formant prediction) with long-time prediction (pitch prediction) in a cascade connection are examined. A number of different solution methods (autocorrelation, covariance, Burg) and implementations (transversal and lattice) are considered. It is found that the F-P cascade (formant filter before the pitch filter) outperforms the P-F cascade for both transversal- and lattice-structured predictors. The performances of the transversal and lattice forms are similar. The solution method that yields a transversal structure requires a stability test and, if necessary, a consequent stabilization. The lattice form allows for a solution method which ensures a stable synthesis filter. Simplified solution methods are shown to be applicable for the pitch filter (multitap case) in an F-P cascade. Furthermore, new methods to estimate the appropriate pitch lag for a pitch filter are proposed for both transversal and lattice structures. These methods perform essentially as well as an exhaustive search in an F-P cascade. Finally, the two cascade forms are implemented as part of an APC coder to evaluate their relative subjective performance.

I. INTRODUCTION

In this paper, speech coder configurations which use two nonrecursive prediction error filters to process the incoming speech signal are examined. Conventionally, the prediction is carried out as a cascade of two separate filtering operations. The first filter, referred to here as the formant filter, removes near-sample redundancies. The second is termed the pitch filter and acts on distant-sample waveform similarities. The resulting residual signal is quantized and coded for transmission. In an adaptive predictive coder (APC), these predictors are placed in a feedback loop around the quantizer. An additional quantization noise shaping filter can be employed to reduce the perceived distortion in the decoded speech [1], [2]. An alternative description of an APC coder uses an open-loop predictor configuration and a noise feedback filter [3]. A block diagram of such a configuration is shown in Fig. 1. This type of open-loop arrangement is also used in code-excited linear prediction (CELP) [4]. In CELP, the coding is accomplished by selecting a waveform from a given repertoire of waveforms. The selection process uses an analysis-by-synthesis strategy. Conceptually, each candidate waveform is passed through the synthesis filters to find the one which produces the best quality speech.

Manuscript received June 10, 1987; revised August 30, 1988. This work was supported by the Natural Sciences and Engineering Research Council of Canada.
R. P. Ramachandran is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada H3A 2A7.
P. Kabal is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada H3A 2A7 and INRS-Telecommunications, Universite du Quebec, Verdun, P.Q., Canada H3E 1H6.
IEEE Log Number 8826113.

Fig. 1. Block diagram of an APC coder with noise feedback. (a) Analysis phase. (b) Synthesis phase.

Noise shaping is accomplished by including a frequency weighting in the error criterion which is used to choose the best waveform.
In both APC and CELP, the residual signal or the selected codeword (after scaling by the gain factor) is passed through a pitch synthesis filter and a formant synthesis filter to reproduce the decoded speech. The filtering in the synthesis phase can be viewed in the frequency domain as first inserting the fine pitch structure and then shaping the spectral envelope (formant structure).
The analysis to determine the predictor coefficients is carried out frame by frame. The filter coefficients are then coded for transmission. The quantization of these coefficients is outside the scope of the present study. These parameters, along with the quantized excitation information, are used by the decoder to reconstruct the speech. The frame update rate is chosen to be slow enough to keep the required transmission rate small, yet fast enough to allow the speech segment under analysis to be adequately described by a set of constant parameters. Depending on the application, the effective frame size usually corresponds to time intervals between 5 and 20 ms.
The aim of this paper is to study predictors which incorporate both short-time and long-time prediction. The effect of the ordering of the prediction filters in the cascade connection is considered. The filters will be implemented in both lattice and transversal forms. In addition, methods to determine the lag used for the pitch filter will be derived. The two predictor configurations incorporating the transversal and lattice solutions are tested as part of an APC coder that is equivalent to the one shown in Fig. 1. This allows us to assess the relative perceptual quality of the decoded speech that results from the use of different configurations and solutions.
The next section will introduce the different configurations for formant and pitch filters. This is followed by an analysis of a prediction error filter which uses general delays. This general structure subsumes both formant and pitch filters and allows for both autocorrelation and covariance analyses. The following section makes the analysis specific to pitch filters. A comparison of the techniques is given in Section V. Then, the stability properties of the synthesis filters are examined for different configurations. Section VII examines means to determine an appropriate lag for the pitch filter. Finally, Section VIII discusses the relative performance of the different options when implemented as part of a speech coder.

0096-3518/89/0400-0467$01.00 © 1989 IEEE

II. FORMANT AND PITCH PREDICTORS
The conventional formant predictor has a transfer function
$$F(z) = \sum_{k=1}^{N_f} a_k z^{-k}. \tag{1}$$
The order $N_f$ is typically between 8 and 16. The system function of the noise feedback filter is usually related to that of the formant predictor. One choice is to let $N(z) = F(z/\alpha)$, where $0 < \alpha < 1$. This reduces the perceptual distortion of the output speech by improving the signal-to-noise ratio (SNR) in regions where the spectral level is low. However, this improvement comes at the expense of decreased SNR in the formant regions [1]. At the receiver, the formant synthesis filter has a transfer function $H_F(z) = 1/(1 - F(z))$.
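Because $F(z/\alpha) = \sum_{k} a_k \alpha^k z^{-k}$, the noise feedback filter is obtained by scaling the $k$-th formant coefficient by $\alpha^k$. The following is a minimal Python sketch, assuming the coefficients are held in a list `a` with `a[0]` $= a_1$; the function name `noise_feedback_coeffs` and the example values are hypothetical illustrations, not taken from the paper.

```python
def noise_feedback_coeffs(a, alpha):
    """Coefficients of N(z) = F(z/alpha), where F(z) = sum_k a[k-1] z^-k.

    Substituting z -> z/alpha multiplies the coefficient of z^-k by alpha**k,
    so the polynomial 1 - N(z) has roots equal to alpha times those of
    1 - F(z): the formant peaks are broadened (bandwidth expansion).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return [ak * alpha ** k for k, ak in enumerate(a, start=1)]


# Hypothetical second-order formant predictor, for illustration only.
a = [1.2, -0.5]
print(noise_feedback_coeffs(a, 0.8))  # approximately [0.96, -0.32]
```

In a real coder, `a` would hold the $N_f$ coefficients of (1) for the current frame, and the scaled set would be recomputed whenever $F(z)$ is updated.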
The pitch predictor has a small number of taps $N_p$. The delays associated with these taps are bunched around the pitch lag value. The system function for a transversal form pitch predictor is
$$P(z) = \begin{cases} \beta_1 z^{-M}, & \text{1 tap} \\ \beta_1 z^{-M} + \beta_2 z^{-(M+1)}, & \text{2 taps} \\ \beta_1 z^{-M} + \beta_2 z^{-(M+1)} + \beta_3 z^{-(M+2)}, & \text{3 taps.} \end{cases} \tag{2}$$
The pitch lag $M$ is usually updated along with the coefficients. The pitch synthesis filter has a system function $H_P(z) = 1/(1 - P(z))$.
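The pair of filters in (2) can be sketched directly in the time domain: the analysis side forms the residual through $1 - P(z)$, and the decoder's $H_P(z)$ adds the delayed, weighted output back in. This is a minimal Python sketch, assuming zero filter memory before the first sample and taps at delays $M, M+1, \dots, M+N_p-1$; the function names and the pulse-train input are hypothetical.

```python
def pitch_error_filter(x, beta, M):
    """Pitch prediction residual e(n) = x(n) - sum_k beta[k] x(n - M - k).

    Implements 1 - P(z) of (2) with taps at delays M, M+1, ..., M+Np-1,
    where Np = len(beta). Samples before x[0] are treated as zero.
    """
    e = []
    for n in range(len(x)):
        pred = sum(b * x[n - M - k] for k, b in enumerate(beta) if n - M - k >= 0)
        e.append(x[n] - pred)
    return e


def pitch_synthesis_filter(e, beta, M):
    """Recursive inverse y(n) = e(n) + sum_k beta[k] y(n - M - k), i.e. H_P(z) = 1/(1 - P(z))."""
    y = []
    for n in range(len(e)):
        fb = sum(b * y[n - M - k] for k, b in enumerate(beta) if n - M - k >= 0)
        y.append(e[n] + fb)
    return y


# Hypothetical input: pulses every 40 samples whose amplitude halves each period.
x = [0.5 ** (n // 40) if n % 40 == 0 else 0.0 for n in range(200)]
e = pitch_error_filter(x, [0.5], 40)      # only the first pulse survives in e
y = pitch_synthesis_filter(e, [0.5], 40)  # y reproduces x
```

With a one-tap predictor matched to an integer pitch period, the residual keeps only the first pulse, which is the "removal of pitch period correlations" described in the next paragraphs.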
The conventional predictor configuration uses a cascade of a formant predictor and a pitch predictor, referred to here as an F-P cascade. This structure can be motivated from a standard speech production model, which decouples the quasi-periodic source (the vocal folds) from the vocal tract filter.
In the context of speech coding, pitch predictors are most useful during voiced speech, since voiced speech is characterized as a quasi-periodic signal with considerable correlation between samples separated by a pitch period. In the F-P cascade, the formant predictor removes the near-sample correlations to a large extent. The resulting formant predicted signal is a low-density quasi-periodic signal consisting mainly of pitch spikes. The pitch predictor acts on this residual signal. If the pitch period is an integral number of samples, a one-tap pitch predictor can remove pitch period correlations. For nonintegral pitch periods, a multitap pitch predictor serves somewhat as an interpolating filter for the removal of these distant-sample correlations.
The formulation used for the pitch filter is such that it removes long-term correlations, whether due to actual pitch excitation or not. The use of the term "pitch filter" is somewhat misleading in describing the action of this filter for unvoiced speech and even to some extent for voiced speech. However, for ease of reference and in keeping with past nomenclature, in this paper the long delay filter will be referred to as the pitch filter, and the corresponding lag value will be referred to as the pitch lag.
The cascade connection of the predictors can also have the pitch predictor precede the formant predictor (referred to as a P-F cascade). In the P-F cascade, the pitch filter is chosen to reduce long-term correlations such as those due to quasi-periodic input signals. The remaining near-sample correlations are handled by the formant predictor. The filter coefficients for the two filters in the cascade are determined in a sequential fashion. The coefficients of the first filter are found, and then the coefficients of the second filter are determined. The sequential solution process gives different results for the F-P cascade and the P-F cascade connections. In addition, the initial conditions at the time of coefficient update are different for the F-P and P-F connections. This would account for differences even when the two forms use the same coefficients for their constituent parts.
The individual filters can be implemented in either transversal or lattice form. As shown later, the various implementations and solution methods give rise to systems with differing performance and differing stability properties.

III. PREDICTORS WITH GENERAL DELAYS
In this section, general formulations for determining the predictor coefficients for both formant and pitch predictors in transversal or lattice form are developed. Later, these formulations are made more specific for the case of pitch filters.

A. Transversal Implementation
A model for calculating the predictor coefficients for a transversal implementation is shown in Fig. 2. The input signal $x(n)$ is multiplied by a data window $w_d(n)$ to give $x_w(n)$. The signal $x_w(n)$ is predicted from a set of its previous samples to form an error signal
$$e(n) = x_w(n) - \sum_{k=1}^{L} c_k\, x_w(n - M_k). \tag{3}$$
Fig. 2. Analysis model for transversal predictors.
The values $M_k$ are arbitrary but distinct integers corresponding to delays of the signal $x_w(n)$. The final step is to multiply the error signal by an error window $w_e(n)$ to obtain a windowed error signal $e_w(n)$, where
$$e_w(n) = w_e(n)\, e(n) = w_e(n)\, x_w(n) - w_e(n) \sum_{k=1}^{L} c_k\, x_w(n - M_k). \tag{4}$$
The squared error is defined by
$$\varepsilon^2 = \sum_{n=-\infty}^{\infty} e_w^2(n). \tag{5}$$
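The coefficients $c_k$ are computed by minimizing $\varepsilon^2$. This leads to a linear system of equations that can be written in matrix form ($\Phi \mathbf{c} = \boldsymbol{\alpha}$):
$$\begin{bmatrix} \phi(M_1, M_1) & \phi(M_1, M_2) & \cdots & \phi(M_1, M_L) \\ \phi(M_2, M_1) & \phi(M_2, M_2) & \cdots & \phi(M_2, M_L) \\ \vdots & \vdots & \ddots & \vdots \\ \phi(M_L, M_1) & \phi(M_L, M_2) & \cdots & \phi(M_L, M_L) \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_L \end{bmatrix} = \begin{bmatrix} \phi(0, M_1) \\ \phi(0, M_2) \\ \vdots \\ \phi(0, M_L) \end{bmatrix} \tag{6}$$
where
$$\phi(i,j) = \sum_{n=-\infty}^{\infty} w_e^2(n)\, x_w(n-i)\, x_w(n-j). \tag{7}$$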
For both formant and pitch predictors, the delays $M_k$ are grouped. A formant predictor has a set of delays $M_k = k$ for $k = 1$ to $N_f$. A pitch predictor has a small number of delays $M_k = M + k$ for $k = 0$ to $N_p - 1$. Grouping the pitch taps reduces the amount of side information (which is sent to the decoder) needed to specify the delay values.
1) Autocorrelation Method: The autocorrelation method results if $w_e(n) = 1$ for all $n$. The data window $w_d(n)$ is typically time-limited (rectangular, Hamming, or other). An important consideration is the minimum-phase property of the prediction error filter $A(z) = 1 - \sum_{k=1}^{L} c_k z^{-M_k}$. If $A(z)$ is minimum phase, the corresponding synthesis filter $1/A(z)$ used at the decoder is stable. The autocorrelation method can be shown to give a minimum-phase formant filter [5]. In the case of pitch filters, the minimum-phase property does not hold in general. An exception occurs if the delays corresponding to the coefficients are uniformly spaced, i.e., $M_k = k M_1$. This point is discussed further in Appendix A.
The matrix $\Phi$ is always symmetric and positive definite. It is also Toeplitz if the intercoefficient delays are equal. Depending on whether $\Phi$ is Toeplitz or not, either the Levinson recursion¹ or the Cholesky decomposition can be used to solve the autocorrelation equations.
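For the formant predictor ($M_k = k$, $w_e(n) = 1$), the entries of (7) reduce to $\phi(i,j) = r(|i-j|)$ with $r(k) = \sum_n x_w(n)\, x_w(n-k)$, and the right-hand side is $r(1), \dots, r(N_f)$, so $\Phi$ is Toeplitz and the Levinson-Durbin recursion applies. The following is a minimal Python sketch, assuming a Hamming data window; the function names and the synthetic test frame are hypothetical. Consistent with the minimum-phase result cited above, the reflection coefficients it produces should all have magnitude below one.

```python
import math


def hamming(N):
    """Hamming data window w_d(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)), n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def autocorr(xw, p):
    """r(k) = sum_n xw(n) xw(n - k), k = 0..p, with xw zero outside the window."""
    N = len(xw)
    return [sum(xw[n] * xw[n - k] for n in range(k, N)) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Solve sum_k a_k r(|i - k|) = r(i), i = 1..p (the Toeplitz form of (6)).

    Returns (a, refl): a[k-1] = a_k of F(z) = sum_k a_k z^-k, and the
    reflection coefficients produced at each order.
    """
    a = []
    refl = []
    err = r[0]  # prediction error energy of the order-0 predictor
    for i in range(1, p + 1):
        acc = r[i] - sum(a[k] * r[i - 1 - k] for k in range(i - 1))
        K = acc / err
        # Order update: a_k <- a_k - K a_{i-k} for k < i, then a_i = K.
        a = [a[k] - K * a[i - 2 - k] for k in range(i - 1)] + [K]
        refl.append(K)
        err *= 1.0 - K * K
    return a, refl


# Hypothetical test frame: an impulse driving a stable two-pole resonance.
x = []
for n in range(240):
    xn = 1.0 if n == 0 else 0.0
    if n >= 1:
        xn += 1.3 * x[n - 1]
    if n >= 2:
        xn -= 0.6 * x[n - 2]
    x.append(xn)
xw = [v * w for v, w in zip(x, hamming(len(x)))]
a, refl = levinson_durbin(autocorr(xw, 10), 10)
```

The recursion costs on the order of $N_f^2$ operations instead of the $N_f^3$ of a general solver, which is why the Toeplitz case is singled out in the text.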
2) Covariance Method: The covariance method results if $w_d(n) = 1$ for all $n$ and the error window is rectangular, $w_e(n) = 1$ for $0 \le n \le N - 1$ (and zero elsewhere). More general error windows in a covariance approach have been suggested by Singhal and Atal [6]. The covariance method does not guarantee that $A(z)$ is minimum phase, but it does minimize the error energy for each frame. The resulting system of equations (6) has the entries in $\Phi$ and $\boldsymbol{\alpha}$ defined with no window applied to the input signal,
$$\phi(i,j) = \sum_{n=0}^{N-1} x(n-i)\, x(n-j) \tag{8}$$
where $N$ is the frame length.
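A sketch of (6) with the covariance entries (8), for an arbitrary set of delays and solved by a Cholesky factorization, is given below. It is a minimal illustration, assuming the frame occupies `x[start:start + N]` and that `x` holds at least `max(delays)` earlier samples so that every $x(n - M_k)$ is available; the function names are hypothetical. Nothing here enforces a minimum-phase $A(z)$, in line with the remark above.

```python
import math


def covariance_matrix(x, start, N, delays):
    """Entries of (6) with phi(i, j) from (8); the frame is x[start : start + N]."""
    def phi(i, j):
        return sum(x[start + n - i] * x[start + n - j] for n in range(N))

    Phi = [[phi(mi, mj) for mj in delays] for mi in delays]
    alpha = [phi(0, mk) for mk in delays]
    return Phi, alpha


def cholesky_solve(A, b):
    """Solve A c = b for symmetric positive definite A via A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # Forward substitution L y = b, then back substitution L^T c = y.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (y[i] - sum(L[k][i] * c[k] for k in range(i + 1, n))) / L[i][i]
    return c


# Hypothetical signal obeying x(n) = 0.5 x(n-30) + 0.25 x(n-31) after a start-up segment.
x = [((n * 7) % 13) - 6.0 for n in range(31)]
for n in range(31, 130):
    x.append(0.5 * x[n - 30] + 0.25 * x[n - 31])
Phi, alpha = covariance_matrix(x, start=40, N=80, delays=[30, 31])
c = cholesky_solve(Phi, alpha)  # recovers approximately [0.5, 0.25]
```

Because the frame here satisfies the two-tap relation exactly, the minimum error energy is zero and the covariance solution recovers the generating coefficients; on real speech the residual is nonzero and the result is only the best fit for that frame.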
An alternative method is the modified covariance technique, which does guarantee a minimum-phase filter [1]. This technique works well for formant predictors and is used in many of the experiments described later. A discussion of the modified covariance approach and its relevance to the pitch prediction problem appears in Appendix B.

B. Lattice Implementation
Lattice methods have been employed in linear prediction and are useful in implementing a lattice-structured formant predictor [7].² Here, we consider more general lattice forms with only a subset of the stages actually performing a filtering operation. A lattice-structured predictor consisting of a total of $P$ stages is an all-zero filter, as depicted in Fig. 3. The input signal is $x(n)$, and the final error signal is $e(n) = f_P(n)$. Stage $i$ has a reflection coefficient $K_i$ and forms both the forward residual $f_i(n)$ and the backward residual $b_i(n)$. Reflection coefficients will be calculated for stages corresponding to one of the delay values $M_k$. Other stages will have zero-valued reflection coefficients. For these null stages, the forward error term propagates unaltered, and the backward error term is merely delayed. A lattice form filter will be minimum phase if all of the reflection coefficients have magnitudes which are smaller than one [7].
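The behaviour of active and null stages can be sketched as follows. The stage recursions $f_i(n) = f_{i-1}(n) - K_i\, b_{i-1}(n-1)$ and $b_i(n) = b_{i-1}(n-1) - K_i\, f_{i-1}(n)$, with $f_0(n) = b_0(n) = x(n)$, are an assumed sign convention (Fig. 3 is not reproduced here); under it a single active stage at delay $M$ gives $1 - K_M z^{-M}$. The dictionary `K` holding only the nonzero coefficients is likewise a hypothetical representation. Feeding a unit impulse through the lattice returns the impulse response of the prediction error filter.

```python
def lattice_error_filter(x, K, P):
    """Run x through a P-stage lattice prediction error filter.

    K maps a stage index i (1..P) to its reflection coefficient; stages absent
    from K are null stages (K_i = 0): the forward residual passes unchanged
    and the backward residual is only delayed by one sample.
    Returns the final forward residual f_P(n). Filter memory starts at zero.
    """
    b_delay = [0.0] * P  # b_delay[i-1] holds b_{i-1}(n-1), the stage-i delay element
    out = []
    for xn in x:
        f = b = xn  # f_0(n) = b_0(n) = x(n)
        for i in range(1, P + 1):
            k = K.get(i, 0.0)
            b_old = b_delay[i - 1]   # b_{i-1}(n-1)
            b_delay[i - 1] = b       # save b_{i-1}(n) for the next sample
            f, b = f - k * b_old, b_old - k * f
        out.append(f)
    return out


M = 20
impulse = [1.0] + [0.0] * 40
h2 = lattice_error_filter(impulse, {M: 0.5, M + 1: 0.3}, M + 1)
# Expected nonzero values: h2[0] = 1, h2[1] = K_M K_{M+1}, h2[M] = -K_M, h2[M+1] = -K_{M+1}.
```

The $z^{-1}$ term produced by two adjacent active stages is the feature that distinguishes a multicoefficient lattice pitch filter from its transversal counterpart.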
For those stages for which a reflection coefficient is calculated, the aim, in terms of maximizing the prediction gain alone, is to minimize the mean-square value of the forward residual. However, this criterion does not ensure that the magnitude of the resulting reflection coefficients is bounded by one and therefore does not ensure the stability of the corresponding synthesis filter. The Burg algorithm minimizes the sum of the mean-square values of the forward and backward residuals and ensures the stability of the synthesis filter. It also has the desirable property of guaranteeing that the mean-square value of the forward residual is nonincreasing across each stage of the lattice.

Fig. 3. Analysis model for lattice predictors.

¹A distinction is made between the general form of the Levinson recursion, which allows an arbitrary right-hand-side vector, and the Levinson-Durbin recursion, which applies if the elements of $\boldsymbol{\alpha}$ appear in the first row of $\Phi$.
²For formant predictors, one can convert between transversal and lattice implementations with identical impulse responses. However, in a time-varying environment, they are not equivalent due to their different initial conditions at frame boundaries.
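The Burg criterion for a single active stage, whose formula is stated in (9) and (10), can be sketched as below. The sums run over one frame of $N$ samples of the stage inputs, held in hypothetical lists `f_prev` $= f_{i-1}(n)$ and `b_prev_delayed` $= b_{i-1}(n-1)$; the function names are also hypothetical. The bound $|K_i| \le 1$ follows from $2|C| \le 2\sqrt{FB} \le F + B$ (Cauchy-Schwarz, then the arithmetic-geometric mean inequality), which is why no separate stability test is needed.

```python
def burg_reflection(f_prev, b_prev_delayed):
    """Burg reflection coefficient K_i = 2 C / (F + B), as in (9)-(10).

    f_prev[n] = f_{i-1}(n) and b_prev_delayed[n] = b_{i-1}(n - 1), n = 0..N-1.
    """
    C = sum(f * b for f, b in zip(f_prev, b_prev_delayed))
    F = sum(f * f for f in f_prev)
    B = sum(b * b for b in b_prev_delayed)
    if F + B == 0.0:
        return 0.0  # silent frame: leave the stage inactive
    return 2.0 * C / (F + B)


def burg_stage(f_prev, b_prev_delayed):
    """Apply one lattice stage with its Burg coefficient; returns (K_i, f_i, b_i)."""
    K = burg_reflection(f_prev, b_prev_delayed)
    f_new = [f - K * b for f, b in zip(f_prev, b_prev_delayed)]
    b_new = [b - K * f for f, b in zip(f_prev, b_prev_delayed)]
    return K, f_new, b_new


# For the first active stage of a pitch lattice (stages 1..M-1 null),
# f_{M-1}(n) = x(n) and b_{M-1}(n-1) = x(n - M). Hypothetical input signal:
M, start, N = 40, 100, 80
x = [((n * 5) % 9) - 4.0 + (3.0 if n % M == 0 else 0.0) for n in range(200)]
K_M, f_M, b_M = burg_stage(x[start:start + N], x[start - M:start - M + N])
```

For a pitch lattice, this reduces to $K_M = 2\sum x(n)x(n-M) / \left(\sum x^2(n) + \sum x^2(n-M)\right)$, a normalized correlation at the pitch lag.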
For the Burg method, the reflection coefficient $K_i$ is calculated as
$$K_i = \frac{2\, C_{i-1}}{F_{i-1} + B_{i-1}} \tag{9}$$
where
$$C_i = \sum_{n=0}^{N-1} f_i(n)\, b_i(n-1), \qquad F_i = \sum_{n=0}^{N-1} f_i^2(n), \qquad B_i = \sum_{n=0}^{N-1} b_i^2(n-1), \tag{10}$$
and $N$ is the frame length. The mean-square value of the forward residual is reduced by the factor $(1 - K_i^2)$ across stage $i$. A computationally efficient procedure, termed the covariance-lattice method [7], calculates the reflection coefficients using (9), but expresses them in terms of the covariance of the input signal. With this rearrangement, the computational complexity becomes comparable to the conventional covariance method.
When applying the Burg formula, the lattice-structured prediction error filter (as in Fig. 3) is minimum phase even if some of the reflection coefficients are constrained to be zero. Note also that the lattice coefficients can be transformed to direct form (impulse response) coefficients, allowing for an alternative implementation of the filter in transversal form.

IV. PITCH FILTER ANALYSIS METHOD
This section discusses the analysis methods that are used to implement both transversal- and lattice-structured pitch prediction error filters. Assume for the moment that the value of the pitch filter lag $M$ is chosen so as to maximize the prediction gain. A discussion of the problem of estimating $M$ is postponed until a later section. The input to the pitch filter is $d(n)$ for an F-P cascade and $s(n)$ for a P-F cascade.

A. Transversal Pitch Filters
The autocorrelation and covariance methods can be used to determine the coefficients for a transversal-structured prediction error filter $1 - P(z)$. For the autocorrelation method, the input signal must be windowed. Conventionally, the window is of finite duration, and all the samples outside the range of the window are set to zero. The method is effective for formant predictors since the largest filter delay is usually small compared to the length of the window used. This ensures that the frame edge effects due to the zero-valued analysis samples preceding and following the window are small. In contrast to the formant predictor, the delays used for a pitch predictor are comparable to, or even larger than, the frame and window lengths. For a pitch filter, frame edge effects are no longer negligible. The problem is not solved by using windows that are longer than the largest delay of the pitch predictor, since too much time-averaging greatly reduces the performance and, in addition, changes in the pitch lag are not adequately tracked. Experiments involving various window shapes (including windows dynamically adapted to the pitch lag) confirm that the performance of the resulting pitch predictors is poor. Furthermore, there is no guarantee that the synthesis filter $H_P(z)$ derived using the autocorrelation method is stable if the filter has more than a single tap.³
The covariance method yields high prediction gains, but may give unstable pitch synthesis filters. Specifically, for three-tap pitch predictors in an F-P cascade, the system of equations is
$$\begin{bmatrix} \phi(M, M) & \phi(M, M+1) & \phi(M, M+2) \\ \phi(M+1, M) & \phi(M+1, M+1) & \phi(M+1, M+2) \\ \phi(M+2, M) & \phi(M+2, M+1) & \phi(M+2, M+2) \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} = \begin{bmatrix} \phi(0, M) \\ \phi(0, M+1) \\ \phi(0, M+2) \end{bmatrix} \tag{11}$$
where
$$\phi(i,j) = \sum_{n=0}^{N-1} d(n-i)\, d(n-j) \tag{12}$$
and $d(n)$ is the input signal to the pitch predictor. The matrix $\Phi$ is not Toeplitz in general, and the Cholesky decomposition can be used to solve the system of equations. For reasonable frame sizes and the small number of taps used for pitch filters, $\Phi$ can be modified to become Toeplitz with little loss in prediction gain. Then, the general form of the Levinson recursion can be used to determine the predictor coefficients. Note that the Toeplitz nature of $\Phi$ does not guarantee that $H_P(z)$ is stable, but does allow for a more efficient solution of the system of equations. Stabilization schemes can be employed whenever $H_P(z)$ is found to be unstable. Stabilization of the pitch filter is simple to implement and is derived from a computationally efficient stability test [8]. The degradation in average pitch prediction gain due to stabilization has been found to be small for an F-P cascade [8].

³In our limited experiments with pitch filters derived using an autocorrelation formulation, no instability was observed.

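A minimal sketch of (11) and (12) for the three-tap case follows. It assumes `d` holds the pitch filter input with at least $M + 2$ samples of history before the frame start, and it solves the 3 x 3 system by plain Gaussian elimination with partial pivoting rather than by the Cholesky or Levinson solvers mentioned above. The stability check is only the simple sufficient condition $|\beta_1| + |\beta_2| + |\beta_3| < 1$, which forces $|P(z)| < 1$ on and outside the unit circle; it is not the stability test of [8], and a filter that fails it may still be stable. All function names are hypothetical.

```python
def pitch_covariance_3tap(d, start, N, M):
    """Solve (11) for beta_1..beta_3 with phi(i, j) from (12) over one frame."""
    def phi(i, j):
        return sum(d[start + n - i] * d[start + n - j] for n in range(N))

    lags = [M, M + 1, M + 2]
    A = [[phi(i, j) for j in lags] + [phi(0, i)] for i in lags]  # augmented matrix
    # Gaussian elimination with partial pivoting on the 3 x 4 augmented matrix.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        if A[col][col] == 0.0:
            raise ValueError("singular covariance matrix")
        for r in range(col + 1, 3):
            m = A[r][col] / A[col][col]
            A[r] = [v - m * p for v, p in zip(A[r], A[col])]
    beta = [0.0, 0.0, 0.0]
    for r in reversed(range(3)):
        beta[r] = (A[r][3] - sum(A[r][c] * beta[c] for c in range(r + 1, 3))) / A[r][r]
    return beta


def surely_stable(beta):
    """Sufficient (not necessary) condition for H_P(z) = 1/(1 - P(z)) to be stable."""
    return sum(abs(b) for b in beta) < 1.0


# Hypothetical residual that follows a three-tap pitch relation with lag M = 25.
M = 25
d = [((n * 11) % 17) - 8.0 for n in range(M + 2)]
for n in range(M + 2, 150):
    d.append(0.4 * d[n - M] + 0.3 * d[n - M - 1] + 0.1 * d[n - M - 2])
beta = pitch_covariance_3tap(d, start=40, N=80, M=M)  # approximately [0.4, 0.3, 0.1]
```

Frames whose solution fails `surely_stable` are the ones that would need the full stability test and, if necessary, the stabilization described above.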
B. Lattice Pitch Filters
The Burg method works well for a formant predictor. Here, the technique is used to develop a lattice implementation for a pitch predictor that is used in cascade connections. The basic motivation for using the Burg approach is that the corresponding synthesis filter is stable. Hence, no stability test or fix-up is necessary. Even though a lattice filter involves more computations per sample than its transversal counterpart, it offers a computational advantage in this context: the computation required for a stability test and the consequent stabilization is saved.
Given the value of $M$, the reflection coefficients $K_i$ for $i = 1$ to $M - 1$ are set to zero. The first nonzero coefficient in the pitch filter is $K_M$. In the case of formant filters and one-coefficient pitch filters, the impulse response of a lattice prediction error filter has the same form as that for a transversal filter. However, two- and three-coefficient lattice pitch prediction error filters do not have the same transfer functions as the transversal structured filters. The transfer functions of lattice prediction error filters are given by
$$A(z) = \begin{cases} 1 - K_M z^{-M}, & N_p = 1 \\ 1 + K_M K_{M+1} z^{-1} - K_M z^{-M} - K_{M+1} z^{-(M+1)}, & N_p = 2 \\ 1 + (K_M K_{M+1} + K_{M+1} K_{M+2}) z^{-1} + K_M K_{M+2} z^{-2} - K_M z^{-M} - (K_{M+1} + K_M K_{M+1} K_{M+2}) z^{-(M+1)} - K_{M+2} z^{-(M+2)}, & N_p = 3. \end{cases} \tag{13}$$
The two- and three-coefficient lattice filters have terms in $z^{-1}$ and $z^{-2}$ which are absent in the corresponding transversal filters. Note, however, that in the case of the three-coefficient lattice filter, the reflection coefficients control the five nontrivial impulse response values, hence giving a configuration with only three degrees of freedom.
In a pitch filter, the pitch lag changes from frame to frame. This variation of the position of the nonzero lattice coefficients can be detrimental to the performance. Consider the case when the pitch lag increases from one frame to another. In the new frame, the backward residual in the lattice will have been filtered by both the old coefficients and the new coefficients. A remedy for this problem is to reset the backward residual to the delayed filter input signal at each frame boundary. In addition, it is beneficial to back up the filter at each frame boundary by $N_p - 1$ samples to allow a "warm-up" period before generating actual output samples.⁴ This resets the memory in the filter to be the same as if the filter had been used for the infinite past. This strategy will be used in the implementations.

V. COMPARISON OF TECHNIQUES
The various predictor configurations were tested using the analysis phase of a general speech coder, such as shown in Fig. 1(a), as a test bed. In comparing different configurations and algorithms involving a pitch filter, prediction gain will be used as the performance measure. The prediction gain is used to avoid tying down the results to a specific type of coder. The aim is to assess the extent to which the predictors remove redundancies by measuring the energy of the resulting residual. For a general predictor, the prediction gain is the ratio of the average energy at the input to that predictor to the average energy of the prediction residual. For the system shown in Fig. 1(a), the formant gain is
$$G_F = \frac{\sum_n s^2(n)}{\sum_n d^2(n)}. \tag{14}$$
A similar formula applies to the pitch gain. The overall prediction gain is the prediction gain for the cascade of the filters.
For the present results, the value of pitch lag $M$ is chosen to be the one that gives the highest prediction gain. Although an exhaustive search for the best $M$ is not computationally practical, this approach will provide some insight into the relative performance of the various configurations. Also, for the present, the pitch filter is not stabilized. The issue of stability is deferred to Section VI.
The conditions common to all experiments involve the use of a formant predictor with ten coefficients and a pitch predictor with one, two, or three coefficients. The input speech samples comprise six utterances, three spoken by a male and three spoken by a female. The speech database consists of high-quality recordings (low-pass filtered to just below 4 kHz) at a sampling frequency of 8 kHz. The relevant average prediction gains for each sentence were computed and converted to decibels. Since the relative ordering of the methods is more or less preserved for each sentence, the tables present averages across the sentences of these decibel values.

⁴This strategy is equivalent to converting the reflection coefficients to direct form coefficients [see (13)] and then implementing the predictor in transversal form.

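For a one-tap covariance pitch predictor, the exhaustive search over $M$ has a closed form. Reducing (11)-(12) to one tap gives $\beta = \phi(0,M)/\phi(M,M)$ and a residual energy of $\phi(0,0) - \phi(0,M)^2/\phi(M,M)$, so maximizing the pitch prediction gain amounts to maximizing $\phi(0,M)^2/\phi(M,M)$ over the lag range. The following is a minimal Python sketch, assuming the lag range of 20 to 120 used in the experiments of Section V-A and a residual buffer `d` with at least 120 samples of history before the frame; the function name is hypothetical, and multitap or lattice filters would instead need the full gain evaluated for each candidate lag.

```python
import math


def best_one_tap_lag(d, start, N, M_min=20, M_max=120):
    """Exhaustive search for the one-tap pitch lag maximizing prediction gain.

    Returns (M, beta, gain_dB) for the frame d[start : start + N].
    """
    def phi(i, j):
        return sum(d[start + n - i] * d[start + n - j] for n in range(N))

    e0 = phi(0, 0)  # energy of the pitch filter input over the frame
    best = (None, 0.0, 0.0)
    best_score = -1.0
    for M in range(M_min, M_max + 1):
        pmm = phi(M, M)
        if pmm <= 0.0:
            continue
        p0m = phi(0, M)
        score = p0m * p0m / pmm  # energy removed by the optimal single tap
        if score > best_score:
            best_score = score
            residual = e0 - score
            gain = math.inf if residual <= 0.0 else 10.0 * math.log10(e0 / residual)
            best = (M, p0m / pmm, gain)
    return best


# Hypothetical input that repeats exactly every 57 samples.
pattern = [((k * k * 7 + 3 * k) % 61) - 30.0 for k in range(57)]
d = [pattern[n % 57] for n in range(400)]
M, beta, gain_db = best_one_tap_lag(d, start=200, N=160)  # M = 57, beta = 1
```

Ties are resolved in favour of the smallest lag, so the multiple 114 of the true period is not selected even though it predicts equally well.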
TABLE I
PREDICTION GAINS FOR FORMANT/PITCH PREDICTORS (80-SAMPLE FRAMES). THREE NUMBERS IN AN ENTRY REFER TO ONE-, TWO-, AND THREE-TAP PITCH FILTERS

Cascade  Method             Formant gain (dB)    Pitch gain (dB)      Overall gain (dB)
F-P      transversal P      16.1                 4.2   5.3   5.8      20.3  21.4  21.9
F-P      lattice P          16.1                 4.0   5.2   5.7      20.1  21.3  21.8
P-F      transversal P(1)   8.4   8.2   7.6      10.1  11.4  11.9     18.5  19.6  19.5
P-F      transversal P(2)   13.7  11.5  11.2     6.1   8.9   9.4      19.8  20.4  20.6
P-F      lattice P(1)       9.1   7.3   6.5      9.5   11.8  13.5     18.6  19.1  20.0
P-F      lattice P(2)       13.6  12.0  11.0     6.2   7.9   9.4      19.8  19.9  20.4

Notes: P(1): pitch lag chosen to maximize the pitch prediction gain. P(2): pitch lag chosen to maximize the overall prediction gain.

A. Cascade Configurations
In comparing the F-P and P-F configurations, analysis is carried out for 80-sample frames (corresponding to 10 ms intervals). This somewhat rapid update allows for a higher prediction gain and tends to illustrate the differences between schemes more clearly. The range of pitch lags is set to cover the range for both male and female speakers. Minimum and maximum values for $M$ of 20 and 120 are used.
Table I shows a comparison of several techniques for one-, two-, and three-tap pitch filters. The formant predictor is implemented in transversal form with the coefficients determined by using the modified covariance method. The pitch filter is implemented in transversal form (covariance method) or lattice form (Burg method). In the case of the F-P cascade, the pitch lag is that which maximizes the pitch prediction gain and hence also the overall prediction gain. Only a single figure appears for the formant prediction gain since the formant filter is unaffected by the choice of pitch coefficients and pitch lag.
For the P-F cascade, there is more of an interaction between the pitch predictor and the formant predictor. Values are given for the case in which the pitch lag is chosen to maximize the prediction gain for the pitch filter alone and the case in which the pitch lag is chosen to maximize the overall prediction gain. The myopic view of choosing the pitch lag to maximize only the pitch prediction gain gives the situation in which the pitch predictor has a higher prediction gain than the formant predictor. This phenomenon has also been observed in [9]. The situation reverses when the pitch lag is chosen to maximize the overall prediction gain. It should be noted here that the search for the best lag to maximize the overall prediction gain is impractically complex. Note also that as the number of taps in the pitch predictor is increased, there is an increase in the pitch prediction gain at the expense of a decrease in the formant prediction gain.
Note that the F-P cascade consistently outperforms the P-F cascade in average overall prediction gain, and even more so for the myopic choice of pitch lag.⁵
There is only a small difference between the lattice implementation of a pitch predictor and a transversal implementation, in spite of these forms having different impulse responses. In fact, the lattice form for two- and three-tap filters in the F-P cascade slightly outperforms the transversal form for the utterances spoken by females. Examination of the final residual formed after formant and pitch prediction verifies that the pitch pulses are effectively removed by a lattice pitch filter.
For the previous experiments, a modified covariance approach guarantees a minimum-phase formant prediction error filter. Ancillary experiments were conducted to compare different options for the formant filter in an F-P cascade. A lattice implementation of the formant filter using the reflection coefficients determined by a modified covariance method results in essentially no change in prediction gain. Using a covariance formulation (implemented in transversal form) for the formant filter improves the prediction gain by only 0.1 dB, but introduces nonminimum-phase formant filters in about 4 percent of the frames.

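Because the overall gain of a cascade is the product of the stage gains, e.g. $\sum s^2 / \sum e^2 = (\sum s^2/\sum d^2)(\sum d^2/\sum e^2)$ for the F-P case, the gains add when expressed in decibels; in the F-P rows of Table I, for example, 16.1 + 4.2 = 20.3 dB. A minimal sketch of (14) in decibels, using hypothetical names `s`, `d`, and `e` for the input, the formant residual, and the final residual of one frame:

```python
import math


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def prediction_gain_db(inp, residual):
    """Prediction gain of (14) in dB: 10 log10(sum inp^2 / sum residual^2)."""
    return 10.0 * math.log10(energy(inp) / energy(residual))


def cascade_gains_db(s, d, e):
    """Formant, pitch, and overall gains (dB) for an F-P cascade s -> d -> e."""
    gf = prediction_gain_db(s, d)
    gp = prediction_gain_db(d, e)
    overall = prediction_gain_db(s, e)
    return gf, gp, overall


# Hypothetical frame energies 100, 10, and 2.5.
gf, gp, overall = cascade_gains_db([8.0, 6.0], [3.0, 1.0], [1.5, 0.5])
```

Here the formant gain is 10 dB, the pitch gain is about 6.02 dB, and the overall gain is their sum, about 16.02 dB.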
B. Formant-Pitch Interaction: Effects of Frame Size
The success of the pitch predictor depends on the formant residual having adjacent pitch pulses which are similar in shape. Yet if the formant predictor varies significantly from frame to frame, the pitch pulses may differ in detail. For short frames, the formant predictor coefficients may change significantly from frame to frame just due to the asynchronism between the frames and the positions of the pitch pulses. An investigation was made of the variation in prediction gain with changes in the sizes of the analysis frames for the formant and pitch filters. We return to the F-P cascade with the modified covariance method used to determine the coefficients of the transversal formant predictor. Also, the covariance approach determines the coefficients of a transversal pitch predictor. The results for different combinations of frame sizes are shown in Table II.
Consider the performance of the pitch filter with a 40-sample analysis frame. The pitch gain increases as the length of the analysis frame for the formant predictor increases from 40 to 80 and then levels off for a 160-sample formant analysis frame. At the same time, since the formant prediction gain does not change significantly with the frame size, the overall prediction gain initially rises and then levels off.⁶ For the 80-sample pitch analysis frame, the performance is again essentially constant with a change of the formant frame size from 80 to 160. Because the prediction gain remains high at the slow formant update rates, these rates are to be preferred: they involve less computation and require a smaller bandwidth for transmission. The number of frames with unstable pitch synthesis filters also depends on the frame size combination chosen. But as shown in the next

⁵This ordering is also true for each utterance, except for one utterance in which the P-