Saint Lawrence Communications
Exhibit 2019
IPR2017-01077
`
`
AT&T

DIGITAL PROCESSING
OF
SPEECH SIGNALS

Lawrence R. Rabiner
Ronald W. Schafer

Acoustics Research Laboratory
Bell Telephone Laboratories
Murray Hill, New Jersey

School of Electrical Engineering
Georgia Institute of Technology
Atlanta, Georgia

Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632
`
`
PRENTICE-HALL SIGNAL PROCESSING SERIES

Alan V. Oppenheim, Editor

ANDREWS and HUNT  Digital Image Restoration
BRIGHAM  The Fast Fourier Transform
BURDIC  Underwater Acoustic System Analysis
CASTLEMAN  Digital Image Processing
CROCHIERE and RABINER  Multirate Digital Signal Processing
DUDGEON and MERSEREAU  Multidimensional Digital Signal Processing
HAMMING  Digital Filters, 2e
HAYKIN, ED.  Array Signal Processing
LEA, ED.  Trends in Speech Recognition
LIM, ED.  Speech Enhancement
MCCLELLAN and RADER  Number Theory in Digital Signal Processing
OPPENHEIM, ED.  Applications of Digital Signal Processing
OPPENHEIM, WILLSKY, with YOUNG  Signals and Systems
OPPENHEIM and SCHAFER  Digital Signal Processing
RABINER and GOLD  Theory and Applications of Digital Signal Processing
RABINER and SCHAFER  Digital Processing of Speech Signals
ROBINSON and TREITEL  Geophysical Signal Analysis
TRIBOLET  Seismic Applications of Homomorphic Signal Processing
`
`
`
Library of Congress Cataloging in Publication Data

Rabiner, Lawrence R.
    Digital processing of speech signals.
    (Prentice-Hall signal processing series)
    Includes bibliographies and index.
    1. Speech processing systems.  2. Digital electronics.
    I. Schafer, Ronald W., joint author.  II. Title.
ISBN 0-13-213603-1
`
© 1978 by Bell Laboratories, Incorporated

All rights reserved. No part of this book
may be reproduced in any form or
by any means without permission in writing
from the publisher and Bell Laboratories.

Printed in the United States of America

19 18 17 16 15 14

PRENTICE-HALL INTERNATIONAL, INC., London
PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
PRENTICE-HALL OF SOUTHEAST ASIA PTE. LTD., Singapore
WHITEHALL BOOKS LIMITED, Wellington, New Zealand
`
To our parents,
Dr. and Mrs. Nathan Rabiner
and
Mr. and Mrs. William Schafer,
for instilling within us the thirst for knowledge
and the quest for excellence;

and to our families,
Suzanne, Sheri, and Wendi Rabiner
and
Dorothy, Bill, John, and Kate Schafer,

for their love, encouragement, and support.
`
`
`
`
`
8

Linear Predictive Coding
of Speech
`
8.0 Introduction

One of the most powerful speech analysis techniques is the method of linear predictive analysis. This method has become the predominant technique for estimating the basic speech parameters, e.g., pitch, formants, spectra, vocal tract area functions, and for representing speech for low bit rate transmission or storage. The importance of this method lies both in its ability to provide extremely accurate estimates of the speech parameters, and in its relative speed of computation. In this chapter, we present a formulation of the ideas behind linear prediction, and discuss some of the issues which are involved in using it in practical speech applications.

The basic idea behind linear predictive analysis is that a speech sample can be approximated as a linear combination of past speech samples. By minimizing the sum of the squared differences (over a finite interval) between the actual speech samples and the linearly predicted ones, a unique set of predictor coefficients can be determined. (The predictor coefficients are the weighting coefficients used in the linear combination.)

The philosophy of linear prediction is intimately related to the basic speech synthesis model discussed in Chapter 3 in which it was shown that speech can be modelled as the output of a linear, time-varying system excited by either quasi-periodic pulses (during voiced speech), or random noise (during unvoiced speech). The linear prediction method provides a robust, reliable, and accurate method for estimating the parameters that characterize the linear, time-varying system.
`
`
`
Linear predictive techniques have already been discussed in the context of the waveform quantization methods of Chapter 5. There it was suggested that a linear predictor could be applied in a differential quantization scheme to reduce the bit rate of the digital representation of the speech waveform. In fact, the mathematical basis for an adaptive high order predictor used for DPCM waveform coding is identical to the analysis that we shall present in this chapter. In adaptive DPCM coding the emphasis is on finding a predictor that will reduce the variance of the difference signal so that quantization error can also be reduced. In this chapter we take a more general viewpoint and show how the basic linear prediction idea leads to a set of analysis techniques that can be used to estimate parameters of a speech model. This general set of linear predictive analysis techniques is often referred to as linear predictive coding or LPC.

The techniques and methods of linear prediction have been available in the engineering literature for a long time. The ideas of linear prediction have been in use in the areas of control and information theory under the names of system estimation and system identification. The term system identification is particularly descriptive of LPC methods in that once the predictor coefficients have been obtained, the system has been uniquely identified to the extent that it can be modelled as an all-pole linear system.

As applied to speech processing, the term linear prediction refers to a variety of essentially equivalent formulations of the problem of modelling the speech waveform [1-18]. The differences among these formulations are often those of philosophy or way of viewing the problem. In other cases the differences concern the details of the computations used to obtain the predictor coefficients. Thus as applied to speech, the various (often equivalent) formulations of linear prediction analysis have been:

1. the covariance method [3]
2. the autocorrelation formulation [1,2,9]
3. the lattice method [11,12]
4. the inverse filter formulation [1]
5. the spectral estimation formulation [12]
6. the maximum likelihood formulation [4,6]
7. the inner product formulation [1]

In this chapter we will examine in detail the similarities and differences among only the first three basic methods of analysis listed above, since all the other formulations are equivalent to one of these three.

The importance of linear prediction lies in the accuracy with which the basic model applies to speech. Thus a major part of this chapter is devoted to a discussion of how a variety of speech parameters can be reliably estimated using linear prediction methods. Furthermore, some typical examples of speech applications which rely primarily on linear predictive analysis are discussed here, and in Chapter 9, to show the wide range of problems to which LPC has been successfully applied.
`
`
`
[Figure: an impulse train generator (controlled by the pitch period) and a random noise generator feed a voiced/unvoiced switch; the switch output u(n), scaled by the gain G, drives a time-varying digital filter (controlled by the vocal tract parameters) whose output is the speech signal s(n).]

Fig. 8.1 Block diagram of simplified model for speech production.
`
8.1 Basic Principles of Linear Predictive Analysis

Throughout this book we have repeatedly referred to the basic discrete-time model for speech production that was developed in Chapter 3. The particular form of this model that is appropriate for the discussion of linear predictive analysis is depicted in Fig. 8.1. In this case, the composite spectrum effects of radiation, vocal tract, and glottal excitation are represented by a time-varying digital filter whose steady-state system function is of the form

$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \tag{8.1}$$

This system is excited by an impulse train for voiced speech or a random noise sequence for unvoiced speech. Thus, the parameters of this model are: voiced/unvoiced classification, pitch period for voiced speech, gain parameter G, and the coefficients {a_k} of the digital filter. These parameters, of course, all vary slowly with time.

The pitch period and voiced/unvoiced classification can be estimated using one of the many methods already discussed in this book or by methods based on linear predictive analysis to be discussed later in this chapter. As discussed in Chapter 3, this simplified all-pole model is a natural representation of non-nasal voiced sounds, but for nasals and fricative sounds, the detailed acoustic theory calls for both poles and zeros in the vocal tract transfer function. We shall see, however, that if the order p is high enough, the all-pole model provides a good representation for almost all the sounds of speech. The major advantage of this model is that the gain parameter, G, and the filter coefficients {a_k} can be estimated in a very straightforward and computationally efficient manner by the method of linear predictive analysis.
`
For the system of Fig. 8.1, the speech samples s(n) are related to the excitation u(n) by the simple difference equation

$$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G\,u(n) \tag{8.2}$$
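The difference equation above can be simulated directly. The sketch below is our own illustration, not from the text; the second-order coefficients and the 80-sample "pitch period" are arbitrary choices used to generate a synthetic voiced-like segment from an impulse-train excitation:

```python
import numpy as np

def synthesize(a, excitation, G=1.0):
    """Run the difference equation s(n) = sum_k a_k s(n-k) + G u(n)."""
    p = len(a)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        for k in range(1, p + 1):
            if n - k >= 0:
                s[n] += a[k - 1] * s[n - k]
        s[n] += G * excitation[n]
    return s

# Illustrative 2nd-order resonator (pole pair inside the unit circle)
a = [1.3, -0.9]                    # hypothetical model coefficients a_1, a_2
u = np.zeros(200); u[::80] = 1.0   # impulse train, 80-sample "pitch period"
s = synthesize(a, u)
```

For unvoiced speech one would substitute a random noise sequence for the impulse train, as in Fig. 8.1.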
`
A linear predictor with prediction coefficients α_k is defined as a system whose output is

$$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k s(n-k) \tag{8.3}$$
`
Such systems were used in Chapter 5 to reduce the variance of the difference signal in differential quantization schemes. The system function of a p-th order linear predictor is the polynomial

$$P(z) = \sum_{k=1}^{p} \alpha_k z^{-k} \tag{8.4}$$

The prediction error, e(n), is defined as

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k s(n-k) \tag{8.5}$$

From Eq. (8.5) it can be seen that the prediction error sequence is the output of a system whose transfer function is

$$A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k} \tag{8.6}$$

It can be seen by comparing Eqs. (8.2) and (8.5) that if the speech signal obeys the model of Eq. (8.2) exactly, and if α_k = a_k, then e(n) = Gu(n). Thus, the prediction error filter, A(z), will be an inverse filter for the system, H(z), of Eq. (8.1), i.e.,

$$H(z) = \frac{G}{A(z)} \tag{8.7}$$
`
The basic problem of linear prediction analysis is to determine a set of predictor coefficients {α_k} directly from the speech signal in such a manner as to obtain a good estimate of the spectral properties of the speech signal through the use of Eq. (8.7). Because of the time-varying nature of the speech signal, the predictor coefficients must be estimated from short segments of the speech signal. The basic approach is to find a set of predictor coefficients that will minimize the mean-squared prediction error over a short segment of the speech waveform. The resulting parameters are then assumed to be the parameters of the system function, H(z), in the model for speech production.
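The inverse-filter relation can be made concrete with a small sketch (ours; the second-order coefficients are hypothetical): generate a signal from the model of Eq. (8.2), then pass it through A(z). When the predictor coefficients match the model coefficients, the output reproduces the excitation:

```python
import numpy as np

def prediction_error(s, alpha):
    """e(n) = s(n) - sum_k alpha_k s(n-k): output of the inverse filter A(z)."""
    p = len(alpha)
    e = np.zeros(len(s))
    for n in range(len(s)):
        e[n] = s[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                e[n] -= alpha[k - 1] * s[n - k]
    return e

# If s was generated by s(n) = 1.3 s(n-1) - 0.9 s(n-2) + u(n),
# then with alpha equal to the model coefficients the error equals u(n).
u = np.zeros(100); u[0] = 1.0
s = np.zeros(100)
for n in range(100):
    s[n] = (1.3 * s[n-1] if n >= 1 else 0) + (-0.9 * s[n-2] if n >= 2 else 0) + u[n]
e = prediction_error(s, [1.3, -0.9])
# e matches u to within floating-point rounding
```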
That this approach will lead to useful results may not be immediately obvious, but it can be justified in several ways. First, recall that if α_k = a_k, then e(n) = Gu(n). For voiced speech this means that e(n) would consist of a train of impulses; i.e., e(n) would be small most of the time. Thus, finding α_k's that minimize the prediction error seems consistent with this observation. A second motivation for this approach follows from the fact that if a signal is generated by Eq. (8.2) with non-time-varying coefficients and excited either by a single impulse or by a stationary white noise input, then it can be shown that the predictor coefficients that result from minimizing the mean squared prediction error (over all time) are identical to the coefficients of Eq. (8.2). A third
`
very pragmatic justification for using the minimum mean-squared prediction error as a basis for estimating the model parameters is that this approach leads to a set of linear equations that can be efficiently solved to obtain the predictor parameters. More importantly, the resulting parameters comprise a very useful and accurate representation of the speech signal, as we shall see in this chapter.
The short-time average prediction error is defined as

$$E_n = \sum_{m} e_n^2(m) \tag{8.8}$$

$$= \sum_{m} \left( s_n(m) - \tilde{s}_n(m) \right)^2 \tag{8.9}$$

$$= \sum_{m} \left( s_n(m) - \sum_{k=1}^{p} \alpha_k s_n(m-k) \right)^2 \tag{8.10}$$

where s_n(m) is a segment of speech that has been selected in the vicinity of sample n, i.e.,

$$s_n(m) = s(m+n) \tag{8.11}$$

The range of summation in Eqs. (8.8)-(8.10) is temporarily left unspecified, but since we wish to develop a short-time analysis technique, the sum will always be over a finite interval. Also note that to obtain an average we should divide by the length of the speech segment. However, this constant is irrelevant to the set of linear equations that we will obtain and therefore is omitted. We can find the values of α_k that minimize E_n in Eq. (8.10) by setting ∂E_n/∂α_i = 0, i = 1, 2, ..., p, thereby obtaining the equations

$$\sum_{m} s_n(m-i)\,s_n(m) = \sum_{k=1}^{p} \hat{\alpha}_k \sum_{m} s_n(m-i)\,s_n(m-k), \qquad 1 \leq i \leq p \tag{8.12}$$
`
where α̂_k are the values of α_k that minimize E_n. (Since α̂_k is unique, we will drop the caret and use the notation α_k to denote the values that minimize E_n.) If we define

$$\phi_n(i,k) = \sum_{m} s_n(m-i)\,s_n(m-k) \tag{8.13}$$

then Eq. (8.12) can be written more compactly as

$$\sum_{k=1}^{p} \alpha_k \phi_n(i,k) = \phi_n(i,0), \qquad i = 1, 2, \ldots, p \tag{8.14}$$
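Equation (8.14) is just a p×p linear system once the φ_n(i,k) have been computed. As a rough illustration (our own sketch; the signal is a synthetic second-order autoregression, and the summation range is one arbitrary finite choice), one can form φ by brute force from Eq. (8.13) and solve for the predictor:

```python
import numpy as np

def solve_normal_equations(s, p):
    """Build phi(i,k) = sum_m s(m-i)s(m-k) over m = p..len(s)-1 (one finite
    choice of interval) and solve Eq. (8.14) for the predictor coefficients."""
    m = np.arange(p, len(s))
    phi = np.zeros((p + 1, p + 1))
    for i in range(p + 1):
        for k in range(p + 1):
            phi[i, k] = np.dot(s[m - i], s[m - k])
    # Eq. (8.14): rows/cols 1..p against column 0
    alpha = np.linalg.solve(phi[1:, 1:], phi[1:, 0])
    return alpha

# Synthetic AR(2) signal; the estimate should land near the true coefficients
rng = np.random.default_rng(0)
s = np.zeros(5000)
for n in range(2, 5000):
    s[n] = 1.3 * s[n-1] - 0.9 * s[n-2] + rng.standard_normal()
alpha = solve_normal_equations(s, 2)
```

With a few thousand samples the estimates land close to the generating coefficients 1.3 and -0.9.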
`
This set of p equations in p unknowns can be solved in an efficient manner for the unknown predictor coefficients {α_k} that minimize the average squared prediction error for the segment s_n(m).¹ Using Eqs. (8.10) and (8.12), the minimum mean-squared prediction error can be shown to be

$$E_n = \sum_{m} s_n^2(m) - \sum_{k=1}^{p} \alpha_k \sum_{m} s_n(m)\,s_n(m-k) \tag{8.15}$$

and using Eq. (8.14) we can express E_n as

$$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k \phi_n(0,k) \tag{8.16}$$

¹It is clear that the α_k's are functions of n (the time index at which they are estimated) although this dependence will not be explicitly shown. We shall also find it advantageous to drop the subscripts n on E_n, s_n(m), and φ_n(i,k) when no confusion will result.
`
Thus the total minimum error consists of a fixed component, and a component which depends on the predictor coefficients.

To solve for the optimum predictor coefficients, we must first compute the quantities φ_n(i,k) for 1 ≤ i ≤ p and 0 ≤ k ≤ p. Once this is done we only have to solve Eq. (8.14) to obtain the α_k's. Thus, in principle, linear prediction analysis is very straightforward. However, the details of the computation of φ_n(i,k) and the subsequent solution of the equations are somewhat intricate, and further discussion is required.

So far we have not explicitly indicated the limits on the sums in Eqs. (8.8)-(8.10) and in Eq. (8.12); however, it should be emphasized that the limits on the sum in Eq. (8.12) are identical to the limits assumed for the mean squared prediction error in Eqs. (8.8)-(8.10). As we have stated, if we wish to develop a short-time analysis procedure, the limits must be over a finite interval. There are two basic approaches to this question, and we shall see below that two methods for linear predictive analysis emerge out of a consideration of the limits of summation and the definition of the waveform segment s_n(m).
`
8.1.1 The autocorrelation method [1,2,5]

One approach to determining the limits on the sums in Eqs. (8.8)-(8.10) and Eq. (8.12) is to assume that the waveform segment, s_n(m), is identically zero outside the interval 0 ≤ m ≤ N-1. This can be conveniently expressed as

$$s_n(m) = s(m+n)\,w(m) \tag{8.17}$$

where w(m) is a finite length window (e.g., a Hamming window) that is identically zero outside the interval 0 ≤ m ≤ N-1.

The effect of this assumption on the question of limits of summation for the expressions for E_n can be seen by considering Eq. (8.5). Clearly, if s_n(m) is nonzero only for 0 ≤ m ≤ N-1, then the corresponding prediction error, e_n(m), for a p-th order predictor will be nonzero over the interval 0 ≤ m ≤ N-1+p. Thus, for this case E_n is properly expressed as

$$E_n = \sum_{m=0}^{N+p-1} e_n^2(m) \tag{8.18}$$

Alternatively, we could have simply indicated that the sum should be over all nonzero values by summing from -∞ to +∞ [2].
`
Returning to Eq. (8.5), it can be seen that the prediction error is likely to be large at the beginning of the interval (specifically 0 ≤ m ≤ p-1) because we are trying to predict the signal from samples that have arbitrarily been set to zero. Likewise the error can be large at the end of the interval (specifically N ≤ m ≤ N+p-1) because we are trying to predict zero from samples that are nonzero. For this reason, a window which tapers the segment, s_n(m), to zero is generally used for w(m) in Eq. (8.17).

The limits on the expression for φ_n(i,k) in Eq. (8.13) are identical to those of Eq. (8.18). However, because s_n(m) is identically zero outside the interval 0 ≤ m ≤ N-1, it is simple to show that

$$\phi_n(i,k) = \sum_{m=0}^{N+p-1} s_n(m-i)\,s_n(m-k), \qquad 1 \leq i \leq p, \quad 0 \leq k \leq p \tag{8.19a}$$

can be expressed as

$$\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} s_n(m)\,s_n(m+i-k), \qquad 1 \leq i \leq p, \quad 0 \leq k \leq p \tag{8.19b}$$

Furthermore, it can be seen that in this case φ_n(i,k) is identical to the short-time autocorrelation function of Eq. (4.30) evaluated for (i-k). That is,

$$\phi_n(i,k) = R_n(i-k) \tag{8.20}$$

where

$$R_n(k) = \sum_{m=0}^{N-1-k} s_n(m)\,s_n(m+k) \tag{8.21}$$

The computation of R_n(k) is covered in detail in Section 4.6 and thus we shall not consider such details here. Since R_n(k) is an even function, it follows that

$$\phi_n(i,k) = R_n(|i-k|) \tag{8.22}$$

Therefore Eq. (8.14) can be expressed as

$$\sum_{k=1}^{p} \alpha_k R_n(|i-k|) = R_n(i), \qquad 1 \leq i \leq p \tag{8.23}$$

Similarly, the minimum mean squared prediction error of Eq. (8.16) takes the form

$$E_n = R_n(0) - \sum_{k=1}^{p} \alpha_k R_n(k) \tag{8.24}$$

The set of equations given by Eq. (8.23) can be expressed in matrix form as

$$\begin{bmatrix}
R_n(0) & R_n(1) & R_n(2) & \cdots & R_n(p-1) \\
R_n(1) & R_n(0) & R_n(1) & \cdots & R_n(p-2) \\
R_n(2) & R_n(1) & R_n(0) & \cdots & R_n(p-3) \\
\vdots & \vdots & \vdots & & \vdots \\
R_n(p-1) & R_n(p-2) & R_n(p-3) & \cdots & R_n(0)
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_p \end{bmatrix}
=
\begin{bmatrix} R_n(1) \\ R_n(2) \\ R_n(3) \\ \vdots \\ R_n(p) \end{bmatrix} \tag{8.25}$$

The p×p matrix of autocorrelation values is a Toeplitz matrix; i.e., it is symmetric and all the elements along a given diagonal are equal. This special property will be exploited in Section 8.3 to obtain an efficient algorithm for the solution of Eq. (8.23).

8.1.2 The covariance method [3]

The second basic approach to defining the speech segment s_n(m) and the limits on the sums is to fix the interval over which the mean-squared error is computed and then consider the effect on the computation of φ_n(i,k). That is, if we define

$$E_n = \sum_{m=0}^{N-1} e_n^2(m) \tag{8.26}$$

then φ_n(i,k) becomes

$$\phi_n(i,k) = \sum_{m=0}^{N-1} s_n(m-i)\,s_n(m-k), \qquad 1 \leq i \leq p, \quad 0 \leq k \leq p \tag{8.27}$$

In this case, if we change the index of summation we can express φ_n(i,k) as either

$$\phi_n(i,k) = \sum_{m=-i}^{N-i-1} s_n(m)\,s_n(m+i-k), \qquad 1 \leq i \leq p, \quad 0 \leq k \leq p \tag{8.28a}$$

or

$$\phi_n(i,k) = \sum_{m=-k}^{N-k-1} s_n(m+k-i)\,s_n(m), \qquad 1 \leq i \leq p, \quad 0 \leq k \leq p \tag{8.28b}$$

Although these equations look very similar to Eq. (8.19b), we see that the limits of summation are not the same. Equations (8.28) call for values of s_n(m) outside the interval 0 ≤ m ≤ N-1. Indeed, to evaluate φ_n(i,k) for all of the required values of i and k requires that we use values of s_n(m) in the interval -p ≤ m ≤ N-1. If we are to be consistent with the limits on E_n in Eq. (8.26) then we have no choice but to supply the required values. In this case it does not make sense to taper the segment of speech to zero at the ends as in the autocorrelation method, since the necessary values are made available from outside the interval 0 ≤ m ≤ N-1. Clearly, this approach is very similar to what was called the modified autocorrelation function in Chapter 4. As pointed out in Section 4.6, this approach leads to a function which is not a true autocorrelation function, but rather, the cross-correlation between two very similar, but not identical, finite length segments of the speech wave. Although the differences between Eq. (8.28) and Eq. (8.19b) appear to be minor computational details, the set of equations

$$\sum_{k=1}^{p} \alpha_k \phi_n(i,k) = \phi_n(i,0), \qquad i = 1, 2, \ldots, p \tag{8.29a}$$

has significantly different properties that strongly affect the method of solution and the properties of the resulting optimum predictor. In matrix form these equations become

$$\begin{bmatrix}
\phi_n(1,1) & \phi_n(1,2) & \phi_n(1,3) & \cdots & \phi_n(1,p) \\
\phi_n(2,1) & \phi_n(2,2) & \phi_n(2,3) & \cdots & \phi_n(2,p) \\
\phi_n(3,1) & \phi_n(3,2) & \phi_n(3,3) & \cdots & \phi_n(3,p) \\
\vdots & \vdots & \vdots & & \vdots \\
\phi_n(p,1) & \phi_n(p,2) & \phi_n(p,3) & \cdots & \phi_n(p,p)
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_p \end{bmatrix}
=
\begin{bmatrix} \phi_n(1,0) \\ \phi_n(2,0) \\ \phi_n(3,0) \\ \vdots \\ \phi_n(p,0) \end{bmatrix} \tag{8.29b}$$

In this case, since φ_n(i,k) = φ_n(k,i) (see Eq. (8.28)), the p×p matrix of correlation-like values is symmetric but not Toeplitz. Indeed, it can be seen that the diagonal elements are related by the equation

$$\phi_n(i+1,k+1) = \phi_n(i,k) + s_n(-i-1)\,s_n(-k-1) - s_n(N-1-i)\,s_n(N-1-k) \tag{8.30}$$
`
The method of analysis based upon this method of computation of φ_n(i,k) has come to be known as the covariance method because the matrix of values {φ_n(i,k)} has the properties of a covariance matrix [5].²

8.1.3 Summary

It has been shown that by using different definitions of the segments of the signal to be analyzed, two distinct sets of analysis equations can be obtained. For the autocorrelation method, the signal is windowed by an N-point window, and the quantities φ_n(i,k) are obtained using a short-time autocorrelation function. The resulting matrix of correlations is Toeplitz, leading to one type of solution for the predictor coefficients. For the covariance method, the signal is assumed to be known for the set of values -p ≤ n ≤ N-1. Outside this interval no assumptions need be made about the signal, since these are the only values needed in the computation. The resulting matrix of correlations in this case is symmetric but not Toeplitz. The result is that the two methods of computing the correlations lead to different methods of solution of the analysis equations and to sets of predictor coefficients with somewhat different properties.
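The structural difference summarized above is easy to verify numerically. In this sketch (our own; the "speech" is just seeded Gaussian noise) the autocorrelation-method matrix is built from R_n(|i-k|) of a windowed N-point segment, while the covariance-method matrix uses the extra p samples to the left of the analysis interval:

```python
import numpy as np

def autocorr_matrix(s, p):
    """phi(i,k) = R(|i-k|) from an N-point (windowed) segment: Toeplitz."""
    N = len(s)
    R = np.array([np.dot(s[:N - k], s[k:]) for k in range(p + 1)])
    return np.array([[R[abs(i - k)] for k in range(1, p + 1)]
                     for i in range(1, p + 1)])

def covariance_matrix(s, p, N):
    """phi(i,k) = sum_{m=0}^{N-1} s(m-i)s(m-k); s must supply samples
    s(-p)..s(N-1), stored so that s[p + m] is sample m."""
    phi = np.zeros((p + 1, p + 1))
    for i in range(p + 1):
        for k in range(p + 1):
            phi[i, k] = sum(s[p + m - i] * s[p + m - k] for m in range(N))
    return phi

rng = np.random.default_rng(0)
x = rng.standard_normal(64)          # arbitrary samples, p extra on the left
p, N = 4, 60
Phi_ac = autocorr_matrix(x[4:] * np.hamming(N), p)
phi_cov = covariance_matrix(x, p, N)
```

The first matrix is symmetric with constant diagonals (Toeplitz); the second is symmetric but its main-diagonal entries differ.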
`
In later sections we will compare and contrast computational details and results for both these techniques as well as for another method yet to be discussed. First, however, we will show how the gain, G, in Fig. 8.1, can be determined from the prediction error expression.

²This terminology, which is firmly entrenched, is somewhat confusing since the term covariance usually refers to the correlation of a signal with its mean removed.

8.2 Computation of the Gain for the Model [2]

It is reasonable to expect that the gain, G, could be determined by matching the
`
energy in the signal with the energy of the linearly predicted samples. This indeed is true when appropriate assumptions are made about the excitation signal to the LPC system.

It is possible to relate the gain constant G to the excitation signal and the error in prediction by referring back to Eqs. (8.2) and (8.5).³ The excitation signal, Gu(n), can be expressed as

$$G\,u(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k) \tag{8.31a}$$

whereas the prediction error signal e(n) is expressed as

$$e(n) = s(n) - \sum_{k=1}^{p} \alpha_k s(n-k) \tag{8.31b}$$

In the case where α_k = a_k, i.e., the actual predictor coefficients and those of the model are identical, then

$$e(n) = G\,u(n) \tag{8.32}$$

i.e., the input signal is proportional to the error signal, with the constant of proportionality being the gain constant, G. A detailed discussion of the properties of the prediction error signal is given in Section 8.5.

Since Eq. (8.32) is only approximate (i.e., it is valid to the extent that the ideal and the actual linear prediction parameters are identical), it is generally not possible to solve for G in a reliable way directly from the error signal itself. Instead the more reasonable assumption is made that the energy in the error signal is equal to the energy in the excitation input, i.e.,

$$G^2 \sum_{m=0}^{N-1} u^2(m) = \sum_{m=0}^{N-1} e^2(m) = E_n \tag{8.33}$$
`
At this point we must make some assumptions about u(n) so as to be able to relate G to the known quantities, e.g., the α_k's and the correlation coefficients. There are two cases of interest for the excitation. For voiced speech it is reasonable to assume u(n) = δ(n), i.e., the excitation is a unit sample at n = 0.⁴ For this assumption to be valid requires that the effects of the glottal pulse shape used in the actual excitation for voiced speech be lumped together with the vocal tract transfer function, and therefore both of these effects are essentially modelled by the time-varying linear predictor. This requires that the predictor order, p, be large enough to account for both the vocal tract and glottal pulse effects. We will discuss the choice of predictor order in a later section. For unvoiced speech it is most reasonable to assume that u(n) is a zero mean, unity variance, stationary, white noise process.

Based on these assumptions we can now determine the gain constant G by utilizing Eq. (8.33). For voiced speech, we have as input Gδ(n). If we call the

³Note that the gain is also a function of time.

⁴Note that for this assumption to be valid requires that the analysis interval be about the same length as a pitch period.
`
`
`
`
`
resulting output for this particular input h(n) (since it is actually the impulse response of the system with transfer function H(z) as in Eq. (8.1)), we get the relation

$$h(n) = \sum_{k=1}^{p} a_k h(n-k) + G\,\delta(n) \tag{8.34}$$

It is readily shown [Problem 8.1] that the autocorrelation function of h(n), defined as

$$R(m) = \sum_{n=0}^{\infty} h(n)\,h(m+n) \tag{8.35}$$

satisfies the relations

$$R(m) = \sum_{k=1}^{p} a_k R(|m-k|), \qquad m = 1, 2, \ldots, p \tag{8.36a}$$

and

$$R(0) = \sum_{k=1}^{p} a_k R(k) + G^2 \tag{8.36b}$$

Since Eqs. (8.36) are identical to Eqs. (8.23), it follows that

$$R(m) = R_n(m), \qquad 1 \leq m \leq p \tag{8.37}$$

Since the total energies in the signal (R_n(0)) and the impulse response (R(0)) must be equal, we can use Eqs. (8.24), (8.33) and (8.36b) to obtain

$$G^2 = R_n(0) - \sum_{k=1}^{p} a_k R_n(k) = E_n \tag{8.38}$$

It is interesting to note that Eq. (8.37) and the requirement that the energy of the impulse response be equal to the energy of the signal together require that the first p+1 coefficients of the autocorrelation function of the impulse response of the model are identical to the first p+1 coefficients of the autocorrelation function of the speech signal.

For the case of unvoiced speech, the correlations are defined as statistical averages. It is assumed that the input is white noise with zero mean and unity variance; i.e.,

$$E[u(n)\,u(n-m)] = \delta(m) \tag{8.39}$$

If we excite the system with the random input Gu(n) and call the output g(n), then

$$g(n) = \sum_{k=1}^{p} a_k g(n-k) + G\,u(n) \tag{8.40}$$

If we now let R(m) denote the autocorrelation function of g(n), then

$$R(m) = E[g(n)\,g(n-m)] = \sum_{k=1}^{p} a_k E[g(n-k)\,g(n-m)] + E[G\,u(n)\,g(n-m)]$$
$$= \sum_{k=1}^{p} a_k R(m-k), \qquad m \neq 0 \tag{8.41}$$

since E[u(n)g(n-m)] = 0 for m > 0 because u(n) is uncorrelated with any signal prior to u(n). For m = 0 we get

$$R(0) = \sum_{k=1}^{p} a_k R(k) + G\,E[u(n)\,g(n)]$$
$$= \sum_{k=1}^{p} a_k R(k) + G^2 \tag{8.42}$$

since E[u(n)g(n)] = E[u(n)(Gu(n) + terms prior to n)] = G. Since the energy in the response to Gu(n) must equal the energy in the signal, we get

$$R(m) = R_n(m), \qquad 0 \leq m \leq p \tag{8.43}$$

or

$$G^2 = R_n(0) - \sum_{k=1}^{p} a_k R_n(k) \tag{8.44}$$

as was the case for the impulse excitation for voiced speech.
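Equations (8.38) and (8.44) give the same recipe for the gain in both excitation cases. A minimal sketch (the autocorrelation values and predictor coefficients below are made up purely for illustration):

```python
import numpy as np

def gain_squared(R, alpha):
    """G^2 = R(0) - sum_k alpha_k R(k), per Eqs. (8.38)/(8.44)."""
    p = len(alpha)
    return R[0] - sum(alpha[k - 1] * R[k] for k in range(1, p + 1))

# Illustrative values only: R is a short-time autocorrelation sequence,
# alpha a 2nd-order predictor
R = np.array([1.0, 0.8, 0.5])
alpha = np.array([0.9, -0.2])
G2 = gain_squared(R, alpha)   # 1.0 - (0.9*0.8 + (-0.2)*0.5) = 0.38
G = np.sqrt(G2)
```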
`
8.3 Solution of the LPC Equations

In order to effectively implement a linear predictive analysis system, it is necessary to solve the linear equations in an efficient manner. Although a variety of techniques can be applied to solve a system of p linear equations in p unknowns, these techniques are not equally efficient. Because of the special properties of the coefficient matrices it is possible to solve the equations much more efficiently than is possible in general. In this section we will discuss in detail two methods for obtaining the predictor coefficients, and then we will compare and contrast several properties of these solutions.

8.3.1 Cholesky decomposition solution for the covariance method [3]

For the covariance method, the set of equations which must be solved is of the form:

$$\sum_{k=1}^{p} \alpha_k \phi_n(i,k) = \phi_n(i,0), \qquad i = 1, 2, \ldots, p \tag{8.45}$$

or in matrix notation

$$\Phi \alpha = \psi \tag{8.46}$$

where Φ is a positive definite symmetric matrix with (i,j)-th element φ_n(i,j), and α and ψ are column vectors with elements α_i and φ_n(i,0) respectively. The system of equations given by Eq. (8.45) can be solved in an efficient manner since the matrix Φ is a symmetric, positive definite matrix. The resulting method of solution is called the Cholesky decomposition (or sometimes it is
`
`
`
`
`
called the square root method) [3]. For this method the matrix Φ is expressed in the form

$$\Phi = V D V^t \tag{8.47}$$

where V is a lower triangular matrix (whose main diagonal elements are all 1's), and D is a diagonal matrix. The superscript t denotes matrix transpose. The elements of the matrices V and D are readily determined from Eq. (8.47) by solving for the (i,j)-th element of both sides of Eq. (8.47), giving

$$\phi_n(i,j) = \sum_{k=1}^{j} V_{ik}\,d_k\,V_{jk}, \qquad 1 \leq j \leq i-1 \tag{8.48}$$

or

$$V_{ij}\,d_j = \phi_n(i,j) - \sum_{k=1}^{j-1} V_{ik}\,d_k\,V_{jk}, \qquad 1 \leq j \leq i-1 \tag{8.49}$$

and, for the diagonal elements,

$$\phi_n(i,i) = \sum_{k=1}^{i} V_{ik}\,d_k\,V_{ik} \tag{8.50}$$

or

$$d_i = \phi_n(i,i) - \sum_{k=1}^{i-1} V_{ik}^2\,d_k, \qquad i \geq 2 \tag{8.51}$$

with

$$d_1 = \phi_n(1,1) \tag{8.52}$$

To illustrate the use of Eqs. (8.47)-(8.52), consider an example with p = 4 and matrix elements φ_n(i,j) = φ_ij. Equation (8.47) is thus of the form

$$\begin{bmatrix}
\phi_{11} & \phi_{21} & \phi_{31} & \phi_{41} \\
\phi_{21} & \phi_{22} & \phi_{32} & \phi_{42} \\
\phi_{31} & \phi_{32} & \phi_{33} & \phi_{43} \\
\phi_{41} & \phi_{42} & \phi_{43} & \phi_{44}
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
V_{21} & 1 & 0 & 0 \\
V_{31} & V_{32} & 1 & 0 \\
V_{41} & V_{42} & V_{43} & 1
\end{bmatrix}
\begin{bmatrix}
d_1 & 0 & 0 & 0 \\
0 & d_2 & 0 & 0 \\
0 & 0 & d_3 & 0 \\
0 & 0 & 0 & d_4
\end{bmatrix}
\begin{bmatrix}
1 & V_{21} & V_{31} & V_{41} \\
0 & 1 & V_{32} & V_{42} \\
0 & 0 & 1 & V_{43} \\
0 & 0 & 0 & 1
\end{bmatrix}$$

To solve for d_1 to d_4 and the V_ij's, we begin with Eq. (8.52) for i = 1, giving

$$d_1 = \phi_{11}$$

Using Eq. (8.49) for i = 2, 3, 4, we solve for V_21, V_31, and V_41 as

$$V_{21}\,d_1 = \phi_{21} \quad \text{or} \quad V_{21} = \phi_{21}/d_1$$
$$V_{31}\,d_1 = \phi_{31} \quad \text{or} \quad V_{31} = \phi_{31}/d_1$$
$$V_{41}\,d_1 = \phi_{41} \quad \text{or} \quad V_{41} = \phi_{41}/d_1$$

Using Eq. (8.51) for i = 2 gives

$$d_2 = \phi_{22} - V_{21}^2\,d_1$$

Using Eq. (8.49) for i = 3 and 4 gives

$$V_{32}\,d_2 = \phi_{32} - V_{31}\,d_1\,V_{21} \quad \text{or} \quad V_{32} = (\phi_{32} - V_{31}\,d_1\,V_{21})/d_2$$
$$V_{42}\,d_2 = \phi_{42} - V_{41}\,d_1\,V_{21} \quad \text{or} \quad V_{42} = (\phi_{42} - V_{41}\,d_1\,V_{21})/d_2$$

Equation (8.51) is now used for i = 3 to solve for d_3, then Eq. (8.49) is used for i = 4 to solve for V_43, and finally Eq. (8.51) is used for i = 4 to solve for d_4.
`
Once the matrices V and D have been determined, it is relatively simple to solve for the column vector α in a two-step procedure. From Eqs. (8.46) and (8.47) we get

$$V D V^t \alpha = \psi \tag{8.53}$$

which can be written as

$$V Y = \psi \tag{8.54}$$

and

$$D V^t \alpha = Y \tag{8.55}$$

or

$$V^t \alpha = D^{-1} Y \tag{8.56}$$

Thus, from the matrix V, Eq. (8.54) can be solved for the column vector Y using a simple recursion of the form

$$Y_i = \psi_i - \sum_{j=1}^{i-1} V_{ij}\,Y_j, \qquad 2 \leq i \leq p \tag{8.57}$$

with initial condition

$$Y_1 = \psi_1 \tag{8.58}$$

Having solved for Y, Eq. (8.56) can then be solved recursively for α using the relation

$$\alpha_i = Y_i/d_i - \sum_{j=i+1}^{p} V_{ji}\,\alpha_j, \qquad 1 \leq i \leq p-1 \tag{8.59}$$

with initial condition

$$\alpha_p = Y_p/d_p \tag{8.60}$$

It should be noted that the index i in Eq. (8.59) proceeds backwards from i = p-1 down to i = 1.
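The decomposition of Eqs. (8.48)-(8.52) and the two substitution passes of Eqs. (8.57)-(8.60) translate almost line for line into code. The following is our own compact sketch (a practical implementation would also guard against loss of positive definiteness):

```python
import numpy as np

def ldl_solve(Phi, psi):
    """Solve Phi a = psi for symmetric positive definite Phi via Phi = V D V^t
    (Eqs. (8.47)-(8.60)); V is unit lower triangular, D diagonal."""
    p = len(psi)
    V = np.eye(p)
    d = np.zeros(p)
    for i in range(p):
        for j in range(i):   # off-diagonal elements, Eq. (8.49)
            V[i, j] = (Phi[i, j]
                       - sum(V[i, k] * d[k] * V[j, k] for k in range(j))) / d[j]
        d[i] = Phi[i, i] - sum(V[i, k] ** 2 * d[k] for k in range(i))  # Eq. (8.51)
    # Forward substitution: V Y = psi, Eqs. (8.57)-(8.58)
    Y = np.zeros(p)
    for i in range(p):
        Y[i] = psi[i] - sum(V[i, j] * Y[j] for j in range(i))
    # Back substitution: V^t a = D^{-1} Y, Eqs. (8.59)-(8.60)
    a = np.zeros(p)
    for i in range(p - 1, -1, -1):
        a[i] = Y[i] / d[i] - sum(V[j, i] * a[j] for j in range(i + 1, p))
    return a

# Check against a direct solve on a random symmetric positive definite matrix
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Phi = A @ A.T + 4 * np.eye(4)
psi = rng.standard_normal(4)
a = ldl_solve(Phi, psi)
```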
`
`
`
`
`
To illustrate the use of Eqs. (8.57)-(8.60), we continue our previous example and first solve for the Y_i's, assuming V and D are now known. In matrix form we have the equation

$$\begin{bmatrix}
1 & 0 & 0 & 0 \\
V_{21} & 1 & 0 & 0 \\
V_{31} & V_{32} & 1 & 0 \\
V_{41} & V_{42} & V_{43} & 1
\end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix}
=
\begin{bmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{bmatrix}$$

From Eqs. (8.57) and (8.58) we get

$$Y_1 = \psi_1$$
$$Y_2 = \psi_2 - V_{21} Y_1$$
$$Y_3 = \psi_3 - V_{31} Y_1 - V_{32} Y_2$$
$$Y_4 = \psi_4 - V_{41} Y_1 - V_{42} Y_2 - V_{43} Y_3$$

From the Y_i's we solve Eq. (8.56), which is of the form

$$\begin{bmatrix}
1 & V_{21} & V_{31} & V_{41} \\
0 & 1 & V_{32} & V_{42} \\
0 & 0 & 1 & V_{43} \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{bmatrix}
=
\begin{bmatrix}
1/d_1 & 0 & 0 & 0 \\
0 & 1/d_2 & 0 & 0 \\
0 & 0 & 1/d_3 & 0 \\
0 & 0 & 0 & 1/d_4
\end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix}
=
\begin{bmatrix} Y_1/d_1 \\ Y_2/d_2 \\ Y_3/d_3 \\ Y_4/d_4 \end{bmatrix}$$

From Eqs. (8.59) and (8.60) we get

$$\alpha_4 = Y_4/d_4$$
$$\alpha_3 = Y_3/d_3 - V_{43}\,\alpha_4$$
$$\alpha_2 = Y_2/d_2 - V_{32}\,\alpha_3 - V_{42}\,\alpha_4$$
$$\alpha_1 = Y_1/d_1 - V_{21}\,\alpha_2 - V_{31}\,\alpha_3 - V_{41}\,\alpha_4$$

thus completing the solution to the covariance equations.

The use of the Cholesky decomposition procedure leads to a very simple expression for the minimum error of the covariance method in terms of the column vector Y and the matrix D. We recall that for the covariance method the prediction error E_n was of the form

$$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k \phi_n(0,k) \tag{8.61}$$

or in matrix notation

$$E_n = \phi_n(0,0) - \alpha^t \psi \tag{8.62}$$

From Eq. (8.56) we can substitute for α^t the expression Y^t D^{-1} V^{-1}, giving

$$E_n = \phi_n(0,0) - Y^t D^{-1} V^{-1} \psi \tag{8.63}$$

Using Eq. (8.54) we get

$$E_n = \phi_n(0,0) - Y^t D^{-1} Y \tag{8.64}$$

or

$$E_n = \phi_n(0,0) - \sum_{k=1}^{p} Y_k^2 / d_k \tag{8.65}$$
`
Thus the mean-squared prediction error E_n can be determined directly from the column vector Y and the matrix D. Furthermore, Eq. (8.65) can be used to give the value of E_n for any value of p up to the value of p used in solving the matrix equations. Thus one can get an idea of how the mean-squared prediction error varies with the number of predictor coefficients used in the solution.
`
8.3.2 Durbin's recursive solution for the autocorrelation equations [2]

For the autocorrelation method the matrix equation for solving for the predictor coefficients is of the form

$$\sum_{k=1}^{p} \alpha_k R_n(|i-k|) = R_n(i), \qquad 1 \leq i \leq p \tag{8.66}$$

By exploiting the Toeplitz nature of the matrix of coefficients, several efficient recursive procedures have been devised for solving this system of equations. Although the most popular
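This excerpt breaks off before stating the recursion itself; for reference, the standard Levinson-Durbin recursion that exploits the Toeplitz structure to solve Eq. (8.66) in O(p²) operations can be sketched as follows (our illustration of the well-known algorithm, not reproduced from the text):

```python
import numpy as np

def levinson_durbin(R, p):
    """Solve sum_k alpha_k R(|i-k|) = R(i), i = 1..p (Eq. (8.66)).
    R holds the autocorrelation values R(0)..R(p); returns (alpha, E),
    where E is the minimum prediction error at order p."""
    E = R[0]
    alpha = np.zeros(0)
    for i in range(1, p + 1):
        # reflection (PARCOR) coefficient at order i
        k = (R[i] - np.dot(alpha, R[i - 1:0:-1])) / E
        # update all lower-order coefficients and append the new one
        alpha = np.concatenate([alpha - k * alpha[::-1], [k]])
        E = E * (1.0 - k * k)
    return alpha, E

# Second-order check: for R = [1.0, 0.5, 0.1] the normal equations
#   a1 + 0.5 a2 = 0.5,   0.5 a1 + a2 = 0.1
# have the direct solution a1 = 0.6, a2 = -0.2
R = np.array([1.0, 0.5, 0.1])
alpha, E = levinson_durbin(R, 2)
```

The by-product E at each order is exactly the minimum error of Eq. (8.24), so the recursion also shows how the error shrinks as p grows.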