AT&T

DIGITAL PROCESSING OF SPEECH SIGNALS

Lawrence R. Rabiner
Ronald W. Schafer

Acoustics Research Laboratory
Bell Telephone Laboratories
Murray Hill, New Jersey

School of Electrical Engineering
Georgia Institute of Technology
Atlanta, Georgia

Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632
PRENTICE-HALL SIGNAL PROCESSING SERIES

Alan V. Oppenheim, Editor

ANDREWS and HUNT  Digital Image Restoration
BRIGHAM  The Fast Fourier Transform
BURDIC  Underwater Acoustic System Analysis
CASTLEMAN  Digital Image Processing
CROCHIERE and RABINER  Multirate Digital Signal Processing
DUDGEON and MERSEREAU  Multidimensional Digital Signal Processing
HAMMING  Digital Filters, 2e
HAYKIN, ED.  Array Signal Processing
LEA, ED.  Trends in Speech Recognition
LIM, ED.  Speech Enhancement
McCLELLAN and RADER  Number Theory in Digital Signal Processing
OPPENHEIM, ED.  Applications of Digital Signal Processing
OPPENHEIM, WILLSKY, with YOUNG  Signals and Systems
OPPENHEIM and SCHAFER  Digital Signal Processing
RABINER and GOLD  Theory and Applications of Digital Signal Processing
RABINER and SCHAFER  Digital Processing of Speech Signals
ROBINSON and TREITEL  Geophysical Signal Analysis
TRIBOLET  Seismic Applications of Homomorphic Signal Processing
Library of Congress Cataloging in Publication Data

Rabiner, Lawrence R. (date)
Digital processing of speech signals.
(Prentice-Hall signal processing series)
Includes bibliographies and index.
1. Speech processing systems. 2. Digital electronics.
I. Schafer, Ronald W., joint author. II. Title.
ISBN 0-13-213603-1
© 1978 by Bell Laboratories, Incorporated

All rights reserved. No part of this book may be reproduced in any form or by any means without permission in writing from the publisher and Bell Laboratories.

Printed in the United States of America

19 18 17 16 15 14

PRENTICE-HALL INTERNATIONAL, INC., London
PRENTICE-HALL OF AUSTRALIA PTY. LIMITED, Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
PRENTICE-HALL OF SOUTHEAST ASIA PTE. LTD., Singapore
WHITEHALL BOOKS LIMITED, Wellington, New Zealand
To our parents,
Dr. and Mrs. Nathan Rabiner
and
Mr. and Mrs. William Schafer,
for instilling within us the thirst for knowledge
and the quest for excellence;

and to our families,
Suzanne, Sheri, and Wendi Rabiner
and
Dorothy, Bill, John, and Kate Schafer,
for their love, encouragement, and support.
8

Linear Predictive Coding of Speech
8.0 Introduction

One of the most powerful speech analysis techniques is the method of linear predictive analysis. This method has become the predominant technique for estimating the basic speech parameters, e.g., pitch, formants, spectra, vocal tract area functions, and for representing speech for low bit rate transmission or storage. The importance of this method lies both in its ability to provide extremely accurate estimates of the speech parameters, and in its relative speed of computation. In this chapter, we present a formulation of the ideas behind linear prediction, and discuss some of the issues which are involved in using it in practical speech applications.

The basic idea behind linear predictive analysis is that a speech sample can be approximated as a linear combination of past speech samples. By minimizing the sum of the squared differences (over a finite interval) between the actual speech samples and the linearly predicted ones, a unique set of predictor coefficients can be determined. (The predictor coefficients are the weighting coefficients used in the linear combination.)

The philosophy of linear prediction is intimately related to the basic speech synthesis model discussed in Chapter 3, in which it was shown that speech can be modelled as the output of a linear, time-varying system excited by either quasi-periodic pulses (during voiced speech) or random noise (during unvoiced speech). The linear prediction method provides a robust, reliable, and accurate method for estimating the parameters that characterize the linear, time-varying system.
Linear predictive techniques have already been discussed in the context of the waveform quantization methods of Chapter 5. There it was suggested that a linear predictor could be applied in a differential quantization scheme to reduce the bit rate of the digital representation of the speech waveform. In fact, the mathematical basis for an adaptive high order predictor used for DPCM waveform coding is identical to the analysis that we shall present in this chapter. In adaptive DPCM coding the emphasis is on finding a predictor that will reduce the variance of the difference signal so that quantization error can also be reduced. In this chapter we take a more general viewpoint and show how the basic linear prediction idea leads to a set of analysis techniques that can be used to estimate parameters of a speech model. This general set of linear predictive analysis techniques is often referred to as linear predictive coding or LPC.

The techniques and methods of linear prediction have been available in the engineering literature for a long time. The ideas of linear prediction have been in use in the areas of control and information theory under the names of system estimation and system identification. The term system identification is particularly descriptive of LPC methods in that once the predictor coefficients have been obtained, the system has been uniquely identified to the extent that it can be modelled as an all-pole linear system.

As applied to speech processing, the term linear prediction refers to a variety of essentially equivalent formulations of the problem of modelling the speech waveform [1-18]. The differences among these formulations are often those of philosophy or way of viewing the problem. In other cases the differences concern the details of the computations used to obtain the predictor coefficients. Thus as applied to speech, the various (often equivalent) formulations of linear prediction analysis have been:

1. the covariance method [3]
2. the autocorrelation formulation [1,2,9]
3. the lattice method [11,12]
4. the inverse filter formulation [1]
5. the spectral estimation formulation [12]
6. the maximum likelihood formulation [4,6]
7. the inner product formulation [1]

In this chapter we will examine in detail the similarities and differences among only the first three basic methods of analysis listed above, since all the other formulations are equivalent to one of these three.

The importance of linear prediction lies in the accuracy with which the basic model applies to speech. Thus a major part of this chapter is devoted to a discussion of how a variety of speech parameters can be reliably estimated using linear prediction methods. Furthermore, some typical examples of speech applications which rely primarily on linear predictive analysis are discussed here, and in Chapter 9, to show the wide range of problems to which LPC has been successfully applied.
[Figure 8.1 shows an impulse train generator (driven by the pitch period) and a random noise generator feeding a voiced/unvoiced switch; the selected excitation u(n) drives a time-varying digital filter (controlled by the vocal tract parameters) whose output is s(n).]

Fig. 8.1 Block diagram of simplified model for speech production.
8.1 Basic Principles of Linear Predictive Analysis

Throughout this book we have repeatedly referred to the basic discrete-time model for speech production that was developed in Chapter 3. The particular form of this model that is appropriate for the discussion of linear predictive analysis is depicted in Fig. 8.1. In this case, the composite spectrum effects of radiation, vocal tract, and glottal excitation are represented by a time-varying digital filter whose steady-state system function is of the form

$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \qquad (8.1)$$

This system is excited by an impulse train for voiced speech or a random noise sequence for unvoiced speech. Thus, the parameters of this model are: voiced/unvoiced classification, pitch period for voiced speech, gain parameter G, and the coefficients $\{a_k\}$ of the digital filter. These parameters, of course, all vary slowly with time.

The pitch period and voiced/unvoiced classification can be estimated using one of the many methods already discussed in this book or by methods based on linear predictive analysis to be discussed later in this chapter. As discussed in Chapter 3, this simplified all-pole model is a natural representation of non-nasal voiced sounds, but for nasals and fricative sounds the detailed acoustic theory calls for both poles and zeros in the vocal tract transfer function. We shall see, however, that if the order p is high enough, the all-pole model provides a good representation for almost all the sounds of speech. The major advantage of this model is that the gain parameter, G, and the filter coefficients $\{a_k\}$ can be estimated in a very straightforward and computationally efficient manner by the method of linear predictive analysis.

For the system of Fig. 8.1, the speech samples $s(n)$ are related to the excitation $u(n)$ by the simple difference equation

$$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G\,u(n) \qquad (8.2)$$
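To make the relationship of Eq. (8.2) concrete, the following is a minimal sketch (not from the text; the function name, the use of NumPy/SciPy, and the example coefficients are illustrative assumptions) that synthesizes a signal by driving an all-pole filter with either an impulse train or white noise:

```python
import numpy as np
from scipy.signal import lfilter

def synthesize(a, G, n_samples, pitch_period=None, seed=0):
    """Generate s(n) per Eq. (8.2): s(n) = sum_k a_k s(n-k) + G u(n).

    a            : model coefficients a_1..a_p
    pitch_period : if given, u(n) is an impulse train (voiced case);
                   otherwise u(n) is unit-variance white noise (unvoiced case).
    """
    rng = np.random.default_rng(seed)
    if pitch_period is not None:                  # voiced excitation
        u = np.zeros(n_samples)
        u[::pitch_period] = 1.0
    else:                                         # unvoiced excitation
        u = rng.standard_normal(n_samples)
    # H(z) = G / (1 - sum_k a_k z^-k)  ->  denominator [1, -a_1, ..., -a_p]
    return lfilter([G], np.concatenate(([1.0], -np.asarray(a, dtype=float))), u)

# Example: a stable 2nd-order resonator excited with a 50-sample pitch period
s = synthesize(a=[1.6, -0.81], G=1.0, n_samples=400, pitch_period=50)
```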
A linear predictor with prediction coefficients, $\alpha_k$, is defined as a system whose output is

$$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k s(n-k) \qquad (8.3)$$

Such systems were used in Chapter 5 to reduce the variance of the difference signal in differential quantization schemes. The system function of a $p^{th}$ order linear predictor is the polynomial

$$P(z) = \sum_{k=1}^{p} \alpha_k z^{-k} \qquad (8.4)$$

The prediction error, $e(n)$, is defined as

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k s(n-k) \qquad (8.5)$$

From Eq. (8.5) it can be seen that the prediction error sequence is the output of a system whose transfer function is

$$A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k} \qquad (8.6)$$

It can be seen by comparing Eqs. (8.2) and (8.5) that if the speech signal obeys the model of Eq. (8.2) exactly, and if $\alpha_k = a_k$, then $e(n) = G\,u(n)$. Thus, the prediction error filter, $A(z)$, will be an inverse filter for the system, $H(z)$, of Eq. (8.1), i.e.,

$$H(z) = \frac{G}{A(z)} \qquad (8.7)$$

The basic problem of linear prediction analysis is to determine a set of predictor coefficients $\{\alpha_k\}$ directly from the speech signal in such a manner as to obtain a good estimate of the spectral properties of the speech signal through the use of Eq. (8.7). Because of the time-varying nature of the speech signal, the predictor coefficients must be estimated from short segments of the speech signal. The basic approach is to find a set of predictor coefficients that will minimize the mean-squared prediction error over a short segment of the speech waveform. The resulting parameters are then assumed to be the parameters of the system function, $H(z)$, in the model for speech production.
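As a complement to the synthesis sketch above, the inverse-filtering relation of Eqs. (8.5)-(8.6) can be sketched as follows (again an illustrative fragment, not the authors' code; it simply runs the speech segment through A(z)):

```python
import numpy as np
from scipy.signal import lfilter

def prediction_error(s, alpha):
    """Eq. (8.5): e(n) = s(n) - sum_k alpha_k s(n-k), i.e. filtering s(n)
    with the inverse filter A(z) = 1 - sum_k alpha_k z^-k of Eq. (8.6)."""
    a_z = np.concatenate(([1.0], -np.asarray(alpha, dtype=float)))  # coefficients of A(z)
    return lfilter(a_z, [1.0], s)

# If alpha matches the model coefficients used in synthesize() above,
# e(n) reduces (approximately) to the scaled excitation G*u(n).
```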
That this approach will lead to useful results may not be immediately obvious, but it can be justified in several ways. First, recall that if $\alpha_k = a_k$, then $e(n) = G\,u(n)$. For voiced speech this means that $e(n)$ would consist of a train of impulses; i.e., $e(n)$ would be small most of the time. Thus, finding $\alpha_k$'s that minimize prediction error seems consistent with this observation. A second motivation for this approach follows from the fact that if a signal is generated by Eq. (8.2) with non-time-varying coefficients and excited either by a single impulse or by a stationary white noise input, then it can be shown that the predictor coefficients that result from minimizing the mean squared prediction error (over all time) are identical to the coefficients of Eq. (8.2). A third, very pragmatic, justification for using the minimum mean-squared prediction error as a basis for estimating the model parameters is that this approach leads to a set of linear equations that can be efficiently solved to obtain the predictor parameters. More importantly, the resulting parameters comprise a very useful and accurate representation of the speech signal, as we shall see in this chapter.

The short-time average prediction error is defined as

$$E_n = \sum_{m} e_n^2(m) \qquad (8.8)$$
$$= \sum_{m} \left( s_n(m) - \tilde{s}_n(m) \right)^2 \qquad (8.9)$$
$$= \sum_{m} \left( s_n(m) - \sum_{k=1}^{p} \alpha_k s_n(m-k) \right)^2 \qquad (8.10)$$

where $s_n(m)$ is a segment of speech that has been selected in the vicinity of sample $n$, i.e.,

$$s_n(m) = s(m+n) \qquad (8.11)$$

The range of summation in Eqs. (8.8)-(8.10) is temporarily left unspecified, but since we wish to develop a short-time analysis technique, the sum will always be over a finite interval. Also note that to obtain an average we should divide by the length of the speech segment. However, this constant is irrelevant to the set of linear equations that we will obtain and therefore is omitted. We can find the values of $\alpha_k$ that minimize $E_n$ in Eq. (8.10) by setting $\partial E_n/\partial \alpha_i = 0$, $i = 1, 2, \ldots, p$, thereby obtaining the equations

$$\sum_{m} s_n(m-i)\, s_n(m) = \sum_{k=1}^{p} \hat{\alpha}_k \sum_{m} s_n(m-i)\, s_n(m-k) \qquad 1 \le i \le p \qquad (8.12)$$

where $\hat{\alpha}_k$ are the values of $\alpha_k$ that minimize $E_n$. (Since $\hat{\alpha}_k$ is unique, we will drop the caret and use the notation $\alpha_k$ to denote the values that minimize $E_n$.) If we define

$$\phi_n(i,k) = \sum_{m} s_n(m-i)\, s_n(m-k) \qquad (8.13)$$

then Eq. (8.12) can be written more compactly as

$$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0) \qquad i = 1, 2, \ldots, p \qquad (8.14)$$

This set of $p$ equations in $p$ unknowns can be solved in an efficient manner for the unknown predictor coefficients $\{\alpha_k\}$ that minimize the average squared prediction error for the segment $s_n(m)$.¹ Using Eqs. (8.10) and (8.12), the minimum mean-squared prediction error can be shown to be

$$E_n = \sum_{m} s_n^2(m) - \sum_{k=1}^{p} \alpha_k \sum_{m} s_n(m)\, s_n(m-k) \qquad (8.15)$$

and using Eq. (8.14) we can express $E_n$ as

$$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k) \qquad (8.16)$$

Thus the total minimum error consists of a fixed component, and a component which depends on the predictor coefficients.

¹It is clear that the $\alpha_k$'s are functions of $n$ (the time index at which they are estimated) although this dependence will not be explicitly shown. We shall also find it advantageous to drop the subscript $n$ on $E_n$, $s_n(m)$, and $\phi_n(i,k)$ when no confusion will result.
To solve for the optimum predictor coefficients, we must first compute the quantities $\phi_n(i,k)$ for $1 \le i \le p$ and $0 \le k \le p$. Once this is done we only have to solve Eq. (8.14) to obtain the $\alpha_k$'s. Thus, in principle, linear prediction analysis is very straightforward; a schematic sketch of this two-step procedure is given below. However, the details of the computation of $\phi_n(i,k)$ and the subsequent solution of the equations are somewhat intricate and further discussion is required.

So far we have not explicitly indicated the limits on the sums in Eqs. (8.8)-(8.10) and in Eq. (8.12); however it should be emphasized that the limits on the sum in Eq. (8.12) are identical to the limits assumed for the mean squared prediction error in Eqs. (8.8)-(8.10). As we have stated, if we wish to develop a short-time analysis procedure, the limits must be over a finite interval. There are two basic approaches to this question, and we shall see below that two methods for linear predictive analysis emerge out of a consideration of the limits of summation and the definition of the waveform segment $s_n(m)$.
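Before specializing the computation of $\phi_n(i,k)$, here is a minimal sketch of the two-step procedure (illustrative only; the callable that supplies $\phi_n(i,k)$ is an assumed placeholder, and NumPy is used for the linear solve):

```python
import numpy as np

def solve_lpc(phi, p):
    """Solve Eq. (8.14): sum_k alpha_k phi(i,k) = phi(i,0), i = 1..p.

    phi : callable phi(i, k) returning the correlation-like quantity of
          Eq. (8.13) for the current speech segment (method-dependent).
    """
    Phi = np.array([[phi(i, k) for k in range(1, p + 1)]
                    for i in range(1, p + 1)])            # p x p matrix
    psi = np.array([phi(i, 0) for i in range(1, p + 1)])  # right-hand side
    alpha = np.linalg.solve(Phi, psi)
    # Minimum error from Eq. (8.16): E = phi(0,0) - sum_k alpha_k phi(0,k)
    E = phi(0, 0) - sum(alpha[k - 1] * phi(0, k) for k in range(1, p + 1))
    return alpha, E
```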
8.1.1 The autocorrelation method [1,2,5]

One approach to determining the limits on the sums in Eqs. (8.8)-(8.10) and Eq. (8.12) is to assume that the waveform segment, $s_n(m)$, is identically zero outside the interval $0 \le m \le N-1$. This can be conveniently expressed as

$$s_n(m) = s(m+n)\, w(m) \qquad (8.17)$$

where $w(m)$ is a finite length window (e.g., a Hamming window) that is identically zero outside the interval $0 \le m \le N-1$.

The effect of this assumption on the question of limits of summation for the expressions for $E_n$ can be seen by considering Eq. (8.5). Clearly, if $s_n(m)$ is nonzero only for $0 \le m \le N-1$, then the corresponding prediction error, $e_n(m)$, for a $p^{th}$ order predictor will be nonzero over the interval $0 \le m \le N-1+p$. Thus, for this case $E_n$ is properly expressed as

$$E_n = \sum_{m=0}^{N-1+p} e_n^2(m) \qquad (8.18)$$

Alternatively, we could have simply indicated that the sum should be over all nonzero values by summing from $-\infty$ to $+\infty$ [2].
Returning to Eq. (8.5), it can be seen that the prediction error is likely to be large at the beginning of the interval (specifically $0 \le m \le p-1$) because we are trying to predict the signal from samples that have arbitrarily been set to zero. Likewise the error can be large at the end of the interval (specifically $N \le m \le N+p-1$) because we are trying to predict zero from samples that are nonzero. For this reason, a window which tapers the segment, $s_n(m)$, to zero is generally used for $w(m)$ in Eq. (8.17).

The limits on the expression for $\phi_n(i,k)$ in Eq. (8.13) are identical to those of Eq. (8.18). However, because $s_n(m)$ is identically zero outside the interval $0 \le m \le N-1$, it is simple to show that

$$\phi_n(i,k) = \sum_{m=0}^{N-1+p} s_n(m-i)\, s_n(m-k) \qquad 1 \le i \le p,\; 0 \le k \le p \qquad (8.19a)$$

can be expressed as

$$\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} s_n(m)\, s_n(m+i-k) \qquad 1 \le i \le p,\; 0 \le k \le p \qquad (8.19b)$$

Furthermore it can be seen that in this case $\phi_n(i,k)$ is identical to the short-time autocorrelation function of Eq. (4.30) evaluated for $(i-k)$. That is

$$\phi_n(i,k) = R_n(i-k) \qquad (8.20)$$

where

$$R_n(k) = \sum_{m=0}^{N-1-k} s_n(m)\, s_n(m+k) \qquad (8.21)$$

The computation of $R_n(k)$ is covered in detail in Section 4.6 and thus we shall not consider such details here. Since $R_n(k)$ is an even function, it follows that

$$\phi_n(i,k) = R_n(|i-k|) \qquad (8.22)$$

Therefore Eq. (8.14) can be expressed as

$$\sum_{k=1}^{p} \alpha_k\, R_n(|i-k|) = R_n(i) \qquad 1 \le i \le p \qquad (8.23)$$

Similarly, the minimum mean squared prediction error of Eq. (8.16) takes the form

$$E_n = R_n(0) - \sum_{k=1}^{p} \alpha_k R_n(k) \qquad (8.24)$$

The set of equations given by Eqs. (8.23) can be expressed in matrix form as

$$\begin{bmatrix}
R_n(0) & R_n(1) & R_n(2) & \cdots & R_n(p-1) \\
R_n(1) & R_n(0) & R_n(1) & \cdots & R_n(p-2) \\
R_n(2) & R_n(1) & R_n(0) & \cdots & R_n(p-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R_n(p-1) & R_n(p-2) & R_n(p-3) & \cdots & R_n(0)
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_p \end{bmatrix}
=
\begin{bmatrix} R_n(1) \\ R_n(2) \\ R_n(3) \\ \vdots \\ R_n(p) \end{bmatrix} \qquad (8.25)$$

The $p \times p$ matrix of autocorrelation values is a Toeplitz matrix; i.e., it is symmetric and all the elements along a given diagonal are equal. This special property will be exploited in Section 8.3 to obtain an efficient algorithm for the solution of Eq. (8.23).
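The autocorrelation formulation maps directly onto a few lines of array code. The following is a minimal sketch (illustrative, not the text's own program; the Hamming window and the direct Toeplitz solver are assumptions) that computes $R_n(k)$ per Eq. (8.21) and solves the system of Eq. (8.25):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_autocorrelation(segment, p):
    """Autocorrelation method: window the segment (Eq. 8.17), compute
    R_n(k) (Eq. 8.21), and solve the Toeplitz system of Eq. (8.25)."""
    s = segment * np.hamming(len(segment))        # tapered window
    N = len(s)
    R = np.array([np.dot(s[:N - k], s[k:]) for k in range(p + 1)])
    alpha = solve_toeplitz(R[:p], R[1:p + 1])     # exploits the Toeplitz structure
    E = R[0] - np.dot(alpha, R[1:p + 1])          # minimum error, Eq. (8.24)
    return alpha, E
```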
8.1.2 The covariance method [3]

The second basic approach to defining the speech segment $s_n(m)$ and the limits on the sums is to fix the interval over which the mean-squared error is computed and then consider the effect on the computation of $\phi_n(i,k)$. That is, if we define

$$E_n = \sum_{m=0}^{N-1} e_n^2(m) \qquad (8.26)$$

then $\phi_n(i,k)$ becomes

$$\phi_n(i,k) = \sum_{m=0}^{N-1} s_n(m-i)\, s_n(m-k) \qquad 1 \le i \le p,\; 0 \le k \le p \qquad (8.27)$$

In this case, if we change the index of summation we can express $\phi_n(i,k)$ as either

$$\phi_n(i,k) = \sum_{m=-i}^{N-i-1} s_n(m)\, s_n(m+i-k) \qquad 1 \le i \le p,\; 0 \le k \le p \qquad (8.28a)$$

or

$$\phi_n(i,k) = \sum_{m=-k}^{N-k-1} s_n(m+k-i)\, s_n(m) \qquad 1 \le i \le p,\; 0 \le k \le p \qquad (8.28b)$$

Although these equations look very similar to Eq. (8.19b), we see that the limits of summation are not the same. Equations (8.28) call for values of $s_n(m)$ outside the interval $0 \le m \le N-1$. Indeed, to evaluate $\phi_n(i,k)$ for all of the required values of $i$ and $k$ requires that we use values of $s_n(m)$ in the interval $-p \le m \le N-1$. If we are to be consistent with the limits on $E_n$ in Eq. (8.26) then we have no choice but to supply the required values. In this case it does not make sense to taper the segment of speech to zero at the ends as in the autocorrelation method, since the necessary values are made available from outside the interval $0 \le m \le N-1$. Clearly, this approach is very similar to what was called the modified autocorrelation function in Chapter 4. As pointed out in Section 4.6, this approach leads to a function which is not a true autocorrelation function, but rather the cross-correlation between two very similar, but not identical, finite length segments of the speech wave. Although the differences between Eq. (8.28) and Eq. (8.19b) appear to be minor computational details, the set of equations

$$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0) \qquad i = 1, 2, \ldots, p \qquad (8.29a)$$

has significantly different properties that strongly affect the method of solution
and the properties of the resulting optimum predictor. In matrix form these equations become

$$\begin{bmatrix}
\phi_n(1,1) & \phi_n(1,2) & \phi_n(1,3) & \cdots & \phi_n(1,p) \\
\phi_n(2,1) & \phi_n(2,2) & \phi_n(2,3) & \cdots & \phi_n(2,p) \\
\phi_n(3,1) & \phi_n(3,2) & \phi_n(3,3) & \cdots & \phi_n(3,p) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\phi_n(p,1) & \phi_n(p,2) & \phi_n(p,3) & \cdots & \phi_n(p,p)
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_p \end{bmatrix}
=
\begin{bmatrix} \phi_n(1,0) \\ \phi_n(2,0) \\ \phi_n(3,0) \\ \vdots \\ \phi_n(p,0) \end{bmatrix} \qquad (8.29b)$$

In this case, since $\phi_n(i,k) = \phi_n(k,i)$ (see Eq. (8.28)), the $p \times p$ matrix of correlation-like values is symmetric but not Toeplitz. Indeed, it can be seen that the diagonal elements are related by the equation

$$\phi_n(i+1,k+1) = \phi_n(i,k) + s_n(-i-1)\, s_n(-k-1) - s_n(N-1-i)\, s_n(N-1-k) \qquad (8.30)$$

The method of analysis based upon this method of computation of $\phi_n(i,k)$ has come to be known as the covariance method because the matrix of values $\{\phi_n(i,k)\}$ has the properties of a covariance matrix [5].²

²This terminology, which is firmly entrenched, is somewhat confusing since the term covariance usually refers to the correlation of a signal with its mean removed.
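A minimal sketch of the covariance computation follows (illustrative only; the indexing convention and the use of a plain linear solver are assumptions, and the p samples preceding the analysis interval must be available in the signal). Section 8.3 replaces the plain solve with the Cholesky decomposition.

```python
import numpy as np

def lpc_covariance(s, n, N, p):
    """Covariance method: compute phi_n(i,k) per Eq. (8.27) using samples
    s(n-p) .. s(n+N-1), then solve the symmetric system of Eq. (8.29b).

    s : full speech signal (so that the p samples before index n exist, n >= p)
    n : starting sample of the analysis interval
    """
    seg = lambda m: s[n + m]                      # s_n(m) = s(n + m), Eq. (8.11)
    phi = np.zeros((p + 1, p + 1))
    for i in range(p + 1):
        for k in range(i + 1):
            phi[i, k] = sum(seg(m - i) * seg(m - k) for m in range(N))
            phi[k, i] = phi[i, k]                 # symmetry: phi(i,k) = phi(k,i)
    alpha = np.linalg.solve(phi[1:, 1:], phi[1:, 0])
    E = phi[0, 0] - np.dot(alpha, phi[0, 1:])     # minimum error, Eq. (8.16)
    return alpha, E
```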
8.1.3 Summary

It has been shown that by using different definitions of the segments of the signal to be analyzed, two distinct sets of analysis equations can be obtained. For the autocorrelation method, the signal is windowed by an N-point window, and the quantities $\phi_n(i,k)$ are obtained using a short-time autocorrelation function. The resulting matrix of correlations is Toeplitz, leading to one type of solution for the predictor coefficients. For the covariance method, the signal is assumed to be known for the set of values $-p \le m \le N-1$. Outside this interval no assumptions need be made about the signal, since these are the only values needed in the computation. The resulting matrix of correlations in this case is symmetric but not Toeplitz. The result is that the two methods of computing the correlations lead to different methods of solution of the analysis equations and to sets of predictor coefficients with somewhat different properties.

In later sections we will compare and contrast computational details and results for both these techniques as well as for another method yet to be discussed. First, however, we will show how the gain, G, in Fig. 8.1, can be determined from the prediction error expression.
8.2 Computation of the Gain for the Model [2]

It is reasonable to expect that the gain, G, could be determined by matching the energy in the signal with the energy of the linearly predicted samples. This indeed is true when appropriate assumptions are made about the excitation signal to the LPC system.
It is possible to relate the gain constant G to the excitation signal and the error in prediction by referring back to Eqs. (8.2) and (8.5).³ The excitation signal, $G\,u(n)$, can be expressed as

$$G\,u(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k) \qquad (8.31a)$$

whereas the prediction error signal $e(n)$ is expressed as

$$e(n) = s(n) - \sum_{k=1}^{p} \alpha_k s(n-k) \qquad (8.31b)$$

In the case where $\alpha_k = a_k$, i.e., the actual predictor coefficients and those of the model are identical, then

$$e(n) = G\,u(n) \qquad (8.32)$$

i.e., the input signal is proportional to the error signal with the constant of proportionality being the gain constant, G. A detailed discussion of the properties of the prediction error signal is given in Section 8.5.

Since Eq. (8.32) is only approximate (i.e., it is valid to the extent that the ideal and the actual linear prediction parameters are identical), it is generally not possible to solve for G in a reliable way directly from the error signal itself. Instead the more reasonable assumption is made that the energy in the error signal is equal to the energy in the excitation input, i.e.,

$$G^2 \sum_{m=0}^{N-1} u^2(m) = \sum_{m=0}^{N-1} e^2(m) = E_n \qquad (8.33)$$

At this point we must make some assumptions about $u(n)$ so as to be able to relate G to the known quantities, e.g., the $\alpha_k$'s and the correlation coefficients. There are two cases of interest for the excitation. For voiced speech it is reasonable to assume $u(n) = \delta(n)$, i.e., the excitation is a unit sample at $n = 0$.⁴ For this assumption to be valid requires that the effects of the glottal pulse shape used in the actual excitation for voiced speech be lumped together with the vocal tract transfer function, and therefore both of these effects are essentially modelled by the time-varying linear predictor. This requires that the predictor order, p, be large enough to account for both the vocal tract and glottal pulse effects. We will discuss the choice of predictor order in a later section. For unvoiced speech it is most reasonable to assume that $u(n)$ is a zero mean, unity variance, stationary, white noise process.

³Note that the gain is also a function of time.

⁴Note that for this assumption to be valid, the analysis interval must be about the same length as a pitch period.

Based on these assumptions we can now determine the gain constant G by utilizing Eq. (8.33). For voiced speech, we have as input $G\,\delta(n)$. If we call the
resulting output for this particular input $h(n)$ (since it is actually the impulse response of the system with transfer function $H(z)$ as in Eq. (8.1)) we get the relation

$$h(n) = \sum_{k=1}^{p} \alpha_k h(n-k) + G\,\delta(n) \qquad (8.34)$$

It is readily shown (Problem 8.1) that the autocorrelation function of $h(n)$, defined as

$$R(m) = \sum_{n=0}^{\infty} h(n)\, h(m+n) \qquad (8.35)$$

satisfies the relations

$$R(m) = \sum_{k=1}^{p} \alpha_k R(|m-k|) \qquad m = 1, 2, \ldots, p \qquad (8.36a)$$

and

$$R(0) = \sum_{k=1}^{p} \alpha_k R(k) + G^2 \qquad (8.36b)$$

Since Eqs. (8.36a) are identical to Eqs. (8.23) it follows that

$$R(m) = R_n(m) \qquad 1 \le m \le p \qquad (8.37)$$

Since the total energies in the signal ($R_n(0)$) and the impulse response ($R(0)$) must be equal, we can use Eqs. (8.24), (8.33) and (8.36b) to obtain

$$G^2 = R_n(0) - \sum_{k=1}^{p} \alpha_k R_n(k) = E_n \qquad (8.38)$$

It is interesting to note that Eq. (8.37) and the requirement that the energy of the impulse response be equal to the energy of the signal together require that the first $p+1$ coefficients of the autocorrelation function of the impulse response of the model are identical to the first $p+1$ coefficients of the autocorrelation function of the speech signal.

For the case of unvoiced speech, the correlations are defined as statistical averages. It is assumed that the input is white noise with zero mean and unity variance; i.e.,

$$E[u(n)\,u(n-m)] = \delta(m) \qquad (8.39)$$

If we excite the system with the random input $G\,u(n)$ and call the output $g(n)$ then

$$g(n) = \sum_{k=1}^{p} \alpha_k g(n-k) + G\,u(n) \qquad (8.40)$$

If we now let $R(m)$ denote the autocorrelation function of $g(n)$, then

$$R(m) = E[g(n)\,g(n-m)] = \sum_{k=1}^{p} \alpha_k E[g(n-k)\,g(n-m)] + E[G\,u(n)\,g(n-m)]$$
$$= \sum_{k=1}^{p} \alpha_k R(m-k) \qquad m \ne 0 \qquad (8.41)$$

since $E[u(n)\,g(n-m)] = 0$ for $m > 0$ because $u(n)$ is uncorrelated with any signal prior to $u(n)$. For $m = 0$ we get

$$R(0) = \sum_{k=1}^{p} \alpha_k R(k) + G\,E[u(n)\,g(n)] = \sum_{k=1}^{p} \alpha_k R(k) + G^2 \qquad (8.42)$$

since $E[u(n)\,g(n)] = E[u(n)(G\,u(n) + \text{terms prior to } n)] = G$. Since the energy in the response to $G\,u(n)$ must equal the energy in the signal, we get

$$R(m) = R_n(m) \qquad 0 \le m \le p \qquad (8.43)$$

or

$$G^2 = R_n(0) - \sum_{k=1}^{p} \alpha_k R_n(k) \qquad (8.44)$$

as was the case for the impulse excitation for voiced speech.
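Equations (8.38) and (8.44) give the same recipe for the gain in both cases. As a brief sketch (illustrative only, reusing the autocorrelation values computed earlier):

```python
import numpy as np

def lpc_gain(R, alpha):
    """Gain from Eq. (8.38)/(8.44): G^2 = R_n(0) - sum_k alpha_k R_n(k)."""
    p = len(alpha)
    G_squared = R[0] - np.dot(alpha, R[1:p + 1])
    return np.sqrt(G_squared)
```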
8.3 Solution of the LPC Equations

In order to effectively implement a linear predictive analysis system, it is necessary to solve the linear equations in an efficient manner. Although a variety of techniques can be applied to solve a system of p linear equations in p unknowns, these techniques are not equally efficient. Because of the special properties of the coefficient matrices it is possible to solve the equations much more efficiently than is possible in general. In this section we will discuss in detail two methods for obtaining the predictor coefficients, and then we will compare and contrast several properties of these solutions.

8.3.1 Cholesky decomposition solution for the covariance method [3]

For the covariance method, the set of equations which must be solved is of the form

$$\sum_{k=1}^{p} \alpha_k\, \phi_n(i,k) = \phi_n(i,0) \qquad i = 1, 2, \ldots, p \qquad (8.45)$$

or in matrix notation

$$\Phi \alpha = \psi \qquad (8.46)$$

where $\Phi$ is a positive definite symmetric matrix with $(i,j)^{th}$ element $\phi_n(i,j)$, and $\alpha$ and $\psi$ are column vectors with elements $\alpha_i$ and $\phi_n(i,0)$ respectively. The system of equations given by Eq. (8.45) can be solved in an efficient manner since the matrix $\Phi$ is a symmetric, positive definite matrix. The resulting method of solution is called the Cholesky decomposition (or sometimes it is
called the square root method) [3]. For this method the matrix $\Phi$ is expressed in the form

$$\Phi = V D V^{t} \qquad (8.47)$$

where $V$ is a lower triangular matrix (whose main diagonal elements are all 1's), and $D$ is a diagonal matrix. The superscript $t$ denotes matrix transpose. The elements of the matrices $V$ and $D$ are readily determined from Eq. (8.47) by solving for the $(i,j)^{th}$ element of both sides of Eq. (8.47), giving

$$\phi_n(i,j) = \sum_{k=1}^{j} V_{ik}\, d_k\, V_{jk} \qquad 1 \le j \le i-1 \qquad (8.48)$$

or

$$V_{ij}\, d_j = \phi_n(i,j) - \sum_{k=1}^{j-1} V_{ik}\, d_k\, V_{jk} \qquad 1 \le j \le i-1 \qquad (8.49)$$

and, for the diagonal elements,

$$\phi_n(i,i) = \sum_{k=1}^{i} V_{ik}\, d_k\, V_{ik} \qquad (8.50)$$

or

$$d_i = \phi_n(i,i) - \sum_{k=1}^{i-1} V_{ik}^2\, d_k \qquad i \ge 2 \qquad (8.51)$$

with

$$d_1 = \phi_n(1,1) \qquad (8.52)$$

To illustrate the use of Eqs. (8.47)-(8.52) consider an example with $p = 4$, and matrix elements $\phi_n(i,j) = \phi_{ij}$. Equation (8.47) is thus of the form

$$\begin{bmatrix}
\phi_{11} & \phi_{21} & \phi_{31} & \phi_{41} \\
\phi_{21} & \phi_{22} & \phi_{32} & \phi_{42} \\
\phi_{31} & \phi_{32} & \phi_{33} & \phi_{43} \\
\phi_{41} & \phi_{42} & \phi_{43} & \phi_{44}
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
V_{21} & 1 & 0 & 0 \\
V_{31} & V_{32} & 1 & 0 \\
V_{41} & V_{42} & V_{43} & 1
\end{bmatrix}
\begin{bmatrix}
d_1 & 0 & 0 & 0 \\
0 & d_2 & 0 & 0 \\
0 & 0 & d_3 & 0 \\
0 & 0 & 0 & d_4
\end{bmatrix}
\begin{bmatrix}
1 & V_{21} & V_{31} & V_{41} \\
0 & 1 & V_{32} & V_{42} \\
0 & 0 & 1 & V_{43} \\
0 & 0 & 0 & 1
\end{bmatrix}$$

To solve for $d_1$ to $d_4$ and the $V_{ij}$'s we begin with Eq. (8.52) for $i = 1$, giving

$$d_1 = \phi_{11}$$

Using Eq. (8.49) for $i = 2, 3, 4$ we solve for $V_{21}$, $V_{31}$, and $V_{41}$ as

$$V_{21} d_1 = \phi_{21} \qquad V_{31} d_1 = \phi_{31} \qquad V_{41} d_1 = \phi_{41}$$

or

$$V_{21} = \phi_{21}/d_1 \qquad V_{31} = \phi_{31}/d_1 \qquad V_{41} = \phi_{41}/d_1$$

Using Eq. (8.51) for $i = 2$ gives

$$d_2 = \phi_{22} - V_{21}^2 d_1$$

Using Eq. (8.49) for $i = 3$ and 4 gives

$$V_{32} d_2 = \phi_{32} - V_{31} d_1 V_{21} \qquad V_{42} d_2 = \phi_{42} - V_{41} d_1 V_{21}$$

or

$$V_{32} = (\phi_{32} - V_{31} d_1 V_{21})/d_2 \qquad V_{42} = (\phi_{42} - V_{41} d_1 V_{21})/d_2$$

Equation (8.51) is now used for $i = 3$ to solve for $d_3$, then Eq. (8.49) is used for $i = 4$ to solve for $V_{43}$, and finally Eq. (8.51) is used for $i = 4$ to solve for $d_4$.
Once the matrices $V$ and $D$ have been determined, it is relatively simple to solve for the column vector $\alpha$ in a two-step procedure. From Eqs. (8.46) and (8.47) we get

$$V D V^{t} \alpha = \psi \qquad (8.53)$$

which can be written as

$$V Y = \psi \qquad (8.54)$$

and

$$D V^{t} \alpha = Y \qquad (8.55)$$

or

$$V^{t} \alpha = D^{-1} Y \qquad (8.56)$$

Thus from the matrix $V$, Eq. (8.54) can be solved for the column vector $Y$ using a simple recursion of the form

$$Y_i = \psi_i - \sum_{j=1}^{i-1} V_{ij} Y_j \qquad 2 \le i \le p \qquad (8.57)$$

with initial condition

$$Y_1 = \psi_1 \qquad (8.58)$$

Once $Y$ is determined, Eq. (8.56) can be solved for $\alpha$ using the relation

$$\alpha_i = Y_i/d_i - \sum_{j=i+1}^{p} V_{ji}\, \alpha_j \qquad 1 \le i \le p-1 \qquad (8.59)$$

with initial condition

$$\alpha_p = Y_p/d_p \qquad (8.60)$$

It should be noted that the index $i$ in Eq. (8.59) proceeds backwards from $i = p-1$ down to $i = 1$.
To illustrate the use of Eqs. (8.57)-(8.60) we continue our previous example and first solve for the $Y_i$'s, assuming $V$ and $D$ are now known. In matrix form we have the equation

$$\begin{bmatrix}
1 & 0 & 0 & 0 \\
V_{21} & 1 & 0 & 0 \\
V_{31} & V_{32} & 1 & 0 \\
V_{41} & V_{42} & V_{43} & 1
\end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix}
=
\begin{bmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{bmatrix}$$

From Eqs. (8.57) and (8.58) we get

$$Y_1 = \psi_1$$
$$Y_2 = \psi_2 - V_{21} Y_1$$
$$Y_3 = \psi_3 - V_{31} Y_1 - V_{32} Y_2$$
$$Y_4 = \psi_4 - V_{41} Y_1 - V_{42} Y_2 - V_{43} Y_3$$

From the $Y_i$'s we solve Eq. (8.56), which is of the form

$$\begin{bmatrix}
1 & V_{21} & V_{31} & V_{41} \\
0 & 1 & V_{32} & V_{42} \\
0 & 0 & 1 & V_{43} \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{bmatrix}
=
\begin{bmatrix}
1/d_1 & 0 & 0 & 0 \\
0 & 1/d_2 & 0 & 0 \\
0 & 0 & 1/d_3 & 0 \\
0 & 0 & 0 & 1/d_4
\end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix}
=
\begin{bmatrix} Y_1/d_1 \\ Y_2/d_2 \\ Y_3/d_3 \\ Y_4/d_4 \end{bmatrix}$$

From Eqs. (8.59) and (8.60) we get

$$\alpha_4 = Y_4/d_4$$
$$\alpha_3 = Y_3/d_3 - V_{43}\,\alpha_4$$
$$\alpha_2 = Y_2/d_2 - V_{32}\,\alpha_3 - V_{42}\,\alpha_4$$
$$\alpha_1 = Y_1/d_1 - V_{21}\,\alpha_2 - V_{31}\,\alpha_3 - V_{41}\,\alpha_4$$

thus completing the solution to the covariance equations.

The use of the Cholesky decomposition procedure leads to a very simple expression for the minimum error of the covariance method in terms of the column vector $Y$ and the matrix $D$. We recall that for the covariance method the prediction error $E_n$ was of the form

$$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0,k) \qquad (8.61)$$

or in matrix notation

$$E_n = \phi_n(0,0) - \alpha^{t} \psi \qquad (8.62)$$

From Eq. (8.56) we can substitute for $\alpha^{t}$ the expression $Y^{t} D^{-1} V^{-1}$, giving

$$E_n = \phi_n(0,0) - Y^{t} D^{-1} V^{-1} \psi \qquad (8.63)$$

Using Eq. (8.54) we get

$$E_n = \phi_n(0,0) - Y^{t} D^{-1} Y \qquad (8.64)$$

or

$$E_n = \phi_n(0,0) - \sum_{k=1}^{p} Y_k^2 / d_k \qquad (8.65)$$

Thus the mean-squared prediction error $E_n$ can be determined directly from the column vector $Y$ and the matrix $D$. Furthermore, Eq. (8.65) can be used to give the value of $E_n$ for any value of p up to the value of p used in solving the matrix equations. Thus one can get an idea as to how the mean-squared prediction error varies with the number of predictor coefficients used in the solution.
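The decomposition and the two-step solution of Eqs. (8.47)-(8.60) translate almost line for line into code. The following sketch (illustrative only; 0-based array indexing replaces the text's 1-based indexing) implements the V D Vᵗ factorization, the forward recursion for Y, the backward recursion for α, and the error terms of Eq. (8.65):

```python
import numpy as np

def cholesky_covariance_solve(Phi, psi):
    """Solve Phi @ alpha = psi (Eq. 8.46) via Phi = V D V^t (Eq. 8.47).

    Phi : p x p symmetric positive definite matrix of phi_n(i,k), i,k = 1..p
    psi : length-p vector of phi_n(i,0)
    Returns alpha and the partial error terms Y_k^2 / d_k used in Eq. (8.65).
    """
    p = len(psi)
    V = np.eye(p)
    d = np.zeros(p)
    for i in range(p):                      # Eqs. (8.49), (8.51), (8.52)
        for j in range(i):
            V[i, j] = (Phi[i, j] - np.dot(V[i, :j] * d[:j], V[j, :j])) / d[j]
        d[i] = Phi[i, i] - np.dot(V[i, :i] ** 2, d[:i])

    Y = np.zeros(p)
    for i in range(p):                      # forward recursion, Eqs. (8.57)-(8.58)
        Y[i] = psi[i] - np.dot(V[i, :i], Y[:i])

    alpha = np.zeros(p)
    for i in range(p - 1, -1, -1):          # backward recursion, Eqs. (8.59)-(8.60)
        alpha[i] = Y[i] / d[i] - np.dot(V[i + 1:, i], alpha[i + 1:])

    return alpha, Y ** 2 / d                # E_n = phi_n(0,0) - sum(Y**2 / d), Eq. (8.65)
```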
8.3.2 Durbin's recursive solution for the autocorrelation equations [2]

For the autocorrelation method the matrix equation for solving for the predictor coefficients is of the form

$$\sum_{k=1}^{p} \alpha_k\, R_n(|i-k|) = R_n(i) \qquad 1 \le i \le p \qquad (8.66)$$

By exploiting the Toeplitz nature of the matrix of coefficients, several efficient recursive procedures have been devised for solving this system of equations. Although the most popular …
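The excerpt breaks off before the recursion itself is stated. For completeness, here is a sketch of the standard Levinson-Durbin recursion for Toeplitz systems of the form of Eq. (8.66); it is a common formulation of Durbin's method, not necessarily the exact one the authors present next:

```python
import numpy as np

def levinson_durbin(R, p):
    """Standard Levinson-Durbin recursion for Eq. (8.66).

    R : autocorrelation values R_n(0) .. R_n(p)
    Returns the predictor coefficients alpha_1..alpha_p and the
    minimum prediction error E (cf. Eq. (8.24)).
    """
    E = R[0]
    alpha = np.zeros(p)
    for i in range(1, p + 1):
        # reflection (PARCOR) coefficient for stage i
        k = (R[i] - np.dot(alpha[:i - 1], R[i - 1:0:-1])) / E
        new_alpha = alpha.copy()
        new_alpha[i - 1] = k
        new_alpha[:i - 1] = alpha[:i - 1] - k * alpha[i - 2::-1]
        alpha = new_alpha
        E *= (1.0 - k * k)
    return alpha, E
```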
