`Coding of Wide Band Speech at 32kbits/sec
`
`Erik Ordentlich
`
`Submitted in Partial Fulfillment of the
`Requirements for the Degrees of
`
`Master of Science in Electrical Engineering and Computer Science
`
`Bachelor of Science in Electrical Engineering and Computer Science
`
`and
`
at the
`
`Massachusetts Institute of Technology
`
`April 1990
`
© Massachusetts Institute of Technology
`
`Signature of Author
`
`Certified by
`
Department of Electrical Engineering and Computer Science
`April 1, 1990
`
Jae S. Lim, Thesis Supervisor
`
Accepted by
Arthur C. Smith
`Chairman, Electrical Engineering Department Committee
`on Graduate Students, Massachusetts Institute of Technology
`
`Accepted by
`
Yair Shoham, AT&T Bell Laboratories
Company Official
`
ARCHIVES
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
AUG 10 1990
LIBRARIES
`
`Ex. 1042 / Page 1 of 133
`Apple v. Saint Lawrence
`
`
`
`Low Delay - Code Excited Linear Predictive (LD-CELP) Coding
`of Wide Band Speech at 32kbits/sec
`
`by
`
`Erik Ordentlich
`
`Abstract
`
We investigate the potential of the Linear Predictive Coding (LPC) based algorithm, Low Delay - Code Excited Linear Prediction (LD-CELP), for coding wide band (.05-7khz) speech at a bit rate of 32kbits/sec. This bit rate allows for stereo transmission of wide band speech over a single 64kbits/sec basic rate ISDN channel, thereby permitting the simultaneous transmission of image data over the other 64kbits/sec channel. The low delay feature makes tele-conferencing and other two way speech communication scenarios particularly inviting applications for this type of coding algorithm.

A simple minded application of the LD-CELP algorithm to wide band speech resulted in audible high frequency distortion with essentially perfect coding at low frequencies. Three approaches to compensating for this frequency asymmetry in coding quality were investigated. The first approach involves simple modifications to the conventional CELP noise weighting filter in an effort to introduce more degrees of freedom, and thereby allow for a more accurate modeling of both spectral tilt and formant structure. The second approach uses a psycho-acoustically based model of human hearing to determine the noise weighting filter. The third approach decomposes the input speech into high and low frequency bands using a quadrature mirror filter bank and applies the LD-CELP algorithm to each band separately. An additional part of the thesis is devoted to improving the efficiency of the CELP codebook design algorithm, which was found to be slow and cumbersome. Two improvements are proposed and evaluated. Finally we present the results of a small-scale, formal, a/b subjective test comparing our best 32kbits/sec LD-CELP system to the 64kbits/sec G.722 CCITT wide band speech coding standard.

The main conclusion of the thesis is that 32kbits/sec high quality coding of wide band speech is definitely viable, and that the basic LD-CELP algorithm along with some improvements suggested in the thesis is an excellent candidate for this application. The subjective test results indicate that the speech qualities of our 32kbits/sec LD-CELP and the 64kbits/sec G.722 are identical.
`
`Thesis Supervisor: Dr. Jae S. Lim
Title: Professor of Electrical Engineering
`
`
`
`
`Acknowledgements
`
I would like to thank the Signal Processing Research Department at AT&T Bell Laboratories, Murray Hill, NJ for financial, intellectual and moral support during the course of this research. Deepest appreciation goes to my mentor, Dr. Yair Shoham, and my department head, Dr. N. S. Jayant. I would also like to thank Professor Jae S. Lim for being my thesis advisor at MIT.
`
`
`
`
`Contents
`
1 Introduction  9
  1.1 Background and Motivation  9
  1.2 The Topic  10
  1.3 Organization of Thesis  12

2 Basics of LPC and LD-CELP  13
  2.1 Introduction  13
  2.2 What is LPC?  13
    2.2.1 The Basic Idea  13
    2.2.2 LPC Analysis of a Signal s(n)  15
    2.2.3 Prediction Error  17
    2.2.4 Long-term Prediction  21
  2.3 Basic CELP and LD-CELP  22
    2.3.1 Introduction  22
    2.3.2 Basic CELP  23
    2.3.3 LD-CELP  25
    2.3.4 Our Coder  26
    2.3.5 Weighting and Synthesis Filter Derivation  30
    2.3.6 Conclusion  31

3 Statement of Problem  32
  3.1 Introduction  32
  3.2 Quantization Noise in LPC Based Systems  33
    3.2.1 New CELP Architecture  33
    3.2.2 Prediction Gain Versus Frequency  39
    3.2.3 LD-CELP and Implicit Bit Allocation  42
    3.2.4 Conclusions  42
  3.3 Return to Problem Statement  43
  3.4 Objective Criterion - SBSNR  46
  3.5 Conclusion  47

4 Simple Noise Weighting  49
  4.1 Introduction  49
  4.2 Basic Coder Parameters  50
  4.3 Inadequacies of the Narrow Band Filter  53
`
`
`
  4.4 Non-adaptive Three Pole Weighting  57
  4.5 Broad High Frequency Weighting  61
  4.6 Adaptive Three Pole and Three Zero Shaping  63
  4.7 Adaptive Two Pole Shaping  65
  4.8 Conclusion  66

5 Perceptually Optimum Noise Shaping  68
  5.1 Introduction  68
  5.2 Perceptual Transform Coding  69
    5.2.1 Frequency Domain Masking  69
  5.3 Perceptual Transform Coding Algorithm  73
  5.4 Recent Improvements of the Perceptual Model  76
  5.5 Extension of Algorithm to LPC  79
    5.5.1 The Threshold Calculator  79
    5.5.2 LPC Analogue of Threshold Calculator  82
    5.5.3 CELP and the Psycho-acoustic Noise Model  86
    5.5.4 The Delay Issue  87
  5.6 Perceptual Entropy  90
  5.7 Conclusion  92

6 Split Band LD-CELP  93
  6.1 Introduction  93
  6.2 Split Band Coding and Our System  93
  6.3 Results  98
  6.4 Conclusion  99

7 Codebook Design  101
  7.1 Introduction  101
  7.2 Design Algorithm  101
  7.3 Alternate 'Empty' Vector Processing  104
  7.4 Eigenvalue Based Codebook Search  108
    7.4.1 The Proposed Algorithm  108
    7.4.2 Lower Bound on the Minimum Eigenvalue of a Symmetric Positive Definite Matrix  112
    7.4.3 Performance Analysis of the Search Algorithm  113
  7.5 Conclusion  121

8 Subjective Test  122

9 Summary and Conclusions  128
  9.1 Summary  128
  9.2 Conclusions  130
`
`
`
`List of Figures
`
2.1 Classical LPC architecture  14
2.2 Classical LPC coder configuration with pitch loop  22
2.3 Basic CELP architecture  23
3.1 'Equivalence' of CELP with noise weighting and pre/post filtered CELP without noise weighting  34
3.2 CELP coder with DPCM type architecture and pre/post filtering  35
3.3 Band partitions for the SBSNR distortion measure  47
4.1 Ideal prediction gain versus prediction order for wide band speech  51
4.2 Magnitude of 1/W(e^jω) for γ1 = 1 and γ2 = .85 (dotted) and 1/A(e^jω) (solid)  54
4.3 Frequency response of P1(z) for the case of r1 = r2 = .2 and θ = 58°  57
4.4 Comparison of |W(z)|² and |W(z)P1(z)|²  58
4.5 Ideal square wave weighting filter (upper) and all-pole approximation (lower)  62
5.1 Tone masking noise (top) and noise masking tone (bottom)  69
5.2 Absolute threshold of hearing versus frequency  73
5.3 Ideal noise shape as given by the psycho-acoustic model (top), the all pole model (bottom dotted) and the pole/zero model (bottom solid)  84
5.4 Relationship between noise shape analysis window and coding window for the psycho-acoustically based CELP coder  88
5.5 Inverse tonality measure at various update rates for a 256 sample analysis window  89
6.1 Split band LD-CELP  94
7.1 Splitting in the direction of the smallest eigenvalue eigenvector (solid); splitting in the direction of the largest eigenvalue eigenvector (dotted)  107
`
`
`
`List of Tables
`
4.1 Parameter values for the coder which served as a test vehicle for the noise shaping ideas described in this chapter  50
4.2 SBSNR values averaged over four files coded using four different pitch configurations  52
4.3 Pitch gain codebook entries for the various pitch configurations tested  53
4.4 SBSNR's averaged over the same four files for two coding systems using 32 order LPC synthesis filters  67
5.1 Parameters of the perceptually based LD-CELP coder  87
6.1 Fixed bit allocations for high and low subbands. The bit rates are expressed in bits per sample of the decimated subbands. This is why the total in each row is equal to four  96
6.2 SBSNR's for three different fixed bit allocations plus the SBSNR for the broad band LD-CELP with the γ1 = 1, γ2 = .85, P(z) = 1 weighting filter  98
7.1 Percentage of codebook for which full matrix multiplication needs to be done for various bandwidths and block sizes. The last column contains the reduction in multiplications based on the eigenvalue lower bound  120
8.1 Scores by listeners. Final score in bold  124
8.2 Scores by listeners for male file pairs  125
8.3 Scores by listeners for female file pairs  126
8.4 Scores by file pair. The 'ar' files are female, the 'ke' files are male  127
`
`
`
`Chapter 1
`
`Introduction
`
`1.1 Background and Motivation
`
Moderate rate digital telephone connectivity such as AT&T's basic rate ISDN (two 64kbit/sec data channels) allows for the possibility of high quality two-way wide band voice communication. Consequently the speech coding community is placing a greater emphasis on coding algorithms which operate at wider than telephone grade bandwidths. One specific aim is the high quality coding of commentary grade or .05-7khz bandwidth speech. The perceived difference between wide band speech and telephone band (.2-3.4khz) speech is enormous and the improvement in speech quality gained by increasing the bandwidth makes wide band speech coding an important area of research.

Wide band audio coding in general has become an active area of research in recent times with the advent of transform based coding techniques¹ capable of maintaining near perceptual transparency between coded and original signals at bit rates as low as 64kbits/sec [16, 8]. Researchers in this field are interested in coding a variety of audio signals and not just speech. Most of the algorithms owe their high coding quality to quantization noise shaping schemes which incorporate empirical and intuitive notions of human auditory perception. The idea is to take advantage of properties of human hearing such as time

¹Transform coding refers to schemes in which the transmitter quantizes the DFT (or a variant thereof) components of blocks of input and sends the resulting information to the receiver, which then reconstructs the DFT components and does an IDFT to obtain an approximation of the input.
`
`
`
`
and frequency domain masking in order to quantize spectral samples in such a way that the resulting quantization noise is inaudible. The primary disadvantage of transform approaches in general is that they obtain their coding gain by operating on relatively large blocks of the input signal and thereby introduce significant encoding and decoding delay. This greatly limits their potential for network applications such as teleconferencing where large delays across the network can result in annoying feedback induced echoes. Although echo cancelers are available commercially, they would necessarily add to the cost of the network. Furthermore, based on current results in transform coding research it seems that 64kbits/sec represents a lower bound on the bit rate for high quality coding of 20khz signals. This is primarily because of the extensive side information requirements of transform based implementations. Reducing the bandwidth at which these coders operate probably won't help since most of the energy in most sounds is at low frequencies.
`
One possible approach to the problem of coding wide-band speech which has received little consideration to date is Linear Predictive Coding (LPC). Over the years researchers have encountered tremendous success with LPC based techniques for coding narrow, telephone bandwidth speech, a recent example of which is a Bell Labs developed, real time, near perceptually transparent, 16kbit/sec Low Delay Code Excited Linear Predictive (LD-CELP) coder [13]. The concept of LD-CELP is derived from the basic CELP concept introduced by Atal and Schroeder in [5]. CELP in turn is founded in classical LPC theory. The main advantages of LPC based coding approaches are the possibilities of low delay, as demonstrated by the 16kbit/sec LD-CELP coder, which can have an encoding delay as low as 625 μsec (the real time implementation actually has a higher delay so that real time constraints can be met by available computational resources), and the fact that LPC provides a theoretically and experimentally justifiable excitation/linear-filter decomposition or model of speech.
`
`1.2 The Topic
`
The main focus of this thesis will be the effectiveness of the LPC based algorithm LD-CELP for coding wide-band (.05-7khz) speech at a bit rate of 32kbits/sec. The primary motivation for restricting the scope of the research to LD-CELP is the potential for high coding quality at very low delay as evidenced by the 16kbits/sec narrowband LD-CELP implementation. A low delay high quality wide-band coder would make commentary grade teleconferencing and two way voice communication a reality. The motivation behind looking at a 32kbits/sec bit rate is heuristic. We are approximately doubling the bandwidth in going from telephone bandwidth to commentary bandwidth, hence we should double the bit rate. One may think that it should be trivial to code even wide band speech at 32kbits/sec as most of the energy is in the lower 3.5khz which can already be coded at 16kbits/sec. As will be shown later this thinking is justified only if signal to noise ratio (SNR) is the exclusive arbiter of quality. Although the energy in the 3.5-7khz range of the speech spectrum is significantly less than the energy in the lower half of the spectrum, the high frequencies are perhaps as perceptually important as the lower ones. In any event we feel that 32kbits/sec is a good target bit rate.
`
Currently there is no 32kbit/sec commentary grade coding standard. There is, however, a CCITT 7khz speech standard (G.722) at 64kbit/sec [20]. The next lowest useful bit rate is 32kbit/sec,² and lower than this is 16kbits/sec. It seems imprudent to set 16kbits/sec as a research goal when there is no solid footing even at 32kbits/sec.
`
After initial experimentation with the basic LD-CELP algorithm applied to wide band speech it was found that one problem is a perceptually inadequate coding of high frequencies. Some high frequency hiss was present in almost all the files tested. The low frequencies on the other hand were being coded extremely well. The specific aim of this thesis is to investigate and evaluate three approaches to reducing the high frequency distortion in the wide band LD-CELP system. These approaches fall under the headings of:

• Simple fixed and adaptive error or noise shaping.

• Auditory model driven noise shaping.

• Split band LD-CELP.

One of the main conclusions of the thesis is that high quality coding of wide band speech at 32kbits/sec is definitely within reach, and all the more so if some of the techniques proposed and evaluated in this thesis are refined and incorporated into the LD-CELP system.

²This bit rate allows for the transmission of two independent or stereo wide band speech channels over a single 64kbit/sec ISDN data line.
`
`
`
`
`1.3 Organization of Thesis
`
The remainder of the thesis is broken down into eight chapters. Three chapters, specifically 4, 5, and 6, are devoted to explaining and evaluating the three fundamental approaches to the reduction of high frequency distortion pursued in this thesis. Chapter 2 is a long but necessary technical introduction to LPC and LD-CELP. Chapter 3 provides some further discussion on the high frequency distortion problem and gives some theoretical motivation for the three main approaches of the thesis. Chapter 7 presents some ideas and improvements to the VQ design algorithm used to generate the codebooks used in the LD-CELP quantization process. As is explained in chapter 3, each new idea and parameter change requires a codebook redesign. This is the only way that the merits of a particular technique or parameter choice can be evaluated unequivocally. Codebook redesign is a long task, and chapter 7 explores some ways in which it can be made more efficient. Chapter 8 presents the results of an 'a/b' subjective test comparing our best wide band 32kbits/sec LD-CELP algorithm to the 64kbits/sec wide band CCITT standard, G.722 [20]. The intent of this presentation is to give the reader a better idea of the speech quality of our LD-CELP algorithm. The final chapter summarizes the results of the previous chapters, and restates the main conclusions of the thesis.
`
`
`
`
`Chapter 2
`
`Basics of LPC and LD-CELP
`
2.1 Introduction
`
This chapter is divided into two sections. The first section introduces the basic concept of LPC and defines certain quantities and expressions which will be referred to repeatedly in the thesis. The second section describes the specific LPC based systems of CELP and LD-CELP.
`
`2.2 What is LPC?
`
`2.2.1 The Basic Idea
`
The notion of predicting future values of a signal based on linear combinations of past values has been around for a while. It was first applied to speech coding by Schroeder and Atal in [6]. The basic idea is that any part of the input signal which can be predicted from past values of the output signal available at the receiver is redundant and need not be transmitted. Only the difference between the predicted value, which the receiver can compute, and the actual input value need be transmitted. If the prediction is good the power in this difference or residual signal will be much less than the power in the input signal. This results in bit savings if coding quality is fixed, or improved coding quality if the bit rate is fixed. Figure 2.1 depicts a block diagram of the classical LPC architecture.
`
`The predictor P(z) is an FIR filter whose coefficients are determined by finding the
`
`
`
`
[Figure: transmitter and receiver block diagrams]
Figure 2.1: Classical LPC architecture.
`
optimum linear predictor for a block of input speech. The first LPC coders of this nature had fixed predictors, but quickly it was realized that to compensate for the nonstationary nature of speech the predictor has to be updated periodically. The derivation of P(z) is referred to as LPC analysis. If this analysis is done on the input signal the coefficients of P(z) have to be sent to the receiver as side information. There are also techniques for adapting P(z) based on previously reconstructed samples available at both receiver and transmitter. In these schemes no side information is sent, resulting in bit savings. However, the predictors derived in this fashion are less efficient, resulting in a higher variance difference signal which is harder to code. There is a trade off and only experimentation can show which method is more appropriate for a particular application. Backward adaptation of predictor parameters has one added advantage in that it can be achieved with very little or no encoding delay. Sometimes this is essential.
`
Before we move on it is worthwhile noting a special property of the LPC coder configuration of figure 2.1. This property is that the overall quantization noise is exactly equal to the quantization noise at Q. In other words, referring to the figure,

    q(n) = s(n) - ŝ(n) = r(n) - r̂(n).
`
`
`
`
This can be easily derived by noting that,

    s(n) = s̃(n) + r(n)

and,

    ŝ(n) = s̃(n) + r̂(n)

where s̃(n) denotes the predicted value common to both expressions. From these two relations the result follows immediately. Therefore, to characterize the quantization characteristics of the entire LPC configuration we can focus exclusively on the characteristics of Q.
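This identity is easy to verify numerically. The sketch below simulates a toy first-order predictive quantization loop in Python; the predictor coefficient, quantizer step size, and Gaussian input are illustrative assumptions, not parameters of any coder discussed in this thesis.

```python
import random

def quantize(x, step=0.25):
    """Uniform scalar quantizer standing in for the block labeled Q."""
    return step * round(x / step)

random.seed(0)
s = [random.gauss(0.0, 1.0) for _ in range(200)]  # stand-in input signal

a1 = 0.9           # assumed first-order predictor coefficient
s_hat_prev = 0.0   # previous reconstructed sample (known to both ends)
for n in range(len(s)):
    pred = a1 * s_hat_prev   # s~(n): prediction from past reconstructed output
    r = s[n] - pred          # residual r(n)
    r_hat = quantize(r)      # quantized residual r^(n)
    s_hat = pred + r_hat     # reconstruction s^(n)
    # overall coder noise equals the quantizer noise at every sample:
    assert abs((s[n] - s_hat) - (r - r_hat)) < 1e-12
    s_hat_prev = s_hat
```

Since ŝ(n) = s̃(n) + r̂(n) by construction, the assertion holds to within floating point rounding regardless of how crude the quantizer is.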
`
`2.2.2 LPC Analysis of a Signal s(n)
`
This subsection presents an overview of the basic LPC analysis procedure. Throughout the section, input signal refers to the signal which is undergoing analysis and not necessarily the input to the coding scheme. As pointed out earlier, the analysis can also be done on the output of the coding scheme if backward adaptation of the prediction filter is used.

The z-transform of the linear predictor P(z) has the form,

    P(z) = a_1 z^{-1} + ... + a_p z^{-p} = Σ_{k=1}^{p} a_k z^{-k}.        (2.1)

Writing the predicted signal as S̃(z) = P(z)S(z), in the time domain this translates into:

    s̃(n) = Σ_{k=1}^{p} a_k s(n-k).        (2.2)
`
Now the idea is to determine the coefficients a_k in such a way that the predictor is optimum according to some criterion for the current input signal. Strictly speaking, the predictor should be designed so that we get the best prediction of the coder input from past values of the coder output. This, however, is an extremely unwieldy optimization problem since the coder output is not available until the predictor is decided upon. Traditionally it is assumed that the output of the coder will be similar enough to the input that designing the predictor to be optimum for predicting future samples of the input from past samples of the input will not be far off the mark. The criterion most often used for optimization is
`
`
`
`
the mean square error between s(n) and the predicted value s̃(n) evaluated over the current section of the input. In the equations which follow sample index 0 corresponds to the first sample of the current N sample analysis block.

Hence we are out to minimize,

    ε(a_1, ..., a_p) = Σ_{n=0}^{N-1} [s(n) - s̃(n)]²
                     = Σ_{n=0}^{N-1} [s(n) - Σ_{k=1}^{p} a_k s(n-k)]².        (2.3)

If we view ε(·) as a scalar function of a_1, ..., a_p we note that it is convex in these variables and hence its local minimum is also its global minimum. This minimum is obtained by requiring that,

    ∂ε/∂a_k = 0,  k = 1, ..., p.        (2.4)

Upon working out the algebra one obtains a set of linear equations in the a_k's which we shall express in matrix notation as:
`
    | c_ss(0,0)    c_ss(0,1)    ...  c_ss(0,p-1)   | | a_1 |   | c_ss(0,1) |
    | c_ss(1,0)    c_ss(1,1)    ...  c_ss(1,p-1)   | | a_2 | = | c_ss(0,2) |
    |    :            :                  :         | |  :  |   |     :     |
    | c_ss(p-1,0)  c_ss(p-1,1)  ...  c_ss(p-1,p-1) | | a_p |   | c_ss(0,p) |        (2.5)

or,

    R_ss a = r_ss.        (2.6)

The matrix R_ss is a local estimate of the auto-correlation matrix (or covariance matrix if the signal is assumed to be zero mean) of the input signal s(n). The entries c_ss(i,j) of the matrix are defined as,

    c_ss(i,j) = Σ_{n=0}^{N-1} s(n-i)s(n-j).        (2.7)
`
There are two ways to deal with the samples with negative indices referred to in (2.7). The first, known as the autocorrelation method, multiplies the input by a window w(n) which is zero for n < 0 and for n > N-1. The window can be uniform over its support (rectangular),
`
`
`
`
but typically it is chosen to minimize the effects of signal discontinuities at n = 0 and n = N-1. The entries of the matrix R_ss in (2.5) are then calculated as,

    c_ss(i,j) = Σ_{n=0}^{N-1} s(n)s(n - |i-j|)

or, letting |i-j| = k,

    r_ss(k) = Σ_{n=0}^{N-1} s(n)s(n-k).

In other words c_ss(i,j) = c_ss(i+Δ, j+Δ) for any shift Δ. Once c_ss(i,j) is known for i = 0, j = 1, ..., p we have all of R_ss. This procedure is the same as replacing s(n) by s(n)w(n) in equation (2.7) with the limits of summation changed to ±∞. The matrix R_ss obtained in this fashion is both symmetric and Toeplitz. Consequently, equation (2.5) lends itself to an efficient O(p²) solution by several recursive algorithms which exploit the structure of R_ss. The most efficient of these algorithms also takes advantage of the fact that p-1 of the entries of r_ss occur in R_ss. All of these algorithms are more efficient than the O(p³) Gaussian elimination based inversion methods.

The other approach is called the covariance method. In this approach samples with negative indices simply correspond to samples from the previous analysis block. For example s(-1) = s'(N-1), where s'(n) also refers to the input signal but with n = 0 corresponding to the start of the previous analysis block. Calculated in this way, R_ss is symmetric, but not Toeplitz. This is the primary disadvantage of the covariance method. No algorithm faster than O(p³) exists for solving (2.5) if this method is used.

For P(z)'s with orders significantly less than the analysis block length the two methods perform almost equally well. All coding schemes explored during the course of this research are based on the autocorrelation method of LPC analysis.
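As a concrete sketch of the autocorrelation method, the Python fragment below windows a block of signal, computes the autocorrelation lags r_ss(k), and solves the Toeplitz normal equations with the O(p²) Levinson-Durbin recursion. The Hamming window, block length, and predictor order are illustrative assumptions, not the parameters of the thesis coder.

```python
import math

def lpc_autocorrelation(s, p):
    """Return predictor taps a_1..a_p and the final residual energy."""
    N = len(s)
    # taper the block to reduce edge discontinuities (Hamming window)
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    sw = [s[n] * w[n] for n in range(N)]
    # autocorrelation lags r_ss(k), k = 0..p
    r = [sum(sw[n] * sw[n - k] for n in range(k, N)) for k in range(p + 1)]
    a = [0.0] * (p + 1)   # a[k] holds predictor coefficient a_k
    err = r[0]            # residual energy before any prediction
    for i in range(1, p + 1):            # Levinson-Durbin recursion
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                    # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)             # residual energy after stage i
    return a[1:], err

# usage: analyze one synthetic block
block = [math.sin(0.1 * n) for n in range(160)]
coeffs, res_energy = lpc_autocorrelation(block, p=10)
```

Each stage reuses the previous order's solution, which is what makes the Toeplitz structure pay off; the covariance method admits no such shortcut.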
`
`2.2.3 Prediction Error
`
An important indicator of the coding gain accrued by LPC over simple PCM type quantization is the ratio of the energy of the input signal to the energy of the error or difference signal obtained by passing the input s(n) through the filter 1 - P(z). This error, which we shall denote as r(n), is due to imperfect short term prediction, since after all,

    r(n) = s(n) - s̃(n)
`
`
`
`
where s̃(n) is defined in (2.2) as the predicted value of s(n). The error ε(·) in equation (2.3) is the energy of r(n). The analysis of the prediction error is usually carried out by assuming that s(n) is a stationary random process, and r(n) is the random process resulting from passing s(n) through the system 1 - P(z). The LPC analysis procedures described in the previous section can also be derived from this point of view by simply replacing all the summations with expected values. Thus, instead of talking about the energy of a signal, we'll talk about its variance. From here on we will denote the energy or variance of r(n) as σ_r² and the variance of s(n) as σ_s². The predictor coefficients were chosen to minimize σ_r². The prediction gain G is defined as:

    G = σ_s² / σ_r².
`
It should be obvious that G ≥ 1 (choosing all the a_k's to be 0 results in G = 1). It is well understood in coding theory that for a scalar quantizer, for example the kind used in figure 2.1, the variance of the quantization error or noise, denoted by σ_q², is proportional to the variance of the input to the quantizer. In other words:

    σ_q² = ε 2^{-2b} σ_x²        (2.8)

where x denotes the input to the quantizer, b is the bit rate in bits per sample, and ε is a constant which depends on the probability distribution function (pdf) of x and the particular nature of the quantizer. Uniform and non-uniform quantizers will have different epsilons. Equation (2.8) is valid only if the quantizer is designed to minimize overload distortion. For the signals depicted in figure 2.1 the quantization noise q(n) = r(n) - r̂(n) = s(n) - ŝ(n). Also let us denote q'(n) = s(n) - ŝ'(n) as the quantization noise resulting from a straightforward scalar quantization and reconstruction ŝ'(n) of s(n). Assuming equal ε's and bit rates for both quantizers, and neglecting the fact that in figure 2.1 we are predicting future input samples from previous reconstructed samples, (2.8) implies that σ_q² will be smaller than σ_q'² by a factor of G. Stated another way, the gain in SNR (signal to noise ratio) of the LPC system over simple quantization is given by G. The actual gain is typically less than this value because predicting input values from previous reconstructed values instead of previous input values results in a higher variance residual or error which is the input to the quantizer.
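To make the SNR-gain argument concrete, here is a small numerical experiment under purely illustrative assumptions (a synthetic first-order autoregressive input rather than speech): for s(n) = 0.9 s(n-1) + e(n) with unit variance white e(n), the first-order prediction gain should approach σ_s²/σ_r² = 1/(1 - 0.9²) ≈ 5.26.

```python
import random

random.seed(1)
a1 = 0.9
s = [0.0]
for _ in range(50000):                 # synthesize the AR(1) signal
    s.append(a1 * s[-1] + random.gauss(0.0, 1.0))
s = s[1000:]                           # discard the start-up transient

var_s = sum(x * x for x in s) / len(s)
resid = [s[n] - a1 * s[n - 1] for n in range(1, len(s))]
var_r = sum(x * x for x in resid) / len(resid)
G = var_s / var_r                      # empirical prediction gain
```

The measured G lands near the theoretical 5.26 for long blocks; the shortfall described above (predicting from reconstructed rather than true samples) is not modeled here.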
`
`
`
`
To gain an appreciation for the benefits accrued by LPC over simple minded quantization it is worthwhile analyzing the asymptotic behaviour of G in the limit of infinite predictor order. Our discussion closely follows a similar one in [15]. We will use a leading superscript p to denote the order of the predictor to which a particular parameter or matrix corresponds. It can be shown that,

    ᵖσ_r² = |ᵖ⁺¹R_ss| / |ᵖR_ss|

where |·| denotes determinant. Recognizing the recursive nature of this equation we can write,

    |ᵖ⁺¹R_ss| = |¹R_ss| Π_{i=1}^{p} ⁱσ_r² = σ_s² Π_{i=1}^{p} ⁱσ_r².        (2.9)
`
We also know that {ᵖσ_r²} is a monotonically decreasing sequence, since we can achieve the lower bound on the residual energy of the p'th stage with a p+1 order predictor by setting the p+1'th coefficient to zero and carrying over the other coefficients from the p order predictor. Furthermore, the sequence of ᵖσ_r² is clearly bounded below by zero. Therefore, this sequence has a limiting value. We will call this asymptotic error variance ∞σ_r². Furthermore, since the sequence of error variances converges and is always greater than zero, we also have the following,

    lim_{p→∞} [ σ_s² Π_{i=1}^{p} ⁱσ_r² ]^{1/p} = ∞σ_r².        (2.10)

Using the fact that the determinant of a matrix is equal to the product of its eigenvalues and denoting the i'th eigenvalue of the p'th autocorrelation matrix by ᵖλ_i, we can combine equations (2.9) and (2.10),

    ∞σ_r² = lim_{p→∞} [ Π_{i=1}^{p} ᵖλ_i ]^{1/p}

or,

    log(∞σ_r²) = lim_{p→∞} (1/p) Σ_{i=1}^{p} log(ᵖλ_i).

In [11] it is shown that,

    lim_{p→∞} (1/p) Σ_{i=1}^{p} log(ᵖλ_i) = (1/2π) ∫_{-π}^{π} log(S_ss(ω)) dω

where S_ss is the power spectrum of s(n). So that finally we can say,

    ∞σ_r² = exp[ (1/2π) ∫_{-π}^{π} log(S_ss(ω)) dω ]

or that,

    ∞G = σ_s²/∞σ_r² = [ (1/2π) ∫_{-π}^{π} S_ss(ω) dω ] / exp[ (1/2π) ∫_{-π}^{π} log(S_ss(ω)) dω ].        (2.11)
Equation (2.11) states that the asymptotic prediction gain is equal to the ratio of the arithmetic and geometric means of the power spectrum of the input. It is easily seen that this ratio is unity only when the input spectrum is flat, indicating that the signal is completely uncorrelated and consequently unpredictable. Because ∞G is a strong function of the 'flatness' of the input spectrum, it is also commonly referred to as the spectral flatness measure (sfm).
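Equation (2.11) is easy to check numerically. The sketch below evaluates the two means on a dense frequency grid for a made-up first-order all-pole spectrum (an assumption for illustration, not speech data); for S_ss(ω) = 1/|1 - 0.9e^{-jω}|² the arithmetic mean is 1/(1 - 0.9²) ≈ 5.26 while the geometric mean is 1, so ∞G ≈ 5.26.

```python
import math

M = 4096  # frequency grid resolution
w = [math.pi * (2 * k + 1 - M) / M for k in range(M)]        # midpoints on (-pi, pi)
S = [1.0 / (1.0 - 2 * 0.9 * math.cos(x) + 0.81) for x in w]  # AR(1) power spectrum

arith = sum(S) / M                                # approximates (1/2pi) * integral of S
geom = math.exp(sum(math.log(v) for v in S) / M)  # exp of the mean of log S
G_inf = arith / geom                              # asymptotic prediction gain, eq. (2.11)
```

For any monic minimum-phase all-pole spectrum the log-spectrum integrates to zero, so the geometric mean is 1 and the asymptotic gain is just the signal variance; a flat spectrum makes both means equal and drives ∞G to 1.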
`
Another important property of the asymptotic prediction error is that it is completely uncorrelated or white. This is easily shown by appealing to the orthogonality principle of linear least squares estimation theory, which states that the minimum estimation error is uncorrelated with the parameters used in the linear estimation. Prediction is the same thing as estimation, and hence the prediction error at index n is orthogonal to all previous values of s(n) and linear combinations thereof. The prediction error at index n-1 is itself a linear combination of previous values of s(n), and hence r(n) and r(n-1) are orthogonal. The same argument applies for all values of n.
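The whiteness claim can also be observed numerically. Under purely illustrative assumptions (an AR(1) signal, for which the order-one predictor is already the infinite-order optimum), the residual's normalized lag-1 autocorrelation should be near zero:

```python
import random

random.seed(2)
a1 = 0.8
s = [0.0]
for _ in range(60000):                 # synthesize an AR(1) signal
    s.append(a1 * s[-1] + random.gauss(0.0, 1.0))
s = s[1000:]

r = [s[n] - a1 * s[n - 1] for n in range(1, len(s))]   # prediction error
mean_r = sum(r) / len(r)
c0 = sum((x - mean_r) ** 2 for x in r) / len(r)
c1 = sum((r[n] - mean_r) * (r[n - 1] - mean_r)
         for n in range(1, len(r))) / len(r)
rho1 = c1 / c0                         # normalized lag-1 correlation of the residual
```

The same estimate applied to s(n) itself gives a lag-1 correlation near 0.8, so the ideal predictor has removed essentially all of the short-term correlation.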
`
The main conclusions of this analysis of the prediction error or residual, which will be referred to repeatedly throughout this thesis, are:

• LPC provides the greatest coding gains for input signals with increasingly non-flat power spectra.

• The asymptotic prediction error is white; its power spectrum is flat.
`
`In real ap