`EFFICIENT VECTOR QUANTIZATION OF LPC
`PARAMETERS FOR HARMONIC SPEECH CODING
`
`by
`
`Bhaskar Bhattacharya
`
`A THESIS SUBMITTED IN PARTIAL FULFILLMENT
`
`OF THE REQUIREMENTS FOR THE DEGREE OF
`DOCTOR OF PHILOSOPHY
`in the School
`
`of
`Engineering Science
`
`© Bhaskar Bhattacharya 1996
`SIMON FRASER UNIVERSITY
`
`October, 1996
`
`All rights reserved. This work may not be
`reproduced in whole or in part, by photocopy
`or other means, without the permission of the author.
`
`Ex. 1029 / Page 1 of 182
`Apple v. Saint Lawrence
`
`
`
`APPROVAL
`
`Name: Bhaskar Bhattacharya
`
`Degree: Doctor of Philosophy
`
`Title of thesis: Efficient Vector Quantization of LPC Parameters for Harmonic Speech Coding
`
`Examining Committee: Dr. John Jones, Chairman
`
`Dr. Vladimir Cuperman, Senior Supervisor
`Professor, Engineering Science, SFU
`
`Dr. Paul Ho, Supervisor
`Associate Professor, Engineering Science, SFU
`
`Dr. Jacques Vaisey, Supervisor
`Assistant Professor, Engineering Science, SFU
`
`Jim Cavers, Internal Examiner
`Professor, Engineering Science, SFU
`
`Dr. Sanjit K. Mitra, External Examiner
`Professor, Electrical and Computer Engineering
`University of California, Santa Barbara
`
`Date Approved: October 11, 1996
`
`
`PARTIAL COPYRIGHT LICENSE
`
`I hereby grant to Simon Fraser University the right to lend my thesis,
`project or extended essay (the title of which is shown below) to users of the
`Simon Fraser University Library, and to make partial or single copies only for
`such users or in response to a request from the library of any other university, or
`other educational institution, on its own behalf or for one of its users. I further
`agree that permission for multiple copying of this work for scholarly purposes
`may be granted by me or the Dean of Graduate Studies. It is understood that
`copying or publication of this work for financial gain shall not be allowed without
`my written permission.
`
`Title of Thesis/Project/Extended Essay
`
`"Efficient Vector Quantization of LPC Parameters for Harmonic
`Speech Coding"
`
`Author:
`
`(signature)
`
`(name)
`
`October 11, 1996
`(date)
`
`
`Abstract
`
`The present thesis deals with the problem of efficient (in bit rate and computational
`complexity) quantization of Linear Prediction Coding (LPC) parameters for low bit
`rate speech coding. The thesis introduces a new LPC quantization technique based on
`the Multi-Stage Vector Quantization (MSVQ) combined with a multi-candidate M-L
`search. The resulting procedure is assessed by evaluating the quantization spectral
`distortion on a speech data-base and by evaluating the subjective speech quality of a
`low-rate speech coder which employs the MSVQ LPC quantization.
`The general structure of MSVQ is described along with a geometrical interpreta-
`tion to provide insight into the structure of the reproduction alphabet in MSVQ. In
`particular, it is shown that MSVQ codevectors provide a tiling of the sample space
`with repetitive patterns. Two tree-search techniques are suggested and one of them,
`the M-L search technique, is studied in more detail.
`The experimental results obtained with MSVQ indicate that transparent quan-
`tization of LSFs (Line Spectral Frequencies - an efficient LPC representation) can
`be achieved with just 22 bits/vector with computational complexity comparable to
`the Split VQ at 24 bits/vector. Alternatively, transparent quantization of LSFs can
`be done using 24 bits/vector (as is done using Split VQ) at a much lower computa-
`tional complexity. Several results relating performance and complexity trade-offs are
`reported showing that MSVQ is a very flexible approach which provides a wide range
`of performance-complexity trade-offs and good robustness.
`The performance of MSVQ codes has been studied under channel error condi-
`tions and codebook ordering using pseudo-Gray coding. It is shown that while VQ
`based systems have lower average spectral distortion and a lower percentage of 2-4
`dB outliers even with transmission errors, scalar quantization may lead to a lower
`percentage of >4 dB outliers, particularly at high error rates.
`The performance of the MSVQ codes has also been studied for the effects of language and
`input spectral shape. It has been shown that MSVQ codes become more robust as
`the number of stages is increased.
`Finally, one of the MSVQ codes developed here has been used to implement an
`1800 bps speech coder using harmonic coding of the excitation and a very coarse 0-bit
`quantization of harmonic spectral shape. The speech quality of the 1800 bps coder
`was better than that of the 2400 bps LPC-10e coder.
`
`
`Acknowledgements
`
`I would like to thank Prof. Vladimir Cuperman for all his guidance and patience
`throughout this work. His suggestions were very helpful during the course of this research.
`I also thank Dr. Jacques Vaisey and Dr. Paul Ho for being on my advisory committee
`and making constructive criticism of the work.
`I wish to express my heartfelt gratitude to my wife Roma for all her encourage-
`ment and tolerance, and to all my friends, particularly Peter Lupini, Aamir Husain,
`and Yingbo Jiang, for the exciting discussions that make research a lively occupation.
`I also obtained a lot of help in keeping my spirits up from my friends Hong Shi and
`Jacqueline Duffy; my sincere thanks to them.
`
`
`Contents
`
`Abstract .......... iii
`Acknowledgements .......... v
`List of Tables .......... x
`List of Figures .......... xi
`
`1 Introduction .......... 1
`  1.1 Speech Coding Techniques .......... 1
`    1.1.1 Waveform Coders .......... 2
`    1.1.2 Parametric Coders .......... 4
`    1.1.3 Speech Coding Standards .......... 8
`  1.2 Motivation and Original Contributions .......... 8
`    1.2.1 Motivation .......... 8
`    1.2.2 Original Contributions .......... 10
`
`2 A Brief Review of Speech Coding Literature .......... 13
`  2.1 Source Coding and Rate Distortion Theory .......... 13
`  2.2 Analysis-by-synthesis Speech Coding .......... 17
`  2.3 Transform Coding .......... 22
`  2.4 Sinusoidal Coding .......... 25
`  2.5 Relative Merits and Demerits of Different Coding Strategies .......... 26
`
`3 Quantization of LPC parameters .......... 28
`  3.1 Choosing an Appropriate Spectral Representation .......... 29
`  3.2 Preprocessing .......... 32
`    3.2.1 Pre-emphasis .......... 32
`    3.2.2 Bandwidth Expansion .......... 33
`    3.2.3 High Frequency Compensation .......... 34
`  3.3 Vector Quantization of LPC Parameters .......... 35
`    3.3.1 Stochastic VQ .......... 38
`    3.3.2 Techniques Exploiting Interframe Correlations .......... 40
`  3.4 Constrained (suboptimal) VQ .......... 43
`    3.4.1 Tree Structured VQ .......... 45
`    3.4.2 Classified VQ .......... 47
`    3.4.3 Product Code VQ .......... 48
`    3.4.4 Basis Vector VQ .......... 49
`    3.4.5 Multi-Stage VQ .......... 51
`    3.4.6 Partitioned VQ (Split VQ) .......... 51
`
`4 Multi-Stage VQ of LPC Parameters .......... 55
`  4.1 Suboptimality of Sequential Search .......... 60
`    4.1.1 Optimality conditions for sequential search .......... 62
`  4.2 Search Strategy .......... 64
`    4.2.1 Search Complexity .......... 69
`    4.2.2 Detailed Analysis of The Search Complexity .......... 71
`  4.3 Codebook Design .......... 73
`    4.3.1 Centroid Computation .......... 74
`    4.3.2 Outlier Weighting .......... 75
`  4.4 Choice of Parameter Representation and Distance Measure .......... 75
`  4.5 Performance and Complexity Trade-offs .......... 78
`  4.6 Robustness Issues .......... 80
`    4.6.1 Effect of Language and Input Spectral Shape .......... 80
`    4.6.2 Performance in the presence of channel errors .......... 82
`  4.7 Improved Codebook Designs for Multi-Stage VQ .......... 85
`    4.7.1 Iterative Sequential Design .......... 86
`    4.7.2 Simultaneous Joint Design .......... 86
`  4.8 Recent Developments in MSVQ .......... 87
`  4.9 Summary .......... 88
`
`5 A Low Rate Spectral Excitation Coder .......... 89
`  5.1 Introduction .......... 90
`  5.2 Architecture of a Very-Low Rate Spectral Excitation Coder .......... 91
`    5.2.1 Treatment of Unvoiced Segments .......... 92
`  5.3 Computation of the Unquantized Residual .......... 93
`  5.4 Estimation and Quantization of Harmonic Parameters .......... 94
`    5.4.1 Pitch Estimation .......... 95
`    5.4.2 Modelling of Harmonic Phases .......... 99
`    5.4.3 Estimation and Quantization of Harmonic Magnitudes .......... 105
`  5.5 An 1800 bps Spectral Excitation Coder .......... 111
`    5.5.1 Evaluation of Coder Performance .......... 113
`  5.6 Conclusions .......... 114
`
`6 Conclusion and Future Directions .......... 115
`
`A Linear Prediction .......... 117
`  A.1 Conceptual Formulation .......... 117
`  A.2 Equivalent Representations .......... 121
`    A.2.1 Computation of Line Spectral Frequencies .......... 125
`  A.3 Maximum Entropy Principle .......... 129
`
`B Quantization .......... 131
`  B.1 Scalar Quantization .......... 131
`    B.1.1 Performance Measures .......... 133
`    B.1.2 Robust Quantization .......... 135
`    B.1.3 Optimum Quantization .......... 137
`  B.2 Vector Quantization .......... 141
`    B.2.1 Vector Quantizer Performance .......... 144
`    B.2.2 Optimum VQ .......... 146
`    B.2.3 VQ Design .......... 150
`
`C Pitch Computation Algorithm .......... 152
`
`D List of Citations .......... 156
`
`References .......... 159
`
`
`List of Tables
`
`1.1 Digital Speech Coding Standards .......... 9
`1.2 Some important ITU-T recommendations .......... 10
`
`3.1 Some early scalar quantization results .......... 31
`3.2 Channel error performance of Basis Vector VQ .......... 50
`
`4.1 MSVQ Configurations and Rates Producing an Average Spectral Distortion of 1 dB .......... 80
`4.2 Spectral Distortion Performance over Different Languages and Input Spectral Shapes .......... 81
`4.3 Percentage of Outliers (2-4 dB) for Different Languages and Input Spectral Shapes .......... 81
`4.4 Average Spectral Distortion for Different Error Rates and Codes .......... 84
`4.5 Percentages of Outliers for Different Error Rates and Codes .......... 84
`
`5.1 Bit Allocation for the 1800 bps coder .......... 113
`5.2 MOS results .......... 114
`
`C.1 Values of empirical constants used in 1800 bps coder .......... 155
`
`
`List of Figures
`
`1.1 A classification of speech coders .......... 2
`1.2 A generalized predictive coder .......... 3
`1.3 The Source-Filter Parametric Coder .......... 4
`1.4 LPC-10 Speech Synthesis Model .......... 6
`1.5 A schematic diagram of the CELP coder .......... 7
`1.6 Historical bit rates of toll quality coders .......... 8
`
`2.1 The primary parameters of R-D theory .......... 15
`2.2 A Generalized Analysis-by-Synthesis System .......... 18
`2.3 Computational structure of the CELP coder .......... 19
`2.4 Schematic diagram of a Transform Coder .......... 22
`
`3.1 Spectral envelope of speech without (solid line) and with (dash line) high frequency compensation .......... 35
`3.2 SIVP coding system .......... 41
`3.3 A tree-searched VQ for m = 3 .......... 42
`3.4 A tree structured VQ .......... 46
`3.5 Classified VQ .......... 47
`3.6 The Split VQ Scheme .......... 52
`
`4.1 Structure of a two-stage two dimensional VQ .......... 58
`4.2 A sequentially searched multi-stage VQ .......... 59
`4.3 Voronoi regions for a two-stage MSVQ .......... 61
`4.4 Growing Tree search of a three stage VQ .......... 65
`4.5 M-L Tree search of a three stage VQ .......... 66
`4.6 Failure of multi-candidate search in a 2-stage VQ .......... 67
`4.7 Failure of M-L search in a 3-stage VQ .......... 68
`4.8 Performance of LSF-6+6 MSVQ with M-L search .......... 71
`4.9 Performance comparison of LAR and LSF codebooks with M-L search .......... 77
`4.10 Spectral distortion of M-L Tree searched MSVQ at 24 bits/vector .......... 78
`4.11 M-L search performance versus search complexity for different rates .......... 79
`4.12 Performance over different languages and input spectral shapes .......... 82
`
`5.1 Magnitude spectrum of a voiced speech segment and corresponding LPC residual .......... 90
`5.2 A conceptual schematic of a spectral excitation coder .......... 92
`5.3 Analysis of SEC parameters .......... 94
`5.4 Performance of the geometric pitch detector .......... 100
`5.5 Pitch pulses marked by the pitch detector .......... 101
`5.6 Difference between measured and predicted phase changes for a voiced frame .......... 103
`5.7 Difference between measured and predicted phase changes for an unvoiced frame .......... 104
`5.8 Frequency sampling points for a P-point DFT .......... 106
`5.9 Log magnitude spectrum templates for voiced and unvoiced speech .......... 109
`5.10 A Low bit rate Spectral Excitation Coder .......... 112
`
`A.1 Linear Prediction Model .......... 121
`A.2 Stepped cylinder model of the vocal tract .......... 123
`A.3 Transformation of Predictor coefficients to LSFs .......... 124
`A.4 Plots showing relationships between LSFs and other parameters .......... 128
`
`B.1 A typical mid-tread scalar quantizer .......... 132
`B.2 Additive noise model of quantization .......... 133
`B.3 Compander model of nonuniform quantization .......... 136
`B.4 A uniform joint pdf over a rectangular region (shown shaded) along with the marginal pdf's .......... 143
`B.5 A vector quantizer satisfying the necessary conditions .......... 149
`
`
`Chapter 1
`
`Introduction
`
`Recent advances in Multimedia Communication and the real possibility of an impend-
`ing integrated services network have generated a lot of interest in digital coding of
`speech. With increasing demand on the bandwidth, more and more emphasis is being
`placed on low bit rate speech coders. The present thesis addresses an important prob-
`lem in low bit rate speech coding - that of efficient quantization of LPC parameters.
`A low bit rate coder based on harmonic excitation is also presented that produces
`good speech quality at rates below 2 kb/s.
`
`1.1 Speech Coding Techniques
`
`Detailed reviews of different speech coding techniques can be found in [36, 23, 41].
`A brief overview is presented below. Speech coding algorithms can be categorized in
`different ways depending on the criterion used. The most common classification of
`coding systems divides them into two main categories: waveform coders and parametric
`coders. The waveform coders, as the name implies, try to preserve the waveform being
`coded and pay no attention to the fact that the signal being coded is speech. The
`parametric coders, on the other hand, depend upon a parsimonious description of
`speech using a priori knowledge about how the signal was generated at the source. The
`idea is that certain physical constraints of the signal generation can be quantified, and
`turned to advantage in efficiently describing the signal. This implies that the signal
`must be fitted into a specific mold and parameterized accordingly. These coding
`techniques which exploit constraints of signal generation are also called source coders
`or vocoders (VOice CODERS).
`Some coders use a mixture (or hybrid) of these two approaches. They use a synthe-
`sis filter that models the vocal tract but attempt to quantize the excitation sequence
`through a waveform matching procedure. We have put these coders under the cat-
`egory of parametric coders in our classification. A broad classification of different
`speech coders is shown in Fig. 1.1.
`
`[Figure 1.1: tree diagram. Speech Coding Systems divide into Waveform Coders and
`Parametric Coders. Waveform Coders: Time Domain (DM, DPCM, ADPCM, APC, VPC) and
`Frequency Domain (SBC, ATC). Parametric Coders: Direct Speech Encoding (STC, MBE) and
`Excitation Encoding, the latter split into Open Loop (LPC-10, RELP, SEC), Mixed (PWI, TFI)
`and Closed Loop (MP-LPC, RPE, CELP, VSELP).]
`
`Figure 1.1: A classification of speech coders
`
`1.1.1 Waveform Coders
`
`The waveform coders operate either in the time domain or in the frequency domain
`and can be classified accordingly.
`
`1.1.1.1 Time Domain Waveform Coders
`
`The time domain waveform coders are all predictive coders, in that they code infor-
`mation that cannot be predicted from already reconstructed speech signals. They
`evolved from DM (Delta Modulation) [58], which uses a first order fixed predictor
`and a one-bit adaptive quantizer, to VPC (Vector Predictive Coding) [24], which uses
`a vector predictor and a vector quantizer for the error sequence. APC (Adaptive
`Predictive Coding) [9, 10, 11] is a technique that uses a scalar predictor of order
`higher than one to predict both the short-term and long-term structures of the speech
`signal and optionally uses filtered quantization error feedback to control the noise
`spectrum. A schematic diagram of a generalized APC coder is shown in Fig. 1.2.
`
`Figure 1.2: A generalized predictive coder
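
The delta modulation scheme mentioned above (first order fixed predictor, one-bit adaptive quantizer) can be illustrated with a minimal Python sketch. The step-size adaptation rule, the constants `grow`, `shrink`, `lo`, `hi`, and the function names are assumptions chosen for illustration; they are not taken from [58] or from this thesis.

```python
def adapt_step(step, bit, prev_bit, grow=1.5, shrink=0.66, lo=0.01, hi=1.0):
    """Hypothetical adaptation rule: widen the step when two successive bits
    agree (slope overload), narrow it when they differ (granular noise)."""
    if prev_bit is None:
        return step
    step *= grow if bit == prev_bit else shrink
    return min(max(step, lo), hi)


def dm_encode(samples, step=0.1):
    """One bit per sample: the sign of the error between the input and the
    prediction, where the prediction is the previous reconstructed sample."""
    bits, recon = [], []
    pred, prev_bit = 0.0, None
    for x in samples:
        bit = 1 if x >= pred else 0
        step = adapt_step(step, bit, prev_bit)
        pred += step if bit else -step      # encoder tracks the decoder state
        bits.append(bit)
        recon.append(pred)
        prev_bit = bit
    return bits, recon


def dm_decode(bits, step=0.1):
    """Rebuild the waveform from the bit stream alone."""
    recon = []
    pred, prev_bit = 0.0, None
    for bit in bits:
        step = adapt_step(step, bit, prev_bit)
        pred += step if bit else -step
        recon.append(pred)
        prev_bit = bit
    return recon
```

Because the encoder predicts from its own reconstruction rather than from the clean input, the decoder can reproduce exactly the same sequence from the bits, which is the sense in which these coders transmit only what cannot be predicted from already reconstructed speech.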
`
`1.1.1.2 Frequency Domain Waveform Coders
`
`Sub Band Coding (SBC) [21] divides the speech spectrum into four or five sub-bands
`using a bank of bandpass filters. Each sub-band is translated to base-band by a
`single-sideband modulation process, resampled at its Nyquist rate, and encoded by
`adaptive quantization or ADPCM. In the receiver, the sub-bands are decoded, mod-
`ulated back to their original position in the frequency domain, and summed to give a
`reconstruction of the original signal. The spectral shape of the quantization noise is
`controlled by bit-allocation.
`In Adaptive Transform Coding (ATC) [113], the speech signal is subdivided into
`blocks and a transform is applied to each block. The transform coefficients are adap-
`tively quantized and transmitted to the receiver where they are decoded and inverse
`transformed to obtain the waveform.
`
`1.1.2 Parametric Coders
`
`Right from its introduction [57, 8, 77], linear prediction has been very successful in
`coding speech. A very popular model used for speech production is the source-filter
`model. The sound generating mechanism (the source) is assumed to be linearly sep-
`arable from the intelligence-modulating vocal tract (the filter) (Fig. 1.3). The speech
`signal, s(n), is analyzed to compute a set of excitation control parameters, J(n), and a
`set of synthesis filter control parameters a(n). The output of the excitation generator,
`e(n), when passed through the synthesis filter, produces reconstructed speech, ŝ(n).
`
`[Figure 1.3: block diagram in which an Excitation Generator feeds a Synthesis Filter.]
`
`Figure 1.3: The Source-Filter Parametric Coder
`
`Despite the success of the source-filter model, some coders do not use it, and
`attempt to model the speech signal as a whole. Thus, the class of parametric coders
`can be further subdivided into those that attempt to model the speech directly, and
`those that attempt to model the excitation sequence and the synthesis filter separately.
`
`1.1.2.1 Direct Speech Encoding
`
`A powerful speech modelling technique uses a sum of sinusoids model to represent
`speech signals. This is represented by
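`\[ s(n) = \sum_{m} A_m(n) \cos\bigl(\theta_m(n)\bigr) \tag{1.1} \]
`
`where m is the harmonic number, $A_m(n)$ and $\theta_m(n)$ are the time-varying amplitude
`and phase of the m-th harmonic, and the summation is taken over the harmonics, whose
`number varies with time.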
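
As a rough illustration of the sum-of-sinusoids model in Eq. (1.1), the sketch below synthesizes one frame in Python. It assumes, purely for simplicity, that the amplitudes and the pitch are constant over the frame and that every harmonic starts at zero phase, so that the phase of harmonic m grows linearly with n; the function name and the 8 kHz default sampling rate are likewise assumptions.

```python
import math


def synthesize_harmonics(amplitudes, f0_hz, num_samples, sample_rate_hz=8000.0):
    """Return s(n) = sum_m A_m * cos(theta_m(n)) for n = 0 .. num_samples - 1.

    amplitudes[0] belongs to the fundamental (m = 1), amplitudes[1] to the
    second harmonic, and so on. The phase is taken as
    theta_m(n) = 2*pi*m*f0*n/fs (stationary-frame assumption).
    """
    frame = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        sample = 0.0
        for m, a in enumerate(amplitudes, start=1):
            sample += a * math.cos(2.0 * math.pi * m * f0_hz * t)
        frame.append(sample)
    return frame


# Three harmonics of a 100 Hz pitch; at 8 kHz the pitch period is 80 samples.
frame = synthesize_harmonics([1.0, 0.5, 0.25], f0_hz=100.0, num_samples=161)
```

A practical coder would let the amplitudes, the pitch and the number of harmonics change from frame to frame and would interpolate them across frame boundaries; none of that is shown here.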
`This was first introduced by Hedelin [55] and later developed by Almeida and
`Tribolet [3], McAulay and Quatieri [82, 83], and Marques, Almeida and Tribolet [80].
`This technique has been called Harmonic Coding and Sinusoidal Transform Coding
`(STC) by different authors.
`A slightly different form of sinusoidal speech modelling was done by Griffin and
`Lim [54]. A closed loop estimation was done for pitch and harmonic magnitudes. The
`speech spectrum was divided into voiced and unvoiced bands and voiced and unvoiced
`components of a speech frame were synthesized differently. The voiced component
`was synthesized in the time domain using Eq. (1.1) and the unvoiced component was
`computed from a synthetic DFT using the overlap-add method [53]. They were added
`together to form the synthetic speech signal. This technique, although performed
`directly on the speech signal, is called Multi Band Excitation (MBE). One version of
`MBE, called improved MBE (IMBE) [16], was subsequently adopted by INMARSAT
`as a standard for satellite voice communication. Another version [85] is currently
`under consideration for the TIA half-rate TDMA digital cellular standard. Typical
`bit rates for sinusoidal coders range from 4.1 kb/s to 9.6 kb/s.
`
`1.1.2.2 Excitation Encoding
`
`The oldest parametric coder is the Channel Vocoder by Dudley [31]. It exploits the
`insensitivity of the aural mechanism to phase, and only attempts to reproduce the
`short time power spectrum of the speech waveform. The spectral envelope of the
`speech is measured with a bank of filters and ascribed wholly to the vocal tract filter,
`while the excitation is estimated to be either a quasi-periodic pulse train, or noise.
`In recent coders that use excitation modelling, the synthesis filter is computed
`from a linear prediction analysis of segments of speech and uses what are called LPC
`parameters. A variety of techniques are used to represent the excitation signal. So,
`the problem in this class of coders is how to quantize the LPC parameters and the
`excitation most efficiently. In some coders the excitation is chosen in a closed loop
`fashion so as to minimize a perceptually significant distortion between the original and
`synthetic speech, and some others use an open loop approach without any reference
`to the synthetic speech. There are also some mixed approaches where a classifier is
`used and different classes are dealt with in an open or closed loop manner (Fig. 1.1).
`
`[Figure 1.4: block diagram of the LPC-10 synthesis model; the legible labels are
`"Excitation Generator" and "Speech".]
`
`Figure 1.4: LPC-10 Speech Synthesis Model
`
`Open loop techniques
`
`The oldest speech coding standard, LPC-10 (U.S. Government Federal Standard 1015)
`[103, 18], uses a 10th order synthesis filter, and pulses and random sequences as the
`excitation (Fig. 1.4). The LPC parameters are represented as reflection coefficients
`and are scalar quantized. Regular pulses at pitch intervals are used as excitation for
`voiced portions and a white random sequence is used for unvoiced portions of the
`speech being coded. The energy distribution is maintained by a gain parameter.
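
The synthesis model just described can be sketched in a few lines of Python: a pulse train at the pitch period (voiced) or a white random sequence (unvoiced) is scaled by a gain and passed through an all-pole filter. This is only an illustration of the idea, not the Federal Standard 1015 algorithm: the function names are hypothetical, the filter is written with direct-form predictor coefficients rather than the reflection coefficients the standard quantizes, and the sign convention s(n) = G e(n) + Σ a_k s(n−k) is an assumption.

```python
import random


def make_excitation(num_samples, voiced, pitch_period=None, seed=0):
    """Unit pulses every `pitch_period` samples if voiced, else white noise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def all_pole_filter(excitation, predictor, gain=1.0):
    """Synthesis filter: s(n) = gain * e(n) + sum_k predictor[k-1] * s(n-k)."""
    speech = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k, a in enumerate(predictor, start=1):
            if n - k >= 0:
                acc += a * speech[n - k]
        speech.append(acc)
    return speech


# Toy first-order example; LPC-10 itself uses a 10th order filter.
voiced = all_pole_filter(make_excitation(6, True, pitch_period=4), [0.5])
unvoiced = all_pole_filter(make_excitation(6, False), [0.5], gain=0.1)
```

In the voiced case each pulse excites a decaying response of the filter, so the output repeats at the pitch period while the filter coefficients shape its spectral envelope.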
`A modification of the LPC-10 called RELP (Residual Excited Linear Prediction)
`[106] uses a quantized low-pass filtered version of the residual as the excitation and
`avoids the problem of classification and computation of pitch.
`The Spectral Excitation Coder (SEC) [25] uses a sum-of-sinusoids model to syn-
`thesize the excitation signal which is passed through an LPC based synthesis filter
`to produce speech. Since the residual is more spectrally flat than speech itself, it
`offers advantages in quantizing the harmonic magnitudes over conventional sinusoidal
`coders.
`
`Closed loop techniques
`
`The hybrid coders CELP (Code Excited Linear Prediction) [12] and VSELP (Vector
`Sum Excited Linear Prediction) [44] employ the same source-filter model (Fig. 1.3)
`but the excitation is selected from a fixed and an adaptive codebook in a closed loop
`fashion known as analysis by synthesis. A schematic structure of the CELP coder is
`shown in Fig. 1.5.
`
`[Figure 1.5: block diagram of the CELP coder; the legible labels are "Adaptive codebook"
`and "Input speech".]
`
`Figure 1.5: A schematic diagram of the CELP coder
`
`VSELP models the excitation sequence as a linear combination of a fixed set of
`M basis vectors,
`
`\[ u_i(n) = \sum_{m=1}^{M} \theta_{im} v_m(n) \]
`
`where $0 \le i \le 2^M - 1$ and $0 \le n \le N - 1$. Here $u_i(n)$ denotes the i-th
`excitation codevector and $v_m(n)$ the m-th basis vector. The linear combination
`coefficients $\theta_{im}$ are restricted to either +1 or -1. This simplifies the procedure
`of codebook search
`for optimum innovation and also makes the system comparatively robust to bit errors
`as a single bit error only affects one component. Computational complexity is also
`reduced for a joint optimal search of the VSELP codebook and the adaptive codebook
`as it requires orthogonalization of a small (typically 10) number of basis vectors only.
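
The structure of the VSELP codebook described above can be sketched as follows: each of the 2^M codevectors is a sum of the M basis vectors weighted by +1 or -1. The mapping used here (bit m of the index set means +1, cleared means -1), the function name and the toy basis are assumptions made for illustration only.

```python
def vselp_codevector(index, basis):
    """Build codevector `index` from `basis`, a list of M equal-length vectors.

    Bit m of `index` selects the sign of basis vector m (assumed mapping:
    1 -> +1, 0 -> -1), so 2**M codevectors need only M stored vectors.
    """
    length = len(basis[0])
    codevector = [0.0] * length
    for m, vec in enumerate(basis):
        sign = 1.0 if (index >> m) & 1 else -1.0
        for n in range(length):
            codevector[n] += sign * vec[n]
    return codevector


# Toy basis with M = 2 vectors of N = 3 samples, giving 4 codevectors.
basis = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]
codebook = [vselp_codevector(i, basis) for i in range(2 ** len(basis))]
```

Flipping a single bit of the index changes the sign of exactly one basis vector, so the received codevector differs from the intended one by twice that vector only. This is the robustness property noted in the text.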
`MP-LPC (Multi Pulse LPC) [5] and RPE (Regular Pulse Excitation) [67] are pre-
`cursors to CELP that use codebooks of pulse trains whose positions and amplitudes
`are determined in a closed loop fashion.
`
`Mixed techniques
`
`It is possible that different approaches be applied in modelling different segments
`of the excitation. In particular, advantage can be taken of the apparent periodicity of
`the voiced portions of speech. The techniques Prototype Waveform Interpolation
`(PWI) [63, 64] and Time Frequency Interpolation (TFI) [97] use open loop frequency
`domain interpolation techniques to model the gradually changing pitch cycles of a
`voiced excitation while using closed loop techniques like CELP for unvoiced segments
`which are difficult to model parametrically due to lack of specific spectral structures.
`
`1.1.3 Speech Coding Standards
`
`A summary of different speech coding standards currently in use is shown in Table 1.1.
`The ITU-T (formerly CCITT) has also passed some recommendations (Table 1.2) for
`digital coding of speech. The progression of toll/near-toll quality speech coding can
`be seen in Fig. 1.6, where bit rates of toll quality coders have been plotted against the
`year of their introduction.
`
`[Figure 1.6: plot of coder bit rate against year of introduction; the Year axis runs
`from 1975 to 2000.]
`
`Figure 1.6: Historical bit rates of toll quality coders
`
`1.2 Motivation and Original Contributions
`
`1.2.1 Motivation
`
`For low bit rate speech coders that employ the source-filter model, a large portion of
`the bit rate is invested in coding synthesis filter parameters. Obviously, one way to
`improve synthetic speech quality at low bit rates will be to minimize the number of
`
`
`Rate (kb/s) | Application | Coding Algorithm | Year Adopted
`64 | PSTN (1st Generation) | Pulse Code Modulation (PCM) | 1972
`32 | PSTN (2nd Generation) | Adaptive Differential PCM (ADPCM) | 1984
`16 | PSTN (3rd Generation) | Low Delay Code Excited Linear Predictive Coding (LDCELP) | 1992
`16 | INMARSAT Standard B (Maritime) | Adaptive Predictive Coding (APC) | 1985
`13 | Pan European Digital Mobile Radio (DMR) Cellular System (GSM) | Regular Pulse Excitation Lo