EFFICIENT VECTOR QUANTIZATION OF LPC
`PARAMETERS FOR HARMONIC SPEECH CODING
`
`by
`
`Bhaskar Bhattacharya
`
`A THESIS SUBMITTED IN PARTIAL FULFILLMENT
`
`OF THE REQUIREMENTS FOR THE DEGREE OF
`DOCTOR OF PHILOSOPHY
`in the School
`
`of
`Engineering Science
`
© Bhaskar Bhattacharya 1996
`SIMON FRASER UNIVERSITY
`
`October, 1996
`
`All rights reserved. This work may not be
`reproduced in whole or in part, by photocopy
`or other means, without the permission of the author.
`
`Ex. 1029 / Page 1 of 182
`Apple v. Saint Lawrence
`
`

`

`APPROVAL
`
`Name:
`
Degree: Doctor of Philosophy

Title of thesis: Efficient Vector Quantization of LPC Parameters for Harmonic
Speech Coding

Examining Committee: Dr. John Jones, Chairman

Dr. Vladimir Cuperman, Senior Supervisor
Professor, Engineering Science, SFU

Dr. Paul Ho, Supervisor
Associate Professor, Engineering Science, SFU

Dr. Jacques Vaisey, Supervisor
Assistant Professor, Engineering Science, SFU

Jim Cavers, Internal Examiner
Professor, Engineering Science, SFU

Dr. Sanjit K. Mitra, External Examiner
Professor, Electrical and Computer Engineering
University of California, Santa Barbara
`
`Date Approved:
`
`October 11, 1996
`
`PARTIAL COPYRIGHT LICENSE
`
`I hereby grant to Simon Fraser University the right to lend my thesis,
`project or extended essay (the title of which is shown below) to users of the
`Simon Fraser University Library, and to make partial or single copies only for
`such users or in response to a request from the library of any other university, or
other educational institution, on its own behalf or for one of its users. I further
`agree that permission for multiple copying of this work for scholarly purposes
`may be granted by me or the Dean of Graduate Studies. It is understood that
`copying or publication of this work for financial gain shall not be allowed without
`my written permission.
`
`Title of Thesis/Project/Extended Essay
`
"Efficient Vector Quantization of LPC Parameters for Harmonic
Speech Coding"

Author:

(signature)

(name)

October 11, 1996
(date)
`
`Abstract
`
`The present thesis deals with the problem of efficient (in bit rate and computational
`complexity) quantization of Linear Prediction Coding (LPC) parameters for low bit
`rate speech coding. The thesis introduces a new LPC quantization technique based on
`the Multi-Stage Vector Quantization (MSVQ) combined with a multi-candidate M-L
`search. The resulting procedure is assessed by evaluating the quantization spectral
`distortion on a speech data-base and by evaluating the subjective speech quality of a
`low-rate speech coder which employs the MSVQ LPC quantization.
The general structure of MSVQ is described along with a geometrical interpretation
to provide insight into the structure of the reproduction alphabet in MSVQ. In
particular, it is shown that MSVQ codevectors provide a tiling of the sample space
with repetitive patterns. Two tree-search techniques are suggested and one of them,
the M-L search technique, is studied in more detail.
The experimental results obtained with MSVQ indicate that transparent quantization
of LSFs (Line Spectral Frequencies - an efficient LPC representation) can
be achieved with just 22 bits/vector with computational complexity comparable to
that of Split VQ at 24 bits/vector. Alternatively, transparent quantization of LSFs
can be done using 24 bits/vector (as is done using Split VQ) at a much lower
computational complexity. Several results relating performance and complexity
trade-offs are reported, showing that MSVQ is a very flexible approach which provides
a wide range of performance-complexity trade-offs and good robustness.
The performance of MSVQ codes has been studied under channel error conditions
and with codebook ordering using pseudo-Gray coding. It is shown that while VQ
based systems have lower average spectral distortion and a lower percentage of 2-4
dB outliers even with transmission errors, scalar quantization may lead to a lower
percentage of 4 dB outliers, particularly at high error rates.
`
The performance of the MSVQ codes has also been studied for the effects of language
and input spectral shape. It is shown that MSVQ codes become more robust as
the number of stages is increased.
Finally, one of the MSVQ codes developed here has been used to implement an
1800 bps speech coder using harmonic coding of the excitation and a very coarse 0-bit
quantization of harmonic spectral shape. The speech quality of the 1800 bps coder
was better than that of the 2400 bps LPC-10e coder.
`
`Acknowledgements
`
`I would like to thank Prof. Vladimir Cuperman for all his guidance and patience all
`along this work. His suggestions were very helpful during the course of this research.
`I also thank Dr. Jacques Vaisey and Dr. Paul Ho for being on my advisory committee
`and making constructive criticism of the work.
`I wish to express my heartfelt gratitude to my wife Roma for all her encourage-
`ments and tolerance, and all my friends, particularly Peter Lupini, Aamir Husain,
`and Yingbo Jiang for the exciting discussions that make research a lively occupation.
`I also obtained a lot of help in keeping my spirits up from my friends Hong Shi and
`Jacqueline Duffy, my sincere thanks to them.
`
`Contents
`
Abstract ..... iii
Acknowledgements ..... v
List of Tables ..... x
List of Figures ..... xi

1 Introduction ..... 1
  1.1 Speech Coding Techniques ..... 1
    1.1.1 Waveform Coders ..... 2
    1.1.2 Parametric Coders ..... 4
    1.1.3 Speech Coding Standards ..... 8
  1.2 Motivation and Original Contributions ..... 8
    1.2.1 Motivation ..... 8
    1.2.2 Original Contributions ..... 10

2 A Brief Review of Speech Coding Literature ..... 13
  2.1 Source Coding and Rate Distortion Theory ..... 13
  2.2 Analysis-by-synthesis Speech Coding ..... 17
  2.3 Transform Coding ..... 22
  2.4 Sinusoidal Coding ..... 25
  2.5 Relative Merits and Demerits of Different Coding Strategies ..... 26

3 Quantization of LPC parameters ..... 28
  3.1 Choosing an Appropriate Spectral Representation ..... 29
  3.2 Preprocessing ..... 32
    3.2.1 Pre-emphasis ..... 32
    3.2.2 Bandwidth Expansion ..... 33
    3.2.3 High Frequency Compensation ..... 34
  3.3 Vector Quantization of LPC Parameters ..... 35
    3.3.1 Stochastic VQ ..... 38
    3.3.2 Techniques Exploiting Interframe Correlations ..... 40
  3.4 Constrained (suboptimal) VQ ..... 43
    3.4.1 Tree Structured VQ ..... 45
    3.4.2 Classified VQ ..... 47
    3.4.3 Product Code VQ ..... 48
    3.4.4 Basis Vector VQ ..... 49
    3.4.5 Multi-Stage VQ ..... 51
    3.4.6 Partitioned VQ (Split VQ) ..... 51

4 Multi-Stage VQ of LPC Parameters ..... 55
  4.1 Suboptimality of Sequential Search ..... 60
    4.1.1 Optimality conditions for sequential search ..... 62
  4.2 Search Strategy ..... 64
    4.2.1 Search Complexity ..... 69
    4.2.2 Detailed Analysis of The Search Complexity ..... 71
  4.3 Codebook Design ..... 73
    4.3.1 Centroid Computation ..... 74
    4.3.2 Outlier Weighting ..... 75
  4.4 Choice of Parameter Representation and Distance Measure ..... 75
  4.5 Performance and Complexity Trade-offs ..... 78
  4.6 Robustness Issues ..... 80
    4.6.1 Effect of Language and Input Spectral Shape ..... 80
    4.6.2 Performance in the presence of channel errors ..... 82
  4.7 Improved Codebook Designs for Multi-Stage VQ ..... 85
    4.7.1 Iterative Sequential Design ..... 86
    4.7.2 Simultaneous Joint Design ..... 86
  4.8 Recent Developments in MSVQ ..... 87
  4.9 Summary ..... 88

5 A Low Rate Spectral Excitation Coder ..... 89
  5.1 Introduction ..... 90
  5.2 Architecture of a Very-Low Rate Spectral Excitation Coder ..... 91
    5.2.1 Treatment of Unvoiced Segments ..... 92
  5.3 Computation of the Unquantized Residual ..... 93
  5.4 Estimation and Quantization of Harmonic Parameters ..... 94
    5.4.1 Pitch Estimation ..... 95
    5.4.2 Modelling of Harmonic Phases ..... 99
    5.4.3 Estimation and Quantization of Harmonic Magnitudes ..... 105
  5.5 An 1800 bps Spectral Excitation Coder ..... 111
    5.5.1 Evaluation of Coder Performance ..... 113
  5.6 Conclusions ..... 114

6 Conclusion and Future Directions ..... 115

A Linear Prediction ..... 117
  A.1 Conceptual Formulation ..... 117
  A.2 Equivalent Representations ..... 121
    A.2.1 Computation of Line Spectral Frequencies ..... 125
  A.3 Maximum Entropy Principle ..... 129

B Quantization ..... 131
  B.1 Scalar Quantization ..... 131
    B.1.1 Performance Measures ..... 133
    B.1.2 Robust Quantization ..... 135
    B.1.3 Optimum Quantization ..... 137
  B.2 Vector Quantization ..... 141
    B.2.1 Vector Quantizer Performance ..... 144
    B.2.2 Optimum VQ ..... 146
    B.2.3 VQ Design ..... 150

C Pitch Computation Algorithm ..... 152

D List of Citations ..... 156

References ..... 159
`
`List of Tables
`
1.1 Digital Speech Coding Standards ..... 9
1.2 Some important ITU-T recommendations ..... 10

3.1 Some early scalar quantization results ..... 31
3.2 Channel error performance of Basis Vector VQ ..... 50

4.1 MSVQ Configurations and Rates Producing an Average Spectral Distortion of 1 dB ..... 80
4.2 Spectral Distortion Performance over Different Languages and Input Spectral Shapes ..... 81
4.3 Percentage of Outliers (2-4 dB) for Different Languages and Input Spectral Shapes ..... 81
4.4 Average Spectral Distortion for Different Error Rates and Codes ..... 84
4.5 Percentages of Outliers for Different Error Rates and Codes ..... 84

5.1 Bit Allocation for the 1800 bps coder ..... 113
5.2 MOS results ..... 114

C.1 Values of empirical constants used in 1800 bps coder ..... 155
`
`List of Figures
`
1.1 A classification of speech coders ..... 2
1.2 A generalized predictive coder ..... 3
1.3 The Source-Filter Parametric Coder ..... 4
1.4 LPC-10 Speech Synthesis Model ..... 6
1.5 A schematic diagram of the CELP coder ..... 7
1.6 Historical bit rates of toll quality coders ..... 8

2.1 The primary parameters of R-D theory ..... 15
2.2 A Generalized Analysis-by-Synthesis System ..... 18
2.3 Computational structure of the CELP coder ..... 19
2.4 Schematic diagram of a Transform Coder ..... 22

3.1 Spectral envelope of speech without (solid line) and with (dash line) high frequency compensation ..... 35
3.2 SIVP coding system ..... 41
3.3 A tree-searched VQ for m = 3 ..... 42
3.4 A tree structured VQ ..... 46
3.5 Classified VQ ..... 47
3.6 The Split VQ Scheme ..... 52

4.1 Structure of a two-stage two dimensional VQ ..... 58
4.2 A sequentially searched multi-stage VQ ..... 59
4.3 Voronoi regions for a two-stage MSVQ ..... 61
4.4 Growing Tree search of a three stage VQ ..... 65
4.5 M-L Tree search of a three stage VQ ..... 66
4.6 Failure of multi-candidate search in a 2-stage VQ ..... 67
4.7 Failure of M-L search in a 3-stage VQ ..... 68
4.8 Performance of LSF-6+6 MSVQ with M-L search ..... 71
4.9 Performance comparison of LAR and LSF codebooks with M-L search ..... 77
4.10 Spectral distortion of M-L Tree searched MSVQ at 24 bits/vector ..... 78
4.11 M-L search performance versus search complexity for different rates ..... 79
4.12 Performance over different languages and input spectral shapes ..... 82

5.1 Magnitude spectrum of a voiced speech segment and corresponding LPC residual ..... 90
5.2 A conceptual schematic of a spectral excitation coder ..... 92
5.3 Analysis of SEC parameters ..... 94
5.4 Performance of the geometric pitch detector ..... 100
5.5 Pitch pulses marked by the pitch detector ..... 101
5.6 Difference between measured and predicted phase changes for a voiced frame ..... 103
5.7 Difference between measured and predicted phase changes for an unvoiced frame ..... 104
5.8 Frequency sampling points for a P-point DFT ..... 106
5.9 Log magnitude spectrum templates for voiced and unvoiced speech ..... 109
5.10 A Low bit rate Spectral Excitation Coder ..... 112

A.1 Linear Prediction Model ..... 121
A.2 Stepped cylinder model of the vocal tract ..... 123
A.3 Transformation of Predictor coefficients to LSFs ..... 124
A.4 Plots showing relationships between LSFs and other parameters ..... 128

B.1 A typical mid-tread scalar quantizer ..... 132
B.2 Additive noise model of quantization ..... 133
B.3 Compander model of nonuniform quantization ..... 136
B.4 A uniform joint pdf over a rectangular region (shown shaded) along with the marginal pdf's ..... 143
B.5 A vector quantizer satisfying the necessary conditions ..... 149
`
`Chapter 1
`
`Introduction
`
`Recent advances in Multimedia Communication and the real possibility of an impend-
`ing integrated services network have generated a lot of interest in digital coding of
`speech. With increasing demand on the bandwidth, more and more emphasis is being
`placed on low bit rate speech coders. The present thesis addresses an important prob-
`lem in low bit rate speech coding - that of efficient quantization of LPC parameters.
`A low bit rate coder based on harmonic excitation is also presented that produces
`good speech quality at rates below 2 kb/s.
`
`1.1 Speech Coding Techniques
`
Detailed reviews of different speech coding techniques can be found in [36, 23, 41].
A brief overview is presented below. Speech coding algorithms can be categorized in
different ways depending on the criterion used. The most common classification
divides coding systems into two main categories: waveform coders and parametric
coders. Waveform coders, as the name implies, try to preserve the waveform being
coded and pay no attention to the fact that the signal being coded is speech.
Parametric coders, on the other hand, depend upon a parsimonious description of
speech using a priori knowledge about how the signal was generated at the source. The
idea is that certain physical constraints of the signal generation can be quantified and
turned to advantage in efficiently describing the signal. This implies that the signal
must be fitted into a specific mold and parameterized accordingly. Coding techniques
that exploit constraints of signal generation are also called source coders
or vocoders (VOice CODERS).
Some coders use a mixture (or hybrid) of these two approaches. They use a synthesis
filter that models the vocal tract but attempt to quantize the excitation sequence
through a waveform matching procedure. We have put these coders under the category
of parametric coders in our classification. A broad classification of different
speech coders is shown in Fig. 1.1.
`
[Figure 1.1 is a tree diagram. Speech Coding Systems divide into Waveform Coders
and Parametric Coders.
Waveform Coders: Time Domain (DM, DPCM, ADPCM, APC, VPC); Frequency Domain (SBC, ATC).
Parametric Coders: Direct Speech Encoding (STC, MBE); Excitation Encoding, which
divides into Open Loop (LPC-10, RELP, SEC), Mixed (PWI, TFI), and Closed Loop
(MP-LPC, RPE, CELP, VSELP).]

Figure 1.1: A classification of speech coders
`
`1.1.1 Waveform Coders
`
`The waveform coders operate either in the time domain or in the frequency domain
`and can be classified accordingly.
`
`1.1.1.1 Time Domain Waveform Coders
`
The time domain waveform coders are all predictive coders, in that they code
information that cannot be predicted from already reconstructed speech signals. They
evolved from DM (Delta Modulation) [58], which uses a first order fixed predictor
and a one-bit adaptive quantizer, to VPC (Vector Predictive Coding) [24], which uses
a vector predictor and a vector quantizer for the error sequence. APC (Adaptive
Predictive Coding) [9, 10, 11] is a technique that uses a scalar, higher (> 1) order
predictor to predict both short-term and long-term structures of the speech signal and
optionally uses a filtered quantization error feedback to control the noise spectrum. A
schematic diagram of a generalized APC coder is shown in Fig. 1.2.

Figure 1.2: A generalized predictive coder
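
As an illustration of the simplest member of this family, the following Python sketch implements a delta modulator with a first order fixed predictor and a one-bit adaptive quantizer. The function names, the unit predictor coefficient, and the step adaptation rule (grow the step when consecutive bits agree, shrink it when they alternate) are assumptions made for this example and are not taken from [58].

```python
import math

def _adapt(step, bit, prev_bit, grow, step_min, step_max):
    """Hypothetical adaptation rule: widen the step on repeated bits, narrow it otherwise."""
    if prev_bit is None:
        return step
    step = step * grow if bit == prev_bit else step / grow
    return min(max(step, step_min), step_max)

def dm_encode(samples, step=0.05, grow=1.5, step_min=0.005, step_max=0.5):
    """Return (bits, reconstruction); one bit is produced per input sample."""
    bits, recon = [], []
    prediction, prev_bit = 0.0, None  # prediction = previous reconstructed sample
    for x in samples:
        bit = 1 if x >= prediction else 0  # one-bit quantizer on the prediction error
        step = _adapt(step, bit, prev_bit, grow, step_min, step_max)
        prediction += step if bit else -step
        bits.append(bit)
        recon.append(prediction)
        prev_bit = bit
    return bits, recon

def dm_decode(bits, step=0.05, grow=1.5, step_min=0.005, step_max=0.5):
    """Rebuild the signal from the bits alone by repeating the encoder recursion."""
    recon = []
    prediction, prev_bit = 0.0, None
    for bit in bits:
        step = _adapt(step, bit, prev_bit, grow, step_min, step_max)
        prediction += step if bit else -step
        recon.append(prediction)
        prev_bit = bit
    return recon

# Usage: code two periods of a sine wave and decode it again.
samples = [math.sin(2.0 * math.pi * n / 40.0) for n in range(80)]
bits, recon = dm_encode(samples)
decoded = dm_decode(bits)
```

Because the encoder predicts from its own reconstruction rather than from the input, the decoder can track it exactly from the bit stream, which is the property shared by all the predictive coders named above.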
`
`1.1.1.2 Frequency Domain Waveform Coders
`
`Sub Band Coding (SBC) [21] divides the speech spectrum into four or five sub-bands
`using a bank of bandpass filters. Each sub-band is translated to base-band by a
`single-sideband modulation process, resampled at its Nyquist rate, and encoded by
`adaptive quantization or ADPCM. In the receiver, the sub-bands are decoded, mod-
`ulated back to their original position in the frequency domain, and summed to give a
`reconstruction of the original signal. The spectral shape of the quantization noise is
`controlled by bit-allocation.
`In Adaptive Transform Coding (ATC) [113], the speech signal is subdivided into
`blocks and a transform is applied to each block. The transform coefficients are adap-
`tively quantized and transmitted to the receiver where they are decoded and inverse
`transformed to obtain the waveform.
`
`
`1.1.2 Parametric Coders
`
`
Right from its introduction [57, 8, 77], linear prediction has been very successful in
coding speech. A very popular model used for speech production is the source-filter
model. The sound generating mechanism (the source) is assumed to be linearly
separable from the intelligence-modulating vocal tract (the filter) (Fig. 1.3). The speech
signal, s(n), is analyzed to compute a set of excitation control parameters, J(n), and a
set of synthesis filter control parameters, a(n). The output of the excitation generator,
e(n), when passed through the synthesis filter produces reconstructed speech, ŝ(n).
`
[Figure 1.3 is a block diagram: an Excitation Generator followed by a Synthesis Filter.]

Figure 1.3: The Source-Filter Parametric Coder
`
`Despite the success of the source-filter model, some coders do not use it, and
`attempt to model the speech signal as a whole. Thus, the class of parametric coders
`can be further subdivided into those that attempt to model the speech directly, and
`those that attempt to model the excitation sequence and the synthesis filter separately.
`
`1.1.2.1 Direct Speech Encoding
`
A powerful speech modelling technique uses a sum of sinusoids model to represent
speech signals. This is represented by

\[ s(n) = \sum_{m} A_m(n) \cos\big(\theta_m(n)\big) \tag{1.1} \]

where m is the harmonic number and the summation is taken over the number of
harmonics, which varies with time.
This model was first introduced by Hedelin [55] and later developed by Almeida and
Tribolet [3], McAulay and Quatieri [82, 83], and Marques, Almeida and Tribolet [80].
This technique has been called Harmonic Coding and Sinusoidal Transform Coding
(STC) by different authors.
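
To make Eq. (1.1) concrete, the following Python sketch synthesizes one frame from a set of harmonic amplitudes. It is a simplified illustration only: it assumes the amplitudes and the fundamental frequency are constant over the frame and that every harmonic starts at zero phase, so that theta_m(n) = 2*pi*m*f0*n/fs. The function and parameter names are hypothetical.

```python
import math

def synthesize_harmonics(amplitudes, f0_hz, sample_rate_hz, num_samples):
    """Evaluate s(n) = sum_m A_m cos(theta_m(n)) for n = 0 .. num_samples - 1.

    amplitudes[m - 1] is A_m for harmonic number m = 1, 2, ...
    Assumes theta_m(n) = 2*pi*m*f0*n/fs (fixed pitch, zero initial phase).
    """
    frame = []
    for n in range(num_samples):
        s = 0.0
        for m, a in enumerate(amplitudes, start=1):
            s += a * math.cos(2.0 * math.pi * m * f0_hz * n / sample_rate_hz)
        frame.append(s)
    return frame

# Usage: three harmonics of a 100 Hz fundamental, 20 ms at 8 kHz sampling.
frame = synthesize_harmonics([1.0, 0.5, 0.25], 100.0, 8000.0, 160)
```

In an actual coder the amplitudes and phases vary from frame to frame and must be interpolated; that step is omitted here.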
A slightly different form of sinusoidal speech modelling was done by Griffin and
Lim [54]. A closed loop estimation was done for pitch and harmonic magnitudes. The
speech spectrum was divided into voiced and unvoiced bands, and the voiced and unvoiced
components of a speech frame were synthesized differently. The voiced component
was synthesized in the time domain using Eq. (1.1) and the unvoiced component was
computed from a synthetic DFT using the overlap-add method [53]. They were added
together to form the synthetic speech signal. This technique, although performed
directly on the speech signal, is called Multi Band Excitation (MBE). One version of
MBE, called improved MBE (IMBE) [16], was subsequently adopted by INMARSAT
as a standard for satellite voice communication. Another version [85] is currently
under consideration for the TIA half-rate TDMA digital cellular standard. Typical
bit rates for sinusoidal coders range from 4.1 kb/s to 9.6 kb/s.
`
`1.1.2.2 Excitation Encoding
`
The oldest parametric coder is the Channel Vocoder by Dudley [31]. It exploits the
`insensitivity of the aural mechanism to phase, and only attempts to reproduce the
`short time power spectrum of the speech waveform. The spectral envelope of the
`speech is measured with a bank of filters and ascribed wholly to the vocal tract filter,
`while the excitation is estimated to be either a quasi-periodic pulse train, or noise.
In recent coders that use excitation modelling, the synthesis filter is computed
`from a linear prediction analysis of segments of speech and uses what are called LPC
`parameters. A variety of techniques are used to represent the excitation signal. So,
`the problem in this class of coders is how to quantize the LPC parameters and the
`excitation most efficiently. In some coders the excitation is chosen in a closed loop
`fashion so as to minimize a perceptually significant distortion between the original and
`synthetic speech, and some others use an open loop approach without any reference
`to the synthetic speech. There are also some mixed approaches where a classifier is
`used and different classes are dealt with in an open or closed loop manner (Fig. 1.1).
`
[Figure 1.4 is a block diagram; the recoverable labels are Excitation Generator and Speech.]

Figure 1.4: LPC-10 Speech Synthesis Model
`
`Open loop techniques
`
The oldest speech coding standard, LPC-10 (U.S. Government Federal Standard 1015)
[103, 18], uses a 10th order synthesis filter, and pulses and random sequences as the
`excitation (Fig. 1.4). The LPC parameters are represented as reflection coefficients
`and are scalar quantized. Regular pulses at pitch intervals are used as excitation for
`voiced portions and a white random sequence is used for unvoiced portions of the
`speech being coded. The energy distribution is maintained by a gain parameter.
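
The synthesis side of such an open loop coder can be sketched in a few lines of Python. This is an illustration of the model only, not of the standard: it assumes the filter is given as direct-form predictor coefficients a_k with the sign convention s(n) = G e(n) + sum_k a_k s(n-k), whereas LPC-10 itself transmits quantized reflection coefficients. All names and the Gaussian noise source are hypothetical choices.

```python
import random

def make_excitation(num_samples, voiced, pitch_period=None, seed=0):
    """Unit pulses every pitch_period samples if voiced, else white Gaussian noise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

def synthesize(excitation, lpc, gain):
    """All-pole synthesis filter: s(n) = gain * e(n) + sum_k lpc[k-1] * s(n-k)."""
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out

# Usage: a first-order filter keeps the arithmetic easy to follow by hand.
pulses = make_excitation(8, voiced=True, pitch_period=4)
noise = make_excitation(8, voiced=False)
response = synthesize([1.0, 0.0, 0.0], lpc=[0.5], gain=2.0)
```

A 10th order filter is obtained simply by passing ten coefficients in `lpc`; the voiced/unvoiced switch and the single gain are the only excitation parameters in this sketch.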
`A modification of the LPC-10 called RELP (Residual Excited Linear Prediction)
`[106] uses a quantized low-pass filtered version of the residual as the excitation and
`avoids the problem of classification and computation of pitch.
`The Spectral Excitation Coder (SEC) [25] uses a sum-of-sinusoids model to syn-
`thesize the excitation signal which is passed through an LPC based synthesis filter
`to produce speech. Since the residual is more spectrally flat than speech itself, it
`offers advantages in quantizing the harmonic magnitudes over conventional sinusoidal
`coders.
`
`Closed loop techniques
`
`The hybrid coders CELP (Code Excited Linear Prediction) [12] and VSELP (Vector
`Sum Excited Linear Prediction) [44] employ the same source-filter model (Fig. 1.3)
`but the excitation is selected from a fixed and an adaptive codebook in a closed loop
`fashion known as analysis by synthesis. A schematic structure of the CELP coder is
shown in Fig. 1.5.

[Figure 1.5 is a block diagram; the recoverable labels are Adaptive codebook and Input speech.]

Figure 1.5: A schematic diagram of the CELP coder

VSELP models the excitation sequence as a linear combination of a fixed set of
M basis vectors,

\[ u_i(n) = \sum_{m=1}^{M} \theta_{im} \, v_m(n) \]

where u_i(n) is the i-th excitation codevector, v_m(n) is the m-th basis vector,
0 ≤ i ≤ 2^M - 1 and 0 ≤ n ≤ N - 1. The linear combination coefficients θ_im
are restricted to either +1 or -1. This simplifies the procedure of codebook search
for optimum innovation and also makes the system comparatively robust to bit errors,
as a single bit error only affects one component. Computational complexity is also
reduced for a joint optimal search of the VSELP codebook and the adaptive codebook,
as it requires orthogonalization of only a small number (typically 10) of basis vectors.
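
The structure of such a codebook can be sketched in Python as follows. The mapping from the bits of the index i to the signs (bit set gives +1, bit clear gives -1) is an assumption made for this example, as are the function name and the toy basis vectors.

```python
def vselp_codevector(index, basis):
    """Build codevector `index` as a +/-1 weighted sum of the M basis vectors.

    basis is a list of M vectors of equal length N; index ranges over 0 .. 2**M - 1.
    Assumed sign rule: the coefficient for basis m is +1 if bit m of index is set, else -1.
    """
    length = len(basis[0])
    out = [0.0] * length
    for m, v in enumerate(basis):
        theta = 1.0 if (index >> m) & 1 else -1.0
        for n in range(length):
            out[n] += theta * v[n]
    return out

# Usage: M = 2 basis vectors of length N = 2 give 2**2 = 4 codevectors.
basis = [[1.0, 0.0], [0.0, 2.0]]
codebook = [vselp_codevector(i, basis) for i in range(4)]
```

Only the M basis vectors need to be stored, and flipping one bit of the index changes the codevector by twice a single basis vector, which is the source of the robustness to bit errors noted above.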
MP-LPC (Multi Pulse LPC) [5] and RPE (Regular Pulse Excitation) [67] are
precursors to CELP that use codebooks of pulse trains whose positions and amplitudes
are determined in a closed loop fashion.
`
`Mixed techniques
`
Different approaches may be applied in modelling different segments
of the excitation. In particular, advantage can be taken of the apparent periodicity of
the voiced portions of speech. The techniques Prototype Waveform Interpolation
(PWI) [63, 64] and Time Frequency Interpolation (TFI) [97] use open loop frequency
domain interpolation techniques to model the gradually changing pitch cycles of a
voiced excitation while using closed loop techniques like CELP for unvoiced segments,
which are difficult to model parametrically due to the lack of specific spectral structures.
`
`1.1.3 Speech Coding Standards
`
A summary of different speech coding standards currently in use is shown in Table 1.1.
The ITU-T (formerly CCITT) has also passed some recommendations (Table 1.2) for
digital coding of speech. The progression of toll/near-toll quality speech coding can
be seen in Fig. 1.6 where bit rates of toll quality coders have been plotted with the
year of their introduction.
`
[Figure 1.6 is a plot; horizontal axis: Year, 1975 to 2000.]

Figure 1.6: Historical bit rates of toll quality coders
`
`1.2 Motivation and Original Contributions
`
`1.2.1 Motivation
`
`For low bit rate speech coders that employ the source-filter model, a large portion of
`the bit rate is invested in coding synthesis filter parameters. Obviously, one way to
`improve synthetic speech quality at low bit rates will be to minimize the number of
`
`
Rate (kb/s)  Application                    Coding Algorithm                    Year Adopted
64           PSTN (1st Generation)          Pulse Code Modulation (PCM)         1972
32           PSTN (2nd Generation)          Adaptive Differential PCM (ADPCM)   1984
16           PSTN (3rd Generation)          Low Delay Code Excited Linear       1992
                                            Predictive Coding (LDCELP)
16           INMARSAT Standard B            Adaptive Predictive Coding (APC)    1985
             (Maritime)
13           Pan European Digital Mobile    Regular Pulse Excitation Lo
             Radio (DMR) Cellular System
             (GSM)
