`
`ITU-T
`
`TELECOMMUNICATION
`STANDARDIZATION SECTOR
`OF ITU
`
`G.729
`(03/96)
`
`GENERAL ASPECTS OF DIGITAL TRANSMISSION
`SYSTEMS
`
`CODING OF SPEECH AT 8 kbit/s
`USING CONJUGATE-STRUCTURE
`ALGEBRAIC-CODE-EXCITED
`LINEAR-PREDICTION (CS-ACELP)
`
`ITU-T Recommendation G.729
`
`(Previously “CCITT Recommendation”)
`
`Ex. 1038 / Page 1 of 39
`Apple v. Saint Lawrence
`
`
`
`FOREWORD
`
`The ITU-T (Telecommunication Standardization Sector) is a permanent organ of the International Telecommunication
`Union (ITU). The ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommen-
`dations on them with a view to standardizing telecommunications on a worldwide basis.
`
`The World Telecommunication Standardization Conference (WTSC), which meets every four years, establishes the topics
`for study by the ITU-T Study Groups which, in their turn, produce Recommendations on these topics.
`
`The approval of Recommendations by the Members of the ITU-T is covered by the procedure laid down in WTSC
`Resolution No. 1 (Helsinki, March 1-12, 1993).
`
`ITU-T Recommendation G.729 was prepared by ITU-T Study Group 15 (1993-1996) and was approved under the WTSC
`Resolution No. 1 procedure on the 19th of March 1996.
`
`___________________
`
`In this Recommendation, the expression “Administration” is used for conciseness to indicate both a telecommunication
`administration and a recognized operating agency.
`
`NOTE
`
`All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or
`mechanical, including photocopying and microfilm, without permission in writing from the ITU.
`
© ITU 1996
`
`
`
`Recommendation G.729 (03/96)
`
CONTENTS

                                                                            Page
1     Introduction ........................................................    1
2     General description of the coder ....................................    1
      2.1   Encoder .......................................................    2
      2.2   Decoder .......................................................    3
      2.3   Delay .........................................................    4
      2.4   Speech coder description ......................................    4
      2.5   Notational conventions ........................................    4
3     Functional description of the encoder ...............................    7
      3.1   Pre-processing ................................................    7
      3.2   Linear prediction analysis and quantization ...................    7
      3.3   Perceptual weighting ..........................................   14
      3.4   Open-loop pitch analysis ......................................   15
      3.5   Computation of the impulse response ...........................   16
      3.6   Computation of the target signal ..............................   16
      3.7   Adaptive-codebook search ......................................   17
      3.8   Fixed codebook – Structure and search .........................   19
      3.9   Quantization of the gains .....................................   22
      3.10  Memory update .................................................   24
4     Functional description of the decoder ...............................   25
      4.1   Parameter decoding procedure ..................................   25
      4.2   Post-processing ...............................................   28
      4.3   Encoder and decoder initialization ............................   30
      4.4   Concealment of frame erasures .................................   30
5     Bit-exact description of the CS-ACELP coder .........................   32
      5.1   Use of the simulation software ................................   32
      5.2   Organization of the simulation software .......................   32
`
`
`
`
`Recommendation G.729
`
`
`CODING OF SPEECH AT 8 kbit/s USING CONJUGATE-STRUCTURE
`ALGEBRAIC-CODE-EXCITED LINEAR-PREDICTION (CS-ACELP)
`
`(Geneva, 1996)
`
1     Introduction
`
`This Recommendation contains the description of an algorithm for the coding of speech signals at 8 kbit/s using
`Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP).
`
`This coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering
`(Recommendation G.712) of the analogue input signal, then sampling it at 8000 Hz, followed by conversion to 16-bit
`linear PCM for the input to the encoder. The output of the decoder should be converted back to an analogue signal by
`similar means. Other input/output characteristics, such as those specified by Recommendation G.711 for 64 kbit/s PCM
`data, should be converted to 16-bit linear PCM before encoding, or from 16-bit linear PCM to the appropriate format after
`decoding. The bitstream from the encoder to the decoder is defined within this Recommendation.
`
`This Recommendation is organized as follows: Clause 2 gives a general outline of the CS-ACELP algorithm. In clauses 3
`and 4, the CS-ACELP encoder and decoder principles are discussed, respectively. Clause 5 describes the software that
`defines this coder in 16 bit fixed-point arithmetic.
`
2     General description of the coder
`
`The CS-ACELP coder is based on the Code-Excited Linear-Prediction (CELP) coding model. The coder operates on
`speech frames of 10 ms corresponding to 80 samples at a sampling rate of 8000 samples per second. For every 10 ms
`frame, the speech signal is analysed to extract the parameters of the CELP model (linear-prediction filter coefficients,
`adaptive and fixed-codebook indices and gains). These parameters are encoded and transmitted. The bit allocation of the
`coder parameters is shown in Table 1. At the decoder, these parameters are used to retrieve the excitation and synthesis
`filter parameters. The speech is reconstructed by filtering this excitation through the short-term synthesis filter, as is
`shown in Figure 1. The short-term synthesis filter is based on a 10th order Linear Prediction (LP) filter. The long-term, or
`pitch synthesis filter is implemented using the so-called adaptive-codebook approach. After computing the reconstructed
`speech, it is further enhanced by a postfilter.
`
`TABLE 1/G.729
`
`Bit allocation of the 8 kbit/s CS-ACELP algorithm (10 ms frame)
`
Parameter                   Codeword          Subframe 1   Subframe 2   Total per frame

Line spectrum pairs         L0, L1, L2, L3                                     18
Adaptive-codebook delay     P1, P2                 8            5              13
Pitch-delay parity          P0                     1                            1
Fixed-codebook index        C1, C2                13           13              26
Fixed-codebook sign         S1, S2                 4            4               8
Codebook gains (stage 1)    GA1, GA2               3            3               6
Codebook gains (stage 2)    GB1, GB2               4            4               8

Total                                                                          80
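As an illustration only, the per-frame bit budget of Table 1 can be captured in a C bit-field struct. The field names and their order here are ours, not from the reference code; the normative bit ordering of the stream is defined by this Recommendation and the ANSI C code of clause 5. The 18 LSP bits split into L0 (1), L1 (7), L2 (5) and L3 (5), as described in 3.2.4.

```c
/* Illustrative layout of the 80-bit G.729 frame of Table 1.
 * Field names and ordering are ours; widths are the bit counts
 * of Table 1 (LSP split per 3.2.4). Not the normative bit order. */
struct g729_frame {
    unsigned L0  : 1;   /* switched MA predictor of the LSP quantizer */
    unsigned L1  : 7;   /* first-stage LSP codebook index             */
    unsigned L2  : 5;   /* second-stage LSP index (low part)          */
    unsigned L3  : 5;   /* second-stage LSP index (high part)         */
    unsigned P1  : 8;   /* pitch delay, subframe 1                    */
    unsigned P0  : 1;   /* pitch-delay parity                         */
    unsigned C1  : 13;  /* fixed-codebook index, subframe 1           */
    unsigned S1  : 4;   /* fixed-codebook signs, subframe 1           */
    unsigned GA1 : 3;   /* gain codebook stage 1, subframe 1          */
    unsigned GB1 : 4;   /* gain codebook stage 2, subframe 1          */
    unsigned P2  : 5;   /* pitch delay (differential), subframe 2     */
    unsigned C2  : 13;  /* fixed-codebook index, subframe 2           */
    unsigned S2  : 4;   /* fixed-codebook signs, subframe 2           */
    unsigned GA2 : 3;   /* gain codebook stage 1, subframe 2          */
    unsigned GB2 : 4;   /* gain codebook stage 2, subframe 2          */
};
/* Total: 18 + 13 + 1 + 26 + 8 + 6 + 8 = 80 bits per 10 ms frame. */
```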
`
`
`
`
[Figure: received bitstream → parameter decoding → excitation codebook → long-term synthesis filter → short-term synthesis filter → postfilter → output speech]

FIGURE 1/G.729
Block diagram of conceptual CELP synthesis model
`
2.1   Encoder
`
`The encoding principle is shown in Figure 2. The input signal is high-pass filtered and scaled in the pre-processing block.
`The pre-processed signal serves as the input signal for all subsequent analysis. LP analysis is done once per 10 ms frame
`to compute the LP filter coefficients. These coefficients are converted to Line Spectrum Pairs (LSP) and quantized using
`predictive two-stage Vector Quantization (VQ) with 18 bits. The excitation signal is chosen by using an analysis-by-
`synthesis search procedure in which the error between the original and reconstructed speech is minimized according to a
`perceptually weighted distortion measure. This is done by filtering the error signal with a perceptual weighting filter,
`whose coefficients are derived from the unquantized LP filter. The amount of perceptual weighting is made adaptive to
`improve the performance for input signals with a flat frequency-response.
`
`The excitation parameters (fixed and adaptive-codebook parameters) are determined per subframe of 5 ms (40 samples)
`each. The quantized and unquantized LP filter coefficients are used for the second subframe, while in the first subframe
`interpolated LP filter coefficients are used (both quantized and unquantized). An open-loop pitch delay is estimated once
`per 10 ms frame based on the perceptually weighted speech signal. Then the following operations are repeated for each
`subframe. The target signal x(n) is computed by filtering the LP residual through the weighted synthesis filter W(z)/Â(z).
`The initial states of these filters are updated by filtering the error between LP residual and excitation. This is equivalent to
`the common approach of subtracting the zero-input response of the weighted synthesis filter from the weighted speech
`signal. The impulse response h(n) of the weighted synthesis filter is computed. Closed-loop pitch analysis is then done (to
`find the adaptive-codebook delay and gain), using the target x(n) and impulse response h(n), by searching around the
`value of the open-loop pitch delay. A fractional pitch delay with 1/3 resolution is used. The pitch delay is encoded with
`8 bits in the first subframe and differentially encoded with 5 bits in the second subframe. The target signal x(n) is updated
by subtracting the (filtered) adaptive-codebook contribution, and this new target, x′(n), is used in the fixed-codebook
`search to find the optimum excitation. An algebraic codebook with 17 bits is used for the fixed-codebook excitation. The
`gains of the adaptive and fixed-codebook contributions are vector quantized with 7 bits, (with MA prediction applied to
`the fixed-codebook gain). Finally, the filter memories are updated using the determined excitation signal.
`
`
`
`
[Figure: input speech → pre-processing → LP analysis, quantization and interpolation (LPC info) → perceptual weighting → pitch analysis; synthesis filter driven by the adaptive codebook (gain GP) and fixed codebook (gain GC); fixed-codebook search → gain quantization → parameter encoding → transmitted bitstream]

FIGURE 2/G.729
Encoding principle of the CS-ACELP encoder
`
2.2   Decoder
`
The decoder principle is shown in Figure 3. First, the parameter indices are extracted from the received bitstream. These
`indices are decoded to obtain the coder parameters corresponding to a 10 ms speech frame. These parameters are the LSP
`coefficients, the two fractional pitch delays, the two fixed-codebook vectors, and the two sets of adaptive and fixed-
`codebook gains. The LSP coefficients are interpolated and converted to LP filter coefficients for each subframe. Then, for
`each 5 ms subframe the following steps are done:
`
•   the excitation is constructed by adding the adaptive and fixed-codebook vectors scaled by their respective gains;
•   the speech is reconstructed by filtering the excitation through the LP synthesis filter;
•   the reconstructed speech signal is passed through a post-processing stage, which includes an adaptive postfilter based on the long-term and short-term synthesis filters, followed by a high-pass filter and scaling operation.
`
`
`
`
[Figure: fixed codebook (gain GC) and adaptive codebook (gain GP) → short-term filter → post-processing]

FIGURE 3/G.729
Principle of the CS-ACELP decoder
`
2.3   Delay
`
`This coder encodes speech and other audio signals with 10 ms frames. In addition, there is a look-ahead of 5 ms, resulting
`in a total algorithmic delay of 15 ms. All additional delays in a practical implementation of this coder are due to:
`
•   processing time needed for encoding and decoding operations;
•   transmission time on the communication link;
•   multiplexing delay when combining audio data with other data.
`
2.4   Speech coder description
`
`The description of the speech coding algorithm of this Recommendation is made in terms of bit-exact, fixed-point
`mathematical operations. The ANSI C code indicated in clause 5, which constitutes an integral part of this
`Recommendation, reflects this bit-exact, fixed-point descriptive approach. The mathematical descriptions of the encoder
`(clause 3), and decoder (clause 4), can be implemented in several other fashions, possibly leading to a codec
`implementation not complying with this Recommendation. Therefore, the algorithm description of the ANSI C code of
`clause 5 shall take precedence over the mathematical descriptions of clauses 3 and 4 whenever discrepancies are found. A
non-exhaustive set of test signals, which can be used with the ANSI C code, is available from the ITU.
`
2.5   Notational conventions
`
Throughout this Recommendation, the following notational conventions are maintained:
•   Codebooks are denoted by calligraphic characters (e.g. 𝒞).
•   Time signals are denoted by their symbol and a sample index between parentheses [e.g. s(n)]. The symbol n is used as sample index.
•   Superscript indices between parentheses [e.g. g(m)] are used to indicate time-dependency of variables. The variable m refers, depending on the context, to either a frame or subframe index, and the variable n to a sample index.
•   Recursion indices are identified by a superscript between square brackets (e.g. E[k]).
•   Subscript indices identify a particular element in a coefficient array.
•   The symbol ^ identifies a quantized version of a parameter (e.g. ĝc).
•   Parameter ranges are given between square brackets and include the boundaries (e.g. [0.6, 0.9]).
`
`
`
`
•   The function log denotes a logarithm with base 10.
•   The function int denotes truncation to its integer value.
•   The decimal floating-point numbers used are rounded versions of the values used in the 16-bit fixed-point ANSI C implementation.
`
`Table 2 lists the most relevant symbols used throughout this Recommendation. A glossary of the most relevant signals is
`given in Table 3. Table 4 summarizes relevant variables and their dimension. Constant parameters are listed in Table 5.
`The acronyms used in this Recommendation are summarized in Table 6.
`
`TABLE 2/G.729
`
`Glossary of most relevant symbols
`
Name       Reference       Description

1/Â(z)     Equation (2)    LP synthesis filter
Hh1(z)     Equation (1)    Input high-pass filter
Hp(z)      Equation (78)   Long-term postfilter
Hf(z)      Equation (84)   Short-term postfilter
Ht(z)      Equation (86)   Tilt-compensation filter
Hh2(z)     Equation (91)   Output high-pass filter
P(z)       Equation (46)   Pre-filter for fixed codebook
W(z)       Equation (27)   Weighting filter
`
`TABLE 3/G.729
`
`Glossary of most relevant signals
`
Name       Reference   Description

c(n)       3.8         Fixed-codebook contribution
d(n)       3.8.1       Correlation between target signal and h(n)
ew(n)      3.10        Error signal
h(n)       3.5         Impulse response of weighting and synthesis filters
r(n)       3.6         Residual signal
s(n)       3.1         Pre-processed speech signal
ŝ(n)       4.1.6       Reconstructed speech signal
s′(n)      3.2.1       Windowed speech signal
sf(n)      4.2         Postfiltered output
sf′(n)     4.2         Gain-scaled postfiltered output
sw(n)      3.6         Weighted speech signal
x(n)       3.6         Target signal
x′(n)      3.8.1       Second target signal
u(n)       3.10        Excitation to LP synthesis filter
v(n)       3.7.1       Adaptive-codebook contribution
y(n)       3.7.3       Convolution v(n) * h(n)
z(n)       3.9         Convolution c(n) * h(n)
`
`
`
`
`TABLE 4/G.729
`
`Glossary of most relevant variables
`
Name       Size   Description

gp         1      Adaptive-codebook gain
gc         1      Fixed-codebook gain
gl         1      Gain term for long-term postfilter
gf         1      Gain term for short-term postfilter
gt         1      Gain term for tilt postfilter
G          1      Gain for gain normalization
Top        1      Open-loop pitch delay
ai         11     LP coefficients (a0 = 1.0)
ki         10     Reflection coefficients
k′1        1      Reflection coefficient for tilt postfilter
oi         2      LAR coefficients
ωi         10     LSF normalized frequencies
p̂i,j       40     MA predictor for LSF quantization
qi         10     LSP coefficients
r(k)       11     Auto-correlation coefficients
r′(k)      11     Modified auto-correlation coefficients
wi         10     LSP weighting coefficients
l̂i         10     LSP quantizer output
`
`TABLE 5/G.729
`
`Glossary of most relevant constants
`
Name       Value            Description

fs         8000             Sampling frequency
f0         60               Bandwidth expansion
γ1         0.94/0.98        Weight factor perceptual weighting filter
γ2         0.60/[0.4–0.7]   Weight factor perceptual weighting filter
γn         0.55             Weight factor postfilter
γd         0.70             Weight factor postfilter
γp         0.50             Weight factor pitch postfilter
γt         0.90/0.2         Weight factor tilt postfilter
𝒞          Table 7          Fixed (algebraic) codebook
𝓛0         3.2.4            Moving-average predictor codebook
𝓛1         3.2.4            First stage LSP codebook
𝓛2         3.2.4            Second stage LSP codebook (low part)
𝓛3         3.2.4            Second stage LSP codebook (high part)
𝓖A         3.9              Gain codebook (first stage)
𝓖B         3.9              Gain codebook (second stage)
wlag       Equation (6)     Correlation lag window
wlp        Equation (3)     LP analysis window
`
`
`
`TABLE 6/G.729
`
`Glossary of acronyms
`
Acronym     Description

CELP        Code-Excited Linear-Prediction
CS-ACELP    Conjugate-Structure Algebraic-CELP
MA          Moving Average
MSB         Most Significant Bit
MSE         Mean-Squared Error
LAR         Log Area Ratio
LP          Linear Prediction
LSP         Line Spectral Pair
LSF         Line Spectral Frequency
VQ          Vector Quantization
`
3     Functional description of the encoder
`
`In this clause the different functions of the encoder represented in the blocks of Figure 2 are described. A detailed signal
`flow is shown in Figure 4.
`
3.1   Pre-processing
`
`As stated in clause 2, the input to the speech encoder is assumed to be a 16 bit PCM signal. Two pre-processing functions
`are applied before the encoding process:
1)  signal scaling; and
2)  high-pass filtering.
`
`The scaling consists of dividing the input by a factor 2 to reduce the possibility of overflows in the fixed-point
`implementation. The high-pass filter serves as a precaution against undesired low-frequency components. A second order
`pole/zero filter with a cut-off frequency of 140 Hz is used. Both the scaling and high-pass filtering are combined by
`dividing the coefficients at the numerator of this filter by 2. The resulting filter is given by:
        Hh1(z) = (0.46363718 − 0.92724705 z^−1 + 0.46363718 z^−2) / (1 − 1.9059465 z^−1 + 0.9114024 z^−2)        (1)
`
`The input signal filtered through Hh1(z) is referred to as s(n), and will be used in all subsequent coder operations.
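The combined scaling and high-pass filtering of Equation (1) can be sketched as a second-order direct-form II transposed filter. This is a floating-point illustration only; the normative implementation is the 16-bit fixed-point ANSI C code of clause 5, and the function name here is ours.

```c
/* Floating-point sketch of the pre-processing filter Hh1(z) of
 * Equation (1): input scaling by 1/2 folded into the numerator,
 * plus a 140 Hz high-pass. Illustration only; clause 5 is normative. */
static const double b[3] = { 0.46363718, -0.92724705, 0.46363718 };
static const double a[3] = { 1.0, -1.9059465, 0.9114024 };

void preprocess(const double *in, double *out, int n, double mem[2])
{
    /* Direct form II transposed; mem[] carries the state across calls. */
    for (int i = 0; i < n; i++) {
        double y = b[0] * in[i] + mem[0];
        mem[0] = b[1] * in[i] - a[1] * y + mem[1];
        mem[1] = b[2] * in[i] - a[2] * y;
        out[i] = y;
    }
}
```

A constant (DC) input is strongly attenuated, as expected of a high-pass filter, while the first output sample equals the input scaled by b[0].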
`
3.2   Linear prediction analysis and quantization
`
`The short-term analysis and synthesis filters are based on 10th order Linear Prediction (LP) filters.
`
`The LP synthesis filter is defined as:
`
        1/Â(z) = 1 / [1 + Σ(i=1..10) âi z^−i]        (2)
`
`where âi, i = 1,...,10, are the (quantized) Linear Prediction (LP) coefficients. Short-term prediction, or linear prediction
`analysis is performed once per speech frame using the autocorrelation method with a 30 ms asymmetric window. Every
`80 samples (10 ms), the autocorrelation coefficients of windowed speech are computed and converted to the LP
`coefficients using the Levinson algorithm. Then the LP coefficients are transformed to the LSP domain for quantization
`and interpolation purposes. The interpolated quantized and unquantized filters are converted back to the LP filter
`coefficients (to construct the synthesis and weighting filters for each subframe).
`
`
`
`
[Figure: per frame: pre-processing (high-pass filter and down-scaling, 3.1); LP analysis (windowing, autocorrelations, Levinson-Durbin, 3.2.1;2); A(z) to LSP conversion (3.2.3); LSP quantization producing L0, L1, L2, L3 (3.2.4); interpolation and conversion back to A(z) and Â(z) (3.2.5;6); perceptual weighting (3.3); open-loop pitch search (3.4). Per subframe: computation of the impulse response h(n) (3.5) and the target signals x(n), x′(n) (3.6, 3.7.1); closed-loop pitch search producing P0, P1, P2 and gain gp (3.7); algebraic fixed-codebook search with pitch prefilter P(z) producing C1, S1, C2, S2 (3.8); MA code-gain prediction (3.9.1) and conjugate-structure gain VQ producing GA1, GB1, GA2, GB2 (3.9); computation of the excitation and filter-state update (3.10).]

FIGURE 4/G.729
Signal flow at the CS-ACELP encoder
`
`
`3.2.1 Windowing and autocorrelation computation
`
`The LP analysis window consists of two parts: the first part is half a Hamming window and the second part is a quarter of
`a cosine function cycle. The window is given by:
`
        wlp(n) = 0.54 − 0.46 cos(2πn/399),        n = 0,...,199
        wlp(n) = cos(2π(n − 200)/159),            n = 200,...,239        (3)
`
`There is a 5 ms lookahead in the LP analysis which means that 40 samples are needed from the future speech frame. This
`translates into an extra algorithmic delay of 5 ms at the encoder stage. The LP analysis window applies to 120 samples
`from past speech frames, 80 samples from the present speech frame, and 40 samples from the future frame. The
`windowing procedure is illustrated in Figure 5.
`
[Figure: alignment of the LP analysis windows with the subframes.]

FIGURE 5/G.729
Windowing procedure in LP analysis

The different shading patterns identify corresponding excitation and LP analysis windows.
`
The windowed speech:

        s′(n) = wlp(n) s(n),        n = 0,...,239        (4)

is used to compute the autocorrelation coefficients:

        r(k) = Σ(n=k..239) s′(n) s′(n − k),        k = 0,...,10        (5)
`
`To avoid arithmetic problems for low-level input signals the value of r(0) has a lower boundary of r(0) = 1.0. A 60 Hz
`bandwidth expansion is applied, by multiplying the autocorrelation coefficients with:
`
        wlag(k) = exp[ −(1/2) (2π f0 k / fs)² ],        k = 1,...,10        (6)
`
`
`
`
`where f0 = 60 Hz is the bandwidth expansion and fs = 8000 Hz is the sampling frequency. Furthermore, r(0) is multiplied
`by a white-noise correction factor 1.0001, which is equivalent to adding a noise floor at - 40 dB. The modified
`autocorrelation coefficients are given by:
`
        r′(0) = 1.0001 r(0)
        r′(k) = wlag(k) r(k),        k = 1,...,10        (7)
`
3.2.2 Levinson-Durbin algorithm

The modified autocorrelation coefficients r′(k) are used to obtain the LP filter coefficients ai, i = 1,...,10, by solving the set of equations:

        Σ(i=1..10) ai r′(|i − k|) = −r′(k),        k = 1,...,10        (8)
`
`The set of equations in (8) is solved using the Levinson-Durbin algorithm. This algorithm uses the following recursion:
`
        E[0] = r′(0)
        for i = 1 to 10
            a0[i−1] = 1
            ki = −[ Σ(j=0..i−1) aj[i−1] r′(i − j) ] / E[i−1]
            ai[i] = ki
            for j = 1 to i − 1
                aj[i] = aj[i−1] + ki ai−j[i−1]
            end
            E[i] = (1 − ki²) E[i−1]
        end

The final solution is given as aj = aj[10], j = 0,...,10, with a0 = 1.0.
`
3.2.3 LP to LSP conversion
`
`The LP filter coefficients ai, i = 0,...10 are converted to Line Spectral Pair (LSP) coefficients for quantization and
`interpolation purposes. For a 10th order LP filter, the LSP coefficients are defined as the roots of the sum and difference
`polynomials:
`
        F1′(z) = A(z) + z^−11 A(z^−1)        (9)

and:

        F2′(z) = A(z) − z^−11 A(z^−1)        (10)
`
`
`
`
respectively. The polynomial F1′(z) is symmetric, and F2′(z) is antisymmetric. It can be proven that all roots of these polynomials are on the unit circle and that they alternate each other. F1′(z) has a root z = −1 (ω = π) and F2′(z) has a root z = 1 (ω = 0). These two roots are eliminated by defining the new polynomials:

        F1(z) = F1′(z) / (1 + z^−1)        (11)

and:

        F2(z) = F2′(z) / (1 − z^−1)        (12)

Each polynomial has five conjugate roots on the unit circle (e^±jωi), and they can be written as:

        F1(z) = Π(i=1,3,...,9) (1 − 2 qi z^−1 + z^−2)        (13)

and:

        F2(z) = Π(i=2,4,...,10) (1 − 2 qi z^−1 + z^−2)        (14)
`
where qi = cos(ωi). The coefficients ωi are the Line Spectral Frequencies (LSF) and they satisfy the ordering property 0 < ω1 < ω2 < ... < ω10 < π. The coefficients qi are referred to as the LSP coefficients in the cosine domain.

Since both polynomials F1(z) and F2(z) are symmetric, only the first five coefficients of each polynomial need to be computed. The coefficients of these polynomials are found by the recursive relations:

        f1(i + 1) = ai+1 + a10−i − f1(i),        i = 0,...,4
        f2(i + 1) = ai+1 − a10−i + f2(i),        i = 0,...,4        (15)

where f1(0) = f2(0) = 1.0. The LSP coefficients are found by evaluating the polynomials F1(z) and F2(z) at 60 points equally spaced between 0 and π and checking for sign changes. A sign change signifies the existence of a root, and the sign-change interval is then divided four times to allow better tracking of the root. The Chebyshev polynomials are used to evaluate F1(z) and F2(z). In this method the roots are found directly in the cosine domain. The polynomials F1(z) or F2(z), evaluated at z = e^jω, can be written as:

        F(ω) = 2 e^−j5ω C(x)        (16)

with:

        C(x) = T5(x) + f(1) T4(x) + f(2) T3(x) + f(3) T2(x) + f(4) T1(x) + f(5)/2        (17)

where Tm(x) = cos(mω) is the mth order Chebyshev polynomial, and f(i), i = 1,...,5, are the coefficients of either F1(z) or F2(z), computed using Equation (15). The polynomial C(x) is evaluated at a certain value of x = cos(ω) using the recursive relation:

        for k = 4 down to 1
            bk = 2x bk+1 − bk+2 + f(5 − k)
        end
        C(x) = x b1 − b2 + f(5)/2

with initial values b5 = 1 and b6 = 0.
`
`
3.2.4 Quantization of the LSP coefficients

The LSP coefficients qi are quantized using the LSF representation ωi in the normalized frequency domain [0, π]; that is:

        ωi = arccos(qi),        i = 1,...,10        (18)
`
`A switched 4th order MA prediction is used to predict the LSF coefficients of the current frame. The difference between
`the computed and predicted coefficients is quantized using a two-stage vector quantizer. The first stage is a
`10-dimensional VQ using codebook L1 with 128 entries (7 bits). The second stage is a 10 bit VQ which has been
`implemented as a split VQ using two 5-dimensional codebooks, L2 and L3 containing 32 entries (5 bits) each.
`
`To explain the quantization process, it is convenient to first describe the decoding process. Each coefficient is obtained
`from the sum of two codebooks:
`
        l̂i = L1i(L1) + L2i(L2),          i = 1,...,5
        l̂i = L1i(L1) + L3i−5(L3),        i = 6,...,10        (19)
`
where L1, L2 and L3 are the codebook indices. To avoid sharp resonances in the quantized LP synthesis filter, the coefficients l̂i are arranged such that adjacent coefficients have a minimum distance of J. The rearrangement routine is shown below:

        for i = 2,...,10
            if (l̂i−1 > l̂i − J)
                l̂i−1 = (l̂i + l̂i−1 − J)/2
                l̂i   = (l̂i + l̂i−1 + J)/2
            end
        end
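The rearrangement routine can be sketched as follows; both assignments use the values of l̂i and l̂i−1 from before the update, so a violating pair is re-centred on its mean with spacing exactly J. This is a floating-point illustration only, with our function name.

```c
/* Sketch of the LSF rearrangement routine of 3.2.4: enforce a minimum
 * distance J between adjacent coefficients (0-based array, so index
 * i here corresponds to l-hat_{i+1} in the text). */
void rearrange(double lsf[10], double J)
{
    for (int i = 1; i < 10; i++) {
        if (lsf[i - 1] > lsf[i] - J) {
            /* compute both from the pre-update values */
            double lo = (lsf[i] + lsf[i - 1] - J) / 2.0;
            double hi = (lsf[i] + lsf[i - 1] + J) / 2.0;
            lsf[i - 1] = lo;
            lsf[i]     = hi;
        }
    }
}
```

As the text notes, the routine is applied twice, first with J = 0.0012 and then with J = 0.0006.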
`
This rearrangement process is done twice: first with a value of J = 0.0012, then with a value of J = 0.0006. After this rearrangement process, the quantized LSF coefficients ω̂i(m) for the current frame m are obtained from the weighted sum of previous quantizer outputs l̂i(m−k) and the current quantizer output l̂i(m):

        ω̂i(m) = [1 − Σ(k=1..4) p̂i,k] l̂i(m) + Σ(k=1..4) p̂i,k l̂i(m−k),        i = 1,...,10        (20)

where p̂i,k are the coefficients of the switched MA predictor. Which MA predictor to use is defined by a separate bit L0. At start-up the initial values of l̂i(k) are given by l̂i = iπ/11 for all k < 0.
`
After computing ω̂i, the corresponding filter is checked for stability. This is done as follows:
1)  order the coefficients ω̂i in increasing value;
2)  if ω̂1 < 0.005 then ω̂1 = 0.005;
3)  if ω̂i+1 − ω̂i < 0.0391 then ω̂i+1 = ω̂i + 0.0391, i = 1,...,9;
4)  if ω̂10 > 3.135 then ω̂10 = 3.135.
`
`
`The procedure for encoding the LSF parameters can be outlined as follows. For each of the two MA predictors the best
`approximation to the current LSF coefficients has to be found. The best approximation is defined as the one that
`minimizes the weighted mean-squared error:
`
        Elsf = Σ(i=1..10) wi (ωi − ω̂i)²        (21)
`
`The weights wi are made adaptive as a function of the unquantized LSF coefficients,
`
        w1  = 1.0                              if (ω2 − 0.04π − 1) > 0
        w1  = 10 (ω2 − 0.04π − 1)² + 1         otherwise

        wi  = 1.0                              if (ωi+1 − ωi−1 − 1) > 0,        2 ≤ i ≤ 9
        wi  = 10 (ωi+1 − ωi−1 − 1)² + 1        otherwise

        w10 = 1.0                              if (−ω9 + 0.92π − 1) > 0
        w10 = 10 (−ω9 + 0.92π − 1)² + 1        otherwise        (22)
`In addition, the weights w5 and w6 are multiplied by 1.2 each.
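The weight computation of Equation (22), including the extra factor 1.2 on w5 and w6, can be sketched as follows (floating point, illustration only; the function names are ours):

```c
#include <math.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Equation (22): d is the spacing term minus 1; narrow spacings get
 * a weight larger than 1. */
static double wfun(double d)
{
    return d > 0.0 ? 1.0 : 10.0 * d * d + 1.0;
}

/* w[0..9] holds the unquantized LSFs omega_1..omega_10; out[0..9]
 * receives the weights w_1..w_10, with the 1.2 factor on w5, w6. */
void lsf_weights(const double w[10], double out[10])
{
    out[0] = wfun(w[1] - 0.04 * M_PI - 1.0);
    for (int i = 1; i < 9; i++)                /* 2 <= i <= 9 in the text */
        out[i] = wfun(w[i + 1] - w[i - 1] - 1.0);
    out[9] = wfun(-w[8] + 0.92 * M_PI - 1.0);
    out[4] *= 1.2;                             /* w5 */
    out[5] *= 1.2;                             /* w6 */
}
```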
`
The vector to be quantized for the current frame m is obtained from:

        li(m) = [ ωi − Σ(k=1..4) p̂i,k l̂i(m−k) ] / [1 − Σ(k=1..4) p̂i,k],        i = 1,...,10        (23)
`
`The first codebook L1 is searched and the entry L1 that minimizes the (unweighted) mean-squared error is selected. This
`is followed by a search of the second codebook L2, which defines the lower part of the second stage. For each possible
`candidate, the partial vector w^
`i, i = 1,...,5, is reconstructed using Equation (20), and rearranged to guarantee a minimum
`distance of 0.0012. The weighted MSE of Equation (21) is computed, and the vector L2 which re