`
`THE INTERNATIONAL
`TELEGRAPH AND TELEPHONE
CONSULTATIVE COMMITTEE
`
`G.728
`(09/92)
`
`GENERAL ASPECTS OF DIGITAL
`TRANSMISSION SYSTEMS;
`
`TERMINAL EQUIPMENTS
`
`CODING OF SPEECH AT 16 kbit/s
`USING LOW-DELAY CODE EXCITED
`LINEAR PREDICTION
`
Recommendation G.728
`
`Geneva, 1992
`
`
`
ZTE EXHIBIT 1015
`
`Page 1 of 65
`
`
`
`FOREWORD
`
`The CCITT (the International Telegraph and Telephone Consultative Committee) is a permanent organ of the
`International Telecommunication Union (ITU). CCITT is responsible for studying technical, operating and tariff
`questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide
`basis.
`
The Plenary Assembly of CCITT, which meets every four years, establishes the topics for study and approves
`Recommendations prepared by its Study Groups. The approval of Recommendations by the members of CCITT between
`Plenary Assemblies is covered by the procedure laid down in CCITT Resolution No. 2 (Melbourne, 1988).
`
Recommendation G.728 was prepared by Study Group XV and was approved under the Resolution No. 2
`procedure on the 1st of September 1992.
`
`___________________
`
CCITT NOTES

1) In this Recommendation, the expression “Administration” is used for conciseness to indicate both a
telecommunication administration and a recognized private operating agency.

2) A list of abbreviations used in this Recommendation can be found in Annex F.
`
`All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or
`mechanical, including photocopying and microfilm, without permission in writing from the ITU.
`
© ITU 1992
`
`
Recommendation G.728

CODING OF SPEECH AT 16 kbit/s USING LOW-DELAY
CODE EXCITED LINEAR PREDICTION

(1992)

1

Introduction
`
`This Recommendation contains the description of an algorithm for the coding of speech signals at 16 kbit/s
`using low-delay code excited linear prediction (LD-CELP). This Recommendation is organized as follows.
`
`In § 2 a brief outline of the LD-CELP algorithm is given. In §§ 3 and 4, the LD-CELP encoder and LD-CELP
`decoder principles are discussed, respectively. In § 5, the computational details pertaining to each functional algorithmic
`block are defined. Annexes A, B, C and D contain tables of constants used by the LD-CELP algorithm. In Annex E the
`sequencing of variable adaptation and use is given. Finally, in Appendix I information is given on procedures applicable
`to the implementation verification of the algorithm.
`
`Under further study is the future incorporation of three additional appendices (to be published separately)
`consisting of LD-CELP network aspects, LD-CELP fixed-point implementation description, and LD-CELP fixed-point
`verification procedures.
`
`2
`
`Outline of LD-CELP
`
`The LD-CELP algorithm consists of an encoder and a decoder described in §§ 2.1 and 2.2 respectively, and
`illustrated in Figure 1/G.728.
`
`The essence of CELP techniques, which is an analysis-by-synthesis approach to codebook search, is retained
in LD-CELP. LD-CELP, however, uses backward adaptation of predictors and gain to achieve an algorithmic delay
`of 0.625 ms. Only the index to the excitation codebook is transmitted. The predictor coefficients are updated through
`LPC analysis of previously quantized speech. The excitation gain is updated by using the gain information embedded in
`the previously quantized excitation. The block size for the excitation vector and gain adaptation is five samples only. A
`perceptual weighting filter is updated using LPC analysis of the unquantized speech.
`
`2.1
`
`LD-CELP encoder
`
After the conversion from A-law or μ-law PCM to uniform PCM, the input signal is partitioned into blocks of
five consecutive input signal samples. For each input block, the encoder passes each of 1024 candidate codebook
`vectors (stored in an excitation codebook) through a gain scaling unit and a synthesis filter. From the resulting 1024
`candidate quantized signal vectors, the encoder identifies the one that minimizes a frequency-weighted mean-squared
`error measure with respect to the input signal vector. The 10-bit codebook index of the corresponding best codebook
`vector (or “codevector”), which gives rise to that best candidate quantized signal vector, is transmitted to the decoder.
`The best codevector is then passed through the gain scaling unit and the synthesis filter to establish the correct filter
`memory in preparation for the encoding of the next signal vector. The synthesis filter coefficients and the gain are
`updated periodically in a backward adaptive manner based on the previously quantized signal and gain-scaled excitation.
`
`Recommendation G.728 (09/92)
`
`
`
`
FIGURE 1/G.728

Simplified block diagram of LD-CELP coder

[Block diagram not reproduced. Elements legible in the extraction for panel a), LD-CELP encoder: 64 kbit/s A-law or
μ-law PCM input, conversion to uniform PCM, vector buffer, excitation VQ codebook, synthesis filter, perceptual
weighting filter, minimum MSE selection, backward gain adaptation and backward predictor adaptation, with the VQ index
as the 16 kbit/s output. For panel b), LD-CELP decoder: VQ index as the 16 kbit/s input, excitation VQ codebook,
postfilter, backward predictor adaptation and conversion to 64 kbit/s A-law or μ-law PCM output.]
`
`2.2
`
`LD-CELP decoder
`
The decoding operation is also performed on a block-by-block basis. Upon receiving each 10-bit index, the
decoder performs a table look-up to extract the corresponding codevector from the excitation codebook. The extracted
codevector is then passed through a gain scaling unit and a synthesis filter to produce the current decoded signal vector.
The synthesis filter coefficients and the gain are then updated in the same way as in the encoder. The decoded signal
vector is then passed through an adaptive postfilter to enhance the perceptual quality. The postfilter coefficients are
updated periodically using the information available at the decoder. The five samples of the postfilter signal vector are
next converted to five A-law or μ-law PCM output samples.
`
`
`
`
FIGURE 2/G.728

LD-CELP encoder block schematic

[Block schematic not reproduced. Elements legible in the extraction: 64 kbit/s A-law or μ-law PCM input speech
$s_o(k)$; input PCM format conversion giving the linear PCM input speech $s_u(k)$; vector buffer giving the input speech
vector $s(n)$; adapter for perceptual weighting filter, $W(z)$; perceptual weighting filters, with weighted speech vector
$v(n)$; synthesis filters; VQ target vector computation, with $r(n)$ and target $x(n)$; a simulated decoder (block 8)
containing the excitation VQ codebook, gain, synthesis filter, backward vector gain adapter with gain $\sigma(n)$ and
backward synthesis filter adapter, with excitation $e(n)$ and quantized speech $s_q(n)$; and a codebook search module
containing the impulse response vector calculator, time-reversed convolution module, shape codevector convolution
module, energy table calculator, VQ target vector normalization, error calculator and best codebook index selector. The
best codebook index is sent to the communication channel.]
`
`
`
`
`
`
FIGURE 3/G.728

LD-CELP decoder block schematic

[Block schematic not reproduced. Elements legible in the extraction: codebook index from the communication channel;
excitation VQ codebook; gain; synthesis filter, producing the decoded speech; backward vector gain adapter; backward
synthesis filter adapter; postfilter; postfilter adapter, which receives the 10th-order LPC predictor coefficients and
the first reflection coefficient; and output PCM format conversion to 64 kbit/s A-law or μ-law PCM output speech.]
`
`3
`
LD-CELP encoder principles

Figure 2/G.728 is a detailed block schematic of the LD-CELP encoder. The encoder in Figure 2/G.728 is
mathematically equivalent to the encoder previously shown in Figure 1/G.728 but is computationally more efficient to
implement.
`
In the following description:

a) for each variable to be described, k is the sampling index and samples are taken at 125 μs intervals;

b) a group of five consecutive samples in a given signal is called a vector of that signal. For example, five
consecutive speech samples form a speech vector, five excitation samples form an excitation vector, and
so on;

c) we use n to denote the vector index, which is different from the sample index k;

d) four consecutive vectors build one adaptation cycle. In a later section, we also refer to adaptation cycles
as frames. The two terms are used interchangeably.
`
The excitation vector quantization (VQ) codebook index is the only information explicitly transmitted from the
encoder to the decoder. Three other types of parameters will be periodically updated: the excitation gain, the synthesis
filter coefficients, and the perceptual weighting filter coefficients. These parameters are derived in a backward adaptive
manner from signals that occur prior to the current signal vector. The excitation gain is updated once per vector, while
the synthesis filter coefficients and the perceptual weighting filter coefficients are updated once every four vectors (i.e. a
20-sample, or 2.5 ms update period). Note that, although the processing sequence in the algorithm has an adaptation
cycle of four vectors (20 samples), the basic buffer size is still only one vector (five samples). This small buffer size
makes it possible to achieve a one-way delay less than 2 ms.
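
The timing figures quoted above follow from a few constants given in the text. The short Python sketch below is only an arithmetic check; the constant and function names are illustrative and are not part of the Recommendation.

```python
# Arithmetic check of the timing figures in the text.
# Names are illustrative; the numeric inputs are taken from the description above.
SAMPLE_INTERVAL_US = 125   # one sample every 125 microseconds
SAMPLES_PER_VECTOR = 5     # five samples form one vector
VECTORS_PER_CYCLE = 4      # four vectors form one adaptation cycle
BITS_PER_VECTOR = 10       # one 10-bit codebook index per vector


def vector_duration_ms():
    """Duration of one vector, which is also the basic buffer size."""
    return SAMPLES_PER_VECTOR * SAMPLE_INTERVAL_US / 1000.0


def cycle_duration_ms():
    """Duration of one adaptation cycle (filter update period)."""
    return VECTORS_PER_CYCLE * vector_duration_ms()


def bit_rate_kbit_s():
    """Transmitted bit rate when only the codebook index is sent."""
    vectors_per_second = 1_000_000 / (SAMPLES_PER_VECTOR * SAMPLE_INTERVAL_US)
    return BITS_PER_VECTOR * vectors_per_second / 1000.0
```

With these inputs the vector duration is 0.625 ms, the adaptation cycle is 2.5 ms, and the bit rate is 16 kbit/s.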
`
`A description of each block of the encoder is given below. Since the LD-CELP coder is mainly used for
`encoding speech, for convenience of description, in the following we will assume that the input signal is speech,
`although in practice it can be other non-speech signals as well.
`
`
`
`
`3.1
`
`Input PCM format conversion
`
This block converts the input A-law or μ-law PCM signal $s_o(k)$ to a uniform PCM signal $s_u(k)$.
`
`3.1.1
`
`Internal linear PCM levels
`
In converting from A-law or μ-law to linear PCM, different internal representations are possible, depending on
the device. For example, standard tables for μ-law PCM define a linear range of –4 015.5 to +4 015.5. The
`corresponding range for A-law PCM is –2 016 to +2 016. Both tables list some output values having a fractional part of
`0.5. These fractional parts cannot be represented in an integer device unless the entire table is multiplied by 2 to make all
`of the values integers. In fact, this is what is most commonly done in fixed point digital signal processing (DSP) chips.
`On the other hand, floating point DSP chips can represent the same values listed in the tables. Throughout this document
it is assumed that the input signal has a maximum range of –4 095 to +4 095. This encompasses both the μ-law and
A-law cases. In the case of A-law it implies that when the linear conversion results in a range of –2 016 to +2 016, those
values should be scaled up by a factor of 2 before continuing to encode the signal. In the case of μ-law input to a fixed
`point processor where the input range is converted to –8 031 to +8 031, it implies that values should be scaled down by a
`factor of 2 before beginning the encoding process. Alternatively, these values can be treated as being in Q1 format,
`meaning there is one bit to the right of the decimal point. All computation involving the data would then need to take
`this bit into account.
`
`For the case of 16-bit linear PCM input signals having full dynamic range of –32 768 to +32 767, the input
`values should be considered to be in Q3 format. This means that the input values should be scaled down (divided) by a
`factor of 8. On output at the decoder the factor of 8 would be restored for these signals.
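
The scaling rules in this subsection can be summarized in a small sketch. The Python below is a hedged illustration only: the function names and the source labels (`alaw_2016`, `mulaw_8031`, `mulaw_4015_5`, `linear16`) are hypothetical names invented for this example, and only the scale factors come from the text.

```python
def to_internal_range(samples, source):
    """Scale linear PCM samples to the nominal -4095..+4095 internal range.

    The source labels are hypothetical names used only in this sketch.
    """
    scale = {
        "alaw_2016": 2.0,     # A-law table values in -2016..+2016: scale up by 2
        "mulaw_8031": 0.5,    # mu-law integers in -8031..+8031: scale down by 2
        "mulaw_4015_5": 1.0,  # mu-law table values in -4015.5..+4015.5: already in range
        "linear16": 0.125,    # 16-bit linear PCM treated as Q3: divide by 8
    }[source]
    return [x * scale for x in samples]


def from_internal_linear16(samples):
    """Restore the factor of 8 for 16-bit linear PCM output at the decoder."""
    return [x * 8.0 for x in samples]
```

For example, an A-law sample of 2016 maps to 4032, and a fixed-point μ-law sample of 8031 maps to 4015.5.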
`
`3.2
`
`Vector buffer
`
This block buffers five consecutive speech samples $s_u(5n), s_u(5n+1), \ldots, s_u(5n+4)$ to form a 5-dimensional
speech vector $s(n) = [s_u(5n), s_u(5n+1), \ldots, s_u(5n+4)]$.
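
A minimal Python sketch of this buffering step is shown below. The function name `speech_vectors` is illustrative, and holding back a trailing partial block is a choice made for the sketch rather than something the text specifies.

```python
def speech_vectors(su, dim=5):
    """Yield (n, s(n)) with s(n) = [su(5n), ..., su(5n + 4)].

    Samples left over after the last complete block are not emitted.
    """
    for n in range(len(su) // dim):
        yield n, su[dim * n: dim * n + dim]
```

Calling `list(speech_vectors(list(range(12))))` produces two vectors and leaves the last two samples unbuffered.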
`
`3.3
`
`Adapter for perceptual weighting filter
`
`Figure 4/G.728 shows the detailed operation of the perceptual weighting filter adapter (block 3 in
`Figure 2/G.728). This adapter calculates the coefficients of the perceptual weighting filter once every four speech
`vectors based on linear prediction analysis (often referred to as LPC analysis) of unquantized speech. The coefficient
`updates occur at the third speech vector of every 4-vector adaptation cycle. The coefficients are held constant in between
`updates.
`
`Refer to Figure 4a)/G.728. The calculation is performed as follows. First, the input (unquantized) speech
`vector is passed through a hybrid windowing module (block 36) which places a window on previous speech vectors and
`calculates the first 11 autocorrelation coefficients of the windowed speech signal as the output. The Levinson-Durbin
`recursion module (block 37) then converts these autocorrelation coefficients to predictor coefficients. Based on these
`predictor coefficients, the weighting filter coefficient calculator (block 38) derives the desired coefficients of the
`weighting filter. These three blocks are discussed in more detail below.
`
`First, let us describe the principles of hybrid windowing. Since this hybrid windowing technique will be used
`in three different kinds of LPC analyses, we first give a more general description of the technique and then specialize it
to different cases. Suppose the LPC analysis is to be performed once every L signal samples. To be general, assume that
the signal samples corresponding to the current LD-CELP adaptation cycle are $s_u(m), s_u(m+1), s_u(m+2), \ldots,
s_u(m+L-1)$. Then, for backward-adaptive LPC analysis, the hybrid window is applied to all previous signal samples with a
sample index less than m (as shown in Figure 4b)/G.728). Let there be N non-recursive samples in the hybrid window
function. Then, the signal samples $s_u(m-1), s_u(m-2), \ldots, s_u(m-N)$ are all weighted by the non-recursive portion of the
window. Starting with $s_u(m-N-1)$, all signal samples to the left of (and including) this sample are weighted by the
recursive portion of the window, which has values $b, b\alpha, b\alpha^2, \ldots$, where $0 < b < 1$ and $0 < \alpha < 1$.
`
`
`
`
FIGURE 4a)/G.728

Perceptual weighting filter adapter

[Block diagram not reproduced. The adapter (block 3) takes the input speech and passes it through the hybrid windowing
module (block 36), the Levinson-Durbin recursion module (block 37) and the weighting filter coefficient calculator
(block 38) to produce the perceptual weighting filter coefficients.]
`
FIGURE 4b)/G.728

Illustration of a hybrid window

[Plot not reproduced. The window function $w_m(n)$ is drawn against time, with a recursive portion taking the values
$b, b\alpha, b\alpha^2, \ldots$ followed by a non-recursive portion. Time-axis labels legible in the extraction:
$m-N-1$, $m+L-1$ and $m+2L-1$, with the current frame and the next frame marked.]
`
`
`
`
At time m, the hybrid window function $w_m(k)$ is defined as

$$
w_m(k) =
\begin{cases}
f_m(k) = b\,\alpha^{-[k-(m-N-1)]}, & \text{if } k \le m-N-1 \\
g_m(k) = -\sin[c(k-m)], & \text{if } m-N \le k \le m-1 \\
0, & \text{if } k \ge m
\end{cases}
\tag{3-1a}
$$

and the window-weighted signal is

$$
s_m(k) = s_u(k)\,w_m(k) =
\begin{cases}
s_u(k)\,f_m(k) = s_u(k)\,b\,\alpha^{-[k-(m-N-1)]}, & \text{if } k \le m-N-1 \\
s_u(k)\,g_m(k) = -s_u(k)\sin[c(k-m)], & \text{if } m-N \le k \le m-1 \\
0, & \text{if } k \ge m
\end{cases}
\tag{3-1b}
$$
`
`The samples of non-recursive portion gm(k) and the initial section of the recursive portion fm(k) for different
`hybrid windows are specified in Annex A. For an M-th order LPC analysis, we need to calculate M + 1 autocorrelation
`coefficients Rm(i) for i = 0, 1, 2, ..., M. The i-th autocorrelation coefficient for the current adaptation cycle can be
`expressed as
`
$$
R_m(i) = \sum_{k=-\infty}^{m-1} s_m(k)\,s_m(k-i) = r_m(i) + \sum_{k=m-N}^{m-1} s_m(k)\,s_m(k-i)
\tag{3-1c}
$$

where

$$
r_m(i) = \sum_{k=-\infty}^{m-N-1} s_m(k)\,s_m(k-i) = \sum_{k=-\infty}^{m-N-1} s_u(k)\,s_u(k-i)\,f_m(k)\,f_m(k-i)
\tag{3-1d}
$$
`
`On the right-hand side of equation (3-1c), the first term rm(i) is the “recursive component” of Rm(i), while the
`second term is the “non-recursive component”. The finite summation of the non-recursive component is calculated for
`each adaptation cycle. On the other hand, the recursive component is calculated recursively. The following paragraphs
`explain how.
`
`Suppose we have calculated and stored all rm(i)s for the current adaptation cycle and want to go on to the next
`adaptation cycle, which starts at sample su(m + L). After the hybrid window is shifted to the right by L samples, the new
`window-weighted signal for the next adaptation cycle becomes
`
$$
s_{m+L}(k) = s_u(k)\,w_{m+L}(k) =
\begin{cases}
s_u(k)\,f_{m+L}(k) = s_u(k)\,f_m(k)\,\alpha^{L}, & \text{if } k \le m+L-N-1 \\
s_u(k)\,g_{m+L}(k) = -s_u(k)\sin[c(k-m-L)], & \text{if } m+L-N \le k \le m+L-1 \\
0, & \text{if } k \ge m+L
\end{cases}
\tag{3-1e}
$$
`
`The recursive component of Rm + L(i) can be written as
`
$$
\begin{aligned}
r_{m+L}(i) &= \sum_{k=-\infty}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i) \\
&= \sum_{k=-\infty}^{m-N-1} s_{m+L}(k)\,s_{m+L}(k-i) + \sum_{k=m-N}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i) \\
&= \sum_{k=-\infty}^{m-N-1} s_u(k)\,f_m(k)\,\alpha^{L}\,s_u(k-i)\,f_m(k-i)\,\alpha^{L} + \sum_{k=m-N}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i)
\end{aligned}
\tag{3-1f}
$$

or

$$
r_{m+L}(i) = \alpha^{2L}\,r_m(i) + \sum_{k=m-N}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i)
\tag{3-1g}
$$
`
`
Therefore, $r_{m+L}(i)$ can be calculated recursively from $r_m(i)$ using equation (3-1g). This newly calculated
$r_{m+L}(i)$ is stored back to memory for use in the following adaptation cycle. The autocorrelation coefficient
$R_{m+L}(i)$ is then calculated as

$$
R_{m+L}(i) = r_{m+L}(i) + \sum_{k=m+L-N}^{m+L-1} s_{m+L}(k)\,s_{m+L}(k-i)
\tag{3-1h}
$$
`
So far we have described in a general manner the principles of a hybrid window calculation procedure. The
parameter values for the hybrid windowing module 36 in Figure 4a)/G.728 are

$$
M = 10,\quad L = 20,\quad N = 30 \quad\text{and}\quad \alpha = \left(\tfrac{1}{2}\right)^{1/40} = 0.982820598
\quad\left(\text{so that } \alpha^{2L} = \tfrac{1}{2}\right)
$$
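
The recursion of equations (3-1g) and (3-1h) can be checked against a brute-force evaluation of equation (3-1c). The Python sketch below does this under stated assumptions: the values passed for `b` and `c` are placeholders chosen by the caller (the actual window samples are specified in Annex A), the signal is taken to be zero before sample 0, and the function names are illustrative.

```python
import math


def window(k, m, N, alpha, b, c):
    """Hybrid window w_m(k) of equation (3-1a)."""
    if k <= m - N - 1:
        return b * alpha ** (-(k - (m - N - 1)))   # recursive portion f_m(k)
    if k <= m - 1:
        return -math.sin(c * (k - m))              # non-recursive portion g_m(k)
    return 0.0


def direct_autocorr(su, m, M, N, alpha, b, c):
    """R_m(i), i = 0..M, by brute force; su(k) is taken as zero for k < 0."""
    sm = [su[k] * window(k, m, N, alpha, b, c) for k in range(m)]
    return [sum(sm[k] * sm[k - i] for k in range(i, m)) for i in range(M + 1)]


def recursive_update(r_prev, su, m, M, N, L, alpha, b, c):
    """Advance from cycle m to cycle m + L.

    r_prev holds r_m(i); returns (r_{m+L}, R_{m+L}) per (3-1g) and (3-1h).
    """
    mL = m + L

    def s(k):
        # Window-weighted sample s_{m+L}(k); zero outside the available signal.
        if 0 <= k < len(su):
            return su[k] * window(k, mL, N, alpha, b, c)
        return 0.0

    r_new, R_new = [], []
    for i in range(M + 1):
        r = alpha ** (2 * L) * r_prev[i] + sum(
            s(k) * s(k - i) for k in range(m - N, mL - N))             # (3-1g)
        R = r + sum(s(k) * s(k - i) for k in range(mL - N, mL))        # (3-1h)
        r_new.append(r)
        R_new.append(R)
    return r_new, R_new
```

Starting from `m = N` with all recursive components set to zero (there is no signal before sample 0), repeated calls to `recursive_update` should reproduce `direct_autocorr` at each cycle to within rounding error, while touching only L + N samples per cycle.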
`
`Once the 11 autocorrelation coefficients R(i), i = 0, 1, ..., 10 are calculated by the hybrid windowing procedure
`described above, a “white noise correction” procedure is applied. This is done by increasing the energy R(0) by a small
`amount:
`
$$
R(0) \leftarrow \left(\frac{257}{256}\right) R(0)
\tag{3-1i}
$$
`
`This has the effect of filling the spectral valleys with white noise so as to reduce the spectral dynamic range
`and alleviate ill-conditioning of the subsequent Levinson-Durbin recursion. The white noise correction factor (WNCF)
`of 257/256 corresponds to a white noise level about 24 dB below the average speech power.
`
Next, using the white noise corrected autocorrelation coefficients, the Levinson-Durbin recursion module 37
recursively computes the predictor coefficients from order 1 to order 10. Let the j-th coefficient of the i-th order
predictor be $a_j^{(i)}$. Then, the recursive procedure can be specified as follows:

$$
E(0) = R(0)
\tag{3-2a}
$$

$$
k_i = -\frac{R(i) + \sum_{j=1}^{i-1} a_j^{(i-1)}\,R(i-j)}{E(i-1)}
\tag{3-2b}
$$

$$
a_i^{(i)} = k_i
\tag{3-2c}
$$

$$
a_j^{(i)} = a_j^{(i-1)} + k_i\,a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1
\tag{3-2d}
$$

$$
E(i) = (1 - k_i^2)\,E(i-1)
\tag{3-2e}
$$

Equations (3-2b) through (3-2e) are evaluated recursively for i = 1, 2, ..., 10, and the final solution is given by

$$
q_i = a_i^{(10)}, \qquad 1 \le i \le 10
\tag{3-2f}
$$
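
The white noise correction and the recursion above translate almost line by line into code. The following Python sketch is an illustration under the sign convention of equations (3-2a) to (3-2f); the function names are illustrative, and no guard is included for a non-positive E(i - 1).

```python
def white_noise_correct(R, wncf=257.0 / 256.0):
    """Equation (3-1i): scale R(0) by the white noise correction factor."""
    return [R[0] * wncf] + list(R[1:])


def levinson_durbin(R, order):
    """Equations (3-2a) to (3-2f).

    Returns (q, E) where q[0] = 1, q[i] = a_i^(order), and E is the final
    prediction-error energy E(order).
    """
    a = [1.0] + [0.0] * order
    E = R[0]                                    # (3-2a)
    for i in range(1, order + 1):
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / E                            # (3-2b)
        prev = a[:]
        a[i] = k                                # (3-2c)
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]    # (3-2d)
        E = (1.0 - k * k) * E                   # (3-2e)
    return a, E
```

As a small check, the autocorrelation sequence 1, 0.5, 0.25 of a first-order process gives q_1 = -0.5 and q_2 = 0, with a final error energy of 0.75.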
`
`
`
If we define $q_0 = 1$, then the 10-th order “prediction-error filter” (sometimes called “analysis filter”) has the
transfer function

$$
\tilde{Q}(z) = \sum_{i=0}^{10} q_i\,z^{-i}
\tag{3-3a}
$$

and the corresponding 10-th order linear predictor is defined by the following transfer function

$$
Q(z) = -\sum_{i=1}^{10} q_i\,z^{-i}
\tag{3-3b}
$$
`
The weighting filter coefficient calculator (block 38) calculates the perceptual weighting filter coefficients
according to the following equations:

$$
W(z) = \frac{1 - Q(z/\gamma_1)}{1 - Q(z/\gamma_2)}, \qquad 0 < \gamma_2 < \gamma_1 \le 1
\tag{3-4a}
$$

$$
Q(z/\gamma_1) = -\sum_{i=1}^{10} (q_i\,\gamma_1^{\,i})\,z^{-i}
\tag{3-4b}
$$

and

$$
Q(z/\gamma_2) = -\sum_{i=1}^{10} (q_i\,\gamma_2^{\,i})\,z^{-i}
\tag{3-4c}
$$

The perceptual weighting filter is a 10-th order pole-zero filter defined by the transfer function W(z) in
equation (3-4a). The values of $\gamma_1$ and $\gamma_2$ are 0.9 and 0.6, respectively.
`
`Now refer to Figure 2/G.728. The perceptual weighting filter adapter (block 3) periodically updates the
`coefficients of W(z) according to equations (3-2) through (3-4), and feeds the coefficients to the impulse response vector
`calculator (block 12) and the perceptual weighting filters (blocks 4 and 10).
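
As an illustration of how the coefficients of W(z) might be formed and applied, the Python sketch below assumes the pole-zero form W(z) = [1 − Q(z/γ1)] / [1 − Q(z/γ2)] referred to as equation (3-4a), so that the numerator taps are q_i γ1^i and the denominator taps are q_i γ2^i. The function names and the direct-form memory layout are choices made for this sketch only.

```python
def weighting_filter_coefficients(q, gamma1=0.9, gamma2=0.6):
    """q = [q_1, ..., q_10]. Returns (num, den), each with a leading 1."""
    num = [1.0] + [qi * gamma1 ** i for i, qi in enumerate(q, start=1)]
    den = [1.0] + [qi * gamma2 ** i for i, qi in enumerate(q, start=1)]
    return num, den


def weighting_filter(s, num, den, x_mem, y_mem):
    """Filter one vector s through W(z) in direct form.

    x_mem and y_mem hold past inputs and outputs, most recent first, and are
    updated in place so that the filter memory carries over between vectors.
    """
    out = []
    for x in s:
        y = x + sum(num[i] * x_mem[i - 1] for i in range(1, len(num)))
        y -= sum(den[i] * y_mem[i - 1] for i in range(1, len(den)))
        x_mem.insert(0, x)
        x_mem.pop()
        y_mem.insert(0, y)
        y_mem.pop()
        out.append(y)
    return out
```

Passing `gamma1=0.0` and `gamma2=0.0` makes every tap other than the leading 1 vanish, so the filter returns its input unchanged.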
`
`3.4
`
`Perceptual weighting filter
`
`In Figure 2/G.728, the current input speech vector s(n) is passed through the perceptual weighting filter
`(block 4), resulting in the weighted speech vector v(n). Note that except during initialization, the filter memory
`(i.e. internal state variables, or the values held in the delay units of the filter) should not be reset to zero at any time. On
`the other hand, the memory of the perceptual weighting filter (block 10) will need special handling as described later.
`
`3.4.1
`
`Non-speech operation
`
For modem signals or other non-speech signals, CCITT test results indicate that it is desirable to disable the
perceptual weighting filter. This is equivalent to setting W(z) = 1. This can most easily be accomplished if $\gamma_1$ and
$\gamma_2$ in equation (3-4a) are set equal to zero. The nominal values for these variables in the speech mode are 0.9 and 0.6,
respectively.
`
`3.5
`
`Synthesis filter
`
`In Figure 2/G.728, there are two synthesis filters (blocks 9 and 22) with identical coefficients. Both filters are
`updated by the backward synthesis filter adapter (block 23). Each synthesis filter is a 50-th order all-pole filter that
`consists of a feedback loop with a 50-th order LPC predictor in the feedback branch. The transfer function of the
`synthesis filter is F(z) = 1/[1 – P(z)], where P(z) is the transfer function of the 50-th order LPC predictor.
`
`
`
`
`After the weighted speech vector v(n) has been obtained, a zero-input response vector r(n) will be generated
`using the synthesis filter (block 9) and the perceptual weighting filter (block 10). To accomplish this, we first open the
`switch 5, i.e. point it to node 6. This implies that the signal going from node 7 to the synthesis filter 9 will be zero. We
`then let the synthesis filter 9 and the perceptual weighting filter 10 “ring” for five samples (one vector). This means that
`we continue the filtering operation for five samples with a zero signal applied at node 7. The resulting output of the
`perceptual weighting filter 10 is the desired zero-input response vector r(n).
`
`Note that except for the vector right after initialization, the memory of the filters 9 and 10 is in general
`non-zero; therefore, the output vector r(n) is also non-zero in general, even though the filter input from node 7 is zero. In
`effect, this vector r(n) is the response of the two filters to previous gain-scaled excitation vectors e(n – 1), e(n – 2), ...
`This vector actually represents the effect due to filter memory up to time (n – 1).
`
`3.6
`
`VQ target vector computation
`
`This block subtracts the zero-input response vector r(n) from the weighted speech vector v(n) to obtain the VQ
`codebook search target vector x(n).
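
The "ringing" operation and the target computation can be sketched in a few lines. The Python below is a simplified illustration: it rings a single all-pole filter 1/[1 − P(z)], assuming P(z) = −Σ a_i z^(−i), and omits the perceptual weighting filter that follows it in the encoder. The function names are illustrative, and working on a copy of the memory is a choice made for this sketch.

```python
def ring_all_pole(a, y_mem, count=5):
    """Zero-input response of 1 / (1 + sum_i a_i z^-i) over `count` samples.

    y_mem holds past outputs, most recent first. A copy is used, so the
    caller's filter memory is left untouched.
    """
    mem = list(y_mem)
    out = []
    for _ in range(count):
        y = -sum(ai * yi for ai, yi in zip(a, mem))  # the input sample is zero
        mem.insert(0, y)
        mem.pop()
        out.append(y)
    return out


def vq_target(v, r):
    """x(n) = v(n) - r(n), element by element."""
    return [vi - ri for vi, ri in zip(v, r)]
```

For a first-order filter with a_1 = −0.5 and a last output of 8, the ring is 4, 2, 1, 0.5, 0.25; with all-zero memory the ring is zero and the target equals the weighted speech vector.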
`
`3.7
`
`Backward synthesis filter adapter
`
`This adapter 23 updates the coefficients of the synthesis filters 9 and 22. It takes the quantized (synthesized)
`speech as input and produces a set of synthesis filter coefficients as output. Its operation is quite similar to the perceptual
`weighting filter adapter 3.
`
`A blown-up version of this adapter is shown in Figure 5/G.728. The operation of the hybrid windowing
`module 49 and the Levinson-Durbin recursion module 50 is exactly the same as their counterparts (36 and 37) in
`Figure 4a)/G.728, except for the following three differences:
`
a) the input signal is now the quantized speech rather than the unquantized input speech;

b) the predictor order is 50 rather than 10;

c) the hybrid window parameters are different: $N = 35$, $\alpha = \left(\tfrac{3}{4}\right)^{1/40} = 0.992833749$.
`
`Note that the update period is still L = 20, and the white noise correction factor is still 257/256 = 1.00390625.
`
Let $\hat{P}(z)$ be the transfer function of the 50-th order LPC predictor; then it has the form

$$
\hat{P}(z) = -\sum_{i=1}^{50} \hat{a}_i\,z^{-i}
\tag{3-5}
$$
`
`where âi are the predictor coefficients. To improve robustness to channel errors, these coefficients are modified so that
`the peaks in the resulting LPC spectrum have slightly larger bandwidths. The bandwidth expansion module 51 performs
`this bandwidth expansion procedure in the following way. Given the LPC predictor coefficients âi, a new set of
`coefficients ai is computed according to
`
$$
a_i = \lambda^{i}\,\hat{a}_i, \qquad i = 1, 2, \ldots, 50
\tag{3-6}
$$

where $\lambda$ is given by

$$
\lambda = \frac{253}{256} = 0.98828125
\tag{3-7}
$$
`
`
`
FIGURE 5/G.728

Backward synthesis filter adapter

[Block diagram not reproduced. The adapter (block 23) takes the quantized speech and passes it through the hybrid
windowing module (block 49), the Levinson-Durbin recursion module (block 50) and the bandwidth expansion module
(block 51) to produce the synthesis filter coefficients.]
`
This has the effect of moving all the poles of the synthesis filter radially toward the origin by a factor of $\lambda$.
Since the poles are moved away from the unit circle, the peaks in the frequency response are widened.
`
After such bandwidth expansion, the modified LPC predictor has a transfer function of

$$
P(z) = -\sum_{i=1}^{50} a_i\,z^{-i}
\tag{3-8}
$$
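
Bandwidth expansion is a one-line scaling of the predictor coefficients. The Python sketch below illustrates it; the function name is illustrative and the default factor is the 253/256 value given in the text.

```python
def bandwidth_expand(a_hat, lam=253.0 / 256.0):
    """Return the coefficients lam**i * a_hat_i for i = 1, 2, ...

    a_hat is the list [a_hat_1, a_hat_2, ...] of LPC predictor coefficients.
    """
    return [lam ** i * ai for i, ai in enumerate(a_hat, start=1)]
```

A second-order case makes the effect on the poles visible. With the denominator 1 + a_1 z^(−1) + a_2 z^(−2) and a complex pole pair of radius r, the last coefficient a_2 equals r². After expansion it equals (λr)², so the pole radius has shrunk by the factor λ while the pole angle is unchanged.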
`
`The modified coefficients are then fed to the synthesis filters 9 and 22. These coefficients are also fed to the
`impulse response vector calculator 12.
`
The synthesis filters 9 and 22 both have a transfer function of

$$
F(z) = \frac{1}{1 - P(z)}
\tag{3-9}
$$
`
`Similar to the perceptual weighting filter, the synthesis filters 9 and 22 are also updated once every four
`vectors, and the updates also occur at the third speech vector of every 4-vector adaptation cycle. However, the updates
`are based on the quantized speech up to the last vector of the previous adaptation cycle. In other words, a delay of two
`vectors is introduced before the updates take place. This is because the Levinson-Durbin recursion module 50 and the
`energy table calculator 15 (described later) are computationally intensive. As a result, even though the autocorrelation
`
`
of previously quantized speech is available at the first vector of each four-vector cycle, computations may require more
than one vector worth of time. Therefore, to maintain a basic buffer size of one vector (so as to keep the coding delay
low) while still operating in real time, a 2-vector delay in filter updates is introduced.
`
`3.8
`
`Backward vector gain adapter
`
This adapter updates the excitation gain $\sigma(n)$ for every vector time index n. The excitation gain $\sigma(n)$ is a
scaling factor used to scale the selected excitation vector y(n). The adapter 20 takes the gain-scaled excitation vector e(n)
as its input, and produces an excitation gain $\sigma(n)$ as its output. Basically, it attempts to “predict” the gain of e(n) based
on the gains of e(n – 1), e(n – 2), ... by using adaptive linear prediction in the logarithmic gain domain. This backward
vector gain adapter 20 is shown in more detail in Figure 6/G.728.
`
FIGURE 6/G.728

Backward vector gain adapter

[Block diagram not reproduced. The adapter (block 20) takes the gain-scaled excitation vector e(n) and outputs the
excitation gain $\sigma(n)$. Blocks shown: 1-vector delay (67), root-mean-square (RMS) calculator (39), logarithm
calculator (40), log-gain offset value holder (41), adder (42), hybrid windowing module (43), Levinson-Durbin recursion
module (44), bandwidth expansion module (45), log-gain linear predictor (46), log-gain limiter (47) and inverse logarithm
calculator (48). Signals labelled: e(n – 1), $\delta(n-1)$ and $\hat{\delta}(n)$.]
`
`Refer to Figure 6/G.728. This gain adapter operates as follows. The 1-vector delay unit 67 makes the previous
`gain-scaled excitation vector e(n – 1) available. The root-mean-square (RMS) calculator 39 then calculates the RMS
`value of the vector e(n – 1). Next, the logarithm calculator 40 calculates the dB value of the RMS of e(n – 1), by first
`computing the base 10 logarithm and then multiplying the result by 20.
`
In Figure 6/G.728, a log-gain offset value of 32 dB is stored in the log-gain offset value holder 41. This value
is meant to be roughly equal to the average excitation gain level (in dB) during voiced speech. The adder 42 subtracts
this log-gain offset value from the logarithmic gain produced by the logarithm calculator 40. The resulting
offset-removed logarithmic gain $\delta(n-1)$ is then used by the hybrid windowing module 43 and the Levinson-Durbin recursion
`
`
`
`module 44. Again, blocks 43 and 44 operate in exactly the same way as blocks 36 and 37 in the perceptual weighting
`filter adapter module (Figure 4a)/G.728), except that the hybrid window parameters are different and that the signal
`under analysis is now the offset-removed logarithmic gain rather than the input speech. (Note that only one gain value is
`produced for every five speech samples.) The hybrid window parameters of block 43 are:
`
$$
M = 10,\quad N = 20,\quad L = 4,\quad \alpha = \left(\tfrac{3}{4}\right)^{1/8} = 0.96467863
$$
`
`The output of the Levinson-Durbin recursion module 44 is the coefficients of a 10-th order linear predictor
`with a transfer function of
`
$$
\hat{R}(z) = -\sum_{i=1}^{10} \hat{\alpha}_i\,z^{-i}
\tag{3-10}
$$
`
The bandwidth expansion module 45 then moves the roots of this polynomial radially toward the z-plane
origin in a way similar to the module 51 in Figure 5/G.728. The resulting bandwidth-expanded gain predictor has a
transfer function of

$$
R(z) = -\sum_{i=1}^{10} \alpha_i\,z^{-i}
\tag{3-11}
$$

where the coefficients $\alpha_i$ are computed as

$$
\alpha_i = \left(\frac{29}{32}\right)^{i} \hat{\alpha}_i = (0.90625)^{i}\,\hat{\alpha}_i
\tag{3-12}
$$
`
Such bandwidth expansion makes the gain adapter (block 20 in Figure 2/G.728) more robust to channel errors.
These $\alpha_i$ are then used as the coefficients of the log-gain linear predictor (block 46 of Figure 6/G.728).
`
This predictor 46 is updated once every four speech vectors, and the updates take place at the second speech
vector of every 4-vector adaptation cycle. The predictor attempts to predict $\delta(n)$ based on a linear combination of
$\delta(n-1), \delta(n-2), \ldots, \delta(n-10)$. The predicted version of $\delta(n)$ is denoted as $\hat{\delta}(n)$ and is given by

$$
\hat{\delta}(n) = -\sum_{i=1}^{10} \alpha_i\,\delta(n-i)
\tag{3-13}
$$
`
After $\hat{\delta}(n)$ has been produced by the log-gain linear predictor 46, we add back the log-gain offset value of
32 dB stored in 41. The log-gain limiter 47 then checks the resulting log-gain value and clips it if the value is
unreasonably large or unreasonably small. The lower and upper limits are set to 0 dB and 60 dB, respectively. The gain
limiter output is then fed to the inverse logarithm calculator 48, which reverses the operation of the logarithm calculator
40 and converts the gain from the dB value to the linear domain. The gain limiter ensures that the gain in the linear
domain is in between 1 and 1000.
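
The signal path through blocks 39 to 42 and 46 to 48 can be sketched as follows. This Python illustration assumes the predictor coefficients have already been computed; the function names are illustrative, and the small floor applied to the RMS value is an assumption of this sketch to avoid taking the logarithm of zero, not something stated in the text.

```python
import math

LOG_GAIN_OFFSET_DB = 32.0   # log-gain offset value held in block 41


def offset_removed_log_gain(e_prev):
    """delta(n - 1): dB value of the RMS of e(n - 1), minus the 32 dB offset."""
    rms = math.sqrt(sum(x * x for x in e_prev) / len(e_prev))
    rms = max(rms, 1e-10)   # guard against log10(0); an assumption of this sketch
    return 20.0 * math.log10(rms) - LOG_GAIN_OFFSET_DB


def predict_gain(alpha, delta_hist):
    """Excitation gain sigma(n) in the linear domain.

    alpha = [alpha_1, ..., alpha_10]; delta_hist = [delta(n-1), ..., delta(n-10)].
    """
    delta_hat = -sum(a * d for a, d in zip(alpha, delta_hist))   # (3-13)
    log_gain = delta_hat + LOG_GAIN_OFFSET_DB                    # add back the offset
    log_gain = min(60.0, max(0.0, log_gain))                     # log-gain limiter
    return 10.0 ** (log_gain / 20.0)                             # inverse logarithm
```

With all predictor coefficients at zero the predicted gain sits at the 32 dB offset, about 39.8 in the linear domain; the limiter keeps the result between 1 and 1000 whatever the prediction.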
`
`3.9
`
`Codebook search module
`
`In Figure 2/G.728, blocks 12 through 18 constitute a codebook search module 24. This module searches
`through the 1024 candidate codevectors in the excitation VQ codebook 19 and identifies the index of the best codevector
`which gives a corresponding quantized speech vector that is closest to the input speech vector.
`
`
`
`To reduce the codebook search complexity, the 10-bit, 1024-entry codebook is decomposed into two smaller
`codebooks: a 7-bit “shape codebook” containing 128 independent codevectors and a 3-bit “gain codebook” containing
`eight scalar values that are symmetric with respect to zero (i.e. one bit for sign, two bits for magnitude). The final output
`codevector is the product of the best shape codevector (from the 7-bit shape codebook) and the best gain level (from the
`3-bit gain codebook). The 7-bit shape codebook table and the 3-bit gain codebook table are