INTERNATIONAL TELECOMMUNICATION UNION

CCITT
THE INTERNATIONAL TELEGRAPH AND TELEPHONE CONSULTATIVE COMMITTEE

G.728
(09/92)

GENERAL ASPECTS OF DIGITAL TRANSMISSION SYSTEMS; TERMINAL EQUIPMENTS

CODING OF SPEECH AT 16 kbit/s USING LOW-DELAY CODE EXCITED LINEAR PREDICTION

Recommendation G.728

Geneva, 1992

Ex. 1037 / Page 1 of 65
Apple v. Saint Lawrence
FOREWORD

The CCITT (the International Telegraph and Telephone Consultative Committee) is a permanent organ of the International Telecommunication Union (ITU). CCITT is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The Plenary Assembly of CCITT, which meets every four years, establishes the topics for study and approves Recommendations prepared by its Study Groups. The approval of Recommendations by the members of CCITT between Plenary Assemblies is covered by the procedure laid down in CCITT Resolution No. 2 (Melbourne, 1988).

Recommendation G.728 was prepared by Study Group XV and was approved under the Resolution No. 2 procedure on the 1st of September 1992.

___________________

CCITT NOTES

1) In this Recommendation, the expression “Administration” is used for conciseness to indicate both a telecommunication administration and a recognized private operating agency.

2) A list of abbreviations used in this Recommendation can be found in Annex F.

All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from the ITU.

© ITU 1992
Recommendation G.728

CODING OF SPEECH AT 16 kbit/s USING LOW-DELAY CODE EXCITED LINEAR PREDICTION

(1992)

1 Introduction

This Recommendation contains the description of an algorithm for the coding of speech signals at 16 kbit/s using low-delay code excited linear prediction (LD-CELP). This Recommendation is organized as follows.

In § 2 a brief outline of the LD-CELP algorithm is given. In §§ 3 and 4, the LD-CELP encoder and LD-CELP decoder principles are discussed, respectively. In § 5, the computational details pertaining to each functional algorithmic block are defined. Annexes A, B, C and D contain tables of constants used by the LD-CELP algorithm. In Annex E the sequencing of variable adaptation and use is given. Finally, in Appendix I information is given on procedures applicable to the implementation verification of the algorithm.

Under further study is the future incorporation of three additional appendices (to be published separately) consisting of LD-CELP network aspects, LD-CELP fixed-point implementation description, and LD-CELP fixed-point verification procedures.

2 Outline of LD-CELP

The LD-CELP algorithm consists of an encoder and a decoder described in §§ 2.1 and 2.2 respectively, and illustrated in Figure 1/G.728.

The essence of CELP techniques, which is an analysis-by-synthesis approach to codebook search, is retained in LD-CELP. LD-CELP, however, uses backward adaptation of predictors and gain to achieve an algorithmic delay of 0.625 ms. Only the index to the excitation codebook is transmitted. The predictor coefficients are updated through LPC analysis of previously quantized speech. The excitation gain is updated by using the gain information embedded in the previously quantized excitation. The block size for the excitation vector and gain adaptation is five samples only. A perceptual weighting filter is updated using LPC analysis of the unquantized speech.
`
2.1 LD-CELP encoder

After the conversion from A-law or µ-law PCM to uniform PCM, the input signal is partitioned into blocks of five consecutive input signal samples. For each input block, the encoder passes each of 1024 candidate codebook vectors (stored in an excitation codebook) through a gain scaling unit and a synthesis filter. From the resulting 1024 candidate quantized signal vectors, the encoder identifies the one that minimizes a frequency-weighted mean-squared error measure with respect to the input signal vector. The 10-bit codebook index of the corresponding best codebook vector (or “codevector”), which gives rise to that best candidate quantized signal vector, is transmitted to the decoder. The best codevector is then passed through the gain scaling unit and the synthesis filter to establish the correct filter memory in preparation for the encoding of the next signal vector. The synthesis filter coefficients and the gain are updated periodically in a backward adaptive manner based on the previously quantized signal and gain-scaled excitation.
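The exhaustive analysis-by-synthesis search outlined above can be written down compactly. The following is a toy illustration only, not the fast search procedure of § 3.9: zero-state filtering through the cascaded synthesis and weighting filters is reduced to a truncated convolution with an impulse response h, and all function and variable names are ours, not the Recommendation's.

```python
import numpy as np

def search_codebook(target, codebook, gain, h):
    """Zero-state analysis-by-synthesis search (toy version).

    target   : weighted target vector, shape (5,)
    codebook : candidate codevectors, shape (n_codevectors, 5)
    gain     : current backward-adapted excitation gain
    h        : first 5 samples of the impulse response of the
               cascaded synthesis + weighting filter H(z)
    Returns the index of the codevector minimizing the MSE.
    """
    best_index, best_err = -1, np.inf
    for j, y in enumerate(codebook):
        # Pass the gain-scaled codevector through H(z) with zero
        # initial state: a truncated convolution over one vector.
        synth = np.convolve(gain * y, h)[:len(target)]
        err = np.sum((target - synth) ** 2)
        if err < best_err:
            best_index, best_err = j, err
    return best_index
```

In the real coder the 1024 candidates are not filtered one by one; §§ 3.9 and 5 describe the equivalent but much cheaper decomposed search.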
`

[Figure 1/G.728 – Simplified block diagram of LD-CELP coder]

a) LD-CELP encoder: 64 kbit/s A-law or µ-law PCM input → convert to uniform PCM → vector buffer → (excitation VQ codebook → gain → synthesis filter) → perceptual weighting filter → min. MSE → VQ index → 16 kbit/s output; with backward gain adaptation and backward predictor adaptation.

b) LD-CELP decoder: VQ index (16 kbit/s input) → excitation VQ codebook → gain → synthesis filter → postfilter → convert to PCM → 64 kbit/s A-law or µ-law PCM output; with backward gain adaptation and backward predictor adaptation.
`
2.2 LD-CELP decoder

The decoding operation is also performed on a block-by-block basis. Upon receiving each 10-bit index, the decoder performs a table look-up to extract the corresponding codevector from the excitation codebook. The extracted codevector is then passed through a gain scaling unit and a synthesis filter to produce the current decoded signal vector. The synthesis filter coefficients and the gain are then updated in the same way as in the encoder. The decoded signal vector is then passed through an adaptive postfilter to enhance the perceptual quality. The postfilter coefficients are updated periodically using the information available at the decoder. The five samples of the postfiltered signal vector are next converted to five A-law or µ-law PCM output samples.
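The decoder step (table look-up, gain scaling, all-pole synthesis filtering) can be sketched as below. This is an illustrative fragment under our own naming, with the adaptive postfilter and the backward adapters omitted for brevity.

```python
import numpy as np

def decode_vector(index, codebook, gain, synth_state, a):
    """Toy decoder step: table look-up, gain scaling, all-pole
    synthesis filtering (postfilter omitted).

    index       : received 10-bit codebook index
    codebook    : excitation codebook, shape (n_codevectors, 5)
    gain        : current backward-adapted gain
    synth_state : previous predictor memory (most recent output first)
    a           : LPC predictor coefficients a_1 .. a_M
    """
    e = gain * codebook[index]          # gain-scaled excitation
    out = []
    for sample in e:
        # s(k) = e(k) + sum_i a_i * s(k - i)   (all-pole synthesis)
        s = sample + np.dot(a, synth_state[:len(a)])
        synth_state = np.concatenate(([s], synth_state[:-1]))
        out.append(s)
    return np.array(out), synth_state
```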
`

[Figure 2/G.728 – LD-CELP encoder block schematic]

Signal path: 64 kbit/s A-law or µ-law PCM input speech so(k) → input PCM format conversion (block 1) → linear PCM input speech su(k) → vector buffer (block 2) → input speech vector s(n).

Simulated decoder (block 8): excitation VQ codebook (19) → y(n) → gain (21) → e(n) → synthesis filter (22) → quantized speech sq(n); backward vector gain adapter (20) and backward synthesis filter adapter (23), the latter producing P(z).

Weighting branch: adapter for perceptual weighting filter (3) → W(z) → perceptual weighting filter (4) → v(n); synthesis filter (9) and perceptual weighting filter (10), fed via switch (5) with nodes (6) and (7), produce r(n) → VQ target vector computation (11) → x(n) → VQ target vector normalization (16) → x̂(n).

Codebook search module (24): impulse response vector calculator (12) → h(n) → shape codevector convolution module (14) → energy table calculator (15) → Ej; time-reversed convolution module (13) → p(n); error calculator (17) and best codebook index selector (18) → best codebook index to communication channel.
`

[Figure 3/G.728 – LD-CELP decoder block schematic]

Codebook index from communication channel → excitation VQ codebook (29) → gain (31) → synthesis filter (32) → decoded speech → postfilter (34) → output PCM format conversion (28) → 64 kbit/s A-law or µ-law PCM output speech. Backward vector gain adapter (30) and backward synthesis filter adapter (33) update blocks 31 and 32; the postfilter adapter (35) receives the 10th-order LPC predictor coefficients and first reflection coefficient.
`
3 LD-CELP encoder principles

Figure 2/G.728 is a detailed block schematic of the LD-CELP encoder. The encoder in Figure 2/G.728 is mathematically equivalent to the encoder previously shown in Figure 1/G.728 but is computationally more efficient to implement.

In the following description:

a) for each variable to be described, k is the sampling index and samples are taken at 125 µs intervals;

b) a group of five consecutive samples in a given signal is called a vector of that signal. For example, five consecutive speech samples form a speech vector, five excitation samples form an excitation vector, and so on;

c) we use n to denote the vector index, which is different from the sample index k;

d) four consecutive vectors build one adaptation cycle. In a later section, we also refer to adaptation cycles as frames. The two terms are used interchangeably.

The excitation vector quantization (VQ) codebook index is the only information explicitly transmitted from the encoder to the decoder. Three other types of parameters will be periodically updated: the excitation gain, the synthesis filter coefficients, and the perceptual weighting filter coefficients. These parameters are derived in a backward adaptive manner from signals that occur prior to the current signal vector. The excitation gain is updated once per vector, while the synthesis filter coefficients and the perceptual weighting filter coefficients are updated once every four vectors (i.e. a 20-sample, or 2.5 ms update period). Note that, although the processing sequence in the algorithm has an adaptation cycle of four vectors (20 samples), the basic buffer size is still only one vector (five samples). This small buffer size makes it possible to achieve a one-way delay less than 2 ms.

A description of each block of the encoder is given below. Since the LD-CELP coder is mainly used for encoding speech, for convenience of description, in the following we will assume that the input signal is speech, although in practice it can be other non-speech signals as well.
`

3.1 Input PCM format conversion

This block converts the input A-law or µ-law PCM signal so(k) to a uniform PCM signal su(k).

3.1.1 Internal linear PCM levels

In converting from A-law or µ-law to linear PCM, different internal representations are possible, depending on the device. For example, standard tables for µ-law PCM define a linear range of –4 015.5 to +4 015.5. The corresponding range for A-law PCM is –2 016 to +2 016. Both tables list some output values having a fractional part of 0.5. These fractional parts cannot be represented in an integer device unless the entire table is multiplied by 2 to make all of the values integers. In fact, this is what is most commonly done in fixed-point digital signal processing (DSP) chips. On the other hand, floating-point DSP chips can represent the same values listed in the tables. Throughout this document it is assumed that the input signal has a maximum range of –4 095 to +4 095. This encompasses both the µ-law and A-law cases. In the case of A-law it implies that when the linear conversion results in a range of –2 016 to +2 016, those values should be scaled up by a factor of 2 before continuing to encode the signal. In the case of µ-law input to a fixed-point processor where the input range is converted to –8 031 to +8 031, it implies that values should be scaled down by a factor of 2 before beginning the encoding process. Alternatively, these values can be treated as being in Q1 format, meaning there is one bit to the right of the decimal point. All computation involving the data would then need to take this bit into account.

For the case of 16-bit linear PCM input signals having the full dynamic range of –32 768 to +32 767, the input values should be considered to be in Q3 format. This means that the input values should be scaled down (divided) by a factor of 8. On output at the decoder the factor of 8 would be restored for these signals.
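The scaling rules above amount to three simple multiplications. The helper below is an illustrative sketch (the function and format names are ours, not defined by the Recommendation):

```python
def to_internal_range(sample, fmt):
    """Scale a decoded sample into the internal -4095..+4095 range
    assumed by the coder (illustrative helper).

    fmt = 'alaw'    : A-law table range -2016..+2016  -> scale up by 2
    fmt = 'ulaw_q1' : doubled u-law range -8031..+8031 -> scale down by 2
                      (equivalently: treat the value as Q1)
    fmt = 'linear16': full-range 16-bit PCM, treated as Q3 -> divide by 8
    """
    if fmt == 'alaw':
        return sample * 2
    if fmt == 'ulaw_q1':
        return sample / 2
    if fmt == 'linear16':
        return sample / 8          # Q3: three bits right of the point
    raise ValueError(fmt)
```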
`
3.2 Vector buffer

This block buffers five consecutive speech samples su(5n), su(5n + 1), ..., su(5n + 4) to form a 5-dimensional speech vector s(n) = [su(5n), su(5n + 1), ..., su(5n + 4)].
`
3.3 Adapter for perceptual weighting filter

Figure 4/G.728 shows the detailed operation of the perceptual weighting filter adapter (block 3 in Figure 2/G.728). This adapter calculates the coefficients of the perceptual weighting filter once every four speech vectors based on linear prediction analysis (often referred to as LPC analysis) of unquantized speech. The coefficient updates occur at the third speech vector of every 4-vector adaptation cycle. The coefficients are held constant in between updates.

Refer to Figure 4a)/G.728. The calculation is performed as follows. First, the input (unquantized) speech vector is passed through a hybrid windowing module (block 36) which places a window on previous speech vectors and calculates the first 11 autocorrelation coefficients of the windowed speech signal as the output. The Levinson-Durbin recursion module (block 37) then converts these autocorrelation coefficients to predictor coefficients. Based on these predictor coefficients, the weighting filter coefficient calculator (block 38) derives the desired coefficients of the weighting filter. These three blocks are discussed in more detail below.

First, let us describe the principles of hybrid windowing. Since this hybrid windowing technique will be used in three different kinds of LPC analyses, we first give a more general description of the technique and then specialize it to different cases. Suppose the LPC analysis is to be performed once every L signal samples. To be general, assume that the signal samples corresponding to the current LD-CELP adaptation cycle are su(m), su(m + 1), su(m + 2), ..., su(m + L – 1). Then, for backward-adaptive LPC analysis, the hybrid window is applied to all previous signal samples with a sample index less than m (as shown in Figure 4b)/G.728). Let there be N non-recursive samples in the hybrid window function. Then, the signal samples su(m – 1), su(m – 2), ..., su(m – N) are all weighted by the non-recursive portion of the window. Starting with su(m – N – 1), all signal samples to the left of (and including) this sample are weighted by the recursive portion of the window, which has values b, bα, bα², ..., where 0 < b < 1 and 0 < α < 1.
`

[Figure 4a)/G.728 – Perceptual weighting filter adapter: input speech → hybrid windowing module (36) → Levinson-Durbin recursion module (37) → weighting filter coefficient calculator (38) → perceptual weighting filter coefficients.]

[Figure 4b)/G.728 – Illustration of a hybrid window: the window function wm(k) has a recursive portion with values b, bα, bα², ... for samples up to m – N – 1, and a non-recursive portion covering samples m – N to m – 1; the current frame spans m to m + L – 1 and the next frame spans m + L to m + 2L – 1.]
`

At time m, the hybrid window function wm(k) is defined as

$$
w_m(k) = \begin{cases}
f_m(k) = b\,\alpha^{-[k-(m-N-1)]}, & \text{if } k \le m-N-1 \\
g_m(k) = -\sin[c(k-m)], & \text{if } m-N \le k \le m-1 \\
0, & \text{if } k \ge m
\end{cases} \tag{3-1a}
$$

and the window-weighted signal is

$$
s_m(k) = s_u(k)\,w_m(k) = \begin{cases}
s_u(k)\,f_m(k) = s_u(k)\,b\,\alpha^{-[k-(m-N-1)]}, & \text{if } k \le m-N-1 \\
s_u(k)\,g_m(k) = -s_u(k)\sin[c(k-m)], & \text{if } m-N \le k \le m-1 \\
0, & \text{if } k \ge m
\end{cases} \tag{3-1b}
$$
`
The samples of the non-recursive portion gm(k) and the initial section of the recursive portion fm(k) for different hybrid windows are specified in Annex A. For an M-th order LPC analysis, we need to calculate M + 1 autocorrelation coefficients Rm(i) for i = 0, 1, 2, ..., M. The i-th autocorrelation coefficient for the current adaptation cycle can be expressed as

$$
R_m(i) = \sum_{k=-\infty}^{m-1} s_m(k)\,s_m(k-i) = r_m(i) + \sum_{k=m-N}^{m-1} s_m(k)\,s_m(k-i) \tag{3-1c}
$$

where

$$
r_m(i) = \sum_{k=-\infty}^{m-N-1} s_m(k)\,s_m(k-i) = \sum_{k=-\infty}^{m-N-1} s_u(k)\,s_u(k-i)\,f_m(k)\,f_m(k-i) \tag{3-1d}
$$

On the right-hand side of equation (3-1c), the first term rm(i) is the “recursive component” of Rm(i), while the second term is the “non-recursive component”. The finite summation of the non-recursive component is calculated for each adaptation cycle. On the other hand, the recursive component is calculated recursively. The following paragraphs explain how.

Suppose we have calculated and stored all rm(i)s for the current adaptation cycle and want to go on to the next adaptation cycle, which starts at sample su(m + L). After the hybrid window is shifted to the right by L samples, the new window-weighted signal for the next adaptation cycle becomes

$$
s_{m+L}(k) = s_u(k)\,w_{m+L}(k) = \begin{cases}
s_u(k)\,f_{m+L}(k) = s_u(k)\,f_m(k)\,\alpha^L, & \text{if } k \le m+L-N-1 \\
s_u(k)\,g_{m+L}(k) = -s_u(k)\sin[c(k-m-L)], & \text{if } m+L-N \le k \le m+L-1 \\
0, & \text{if } k \ge m+L
\end{cases} \tag{3-1e}
$$
`
The recursive component of Rm+L(i) can be written as

$$
\begin{aligned}
r_{m+L}(i) &= \sum_{k=-\infty}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i) \\
&= \sum_{k=-\infty}^{m-N-1} s_{m+L}(k)\,s_{m+L}(k-i) + \sum_{k=m-N}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i) \\
&= \sum_{k=-\infty}^{m-N-1} s_u(k)\,f_m(k)\,\alpha^L\,s_u(k-i)\,f_m(k-i)\,\alpha^L + \sum_{k=m-N}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i)
\end{aligned} \tag{3-1f}
$$

or

$$
r_{m+L}(i) = \alpha^{2L}\,r_m(i) + \sum_{k=m-N}^{m+L-N-1} s_{m+L}(k)\,s_{m+L}(k-i) \tag{3-1g}
$$
`
Therefore, rm+L(i) can be calculated recursively from rm(i) using equation (3-1g). This newly calculated rm+L(i) is stored back to memory for use in the following adaptation cycle. The autocorrelation coefficient Rm+L(i) is then calculated as

$$
R_{m+L}(i) = r_{m+L}(i) + \sum_{k=m+L-N}^{m+L-1} s_{m+L}(k)\,s_{m+L}(k-i) \tag{3-1h}
$$

So far we have described in a general manner the principles of a hybrid window calculation procedure. The parameter values for the hybrid windowing module 36 in Figure 4a)/G.728 are

$$
M = 10, \quad L = 20, \quad N = 30, \quad \alpha = \left(\tfrac{1}{2}\right)^{1/40} = 0.982820598, \quad \text{so that } \alpha^{2L} = \tfrac{1}{2}
$$
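The recursion of equations (3-1g) and (3-1h) can be checked numerically with the sketch below. The array layout (how much signal history is buffered and where "m + L" sits inside it) and all names are our own assumptions for illustration; the exact buffering is defined in § 5.

```python
import numpy as np

def hybrid_window_update(r_prev, s_w, N, L, alpha, M):
    """One adaptation-cycle update of the autocorrelation coefficients
    using equations (3-1g) and (3-1h) (sketch).

    r_prev : recursive components r_m(i), i = 0..M, from the last cycle
    s_w    : window-weighted signal samples up to time m+L-1, with
             enough history that every lag k-i used below is in range
    N, L   : non-recursive window length and update period
    alpha  : decay factor of the recursive window portion
    M      : LPC analysis order
    Returns (r_new, R) with r_new[i] = r_{m+L}(i), R[i] = R_{m+L}(i).
    """
    K = len(s_w)                     # array position of sample m+L
    r_new = np.empty(M + 1)
    R = np.empty(M + 1)
    for i in range(M + 1):
        # (3-1g): decay the old component, add the newly windowed part
        mid = sum(s_w[k] * s_w[k - i] for k in range(K - L - N, K - N))
        r_new[i] = alpha ** (2 * L) * r_prev[i] + mid
        # (3-1h): non-recursive part over the last N samples
        tail = sum(s_w[k] * s_w[k - i] for k in range(K - N, K))
        R[i] = r_new[i] + tail
    return r_new, R
```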
Once the 11 autocorrelation coefficients R(i), i = 0, 1, ..., 10 are calculated by the hybrid windowing procedure described above, a “white noise correction” procedure is applied. This is done by increasing the energy R(0) by a small amount:

$$
R(0) \leftarrow \left(\tfrac{257}{256}\right) R(0) \tag{3-1i}
$$

This has the effect of filling the spectral valleys with white noise so as to reduce the spectral dynamic range and alleviate ill-conditioning of the subsequent Levinson-Durbin recursion. The white noise correction factor (WNCF) of 257/256 corresponds to a white noise level about 24 dB below the average speech power.
`
Next, using the white noise corrected autocorrelation coefficients, the Levinson-Durbin recursion module 37 recursively computes the predictor coefficients from order 1 to order 10. Let the j-th coefficient of the i-th order predictor be aj(i). Then, the recursive procedure can be specified as follows:

$$
E(0) = R(0) \tag{3-2a}
$$

$$
k_i = -\,\frac{R(i) + \sum_{j=1}^{i-1} a_j^{(i-1)}\,R(i-j)}{E(i-1)} \tag{3-2b}
$$

$$
a_i^{(i)} = k_i \tag{3-2c}
$$

$$
a_j^{(i)} = a_j^{(i-1)} + k_i\,a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \tag{3-2d}
$$

$$
E(i) = \left(1 - k_i^2\right) E(i-1) \tag{3-2e}
$$

Equations (3-2b) through (3-2e) are evaluated recursively for i = 1, 2, ..., 10, and the final solution is given by

$$
q_i = a_i^{(10)}, \quad 1 \le i \le 10 \tag{3-2f}
$$
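Equations (3-2a) through (3-2f) translate almost line for line into code. The sketch below follows the recursion literally rather than efficiently; the variable names are ours.

```python
import numpy as np

def levinson_durbin(R, M=10):
    """Levinson-Durbin recursion of equations (3-2a)-(3-2f).

    R : autocorrelation coefficients R(0)..R(M), white-noise corrected
    Returns the final predictor coefficients q_1..q_M.
    """
    a = np.zeros(M + 1)          # a[j] holds a_j^{(i)}; a[0] unused
    E = R[0]                     # (3-2a)
    for i in range(1, M + 1):
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / E             # (3-2b)
        new_a = a.copy()
        new_a[i] = k             # (3-2c)
        for j in range(1, i):    # (3-2d)
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        E = (1 - k * k) * E      # (3-2e)
    return a[1:]                 # q_i = a_i^{(10)}   (3-2f)
```

For an AR(1) signal with R(i) proportional to 0.5^i, the recursion recovers a first-order predictor and a zero second-order tap, consistent with the sign convention of equation (3-3b).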
`
`

`

`If we define q0=1, then the 10-th order “prediction-error filter” (sometimes called “analysis filter”) has the
`transfer function
`
`(3-3a)
`
`(3-3b)
`
`10
`
`(z) = (cid:229)
` qi z–i
`i=0
`
`~Q
`
`and the corresponding 10-th order linear predictor is defined by the following transfer function
`
`10
`
`Q(z) = –(cid:229)
` qi z–i
`i=1
`
The weighting filter coefficient calculator (block 38) calculates the perceptual weighting filter coefficients according to the following equations:

$$
W(z) = \frac{1 - Q(z/\gamma_1)}{1 - Q(z/\gamma_2)}\,, \quad 0 < \gamma_2 < \gamma_1 \le 1 \tag{3-4a}
$$

$$
Q(z/\gamma_1) = -\sum_{i=1}^{10} \left(q_i\,\gamma_1^i\right) z^{-i} \tag{3-4b}
$$

and

$$
Q(z/\gamma_2) = -\sum_{i=1}^{10} \left(q_i\,\gamma_2^i\right) z^{-i} \tag{3-4c}
$$

The perceptual weighting filter is a 10th-order pole-zero filter defined by the transfer function W(z) in equation (3-4a). The values of γ1 and γ2 are 0.9 and 0.6, respectively.
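Per equations (3-4b) and (3-4c), the numerator and denominator taps of W(z) are just the predictor coefficients scaled by powers of γ1 and γ2. A minimal sketch (names ours):

```python
def weighting_filter_coeffs(q, gamma1=0.9, gamma2=0.6):
    """Taps of the pole-zero weighting filter W(z) of (3-4a):
    numerator taps q_i * gamma1**i, denominator taps q_i * gamma2**i.

    q : predictor coefficients q_1..q_10 from the Levinson-Durbin
        recursion (any length works for this sketch)
    """
    num = [qi * gamma1 ** (i + 1) for i, qi in enumerate(q)]
    den = [qi * gamma2 ** (i + 1) for i, qi in enumerate(q)]
    return num, den
```

Setting gamma1 = gamma2 = 0 zeroes all taps, which is the W(z) = 1 non-speech mode described in § 3.4.1.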
`
Now refer to Figure 2/G.728. The perceptual weighting filter adapter (block 3) periodically updates the coefficients of W(z) according to equations (3-2) through (3-4), and feeds the coefficients to the impulse response vector calculator (block 12) and the perceptual weighting filters (blocks 4 and 10).

3.4 Perceptual weighting filter

In Figure 2/G.728, the current input speech vector s(n) is passed through the perceptual weighting filter (block 4), resulting in the weighted speech vector v(n). Note that except during initialization, the filter memory (i.e. internal state variables, or the values held in the delay units of the filter) should not be reset to zero at any time. On the other hand, the memory of the perceptual weighting filter (block 10) will need special handling as described later.

3.4.1 Non-speech operation

For modem signals or other non-speech signals, CCITT test results indicate that it is desirable to disable the perceptual weighting filter. This is equivalent to setting W(z) = 1. This can most easily be accomplished if γ1 and γ2 in equation (3-4a) are set equal to zero. The nominal values for these variables in the speech mode are 0.9 and 0.6, respectively.
`
3.5 Synthesis filter

In Figure 2/G.728, there are two synthesis filters (blocks 9 and 22) with identical coefficients. Both filters are updated by the backward synthesis filter adapter (block 23). Each synthesis filter is a 50th-order all-pole filter that consists of a feedback loop with a 50th-order LPC predictor in the feedback branch. The transfer function of the synthesis filter is F(z) = 1/[1 – P(z)], where P(z) is the transfer function of the 50th-order LPC predictor.
`

After the weighted speech vector v(n) has been obtained, a zero-input response vector r(n) will be generated using the synthesis filter (block 9) and the perceptual weighting filter (block 10). To accomplish this, we first open the switch 5, i.e. point it to node 6. This implies that the signal going from node 7 to the synthesis filter 9 will be zero. We then let the synthesis filter 9 and the perceptual weighting filter 10 “ring” for five samples (one vector). This means that we continue the filtering operation for five samples with a zero signal applied at node 7. The resulting output of the perceptual weighting filter 10 is the desired zero-input response vector r(n).

Note that except for the vector right after initialization, the memory of the filters 9 and 10 is in general non-zero; therefore, the output vector r(n) is also non-zero in general, even though the filter input from node 7 is zero. In effect, this vector r(n) is the response of the two filters to previous gain-scaled excitation vectors e(n – 1), e(n – 2), ... This vector actually represents the effect due to filter memory up to time (n – 1).

3.6 VQ target vector computation

This block subtracts the zero-input response vector r(n) from the weighted speech vector v(n) to obtain the VQ codebook search target vector x(n).
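The "ringing" of § 3.5 and the target computation of § 3.6 can be sketched together. For brevity this toy version rings only a single all-pole filter rather than the cascade of blocks 9 and 10, and all names are illustrative.

```python
import numpy as np

def zero_input_response(a, state, n=5):
    """Ring an all-pole filter 1/(1 - P(z)) for n samples with zero
    input, starting from its current memory.

    a     : predictor taps a_1..a_M
    state : most recent filter outputs, newest first
    """
    state = list(state)
    out = []
    for _ in range(n):
        # zero input at node 7: output is driven by memory alone
        s = sum(ai * si for ai, si in zip(a, state))
        state = [s] + state[:-1]
        out.append(s)
    return np.array(out)

def vq_target(v, r):
    """Block 11: x(n) = v(n) - r(n)."""
    return np.asarray(v) - np.asarray(r)
```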
`
3.7 Backward synthesis filter adapter

This adapter 23 updates the coefficients of the synthesis filters 9 and 22. It takes the quantized (synthesized) speech as input and produces a set of synthesis filter coefficients as output. Its operation is quite similar to the perceptual weighting filter adapter 3.

A blown-up version of this adapter is shown in Figure 5/G.728. The operation of the hybrid windowing module 49 and the Levinson-Durbin recursion module 50 is exactly the same as their counterparts (36 and 37) in Figure 4a)/G.728, except for the following three differences:

a) the input signal is now the quantized speech rather than the unquantized input speech;

b) the predictor order is 50 rather than 10;

c) the hybrid window parameters are different: N = 35, α = (3/4)^(1/40) = 0.992833749.

Note that the update period is still L = 20, and the white noise correction factor is still 257/256 = 1.00390625.
`
Let P̂(z) be the transfer function of the 50th-order LPC predictor; it has the form

$$
\hat{P}(z) = -\sum_{i=1}^{50} \hat{a}_i\,z^{-i} \tag{3-5}
$$

where âi are the predictor coefficients. To improve robustness to channel errors, these coefficients are modified so that the peaks in the resulting LPC spectrum have slightly larger bandwidths. The bandwidth expansion module 51 performs this bandwidth expansion procedure in the following way. Given the LPC predictor coefficients âi, a new set of coefficients ai is computed according to

$$
a_i = \lambda^i\,\hat{a}_i\,, \quad i = 1, 2, \ldots, 50 \tag{3-6}
$$

where λ is given by

$$
\lambda = \frac{253}{256} = 0.98828125 \tag{3-7}
$$
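Equation (3-6) is a one-line transformation of the coefficient set; a minimal sketch (function name ours):

```python
def bandwidth_expand(a_hat, lam=253.0 / 256.0):
    """Bandwidth expansion of equations (3-6)/(3-7): scale the i-th
    predictor coefficient by lam**i, which moves every pole of the
    synthesis filter toward the z-plane origin by the factor lam."""
    return [lam ** (i + 1) * ai for i, ai in enumerate(a_hat)]
```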
`
`

[Figure 5/G.728 – Backward synthesis filter adapter (23): quantized speech → hybrid windowing module (49) → Levinson-Durbin recursion module (50) → bandwidth expansion module (51) → synthesis filter coefficients.]
`
This has the effect of moving all the poles of the synthesis filter radially toward the origin by a factor of λ. Since the poles are moved away from the unit circle, the peaks in the frequency response are widened.

After such bandwidth expansion, the modified LPC predictor has a transfer function of

$$
P(z) = -\sum_{i=1}^{50} a_i\,z^{-i} \tag{3-8}
$$

The modified coefficients are then fed to the synthesis filters 9 and 22. These coefficients are also fed to the impulse response vector calculator 12.

The synthesis filters 9 and 22 both have a transfer function of

$$
F(z) = \frac{1}{1 - P(z)} \tag{3-9}
$$
`
Similar to the perceptual weighting filter, the synthesis filters 9 and 22 are also updated once every four vectors, and the updates also occur at the third speech vector of every 4-vector adaptation cycle. However, the updates are based on the quantized speech up to the last vector of the previous adaptation cycle. In other words, a delay of two vectors is introduced before the updates take place. This is because the Levinson-Durbin recursion module 50 and the energy table calculator 15 (described later) are computationally intensive. As a result, even though the autocorrelation of previously quantized speech is available at the first vector of each 4-vector cycle, computations may require more than one vector worth of time. Therefore, to maintain a basic buffer size of one vector (so as to keep the coding delay low), and to maintain real-time operation, a 2-vector delay in filter updates is introduced in order to facilitate real-time implementation.
`
3.8 Backward vector gain adapter

This adapter updates the excitation gain σ(n) for every vector time index n. The excitation gain σ(n) is a scaling factor used to scale the selected excitation vector y(n). The adapter 20 takes the gain-scaled excitation vector e(n) as its input, and produces an excitation gain σ(n) as its output. Basically, it attempts to “predict” the gain of e(n) based on the gains of e(n – 1), e(n – 2), ... by using adaptive linear prediction in the logarithmic gain domain. This backward vector gain adapter 20 is shown in more detail in Figure 6/G.728.
`
[Figure 6/G.728 – Backward vector gain adapter (20): the gain-scaled excitation vector e(n) enters a 1-vector delay (67); e(n – 1) feeds the root-mean-square (RMS) calculator (39) and then the logarithm calculator (40); the adder (42) subtracts the 32 dB offset held in the log-gain offset value holder (41), giving δ(n – 1); this feeds the hybrid windowing module (43), Levinson-Durbin recursion module (44) and bandwidth expansion module (45); the log-gain linear predictor (46) produces the predicted log-gain, the offset is added back, the log-gain limiter (47) clips the result, and the inverse logarithm calculator (48) outputs the excitation gain σ(n).]
`
Refer to Figure 6/G.728. This gain adapter operates as follows. The 1-vector delay unit 67 makes the previous gain-scaled excitation vector e(n – 1) available. The root-mean-square (RMS) calculator 39 then calculates the RMS value of the vector e(n – 1). Next, the logarithm calculator 40 calculates the dB value of the RMS of e(n – 1), by first computing the base 10 logarithm and then multiplying the result by 20.

In Figure 6/G.728, a log-gain offset value of 32 dB is stored in the log-gain offset value holder 41. This value is meant to be roughly equal to the average excitation gain level (in dB) during voiced speech. The adder 42 subtracts this log-gain offset value from the logarithmic gain produced by the logarithm calculator 40. The resulting offset-removed logarithmic gain δ(n – 1) is then used by the hybrid windowing module 43 and the Levinson-Durbin recursion module 44. Again, blocks 43 and 44 operate in exactly the same way as blocks 36 and 37 in the perceptual weighting filter adapter module (Figure 4a)/G.728), except that the hybrid window parameters are different and that the signal under analysis is now the offset-removed logarithmic gain rather than the input speech. (Note that only one gain value is produced for every five speech samples.) The hybrid window parameters of block 43 are:

$$
M = 10, \quad N = 20, \quad L = 4, \quad \alpha = \left(\tfrac{3}{4}\right)^{1/8} = 0.96467863
$$
`
The output of the Levinson-Durbin recursion module 44 is the coefficients of a 10th-order linear predictor with a transfer function of

$$
\hat{R}(z) = -\sum_{i=1}^{10} \hat{\alpha}_i\,z^{-i} \tag{3-10}
$$

The bandwidth expansion module 45 then moves the roots of this polynomial radially toward the z-plane origin in a way similar to the module 51 in Figure 5/G.728. The resulting bandwidth-expanded gain predictor has a transfer function of

$$
R(z) = -\sum_{i=1}^{10} \alpha_i\,z^{-i} \tag{3-11}
$$

where the coefficients αi are computed as

$$
\alpha_i = \left(\tfrac{29}{32}\right)^i \hat{\alpha}_i = (0.90625)^i\,\hat{\alpha}_i \tag{3-12}
$$
`
Such bandwidth expansion makes the gain adapter (block 20 in Figure 2/G.728) more robust to channel errors. These αi are then used as the coefficients of the log-gain linear predictor (block 46 of Figure 6/G.728).

This predictor 46 is updated once every four speech vectors, and the updates take place at the second speech vector of every 4-vector adaptation cycle. The predictor attempts to predict δ(n) based on a linear combination of δ(n – 1), δ(n – 2), ..., δ(n – 10). The predicted version of δ(n) is denoted as δ̂(n) and is given by

$$
\hat{\delta}(n) = -\sum_{i=1}^{10} \alpha_i\,\delta(n-i) \tag{3-13}
$$

After δ̂(n) has been produced by the log-gain linear predictor 46, we add back the log-gain offset value of 32 dB stored in 41. The log-gain limiter 47 then checks the resulting log-gain value and clips it if the value is unreasonably large or unreasonably small. The lower and upper limits are set to 0 dB and 60 dB, respectively. The gain limiter output is then fed to the inverse logarithm calculator 48, which reverses the operation of the logarithm calculator 40 and converts the gain from the dB value to the linear domain. The gain limiter ensures that the gain in the linear domain is between 1 and 1000.
`
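Blocks 46 through 48 can be sketched as below (an illustrative outline, not the reference implementation; the function names are assumptions). With the 20·log10 convention, the 0 dB and 60 dB limits correspond exactly to linear gains of 1 and 1000:

```python
LOG_GAIN_OFFSET_DB = 32.0  # log-gain offset value stored in block 41

def predict_log_gain(a, d_prev):
    """Equation (3-13): ^d(n) = -sum_i a_i d(n - i); d_prev[0] is d(n - 1)."""
    return -sum(ai * di for ai, di in zip(a, d_prev))

def linear_gain(d_hat):
    """Blocks 47 and 48: restore the 32 dB offset, limit the log-gain
    to [0 dB, 60 dB], then convert from dB to the linear domain."""
    log_gain_db = min(60.0, max(0.0, d_hat + LOG_GAIN_OFFSET_DB))
    return 10.0 ** (log_gain_db / 20.0)
```
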
3.9     Codebook search module

In Figure 2/G.728, blocks 12 through 18 constitute a codebook search module 24. This module searches
through the 1024 candidate codevectors in the excitation VQ codebook 19 and identifies the index of the best codevector
which gives a corresponding quantized speech vector that is closest to the input speech vector.

To reduce the codebook search complexity, the 10-bit, 1024-entry codebook is decomposed into two smaller
codebooks: a 7-bit “shape codebook” containing 128 independent codevectors and a 3-bit “gain codebook” containing
eight scalar values that are symmetric with respect to zero (i.e. one bit for sign, two bits for magnitude). The final output
codevector is the product of the best shape codevector (from the 7-bit shape codebook) and the best gain level (from the
3-bit gain codebook). The 7-bit shape codebook table and the 3-bit gain codebook table are given in Annex B.

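The shape/gain decomposition can be sketched as follows (the bit layout of the 10-bit index and the tiny placeholder tables are assumptions made for illustration; the actual codebook tables are given in Annex B):

```python
def split_index(index):
    """Split a 10-bit codebook index into (shape_index, gain_index).

    Assumes the low 3 bits select the gain level and the high 7 bits
    select the shape; this layout is illustrative, not from G.728.
    """
    return index >> 3, index & 0x7

def codevector(shapes, gains, index):
    """Final output codevector: the selected gain level times the
    selected shape codevector."""
    shape_idx, gain_idx = split_index(index)
    return [gains[gain_idx] * x for x in shapes[shape_idx]]
```

With 128 shapes and 8 gains this covers all 128 × 8 = 1024 candidates while storing far smaller tables than a single 1024-entry codebook would require.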
3.9.1   Principle of codebook search

In principle, the codebook search module 24 scales each of the 1024 candidate codevectors by the current
excitation gain s(n) and then passes the resulting 1024 vectors one at a time through a cascaded filter consisting of the
synthesis filter F(z) and the perceptual weighting filter W(z). The filter memory is initialized to zero each time the
module feeds a new codevector to the cascaded filter with transfer function H(z) = F(z)W(z).

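This search principle can be sketched as a brute-force loop (illustrative only; the shape/gain decomposition described above exists precisely to avoid this exhaustive cost, and the signal values below are placeholders). Since the filter memory is zeroed for each candidate, zero-state filtering reduces to convolution with the impulse response of H(z), truncated to the vector length:

```python
def zero_state_filter(h, x):
    """Filter x through impulse response h with zero initial memory,
    keeping only the first len(x) output samples (zero-state response)."""
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(x))]

def search_codebook(candidates, gain, h, target):
    """Return the index of the gain-scaled, filtered candidate codevector
    closest (in squared error) to the target vector."""
    best_index, best_err = 0, float("inf")
    for j, y in enumerate(candidates):
        out = zero_state_filter(h, [gain * s for s in y])
        err = sum((t - o) ** 2 for t, o in zip(target, out))
        if err < best_err:
            best_index, best_err = j, err
    return best_index
```
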
The filtering of VQ codevectors can be expressed in terms of matrix-vector multiplication. Let y_j be the j-th
codevector in the 7-bit shape codebook.
