`
`16 KBPS WIDEBAND SPEECH CODING TECHNIQUE
`BASED ON ALGEBRAIC CELP
`
`C. Laflamme, J-P. Adoul, R. Salami, S. Morissette, and P. Mabilleau
`
`Communication Reasearch Center, University of Sherbrooke,
`Sherbrooke, QuCbec, CANADA J1K 2R1
`
`Abstract
`
`The application of Algebraic Code Excited Linear Prediction
`(ACELP) coding to wideband speech is presented. In wideband
`coding, a very large excitation codebook is required in order to
`obtain high quality speech. Ordinary CELP algorithms fail to ac-
`comodate such large codebooks due to the excessive coding com-
`plexity. In ACELP, however, an algebraic codebook with 20 bit
`address can be used, without any storage requirements, and more
`importantly, with a very efficient search procedure which allows
`for real-time implementation. A focused search strategy is used
`in which a very small portion of the codebook is searched, yet
`with performance very close t o that of full search. High quality
`speech was obtained at bit rates below 13 kb/s.
`
`1 INTRODUCTION
`
`Digital coding of 7 kHz speech is becoming of interest for many
`applications such as teleconferencing, commentary channels, high-
`quality voice-mail services, and high-quality wideband telephone
`for the emerging ISDN service. Recent years have witnessed a
`breakthrough in the developement of speech coding techniques,
`however, most of the research has focused on narrow-band speech
`signals where the transmission bandwidth is limited t o 300-3400 Hz.
`Although this bandwidth limitation is acceptable in telephone
`systems, it degrades the speech quality in the above mentioned
`applications where the speech is to be heard through high quality
`loudspeakers. For these systems, a bandwidth of 50-7000 Hz was
`found to be appropriate (corresponding to a sampling frequency
`of 16kHz).
`In 1986, the CCITT approved a 64 kb/s standard for wide-
`band audio coding based on a SB-ADPCM coding algorithm [l].
`Reducing this bit rate yields larger spectral efficiency and allows
`for more users to be accomodated in the communication systems
`which have limitations in the transmission bandwidth. Very effi-
`cient speech coding algorithms have been recently developed for
`narrowband digital speech where high quality speech can be ob-
`tained at bit rates as low as 4.8kb/s [2]. Code-excited linear
`prediction (CELP) coding has proven to be the most promising
`among these algorithms. However, few studies have attempted
`to apply CELP to the context of wideband speech. The main
`drawback of CELP is its gross computational complexity. As the
`sampling frequency is doubled, larger frame sizes are needed to
`maintain a low bit rate transmission. Consequently, the use of
`much larger excitation codebooks becomes inevitable. For in-
`
`stance, if we assume the same proportional bit rates and block
`lengths, the typical codebook size increases from a thousand en-
`tries (10 bits) to a million entries (20 bits). Searching and storing
`such a codebook size is rather impractical, unless some subobti-
`mal approaches are utilized such as multistage codebooks, or a
`split-band approach.
`From the above discussion, it seems that it is i
`a full band approach for CELP coding of wideband speech. How-
`ever, the Algebraic Code Excited Linear Prediction (ACELP)
`approach [3,4] offers the solution to this problem. Using ACELP
`along with a focused search technique enables the utilization of
`codebooks having over a million entries, yet with a search com-
`plexity that can be implemented on current DSP technology. We
`describe in this paper the codebook structure used by the ACELP
`and explain the strategy deployed for the efficient search of the
`optimum excitation sequence. We also report on the application
`of ACELP t o wideband audio coding where high quality speech
`was obtained a t bit rates below 16 kb/s. We describe the coding
`parameters and give an assessement of the coder complexity.
`
`2 OVERVIEW OF CEL
`
`In CELP coders, the excitation s
`and LPC synthesis filters is an entry to a large stochastic code-
`book scaled by a gain factor. The optimum excitation vector is
`selected by the exhaustive search of the excitation codebook for
`the codeword which minimizes the mean squared weighted error
`between the original and synthesized speech. If ck is the excita-
`tion vector at index k, then the mean squared weighted error is
`given by
`
`Ek = IIx - gHCkll*,
`(1)
`where x is the target vector given by the weighted input speech
`after subtracting the zero-input response of the weighted synthe-
`sis filter I / A ( z / 7 ) , g is a scaling gain factor, and H is a lower
`triangular convolution matrix constructed from the impulse re-
`sponse of the weighted synthesis filter. Setting a E / a g = 0 in
`Equation (1) yields
`
`X T H C k
`clHTHck ’
`and substituting Equation (2) in (1) gives
`
`g=-
`
`(2)
`
`- 1 3 -
`
`CH2977-719110000-0013 $1.00 Q 1991 IEEE
`
`Ex. 1039 / Page 1 of 4
`Apple v. Saint Lawrence
`
`
`
`tilt and varies in every excitation frame. The parameters of the
`filter depend on the bit rate.
`A sparse algebraic codebook is used. The codebook consists
`of a set of interleaved permutation codes containing few nonzero
`elements. The pulse amplitude are fixed to either 1 or -1, and
`each pulse can take a number of distinct positions. Thus an
`excitation vector ( or a codeword) is determined by the positions
`of its nonzero pulses (as pulse amplitudes are fixed), and the
`positions are coded and transmitted. Such a codebook does not
`require any storage and can be searched very efficiently as we
`will see in the next section. Further, the codebook structure is
`robust against channel errors as a transmission error will change
`the position of only one excitation pulse.
`To further illustrate the codebook structure, we describe the
`2" sized codebook used in our implementation. Excitation frames
`of 80 samples (5 ms) are used. Every frame contains 5 pulses with
`amplitudes 1, -1, 1, -1, and 1, respectively. Each pulse can take
`16 distinct positions. Each position is encoded with 4 bits result-
`ing into a 20 bit codebook. An algebraic vector a(n) is given
`by
`
`b;6(n - mi), n = 0,. ... N - 1,
`
`(8)
`
`P-1
`u(n) =
`i = O
`where p is the number of pulses (5 in our case), bi are the pulse
`amplitudes (1 for i even and -1 for i odd), mi are the pulse
`positions. In our case, the ith pulse has 16 possible positions
`my) = .
`given by
`i = O ..... 4,
`j = 0,. .. ,15.
`This codebook is a subset of the set of all the combinations of
`80-dimentional vectors containing 5 pulses with fixed amplitudes
`('OC5 = 24 millions). The codewords of this codebook can be
`represented by points uniformly distributed over the surface of
`the hyper sphere in 80-dimentional space. Notice that each pulse
`has 16 positions distinct from those of the other pulses, and the
`pulses in the excitation vector can be at any position in the entire
`frame. Since we search for the best position of each pulse (with
`the constraints of fixed amplitudes) then the selected codeword
`will have, to some extent, optimized pulse positions, and will
`have less unnecessary components BS compared to other CELP
`structures.
`
`The optimum codeword is selected by minimizing the term
`
`(XTYk)'
`G=-,
`
`(4)
`f f k
`where yk = H C k is the zero-state response of the weighted syn-
`thesis filter to the codeword ck and f f k = yTy is the energy of the
`filtered codeword yk. The CELP complexity stems from the need
`to compute the filtered codewords Y k for every possible codebook
`entry. The numerator in Equation (4) represents the cross corre-
`lation between the target vector and the filtered codeword, and
`the need to compute yk is circumvented by the use of backward
`filtering, where Equation (4) can be expressed by
`
`(5)
`where 9 = HTx is the backward filtered target vector. Com-
`puting the energy term CYk remains the main problem, and in
`ACELP, it is efficiently determined with very few operations by
`the use of algebraic codebooks with few nonzero elements. In the
`next sections we will describe the codebook structure and the ef-
`ficient strategy deployed for the search of the optimum codeword.
`
`3 CODEBOOK STRUCTURE IN ACELP
`
`We have recently proposed a general framework for representing
`innovation codebooks in stochastically excited linear predictive
`coders [4]. The structure of this general framework is depicted in
`Figure 1. In this structure, innovation codebooks are generated
`from: an algebraic codebook { a b } which is the set of vectors ao,
`.... aL-1, and a shaping matrix F. Thus, an excitation vector is
`given by
`
`The advantage of this structure is that the codebook search is de-
`coupled from the codebook properties. The algebraic codebook
`can be properly chosen so that it is very efficiently searched, and
`need not be stored. The shaping matrix renders the flexibilty
`in obtaining any desired codebook properties. It can be fixed or
`dynamically changed to control the statistical properties of the
`codebook in time and/or frequency domains. This framework
`can represent a wide range of efficient innovation codebook struc-
`tures [4]. In the rest of this section, we will describe the shaping
`matrix and the algebraic codebook used in our implementation.
`The shaping matrix used in the ACELP coder is a function
`of the LPC model A(r). Its main role is t o shape the excitation
`vectors in the frequency domain so that their energies are con-
`centrated in the important frequency bands. In fact, the matrix
`F has a similar role to that of postfiltering [5], in the sense that it
`enhances the formant regions in the reconstructed speech. How-
`ever, it has the advantage that it is embedded in the codebook
`search procedure. In this case, the energy scaling problem which
`we face when using postfiltering is eliminated. The shaping ma-
`trix used is a Toeplitz lower triangular matrix constructed from
`the impulse response of the filter
`
`F ( 2 ) = (1 - p2-l)- 4 2 / 7 1 )
`A(z/72) '
`where A(r) is the LPC inverse filter, 71 and 72 are constants such
`that 71 < 72 < 1, and p is a factor which controls the spectral
`
`(9)
`
`In novation codebook
`
`..................................... L
`
`f
`........... ;. ...........................
`k i
`A(z) i
`
`Figure 1 A general framework for codebook representation in
`stochastic linear predictive coders.
`
`- 1 4 -
`
`Ex. 1039 / Page 2 of 4
`
`
`
`4 EFFICIENT SEARCH STRATEGY
`
`From Equations (3) and (6), the optimum codeword is deter-
`mined by maximizing the term
`
`(X’HFak)’
`
`( X T F z a k ) z
`-
`i-k = ac(HF)THFak - aZFTFzak ’
`Notice that the matrix
`
`Fz=HF
`is a lower triangular matrix containing the impulse response of
`the combined filter
`
`(10)
`
`The search can now be brought back to the algebraic space by
`backward filtering the target vector with the combined filter
`Fz(z). The term in Equation (IO) can be written as
`
`where d = FTx is the backward filtered target vector, and @ a
`matrix containing the autocorrelations of the impulse response
`of Fz(z). As the vector a contains only p nonzero pulses, Equa-
`tion (13) can be written as
`(Er:: d(m1)bl)’
`> =
`CfZ: $(m,,m,) + 2 C f Z i Cy;,?+, bib,$(m,mj)
`As the amplitudes L,, i = 0,. . . , p - 1, are fixed to 1 or -1, the
`correlation requires p additions and the energy p(p + 1)/2 ad-
`ditions and one multiplication. However, by changing only one
`pulse position at a time, updating the term in Equation (14) be-
`comes much simpler. The search is performed in p nested loops,
`where each loop corresponds to one pulse position. In each loop,
`the contribution of a new pulse is added. Thus, in the most inner
`loop, the contribution of the last pulse is added, so the correla-
`tion requires one addition and the energy p additions and one
`
`(14)
`
`’
`
`multiplication (6 additions for our codebook). If an exhaustive
`search is to be carried out, this approach is more efficient than
`the best known CELP algorithms such as overlapp
`or VSELP.
`Although our search procedure is very efficient, the search
`becomes rapidly demanding as the codebook size exceeds 212.
`For huge codebooks such as the one
`clever search strategy has t o be followe
`subsection an approach which we call
`very small subset of the codebook is
`performance very close to that of full search.
`
`4.1 Focused Search
`
`Unlike partial search which operates on an arbitrary partial sub-
`set, in focused search it is possible to infer if the current codebook
`subset stands a chance of holding the winner. If the chances are
`good, the search is continued, otherwise we pas
`set of the codebook.
`Figure 2 shows a scatter plot of the absolute
`Id*akl versus the square root of the energy of filtered codewords,
`6 , where a codebook of 4096 entries is used. Note that the
`and is
`winning codeword is the one which maximizes Ck/&,
`represented in the plot by the point having maximum slope. It
`is observed that only a small portion of the points (codewords)
`stands a chance for being the winner, and the majority of points
`have slopes less than half that of the winner. Therefore, the
`reduced if the search al-
`ch have slopes very close
`e, the term is computed
`ulse in every inner loop.
`, one can decide whether
`the search should be continued or not by comparing the term
`with some predifined threshold. The threshold is set at the be-
`ginning of the search and is given by a fraction of the term at
`which the correlation is maximum. Figure 3 shows the points
`which has been searched at the most inner loop. In this specific
`example, less than 2% of the codebook
`
`SOUARE ROOT OF ENERGY
`
`Figure 2 A typical scatter plot of absolute correlation versus
`square root of energy of filtered codewords (4096 points).
`
`SQUARE ROOT OF ENERGY
`Figure 3 Same as Figure 2 deploying the focused search
`approach. For this typical subframe only 88 codewords are
`searched in the most inner loop.
`
`- 1 5 -
`
`Ex. 1039 / Page 3 of 4
`
`
`
`I
`
`I
`
`I
`
`,
`
`
`
`timum codeword is found. Note that the amount of codebook
`searched varies from one frame to another. Thus for real-time
`considerations, the search has to be stopped if it is much larger
`than the average search value.
`In our codebook five nonzero pulses are used, and every pulse
`can have 16 different positions (4 bits). We use two thresholds,
`one after the third pulse and the other after the fourth pulse.
`Even with very low thresholds, a substantial amount of the search
`is cut down without causing any significant degradation in the
`speech quality. Table 1 shows the SNR for different percentages
`of codebook search, for a sentence uttered by a female speaker.
`The search complexity is significantly reduced by increasing the
`threshold values. Searching about 0.05% of the codebook resulted
`in only 0.5 d B drop in SNR as compared to the full search case.
`I Percentaoe search I SNR fdBl I
`I 22.2
`100
`I 22.14
`4
`I 22.0
`I 1.6
`I 22.05
`0.2
`I 21.83
`0.15
`0.05
`121.8
`I 21.5
`0.03
`Table 1 SNRa for different percentages of codebook search.
`
`I
`
`5 WIDEBAND ACELP CODING
`
`In this section, we report on the work we are currently carry-
`ing out on high quality wideband coding below I6kb/s. The
`input speech is filtered at 50-7000Hz and sampled at 16kHz as
`in CCITT wideband specifications.
`Due to the efficiency of the ACELP codebook search proce-
`dure, and to the ability to incorporate very huge codebooks as
`described earlier, a full-band coding approach is adopted. The
`LPC parameters are updated every 15 ms and computed using a
`20ms Hamming window centered at the end of the frame. The
`speech frame is divided into 3 subframes of 5 ms. The pitch and
`codebook parameters are updated every subframe. The LPC pa-
`rameters are interpolated between adjacent frames (using LSFs)
`to obtain a different set of parameters for every subframe. Con-
`cerning the filter order, using 16 LPC coefficients was found ad-
`equate to represent the speech short-time spectrum envelope. A
`portion of the bandwidth expansion was used before the param-
`eter computation by lag windowing the autocorrelation coeffi-
`cients.
`Concerning the pitch analysis, a one-tap pitch predictor is
`used, and is updated every 5ms (80 samples). The pitch delay
`extends from 40 to 295 samples, and is encoded with 8 bits. The
`pitch delay has twice the resolution of the narrow band case. In-
`creasing the pitch resolution did not result in any considerable
`improvement due to the large excitation codebook used. Frac-
`tional pitch could be more beneficial at lower bit rates. The
`pitch parameters are computed in a closed loop approach, with
`a weighting factor 7 = 0.6. For delays less than the subframe
`length, the past excitation is extended by the short-term predic-
`tion residual. Thus, these delays are not treated seperately, as
`would be the case if the excitation itself is repeated. The impulse
`
`response of the weighted synthesis filter is truncated at 30, and
`the convolutions and energies are easily updated.
`The codebook is efficiently searched as described in the previ-
`ous section. A weighting factor of 7 = 0.9 is used. The thresholds
`were chosen such that, at the most, 1000 codewords are searched.
`High quality speech at bit rates around 13 kbs was obtained. The
`quantization procedures for the filter coeficients and excitation
`parameters will be detailed in a forthcoming paper.
`As we are using a full band approach, it is not abvious how
`the bits are allocated in different frequency bands. However, the
`ACELP algorithm implicitly favours the bands with more energy,
`and this is made more evident by the use of the shaping matrix
`which enhances the formants of the speech spectrum. There-
`fore the encoding algorithm accurately represents the frequency
`regions which are perceptually important.
`
`6 CONCLUSION
`
`We presented in this paper an efficient approach for encoding
`wideband speech signals using algebraic CELP. A full band en-
`coding approach was used whereby a codebook of 220 entries was
`used. We described an efficient procedure for searching such a
`large codebook deploying a focused search strategy, where less
`than 0.1% of the codebook is searched with performance very
`close to that of full search. High quality speech at a bit rate of
`13 kb/s was obtained.
`
`Acknowledgment
`
`This work was supported, in part, by the Canadian Workplace
`Automation Research Center (CWARC).
`
`References
`
`X.Maitre, “7 kHz audio coding within 64 kbit/s,” IEEE J.
`on Selec. Areas in Commun., Vol. 6, No. 2, pp. 283-298,
`Feb. 1988.
`
`B.S.Atal et al. (eds.), Adoances i n Speech Coding, Kluwers
`Academic Pub., 1991.
`
`J-P.Adou1 et al, “Fast CELP coding based on algebraic
`codes,” Proc. ICASSP’87, pp. 1957-1960.
`C.Ldamme et al., “On reducing computational complex-
`ity of codebook search in CELP coder through the use of
`algebraic codes,” Proc. ICASSP’SO, pp. 177-180.
`
`J.H. Chen and A.Gersho, “Real-time vector APC speech
`coding at 4800 bps with adaptive postfiltering,” Proc.
`ICASSP’87, pp. 2185-2188.
`
`- 16 -
`
`Ex. 1039 / Page 4 of 4
`
`