`J-P. Adoul, P. Mabilleau, M. Delprat and S. Morissette
`
`Communication Research Center
`University of Sherbrooke
Sherbrooke, P.Q., CANADA, J1K 2R1
`
`Abstract
`
Code-Excited Linear Prediction (CELP) produces high quality synthetic speech at low bit rate. However, the basic scheme leads to huge computational loads. The paper describes a related scheme which allows real time implementation on current DSP chips. The very efficient search procedure in the codebook is achieved by means of a new technique called "backward filtering" and the use of algebraic codes. SNR performances are reported for a variety of conditions.
`
`Introduction
`
Significant progress has recently been made in low bit rate coding of speech, which suggests that high quality synthetic speech may be produced at 4800 bps and lower rates. In that field, Code Excited Linear Prediction (CELP) is a very promising technique for narrow band coding of speech. However, the huge amount of computations involved appeared as quite a challenge when it was first introduced [1]. The comparatively high quality at low bit rate is achieved through an analysis-by-synthesis procedure using both short-term and long-term prediction, as shown in figure 1.
`
The analysis procedure consists of finding the innovation sequence in the codebook which is optimum with respect to a subjective error criterion. Each codeword (or sequence) Ck is scaled by an optimal gain factor Gk and processed through the inverse filters 1/B(z) (pitch predictor) and 1/A(z) (linear-prediction inverse filter). The difference Xn between the original and synthetic signals (sn and ŝn) is processed through the perceptual weighting filter W(z) and the "best" sequence is then chosen to minimize the energy of the perceptual error signal Yn.
As mentioned by Trancoso and Atal [2], the huge amount of computations mainly comes from the search procedure for finding the optimum innovation sequence, and particularly from the filtering of all the sequences through both long-term and short-term predictors. Thus efforts have been concentrated on this problem.
We shall discuss in turn four proposals which will enable a dramatic reduction in computation. The four proposals are:
a) A modification of the basic structure.
b) The concept of "backward filtering" together with the use of an LPC-filter codebook.
c) The use of short innovation sequences with a tree coding approach.
d) The use of algebraic structures for the innovation codebook.
`
Transformation of the basic scheme
`
Figure 1: original proposal
`
A simple and useful perceptual weighting filter is expressed as
W(z) = A(z)/A(z/γ)   (1)
where γ is the perceptual weighting coefficient (chosen around 0.8) and A(z) is the linear prediction filter: A(z) = Σi ai z^-i. The filter W(z) can be moved both in the upper branch and in the lower branch, as shown in figure 2.
In the upper branch the original signal is processed through the analysis filter A(z), yielding a residual signal en from which the pitch parameters are derived. Then this residual signal is processed through the inverse filter 1/A(z/γ).
`49.4.1
`CH2396-0/'87/0000-1957 $1.00 @ 1987 IEEE 1957
`
`Ex. 1026 / Page 1 of 4
`Apple v. Saint Lawrence
`
`
`
`?
`
`Long-term
`
`
`predictor, predictor:
`
`U
`' b i n
`Figure 2: the perceptual weighting filter is moved
`both in the upper and in the lower branch.
`inverse filter l/A(z/8) now replaces l/A(z).
The pitch predictor is chosen to be a single tap predictor:
B(z) = 1 - b z^-T   (2)
where b is the gain and T is abusively called the "pitch period". The expression of the output signal ên of the pitch predictor 1/B(z) is then derived from (2):
ên = r̂n + b ên-T   (3)
with
r̂n = Gk Cnk ,  n = 0, ..., N-1   (4)
where N is the block size (length of the codewords).
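As a minimal illustration of equations (2)-(4), the following Python sketch runs one block of a scaled codeword through the single-tap pitch predictor 1/B(z). All signal values, the gain and the lag are made up for the example; they are not taken from the paper.

```python
def pitch_synthesis(codeword, gain, b, T, history):
    """One block through the single-tap pitch predictor 1/B(z),
    B(z) = 1 - b z^-T (eqs. 2-4).

    `history` holds past output samples e_hat[n-T]; since T >= block
    size N, every delayed sample comes from previously decoded blocks,
    so the current block never feeds back into itself.
    """
    N = len(codeword)
    assert T >= N, "pitch lag must be at least the block size"
    out = []
    for n in range(N):
        r_hat = gain * codeword[n]           # scaled innovation (eq. 4)
        e_hat = r_hat + b * history[-T + n]  # eq. 3: e[n] = r[n] + b e[n-T]
        out.append(e_hat)
    return out

# toy usage: 4-sample block, hypothetical lag T=6 and past excitation
history = [0.1, -0.2, 0.3, 0.0, 0.05, -0.1]
block = pitch_synthesis([1, -1, 1, 1], gain=0.5, b=0.8, T=6, history=history)
```

In a decoder the `out` samples would be appended to `history` before the next block is processed.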
During the search procedure, the signal ên-T is known and does not depend on the codeword currently tested. Thus the pitch predictor 1/B(z) can be removed from the lower branch if the signal b ên-T is subtracted from the residual signal in the upper branch [2]. Using (3), the signal ên-T is obtained by processing the delayed signal r̂n-T through the pitch predictor 1/B(z); and r̂n-T is computed from already known codewords, chosen for preceding blocks, provided that the pitch period T is restricted to values greater than the block size N.
`
`Figure 3: the long-term predictor and the "memory" of the
`short-term predictor are removed from the lower branch
`
The synthetic signal in the lower branch can be decomposed as ŷn = ŷ'n + Vn, where n=0 is the beginning of the block. The signal ŷ'n represents the output of the inverse memoryless filter 1/A(z/γ) while Vn is the output of 1/A(z/γ) without any excitation. The signal Vn only depends on the codewords chosen for preceding blocks, so it can be removed from the lower branch and subtracted from the upper one [2].
The new scheme, with the long-term predictor and the "memory" of the short-term predictor removed from the lower branch, is represented in figure 3. The two filterings per codeword are here reduced to a single memoryless filtering, with a significant cut in the computational load, and this structure enables further modifications.
`
The search procedure now consists of minimizing the error signal energy E over the current block, and figure 3 leads to the expression
E = Σn (Xn - G gn)² ;  sum over n = 0, 1, ..., N-1   (5)
where gn is the response of 1/A(z/γ) to the currently tested codeword and G is the related gain factor. The minimization is achieved in two steps:
1) Find G, setting dE/dG = 0. From (5) it comes:
dE/dG = -2 Σn Xn gn + 2G Σn gn²
so
G = (Σn Xn gn) / (Σn gn²)   (6)
Replacing the expression (6) in equation (5) gives
E = Σn Xn² - (Σn Xn gn)² / (Σn gn²) .
2) Minimize E over the codebook entries. Since Xn does not depend on the codeword tested, minimizing E is equivalent to maximizing the weighted inner product Pw = P²/αk², where αk² = Σn gn² is the energy of the response of the memoryless filter 1/A(z/γ) to the currently tested codeword k, and P is the inner product between Xn and gn:
P = Σn Xn gn   (7)
Thus the search procedure consists of finding the codeword k which maximizes Pw over the codebook, as represented in figure 4a.
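The two-step minimization of equations (5)-(7) can be sketched in Python as follows. The target X and the filtered codeword responses are hypothetical numbers chosen only to exercise the formulas; in the real coder each response g would come from filtering a codeword through 1/A(z/γ).

```python
def best_codeword(X, responses):
    """Two-step search of eqs. (5)-(7): for each candidate response g,
    the optimal gain is G = P/alpha2 and the residual error is
    E = sum(X^2) - P^2/alpha2, so minimizing E over the codebook is the
    same as maximizing Pw = P^2/alpha2."""
    Ex = sum(x * x for x in X)
    best = None
    for k, g in enumerate(responses):
        P = sum(x * gi for x, gi in zip(X, g))   # inner product (eq. 7)
        alpha2 = sum(gi * gi for gi in g)        # response energy
        Pw = P * P / alpha2
        if best is None or Pw > best[1]:
            best = (k, Pw, P / alpha2)           # index, score, gain G (eq. 6)
    k, Pw, G = best
    return k, G, Ex - Pw                         # minimum error from eq. (5)

X = [1.0, 2.0, -1.0, 0.5]
responses = [[0.5, 1.0, 0.0, -0.5],
             [1.0, 0.0, 1.0, 0.0],
             [0.0, 1.0, 1.0, 1.0]]
k, G, E = best_codeword(X, responses)

# brute-force check: E matches the direct energy at gain G for codeword k
E_direct = sum((x - G * gi) ** 2 for x, gi in zip(X, responses[k]))
```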
Since gn is the response of the memoryless inverse filter 1/A(z/γ) to the currently tested codeword c = (Cn), it can be written as the convolution product between c and the impulse response fn of 1/A(z/γ):
gn = Σi Ci fn-i   (8)
Using (7) and (8), it comes
P = Σi Ci di   (9)
with
di = Σ(n=0 to N-1) Xn fn-i   (10)
Considering the temporally reversed sequences Xn' and dn' defined by Xn' = XN-n and dn' = dN-n, equation (10) is equivalent to
dn' = Σi Xi' fn-i   (11)
The sequence dn' appears in equation (11) as the convolution product between Xn' and the impulse response of 1/A(z/γ). Thus it can be computed as the response of 1/A(z/γ) to the sequence Xn', as shown in figure 4b.
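The equivalence between the direct search (figure 4a) and the backward filtering of equations (8)-(11) (figure 4b) can be checked numerically. The sketch below uses made-up LPC coefficients, a made-up target X and one candidate codeword; `allpole` is an illustrative direct-form implementation of 1/A(z/γ) with a0 = 1, not code from the paper.

```python
def allpole(x, a, gamma):
    """Zero-state response of 1/A(z/gamma), A(z) = 1 + sum_i a[i] z^-(i+1):
    y[n] = x[n] - sum_i gamma^(i+1) a[i] y[n-i-1]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= (gamma ** i) * ai * y[n - i]
        y.append(acc)
    return y

a = [-0.9, 0.4]                          # illustrative LPC coefficients
gamma = 0.8
X = [0.3, -1.0, 0.7, 0.2, -0.4, 0.1]     # hypothetical weighted target
c = [1, -1, 0, 1, -1, 1]                 # one candidate codeword

# figure 4a: filter the codeword, then correlate with X  (P = sum Xn gn)
g = allpole(c, a, gamma)
P_forward = sum(x * gi for x, gi in zip(X, g))

# figure 4b: reverse X, filter once through 1/A(z/gamma), reverse again,
# then take the inner product with the codeword  (P = sum Ci di, eq. 9)
d = allpole(X[::-1], a, gamma)[::-1]
P_backward = sum(ci * di for ci, di in zip(c, d))
```

The point of the rearrangement is that `d` is computed once per block, while the figure 4a path repeats a filtering for every codeword.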
`
Hierarchical vector quantization is used to quantize the linear prediction filter A(z) for each frame of signal. A hierarchical codebook of 1024 filters is generated from a large training set, using an efficient design algorithm [4]. For an equivalent quality, vector quantization requires a much lower bit rate than scalar quantization. Moreover, the hierarchical (binary tree) structure of the codebook enables fast search procedures during the quantization process.
Apart from these significant advantages, vector quantization of the LPC filters has an even greater interest in a CELP environment. As a matter of fact, the search procedure described above involves the computing of the coefficients 1/αk², which only depend on the codeword and on the linear prediction filter. Using vector quantization, the number of possible LPC filters is finite and reasonably small (1024). Thus all the possible values of 1/αk,i² can be pre-computed and stored. If the size K of the codebook containing the innovation sequences is not too large, this storing will require reasonable amounts of memory.
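A sketch of this pre-computation idea follows, with deliberately tiny hypothetical codebooks (the paper uses 1024 LPC filters and K innovation sequences). The energies αk,i² are tabulated once, offline, for every (LPC filter i, codeword k) pair; `allpole` is an illustrative direct-form 1/A(z/γ) filter, not code from the paper.

```python
def allpole(x, a, gamma):
    """Zero-state response of 1/A(z/gamma), A(z) = 1 + sum_i a[i] z^-(i+1)."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= (gamma ** i) * ai * y[n - i]
        y.append(acc)
    return y

# Hypothetical miniature codebooks for illustration only.
lpc_filters = [[-0.9, 0.4], [-1.2, 0.5]]                  # quantized A(z) sets
codewords = [[1, -1, 1, -1], [1, 1, -1, -1], [1, 1, 1, 1]]
gamma = 0.8

# Pre-compute 1/alpha^2 for every (LPC filter i, codeword k) pair, so the
# online search only needs inner products and a table lookup.
inv_alpha2 = {}
for i, a in enumerate(lpc_filters):
    for k, c in enumerate(codewords):
        g = allpole(c, a, gamma)                  # response of 1/A(z/gamma)
        inv_alpha2[(i, k)] = 1.0 / sum(gi * gi for gi in g)
```

The table has (number of filters) x K entries, which is why the paper notes that K must not be too large for the storage to stay reasonable.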
`
Algebraic codes, short innovation sequences and tree coding
`
Figure 4: the "backward filtering" principle.

The sequence Xn is temporally reversed, inverse filtered through 1/A(z/γ) and temporally reversed again, giving the sequence dn, which does not depend on the currently tested codeword. This procedure is called "backward filtering". Now the search procedure consists of maximizing the weighted inner product Pw, computed as the inner product between Cn and dn, scaled with 1/αk².
The backward filtering (figure 4b) is strictly equivalent to the former procedure (figure 4a) and consequently the results are unchanged. But instead of one filtering per codeword, only one filtering for all the codewords is required here, which cuts the computational load by a factor approximately equal to the linear prediction order M.
`
Smaller codebooks can be used with an equivalent bit rate if the block size N is reduced. This has the advantage of significantly decreasing the computational load, although the coding procedure is less efficient with a small block size. A tree coding structure, for instance with the ML algorithm, can be used to reduce this drawback. During the search procedure the L "best" codewords for the current block are retained as hypotheses for the next block. This increases the amount of computations proportionally with L, but the algorithm is already very efficient for L=2. An additional delay is introduced, so the pitch period T must be greater than 2N. Another efficient way of taking into account the influence of a codeword on the next block is to compute a longer sequence of signal dn by "backward filtering"; the increase in computation is minimal.
The whole coding scheme, including the ML algorithm, is represented in figure 5.
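The "retain the L best hypotheses per block" idea can be sketched as a small beam search. The block scoring function below is a caller-supplied stand-in for the CELP error criterion (its dependence on the hypothesised past path mimics the filter memory); the codebook, targets and the 0.3 carry-over factor are all made up for illustration.

```python
def ml_tree_search(blocks, codebook, block_error, L=2):
    """Sketch of the ML tree-coding algorithm: after each block, keep only
    the L best codeword hypotheses (paths) ranked by accumulated error.

    block_error(prev_path, X, c) -> error of codeword c on target block X
    given the hypothesised past path.
    """
    beams = [([], 0.0)]                     # (path of codeword indices, total error)
    for X in blocks:
        candidates = []
        for path, err in beams:
            for k, c in enumerate(codebook):
                candidates.append((path + [k], err + block_error(path, X, c)))
        candidates.sort(key=lambda pe: pe[1])
        beams = candidates[:L]              # retain the L "best" hypotheses
    return beams[0]                         # best overall path and its error

# toy usage: scalar "blocks", squared error with a small carry-over from
# the previous choice standing in for the filter memory
codebook = [-1.0, 0.0, 1.0]

def block_error(path, X, c):
    memory = 0.3 * codebook[path[-1]] if path else 0.0
    return (X - (c + memory)) ** 2

best_path, best_err = ml_tree_search([0.9, -0.5, 1.2], codebook, block_error, L=2)
```

With L=1 this degenerates to the greedy block-by-block search; the paper reports that L=2 already captures most of the gain.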
In stochastic excited LPC the innovation codebook is populated with i.i.d. Gaussian samples. In fact, as pointed out in [6], at rates of half a bit per sample and below, sequences of +1 and -1 are just as good. Algebraic structures from code theory can be used to provide efficient innovation codebooks.
`
`
`
`
Figure 6: simulation results. (SNR in dB versus the tree branching factor L, for the Nordstrom-Robinson code with and without LP-filter quantization, the Reed-Muller code, and the Reed-Muller code without pitch prediction.)
With all the transformations described above, combined with the use of algebraic codes, the CELP coding procedure can be implemented on current DSP chips. A real time implementation, using the Reed-Muller code, has been performed on a single TMS 320-10. For a global rate of 4800 bps, the synthetic speech quality is quite fair. Examples of synthetic speech obtained with this method will be played at the conference.
`
`~~~~~~~~~~
`
`CELP
`
`Several transformations of the basic
`scheme, a new technique called "backward filtering"
`LPC
`combined with the vector quantization of the
`filters and the use of algebraic
`codes have been
`presented in this paper. This leads to a very efficient
`
`
`search procedure which
`enables
`
`
`real time
`implementation with high synthetic speech quality at
`4800 bps.
Our current work focuses on the optimization of the LP codebook, testing new algebraic codes and combinations of codes. Real time implementations with the TMS 320-20 and TMS 320C25 are underway.

The authors wish to express their thanks to Claude Laflamme for many valuable suggestions and for the real time implementation work.
References
[1] M.R. Schroeder and B.S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates", in Proc. Int. Conf. on Acoustics, Speech and Signal Proc., March 1985.
[2] I.M. Trancoso and B.S. Atal, "Efficient procedures for finding the optimum innovation in stochastic coders", in Proc. Int. Conf. on Acoustics, Speech and Signal Proc., paper no. 44.5, April 1986.
[3] J.P. Adoul, C. Lamblin, "A comparison of some algebraic structures for CELP coding of speech", ICASSP, April 1987.
[4] J.P. Adoul, "Speech coding algorithms and vector quantization", chapter 3 of "Advanced digital communications and signal processing", K. Feher editor, Prentice Hall, NJ (1987).
[5] J.P. Adoul, C. Lamblin, A. Leguyader, "Baseband speech coding at 2400 bps using Spherical Vector Quantization", ICASSP, April 1984.
[6] M.R. Schroeder, N.J. Sloane, "New permutation codes with Hadamard unscrambling", IEEE Int. Symp. on Inf. Theory, June 1985.
`
`~
`
`~
`
`§
`
`:
`
`Figure 5: complete coding scheme incorporating backward
`filtering and tree coding.
The performances for block size N=16 samples using the Reed-Muller code [16,5,8] (5/16 bit per sample) and the nonlinear Nordstrom-Robinson code (16,8,6) (1/2 bit per sample) are reported below. These codebooks lead to global rates as low as 2400 and 4800 bps respectively.
The great advantage of algebraic codes versus stochastic codebooks is that fast algorithms can be used to compute the inner products during the search procedure. For the above codes, fast algorithms based on the Fast Hadamard Transform (FHT) are detailed in [3]. This provides an additional cut in the computation, which enables the use of higher rate codebooks with a related increase in the synthetic speech quality.
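As a hedged illustration of why such fast algorithms help (not a reproduction of the algorithms in [3]), the butterfly below computes the inner products of one backward-filtered target with all 2^m Walsh-Hadamard ±1 sequences in N log N additions, instead of N per sequence. The sample values are made up.

```python
def fht(x):
    """Iterative Fast Hadamard Transform (butterflies); len(x) must be a
    power of two. Output[k] is the inner product of x with the k-th +/-1
    Walsh-Hadamard sequence in natural (Sylvester) ordering."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def hadamard_row(k, n):
    """k-th row of the n x n Sylvester Hadamard matrix (+/-1 entries)."""
    return [(-1) ** bin(i & k).count("1") for i in range(n)]

d = [0.3, -1.0, 0.7, 0.2, -0.4, 0.1, 0.5, -0.2]   # hypothetical target
products = fht(d)

# check against direct inner products with every +/-1 Hadamard sequence
direct = [sum(ci * di for ci, di in zip(hadamard_row(k, 8), d))
          for k in range(8)]
```

Codes such as Reed-Muller are built from these Hadamard sequences, which is what makes an FHT-based search applicable to them.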
Experimental results

The CELP coding technique, as represented in figure 5, has been tested both with simulations and a real time implementation. Extensive simulation work has been done to determine the respective influence of the different parameters and the performances of different codes. Pitch prediction is clearly very efficient, while the degradation caused by the quantization of the LPC filters shows that the LPC-filter codebook must be designed with great care. With the Reed-Muller code the synthetic speech quality is quite fair, and it is very high with the Nordstrom-Robinson code (which requires a higher bit rate). The ML-tree coding algorithm is useful at small branching factors L=2 or 3. Figure 6 gives in a concise fashion the impact of rate (5/16 or 1/2 bit per sample), pitch prediction, LP quantization and the tree branching factor on the Signal to Noise Ratio (SNR) in dB.
`1960
`
`49.4.4
`
`Ex. 1026 / Page 4 of 4
`
`