throbber
S1.4
`
`16 KBPS WIDEBAND SPEECH CODING TECHNIQUE
`BASED ON ALGEBRAIC CELP
`
`C. Laflamme, J-P. Adoul, R. Salami, S. Morissette, and P. Mabilleau
`
`Communication Reasearch Center, University of Sherbrooke,
`Sherbrooke, QuCbec, CANADA J1K 2R1
`
`Abstract
`
`The application of Algebraic Code Excited Linear Prediction
`(ACELP) coding to wideband speech is presented. In wideband
`coding, a very large excitation codebook is required in order to
`obtain high quality speech. Ordinary CELP algorithms fail to ac-
`comodate such large codebooks due to the excessive coding com-
`plexity. In ACELP, however, an algebraic codebook with 20 bit
`address can be used, without any storage requirements, and more
`importantly, with a very efficient search procedure which allows
`for real-time implementation. A focused search strategy is used
`in which a very small portion of the codebook is searched, yet
`with performance very close t o that of full search. High quality
`speech was obtained at bit rates below 13 kb/s.
`
`1 INTRODUCTION
`
`Digital coding of 7 kHz speech is becoming of interest for many
`applications such as teleconferencing, commentary channels, high-
`quality voice-mail services, and high-quality wideband telephone
`for the emerging ISDN service. Recent years have witnessed a
`breakthrough in the developement of speech coding techniques,
`however, most of the research has focused on narrow-band speech
`signals where the transmission bandwidth is limited t o 300-3400 Hz.
`Although this bandwidth limitation is acceptable in telephone
`systems, it degrades the speech quality in the above mentioned
`applications where the speech is to be heard through high quality
`loudspeakers. For these systems, a bandwidth of 50-7000 Hz was
`found to be appropriate (corresponding to a sampling frequency
`of 16kHz).
`In 1986, the CCITT approved a 64 kb/s standard for wide-
`band audio coding based on a SB-ADPCM coding algorithm [l].
`Reducing this bit rate yields larger spectral efficiency and allows
`for more users to be accomodated in the communication systems
`which have limitations in the transmission bandwidth. Very effi-
`cient speech coding algorithms have been recently developed for
`narrowband digital speech where high quality speech can be ob-
`tained at bit rates as low as 4.8kb/s [2]. Code-excited linear
`prediction (CELP) coding has proven to be the most promising
`among these algorithms. However, few studies have attempted
`to apply CELP to the context of wideband speech. The main
`drawback of CELP is its gross computational complexity. As the
`sampling frequency is doubled, larger frame sizes are needed to
`maintain a low bit rate transmission. Consequently, the use of
`much larger excitation codebooks becomes inevitable. For in-
`
`stance, if we assume the same proportional bit rates and block
`lengths, the typical codebook size increases from a thousand en-
`tries (10 bits) to a million entries (20 bits). Searching and storing
`such a codebook size is rather impractical, unless some subobti-
`mal approaches are utilized such as multistage codebooks, or a
`split-band approach.
`From the above discussion, it seems that it is i
`a full band approach for CELP coding of wideband speech. How-
`ever, the Algebraic Code Excited Linear Prediction (ACELP)
`approach [3,4] offers the solution to this problem. Using ACELP
`along with a focused search technique enables the utilization of
`codebooks having over a million entries, yet with a search com-
`plexity that can be implemented on current DSP technology. We
`describe in this paper the codebook structure used by the ACELP
`and explain the strategy deployed for the efficient search of the
`optimum excitation sequence. We also report on the application
`of ACELP t o wideband audio coding where high quality speech
`was obtained a t bit rates below 16 kb/s. We describe the coding
`parameters and give an assessement of the coder complexity.
`
`2 OVERVIEW OF CEL
`
`In CELP coders, the excitation s
`and LPC synthesis filters is an entry to a large stochastic code-
`book scaled by a gain factor. The optimum excitation vector is
`selected by the exhaustive search of the excitation codebook for
`the codeword which minimizes the mean squared weighted error
`between the original and synthesized speech. If ck is the excita-
`tion vector at index k, then the mean squared weighted error is
`given by
`
`Ek = IIx - gHCkll*,
`(1)
`where x is the target vector given by the weighted input speech
`after subtracting the zero-input response of the weighted synthe-
`sis filter I / A ( z / 7 ) , g is a scaling gain factor, and H is a lower
`triangular convolution matrix constructed from the impulse re-
`sponse of the weighted synthesis filter. Setting a E / a g = 0 in
`Equation (1) yields
`
`X T H C k
`clHTHck ’
`and substituting Equation (2) in (1) gives
`
`g=-
`
`(2)
`
`- 1 3 -
`
`CH2977-719110000-0013 $1.00 Q 1991 IEEE
`
`Ex. 1039 / Page 1 of 4
`Apple v. Saint Lawrence
`
`

`

`tilt and varies in every excitation frame. The parameters of the
`filter depend on the bit rate.
`A sparse algebraic codebook is used. The codebook consists
`of a set of interleaved permutation codes containing few nonzero
`elements. The pulse amplitude are fixed to either 1 or -1, and
`each pulse can take a number of distinct positions. Thus an
`excitation vector ( or a codeword) is determined by the positions
`of its nonzero pulses (as pulse amplitudes are fixed), and the
`positions are coded and transmitted. Such a codebook does not
`require any storage and can be searched very efficiently as we
`will see in the next section. Further, the codebook structure is
`robust against channel errors as a transmission error will change
`the position of only one excitation pulse.
`To further illustrate the codebook structure, we describe the
`2" sized codebook used in our implementation. Excitation frames
`of 80 samples (5 ms) are used. Every frame contains 5 pulses with
`amplitudes 1, -1, 1, -1, and 1, respectively. Each pulse can take
`16 distinct positions. Each position is encoded with 4 bits result-
`ing into a 20 bit codebook. An algebraic vector a(n) is given
`by
`
`b;6(n - mi), n = 0,. ... N - 1,
`
`(8)
`
`P-1
`u(n) =
`i = O
`where p is the number of pulses (5 in our case), bi are the pulse
`amplitudes (1 for i even and -1 for i odd), mi are the pulse
`positions. In our case, the ith pulse has 16 possible positions
`my) = .
`given by
`i = O ..... 4,
`j = 0,. .. ,15.
`This codebook is a subset of the set of all the combinations of
`80-dimentional vectors containing 5 pulses with fixed amplitudes
`('OC5 = 24 millions). The codewords of this codebook can be
`represented by points uniformly distributed over the surface of
`the hyper sphere in 80-dimentional space. Notice that each pulse
`has 16 positions distinct from those of the other pulses, and the
`pulses in the excitation vector can be at any position in the entire
`frame. Since we search for the best position of each pulse (with
`the constraints of fixed amplitudes) then the selected codeword
`will have, to some extent, optimized pulse positions, and will
`have less unnecessary components BS compared to other CELP
`structures.
`
`The optimum codeword is selected by minimizing the term
`
`(XTYk)'
`G=-,
`
`(4)
`f f k
`where yk = H C k is the zero-state response of the weighted syn-
`thesis filter to the codeword ck and f f k = yTy is the energy of the
`filtered codeword yk. The CELP complexity stems from the need
`to compute the filtered codewords Y k for every possible codebook
`entry. The numerator in Equation (4) represents the cross corre-
`lation between the target vector and the filtered codeword, and
`the need to compute yk is circumvented by the use of backward
`filtering, where Equation (4) can be expressed by
`
`(5)
`where 9 = HTx is the backward filtered target vector. Com-
`puting the energy term CYk remains the main problem, and in
`ACELP, it is efficiently determined with very few operations by
`the use of algebraic codebooks with few nonzero elements. In the
`next sections we will describe the codebook structure and the ef-
`ficient strategy deployed for the search of the optimum codeword.
`
`3 CODEBOOK STRUCTURE IN ACELP
`
`We have recently proposed a general framework for representing
`innovation codebooks in stochastically excited linear predictive
`coders [4]. The structure of this general framework is depicted in
`Figure 1. In this structure, innovation codebooks are generated
`from: an algebraic codebook { a b } which is the set of vectors ao,
`.... aL-1, and a shaping matrix F. Thus, an excitation vector is
`given by
`
`The advantage of this structure is that the codebook search is de-
`coupled from the codebook properties. The algebraic codebook
`can be properly chosen so that it is very efficiently searched, and
`need not be stored. The shaping matrix renders the flexibilty
`in obtaining any desired codebook properties. It can be fixed or
`dynamically changed to control the statistical properties of the
`codebook in time and/or frequency domains. This framework
`can represent a wide range of efficient innovation codebook struc-
`tures [4]. In the rest of this section, we will describe the shaping
`matrix and the algebraic codebook used in our implementation.
`The shaping matrix used in the ACELP coder is a function
`of the LPC model A(r). Its main role is t o shape the excitation
`vectors in the frequency domain so that their energies are con-
`centrated in the important frequency bands. In fact, the matrix
`F has a similar role to that of postfiltering [5], in the sense that it
`enhances the formant regions in the reconstructed speech. How-
`ever, it has the advantage that it is embedded in the codebook
`search procedure. In this case, the energy scaling problem which
`we face when using postfiltering is eliminated. The shaping ma-
`trix used is a Toeplitz lower triangular matrix constructed from
`the impulse response of the filter
`
`F ( 2 ) = (1 - p2-l)- 4 2 / 7 1 )
`A(z/72) '
`where A(r) is the LPC inverse filter, 71 and 72 are constants such
`that 71 < 72 < 1, and p is a factor which controls the spectral
`
`(9)
`
`In novation codebook
`
`..................................... L
`
`f
`........... ;. ...........................
`k i
`A(z) i
`
`Figure 1 A general framework for codebook representation in
`stochastic linear predictive coders.
`
`- 1 4 -
`
`Ex. 1039 / Page 2 of 4
`
`

`

`4 EFFICIENT SEARCH STRATEGY
`
`From Equations (3) and (6), the optimum codeword is deter-
`mined by maximizing the term
`
`(X’HFak)’
`
`( X T F z a k ) z
`-
`i-k = ac(HF)THFak - aZFTFzak ’
`Notice that the matrix
`
`Fz=HF
`is a lower triangular matrix containing the impulse response of
`the combined filter
`
`(10)
`
`The search can now be brought back to the algebraic space by
`backward filtering the target vector with the combined filter
`Fz(z). The term in Equation (IO) can be written as
`
`where d = FTx is the backward filtered target vector, and @ a
`matrix containing the autocorrelations of the impulse response
`of Fz(z). As the vector a contains only p nonzero pulses, Equa-
`tion (13) can be written as
`(Er:: d(m1)bl)’
`> =
`CfZ: $(m,,m,) + 2 C f Z i Cy;,?+, bib,$(m,mj)
`As the amplitudes L,, i = 0,. . . , p - 1, are fixed to 1 or -1, the
`correlation requires p additions and the energy p(p + 1)/2 ad-
`ditions and one multiplication. However, by changing only one
`pulse position at a time, updating the term in Equation (14) be-
`comes much simpler. The search is performed in p nested loops,
`where each loop corresponds to one pulse position. In each loop,
`the contribution of a new pulse is added. Thus, in the most inner
`loop, the contribution of the last pulse is added, so the correla-
`tion requires one addition and the energy p additions and one
`
`(14)
`
`’
`
`multiplication (6 additions for our codebook). If an exhaustive
`search is to be carried out, this approach is more efficient than
`the best known CELP algorithms such as overlapp
`or VSELP.
`Although our search procedure is very efficient, the search
`becomes rapidly demanding as the codebook size exceeds 212.
`For huge codebooks such as the one
`clever search strategy has t o be followe
`subsection an approach which we call
`very small subset of the codebook is
`performance very close to that of full search.
`
`4.1 Focused Search
`
`Unlike partial search which operates on an arbitrary partial sub-
`set, in focused search it is possible to infer if the current codebook
`subset stands a chance of holding the winner. If the chances are
`good, the search is continued, otherwise we pas
`set of the codebook.
`Figure 2 shows a scatter plot of the absolute
`Id*akl versus the square root of the energy of filtered codewords,
`6 , where a codebook of 4096 entries is used. Note that the
`and is
`winning codeword is the one which maximizes Ck/&,
`represented in the plot by the point having maximum slope. It
`is observed that only a small portion of the points (codewords)
`stands a chance for being the winner, and the majority of points
`have slopes less than half that of the winner. Therefore, the
`reduced if the search al-
`ch have slopes very close
`e, the term is computed
`ulse in every inner loop.
`, one can decide whether
`the search should be continued or not by comparing the term
`with some predifined threshold. The threshold is set at the be-
`ginning of the search and is given by a fraction of the term at
`which the correlation is maximum. Figure 3 shows the points
`which has been searched at the most inner loop. In this specific
`example, less than 2% of the codebook
`
`SOUARE ROOT OF ENERGY
`
`Figure 2 A typical scatter plot of absolute correlation versus
`square root of energy of filtered codewords (4096 points).
`
`SQUARE ROOT OF ENERGY
`Figure 3 Same as Figure 2 deploying the focused search
`approach. For this typical subframe only 88 codewords are
`searched in the most inner loop.
`
`- 1 5 -
`
`Ex. 1039 / Page 3 of 4
`
`

`

`I
`
`I
`
`I
`
`,
`
`
`
`timum codeword is found. Note that the amount of codebook
`searched varies from one frame to another. Thus for real-time
`considerations, the search has to be stopped if it is much larger
`than the average search value.
`In our codebook five nonzero pulses are used, and every pulse
`can have 16 different positions (4 bits). We use two thresholds,
`one after the third pulse and the other after the fourth pulse.
`Even with very low thresholds, a substantial amount of the search
`is cut down without causing any significant degradation in the
`speech quality. Table 1 shows the SNR for different percentages
`of codebook search, for a sentence uttered by a female speaker.
`The search complexity is significantly reduced by increasing the
`threshold values. Searching about 0.05% of the codebook resulted
`in only 0.5 d B drop in SNR as compared to the full search case.
`I Percentaoe search I SNR fdBl I
`I 22.2
`100
`I 22.14
`4
`I 22.0
`I 1.6
`I 22.05
`0.2
`I 21.83
`0.15
`0.05
`121.8
`I 21.5
`0.03
`Table 1 SNRa for different percentages of codebook search.
`
`I
`
`5 WIDEBAND ACELP CODING
`
`In this section, we report on the work we are currently carry-
`ing out on high quality wideband coding below I6kb/s. The
`input speech is filtered at 50-7000Hz and sampled at 16kHz as
`in CCITT wideband specifications.
`Due to the efficiency of the ACELP codebook search proce-
`dure, and to the ability to incorporate very huge codebooks as
`described earlier, a full-band coding approach is adopted. The
`LPC parameters are updated every 15 ms and computed using a
`20ms Hamming window centered at the end of the frame. The
`speech frame is divided into 3 subframes of 5 ms. The pitch and
`codebook parameters are updated every subframe. The LPC pa-
`rameters are interpolated between adjacent frames (using LSFs)
`to obtain a different set of parameters for every subframe. Con-
`cerning the filter order, using 16 LPC coefficients was found ad-
`equate to represent the speech short-time spectrum envelope. A
`portion of the bandwidth expansion was used before the param-
`eter computation by lag windowing the autocorrelation coeffi-
`cients.
`Concerning the pitch analysis, a one-tap pitch predictor is
`used, and is updated every 5ms (80 samples). The pitch delay
`extends from 40 to 295 samples, and is encoded with 8 bits. The
`pitch delay has twice the resolution of the narrow band case. In-
`creasing the pitch resolution did not result in any considerable
`improvement due to the large excitation codebook used. Frac-
`tional pitch could be more beneficial at lower bit rates. The
`pitch parameters are computed in a closed loop approach, with
`a weighting factor 7 = 0.6. For delays less than the subframe
`length, the past excitation is extended by the short-term predic-
`tion residual. Thus, these delays are not treated seperately, as
`would be the case if the excitation itself is repeated. The impulse
`
`response of the weighted synthesis filter is truncated at 30, and
`the convolutions and energies are easily updated.
`The codebook is efficiently searched as described in the previ-
`ous section. A weighting factor of 7 = 0.9 is used. The thresholds
`were chosen such that, at the most, 1000 codewords are searched.
`High quality speech at bit rates around 13 kbs was obtained. The
`quantization procedures for the filter coeficients and excitation
`parameters will be detailed in a forthcoming paper.
`As we are using a full band approach, it is not abvious how
`the bits are allocated in different frequency bands. However, the
`ACELP algorithm implicitly favours the bands with more energy,
`and this is made more evident by the use of the shaping matrix
`which enhances the formants of the speech spectrum. There-
`fore the encoding algorithm accurately represents the frequency
`regions which are perceptually important.
`
`6 CONCLUSION
`
`We presented in this paper an efficient approach for encoding
`wideband speech signals using algebraic CELP. A full band en-
`coding approach was used whereby a codebook of 220 entries was
`used. We described an efficient procedure for searching such a
`large codebook deploying a focused search strategy, where less
`than 0.1% of the codebook is searched with performance very
`close to that of full search. High quality speech at a bit rate of
`13 kb/s was obtained.
`
`Acknowledgment
`
`This work was supported, in part, by the Canadian Workplace
`Automation Research Center (CWARC).
`
`References
`
`X.Maitre, “7 kHz audio coding within 64 kbit/s,” IEEE J.
`on Selec. Areas in Commun., Vol. 6, No. 2, pp. 283-298,
`Feb. 1988.
`
`B.S.Atal et al. (eds.), Adoances i n Speech Coding, Kluwers
`Academic Pub., 1991.
`
`J-P.Adou1 et al, “Fast CELP coding based on algebraic
`codes,” Proc. ICASSP’87, pp. 1957-1960.
`C.Ldamme et al., “On reducing computational complex-
`ity of codebook search in CELP coder through the use of
`algebraic codes,” Proc. ICASSP’SO, pp. 177-180.
`
`J.H. Chen and A.Gersho, “Real-time vector APC speech
`coding at 4800 bps with adaptive postfiltering,” Proc.
`ICASSP’87, pp. 2185-2188.
`
`- 16 -
`
`Ex. 1039 / Page 4 of 4
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket