A WIDEBAND CODEC AT 16/24 KBIT/S WITH 10 MS FRAMES

R. Salami, R. Lefebvre, and C. Laflamme
Department of Electrical Engineering, University of Sherbrooke,
Sherbrooke, Quebec, Canada J1K 2R1

ABSTRACT
This paper describes a wideband speech/audio codec at 16/24 kbit/s with 10 ms frames. The algorithm uses an ACELP model at 16 kbit/s and a switched ACELP/TCX model at 24 kbit/s. Adaptive preemphasis is used to improve the performance at high frequencies, and a hybrid forward/backward LP filter is used to improve the performance for stationary signals. Subjective tests showed that, for speech signals, the codec performance at 16 and 24 kbit/s is equivalent to that of G.722 at 48 and 56 kbit/s, respectively. For music signals, the performance of the codec at 24 kbit/s was equivalent to that of G.722 at 48 kbit/s.

1. INTRODUCTION
Compression of wideband speech and audio (7 kHz bandwidth) is increasingly needed in many applications such as videoconferencing and the Internet. The ITU-T has recently started a standardization activity for a wideband codec at 16 and 24 kbit/s, which is required to perform similarly to G.722 at 48 and 56 kbit/s, respectively, in most operating conditions. Initially, two modes were proposed: Mode A with 25 ms delay (the delay is taken as twice the frame size plus the lookahead) and Mode B with a longer delay (60 ms) and lower complexity (15 MIPS). This paper describes a codec that was proposed for Mode A standardization. Section 2 describes the codec principles and Section 3 discusses the codec's performance. The conclusions are given in Section 4.
2. CODEC PRINCIPLES
2.1. Coding model and bit allocation
The coder uses the algebraic code-excited linear predictive (ACELP) coding model [1] at 16 kbit/s and a switched ACELP/TCX (transform coded excitation [2]) model at 24 kbit/s. It operates on 10 ms speech frames (160 samples at a sampling frequency of 16000 samples/s). An adaptive preemphasis procedure is performed before the encoding process (2 bits are used to quantize the preemphasis filter). A hybrid forward/backward linear prediction (LP) analysis is used. The short-term prediction parameters (LP parameters) are transmitted every speech frame, and 1 bit indicates the LP mode (forward or hybrid forward/backward). The speech frame is divided into 2 subframes of 5 ms (80 samples), and the pitch and algebraic codebook parameters are transmitted every subframe. The bit allocation of the coder is shown in Table 1. The LP parameters are quantized with 34 bits. The pitch lag is encoded with 9 bits in the first subframe and 6 bits in the second subframe. In each subframe, the pitch gain is quantized with 4 bits and the fixed codebook gain with 5 bits. The only difference between the two bit-rate modes is the size of the innovation codebook: the innovation codebook index is encoded with 45 bits per subframe in the 16 kbit/s mode and with 85 bits per subframe in the 24 kbit/s mode. The algorithm can be switched dynamically between the two bit rates from frame to frame.

Parameter              1st subfr   2nd subfr   Per frame
LP mode                                            1
Preemphasis filter                                 2
ISPs                                              34
Pitch delay                9           6          15
Pitch gain                 4           4           8
Innovation codebook       45          45          90
Codebook gain              5           5          10
Total                                            160

Table 1. Bit allocation of the coding algorithm at 16 kbit/s. At 24 kbit/s the innovation codebook uses 85 bits per subframe instead of 45 bits.
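As a consistency check on Table 1, the short Python sketch below (helper names are hypothetical) sums the per-frame allocation for both innovation codebook sizes. With 10 ms frames, bits per frame divided by 10 gives the rate in kbit/s.

```python
# Per-frame bit allocation taken from Table 1 (10 ms frames).
FRAME_MS = 10

FIXED_BITS = {
    "LP mode": 1,
    "preemphasis filter": 2,
    "ISPs": 34,
    "pitch delay": 9 + 6,      # 1st + 2nd subframe
    "pitch gain": 4 + 4,
    "codebook gain": 5 + 5,
}


def frame_bits(innovation_bits_per_subframe: int) -> int:
    """Total bits per frame for a given innovation codebook size."""
    return sum(FIXED_BITS.values()) + 2 * innovation_bits_per_subframe


def bit_rate_kbps(innovation_bits_per_subframe: int) -> float:
    """Bits per frame divided by frame length in ms equals kbit/s."""
    return frame_bits(innovation_bits_per_subframe) / FRAME_MS


for mode, innovation_bits in (("16 kbit/s", 45), ("24 kbit/s", 85)):
    print(mode, frame_bits(innovation_bits), bit_rate_kbps(innovation_bits))
```

Running it prints 160 and 240 bits per frame, i.e. 16.0 and 24.0 kbit/s: the 80 extra innovation bits per frame are the only difference between the two modes.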
2.2. Pre-processing
The pre-processing block performs adaptive preemphasis. One of four possible 2nd-order filters, P(z), can be used (2 bits). The preemphasis filter is determined from a 2nd-order LP analysis of the input signal. Preemphasis significantly improves the codec performance at high frequencies, and it gave better performance than introducing a tilt filter into the perceptual weighting filter [3]. A second advantage of preemphasis is that it reduces the dynamic range of the input signal, which facilitates the fixed-point implementation of the algorithm.
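The paper does not give the four P(z) coefficient sets or the exact selection rule. The Python sketch below therefore assumes a hypothetical table `PREEMPH_TABLE` and a hypothetical rule: run a 2nd-order LP analysis (autocorrelation method, Levinson-Durbin) and pick the table entry whose coefficients are closest to the LP inverse filter A(z). Under this rule a strongly low-pass frame receives the strongest high-frequency boost.

```python
# HYPOTHETICAL table of four 2nd-order preemphasis filters
# P(z) = 1 + p1 z^-1 + p2 z^-2, addressed by a 2-bit index.
# The coefficient values are illustrative, not taken from the paper.
PREEMPH_TABLE = [(-0.2, 0.0), (-0.4, 0.05), (-0.6, 0.1), (-0.8, 0.15)]


def autocorr(x, max_lag):
    """Autocorrelation r(k) = sum_n x(n) x(n-k) for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def lp_order2(x):
    """2nd-order LP analysis (autocorrelation method, Levinson-Durbin).

    Returns (a1, a2) of the inverse filter A(z) = 1 + a1 z^-1 + a2 z^-2.
    """
    r = autocorr(x, 2)
    if r[0] <= 0.0:
        return 0.0, 0.0            # silent frame: treat spectrum as flat
    k1 = -r[1] / r[0]              # first reflection coefficient
    err = r[0] * (1.0 - k1 * k1)   # order-1 prediction error energy
    if err <= 0.0:
        return k1, 0.0
    k2 = -(r[2] + k1 * r[1]) / err
    return k1 + k2 * k1, k2


def select_preemphasis(x):
    """2-bit index of the table entry closest to the LP inverse filter."""
    a1, a2 = lp_order2(x)
    dist = [(p1 - a1) ** 2 + (p2 - a2) ** 2 for p1, p2 in PREEMPH_TABLE]
    return dist.index(min(dist))


def preemphasize(x, index):
    """y(n) = x(n) + p1 x(n-1) + p2 x(n-2), zero initial state."""
    p1, p2 = PREEMPH_TABLE[index]
    return [x[n]
            + p1 * (x[n - 1] if n >= 1 else 0.0)
            + p2 * (x[n - 2] if n >= 2 else 0.0)
            for n in range(len(x))]


frame = [1.0] * 160                 # strongly low-pass (DC) frame
idx = select_preemphasis(frame)     # 3: the strongest boost in the table
emphasized = preemphasize(frame, idx)
```

An alternating (high-pass) frame selects index 0, the mildest filter. Only the 2-bit index is transmitted, so the decoder can apply the exact inverse filter.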
2.3. Short-term prediction
Short-term prediction, or linear prediction (LP), analysis is performed once per speech frame (with a lookahead of 5 ms). The 16th-order LP parameters are quantized with 34 bits and used for the second subframe, while the first subframe uses interpolated filters. The quantization and interpolation are performed in the immittance spectral pair (ISP) domain [4], using predictive split vector quantization of the ISPs.
To improve the quality for music signals, a hybrid forward/backward LP filter configuration is used. Depending on a stationarity criterion, the LP filter is either a forward filter or a hybrid forward/backward filter; 1 bit is used to signal the LP mode.
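The subframe interpolation can be sketched as follows. The sketch assumes a weight of 0.5 between the previous and current frame parameters (the paper does not state the weight). It operates only on the ordered frequency part of the ISP representation, here called ISFs. The separate last ISP parameter of a 16th-order filter is omitted. Because a convex combination of two strictly increasing sequences is itself strictly increasing, interpolation preserves the ordering property. As with line spectral pairs, that property is associated with a stable synthesis filter.

```python
# ASSUMED interpolation weight for the first subframe (not given in the paper).
FIRST_SUBFRAME_WEIGHT = 0.5


def interpolate(prev_isf, curr_isf, weight=FIRST_SUBFRAME_WEIGHT):
    """Element-wise weight*prev + (1 - weight)*curr."""
    if len(prev_isf) != len(curr_isf):
        raise ValueError("vectors must have the same length")
    return [weight * p + (1.0 - weight) * c for p, c in zip(prev_isf, curr_isf)]


def is_ordered(freqs):
    """True if the frequencies are strictly increasing."""
    return all(a < b for a, b in zip(freqs, freqs[1:]))


def subframe_isfs(prev_isf, curr_isf):
    """Parameters for the two 5 ms subframes: interpolated, then current."""
    return [interpolate(prev_isf, curr_isf), list(curr_isf)]


# Two ordered 15-element frequency vectors (radians), made up for illustration.
prev = [0.19 * (i + 1) for i in range(15)]
curr = [0.19 * (i + 1) + 0.03 for i in range(15)]
first, second = subframe_isfs(prev, curr)
```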
2.4. Long-term prediction analysis
The pitch parameters are the delay and gain of the pitch filter. In the first subframe, a fractional pitch delay is used with a resolution of 1/3 in the range [29 1/3, 158 2/3] and integer resolution only in the range [159, 281]. For the second subframe, a resolution of 1/3 is always used in the range [T1 - 10 2/3, T1 + 9 2/3], where T1 is the integer nearest to the fractional pitch lag of the first subframe.
To simplify the pitch analysis procedure, a two-stage approach is used [5]. First, an open-loop pitch estimate T_op is computed every frame (10 ms) from the weighted speech signal s_w(n); the weighted speech is low-pass filtered and decimated by 3 to simplify this search. Second, a closed-loop pitch analysis is performed around the open-loop estimate on a subframe basis. In the first subframe, the range T_op ± 9, bounded by 30-281, is searched.

0-7803-4073-6/97/$10.00 © 1997 IEEE.
Ex. 1024 / Page 1 of 2
Apple v. Saint Lawrence

For the second subframe, closed-loop pitch analysis is performed around the pitch selected in the first subframe, as described above. The pitch delay is encoded with 9 bits in the first subframe, and the relative delay of the second subframe is encoded with 6 bits.
The pitch gain is quantized using 4-bit scalar quantization.
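The first-subframe ranges account for exactly 2^9 = 512 delay values: (158 2/3 - 29 1/3) x 3 + 1 = 389 fractional values plus 281 - 159 + 1 = 123 integer values. The Python sketch below enumerates them with a hypothetical index layout (fractional values first, then integers). It also computes T1, the integer nearest to the first-subframe lag. The codec's actual bit mapping is not specified in the paper.

```python
import math
from fractions import Fraction

# Ranges from Section 2.4; the index layout below is HYPOTHETICAL.
FRAC_MIN = Fraction(88, 3)     # 29 1/3
FRAC_MAX = Fraction(476, 3)    # 158 2/3
INT_MIN, INT_MAX = 159, 281

N_FRAC = ((FRAC_MAX - FRAC_MIN) * 3).numerator + 1   # 389 values, step 1/3
N_INT = INT_MAX - INT_MIN + 1                        # 123 values, step 1
assert N_FRAC + N_INT == 2 ** 9                      # exactly 9 bits


def encode_delay(delay: Fraction) -> int:
    """Map a first-subframe pitch delay to a 9-bit index."""
    if FRAC_MIN <= delay <= FRAC_MAX:
        steps = (delay - FRAC_MIN) * 3
        if steps.denominator == 1:
            return steps.numerator
    elif delay.denominator == 1 and INT_MIN <= delay <= INT_MAX:
        return N_FRAC + delay.numerator - INT_MIN
    raise ValueError(f"delay {delay} is not on the quantization grid")


def decode_delay(index: int) -> Fraction:
    """Inverse of encode_delay."""
    if not 0 <= index < N_FRAC + N_INT:
        raise ValueError("index out of range")
    if index < N_FRAC:
        return FRAC_MIN + Fraction(index, 3)
    return Fraction(INT_MIN + index - N_FRAC)


def nearest_integer(delay: Fraction) -> int:
    """T1: integer nearest to the first-subframe lag (no ties at 1/3 steps)."""
    return math.floor(delay + Fraction(1, 2))
```

Exact `Fraction` arithmetic avoids the rounding ambiguity a float representation of 1/3-sample delays would introduce.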
2.5. Innovation codebook structure
At 16 kbit/s, a 45-bit algebraic codebook is used. The 80 positions in a subframe are divided into 5 interleaved tracks, and the innovation vector contains 10 non-zero pulses, 2 in each track. All pulses have amplitude +1 or -1. The positions and signs of the two pulses in a given track are encoded with 9 bits, giving a total of 45 bits. The codebook is searched using the fast procedure described in [6].
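Each track holds 80/5 = 16 positions, so two positions take 4 + 4 bits and only 1 bit remains for two signs. The paper does not describe the packing. The Python sketch below assumes a technique commonly used in ACELP codecs: only the first pulse's sign is sent, and the second sign is implied by the order in which the two positions are stored. The function names are hypothetical, and both pulses are assumed to lie on the same track.

```python
L_SUBFR = 80               # samples per 5 ms subframe
N_TRACKS = 5               # interleaved tracks
POS_BITS = 4               # 80 / 5 = 16 positions per track


def track_positions(track):
    """Positions belonging to a track: track, track + 5, ..., track + 75."""
    return list(range(track, L_SUBFR, N_TRACKS))


def encode_two_pulses(pos_a, sign_a, pos_b, sign_b):
    """9-bit index: 1 sign bit + two 4-bit positions within the track.

    ASSUMED scheme: same signs -> smaller position stored first;
    opposite signs -> larger position stored first.
    """
    ia, ib = pos_a // N_TRACKS, pos_b // N_TRACKS
    if sign_a == sign_b:
        first, second, sign = min(ia, ib), max(ia, ib), sign_a
    elif ia == ib:
        raise ValueError("opposite-sign pulses at one position cancel out")
    elif ia > ib:
        first, second, sign = ia, ib, sign_a
    else:
        first, second, sign = ib, ia, sign_b
    sign_bit = 0 if sign > 0 else 1
    return (sign_bit << (2 * POS_BITS)) | (first << POS_BITS) | second


def decode_two_pulses(index, track):
    """Return [(position, sign), (position, sign)] for one track."""
    mask = (1 << POS_BITS) - 1
    sign = -1 if (index >> (2 * POS_BITS)) & 1 else 1
    first, second = (index >> POS_BITS) & mask, index & mask
    second_sign = sign if first <= second else -sign   # order implies sign
    return [(track + N_TRACKS * first, sign),
            (track + N_TRACKS * second, second_sign)]
```

For example, pulses +1 at position 8 and -1 at position 33 (both on track 3) are stored with position 33 first and sign bit 1; the decoder sees the descending order and flips the sign of the pulse at position 8 back to +1.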
At 24 kbit/s, the innovation codebook is based on either an algebraic codebook structure or a transform coded excitation (TCX) structure. The algebraic codebook is better suited to transient frames and attacks, while the TCX codebook is used in stationary periods. In the algebraic codebook case, the innovation vector contains 20 non-zero pulses, 4 in each of the 5 tracks, all with amplitude +1 or -1. In the TCX case, the target vector for the codebook search is quantized in the transform domain [2].
The fixed codebook gain is quantized with 5-bit scalar quantization after applying a 2nd-order moving average (MA) prediction to the innovation energy in the logarithmic domain.
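The gain quantizer can be sketched as follows. The MA coefficients, mean energy and quantizer range are hypothetical values chosen for illustration; the paper specifies only a 2nd-order MA predictor in the log domain and 5 bits. The predictor estimates the energy (in dB) of the scaled innovation from the last two quantized prediction errors. Only the error between actual and predicted energy is quantized. The encoder and decoder keep identical predictor states.

```python
import math

# HYPOTHETICAL parameters; the paper states only "2nd-order MA, 5 bits".
MA_COEFS = (0.5, 0.25)            # weights for the last two quantized errors
MEAN_ENERGY_DB = 30.0             # assumed mean innovation energy
N_LEVELS = 2 ** 5                 # 5-bit scalar quantizer
ERR_MIN_DB, ERR_MAX_DB = -20.0, 20.0
STEP_DB = (ERR_MAX_DB - ERR_MIN_DB) / (N_LEVELS - 1)


def innovation_energy_db(code):
    """Mean energy of the unscaled innovation vector, in dB."""
    return 10.0 * math.log10(sum(v * v for v in code) / len(code))


class FixedGainQuantizer:
    """Encoder and decoder each keep one instance; their states stay in sync."""

    def __init__(self):
        self.past_err_db = [0.0, 0.0]   # last two quantized prediction errors

    def _predicted_gain(self, code):
        pred_db = MEAN_ENERGY_DB + sum(
            b * e for b, e in zip(MA_COEFS, self.past_err_db))
        # Gain that would give the scaled innovation exactly pred_db.
        return 10.0 ** ((pred_db - innovation_energy_db(code)) / 20.0)

    def _update(self, err_q_db):
        self.past_err_db = [err_q_db] + self.past_err_db[:-1]

    def quantize(self, gain, code):
        g_pred = self._predicted_gain(code)
        err_db = 20.0 * math.log10(gain / g_pred)      # prediction error
        idx = round((err_db - ERR_MIN_DB) / STEP_DB)
        idx = min(max(idx, 0), N_LEVELS - 1)
        err_q_db = ERR_MIN_DB + idx * STEP_DB
        self._update(err_q_db)
        return idx, g_pred * 10.0 ** (err_q_db / 20.0)

    def dequantize(self, idx, code):
        g_pred = self._predicted_gain(code)
        err_q_db = ERR_MIN_DB + idx * STEP_DB
        self._update(err_q_db)
        return g_pred * 10.0 ** (err_q_db / 20.0)


code = [1.0 if n % 10 == 0 else 0.0 for n in range(80)]   # 8 unit pulses
enc, dec = FixedGainQuantizer(), FixedGainQuantizer()
for g in (40.0, 60.0, 100.0, 80.0):
    idx, g_enc = enc.quantize(g, code)
    g_dec = dec.dequantize(idx, code)   # decoder reproduces the encoder gain
```

Predicting in the log domain makes the quantization error a fixed number of dB regardless of the gain's absolute size. Inside the quantizer range, the quantized gain stays within half a step (about 0.65 dB here) of the true gain.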
2.6. Decoder
The decoder decodes the transmitted parameters (LP parameters, adaptive codebook vector, algebraic code vector, and gains) and performs synthesis to obtain the reconstructed speech.
The output of the LP synthesis filter is passed through the post-processing block, which performs adaptive deemphasis (the inverse of the preemphasis performed in pre-processing) to restore the original dynamics of the speech signal.
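Deemphasis can be sketched as the all-pole filter 1/P(z) applied to the synthesis output. The coefficient table below is hypothetical. The one property that matters is that every P(z) in it has its roots inside the unit circle, so 1/P(z) is stable. With the same coefficients and zero initial states on both sides, deemphasis exactly undoes preemphasis up to floating-point error.

```python
# HYPOTHETICAL preemphasis coefficients (p1, p2) for P(z) = 1 + p1 z^-1 + p2 z^-2.
# Every P(z) here has its roots inside the unit circle, so 1/P(z) is stable.
PREEMPH_TABLE = [(-0.2, 0.0), (-0.4, 0.05), (-0.6, 0.1), (-0.8, 0.15)]


def preemphasize(x, p1, p2):
    """Encoder side, FIR P(z): y(n) = x(n) + p1 x(n-1) + p2 x(n-2)."""
    return [x[n]
            + p1 * (x[n - 1] if n >= 1 else 0.0)
            + p2 * (x[n - 2] if n >= 2 else 0.0)
            for n in range(len(x))]


def deemphasize(y, p1, p2):
    """Decoder side, all-pole 1/P(z): x(n) = y(n) - p1 x(n-1) - p2 x(n-2)."""
    x = []
    for n, yn in enumerate(y):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        x.append(yn - p1 * x1 - p2 * x2)
    return x


signal = [float((7 * n) % 11) - 5.0 for n in range(160)]
p1, p2 = PREEMPH_TABLE[3]
restored = deemphasize(preemphasize(signal, p1, p2), p1, p2)
```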
3. CODEC PERFORMANCE
The codec was tested in compliance with the qualification test plan set by the ITU [7]. The test consisted of three experiments: Experiment 1a tested the codec performance for speech (single talkers without background noise), Experiment 1b tested the performance for music signals, and Experiment 2 tested the performance for speech with background noise. Table 2 gives some of the results of Experiment 1a at the nominal input level of -26 dBov (26 dB below overload). For each condition, the table gives the combined Mean Opinion Score (MOS), the standard deviation (σ), the difference d between the reference and candidate codec (reference MOS minus candidate MOS), and the 95% confidence interval (CI).

Condition                 MOS     σ       d      CI
G.722 48k                 3.41   0.78
codec 16k                 3.33   0.99    0.08   0.18
G.722 56k                 3.77   0.78
codec 24k                 3.71   0.87    0.06   0.17
G.722 48k 0.001 BER       2.36   0.91
codec 16k 0.001 BER       2.68   1.02   -0.32   0.19
G.722 56k 0.001 BER       2.59   0.88
codec 24k 0.001 BER       2.85   1.11   -0.26   0.20
G.722 48k 2 tandem        2.87   0.71
codec 16k 2 tandem        2.55   0.98    0.32   0.17
G.722 56k 2 tandem        3.34   0.73
codec 24k 2 tandem        3.09   1.02    0.25   0.18

Table 2. Test results from Experiment 1a.

The codec meets the requirements for clean speech and is significantly better than G.722 in the presence of bit errors. The coder was significantly better than G.722 at the lower input level of -36 dBov, but did not meet the requirement at the higher input level of -16 dBov. This is because the G.722 reference coder is level dependent: its performance increases with the input level. G.722 shows a MOS variation of almost 1 between the higher and lower levels, while the MOS variation of the candidate codec is limited to 0.1. For the tandem conditions, the results showed that the performance is slightly below the requirement. Initially, the nominal level was -32 dBov, at which the tandem requirement was met; it was then raised to -26 dBov, which increased the MOS of G.722 by 0.3 for a single encoding (due to its level dependency).
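The pairwise comparisons in Table 2 can be checked with a few lines of Python. The sketch recomputes d = reference MOS - candidate MOS for each codec condition. It then applies a simplified rule, assumed here for illustration only: a difference counts as significant when |d| exceeds the 95% confidence interval. The actual pass/fail criteria are those defined in the ITU test plan [7].

```python
# (comparison, MOS of G.722 reference, MOS of candidate codec, 95% CI) from Table 2.
RESULTS = [
    ("16k vs G.722 48k, clean", 3.41, 3.33, 0.18),
    ("24k vs G.722 56k, clean", 3.77, 3.71, 0.17),
    ("16k vs G.722 48k, 0.001 BER", 2.36, 2.68, 0.19),
    ("24k vs G.722 56k, 0.001 BER", 2.59, 2.85, 0.20),
    ("16k vs G.722 48k, 2 tandem", 2.87, 2.55, 0.17),
    ("24k vs G.722 56k, 2 tandem", 3.34, 3.09, 0.18),
]


def classify(mos_ref, mos_codec, ci):
    """Simplified significance check (ASSUMED rule: |d| > CI)."""
    d = round(mos_ref - mos_codec, 2)   # d = reference - candidate, as in Table 2
    if abs(d) <= ci:
        return d, "not significantly different"
    return d, "codec worse" if d > 0 else "codec better"


for name, ref, cand, ci in RESULTS:
    d, verdict = classify(ref, cand, ci)
    print(f"{name:30s} d = {d:+.2f} (CI {ci:.2f}): {verdict}")
```

Under this rule the clean-speech conditions are not significantly different from G.722, the codec is significantly better under bit errors, and it falls behind in the tandem conditions, which matches the discussion above.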
In Experiment 1b (music), the coder did not meet the requirement at 16 kbit/s, and at 24 kbit/s it was slightly worse than G.722 at 56 kbit/s. The test showed that, for music, the performance at 24 kbit/s is better than that of G.722 at 48 kbit/s. The requirements for music are difficult to attain with a frame size as short as 10 ms, because it lacks the frequency resolution needed for perceptual transform coding.
In Experiment 2, the codec did not meet the requirements in the presence of background noise. This is attributed to the discriminating Comparison Category Rating (CCR) procedure used in this experiment.
4. CONCLUSION
This article described a wideband speech codec operating at 16/24 kbit/s. The coder operates on 10 ms speech frames, using an ACELP algorithm at 16 kbit/s and a switched ACELP/TCX algorithm at 24 kbit/s. Subjective test results showed that the codec meets most of the performance requirements for clean speech (equivalence to G.722 at 48/56 kbit/s), while it falls below the requirements for music signals and background noise conditions.
At the March 1997 meeting of ITU-T Study Group 16 (SG 16), it was decided to keep only the longer-delay mode (60 ms) for the wideband coding standard, while allowing more complexity (a single commercial DSP chip). The longer delay is essential to meet the requirements for music signals. The procedure for testing background noise conditions has been changed, which is likely to make the requirements easier to meet. The remaining difficulty will be meeting the requirement at -16 dBov. It is, in fact, illogical to test the level dependency of the candidate codec against a reference codec that is itself strongly level dependent.
REFERENCES
[1] C. Laflamme, J.-P. Adoul, R. Salami, S. Morissette, and P. Mabilleau, "16 kbps wideband speech coding technique based on algebraic CELP," Proc. ICASSP'91, pp. 13-16.
[2] R. Lefebvre, R. Salami, C. Laflamme, and J.-P. Adoul, "High quality coding of wideband audio signals using Transform Coded eXcitation (TCX)," Proc. ICASSP'94, pp. I-193–I-196.
[3] E. Ordentlich and Y. Shoham, "Low-delay code-excited linear-predictive coding of wideband speech at 32 kbps," Proc. ICASSP'91, pp. 9-12.
[4] Y. Bistritz and S. Peller, "Immittance spectral pairs (ISP) for speech encoding," Proc. ICASSP'93, pp. II-9–II-12.
[5] R. Salami, C. Laflamme, J.-P. Adoul, and D. Massaloux, "A toll quality 8 kb/s speech codec for the personal communications system (PCS)," IEEE Trans. Veh. Technol., vol. 43, no. 3, pp. 808-816, Aug. 1994.
[6] R. Salami et al., "Description of GSM enhanced full rate codec," Proc. ICC'97.
[7] "Subjective qualification test plan for the ITU-T wideband (7 kHz) speech coding algorithm," ITU-T, Version 3.1, November 1996.