A 13.0 KBIT/S WIDEBAND SPEECH CODEC BASED ON SB-ACELP

Jürgen Schnitzler

RWTH Aachen, University of Technology
Institute of Communication Systems and Data Processing (IND), D-52056 Aachen, Germany
http://www.ind.rwth-aachen.de/~juergen
Juergen.Schnitzler@ind.rwth-aachen.de

ABSTRACT
This paper describes a wideband (7 kHz) speech compression scheme operating at a bit rate of 13.0 kbit/s, i.e. 0.8 bit per sample. We apply a split-band (SB) technique, where the 0-6 kHz band is critically subsampled and coded by an ACELP approach. The high-frequency signal components (6-7 kHz) are generated by an improved High-Frequency-Resynthesis (HFR) at the decoder, so that no additional information has to be transmitted. In informal listening tests, the subjective speech quality was rated to be comparable to that of the CCITT G.722 wideband codec at 48 kbit/s.

1. INTRODUCTION

Interest in wideband (50-7000 Hz) speech and audio signals has grown over the last years. Compared to 'narrowband', i.e. telephone band limited, signals, the larger signal bandwidth provides much more naturalness and intelligibility, and thus promises a significant quality improvement for telecommunication services. The first wideband speech compression standard, released in 1988, is the CCITT G.722 [1] subband ADPCM scheme, which operates at bit rates of 48, 56 or 64 kbit/s (i.e. at effective rates of 3-4 bit per sample).
Recently, ITU-T Study Group 16 has started the standardization of a new coding algorithm, which is required to exhibit, at bit rates of 16, 24 and 32 kbit/s (1-2 bit per sample), a performance similar to that of the G.722 codec at its respective rates under most operating conditions [2]. The new codec targets wireline applications such as ISDN wideband telephony and videoconferencing, and also packet transmission applications such as B-ISDN and 'multimedia' transmissions over the Internet. In [3] we proposed a split band coding scheme that fulfilled most of the requirements for speech at 16 kbit/s.
New applications for wideband speech will arise in the domain of mobile communications, which has seen tremendous development during the last decade. Future interconnections between fixed and mobile networks and the increasing competition between their operators, e.g. in the Wireless Local Loop, will certainly create a need for high quality services. Low rate wideband speech coding schemes (i.e. at effective rates of 0.5-1 bit per sample) may play an important role in this context. In ETSI SMG11, the introduction of a wideband mode is currently being discussed for the forthcoming AMR (Adaptive Multi-Rate) codec standard [4], which is to replace the existing GSM codecs. In a previous proposal [5] we introduced an algorithm that provided, at a rate well below 13 kbit/s, a clean speech quality similar to that of our original 16 kbit/s algorithm [3].
In this paper, we present a modified scheme that shows improved performance under both clean speech and acoustic background noise conditions. In the sequel, section 2 gives an overview of the general codec structure, whereas section 3 focuses on the core codec, an ACELP algorithm designed for the main 0-6 kHz subband signal. In section 4 we propose an improved high-frequency resynthesis of the 6-7 kHz band that does not require the transmission of any side information.

2. GENERAL CODEC STRUCTURE

Similarly to CCITT G.722, our basic approach is to split the input signal into two subbands, in order to allocate the available bit rate according to both the spectral distribution and the subjective importance of the subband components. An important difference is that we found an unequal splitting at a cutoff frequency of 6 kHz to be a more suitable solution [3]. This conclusion was motivated by an inspection of the instantaneous bandwidth of speech signals and by the spectral resolution of human perception: the 6-7 kHz band corresponds to only about one critical band.
In our configuration, those spectral portions of the upper subband (6-7 kHz) that are sufficient to convey a correct subjective impression of wideband speech can be represented either by coding them at a very low bit rate or even, as described in this paper, by extrapolation at the decoder side. Furthermore, this band splitting allows the lower subband (0-6 kHz) to be quantized more efficiently: at an overall target bit rate of 13 kbit/s, the effective bit rate increases from R ≈ 0.8 bit per sample at a sampling rate of fs = 16 kHz to R ≈ 1.1 bit per sample at fs = 12 kHz.
This suggests the use of state-of-the-art ACELP (Algebraic Code-Excited Linear Prediction) techniques for coding the lower subband. The domain of toll-quality, medium rate narrowband speech codecs is currently dominated by algorithms based on ACELP, as they best fulfill the performance requirements in terms of subjective quality, complexity, robustness and delay. Examples of ACELP codecs are the GSM Enhanced Full Rate (EFR) codec [6] (12.2 kbit/s, i.e. R ≈ 1.53 bit per sample), the ITU-T G.729 universal 8 kbit/s codec [7] (R = 1 bit per sample) and its extensions, and the IS-641 standard [8] (7.4 kbit/s, i.e. R ≈ 0.93 bit per sample) for the US-TDMA system.
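The effective rates quoted above all come from the same ratio: R is the bit rate divided by the sampling frequency. The short Python sketch below reproduces the figures for G.722, for the 13 kbit/s budget at 16 kHz and at 12 kHz, and for the narrowband ACELP examples. The names `effective_rate` and `CASES` are illustrative only and are not part of the codec.

```python
def effective_rate(bit_rate_bps: float, fs_hz: float) -> float:
    """Effective bit rate R = bit rate / sampling rate, in bit per sample."""
    return bit_rate_bps / fs_hz


# (bit rate in bit/s, sampling rate in Hz) for the cases quoted in the text
CASES = {
    "G.722, 48 kbit/s at 16 kHz": (48_000, 16_000),
    "13 kbit/s at 16 kHz (full band)": (13_000, 16_000),
    "13 kbit/s at 12 kHz (lower band only)": (13_000, 12_000),
    "GSM EFR, 12.2 kbit/s at 8 kHz": (12_200, 8_000),
    "ITU-T G.729, 8 kbit/s at 8 kHz": (8_000, 8_000),
    "IS-641, 7.4 kbit/s at 8 kHz": (7_400, 8_000),
}

for name, (rate, fs) in CASES.items():
    print(f"{name}: R = {effective_rate(rate, fs):.3f} bit/sample")
```

Keeping the 13 kbit/s budget but coding only the 12 kHz subsampled lower band raises R from about 0.81 to about 1.08 bit per sample. That increase in bits per coded sample is what the band split provides.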
Figure 1 a) shows the encoder structure of our proposal. A rate conversion module extracts the 0-6 kHz lower subband from the input wideband (7 kHz) signal and reduces the sampling rate from 16 kHz to 12 kHz using a linear-phase analysis filter.

0-7803-4428-6/98 $10.00 © 1998 IEEE
Ex. 1045 / Page 1 of 4
Apple v. Saint Lawrence

Figure 1: a) Structure of wideband encoder (input speech, ACELP encoder, bit stream); b) Structure of wideband decoder (ACELP decoder, adaptive postfilter, HFR, output speech)
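Converting from 16 kHz to 12 kHz is a rational rate change by 3/4. The sketch below shows one way to realize it, assuming a Hamming-windowed sinc FIR as the linear-phase analysis filter. The text does not specify the filter design. The 241-tap length and the 5.7 kHz cutoff are our assumptions, chosen so that the group delay equals the 2.5 ms per filter stated in the text. The function names are hypothetical.

```python
import numpy as np


def lowpass_fir(num_taps: int, cutoff_hz: float, fs_hz: float) -> np.ndarray:
    """Linear-phase windowed-sinc (Hamming) lowpass FIR with unity DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return h / h.sum()


def rate_convert_16k_to_12k(x: np.ndarray, num_taps: int = 241) -> np.ndarray:
    """Convert a 16 kHz signal to 12 kHz (factor 3/4): upsample by 3 to
    48 kHz, lowpass below 6 kHz, then keep every 4th sample."""
    up, down = 3, 4
    # Assumed 5.7 kHz cutoff leaves a transition band below the new 6 kHz Nyquist
    h = up * lowpass_fir(num_taps, 5700.0, 16_000 * up)  # gain 3 offsets zero insertion
    xu = np.zeros(len(x) * up)
    xu[::up] = x
    # With 241 taps: 120 samples at 48 kHz, i.e. 2.5 ms of group delay
    delay = (num_taps - 1) // 2
    y = np.convolve(xu, h)[delay:delay + len(xu)]  # drop the group delay
    return y[::down]


frame_16k = np.zeros(320)  # one 20 ms frame at 16 kHz
print(len(rate_convert_16k_to_12k(frame_16k)))
```

A 320-sample input frame (20 ms at 16 kHz) yields 240 output samples, i.e. one ACELP frame at 12 kHz. A practical implementation would use a polyphase structure that skips the multiplications by the inserted zeros. The direct form is kept here for readability.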
The ACELP core codec operates on speech frames of 20 ms (240 samples at fs = 12 kHz). For every frame, a total of 260 bits is transmitted over the channel, including 8 bits of protection (parity) information that can be used for error concealment in the decoder. The resulting bit allocation is shown in Table 1. Details of the ACELP configuration are given in the next section.
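The bit budget can be cross-checked in a few lines of Python. The dictionary restates the allocation of Table 1 (the names are illustrative). It confirms that 260 bits per 20 ms frame give 13.0 kbit/s, i.e. about 1.08 bit per sample at the 12 kHz core sampling rate.

```python
FRAME_MS = 20
SAMPLES_PER_FRAME = 240          # 20 ms at fs = 12 kHz

# Bits per 20 ms frame, restating Table 1
BITS_PER_FRAME = {
    "LPC": 32,
    "ACB-Index": 8 + 6 + 8 + 6,  # four 5 ms LTP subframes
    "ACB-Gain": 4 * 4,
    "FCB-Index": 8 * 18,         # eight 2.5 ms FCB subframes, 18 bit each
    "FCB-Gain": 8 * 4,
    "Parity": 8,
}

total_bits = sum(BITS_PER_FRAME.values())
bit_rate_bps = total_bits * 1000 // FRAME_MS   # integer arithmetic, no rounding
bits_per_sample = total_bits / SAMPLES_PER_FRAME
print(total_bits, bit_rate_bps, round(bits_per_sample, 3))
```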
The decoder structure is shown in Figure 1 b) and is refined in section 4. The received bits are used in the ACELP decoder to synthesize the lower band signal. A postfilter is applied to this signal in order to enhance the perceptual quality. The receiver rate conversion module interpolates the postfilter output to the original sampling rate of 16 kHz. The decimation filter (transmitter) and the interpolation filter (receiver) each contribute a delay of 2.5 ms. Together with the 20 ms framing, the overall algorithmic delay therefore amounts to 25 ms.
Finally, a High-Frequency-Resynthesis (HFR) module generates an upper band (6-7 kHz) signal portion. As described in section 4, the regenerated upper band signal consists of a gain-adapted and filtered bandpass noise. All necessary parameters are adapted solely from the received lower band parameters and from a priori knowledge of the input speech.

Parameter | Bits/Frame             | Bit Rate
----------+------------------------+------------------------------------
LPC       | 32 bit                 | 1.6 kbit/s
ACB-Index | 2 × (8+6) bit = 28 bit | 2.2 kbit/s (ACB-Index and ACB-Gain)
ACB-Gain  | 4 × 4 bit = 16 bit     |
FCB-Index | 8 × 18 bit = 144 bit   | 8.8 kbit/s (FCB-Index and FCB-Gain)
FCB-Gain  | 8 × 4 bit = 32 bit     |
Parity    | 8 bit                  | 0.4 kbit/s
----------+------------------------+------------------------------------
Total     | 260 bit                | 13.0 kbit/s

Table 1: Bit allocation of the proposed codec

3. ACELP CODING OF 0-6 KHZ BAND

3.1. Short-term LP analysis
The linear prediction (LP) analysis uses a modified Split-Levinson approach as described in [9] to compute the Line Spectral Frequencies (LSF) from the windowed speech signal. The analysis window covers 300 samples and is right-aligned with the current 20 ms frame, i.e. no lookahead is used. The order of the LP filter is p_lb = 14. Before the LSF coefficients are computed, the autocorrelation matrix is weighted using a binomial window, which provides an additional amount of bandwidth expansion to the LP filter.
The 14 LSF parameters are quantized by a Predictive Multi-stage Split Vector Quantizer scheme. For the prediction of the LSF vector, a Moving Average (MA) model of order 4 is used. The closed-loop residual quantizer consists of two stages of split vector quantizers, using 2 segments for the first and 3 segments for the second stage. One of two fixed predictor sets can be chosen. This approach, which results in an overall bit rate of 32 bit per frame, is similar to the one used for the ITU-T G.729 codec [7].
In addition, a linear interpolation of the LP filter coefficients is performed in the LSF domain every 5 ms.

3.2. Long-term prediction analysis
Every 5 ms, the long-term prediction (LTP) is carried out as a combination of open-loop and closed-loop LT analysis based on an adaptive codebook (ACB) representation (see [5]). The ACB delays in the four LTP subframes are coded with 8+6+8+6 = 28 bits. In the lower delay range, a fractional pitch approach is used. The ACB gains are nonuniformly quantized with 4 bits each.

3.3. Fixed codebook (FCB) excitation
Every 2.5 ms (30 samples), an excitation shape vector is selected from a sparse algebraic pulse codebook. An innovation vector is built from 4 signed pulses, as shown in Table 2. Pulses 1 and 2 can each take one of 16 possible positions, pulses 3 and 4 one of 8 positions. Since each pulse also has an individual sign, 18 bits (2 × 4 + 2 × 3 position bits plus 4 sign bits) are necessary to encode the shape vector. Note that pulses 1 and 3 as well as pulses 2 and 4 may share the same position, and that every pulse can also fall outside the valid range of positions 0 ... 29. This allows a variable number of pulses and pulse amplitudes of 0, ±1 and ±2.
The codebook structure and the efficient focused search method are based on [7]. The FCB gain is quantized using a fixed autoregressive predictor in order to reduce the dynamic range [10]. The residual of the gain predictor is nonuniformly scalar quantized with 4 bits.

Table 2: 18-bit sparse algebraic pulse codebook

3.4. Perceptual weighting
The perceptual weighting filter W(z) applied during the optimization processes of the ACB and FCB search has a
transfer function of the form

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)},$$

with A(z) being the LP analysis filter computed from the unquantized, interpolated LSF parameters. Different sets of weighting factors {γ1, γ2} are used for the adaptive and the fixed codebook search. For the fixed codebook search, the parameters are adapted with respect to the tilt and the strength of the resonances of the LP synthesis filter [7].
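The sketch below packs and unpacks one codeword of the 18-bit pulse codebook from Section 3.3, using 4 + 4 + 3 + 3 position bits and 4 sign bits. Table 2 is not reproduced here, so the position tracks in `TRACKS` are hypothetical. They were chosen only to show the two properties the text describes: pulses 1/3 and 2/4 can coincide, and every pulse can fall outside positions 0 ... 29. The bit order within the codeword is also an assumption.

```python
SUBFRAME = 30              # 2.5 ms at fs = 12 kHz
INDEX_BITS = [4, 4, 3, 3]  # position bits of pulses 1..4

# Hypothetical position tracks (the real grid is given in Table 2): pulses 1/3
# and 2/4 can share positions, and every track reaches beyond position 29.
TRACKS = [
    list(range(0, 32, 2)),  # pulse 1: 0, 2, ..., 30 (16 positions)
    list(range(1, 32, 2)),  # pulse 2: 1, 3, ..., 31 (16 positions)
    list(range(2, 32, 4)),  # pulse 3: 2, 6, ..., 30 (8 positions)
    list(range(3, 32, 4)),  # pulse 4: 3, 7, ..., 31 (8 positions)
]


def encode(indices, signs):
    """Pack 4 position indices and 4 signs (+1/-1) into an 18-bit word."""
    word = 0
    for idx, nbits in zip(indices, INDEX_BITS):
        if not 0 <= idx < (1 << nbits):
            raise ValueError("position index out of range")
        word = (word << nbits) | idx
    for s in signs:
        word = (word << 1) | (1 if s < 0 else 0)  # sign bit 1 means negative
    return word


def decode(word):
    """Unpack an 18-bit word into a 30-sample innovation vector."""
    signs = []
    for _ in range(4):                  # the sign bits are the 4 LSBs
        signs.append(-1 if word & 1 else 1)
        word >>= 1
    signs.reverse()
    indices = []
    for nbits in reversed(INDEX_BITS):
        indices.append(word & ((1 << nbits) - 1))
        word >>= nbits
    indices.reverse()
    c = [0] * SUBFRAME
    for track, idx, s in zip(TRACKS, indices, signs):
        pos = track[idx]
        if pos < SUBFRAME:              # pulses beyond position 29 vanish
            c[pos] += s                 # coinciding pulses add up to 0, +-1, +-2
    return c


word = encode([1, 15, 0, 1], [+1, +1, +1, -1])
print(word, decode(word))
```

In the example, pulses 1 and 3 land on the same position with equal signs and form an amplitude of 2. Pulse 2 falls outside the subframe and is dropped. The decoded vector therefore holds fewer than four nonzero samples, which is the "variable number of pulses" mentioned in the text.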
`
3.5. Adaptive postfilter
As described in [11], the adaptive postfilter consists of a cascade of a formant postfilter, a harmonic postfilter and a tilt compensation filter. The postfilter is updated every LTP subframe (5 ms). The formant postfilter uses the transmitted LPC filter coefficients. After postfiltering, an adaptive gain control is performed.
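The weighting filter W(z) of Section 3.4 and the formant postfilter of [11] are both pole-zero filters built from bandwidth-expanded LP polynomials. Here A(z/γ) simply scales the k-th LP coefficient by γ^k. The sketch below applies A(z/γ_num)/A(z/γ_den) to a signal with zero initial state. The function names are ours. The factor values used in the codec, including the adaptive choice for the FCB search, are not given in the text, so any values passed in are illustrative.

```python
import numpy as np


def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_k -> a_k * gamma**k (a[0] = 1)."""
    return np.asarray(a, dtype=float) * gamma ** np.arange(len(a))


def pole_zero_filter(x, a, gamma_num, gamma_den):
    """Apply A(z/gamma_num) / A(z/gamma_den) to x, zero initial state."""
    b = bandwidth_expand(a, gamma_num)
    d = bandwidth_expand(a, gamma_den)
    v = np.convolve(x, b)[: len(x)]      # numerator: FIR A(z/gamma_num)
    y = np.zeros(len(x))
    for n in range(len(x)):              # denominator: all-pole 1/A(z/gamma_den)
        acc = v[n]
        for k in range(1, min(len(d) - 1, n) + 1):
            acc -= d[k] * y[n - k]
        y[n] = acc
    return y


a = [1.0, -1.2, 0.5]                     # toy second-order LP polynomial A(z)
impulse = np.zeros(5)
impulse[0] = 1.0
print(pole_zero_filter(impulse, a, 0.9, 0.6))
```

With γ_num = γ1 and γ_den = γ2, this is the weighting filter W(z). With equal factors, the filter reduces to the identity, which is a convenient sanity check.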
`
4. HIGH FREQUENCY RESYNTHESIS
`
For speech, spectral components above 6 kHz are almost always due to unvoiced, i.e. fricative, sounds. Informal listening tests showed that the presence of this spectral band is still clearly perceivable. However, sufficient subjective quality does not require an exact reproduction of the noise-like signal waveform. In [5] we demonstrated a very simple and efficient spectral folding technique to regenerate an upper band signal. This approach exploited the observation that the spectral distributions in the 5-6 kHz and 6-7 kHz bands are very similar.
On the other hand, many operating conditions typically include acoustic background noise. In such situations with non-speech signal components, our previous proposal sometimes produced perceivable degradations. Techniques such as those proposed in [12] do not yield the intended quality either.
In this paper, we describe a more elaborate scheme based on High-Frequency Resynthesis (HFR) techniques that were initially studied for the extension from telephone-band to wideband speech [13].
`
`4.1. Resynthesis of the 6-7 kHz band
Similarly to [3], we model the upper band signal by a bandpass (6-7 kHz) noise excitation whose magnitude spectrum has to be shaped properly (see Figure 2). The basic idea is to extrapolate the spectral envelope and the residual of the signal separately. Typically there is a good correlation between the lower band spectral envelope and the spectrum of the upper band. Since no side information is to be transmitted, the task is to predict the spectral shape and the energy of this excitation from the received lower band parameters.
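Here is a minimal sketch of the 6-7 kHz bandpass noise that serves as the raw upper band excitation. It assumes white Gaussian noise shaped by a Hamming-windowed FIR bandpass at fs = 16 kHz. The filter design, the 129-tap length and the function names are our assumptions. The text only states that the noise is band-limited to 6-7 kHz, and its level is set afterwards by the HFR gain.

```python
import numpy as np


def bandpass_fir(f_lo: float, f_hi: float, fs: float, num_taps: int = 129) -> np.ndarray:
    """Linear-phase FIR bandpass: difference of two ideal lowpass
    responses, tapered with a Hamming window."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    hi = 2 * f_hi / fs * np.sinc(2 * f_hi / fs * n)
    lo = 2 * f_lo / fs * np.sinc(2 * f_lo / fs * n)
    return (hi - lo) * np.hamming(num_taps)


def bandpass_noise(num_samples: int, fs: float = 16_000.0, f_lo: float = 6_000.0,
                   f_hi: float = 7_000.0, seed: int = 0) -> np.ndarray:
    """White Gaussian noise restricted to [f_lo, f_hi]; the level is
    not normalized because a separate gain scales it later."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(num_samples)
    return np.convolve(white, bandpass_fir(f_lo, f_hi, fs), mode="same")


noise = bandpass_noise(320)  # one 20 ms frame at 16 kHz
```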
The overall spectral shape of the output wideband signal is determined by selecting an appropriate LPC synthesis filter 1/A_Hfr(z). Provided that this filter matches the synthesized lower band spectrum, its behaviour above 6 kHz is expected to reflect the original upper band speech components. 1/A_Hfr(z) is determined from a codebook C_wb describing N_Hfr LPC filters that are computed at fs = 16 kHz and stored in their LSF representation [15].
The decoded and interpolated lower band signal is first filtered by A_Hfr(z), so that the gain-adapted noise excitation for the upper band is added in the residual domain, before the sum is filtered again by 1/A_Hfr(z). Since A_Hfr(z) and 1/A_Hfr(z) cancel for the lower band signal, the HFR module does not introduce any additional degradation to the lower band, regardless of the choice of A_Hfr(z). The spectral fit of the filter to the actual lower band signal therefore does not have to be very exact, and the number of stored filter parameters can be limited. We have found N_Hfr = 48 different LSF sets of order p_wb = 16 to be sufficient.
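The residual-domain structure can be sketched as follows. The lower band signal is whitened by A_Hfr(z), the gain-scaled noise is added, and the sum is passed through 1/A_Hfr(z). The function and argument names are hypothetical, and the sketch assumes a[0] = 1 and zero filter states. Because the analysis and synthesis filters cancel, a zero gain returns the lower band signal unchanged. That is the property the paragraph relies on.

```python
import numpy as np


def analysis_filter(a, x):
    """Apply A(z) = a[0] + a[1] z^-1 + ... (a[0] = 1) to x, zero initial state."""
    return np.convolve(x, a)[: len(x)]


def synthesis_filter(a, e):
    """Apply the all-pole filter 1/A(z) to e, zero initial state."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(len(a) - 1, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def hfr_synthesis(s_lb, a_hfr, noise_ub, gain):
    """Residual-domain HFR: s_wb = 1/A_Hfr(z) [A_Hfr(z) s_lb + gain * noise_ub].

    s_lb     : decoded lower band signal, interpolated to fs = 16 kHz
    a_hfr    : coefficients of A_Hfr(z) from the wideband codebook C_wb
    noise_ub : 6-7 kHz bandpass noise, same length as s_lb
    """
    residual = analysis_filter(a_hfr, s_lb)   # lower band residual
    excitation = residual + gain * noise_ub   # add the upper band noise
    return synthesis_filter(a_hfr, excitation)
```

By linearity, the output equals the lower band signal plus the noise filtered by 1/A_Hfr(z). The choice of A_Hfr(z) therefore shapes only the added noise.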
For the selection of A_Hfr(z), a second codebook C_lb is necessary. It contains N_Hfr LSF vectors describing the spectral envelope of the lower band at a sampling frequency of 12 kHz. For each LSF vector ω_wb,ν ∈ C_wb (ν = 0 ... N_Hfr - 1), the associated vector ω_lb,ν ∈ C_lb approximates the lower band part of the spectrum given by ω_wb,ν. The selection process can thus be understood as re-quantizing the lower band LPC filter A(z) in C_lb and looking up the LSF parameters of 1/A_Hfr(z) in the 'shadow' codebook C_wb. The HFR filter 1/A_Hfr(z) found in this way is linearly interpolated every 5 ms.
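The codebook selection and the 5 ms interpolation can be sketched as below. The text does not state which distance is used when re-quantizing A(z) in C_lb. The squared Euclidean LSF distance here is therefore an assumption, as are the interpolation weights 1/4, 1/2, 3/4 and 1 for the four 5 ms subframes of a 20 ms frame. In the codec, C_lb and C_wb would hold N_Hfr = 48 vectors of dimension p_lb = 14 and p_wb = 16. The toy sizes below are only for illustration.

```python
import numpy as np


def select_hfr_lsf(lsf_lb: np.ndarray, c_lb: np.ndarray, c_wb: np.ndarray):
    """Re-quantize the decoded lower band LSF vector in C_lb and return the
    index and the associated entry of the 'shadow' codebook C_wb."""
    dist = np.sum((c_lb - lsf_lb) ** 2, axis=1)  # assumed: squared Euclidean distance
    nu = int(np.argmin(dist))
    return nu, c_wb[nu]


def interpolate_lsf(prev: np.ndarray, curr: np.ndarray, n_sub: int = 4) -> np.ndarray:
    """Linearly interpolate between the previous and the current frame's LSF
    vectors for n_sub 5 ms subframes (assumed weights k / n_sub)."""
    w = np.arange(1, n_sub + 1) / n_sub
    return (1 - w)[:, None] * prev + w[:, None] * curr


# Toy codebooks with N_Hfr = 2 entries (dimensions 2 and 3 instead of 14 and 16)
c_lb = np.array([[0.1, 0.2], [0.5, 0.9]])
c_wb = np.array([[0.1, 0.2, 0.3], [0.4, 0.8, 1.2]])
nu, lsf_wb = select_hfr_lsf(np.array([0.45, 0.85]), c_lb, c_wb)
print(nu, lsf_wb)
```

Interpolating in the LSF domain keeps each intermediate vector in ascending order whenever both end points are ordered, so every interpolated filter stays stable.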
The adaptation of the HFR gain g_Hfr is performed in the residual domain. Assuming that the inverse HFR filter A_Hfr(z) yields a rather flat spectrum in the 0-6 kHz frequency range, the bandpass noise d_ub is scaled to reach the same power spectral density level in the 6-7 kHz range as the lower band residual d_lb. Since d_lb is not completely decorrelated, better results are obtained when a high-pass (5 kHz) filtered portion of it is used for the gain adaptation. This scaling is updated every 5 ms, and the resulting gains are smoothed on a sample-by-sample basis.
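The gain adaptation can be sketched under a few stated assumptions. The lower band residual d_lb is band-limited to 6 kHz, so its 5 kHz high-pass portion covers 5-6 kHz. That is the same 1 kHz width as the 6-7 kHz noise, so matching the two powers also matches the PSD levels. The per-subframe gain is then smoothed sample by sample. The first-order recursive smoother, its factor `alpha` and the function names are our assumptions, since the paper does not specify the smoothing rule.

```python
import numpy as np


def hfr_gain(res_hp: np.ndarray, noise_bp: np.ndarray, eps: float = 1e-12) -> float:
    """Target gain for one 5 ms subframe: scale the 6-7 kHz noise to the
    power of the 5 kHz high-pass filtered lower band residual. Both signals
    occupy about 1 kHz, so equal power means equal PSD level."""
    return float(np.sqrt(np.mean(res_hp ** 2) / (np.mean(noise_bp ** 2) + eps)))


def smooth_gains(subframe_gains, sub_len: int = 80, alpha: float = 0.95) -> np.ndarray:
    """Expand per-subframe gains to per-sample gains with a first-order
    recursive smoother (sub_len = 80 samples is 5 ms at 16 kHz)."""
    out = np.empty(len(subframe_gains) * sub_len)
    g = float(subframe_gains[0])
    for i, target in enumerate(subframe_gains):
        for n in range(sub_len):
            g = alpha * g + (1 - alpha) * target
            out[i * sub_len + n] = g
    return out
```

The smoother avoids audible steps at the 5 ms subframe boundaries. A larger `alpha` trades a slower reaction to onsets of fricatives for smoother gain trajectories.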
The described HFR method achieves a more transparent subjective quality than our previous spectral folding approach. In particular, the performance under background noise conditions has been improved.
`
`4.2. Design of HFR codebooks
To obtain the HFR codebooks C_lb and C_wb, an approach close to the Linde-Buzo-Gray (LBG) algorithm [14] is chosen [15]. Prior to the training phase, an initial codebook is required for partitioning the p_lb-dimensional vector space filled by all possible lower band LSF parameter sets. This initial codebook C_lb^(0) contains N_Hfr LSF vectors ω_lb,ν^(0) (ν = 0 ... N_Hfr - 1) and is obtained by applying the LBG algorithm to the lower band portion of the training speech data.
During the training process, an LPC analysis (order p_wb) is performed on the wideband input speech (fs = 16 kHz) for each 20 ms frame λ, yielding an LSF vector ω_wb(λ). In parallel, the lower band signal portion (fs = 12 kHz) of frame λ is subjected to a second LPC analysis (order p_lb), yielding an LSF vector ω_lb(λ). Using C_lb^(0), the current frame's parameters ω_lb(λ) and ω_wb(λ) are assigned to the sets P_lb(ν) and P_wb(ν), respectively. P_lb(ν) and P_wb(ν), ν = 0 ... N_Hfr - 1, define the partitioning of the vector spaces containing the LSF parameters ω_lb(λ) and ω_wb(λ). The assignment is made by searching C_lb^(0) and selecting the ν for which an inverse LPC filter, built from ω_lb,ν^(0) ∈ C_lb^(0) and applied to the lower band speech, yields the minimum mean squared prediction error.
After all frames of training data have been processed, the final codevectors of C_lb and C_wb are found as the centroids of the partitions P_lb(ν) and P_wb(ν), ν = 0 ... N_Hfr - 1, respectively. This procedure produces pairs of lower band and wideband LSF parameters with a good spectral fit in the 0-6 kHz range. Furthermore, the stability of the resulting filters is guaranteed [15].

Figure 2: Wideband speech decoder details: ACELP decoder and High-Frequency Resynthesis (HFR)
Note that, in order to save memory in a practical implementation, the codebook C_wb may be linked directly to the high-resolution LSF quantizer of the lower band ACELP decoder instead of storing C_lb explicitly.
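The training step of Section 4.2 can be sketched as one partition and centroid pass. To keep the example short, the initial codebook C_lb^(0) is passed directly as LPC coefficient vectors, assuming the LSF-to-LPC conversion has been done beforehand. Empty cells are left as NaN. Both are our simplifications, not part of the described method, and the function names are hypothetical.

```python
import numpy as np


def prediction_error(x, a):
    """Energy of the LP residual of frame x for the inverse filter A(z)."""
    e = np.convolve(x, a, mode="valid")  # only fully overlapped samples
    return float(np.sum(e ** 2))


def train_hfr_codebooks(frames_lb, lsf_lb, lsf_wb, init_lpc_lb):
    """One partition/centroid pass for the paired HFR codebooks.

    frames_lb   : list of lower band speech frames (fs = 12 kHz)
    lsf_lb      : (L, p_lb) array of lower band LSF vectors, one per frame
    lsf_wb      : (L, p_wb) array of wideband LSF vectors, one per frame
    init_lpc_lb : LPC vectors (a[0] = 1) of the initial codebook entries
    """
    n_cells = len(init_lpc_lb)
    # Assign every frame to the entry with minimum prediction error
    cells = np.array([
        min(range(n_cells), key=lambda v: prediction_error(x, init_lpc_lb[v]))
        for x in frames_lb
    ])
    c_lb = np.full((n_cells, lsf_lb.shape[1]), np.nan)  # empty cells stay NaN
    c_wb = np.full((n_cells, lsf_wb.shape[1]), np.nan)
    for v in range(n_cells):
        members = cells == v
        if members.any():
            c_lb[v] = lsf_lb[members].mean(axis=0)  # centroid of P_lb(v)
            c_wb[v] = lsf_wb[members].mean(axis=0)  # centroid of P_wb(v)
    return c_lb, c_wb, cells
```

Every training LSF vector is in ascending order, so their component-wise mean is ascending too. This gives one simple way to see why the centroid filters remain stable.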
`
`5. CONCLUSION
`
In this paper, an SB-ACELP scheme for 13.0 kbit/s wideband speech coding has been presented. The algorithm is based on a split band (SB) structure. A state-of-the-art ACELP codec operating at a 12 kHz sampling frequency is used to transmit the 0-6 kHz subband signal. An LPC-based High-Frequency-Resynthesis technique has been successfully applied to fill the perceptually significant upper 6-7 kHz band on the decoder side, without the need to transmit any side information. In informal listening tests, the speech quality was judged to be comparable to that of the CCITT G.722 wideband codec operating at 48 kbit/s.
`
`6. REFERENCES
`
[1] CCITT, "7 kHz Audio Coding within 64 kbit/s," in Recommendation G.722, vol. Fascicle III.4 of Blue Book, pp. 269-341, International Telecommunication Union, Melbourne, 1988.
[2] ITU-T SG 16 Q.20, "Terms of Reference for the ITU-T Wideband (7 kHz) Speech Coding Algorithm," April 1997.
[3] J. Paulus and J. Schnitzler, "16 kbit/s Wideband Speech Coding Based on Unequal Subbands," in Proc. Int. Conf. Acoust., Speech, Signal Processing, ICASSP, (Atlanta, Georgia, USA), pp. 651-654, 1996.
[4] ETSI SMG11, "Draft Adaptive Multi-Rate (AMR) Study Phase Report," Version 0.4, Tdoc SMG11-AMR 128/97, August 1997.
[5] J. Paulus and J. Schnitzler, "Wideband Speech Coding for the GSM Fullrate Channel?," in Proceedings ITG-Fachtagung "Sprachkommunikation", (Frankfurt am Main), pp. 11-14, 1996.
[6] ETSI/TC SMG, "Recommendation GSM 06.60: Enhanced Full Rate Speech Transcoding," European Telecommunications Standards Institute, January 1996.
[7] CCITT/ITU-T, "Rec. G.729: Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)," in General Aspects of Digital Transmission Systems; Terminal Equipments, Series G Recommendations, International Telecommunication Union, 1996.
[8] T. Honkanen, J. Vainio, K. Järvinen, P. Haavisto, R. Salami, C. Laflamme, and J.-P. Adoul, "Enhanced Full Rate Speech Codec for IS-136 Digital Cellular System," in Proc. Int. Conf. Acoust., Speech, Signal Processing, ICASSP, (Munich, Germany), pp. 731-734, IEEE, 1997.
[9] S. Saoudi and J. Boucher, "A New Efficient Algorithm to Compute the LSP Parameters for Speech Coding," Signal Processing, vol. 28, pp. 201-212, 1992.
[10] R. Salami, C. Laflamme, J. Adoul, and D. Massaloux, "A Toll Quality 8 Kb/s Speech Codec for the Personal Communications System (PCS)," IEEE Transactions on Vehicular Technology, vol. 43, pp. 808-816, August 1994.
[11] J. Chen and A. Gersho, "Adaptive Postfiltering for Quality Enhancement of Speech," IEEE Trans. Speech and Audio Processing, vol. 3, pp. 59-71, January 1995.
[12] J. Makhoul and M. Berouti, "High-Frequency Regeneration in Speech Coding Systems," in Proc. Int. Conf. Acoust., Speech, Signal Processing, ICASSP, (Washington, DC), pp. 428-431, IEEE, 1979.
[13] H. Carl, Untersuchung verschiedener Methoden der Sprachcodierung und eine Anwendung zur Bandbreitenvergrößerung von Schmalband-Sprachsignalen. PhD thesis, Ruhr-Universität Bochum, 1994.
[14] Y. Linde, A. Buzo, and R. Gray, "An Algorithm for Vector Quantizer Design," IEEE Transactions on Communications, vol. 28, pp. 84-95, January 1980.
[15] J. Kenkenberg, "Untersuchungen zur künstlichen Bandbreitenvergrößerung von Sprachsignalen," Diploma thesis D25/96, Institut für Nachrichtengeräte und Datenverarbeitung, IND, RWTH Aachen, 1996.