`Speech, & Signal Processing
`
`‘
`
`Conference Proceedings
`
`May 7 - 10, 1996
Atlanta, Georgia, USA
`
ICASSP-96
`
`Volume 1
`
`Sponsored by the
`
`Signal Processing Society of the
Institute of Electrical and Electronics Engineers
`
`
`ZTE EXHIBIT 1005
`
`Page 1 of 8
`
`
`
`
`The 1996 IEEE International Conference on
`Acoustics, Speech, and Signal Processing
`Conference Proceedings
`
`Sponsored by the Signal Processing Society of the Institute of Electrical and
`Electronics Engineers
`
`May 7-10, 1996
`Marriott Marquis Hotel
`Atlanta, Georgia, USA
`
Copyright and Reprint Permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy
beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom
of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood
Drive, Danvers, MA 01923. For other copying, reprint or republication permission, write to IEEE
Copyrights Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ
08855-1331. All rights reserved. Copyright 1996 by the Institute of Electrical and Electronics
Engineers, Inc.
`
IEEE Catalog Number: 96CH35903
ISBN 0-7803-3192-3 (softbound)
ISBN 0-7803-3193-1 (casebound edition)
ISBN 0-7803-3194-X (microfiche)
ISBN 0-7803-3195-8 (CD-ROM)
Library of Congress: 84-645139
`
`Additional Proceedings (hard-copy and CD-ROM) may be ordered from:
`
`IEEE Service Center
`445 Hoes Lane
`P.O. Box 1331
`Piscataway, NJ 08855-1331
`1-800-678-IEEE
`
`
`
`
`1996 International Conference on Acoustics,
`Speech, and Signal Processing
`Conference Committee
`
The 1996 International Conference on Acoustics, Speech, and Signal Processing (ICASSP), sponsored by the IEEE Signal
Processing Society, is the 21st in a series of international conferences devoted to experimental and theoretical aspects of
signal processing, speech, and acoustics. Conferences of this scope are possible only because of the continuing interest and
support of the Society membership, expressed both by their submission of papers of high quality and by their attendance at the
conference. The ICASSP 96 Conference Committee is grateful to all the authors, the session chairs, and the volunteers for
contributing to the success of the conference.
`
`Committee Members and Chairs
`
`General Chair
`Monson H. Hayes
`Georgia Institute of Technology
`Atlanta, GA 30332-0250 U.S.A
`Tel. (404) 894-2958
`E-mail: icassp96-chair@ece.gatech.edu
`
`Technical Program:
`Mark A. Clements
`Georgia Institute of Technology
`Atlanta, GA 30332-0250 U.S.A
`Tel. (404) 894-4584
E-mail: icassp96-technical@ece.gatech.edu

Finance:
Craig H. Richardson
Atlanta Signal Processors Inc.
1375 Peachtree Rd. NE, Ste. 690
Atlanta, GA 30309-3115 U.S.A.
Tel. (404) 892-7265
E-Mail: icassp96-finance@ece.gatech.edu

Exhibits:
John Kalter
Lanier Worldwide, Inc.
4667 North Royal Atlanta Drive
Tucker, GA 30084 U.S.A.
Tel. (770) 493-2201
E-Mail: icassp96-exhibits@ece.gatech.edu

Local Arrangements:
Russell M. Mersereau
Georgia Institute of Technology
Atlanta, GA 30332-0250 U.S.A.
Tel. (404) 894-2913
E-Mail: icassp96-local@ece.gatech.edu

Registration:
Douglas B. Williams
Georgia Institute of Technology
Atlanta, GA 30332-0250 U.S.A.
Tel. (404) 894-9832
E-Mail: icassp96-reg@ece.gatech.edu

Publications:
Vijay K. Madisetti
Georgia Institute of Technology
Atlanta, GA 30332-0250 U.S.A.
Tel. (404) 894-4696
E-Mail: icassp96-pubs@ece.gatech.edu

Guotong Zhou
Georgia Institute of Technology
Atlanta, GA 30332-0250 U.S.A.
Tel. (404) 894-2907
E-mail: gtz@eedsp.gatech.edu

Social:
Mary Ann Ingram
Georgia Institute of Technology
Atlanta, GA 30332-0250 U.S.A.
Tel. (404) 894-9482
E-Mail: icassp96-social@ece.gatech.edu

Publicity:
Stanley J. Reeves
Dept. of Electrical Engineering
Auburn University
Auburn, AL 36849-1809 U.S.A.
Tel. (334) 844-1821
E-Mail: icassp96-publicity@ece.gatech.edu

Tutorial:
John H. L. Hansen
Dept. of Electrical Engineering
Duke University
Durham, NC 27706 U.S.A.
Tel. (919) 660-5256
E-mail: jhlh@ee.duke.edu

European Liaison:
Maurice Bellanger
CNAM/Electronique
292, rue Saint-Martin
75141 Paris Cedex 03 FRANCE
Tel. +(33)-1-4027-2590
E-mail: bellang@cnam.cnam.fr

Far East Liaison:
Sadaoki Furui
Furui Research Laboratory
NTT Human Interface Labs
9-11 Midori-CHO 3-CHOME
Musashino-Shi Tokyo 180 JAPAN
E-mail: furui@speech-sun15.ntt.jp

Conference Secretariat
Meeting Management
2603 Main Street, Suite 690
Irvine, CA 92714 U.S.A.
`Tel. (714) 752-8205
`Fax: (714) 752-7444
`E-mail: 74710.2266@CompuServe.Com
`
`
`
`
`16 KBIT/S WIDEBAND SPEECH CODING BASED ON UNEQUAL SUBBANDS
`
Jürgen W. Paulus and Jürgen Schnitzler
`Institute of Communication Systems and Data Processing (IND)
`RWTH Aachen, University of Technology, D-52056 Aachen, Germany
`phone: +49.241.806961, fax: +49.241.8888186, juergen.paulus@ind.rwth-aachen.de
`
`ABSTRACT
`
In this paper we propose a split-band encoding scheme
for 16 kbit/s wideband speech coding (50-7000 Hz), using
two unequal subbands from 0-6 kHz and from 6-7 kHz. This
approach was motivated by an experimental evaluation of the
signal bandwidth of speech frames. The higher subband
is simply represented by white noise with adjustment of
the short-term energy. For the lower subband, code-excited
linear prediction (CELP) is used. In informal listening
tests, the speech quality was rated higher than that
of the CCITT G.722 wideband codec operating at
48 kbit/s.
`
1. INTRODUCTION
`
During the last few years there has been an increasing effort
in wideband speech coding at lower bit rates. This arises not only
from high-quality videophone and digital mobile telephone
applications, but also from the increasing market for
multimedia systems where high-quality speech and audio
are demanded. Compared to narrowband telephone speech,
the reduction of the lower cut-off frequency from 300 Hz to
50 Hz contributes to increased naturalness and fullness. The
high-frequency extension from 3400 Hz to 7000 Hz provides
better fricative differentiation and therefore higher
intelligibility.
In 1986 the International Telegraph and Telephone
Consultative Committee (CCITT, now ITU-T) recommended
the G.722 standard for wideband speech and
audio coding. This wideband speech codec provides high
speech quality at 64 kbit/s with a bandwidth of 50 Hz to
7000 Hz [1]. Slightly reduced quality is achieved at 56
and 48 kbit/s. Since September 1993, the International
Telecommunication Union Study Group 15 (ITU-T SG 15)
has been studying, in Question 6 ("Audio and Wideband Coding for
Public Telecommunication Networks"), new coding schemes
for low-rate wideband speech coding at 16, 24, and 32 kbit/s
[2]. The G.722 standard will serve as a reference for the
development of this alternative coding scheme.
In the past, linear prediction models have been used very
successfully for the coding of telephone speech. Recently,
a new 8 kbit/s narrowband speech coder has been selected
by the ITU-T SG 15 which provides telephone quality at
1 bit/sample [3, 4]. This indicates that very good coding
quality might also be possible for wideband speech signals at
1 bit/sample. However, for audio signals the desired
quality has not yet been achieved using LPC techniques
with long-term prediction (LTP), which are based on a model
of speech production. For those signals, subband coding,
transform coding and various forms of entropy coding have
been used for efficient coding with 2-3 bits per sample, if no
oversampling is applied.
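
As a quick check of the 1 bit/sample figures, the bit rate can be divided by the sampling rate. The 8 kHz rate for telephone-band speech is an assumption here (it is the customary value but is not stated in the text); the 16 kHz rate for wideband speech matches the input rate of the analysis filterbank in Figure 1.

```latex
\frac{8\ \text{kbit/s}}{8\ \text{kHz}} = 1\ \text{bit/sample},
\qquad
\frac{16\ \text{kbit/s}}{16\ \text{kHz}} = 1\ \text{bit/sample}
```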
In the following sections an encoding scheme for speech
is presented which consists of a 2-band split-band
scheme with unequal subband bandwidths. This approach
is motivated by the experimental evaluation of the
instantaneous signal bandwidth. First, Section 2 explains
a classification scheme which leads to the unequal
splitting of the subbands. Afterwards, the analysis filterbank
is described, which performs the unequal band splitting
combined with critical subsampling of the subbands.
Sections 3 and 4 explain the encoding techniques for both
bands. Section 5 describes a bit error concealment
technique, and Section 6 gives the final bit allocation.
In Section 7 we discuss the extension of the
coding scheme towards variable bitrate.
`
`2. ANALYSIS FILTERBANK
`
The use of unequal subbands was motivated by the experimental
evaluation of the instantaneous signal bandwidth of
speech frames. During voiced parts of a speech signal, most
of the signal energy is present in the lower frequency region.
Therefore it is not necessary to encode the higher part of the
frequency range. Transform coding techniques behave in a
similar way in that, in voiced frames, they allocate more bits
to lower frequency components than to higher frequency
components. For that reason, simulations were performed
to find the actual cut-off frequency necessary to encode
the current frame without loss of perceptual speech quality.
By applying a frame size of 10 ms we found that almost 40%
of the frames could be encoded using a bandwidth of 6 kHz
without loss of perceptual quality. The full bandwidth was
selected mainly during unvoiced parts of the speech signals.
The voice activity of the speech material used was
95%. It was extracted from the European Broadcasting
Union database [5]. The speech material consists of various
languages (English, German, and French), each with male
and female speakers, and was bandlimited to a frequency
range of 50-7000 Hz, according to the specifications in the
G.722 recommendation [1]. As a result of the classification,
a 2-band encoding scheme is proposed which consists
of subbands with unequal bandwidth. The lower subband
has a frequency range of 0-6 kHz and the upper subband
covers a frequency range of 6-7 kHz, i.e. we obtain a subband
coder with 2 bands having bandwidths of 6 kHz and
1 kHz, respectively. Figure 1 shows the analysis filterbank
for unequal subband splitting and critical subsampling of
the subbands.

0-7803-3192-3/96 $5.00 ©1996 IEEE
`
[Figure 1 (block diagram): the input s(n) at f_s = 16 kHz is split into the higher subband (6-7 kHz), output x_h(n) at f_s = 4 kHz, and the lower subband (0-6 kHz) at f_s = 12 kHz; a (-1)^n modulation term appears in the diagram.]

Figure 1. Analysis filterbank for subband splitting
and critical subsampling of the subband
signals.
`
The analysis filterbank is implemented using the efficient
structure for sampling rate conversion with a fractional
ratio of the sampling rates, as described for example by
Crochiere et al. [6].
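
The fractional conversion ratios implied by Figure 1 can be worked out with a few lines of Python. This is only a sketch of the rate arithmetic, not of the filterbank itself; the helper names `resampling_ratio` and `critical_rate_hz` are hypothetical.

```python
from fractions import Fraction


def critical_rate_hz(bandwidth_hz: int) -> int:
    """Critical sampling rate for a real-valued band of the given width."""
    return 2 * bandwidth_hz


def resampling_ratio(fs_in_hz: int, fs_out_hz: int) -> Fraction:
    """Rational factor L/M: upsample by L, then downsample by M."""
    return Fraction(fs_out_hz, fs_in_hz)


FS_IN_HZ = 16_000

# Lower subband: 0-6 kHz, critically sampled at 12 kHz.
lower = resampling_ratio(FS_IN_HZ, critical_rate_hz(6_000))
# Higher subband: 4 kHz output rate as given in Figure 1.
upper = resampling_ratio(FS_IN_HZ, 4_000)

print(lower, upper)  # 3/4 1/4
```

The lower band therefore needs a genuinely fractional conversion (3/4), while the higher band is a plain integer decimation by 4.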
`
`3. ENCODING OF THE 0-6 KHZ BAND
For encoding the decimated lower subband, code-excited
linear prediction (CELP, Atal et al. [7]) is used. The
coder operates on speech frames of 10 ms (120 samples).
In the following, the main parts of the CELP codec are
described: LP analysis, pitch analysis, fixed codebook
structure and perceptual weighting filter.
The subframe lengths used for the different parts of the
codec are indicated in Figure 2: 5 ms for the pitch
analysis and 2.5 ms for the fixed codebook.
`
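[Figure 2 (timing diagram): within one 10 ms frame, LPC spans the whole frame, LTP is updated in two 5 ms subframes, and CB in four 2.5 ms subframes; time axis in ms with ticks at 0, 2.5, 5, 7.5, 10.]

Figure 2. Update of the codec parameters.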
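
The frame and subframe sizes in samples follow directly from the 12 kHz rate of the lower subband. The sketch below just does that conversion; `samples` and `subframe_starts` are hypothetical helper names.

```python
FS_LOWER_HZ = 12_000  # sampling rate of the decimated lower subband


def samples(duration_ms: float, fs_hz: int = FS_LOWER_HZ) -> int:
    """Convert a duration in ms to a whole number of samples."""
    n = duration_ms * fs_hz / 1000
    if n != int(n):
        raise ValueError("duration is not a whole number of samples")
    return int(n)


def subframe_starts(frame_ms: float, subframe_ms: float) -> list[int]:
    """Start index (in samples) of each subframe within one frame."""
    count = round(frame_ms / subframe_ms)
    return [i * samples(subframe_ms) for i in range(count)]


print(samples(10), samples(5), samples(2.5))  # 120 60 30
print(subframe_starts(10, 5))                 # [0, 60]
print(subframe_starts(10, 2.5))               # [0, 30, 60, 90]
```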
`
`3.1. LP-analysis
The linear prediction (LP) analysis uses a covariance-lattice
approach as described by Cumani [8]. The analysis
frame length is 15 ms, centered around the middle of the
second LTP subframe, resulting in a look-ahead of 5 ms. In
our realization the order of the LP filter is 14. The prediction
coefficients are updated every 10 ms. Prior to solving
the equations for the coefficients, the covariance matrix is
modified by weighting it with a binomial window having
an effective bandwidth of 80 Hz [9]. This provides a small
amount of bandwidth expansion to the final LP filter coefficients,
which is advantageous for the subsequent conversion of
the LP filter parameters to line spectral frequencies (LSFs)
[10], as well as for the quantization of the LSFs.
The LSFs are encoded with 44 bits using interframe moving
average prediction and split vector quantization,
resulting in an average spectral distortion of 1 dB.
A linear interpolation of the LP filter coefficients is
performed for the first LTP subframe. This is done in the
LSF domain between the quantized current coefficient set
and the quantized coefficient set of the previous frame. For
the second subframe, no interpolation is performed.
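
The per-subframe LSF sets described above could be sketched as follows. The text only says the interpolation is linear; the equal weighting of 0.5 used here is an assumption, and the function names are hypothetical.

```python
def interpolate_lsf(prev_lsf, cur_lsf, weight_cur=0.5):
    """Linear interpolation between two LSF vectors.

    weight_cur = 0.5 (the midpoint) is an assumed value; the paper
    does not state the interpolation weights.
    """
    if len(prev_lsf) != len(cur_lsf):
        raise ValueError("LSF vectors must have the same order")
    return [(1.0 - weight_cur) * p + weight_cur * c
            for p, c in zip(prev_lsf, cur_lsf)]


def subframe_lsfs(prev_lsf, cur_lsf):
    """LSF sets for the two LTP subframes of one frame."""
    first = interpolate_lsf(prev_lsf, cur_lsf)   # interpolated
    second = list(cur_lsf)                       # no interpolation
    return [first, second]


print(subframe_lsfs([1.0, 2.0], [2.0, 4.0]))  # [[1.5, 3.0], [2.0, 4.0]]
```

Interpolating in the LSF domain has the convenient property that a weighted average of two ordered LSF vectors is again ordered.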
`3.2. Pitch analysis
Every 5 ms, long-term prediction (LTP) is carried out as a
combination of open-loop and closed-loop LT analysis. For
each 10 ms speech frame, an open-loop pitch estimate is
calculated using a weighted correlation measure to avoid
multiples of the pitch period. Thus, a smoothed estimate of the
pitch contour is obtained. In the first subframe a focussed
closed-loop adaptive codebook search is performed around
the open-loop estimate T_ol, and in the second subframe a
restricted search is performed around the pitch lag T_cl,1 of the
closed-loop analysis of the first subframe, as depicted
in Figure 3.
`
[Figure 3 (diagram): search ranges in samples for the 1st subframe and the 2nd subframe.]

Figure 3. Long-term analysis using combined open-loop
and closed-loop analysis and a focussed
search strategy.
This procedure results in a delta encoding scheme leading
to 8+6=14 bits for coding the 2 pitch lags.
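
A delta encoding of the two lags with 8+6 bits might look like the sketch below. It is an illustration under several assumptions that the paper does not confirm: lags are integers only (the fractional resolution is ignored), the first index is the offset from the minimum lag of 30 samples, and the 6-bit index covers a symmetric window of -32..+31 samples around the first lag.

```python
T_MIN = 30                              # minimum lag in samples (from the text)
ABS_BITS = 8                            # bits for the 1st-subframe lag
DELTA_BITS = 6                          # bits for the 2nd-subframe lag
DELTA_OFFSET = 1 << (DELTA_BITS - 1)    # 32: assumed symmetric window


def encode_lags(t1: int, t2: int) -> tuple[int, int]:
    """Code t1 absolutely and t2 relative to t1 (assumed layout)."""
    idx1 = t1 - T_MIN
    idx2 = t2 - t1 + DELTA_OFFSET
    if not 0 <= idx1 < (1 << ABS_BITS):
        raise ValueError("first lag outside the 8-bit range")
    if not 0 <= idx2 < (1 << DELTA_BITS):
        raise ValueError("second lag outside the 6-bit delta window")
    return idx1, idx2


def decode_lags(idx1: int, idx2: int) -> tuple[int, int]:
    """Inverse of encode_lags."""
    t1 = idx1 + T_MIN
    t2 = t1 + idx2 - DELTA_OFFSET
    return t1, t2


print(encode_lags(80, 83))   # (50, 35)
print(decode_lags(50, 35))   # (80, 83)
```

Restricting the second search to a window around the first lag is what makes the shorter 6-bit index sufficient.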
The closed-loop search is performed using an adaptive
codebook filled with previously computed excitation
samples. The minimum pitch lag is half of the subframe
length, i.e. T_min = 30 samples. Additionally, in the lower
delay range a fractional pitch approach is used [11], as
shown in Figure 4.
`
[Figure 4 (diagram): delay axis from τ_max to τ_min relative to the current speech frame (120 samples), divided into an integer delay range and a fractional pitch range.]

Figure 4. Combined integer and fractional pitch
search ranges during closed-loop adaptive
codebook search (τ_max = 193 samples).
`
Informal listening tests indicate that a resolution of 1/2
sample is sufficient for an improvement in speech quality.
The pitch gain is nonuniformly scalar quantized with
4 bits.
`
`
`
`
`3.3. Codebook
Every 2.5 ms (30 samples), an excitation vector is selected
from a modified 16-bit ternary sparse codebook, as
described by Salami et al. [12]. An innovation vector contains
4 nonzero pulses, as shown in Table 1.
`
Pulse   Amplitude   Position
1       ±1          0, 4, 8, 12, 16, 20, 24, 28
2       ±1          1, 5, 9, 13, 17, 21, 25, 29
3       ±1          2, 6, 10, 14, 18, 22, 26, (30)
4       ±1          3, 7, 11, 15, 19, 23, 27, (31)

Table 1. 16-bit ternary sparse codebook [12].
`
Note that the last position of the 3rd and 4th pulse falls
outside the subframe boundary. This gives the possibility
of a variable number of pulses per frame.
Each pulse has 8 possible positions. Therefore the
position of each pulse is encoded with 3 bits. Furthermore,
each pulse amplitude is encoded with 1 bit, resulting
in a total of 16 bits for the 4 pulses.
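
To make the 16-bit structure concrete, the sketch below expands a codebook index into a 30-sample innovation vector following the position tracks of Table 1. The bit layout (4 bits per pulse, low 3 bits for the position slot, high bit for the sign) is an assumption chosen for illustration; the paper does not specify how the bits are packed.

```python
SUBFRAME_LEN = 30   # samples per 2.5 ms subframe
TRACKS = 4          # one pulse per track, as in Table 1


def decode_innovation(index: int) -> list[int]:
    """Expand a 16-bit index into a ternary (+1/0/-1) vector.

    Assumed layout: pulse t uses bits 4t..4t+3; the low 3 bits select
    one of 8 positions t, t+4, ..., t+28, the 4th bit is the sign.
    """
    if not 0 <= index < (1 << 16):
        raise ValueError("index must fit in 16 bits")
    vector = [0] * SUBFRAME_LEN
    for track in range(TRACKS):
        field = (index >> (4 * track)) & 0xF
        slot = field & 0x7
        sign = -1 if field & 0x8 else 1
        position = track + TRACKS * slot
        # Positions 30 and 31 lie outside the subframe: the pulse is dropped.
        if position < SUBFRAME_LEN:
            vector[position] = sign
    return vector


full = decode_innovation(0x0000)
short = decode_innovation(0xFFFF)
print(sum(abs(v) for v in full), sum(abs(v) for v in short))  # 4 2
```

With index 0xFFFF every pulse selects its last slot, so the 3rd and 4th pulses land on positions 30 and 31 and vanish, which illustrates the variable number of pulses mentioned above.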
Due to the structured nature of the codebook, a fast
search procedure is possible. Additionally, a focussed search
approach is used to further reduce the computational load
of the codebook search [12].
To reduce the dynamic range of the fixed codebook gain,
a fixed gain predictor is used. The gain predictor predicts
the log energy of the current fixed codebook vector
based on the log energy of the previously selected scaled
fixed codebook vector. This is done in a similar way as in
a preliminary version of ITU-T G.729 [13]. The residual of
the gain predictor is nonuniformly scalar quantized with 4
bits.
`
`3.4. Perceptual weighting filter
The perceptual weighting filter W(z) used during the
minimization process has a transfer function of the form

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}, \qquad (1)$$

with A(z) being the LP analysis filter, using unquantized
LP filter coefficients. Different sets of weighting factors
{γ1, γ2} are used for the adaptive and fixed codebook
searches. During the adaptive codebook search, weighting
factors {1.0, 0.4} are used, and during the fixed codebook
search {0.9, 0.8} are used. This was found to give better
results than a fixed weighting filter.
The perceptual weighting filter is updated every 5 ms, using
in the first subframe a linear interpolation between the
current unquantized filter coefficients and the unquantized
filter coefficients of the previous frame. In the second
subframe the current unquantized coefficients are used.
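
In terms of coefficients, evaluating A(z/γ) amounts to scaling the k-th coefficient of A(z) by γ^k. The sketch below builds the numerator and denominator of W(z) this way. It assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p with the leading 1 stored at index 0 (the paper does not state its sign convention), and the function names are hypothetical.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [coef * gamma ** k for k, coef in enumerate(a)]


def weighting_filter(a, gamma1, gamma2):
    """Numerator and denominator coefficients of W(z) = A(z/g1) / A(z/g2)."""
    return bandwidth_expand(a, gamma1), bandwidth_expand(a, gamma2)


# Toy 2nd-order example (the codec itself uses order 14).
a = [1.0, -0.5, 0.25]

# Factors quoted in the text for the adaptive codebook search.
num, den = weighting_filter(a, 1.0, 0.4)
print(num)  # [1.0, -0.5, 0.25]
```

With γ1 = 1.0 the numerator is simply A(z), so only the denominator is bandwidth-expanded during the adaptive codebook search.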
`
`4. ENCODING OF THE 6-7KHZ BAND
The classification experiment shows that the full bandwidth
is selected mainly during the unvoiced parts of the
speech signal. This indicates that the higher subband has a
noise-like character. Furthermore, experiments showed
that during unvoiced parts it is sufficient to add some
noise-like spectral components above 6 kHz to obtain the
perceptual speech quality of a 7 kHz speech signal. Therefore,
the higher subband (6-7 kHz) is simply represented
by white noise with adjustment of the short-term energy.
At the output of the analysis filterbank of Section 2 the
subband signal x_h(n) has a sampling rate of 4 kHz, i.e. a
bandwidth of 2 kHz (see Figure 1). Since the input signal is
bandlimited to 7 kHz, a further reduction of the sampling
rate by a factor of 2 can be done without an anti-aliasing
filter. The input frame of 10 ms (20 samples) is split
into 4 subframes of 2.5 ms, each consisting of 5 samples.
For each subframe the short-term energy is logarithmically
quantized with 3 bits using MA prediction with a fixed set
of coefficients. This results in a bitrate of 1.2 kbit/s for the
higher subband.
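
The idea of replacing the higher subband by energy-matched white noise can be sketched as follows. This illustration omits the 3-bit logarithmic quantizer and the MA prediction entirely and passes the exact subframe energies to the decoder side; the Gaussian noise source and the function names are assumptions.

```python
import math
import random

SUBFRAME_LEN = 5   # samples per 2.5 ms subframe at 2 kHz


def subframe_energies(x):
    """Short-term energy of each 5-sample subframe."""
    if len(x) % SUBFRAME_LEN:
        raise ValueError("length must be a multiple of the subframe length")
    return [sum(v * v for v in x[i:i + SUBFRAME_LEN])
            for i in range(0, len(x), SUBFRAME_LEN)]


def synthesize_noise(energies, rng):
    """White noise whose subframe energies match the given values."""
    out = []
    for energy in energies:
        noise = [rng.gauss(0.0, 1.0) for _ in range(SUBFRAME_LEN)]
        noise_energy = sum(v * v for v in noise)
        scale = math.sqrt(energy / noise_energy) if noise_energy > 0 else 0.0
        out.extend(scale * v for v in noise)
    return out


frame = [float(n) for n in range(20)]          # one 10 ms frame (toy data)
energies = subframe_energies(frame)
replica = synthesize_noise(energies, random.Random(0))

bits_per_frame = 4 * 3                         # 4 subframes, 3 bits each
print(bits_per_frame / 10, "kbit/s")           # 1.2 kbit/s
```

Only the energy contour survives; the waveform itself is discarded, which is why 12 bits per frame suffice.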
An informal listening test was performed using a
high-quality loudspeaker. The higher subband was processed
using the encoding scheme described above. The lower
band remained uncoded; however, the sampling rate conversion
of the lower subband was carried out. As a result,
it was difficult to distinguish between the original and the
processed speech signal. Thus, this very simple encoder can
be used to encode the subband from 6-7 kHz.
`
`5. BIT ERROR CONCEALMENT
`
For the scheme described so far, the overall bit rate
sums up to 15.8 kbit/s. This gives the possibility of using
2 parity bits per frame to reduce the sensitivity of the
codec to random bit errors up to BER = 10^-3. After
performing informal listening tests, it was concluded that the
LP coefficients are most sensitive to bit errors.
Therefore, the first parity bit is computed from the
44 bits of the LP coefficients. This bit is transmitted, and
at the decoder the parity bit is recomputed from the
received LP filter coefficients. If a parity error occurs, the
LP coefficient set is replaced by the values of the previous
frame.
The second parity bit is computed from the 8 bits of the
LTP index of the first subframe. If a parity error occurs,
the value of the LTP index is set to the integer delay value
of the previous subframe.
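
The parity check and fallback for the LP coefficients can be illustrated as below. Even parity is an assumption (the paper does not say which parity is used), the 44 LP bits are represented as a single integer for simplicity, and the function names are hypothetical.

```python
LP_BITS = 44


def parity_bit(value: int, nbits: int) -> int:
    """Even-parity bit over the lowest nbits of value (assumed convention)."""
    return bin(value & ((1 << nbits) - 1)).count("1") & 1


def conceal_lp(received_bits: int, received_parity: int,
               previous_bits: int) -> int:
    """Return the LP index to use: received if parity matches, else previous."""
    if parity_bit(received_bits, LP_BITS) != received_parity:
        return previous_bits
    return received_bits


sent = 0b1011
check = parity_bit(sent, LP_BITS)      # 1 (three bits set)
corrupted = sent ^ (1 << 20)           # single bit error on the channel

print(conceal_lp(sent, check, previous_bits=0b0110))       # 11
print(conceal_lp(corrupted, check, previous_bits=0b0110))  # 6
```

A single parity bit detects any odd number of bit errors in the protected field but misses an even number, which fits the stated target of low random error rates.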
`
`6. BIT ALLOCATION
`
In the previous sections, the main components of the wideband
codec were presented. According to Table 2, a final
bit rate of 16 kbit/s is achieved.

Band      Parameter     Bits per update   Bits per frame   Bit rate
6-7 kHz   Energy        4*3 bit           12 bits          1.2 kbit/s
0-6 kHz   LPC                             44 bits          4.4 kbit/s
          LTP-Index     8+6 bit           14 bits          2.2 kbit/s (LTP-Index + LTP-Gain)
          LTP-Gain      2*4 bit            8 bits
          CB-Index      4*16 bit          64 bits          8.0 kbit/s (CB-Index + CB-Gain)
          CB-Gain       4*4 bit           16 bits
          Parity bits                      2 bits          0.2 kbit/s
Total                                                      16.0 kbit/s

Table 2. Bit allocation for a 10 ms frame of the proposed
16 kbit/s split-band wideband codec
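
The totals in Table 2 can be checked by summing the per-frame bit counts; the dictionary keys below simply mirror the table rows.

```python
FRAME_MS = 10

# Bits per 10 ms frame, taken from Table 2.
ALLOCATION = {
    "6-7 kHz energy": 4 * 3,
    "LPC": 44,
    "LTP-Index": 8 + 6,
    "LTP-Gain": 2 * 4,
    "CB-Index": 4 * 16,
    "CB-Gain": 4 * 4,
    "Parity bits": 2,
}

total_bits = sum(ALLOCATION.values())
without_parity = total_bits - ALLOCATION["Parity bits"]

# Bits per millisecond is numerically equal to kbit/s.
print(total_bits, total_bits / FRAME_MS)          # 160 16.0
print(without_parity, without_parity / FRAME_MS)  # 158 15.8
```

The 15.8 kbit/s figure without parity bits matches the rate quoted in Section 5.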
`
`
`
`
`7. EXTENSION TO VARIABLE BITRATE
One of the results of Section 2 was that 40% of the
speech signal with a voice activity of 95% could be encoded
using just the subband from 0-6 kHz. This means that during
40% of the active talk time it is not necessary to encode the
higher subband. This encourages us to consider different
coding schemes.
The first alternative is to omit the bits necessary to
encode the higher subband. This leads to a coder with a
variable bitrate. Transmission of 1 bit/frame is necessary
in this case to indicate the encoding mode of the higher
subband.
The second possibility is to use these bits to encode the
lower subband more precisely, resulting in an encoder with
an overall constant bitrate, but variable bitrate in the two
bands. Since this happens mostly during voiced
parts of a speech signal, this is advantageous with respect to
speech quality. Again, one additional bit/frame is necessary
to indicate the encoding mode of the higher subband.
Another possibility was recently presented by the author
in [14], based on a similar approach in [15] in the context
of wideband ADPCM. The wideband speech signal is encoded
using only the spectral bandwidth from 0-6 kHz and
the higher subband is neglected. The missing components
above 6 kHz are replaced at the receiver by interpolating
the lower subband signal from 12 kHz to 16 kHz using an
interpolation filter with a cut-off frequency of 7 kHz, which
violates the interpolation rules. This is possible due to the fact
that the signals within the frequency ranges 5-6 kHz and
6-7 kHz exhibit a similar distribution of energy along the time
axis for a given speech sound. In this case a fixed bitrate of
14.8 kbit/s is achieved, with only very small degradations
compared to the fixed bit-rate version of the previous
sections.
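
The rates of the first and third alternatives can be estimated from the bit allocation. The sketch assumes that exactly 40% of all frames drop the 12 higher-band bits and that the mode flag is sent in every frame on top of the 160 bits; both are simplifying assumptions, and the resulting average is not a figure reported in the paper.

```python
def average_bitrate_kbps(full_bits, skipped_bits, mode_bits,
                         skip_fraction, frame_ms=10):
    """Mean bit rate when a fraction of frames omits the higher subband."""
    with_upper = full_bits + mode_bits
    without_upper = full_bits - skipped_bits + mode_bits
    mean_bits = ((1.0 - skip_fraction) * with_upper
                 + skip_fraction * without_upper)
    return mean_bits / frame_ms


# First alternative: variable bitrate with a 1-bit mode flag per frame.
variable = average_bitrate_kbps(full_bits=160, skipped_bits=12,
                                mode_bits=1, skip_fraction=0.4)

# Third alternative: higher subband never transmitted, no mode flag.
fixed = (160 - 12) / 10

print(round(variable, 2), fixed)
```

Under these assumptions the variable-rate coder averages roughly 15.6 kbit/s, while dropping the higher subband altogether gives the 14.8 kbit/s quoted above.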
`
`8. CONCLUSION
In this paper a split-band encoding scheme for 16 kbit/s
wideband speech coding has been presented. It is based
on two unequal subbands from 0-6 kHz and 6-7 kHz. This
approach was motivated by experimental evaluation of the
instantaneous signal bandwidth of the speech frames. The
coder operates on speech frames of 10 ms, using a look-ahead
of 5 ms for LP analysis. Together with the 10 ms
delay introduced by the analysis-synthesis filterbank, this
results in an overall algorithmic delay of 25 ms. In informal
listening tests the speech quality was judged to be
better than that of the CCITT G.722 wideband codec operating at
48 kbit/s.
`
`ACKNOWLEDGEMENTS
This work has been supported by the Research Center of
Deutsche Telekom AG. The author would like to thank
especially Mr. G. Schroder. Acknowledgements are made to
Prof. P. Vary and the colleagues of the speech coding group
for inspiring discussions, especially T. Fingscheidt.
`
`REFERENCES
[1] CCITT, "7 kHz Audio Coding within 64 kbit/s", in
Recommendation G.722, vol. Fascicle III.4 of Blue Book,
pp. 269-341. Melbourne 1988.
[2] Study Group 15 ITU-T, "Report February 1995 Meeting
Working Party 2/15", February 1995, Geneva,
Switzerland.
[3] S. Dimolitsas, "ITU Voice Coding Standards:
Standardization of Voice Coding Milestones Reached",
comp.speech Newsgroup, February 1995.
[4] ITU-T SG15 COM 15-152, "G.729 - Coding of
Speech at 8 kbps using conjugate-structure
algebraic-code-excited linear-prediction (CS-ACELP)".
[5] European Broadcasting Union (EBU), Sound Quality
Assessment Material (Recordings for Subjective Test),
no. 422 204-2 edition.
[6] R.E. Crochiere and L.R. Rabiner, Multirate Digital
Signal Processing, Signal Processing. Prentice-Hall,
1983.
[7] B.S. Atal and M.R. Schroeder, "Stochastic Coding
of Speech Signals at Very Low Bit Rates", in Proc.
Int. Conf. Communication (ICC), May 1984, pp. 1610-1613.
[8] A. Cumani, "On a Covariance-Lattice Algorithm for
Linear Prediction", in Proc. Int. Conf. Acoust., Speech,
Signal Processing, ICASSP, Paris, France, 1982, pp.
651-654.
[9] Y. Tohkura, F. Itakura, and S. Hashimoto,
"Spectral Smoothing Technique in PARCOR Speech
Analysis-Synthesis", IEEE Trans. Acoust., Speech,
Signal Processing, vol. 26, no. 6, pp. 587-596,
December 1978.
[10] P. Kabal and R.P. Ramachandran, "The Computation
of Line Spectral Frequencies Using Chebyshev
Polynomials", IEEE Trans. Acoust., Speech, Signal
Processing, vol. 34, no. 6, pp. 1419-1426, December 1986.
[11] J. S. Marques, J. M. Tribolet, I. M. Trancoso, and L. B.
Almeida, "Pitch Prediction with Fractional Delays in
CELP Coding", in Proc. EUROSPEECH, Genoa,
Italy, 1989, pp. 509-513.
[12] R. Salami, C. Laflamme, J-P. Adoul, A. Kataoka,
S. Hayashi, C. Lamblin, D. Massaloux, S. Proust,
P. Kroon, and Y. Shoham, "Description of the
Proposed ITU-T 8 kb/s Speech Coding Standard", in Proc.
IEEE Workshop on Speech Coding, Annapolis,
Maryland, USA, September 1995, pp. 3-4.
[13] R. Salami, C. Laflamme, J.-P. Adoul, and D.
Massaloux, "A Toll Quality 8 kb/s Speech Codec for
the Personal Communications System (PCS)", IEEE
Trans. Vehicular Technology, vol. 43, no. 3, pp. 808-816,
August 1994.
[14] J. Paulus, "Variable Bitrate Wideband Speech Coding
Using Perceptually Motivated Thresholds", in Proc.
IEEE Workshop on Speech Coding for Telecommunications,
Annapolis, Maryland, USA, September 1995,
pp. 35-36.
[15] M. Dietrich, "Performance and Implementation of a
Robust ADPCM Algorithm for Wideband Speech Coding
with 64 kbit/s", in Proc. Int. Zurich Seminar on
Digital Communications, Zürich, Switzerland, March
1984.
`