Ordentlich et al.

US005235669A
[11] Patent Number: 5,235,669
[45] Date of Patent: Aug. 10, 1993

[54] LOW-DELAY CODE-EXCITED LINEAR-PREDICTIVE CODING OF WIDEBAND SPEECH AT 32 KBITS/SEC
[75] Inventors: Erik Ordentlich, Palo Alto, Calif.; Yair Shoham, Berkeley Heights, N.J.
[73] Assignee: AT&T Laboratories, Murray Hill, N.J.
[21] Appl. No.: 546,627
[22] Filed: Jun. 29, 1990
[51] Int. Cl.5: G10L 9/00
[52] U.S. Cl.: 395/2
[58] Field of Search: 381/29-50; 364/513.5, 724.19
`
[56] References Cited

U.S. PATENT DOCUMENTS

Re. 32,580   1/1988   Atal et al.        381/40
4,133,976    1/1979   Atal et al.
4,472,832    9/1984   Atal et al.
4,694,298    9/1987   Milan
4,701,954   10/1987   Atal
4,811,261    3/1989   Kobayashi et al.
4,827,517    5/1989   Atal et al.
4,941,178    7/1990   Chuang             381/41

FOREIGN PATENT DOCUMENTS

0331405   2/1989   European Pat. Off.
2624675   6/1989   France
OTHER PUBLICATIONS
“Stochastic Coding of Speech Signals at Very Low Bit Rates”, Proc. IEEE Int. Conf. Comm., May 1984, B. S. Atal and M. R. Schroeder, pp. 1610-1612.
“Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates”, Proc. IEEE Int. Conf. ASSP, 1985, pp. 937-940, M. R. Schroeder and B. S. Atal.
“A Class of Analysis-by-Synthesis Predictive Coders for High Quality Speech Coding at Rates Between 4.8 and 16 kbits/s”, IEEE J. on Sel. Areas in Comm., SAC-6(2), Feb. 1988, pp. 353-363, P. Kroon and E. F. Deprettere.
“Predictive Coding of Speech Signals and Subjective Error Criteria”, IEEE Tr. ASSP, vol. ASSP-27, No. 3, Jun. 1979, pp. 247-254, B. S. Atal and M. R. Schroeder.
“Low Delay Code Excited Linear Predictive (LD-CELP) Coding of Wide Band Speech at 32Kbit/sec.,” MS Thesis, EE Dept., MIT, Jul. 1990, E. Ordentlich, Abstract only (p. 1).
“Transform Coding of Audio Signals Using Perceptual Noise Criteria”, IEEE Sel. Areas in Comm., vol. 6, No. 2, Feb. 1988, pp. 314-323, J. D. Johnston.
“G.722, A New CCITT Coding Standard for Digital Transmission of Wideband Audio Signals”, IEEE Comm. Mag., vol. 26, No. 1, Jan. 1988, pp. 8-15, P. Mermelstein.
“Strategies for improving the performance of CELP coders at low bit rates”, ICASSP ’88 (1988 International Conf. on Acoustics, Speech, and Signal Processing), vol. 1, pp. 151-154, IEEE, New York; P. Kroon, et al.
“Some experiments of 7 kHz audio coding at 16 kbit/s”, ICASSP ’89 (1989 International Conference on Acoustics, Speech, and Signal Processing), May 1989, vol. 1, pp. 192-195, IEEE, New York; R. Drogo de Jacovo, et al.
“On different vector predictive coding schemes and their application to low bit rates speech coding”, Signal Processing IV: Theories and Applications (Proceedings of EUSIPCO-88, 4th European Signal Processing Conf.), Sep. 1988, vol. II, pp. 871-874, North Holland Publishing Co.; F. Bottau, et al.
Primary Examiner: David D. Knepper
Attorney, Agent, or Firm: William Ryan

[57] ABSTRACT
A digital communication system, e.g., a CELP coder/decoder based system, is improved for use with a wide-band signal such as a high-quality speech signal by modifying the noise weighting filter used in such systems to include a filter section which affects primarily the spectral tilt of the weighting filter, in addition to a filter component reflecting formant frequency information in the input signal. Alternatively, the weighting is modified to reflect perceptual transform techniques.

20 Claims, 2 Drawing Sheets
`
`Ex. 1022 / Page 1 of 8
`Apple v. Saint Lawrence
`
`
`
[Sheet 1 of 2. FIG. 1: block diagram of the transmitter (codebook, weighting filters W(z), LPC analysis), the channel, and the receiver.]

[Sheet 2 of 2. FIG. 2 and FIG. 3.]
`
LOW-DELAY CODE-EXCITED LINEAR-PREDICTIVE CODING OF WIDEBAND SPEECH AT 32 KBITS/SEC
`
FIELD OF THE INVENTION
The present invention relates to methods and apparatus for efficiently coding and decoding signals, including speech signals. More particularly, this invention relates to methods and apparatus for coding and decoding high quality speech signals. Yet more particularly, this invention relates to digital communication systems, including those offering ISDN services, employing such coders and decoders.
`
BACKGROUND OF THE INVENTION
Recent years have witnessed many improvements in coding and decoding for digital communications systems. U.S. Pat. No. 4,133,976, issued on Jan. 9, 1979; Re. 32,580, issued on Jan. 19, 1988; 4,701,954, issued on Oct. 27, 1987; 4,472,832, issued on Sep. 18, 1984; and 4,827,517, issued on May 2, 1989, to B. S. Atal, et al. and assigned to the assignee of the present invention, all present important improvements in this field.

One area of such improvements has come to be called code excited linear predictive (CELP) coders, which are described, e.g., in B. S. Atal and M. R. Schroeder, “Stochastic Coding of Speech Signals at Very Low Bit Rates,” Proc. IEEE Int. Conf. Comm., May 1984, p. 48.1; M. R. Schroeder and B. S. Atal, “Code-Excited Linear Predictive (CELP): High Quality Speech at Very Low Bit Rates,” Proc. IEEE Int. Conf. ASSP, 1985, pp. 937-940; P. Kroon and E. F. Deprettere, “A Class of Analysis-by-Synthesis Predictive Coders for High-Quality Speech Coding at Rates Between 4.8 and 16 Kb/s,” IEEE J. on Sel. Areas in Comm., SAC-6(2), Feb. 1988, pp. 353-363; and the above-cited U.S. Pat. No. 4,827,517. Such techniques have found application, e.g., in voice grade telephone channels, including mobile telephone channels.

The prospect of high-quality multi-channel/multi-user speech communication via the emerging ISDN has increased interest in advanced coding algorithms for wideband speech. In contrast to the standard telephony band of 200 to 3400 Hz, wideband speech is assigned the band 50 to 7000 Hz and is sampled at a rate of 16000 Hz for subsequent digital processing. The added low frequencies increase the voice naturalness and enhance the sense of closeness, whereas the added high frequencies make the speech sound crisper and more intelligible. The overall quality of wideband speech as defined above is sufficient for sustained commentary-grade voice communication as required, for example, in multi-user audio-video teleconferencing. Wideband speech is, however, harder to code since the data is highly unstructured at high frequencies and the spectral dynamic range is very high. In some network applications, there is also a requirement for a short coding delay, which limits the size of the processing frame and reduces the efficiency of the coding algorithm. This adds another dimension to the difficulty of this coding problem.
`
SUMMARY OF THE INVENTION
Many of the advantages of the well-known CELP coders and decoders are not fully realized when applied to the communication of wide-band speech information (e.g., in the frequency range 50 to 7000 Hz). The present invention, in typical embodiments, seeks to adapt existing CELP techniques to extend to communication of such wide-band speech and other such signals.

More particularly, the illustrative embodiments of the present invention provide for modified weighting of input signals to enhance the relative magnitude of signal energy to noise energy as a function of frequency. Additionally, the overall spectral tilt of the weighting filter response characteristic is advantageously decoupled from the determination of the response at particular frequencies corresponding, e.g., to formants.

Thus, whereas prior art CELP coders employ a weighting filter based primarily on the formant content, it proves advantageous in accordance with a teaching of the present invention to use a cascade of a prior art weighting filter and an additional filter section for controlling the spectral tilt of the composite weighting filter.
`
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 shows a digital communication system using the present invention.
FIG. 2 shows a modification of the system of FIG. 1 in accordance with the embodiment of the present invention.
FIG. 3 shows a modified frequency response resulting from the application of a typical embodiment of the present invention.
`
DETAILED DESCRIPTION
To simplify the description of the present invention, the above-cited publications by Atal and Schroeder, and the cited U.S. Pat. No. 4,133,976 to Atal and Schroeder are hereby incorporated by reference and should be deemed included in the present disclosure as if set forth in their entirety.

The basic structure of conventional CELP (as described, e.g., in the references cited above) is shown in FIG. 1.

Shown are the transmitter portion at the top of the figure, the receiver portion at the bottom and the various parameters (j, g, M, B and A) that are transmitted via a communication channel 50. CELP is based upon the traditional excitation-filter model where an excitation signal, drawn from an excitation codebook 10, is used as an input to an all-pole filter which is usually a cascade of an LPC-derived filter 1/A(z) (20 in FIG. 1) and a so-called pitch filter 1/B(z), 30. The LPC polynomial is given by

$$A(z) = \sum_{i=0}^{M} a_i z^{-i}$$

and is obtained by a standard M-th order LPC analysis of the speech signal. The pitch filter is determined by the polynomial

[equation for B(z) not legible in the source]

where P is the current “pitch” lag, a value that best represents the current periodicity of the input, and the b_j's are the current pitch taps. Most often, the order of the pitch filter is q=1 and it is rarely more than 3. Both polynomials A(z) and B(z) are monic.
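
The excitation-filter model described here can be illustrated with a minimal Python sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion as the LPC analysis (the text only says "standard"), uses the monic convention A(z) = 1 + a_1 z^-1 + ... + a_M z^-M, and leaves out the pitch filter 1/B(z). All function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns the monic list [1, a_1, ..., a_M]."""
    a = [1.0] + [0.0] * order
    err = r[0]  # prediction error energy, assumed non-zero
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient of stage m
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a


def synthesize(excitation, a, gain=1.0):
    """All-pole filter 1/A(z): s[n] = gain * e[n] - sum_{i>=1} a_i * s[n - i]."""
    out = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for i in range(1, len(a)):
            if n - i >= 0:
                acc -= a[i] * out[n - i]
        out.append(acc)
    return out


# Illustrative use: a 2nd-order analysis of a short frame, then one
# (hypothetical) codebook vector passed through the synthesis filter.
frame = [0.0, 1.0, 0.8, 0.3, -0.2, -0.5, -0.4, -0.1]
lpc = levinson_durbin(autocorrelation(frame, 2), 2)
candidate = synthesize([1.0, 0.0, 0.0, 0.0, 0.0], lpc, gain=0.5)
```

In a full coder, the excitation passed to `synthesize` would be each codebook entry in turn, scaled by the gain g, and the filter state would carry over between vectors; the sketch starts every call from a zero state for brevity.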
The CELP algorithm implements a closed-loop (analysis-by-synthesis) search procedure for finding the best excitation and, possibly, the best pitch parameters. In the excitation search loop, each of the excitation vectors is passed through the LPC and pitch filters in an effort to find the best match (as determined by comparator 40 and minimizing circuit 41) to the output, usually, in a weighted mean-squared error (WMSE) sense. As seen in FIG. 1, the WMSE matching is accomplished via the use of a noise-weighting filter W(z) 35. The input speech s(n) is first pre-filtered by W(z) and the resulting signal x(n) (X(z)=S(z) W(z)) serves as a reference signal in the closed-loop search. The quantized version of x(n), denoted by y(n), is a filtered excitation, closest to x(n) in an MSE sense. The filter used in the search loop is the weighted synthesis filter H(z)=W(z)/[B(z) A(z)]. Observe, however, that the final quantized signal is obtained at the output of the unweighted synthesis filter 1/[B(z) A(z)], which means that W(z) is not used by the receiver to synthesize the output. This loop essentially (but not strictly) minimizes the WMSE between the input and output, namely, the MSE of the signal

$$\left(\hat{S}(z) - S(z)\right) W(z)$$

The filter W(z) is important for achieving a high perceptual quality in CELP systems and it plays a central role in the CELP-based wideband coder presented here, as will become evident.

The closed-loop search for the best pitch parameters is usually done by passing segments of past excitation through the weighted filter and optimizing B(z) for minimum WMSE with respect to the target signal X(z). The search algorithm will be described in more detail.

As shown in FIG. 1, the codebook entries are scaled by a gain factor g applied to scaling circuit 15. This gain may either be explicitly optimized and transmitted (forward mode) or may be obtained from previously quantized data (backward mode). A combination of the backward and forward modes is also sometimes used (see, e.g., AT&T Proposal for the CCITT 16 Kb/s speech coding standard, COM N No. 2, STUDY GROUP N, “Description of 16 Kb/s Low-Delay Code-excited Linear Predictive Coding (LD-CELP) Algorithm,” March 1989). See also U.S. patent application Ser. No. 07/298451, entitled “A Low-Delay Code-Excited Linear Predictive Coder for Speech or Audio,” by J-H. Chen, filed Jan. 17, 1989, and assigned to the assignee of the present invention, which application is hereby incorporated in this disclosure by reference as if set forth in its entirety.

In general, the CELP transmitter codes and transmits the following five entities: the excitation vector (j), the excitation gain (g), the pitch lag (p), the pitch tap(s) (B), and the LPC parameters (A). The overall transmission bit rate is determined by the sum of all the bits required for coding these entities. The transmitted information is used at the receiver in well-known fashion to recover the original input information.

The CELP is a look-ahead coder: it needs to have in its memory a block of “future” samples in order to process the current sample, which obviously creates a coding delay. The size of this block depends on the coder’s specific structure. In general, different parts of the coding algorithm may need different-size future blocks. The smallest block of immediate future samples is usually required by the codebook search algorithm and is equal to the codevector dimension. The pitch loop may need a longer block size, depending on the update rate of the pitch parameters. In a conventional CELP, the longest block length is determined by the LPC analyzer, which usually needs about 20 msec worth of future data. The resulting long coding delay of the conventional CELP is therefore unacceptable in some applications. This has motivated the development of the Low-Delay CELP (LD-CELP) algorithm (see above-cited AT&T Proposal for the CCITT 16 Kb/s speech coding standard).

The Low-Delay CELP derives its name from the fact that it uses the minimum possible block length: the vector dimension. In other words, the pitch and LPC analyzers are not allowed to use any data beyond that limit. So, the basic coding delay unit corresponds to the vector size, which is only a few samples (between 5 and 10 samples). The LPC analyzer typically needs a much longer data block than the vector dimension. Therefore, in LD-CELP the LPC analysis can be performed on a long enough block of most recent past data plus (possibly) the available new data. Notice, however, that a coded version of the past data is available at both the receiver and the transmitter. This suggests an extremely efficient coding mode called backward-adaptive coding. In this mode, the receiver duplicates the LPC analysis of the transmitter using the same quantized past data and generates the LPC parameters locally. No LPC information is transmitted and the saved bits are assigned to the excitation. This, in turn, helps in further reducing the coding delay since having more bits for the excitation allows using shorter input blocks. This coding mode is, however, sensitive to the level of the quantization noise. A high-level noise adversely affects the quality of the LPC analysis and reduces the coding efficiency. Therefore, the method is not applicable to low-rate coders. It has been successfully applied in 16 Kb/s LD-CELP systems (see above-cited AT&T Proposal for the CCITT 16 Kb/s speech coding standard) but not as successfully at lower rates.

When backward LPC analysis becomes inefficient due to excessive noise, a forward-mode LPC analysis can be employed within the structure of LD-CELP. In this mode, LPC analysis is performed on a clean past signal and LPC information is sent to the receiver. Forward-mode and combined forward-backward mode LD-CELP systems are currently under study.

The pitch analysis can also be performed in a backward mode using only past quantized data. This analysis, however, was found to be extremely sensitive to channel errors which appear at the receiver only and cause a mismatch between the transmitter and receiver. So, in LD-CELP, the pitch filter B(z) is either completely avoided or is implemented in a combined backward-forward mode where some information about the pitch delay and/or pitch tap is sent to the receiver.

The LD-CELP proposed here for coding wideband speech at 32 Kb/s advantageously employs backward LPC. Two versions of the coder will be described in greater detail below. The first includes forward-mode pitch loop and the second does not use pitch loop at all. The general structure of the coder is that of FIG. 1, excluding the transmission of the LPC information. Also, if the pitch loop is not used, B(z)=1 and the pitch information is not transmitted. The algorithmic details of the coder are given below.

A fundamental result in MSE waveform coding is that the quantization noise has a flat spectrum at the point of minimization, namely, the difference signal between the output and the target is white. On the other hand, the input speech signal is non-white and actually has a wide spectral dynamic range due to the formant structure and the high-frequency roll-off. As a result,
the signal-to-noise ratio is not uniform across the frequency range. The SNR is high at the spectral peaks and is low at the spectral valleys. Unless the flat noise is reshaped, the low-energy spectral information is masked by the noise and an audible distortion results. This problem has been recognized and addressed in the context of CELP coding of telephony-bandwidth speech (see “Predictive Coding of Speech Signals and Subjective Error Criteria,” IEEE Tr. ASSP, Vol. ASSP-27, No. 3, June 1979, pp. 247-254). The solution was in a form of a noise weighting filter, added to the CELP search loop as shown in FIG. 1. The standard form of this filter is:

$$W(z) = \frac{A(z/g_1)}{A(z/g_2)}, \qquad 0 < g_2 < g_1 \le 1 \tag{1}$$

where A(z) is the LPC polynomial. The effect of g1 or g2 is to move the roots of A(z) towards the origin, de-emphasizing the spectral peaks of 1/A(z). With g1 and g2, as in Eq. (1), the response of W(z) has valleys (anti-formants) at the formant locations and the inter-formant areas are emphasized. In addition, the amount of an overall spectral roll-off is reduced, compared to the speech spectral envelope as given by 1/A(z).

In the CELP system of FIG. 1, the unweighted error signal E(z)=Y(z)-X(z) is white since this is the signal that is actually minimized. The final error signal is

$$\hat{S}(z) - S(z) = E(z)\, W^{-1}(z) \tag{2}$$

and has the spectral shape of $W^{-1}(z)$. This means that the noise is now concentrated in the formant peaks and is attenuated in between the formants. The idea behind this noise shaping is to exploit the auditory masking effect. Noise is less audible if it shares the same spectral band with a high-level tone-like signal. Capitalizing on this effect, the filter W(z) greatly enhances the perceptual quality of the CELP coder.

In contrast to the standard telephony band of 200 to 3400 Hz, the wideband speech considered here is characterized by a spectral band of 50 to 7000 Hz. The added low frequencies enhance the naturalness and authenticity of the speech sounds. The added high frequencies make the sound crisper and more intelligible. The signal is sampled at 16 KHz for digital processing by the CELP system. The higher sampling rate and the added low frequencies both make the signal more predictable and the overall prediction gain is typically higher than that of standard telephony speech. The spectral dynamic range is considerably higher than that of telephony speech where the added high-frequency region of 3400 to 6000 Hz is usually near the bottom of this range. Based on the analysis in the previous section, it is clear that, while coding of the low-frequency region should be easier, coding of the high-frequency region poses a severe problem. The initial unweighted spectral SNR tends to be highly negative in this region. On the other hand, the auditory system is quite sensitive in this region and the quantization distortions are clearly audible in a form of crackling and hiss. Noise weighting is, therefore, more crucial in wideband CELP. The balance of low to high frequency coding is more delicate. The major effort in this study was towards finding a good weighting filter that would allow a better control of this balance.

A starting point for the better understanding of the technical advance contributed by the present invention is the weighting filter of the conventional CELP as in Eq. (1). The initial goal was to find a set (g1, g2) for best perceptual performance. It was found that, similar to the narrow-band case, the values g1=0.9, g2=0.4 produced reasonable results. However, the performance left room for improvement. It was found that the filter W(z) as in Eq. (1) has an inherent limitation in modeling the formant structure and the required spectral tilt concurrently. The spectral tilt has been found to be controlled approximately by the difference g1-g2. The tilt is global in nature and it is not readily possible to emphasize it separately at high frequencies. Also, changing the tilt affects the shape of the formants of W(z). A pronounced tilt is obtained along with higher and wider formants, which puts too much noise at low frequencies and in between the formants. The conclusion was that the formant and tilt problems ought to be decoupled. The approach taken was to use W(z) only for formant modeling and to add another section for controlling the tilt only. The general form of the new filter is

$$W_p(z) = W(z)\, P(z) \tag{3}$$

where P(z) is responsible for the tilt only. The implementation of this improvement is shown in FIG. 2 where the weighting filter 35 of FIG. 1 is replaced by a cascade of filter 220 having a response given by P(z) with the original filter 35. The cascaded filter Wp(z) is given by Eq. (3). Various forms of P(z) were studied. They will be mentioned here very briefly. A detailed discussion of these forms can be found in E. Ordentlich, “Low Delay Code Excited Linear Predictive (LD-CELP) Coding of Wide Band Speech at 32 Kbit/sec.,” MS Thesis, EE Dept., MIT, July 1990. The appendix to this application includes pre-published portions of this thesis. This appendix is enclosed on an interim basis; when the exact date of publication of the thesis is known, the appendix will be selectively deleted.

The forms studied were: fixed three-pole (two complex, one real) section, fixed three-zero section, adaptive three-pole section, adaptive three-zero section and adaptive two-pole section. The fixed sections were designed to have an unequal but fixed spectral tilt, with a steeper tilt at high frequencies. The coefficients of the adaptive sections were dynamically computed via LPC analysis to make $P^{-1}(z)$ a 2nd or 3rd-order approximation of the current spectrum, which essentially captures only the spectral tilt.

In addition, one mode chosen for P(z) was a frequency-domain step function at mid range. This attenuates the response at the lower half of the range and boosts it at the higher half by a predetermined constant. A 14th order all-pole section was used for this purpose.

It was found by careful listening tests that the two-pole section was the best choice. For this case, the section is given by

[equation for P(z) not legible in the source]

The coefficients p_i are found by applying the standard LPC algorithm to the first three correlation coefficients of the current-frame LPC inverse filter (A(z)) sequence a_i. The parameter δ is used to adjust the spectral tilt of P(z). The value δ=0.7 was found to be a good choice. This form of P(z), in combination with W(z), where g1=0.98, g2=0.8, yielded the best perceptual performance over all other systems studied in this work.
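
The cascade of a formant-weighting section and a tilt section can be sketched in Python as follows. Two forms are assumptions, because the corresponding equations are not legible in this copy: W(z) is taken in the conventional form A(z/g1)/A(z/g2), and the two-pole section is taken as P(z) = 1/(1 + p_1 δ z^-1 + p_2 δ^2 z^-2). The function names are hypothetical, and the first-order A(z) in the example is a toy envelope, not data from the work described.

```python
import cmath
import math


def bandwidth_expand(a, g):
    """Coefficients of A(z/g): a_i is scaled by g**i, moving roots toward the origin."""
    return [c * g ** i for i, c in enumerate(a)]


def response(coeffs, omega):
    """Evaluate sum_i c_i * exp(-j * omega * i) at radian frequency omega."""
    return sum(c * cmath.exp(-1j * omega * i) for i, c in enumerate(coeffs))


def tilt_denominator(a, delta=0.7):
    """2nd-order LPC fit to the first three correlations of the a_i sequence.

    Returns [1, p1*delta, p2*delta**2], the denominator of the ASSUMED
    two-pole form P(z) = 1 / (1 + p1*delta*z^-1 + p2*delta^2*z^-2).
    """
    r = [sum(a[i] * a[i + k] for i in range(len(a) - k)) for k in range(3)]
    det = r[0] * r[0] - r[1] * r[1]  # 2x2 normal equations solved directly
    p1 = r[1] * (r[2] - r[0]) / det
    p2 = (r[1] * r[1] - r[0] * r[2]) / det
    return [1.0, p1 * delta, p2 * delta ** 2]


def weighting_gain(a, omega, g1=0.98, g2=0.8, delta=0.7):
    """|W(z) P(z)| on the unit circle, with W(z) ASSUMED to be A(z/g1)/A(z/g2)."""
    w = abs(response(bandwidth_expand(a, g1), omega)) / abs(
        response(bandwidth_expand(a, g2), omega))
    p = 1.0 / abs(response(tilt_denominator(a, delta), omega))
    return w * p


# Toy envelope 1/A(z) with strong low-frequency energy and high-frequency roll-off.
a_toy = [1.0, -0.9]
low = weighting_gain(a_toy, 0.0)        # weighting at 0 Hz
high = weighting_gain(a_toy, math.pi)   # weighting at half the sampling rate
```

For this toy envelope, the cascade weights the upper band edge more heavily than the lower one, which is the decoupled tilt behavior the text attributes to P(z); changing `delta` alters that tilt without touching `g1` and `g2`.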
In addition to the P(z) method described above, the first non-P(z) method is based on psycho-acoustical perception theory (see Brian C. J. Moore, “An Introduction to the Psychology of Hearing,” Academic Press Inc., 1982) currently applied in Perceptual Transform Coding (PTC) of audio signals (see also James D. Johnston, “Transform Coding of Audio Signals Using Perceptual Noise Criteria,” IEEE Sel. Areas in Comm., 6(2), February 1988, and K. Brandenburg, “A Contribution to the Methods and the Evaluation of Quality for High-Grade Music Coding,” PhD Thesis, Univ. of Erlangen-Nurnberg, 1989). In PTC, known psycho-acoustical auditory masking effects are used in calculating a Noise Threshold Function (NTF) of the frequency. According to the theory, any noise below this threshold should be inaudible. The NTF is used in determining the bit allocation and/or the quantizer step size for each of the transform coefficients, which are later used to re-synthesize the signal with the desired quantization noise shape. The idea studied in the work was to use the NTF in the framework of an LPC-based coder like CELP. Basically, W(z) was designed to have the NTF shape for the current frame. The NTF, however, may be a fairly complex function of the frequency, with sharp dips and peaks. Therefore, a high-order pole-zero filter is advantageously used in accurate modeling of the NTF, as is well-known in the art. Related teachings for selecting a filter having the NTF characteristic will be found in U.S. patent application Ser. No. 423,088 by K. Brandenburg, et al., filed Oct. 18, 1989, and assigned to the assignee of the present invention.
A second approach that has been successfully used is split-band CELP coding in which the signal is first split into low and high frequency bands by a set of two quadrature-mirror filters (QMF) and then each band is coded separately by its own coder. A similar method was used in P. Mermelstein, “G.722, a New CCITT Coding Standard for Digital Transmission of Wideband Audio Signals,” IEEE Comm. Mag., pp. 8-15, January 1988. This approach provides the flexibility of assigning different bit rates to the low and high bands and of attaining an optimum balance of high and low spectral distortions. Flexibility is also achieved in the sense that entirely different coding systems can be employed in each band, optimizing the performance for each frequency range. In the present illustrative embodiment, however, LD-CELP is used in all (two) bands. Various bit rate assignments were tried for the two bands under the constraint of a total rate of 32 Kb/s. The best ratio of low to high band bit assignment was found to be 3:1.
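
The arithmetic behind the 3:1 assignment can be checked with a few lines of Python. The sketch assumes a critically sampled two-band QMF split, so that each band runs at half of the 16 kHz input rate; that decimation factor and the helper names are assumptions, not details stated in the text.

```python
def split_rate(total_kbps, low_share, high_share):
    """Divide a total bit rate between two bands in the ratio low_share:high_share."""
    unit = total_kbps / (low_share + high_share)
    return low_share * unit, high_share * unit


def bits_per_sample(rate_kbps, sample_rate_hz):
    """Average number of bits available per sample at a given band rate."""
    return rate_kbps * 1000.0 / sample_rate_hz


low_kbps, high_kbps = split_rate(32, 3, 1)  # 3:1 split of the 32 Kb/s total
band_rate_hz = 16000 / 2                    # ASSUMED: each QMF band decimated by 2
low_bits = bits_per_sample(low_kbps, band_rate_hz)
high_bits = bits_per_sample(high_kbps, band_rate_hz)
```

Under these assumptions the low band receives 24 Kb/s and the high band 8 Kb/s, which corresponds to 3 bits and 1 bit per band sample respectively.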
All of the systems mentioned above can include various pitch loops, i.e., various orders for B(z) and various numbers of bits for the pitch taps. One interesting point is that it sometimes proves advantageous to use a system without a pitch loop, i.e., B(z)=1. In fact, in some tests, such a system offered the best result. The explanation for this may be the following. The pitch loop is based on using past residual sequences as an initial excitation of the synthesis filter. This constitutes a 1st-stage quantization in a two-stage VQ system where the past residual serves as an adaptive codebook. Two-stage VQ is known to be inferior to single-stage (regular) VQ, at least from an MSE point of view. In other words, the bits are better spent if used with a single excitation codebook. Now, the pitch loop offers mainly perceptual improvement due to the enhanced periodicity, which is important in low rate coders like 4-8 Kb/s CELP, where the MSE SNR is low anyway. At 32 Kb/s, with high MSE SNR, the pitch loop contribution does not outweigh the efficiency of a single VQ configuration and, therefore, there is no reason for its use.
While the above description has proceeded in terms of wide-band speech, it will be clear to those skilled in the art that the present invention will have application in other particular contexts. FIG. 3 shows a representative modification of the frequency response of the overall weighting filter in accordance with the teachings of the present invention. In FIG. 3 a solid line represents weighting in accordance with a prior art technique and the dotted curve corresponds to an illustrative modified response in accordance with a typical exemplary embodiment of the present invention.
We claim:
1. A method for coding a speech signal comprising
  generating a plurality of parameter signals representative of said speech signal,
  synthesizing a plurality of estimate signals based on said parameter signals, each of said estimate signals being identified by a corresponding index signal,
  performing a frequency weighted comparison of each of said estimate signals with said speech signal, said weighting relatively emphasizing
    perceptually significant frequencies within a band-limited frequency spectrum of said speech signal, and
    higher frequencies to a greater degree than lower frequencies within said band-limited spectrum, and
  representing said speech signal by at least one of said corresponding index signals identifying said estimate signals which, upon said comparison, meet a preselected comparison criterion.
2. The method of claim 1 wherein said comparison criterion comprises a minimization of the difference between said weighted speech signal and each of said weighted estimate signals.
3. The method of claim 1 wherein said perceptually significant frequencies are associated with formants of said speech signal.
4. The method of claim 1 further comprising representing said speech signal by at least one of said parameter signals.
5. The method of claim 1 wherein said synthesizing of said estimate signals comprises applying each of an ordered plurality of code vectors to a synthesizing filter to generate a corresponding one of said estimate signals.
6. The method of claim 5 wherein said parameter signals comprise signals representative of short term characteristics of said speech signal.
7. The method of claim 1 wherein said emphasizing said higher frequencies to a greater degree than said lower frequencies comprises imposing a tilt to said band-limited spectrum of said speech signal and each of said estimate signals.
8. The method of claim 7 wherein said frequency weighted comparison comprises filtering said speech signal and each of said estimate signals using a filter which imposes said tilt to said band-limited spectrum of said speech signal and each of said estimate signals, and comparing the result of said filtering of said speech signal with the result of said filtering of each of said estimate signals.
9. The method of claim 8 wherein said filter comprises quadrature mirror filter sections having a plurality of frequency bands, and said generating a plurality of parameter signals, said synthesizing a plurality of estimate signals, said performing a frequency weighted comparison, and said representing said speech signal by said index signals, are performed separately for each frequency band.
10. The method of claim 8 wherein said filter comprises
  a first frequency weighting section for relatively emphasizing said perceptually significant frequencies, and
  a second frequency weighting section for imposing said tilt to said band-limited spectrum of said speech signal and each of said estimate signals.
11. The method of claim 10 wherein said second frequency weighting section is characterized by a transfer function, P(z), where
`
15. The method of claim 10 wherein said second frequency weighting section comprises a two-zero filter section.
16. The method of claim 10 wherein said transfer function of said second frequency weighting section is characterized by
  a first function for the range of frequencies below a predetermined frequency substantially in the center of said band-limited spectrum of said input signal, and
  a second function for the range of frequencies above said predetermined point.
17. The method of claim 16 wherein said second frequency weighting section comprises a filter section of order greater than 3.
18. The method of claim 17 wherein said second frequency weighting section comprises a filter section of order 14.
19. The method of claim 10 wherein