United States Patent [19]
Ordentlich et al.

US005235669A
[11] Patent Number: 5,235,669
[45] Date of Patent: Aug. 10, 1993

[54] LOW-DELAY CODE-EXCITED LINEAR-PREDICTIVE CODING OF WIDEBAND SPEECH AT 32 KBITS/SEC
[75] Inventors: Erik Ordentlich, Palo Alto, Calif.; Yair Shoham, Berkeley Heights, N.J.
[73] Assignee: AT&T Laboratories, Murray Hill, N.J.
[21] Appl. No.: 546,627
[22] Filed: Jun. 29, 1990
[51] Int. Cl.5: G10L 9/00
[52] U.S. Cl.: 395/2
[58] Field of Search: 381/29-50; 364/513.5, 724.19

[56] References Cited

U.S. PATENT DOCUMENTS
Re. 32,580 1/1988 Atal et al. 381/40
4,133,976 1/1979 Atal et al.
4,472,832 9/1984 Atal et al.
4,694,298 9/1987 Milan
4,701,954 10/1987 Atal
4,811,261 3/1989 Kobayashi et al.
4,827,517 5/1989 Atal et al.
4,941,178 7/1990 Chuang 381/41

FOREIGN PATENT DOCUMENTS
0331405 2/1989 European Pat. Off.
2624675 6/1989 France

OTHER PUBLICATIONS
“Stochastic Coding of Speech Signals at Very Low Bit Rates”, Proc. IEEE Int. Conf. Comm., May 1984, B. S. Atal and M. R. Schroeder, pp. 1610-1612.
“Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates”, Proc. IEEE Int. Conf. ASSP, 1985, pp. 937-940, M. R. Schroeder and B. S. Atal.
“A Class of Analysis-by-Synthesis Predictive Coders for High Quality Speech Coding at Rates Between 4.8 and 16 kbits/s”, IEEE J. on Sel. Areas in Comm., SAC-6(2), Feb. 1988, pp. 353-363, P. Kroon and E. F. Deprettere.
“Predictive Coding of Speech Signals and Subjective Error Criteria”, IEEE Tr. ASSP, vol. ASSP-27, No. 3, Jun. 1979, pp. 247-254, B. S. Atal and M. R. Schroeder.
“Low Delay Code Excited Linear Predictive (LD-CELP) Coding of Wide Band Speech at 32 Kbit/sec.,” MS Thesis, EE Dept., MIT, Jul. 1990, E. Ordentlich, Abstract only (p. 1).
“Transform Coding of Audio Signals Using Perceptual Noise Criteria”, IEEE Sel. Areas in Comm., vol. 6, No. 2, Feb. 1988, pp. 314-323, J. D. Johnston.
“G.722, A New CCITT Coding Standard for Digital Transmission of Wideband Audio Signals”, IEEE Comm. Mag., vol. 26, No. 1, Jan. 1988, pp. 8-15, P. Mermelstein.
“Strategies for improving the performance of CELP coders at low bit rates”, ICASSP ’88 (1988 International Conf. on Acoustics, Speech, and Signal Processing), vol. 1, pp. 151-154, IEEE, New York; P. Kroon, et al.
“Some experiments of 7 kHz audio coding at 16 kbit/s”, ICASSP ’89 (1989 International Conference on Acoustics, Speech, and Signal Processing), May 1989, vol. 1, pp. 192-195, IEEE, New York; R. Drogo de Jacovo, et al.
“On different vector predictive coding schemes and their application to low bit rates speech coding”, Signal Processing IV: Theories and Applications (Proceedings of EUSIPCO-88, 4th European Signal Processing Conf.) Sep. 1988, vol. II, pp. 871-874, North Holland Publishing Co.; F. Bottau, et al.

Primary Examiner: David D. Knepper
Attorney, Agent, or Firm: William Ryan

[57] ABSTRACT
An improved digital communication system, e.g., a CELP coder/decoder based system, is improved for use with a wide-band signal such as a high-quality speech signal by modifying the noise weighting filter used in such systems to include a filter section which affects primarily the spectral tilt of the weighting filter in addition to a filter component reflecting formant frequency information in the input signal. Alternatively, the weighting is modified to reflect perceptual transform techniques.

20 Claims, 2 Drawing Sheets

Ex. 1022 / Page 1 of 8
Apple v. Saint Lawrence

[FIG. 1 (Sheet 1 of 2): block diagram with a TRANSMITTER section (CODEBOOK 10, weighting filters W(z) 35, comparator 40, LPC ANALYSIS), a CHANNEL 50, and a receiver section (CODEBOOK 70). Drawing not reproduced.]

[FIG. 2 (Sheet 2 of 2): elements 210, 220 and 35. Drawing not reproduced.]

[FIG. 3 (Sheet 2 of 2). Drawing not reproduced.]

LOW-DELAY CODE-EXCITED LINEAR-PREDICTIVE CODING OF WIDEBAND SPEECH AT 32 KBITS/SEC

FIELD OF THE INVENTION
The present invention relates to methods and apparatus for efficiently coding and decoding signals, including speech signals. More particularly, this invention relates to methods and apparatus for coding and decoding high quality speech signals. Yet more particularly, this invention relates to digital communication systems, including those offering ISDN services, employing such coders and decoders.

BACKGROUND OF THE INVENTION
Recent years have witnessed many improvements in coding and decoding for digital communications systems. U.S. Pat. No. 4,133,976, issued on Jan. 9, 1979; RE 32,580, issued on Jan. 19, 1988; 4,701,954, issued on Oct. 27, 1987; 4,472,832, issued on Sep. 18, 1984, and 4,827,517, issued on May 2, 1989, to B. S. Atal, et al. and assigned to the assignee of the present invention, all present important improvements in this field.
One area of such improvements has come to be called code excited linear predictive (CELP) coders and is, e.g., described in B. S. Atal and M. R. Schroeder, “Stochastic Coding of Speech Signals at Very Low Bit Rates,” Proc. IEEE Int. Conf. Comm., May 1984, p. 48.1; M. R. Schroeder and B. S. Atal, “Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates,” Proc. IEEE Int. Conf. ASSP, 1985, pp. 937-940; P. Kroon and E. F. Deprettere, “A Class of Analysis-by-Synthesis Predictive Coders for High-Quality Speech Coding at Rates Between 4.8 and 16 Kb/s,” IEEE J. on Sel. Areas in Comm., SAC-6(2), Feb. 1988, pp. 353-363, and the above-cited U.S. Pat. No. 4,827,517. Such techniques have found application, e.g., in voice grade telephone channels, including mobile telephone channels.
The prospect of high-quality multi-channel/multi-user speech communication via the emerging ISDN has increased interest in advanced coding algorithms for wideband speech. In contrast to the standard telephony band of 200 to 3400 Hz, wideband speech is assigned the band 50 to 7000 Hz and is sampled at a rate of 16000 Hz for subsequent digital processing. The added low frequencies increase the voice naturalness and enhance the sense of closeness whereas the added high frequencies make the speech sound crisper and more intelligible. The overall quality of wideband speech as defined above is sufficient for sustained commentary-grade voice communication as required, for example, in multi-user audio-video teleconferencing. Wideband speech is, however, harder to code since the data is highly unstructured at high frequencies and the spectral dynamic range is very high. In some network applications, there is also a requirement for a short coding delay which limits the size of the processing frame and reduces the efficiency of the coding algorithm. This adds another dimension to the difficulty of this coding problem.

SUMMARY OF THE INVENTION
Many of the advantages of the well-known CELP coders and decoders are not fully realized when applied to the communication of wide-band speech information (e.g., in the frequency range 50 to 7000 Hz). The present invention, in typical embodiments, seeks to adapt existing CELP techniques to extend to communication of such wide-band speech and other such signals.
More particularly, the illustrative embodiments of the present invention provide for modified weighting of input signals to enhance the relative magnitude of signal energy to noise energy as a function of frequency. Additionally, the overall spectral tilt of the weighting filter response characteristic is advantageously decoupled from the determination of the response at particular frequencies corresponding, e.g., to formants.
Thus, whereas prior art CELP coders employ a weighting filter based primarily on the formant content, it proves advantageous in accordance with a teaching of the present invention to use a cascade of a prior art weighting filter and an additional filter section for controlling the spectral tilt of the composite weighting filter.

BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 shows a digital communication system using the present invention.
FIG. 2 shows a modification of the system of FIG. 1 in accordance with the embodiment of the present invention.
FIG. 3 shows a modified frequency response resulting from the application of a typical embodiment of the present invention.

DETAILED DESCRIPTION
To simplify the description of the present invention, the above-cited publications by Atal and Schroeder, and the cited U.S. Pat. No. 4,133,976 to Atal and Schroeder are hereby incorporated by reference and should be deemed included in the present disclosure as if set forth in their entirety.
The basic structure of conventional CELP (as described, e.g., in the references cited above) is shown in FIG. 1.
Shown are the transmitter portion at the top of the figure, the receiver portion at the bottom and the various parameters (j, g, M, B and A) that are transmitted via a communication channel 50. CELP is based upon the traditional excitation-filter model where an excitation signal, drawn from an excitation codebook 10, is used as an input to an all-pole filter which is usually a cascade of an LPC-derived filter 1/A(z) (20 in FIG. 1) and a so-called pitch filter 1/B(z), 30. The LPC polynomial is given by

$$A(z) = \sum_{i=0}^{M} a_i z^{-i}$$

and is obtained by a standard Mth-order LPC analysis of the speech signal. The pitch filter is determined by the polynomial

where P is the current “pitch” lag, a value that best represents the current periodicity of the input, and the b_j's are the current pitch taps. Most often, the order of the pitch filter is q=1 and it is rarely more than 3. Both polynomials A(z), B(z) are monic.
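
As an illustration of the "standard Mth-order LPC analysis" mentioned above, the following is a minimal sketch of the autocorrelation method with the Levinson-Durbin recursion. It is not taken from the patent: the function names, the absence of windowing, and the sample frame are assumptions made for the example. It returns the coefficients of a monic A(z), i.e. a_0 = 1.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation lags r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[t] * frame[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for A(z) = sum_i a_i z^-i with a_0 = 1.

    Returns (a, err): the monic coefficient list and the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Correlation of the current predictor with lag m.
        acc = sum(a[i] * r[m - i] for i in range(m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= (1.0 - k * k)
    return a, err


# Hypothetical short frame, only to show the call sequence.
frame = [0.0, 1.0, 0.5, -0.3, -0.8, -0.2, 0.4, 0.7]
a, err = levinson_durbin(autocorrelation(frame, 2), 2)
```

A practical coder would typically window the frame and use a higher order; those choices are outside what the text specifies.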
The CELP algorithm implements a closed-loop (analysis-by-synthesis) search procedure for finding the best excitation and, possibly, the best pitch parameters. In the excitation search loop, each of the excitation vectors is passed through the LPC and pitch filters in an effort to find the best match (as determined by comparator 40 and minimizing circuit 41) to the output, usually, in a weighted mean-squared error (WMSE) sense. As seen in FIG. 1, the WMSE matching is accomplished via the use of a noise-weighting filter W(z) 35. The input speech s(n) is first pre-filtered by W(z) and the resulting signal x(n) (X(z)=S(z) W(z)) serves as a reference signal in the closed-loop search. The quantized version of x(n), denoted by y(n), is a filtered excitation, closest to x(n) in an MSE sense. The filter used in the search loop is the weighted synthesis filter H(z)=W(z)/[B(z) A(z)]. Observe, however, that the final quantized signal is obtained at the output of the unweighted synthesis filter 1/[B(z) A(z)], which means that W(z) is not used by the receiver to synthesize the output. This loop essentially (but not strictly) minimizes the WMSE between the input and output, namely, the MSE of the signal $(S(z) - \hat{S}(z))\,W(z)$.
The filter W(z) is important for achieving a high perceptual quality in CELP systems and it plays a central role in the CELP-based wideband coder presented here, as will become evident.
The closed-loop search for the best pitch parameters is usually done by passing segments of past excitation through the weighted filter and optimizing B(z) for minimum WMSE with respect to the target signal X(z). The search algorithm will be described in more detail.
As shown in FIG. 1, the codebook entries are scaled by a gain factor g applied to scaling circuit 15. This gain may either be explicitly optimized and transmitted (forward mode) or may be obtained from previously quantized data (backward mode). A combination of the backward and forward modes is also sometimes used (see, e.g., AT&T Proposal for the CCITT 16 Kb/s speech coding standard, COM N No. 2, STUDY GROUP N, “Description of 16 Kb/s Low-Delay Code-excited Linear Predictive Coding (LD-CELP) Algorithm,” March 1989). See also U.S. patent application Ser. No. 07/298451, entitled “A Low-Delay Code-Excited Linear Predictive Coder for Speech or Audio,” by J-H. Chen, filed Jan. 17, 1989, and assigned to the assignee of the present invention, which application is hereby incorporated in this disclosure by reference as if set forth in its entirety.
In general, the CELP transmitter codes and transmits the following five entities: the excitation vector (j), the excitation gain (g), the pitch lag (p), the pitch tap(s) (B), and the LPC parameters (A). The overall transmission bit rate is determined by the sum of all the bits required for coding these entities. The transmitted information is used at the receiver in well-known fashion to recover the original input information.
The CELP is a look-ahead coder: it needs to have in its memory a block of “future” samples in order to process the current sample, which obviously creates a coding delay. The size of this block depends on the coder's specific structure. In general, different parts of the coding algorithm may need different-size future blocks. The smallest block of immediate future samples is usually required by the codebook search algorithm and is equal to the codevector dimension. The pitch loop may need a longer block size, depending on the update rate of the pitch parameters. In a conventional CELP, the longest block length is determined by the LPC analyzer which usually needs about 20 msec worth of future data. The resulting long coding delay of the conventional CELP is therefore unacceptable in some applications. This has motivated the development of the Low-Delay CELP (LD-CELP) algorithm (see above-cited AT&T Proposal for the CCITT 16 Kb/s speech coding standard).
The Low-Delay CELP derives its name from the fact that it uses the minimum possible block length, the vector dimension. In other words, the pitch and LPC analyzers are not allowed to use any data beyond that limit. So, the basic coding delay unit corresponds to the vector size, which is only a few samples (between 5 and 10 samples). The LPC analyzer typically needs a much longer data block than the vector dimension. Therefore, in LD-CELP the LPC analysis can be performed on a long enough block of most recent past data plus (possibly) the available new data. Notice, however, that a coded version of the past data is available at both the receiver and the transmitter. This suggests an extremely efficient coding mode called backward-adaptive coding. In this mode, the receiver duplicates the LPC analysis of the transmitter using the same quantized past data and generates the LPC parameters locally. No LPC information is transmitted and the saved bits are assigned to the excitation. This, in turn, helps in further reducing the coding delay since having more bits for the excitation allows using shorter input blocks. This coding mode is, however, sensitive to the level of the quantization noise. A high-level noise adversely affects the quality of the LPC analysis and reduces the coding efficiency. Therefore, the method is not applicable to low-rate coders. It has been successfully applied in 16 Kb/s LD-CELP systems (see above-cited AT&T Proposal for the CCITT 16 Kb/s speech coding standard) but not as successfully at lower rates.
When backward LPC analysis becomes inefficient due to excessive noise, a forward-mode LPC analysis can be employed within the structure of LD-CELP. In this mode, LPC analysis is performed on a clean past signal and LPC information is sent to the receiver. Forward-mode and combined forward-backward mode LD-CELP systems are currently under study.
The pitch analysis can also be performed in a backward mode using only past quantized data. This analysis, however, was found to be extremely sensitive to channel errors which appear at the receiver only and cause a mismatch between the transmitter and receiver. So, in LD-CELP, the pitch filter B(z) is either completely avoided or is implemented in a combined backward-forward mode where some information about the pitch delay and/or pitch tap is sent to the receiver.
The LD-CELP proposed here for coding wideband speech at 32 Kb/s advantageously employs backward LPC. Two versions of the coder will be described in greater detail below. The first includes forward-mode pitch loop and the second does not use pitch loop at all. The general structure of the coder is that of FIG. 1, excluding the transmission of the LPC information. Also, if the pitch loop is not used, B(z)=1 and the pitch information is not transmitted. The algorithmic details of the coder are given below.
A fundamental result in MSE waveform coding is that the quantization noise has a flat spectrum at the point of minimization, namely, the difference signal between the output and the target is white. On the other hand, the input speech signal is non-white and actually has a wide spectral dynamic range due to the formant structure and the high-frequency roll-off. As a result,
the signal-to-noise ratio is not uniform across the frequency range. The SNR is high at the spectral peaks and is low at the spectral valleys. Unless the flat noise is reshaped, the low-energy spectral information is masked by the noise and an audible distortion results. This problem has been recognized and addressed in the context of CELP coding of telephony-bandwidth speech (see “Predictive Coding of Speech Signals and Subjective Error Criteria,” IEEE Tr. ASSP, Vol. ASSP-27, No. 3, June 1979, pp. 247-254). The solution was in a form of a noise weighting filter, added to the CELP search loop as shown in FIG. 1. The standard form of this filter is:

$$W(z) = \frac{A(z/g_1)}{A(z/g_2)}, \qquad 0 < g_2 < g_1 \le 1 \tag{1}$$

where A(z) is the LPC polynomial. The effect of g1 or g2 is to move the roots of A(z) towards the origin, de-emphasizing the spectral peaks of 1/A(z). With g1 and g2, as in Eq. (1), the response of W(z) has valleys (anti-formants) at the formant locations and the inter-formant areas are emphasized. In addition, the amount of an overall spectral roll-off is reduced, compared to the speech spectral envelope as given by 1/A(z).
In the CELP system of FIG. 1, the unweighted error signal E(z)=Y(z)-X(z) is white since this is the signal that is actually minimized. The final error signal is

$$\hat{S}(z) - S(z) = E(z)\,W^{-1}(z) \tag{2}$$

and has the spectral shape of W^{-1}(z). This means that the noise is now concentrated in the formant peaks and is attenuated in between the formants. The idea behind this noise shaping is to exploit the auditory masking effect. Noise is less audible if it shares the same spectral band with a high-level tone-like signal. Capitalizing on this effect, the filter W(z) greatly enhances the perceptual quality of the CELP coder.
In contrast to the standard telephony band of 200 to 3400 Hz, the wideband speech considered here is characterized by a spectral band of 50 to 7000 Hz. The added low frequencies enhance the naturalness and authenticity of the speech sounds. The added high frequencies make the sound crisper and more intelligible. The signal is sampled at 16 KHz for digital processing by the CELP system. The higher sampling rate and the added low frequencies both make the signal more predictable and the overall prediction gain is typically higher than that of standard telephony speech. The spectral dynamic range is considerably higher than that of telephony speech where the added high-frequency region of 3400 to 6000 Hz is usually near the bottom of this range. Based on the analysis in the previous section, it is clear that, while coding of the low-frequency region should be easier, coding of the high-frequency region poses a severe problem. The initial unweighted spectral SNR tends to be highly negative in this region. On the other hand, the auditory system is quite sensitive in this region and the quantization distortions are clearly audible in a form of crackling and hiss. Noise weighting is, therefore, more crucial in wideband CELP. The balance of low to high frequency coding is more delicate. The major effort in this study was towards finding a good weighting filter that would allow a better control of this balance.
A starting point for the better understanding of the technical advance contributed by the present invention is the weighting filter of the conventional CELP as in Eq. (1). The initial goal was to find a set (g1, g2) for best perceptual performance. It was found that, similar to the narrow-band case, the values g1=0.9, g2=0.4 produced reasonable results. However, the performance left room for improvement. It was found that the filter W(z) as in Eq. (1) has an inherent limitation in modeling the formant structure and the required spectral tilt concurrently. The spectral tilt has been found to be controlled approximately by the difference g1-g2. The tilt is global in nature and it is not readily possible to emphasize it separately at high frequencies. Also, changing the tilt affects the shape of the formants of W(z). A pronounced tilt is obtained along with higher and wider formants, which puts too much noise at low frequencies and in between the formants. The conclusion was that the formant and tilt problems ought to be decoupled. The approach taken was to use W(z) only for formant modeling and to add another section for controlling the tilt only. The general form of the new filter is

$$W_p(z) = W(z)\,P(z) \tag{3}$$

where P(z) is responsible for the tilt only. The implementation of this improvement is shown in FIG. 2 where the weighting filter 35 of FIG. 1 is replaced by a cascade of filter 220 having a response given by P(z) with the original filter 35. The cascaded filter Wp(z) is given by Eq. (3). Various forms of P(z) were studied. They will be mentioned here very briefly. A detailed discussion of these forms can be found in E. Ordentlich, “Low Delay Code Excited Linear Predictive (LD-CELP) Coding of Wide Band Speech at 32 Kbit/sec.,” MS Thesis, EE Dept., MIT, July 1990. The appendix to this application includes pre-published portions of this thesis. This appendix is enclosed on an interim basis; when the exact date of publication of the thesis is known, the appendix will be selectively deleted.
The forms studied were: fixed three-pole (two complex, one real) section, fixed three-zero section, adaptive three-pole section, adaptive three-zero section and adaptive two-pole section. The fixed sections were designed to have an unequal but fixed spectral tilt, with a steeper tilt at high frequencies. The coefficients of the adaptive sections were dynamically computed via LPC analysis to make P^{-1}(z) a 2nd or 3rd-order approximation of the current spectrum, which essentially captures only the spectral tilt.
In addition, one mode chosen for P(z) was a frequency-domain step function at mid range. This attenuates the response at the lower half of the range and boosts it at the higher half by a predetermined constant. A 14th order all-pole section was used for this purpose.
It was found by careful listening tests that the two-pole section was the best choice. For this case, the section is given by

The coefficients p_i are found by applying the standard LPC algorithm to the first three correlation coefficients of the current-frame LPC inverse filter (A(z)) sequence a_i. The parameter δ is used to adjust the spectral tilt of P(z). The value δ=0.7 was found to be a good choice. This form of P(z), in combination with W(z), where g1=0.98, g2=0.8, yielded the best perceptual performance over all other systems studied in this work.
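
The behavior of the weighting filter discussed above can be checked numerically. The sketch below is illustrative only and assumes the form W(z) = A(z/g1)/A(z/g2) for Eq. (1); the one-formant A(z), the function names and the test frequencies are hypothetical. It evaluates the gain of W(z) at and away from a formant, and computes the tilt coefficients p_i by a 2nd-order LPC fit to the first three correlation lags of the sequence a_i, as the text describes. The expression for P(z) itself is not reproduced in this text, so the sketch stops at the p_i.

```python
import cmath
import math


def bandwidth_expand(a, g):
    """Coefficients of A(z/g): a_i -> a_i * g**i (roots scaled toward the origin)."""
    return [c * g ** i for i, c in enumerate(a)]


def poly_response(coeffs, omega):
    """Evaluate sum_i coeffs[i] * z^-i at z = exp(j*omega)."""
    return sum(c * cmath.exp(-1j * omega * i) for i, c in enumerate(coeffs))


def weighting_gain_db(a, g1, g2, omega):
    """Gain in dB of W(z) = A(z/g1) / A(z/g2) at normalized frequency omega."""
    num = poly_response(bandwidth_expand(a, g1), omega)
    den = poly_response(bandwidth_expand(a, g2), omega)
    return 20.0 * math.log10(abs(num) / abs(den))


def tilt_coefficients(a):
    """2nd-order LPC fit to the first three correlation lags of the sequence a_i."""
    r = [sum(a[i] * a[i + k] for i in range(len(a) - k)) for k in range(3)]
    k1 = -r[1] / r[0]
    e1 = r[0] * (1.0 - k1 * k1)
    k2 = -(r[2] + k1 * r[1]) / e1
    return [1.0, k1 * (1.0 + k2), k2]


# Hypothetical one-formant example: pole pair of 1/A(z) at radius 0.95, angle pi/4.
a = [1.0, -2.0 * 0.95 * math.cos(math.pi / 4), 0.95 ** 2]
at_formant = weighting_gain_db(a, 0.9, 0.4, math.pi / 4)
between = weighting_gain_db(a, 0.9, 0.4, 3 * math.pi / 4)
p = tilt_coefficients(a)
```

With these inputs the gain at the formant frequency comes out lower than the gain away from it, which matches the "valleys at the formant locations" described for Eq. (1).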
In addition to the P(z) method described above, the first non-P(z) method is based on psycho-acoustical perception theory (see Brian C. J. Moore, “An Introduction to the Psychology of Hearing,” Academic Press Inc., 1982) currently applied in Perceptual Transform Coding (PTC) of audio signals (see also James D. Johnston, “Transform Coding of Audio Signals Using Perceptual Noise Criteria,” IEEE Sel. Areas in Comm., 6(2), February 1988, and K. Brandenburg, “A Contribution to the Methods and the Evaluation of Quality for High-Grade Music Coding,” PhD Thesis, Univ. of Erlangen-Nurnberg, 1989). In PTC, known psycho-acoustical auditory masking effects are used in calculating a Noise Threshold Function (NTF) of the frequency. According to the theory, any noise below this threshold should be inaudible. The NTF is used in determining the bit allocation and/or the quantizer step size for each of the transform coefficients which, later, are used to re-synthesize the signal with the desired quantization noise shape. The idea studied in the work was to use the NTF in the framework of an LPC-based coder like CELP. Basically, W(z) was designed to have the NTF shape for the current frame. The NTF, however, may be a fairly complex function of the frequency, with sharp dips and peaks. Therefore, a high-order pole-zero filter is advantageously used in accurate modeling of the NTF as is well-known in the art. Related teachings for selecting a filter having the NTF characteristic will be found in U.S. patent application Ser. No. 423,088 by K. Brandenburg, et al., filed Oct. 18, 1989, and assigned to the assignee of the present invention.
A second approach that has been successfully used is split-band CELP coding in which the signal is first split into low and high frequency bands by a set of two quadrature-mirror filters (QMF) and then, each band is coded separately by its own coder. A similar method was used in P. Mermelstein, “G.722, a New CCITT Coding Standard for Digital Transmission of Wideband Audio Signals,” IEEE Comm. Mag., pp. 8-15, January 1988. This approach provides the flexibility of assigning different bit rates to the low and high bands and to attain an optimum balance of high and low spectral distortions. Flexibility is also achieved in the sense that entirely different coding systems can be employed in each band, optimizing the performance for each frequency range. In the present illustrative embodiment, however, LD-CELP is used in all (two) bands. Various bit rate assignments were tried for the two bands under the constraint of a total rate of 32 Kb/s. The best ratio of low to high band bit assignment was found to be 3:1.
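
The arithmetic behind the 3:1 assignment can be worked through as follows. This is only a sketch: the helper name is hypothetical, 1 Kb/s is taken as 1000 bits/s, and the per-sample figures assume each band is critically decimated to 8000 samples/s after the two-band QMF split of the 16 kHz input, which the text does not state explicitly.

```python
def split_rate(total_bps, low_parts, high_parts):
    """Divide a total bit rate between two bands in the ratio low_parts:high_parts."""
    unit = total_bps / (low_parts + high_parts)
    return low_parts * unit, high_parts * unit


low_bps, high_bps = split_rate(32000, 3, 1)
print(low_bps, high_bps)  # 24000.0 8000.0

# Assumed: each band runs at 8000 samples/s after decimation by two.
band_rate = 16000 / 2
print(low_bps / band_rate, high_bps / band_rate)  # 3.0 1.0
```

Under those assumptions the low band receives 24 Kb/s and the high band 8 Kb/s, i.e. 3 bits and 1 bit per band sample respectively.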
All of the systems mentioned above can include various pitch loops, i.e., various orders for B(z) and various number of bits for the pitch taps. One interesting point is that it sometimes proves advantageous to use a system without a pitch loop, i.e., B(z)=1. In fact, in some tests, such a system offered the best result. The explanation for this may be the following. The pitch loop is based on using past residual sequences as an initial excitation of the synthesis filter. This constitutes a 1st-stage quantization in a two-stage VQ system where the past residual serves as an adaptive codebook. Two-stage VQ is known to be inferior to single-stage (regular) VQ at least from an MSE point of view. In other words, the bits are better spent if used with a single excitation codebook. Now, the pitch loop offers mainly perceptual improvement due to the enhanced periodicity, which is important in low rate coders like 4-8 Kb/s CELP, where the MSE SNR is low anyway. At 32 Kb/s, with high MSE SNR, the pitch loop contribution does not outweigh the efficiency of a single VQ configuration and, therefore, there is no reason for its use.
While the above description has proceeded in terms of wide-band speech, it will be clear to those skilled in the art that the present invention will have application in other particular contexts. FIG. 3 shows a representative modification of the frequency response of the overall weighting filter in accordance with the teachings of the present invention. In FIG. 3 a solid line represents weighting in accordance with a prior art technique and the dotted curve corresponds to an illustrative modified response in accordance with a typical exemplary embodiment of the present invention.
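
The closed-loop (analysis-by-synthesis) excitation search described in the detailed description can be sketched as follows. This is a simplified illustration, not the patented implementation: it assumes a zero-state search, a truncated impulse response `h` standing in for the weighted synthesis filter H(z)=W(z)/[B(z) A(z)], and a gain that is optimized jointly with the codebook index (forward-mode gain). All names and the toy codebook are hypothetical.

```python
def filter_zero_state(h, c):
    """Convolve codevector c with impulse response h, truncated to len(c) samples."""
    n = len(c)
    return [sum(h[k] * c[t - k] for k in range(min(t + 1, len(h))))
            for t in range(n)]


def search_codebook(target, codebook, h):
    """Return (index, gain) minimizing ||target - gain * filtered(codebook[index])||^2."""
    best_index, best_gain, best_score = None, 0.0, -1.0
    for j, c in enumerate(codebook):
        y = filter_zero_state(h, c)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(x * v for x, v in zip(target, y))
        # With the optimal gain corr/energy, the squared error equals
        # ||target||^2 - corr^2/energy, so the largest score wins.
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = j, corr / energy, score
    return best_index, best_gain


# Toy example: the target is exactly twice the filtered second codevector.
h = [1.0, 0.5]
codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, -1.0, 0.0]]
target = [0.0, 2.0, 1.0]
index, gain = search_codebook(target, codebook, h)
```

Here `target` plays the role of the weighted reference x(n) and each filtered codevector the role of y(n); a backward-mode gain would instead be derived from previously quantized data rather than computed per vector.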
We claim:
1. A method for coding a speech signal comprising
  generating a plurality of parameter signals representative of said speech signal,
  synthesizing a plurality of estimate signals based on said parameter signals, each of said estimate signals being identified by a corresponding index signal,
  performing a frequency weighted comparison of each of said estimate signals with said speech signal, said weighting relatively emphasizing
    perceptually significant frequencies within a band-limited frequency spectrum of said speech signal, and
    higher frequencies to a greater degree than lower frequencies within said band-limited spectrum, and
  representing said speech signal by at least one of said corresponding index signals identifying said estimate signals which, upon said comparison, meet a preselected comparison criterion.
2. The method of claim 1 wherein said comparison criterion comprises a minimization of the difference between said weighted speech signal and each of said weighted estimate signals.
3. The method of claim 1 wherein said perceptually significant frequencies are associated with formants of said speech signal.
4. The method of claim 1 further comprising representing said speech signal by at least one of said parameter signals.
5. The method of claim 1 wherein said synthesizing of said estimate signals comprises applying each of an ordered plurality of code vectors to a synthesizing filter to generate a corresponding one of said estimate signals.
6. The method of claim 5 wherein said parameter signals comprise signals representative of short term characteristics of said speech signal.
7. The method of claim 1 wherein said emphasizing said higher frequencies to a greater degree than said lower frequencies comprises imposing a tilt to said band-limited spectrum of said speech signal and each of said estimate signals.
8. The method of claim 7 wherein said frequency weighted comparison comprises filtering said speech signal and each of said estimate signals using a filter which imposes said tilt to said band-limited spectrum of said speech signal and each of said estimate signals, and comparing the result of said filtering of said speech signal with the result of said filtering of each of said estimate signals.
9. The method of claim 8 wherein said filter comprises quadrature mirror filter sections having a plurality of frequency bands, and said generating a plurality of parameter signals, said synthesizing a plurality of estimate signals, said performing a frequency weighted comparison, and said representing said speech signal by said index signals, are performed separately for each frequency band.
10. The method of claim 8 wherein said filter comprises
  a first frequency weighting section for relatively emphasizing said perceptually significant frequencies, and
  a second frequency weighting section for imposing said tilt to said band-limited spectrum of said speech signal and each of said estimate signals.
11. The method of claim 10 wherein said second frequency weighting section is characterized by a transfer function, P(z), where

15. The method of claim 10 wherein said second frequency weighting section comprises a two-zero filter section.
16. The method of claim 10 wherein said transfer function of said second frequency weighting section is characterized by
  a first function for the range of frequencies below a predetermined frequency substantially in the center of said band-limited spectrum of said input signal, and
  a second function for the range of frequencies above said predetermined point.
17. The method of claim 16 wherein said second frequency weighting section comprises a filter section of order greater than 3.
18. The method of claim 17 wherein said second frequency weighting section comprises a filter section of order 14.
19. The method of claim 10 wherein
