US007584095B2

(12) United States Patent
     Gottesman et al.

(10) Patent No.: US 7,584,095 B2
(45) Date of Patent: *Sep. 1, 2009

(54) REW PARAMETRIC VECTOR QUANTIZATION AND DUAL-PREDICTIVE SEW VECTOR QUANTIZATION FOR WAVEFORM INTERPOLATIVE CODING

(75) Inventors: Oded Gottesman, Goleta, CA (US); Allen Gersho, Goleta, CA (US)

(73) Assignee: The Regents of the University of California, Oakland, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 98 days.
    This patent is subject to a terminal disclaimer.

(21) Appl. No.: 11/234,631

(22) Filed: Sep. 23, 2005

(65) Prior Publication Data
     US 2006/0069554 A1    Mar. 30, 2006

Related U.S. Application Data

(62) Division of application No. 09/811,187, filed on Mar. 16, 2001, now Pat. No. 7,010,482.

(60) Provisional application No. 60/190,371, filed on Mar. 17, 2000.

(51) Int. Cl.
     G10L 19/14 (2006.01)

(52) U.S. Cl. 704/222

(58) Field of Classification Search: 704/222
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,924,061 A *    7/1999   Shoham                  704/218
6,418,408 B1 *   7/2002   Udaya Bhaskar et al.    704/219
6,493,664 B1 *   12/2002  Udaya Bhaskar et al.    704/222
6,691,092 B1 *   2/2004   Udaya Bhaskar et al.    704/265

OTHER PUBLICATIONS

Oded Gottesman et al., “Enhancing Waveform Interpolative Coding With Weighted REW Parametric Quantization,” IEEE Workshop on Speech Coding (2000), pp. 1-3.
I.S. Burnett et al., “Multi-Prototype Waveform Coding Using Frame-By-Frame Analysis-By-Synthesis,” Department of Electrical and Computer Engineering, University of Wollongong, NSW, Australia (1997), pp. 1567-1570.
I.S. Burnett et al., “New Techniques for Multi-Prototype Waveform Coding at 2.84kb/s,” Department of Electrical and Computer Engineering, University of Wollongong, NSW, Australia (1995), pp. 261-264.
I.S. Burnett et al., “Low Complexity Decomposition and Coding of Prototype Waveforms,” Dept. of Electrical and Computer Eng., University of Wollongong, NSW, 2522, Australia, pp. 23-24.

(Continued)

Primary Examiner: Susan McFadden
(74) Attorney, Agent, or Firm: Berliner & Associates

(57) ABSTRACT

An enhanced analysis-by-synthesis waveform interpolative speech coder able to operate at 2.8 kbps. Novel features include dual-predictive analysis-by-synthesis quantization of the slowly-evolving waveform, efficient parametrization of the rapidly-evolving waveform magnitude, and analysis-by-synthesis vector quantization of the rapidly evolving waveform parameter. Subjective quality tests indicate that its quality exceeds that of G.723.1 at 5.3 kbps and of G.723.1 at 6.3 kbps.

18 Claims, 6 Drawing Sheets
`
[Front-page drawing; legible labels: vector of REW spectra, vector quantizer (VQ) with codebook, inverse vector quantizer (VQ⁻¹) with codebook, vector of quantized REW spectra.]
OTHER PUBLICATIONS

I.S. Burnett et al., “A Mixed Prototype Waveform/CELP Coder for Sub 3KB/S,” School of Electronic and Electrical Engineering, University of Bath, UK, BA2 7AY (1993), pp. II-175-II-178.
Oded Gottesman, “Dispersion Phase Vector Quantization for Enhancement of Waveform Interpolative Coder,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-4.
Oded Gottesman et al., “Enhanced Waveform Interpolative Coding at 4 KBPS,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-3.
Oded Gottesman et al., “High Quality Enhanced Waveform Interpolative Coding at 2.8 KBPS,” IEEE International Conference on Acoustics, Speech, and Signal Processing, 2000, pp. 1-4.
Oded Gottesman et al., “Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 KBPS,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-4.
Daniel W. Griffin et al., “Multiband Excitation Vocoder,” IEEE Transactions on Acoustics, Speech, and Signal Processing (1988) 36(8):1223-1235.
W. Bastiaan Kleijn et al., “A Speech Coder Based on Decomposition of Characteristic Waveforms,” IEEE (1995), pp. 508-511.
W. Bastiaan Kleijn et al., “Waveform Interpolation for Coding and Synthesis,” Speech Coding and Synthesis (1995), pp. 175-207.
W. Bastiaan Kleijn et al., “Transformation and Decomposition of the Speech Signal for Coding,” IEEE Signal Processing Letters 1(9):136-138 (1994).
W. Bastiaan Kleijn, “Encoding Speech Using Prototype Waveforms,” IEEE Transactions on Speech and Audio Processing 1(4):386-399 (1993).
W. Bastiaan Kleijn, “Continuous Representations in Linear Predictive Coding,” Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974 (1991), pp. 201-204.
W. Bastiaan Kleijn et al., “A Low-Complexity Waveform Interpolation Coder,” Speech Coding Research Department, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA (1996), pp. 212-215.
R.J. McAulay et al., “Sinusoidal Coding,” Speech Coding and Synthesis 4:121-173 (1995).
Yair Shoham, “High-Quality Speech Coding at 2.4 to 4.0 KBPS Based on Time-Frequency Interpolation,” IEEE, pp. II-167-II-170 (1993).
Yair Shoham, “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 KBPS,” IEEE, pp. 1599-1602 (1997).
Yair Shoham, “Low Complexity Speech Coding at 1.2 to 2.4 kbps Based on Waveform Interpolation,” International Journal of Speech Technology 2:329-341 (1999).

* cited by examiner
`
`
`
U.S. Patent    Sep. 1, 2009    Sheets 1 to 6 of 6    US 7,584,095 B2

[Sheet 1 of 6: drawing not recoverable from the extraction; legible labels: REW parameter, vector of REW spectra.]
[Sheets 2 to 4 of 6: rotated block diagrams; text not recoverable from the extraction.]
[Sheet 5 of 6: two bar charts. First chart: series "Output SEW" and "Mean-removed SEW", horizontal axis ticks 0 to 9. Second chart: bars grouped as Voiced, Intermediate, Unvoiced, with legend "Harmonics range": 9-14, 15-19, 20-24, 25-29, 30-35, 36-69.]
[Sheet 6 of 6: one bar chart grouped as Voiced, Intermediate, Unvoiced, with legend "Harmonics range": 9-14, 15-19, 20-24, 25-29, 30-35; and three panels titled "Voiced range", "Intermediate range", "Unvoiced range", each plotting "REW predictor" and "SEW predictor" against "Harmonics".]
REW PARAMETRIC VECTOR QUANTIZATION AND DUAL-PREDICTIVE SEW VECTOR QUANTIZATION FOR WAVEFORM INTERPOLATIVE CODING

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of Provisional Patent Application No. 60/190,371 filed Mar. 17, 2000, which application is herein incorporated by reference. This application is a divisional of U.S. patent application Ser. No. 09/811,187, filed Mar. 16, 2001, now U.S. Pat. No. 7,010,482.

BACKGROUND OF THE INVENTION

The present invention relates to vector quantization (VQ) in speech coding systems using waveform interpolation.

In recent years, there has been increasing interest in achieving toll-quality speech coding at rates of 4 kbps and below. Currently, there is an ongoing 4 kbps standardization effort conducted by an international standards body (The International Telecommunications Union-Telecommunication (ITU-T) Standardization Sector). The expanding variety of emerging applications for speech coding, such as third generation wireless networks and Low Earth Orbit (LEO) systems, is motivating increased research efforts. The speech quality produced by waveform coders such as code-excited linear prediction (CELP) coders degrades rapidly at rates below 5 kbps; see B. S. Atal and M. R. Schroeder, (1984), “Stochastic Coding of Speech at Very Low Bit Rate”, Proc. Int. Conf. Comm., Amsterdam, pp. 1610-1613.

On the other hand, parametric coders, such as the waveform-interpolative (WI) coder, the sinusoidal-transform coder (STC), and the multiband-excitation (MBE) coder, produce good quality at low rates but they do not achieve toll quality; see Y. Shoham, IEEE ICASSP'93, Vol. II, pp. 167-170 (1993); I. S. Burnett and R. J. Holbeche, (1993), IEEE ICASSP'93, Vol. II, pp. 175-178; W. B. Kleijn, (1993), IEEE Trans. Speech and Audio Processing, Vol. 1, No. 4, pp. 386-399; W. B. Kleijn and J. Haagen, (1994), IEEE Signal Processing Letters, Vol. 1, No. 9, pp. 136-138; W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP'95, pp. 508-511; W. B. Kleijn and J. Haagen, (1995), in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 5, pp. 175-207; I. S. Burnett and G. J. Bradley, (1995), IEEE ICASSP'95, pp. 261-263; I. S. Burnett and G. J. Bradley, (1995), IEEE Workshop on Speech Coding for Telecommunications, pp. 23-24; I. S. Burnett and D. H. Pham, (1997), IEEE ICASSP'97, pp. 1567-1570; W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341; R. J. McAulay and T. F. Quatieri, (1995), in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 4, pp. 121-173; and D. Griffin and J. S. Lim, (1988), IEEE Trans. ASSP, Vol. 36, No. 8, pp. 1223-1235. This is largely due to the lack of robustness of speech parameter estimation, which is commonly done in open-loop, and to inadequate modeling of non-stationary speech segments.

Commonly in WI coding, the similarity between successive rapidly evolving waveform (REW) magnitudes is exploited by downsampling and interpolation and by constrained bit allocation; see W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP'95, pp. 508-511. In a previous Enhanced Waveform Interpolative (EWI) coder the REW magnitude was quantized on a waveform-by-waveform basis; see O. Gottesman and A. Gersho, (1999), “Enhanced Waveform Interpolative Coding at 4 kbps”, IEEE Speech Coding Workshop, pp. 90-92, Finland; O. Gottesman and A. Gersho, (1999), “Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 kbps”, EUROSPEECH'99, pp. 1443-1446, Hungary.

SUMMARY OF THE INVENTION

The present invention describes novel methods that enhance the performance of the WI coder and allow for better coding efficiency, improving on the above 1999 Gottesman and Gersho procedure. The present invention incorporates analysis-by-synthesis (AbS) for parameter estimation, offers higher temporal and spectral resolution for the REW, and more efficient quantization of the slowly-evolving waveform (SEW). In particular, the present invention proposes a novel efficient parametric representation of the REW magnitude, an efficient paradigm for AbS predictive VQ of the REW parameter sequence, and dual-predictive AbS quantization of the SEW.

More particularly, the invention provides a method for interpolative coding of input signals, the signals decomposed into or composed of a slowly evolving waveform and a rapidly evolving waveform having a magnitude, the method incorporating at least one of the following steps, preferably a combination of them, and optionally all of them:
(a) AbS VQ of the REW;
(b) parametrizing the magnitude of the REW;
(c) incorporating temporal weighting in the AbS VQ of the REW;
(d) incorporating spectral weighting in the AbS VQ of the REW;
(e) applying a filter to a vector quantizer codebook in the analysis-by-synthesis vector quantization of the rapidly evolving waveform, whereby self-correlation is added to the codebook vectors; and
(f) using a coder in which a plurality of bits therein are allocated to the rapidly evolving waveform magnitude.

In addition, one can combine AbS quantization of the slowly evolving waveform with any or all of the foregoing parameters.

The new method achieves a substantial reduction in the REW bit rate, and the EWI coder achieves very close to toll quality, at least under clean speech conditions. These and other features, aspects, and advantages of the present invention will become better understood with regard to the following detailed description, appended claims, and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a REW Parametric Representation;
FIG. 2 is a REW Parametric VQ;
FIG. 3 is a REW Parametric Representation AbS VQ;
FIG. 4 is a REW Parametric Representation Simplified AbS VQ;
FIG. 5 is a REW Parametric Representation Simplified Weighted AbS VQ;
FIG. 6 is a block diagram of the Dual Predictive AbS SEW vector quantization;
FIG. 7 is a Weighted Signal-to-Noise Ratio (SNR) for Dual Predictive AbS SEW VQ;
FIG. 8 is an output Weighted SNR for the 18 codebooks, 9-bit AbS SEW VQ;
`
`
`
FIG. 9 is a mean-removed SEW's Weighted SNR for the 18 codebooks, 9-bit AbS SEW VQ; and
FIG. 10 shows predictors for three REW parameter ranges.

DETAILED DESCRIPTION

In very low bit rate WI coding, the relation between the SEW and the REW magnitudes was exploited by computing the magnitude of one as the unity complement of the other; see W. B. Kleijn and J. Haagen, (1995), “A Speech Coder Based on Decomposition of Characteristic Waveforms”, IEEE ICASSP'95, pp. 508-511; W. B. Kleijn and J. Haagen, (1995), “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 5, pp. 175-207; I. S. Burnett and G. J. Bradley, (1995), “New Techniques for Multi-Prototype Waveform Coding at 2.84 kb/s”, IEEE ICASSP'95, pp. 261-263; I. S. Burnett and G. J. Bradley, (1995), “Low Complexity Decomposition and Coding of Prototype Waveforms”, IEEE Workshop on Speech Coding for Telecommunications, pp. 23-24; I. S. Burnett and D. H. Pham, (1997), “Multi-Prototype Waveform Coding using Frame-by-Frame Analysis-by-Synthesis”, IEEE ICASSP'97, pp. 1567-1570; W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), “A Low-Complexity Waveform Interpolation Coder”, IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps”, IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), “Low Complexity Speech Coding at 1.2 to 2.4 kbps Based on Waveform Interpolation”, International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341.

Also, since the sequence of SEW magnitudes evolves slowly, successive SEWs exhibit similarity, offering opportunities for redundancy removal. Additional forms of redundancy that may be exploited for coding efficiency are: (a) for a fixed SEW/REW decomposition filter, the mean SEW magnitude increases with the pitch period, and (b) the similarity between successive SEWs also increases with the pitch period. In this work we introduce a novel “dual-predictive” AbS paradigm for quantizing the SEW magnitude that optimally exploits the information about the current quantized REW, the past quantized SEW, and the pitch, in order to predict the current SEW.

Introduction to REW Quantization
The REW represents the rapidly changing unvoiced attribute of speech. Commonly in WI systems, the REW is quantized on a waveform-by-waveform basis. Hence, for low rate WI systems having a long frame size and a large number of waveforms per frame, the relative bitrate required for the REW becomes excessive. For example, consider a potential 2 kbps system which uses a 240 sample frame, 12 waveforms per frame, and which quantizes the REW by alternating bit allocation of 3 bits and 1 bit per waveform. The REW bitrate is then 24 bits per frame, or 800 bps (taking the 240-sample frame as 30 ms, i.e. 8 kHz sampling), which is 40% of the total bitrate. This example demonstrates the need for a more efficient REW quantization.

Efficient REW quantization can benefit from two observations: (1) the REW magnitude is typically an increasing function of the frequency, which suggests that an efficient parametric representation may be used; (2) one can observe a similarity between successive REW magnitude spectra, which may suggest a potential gain by employing predictive VQ on a group of adjacent REWs. The next two sections propose REW parametric representation, and its respective VQ.

REW Parametric Representation
Direct quantization of the REW magnitude is a variable dimension quantization problem, which may result in spending bits and computational effort on perceptually irrelevant information. A simple and practical way to obtain a reduced, and fixed, dimension representation of the REW is with a linear combination of basis functions, such as orthonormal polynomials; see W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341. Such a representation usually produces a smoother REW magnitude, and improves the perceptual quality. Suppose the REW magnitude, $R(\omega)$, is represented by a linear combination of orthonormal functions, $\psi_i(\omega)$:

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i \psi_i(\omega), \qquad 0 \le \omega \le \pi \qquad (1)$$

where $\omega$ is the angular frequency, and $I$ is the representation order. The REW magnitude is typically an increasing function of frequency, which can be coarsely quantized with a low number of bits per waveform without significant perceptual degradation. Therefore, it may be advantageous to represent the REW magnitude in a simple, but perceptually relevant manner. Consequently we model the REW by the following parametric representation, $\hat{R}(\omega,\xi)$:

$$\hat{R}(\omega,\xi) = \sum_{i=0}^{I-1} \hat{\gamma}_i(\xi)\, \psi_i(\omega), \qquad 0 \le \omega \le \pi;\ 0 \le \xi \le 1 \qquad (2)$$

where $\hat{\gamma}(\xi) = [\hat{\gamma}_0(\xi), \ldots, \hat{\gamma}_{I-1}(\xi)]^T$ is a parametric vector of coefficients within the representation model subspace, and $\xi$ is the “unvoicing” parameter which is zero for a fully voiced spectrum, and one for a fully unvoiced spectrum. Thus $\hat{R}(\omega,\xi)$ defines a two-dimensional surface whose cross sections for each value of $\xi$ give a particular REW magnitude spectrum, which is defined merely by specifying a scalar parameter value.

A simple and practical way for parametric representation of the REW is, for example, by a parametric linear combination of basis functions, such as polynomials with parametric coefficients, namely:

$$\hat{R}(\omega,\xi) = \sum_{i=0}^{I-1} \hat{\gamma}_i(\xi)\, \omega^{i}, \qquad 0 \le \omega \le \pi;\ 0 \le \xi \le 1 \qquad (3)$$

For practical considerations assume that the parametric representation is a piecewise linear function of $\xi$, and may therefore be represented by a set of $N$ uniformly spaced spectra, as illustrated in FIG. 1.

REW Parametric Vector Quantization
One can observe the similarity between successive REW magnitude spectra, which may suggest a potential gain by VQ of a set of successive REWs. FIG. 2 illustrates a simple parametric VQ system for a vector of REW spectra. The input is an $M$ dimensional vector of REW magnitude spectra,

$$R(\omega) = [R_1(\omega), R_2(\omega), \ldots, R_M(\omega)]^T \qquad (4)$$
`
`
`
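and the VQ output is an index, $j$, which determines a quantized parameter vector, $\hat{\xi}$:

$$\hat{\xi} = [\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_M]^T \qquad (5)$$

which parametrically determines a vector of quantized spectra:

$$\hat{R}(\omega) = \hat{R}(\omega,\hat{\xi}) = [\hat{R}(\omega,\hat{\xi}_1), \hat{R}(\omega,\hat{\xi}_2), \ldots, \hat{R}(\omega,\hat{\xi}_M)]^T \qquad (6)$$

The encoder searches, in the parameter codebook $C_q(\xi)$, for the parameter vector which minimizes the distortion:

$$\hat{\xi} = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} D\big(R_m, \hat{R}(\xi_m)\big) = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} \int_0^{\pi} \big|R_m(\omega) - \hat{R}(\omega,\xi_m)\big|^2 \, d\omega \qquad (7)$$

For example, suppose the input REW magnitude is represented by an $I$-th dimensional vector of function coefficients, $\gamma$, given by:

$$\gamma = [\gamma_0, \gamma_1, \ldots, \gamma_{I-1}]^T \qquad (8)$$

For a set of $M$ input REWs, each of which is represented by a vector of polynomial coefficients, $\gamma_m$, which form a P×M input coefficient matrix, $\Gamma$:

$$\Gamma = [\gamma_1, \gamma_2, \ldots, \gamma_M] \qquad (9)$$

The inverse VQ output is a vector of $M$ quantized REWs, which form the quantized function coefficient matrix:

$$\hat{\Gamma}(\hat{\xi}) = [\hat{\gamma}(\hat{\xi}_1), \hat{\gamma}(\hat{\xi}_2), \ldots, \hat{\gamma}(\hat{\xi}_M)] \qquad (10)$$

which is used by the decoder to compute the quantized spectra.

A. Quantization Using Orthonormal Functions
Orthonormal functions, such as polynomials, may be used for efficient quantization of the REW; see W. B. Kleijn, et al., (1996), IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341. Consider REW magnitude, $R(\omega)$, represented by a linear combination of orthonormal functions, $\psi_i(\omega)$:

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i \psi_i(\omega), \qquad 0 \le \omega \le \pi \qquad (11)$$

which is modeled using the parametric representation:

$$\hat{R}(\omega,\xi) = \sum_{i=0}^{I-1} \hat{\gamma}_i(\xi)\, \psi_i(\omega), \qquad 0 \le \omega \le \pi;\ 0 \le \xi \le 1 \qquad (12)$$

The quantized REW parameter is then given by:

$$\hat{\xi} = \arg\min_{\xi} \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega,\xi)\big|^2 \, d\omega = \arg\min_{\xi} \big\|\gamma - \hat{\gamma}(\xi)\big\|^2 \qquad (13)$$

In VQ case, the quantized parameter vector is given by:

$$\hat{\xi} = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} \big\|\gamma_m - \hat{\gamma}(\xi_m)\big\|^2 \qquad (14)$$

B. Piecewise Linear Parametric Representation
In order to have a simple representation that is computationally efficient and avoids excessive memory requirements, we model the two dimensional surface by a piecewise linear parametric representation. Therefore, we introduce a set of $N$ uniformly spaced spectra, $\{\hat{R}(\omega,\hat{\xi}_n)\}_{n=0}^{N-1}$. Then the parametric surface is defined by linear interpolation according to:

$$\hat{R}(\omega,\xi) = (1-\alpha)\,\hat{R}(\omega,\hat{\xi}_{n-1}) + \alpha\,\hat{R}(\omega,\hat{\xi}_n), \qquad \hat{\xi}_{n-1} \le \xi \le \hat{\xi}_n;\ \alpha = \frac{\xi - \hat{\xi}_{n-1}}{\Delta};\ \Delta = \hat{\xi}_n - \hat{\xi}_{n-1} \qquad (15)$$

Because this representation is linear, the coefficients of $\hat{R}(\omega,\xi)$ are linear combinations of the coefficients of $\hat{R}(\omega,\hat{\xi}_{n-1})$ and $\hat{R}(\omega,\hat{\xi}_n)$. Hence,

$$\hat{\gamma}(\xi) = (1-\alpha)\,\hat{\gamma}_{n-1} + \alpha\,\hat{\gamma}_n \qquad (16)$$

where $\hat{\gamma}_n$ is the coefficient vector of the $n$-th REW magnitude function representation:

$$\hat{\gamma}_n = \hat{\gamma}(\hat{\xi}_n) \qquad (17)$$

In this case, the distortion may be interpolated by:

$$D\big(R, \hat{R}(\xi)\big) = (1-\alpha)^2 \big\|\gamma - \hat{\gamma}_{n-1}\big\|^2 + \alpha^2 \big\|\gamma - \hat{\gamma}_n\big\|^2 + 2\alpha(1-\alpha)\,(\gamma - \hat{\gamma}_n)^T(\gamma - \hat{\gamma}_{n-1}) \qquad (18)$$

The above can be easily generalized to the parameter VQ case. The optimal interpolation factor that minimizes the distortion between two representation vectors is given by:

$$\alpha_{opt} = \frac{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T(\gamma - \hat{\gamma}_{n-1})}{\big\|\hat{\gamma}_n - \hat{\gamma}_{n-1}\big\|^2} \qquad (19)$$

and the respective optimal parameter value, which is a continuous variable between zero and one, is given by:

$$\xi\big(\hat{\gamma}(\alpha_{opt})\big) = \hat{\xi}_{n-1} + \alpha_{opt}\,\Delta \qquad (20)$$

This result allows a rapid search for the best unvoicing parameter value needed to transform the coefficient vector to a scalar parameter, followed by the corresponding quantization scheme, as described in the section 4.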
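As a rough illustration of the rapid search described above, the following Python sketch projects an input coefficient vector onto each segment of a piecewise linear model and keeps the unvoicing value with the smallest squared error. It is a sketch under stated assumptions rather than the patented implementation: the function names, the representation of the model as a list of coefficient vectors on a uniform grid over [0, 1], and the clipping of the interpolation factor to [0, 1] are all choices made here for illustration.

```python
# Illustrative sketch only; names and data layout are assumptions.

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def best_unvoicing(gamma, model):
    """Return (xi, distortion) minimizing ||gamma - gamma_hat(xi)||^2.

    model[n] is the coefficient vector of the n-th stored spectrum; the
    stored spectra are assumed uniformly spaced on 0 <= xi <= 1.
    """
    n_spectra = len(model)
    delta = 1.0 / (n_spectra - 1)            # spacing between stored xi values
    best = None
    for n in range(1, n_spectra):
        lo, hi = model[n - 1], model[n]
        d = sub(hi, lo)                      # gamma_hat_n - gamma_hat_{n-1}
        a = sub(gamma, lo)                   # gamma - gamma_hat_{n-1}
        denom = dot(d, d)
        # Least-squares interpolation factor on this segment.
        alpha = dot(d, a) / denom if denom > 0.0 else 0.0
        alpha = min(1.0, max(0.0, alpha))    # assumption: stay inside the segment
        err = sub(a, [alpha * x for x in d])
        dist = dot(err, err)
        xi = (n - 1 + alpha) * delta         # map the factor back to the xi axis
        if best is None or dist < best[1]:
            best = (xi, dist)
    return best


# Hypothetical three-spectrum model with two coefficients per spectrum.
model = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
xi, dist = best_unvoicing([1.0, 0.5], model)
```

The search cost grows linearly with the number of stored spectra, since each segment needs only a few inner products; the resulting scalar can then be passed to whatever scalar or vector quantizer follows.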
`
`
`
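C. Weighted Distortion Quantization
Commonly in speech coding, the magnitude is quantized using a weighted distortion measure. In this case the quantized REW parameter is then given by:

$$\hat{\xi} = \arg\min_{\xi} \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega,\xi)\big|^2\, W(\omega)\, d\omega \qquad (21)$$

and the orthonormal function simplification, given in equation (13), cannot be used. In this case, the weighted distortion between the input and the parametric representation modeled spectra is equal to:

$$D_w\big(R, \hat{R}(\xi)\big) = \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega,\xi)\big|^2\, W(\omega)\, d\omega = \big(\gamma - \hat{\gamma}(\xi)\big)^T\, \Psi\big(W(\omega)\big)\, \big(\gamma - \hat{\gamma}(\xi)\big) \qquad (22)$$

where $\Psi(W(\omega))$ is the weighted correlation matrix of the orthonormal functions, its elements are:

$$\Psi_{ij}\big(W(\omega)\big) = \int_0^{\pi} W(\omega)\, \psi_i(\omega)\, \psi_j(\omega)\, d\omega \qquad (23)$$

$\gamma$ is the input coefficient vector, and $\hat{\gamma}(\xi)$ is the modeled parametric coefficient vector. In VQ case, the quantized parameter vector is given by:

$$\hat{\xi} = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} D_w\big(R_m, \hat{R}(\xi_m)\big) = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} \big(\gamma_m - \hat{\gamma}(\xi_m)\big)^T\, \Psi\big(W_m(\omega)\big)\, \big(\gamma_m - \hat{\gamma}(\xi_m)\big) \qquad (24)$$

D. Weighted Distortion: Piecewise Linear Parametric Representation
Again, for practical considerations assume that the parametric representation is piecewise linear, and may be represented by a set of $N$ spectra, $\{\hat{R}(\omega,\hat{\xi}_n)\}_{n=0}^{N-1}$. For the piecewise linear representation, the interpolated quantized coefficient vector is:

$$\hat{\gamma}(\xi) = (1-\alpha)\,\hat{\gamma}_{n-1} + \alpha\,\hat{\gamma}_n \qquad (25)$$

In the case where parameter VQ is employed, the interpolation allows for a substantial simplification of the search computations. In this case, the distortion can be interpolated:

$$D_w\big(R, \hat{R}(\xi)\big) = (1-\alpha)^2\, (\gamma - \hat{\gamma}_{n-1})^T \Psi (\gamma - \hat{\gamma}_{n-1}) + \alpha^2\, (\gamma - \hat{\gamma}_n)^T \Psi (\gamma - \hat{\gamma}_n) + 2\alpha(1-\alpha)\, (\gamma - \hat{\gamma}_n)^T \Psi (\gamma - \hat{\gamma}_{n-1}) \qquad (26)$$

The above can be easily generalized to the parameter VQ case. The optimal parameter that minimizes the spectrally weighted distortion between two representation vectors is given by:

$$\alpha_{opt} = \frac{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T\, \Psi\, (\gamma - \hat{\gamma}_{n-1})}{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T\, \Psi\, (\hat{\gamma}_n - \hat{\gamma}_{n-1})} \qquad (27)$$

and the respective optimal parameter value, which is a continuous variable between zero and one, is given by equation (20). This result allows a rapid search for the best unvoicing parameter value needed to transform the coefficient vector to a scalar parameter, for encoding or for VQ design. Alternatively, in order to eliminate using the matrix $\Psi$, the scalar product may be redefined to incorporate the time-varying spectral weighting. The respective orthonormal basis functions then satisfy:

$$\int_0^{\pi} W(\omega)\, \psi_i(\omega)\, \psi_j(\omega)\, d\omega = \delta(i-j) \qquad (28)$$

where $\delta(i-j)$ denotes the Kronecker delta. The respective parameter vector is given by:

$$\gamma = \int_0^{\pi} W(\omega)\, R(\omega)\, \psi(\omega)\, d\omega \qquad (29)$$

where $\psi(\omega) = [\psi_0(\omega), \psi_1(\omega), \ldots, \psi_{I-1}(\omega)]^T$ is an $I$-th dimensional vector of time-varying orthonormal functions.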
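To make the weighted case more concrete, the Python sketch below builds a weighted correlation matrix by numerical integration, evaluates the weighted distortion as a quadratic form, and computes the weighted interpolation factor between two stored coefficient vectors. Everything named here is an assumption for illustration: the helper names, the midpoint-rule integration, the step count, and the example cosine basis are not taken from the patent, and the quadratic-form expressions are written out from the weighted squared-error criterion.

```python
import math

# Illustrative sketch only; names, basis and integration rule are assumptions.


def psi_matrix(basis, weight, steps=1000):
    """Approximate Psi[i][j] = integral over [0, pi] of W * psi_i * psi_j
    with the midpoint rule."""
    h = math.pi / steps
    size = len(basis)
    psi = [[0.0] * size for _ in range(size)]
    for p in range(steps):
        w = (p + 0.5) * h
        vals = [f(w) for f in basis]
        scale = weight(w) * h
        for i in range(size):
            for j in range(size):
                psi[i][j] += scale * vals[i] * vals[j]
    return psi


def quad(u, psi, v):
    """Bilinear form u^T Psi v."""
    return sum(ui * sum(pij * vj for pij, vj in zip(row, v))
               for ui, row in zip(u, psi))


def weighted_distortion(gamma, gamma_hat, psi):
    """(gamma - gamma_hat)^T Psi (gamma - gamma_hat)."""
    e = [g - h for g, h in zip(gamma, gamma_hat)]
    return quad(e, psi, e)


def weighted_alpha_opt(gamma, lo, hi, psi):
    """Interpolation factor between lo and hi minimizing the weighted distortion."""
    d = [h - l for h, l in zip(hi, lo)]
    a = [g - l for g, l in zip(gamma, lo)]
    denom = quad(d, psi, d)
    return quad(d, psi, a) / denom if denom > 0.0 else 0.0


# Hypothetical two-function cosine basis, orthonormal on [0, pi] when W = 1.
basis = [lambda w: 1.0 / math.sqrt(math.pi),
         lambda w: math.sqrt(2.0 / math.pi) * math.cos(w)]
psi_flat = psi_matrix(basis, lambda w: 1.0)        # close to the identity matrix
psi_tilt = psi_matrix(basis, lambda w: 1.0 + w)    # emphasizes higher frequencies
alpha = weighted_alpha_opt([1.0, 0.0], [0.0, 0.0], [1.0, 1.0], psi_tilt)
```

With a flat weight the matrix reduces to the identity and the weighted expressions coincide with the unweighted ones, which is a convenient sanity check on any basis chosen.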
REW Parameter Analysis-By-Synthesis VQ
This section presents the AbS VQ paradigm for the REW parameter. The first presentation is a system which quantizes the REW parameter by employing spectral based AbS. Then simplified systems, which apply AbS to the REW parameter, are presented.

A. REW Parameter Quantization by Magnitude AbS VQ
The novel Analysis-by-Synthesis (AbS) REW parameter VQ technique is illustrated in FIG. 3. An excitation vector $c_{ij}(m)$ ($m = 1, \ldots, M$) is selected from the VQ codebook and is fed through a synthesis filter to obtain a parameter vector $\hat{\xi}(m)$ (synthesized quantized), which is then mapped to quantized representation coefficient vectors $\hat{\gamma}(\hat{\xi}(m))$. This is compared with a sequence of input representation coefficient vectors $\gamma(m)$ and each is spectrally weighted. Each spectrally weighted error is then temporally weighted, and a distortion measure is obtained. A search through all candidate excitation vectors determines an optimal choice. The synthesis filter in FIG. 3 can be viewed as a first order predictor in a feedback loop. (While shown here is an auto-regressive synthesis filter, in other arrangements a moving-average (MA) synthesis filter may be used.) By allowing the value of the pre