`
(12) United States Patent
Gottesman et al.

(10) Patent No.: US 7,584,095 B2
(45) Date of Patent: *Sep. 1, 2009

(54) REW PARAMETRIC VECTOR QUANTIZATION AND DUAL-PREDICTIVE SEW VECTOR QUANTIZATION FOR WAVEFORM INTERPOLATIVE CODING

(75) Inventors: Oded Gottesman, Goleta, CA (US); Allen Gersho, Goleta, CA (US)

(73) Assignee: The Regents of the University of California, Oakland, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 98 days.
This patent is subject to a terminal disclaimer.

(21) Appl. No.: 11/234,631

(22) Filed: Sep. 23, 2005

(65) Prior Publication Data
US 2006/0069554 A1    Mar. 30, 2006

Related U.S. Application Data

(62) Division of application No. 09/811,187, filed on Mar. 16, 2001, now Pat. No. 7,010,482.

(60) Provisional application No. 60/190,371, filed on Mar. 17, 2000.

(51) Int. Cl.
G10L 19/14 (2006.01)
(52) U.S. Cl. 704/222
(58) Field of Classification Search 704/222
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,924,061 A *    7/1999   Shoham                 704/218
6,418,408 B1 *   7/2002   Udaya Bhaskar et al.   704/219
6,493,664 B1 *   12/2002  Udaya Bhaskar et al.   704/222
6,691,092 B1 *   2/2004   Udaya Bhaskar et al.   704/265

OTHER PUBLICATIONS

Oded Gottesman et al., “Enhancing Waveform Interpolative Coding With Weighted REW Parametric Quantization,” IEEE Workshop on Speech Coding (2000), pp. 1-3.
I.S. Burnett et al., “Multi-Prototype Waveform Coding Using Frame-By-Frame Analysis-By-Synthesis,” Department of Electrical and Computer Engineering, University of Wollongong, NSW, Australia (1997), pp. 1567-1570.
I.S. Burnett et al., “New Techniques for Multi-Prototype Waveform Coding at 2.84kb/s,” Department of Electrical and Computer Engineering, University of Wollongong, NSW, Australia (1995), pp. 261-263.
I.S. Burnett et al., “Low Complexity Decomposition and Coding of Prototype Waveforms,” Dept. of Electrical and Computer Eng., University of Wollongong, NSW, 2522, Australia, pp. 23-24.

(Continued)

Primary Examiner: Susan McFadden
(74) Attorney, Agent, or Firm: Berliner & Associates

(57) ABSTRACT

An enhanced analysis-by-synthesis waveform interpolative speech coder able to operate at 2.8 kbps. Novel features include dual-predictive analysis-by-synthesis quantization of the slowly-evolving waveform, efficient parametrization of the rapidly-evolving waveform magnitude, and analysis-by-synthesis vector quantization of the rapidly evolving waveform parameter. Subjective quality tests indicate that its quality exceeds that of G.723.1 at 5.3 kbps, and of G.723.1 at 6.3 kbps.

18 Claims, 6 Drawing Sheets
`
[Front-page figure: block diagram of a vector quantizer (VQ) and an inverse vector quantizer (VQ^-1), each with a vector quantizer codebook. The input is a vector of REW spectra; the output is a vector of quantized REW spectra.]
`
`Saint Lawrence Communications, LLC
`IPR2016-00704
`Exhibit 2017
`
`
`
OTHER PUBLICATIONS

I.S. Burnett et al., “A Mixed Prototype Waveform/CELP Coder for Sub 3KB/S,” School of Electronic and Electrical Engineering, University of Bath, UK, BA2 7AY (1993), pp. II-175-II-178.
Oded Gottesman, “Dispersion Phase Vector Quantization for Enhancement of Waveform Interpolative Coder,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-4.
Oded Gottesman et al., “Enhanced Waveform Interpolative Coding at 4 KBPS,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-3.
Oded Gottesman et al., “High Quality Enhanced Waveform Interpolative Coding at 2.8 KBPS,” IEEE International Conference on Acoustics, Speech, and Signal Processing, 2000, pp. 1-4.
Oded Gottesman et al., “Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 KBPS,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-4.
Daniel W. Griffin et al., “Multiband Excitation Vocoder,” IEEE Transactions on Acoustics, Speech, and Signal Processing (1988) 36(8):1223-1235.
W. Bastiaan Kleijn et al., “A Speech Coder Based on Decomposition of Characteristic Waveforms,” IEEE (1995), pp. 508-511.
W. Bastiaan Kleijn et al., “Waveform Interpolation for Coding and Synthesis,” Speech Coding and Synthesis (1995), pp. 175-207.
W. Bastiaan Kleijn et al., “Transformation and Decomposition of the Speech Signal for Coding,” IEEE Signal Processing Letters 1(9):136-138 (1994).
W. Bastiaan Kleijn, “Encoding Speech Using Prototype Waveforms,” IEEE Transactions on Speech and Audio Processing 1(4):386-399 (1993).
W. Bastiaan Kleijn, “Continuous Representations in Linear Predictive Coding,” Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974 (1991), pp. 201-204.
W. Bastiaan Kleijn et al., “A Low-Complexity Waveform Interpolation Coder,” Speech Coding Research Department, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA (1996), pp. 212-215.
R.J. McAulay et al., “Sinusoidal Coding,” Speech Coding and Synthesis 4:121-173 (1995).
Yair Shoham, “High-Quality Speech Coding at 2.4 to 4.0 KBPS Based on Time-Frequency Interpolation,” IEEE, pp. II-167-II-170 (1993).
Yair Shoham, “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 KBPS,” IEEE, pp. 1599-1602 (1997).
Yair Shoham, “Low Complexity Speech Coding at 1.2 to 2.4 kbps Based on Waveform Interpolation,” International Journal of Speech Technology 2:329-341 (1999).

* cited by examiner
`
`
`
[Drawing Sheets 1 to 6 of 6 (U.S. Patent, Sep. 1, 2009, US 7,584,095 B2). The figure images did not survive extraction; only the following labels are recoverable.
Sheet 1: REW parameter; vector of REW spectra.
Sheets 2 to 4: block diagrams whose labels are largely unreadable.
Sheet 5: curves labeled OUTPUT SEW and MEAN-REMOVED SEW; a bar chart grouped as VOICED, INTERMEDIATE, and UNVOICED, with a HARMONICS RANGE legend of 9-14, 15-19, 20-24, 25-29, 30-35, and 36-69.
Sheet 6: FIG. 9, a bar chart grouped as VOICED, INTERMEDIATE, and UNVOICED, with a HARMONICS RANGE legend of 9-14, 15-19, 20-24, 25-29, and 30-35; three panels titled VOICED RANGE, INTERMEDIATE RANGE, and UNVOICED RANGE, each plotting a REW PREDICTOR curve and a SEW PREDICTOR curve against HARMONICS.]
`
`
`
REW PARAMETRIC VECTOR QUANTIZATION AND DUAL-PREDICTIVE SEW VECTOR QUANTIZATION FOR WAVEFORM INTERPOLATIVE CODING

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of Provisional Patent Application No. 60/190,371, filed Mar. 17, 2000, which application is herein incorporated by reference. This application is a divisional of U.S. patent application Ser. No. 09/811,187, filed Mar. 16, 2001, now U.S. Pat. No. 7,010,482.

BACKGROUND OF THE INVENTION

The present invention relates to vector quantization (VQ) in speech coding systems using waveform interpolation.

In recent years, there has been increasing interest in achieving toll-quality speech coding at rates of 4 kbps and below. Currently, there is an ongoing 4 kbps standardization effort conducted by an international standards body (the International Telecommunications Union-Telecommunication (ITU-T) Standardization Sector). The expanding variety of emerging applications for speech coding, such as third generation wireless networks and Low Earth Orbit (LEO) systems, is motivating increased research efforts. The speech quality produced by waveform coders such as code-excited linear prediction (CELP) coders degrades rapidly at rates below 5 kbps; see B. S. Atal and M. R. Schroeder, (1984), “Stochastic Coding of Speech at Very Low Bit Rate”, Proc. Int. Conf. Comm., Amsterdam, pp. 1610-1613.

On the other hand, parametric coders, such as the waveform-interpolative (WI) coder, the sinusoidal-transform coder (STC), and the multiband-excitation (MBE) coder, produce good quality at low rates but do not achieve toll quality; see Y. Shoham, IEEE ICASSP'93, Vol. II, pp. 167-170 (1993); I. S. Burnett and R. J. Holbeche, (1993), IEEE ICASSP'93, Vol. II, pp. 175-178; W. B. Kleijn, (1993), IEEE Trans. Speech and Audio Processing, Vol. 1, No. 4, pp. 386-399; W. B. Kleijn and J. Haagen, (1994), IEEE Signal Processing Letters, Vol. 1, No. 9, pp. 136-138; W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP'95, pp. 508-511; W. B. Kleijn and J. Haagen, (1995), in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 5, pp. 175-207; I. S. Burnett and G. J. Bradley, (1995), IEEE ICASSP'95, pp. 261-263; I. S. Burnett and G. J. Bradley, (1995), IEEE Workshop on Speech Coding for Telecommunications, pp. 23-24; I. S. Burnett and D. H. Pham, (1997), IEEE ICASSP'97, pp. 1567-1570; W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341; R. J. McAulay and T. F. Quatieri, (1995), in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 4, pp. 121-173; and D. Griffin and J. S. Lim, (1988), IEEE Trans. ASSP, Vol. 36, No. 8, pp. 1223-1235. This is largely due to the lack of robustness of speech parameter estimation, which is commonly done in open loop, and to inadequate modeling of non-stationary speech segments.

Commonly in WI coding, the similarity between successive rapidly evolving waveform (REW) magnitudes is exploited by downsampling and interpolation and by constrained bit allocation; see W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP'95, pp. 508-511. In a previous Enhanced Waveform Interpolative (EWI) coder the REW magnitude was quantized on a waveform-by-waveform basis; see O. Gottesman and A. Gersho, (1999), “Enhanced Waveform Interpolative Coding at 4 kbps”, IEEE Speech Coding Workshop, pp. 90-92, Finland; O. Gottesman and A. Gersho, (1999), “Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 kbps”, EUROSPEECH'99, pp. 1443-1446, Hungary.

SUMMARY OF THE INVENTION

The present invention describes novel methods that enhance the performance of the WI coder and allow for better coding efficiency, improving on the above 1999 Gottesman and Gersho procedure. The present invention incorporates analysis-by-synthesis (AbS) for parameter estimation, offers higher temporal and spectral resolution for the REW, and provides more efficient quantization of the slowly-evolving waveform (SEW). In particular, the present invention proposes a novel efficient parametric representation of the REW magnitude, an efficient paradigm for AbS predictive VQ of the REW parameter sequence, and dual-predictive AbS quantization of the SEW.

More particularly, the invention provides a method for interpolative coding of input signals, the signals decomposed into or composed of a slowly evolving waveform and a rapidly evolving waveform having a magnitude, the method incorporating at least one, preferably a combination, or all of the following steps:
(a) AbS VQ of the REW;
(b) parametrizing the magnitude of the REW;
(c) incorporating temporal weighting in the AbS VQ of the REW;
(d) incorporating spectral weighting in the AbS VQ of the REW;
(e) applying a filter to a vector quantizer codebook in the analysis-by-synthesis vector quantization of the rapidly evolving waveform, whereby to add self correlation to the codebook vectors; and
(f) using a coder in which a plurality of bits therein are allocated to the rapidly evolving waveform magnitude.

In addition, one can combine AbS quantization of the slowly evolving waveform with any or all of the foregoing parameters.

The new method achieves a substantial reduction in the REW bit rate, and the EWI achieves very close to toll quality, at least under clean speech conditions. These and other features, aspects, and advantages of the present invention will become better understood with regard to the following detailed description, appended claims, and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a REW Parametric Representation;
FIG. 2 is a REW Parametric VQ;
FIG. 3 is a REW Parametric Representation AbS VQ;
FIG. 4 is a REW Parametric Representation Simplified AbS VQ;
FIG. 5 is a REW Parametric Representation Simplified Weighted AbS VQ;
FIG. 6 is a block diagram of the Dual Predictive AbS SEW vector quantization;
FIG. 7 is a Weighted Signal-to-Noise Ratio (SNR) for Dual Predictive AbS SEW VQ;
FIG. 8 is an output Weighted SNR for the 18 codebooks, 9-bit AbS SEW VQ;
`
`
FIG. 9 is a mean-removed SEW Weighted SNR for the 18 codebooks, 9-bit AbS SEW VQ; and
FIG. 10 shows predictors for three REW parameter ranges.

DETAILED DESCRIPTION

In very low bit rate WI coding, the relation between the SEW and the REW magnitudes was exploited by computing the magnitude of one as the unity complement of the other; see W. B. Kleijn and J. Haagen, (1995), “A Speech Coder Based on Decomposition of Characteristic Waveforms”, IEEE ICASSP'95, pp. 508-511; W. B. Kleijn and J. Haagen, (1995), “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 5, pp. 175-207; I. S. Burnett and G. J. Bradley, (1995), “New Techniques for Multi-Prototype Waveform Coding at 2.84 kb/s”, IEEE ICASSP'95, pp. 261-263; I. S. Burnett and G. J. Bradley, (1995), “Low Complexity Decomposition and Coding of Prototype Waveforms”, IEEE Workshop on Speech Coding for Telecommunications, pp. 23-24; I. S. Burnett and D. H. Pham, (1997), “Multi-Prototype Waveform Coding using Frame-by-Frame Analysis-by-Synthesis”, IEEE ICASSP'97, pp. 1567-1570; W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), “A Low-Complexity Waveform Interpolation Coder”, IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps”, IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), “Low Complexity Speech Coding at 1.2 to 2.4 kbps Based on Waveform Interpolation”, International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341.

Also, since the sequence of SEW magnitudes evolves slowly, successive SEWs exhibit similarity, offering opportunities for redundancy removal. Additional forms of redundancy that may be exploited for coding efficiency are: (a) for a fixed SEW/REW decomposition filter, the mean SEW magnitude increases with the pitch period, and (b) the similarity between successive SEWs also increases with the pitch period. In this work we introduce a novel “dual-predictive” AbS paradigm for quantizing the SEW magnitude that optimally exploits the information about the current quantized REW, the past quantized SEW, and the pitch, in order to predict the current SEW.

Introduction to REW Quantization

The REW represents the rapidly changing unvoiced attribute of speech. Commonly in WI systems, the REW is quantized on a waveform-by-waveform basis. Hence, for low rate WI systems having a long frame size and a large number of waveforms per frame, the relative bitrate required for the REW becomes excessive. For example, consider a potential 2 kbps system which uses a 240 sample frame, 12 waveforms per frame, and which quantizes the REW by an alternating bit allocation of 3 bits and 1 bit per waveform. The REW bitrate is then 6 × 3 + 6 × 1 = 24 bits per frame, or 800 bps (taking the 240-sample frame as 30 ms), which is 40% of the total bitrate. This example demonstrates the need for a more efficient REW quantization.

Efficient REW quantization can benefit from two observations: (1) the REW magnitude is typically an increasing function of the frequency, which suggests that an efficient parametric representation may be used; (2) one can observe a similarity between successive REW magnitude spectra, which may suggest a potential gain by employing predictive VQ on a group of adjacent REWs. The next two sections propose a REW parametric representation and its respective VQ.

REW Parametric Representation

Direct quantization of the REW magnitude is a variable dimension quantization problem, which may result in spending bits and computational effort on perceptually irrelevant information. A simple and practical way to obtain a reduced, and fixed, dimension representation of the REW is with a linear combination of basis functions, such as orthonormal polynomials; see W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341. Such a representation usually produces a smoother REW magnitude, and improves the perceptual quality. Suppose the REW magnitude, $R(\omega)$, is represented by a linear combination of orthonormal functions, $\psi_i(\omega)$:

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i \psi_i(\omega), \quad 0 \le \omega \le \pi \qquad (1)$$

where $\omega$ is the angular frequency, and $I$ is the representation order. The REW magnitude is typically an increasing function of frequency, which can be coarsely quantized with a low number of bits per waveform without significant perceptual degradation. Therefore, it may be advantageous to represent the REW magnitude in a simple, but perceptually relevant manner. Consequently we model the REW by the following parametric representation, $\hat{R}(\omega, \xi)$:

$$\hat{R}(\omega, \xi) = \sum_{i=0}^{I-1} \hat{\gamma}_i(\xi) \psi_i(\omega), \quad 0 \le \omega \le \pi; \; 0 \le \xi \le 1 \qquad (2)$$

where $\hat{\gamma}(\xi) = [\hat{\gamma}_0(\xi), \ldots, \hat{\gamma}_{I-1}(\xi)]^T$ is a parametric vector of coefficients within the representation model subspace, and $\xi$ is the “unvoicing” parameter, which is zero for a fully voiced spectrum and one for a fully unvoiced spectrum. Thus $\hat{R}(\omega, \xi)$ defines a two-dimensional surface whose cross sections for each value of $\xi$ give a particular REW magnitude spectrum, which is defined merely by specifying a scalar parameter value.

A simple and practical way for parametric representation of the REW is, for example, by a parametric linear combination of basis functions, such as polynomials with parametric coefficients, namely:

For practical considerations assume that the parametric representation is a piecewise linear function of $\xi$, and may therefore be represented by a set of $N$ uniformly spaced spectra, as illustrated in FIG. 1.

REW Parametric Vector Quantization

One can observe the similarity between successive REW magnitude spectra, which may suggest a potential gain by VQ of a set of successive REWs. FIG. 2 illustrates a simple parametric VQ system for a vector of REW spectra. The input is an $M$ dimensional vector of REW magnitude spectra,
`
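and the VQ output is an index, $j$, which determines a quantized parameter vector, $\hat{\xi}$:

$$\hat{\xi} = [\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_M]^T \qquad (5)$$

which parametrically determines a vector of quantized spectra:

$$\hat{R}(\omega) = \hat{R}(\omega, \hat{\xi}) = [\hat{R}(\omega, \hat{\xi}_1), \hat{R}(\omega, \hat{\xi}_2), \ldots, \hat{R}(\omega, \hat{\xi}_M)]^T \qquad (6)$$

The encoder searches, in the parameter codebook $C_q(\xi)$, for the parameter vector which minimizes the distortion:

$$\hat{\xi} = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} D\big(R_m, \hat{R}(\xi_m)\big) = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} \int_0^{\pi} \big|R_m(\omega) - \hat{R}(\omega, \xi_m)\big|^2 \, d\omega \qquad (7)$$

For example, suppose the input REW magnitude is represented by an $I$-dimensional vector of function coefficients, $\gamma$, given by:

$$\gamma = [\gamma_0, \gamma_1, \ldots, \gamma_{I-1}]^T \qquad (8)$$

For a set of $M$ input REWs, each of which is represented by a vector of polynomial coefficients, $\gamma_m$, these vectors form an $I \times M$ input coefficient matrix, $\Gamma$:

$$\Gamma = [\gamma_1, \gamma_2, \ldots, \gamma_M] \qquad (9)$$

The inverse VQ output is a vector of $M$ quantized REWs, which form the quantized function coefficient matrix:

$$\hat{\Gamma}(\hat{\xi}) = [\hat{\gamma}(\hat{\xi}_1), \hat{\gamma}(\hat{\xi}_2), \ldots, \hat{\gamma}(\hat{\xi}_M)] \qquad (10)$$

which is used by the decoder to compute the quantized spectra.

A. Quantization Using Orthonormal Functions

Orthonormal functions, such as polynomials, may be used for efficient quantization of the REW; see W. B. Kleijn, et al., (1996), IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341. Consider a REW magnitude, $R(\omega)$, represented by a linear combination of orthonormal functions, $\psi_i(\omega)$:

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i \psi_i(\omega), \quad 0 \le \omega \le \pi \qquad (11)$$

which is modeled using the parametric representation:

$$\hat{R}(\omega, \xi) = \sum_{i=0}^{I-1} \hat{\gamma}_i(\xi) \psi_i(\omega), \quad 0 \le \omega \le \pi; \; 0 \le \xi \le 1 \qquad (12)$$

The quantized REW parameter is then given by:

$$\hat{\xi} = \arg\min_{\xi} \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega, \xi)\big|^2 \, d\omega = \arg\min_{\xi} \big\|\gamma - \hat{\gamma}(\xi)\big\|^2 \qquad (13)$$

In the VQ case, the quantized parameter vector is given by:

$$\hat{\xi} = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} \big\|\gamma_m - \hat{\gamma}(\xi_m)\big\|^2 \qquad (14)$$

B. Piecewise Linear Parametric Representation

In order to have a simple representation that is computationally efficient and avoids excessive memory requirements, we model the two-dimensional surface by a piecewise linear parametric representation. Therefore, we introduce a set of $N$ uniformly spaced spectra, $\{\hat{R}(\omega, \hat{\xi}_n)\}_{n=0}^{N-1}$. Then the parametric surface is defined by linear interpolation according to:

$$\hat{R}(\omega, \xi) = (1-\alpha)\,\hat{R}(\omega, \hat{\xi}_{n-1}) + \alpha\,\hat{R}(\omega, \hat{\xi}_n), \quad \hat{\xi}_{n-1} \le \xi \le \hat{\xi}_n, \quad \alpha = \frac{\xi - \hat{\xi}_{n-1}}{\hat{\xi}_n - \hat{\xi}_{n-1}} \qquad (15)$$

Because this representation is linear, the coefficients of $\hat{R}(\omega, \xi)$ are linear combinations of the coefficients of $\hat{R}(\omega, \hat{\xi}_{n-1})$ and $\hat{R}(\omega, \hat{\xi}_n)$. Hence,

$$\hat{\gamma}(\xi) = (1-\alpha)\,\hat{\gamma}_{n-1} + \alpha\,\hat{\gamma}_n \qquad (16)$$

where $\hat{\gamma}_n$ is the coefficient vector of the $n$-th REW magnitude function representation:

$$\hat{\gamma}_n = \hat{\gamma}(\hat{\xi}_n) \qquad (17)$$

In this case, the distortion may be interpolated by:

$$\big\|\gamma - \hat{\gamma}(\xi)\big\|^2 = (1-\alpha)^2 \big\|\gamma - \hat{\gamma}_{n-1}\big\|^2 + 2\alpha(1-\alpha)\,(\gamma - \hat{\gamma}_{n-1})^T(\gamma - \hat{\gamma}_n) + \alpha^2 \big\|\gamma - \hat{\gamma}_n\big\|^2 \qquad (18)$$

The above can be easily generalized to the parameter VQ case. The optimal interpolation factor that minimizes the distortion between two representation vectors is given by:

$$\alpha_{opt} = \frac{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T (\gamma - \hat{\gamma}_{n-1})}{\big\|\hat{\gamma}_n - \hat{\gamma}_{n-1}\big\|^2} \qquad (19)$$

and the respective optimal parameter value, which is a continuous variable between zero and one, is given by:

$$\xi(\gamma) = \hat{\xi}_{n-1} + \alpha_{opt}\,(\hat{\xi}_n - \hat{\xi}_{n-1}) \qquad (20)$$

This result allows a rapid search for the best unvoicing parameter value needed to transform the coefficient vector to a scalar parameter, followed by the corresponding quantization scheme, as described in section 4.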
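The rapid search described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function name `best_unvoicing_parameter`, the array layout, and the clamping of the interpolation factor to each segment are choices made here for the example. For each pair of adjacent representation vectors it projects the input coefficient vector onto the connecting segment (a least-squares interpolation factor), evaluates the squared error, and keeps the best segment.

```python
import numpy as np

def best_unvoicing_parameter(gamma, gamma_hat, xi_grid):
    """Return (xi, distortion) minimizing ||gamma - gamma_hat(xi)||^2.

    gamma     : (I,) input coefficient vector
    gamma_hat : (N, I) coefficient vectors of the N representation spectra
    xi_grid   : (N,) increasing parameter values in [0, 1]
    All names are hypothetical; the piecewise linear model is the only
    thing taken from the text.
    """
    best_xi, best_dist = None, np.inf
    for n in range(1, len(xi_grid)):
        a, b = gamma_hat[n - 1], gamma_hat[n]
        step = b - a
        denom = float(step @ step)
        # Least-squares projection of gamma onto the segment a -> b.
        alpha = 0.0 if denom == 0.0 else float(step @ (gamma - a)) / denom
        alpha = min(1.0, max(0.0, alpha))  # assumption: stay inside the segment
        err = gamma - ((1.0 - alpha) * a + alpha * b)
        dist = float(err @ err)
        if dist < best_dist:
            best_xi = xi_grid[n - 1] + alpha * (xi_grid[n] - xi_grid[n - 1])
            best_dist = dist
    return best_xi, best_dist

# Toy example: three representation vectors in a two-dimensional coefficient space.
gamma_hat = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
xi_grid = np.array([0.0, 0.5, 1.0])
xi, dist = best_unvoicing_parameter(np.array([0.5, 0.2]), gamma_hat, xi_grid)
```

The search costs one inner product per segment, so it grows linearly with the number of representation spectra rather than requiring a dense scan over the parameter.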
`
`
`
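C. Weighted Distortion Quantization

Commonly in speech coding, the magnitude is quantized using a weighted distortion measure. In this case the quantized REW parameter is then given by:

$$\hat{\xi} = \arg\min_{\xi} \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega, \xi)\big|^2 \, W(\omega) \, d\omega \qquad (21)$$

and the orthonormal function simplification, given in equation (13), cannot be used. In this case, the weighted distortion between the input and the parametric representation modeled spectra is equal to:

$$D_W\big(R, \hat{R}(\xi)\big) = \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega, \xi)\big|^2 \, W(\omega) \, d\omega = \big(\gamma - \hat{\gamma}(\xi)\big)^T \Psi\big(W(\omega)\big) \big(\gamma - \hat{\gamma}(\xi)\big) \qquad (22)$$

where $\Psi(W(\omega))$ is the weighted correlation matrix of the orthonormal functions, whose elements are:

$$\Psi_{ij} = \int_0^{\pi} W(\omega)\, \psi_i(\omega)\, \psi_j(\omega) \, d\omega \qquad (23)$$

$\gamma$ is the input coefficient vector, and $\hat{\gamma}(\xi)$ is the modeled parametric coefficient vector. In the VQ case, the quantized parameter vector is given by:

$$\hat{\xi} = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} D_W\big(R_m, \hat{R}(\xi_m)\big) = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} \big(\gamma_m - \hat{\gamma}(\xi_m)\big)^T \Psi\big(W_m(\omega)\big) \big(\gamma_m - \hat{\gamma}(\xi_m)\big) \qquad (24)$$

D. Weighted Distortion: Piecewise Linear Parametric Representation

Again, for practical considerations assume that the parametric representation is piecewise linear, and may be represented by a set of $N$ spectra, $\{\hat{R}(\omega, \hat{\xi}_n)\}_{n=0}^{N-1}$. For the piecewise linear representation, the interpolated quantized coefficient vector is:

$$\hat{\gamma}(\xi) = (1-\alpha)\,\hat{\gamma}_{n-1} + \alpha\,\hat{\gamma}_n, \quad \hat{\xi}_{n-1} \le \xi \le \hat{\xi}_n \qquad (25)$$

In the case where parameter VQ is employed, the interpolation allows for a substantial simplification of the search computations. In this case, the distortion can be interpolated:

$$D_W\big(R, \hat{R}(\xi)\big) = (1-\alpha)^2 (\gamma - \hat{\gamma}_{n-1})^T \Psi (\gamma - \hat{\gamma}_{n-1}) + 2\alpha(1-\alpha)\,(\gamma - \hat{\gamma}_{n-1})^T \Psi (\gamma - \hat{\gamma}_n) + \alpha^2 (\gamma - \hat{\gamma}_n)^T \Psi (\gamma - \hat{\gamma}_n) \qquad (26)$$

The above can be easily generalized to the parameter VQ case. The optimal parameter that minimizes the spectrally weighted distortion between two representation vectors is given by:

$$\alpha_{opt} = \frac{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T \Psi (\gamma - \hat{\gamma}_{n-1})}{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T \Psi (\hat{\gamma}_n - \hat{\gamma}_{n-1})} \qquad (27)$$

and the respective optimal parameter value, which is a continuous variable between zero and one, is given by equation (20). This result allows a rapid search for the best unvoicing parameter value needed to transform the coefficient vector to a scalar parameter, for encoding or for VQ design. Alternatively, in order to eliminate using the matrix $\Psi$, the scalar product may be redefined to incorporate the time-varying spectral weighting. The respective orthonormal basis functions then satisfy:

$$\int_0^{\pi} W(\omega)\, \psi_i(\omega)\, \psi_j(\omega) \, d\omega = \delta(i-j) \qquad (28)$$

where $\delta(i-j)$ denotes the Kronecker delta. The respective parameter vector is given by:

$$\gamma = \int_0^{\pi} W(\omega)\, R(\omega)\, \psi(\omega) \, d\omega \qquad (29)$$

where $\psi(\omega) = [\psi_0, \psi_1, \ldots, \psi_{I-1}]^T$ is an $I$-dimensional vector of time-varying orthonormal functions.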
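As a numerical illustration of the weighted correlation matrix, the sketch below builds the matrix by quadrature and checks that the quadratic form in the coefficient difference matches the weighted spectral distortion computed directly. Everything here is an assumption made for the example: the cosine basis, the weighting function, the grid size, and all function names are hypothetical and are not taken from the patent.

```python
import numpy as np

def cosine_basis(omega, order):
    """Rows are psi_i(omega): an orthonormal cosine basis on [0, pi] (illustrative choice)."""
    psi = np.empty((order, omega.size))
    psi[0] = 1.0 / np.sqrt(np.pi)
    for i in range(1, order):
        psi[i] = np.sqrt(2.0 / np.pi) * np.cos(i * omega)
    return psi

def trapezoid_weights(omega):
    """Quadrature weights q so that sum(q * f(omega)) approximates the integral of f."""
    q = np.zeros_like(omega)
    d = np.diff(omega)
    q[:-1] += d / 2.0
    q[1:] += d / 2.0
    return q

def weighted_correlation(psi, weight, q):
    """Psi[i, j] ~ integral of W(omega) * psi_i(omega) * psi_j(omega) over [0, pi]."""
    return (psi * (weight * q)) @ psi.T

def weighted_distortion(gamma, gamma_hat, Psi):
    """Quadratic form (gamma - gamma_hat)^T Psi (gamma - gamma_hat)."""
    d = gamma - gamma_hat
    return float(d @ Psi @ d)

omega = np.linspace(0.0, np.pi, 513)
psi = cosine_basis(omega, 4)
q = trapezoid_weights(omega)
W = 1.0 / (1.0 + omega)                       # hypothetical spectral weighting
Psi = weighted_correlation(psi, W, q)

gamma = np.array([1.0, -0.3, 0.2, 0.05])      # input coefficients (made up)
gamma_hat = np.array([0.9, -0.2, 0.1, 0.0])   # modeled coefficients (made up)
via_matrix = weighted_distortion(gamma, gamma_hat, Psi)
# Same distortion computed directly from the spectra on the frequency grid.
direct = float(np.sum(q * W * ((gamma - gamma_hat) @ psi) ** 2))
```

With a flat weighting the matrix reduces to the identity, which is the unweighted orthonormal case; a non-flat weighting introduces off-diagonal terms, which is why the unweighted simplification no longer applies.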
REW Parameter Analysis-By-Synthesis VQ

This section presents the AbS VQ paradigm for the REW parameter. The first presentation is a system which quantizes the REW parameter by employing spectral based AbS. Then simplified systems, which apply AbS to the REW parameter, are presented.

A. REW Parameter Quantization by Magnitude AbS VQ

The novel Analysis-by-Synthesis (AbS) REW parameter VQ technique is illustrated in FIG. 3. An excitation vector $c_{ij}(m)$ ($m = 1, \ldots, M$) is selected from the VQ codebook and is fed through a synthesis filter to obtain a parameter vector $\hat{\xi}(m)$ (synthesized quantized), which is then mapped to quantized representation coefficient vectors $\hat{\gamma}(\hat{\xi}(m))$. This is compared with a sequence of input representation coefficient vectors $\gamma(m)$, and each is spectrally weighted. Each spectrally weighted error is then temporally weighted, and a distortion measure is obtained. A search through all candidate excitation vectors determines an optimal choice. The synthesis filter in FIG. 3 can be viewed as a first order predictor in a feedback loop. (While shown here is an auto-regressive synthesis filter, in other arrangements a moving-average (MA) synthesis filter may be used.) By allowing the value of the predictor parameter $P$ to change, it becomes a “switched-predictor” scheme. Switched-prediction is introduced to allow for different levels of REW parameter correlation.

The scheme incorporates both spectral weighting and temporal weighting. The spectral weighting is used for the distortion between each pair of input and quantized spectra. In order to improve SEW/REW mixing, particularly in mixed voiced and unvoiced speech segments, and to increase speech crispness, especially for plosives and onsets, temporal
`
`
weighting is incorporated in the AbS REW VQ. The temporal weighting is a monotonic function of the temporal gain. Two codebooks are used, and each codebook has an associated predictor coefficient, $P_1$ and $P_2$. The quantization target is an $M$-dimensional vector of REW spectra. Each REW spectrum is represented by a vector of basis function coefficients denoted by $\gamma(m)$. The search for the minimal WMSE is performed over all the vectors, $c_{ij}(m)$, of the two codebooks for $i = 1, 2$. The quantized REW function coefficient vector, $\hat{\gamma}(\hat{\xi}(m))$, is a function of the quantized parameter $\hat{\xi}(m)$, which is obtained by passing the quantized vector, $c_{ij}(m)$, through the synthesis filter. The weighted distortion between each pair of input and quantized REW spectra is calculated. The total distortion is a temporally-weighted sum of the $M$ spectrally weighted distortions. Since the predictor coefficients are known, direct VQ can be used to simplify the computations. For a piecewise linear parametric REW representation, a substantial simplification of the search computations may be obtained by interpolating the distortion between the representation spectra set, as explained in sections 3B. and 3D.

A sequence of quantized parameters, such as $\hat{c}(k)$, is formed by concatenating successive quantized vectors, such as $\{\hat{c}_{ij}(m)\}_{m=1}^{M}$. The quantized parameter is computed recursively by:

$$\hat{\xi}(k) = P(k)\,\hat{\xi}(k-1) + \hat{c}(k) \qquad (30)$$

where $k$ is the time index of the coded waveform.
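The recursion of equation (30) is a first-order autoregressive filter whose state carries over from one frame to the next. A minimal sketch follows; the function name, the per-frame constant predictor, and the numeric values are assumptions made for illustration only.

```python
import numpy as np

def synthesize_parameters(excitation, predictor, prev=0.0):
    """Run xi_hat(k) = P * xi_hat(k-1) + c_hat(k) over one frame.

    excitation : sequence of quantized excitation values c_hat(k)
    predictor  : predictor coefficient P used for this frame
    prev       : last synthesized value of the previous frame (filter state)
    """
    out = np.empty(len(excitation))
    state = prev
    for k, c in enumerate(excitation):
        state = predictor * state + c
        out[k] = state
    return out

# Two consecutive frames using different (switched) predictor coefficients.
frame1 = synthesize_parameters([0.2, 0.1], predictor=0.5)
frame2 = synthesize_parameters([0.0, 0.3], predictor=0.9, prev=frame1[-1])
```

Passing the final value of one frame as `prev` to the next is what lets the decoder reproduce the encoder's parameter track from the transmitted indices alone.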
B. Simplified REW Parameter AbS VQ

The above scheme maps each quantized parameter to a coefficient vector, which is used to compute the spectral distortion. To reduce complexity, such mapping and spectral distortion computation may be eliminated by using the simplified scheme described below. For a high rate, and a smooth representation surface $\hat{R}(\omega, \xi)$, the total distortion is equal to the sum of modeling distortion and quantization distortion:

$$\sum_{m=1}^{M} D_W\big(R_m, \hat{R}(\hat{\xi}(m))\big) = \sum_{m=1}^{M} D_W\big(R_m, \hat{R}(\xi(m))\big) + \sum_{m=1}^{M} D_W\big(\hat{R}(\xi(m)), \hat{R}(\hat{\xi}(m))\big) \qquad (31)$$

The quantization distortion is related to the quantized parameter by:

$$D_W\big(\hat{R}(\xi(m)), \hat{R}(\hat{\xi}(m))\big) = \big(\hat{\gamma}(\xi(m)) - \hat{\gamma}(\hat{\xi}(m))\big)^T \Psi \big(\hat{\gamma}(\xi(m)) - \hat{\gamma}(\hat{\xi}(m))\big) \qquad (32)$$

which, for the piecewise linear representation case, with $\Delta = \hat{\xi}_n - \hat{\xi}_{n-1}$ the spacing of the representation spectra, is equal to

$$\frac{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T \Psi (\hat{\gamma}_n - \hat{\gamma}_{n-1})}{\Delta^2} \, \big(\xi(m) - \hat{\xi}(m)\big)^2 \qquad (33)$$

which is linearly related to the REW parameter squared quantization error, $(\xi(m) - \hat{\xi}(m))^2$, and therefore justifies direct VQ of the REW parameter.

B.1. Simplified REW Parameter AbS VQ: Non-Weighted Distortion

FIG. 4 illustrates a simplified AbS VQ for the REW parametric representation. The encoder maps the REW magnitude to an unvoicing REW parameter, and then quantizes the parameter by AbS VQ. Initially, the magnitudes of the $M$ REWs in the frame are mapped to coefficient vectors, $\{\gamma(m)\}_{m=1}^{M}$. Then, for each coefficient vector, a search is performed to find the optimal representation parameter, $\xi(\gamma)$, using equation (20), to form an $M$-dimensional parameter vector for the current frame, $\{\xi(\gamma(m))\}_{m=1}^{M}$. Finally, the parameter vector is encoded by AbS VQ. The decoded spectra, $\{\hat{R}(\omega, \hat{\xi}(m))\}_{m=1}^{M}$, are obtained from the quantized parameter vector, $\{\hat{\xi}(m)\}_{m=1}^{M}$, using equation (15). This scheme allows for higher temporal, as well as spectral, REW resolution compared to the common method described in W. B. Kleijn, et al., IEEE ICASSP'95, pp. 508-511 (1995), since no downsampling is performed, and the continuous parameter is vector quantized in AbS.

B.2. Simplified REW Parameter AbS VQ: Weighted Distortion

The simplified quantization scheme is improved to incorporate spectral and temporal weightings, as illustrated in FIG. 5. Each REW coefficient vector is first mapped to a REW parameter by minimizing a distortion, which is weighted by the coefficient spectral weighting matrix $\Psi$, as described in section 3.D. Then, the resulting REW parameter is used to compute a weighting, $w_s(\xi(m))$, which we choose to be the spectral sensitivity to the REW parameter squared quantization error, $(\xi(m) - \hat{\xi}(m))^2$, given by:

$$w_s\big(\xi(m)\big) = \frac{\partial D_W\big(\hat{R}(\xi(m)), \hat{R}(\hat{\xi}(m))\big)}{\partial \big(\xi(m) - \hat{\xi}(m)\big)^2} \qquad (34)$$

For the piecewise linear representation case, using equation (33), the following equation is obtained:

$$w_s\big(\xi(m)\big) = \frac{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T \Psi (\hat{\gamma}_n - \hat{\gamma}_{n-1})}{\Delta^2}, \quad \hat{\xi}_{n-1} \le \xi(m) \le \hat{\xi}_n \qquad (35)$$

The above derivative can be easily computed off line. Additionally, a temporal weighting, in the form of a monotonic function of the gain, denoted by $w_t(g(m))$, is used to give relatively large weight to waveforms with larger gain values. The AbS REW parameter quantization is computed by minimizing the combined spectrally and temporally weighted distortion:

$$\hat{\xi} = \arg\min_{\hat{\xi}} \sum_{m=1}^{M} w_t\big(g(m)\big)\, w_s\big(\xi(m)\big)\, \big(\xi(m) - \hat{\xi}(m)\big)^2 \qquad (36)$$

The weighted distortion scheme improves the reconstructed speech quality, most notably in mixed voiced and unvoiced speech segments. This may be explained by an improvement in REW/SEW mixing.
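The simplified weighted search can be sketched as a closed-loop loop over switched codebooks. This is a hedged illustration rather than the patent's implementation: the function name `abs_parameter_vq`, the argument layout, the codebook contents, and the weight values are all hypothetical. Each candidate excitation vector is passed through the first-order synthesis filter, and the candidate with the smallest weighted squared parameter error wins.

```python
import numpy as np

def abs_parameter_vq(xi, w_spec, w_temp, codebooks, predictors, prev=0.0):
    """Closed-loop search minimizing sum_m w_temp[m] * w_spec[m] * (xi[m] - xi_hat[m])**2.

    xi         : (M,) target (unquantized) REW parameters for the frame
    w_spec     : (M,) spectral sensitivity weights
    w_temp     : (M,) temporal (gain-dependent) weights
    codebooks  : list of (J, M) arrays of excitation vectors, one per predictor
    predictors : list of predictor coefficients, one per codebook
    prev       : last synthesized parameter of the previous frame
    Returns (codebook index, vector index, xi_hat, distortion).
    """
    xi = np.asarray(xi, dtype=float)
    weights = np.asarray(w_spec, dtype=float) * np.asarray(w_temp, dtype=float)
    best = (None, None, None, np.inf)
    for i, (book, p) in enumerate(zip(codebooks, predictors)):
        for j, excitation in enumerate(book):
            xi_hat = np.empty(len(xi))
            state = prev
            for m, c in enumerate(excitation):
                state = p * state + c  # synthesis filter inside the search loop
                xi_hat[m] = state
            dist = float(np.sum(weights * (xi - xi_hat) ** 2))
            if dist < best[3]:
                best = (i, j, xi_hat, dist)
    return best

# Toy example: the temporal weights decide which waveform must be matched.
book = np.array([[0.0, 0.0], [1.0, 1.0]])
target = [0.0, 1.0]
first_heavy = abs_parameter_vq(target, [1.0, 1.0], [10.0, 1.0], [book], [0.0])
second_heavy = abs_parameter_vq(target, [1.0, 1.0], [1.0, 10.0], [book], [0.0])
```

In the toy example neither codevector matches both waveforms, so the selected index flips with the temporal weights, which is the mechanism by which high-gain waveforms are favored.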
`
`
`
Dual Predictive AbS SEW Quantization

FIG. 6 illustrates a Dual Predictive SEW AbS VQ scheme which uses two observables, (a) the quantized REW, and (b) the past quantized SEW, to jointly predict the current SEW. Although we refer to the operator on each observable as a “predictor”, in fact both are components of a single optimized estimator. The SEW and the REW are complex random vectors, and their sum is a residual vector having elements whose magnitudes have a mean value of unity. In low bit-rate W