Gottesman et al.

(10) Patent No.: US 7,010,482 B2
(45) Date of Patent: Mar. 7, 2006

(54) REW PARAMETRIC VECTOR QUANTIZATION AND DUAL-PREDICTIVE SEW VECTOR QUANTIZATION FOR WAVEFORM INTERPOLATIVE CODING

(75) Inventors: Oded Gottesman, Goleta, CA (US); Allen Gersho, Goleta, CA (US)

(73) Assignee: The Regents of the University of California, Oakland, CA (US)

Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 543 days.

(21) Appl. No.: 09/811,187

(22) Filed: Mar. 16, 2001

(65) Prior Publication Data
US 2002/0116184 A1, Aug. 22, 2002

Related U.S. Application Data
(60) Provisional application No. 60/190,371, filed on Mar. 17, 2000.

(51) Int. Cl.: G10L 19/14 (2006.01)
(52) U.S. Cl.: 704/222
D.H. Pham et al., “Quantisation techniques for prototype waveforms,” Fourth International Symposium on Signal Processing and Its Applications ’96, vol. 1, pp. 53-56, Aug. 1996.*
Oded Gottesman et al., “Enhancing Waveform Interpolative Coding With Weighted REW Parametric Quantization,” IEEE Workshop on Speech Coding (2000), pp. 1-3.
I.S. Burnett et al., “Multi-Prototype Waveform Coding Using Frame-By-Frame Analysis-By-Synthesis,” Department of Electrical and Computer Engineering, University of Wollongong, NSW, Australia (1997), pp. 1567-1570.
I.S. Burnett et al., “New Techniques for Multi-Prototype Waveform Coding at 2.84kb/s,” Department of Electrical and Computer Engineering, University of Wollongong, NSW, Australia (1995), pp. 261-264.
I.S. Burnett et al., “Low Complexity Decomposition and Coding of Prototype Waveforms,” Dept. of Electrical and Computer Eng., University of Wollongong, NSW, 2522, Australia, pp. 23-24.
I.S. Burnett et al., “A Mixed Prototype Waveform/CELP Coder for Sub 3KB/S,” School of Electronic and Electrical Engineering, University of Bath, UK. BA2 7AY (1993), pp. II-175-II-178.
`
(Continued)

Primary Examiner: Susan McFadden
(74) Attorney, Agent, or Firm: Fulbright & Jaworski

(58) Field of Classification Search: 704/230, 704/211, 219-223, 225, 229, 270, 500
See application file for complete search history.
`
`(57)
`
`ABSTRACT
`
(56) References Cited

U.S. PATENT DOCUMENTS

5,517,595 A *   5/1996   Kleijn                 704/205
6,493,664 B1 *  12/2002  Udaya Bhaskar et al.   704/222
6,691,092 B1 *  2/2004   Udaya Bhaskar et al.   704/265
`OTHER PUBLICATIONS
`
U. Bhasker et al., “Quantization of SEW and REW components for 3.6 kbits/s coding based on PWI,” IEEE Workshop on Speech Coding Proceedings, pp. 99-101, Jun. 1999.*
`
[Representative drawing: block diagram; legible labels include quantized REW, means, predictors, quantized REW parameter, vector quantizer codebook, and pitch.]
An enhanced analysis-by-synthesis waveform interpolative speech coder able to operate at 2.8 kbps. Novel features include dual-predictive analysis-by-synthesis quantization of the slowly-evolving waveform, efficient parametrization of the rapidly-evolving waveform magnitude, and analysis-by-synthesis vector quantization of the rapidly evolving waveform parameter. Subjective quality tests indicate that its quality exceeds that of G.723.1 at 5.3 kbps and that of G.723.1 at 6.3 kbps.
`
`8 Claims, 6 Drawing Sheets
`
`
`
`
`
`OTHER PUBLICATIONS
`
Oded Gottesman, “Dispersion Phase Vector Quantization for Enhancement of Waveform Interpolative Coder,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-4.
Oded Gottesman et al., “Enhanced Waveform Interpolative Coding at 4 KBPS,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-3.
Oded Gottesman et al., “High Quality Enhanced Waveform Interpolative Coding at 2.8 KBPS,” IEEE International Conference on Acoustics, Speech, and Signal Processing, 2000, pp. 1-4.
Oded Gottesman et al., “Enhanced Analysis-By-Synthesis Waveform Interpolative Coding at 4 KBPS,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-4.
Daniel W. Griffin et al., “Multiband Excitation Vocoder,” IEEE Transactions on Acoustics, Speech, and Signal Processing (1988) 36(8):1223-1235.
W. Bastiaan Kleijn et al., “A Speech Coder Based on Decomposition of Characteristic Waveforms,” IEEE (1995), pp. 508-511.
W. Bastiaan Kleijn et al., “Waveform Interpolation for Coding and Synthesis,” Speech Coding and Synthesis (1995), pp. 175-207.
W. Bastiaan Kleijn et al., “Transformation and Decomposition of the Speech Signal for Coding,” IEEE Signal Processing Letters 1(9):136-138 (1994).
W. Bastiaan Kleijn, “Encoding Speech Using Prototype Waveforms,” IEEE Transactions on Speech and Audio Processing 1(4):386-399 (1993).
W. Bastiaan Kleijn, “Continuous Representations in Linear Predictive Coding,” Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974 (1991), pp. 201-204.
W. Bastiaan Kleijn et al., “A Low-Complexity Waveform Interpolation Coder,” Speech Coding Research Department, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA (1996), pp. 212-215.
R.J. McAulay et al., “Sinusoidal Coding,” Speech Coding and Synthesis 4:121-173 (1995).
Yair Shoham, “High-Quality Speech Coding at 2.4 to 4.0 KBPS Based on Time Frequency Interpolation,” IEEE, pp. II-167-II-170 (1993).
Yair Shoham, “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 KBPS,” IEEE, pp. 1599-1602 (1997).
Yair Shoham, “Low Complexity Speech Coding at 1.2 to 2.4 kbps Based on Waveform Interpolation,” International Journal of Speech Technology 2:329-341 (1999).
`
`* cited by examiner
`
`
`
[Drawing Sheets 1 to 6, U.S. Patent, Mar. 7, 2006, US 7,010,482 B2 (FIGS. 1 to 10). The drawings are not recoverable from the text extraction. Legible labels: Sheet 1: REW parameter, vector of REW spectra, vector quantizer codebook, vector of quantized REW spectra. Sheet 5 (includes FIG. 8): output SEW and mean-removed SEW versus bits; harmonics ranges 9-14, 15-19, 20-24, 25-29, 30-35, and 36-69 for voiced, intermediate, and unvoiced speech. Sheet 6 (FIGS. 9 and 10): the same harmonics ranges for voiced, intermediate, and unvoiced speech; SEW predictor and REW predictor versus harmonics for the voiced, intermediate, and unvoiced ranges.]
REW PARAMETRIC VECTOR QUANTIZATION AND DUAL-PREDICTIVE SEW VECTOR QUANTIZATION FOR WAVEFORM INTERPOLATIVE CODING

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of Provisional Patent Application Ser. No. 60/190,371, filed Mar. 17, 2000, which application is herein incorporated by reference.

BACKGROUND OF THE INVENTION
`
The present invention relates to vector quantization (VQ) in speech coding systems using waveform interpolation.
In recent years, there has been increasing interest in achieving toll-quality speech coding at rates of 4 kbps and below. Currently, there is an ongoing 4 kbps standardization effort conducted by an international standards body (The International Telecommunications Union-Telecommunication (ITU-T) Standardization Sector). The expanding variety of emerging applications for speech coding, such as third generation wireless networks and Low Earth Orbit (LEO) systems, is motivating increased research efforts. The speech quality produced by waveform coders such as code-excited linear prediction (CELP) coders degrades rapidly at rates below 5 kbps; see B. S. Atal and M. R. Schroeder, (1984), “Stochastic Coding of Speech at Very Low Bit Rate”, Proc. Int. Conf. Comm., Amsterdam, pp. 1610-1613.
On the other hand, parametric coders, such as the waveform-interpolative (WI) coder, the sinusoidal-transform coder (STC), and the multiband-excitation (MBE) coder, produce good quality at low rates but do not achieve toll quality; see Y. Shoham, IEEE ICASSP’93, Vol. II, pp. 167-170 (1993); I. S. Burnett and R. J. Holbeche, (1993), IEEE ICASSP’93, Vol. II, pp. 175-178; W. B. Kleijn, (1993), IEEE Trans. Speech and Audio Processing, Vol. 1, No. 4, pp. 386-399; W. B. Kleijn and J. Haagen, (1994), IEEE Signal Processing Letters, Vol. 1, No. 9, pp. 136-138; W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP’95, pp. 508-511; W. B. Kleijn and J. Haagen, (1995), in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 5, pp. 175-207; I. S. Burnett and G. J. Bradley, (1995), IEEE ICASSP’95, pp. 261-263; I. S. Burnett and G. J. Bradley, (1995), IEEE Workshop on Speech Coding for Telecommunications, pp. 23-24; I. S. Burnett and D. H. Pham, (1997), IEEE ICASSP’97, pp. 1567-1570; W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), IEEE ICASSP’96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP’97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341; R. J. McAulay and T. F. Quatieri, (1995), in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 4, pp. 121-173; and D. Griffin and J. S. Lim, (1988), IEEE Trans. ASSP, Vol. 36, No. 8, pp. 1223-1235. This is largely due to the lack of robustness of speech parameter estimation, which is commonly done in open loop, and to inadequate modeling of non-stationary speech segments.
Commonly in WI coding, the similarity between successive rapidly evolving waveform (REW) magnitudes is exploited by downsampling and interpolation and by constrained bit allocation; see W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP’95, pp. 508-511. In a previous Enhanced Waveform Interpolative (EWI) coder the REW magnitude was quantized on a waveform-by-waveform basis; see O. Gottesman and A. Gersho, (1999), “Enhanced Waveform Interpolative Coding at 4 kbps”, IEEE Speech Coding Workshop, pp. 90-92, Finland; O. Gottesman and A. Gersho, (1999), “Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 kbps”, EUROSPEECH’99, pp. 1443-1446, Hungary.
`
SUMMARY OF THE INVENTION

The present invention describes novel methods that enhance the performance of the WI coder and allow for better coding efficiency, improving on the above 1999 Gottesman and Gersho procedure. The present invention incorporates analysis-by-synthesis (AbS) for parameter estimation, offers higher temporal and spectral resolution for the REW, and provides more efficient quantization of the slowly-evolving waveform (SEW). In particular, the present invention proposes a novel efficient parametric representation of the REW magnitude, an efficient paradigm for AbS predictive VQ of the REW parameter sequence, and dual-predictive AbS quantization of the SEW.
More particularly, the invention provides a method for interpolative coding of input signals, the signals being decomposed into or composed of a slowly evolving waveform and a rapidly evolving waveform having a magnitude. The method incorporates at least one, preferably a combination, or all of the following steps:
(a) AbS VQ of the REW;
(b) parametrizing the magnitude of the REW;
(c) incorporating temporal weighting in the AbS VQ of the REW;
(d) incorporating spectral weighting in the AbS VQ of the REW;
(e) applying a filter to a vector quantizer codebook in the analysis-by-synthesis vector quantization of the rapidly evolving waveform, thereby adding self correlation to the codebook vectors; and
(f) using a coder in which a plurality of bits are allocated to the rapidly evolving waveform magnitude.
In addition, one can combine AbS quantization of the slowly evolving waveform with any or all of the foregoing steps.
The new method achieves a substantial reduction in the REW bit rate, and the EWI coder achieves very close to toll quality, at least under clean speech conditions. These and other features, aspects, and advantages of the present invention will become better understood with regard to the following detailed description, appended claims, and accompanying drawings.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a REW Parametric Representation;
FIG. 2 is a REW Parametric VQ;
FIG. 3 is a REW Parametric Representation AbS VQ;
FIG. 4 is a REW Parametric Representation Simplified AbS VQ;
FIG. 5 is a REW Parametric Representation Simplified Weighted AbS VQ;
FIG. 6 is a block diagram of the Dual Predictive AbS SEW vector quantization;
FIG. 7 is a Weighted Signal-to-Noise Ratio (SNR) for Dual Predictive AbS SEW VQ;
FIG. 8 is an output Weighted SNR for the 18 codebooks, 9-bit AbS SEW VQ;
FIG. 9 is a mean-removed SEW’s Weighted SNR for the 18 codebooks, 9-bit AbS SEW VQ; and
FIG. 10 shows predictors for three REW parameter ranges.
`
DETAILED DESCRIPTION

In very low bit rate WI coding, the relation between the SEW and the REW magnitudes was exploited by computing the magnitude of one as the unity complement of the other; see W. B. Kleijn and J. Haagen, (1995), “A Speech Coder Based on Decomposition of Characteristic Waveforms”, IEEE ICASSP’95, pp. 508-511; W. B. Kleijn and J. Haagen, (1995), “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 5, pp. 175-207; I. S. Burnett and G. J. Bradley, (1995), “New Techniques for Multi-Prototype Waveform Coding at 2.84 kb/s”, IEEE ICASSP’95, pp. 261-263; I. S. Burnett and G. J. Bradley, (1995), “Low Complexity Decomposition and Coding of Prototype Waveforms”, IEEE Workshop on Speech Coding for Telecommunications, pp. 23-24; I. S. Burnett and D. H. Pham, (1997), “Multi-Prototype Waveform Coding using Frame-by-Frame Analysis-by-Synthesis”, IEEE ICASSP’97, pp. 1567-1570; W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), “A Low-Complexity Waveform Interpolation Coder”, IEEE ICASSP’96, pp. 212-215; Y. Shoham, (1997), “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps”, IEEE ICASSP’97, pp. 1599-1602; Y. Shoham, (1999), “Low Complexity Speech Coding at 1.2 to 2.4 kbps Based on Waveform Interpolation”, International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341.
Also, since the sequence of SEW magnitudes evolves slowly, successive SEWs exhibit similarity, offering opportunities for redundancy removal. Additional forms of redundancy that may be exploited for coding efficiency are: (a) for a fixed SEW/REW decomposition filter, the mean SEW magnitude increases with the pitch period, and (b) the similarity between successive SEWs also increases with the pitch period. In this work we introduce a novel “dual-predictive” AbS paradigm for quantizing the SEW magnitude that optimally exploits the information about the current quantized REW, the past quantized SEW, and the pitch, in order to predict the current SEW.
`
Introduction to REW Quantization
The REW represents the rapidly changing unvoiced attribute of speech. Commonly in WI systems, the REW is quantized on a waveform-by-waveform basis. Hence, for low rate WI systems having a long frame size and a large number of waveforms per frame, the relative bitrate required for the REW becomes excessive. For example, consider a potential 2 kbps system which uses a 240 sample frame and 12 waveforms per frame, and which quantizes the REW by alternating bit allocation of 3 bits and 1 bit per waveform. The REW bitrate is then 24 bits per frame, or 800 bps, which is 40% of the total bitrate. This example demonstrates the need for a more efficient REW quantization.
Efficient REW quantization can benefit from two observations: (1) the REW magnitude is typically an increasing function of the frequency, which suggests that an efficient parametric representation may be used; (2) one can observe a similarity between successive REW magnitude spectra, which may suggest a potential gain by employing predictive VQ on a group of adjacent REWs. The next two sections propose a REW parametric representation and its respective VQ.

REW Parametric Representation
Direct quantization of the REW magnitude is a variable dimension quantization problem, which may result in spending bits and computational effort on perceptually irrelevant information. A simple and practical way to obtain a reduced, and fixed, dimension representation of the REW is with a linear combination of basis functions, such as orthonormal polynomials; see W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), IEEE ICASSP’96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP’97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341. Such a representation usually produces a smoother REW magnitude and improves the perceptual quality. Suppose the REW magnitude, R(ω), is represented by a linear combination of orthonormal functions, ψ_i(ω):

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i \psi_i(\omega), \quad 0 \le \omega \le \pi \tag{1}$$

where ω is the angular frequency, and I is the representation order. The REW magnitude is typically an increasing function of frequency, which can be coarsely quantized with a low number of bits per waveform without significant perceptual degradation. Therefore, it may be advantageous to represent the REW magnitude in a simple, but perceptually relevant manner. Consequently we model the REW by the following parametric representation, R̂(ω, ξ):

$$\hat{R}(\omega, \xi) = \sum_{i=0}^{I-1} \hat{\gamma}_i(\xi) \psi_i(\omega), \quad 0 \le \omega \le \pi; \; 0 \le \xi \le 1 \tag{2}$$

where γ̂(ξ) = [γ̂_0(ξ), . . . , γ̂_{I-1}(ξ)]^T is a parametric coefficient vector in the representation model subspace, and ξ is the “unvoicing” parameter, which is zero for a fully voiced spectrum and one for a fully unvoiced spectrum. Thus R̂(ω, ξ) defines a two-dimensional surface whose cross sections for each value of ξ give a particular REW magnitude spectrum, which is defined merely by specifying a scalar parameter value.
A simple and practical way for parametric representation of the REW is, for example, by a parametric linear combination of basis functions, such as polynomials with parametric coefficients, namely:

$$\hat{R}(\omega, \xi) = \sum_{i=0}^{I-1} \hat{\gamma}_i(\xi) \psi_i(\omega), \quad 0 \le \omega \le \pi; \; 0 \le \xi \le 1 \tag{3}$$
For practical considerations assume that the parametric representation is a piecewise linear function of ξ, and may therefore be represented by a set of N uniformly spaced spectra, as illustrated in FIG. 1.
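
The piecewise linear surface just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the coder's implementation: the names `interpolate_coefficients`, `rew_magnitude`, and `TABLE` are hypothetical, the N stored spectra are assumed to sit at ξ_n = n/(N-1), and a plain monomial basis stands in for whatever basis functions the coder actually uses.

```python
from typing import List, Sequence

def interpolate_coefficients(table: Sequence[Sequence[float]], xi: float) -> List[float]:
    """Coefficient vector for unvoicing parameter xi in [0, 1].

    table[n] is the coefficient vector of the n-th stored spectrum; the N
    spectra are ASSUMED to be uniformly spaced at xi_n = n / (N - 1).
    """
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    n_intervals = len(table) - 1
    pos = xi * n_intervals
    lower = min(int(pos), n_intervals - 1)   # index of the segment's left node
    alpha = pos - lower                      # interpolation factor inside the segment
    left, right = table[lower], table[lower + 1]
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(left, right)]

def rew_magnitude(coeffs: Sequence[float], omega: float) -> float:
    """Evaluate sum_i coeffs[i] * omega**i (monomial basis, illustration only)."""
    return sum(c * omega ** i for i, c in enumerate(coeffs))

# Hypothetical table with N = 3 spectra and I = 2 coefficients each.
TABLE = [[0.0, 0.0], [0.2, 0.1], [1.0, 0.0]]
```

For instance, `interpolate_coefficients(TABLE, 0.25)` lands halfway along the first segment, and `rew_magnitude` then evaluates the resulting spectrum at any frequency. Storing only N coefficient vectors keeps the memory cost fixed regardless of how finely ξ is later quantized.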
`
REW Parametric Vector Quantization
One can observe the similarity between successive REW magnitude spectra, which may suggest a potential gain by VQ of a set of successive REWs. FIG. 2 illustrates a simple parametric VQ system for a vector of REW spectra. The input is an M dimensional vector of REW magnitude spectra,

$$\mathbf{R}(\omega) = [R_1(\omega), R_2(\omega), \ldots, R_M(\omega)]^T \tag{4}$$
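and the VQ output is an index, j, which determines a quantized parameter vector, ξ̂:

$$\hat{\boldsymbol{\xi}} = [\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_M]^T \tag{5}$$

which parametrically determines a vector of quantized spectra:

$$\hat{\mathbf{R}}(\omega, \hat{\boldsymbol{\xi}}) = [\hat{R}(\omega, \hat{\xi}_1), \hat{R}(\omega, \hat{\xi}_2), \ldots, \hat{R}(\omega, \hat{\xi}_M)]^T \tag{6}$$

The encoder searches, in the parameter codebook C_q(ξ̂), for the parameter vector which minimizes the distortion:

$$\hat{\boldsymbol{\xi}} = \arg\min_{\hat{\boldsymbol{\xi}} \in C_q(\hat{\boldsymbol{\xi}})} \sum_{m=1}^{M} D\big(R_m, \hat{R}(\hat{\xi}_m)\big) \tag{7}$$

For example, suppose the input REW magnitude is represented by an I-th dimensional vector of function coefficients, γ, given by:

$$\boldsymbol{\gamma} = [\gamma_0, \gamma_1, \ldots, \gamma_{I-1}]^T \tag{8}$$

For a set of M input REWs, each of which is represented by a vector of polynomial coefficients, γ_m, which form a P×M input coefficient matrix, Γ:

$$\boldsymbol{\Gamma} = [\boldsymbol{\gamma}_1, \boldsymbol{\gamma}_2, \ldots, \boldsymbol{\gamma}_M] \tag{9}$$

The inverse VQ output is a vector of M quantized REWs, which form the quantized function coefficient matrix:

$$\hat{\boldsymbol{\Gamma}}(\hat{\boldsymbol{\xi}}) = [\hat{\boldsymbol{\gamma}}(\hat{\xi}_1), \hat{\boldsymbol{\gamma}}(\hat{\xi}_2), \ldots, \hat{\boldsymbol{\gamma}}(\hat{\xi}_M)] \tag{10}$$

which is used by the decoder to compute the quantized spectra.

A. Quantization Using Orthonormal Functions
Orthonormal functions, such as polynomials, may be used for efficient quantization of the REW; see W. B. Kleijn, et al., (1996), IEEE ICASSP’96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP’97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341. Consider a REW magnitude, R(ω), represented by a linear combination of orthonormal functions, ψ_i(ω):

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i \psi_i(\omega), \quad 0 \le \omega \le \pi \tag{11}$$

which is modeled using the parametric representation:

$$\hat{R}(\omega, \xi) = \sum_{i=0}^{I-1} \hat{\gamma}_i(\xi) \psi_i(\omega), \quad 0 \le \omega \le \pi; \; 0 \le \xi \le 1 \tag{12}$$

The quantized REW parameter is then given by:

$$\hat{\xi} = \arg\min_{\hat{\xi} \in C_q(\hat{\xi})} \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega, \hat{\xi})\big|^2 d\omega = \arg\min_{\hat{\xi} \in C_q(\hat{\xi})} \sum_{i=0}^{I-1} \big(\gamma_i - \hat{\gamma}_i(\hat{\xi})\big)^2 \tag{13}$$

In the VQ case, the quantized parameter vector is given by:

$$\hat{\boldsymbol{\xi}} = \arg\min_{\hat{\boldsymbol{\xi}} \in C_q(\hat{\boldsymbol{\xi}})} \sum_{m=1}^{M} \big\|\boldsymbol{\gamma}_m - \hat{\boldsymbol{\gamma}}(\hat{\xi}_m)\big\|^2 \tag{14}$$

B. Piecewise Linear Parametric Representation
In order to have a simple representation that is computationally efficient and avoids excessive memory requirements, we model the two dimensional surface by a piecewise linear parametric representation. Therefore, we introduce a set of N uniformly spaced spectra, {R̂(ω, ξ̂_n)}, n = 0, . . . , N-1. Then the parametric surface is defined by linear interpolation according to:

$$\hat{R}(\omega, \hat{\xi}) = (1-\alpha)\hat{R}(\omega, \hat{\xi}_{n-1}) + \alpha \hat{R}(\omega, \hat{\xi}_n); \quad \hat{\xi}_{n-1} \le \hat{\xi} \le \hat{\xi}_n, \quad \alpha = \frac{\hat{\xi} - \hat{\xi}_{n-1}}{\Delta}, \quad \Delta = \hat{\xi}_n - \hat{\xi}_{n-1} \tag{15}$$

Because this representation is linear, the coefficients of R̂(ω, ξ̂) are linear combinations of the coefficients of R̂(ω, ξ̂_{n-1}) and R̂(ω, ξ̂_n). Hence,

$$\hat{\boldsymbol{\gamma}}(\hat{\xi}) = (1-\alpha)\hat{\boldsymbol{\gamma}}_{n-1} + \alpha \hat{\boldsymbol{\gamma}}_n \tag{16}$$

where γ̂_n is the coefficient vector of the n-th REW magnitude function representation:

$$\hat{\boldsymbol{\gamma}}_n = \hat{\boldsymbol{\gamma}}(\hat{\xi}_n)$$

In this case, the distortion may be interpolated by:

$$D\big(R, \hat{R}(\hat{\xi})\big) = \int_0^{\pi} \big|R(\omega) - (1-\alpha)\hat{R}(\omega, \hat{\xi}_{n-1}) - \alpha \hat{R}(\omega, \hat{\xi}_n)\big|^2 d\omega \tag{17}$$

$$= \big\|\boldsymbol{\gamma} - (1-\alpha)\hat{\boldsymbol{\gamma}}_{n-1} - \alpha \hat{\boldsymbol{\gamma}}_n\big\|^2 \tag{18}$$

The above can be easily generalized to the parameter VQ case. The optimal interpolation factor that minimizes the distortion between two representation vectors is given by:

$$\alpha_{opt} = \frac{(\hat{\boldsymbol{\gamma}}_n - \hat{\boldsymbol{\gamma}}_{n-1})^T (\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}}_{n-1})}{\big\|\hat{\boldsymbol{\gamma}}_n - \hat{\boldsymbol{\gamma}}_{n-1}\big\|^2} \tag{19}$$

and the respective optimal parameter value, which is a continuous variable between zero and one, is given by:

$$\xi(\boldsymbol{\gamma}) = (1-\alpha_{opt})\hat{\xi}_{n-1} + \alpha_{opt}\hat{\xi}_n \tag{20}$$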
`
`
`
This result allows a rapid search for the best unvoicing parameter value needed to transform the coefficient vector to a scalar parameter, followed by the corresponding quantization scheme, as described in section 4.
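
The rapid search can be illustrated with a short Python sketch. It assumes the projection form of the optimal interpolation factor (the inner product of the segment direction with the residual, divided by the squared segment length) and uniformly spaced nodes ξ_n = n/(N-1). The function name `best_unvoicing_parameter` is hypothetical, and clamping α to [0, 1] is an added safeguard of this sketch rather than something stated in the text.

```python
from typing import Optional, Sequence, Tuple

def best_unvoicing_parameter(
    gamma: Sequence[float], table: Sequence[Sequence[float]]
) -> Tuple[float, float]:
    """Return (xi, squared_error) minimizing ||gamma - gamma_hat(xi)||^2.

    table[n] is the stored coefficient vector at xi_n = n / (N - 1) (assumed).
    Each segment is solved in closed form; the best segment wins.
    """
    n_intervals = len(table) - 1
    delta = 1.0 / n_intervals
    best: Optional[Tuple[float, float]] = None
    for n in range(1, n_intervals + 1):
        prev, cur = table[n - 1], table[n]
        step = [c - p for c, p in zip(cur, prev)]        # gamma_hat_n - gamma_hat_{n-1}
        resid = [g - p for g, p in zip(gamma, prev)]     # gamma - gamma_hat_{n-1}
        denom = sum(s * s for s in step)
        alpha = sum(s * r for s, r in zip(step, resid)) / denom if denom > 0.0 else 0.0
        alpha = min(1.0, max(0.0, alpha))                # stay inside the segment (sketch's choice)
        err = sum((r - alpha * s) ** 2 for r, s in zip(resid, step))
        xi = (n - 1 + alpha) * delta                     # xi_{n-1} + alpha * delta
        if best is None or err < best[1]:
            best = (xi, err)
    assert best is not None
    return best
```

With the hypothetical table `[[0, 0], [1, 0], [1, 1]]`, the input `[0.5, 0.0]` lies exactly on the first segment, so the search returns ξ = 0.25 with zero error. The cost is one inner product per segment, which is why no exhaustive scan over ξ is needed.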
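C. Weighted Distortion Quantization
Commonly in speech coding, the magnitude is quantized using a weighted distortion measure. In this case the quantized REW parameter is then given by:

$$\hat{\xi} = \arg\min_{\hat{\xi} \in C_q(\hat{\xi})} \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega, \hat{\xi})\big|^2 w(\omega)\, d\omega \tag{21}$$

and the orthonormal function simplification, given in equation (13), cannot be used. In this case, the weighted distortion between the input and the parametric representation modeled spectra is equal to:

$$D_w\big(R, \hat{R}(\hat{\xi})\big) = \big(\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}}(\hat{\xi})\big)^T \boldsymbol{\Psi} \big(\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}}(\hat{\xi})\big) \tag{22}$$

where Ψ is the weighted correlation matrix of the orthonormal functions, whose elements are:

$$\Psi_{ij} = \int_0^{\pi} w(\omega)\, \psi_i(\omega)\, \psi_j(\omega)\, d\omega \tag{23}$$

γ is the input coefficient vector, and γ̂ is the modeled parametric coefficient vector. In the VQ case, the quantized parameter vector is given by:

$$\hat{\boldsymbol{\xi}} = \arg\min_{\hat{\boldsymbol{\xi}} \in C_q(\hat{\boldsymbol{\xi}})} \sum_{m=1}^{M} \big(\boldsymbol{\gamma}_m - \hat{\boldsymbol{\gamma}}(\hat{\xi}_m)\big)^T \boldsymbol{\Psi} \big(\boldsymbol{\gamma}_m - \hat{\boldsymbol{\gamma}}(\hat{\xi}_m)\big) \tag{24}$$

D. Weighted Distortion: Piecewise Linear Parametric Representation
Again, for practical considerations assume that the parametric representation is piecewise linear, and may be represented by a set of N spectra, {R̂(ω, ξ̂_n)}, n = 0, . . . , N-1. For the piecewise linear representation, the interpolated quantized coefficient vector is:

$$\hat{\boldsymbol{\gamma}}(\hat{\xi}) = (1-\alpha)\hat{\boldsymbol{\gamma}}_{n-1} + \alpha \hat{\boldsymbol{\gamma}}_n \tag{25}$$

In the case where parameter VQ is employed, the interpolation allows for a substantial simplification of the search computations. In this case, the distortion can be interpolated:

$$D_w\big(R, \hat{R}(\hat{\xi})\big) = \big(\boldsymbol{\gamma} - (1-\alpha)\hat{\boldsymbol{\gamma}}_{n-1} - \alpha\hat{\boldsymbol{\gamma}}_n\big)^T \boldsymbol{\Psi} \big(\boldsymbol{\gamma} - (1-\alpha)\hat{\boldsymbol{\gamma}}_{n-1} - \alpha\hat{\boldsymbol{\gamma}}_n\big) \tag{26}$$

Note that no benefit is obtained here by using orthonormal functions; therefore any function representation may be used. The above can be easily generalized to the parameter VQ case. The optimal parameter that minimizes the spectrally weighted distortion between two representation vectors is given by:

$$\alpha_{opt} = \frac{(\hat{\boldsymbol{\gamma}}_n - \hat{\boldsymbol{\gamma}}_{n-1})^T \boldsymbol{\Psi} (\boldsymbol{\gamma} - \hat{\boldsymbol{\gamma}}_{n-1})}{(\hat{\boldsymbol{\gamma}}_n - \hat{\boldsymbol{\gamma}}_{n-1})^T \boldsymbol{\Psi} (\hat{\boldsymbol{\gamma}}_n - \hat{\boldsymbol{\gamma}}_{n-1})} \tag{27}$$

and the respective optimal parameter value, which is a continuous variable between zero and one, is given by equation (20). This result allows a rapid search for the best unvoicing parameter value needed to transform the coefficient vector to a scalar parameter, for encoding or for VQ design. Alternatively, in order to eliminate using the matrix Ψ, the scalar product may be redefined to incorporate the time-varying spectral weighting. The respective orthonormal basis functions then satisfy:

$$\int_0^{\pi} w(\omega)\, \psi_i(\omega)\, \psi_j(\omega)\, d\omega = \delta(i-j) \tag{28}$$

where δ(i-j) denotes the Kronecker delta. The respective parameter vector is given by:

[Equation (29) is not legible in the source text.]

where ψ(ω) = [ψ_0, ψ_1, . . . , ψ_{I-1}]^T is an I-th dimensional vector of time-varying orthonormal functions.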
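
As a rough illustration of the weighted case in sections C and D, the following Python sketch evaluates a quadratic-form distortion with a weighting matrix Ψ and the corresponding interpolation factor. It assumes the weighted distortion has the form (γ - γ̂)ᵀ Ψ (γ - γ̂) with Ψ symmetric and positive definite; the function names are hypothetical, and the clamp to [0, 1] is this sketch's own safeguard.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]

def weighted_inner(u: Sequence[float], psi: Matrix, v: Sequence[float]) -> float:
    """Compute u^T * psi * v for a dense matrix psi."""
    return sum(u[i] * psi[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def weighted_distortion(gamma: Sequence[float], gamma_hat: Sequence[float], psi: Matrix) -> float:
    """Quadratic-form distortion (gamma - gamma_hat)^T psi (gamma - gamma_hat)."""
    diff = [g - h for g, h in zip(gamma, gamma_hat)]
    return weighted_inner(diff, psi, diff)

def weighted_alpha(gamma: Sequence[float], prev: Sequence[float],
                   cur: Sequence[float], psi: Matrix) -> float:
    """Interpolation factor between prev and cur minimizing the weighted distortion."""
    step = [c - p for c, p in zip(cur, prev)]
    resid = [g - p for g, p in zip(gamma, prev)]
    denom = weighted_inner(step, psi, step)
    if denom <= 0.0:
        return 0.0                      # degenerate segment: keep the left node
    return min(1.0, max(0.0, weighted_inner(step, psi, resid) / denom))
```

With `psi = [[2, 1], [1, 2]]`, `prev = [0, 0]`, `cur = [2, 0]`, and `gamma = [1, -1]`, the weighted factor is 0.25, whereas the unweighted projection would give 0.5. The off-diagonal terms of Ψ couple the coefficients, which is why the orthonormal shortcut of section A no longer applies.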
REW Parameter Analysis-By-Synthesis VQ
This section presents the AbS VQ paradigm for the REW parameter. The first presentation is a system which quantizes the REW parameter by employing spectral based AbS. Then simplified systems, which apply AbS to the REW parameter, are presented.
A. REW Parameter Quantization by Magnitude AbS VQ
The novel Analysis-by-Synthesis (AbS) REW parameter VQ technique is illustrated in FIG. 3. An excitation vector c_ij(m) (m = 1, . . . , M) is selected from the VQ codebook and is fed through a synthesis filter to obtain a (synthesized, quantized) parameter vector ξ̂(m), which is then mapped to quantized representation coefficient vectors γ̂(ξ̂(m)). These are compared with a sequence of input representation coefficient vectors γ(m), and each error is spectrally weighted. Each spectrally weighted error is then temporally weighted, and a distortion measure is obtained. A search through all candidate excitation vectors determines an optimal choice. The synthesis filter in FIG. 3 can be viewed as a first order predictor in a feedback loop. (While shown here is an auto-regressive synthesis filter, in other arrangements a moving-average (MA) synthesis filter may be used.) By allowing the value of the predictor parameter P to change, it becomes a “switched-predictor” scheme. Switched prediction is introduced to allow for different levels of REW parameter correlation.
The scheme incorporates both spectral weighting and temporal weighting. The spectral weighting is used for the distortion between each pair of input and quantized spectra. In order to improve SEW/REW mixing, particularly in mixed voiced and unvoiced speech segments, and to increase speech crispness, especially for plosives and onsets, temporal weighting is incorporated in the AbS REW VQ. The temporal weighting is a monotonic function of the temporal gain. Two codebooks are used, and each codebook has an associated predictor coefficient, P1 and P2. The quantization target is an M-dimensional vector of REW spectra. Each REW spectrum is represented by a vector of basis function coefficients denoted by γ(m). The search for the minimal WMSE is performed over all the vectors, c_ij(m), of the two codebooks for i = 1, 2. The quantized REW function coefficients vector, γ̂(ξ̂(m)), is a function of the quantized parameter ξ̂(m), which is obtained by passing the quantized vector, c_ij(m), through the synthesis filter. The weighted distortion between each pair of input and quantized REW spectra is calculated. The total distortion is a temporally weighted sum of the M spectrally weighted distortions. Since the predictor coefficients are known, direct VQ can be used to simplify the computations. For a piecewise linear parametric REW representation, a substantial simplification of the search computations may be obtained by interpolating the distortion between the representation spectra set, as explained in sections 3.B and 3.D.
A sequence of quantized parameters, such as c(k), is formed by concatenating successive quantized vectors, such as {c_ij(m)}, m = 1, . . . , M. The quantized parameter is computed recursively by:

$$\hat{\xi}(k) = P\,\hat{\xi}(k-1) + c(k) \tag{30}$$

where k is the time index of the coded waveform.
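
A minimal Python sketch of such a recursion follows. It assumes a first-order auto-regressive synthesis filter of the form ξ̂(k) = P·ξ̂(k-1) + c(k), consistent with the description of a first order predictor in a feedback loop; the function name `synthesize_parameters` and the `previous` argument (the filter state carried over from the preceding frame) are hypothetical.

```python
from typing import List, Sequence

def synthesize_parameters(excitation: Sequence[float], predictor: float,
                          previous: float = 0.0) -> List[float]:
    """Run a first-order AR synthesis filter over one excitation vector.

    Each output is predictor * (previous output) + (current excitation sample).
    """
    outputs: List[float] = []
    state = previous
    for c in excitation:
        state = predictor * state + c
        outputs.append(state)
    return outputs
```

For example, `synthesize_parameters([1.0, 0.0, 0.0], 0.5)` yields the decaying sequence 1.0, 0.5, 0.25. A larger predictor value gives the synthesized parameter track more memory, which is the knob the switched-predictor scheme selects per codebook.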
B. Simplified REW Parameter AbS VQ
The above scheme maps each quantized parameter to a coefficient vector, which is used to compute the spectral distortion. To reduce complexity, this mapping and the spectral distortion computation may be eliminated by using the simplified scheme described below. For a high rate and a smooth representation surface R̂(ω, ξ), the total distortion is equal to the sum of the modeling distortion and the quantization distortion:

$$\sum_{m=1}^{M} D_w\big(R_m, \hat{R}(\hat{\xi}(m))\big) = \sum_{m=1}^{M} D_w\big(R_m, \hat{R}(\xi(m))\big) + \sum_{m=1}^{M} D_w\big(\hat{R}(\xi(m)), \hat{R}(\hat{\xi}(m))\big) \tag{31}$$

The quantization distortion is related to the quantized parameter by:

$$\sum_{m=1}^{M} D_w\big(\hat{R}(\xi(m)), \hat{R}(\hat{\xi}(m))\big) = \sum_{m=1}^{M} \big[\hat{\boldsymbol{\gamma}}(\xi(m)) - \hat{\boldsymbol{\gamma}}(\hat{\xi}(m))\big]^T \boldsymbol{\Psi} \big[\hat{\boldsymbol{\gamma}}(\xi(m)) - \hat{\boldsymbol{\gamma}}(\hat{\xi}(m))\big] \tag{32}$$

which, for the piecewise linear representation case, is equal to

$$\sum_{m=1}^{M} \frac{\big(\xi(m) - \hat{\xi}(m)\big)^2}{\Delta^2} (\hat{\boldsymbol{\gamma}}_n - \hat{\boldsymbol{\gamma}}_{n-1})^T \boldsymbol{\Psi} (\hat{\boldsymbol{\gamma}}_n - \hat{\boldsymbol{\gamma}}_{n-1}) \tag{33}$$

which is linearly related to the REW parameter squared quantization error, (ξ(m) - ξ̂(m))², and, therefore, justifies direct VQ of the REW parameter.
B.1. Simplified REW Parameter AbS VQ: Non-Weighted Distortion
FIG. 4 illustrates a simplified AbS VQ for the REW parametric representation. The encoder maps the REW magnitude to an unvoicing REW parameter, and then quantizes the parameter by AbS VQ. Initially, the magnitudes of the M REWs in the frame are mapped to coefficient vectors, {γ(m)}, m = 1, . . . , M. Then, for each coefficient vector, a search is performed to find the optimal representation parameter, ξ(γ), using equation (20), to form an M-dimensional parameter vector for the current frame, {ξ(γ(m))}, m = 1, . . . , M. Finally, the parameter vector is encoded by AbS VQ. The decoded spectra, {R̂(ω, ξ̂(m))}, m = 1, . . . , M, are obtained from the quantized parameter vector, {ξ̂(m)}, m = 1, . . . , M, using equation (15). This scheme allows for higher temporal, as well as spectral, REW resolution compared to the common method described in W. B. Kleijn, et al., IEEE ICASSP’95, pp. 508-511 (1995), since no downsampling is performed, and the continuous parameter is vector quantized in AbS.
B.2. Simplified REW Parameter AbS VQ: Weighted Distortion
The simplified quantization scheme is improved to incorporate spectral and temporal weightings, as illustrated in FIG. 5. The REW coefficient vector is first mapped to a REW parameter by minimizing a distortion, which is weighted by the coefficient spectral weighting matrix Ψ, as described in section 3.D. Then, the resulting REW parameter is used to compute a weighting, w_s(ξ(m)), which we choose to be the spectral sensitivity to the REW parameter squared quantization error, (ξ(m) - ξ̂(m))², given by:

$$w_s(\xi(m)) = \frac{\partial D_w\big(\hat{R}(\xi(m)), \hat{R}(\hat{\xi}(m))\big)}{\partial \big(\xi(m) - \hat{\xi}(m)\big)^2} \tag{34}$$

For the piecewise linear representation case, using equation (33), the following equation is obtained:

$$w_s(\xi(m)) = \left[\frac{\partial \hat{\boldsymbol{\gamma}}(\xi)}{\partial \xi}\right]^T \boldsymbol{\Psi} \left[\frac{\partial \hat{\boldsymbol{\gamma}}(\xi)}{\partial \xi}\right] \Bigg|_{\xi = \xi(m)} \tag{35}$$

The above derivative can be easily computed off line. Additionally, a temporal weighting, in the form of a monotonic function of the gain, denoted by w_t(g(m)), is used to give relatively large weight to waveforms with larger gain values.
`
`
`
The AbS REW parameter quantization is computed by minimizing the combined spectrally and temporally weighted distortion:

$$\{\hat{\xi}(m)\}_{m=1}^{M} = \arg\min_{i,j} \sum_{m=1}^{M} w_t(g(m))\, w_s(\xi(m))\, \big[\xi(m) - \hat{\xi}(m)\big]^2 \tag{36}$$

The weighted distortion scheme improves the reconstructed speech quality, most notably in mixed voiced and unvoiced speech segments. This may be explained by an improvement in REW/SEW mixing.
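
The weighted AbS search over switched-predictor codebooks can be sketched as below. This is a simplified illustration under several assumptions: the synthesis filter is taken to be first-order auto-regressive, the per-waveform spectral and temporal weights are passed in as precomputed lists, and the names `abs_vq_search`, `w_spectral`, `w_temporal`, and `previous` are hypothetical. It is not the coder's actual search procedure.

```python
from typing import Optional, Sequence, Tuple

def abs_vq_search(
    target: Sequence[float],
    codebooks: Sequence[Sequence[Sequence[float]]],
    predictors: Sequence[float],
    w_spectral: Sequence[float],
    w_temporal: Sequence[float],
    previous: float = 0.0,
) -> Tuple[float, int, int]:
    """Return (distortion, codebook_index, vector_index) of the best candidate.

    Every excitation vector of every codebook is passed through that
    codebook's predictor, and the weighted squared parameter error is summed.
    """
    best: Optional[Tuple[float, int, int]] = None
    for i, (book, p) in enumerate(zip(codebooks, predictors)):
        for j, excitation in enumerate(book):
            state = previous
            dist = 0.0
            for m, c in enumerate(excitation):
                state = p * state + c                      # synthesized parameter
                dist += w_temporal[m] * w_spectral[m] * (target[m] - state) ** 2
            if best is None or dist < best[0]:
                best = (dist, i, j)
    assert best is not None
    return best
```

As a usage example, with target `[0.5, 0.25]`, predictors `[0.5, 0.0]`, and unit weights, the excitation `[0.5, 0.0]` in the first codebook synthesizes the target exactly, so the search selects it with zero distortion. Raising a temporal weight for a high-gain waveform makes errors on that waveform dominate the choice.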
Dual Predictive AbS SEW Quantization
FIG. 6 illustrates a Dual Predictive SEW AbS VQ scheme which uses two observables, (a) the quantized REW, and (b) the past quantized SEW, to jointly predict the current SEW. Although we refer to the operator on each observable as a “predictor”, in fact both are components of a single optimized estimator. The SEW and the REW are complex random vectors, and their sum is a residual vector having elements whose magnitudes have a mean value of unity. In low bit-rate WI coding, the relation between the SEW and the REW magnitudes was approximated by computing the magnitude of one as the unity complement of the other. Suppose |r_M| denotes the spectral magnitude vector