US007584095B2

(12) United States Patent
Gottesman et al.

(10) Patent No.: US 7,584,095 B2
(45) Date of Patent: *Sep. 1, 2009

(54) REW PARAMETRIC VECTOR QUANTIZATION AND DUAL-PREDICTIVE SEW VECTOR QUANTIZATION FOR WAVEFORM INTERPOLATIVE CODING

(75) Inventors: Oded Gottesman, Goleta, CA (US); Allen Gersho, Goleta, CA (US)

(73) Assignee: The Regents of the University of California, Oakland, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 98 days.
This patent is subject to a terminal disclaimer.

(21) Appl. No.: 11/234,631

(22) Filed: Sep. 23, 2005

(65) Prior Publication Data
US 2006/0069554 A1, Mar. 30, 2006

Related U.S. Application Data

(62) Division of application No. 09/811,187, filed on Mar. 16, 2001, now Pat. No. 7,010,482.

(60) Provisional application No. 60/190,371, filed on Mar. 17, 2000.

(51) Int. Cl.: G10L 19/14 (2006.01)
(52) U.S. Cl.: 704/222
(58) Field of Classification Search: 704/222
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,924,061 A * 7/1999 Shoham, 704/218
6,418,408 B1 * 7/2002 Udaya Bhaskar et al., 704/219
6,493,664 B1 * 12/2002 Udaya Bhaskar et al., 704/222
6,691,092 B1 * 2/2004 Udaya Bhaskar et al., 704/265

OTHER PUBLICATIONS

Oded Gottesman et al., “Enhancing Waveform Interpolative Coding With Weighted REW Parametric Quantization,” IEEE Workshop on Speech Coding (2000), pp. 1-3.
I.S. Burnett et al., “Multi-Prototype Waveform Coding Using Frame-By-Frame Analysis-By-Synthesis,” Department of Electrical and Computer Engineering, University of Wollongong, NSW, Australia (1997), pp. 1567-1570.
I.S. Burnett et al., “New Techniques for Multi-Prototype Waveform Coding at 2.84kb/s,” Department of Electrical and Computer Engineering, University of Wollongong, NSW, Australia (1995), pp. 261
I.S. Burnett et al., “Low Complexity Decomposition and Coding of Prototype Waveforms,” Dept. of Electrical and Computer Eng., University of Wollongong, NSW, 2522, Australia, pp. 23-24.

(Continued)

Primary Examiner: Susan McFadden
(74) Attorney, Agent, or Firm: Berliner & Associates

(57) ABSTRACT

An enhanced analysis-by-synthesis waveform interpolative speech coder able to operate at 2.8 kbps. Novel features include dual-predictive analysis-by-synthesis quantization of the slowly-evolving waveform, efficient parametrization of the rapidly-evolving waveform magnitude, and analysis-by-synthesis vector quantization of the rapidly evolving waveform parameter. Subjective quality tests indicate that it exceeds the quality of G.723.1 at 5.3 kbps and of G.723.1 at 6.3 kbps.

18 Claims, 6 Drawing Sheets
`
[Representative drawing: block diagram in which a vector of REW spectra R(ω) is passed through a vector quantizer (VQ) and an inverse vector quantizer (VQ⁻¹), each with a vector quantizer codebook, to produce a vector of quantized REW spectra.]
`
`Saint Lawrence Communications, LLC
`IPR2016-00704
`Exhibit 2017
`
`

`
`
`OTHER PUBLICATIONS
`
I.S. Burnett et al., “A Mixed Prototype Waveform/CELP Coder for Sub 3KB/S,” School of Electronic and Electrical Engineering, University of Bath, UK, BA2 7AY (1993), pp. II-175-II-178.
Oded Gottesman, “Dispersion Phase Vector Quantization for Enhancement of Waveform Interpolative Coder,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-4.
Oded Gottesman et al., “Enhanced Waveform Interpolative Coding at 4 KBPS,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-3.
Oded Gottesman et al., “High Quality Enhanced Waveform Interpolative Coding at 2.8 KBPS,” IEEE International Conference on Acoustics, Speech, and Signal Processing, 2000, pp. 1-4.
Oded Gottesman et al., “Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 KBPS,” Signal Compression Laboratory, Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA, pp. 1-4.
Daniel W. Griffin et al., “Multiband Excitation Vocoder,” IEEE Transactions on Acoustics, Speech, and Signal Processing (1988) 36(8):1223-1235.
W. Bastiaan Kleijn et al., “A Speech Coder Based on Decomposition of Characteristic Waveforms,” IEEE (1995), pp. 508-511.
W. Bastiaan Kleijn et al., “Waveform Interpolation for Coding and Synthesis,” Speech Coding and Synthesis (1995), pp. 175-207.
W. Bastiaan Kleijn et al., “Transformation and Decomposition of the Speech Signal for Coding,” IEEE Signal Processing Letters 1(9):136-138 (1994).
W. Bastiaan Kleijn, “Encoding Speech Using Prototype Waveforms,” IEEE Transactions on Speech and Audio Processing 1(4):386-399 (1993).
W. Bastiaan Kleijn, “Continuous Representations in Linear Predictive Coding,” Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974 (1991), pp. 201-204.
W. Bastiaan Kleijn et al., “A Low-Complexity Waveform Interpolation Coder,” Speech Coding Research Department, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA (1996), pp. 212-215.
R.J. McAulay et al., “Sinusoidal Coding,” Speech Coding and Synthesis 4:121-173 (1995).
Yair Shoham, “High-Quality Speech Coding at 2.4 to 4.0 KBPS Based on Time-Frequency Interpolation,” IEEE, pp. II-167-II-170 (1993).
Yair Shoham, “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 KBPS,” IEEE, pp. 1599-1602 (1997).
Yair Shoham, “Low Complexity Speech Coding at 1.2 to 2.4 kbps Based on Waveform Interpolation,” International Journal of Speech Technology 2:329-341 (1999).
`* cited by examiner
`
`

`
[Drawing Sheets 1 to 6 of U.S. Patent US 7,584,095 B2, Sep. 1, 2009. The graphics are not reproduced in this text; only the legible labels are listed.]
[Sheet 1 of 6: block diagrams labeled with the REW parameter, a vector of REW spectra R(ω), and a vector quantizer.]
[Sheet 2 of 6: no legible labels.]
[Sheet 3 of 6: no legible labels.]
[Sheet 4 of 6: no legible labels.]
[Sheet 5 of 6: one chart with series "OUTPUT SEW" and "MEAN-REMOVED SEW"; one bar chart grouped as VOICED, INTERMEDIATE, UNVOICED with a "HARMONICS RANGE" legend of 9-14, 15-19, 20-24, 25-29, 30-35, 36-69.]
[Sheet 6 of 6: one bar chart grouped as VOICED, INTERMEDIATE, UNVOICED with a "HARMONICS RANGE" legend; three panels titled VOICED RANGE, INTERMEDIATE RANGE, and UNVOICED RANGE, each showing a REW PREDICTOR and a SEW PREDICTOR curve against HARMONICS.]
`
`

`
`REW PARAMETRIC VECTOR
`QUANTIZATION AND DUAL-PREDICTIVE
`SEW VECTOR QUANTIZATION FOR
`WAVEFORM INTERPOLATIVE CODING
`
`CROSS REFERENCE TO RELATED
`APPLICATION
`
This application claims the benefit of Provisional Patent Application No. 60/190,371 filed Mar. 17, 2000, which application is herein incorporated by reference. This application is a divisional of U.S. patent application Ser. No. 09/811,187, filed Mar. 16, 2001, now U.S. Pat. No. 7,010,482.
`
BACKGROUND OF THE INVENTION

The present invention relates to vector quantization (VQ) in speech coding systems using waveform interpolation.

In recent years, there has been increasing interest in achieving toll-quality speech coding at rates of 4 kbps and below. Currently, there is an ongoing 4 kbps standardization effort conducted by an international standards body (the International Telecommunications Union-Telecommunication (ITU-T) Standardization Sector). The expanding variety of emerging applications for speech coding, such as third generation wireless networks and Low Earth Orbit (LEO) systems, is motivating increased research efforts. The speech quality produced by waveform coders such as code-excited linear prediction (CELP) coders degrades rapidly at rates below 5 kbps; see B. S. Atal and M. R. Schroeder, (1984), “Stochastic Coding of Speech at Very Low Bit Rate”, Proc. Int. Conf. Comm., Amsterdam, pp. 1610-1613.

On the other hand, parametric coders, such as the waveform-interpolative (WI) coder, the sinusoidal-transform coder (STC), and the multiband-excitation (MBE) coder, produce good quality at low rates but they do not achieve toll quality; see Y. Shoham, IEEE ICASSP'93, Vol. II, pp. 167-170 (1993); I. S. Burnett and R. J. Holbeche, (1993), IEEE ICASSP'93, Vol. II, pp. 175-178; W. B. Kleijn, (1993), IEEE Trans. Speech and Audio Processing, Vol. 1, No. 4, pp. 386-399; W. B. Kleijn and J. Haagen, (1994), IEEE Signal Processing Letters, Vol. 1, No. 9, pp. 136-138; W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP'95, pp. 508-511; W. B. Kleijn and J. Haagen, (1995), in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 5, pp. 175-207; I. S. Burnett and G. J. Bradley, (1995), IEEE ICASSP'95, pp. 261-263; I. S. Burnett and G. J. Bradley, (1995), IEEE Workshop on Speech Coding for Telecommunications, pp. 23-24; I. S. Burnett and D. H. Pham, (1997), IEEE ICASSP'97, pp. 1567-1570; W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341; R. J. McAulay and T. F. Quatieri, (1995), in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 4, pp. 121-173; and D. Griffin and J. S. Lim, (1988), IEEE Trans. ASSP, Vol. 36, No. 8, pp. 1223-1235. This is largely due to the lack of robustness of speech parameter estimation, which is commonly done in open-loop, and to inadequate modeling of non-stationary speech segments.

Commonly in WI coding, the similarity between successive rapidly evolving waveform (REW) magnitudes is exploited by downsampling and interpolation and by constrained bit allocation; see W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP'95, pp. 508-511. In a previous Enhanced Waveform Interpolative (EWI) coder the REW magnitude was quantized on a waveform-by-waveform basis; see O. Gottesman and A. Gersho, (1999), “Enhanced Waveform Interpolative Coding at 4 kbps”, IEEE Speech Coding Workshop, pp. 90-92, Finland; O. Gottesman and A. Gersho, (1999), “Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 kbps”, EUROSPEECH'99, pp. 1443-1446, Hungary.

SUMMARY OF THE INVENTION

The present invention describes novel methods that enhance the performance of the WI coder and allow for better coding efficiency, improving on the above 1999 Gottesman and Gersho procedure. The present invention incorporates analysis-by-synthesis (AbS) for parameter estimation, offers higher temporal and spectral resolution for the REW, and provides more efficient quantization of the slowly-evolving waveform (SEW). In particular, the present invention proposes a novel efficient parametric representation of the REW magnitude, an efficient paradigm for AbS predictive VQ of the REW parameter sequence, and dual-predictive AbS quantization of the SEW.

More particularly, the invention provides a method for interpolative coding of input signals, the signals decomposed into or composed of a slowly evolving waveform and a rapidly evolving waveform having a magnitude, the method incorporating at least one, and preferably a combination, of the following steps, or all of the steps:
(a) AbS VQ of the REW;
(b) parametrizing the magnitude of the REW;
(c) incorporating temporal weighting in the AbS VQ of the REW;
(d) incorporating spectral weighting in the AbS VQ of the REW;
(e) applying a filter to a vector quantizer codebook in the analysis-by-synthesis vector-quantization of the rapidly evolving waveform whereby to add self correlation to the codebook vectors; and
(f) using a coder in which a plurality of bits therein are allocated to the rapidly evolving waveform magnitude.

In addition, one can combine AbS quantization of the slowly evolving waveform with any or all of the foregoing parameters.

The new method achieves a substantial reduction in the REW bit rate, and the EWI achieves very close to toll quality, at least under clean speech conditions. These and other features, aspects, and advantages of the present invention will become better understood with regard to the following detailed description, appended claims, and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a REW Parametric Representation;
FIG. 2 is a REW Parametric VQ;
FIG. 3 is a REW Parametric Representation AbS VQ;
FIG. 4 is a REW Parametric Representation Simplified AbS VQ;
FIG. 5 is a REW Parametric Representation Simplified Weighted AbS VQ;
FIG. 6 is a block diagram of the Dual Predictive AbS SEW vector quantization;
FIG. 7 is a Weighted Signal-to-Noise Ratio (SNR) for Dual Predictive AbS SEW VQ;
FIG. 8 is an output Weighted SNR for the 18 codebooks, 9-bit AbS SEW VQ;
FIG. 9 is a mean-removed SEW's Weighted SNR for the 18 codebooks, 9-bit AbS SEW VQ; and
FIG. 10 shows predictors for three REW parameter ranges.
`
DETAILED DESCRIPTION

In very low bit rate WI coding, the relation between the SEW and the REW magnitudes was exploited by computing the magnitude of one as the unity complement of the other; see W. B. Kleijn and J. Haagen, (1995), “A Speech Coder Based on Decomposition of Characteristic Waveforms”, IEEE ICASSP'95, pp. 508-511; W. B. Kleijn and J. Haagen, (1995), “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 5, pp. 175-207; I. S. Burnett and G. J. Bradley, (1995), “New Techniques for Multi-Prototype Waveform Coding at 2.84 kb/s”, IEEE ICASSP'95, pp. 261-263; I. S. Burnett and G. J. Bradley, (1995), “Low Complexity Decomposition and Coding of Prototype Waveforms”, IEEE Workshop on Speech Coding for Telecommunications, pp. 23-24; I. S. Burnett and D. H. Pham, (1997), “Multi-Prototype Waveform Coding using Frame-by-Frame Analysis-by-Synthesis”, IEEE ICASSP'97, pp. 1567-1570; W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), “A Low-Complexity Waveform Interpolation Coder”, IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps”, IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), “Low Complexity Speech Coding at 1.2 to 2.4 kbps Based on Waveform Interpolation”, International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341.

Also, since the sequence of SEW magnitudes evolves slowly, successive SEWs exhibit similarity, offering opportunities for redundancy removal. Additional forms of redundancy that may be exploited for coding efficiency are: (a) for a fixed SEW/REW decomposition filter, the mean SEW magnitude increases with the pitch period, and (b) the similarity between successive SEWs also increases with the pitch period. In this work we introduce a novel “dual-predictive” AbS paradigm for quantizing the SEW magnitude that optimally exploits the information about the current quantized REW, the past quantized SEW, and the pitch, in order to predict the current SEW.

Introduction to REW Quantization

The REW represents the rapidly changing unvoiced attribute of speech. Commonly in WI systems, the REW is quantized on a waveform-by-waveform basis. Hence, for low rate WI systems having a long frame size and a large number of waveforms per frame, the relative bitrate required for the REW becomes excessive. For example, consider a potential 2 kbps system which uses a 240 sample frame, 12 waveforms per frame, and which quantizes the REW by alternating bit allocation of 3 bits and 1 bit per waveform. The REW bitrate is then 6 × 3 + 6 × 1 = 24 bits per frame, or 800 bps (which implies a 30 ms frame), which is 40% of the total bitrate. This example demonstrates the need for a more efficient REW quantization.

Efficient REW quantization can benefit from two observations: (1) the REW magnitude is typically an increasing function of the frequency, which suggests that an efficient parametric representation may be used; (2) one can observe a similarity between successive REW magnitude spectra, which may suggest a potential gain by employing predictive VQ on a group of adjacent REWs. The next two sections propose a REW parametric representation and its respective VQ.

REW Parametric Representation

Direct quantization of the REW magnitude is a variable dimension quantization problem, which may result in spending bits and computational effort on perceptually irrelevant information. A simple and practical way to obtain a reduced, and fixed, dimension representation of the REW is with a linear combination of basis functions, such as orthonormal polynomials; see W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341. Such a representation usually produces a smoother REW magnitude, and improves the perceptual quality. Suppose the REW magnitude, $R(\omega)$, is represented by a linear combination of orthonormal functions, $\psi_i(\omega)$:

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i \psi_i(\omega), \quad 0 \le \omega \le \pi \tag{1}$$

where $\omega$ is the angular frequency, and $I$ is the representation order. The REW magnitude is typically an increasing function of frequency, which can be coarsely quantized with a low number of bits per waveform without significant perceptual degradation. Therefore, it may be advantageous to represent the REW magnitude in a simple, but perceptually relevant manner. Consequently we model the REW by the following parametric representation, $\hat{R}(\omega, \xi)$:

$$\hat{R}(\omega, \xi) = \sum_{i=0}^{I-1} \hat{\gamma}_i(\xi) \psi_i(\omega), \quad 0 \le \omega \le \pi;\ 0 \le \xi \le 1 \tag{2}$$

where $\hat{\gamma}(\xi) = [\hat{\gamma}_0(\xi), \ldots, \hat{\gamma}_{I-1}(\xi)]^T$ is a parametric vector of coefficients within the representation model subspace, and $\xi$ is the “unvoicing” parameter, which is zero for a fully voiced spectrum, and one for a fully unvoiced spectrum. Thus $\hat{R}(\omega, \xi)$ defines a two-dimensional surface whose cross sections for each value of $\xi$ give a particular REW magnitude spectrum, which is defined merely by specifying a scalar parameter value.

A simple and practical way for parametric representation of the REW is, for example, by a parametric linear combination of basis functions, such as polynomials with parametric coefficients, namely:

[equation not legible in the source]

For practical considerations assume that the parametric representation is a piecewise linear function of $\xi$, and may therefore be represented by a set of N uniformly spaced spectra, as illustrated in FIG. 1.

REW Parametric Vector Quantization

One can observe the similarity between successive REW magnitude spectra, which may suggest a potential gain by VQ of a set of successive REWs. FIG. 2 illustrates a simple parametric VQ system for a vector of REW spectra. The input is an M dimensional vector of REW magnitude spectra,
`

`
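and the VQ output is an index, j, which determines a quantized parameter vector, $\hat{\xi}$:

$$\hat{\xi} = [\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_M]^T \tag{5}$$

which parametrically determines a vector of quantized spectra:

$$\hat{R}(\omega) = \hat{R}(\omega, \hat{\xi}) = [\hat{R}(\omega, \hat{\xi}_1), \hat{R}(\omega, \hat{\xi}_2), \ldots, \hat{R}(\omega, \hat{\xi}_M)]^T \tag{6}$$

The encoder searches, in the parameter codebook $C_q(\xi)$, for the parameter vector which minimizes the distortion:

$$\hat{\xi} = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} D\big(R_m, \hat{R}(\xi_m)\big) = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} \int_0^{\pi} \big|R_m(\omega) - \hat{R}(\omega, \xi_m)\big|^2 \, d\omega \tag{7}$$

For example, suppose the input REW magnitude is represented by an I-th dimensional vector of function coefficients, $\gamma$, given by:

$$\gamma = [\gamma_0, \gamma_1, \ldots, \gamma_{I-1}]^T \tag{8}$$

For a set of M input REWs, each of which is represented by a vector of polynomial coefficients, $\gamma_m$, which form an $I \times M$ input coefficient matrix, $\Gamma$:

$$\Gamma = [\gamma_1, \gamma_2, \ldots, \gamma_M] \tag{9}$$

The inverse VQ output is a vector of M quantized REWs, which form the quantized function coefficient matrix:

$$\hat{\Gamma}(\hat{\xi}) = [\hat{\gamma}(\hat{\xi}_1), \hat{\gamma}(\hat{\xi}_2), \ldots, \hat{\gamma}(\hat{\xi}_M)] \tag{10}$$

which is used by the decoder to compute the quantized spectra.

A. Quantization Using Orthonormal Functions

Orthonormal functions, such as polynomials, may be used for efficient quantization of the REW; see W. B. Kleijn, et al., (1996), IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329-341. Consider REW magnitude, $R(\omega)$, represented by a linear combination of orthonormal functions, $\psi_i(\omega)$:

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i \psi_i(\omega), \quad 0 \le \omega \le \pi \tag{11}$$

which is modeled using the parametric representation:

$$\hat{R}(\omega, \xi) = \sum_{i=0}^{I-1} \hat{\gamma}_i(\xi) \psi_i(\omega), \quad 0 \le \omega \le \pi;\ 0 \le \xi \le 1 \tag{12}$$

The quantized REW parameter is then given by:

$$\hat{\xi}(R) = \arg\min_{0 \le \xi \le 1} \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega, \xi)\big|^2 \, d\omega = \arg\min_{0 \le \xi \le 1} \big\|\gamma - \hat{\gamma}(\xi)\big\|^2 \tag{13}$$

In the VQ case, the quantized parameter vector is given by:

$$\hat{\xi} = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} \big\|\gamma_m - \hat{\gamma}(\xi_m)\big\|^2 \tag{14}$$

B. Piecewise Linear Parametric Representation

In order to have a simple representation that is computationally efficient and avoids excessive memory requirements, we model the two dimensional surface by a piecewise linear parametric representation. Therefore, we introduce a set of N uniformly spaced spectra, $\{\hat{R}(\omega, \hat{\xi}_n)\}_{n=0}^{N-1}$. Then the parametric surface is defined by linear interpolation according to:

$$\hat{R}(\omega, \xi) = (1 - \alpha)\hat{R}(\omega, \hat{\xi}_{n-1}) + \alpha \hat{R}(\omega, \hat{\xi}_n), \quad \hat{\xi}_{n-1} \le \xi \le \hat{\xi}_n, \quad \alpha = \frac{\xi - \hat{\xi}_{n-1}}{\Delta}, \quad \Delta = \hat{\xi}_n - \hat{\xi}_{n-1} \tag{15}$$

Because this representation is linear, the coefficients of $\hat{R}(\omega, \xi)$ are linear combinations of the coefficients of $\hat{R}(\omega, \hat{\xi}_{n-1})$ and $\hat{R}(\omega, \hat{\xi}_n)$. Hence,

$$\hat{\gamma}(\xi) = (1 - \alpha)\hat{\gamma}_{n-1} + \alpha \hat{\gamma}_n \tag{16}$$

where $\hat{\gamma}_n$ is the coefficient vector of the n-th REW magnitude function representation:

$$\hat{\gamma}_n = \hat{\gamma}(\hat{\xi}_n) \tag{17}$$

In this case, the distortion may be interpolated by:

$$\big\|\gamma - \hat{\gamma}(\xi)\big\|^2 = (1 - \alpha)^2 \big\|\gamma - \hat{\gamma}_{n-1}\big\|^2 + \alpha^2 \big\|\gamma - \hat{\gamma}_n\big\|^2 + 2\alpha(1 - \alpha)(\gamma - \hat{\gamma}_{n-1})^T(\gamma - \hat{\gamma}_n) \tag{18}$$

The above can be easily generalized to the parameter VQ case. The optimal interpolation factor that minimizes the distortion between two representation vectors is given by:

$$\alpha_{opt} = \frac{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T (\gamma - \hat{\gamma}_{n-1})}{\big\|\hat{\gamma}_n - \hat{\gamma}_{n-1}\big\|^2} \tag{19}$$

and the respective optimal parameter value, which is a continuous variable between zero and one, is given by:

$$\xi(\gamma) = \hat{\xi}_{n-1} + \alpha_{opt} \Delta \tag{20}$$

This result allows a rapid search for the best unvoicing parameter value needed to transform the coefficient vector to a scalar parameter, followed by the corresponding quantization scheme, as described in the section 4.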
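The mapping from a coefficient vector to a scalar unvoicing parameter described above can be sketched as a projection onto each segment of the piecewise linear path, keeping the segment with the smallest squared error. The following is only an illustrative sketch, not the patent's implementation: the function name `map_to_unvoicing`, the uniform spacing of the representation vectors over [0, 1], and the clamping of the interpolation factor to [0, 1] are assumptions made here.

```python
def dot(a, b):
    """Inner product of two equal-length coefficient vectors."""
    return sum(x * y for x, y in zip(a, b))


def map_to_unvoicing(gamma, rep_vectors):
    """Return (xi, distortion) for the closest point on the piecewise linear path.

    gamma       -- input coefficient vector
    rep_vectors -- N representation coefficient vectors, assumed (hypothetically)
                   to sit at uniformly spaced parameter values n / (N - 1)
    """
    n_seg = len(rep_vectors) - 1
    delta = 1.0 / n_seg  # spacing between adjacent parameter values
    best = None
    for n in range(1, len(rep_vectors)):
        prev, cur = rep_vectors[n - 1], rep_vectors[n]
        step = [c - p for c, p in zip(cur, prev)]
        resid = [g - p for g, p in zip(gamma, prev)]
        denom = dot(step, step)
        # Least-squares interpolation factor on this segment
        alpha = dot(step, resid) / denom if denom > 0 else 0.0
        # Assumption: keep the point inside the segment
        alpha = min(1.0, max(0.0, alpha))
        err = [r - alpha * s for r, s in zip(resid, step)]
        dist = dot(err, err)
        xi = (n - 1 + alpha) * delta
        if best is None or dist < best[1]:
            best = (xi, dist)
    return best


if __name__ == "__main__":
    # Three hypothetical 2-dimensional representation vectors (N = 3)
    reps = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
    print(map_to_unvoicing([1.0, 0.5], reps))
```

Because each segment needs only two inner products, the search cost grows linearly with N, which is one way to read the "rapid search" remark in the text.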
`
`

`
C. Weighted Distortion Quantization

Commonly in speech coding, the magnitude is quantized using a weighted distortion measure. In this case the quantized REW parameter is given by:

$$\hat{\xi}(R) = \arg\min_{0 \le \xi \le 1} \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega, \xi)\big|^2 W(\omega) \, d\omega \tag{21}$$

and the orthonormal function simplification, given in equation (13), cannot be used. In this case, the weighted distortion between the input and the parametric representation modeled spectra is equal to:

$$D_w\big(R, \hat{R}(\xi)\big) = \int_0^{\pi} \big|R(\omega) - \hat{R}(\omega, \xi)\big|^2 W(\omega) \, d\omega = \big(\gamma - \hat{\gamma}(\xi)\big)^T \Psi(W(\omega)) \big(\gamma - \hat{\gamma}(\xi)\big) \tag{22}$$

where $\Psi(W(\omega))$ is the weighted correlation matrix of the orthonormal functions, whose elements are:

$$\Psi_{ij}(W(\omega)) = \int_0^{\pi} W(\omega) \psi_i(\omega) \psi_j(\omega) \, d\omega \tag{23}$$

$\gamma$ is the input coefficient vector, and $\hat{\gamma}(\xi)$ is the modeled parametric coefficient vector. In the VQ case, the quantized parameter vector is given by:

$$\hat{\xi} = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} D_w\big(R_m, \hat{R}(\xi_m)\big) = \arg\min_{\xi \in C_q(\xi)} \sum_{m=1}^{M} \big(\gamma_m - \hat{\gamma}(\xi_m)\big)^T \Psi(W_m(\omega)) \big(\gamma_m - \hat{\gamma}(\xi_m)\big) \tag{24}$$

D. Weighted Distortion: Piecewise Linear Parametric Representation

Again, for practical considerations assume that the parametric representation is piecewise linear, and may be represented by a set of N spectra, $\{\hat{R}(\omega, \hat{\xi}_n)\}_{n=0}^{N-1}$. For the piecewise linear representation, the interpolated quantized coefficient vector is:

$$\hat{\gamma}(\xi) = (1 - \alpha)\hat{\gamma}_{n-1} + \alpha \hat{\gamma}_n \tag{25}$$

In the case where parameter VQ is employed, the interpolation allows for a substantial simplification of the search computations. In this case, the distortion can be interpolated:

$$D_w = (1 - \alpha)^2 (\gamma - \hat{\gamma}_{n-1})^T \Psi (\gamma - \hat{\gamma}_{n-1}) + \alpha^2 (\gamma - \hat{\gamma}_n)^T \Psi (\gamma - \hat{\gamma}_n) + 2\alpha(1 - \alpha)(\gamma - \hat{\gamma}_{n-1})^T \Psi (\gamma - \hat{\gamma}_n) \tag{26}$$

The above can be easily generalized to the parameter VQ case. The optimal parameter that minimizes the spectrally weighted distortion between two representation vectors is given by:

$$\alpha_{opt} = \frac{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T \Psi (\gamma - \hat{\gamma}_{n-1})}{(\hat{\gamma}_n - \hat{\gamma}_{n-1})^T \Psi (\hat{\gamma}_n - \hat{\gamma}_{n-1})} \tag{27}$$

and the respective optimal parameter value, which is a continuous variable between zero and one, is given by equation (20). This result allows a rapid search for the best unvoicing parameter value needed to transform the coefficient vector to a scalar parameter, for encoding or for VQ design. Alternatively, in order to eliminate using the matrix $\Psi$, the scalar product may be redefined to incorporate the time-varying spectral weighting. The respective orthonormal basis functions then satisfy:

$$\int_0^{\pi} W(\omega) \psi_i(\omega) \psi_j(\omega) \, d\omega = \delta(i - j) \tag{28}$$

where $\delta(i - j)$ denotes the Kronecker delta. The respective parameter vector is given by:

$$\gamma = \int_0^{\pi} W(\omega) R(\omega) \psi(\omega) \, d\omega \tag{29}$$

where $\psi(\omega) = [\psi_0, \psi_1, \ldots, \psi_{I-1}]^T$ is an I-th dimensional vector of time-varying orthonormal functions.
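As a rough illustration of the weighted case, the sketch below builds a weighted correlation matrix by numerical integration and uses it to compute a weighted interpolation factor between two representation vectors. Everything specific here is an assumption for illustration: the cosine basis (which is orthonormal on [0, π] but is not stated in the source), the midpoint-rule integration, and the function names `weighted_correlation` and `weighted_alpha`.

```python
import math


def cosine_basis(i, w):
    """Hypothetical orthonormal basis on [0, pi]; the source does not fix a basis."""
    if i == 0:
        return math.sqrt(1.0 / math.pi)
    return math.sqrt(2.0 / math.pi) * math.cos(i * w)


def weighted_correlation(weight, order, grid=512):
    """Approximate Psi[i][j] = integral over [0, pi] of W(w) psi_i(w) psi_j(w) dw
    with the midpoint rule on `grid` points."""
    h = math.pi / grid
    psi = [[0.0] * order for _ in range(order)]
    for k in range(grid):
        w = (k + 0.5) * h
        basis = [cosine_basis(i, w) for i in range(order)]
        scaled = weight(w) * h
        for i in range(order):
            for j in range(order):
                psi[i][j] += scaled * basis[i] * basis[j]
    return psi


def quad_form(u, psi, v):
    """Compute u^T Psi v."""
    return sum(u[i] * psi[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def weighted_alpha(gamma, prev, cur, psi):
    """Interpolation factor minimizing the Psi-weighted error between gamma and
    the line through the representation vectors prev and cur."""
    step = [c - p for c, p in zip(cur, prev)]
    resid = [g - p for g, p in zip(gamma, prev)]
    return quad_form(step, psi, resid) / quad_form(step, psi, step)


if __name__ == "__main__":
    # A made-up weighting that emphasizes high frequencies
    psi = weighted_correlation(lambda w: 1.0 + w, order=3)
    print(weighted_alpha([1.0, 0.5, 0.0], [0.0, 0.0, 0.0], [2.0, 0.0, 0.0], psi))
```

With a flat weighting the matrix reduces to the identity and the factor matches the unweighted projection, which is a convenient sanity check on any chosen basis.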
REW Parameter Analysis-By-Synthesis VQ

This section presents the AbS VQ paradigm for the REW parameter. The first presentation is a system which quantizes the REW parameter by employing spectral based AbS. Then simplified systems, which apply AbS to the REW parameter, are presented.

A. REW Parameter Quantization by Magnitude AbS VQ

The novel Analysis-by-Synthesis (AbS) REW parameter VQ technique is illustrated in FIG. 3. An excitation vector $c_{ij}(m)$ $(m = 1, \ldots, M)$ is selected from the VQ codebook and is fed through a synthesis filter to obtain a parameter vector $\hat{\xi}(m)$ (synthesized quantized), which is then mapped to quantized representation coefficient vectors $\hat{\gamma}(\hat{\xi}(m))$. These are compared with a sequence of input representation coefficient vectors $\gamma(m)$, and each error is spectrally weighted. Each spectrally weighted error is then temporally weighted, and a distortion measure is obtained. A search through all candidate excitation vectors determines an optimal choice. The synthesis filter in FIG. 3 can be viewed as a first order predictor in a feedback loop. (While shown here is an auto-regressive synthesis filter, in other arrangements a moving-average (MA) synthesis filter may be used.) By allowing the value of the predictor parameter P to change, it becomes a “switched-predictor” scheme. Switched-prediction is introduced to allow for different levels of REW parameter correlation.

The scheme incorporates both spectral weighting and temporal weighting. The spectral weighting is used for the distortion between each pair of input and quantized spectra. In order to improve SEW/REW mixing, particularly in mixed voiced and unvoiced speech segments, and to increase speech crispness, especially for plosives and onsets, temporal weighting is incorporated in the AbS REW VQ. The temporal weighting is a monotonic function of the temporal gain. Two codebooks are used, and each codebook has an associated predictor coefficient, $P_1$ and $P_2$. The quantization target is an M-dimensional vector of REW spectra. Each REW spectrum is represented by a vector of basis function coefficients denoted by $\gamma(m)$. The search for the minimal WMSE is performed over all the vectors, $c_{ij}(m)$, of the two codebooks for $i = 1, 2$. The quantized REW function coefficient vector, $\hat{\gamma}(\hat{\xi}(m))$, is a function of the quantized parameter $\hat{\xi}(m)$, which is obtained by passing the quantized vector, $c_{ij}(m)$, through the synthesis filter. The weighted distortion between each pair of input and quantized REW spectra is calculated. The total distortion is a temporally-weighted sum of the M spectrally weighted distortions. Since the predictor coefficients are known, direct VQ can be used to simplify the computations. For a piecewise linear parametric REW representation, a substantial simplification of the search computations may be obtained by interpolating the distortion between the representation spectra set, as explained in sections 3B and 3D.

A sequence of quantized parameters, such as $\hat{\xi}(k)$, is formed by concatenating successive quantized vectors, such as $\{\hat{\xi}(m)\}_{m=1}^{M}$. The quantized parameter is computed recursively by:

$$\hat{\xi}(k) = P(k)\hat{\xi}(k-1) + c(k) \tag{30}$$

where k is the time index of the coded waveform.
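The switched-predictor search described in this section can be sketched as a loop over codebooks and codevectors, where each candidate is passed through the first-order recursion before its distortion is measured. This is a minimal sketch under stated assumptions: the names `synthesize` and `abs_search` are hypothetical, the per-waveform distortion is supplied by the caller, and the demo substitutes a plain squared parameter error for the spectrally weighted coefficient distortion of the full scheme.

```python
def synthesize(codevector, predictor, prev_state):
    """First-order auto-regressive synthesis: state = P * state + c for each entry."""
    out, state = [], prev_state
    for c in codevector:
        state = predictor * state + c
        out.append(state)
    return out


def abs_search(targets, codebooks, predictors, prev_state, distortion, temporal_weights):
    """Analysis-by-synthesis search over switched-predictor codebooks.

    codebooks[i]  -- list of M-dimensional excitation vectors for predictor i
    predictors[i] -- predictor coefficient paired with codebooks[i]
    distortion    -- callable(target, synthesized) -> per-waveform distortion
    Returns (total_distortion, codebook_index, vector_index, synthesized_sequence).
    """
    best = None
    for i, (book, p) in enumerate(zip(codebooks, predictors)):
        for j, cv in enumerate(book):
            xi_hat = synthesize(cv, p, prev_state)
            total = sum(
                wt * distortion(t, x)
                for wt, t, x in zip(temporal_weights, targets, xi_hat)
            )
            if best is None or total < best[0]:
                best = (total, i, j, xi_hat)
    return best


if __name__ == "__main__":
    # Toy data: two tiny codebooks, M = 2 waveforms per vector (all values made up)
    books = [[[0.5, 0.0], [0.1, 0.1]], [[0.2, 0.2]]]
    result = abs_search(
        targets=[0.7, 0.35],
        codebooks=books,
        predictors=[0.5, 0.9],
        prev_state=0.4,
        distortion=lambda t, x: (t - x) ** 2,  # stand-in for the spectral distortion
        temporal_weights=[1.0, 1.0],
    )
    print(result)
```

The decoder would run the same `synthesize` recursion from the transmitted indices, so encoder and decoder states stay aligned as long as both start from the same previous value.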
B. Simplified REW Parameter AbS VQ

The above scheme maps each quantized parameter to a coefficient vector, which is used to compute the spectral distortion. To reduce complexity, such mapping and spectral distortion computation may be eliminated by using the simplified scheme described below. For a high rate, and a smooth representation surface $\hat{R}(\omega, \xi)$, the total distortion is equal to the sum of modeling distortion and quantization distortion:

(31) [equation not legible in the source]

The quantization distortion is related to the quantized parameter by:

(32) [equation not legible in the source]

which, for the piecewise linear representation case, is equal to

(33) [equation not legible in the source]

which is linearly related to the REW parameter squared quantization error, $(\xi(m) - \hat{\xi}(m))^2$, and, therefore, justifies direct VQ of the REW parameter.

B.1. Simplified REW Parameter AbS VQ: Non-Weighted Distortion

FIG. 4 illustrates a simplified AbS VQ for the REW parametric representation. The encoder maps the REW magnitude to an unvoicing REW parameter, and then quantizes the parameter by AbS VQ. Initially, the magnitudes of the M REWs in the frame are mapped to coefficient vectors, $\{\gamma(m)\}_{m=1}^{M}$. Then, for each coefficient vector, a search is performed to find the optimal representation parameter, $\xi(\gamma)$, using equation (20), to form an M-dimensional parameter vector for the current frame, $\{\xi(\gamma(m))\}_{m=1}^{M}$. Finally, the parameter vector is encoded by AbS VQ. The decoded spectra, $\{\hat{R}(\omega, \hat{\xi}(m))\}_{m=1}^{M}$, are obtained from the quantized parameter vector, $\{\hat{\xi}(m)\}_{m=1}^{M}$, using equation (15). This scheme allows for higher temporal, as well as spectral, REW resolution compared to the common method described in W. B. Kleijn, et al., IEEE ICASSP'95, pp. 508-511 (1995), since no downsampling is performed, and the continuous parameter is vector quantized in AbS.

B.2. Simplified REW Parameter AbS VQ: Weighted Distortion

The simplified quantization scheme is improved to incorporate spectral and temporal weightings, as illustrated in FIG. 5. The REW coefficient vector is first mapped to a REW parameter by minimizing a distortion, which is weighted by the coefficient spectral weighting matrix $\Psi$, as described in section 3.D. Then, the resulting REW parameter is used to compute a weighting, $w_s(\xi(m))$, which we choose to be the spectral sensitivity to the REW parameter squared quantization error, $(\xi(m) - \hat{\xi}(m))^2$, given by:

(34) [equation not legible in the source]

For the piecewise linear representation case, using equation (33), the following equation is obtained:

(35) [equation not legible in the source]

The above derivative can be easily computed off line. Additionally, a temporal weighting, in the form of a monotonic function of the gain, denoted by $w_t(g(m))$, is used to give relatively large weight to waveforms with larger gain values. The AbS REW parameter quantization is computed by minimizing the combined spectrally and temporally weighted distortion:

$$D = \sum_{m=1}^{M} w_t(g(m)) \, w_s(\xi(m)) \, \big(\xi(m) - \hat{\xi}(m)\big)^2 \tag{36}$$

The weighted distortion scheme improves the reconstructed speech quality, most notably in mixed voiced and unvoiced speech segments. This may be explained by an improvement in REW/SEW mixing.
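The combined spectrally and temporally weighted parameter distortion can be illustrated with a small sketch. The spectral sensitivity used here is one plausible reading for a piecewise linear representation, namely the weighted squared length of each segment divided by the squared parameter spacing; that formula, the uniform parameter spacing, and the names `segment_sensitivities`, `spectral_weight`, and `weighted_param_distortion` are assumptions made for illustration and are not taken from the source.

```python
def segment_sensitivities(rep_vectors, psi):
    """Per-segment sensitivity of the weighted spectral distortion to the squared
    parameter error, assuming uniformly spaced parameter values on [0, 1]."""
    n_seg = len(rep_vectors) - 1
    delta = 1.0 / n_seg
    sens = []
    for n in range(1, len(rep_vectors)):
        step = [c - p for c, p in zip(rep_vectors[n], rep_vectors[n - 1])]
        q = sum(
            step[i] * psi[i][j] * step[j]
            for i in range(len(step))
            for j in range(len(step))
        )
        sens.append(q / delta ** 2)
    return sens


def spectral_weight(xi, sens):
    """Look up the sensitivity of the segment that contains parameter value xi."""
    idx = min(int(xi * len(sens)), len(sens) - 1)
    return sens[idx]


def weighted_param_distortion(xi, xi_hat, gains, sens, temporal_weight):
    """Sum over waveforms of w_t(gain) * w_s(xi) * (xi - xi_hat)^2."""
    return sum(
        temporal_weight(g) * spectral_weight(x, sens) * (x - xh) ** 2
        for x, xh, g in zip(xi, xi_hat, gains)
    )


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]          # flat spectral weighting
    reps = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]  # made-up representation vectors
    sens = segment_sensitivities(reps, identity)
    d = weighted_param_distortion(
        xi=[0.25, 0.75], xi_hat=[0.0, 0.5], gains=[1.0, 2.0],
        sens=sens, temporal_weight=lambda g: g,  # any monotonic function of gain
    )
    print(sens, d)
```

Since the sensitivities depend only on the representation vectors and the weighting, they can be tabulated ahead of time, which is consistent with the remark that the derivative can be computed off line.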
`
`

`
Dual Predictive AbS SEW Quantization

FIG. 6 illustrates a Dual Predictive SEW AbS VQ scheme which uses two observables, (a) the quantized REW, and (b) the past quantized SEW, to jointly predict the current SEW. Although we refer to the operator on each observable as a “predictor”, in fact both are components of a single optimized estimator. The SEW and the REW are complex random vectors, and their sum is a residual vector having elements whose magnitudes have a mean value of unity. In low bit-rate W
