US 7,643,996 B1

(12) United States Patent
Gottesman
(10) Patent No.: US 7,643,996 B1
(45) Date of Patent: Jan. 5, 2010

(54) ENHANCED WAVEFORM INTERPOLATIVE CODER

(75) Inventor: Oded Gottesman, Goleta, CA (US)

(73) Assignee: The Regents of the University of California, Oakland, CA (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/831,843

(22) PCT Filed: Dec. 1, 1999

(86) PCT No.: PCT/US99/28449

§ 371 (c)(1), (2), (4) Date: Aug. 13, 2001

(87) PCT Pub. No.: WO00/33297

PCT Pub. Date: Jun. 8, 2000

Related U.S. Application Data

(60) Provisional application No. 60/110,522, filed on Dec. 1, 1998, provisional application No. 60/110,641, filed on Dec. 1, 1998.

(51) Int. Cl.
G10L 13/04 (2006.01)

(52) U.S. Cl. 704/265; 704/219; 704/230; 704/220; 704/205; 704/223

(58) Field of Classification Search 704/205, 704/207, 219, 230, 220, 222, 225, 265, 223
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

4,653,098 A * 3/1987 Nakata et al. 704/207
5,086,471 A * 2/1992 Tanaka et al. 704/222
5,517,595 A * 5/1996 Kleijn 704/205
6,418,408 B1 * 7/2002 Udaya Bhaskar et al. 704/219
6,493,664 B1 * 12/2002 Udaya Bhaskar et al. 704/222

* cited by examiner

Primary Examiner: Vijay B Chawan
(74) Attorney, Agent, or Firm: Berliner & Associates
`
(57) ABSTRACT

An enhanced analysis-by-synthesis waveform interpolative speech coder able to operate at 4 kbps. Novel features include analysis-by-synthesis quantization of the slowly evolving waveform, analysis-by-synthesis vector quantization of the dispersion phase, a special pitch search for transitions, and switched-predictive analysis-by-synthesis gain vector quantization. Subjective quality tests indicate that its quality exceeds that of MPEG-4 at 4 kbps and of G.723.1 at 6.3 kbps.

34 Claims, 4 Drawing Sheets
`
[Representative drawing: block diagram with blocks labeled LPC analysis, LPC interpolation, waveform extraction + alignment + decomposition, pitch extraction, waveform synthesizer, waveform interpolation + lookahead extrapolation, and waveform codebooks; inputs labeled speech and residual.]
`
`
`
IPR2017-01075
Saint Lawrence Communications
Exhibit 2016
[Drawing Sheet 1 of 4. FIG. 1: block diagram of the AbS SEW vector quantization.]
[Drawing Sheet 2 of 4.
FIG. 2: amplitude-time plots (amplitude ×10^4) for a non-stationary speech segment, with panels including the ORIGINAL waveform.
FIG. 3: block diagram of the AbS dispersion phase vector quantization, with blocks labeled pitch-cycle waveform's DFT, crude linear phase alignment, refined linear phase alignment, magnitude codebook, phase codebook, and pitch.]
[Drawing Sheet 3 of 4.
FIG. 4: segmental weighted SNR (dB) versus phase bits, for MIRS and non-MIRS (flat) speech.
FIG. 5: subjective scores (axis 0% to 50%) of the A/B test of the phase VQ, including a FEMALE category.]
[Drawing Sheet 4 of 4.
FIG. 6: block diagram of the pitch search, with blocks labeled speech, spectral domain pitch search + tracker (100 Hz), weighted speech, temporal domain pitch search (500 Hz), a "good pitches?" decision with a NO branch, temporal domain pitch refinement, use 4 ms weighted-average pitch, and waveform pitch length.
FIG. 7: block diagram of the switched-predictive AbS gain VQ, with blocks labeled log-gain g(m), DC D_i, codebook, predictor P_i, synthesis, temporal weighting, and MIN of the squared error.]
ENHANCED WAVEFORM INTERPOLATIVE CODER
`
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of Provisional Patent Application Nos. 60/110,522, filed Dec. 1, 1998, and 60/110,641, filed Dec. 1, 1998.
`
BACKGROUND OF THE INVENTION

Recently, there has been growing interest in developing toll-quality speech coders at rates of 4 kbps and below. The speech quality produced by waveform coders such as code-excited linear prediction (CELP) coders degrades rapidly at rates below 5 kbps [B. S. Atal and M. R. Schroeder, "Stochastic Coding of Speech at Very Low Bit Rate", Proc. Int. Conf. Comm., Amsterdam, pp. 1610-1613, 1984]. On the other hand, parametric coders such as the waveform-interpolative (WI) coder, the sinusoidal-transform coder (STC), and the multiband-excitation (MBE) coder produce good quality at low rates, but they do not achieve toll quality [Y. Shoham, "High Quality Speech Coding at 2.4 and 4.0 kbps Based on Time-Frequency Interpolation", IEEE ICASSP'93, Vol. II, pp. 167-170, 1993; W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds., Elsevier Science B.V., Chapter 5, pp. 175-207, 1995; I. S. Burnett and D. H. Pham, "Multi-Prototype Waveform Coding using Frame-by-Frame Analysis-by-Synthesis", IEEE ICASSP'97, pp. 1567-1570, 1997; R. J. McAulay and T. F. Quatieri, "Sinusoidal Coding", in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds., Elsevier Science B.V., Chapter 4, pp. 121-173, 1995; and D. Griffin and J. S. Lim, "Multiband Excitation Vocoder", IEEE Trans. ASSP, Vol. 36, No. 8, pp. 1223-1235, August 1988]. This is mainly due to a lack of robustness in parameter estimation, which is commonly done in open loop, and to inadequate modeling of non-stationary speech segments. Also, parametric coders commonly do not transmit the phase information, for two reasons: first, the phase is of secondary perceptual significance; and second, no efficient phase quantization scheme is known. WI coders typically use a fixed phase vector for the slowly evolving waveform [Shoham, supra; Kleijn et al., supra; and Burnett et al., supra]. For example, Kleijn et al. used a fixed phase extracted from a male speaker. Waveform coders such as CELP, on the other hand, quantize the waveform directly and thereby implicitly allocate more bits to the phase information than is perceptually required.
`
SUMMARY OF THE INVENTION

The present invention overcomes the foregoing drawbacks by implementing a paradigm that incorporates analysis-by-synthesis (AbS) for parameter estimation, and a novel pitch search technique that is well suited for non-stationary segments. In one embodiment, the invention provides a novel, efficient AbS vector quantization (VQ) encoding of the dispersion phase of the excitation signal to enhance the performance of the waveform interpolative (WI) coder at a very low bit rate; the encoding can be used for parametric coders as well as for waveform coders. The enhanced analysis-by-synthesis waveform interpolative (EWI) coder of this invention employs this scheme, which incorporates perceptual weighting and does not require any phase unwrapping.
WI coders use non-ideal low-pass filters for downsampling and upsampling of the slowly evolving waveform (SEW). In another embodiment of the invention, a novel AbS SEW quantization scheme is provided, which takes the non-ideal filters into consideration. An improved match between the reconstructed and original SEW is obtained, most notably in the transitions.
Pitch accuracy is crucial for high-quality reproduced speech in WI coders. Still another embodiment of the invention provides a novel pitch search technique based on varying segment boundaries; it allows for locking onto the most probable pitch period during transitions or other segments with rapidly varying pitch.
Commonly in speech coding, the gain sequence is downsampled and interpolated. As a result, it is often smeared during plosives and onsets. To alleviate this problem, a further embodiment of the invention provides a novel switched-predictive AbS gain VQ scheme based on temporal weighting.
More particularly, the invention provides a method for interpolative coding of input signals at low data rates in which there may be significant pitch transitivity, the signals having an evolving waveform, the method incorporating at least one, and preferably all, of the following steps:
(a) AbS VQ of the SEW, whereby distortion in the signal is reduced by obtaining the accumulated weighted distortion between an original sequence of waveforms and a sequence of quantized and interpolated waveforms;
(b) AbS quantization of the dispersion phase;
(c) locking onto the most probable pitch period of the signal using both a spectral domain pitch search and a temporal domain pitch search;
(d) incorporating temporal weighting in the AbS VQ of the signal gain, whereby local high-energy events in the input signal are emphasized;
(e) applying both high-correlation and low-correlation synthesis filters to a vector quantizer codebook in the AbS VQ of the signal gain, whereby self-correlation is added to the codebook vectors and the similarity between the signal waveform and a codebook waveform is maximized;
(f) using each value of gain in the AbS VQ of the signal gain to obtain a plurality of shapes, each composed of a predetermined number of values, e.g., in the range of 2-50, preferably 5-20, and comparing said shapes to a vector-quantized codebook of shapes, each having said predetermined number of values; and
(g) using a coder in which a plurality of bits, e.g., 4 bits, are allocated to the SEW dispersion phase.
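Steps (d) to (f) can be illustrated with a small switched-predictive AbS gain VQ search. This is a minimal Python sketch under stated assumptions, not the coder's implementation: the log-gain is a natural logarithm, the temporal weighting is the hypothetical monotonic choice w(n) = exp(g(n) - max g), each codebook has a hypothetical first-order synthesis filter 1/(1 - P_i z^-1) applied in the DC-removed domain with a DC offset D_i, and all names and codebook contents are illustrative. Every codevector of every codebook is filtered and scored, so the choice between high and low self-correlation falls out of the exhaustive search.

```python
import math
from typing import List, Sequence, Tuple

# A codebook is (predictor coefficient P_i, DC offset D_i, list of codevectors).
Codebook = Tuple[float, float, List[List[float]]]


def temporal_weights(log_gain: Sequence[float]) -> List[float]:
    """Hypothetical monotonic weighting: linear-domain gain normalized to a peak of 1."""
    peak = max(log_gain)
    return [math.exp(g - peak) for g in log_gain]


def synthesize(code: Sequence[float], pred: float, dc: float,
               prev_log_gain: float) -> List[float]:
    """Filter a codevector by 1/(1 - pred*z^-1) without its DC, then restore the DC."""
    state = prev_log_gain - dc  # filter memory with this codebook's DC removed
    out = []
    for c in code:
        state = c + pred * state
        out.append(state + dc)
    return out


def switched_predictive_gain_vq(log_gain: Sequence[float],
                                codebooks: Sequence[Codebook],
                                prev_log_gain: float) -> Tuple[int, int, List[float]]:
    """Exhaustive AbS search; returns (codebook index, codevector index, quantized log-gains)."""
    w = temporal_weights(log_gain)
    best_err, best = math.inf, (0, 0, [])
    for i, (pred, dc, vectors) in enumerate(codebooks):
        for j, code in enumerate(vectors):
            q = synthesize(code, pred, dc, prev_log_gain)
            # Temporally weighted squared error emphasizes high-energy samples.
            err = sum(wn * (g - gq) ** 2 for wn, g, gq in zip(w, log_gain, q))
            if err < best_err:
                best_err, best = err, (i, j, q)
    return best


# Usage with two hypothetical 5-dimensional codebooks (high and low self-correlation).
high = (0.9, 0.0, [[0.1 * k + 0.05 * n for n in range(5)] for k in range(4)])
low = (0.1, 1.0, [[0.3 * n - 0.2 * k for n in range(5)] for k in range(4)])
target = synthesize(low[2][3], low[0], low[1], prev_log_gain=0.5)
print(switched_predictive_gain_vq(target, [high, low], prev_log_gain=0.5)[:2])  # (1, 3)
```

The exhaustive search mirrors the statement that all combinations are tried; in the coder described below, each of the two codebooks has 32 vectors.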
The method of the invention can be used in general with any waveform signal, and is particularly useful with speech signals. In the step of AbS VQ of the SEW, distortion in the signal is reduced by obtaining the accumulated weighted distortion between an original sequence of waveforms and a sequence of quantized and interpolated waveforms. In the step of AbS quantization of the dispersion phase, at least one codebook is provided that contains magnitude and phase information for predetermined waveforms. The linear phase of the input is crudely aligned, then iteratively shifted and compared to a plurality of waveforms reconstructed from the magnitude and phase information contained in one or more codebooks. The reconstructed waveform that best matches one of the iteratively shifted inputs is selected.
In the step of locking onto the most probable pitch period of the signal, the invention includes searching the temporal domain pitch, defining a boundary for a segment of said temporal domain pitch, maximizing the length of the boundary by iteratively shrinking and expanding the segment, and maximizing the similarity by shifting the segment. The spectral domain and temporal domain searches are preferably conducted at 100 Hz and 500 Hz, respectively.
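The temporal-domain search with varying segment boundaries, followed by a correlation-weighted average of the resulting pitch values, can be sketched as follows. This is an illustrative Python sketch, not the coder's equation (12): the segment is parameterized by a hypothetical start shift `tau` and length, the normalized-correlation form is a standard one assumed here, and the 8 kHz sampling rate (so 2 ms = 16 samples) and all function names are assumptions.

```python
import math
from typing import Iterable, Sequence, Tuple


def normalized_correlation(s: Sequence[float], start: int, length: int, lag: int) -> float:
    """Normalized correlation of s[n] and s[n - lag] over n in [start, start + length)."""
    num = e0 = e1 = 0.0
    for n in range(start, start + length):
        x, y = s[n], s[n - lag]
        num += x * y
        e0 += x * x
        e1 += y * y
    return num / math.sqrt(e0 * e1) if e0 > 0.0 and e1 > 0.0 else 0.0


def pitch_varying_segment(s: Sequence[float], center: int, lags: Iterable[int],
                          shifts: Sequence[int], lengths: Sequence[int]) -> Tuple[int, float]:
    """Pick the lag with the highest correlation over all segment shifts and lengths."""
    best_lag, best_rho = -1, -math.inf
    for lag in lags:
        for tau in shifts:          # shift the segment
            for length in lengths:  # shrink or expand the segment
                start = center + tau
                if start - lag < 0 or start + length > len(s):
                    continue  # segment would run off the signal
                rho = normalized_correlation(s, start, length, lag)
                if rho > best_rho:
                    best_lag, best_rho = lag, rho
    return best_lag, best_rho


def weighted_mean_pitch(pitches: Sequence[float], rhos: Sequence[float]) -> float:
    """Correlation-weighted mean sum(rho * P) / sum(rho); plain mean if the weights vanish."""
    total = sum(rhos)
    if total <= 0.0:
        return sum(pitches) / len(pitches)
    return sum(p * r for p, r in zip(pitches, rhos)) / total


# Usage: a synthetic signal with a 40-sample period (200 Hz at an assumed 8 kHz rate).
s = [math.sin(2 * math.pi * n / 40) + 0.5 * math.sin(4 * math.pi * n / 40 + 0.3)
     for n in range(800)]
estimates = [pitch_varying_segment(s, 400 + 16 * i, range(25, 61), [-16, 0, 16], [80, 120, 160])
             for i in range(5)]  # five instants 2 ms (16 samples) apart
print(weighted_mean_pitch([p for p, _ in estimates], [r for _, r in estimates]))
```

Letting both the segment position and its length vary is what allows a segment to avoid straddling a pitch transition.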
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the AbS SEW vector quantization;
FIG. 2 shows amplitude-time plots illustrating the improved waveform matching obtained for a non-stationary speech segment by interpolating the optimized SEW;
FIG. 3 is a block diagram of the AbS dispersion phase vector quantization;
FIG. 4 is a plot of the segmental weighted signal-to-noise ratio of the phase vector quantization versus the number of bits, for modified intermediate reference system (MIRS) and for non-MIRS (flat) speech;
FIG. 5 shows the results of subjective A/B tests comparing a 4-bit phase vector quantization and a fixed phase extracted from a male speaker;
FIG. 6 is a block diagram of the pitch search of the EWI coder; and
FIG. 7 is a block diagram of the switched-predictive AbS gain VQ using temporal weighting.
`
DETAILED DESCRIPTION OF THE INVENTION

The invention has a number of embodiments, some of which can be used independently of the others to enhance speech and other signal coding systems. The embodiments cooperate to produce a superior coding system, involving AbS SEW optimization, a novel dispersion phase quantizer, a pitch search scheme, switched-predictive AbS gain VQ, and bit allocation.
`
AbS SEW Quantization
Commonly in WI coders the SEW is distorted by downsampling and upsampling with non-ideal low-pass filters. In order to reduce such distortion, an AbS SEW quantization scheme, illustrated in FIG. 1, was used. Consider the accumulated weighted distortion, $D_{WI}$, between the input SEW vectors, $\mathbf{r}_m$, and the interpolated vectors, $\hat{\mathbf{r}}_m$, given by:

$$D_{WI} = \sum_{m=1}^{M} (\mathbf{r}_m - \hat{\mathbf{r}}_m)^H \mathbf{W}_m (\mathbf{r}_m - \hat{\mathbf{r}}_m) + \sum_{m=M+1}^{M+L} (\mathbf{r}_m - \hat{\mathbf{r}}_m)^H \mathbf{W}_m (\mathbf{r}_m - \hat{\mathbf{r}}_m) \qquad (1)$$

where the first sum is that of the M current distortions and the second sum is that of the lookahead distortions. H denotes Hermitian (transpose + complex conjugate), M is the number of waveforms per frame, L is the lookahead number of waveforms, $\alpha(t)$ is some increasing interpolation function in the range $0 \le \alpha(t) \le 1$, and $\mathbf{W}_m$ is a diagonal matrix whose elements, $w_{kk}$, are the combined spectral weighting and synthesis of the k-th harmonic, given by:

$$w_{kk} = \frac{1}{K} \left| \frac{g\,A(z/\gamma_1)}{A(z/\gamma_2)\,\hat{A}(z)} \right|^2_{z = e^{j 2\pi k/P}}, \qquad k = 1, \ldots, K \qquad (2)$$

where P is the pitch period, K is the number of harmonics, g is the gain, $A(z)$ and $\hat{A}(z)$ are the input and the quantized LPC polynomials respectively, and the spectral weighting parameters satisfy $0 \le \gamma_2 < \gamma_1 \le 1$. It is also possible to leave out the inverse of the number of harmonics, i.e., the 1/K parameter, the gain, i.e., the g parameter, or another combination of input and quantized LPC polynomials, i.e., the $A(z)$ and $\hat{A}(z)$ parameters.
The interpolated SEW vectors are given by:

$$\hat{\mathbf{r}}_m = [1 - \alpha(t_m)]\,\hat{\mathbf{r}}_0 + \alpha(t_m)\,\hat{\mathbf{r}}_M, \qquad m = 1, \ldots, M \qquad (3)$$

where t is time, m is the waveform index within the frame, and $\hat{\mathbf{r}}_0$ and $\hat{\mathbf{r}}_M$ are the quantized SEW at the previous and at the current frame respectively. The parameter $\alpha$ is an increasing linear function from 0 to 1. It can be shown that the accumulated distortion in equation (1) is equal to the sum of the modeling distortion (the first term below) and the quantization distortion:

$$D_{WI} = D_{WI}\big|_{\hat{\mathbf{r}}_M = \mathbf{r}_M^{opt}} + D_Q \qquad (4)$$

where the quantization distortion is given by:

$$D_Q = (\hat{\mathbf{r}}_M - \mathbf{r}_M^{opt})^H\, \mathbf{W}_M^{opt}\, (\hat{\mathbf{r}}_M - \mathbf{r}_M^{opt}) \qquad (5)$$

The optimal vector, $\mathbf{r}_M^{opt}$, which minimizes the modeling distortion, is given by:

$$\mathbf{r}_M^{opt} = \left(\mathbf{W}_M^{opt}\right)^{-1} \sum_{m} \alpha(t_m)\,\mathbf{W}_m \left[\mathbf{r}_m - (1 - \alpha(t_m))\,\hat{\mathbf{r}}_0\right]$$

Therefore, VQ with the accumulated distortion of equation (1) can be simplified by using the distortion of equation (5), and:

$$\mathbf{W}_M^{opt} = \sum_{m} \alpha^2(t_m)\,\mathbf{W}_m \qquad (6)$$

where the sums run over all the waveforms of equation (1).
An improved match between the reconstructed and original SEW is obtained, most notably in the transitions. FIG. 2 illustrates the improved waveform matching obtained for a non-stationary speech segment by interpolating the optimized SEW.
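Because each $\mathbf{W}_m$ is diagonal, the decomposition of equations (4) to (6) holds harmonic by harmonic and can be checked with scalar arithmetic. The Python sketch below assumes the interpolation form of equation (3) for every waveform that enters equation (1), including the lookahead ones; the function names and the data layout (lists of complex DFT coefficients with real diagonal weights) are illustrative. It computes $\mathbf{r}_M^{opt}$ and the diagonal of $\mathbf{W}_M^{opt}$, then selects the codevector with the smallest $D_Q$, which by equation (4) also minimizes $D_{WI}$.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[complex]


def optimal_sew(r_seq: Sequence[Vector], w_seq: Sequence[Sequence[float]],
                alphas: Sequence[float], r0_hat: Vector) -> Tuple[List[complex], List[float]]:
    """Return r_M^opt and diag(W_M^opt) for diagonal weightings W_m (one entry per harmonic)."""
    r_opt, w_opt = [], []
    for k in range(len(r0_hat)):
        num, den = 0j, 0.0
        for r_m, w_m, a in zip(r_seq, w_seq, alphas):
            num += a * w_m[k] * (r_m[k] - (1.0 - a) * r0_hat[k])
            den += a * a * w_m[k]
        r_opt.append(num / den)
        w_opt.append(den)
    return r_opt, w_opt


def accumulated_distortion(r_seq: Sequence[Vector], w_seq: Sequence[Sequence[float]],
                           alphas: Sequence[float], r0_hat: Vector, rM_hat: Vector) -> float:
    """D_WI of equation (1), with each r_hat_m interpolated as in equation (3)."""
    total = 0.0
    for r_m, w_m, a in zip(r_seq, w_seq, alphas):
        for k in range(len(r0_hat)):
            interp = (1.0 - a) * r0_hat[k] + a * rM_hat[k]
            total += w_m[k] * abs(r_m[k] - interp) ** 2
    return total


def quantization_distortion(rM_hat: Vector, r_opt: Vector, w_opt: Sequence[float]) -> float:
    """D_Q of equation (5) for a diagonal W_M^opt."""
    return sum(w * abs(q - o) ** 2 for w, q, o in zip(w_opt, rM_hat, r_opt))


def abs_sew_vq(codebook: Sequence[Vector], r_seq: Sequence[Vector],
               w_seq: Sequence[Sequence[float]], alphas: Sequence[float],
               r0_hat: Vector) -> int:
    """Index of the SEW codevector minimizing D_Q (and therefore D_WI)."""
    r_opt, w_opt = optimal_sew(r_seq, w_seq, alphas, r0_hat)
    return min(range(len(codebook)),
               key=lambda i: quantization_distortion(codebook[i], r_opt, w_opt))
```

For any candidate `c`, `accumulated_distortion(..., c)` equals `accumulated_distortion(..., r_opt) + quantization_distortion(c, r_opt, w_opt)` up to rounding, so the codebook search only needs $\mathbf{r}_M^{opt}$ and $\mathbf{W}_M^{opt}$ rather than the whole waveform sequence.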
`
AbS Phase Quantization
The dispersion-phase vector quantization scheme is illustrated in FIG. 3. Consider a pitch cycle that is extracted from the residual signal and cyclically shifted such that its pulse is located at position zero. Let its discrete Fourier transform (DFT) be denoted by $\mathbf{r}$; the resulting DFT phase is the dispersion phase, $\boldsymbol{\phi}$, which determines, along with the magnitude $|\mathbf{r}|$, the waveform's pulse shape. The SEW waveform $\mathbf{r}$ is thus the vector of complex DFT coefficients, each representing a magnitude and a phase. After quantization, the components of the quantized magnitude vector, $|\hat{\mathbf{r}}|$, are multiplied by the exponentials of the quantized phases, $e^{j\hat{\phi}(k)}$, to yield the quantized waveform DFT, $\hat{\mathbf{r}}$, which is subtracted from the input DFT to produce the error DFT. The error DFT is then transformed to the perceptual domain by weighting it by the combined synthesis and weighting filter $W(z)/\hat{A}(z)$. In a crude linear phase alignment, the encoder searches for the phase that minimizes the energy of the perceptual domain error, shifting the signal such that the peak is located at time zero. It then allows a refining cyclic shift of the input waveform during the search, incrementally increasing or decreasing the linear phase, to eliminate any residual phase shift between the input waveform and the quantized waveform. Although shown in FIG. 3 as occurring immediately after the crude linear phase alignment, the refined linear phase alignment step can occur elsewhere in the cycle, e.g., between the × and + steps. Phase dispersion quantization aims to improve waveform matching. Efficient quantization can be obtained by using the perceptually weighted distortion:

$$D_w(\mathbf{r}, \hat{\mathbf{r}}) = (\mathbf{r} - \hat{\mathbf{r}})^H\, \mathbf{W}\, (\mathbf{r} - \hat{\mathbf{r}}) \qquad (7)$$

The magnitude is perceptually more significant than the phase and should therefore be quantized first. Furthermore, if the phase were quantized first, the very limited bit allocation available for the phase would lead to an excessively degraded spectral matching of the magnitude in favor of a somewhat improved, but less important, matching of the waveform. For the above distortion, the quantized phase vector is given by:

$$\hat{\boldsymbol{\phi}} = \arg\max_{\boldsymbol{\phi}_i} \operatorname{Re}\left\{ \mathbf{r}^H\, \mathbf{W}\, e^{j\boldsymbol{\Phi}_i}\, |\hat{\mathbf{r}}| \right\} \qquad (8)$$

where i is the running phase codebook index, and the respective diagonal phase exponent matrix is given by:

$$e^{j\boldsymbol{\Phi}_i} = \operatorname{diag}\left\{ e^{j\phi_i(k)} \right\} \qquad (9)$$

The AbS search for phase quantization is based on evaluating (8) for each candidate phase codevector. Since only trigonometric functions of the phase candidates are used, phase unwrapping is avoided. The EWI coder uses the optimized SEW, $\mathbf{r}_M^{opt}$, and the optimized weighting, $\mathbf{W}_M^{opt}$, for the AbS phase quantization.
Equivalently, the quantized phase vector can be simplified to:

$$\hat{\boldsymbol{\phi}} = \arg\max_{\boldsymbol{\phi}_i} \sum_{k} w_{kk}\, |r(k)|\, |\hat{r}(k)| \cos\big(\phi_i(k) - \phi(k)\big) \qquad (10)$$

where $\phi(k)$ is the phase of $r(k)$, the k-th input DFT coefficient.
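The AbS phase search of equation (10), together with the refining linear-phase shift described above, can be sketched in Python. This is a minimal sketch under stated assumptions: the weights are the real diagonal elements $w_{kk}$, list index k stands for harmonic k + 1, and the refinement is modeled by a hypothetical `shifts` parameter that adds a linear phase of (k + 1)·s radians to harmonic k of the input (a cyclic time shift of a periodic waveform). The brute-force distortion of equation (7) is included so the equivalence of (7) and (10) can be checked.

```python
import cmath
import math
from typing import Sequence, Tuple


def weighted_distortion(r: Sequence[complex], r_hat: Sequence[complex],
                        w: Sequence[float]) -> float:
    """Perceptually weighted distortion (r - r_hat)^H W (r - r_hat) for diagonal W."""
    return sum(wk * abs(a - b) ** 2 for wk, a, b in zip(w, r, r_hat))


def phase_vq_search(r: Sequence[complex], mag_hat: Sequence[float], w: Sequence[float],
                    phase_codebook: Sequence[Sequence[float]],
                    shifts: Sequence[float] = (0.0,)) -> Tuple[int, float]:
    """Return (codebook index, linear-phase shift) maximizing the cosine score of eq. (10)."""
    best_score, best_i, best_shift = -math.inf, 0, 0.0
    for s in shifts:
        # A cyclic shift of the input adds a linear phase; list index k is harmonic k + 1.
        phi = [cmath.phase(rk) + (k + 1) * s for k, rk in enumerate(r)]
        for i, cand in enumerate(phase_codebook):
            # Only cosines of phase differences are used, so no unwrapping is needed.
            score = sum(wk * abs(rk) * mk * math.cos(ck - pk)
                        for wk, rk, mk, ck, pk in zip(w, r, mag_hat, cand, phi))
            if score > best_score:
                best_score, best_i, best_shift = score, i, s
    return best_i, best_shift
```

Since the quantized magnitudes are fixed before the phase search, the distortion of equation (7) differs from the cosine score only by a constant and a factor of -2, so maximizing the score and minimizing `weighted_distortion` select the same codevector; adding 2π to any codebook phase leaves the result unchanged.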
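The average global distortion measure for a set of M data vectors is:

$$D_{global} = \frac{1}{M} \sum_{m \in \{\text{data vectors}\}} D_w\!\left(\mathbf{r}_m,\, e^{j\boldsymbol{\Phi}_{i(m)}} |\hat{\mathbf{r}}_m|\right) = \frac{1}{M} \sum_{m \in \{\text{data vectors}\}} \sum_{k} w_{kk,m} \left[ |r_m(k)|^2 + |\hat{r}_m(k)|^2 - 2\,|r_m(k)|\,|\hat{r}_m(k)| \cos\big(\phi_{i(m)}(k) - \phi_m(k)\big) \right] \qquad (11)$$

where $D_w$ is evaluated with the weighting $\mathbf{W}_m$ of the m-th vector, $w_{kk,m}$ are its diagonal elements, and $i(m)$ is the index of the phase codevector assigned to the m-th vector.
The centroid equation [A. Gersho et al., "Vector Quantization and Signal Compression", Kluwer Academic Publishers, 1992] of the k-th harmonic's phase for the j-th cluster, which minimizes the global distortion in equation (11), is given by:

$$\phi_j^{centroid}(k) = \operatorname{atan2}\!\left( \sum_{m \in \text{cluster } j} w_{kk,m}\,|r_m(k)|\,|\hat{r}_m(k)| \sin\phi_m(k),\; \sum_{m \in \text{cluster } j} w_{kk,m}\,|r_m(k)|\,|\hat{r}_m(k)| \cos\phi_m(k) \right)$$

These centroid equations use trigonometric functions of the phase and therefore do not require any phase unwrapping. It is possible to use $|r_m(k)|^2$ instead of $|r_m(k)|\,|\hat{r}_m(k)|$.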
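The centroid update for one harmonic of one cluster can be sketched in Python. The per-vector weights $a_m = w_{kk,m}|r_m(k)||\hat{r}_m(k)|$ (or $|r_m(k)|^2$, as the text allows) are taken as inputs, and the function names are illustrative. The centroid is the angle of a weighted sum of unit phasors, computed with `atan2`, so it maximizes $\sum_m a_m \cos(\theta - \phi_m(k))$ without any phase unwrapping.

```python
import math
from typing import Sequence


def phase_centroid(weights: Sequence[float], phases: Sequence[float]) -> float:
    """Angle theta maximizing sum_m a_m cos(theta - phi_m): the argument of sum_m a_m e^{j phi_m}."""
    s = sum(a * math.sin(p) for a, p in zip(weights, phases))
    c = sum(a * math.cos(p) for a, p in zip(weights, phases))
    return math.atan2(s, c)


def cluster_score(theta: float, weights: Sequence[float], phases: Sequence[float]) -> float:
    """Phase-dependent part of equation (11) for one harmonic; larger is better."""
    return sum(a * math.cos(theta - p) for a, p in zip(weights, phases))


# Phases near +/-pi: an arithmetic mean gives 0 rad, the opposite direction,
# while the circular centroid correctly gives (+/-) pi.
print(phase_centroid([1.0, 1.0], [3.0, -3.0]))
```

The example shows why unwrapping is unnecessary: phases of 3.0 and -3.0 rad lie close together on the circle, and the centroid lands at ±π, whereas averaging the raw numbers would give 0.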
The phase vector's dimension depends on the pitch period and, therefore, a variable-dimension VQ has been implemented. In the WI system the range of possible pitch period values was divided into eight ranges, and for each range an optimal codebook was designed such that vectors of dimension smaller than the largest pitch period in the range are zero padded.
Pitch changes over time cause the quantizer to switch among the pitch-range codebooks. In order to achieve smooth phase variations whenever such a switch occurs, overlapped training clusters were used.
The phase-quantization scheme has been implemented as part of a WI coder and used to quantize the SEW phase. The objective performance of the proposed phase VQ has been tested under the following conditions:
Phase bits: 0-6 every 20 ms, i.e., a bit rate of 0-300 bit/second.
8 pitch ranges were selected, and training was performed for each range.
Modified IRS (MIRS) filtered speech (female + male): training set of 99,323 vectors; test set of 83,099 vectors.
Non-MIRS filtered speech (female + male): training set of 101,359 vectors; test set of 95,446 vectors.
The magnitude was not quantized.
The segmental weighted signal-to-noise ratio (SNR) of the quantizer is illustrated in FIG. 4. The proposed system achieves approximately 14 dB SNR with as few as 6 bits for non-MIRS filtered speech, and nearly 10 dB for MIRS filtered speech.
Recent WI coders have used a dispersion phase extracted from a male speaker [Kleijn et al., supra; Y. Shoham, "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 KBPS", IEEE ICASSP '97, pp. 1599-1602, 1997]. A subjective A/B test was conducted to compare the dispersion phase quantization of this invention, using only 4 bits, to a male-extracted dispersion phase. The test data included 16 MIRS speech sentences, 8 of which are of female speakers and 8 of male speakers. During the test, all pairs of files were played twice in alternating order, and the listeners could vote for either of the systems or for no preference. The speech material was synthesized using a WI system in which only the dispersion phase was quantized every 20 ms. Twenty-one listeners participated in the test. The test results, illustrated in FIG. 5, show
an improvement in speech quality by using the 4-bit phase VQ. The improvement is larger for female speakers than for male speakers. This may be explained by a higher number of bits per vector sample for female speech, by less spectral masking for female speech, and by a larger amount of phase-dispersion variation for female speech. The codebook design for the dispersion-phase quantization involves a tradeoff between robustness, in terms of smooth phase variations, and waveform matching. A locally optimized codebook for each pitch value may improve the waveform matching on average, but may occasionally yield abrupt and excessive changes, which may cause temporal artifacts.

Pitch Search
The pitch search of the EWI coder consists of a spectral domain search employed at 100 Hz and a temporal domain search employed at 500 Hz, as illustrated in FIG. 6. The spectral domain pitch search is based on harmonic matching [McAulay et al., supra; Griffin et al., supra; and E. Shlomot, V. Cuperman, and A. Gersho, "Hybrid Coding of Speech at 4 kbps", IEEE Speech Coding Workshop, pp. 37-38, 1997]. The temporal domain pitch search is based on varying segment boundaries. It allows for locking onto the most probable pitch period even during transitions or other segments with rapidly varying pitch (e.g., speech onset or offset, or fast-changing periodicity). Initially, pitch periods, $P(n_i)$, are searched every 2 ms at instances $n_i$ by maximizing the normalized correlation of the weighted speech $s_w(n)$, that is:

(12) [equation not legible in source]

where $\tau$ is the shift in the segment, $\Delta$ is some incremental segment used in the summations for computational simplicity, and $0 \le N_j \le \lfloor 160/\Delta \rfloor$. Then, every 10 ms a weighted-mean pitch value is calculated by:

$$\bar{P} = \frac{\sum_{i} \rho(n_i)\, P(n_i)}{\sum_{i} \rho(n_i)} \qquad (13)$$

where $\rho(n_i)$ is the normalized correlation for $P(n_i)$. The above values (160, 10, 5) are for the particular coder and are used for illustration. Equation (12) describes the temporal domain pitch search and the temporal domain pitch refinement blocks of FIG. 6. Equation (13) describes the weighted-average pitch block of FIG. 6.

Gain Quantization
The gain trajectory is commonly smeared during plosives and onsets by downsampling and interpolation. This problem is addressed, and speech crispness is improved, in accordance with an embodiment of the invention that provides a novel switched-predictive AbS gain VQ technique, illustrated in FIG. 7. Switched prediction is introduced to allow for different levels of gain correlation and to reduce the occurrence of gain outliers. In order to improve speech crispness, especially for plosives and onsets, temporal weighting is incorporated in the AbS gain VQ. The weighting is a monotonic function of the temporal gain. Two codebooks of 32 vectors each are used. Each codebook has an associated predictor coefficient, $P_i$, and a DC offset, $D_i$. The quantization target vector is the DC-removed log-gain vector, denoted by $t(m)$. The search for the minimal weighted mean squared error (WMSE) is performed over all the vectors, $c_{ij}(m)$, of the codebooks. The quantized target, $\hat{t}(m)$, is obtained by passing the quantized vector, $\hat{c}_{ij}(m)$, through the synthesis filter. Since each quantized target vector may have a different value of the removed DC, the quantized DC is added temporarily to the filter memory after the state update, and the next quantized vector's DC is subtracted from it before filtering is performed. Since the predictor coefficients are known, direct VQ can be used to simplify the computations. The synthesis filter adds self-correlation to the codebook vector. All combinations are tried, and whether high or low self-correlation is used depends on which yields the best results.

Bit Allocation
The bit allocation of the coder is given in Table 1. The frame length is 20 ms, and ten waveforms are extracted per frame. The pitch and the gain are coded twice per frame.

TABLE 1
Bit allocation for EWI coder

Parameter     Bits/Frame     Bits/second
LPC           18             900
Pitch         2 × 6 = 12     600
Gain          2 × 6 = 12     600
REW           20             1000
SEW magn.     14             700
SEW phase     4              200
Total         80             4000
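The entries of Table 1 are mutually consistent: a 20 ms frame gives 50 frames per second, so each bits-per-second entry is 50 times the bits-per-frame entry, and the 80 bits per frame add up to 4 kbps. The 6 bits per gain vector also match a search over two codebooks of 32 vectors each (2 × 32 = 64 = 2^6 entries). A short Python check of the arithmetic:

```python
FRAME_MS = 20  # frame length stated in the text

# Bits per frame from Table 1 (pitch and gain are coded twice per frame).
BITS_PER_FRAME = {
    "LPC": 18,
    "Pitch": 2 * 6,
    "Gain": 2 * 6,
    "REW": 20,
    "SEW magn.": 14,
    "SEW phase": 4,
}

frames_per_second = 1000 // FRAME_MS
bits_per_second = {name: bits * frames_per_second for name, bits in BITS_PER_FRAME.items()}
total_bits = sum(BITS_PER_FRAME.values())
total_rate = sum(bits_per_second.values())

for name, bits in BITS_PER_FRAME.items():
    print(f"{name:10s} {bits:3d} bits/frame {bits_per_second[name]:5d} bit/s")
print(f"{'Total':10s} {total_bits:3d} bits/frame {total_rate:5d} bit/s")
```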
`
Subjective Results
A subjective A/B test was conducted to compare the 4 kbps EWI coder of this invention to MPEG-4 at 4 kbps and to G.723.1. The test data included 24 MIRS speech sentences, 12 of which are of female speakers and 12 of male speakers. Fourteen listeners participated in the test. The test results, listed in Tables 2 to 4, indicate that the subjective quality of EWI exceeds that of MPEG-4 at 4 kbps and of G.723.1 at 5.3 kbps, and that it is slightly better than that of G.723.1 at 6.3 kbps.

TABLE 2

Test      4 kbps WI    4 kbps MPEG-4
Female    65.48%       34.52%
Male      61.90%       38.10%
Total     63.69%       36.31%

Table 2 shows the results of subjective A/B tests comparing the 4 kbps WI coder and the 4 kbps MPEG-4. With 95% confidence, the WI preference lies in [58.63%, 68.75%].
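As a rough cross-check of the interval reported for Table 2, a normal-approximation (Wald) binomial interval can be computed. This assumes one vote per listener per sentence, i.e., 14 × 12 = 168 votes per gender and 336 in total, which is consistent with the reported percentages (110/168 = 65.48% and 104/168 = 61.90%); the authors' exact method is not stated. Under this assumption the sketch gives approximately [58.55%, 68.83%], close to but not identical with the reported interval.

```python
import math
from typing import Tuple


def wald_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a binomial proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


female, male, per_gender = 110, 104, 14 * 12  # assumed vote counts
low, high = wald_interval(female + male, 2 * per_gender)
share = 100 * (female + male) / (2 * per_gender)
print(f"WI preference {share:.2f}% in [{100 * low:.2f}%, {100 * high:.2f}%]")
```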
TABLE 3

Test      4 kbps WI    5.3 kbps G.723.1
Female    57.74%       42.26%
Male      61.31%       38.69%
Total     59.52%       40.48%

Table 3 shows the results of subjective A/B tests comparing the 4 kbps WI coder with 5.3 kbps G.723.1. With 95% confidence, the WI preference lies in [54.17%, 64.88%].
TABLE 4

Test      4 kbps WI    6.3 kbps G.723.1
Female    54.76%       45.24%
Male      52.98%       47.02%
Total     53.87%       46.13%

Table 4 shows the results of the subjective A/B test comparing the 4 kbps WI coder with 6.3 kbps G.723.1. With 95% confidence, the WI preference lies in [48.51%, 59.23%].
The present invention incorporates several new techniques that enhance the performance of the WI coder: analysis-by-synthesis vector quantization of the dispersion phase, AbS optimization of the SEW, a special pitch search for transitions, and switched-predictive analysis-by-synthesis gain VQ. These features improve the algorithm and its robustness. The test results indicate that the performance of the EWI coder slightly exceeds that of G.723.1 at 6.3 kbps, and therefore EWI achieves very close to toll quality, at least under clean speech conditions.
The invention claimed is:
1. A method for using a computer processor to interpolatively code a digitized audio waveform input signal having a first bitrate into a coded audio waveform output signal having a second bitrate lower than said first bitrate, said method comprising the steps of:
extracting a slowly evolving waveform from the digitized audio waveform input signal;
estimating a dispersion phase of an excitation signal;
locking onto a most probable pitch period;
quantizing a sequence of gain trajectory correlation values;
using the computer processor to transform the extracted slowly evolving waveform, the estimated dispersion phase, the most probable pitch period and the quantized sequence of gain trajectory values into an interpolatively coded audio waveform output signal with said lower bitrate; and
outputting said coded audio waveform output signal,
wherein said method comprises using the computer processor to execute at least one step selected from the group consisting of:
(a) performing an analysis-by-synthesis vector quantization of the dispersion phase such that a linear shift phase residual is minimized;
(b) computing a weighted average of a group of adjacent pitch values in order to compute the most probable pitch period;
(c) performing spectral and temporal pitch searching in order to compute the most probable pitch period, such that the temporal pitch searching is performed at a different rate than the spectral pitch searching;
(d) incorporating temporal weighting in an analysis-by-synthesis vector-quantization of the gain trajectory correlation values;
(e) quantizing adjacent gain trajectory correlation values by analysis-by-synthesis vector-quantization without downsampling or interpolation;
(f) incorporating switched prediction filtering in an analysis-by-synthesis vector-quantization of the sequence of gain trajectory correlation values;
(g) temporal pitch searching with varying segment boundaries.
2. The method of claim 1 in which said method incorporates all of steps (a) through (g).
3. The method of claim 2 in which said digitized audio waveform input signal is representative of speech and said coded output signal has a subjective speech quality at 4 kbps better than that of G.723 coding at 6.3 kbps.
4. The method of claim 1, wherein distortion is reduced by obtaining an accumulated weighted distortion between a sequence of input waveforms and a sequence of quantized and interpolated waveforms.
5. The method of claim 1 wherein said at least one step is step (a) further comprising providing at least one codebook comprising magnitude and dispersion phase information for predetermined waveforms, approximately aligning a linear phase input or output, then iteratively shifting the approximately aligned linear phase input or output, comparing the shifted input or output to a plurality of waveforms reconstructed from the magnitude and dispersion phase information contained in said at least one codebook, and selecting the reconstructed waveform that best matches one of the iteratively shifted inputs or outputs.
6. The method of claim 1 wherein said at least one step includes step (g) and said varying segment boundaries are used to compute a best boundary by iteratively shifting and changing the length
