IPR2017-01244
`Saint Lawrence Communications
`Exhibit 2011
`
`ENHANCED ANALYSIS-BY-SYNTHESIS WAVEFORM
`INTERPOLATIVE CODING AT 4 KBPS
`Oded Gottesman and Allen Gersho
`
`Signal Compression Laboratory
`Department of Electrical and Computer Engineering
`University of California
`Santa Barbara, California 93106, USA
`E-mail: [oded, gersho]@scl.ece.ucsb.edu
`
`performance of the WI coder at a very low bit-rate, which can be
`used for parametric coders as well as for waveform coders. The
`EWI coder employs this scheme, which incorporates perceptual
`weighting and does not require any phase unwrapping.
`
`The WI coders use non-ideal low-pass filters for downsampling
`and upsampling of the SEW. We describe a novel AbS SEW
`quantization scheme, which takes the non-ideal filters into
`consideration. An improved match between reconstructed and
`original SEW is obtained, most notably in the transitions.
`
`Pitch accuracy is crucial for high quality reproduced speech in
`WI coders. We introduce a novel pitch search technique based on
`varying segment boundaries; it allows for locking onto the most
`probable pitch period during transitions or other segments with
`rapidly varying pitch.

Commonly in speech coding, the gain sequence is downsampled
and interpolated. As a result, it is often smeared during plosives
and onsets. To alleviate this problem, we propose a novel
switched-predictive AbS gain VQ scheme based on temporal
weighting.
`
`This paper is organized as follows. In Section 2 we explain the
`AbS SEW optimization. The dispersion phase quantizer is
`discussed in Section 3. Section 4 describes the pitch search. In
`Section 5 we present the switched-predictive AbS gain VQ. The
bit allocation is given in Section 6. Subjective results are
reported in Section 7. Finally, we summarize our work in
Section 8.
`
2. AbS SEW QUANTIZATION
`
`ABSTRACT
`This paper presents an Enhanced analysis-by-synthesis (AbS)
`Waveform Interpolative (EWI) speech coder at 4 kbps. The
`system incorporates novel features such as: AbS quantization of
`the slowly evolving waveform (SEW), AbS vector quantization
`(VQ) of the dispersion phase, a special pitch search for
`transitions, and switched-predictive analysis-by-synthesis gain
VQ. Subjective quality tests indicate that its quality exceeds that
of MPEG-4 at 4 kbps and of G.723.1 at 5.3 kbps, and that it is
slightly better than G.723.1 at 6.3 kbps.
`
1. INTRODUCTION
`
`Recently, there has been growing interest in developing toll-
`quality speech coders at rates of 4 kbps and below. The speech
`quality produced by waveform coders such as code-excited
`linear prediction (CELP) coders [1] degrades rapidly at rates
`below 5 kbps. On the other hand, parametric coders such as the
`waveform-interpolative (WI) coder [4]-[6], the sinusoidal-
`transform coder (STC) [2], and the multiband-excitation (MBE)
`coder [3] produce good quality at low rates, but they do not
`achieve toll quality. This is mainly due to lack of robustness to
`parameter estimation, which is commonly done in open loop,
`and to inadequate modeling of non-stationary speech segments.
`In this work we propose a paradigm which incorporates AbS for
`parameter estimation, and a novel pitch search technique that is
`well suited for the non-stationary segments.
`
In parametric coders the phase information is commonly not
transmitted, for two reasons: first, the phase is of
secondary perceptual significance; and second, no efficient
phase quantization scheme is known. WI coders [4]-[6] typically
use a fixed phase vector for the SEW; for example, [5] uses a
fixed phase extracted from a male speaker. On the other hand,
`waveform coders such as CELP [1], by directly quantizing the
`waveform, implicitly allocate an excessive number of bits to the
`phase information - more than is perceptually required. Recently
`[8], we proposed a novel, efficient AbS VQ encoding of the
`dispersion phase of the excitation signal to enhance the
`
`This work was supported in part by the University of California MICRO
program, ACT Networks, Inc., Cisco Systems, Inc., Conexant Systems, Inc.,
`Dialogic Corp., DSP Group, Inc., Fujitsu Laboratories of America, Inc.,
`General Electric Corp., Hughes Network Systems, Intel Corp., Lernout &
`Hauspie Speech Products NV, Lucent Technologies, Inc., Nokia Mobile
`Phones, Panasonic Speech Technology Laboratory, Qualcomm, Inc., Sun
`Microsystems Inc., and Texas Instruments, Inc.
`
Commonly in WI coders the SEW is distorted by downsampling
and upsampling with non-ideal low-pass filters. In order to
reduce such distortion, an AbS SEW quantization scheme,
illustrated in Figure 1, was used.

Figure 1. Block diagram of the AbS SEW vector quantization.

Consider the accumulated weighted distortion, $D_{wI}$, between
the input SEW vectors, $r_m$, and the interpolated vectors,
$\tilde{r}_m$, given by:

$$D_{wI}\left(\hat{r}_M, \{r_m\}_{m=1}^{M+L-1}\right) = \sum_{m=1}^{M} [r_m - \tilde{r}_m]^H W_m [r_m - \tilde{r}_m] + \sum_{m=M+1}^{M+L-1} [1-\alpha(t_m)]^2 \, [r_m - \tilde{r}_M]^H W_m [r_m - \tilde{r}_M] \qquad (1)$$

where M is the number of waveforms per frame, L is the
lookahead number of waveforms, $\alpha(t)$ is some increasing
interpolation function in the range $0 \le \alpha(t) \le 1$, and
$W_m$ is a diagonal matrix whose elements, $w_{kk}$, are the
combined spectral weighting and synthesis of the k-th harmonic,
given by:

$$w_{kk} = \frac{1}{K} \left| \frac{g \, A(z/\gamma_1)}{\hat{A}(z) \, A(z/\gamma_2)} \right|^2_{z = e^{j(2\pi/P)k}} \; ; \quad k = 1, \ldots, K \qquad (2)$$

where P is the pitch period, K is the number of harmonics, g is
the gain, $A(z)$ and $\hat{A}(z)$ are the input and the quantized
LPC polynomials respectively, and the spectral weighting
parameters satisfy $0 \le \gamma_2 < \gamma_1 \le 1$. The
interpolated SEW vectors are given by:

$$\tilde{r}_m = [1-\alpha(t_m)] \, \hat{r}_0 + \alpha(t_m) \, \hat{r}_M \; ; \quad m = 1, \ldots, M \qquad (3)$$

where $\hat{r}_0$ and $\hat{r}_M$ are the quantized SEW at the
previous and at the current frame respectively. It can be shown
that the accumulated distortion in equation (1) is equal to the
sum of modeling distortion and quantization distortion:

$$D_{wI}\left(\hat{r}_M, \{r_m\}_{m=1}^{M+L-1}\right) = D_{wI}\left(r_{M,opt}, \{r_m\}_{m=1}^{M+L-1}\right) + D_w\left(\hat{r}_M, r_{M,opt}\right) \qquad (4)$$

where the quantization distortion is given by:

$$D_w\left(\hat{r}_M, r_{M,opt}\right) = (\hat{r}_M - r_{M,opt})^H \, W_{M,opt} \, (\hat{r}_M - r_{M,opt}) \qquad (5)$$

The optimal vector, $r_{M,opt}$, which minimizes the modeling
distortion, is given by:

$$r_{M,opt} = W_{M,opt}^{-1} \left[ \sum_{m=1}^{M} \alpha(t_m) \, W_m \left[ r_m - [1-\alpha(t_m)] \, \hat{r}_0 \right] + \sum_{m=M+1}^{M+L-1} [1-\alpha(t_m)]^2 \, W_m \, r_m \right] \qquad (6)$$

An improved match between reconstructed and original SEW is
obtained, most notably in the transitions. Figure 2 illustrates the
improved waveform matching obtained for a non-stationary
speech segment by interpolating the optimized SEW.

[Figure 2 panels: Original, Optimized, Non-optimized; amplitude
versus time (sec).]

Figure 2. Example of the improved interpolation by SEW
optimization during a non-stationary speech segment.

3. AbS PHASE QUANTIZATION

The dispersion-phase quantization scheme [8][9] is illustrated in
Figure 3. Consider a pitch cycle which is extracted from the
residual signal, and is cyclically shifted such that its pulse is
located at position zero. Let its DFT be denoted by r; the
resulting DFT phase is the dispersion phase, $\varphi$, which
determines, along with the magnitude $|r|$, the waveform's pulse
shape. After quantization, the components of the quantized
magnitude vector, $|\hat{r}|$, are multiplied by the exponential of the
`
`
`
`
`
In equation (6), the optimized weighting matrix, $W_{M,opt}$, is:

$$W_{M,opt} = \sum_{m=1}^{M} \alpha(t_m)^2 \, W_m + \sum_{m=M+1}^{M+L-1} [1-\alpha(t_m)]^2 \, W_m \qquad (7)$$

Therefore, VQ with the accumulated distortion of equation (1)
can be simplified by using the distortion of equation (5), and:

$$\hat{r}_M = \arg\min_{r'_i} \left\{ (r'_i - r_{M,opt})^H \, W_{M,opt} \, (r'_i - r_{M,opt}) \right\} \qquad (8)$$

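The relation between equations (1), (3) and (6) to (8) can be checked numerically. The sketch below is only an illustration under stated assumptions: the diagonal weighting matrices are stored as plain lists, the lookahead term uses the quantized current-frame vector (that is, alpha(t_M) = 1 is assumed), and all function names and toy values are hypothetical rather than taken from the coder.

```python
def weighted_dist(x, y, w):
    """(x - y)^H W (x - y) for a real diagonal W stored as a list."""
    return sum(wk * abs(xk - yk) ** 2 for xk, yk, wk in zip(x, y, w))


def accumulated_distortion(r_hat_M, r_hat_0, r, W, alpha, M):
    """Equation (1); r, W and alpha hold entries for m = 1 .. M+L-1."""
    total = 0.0
    for m, (r_m, W_m, a) in enumerate(zip(r, W, alpha), start=1):
        if m <= M:
            # Interpolated vector of equation (3).
            r_tilde = [(1 - a) * x0 + a * xM
                       for x0, xM in zip(r_hat_0, r_hat_M)]
            total += weighted_dist(r_m, r_tilde, W_m)
        else:
            # Lookahead term (assumption: alpha(t_M) = 1, so r~_M = r^_M).
            total += (1 - a) ** 2 * weighted_dist(r_m, r_hat_M, W_m)
    return total


def optimal_sew(r_hat_0, r, W, alpha, M):
    """Equations (6) and (7): returns (r_opt, w_opt), one entry per harmonic."""
    K = len(r_hat_0)
    w_opt = [0.0] * K
    num = [0j] * K
    for m, (r_m, W_m, a) in enumerate(zip(r, W, alpha), start=1):
        for k in range(K):
            if m <= M:
                w_opt[k] += a * a * W_m[k]
                num[k] += a * W_m[k] * (r_m[k] - (1 - a) * r_hat_0[k])
            else:
                w_opt[k] += (1 - a) ** 2 * W_m[k]
                num[k] += (1 - a) ** 2 * W_m[k] * r_m[k]
    return [n / w for n, w in zip(num, w_opt)], w_opt


def quantize_sew(r_opt, w_opt, codebook):
    """Equation (8): index of the codevector closest to r_opt under W_opt."""
    return min(range(len(codebook)),
               key=lambda i: weighted_dist(codebook[i], r_opt, w_opt))


# Toy data (all values hypothetical): M = 3, L = 2, K = 2 harmonics.
M = 3
alpha = [1 / 3, 2 / 3, 1.0, 1 / 3]   # last entry belongs to the lookahead waveform
r_hat_0 = [1 + 0j, 0.5j]
r = [[1.1 + 0.1j, 0.4j], [1.3 + 0.2j, 0.2 + 0.3j],
     [1.6 + 0.2j, 0.3 + 0.1j], [1.7 + 0.1j, 0.4 + 0j]]
W = [[1.0, 0.5], [0.9, 0.6], [0.8, 0.7], [0.7, 0.8]]
codebook = [[1.5 + 0j, 0.3 + 0j], [1.7 + 0.3j, 0.4 + 0.1j], [1 + 0j, 0.5j]]

r_opt, w_opt = optimal_sew(r_hat_0, r, W, alpha, M)
best = quantize_sew(r_opt, w_opt, codebook)
print("selected codevector index:", best)
```

Because the accumulated distortion is quadratic in the quantized vector, the simplified search over equation (5) selects the same codevector as an exhaustive evaluation of equation (1), while touching only one weighted distance per candidate.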

`

pitch periods, $P(n_i)$, are searched every 2 ms at instances $n_i$
by maximizing the normalized correlation of the weighted speech
$s_w(n)$, that is:

$$P(n_i) = \arg\max_{\tau, N_1, N_2} \left\{ \rho(n_i, \tau, N_1, N_2) \right\} = \arg\max_{\tau, N_1, N_2} \left\{ \frac{\sum_{n=n_i-N_1\Delta}^{n_i+\tau+N_2\Delta} s_w(n) \, s_w(n-\tau)}{\sqrt{\sum_{n=n_i-N_1\Delta}^{n_i+\tau+N_2\Delta} s_w(n) \, s_w(n) \; \sum_{n=n_i-N_1\Delta}^{n_i+\tau+N_2\Delta} s_w(n-\tau) \, s_w(n-\tau)}} \right\} \qquad (11)$$

where $\Delta$ is some incremental segment used in the summations
for computational simplicity, and $0 \le N_j \le 160/\Delta$. Then,
every 10 ms a weighted-mean pitch value is calculated by:

$$P_{mean} = \frac{\sum_{i=1}^{5} \rho(n_i) \, P(n_i)}{\sum_{i=1}^{5} \rho(n_i)} \qquad (12)$$

where $\rho(n_i)$ is the normalized correlation for $P(n_i)$.
`
[Figure 4 flowchart blocks: spectral domain pitch search and
tracker on the speech (100 Hz); temporal domain pitch refinement
on the weighted speech (500 Hz); temporal domain pitch search
(500 Hz); use of a 4 ms waveform length; weighted-average pitch
(100 Hz). The branches are selected by "Good pitch?" decisions.]

Figure 4. Pitch search of the EWI coder.
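The temporal domain search with varying segment boundaries and the weighted-mean pitch of equation (12) can be sketched as follows. This is a minimal illustration, not the coder's implementation: the summation range is our reading of equation (11) (from n_i - N1*delta to n_i + tau + N2*delta, inclusive), an 8 kHz sampling rate is assumed for the 2 ms spacing, and the function names and the synthetic signal are hypothetical.

```python
import math


def normalized_correlation(sw, n_i, tau, n1, n2, delta):
    """rho(n_i, tau, N1, N2) over n = n_i - N1*delta .. n_i + tau + N2*delta."""
    start = n_i - n1 * delta
    stop = n_i + tau + n2 * delta          # inclusive upper boundary
    if start - tau < 0 or stop >= len(sw):
        return None                        # segment falls outside the buffer
    num = e_cur = e_lag = 0.0
    for n in range(start, stop + 1):
        num += sw[n] * sw[n - tau]
        e_cur += sw[n] * sw[n]
        e_lag += sw[n - tau] * sw[n - tau]
    den = math.sqrt(e_cur * e_lag)
    return num / den if den > 0.0 else 0.0


def search_pitch(sw, n_i, lags, delta, max_steps):
    """Return (tau, rho) maximizing rho over the lag and both boundaries."""
    best_tau, best_rho = None, -math.inf
    for tau in lags:
        for n1 in range(max_steps + 1):
            for n2 in range(max_steps + 1):
                rho = normalized_correlation(sw, n_i, tau, n1, n2, delta)
                if rho is not None and rho > best_rho:
                    best_tau, best_rho = tau, rho
    return best_tau, best_rho


def weighted_mean_pitch(pitches, rhos):
    """Equation (12): correlation-weighted mean of the pitch candidates."""
    return sum(p * r for p, r in zip(pitches, rhos)) / sum(rhos)


# Synthetic "weighted speech" with a 40-sample period (hypothetical data).
sw = [math.sin(2 * math.pi * n / 40) + 0.5 * math.sin(4 * math.pi * n / 40)
      for n in range(400)]
instants = [120, 136, 152, 168, 184]       # five instants, 2 ms apart at 8 kHz
results = [search_pitch(sw, n_i, range(20, 61), delta=8, max_steps=2)
           for n_i in instants]
p_mean = weighted_mean_pitch([t for t, _ in results], [r for _, r in results])
print(results, p_mean)
```

Letting both boundaries move in steps of delta keeps the number of evaluated segments small, which is the computational simplification the text attributes to the incremental segment.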
`
5. GAIN QUANTIZATION
`
`
`
quantized phases, $\hat{\varphi}(k)$, to yield the quantized waveform
DFT, $\hat{r}$, which is subtracted from the input DFT to produce
the error DFT. The error DFT is then transformed to the perceptual
domain by weighting it by the combined synthesis and weighting
filter W(z)/A(z). The encoder searches for the phase that minimizes
the energy of the perceptual domain error, allowing a refining
cyclic shift of the input waveform during the search, to eliminate
any residual phase shift between the input waveform and the
quantized waveform. Phase dispersion quantization aims to
improve waveform matching. Efficient AbS quantization can be
obtained by using the perceptually weighted distortion:

$$D_w(r, \hat{r}) = (r - \hat{r})^H \, W \, (r - \hat{r}) \qquad (9)$$

The magnitude is perceptually more significant than the phase,
and should therefore be quantized first. Furthermore, if the phase
were quantized first, the very limited bit allocation available for
the phase would lead to an excessively degraded spectral
matching of the magnitude in favor of a somewhat improved, but
less important, matching of the waveform. For the above
distortion, the quantized phase vector is given by [8][9]:

$$\hat{\varphi} = \arg\min_{\hat{\varphi}_i} \left\{ \left(r - e^{j\hat{\varphi}_i} |\hat{r}|\right)^H W \left(r - e^{j\hat{\varphi}_i} |\hat{r}|\right) \right\} \qquad (10)$$

where i is the running phase codebook index, and
$e^{j\hat{\varphi}_i}$ is the respective diagonal phase exponent
matrix. The AbS search for phase quantization is based on
evaluating (10) for each candidate phase codevector. Since only
trigonometric functions of the phase candidates are used, phase
unwrapping is avoided. The EWI coder uses the optimized SEW,
$r_{M,opt}$, and the optimized weighting, $W_{M,opt}$, for the AbS
phase quantization.
`
[Figure 3 blocks: pitch-cycle waveform's DFT; crude linear-phase
alignment; refined linear-phase alignment; magnitude codebook,
$|\hat{r}|$; phase codebook, $e^{j\hat{\varphi}}$; weighting by
W(z)/A(z) using the pitch; minimization of the error energy.]

Figure 3. Block diagram of the AbS dispersion phase
vector quantization.
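The codebook search of equation (10) can be sketched in a few lines. The example below is only a sketch under assumptions: the diagonal weighting is a list of per-harmonic weights, the refining cyclic shift is modeled as a linear phase applied to the input DFT, and the magnitudes, weights, codebook and candidate shifts are hypothetical toy values.

```python
import cmath
import math


def phase_distortion(r, mag_hat, phases, w, shift=0.0):
    """Weighted error energy of equation (10) for one phase codevector.

    `shift` (radians per harmonic) applies a linear phase to the input,
    which corresponds to a cyclic shift of the pitch-cycle waveform.
    """
    total = 0.0
    for k, (r_k, a_k, p_k, w_k) in enumerate(zip(r, mag_hat, phases, w), 1):
        aligned = r_k * cmath.exp(1j * k * shift)
        total += w_k * abs(aligned - cmath.rect(a_k, p_k)) ** 2
    return total


def search_phase(r, mag_hat, codebook, w, shifts=(0.0,)):
    """Return (distortion, index, shift) of the best phase codevector."""
    return min((phase_distortion(r, mag_hat, phases, w, s), i, s)
               for i, phases in enumerate(codebook) for s in shifts)


# Toy example; magnitudes, weights and codebook are hypothetical.
mag_hat = [1.0, 0.8, 0.5, 0.3]
w = [1.0, 0.9, 0.7, 0.4]
codebook = [[0.0, 0.0, 0.0, 0.0],
            [0.3, -0.6, 1.2, -2.0],
            [-0.4, 0.9, -1.5, 3.0]]
# Input DFT built from codevector 1, so that entry should win the search.
r = [cmath.rect(a, p) for a, p in zip(mag_hat, codebook[1])]
dist, index, shift = search_phase(r, mag_hat, codebook, w,
                                  shifts=(-0.05, 0.0, 0.05))
print(index, shift, dist)
```

Each candidate enters only through the complex exponential of its phases, so adding any multiple of 2*pi to a codevector leaves the distortion unchanged, which is why no phase unwrapping is needed.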
`
4. PITCH SEARCH
`
`The pitch search consists of a spectral domain search employed
`at 100 Hz and a temporal domain search employed at 500 Hz, as
`illustrated in Figure 4. The spectral domain pitch search is based
`on harmonic matching [2][3][7]. The temporal domain pitch
`search is based on varying segment boundaries. It allows for
`locking onto the most probable pitch period even during
`transitions or other segments with rapidly varying pitch. Initially,
`
`The gain trajectory is commonly smeared during plosives and
`onsets by downsampling and interpolation. We address this
`problem and improve speech crispness with a novel Switched-
`Predictive AbS Gain VQ technique, illustrated in Figure 5.
`Switched-prediction is introduced to allow for different levels of
`gain correlation, and to reduce the occurrence of gain outliers. In
`order to improve speech crispness, especially for plosives and
`onsets, temporal weighting is incorporated in the AbS gain VQ.
`The weighting is a monotonic function of the temporal gain.
`
`
`Two codebooks of 32 vectors each are used. Each codebook has
`an associated predictor coefficient, Pi, and a DC offset Di. The
`quantization target vector is the DC removed log-gain vector
denoted by t(m). The search for the minimal weighted mean
squared error (WMSE) is performed over all the vectors, $c_{ij}(m)$,
of the codebooks. The quantized target, $\hat{t}(m)$, is obtained by
passing the quantized vector, $c_{ij}(m)$, through the synthesis
filter. Since each quantized
`target vector may have a different value of the removed DC, the
`quantized DC is added temporarily to the filter memory after the
`state update, and the next quantized vector’s DC is subtracted
`from it before filtering is performed. Since the predictor
`coefficients are known, direct VQ can be used to simplify the
`computations.
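A switched-predictive search of this kind might look like the sketch below. It is an illustration under several assumptions that the text does not fix: the synthesis filter is taken as the first-order recursion 1/(1 - P_i z^-1), the filter memory is kept as the previous quantized log-gain and converted to each codebook's DC-removed domain before filtering, the temporal weighting is an exponential of the log-gain (one possible monotonic choice), and the two tiny codebooks and all names are hypothetical.

```python
import math


def synthesize(codevector, pred, memory):
    """First-order synthesis filter 1 / (1 - pred * z^-1)."""
    out, prev = [], memory
    for c in codevector:
        prev = c + pred * prev
        out.append(prev)
    return out


def quantize_gain(log_gain, prev_log_gain_hat, codebooks, weight_fn):
    """Search every codebook; return (error, i, j, quantized log-gains)."""
    weights = [weight_fn(g) for g in log_gain]   # temporal weighting (assumed form)
    best = None
    for i, book in enumerate(codebooks):
        dc, pred = book["dc"], book["pred"]
        target = [g - dc for g in log_gain]      # DC-removed target t(m)
        memory = prev_log_gain_hat - dc          # memory in this codebook's DC domain
        for j, c in enumerate(book["vectors"]):
            t_hat = synthesize(c, pred, memory)
            err = sum(wt * (t - th) ** 2
                      for wt, t, th in zip(weights, target, t_hat))
            if best is None or err < best[0]:
                best = (err, i, j, [th + dc for th in t_hat])
    return best


# Two tiny hypothetical codebooks (the coder uses 32 vectors in each).
codebooks = [
    {"pred": 0.9, "dc": 2.0, "vectors": [[0.0, 0.0], [0.1, -0.1]]},
    {"pred": 0.3, "dc": 1.0, "vectors": [[1.0, 0.5], [-0.5, 0.2]]},
]
prev_hat = 1.5
# An onset-like input built from codebook 1, vector 0, so the match is exact.
log_gain = [x + 1.0 for x in synthesize([1.0, 0.5], 0.3, prev_hat - 1.0)]
err, i, j, g_hat = quantize_gain(log_gain, prev_hat, codebooks, math.exp)
print(i, j, err, g_hat)
```

Weighting the squared error by an increasing function of the gain makes errors at high-energy instants, such as onsets, more costly than errors in low-energy regions, and the low-predictor codebook gives the search a way to follow abrupt gain changes.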
`
[Figure 5 blocks: log-gain input g(m); DC codebook, $D_i$;
predictor codebook, $P_i$; vector quantizer codebook, $c_{ij}(m)$;
synthesis filter $1/(1 - P_i z^{-1})$ producing $\hat{t}(m)$; target
t(m); temporal weighting; minimization of the error energy.]

Figure 5. Switched-Predictive Analysis-by-Synthesis
gain VQ using temporal weighting.
`
6. BIT ALLOCATION
`
`The bit allocation of the coder is given in Table 1. The frame
`length is 20 ms, and ten waveforms are extracted per frame. The
`pitch and the gain are coded twice per frame.
`
Parameter        Bits / Frame    Bits / second
LPC              18              900
Pitch            2x6=12          600
Gain             2x6=12          600
REW              20              1000
SEW magnitude    14              700
SEW phase        4               200
Total            80              4000

Table 1. Bit allocation for the EWI coder.
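The per-second figures in Table 1 follow directly from the 20 ms frame length. The short check below reproduces that arithmetic; the dictionary layout and variable names are ours.

```python
FRAME_MS = 20  # frame length stated in the text

bits_per_frame = {
    "LPC": 18,
    "Pitch": 2 * 6,          # coded twice per frame
    "Gain": 2 * 6,           # coded twice per frame
    "REW": 20,
    "SEW magnitude": 14,
    "SEW phase": 4,
}

frames_per_second = 1000 // FRAME_MS
bits_per_second = {name: bits * frames_per_second
                   for name, bits in bits_per_frame.items()}
total_bits = sum(bits_per_frame.values())
total_rate = sum(bits_per_second.values())

for name, bits in bits_per_frame.items():
    print(f"{name:14s} {bits:3d} bits/frame {bits_per_second[name]:5d} bit/s")
print(f"{'Total':14s} {total_bits:3d} bits/frame {total_rate:5d} bit/s")
```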
`
7. SUBJECTIVE RESULTS
`
We conducted a subjective A/B test to compare our 4 kbps
EWI coder with MPEG-4 at 4 kbps and with G.723.1. The test data
included 24 MIRS speech sentences, 12 from female speakers and
12 from male speakers. Fourteen listeners participated in the
test. The test results, listed in Tables 2 to 4, indicate that the
subjective quality of EWI exceeds that of MPEG-4 at 4 kbps and
of G.723.1 at 5.3 kbps, and that it is slightly better than that of
G.723.1 at 6.3 kbps.
`
Test      4 kbps WI    4 kbps MPEG-4
Female    65.48%       34.52%
Male      61.90%       38.10%
Total     63.69%       36.31%

Table 2. Results of the subjective A/B test comparing the 4 kbps
WI coder with 4 kbps MPEG-4. With 95% certainty the WI
preference lies in [58.63%, 68.75%].
`
Test      4 kbps WI    5.3 kbps G.723.1
Female    57.74%       42.26%
Male      61.31%       38.69%
Total     59.52%       40.48%

Table 3. Results of the subjective A/B test comparing the 4 kbps
WI coder with 5.3 kbps G.723.1. With 95% certainty the WI
preference lies in [54.17%, 64.88%].
`
Test      4 kbps WI    6.3 kbps G.723.1
Female    54.76%       45.24%
Male      52.98%       47.02%
Total     53.87%       46.13%

Table 4. Results of the subjective A/B test comparing the 4 kbps
WI coder with 6.3 kbps G.723.1. With 95% certainty the WI
preference lies in [48.51%, 59.23%].
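The text does not say how the 95% intervals were obtained. One plausible reading, offered here only as an assumption, is a normal approximation to the binomial vote count, rounded to whole votes, with 24 x 14 = 336 votes per comparison (each listener judging each sentence once). Under that assumption the sketch below gives intervals that agree with the three listed in Tables 2 to 4; the function name is ours.

```python
import math


def preference_interval(votes_for, votes_total, z=1.96):
    """Normal-approximation interval on the vote count, as percentages."""
    p = votes_for / votes_total
    half_width = round(z * math.sqrt(votes_total * p * (1.0 - p)))
    low = 100.0 * (votes_for - half_width) / votes_total
    high = 100.0 * (votes_for + half_width) / votes_total
    return low, high


VOTES = 24 * 14   # assumes each listener judged each sentence once
for label, share in [("MPEG-4, 4 kbps", 63.69),
                     ("G.723.1, 5.3 kbps", 59.52),
                     ("G.723.1, 6.3 kbps", 53.87)]:
    votes_for = round(VOTES * share / 100.0)
    low, high = preference_interval(votes_for, VOTES)
    print(f"WI vs {label}: {low:.2f}% to {high:.2f}%")
```

Only the interval for the 6.3 kbps comparison includes 50%, which matches the more cautious wording ("slightly better") used for that result.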
`
8. SUMMARY
We have found several new techniques that enhance the
performance of the WI coder. The most significant of these,
reported here, are analysis-by-synthesis vector quantization of
the dispersion phase, AbS optimization of the SEW, a special
pitch search for transitions, and switched-predictive
analysis-by-synthesis gain VQ. These features improve the
algorithm and its robustness. The test results indicate that the
performance of the EWI coder slightly exceeds that of G.723.1 at
6.3 kbps, and therefore that EWI comes very close to toll quality,
at least under clean speech conditions.
`
9. REFERENCES
[1] B. S. Atal and M. R. Schroeder, "Stochastic Coding of Speech at Very
Low Bit Rate", Proc. Int. Conf. Comm, Amsterdam, pp. 1610-1613,
1984.
[2] R. J. McAulay and T. F. Quatieri, "Sinusoidal Coding", in Speech
Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier
Science B. V., Chapter 4, pp. 121-173, 1995.
[3] D. Griffin and J. S. Lim, "Multiband Excitation Vocoder", IEEE
Trans. ASSP, Vol. 36, No. 8, pp. 1223-1235, August 1988.
[4] Y. Shoham, "High Quality Speech Coding at 2.4 to 4.0 kbps Based on
Time-Frequency-Interpolation", IEEE ICASSP'93, Vol. II, pp. 167-170,
1993.
[5] W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and
Synthesis", in Speech Coding and Synthesis by W. B. Kleijn and K. K.
Paliwal, Elsevier Science B. V., Chapter 5, pp. 175-207, 1995.
[6] I. S. Burnett and D. H. Pham, "Multi-Prototype Waveform Coding
using Frame-by-Frame Analysis-by-Synthesis", IEEE ICASSP'97, pp.
1567-1570, 1997.
[7] E. Shlomot, V. Cuperman, and A. Gersho, "Hybrid Coding of Speech at
4 kbps", IEEE Speech Coding Workshop, pp. 37-38, 1997.
[8] O. Gottesman, "Dispersion Phase Vector Quantization For
Enhancement of Waveform Interpolative Coder", IEEE ICASSP'99,
vol. 1, pp. 269-272, 1999.
[9] O. Gottesman and A. Gersho, "Enhanced Waveform Interpolative
Coding at 4 kbps", IEEE Speech Coding Workshop, 1999, Finland.